[ubuntu-uk] OCR ....
Bruno Girin
brunogirin at gmail.com
Mon Dec 6 20:18:43 GMT 2010
On Mon, 2010-12-06 at 16:55 +0000, Barry Drake wrote:
> On Mon, 2010-12-06 at 16:07 +0000, Simon Greenwood wrote:
>
> > I had a need to do some OCR recently and came across a project called
> > tesseract-ocr: http://code.google.com/p/tesseract-ocr/. It's based on
> > HP code that dates from the mid-90s. I've only used it to extract text
> > from existing graphics but it seems to be very accurate.
>
> You're right - it is accurate - and it works with the neat GUI frontend
> that Danté mentioned - gscan2pdf. Makes a fantastic combination that's
> amazingly easy to use. Tesseract and gscan2pdf really ought to get into
> the normal Ubuntu release... or at least be well promoted in the
> 'Software Centre' and Synaptic so they are easy to find. The only one
> that's really easy to find is gocr, and so far I'm not that impressed.
OCRFeeder is another option: it is in the Ubuntu repositories, uses
Tesseract as its default back-end and can be installed from the Software
Centre. I haven't used it extensively, so I have no idea how it compares
to gscan2pdf.
Cheers,
Bruno