[ubuntu-uk] OCR ....

Mon Dec 6 20:18:43 GMT 2010

On Mon, 2010-12-06 at 16:55 +0000, Barry Drake wrote:
> On Mon, 2010-12-06 at 16:07 +0000, Simon Greenwood wrote:
> 
> > I had a need to do some OCR recently and came across a project called
> > tesseract-ocr: http://code.google.com/p/tesseract-ocr/. It's based on
> > HP code that dates from the mid-90s. I've only used it to extract text
> > from existing graphics but it seems to be very accurate.
> 
> You're right - it is accurate - and it works with the neat GUI front end
> that Danté mentioned - gscan2pdf. Makes a fantastic combination that's
> amazingly easy to use.  Tesseract and gscan2pdf really ought to get into
> the normal Ubuntu release .... or at least be well promoted in the
> 'Software Centre' and Synaptic so they are easy to find.  The only one
> that's really easy to find is gocr, and so far I'm not that impressed.

OCRFeeder is another option: it is in the Ubuntu repositories, uses
Tesseract as its default back-end and can be installed from the Software
Centre. I haven't used it extensively, so I can't say how it compares to
gscan2pdf.

Cheers,

Bruno