[Bug 461177] [NEW] tesseract 2.03 generates empty file

neuromancer neuromancer at devonlinux.net
Mon Oct 26 15:25:14 UTC 2009


Public bug reported:

Karmic Koala 9.10 beta - tesseract 2.03
Installed tesseract-ocr, tesseract-ocr-eng and tesseract-ocr-ita.

I opened an image containing some numbers and other information in GIMP, cropped a selection of it, and saved the result in TIFF format.
The image is very clean and high-contrast (black text on a white background), but when I ran
tesseract inputimage.tif outputfile
the generated outputfile.txt was empty (no text, 1 byte in size).

After a bit of searching, I found a workaround here: http://groups.google.com/group/tesseract-ocr/browse_thread/thread/2434f09ed180c092/e5ed41969097c708?lnk=gst&q=screenshot#e5ed41969097c708
Just run
convert inputimage.tif inputimage_tmp.pbm
convert inputimage_tmp.pbm inputimage_ok.tif
The problem occurs because TIFF images sometimes contain an alpha (transparency) channel that blocks text recognition. Converting to PBM and back removes that channel.
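
The two convert commands above can be wrapped in a small shell function. This is only a minimal sketch, assuming ImageMagick's convert is on the PATH; flatten_tiff is a hypothetical helper name (not part of tesseract or ImageMagick), and the _tmp.pbm / _ok.tif suffixes simply mirror the file names used above.

```shell
#!/bin/sh
# Sketch: round-trip a TIFF through PBM so the alpha channel is dropped.
# flatten_tiff is a hypothetical helper, not part of tesseract or ImageMagick.
flatten_tiff() {
    in="$1"                   # e.g. inputimage.tif
    base="${in%.*}"           # input path without its extension
    tmp="${base}_tmp.pbm"     # intermediate bitmap (PBM cannot store alpha)
    out="${base}_ok.tif"      # cleaned TIFF to hand to tesseract

    # Same two conversions as in the workaround; stop at the first failure.
    convert "$in" "$tmp" &&
    convert "$tmp" "$out" &&
    rm -f "$tmp" &&
    printf '%s\n' "$out"      # print the cleaned file name for the caller
}
```

It could then be used as: tesseract "$(flatten_tiff inputimage.tif)" outputfile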

According to this page, http://code.google.com/p/tesseract-ocr/issues/detail?id=160,
the new version, 2.04, has fixed this problem, so the only thing left
to do is to package the new version.

** Affects: tesseract (Ubuntu)
     Importance: Undecided
         Status: New

-- 
tesseract 2.03 generates empty file
https://bugs.launchpad.net/bugs/461177
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.

-- 
ubuntu-bugs mailing list
ubuntu-bugs at lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs