Portable scanner to text for ubuntu 12.04
Marius Gedminas
marius at pov.lt
Fri Jan 24 09:37:48 UTC 2014
On Thu, Jan 23, 2014 at 11:11:43AM -0500, John Hupp wrote:
> To convert that to text, you need OCR. I do not currently know of
> any functional open source OCR software for Linux. But I would be
> happy to be told that there is some.
The state of open-source OCR software is pitiful. The least bad
solution I found was Tesseract:
    sudo apt-get install tesseract-ocr-eng
Now, it only accepts TIFF files as input, so you need to convert the
scanned image first (convert is part of ImageMagick):

    convert scanned-page.jpg scanned-page.tif
(My notes also indicate that I used GIMP for scanning and manually
cropped/rotated/converted the image to black and white before saving and
converting to TIFF. I don't know if Tesseract can handle images that
aren't black and white. Probably not.)
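
A rough sketch of scripting the black-and-white step instead of doing it
by hand in GIMP. This is not the procedure from the notes above: it
swaps in ImageMagick's convert with its -colorspace and -threshold
options. The function name to_bw_tiff and the 50% default cutoff are
made up for illustration, and cropping and rotating would still have to
be done manually.

```shell
#!/bin/sh
# to_bw_tiff is a hypothetical helper name, not part of any tool.
# Usage: to_bw_tiff scanned-page.jpg [threshold]
# Assumes a simple file name with an extension, e.g. scanned-page.jpg.
to_bw_tiff() {
    input=$1
    cutoff=${2:-50%}            # assumed default; tune it per scan
    output=${input%.*}.tif      # scanned-page.jpg -> scanned-page.tif

    # Reduce to grayscale, then push every pixel to pure black or white.
    convert "$input" -colorspace Gray -threshold "$cutoff" "$output"
}
```

A darker or lighter scan will likely need a different cutoff, e.g.
to_bw_tiff scanned-page.jpg 60%.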
Anyway, when you have a B&W TIFF image, you can do OCR on it:
    tesseract scanned-page.tif output-file
You'll end up with an output-file.txt that has the text (full of OCR
mistakes you need to fix manually).
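
The convert and tesseract steps above could be chained in a small shell
function, roughly like this. It is only a sketch: the names ocr_page and
DRY_RUN are hypothetical, and it assumes a simple input name with an
extension (such as scanned-page.jpg) and that both convert and tesseract
are on the PATH.

```shell
#!/bin/sh
# ocr_page and DRY_RUN are hypothetical names used for illustration.
# Usage: ocr_page scanned-page.jpg
ocr_page() {
    input=$1
    base=${input%.*}            # scanned-page.jpg -> scanned-page
    tiff=$base.tif

    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Only print the commands that would be run.
        echo "convert $input $tiff"
        echo "tesseract $tiff $base"
        return 0
    fi

    # Convert the scan to TIFF, then OCR it; stop at the first failure.
    convert "$input" "$tiff" || return 1
    tesseract "$tiff" "$base" || return 1
    echo "text written to $base.txt"
}
```

Setting DRY_RUN=1 before calling ocr_page scanned-page.jpg prints the
two commands without running them, which is handy for checking the file
names first. The resulting .txt file still needs proofreading by hand.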
Sad.
If there's anything better out there, I'd love to know.
Marius Gedminas
--
The difference between Microsoft and 'Jurassic Parc':
In one, a mad businessman makes a lot of money with beasts that should be
extinct.
The other is a film.