
2005.11.09

The Problem with OCR

Rob Hyndman at the robhyndman.com blog left this comment on my post about my favorite technology tool, Copernic. Rob notes:

      "Copernic has changed everything for me, but I've found it to be especially powerful when used in conjunction with OCR'd documents. I don't bother cleaning them up - I just drop pdfs into a watch folder and then file the Word files that are produced. The OCR is so accurate in the latest gen of software that that is all it takes. Puts Copernic on steroids ..."

Rob makes a great point. OCR documents are completely searchable. My problem has been that OCR document sizes are so large that I have difficulty transmitting them to clients, experts and others. I use LeapFile for large file transfers, but the upload and download time is still significant. Perhaps I am saving my OCR files at the wrong dpi. I wonder if anyone else has this problem?


Comments

I go with 1 bit, 300 dpi, saved in pdf as raw scan output, and then OCR produces a file in native MS Word format. File sizes are smaller if you tell the OCR software not to save images when it does the scan.

Sample work product:

75 pages of paper produces a 4.2M pdf and then a Word doc of under 100K.
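To see why bit depth and dpi matter so much for file size, here is a rough back-of-the-envelope sketch in Python. It only computes the uncompressed size of a scanned page; the US Letter page size, the list of settings compared, and the function name `raw_scan_bytes` are all assumptions for illustration, not anything taken from a particular scanner or OCR product.

```python
def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size, in bytes, of one scanned page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    # Pad each pixel row up to a whole number of bytes
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px


# Assumption: US Letter paper, 8.5 x 11 inches
PAGE_W, PAGE_H = 8.5, 11.0

# Hypothetical settings to compare: (label, dpi, bits per pixel)
settings = [
    ("1-bit black and white, 300 dpi", 300, 1),
    ("8-bit grayscale, 300 dpi", 300, 8),
    ("24-bit color, 300 dpi", 300, 24),
    ("1-bit black and white, 600 dpi", 600, 1),
]

for label, dpi, bpp in settings:
    size = raw_scan_bytes(PAGE_W, PAGE_H, dpi, bpp)
    print(f"{label}: {size / 1e6:.1f} MB per page before compression")
```

Under these assumptions a 1-bit, 300 dpi page is about 1 MB before compression, while the same page in 24-bit color is about 25 MB, and doubling the dpi roughly quadruples the size. If the 4.2M figure above means megabytes, 75 such pages were also compressed by something like 19 to 1 inside the pdf, which is plausible for black and white scans but is an inference, not a measurement.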

Thanks Rob. That makes sense. I'll give that a try.

90% of the time 300 dpi is going to be plenty. OCR software has come a long way in 9 months. The key is "searchable" documents. Manufacturers are listening to their market sectors. In truth, as more people like yourselves keep up with current trends, manufacturers will trip over each other to grab market share.
