OCR Document Scanning & Processing

OCR, or Optical Character Recognition, is a technology that has been around for many years and has become steadily more accurate over time.

OCR is the process of recognising text in a scanned image and, where required, extracting that text for use in other applications.

There are two kinds of OCR that Scantronics mainly uses:

Firstly, full-page OCR, which is very useful for solicitors, accountants or anybody who needs to re-create an original from a scanned copy.

Secondly, we are commonly asked to provide Searchable Text. This is when the original scanned document remains in a non-editable format, yet its text and content are searchable within the document and across networks.

Please note that Adobe Acrobat will only allow you to search the OCR’ed text within the open document. To search across the network you would need a search tool or document management tool.
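As a rough illustration of what searching across a folder or network share involves, here is a minimal Python sketch. It assumes the OCR’ed text of each scanned document has already been exported to a plain-text (.txt) file; that export step, the folder layout and the function name search_folder are assumptions made for this example, not a description of any particular product.

```python
from pathlib import Path

def search_folder(root, term):
    """Return the .txt files under root whose text contains term.

    The match is case-insensitive. The .txt files are assumed to hold
    the OCR'ed text exported from the scanned documents.
    """
    needle = term.lower()
    hits = []
    # rglob walks the folder and all of its sub-folders
    for path in sorted(Path(root).rglob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        if needle in text.lower():
            hits.append(path)
    return hits
```

A dedicated search or document management tool would normally build an index rather than re-read every file on each search, but the principle is the same.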

In summary, we can:

Make Searchable Text accessible to third party documents
Make PDF files searchable
Search for content within files from your Desktop or across your network using the right tools
Batch OCR multiple documents

OCR’ing a book, a magazine or in fact any document is only as good as the quality of the original. If there are marks or extraneous items on the page, the OCR engine can interpret these as characters or as page commands such as line breaks.

If the text on your page is faded or of poor quality, this will also affect the accuracy of the OCR engine.

In the main, we find that when generating a searchable PDF, the quality of our processes and engines produces 95% or more recognisable text for the search engine.
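One possible way to put a number on recognition quality is to compare the OCR output against a hand-checked transcript of the same page. The Python sketch below does this at word level; the function name word_accuracy and the word-level measure are assumptions chosen for illustration, and the figure quoted above is not necessarily measured this way.

```python
from difflib import SequenceMatcher

def word_accuracy(reference, ocr_output):
    """Fraction of reference words that also appear, in order, in the OCR output."""
    ref_words = reference.lower().split()
    ocr_words = ocr_output.lower().split()
    if not ref_words:
        return 1.0
    # autojunk=False stops the matcher from discarding very common words
    matcher = SequenceMatcher(None, ref_words, ocr_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref_words)
```

For example, if one word in a ten-word sentence is misread (say "f0x" for "fox"), nine of the ten reference words match and the function returns 0.9.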

For text that is to be cut and pasted or replicated in a third-party application such as Word, a degree of ‘cleaning’ of the formatting and content of the text is required, and this can increase the cost of the scan output to our clients.
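To give a feel for what this ‘cleaning’ can involve, here is a small Python sketch of two common tidy-up steps: rejoining words split by a hyphen at a line end, and removing the line breaks and extra spaces left over from the page layout. The function name clean_ocr_text and the choice of steps are assumptions for illustration; real clean-up usually also needs a person to check the result.

```python
import re

def clean_ocr_text(raw):
    """Tidy OCR output so it pastes cleanly into a word processor."""
    # Rejoin words hyphenated across a line break: "docu-\nment" -> "document"
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Blank lines mark paragraph breaks; keep those
    paragraphs = re.split(r"\n\s*\n", text)
    # Inside a paragraph, turn line breaks and runs of spaces into one space
    cleaned = [re.sub(r"\s+", " ", p).strip() for p in paragraphs]
    return "\n\n".join(p for p in cleaned if p)
```

Note that the first step will also join words that are genuinely hyphenated when the hyphen happens to fall at a line end, which is one reason manual checking is still needed.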

We normally recommend keeping the scanned image as a text-searchable PDF, as this is the most cost-effective path.