Extract Text, Rename the File and Create a CSV File with the Document’s Contents
eDoc Zonal OCR is a program designed to capture data from scanned files, place the data in a CSV file and rename each file based on its contents.
It can validate the captured data with EasyPatterns (similar to Regular Expressions) to help ensure accuracy.
The user can also use Fuzzy Logic when applying the EasyPattern to increase the percent of properly processed files.
For instance, an OCR engine will often return an “I” instead of a “1” or an “O” instead of a “0”.
With Fuzzy Logic, if a number is being captured from a given area of the file and an “O” is returned, it can be assumed that the character is really a “0”.
Once the files are processed, the user can import the data in the CSV file into their database if desired (an import tool is not included with this program) or simply search for the renamed file.
The user can also view and update the file with eDocfile’s Doc Viewer to create a simple workflow.
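To make the Fuzzy Logic idea concrete, here is a minimal Python sketch of the kind of correction described above. It is not eDoc Zonal OCR's own code and does not use the EasyPattern syntax: an ordinary regular expression stands in for the pattern, and the names DIGIT_LOOKALIKES, fuzzy_digits and capture_number, as well as the five-digit pattern, are assumptions made for illustration.

```python
import re

# Hypothetical table of letters an OCR engine may return in place of digits.
DIGIT_LOOKALIKES = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1"})


def fuzzy_digits(raw: str) -> str:
    """Swap look-alike letters for digits. Only meant for zones expected to hold numbers."""
    return raw.translate(DIGIT_LOOKALIKES)


def capture_number(raw: str, pattern: str = r"\d{5}"):
    """Return the corrected text if it matches the pattern, otherwise None."""
    candidate = fuzzy_digits(raw.strip())
    return candidate if re.fullmatch(pattern, candidate) else None
```

A zone read as "1O45I" would be accepted as "10451", while a value that still fails the pattern after correction (for example "AB123") is rejected rather than guessed at.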
Getting in the Zone – with OCR
A client requirement in a recent project was to be able to process scanned documents: extract information from the document to be used as a name, rename the image (keeping the same format) using the extracted metadata, and store it in a folder for onward processing. You would think that such a requirement would be well catered for, since so many organisations have scanning capability now. I was surprised that I had to dig so hard, but I did come up with a solution through www.edocfile.com and their product eDoc Zonal OCR. This application allows you to take a template scan (TIFF G4 format), mark out the zone to be read and test the reading capabilities of the OCR engine. Once you are happy with the capture and recognition, you can choose the input and output folders for your scans and decide whether you want other transformations to happen while the job runs (such as converting to PDF).
The wizard pretty much guides you through all of this. As a prerequisite, you need the Office 2003/2007 component MODI (Microsoft Office Document Imaging) installed. I did this by installing only that component from an Office 2007 disk, since I am running Office 2010 64-bit, which does not include the component. The product costs around $600, but compare that with dealing with each item manually: reading and renaming it after scanning, then filing the documents in a hierarchy so that they can be searched or further processed. Seen that way, the cost may be offset by the time saved.
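For anyone curious what the rename-and-file step amounts to, here is a rough Python sketch of it. This is my own illustration, not how eDoc Zonal OCR works internally: it assumes the zone values have already been extracted, and the function name file_document, the "file" column and the underscore-joined naming scheme are all hypothetical.

```python
import csv
import re
import shutil
from pathlib import Path


def file_document(scan: Path, fields: dict, out_dir: Path, csv_path: Path) -> Path:
    """Move a scan into out_dir under a name built from its captured fields,
    and append those fields to a CSV index."""
    out_dir.mkdir(parents=True, exist_ok=True)

    # Build the new name from the captured values; replace characters that
    # are unsafe in file names and keep the original extension.
    parts = [re.sub(r"[^\w.-]+", "_", str(value)) for value in fields.values()]
    target = out_dir / ("_".join(parts) + scan.suffix)
    shutil.move(str(scan), str(target))

    # Append one row per document, writing the header only for a new file.
    write_header = not csv_path.exists()
    with csv_path.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["file", *fields])
        if write_header:
            writer.writeheader()
        writer.writerow({"file": target.name, **fields})
    return target
```

Calling file_document(Path("scan0001.tif"), {"invoice": "10451", "customer": "Acme Ltd"}, Path("out"), Path("index.csv")) would move the scan to out/10451_Acme_Ltd.tif and add a matching row to index.csv. The sketch does not handle name collisions, which a real job would need to.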
From the developer – some more description