Improve OCR Accuracy, Clean Up and Enhance Scanned Images

DESCRIPTION

See ways to improve OCR accuracy on document scans. Cleaning and enhancing images can greatly improve the accuracy of OCR interpretation of your documents. Learn about automatic adaptive thresholding, text smoothing, and more. Add field validation, preview, and testing features for optimal OCR interpretation.

Page 1: Improve OCR Accuracy, Clean Up and Enhance Scanned Images

Improving OCR Accuracy

Clean Up and Enhance Scanned Images

Page 2: Improve OCR Accuracy, Clean Up and Enhance Scanned Images

Cleaner Image = More Accurate OCR

Page 3: Improve OCR Accuracy, Clean Up and Enhance Scanned Images

Your acceptable level of OCR accuracy may depend on your application

Page 4: Improve OCR Accuracy, Clean Up and Enhance Scanned Images

Healthcare and legal applications have high OCR accuracy requirements.

Page 5: Improve OCR Accuracy, Clean Up and Enhance Scanned Images

Optimizing for the highest OCR accuracy is generally divided into two phases:

• Pre-scanning

• During scanning

Page 6: Improve OCR Accuracy, Clean Up and Enhance Scanned Images

Set pre-processing standards and procedures

Form Design

• adequate white space

• limited lines

Font Selection

• monospace fonts like Courier or sans-serif fonts like Helvetica

• at least 10 to 13 points

Color Selection

• limited use of color

Page 7: Improve OCR Accuracy, Clean Up and Enhance Scanned Images

During scanning…

Scan at 300 dpi or higher, and CLEAN.
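To make the 300 dpi guideline concrete, the sketch below converts between paper size and pixel dimensions and checks whether a scan reaches the minimum. The function names and the `MIN_DPI` constant are illustrative assumptions, not part of any scanner or ImageRamp API.

```python
from typing import Tuple

MIN_DPI = 300  # minimum resolution recommended on this slide


def pixels_for_page(width_in: float, height_in: float, dpi: int = MIN_DPI) -> Tuple[int, int]:
    """Return the pixel dimensions of a page scanned at the given dpi."""
    return round(width_in * dpi), round(height_in * dpi)


def effective_dpi(pixel_width: int, page_width_in: float) -> float:
    """Estimate the scan resolution from the image width and the paper width."""
    return pixel_width / page_width_in


def meets_minimum(pixel_width: int, page_width_in: float, minimum: int = MIN_DPI) -> bool:
    """True when the scan is at or above the minimum resolution."""
    return effective_dpi(pixel_width, page_width_in) >= minimum


# A US Letter page (8.5 x 11 in) at 300 dpi is 2550 x 3300 pixels.
print(pixels_for_page(8.5, 11))
# An image only 1700 pixels wide for the same page is about 200 dpi: too low.
print(meets_minimum(1700, 8.5))
```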

Page 8: Improve OCR Accuracy, Clean Up and Enhance Scanned Images

Most capture applications include basic cleaning features.

Page 9: Improve OCR Accuracy, Clean Up and Enhance Scanned Images

Go beyond the basics with DocuFi’s

Page 11: Improve OCR Accuracy, Clean Up and Enhance Scanned Images

Adaptive Thresholding

Adaptive thresholding assists in cleaning “dirty” documents or documents with a colored background that interferes with the foreground data.

Most scanner and capture software can apply basic thresholding technology.

Page 12: Improve OCR Accuracy, Clean Up and Enhance Scanned Images

Adaptive Thresholding

ImageRamp uses Adaptive Thresholding with advanced algorithms and Sensitivity settings, allowing you to optimize the thresholding for your documents.
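The general idea behind adaptive thresholding can be sketched with a simple local-mean rule: each pixel is compared with the average of its own neighborhood rather than with one cutoff for the whole page. This is a generic textbook approach, not ImageRamp's algorithm; the `radius` and `sensitivity` parameters are assumptions chosen for illustration.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 grayscale values (0 = black)


def adaptive_threshold(image: Gray, radius: int = 1, sensitivity: int = 30) -> List[List[int]]:
    """Binarize with a local-mean threshold: 1 = ink, 0 = background."""
    height, width = len(image), len(image[0])
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Collect the neighborhood, clipped at the image borders.
            window = [
                image[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            local_mean = sum(window) / len(window)
            # A pixel counts as ink only if it is clearly darker than its surroundings.
            if image[y][x] < local_mean - sensitivity:
                result[y][x] = 1
    return result


# Left half: light background (200). Right half: darker, "colored" background (100).
# Each half contains one dark text pixel.
page = [
    [200, 200, 200, 100, 100, 100],
    [200,  50, 200, 100,  20, 100],
    [200, 200, 200, 100, 100, 100],
]
for row in adaptive_threshold(page):
    print(row)
```

On this sample only the two text pixels come out as ink. A single global cutoff of 128 would instead turn the entire darker right half black, which is the failure that adaptive thresholding is meant to avoid.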

Page 13: Improve OCR Accuracy, Clean Up and Enhance Scanned Images

Smooth Text

This option smooths the edges of text. Smoothing text fills small pits in the edges of a character and removes small bumps on the edges. This improves legibility and reduces storage needs.

Page 14: Improve OCR Accuracy, Clean Up and Enhance Scanned Images

Dither Form Fills

Black and white printed images may use dithering, often called dot shading, to simulate shades of gray by varying the patterns of dots. The Dither Form Fills feature removes areas of dot shading from an image, so that a black and white TIFF image appears as true black and white rather than grayscale.

Page 15: Improve OCR Accuracy, Clean Up and Enhance Scanned Images

Reset Margins

This finds the outermost raster data (pixels) on the page and resizes the document to fit.
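One way to picture resetting margins is as a bounding-box crop: find the first and last rows and columns that contain ink, then keep only that region. The sketch below works on a simple bitmap (1 = ink, 0 = white) and is an illustration only; the `reset_margins` name and the optional `margin` parameter are assumptions, not ImageRamp settings.

```python
from typing import List

Bitmap = List[List[int]]  # rows of pixels: 1 = ink, 0 = white


def reset_margins(image: Bitmap, margin: int = 0) -> Bitmap:
    """Crop the image to the bounding box of its ink, plus an optional margin."""
    rows = [y for y, row in enumerate(image) if any(row)]
    if not rows:
        # Nothing to anchor on: return an unchanged copy of a blank page.
        return [row[:] for row in image]
    cols = [x for x in range(len(image[0])) if any(row[x] for row in image)]
    top = max(0, rows[0] - margin)
    bottom = min(len(image) - 1, rows[-1] + margin)
    left = max(0, cols[0] - margin)
    right = min(len(image[0]) - 1, cols[-1] + margin)
    return [row[left:right + 1] for row in image[top:bottom + 1]]


page = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
print(reset_margins(page))  # keeps only the 2 x 3 region that contains ink
```

Because a single stray speck would stretch the bounding box, a crop like this is usually more reliable after noise removal.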

Page 16: Improve OCR Accuracy, Clean Up and Enhance Scanned Images

Deskew or Straighten Page

Using detected text as the basis for alignment, this tool is designed to work with scanned office documents and eliminate rescans.

Page 17: Improve OCR Accuracy, Clean Up and Enhance Scanned Images

Remove Lines

This selection detects and removes lines that may interfere with OCR interpretation.
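A very simplified picture of line removal is to erase any unbroken run of ink that is far longer than a character stroke. The sketch below does this for horizontal runs and reuses the same routine on the transposed image for vertical ones. It is an assumption-laden illustration (the `min_length` parameter is hypothetical), and unlike a production tool it does not repair characters that touch a line.

```python
from typing import List

Bitmap = List[List[int]]  # rows of pixels: 1 = ink, 0 = white


def remove_horizontal_lines(image: Bitmap, min_length: int) -> Bitmap:
    """Erase horizontal runs of ink that are at least min_length pixels long."""
    result = [row[:] for row in image]  # work on a copy, leave the input intact
    for row in result:
        x = 0
        while x < len(row):
            if row[x] == 1:
                start = x
                while x < len(row) and row[x] == 1:
                    x += 1
                if x - start >= min_length:
                    row[start:x] = [0] * (x - start)
            else:
                x += 1
    return result


def remove_lines(image: Bitmap, min_length: int) -> Bitmap:
    """Erase long horizontal runs, then long vertical runs via a transpose."""
    no_horizontal = remove_horizontal_lines(image, min_length)
    transposed = [list(col) for col in zip(*no_horizontal)]
    no_vertical = remove_horizontal_lines(transposed, min_length)
    return [list(col) for col in zip(*no_vertical)]


form = [
    [0, 1, 1, 0, 0, 0],  # short run: part of a character
    [1, 1, 1, 1, 1, 1],  # long run: a form line
    [0, 0, 0, 1, 0, 0],
]
for row in remove_lines(form, min_length=5):
    print(row)
```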

Page 18: Improve OCR Accuracy, Clean Up and Enhance Scanned Images

Remove Noise or Despeckle

Whether the scan itself is contaminated or the original is in poor condition, this option removes extraneous black specks and fills in white holes in black areas of an image.
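Both halves of this description (removing black specks and filling white holes) can be sketched as the same operation: find connected regions of one color and flip the ones that are too small to be real content. The code below is a minimal illustration using 4-connected regions; the `max_size` parameter is an assumption, and the slides do not say how ImageRamp implements this.

```python
from collections import deque
from typing import List

Bitmap = List[List[int]]  # rows of pixels: 1 = ink, 0 = white


def _flip_small_regions(image: Bitmap, value: int, max_size: int) -> Bitmap:
    """Flip every 4-connected region of `value` that has at most max_size pixels."""
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    seen = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if seen[y][x] or image[y][x] != value:
                continue
            # Breadth-first search to collect one connected region.
            region = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and image[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) <= max_size:
                for ry, rx in region:
                    result[ry][rx] = 1 - value
    return result


def despeckle(image: Bitmap, max_size: int = 2) -> Bitmap:
    """Remove small black specks, then fill small white holes."""
    without_specks = _flip_small_regions(image, 1, max_size)
    return _flip_small_regions(without_specks, 0, max_size)


scan = [
    [1, 0, 0, 0, 0, 0],  # lone speck in the corner
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 0, 1, 0],  # one-pixel hole inside a black area
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
for row in despeckle(scan):
    print(row)
```

Keeping `max_size` small matters: set too high, the same rule would also erase periods and the dots on letters, or fill the openings inside small characters.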

Page 19: Improve OCR Accuracy, Clean Up and Enhance Scanned Images

Auto Rotate and Rotate Pages

Auto Rotate automatically evaluates orientation based on the text and rotates misoriented pages. Optionally, select a degree of rotation and ImageRamp will rotate all pages by that amount.

Page 20: Improve OCR Accuracy, Clean Up and Enhance Scanned Images

Remove Blank Pages

This can be used to eliminate unnecessary blank pages in a document and make the file size smaller. Blank page detection can also play a role in file splitting. Many users separate documents in a scanning stack with blank pages, and ImageRamp can be set to split the stack into multiple files when blanks are detected.
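The splitting workflow described here can be sketched as follows: treat a page as blank when almost none of its pixels are ink, drop those pages, and start a new document at each one. The `max_ink` tolerance and the function names are assumptions for illustration, not ImageRamp settings.

```python
from typing import List

Bitmap = List[List[int]]  # rows of pixels: 1 = ink, 0 = white


def ink_ratio(page: Bitmap) -> float:
    """Fraction of the page's pixels that are ink."""
    total = sum(len(row) for row in page)
    return sum(sum(row) for row in page) / total if total else 0.0


def is_blank(page: Bitmap, max_ink: float = 0.005) -> bool:
    """Treat a page as blank if it has (almost) no ink; a little noise is tolerated."""
    return ink_ratio(page) <= max_ink


def split_on_blank_pages(pages: List[Bitmap], max_ink: float = 0.005) -> List[List[Bitmap]]:
    """Group pages into documents, using blank pages as separators and dropping them."""
    documents: List[List[Bitmap]] = []
    current: List[Bitmap] = []
    for page in pages:
        if is_blank(page, max_ink):
            if current:  # close the current document, ignore repeated blanks
                documents.append(current)
                current = []
        else:
            current.append(page)
    if current:
        documents.append(current)
    return documents


blank = [[0] * 20 for _ in range(20)]
text = [[1 if (x + y) % 4 == 0 else 0 for x in range(20)] for y in range(20)]
stack = [text, text, blank, text, blank, blank, text]
print([len(doc) for doc in split_on_blank_pages(stack)])
```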

Page 21: Improve OCR Accuracy, Clean Up and Enhance Scanned Images

Besides cleaning and enhancing the image, ImageRamp has other ways to improve OCR accuracy.

Page 22: Improve OCR Accuracy, Clean Up and Enhance Scanned Images

Field Validation Improves Accuracy

OCR with validation during processing is a very powerful way to catch entries that do not meet a specific format rule.

For instance, if an inventory item number should contain three alphabetic characters followed by five digits, any document whose item number is not read by the OCR process with that pattern may be tagged for manual inspection before further processing is done.

PEN21096

CAP36581

INV98453

PA568793
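The format rule in this example (three letters followed by five digits) maps naturally onto a regular expression. The sketch below applies it to the four sample values from the slide and returns those that would be tagged for manual inspection. The pattern assumes uppercase letters, as in the samples, and the `needs_review` helper is a hypothetical name.

```python
import re
from typing import Iterable, List

# Three uppercase letters followed by five digits, e.g. PEN21096.
ITEM_NUMBER = re.compile(r"[A-Z]{3}[0-9]{5}")


def needs_review(values: Iterable[str]) -> List[str]:
    """Return the OCR results that do not match the item-number format."""
    return [v for v in values if not ITEM_NUMBER.fullmatch(v.strip())]


ocr_results = ["PEN21096", "CAP36581", "INV98453", "PA568793"]
print(needs_review(ocr_results))  # ['PA568793']
```

The last value has two letters and six digits, so it fails the rule. Validation of this kind cannot tell what the correct value is; it only routes suspicious reads to a person.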

Page 23: Improve OCR Accuracy, Clean Up and Enhance Scanned Images

ImageRamp offers extensive preview and testing options to fine-tune settings. Additionally, ImageRamp offers PDF or TIFF output, which may differ in OCR accuracy.

Page 24: Improve OCR Accuracy, Clean Up and Enhance Scanned Images

Wrap up: 3 Ways to Improve OCR Accuracy

• Set pre-processing standards

• Scan at 300+ dpi

• Capture with clean-up

Page 25: Improve OCR Accuracy, Clean Up and Enhance Scanned Images

Pre-Processing Standards

Good pre-processing can be as important as the scanning technologies.

Encourage accuracy by setting document procedures and guidelines to:

• Use adequate white space

• Limit lines and gridlines

• Limit the use of color

• Use OCR-friendly fonts and sizes

Page 26: Improve OCR Accuracy, Clean Up and Enhance Scanned Images

Use an Intelligent Capture Solution such as ImageRamp

Page 28: Improve OCR Accuracy, Clean Up and Enhance Scanned Images

For more on clean scans, ways to improve OCR scanning, cleaning documents for scanning, enhancing your images for improved OCR, watching folders, batch processing, bulk scanning, splitting files with barcodes, barcode splitting, DocuFi, ImageRamp, watch folders, data capture, and intelligent data capture:

Contact Us

DocuFi, makers of ImageRamp, Document Management Capture Solution

30 years’ experience in the Document Imaging market.

www.docufi.com

603-685-4033

ImageRamp Cleanup and Enhance for OCR

Copyright ©2014

Page 29: Improve OCR Accuracy, Clean Up and Enhance Scanned Images

Image Credits

• Tim Evanson, “Albert V Bryan Federal District Courthouse - Alexandria Va - 0014 - 2012-03-10”, http://bit.ly/1iGIBpF

• takacsi75, “Medicine 02”, http://bit.ly/1dtsIxK

• ToastyKen, “New Mophead”, http://bit.ly/1ijjkkD

• mjtmail (tiggy), “Day 307”, http://bit.ly/1g4G3Bw