Development of an OCR System
Nathan Harmata
TJHSST Computer Systems Lab, 2007-2008
What is OCR?
Optical Character Recognition
Font and handwriting based
Goals of My Project
Generic recognition for Latin-based fonts
Proper handling of most formatting
System built from scratch
Overview of Idocrase System
Image Processing
Transformations
Attribute
Character Model
Transformations
Sector Vector - the image is split into parts that each pass the vertical line test
- each part is then converted into a collection of line segments
Gap Vector - gaps, if any, are found on the four sides of the image
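The slide does not define how a gap is detected, so the following is only a minimal sketch under one possible reading: the character image is a binary grid cropped to its bounding box, and a gap on a side is a run of background pixels on that edge with ink on both ends. The names `count_gaps` and `gap_vector` are hypothetical and are not taken from Idocrase.

```python
def count_gaps(edge):
    """Count runs of background (0) on an edge that have ink (1) on both ends.

    Assumption: this is one plausible definition of a "gap"; the slides
    do not specify the actual rule.
    """
    gaps = 0
    seen_ink = False
    in_background = False
    for pixel in edge:
        if pixel:
            if seen_ink and in_background:
                gaps += 1  # a background run just closed between two ink pixels
            seen_ink = True
            in_background = False
        else:
            in_background = True
    return gaps


def gap_vector(image):
    """Return the gap count for each of the four sides of a cropped binary image."""
    return {
        "top": count_gaps(image[0]),
        "bottom": count_gaps(image[-1]),
        "left": [row[0] for row in image] and count_gaps([row[0] for row in image]),
        "right": count_gaps([row[-1] for row in image]),
    }


# A rough "U" is open at the top; a rough "C" is open on the right.
U_SHAPE = [[1, 0, 1],
           [1, 0, 1],
           [1, 1, 1]]
C_SHAPE = [[1, 1, 1],
           [1, 0, 0],
           [1, 1, 1]]
```

Under this reading, `gap_vector(U_SHAPE)` reports one gap on the top edge only, and `gap_vector(C_SHAPE)` reports one gap on the right edge only.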
Transformations
Pixel Concentration Vector - records which sides of the image, if any, have a higher concentration of pixels
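As an illustration, a pixel concentration check could compare ink counts in opposite halves of the image. This is a sketch, not the Idocrase implementation: the function name `pixel_concentration`, the `tolerance` threshold, and the side labels other than "balanced" are assumptions made for the example.

```python
def pixel_concentration(image, tolerance=0.1):
    """Return (horizontal, vertical) concentration labels for a binary image.

    image: list of rows, 1 = ink, 0 = background.
    tolerance: assumed threshold, as a fraction of all ink pixels, below
    which two halves are treated as balanced.
    """
    height = len(image)
    width = len(image[0])
    total = sum(sum(row) for row in image)
    if total == 0:
        return ("balanced", "balanced")

    # The middle column/row is ignored when the dimension is odd.
    left = sum(row[x] for row in image for x in range(width // 2))
    right = sum(row[x] for row in image for x in range((width + 1) // 2, width))
    top = sum(sum(image[y]) for y in range(height // 2))
    bottom = sum(sum(image[y]) for y in range((height + 1) // 2, height))

    def compare(a, b, name_a, name_b):
        if abs(a - b) <= tolerance * total:
            return "balanced"
        return name_a if a > b else name_b

    return (compare(left, right, "left", "right"),
            compare(top, bottom, "top", "bottom"))


L_SHAPE = [[1, 0, 0, 0],
           [1, 0, 0, 0],
           [1, 0, 0, 0],
           [1, 1, 1, 1]]
O_SHAPE = [[1, 1, 1],
           [1, 0, 1],
           [1, 1, 1]]
```

With these inputs, the "L" has more ink on its left and bottom halves, while the symmetric "O" is balanced in both directions.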
Character Recognition
GCDD – Generic Character Definition Database
Stores, for every character, the average of its Character Models across many different fonts
Example entry: 0 PixelConcentrationVector balanced balanced SectorVector 4 3 GapVector
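The averaging idea can be sketched as follows. The slides do not say how Character Models are encoded or compared, so this example assumes each model is a fixed-length numeric feature vector and that an unknown character is matched to the nearest average by Euclidean distance. Both assumptions, and the names `build_database` and `recognize`, are hypothetical.

```python
import math
from collections import defaultdict


def build_database(samples):
    """Average the feature vectors of each character across fonts.

    samples: iterable of (character, feature_vector) pairs, one per font.
    Assumption: character models are plain numeric vectors of equal length.
    """
    grouped = defaultdict(list)
    for char, vector in samples:
        grouped[char].append(vector)
    return {
        char: [sum(column) / len(vectors) for column in zip(*vectors)]
        for char, vectors in grouped.items()
    }


def recognize(database, vector):
    """Return the character whose averaged model is closest to the input.

    Assumption: nearest-mean matching stands in for whatever comparison
    the real system uses.
    """
    return min(database, key=lambda char: math.dist(database[char], vector))


samples = [
    ("I", [1.0, 0.0]), ("I", [3.0, 0.0]),   # "I" as seen in two fonts
    ("O", [0.0, 4.0]), ("O", [0.0, 6.0]),   # "O" as seen in two fonts
]
database = build_database(samples)
```

Here `database["I"]` is `[2.0, 0.0]` and `database["O"]` is `[0.0, 5.0]`, so an input such as `[1.5, 0.5]` is matched to "I".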
Character Recognition
For a single character:
For words, dictionary and grammar references are used.
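One simple way a dictionary reference could be used is to replace a recognized word with its closest dictionary entry. The sketch below uses `difflib.get_close_matches` from the Python standard library as a stand-in. The slides do not describe the actual lookup or the grammar step, and the function name `correct_word` and the word list are made up for illustration.

```python
import difflib


def correct_word(raw_word, dictionary, cutoff=0.6):
    """Return the closest dictionary word, or the raw word if none is close.

    cutoff is the minimum similarity ratio (0 to 1) required for a match;
    0.6 is difflib's default and is an arbitrary choice here.
    """
    matches = difflib.get_close_matches(raw_word, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else raw_word


WORDS = ["hello", "help", "yellow"]
corrected = correct_word("he1lo", WORDS)   # "l" misread as "1"
unchanged = correct_word("xyz", WORDS)     # nothing similar enough
```

A misread such as "he1lo" is mapped to "hello", while a string with no close entry is returned unchanged so that the correction step never discards output.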
Idocrase Application
Results
- Mediocre word recognition
- Doesn’t handle formatting well
- Doesn’t handle small letters well
- Fairly accurate single character recognition (93.7%)