ExpressReader Pro
■ Printed Text OCR
■ Japanese / English
■ Recognition Rate
99.7% for Japanese
99.8% for English
■ Powerful Layout Analysis
■ For x86-based Windows PCs
Features
Adaptation for mathematical documents
■ Application framework
■ Detection and recognition of mathematical formulae
■ Output format
Problems
Flow diagram
(Stages: Image scanning, Skew correction, Layout analysis, Character recognition, Formula detection, Formula recognition, User modification, Output conversion)
Component relation
(Diagram components: Scanning, Graphical User Interface, INFTY formula recognition, Layout analysis, Character recognition, Formula detection)
Formula detection 1
■ Score each word obtained by character recognition both as a mathematical formula and as a text word.
Example: formula score (M) and text-word score (T) for eight words
Word   1    2    3    4    5    6    7    8
M      0   90  100  100    0   90   70   90
T    100   40   20   20  100   40   70   90
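The scoring step can be sketched as follows. This is a minimal sketch, assuming M and T are the formula and text-word scores from the example table on a 0 to 100 scale; the helper `label_words` and its tie rule are hypothetical, not the INFTY implementation. Each word is provisionally labeled by its higher score, and ties are left for a later stage to resolve:

```python
from typing import List


def label_words(m_scores: List[int], t_scores: List[int]) -> List[str]:
    """Provisionally label each word 'M' (formula), 'T' (text) or '?' (tie).

    Hypothetical helper: ties are left unresolved for the parsing stage.
    """
    if len(m_scores) != len(t_scores):
        raise ValueError("score lists must have the same length")
    labels = []
    for m, t in zip(m_scores, t_scores):
        if m > t:
            labels.append("M")
        elif t > m:
            labels.append("T")
        else:
            labels.append("?")
    return labels


# Scores from the example table (one entry per word).
M = [0, 90, 100, 100, 0, 90, 70, 90]
T = [100, 40, 20, 20, 100, 40, 70, 90]
print(label_words(M, T))  # ['T', 'M', 'M', 'M', 'T', 'M', '?', '?']
```

Words 7 and 8 score equally as formula and text, which is why per-word scoring alone is not enough and the context-aware parsing step is needed.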
Formula detection 2
■ Parse with a context-free grammar (CFG); a formula is itself a non-terminal symbol of this CFG.
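The slides do not give the grammar itself, so the following is only a minimal sketch under assumed rules. It uses a hypothetical toy grammar (`Line -> Item*`, `Item -> TextWord | Formula`, `Formula -> MathWord+`) and a hypothetical scoring scheme: a parse earns T for each text word, M for each formula word, and pays an `open_penalty` for every Formula non-terminal it opens. Because this toy grammar happens to be right-linear, a left-to-right dynamic program finds the best parse; a realistic formula grammar would need a general CFG parser such as CYK or Earley.

```python
from typing import List, Optional, Tuple

NEG_INF = float("-inf")


def parse_line(m: List[int], t: List[int],
               open_penalty: int = 30) -> Tuple[int, List[Tuple[int, int]]]:
    """Best parse of one line under the toy (hypothetical) grammar

        Line    -> Item*
        Item    -> TextWord | Formula
        Formula -> MathWord+

    A parse scores t[i] for each text word and m[i] for each formula word,
    and subtracts open_penalty once per Formula non-terminal.
    Returns (best score, formula spans as half-open (start, end) word indices).
    """
    n = len(m)
    # best[i][s]: best score of words[:i] whose last word is in state s
    # (0 = text word, 1 = inside a Formula).
    best = [[NEG_INF, NEG_INF] for _ in range(n + 1)]
    back: List[List[Optional[int]]] = [[None, None] for _ in range(n + 1)]
    best[0][0] = 0
    for i in range(n):
        for prev in (0, 1):
            if best[i][prev] == NEG_INF:
                continue
            # Word i as a text word.
            cand = best[i][prev] + t[i]
            if cand > best[i + 1][0]:
                best[i + 1][0], back[i + 1][0] = cand, prev
            # Word i as a formula word: extending an open Formula is free,
            # opening a new Formula non-terminal costs open_penalty.
            cand = best[i][prev] + m[i] - (0 if prev == 1 else open_penalty)
            if cand > best[i + 1][1]:
                best[i + 1][1], back[i + 1][1] = cand, prev

    # Trace the best final state back to recover each word's label.
    state = 0 if best[n][0] >= best[n][1] else 1
    score = best[n][state]
    states = []
    for i in range(n, 0, -1):
        states.append(state)
        state = back[i][state]
    states.reverse()

    # Collect maximal runs of formula words as Formula spans.
    spans, start = [], None
    for i, s in enumerate(states):
        if s == 1 and start is None:
            start = i
        elif s == 0 and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, n))
    return score, spans


# The middle word scores slightly higher as text but sits between formula words.
print(parse_line([90, 45, 90], [10, 50, 10], open_penalty=30))  # (195, [(0, 3)])
print(parse_line([90, 45, 90], [10, 50, 10], open_penalty=0))   # (230, [(0, 1), (2, 3)])
```

The penalty is what lets context matter in this sketch: with `open_penalty=30` the middle word is absorbed into a single formula, while with no penalty the line splits into two formulae.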
XML based processing
■ Input: recognition parameters, image
■ During processing: layout information, etc.
■ Output: recognition results
OCR needs various data during processing.
To integrate OCR into an application system, the user must write code to handle all of these data. Solution: unify them in XML.
XML Based Processing
(Diagram: Layout analysis, Character recognition, Formula detection and the Graphical User Interface, connected via XML)
Advantages of XML
■ Easy to convert to other formats (XSLT)
■ Easy to process (DOM/SAX)
■ Extensible / Flexible
■ Formulae can be represented in MathML
■ Platform independent
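To illustrate the DOM/SAX point with Python's standard library (`xml.dom.minidom` and `xml.sax`), the sketch below collects the `code` attribute of every `Character` element both ways. The sample document and both helper names are hypothetical, loosely following the element names of the XML formats in these slides. DOM loads the whole tree before querying it; SAX streams start-tag events and keeps only what the handler stores.

```python
import xml.dom.minidom
import xml.sax

# Hypothetical sample; the second Character ('o') is invented for illustration.
SAMPLE = """<OCR><Document><Sheet><Area>
<Text tag="paragraph" language="English">
<Field><Line rect="56,308,3257,392">
<Character rect="56,332,96,392" code="0x67">g</Character>
<Character rect="98,332,130,392" code="0x6f">o</Character>
</Line></Field></Text></Area></Sheet></Document></OCR>"""


def codes_with_dom(xml_text):
    """DOM: build the whole tree in memory, then query it."""
    doc = xml.dom.minidom.parseString(xml_text)
    return [el.getAttribute("code") for el in doc.getElementsByTagName("Character")]


class CharacterHandler(xml.sax.ContentHandler):
    """SAX: react to start-tag events while streaming, storing only codes."""

    def __init__(self):
        super().__init__()
        self.codes = []

    def startElement(self, name, attrs):
        if name == "Character":
            self.codes.append(attrs.get("code"))


def codes_with_sax(xml_text):
    handler = CharacterHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.codes


print(codes_with_dom(SAMPLE))  # ['0x67', '0x6f']
print(codes_with_sax(SAMPLE))  # ['0x67', '0x6f']
```

DOM is convenient when an application needs random access to the recognized layout; SAX suits large multi-page results where holding the whole tree in memory is undesirable.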
XML format 1
<OCR>
  <Parameter> ……Recognition Parameters </Parameter>
  <Document>
    <Sheet>
      <Area>
        <Text> ….. Recognized Results (after recognition) </Text>
      </Area>
    </Sheet>
  </Document>
</OCR>
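A minimal sketch of assembling this skeleton with Python's `xml.etree.ElementTree`, so that recognition parameters and results travel in one document. The slides elide the contents of `<Parameter>`, so storing each parameter as a child element, and the `Language` parameter in the usage line, are assumptions for illustration; `make_ocr_document` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def make_ocr_document(parameters: Dict[str, str], texts: List[str]) -> ET.Element:
    """Build <OCR><Parameter/><Document><Sheet><Area><Text/>... (hypothetical helper)."""
    root = ET.Element("OCR")
    param = ET.SubElement(root, "Parameter")
    for key, value in parameters.items():
        # Assumption: one child element per recognition parameter.
        ET.SubElement(param, key).text = str(value)
    sheet = ET.SubElement(ET.SubElement(root, "Document"), "Sheet")
    for text in texts:
        # One Area per recognized text block.
        area = ET.SubElement(sheet, "Area")
        ET.SubElement(area, "Text").text = text
    return root


doc = make_ocr_document({"Language": "English"}, ["Hello"])
print(ET.tostring(doc, encoding="unicode"))
# <OCR><Parameter><Language>English</Language></Parameter><Document><Sheet><Area><Text>Hello</Text></Area></Sheet></Document></OCR>
```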
XML format 2
<Text tag="paragraph" language="English" line_direction="horz" rect="56,308,3258,714">
  <ExpText tag_id="0"/>
  <Field>
    <Line rect="56,308,3257,392">
      <Character rect="56,332,96,392" code="0x67">g
        <ExpCharacter original_code="0x67" offset="0" size="40"/>
        <Candidate id="1" code="0x67" sim="867"/>
      </Character>
      ……
    </Line>
  </Field>
</Text>
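As a sketch of how an application might read this format back, the following uses `xml.etree.ElementTree` to collect each `Character` with its bounding box and candidate list. It assumes that `rect` is `left,top,right,bottom` in pixels, that `code` is a hexadecimal character code, and that `sim` is a similarity score where larger is better; none of this is stated in the slides, and `read_characters` and `RecognizedChar` are hypothetical names. The sample is the snippet above with the `……` elision dropped.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple, Tuple


class RecognizedChar(NamedTuple):
    char: str
    rect: Tuple[int, int, int, int]   # assumed (left, top, right, bottom)
    candidates: List[Tuple[str, int]]  # (candidate character, sim score)


def parse_rect(value: str) -> Tuple[int, int, int, int]:
    left, top, right, bottom = (int(v) for v in value.split(","))
    return (left, top, right, bottom)


def read_characters(text_xml: str) -> List[RecognizedChar]:
    """Collect every Character under a Text element (hypothetical helper)."""
    root = ET.fromstring(text_xml)
    result = []
    for ch in root.iter("Character"):
        # Assumption: code is hexadecimal, e.g. "0x67" -> 'g'.
        cands = [(chr(int(c.get("code"), 16)), int(c.get("sim")))
                 for c in ch.findall("Candidate")]
        result.append(RecognizedChar(chr(int(ch.get("code"), 16)),
                                     parse_rect(ch.get("rect")), cands))
    return result


SAMPLE = ('<Text tag="paragraph" language="English" line_direction="horz" rect="56,308,3258,714">'
          '<ExpText tag_id="0"/><Field><Line rect="56,308,3257,392">'
          '<Character rect="56,332,96,392" code="0x67">g'
          '<ExpCharacter original_code="0x67" offset="0" size="40"/>'
          '<Candidate id="1" code="0x67" sim="867"/></Character>'
          '</Line></Field></Text>')

print(read_characters(SAMPLE))
# [RecognizedChar(char='g', rect=(56, 332, 96, 392), candidates=[('g', 867)])]
```

Keeping the candidate list, rather than only the top result, is what lets a later stage (such as formula detection or a user-correction GUI) revisit alternative readings.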
XML format 3
<Character rect="56,332,96,392" code="0x67">g
  <ExpCharacter original_code="0x67" offset="0" size="40"/>
  <Candidate id="1" code="0x67" sim="867"/>
</Character>
<Formula rect="98,332,205,392">
  <MathML> ….Mathematical formulae </MathML>
</Formula>
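A minimal sketch, assuming `Formula` elements appear as siblings of `Character` elements inside a `Line` (the slides show them side by side but not their parent): walking the line in document order yields plain characters and formula regions, with each formula's MathML kept as markup. The `linearize` helper and the `<math><mi>x</mi></math>` content, which stands in for the elided formula, are hypothetical; the sample omits the MathML namespace for brevity.

```python
import xml.etree.ElementTree as ET
from typing import List


def linearize(line_xml: str) -> List[tuple]:
    """Walk a Line in document order (hypothetical helper).

    Yields ('char', c) for each Character and
    ('formula', rect, mathml_markup) for each Formula.
    """
    items: List[tuple] = []
    for child in ET.fromstring(line_xml):
        if child.tag == "Character":
            items.append(("char", chr(int(child.get("code"), 16))))
        elif child.tag == "Formula":
            mathml = child.find("MathML")
            # tostring also emits each child's tail text; this sample has none.
            markup = "" if mathml is None else "".join(
                ET.tostring(e, encoding="unicode") for e in mathml)
            items.append(("formula", child.get("rect"), markup))
    return items


LINE = ('<Line rect="56,308,3257,392">'
        '<Character rect="56,332,96,392" code="0x67">g'
        '<ExpCharacter original_code="0x67" offset="0" size="40"/>'
        '<Candidate id="1" code="0x67" sim="867"/></Character>'
        '<Formula rect="98,332,205,392"><MathML><math><mi>x</mi></math></MathML></Formula>'
        '</Line>')

print(linearize(LINE))
# [('char', 'g'), ('formula', '98,332,205,392', '<math><mi>x</mi></math>')]
```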