Presented at the "IMPACT demonstration session at the BNE". October. Biblioteca Nacional de España.
IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands.
IMPACT OCR in a nutshell
Clemens Neudecker, National Library of the Netherlands
IMPACT Demo Day, Biblioteca Nacional de España
OCR Process
Binarisation = transformation of greyscale or colour images to bitonal (black and white) in order to separate foreground (text) from background
Segmentation = detection of layout elements in hierarchical order (blocks/regions, lines, words, glyphs)
Pattern Matching (Recognition) = matching of character shapes against an internal font database (classifiers)
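The recognition step can be illustrated with a toy nearest-template classifier. This is only a sketch of the general idea of matching shapes against stored patterns: the 3x5 bitmaps, the `TEMPLATES` table and the Hamming-distance measure are assumptions made for illustration, not the classifiers used in FineReader Engine.

```python
# Hypothetical 3x5 glyph bitmaps standing in for a "font database".
TEMPLATES = {
    "I": ["111", "010", "010", "010", "111"],
    "L": ["100", "100", "100", "100", "111"],
    "T": ["111", "010", "010", "010", "010"],
}


def hamming(a, b):
    """Count the pixels at which two equally sized bitmaps differ."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def recognise(glyph, templates=TEMPLATES):
    """Return the label of the template with the fewest differing pixels."""
    return min(templates, key=lambda label: hamming(glyph, templates[label]))


# A "T" whose bottom pixel was lost, e.g. through poor binarisation.
noisy_t = ["111", "010", "010", "010", "000"]
print(recognise(noisy_t))
```

Real engines use far richer features and trained classifiers, but the sketch shows why errors in binarisation and segmentation propagate: the recogniser only sees the bitmap it is handed.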
ABBYY FineReader
Main OCR technology provider in IMPACT
OCR technology experts for 30 years
IMPACT uses FineReader Engine (SDK)
Binarisation
Adaptive Binarisation
Figure panels: Original scan | Previous binarisation | New binarisation
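To make the difference between a single global threshold and an adaptive one concrete, here is a minimal sketch in plain Python. It uses a local-mean threshold as a simple stand-in for adaptive binarisation; the window `radius`, the `offset` and the sample `PAGE` are assumptions chosen for illustration and do not reproduce the method shown on the slide.

```python
def global_binarise(img, threshold=128):
    """Mark a pixel as foreground (1) if it is darker than one fixed threshold."""
    return [[1 if p < threshold else 0 for p in row] for row in img]


def adaptive_binarise(img, radius=1, offset=10):
    """Mark a pixel as foreground (1) if it is clearly darker than its local mean."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Collect the neighbourhood, clipped at the image border.
            window = [
                img[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            local_mean = sum(window) / len(window)
            row.append(1 if img[y][x] < local_mean - offset else 0)
        out.append(row)
    return out


# Hypothetical greyscale patch: the background darkens from left to right,
# with two text pixels (middle row, columns 1 and 5) 80 levels darker.
PAGE = [
    [220, 200, 180, 160, 140, 120, 100],
    [220, 120, 180, 160, 140, 40, 100],
    [220, 200, 180, 160, 140, 120, 100],
]

for row in global_binarise(PAGE):
    print(row)
print()
for row in adaptive_binarise(PAGE):
    print(row)
```

On this patch the global threshold turns the dark right-hand background into foreground, while the local threshold keeps only the two text pixels, which is the kind of improvement unevenly lit or stained historical scans tend to need.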
IMPACT Binarisation
Figure panels: Original | State of the art | IMPACT
Segmentation
Figure panels: Blocks/Regions | Words | Glyphs
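The hierarchical nature of segmentation can be sketched with projection profiles on a bitonal image: first find bands of rows that contain ink (lines), then, within each band, runs of columns that contain ink (glyphs). This is a deliberately simplified illustration covering only two levels of the hierarchy; the `runs` and `segment` helpers and the sample `page` are assumptions and not the segmentation used by FineReader Engine.

```python
def runs(flags):
    """Return (start, end) pairs, end exclusive, for each run of True values."""
    spans, start = [], None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(flags)))
    return spans


def segment(page):
    """Split a bitonal page (rows of '0'/'1') into lines, then glyphs per line."""
    lines = []
    row_has_ink = ["1" in row for row in page]
    for top, bottom in runs(row_has_ink):
        band = page[top:bottom]
        col_has_ink = [
            any(row[x] == "1" for row in band) for x in range(len(page[0]))
        ]
        lines.append({"rows": (top, bottom), "glyphs": runs(col_has_ink)})
    return lines


# Hypothetical page: two glyphs on the first line, one on the second.
page = [
    "0000000",
    "0110110",
    "0110110",
    "0000000",
    "0100000",
    "0000000",
]

for line in segment(page):
    print(line)
```

Such a profile-based approach breaks down on skewed pages, multi-column layouts and touching characters, which is where errors like misclassified columns or wrong word order typically arise.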
IMPACT Segmentation example
Figure panels: Pre-IMPACT FR Engine 9 | FR Engine 10
Part of column was misclassified as image
IMPACT Segmentation example
Figure panels: v. 9 | v. 10
Linear word order errors
IMPACT Segmentation example
Figure panels: v. 9 | v. 10
Lost text
Fraktur recognition
Languages and Dictionaries
Goal:
• Develop an interface so that external dictionaries can be integrated into the FineReader Engine
2008 - 2009:
• External Dictionary beta interface
• Same quality as with internal dictionaries possible
2010 - 2011:
• Make interface work reliably
• Teach partners how to use it
• Support for any language, any time period
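Why a period-specific dictionary matters can be shown with a small sketch. It does not use or imitate the FineReader Engine external dictionary interface; `pick_reading`, the two word lists and the ranked candidate readings are hypothetical and only illustrate how a lexicon can steer the choice between competing readings of the same word image.

```python
def pick_reading(candidates, lexicon):
    """Return the first candidate found in the lexicon, else the top candidate."""
    for word in candidates:
        if word.lower() in lexicon:
            return word
    return candidates[0]


# Hypothetical word lists: a modern lexicon and one extended with
# early-modern spellings such as "vnd" (for "und").
modern = {"und", "der", "die"}
historical = modern | {"vnd", "vmb"}

# Hypothetical recogniser output for one word image, best shape match first.
candidates = ["vud", "vnd", "und"]

print(pick_reading(candidates, modern))      # modern lexicon normalises the spelling
print(pick_reading(candidates, historical))  # historical lexicon keeps the printed form
```

With only the modern word list the historical spelling is silently replaced, whereas the extended list lets the printed form survive, which is the motivation for supporting any language and any time period.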
ALTO: New native export format
Available since FRE 10 R2
Supports most recent schema: ALTO v. 2.0
Line coordinates available
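As an illustration of what line coordinates in ALTO make possible, the following sketch reads text lines and their bounding boxes from an ALTO 2.0 document with the Python standard library. The `SAMPLE` document is hand-written for this example and is not actual FineReader Engine output; `text_lines` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace of the ALTO 2.x schema.
ALTO_NS = {"alto": "http://www.loc.gov/standards/alto/ns-v2#"}

# Hand-written minimal ALTO fragment (an assumption, not real engine output).
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1" PHYSICAL_IMG_NR="1">
      <PrintSpace>
        <TextBlock ID="B1">
          <TextLine ID="L1" HPOS="100" VPOS="200" WIDTH="800" HEIGHT="40">
            <String CONTENT="IMPACT" HPOS="100" VPOS="200" WIDTH="200" HEIGHT="40"/>
            <SP/>
            <String CONTENT="OCR" HPOS="320" VPOS="200" WIDTH="150" HEIGHT="40"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def text_lines(xml_text):
    """Return (text, (hpos, vpos, width, height)) for every TextLine."""
    root = ET.fromstring(xml_text)
    result = []
    for line in root.iterfind(".//alto:TextLine", ALTO_NS):
        words = [s.get("CONTENT") for s in line.iterfind("alto:String", ALTO_NS)]
        box = tuple(float(line.get(k)) for k in ("HPOS", "VPOS", "WIDTH", "HEIGHT"))
        result.append((" ".join(words), box))
    return result


for text, box in text_lines(SAMPLE):
    print(text, box)
```

With coordinates at line level, a viewer can highlight search hits on the page image or crop individual lines for correction.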
Thank you! Questions?