Tuesday, 6th November 2007
Work Group CALYPOD: graphiCs imAge anaLYsis from Printed Old Document
http://[email protected]
Thierry Brouard, Mathieu Delalandre, Nicholas Journet and Frédéric Nicolier
NaviDoMass Meeting, 6th November 2007
Paris V University, Paris, France
General Presentation (1/2)
• Research Work Group
Group of researchers from different laboratories, teams and projects, working toward a common, specific research topic.
• Specific topic of research
Automatic processing of the graphical parts of old printed books (segmentation, pre-processing, matching, OCR, retrieval, …)
• Objectives
1. To develop and maintain a website that collects and centralizes information (web links, bibliographic references, papers, …)
2. To put in contact (mailings, meetings) everyone working on this topic, from both human and computer sciences, and to strengthen collaborations
3. To develop “real-life” applications (AGORA, DMOS, …) for the end-user partners from the human sciences (CESR, …)
[Figure: page of an old printed book with labelled parts: ornamental letter, headline, figure]
General Presentation (2/2)
Calendar (2006-2007)
• 11th June: 1st Meeting (Paris), starting date of the Calypod Group
• GDR-ISIS “Jeune Chercheur” application “SILCIL”
• 5th July: opening of http://calypod.free.fr
• 13th July: 2nd Calypod Meeting (La Rochelle)
• 13th November: 3rd Calypod Meeting (Tours)
• Break period …
• 6th November: Calypod talk at NaviDoMass Meeting (Paris)
Calypod People (17)
• Members
Busson Sébastien, Baudrier Etienne, Nicolier Frédéric, Landré Jérôme, Delalandre Mathieu, Karatzas Dimosthenis, Lladós Josep, Nicolas Stéphane, Ramos Oriol, Petitjean Caroline, Journet Nicholas, Salmon Jean-Pierre, Coustaty Mickael, Brouard Thierry, Ogier Jean-Marc, Ramel Jean-Yves, Sidere Nicolas
• Status (share, count)
Engineer      5.88%   1
PhD Student  17.65%   3
Post-Doc     29.42%   5
Lecturer     29.40%   5
Professor    17.65%   3
Total                17
• Related projects (pie chart)
BVH, ANITTA, IAnaDoc, NaviDoMass, Madonne, not linked (shares shown: 6%, 16%, 17%, 17%, 27%, 17%)
Research Project (1/2)
Multi-Criterion Retrieval of Ornamental Letters
Research problem: which criteria should a query combine?
• Color (black, white)
• Size (small, large)
• Background (almost empty, rich graphics)
• Letter (c), topic (vegetal), pattern (cross)
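A multi-criterion query over the criteria listed on this slide could be sketched as follows. This is only an illustration: the `OrnamentalLetter` record, its field names and the `retrieve` helper are hypothetical, and the criteria are assumed to be already available as metadata rather than extracted from the images.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrnamentalLetter:
    """Hypothetical record holding the criteria named on the slide."""
    letter: str
    color: str        # "black" or "white"
    size: str         # "small" or "large"
    background: str   # "almost empty" or "rich graphics"
    topic: str        # e.g. "vegetal"
    pattern: str      # e.g. "cross"


def retrieve(collection, **criteria):
    """Return the records whose fields equal every given criterion."""
    return [
        item for item in collection
        if all(getattr(item, name) == value for name, value in criteria.items())
    ]
```

For example, `retrieve(letters, letter="c", topic="vegetal")` keeps only the records that satisfy both criteria at once.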
Research Project (2/2)
[Diagram: OLR pipeline: Image Pre-Processing, Printing Retrieval, Style Retrieval, Performance Evaluation; example output: L (90%)]
Image Pre-Processing
• Degradations: offset, skewing
• Overview
– Translation: SPOMF (Symmetric Phase-Only Matched Filter), a correlation-based method
– Rotation: SPOMF on the polar form of the images
– Scale: SPOMF + Mellin transform
• Approach
[Thévenaz98] A Pyramid Approach to Subpixel Registration Based on Intensity, IEEE Trans. Image Processing
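The translation step can be illustrated with a minimal phase-only correlation sketch in NumPy. It assumes two grayscale images of the same size that differ by an integer, circular shift; the function name `spomf_shift` is hypothetical, and the rotation and scale cases (polar form, Mellin transform) are not covered here.

```python
import numpy as np


def spomf_shift(reference, moved):
    """Estimate the (dy, dx) shift that maps `reference` onto `moved`.

    Both spectra are reduced to their phase, so the correlation surface
    is close to a single peak located at the shift.
    """
    f_ref = np.fft.fft2(np.asarray(reference, dtype=float))
    f_mov = np.fft.fft2(np.asarray(moved, dtype=float))
    cross = f_mov * np.conj(f_ref)
    # Keep only the phase; the small floor avoids a division by zero.
    cross = cross / np.maximum(np.abs(cross), 1e-12)
    correlation = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(correlation), correlation.shape)
    height, width = correlation.shape
    # Peaks in the upper half of each axis correspond to negative shifts.
    dy = int(dy) - height if dy > height // 2 else int(dy)
    dx = int(dx) - width if dx > width // 2 else int(dx)
    return dy, dx
```

Once the shift is known, applying `np.roll(moved, (-dy, -dx), axis=(0, 1))` realigns the second image with the first under the same circular-shift assumption.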
Printing Retrieval (1/2)
(1) Historians are interested in tracking wood plugs as a tool for dating old books.
[Figure: plugs are exchanged and copied between printing houses; examples: Vascosan 1555, Marnef 1576; periods shown: 1531-1548, 1511-1542, 1555-1578, 1497-1507]
(2) Most of the images are copyrighted, so a system must retrieve them in real time in order to allow cross queries between the databases.
[Figure: queries sent to several databases (DB), each returning results r1, r2, r3]
Printing Retrieval (2/2)
Our key ideas
(1) To use a Run Length Encoding (RLE) of the image
[Chart: compression rate per dropcap, axis from 0.7 to 1; values shown: 0.75, 0.95, 0.88]
(2) To use different levels of operators (from the fastest to the most accurate)
• Level 1: image sizes
• Level 2: image density
• Level 3: RLE comparison
[Figure: a query goes through the 1st level, then the 2nd level; axes: speed, depth]
RLE comparison of line (y) of image 1 with line (y+dy) of image 2, where x1 and x2 are the current run positions in each line:
• while x1 is behind x2, handle image 1
• while x2 is behind x1, handle image 2
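The two key ideas can be sketched together in NumPy. This is an illustration under stated assumptions, not the group's implementation: images are binary arrays (0/1), level 1 requires identical shapes, and the thresholds `density_tol` and `max_diff_ratio` as well as all function names are hypothetical.

```python
import numpy as np


def rle_encode_row(row):
    """Encode a non-empty binary row as (first value, list of run end positions)."""
    row = np.asarray(row, dtype=np.uint8)
    changes = np.flatnonzero(row[1:] != row[:-1]) + 1
    return int(row[0]), np.append(changes, row.size).tolist()


def rle_row_mismatch(rle1, rle2):
    """Count differing pixels of two equal-width rows by walking their runs in parallel."""
    (v1, ends1), (v2, ends2) = rle1, rle2
    i = j = x = diff = 0
    while i < len(ends1) and j < len(ends2):
        x1, x2 = ends1[i], ends2[j]
        nxt = min(x1, x2)          # next position where one of the runs ends
        if v1 != v2:
            diff += nxt - x        # the whole segment [x, nxt) differs
        x = nxt
        if x1 == nxt:              # run of image 1 is finished: move to its next run
            i, v1 = i + 1, v1 ^ 1
        if x2 == nxt:              # run of image 2 is finished: move to its next run
            j, v2 = j + 1, v2 ^ 1
    return diff


def cascade_match(query, candidate, density_tol=0.05, max_diff_ratio=0.1):
    """Apply the three levels in order, from the fastest to the most accurate."""
    q = np.asarray(query, dtype=np.uint8)
    c = np.asarray(candidate, dtype=np.uint8)
    if q.shape != c.shape:                        # level 1: image sizes
        return False
    if abs(q.mean() - c.mean()) > density_tol:    # level 2: image density
        return False
    diff = sum(                                   # level 3: RLE comparison
        rle_row_mismatch(rle_encode_row(a), rle_encode_row(b))
        for a, b in zip(q, c)
    )
    return diff / q.size <= max_diff_ratio
```

The cheap levels reject most candidates before any run-level comparison takes place, which is the point of ordering the operators by cost.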
Style Retrieval (1/3)
• 2 steps
– 1) Cluster the ornamental letters according to their styles
– 2) Apply letter recognition algorithms according to the cluster (black or white letter, background specificity, …)
• Preprocessing
– Binarization
– Resizing
• Features Extraction
– FFT, DCT, [Radon] coefs.
– Zernike moments
– Threshold Adj. Stats.
– [Haralick, QMF]
• Model Training
– SVM
– N-fold cross-validation
– Evaluation of the best model on a test database
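The feature extraction and cross-validation steps can be sketched in NumPy as below. Several points are assumptions made for illustration: the 100 FFT coefficients are taken as the 10 x 10 lowest-frequency block, images are at least 10 pixels on each side, and a nearest-centroid classifier stands in for the SVM named on the slide so that the example stays self-contained. The function names are hypothetical.

```python
import numpy as np


def fft_features(image, side=10):
    """Log-magnitudes of the side x side lowest-frequency FFT coefficients."""
    spectrum = np.fft.fftshift(np.fft.fft2(np.asarray(image, dtype=float)))
    cy, cx = spectrum.shape[0] // 2, spectrum.shape[1] // 2
    half = side // 2
    block = spectrum[cy - half:cy - half + side, cx - half:cx - half + side]
    return np.log1p(np.abs(block)).ravel()


def kfold_accuracy(features, labels, n_folds=5, seed=0):
    """Mean accuracy over n_folds folds (nearest centroid used in place of the SVM)."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    order = np.random.default_rng(seed).permutation(len(labels))
    folds = np.array_split(order, n_folds)
    scores = []
    for k, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for i, f in enumerate(folds) if i != k])
        train_x, train_y = features[train_idx], labels[train_idx]
        classes = np.unique(train_y)
        centroids = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
        # Distance of every test sample to every class centroid.
        dists = np.linalg.norm(features[test_idx][:, None, :] - centroids[None, :, :], axis=2)
        predicted = classes[np.argmin(dists, axis=1)]
        scores.append(np.mean(predicted == labels[test_idx]))
    return float(np.mean(scores))
```

Each fold is held out once while the model is fitted on the remaining folds, so every sample contributes to the reported accuracy exactly once.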
Style Retrieval (2/3)
Graphical style retrieval (homogeneous vs. textured)
Test samples (FFT, 100 coefs.), classes C1 and C2
• 89.25% (420/466)
• 87.5% (47/54)
• 91.0% (375/412)
Style Retrieval (3/3)
Letter color retrieval (black vs. white)
Test samples (FFT, 100 coefs.), classes C1 and C2
• 93.1% (298/320)
• 90.6% (145/160)
• 95.6% (153/160)
Ornamental Letter Recognition (1/2)
[Figure: letter segmentation, then character recognition; example output: A]
Ornamental Letter Recognition (2/2)
Performance Evaluation (1/1)
[Figure: base; our retrieval engine (control, display, retrieve); metadata; driven metadata acquisition; metadata files]
• To produce: Bench1, Bench2, Bench2
• Without retrieval vs. with retrieval: faster, with fewer errors
Conclusion
• Website
35 references, 20 weblinks, 4 test databases, 1 wiki
Visits: August 144, September 196, October 334
• Human Network
17 people from computer and human sciences, still in progress (BCU Lausanne, …)
http://calypod.free.fr [email protected]
Meetings, 3 invited talks
• Research Works
A common research project under way, grouped publications expected for the 1st semester of 2008