Towards Performance Evaluation of Symbol Recognition & Spotting Systems in a Localization Context
Mathieu Delalandre
CVC, Barcelona, Spain
EuroMed Meeting, LORIA, Nancy, France
Monday 18th of May 2009
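Introduction
[Figure: a graphical document with symbol, background and text parts. Recognition relies on a learning database and returns labels (sofa, skin, tub, door, door). Spotting relies on a document database queried by example (Query By Example, QBE) and returns ranks (r1, r2, r3).]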
Symbol spotting: “a way to efficiently localize possible symbols and limit the computational complexity, without using full recognition methods” [Tombre2003] [Dosch2004] [Tabbone2004] [Zuwala2006] [Locteau2007] [Qureshi2007] [Rusinol2007]
Symbol recognition: “a particular application of the general problem of pattern recognition, in which an unknown input pattern (i.e. input image) is classified as belonging to one of the relevant classes (i.e. predefined symbols) in the application domain” [Chhabra1998] [Cordella1999] [Llados2002] [Tombre2005]
[Figure: example documents (electrical diagram, mechanical drawing, utility map) and their sources (scanned, CAD file, Web image).]
[Figure: performance evaluation. Groundtruthing produces the groundtruth, the system produces the results, and characterisation compares the two.]
Performance evaluation: Information Retrieval [Salton1992], Computer Vision [Thacker2005], CBIR [Muller2001], DIA [Haralick2000]
Case of symbol recognition & spotting: [Ezra2008][Delalandre2008]
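[Figure: performance evaluation workflow. Training data feed the learning step and test data feed the spotting/recognition system, which returns labels (sofa, skin, tub, door, door) or QBE ranks (r1, r2, r3). The results are mapped to the groundtruth (truth vs. results) over regions of interest, then characterized.]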
Plan
1. Groundtruth and test documents
2. Performance characterization
3. Conclusions and perspectives
Groundtruth and test documents: overview of approaches
1. Overview of approaches
2. Existing datasets
Real approach: the groundtruth is produced from real documents by groundtruthing [Yan et al. 2004] [Dosch et al. 2006] [Rusinol et al. 2009].
[Figure: groundtruthing workflow. Groundtruthed drawings go through validation and alerts; test images and recognition results feed the evaluation. Symbol cases shown: connected, parallel and overlapped.]
Synthetic approach: documents and their groundtruth are generated together from a setting.
[Figure: synthetic degradation of symbols with binary noise and vectorial noise; examples from [Aksoy 2000], [Zhai et al. 2003] and [Valveny et al. 2007].]
Graphical documents are composed of two layers: symbol and background. The same background layer can be reused with different symbol layers [Delalandre2008].
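The two-layer idea can be illustrated with a minimal sketch. It is not the generator of [Delalandre2008]: the function `build_document`, the `PlacedSymbol` record and the tiny binary bitmaps are hypothetical names and data chosen for illustration, and images are plain lists of 0/1 pixels.

```python
from dataclasses import dataclass


@dataclass
class PlacedSymbol:
    """One groundtruth entry: symbol label and bounding box."""
    label: str
    x: int
    y: int
    width: int
    height: int


def build_document(background, symbol_layer):
    """Overlay a symbol layer on a copy of the background.

    background: list of rows of 0/1 pixels (left untouched).
    symbol_layer: list of (label, x, y, bitmap) tuples.
    Returns (image, groundtruth).
    """
    image = [row[:] for row in background]
    groundtruth = []
    for label, x, y, bitmap in symbol_layer:
        for dy, row in enumerate(bitmap):
            for dx, pixel in enumerate(row):
                if pixel:
                    image[y + dy][x + dx] = 1
        groundtruth.append(PlacedSymbol(label, x, y, len(bitmap[0]), len(bitmap)))
    return image, groundtruth


# Hypothetical data: one background reused with two different symbol layers.
background = [[0] * 8 for _ in range(6)]
background[0] = [1] * 8  # a wall along the top edge
door = [[1, 1], [1, 0]]
sofa = [[1, 1, 1]]

doc_a, gt_a = build_document(background, [("door", 1, 2, door)])
doc_b, gt_b = build_document(background, [("sofa", 4, 3, sofa), ("door", 0, 4, door)])
```

Because the groundtruth is written at the same time as the pixels, each generated document comes with exact labels and locations at no extra annotation cost.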
[Figure: symbol positioning [Delalandre2008]. A symbol model is loaded and aligned using its bounding box and control points. Labels in the figure: constraints c1, c2; models M1 to M4; control points C1 to C4; lines L1, L2 with angles θ1, θ2 and points p1, p2.]
[Figure: document generation. From a background image, symbol models and positioning constraints, the building engine positions the symbols and generates documents with their groundtruth (GT). The user cycle is (1) edit, (2) run, (3) display.]
Groundtruth and test documents: existing datasets

Group    Collection   Datasets  Images  Symbols  Degradations  Models
GREC     GREC’03      30        3000    3000     10            5-50
GREC     GREC’05      16        1000    1000     6             25-150
GREC     GREC’07      6         2100    2100     6             50-150
ICPR     ICPR’00      9         450     11250    9             25
SESYD    bags         16        1600    15046    none          25-150
SESYD    floorplans   10        1000    26830    none          16
SESYD    diagrams     10        1000    14100    none          21
SESYD    queries      6         6000    6000     none          16-21
Others   Rusinol’09   1         42      344      none          38
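Groundtruth: generator of queries
1. Random selection of a document
2. Random selection of a symbol
3. Random crop
[Figure: plot of s ∈ [0,1] against v, between v0 and vmax, with s defined from the error function $\mathrm{erf}(z) = \frac{2}{\sqrt{\pi}} \sum_{n=0}^{\infty} \frac{(-1)^n z^{2n+1}}{n!\,(2n+1)}$.]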
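The three steps of the query generator can be sketched as follows. This is an illustration under stated assumptions, not the SESYD generator: `generate_query`, the document dictionaries and the `margin` parameter are hypothetical, and the crop margin is drawn uniformly here, whereas the slide uses its own distribution.

```python
import random


def generate_query(documents, rng, margin=0.2):
    """Pick a document, pick one of its symbols, and crop around it.

    documents: list of dicts with keys "name", "size" (width, height)
    and "symbols", a list of (label, x, y, width, height) boxes.
    margin: assumed maximum extra border, as a fraction of the symbol size.
    """
    doc = rng.choice(documents)                       # 1. random document
    label, x, y, w, h = rng.choice(doc["symbols"])    # 2. random symbol
    page_w, page_h = doc["size"]
    # 3. random crop: enlarge each side by a random amount, clipped to the page
    left = max(0, x - int(rng.uniform(0, margin) * w))
    top = max(0, y - int(rng.uniform(0, margin) * h))
    right = min(page_w, x + w + int(rng.uniform(0, margin) * w))
    bottom = min(page_h, y + h + int(rng.uniform(0, margin) * h))
    return {
        "document": doc["name"],
        "label": label,
        "box": (x, y, w, h),
        "crop": (left, top, right, bottom),
    }


# Hypothetical groundtruth for two small documents.
documents = [
    {"name": "plan-1", "size": (200, 100),
     "symbols": [("sofa", 10, 10, 40, 20), ("door", 150, 60, 20, 20)]},
    {"name": "plan-2", "size": (120, 120),
     "symbols": [("tub", 0, 90, 50, 30)]},
]
rng = random.Random(0)  # fixed seed so the generated queries are repeatable
queries = [generate_query(documents, rng) for _ in range(20)]
```

Each query keeps the source document, the label and the original box, so the groundtruth of the query set is obtained for free.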
Performance characterization: introduction
Performance characterisation (segmented symbols) [Valveny2004] [Dosch2006] [Valveny2007,2008a,2008b]
Measures: recognition rate, precision/recall, homogeneity, separability
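Two of these measures have standard definitions that fit in a few lines. The sketch below assumes one predicted label per segmented symbol for the recognition rate, and a ranked result list with a set of relevant items for precision/recall; the function names and the sample data are hypothetical. Homogeneity and separability are left out because the slides do not define them.

```python
def recognition_rate(predicted, truth):
    """Fraction of segmented symbols whose predicted label is correct."""
    correct = sum(1 for p, t in zip(predicted, truth) if p == t)
    return correct / len(truth)


def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall over the first k items of a ranked result list."""
    retrieved = ranked[:k]
    hits = sum(1 for item in retrieved if item in relevant)
    return hits / len(retrieved), hits / len(relevant)


# Hypothetical recognition output against its groundtruth labels.
truth = ["sofa", "skin", "tub", "door", "door"]
predicted = ["sofa", "tub", "tub", "door", "door"]
rate = recognition_rate(predicted, truth)

# Hypothetical QBE ranking where r1 and r3 are the relevant results.
precision, recall = precision_recall_at_k(["r1", "r2", "r3"], {"r1", "r3"}, k=2)
```

With these inputs, four of the five labels are correct, and the first two ranks contain one of the two relevant results.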
Performance characterisation (real context)
[Figure: the spotting/recognition system returns labels or QBE ranks; the results are mapped to the groundtruth over regions of interest (truth vs. results), then characterized.]
Performance characterization: about mapping
Mapping between groundtruth and segmentation results (truth vs. results) has been studied for layout analysis [Antonacopoulos1999] and text/graphics separation [Wenyin1997].
Mapping cases
• Single: a model line matches only one detected line.
• Split: two model lines match one detected line.
• Merge: a model line matches two detected lines.
• False alarm: a detected line does not match any model line.
• Miss: a model line does not match any detected line.
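The mapping cases listed here can be derived from a list of (model, detection) match pairs. The sketch below is an assumption-laden illustration: `mapping_cases` is a hypothetical helper, the case names follow the definitions given in this slide, and "two" is generalized to "two or more".

```python
from collections import defaultdict


def mapping_cases(models, detections, matches):
    """Sort models and detections into the five mapping cases.

    matches: iterable of (model_id, detection_id) pairs judged to match.
    """
    by_model = defaultdict(set)
    by_detection = defaultdict(set)
    for model, detection in matches:
        by_model[model].add(detection)
        by_detection[detection].add(model)

    cases = {"single": [], "split": [], "merge": [], "false alarm": [], "miss": []}
    for model in models:
        found = by_model.get(model, set())
        if not found:
            cases["miss"].append(model)
        elif len(found) > 1:
            # one model matched by several detections ("merge" in this slide)
            cases["merge"].append((model, sorted(found)))
        else:
            detection = next(iter(found))
            if len(by_detection[detection]) == 1:
                cases["single"].append((model, detection))
    for detection in detections:
        owners = by_detection.get(detection, set())
        if not owners:
            cases["false alarm"].append(detection)
        elif len(owners) > 1:
            # several models matched by one detection ("split" in this slide)
            cases["split"].append((sorted(owners), detection))
    return cases


# Hypothetical example covering every case once.
models = ["m1", "m2", "m3", "m4", "m5"]
detections = ["d1", "d2", "d3", "d4", "d5"]
matches = [("m1", "d1"), ("m2", "d2"), ("m3", "d2"), ("m4", "d3"), ("m4", "d4")]
cases = mapping_cases(models, detections, matches)
```

The sketch leaves open how a pair is judged to match in the first place; that decision is the mapping problem discussed in the rest of this section.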
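Symbol spotting [Rusinol2009]
Mapping between the groundtruth regions g1, g2 and a result region r, with c1, c2 the areas common to r and g1, g2:
$$\mathrm{Precision} = \frac{c_1 + c_2}{r} \qquad \mathrm{Recall} = \frac{c_1 + c_2}{g_1 + g_2}$$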
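A small worked example makes the area-based precision and recall concrete. As a simplifying assumption, regions are axis-aligned boxes `(x0, y0, x1, y1)` and groundtruth regions do not overlap each other; the representation used in [Rusinol2009] may differ, and the helper names are hypothetical.

```python
def area(box):
    """Area of an axis-aligned box (x0, y0, x1, y1); empty boxes give 0."""
    x0, y0, x1, y1 = box
    return max(0, x1 - x0) * max(0, y1 - y0)


def intersection(a, b):
    """Box common to a and b (possibly empty)."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def precision_recall(result, groundtruths):
    """Area-based precision and recall of one result region.

    Assumes the groundtruth regions are pairwise disjoint, so that the
    common areas c1, c2, ... can simply be summed.
    """
    common = sum(area(intersection(result, g)) for g in groundtruths)
    result_area = area(result)
    truth_area = sum(area(g) for g in groundtruths)
    precision = common / result_area if result_area else 0.0
    recall = common / truth_area if truth_area else 0.0
    return precision, recall


# One result region r straddling two groundtruth regions g1 and g2.
g1 = (0, 0, 4, 4)    # area 16
g2 = (6, 0, 10, 4)   # area 16
r = (2, 0, 8, 4)     # area 24, shares 8 with g1 and 8 with g2
precision, recall = precision_recall(r, [g1, g2])
```

Here c1 = c2 = 8, so precision is 16/24 and recall is 16/32: the result covers half of each symbol and wastes a third of its own area on background.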
Performance characterization: mapping, application to symbols
Which representation? How to define the regions?
• Wrapper (box, ellipse) or convex polygon: the precision will depend on the model and could be weak.
• Concave polygon: precise, but the comparison is time consuming.
• Point: how to define local thresholds? Is it compatible with recognition systems?
Does the polarized part of the capacitor belong to the symbol? The same question holds for the moving area of the door.
Many systems use sliding windows to detect symbols and provide only points [Adam2001] [Dosch2004] [Rusinol2007].
Systems providing regions of interest can “tune” their results: how can the over-segmentation cases be limited?
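One possible reading of the "local threshold" question for point-only detections is sketched below. This is purely an assumption for illustration, not a rule proposed in the talk: a detected point is accepted for a groundtruth symbol when it lies within a distance of the symbol centre proportional to the symbol's diagonal. `match_points` and `ratio` are hypothetical names.

```python
import math


def match_points(points, groundtruth, ratio=0.5):
    """Match detected points to groundtruth boxes with a size-dependent threshold.

    groundtruth: list of (label, x0, y0, x1, y1).
    ratio: assumed fraction of the box diagonal used as local threshold.
    """
    matches = []
    for px, py in points:
        for label, x0, y0, x1, y1 in groundtruth:
            cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
            threshold = ratio * math.hypot(x1 - x0, y1 - y0)  # local threshold
            if math.hypot(px - cx, py - cy) <= threshold:
                matches.append(((px, py), label))
    return matches


# Hypothetical groundtruth: a large sofa and a small door.
groundtruth = [("sofa", 0, 0, 10, 10), ("door", 100, 100, 104, 104)]
points = [(5, 6), (103, 110), (50, 50)]
matched = match_points(points, groundtruth)
```

A threshold tied to the symbol size tolerates larger offsets on large symbols than on small ones, which is why the second point, about 8 pixels from the door centre, is rejected.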
Performance characterization: work in progress
Comparison of some criteria with the system of [Qureshi’08] on 100 floorplans (2521 symbols).
Signature based characterization of results against groundtruth:
• Domain definition of the ROI: region size dx×dy
• Orientation sampling over [0-2π]
• Reporting of rates (%) over [0-2π]
Conclusions and perspectives
• Conclusions
– Large databases of segmented symbol images exist: “GREC”
– Synthetic databases in real context exist: “SESYD”
– True-life documents and groundtruth are around the corner: “EPEIRES”
– Characterization tools have been proposed: “SymbolRec”
• Perspectives
– Continue to produce other databases, using existing platforms
– Mapping is the key problem today for achieving a performance evaluation in real context
Thanks
All the referenced papers can be found in:
[1] M. Delalandre, E. Valveny and J. Lladós. Performance Evaluation of Symbol Recognition and Spotting Systems: An Overview. Workshop on Document Analysis Systems (DAS), pp 497-505, 2008.