Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Fouille et classification de graphes :Application a l’analyse d’image cadastrale.
Romain Raveaux
L3I lab (EA 2118) – Universite de La Rochelle
Nantes le 24 Fevrier, 2011Expose au LINA
Encadrants: J-M. OgierJ-C. Burie
1 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Fiche Signaletique
• Situation actuelle :
A.T.E.R. a l’IUT d’informatique de La Rochelle (InstitutUniversitaire de Technologie)Sections 27 et 61
2 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
• Enseignement :
Maintenant:Module de reseaux
− Volume: 24h; Niveau: IUT 2; Type: CM, TD, TP
Programmation Android
− Volume: 14h; Niveau: IUT 2; Type: CM, TD, TP
Les annees precedentes:Module Statistiques
− Volume: 36h; Niveau: Master 1; Type: CM, TD
Image et Ontologie
− Volume: 13h; Niveau: Master 2; Type: CM, TD, TP
...
Pour un volume total de 336H equivalent TD.
3 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
• Domaine de recherche :
code 22 : Graphes, combinatoire, complexitecode 92 : Vision par ordinateur, reconnaissance de formescode 91 : Traitement et analyse d’images, de son, de signauxCodes fournis par la nomenclature thematique section 27 duCNU
• Responsabilite/Cooperation :
Collaboration scientifique avec l’entreprise SOOD (Thematiquede la securite documentaire)Co-organisateur du concours international de reconnaissancede symboles
4 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Mes excuses pour ce melange des genres
5 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Graphs are everywhere...
Graphs in Reality
Graphs model objects andtheir relationships.
Also referred to asnetworks.
All common datastructures can be modeledas graphs.
How similar are two graphs?
Graph similarity is thecentral problem for alllearning tasks such asclustering andclassification on graphs.
6 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Graphs are everywhere...
Graphs in Reality
Graphs model objects andtheir relationships.
Also referred to asnetworks.
All common datastructures can be modeledas graphs.
How similar are two graphs?
Graph similarity is thecentral problem for alllearning tasks such asclustering andclassification on graphs.
6 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Graphs are everywhere...
Graphs in Reality
Graphs model objects andtheir relationships.
Also referred to asnetworks.
All common datastructures can be modeledas graphs.
How similar are two graphs?
Graph similarity is thecentral problem for alllearning tasks such asclustering andclassification on graphs.
6 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Graphs are everywhere...
Graphs in Reality
Graphs model objects andtheir relationships.
Also referred to asnetworks.
All common datastructures can be modeledas graphs.
How similar are two graphs?
Graph similarity is thecentral problem for alllearning tasks such asclustering andclassification on graphs.
6 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
From the beginning...
Definition and notation of a graph:
Definition
Let LV and LE denote the set of node and edge labels, respectively.A labeled graph G is a 4-tuple G = (V ,E , µ, ξ) , where
V is the set of nodes,
E ⊆ V × V is the set of edges
µ : V → LV is a function assigning labels to the nodes, and
ξ : E → LE is a function assigning labels to the edges.
Let G1 = (V1,E1, µ1, ξ1) be the source graph
And G2 = (V2,E2, µ2, ξ2) the target graph
With V1 = (u1, ..., un) and V2 = (v1, ..., vm) respectively
7 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Graph isomorphism
Graph isomorphism
Find a mapping f : V1 → V2
i.e. x , y ∈ V1 ⇒ (x , y) ∈ E1
f is an isomorphism iff (f (x), f (y)) is an edge of G2.No polynomial-time algorithm is known for graph isomorphismNeither it is known to be NP-complete
Subgraph isomorphism
Means finding a subgraph G3 of G2 such that G1 and G3 areisomorphic.Subgraph isomorphism is NP-complete
8 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Graph isomorphism
Graph isomorphism
Find a mapping f : V1 → V2
i.e. x , y ∈ V1 ⇒ (x , y) ∈ E1
f is an isomorphism iff (f (x), f (y)) is an edge of G2.No polynomial-time algorithm is known for graph isomorphismNeither it is known to be NP-complete
Subgraph isomorphism
Means finding a subgraph G3 of G2 such that G1 and G3 areisomorphic.Subgraph isomorphism is NP-complete
8 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Graph isomorphism
Graph isomorphism
Find a mapping f : V1 → V2
i.e. x , y ∈ V1 ⇒ (x , y) ∈ E1
f is an isomorphism iff (f (x), f (y)) is an edge of G2.No polynomial-time algorithm is known for graph isomorphismNeither it is known to be NP-complete
Subgraph isomorphism
Means finding a subgraph G3 of G2 such that G1 and G3 areisomorphic.Subgraph isomorphism is NP-complete
8 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Graph isomorphism
Graph isomorphism
Find a mapping f : V1 → V2
i.e. x , y ∈ V1 ⇒ (x , y) ∈ E1
f is an isomorphism iff (f (x), f (y)) is an edge of G2.No polynomial-time algorithm is known for graph isomorphismNeither it is known to be NP-complete
Subgraph isomorphism
Means finding a subgraph G3 of G2 such that G1 and G3 areisomorphic.Subgraph isomorphism is NP-complete
8 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Graph isomorphism
Graph isomorphism
Find a mapping f : V1 → V2
i.e. x , y ∈ V1 ⇒ (x , y) ∈ E1
f is an isomorphism iff (f (x), f (y)) is an edge of G2.No polynomial-time algorithm is known for graph isomorphismNeither it is known to be NP-complete
Subgraph isomorphism
Means finding a subgraph G3 of G2 such that G1 and G3 areisomorphic.Subgraph isomorphism is NP-complete
8 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Graph isomorphism
Graph isomorphism
Find a mapping f : V1 → V2
i.e. x , y ∈ V1 ⇒ (x , y) ∈ E1
f is an isomorphism iff (f (x), f (y)) is an edge of G2.No polynomial-time algorithm is known for graph isomorphismNeither it is known to be NP-complete
Subgraph isomorphism
Means finding a subgraph G3 of G2 such that G1 and G3 areisomorphic.Subgraph isomorphism is NP-complete
8 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Graph isomorphism
Graph isomorphism
Find a mapping f : V1 → V2
i.e. x , y ∈ V1 ⇒ (x , y) ∈ E1
f is an isomorphism iff (f (x), f (y)) is an edge of G2.No polynomial-time algorithm is known for graph isomorphismNeither it is known to be NP-complete
Subgraph isomorphism
Means finding a subgraph G3 of G2 such that G1 and G3 areisomorphic.Subgraph isomorphism is NP-complete
8 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Graph isomorphism
Graph isomorphism
Find a mapping f : V1 → V2
i.e. x , y ∈ V1 ⇒ (x , y) ∈ E1
f is an isomorphism iff (f (x), f (y)) is an edge of G2.No polynomial-time algorithm is known for graph isomorphismNeither it is known to be NP-complete
Subgraph isomorphism
Means finding a subgraph G3 of G2 such that G1 and G3 areisomorphic.Subgraph isomorphism is NP-complete
8 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Graph isomorphism
Graph isomorphism
Find a mapping f : V1 → V2
i.e. x , y ∈ V1 ⇒ (x , y) ∈ E1
f is an isomorphism iff (f (x), f (y)) is an edge of G2.No polynomial-time algorithm is known for graph isomorphismNeither it is known to be NP-complete
Subgraph isomorphism
Means finding a subgraph G3 of G2 such that G1 and G3 areisomorphic.Subgraph isomorphism is NP-complete
8 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Error-tolerant graph isomorphism
Exact graph matching is useless in many computer visionapplications
Concerning graph matching under noise and distortion
The matching incorporates an error model to identify thedistortions which make one graph a distorted version of theother
Problems for real world applications
Error-tolerant
To measure the similarity of two graphs.
Runtime may grow exponentially with number of nodes
This is an enormous problem for large datasets of graphs
Wanted: Polynomial-time similarity measure for graphs
9 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Error-tolerant graph isomorphism
Exact graph matching is useless in many computer visionapplications
Concerning graph matching under noise and distortion
The matching incorporates an error model to identify thedistortions which make one graph a distorted version of theother
Problems for real world applications
Error-tolerant
To measure the similarity of two graphs.
Runtime may grow exponentially with number of nodes
This is an enormous problem for large datasets of graphs
Wanted: Polynomial-time similarity measure for graphs
9 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Error-tolerant graph isomorphism
Exact graph matching is useless in many computer visionapplications
Concerning graph matching under noise and distortion
The matching incorporates an error model to identify thedistortions which make one graph a distorted version of theother
Problems for real world applications
Error-tolerant
To measure the similarity of two graphs.
Runtime may grow exponentially with number of nodes
This is an enormous problem for large datasets of graphs
Wanted: Polynomial-time similarity measure for graphs
9 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Error-tolerant graph isomorphism
Exact graph matching is useless in many computer visionapplications
Concerning graph matching under noise and distortion
The matching incorporates an error model to identify thedistortions which make one graph a distorted version of theother
Problems for real world applications
Error-tolerant
To measure the similarity of two graphs.
Runtime may grow exponentially with number of nodes
This is an enormous problem for large datasets of graphs
Wanted: Polynomial-time similarity measure for graphs
9 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Error-tolerant graph isomorphism
Exact graph matching is useless in many computer visionapplications
Concerning graph matching under noise and distortion
The matching incorporates an error model to identify thedistortions which make one graph a distorted version of theother
Problems for real world applications
Error-tolerant
To measure the similarity of two graphs.
Runtime may grow exponentially with number of nodes
This is an enormous problem for large datasets of graphs
Wanted: Polynomial-time similarity measure for graphs
9 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Error-tolerant graph isomorphism
Exact graph matching is useless in many computer visionapplications
Concerning graph matching under noise and distortion
The matching incorporates an error model to identify thedistortions which make one graph a distorted version of theother
Problems for real world applications
Error-tolerant
To measure the similarity of two graphs.
Runtime may grow exponentially with number of nodes
This is an enormous problem for large datasets of graphs
Wanted: Polynomial-time similarity measure for graphs
9 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Error-tolerant graph isomorphism
Exact graph matching is useless in many computer visionapplications
Concerning graph matching under noise and distortion
The matching incorporates an error model to identify thedistortions which make one graph a distorted version of theother
Problems for real world applications
Error-tolerant
To measure the similarity of two graphs.
Runtime may grow exponentially with number of nodes
This is an enormous problem for large datasets of graphs
Wanted: Polynomial-time similarity measure for graphs
9 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Error-tolerant graph isomorphism
Exact graph matching is useless in many computer visionapplications
Concerning graph matching under noise and distortion
The matching incorporates an error model to identify thedistortions which make one graph a distorted version of theother
Problems for real world applications
Error-tolerant
To measure the similarity of two graphs.
Runtime may grow exponentially with number of nodes
This is an enormous problem for large datasets of graphs
Wanted: Polynomial-time similarity measure for graphs
9 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Error-tolerant graph isomorphism
Exact graph matching is useless in many computer visionapplications
Concerning graph matching under noise and distortion
The matching incorporates an error model to identify thedistortions which make one graph a distorted version of theother
Problems for real world applications
Error-tolerant
To measure the similarity of two graphs.
Runtime may grow exponentially with number of nodes
This is an enormous problem for large datasets of graphs
Wanted: Polynomial-time similarity measure for graphs
9 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Problem statement
A dissimilarity measure is a function : d : X × X → Rwhere X is the representation space for the object description.
Criteria for a good graph measure of similarity
ExpressiveEfficient to computeApplicable to wide range of graphs
10 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Problem statement
A dissimilarity measure is a function : d : X × X → Rwhere X is the representation space for the object description.
Criteria for a good graph measure of similarity
ExpressiveEfficient to computeApplicable to wide range of graphs
10 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Problem statement
A dissimilarity measure is a function : d : X × X → Rwhere X is the representation space for the object description.
Criteria for a good graph measure of similarity
ExpressiveEfficient to computeApplicable to wide range of graphs
10 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Problem statement
A dissimilarity measure is a function : d : X × X → Rwhere X is the representation space for the object description.
Criteria for a good graph measure of similarity
ExpressiveEfficient to computeApplicable to wide range of graphs
10 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Graph Edit Distance (ED)
The minimum amount of distortion that is needed to transform G1
into G2
Distortions si : deletions, insertions, substitutions of nodes andedges.
Edit path S = s1, ..., sn: A sequence of edit operations thattransforms G1 into G2.
Cost functions: Measuring the strength of a given distortion.
Edit distance d(G1,G2): Minimum cost edit path between twographs.
Problem of Edit Distance: NP complete
Explore the space of all possible mappings of the nodes andedges of G1 to the nodes and edges of G2.
Edit Distance computation also has a worst case exponentialcomplexity which prevents its use in large datasets.
11 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Graph Edit Distance (ED)
The minimum amount of distortion that is needed to transform G1
into G2
Distortions si : deletions, insertions, substitutions of nodes andedges.
Edit path S = s1, ..., sn: A sequence of edit operations thattransforms G1 into G2.
Cost functions: Measuring the strength of a given distortion.
Edit distance d(G1,G2): Minimum cost edit path between twographs.
Problem of Edit Distance: NP complete
Explore the space of all possible mappings of the nodes andedges of G1 to the nodes and edges of G2.
Edit Distance computation also has a worst case exponentialcomplexity which prevents its use in large datasets.
11 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Graph Edit Distance (ED)
The minimum amount of distortion that is needed to transform G1
into G2
Distortions si : deletions, insertions, substitutions of nodes andedges.
Edit path S = s1, ..., sn: A sequence of edit operations thattransforms G1 into G2.
Cost functions: Measuring the strength of a given distortion.
Edit distance d(G1,G2): Minimum cost edit path between twographs.
Problem of Edit Distance: NP complete
Explore the space of all possible mappings of the nodes andedges of G1 to the nodes and edges of G2.
Edit Distance computation also has a worst case exponentialcomplexity which prevents its use in large datasets.
11 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Graph Edit Distance (ED)
The minimum amount of distortion that is needed to transform G1
into G2
Distortions si : deletions, insertions, substitutions of nodes andedges.
Edit path S = s1, ..., sn: A sequence of edit operations thattransforms G1 into G2.
Cost functions: Measuring the strength of a given distortion.
Edit distance d(G1,G2): Minimum cost edit path between twographs.
Problem of Edit Distance: NP complete
Explore the space of all possible mappings of the nodes andedges of G1 to the nodes and edges of G2.
Edit Distance computation also has a worst case exponentialcomplexity which prevents its use in large datasets.
11 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Graph Edit Distance (ED)
The minimum amount of distortion that is needed to transform G1
into G2
Distortions si : deletions, insertions, substitutions of nodes andedges.
Edit path S = s1, ..., sn: A sequence of edit operations thattransforms G1 into G2.
Cost functions: Measuring the strength of a given distortion.
Edit distance d(G1,G2): Minimum cost edit path between twographs.
Problem of Edit Distance: NP complete
Explore the space of all possible mappings of the nodes andedges of G1 to the nodes and edges of G2.
Edit Distance computation also has a worst case exponentialcomplexity which prevents its use in large datasets.
11 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Graph Edit Distance (ED)
The minimum amount of distortion that is needed to transform G1
into G2
Distortions si : deletions, insertions, substitutions of nodes andedges.
Edit path S = s1, ..., sn: A sequence of edit operations thattransforms G1 into G2.
Cost functions: Measuring the strength of a given distortion.
Edit distance d(G1,G2): Minimum cost edit path between twographs.
Problem of Edit Distance: NP complete
Explore the space of all possible mappings of the nodes andedges of G1 to the nodes and edges of G2.
Edit Distance computation also has a worst case exponentialcomplexity which prevents its use in large datasets.
11 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Comparison between Classical Graph-Matching Methods
Table: In Terms of Their Computational Complexity and the Ability toPerform an Inexact Matching, [Llados 2001].
12 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Graph comparison through combinatorial optimization
Basic idea:
Methods are based on an optimization procedure mappinglocal substructuresAny node(un) from G1 can be assigned to any node(vm) of G2,Incurring some cost that depends on the un-vm assignment.It is required to map all nodes in such a way that the totalcost of the assignment is minimized.
Cost matrix representation (C ):
Cij correspond to the costs of assigning the i th node of G1 tothe j th node of G2.
C =
c1,1 ... ... c1,m
... ... ... ...
... ... ... ...cn,1 ... ... cn,m
13 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Graph comparison through combinatorial optimization
Basic idea:
Methods are based on an optimization procedure mappinglocal substructuresAny node(un) from G1 can be assigned to any node(vm) of G2,Incurring some cost that depends on the un-vm assignment.It is required to map all nodes in such a way that the totalcost of the assignment is minimized.
Cost matrix representation (C ):
Cij correspond to the costs of assigning the i th node of G1 tothe j th node of G2.
C =
c1,1 ... ... c1,m
... ... ... ...
... ... ... ...cn,1 ... ... cn,m
13 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Graph comparison through combinatorial optimization
Basic idea:
Methods are based on an optimization procedure mappinglocal substructuresAny node(un) from G1 can be assigned to any node(vm) of G2,Incurring some cost that depends on the un-vm assignment.It is required to map all nodes in such a way that the totalcost of the assignment is minimized.
Cost matrix representation (C ):
Cij correspond to the costs of assigning the i th node of G1 tothe j th node of G2.
C =
c1,1 ... ... c1,m
... ... ... ...
... ... ... ...cn,1 ... ... cn,m
13 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Graph comparison through combinatorial optimization
Basic idea:
Methods are based on an optimization procedure mappinglocal substructuresAny node(un) from G1 can be assigned to any node(vm) of G2,Incurring some cost that depends on the un-vm assignment.It is required to map all nodes in such a way that the totalcost of the assignment is minimized.
Cost matrix representation (C ):
Cij correspond to the costs of assigning the i th node of G1 tothe j th node of G2.
C =
c1,1 ... ... c1,m
... ... ... ...
... ... ... ...cn,1 ... ... cn,m
13 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Graph comparison through combinatorial optimization
Basic idea:
Methods are based on an optimization procedure mappinglocal substructuresAny node(un) from G1 can be assigned to any node(vm) of G2,Incurring some cost that depends on the un-vm assignment.It is required to map all nodes in such a way that the totalcost of the assignment is minimized.
Cost matrix representation (C ):
Cij correspond to the costs of assigning the i th node of G1 tothe j th node of G2.
13 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Combinatorial optimization: Comparative study
Node signature Distance
[Gold 1996] Node degree+Label *
[Shokoufandeh 2006] Eigen vector L2
[Riesen 2009] (1)Node+(2)Edge Edit cost* : Depends on the graph attribute type.
14 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Our proposal
A generalization of prior works
Where local substructures are represented as graphs
Where the cost function c(i , j) is a graph distance
15 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Our proposal
A generalization of prior works
Where local substructures are represented as graphs
Where the cost function c(i , j) is a graph distance
15 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Our proposal
A generalization of prior works
Where local substructures are represented as graphs
Where the cost function c(i , j) is a graph distance
15 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Our proposal
A generalization of prior works
Where local substructures are represented as graphs
Where the cost function c(i , j) is a graph distance
A graph matching method based on subgraph assignments
15 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Overview
A distance between graph
Subgraph decomposition
Optimization algorithm
Figure: Subgraph matching: A bipartite graph16 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Overview
A distance between graph
Subgraph decomposition
Optimization algorithm
Figure: Subgraph matching: A bipartite graph16 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Overview
A distance between graph
Subgraph decomposition
Optimization algorithm
Figure: Subgraph matching: A bipartite graph16 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Graph decomposition
A subgraph (sg):
A structure gathering theedges and theircorresponding endingvertices from a rootvertex.
Figure: Decomposition intosubgraph world
17 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Matrix representation I
The cost matrix contains the distances between every pair ofsubgraphs from G1 and G2.
What’s the best (minimum-cost) way to assign the subgraphs?
Assignment problem solved by the Hungarian method[Kuhn 1955]The cost of the minimum-weight subgraph matching :
SubGraph Matching Distance SGMD(G1,G2)
Example of possible variations of SGMD:
SGMDED : Based on edit distance.
SGMDGP : Based on graph probing.
C =
c1,1 ... ... c1,m
... ... ... ...
... ... ... ...cn,1 ... ... cn,m
18 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Matrix representation I
The cost matrix contains the distances between every pair ofsubgraphs from G1 and G2.
What’s the best (minimum-cost) way to assign the subgraphs?
Assignment problem solved by the Hungarian method[Kuhn 1955]The cost of the minimum-weight subgraph matching :
SubGraph Matching Distance SGMD(G1,G2)
Example of possible variations of SGMD:
SGMDED : Based on edit distance.
SGMDGP : Based on graph probing.
18 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Matrix representation I
The cost matrix contains the distances between every pair ofsubgraphs from G1 and G2.
What’s the best (minimum-cost) way to assign the subgraphs?
Assignment problem solved by the Hungarian method[Kuhn 1955]The cost of the minimum-weight subgraph matching :
SubGraph Matching Distance SGMD(G1,G2)
Example of possible variations of SGMD:
SGMDED : Based on edit distance.
SGMDGP : Based on graph probing.
18 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Matrix representation I
The cost matrix contains the distances between every pair ofsubgraphs from G1 and G2.
What’s the best (minimum-cost) way to assign the subgraphs?
Assignment problem solved by the Hungarian method[Kuhn 1955]The cost of the minimum-weight subgraph matching :
SubGraph Matching Distance SGMD(G1,G2)
Example of possible variations of SGMD:
SGMDED : Based on edit distance.
SGMDGP : Based on graph probing.
18 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Theoretical discussion
Our proposal is a pseudo metric
PositiveSymmetricTriangle inequality
SGMDED is a lower bound for the edit distance
∀G1,G2 :SGMDED(G1,G2)
max(|G1|, |G2|)≤ ED(G1,G2)
19 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Theoretical discussion
Our proposal is a pseudo metric
PositiveSymmetricTriangle inequality
SGMDED is a lower bound for the edit distance
∀G1,G2 :SGMDED(G1,G2)
max(|G1|, |G2|)≤ ED(G1,G2)
19 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Theoretical discussion
Our proposal is a pseudo metric
PositiveSymmetricTriangle inequality
SGMDED is a lower bound for the edit distance
∀G1,G2 :SGMDED(G1,G2)
max(|G1|, |G2|)≤ ED(G1,G2)
19 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Theoretical discussion
Our proposal is a pseudo metric
PositiveSymmetricTriangle inequality
SGMDED is a lower bound for the edit distance
∀G1,G2 :SGMDED(G1,G2)
max(|G1|, |G2|)≤ ED(G1,G2)
19 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Theoretical discussion
Our proposal is a pseudo metric
PositiveSymmetricTriangle inequality
SGMDED is a lower bound for the edit distance
∀G1,G2 :SGMDED(G1,G2)
max(|G1|, |G2|)≤ ED(G1,G2)
19 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Theoretical discussion
Our proposal is a pseudo metric
PositiveSymmetricTriangle inequality
SGMDED is a lower bound for the edit distance
∀G1,G2 :SGMDED(G1,G2)
max(|G1|, |G2|)≤ ED(G1,G2)
19 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Experiments
Hypothesis:
The more accurate the distance induced by graph matching is,the better the matching is.
The question turns into a graph distance comparison:
Correlation
Classification
20 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Experiments
Hypothesis:
The more accurate the distance induced by graph matching is,the better the matching is.
The question turns into a graph distance comparison:
Correlation
Classification
20 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Experiments
Hypothesis:
The more accurate the distance induced by graph matching is,the better the matching is.
The question turns into a graph distance comparison:
Correlation
Classification
20 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Databases
IAM Graph Database Repository (Standardized graph datasets for benchmarking).
Synthetic data set (Randomly generated for scalabilitytesting).
Home-made data sets (Domain-dependent applications).
Table: Characteristics of the four data sets used in our computationalexperiments
Base A Base B Base C Base D
Number of classes (N) 50 10 32 15| Training | 14128 114 9600 5062| Validation | 14101 56 3200 1688
Average number of nodes 12.03 5.56 8.84 4.7Average number of edges 9.86 11.71 10.15 3.6Average degree of nodes 1.63 4.21 1.15 1.3
21 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Databases
IAM Graph Database Repository (Standardized graph datasets for benchmarking).
Synthetic data set (Randomly generated for scalabilitytesting).
Home-made data sets (Domain-dependent applications).
Table: Characteristics of the four data sets used in our computationalexperiments
Base A Base B Base C Base D
Number of classes (N) 50 10 32 15| Training | 14128 114 9600 5062| Validation | 14101 56 3200 1688
Average number of nodes 12.03 5.56 8.84 4.7Average number of edges 9.86 11.71 10.15 3.6Average degree of nodes 1.63 4.21 1.15 1.3
21 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Databases
IAM Graph Database Repository (Standardized graph datasets for benchmarking).
Synthetic data set (Randomly generated for scalabilitytesting).
Home-made data sets (Domain-dependent applications).
Table: Characteristics of the four data sets used in our computationalexperiments
Base A Base B Base C Base D
Number of classes (N) 50 10 32 15| Training | 14128 114 9600 5062| Validation | 14101 56 3200 1688
Average number of nodes 12.03 5.56 8.84 4.7Average number of edges 9.86 11.71 10.15 3.6Average degree of nodes 1.63 4.21 1.15 1.3
21 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Protocol
Correlation between ED and suboptimal distances:
Rank correlation: Kendall correlationDistance correlation: Pearson correlation
Classification stage
1-NN classifier
22 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Protocol
Correlation between ED and suboptimal distances:
Rank correlation: Kendall correlationDistance correlation: Pearson correlation
Classification stage
1-NN classifier
22 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Protocol
Correlation between ED and suboptimal distances:
Rank correlation: Kendall correlationDistance correlation: Pearson correlation
Classification stage
1-NN classifier
22 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Protocol
Correlation between ED and suboptimal distances:
Rank correlation: Kendall correlationDistance correlation: Pearson correlation
Classification stage
1-NN classifier
22 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Protocol
Correlation between ED and suboptimal distances:
Rank correlation: Kendall correlationDistance correlation: Pearson correlation
Classification stage
1-NN classifier
22 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Rank relationship with edit distance I
23 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Rank relationship with edit distance I
23 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Bottom lines
Graph matching algorithm
Graph distance
Polynomial time complexity (O(n3))
Lower bound relation with the edit distance
Rank relation with edit distance
1-NN classifier is not negatively affected by using sub-optimaldistances.
Flexible distance with two meta-parameters:
Sub distanceSubgraph size
24 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Bottom lines
Graph matching algorithm
Graph distance
Polynomial time complexity (O(n3))
Lower bound relation with the edit distance
Rank relation with edit distance
1-NN classifier is not negatively affected by using sub-optimaldistances.
Flexible distance with two meta-parameters:
Sub distanceSubgraph size
24 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Bottom lines
Graph matching algorithm
Graph distance
Polynomial time complexity (O(n3))
Lower bound relation with the edit distance
Rank relation with edit distance
1-NN classifier is not negatively affected by using sub-optimaldistances.
Flexible distance with two meta-parameters:
Sub distanceSubgraph size
24 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Bottom lines
Graph matching algorithm
Graph distance
Polynomial time complexity (O(n3))
Lower bound relation with the edit distance
Rank relation with edit distance
1-NN classifier is not negatively affected by using sub-optimaldistances.
Flexible distance with two meta-parameters:
Sub distanceSubgraph size
24 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Bottom lines
Graph matching algorithm
Graph distance
Polynomial time complexity (O(n3))
Lower bound relation with the edit distance
Rank relation with edit distance
1-NN classifier is not negatively affected by using sub-optimaldistances.
Flexible distance with two meta-parameters:
Sub distanceSubgraph size
24 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Bottom lines
Graph matching algorithm
Graph distance
Polynomial time complexity (O(n3))
Lower bound relation with the edit distance
Rank relation with edit distance
1-NN classifier is not negatively affected by using sub-optimaldistances.
Flexible distance with two meta-parameters:
Sub distanceSubgraph size
24 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Bottom lines
Graph matching algorithm
Graph distance
Polynomial time complexity (O(n3))
Lower bound relation with the edit distance
Rank relation with edit distance
1-NN classifier is not negatively affected by using sub-optimaldistances.
Flexible distance with two meta-parameters:
Sub distanceSubgraph size
24 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statementState of the artOur proposal: Sub-graph matchingSummary
Transition
We have presented a graph distance involve in a nearestneighbor classification.
Comparing a graph to the whole training base is timeconsuming.
Let’s see how we can reduce the training set while keeping ahigh accuracy rate.
25 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
The underlying questionThe different kind of graph prototypesAn optimization algorithm dedicated to graph prototypesExperimental Results
The underlying question
Background
The selection of graph prototypes.
In a context of supervised classification.
Using structured data.
The underlying question.
How to find graph prototypes for a classification purpose ?
26 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
The underlying questionThe different kind of graph prototypesAn optimization algorithm dedicated to graph prototypesExperimental Results
The supervised classification of directed labeled graphs
Classification
Training set : Xtr = {x1, .., xL}Ll=1 with N classes.
Vectors of prototypes : V = {v1 1, .., vn m} with M prototypesper class et |V | = M × N .
Error on the test set: Enn(Xtest ; V ), Learning ofmeta-parameters.
Error on the validation base : Enn(Xval ; V ), Final results.
27 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
The underlying questionThe different kind of graph prototypesAn optimization algorithm dedicated to graph prototypesExperimental Results
An example: From symbol to graph
28 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
The underlying questionThe different kind of graph prototypesAn optimization algorithm dedicated to graph prototypesExperimental Results
The nature of graph prototypes
Graph prototypes nature stands apart in two ways :
The space the prototype belongs to.
The criteria in use to reach the prototype.
29 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
The underlying questionThe different kind of graph prototypesAn optimization algorithm dedicated to graph prototypesExperimental Results
Median Graphs
The criterion → minimization of the Sum Of Distances (SOD) for a given class.
Set Median Graphs.
Generalized Median Graphs.
Definition
Set Median Graph (smg)
smg = arg ming∈S
n∑i=1
d (g , gi )
Definition
Generalized Median Graphs (gmg)
gmg = arg ming∈U
n∑i=1
d (g , gi )
30 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
The underlying questionThe different kind of graph prototypesAn optimization algorithm dedicated to graph prototypesExperimental Results
Median Graphs
Definition
Set Median Graph (smg) → belongs to the training set.
Definition
Generalized Median Graph (gmg) → do not belong to the training set.
Can be computed invidually for each class.
30 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
The underlying questionThe different kind of graph prototypesAn optimization algorithm dedicated to graph prototypesExperimental Results
Discriminative Graphs
The Criterion → minimization of the error of classification on a test set.
Set Discriminative Graphs.
Generalized Discriminative Graphs.
Definition
Set Discriminative Graphs (sdg)
sdg = arg ming∈S
Enn(Xtest ; V )
Definition
Generalized Discriminative Graphs (gdg)
gdg = arg ming∈U
Enn(Xtest ; V )
31 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
The underlying questionThe different kind of graph prototypesAn optimization algorithm dedicated to graph prototypesExperimental Results
Discriminative Graphs
Definition
Set Discriminative Graph (sdg) → belongs to the training set.
Definition
Generalized Discriminative Graph (gdg) → do not belong to the training set.
is calculated over the whole test set (multi-class)
31 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
The underlying questionThe different kind of graph prototypesAn optimization algorithm dedicated to graph prototypesExperimental Results
2 Classes (Red, Green): Dissimilarity space.
32 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
The underlying questionThe different kind of graph prototypesAn optimization algorithm dedicated to graph prototypesExperimental Results
2 Classes : Set Median Graphs (SMG)
33 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
The underlying questionThe different kind of graph prototypesAn optimization algorithm dedicated to graph prototypesExperimental Results
2 Classes : Generalized Median Graphs (GMG)
34 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
The underlying questionThe different kind of graph prototypesAn optimization algorithm dedicated to graph prototypesExperimental Results
2 Classes : Set Discriminative Graphs (SDG)
35 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
The underlying questionThe different kind of graph prototypesAn optimization algorithm dedicated to graph prototypesExperimental Results
2 Classes : Generalized Discriminative Graphs (GDG)
36 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
The underlying questionThe different kind of graph prototypesAn optimization algorithm dedicated to graph prototypesExperimental Results
Genetic Algorithm(GA) dedicated to graph prototypes
An optimization problem
A criterion minimization (SOD or Enn);
The M-prototypes searched is a NP-Complete problem[JIANG 01].
GAs are well suited to solve this kind of non-analyticaldilemma [GOLDBERG 89].
37 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
The underlying questionThe different kind of graph prototypesAn optimization algorithm dedicated to graph prototypesExperimental Results
Fitness function
Median Graphs : SOD minimization.
Generalized Graphs : The error on a test base: Enn(Xtest ; V ).
Required: A fast graph dissimilarity measure
38 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
The underlying questionThe different kind of graph prototypesAn optimization algorithm dedicated to graph prototypesExperimental Results
Experimental results: The data sets
Table: Characteristics and specifications
Synthetic (A) GREC (B) Ferrer (C) Letter (D)Number of classes (N) 50 10 32 15
| Training | 10596 86 7200 3796| Test | 3532 56 2400 1266
| Validation | 14101 56 3200 1688Average number of nodes 12.03 5.56 8.84 4.7Average number of edges 9.86 11.71 10.15 3.6Average degree of nodes 1.63 4.21 1.15 1.3
39 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
The underlying questionThe different kind of graph prototypesAn optimization algorithm dedicated to graph prototypesExperimental Results
Convergence of the Genetic Algorithm
40 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
The underlying questionThe different kind of graph prototypesAn optimization algorithm dedicated to graph prototypesExperimental Results
Classification using M prototypes per class
41 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
The underlying questionThe different kind of graph prototypesAn optimization algorithm dedicated to graph prototypesExperimental Results
Conclusion
Contribution
A genetic algorithm dedicated to graph prototypes. Generation of different kindof prototype.
The possibility to produce M prototypes per classes.
Our conclusions
Generalized Median Graph has a better modeling ability than Set median Graph.
Generating M prototypes per class increase the performance in classification.
Generalized Discriminative Graphs: Enn(Xtest ; V ) <= Enn(Xtest ; Xtr )
Food for thoughts
To introduce probabilistic models into the GA’ functions.
To specialize GA’ function for the symbol application.
42 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
The underlying questionThe different kind of graph prototypesAn optimization algorithm dedicated to graph prototypesExperimental Results
Transition
We have presented some general applications of graphcomparison
Next slides are dedicated to the use of graph distances in acontext of performance evaluation
43 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Evaluation of Vectorized Documents
Overview:
Performance Evaluation(PE) has become of first interestduring the last years.
A contest on this topic: Since GREC’95 and every two years
The goal:
A need for standard protocols to compare and evaluatemethods.
To establish a solid knowledge of the state of the art
To determine the weaknesses and strengths
44 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Evaluation of Vectorized Documents
Overview:
Performance Evaluation(PE) has become of first interestduring the last years.
A contest on this topic: Since GREC’95 and every two years
The goal:
A need for standard protocols to compare and evaluatemethods.
To establish a solid knowledge of the state of the art
To determine the weaknesses and strengths
44 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Evaluation of Vectorized Documents
Overview:
Performance Evaluation(PE) has become of first interestduring the last years.
A contest on this topic: Since GREC’95 and every two years
The goal:
A need for standard protocols to compare and evaluatemethods.
To establish a solid knowledge of the state of the art
To determine the weaknesses and strengths
44 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Evaluation of Vectorized Documents
Overview:
Performance Evaluation(PE) has become of first interestduring the last years.
A contest on this topic: Since GREC’95 and every two years
The goal:
A need for standard protocols to compare and evaluatemethods.
To establish a solid knowledge of the state of the art
To determine the weaknesses and strengths
44 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Evaluation of Vectorized Documents
Overview:
Performance Evaluation(PE) has become of first interestduring the last years.
A contest on this topic: Since GREC’95 and every two years
The goal:
A need for standard protocols to compare and evaluatemethods.
To establish a solid knowledge of the state of the art
To determine the weaknesses and strengths
44 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Evaluation of Vectorized Documents
Overview:
Performance Evaluation(PE) has become of first interestduring the last years.
A contest on this topic: Since GREC’95 and every two years
The goal:
A need for standard protocols to compare and evaluatemethods.
To establish a solid knowledge of the state of the art
To determine the weaknesses and strengths
44 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Evaluation of Vectorized Documents
Overview:
Performance Evaluation(PE) has become of first interestduring the last years.
A contest on this topic: Since GREC’95 and every two years
The goal:
A need for standard protocols to compare and evaluatemethods.
To establish a solid knowledge of the state of the art
To determine the weaknesses and strengths
44 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Digital library
Graphic documents:
Automatic indexing
Symbol descriptorRelational indexing iDrop cap indexingMap indexing (ALPAGEproject)
45 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Digital library
Graphic documents:
Automatic indexing
Symbol descriptorRelational indexing iDrop cap indexingMap indexing (ALPAGEproject)
45 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Digital library
Graphic documents:
Automatic indexing
Symbol descriptorRelational indexing iDrop cap indexingMap indexing (ALPAGEproject)
45 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Digital library
Graphic documents:
Automatic indexing
Symbol descriptorRelational indexing iDrop cap indexingMap indexing (ALPAGEproject)
45 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Around performance evaluation for R2V system I
Performance evaluation of vectorization and line detection hasbeen reported by [Kong 1996], [Hori 1996], [Wenyin 1997] and[Chhabra 1998].
A method for evaluating the recognition of dashed lines
Hori and Doermann [Hori 1996], a measurement methodologyfor task-specific raster to vector conversion
Wenyin and Dori [Wenyin 1997], a protocol for evaluating therecognition of straight and circular lines
Phillips and Chhabra [Chhabra 1998], a methodology forevaluating graphics recognition systems operating on imagesthat contain straight lines and text blocks
46 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Around performance evaluation for R2V system I
Performance evaluation of vectorization and line detection hasbeen reported by [Kong 1996], [Hori 1996], [Wenyin 1997] and[Chhabra 1998].
A method for evaluating the recognition of dashed lines
Hori and Doermann [Hori 1996], a measurement methodologyfor task-specific raster to vector conversion
Wenyin and Dori [Wenyin 1997], a protocol for evaluating therecognition of straight and circular lines
Phillips and Chhabra [Chhabra 1998], a methodology forevaluating graphics recognition systems operating on imagesthat contain straight lines and text blocks
46 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Around performance evaluation for R2V system I
Performance evaluation of vectorization and line detection hasbeen reported by [Kong 1996], [Hori 1996], [Wenyin 1997] and[Chhabra 1998].
A method for evaluating the recognition of dashed lines
Hori and Doermann [Hori 1996], a measurement methodologyfor task-specific raster to vector conversion
Wenyin and Dori [Wenyin 1997], a protocol for evaluating therecognition of straight and circular lines
Phillips and Chhabra [Chhabra 1998], a methodology forevaluating graphics recognition systems operating on imagesthat contain straight lines and text blocks
46 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Around performance evaluation for R2V system I
Performance evaluation of vectorization and line detection hasbeen reported by [Kong 1996], [Hori 1996], [Wenyin 1997] and[Chhabra 1998].
A method for evaluating the recognition of dashed lines
Hori and Doermann [Hori 1996], a measurement methodologyfor task-specific raster to vector conversion
Wenyin and Dori [Wenyin 1997], a protocol for evaluating therecognition of straight and circular lines
Phillips and Chhabra [Chhabra 1998], a methodology forevaluating graphics recognition systems operating on imagesthat contain straight lines and text blocks
46 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Around performance evaluation for R2V system I
Performance evaluation of vectorization and line detection hasbeen reported by [Kong 1996], [Hori 1996], [Wenyin 1997] and[Chhabra 1998].
A method for evaluating the recognition of dashed lines
Hori and Doermann [Hori 1996], a measurement methodologyfor task-specific raster to vector conversion
Wenyin and Dori [Wenyin 1997], a protocol for evaluating therecognition of straight and circular lines
Phillips and Chhabra [Chhabra 1998], a methodology forevaluating graphics recognition systems operating on imagesthat contain straight lines and text blocks
46 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Around performance evaluation for R2V system II
All these methods are limited in their applicability to theALPAGE project.
All prior works focused on a lower level of consistency (arcsand segments) where we need an evaluation at polygon level.
Modification of previous methods to a polygon entity is nottrivial
A higher level requires a matching task when segments do not
We propose an extension to polygon level of relatedapproaches
Evaluation of Vectorized Documents by means of PolygonAssignments and a Graph-Based Dissimilarity
47 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Around performance evaluation for R2V system II
All these methods are limited in their applicability to theALPAGE project.
All prior works focused on a lower level of consistency (arcsand segments) where we need an evaluation at polygon level.
Modification of previous methods to a polygon entity is nottrivial
A higher level requires a matching task when segments do not
We propose an extension to polygon level of relatedapproaches
Evaluation of Vectorized Documents by means of PolygonAssignments and a Graph-Based Dissimilarity
47 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Around performance evaluation for R2V system II
All these methods are limited in their applicability to theALPAGE project.
All prior works focused on a lower level of consistency (arcsand segments) where we need an evaluation at polygon level.
Modification of previous methods to a polygon entity is nottrivial
A higher level requires a matching task when segments do not
We propose an extension to polygon level of relatedapproaches
Evaluation of Vectorized Documents by means of PolygonAssignments and a Graph-Based Dissimilarity
47 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Around performance evaluation for R2V system II
All these methods are limited in their applicability to theALPAGE project.
All prior works focused on a lower level of consistency (arcsand segments) where we need an evaluation at polygon level.
Modification of previous methods to a polygon entity is nottrivial
A higher level requires a matching task when segments do not
We propose an extension to polygon level of relatedapproaches
Evaluation of Vectorized Documents by means of PolygonAssignments and a Graph-Based Dissimilarity
47 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Around performance evaluation for R2V system II
All these methods are limited in their applicability to theALPAGE project.
All prior works focused on a lower level of consistency (arcsand segments) where we need an evaluation at polygon level.
Modification of previous methods to a polygon entity is nottrivial
A higher level requires a matching task when segments do not
We propose an extension to polygon level of relatedapproaches
Evaluation of Vectorized Documents by means of PolygonAssignments and a Graph-Based Dissimilarity
47 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Around performance evaluation for R2V system II
All these methods are limited in their applicability to theALPAGE project.
All prior works focused on a lower level of consistency (arcsand segments) where we need an evaluation at polygon level.
Modification of previous methods to a polygon entity is nottrivial
A higher level requires a matching task when segments do not
We propose an extension to polygon level of relatedapproaches
Evaluation of Vectorized Documents by means of PolygonAssignments and a Graph-Based Dissimilarity
47 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Around performance evaluation for R2V system II
All these methods are limited in their applicability to theALPAGE project.
All prior works focused on a lower level of consistency (arcsand segments) where we need an evaluation at polygon level.
Modification of previous methods to a polygon entity is nottrivial
A higher level requires a matching task when segments do not
We propose an extension to polygon level of relatedapproaches
Evaluation of Vectorized Documents by means of PolygonAssignments and a Graph-Based Dissimilarity
47 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Problem definition
Two issues about the evaluation of the:
1 Polygon detection
2 Polygon approximation
48 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Problem definition: Polygon detection
Given two sets ofpolygons, D1 and D2.
Associated together witha weight functionC : D1 × D2 → RFind a mappingf : D1 → D2 such thatthe cost function Eq. 1 isminimized∑
p∈D1
C (p, f (p)), (1)
where p is a polygon
Figure: Polygon partitions. (up)D1; (down) D2
49 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Problem definition: Polygon approximation
Given two polygons P1, P2 with N and M points, respectively.
The approximation error between P1 and P2 , d(P1,P2).
(a) P1 (b) P2
Figure: Polygons to be compared.
50 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Toward a proposition for evaluating polygon detectionalgorithms
Our proposal for assessing the quality of polygon detection system:
Two viewpoints:
1 Polygon location
2 Polygon approximation
A Local Evaluation of Vectorized Documents by means of PolygonAssignments and Matching
51 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Toward a proposition for evaluating polygon detectionalgorithms
Our proposal for assessing the quality of polygon detection system:
Two viewpoints:
1 Polygon location
2 Polygon approximation
A Local Evaluation of Vectorized Documents by means of PolygonAssignments and Matching
51 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Overview
1 A bipartite graph weighedby the symmetricdifference
To evaluate how wellpolygons are detectedand located
2 A cycle graph editdistance applied topolygons
The correctness of thepolygonalapproximation(Vectorizationprecision).
52 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Overview
1 A bipartite graph weighedby the symmetricdifference
To evaluate how wellpolygons are detectedand located
2 A cycle graph editdistance applied topolygons
The correctness of thepolygonalapproximation(Vectorizationprecision).
52 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
From graph to polygon
(a) Polygon (b) Cycle Graph
Figure: From polygon to cycle graph
The problem turns into a graph comparison problem.53 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Graph comparison
Figure: A possible edit path between graph g1 and g2 (node labels arerepresented by different shades of grey)[Riesen 2009]
The cost functions for attributed cycle graph matching are:
Table: Edit costs
Node Edge
Label Sub-stitution
γ((lAi )→ (lBj )) =
∣∣∣∣∣ lAi|A| −
lBj|B|
∣∣∣∣∣ γ((ΦAi )→ (ΦB
j )) =|ΦA
i −ΦBj |
360
Addition γ(λ→ (lBj )) =lBj|B| γ(λ→ (ΦB
j )) =|ΦB
j |360
Deletion γ((lAi )→ λ) =lAi|A| γ((ΦA
i )→ λ) =|ΦA
i |360
54 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Operation on polygons
Editing a vectorization withthe basic operations are:
Add
Delete
Move
Impact on the graphrepresentation
Through linearcombinations
It is possible to recreatethe usages of a personmodifying a vectorization
(a) Original (b) Add
(c) Delete (d) Move
Figure: Basic edit operationsapplied to a polygon.
55 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Operation on polygons
Editing a vectorization withthe basic operations are:
Add
Delete
Move
Impact on the graphrepresentation
Through linearcombinations
It is possible to recreatethe usages of a personmodifying a vectorization
(a) Original (b) Add
(c) Delete (d) Move
Figure: Basic edit operationsapplied to a polygon.
55 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Operation on polygons
Editing a vectorization withthe basic operations are:
Add
Delete
Move
Impact on the graphrepresentation
Through linearcombinations
It is possible to recreatethe usages of a personmodifying a vectorization
(a) Original (b) Add
(c) Delete (d) Move
Figure: Basic edit operationsapplied to a polygon.
55 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Operation on polygons
Editing a vectorization withthe basic operations are:
Add
Delete
Move
Impact on the graphrepresentation
Through linearcombinations
It is possible to recreatethe usages of a personmodifying a vectorization
(a) Original (b) Add
(c) Delete (d) Move
Figure: Basic edit operationsapplied to a polygon.
55 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Operation on polygons
Editing a vectorization withthe basic operations are:
Add
Delete
Move
Impact on the graphrepresentation
Through linearcombinations
It is possible to recreatethe usages of a personmodifying a vectorization
(a) Original (b) Add
(c) Delete (d) Move
Figure: Basic edit operationsapplied to a polygon.
55 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Operation on polygons
Editing a vectorization withthe basic operations are:
Add
Delete
Move
Impact on the graphrepresentation
Through linearcombinations
It is possible to recreatethe usages of a personmodifying a vectorization
(a) Original (b) Add
(c) Delete (d) Move
Figure: Basic edit operationsapplied to a polygon.
55 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Impact on the graph representation
Edit operation for segment deletion
Delete a segment = Delete a node and an edge into the graph
We conclude:
deletions = γ((lAi )→ λ) + γ((ΦAi )→ λ)
Theses deletions create an orphan edge that must be reconnected
Consequently we conclude:
substitution = γ((ΦAi )→ (ΦB
j ))
56 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Impact on the graph representation
Edit operation for segment deletion
Delete a segment = Delete a node and an edge into the graph
We conclude:
deletions = γ((lAi )→ λ) + γ((ΦAi )→ λ)
Theses deletions create an orphan edge that must be reconnected
Consequently we conclude:
substitution = γ((ΦAi )→ (ΦB
j ))
56 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Impact on the graph representation
Edit operation for segment deletion
Delete a segment = Delete a node and an edge into the graph
We conclude:
deletions = γ((lAi )→ λ) + γ((ΦAi )→ λ)
Theses deletions create an orphan edge that must be reconnected
Consequently we conclude:
substitution = γ((ΦAi )→ (ΦB
j ))
56 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Impact on the graph representation
Edit operation for segment deletion
Delete a segment = Delete a node and an edge into the graph
We conclude:
deletions = γ((lAi )→ λ) + γ((ΦAi )→ λ)
Theses deletions create an orphan edge that must be reconnected
Consequently we conclude:
substitution = γ((ΦAi )→ (ΦB
j ))
56 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Impact on the graph representation
Edit operation for segment deletion
Finally, the sequence of operations is:
γ(si → λ) = deletions + substitution
Simply, we swap formal expressions by their corresponding costs:
γ(si → λ) =lAi|A|
+ΦA
i
360+|ΦA
i − ΦBj |
360
57 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Impact on the graph representation
Edit operation for segment deletion
Finally, the sequence of operations is:
γ(si → λ) = deletions + substitution
Simply, we swap formal expressions by their corresponding costs:
γ(si → λ) =lAi|A|
+ΦA
i
360+|ΦA
i − ΦBj |
360
57 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Experiments
Database description
Protocol definition
Tests:
Polygon Matching Distance (PMD)Matched Edit Distance (MED)Cadastral parcel retrieval evaluation
are PMD and MED representative of polygon deformations ?
Shape variationPolygonal approximation modification
58 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Experiments
Database description
Protocol definition
Tests:
Polygon Matching Distance (PMD)Matched Edit Distance (MED)Cadastral parcel retrieval evaluation
are PMD and MED representative of polygon deformations ?
Shape variationPolygonal approximation modification
58 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Experiments
Database description
Protocol definition
Tests:
Polygon Matching Distance (PMD)Matched Edit Distance (MED)Cadastral parcel retrieval evaluation
are PMD and MED representative of polygon deformations ?
Shape variationPolygonal approximation modification
58 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Experiments
Database description
Protocol definition
Tests:
Polygon Matching Distance (PMD)Matched Edit Distance (MED)Cadastral parcel retrieval evaluation
are PMD and MED representative of polygon deformations ?
Shape variationPolygonal approximation modification
58 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Databases I : Polygon location
Base A Shape distortion: Derived from [Delalandre 2010] and[Dosch 2006]
To evaluate polygon detection methods
Figure: A sample among the seventy symbols used in our ranking test.
Figure: Examples of increasing levels of vectorial distortion59 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Databases II : Polygonal approximation evaluation
Base B Binary degradation: From the data set provided bythe GREC’03 contest.
The higher is the noise level the higher are the distortions onthe polygonal approximation.
60 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Databases III
61 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Databases III
61 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Databases IV
Base C Cadastral map collection from ALPAGE project
Computer generated elements (CG).
Manually vectorized references (GT).
(a) GT (b) CG
Figure: Two vectorizations to be mapped (|DCG | = 46 |DGT | = 40).
62 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Application to the evaluation of parcel detection I
A visual dissimilarity measure of local anomalies:
Comparing maps two by two.It facilitates the spotting of errorsA visual signs are worth a thousand words
(a) GT (b) CG
Figure: Two vectorizations to be mapped (|DCG | = 46 |DGT | = 40).
63 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Application to the evaluation of parcel detection II
Figure: Local dissimilarities between the two maps. The lighter thebetter.
64 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Summary I
A protocol for performance evaluation of polygon detectionalgorithms.
Our protocol is positioned as an extension of prior works, anextension at polygon level.
Our contribution is two-fold
An object mapping algorithm to roughly locate errors withinthe drawing.A cycle graph matching distance that depicts the accuracy ofthe polygonal approximation.
65 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Summary I
A protocol for performance evaluation of polygon detectionalgorithms.
Our protocol is positioned as an extension of prior works, anextension at polygon level.
Our contribution is two-fold
An object mapping algorithm to roughly locate errors withinthe drawing.A cycle graph matching distance that depicts the accuracy ofthe polygonal approximation.
65 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Summary I
A protocol for performance evaluation of polygon detectionalgorithms.
Our protocol is positioned as an extension of prior works, anextension at polygon level.
Our contribution is two-fold
An object mapping algorithm to roughly locate errors withinthe drawing.A cycle graph matching distance that depicts the accuracy ofthe polygonal approximation.
65 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Summary II
Both contributions were theoretically defined and adapted tothe PE of polygonized documents.
A set distance for the polygon matching distance (PMD)Dedicated edit costs for the graph matching method (MED)
The behavior of our set of indices was analyzed whenincreasing image degradation.
66 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Summary II
Both contributions were theoretically defined and adapted tothe PE of polygonized documents.
A set distance for the polygon matching distance (PMD)Dedicated edit costs for the graph matching method (MED)
The behavior of our set of indices was analyzed whenincreasing image degradation.
66 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Summary II
Both contributions were theoretically defined and adapted tothe PE of polygonized documents.
A set distance for the polygon matching distance (PMD)Dedicated edit costs for the graph matching method (MED)
The behavior of our set of indices was analyzed whenincreasing image degradation.
66 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
State of the artProblem statementOur proposal: Polygon detection qualitySummary
Summary II
Both contributions were theoretically defined and adapted tothe PE of polygonized documents.
A set distance for the polygon matching distance (PMD)Dedicated edit costs for the graph matching method (MED)
The behavior of our set of indices was analyzed whenincreasing image degradation.
66 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statement
Ontology inference from instances
Figure: Map interpretation strategy
67 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statement
Ontology inference from instances
Figure: UML representation designed thanks to the Eclipse ModelingFramework (EMF)
68 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statement
Ontology inference from instances
69 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statement
Ontology inference from instances
Figure: Data flow process for meta-model inference from a modelinstance
70 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statement
To take the stock on our work
Summary:
Graph-based representation
Graph comparison
Graph learning
Vectorized Document evaluation
71 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statement
To take the stock on our work
Summary:
Graph-based representation
Graph comparison
Graph learning
Vectorized Document evaluation
71 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statement
To take the stock on our work
Summary:
Graph-based representation
Graph comparison
Graph learning
Vectorized Document evaluation
71 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statement
To take the stock on our work
Summary:
Graph-based representation
Graph comparison
Graph learning
Vectorized Document evaluation
71 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statement
Perspective
Near future:
Graph: To check out the influence of subgraph matching withdifferent depths (1,2,...,|G |).
Will it increase the accuracy in classification ?What about time consumption ?
Performance Evaluation: To extend method to morecomplex objects than polygons.
To extend the concept to connected segments around the rootpolygon in order to constitute piece of symbols.To change the scope of our performance evaluation tool to thedirection of object spotting.
72 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statement
Perspective
Near future:
Graph: To check out the influence of subgraph matching withdifferent depths (1,2,...,|G |).
Will it increase the accuracy in classification ?What about time consumption ?
Performance Evaluation: To extend method to morecomplex objects than polygons.
To extend the concept to connected segments around the rootpolygon in order to constitute piece of symbols.To change the scope of our performance evaluation tool to thedirection of object spotting.
72 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statement
Food for thought
Long-term future: Semantic
Graph Matching for Model or Ontology Comparison
Image Processing Driven by Knowledge
73 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statement
Food for thought
Long-term future: Semantic
Graph Matching for Model or Ontology Comparison
Image Processing Driven by Knowledge
73 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statement
Key contributions
Graph classification: Published in PRL:
R. Raveaux J.-C. Burie and J.-M. Ogier. “A graph matchingmethod and a graph matching distance based on subgraphassignments”, Pattern Recognition Letters, 2009.
Performance Evaluation: Published in IJDAR:
R. Raveaux, J-C Burie and J-M Ogier. ”A Local Evaluation ofVectorized Documents by means of Polygon Assignments andMatching”,International Journal on Document Analysis andRecognition, 2010.
Graph mining : On the way, round 2 in CVIU:
R. Raveaux, S. Adam, P. Heroux, E. Trupin. ”Learning GraphPrototypes for Shape Recognition”, Computer Vision andImage Understanding .
74 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statement
Thank you for your attention
Some links:Software: http://alpage-l3i.univ-lr.frContact: http://romain.raveaux.free.frContest: http://iapr-tc10.univ-lr.fr
75 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statement
A Chhabra and I Phillips.The Second International Graphics Recognition Contest -Raster to Vector Conversion: A Report.Graphics Recognition: Algorithms and Systems, Lecture Notesin Computer Science, Springer, vol. 1389, 1998.
Mathieu Delalandre, Ernest Valveny, Tony Pridmore andDimosthenis Karatzas.Generation of synthetic documents for performance evaluationof symbol recognition & spotting systems.International Journal on Document Analysis and Recognition,page Online first, 2010.
Philippe Dosch and Ernest Valveny.Report on the Second Symbol Recognition Contest, 2006.
Steven Gold and Anand Rangarajan.Graph matching by graduated assignment.
75 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statement
IEEE transactions on pattern analysis and machineintelligence, pages 239–244, 1996.
O Hori and D S Doermann.Quantitative measurement of the performance ofraster-to-vector conversion algorithms.Graphics recognition – methods and applications (LectureNotes in Computer Science), vol. 1072, pages 57–68, 1996.
B Kong, I T Phillips, R M Haralick, A Prasad and R Kasturi.A benchmark: performance evaluation of dashed-line detectionalgorithms.Graphics recognition – methods and applications (LectureNotes in Computer Science), vol. 1072, pages 270–285, 1996.
H W Kuhn.The Hungarian method for the assignment problem.Naval Research Logistic Quarterly, vol. 2, pages 83–97, 1955.
75 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statement
J Llados, E Marti and J J Villanueva.Symbol Recognition by Error-Tolerant Subgraph Matchingbetween Region Adjacency Graphs.IEEE Transactions on Pattern Analysis and MachineIntelligence, vol. 23, pages 1137–1143, 2001.
Kaspar Riesen and Horst Bunke.Approximate graph edit distance computation by means ofbipartite graph matching.Image Vision Comput., vol. 27, no. 7, pages 950–959, 2009.
Ali Shokoufandeh, Lars Bretzner, Diego Macrini, M FatihDemirci, Clas Jonsson and Sven Dickinson.The representation and matching of categorical shape.Computer Vision and Image Understanding, vol. 103, no. 2,pages 139–154, 2006.
Liu Wenyin and Dov Dori.
75 / 75
Graph comparisonPrototype-Based Classification of Graphs
Evaluation of Vectorized DocumentsOntology generator from RDF documents
Problem statement
A protocol for performance evaluation of line detectionalgorithms.Machine Vision and Applications, vol. 9, no. 5, pages 240–250,1997.
75 / 75