Fouille et classification de graphes : Application à l...

Preview:

Citation preview

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Fouille et classification de graphes :Application a l’analyse d’image cadastrale.

Romain Raveaux

L3I lab (EA 2118) – Universite de La Rochelle

Nantes le 24 Fevrier, 2011Expose au LINA

Encadrants: J-M. OgierJ-C. Burie

1 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Fiche Signaletique

• Situation actuelle :

A.T.E.R. a l’IUT d’informatique de La Rochelle (InstitutUniversitaire de Technologie)Sections 27 et 61

2 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

• Enseignement :

Maintenant:Module de reseaux

− Volume: 24h; Niveau: IUT 2; Type: CM, TD, TP

Programmation Android

− Volume: 14h; Niveau: IUT 2; Type: CM, TD, TP

Les annees precedentes:Module Statistiques

− Volume: 36h; Niveau: Master 1; Type: CM, TD

Image et Ontologie

− Volume: 13h; Niveau: Master 2; Type: CM, TD, TP

...

Pour un volume total de 336H equivalent TD.

3 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

• Domaine de recherche :

code 22 : Graphes, combinatoire, complexitecode 92 : Vision par ordinateur, reconnaissance de formescode 91 : Traitement et analyse d’images, de son, de signauxCodes fournis par la nomenclature thematique section 27 duCNU

• Responsabilite/Cooperation :

Collaboration scientifique avec l’entreprise SOOD (Thematiquede la securite documentaire)Co-organisateur du concours international de reconnaissancede symboles

4 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Mes excuses pour ce melange des genres

5 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Graphs are everywhere...

Graphs in Reality

Graphs model objects andtheir relationships.

Also referred to asnetworks.

All common datastructures can be modeledas graphs.

How similar are two graphs?

Graph similarity is thecentral problem for alllearning tasks such asclustering andclassification on graphs.

6 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Graphs are everywhere...

Graphs in Reality

Graphs model objects andtheir relationships.

Also referred to asnetworks.

All common datastructures can be modeledas graphs.

How similar are two graphs?

Graph similarity is thecentral problem for alllearning tasks such asclustering andclassification on graphs.

6 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Graphs are everywhere...

Graphs in Reality

Graphs model objects andtheir relationships.

Also referred to asnetworks.

All common datastructures can be modeledas graphs.

How similar are two graphs?

Graph similarity is thecentral problem for alllearning tasks such asclustering andclassification on graphs.

6 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Graphs are everywhere...

Graphs in Reality

Graphs model objects andtheir relationships.

Also referred to asnetworks.

All common datastructures can be modeledas graphs.

How similar are two graphs?

Graph similarity is thecentral problem for alllearning tasks such asclustering andclassification on graphs.

6 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

From the beginning...

Definition and notation of a graph:

Definition

Let LV and LE denote the set of node and edge labels, respectively.A labeled graph G is a 4-tuple G = (V ,E , µ, ξ) , where

V is the set of nodes,

E ⊆ V × V is the set of edges

µ : V → LV is a function assigning labels to the nodes, and

ξ : E → LE is a function assigning labels to the edges.

Let G1 = (V1,E1, µ1, ξ1) be the source graph

And G2 = (V2,E2, µ2, ξ2) the target graph

With V1 = (u1, ..., un) and V2 = (v1, ..., vm) respectively

7 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Graph isomorphism

Graph isomorphism

Find a mapping f : V1 → V2

i.e. x , y ∈ V1 ⇒ (x , y) ∈ E1

f is an isomorphism iff (f (x), f (y)) is an edge of G2.No polynomial-time algorithm is known for graph isomorphismNeither it is known to be NP-complete

Subgraph isomorphism

Means finding a subgraph G3 of G2 such that G1 and G3 areisomorphic.Subgraph isomorphism is NP-complete

8 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Graph isomorphism

Graph isomorphism

Find a mapping f : V1 → V2

i.e. x , y ∈ V1 ⇒ (x , y) ∈ E1

f is an isomorphism iff (f (x), f (y)) is an edge of G2.No polynomial-time algorithm is known for graph isomorphismNeither it is known to be NP-complete

Subgraph isomorphism

Means finding a subgraph G3 of G2 such that G1 and G3 areisomorphic.Subgraph isomorphism is NP-complete

8 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Graph isomorphism

Graph isomorphism

Find a mapping f : V1 → V2

i.e. x , y ∈ V1 ⇒ (x , y) ∈ E1

f is an isomorphism iff (f (x), f (y)) is an edge of G2.No polynomial-time algorithm is known for graph isomorphismNeither it is known to be NP-complete

Subgraph isomorphism

Means finding a subgraph G3 of G2 such that G1 and G3 areisomorphic.Subgraph isomorphism is NP-complete

8 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Graph isomorphism

Graph isomorphism

Find a mapping f : V1 → V2

i.e. x , y ∈ V1 ⇒ (x , y) ∈ E1

f is an isomorphism iff (f (x), f (y)) is an edge of G2.No polynomial-time algorithm is known for graph isomorphismNeither it is known to be NP-complete

Subgraph isomorphism

Means finding a subgraph G3 of G2 such that G1 and G3 areisomorphic.Subgraph isomorphism is NP-complete

8 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Graph isomorphism

Graph isomorphism

Find a mapping f : V1 → V2

i.e. x , y ∈ V1 ⇒ (x , y) ∈ E1

f is an isomorphism iff (f (x), f (y)) is an edge of G2.No polynomial-time algorithm is known for graph isomorphismNeither it is known to be NP-complete

Subgraph isomorphism

Means finding a subgraph G3 of G2 such that G1 and G3 areisomorphic.Subgraph isomorphism is NP-complete

8 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Graph isomorphism

Graph isomorphism

Find a mapping f : V1 → V2

i.e. x , y ∈ V1 ⇒ (x , y) ∈ E1

f is an isomorphism iff (f (x), f (y)) is an edge of G2.No polynomial-time algorithm is known for graph isomorphismNeither it is known to be NP-complete

Subgraph isomorphism

Means finding a subgraph G3 of G2 such that G1 and G3 areisomorphic.Subgraph isomorphism is NP-complete

8 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Graph isomorphism

Graph isomorphism

Find a mapping f : V1 → V2

i.e. x , y ∈ V1 ⇒ (x , y) ∈ E1

f is an isomorphism iff (f (x), f (y)) is an edge of G2.No polynomial-time algorithm is known for graph isomorphismNeither it is known to be NP-complete

Subgraph isomorphism

Means finding a subgraph G3 of G2 such that G1 and G3 areisomorphic.Subgraph isomorphism is NP-complete

8 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Graph isomorphism

Graph isomorphism

Find a mapping f : V1 → V2

i.e. x , y ∈ V1 ⇒ (x , y) ∈ E1

f is an isomorphism iff (f (x), f (y)) is an edge of G2.No polynomial-time algorithm is known for graph isomorphismNeither it is known to be NP-complete

Subgraph isomorphism

Means finding a subgraph G3 of G2 such that G1 and G3 areisomorphic.Subgraph isomorphism is NP-complete

8 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Graph isomorphism

Graph isomorphism

Find a mapping f : V1 → V2

i.e. x , y ∈ V1 ⇒ (x , y) ∈ E1

f is an isomorphism iff (f (x), f (y)) is an edge of G2.No polynomial-time algorithm is known for graph isomorphismNeither it is known to be NP-complete

Subgraph isomorphism

Means finding a subgraph G3 of G2 such that G1 and G3 areisomorphic.Subgraph isomorphism is NP-complete

8 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Error-tolerant graph isomorphism

Exact graph matching is useless in many computer visionapplications

Concerning graph matching under noise and distortion

The matching incorporates an error model to identify thedistortions which make one graph a distorted version of theother

Problems for real world applications

Error-tolerant

To measure the similarity of two graphs.

Runtime may grow exponentially with number of nodes

This is an enormous problem for large datasets of graphs

Wanted: Polynomial-time similarity measure for graphs

9 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Error-tolerant graph isomorphism

Exact graph matching is useless in many computer visionapplications

Concerning graph matching under noise and distortion

The matching incorporates an error model to identify thedistortions which make one graph a distorted version of theother

Problems for real world applications

Error-tolerant

To measure the similarity of two graphs.

Runtime may grow exponentially with number of nodes

This is an enormous problem for large datasets of graphs

Wanted: Polynomial-time similarity measure for graphs

9 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Error-tolerant graph isomorphism

Exact graph matching is useless in many computer visionapplications

Concerning graph matching under noise and distortion

The matching incorporates an error model to identify thedistortions which make one graph a distorted version of theother

Problems for real world applications

Error-tolerant

To measure the similarity of two graphs.

Runtime may grow exponentially with number of nodes

This is an enormous problem for large datasets of graphs

Wanted: Polynomial-time similarity measure for graphs

9 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Error-tolerant graph isomorphism

Exact graph matching is useless in many computer visionapplications

Concerning graph matching under noise and distortion

The matching incorporates an error model to identify thedistortions which make one graph a distorted version of theother

Problems for real world applications

Error-tolerant

To measure the similarity of two graphs.

Runtime may grow exponentially with number of nodes

This is an enormous problem for large datasets of graphs

Wanted: Polynomial-time similarity measure for graphs

9 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Error-tolerant graph isomorphism

Exact graph matching is useless in many computer visionapplications

Concerning graph matching under noise and distortion

The matching incorporates an error model to identify thedistortions which make one graph a distorted version of theother

Problems for real world applications

Error-tolerant

To measure the similarity of two graphs.

Runtime may grow exponentially with number of nodes

This is an enormous problem for large datasets of graphs

Wanted: Polynomial-time similarity measure for graphs

9 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Error-tolerant graph isomorphism

Exact graph matching is useless in many computer visionapplications

Concerning graph matching under noise and distortion

The matching incorporates an error model to identify thedistortions which make one graph a distorted version of theother

Problems for real world applications

Error-tolerant

To measure the similarity of two graphs.

Runtime may grow exponentially with number of nodes

This is an enormous problem for large datasets of graphs

Wanted: Polynomial-time similarity measure for graphs

9 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Error-tolerant graph isomorphism

Exact graph matching is useless in many computer visionapplications

Concerning graph matching under noise and distortion

The matching incorporates an error model to identify thedistortions which make one graph a distorted version of theother

Problems for real world applications

Error-tolerant

To measure the similarity of two graphs.

Runtime may grow exponentially with number of nodes

This is an enormous problem for large datasets of graphs

Wanted: Polynomial-time similarity measure for graphs

9 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Error-tolerant graph isomorphism

Exact graph matching is useless in many computer visionapplications

Concerning graph matching under noise and distortion

The matching incorporates an error model to identify thedistortions which make one graph a distorted version of theother

Problems for real world applications

Error-tolerant

To measure the similarity of two graphs.

Runtime may grow exponentially with number of nodes

This is an enormous problem for large datasets of graphs

Wanted: Polynomial-time similarity measure for graphs

9 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Error-tolerant graph isomorphism

Exact graph matching is useless in many computer visionapplications

Concerning graph matching under noise and distortion

The matching incorporates an error model to identify thedistortions which make one graph a distorted version of theother

Problems for real world applications

Error-tolerant

To measure the similarity of two graphs.

Runtime may grow exponentially with number of nodes

This is an enormous problem for large datasets of graphs

Wanted: Polynomial-time similarity measure for graphs

9 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Problem statement

A dissimilarity measure is a function : d : X × X → Rwhere X is the representation space for the object description.

Criteria for a good graph measure of similarity

ExpressiveEfficient to computeApplicable to wide range of graphs

10 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Problem statement

A dissimilarity measure is a function : d : X × X → Rwhere X is the representation space for the object description.

Criteria for a good graph measure of similarity

ExpressiveEfficient to computeApplicable to wide range of graphs

10 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Problem statement

A dissimilarity measure is a function : d : X × X → Rwhere X is the representation space for the object description.

Criteria for a good graph measure of similarity

ExpressiveEfficient to computeApplicable to wide range of graphs

10 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Problem statement

A dissimilarity measure is a function : d : X × X → Rwhere X is the representation space for the object description.

Criteria for a good graph measure of similarity

ExpressiveEfficient to computeApplicable to wide range of graphs

10 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Graph Edit Distance (ED)

The minimum amount of distortion that is needed to transform G1

into G2

Distortions si : deletions, insertions, substitutions of nodes andedges.

Edit path S = s1, ..., sn: A sequence of edit operations thattransforms G1 into G2.

Cost functions: Measuring the strength of a given distortion.

Edit distance d(G1,G2): Minimum cost edit path between twographs.

Problem of Edit Distance: NP complete

Explore the space of all possible mappings of the nodes andedges of G1 to the nodes and edges of G2.

Edit Distance computation also has a worst case exponentialcomplexity which prevents its use in large datasets.

11 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Graph Edit Distance (ED)

The minimum amount of distortion that is needed to transform G1

into G2

Distortions si : deletions, insertions, substitutions of nodes andedges.

Edit path S = s1, ..., sn: A sequence of edit operations thattransforms G1 into G2.

Cost functions: Measuring the strength of a given distortion.

Edit distance d(G1,G2): Minimum cost edit path between twographs.

Problem of Edit Distance: NP complete

Explore the space of all possible mappings of the nodes andedges of G1 to the nodes and edges of G2.

Edit Distance computation also has a worst case exponentialcomplexity which prevents its use in large datasets.

11 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Graph Edit Distance (ED)

The minimum amount of distortion that is needed to transform G1

into G2

Distortions si : deletions, insertions, substitutions of nodes andedges.

Edit path S = s1, ..., sn: A sequence of edit operations thattransforms G1 into G2.

Cost functions: Measuring the strength of a given distortion.

Edit distance d(G1,G2): Minimum cost edit path between twographs.

Problem of Edit Distance: NP complete

Explore the space of all possible mappings of the nodes andedges of G1 to the nodes and edges of G2.

Edit Distance computation also has a worst case exponentialcomplexity which prevents its use in large datasets.

11 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Graph Edit Distance (ED)

The minimum amount of distortion that is needed to transform G1

into G2

Distortions si : deletions, insertions, substitutions of nodes andedges.

Edit path S = s1, ..., sn: A sequence of edit operations thattransforms G1 into G2.

Cost functions: Measuring the strength of a given distortion.

Edit distance d(G1,G2): Minimum cost edit path between twographs.

Problem of Edit Distance: NP complete

Explore the space of all possible mappings of the nodes andedges of G1 to the nodes and edges of G2.

Edit Distance computation also has a worst case exponentialcomplexity which prevents its use in large datasets.

11 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Graph Edit Distance (ED)

The minimum amount of distortion that is needed to transform G1

into G2

Distortions si : deletions, insertions, substitutions of nodes andedges.

Edit path S = s1, ..., sn: A sequence of edit operations thattransforms G1 into G2.

Cost functions: Measuring the strength of a given distortion.

Edit distance d(G1,G2): Minimum cost edit path between twographs.

Problem of Edit Distance: NP complete

Explore the space of all possible mappings of the nodes andedges of G1 to the nodes and edges of G2.

Edit Distance computation also has a worst case exponentialcomplexity which prevents its use in large datasets.

11 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Graph Edit Distance (ED)

The minimum amount of distortion that is needed to transform G1

into G2

Distortions si : deletions, insertions, substitutions of nodes andedges.

Edit path S = s1, ..., sn: A sequence of edit operations thattransforms G1 into G2.

Cost functions: Measuring the strength of a given distortion.

Edit distance d(G1,G2): Minimum cost edit path between twographs.

Problem of Edit Distance: NP complete

Explore the space of all possible mappings of the nodes andedges of G1 to the nodes and edges of G2.

Edit Distance computation also has a worst case exponentialcomplexity which prevents its use in large datasets.

11 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Comparison between Classical Graph-Matching Methods

Table: In Terms of Their Computational Complexity and the Ability toPerform an Inexact Matching, [Llados 2001].

12 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Graph comparison through combinatorial optimization

Basic idea:

Methods are based on an optimization procedure mappinglocal substructuresAny node(un) from G1 can be assigned to any node(vm) of G2,Incurring some cost that depends on the un-vm assignment.It is required to map all nodes in such a way that the totalcost of the assignment is minimized.

Cost matrix representation (C ):

Cij correspond to the costs of assigning the i th node of G1 tothe j th node of G2.

C =

c1,1 ... ... c1,m

... ... ... ...

... ... ... ...cn,1 ... ... cn,m

13 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Graph comparison through combinatorial optimization

Basic idea:

Methods are based on an optimization procedure mappinglocal substructuresAny node(un) from G1 can be assigned to any node(vm) of G2,Incurring some cost that depends on the un-vm assignment.It is required to map all nodes in such a way that the totalcost of the assignment is minimized.

Cost matrix representation (C ):

Cij correspond to the costs of assigning the i th node of G1 tothe j th node of G2.

C =

c1,1 ... ... c1,m

... ... ... ...

... ... ... ...cn,1 ... ... cn,m

13 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Graph comparison through combinatorial optimization

Basic idea:

Methods are based on an optimization procedure mappinglocal substructuresAny node(un) from G1 can be assigned to any node(vm) of G2,Incurring some cost that depends on the un-vm assignment.It is required to map all nodes in such a way that the totalcost of the assignment is minimized.

Cost matrix representation (C ):

Cij correspond to the costs of assigning the i th node of G1 tothe j th node of G2.

C =

c1,1 ... ... c1,m

... ... ... ...

... ... ... ...cn,1 ... ... cn,m

13 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Graph comparison through combinatorial optimization

Basic idea:

Methods are based on an optimization procedure mappinglocal substructuresAny node(un) from G1 can be assigned to any node(vm) of G2,Incurring some cost that depends on the un-vm assignment.It is required to map all nodes in such a way that the totalcost of the assignment is minimized.

Cost matrix representation (C ):

Cij correspond to the costs of assigning the i th node of G1 tothe j th node of G2.

C =

c1,1 ... ... c1,m

... ... ... ...

... ... ... ...cn,1 ... ... cn,m

13 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Graph comparison through combinatorial optimization

Basic idea:

Methods are based on an optimization procedure mappinglocal substructuresAny node(un) from G1 can be assigned to any node(vm) of G2,Incurring some cost that depends on the un-vm assignment.It is required to map all nodes in such a way that the totalcost of the assignment is minimized.

Cost matrix representation (C ):

Cij correspond to the costs of assigning the i th node of G1 tothe j th node of G2.

13 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Combinatorial optimization: Comparative study

Node signature Distance

[Gold 1996] Node degree+Label *

[Shokoufandeh 2006] Eigen vector L2

[Riesen 2009] (1)Node+(2)Edge Edit cost* : Depends on the graph attribute type.

14 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Our proposal

A generalization of prior works

Where local substructures are represented as graphs

Where the cost function c(i , j) is a graph distance

15 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Our proposal

A generalization of prior works

Where local substructures are represented as graphs

Where the cost function c(i , j) is a graph distance

15 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Our proposal

A generalization of prior works

Where local substructures are represented as graphs

Where the cost function c(i , j) is a graph distance

15 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Our proposal

A generalization of prior works

Where local substructures are represented as graphs

Where the cost function c(i , j) is a graph distance

A graph matching method based on subgraph assignments

15 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Overview

A distance between graph

Subgraph decomposition

Optimization algorithm

Figure: Subgraph matching: A bipartite graph16 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Overview

A distance between graph

Subgraph decomposition

Optimization algorithm

Figure: Subgraph matching: A bipartite graph16 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Overview

A distance between graph

Subgraph decomposition

Optimization algorithm

Figure: Subgraph matching: A bipartite graph16 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Graph decomposition

A subgraph (sg):

A structure gathering theedges and theircorresponding endingvertices from a rootvertex.

Figure: Decomposition intosubgraph world

17 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Matrix representation I

The cost matrix contains the distances between every pair ofsubgraphs from G1 and G2.

What’s the best (minimum-cost) way to assign the subgraphs?

Assignment problem solved by the Hungarian method[Kuhn 1955]The cost of the minimum-weight subgraph matching :

SubGraph Matching Distance SGMD(G1,G2)

Example of possible variations of SGMD:

SGMDED : Based on edit distance.

SGMDGP : Based on graph probing.

C =

c1,1 ... ... c1,m

... ... ... ...

... ... ... ...cn,1 ... ... cn,m

18 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Matrix representation I

The cost matrix contains the distances between every pair ofsubgraphs from G1 and G2.

What’s the best (minimum-cost) way to assign the subgraphs?

Assignment problem solved by the Hungarian method[Kuhn 1955]The cost of the minimum-weight subgraph matching :

SubGraph Matching Distance SGMD(G1,G2)

Example of possible variations of SGMD:

SGMDED : Based on edit distance.

SGMDGP : Based on graph probing.

18 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Matrix representation I

The cost matrix contains the distances between every pair ofsubgraphs from G1 and G2.

What’s the best (minimum-cost) way to assign the subgraphs?

Assignment problem solved by the Hungarian method[Kuhn 1955]The cost of the minimum-weight subgraph matching :

SubGraph Matching Distance SGMD(G1,G2)

Example of possible variations of SGMD:

SGMDED : Based on edit distance.

SGMDGP : Based on graph probing.

18 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Matrix representation I

The cost matrix contains the distances between every pair ofsubgraphs from G1 and G2.

What’s the best (minimum-cost) way to assign the subgraphs?

Assignment problem solved by the Hungarian method[Kuhn 1955]The cost of the minimum-weight subgraph matching :

SubGraph Matching Distance SGMD(G1,G2)

Example of possible variations of SGMD:

SGMDED : Based on edit distance.

SGMDGP : Based on graph probing.

18 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Theoretical discussion

Our proposal is a pseudo metric

PositiveSymmetricTriangle inequality

SGMDED is a lower bound for the edit distance

∀G1,G2 :SGMDED(G1,G2)

max(|G1|, |G2|)≤ ED(G1,G2)

19 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Theoretical discussion

Our proposal is a pseudo metric

PositiveSymmetricTriangle inequality

SGMDED is a lower bound for the edit distance

∀G1,G2 :SGMDED(G1,G2)

max(|G1|, |G2|)≤ ED(G1,G2)

19 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Theoretical discussion

Our proposal is a pseudo metric

PositiveSymmetricTriangle inequality

SGMDED is a lower bound for the edit distance

∀G1,G2 :SGMDED(G1,G2)

max(|G1|, |G2|)≤ ED(G1,G2)

19 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Theoretical discussion

Our proposal is a pseudo metric

PositiveSymmetricTriangle inequality

SGMDED is a lower bound for the edit distance

∀G1,G2 :SGMDED(G1,G2)

max(|G1|, |G2|)≤ ED(G1,G2)

19 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Theoretical discussion

Our proposal is a pseudo metric

PositiveSymmetricTriangle inequality

SGMDED is a lower bound for the edit distance

∀G1,G2 :SGMDED(G1,G2)

max(|G1|, |G2|)≤ ED(G1,G2)

19 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Theoretical discussion

Our proposal is a pseudo metric

PositiveSymmetricTriangle inequality

SGMDED is a lower bound for the edit distance

∀G1,G2 :SGMDED(G1,G2)

max(|G1|, |G2|)≤ ED(G1,G2)

19 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Experiments

Hypothesis:

The more accurate the distance induced by graph matching is,the better the matching is.

The question turns into a graph distance comparison:

Correlation

Classification

20 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Experiments

Hypothesis:

The more accurate the distance induced by graph matching is,the better the matching is.

The question turns into a graph distance comparison:

Correlation

Classification

20 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Experiments

Hypothesis:

The more accurate the distance induced by graph matching is,the better the matching is.

The question turns into a graph distance comparison:

Correlation

Classification

20 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Databases

IAM Graph Database Repository (Standardized graph datasets for benchmarking).

Synthetic data set (Randomly generated for scalabilitytesting).

Home-made data sets (Domain-dependent applications).

Table: Characteristics of the four data sets used in our computationalexperiments

Base A Base B Base C Base D

Number of classes (N) 50 10 32 15| Training | 14128 114 9600 5062| Validation | 14101 56 3200 1688

Average number of nodes 12.03 5.56 8.84 4.7Average number of edges 9.86 11.71 10.15 3.6Average degree of nodes 1.63 4.21 1.15 1.3

21 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Databases

IAM Graph Database Repository (Standardized graph datasets for benchmarking).

Synthetic data set (Randomly generated for scalabilitytesting).

Home-made data sets (Domain-dependent applications).

Table: Characteristics of the four data sets used in our computationalexperiments

Base A Base B Base C Base D

Number of classes (N) 50 10 32 15| Training | 14128 114 9600 5062| Validation | 14101 56 3200 1688

Average number of nodes 12.03 5.56 8.84 4.7Average number of edges 9.86 11.71 10.15 3.6Average degree of nodes 1.63 4.21 1.15 1.3

21 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Databases

IAM Graph Database Repository (Standardized graph datasets for benchmarking).

Synthetic data set (Randomly generated for scalabilitytesting).

Home-made data sets (Domain-dependent applications).

Table: Characteristics of the four data sets used in our computationalexperiments

Base A Base B Base C Base D

Number of classes (N) 50 10 32 15| Training | 14128 114 9600 5062| Validation | 14101 56 3200 1688

Average number of nodes 12.03 5.56 8.84 4.7Average number of edges 9.86 11.71 10.15 3.6Average degree of nodes 1.63 4.21 1.15 1.3

21 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Protocol

Correlation between ED and suboptimal distances:

Rank correlation: Kendall correlationDistance correlation: Pearson correlation

Classification stage

1-NN classifier

22 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Protocol

Correlation between ED and suboptimal distances:

Rank correlation: Kendall correlationDistance correlation: Pearson correlation

Classification stage

1-NN classifier

22 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Protocol

Correlation between ED and suboptimal distances:

Rank correlation: Kendall correlationDistance correlation: Pearson correlation

Classification stage

1-NN classifier

22 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Protocol

Correlation between ED and suboptimal distances:

Rank correlation: Kendall correlationDistance correlation: Pearson correlation

Classification stage

1-NN classifier

22 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Protocol

Correlation between ED and suboptimal distances:

Rank correlation: Kendall correlationDistance correlation: Pearson correlation

Classification stage

1-NN classifier

22 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Rank relationship with edit distance I

23 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Rank relationship with edit distance I

23 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Bottom lines

Graph matching algorithm

Graph distance

Polynomial time complexity (O(n3))

Lower bound relation with the edit distance

Rank relation with edit distance

1-NN classifier is not negatively affected by using sub-optimaldistances.

Flexible distance with two meta-parameters:

Sub distanceSubgraph size

24 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Bottom lines

Graph matching algorithm

Graph distance

Polynomial time complexity (O(n3))

Lower bound relation with the edit distance

Rank relation with edit distance

1-NN classifier is not negatively affected by using sub-optimaldistances.

Flexible distance with two meta-parameters:

Sub distanceSubgraph size

24 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Bottom lines

Graph matching algorithm

Graph distance

Polynomial time complexity (O(n3))

Lower bound relation with the edit distance

Rank relation with edit distance

1-NN classifier is not negatively affected by using sub-optimaldistances.

Flexible distance with two meta-parameters:

Sub distanceSubgraph size

24 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Bottom lines

Graph matching algorithm

Graph distance

Polynomial time complexity (O(n3))

Lower bound relation with the edit distance

Rank relation with edit distance

1-NN classifier is not negatively affected by using sub-optimaldistances.

Flexible distance with two meta-parameters:

Sub distanceSubgraph size

24 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Bottom lines

Graph matching algorithm

Graph distance

Polynomial time complexity (O(n3))

Lower bound relation with the edit distance

Rank relation with edit distance

1-NN classifier is not negatively affected by using sub-optimaldistances.

Flexible distance with two meta-parameters:

Sub distanceSubgraph size

24 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Bottom lines

Graph matching algorithm

Graph distance

Polynomial time complexity (O(n3))

Lower bound relation with the edit distance

Rank relation with edit distance

1-NN classifier is not negatively affected by using sub-optimaldistances.

Flexible distance with two meta-parameters:

Sub distanceSubgraph size

24 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Bottom lines

Graph matching algorithm

Graph distance

Polynomial time complexity (O(n3))

Lower bound relation with the edit distance

Rank relation with edit distance

1-NN classifier is not negatively affected by using sub-optimaldistances.

Flexible distance with two meta-parameters:

Sub distanceSubgraph size

24 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statementState of the artOur proposal: Sub-graph matchingSummary

Transition

We have presented a graph distance involve in a nearestneighbor classification.

Comparing a graph to the whole training base is timeconsuming.

Let’s see how we can reduce the training set while keeping ahigh accuracy rate.

25 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

The underlying questionThe different kind of graph prototypesAn optimization algorithm dedicated to graph prototypesExperimental Results

The underlying question

Background

The selection of graph prototypes.

In a context of supervised classification.

Using structured data.

The underlying question.

How to find graph prototypes for a classification purpose ?

26 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

The underlying questionThe different kind of graph prototypesAn optimization algorithm dedicated to graph prototypesExperimental Results

The supervised classification of directed labeled graphs

Classification

Training set : Xtr = {x1, .., xL}Ll=1 with N classes.

Vectors of prototypes : V = {v1 1, .., vn m} with M prototypesper class et |V | = M × N .

Error on the test set: Enn(Xtest ; V ), Learning ofmeta-parameters.

Error on the validation base : Enn(Xval ; V ), Final results.

27 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

The underlying questionThe different kind of graph prototypesAn optimization algorithm dedicated to graph prototypesExperimental Results

An example: From symbol to graph

28 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

The underlying questionThe different kind of graph prototypesAn optimization algorithm dedicated to graph prototypesExperimental Results

The nature of graph prototypes

Graph prototypes nature stands apart in two ways :

The space the prototype belongs to.

The criteria in use to reach the prototype.

29 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

The underlying questionThe different kind of graph prototypesAn optimization algorithm dedicated to graph prototypesExperimental Results

Median Graphs

The criterion → minimization of the Sum Of Distances (SOD) for a given class.

Set Median Graphs.

Generalized Median Graphs.

Definition

Set Median Graph (smg)

smg = arg ming∈S

n∑i=1

d (g , gi )

Definition

Generalized Median Graphs (gmg)

gmg = arg ming∈U

n∑i=1

d (g , gi )

30 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

The underlying questionThe different kind of graph prototypesAn optimization algorithm dedicated to graph prototypesExperimental Results

Median Graphs

Definition

Set Median Graph (smg) → belongs to the training set.

Definition

Generalized Median Graph (gmg) → do not belong to the training set.

Can be computed invidually for each class.

30 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

The underlying questionThe different kind of graph prototypesAn optimization algorithm dedicated to graph prototypesExperimental Results

Discriminative Graphs

The Criterion → minimization of the error of classification on a test set.

Set Discriminative Graphs.

Generalized Discriminative Graphs.

Definition

Set Discriminative Graphs (sdg)

sdg = arg ming∈S

Enn(Xtest ; V )

Definition

Generalized Discriminative Graphs (gdg)

gdg = arg ming∈U

Enn(Xtest ; V )

31 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

The underlying questionThe different kind of graph prototypesAn optimization algorithm dedicated to graph prototypesExperimental Results

Discriminative Graphs

Definition

Set Discriminative Graph (sdg) → belongs to the training set.

Definition

Generalized Discriminative Graph (gdg) → do not belong to the training set.

is calculated over the whole test set (multi-class)

31 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

The underlying questionThe different kind of graph prototypesAn optimization algorithm dedicated to graph prototypesExperimental Results

2 Classes (Red, Green): Dissimilarity space.

32 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

The underlying questionThe different kind of graph prototypesAn optimization algorithm dedicated to graph prototypesExperimental Results

2 Classes : Set Median Graphs (SMG)

33 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

The underlying questionThe different kind of graph prototypesAn optimization algorithm dedicated to graph prototypesExperimental Results

2 Classes : Generalized Median Graphs (GMG)

34 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

The underlying questionThe different kind of graph prototypesAn optimization algorithm dedicated to graph prototypesExperimental Results

2 Classes : Set Discriminative Graphs (SDG)

35 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

The underlying questionThe different kind of graph prototypesAn optimization algorithm dedicated to graph prototypesExperimental Results

2 Classes : Generalized Discriminative Graphs (GDG)

36 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

The underlying questionThe different kind of graph prototypesAn optimization algorithm dedicated to graph prototypesExperimental Results

Genetic Algorithm(GA) dedicated to graph prototypes

An optimization problem

A criterion minimization (SOD or Enn);

The M-prototypes searched is a NP-Complete problem[JIANG 01].

GAs are well suited to solve this kind of non-analyticaldilemma [GOLDBERG 89].

37 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

The underlying questionThe different kind of graph prototypesAn optimization algorithm dedicated to graph prototypesExperimental Results

Fitness function

Median Graphs : SOD minimization.

Generalized Graphs : The error on a test base: Enn(Xtest ; V ).

Required: A fast graph dissimilarity measure

38 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

The underlying questionThe different kind of graph prototypesAn optimization algorithm dedicated to graph prototypesExperimental Results

Experimental results: The data sets

Table: Characteristics and specifications

Synthetic (A) GREC (B) Ferrer (C) Letter (D)Number of classes (N) 50 10 32 15

| Training | 10596 86 7200 3796| Test | 3532 56 2400 1266

| Validation | 14101 56 3200 1688Average number of nodes 12.03 5.56 8.84 4.7Average number of edges 9.86 11.71 10.15 3.6Average degree of nodes 1.63 4.21 1.15 1.3

39 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

The underlying questionThe different kind of graph prototypesAn optimization algorithm dedicated to graph prototypesExperimental Results

Convergence of the Genetic Algorithm

40 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

The underlying questionThe different kind of graph prototypesAn optimization algorithm dedicated to graph prototypesExperimental Results

Classification using M prototypes per class

41 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

The underlying questionThe different kind of graph prototypesAn optimization algorithm dedicated to graph prototypesExperimental Results

Conclusion

Contribution

A genetic algorithm dedicated to graph prototypes. Generation of different kindof prototype.

The possibility to produce M prototypes per classes.

Our conclusions

Generalized Median Graph has a better modeling ability than Set median Graph.

Generating M prototypes per class increase the performance in classification.

Generalized Discriminative Graphs: Enn(Xtest ; V ) <= Enn(Xtest ; Xtr )

Food for thoughts

To introduce probabilistic models into the GA’ functions.

To specialize GA’ function for the symbol application.

42 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

The underlying questionThe different kind of graph prototypesAn optimization algorithm dedicated to graph prototypesExperimental Results

Transition

We have presented some general applications of graphcomparison

Next slides are dedicated to the use of graph distances in acontext of performance evaluation

43 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Evaluation of Vectorized Documents

Overview:

Performance Evaluation(PE) has become of first interestduring the last years.

A contest on this topic: Since GREC’95 and every two years

The goal:

A need for standard protocols to compare and evaluatemethods.

To establish a solid knowledge of the state of the art

To determine the weaknesses and strengths

44 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Evaluation of Vectorized Documents

Overview:

Performance Evaluation(PE) has become of first interestduring the last years.

A contest on this topic: Since GREC’95 and every two years

The goal:

A need for standard protocols to compare and evaluatemethods.

To establish a solid knowledge of the state of the art

To determine the weaknesses and strengths

44 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Evaluation of Vectorized Documents

Overview:

Performance Evaluation(PE) has become of first interestduring the last years.

A contest on this topic: Since GREC’95 and every two years

The goal:

A need for standard protocols to compare and evaluatemethods.

To establish a solid knowledge of the state of the art

To determine the weaknesses and strengths

44 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Evaluation of Vectorized Documents

Overview:

Performance Evaluation(PE) has become of first interestduring the last years.

A contest on this topic: Since GREC’95 and every two years

The goal:

A need for standard protocols to compare and evaluatemethods.

To establish a solid knowledge of the state of the art

To determine the weaknesses and strengths

44 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Evaluation of Vectorized Documents

Overview:

Performance Evaluation(PE) has become of first interestduring the last years.

A contest on this topic: Since GREC’95 and every two years

The goal:

A need for standard protocols to compare and evaluatemethods.

To establish a solid knowledge of the state of the art

To determine the weaknesses and strengths

44 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Evaluation of Vectorized Documents

Overview:

Performance Evaluation(PE) has become of first interestduring the last years.

A contest on this topic: Since GREC’95 and every two years

The goal:

A need for standard protocols to compare and evaluatemethods.

To establish a solid knowledge of the state of the art

To determine the weaknesses and strengths

44 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Evaluation of Vectorized Documents

Overview:

Performance Evaluation(PE) has become of first interestduring the last years.

A contest on this topic: Since GREC’95 and every two years

The goal:

A need for standard protocols to compare and evaluatemethods.

To establish a solid knowledge of the state of the art

To determine the weaknesses and strengths

44 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Digital library

Graphic documents:

Automatic indexing

Symbol descriptorRelational indexing iDrop cap indexingMap indexing (ALPAGEproject)

45 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Digital library

Graphic documents:

Automatic indexing

Symbol descriptorRelational indexing iDrop cap indexingMap indexing (ALPAGEproject)

45 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Digital library

Graphic documents:

Automatic indexing

Symbol descriptorRelational indexing iDrop cap indexingMap indexing (ALPAGEproject)

45 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Digital library

Graphic documents:

Automatic indexing

Symbol descriptorRelational indexing iDrop cap indexingMap indexing (ALPAGEproject)

45 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Around performance evaluation for R2V system I

Performance evaluation of vectorization and line detection hasbeen reported by [Kong 1996], [Hori 1996], [Wenyin 1997] and[Chhabra 1998].

A method for evaluating the recognition of dashed lines

Hori and Doermann [Hori 1996], a measurement methodologyfor task-specific raster to vector conversion

Wenyin and Dori [Wenyin 1997], a protocol for evaluating therecognition of straight and circular lines

Phillips and Chhabra [Chhabra 1998], a methodology forevaluating graphics recognition systems operating on imagesthat contain straight lines and text blocks

46 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Around performance evaluation for R2V system I

Performance evaluation of vectorization and line detection hasbeen reported by [Kong 1996], [Hori 1996], [Wenyin 1997] and[Chhabra 1998].

A method for evaluating the recognition of dashed lines

Hori and Doermann [Hori 1996], a measurement methodologyfor task-specific raster to vector conversion

Wenyin and Dori [Wenyin 1997], a protocol for evaluating therecognition of straight and circular lines

Phillips and Chhabra [Chhabra 1998], a methodology forevaluating graphics recognition systems operating on imagesthat contain straight lines and text blocks

46 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Around performance evaluation for R2V system I

Performance evaluation of vectorization and line detection hasbeen reported by [Kong 1996], [Hori 1996], [Wenyin 1997] and[Chhabra 1998].

A method for evaluating the recognition of dashed lines

Hori and Doermann [Hori 1996], a measurement methodologyfor task-specific raster to vector conversion

Wenyin and Dori [Wenyin 1997], a protocol for evaluating therecognition of straight and circular lines

Phillips and Chhabra [Chhabra 1998], a methodology forevaluating graphics recognition systems operating on imagesthat contain straight lines and text blocks

46 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Around performance evaluation for R2V system I

Performance evaluation of vectorization and line detection hasbeen reported by [Kong 1996], [Hori 1996], [Wenyin 1997] and[Chhabra 1998].

A method for evaluating the recognition of dashed lines

Hori and Doermann [Hori 1996], a measurement methodologyfor task-specific raster to vector conversion

Wenyin and Dori [Wenyin 1997], a protocol for evaluating therecognition of straight and circular lines

Phillips and Chhabra [Chhabra 1998], a methodology forevaluating graphics recognition systems operating on imagesthat contain straight lines and text blocks

46 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Around performance evaluation for R2V system I

Performance evaluation of vectorization and line detection hasbeen reported by [Kong 1996], [Hori 1996], [Wenyin 1997] and[Chhabra 1998].

A method for evaluating the recognition of dashed lines

Hori and Doermann [Hori 1996], a measurement methodologyfor task-specific raster to vector conversion

Wenyin and Dori [Wenyin 1997], a protocol for evaluating therecognition of straight and circular lines

Phillips and Chhabra [Chhabra 1998], a methodology forevaluating graphics recognition systems operating on imagesthat contain straight lines and text blocks

46 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Around performance evaluation for R2V system II

All these methods are limited in their applicability to theALPAGE project.

All prior works focused on a lower level of consistency (arcsand segments) where we need an evaluation at polygon level.

Modification of previous methods to a polygon entity is nottrivial

A higher level requires a matching task when segments do not

We propose an extension to polygon level of relatedapproaches

Evaluation of Vectorized Documents by means of PolygonAssignments and a Graph-Based Dissimilarity

47 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Around performance evaluation for R2V system II

All these methods are limited in their applicability to theALPAGE project.

All prior works focused on a lower level of consistency (arcsand segments) where we need an evaluation at polygon level.

Modification of previous methods to a polygon entity is nottrivial

A higher level requires a matching task when segments do not

We propose an extension to polygon level of relatedapproaches

Evaluation of Vectorized Documents by means of PolygonAssignments and a Graph-Based Dissimilarity

47 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Around performance evaluation for R2V system II

All these methods are limited in their applicability to theALPAGE project.

All prior works focused on a lower level of consistency (arcsand segments) where we need an evaluation at polygon level.

Modification of previous methods to a polygon entity is nottrivial

A higher level requires a matching task when segments do not

We propose an extension to polygon level of relatedapproaches

Evaluation of Vectorized Documents by means of PolygonAssignments and a Graph-Based Dissimilarity

47 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Around performance evaluation for R2V system II

All these methods are limited in their applicability to theALPAGE project.

All prior works focused on a lower level of consistency (arcsand segments) where we need an evaluation at polygon level.

Modification of previous methods to a polygon entity is nottrivial

A higher level requires a matching task when segments do not

We propose an extension to polygon level of relatedapproaches

Evaluation of Vectorized Documents by means of PolygonAssignments and a Graph-Based Dissimilarity

47 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Around performance evaluation for R2V system II

All these methods are limited in their applicability to theALPAGE project.

All prior works focused on a lower level of consistency (arcsand segments) where we need an evaluation at polygon level.

Modification of previous methods to a polygon entity is nottrivial

A higher level requires a matching task when segments do not

We propose an extension to polygon level of relatedapproaches

Evaluation of Vectorized Documents by means of PolygonAssignments and a Graph-Based Dissimilarity

47 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Around performance evaluation for R2V system II

All these methods are limited in their applicability to theALPAGE project.

All prior works focused on a lower level of consistency (arcsand segments) where we need an evaluation at polygon level.

Modification of previous methods to a polygon entity is nottrivial

A higher level requires a matching task when segments do not

We propose an extension to polygon level of relatedapproaches

Evaluation of Vectorized Documents by means of PolygonAssignments and a Graph-Based Dissimilarity

47 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Around performance evaluation for R2V system II

All these methods are limited in their applicability to theALPAGE project.

All prior works focused on a lower level of consistency (arcsand segments) where we need an evaluation at polygon level.

Modification of previous methods to a polygon entity is nottrivial

A higher level requires a matching task when segments do not

We propose an extension to polygon level of relatedapproaches

Evaluation of Vectorized Documents by means of PolygonAssignments and a Graph-Based Dissimilarity

47 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Problem definition

Two issues about the evaluation of the:

1 Polygon detection

2 Polygon approximation

48 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Problem definition: Polygon detection

Given two sets ofpolygons, D1 and D2.

Associated together witha weight functionC : D1 × D2 → RFind a mappingf : D1 → D2 such thatthe cost function Eq. 1 isminimized∑

p∈D1

C (p, f (p)), (1)

where p is a polygon

Figure: Polygon partitions. (up)D1; (down) D2

49 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Problem definition: Polygon approximation

Given two polygons P1, P2 with N and M points, respectively.

The approximation error between P1 and P2 , d(P1,P2).

(a) P1 (b) P2

Figure: Polygons to be compared.

50 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Toward a proposition for evaluating polygon detectionalgorithms

Our proposal for assessing the quality of polygon detection system:

Two viewpoints:

1 Polygon location

2 Polygon approximation

A Local Evaluation of Vectorized Documents by means of PolygonAssignments and Matching

51 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Toward a proposition for evaluating polygon detectionalgorithms

Our proposal for assessing the quality of polygon detection system:

Two viewpoints:

1 Polygon location

2 Polygon approximation

A Local Evaluation of Vectorized Documents by means of PolygonAssignments and Matching

51 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Overview

1 A bipartite graph weighedby the symmetricdifference

To evaluate how wellpolygons are detectedand located

2 A cycle graph editdistance applied topolygons

The correctness of thepolygonalapproximation(Vectorizationprecision).

52 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Overview

1 A bipartite graph weighedby the symmetricdifference

To evaluate how wellpolygons are detectedand located

2 A cycle graph editdistance applied topolygons

The correctness of thepolygonalapproximation(Vectorizationprecision).

52 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

From graph to polygon

(a) Polygon (b) Cycle Graph

Figure: From polygon to cycle graph

The problem turns into a graph comparison problem.53 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Graph comparison

Figure: A possible edit path between graph g1 and g2 (node labels arerepresented by different shades of grey)[Riesen 2009]

The cost functions for attributed cycle graph matching are:

Table: Edit costs

Node Edge

Label Sub-stitution

γ((lAi )→ (lBj )) =

∣∣∣∣∣ lAi|A| −

lBj|B|

∣∣∣∣∣ γ((ΦAi )→ (ΦB

j )) =|ΦA

i −ΦBj |

360

Addition γ(λ→ (lBj )) =lBj|B| γ(λ→ (ΦB

j )) =|ΦB

j |360

Deletion γ((lAi )→ λ) =lAi|A| γ((ΦA

i )→ λ) =|ΦA

i |360

54 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Operation on polygons

Editing a vectorization withthe basic operations are:

Add

Delete

Move

Impact on the graphrepresentation

Through linearcombinations

It is possible to recreatethe usages of a personmodifying a vectorization

(a) Original (b) Add

(c) Delete (d) Move

Figure: Basic edit operationsapplied to a polygon.

55 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Operation on polygons

Editing a vectorization withthe basic operations are:

Add

Delete

Move

Impact on the graphrepresentation

Through linearcombinations

It is possible to recreatethe usages of a personmodifying a vectorization

(a) Original (b) Add

(c) Delete (d) Move

Figure: Basic edit operationsapplied to a polygon.

55 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Operation on polygons

Editing a vectorization withthe basic operations are:

Add

Delete

Move

Impact on the graphrepresentation

Through linearcombinations

It is possible to recreatethe usages of a personmodifying a vectorization

(a) Original (b) Add

(c) Delete (d) Move

Figure: Basic edit operationsapplied to a polygon.

55 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Operation on polygons

Editing a vectorization withthe basic operations are:

Add

Delete

Move

Impact on the graphrepresentation

Through linearcombinations

It is possible to recreatethe usages of a personmodifying a vectorization

(a) Original (b) Add

(c) Delete (d) Move

Figure: Basic edit operationsapplied to a polygon.

55 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Operation on polygons

Editing a vectorization withthe basic operations are:

Add

Delete

Move

Impact on the graphrepresentation

Through linearcombinations

It is possible to recreatethe usages of a personmodifying a vectorization

(a) Original (b) Add

(c) Delete (d) Move

Figure: Basic edit operationsapplied to a polygon.

55 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Operation on polygons

Editing a vectorization withthe basic operations are:

Add

Delete

Move

Impact on the graphrepresentation

Through linearcombinations

It is possible to recreatethe usages of a personmodifying a vectorization

(a) Original (b) Add

(c) Delete (d) Move

Figure: Basic edit operationsapplied to a polygon.

55 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Impact on the graph representation

Edit operation for segment deletion

Delete a segment = Delete a node and an edge into the graph

We conclude:

deletions = γ((lAi )→ λ) + γ((ΦAi )→ λ)

Theses deletions create an orphan edge that must be reconnected

Consequently we conclude:

substitution = γ((ΦAi )→ (ΦB

j ))

56 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Impact on the graph representation

Edit operation for segment deletion

Delete a segment = Delete a node and an edge into the graph

We conclude:

deletions = γ((lAi )→ λ) + γ((ΦAi )→ λ)

Theses deletions create an orphan edge that must be reconnected

Consequently we conclude:

substitution = γ((ΦAi )→ (ΦB

j ))

56 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Impact on the graph representation

Edit operation for segment deletion

Delete a segment = Delete a node and an edge into the graph

We conclude:

deletions = γ((lAi )→ λ) + γ((ΦAi )→ λ)

Theses deletions create an orphan edge that must be reconnected

Consequently we conclude:

substitution = γ((ΦAi )→ (ΦB

j ))

56 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Impact on the graph representation

Edit operation for segment deletion

Delete a segment = Delete a node and an edge into the graph

We conclude:

deletions = γ((lAi )→ λ) + γ((ΦAi )→ λ)

Theses deletions create an orphan edge that must be reconnected

Consequently we conclude:

substitution = γ((ΦAi )→ (ΦB

j ))

56 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Impact on the graph representation

Edit operation for segment deletion

Finally, the sequence of operations is:

γ(si → λ) = deletions + substitution

Simply, we swap formal expressions by their corresponding costs:

γ(si → λ) =lAi|A|

+ΦA

i

360+|ΦA

i − ΦBj |

360

57 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Impact on the graph representation

Edit operation for segment deletion

Finally, the sequence of operations is:

γ(si → λ) = deletions + substitution

Simply, we swap formal expressions by their corresponding costs:

γ(si → λ) =lAi|A|

+ΦA

i

360+|ΦA

i − ΦBj |

360

57 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Experiments

Database description

Protocol definition

Tests:

Polygon Matching Distance (PMD)Matched Edit Distance (MED)Cadastral parcel retrieval evaluation

are PMD and MED representative of polygon deformations ?

Shape variationPolygonal approximation modification

58 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Experiments

Database description

Protocol definition

Tests:

Polygon Matching Distance (PMD)Matched Edit Distance (MED)Cadastral parcel retrieval evaluation

are PMD and MED representative of polygon deformations ?

Shape variationPolygonal approximation modification

58 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Experiments

Database description

Protocol definition

Tests:

Polygon Matching Distance (PMD)Matched Edit Distance (MED)Cadastral parcel retrieval evaluation

are PMD and MED representative of polygon deformations ?

Shape variationPolygonal approximation modification

58 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Experiments

Database description

Protocol definition

Tests:

Polygon Matching Distance (PMD)Matched Edit Distance (MED)Cadastral parcel retrieval evaluation

are PMD and MED representative of polygon deformations ?

Shape variationPolygonal approximation modification

58 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Databases I : Polygon location

Base A Shape distortion: Derived from [Delalandre 2010] and[Dosch 2006]

To evaluate polygon detection methods

Figure: A sample among the seventy symbols used in our ranking test.

Figure: Examples of increasing levels of vectorial distortion59 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Databases II : Polygonal approximation evaluation

Base B Binary degradation: From the data set provided bythe GREC’03 contest.

The higher is the noise level the higher are the distortions onthe polygonal approximation.

60 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Databases III

61 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Databases III

61 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Databases IV

Base C Cadastral map collection from ALPAGE project

Computer generated elements (CG).

Manually vectorized references (GT).

(a) GT (b) CG

Figure: Two vectorizations to be mapped (|DCG | = 46 |DGT | = 40).

62 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Application to the evaluation of parcel detection I

A visual dissimilarity measure of local anomalies:

Comparing maps two by two.It facilitates the spotting of errorsA visual signs are worth a thousand words

(a) GT (b) CG

Figure: Two vectorizations to be mapped (|DCG | = 46 |DGT | = 40).

63 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Application to the evaluation of parcel detection II

Figure: Local dissimilarities between the two maps. The lighter thebetter.

64 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Summary I

A protocol for performance evaluation of polygon detectionalgorithms.

Our protocol is positioned as an extension of prior works, anextension at polygon level.

Our contribution is two-fold

An object mapping algorithm to roughly locate errors withinthe drawing.A cycle graph matching distance that depicts the accuracy ofthe polygonal approximation.

65 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Summary I

A protocol for performance evaluation of polygon detectionalgorithms.

Our protocol is positioned as an extension of prior works, anextension at polygon level.

Our contribution is two-fold

An object mapping algorithm to roughly locate errors withinthe drawing.A cycle graph matching distance that depicts the accuracy ofthe polygonal approximation.

65 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Summary I

A protocol for performance evaluation of polygon detectionalgorithms.

Our protocol is positioned as an extension of prior works, anextension at polygon level.

Our contribution is two-fold

An object mapping algorithm to roughly locate errors withinthe drawing.A cycle graph matching distance that depicts the accuracy ofthe polygonal approximation.

65 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Summary II

Both contributions were theoretically defined and adapted tothe PE of polygonized documents.

A set distance for the polygon matching distance (PMD)Dedicated edit costs for the graph matching method (MED)

The behavior of our set of indices was analyzed whenincreasing image degradation.

66 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Summary II

Both contributions were theoretically defined and adapted tothe PE of polygonized documents.

A set distance for the polygon matching distance (PMD)Dedicated edit costs for the graph matching method (MED)

The behavior of our set of indices was analyzed whenincreasing image degradation.

66 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Summary II

Both contributions were theoretically defined and adapted tothe PE of polygonized documents.

A set distance for the polygon matching distance (PMD)Dedicated edit costs for the graph matching method (MED)

The behavior of our set of indices was analyzed whenincreasing image degradation.

66 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

State of the artProblem statementOur proposal: Polygon detection qualitySummary

Summary II

Both contributions were theoretically defined and adapted tothe PE of polygonized documents.

A set distance for the polygon matching distance (PMD)Dedicated edit costs for the graph matching method (MED)

The behavior of our set of indices was analyzed whenincreasing image degradation.

66 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statement

Ontology inference from instances

Figure: Map interpretation strategy

67 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statement

Ontology inference from instances

Figure: UML representation designed thanks to the Eclipse ModelingFramework (EMF)

68 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statement

Ontology inference from instances

69 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statement

Ontology inference from instances

Figure: Data flow process for meta-model inference from a modelinstance

70 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statement

To take the stock on our work

Summary:

Graph-based representation

Graph comparison

Graph learning

Vectorized Document evaluation

71 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statement

To take the stock on our work

Summary:

Graph-based representation

Graph comparison

Graph learning

Vectorized Document evaluation

71 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statement

To take the stock on our work

Summary:

Graph-based representation

Graph comparison

Graph learning

Vectorized Document evaluation

71 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statement

To take the stock on our work

Summary:

Graph-based representation

Graph comparison

Graph learning

Vectorized Document evaluation

71 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statement

Perspective

Near future:

Graph: To check out the influence of subgraph matching withdifferent depths (1,2,...,|G |).

Will it increase the accuracy in classification ?What about time consumption ?

Performance Evaluation: To extend method to morecomplex objects than polygons.

To extend the concept to connected segments around the rootpolygon in order to constitute piece of symbols.To change the scope of our performance evaluation tool to thedirection of object spotting.

72 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statement

Perspective

Near future:

Graph: To check out the influence of subgraph matching withdifferent depths (1,2,...,|G |).

Will it increase the accuracy in classification ?What about time consumption ?

Performance Evaluation: To extend method to morecomplex objects than polygons.

To extend the concept to connected segments around the rootpolygon in order to constitute piece of symbols.To change the scope of our performance evaluation tool to thedirection of object spotting.

72 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statement

Food for thought

Long-term future: Semantic

Graph Matching for Model or Ontology Comparison

Image Processing Driven by Knowledge

73 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statement

Food for thought

Long-term future: Semantic

Graph Matching for Model or Ontology Comparison

Image Processing Driven by Knowledge

73 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statement

Key contributions

Graph classification: Published in PRL:

R. Raveaux J.-C. Burie and J.-M. Ogier. “A graph matchingmethod and a graph matching distance based on subgraphassignments”, Pattern Recognition Letters, 2009.

Performance Evaluation: Published in IJDAR:

R. Raveaux, J-C Burie and J-M Ogier. ”A Local Evaluation ofVectorized Documents by means of Polygon Assignments andMatching”,International Journal on Document Analysis andRecognition, 2010.

Graph mining : On the way, round 2 in CVIU:

R. Raveaux, S. Adam, P. Heroux, E. Trupin. ”Learning GraphPrototypes for Shape Recognition”, Computer Vision andImage Understanding .

74 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statement

Thank you for your attention

Some links:Software: http://alpage-l3i.univ-lr.frContact: http://romain.raveaux.free.frContest: http://iapr-tc10.univ-lr.fr

75 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statement

A Chhabra and I Phillips.The Second International Graphics Recognition Contest -Raster to Vector Conversion: A Report.Graphics Recognition: Algorithms and Systems, Lecture Notesin Computer Science, Springer, vol. 1389, 1998.

Mathieu Delalandre, Ernest Valveny, Tony Pridmore andDimosthenis Karatzas.Generation of synthetic documents for performance evaluationof symbol recognition & spotting systems.International Journal on Document Analysis and Recognition,page Online first, 2010.

Philippe Dosch and Ernest Valveny.Report on the Second Symbol Recognition Contest, 2006.

Steven Gold and Anand Rangarajan.Graph matching by graduated assignment.

75 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statement

IEEE transactions on pattern analysis and machineintelligence, pages 239–244, 1996.

O Hori and D S Doermann.Quantitative measurement of the performance ofraster-to-vector conversion algorithms.Graphics recognition – methods and applications (LectureNotes in Computer Science), vol. 1072, pages 57–68, 1996.

B Kong, I T Phillips, R M Haralick, A Prasad and R Kasturi.A benchmark: performance evaluation of dashed-line detectionalgorithms.Graphics recognition – methods and applications (LectureNotes in Computer Science), vol. 1072, pages 270–285, 1996.

H W Kuhn.The Hungarian method for the assignment problem.Naval Research Logistic Quarterly, vol. 2, pages 83–97, 1955.

75 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statement

J Llados, E Marti and J J Villanueva.Symbol Recognition by Error-Tolerant Subgraph Matchingbetween Region Adjacency Graphs.IEEE Transactions on Pattern Analysis and MachineIntelligence, vol. 23, pages 1137–1143, 2001.

Kaspar Riesen and Horst Bunke.Approximate graph edit distance computation by means ofbipartite graph matching.Image Vision Comput., vol. 27, no. 7, pages 950–959, 2009.

Ali Shokoufandeh, Lars Bretzner, Diego Macrini, M FatihDemirci, Clas Jonsson and Sven Dickinson.The representation and matching of categorical shape.Computer Vision and Image Understanding, vol. 103, no. 2,pages 139–154, 2006.

Liu Wenyin and Dov Dori.

75 / 75

Graph comparisonPrototype-Based Classification of Graphs

Evaluation of Vectorized DocumentsOntology generator from RDF documents

Problem statement

A protocol for performance evaluation of line detectionalgorithms.Machine Vision and Applications, vol. 9, no. 5, pages 240–250,1997.

75 / 75

Recommended