Graph Edit Distance
Benoît Gaüzère
June 30th, 2016
Introduction
GED computation
Experiments
Discussion
Structural Information
[Slide shows the first page of: Brown, R. D. and Martin, Y. C. (1996). "Use of Structure-Activity Data To Compare Structure-Based Clustering Methods and Descriptors for Use in Compound Selection." J. Chem. Inf. Comput. Sci., 36, 572-584.]
[Figure: two molecules drawn as labelled graphs]
Encoding structural information

Vectors
- Embedding in a Euclidean space
- All machine learning methods available
- Loss of structural information
[Figure: a molecule drawn as a labelled graph]
Graph definition

A graph G ∈ 𝒢, G = (V, E), is a set of nodes V connected by a set of edges E ⊆ V × V.

- If (v_i, v_j) ∈ E, then v_i is adjacent to v_j.
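As a concrete illustration (not from the slides), the graph definition above can be sketched in Python with plain dictionaries standing in for V, E and the labelling functions; node ids and labels are made up:

```python
# Hypothetical sketch of G = (V, E) with labelling functions l_v and l_e,
# using plain dicts; identifiers and labels are illustrative only.
node_labels = {0: "C", 1: "C", 2: "O"}               # l_v : V -> L_V (atoms)
edge_labels = {(0, 1): "single", (1, 2): "double"}   # l_e : E -> L_E (bonds)

def adjacent(u, v, edges=edge_labels):
    """v_i is adjacent to v_j iff (v_i, v_j) is in E (undirected here)."""
    return (u, v) in edges or (v, u) in edges

print(adjacent(0, 1), adjacent(0, 2))  # True False
```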
Labelling functions

Node labelling
- ℓ_v : V → L_V
- L_V: node label alphabet (symbolic, real-valued, ...)

Edge labelling
- ℓ_e : V × V → L_E
- L_E: edge label alphabet
Some graphs

Chemical science
- [Figure: a molecule drawn as a graph] Symbolic labels: atoms

Pattern Recognition
- Vector labels: shape characteristics
Why use graphs?

Graphs can handle structural information, BUT 𝒢 ≠ ℝ^N.
We need to define a dissimilarity measure.
Graph Edit Distance (GED)

[Figure: two molecules and the distortions transforming one into the other]

- Dissimilarity measure
- Quantifies a distortion
Edit path

Graph edit distance: minimal amount of distortion required to transform one graph into another.

- Edit path γ: sequence of edit operations e, γ = (e_1, ..., e_p)
- Elementary edit cost c(e)

[Figure: an edit path transforming one molecule into another, step by step]
Formal Definition

Edit path cost

c(γ) = Σ_{e∈γ} c(e)

Graph edit distance
- Edit distance: ged(G_1, G_2) = min_{γ∈P} c(γ)
- Optimal edit path: γ* ∈ argmin_{γ∈P} c(γ)
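The definitions above can be made concrete in a toy sketch: an edit path as a list of elementary operations and its cost c(γ) as the sum of the elementary costs. The cost values are assumed for illustration (they echo the protocol used later in the talk):

```python
# Toy sketch: an edit path gamma as a list of elementary operations,
# and its cost c(gamma) = sum over e in gamma of c(e). Costs are assumed.
COSTS = {"substitute": 1.0, "delete": 3.0, "insert": 3.0}

def path_cost(gamma):
    """c(gamma) = sum of the elementary edit costs c(e)."""
    return sum(COSTS[op] for (op, *_args) in gamma)

# One possible edit path: relabel a node, delete a node, insert a node.
gamma = [("substitute", "C", "N"), ("delete", "O"), ("insert", "H")]
print(path_cost(gamma))  # 7.0
```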
How to compute Graph Edit Distance?

Tree search
Tree search methods

A*
- Dijkstra-based algorithm
- Needs a heuristic h(p)
- Always finds a solution
- But may take a loooooong time: exponential number of edit paths

Depth-first search based algorithm [Abu-Aisheh et al., 2015]
- Based on a heuristic
- Limits the number of open paths
- Anytime algorithm: can return an approximation before termination
- Parallelizable
Correspondence between edit paths and mappings [Bougleux et al., 2015]

[Figure: an edit path between two molecules and the node mapping it induces]
Computing graph edit distance is equivalent to finding an optimal assignment between nodes.
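This equivalence can be checked by brute force on tiny graphs: enumerate every node assignment φ : V_1 ∪ {ε} → V_2 ∪ {ε}, compute the edit cost it induces, and keep the minimum. A hedged sketch (not the talk's implementation; toy costs, unlabelled undirected edges assumed):

```python
from itertools import permutations

EPS = None                      # epsilon node (deletion / insertion)
C_SUB, C_DEL, C_INS = 1, 3, 3   # assumed toy costs

def ged_brute_force(nodes1, edges1, nodes2, edges2):
    """Exact GED on tiny graphs: minimum induced edit cost over all node
    assignments phi. nodesX: {id: label}, edgesX: list of node-id pairs."""
    v1, v2 = list(nodes1), list(nodes2)
    pool = v2 + [EPS] * len(v1)           # pad so every phi stays injective
    e2 = {frozenset(e) for e in edges2}
    best = float("inf")
    for image in permutations(pool, len(v1)):
        phi = dict(zip(v1, image))
        cost = 0
        for u in v1:                      # node substitutions / removals
            if phi[u] is EPS:
                cost += C_DEL
            elif nodes1[u] != nodes2[phi[u]]:
                cost += C_SUB
        mapped = {t for t in phi.values() if t is not EPS}
        cost += C_INS * (len(v2) - len(mapped))       # node insertions
        covered = set()
        for (u, v) in edges1:             # edge operations induced by phi
            img = frozenset((phi[u], phi[v]))
            if EPS not in img and img in e2:
                covered.add(img)          # edge preserved (substitution, 0)
            else:
                cost += C_DEL             # edge removed
        cost += C_INS * len(e2 - covered)  # edges of G2 to insert
        best = min(best, cost)
    return best

# C-C-O path vs C-C edge: delete the O node (3) and its incident edge (3).
g1 = ({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])
g2 = ({0: "C", 1: "C"}, [(0, 1)])
print(ged_brute_force(*g1, *g2))  # 6
```

The enumeration is factorial in the graph size, which is exactly why the rest of the talk is about approximations.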
Cost

Mapping φ : V_1 ∪ {ε} → V_2 ∪ {ε}

S(V_1^ε, V_2^ε, φ) = Q_e(V_1^ε, V_2^ε, φ) + L_v(V_1^ε, V_2^ε, φ)

where Q_e is the edge cost and L_v the node cost.
Cost Matrix C

L_v(V_1^ε, V_2^ε, φ) = Σ_{v∈V_1, φ(v)≠ε} c(v → φ(v))   (node substitutions)
                     + Σ_{v∈V_1, φ(v)=ε} c(v → ε)       (node removals)
                     + Σ_{v∈V_2, v∉φ(V_1)} c(ε → v)     (node insertions)

These costs are gathered in a square (n+m) × (n+m) matrix C with the block structure

C = [ C_sub  C_del ]
    [ C_ins    0   ]

- C_sub (n × m): C_sub(i, j) = c(v_i^(1) → v_j^(2)), substitution costs
- C_del (n × n): c(v_i^(1) → ε) on the diagonal, ∞ elsewhere
- C_ins (m × m): c(ε → v_j^(2)) on the diagonal, ∞ elsewhere
- 0 (m × n): zero block (ε → ε)
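A sketch of this construction, with assumed toy costs; the assignment is solved by enumeration here purely for illustration, whereas a real implementation would use the Hungarian algorithm in O(n³):

```python
from itertools import permutations

INF = float("inf")

def build_cost_matrix(labels1, labels2, c_sub=1, c_del=3, c_ins=3):
    """(n+m) x (n+m) matrix C of the slide: substitutions top-left,
    deletions / insertions on two diagonals, zeros (eps -> eps) bottom-right.
    Label costs are toy values assumed for illustration."""
    n, m = len(labels1), len(labels2)
    C = [[INF] * (n + m) for _ in range(n + m)]
    for i in range(n):
        for j in range(m):
            C[i][j] = 0 if labels1[i] == labels2[j] else c_sub
        C[i][m + i] = c_del               # v_i -> eps (deletion diagonal)
    for j in range(m):
        C[n + j][j] = c_ins               # eps -> u_j (insertion diagonal)
    for r in range(n, n + m):
        for c in range(m, n + m):
            C[r][c] = 0                   # eps -> eps block
    return C

def lsap_brute(C):
    """Optimal assignment cost by enumeration (tiny inputs only)."""
    size = len(C)
    return min(sum(C[r][p[r]] for r in range(size))
               for p in permutations(range(size)))

C = build_cost_matrix(["C", "O"], ["C", "N"])
print(lsap_brute(C))  # 1: keep the C, substitute O -> N
```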
Node operations cost

Node cost induced by φ:

L_v(V_1^ε, V_2^ε, φ) = Σ_{i=1}^{|G_1|+|G_2|} Σ_{j=1}^{|G_1|+|G_2|} (X_φ ⊙ C)(i, j)

Vectorized version:

L_v(V_1^ε, V_2^ε, φ) = c^⊤ x_φ
Computation of edge costs

Edge operations cost I

Edge cost matrix D:

D(i, j, k, l) = c((i, j) → (k, l))

- (i, j) ∈ E_1 → ε: deletion operation
- ε → (k, l) ∈ E_2: insertion operation
- The edge mapping is induced by the node mapping
Edge operations cost II

Edge cost:

Q_e(V_1^ε, V_2^ε, φ) = Σ_{i,j,k,l=1}^{|G_1|+|G_2|} X_φ(i, k) D(i, j, k, l) X_φ(j, l)

Vectorized version:

Q_e(V_1^ε, V_2^ε, φ) = x_φ^⊤ D x_φ
QAP formulation of GED

S(x) = ½ x^⊤ D x + c^⊤ x    (edge cost + node cost)

S(x) = ½ x^⊤ Δ x

x* = argmin_{x∈Π} S(x)
An intractable problem

No guarantees on Δ
⇒ Non-convex problem
⇒ No polynomial-time algorithm for min_{x∈Π} S(x)

- NP-hard problem

Let's look for an approximation.
Approximation is overestimation

1. Any mapping corresponds to an edit path
2. Any edit path has a cost ≥ GED
3. Approximate mapping ⇔ overestimation of GED
A First Approach

Linear approximation of the QAP formulation:

x* = argmin_{x∈Π} c^⊤ x

- [Riesen and Bunke, 2009; Gaüzère et al., 2014; Carletti et al., 2015]
- Hungarian algorithm (O(n³))
- But: no structural information
Augmented Cost Matrix

Adding some structural information:

- C_ij: cost of mapping the neighbourhood of v_i to the neighbourhood of v′_j
  - Direct neighbourhood
  - Random walks
  - Subgraphs
- Complexity ↔ accuracy

[Figure: neighbourhoods of two mapped nodes]
A first QAP approach

x* = argmin_{x∈Π} ½ x^⊤ D x + c^⊤ x

Gradient descent approach
- Let's find a local minimum of S(x)
- Relax the problem to the continuous domain: S̃
- Some solvers exist
- But: no consideration of the discrete nature of the solution
Another strategy

Integer-Projected Fixed Point (IPFP) [Leordeanu et al., 2009]

Frank-Wolfe-like algorithm. Iterate until convergence:
1. Discrete resolution of the linearised QAP (gradient) → x_t
2. Line search between x_{t−1} and the solution found in step 1
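The two steps above can be sketched on a toy QAP instance. This is a hedged pure-Python illustration, not the published algorithm: the LSAP step is solved by enumeration, the line search uses the closed form of the 1-D quadratic, and the instance (D, c) is made up so that the swap permutation is optimal:

```python
from itertools import permutations

def dot(a, b): return sum(x * y for x, y in zip(a, b))
def matvec(M, v): return [dot(row, v) for row in M]

def qap_score(D, c, x):
    """S(x) = 1/2 x^T D x + c^T x (vectorised QAP objective)."""
    return 0.5 * dot(x, matvec(D, x)) + dot(c, x)

def perm_vectors(n):
    """All n x n permutation matrices, flattened row-major."""
    out = []
    for p in permutations(range(n)):
        x = [0.0] * (n * n)
        for i, j in enumerate(p):
            x[i * n + j] = 1.0
        out.append(x)
    return out

def ipfp(D, c, x, n, iters=30):
    """IPFP sketch: Frank-Wolfe iterations with a discrete (LSAP) step."""
    for _ in range(iters):
        g = [gi + ci for gi, ci in zip(matvec(D, x), c)]   # grad = Dx + c
        b = min(perm_vectors(n), key=lambda v: dot(g, v))  # step 1: LSAP
        d = [bi - xi for bi, xi in zip(b, x)]
        lin, quad = dot(g, d), dot(d, matvec(D, d))
        if lin >= 0:                                       # no descent left
            break
        t = 1.0 if quad <= 0 else min(1.0, -lin / quad)    # step 2: line search
        x = [xi + t * di for xi, di in zip(x, d)]
    return max(perm_vectors(n), key=lambda v: dot(x, v))   # project onto Pi

# Toy 2x2 instance where the swap permutation minimises S.
n = 2
D = [[0, 0, 0, 1], [0, 0, -1, 0], [0, -1, 0, 0], [1, 0, 0, 0]]
c = [2.0, 0.0, 0.0, 2.0]
x0 = [0.5] * (n * n)              # barycenter of Pi as starting point
best = ipfp(D, c, x0, n)
print(best, qap_score(D, c, best))  # [0.0, 1.0, 1.0, 0.0] -1.0
```

The final projection step is exactly the uncontrolled part discussed on the next slide: nothing guarantees the projected score stays close to the continuous one.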
Operating IPFP

At convergence
- The optimal continuous solution x̃ is stable
- A projection step is needed to map x̃ back onto Π

Uncontrolled loss
- No guarantee that S(x̃) ≈ S(x′)

Importance of initialization
- Local minimum
- Initialization is important [Carletti et al., 2015]
GNCCP approach [Liu and Qiao, 2014]

From a convex to a concave objective function:

S(x, ζ) = ζ (x^⊤ x) + (1 − |ζ|) S(x)

- ζ = 1: convex objective function
- ζ = −1: concave objective function
GNCCP algorithm

x = 0
For ζ = 1 → −1:
  1. x ← argmin_{x∈Π′} S(x, ζ)
  2. ζ ← ζ − 0.1

- Iterates ζ over a modified IPFP objective function
From ζ = 1 to ζ = 0

[Figure: sequence of plots showing the objective function as ζ decreases from 1 to 0]
From ζ = 0 to ζ = −1

[Figure: sequence of plots showing the objective function as ζ decreases from 0 to −1]
GNCCP vs IPFP

Pros
- No more need for initialization
- Converges towards a mapping matrix

Cons
- Complexity: iterates over IPFP
GREYC chemistry datasets [https://iapr-tc15.greyc.fr/links.html]

[Figure: sample molecules from the Alkane, Acyclic, MAO and PAH datasets]
Protocol

Arbitrary costs
- All substitutions: 1
- All insertions/deletions: 3

Relative error
- Accuracy measure
- Overestimation: the lowest approximation is the best one (d_opt)
- Relative error: (d(G_i, G_j) − d_opt(G_i, G_j)) / d_opt(G_i, G_j)
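The error measure above is a one-liner; a sketch with assumed numbers:

```python
def relative_error(d, d_opt):
    """(d(Gi, Gj) - d_opt(Gi, Gj)) / d_opt(Gi, Gj): overestimation of an
    approximation d relative to the lowest approximation found, d_opt."""
    return (d - d_opt) / d_opt

# Assumed example: an approximation of 9 against a best-found value of 6.
print(relative_error(9.0, 6.0))  # 0.5, i.e. 50% overestimation
```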
Relative errors

[Figure: bar chart of the % of relative error on each dataset (Alkane, Acyclic, MAO, PAH; y-axis 0–250) for LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw and GNCCP]
log Time vs Score Deviation

[Figure: four scatter plots (Alkane, Acyclic, MAO, PAH) of score deviation against log10 of time in seconds for LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw and GNCCP]
Analysis

Trade-off
- Accuracy depends on complexity
- Choose your method according to your priority

More complete analysis
- ICPR GED contest: https://gdc2016.greyc.fr
- Other methods + other datasets
GED Limitations

Mathematical properties
- GED is a distance
- But not a Euclidean one
- Impossible to derive a trivial kernel
- Use with caution in SVMs

Complexity
- Still hard to compute on larger graphs
- Accuracy hard to evaluate (lack of ground truth)
Outlooks

Algorithms
- Matrix optimization
- Edge-based mapping
- Optimization algorithms
40 42
Outlooks
Applications
I Explore graph space trough GEDI Median graph joint work with Paul
I Use of ged for classification
I How to set costs according to a task I rarr Metric learningI New Phd with Sebastien and Pierre
I Behavior of different methods
41 42
Conclusion

- GED is still an open problem
- Approximation algorithms exist...
- ...but there is still room for improvement
- Focus on applications
Abu-Aisheh, Z., Raveaux, R., Ramel, J.-Y., and Martineau, P. (2015). An Exact Graph Edit Distance Algorithm for Solving Pattern Recognition Problems. In 4th International Conference on Pattern Recognition Applications and Methods.

Bougleux, S., Brun, L., Carletti, V., Foggia, P., Gaüzère, B., and Vento, M. (2015). A quadratic assignment formulation of the graph edit distance. Technical report, Normandie Univ, NormaSTIC FR 3638, France.

Carletti, V., Gaüzère, B., Brun, L., and Vento, M. (2015). Approximate graph edit distance computation combining bipartite matching and exact neighborhood substructure distance. In GbRPR, volume 9069, pages 168–177. Springer.

Gaüzère, B., Bougleux, S., Riesen, K., and Brun, L. (2014). Approximate Graph Edit Distance Guided by Bipartite Matching of Bags of Walks. In Structural, Syntactic, and Statistical Pattern Recognition, volume 8621, pages 73–82. Springer.

Leordeanu, M., Hebert, M., and Sukthankar, R. (2009). An integer projected fixed point method for graph matching and MAP inference. In Advances in Neural Information Processing Systems, pages 1114–1122.

Liu, Z.-Y. and Qiao, H. (2014). GNCCP — Graduated NonConvexity and Concavity Procedure. Pattern Anal. Mach. Intell., 36(6):1258–1267.

Riesen, K. and Bunke, H. (2009). Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision Computing, 27:950–959.
Introduction
GED computation
Experiences
Discussion
Structural Information
Use of Structure- Activity Data To Compare Structure-Based Clustering Methods andDescriptors for Use in Compound Selection
Robert D Browndagger and Yvonne C MartinDagger
Pharmaceutical Products Division Abbott Laboratories D47EAP10 100 Abbott Park RoadAbbott Park Illinois 60064-3500
Received September 6 1995X
An evaluation of a variety of structure-based clustering methods for use in compound selection is presentedThe use of MACCS Unity and Daylight 2D descriptors Unity 3D rigid and flexible descriptors and twoin-house 3D descriptors based on potential pharmacophore points are considered The use of Wardrsquos andgroup-average hierarchical agglomerative Guenoche hierarchical divisive and Jarvis- Patrick nonhierarchicalclustering methods are compared The results suggest that 2D descriptors and hierarchical clustering methodsare best at separating biologically active molecules from inactives a prerequisite for a good compoundselection method In particular the combination of MACCS descriptors and Wardrsquos clustering was optimal
INTRODUCTION
The advent of high-throughput biological screening meth-ods have given pharmaceutical companies the ability toscreen many thousands of compounds in a short timeHowever there are many hundreds of thousands of com-pounds available both in-house and from commercial ven-dors Whilst it may be feasible to screen many or all of thecompounds available this is undesirable for reasons of costand time and may be unnecessary if it results in theproduction of some redundant information Therefore therehas been a great deal of interest in the use of compoundclustering techniques to aid in the selection of a representativesubset of all the compounds available1- 8 A similar problemfaces those interested in designing compounds for synthesisa good design will capture all the required information inthe minimum number of compounds9
Underpinning the compound selection methods is thesimilar property principle10 which states that structurallysimilar molecules will exhibit similar physicochemical andbiological properties Given a clustering method that cangroup structurally similar compounds together applicationof this principle implies that the selection or synthesis andtesting of representatives from each cluster produced froma set of compounds should be sufficient to understand thestructure- activity relationships of the whole set without theneed to test them allAn appropriate clustering method will ideally cluster all
similar compounds together whilst separating active andinactive compounds into different sets of clusters The firstfactor will ensure that every class of active compound isrepresented in the selected subset but that there is noredundancy The second factor will minimize the risk thatan inactive compound is selected as the representative of acluster containing one or more actives thereby missing aclass of active compoundsClustering is the process of dividing a set of entities into
subsets in which the members of each subset are similar toeach other but different from members of other subsets There
have been numerous cluster methods described generaldiscussions of many of these are given by Gordon11 byEverett12 and by Sneath and Sokal13 Several of thesemethods have be applied to clustering chemical structurescomprehensive reviews are given by Barnard and Downs14
and by Downs and Willett15 In outline the clusteringprocess for chemical structures is as follows(1) Select a set of attributes on which to base the
comparison of the structures These may be structuralfeatures andor physicochemical properties(2) Characterize every structure in the dataset in terms of
the attributes selected in step one(3) Calculate a coefficient of similarity dissimilarity or
distance between every pair of structures in the dataset basedon their attributes(4) Use a clustering method to group together similar
structures based on the coefficients calculated in step threeSome clustering methods may require the calculation ofsimilarity values between the new objects formed and theexisting objects(5) Analyze the resultant clusters or classification hierarchy
to determine which of the possible sets of clusters shouldbe chosenA number of methods are available both for the production
of descriptors in steps (1) and (2) and clusters in step (4)Whilst there are also a large number of coefficients that mightbe used in step (3) the choice of clustering method maydetermine which is best suitedIn this paper we present a study aimed at identifying the
most suitable descriptors and clustering methods for use incompound selection We have used a variety of methods tocluster sets of structures with known biological activities andevaluated the clusters produced according to their ability toseparate active and inactive compounds into different setsof clusters We have concerned ourselves with structurebased clustering For this the substructure search screensused in commercial database searching software have oftenbeen used as descriptors We have examined a number ofthese descriptors together with two developed in-house andhave considered the use of four commercially availableclustering methods
dagger brownrabbottcomDagger yvonnemartinabbottcomX Abstract published in AdVance ACS Abstracts January 15 1996
572 J Chem Inf Comput Sci 1996 36 572- 584
0095-2338961636-0572$12000 copy 1996 American Chemical Society
Cl
NN
CH3O
O
NH
NN
H3C O
O
HN
Cl
1 42
Encoding structural information
Vectors
I Embedding in an Euclidean space
All machine learning methods available
Loss of structural information
NN
C
O
C
2 42
Graph definition
A graph G isin GG = (V E) is a set of nodes V connected by a set ofedges E = V times V
If (vi vj) isin E then Vi is adjacent to Vj
3 42
Labelling function
Node labelling
I lv V rarr LVI LV Node label alphabet (symbolic real valued )
Edge labelling
I le V times V rarr LEI LE Edge label alphabet
4 42
Some graphs
Chemical science
NN
C
O
C
Symbolic labels atoms
Pattern Recognition
Vector labels Shape characteristics
5 42
Why use graphs
Graphs can handle structural informationBUTG 6= RN
We need to define a dissimilarity measure
6 42
Graph Edit Distance (GED)
O
C N
N
CC
CC
C
C
D
N
N
D
CC
C
CC
C
O
C N
N
CC
CC
C
C
D
N
N
D
CC
C
CC
C
I Dissimilarity measure
I Quantify a distortion
7 42
Edit path
Graph edit distanceMinimal amount of distortion required to transform one graph into another
I Edit path γ Sequence of edit operations e
γ = e1 ep
I Elementary edit cost c(e)
D
N
N
D
CC
C
CC
C
N
D
CC
C
CC
C
O
C N
N
CC
CC
C
C
N
N
CC
CC
C
C
C
O DN
D
N
8 42
Formal Definition
Edit path cost γ
c(γ) =sumeisinγ
c(e)
Graph edit distance
I Edit Distance ged(G1G2) = min
γisinP
sumc(γ)
I Optimal edit path
γlowast isin arg minγisinP
sumc(γ)
9 42
How to compute
Graph Edit Distance
Tree search
11 42
Tree search methods
Alowast
I Dijkstra-based algorithm
I Need an heuristic h(p)
Always find a solution
But may take a loooooong time
Exponential number of edit paths
Depth first search based algorithm[Abu-Aisheh et al 2015]
I Based on heuristic
I Limitation on the number of open paths
I Any time algorithm can return an approximation before termination
I Parallelizable
12 42
Correspondance between edit paths and mappings[Bougleux et al 2015]
D
N
N
D
CC
C
CC
C
N
D
CC
C
CC
C
O
C N
N
CC
CC
C
C
N
N
CC
CC
C
C
C
O DN
D
N
13 42
Correspondance between edit paths and mappings[Bougleux et al 2015]
C
O DN
D
N
D
N
N
D
CC
C
CC
C
O
C N
N
CC
CC
C
C
13 42
Computing graph edit distance
is equivalent to
finding an optimal assignment
between nodes
14 42
Cost
Mapping ϕ V1 cup εrarr V2 cup ε
S(Vε1Vε2 ϕ) = Qe(Vε1Vε2 ϕ)︸ ︷︷ ︸Edgersquos Cost
+ Lv(Vε1Vε2 ϕ)︸ ︷︷ ︸Nodersquos Cost
15 42
Cost Matrix C
Lv (V ε1 V
ε2 ϕ) =
sumvisinV1
c(v ϕ(v))
︸ ︷︷ ︸node substitutions
+sum
visinV1V1
c(v ε)
︸ ︷︷ ︸node removals
+sum
visinV2V2
c(ε v)
︸ ︷︷ ︸node insertions
c(v(1)1 rarr v
(2)1 ) middot middot middot c(v
(1)1 rarr v
(2)m ) c(v
(1)1 rarr ε) infin middot middot middot infin
infin
c(v(1)i rarr v
(2)j )
c(v
(1)i rarr ε)
c(v(1)n rarr v
(2)1 ) c(v
(1)n rarr v
(2)m ) infin c(v
(1)n rarr ε)
c(εrarr v(2)1 ) infin middot middot middot infin 0 middot middot middot 0
infin
c(εrarr v(2)j )
infin c(εrarr v(2)m ) 0 0
16 42
Node operations cost
Node cost induced by ϕ
Lv (V ε1 V
ε2 ϕ) =
|G1|+|G2|sumi=1
|G1|+|G2|sumj=1
(Xϕ C)(i j)
Vectorized version
Lv (V ε1 V
ε2 ϕ) = cgtxϕ
17 42
Computation of edge costs
Edges operations cost I
Edge cost matrix D
D(i j k l) = c((i j)rarr (k l))
I (i j) isin E1 rarr deletion operation
I (k l) isin E2 rarr insertion operation
I Edgersquos mapping is induced by nodes mapping
19 42
Edges operations cost II
Edgersquos cost
Qe(V ε1 V
ε2 ϕ) =
|G1|+|G2|sumi j k l=1
Xϕ(i k)D(i j k l)Xϕ(j l)
Vectorized version
Qe(V ε1 V
ε2 ϕ) = xgtϕDxϕ
20 42
QAP formulation of GED
S(x) =1
2xgtDx︸ ︷︷ ︸
Edgersquos cost
+ cgtx︸︷︷︸Nodersquos cost
S(x) =1
2xgt∆x
xlowast = arg minxisinΠ
S(x)
21 42
An intractable problem
No garanties on ∆
rArr Non convex problem
rArr No polynomial solution of minxisinΠ
S(x)
I NP-Hard problem
Letrsquos look for an approximation
22 42
Approximation is overestimation
1 Any mapping corresponds to an edit path
2 Any edit path has a cost ge GED
3 Approximate mappinghArr Overestimation of GED
23 42
A First Approach
Linear approximation
xlowast = minxisinΠ
1
2xgtDx + cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
A First Approach
Linear approximation
xlowast = arg minxisinΠ
cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
A First Approach
Linear approximation
xlowast = arg minxisinΠ
cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
A first QAP approach
xlowast = minxisinΠ
1
2xgtDx + cgtx
Gradient descent approach
I Letrsquos find a local minimum of S(x)
I Relax problem to continuous domain S
Some solvers exist
No consideration of discrete nature of solution
26 42
Another strategy
Integer-Projected Fixed Point [Leordeanu et al 2009]
Franck-Wolfe like algorithm
Iterate until convergence
1 Discrete resolution of linear gradient of QAP rarr xt
2 Line search between xtminus1 and the solution found in step 1
27 42
Operating IPFP
At convergence
I Optimal continuous solution x is stable
I Need of a projection step to embed x to Π
Uncontrolled loss
No garanties that S(x) asymp S(xprime)
Importance of initialization
I Local minimum
Initialization is important [Carletti et al 2015]
28 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)
ζ = 1 Convex objective function
ζ = minus1 Concave objective function
29 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)
GNCCP algorithm
x = 0For ζ = 1rarr minus1
1 xlarr arg minxisinΠprime
S(x ζ)
2 ζ larr ζ minus 01
I Iterates ζ over a modified IPFP objective function
30 42
From ζ = 1 to ζ = 0
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
31 42
From ζ = 0 to ζ = minus1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
32 42
GNCCP vs IPFP
Pros
No more need of initialization
Converge towards a mappingmatrix
Cons
Complexity Iterate over IPFP
33 42
GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]
C
C
C
CC C
CS
S
Alkane Acyclic
NN
C
O
C
C
C
MAO PAH
34 42
Protocol
Arbitrary costs
I All substitutions 1
I All insertionsdeletions 3
Relative error
I Accuracy measure
I Overestimation The lowest approximation is the best one (dopt)
I Relative error d(Gi Gj)minus dopt(Gi Gj)
dopt(Gi Gj)
35 42
Relative errors
Alkane Acyclic MAO PAH0
50
100
150
200
250
Datasets
o
f re
lative e
rror
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
36 42
log Time vs Score Deviation
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
Alkane
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
Acyclic
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
MAO
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
PAH
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
37 42
Analysis
Tradeoff
I Accuracy depends on computational complexity
I Choose your method according to your priority
More complete analysis
I ICPR GED contest: https://gdc2016.greyc.fr
I Other methods + other datasets
38 42
GED Limitations
Mathematical properties
I GED is a distance
I But not a Euclidean one
Impossible to derive a trivial kernel
Use with caution in SVMs
Complexity
I Still hard to compute on larger graphs
I Accuracy hard to evaluate (lack of ground truth)
39 42
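The "not Euclidean" caveat can be checked numerically for any distance matrix via the classical-MDS Gram matrix, which is positive semidefinite exactly when the distances are Euclidean-embeddable. The sketch below uses shortest-path distances on a small star graph as a stand-in for a GED matrix (actual GED values are not reproduced here): a negative eigenvalue appears, which is why kernels derived directly from GED need care in SVMs.

```python
import numpy as np

# Shortest-path distances on the star graph K_{1,3}
# (center plus 3 leaves): a valid metric, but not a Euclidean one.
D = np.array([[0., 1., 1., 1.],
              [1., 0., 2., 2.],
              [1., 2., 0., 2.],
              [1., 2., 2., 0.]])

n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
B = -0.5 * J @ (D ** 2) @ J           # classical-MDS Gram matrix

# For a Euclidean distance matrix, B would be positive semidefinite;
# here its smallest eigenvalue is negative, so no exact embedding
# exists and a distance-based "kernel" is not guaranteed to be PSD.
print(np.linalg.eigvalsh(B).min())    # strictly negative
```

An indefinite kernel matrix breaks the convexity assumptions of standard SVM solvers, hence the caution above.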
Outlooks
Algorithms
I Matrix optimization
I Edge-based mapping
I Optimization algorithms
40 42
Outlooks
Applications
I Explore graph space through GED
I Median graph (joint work with Paul)
I Use of GED for classification
I How to set costs according to a task → metric learning (new PhD with Sebastien and Pierre)
I Behavior of different methods
41 42
Conclusion
I GED is still an open problem
I Approximation algorithms exist
I but there is still room for improvement
I Focus on applications
42 42
Abu-Aisheh, Z., Raveaux, R., Ramel, J.-Y., and Martineau, P. (2015). An Exact Graph Edit Distance Algorithm for Solving Pattern Recognition Problems. In 4th International Conference on Pattern Recognition Applications and Methods.
Bougleux, S., Brun, L., Carletti, V., Foggia, P., Gauzere, B., and Vento, M. (2015). A quadratic assignment formulation of the graph edit distance. Technical report, Normandie Univ, NormaSTIC FR 3638, France.
Carletti, V., Gauzere, B., Brun, L., and Vento, M. (2015). Approximate graph edit distance computation combining bipartite matching and exact neighborhood substructure distance. In GbRPR, volume 9069, pages 168–177. Springer.
Gauzere, B., Bougleux, S., Riesen, K., and Brun, L. (2014). Approximate Graph Edit Distance Guided by Bipartite Matching of Bags of Walks. In Structural, Syntactic, and Statistical Pattern Recognition, volume 8621, pages 73–82. Springer.
42 42
Leordeanu, M., Hebert, M., and Sukthankar, R. (2009). An integer projected fixed point method for graph matching and MAP inference. In Advances in Neural Information Processing Systems, pages 1114–1122.
Liu, Z.-Y. and Qiao, H. (2014). GNCCP – Graduated NonConvexity and Concavity Procedure. Pattern Anal. Mach. Intell., 36(6):1258–1267.
Riesen, K. and Bunke, H. (2009). Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision Computing, 27:950–959.
42 42
Graph definition
A graph G isin GG = (V E) is a set of nodes V connected by a set ofedges E = V times V
If (vi vj) isin E then Vi is adjacent to Vj
3 42
Labelling function
Node labelling
I lv V rarr LVI LV Node label alphabet (symbolic real valued )
Edge labelling
I le V times V rarr LEI LE Edge label alphabet
4 42
Some graphs
Chemical science
NN
C
O
C
Symbolic labels atoms
Pattern Recognition
Vector labels Shape characteristics
5 42
Why use graphs
Graphs can handle structural informationBUTG 6= RN
We need to define a dissimilarity measure
6 42
Graph Edit Distance (GED)
O
C N
N
CC
CC
C
C
D
N
N
D
CC
C
CC
C
O
C N
N
CC
CC
C
C
D
N
N
D
CC
C
CC
C
I Dissimilarity measure
I Quantify a distortion
7 42
Edit path
Graph edit distanceMinimal amount of distortion required to transform one graph into another
I Edit path γ Sequence of edit operations e
γ = e1 ep
I Elementary edit cost c(e)
D
N
N
D
CC
C
CC
C
N
D
CC
C
CC
C
O
C N
N
CC
CC
C
C
N
N
CC
CC
C
C
C
O DN
D
N
8 42
Formal Definition
Edit path cost γ
c(γ) =sumeisinγ
c(e)
Graph edit distance
I Edit Distance ged(G1G2) = min
γisinP
sumc(γ)
I Optimal edit path
γlowast isin arg minγisinP
sumc(γ)
9 42
How to compute
Graph Edit Distance
Tree search
11 42
Tree search methods

A*
- Dijkstra-based algorithm
- Needs a heuristic h(p)
- Always finds a solution
- But may take a very long time: exponential number of edit paths

Depth-first search based algorithm [Abu-Aisheh et al., 2015]
- Based on a heuristic
- Limits the number of open paths
- Anytime algorithm: can return an approximation before termination
- Parallelizable
Correspondence between edit paths and mappings [Bougleux et al., 2015]

[figure: an edit path between two molecular graphs, read as a node-to-node mapping including ε insertions and deletions]
Computing graph edit distance is equivalent to finding an optimal assignment between nodes.
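This equivalence suggests a brute-force baseline: enumerate every mapping φ : V_1 ∪ {ε} → V_2 ∪ {ε} and keep the cheapest induced edit path. A hedged sketch for tiny graphs with unlabelled edges (hypothetical helpers, exponential cost, illustration only; substitutions 1, insertions/deletions 3):

```python
from itertools import permutations

EPS = None                      # marks insertion / deletion
C_SUB, C_INDEL = 1, 3

def node_cost(a, b):
    if a is EPS or b is EPS:
        return C_INDEL
    return 0 if a == b else C_SUB

def ged_brute_force(labels1, edges1, labels2, edges2):
    n, m = len(labels1), len(labels2)
    e2 = {frozenset(e) for e in edges2}
    n_e1 = len({frozenset(e) for e in edges1})
    best = float("inf")
    targets = list(range(m)) + [EPS] * n   # each G1 node maps to a G2 node or eps
    for phi in set(permutations(targets, n)):
        cost = sum(node_cost(labels1[i],
                             EPS if phi[i] is EPS else labels2[phi[i]])
                   for i in range(n))
        cost += C_INDEL * (m - sum(t is not EPS for t in phi))  # G2 insertions
        # Edges preserved by the induced edge mapping cost nothing here.
        preserved = sum(1 for i, j in edges1
                        if phi[i] is not EPS and phi[j] is not EPS
                        and frozenset((phi[i], phi[j])) in e2)
        cost += C_INDEL * (n_e1 - preserved)      # edge deletions
        cost += C_INDEL * (len(e2) - preserved)   # edge insertions
        best = min(best, cost)
    return best

tri = [(0, 1), (1, 2), (0, 2)]
print(ged_brute_force(["C", "C", "C"], tri, ["C", "C", "N"], tri))  # 1
```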
Cost

Mapping φ : V_1 ∪ {ε} → V_2 ∪ {ε}

S(V_1^ε, V_2^ε, φ) = Q_e(V_1^ε, V_2^ε, φ) + L_v(V_1^ε, V_2^ε, φ)
                       [edge cost]            [node cost]
Cost Matrix C

L_v(V_1^ε, V_2^ε, φ) = Σ_{v ∈ V̂_1} c(v → φ(v))   [node substitutions]
                     + Σ_{v ∈ V_1∖V̂_1} c(v → ε)   [node removals]
                     + Σ_{v ∈ V_2∖V̂_2} c(ε → v)   [node insertions]

with V̂_1 (resp. V̂_2) the nodes of V_1 (resp. V_2) involved in a substitution.

C is an (n + m) × (n + m) matrix, with n = |V_1| and m = |V_2|, made of four blocks:
- top-left (n × m): substitution costs c(v_i^(1) → v_j^(2))
- top-right (n × n): removal costs c(v_i^(1) → ε) on the diagonal, ∞ elsewhere
- bottom-left (m × m): insertion costs c(ε → v_j^(2)) on the diagonal, ∞ elsewhere
- bottom-right (m × n): 0 everywhere (ε → ε)
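A sketch of this block structure in NumPy (uniform costs assumed for illustration; not the author's code):

```python
import numpy as np

def build_cost_matrix(labels1, labels2, c_sub=1.0, c_indel=3.0):
    n, m = len(labels1), len(labels2)
    C = np.full((n + m, n + m), np.inf)
    for i, a in enumerate(labels1):               # substitution block (n x m)
        for j, b in enumerate(labels2):
            C[i, j] = 0.0 if a == b else c_sub
    C[np.arange(n), m + np.arange(n)] = c_indel   # removal costs on the diagonal
    C[n + np.arange(m), np.arange(m)] = c_indel   # insertion costs on the diagonal
    C[n:, m:] = 0.0                               # eps -> eps block
    return C

C = build_cost_matrix(["C", "N"], ["C", "C", "O"])
print(C.shape)  # (5, 5)
```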
Node operations cost

Node cost induced by φ:

L_v(V_1^ε, V_2^ε, φ) = Σ_{i=1}^{|G_1|+|G_2|} Σ_{j=1}^{|G_1|+|G_2|} (X_φ ∘ C)(i, j)

Vectorized version:

L_v(V_1^ε, V_2^ε, φ) = c^T x_φ

with X_φ the assignment matrix encoding φ, ∘ the elementwise product, x_φ = vec(X_φ) and c = vec(C).
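A quick numerical check on toy data that the double sum over X_φ ∘ C equals the vectorized form c^T x_φ:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4                                    # N = |G1| + |G2|
C = rng.integers(0, 5, size=(N, N)).astype(float)   # toy cost matrix
perm = rng.permutation(N)
X = np.eye(N)[perm]                      # assignment matrix X_phi
double_sum = (X * C).sum()               # sum_i sum_j (X_phi o C)(i, j)
vectorized = C.ravel() @ X.ravel()       # c^T x_phi with c = vec(C), x_phi = vec(X_phi)
print(np.isclose(double_sum, vectorized))  # True
```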
Computation of edge costs

Edge operations cost (I)

Edge cost matrix D:

D(i, j, k, l) = c((i, j) → (k, l))

- mapping an edge (i, j) ∈ E_1 to a non-edge → deletion operation
- mapping a non-edge to (k, l) ∈ E_2 → insertion operation
- the edge mapping is induced by the node mapping
Edge operations cost (II)

Edge cost:

Q_e(V_1^ε, V_2^ε, φ) = Σ_{i,j,k,l=1}^{|G_1|+|G_2|} X_φ(i, k) D(i, j, k, l) X_φ(j, l)

Vectorized version:

Q_e(V_1^ε, V_2^ε, φ) = x_φ^T D x_φ

with D reshaped as an (n+m)² × (n+m)² matrix in the vectorized form.
QAP formulation of GED

S(x) = ½ x^T D x   [edge cost]   +   c^T x   [node cost]

S(x) = ½ x^T Δ x,  with Δ = D + 2 diag(c): for a binary x, x^T diag(c) x = c^T x since x_i² = x_i.

x* = arg min_{x ∈ Π} S(x)
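A toy check of absorbing the linear term into Δ (assuming Δ = D + 2·diag(c), which holds for binary x since x_i² = x_i):

```python
import numpy as np

rng = np.random.default_rng(1)
k = 6
D = rng.random((k, k)); D = (D + D.T) / 2     # symmetric toy edge-cost matrix
c = rng.random(k)
x = (rng.random(k) < 0.5).astype(float)       # binary assignment vector

S1 = 0.5 * x @ D @ x + c @ x                  # S(x) = 1/2 x^T D x + c^T x
Delta = D + 2 * np.diag(c)
S2 = 0.5 * x @ Delta @ x                      # S(x) = 1/2 x^T Delta x
print(np.isclose(S1, S2))  # True
```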
An intractable problem

No guarantee on Δ:
⇒ non-convex problem
⇒ no polynomial-time solution of min_{x ∈ Π} S(x)
- NP-hard problem

Let's look for an approximation.
Approximation is overestimation

1. Any mapping corresponds to an edit path.
2. Any edit path has a cost ≥ GED.
3. An approximate mapping ⇔ an overestimation of GED.
A First Approach

Linear approximation: drop the quadratic term of the QAP formulation

x* = arg min_{x ∈ Π} ½ x^T D x + c^T x   →   x* = arg min_{x ∈ Π} c^T x

- [Riesen and Bunke, 2009; Gauzere et al., 2014; Carletti et al., 2015]
- Linear approximation of the QAP formulation
- Hungarian algorithm (O(n³))
- But: no structural information
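A minimal sketch of this bipartite approximation with SciPy's Hungarian-style solver (hypothetical helper; uniform costs as in the protocol used later, and a large finite value standing in for ∞):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def lsap_node_cost(labels1, labels2, c_sub=1.0, c_indel=3.0):
    """Cost of the optimal node assignment on the (n+m)x(n+m) matrix C."""
    n, m = len(labels1), len(labels2)
    big = c_sub + 2 * c_indel + 1                # finite stand-in for infinity
    C = np.full((n + m, n + m), big)
    for i, a in enumerate(labels1):              # substitution block
        for j, b in enumerate(labels2):
            C[i, j] = 0.0 if a == b else c_sub
    C[np.arange(n), m + np.arange(n)] = c_indel  # deletions
    C[n + np.arange(m), np.arange(m)] = c_indel  # insertions
    C[n:, m:] = 0.0                              # eps -> eps
    rows, cols = linear_sum_assignment(C)        # Hungarian-style O(n^3) solver
    return C[rows, cols].sum()

print(lsap_node_cost(["C", "N", "O"], ["C", "N"]))  # 3.0 (delete the unmatched O)
```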
Augmented Cost Matrix

Adding some structural information:
- C_ij: cost of mapping the neighbourhood of v_i to the neighbourhood of v′_j
  - direct neighbourhood
  - random walks
  - subgraphs
- Trade-off: complexity ↔ accuracy

[figure: matching node neighbourhoods between two graphs]
A first QAP approach

x* = arg min_{x ∈ Π} ½ x^T D x + c^T x

Gradient descent approach:
- let's find a local minimum of S(x)
- relax the problem to the continuous domain S̃
- some solvers exist
- but: no consideration of the discrete nature of the solution
Another strategy

Integer-Projected Fixed Point (IPFP) [Leordeanu et al., 2009]

Frank-Wolfe-like algorithm. Iterate until convergence:
1. Discrete resolution of the linearized QAP (gradient) → x_t
2. Line search between x_{t−1} and the solution found in step 1
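A hedged sketch of the two-step iteration on a toy QAP (random symmetric Δ and c, not the authors' implementation; the exact line search uses the closed form for a quadratic objective):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(2)
n = 5
Delta = rng.random((n * n, n * n)); Delta = (Delta + Delta.T) / 2
c = rng.random(n * n)

def S(x):
    return 0.5 * x @ Delta @ x + c @ x

def project_to_permutation(g):
    """Discrete linear subproblem: minimize g.y over permutation matrices."""
    rows, cols = linear_sum_assignment(g.reshape(n, n))
    X = np.zeros((n, n)); X[rows, cols] = 1.0
    return X.ravel()

x = project_to_permutation(c)                # start from the LSAP solution
for _ in range(50):
    grad = Delta @ x + c
    y = project_to_permutation(grad)         # step 1: discrete direction
    d = y - x
    a, b = d @ Delta @ d, grad @ d           # step 2: exact line search on [0, 1]
    t = 1.0 if a <= 0 else min(1.0, max(0.0, -b / a))
    if np.allclose(d, 0) or t == 0:
        break
    x = x + t * d

print(S(x) <= S(project_to_permutation(c)))  # never worse than the LSAP start
```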
Operating IPFP

At convergence:
- the optimal continuous solution x̃ is stable
- a projection step is needed to map x̃ back to Π
- uncontrolled loss: no guarantee that S(x̃) ≈ S(x′)

Local minimum:
- initialization is important [Carletti et al., 2015]
GNCCP approach [Liu and Qiao, 2014]

From a convex to a concave objective function:

S̃(x, ζ) = ζ (x^T x) + (1 − |ζ|) S(x)

- ζ = 1: convex objective function
- ζ = −1: concave objective function

GNCCP algorithm:
x = 0
for ζ = 1 → −1:
  1. x ← arg min_{x ∈ Π′} S̃(x, ζ)
  2. ζ ← ζ − 0.1
- Iterates ζ over a modified IPFP objective function
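A small numerical illustration of the convex-to-concave deformation: the Hessian of S̃(x, ζ) is 2ζI + (1 − |ζ|)·∇²S, so it is positive definite at ζ = 1 and negative definite at ζ = −1 whatever the symmetric ∇²S (a toy Δ below; not the authors' code).

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((8, 8))
Delta = A + A.T                         # arbitrary symmetric Hessian of S

def hessian(zeta):
    # Hessian of zeta * x^T x + (1 - |zeta|) * (1/2 x^T Delta x)
    return 2 * zeta * np.eye(8) + (1 - abs(zeta)) * Delta

assert np.all(np.linalg.eigvalsh(hessian(1.0)) > 0)    # convex at zeta = 1
assert np.all(np.linalg.eigvalsh(hessian(-1.0)) < 0)   # concave at zeta = -1
print("ok")
```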
From ζ = 1 to ζ = 0

[figure: sequence of plots showing the objective deforming from convex (ζ = 1) towards the original function (ζ = 0)]
From ζ = 0 to ζ = −1

[figure: sequence of plots showing the objective deforming from the original function (ζ = 0) to concave (ζ = −1)]
GNCCP vs IPFP

Pros:
- no more need for an initialization
- converges towards a mapping matrix

Cons:
- complexity: iterates over IPFP
GREYC chemistry datasets [https://iapr-tc15.greyc.fr/links.html]

[figure: example molecules from each dataset]

Alkane   Acyclic   MAO   PAH
Protocol

Arbitrary costs:
- all substitutions: 1
- all insertions/deletions: 3

Relative error:
- accuracy measure
- overestimation: the lowest approximation is the best one (d_opt)
- relative error: (d(G_i, G_j) − d_opt(G_i, G_j)) / d_opt(G_i, G_j)
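For instance, with hypothetical values:

```python
# Relative error of an approximate GED d against the best available
# approximation d_opt, as defined above.
def relative_error(d, d_opt):
    return (d - d_opt) / d_opt

print(relative_error(12.0, 10.0))  # 0.2
```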
Relative errors

[figure: bar chart of the relative error (%) on Alkane, Acyclic, MAO and PAH, y-axis up to 250%, for LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw and GNCCP]
log Time vs Score Deviation

[figure: four panels (Alkane, Acyclic, MAO, PAH) plotting score deviation against log10 of computation time in seconds, for LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw and GNCCP]
Analysis

Trade-off:
- accuracy comes at the price of complexity
- choose your method according to your priority

More complete analysis:
- ICPR GED contest: https://gdc2016.greyc.fr
- other methods + other datasets
GED Limitations

Mathematical properties:
- GED is a distance
- but not a Euclidean one
- impossible to derive a trivial kernel: use with caution in SVMs

Complexity:
- still hard to compute on larger graphs
- accuracy hard to evaluate (lack of ground truth)
Outlooks

Algorithms:
- matrix optimization
- edge-based mapping
- optimization algorithms
Outlooks

Applications:
- explore the graph space through GED
  - median graph: joint work with Paul
- use of GED for classification
- how to set costs according to a task?
  - → metric learning
  - new PhD with Sebastien and Pierre
- behavior of different methods
Conclusion

- GED is still an open problem
- approximation algorithms exist
- but there is still room for improvement
- focus on applications
References

Abu-Aisheh, Z., Raveaux, R., Ramel, J.-Y., and Martineau, P. (2015). An exact graph edit distance algorithm for solving pattern recognition problems. In 4th International Conference on Pattern Recognition Applications and Methods.

Bougleux, S., Brun, L., Carletti, V., Foggia, P., Gauzere, B., and Vento, M. (2015). A quadratic assignment formulation of the graph edit distance. Technical report, Normandie Univ, NormaSTIC FR 3638, France.

Carletti, V., Gauzere, B., Brun, L., and Vento, M. (2015). Approximate graph edit distance computation combining bipartite matching and exact neighborhood substructure distance. In GbRPR, volume 9069, pages 168–177. Springer.

Gauzere, B., Bougleux, S., Riesen, K., and Brun, L. (2014). Approximate graph edit distance guided by bipartite matching of bags of walks. In Structural, Syntactic, and Statistical Pattern Recognition, volume 8621, pages 73–82. Springer.

Leordeanu, M., Hebert, M., and Sukthankar, R. (2009). An integer projected fixed point method for graph matching and MAP inference. In Advances in Neural Information Processing Systems, pages 1114–1122.

Liu, Z.-Y. and Qiao, H. (2014). GNCCP – Graduated NonConvexity and Concavity Procedure. IEEE Trans. Pattern Anal. Mach. Intell., 36(6):1258–1267.

Riesen, K. and Bunke, H. (2009). Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision Computing, 27:950–959.
Labelling function
Node labelling
I lv V rarr LVI LV Node label alphabet (symbolic real valued )
Edge labelling
I le V times V rarr LEI LE Edge label alphabet
4 42
Some graphs
Chemical science
NN
C
O
C
Symbolic labels atoms
Pattern Recognition
Vector labels Shape characteristics
5 42
Why use graphs
Graphs can handle structural informationBUTG 6= RN
We need to define a dissimilarity measure
6 42
Graph Edit Distance (GED)
O
C N
N
CC
CC
C
C
D
N
N
D
CC
C
CC
C
O
C N
N
CC
CC
C
C
D
N
N
D
CC
C
CC
C
I Dissimilarity measure
I Quantify a distortion
7 42
Edit path
Graph edit distanceMinimal amount of distortion required to transform one graph into another
I Edit path γ Sequence of edit operations e
γ = e1 ep
I Elementary edit cost c(e)
D
N
N
D
CC
C
CC
C
N
D
CC
C
CC
C
O
C N
N
CC
CC
C
C
N
N
CC
CC
C
C
C
O DN
D
N
8 42
Formal Definition
Edit path cost γ
c(γ) =sumeisinγ
c(e)
Graph edit distance
I Edit Distance ged(G1G2) = min
γisinP
sumc(γ)
I Optimal edit path
γlowast isin arg minγisinP
sumc(γ)
9 42
How to compute
Graph Edit Distance
Tree search
11 42
Tree search methods
Alowast
I Dijkstra-based algorithm
I Need an heuristic h(p)
Always find a solution
But may take a loooooong time
Exponential number of edit paths
Depth first search based algorithm[Abu-Aisheh et al 2015]
I Based on heuristic
I Limitation on the number of open paths
I Any time algorithm can return an approximation before termination
I Parallelizable
12 42
Correspondance between edit paths and mappings[Bougleux et al 2015]
D
N
N
D
CC
C
CC
C
N
D
CC
C
CC
C
O
C N
N
CC
CC
C
C
N
N
CC
CC
C
C
C
O DN
D
N
13 42
Correspondance between edit paths and mappings[Bougleux et al 2015]
C
O DN
D
N
D
N
N
D
CC
C
CC
C
O
C N
N
CC
CC
C
C
13 42
Computing graph edit distance
is equivalent to
finding an optimal assignment
between nodes
14 42
Cost
Mapping ϕ V1 cup εrarr V2 cup ε
S(Vε1Vε2 ϕ) = Qe(Vε1Vε2 ϕ)︸ ︷︷ ︸Edgersquos Cost
+ Lv(Vε1Vε2 ϕ)︸ ︷︷ ︸Nodersquos Cost
15 42
Cost Matrix C
Lv (V ε1 V
ε2 ϕ) =
sumvisinV1
c(v ϕ(v))
︸ ︷︷ ︸node substitutions
+sum
visinV1V1
c(v ε)
︸ ︷︷ ︸node removals
+sum
visinV2V2
c(ε v)
︸ ︷︷ ︸node insertions
c(v(1)1 rarr v
(2)1 ) middot middot middot c(v
(1)1 rarr v
(2)m ) c(v
(1)1 rarr ε) infin middot middot middot infin
infin
c(v(1)i rarr v
(2)j )
c(v
(1)i rarr ε)
c(v(1)n rarr v
(2)1 ) c(v
(1)n rarr v
(2)m ) infin c(v
(1)n rarr ε)
c(εrarr v(2)1 ) infin middot middot middot infin 0 middot middot middot 0
infin
c(εrarr v(2)j )
infin c(εrarr v(2)m ) 0 0
16 42
Node operations cost
Node cost induced by ϕ
Lv (V ε1 V
ε2 ϕ) =
|G1|+|G2|sumi=1
|G1|+|G2|sumj=1
(Xϕ C)(i j)
Vectorized version
Lv (V ε1 V
ε2 ϕ) = cgtxϕ
17 42
Computation of edge costs
Edges operations cost I
Edge cost matrix D
D(i j k l) = c((i j)rarr (k l))
I (i j) isin E1 rarr deletion operation
I (k l) isin E2 rarr insertion operation
I Edgersquos mapping is induced by nodes mapping
19 42
Edges operations cost II
Edgersquos cost
Qe(V ε1 V
ε2 ϕ) =
|G1|+|G2|sumi j k l=1
Xϕ(i k)D(i j k l)Xϕ(j l)
Vectorized version
Qe(V ε1 V
ε2 ϕ) = xgtϕDxϕ
20 42
QAP formulation of GED
S(x) =1
2xgtDx︸ ︷︷ ︸
Edgersquos cost
+ cgtx︸︷︷︸Nodersquos cost
S(x) =1
2xgt∆x
xlowast = arg minxisinΠ
S(x)
21 42
An intractable problem
No garanties on ∆
rArr Non convex problem
rArr No polynomial solution of minxisinΠ
S(x)
I NP-Hard problem
Letrsquos look for an approximation
22 42
Approximation is overestimation
1 Any mapping corresponds to an edit path
2 Any edit path has a cost ge GED
3 Approximate mappinghArr Overestimation of GED
23 42
A First Approach
Linear approximation
xlowast = minxisinΠ
1
2xgtDx + cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
A First Approach
Linear approximation
xlowast = arg minxisinΠ
cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
A First Approach
Linear approximation
xlowast = arg minxisinΠ
cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
A first QAP approach
xlowast = minxisinΠ
1
2xgtDx + cgtx
Gradient descent approach
I Letrsquos find a local minimum of S(x)
I Relax problem to continuous domain S
Some solvers exist
No consideration of discrete nature of solution
26 42
Another strategy
Integer-Projected Fixed Point [Leordeanu et al 2009]
Franck-Wolfe like algorithm
Iterate until convergence
1 Discrete resolution of linear gradient of QAP rarr xt
2 Line search between xtminus1 and the solution found in step 1
27 42
Operating IPFP
At convergence
I Optimal continuous solution x is stable
I Need of a projection step to embed x to Π
Uncontrolled loss
No garanties that S(x) asymp S(xprime)
Importance of initialization
I Local minimum
Initialization is important [Carletti et al 2015]
28 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)
ζ = 1 Convex objective function
ζ = minus1 Concave objective function
29 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)
GNCCP algorithm
x = 0For ζ = 1rarr minus1
1 xlarr arg minxisinΠprime
S(x ζ)
2 ζ larr ζ minus 01
I Iterates ζ over a modified IPFP objective function
30 42
From ζ = 1 to ζ = 0
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
31 42
From ζ = 0 to ζ = minus1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
32 42
GNCCP vs IPFP
Pros
No more need of initialization
Converge towards a mappingmatrix
Cons
Complexity Iterate over IPFP
33 42
GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]
C
C
C
CC C
CS
S
Alkane Acyclic
NN
C
O
C
C
C
MAO PAH
34 42
Protocol
Arbitrary costs
I All substitutions 1
I All insertionsdeletions 3
Relative error
I Accuracy measure
I Overestimation The lowest approximation is the best one (dopt)
I Relative error d(Gi Gj)minus dopt(Gi Gj)
dopt(Gi Gj)
35 42
Relative errors
Alkane Acyclic MAO PAH0
50
100
150
200
250
Datasets
o
f re
lative e
rror
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
36 42
log Time vs Score Deviation
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
Alkane
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
Acyclic
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
MAO
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
PAH
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
37 42
Analysis
Tradeoff
I Accuracy is dependant to complexity
I Choose your method according to your priority
More complete analysis
I ICPR GED contest httpsgdc2016greycfr
I Others methods + Others datasets
38 42
GED Limitations
Mathematical properties
I GED is a distance
I But not an euclidean one
Impossible to derive a trivial kernel
Use with caution in SVMs
Complexity
I Still hard to compute on larger graphs
I Accuracy hard to evaluate (lack of gt)
39 42
Outlooks
Algorithms
I Matrix optimization
I Edge based mapping
I Optimization algorithms
40 42
Outlooks
Applications
I Explore graph space trough GEDI Median graph joint work with Paul
I Use of ged for classification
I How to set costs according to a task I rarr Metric learningI New Phd with Sebastien and Pierre
I Behavior of different methods
41 42
Conclusion
I GED is still an open problem
I Approximation algorithms exists
I but it stills room for improvements
I Focus on applications
42 42
Abu-Aisheh Z Raveaux R Ramel J-Y and Martineau P(2015)An Exact Graph Edit Distance Algorithm for Solving PatternRecognition ProblemsIn 4th International Conference on Pattern Recognition Applicationsand Methods
Bougleux S Brun L Carletti V Foggia P Gauzere B andVento M (2015)A quadratic assignment formulation of the graph edit distanceTechnical report Normandie Univ NormaSTIC FR 3638 France
Carletti V Gauzere B Brun L and Vento M (2015)Approximate graph edit distance computation combining bipartitematching and exact neighborhood substructure distanceIn GbRPR volume 9069 pages 168ndash177 Springer
Gauzere B Bougleux S Riesen K and Brun L (2014)Approximate Graph Edit Distance Guided by Bipartite Matching ofBags of WalksIn Structural Syntactic and Statistical Pattern Recognition volume8621 pages 73ndash82 Springer
42 42
Leordeanu M Hebert M and Sukthankar R (2009)An integer projected fixed point method for graph matching andmap inferenceIn Advances in neural information processing systems pages1114ndash1122
Liu Z-Y and Qiao H (2014)GNCCPndashGraduated NonConvexity and Concavity ProcedurePattern Anal Mach Intell 36(6) 1258ndash1267
Riesen K and Bunke H (2009)Approximate graph edit distance computation by means of bipartitegraph matchingImage and Vision Comp 27 950ndash959
42 42
Some graphs
Chemical science
NN
C
O
C
Symbolic labels atoms
Pattern Recognition
Vector labels Shape characteristics
5 42
Why use graphs
Graphs can handle structural informationBUTG 6= RN
We need to define a dissimilarity measure
6 42
Graph Edit Distance (GED)
O
C N
N
CC
CC
C
C
D
N
N
D
CC
C
CC
C
O
C N
N
CC
CC
C
C
D
N
N
D
CC
C
CC
C
I Dissimilarity measure
I Quantify a distortion
7 42
Edit path
Graph edit distanceMinimal amount of distortion required to transform one graph into another
I Edit path γ Sequence of edit operations e
γ = e1 ep
I Elementary edit cost c(e)
D
N
N
D
CC
C
CC
C
N
D
CC
C
CC
C
O
C N
N
CC
CC
C
C
N
N
CC
CC
C
C
C
O DN
D
N
8 42
Formal Definition
Edit path cost γ
c(γ) =sumeisinγ
c(e)
Graph edit distance
I Edit Distance ged(G1G2) = min
γisinP
sumc(γ)
I Optimal edit path
γlowast isin arg minγisinP
sumc(γ)
9 42
How to compute
Graph Edit Distance
Tree search
11 42
Tree search methods
Alowast
I Dijkstra-based algorithm
I Need an heuristic h(p)
Always find a solution
But may take a loooooong time
Exponential number of edit paths
Depth first search based algorithm[Abu-Aisheh et al 2015]
I Based on heuristic
I Limitation on the number of open paths
I Any time algorithm can return an approximation before termination
I Parallelizable
12 42
Correspondance between edit paths and mappings[Bougleux et al 2015]
D
N
N
D
CC
C
CC
C
N
D
CC
C
CC
C
O
C N
N
CC
CC
C
C
N
N
CC
CC
C
C
C
O DN
D
N
13 42
Correspondance between edit paths and mappings[Bougleux et al 2015]
C
O DN
D
N
D
N
N
D
CC
C
CC
C
O
C N
N
CC
CC
C
C
13 42
Computing graph edit distance
is equivalent to
finding an optimal assignment
between nodes
14 42
Cost
Mapping ϕ V1 cup εrarr V2 cup ε
S(Vε1Vε2 ϕ) = Qe(Vε1Vε2 ϕ)︸ ︷︷ ︸Edgersquos Cost
+ Lv(Vε1Vε2 ϕ)︸ ︷︷ ︸Nodersquos Cost
15 42
Cost Matrix C
Lv (V ε1 V
ε2 ϕ) =
sumvisinV1
c(v ϕ(v))
︸ ︷︷ ︸node substitutions
+sum
visinV1V1
c(v ε)
︸ ︷︷ ︸node removals
+sum
visinV2V2
c(ε v)
︸ ︷︷ ︸node insertions
c(v(1)1 rarr v
(2)1 ) middot middot middot c(v
(1)1 rarr v
(2)m ) c(v
(1)1 rarr ε) infin middot middot middot infin
infin
c(v(1)i rarr v
(2)j )
c(v
(1)i rarr ε)
c(v(1)n rarr v
(2)1 ) c(v
(1)n rarr v
(2)m ) infin c(v
(1)n rarr ε)
c(εrarr v(2)1 ) infin middot middot middot infin 0 middot middot middot 0
infin
c(εrarr v(2)j )
infin c(εrarr v(2)m ) 0 0
16 42
Node operations cost
Node cost induced by ϕ
Lv (V ε1 V
ε2 ϕ) =
|G1|+|G2|sumi=1
|G1|+|G2|sumj=1
(Xϕ C)(i j)
Vectorized version
Lv (V ε1 V
ε2 ϕ) = cgtxϕ
17 42
Computation of edge costs
Edges operations cost I
Edge cost matrix D
D(i j k l) = c((i j)rarr (k l))
I (i j) isin E1 rarr deletion operation
I (k l) isin E2 rarr insertion operation
I Edgersquos mapping is induced by nodes mapping
19 42
Edges operations cost II
Edgersquos cost
Qe(V ε1 V
ε2 ϕ) =
|G1|+|G2|sumi j k l=1
Xϕ(i k)D(i j k l)Xϕ(j l)
Vectorized version
Qe(V ε1 V
ε2 ϕ) = xgtϕDxϕ
20 42
QAP formulation of GED
S(x) =1
2xgtDx︸ ︷︷ ︸
Edgersquos cost
+ cgtx︸︷︷︸Nodersquos cost
S(x) =1
2xgt∆x
xlowast = arg minxisinΠ
S(x)
21 42
An intractable problem
No garanties on ∆
rArr Non convex problem
rArr No polynomial solution of minxisinΠ
S(x)
I NP-Hard problem
Letrsquos look for an approximation
22 42
Approximation is overestimation
1 Any mapping corresponds to an edit path
2 Any edit path has a cost ge GED
3 Approximate mappinghArr Overestimation of GED
23 42
A First Approach
Linear approximation
xlowast = minxisinΠ
1
2xgtDx + cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
A First Approach
Linear approximation
xlowast = arg minxisinΠ
cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
A First Approach
Linear approximation
xlowast = arg minxisinΠ
cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
A first QAP approach
xlowast = minxisinΠ
1
2xgtDx + cgtx
Gradient descent approach
I Letrsquos find a local minimum of S(x)
I Relax problem to continuous domain S
Some solvers exist
No consideration of discrete nature of solution
26 42
Another strategy
Integer-Projected Fixed Point [Leordeanu et al 2009]
Franck-Wolfe like algorithm
Iterate until convergence
1 Discrete resolution of linear gradient of QAP rarr xt
2 Line search between xtminus1 and the solution found in step 1
27 42
Operating IPFP
At convergence
I Optimal continuous solution x is stable
I Need of a projection step to embed x to Π
Uncontrolled loss
No garanties that S(x) asymp S(xprime)
Importance of initialization
I Local minimum
Initialization is important [Carletti et al 2015]
28 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)
ζ = 1 Convex objective function
ζ = minus1 Concave objective function
29 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)
GNCCP algorithm
x = 0For ζ = 1rarr minus1
1 xlarr arg minxisinΠprime
S(x ζ)
2 ζ larr ζ minus 01
I Iterates ζ over a modified IPFP objective function
30 42
From ζ = 1 to ζ = 0
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
31 42
From ζ = 0 to ζ = minus1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
32 42
GNCCP vs IPFP
Pros
No more need of initialization
Converge towards a mappingmatrix
Cons
Complexity Iterate over IPFP
33 42
GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]
C
C
C
CC C
CS
S
Alkane Acyclic
NN
C
O
C
C
C
MAO PAH
34 42
Protocol
Arbitrary costs
I All substitutions 1
I All insertionsdeletions 3
Relative error
I Accuracy measure
I Overestimation The lowest approximation is the best one (dopt)
I Relative error d(Gi Gj)minus dopt(Gi Gj)
dopt(Gi Gj)
35 42
Relative errors
Alkane Acyclic MAO PAH0
50
100
150
200
250
Datasets
o
f re
lative e
rror
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
36 42
log Time vs Score Deviation
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
Alkane
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
Acyclic
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
MAO
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
PAH
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
37 42
Analysis
Tradeoff
I Accuracy is dependant to complexity
I Choose your method according to your priority
More complete analysis
I ICPR GED contest httpsgdc2016greycfr
I Others methods + Others datasets
38 42
GED Limitations
Why use graphs
Graphs can handle structural information, BUT G ≠ ℝ^N
We need to define a dissimilarity measure
Graph Edit Distance (GED)
[Figure: two labeled molecular graphs compared by GED]
I Dissimilarity measure
I Quantify a distortion
Edit path
Graph edit distance: minimal amount of distortion required to transform one graph into another
I Edit path γ: sequence of edit operations e
γ = (e1, …, ep)
I Elementary edit cost c(e)
[Figure: a sequence of edit operations transforming one molecular graph into another]
Formal Definition
Cost of an edit path γ:
c(γ) = Σ_{e∈γ} c(e)
Graph edit distance
I Edit distance: ged(G1, G2) = min_{γ∈P} c(γ)
I Optimal edit path: γ* ∈ argmin_{γ∈P} c(γ)
How to compute Graph Edit Distance?
Tree search
Tree search methods
A*
I Dijkstra-based algorithm
I Needs a heuristic h(p)
+ Always finds an exact solution
− But may take a very long time: exponential number of edit paths
Depth-first search based algorithm [Abu-Aisheh et al 2015]
I Based on a heuristic
I Limits the number of open paths
I Anytime algorithm: can return an approximation before termination
I Parallelizable
Correspondence between edit paths and mappings [Bougleux et al 2015]
[Figure: a node-to-node mapping between two molecular graphs and its induced edit path]
Computing the graph edit distance is equivalent to finding an optimal assignment between nodes
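The equivalence above can be checked directly on tiny graphs. Below is a minimal brute-force sketch that enumerates every node assignment V1 → V2 ∪ {ε} and scores the induced edit path, assuming the cost model used later in the experiments (substitution 1, insertion/deletion 3) and unlabeled edges; the function name and graph encoding are illustrative, not from the talk.

```python
from itertools import permutations

def ged_bruteforce(labels1, edges1, labels2, edges2,
                   c_sub=1.0, c_node=3.0, c_edge=3.0):
    """Exact GED of two small labeled graphs by enumerating every node
    assignment V1 -> V2 ∪ {eps}. Exponential: toy sizes only.
    Edges are sets of frozensets {i, j} of node indices."""
    n1, n2 = len(labels1), len(labels2)
    pool = tuple(range(n2)) + (None,) * n1   # None = map to eps (deletion)
    best = float("inf")
    for phi in set(permutations(pool, n1)):
        cost = 0.0
        for u, v in enumerate(phi):
            if v is None:
                cost += c_node               # node deletion
            elif labels1[u] != labels2[v]:
                cost += c_sub                # node substitution
        matched = {v for v in phi if v is not None}
        cost += c_node * (n2 - len(matched))  # node insertions
        covered = set()
        for e in edges1:
            u, w = tuple(e)
            image = frozenset((phi[u], phi[w]))
            if None not in image and image in edges2:
                covered.add(image)           # edge preserved (substitution, cost 0)
            else:
                cost += c_edge               # edge deletion
        cost += c_edge * (len(edges2) - len(covered))  # edge insertions
        best = min(best, cost)
    return best

# triangle vs 3-node chain over carbon atoms: one edge deletion -> 3.0
tri = {frozenset((0, 1)), frozenset((1, 2)), frozenset((0, 2))}
chain = {frozenset((0, 1)), frozenset((1, 2))}
print(ged_bruteforce(["C"] * 3, tri, ["C"] * 3, chain))   # 3.0
```

Each candidate assignment fully determines the edit path and hence its cost, which is exactly why GED reduces to an assignment problem.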
Cost
Mapping ϕ : V1 ∪ {ε} → V2 ∪ {ε}
S(V1^ε, V2^ε, ϕ) = Qe(V1^ε, V2^ε, ϕ) [edge cost] + Lv(V1^ε, V2^ε, ϕ) [node cost]
Cost Matrix C
Lv(V1^ε, V2^ε, ϕ) = Σ_{v∈V1, ϕ(v)≠ε} c(v → ϕ(v)) [node substitutions]
                  + Σ_{v∈V1, ϕ(v)=ε} c(v → ε) [node removals]
                  + Σ_{v∈V2 \ ϕ(V1)} c(ε → v) [node insertions]
C is the (n + m) × (n + m) matrix, with n = |V1| and m = |V2|:
I top-left n × m block: substitution costs c(v_i^(1) → v_j^(2))
I top-right n × n block: deletion costs c(v_i^(1) → ε) on the diagonal, ∞ elsewhere
I bottom-left m × m block: insertion costs c(ε → v_j^(2)) on the diagonal, ∞ elsewhere
I bottom-right m × n block: 0 (ε → ε)
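The block layout above can be assembled with numpy. This is a hedged sketch: the helper name and the default costs (substitution 1, insertion/deletion 3, matching the later protocol) are illustrative assumptions.

```python
import numpy as np

def build_cost_matrix(labels1, labels2, c_sub=1.0, c_indel=3.0):
    """(n+m) x (n+m) node cost matrix with the block layout above:
    substitutions (n x m), deletions on the diagonal of the top-right
    n x n block, insertions on the diagonal of the bottom-left m x m
    block, zeros (eps -> eps) in the bottom-right."""
    n, m = len(labels1), len(labels2)
    C = np.zeros((n + m, n + m))
    for i in range(n):
        for j in range(m):
            C[i, j] = 0.0 if labels1[i] == labels2[j] else c_sub
    C[:n, m:] = np.inf                   # forbid off-diagonal deletions
    C[n:, :m] = np.inf                   # forbid off-diagonal insertions
    for i in range(n):
        C[i, m + i] = c_indel            # delete v_i
    for j in range(m):
        C[n + j, j] = c_indel            # insert u_j
    return C

# G1 = {C, N}, G2 = {C} -> a 3 x 3 matrix
C = build_cost_matrix(["C", "N"], ["C"])
```

The ∞ entries force each source node to be deleted only through "its own" ε column, so any permutation of this matrix encodes a valid node mapping.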
Node operations cost
Node cost induced by ϕ
Lv(V1^ε, V2^ε, ϕ) = Σ_{i=1}^{|G1|+|G2|} Σ_{j=1}^{|G1|+|G2|} (Xϕ ⊙ C)(i, j)
Vectorized version:
Lv(V1^ε, V2^ε, ϕ) = c⊤ xϕ
Computation of edge costs
Edge operations cost I
Edge cost matrix D:
D(ij, kl) = c((i, j) → (k, l))
I (i, j) ∈ E1 → deletion operation
I (k, l) ∈ E2 → insertion operation
I The edge mapping is induced by the node mapping
Edge operations cost II
Edge cost:
Qe(V1^ε, V2^ε, ϕ) = Σ_{i,j,k,l=1}^{|G1|+|G2|} Xϕ(i, k) D(ij, kl) Xϕ(j, l)
Vectorized version:
Qe(V1^ε, V2^ε, ϕ) = xϕ⊤ D xϕ
QAP formulation of GED
S(x) = ½ x⊤ D x [edge cost] + c⊤ x [node cost]
S(x) = ½ x⊤ Δ x
x* = argmin_{x∈Π} S(x)
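To make the quadratic form concrete, here is a sketch that builds a dense D for two ε-padded graphs (unlabeled edges, edge deletion/insertion cost 3 as in the later protocol) and evaluates S(x) = ½ x⊤Dx + c⊤x for a given assignment matrix. The helper names are assumptions, and the dense (n+m)² × (n+m)² storage is only viable for toy sizes.

```python
import numpy as np

def build_edge_cost_tensor(adj1, adj2, c_edge=3.0):
    """D[i*N+k, j*N+l]: edge cost of mapping i->k and j->l, given the
    eps-padded 0/1 adjacency matrices adj1, adj2 (both N x N)."""
    N = adj1.shape[0]
    D = np.zeros((N * N, N * N))
    for i in range(N):
        for k in range(N):
            for j in range(N):
                for l in range(N):
                    if adj1[i, j] != adj2[k, l]:      # edge deleted or inserted
                        D[i * N + k, j * N + l] = c_edge
    return D

def qap_objective(c, D, X):
    """S(x) = 0.5 x^T D x + c^T x with x = vec(X), row-major flatten.
    The 1/2 factor compensates each undirected edge being counted twice."""
    x = X.flatten()
    return 0.5 * x @ D @ x + c @ x

# triangle vs 3-node chain, both padded with 3 eps nodes (N = 6)
N = 6
adj1 = np.zeros((N, N)); adj2 = np.zeros((N, N))
for a, b in [(0, 1), (1, 2), (0, 2)]:
    adj1[a, b] = adj1[b, a] = 1
for a, b in [(0, 1), (1, 2)]:
    adj2[a, b] = adj2[b, a] = 1
D = build_edge_cost_tensor(adj1, adj2)
S = qap_objective(np.zeros(N * N), D, np.eye(N))   # identity mapping -> 3.0
```

With identical labels the node term vanishes and the identity mapping pays exactly one edge deletion, matching the brute-force GED of 3.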
An intractable problem
No guarantees on Δ
⇒ Non-convex problem
⇒ No polynomial-time algorithm for min_{x∈Π} S(x)
I NP-hard problem
Let's look for an approximation
Approximation is overestimation
1 Any mapping corresponds to an edit path
2 Any edit path has a cost ≥ GED
3 Approximate mapping ⇔ overestimation of GED
A First Approach
Linear approximation
x* = argmin_{x∈Π} c⊤ x
I [Riesen and Bunke 2009, Gaüzère et al 2014, Carletti et al 2015]
I Linear approximation of the QAP formulation: drop the quadratic term ½ x⊤ D x
I Hungarian algorithm (O(n³))
− No structural information
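A sketch of this linear approximation using `scipy.optimize.linear_sum_assignment` (a Hungarian-style LSAP solver). The 3×3 matrix below hand-codes the cost-matrix layout from earlier for G1 = {C, N} and G2 = {C}, with substitution cost 1 and insertion/deletion cost 3; a large finite value stands in for ∞.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

BIG = 1e6  # finite stand-in for the infinite entries of C

# Rows: v1 = C, v2 = N, eps; columns: u1 = C, eps_1, eps_2
C = np.array([
    [0.0, 3.0, BIG],   # v1: substitute to u1, or delete
    [1.0, BIG, 3.0],   # v2: substitute to u1, or delete
    [3.0, 0.0, 0.0],   # eps row: insert u1, or do nothing
])

rows, cols = linear_sum_assignment(C)    # optimal node assignment
upper_bound = C[rows, cols].sum()        # cost of the induced edit path
print(upper_bound)                       # 3.0: keep C -> C, delete N
```

The returned assignment is only optimal for the node costs; scoring its full induced edit path (including edges) gives the overestimation of GED discussed above.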
Augmented Cost Matrix
Adding some structural information
I Cij: cost of mapping the neighbourhood of vi onto the neighbourhood of v′j
I Direct neighbourhood
I Random walks
I Subgraphs
I Complexity ↔ accuracy trade-off
A first QAP approach
x* = argmin_{x∈Π} ½ x⊤ D x + c⊤ x
Gradient descent approach
I Let's find a local minimum of S(x)
I Relax the problem to the continuous domain: S̃
+ Some solvers exist
− No consideration of the discrete nature of the solution
Another strategy
Integer-Projected Fixed Point (IPFP) [Leordeanu et al 2009]
Frank-Wolfe-like algorithm. Iterate until convergence:
1 Discrete minimization of the linearized QAP (its gradient) → x_t
2 Line search between x_{t−1} and the solution found in step 1
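The two steps above can be sketched as follows. This is a minimal Frank-Wolfe-style reading of the IPFP idea, not the exact algorithm of Leordeanu et al.: an LSAP solver plays the role of the discrete linearized step, and the line search is exact for the quadratic objective.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def ipfp(c, D, N, iters=50, tol=1e-9):
    """Frank-Wolfe-style sketch of IPFP for min S(x) = 0.5 x^T D x + c^T x
    over relaxed assignments (x = flattened N x N matrix, D symmetric)."""
    x = np.full(N * N, 1.0 / N)              # uniform doubly stochastic start
    for _ in range(iters):
        grad = D @ x + c                     # gradient of S at x
        r, k = linear_sum_assignment(grad.reshape(N, N))
        b = np.zeros(N * N)
        b[r * N + k] = 1.0                   # discrete minimizer of the linearization
        d = b - x
        a1 = d @ grad                        # directional derivative at t = 0
        if a1 >= -tol:
            break                            # no descent direction left
        a2 = d @ D @ d                       # curvature along d
        t = 1.0 if a2 <= 0 else min(1.0, -a1 / a2)
        x = x + t * d                        # exact line search on [0, 1]
    return x, 0.5 * x @ D @ x + c @ x

# toy example: zero edge costs, node costs favoring the identity mapping
c = np.array([0.0, 3.0, 3.0, 0.0])
x, s = ipfp(c, np.zeros((4, 4)), 2)          # x -> identity permutation, s -> 0.0
```

Because each step mixes the current point with a permutation matrix, the iterate stays in the assignment polytope, but the final x may still be fractional, which motivates the projection step discussed next.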
Operating IPFP
At convergence
I The optimal continuous solution x̃ is stable
I A projection step is needed to map x̃ back to Π
− Uncontrolled loss: no guarantee that S(x̃) ≈ S(x′)
Local minimum
I Initialization is important [Carletti et al 2015]
GNCCP approach [Liu and Qiao 2014]
From a convex to a concave objective function:
S(x, ζ) = ζ x⊤x + (1 − |ζ|) S(x)
I ζ = 1: convex objective function
I ζ = −1: concave objective function
GNCCP algorithm
x = 0
For ζ = 1 → −1:
1 x ← argmin_{x∈Π′} S(x, ζ)
2 ζ ← ζ − 0.1
I Iterates a modified IPFP over decreasing ζ
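The loop above can be sketched by reusing a Frank-Wolfe inner solver on S(x, ζ). This is an illustrative reading of the slide (ζ step 0.1, LSAP as the discrete linearized step), not the exact GNCCP implementation; note the slide initializes x = 0, while this sketch starts from the uniform matrix, which is the minimizer of x⊤x at ζ = 1.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def gnccp(c, D, N, step=0.1, inner=30, tol=1e-9):
    """Sketch of GNCCP: minimize S(x, zeta) = zeta x^T x + (1 - |zeta|) S(x)
    with Frank-Wolfe inner iterations while zeta decreases from 1 to -1."""
    x = np.full(N * N, 1.0 / N)          # uniform start (minimizer at zeta = 1)
    zeta = 1.0
    while zeta > -1.0:
        w = 1.0 - abs(zeta)
        for _ in range(inner):
            grad = 2.0 * zeta * x + w * (D @ x + c)
            r, k = linear_sum_assignment(grad.reshape(N, N))
            b = np.zeros(N * N)
            b[r * N + k] = 1.0           # discrete minimizer of the linearization
            d = b - x
            a1 = d @ grad
            if a1 >= -tol:
                break                    # inner convergence for this zeta
            a2 = 2.0 * zeta * (d @ d) + w * (d @ D @ d)
            t = 1.0 if a2 <= 0 else min(1.0, -a1 / a2)
            x = x + t * d
        zeta -= step                     # schedule: zeta = 1 -> -1
    return x                             # driven towards a permutation matrix

# same toy problem as for IPFP: the final x is a discrete assignment
c = np.array([0.0, 3.0, 3.0, 0.0])
x = gnccp(c, np.zeros((4, 4)), 2)
```

As ζ turns negative the objective becomes concave, so the line search jumps to vertices of the polytope: the continuation itself produces a discrete mapping, which is why no separate projection (and no initialization) is needed.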
From ζ = 1 to ζ = 0
[Figure: evolution of the objective function S(x, ζ) as ζ decreases from 1 to 0]
From ζ = 0 to ζ = −1
[Figure: evolution of the objective function S(x, ζ) as ζ decreases from 0 to −1]
GNCCP vs IPFP
Pros
+ No more need for initialization
+ Converges towards a mapping matrix
Cons
− Complexity: iterates over IPFP
GREYC chemistry datasets [https://iapr-tc15.greyc.fr/links.html]
[Figure: sample molecules from the Alkane, Acyclic, MAO and PAH datasets]
Protocol
Arbitrary costs
I All substitutions: 1
I All insertions/deletions: 3
Relative error
I Accuracy measure
I Overestimation: the lowest approximation is the best one (d_opt)
I Relative error: (d(Gi, Gj) − d_opt(Gi, Gj)) / d_opt(Gi, Gj)
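The relative-error formula above reads directly as a one-liner; a trivial sketch:

```python
def relative_error(d, d_opt):
    """Relative overestimation of an approximate GED d with respect to the
    lowest approximation d_opt found across methods (protocol above)."""
    return (d - d_opt) / d_opt

# a method returning 4.5 when the best approximation is 3.0 overestimates by 50%
print(relative_error(4.5, 3.0))   # 0.5
```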
Relative errors
[Figure: % of relative error on Alkane, Acyclic, MAO and PAH for LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw and GNCCP]
log Time vs Score Deviation
[Figure: score deviation vs log10 of computation time (in seconds) on Alkane, Acyclic, MAO and PAH, for LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw and GNCCP]
Analysis
Trade-off
I Accuracy depends on complexity
I Choose your method according to your priority
More complete analysis
I ICPR GED contest: https://gdc2016.greyc.fr
I Other methods + other datasets
GED Limitations
Mathematical properties
I GED is a distance
I But not a Euclidean one
− Impossible to derive a trivial kernel
− Use with caution in SVMs
Complexity
I Still hard to compute on larger graphs
I Accuracy hard to evaluate (lack of ground truth)
Outlooks
Algorithms
I Matrix optimization
I Edge-based mapping
I Optimization algorithms
Outlooks
Applications
I Explore the graph space through GED
I Median graph (joint work with Paul)
I Use of GED for classification
I How to set costs according to a task → metric learning (new PhD with Sébastien and Pierre)
I Behavior of different methods
Conclusion
I GED is still an open problem
I Approximation algorithms exist
I But there is still room for improvement
I Focus on applications
References
Abu-Aisheh, Z., Raveaux, R., Ramel, J.-Y., and Martineau, P. (2015). An exact graph edit distance algorithm for solving pattern recognition problems. In 4th International Conference on Pattern Recognition Applications and Methods.
Bougleux, S., Brun, L., Carletti, V., Foggia, P., Gaüzère, B., and Vento, M. (2015). A quadratic assignment formulation of the graph edit distance. Technical report, Normandie Univ, NormaSTIC FR 3638, France.
Carletti, V., Gaüzère, B., Brun, L., and Vento, M. (2015). Approximate graph edit distance computation combining bipartite matching and exact neighborhood substructure distance. In GbRPR, volume 9069, pages 168–177. Springer.
Gaüzère, B., Bougleux, S., Riesen, K., and Brun, L. (2014). Approximate graph edit distance guided by bipartite matching of bags of walks. In Structural, Syntactic, and Statistical Pattern Recognition, volume 8621, pages 73–82. Springer.
Leordeanu, M., Hebert, M., and Sukthankar, R. (2009). An integer projected fixed point method for graph matching and MAP inference. In Advances in Neural Information Processing Systems, pages 1114–1122.
Liu, Z.-Y. and Qiao, H. (2014). GNCCP — Graduated NonConvexity and Concavity Procedure. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(6):1258–1267.
Riesen, K. and Bunke, H. (2009). Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision Computing, 27:950–959.
Graph Edit Distance (GED)
O
C N
N
CC
CC
C
C
D
N
N
D
CC
C
CC
C
O
C N
N
CC
CC
C
C
D
N
N
D
CC
C
CC
C
I Dissimilarity measure
I Quantify a distortion
7 42
Edit path
Graph edit distanceMinimal amount of distortion required to transform one graph into another
I Edit path γ Sequence of edit operations e
γ = e1 ep
I Elementary edit cost c(e)
D
N
N
D
CC
C
CC
C
N
D
CC
C
CC
C
O
C N
N
CC
CC
C
C
N
N
CC
CC
C
C
C
O DN
D
N
8 42
Formal Definition
Edit path cost γ
c(γ) =sumeisinγ
c(e)
Graph edit distance
I Edit Distance ged(G1G2) = min
γisinP
sumc(γ)
I Optimal edit path
γlowast isin arg minγisinP
sumc(γ)
9 42
How to compute
Graph Edit Distance
Tree search
11 42
Tree search methods
Alowast
I Dijkstra-based algorithm
I Need an heuristic h(p)
Always find a solution
But may take a loooooong time
Exponential number of edit paths
Depth first search based algorithm[Abu-Aisheh et al 2015]
I Based on heuristic
I Limitation on the number of open paths
I Any time algorithm can return an approximation before termination
I Parallelizable
12 42
Correspondance between edit paths and mappings[Bougleux et al 2015]
D
N
N
D
CC
C
CC
C
N
D
CC
C
CC
C
O
C N
N
CC
CC
C
C
N
N
CC
CC
C
C
C
O DN
D
N
13 42
Correspondance between edit paths and mappings[Bougleux et al 2015]
C
O DN
D
N
D
N
N
D
CC
C
CC
C
O
C N
N
CC
CC
C
C
13 42
Computing graph edit distance
is equivalent to
finding an optimal assignment
between nodes
14 42
Cost
Mapping ϕ V1 cup εrarr V2 cup ε
S(Vε1Vε2 ϕ) = Qe(Vε1Vε2 ϕ)︸ ︷︷ ︸Edgersquos Cost
+ Lv(Vε1Vε2 ϕ)︸ ︷︷ ︸Nodersquos Cost
15 42
Cost Matrix C
Lv (V ε1 V
ε2 ϕ) =
sumvisinV1
c(v ϕ(v))
︸ ︷︷ ︸node substitutions
+sum
visinV1V1
c(v ε)
︸ ︷︷ ︸node removals
+sum
visinV2V2
c(ε v)
︸ ︷︷ ︸node insertions
c(v(1)1 rarr v
(2)1 ) middot middot middot c(v
(1)1 rarr v
(2)m ) c(v
(1)1 rarr ε) infin middot middot middot infin
infin
c(v(1)i rarr v
(2)j )
c(v
(1)i rarr ε)
c(v(1)n rarr v
(2)1 ) c(v
(1)n rarr v
(2)m ) infin c(v
(1)n rarr ε)
c(εrarr v(2)1 ) infin middot middot middot infin 0 middot middot middot 0
infin
c(εrarr v(2)j )
infin c(εrarr v(2)m ) 0 0
16 42
Node operations cost
Node cost induced by ϕ
Lv (V ε1 V
ε2 ϕ) =
|G1|+|G2|sumi=1
|G1|+|G2|sumj=1
(Xϕ C)(i j)
Vectorized version
Lv (V ε1 V
ε2 ϕ) = cgtxϕ
17 42
Computation of edge costs
Edges operations cost I
Edge cost matrix D
D(i j k l) = c((i j)rarr (k l))
I (i j) isin E1 rarr deletion operation
I (k l) isin E2 rarr insertion operation
I Edgersquos mapping is induced by nodes mapping
19 42
Edges operations cost II
Edgersquos cost
Qe(V ε1 V
ε2 ϕ) =
|G1|+|G2|sumi j k l=1
Xϕ(i k)D(i j k l)Xϕ(j l)
Vectorized version
Qe(V ε1 V
ε2 ϕ) = xgtϕDxϕ
20 42
QAP formulation of GED
S(x) =1
2xgtDx︸ ︷︷ ︸
Edgersquos cost
+ cgtx︸︷︷︸Nodersquos cost
S(x) =1
2xgt∆x
xlowast = arg minxisinΠ
S(x)
21 42
An intractable problem
No garanties on ∆
rArr Non convex problem
rArr No polynomial solution of minxisinΠ
S(x)
I NP-Hard problem
Letrsquos look for an approximation
22 42
Approximation is overestimation
1 Any mapping corresponds to an edit path
2 Any edit path has a cost ge GED
3 Approximate mappinghArr Overestimation of GED
23 42
A First Approach
Linear approximation
xlowast = minxisinΠ
1
2xgtDx + cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
A First Approach
Linear approximation
xlowast = arg minxisinΠ
cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
A First Approach
Linear approximation
xlowast = arg minxisinΠ
cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
A first QAP approach
xlowast = minxisinΠ
1
2xgtDx + cgtx
Gradient descent approach
I Letrsquos find a local minimum of S(x)
I Relax problem to continuous domain S
Some solvers exist
No consideration of discrete nature of solution
26 42
Another strategy
Integer-Projected Fixed Point [Leordeanu et al 2009]
Franck-Wolfe like algorithm
Iterate until convergence
1 Discrete resolution of linear gradient of QAP rarr xt
2 Line search between xtminus1 and the solution found in step 1
27 42
Operating IPFP
At convergence
I Optimal continuous solution x is stable
I Need of a projection step to embed x to Π
Uncontrolled loss
No garanties that S(x) asymp S(xprime)
Importance of initialization
I Local minimum
Initialization is important [Carletti et al 2015]
28 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)
ζ = 1 Convex objective function
ζ = minus1 Concave objective function
29 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)
GNCCP algorithm
x = 0For ζ = 1rarr minus1
1 xlarr arg minxisinΠprime
S(x ζ)
2 ζ larr ζ minus 01
I Iterates ζ over a modified IPFP objective function
30 42
From ζ = 1 to ζ = 0
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
31 42
From ζ = 0 to ζ = minus1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
32 42
GNCCP vs IPFP
Pros
No more need of initialization
Converge towards a mappingmatrix
Cons
Complexity Iterate over IPFP
33 42
GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]
C
C
C
CC C
CS
S
Alkane Acyclic
NN
C
O
C
C
C
MAO PAH
34 42
Protocol
Arbitrary costs
I All substitutions 1
I All insertionsdeletions 3
Relative error
I Accuracy measure
I Overestimation The lowest approximation is the best one (dopt)
I Relative error d(Gi Gj)minus dopt(Gi Gj)
dopt(Gi Gj)
35 42
Relative errors
Alkane Acyclic MAO PAH0
50
100
150
200
250
Datasets
o
f re
lative e
rror
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
36 42
log Time vs Score Deviation
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
Alkane
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
Acyclic
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
MAO
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
PAH
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
37 42
Analysis
Tradeoff
I Accuracy is dependant to complexity
I Choose your method according to your priority
More complete analysis
I ICPR GED contest httpsgdc2016greycfr
I Others methods + Others datasets
38 42
GED Limitations
Mathematical properties
I GED is a distance
I But not an euclidean one
Impossible to derive a trivial kernel
Use with caution in SVMs
Complexity
I Still hard to compute on larger graphs
I Accuracy hard to evaluate (lack of gt)
39 42
Outlooks
Algorithms
I Matrix optimization
I Edge based mapping
I Optimization algorithms
40 42
Outlooks
Applications
I Explore graph space trough GEDI Median graph joint work with Paul
I Use of ged for classification
I How to set costs according to a task I rarr Metric learningI New Phd with Sebastien and Pierre
I Behavior of different methods
41 42
Conclusion
I GED is still an open problem
I Approximation algorithms exists
I but it stills room for improvements
I Focus on applications
42 42
Abu-Aisheh Z Raveaux R Ramel J-Y and Martineau P(2015)An Exact Graph Edit Distance Algorithm for Solving PatternRecognition ProblemsIn 4th International Conference on Pattern Recognition Applicationsand Methods
Bougleux S Brun L Carletti V Foggia P Gauzere B andVento M (2015)A quadratic assignment formulation of the graph edit distanceTechnical report Normandie Univ NormaSTIC FR 3638 France
Carletti V Gauzere B Brun L and Vento M (2015)Approximate graph edit distance computation combining bipartitematching and exact neighborhood substructure distanceIn GbRPR volume 9069 pages 168ndash177 Springer
Gauzere B Bougleux S Riesen K and Brun L (2014)Approximate Graph Edit Distance Guided by Bipartite Matching ofBags of WalksIn Structural Syntactic and Statistical Pattern Recognition volume8621 pages 73ndash82 Springer
42 42
Leordeanu M Hebert M and Sukthankar R (2009)An integer projected fixed point method for graph matching andmap inferenceIn Advances in neural information processing systems pages1114ndash1122
Liu Z-Y and Qiao H (2014)GNCCPndashGraduated NonConvexity and Concavity ProcedurePattern Anal Mach Intell 36(6) 1258ndash1267
Riesen K and Bunke H (2009)Approximate graph edit distance computation by means of bipartitegraph matchingImage and Vision Comp 27 950ndash959
42 42
Edit path
Graph edit distanceMinimal amount of distortion required to transform one graph into another
I Edit path γ Sequence of edit operations e
γ = e1 ep
I Elementary edit cost c(e)
D
N
N
D
CC
C
CC
C
N
D
CC
C
CC
C
O
C N
N
CC
CC
C
C
N
N
CC
CC
C
C
C
O DN
D
N
8 42
Formal Definition
Edit path cost γ
c(γ) =sumeisinγ
c(e)
Graph edit distance
I Edit Distance ged(G1G2) = min
γisinP
sumc(γ)
I Optimal edit path
γlowast isin arg minγisinP
sumc(γ)
9 42
How to compute
Graph Edit Distance
Tree search
11 42
Tree search methods
Alowast
I Dijkstra-based algorithm
I Need an heuristic h(p)
Always find a solution
But may take a loooooong time
Exponential number of edit paths
Depth first search based algorithm[Abu-Aisheh et al 2015]
I Based on heuristic
I Limitation on the number of open paths
I Any time algorithm can return an approximation before termination
I Parallelizable
12 42
Correspondance between edit paths and mappings[Bougleux et al 2015]
D
N
N
D
CC
C
CC
C
N
D
CC
C
CC
C
O
C N
N
CC
CC
C
C
N
N
CC
CC
C
C
C
O DN
D
N
13 42
Correspondance between edit paths and mappings[Bougleux et al 2015]
C
O DN
D
N
D
N
N
D
CC
C
CC
C
O
C N
N
CC
CC
C
C
13 42
Computing graph edit distance
is equivalent to
finding an optimal assignment
between nodes
14 42
Cost
Mapping ϕ V1 cup εrarr V2 cup ε
S(Vε1Vε2 ϕ) = Qe(Vε1Vε2 ϕ)︸ ︷︷ ︸Edgersquos Cost
+ Lv(Vε1Vε2 ϕ)︸ ︷︷ ︸Nodersquos Cost
15 42
Cost Matrix C
Lv (V ε1 V
ε2 ϕ) =
sumvisinV1
c(v ϕ(v))
︸ ︷︷ ︸node substitutions
+sum
visinV1V1
c(v ε)
︸ ︷︷ ︸node removals
+sum
visinV2V2
c(ε v)
︸ ︷︷ ︸node insertions
c(v(1)1 rarr v
(2)1 ) middot middot middot c(v
(1)1 rarr v
(2)m ) c(v
(1)1 rarr ε) infin middot middot middot infin
infin
c(v(1)i rarr v
(2)j )
c(v
(1)i rarr ε)
c(v(1)n rarr v
(2)1 ) c(v
(1)n rarr v
(2)m ) infin c(v
(1)n rarr ε)
c(εrarr v(2)1 ) infin middot middot middot infin 0 middot middot middot 0
infin
c(εrarr v(2)j )
infin c(εrarr v(2)m ) 0 0
16 42
Node operations cost
Node cost induced by ϕ
Lv (V ε1 V
ε2 ϕ) =
|G1|+|G2|sumi=1
|G1|+|G2|sumj=1
(Xϕ C)(i j)
Vectorized version
Lv (V ε1 V
ε2 ϕ) = cgtxϕ
17 42
Computation of edge costs
Edges operations cost I
Edge cost matrix D
D(i j k l) = c((i j)rarr (k l))
I (i j) isin E1 rarr deletion operation
I (k l) isin E2 rarr insertion operation
I Edgersquos mapping is induced by nodes mapping
19 42
Edges operations cost II
Edgersquos cost
Qe(V ε1 V
ε2 ϕ) =
|G1|+|G2|sumi j k l=1
Xϕ(i k)D(i j k l)Xϕ(j l)
Vectorized version
Qe(V ε1 V
ε2 ϕ) = xgtϕDxϕ
20 42
QAP formulation of GED
S(x) =1
2xgtDx︸ ︷︷ ︸
Edgersquos cost
+ cgtx︸︷︷︸Nodersquos cost
S(x) =1
2xgt∆x
xlowast = arg minxisinΠ
S(x)
21 42
An intractable problem
No garanties on ∆
rArr Non convex problem
rArr No polynomial solution of minxisinΠ
S(x)
I NP-Hard problem
Letrsquos look for an approximation
22 42
Approximation is overestimation
1 Any mapping corresponds to an edit path
2 Any edit path has a cost ge GED
3 Approximate mappinghArr Overestimation of GED
23 42
A First Approach
Linear approximation
xlowast = minxisinΠ
1
2xgtDx + cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
A First Approach
Linear approximation
xlowast = arg minxisinΠ
cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
A First Approach
Linear approximation
xlowast = arg minxisinΠ
cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
A first QAP approach
xlowast = minxisinΠ
1
2xgtDx + cgtx
Gradient descent approach
I Letrsquos find a local minimum of S(x)
I Relax problem to continuous domain S
Some solvers exist
No consideration of discrete nature of solution
26 42
Another strategy
Integer-Projected Fixed Point [Leordeanu et al 2009]
Franck-Wolfe like algorithm
Iterate until convergence
1 Discrete resolution of linear gradient of QAP rarr xt
2 Line search between xtminus1 and the solution found in step 1
27 42
Operating IPFP
At convergence
I Optimal continuous solution x is stable
I Need of a projection step to embed x to Π
Uncontrolled loss
No garanties that S(x) asymp S(xprime)
Importance of initialization
I Local minimum
Initialization is important [Carletti et al 2015]
28 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)
ζ = 1 Convex objective function
ζ = minus1 Concave objective function
29 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)
GNCCP algorithm
x = 0For ζ = 1rarr minus1
1 xlarr arg minxisinΠprime
S(x ζ)
2 ζ larr ζ minus 01
I Iterates ζ over a modified IPFP objective function
30 42
From ζ = 1 to ζ = 0
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
31 42
From ζ = 0 to ζ = minus1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
32 42
GNCCP vs IPFP
Pros
No more need of initialization
Converge towards a mappingmatrix
Cons
Complexity Iterate over IPFP
33 42
GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]
C
C
C
CC C
CS
S
Alkane Acyclic
NN
C
O
C
C
C
MAO PAH
34 42
Protocol
Arbitrary costs
I All substitutions 1
I All insertionsdeletions 3
Relative error
I Accuracy measure
I Overestimation The lowest approximation is the best one (dopt)
I Relative error d(Gi Gj)minus dopt(Gi Gj)
dopt(Gi Gj)
35 42
Relative errors
Alkane Acyclic MAO PAH0
50
100
150
200
250
Datasets
o
f re
lative e
rror
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
36 42
log Time vs Score Deviation
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
Alkane
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
Acyclic
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
MAO
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
PAH
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
Formal Definition
Edit path cost
$c(\gamma) = \sum_{e \in \gamma} c(e)$

Graph edit distance
- Edit distance: $\mathrm{ged}(G_1, G_2) = \min_{\gamma \in \mathcal{P}} c(\gamma)$
- Optimal edit path: $\gamma^\star \in \arg\min_{\gamma \in \mathcal{P}} c(\gamma)$
9 42
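The definition above is mechanical to evaluate once an edit path is given; a minimal sketch (the function names and the sample path are illustrative, using the uniform costs that appear later in the experimental protocol):

```python
# Minimal sketch: an edit path as a list of operations, each with a cost.
def path_cost(path, cost):
    """c(gamma) = sum of c(e) over the operations e in the path."""
    return sum(cost(op) for op in path)

# Hypothetical uniform costs (the values used on the protocol slide):
# substitutions cost 1, insertions and deletions cost 3.
def uniform_cost(op):
    return 1 if op[0] == 'sub' else 3

gamma = [('sub', 'C', 'N'), ('del', 'O'), ('ins', 'S')]
print(path_cost(gamma, uniform_cost))  # 1 + 3 + 3 = 7
```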
How to compute Graph Edit Distance
Tree search
11 42
Tree search methods
A*
- Dijkstra-based algorithm
- Needs a heuristic h(p)
- Always finds an exact solution
- But may take a very long time: exponential number of edit paths
Depth-first search based algorithm [Abu-Aisheh et al., 2015]
- Based on a heuristic
- Limits the number of open paths
- Anytime: can return an approximation before termination
- Parallelizable
12 42
Correspondence between edit paths and mappings [Bougleux et al., 2015]
[Figure: two molecular graphs (C, N, O nodes); an edit path between them is shown as a node mapping, with deleted and inserted nodes marked.]
13 42
Computing graph edit distance is equivalent to finding an optimal assignment between nodes.
14 42
Cost
Mapping $\varphi: V_1 \cup \{\varepsilon\} \to V_2 \cup \{\varepsilon\}$
$S(V_1^\varepsilon, V_2^\varepsilon, \varphi) = \underbrace{Q_e(V_1^\varepsilon, V_2^\varepsilon, \varphi)}_{\text{edge cost}} + \underbrace{L_v(V_1^\varepsilon, V_2^\varepsilon, \varphi)}_{\text{node cost}}$
15 42
Cost Matrix C
$L_v(V_1^\varepsilon, V_2^\varepsilon, \varphi) = \underbrace{\sum_{\substack{v \in V_1 \\ \varphi(v) \in V_2}} c(v, \varphi(v))}_{\text{node substitutions}} + \underbrace{\sum_{\substack{v \in V_1 \\ \varphi(v) = \varepsilon}} c(v, \varepsilon)}_{\text{node removals}} + \underbrace{\sum_{v \in V_2 \setminus \varphi(V_1)} c(\varepsilon, v)}_{\text{node insertions}}$

$C = \begin{pmatrix}
c(v_1^{(1)} \to v_1^{(2)}) & \cdots & c(v_1^{(1)} \to v_m^{(2)}) & c(v_1^{(1)} \to \varepsilon) & \infty & \cdots & \infty \\
\vdots & c(v_i^{(1)} \to v_j^{(2)}) & \vdots & \infty & \ddots & & \vdots \\
c(v_n^{(1)} \to v_1^{(2)}) & \cdots & c(v_n^{(1)} \to v_m^{(2)}) & \infty & \cdots & \infty & c(v_n^{(1)} \to \varepsilon) \\
c(\varepsilon \to v_1^{(2)}) & \infty & \cdots & \infty & 0 & \cdots & 0 \\
\infty & \ddots & & \vdots & \vdots & \ddots & \vdots \\
\infty & \cdots & \infty & c(\varepsilon \to v_m^{(2)}) & 0 & \cdots & 0
\end{pmatrix}$

(Substitutions fill the top-left block, deletions sit on the diagonal of the top-right block, insertions on the diagonal of the bottom-left block, and $\varepsilon \to \varepsilon$ entries cost zero in the bottom-right block.)
16 42
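The block layout above can be assembled in a few lines; a NumPy sketch, where the function name and argument conventions are hypothetical and only the block structure comes from the slide:

```python
import numpy as np

def build_cost_matrix(n, m, c_sub, c_del, c_ins):
    """(n+m) x (n+m) LSAP cost matrix: top-left substitutions,
    top-right deletions (diagonal, inf elsewhere), bottom-left
    insertions (diagonal, inf elsewhere), bottom-right zeros."""
    C = np.full((n + m, n + m), np.inf)
    C[:n, :m] = c_sub                  # c(v_i -> u_j)
    C[n:, m:] = 0.0                    # eps -> eps costs nothing
    for i in range(n):
        C[i, m + i] = c_del[i]         # c(v_i -> eps)
    for j in range(m):
        C[n + j, j] = c_ins[j]         # c(eps -> u_j)
    return C

# Toy instance: 2 nodes vs 3 nodes, unit substitutions, removals/insertions at 3.
C = build_cost_matrix(2, 3, c_sub=np.ones((2, 3)), c_del=[3, 3], c_ins=[3, 3, 3])
print(C.shape)  # (5, 5)
```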
Node operations cost
Node cost induced by $\varphi$:
$L_v(V_1^\varepsilon, V_2^\varepsilon, \varphi) = \sum_{i=1}^{|G_1|+|G_2|} \sum_{j=1}^{|G_1|+|G_2|} (X_\varphi \odot C)(i, j)$
Vectorized version:
$L_v(V_1^\varepsilon, V_2^\varepsilon, \varphi) = \mathbf{c}^\top \mathbf{x}_\varphi$
17 42
Computation of edge costs
Edge operation costs (I)
Edge cost matrix D:
$D(i, j, k, l) = c((i, j) \to (k, l))$
- $(i, j) \in E_1 \to \varepsilon$: deletion operation
- $\varepsilon \to (k, l) \in E_2$: insertion operation
- The edge mapping is induced by the node mapping
19 42
Edge operation costs (II)
Edge cost:
$Q_e(V_1^\varepsilon, V_2^\varepsilon, \varphi) = \sum_{i,j,k,l=1}^{|G_1|+|G_2|} X_\varphi(i, k)\, D(i, j, k, l)\, X_\varphi(j, l)$
Vectorized version:
$Q_e(V_1^\varepsilon, V_2^\varepsilon, \varphi) = \mathbf{x}_\varphi^\top D\, \mathbf{x}_\varphi$
20 42
QAP formulation of GED
$S(\mathbf{x}) = \underbrace{\tfrac{1}{2}\, \mathbf{x}^\top D\, \mathbf{x}}_{\text{edge cost}} + \underbrace{\mathbf{c}^\top \mathbf{x}}_{\text{node cost}}$
$S(\mathbf{x}) = \tfrac{1}{2}\, \mathbf{x}^\top \Delta\, \mathbf{x}$
$\mathbf{x}^\star = \arg\min_{\mathbf{x} \in \Pi} S(\mathbf{x})$
21 42
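Evaluating the objective for a given vectorized mapping is a one-liner; a quick numerical check where `qap_score` and the toy values of D, c and x are invented for illustration:

```python
import numpy as np

def qap_score(x, D, c):
    """S(x) = 1/2 x^T D x + c^T x for a vectorized mapping x."""
    return 0.5 * x @ D @ x + c @ x

# Hypothetical toy instance: two nodes per graph, x is the flattened
# permutation matrix mapping node 0 -> 0 and node 1 -> 1.
D = np.arange(16, dtype=float).reshape(4, 4)
c = np.array([1.0, 2.0, 3.0, 4.0])
x = np.array([1.0, 0.0, 0.0, 1.0])
print(qap_score(x, D, c))  # 0.5*(0 + 3 + 12 + 15) + (1 + 4) = 20.0
```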
An intractable problem
No guarantees on $\Delta$:
⇒ non-convex problem
⇒ no polynomial-time algorithm for $\min_{\mathbf{x} \in \Pi} S(\mathbf{x})$
- NP-hard problem
Let's look for an approximation.
22 42
Approximation is overestimation
1. Any mapping corresponds to an edit path
2. Any edit path has a cost ≥ GED
3. Approximate mapping ⇔ overestimation of GED
23 42
A First Approach
Linear approximation
$\mathbf{x}^\star = \arg\min_{\mathbf{x} \in \Pi} \mathbf{c}^\top \mathbf{x}$
- [Riesen and Bunke, 2009; Gaüzère et al., 2014; Carletti et al., 2015]
- Linear approximation of the QAP formulation: the quadratic term $\tfrac{1}{2}\,\mathbf{x}^\top D\,\mathbf{x}$ is dropped
- Hungarian algorithm, $O(n^3)$
- But: no structural information
24 42
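A sketch of this bipartite (LSAP) approximation, assuming the cost-matrix layout from the "Cost Matrix C" slide, with a large finite value standing in for $\infty$ so the solver stays numerical; the substitution costs are invented for the example:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Solve arg min c^T x with the Hungarian algorithm on the (n+m)x(n+m)
# cost matrix, then read off the (over-estimated) edit cost.
BIG = 1e9                                         # stands in for infinity
n, m = 2, 3
C = np.full((n + m, n + m), BIG)
C[:n, :m] = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0]]    # substitutions
C[n:, m:] = 0.0                                   # eps -> eps
for i in range(n): C[i, m + i] = 3.0              # deletions
for j in range(m): C[n + j, j] = 3.0              # insertions

rows, cols = linear_sum_assignment(C)
approx_ged = C[rows, cols].sum()
print(approx_ged)  # 0 + 0 + 3: two free substitutions plus one insertion
```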
Augmented Cost Matrix
Adding structural information
- $C_{ij}$: cost of mapping the neighbourhood of $v_i$ onto the neighbourhood of $v'_j$
  - direct neighbourhood
  - random walks
  - subgraphs
- Trade-off: complexity ↔ accuracy
25 42
A first QAP approach
$\mathbf{x}^\star = \arg\min_{\mathbf{x} \in \Pi} \tfrac{1}{2}\, \mathbf{x}^\top D\, \mathbf{x} + \mathbf{c}^\top \mathbf{x}$
Gradient descent approach
- Find a local minimum of $S(\mathbf{x})$
- Relax the problem to the continuous domain: $\tilde{S}$
- Some solvers exist
- But: no consideration of the discrete nature of the solution
26 42
Another strategy
Integer-Projected Fixed Point (IPFP) [Leordeanu et al., 2009]
A Frank-Wolfe-like algorithm. Iterate until convergence:
1. Discrete resolution of the linearized QAP (gradient) → $\mathbf{x}_t$
2. Line search between $\mathbf{x}_{t-1}$ and the solution found in step 1
27 42
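The two-step iteration above can be sketched as follows; this is a simplified reading of IPFP, not the authors' implementation, and the toy costs at the end are invented for illustration:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def ipfp(D, c, x0, n_iter=50):
    """Sketch of the IPFP iteration on S(x) = 1/2 x^T D x + c^T x.
    x is a flattened n x n assignment matrix; each step solves the
    linearized problem (an LSAP on the gradient), then line-searches."""
    n = int(np.sqrt(len(c)))
    x = x0.astype(float)
    for _ in range(n_iter):
        grad = D @ x + c                      # gradient of S at x (symmetric D)
        r, k = linear_sum_assignment(grad.reshape(n, n))
        b = np.zeros_like(x)
        b[r * n + k] = 1.0                    # discrete minimizer of the linearization
        d = b - x
        a1, a2 = grad @ d, d @ D @ d          # S(x + t d) = S(x) + a1*t + a2*t^2/2
        if a1 >= 0:
            break                             # no descent direction left: converged
        t = 1.0 if a2 <= 0 else min(1.0, -a1 / a2)
        x = x + t * d
    return x

# Toy run (hypothetical costs): quadratic term off, node costs favoring the
# identity mapping; starting from the swapped mapping, IPFP recovers it.
c = np.array([0.0, 10.0, 10.0, 0.0])
D = np.zeros((4, 4))
print(ipfp(D, c, np.array([0.0, 1.0, 1.0, 0.0])))  # [1. 0. 0. 1.]
```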
Operating IPFP
At convergence
- The optimal continuous solution $\mathbf{x}$ is stable
- A projection step is needed to map $\mathbf{x}$ back onto $\Pi$
- Uncontrolled loss: no guarantee that $S(\mathbf{x}) \approx S(\mathbf{x}')$
Importance of initialization
- Converges to a local minimum, so initialization matters [Carletti et al., 2015]
28 42
GNCCP approach [Liu and Qiao, 2014]
From a convex to a concave objective function
$\tilde{S}(\mathbf{x}, \zeta) = \zeta\, \mathbf{x}^\top \mathbf{x} + (1 - |\zeta|)\, S(\mathbf{x})$
- $\zeta = 1$: convex objective function
- $\zeta = -1$: concave objective function
29 42
GNCCP approach [Liu and Qiao, 2014]
GNCCP algorithm
$\mathbf{x} = 0$; for $\zeta = 1 \to -1$:
1. $\mathbf{x} \leftarrow \arg\min_{\mathbf{x} \in \Pi'} \tilde{S}(\mathbf{x}, \zeta)$
2. $\zeta \leftarrow \zeta - 0.1$
- Iterates over $\zeta$ with a modified IPFP objective function
30 42
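The outer loop above can be sketched directly; `minimize_over_Pi` stands in for the inner IPFP-style solve and is replaced here by a toy stand-in (picking the better of the two 2x2 permutations), so everything below the function is illustrative:

```python
import numpy as np

def gnccp(S, minimize_over_Pi, x0, steps=21):
    """Sketch of the GNCCP outer loop: drive the relaxed objective
    S_zeta(x) = zeta * x^T x + (1 - |zeta|) * S(x) from convex (zeta = 1)
    to concave (zeta = -1), warm-starting each inner minimization."""
    x = x0
    for zeta in np.linspace(1.0, -1.0, steps):   # zeta: 1 -> -1 in 0.1 steps
        S_zeta = lambda v, z=zeta: z * (v @ v) + (1.0 - abs(z)) * S(v)
        x = minimize_over_Pi(S_zeta, x)          # inner IPFP-like solve (assumed)
    return x

# Toy inner solver: exhaustively pick the best 2x2 permutation;
# S favors the identity mapping.
perms = [np.array([1.0, 0.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0, 0.0])]
best = gnccp(lambda v: v @ np.array([0.0, 10.0, 10.0, 0.0]),
             lambda S_z, x: min(perms, key=S_z), np.zeros(4))
print(best)  # [1. 0. 0. 1.]
```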
From ζ = 1 to ζ = 0
[Figure: snapshots of the relaxed objective landscape and the current iterate as $\zeta$ decreases from 1 to 0.]
31 42
From ζ = 0 to ζ = minus1
[Figure: snapshots of the objective landscape and the current iterate as $\zeta$ decreases from 0 to $-1$.]
32 42
GNCCP vs IPFP
Pros
- No more need for initialization
- Converges towards a mapping matrix
Cons
- Complexity: iterates over IPFP
33 42
GREYC chemistry datasets [https://iapr-tc15.greyc.fr/links.html]
[Figure: sample molecular graphs from each dataset]
Alkane, Acyclic, MAO, PAH
34 42
Protocol
Arbitrary costs
- All substitutions: 1
- All insertions/deletions: 3
Relative error
- Accuracy measure
- Overestimation: the lowest approximation is the best one ($d_{\mathrm{opt}}$)
- Relative error: $\dfrac{d(G_i, G_j) - d_{\mathrm{opt}}(G_i, G_j)}{d_{\mathrm{opt}}(G_i, G_j)}$
35 42
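The relative-error measure is a one-liner; the sample distances below are invented for illustration:

```python
def relative_error(d, d_opt):
    """Relative over-estimation of an approximation d against the best
    (lowest) approximate distance d_opt, as defined on the slide."""
    return (d - d_opt) / d_opt

print(relative_error(26.0, 20.0))  # 0.3, i.e. a 30% over-estimation
```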
Relative errors
[Figure: bar chart of the % of relative error per dataset (Alkane, Acyclic, MAO, PAH; y-axis 0 to 250%) for LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw, and GNCCP.]
36 42
log Time vs Score Deviation
[Figure: four panels (Alkane, Acyclic, MAO, PAH) plotting score deviation against log10 of time in seconds for LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw, and GNCCP.]
37 42
Analysis
Trade-off
- Accuracy depends on complexity
- Choose your method according to your priority
More complete analysis
- ICPR GED contest: https://gdc2016.greyc.fr
- Other methods + other datasets
38 42
GED Limitations
Mathematical properties
- GED is a distance
- But not a Euclidean one
- Impossible to derive a trivial kernel: use with caution in SVMs
Complexity
- Still hard to compute on larger graphs
- Accuracy hard to evaluate (lack of ground truth)
39 42
Outlooks
Algorithms
- Matrix optimization
- Edge-based mapping
- Optimization algorithms
40 42
Outlooks
Applications
- Explore the graph space through GED
  - Median graph: joint work with Paul
- Use of GED for classification
  - How to set costs according to a task → metric learning
  - New PhD with Sebastien and Pierre
  - Behavior of different methods
41 42
Conclusion
- GED is still an open problem
- Approximation algorithms exist
- But there is still room for improvement
- Focus on applications
42 42
Abu-Aisheh, Z., Raveaux, R., Ramel, J.-Y., and Martineau, P. (2015). An exact graph edit distance algorithm for solving pattern recognition problems. In 4th International Conference on Pattern Recognition Applications and Methods.
Bougleux, S., Brun, L., Carletti, V., Foggia, P., Gaüzère, B., and Vento, M. (2015). A quadratic assignment formulation of the graph edit distance. Technical report, Normandie Univ, NormaSTIC FR 3638, France.
Carletti, V., Gaüzère, B., Brun, L., and Vento, M. (2015). Approximate graph edit distance computation combining bipartite matching and exact neighborhood substructure distance. In GbRPR, volume 9069, pages 168–177. Springer.
Gaüzère, B., Bougleux, S., Riesen, K., and Brun, L. (2014). Approximate graph edit distance guided by bipartite matching of bags of walks. In Structural, Syntactic, and Statistical Pattern Recognition, volume 8621, pages 73–82. Springer.
Leordeanu, M., Hebert, M., and Sukthankar, R. (2009). An integer projected fixed point method for graph matching and MAP inference. In Advances in Neural Information Processing Systems, pages 1114–1122.
Liu, Z.-Y. and Qiao, H. (2014). GNCCP – Graduated NonConvexity and Concavity Procedure. Pattern Anal. Mach. Intell., 36(6):1258–1267.
Riesen, K. and Bunke, H. (2009). Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision Computing, 27:950–959.
How to compute
Graph Edit Distance
Tree search
11 42
Tree search methods
Alowast
I Dijkstra-based algorithm
I Need an heuristic h(p)
Always find a solution
But may take a loooooong time
Exponential number of edit paths
Depth first search based algorithm[Abu-Aisheh et al 2015]
I Based on heuristic
I Limitation on the number of open paths
I Any time algorithm can return an approximation before termination
I Parallelizable
12 42
Correspondance between edit paths and mappings[Bougleux et al 2015]
D
N
N
D
CC
C
CC
C
N
D
CC
C
CC
C
O
C N
N
CC
CC
C
C
N
N
CC
CC
C
C
C
O DN
D
N
13 42
Correspondance between edit paths and mappings[Bougleux et al 2015]
C
O DN
D
N
D
N
N
D
CC
C
CC
C
O
C N
N
CC
CC
C
C
13 42
Computing graph edit distance
is equivalent to
finding an optimal assignment
between nodes
14 42
Cost
Mapping ϕ V1 cup εrarr V2 cup ε
S(Vε1Vε2 ϕ) = Qe(Vε1Vε2 ϕ)︸ ︷︷ ︸Edgersquos Cost
+ Lv(Vε1Vε2 ϕ)︸ ︷︷ ︸Nodersquos Cost
15 42
Cost Matrix C
Lv (V ε1 V
ε2 ϕ) =
sumvisinV1
c(v ϕ(v))
︸ ︷︷ ︸node substitutions
+sum
visinV1V1
c(v ε)
︸ ︷︷ ︸node removals
+sum
visinV2V2
c(ε v)
︸ ︷︷ ︸node insertions
c(v(1)1 rarr v
(2)1 ) middot middot middot c(v
(1)1 rarr v
(2)m ) c(v
(1)1 rarr ε) infin middot middot middot infin
infin
c(v(1)i rarr v
(2)j )
c(v
(1)i rarr ε)
c(v(1)n rarr v
(2)1 ) c(v
(1)n rarr v
(2)m ) infin c(v
(1)n rarr ε)
c(εrarr v(2)1 ) infin middot middot middot infin 0 middot middot middot 0
infin
c(εrarr v(2)j )
infin c(εrarr v(2)m ) 0 0
16 42
Node operations cost
Node cost induced by ϕ
Lv (V ε1 V
ε2 ϕ) =
|G1|+|G2|sumi=1
|G1|+|G2|sumj=1
(Xϕ C)(i j)
Vectorized version
Lv (V ε1 V
ε2 ϕ) = cgtxϕ
17 42
Computation of edge costs
Edges operations cost I
Edge cost matrix D
D(i j k l) = c((i j)rarr (k l))
I (i j) isin E1 rarr deletion operation
I (k l) isin E2 rarr insertion operation
I Edgersquos mapping is induced by nodes mapping
19 42
Edges operations cost II
Edgersquos cost
Qe(V ε1 V
ε2 ϕ) =
|G1|+|G2|sumi j k l=1
Xϕ(i k)D(i j k l)Xϕ(j l)
Vectorized version
Qe(V ε1 V
ε2 ϕ) = xgtϕDxϕ
20 42
QAP formulation of GED
S(x) =1
2xgtDx︸ ︷︷ ︸
Edgersquos cost
+ cgtx︸︷︷︸Nodersquos cost
S(x) =1
2xgt∆x
xlowast = arg minxisinΠ
S(x)
21 42
An intractable problem
No garanties on ∆
rArr Non convex problem
rArr No polynomial solution of minxisinΠ
S(x)
I NP-Hard problem
Letrsquos look for an approximation
22 42
Approximation is overestimation
1 Any mapping corresponds to an edit path
2 Any edit path has a cost ge GED
3 Approximate mappinghArr Overestimation of GED
23 42
A First Approach
Linear approximation
xlowast = minxisinΠ
1
2xgtDx + cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
A First Approach
Linear approximation
xlowast = arg minxisinΠ
cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
A First Approach
Linear approximation
xlowast = arg minxisinΠ
cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
A first QAP approach
xlowast = minxisinΠ
1
2xgtDx + cgtx
Gradient descent approach
I Letrsquos find a local minimum of S(x)
I Relax problem to continuous domain S
Some solvers exist
No consideration of discrete nature of solution
26 42
Another strategy
Integer-Projected Fixed Point [Leordeanu et al 2009]
Franck-Wolfe like algorithm
Iterate until convergence
1 Discrete resolution of linear gradient of QAP rarr xt
2 Line search between xtminus1 and the solution found in step 1
27 42
Operating IPFP
At convergence
I Optimal continuous solution x is stable
I Need of a projection step to embed x to Π
Uncontrolled loss
No garanties that S(x) asymp S(xprime)
Importance of initialization
I Local minimum
Initialization is important [Carletti et al 2015]
28 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)
ζ = 1 Convex objective function
ζ = minus1 Concave objective function
29 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)
GNCCP algorithm
x = 0For ζ = 1rarr minus1
1 xlarr arg minxisinΠprime
S(x ζ)
2 ζ larr ζ minus 01
I Iterates ζ over a modified IPFP objective function
30 42
From ζ = 1 to ζ = 0
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
31 42
From ζ = 0 to ζ = minus1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
32 42
GNCCP vs IPFP
Pros
No more need of initialization
Converge towards a mappingmatrix
Cons
Complexity Iterate over IPFP
33 42
GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]
C
C
C
CC C
CS
S
Alkane Acyclic
NN
C
O
C
C
C
MAO PAH
34 42
Protocol
Arbitrary costs
I All substitutions 1
I All insertionsdeletions 3
Relative error
I Accuracy measure
I Overestimation The lowest approximation is the best one (dopt)
I Relative error d(Gi Gj)minus dopt(Gi Gj)
dopt(Gi Gj)
35 42
Relative errors
Alkane Acyclic MAO PAH0
50
100
150
200
250
Datasets
o
f re
lative e
rror
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
36 42
log Time vs Score Deviation
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
Alkane
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
Acyclic
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
MAO
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
PAH
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
37 42
Analysis
Tradeoff
I Accuracy is dependant to complexity
I Choose your method according to your priority
More complete analysis
I ICPR GED contest httpsgdc2016greycfr
I Others methods + Others datasets
38 42
GED Limitations
Mathematical properties
I GED is a distance
I But not an euclidean one
Impossible to derive a trivial kernel
Use with caution in SVMs
Complexity
I Still hard to compute on larger graphs
I Accuracy hard to evaluate (lack of gt)
39 42
Outlooks
Algorithms
I Matrix optimization
I Edge based mapping
I Optimization algorithms
40 42
Outlooks
Applications
I Explore graph space trough GEDI Median graph joint work with Paul
I Use of ged for classification
I How to set costs according to a task I rarr Metric learningI New Phd with Sebastien and Pierre
I Behavior of different methods
41 42
Conclusion
I GED is still an open problem
I Approximation algorithms exists
I but it stills room for improvements
I Focus on applications
42 42
Abu-Aisheh Z Raveaux R Ramel J-Y and Martineau P(2015)An Exact Graph Edit Distance Algorithm for Solving PatternRecognition ProblemsIn 4th International Conference on Pattern Recognition Applicationsand Methods
Bougleux S Brun L Carletti V Foggia P Gauzere B andVento M (2015)A quadratic assignment formulation of the graph edit distanceTechnical report Normandie Univ NormaSTIC FR 3638 France
Carletti V Gauzere B Brun L and Vento M (2015)Approximate graph edit distance computation combining bipartitematching and exact neighborhood substructure distanceIn GbRPR volume 9069 pages 168ndash177 Springer
Gauzere B Bougleux S Riesen K and Brun L (2014)Approximate Graph Edit Distance Guided by Bipartite Matching ofBags of WalksIn Structural Syntactic and Statistical Pattern Recognition volume8621 pages 73ndash82 Springer
42 42
Leordeanu M Hebert M and Sukthankar R (2009)An integer projected fixed point method for graph matching andmap inferenceIn Advances in neural information processing systems pages1114ndash1122
Liu Z-Y and Qiao H (2014)GNCCPndashGraduated NonConvexity and Concavity ProcedurePattern Anal Mach Intell 36(6) 1258ndash1267
Riesen K and Bunke H (2009)Approximate graph edit distance computation by means of bipartitegraph matchingImage and Vision Comp 27 950ndash959
42 42
Tree search
11 42
Tree search methods
Alowast
I Dijkstra-based algorithm
I Need an heuristic h(p)
Always find a solution
But may take a loooooong time
Exponential number of edit paths
Depth first search based algorithm[Abu-Aisheh et al 2015]
I Based on heuristic
I Limitation on the number of open paths
I Any time algorithm can return an approximation before termination
I Parallelizable
12 42
Correspondance between edit paths and mappings[Bougleux et al 2015]
D
N
N
D
CC
C
CC
C
N
D
CC
C
CC
C
O
C N
N
CC
CC
C
C
N
N
CC
CC
C
C
C
O DN
D
N
13 42
Correspondance between edit paths and mappings[Bougleux et al 2015]
C
O DN
D
N
D
N
N
D
CC
C
CC
C
O
C N
N
CC
CC
C
C
13 42
Computing graph edit distance
is equivalent to
finding an optimal assignment
between nodes
14 42
Cost
Mapping ϕ V1 cup εrarr V2 cup ε
S(Vε1Vε2 ϕ) = Qe(Vε1Vε2 ϕ)︸ ︷︷ ︸Edgersquos Cost
+ Lv(Vε1Vε2 ϕ)︸ ︷︷ ︸Nodersquos Cost
15 42
Cost Matrix C
Lv (V ε1 V
ε2 ϕ) =
sumvisinV1
c(v ϕ(v))
︸ ︷︷ ︸node substitutions
+sum
visinV1V1
c(v ε)
︸ ︷︷ ︸node removals
+sum
visinV2V2
c(ε v)
︸ ︷︷ ︸node insertions
c(v(1)1 rarr v
(2)1 ) middot middot middot c(v
(1)1 rarr v
(2)m ) c(v
(1)1 rarr ε) infin middot middot middot infin
infin
c(v(1)i rarr v
(2)j )
c(v
(1)i rarr ε)
c(v(1)n rarr v
(2)1 ) c(v
(1)n rarr v
(2)m ) infin c(v
(1)n rarr ε)
c(εrarr v(2)1 ) infin middot middot middot infin 0 middot middot middot 0
infin
c(εrarr v(2)j )
infin c(εrarr v(2)m ) 0 0
16 42
Node operations cost
Node cost induced by ϕ
Lv (V ε1 V
ε2 ϕ) =
|G1|+|G2|sumi=1
|G1|+|G2|sumj=1
(Xϕ C)(i j)
Vectorized version
Lv (V ε1 V
ε2 ϕ) = cgtxϕ
17 42
Computation of edge costs
Edges operations cost I
Edge cost matrix D
D(i j k l) = c((i j)rarr (k l))
I (i j) isin E1 rarr deletion operation
I (k l) isin E2 rarr insertion operation
I Edgersquos mapping is induced by nodes mapping
19 42
Edges operations cost II
Edgersquos cost
Qe(V ε1 V
ε2 ϕ) =
|G1|+|G2|sumi j k l=1
Xϕ(i k)D(i j k l)Xϕ(j l)
Vectorized version
Qe(V ε1 V
ε2 ϕ) = xgtϕDxϕ
20 42
QAP formulation of GED
S(x) =1
2xgtDx︸ ︷︷ ︸
Edgersquos cost
+ cgtx︸︷︷︸Nodersquos cost
S(x) =1
2xgt∆x
xlowast = arg minxisinΠ
S(x)
21 42
An intractable problem
No garanties on ∆
rArr Non convex problem
rArr No polynomial solution of minxisinΠ
S(x)
I NP-Hard problem
Letrsquos look for an approximation
22 42
Approximation is overestimation
1 Any mapping corresponds to an edit path
2 Any edit path has a cost ge GED
3 Approximate mappinghArr Overestimation of GED
23 42
A First Approach
Linear approximation
xlowast = minxisinΠ
1
2xgtDx + cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
A First Approach
Linear approximation
xlowast = arg minxisinΠ
cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
A First Approach
Linear approximation
xlowast = arg minxisinΠ
cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
A first QAP approach
xlowast = minxisinΠ
1
2xgtDx + cgtx
Gradient descent approach
I Letrsquos find a local minimum of S(x)
I Relax problem to continuous domain S
Some solvers exist
No consideration of discrete nature of solution
26 42
Another strategy
Integer-Projected Fixed Point [Leordeanu et al 2009]
Franck-Wolfe like algorithm
Iterate until convergence
1 Discrete resolution of linear gradient of QAP rarr xt
2 Line search between xtminus1 and the solution found in step 1
27 42
Operating IPFP
At convergence
I Optimal continuous solution x is stable
I Need of a projection step to embed x to Π
Uncontrolled loss
No garanties that S(x) asymp S(xprime)
Importance of initialization
I Local minimum
Initialization is important [Carletti et al 2015]
28 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)
ζ = 1 Convex objective function
ζ = minus1 Concave objective function
29 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)
GNCCP algorithm
x = 0For ζ = 1rarr minus1
1 xlarr arg minxisinΠprime
S(x ζ)
2 ζ larr ζ minus 01
I Iterates ζ over a modified IPFP objective function
30 42
From ζ = 1 to ζ = 0
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
31 42
From ζ = 0 to ζ = minus1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
32 42
GNCCP vs IPFP
Pros
No more need of initialization
Converge towards a mappingmatrix
Cons
Complexity Iterate over IPFP
33 42
GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]
C
C
C
CC C
CS
S
Alkane Acyclic
NN
C
O
C
C
C
MAO PAH
34 42
Protocol
Arbitrary costs
I All substitutions 1
I All insertionsdeletions 3
Relative error
I Accuracy measure
I Overestimation The lowest approximation is the best one (dopt)
I Relative error d(Gi Gj)minus dopt(Gi Gj)
dopt(Gi Gj)
35 42
Relative errors
Alkane Acyclic MAO PAH0
50
100
150
200
250
Datasets
o
f re
lative e
rror
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
36 42
log Time vs Score Deviation
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
Alkane
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
Acyclic
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
MAO
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
PAH
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
37 42
Analysis
Tradeoff
I Accuracy is dependant to complexity
I Choose your method according to your priority
More complete analysis
I ICPR GED contest httpsgdc2016greycfr
I Others methods + Others datasets
38 42
GED Limitations
Mathematical properties
I GED is a distance
I But not an euclidean one
Impossible to derive a trivial kernel
Use with caution in SVMs
Complexity
I Still hard to compute on larger graphs
I Accuracy hard to evaluate (lack of gt)
39 42
Outlooks
Algorithms
I Matrix optimization
I Edge based mapping
I Optimization algorithms
40 42
Outlooks
Applications
I Explore graph space trough GEDI Median graph joint work with Paul
I Use of ged for classification
I How to set costs according to a task I rarr Metric learningI New Phd with Sebastien and Pierre
I Behavior of different methods
41 42
Conclusion
I GED is still an open problem
I Approximation algorithms exists
I but it stills room for improvements
I Focus on applications
42 42
Abu-Aisheh Z Raveaux R Ramel J-Y and Martineau P(2015)An Exact Graph Edit Distance Algorithm for Solving PatternRecognition ProblemsIn 4th International Conference on Pattern Recognition Applicationsand Methods
Bougleux S Brun L Carletti V Foggia P Gauzere B andVento M (2015)A quadratic assignment formulation of the graph edit distanceTechnical report Normandie Univ NormaSTIC FR 3638 France
Carletti V Gauzere B Brun L and Vento M (2015)Approximate graph edit distance computation combining bipartitematching and exact neighborhood substructure distanceIn GbRPR volume 9069 pages 168ndash177 Springer
Gauzere B Bougleux S Riesen K and Brun L (2014)Approximate Graph Edit Distance Guided by Bipartite Matching ofBags of WalksIn Structural Syntactic and Statistical Pattern Recognition volume8621 pages 73ndash82 Springer
42 42
Leordeanu M Hebert M and Sukthankar R (2009)An integer projected fixed point method for graph matching andmap inferenceIn Advances in neural information processing systems pages1114ndash1122
Liu Z-Y and Qiao H (2014)GNCCPndashGraduated NonConvexity and Concavity ProcedurePattern Anal Mach Intell 36(6) 1258ndash1267
Riesen K and Bunke H (2009)Approximate graph edit distance computation by means of bipartitegraph matchingImage and Vision Comp 27 950ndash959
42 42
Tree search methods
Alowast
I Dijkstra-based algorithm
I Need an heuristic h(p)
Always find a solution
But may take a loooooong time
Exponential number of edit paths
Depth first search based algorithm[Abu-Aisheh et al 2015]
I Based on heuristic
I Limitation on the number of open paths
I Any time algorithm can return an approximation before termination
I Parallelizable
12 42
Correspondance between edit paths and mappings[Bougleux et al 2015]
D
N
N
D
CC
C
CC
C
N
D
CC
C
CC
C
O
C N
N
CC
CC
C
C
N
N
CC
CC
C
C
C
O DN
D
N
13 42
Correspondance between edit paths and mappings[Bougleux et al 2015]
C
O DN
D
N
D
N
N
D
CC
C
CC
C
O
C N
N
CC
CC
C
C
13 42
Computing graph edit distance
is equivalent to
finding an optimal assignment
between nodes
14 42
Cost
Mapping ϕ V1 cup εrarr V2 cup ε
S(Vε1Vε2 ϕ) = Qe(Vε1Vε2 ϕ)︸ ︷︷ ︸Edgersquos Cost
+ Lv(Vε1Vε2 ϕ)︸ ︷︷ ︸Nodersquos Cost
15 42
Cost Matrix C

$$L_v(V_1^\varepsilon, V_2^\varepsilon, \varphi) =
\underbrace{\sum_{\substack{v \in V_1 \\ \varphi(v) \in V_2}} c(v, \varphi(v))}_{\text{node substitutions}}
+ \underbrace{\sum_{\substack{v \in V_1 \\ \varphi(v) = \varepsilon}} c(v, \varepsilon)}_{\text{node removals}}
+ \underbrace{\sum_{v \in V_2 \setminus \varphi(V_1)} c(\varepsilon, v)}_{\text{node insertions}}$$

$$C = \begin{pmatrix}
c(v^{(1)}_1 \to v^{(2)}_1) & \cdots & c(v^{(1)}_1 \to v^{(2)}_m) & c(v^{(1)}_1 \to \varepsilon) & \infty & \cdots & \infty \\
\vdots & c(v^{(1)}_i \to v^{(2)}_j) & \vdots & \infty & \ddots & c(v^{(1)}_i \to \varepsilon) & \infty \\
c(v^{(1)}_n \to v^{(2)}_1) & \cdots & c(v^{(1)}_n \to v^{(2)}_m) & \infty & \cdots & \infty & c(v^{(1)}_n \to \varepsilon) \\
c(\varepsilon \to v^{(2)}_1) & \infty & \cdots & \infty & 0 & \cdots & 0 \\
\infty & \ddots & \infty & \vdots & \ddots & & \vdots \\
\infty & \cdots & c(\varepsilon \to v^{(2)}_m) & 0 & \cdots & \cdots & 0
\end{pmatrix}$$
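The block structure above can be assembled mechanically; the sketch below does so with NumPy. `build_cost_matrix` is a hypothetical helper name, and a large finite value could replace `np.inf` if a downstream LSAP solver requires finite entries.

```python
import numpy as np

def build_cost_matrix(c_sub, c_del, c_ins):
    """Assemble the (n+m) x (n+m) node cost matrix C sketched above:
    substitutions in the top-left block, deletion/insertion costs on
    the diagonals of the off blocks (infinity elsewhere), and zeros
    in the eps -> eps corner."""
    c_sub = np.asarray(c_sub, dtype=float)   # (n, m) substitution costs
    c_del = np.asarray(c_del, dtype=float)   # (n,)  deletion costs
    c_ins = np.asarray(c_ins, dtype=float)   # (m,)  insertion costs
    n, m = c_sub.shape
    C = np.full((n + m, n + m), np.inf)
    C[:n, :m] = c_sub                            # v_i -> u_j
    C[np.arange(n), m + np.arange(n)] = c_del    # v_i -> eps
    C[n + np.arange(m), np.arange(m)] = c_ins    # eps -> u_j
    C[n:, m:] = 0.0                              # eps -> eps is free
    return C
```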
Node operations cost

Node cost induced by $\varphi$:

$$L_v(V_1^\varepsilon, V_2^\varepsilon, \varphi) = \sum_{i=1}^{|G_1|+|G_2|} \sum_{j=1}^{|G_1|+|G_2|} (X_\varphi \circ C)(i, j)$$

Vectorized version:

$$L_v(V_1^\varepsilon, V_2^\varepsilon, \varphi) = \mathbf{c}^\top \mathbf{x}_\varphi$$
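The equivalence between the Hadamard-product form and the vectorized form $\mathbf{c}^\top \mathbf{x}_\varphi$ is easy to sanity-check numerically; the toy example below uses arbitrary numbers and row-major flattening for both $C$ and $X_\varphi$.

```python
import numpy as np

# Toy check: the Hadamard form sum((X_phi o C)(i, j)) and the vectorized
# form c^T x_phi of the node cost are the same quantity.
rng = np.random.default_rng(0)
C = rng.random((3, 3))                 # node cost matrix
X = np.eye(3)[[2, 0, 1]]               # a permutation (assignment) matrix X_phi
lhs = (X * C).sum()                    # sum of (X_phi o C)(i, j)
rhs = C.flatten() @ X.flatten()        # c^T x_phi with row-major flattening
assert np.isclose(lhs, rhs)
```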
Computation of edge costs

Edge operation costs I

Edge cost matrix D:

$$D(i, j, k, l) = c((i, j) \rightarrow (k, l))$$

- $(i, j) \in E_1$ mapped to $\varepsilon$ → deletion operation
- $(k, l) \in E_2$ with no pre-image → insertion operation
- The edge mapping is induced by the node mapping
Edge operation costs II

Edge cost:

$$Q_e(V_1^\varepsilon, V_2^\varepsilon, \varphi) = \sum_{i,j,k,l=1}^{|G_1|+|G_2|} X_\varphi(i, k)\, D(i, j, k, l)\, X_\varphi(j, l)$$

Vectorized version:

$$Q_e(V_1^\varepsilon, V_2^\varepsilon, \varphi) = \mathbf{x}_\varphi^\top D\, \mathbf{x}_\varphi$$
QAP formulation of GED

$$S(\mathbf{x}) = \underbrace{\tfrac{1}{2}\,\mathbf{x}^\top D\,\mathbf{x}}_{\text{edge cost}} + \underbrace{\mathbf{c}^\top \mathbf{x}}_{\text{node cost}}$$

Since $\mathbf{x}$ is binary, $\mathbf{c}^\top\mathbf{x} = \mathbf{x}^\top \operatorname{diag}(\mathbf{c})\,\mathbf{x}$, so both terms fold into a single quadratic form:

$$S(\mathbf{x}) = \tfrac{1}{2}\,\mathbf{x}^\top \Delta\, \mathbf{x}, \qquad \Delta = D + 2\operatorname{diag}(\mathbf{c})$$

$$\mathbf{x}^* = \arg\min_{\mathbf{x} \in \Pi} S(\mathbf{x})$$
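Evaluating the objective for a candidate flattened assignment vector is a one-liner; the sketch below checks it against a tiny hand-computed instance (all names and numbers are illustrative, not part of any reference implementation).

```python
import numpy as np

def qap_score(x, D, c):
    """Evaluate S(x) = 0.5 x^T D x + c^T x for a flattened assignment
    vector x, quadratic edge-cost matrix D and node-cost vector c."""
    return 0.5 * (x @ D @ x) + c @ x

# hand-checked toy instance: identity mapping of a 2x2 assignment
x = np.array([1.0, 0.0, 0.0, 1.0])
D = np.arange(16, dtype=float).reshape(4, 4)
c = np.array([1.0, 2.0, 3.0, 4.0])
s = qap_score(x, D, c)   # 0.5 * (0 + 3 + 12 + 15) + (1 + 4) = 20.0
```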
An intractable problem

No guarantees on $\Delta$
⇒ Non-convex problem
⇒ No polynomial-time algorithm for $\min_{\mathbf{x} \in \Pi} S(\mathbf{x})$
- NP-hard problem

Let's look for an approximation.
Approximation is overestimation

1. Any mapping corresponds to an edit path
2. Any edit path has a cost ≥ GED
3. Approximate mapping ⇔ overestimation of GED
A First Approach

Linear approximation: drop the quadratic term of the QAP and keep only the node costs,

$$\mathbf{x}^* = \arg\min_{\mathbf{x} \in \Pi} \mathbf{c}^\top \mathbf{x}$$

- [Riesen and Bunke, 2009; Gauzere et al., 2014; Carletti et al., 2015]
- Linear approximation of the QAP formulation
- Solved by the Hungarian algorithm ($O(n^3)$)
- But: no structural information
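A minimal sketch of this bipartite approximation using SciPy's LSAP solver (`scipy.optimize.linear_sum_assignment`). The 2-node-per-graph cost matrix is hand-made, and a large finite constant stands in for $\infty$ since the solver requires finite costs.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

INF = 1e9   # large finite stand-in for infinity
# Hand-made node cost matrix for two graphs with 2 nodes each (so 4x4):
C = np.array([
    [1.0, 2.0, 3.0, INF],    # v1 -> u1 / u2 / eps
    [2.0, 0.5, INF, 3.0],    # v2 -> u1 / u2 / eps
    [3.0, INF, 0.0, 0.0],    # eps -> u1
    [INF, 3.0, 0.0, 0.0],    # eps -> u2
])
rows, cols = linear_sum_assignment(C)    # Hungarian-style O(n^3) solver
approx_cost = C[rows, cols].sum()        # upper bound on the true GED
```

The returned assignment induces an edit path, so its cost is a valid (over)estimate of the GED, consistent with the "approximation is overestimation" slide.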
Augmented Cost Matrix

Adding some structural information
- $C_{ij}$: cost of mapping the neighbourhood of $v_i$ onto the neighbourhood of $v'_j$
  - Direct neighbourhood
  - Random walks
  - Subgraphs
- Trade-off: complexity ↔ accuracy
A first QAP approach

$$\mathbf{x}^* = \arg\min_{\mathbf{x} \in \Pi} \tfrac{1}{2}\,\mathbf{x}^\top D\,\mathbf{x} + \mathbf{c}^\top \mathbf{x}$$

Gradient descent approach
- Let's find a local minimum of $S(\mathbf{x})$
- Relax the problem to the continuous domain: $\tilde{S}$
- Some solvers exist
- But: no consideration of the discrete nature of the solution
Another strategy

Integer-Projected Fixed Point (IPFP) [Leordeanu et al., 2009]
- Frank-Wolfe-like algorithm
- Iterate until convergence:
  1. Discrete resolution of the linearized QAP gradient → $\mathbf{x}_t$
  2. Line search between $\mathbf{x}_{t-1}$ and the solution found in step 1
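The two steps above can be sketched as follows, assuming a symmetric $D$ (so the gradient of $S$ is $D\mathbf{x} + \mathbf{c}$) and a flattened $(n+m)\times(n+m)$ assignment vector; `ipfp` and its shapes are this illustration's choices, not a reference implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def ipfp(D, c, x0, n, m, iters=50, tol=1e-9):
    """IPFP sketch for S(x) = 0.5 x^T D x + c^T x with symmetric D.
    Step 1 solves the linearized problem as an LSAP; step 2 is an
    exact line search on the quadratic along the segment."""
    x = x0.astype(float)
    N = n + m
    for _ in range(iters):
        grad = D @ x + c                       # gradient of S at x
        rows, cols = linear_sum_assignment(grad.reshape(N, N))
        b = np.zeros_like(x)
        b[rows * N + cols] = 1.0               # discrete minimizer of the linearization
        d = b - x
        a1 = d @ grad                          # directional derivative at t = 0
        if a1 >= -tol:
            break                              # no descent direction left
        a2 = d @ D @ d                         # curvature along d
        t = 1.0 if a2 <= 0 else min(1.0, -a1 / a2)
        x = x + t * d
    return x
```

Note the fixed-point flavour: when the LSAP on the gradient returns the current point itself, the direction vanishes and the iteration stops.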
Operating IPFP

At convergence
- The optimal continuous solution $\tilde{\mathbf{x}}$ is stable
- A projection step is needed to map $\tilde{\mathbf{x}}$ back to $\Pi$

Uncontrolled loss
- No guarantee that $S(\tilde{\mathbf{x}}) \approx S(\mathbf{x}')$

Importance of initialization
- IPFP only reaches a local minimum, so initialization matters [Carletti et al., 2015]
GNCCP approach [Liu and Qiao, 2014]

From a convex to a concave objective function:

$$\tilde{S}(\mathbf{x}, \zeta) = \zeta\,(\mathbf{x}^\top\mathbf{x}) + (1 - |\zeta|)\,S(\mathbf{x})$$

- $\zeta = 1$: convex objective function
- $\zeta = -1$: concave objective function

GNCCP algorithm

x = 0
For ζ = 1 → −1:
  1. x ← arg min_{x ∈ Π′} S̃(x, ζ)
  2. ζ ← ζ − 0.1

- Iterates ζ over a modified IPFP objective function
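The schedule can be sketched independently of the inner solver. Below, `gnccp` (a hypothetical name) anneals ζ from 1 to −1 and warm-starts each inner solve with the previous solution; a toy candidate-picking inner solver stands in for the modified IPFP.

```python
import numpy as np

def gnccp(S, inner_solver, x0, step=0.1):
    """GNCCP schedule sketch [Liu and Qiao, 2014]: anneal the relaxed
    objective S_zeta(x) = zeta * (x . x) + (1 - |zeta|) * S(x) from
    convex (zeta = 1) to concave (zeta = -1), warm-starting each
    inner solve with the previous solution."""
    x = x0
    for zeta in np.linspace(1.0, -1.0, int(round(2.0 / step)) + 1):
        S_zeta = lambda v, z=zeta: z * (v @ v) + (1.0 - abs(z)) * S(v)
        x = inner_solver(S_zeta, x)   # stands in for the modified IPFP
    return x  # at zeta = -1 the concave term pushes x to a discrete point

# Toy usage: the inner solver just picks the best of a few candidates.
def toy_inner(objective, x_init):
    candidates = [np.array([1.0, 0.0]), np.array([0.0, 1.0]),
                  np.array([0.5, 0.5]), x_init]
    return min(candidates, key=objective)

S = lambda v: (v[0] - 1.0) ** 2       # toy relaxed objective
x_final = gnccp(S, toy_inner, np.array([0.5, 0.5]))
```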
From ζ = 1 to ζ = 0

(Figure: animation frames showing the relaxed objective $\tilde{S}(\mathbf{x}, \zeta)$ deforming from the convex function at $\zeta = 1$ back towards the original landscape as $\zeta$ decreases to 0.)
From ζ = 0 to ζ = −1

(Figure: animation frames showing the objective deforming further into a concave function as $\zeta$ decreases from 0 to −1, pushing the minimizer towards a discrete point.)
GNCCP vs IPFP

Pros
- No more need for an initialization
- Converges towards a mapping matrix

Cons
- Complexity: iterates over IPFP
GREYC chemistry datasets [https://iapr-tc15.greyc.fr/links.html]

(Figure: example molecular graphs from the four benchmark datasets: Alkane, Acyclic, MAO and PAH.)
Protocol

Arbitrary costs
- All substitutions: 1
- All insertions/deletions: 3

Relative error
- Accuracy measure
- Overestimation: the lowest approximation is the best one ($d_{opt}$)
- Relative error: $\dfrac{d(G_i, G_j) - d_{opt}(G_i, G_j)}{d_{opt}(G_i, G_j)}$
Relative errors

(Figure: bar chart of the relative error, in %, of LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw and GNCCP on the Alkane, Acyclic, MAO and PAH datasets.)
log Time vs Score Deviation

(Figure: four panels, one per dataset — Alkane, Acyclic, MAO, PAH — plotting score deviation against log10 of the computation time in seconds for LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw and GNCCP.)
Analysis

Trade-off
- Accuracy depends on complexity
- Choose your method according to your priority

More complete analysis
- ICPR GED contest: https://gdc2016.greyc.fr
- Other methods + other datasets
GED Limitations

Mathematical properties
- GED is a distance
- But not a Euclidean one
- Impossible to derive a kernel trivially
- Use with caution in SVMs

Complexity
- Still hard to compute on larger graphs
- Accuracy hard to evaluate (lack of ground truth)
Outlooks

Algorithms
- Matrix optimization
- Edge-based mapping
- Optimization algorithms
Outlooks

Applications
- Explore the graph space through GED
  - Median graph: joint work with Paul
- Use of GED for classification
- How to set costs according to a task?
  - → Metric learning
  - New PhD with Sebastien and Pierre
- Behavior of different methods
Conclusion

- GED is still an open problem
- Approximation algorithms exist
- but there is still room for improvement
- Focus on applications
Abu-Aisheh, Z., Raveaux, R., Ramel, J.-Y., and Martineau, P. (2015). An Exact Graph Edit Distance Algorithm for Solving Pattern Recognition Problems. In 4th International Conference on Pattern Recognition Applications and Methods.

Bougleux, S., Brun, L., Carletti, V., Foggia, P., Gauzere, B., and Vento, M. (2015). A quadratic assignment formulation of the graph edit distance. Technical report, Normandie Univ, NormaSTIC FR 3638, France.

Carletti, V., Gauzere, B., Brun, L., and Vento, M. (2015). Approximate graph edit distance computation combining bipartite matching and exact neighborhood substructure distance. In GbRPR, volume 9069, pages 168–177. Springer.

Gauzere, B., Bougleux, S., Riesen, K., and Brun, L. (2014). Approximate Graph Edit Distance Guided by Bipartite Matching of Bags of Walks. In Structural, Syntactic, and Statistical Pattern Recognition, volume 8621, pages 73–82. Springer.

Leordeanu, M., Hebert, M., and Sukthankar, R. (2009). An integer projected fixed point method for graph matching and MAP inference. In Advances in Neural Information Processing Systems, pages 1114–1122.

Liu, Z.-Y. and Qiao, H. (2014). GNCCP — Graduated NonConvexity and Concavity Procedure. IEEE Trans. Pattern Anal. Mach. Intell., 36(6):1258–1267.

Riesen, K. and Bunke, H. (2009). Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision Computing, 27:950–959.
Computing graph edit distance
is equivalent to
finding an optimal assignment
between nodes
14 42
Cost
Mapping ϕ V1 cup εrarr V2 cup ε
S(Vε1Vε2 ϕ) = Qe(Vε1Vε2 ϕ)︸ ︷︷ ︸Edgersquos Cost
+ Lv(Vε1Vε2 ϕ)︸ ︷︷ ︸Nodersquos Cost
15 42
Cost Matrix C
Lv (V ε1 V
ε2 ϕ) =
sumvisinV1
c(v ϕ(v))
︸ ︷︷ ︸node substitutions
+sum
visinV1V1
c(v ε)
︸ ︷︷ ︸node removals
+sum
visinV2V2
c(ε v)
︸ ︷︷ ︸node insertions
c(v(1)1 rarr v
(2)1 ) middot middot middot c(v
(1)1 rarr v
(2)m ) c(v
(1)1 rarr ε) infin middot middot middot infin
infin
c(v(1)i rarr v
(2)j )
c(v
(1)i rarr ε)
c(v(1)n rarr v
(2)1 ) c(v
(1)n rarr v
(2)m ) infin c(v
(1)n rarr ε)
c(εrarr v(2)1 ) infin middot middot middot infin 0 middot middot middot 0
infin
c(εrarr v(2)j )
infin c(εrarr v(2)m ) 0 0
16 42
Node operations cost
Node cost induced by ϕ
Lv (V ε1 V
ε2 ϕ) =
|G1|+|G2|sumi=1
|G1|+|G2|sumj=1
(Xϕ C)(i j)
Vectorized version
Lv (V ε1 V
ε2 ϕ) = cgtxϕ
17 42
Computation of edge costs
Edges operations cost I
Edge cost matrix D
D(i j k l) = c((i j)rarr (k l))
I (i j) isin E1 rarr deletion operation
I (k l) isin E2 rarr insertion operation
I Edgersquos mapping is induced by nodes mapping
19 42
Edges operations cost II
Edgersquos cost
Qe(V ε1 V
ε2 ϕ) =
|G1|+|G2|sumi j k l=1
Xϕ(i k)D(i j k l)Xϕ(j l)
Vectorized version
Qe(V ε1 V
ε2 ϕ) = xgtϕDxϕ
20 42
QAP formulation of GED
S(x) =1
2xgtDx︸ ︷︷ ︸
Edgersquos cost
+ cgtx︸︷︷︸Nodersquos cost
S(x) =1
2xgt∆x
xlowast = arg minxisinΠ
S(x)
21 42
An intractable problem
No garanties on ∆
rArr Non convex problem
rArr No polynomial solution of minxisinΠ
S(x)
I NP-Hard problem
Letrsquos look for an approximation
22 42
Approximation is overestimation
1 Any mapping corresponds to an edit path
2 Any edit path has a cost ge GED
3 Approximate mappinghArr Overestimation of GED
23 42
A First Approach
Linear approximation
xlowast = minxisinΠ
1
2xgtDx + cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
A First Approach
Linear approximation
xlowast = arg minxisinΠ
cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
A First Approach
Linear approximation
xlowast = arg minxisinΠ
cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
A first QAP approach
xlowast = minxisinΠ
1
2xgtDx + cgtx
Gradient descent approach
I Letrsquos find a local minimum of S(x)
I Relax problem to continuous domain S
Some solvers exist
No consideration of discrete nature of solution
26 42
Another strategy

Integer-Projected Fixed Point (IPFP) [Leordeanu et al., 2009]
A Frank-Wolfe-like algorithm. Iterate until convergence:
1. Discrete resolution of the linearized gradient of the QAP → x_t
2. Line search between x_{t−1} and the solution found in step 1
27/42
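The two steps above can be sketched as follows; this is an illustration under the slides' QAP notation (x is a flattened n × n assignment matrix, the discrete step uses the Hungarian algorithm, and the line search is exact because the objective is quadratic), not the authors' implementation:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def ipfp(D, c, n, x0, iters=50):
    """Sketch of Integer-Projected Fixed Point for min 0.5 x^T D x + c^T x
    over (relaxed) permutation matrices of size n, x being the flattening."""
    x = x0.copy()
    for _ in range(iters):
        g = D @ x + c                                    # gradient of the QAP
        r, cols = linear_sum_assignment(g.reshape(n, n)) # discrete linearized step
        b = np.zeros_like(x)
        b[r * n + cols] = 1.0
        d = b - x
        num = -(g @ d)              # decrease along d (>= 0 by LSAP optimality)
        if num <= 1e-12:
            break                   # fixed point reached
        den = d @ D @ d
        t = 1.0 if den <= 0 else min(1.0, num / den)  # exact quadratic line search
        x = x + t * d
    return x
```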
Operating IPFP

At convergence:
- The optimal continuous solution x̃ is stable
- A projection step is needed to map x̃ back onto Π
Uncontrolled loss:
- No guarantee that S(x̃) ≈ S(x′) after projection
Importance of initialization:
- IPFP converges to a local minimum, so initialization matters [Carletti et al., 2015]
28/42
GNCCP approach [Liu and Qiao, 2014]

From a convex to a concave objective function:

    S(x, ζ) = ζ(xᵀx) + (1 − |ζ|) S(x)

- ζ = 1: convex objective function
- ζ = −1: concave objective function

GNCCP algorithm:
    x = 0
    For ζ = 1 → −1:
      1. x ← arg min_{x ∈ Π′} S(x, ζ)
      2. ζ ← ζ − 0.1
- Iterates ζ over a modified IPFP objective function
29–30/42
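The schedule above can be sketched as a loop over ζ that reuses Frank-Wolfe/IPFP-style inner steps on the modified objective S(x, ζ) = ζxᵀx + (1 − |ζ|)(½xᵀDx + cᵀx). This is an illustrative toy minimizer under those assumptions, not the reference GNCCP code:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def gnccp(D, c, n, step=0.1, inner_iters=30):
    """Sketch of GNCCP: follow the minimizer of
    S(x, zeta) = zeta * x^T x + (1 - |zeta|) * (0.5 x^T D x + c^T x)
    as zeta goes from 1 (convex) down to -1 (concave)."""
    x = np.full(n * n, 1.0 / n)            # start from the barycenter
    zeta = 1.0
    while zeta >= -1.0:
        for _ in range(inner_iters):       # Frank-Wolfe steps on S(., zeta)
            g = 2 * zeta * x + (1 - abs(zeta)) * (D @ x + c)
            r, cols = linear_sum_assignment(g.reshape(n, n))
            b = np.zeros_like(x)
            b[r * n + cols] = 1.0
            d = b - x
            num = -(g @ d)
            if num <= 1e-12:
                break                      # stationary for this zeta
            den = 2 * zeta * (d @ d) + (1 - abs(zeta)) * (d @ D @ d)
            t = 1.0 if den <= 0 else min(1.0, num / den)
            x = x + t * d
        zeta -= step
    return x
```

By the time ζ is negative the objective is concave along every direction, so the line search takes full steps and the iterate lands on a vertex of Π, i.e. a discrete mapping, without a separate projection step.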
From ζ = 1 to ζ = 0
[Figure: sequence of plots over [−1, 1]² showing the objective S(x, ζ) deforming from the convex ζ = 1 case towards the original S(x) as ζ decreases to 0]
31/42
From ζ = 0 to ζ = −1
[Figure: sequence of plots over [−1, 1]² showing S(x, ζ) becoming increasingly concave as ζ decreases from 0 to −1, pushing the minimizer towards a discrete solution]
32/42
GNCCP vs IPFP

Pros:
- No more need for initialization
- Converges towards a mapping matrix
Cons:
- Complexity: iterates over IPFP
33/42
GREYC chemistry datasets [https://iapr-tc15.greyc.fr/links.html]
[Figure: example molecular graphs from the four datasets: Alkane, Acyclic, MAO, PAH]
34/42
Protocol

Arbitrary costs:
- All substitutions: 1
- All insertions/deletions: 3

Relative error as accuracy measure:
- Since approximations overestimate, the lowest approximation is the best one (d_opt)
- Relative error: (d(G_i, G_j) − d_opt(G_i, G_j)) / d_opt(G_i, G_j)
35/42
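The relative-error measure above is straightforward to compute; a small sketch:

```python
def relative_error(d_approx, d_opt):
    """Relative overestimation of an approximate GED with respect to the
    best (lowest) known approximation d_opt, as in the protocol above."""
    return (d_approx - d_opt) / d_opt

# e.g. an approximation of 15 against a best-known value of 12 -> 25% error
err = relative_error(15.0, 12.0)
```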
Relative errors
[Figure: bar chart of relative error (%, 0–250) per dataset (Alkane, Acyclic, MAO, PAH) for LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw, and GNCCP]
36/42
log Time vs Score Deviation
[Figure: four panels (Alkane, Acyclic, MAO, PAH) plotting score deviation (0–1) against log10 of computation time in seconds for LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw, and GNCCP]
37/42
Analysis

Trade-off:
- Accuracy depends on complexity
- Choose your method according to your priority

More complete analysis:
- ICPR GED contest: https://gdc2016.greyc.fr
- Other methods + other datasets
38/42
GED Limitations

Mathematical properties:
- GED is a distance
- But not a Euclidean one
⇒ Impossible to derive a trivial kernel; use with caution in SVMs

Complexity:
- Still hard to compute on larger graphs
- Accuracy hard to evaluate (lack of ground truth)
39/42
Outlooks

Algorithms:
- Matrix optimization
- Edge-based mapping
- Optimization algorithms
40/42
Outlooks

Applications:
- Explore the graph space through GED
  - Median graph: joint work with Paul
- Use of GED for classification
- How to set costs according to a task?
  - → Metric learning
  - New PhD with Sébastien and Pierre
- Behavior of different methods
41/42
Conclusion

- GED is still an open problem
- Approximation algorithms exist
- ...but there is still room for improvement
- Focus on applications
42/42
Abu-Aisheh, Z., Raveaux, R., Ramel, J.-Y., and Martineau, P. (2015). An exact graph edit distance algorithm for solving pattern recognition problems. In 4th International Conference on Pattern Recognition Applications and Methods.

Bougleux, S., Brun, L., Carletti, V., Foggia, P., Gaüzère, B., and Vento, M. (2015). A quadratic assignment formulation of the graph edit distance. Technical report, Normandie Univ, NormaSTIC FR 3638, France.

Carletti, V., Gaüzère, B., Brun, L., and Vento, M. (2015). Approximate graph edit distance computation combining bipartite matching and exact neighborhood substructure distance. In GbRPR, volume 9069, pages 168–177. Springer.

Gaüzère, B., Bougleux, S., Riesen, K., and Brun, L. (2014). Approximate graph edit distance guided by bipartite matching of bags of walks. In Structural, Syntactic, and Statistical Pattern Recognition, volume 8621, pages 73–82. Springer.

Leordeanu, M., Hebert, M., and Sukthankar, R. (2009). An integer projected fixed point method for graph matching and MAP inference. In Advances in Neural Information Processing Systems, pages 1114–1122.

Liu, Z.-Y. and Qiao, H. (2014). GNCCP — graduated nonconvexity and concavity procedure. IEEE Trans. Pattern Anal. Mach. Intell., 36(6):1258–1267.

Riesen, K. and Bunke, H. (2009). Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision Computing, 27:950–959.
42/42
Cost

Mapping ϕ : V_1 ∪ {ε} → V_2 ∪ {ε}

    S(V_1^ε, V_2^ε, ϕ) = Q_e(V_1^ε, V_2^ε, ϕ) + L_v(V_1^ε, V_2^ε, ϕ)
                          (edges' cost)          (nodes' cost)
15/42
Cost Matrix C

    L_v(V_1^ε, V_2^ε, ϕ) = Σ_{v ∈ V_1} c(v, ϕ(v))  +  Σ_{v ∈ V_1 \ V̂_1} c(v, ε)  +  Σ_{v ∈ V_2 \ V̂_2} c(ε, v)
                            (node substitutions)      (node removals)               (node insertions)

C is the (n + m) × (n + m) matrix

        ⎡ c(v_1^(1)→v_1^(2)) ··· c(v_1^(1)→v_m^(2)) │ c(v_1^(1)→ε)     ∞     ···        ∞       ⎤
        ⎢        ···     c(v_i^(1)→v_j^(2))  ···    │      ∞      c(v_i^(1)→ε)          ∞       ⎥
        ⎢ c(v_n^(1)→v_1^(2)) ··· c(v_n^(1)→v_m^(2)) │      ∞          ···       c(v_n^(1)→ε)    ⎥
    C = ⎢───────────────────────────────────────────┼────────────────────────────────────────────⎥
        ⎢ c(ε→v_1^(2))     ∞      ···       ∞       │      0          ···             0         ⎥
        ⎢      ∞      c(ε→v_j^(2))          ∞       │      0          ···             0         ⎥
        ⎣      ∞          ···       c(ε→v_m^(2))    │      0          ···             0         ⎦

i.e. an n × m block of substitution costs, a diagonal n × n block of deletion costs,
a diagonal m × m block of insertion costs, and a zero m × n block (ε → ε).
16/42
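The block structure above can be sketched in code; `build_cost_matrix` is a hypothetical helper name, and the default deletion/insertion cost of 3 follows the experimental protocol later in the deck:

```python
import numpy as np

def build_cost_matrix(n, m, c_sub, c_del=3.0, c_ins=3.0):
    """(n+m) x (n+m) node cost matrix of the slides: substitutions in the
    top-left n x m block, deletions on the diagonal of the top-right block,
    insertions on the diagonal of the bottom-left block, zeros for
    eps -> eps, and infinity everywhere else."""
    C = np.full((n + m, n + m), np.inf)
    C[:n, :m] = c_sub                            # c(v_i^(1) -> v_j^(2))
    C[np.arange(n), m + np.arange(n)] = c_del    # c(v_i^(1) -> eps)
    C[n + np.arange(m), np.arange(m)] = c_ins    # c(eps -> v_j^(2))
    C[n:, m:] = 0.0                              # eps -> eps
    return C
```

Feeding this matrix to a linear assignment solver yields exactly the LSAP approximation discussed on slide 24.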
Node operations cost

Node cost induced by ϕ:

    L_v(V_1^ε, V_2^ε, ϕ) = Σ_{i=1}^{|G_1|+|G_2|} Σ_{j=1}^{|G_1|+|G_2|} (X_ϕ ⊙ C)(i, j)

Vectorized version:

    L_v(V_1^ε, V_2^ε, ϕ) = cᵀ x_ϕ
17/42
Computation of edge costs

Edge operations cost I
Edge cost matrix D:

    D(i, j, k, l) = c((i, j) → (k, l))

- (i, j) ∈ E_1 → deletion operation
- (k, l) ∈ E_2 → insertion operation
- The edge mapping is induced by the node mapping
19/42
Edge operations cost II

Edges' cost:

    Q_e(V_1^ε, V_2^ε, ϕ) = Σ_{i,j,k,l=1}^{|G_1|+|G_2|} X_ϕ(i, k) D(i, j, k, l) X_ϕ(j, l)

Vectorized version:

    Q_e(V_1^ε, V_2^ε, ϕ) = x_ϕᵀ D x_ϕ
20/42
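The equivalence between the double-sum and the vectorized form can be checked numerically. The sketch below assumes D is stored as a 4-index tensor D(i, j, k, l) and flattened with rows indexed by (i, k) and columns by (j, l), matching x_ϕ = vec(X_ϕ) in row-major order; the random values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4                                   # |G1|+|G2| in the slides' notation (toy value)
D4 = rng.random((n, n, n, n))           # D(i, j, k, l)
X = np.zeros((n, n))
X[np.arange(n), rng.permutation(n)] = 1.0   # a permutation matrix X_phi

# double-sum form of Q_e
qe_sum = sum(X[i, k] * D4[i, j, k, l] * X[j, l]
             for i in range(n) for j in range(n)
             for k in range(n) for l in range(n))

# vectorized form: flatten D(i,j,k,l) to an (n^2 x n^2) matrix, rows (i,k), cols (j,l)
Dm = D4.transpose(0, 2, 1, 3).reshape(n * n, n * n)
x = X.reshape(-1)                       # x_phi = vec(X_phi), row-major
qe_vec = x @ Dm @ x
```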
QAP formulation of GED

    S(x) = (1/2) xᵀDx + cᵀx
           (edges' cost) (nodes' cost)

    S(x) = (1/2) xᵀ∆x

    x* = arg min_{x ∈ Π} S(x)
21/42
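For tiny graphs this QAP can be solved exactly by enumerating permutation matrices, which gives a reference value when judging the approximations discussed next. A minimal sketch (hypothetical helper, assuming D is the flattened quadratic cost matrix and c the linear node costs):

```python
import itertools
import numpy as np

def exact_qap(D, c, n):
    """Brute-force min over n x n permutation matrices of
    S(x) = 0.5 x^T D x + c^T x — only feasible for tiny n."""
    best, best_x = np.inf, None
    for perm in itertools.permutations(range(n)):
        x = np.zeros(n * n)
        x[[i * n + p for i, p in enumerate(perm)]] = 1.0   # vec of the permutation
        s = 0.5 * x @ D @ x + c @ x
        if s < best:
            best, best_x = s, x
    return best, best_x
```

The n! enumeration makes the NP-hardness discussed on the next slide concrete: beyond a dozen or so nodes this is hopeless, hence the approximations.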
Approximation is overestimation
1 Any mapping corresponds to an edit path
2 Any edit path has a cost ge GED
3 Approximate mappinghArr Overestimation of GED
23 42
A First Approach
Linear approximation
xlowast = minxisinΠ
1
2xgtDx + cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
A First Approach
Linear approximation
xlowast = arg minxisinΠ
cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
A First Approach
Linear approximation
xlowast = arg minxisinΠ
cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
A first QAP approach
xlowast = minxisinΠ
1
2xgtDx + cgtx
Gradient descent approach
I Letrsquos find a local minimum of S(x)
I Relax problem to continuous domain S
Some solvers exist
No consideration of discrete nature of solution
26 42
Another strategy
Integer-Projected Fixed Point [Leordeanu et al 2009]
Franck-Wolfe like algorithm
Iterate until convergence
1 Discrete resolution of linear gradient of QAP rarr xt
2 Line search between xtminus1 and the solution found in step 1
27 42
Operating IPFP
At convergence
I Optimal continuous solution x is stable
I Need of a projection step to embed x to Π
Uncontrolled loss
No garanties that S(x) asymp S(xprime)
Importance of initialization
I Local minimum
Initialization is important [Carletti et al 2015]
28 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)
ζ = 1 Convex objective function
ζ = minus1 Concave objective function
29 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)
GNCCP algorithm
x = 0For ζ = 1rarr minus1
1 xlarr arg minxisinΠprime
S(x ζ)
2 ζ larr ζ minus 01
I Iterates ζ over a modified IPFP objective function
30 42
From ζ = 1 to ζ = 0
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
31 42
From ζ = 0 to ζ = minus1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
32 42
GNCCP vs IPFP
Pros
No more need of initialization
Converge towards a mappingmatrix
Cons
Complexity Iterate over IPFP
33 42
GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]
C
C
C
CC C
CS
S
Alkane Acyclic
NN
C
O
C
C
C
MAO PAH
34 42
Protocol
Arbitrary costs
I All substitutions 1
I All insertionsdeletions 3
Relative error
I Accuracy measure
I Overestimation The lowest approximation is the best one (dopt)
I Relative error d(Gi Gj)minus dopt(Gi Gj)
dopt(Gi Gj)
35 42
Relative errors
Alkane Acyclic MAO PAH0
50
100
150
200
250
Datasets
o
f re
lative e
rror
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
36 42
log Time vs Score Deviation
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
Alkane
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
Acyclic
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
MAO
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
PAH
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
37 42
Analysis
Tradeoff
I Accuracy is dependant to complexity
I Choose your method according to your priority
More complete analysis
I ICPR GED contest httpsgdc2016greycfr
I Others methods + Others datasets
38 42
GED Limitations
Mathematical properties
I GED is a distance
I But not an euclidean one
Impossible to derive a trivial kernel
Use with caution in SVMs
Complexity
I Still hard to compute on larger graphs
I Accuracy hard to evaluate (lack of gt)
39 42
Outlooks
Algorithms
I Matrix optimization
I Edge based mapping
I Optimization algorithms
40 42
Outlooks
Applications
I Explore graph space trough GEDI Median graph joint work with Paul
I Use of ged for classification
I How to set costs according to a task I rarr Metric learningI New Phd with Sebastien and Pierre
I Behavior of different methods
41 42
Conclusion
I GED is still an open problem
I Approximation algorithms exists
I but it stills room for improvements
I Focus on applications
42 42
Abu-Aisheh Z Raveaux R Ramel J-Y and Martineau P(2015)An Exact Graph Edit Distance Algorithm for Solving PatternRecognition ProblemsIn 4th International Conference on Pattern Recognition Applicationsand Methods
Bougleux S Brun L Carletti V Foggia P Gauzere B andVento M (2015)A quadratic assignment formulation of the graph edit distanceTechnical report Normandie Univ NormaSTIC FR 3638 France
Carletti V Gauzere B Brun L and Vento M (2015)Approximate graph edit distance computation combining bipartitematching and exact neighborhood substructure distanceIn GbRPR volume 9069 pages 168ndash177 Springer
Gauzere B Bougleux S Riesen K and Brun L (2014)Approximate Graph Edit Distance Guided by Bipartite Matching ofBags of WalksIn Structural Syntactic and Statistical Pattern Recognition volume8621 pages 73ndash82 Springer
42 42
Leordeanu M Hebert M and Sukthankar R (2009)An integer projected fixed point method for graph matching andmap inferenceIn Advances in neural information processing systems pages1114ndash1122
Liu Z-Y and Qiao H (2014)GNCCPndashGraduated NonConvexity and Concavity ProcedurePattern Anal Mach Intell 36(6) 1258ndash1267
Riesen K and Bunke H (2009)Approximate graph edit distance computation by means of bipartitegraph matchingImage and Vision Comp 27 950ndash959
42 42
Cost Matrix C
Lv (V ε1 V
ε2 ϕ) =
sumvisinV1
c(v ϕ(v))
︸ ︷︷ ︸node substitutions
+sum
visinV1V1
c(v ε)
︸ ︷︷ ︸node removals
+sum
visinV2V2
c(ε v)
︸ ︷︷ ︸node insertions
c(v(1)1 rarr v
(2)1 ) middot middot middot c(v
(1)1 rarr v
(2)m ) c(v
(1)1 rarr ε) infin middot middot middot infin
infin
c(v(1)i rarr v
(2)j )
c(v
(1)i rarr ε)
c(v(1)n rarr v
(2)1 ) c(v
(1)n rarr v
(2)m ) infin c(v
(1)n rarr ε)
c(εrarr v(2)1 ) infin middot middot middot infin 0 middot middot middot 0
infin
c(εrarr v(2)j )
infin c(εrarr v(2)m ) 0 0
16 42
Node operations cost
Node cost induced by ϕ
Lv (V ε1 V
ε2 ϕ) =
|G1|+|G2|sumi=1
|G1|+|G2|sumj=1
(Xϕ C)(i j)
Vectorized version
Lv (V ε1 V
ε2 ϕ) = cgtxϕ
17 42
Computation of edge costs
Edges operations cost I
Edge cost matrix D
D(i j k l) = c((i j)rarr (k l))
I (i j) isin E1 rarr deletion operation
I (k l) isin E2 rarr insertion operation
I Edgersquos mapping is induced by nodes mapping
19 42
Edges operations cost II
Edgersquos cost
Qe(V ε1 V
ε2 ϕ) =
|G1|+|G2|sumi j k l=1
Xϕ(i k)D(i j k l)Xϕ(j l)
Vectorized version
Qe(V ε1 V
ε2 ϕ) = xgtϕDxϕ
20 42
QAP formulation of GED
S(x) =1
2xgtDx︸ ︷︷ ︸
Edgersquos cost
+ cgtx︸︷︷︸Nodersquos cost
S(x) =1
2xgt∆x
xlowast = arg minxisinΠ
S(x)
21 42
An intractable problem
No garanties on ∆
rArr Non convex problem
rArr No polynomial solution of minxisinΠ
S(x)
I NP-Hard problem
Letrsquos look for an approximation
22 42
Approximation is overestimation
1 Any mapping corresponds to an edit path
2 Any edit path has a cost ge GED
3 Approximate mappinghArr Overestimation of GED
23 42
A First Approach
Linear approximation
xlowast = minxisinΠ
1
2xgtDx + cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
A First Approach
Linear approximation
xlowast = arg minxisinΠ
cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
A First Approach
Linear approximation
xlowast = arg minxisinΠ
cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
A first QAP approach
xlowast = minxisinΠ
1
2xgtDx + cgtx
Gradient descent approach
I Letrsquos find a local minimum of S(x)
I Relax problem to continuous domain S
Some solvers exist
No consideration of discrete nature of solution
26 42
Another strategy
Integer-Projected Fixed Point [Leordeanu et al 2009]
Franck-Wolfe like algorithm
Iterate until convergence
1 Discrete resolution of linear gradient of QAP rarr xt
2 Line search between xtminus1 and the solution found in step 1
27 42
Operating IPFP
At convergence
I Optimal continuous solution x is stable
I Need of a projection step to embed x to Π
Uncontrolled loss
No garanties that S(x) asymp S(xprime)
Importance of initialization
I Local minimum
Initialization is important [Carletti et al 2015]
28 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)
ζ = 1 Convex objective function
ζ = minus1 Concave objective function
29 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)
GNCCP algorithm
x = 0For ζ = 1rarr minus1
1 xlarr arg minxisinΠprime
S(x ζ)
2 ζ larr ζ minus 01
I Iterates ζ over a modified IPFP objective function
30 42
From ζ = 1 to ζ = 0
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
31 42
From ζ = 0 to ζ = minus1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
32 42
GNCCP vs IPFP
Pros
No more need of initialization
Converge towards a mappingmatrix
Cons
Complexity Iterate over IPFP
33 42
GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]
C
C
C
CC C
CS
S
Alkane Acyclic
NN
C
O
C
C
C
MAO PAH
34 42
Protocol
Arbitrary costs
I All substitutions 1
I All insertionsdeletions 3
Relative error
I Accuracy measure
I Overestimation The lowest approximation is the best one (dopt)
I Relative error d(Gi Gj)minus dopt(Gi Gj)
dopt(Gi Gj)
35 42
Relative errors
Alkane Acyclic MAO PAH0
50
100
150
200
250
Datasets
o
f re
lative e
rror
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
36 42
log Time vs Score Deviation
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
Alkane
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
Acyclic
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
MAO
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
PAH
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
37 42
Analysis
Tradeoff
I Accuracy is dependant to complexity
I Choose your method according to your priority
More complete analysis
I ICPR GED contest httpsgdc2016greycfr
I Others methods + Others datasets
38 42
GED Limitations
Mathematical properties
I GED is a distance
I But not an euclidean one
Impossible to derive a trivial kernel
Use with caution in SVMs
Complexity
I Still hard to compute on larger graphs
I Accuracy hard to evaluate (lack of gt)
39 42
Outlooks
Algorithms
I Matrix optimization
I Edge based mapping
I Optimization algorithms
40 42
Outlooks
Applications
I Explore graph space trough GEDI Median graph joint work with Paul
I Use of ged for classification
I How to set costs according to a task I rarr Metric learningI New Phd with Sebastien and Pierre
I Behavior of different methods
41 42
Conclusion
I GED is still an open problem
I Approximation algorithms exists
I but it stills room for improvements
I Focus on applications
42 42
Abu-Aisheh Z Raveaux R Ramel J-Y and Martineau P(2015)An Exact Graph Edit Distance Algorithm for Solving PatternRecognition ProblemsIn 4th International Conference on Pattern Recognition Applicationsand Methods
Bougleux S Brun L Carletti V Foggia P Gauzere B andVento M (2015)A quadratic assignment formulation of the graph edit distanceTechnical report Normandie Univ NormaSTIC FR 3638 France
Carletti V Gauzere B Brun L and Vento M (2015)Approximate graph edit distance computation combining bipartitematching and exact neighborhood substructure distanceIn GbRPR volume 9069 pages 168ndash177 Springer
Gauzere B Bougleux S Riesen K and Brun L (2014)Approximate Graph Edit Distance Guided by Bipartite Matching ofBags of WalksIn Structural Syntactic and Statistical Pattern Recognition volume8621 pages 73ndash82 Springer
42 42
Leordeanu M Hebert M and Sukthankar R (2009)An integer projected fixed point method for graph matching andmap inferenceIn Advances in neural information processing systems pages1114ndash1122
Liu Z-Y and Qiao H (2014)GNCCPndashGraduated NonConvexity and Concavity ProcedurePattern Anal Mach Intell 36(6) 1258ndash1267
Riesen K and Bunke H (2009)Approximate graph edit distance computation by means of bipartitegraph matchingImage and Vision Comp 27 950ndash959
42 42
Node operations cost
Node cost induced by ϕ
Lv (V ε1 V
ε2 ϕ) =
|G1|+|G2|sumi=1
|G1|+|G2|sumj=1
(Xϕ C)(i j)
Vectorized version
Lv (V ε1 V
ε2 ϕ) = cgtxϕ
17 42
Computation of edge costs
Edges operations cost I
Edge cost matrix D
D(i j k l) = c((i j)rarr (k l))
I (i j) isin E1 rarr deletion operation
I (k l) isin E2 rarr insertion operation
I Edgersquos mapping is induced by nodes mapping
19 42
Edges operations cost II
Edgersquos cost
Qe(V ε1 V
ε2 ϕ) =
|G1|+|G2|sumi j k l=1
Xϕ(i k)D(i j k l)Xϕ(j l)
Vectorized version
Qe(V ε1 V
ε2 ϕ) = xgtϕDxϕ
20 42
QAP formulation of GED
S(x) =1
2xgtDx︸ ︷︷ ︸
Edgersquos cost
+ cgtx︸︷︷︸Nodersquos cost
S(x) =1
2xgt∆x
xlowast = arg minxisinΠ
S(x)
21 42
An intractable problem
No garanties on ∆
rArr Non convex problem
rArr No polynomial solution of minxisinΠ
S(x)
I NP-Hard problem
Letrsquos look for an approximation
22 42
Approximation is overestimation
1 Any mapping corresponds to an edit path
2 Any edit path has a cost ge GED
3 Approximate mappinghArr Overestimation of GED
23 42
A First Approach
Linear approximation
xlowast = minxisinΠ
1
2xgtDx + cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
A First Approach
Linear approximation
xlowast = arg minxisinΠ
cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
A First Approach
Linear approximation
xlowast = arg minxisinΠ
cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
A first QAP approach
xlowast = minxisinΠ
1
2xgtDx + cgtx
Gradient descent approach
I Letrsquos find a local minimum of S(x)
I Relax problem to continuous domain S
Some solvers exist
No consideration of discrete nature of solution
26 42
Another strategy
Integer-Projected Fixed Point [Leordeanu et al 2009]
Franck-Wolfe like algorithm
Iterate until convergence
1 Discrete resolution of linear gradient of QAP rarr xt
2 Line search between xtminus1 and the solution found in step 1
27 42
Operating IPFP
At convergence
I Optimal continuous solution x is stable
I Need of a projection step to embed x to Π
Uncontrolled loss
No garanties that S(x) asymp S(xprime)
Importance of initialization
I Local minimum
Initialization is important [Carletti et al 2015]
28 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)
ζ = 1 Convex objective function
ζ = minus1 Concave objective function
29 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)
GNCCP algorithm
x = 0For ζ = 1rarr minus1
1 xlarr arg minxisinΠprime
S(x ζ)
2 ζ larr ζ minus 01
I Iterates ζ over a modified IPFP objective function
30 42
From ζ = 1 to ζ = 0
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
31 42
From ζ = 0 to ζ = minus1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
32 42
GNCCP vs IPFP
Pros
No more need of initialization
Converge towards a mappingmatrix
Cons
Complexity Iterate over IPFP
33 42
GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]
C
C
C
CC C
CS
S
Alkane Acyclic
NN
C
O
C
C
C
MAO PAH
34 42
Protocol
Arbitrary costs
I All substitutions 1
Computation of edge costs

Edge operation costs I

Edge cost matrix D:

D(i, j, k, l) = c((i, j) → (k, l))

- (i, j) ∈ E1 → deletion operation
- (k, l) ∈ E2 → insertion operation
- The edge mapping is induced by the node mapping
Edge operation costs II

Edge cost:

Q_e(V₁^ε, V₂^ε, φ) = Σ_{i,j,k,l=1}^{|G1|+|G2|} X_φ(i, k) · D(i, j, k, l) · X_φ(j, l)

Vectorized version:

Q_e(V₁^ε, V₂^ε, φ) = x_φ^⊤ D x_φ
QAP formulation of GED

S(x) = ½ x^⊤ D x + c^⊤ x
       (edge cost)   (node cost)

S(x) = ½ x^⊤ Δ x

x* = arg min_{x ∈ Π} S(x)
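As a concrete illustration of the QAP objective above, the following sketch (names and numeric values are ours, not from the slides) evaluates S(x) = ½ x^⊤ D x + c^⊤ x for a binary assignment vector x, and checks that the linear term can be folded into the quadratic one — valid because x is binary, so c^⊤ x = x^⊤ diag(c) x and Δ = D + 2 diag(c):

```python
def quad(M, x):
    """x^T M x for a square matrix M given as nested lists."""
    n = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))

def qap_score(D, c, x):
    # S(x) = 1/2 x^T D x + c^T x
    return 0.5 * quad(D, x) + sum(ci * xi for ci, xi in zip(c, x))

def qap_score_delta(D, c, x):
    # Same value via S(x) = 1/2 x^T Delta x with Delta = D + 2 diag(c)
    n = len(c)
    Delta = [[D[i][j] + (2 * c[i] if i == j else 0) for j in range(n)]
             for i in range(n)]
    return 0.5 * quad(Delta, x)

D = [[0, 1], [1, 0]]
c = [2, 3]
x = [1, 0]
print(qap_score(D, c, x), qap_score_delta(D, c, x))  # both 2.0
```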
An intractable problem

No guarantees on Δ:
⇒ non-convex problem
⇒ no polynomial-time algorithm for min_{x ∈ Π} S(x)

- NP-hard problem

Let's look for an approximation.
Approximation is overestimation

1. Any mapping corresponds to an edit path
2. Any edit path has a cost ≥ GED
3. Approximate mapping ⇔ overestimation of GED
A First Approach

Linear approximation:

x* = arg min_{x ∈ Π} c^⊤ x

- [Riesen and Bunke, 2009; Gauzère et al., 2014; Carletti et al., 2015]
- Linear approximation of the QAP formulation
- Hungarian algorithm (O(n³))

No structural information.
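The LSAP surrogate above can be sketched as follows (cost values are invented for illustration). A brute-force argmin over permutations stands in for the assignment solver; a real implementation would use the Hungarian algorithm for its O(n³) complexity:

```python
from itertools import permutations

def lsap_bruteforce(C):
    """Assignment p minimizing sum_i C[i][p(i)] (Hungarian algorithm in
    practice; exhaustive search here for clarity)."""
    n = len(C)
    best = min(permutations(range(n)),
               key=lambda p: sum(C[i][p[i]] for i in range(n)))
    return best, sum(C[i][best[i]] for i in range(n))

C = [[4, 1, 3],
     [2, 0, 5],
     [3, 2, 2]]
assignment, cost = lsap_bruteforce(C)
print(assignment, cost)  # (1, 0, 2) 5
```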
Augmented Cost Matrix

Adding some structural information:

- C_ij: cost of mapping the neighbourhood of v_i to the neighbourhood of v′_j
  - Direct neighbourhood
  - Random walks
  - Subgraphs
- Complexity ↔ accuracy
A first QAP approach

x* = arg min_{x ∈ Π} ½ x^⊤ D x + c^⊤ x

Gradient descent approach:

- Let's find a local minimum of S(x)
- Relax the problem to a continuous domain S̃

Some solvers exist, but they take no account of the discrete nature of the solution.
Another strategy

Integer-Projected Fixed Point (IPFP) [Leordeanu et al., 2009]

A Frank-Wolfe-like algorithm. Iterate until convergence:

1. Discrete resolution of the linearized QAP (its gradient) → x_t
2. Line search between x_{t−1} and the solution found in step 1
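The two-step iteration above can be sketched as follows. This is a hedged toy version (names, the brute-force LSAP step, and the 2×2 problem are ours): minimize ½ x^⊤ Δ x over vectorized permutation matrices, alternating a discrete LSAP step on the gradient with an exact line search for the quadratic objective, and finishing with the projection step back onto Π discussed on the next slide:

```python
from itertools import permutations

def lsap(cost, n):
    """Vectorized permutation matrix minimizing sum_i cost[i*n + p(i)]
    (brute force here; Hungarian algorithm in practice)."""
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i * n + p[i]] for i in range(n)))
    b = [0.0] * (n * n)
    for i, j in enumerate(best):
        b[i * n + j] = 1.0
    return b

def ipfp(Delta, x, n, iters=50):
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    matvec = lambda M, v: [dot(row, v) for row in M]
    for _ in range(iters):
        grad = matvec(Delta, x)               # gradient of 1/2 x^T Delta x
        b = lsap(grad, n)                     # discrete minimizer of grad^T b
        d = [bi - xi for bi, xi in zip(b, x)]
        a = dot(d, matvec(Delta, d))          # curvature along [x, b]
        t = 1.0 if a <= 0 else min(1.0, -dot(grad, d) / a)  # line search
        if abs(t) * sum(abs(di) for di in d) < 1e-12:
            break
        x = [xi + t * di for xi, di in zip(x, d)]
    return lsap([-xi for xi in x], n)         # final projection onto Pi

# 2x2 toy problem: Delta = diag(2, 10, 10, 2) favours the identity mapping.
Delta = [[0.0] * 4 for _ in range(4)]
for i, v in enumerate([2.0, 10.0, 10.0, 2.0]):
    Delta[i][i] = v
print(ipfp(Delta, [0.5] * 4, 2))  # [1.0, 0.0, 0.0, 1.0]
```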
Operating IPFP

At convergence:

- The optimal continuous solution x is stable
- A projection step is needed to map x back onto Π

Uncontrolled loss: no guarantee that S(x) ≈ S(x′).

Importance of initialization:

- Local minimum
- Initialization is important [Carletti et al., 2015]
GNCCP approach [Liu and Qiao, 2014]

From a convex to a concave objective function:

S(x, ζ) = ζ (x^⊤ x) + (1 − |ζ|) S(x)

- ζ = 1: convex objective function
- ζ = −1: concave objective function

GNCCP algorithm:

x = 0
for ζ = 1 → −1:
  1. x ← arg min_{x ∈ Π′} S(x, ζ)
  2. ζ ← ζ − 0.1

- Iterates ζ over a modified IPFP objective function
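The convex-to-concave claim can be checked numerically. In this toy construction (ours, not the slides'), the Hessian of S(x, ζ) is 2ζI + (1 − |ζ|)·H, where H is the Hessian of S: at ζ = 1 it is 2I (convex) and at ζ = −1 it is −2I (concave), whatever H is, with the objective gradually deformed in between:

```python
def interp_hessian(zeta, H):
    """Hessian of S(x, zeta) = zeta * x.x + (1 - |zeta|) * S(x),
    given H, the Hessian of S."""
    n = len(H)
    return [[2 * zeta * (i == j) + (1 - abs(zeta)) * H[i][j]
             for j in range(n)] for i in range(n)]

H = [[0.0, 5.0], [5.0, 0.0]]    # indefinite Hessian of some S
print(interp_hessian(1.0, H))   # [[2.0, 0.0], [0.0, 2.0]]   -> convex
print(interp_hessian(-1.0, H))  # [[-2.0, 0.0], [0.0, -2.0]] -> concave
print(interp_hessian(0.0, H) == H)  # True: at zeta = 0 we recover S itself
```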
From ζ = 1 to ζ = 0

[Figure: sequence of plots over [−1, 1] × [−1, 1] showing the objective function deforming as ζ decreases from 1 to 0]
From ζ = 0 to ζ = −1

[Figure: sequence of plots over [−1, 1] × [−1, 1] showing the objective function deforming as ζ decreases from 0 to −1]
GNCCP vs IPFP

Pros:
- No more need for initialization
- Converges towards a mapping matrix

Cons:
- Complexity: iterates over IPFP
GREYC chemistry datasets [https://iapr-tc15.greyc.fr/links.html]

[Figure: sample molecular graphs from the four datasets: Alkane, Acyclic, MAO, PAH]
Protocol

Arbitrary costs:
- All substitutions: 1
- All insertions/deletions: 3

Relative error:
- Accuracy measure
- Overestimation: the lowest approximation is the best one (d_opt)
- Relative error: (d(G_i, G_j) − d_opt(G_i, G_j)) / d_opt(G_i, G_j)
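The relative error of the protocol is straightforward to compute; the distance values below are invented for illustration:

```python
def relative_error(d_approx, d_opt):
    """Relative overestimation of an approximate GED over the best
    known approximation d_opt, as defined in the protocol."""
    return (d_approx - d_opt) / d_opt

print(relative_error(12.0, 10.0))  # 0.2, i.e. a 20% overestimation
```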
Relative errors

[Figure: bar chart of relative error (%) per dataset (Alkane, Acyclic, MAO, PAH) for LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw and GNCCP]
log Time vs Score Deviation

[Figure: four panels (Alkane, Acyclic, MAO, PAH) plotting score deviation against log10 of time in seconds for LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw and GNCCP]
Analysis

Trade-off:
- Accuracy depends on complexity
- Choose your method according to your priority

More complete analysis:
- ICPR GED contest: https://gdc2016.greyc.fr
- Other methods + other datasets
GED Limitations

Mathematical properties:
- GED is a distance
- but not a Euclidean one
- Impossible to derive a trivial kernel
- Use with caution in SVMs

Complexity:
- Still hard to compute on larger graphs
- Accuracy hard to evaluate (lack of ground truth)
Outlooks

Algorithms:
- Matrix optimization
- Edge-based mapping
- Optimization algorithms
Outlooks

Applications:
- Explore graph space through GED
  - Median graph: joint work with Paul
- Use of GED for classification
- How to set costs according to a task
  - → metric learning
  - New PhD with Sébastien and Pierre
- Behavior of different methods
Conclusion

- GED is still an open problem
- Approximation algorithms exist
- but there is still room for improvement
- Focus on applications
Abu-Aisheh, Z., Raveaux, R., Ramel, J.-Y., and Martineau, P. (2015). An exact graph edit distance algorithm for solving pattern recognition problems. In 4th International Conference on Pattern Recognition Applications and Methods.

Bougleux, S., Brun, L., Carletti, V., Foggia, P., Gaüzère, B., and Vento, M. (2015). A quadratic assignment formulation of the graph edit distance. Technical report, Normandie Univ, NormaSTIC FR 3638, France.

Carletti, V., Gaüzère, B., Brun, L., and Vento, M. (2015). Approximate graph edit distance computation combining bipartite matching and exact neighborhood substructure distance. In GbRPR, volume 9069, pages 168–177. Springer.

Gaüzère, B., Bougleux, S., Riesen, K., and Brun, L. (2014). Approximate graph edit distance guided by bipartite matching of bags of walks. In Structural, Syntactic, and Statistical Pattern Recognition, volume 8621, pages 73–82. Springer.

Leordeanu, M., Hebert, M., and Sukthankar, R. (2009). An integer projected fixed point method for graph matching and MAP inference. In Advances in Neural Information Processing Systems, pages 1114–1122.

Liu, Z.-Y. and Qiao, H. (2014). GNCCP — Graduated NonConvexity and Concavity Procedure. IEEE Trans. Pattern Anal. Mach. Intell., 36(6):1258–1267.

Riesen, K. and Bunke, H. (2009). Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision Computing, 27:950–959.
Bougleux S Brun L Carletti V Foggia P Gauzere B andVento M (2015)A quadratic assignment formulation of the graph edit distanceTechnical report Normandie Univ NormaSTIC FR 3638 France
Carletti V Gauzere B Brun L and Vento M (2015)Approximate graph edit distance computation combining bipartitematching and exact neighborhood substructure distanceIn GbRPR volume 9069 pages 168ndash177 Springer
Gauzere B Bougleux S Riesen K and Brun L (2014)Approximate Graph Edit Distance Guided by Bipartite Matching ofBags of WalksIn Structural Syntactic and Statistical Pattern Recognition volume8621 pages 73ndash82 Springer
42 42
Leordeanu M Hebert M and Sukthankar R (2009)An integer projected fixed point method for graph matching andmap inferenceIn Advances in neural information processing systems pages1114ndash1122
Liu Z-Y and Qiao H (2014)GNCCPndashGraduated NonConvexity and Concavity ProcedurePattern Anal Mach Intell 36(6) 1258ndash1267
Riesen K and Bunke H (2009)Approximate graph edit distance computation by means of bipartitegraph matchingImage and Vision Comp 27 950ndash959
42 42
An intractable problem

No guarantees on Δ (the quadratic term may be indefinite)
⇒ Non-convex problem
⇒ No polynomial-time solution of min_{x ∈ Π} S(x)
- NP-hard problem

Let's look for an approximation
Approximation is overestimation

1. Any mapping corresponds to an edit path
2. Any edit path has a cost ≥ GED
3. Approximate mapping ⇔ overestimation of GED
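The three points above can be sketched in code: any node mapping induces an edit path, and its cost can only overestimate the exact GED. This is an illustrative sketch, not the slides' implementation; the graph encoding and helper name are assumptions, and the uniform costs (substitution 1, insertion/deletion 3) are the arbitrary ones used in the protocol later.

```python
def edit_path_cost(nodes1, edges1, nodes2, edges2, mapping, c_sub=1, c_id=3):
    """Cost of the edit path induced by `mapping` (g1 node -> g2 node,
    or None for deletion). Costs: substitution 1, insertion/deletion 3."""
    cost = 0
    # Node operations.
    for v, lbl in nodes1.items():
        w = mapping.get(v)
        if w is None:
            cost += c_id                      # delete v
        elif nodes2[w] != lbl:
            cost += c_sub                     # relabel v to w's label
    hit = {w for w in mapping.values() if w is not None}
    cost += c_id * len(set(nodes2) - hit)     # insert unmatched nodes of g2
    # Edge operations induced by the node mapping.
    e2 = {frozenset(e) for e in edges2}
    image = set()
    for u, v in edges1:
        mu, mv = mapping.get(u), mapping.get(v)
        if mu is None or mv is None or frozenset((mu, mv)) not in e2:
            cost += c_id                      # edge of g1 with no counterpart
        else:
            image.add(frozenset((mu, mv)))
    cost += c_id * len(e2 - image)            # insert remaining edges of g2
    return cost

# Two labelled graphs: g1 is C-C, g2 is C-C-O.
g1_nodes, g1_edges = {"a": "C", "b": "C"}, [("a", "b")]
g2_nodes, g2_edges = {"x": "C", "y": "C", "z": "O"}, [("x", "y"), ("y", "z")]
good = edit_path_cost(g1_nodes, g1_edges, g2_nodes, g2_edges, {"a": "x", "b": "y"})
bad = edit_path_cost(g1_nodes, g1_edges, g2_nodes, g2_edges, {"a": "x", "b": "z"})
```

Both mappings yield valid edit paths; the second, poorer mapping simply yields a larger overestimate (13 vs 6 here).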
A First Approach

Linear approximation
x* = arg min_{x ∈ Π} c⊤x
- [Riesen and Bunke, 2009; Gauzère et al., 2014; Carletti et al., 2015]
- Linear approximation of the QAP formulation x* = arg min_{x ∈ Π} ½ x⊤Δx + c⊤x
- Hungarian algorithm (O(n³))

No structural information
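A minimal sketch of this bipartite approximation, assuming label-only node costs (substitution 1, insertion/deletion 3, as in the protocol later). The (n+m)×(n+m) cost-matrix layout follows [Riesen and Bunke, 2009]; the LSAP is solved with SciPy's Hungarian-style solver.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment  # O(n^3) LSAP solver

def lsap_ged(labels1, labels2, c_sub=1, c_id=3):
    """Bipartite GED approximation from node labels only (no structure)."""
    n, m = len(labels1), len(labels2)
    big = 1e9                                   # forbidden assignments
    C = np.full((n + m, n + m), big)
    for i in range(n):                          # substitution block
        for j in range(m):
            C[i, j] = 0 if labels1[i] == labels2[j] else c_sub
    for i in range(n):
        C[i, m + i] = c_id                      # deletion of node i
    for j in range(m):
        C[n + j, j] = c_id                      # insertion of node j
    C[n:, m:] = 0                               # dummy-to-dummy: free
    rows, cols = linear_sum_assignment(C)
    return float(C[rows, cols].sum())

cost = lsap_ged(["C", "C"], ["C", "C", "O"])    # one insertion: 3.0
```

With purely label-based entries the matrix C carries no structural information, which is exactly the weakness the next slide addresses.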
Augmented Cost Matrix

Adding some structural information
- C_ij: cost of mapping the neighbourhood of v_i onto the neighbourhood of v′_j
  - Direct neighbourhood
  - Random walks
  - Subgraphs
- Complexity ↔ accuracy
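One hedged reading of the C_ij idea, using the direct neighbourhood only: compare the neighbour-label multisets of v_i and v′_j on top of the node-label cost. The function name, graph encoding and multiset comparison are illustrative assumptions, not the exact schemes of the cited papers.

```python
from collections import Counter

def augmented_cost(labels1, adj1, labels2, adj2, vi, vj, c_sub=1, c_id=3):
    """Node-substitution cost augmented with a direct-neighbourhood term:
    unmatched neighbour labels must be substituted or inserted/deleted."""
    cost = 0 if labels1[vi] == labels2[vj] else c_sub
    n1 = Counter(labels1[u] for u in adj1[vi])
    n2 = Counter(labels2[u] for u in adj2[vj])
    common = sum((n1 & n2).values())            # neighbour labels matched for free
    subs = min(sum(n1.values()), sum(n2.values())) - common
    ins_del = abs(sum(n1.values()) - sum(n2.values()))
    return cost + c_sub * subs + c_id * ins_del

# v_i has neighbours labelled {C, O}; v'_j has a single neighbour labelled C.
c = augmented_cost({"a": "C", "b": "C", "c": "O"}, {"a": ["b", "c"]},
                   {"x": "C", "y": "C"}, {"x": ["y"]}, "a", "x")
```

Richer structures (random walks, subgraphs) refine this estimate at a higher cost per entry, which is the complexity ↔ accuracy trade-off on the slide.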
A first QAP approach

x* = arg min_{x ∈ Π} ½ x⊤Δx + c⊤x

Gradient descent approach
- Let's find a local minimum of S(x)
- Relax the problem to a continuous domain S̃

Some solvers exist
No consideration of the discrete nature of the solution
Another strategy

Integer-Projected Fixed Point [Leordeanu et al., 2009]
Frank-Wolfe-like algorithm

Iterate until convergence:
1. Discrete resolution of the linearised QAP (via the gradient) → x_t
2. Line search between x_{t−1} and the solution found in step 1
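The two steps can be sketched as a simplified Frank-Wolfe-style loop for S(x) = ½ x⊤Δx + c⊤x: minimise the linearised objective over permutations (an LSAP), then line-search towards that discrete solution. This is a plain reading of [Leordeanu et al., 2009], not their exact implementation; the stopping rule and the toy problem are assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def ipfp(Delta, c, x0, n, iters=50, tol=1e-9):
    """Delta: (n*n, n*n) quadratic costs, c: (n*n,) linear costs,
    x0: flattened continuous start, n: number of nodes per graph."""
    x = x0.copy()
    for _ in range(iters):
        grad = Delta @ x + c                   # gradient of S at x
        rows, cols = linear_sum_assignment(grad.reshape(n, n))
        b = np.zeros_like(x)
        b[rows * n + cols] = 1.0               # discrete minimiser of the linearisation
        d = b - x
        den = d @ (Delta @ d)
        # Exact line search of S(x + t*d) over t in [0, 1].
        t = 1.0 if den <= 0 else min(1.0, max(0.0, -(grad @ d) / den))
        if np.linalg.norm(t * d) < tol:
            break
        x = x + t * d
    return x

# Purely linear toy problem (Delta = 0): IPFP returns the LSAP optimum.
x = ipfp(np.zeros((4, 4)), np.array([0.0, 5.0, 5.0, 0.0]), np.full(4, 0.25), n=2)
```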
Operating IPFP

At convergence
- The optimal continuous solution x̃ is stable
- A projection step is needed to map x̃ back onto Π

Uncontrolled loss
- No guarantee that S(x̃) ≈ S(x′), the projected solution

Importance of initialization
- Local minimum
- Initialization is important [Carletti et al., 2015]
GNCCP approach [Liu and Qiao, 2014]

From a convex to a concave objective function
S(x, ζ) = ζ(x⊤x) + (1 − |ζ|) S(x)
- ζ = 1: convex objective function
- ζ = −1: concave objective function

GNCCP algorithm
x = 0
For ζ = 1 → −1:
1. x ← arg min_{x ∈ Π′} S(x, ζ)
2. ζ ← ζ − 0.1
- Iterates ζ over a modified IPFP objective function
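The ζ-sweep can be sketched on a one-dimensional toy problem. The inner solver below is plain projected gradient descent on a box, standing in for the IPFP-style inner minimisation; the objective and all constants are illustrative assumptions. The concave end (ζ = −1) pushes the solution to an extreme point of the feasible set, which is why GNCCP converges towards a discrete mapping.

```python
import numpy as np

def pgd(grad_fn, x, lr=0.05, steps=200):
    """Inner-solver stub: projected gradient descent on the box [0, 1]^n."""
    for _ in range(steps):
        x = np.clip(x - lr * grad_fn(x), 0.0, 1.0)
    return x

def gnccp(s_grad, inner_solve, x0, step=0.1):
    """Sweep zeta from 1 (convex) to -1 (concave), warm-starting each
    minimisation of S(x, zeta) = zeta * x.x + (1 - |zeta|) * S(x)."""
    x = x0.copy()
    zeta = 1.0
    while zeta > -1.0 - 1e-9:
        grad = lambda x, z=zeta: 2.0 * z * x + (1.0 - abs(z)) * s_grad(x)
        x = inner_solve(grad, x)
        zeta -= step
    return x

# Toy objective S(x) = (x - 0.6)^2 on [0, 1]: the continuous minimiser is
# 0.6, but the concave end drives the solution to the binary point 1.0.
x = gnccp(lambda x: 2.0 * (x - 0.6), pgd, np.array([0.5]))
```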
From ζ = 1 to ζ = 0
[Figure: sixteen panels over [−1, 1]² illustrating the evolution of the objective and its minimiser as ζ decreases from 1 to 0.]
From ζ = 0 to ζ = −1
[Figure: sixteen panels over [−1, 1]² illustrating the evolution of the objective and its minimiser as ζ decreases from 0 to −1.]
GNCCP vs IPFP

Pros
- No need for initialization anymore
- Converges towards a mapping matrix

Cons
- Complexity: iterates over IPFP
GREYC chemistry datasets [https://iapr-tc15.greyc.fr/links.html]
[Figure: sample molecular graphs from the four datasets: Alkane, Acyclic, MAO, PAH.]
Protocol

Arbitrary costs
- All substitutions: 1
- All insertions/deletions: 3

Relative error
- Accuracy measure
- Overestimation: the lowest approximation is the best one (d_opt)
- Relative error: (d(G_i, G_j) − d_opt(G_i, G_j)) / d_opt(G_i, G_j)
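The measure is straightforward to compute. Since every method overestimates GED, the lowest distance found across methods serves as the reference d_opt; the three distances below are made-up illustrative numbers, not results from the slides.

```python
def relative_error(d, d_opt):
    """Relative overestimation of an approximation d w.r.t. the best one."""
    return (d - d_opt) / d_opt

# Hypothetical approximations of d(G_i, G_j) by three methods:
approx = {"LSAP": 12.0, "IPFP": 9.0, "GNCCP": 8.0}
d_opt = min(approx.values())                 # lowest approximation = reference
errors = {m: relative_error(d, d_opt) for m, d in approx.items()}
# errors == {"LSAP": 0.5, "IPFP": 0.125, "GNCCP": 0.0}
```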
Relative errors
[Figure: bar chart of the relative error (%) on each dataset (Alkane, Acyclic, MAO, PAH; y-axis 0–250) for LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw and GNCCP.]
log Time vs Score Deviation
[Figure: four scatter plots (Alkane, Acyclic, MAO, PAH) of score deviation (0–1) against log10 of the computation time in seconds, for LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw and GNCCP.]
Analysis

Tradeoff
- Accuracy depends on complexity
- Choose your method according to your priority

More complete analysis
- ICPR GED contest: https://gdc2016.greyc.fr
- Other methods + other datasets
GED Limitations

Mathematical properties
- GED is a distance
- But not a Euclidean one
- Impossible to derive a trivial kernel
- Use with caution in SVMs

Complexity
- Still hard to compute on larger graphs
- Accuracy hard to evaluate (lack of ground truth)
Outlooks

Algorithms
- Matrix optimization
- Edge-based mapping
- Optimization algorithms
Outlooks

Applications
- Explore graph space through GED
  - Median graph: joint work with Paul
- Use of GED for classification
- How to set costs according to a task
  - → Metric learning
  - New PhD with Sebastien and Pierre
- Behavior of different methods
Conclusion

- GED is still an open problem
- Approximation algorithms exist
- but there is still room for improvement
- Focus on applications
References

Abu-Aisheh, Z., Raveaux, R., Ramel, J.-Y., and Martineau, P. (2015). An exact graph edit distance algorithm for solving pattern recognition problems. In 4th International Conference on Pattern Recognition Applications and Methods.

Bougleux, S., Brun, L., Carletti, V., Foggia, P., Gauzère, B., and Vento, M. (2015). A quadratic assignment formulation of the graph edit distance. Technical report, Normandie Univ, NormaSTIC FR 3638, France.

Carletti, V., Gauzère, B., Brun, L., and Vento, M. (2015). Approximate graph edit distance computation combining bipartite matching and exact neighborhood substructure distance. In GbRPR, volume 9069, pages 168–177. Springer.

Gauzère, B., Bougleux, S., Riesen, K., and Brun, L. (2014). Approximate graph edit distance guided by bipartite matching of bags of walks. In Structural, Syntactic, and Statistical Pattern Recognition, volume 8621, pages 73–82. Springer.

Leordeanu, M., Hebert, M., and Sukthankar, R. (2009). An integer projected fixed point method for graph matching and MAP inference. In Advances in Neural Information Processing Systems, pages 1114–1122.

Liu, Z.-Y. and Qiao, H. (2014). GNCCP – graduated nonconvexity and concavity procedure. IEEE Trans. Pattern Anal. Mach. Intell., 36(6):1258–1267.

Riesen, K. and Bunke, H. (2009). Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision Computing, 27:950–959.
Approximation is overestimation
1 Any mapping corresponds to an edit path
2 Any edit path has a cost ge GED
3 Approximate mappinghArr Overestimation of GED
23 42
A First Approach
Linear approximation
xlowast = minxisinΠ
1
2xgtDx + cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
A First Approach
Linear approximation
xlowast = arg minxisinΠ
cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
A First Approach
Linear approximation
xlowast = arg minxisinΠ
cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
A first QAP approach
xlowast = minxisinΠ
1
2xgtDx + cgtx
Gradient descent approach
I Letrsquos find a local minimum of S(x)
I Relax problem to continuous domain S
Some solvers exist
No consideration of discrete nature of solution
26 42
Another strategy
Integer-Projected Fixed Point [Leordeanu et al 2009]
Franck-Wolfe like algorithm
Iterate until convergence
1 Discrete resolution of linear gradient of QAP rarr xt
2 Line search between xtminus1 and the solution found in step 1
27 42
Operating IPFP
At convergence
I Optimal continuous solution x is stable
I Need of a projection step to embed x to Π
Uncontrolled loss
No garanties that S(x) asymp S(xprime)
Importance of initialization
I Local minimum
Initialization is important [Carletti et al 2015]
28 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)
ζ = 1 Convex objective function
ζ = minus1 Concave objective function
29 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)
GNCCP algorithm
x = 0For ζ = 1rarr minus1
1 xlarr arg minxisinΠprime
S(x ζ)
2 ζ larr ζ minus 01
I Iterates ζ over a modified IPFP objective function
30 42
From ζ = 1 to ζ = 0
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
31 42
From ζ = 0 to ζ = minus1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
32 42
GNCCP vs IPFP
Pros
No more need of initialization
Converge towards a mappingmatrix
Cons
Complexity Iterate over IPFP
33 42
GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]
C
C
C
CC C
CS
S
Alkane Acyclic
NN
C
O
C
C
C
MAO PAH
34 42
Protocol
Arbitrary costs
I All substitutions 1
I All insertionsdeletions 3
Relative error
I Accuracy measure
I Overestimation The lowest approximation is the best one (dopt)
I Relative error d(Gi Gj)minus dopt(Gi Gj)
dopt(Gi Gj)
35 42
Relative errors
Alkane Acyclic MAO PAH0
50
100
150
200
250
Datasets
o
f re
lative e
rror
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
36 42
log Time vs Score Deviation
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
Alkane
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
Acyclic
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
MAO
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
PAH
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
37 42
Analysis
Tradeoff
I Accuracy is dependant to complexity
I Choose your method according to your priority
More complete analysis
I ICPR GED contest httpsgdc2016greycfr
I Others methods + Others datasets
38 42
GED Limitations
Mathematical properties
I GED is a distance
I But not an euclidean one
Impossible to derive a trivial kernel
Use with caution in SVMs
Complexity
I Still hard to compute on larger graphs
I Accuracy hard to evaluate (lack of gt)
39 42
Outlooks
Algorithms
I Matrix optimization
I Edge based mapping
I Optimization algorithms
40 42
Outlooks
Applications
I Explore graph space trough GEDI Median graph joint work with Paul
I Use of ged for classification
I How to set costs according to a task I rarr Metric learningI New Phd with Sebastien and Pierre
I Behavior of different methods
41 42
Conclusion
I GED is still an open problem
I Approximation algorithms exists
I but it stills room for improvements
I Focus on applications
42 42
Abu-Aisheh Z Raveaux R Ramel J-Y and Martineau P(2015)An Exact Graph Edit Distance Algorithm for Solving PatternRecognition ProblemsIn 4th International Conference on Pattern Recognition Applicationsand Methods
Bougleux S Brun L Carletti V Foggia P Gauzere B andVento M (2015)A quadratic assignment formulation of the graph edit distanceTechnical report Normandie Univ NormaSTIC FR 3638 France
Carletti V Gauzere B Brun L and Vento M (2015)Approximate graph edit distance computation combining bipartitematching and exact neighborhood substructure distanceIn GbRPR volume 9069 pages 168ndash177 Springer
Gauzere B Bougleux S Riesen K and Brun L (2014)Approximate Graph Edit Distance Guided by Bipartite Matching ofBags of WalksIn Structural Syntactic and Statistical Pattern Recognition volume8621 pages 73ndash82 Springer
42 42
Leordeanu M Hebert M and Sukthankar R (2009)An integer projected fixed point method for graph matching andmap inferenceIn Advances in neural information processing systems pages1114ndash1122
Liu Z-Y and Qiao H (2014)GNCCPndashGraduated NonConvexity and Concavity ProcedurePattern Anal Mach Intell 36(6) 1258ndash1267
Riesen K and Bunke H (2009)Approximate graph edit distance computation by means of bipartitegraph matchingImage and Vision Comp 27 950ndash959
42 42
A First Approach
Linear approximation
xlowast = minxisinΠ
1
2xgtDx + cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
A First Approach
Linear approximation
xlowast = arg minxisinΠ
cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
A First Approach
Linear approximation
xlowast = arg minxisinΠ
cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
A first QAP approach
xlowast = minxisinΠ
1
2xgtDx + cgtx
Gradient descent approach
I Letrsquos find a local minimum of S(x)
I Relax problem to continuous domain S
Some solvers exist
No consideration of discrete nature of solution
26 42
Another strategy
Integer-Projected Fixed Point [Leordeanu et al 2009]
Franck-Wolfe like algorithm
Iterate until convergence
1 Discrete resolution of linear gradient of QAP rarr xt
2 Line search between xtminus1 and the solution found in step 1
27 42
Operating IPFP
At convergence
I Optimal continuous solution x is stable
I Need of a projection step to embed x to Π
Uncontrolled loss
No garanties that S(x) asymp S(xprime)
Importance of initialization
I Local minimum
Initialization is important [Carletti et al 2015]
28 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)
ζ = 1 Convex objective function
ζ = minus1 Concave objective function
29 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)
GNCCP algorithm
x = 0For ζ = 1rarr minus1
1 xlarr arg minxisinΠprime
S(x ζ)
2 ζ larr ζ minus 01
I Iterates ζ over a modified IPFP objective function
30 42
From ζ = 1 to ζ = 0
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
31 42
From ζ = 0 to ζ = minus1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
32 42
GNCCP vs IPFP
Pros
No more need of initialization
Converge towards a mappingmatrix
Cons
Complexity Iterate over IPFP
33 42
GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]
C
C
C
CC C
CS
S
Alkane Acyclic
NN
C
O
C
C
C
MAO PAH
34 42
Protocol
Arbitrary costs
I All substitutions 1
I All insertionsdeletions 3
Relative error
I Accuracy measure
I Overestimation The lowest approximation is the best one (dopt)
I Relative error d(Gi Gj)minus dopt(Gi Gj)
dopt(Gi Gj)
35 42
Relative errors
Alkane Acyclic MAO PAH0
50
100
150
200
250
Datasets
o
f re
lative e
rror
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
36 42
log Time vs Score Deviation
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
Alkane
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
Acyclic
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
MAO
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
PAH
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
37 42
Analysis
Tradeoff
I Accuracy is dependant to complexity
I Choose your method according to your priority
More complete analysis
I ICPR GED contest httpsgdc2016greycfr
I Others methods + Others datasets
38 42
GED Limitations
Mathematical properties
I GED is a distance
I But not an euclidean one
Impossible to derive a trivial kernel
Use with caution in SVMs
Complexity
I Still hard to compute on larger graphs
I Accuracy hard to evaluate (lack of gt)
39 42
Outlooks
Algorithms
I Matrix optimization
I Edge based mapping
I Optimization algorithms
40 42
Outlooks
Applications
I Explore graph space trough GEDI Median graph joint work with Paul
I Use of ged for classification
I How to set costs according to a task I rarr Metric learningI New Phd with Sebastien and Pierre
I Behavior of different methods
41 42
Conclusion
I GED is still an open problem
I Approximation algorithms exists
I but it stills room for improvements
I Focus on applications
42 42
Abu-Aisheh Z Raveaux R Ramel J-Y and Martineau P(2015)An Exact Graph Edit Distance Algorithm for Solving PatternRecognition ProblemsIn 4th International Conference on Pattern Recognition Applicationsand Methods
Bougleux S Brun L Carletti V Foggia P Gauzere B andVento M (2015)A quadratic assignment formulation of the graph edit distanceTechnical report Normandie Univ NormaSTIC FR 3638 France
Carletti V Gauzere B Brun L and Vento M (2015)Approximate graph edit distance computation combining bipartitematching and exact neighborhood substructure distanceIn GbRPR volume 9069 pages 168ndash177 Springer
Gauzere B Bougleux S Riesen K and Brun L (2014)Approximate Graph Edit Distance Guided by Bipartite Matching ofBags of WalksIn Structural Syntactic and Statistical Pattern Recognition volume8621 pages 73ndash82 Springer
42 42
Leordeanu M Hebert M and Sukthankar R (2009)An integer projected fixed point method for graph matching andmap inferenceIn Advances in neural information processing systems pages1114ndash1122
Liu Z-Y and Qiao H (2014)GNCCPndashGraduated NonConvexity and Concavity ProcedurePattern Anal Mach Intell 36(6) 1258ndash1267
Riesen K and Bunke H (2009)Approximate graph edit distance computation by means of bipartitegraph matchingImage and Vision Comp 27 950ndash959
42 42
A First Approach
Linear approximation
xlowast = arg minxisinΠ
cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
A First Approach
Linear approximation
xlowast = arg minxisinΠ
cgtx
I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]
I Linear approximation of QAP formulation
I Hungarian algorithm (O(n3))
No structural information
24 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
A first QAP approach
xlowast = minxisinΠ
1
2xgtDx + cgtx
Gradient descent approach
I Letrsquos find a local minimum of S(x)
I Relax problem to continuous domain S
Some solvers exist
No consideration of discrete nature of solution
26 42
Another strategy
Integer-Projected Fixed Point [Leordeanu et al 2009]
Franck-Wolfe like algorithm
Iterate until convergence
1 Discrete resolution of linear gradient of QAP rarr xt
2 Line search between xtminus1 and the solution found in step 1
27 42
Operating IPFP
At convergence
I Optimal continuous solution x is stable
I Need of a projection step to embed x to Π
Uncontrolled loss
No garanties that S(x) asymp S(xprime)
Importance of initialization
I Local minimum
Initialization is important [Carletti et al 2015]
28 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)
ζ = 1 Convex objective function
ζ = minus1 Concave objective function
29 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)
GNCCP algorithm
x = 0For ζ = 1rarr minus1
1 xlarr arg minxisinΠprime
S(x ζ)
2 ζ larr ζ minus 01
I Iterates ζ over a modified IPFP objective function
30 42
From ζ = 1 to ζ = 0
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
31 42
From ζ = 0 to ζ = minus1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
32 42
GNCCP vs IPFP
Pros
No more need of initialization
Converge towards a mappingmatrix
Cons
Complexity Iterate over IPFP
33 42
GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]
C
C
C
CC C
CS
S
Alkane Acyclic
NN
C
O
C
C
C
MAO PAH
34 42
Protocol
Arbitrary costs
I All substitutions 1
I All insertionsdeletions 3
Relative error
I Accuracy measure
I Overestimation The lowest approximation is the best one (dopt)
I Relative error d(Gi Gj)minus dopt(Gi Gj)
dopt(Gi Gj)
35 42
Relative errors
Alkane Acyclic MAO PAH0
50
100
150
200
250
Datasets
o
f re
lative e
rror
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
36 42
log Time vs Score Deviation
[Scatter plots: score deviation vs log10 of time in seconds, one panel per dataset (Alkane, Acyclic, MAO, PAH), comparing LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw and GNCCP]
Analysis
Tradeoff
- Accuracy depends on complexity
- Choose your method according to your priority
More complete analysis
- ICPR GED contest: https://gdc2016.greyc.fr
- Other methods + other datasets
GED Limitations
Mathematical properties
- GED is a distance
- But not a Euclidean one
Impossible to derive a trivial kernel
Use with caution in SVMs
Complexity
- Still hard to compute on larger graphs
- Accuracy hard to evaluate (lack of ground truth)
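Because GED is a distance but not a Euclidean one, a Gaussian kernel built from it need not be positive semidefinite. A quick spectral sanity check before feeding such a kernel to an SVM might look like this (the distance matrix is a toy, illustrative one):

```python
import numpy as np

# Toy symmetric "GED-like" distance matrix (illustrative values only).
D = np.array([[0.0, 2.0, 4.0, 3.0],
              [2.0, 0.0, 2.0, 5.0],
              [4.0, 2.0, 0.0, 1.0],
              [3.0, 5.0, 1.0, 0.0]])

K = np.exp(-0.5 * D ** 2)         # Gaussian "kernel" from the distances
eigvals = np.linalg.eigvalsh(K)   # K is symmetric, so eigvalsh applies
print(eigvals.min())              # a negative value flags an indefinite kernel
```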
Outlooks
Algorithms
- Matrix optimization
- Edge-based mapping
- Optimization algorithms
Outlooks
Applications
- Explore graph space through GED
  - Median graph: joint work with Paul
- Use of GED for classification
- How to set costs according to a task
  - → Metric learning
  - New PhD with Sébastien and Pierre
- Behavior of different methods
Conclusion
- GED is still an open problem
- Approximation algorithms exist
- But there is still room for improvement
- Focus on applications
Abu-Aisheh, Z., Raveaux, R., Ramel, J.-Y., and Martineau, P. (2015). An Exact Graph Edit Distance Algorithm for Solving Pattern Recognition Problems. In 4th International Conference on Pattern Recognition Applications and Methods.

Bougleux, S., Brun, L., Carletti, V., Foggia, P., Gaüzère, B., and Vento, M. (2015). A quadratic assignment formulation of the graph edit distance. Technical report, Normandie Univ, NormaSTIC FR 3638, France.

Carletti, V., Gaüzère, B., Brun, L., and Vento, M. (2015). Approximate graph edit distance computation combining bipartite matching and exact neighborhood substructure distance. In GbRPR, volume 9069, pages 168–177. Springer.

Gaüzère, B., Bougleux, S., Riesen, K., and Brun, L. (2014). Approximate Graph Edit Distance Guided by Bipartite Matching of Bags of Walks. In Structural, Syntactic, and Statistical Pattern Recognition, volume 8621, pages 73–82. Springer.

Leordeanu, M., Hebert, M., and Sukthankar, R. (2009). An integer projected fixed point method for graph matching and MAP inference. In Advances in Neural Information Processing Systems, pages 1114–1122.

Liu, Z.-Y. and Qiao, H. (2014). GNCCP – Graduated NonConvexity and Concavity Procedure. Pattern Anal. Mach. Intell., 36(6):1258–1267.

Riesen, K. and Bunke, H. (2009). Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision Computing, 27:950–959.
A First Approach
Linear approximation
x* = arg min_{x ∈ Π} c⊤x
- [Riesen and Bunke, 2009; Gaüzère et al., 2014; Carletti et al., 2015]
- Linear approximation of the QAP formulation
- Hungarian algorithm (O(n³))
No structural information
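The linear approximation can be sketched with SciPy's Hungarian solver; the cost matrix below is a toy example built with the protocol costs (substitution 1 when labels differ, insertion/deletion 3), not taken from the slides.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Toy node-to-node edit cost matrix between two small labelled graphs:
# entry (i, j) is the cost of substituting node i of G1 by node j of G2.
C = np.array([[0.0, 1.0, 3.0],
              [1.0, 0.0, 3.0],
              [3.0, 3.0, 0.0]])

rows, cols = linear_sum_assignment(C)   # Hungarian algorithm, O(n^3)
approx_ged = C[rows, cols].sum()        # cost of the optimal assignment
print(list(zip(rows, cols)), approx_ged)
```

In the bipartite methods compared here, the matrix is additionally padded with deletion/insertion rows and columns (cost 3 under the protocol above) so that any node can also be left unmatched.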
Augmented Cost Matrix
Adding some structural information
- C_ij: cost of mapping the neighbourhood of v_i to the neighbourhood of v′_j
  - Direct neighbourhood
  - Random walks
  - Subgraphs
- Complexity ↔ accuracy
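A minimal sketch of one such augmented cost, using the direct neighbourhood only and the protocol costs; the graphs and the multiset-difference surrogate for the neighbourhood mapping cost are illustrative assumptions, not the exact construction of the cited methods.

```python
from collections import Counter

# Hypothetical labelled graphs: node labels plus adjacency lists.
G1 = {"labels": {0: "C", 1: "C", 2: "O"}, "adj": {0: [1], 1: [0, 2], 2: [1]}}
G2 = {"labels": {0: "C", 1: "N", 2: "O"}, "adj": {0: [1], 1: [0, 2], 2: [1]}}

SUB, INS_DEL = 1.0, 3.0  # protocol costs

def augmented_cost(g1, i, g2, j):
    """C_ij = substitution cost of v_i -> v'_j plus a surrogate for the
    cost of mapping their direct neighbourhoods: every neighbour label
    present on one side but not the other pays the insertion/deletion cost."""
    base = 0.0 if g1["labels"][i] == g2["labels"][j] else SUB
    n1 = Counter(g1["labels"][k] for k in g1["adj"][i])
    n2 = Counter(g2["labels"][k] for k in g2["adj"][j])
    unmatched = (n1 - n2) + (n2 - n1)          # multiset symmetric difference
    return base + INS_DEL * sum(unmatched.values())

C = [[augmented_cost(G1, i, G2, j) for j in G2["labels"]] for i in G1["labels"]]
print(C)
```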
A first QAP approach
x* = arg min_{x ∈ Π} ½ x⊤Dx + c⊤x
Gradient descent approach
- Let's find a local minimum of S(x)
- Relax the problem to a continuous domain (S̃)
Some solvers exist
No consideration of the discrete nature of the solution
Another strategy
Integer-Projected Fixed Point [Leordeanu et al., 2009]
Frank-Wolfe-like algorithm
Iterate until convergence:
1. Discrete resolution of the linear gradient of the QAP → x_t
2. Line search between x_{t−1} and the solution found in step 1
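The two steps above can be sketched as follows, on random positive semidefinite QAP data, with `linear_sum_assignment` playing the role of the discrete linear solver; this is a simplified sketch, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n * n, n * n))
D = A @ A.T                          # quadratic costs, made PSD for the demo
c = rng.standard_normal(n * n)

def S(x):                            # QAP objective: 1/2 x^T D x + c^T x
    return 0.5 * x @ D @ x + c @ x

def lsap_vertex(g):
    """Step 1: discrete minimizer of the linearized objective g^T b,
    an LSAP on the gradient reshaped as an n x n matrix."""
    r, col = linear_sum_assignment(g.reshape(n, n))
    b = np.zeros((n, n)); b[r, col] = 1.0
    return b.ravel()

x = np.full(n * n, 1.0 / n)          # start from the flat doubly stochastic matrix
for _ in range(100):
    g = D @ x + c                    # gradient of S at x
    d = lsap_vertex(g) - x
    if g @ d > -1e-12:               # no descent direction left: converged
        break
    curv = d @ D @ d
    # Step 2: exact line search of the quadratic along [x, x + d].
    t = 1.0 if curv <= 0 else min(1.0, -(g @ d) / curv)
    x = x + t * d

x_map = lsap_vertex(-x)              # final projection onto a mapping matrix
print(S(x_map))
```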
Operating IPFP
At convergence
- The optimal continuous solution x is stable
- Need a projection step to map x onto Π
Uncontrolled loss
- No guarantee that S(x) ≈ S(x′)
Importance of initialization
- Local minimum
- Initialization is important [Carletti et al., 2015]
GNCCP approach [Liu and Qiao, 2014]
From convex to concave objective function
S(x, ζ) = ζ (x⊤x) + (1 − |ζ|) S(x)
- ζ = 1: convex objective function
- ζ = −1: concave objective function
GNCCP approach [Liu and Qiao, 2014]
From convex to concave objective function
S(x, ζ) = ζ (x⊤x) + (1 − |ζ|) S(x)
GNCCP algorithm
x = 0
For ζ = 1 → −1:
  1. x ← arg min_{x ∈ Π′} S(x, ζ)
  2. ζ ← ζ − 0.1
- Iterates ζ over a modified IPFP objective function
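A compact sketch of the loop above, again on random PSD QAP data; the schedule and the inner Frank-Wolfe-style solver are simplified relative to [Liu and Qiao, 2014].

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n * n, n * n))
D = A @ A.T                              # quadratic costs (PSD for the demo)
c = rng.standard_normal(n * n)

def lsap_vertex(g):                      # discrete minimizer of g^T b
    r, col = linear_sum_assignment(g.reshape(n, n))
    b = np.zeros((n, n)); b[r, col] = 1.0
    return b.ravel()

x = np.zeros(n * n)                      # x = 0: no initialization required
for step in range(21):                   # zeta: 1.0, 0.9, ..., -1.0
    zeta = 1.0 - 0.1 * step
    for _ in range(50):                  # inner IPFP-like minimization of S(., zeta)
        # gradient of zeta * x^T x + (1 - |zeta|) * (1/2 x^T D x + c^T x)
        g = 2.0 * zeta * x + (1.0 - abs(zeta)) * (D @ x + c)
        d = lsap_vertex(g) - x
        if g @ d > -1e-12:               # inner convergence
            break
        curv = 2.0 * zeta * (d @ d) + (1.0 - abs(zeta)) * (d @ D @ d)
        t = 1.0 if curv <= 0 else min(1.0, -(g @ d) / curv)
        x = x + t * d

print(x.reshape(n, n))                   # driven towards a 0/1 mapping matrix
```

As ζ becomes negative, the objective turns concave and the line search keeps jumping to vertices, which is why no projection step is needed at the end.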
From ζ = 1 to ζ = 0
[Figure: evolution of the continuous solution on [−1, 1]² as ζ decreases from 1 to 0]
From ζ = 0 to ζ = −1
[Figure: evolution of the continuous solution on [−1, 1]² as ζ decreases from 0 to −1]
GNCCP vs IPFP
Pros
No more need of initialization
Converge towards a mappingmatrix
Cons
Complexity Iterate over IPFP
33 42
GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]
C
C
C
CC C
CS
S
Alkane Acyclic
NN
C
O
C
C
C
MAO PAH
34 42
Protocol
Arbitrary costs
I All substitutions 1
I All insertionsdeletions 3
Relative error
I Accuracy measure
I Overestimation The lowest approximation is the best one (dopt)
I Relative error d(Gi Gj)minus dopt(Gi Gj)
dopt(Gi Gj)
35 42
Relative errors
Alkane Acyclic MAO PAH0
50
100
150
200
250
Datasets
o
f re
lative e
rror
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
36 42
log Time vs Score Deviation
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
Alkane
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
Acyclic
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
MAO
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
PAH
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
37 42
Analysis
Tradeoff
I Accuracy is dependant to complexity
I Choose your method according to your priority
More complete analysis
I ICPR GED contest httpsgdc2016greycfr
I Others methods + Others datasets
38 42
GED Limitations
Mathematical properties
I GED is a distance
I But not an euclidean one
Impossible to derive a trivial kernel
Use with caution in SVMs
Complexity
I Still hard to compute on larger graphs
I Accuracy hard to evaluate (lack of gt)
39 42
Outlooks
Algorithms
I Matrix optimization
I Edge based mapping
I Optimization algorithms
40 42
Outlooks
Applications
I Explore graph space trough GEDI Median graph joint work with Paul
I Use of ged for classification
I How to set costs according to a task I rarr Metric learningI New Phd with Sebastien and Pierre
I Behavior of different methods
41 42
Conclusion
I GED is still an open problem
I Approximation algorithms exists
I but it stills room for improvements
I Focus on applications
42 42
Abu-Aisheh Z Raveaux R Ramel J-Y and Martineau P(2015)An Exact Graph Edit Distance Algorithm for Solving PatternRecognition ProblemsIn 4th International Conference on Pattern Recognition Applicationsand Methods
Bougleux S Brun L Carletti V Foggia P Gauzere B andVento M (2015)A quadratic assignment formulation of the graph edit distanceTechnical report Normandie Univ NormaSTIC FR 3638 France
Carletti V Gauzere B Brun L and Vento M (2015)Approximate graph edit distance computation combining bipartitematching and exact neighborhood substructure distanceIn GbRPR volume 9069 pages 168ndash177 Springer
Gauzere B Bougleux S Riesen K and Brun L (2014)Approximate Graph Edit Distance Guided by Bipartite Matching ofBags of WalksIn Structural Syntactic and Statistical Pattern Recognition volume8621 pages 73ndash82 Springer
42 42
Leordeanu M Hebert M and Sukthankar R (2009)An integer projected fixed point method for graph matching andmap inferenceIn Advances in neural information processing systems pages1114ndash1122
Liu Z-Y and Qiao H (2014)GNCCPndashGraduated NonConvexity and Concavity ProcedurePattern Anal Mach Intell 36(6) 1258ndash1267
Riesen K and Bunke H (2009)Approximate graph edit distance computation by means of bipartitegraph matchingImage and Vision Comp 27 950ndash959
42 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
A first QAP approach
xlowast = minxisinΠ
1
2xgtDx + cgtx
Gradient descent approach
I Letrsquos find a local minimum of S(x)
I Relax problem to continuous domain S
Some solvers exist
No consideration of discrete nature of solution
26 42
Another strategy
Integer-Projected Fixed Point [Leordeanu et al 2009]
Franck-Wolfe like algorithm
Iterate until convergence
1 Discrete resolution of linear gradient of QAP rarr xt
2 Line search between xtminus1 and the solution found in step 1
27 42
Operating IPFP
At convergence
I Optimal continuous solution x is stable
I Need of a projection step to embed x to Π
Uncontrolled loss
No garanties that S(x) asymp S(xprime)
Importance of initialization
I Local minimum
Initialization is important [Carletti et al 2015]
28 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)
ζ = 1 Convex objective function
ζ = minus1 Concave objective function
29 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)
GNCCP algorithm
x = 0For ζ = 1rarr minus1
1 xlarr arg minxisinΠprime
S(x ζ)
2 ζ larr ζ minus 01
I Iterates ζ over a modified IPFP objective function
30 42
From ζ = 1 to ζ = 0
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
31 42
From ζ = 0 to ζ = minus1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
32 42
GNCCP vs IPFP
Pros
No more need of initialization
Converge towards a mappingmatrix
Cons
Complexity Iterate over IPFP
33 42
GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]
C
C
C
CC C
CS
S
Alkane Acyclic
NN
C
O
C
C
C
MAO PAH
34 42
Protocol
Arbitrary costs
I All substitutions 1
I All insertionsdeletions 3
Relative error
I Accuracy measure
I Overestimation The lowest approximation is the best one (dopt)
I Relative error d(Gi Gj)minus dopt(Gi Gj)
dopt(Gi Gj)
35 42
Relative errors
Alkane Acyclic MAO PAH0
50
100
150
200
250
Datasets
o
f re
lative e
rror
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
36 42
log Time vs Score Deviation
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
Alkane
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
Acyclic
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
MAO
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
PAH
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
37 42
Analysis
Tradeoff
I Accuracy is dependant to complexity
I Choose your method according to your priority
More complete analysis
I ICPR GED contest httpsgdc2016greycfr
I Others methods + Others datasets
38 42
GED Limitations
Mathematical properties
I GED is a distance
I But not an euclidean one
Impossible to derive a trivial kernel
Use with caution in SVMs
Complexity
I Still hard to compute on larger graphs
I Accuracy hard to evaluate (lack of gt)
39 42
Outlooks
Algorithms
I Matrix optimization
I Edge based mapping
I Optimization algorithms
40 42
Outlooks
Applications
I Explore graph space trough GEDI Median graph joint work with Paul
I Use of ged for classification
I How to set costs according to a task I rarr Metric learningI New Phd with Sebastien and Pierre
I Behavior of different methods
41 42
Conclusion
I GED is still an open problem
I Approximation algorithms exists
I but it stills room for improvements
I Focus on applications
42 42
Abu-Aisheh Z Raveaux R Ramel J-Y and Martineau P(2015)An Exact Graph Edit Distance Algorithm for Solving PatternRecognition ProblemsIn 4th International Conference on Pattern Recognition Applicationsand Methods
Bougleux S Brun L Carletti V Foggia P Gauzere B andVento M (2015)A quadratic assignment formulation of the graph edit distanceTechnical report Normandie Univ NormaSTIC FR 3638 France
Carletti V Gauzere B Brun L and Vento M (2015)Approximate graph edit distance computation combining bipartitematching and exact neighborhood substructure distanceIn GbRPR volume 9069 pages 168ndash177 Springer
Gauzere B Bougleux S Riesen K and Brun L (2014)Approximate Graph Edit Distance Guided by Bipartite Matching ofBags of WalksIn Structural Syntactic and Statistical Pattern Recognition volume8621 pages 73ndash82 Springer
42 42
Leordeanu M Hebert M and Sukthankar R (2009)An integer projected fixed point method for graph matching andmap inferenceIn Advances in neural information processing systems pages1114ndash1122
Liu Z-Y and Qiao H (2014)GNCCPndashGraduated NonConvexity and Concavity ProcedurePattern Anal Mach Intell 36(6) 1258ndash1267
Riesen K and Bunke H (2009)Approximate graph edit distance computation by means of bipartitegraph matchingImage and Vision Comp 27 950ndash959
42 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
A first QAP approach
xlowast = minxisinΠ
1
2xgtDx + cgtx
Gradient descent approach
I Letrsquos find a local minimum of S(x)
I Relax problem to continuous domain S
Some solvers exist
No consideration of discrete nature of solution
26 42
Another strategy
Integer-Projected Fixed Point [Leordeanu et al 2009]
Franck-Wolfe like algorithm
Iterate until convergence
1 Discrete resolution of linear gradient of QAP rarr xt
2 Line search between xtminus1 and the solution found in step 1
27 42
Operating IPFP
At convergence
I Optimal continuous solution x is stable
I Need of a projection step to embed x to Π
Uncontrolled loss
No garanties that S(x) asymp S(xprime)
Importance of initialization
I Local minimum
Initialization is important [Carletti et al 2015]
28 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)
ζ = 1 Convex objective function
ζ = minus1 Concave objective function
29 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)
GNCCP algorithm
x = 0For ζ = 1rarr minus1
1 xlarr arg minxisinΠprime
S(x ζ)
2 ζ larr ζ minus 01
I Iterates ζ over a modified IPFP objective function
30 42
From ζ = 1 to ζ = 0
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
31 42
From ζ = 0 to ζ = minus1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
32 42
GNCCP vs IPFP
Pros
No more need of initialization
Converge towards a mappingmatrix
Cons
Complexity Iterate over IPFP
33 42
GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]
C
C
C
CC C
CS
S
Alkane Acyclic
NN
C
O
C
C
C
MAO PAH
34 42
Protocol
Arbitrary costs
I All substitutions 1
I All insertionsdeletions 3
Relative error
I Accuracy measure
I Overestimation The lowest approximation is the best one (dopt)
I Relative error d(Gi Gj)minus dopt(Gi Gj)
dopt(Gi Gj)
35 42
Relative errors
Alkane Acyclic MAO PAH0
50
100
150
200
250
Datasets
o
f re
lative e
rror
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
36 42
log Time vs Score Deviation
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
Alkane
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
Acyclic
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
MAO
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
PAH
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
37 42
Analysis
Tradeoff
I Accuracy is dependant to complexity
I Choose your method according to your priority
More complete analysis
I ICPR GED contest httpsgdc2016greycfr
I Others methods + Others datasets
38 42
GED Limitations
Mathematical properties
I GED is a distance
I But not an euclidean one
Impossible to derive a trivial kernel
Use with caution in SVMs
Complexity
I Still hard to compute on larger graphs
I Accuracy hard to evaluate (lack of gt)
39 42
Outlooks
Algorithms
I Matrix optimization
I Edge based mapping
I Optimization algorithms
40 42
Outlooks
Applications
I Explore graph space trough GEDI Median graph joint work with Paul
I Use of ged for classification
I How to set costs according to a task I rarr Metric learningI New Phd with Sebastien and Pierre
I Behavior of different methods
41 42
Conclusion
I GED is still an open problem
I Approximation algorithms exists
I but it stills room for improvements
I Focus on applications
42 42
Abu-Aisheh Z Raveaux R Ramel J-Y and Martineau P(2015)An Exact Graph Edit Distance Algorithm for Solving PatternRecognition ProblemsIn 4th International Conference on Pattern Recognition Applicationsand Methods
Bougleux S Brun L Carletti V Foggia P Gauzere B andVento M (2015)A quadratic assignment formulation of the graph edit distanceTechnical report Normandie Univ NormaSTIC FR 3638 France
Carletti V Gauzere B Brun L and Vento M (2015)Approximate graph edit distance computation combining bipartitematching and exact neighborhood substructure distanceIn GbRPR volume 9069 pages 168ndash177 Springer
Gauzere B Bougleux S Riesen K and Brun L (2014)Approximate Graph Edit Distance Guided by Bipartite Matching ofBags of WalksIn Structural Syntactic and Statistical Pattern Recognition volume8621 pages 73ndash82 Springer
42 42
Leordeanu M Hebert M and Sukthankar R (2009)An integer projected fixed point method for graph matching andmap inferenceIn Advances in neural information processing systems pages1114ndash1122
Liu Z-Y and Qiao H (2014)GNCCPndashGraduated NonConvexity and Concavity ProcedurePattern Anal Mach Intell 36(6) 1258ndash1267
Riesen K and Bunke H (2009)Approximate graph edit distance computation by means of bipartitegraph matchingImage and Vision Comp 27 950ndash959
42 42
Augmented Cost Matrix
Adding some stuctural information
I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs
I Complexity harr accuracy
N
N
N
N
25 42
A first QAP approach
xlowast = minxisinΠ
1
2xgtDx + cgtx
Gradient descent approach
I Letrsquos find a local minimum of S(x)
I Relax problem to continuous domain S
Some solvers exist
No consideration of discrete nature of solution
26 42
Another strategy
Integer-Projected Fixed Point [Leordeanu et al 2009]
Franck-Wolfe like algorithm
Iterate until convergence
1 Discrete resolution of linear gradient of QAP rarr xt
2 Line search between xtminus1 and the solution found in step 1
27 42
Operating IPFP
At convergence
I Optimal continuous solution x is stable
I Need of a projection step to embed x to Π
Uncontrolled loss
No garanties that S(x) asymp S(xprime)
Importance of initialization
I Local minimum
Initialization is important [Carletti et al 2015]
28 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)
ζ = 1 Convex objective function
ζ = minus1 Concave objective function
29 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)
GNCCP algorithm
x = 0For ζ = 1rarr minus1
1 xlarr arg minxisinΠprime
S(x ζ)
2 ζ larr ζ minus 01
I Iterates ζ over a modified IPFP objective function
30 42
From ζ = 1 to ζ = 0
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
31 42
From ζ = 0 to ζ = minus1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
32 42
GNCCP vs IPFP
Pros
No more need of initialization
Converge towards a mappingmatrix
Cons
Complexity Iterate over IPFP
33 42
GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]
C
C
C
CC C
CS
S
Alkane Acyclic
NN
C
O
C
C
C
MAO PAH
34 42
Protocol
Arbitrary costs
I All substitutions 1
I All insertionsdeletions 3
Relative error
I Accuracy measure
I Overestimation The lowest approximation is the best one (dopt)
I Relative error d(Gi Gj)minus dopt(Gi Gj)
dopt(Gi Gj)
35 42
Relative errors
Alkane Acyclic MAO PAH0
50
100
150
200
250
Datasets
o
f re
lative e
rror
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
36 42
log Time vs Score Deviation
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
Alkane
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
Acyclic
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
MAO
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
PAH
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
37 42
Analysis
Tradeoff
I Accuracy is dependant to complexity
I Choose your method according to your priority
More complete analysis
I ICPR GED contest httpsgdc2016greycfr
I Others methods + Others datasets
38 42
GED Limitations
Mathematical properties
I GED is a distance
I But not an euclidean one
Impossible to derive a trivial kernel
Use with caution in SVMs
Complexity
I Still hard to compute on larger graphs
I Accuracy hard to evaluate (lack of gt)
39 42
Outlooks
Algorithms
I Matrix optimization
I Edge based mapping
I Optimization algorithms
40 42
Outlooks
Applications
I Explore graph space trough GEDI Median graph joint work with Paul
I Use of ged for classification
I How to set costs according to a task I rarr Metric learningI New Phd with Sebastien and Pierre
I Behavior of different methods
41 42
Conclusion
I GED is still an open problem
I Approximation algorithms exists
I but it stills room for improvements
I Focus on applications
42 42
Abu-Aisheh Z Raveaux R Ramel J-Y and Martineau P(2015)An Exact Graph Edit Distance Algorithm for Solving PatternRecognition ProblemsIn 4th International Conference on Pattern Recognition Applicationsand Methods
Bougleux S Brun L Carletti V Foggia P Gauzere B andVento M (2015)A quadratic assignment formulation of the graph edit distanceTechnical report Normandie Univ NormaSTIC FR 3638 France
Carletti V Gauzere B Brun L and Vento M (2015)Approximate graph edit distance computation combining bipartitematching and exact neighborhood substructure distanceIn GbRPR volume 9069 pages 168ndash177 Springer
Gauzere B Bougleux S Riesen K and Brun L (2014)Approximate Graph Edit Distance Guided by Bipartite Matching ofBags of WalksIn Structural Syntactic and Statistical Pattern Recognition volume8621 pages 73ndash82 Springer
42 42
Leordeanu M Hebert M and Sukthankar R (2009)An integer projected fixed point method for graph matching andmap inferenceIn Advances in neural information processing systems pages1114ndash1122
Liu Z-Y and Qiao H (2014)GNCCPndashGraduated NonConvexity and Concavity ProcedurePattern Anal Mach Intell 36(6) 1258ndash1267
Riesen K and Bunke H (2009)Approximate graph edit distance computation by means of bipartitegraph matchingImage and Vision Comp 27 950ndash959
42 42
A first QAP approach
xlowast = minxisinΠ
1
2xgtDx + cgtx
Gradient descent approach
I Letrsquos find a local minimum of S(x)
I Relax problem to continuous domain S
Some solvers exist
No consideration of discrete nature of solution
26 42
Another strategy
Integer-Projected Fixed Point [Leordeanu et al 2009]
Franck-Wolfe like algorithm
Iterate until convergence
1 Discrete resolution of linear gradient of QAP rarr xt
2 Line search between xtminus1 and the solution found in step 1
27 42
Operating IPFP
At convergence
I Optimal continuous solution x is stable
I Need of a projection step to embed x to Π
Uncontrolled loss
No garanties that S(x) asymp S(xprime)
Importance of initialization
I Local minimum
Initialization is important [Carletti et al 2015]
28 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)
ζ = 1 Convex objective function
ζ = minus1 Concave objective function
29 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)
GNCCP algorithm
x = 0For ζ = 1rarr minus1
1 xlarr arg minxisinΠprime
S(x ζ)
2 ζ larr ζ minus 01
I Iterates ζ over a modified IPFP objective function
30 42
From ζ = 1 to ζ = 0
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
31 42
From ζ = 0 to ζ = −1
[Figure: sequence of plots over [−1, 1] × [−1, 1] showing the objective S(x, ζ) deforming from linear towards concave as ζ decreases from 0 to −1]
GNCCP vs IPFP
Pros:
- No need for an initialization
- Converges towards a mapping matrix
Cons:
- Complexity: iterates over IPFP
GREYC chemistry datasets [https://iapr-tc15.greyc.fr/links.html]
[Figure: example molecular graphs from the four datasets: Alkane, Acyclic, MAO, PAH]
Protocol
Arbitrary costs:
- All substitutions: 1
- All insertions/deletions: 3
Relative error as accuracy measure:
- All methods overestimate, so the lowest approximation found is the best one (d_opt)
- Relative error: (d(G_i, G_j) − d_opt(G_i, G_j)) / d_opt(G_i, G_j)
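As a trivial sketch of the measure above (the function and argument names are ours, not from the slides):

```python
def relative_error(d_approx, d_best):
    """Relative error of an approximate GED d_approx against the lowest
    approximation found across all compared methods, d_best (d_opt)."""
    return (d_approx - d_best) / d_best
```

A value of 0 means the method matched the best known approximation; 1 means it overestimated it by 100%.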
Relative errors
[Figure: bar chart of relative error (%) on each dataset (Alkane, Acyclic, MAO, PAH; y-axis 0 to 250) for LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw, and GNCCP]
log Time vs Score Deviation
[Figure: four scatter plots (Alkane, Acyclic, MAO, PAH) of score deviation (0 to 1) versus log10 of time in seconds, for LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw, and GNCCP]
Analysis
Tradeoff:
- Accuracy depends on complexity
- Choose your method according to your priority
More complete analysis:
- ICPR GED contest: https://gdc2016.greyc.fr
- More methods and more datasets
GED Limitations
Mathematical properties:
- GED is a distance, but not a Euclidean one
- Impossible to derive a trivial kernel from it
- Use with caution in SVMs
Complexity:
- Still hard to compute on larger graphs
- Accuracy hard to evaluate (lack of ground truth)
Outlooks
Algorithms:
- Matrix optimization
- Edge-based mapping
- Optimization algorithms
Outlooks
Applications:
- Explore graph space through GED: median graph (joint work with Paul)
- Use of GED for classification:
  - How to set costs according to a task → metric learning (new PhD with Sébastien and Pierre)
  - Behavior of different methods
Conclusion
- GED is still an open problem
- Approximation algorithms exist, but there is still room for improvement
- Focus on applications
References
Abu-Aisheh, Z., Raveaux, R., Ramel, J.-Y., and Martineau, P. (2015). An exact graph edit distance algorithm for solving pattern recognition problems. In 4th International Conference on Pattern Recognition Applications and Methods.
Bougleux, S., Brun, L., Carletti, V., Foggia, P., Gaüzère, B., and Vento, M. (2015). A quadratic assignment formulation of the graph edit distance. Technical report, Normandie Univ, NormaSTIC FR 3638, France.
Carletti, V., Gaüzère, B., Brun, L., and Vento, M. (2015). Approximate graph edit distance computation combining bipartite matching and exact neighborhood substructure distance. In GbRPR, volume 9069, pages 168-177. Springer.
Gaüzère, B., Bougleux, S., Riesen, K., and Brun, L. (2014). Approximate graph edit distance guided by bipartite matching of bags of walks. In Structural, Syntactic, and Statistical Pattern Recognition, volume 8621, pages 73-82. Springer.
Leordeanu, M., Hebert, M., and Sukthankar, R. (2009). An integer projected fixed point method for graph matching and MAP inference. In Advances in Neural Information Processing Systems, pages 1114-1122.
Liu, Z.-Y. and Qiao, H. (2014). GNCCP: Graduated NonConvexity and Concavity Procedure. IEEE Trans. Pattern Anal. Mach. Intell., 36(6):1258-1267.
Riesen, K. and Bunke, H. (2009). Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision Computing, 27:950-959.
Another strategy
Integer-Projected Fixed Point [Leordeanu et al 2009]
Franck-Wolfe like algorithm
Iterate until convergence
1 Discrete resolution of linear gradient of QAP rarr xt
2 Line search between xtminus1 and the solution found in step 1
27 42
Operating IPFP
At convergence
I Optimal continuous solution x is stable
I Need of a projection step to embed x to Π
Uncontrolled loss
No garanties that S(x) asymp S(xprime)
Importance of initialization
I Local minimum
Initialization is important [Carletti et al 2015]
28 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)
ζ = 1 Convex objective function
ζ = minus1 Concave objective function
29 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)
GNCCP algorithm
x = 0For ζ = 1rarr minus1
1 xlarr arg minxisinΠprime
S(x ζ)
2 ζ larr ζ minus 01
I Iterates ζ over a modified IPFP objective function
30 42
From ζ = 1 to ζ = 0
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
31 42
From ζ = 0 to ζ = minus1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
32 42
GNCCP vs IPFP
Pros
No more need of initialization
Converge towards a mappingmatrix
Cons
Complexity Iterate over IPFP
33 42
GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]
C
C
C
CC C
CS
S
Alkane Acyclic
NN
C
O
C
C
C
MAO PAH
34 42
Protocol
Arbitrary costs
I All substitutions 1
I All insertionsdeletions 3
Relative error
I Accuracy measure
I Overestimation The lowest approximation is the best one (dopt)
I Relative error d(Gi Gj)minus dopt(Gi Gj)
dopt(Gi Gj)
35 42
Relative errors
Alkane Acyclic MAO PAH0
50
100
150
200
250
Datasets
o
f re
lative e
rror
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
36 42
log Time vs Score Deviation
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
Alkane
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
Acyclic
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
MAO
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
PAH
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
37 42
Analysis
Tradeoff
I Accuracy is dependant to complexity
I Choose your method according to your priority
More complete analysis
I ICPR GED contest httpsgdc2016greycfr
I Others methods + Others datasets
38 42
GED Limitations
Mathematical properties
I GED is a distance
I But not an euclidean one
Impossible to derive a trivial kernel
Use with caution in SVMs
Complexity
I Still hard to compute on larger graphs
I Accuracy hard to evaluate (lack of gt)
39 42
Outlooks
Algorithms
I Matrix optimization
I Edge based mapping
I Optimization algorithms
40 42
Outlooks
Applications
I Explore graph space trough GEDI Median graph joint work with Paul
I Use of ged for classification
I How to set costs according to a task I rarr Metric learningI New Phd with Sebastien and Pierre
I Behavior of different methods
41 42
Conclusion
I GED is still an open problem
I Approximation algorithms exists
I but it stills room for improvements
I Focus on applications
42 42
Abu-Aisheh Z Raveaux R Ramel J-Y and Martineau P(2015)An Exact Graph Edit Distance Algorithm for Solving PatternRecognition ProblemsIn 4th International Conference on Pattern Recognition Applicationsand Methods
Bougleux S Brun L Carletti V Foggia P Gauzere B andVento M (2015)A quadratic assignment formulation of the graph edit distanceTechnical report Normandie Univ NormaSTIC FR 3638 France
Carletti V Gauzere B Brun L and Vento M (2015)Approximate graph edit distance computation combining bipartitematching and exact neighborhood substructure distanceIn GbRPR volume 9069 pages 168ndash177 Springer
Gauzere B Bougleux S Riesen K and Brun L (2014)Approximate Graph Edit Distance Guided by Bipartite Matching ofBags of WalksIn Structural Syntactic and Statistical Pattern Recognition volume8621 pages 73ndash82 Springer
42 42
Leordeanu M Hebert M and Sukthankar R (2009)An integer projected fixed point method for graph matching andmap inferenceIn Advances in neural information processing systems pages1114ndash1122
Liu Z-Y and Qiao H (2014)GNCCPndashGraduated NonConvexity and Concavity ProcedurePattern Anal Mach Intell 36(6) 1258ndash1267
Riesen K and Bunke H (2009)Approximate graph edit distance computation by means of bipartitegraph matchingImage and Vision Comp 27 950ndash959
42 42
Operating IPFP
At convergence
I Optimal continuous solution x is stable
I Need of a projection step to embed x to Π
Uncontrolled loss
No garanties that S(x) asymp S(xprime)
Importance of initialization
I Local minimum
Initialization is important [Carletti et al 2015]
28 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)
ζ = 1 Convex objective function
ζ = minus1 Concave objective function
29 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)
GNCCP algorithm
x = 0For ζ = 1rarr minus1
1 xlarr arg minxisinΠprime
S(x ζ)
2 ζ larr ζ minus 01
I Iterates ζ over a modified IPFP objective function
30 42
From ζ = 1 to ζ = 0
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
31 42
From ζ = 0 to ζ = minus1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
32 42
GNCCP vs IPFP
Pros
No more need of initialization
Converge towards a mappingmatrix
Cons
Complexity Iterate over IPFP
33 42
GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]
C
C
C
CC C
CS
S
Alkane Acyclic
NN
C
O
C
C
C
MAO PAH
34 42
Protocol
Arbitrary costs
I All substitutions 1
I All insertionsdeletions 3
Relative error
I Accuracy measure
I Overestimation The lowest approximation is the best one (dopt)
I Relative error d(Gi Gj)minus dopt(Gi Gj)
dopt(Gi Gj)
35 42
Relative errors
Alkane Acyclic MAO PAH0
50
100
150
200
250
Datasets
o
f re
lative e
rror
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
36 42
log Time vs Score Deviation
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
Alkane
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
Acyclic
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
MAO
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
PAH
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
37 42
Analysis
Tradeoff
I Accuracy is dependant to complexity
I Choose your method according to your priority
More complete analysis
I ICPR GED contest httpsgdc2016greycfr
I Others methods + Others datasets
38 42
GED Limitations
Mathematical properties
I GED is a distance
I But not an euclidean one
Impossible to derive a trivial kernel
Use with caution in SVMs
Complexity
I Still hard to compute on larger graphs
I Accuracy hard to evaluate (lack of gt)
39 42
Outlooks
Algorithms
I Matrix optimization
I Edge based mapping
I Optimization algorithms
40 42
Outlooks
Applications
I Explore graph space trough GEDI Median graph joint work with Paul
I Use of ged for classification
I How to set costs according to a task I rarr Metric learningI New Phd with Sebastien and Pierre
I Behavior of different methods
41 42
Conclusion
I GED is still an open problem
I Approximation algorithms exists
I but it stills room for improvements
I Focus on applications
42 42
Abu-Aisheh Z Raveaux R Ramel J-Y and Martineau P(2015)An Exact Graph Edit Distance Algorithm for Solving PatternRecognition ProblemsIn 4th International Conference on Pattern Recognition Applicationsand Methods
Bougleux S Brun L Carletti V Foggia P Gauzere B andVento M (2015)A quadratic assignment formulation of the graph edit distanceTechnical report Normandie Univ NormaSTIC FR 3638 France
Carletti V Gauzere B Brun L and Vento M (2015)Approximate graph edit distance computation combining bipartitematching and exact neighborhood substructure distanceIn GbRPR volume 9069 pages 168ndash177 Springer
Gauzere B Bougleux S Riesen K and Brun L (2014)Approximate Graph Edit Distance Guided by Bipartite Matching ofBags of WalksIn Structural Syntactic and Statistical Pattern Recognition volume8621 pages 73ndash82 Springer
42 42
Leordeanu M Hebert M and Sukthankar R (2009)An integer projected fixed point method for graph matching andmap inferenceIn Advances in neural information processing systems pages1114ndash1122
Liu Z-Y and Qiao H (2014)GNCCPndashGraduated NonConvexity and Concavity ProcedurePattern Anal Mach Intell 36(6) 1258ndash1267
Riesen K and Bunke H (2009)Approximate graph edit distance computation by means of bipartitegraph matchingImage and Vision Comp 27 950ndash959
42 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)
ζ = 1 Convex objective function
ζ = minus1 Concave objective function
29 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)
GNCCP algorithm
x = 0For ζ = 1rarr minus1
1 xlarr arg minxisinΠprime
S(x ζ)
2 ζ larr ζ minus 01
I Iterates ζ over a modified IPFP objective function
30 42
From ζ = 1 to ζ = 0
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
31 42
From ζ = 0 to ζ = minus1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
32 42
GNCCP vs IPFP
Pros
No more need of initialization
Converge towards a mappingmatrix
Cons
Complexity Iterate over IPFP
33 42
GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]
C
C
C
CC C
CS
S
Alkane Acyclic
NN
C
O
C
C
C
MAO PAH
34 42
Protocol
Arbitrary costs
I All substitutions 1
I All insertionsdeletions 3
Relative error
I Accuracy measure
I Overestimation The lowest approximation is the best one (dopt)
I Relative error d(Gi Gj)minus dopt(Gi Gj)
dopt(Gi Gj)
35 42
Relative errors
Alkane Acyclic MAO PAH0
50
100
150
200
250
Datasets
o
f re
lative e
rror
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
36 42
log Time vs Score Deviation
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
Alkane
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
Acyclic
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
MAO
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
PAH
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
37 42
Analysis
Tradeoff
I Accuracy is dependant to complexity
I Choose your method according to your priority
More complete analysis
I ICPR GED contest httpsgdc2016greycfr
I Others methods + Others datasets
38 42
GED Limitations
Mathematical properties
I GED is a distance
I But not an euclidean one
Impossible to derive a trivial kernel
Use with caution in SVMs
Complexity
I Still hard to compute on larger graphs
I Accuracy hard to evaluate (lack of gt)
39 42
Outlooks
Algorithms
I Matrix optimization
I Edge based mapping
I Optimization algorithms
40 42
Outlooks
Applications
I Explore graph space trough GEDI Median graph joint work with Paul
I Use of ged for classification
I How to set costs according to a task I rarr Metric learningI New Phd with Sebastien and Pierre
I Behavior of different methods
41 42
Conclusion
I GED is still an open problem
I Approximation algorithms exists
I but it stills room for improvements
I Focus on applications
42 42
Abu-Aisheh Z Raveaux R Ramel J-Y and Martineau P(2015)An Exact Graph Edit Distance Algorithm for Solving PatternRecognition ProblemsIn 4th International Conference on Pattern Recognition Applicationsand Methods
Bougleux S Brun L Carletti V Foggia P Gauzere B andVento M (2015)A quadratic assignment formulation of the graph edit distanceTechnical report Normandie Univ NormaSTIC FR 3638 France
Carletti V Gauzere B Brun L and Vento M (2015)Approximate graph edit distance computation combining bipartitematching and exact neighborhood substructure distanceIn GbRPR volume 9069 pages 168ndash177 Springer
Gauzere B Bougleux S Riesen K and Brun L (2014)Approximate Graph Edit Distance Guided by Bipartite Matching ofBags of WalksIn Structural Syntactic and Statistical Pattern Recognition volume8621 pages 73ndash82 Springer
42 42
Leordeanu M Hebert M and Sukthankar R (2009)An integer projected fixed point method for graph matching andmap inferenceIn Advances in neural information processing systems pages1114ndash1122
Liu Z-Y and Qiao H (2014)GNCCPndashGraduated NonConvexity and Concavity ProcedurePattern Anal Mach Intell 36(6) 1258ndash1267
Riesen K and Bunke H (2009)Approximate graph edit distance computation by means of bipartitegraph matchingImage and Vision Comp 27 950ndash959
42 42
GNCCP approach [Liu and Qiao 2014]
From convex to concave objective function
S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)
GNCCP algorithm
x = 0For ζ = 1rarr minus1
1 xlarr arg minxisinΠprime
S(x ζ)
2 ζ larr ζ minus 01
I Iterates ζ over a modified IPFP objective function
30 42
From ζ = 1 to ζ = 0
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
31 42
From ζ = 0 to ζ = minus1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
32 42
GNCCP vs IPFP
Pros
No more need of initialization
Converge towards a mappingmatrix
Cons
Complexity Iterate over IPFP
33 42
GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]
C
C
C
CC C
CS
S
Alkane Acyclic
NN
C
O
C
C
C
MAO PAH
34 42
Protocol
Arbitrary costs
I All substitutions 1
I All insertionsdeletions 3
Relative error
I Accuracy measure
I Overestimation The lowest approximation is the best one (dopt)
I Relative error d(Gi Gj)minus dopt(Gi Gj)
dopt(Gi Gj)
35 42
Relative errors
Alkane Acyclic MAO PAH0
50
100
150
200
250
Datasets
o
f re
lative e
rror
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
36 42
log Time vs Score Deviation
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
Alkane
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
Acyclic
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
MAO
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
PAH
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
37 42
Analysis
Tradeoff
I Accuracy is dependant to complexity
I Choose your method according to your priority
More complete analysis
I ICPR GED contest httpsgdc2016greycfr
I Others methods + Others datasets
38 42
GED Limitations
Mathematical properties
I GED is a distance
I But not an euclidean one
Impossible to derive a trivial kernel
Use with caution in SVMs
Complexity
I Still hard to compute on larger graphs
I Accuracy hard to evaluate (lack of gt)
39 42
Outlooks
Algorithms
I Matrix optimization
I Edge based mapping
I Optimization algorithms
40 42
Outlooks
Applications
I Explore graph space trough GEDI Median graph joint work with Paul
I Use of ged for classification
I How to set costs according to a task I rarr Metric learningI New Phd with Sebastien and Pierre
I Behavior of different methods
41 42
Conclusion
I GED is still an open problem
I Approximation algorithms exists
I but it stills room for improvements
I Focus on applications
42 42
Abu-Aisheh Z Raveaux R Ramel J-Y and Martineau P(2015)An Exact Graph Edit Distance Algorithm for Solving PatternRecognition ProblemsIn 4th International Conference on Pattern Recognition Applicationsand Methods
Bougleux S Brun L Carletti V Foggia P Gauzere B andVento M (2015)A quadratic assignment formulation of the graph edit distanceTechnical report Normandie Univ NormaSTIC FR 3638 France
Carletti V Gauzere B Brun L and Vento M (2015)Approximate graph edit distance computation combining bipartitematching and exact neighborhood substructure distanceIn GbRPR volume 9069 pages 168ndash177 Springer
Gauzere B Bougleux S Riesen K and Brun L (2014)Approximate Graph Edit Distance Guided by Bipartite Matching ofBags of WalksIn Structural Syntactic and Statistical Pattern Recognition volume8621 pages 73ndash82 Springer
42 42
Leordeanu M Hebert M and Sukthankar R (2009)An integer projected fixed point method for graph matching andmap inferenceIn Advances in neural information processing systems pages1114ndash1122
Liu Z-Y and Qiao H (2014)GNCCPndashGraduated NonConvexity and Concavity ProcedurePattern Anal Mach Intell 36(6) 1258ndash1267
Riesen K and Bunke H (2009)Approximate graph edit distance computation by means of bipartitegraph matchingImage and Vision Comp 27 950ndash959
42 42
From ζ = 1 to ζ = 0
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
minus1 minus05 0 05 1minus1
minus05
0
05
1
31 42
From ζ = 0 to ζ = minus1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
minus1 minus05 0 05 1
minus1
minus05
0
05
1
32 42
GNCCP vs IPFP
Pros
No more need of initialization
Converge towards a mappingmatrix
Cons
Complexity Iterate over IPFP
33 42
GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]
C
C
C
CC C
CS
S
Alkane Acyclic
NN
C
O
C
C
C
MAO PAH
34 42
Protocol
Arbitrary costs
I All substitutions 1
I All insertionsdeletions 3
Relative error
I Accuracy measure
I Overestimation The lowest approximation is the best one (dopt)
I Relative error d(Gi Gj)minus dopt(Gi Gj)
dopt(Gi Gj)
35 42
Relative errors
Alkane Acyclic MAO PAH0
50
100
150
200
250
Datasets
o
f re
lative e
rror
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
36 42
log Time vs Score Deviation
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
Alkane
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
Acyclic
A
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
IPFP random
IPFP rw
GNCCP
minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20
01
02
03
04
05
06
07
08
09
1
log10 of time in seconds
Score
devia
tion
MAO
LSAP Riesen
LSAP rw
LSAP Kminusgraphs
From ζ = 0 to ζ = −1
[Figure: sequence of panels, each with axes from −1 to 1, illustrating the evolution of the solution as ζ decreases from 0 to −1]
32 / 42
GNCCP vs IPFP
Pros
▶ No need for an initialization
▶ Converges towards a mapping matrix
Cons
▶ Complexity: iterates over IPFP
33 / 42
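The schedule behind these pros and cons can be sketched in a few lines. This is a hedged illustration only, not the implementation of Liu and Qiao (2014): a plain projected-gradient step with a box projection stands in for IPFP and for the proper doubly-stochastic projection, and `grad_f` and the cost `C` are made-up placeholders.

```python
import numpy as np

def gnccp(grad_f, n, steps=21, inner=100, lr=0.01):
    """Sketch of the GNCCP schedule (illustration, not the reference method).

    The relaxed objective F_zeta(X) = (1 - |zeta|) * f(X) + zeta * ||X||_F^2
    is convex at zeta = 1 and concave at zeta = -1; driving zeta from 1
    down to -1 steers the continuous solution towards a discrete mapping
    without any hand-picked initialization.
    """
    X = np.full((n, n), 1.0 / n)  # uninformative barycenter start
    for zeta in np.linspace(1.0, -1.0, steps):
        for _ in range(inner):
            # Gradient of F_zeta; grad_f is the gradient of the matching
            # objective f (a placeholder here, IPFP's role in the talk).
            G = (1.0 - abs(zeta)) * grad_f(X) + 2.0 * zeta * X
            # Crude box projection standing in for the
            # doubly-stochastic (Birkhoff polytope) projection.
            X = np.clip(X - lr * G, 0.0, 1.0)
    return X

# Toy linear cost f(X) = sum(C * X), so grad_f(X) = C (hypothetical values):
rng = np.random.default_rng(0)
C = rng.random((4, 4))
X = gnccp(lambda _: C, 4)
```

The complexity caveat is visible in the structure: the inner solve is rerun for every value of ζ, so GNCCP pays roughly `steps` times the cost of one local optimization.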
GREYC chemistry datasets [https://iapr-tc15.greyc.fr/links.html]
[Figure: example molecular graphs from the four datasets: Alkane, Acyclic, MAO, PAH]
34 / 42
Protocol
Arbitrary costs
▶ All substitutions: 1
▶ All insertions/deletions: 3
Relative error
▶ Accuracy measure
▶ Overestimation: the lowest approximation is the best one (d_opt)
▶ Relative error: (d(G_i, G_j) − d_opt(G_i, G_j)) / d_opt(G_i, G_j)
35 / 42
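The protocol's accuracy measure follows directly from the formula above; since every method overestimates the exact GED, the smallest approximation found plays the role of d_opt. The distances below are toy placeholders, not measurements from the experiments:

```python
# Relative error of an approximate GED, as defined in the protocol:
#   err = (d(Gi, Gj) - d_opt(Gi, Gj)) / d_opt(Gi, Gj)
# where d_opt is the lowest approximation found for the pair.
def relative_error(d_approx, d_opt):
    if d_opt == 0:
        return 0.0 if d_approx == 0 else float("inf")
    return (d_approx - d_opt) / d_opt

# Toy values for one graph pair (placeholders, not results from the talk):
approximations = {"LSAP": 14.0, "IPFP": 11.0, "GNCCP": 10.0}
d_opt = min(approximations.values())  # best (lowest) approximation: 10.0
errors = {m: relative_error(d, d_opt) for m, d in approximations.items()}
# LSAP overestimates d_opt by 40%, IPFP by 10%, GNCCP by 0%.
```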
Relative errors
[Figure: bar chart of % of relative error (y-axis 0–250) per dataset (Alkane, Acyclic, MAO, PAH) for LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw, GNCCP]
36 / 42
log Time vs Score Deviation
[Figure: score deviation (0–1) vs. log10 of time in seconds, one panel per dataset (Alkane, Acyclic, MAO, PAH), for LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw, GNCCP]
37 / 42
Analysis
Tradeoff
▶ Accuracy depends on complexity
▶ Choose your method according to your priorities
More complete analysis
▶ ICPR GED contest: https://gdc2016.greyc.fr
▶ Other methods + other datasets
38 / 42
GED Limitations
Mathematical properties
▶ GED is a distance
▶ But not a Euclidean one
  → Impossible to derive a trivial kernel
  → Use with caution in SVMs
Complexity
▶ Still hard to compute on larger graphs
▶ Accuracy hard to evaluate (lack of ground truth)
39 / 42
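The SVM caveat can be made concrete: a metric that is not Euclidean produces centered Gram matrices with negative eigenvalues, so kernels derived from it need not be positive semidefinite. A minimal check via the classical-MDS criterion; the star-graph distance matrix is a toy stand-in, not GED values:

```python
import numpy as np

# Shortest-path distances on a star graph (center 0, leaves 1-3):
# a genuine metric, but not a Euclidean one -- three points pairwise
# at distance 2 cannot all lie at distance 1 from a common center.
D = np.array([[0., 1., 1., 1.],
              [1., 0., 2., 2.],
              [1., 2., 0., 2.],
              [1., 2., 2., 0.]])

# Classical-MDS test: D embeds in Euclidean space iff
# B = -1/2 * J @ D**2 @ J is positive semidefinite,
# with J = I - (1/n) * ones the centering matrix.
n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
eigvals = np.linalg.eigvalsh(B)
is_euclidean = bool(eigvals.min() >= -1e-9)  # False here
# A negative eigenvalue means kernels built naively from D
# (e.g. -D, or exp(-gamma * D**2)) can be indefinite -- hence
# the caution about plugging a GED-based kernel into an SVM.
```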
Outlooks
Algorithms
▶ Matrix optimization
▶ Edge-based mapping
▶ Optimization algorithms
40 / 42
Outlooks
Applications
▶ Explore graph space through GED
  ▶ Median graph: joint work with Paul
▶ Use of GED for classification
  ▶ How to set costs according to a task → metric learning
  ▶ New PhD with Sebastien and Pierre
  ▶ Behavior of different methods
41 / 42
Conclusion
▶ GED is still an open problem
▶ Approximation algorithms exist
▶ but there is still room for improvement
▶ Focus on applications
42 / 42
Abu-Aisheh, Z., Raveaux, R., Ramel, J.-Y., and Martineau, P. (2015). An Exact Graph Edit Distance Algorithm for Solving Pattern Recognition Problems. In 4th International Conference on Pattern Recognition Applications and Methods.
Bougleux, S., Brun, L., Carletti, V., Foggia, P., Gaüzère, B., and Vento, M. (2015). A quadratic assignment formulation of the graph edit distance. Technical report, Normandie Univ., NormaSTIC FR 3638, France.
Carletti, V., Gaüzère, B., Brun, L., and Vento, M. (2015). Approximate graph edit distance computation combining bipartite matching and exact neighborhood substructure distance. In GbRPR, volume 9069, pages 168–177. Springer.
Gaüzère, B., Bougleux, S., Riesen, K., and Brun, L. (2014). Approximate Graph Edit Distance Guided by Bipartite Matching of Bags of Walks. In Structural, Syntactic, and Statistical Pattern Recognition, volume 8621, pages 73–82. Springer.
Leordeanu, M., Hebert, M., and Sukthankar, R. (2009). An integer projected fixed point method for graph matching and MAP inference. In Advances in Neural Information Processing Systems, pages 1114–1122.
Liu, Z.-Y. and Qiao, H. (2014). GNCCP – Graduated NonConvexity and Concavity Procedure. IEEE Trans. Pattern Anal. Mach. Intell., 36(6):1258–1267.
Riesen, K. and Bunke, H. (2009). Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision Computing, 27:950–959.