
GRAPH MATCHING AND LEARNING IN PATTERN RECOGNITION IN THE LAST 10 YEARS


International Journal of Pattern Recognition and Artificial Intelligence, Vol. 28, No. 1 (2014) 1450001 (40 pages). © World Scientific Publishing Company. DOI: 10.1142/S0218001414500013

PASQUALE FOGGIA*, GENNARO PERCANNELLA and MARIO VENTO

Department of Information Engineering, Electrical Engineering and Applied Mathematics, University of Salerno, Via Giovanni Paolo II, 132, 84084 Fisciano (SA), Italy

*Corresponding author: [email protected]

    Received 25 June 2013

    Accepted 14 October 2013

    Published 9 December 2013

    In this paper, we examine the main advances registered in the last ten years in Pattern

    Recognition methodologies based on graph matching and related techniques, analyzing more

    than 180 papers; the aim is to provide a systematic framework presenting the recent history and

the current developments. This is done by introducing a categorization of graph-based techniques and reporting, for each class, the main contributions and the most outstanding research

    results.

Keywords: Structural pattern recognition; graph matching; graph kernels; graph embeddings;

    graph learning; graph clustering; graph and tree search strategies.

    1. Introduction

    Structural Pattern Recognition bases its theoretical foundations on the decomposi-

    tion of objects in terms of their constituent parts (subpatterns) and of the relations

    among them. Graphs, usually enriched with node and edge attributes, are the elec-

    tive data structures for supporting this kind of representations. Some of the methods

    working on graphs introduce some restrictions on the structure of the graphs (e.g.

    only allowing planar graphs) or on the kind of attributes (e.g. some methods only

    allow single real-valued attributes for the graph edges).

    The use of a graph-based pattern representation induces the need to formulate

    the main activities required for Pattern Recognition in terms of operations on

graphs: classification, usually intended as the comparison between an object and a


set of prototypes, and learning, which is the process for obtaining a model of a class

    starting from a set of known samples, are among the key issues that must be

    addressed using graph-based techniques.

    The use of graphs in Pattern Recognition dates back to the early seventies, and

the paper "Thirty years of graph matching in Pattern Recognition"26 reports a survey of the literature on graph-based techniques from the first years up to the early 2000's. We have certainly witnessed a maturation of the classical techniques for graph comparison, either exact or inexact; at the same time, we are witnessing a rapid growth of many alternative approaches, such as graph embedding and graph kernels, aimed at making possible the application to graphs of vector-based techniques for classification and learning (such as the ones derived from the statistical classification and learning theory).

    In this paper, we discuss the main advances registered in graph-based method-

    ologies in the last 10 years, analyzing more than 180 papers on this topic; the aim is

    to provide a systematic framework presenting the recent history of graphs in Pattern

    Recognition and the current trends.

    Our analysis starts from the above mentioned survey26 and completes its contents

    by considering a selection of the most recent main contributions; consequently, the

    present paper, for the sake of conciseness, reports only references to works published

during the last 10 years. The reader is kindly invited to consult Ref. 26 to recover the earlier related works. However, the taxonomy of the papers presented in

    Ref. 26 has been extended with other graph-based problems that are related to

    matching, either because they involve some form of graph comparison, or because

    they use a graph-based approach to group patterns into classes. Figure 1 shows a

    graphical representation of the taxonomy adopted in this paper.

In fact, in the last decade we have witnessed the birth and growth of methods facing learning and classification in a rather innovative scientific vision: the

    computational burden of matching algorithms together with their intrinsic com-

    plexity, in opposition to the well-established world of statistical Pattern Recognition

methodologies, suggested new paradigms for the graph-based methods. Why not try to reduce graph matching and learning to vector-based operations, so as to make the use of statistical approaches possible?

Two opposite ways of facing the problem, each with its pros and cons: "graphs

    from the beginning to the end", with a few heavy algorithms, but the exploitation of

all the information contained in the graphs; on the other side, the risk of losing

    discriminating power during the conversion of graphs into vectors (by selecting

    suitable properties), counterbalanced by the immediate access to all the theoretically

    assessed achievements of the statistical framework. In a sense, there are some tra-

    ditional tools that can be considered to be halfway between these two approaches: an

    example is Graph Edit Distance (GED), that is based on a matching between the

    nodes and the edges of the two graphs, but produces a distance information that can

    be used to cast the graphs into a metric space. However, GED can still be considered


an approach of the first kind, since in the computation of the metric, the information attached to the subparts can be considered in a context-dependent way, and does not have to be reduced a priori to a vectorial form.

    These two opposite factions are now simultaneously active, each hoping to

    overcome the other; 10 years ago these innovative methods were in the background,

but now they are gaining more and more attention in the scientific literature on

    graphs. This is the reason why the categorization reported in this paper has

    been further expanded by including a new section describing a variety of novel

    approaches, such as graph embedding, graph kernels, graph clustering and graph

    learning, dedicating a subsection to each of them. It is worth pointing out that these

methods were of course already known at the time of Ref. 26, but their diffusion and scientific interest have shown a significant growth in the last decade. For instance, a recent survey by Hancock and Wilson71 compares and contrasts the work on graph-based techniques by the Bern group led by Horst Bunke and the York group led by Edwin Hancock. The first group has historically put more emphasis on the purely

    structural aspects of graph-based techniques, while the second has focused on the

    extensions to the graph domain of probabilistic and information theoretic method-

    ologies; however, both the schools in the last decade have found a point of conver-

    gence in the adoption of graph kernels and graph embedding techniques. Another

recent paper by Livi and Rizzi98 presents a survey of graph matching techniques. However, despite its title, it is mostly dedicated to graph embeddings and graph kernels, and does not aim to cover comprehensively the graph matching techniques; furthermore, the paper is less specifically devoted to approaches used within the

    Pattern Recognition community.

    The overall organization of our paper is based on a categorization of the

    approaches with respect to the problem formulation they adopt, and secondarily to

    the kind of technique used to face the problem, following the taxonomy reported in

    Fig. 1. We have distinguished between graph matching problems, that will be pre-

    sented in Sec. 2, and other problems related to graph comparison, that are discussed

    in Sec. 3. In particular, the section on graph matching is divided between exact and

    inexact matching techniques. The section on other problems is articulated according

    to the techniques that have obtained most attention in recent literature, namely

    graph embedding, graph kernels, graph clustering and graph learning with a mis-

    cellaneous problems subsection for less common but related problems.

    For reasons of space, in this survey we have focused on the algorithms and not on

their applications. The interested reader may find some complementary surveys on the applications of graph matching to Computer Vision and Pattern Recognition, in Refs. 28 and 53. For the very same reasons, we have not included research papers from outside of the Pattern Recognition community. Graph-based methods are used and investigated in many other research fields; among them, we can mention, with no

    pretense at completeness, Data Mining, Machine Learning, Complex Networks

    Analysis and Bioinformatics.


2. Graph Matching

We recall briefly the terminology used in our previous survey.26 Exact graph

    matching is the search for a mapping between the nodes of two graphs which is edge-

preserving, in the sense that if two nodes in the first graph are linked by an edge, the

    corresponding nodes in the second graph must have an edge, too. Several variants of

    exact matching exist (e.g. isomorphism, subgraph isomorphism, monomorphism,

    homomorphism, maximum common subgraph) depending on whether this constraint

    must hold in both directions of the mapping or not, if the mapping must be injective

    and if the mapping must be surjective.

More formally, given two graphs $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ (where $V$ and $E$ are the sets of nodes and edges, respectively), a mapping is a function $\phi: V_1 \to V_2$. A mapping $\phi$ is edge preserving iff:

$$\forall v, w \in V_1,\ (v, w) \in E_1 \Rightarrow (\phi(v), \phi(w)) \in E_2 \lor \phi(v) = \phi(w). \qquad (1)$$

An edge preserving mapping is also called a homomorphism. A monomorphism, also called an edge-induced subgraph isomorphism, is a homomorphism that is also injective:

$$\forall v \neq w \in V_1,\ \phi(v) \neq \phi(w). \qquad (2)$$

A graph isomorphism is a monomorphism that is bijective, and whose inverse mapping $\phi^{-1}$ is also a monomorphism:

$$\begin{cases} \forall v_2 \in V_2,\ \exists\, v_1 = \phi^{-1}(v_2) \in V_1 : v_2 = \phi(v_1) \\ \phi^{-1} \text{ is a monomorphism.} \end{cases} \qquad (3)$$

Fig. 1. A graphical representation of the adopted categorization of the considered graph-based techniques. The techniques in the figure have been chosen because they either involve some kind of graph comparison, or use a graph-based approach to group objects into classes.


A mapping $\phi$ is a subgraph isomorphism, that some authors call a node-induced subgraph isomorphism, if there is a (node-induced) subgraph $G'_2$ of $G_2$ such that $\phi$ is an isomorphism between $G_1$ and $G'_2$. More formally:

$$\begin{cases} V'_2 \subseteq V_2,\ V'_2 = \{v_2 \in V_2 : \exists\, v_1 \in V_1 : v_2 = \phi(v_1)\} \\ E'_2 \subseteq E_2,\ E'_2 = E_2 \cap (V'_2 \times V'_2) \\ \phi \text{ is an isomorphism between } G_1 \text{ and } G'_2 = (V'_2, E'_2). \end{cases} \qquad (4)$$

    Finally, the maximum common subgraph problem is the search of the largest sub-

graph of G1 that is isomorphic to a subgraph of G2 (and usually, of the corresponding

    mapping between the two subgraphs).
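To make these definitions concrete, the following minimal sketch (ours, not taken from any of the surveyed papers) checks whether a given mapping between two small graphs, stored as adjacency dictionaries, satisfies Eqs. (1), (2) and (4); all function and variable names are illustrative.

```python
# A minimal sketch (not from the surveyed papers) of the exact-matching
# definitions of Eqs. (1)-(4). Graphs are dicts: node -> set of adjacent nodes.

def is_edge_preserving(phi, g1, g2):
    # Eq. (1): every edge of G1 maps onto an edge of G2 (or both ends collapse).
    return all(phi[w] in g2[phi[v]] or phi[v] == phi[w]
               for v in g1 for w in g1[v])

def is_monomorphism(phi, g1, g2):
    # Eq. (2): edge preserving and injective.
    return is_edge_preserving(phi, g1, g2) and len(set(phi.values())) == len(phi)

def is_subgraph_isomorphism(phi, g1, g2):
    # Eq. (4): phi is an isomorphism between G1 and the subgraph of G2
    # induced by the image of phi.
    if not is_monomorphism(phi, g1, g2):
        return False
    image = set(phi.values())
    inv = {w: v for v, w in phi.items()}
    # every edge of the induced subgraph must come from an edge of G1
    return all(inv[b] in g1[inv[a]]
               for a in image for b in g2[a] if b in image)

if __name__ == "__main__":
    g1 = {0: {1}, 1: {0, 2}, 2: {1}}                          # path 0-1-2
    g2 = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
    phi = {0: "a", 1: "b", 2: "c"}
    print(is_edge_preserving(phi, g1, g2),       # True
          is_monomorphism(phi, g1, g2),          # True
          is_subgraph_isomorphism(phi, g1, g2))  # True
```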

    In inexact graph matching, instead, the constraints on edge preservation are

    relaxed, either because the algorithms attempt to deal with errors in the input graphs

    (and so we have error-correcting matching) or because, for reducing the computa-

    tional cost, they search the mapping with a strategy that does not ensure the opti-

    mality of the found solutions (approximate or suboptimal matching).

    For inexact matching, there is not a single formal statement of the problem;

instead, different papers often use slightly different formalizations, that may lead to different ways of relaxing the edge preservation constraints. With no pretense at

    completeness, in the following we will describe two formalizations that have been

    used by several works.

In the first definition, the concept of a mapping function is extended so as to include the possibility of mapping a node $v$ to a special, null node denoted as $\varepsilon$; thus the mapping is a function $\phi: V_1 \to V_2 \cup \{\varepsilon\}$. We will assume that $\phi$ is injective for the nodes of $V_1$ not mapped to $\varepsilon$,

$$\forall v \neq w \in V_1,\ \phi(v) \neq \varepsilon \Rightarrow \phi(v) \neq \phi(w) \qquad (5)$$

while allowing that several nodes may be mapped to $\varepsilon$. With a slightly improper notation, we will say that $\phi^{-1}(w) = \varepsilon$ to indicate that there is no node $v \in V_1$ such that $\phi(v) = w$.

Then, the cost of a mapping $\phi$ is defined as:

$$
C(\phi) = \sum_{\substack{v \in V_1 \\ \phi(v) \neq \varepsilon}} C_R(v, \phi(v))
        + \sum_{\substack{v \in V_1 \\ \phi(v) = \varepsilon}} C_D(v)
        + \sum_{\substack{w \in V_2 \\ \phi^{-1}(w) = \varepsilon}} C_D(w)
        + \sum_{\substack{(v,w) \in E_1 \\ (\phi(v),\phi(w)) \in E_2}} C'_R\big((v,w),(\phi(v),\phi(w))\big)
        + \sum_{\substack{(v,w) \in E_1 \\ (\phi(v),\phi(w)) \notin E_2}} C'_D\big((v,w)\big)
        + \sum_{\substack{(v,w) \in E_2 \\ (\phi^{-1}(v),\phi^{-1}(w)) \notin E_1}} C'_D\big((v,w)\big), \qquad (6)
$$

where $C_R(\cdot,\cdot)$ is the cost for the replacement of a node, $C_D$ is the cost for the deletion of a node, and $C'_R$ and $C'_D$ are the replacement and deletion costs for edges.


These cost functions are to be defined according to the application requirements, and

    usually take into account additional, application-dependent attributes that are at-

    tached to nodes and edges.

    In this formulation, the matching problem is cast as the search of the matching

that minimizes the cost $C(\phi)$. With an appropriate choice of the cost functions, it can be demonstrated that the exact matching problems defined previously can be

    seen as special cases of this one, with the additional requirement that the matching

    cost must be 0.
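As an illustration only (ours, not an algorithm from the surveyed literature), the sketch below evaluates Eq. (6) for a given mapping, representing the null node ε as None and using unit default costs; all names and the default cost functions are hypothetical placeholders for application-defined costs.

```python
# Sketch (our illustration) of the mapping cost C(phi) of Eq. (6).
EPS = None  # the null node "epsilon"

def mapping_cost(phi, g1_edges, g2_edges, n2_nodes,
                 c_r=lambda v, w: 0.0, c_d=lambda v: 1.0,
                 c_er=lambda e1, e2: 0.0, c_ed=lambda e: 1.0):
    mapped = {v: w for v, w in phi.items() if w is not EPS}
    inv = {w: v for v, w in mapped.items()}
    cost = sum(c_r(v, w) for v, w in mapped.items())          # node substitutions
    cost += sum(c_d(v) for v, w in phi.items() if w is EPS)   # deleted nodes of G1
    cost += sum(c_d(w) for w in n2_nodes if w not in inv)     # unmatched nodes of G2
    for (v, w) in g1_edges:
        if v in mapped and w in mapped and (mapped[v], mapped[w]) in g2_edges:
            cost += c_er((v, w), (mapped[v], mapped[w]))      # edge substitutions
        else:
            cost += c_ed((v, w))                              # deleted edges of G1
    for (v, w) in g2_edges:
        if not (v in inv and w in inv and (inv[v], inv[w]) in g1_edges):
            cost += c_ed((v, w))                              # unmatched edges of G2
    return cost

# toy usage: one node of G1 mapped to epsilon, one node of G2 left unmatched
phi = {0: "a", 1: "b", 2: EPS}
print(mapping_cost(phi, g1_edges={(0, 1), (1, 2)},
                   g2_edges={("a", "b"), ("b", "c")},
                   n2_nodes={"a", "b", "c"}))   # 4.0 with the unit default costs
```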

    In the second formulation, called weighted graph matching, the graphs are

    represented through their adjacency matrices; usually the elements of the matrices

    are not restricted to 0 and 1, but can express a continuous weight for the relation

    between two nodes: so the generic element Aij of the matrix A is 0 if there is not an

edge between nodes i and j, and has otherwise a real value in (0, 1] denoting the weight for the edge (i, j).

    Given two graphs represented by their adjacency matrices A and B, a com-

    patibility tensor Cijkl is introduced to measure the compatibility between two

    edges:

$$C_{ijkl} = \begin{cases} 0 & \text{if } A_{ij} = 0 \text{ or } B_{kl} = 0, \\ c(A_{ij}, B_{kl}) & \text{otherwise,} \end{cases} \qquad (7)$$

where $c(\cdot,\cdot)$ is a suitably defined compatibility function. The matching is represented by a matching matrix $M$, whose elements $M_{ik}$ are 1 if node $i$ of the first

    graph is matched with node k of the second graph, 0 otherwise. Thus the matching

    problem is formulated as the search of the matrix M that maximizes the following

    function:

$$W(M) = \sum_i \sum_j \sum_k \sum_l M_{ik}\, M_{jl}\, C_{ijkl}, \qquad (8)$$

subject to the constraints:

$$M_{ik} \in \{0, 1\};\quad \forall i,\ \sum_k M_{ik} \le 1;\quad \forall k,\ \sum_i M_{ik} \le 1. \qquad (9)$$

    Also with this formulation, it can be demonstrated that with a suitable choice of

the compatibility function $c(\cdot,\cdot)$, the various forms of exact matching can be seen as a special case.
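A small numerical sketch (ours; the toy compatibility function and the variable names are assumptions, not from any cited work) of Eqs. (7) and (8), evaluating W(M) for a given matching matrix with numpy:

```python
# Sketch (not from the surveyed papers): evaluating the weighted graph matching
# objective W(M) of Eq. (8) for given weighted adjacency matrices A, B.
import numpy as np

def compatibility(A, B, c=lambda a, b: 1.0 - abs(a - b)):
    # Eq. (7): C[i, j, k, l] = c(A[i, j], B[k, l]) where both edges exist, else 0.
    C = c(A[:, :, None, None], B[None, None, :, :])
    return np.where((A[:, :, None, None] == 0) | (B[None, None, :, :] == 0), 0.0, C)

def objective(M, C):
    # Eq. (8): W(M) = sum_{i,j,k,l} M[i,k] * M[j,l] * C[i,j,k,l]
    return np.einsum('ik,jl,ijkl->', M, M, C)

A = np.array([[0, .9], [.9, 0]])           # two weighted graphs with 2 nodes each
B = np.array([[0, .8], [.8, 0]])
M = np.eye(2)                              # match node 0->0 and 1->1
print(objective(M, compatibility(A, B)))   # approx. 1.8 with the toy c(.,.)
```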

    While in the years covered by Ref. 26 the research has explored both exact and

    inexact matching, the recent work on graphs in the Pattern Recognition community

    has been mostly focused on inexact graph matching. This may be due to the fact that

    today the Pattern Recognition research is applying graphs to more complex pro-

blems than those that were feasible some years ago, and so the use of larger and noisier graphs is more frequent.


2.1. Exact matching

While overall there has been little work to improve existing exact matching algorithms, some effort has been put into providing a better characterization of the

    existing methods. As an example, the 2003 paper by De Santo et al.138 presents an

    extensive comparative evaluation of four exact algorithms for graph isomorphism

    and graph-subgraph isomorphism.

    Most existing exact matching algorithms are based on some form of tree search,

    where the matching is constructed starting with an empty mapping function and

    adding a pair of nodes at a time, usually with the possibility of backtracking, and the

    use of heuristics to avoid the complete exploration of the space of all the possible

matchings. In 2007, Konc and Janežič87 propose MaxCliqueDyn, an improved algorithm for finding the Maximum Clique (and hence the Maximum Common Sub-

    graph) which uses branch and bound, combined with approximate graph coloring for

finding tight bounds in order to prune the search space. In a 2011 paper, Ullmann160

    presents a substantial improvement of his own very well-known subgraph isomor-

    phism algorithm from 1976. The new algorithm incorporates several ideas from the

    literature on the Binary Constraint Satisfaction Problem, of which the subgraph

    isomorphism can be considered a special case. Also Zampelli et al.179 propose a

    method based on Constraint Satisfaction, which is an extension of the technique

    introduced by Larrosa and Valiente in 2002.90 A further development of the tech-

nique, with the introduction of a better filtering based on the AllDifferent constraint,

    is proposed by Solnon152 in 2010.

    Among the approaches not based on tree search, we can mention Gori et al.,64

    who, in their 2005 paper, propose an isomorphism algorithm that is based on Ran-

    dom Walks, that works only on a class of graphs denoted by the authors as Mar-

    kovian Spectrally Distinguishable graphs; the authors verify experimentally on a large

    database of graphs that, as long as the graphs have some kind of irregularity or

    randomness, the probability of not satisfying this assumption is very low. The 2011

    paper by Weber et al.169 extends the matching algorithm based on the construction

of a decision tree by Messmer and Bunke,109 significantly reducing the spatial complexity for graphs whose nodes have a small number of different labels. In their

    2004 paper,39 Dickinson et al. discuss the matching problem (graph isomorphism,

    graph-subgraph isomorphism and maximum common subgraph) for the special case

of graphs having unique node labels. Finally, the 2012 paper by Dahm et al.34 presents

    a technique for speeding up existing exact subgraph isomorphism algorithms on large

    graphs.
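As a minimal sketch of the tree-search scheme described at the beginning of this subsection (ours, with no pruning heuristics, hence exponential in the worst case), the following function grows a node-to-node mapping one pair at a time and backtracks when edge preservation fails:

```python
# Minimal illustration (not one of the surveyed algorithms) of tree-search
# exact matching: find a monomorphism of g1 into g2 by extending the mapping
# one node pair at a time and backtracking on failure.
def find_monomorphism(g1, g2, order=None, phi=None):
    phi = {} if phi is None else phi
    order = list(g1) if order is None else order
    if len(phi) == len(order):                 # all nodes of g1 mapped
        return dict(phi)
    v = order[len(phi)]                        # next node of g1 to map
    for w in g2:                               # candidate images in g2
        if w in phi.values():
            continue                           # keep the mapping injective
        # check consistency with the already mapped neighbours of v
        if all((u not in phi) or (phi[u] in g2[w]) for u in g1[v]):
            phi[v] = w
            result = find_monomorphism(g1, g2, order, phi)
            if result is not None:
                return result
            del phi[v]                         # backtrack
    return None

g1 = {0: {1}, 1: {0, 2}, 2: {1}}                              # path of length 2
g2 = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}      # triangle
print(find_monomorphism(g1, g2))   # e.g. {0: 'a', 1: 'b', 2: 'c'}
```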

    2.2. Inexact matching

    Inexact matching methods have received comparatively more attention in the re-

    search community, both by extending existing techniques and by introducing novel

    ideas unrelated to previous work. In particular, the extensions of previous methods

have mostly concerned algorithms based on the reduction of graph matching to a


continuous optimization problem, algorithms based on spectral properties of the

    graphs (i.e. properties related to the eigenvalues and eigenvectors of the adjacency

    matrix or of other matrices characterizing the graph structure), and methods ap-

    proximating the solution of the graph matching problem by means of the bipartite

    graph matching, which is a simpler problem solvable in polynomial time.

    Many inexact matching algorithms are formulated as an approximate way to

    compute the GED. A recent paper by Gao et al.57 in 2010 presents a survey on this

    topic. GED computes the distance between two graphs on the basis of the minimal

    set of edit operations (e.g. node additions and deletions, etc.) needed to transform

    one graph into the other one. A 2012 paper by Sole-Ribalta et al.151 provides a

    theoretical discussion on the relation between the properties of the distance function

    and the costs assigned to each edit operation.

    Although in principle the GED problem is not related to matching, in practice

most methods compute the distance by finding a matching for the nodes that are

    preserved by the edit operations (i.e. those that are not added or removed, but

    possibly have their label changed); given this matching, the edit distance can be

    obtained as the sum of a term accounting for the matched nodes and their edges, and

    a term accounting for the remaining nodes/edges (see Eq. (6)). So usually the out-

    come of the algorithm is not only an indication of the distance between the graphs,

    but also the matching that is supposed to minimize the value of this distance. This is

    why we have chosen to include some GED methods in this section.

    2.2.1. Techniques based on tree search

    Methods based on tree search have been also used for inexact matching. In this case,

    the adopted heuristics may not ensure that the optimal solution is found, yielding a

    suboptimal matching. As an example, Sanfeliu et al.135,136 and Serratosa et al.,142

    extend their previous work on inexact matching of Function-Described Graphs

    (FDG), that are Attributed Relational Graphs enriched with constraints on the joint

    probabilities of nodes and edges, used to represent a set of graphs, while in Ref. 141,

    Serratosa et al. detail how these FDG can be automatically constructed. Cook et al.29

    in 2003 propose the use of beam search, a heuristic search method derived from the

    A* algorithm, for computing the GED. The paper by Hidovi and Pelillo73 in 2004extends the denition of a graph metric based on Maximum Common Subgraph,

    introduced by Bunke in 1999, so that it can also be applied to graphs with node

    attributes.

    2.2.2. Continuous optimization

    While graph matching is inherently a discrete optimization problem, several inexact

    algorithms have been proposed to reformulate it as a continuous problem (by

    relaxing some constraints), solve the continuous problem using one of the many

    available optimization algorithms and then recast the found solution in the discrete

    domain. Usually the algorithm used for the continuous problem only ensures that a


local optimum is found; moreover, since a discretization step is required afterwards, the matching found is not even guaranteed to exhibit local optimality.

    An example of evolution of an existing matching method of this category is given

    by the 2003 paper by Massaro and Pelillo,107 which improves a previous work on the

search of the Maximum Common Subgraph that uses a theorem by Bomze to re-

    formulate this problem as a quadratic optimization in a continuous domain.

    Zaslavskiy et al.182 in their 2009 paper present a graph matching algorithm in which

    the matching is formulated as a convex-concave programming problem which is

    solved by interpolating between two approximate simpler formulations. Also the

2011 paper by Rota Bulò et al.134 is based on the same formulation of graph matching; in this case the authors solve the quadratic optimization problem using

    infection-immunization dynamics, a new iterative algorithm based on evolutionary

    game theory. The 2002 paper by van Wyk et al.164 addresses the problem of At-

tributed Graph matching as a parameter identification problem, and proposes the use

    of a Reproducing Kernel Hilbert Space interpolator (RKHS) to solve this problem.

    The 2003 paper by van Wyk and van Wyk161 extends the previous method by

    providing a more general formulation of the problem. The same authors in a 2004

    paper163 further generalize the method, presenting a kernel-based framework for

graph matching which includes as special cases the previous two algorithms. In 2004,

    van Wyk and van Wyk162 present a graph matching algorithm based on the Pro-

    jections Onto Convex Sets approach. The 2006 paper by Justice and Hero81 proposes

a reformulation of the GED as a Binary Linear Programming problem, for which

    they provide upper and lower bounds in polynomial time. Kostin et al.89 in 2005

    present an extension of the probabilistic relaxation algorithm by Christmas et al.25

    Chevalier et al.22 propose in a 2007 paper a technique that integrates probabilistic

    relaxation with bipartite graph matching, applied to Region Adjacency Graphs. In

    their 2008 paper,157 Torresani et al. introduce an algorithm based on a technique

    called dual decomposition: the matching problem (in a continuous reformulation) is

    decomposed into a set of simpler problems, depending on a parameter vector; the

    simpler problems can be solved providing a lower bound to the minimization of the

    functional to be optimized. Then the algorithm searches for the tightest bound by

    varying the parameter vector. Caetano et al.19 propose in 2009 a technique in which

    the functional to be optimized has a parametric form, and the authors propose a

    training phase to learn these parameters. In a 2011 paper,21 Chang and Kimia

present an extension of the Graduated Assignment Graph Matching by Gold and Rangarajan, modified so as to work on hypergraphs instead of graphs. Zhou and De la Torre184 present a method called factorized graph matching in which the affinity matrix used to define the functional to be optimized is factored into a Kronecker

    product of smaller matrices, separately encoding the structure of the graphs and the

affinities between nodes and between edges. The authors propose an optimization

    method based on this factorization that leads to an improvement in space and time

    requirements.


Sanromà et al.137 in 2012 propose a special purpose, probabilistic graph matching method for graphs representing sets of 2D points, based on the Expectation

    Maximization (EM) algorithm.

    Sole-Ribalta and Serratosa149 in their 2011 paper propose two sub-optimal

    algorithms for the common labeling problem, a generalization of inexact graph

    matching in which the number of graphs is larger than two (the problem cannot be

reduced to several pairwise matchings). The first proposed algorithm uses an ex-

    tension of Graduated Assignment, while the second is based on a probabilistic for-

    mulation and adopts an iterative approach somewhat similar to Probabilistic

    Relaxation. A 2011 paper by Rodenas et al.131 presents a parallelized version of the

first algorithm. A 2013 paper by Sole-Ribalta and Serratosa150 presents a further development of the first algorithm, based on the matching of the nodes of all graphs

    to a virtual node set.

    2.2.3. Spectral methods

    Spectral matching methods are based on the observation that the eigenvalues of a

    matrix do not change if the rows and columns are permuted. Thus, given the matrix

    representations of two isomorphic graphs (for instance, their adjacency matrices),

    they have the same eigenvalues. The converse is not true; so, spectral methods are

    inexact in the sense that they do not ensure the optimality of the solution found.
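As a toy illustration of this observation (ours, not one of the cited methods), the sorted spectrum of the adjacency matrix can serve as a cheap necessary condition for isomorphism, i.e. a filter that can only rule a pair of graphs out:

```python
# Sketch (our illustration): the spectrum of the adjacency matrix is invariant
# under node permutations, so different spectra rule out isomorphism; equal
# spectra do not prove it (cospectral non-isomorphic graphs exist).
import numpy as np

def possibly_isomorphic(A, B, tol=1e-8):
    if A.shape != B.shape:
        return False
    ev_a = np.sort(np.linalg.eigvalsh(A))   # eigvalsh: symmetric (undirected) case
    ev_b = np.sort(np.linalg.eigvalsh(B))
    return bool(np.allclose(ev_a, ev_b, atol=tol))

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)   # path a-b-c
P = np.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]], dtype=float)   # a permutation
print(possibly_isomorphic(A, P @ A @ P.T))   # True: same graph, relabelled
```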

    Caelli and Kosinov16,18 in 2004 propose a matching algorithm that uses the graph

eigenvectors to define a vector space onto which the graph nodes are projected; a clustering algorithm in this vector space is used to find possible matches. Also

    Robles-Kelly and Hancock,130 in a 2007 paper, propose the embedding of graph nodes

into a different space (a Riemannian manifold) using spectral properties. The 2004

    and 2005 papers by Robles-Kelly and Hancock,128,129 present a graph matching

    approach based on Spectral Seriation of graphs: the adjacency matrix is transformed

    into a sequence using spectral properties, then the matching is performed by com-

    puting the String Edit Distance between these sequences. Cour et al.31 in 2007

    propose a spectral matching method called balanced graph matching, using a novel

    relaxation scheme that naturally incorporates matching constraints. The authors

    also introduce a normalization technique that can be used to improve several other

    algorithms such as the classical Graduated Assignment Graph Matching by Gold

    and Rangarajan. Cho et al.23 propose a reformulation of the inexact graph matching

    as a random walk problem, and show that this formalization provides a theoretical

    interpretation of both spectral methods and of some other techniques based on

    continuous optimization; in this framework, the authors present an original algo-

    rithm based on techniques commonly used for Web ranking.

    In a 2006 paper, Qiu and Hancock118 present an approximate, hierarchical

    method for graph matching that uses spectral properties to partition each graph into

nonoverlapping subgraphs, which are then matched separately, with a significant

    reduction of the matching time. The same authors present a somewhat similar idea in


a 2007 paper,119 where the partition is based on commute times, which can be

    computed from the Laplacian spectrum of the graph. Wilson and Zhu171 in their 2008

paper present a survey of different techniques for the spectral representation of

    graphs and trees. In 2011, Escolano et al.45 propose a matching method based on the

    representation of a graph as a bag of partial node coverages, described using spectral

    features. In 2011, Duchenne et al.40 present a generalization of spectral matching

    techniques to hypergraphs, using some results from tensor algebra.

    2.2.4. Other approaches

Among other techniques used for inexact matching, Bagdanov and Worring2 in a

    2003 paper introduce a matching algorithm based on bipartite matching for the so-

called First Order Gaussian Graphs (FOGG), which are an extension of random

    graphs having Gaussian random variables as their node attributes. Also the paper by

    Skomorowski147 in 2007 presents a pattern recognition algorithm based on a variant

    of random graphs, using for the matching a syntactic approach based on graph

    grammars. The 2003 paper by Park et al.116 addresses the problem of partial

    matching between a model graph and a larger image graph by combining a proba-

    bilistic formulation similar to the one used in probabilistic relaxation with a greedy

    search technique. In a 2006 paper, Conte et al.27 present an inexact matching

    technique for pyramidal graph structures, which is based on weighted bipartite graph

    matching, but use information from the upper levels of a pyramid to constrain the

    matching of the lower levels. Xiao et al.174 in 2008 propose a graph distance based on

    a vector representation called Substructure Abundance Vector (SAV), that can be

    considered as an extension of the graph distance based on Maximum Common

    Subgraph (MCS). The paper by Auwatanamongkol1 in 2007 proposes a genetic

    algorithm for a special case of inexact matching, where the nodes are associated to 2D

    points. Bourbakis et al.9 in 2007 introduce the so-called Local-Global graphs (L-G

    graphs), as an extension of Region Adjacency graphs in which the edges are obtained

    through a Delaunay triangulation, for which they introduce an inexact, suboptimal

    matching algorithm which is based on a greedy search. In 2002, Wang et al.168

    present a polynomial algorithm for the inexact graph-subgraph matching for the

    special case of undirected acyclic graphs. The 2004 paper by Sebastian et al.140

    presents a GED algorithm for the special case of shock graphs, based on dynamic

    programming. In their 2008 paper,4 Bai and Latecki propose an inexact suboptimal

    matching algorithm for skeleton graphs, based on the use of bipartite graph

    matching. Chowdury et al.24 in a 2009 paper combine weighted bipartite graph

    matching with the use of the automorphism groups for the cycles contained in the

    graph, to improve the accuracy of the matching found. A 2009 paper by Riesen and

    Bunke125 proposes an approximation of GED with the use of Bipartite Graph

    Matching, solved using the Munkres' algorithm. The 2010 paper by Kim et al.83

    approximates the matching between Attributed Relational Graphs using the nested

assignment problem: an inner assignment step is used to find the best matching of the


adjacent edges; this information is then used to define a matching cost for the nodes, and an outer assignment step finds the node matching that minimizes the sum of

    these costs. This double application of the assignment problem is the original aspect

of this method, differentiating it, for instance, from the heuristic proposed by Riesen

    and Bunke. Also Raveaux et al.121 in 2010 present an approximate algorithm based

    on bipartite graph matching; in this case the aim is to compute an approximation of

    the GED, and the bipartite matching is performed between small subgraphs of each

    of the two graphs. In 2011, Fankhauser et al.46 present an algorithm for computing

    the GED using bipartite graph matching, solved using the algorithm by Volgenant

and Jonker. The same authors in 2012 (Ref. 47) propose a suboptimal technique for graph

    isomorphism, also based on bipartite graph matching. The algorithm has the dis-

tinctive feature that it either finds an exact solution, or it rejects the pair of graphs; thus

    a slower algorithm can be used for the cases not covered by the proposed method.
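Several of the works above approximate the GED by solving a single bipartite assignment between the two node sets (as in Ref. 125); the following sketch (ours, with unit insertion/deletion costs, a user-supplied node substitution cost, and without the local edge-cost terms a faithful implementation would fold into the matrix) uses scipy's linear_sum_assignment:

```python
# Sketch (ours) of the bipartite-assignment approximation of GED: nodes of the
# two graphs are assigned by solving one linear sum assignment problem over an
# (n+m) x (n+m) cost matrix.
import numpy as np
from scipy.optimize import linear_sum_assignment

BIG = 1e9  # large constant standing in for "forbidden" assignments

def bipartite_ged(nodes1, nodes2, sub_cost, del_cost=1.0, ins_cost=1.0):
    n, m = len(nodes1), len(nodes2)
    C = np.zeros((n + m, n + m))
    C[:n, :m] = [[sub_cost(u, v) for v in nodes2] for u in nodes1]  # substitutions
    C[:n, m:] = BIG
    C[n:, :m] = BIG
    np.fill_diagonal(C[:n, m:], del_cost)   # delete a node of the first graph
    np.fill_diagonal(C[n:, :m], ins_cost)   # insert a node of the second graph
    rows, cols = linear_sum_assignment(C)   # optimal assignment
    return C[rows, cols].sum()

# toy usage: node labels compared with a 0/1 substitution cost
print(bipartite_ged(["a", "b", "c"], ["a", "b"],
                    sub_cost=lambda u, v: 0.0 if u == v else 1.0))   # 1.0
```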

    Tang et al.155 in 2011 propose a graph matching algorithm based on the Dot

    Product Representation of Graphs (DPRG) proposed by Scheinerman and Tucker,139

    which represents each node using a numeric vector chosen so that each edge value

    corresponds approximately to the dot product of the nodes connected by the edge; the

    choice of the node vectors is formulated as a continuous optimization problem. The

    proposed method is extended in a 2012 paper by the same authors.156 The 2011 paper

    by Macrini et al.104 proposes a matching algorithm for bone graphs, which are a

    representation for 3D shapes, using weighted bipartite graph matching. The paper by

    Jiang et al.78 in 2011 presents a technique for inexact subgraph isomorphism based on

    geometric hashing, requiring very little computational cost for the intended use case

    of searching for several small input graphs within a large reference graph.

A novel optimization technique, Estimation of Distribution Algorithms (EDA), has been successfully used for inexact graph matching. EDA are somewhat similar to genetic/evolutionary algorithms, but the parameters of each tentative solution are

    considered as random variables; a stochastic sampling process is used to produce the

    next generation.

    The paper by Bengoetxea et al.5 in 2002 proposes the use of EDA for inexact,

suboptimal graph matching, by associating each node of the first graph to a random

    variable whose possible values are the nodes of the second graph. In 2005, Cesar et al.80

    formulate the inexact graph homomorphism as a discrete optimization problem, and

compare beam search, genetic algorithms and EDA for solving this problem. A different

    approach, also based on a probabilistic framework, is proposed by Caelli and Caetano

in 2005 (Ref. 17); the matching is formulated as an inference problem on a Hidden Markov

    Random Field (HMRF), for which an approximate solution is computed.

The 2004 paper by Dickinson et al.38 defines a graph similarity measure for the special case of graphs having unique node labels, and proposes a hierarchical algorithm to efficiently compute this measure. He et al.72 in 2004 propose an ad hoc

    matching algorithm for skeleton graphs, that performs a linearization of the graphs,

and then uses string matching to find an inexact correspondence. A similar approach


is presented in the paper by Das et al.35 in 2012 for graphs obtained from fingerprints.

    In 2008, Gao et al.56 introduce a Graph Distance algorithm for the special case of

    graphs whose nodes represent points in a 2D space, based on the Earth Mover

    Distance (EMD). The 2009 paper by Emms et al.44 presents an original approach to

    graph matching based on quantum computing, that uses the inherent parallelism of

some quantum physics phenomena if run on a (hypothetical) quantum computer.

    3. Other Problems

    In this section, we will present some recent developments on graph problems that are

    not, in a strict sense, forms of graph matching, but are related to matching either

    because they provide a way of comparing two graphs (this is the case for graph

    embeddings and graph kernels), or because they use a graph-based approach to group

    input patterns into classes (in an unsupervised way for graph clustering, and in a

    supervised or semi-supervised way for graph learning). We also mention some works

on other graph-related problems which are of specific interest as Pattern Recognition

    basic tools, such as dimensionality reduction.

Graph embeddings and graph kernels are perhaps the most significant novelty in

    graph-based Pattern Recognition in the recent years. Although seminal works on

these fields were already present in earlier literature, it is in the last decade that these techniques have gained popularity in the Pattern Recognition community. Gaertner et al.55 present an early survey on kernels applied to nonvectorial data. Bunke

    et al.12 in 2005 present a survey of graph kernels and other graph-related techniques.

Bunke and Riesen14 in their 2011 paper present a useful review on the topic of graph

    kernels and graph embeddings; the same authors in 201215 extend this review and

    present these techniques as a way to unify the statistical and structural approaches

    in Pattern Recognition. Please note that, although it may seem that graph embed-

dings and kernels could help reduce the computational complexity of graph comparison, many of the proposed algorithms have a cost that is equal to or higher than

    traditional matching methods (for instance, some embedding methods require

    computing the GED, while others involve a cost that is related to the number of

graphs in the considered set). The main benefit of the novel techniques is instead in

    the availability of the large corpus of theoretically sound techniques from statistical

    Pattern Recognition.

    3.1. Graph embeddings

In the literature, the term Graph embedding is used with two slightly different

    meanings:

• a technique that maps the nodes of a graph onto points in a vector space, in such a

    way that nodes having similar structural properties (e.g. the structure of their

    neighborhood) will be mapped onto points which are close in this space;


• a technique that maps whole graphs onto points in a vector space, in such a way

    that similar graphs are mapped onto close points (see Fig. 2).

References 16, 18, 45 and 130, discussed previously in Sec. 2.2, are examples of the first kind; also, the Dot Product Representation of Graphs139 mentioned in

    Sec. 2.2 belongs to this category. Yan et al.175 show in their 2007 paper that most

    commonly used dimensionality reduction techniques can be formulated as a graph

    embedding algorithm of this kind. Their work is the basis for an embedding tech-

    nique proposed by You et al.,177 called General Solution for Supervised Graph

    Embedding (GSSGE).

    In the following subsections, we will mainly concentrate on the second kind of

    graph embedding, presenting the relevant methods categorized according to the

    main properties they attempt to preserve in the mapping.

    3.1.1. Isometric embeddings

    Methods in this category start from a distance or similarity measure between graphs,

and attempt to find a mapping to vectors that preserves this measure.

    Bonabeau,6 in a 2002 paper, proposes a technique based on a Self-Organizing Map

    (SOM), an unsupervised neural network adopting competitive learning, in order to

    map graphs onto a bidimensional plane. Although the term embedding is not ex-

    plicitly used, it can be considered a form of graph embedding. The mapping found by

    the network is used both as an aid for the visualization of the data represented by the

    graphs, and for clustering.

    Also the 2003 paper by de Mauro et al.36 uses a Neural Network for graph

    embedding. In particular, the proposed method works on directed acyclic graphs, and

    uses a Recursive Neural Network. The network is trained by similarity learning: the

training set is made of pairs of graphs which have been manually labeled with a

Fig. 2. Graph Embedding: the mapping between graphs and points in a vector space is represented by the graph name.


similarity value, and the network aims to produce an output vector for each graph so

    that the Euclidean distance between vectors is consistent with the similarity between

    the corresponding graphs.

    A recent paper by Jouili and Tabbone79 proposes a graph embedding technique

    based on constant shift embedding, a framework proposed for the embedding of

    nonmetric spaces, mainly applied to clustering problems.

    3.1.2. Spectral embeddings

    The embedding algorithms in this subsection are based on the exploitation of spec-

    tral properties of graphs, i.e. properties related to the eigenvalues and eigenvectors of

    matrices representing the graphs, such as the adjacency matrix. Since spectral

    properties are invariant with respect to node permutations, they ensure that graphs

    with an isomorphic structure will be mapped to the same vectors.

    Luo et al.101 in a 2003 paper propose the use of spectral features for graph

    embedding; in particular, they decompose the adjacency matrix of a graph into its

    principal eigenmodes, and then compute from them a vector of numerical features

    (e.g. eigenmode volume, eigenmode perimeter, inter-eigenmode distances, etc.).
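As a toy example of this family (ours, far simpler than the cited methods and not the technique of Ref. 101), a graph can be embedded by taking the smallest eigenvalues of its Laplacian as a fixed-size, permutation-invariant feature vector:

```python
# Toy spectral embedding sketch (our illustration): use the k smallest
# Laplacian eigenvalues as a permutation-invariant feature vector.
import numpy as np

def spectral_embedding(A, k=3):
    L = np.diag(A.sum(axis=1)) - A            # combinatorial Laplacian
    ev = np.sort(np.linalg.eigvalsh(L))       # invariant under node relabelling
    ev = np.concatenate([ev, np.zeros(max(0, k - len(ev)))])  # pad small graphs
    return ev[:k]

path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
triangle = np.ones((3, 3)) - np.eye(3)
print(spectral_embedding(path), spectral_embedding(triangle))
```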

Also the 2005 paper by Wilson et al.170 uses spectral properties to define a graph

    embedding; in this case, the authors derive a set of polynomials from the spectral

decomposition of the Laplacian of the adjacency matrix, and use the coefficients of

    these polynomials as feature vectors.

    Also the 2009 paper by Xiao et al.172 proposes a graph embedding based on

    spectral properties; in particular the method uses the heat kernel, i.e. the solution of

    the heat equation on the graph, to obtain a set of invariant properties used to obtain

    a vector representation of the graph.

Xiao et al.173 in a 2011 paper present an embedding for hierarchical graphs, obtained by a hierarchical segmentation of images. Spectral features are computed on the levels of the hierarchy, obtaining a fixed-size feature vector.

    3.1.3. Subpattern embeddings

These methods are based on the detection, or the enumeration, of specific types of

    subpatterns within the graphs to be embedded.

    Torsello and Hancock158 in 2007 propose an embedding algorithm for trees. The

    algorithm requires that all the trees to be embedded are known in advance. The

    embedding is based on the construction of a Union Tree, which is a directed, acyclic

    graph having all the considered trees as subgraphs; then each tree is represented by a

    vector that encodes which nodes of the Union Tree are used by the tree.

    Czech33 proposes in a 2011 paper an embedding method based on B-matrices,

    which are a structure based on the path lengths between the nodes of a graph and are

    invariant with respect to node permutations.

    A recent paper by Luqman et al.103 presents a fuzzy multilevel embedding tech-

    nique, that combines structural information of the graph and information from the


graph attributes using fuzzy histograms. The method uses an unsupervised learning phase to find the fuzzy intervals used in the representation.

    In a 2011 paper, Gibert et al.60 present a graph embedding based on graphs of

    words, which are an extension of the popular bag of words approach. The method

    assumes that the graphs are obtained from images, with nodes corresponding to

    salient points, and node attributes corresponding to visual descriptors of the points.

    The method performs a quantization of the attribute space, constructing a codebook.

    This codebook is used to produce an intermediate graph, called graph of words,

    whose nodes are the codebook values, and whose edges correspond to the adjacency

    in the original graph of nodes mapped to those codebook values. The nodes and edges

    of the intermediate graph are labeled with the counts of the corresponding nodes/

edges of the original graph; then a histogram of these counts is used as the

    embedding. Two 2012 papers by the same authors further develop this method: in

Ref. 62 the authors add a more sophisticated procedure for constructing the codebook, while in Ref. 61 they use a large set of features and apply a feature selection algorithm to determine the most significant ones. The same authors, in a 2013

    paper,63 propose a somewhat similar embedding technique, that removes the

    assumptions that the graphs are obtained from images, and exploits also edge

    attributes if they are present.

    The 2010 paper by Richiardi et al.122 proposes two graph embedding techniques

specifically tailored for graphs having the following constraints: the number of nodes is fixed across all the considered set of graphs, and a total ordering is defined in the

    set of nodes. The authors show that a graph embedding exploiting these constraints

    can outperform a more general one.

    3.1.4. Prototype-based embeddings

    These embedding methods assume that a set of prototype graphs is available, and

    the mapping of a graph onto a vector space is based on the distances (obtained

according to a suitably defined distance function) of the graph from the prototypes.

    This technique can be seen as a special case of the dissimilarity representations

    introduced by Pekalska and Duin.117
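A minimal sketch of the prototype-based idea (ours; graph_distance stands for any graph distance, for instance an approximate GED as in Sec. 2.2, and the toy distance below is only a stand-in):

```python
# Sketch (our illustration) of prototype-based graph embedding: a graph is
# mapped to the vector of its distances from a set of prototype graphs.
import numpy as np

def prototype_embedding(graph, prototypes, graph_distance):
    # one vector component per prototype graph
    return np.array([graph_distance(graph, p) for p in prototypes])

# toy usage with a stand-in distance: graphs given as edge sets, distance is
# the size of the symmetric difference of the edge sets (NOT a real GED).
def toy_distance(g1, g2):
    return len(g1 ^ g2)

prototypes = [{(0, 1)}, {(0, 1), (1, 2)}, {(0, 1), (1, 2), (2, 0)}]
g = {(0, 1), (1, 2)}
print(prototype_embedding(g, prototypes, toy_distance))   # [1 0 1]
```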

The first of these methods was proposed in 2007 by Riesen et al.127 The

    method has one prototype graph for each dimension of the vector space; the corre-

sponding component of the vector is simply defined as the GED between the pro-

    totype and the graph to be embedded. The authors discuss several strategies for

    choosing the prototypes from a training set, and evaluate them by using the

embedding for several classification tasks. In the same year, a paper by Riesen and Bunke124 further develops this idea by proposing the use of several sets of randomly chosen prototypes, and combining the classifiers obtained for each of the corresponding embeddings to form a Multiple Classifier System. The advantage is that the resulting classifier is more robust with respect to the risk of a poor choice for the

    prototypes. A 2009 paper by Lee and Duin93 explores a similar idea, but instead of a


random selection of the prototypes, the proposed method creates different base classifiers by using node label information for extracting different sets of subgraphs

    from the training set. In 2010, Lee et al.94 propose a similar method in which, instead

    of extracting subgraphs, the node label information is used to alter the training

    graphs without changing their size.

    In a 2009 paper, Riesen and Bunke123 present a Lipschitz embedding for graphs.

    Lipschitz embedding is usually employed to regularize vector spaces, but in this case

    it is proposed as a method to construct a graph embedding. Basically, each com-

    ponent of the vector representation of a graph is deduced from a set of prototype

    graphs; the value of the component is the mean distance (using GED) with the

    corresponding set of prototypes (a dierent aggregation function than the mean

    could be used). The sets of prototypes are constructed using a clustering of a training

set, based on the K-Medoids clustering algorithm. The same authors in another 2009

    paper126 propose a method for reducing the dimensionality of this embedding, by

    using Principal Component Analysis and Linear Discriminant Analysis. Bunke and

    Riesen13 in 2011 propose an extension to this technique, which formulates the

problem of choosing the reference graphs as a feature selection task: a first embedding is

    built using a large number of reference graphs; then a feature selection algorithm is

applied to the obtained vectors in order to select the most significant features, and

    only the reference graphs corresponding to these features are retained.

    Also the 2012 paper by Borzeshi et al.8 addresses the problem of selecting the

    reference graphs for graph embedding. The authors present several algorithms which

are based on a discriminative approach: they define several objective functions to

    measure how much the prototypes are able to discriminate between classes, and

    select the prototypes by a greedy optimization of these functions.

    3.2. Graph kernels

A graph kernel is a function that maps a pair of graphs onto a real number, and

has similar properties to the dot product defined on vectors. More formally, if we

denote with G the space of all the graphs, a graph kernel is a function k such that:

k : G \times G \to \mathbb{R},    (10)

k(G_1, G_2) = k(G_2, G_1) \quad \forall G_1, G_2 \in G,    (11)

\sum_{i=1}^{n} \sum_{j=1}^{n} c_i c_j \, k(G_i, G_j) \ge 0 \quad \forall G_1, \ldots, G_n \in G, \; \forall c_1, \ldots, c_n \in \mathbb{R}.    (12)

    Equation (11) requires the function k to be symmetric, while Eq. (12) requires it to be

positive semi-definite.

    Informally, a graph kernel can be considered as a measure of the similarity be-

tween two graphs; however, its formal properties allow a kernel to replace the vector

dot product in several vector-based algorithms that use this operator (and other

functions related to the dot product, such as the Euclidean norm). Among the many


Pattern Recognition techniques that can be adapted to graphs using kernels we

mention Support Vector Machine classifiers and Principal Component Analysis.
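For instance, once any graph kernel k is available, a kernel-based classifier can be trained directly on a precomputed kernel matrix; the following sketch (using scikit-learn, with a placeholder my_graph_kernel function and hypothetical graph lists standing in for a real dataset) is only meant to illustrate this general usage pattern.

    import numpy as np
    from sklearn.svm import SVC

    def kernel_matrix(graphs_a, graphs_b, my_graph_kernel):
        """Evaluate a graph kernel on all pairs of graphs from two lists."""
        return np.array([[my_graph_kernel(ga, gb) for gb in graphs_b]
                         for ga in graphs_a])

    # Hypothetical data: train_graphs / test_graphs are lists of graph objects,
    # y_train holds their class labels, my_graph_kernel is any kernel function.
    def train_and_predict(train_graphs, y_train, test_graphs, my_graph_kernel):
        K_train = kernel_matrix(train_graphs, train_graphs, my_graph_kernel)
        K_test = kernel_matrix(test_graphs, train_graphs, my_graph_kernel)
        clf = SVC(kernel="precomputed")   # SVM working directly on the kernel matrix
        clf.fit(K_train, y_train)
        return clf.predict(K_test)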

Kernels have been used for a long time to extend linear algorithms working on

vector spaces to the nonlinear case, thanks to Mercer's theorem: given a kernel

function defined on a compact Hausdorff space X, there is a vector space V and a

    mapping between X and V such that the value of the kernel computed on two points

in X is equal to the dot product of the corresponding points in V. Thus, a kernel can

    be seen as an implicit way of performing an embedding into a vector space. Although

Mercer's theorem does not apply to graph kernels, in practice the latter can be used

    as a theoretically sound way to extend a vector algorithm to graphs. Of course, the

performance of these algorithms strongly depends on the appropriateness (with re-

    spect to the task at hand) of the notion of similarity embodied in the graph kernel.

In their 2003 paper, Kashima et al.82 specialize to the graph domain the idea of

marginalized kernels, a probabilistic technique for defining a kernel based on the

    introduction of hidden variables. In this case, the hidden variable is a sequence of

    node indices, generated according to a random walk on one of the graphs. Given a

    value for the hidden variable, a kernel on sequences is computed using the sequence

    of visited nodes and edges; the marginalized kernel is obtained by computing the

    expected value (with respect to the joint distribution of the hidden and visible

variables) of this sequence kernel. Mahé and Vert105 in 2009 extend this technique to

    trees, and present an application to molecular data.

    Borgwardt and Kriegel7 in 2005 present a graph kernel that is based on paths,

    instead of walks (a path is a walk without repeated nodes); in order to avoid the

    exponential cost of enumerating all the paths in a graph, the authors propose a

    scheme to use only the shortest path between any pair of nodes, since the shortest

    paths can be computed in polynomial time.
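A minimal version of this idea can be sketched as follows: compute all-pairs shortest-path lengths for each graph and compare the two resulting length distributions. The snippet below (Python with networkx) is a simplified, unlabeled variant written for illustration, not the exact kernel of Ref. 7.

    from collections import Counter
    import networkx as nx

    def shortest_path_histogram(G):
        """Count how many node pairs are joined by a shortest path of each length."""
        hist = Counter()
        for _, lengths in nx.all_pairs_shortest_path_length(G):
            hist.update(l for l in lengths.values() if l > 0)
        return hist

    def shortest_path_kernel(G1, G2):
        """Compare two graphs via the dot product of their path-length histograms."""
        h1, h2 = shortest_path_histogram(G1), shortest_path_histogram(G2)
        return sum(h1[length] * h2[length] for length in h1.keys() & h2.keys())

    # Toy usage on two small graphs.
    print(shortest_path_kernel(nx.cycle_graph(5), nx.path_graph(5)))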

Neuhaus and Bunke,112 in their 2006 paper, define three graph kernels based on

GED. The first kernel requires the choice of a zero pattern, a graph that, with respect

    to the kernel, will behave similarly to a null vector. The authors show that this kernel

fulfills the theoretical requirements of a kernel function, but its practical performance

is strongly affected by the choice of the zero pattern. The authors then introduce two

other kernels, obtained from the sum and the product of the first kernel over a set of

    zero patterns, and show that they have the same theoretical properties, but are more

    robust with respect to the choice of these patterns.

In their 2009 paper, Neuhaus et al.114 present three possible ways to use GED in

the definition of a kernel. The first way is a diffusion kernel, which turns an edit

distance matrix into a positive definite matrix satisfying the kernel properties, but

has the inconvenience that the set of graphs to which it is applied must be finite and

    known a priori. The second way is a convolution kernel, which is based on a de-

    composition of the edit path between the two graphs into a sequence of substitution

    operations; given a kernel for individual substitutions, this approach provides a

definition for a kernel between two graphs. The main drawback is the exponential


complexity with respect to the number of nodes, for which the authors suggest an

    approximation. The third way is a random walk kernel, where the GED is used to

define a fuzzy product graph, from which a kernel is obtained that evaluates the local

    similarity of corresponding parts of the two graphs.

The 2012 paper by Gaüzère et al.59 presents two graph kernels. The first, called the

Laplacian kernel, is based on the GED (approximated using the algorithm by Riesen

and Bunke125). The product operation derived from the GED is not guaranteed to be

positive definite, and therefore does not have the formal properties of a kernel; the

authors propose a technique to obtain from the distance matrix a positive definite

    matrix, which is then used as the kernel. The second kernel, called the treelet kernel,

is based on treelets, which are all the possible trees having less than a fixed number of

nodes (in the paper, treelets up to six nodes are considered); the kernel is computed

    by counting the occurrences of each treelet in the graphs. This kernel can only be

    used for unattributed graphs, while the Laplacian kernel can also be employed for

    graphs having node and edge attributes. The same authors in Ref. 58 propose a

    kernel that is also based on treelets, but instead of simply counting their occurrences,

    uses a treelet edit distance to compare the treelets in one graph with those in the

    other one, so as to be tolerant with respect to slight deformations of the graphs.

Grenier et al.66 in their 2013 paper propose a different treelet-based kernel, specifi-

cally devised for chemoinformatics applications, that also incorporates information

    on the position of each treelet within the graph.

    Shervashidze et al.,145 in their 2009 paper, present a kernel based on the use of

graphlets, that is, all the possible graphs having less than a fixed number of nodes.

    Also the graphlet kernel, as the previously mentioned treelet kernel, has the limi-

    tation of being applicable only to unlabeled graphs. The paper considers graphlets up

to five nodes, and proposes two different techniques to reduce the computational cost

of finding all the occurrences of the graphlets in a large graph: the first is a probabilistic

    technique based on sampling, that replaces the exact number of graphlets with an

    estimate that is ensured to converge in probability to the true value; the second

technique is applicable only to bounded-valence graphs, and it is based on an efficient

algorithm for enumerating, on this kind of graph, all the paths up to a fixed length.
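The basic counting step behind graphlet-style kernels can be illustrated with a small sketch; the code below (Python with networkx) exhaustively counts connected induced subgraphs on exactly k nodes, grouped by isomorphism type, and compares two graphs through the dot product of these counts. It is a naive, exponential-cost illustration rather than the sampling or bounded-valence algorithms of Ref. 145.

    from itertools import combinations
    import networkx as nx

    def graphlet_counts(G, k=3):
        """Count connected induced k-node subgraphs, grouped by isomorphism type."""
        counts = {}   # maps a canonical representative graph to its occurrence count
        for nodes in combinations(G.nodes(), k):
            sub = G.subgraph(nodes)
            if not nx.is_connected(sub):
                continue
            for rep in counts:
                if nx.is_isomorphic(sub, rep):
                    counts[rep] += 1
                    break
            else:
                counts[nx.Graph(sub)] = 1
        return counts

    def graphlet_kernel(G1, G2, k=3):
        """Dot product of the two graphs' graphlet-count vectors."""
        c1, c2 = graphlet_counts(G1, k), graphlet_counts(G2, k)
        value = 0
        for rep1, n1 in c1.items():
            for rep2, n2 in c2.items():
                if nx.is_isomorphic(rep1, rep2):
                    value += n1 * n2
        return value

    # Toy usage: both graphs contain only the 3-node path graphlet.
    print(graphlet_kernel(nx.cycle_graph(5), nx.path_graph(5)))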

    An extension of the idea of graphlet kernels is introduced by Kondor et al. in

2009.88 The authors define a set of graph invariants, called the graphlet spectrum,

    based on the generalization of Fourier transforms over permutation groups. The

    kernel based on these invariants has the advantages of being applicable to labeled

    graphs, and of taking into account the position of the graphlets within the larger

    graph, and not only their frequency of occurrence.

Bai and Hancock, in a 2013 paper,3 define a novel kernel based on the Jensen–

Shannon divergence, an entropy-based information-theoretic measure of the dissimilarity between probability distributions. To

apply this measure to graphs, the authors derive from each graph a probability

distribution, based on the random walks on the graph. Rossi et al. in their paper133

propose an evolution of this method, defining a kernel that is similarly based on the


Jensen–Shannon divergence, but uses continuous-time quantum walks instead of

    classical random walks.
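As a toy illustration of the general recipe (derive a probability distribution from each graph, compare the distributions with the Jensen–Shannon divergence, and turn the divergence into a similarity), the sketch below uses the degree-based stationary distribution of a random walk, sorted and padded to a common length; this simplification is not the composite-structure construction actually used in Refs. 3 and 133.

    import numpy as np

    def _entropy(d):
        """Shannon entropy of a discrete distribution (natural log)."""
        d = d[d > 0]
        return -np.sum(d * np.log(d))

    def stationary_distribution(adjacency):
        """Stationary distribution of a random walk: proportional to node degree."""
        degrees = np.asarray(adjacency, dtype=float).sum(axis=1)
        return degrees / degrees.sum()

    def jensen_shannon_kernel(adj1, adj2):
        """Similarity = ln 2 minus the JS divergence of the two aligned distributions."""
        p, q = stationary_distribution(adj1), stationary_distribution(adj2)
        n = max(len(p), len(q))                      # pad to a common support size
        p = np.sort(np.pad(p, (0, n - len(p))))[::-1]
        q = np.sort(np.pad(q, (0, n - len(q))))[::-1]
        jsd = _entropy(0.5 * (p + q)) - 0.5 * (_entropy(p) + _entropy(q))
        return np.log(2.0) - jsd                     # ln 2 for identical distributions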

In a 2011 paper, Strug153 proposes a kernel specifically devised for hierarchical

    graphs, which is based on the combination of a tree kernel with a classical graph

    kernel.

Lozano and Escolano propose the use of graph kernels in a slightly different

meaning, as an aid to improve the performance of other operations on graphs. For

instance, in Ref. 99 they present a kernelized version of the classical Graduated Assignment

    Graph Matching algorithm by Gold and Rangarajan, yielding an improvement in the

    accuracy and the robustness to noise of the matching. In Ref. 100 the same authors

adopt a kernel for defining a graph-matching cost function, which is then used for a

    kernelized version of two existing matching algorithms; in this paper the authors also

define a kernel-based algorithm for constructing a prototype graph from a set of

    graphs, using this technique for graph clustering.

A recent paper by Lee et al.92 investigates the different impact of structural

    information and graph attributes within a graph kernel, using a kernel based on the

shortest paths, modified so as to have the possibility of changing the relative weights

    of the two kinds of information. The authors show experimentally that these two

kinds are essentially different, and can reinforce each other. A similar

investigation, with the same conclusion, is carried out using a GED for comparing the

    graphs.

    3.3. Graph clustering

The term Graph clustering is actually used in the literature with two different and

unrelated meanings, both of which may be of interest for researchers working in the Pattern

Recognition field: in the first sense, graphs are used to represent each of the objects to

    be clustered, so the clustering is performed on a set of graphs (see Fig. 3). In the second

    sense, which is the most frequently encountered, a single graph is used to represent the

    structure of the space to which the objects belong, with a node for each object, and

    edges encoding the relationships between pairs of objects (usually a similarity or a

    distance measure is associated with each edge); in this case the clustering is performed

    by partitioning the set of nodes of the graph according to some criterion (see Fig. 4). In

order to differentiate between the two meanings of the term, we will speak of clustering

of graphs when referring to the first sense, and graph-based clustering when referring

    to the second one. This latter problem is related to graph-based segmentation, which

is a wide field of research that is not included in this survey.

    3.3.1. Clustering of graphs

Regarding the clustering of graphs, Günter and Bunke69 in 2002 present an extension to graphs of the Unsupervised Learning Vector Quantization (LVQ). The algorithm

    uses GED to evaluate the distance between an input graph and a cluster proto-

    type, and an original algorithm, also based on GED, that computes the weighted


combination of two graphs (by determining the minimal set of edit operations to

    transform the rst graph into the second, and then choosing a subset of these

    operations depending on the weight), which is used for updating the winning pro-

    totype. In 2003 the same authors70 propose an extension of this method, by intro-

    ducing a set of clustering validation indices to choose the optimal number of LVQ

    nodes.

    Serratosa et al.141 propose an algorithm for the clustering of graphs based on

    Function-Described Graphs, which are Attributed Relational Graphs extended with

Fig. 3. An example of graph clustering in the first meaning (clustering of graphs): each of the objects to

    be clustered is represented by a graph.

    Fig. 4. An example of graph clustering in the second meaning (graph-based clustering): the clustering is

    performed by partitioning the set of nodes of a single graph.


information about constraints on the joint probabilities of nodes and edges. The

    algorithm is based on an incremental, hierarchical clustering strategy.

    Also the 2011 paper by Jain and Obermayer77 presents a method for the clus-

tering of graphs based on Vector Quantization with the k-Means algorithm. The

proposed algorithm uses an embedding of graphs into Riemannian orbifolds, based

    on GED, to perform the quantization. The authors present an extensive discussion

    of the theoretical properties of the proposed approach, providing some necessary

    conditions for optimality of the found clustering and for statistical consistency;

    the authors also discuss the impact of possible approximations for reducing the

    computational cost.

    3.3.2. Graph-based clustering

    Among the recent algorithms proposed for graph-based clustering, the paper by

Guigues et al.68 in 2003 defines the so-called cocoons, which are connected subgraphs

    characterized by the fact that the maximum dissimilarity between nodes within the

    subgraph is less than the minimum dissimilarity between a node within the subgraph

    and an outside node. The authors demonstrate that the cocoons of a graph form a

hierarchy, and define an algorithm for constructing this hierarchy, which can be used

    for a hierarchical clustering of the nodes of the graph. The same authors in Ref. 67

present a different method for obtaining a hierarchical representation, applied to

    image representation and segmentation.

    The 2006 paper by Bras Silva et al.10 proposes an algorithm that is based on the

    graph coloring problem. Graph coloring involves assigning labels (called colors) to

the nodes of a graph so that adjacent nodes have different colors, with the goal of

    minimizing the total number of colors used. The proposed clustering algorithm uses a

    greedy coloring technique from the literature, and then uses the resulting color as-

    signment as an aid to decide how to aggregate the nodes into clusters.

    Grady and Schwartz65 in 2006 present a graph-based clustering technique based

    on continuous optimization. The functional to be minimized is chosen so as to have a

    linear optimization problem, which can be solved with less computational cost and

    more numerical stability than other functionals used in graph-based clustering.

However, the algorithm requires the choice of a ground node, which can affect the

resulting partition; the authors propose a criterion for fixing this node, but warn that

this might not yield the optimal performance, and so the method is best suited for

    applications where an interactive form of clustering is required, allowing the user to

    change the ground node until a satisfactory clustering is found. A recent paper by

    Couprie et al.30 presents a generalized energy functional, which is demonstrated to be

    equivalent, by choosing the appropriate parameter values, to several optimization-

    based techniques used for clustering and segmentation, such as graph cuts.

The 2006 paper by Fränti et al.54 proposes a graph-based technique, which uses an approximate nearest neighbor graph, to speed up an agglomerative clustering

    algorithm.


Dhillon et al.37 in 2007 propose a multilevel graph-based clustering algorithm that

    exploits a theoretical relation between some kernels and some graph-based spectral

    clustering algorithms to perform the clustering with the same properties of a spectral

    method but without the computational cost of computing the eigenvectors of the

    graph.

Foggia et al.52 in 2008 propose a graph-based clustering method based on the

Minimum Spanning Tree, used in combination with the Fuzzy C-Means (FCM)

algorithm to automatically determine the clustering threshold. The method has been

    further extended in Ref. 186.
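A generic minimum-spanning-tree clustering step of this kind is sketched below (Python with networkx): the MST of the similarity graph is built and its edges longer than a threshold are cut, each surviving connected component becoming a cluster. The threshold here is simply passed in by the caller, whereas Ref. 52 derives it automatically via FCM, so this is only an outline of the MST part.

    import networkx as nx

    def mst_clustering(points, threshold, distance):
        """Cluster points by cutting the long edges of the minimum spanning tree.

        `points` is a list of objects, `distance` a pairwise distance function and
        `threshold` the edge length above which MST edges are removed; every
        connected component of the pruned tree is returned as one cluster.
        """
        G = nx.Graph()
        G.add_nodes_from(range(len(points)))
        for i in range(len(points)):
            for j in range(i + 1, len(points)):
                G.add_edge(i, j, weight=distance(points[i], points[j]))
        mst = nx.minimum_spanning_tree(G, weight="weight")
        long_edges = [(u, v) for u, v, w in mst.edges(data="weight") if w > threshold]
        mst.remove_edges_from(long_edges)
        return [sorted(component) for component in nx.connected_components(mst)]

    # Toy usage on 1-D points forming two obvious groups.
    pts = [0.0, 0.1, 0.2, 5.0, 5.1]
    print(mst_clustering(pts, threshold=1.0, distance=lambda a, b: abs(a - b)))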

    The 2008 paper by Laskaris and Zafeiriou91 introduces a graph-based clustering

algorithm that is based on FCM. Namely, FCM is used as a preprocessing step, with

    the task of dividing the input data into a large number of clusters (overclustering).

    Then, the found clusters are used to construct a graph-based representation, the

    connectivity graph, with nodes corresponding to the cluster centroids, and edges to

    neighborhood relations among the clusters. The connectivity graph is used with

several graph-based algorithms to find a more accurate clustering, choosing auto-

    matically the optimal number of clusters, and for dimensionality reduction.

    Zanghi et al.180 propose in their 2008 paper a graph-based clustering method based

on a probabilistic formulation of the problem, and using the Erdős–Rényi mixture model for random graphs. The EM algorithm is used to solve the probabilistic

problem. An extension of this method is defined in Ref. 181; the new algorithm adds

    the ability to use node information (in the form of node feature vectors) in addition to

    edge information representing the similarity of the corresponding data points.

    The 2009 paper by Kim and Choi84 presents an algorithm for graph-based clus-

    tering that uses the decomposition of the graph into r-regular subgraphs,

    i.e. connected subgraphs whose nodes are adjacent to exactly r other nodes. The

    decomposition is reduced to a continuous optimization problem and solved using

Linear Programming techniques. After the decomposition, a refinement step is used

    to prune inconsistent edges and to remove outliers.

    Wang et al.167 propose a clustering technique, called Integrated KL clustering,

    that is a hybrid between a traditional clustering approach (the K-means algorithm)

    and a graph-based clustering based on normalized graph cuts. The method should be

    convenient in situations where the input data are partly described by a feature

    vector, and partly by a set of similarity/dissimilarity relations encoded using a graph

    structure.

    Mimaroglu and Erdil110 in a 2011 paper propose a graph-based method for

    combining the results of several clustering algorithms. The method is given as input

the results of a set of clustering algorithms applied to the same data; different

    algorithms can be used, or the same algorithm with dierent parameters. The

    method builds a graph with nodes corresponding to data points, and edges encoding

    the number of clustering algorithms that have assigned two data points to the same

    cluster. Then, the nodes of this graph are clustered so as to maximize the consensus

among the different clustering algorithms, using a greedy search technique. The


authors report that the final clustering obtained by the method is closer to a manual

partition of the data, and is less influenced by the choice of parameters than the

    initial algorithms.
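The graph built by such consensus methods is essentially a co-association structure; the sketch below (Python with numpy) shows one plausible way to assemble it from several flat clusterings, with edge weights counting how many clusterings place two points in the same cluster. The subsequent greedy node clustering of Ref. 110 is not reproduced here.

    import numpy as np

    def co_association_graph(clusterings):
        """Weighted adjacency matrix of the consensus graph.

        `clusterings` is a list of label arrays, all over the same n data points;
        entry (i, j) counts how many clusterings assign points i and j to the
        same cluster, and can be used as the edge weight between nodes i and j.
        """
        labels = np.asarray(clusterings)          # shape: (n_clusterings, n_points)
        n_points = labels.shape[1]
        weights = np.zeros((n_points, n_points), dtype=int)
        for run in labels:
            weights += (run[:, None] == run[None, :]).astype(int)
        np.fill_diagonal(weights, 0)              # no self-loops
        return weights

    # Toy usage: three clusterings of five points.
    runs = [[0, 0, 1, 1, 1], [0, 0, 0, 1, 1], [1, 1, 0, 0, 0]]
    print(co_association_graph(runs))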

    Nie et al.115 propose a graph-based clustering technique that uses a new formu-

    lation of the clustering problem, called the l1-norm graph clustering, where the goal is

expressed as the minimization of the L1 norm of a suitably defined vector; this

    formulation should be more robust with respect to noise and outliers.

    The 2012 paper by Tabatabaei et al.154 presents a graph-based clustering algo-

    rithm where the clustering goal is formulated in terms of minimizing the normalized

cut (Ncut) metric. The clustering is performed using a greedy, agglomerative algo-

rithm, followed by a refinement procedure that evaluates the opportunity of moving

    the boundary nodes of each cluster to a neighboring one.

    In 2012, Ducornau et al.41 propose a hierarchical algorithm for hypergraph-based

clustering, which is a generalization of graph-based clustering. The algorithm first

performs a partitioning of the nodes using a spectral technique, and then

recursively refines the obtained partition.

    Shang et al.144 in their 2012 paper propose two graph-based algorithms for the

co-clustering problem, which aims at simultaneously finding coherent subsets of

    the datapoints and coherent subsets of the features used to represent them. The

algorithms adopt iterative optimization schemes, based on the graph Laplacian.

    3.4. Graph learning

    Several learning methods use a graph-based structure as part of the learning process.

    In some cases, the individual patterns are represented by graphs, and often also the

class descriptions have a graph-based representation; in such cases, often some form

    of graph matching is involved in the algorithm (see Fig. 5). In other cases, a graph

    structure represents the whole input space, with nodes corresponding to individual

    patterns, and edges representing some sort of proximity or similarity relation. We

will use the terms learning of graphs for the first case, and graph-based learning for

    the second.

    3.4.1. Learning of graphs

    In 2005, Neuhaus and Bunke111 present a method to learn a GED using a Self-

    Organizing Map. The method is given a set of graphs with class labels, and learns the

    edit costs so as to ensure that graphs in the same class have a smaller GED than

graphs belonging to different classes. The same authors propose an improved algo-

    rithm for solving the same problem in a 2007 paper.113 In this latter work, they

    reformulate the learning of the graph edit costs in a probabilistic framework, and use

    the Expectation Maximization algorithm to optimize these costs in the Maximum

    Likelihood sense.

    Also the paper by Serratosa et al.143 in 2011 proposes a method for learning the

    edit costs of a GED. In this case, the method is based on an Adaptive Learning


paradigm, in which the system is sequentially given new graphs and attempts to

classify them, and only if the class is different from the one a human expert would

have chosen, feedback is given to the system and it adapts the edit costs. A 2011

    paper by Sole-Ribalta and Serratosa148 provides a further elaboration on this

method, by considering, once the edit costs are fixed, the formal properties of the

    space of the possible matchings.

    A similar problem is addressed by the 2012 paper by Leordeanu et al.,96 where

    learning (both supervised and unsupervised) is used to obtain the parameters of a

    graph matching algorithm based on spectral properties.

A 2008 paper by Maulik108 presents an algorithm for finding repeated subgraphs

    within large graphs, which can be considered an unsupervised form of graph-based

learning. This can be used for data mining in domains suited to a structural re-

    presentation, e.g. web pages or molecular databases. The algorithm uses Evolu-

tionary Programming to perform the search, with a fitness function based on the

    compression of the original graph attainable by the detection of the repeated

    substructure.

    In their 2009 papers, Ferrer et al.49,50 propose an algorithm for computing the

    median graph, that is the graph within a set of graphs that minimizes the sum of

    graph distances from the other graphs. The computation of the median graph can be

Fig. 5. An illustration of graph learning: (a) A set of objects made of three different kinds of parts (circles, triangles, rectangles); (b) the representation in terms of graphs (node attributes are the kinds of parts, while edges represent the only spatial relation, "above", and therefore do not have attributes); (c) the corresponding learned class description, a prototype containing the common substructure: the question mark on the prototype nodes represents a generic value (a don't care) for the corresponding attribute.


considered a form of learning in the graph domain, since the median graph can be

    used as a prototype for a set of graphs. The proposed method is based on genetic

    algorithms, and performs a reduction of the search space by exploiting a novel

    theoretical bound on the sum of distances for the particular graph distance measure

    adopted (which is based on the maximum common subgraph).
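For comparison, the simpler set median (the member of the set itself that minimizes the sum of distances to all the others) can be computed by brute force, as in the following sketch; here graph_distance stands for any graph distance (e.g. a GED or a maximum-common-subgraph-based measure), and no search-space reduction such as that of Refs. 49 and 50 is attempted.

    def set_median_graph(graphs, graph_distance):
        """Return the graph of the set minimizing the sum of distances to the others.

        This is the *set* median: the candidate prototypes are restricted to the
        input graphs themselves, so all pairwise distances are simply enumerated.
        """
        best_graph, best_total = None, float("inf")
        for candidate in graphs:
            total = sum(graph_distance(candidate, other)
                        for other in graphs if other is not candidate)
            if total < best_total:
                best_graph, best_total = candidate, total
        return best_graph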

Ferrer et al. present a different median graph algorithm in 2010,51 which is based

    on the graph embedding technique by Riesen and Bunke.123 In particular, the

    method proposed by Ferrer et al. computes the median graph by converting the

graphs into vectors using the cited graph embedding, finding the median vector of

    the set and then converting this vector back into a graph. In a 2011 paper,48 the same

    authors present an improved procedure for performing this last step of the algorithm.

Jain and Obermayer, in their 2010 paper,75 discuss the mean and the median of a

set of graphs, using a theoretical formulation based on Riemannian orbifolds, and

present some sufficient conditions to ensure that the estimators of the mean and the

    median are consistent with an underlying probability distribution of the graphs.

    Also the paper by Raveaux et al.120 deals with learning a prototype for a set of

graphs. Four different kinds of prototypes are considered: median graphs, generalized

    median graphs, discriminant graphs and generalized discriminant graphs. Discrimi-

    nant graphs are prototypes chosen so as to maximize the performance of a Nearest

    Neighbor classier over a labeled training set. The generalized versions of median

    graphs and discriminant graphs are obtained by lifting the restriction that the

    prototype must be a member of the training set. All the four kinds of prototypes are

computed using a Genetic Algorithm, with different chromosome encodings and

fitness functions.

    3.4.2. Graph-based learning

    Culp and Michailidis,32 in their 2008 paper, propose a semi-supervised learning al-

    gorithm based on graphs. Semi-supervised learning is a form of machine learning in

    which only a subset of the training data has class labels. The proposed method

    assumes that the structure of the input space is described as a graph, in which nodes

    are the input samples and edges encode the neighborhood relations; this graph

    structure is used to assign a label to unlabeled training samples during the learning

    process. A similar technique is proposed by Elmoataz et al.43 for graph-based regu-

    larization on weighted graphs.
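A minimal version of this graph-based semi-supervised pipeline is sketched below (Python with numpy): a k-nearest-neighbor similarity graph is built over all samples, and labels are then spread from the labeled nodes to their neighbors by iterative propagation. This is a generic illustration of the idea, not the specific algorithms of Refs. 32 or 43.

    import numpy as np

    def knn_graph(X, k=5):
        """Symmetric 0/1 adjacency matrix linking each sample to its k nearest neighbors."""
        dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        np.fill_diagonal(dists, np.inf)
        A = np.zeros_like(dists)
        for i, row in enumerate(dists):
            A[i, np.argsort(row)[:k]] = 1.0
        return np.maximum(A, A.T)                     # make the graph undirected

    def propagate_labels(A, y, n_classes, n_iter=50):
        """Spread labels over the graph; y holds class ids, with -1 for unlabeled samples."""
        y = np.asarray(y)
        F = np.zeros((len(y), n_classes))
        labeled = y >= 0
        F[labeled, y[labeled]] = 1.0
        W = A / np.maximum(A.sum(axis=1, keepdims=True), 1e-12)   # row-normalized weights
        for _ in range(n_iter):
            F = W @ F
            F[labeled] = 0.0
            F[labeled, y[labeled]] = 1.0              # clamp the known labels at each step
        return F.argmax(axis=1)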

    Also the 2012 paper by Rohban and Rabiee132 is related to graph-based semi-

    supervised learning. In particular, the authors investigate the preliminary step of

    graph construction: given a set of datapoints in a metric space, how a graph structure

    can be constructed so that graph-based semi-supervised learning can be applied

effectively; the authors propose a supervised graph construction algorithm based on

the optimization of a smoothness functional, and show that the use of neighbor-

    hood graphs based on this method outperforms the k-NN technique commonly used

    for this task.


The 2012 paper by Wang et al.165 introduces a novel technique to construct the

    graph structure for graph-based semi-supervised learning. The authors propose the

    k-Regular Nearest Neighbor graph (k-RNN) instead of the more common k-NN

    graph. In the k-RNN graph, k is the average number of neighbors, and the graph is

constructed so as to minimize the total weight of the edges (weights representing distances). The

    authors demonstrate the performance improvement of this technique in conjunction

    with the Manifold-ranking semi-supervised algorithm by Zhou et al.183

    Shiga and Mamitsuka146 in their 2012 paper present another graph-based, semi-

    supervised learning algorithm. The novel aspect of this proposal is that it inte-

grates several graphs for representing different sources of evidence regarding

    the similarity of the input patterns. The algorithm