10
JID:TCS AID:9572 /FLA Doctopic: Algorithms, automata, complexity and games [m3G; v 1.123; Prn:20/01/2014; 13:10] P.1(1-10) Theoretical Computer Science ••• (••••) •••••• Contents lists available at ScienceDirect Theoretical Computer Science www.elsevier.com/locate/tcs Algorithms for parameterized maximum agreement forest problem on multiple trees Feng Shi a , Jianxin Wang a,, Jianer Chen a,b , Qilong Feng a , Jiong Guo c a School of Information Science and Engineering, Central South University, PR China b Department of Computer Science and Engineering, Texas A&M University, USA c Universität des Saarlandes, Campus E 1.7, Germany article info abstract Keywords: Fixed-parameter tractability Phylogenetic tree Maximum agreement forest The Maximum Agreement Forest problem (MAF) asks for a largest common subforest of a collection of phylogenetic trees. The MAF problem on two binary phylogenetic trees has been studied extensively in the literature. In this paper, we present a group of fixed- parameter tractable algorithms for the MAF problem on multiple (i.e., two or more) binary phylogenetic trees. Our techniques work fine for the problem for both rooted trees and unrooted trees. The computational complexity of our algorithms is comparable with that of the known algorithms for two trees, and is independent of the number of phylogenetic trees for which a maximum agreement forest is constructed. © 2014 Elsevier B.V. All rights reserved. 1. Introduction Phylogenetic trees have been widely used in the study of evolutionary biology to represent the tree-like evolution of a collection of species. While a popular way to construct a phylogenetic tree is to start from multiple alignments of genes over a set of species, different methods often lead to different trees. In order to facilitate the comparison of different phylogenetic trees, several distance metrics have been proposed in the literature, such as Robinson–Foulds distance [1], NNI (Nearest Neighbor Interchange) distance [2], TBR (Tree Bisection and Reconnection) distance and SPR (Subtree Prune and Regraft) distance [3,4]. A graph theoretical model, the maximum agreement forest (MAF) of two phylogenetic trees, has been formulated for the TBR distance and the SPR distance [5] for phylogenetic trees. Define the order of a forest to be the number of connected components in the forest. 1 Allen and Steel [6] proved that the TBR distance between two unrooted binary phylogenetic trees is equal to the order of their MAF minus 1, and Bordewich and Semple [7] proved that the rSPR distance between two rooted binary phylogenetic trees is equal to the order of their rooted version of MAF minus 1. In terms of computational complexity, it is known that computing the order of an MAF is NP-hard for two unrooted binary phylogenetic trees [5], as well as for two rooted binary phylogenetic trees [7]. A preliminary version of this work was reported in the Proceedings of the 19th International Computing and Combinatorics Conference, Lecture Notes in Computer Science, vol. 7396, pp. 567–578, 2013. This work is supported by the National Natural Science Foundation of China under Grants 61103033, 61173051, 61128006, 71221061, and Hunan Provincial Innovation Foundation For Postgraduate (CX2013B073). * Corresponding author. E-mail address: [email protected] (J. Wang). 1 The definitions for the study of maximum agreement forests have been kind of confusing. If size denotes the number of edges in a forest, then for a forest, the size is equal to the number of vertices minus the order. In particular, when the number of vertices is fixed, a forest of a large size means a small order of the forest. 0304-3975/$ – see front matter © 2014 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.tcs.2013.12.025

Algorithms for parameterized maximum agreement forest problem on multiple trees

  • Upload
    jiong

  • View
    212

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Algorithms for parameterized maximum agreement forest problem on multiple trees

JID:TCS AID:9572 /FLA Doctopic: Algorithms, automata, complexity and games [m3G; v 1.123; Prn:20/01/2014; 13:10] P.1 (1-10)

Theoretical Computer Science ••• (••••) •••–•••

Contents lists available at ScienceDirect

Theoretical Computer Science

www.elsevier.com/locate/tcs

Algorithms for parameterized maximum agreement forestproblem on multiple trees ✩

Feng Shi a, Jianxin Wang a,∗, Jianer Chen a,b, Qilong Feng a, Jiong Guo c

a School of Information Science and Engineering, Central South University, PR Chinab Department of Computer Science and Engineering, Texas A&M University, USAc Universität des Saarlandes, Campus E 1.7, Germany

a r t i c l e i n f o a b s t r a c t

Keywords:Fixed-parameter tractabilityPhylogenetic treeMaximum agreement forest

The Maximum Agreement Forest problem (MAF) asks for a largest common subforest ofa collection of phylogenetic trees. The MAF problem on two binary phylogenetic treeshas been studied extensively in the literature. In this paper, we present a group of fixed-parameter tractable algorithms for the MAF problem on multiple (i.e., two or more) binaryphylogenetic trees. Our techniques work fine for the problem for both rooted trees andunrooted trees. The computational complexity of our algorithms is comparable with thatof the known algorithms for two trees, and is independent of the number of phylogenetictrees for which a maximum agreement forest is constructed.

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

Phylogenetic trees have been widely used in the study of evolutionary biology to represent the tree-like evolution of acollection of species. While a popular way to construct a phylogenetic tree is to start from multiple alignments of genesover a set of species, different methods often lead to different trees. In order to facilitate the comparison of differentphylogenetic trees, several distance metrics have been proposed in the literature, such as Robinson–Foulds distance [1], NNI(Nearest Neighbor Interchange) distance [2], TBR (Tree Bisection and Reconnection) distance and SPR (Subtree Prune andRegraft) distance [3,4].

A graph theoretical model, the maximum agreement forest (MAF) of two phylogenetic trees, has been formulated for theTBR distance and the SPR distance [5] for phylogenetic trees. Define the order of a forest to be the number of connectedcomponents in the forest.1 Allen and Steel [6] proved that the TBR distance between two unrooted binary phylogenetictrees is equal to the order of their MAF minus 1, and Bordewich and Semple [7] proved that the rSPR distance between tworooted binary phylogenetic trees is equal to the order of their rooted version of MAF minus 1. In terms of computationalcomplexity, it is known that computing the order of an MAF is NP-hard for two unrooted binary phylogenetic trees [5], aswell as for two rooted binary phylogenetic trees [7].

✩ A preliminary version of this work was reported in the Proceedings of the 19th International Computing and Combinatorics Conference, Lecture Notesin Computer Science, vol. 7396, pp. 567–578, 2013. This work is supported by the National Natural Science Foundation of China under Grants 61103033,61173051, 61128006, 71221061, and Hunan Provincial Innovation Foundation For Postgraduate (CX2013B073).

* Corresponding author.E-mail address: [email protected] (J. Wang).

1 The definitions for the study of maximum agreement forests have been kind of confusing. If size denotes the number of edges in a forest, then for aforest, the size is equal to the number of vertices minus the order. In particular, when the number of vertices is fixed, a forest of a large size means a smallorder of the forest.

0304-3975/$ – see front matter © 2014 Elsevier B.V. All rights reserved.http://dx.doi.org/10.1016/j.tcs.2013.12.025

Page 2: Algorithms for parameterized maximum agreement forest problem on multiple trees

JID:TCS AID:9572 /FLA Doctopic: Algorithms, automata, complexity and games [m3G; v 1.123; Prn:20/01/2014; 13:10] P.2 (1-10)

2 F. Shi et al. / Theoretical Computer Science ••• (••••) •••–•••

The TBR and SPR distances of two phylogenetic trees measure the “difference” between the two trees constructed fromthe same collection of species, which can be small in practice, compared to the size of the trees. This observation and therelationship between the distances and the order of the MAF have motivated the study of parameterized algorithms for theMAF problem, where the problem is parameterized by the order k of the MAF. A parameterized problem is fixed-parametertractable [8] if it is solvable in time f (k)nO (1) , where k is the parameter and n is the instance size. In particular, for smallvalues of the parameter k, such an algorithm may solve the problem more effectively. Allen and Steel [6] showed that theMAF problem on two unrooted binary phylogenetic trees is fixed-parameter tractable. Hallett and McCartin [9] developed afaster parameterized algorithm of running time O (4kk5 + nO (1)) for the MAF problem on two unrooted binary phylogenetictrees. Whidden et al. [10] further improved the time complexity to O (4kk + n3) or O (4kn). A further faster algorithm hasbeen announced recently by Chen, Fan, and Sze [11], which runs in time O (3kn) and is currently the fastest algorithm forthe MAF problem on two unrooted binary phylogenetic trees. For the MAF problem on two rooted binary phylogenetic trees,Bordewich et al. [12] developed a parameterized algorithm of running time O (4kk4 +n3). Whidden et al. [13] improved thisbound and developed an algorithm of running time O (2.42kk + n3). This is currently the fastest algorithm for the MAFproblem on two rooted binary phylogenetic trees.

Parameterized algorithms and approximation algorithms for the MAF problem on two general (i.e., binary or non-binary)phylogenetic trees have also been studied recently [11,14].

On the other hand, the computational complexity for the MAF problem on more than two phylogenetic trees has notbeen studied as extensively as that on two trees. Note that it makes perfect sense to investigate the MAF problem on morethan two phylogenetic trees: we may construct the phylogenetic trees for the same collection of species using more thantwo methods, each producing a different tree. An MAF of order k for a set of phylogenetic trees means that for any twophylogenetic trees in the given set, one can be obtained from the other via no more than k − 1 TBR operations (for unrootedtrees) or no more than k − 1 rSPR operations (for rooted trees). However, it seems much more difficult to construct an MAFfor more than two trees than that for two trees. For example, while there have been several polynomial-time approximationalgorithms of ratio 3 for the MAF problem on two rooted binary phylogenetic trees [10,14] (the same ratio even holds truefor the MAF problem on two unrooted multifurcating trees [11]), the best polynomial-time approximation algorithm [15]for the MAF problem on more than two rooted binary phylogenetic trees has a ratio 8. To our best knowledge, it is stillunknown whether the MAF problem on more than two unrooted binary phylogenetic trees is fixed-parameter tractable. Forthe MAF problem on more than two rooted binary phylogenetic trees, the only parameterized algorithm was presented byChen and Wang [17], which runs in time O ∗(6k).

In the current paper, we will be focused on parameterized algorithms for the MAF problem on multiple (i.e., two or more)binary phylogenetic trees, for both the version of rooted trees and the version of unrooted trees. Our main contributionsinclude an O (3kn)-time parameterized algorithm for the MAF problem on multiple rooted binary phylogenetic trees, and anO (4kn)-time parameterized algorithm for the MAF problem on multiple unrooted binary phylogenetic trees. Our algorithmfor the MAF problem on multiple rooted binary phylogenetic trees improves the previously best algorithm of running timeO ∗(6k) [17]. We also give that the MAF problem on multiple unrooted binary phylogenetic trees is fixed-parameter tractable.

Our algorithms are based on the following simple ideas that, however, require a careful and efficient implementation. LetC = {T1, T2, . . . , Tm} be a collection of rooted or unrooted binary phylogenetic trees. Note that an MAF of order k for thetrees in C must be an agreement forest for the first two trees T1 and T2, which although may not be necessarily maximum.Therefore, if we can essentially examine all agreement forests of order bounded by k for the trees T1 and T2, then we caneasily check if any of them is an MAF for all the trees in C (note that checking if a forest is a subgraph of a tree in C iseasy). In order to implement this idea, however, we must overcome the following difficulties. First, we must ensure thatno agreement forest in our concern is missing. This in fact requires new and non-trivial techniques: all MAF algorithms fortwo trees proposed in the literature are based on resolving conflicting structures in the two trees, and do not guaranteeexamining all agreement forests of order bounded by k. The conflicting structures help to identify edges in the trees whoseremoval leads to the construction of the MAF. Therefore, if the two trees T1 and T2 do not conflict much (in an extreme case,T1 and T2 are isomorphism), then an MAF for T1 and T2 may not help much for constructing an MAF for all the trees in C .Secondly, with the assurance that essentially all concerned agreement forests for T1 and T2 are examined, we must makesure that our algorithms are sufficiently efficient. This goal has also been nicely achieved: compared with the algorithmspublished in the literature, our O (3kn)-time algorithm for the MAF problem on multiple rooted binary phylogenetic trees isasymptotically faster than the best published algorithm of running time O (4knO (1)) [7] for the MAF problem on two rootedbinary phylogenetic trees and the best algorithm of running time O ∗(6k) [17] for the MAF problem on multiple rootedbinary phylogenetic trees, and our O (4kn)-time algorithm for the MAF problem on multiple unrooted binary phylogenetictrees matches the computational complexity of the best published algorithm for the MAF problem on two unrooted binaryphylogenetic trees [9]. Moreover, the algorithms for two trees such as [7,9] do not seem to be extendable to the problemson more than two trees, while our algorithms work fine for the MAF problems for an arbitrary number of trees.2

2 Upon the final preparation of this manuscript, we were informed that some new algorithms have been published for the MAF problems on two rootedtrees [13,18] and on two unrooted trees [11] that have slightly improved complexity bounds. The algorithm presented by Chen and Wang [18] is currentlythe best algorithm for the MAF problem on two rooted trees of time O (2.344kn). However, all these algorithms again do not seem to be extendable to theproblems on more than two trees.

Page 3: Algorithms for parameterized maximum agreement forest problem on multiple trees

JID:TCS AID:9572 /FLA Doctopic: Algorithms, automata, complexity and games [m3G; v 1.123; Prn:20/01/2014; 13:10] P.3 (1-10)

F. Shi et al. / Theoretical Computer Science ••• (••••) •••–••• 3

2. Definitions and the problem formulations

Our definitions are consistent with those used in the literature [5,7,9,13]. A tree is a single-vertex tree if it consists of asingle vertex, which is defined to be the leaf of the tree. A tree is a single-edge tree if it consists of a single edge. A tree isbinary if either it is a single-vertex tree or each of its vertices has degree either 1 or 3. The degree-1 vertices are leaves andthe degree-3 vertices are non-leaves of the tree. There are two versions in our problem formulation, one is on unrooted treesand the other is on rooted trees. We first give the terminologies on the unrooted version, then remark on the differencesfor the rooted version. Let X be a fixed label-set.

2.1. Unrooted X-trees and X-forests

A binary tree is unrooted if no root is specified in the tree – in this case no ancestor-descendant relation is defined inthe tree. For the label-set X , an unrooted binary phylogenetic X-tree, or simply an unrooted X-tree, is an unrooted binarytree whose leaves are labeled bijectively by the label-set X (all non-leaves are not labeled). An unrooted X-tree will alsobe called an (unrooted) leaf-labeled tree if the label-set X is irrelevant. A subforest of an unrooted X-tree T is a subgraphof T , and a subtree of T is a connected subgraph of T . An unrooted X-forest F is a subforest of an unrooted X-tree Tthat contains all leaves of T such that each connected component of F contains at least one leaf in T . Thus, an unrootedX-forest F is a collection of leaf-labeled trees whose label-sets are disjoint such that the union of the label-sets is equalto X . Define the order of the X-forest F , denoted Ord(F ), to be the number of connected components in F . For a subset X ′of the label-set X , the subtree induced by X ′ in an unrooted X-tree T , denoted by T [X ′], is the minimal subtree of T thatcontains all leaves with labels in X ′ .

A subtree T ′ of an unrooted X-tree may contain unlabeled vertices of degree less than 3. In this case we apply the forcedcontraction operation on T ′ , which replaces each degree-2 vertex v and its incident edges with a single edge connecting thetwo neighbors of v , and removes each unlabeled vertex that has degree smaller than 2. Note that the forced contractiondoes not change the order of an X-forest. An X-forest F is strongly reduced if the forced contraction does not apply to F .It has been well-known that the forced contraction operation does not affect the construction of an MAF for X-trees (see,for example, [9,12]). Therefore, we will assume that the forced contraction is applied immediately whenever it is applicable.Thus, the X-forests in our discussion are always assumed to be strongly reduced. With this assumption, an unlabeled vertexin an unrooted X-trees is always of degree 3. If a leaf-labeled forest F ′ is isomorphic to a subforest of an X-forest F (up tothe forced contraction), then we will simply say that F ′ is a subforest of F .

2.2. Rooted X-trees and X-forests

A binary tree is rooted if a particular leaf is designated as the root (thus, this vertex is both a root and a leaf), whichspecifies a unique ancestor-descendant relation in the tree. A rooted X-tree is a rooted binary tree whose leaves (includingthe root) are labeled bijectively by the label-set X . The root of an X-tree will always be labeled by a special symbol ρ ,which is always assumed to be in the label-set X . A subtree T ′ of a rooted X-tree T is a connected subgraph of T whichcontains at least one leaf in T . In order to preserve the ancestor-descendant relation in T , we should define the root of thesubtree of T . If T ′ contains the vertex labeled ρ , certainly, it is the root of the subtree; if T ′ does not contain the vertexlabeled ρ , the node in T ′ which is the least common ancestor of the leaves in T ′ is defined to be the root of T ′ . A subforestof a rooted X-tree T is defined to be a subgraph of T . A (rooted) X-forest F is a subforest of a rooted X-tree T that containsa collection of subtrees whose label-sets are disjoint such that the union of the label-sets is equal to X . Thus, one of thesubtrees in a rooted X-forest F must have the vertex labeled ρ as its root.

We again assume that the forced contraction is applied on a subtree T ′ of a rooted X-tree T immediately whenever it isapplicable. However, if the root r of the subtree T ′ is of degree 2, then the operation will not be applied on r, in order topreserve the ancestor-descendant relation in T . Therefore, after the forced contraction, the root of a subtree T ′ of a rootedX-tree is either an unlabeled vertex that has degree 2 in T ′ , or the vertex labeled ρ that has degree 1 in T ′ , or a vertexwith a label in X that has degree 0 in T ′ . Each unlabeled vertex in T ′ that is not the root of T ′ has degree exactly 3 in T ′ .We say that a leaf-labeled forest F ′ is a subforest of a rooted X-forest F if up to forced contraction, there is an isomorphismbetween F ′ and a subforest of F that preserves the ancestor-descendant relation.

2.3. Agreement forests

The following terminologies can be used for both rooted version and unrooted version.Fixed a label-set X . An X-forest F is an agreement forest for a collection {F1, F2, . . . , Fm} of X-forests if F is a subforest

of Fi , for all i. A maximum agreement forest (abbr. MAF) F ∗ for {F1, F2, . . . , Fm} is an agreement forest for {F1, F2, . . . , Fm}with a minimum Ord(F ∗) over all agreement forests for {F1, F2, . . . , Fm}.

The problem we are focused on in this paper is a parameterized version of the Maximum Agreement Forest Problem foran arbitrary number of X-forests, which has a rooted version and an unrooted version. The problems are formally given asfollows.

Page 4: Algorithms for parameterized maximum agreement forest problem on multiple trees

JID:TCS AID:9572 /FLA Doctopic: Algorithms, automata, complexity and games [m3G; v 1.123; Prn:20/01/2014; 13:10] P.4 (1-10)

4 F. Shi et al. / Theoretical Computer Science ••• (••••) •••–•••

rooted maximum agreement forest (rooted-maf)Input: A set {F1, F2, . . . , Fm} of rooted X-forests, and a parameter kOutput: an agreement forest F ∗ for {F1, F2, . . . , Fm} with Ord(F ∗) � k, or report that no such agreement forest exists

unrooted maximum agreement forest (unrooted-maf)Input: A set {F1, F2, . . . , Fm} of unrooted X-forests, and a parameter kOutput: an agreement forest F ∗ for {F1, F2, . . . , Fm} with Ord(F ∗) � k, or report that no such agreement forest exists

A special case of the above definitions, in which each of the X-forests F1, F2, . . . , Fm is an X-tree, is the standard Max-imum Agreement Forest Problem on multiple binary phylogenetic trees, for the rooted version and the unrooted version,respectively.

The following concept, on two X-forests, will play an important role in our discussion, which applies to both the versionof rooted trees and the version of unrooted trees.

Definition 2.1. Let F1 and F2 be two X-forests (either both rooted or both unrooted). An agreement forest F for F1 and F2is a maximal agreement forest (maximal-AF) for F1 and F2 if there is no agreement forest F ′ for F1 and F2 such that F is asubforest of F ′ and Ord(F ′) < Ord(F ).

By definition, an MAF for two X-forests F1 and F2 is also a maximal-AF for F1 and F2, but the inverse is not necessarilytrue. The following lemma will be useful in our discussion, whose proof follows directly from the definition.

Lemma 2.2. Every agreement forest for two X-forests F1 and F2 is a subforest of a maximal-AF F ′ for F1 and F2 , but F ′ may not beunique.

3. Maximal-AF for two X-forests

Fix a label-set X . Because of the bijection between the leaves in an X-forest F and the elements in the label-set X ,sometimes we will use, without confusion, a label in X to refer to the corresponding leaf in F , or vice versa.

Let F1 and F2 be two X-forests, either both are rooted or both are unrooted. In this section, we discuss how we enu-merate all maximal-AFs for F1 and F2. The discussion is divided into the case for the rooted version and the case for theunrooted version.

3.1. Rooted maximal-AF

In this case, both F1 and F2 are rooted X-forests. We proceed by repeatedly removing edges in F1 and F2 until certaincondition is met. Let F ∗ be a fixed maximal-AF for F1 and F2.

Two labels a and b (and their corresponding leaves) in a forest are siblings if they have the common parent. We startwith the following simple lemma.

Lemma 3.1. Let F1 and F2 be two strongly reduced rooted X-forests. If F2 contains no sibling pairs, then F1 and F2 have a uniquemaximal-AF that can be constructed in linear time.

Proof. Let T be a connected component of F2, which is a rooted leaf-labeled tree. If F2 contains no sibling pairs, thenneither does T . Therefore, if T does not contain the leaf ρ , then T must be a single-vertex tree whose unique leaf is alabeled vertex. If T contains the leaf ρ , then T is either a single-vertex tree whose leaf is labeled ρ or a single-edge treewhose root is ρ with a unique child that is labeled by an element τ in X \ {ρ}. Thus, all connected components of theX-forest F2 are single-vertex trees, with the exception of at most one that is a single-edge tree whose two leaves arelabeled by the elements ρ and τ in X . Therefore, if the leaves ρ and τ are in the same connected component in theX-forest F1, then the (unique) maximal-AF for F1 and F2 is the X-forest F2 itself. On the other hand, if ρ and τ are indifferent connected components in F1, then the maximal-AF (again unique) for F1 and F2 consists of only single-vertextrees, each is labeled by an element in X . �

By Lemma 3.1, therefore, in the following discussion, we will assume that the rooted X-forest F2 has a sibling pair (a,b).By definition, a and b cannot be ρ . Let p2, which is an unlabeled vertex, be the parent of a and b in F2. If one of a and bis a single-vertex tree in the X-forest F1, then we can remove the edge in F2 that is incident to the label, and break up thesibling pair in F2. Thus, in the following discussion, we assume that none of a and b is a single-vertex tree in F1. Let p1and p′

1 be the parents of a and b in F1, respectively. We consider all possible cases for the labels a and b in the X-forest F1.

Case 1. The labels a and b are in different connected components in F1.

Page 5: Algorithms for parameterized maximum agreement forest problem on multiple trees

JID:TCS AID:9572 /FLA Doctopic: Algorithms, automata, complexity and games [m3G; v 1.123; Prn:20/01/2014; 13:10] P.5 (1-10)

F. Shi et al. / Theoretical Computer Science ••• (••••) •••–••• 5

Fig. 1. Rooted trees: the path connecting the labels a and b in F1.

In this case, a and b cannot be in the same connected component in the maximal-AF F ∗ . Therefore, one of the edges[a, p2] and [b, p2] in the X-forest F2 must be removed, which forces one of the labels a and b to be a single-vertex tree inthe maximal-AF F ∗ . Therefore, in this case, we apply the following branching step:

Step 1.

(branch-1) remove the edge [a, p1] in F1 and the edge [a, p2] in F2 to make a a single-vertex tree in both F1 and F2;(branch-2) remove the edge [b, p′

1] in F1 and the edge [b, p2] in F2 to make b a single-vertex tree in both F1 and F2.

One of these two branches must keep the maximal-AF F ∗ still a maximal-AF for the new X-forests F1 and F2.

Case 2. The labels a and b are also siblings in F1, i.e., p1 = p′1.

Since F ∗ is a maximal-AF, in this case, a and b must be also siblings in F ∗ . Therefore, the structure that consists of thetwo labels a and b and their common parent remains unchanged when we construct F ∗ for F1 and F2 by removing edgesin F1 and F2. Thus, this structure can be regarded as a single leaf labeled by a “combined” label ab in both F1 and F2. Toimplement this, we apply the following step:

Step 2. Remove a and b, and make their parent a new leaf labeled ab, in both F1 and F2.

We will call the operation in Step 2 “shrinking a and b into a new label ab”. Note that this step not only changes thestructure of F1 and F2, but also replaces the label-set X with a new label-set (X \ {a,b}) ∪ {ab}. If we also apply thisoperation in the maximal-AF F ∗ , then the new F ∗ remains a maximal-AF for the new F1 and F2.

Case 3. The labels a and b are in the same connected component in F1 but are not siblings.

Let P = {a, c1, c2, . . . , cr,b} be the unique path in F1 that connects a and b, in which let ch be the least common ancestorof a and b, where 1 � h � r. Since a and b are not siblings, r � 2. See Fig. 1 for an illustration. There are three possiblesubcases.

Subcase 3.1. The label a is a single-vertex tree in F ∗ . In this subcase, removing the edge incident to a in both F1 and F2keeps F ∗ a maximal-AF for F1 and F2.

Subcase 3.2. The label b is a single-vertex tree in F ∗ . In this subcase, removing the edge incident to b in both F1 and F2keeps F ∗ a maximal-AF for F1 and F2.

Subcase 3.3. Neither of a and b is a single-vertex tree in F ∗ . In this subcase, the two edges that are incident to a and bin the X-forest F2 must be kept in F ∗ . Therefore, a and b are siblings in F ∗ . On the other hand, in order to make a andb siblings in the X-forest F1, all edges that are not on the path P but are incident to a vertex c j in P , where j �= h, mustbe removed (note that this is because the subtrees in an X-forest must preserve the ancestor-descendant relation). Notethat since r � 2, there is at least one such edge. Therefore, in this subcase, if we remove all these edges, then F ∗ remains amaximal-AF for F1 and F2.

Summarizing the above analysis, in Case 3, we apply the following step:

Step 3.

(branch-1) remove the edge incident to a in both F1 and F2;(branch-2) remove the edge incident to b in both F1 and F2;(branch-3) remove all edges in F1 that are not on the path P connecting a and b but are incident to a vertex in P , except

the one that is incident to the least common ancestor of a and b.

One of these branches keeps F ∗ a maximal-AF for the new F1 and F2.

Page 6: Algorithms for parameterized maximum agreement forest problem on multiple trees

JID:TCS AID:9572 /FLA Doctopic: Algorithms, automata, complexity and games [m3G; v 1.123; Prn:20/01/2014; 13:10] P.6 (1-10)

6 F. Shi et al. / Theoretical Computer Science ••• (••••) •••–•••

Therefore, for two given rooted X-forests F1 and F2, if we iteratively apply the above process, branching accordinglybased on the cases, then the process will end up with a pair (F1, F2) in which F2 contains no sibling pairs. When thisoccurs, the process applies the following step:

Final step. If F2 contains no sibling pairs, then construct the (unique) maximal-AF F ∗ for F1 and F2, and convert F ∗ intoan agreement forest for the original F1 and F2.

When F2 contains no sibling pairs, by Lemma 3.1, we can construct the maximal-AF F ∗ for F1 and F2 in linear time.The forest F ∗ may not be a subforest of the original F1 and F2 because Step 2 shrinks labels. For this, we should “expand”the shrunk labels, in a straightforward way. Note that this expanding process may be applied iteratively.

Summarizing the above discussion, we conclude with the following lemma.

Lemma 3.2. Let F1 and F2 be two rooted X-forests. If we apply Steps 1–3 iteratively until F2 contains no sibling pairs, then for everymaximal-AF F ∗ for the original F1 and F2 , at least one of the branches in the process produces the maximal-AF F ∗ in its Final step.

Proof. Fix a maximal-AF F ∗ for F1 and F2. By the above analysis, for each of the cases, at least one of the branches in thecorresponding step keeps F ∗ a maximal-AF for F1 and F2. Moreover, when F2 contains no sibling pairs, the maximal-AF forF1 and F2 becomes unique. Combining these two facts, we conclude that at least one of the branches in the process endsup with a pair F1 and F2 whose maximal-AF, after the final step, is F ∗ . Since F ∗ is an arbitrary maximal-AF for F1 and F2,the lemma is proved. �3.2. Unrooted MAF

The analysis for the unrooted version proceeds in a similar manner. However, since an unrooted tree enforces noancestor-descendant relation in the tree, subtrees in the tree have no requirement of preserving such a relation. This factinduces certain subtle differences.

Let F1 and F2 be two unrooted X-forests, and let F ∗ be a fixed maximal-AF for F1 and F2. Recall that we assume F1and F2 to be strongly reduced.

Two labels a and b (and their corresponding leaves) in an unrooted X-forest F are siblings if either they are the twoleaves of a single-edge tree in F , or they are adjacent to the same non-leaf vertex in F , which will be called the “parent” ofa and b.

An unrooted X-forest with no sibling pairs has an even simpler structure: all of its connected components must besingle-vertex trees. Thus, we again have the following lemma:

Lemma 3.3. Let F1 and F2 be two unrooted X-forests. If F2 contains no sibling pairs, then the maximal-AF for F1 and F2 can beconstructed in linear time.

Thus, again we will assume that the unrooted X-forest F2 has a sibling pair (a,b). Similarly, if a or b is a single-vertextree in F1, then we can remove an edge in F2 to break up the sibling pair in F2. Thus, we will assume that none of a and bis a single-vertex tree in F1.

Case 1. The labels a and b are in different connected components in F1.

In this case, again one of the labels a and b must be a single-vertex tree in the maximal-AF F ∗ . Therefore, we apply thefollowing step:

Step 1.

(branch-1) remove the edge incident to a in both F1 and F2 to make a a single-vertex tree in both F1 and F2;(branch-2) remove the edge incident to b in both F1 and F2 to make b a single-vertex tree in both F1 and F2.

Case 2. The labels a and b are also siblings in F1.

We have to be a bit more careful for this case since a sibling pair may come from a single-edge tree. There are fourdifferent cases: (1) a and b come from a single-edge tree in both F1 and F2; (2) a and b come from a single-edge treein F1 but have a common parent p2 in F2; (3) a and b have a common parent p1 in F1 but come from a single-edge treein F2; and (4) a and b have a common parent p1 in F1 and also have a common parent p2 in F2. By a careful analysis andnoticing that F ∗ is maximal, we can verify that in all these subcases it is always safe to shrink a and b into a new label,which is implemented by the following step:

Page 7: Algorithms for parameterized maximum agreement forest problem on multiple trees

JID:TCS AID:9572 /FLA Doctopic: Algorithms, automata, complexity and games [m3G; v 1.123; Prn:20/01/2014; 13:10] P.7 (1-10)

F. Shi et al. / Theoretical Computer Science ••• (••••) •••–••• 7

Fig. 2. Unrooted trees: the path connecting the labels a and b in F1.

Step 2. Shrink the labels a and b in both F1 and F2: if a and b have a common parent, then remove the edges incident toa and b and make their parent a new leaf labeled ab; if a and b come from a single-edge tree, then combine them into asingle vertex labeled ab.

After this process, the maximal-AF F ∗ for F1 and F2, in which labels a and b are also shrunk, remains a maximal-AF forthe new F1 and F2.

Case 3. The labels a and b are in the same connected component in F1 but are not siblings.

Let P = {a, c1, c2, . . . , cr,b} be the unique path in F1 that connects a and b, where r � 2. See Fig. 2 for an illustration. Thecases in which either a or b is a single-vertex tree in F ∗ again cause removing the edge incident to a or b in F1. However,when a and b are siblings in F ∗ , then in F1, at most one of the edges that are not on the path P but are incident to avertex in P can be kept. However, since the subtree in an unrooted forest does not need to preserve any ancestor-descendantrelation, we cannot decide which of these edges should be kept. On the other hand, since r � 2, we know at least one ofthe two edges, which are not on the path P but are incident to c1 and cr , respectively, must be removed. Therefore, we canbranch by either removing the one incident to c1 or removing the one incident to cr . In summary, in Case 3, we apply thefollowing step:

Step 3.

(branch-1) remove the edge incident to a in both F1 and F2;(branch-2) remove the edge incident to b in both F1 and F2;(branch-3) remove the edge incident to c1 but not on the path P in F1;(branch-4) remove the edge incident to cr but not on the path P in F1.

One of these branches keeps F ∗ a maximal-AF for the new F1 and F2.Again if the unrooted X-forest F2 contains no sibling pairs, then we apply Lemma 3.3 to construct the maximal-AF for

F1 and F2 by the following step:

Final step. If F2 contains no sibling pairs, then construct the (unique) maximal-AF F ∗ for F1 and F2, and convert F ∗ intoan agreement forest for the original F1 and F2.

The above analysis finally gives the following conclusion, whose proof is exactly the same as that of Lemma 3.2 for therooted version.

Lemma 3.4. Let F1 and F2 be two unrooted X-forests. If we apply Steps 1–3 iteratively until F2 contains no sibling pairs, then for everymaximal-AF F ∗ for the original F1 and F2 , at least one of the branches in the process produces the maximal-AF F ∗ in its Final step.

4. The parameterized algorithms

Now we are ready for presenting the parameterized algorithms for the maf problem, for both the rooted version as wellas the unrooted version. Let F1, F2, . . . , Fm be m X-forests, either all are rooted or all are unrooted. We first give a fewlemmas, which hold true for both rooted version and unrooted version. Assume m � 3.

The first lemma follows directly from the definition.

Lemma 4.1. Let F ′ be an agreement forest for F1 and F2 . Then every agreement forest for {F ′, F3, . . . , Fm} is an agreement forestfor {F1, F2, . . . , Fm}. Moreover, if F ′ contains an MAF for {F1, F2, . . . , Fm}, then an MAF for {F ′, F3, . . . , Fm} is also an MAF for{F1, F2, . . . , Fm}.

Lemma 4.2. For every MAF F for {F1, F2, . . . , Fm}, there is a maximal-AF F ∗ for F1 and F2 such that F is also an MAF for{F ∗, F3, . . . , Fm}.

Page 8: Algorithms for parameterized maximum agreement forest problem on multiple trees

JID:TCS AID:9572 /FLA Doctopic: Algorithms, automata, complexity and games [m3G; v 1.123; Prn:20/01/2014; 13:10] P.8 (1-10)

8 F. Shi et al. / Theoretical Computer Science ••• (••••) •••–•••

Algorithm Rt-MAF(F1, F2, . . . , Fm;k)

Input: a collection {F1, F2, . . . , Fm} of rooted X-forests, m � 1, and a parameter kOutput: an agreement forest F ∗ for {F1, F2, . . . , Fm} with Ord(F ∗) � k if such an F ∗ exists

1. if (m = 1) then if (Ord(F1) � k) then return F1 else return(‘no’);2. if (Ord(F1) > k) then return(‘no’);3. if a label a is a single-vertex tree in exactly one of F1 and F2 then make a a single-vertex tree in both F1 and F2;4. if F2 has no sibling pairs then let F ′ be the maximal-AF for F1 and F2; return Rt-MAF(F ′, F3, . . . , Fm;k);5. let (a,b) be a sibling pair in F2;6. if a and b are in different connected components in F1 then branch:

1. make a a single-vertex tree in both F1 and F2; return Rt-MAF(F1, F2, . . . , Fm;k);2. make b a single-vertex tree in both F1 and F2; return Rt-MAF(F1, F2, . . . , Fm;k);

7. if a and b are also siblings in F1 then shrink a, b into a new leaf ab in both F1 and F2; return Rt-MAF(F1, F2, . . . , Fm;k);8. let P = {a, c1, . . . , cr ,b} be the unique path in F1 connecting a and b, r � 2; then branch:

1. make a a single-vertex tree in both F1 and F2; return Rt-MAF(F1, F2, . . . , Fm;k);2. make b a single-vertex tree in both F1 and F2; return Rt-MAF(F1, F2, . . . , Fm;k);3. remove all edges in F1 that are not on P but incident to a vertex in P , except the one incident to the least common

ancestor of a, b; return Rt-MAF(F1, F2, . . . , Fm;k).

Fig. 3. Algorithm for the rooted-maf problem.

Proof. Let F0 be an MAF for {F1, F2, . . . , Fm}. Then F0 is an agreement forest for F1 and F2. Let F ∗ be a maximal-AF for F1and F2 that has F0 as a subforest. Then F0 is an agreement forest for {F ∗, F3, . . . , Fm}. Therefore, the order of an MAF for{F ∗, F3, . . . , Fm} is at most Ord(F0). On the other hand, since F ∗ is a subforest of both F1 and F2, every agreement forestfor {F ∗, F3, . . . , Fm} is also an agreement forest for {F1, F2, . . . , Fm}. Therefore, the order of an MAF for {F ∗, F3, . . . , Fm} isat least Ord(F0), thus must be equal to Ord(F0). Since F0 is an agreement forest for {F ∗, F3, . . . , Fm}, F0 must be an MAFfor {F ∗, F3, . . . , Fm}. �

Now we consider an instance (F1, F2, . . . , Fm;k) of the parameterized maf problem, which is either rooted or un-rooted. For a subforest F ′ of a forest F , we always have Ord(F ) � Ord(F ′). Therefore, no maximal-AF F for F1 and F2with Ord(F ) > k can contain an MAF F ′ for (F1, F2, . . . , Fm) with Ord(F ′) � k. Therefore, we only need to examine allmaximal-AFs whose order is bounded by k. An outline of our algorithm works as follows:

Main-Algorithm1. construct a collection C of agreement forests for F1 and F2 that contains all maximal-AF F ∗ for F1 and F2 with

Ord(F ∗) � k;2. for each F in the collection C constructed in Step 1 do recursively work on the instance (F , F3, . . . , Fm;k).

Theorem 4.3. The Main-Algorithm correctly returns an agreement forest F for {F1, F2, . . . , Fm} with Ord(F ) � k if such an agreementforest exists.

Proof. For each F ′ in the collection C , by Lemma 4.1, any solution to the instance (F ′, F3, . . . , Fm;k) returned by Step 2 isalso a solution to the instance (F1, F2, . . . , Fm;k). On the other hand, if (F1, F2, . . . , Fm;k) has a solution, then an MAF F0for {F1, F2, . . . , Fm} satisfies Ord(F0) � k. For the maximal-AF F ∗ for F1 and F2 that contains F0, by Lemma 4.2, F0 is also asolution to (F ∗, F3, . . . , Fm;k), which is an instance examined in Step 2. On this instance, Step 2 will return a solution that,by Lemma 4.1, is also a solution to (F1, F2, . . . , Fm;k). �

In the following, we present the details for the Main-Algorithm for the rooted version and the unrooted version, re-spectively. According to Theorem 4.3, we must carefully check that all maximal-AFs F for F1 and F2 with Ord(F ) � k beconstructed in the collection C . Moreover, we should develop algorithms that achieve the desired complexity bounds.

4.1. A parameterized algorithm for rooted-maf

The parameterized algorithm for the rooted-maf problem is a combination of the analysis given in Section 3.1 and theMain-Algorithm, which is given in Fig. 3.

We consider the correctness and the complexity of the algorithm. To make our discussion more specific, we denote by( F̄1, F̄2, . . . , F̄m;k) the original input to the algorithm, and initiate with Fi = F̄ i for 1 � i � m.

The algorithm is a branch-and-search process, in which Step 6 and Step 8 contain branches. The execution of the algo-rithm can be depicted by a search tree T whose leaves correspond to conclusions or solutions generated by the algorithmbased on different branches. Each internal node of the search tree T corresponds to a branch in the search process atStep 6 or Step 8 based on an instance of the problem. The root of the tree T is on the instance that is the original input tothe algorithm. We will call a path from the root to a leaf in the search tree T a computational path in the process, whichcorresponds to a particular sequence of executions in the algorithm that leads to a conclusion or solution. The algorithmreturns an agreement forest for the original input if and only if there is a computational path that outputs the forest.

Page 9: Algorithms for parameterized maximum agreement forest problem on multiple trees

JID:TCS AID:9572 /FLA Doctopic: Algorithms, automata, complexity and games [m3G; v 1.123; Prn:20/01/2014; 13:10] P.9 (1-10)

F. Shi et al. / Theoretical Computer Science ••• (••••) •••–••• 9

Algorithm UR-MAF(F1, F2, . . . , Fm;k)

Input: a collection {F1, F2, . . . , Fm} of unrooted X-forests, m � 1, and a parameter kOutput: an agreement forest F ∗ for {F1, F2, . . . , Fm} with Ord(F ∗) � k if such an F ∗ exists\\ Steps 1–7 are the same as that of the algorithm Rt-MAF as given in Fig. 3;\\ a and b are siblings in F2

8. let P = {a, c1, . . . , cr ,b} be the unique path in F1 connecting a and b, r � 2;then branch:

1. make a a single-vertex tree in both F1 and F2; return UR-MAF(F1, F2, . . . , Fm;k);2. make b a single-vertex tree in both F1 and F2; return UR-MAF(F1, F2, . . . , Fm;k);3. remove the edge incident to c1 but not on P in F1; return UR-MAF(F1, F2, . . . , Fm;k);4. remove the edge incident to cr but not on P in F1; return UR-MAF(F1, F2, . . . , Fm;k).

Fig. 4. Algorithm for the unrooted-maf problem.

We first study the correctness of the algorithm.According to Step 1, the algorithm is correct when m = 1. Therefore, we will assume that m > 1 and that the algorithm

is correct when the input contains no more than m − 1 X-forests.If Ord(F1) > k, then an MAF F ′ for (F1, F2, . . . , Fm), which is a subforest of F1, must have Ord(F ′) > k. Thus, the instance

(F1, F2, . . . , Fm;k) is a ‘no’. Step 2 is correct.If a label a is a single-vertex tree in one of F1 and F2, then all agreement forests for F1 and F2 must have a in a

single-vertex tree. Thus, it is safe to remove the edge incident to a in the other forest where a is not a single-vertex tree.Thus, Step 3 is correct.

If F2 has no sibling pairs, then by Lemma 3.1, the unique maximal-AF F ′ for F1 and F2 can be constructed in linear time.Since F ′ is the unique maximal-AF for F1 and F2, by Lemma 4.2, the instances (F1, F2, . . . , Fm;k) and (F ′, F3, . . . , Fm;k)

have the same set of MAFs. By our induction, the algorithm works correctly on (F ′, F3, . . . , Fm;k). Thus, Step 4 is correct.If the instance (F1, F2, . . . , Fm;k) reaches Step 5, then the X-forest F2 has a sibling pair (a,b), and a and b are not

single-vertex trees in F1. Steps 6–8 are applied on the X-forests F1 and F2 recursively (during the recursion, Step 3 mayalso be applied). These steps remove edges in F1 and F2 thus reduce the sizes of the forests (Step 7 does not removeedges, but it reduces the size of F1 and F2 without changing their essential structures). Thus, the steps keep the situation,recursively, that F1 and F2 are subforests of F̄1 and F̄2, respectively. This means that during the process of these steps, everyagreement forest for {F1, F2, . . . , Fm} remains an agreement forest for the original { F̄1, F̄2, . . . , F̄m}. These steps continueuntil the condition in either Step 2 or Step 4 is met. By the discussion above, Step 2 or Step 4 then will return a correctsolution to the instance (F1, F2, . . . , Fm;k), which is either an answer ‘no’, or an agreement forest F ∗ for {F1, F2, . . . , Fm}with Ord(F ∗) � k, which is also a solution to the original input ( F̄1, F̄2, . . . , F̄m;k). Thus, no computational path in thealgorithm can output an X-forest that is not a solution to the original input ( F̄1, F̄2, . . . , F̄m;k). In particular, if the originalinput ( F̄1, F̄2, . . . , F̄m;k) is a ‘no’ for the rooted-maf problem, then the algorithm Rt-MAF must return an answer ‘no’. On theother hand, suppose that ( F̄1, F̄2, . . . , F̄m;k) is a ‘yes’ and { F̄1, F̄2, . . . , F̄m} has an MAF F ∗ with Ord(F ∗) � k. Let F ′ be themaximal-AF for F̄1 and F̄2 that has F ∗ as a subforest (note Ord(F ′) � Ord(F ∗) � k). By Lemma 3.2, there is a computationalpath that starts with F1 = F̄1 and F2 = F̄2, and applies Steps 6–8 recursively until F1 and F2 satisfy the condition of Step 4.Step 4 then constructs the maximal-AF F ′ for F̄1 and F̄2. By Lemma 4.1 and our induction, the recursive call in Step 4 willreturn an agreement forest F for { F̄1, F̄2, . . . , F̄m} with Ord(F ) � k. Therefore, the algorithm also works correctly in thiscase.

This completes the proof of the correctness of the algorithm. Now we consider the complexity of the algorithm. Becauseof Step 6 and Step 8, each branch in the search tree T can make at most three ways. Moreover, by examining Steps 6and 8, it is easy to verify that between two consecutive branches in a computational path, the value Ord(F1) is increasedby at least 1. Since the algorithm will stop at Step 2 when Ord(F1) > k, each computational path in the search tree T cango through at most k branches. As a consequence, the number of leaves in the search tree T is bounded by 3k . It is alsoeasy to verify that between two consecutive branches, the computational path takes time O (n), where n is the number ofvertices in the input instance. Summarizing all these together, we conclude that the algorithm Rt-MAF(F1, F2, . . . , Fm;k) hasits running time bounded by O (3kn).

Theorem 4.4. The rooted-maf problem can be solved in time O (3kn), where n is the number of vertices in the input instance.

4.2. A parameterized algorithm for unrooted-maf

Similar to the one for rooted-maf, the algorithm for the unrooted-maf problem is a combination of the analysis given inSection 3.2 and the Main-Algorithm. Comparing the analysis for the rooted version given in Section 3.1 and the analysis forthe unrooted version given in Section 3.2, we can see that they only differ for Case 3: Case 3 in Section 3.1 branches intothree ways while Case 3 in Section 3.2 branches into four ways. Therefore, the two algorithms only need to differ in Step 8,which handles Case 3 in the analysis for the two versions. The parameterized algorithm for the unrooted-maf problem isgiven in Fig. 4.

Page 10: Algorithms for parameterized maximum agreement forest problem on multiple trees

JID:TCS AID:9572 /FLA Doctopic: Algorithms, automata, complexity and games [m3G; v 1.123; Prn:20/01/2014; 13:10] P.10 (1-10)

10 F. Shi et al. / Theoretical Computer Science ••• (••••) •••–•••

The proof for the correctness for the algorithm UR-MAF proceeds in exactly the same way, based on the analysis inSection 3.2, as that for the algorithm Rt-MAF, which was based on the analysis in Section 3.1. For the computational com-plexity, since Step 8 of the algorithm UR-MAF branches into four ways, the search tree T for the algorithm UR-MAF hasfour-way branches. Therefore, we can only conclude that the number of leaves in the search tree T is bounded by 4k . Allother analysis is the same as that for the algorithm Rt-MAF. As a result, we conclude that the algorithm UR-MAF runs intime O (4kn).

Theorem 4.5. The unrooted-maf problem can be solved in time O (4kn), where n is the number of vertices in the input instance.

5. Conclusion

Let T be a rooted binary phylogenetic tree. An rSPR-operation on T is to remove a subtree T ′ rooted at a vertex wfrom T , subdivide an edge [u, v] in T \ T ′ by a new vertex w ′ , then make w a child of w ′ [7]. The rSPR-distance betweentwo rooted binary phylogenetic trees is the minimum number of rSPR-operations that convert one tree to the other, whichis equivalent to the minimum number of lateral gene transfers to transform between the trees, thus provides a lower boundon the number of reticulation events needed to reconcile the two phylogenies [16]. As shown by Bordewich and Semple[7], two rooted binary phylogenetic trees have an rSPR-distance Ord(F ) − 1, where F is an MAF for the two trees. Thisresult can be easily extended to more than two trees: let F be an MAF for a collection {T1, T2, . . . , Tm} of rooted binaryphylogenetic trees, then for any two trees Ti and T j in the collection, Ti can be transformed into T j using no more than(Ord(F )−1) rSPR-operations. Therefore, constructing an MAF for multiple rooted binary phylogenetic trees and studying thecomputational complexity of the rooted-maf problem are important, interesting, and biologically meaningful.

Similarly, the unrooted-maf problem is closely related to the TBR-distance for unrooted binary phylogenetic trees [6].Although the TBR-distance is not known to have a direct biological meaning, TBR-operations are often used to explore thespace of phylogenetic trees. This thus provides a strong motivation for the study of constructions of an MAF for multipleunrooted binary phylogenetic trees, and of the computational complexity of the unrooted-maf problem.

In this paper, we presented two parameterized algorithms for the Maximum Agreement Forest problem on multiplebinary phylogenetic trees: one for the problem rooted-maf on rooted trees that runs in time O (3kn), and the other forthe problem unrooted-maf on unrooted trees that runs in time O (4kn). To our best knowledge, our algorithm for theMAF problem on multiple rooted binary phylogenetic trees is currently the best fixed-parameter tractable algorithm for theproblem, and our algorithm for the MAF problem on multiple unrooted binary phylogenetic trees is the first fixed-parametertractable algorithm for the problem. Further improvements on the algorithm complexity are certainly desired to make thealgorithms more practical in their applications. On the other hand, such an improvement seems to require new observationsand new algorithmic techniques: the complexity of our algorithms for multiple phylogenetic trees is not much worse thanthat of the known algorithms for two phylogenetic trees – some of these algorithms were just developed very recently.

References

[1] D. Robinson, L. Foulds, Comparison of phylogenetic trees, Math. Biosci. 53 (1981) 131–147.[2] M. Li, J. Tromp, L. Zhang, On the nearest neighbour interchange distance between evolutionary trees, J. Theor. Biol. 182 (1996) 463–467.[3] P. Buneman, The recovery of trees from measures of dissimilarity, in: D.G. Kendall, P. Tautu (Eds.), Mathematics in the Archeological and Historical

Sciences, Edinburgh University Press, Edinburgh, 1971, pp. 387–395.[4] D. Swofford, G. Olsen, P. Waddell, D. Hillis, Phylogenetic inference, in: E. Hillis, C. Moritz, B. Mable (Eds.), Molecular Systematics, 2nd ed., Sinauer

Associates, 1996, pp. 407–513.[5] J. Hein, T. Jiang, L. Wang, K. Zhang, On the complexity of comparing evolutionary trees, Discrete Appl. Math. 71 (1996) 153–169.[6] B.L. Allen, M. Steel, Subtree transfer operations and their induced metrics on evolutionary trees, Ann. Comb. 5 (2001) 1–15.[7] M. Bordewich, C. Semple, On the computational complexity of the rooted subtree prune and regraft distance, Ann. Comb. 8 (2005) 409–423.[8] R. Downey, M. Fellows, Parameterized Complexity, Springer, New York, 1999.[9] M. Hallett, C. McCartin, A faster FPT algorithm for the maximum agreement forest problem, Theory Comput. Syst. 41 (2007) 539–550.

[10] C. Whidden, R.G. Beiko, N. Zeh, Fixed-parameter and approximation algorithms for maximum agreement forests, CoRR (2011), arXiv:1108.2664[q-bio.PE].

[11] J. Chen, J.H. Fan, S.H. Sze, Parameterized and approximation algorithms for the MAF problem in multifurcating trees, in: Proceedings of the 39thInternational Workshop on Graph-Theoretic Concepts in Computer Science, WG 2013, in: Lect. Notes Comput. Sci., vol. 8165, 2013, pp. 152–164.

[12] M. Bordewich, C. McCartin, C. Semple, A 3-approximation algorithm for the subtree distance between phylogenies, J. Discrete Algorithms 6 (2008)458–471.

[13] C. Whidden, R.G. Beiko, N. Zeh, Fixed-parameter algorithms for maximum agreement forests, SIAM J. Comput. 42 (2013) 1431–1466.[14] E.M. Rodrigues, M.F. Sagot, Y. Wakabayashi, The maximum agreement forest problem: approximation algorithms and computational experiments, Theor.

Comput. Sci. 374 (2007) 91–110.[15] F. Chataigner, Approximating the maximum agreement forest on k trees, Inf. Process. Lett. 93 (2005) 239–244.[16] R. Beiko, N. Hamilton, Phylogenetic identification of lateral genetic transfer events, BMC Evol. Biol. 6 (2005) 15.[17] Z. Chen, L. Wang, Algorithms for reticulate networks of multiple phylogenetic trees, IEEE/ACM Trans. Comput. Biol. Bioinform. 9 (2012) 372–384.[18] Z. Chen, L. Wang, Faster exact computation of rSPR distance, in: Proceedings of the 3rd Joint International Conference on Frontiers in Algorithmics and

Algorithmic Aspects in Information and Management, FAW–AAIM 2013, in: Lect. Notes Comput. Sci., vol. 7924, 2013, pp. 36–47.