
J Comb Optim (2011) 21: 151–158 DOI 10.1007/s10878-009-9220-2

On the parameterized complexity of the Multi-MCT and Multi-MCST problems

Wenbin Chen · Matthew C. Schmidt · Nagiza F. Samatova

Published online: 10 March 2009 © Springer Science+Business Media, LLC 2009

Abstract The comparison of tree-structured data is widespread, since trees can be used to represent a wide variety of data, such as XML data, evolutionary histories, or carbohydrate structures. Two graph-theoretical problems used in the comparison of such data are finding the maximum common subtree (MCT) and finding the minimum common supertree (MCST) of two trees. These problems generalize to finding the MCT and MCST of multiple trees (MULTI-MCT and MULTI-MCST, respectively). In this paper, we prove parameterized complexity hardness results for different parameterized versions of the MULTI-MCT and MULTI-MCST problems under isomorphic embeddings.

Keywords Multi-MCT · Multi-MCST · W-hierarchy · Parameterized complexity · Computational complexity

1 Introduction

Tree structures are used to represent many different types of data. Ordered, vertex-labeled trees have been used to represent XML data (Gou and Chirkova 2007) and web access logs (Zaki 2002), while unordered, vertex-labeled trees have been used to represent phylogenetic trees (Farach and Thorup 1994) and carbohydrate sugar chains (Aoki et al. 2003). Problems concerning such data can then be reduced to graph-theoretical, tree-based problems.

W. Chen · M.C. Schmidt · N.F. Samatova (✉)
Computer Science Department, North Carolina State University, Raleigh, NC 27695, USA
e-mail: [email protected]

W. Chen · M.C. Schmidt · N.F. Samatova
Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA


Two problems commonly used in the comparison of trees are finding the maximum common subtree (MCT) and finding the minimum common supertree (MCST) of two trees. In this paper, a subtree refers to any vertex-induced subgraph of a tree that is itself a tree, and a supertree of a tree is any tree of which the original tree is a subtree. The MCT is the largest tree (measured by its number of vertices) that is isomorphic to a subtree of both trees. The MCST is the smallest tree such that each of the two trees is isomorphic to one of its subtrees. The two problems are closely related, since the maximum common subtree can be used to determine the minimum common supertree. For both ordered and unordered vertex-labeled trees, if only a pair of trees is considered, polynomial-time algorithms are known for the MCT problem and, by the reduction in (Rosselló and Valiente 2006), for the MCST problem (Akutsu 1992; Gupta and Nishimura 1998).
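To make the notion of a subtree concrete, the following Python sketch (our own illustration, not from the paper) tests whether a set of vertices of a tree induces a subtree. It assumes the tree is given as an adjacency dictionary, and the function name `induces_subtree` is hypothetical.

```python
from collections import deque


def induces_subtree(adj: dict[int, set[int]], vertices: set[int]) -> bool:
    """Return True if `vertices` induces a subtree of the tree `adj`.

    `adj` maps each vertex of the tree to the set of its neighbours.
    A vertex-induced subgraph of a tree is always acyclic, so it is a
    tree exactly when it is non-empty and connected.
    """
    if not vertices:
        return False
    start = next(iter(vertices))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        # Follow only edges whose endpoints both lie in the chosen set.
        for w in adj[u] & vertices:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == vertices
```

On the path with edges {1, 2} and {2, 3}, the set {1, 2} induces a subtree, whereas {1, 3} does not, because its induced subgraph has no edge.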

There is also a need to find the MCT and MCST of more than two trees. For example, multiple database schemas can be represented as XML data and therefore also as ordered, labeled trees. The MCT of these trees would represent the maximum intersection of the schemas, which is of interest when determining which queries can be run across multiple diverse database schemas. The MCST of these trees would represent the minimum union of the schemas and could be used to merge the multiple databases into a single database. The problems of finding the MCT and MCST of multiple trees are called the MULTI-MCT and MULTI-MCST problems, respectively.

Finding solutions to the MULTI-MCT and MULTI-MCST problems has proven to be significantly more difficult than finding solutions to the MCT and MCST problems. Akutsu (1992) proved that the MULTI-MCT problem for unordered, vertex-labeled trees is NP-hard. It is also currently unknown whether a solution to the MULTI-MCT problem can in general be used to generate a solution to the MULTI-MCST problem.

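Downey and Fellows (1999) introduced parameterized complexity classes that characterize which problems are likely to have efficient solutions when certain parameters are fixed. A parameterized problem is fixed-parameter tractable (FPT) if it can be solved in $O(f(k) \cdot n^{O(1)})$ time, where $k$ is the parameter, $n$ is the input size, and $f$ is a computable function; hardness for W[1] or a higher level of the W-hierarchy is strong evidence that a problem is not FPT. In this paper, we study the hardness of the parameterized MULTI-MCT and MULTI-MCST problems with respect to these parameterized complexity classes.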

For MCT and MCST, there are four types of tree embeddings: isomorphic, homeomorphic, topological, and minor embeddings; their formal definitions can be found in (Rosselló and Valiente 2006). Fellows et al. (2003) presented reductions from LCS to MULTI-MCT and from SCS to MULTI-MCST under homeomorphic embeddings. However, their reductions do not hold under isomorphic, topological, and minor embeddings. For two ordered trees, MCT and MCST are polynomial-time solvable under the topological, minor, isomorphic, and homeomorphic embeddings; many references can be found in (Rosselló and Valiente 2006). However, to our knowledge, the MULTI-MCT and MULTI-MCST problems under isomorphic, topological, and minor embeddings have not been studied in the literature.

In this paper, MULTI-MCT and MULTI-MCST are considered under isomorphic embeddings. Under isomorphic embeddings, we give a parameterized reduction from Longest Common Subsequence (LCS) to MULTI-MCT for labeled ordered trees, a parameterized reduction from Maximum Clique to MULTI-MCT for labeled unordered trees, and a parameterized reduction from Shortest Common Supersequence (SCS) to MULTI-MCST for labeled ordered trees. These reductions prove parameterized hardness results for the MULTI-MCT and MULTI-MCST problems under isomorphic embeddings.

2 Definitions of problems

In this section, we introduce the MULTI-MCT and MULTI-MCST problems and discuss their various parameterizations.

Definition 2.1 (Multi-MCT) Given a set of $k$ trees $R = \{T_1, T_2, \dots, T_k\}$ with vertex labels $l(v) \in \Sigma$ and an integer $m$, does there exist a tree $T_{\max}$ of size at least $m$ that is isomorphic to a subtree of every tree $T_i$, $1 \le i \le k$?

We define the size of a tree as the number of vertices in the tree. The related problem of finding the minimum common supertree of multiple trees (MULTI-MCST) is defined as follows.

Definition 2.2 (Multi-MCST) Given a set of $k$ trees $R = \{T_1, T_2, \dots, T_k\}$ with vertex labels $l(v) \in \Sigma$ and an integer $m$, does there exist a tree $T_{\min}$ of size at most $m$ such that every tree $T_i$, $1 \le i \le k$, is isomorphic to a subtree of $T_{\min}$?

The following problems are defined here because parameterized versions of them will be shown to be reducible to parameterized versions of the MULTI-MCT or MULTI-MCST problem. These problems are formally defined in (Downey and Fellows 1999) and are reproduced here.

Definition 2.3 (LCS) Given a finite alphabet $\Sigma$, a finite set $Y$ of $k$ strings $\{X_1, X_2, \dots, X_k\}$ from $\Sigma^*$, and a positive integer $m$, is there a string $X_{\max} \in \Sigma^*$ with $|X_{\max}| \ge m$ such that $X_{\max}$ is a subsequence of each string $X_i \in Y$?
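For tiny instances, LCS can be decided by brute force. The Python sketch below is our own illustration (the names `is_subsequence` and `lcs_decision` are hypothetical); it runs in exponential time and is meant only to make Definition 2.3 concrete.

```python
from itertools import combinations


def is_subsequence(x: str, y: str) -> bool:
    """Return True if x can be obtained from y by deleting characters."""
    it = iter(y)
    # Each `c in it` consumes the iterator up to and including a match of c.
    return all(c in it for c in x)


def lcs_decision(strings: list[str], m: int) -> bool:
    """Brute-force LCS: is there a common subsequence of length >= m?

    A common subsequence of length >= m exists iff one of length exactly m
    exists, and every common subsequence is a subsequence of strings[0],
    so it suffices to try every length-m subsequence of strings[0].
    """
    first = strings[0]
    for idx in combinations(range(len(first)), m):
        candidate = "".join(first[i] for i in idx)
        if all(is_subsequence(candidate, s) for s in strings[1:]):
            return True
    return False
```

For example, `abcde`, `ace`, and `axce` share the subsequence `ace`, so the instance with $m = 3$ is a yes-instance.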

Definition 2.4 (CLIQUE) Given a graph $G = (V, E)$ and a positive integer $m \le |V|$, does $G$ contain a clique of size $m$ or greater?
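A brute-force decision procedure for CLIQUE can serve as a reference check on small graphs. In this Python sketch (our own illustration; `has_clique` is a hypothetical name), graph vertices are assumed to be the integers $1, \dots, n$ and each edge is stored as a `frozenset` pair.

```python
from itertools import combinations


def has_clique(n: int, edges: set[frozenset[int]], m: int) -> bool:
    """Brute-force CLIQUE on vertices 1..n: is there a clique of size >= m?

    Any clique of size >= m contains one of size exactly m, so it suffices
    to test every m-subset of the vertices. Exponential time.
    """
    for subset in combinations(range(1, n + 1), m):
        # A subset is a clique iff every pair of its vertices is an edge.
        if all(frozenset((u, v)) in edges for u, v in combinations(subset, 2)):
            return True
    return False
```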

Definition 2.5 (SCS) Given a finite alphabet $\Sigma$, a finite set $Y$ of $k$ strings $\{X_1, X_2, \dots, X_k\}$ from $\Sigma^*$, and a positive integer $m$, is there a string $X_{\min} \in \Sigma^*$ with $|X_{\min}| \le m$ such that each string $X_i \in Y$ is a subsequence of $X_{\min}$?
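SCS can likewise be decided by exhaustive search on tiny inputs. The Python sketch below (our own illustration; `scs_decision` is a hypothetical name) enumerates every string over $\Sigma$ of length at most $m$, so its running time grows as $|\Sigma|^m$.

```python
from itertools import product


def is_subsequence(x: str, y: str) -> bool:
    """Return True if x can be obtained from y by deleting characters."""
    it = iter(y)
    return all(c in it for c in x)


def scs_decision(strings: list[str], alphabet: str, m: int) -> bool:
    """Brute-force SCS: is there a common supersequence of length <= m?"""
    for length in range(m + 1):
        for letters in product(alphabet, repeat=length):
            candidate = "".join(letters)
            if all(is_subsequence(s, candidate) for s in strings):
                return True
    return False
```

For example, `ab` and `ba` have the common supersequence `aba` of length 3, but no common supersequence of length 2.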

Choosing different parameters for each of these problems leads to various parameterized problems. These parameterized problems and their corresponding parameters are shown in Table 1.

The known parameterized complexity results for the parameterized versions of the LCS, CLIQUE, and SCS problems are given in Table 2.


Table 1 The different parameterized versions of the MULTI-MCT, MULTI-MCST, LCS, CLIQUE, and SCS problems and their associated parameters

| Parameter | Multi-MCT | LCS | CLIQUE |
|---|---|---|---|
| k | k-MULTI-MCT | k-LCS | N/A |
| m | m-MULTI-MCT | m-LCS | m-CLIQUE |
| k, m | km-MULTI-MCT | km-LCS | N/A |
| k, Σ | kΣ-MULTI-MCT | kΣ-LCS | N/A |

| Parameter | Multi-MCST | SCS |
|---|---|---|
| k | k-MULTI-MCST | k-SCS |
| m | m-MULTI-MCST | m-SCS |
| k, m | km-MULTI-MCST | km-SCS |
| k, Σ | kΣ-MULTI-MCST | kΣ-SCS |

Table 2 Parameterized complexity results of the parameterized versions of the LCS, SCS, and CLIQUE problems

| Problem | Complexity | References |
|---|---|---|
| k-LCS | W[t]-hard, ∀t ≥ 1 | Bodlaender et al. (1995a) |
| m-LCS | W[2]-hard | Bodlaender et al. (1995a) |
| km-LCS | W[1]-complete | Bodlaender et al. (1995a) |
| kΣ-LCS | W[t]-hard, ∀t ≥ 1 | Bodlaender et al. (1995b) |
| m-CLIQUE | W[1]-complete | Downey and Fellows (1999) |
| m-SCS | FPT | Hallett (1996) |
| k-SCS | W[t]-hard, ∀t ≥ 1 | Hallett (1996) |
| km-SCS | N/A | |
| kΣ-SCS | W[t]-hard, ∀t ≥ 1 | Hallett (1996) |

3 Parameterized complexity of the Multi-MCST problem for vertex-labeled, ordered trees

In this section, we show that for vertex-labeled, ordered trees, several parameterized versions of the MULTI-MCST problem are hard for various parameterized complexity classes. To obtain these hardness results, we show that the shortest common supersequence (SCS) problem is reducible to the MULTI-MCST problem for vertex-labeled, ordered trees.

Table 3 Parameterized complexity results of the parameterized versions of the MULTI-MCST problem for vertex-labeled, ordered trees

| Problem | Hardness |
|---|---|
| m-MULTI-MCST | N/A |
| k-MULTI-MCST | W[t]-hard, ∀t ≥ 1 |
| km-MULTI-MCST | N/A |
| kΣ-MULTI-MCST | W[t]-hard, ∀t ≥ 1 |

Theorem 3.1 There is a linear FPT reduction from SCS to Multi-MCST for labeled ordered trees.

Proof Given an instance $(X_1, \dots, X_k, m)$ of SCS, we construct an instance of MULTI-MCST as follows. Assume that $X_i = A_{i,1} \dots A_{i,r_i}$, where $r_i$ is the length of $X_i$. For each $X_i$, we construct a labeled ordered tree $T_i$: it has one root node $v_{i,0}$ with label ROOT and $r_i$ ordered children $v_{i,1}, \dots, v_{i,r_i}$ with labels $A_{i,1}, \dots, A_{i,r_i}$, respectively.

Thus, if a sequence $X = A_1 \dots A_r$ is a supersequence of $X_i$, then the tree with a root node labeled ROOT and $r$ ordered children labeled $A_1, \dots, A_r$ is a supertree of $T_i$.

Conversely, assume that the tree $S$ is a minimum common supertree of the set of trees constructed above, $R = \{T_1, \dots, T_k\}$. We claim that $S$ is a star tree whose hub is labeled ROOT. Because of the structure of the trees in $R$, each vertex in $S$ must either have the label ROOT or be adjacent to a vertex with the label ROOT. Now assume that there are two vertices, $v_1$ and $v_2$, in $S$ that are labeled ROOT. By removing $v_2$ and connecting all of the children of $v_2$ to $v_1$, we obtain a common supertree of $R$ with $|S| - 1$ vertices, which contradicts the minimality of $S$. Therefore, $S$ must be a star tree in which the hub, and only the hub, is labeled ROOT. Thus, the ordered supertree $S$ must consist of a root node labeled ROOT and $r$ children labeled $A_1, \dots, A_r$. Hence, $S$ corresponds to a sequence $X = A_1 \cdots A_r$ that is a supersequence of every $X_i$.

Hence, the shortest common supersequence of $X_1, \dots, X_k$ has length $r$ if and only if the minimum common supertree of $T_1, \dots, T_k$ has size $r + 1$.

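The construction takes time linear in the total length of the strings and maps the parameters $k$, $|\Sigma|$, and $m$ to $k$, $|\Sigma| + 1$, and $m + 1$, respectively; the remaining details of the proof that the reduction is FPT are left to the reader. □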
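The correspondence in Theorem 3.1 can be checked on small inputs. The following Python sketch is our own illustration, not code from the paper: it builds the star trees $T_i$, tests embeddings that map root to root, and searches only star-shaped candidate supertrees, which the proof argues suffices for a minimum common supertree. The class `OrderedTree` and the function names are hypothetical, and the search assumes every symbol of the input strings occurs in `alphabet` (otherwise it never terminates).

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class OrderedTree:
    """A vertex-labelled ordered tree; children are kept left to right."""
    label: str
    children: tuple["OrderedTree", ...] = ()


def string_to_star(s: str, root_label: str = "ROOT") -> OrderedTree:
    """The tree T_i of Theorem 3.1: a ROOT vertex whose ordered children
    are labelled by the symbols of s."""
    return OrderedTree(root_label, tuple(OrderedTree(c) for c in s))


def star_embeds_at_root(small: OrderedTree, big: OrderedTree) -> bool:
    """Does star `small` embed into star `big` with root mapped to root?

    Children must be matched injectively with equal labels and in the same
    left-to-right order, i.e. small's child labels must form a subsequence
    of big's child labels.
    """
    if small.label != big.label:
        return False
    remaining = iter(c.label for c in big.children)
    return all(c.label in remaining for c in small.children)


def min_common_superstar_size(strings: list[str], alphabet: str) -> int:
    """Size of a smallest star S (ROOT plus children spelling some X) into
    which every T_i embeds; candidates X are tried in order of length."""
    trees = [string_to_star(s) for s in strings]
    length = 0
    while True:
        for letters in product(alphabet, repeat=length):
            star = string_to_star("".join(letters))
            if all(star_embeds_at_root(t, star) for t in trees):
                return length + 1  # the hub plus `length` children
        length += 1
```

For the strings `ab` and `ba`, the shortest common supersequence is `aba` (length 3), and the search returns a supertree of size 4, matching the length $r$ versus size $r + 1$ correspondence.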

Thus, from the known parameterized complexity results for the SCS problem, we obtain the parameterized hardness results for the MULTI-MCST problem presented in Table 3.

4 Parameterized complexity of the Multi-MCT problem for vertex-labeled, ordered trees

In this section, we show that for vertex-labeled, ordered trees, the parameterized versions of the MULTI-MCT problem are hard for various parameterized complexity classes. To obtain these hardness results, we show that the longest common subsequence (LCS) problem is reducible to the MULTI-MCT problem for vertex-labeled, ordered trees.

Table 4 Parameterized hardness results of the parameterized versions of the MULTI-MCT problem for vertex-labeled, ordered trees

| Problem | Hardness |
|---|---|
| m-MULTI-MCT | W[2]-hard |
| k-MULTI-MCT | W[t]-hard, ∀t ≥ 1 |
| km-MULTI-MCT | W[1]-hard |
| kΣ-MULTI-MCT | W[t]-hard, ∀t ≥ 1 |

Theorem 4.1 There is a linear FPT reduction from LCS to Multi-MCT for labeled ordered trees.

Proof Given an instance $(X_1, \dots, X_k, m)$ of LCS, we construct an instance of MULTI-MCT as follows. Assume that $X_i = A_{i,1} \dots A_{i,r_i}$, where $r_i$ is the length of $X_i$. For each $X_i$, we construct a labeled ordered tree $T_i$: it has one root node $v_{i,0}$ with label ROOT and $r_i$ ordered children $v_{i,1}, \dots, v_{i,r_i}$ with labels $A_{i,1}, \dots, A_{i,r_i}$, respectively.

Thus, if a sequence $X = A_1 \dots A_r$ is a subsequence of $X_i$, then the root node labeled ROOT together with children labeled $A_1, \dots, A_r$ induces an ordered subtree of $T_i$. Conversely, an ordered subtree of $T_i$ induced by the root node labeled ROOT and children labeled $A_1, \dots, A_r$ corresponds to the subsequence $A_1 \dots A_r$ of $X_i$. Hence, a string $X = A_1 \cdots A_r$ is a common subsequence of $X_1, \dots, X_k$ if and only if the ordered tree consisting of a root node labeled ROOT and children labeled $A_1, \dots, A_r$ is a common ordered subtree of $T_1, \dots, T_k$.

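The construction takes time linear in the total length of the strings, keeps the number $k$ of strings as the number of trees, and maps $|\Sigma|$ to $|\Sigma| + 1$ and $m$ to $m + 1$; the remaining details of the proof that the reduction is FPT are left to the reader. □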
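The correspondence in Theorem 4.1 can be checked the same way on small inputs. The Python sketch below is our own illustration (all names are hypothetical): it builds the star trees $T_i$ and finds the largest star that embeds, root to root, into every $T_i$. Since any such star spells a subsequence of $X_1$, the candidates are drawn from $X_1$, longest first.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class OrderedTree:
    """A vertex-labelled ordered tree; children are kept left to right."""
    label: str
    children: tuple["OrderedTree", ...] = ()


def string_to_star(s: str, root_label: str = "ROOT") -> OrderedTree:
    """The tree T_i of Theorem 4.1: a ROOT vertex whose ordered children
    are labelled by the symbols of s."""
    return OrderedTree(root_label, tuple(OrderedTree(c) for c in s))


def star_embeds_at_root(small: OrderedTree, big: OrderedTree) -> bool:
    """Does star `small` embed into star `big` with root mapped to root,
    preserving labels and the left-to-right order of the children?"""
    if small.label != big.label:
        return False
    remaining = iter(c.label for c in big.children)
    # `x in remaining` consumes the iterator up to the first match of x.
    return all(c.label in remaining for c in small.children)


def max_common_substar_size(strings: list[str]) -> int:
    """Size of a largest star (ROOT plus children spelling some X) that
    embeds at the root of every T_i."""
    trees = [string_to_star(s) for s in strings]
    first = strings[0]
    for length in range(len(first), 0, -1):
        for idx in combinations(range(len(first)), length):
            star = string_to_star("".join(first[i] for i in idx))
            if all(star_embeds_at_root(star, t) for t in trees):
                return length + 1  # the hub plus `length` children
    return 1  # only the ROOT vertex is common to all T_i
```

For `abcde`, `ace`, and `axce`, the longest common subsequence is `ace` (length 3), and the sketch returns a common subtree of size 4.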

Thus, from the known parameterized complexity results for the LCS problem, we obtain the parameterized hardness results for the MULTI-MCT problem presented in Table 4.

5 Parameterized complexity of the Multi-MCT problem for vertex-labeled, unordered trees

In this section, we show that for vertex-labeled, unordered trees, the parameterized version m-MULTI-MCT of the MULTI-MCT problem is W[1]-hard. To prove this, we show that the maximum clique (CLIQUE) problem is reducible to the MULTI-MCT problem for vertex-labeled, unordered trees. The reduction is motivated by the L-reduction in (Akutsu and Halldórsson 2000), which reduces the maximum independent set problem to the MULTI-MCT problem in order to show hardness of approximation; we modify it for our purposes.

Table 5 Parameterized complexity class hardness results for the parameterized MULTI-MCT and MULTI-MCST problems

| Parameter | Multi-MCT (ordered trees) | Multi-MCT (unordered trees) | Multi-MCST |
|---|---|---|---|
| k | W[t]-hard, ∀t ≥ 1 | N/A | W[t]-hard, ∀t ≥ 1 |
| m | W[2]-hard | W[1]-hard | N/A |
| k, m | W[1]-hard | N/A | N/A |
| k, Σ | W[t]-hard, ∀t ≥ 1 | N/A | W[t]-hard, ∀t ≥ 1 |

Lemma 5.1 There is a linear FPT reduction from CLIQUE to Multi-MCT for labeled unordered trees.

Proof Given a graph $G = (V, E)$ with $V = \{v_1, \dots, v_n\}$, we construct $n + 1$ trees $T_1, \dots, T_{n+1}$. For each $i \le n$, the tree $T_i$ has a root labeled 0 with two children $c_i$ and $d_i$, both labeled 0. The node $c_i$ has $n$ children $c_{i,1}, \dots, c_{i,n}$: the node $c_{i,i}$ is labeled $i$, and for $j \ne i$, the node $c_{i,j}$ is labeled $j$ if $(v_i, v_j) \in E$ and $-1$ if $(v_i, v_j) \notin E$. Intuitively, the labels of the vertices $c_{i,j}$ encode the closed neighborhood of the graph vertex $v_i$.

The node $d_i$ has $n$ children $d_{i,1}, \dots, d_{i,n}$, where $d_{i,j}$ is labeled $j$ if $j \ne i$ and $-1$ if $j = i$.

The tree $T_{n+1}$ has a root labeled 0 with $n$ children $t_1, \dots, t_n$, where $t_j$ is labeled $j$.

Assume that there is a clique $C = \{v_{j_1}, \dots, v_{j_k}\}$. Then there is a common subtree $T$ of $T_1, \dots, T_{n+1}$ of size $k + 1$, consisting of one root node labeled 0 and $k$ children labeled $j_1, \dots, j_k$. Indeed, $T$ embeds into $T_{n+1}$ at its root; into each $T_i$ with $v_i \in C$ at $c_i$, because $C$ lies in the closed neighborhood of $v_i$; and into each $T_i$ with $v_i \notin C$ at $d_i$, because the children of $d_i$ carry every label except $i$.

Conversely, suppose that $T$ is a common subtree of size $k + 1$. Since $T$ is isomorphic to a subtree of the star $T_{n+1}$, it consists of a node labeled 0 and $k$ children with distinct labels $j_1, \dots, j_k$. For each $l$, the tree $T_{j_l}$ contains a copy of $T$, and in $T_{j_l}$ the only vertex labeled 0 that has a neighbor labeled $j_l$ is $c_{j_l}$; hence every label $j_{l'}$ occurs among the children of $c_{j_l}$, so $v_{j_{l'}}$ equals or is adjacent to $v_{j_l}$. Therefore, the vertices $v_{j_1}, \dots, v_{j_k}$ form a clique in $G$. Thus, a common subtree of size $k + 1$ yields a clique of size $k$.

The construction is computable in polynomial time and maps the parameter $m$ to $m + 1$; the remaining details of the proof that the reduction is FPT are left to the reader. □
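The construction and the argument of Lemma 5.1 can be checked exhaustively on a small graph. In the Python sketch below (our own illustration; all names are hypothetical), each tree is stored as a label map plus an adjacency map, graph vertices are assumed to be $1, \dots, n$, and only star-shaped patterns are tested, which suffices because every common subtree is a subtree of the star $T_{n+1}$.

```python
from itertools import combinations


def clique_reduction_trees(n, edges):
    """Build the n + 1 vertex-labelled unordered trees of Lemma 5.1.

    `edges` is a set of frozenset pairs. Each tree is returned as
    (labels, adj): vertex -> label and vertex -> set of neighbours.
    """
    trees = []
    for i in range(1, n + 1):
        labels = {"root": 0, "c": 0, "d": 0}
        adj = {"root": {"c", "d"}, "c": {"root"}, "d": {"root"}}
        for j in range(1, n + 1):
            # c_{i,j} is labelled j if j == i or (v_i, v_j) is an edge, else -1.
            labels[("c", j)] = j if j == i or frozenset((i, j)) in edges else -1
            # d_{i,j} is labelled j unless j == i, in which case -1.
            labels[("d", j)] = -1 if j == i else j
            for parent in ("c", "d"):
                child = (parent, j)
                adj[parent].add(child)
                adj[child] = {parent}
        trees.append((labels, adj))
    # T_{n+1}: a root labelled 0 whose children t_1..t_n are labelled 1..n.
    labels = {"root": 0}
    adj = {"root": set()}
    for j in range(1, n + 1):
        labels[("t", j)] = j
        adj["root"].add(("t", j))
        adj[("t", j)] = {"root"}
    trees.append((labels, adj))
    return trees


def star_embeds(labels, adj, leaf_labels):
    """Is the star with a hub labelled 0 and leaves with the distinct labels
    `leaf_labels` isomorphic to a subtree of the tree (labels, adj)?

    Neighbours of a vertex in a tree are pairwise non-adjacent, so the star
    embeds iff some 0-labelled vertex has a neighbour for every leaf label.
    """
    wanted = set(leaf_labels)
    return any(
        labels[h] == 0 and wanted <= {labels[w] for w in adj[h]} for h in adj
    )


def common_star_exists(n, edges, leaf_labels):
    return all(
        star_embeds(labels, adj, leaf_labels)
        for labels, adj in clique_reduction_trees(n, edges)
    )


def is_clique(edges, vertices):
    return all(frozenset((u, v)) in edges for u, v in combinations(vertices, 2))
```

For the triangle on vertices 1, 2, 3 with a pendant vertex 4 attached to 3, a common star with leaf labels {1, 2, 3} exists, while one with leaf labels {1, 2, 3, 4} does not; comparing `common_star_exists` with `is_clique` over all vertex subsets exercises both directions of the proof.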

Since m-CLIQUE is known to be W[1]-hard, we obtain the following result.

Theorem 5.2 For labeled unordered trees, m-Multi-MCT is W[1]-hard.

6 Conclusion

In this paper, we have proven that various parameterized versions of the MULTI-MCT and MULTI-MCST problems are hard for various parameterized complexity classes. An overview of our results is given in Table 5. Since every parameterized version for which we obtained a hardness result is at least W[1]-hard, it is unlikely (unless FPT = W[1]) that these parameterized versions of MULTI-MCT and MULTI-MCST can be solved in $O(f(k) \cdot n^{O(1)})$ time.

Acknowledgements The authors are thankful to the reviewers for their insightful comments. This research has been supported by the “Exploratory Data Intensive Computing for Complex Biological Systems” project from the U.S. Department of Energy (Office of Advanced Scientific Computing Research, Office of Science). The work of NFS was also sponsored by the Laboratory Directed Research and Development Program of Oak Ridge National Laboratory. Oak Ridge National Laboratory is managed by UT-Battelle, LLC for the U.S. D.O.E. under contract no. DEAC05-00OR22725.

References

Akutsu T (1992) An RNC algorithm for finding a largest common subtree of two trees. IEICE Trans Inf Syst 75(1):95–101

Akutsu T, Halldórsson M (2000) On the approximation of largest common subtrees and largest common point sets. Theor Comput Sci 233(1–2):33–50


Aoki KF, Yamaguchi A, Okuno Y, Akutsu T, Ueda N, Kanehisa M, Mamitsuka H (2003) Efficient tree-matching methods for accurate carbohydrate database queries. Genome Inf 14:134–143

Bodlaender H, Downey R, Fellows M, Hallett M, Wareham H (1995b) Parameterized complexity analysis in computational biology. Comput Appl Biosci 11(1):49–57

Bodlaender HL, Downey RG, Fellows MR, Wareham HT (1995a) The parameterized complexity of sequence alignment and consensus. Theor Comput Sci 147(1–2):31–54

Downey RG, Fellows MR (1999) Parameterized complexity. Springer, New York

Farach M, Thorup M (1994) Fast comparison of evolutionary trees. In: The fifth annual ACM-SIAM symposium on discrete algorithms. Society for Industrial and Applied Mathematics, Arlington, pp 481–488

Fellows MR, Hallett MT, Stege U (2003) Analogs and duals of the MAST problem for sequences and trees. J Algorithms 49(1):192–216

Gou G, Chirkova R (2007) Efficiently querying large XML data repositories: a survey. IEEE Trans Knowl Data Eng 19(10):1381–1403

Gupta A, Nishimura N (1998) Finding largest subtrees and smallest supertrees. Algorithmica 21(2):183–210

Hallett MT (1996) An integrated complexity analysis of problems from computational biology. PhD thesis, University of Victoria

Rosselló F, Valiente G (2006) An algebraic view of the relation between largest common subtrees and smallest common supertrees. Theor Comput Sci 362(1–3):33–53

Zaki MJ (2002) Efficiently mining frequent trees in a forest. In: The eighth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, Edmonton, pp 71–80