

Theoretical Computer Science 369 (2006) 384–395
www.elsevier.com/locate/tcs

Partial multicuts in trees ☆

Asaf Levin a,∗, Danny Segev b,1

a Department of Statistics, The Hebrew University of Jerusalem, Jerusalem 91905, Israel
b School of Mathematical Sciences, Tel-Aviv University, Tel-Aviv 69978, Israel

Received 31 May 2005; received in revised form 20 September 2006; accepted 24 September 2006

Communicated by D. Peleg

Abstract

Let T = (V, E) be an undirected tree, in which each edge is associated with a non-negative cost, and let {s_1, t_1}, ..., {s_k, t_k} be a collection of k distinct pairs of vertices. Given a requirement parameter t ≤ k, the partial multicut on a tree problem asks to find a minimum cost set of edges whose removal from T disconnects at least t out of these k pairs. This problem generalizes the well-known multicut on a tree problem, in which we are required to disconnect all given pairs.

The main contribution of this paper is an (8/3 + ε)-approximation algorithm for partial multicut on a tree, whose run time is strongly polynomial for any fixed ε > 0. This result is achieved by introducing problem-specific insight to the general framework of using the Lagrangian relaxation technique in approximation algorithms. Our algorithm utilizes a heuristic for the closely related prize-collecting variant, in which we are not required to disconnect all pairs, but rather incur penalties for failing to do so. We provide a Lagrangian multiplier preserving algorithm for the latter problem, with an approximation factor of 2. Finally, we present a new 2-approximation algorithm for multicut on a tree, based on LP-rounding.

© 2006 Published by Elsevier B.V.

Keywords: Multicut; Lagrangian relaxation; Approximation algorithms

1. Introduction

In this paper we address the partial multicut on a tree problem. The input to this problem consists of an undirected tree T = (V, E), in which each edge e ∈ E is associated with a non-negative cost c_e, and a collection of k distinct pairs of vertices, {s_1, t_1}, ..., {s_k, t_k}. For 1 ≤ i ≤ k, the pair {s_i, t_i} is said to be separated by the edge set D ⊆ E if it is not contained in a single connected component of T − D. In other words, the removal of D disconnects s_i and t_i. Given a requirement parameter t ≤ k, the objective is to find a minimum cost set of edges that separates at least t out of the k pairs. In spite of these seemingly simple settings, we are not aware of any previous study of this problem.
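To make the definition concrete, the following is a minimal sketch of an exhaustive solver for tiny instances. It is not part of the paper's algorithm: the function names and the dictionary-based tree encoding are assumptions chosen for illustration, and the running time is exponential in the number of edges.

```python
from itertools import combinations

def build_adj(cost):
    """Adjacency lists of the tree; cost maps frozenset({u, v}) -> edge cost."""
    adj = {}
    for e in cost:
        u, v = tuple(e)
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj

def path_edges(adj, s, t):
    """Edges on the unique s-t path of the tree, found by a DFS from s."""
    parent = {s: None}
    stack = [s]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                stack.append(v)
    edges = set()
    while parent[t] is not None:
        edges.add(frozenset((t, parent[t])))
        t = parent[t]
    return edges

def partial_multicut_bruteforce(cost, pairs, t):
    """Return (cost, edge set) of a cheapest edge set separating >= t pairs."""
    adj = build_adj(cost)
    paths = [path_edges(adj, a, b) for a, b in pairs]
    best = (float("inf"), None)
    for r in range(len(cost) + 1):
        for D in combinations(list(cost), r):
            chosen = set(D)
            # A pair is separated exactly when D hits its path.
            separated = sum(1 for p in paths if p & chosen)
            total = sum(cost[e] for e in D)
            if separated >= t and total < best[0]:
                best = (total, chosen)
    return best
```

On the path a-b-c-d with costs 3, 1, 2 and the pairs {a,d}, {b,d}, {a,b}, {c,d}, a single edge already separates three pairs, while separating all four requires two edges.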

Partial multicut on a tree contains as a special case the well-known multicut on a tree problem, in which we are required to separate all given pairs. Garg et al. [10] demonstrated that this problem is at least as hard to approximate as vertex cover, even in unweighted trees of height 1. In addition, they presented a primal-dual algorithm that constructs a feasible solution whose cost is at most twice the optimum. We refer to this algorithm as the GVY algorithm, and provide additional details on its analysis in Section 2, since it serves as one of the building blocks of our algorithm.

☆ An extended abstract of this paper appeared in Proceedings of the Third International Workshop on Approximation and Online Algorithms, 2005, pp. 320–333.

∗ Corresponding author. Tel.: +972 2 5883313.
E-mail addresses: [email protected] (A. Levin), [email protected] (D. Segev).

1 This work is part of the author's Ph.D. thesis prepared at Tel-Aviv University under the supervision of Prof. Refael Hassin.

0304-3975/$ - see front matter © 2006 Published by Elsevier B.V. doi:10.1016/j.tcs.2006.09.018

When the underlying graph is not restricted to be a tree, the multicut problem becomes much harder. Dahlhaus et al. [6] proved that the multicut problem is NP-hard for all fixed k ≥ 3, even when the cost of each edge is 1. Very recently, an arbitrarily large constant factor hardness was given by Chawla et al. [5], assuming the Unique Games Conjecture of Khot [17]. A stronger version of this conjecture leads to a hardness result of Ω(log log n). On the positive side, Garg et al. [9] used the region growing scheme to obtain an O(log k)-approximation algorithm for the multicut problem.

The partial multicut on a tree problem can also be considered in a different context. Given a ground set of elements U = {e_1, ..., e_n}, a collection S_1, ..., S_m of subsets of U with non-negative costs c(S_i) and a parameter t ≤ n, partial cover is the problem of finding a minimum cost subcollection of sets that covers at least t elements. Note that a pair {s_i, t_i} is separated by D ⊆ E if this set of edges contains at least one edge from the unique path connecting s_i and t_i in T, which we denote by [s_i, t_i]. This observation allows us to interpret partial multicut on a tree as a special case of partial cover. The elements to cover are the paths [s_i, t_i], 1 ≤ i ≤ k, and the sets correspond to the edges of T. An edge e ∈ E covers those paths to which it belongs, with cost c_e.
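The set-system view can be sketched in a few lines. The snippet assumes that each path [s_i, t_i] has already been computed and is given as a set of edge labels; the function names are hypothetical.

```python
def edges_as_sets(paths):
    """Map each edge to the set of pair indices whose path contains it.

    paths[i] is the set of edge labels on the path [s_i, t_i].
    In partial-cover terms, pair indices are elements and edges are sets.
    """
    covers = {}
    for i, path in enumerate(paths):
        for e in path:
            covers.setdefault(e, set()).add(i)
    return covers

def covered(covers, D):
    """Indices of the pairs separated (covered) by the edge set D."""
    out = set()
    for e in D:
        out |= covers.get(e, set())
    return out
```

Requiring `len(covered(covers, D)) >= t` is then exactly the feasibility condition of partial multicut.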

The partial cover problem has received a great deal of attention in recent years. When t = n, partial cover reduces to the standard set cover problem, in which we wish to cover the entire universe of elements. Therefore, partial cover cannot be approximated to within a ratio of (1 − ε) ln n for any ε > 0, unless NP ⊂ TIME(n^{O(log log n)}) [7]. Slavík [21] generalized the analysis of the greedy set cover algorithm and proved that it guarantees an H(t)-approximation for partial cover. For the special case where each element appears in at most f sets, Bar-Yehuda [2] gave an f-approximation using the local-ratio method. This case was also studied by Gandhi et al. [8], who achieved a similar approximation ratio using a primal-dual algorithm. Unfortunately, simple examples show that none of these algorithms provides a constant factor approximation for partial multicut on a tree.

A closely related generalization of multicut is the prize-collecting multicut problem. In this variant we are not required to separate all pairs. However, if the set of edges we pick does not separate a pair {s_i, t_i}, we incur a penalty of p_i. The objective is to find a set of edges D ⊆ E that minimizes the cost of D plus the penalties of unseparated pairs. This problem indeed generalizes the multicut problem, since an optimal prize-collecting solution is also an optimal multicut when p_i = ∞ for all i. Once again, we focus our attention on instances of the problem in which the input graph is a tree.

For the remainder of this paper, the term "on a tree" is omitted whenever we discuss any of the problems or algorithms considered here. We remark that none of our results holds when the underlying graph is not a tree.

1.1. Results and techniques

In Section 2, we present an interpretation of the prize-collecting multicut problem as an equivalent multicut problem, which is created by adding new leaf vertices to the original tree T and modifying the collection of pairs to be separated. A 2-approximation for this problem immediately follows by applying the GVY algorithm to the resulting multicut instance. However, the partial multicut algorithm we suggest uses a prize-collecting heuristic as a subroutine, and requires a bound stronger than the one obtained by this straightforward approach.

Specifically, the prize-collecting algorithm should possess the Lagrangian multiplier preserving (LMP) property²: if we denote by C the total edge costs and by P the total penalties of unseparated pairs, then for some constant r ≥ 1 we have C + rP ≤ r · OPT, where OPT is the cost of an optimal solution. To achieve this property, we prove that our reduction produces multicut instances whose unique configuration forces the GVY algorithm to eliminate edges that are not part of the original tree, as long as feasibility is maintained. This corresponds to discarding redundant penalties from the prize-collecting solution. By exploiting the special structural properties of the resulting solution, we strengthen the analysis of Garg et al. and prove that the LMP property is satisfied with factor r = 2.

In Section 3, we present the main result of this paper, an (8/3 + ε)-approximation algorithm for the partial multicut problem, whose run time is strongly polynomial for any fixed ε > 0. It is important to note that this algorithm relies heavily on a preprocessing step in which we "guess" certain attributes of a fixed arbitrary optimal solution. This step is implemented using an exhaustive search that involves O(n^{1/ε}) calls to the procedure described below.

2 This term was coined by Jain et al. [15].


Although our algorithm is based on problem-specific methods, it is guided by the general framework of using the Lagrangian relaxation technique in approximation algorithms, originally suggested by Jain and Vazirani [16]. With respect to a natural integer programming formulation of partial multicut, we relax the complicating constraint that at most k − t pairs are not separated, and move it to the objective function together with a Lagrangian multiplier λ. For any fixed value of λ, this operation results in an instance of the prize-collecting multicut problem, with an additional constant term in the objective function. Rather than ensuring that the original constraint is satisfied, this new problem places a uniform penalty of λ for not separating any of the given pairs.

Next, we use the prize-collecting algorithm to conduct a binary search, at the end of which we find λ_1 ≥ λ_2 such that: for λ_1, the algorithm separates t_1 ≥ t pairs by picking the edge set D_1; for λ_2, it separates t_2 ≤ t pairs by picking D_2. We observe that D_1 and D_2 by themselves are not good solutions, since the cost of D_1 can be arbitrarily large with respect to that of the optimal solution, and since D_2 is generally not feasible. To resolve this problem, we devise an auxiliary procedure that constructs a new feasible solution D_3 by greedily transferring edges from D_1 to D_2. Our analysis shows that when λ_1 and λ_2 are sufficiently close, the cost of the cheaper solution from D_1 and D_3 is within factor 8/3 + ε of optimum.

Although the GVY algorithm constructively proves an upper bound of 2 on the integrality gap of the multicut LP-relaxation, no rounding algorithm is known for this problem. In Section 4 we provide such an algorithm, which is very easy to analyze and implement, although it requires solving two linear programs. Our method can be viewed as an extension of the threshold rounding technique, introduced by Hochbaum [13] for the vertex cover problem. Using the optimal fractional solution d*, our algorithm identifies a new collection of pairs to separate, and constructs a new linear program with the objective of separating these pairs. We prove that the polyhedron of feasible solutions to this program has integral extreme points. Moreover, we show that the integral solution we obtain is feasible for the original problem, and that its cost is at most twice the optimum. We remark that our algorithm follows a technique similar to the one suggested by Gaur et al. [11] for the rectangle stabbing problem.

2. The prize-collecting multicut problem

The main result of this section is a Lagrangian multiplier preserving algorithm for the prize-collecting multicut problem, with an approximation factor of 2. We begin with a brief description of the GVY algorithm³ and the structural properties of the solution it constructs. Next, we show how to reduce the prize-collecting multicut problem to an equivalent multicut problem by modifying the original tree and collection of pairs. Finally, we observe that our reduction forces the GVY algorithm to discard redundant penalties from the prize-collecting solution. Our analysis exploits this property to establish the main result of this section.

2.1. The GVY algorithm

The multicut problem can be formulated as an integer program by

$$
\begin{array}{lll}
\text{(MC)} \quad \text{minimize} & \sum_{e \in E} c_e d_e & \\
\text{subject to} & \sum_{e \in [s_i, t_i]} d_e \ge 1 \quad \forall i = 1, \ldots, k, & (2.1) \\
& d_e \in \{0, 1\} \quad \forall e \in E. & (2.2)
\end{array}
$$

In this formulation, the variable d_e indicates whether the edge e is picked for the multicut. Constraint (2.1) ensures that we pick at least one edge from each path [s_i, t_i]. The LP-relaxation of this program, (MC_f), is obtained by replacing the integrality constraint (2.2) with d_e ≥ 0. The dual of this linear program is

$$
\begin{array}{lll}
\text{maximize} & \sum_{i=1}^{k} f_i & \\
\text{subject to} & \sum_{i : e \in [s_i, t_i]} f_i \le c_e \quad \forall e \in E, & (2.3) \\
& f_i \ge 0 \quad \forall i = 1, \ldots, k. & (2.4)
\end{array}
$$

3 Actually, we describe its simplified version, that appears in [22, Chapter 18].


Fig. 1. The GVY algorithm.

The dual program can be viewed as the maximum multicommodity flow problem. Given k pairs of vertices, where each pair {s_i, t_i} is associated with a distinct commodity, the objective is to maximize the sum of routed commodities. In this context, the variable f_i specifies the amount of commodity we route between s_i and t_i. The primal costs now serve as capacities, and constraint (2.3) states that the sum of flows routed through each edge e does not exceed its capacity c_e.

The GVY algorithm is shown in Fig. 1. It follows the primal-dual schema for approximation algorithms, and constructs feasible primal and dual solutions whose costs are within a factor of 2 from each other. Let D be the edge set produced by the algorithm, and let f be the corresponding dual flow. Two structural properties of these solutions were proved in [10] and will be essential to our subsequent analysis.

Property 1. Only saturated edges are picked. That is, for every edge e, if e ∈ D then $\sum_{i : e \in [s_i, t_i]} f_i = c_e$.

Property 2. If there is a positive flow between s_i and t_i, at most two edges from the path [s_i, t_i] are picked. That is, for every 1 ≤ i ≤ k, if f_i > 0 then |D ∩ [s_i, t_i]| ≤ 2.
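Since the contents of Fig. 1 are not reproduced in this text, the following is only a sketch of a two-phase scheme of this kind, assuming the commonly described simplified form: phase 1 roots the tree, processes pairs by decreasing depth of their lowest common ancestor, routes as much flow as possible for each pair and picks newly saturated edges; phase 2 scans the picked edges in reverse order and drops those that are not needed. The tree encoding and all names are assumptions for illustration.

```python
def gvy_multicut(parent, cost, pairs):
    """Two-phase primal-dual sketch on a rooted tree.

    parent: dict vertex -> parent vertex (the root maps to None).
    cost:   dict child vertex -> cost of the edge (child, parent[child]).
    pairs:  list of (s, t) with s != t.
    Returns (D, flow): picked edges, named by child endpoint, and dual flow.
    """
    def depth(v):
        d = 0
        while parent[v] is not None:
            v, d = parent[v], d + 1
        return d

    def path(s, t):
        # Climb from the deeper endpoint until both walks meet at the LCA.
        edges, ds, dt = [], depth(s), depth(t)
        while s != t:
            if ds >= dt:
                edges.append(s); s = parent[s]; ds -= 1
            else:
                edges.append(t); t = parent[t]; dt -= 1
        return edges, s

    info = [path(s, t) for s, t in pairs]
    residual, flow, D = dict(cost), [0] * len(pairs), []

    # Phase 1: route flow for pairs whose LCA is v, deepest vertices first.
    for v in sorted(parent, key=depth, reverse=True):
        for i, (edges, lca) in enumerate(info):
            if lca == v:
                flow[i] = min(residual[e] for e in edges)
                for e in edges:
                    residual[e] -= flow[i]
        # Pick every edge that is saturated and not picked yet.
        D += [e for e in residual if residual[e] == 0 and e not in D]

    # Phase 2: reverse delete, keeping D a multicut at every step.
    for e in reversed(list(D)):
        rest = [x for x in D if x != e]
        if all(set(edges) & set(rest) for edges, _ in info):
            D = rest
    return D, flow
```

On small inputs the output can be checked directly against Properties 1 and 2, for instance by comparing the cost of D with twice the total flow.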

2.2. The prize-collecting algorithm

Reducing prize-collecting multicut to multicut: Given an instance of the prize-collecting multicut problem, with pairs {s_i, t_i} and associated penalties p_i, we can translate it to an instance of the multicut problem as follows. For every 1 ≤ i ≤ k, we add a new leaf vertex t'_i to T, and connect it to t_i. The cost of the additional edge (t_i, t'_i) is p_i. The new multicut problem asks to separate the pairs {s_1, t'_1}, ..., {s_k, t'_k} in the resulting tree, T'.

We now illustrate the equivalence between these two problems. Let D ⊆ E be any solution to the prize-collecting multicut problem in T, and let N ⊆ {1, ..., k} be the index set of pairs that are not separated by D. The cost of this solution is $\sum_{e \in D} c_e + \sum_{i \in N} p_i$. Since the edge (t_i, t'_i) separates the pair {s_i, t'_i} in T', we can easily construct a corresponding multicut in T' by picking the edge set D ∪ {(t_i, t'_i) : i ∈ N}. Clearly, the resulting solution has an identical cost, since the cost of (t_i, t'_i) is p_i. Similarly, any minimal solution D ⊆ E(T') to the multicut problem in T' can be used to obtain a prize-collecting solution in T with the same cost. This is done by picking the edge set D ∩ E and paying the penalties $\sum_{i \in N} p_i$, where N = {1 ≤ i ≤ k : (t_i, t'_i) ∈ D}.
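The reduction and its inverse translation can be sketched as follows. The rooted-tree encoding (edges named by their child endpoint) and the label `("leaf", i)` for the new vertex t'_i are assumptions made for illustration; tree vertices are assumed not to use such labels themselves.

```python
def reduce_to_multicut(parent, cost, pairs, penalties):
    """Build T' and the new pairs {s_i, t'_i} from a prize-collecting instance.

    parent: dict vertex -> parent vertex (the root maps to None).
    cost:   dict child vertex -> cost of the edge (child, parent[child]).
    """
    parent2, cost2, pairs2 = dict(parent), dict(cost), []
    for i, ((s, t), p) in enumerate(zip(pairs, penalties)):
        leaf = ("leaf", i)      # hypothetical label for the new vertex t'_i
        parent2[leaf] = t       # attach t'_i below t_i
        cost2[leaf] = p         # the edge (t_i, t'_i) costs p_i
        pairs2.append((s, leaf))
    return parent2, cost2, pairs2

def to_prize_collecting(D2, cost, penalties):
    """Translate a multicut D2 of T' back to T.

    Returns (D, N, total): original tree edges kept, indices of pairs whose
    penalty is paid, and the prize-collecting cost of that solution.
    """
    D = [e for e in D2 if e in cost]           # D2 restricted to E
    N = [e[1] for e in D2 if e not in cost]    # picked leaf edges
    total = sum(cost[e] for e in D) + sum(penalties[i] for i in N)
    return D, N, total
```

The total returned by `to_prize_collecting` equals the cost of D2 in T', which is the cost-preservation claim made above.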

An additional structural property: The reduction above suggests a straightforward way to approximate the prize-collecting multicut problem: reduce it to multicut, use the GVY algorithm, and translate the solution back to the original problem. Although Properties 1 and 2 can be used to prove that we obtain a 2-approximation, they are not sufficient to guarantee the LMP property. We deal with this difficulty through a closer inspection of phase 2 in the GVY algorithm, as a result of which we discover a third structural property.

For each 1 ≤ i ≤ k such that (t_i, t'_i) appears in the final solution D, consider the exact point in phase 2 at which the algorithm checks whether (t_i, t'_i) can be deleted or not. Since (t_i, t'_i) does not separate any pair other than {s_i, t'_i}, the algorithm is allowed to discard it if at least one edge on the path [s_i, t_i] appears in D at this point of time. It follows that we currently have D ∩ [s_i, t_i] = ∅, or otherwise (t_i, t'_i) would have been deleted. By observing that no edge is added after phase 1, we conclude the following property.

Property 3. If the edge (t_i, t'_i) survived phase 2, no other edge on the path [s_i, t'_i] was picked. That is, for every 1 ≤ i ≤ k, if (t_i, t'_i) ∈ D then D ∩ [s_i, t_i] = ∅.

Analysis: Let D_T ⊆ D be the set of edges that survived phase 2 and also belong to the original tree T. Property 3 implies that the index set of pairs that are not separated by D_T is exactly N = {1 ≤ i ≤ k : (t_i, t'_i) ∈ D}. Therefore, D_T is a solution to the original prize-collecting problem with edge costs $\sum_{e \in D_T} c_e$ and penalties $\sum_{i \in N} p_i$. In Lemmas 1 and 2 we separately bound the edge costs and penalties in terms of the dual solution f to the multicut problem in T'. In Theorem 3 we combine these bounds to prove the main result of this section.

Lemma 1. $\sum_{e \in D_T} c_e \le 2 \sum_{i \notin N} f_i$.

Proof. Property 3 implies that no edge in D_T belongs to a path [s_i, t_i] for i ∈ N, since otherwise the edge (t_i, t'_i) would not have survived phase 2. Therefore,

$$
\begin{aligned}
\sum_{e \in D_T} c_e &= \sum_{e \in D_T} \sum_{i : e \in [s_i, t'_i]} f_i && (2.5) \\
&= \sum_{e \in D_T} \sum_{i \notin N : e \in [s_i, t'_i]} f_i && (2.6) \\
&= \sum_{i \notin N} f_i \cdot |D_T \cap [s_i, t'_i]| && (2.7) \\
&\le 2 \sum_{i \notin N} f_i. && (2.8)
\end{aligned}
$$

Eq. (2.5) holds since $c_e = \sum_{i : e \in [s_i, t'_i]} f_i$, by Property 1. Eq. (2.6) follows from the observation that e ∉ [s_i, t'_i] for all i ∈ N, since e ∈ D_T. Eq. (2.7) results from changing the order of summation. Inequality (2.8) is due to |D_T ∩ [s_i, t'_i]| ≤ 2, which is implied by D_T ⊆ D and Property 2. □

Lemma 2. $\sum_{i \in N} p_i = \sum_{i \in N} f_i$.

Proof. Since the unique path to which (t_i, t'_i) belongs is [s_i, t'_i], for every 1 ≤ i ≤ k we have the dual constraint $f_i \le c_{(t_i, t'_i)} = p_i$. When i ∈ N, the edge (t_i, t'_i) was picked by the algorithm, and f_i = p_i by Property 1. □

Theorem 3. Let OPT be the cost of an optimal solution to the prize-collecting multicut problem. Then,

$$
\sum_{e \in D_T} c_e + 2 \sum_{i \in N} p_i \le 2 \cdot \mathrm{OPT}.
$$

Proof. We observed earlier that any solution to the prize-collecting multicut problem in T has a matching multicut solution in T' with an identical cost. Therefore, it is sufficient to prove the claim when OPT is replaced with the cost of an optimal solution to the latter problem, OPT'. By combining Lemmas 1 and 2, we have

$$
\sum_{e \in D_T} c_e + 2 \sum_{i \in N} p_i \le 2 \sum_{i \notin N} f_i + 2 \sum_{i \in N} f_i = 2 \sum_{i=1}^{k} f_i \le 2 \cdot \mathrm{OPT}'.
$$

The last inequality holds since f is a feasible dual solution, and its cost is a lower bound on the cost of any solution to the multicut problem. □

3. The partial multicut problem

In what follows we describe the main result of this paper, an (8/3 + ε)-approximation algorithm for the partial multicut problem. It runs in strongly polynomial time for any fixed ε > 0. We first present a natural integer programming formulation of partial multicut and derive its Lagrangian relaxation, the prize-collecting multicut problem. We then use the prize-collecting algorithm as a subroutine to find two preliminary sets of edges, D_1 and D_2. Although these sets are not good solutions by themselves, we show how to greedily combine them into a new edge set D_3, and prove that the cost of the cheaper solution from D_1 and D_3 is within factor 8/3 + ε of optimum.

3.1. Initial assumptions

An essential part of our algorithm is a preprocessing step in which we guess certain attributes of a fixed arbitrary optimal solution, D* ⊆ E, whose cost we denote by OPT. Based on these attributes, the given tree and collection of pairs are modified as we explain below. Given an accuracy parameter ε > 0, we can make the following assumptions by conducting an exhaustive search that involves O(n^{1/ε}) calls to the main algorithm and returning the best solution we find.

Assumption 1. All edge costs are strictly positive.

Assumption 2. We know c_max, the maximum cost of an edge in D*.

Assumption 3. The cost of each edge is at most ε · OPT.

Assumption 1 is obvious, since we can pick all zero cost edges in advance and contract them. We also eliminate the subset of pairs that are separated by these edges and update the requirement parameter t. Assumption 2 is justified, since we can test all O(n) edge costs as c_max, and for each such value contract all edges whose cost is greater than c_max. Finally, it is possible to enforce Assumption 3 by observing that there are at most 1/ε edges in D* with c_e ≥ ε · OPT. Therefore, we can guess the expensive edges in D* by testing all O(n^{1/ε}) subsets H ⊆ E of cardinality at most 1/ε. For each such subset, we include H in the solution, eliminate the subset of pairs separated by H, update the requirement parameter, and contract all edges whose cost is greater than min_{e∈H} c_e.

For the remainder of this section, we continue to denote by k the overall number of pairs, and by t the required number of pairs to be separated.

3.2. The Lagrangian relaxation

The partial multicut problem can be formulated as an integer program by

$$
\begin{array}{lll}
\text{minimize} & \sum_{e \in E} c_e d_e & \\
\text{subject to} & \sum_{e \in [s_i, t_i]} d_e + z_i \ge 1 \quad \forall i = 1, \ldots, k, & (3.1) \\
& \sum_{i=1}^{k} z_i \le k - t, & (3.2) \\
& d_e, z_i \in \{0, 1\} \quad \forall e \in E, \ i = 1, \ldots, k. & (3.3)
\end{array}
$$

The variable d_e indicates whether we pick the edge e, and the variable z_i indicates whether the pair {s_i, t_i} is not separated. Constraint (3.1) ensures that we either pick at least one edge of [s_i, t_i] or do not separate the corresponding pair. Constraint (3.2) ensures that at most k − t pairs are not separated, which is equivalent to requiring that at least t pairs are separated.

We relax the complicating constraint (3.2) and move it to the objective function multiplied by λ ≥ 0, to obtain the following Lagrangian relaxation problem:

$$
L(\lambda) = F(\lambda) - \lambda (k - t),
$$

$$
\begin{array}{lll}
F(\lambda) = \text{minimize} & \sum_{e \in E} c_e d_e + \lambda \sum_{i=1}^{k} z_i & \\
\text{subject to} & \sum_{e \in [s_i, t_i]} d_e + z_i \ge 1 \quad \forall i = 1, \ldots, k, & (3.4) \\
& d_e, z_i \in \{0, 1\} \quad \forall e \in E, \ i = 1, \ldots, k. & (3.5)
\end{array}
$$


For any fixed value of λ, L(λ) is an integer programming formulation of the prize-collecting multicut problem⁴ F(λ), with an additional constant term −λ(k − t). Note that the problem F(λ) places a uniform penalty of λ for not separating any of the given pairs. Although the next lemma is plain duality, we provide it for completeness.

Lemma 4. $\max_{\lambda \ge 0} L(\lambda) \le \mathrm{OPT}$.

Proof. Let (d*, z*) be a binary vector indicating which edges of T are contained in the optimal solution D*, and which pairs are not separated. That is, d*_e = 1 if and only if e ∈ D*, and z*_i = 1 if and only if D* ∩ [s_i, t_i] = ∅. Since (d*, z*) satisfies constraint (3.1), it is a feasible solution to L(λ), with cost

$$
\sum_{e \in E} c_e d^*_e + \lambda \sum_{i=1}^{k} z^*_i - \lambda (k - t)
= \sum_{e \in E} c_e d^*_e + \lambda \left( \sum_{i=1}^{k} z^*_i - (k - t) \right)
\le \sum_{e \in E} c_e d^*_e = \mathrm{OPT}.
$$

Note that $\lambda \left( \sum_{i=1}^{k} z^*_i - (k - t) \right) \le 0$ since λ ≥ 0 and $\sum_{i=1}^{k} z^*_i \le k - t$, by constraint (3.2). □
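The bound of Lemma 4 can be checked numerically on tiny instances. The sketch below evaluates F(λ), L(λ) and OPT by exhaustive enumeration; it assumes that the paths are given as sets of edge labels, and the function names are hypothetical. It is a sanity check rather than part of the algorithm.

```python
from itertools import combinations

def subsets(edges):
    """All subsets of the edge list, smallest first."""
    for r in range(len(edges) + 1):
        yield from combinations(edges, r)

def F(lam, cost, paths):
    """Exact prize-collecting optimum with a uniform penalty lam."""
    return min(
        sum(cost[e] for e in D)
        + lam * sum(1 for p in paths if not (p & set(D)))
        for D in subsets(list(cost))
    )

def L(lam, cost, paths, t):
    """Lagrangian value L(lam) = F(lam) - lam * (k - t)."""
    return F(lam, cost, paths) - lam * (len(paths) - t)

def opt(cost, paths, t):
    """Exact partial multicut optimum: separate at least t pairs."""
    return min(
        sum(cost[e] for e in D)
        for D in subsets(list(cost))
        if sum(1 for p in paths if p & set(D)) >= t
    )
```

For a star with three unit-cost arms, one pair per arm and t = 2, this gives OPT = 2 and L(λ) = 3 · min(1, λ) − λ, which reaches 2 at λ = 1 and is smaller elsewhere.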

3.3. Finding useful integral solutions

Given λ ≥ 0, we can use the LMP prize-collecting algorithm from Section 2 to obtain an integral solution (d^λ, z^λ) for F(λ) that satisfies $\sum_{e \in E} c_e d^\lambda_e + 2\lambda \sum_{i=1}^{k} z^\lambda_i \le 2F(\lambda)$. In particular, if we can find a value of λ for which (d^λ, z^λ) separates exactly t pairs, Lemma 4 shows that this solution is a 2-approximation for the partial multicut problem, since

$$
\sum_{e \in E} c_e d^\lambda_e \le 2 (F(\lambda) - \lambda (k - t)) = 2 L(\lambda) \le 2 \cdot \mathrm{OPT}.
$$

However, we do not know how to find such a value of λ. In fact, there are instances in which the prize-collecting algorithm does not separate exactly t pairs for any value of λ. For example, consider a star with n arms, each of unit cost. The center of the star is s, and the vertex at the end of each arm i is t_i. The collection of pairs to separate is {s, t_1}, ..., {s, t_n}. It is easy to verify that the prize-collecting algorithm does not separate any pair for penalties λ ≤ 1, but separates all pairs when λ > 1.

Nevertheless, when λ = 0 the prize-collecting algorithm does not separate any pair. This follows from observing that by Assumption 1 all edge costs are strictly positive, and since F(0) = 0 no edge is picked by the algorithm. In addition, F(λ) ≤ k · c_max for any λ, since we can separate all pairs by picking at most k edges (with maximum cost c_max). It follows that the algorithm separates all pairs when λ > k · c_max. Therefore, using the prize-collecting algorithm we conduct a binary search over the interval [0, k · c_max + 1], in which we find λ_1 ≥ λ_2, with approximate solutions (d^1, z^1) and (d^2, z^2) for F(λ_1) and F(λ_2), respectively, such that
1. λ_1 − λ_2 ≤ ε · c_min/k, where c_min is the minimum cost of an edge in T (recall that c_min > 0 by Assumption 1).
2. The solution (d^1, z^1) separates t_1 ≥ t pairs.
3. The solution (d^2, z^2) separates t_2 ≤ t pairs.

Without loss of generality, none of these solutions separates exactly t pairs, or otherwise we immediately obtain a 2-approximation. Having observed this fact, we claim that the cost of an optimal partial multicut can be approximated by a convex combination of (d^1, z^1) and (d^2, z^2), an essential characterization on which our lower bounding arguments will depend.

Lemma 5. Let α = (t − t_2)/(t_1 − t_2) ∈ (0, 1). Then,

$$
\alpha \sum_{e \in E} c_e d^1_e + (1 - \alpha) \sum_{e \in E} c_e d^2_e \le 2 (1 + \varepsilon) \mathrm{OPT}.
$$

Proof. We first observe that for j = 1, 2 we have

$$
\sum_{e \in E} c_e d^j_e + 2 \lambda_j \sum_{i=1}^{k} z^j_i \le 2 F(\lambda_j) = 2 (L(\lambda_j) + \lambda_j (k - t)) \le 2 (\mathrm{OPT} + \lambda_j (k - t)),
$$

where the first inequality follows from Theorem 3, and the second from Lemma 4. Therefore,

$$
\begin{aligned}
\alpha \sum_{e \in E} c_e d^1_e + (1 - \alpha) \sum_{e \in E} c_e d^2_e
&\le 2 \cdot \mathrm{OPT} + 2 \alpha \lambda_1 ((k - t) - (k - t_1)) + 2 (1 - \alpha) \lambda_2 ((k - t) - (k - t_2)) \\
&\le 2 (1 + \varepsilon) \mathrm{OPT},
\end{aligned}
$$

where the last inequality follows from observing that

$$
\begin{aligned}
& \alpha \lambda_1 ((k - t) - (k - t_1)) + (1 - \alpha) \lambda_2 ((k - t) - (k - t_2)) \\
&\quad \le \alpha \left( \lambda_2 + \frac{\varepsilon \cdot c_{\min}}{k} \right) ((k - t) - (k - t_1)) + (1 - \alpha) \lambda_2 ((k - t) - (k - t_2)) \\
&\quad = \lambda_2 \big( (k - t) - (\alpha (k - t_1) + (1 - \alpha)(k - t_2)) \big) + \alpha \cdot \varepsilon \cdot c_{\min} \frac{t_1 - t}{k} \\
&\quad \le \varepsilon \cdot c_{\min} \\
&\quad \le \varepsilon \cdot \mathrm{OPT}.
\end{aligned}
$$

The first inequality holds since λ_1 − λ_2 ≤ ε · c_min/k and k − t_1 ≤ k − t. The second inequality holds since k − t = α(k − t_1) + (1 − α)(k − t_2), α ≤ 1 and t_1 − t ≤ k. □

4 For ease of presentation, we refer to the previously described integer program and its optimum value as F(λ).

We remark that O(log((k² · c_max)/(ε · c_min))) calls to the prize-collecting algorithm are required in order to complete the binary search described above. In Appendix A we show that this step can be replaced with an approximate version of Megiddo's parametric search method [20], whose run time is strongly polynomial. A similar modification was also given by Levin [19].
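The plain binary search (not the parametric search variant) can be sketched as follows. The prize-collecting routine is treated as a black box: `count_separated` is a hypothetical callback that returns the number of pairs separated for a given uniform penalty, and it is assumed to return 0 at penalty 0 and k above k · c_max, as argued in the text.

```python
def lambda_search(count_separated, t, k, c_max, c_min, eps):
    """Binary search for penalties lam1 >= lam2 that are close together.

    Invariant: the routine separates >= t pairs at hi and < t pairs at lo.
    Returns (lam1, lam2) with lam1 - lam2 <= eps * c_min / k.
    """
    lo, hi = 0.0, k * c_max + 1.0
    while hi - lo > eps * c_min / k:
        mid = (lo + hi) / 2
        if count_separated(mid) >= t:
            hi = mid    # still feasible: shrink from above
        else:
            lo = mid    # too few pairs separated: raise the penalty
    return hi, lo

# Behaviour described in the text for a star with 3 unit-cost arms:
# nothing is separated for penalties <= 1, everything for penalties > 1.
def star_oracle(lam):
    return 3 if lam > 1 else 0

lam1, lam2 = lambda_search(star_oracle, t=2, k=3, c_max=1, c_min=1, eps=0.1)
```

The loop halves an interval of length k · c_max + 1 until it is at most ε · c_min/k, which matches the logarithmic number of calls stated above.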

3.4. A greedy partial cover algorithm

We temporarily deviate from the problem-specific theme of this section, to design a greedy partial cover algorithm. Its analysis will considerably simplify the presentation of the final step in our algorithm. We state the next result in terms of set systems, since it does not rely on the special structure of the partial multicut problem. Let U = {e_1, ..., e_n} be a ground set of elements, and let S = {S_1, ..., S_m} be a collection of subsets of U, where each subset S_i has a non-negative cost c_i. We show how to find in polynomial time a subcollection S' ⊆ S covering at least q elements, such that c(S') ≤ (q/n) c(S) + max_{S_i ∈ S} c_i.

Without loss of generality, we assume that S is a minimal cover of U. In other words, U cannot be covered by S \ {S_i}, for all S_i ∈ S. We assign each element e ∈ U to an arbitrary subset S_i in which it appears. Let φ : U → S be the resulting assignment, and for each S_i ∈ S let φ^{-1}(S_i) be the subset of U that is assigned to S_i. Note that {φ^{-1}(S_i) : S_i ∈ S} is a partition of U, and φ^{-1}(S_i) ≠ ∅ for every S_i ∈ S, since S is minimal. For a subset S_i ∈ S, let r_i = c_i/|φ^{-1}(S_i)| be its ratio. We assume that the subsets in S are indexed in non-decreasing order of their ratio, that is, r_1 ≤ ··· ≤ r_m.

Theorem 6. Let S ′ = {S1, . . . , Sp}, where p is the minimal index for which∑p

i=1|�−1(Si)|�q. Then,

c(S ′)� q

nc(S) + max

Si∈Sci .

Proof. Clearly, it is sufficient to prove that $\sum_{i=1}^{p-1} c_i \le (q/n)\,c(S)$. Since $r_1 \le \cdots \le r_m$, the weighted average ratio of the subsets in $\{S_1, \ldots, S_m\}$ is an upper bound on that of the subsets in $\{S_1, \ldots, S_{p-1}\}$, where the weight of each subset $S_i \in S$ is $|\varphi^{-1}(S_i)|$. Therefore,

\[
\sum_{i=1}^{p-1} c_i \le \frac{\sum_{i=1}^{p-1} |\varphi^{-1}(S_i)|}{\sum_{i=1}^{m} |\varphi^{-1}(S_i)|} \sum_{i=1}^{m} c_i \le \frac{q}{n}\, c(S),
\]

since $\sum_{i=1}^{m} |\varphi^{-1}(S_i)| = n$ and $\sum_{i=1}^{p-1} |\varphi^{-1}(S_i)| < q$, or otherwise $p$ is not minimal. □
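The procedure analyzed in Theorem 6 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `greedy_partial_cover`, the representation of subsets as Python sets, and the rule "assign each element to the first subset that contains it" are choices made here, not taken from the paper. The sketch also assumes that the given subsets cover the whole ground set.

```python
def greedy_partial_cover(universe, subsets, costs, q):
    """Return indices of a subcollection covering at least q elements.

    universe: list of hashable elements; subsets: list of sets whose union
    contains the universe (assumed); costs: one non-negative number per subset.
    All names here are hypothetical and chosen for illustration.
    """
    ground = set(universe)
    if q > len(ground):
        raise ValueError("q cannot exceed the number of elements")

    # Reduce to a minimal cover: drop a subset whenever the rest still cover U.
    active = list(range(len(subsets)))
    for i in list(active):
        rest = set().union(*(subsets[j] for j in active if j != i))
        if ground <= rest:
            active.remove(i)

    # Assign every element to an arbitrary (here: the first) subset containing it.
    assigned = {i: 0 for i in active}
    for e in ground:
        owner = next(i for i in active if e in subsets[i])
        assigned[owner] += 1

    # Sort by ratio cost / (number of assigned elements); minimality of the
    # cover guarantees that every remaining subset owns at least one element.
    order = sorted(active, key=lambda i: costs[i] / assigned[i])

    # Take the shortest prefix whose assigned elements number at least q.
    chosen, covered = [], 0
    for i in order:
        if covered >= q:
            break
        chosen.append(i)
        covered += assigned[i]
    return chosen


universe = [1, 2, 3, 4, 5, 6]
subsets = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}]
costs = [3, 10, 6, 1]
picked = greedy_partial_cover(universe, subsets, costs, q=4)
print(picked)  # → [0, 2]
```

In this small instance the pruning step leaves the minimal cover made of subsets 0 and 2, of total cost 9, so the bound of Theorem 6 for $q = 4$ reads $(4/6) \cdot 9 + 6 = 12$, and the returned subcollection costs 9.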

3.5. A greedy combination

Let $D_1$ be the set of edges picked by the solution $(d^1, z^1)$. Although $D_1$ is a feasible solution to the partial multicut problem, its cost can be arbitrarily large with respect to OPT. In contrast, Theorem 3 and Lemma 4 imply that the cost of the edge set $D_2$, picked by the solution $(d^2, z^2)$, is at most $2 \cdot \mathrm{OPT}$. Since $D_2$ is not a feasible solution, our final objective is to construct a new feasible solution $D_3$ by greedily transferring edges from $D_1$ to $D_2$.

Since $D_2$ separates $t_2$ pairs, we can complete it to a feasible solution by finding a set of edges that separates at least $t - t_2$ additional pairs. Note that $D_1 \setminus D_2$ separates at least $t_1 - t_2$ pairs that are not separated by $D_2$. Therefore, we can use the greedy partial cover algorithm from Section 3.4 to find a set of edges $S \subseteq D_1 \setminus D_2$ that separates at least $t - t_2$ pairs from those separated by $D_1$ but not by $D_2$. It follows that $D_3 = D_2 \cup S$ is a feasible solution to the partial multicut problem. In addition, by Theorem 6 and the assumption that the cost of each edge is at most $\epsilon \cdot \mathrm{OPT}$,

\[
\sum_{e \in S} c_e \le \frac{t - t_2}{t_1 - t_2} \sum_{e \in E} c_e d^1_e + \epsilon \cdot \mathrm{OPT} = \alpha \sum_{e \in E} c_e d^1_e + \epsilon \cdot \mathrm{OPT}. \qquad (3.6)
\]

We are now ready to prove that the cost of the cheaper solution from $D_1$ and $D_3$ is within factor $\frac{8}{3} + \epsilon$ of optimum. In Lemmas 7 and 8 we bound the cost of $D_1$ and $D_3$ in terms of OPT, $\alpha$ and $\mu$, where

\[
\alpha = \frac{t - t_2}{t_1 - t_2} \in (0, 1), \qquad \mu = \frac{\sum_{e \in E} c_e d^2_e}{\mathrm{OPT}} \in [0, 2].
\]

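Lemma 7. $\sum_{e \in D_1} c_e \le \frac{2(1 + \epsilon) - (1 - \alpha)\mu}{\alpha}\, \mathrm{OPT}$.

Proof. Since $\alpha \neq 0$, we have

\[
\sum_{e \in D_1} c_e = \frac{1}{\alpha} \cdot \alpha \sum_{e \in E} c_e d^1_e \le \frac{1}{\alpha} \Bigl( 2(1 + \epsilon)\mathrm{OPT} - (1 - \alpha) \sum_{e \in E} c_e d^2_e \Bigr) = \frac{2(1 + \epsilon) - (1 - \alpha)\mu}{\alpha}\, \mathrm{OPT}.
\]

The first inequality follows from Lemma 5, and the last equation holds since $\sum_{e \in E} c_e d^2_e = \mu \cdot \mathrm{OPT}$. □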

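Lemma 8. $\sum_{e \in D_3} c_e \le (2 + \alpha\mu + 3\epsilon)\mathrm{OPT}$.

Proof. Since $D_3 = D_2 \cup S$, we have

\[
\begin{aligned}
\sum_{e \in D_3} c_e &= \sum_{e \in D_2} c_e + \sum_{e \in S} c_e \\
&\le \sum_{e \in E} c_e d^2_e + \alpha \sum_{e \in E} c_e d^1_e + \epsilon \cdot \mathrm{OPT} && (3.7) \\
&= (1 - \alpha) \sum_{e \in E} c_e d^2_e + \alpha \sum_{e \in E} c_e d^1_e + \alpha \sum_{e \in E} c_e d^2_e + \epsilon \cdot \mathrm{OPT} \\
&\le 2(1 + \epsilon)\mathrm{OPT} + \alpha \sum_{e \in E} c_e d^2_e + \epsilon \cdot \mathrm{OPT} && (3.8) \\
&= (2 + \alpha\mu + 3\epsilon)\mathrm{OPT}. && (3.9)
\end{aligned}
\]

Inequality (3.7) follows from inequality (3.6), and inequality (3.8) from Lemma 5. Eq. (3.9) is obtained by substituting $\sum_{e \in E} c_e d^2_e = \mu \cdot \mathrm{OPT}$. □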

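Theorem 9. The cost of the cheaper solution from $D_1$ and $D_3$ is at most $(\frac{8}{3} + 4\epsilon)\mathrm{OPT}$.

Proof. Lemmas 7 and 8 show that

\[
\min \Bigl\{ \sum_{e \in D_1} c_e,\ \sum_{e \in D_3} c_e \Bigr\} \le \min \Bigl\{ \frac{2(1 + \epsilon) - (1 - \alpha)\mu}{\alpha},\ 2 + \alpha\mu + 3\epsilon \Bigr\}\, \mathrm{OPT}.
\]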

Although we cannot control $\alpha \in (0, 1)$ and $\mu \in [0, 2]$, the approximation guarantee of the algorithm can be bounded by considering the worst possible choice for these parameters. It can be easily verified that

\[
\max_{\substack{\alpha \in (0,1) \\ \mu \in [0,2]}} \min \Bigl\{ \frac{2 - (1 - \alpha)\mu}{\alpha},\ 2 + \alpha\mu \Bigr\} = \frac{8}{3},
\]

which is attained at $\alpha = \frac{1}{2}$ and $\mu = \frac{4}{3}$. □
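One way to carry out the verification of the max-min value is sketched below; this is a possible argument added for illustration, not necessarily the one the authors had in mind. For fixed $\alpha \in (0, 1)$, the first term is decreasing in $\mu$ and the second is increasing in $\mu$. At $\mu = 0$ the first term equals $2/\alpha > 2$, and at $\mu = 2$ the first term equals $2 < 2 + 2\alpha$, so the minimum is maximized where the two terms coincide:

\[
\frac{2 - (1 - \alpha)\mu}{\alpha} = 2 + \alpha\mu
\iff 2(1 - \alpha) = \mu (1 - \alpha + \alpha^2)
\iff \mu = \frac{2(1 - \alpha)}{1 - \alpha + \alpha^2}.
\]

Substituting this value of $\mu$ into the second term and writing $u = \alpha(1 - \alpha) \in (0, \frac{1}{4}]$ gives

\[
2 + \alpha\mu = 2 + \frac{2\alpha(1 - \alpha)}{1 - \alpha(1 - \alpha)} = 2 + \frac{2u}{1 - u}.
\]

Since $u/(1 - u)$ is increasing in $u$, the maximum is obtained at $u = \frac{1}{4}$, that is, at $\alpha = \frac{1}{2}$, where the value is $2 + 2 \cdot \frac{1/4}{3/4} = \frac{8}{3}$ and $\mu = \frac{2 \cdot (1/2)}{3/4} = \frac{4}{3}$.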

4. An LP-rounding multicut algorithm

In this section we provide an LP-rounding algorithm for the multicut problem, whose approximation factor is 2. Although our algorithm is easy to analyze and implement, it is not as efficient as the GVY algorithm, since we are required to solve two linear programs.

4.1. The algorithm

For $1 \le i \le k$, let $l_i$ be the lowest common ancestor of $s_i$ and $t_i$, with respect to an arbitrary root of $T$ we fix in advance. Recall that the multicut problem can be formulated as the integer program (MC), given in Section 2.1, whose LP-relaxation we denoted by (MC$_f$). We first solve the linear program (MC$_f$) to obtain an optimal fractional solution $d^*$, and use it to identify a new collection of pairs to separate. Specifically, we define $v_i = s_i$ if $\sum_{e \in [s_i, l_i]} d^*_e \ge \sum_{e \in [t_i, l_i]} d^*_e$ and $v_i = t_i$ otherwise. Since $[v_i, l_i]$ is a subpath of $[s_i, t_i]$, any set of edges that separates $\{v_1, l_1\}, \ldots, \{v_k, l_k\}$ also separates the original collection of pairs. We now construct a new linear program

\[
\begin{array}{lll}
(\mathrm{MC}'_f) & \text{minimize} & \sum_{e \in E} c_e d_e \\
& \text{subject to} & \sum_{e \in [v_i, l_i]} d_e \ge 1 \quad \forall\, i = 1, \ldots, k, \qquad (4.1) \\
& & d_e \ge 0 \quad \forall\, e \in E \qquad (4.2)
\end{array}
\]

and solve it to obtain an optimal solution $\hat{d}$.
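The pair-selection step described above (choosing $v_i$ from a fractional solution) can be illustrated with a short Python sketch. Solving the two linear programs is deliberately left out. Everything below is an assumption made for illustration: the tree is given by a hypothetical `parent` dictionary, each edge is named by its lower endpoint, and `d_star` holds made-up feasible fractional values rather than an actual optimal solution of (MC$_f$).

```python
def lowest_common_ancestor(parent, s, t):
    """Lowest common ancestor of s and t in a rooted tree given by parent pointers."""
    ancestors_of_s = set()
    v = s
    while v is not None:
        ancestors_of_s.add(v)
        v = parent[v]
    v = t
    while v not in ancestors_of_s:
        v = parent[v]
    return v


def edges_to_ancestor(parent, v, ancestor):
    """Edges on the path from v up to ancestor; an edge is named by its lower endpoint."""
    edges = []
    while v != ancestor:
        edges.append(v)
        v = parent[v]
    return edges


def choose_half_paths(parent, pairs, d_star):
    """For each pair (s, t), keep the endpoint whose path to the LCA is fractionally heavier."""
    half_paths = []
    for s, t in pairs:
        lca = lowest_common_ancestor(parent, s, t)
        weight_s = sum(d_star[e] for e in edges_to_ancestor(parent, s, lca))
        weight_t = sum(d_star[e] for e in edges_to_ancestor(parent, t, lca))
        v = s if weight_s >= weight_t else t
        half_paths.append((v, lca))
    return half_paths


# Hypothetical instance: root r with children a and b; c is a child of a.
parent = {"r": None, "a": "r", "b": "r", "c": "a"}
pairs = [("c", "b"), ("a", "b")]
# Made-up fractional values, feasible for both pairs (each full path sums to >= 1).
d_star = {"a": 0.5, "b": 0.5, "c": 0.5}

half_paths = choose_half_paths(parent, pairs, d_star)
print(half_paths)  # → [('c', 'r'), ('a', 'r')]
```

For every selected half-path, doubling the fractional values yields a total of at least 1 on that path, which is the feasibility property used in the analysis of Section 4.2.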

4.2. Analysis

In Lemma 10 we show that $\hat{d}$ is an extreme point of an integral polyhedron, and therefore it is indeed a feasible solution to (MC). In Theorem 11 we prove that the cost of $\hat{d}$ is at most twice the cost of $d^*$, which is a lower bound on the cost of any solution to the multicut problem.

Lemma 10. Any basic feasible solution to (MC$'_f$) is integral.

Proof. For each path $[v_i, l_i]$, $l_i$ is an ancestor of $v_i$. Therefore, we can orient the edges of $T$ from the root down to the leaves, and obtain a directed tree. It follows that the constraint matrix in (MC$'_f$) is the transpose of a chain matrix, which is a matrix whose columns are edge vectors of directed paths in a graph. Camion [4] showed that the chain matrix induced by a directed tree is totally unimodular. □

Theorem 11. The cost of $\hat{d}$ is at most $2 \cdot \mathrm{OPT}(\mathrm{MC}_f)$.

Proof. To bound the cost of $\hat{d}$, we claim that $2d^*$ is a feasible solution to (MC$'_f$). Since $d^*$ satisfies constraint (2.1), $\sum_{e \in [s_i, l_i]} d^*_e + \sum_{e \in [t_i, l_i]} d^*_e = \sum_{e \in [s_i, t_i]} d^*_e \ge 1$. Therefore, if we assume without loss of generality that $\sum_{e \in [s_i, l_i]} d^*_e \ge \sum_{e \in [t_i, l_i]} d^*_e$, we have $v_i = s_i$ and $\sum_{e \in [v_i, l_i]} (2d^*_e) = 2 \sum_{e \in [s_i, l_i]} d^*_e \ge 1$. Since $\hat{d}$ is an optimal solution to (MC$'_f$), we conclude that

\[
\sum_{e \in E} c_e \hat{d}_e \le \sum_{e \in E} c_e (2d^*_e) = 2 \cdot \mathrm{OPT}(\mathrm{MC}_f). \quad \Box
\]

5. Concluding remarks

It would be of interest to investigate whether the approximation guarantee of $\frac{8}{3} + \epsilon$ for partial multicut can be improved. We suggest several approaches in an attempt to achieve better algorithms:

1. The construction in the hardness proof of Garg et al. [10] shows that partial multicut contains partial vertex cover as a special case. The algorithms of Bar-Yehuda [2] and Gandhi et al. [8] give a 2-approximation for the latter problem. This result was obtained even earlier by Bshouty and Burroughs [3] and by Hochbaum [14]. It might be possible to extend and specialize these algorithms to the partial multicut problem.

2. As observed in Section 3, if we can find a value of $\lambda$ for which the prize-collecting algorithm separates exactly $t$ pairs, the resulting solution is a 2-approximation. However, we also observed that there are instances in which the algorithm does not separate exactly $t$ pairs for any value of $\lambda$. Therefore, an interesting open question is whether the prize-collecting algorithm can be modified to attain the following continuity property: As $\lambda$ grows, the number of separated pairs jumps by at most 1. A similar question was posed by Jain and Vazirani [16] with respect to their primal-dual algorithm for the uncapacitated facility location problem. Archer et al. [1] provided a positive answer to this question, although their modified algorithm is not polynomial.

3. It is possible to augment the prize-collecting algorithm with a local improvement phase. Given an edge set $D \subseteq E$, we may add to $D$ an edge $e \in E \setminus D$ if its cost is bounded by the penalties we save. Similarly, we may remove from $D$ an edge if the new penalties we incur are bounded by its cost. Although this phase does not improve the approximation factor of 2 for prize-collecting multicut, we obtain a lower bound on $c_e$ for every $e \in E \setminus D$ and an upper bound on $c_e$ for every $e \in D$. An open question is whether these bounds are useful.

Subsequent work: We have recently learned that some of our results were independently obtained by Golovin et al. [12]. In addition, the techniques introduced in the present paper motivated Könemann et al. [18] to present a unified framework for approximating partial covering problems.

Appendix A. An approximate version of the parametric search method

In what follows we show how to find in strongly polynomial time a value $\lambda^*$ such that either the solution constructed by the prize-collecting algorithm for $F(\lambda^*)$ separates exactly $t$ pairs, or that for infinitesimally small $\epsilon > 0$ the algorithm produces a solution that separates less than $t$ pairs for $F(\lambda^* - \epsilon)$ and more than $t$ pairs for $F(\lambda^* + \epsilon)$. This value is found using the parametric search method [20].

We simulate the prize-collecting algorithm with the unknown value of $\lambda^*$. Throughout this simulation we maintain an interval $I = [\lambda_l, \lambda_h]$, such that for $F(\lambda_l)$ the prize-collecting algorithm constructs a solution that separates at most $t$ pairs, and for $F(\lambda_h)$ it constructs a solution that separates at least $t$ pairs. Initially, $I = [0, \infty)$. The simulation is carried out in the following way: we let the algorithm perform one step at a time, while treating $\lambda^*$ as a parametric symbol. A step that consists of the addition of two numbers is implemented by adding two linear functions of $\lambda^*$. Similarly, multiplication by a constant is also easily implemented by multiplying a linear function of $\lambda^*$ by a constant. The difficulties arise with comparisons.

In order to resolve a comparison of two linear functions of $\lambda^*$, it is sufficient to compute their breaking point $\lambda_{br}$ and figure out whether $\lambda^* < \lambda_{br}$, $\lambda^* = \lambda_{br}$ or $\lambda^* > \lambda_{br}$. When $\lambda_{br} \notin I$, this comparison can be answered easily, since it is independent of $\lambda_{br}$ over $I$. Otherwise, we use the prize-collecting algorithm to approximate $F(\lambda_{br})$. If the resulting solution separates less than $t$ pairs, we conclude that $\lambda^* > \lambda_{br}$ and set $\lambda_l = \lambda_{br}$. If it separates exactly $t$ pairs, we are done. Finally, if more than $t$ pairs are separated, we conclude that $\lambda^* < \lambda_{br}$ and set $\lambda_h = \lambda_{br}$.

At the end of this simulation we obtain either a solution that separates exactly $t$ pairs or a value $\lambda^*$ such that for $F(\lambda^* - \epsilon)$ the algorithm produces a solution that separates less than $t$ pairs, and for $F(\lambda^* + \epsilon)$ it produces a solution that separates more than $t$ pairs.
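The comparison step of the simulation can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the class `ParametricComparator`, the encoding of a linear function $a + b\lambda$ as a pair `(a, b)`, and the callback `separated_pairs_at` (standing in for a run of the prize-collecting algorithm that reports how many pairs its solution separates) are all hypothetical names. Exact rational arithmetic is used so that breaking points are computed without rounding.

```python
from fractions import Fraction


def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


class ParametricComparator:
    """Resolves comparisons between linear functions of the unknown value lambda*."""

    def __init__(self, separated_pairs_at, t):
        self.separated_pairs_at = separated_pairs_at  # hypothetical oracle
        self.t = t
        self.low = Fraction(0)   # lambda_l
        self.high = None         # lambda_h; None stands for infinity
        self.exact = None        # set once some value separates exactly t pairs

    def compare(self, f, g):
        """Sign of f(lambda*) - g(lambda*), where (a, b) encodes a + b * lambda."""
        da = Fraction(f[0]) - Fraction(g[0])
        db = Fraction(f[1]) - Fraction(g[1])
        if self.exact is not None:
            return sign(da + db * self.exact)
        if db == 0:
            return sign(da)  # the difference does not depend on lambda
        breaking_point = -da / db
        inside = self.low < breaking_point and (
            self.high is None or breaking_point < self.high
        )
        if inside:
            count = self.separated_pairs_at(breaking_point)
            if count == self.t:
                self.exact = breaking_point
                return 0
            if count < self.t:
                self.low = breaking_point   # conclude lambda* > breaking point
            else:
                self.high = breaking_point  # conclude lambda* < breaking point
        # The sign is now constant strictly inside the interval; probe one point there.
        if self.high is None:
            probe = self.low + 1
        else:
            probe = (self.low + self.high) / 2
        return sign(da + db * probe)


# Toy oracle: the number of separated pairs jumps from 1 to 5 at lambda = 2.
comparator = ParametricComparator(lambda lam: 1 if lam < 2 else 5, t=3)
first = comparator.compare((0, 1), (3, 0))   # lambda* versus 3
second = comparator.compare((0, 1), (1, 0))  # lambda* versus 1
third = comparator.compare((0, 2), (4, 0))   # 2 * lambda* versus 4
```

In this toy run no value of $\lambda$ separates exactly $t = 3$ pairs, so the interval shrinks around the jump at $\lambda = 2$, mirroring the second outcome described above. Each comparison costs at most one oracle call, which is what keeps the number of calls tied to the number of comparisons rather than to the magnitude of the costs.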

References

[1] A. Archer, R. Rajagopalan, D.B. Shmoys, Lagrangian relaxation for the k-median problem: new insights and continuity properties, in: Proc. 11th Annu. European Symp. on Algorithms, 2003, pp. 31–42.

[2] R. Bar-Yehuda, Using homogeneous weights for approximating the partial cover problem, J. Algorithms 39 (2001) 137–144.

[3] N.H. Bshouty, L. Burroughs, Massaging a linear programming solution to give a 2-approximation for a generalization of the vertex cover problem, in: Proc. 15th Annu. Symp. on Theoretical Aspects of Computer Science, 1998, pp. 298–308.

[4] P. Camion, Matrices totalement unimodulaires et problèmes combinatoires, Thèse et Rapport Euratom, Université de Bruxelles, 1963.

[5] S. Chawla, R. Krauthgamer, R. Kumar, Y. Rabani, D. Sivakumar, On the hardness of approximating multicut and sparsest-cut, in: Proc. 20th Annu. IEEE Conf. on Computational Complexity, 2005, pp. 144–153.

[6] E. Dahlhaus, D.S. Johnson, C.H. Papadimitriou, P.D. Seymour, M. Yannakakis, The complexity of multiterminal cuts, SIAM J. Comput. 23 (1994) 864–894.

[7] U. Feige, A threshold of ln n for approximating set cover, J. ACM 45 (1998) 634–652.

[8] R. Gandhi, S. Khuller, A. Srinivasan, Approximation algorithms for partial covering problems, J. Algorithms 53 (2004) 55–84.

[9] N. Garg, V.V. Vazirani, M. Yannakakis, Approximate max-flow min-(multi)cut theorems and their applications, SIAM J. Comput. 25 (1996) 235–251.

[10] N. Garg, V.V. Vazirani, M. Yannakakis, Primal-dual approximation algorithms for integral flow and multicut in trees, Algorithmica 18 (1997) 3–20.

[11] D.R. Gaur, T. Ibaraki, R. Krishnamurti, Constant ratio approximation algorithms for the rectangle stabbing problem and the rectilinear partitioning problem, J. Algorithms 43 (2002) 138–152.

[12] D. Golovin, V. Nagarajan, M. Singh, Approximating the k-multicut problem, in: Proc. 17th Annu. ACM-SIAM Symp. on Discrete Algorithms, 2006, pp. 621–630.

[13] D.S. Hochbaum, Approximation algorithms for the set covering and vertex cover problems, SIAM J. Comput. 11 (1982) 555–556.

[14] D.S. Hochbaum, The t-vertex cover problem: extending the half integrality framework with budget constraints, in: Proc. First Internat. Workshop on Approximation Algorithms for Combinatorial Optimization, 1998, pp. 111–122.

[15] K. Jain, M. Mahdian, A. Saberi, A new greedy approach for facility location problems, in: Proc. 34th Annu. ACM Symp. on Theory of Computing, 2002, pp. 731–740.

[16] K. Jain, V.V. Vazirani, Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and Lagrangian relaxation, J. ACM 48 (2001) 274–296.

[17] S. Khot, On the power of unique 2-prover 1-round games, in: Proc. 34th Annu. ACM Symp. on Theory of Computing, 2002, pp. 767–775.

[18] J. Könemann, O. Parekh, D. Segev, A unified approach to approximating partial covering problems, in: Proc. 14th Annu. European Symp. on Algorithms, 2006, pp. 468–479.

[19] A. Levin, Strongly polynomial-time approximation for a class of bicriteria problems, Oper. Res. Lett. 32 (2004) 530–534.

[20] N. Megiddo, Combinatorial optimization with rational objective functions, Math. Oper. Res. 4 (1979) 414–424.

[21] P. Slavík, Improved performance of the greedy algorithm for partial cover, Inform. Process. Lett. 64 (1997) 251–254.

[22] V.V. Vazirani, Approximation Algorithms, Springer, Berlin, 2001.