Improved Approximation Algorithms for Label Cover Problemshajiagha/LabelCover.pdf · Improved Approximation Algorithms for Label Cover Problems Moses Charikar1?, MohammadTaghi Hajiaghayi2,

Improved Approximation Algorithms for LabelCover Problems

Moses Charikar1?, MohammadTaghi Hajiaghayi2, and Howard Karloff2

1 Department of Computer Science, Princeton University,Princeton, NJ 08540, USA, [email protected]

2 AT&T Labs — Research, 180 Park Ave.,Florham Park, NJ 07932, USA, hajiagha,[email protected]

Abstract In this paper we consider both the maximization variant MaxRep and the minimization variant Min Rep of the famous Label Coverproblem, for which, till now, the best approximation ratios known wereO(√

n). In fact, several recent papers reduced Label Cover to otherproblems, arguing that if better approximation algorithms for their prob-lems existed, then a o(

√n)-approximation algorithm for Label Cover

would exist.We show, in fact, that there are a O(n1/3)-approximation algorithm forMax Rep and a O(n1/3 log2/3 n)-approximation algorithm for Min Rep.In addition, we also exhibit a randomized reduction from Densest k-Subgraph to Max Rep, showing that any approximation factor forMax Rep implies the same factor (up to a constant) for Densest k-Subgraph.

1 Introduction

Label Cover was first introduced in Arora et al. [2] and is a canonical problemused to show strong hardness results for many NP-hard problems [12]. It isknown that for Label Cover, there is no approximation algorithm achievinga ratio 2log1−ε n, for any 0 < ε < 1, unless NP ⊆ DTIME(npolylog(n)) [2,12].Label Cover has both maximization and minimization variants for both ofwhich the above hardness holds. Kortsarz [14] introduced slight variants of thesetwo problems called Max Rep and Min Rep. (See the end of this section forformal definitions of both problems.) Indeed Max Rep is equivalent to themaximization version of Label Cover, but Min Rep is slightly different fromthe minimization version of Label Cover. Kortsarz [14] showed that for bothMax Rep and Min Rep, there is the same hardness of 2log1−ε n, for 0 < ε < 1,unless NP ⊆ DTIME(npolylog(n)). The simpler definitions of Max Rep andMin Rep make them particularly attractive for use in hardness reductions.

For the upper bound, it is known that both Max Rep and Min Rep admitrelatively simple O(

√n) approximation algorithms [6,16]. Recently some authors

? Supported by NSF ITR grant CCF-0426582, NSF CAREER award CCF-0237113,MSPA-MCS award 0528414, and NSF expeditions award 0832797.

suggested the possibility that O(√

n) is the best approximation factor for thesetwo problems. See, e.g., [8], in which the authors write, “This ratio [O(

√n)]

seems hard to improve and better ratio algorithms for LABEL-COVERmax arenot known even for very simple versions of the problem (e.g., when the structureof the graph obeys the rules of the Unique Game Conjecture. . .). If LABEL-COVERmax is indeed Ω(

√n) hard to approximate, then so is DSF [Directed

Steiner Forest]. Indeed several recent papers reduced Min Rep/Max Rep toother problems in order to obtain hardness results; therefore studying the ap-proximability of Min Rep/Max Rep is an important goal. See [8] for Di-rected Steiner Forest, [16] for Red-Blue Set Cover, [4,11] for SetCover with Pairs, [3] for Sparsest k-Transitive-Closure-Spanner, [10]for Min-Power k-Edge-Disjoint Paths, [1] for `-Round Power Dominat-ing Set, [5] for Target Set Selection, [15] for Vertex ConnectivitySurvivable Network Design, and [9] for Stochastic Steiner Tree withNon-uniform Inflation.

In this paper, we refute the possibility of Ω(√

n) hardness for both Max Repand Min Rep by developing a O(n1/3)-approximation algorithm for Max Rep

and a O(n1/3 log2/3 n)-approximation algorithm for Min Rep. Our result forMin Rep (see Section 2) uses a natural LP relaxation for the problem. We roundthis LP based on an interesting generalization of the birthday paradox. Our resultfor Max Rep (see Section 3) uses a direct combinatorial approach. Indeed, weshow that for Max Rep the integrality ratio for a natural LP relaxation isΩ(

√n

ln n ) (in contrast to Min Rep for which the integrality ratio is Ω(n1/3−ε), forall ε > 0.)

Our O(n1/3)- and O(n1/3 log2/3 n)-approximation algorithms for MaxRepand MinRep might suggest a connection between these problems and the relatedwell-studied problem Densest k-Subgraph, for which the best approximationfactor so far is O(n1/3−δ) [7], for some small fixed δ > 0. The current best in-approximability result only rules out a polynomial time approximation scheme(PTAS) under the assumption that NP 6⊆ BPTIME(2nε

) [13]. We show indeedthat there is a randomized reduction from Densest k-Subgraph to Max Rep,which preserves the approximation factor up to a constant factor (see Section 4).

We end this section with exact definitions of Max Rep and Min Rep.

Definition 1. (Max Rep)Instance: A bipartite graph G = (A, B,E), where |A| = |B| = n, and an equi-

table partition A of A and B of B into k sets of same size q = nk each (assuming

that n mod k = 0).Objective: Choose A′ ⊆ A and B′ ⊆ B with |A′ ∩ Ai| = |B′ ∩ Bj | = 1 foreach i, j = 1, . . . , k such that the subgraph induced by A′ ∪B′ has the maximumnumber of edges.

In Definition 1 the bipartite graph and the partition of A and B induce a“supergraph” H in the following way: The vertices of H are the sets Ai and Bj .Two sets Ai and Bj are adjacent by a “superedge” in H if and only if thereexist ai ∈ Ai and bj ∈ Bj which are adjacent in G. In this case, we say pair

(ai, bj) covers the superedge (Ai, Bj). In the Max Rep problem the goal is toselect one element, called a representative, from each Ai and each Bj such thatthe number of covered superedges in H is maximized. Another natural objectivefunction considered in the literature is as follows:

Definition 2. (Min Rep)Instance: A bipartite graph G = (A,B, E), where |A| = |B| = n, and equitable

partitions A of A and B of B into k sets of same size q = nk .

Objective: Choose A′ ⊆ A and B′ ⊆ B such that pairs (a, b), a ∈ A′ and b ∈ B′,cover all the superedges of H, while minimizing |A′|+ |B′|.

2 O(n1/3 log2/3 n)-Approximation Algorithm for Min Rep

There is a trivial k-approximation algorithm for Min Rep, namely, select bothvertices of one edge corresponding to each superedge. (The optimum selects atleast one vertex of each Ai (Bj) to which there is a superedge attached andwe choose at most k since there are at most k superedges attached to each Ai

(Bj).) In this section, we present a O(√

q log k) approximation algorithm forthe Min Rep problem using a natural LP relaxation and a rounding schemewhose analysis is based on a generalization of the birthday paradox. By usingthe better of these two algorithms, and remembering that q = n/k, we obtainan O(n1/3 log2/3 n)-approximation algorithm.

First, we start with an LP relaxation as follows:

OPT = minimize∑

u∈A

pu +∑

v∈B

pv (1)

subject to∑

u∈Ai,v∈Bj s.t. (u,v)∈E(G)

fuv = 1 ∀i, j : (Ai, Bj) is a superedge

∑

v∈Bj s.t. (u,v)∈E(G)

fuv ≤ pu ∀1 ≤ i, j ≤ k,∀u ∈ Ai

∑

u∈Ai s.t. (u,v)∈E(G)

fuv ≤ pv ∀1 ≤ i, j ≤ k,∀v ∈ Bj

fuv ≥ 0 ∀u ∈ A, v ∈ B s.t. (u, v) ∈ E(G).

In the IP corresponding to LP 1, px for x ∈ A∪B is a binary variable whichspecifies whether vertex x has been chosen or not in our integral solution. (Inthe LP, intuitively it specifies the fraction of vertex x that is chosen.) In the IP,for all i, j such that (Ai, Bj) is a superedge, choose u ∈ Ai, v ∈ Bj such that u, vare both chosen and set fuv = 1; set fu′v′ = 0 for all other u′ ∈ Ai, v

′ ∈ Bj . (Inthe LP, f specifies the “flow” from u to v and satisfies capacity constraint px oneach vertex x ∈ A ∪B.)

Our algorithm, called MinRepAlg, for rounding LP 1 is relatively simple,though its proof is involved and is based on an interesting generalization of thebirthday paradox. The algorithm is as follows.

1. Find an optimal solution f∗, p∗ to LP 1.2. For each x ∈ A ∪B, let p1

x = min1,√

qpx.3. Let S1 = ∅ be the current set of selected elements.4. Repeat the following O(log k) times: for each vertex x ∈ (A ∪ B) − S1, flip

an independent biased coin and put x into S1 with probability p1x.

Since in MinRepAlg, we amplify each (probability) variable p by a factor√q, the objective function would be at most

√q times the optimum solution to

LP 1. Next, we show that, for each currently-uncovered superedge, the probabil-ity that one iteration will cover that superedge is boundedly away from 0. Sincethere are at most k2 such pairs, with high probability after O(log k) iterationsof the while loop, with total cost O(

√q log k)z∗LP , we cover all superedges in

supergraph H.

Theorem 1. If we choose each vertex x ∈ A∪B with probability p1x, any single

superedge (Ai, Bj) is covered with constant probability.

The proof of the above theorem uses the following lemma.

Lemma 2. Consider a superedge (Ai, Bj) for which LP 1 routes one unit offlow f from vertices u ∈ Ai to v ∈ Bj and satisfies capacity constraints withrespect to the p variables. Then there exists a flow f from vertices u ∈ Ai tovertices v ∈ Bj that

1. has value at least 13 and at most 1,

2. that satisfies the capacity constraint px on each vertex x ∈ Ai ∪Bj, and3. such that every nonzero fuv is at least 1/(6q).

Proof. We start with a flow f ′ = f initially and decrease it in iterations until itsflow from Ai to Bj becomes at most 1

3 . We also maintain node capacities p′ = pinitially and reduce them in each iteration maintaining the property that themodified flow f ′ is a feasible flow for the modified node capacities p′. We build anew flow f = 0 initially and increase it iteratively such that in each iteration, weincrease flow f by at least some α on one edge from Ai to Bj and simultaneously,we decrease flow f ′ by at most 2α; we will ensure that α ≥ 1/(6q). We do thisincreasing of f and decreasing of f ′ in such a manner that flow f ′ + f alwayssatisfies capacity constraints p. Thus when the flow f ′ becomes less than 1

3 , theflow f is at least 1

3 and we are done.Now consider f ′ whose flow is at least 1

3 during the process. Let A′i = x ∈Ai, p

′x < 1

6q and B′j = x ∈ Bj , p

′x < 1

6q. First, we show that there is an edge(u, v) ∈ E(G) with u ∈ Ai −A′i and v ∈ Bj −B′

j . If it is not the case, all flow off ′ should pass through either a vertex of A′i or a vertex B′

j and thus its flow isless than 2q 1

6q = 13 , a contradiction. Let α = minp′u, p′v ≥ 1

6q . We now add a

flow α from u to v in f and reduce the flow f ′ as follows: Assume without loss ofgenerality that p′u ≤ p′v. Then we reduce the flow in f ′ along all edges incidenton u to zero. Note that the total flow reduction in this step is at most α. Wealso reduce flow arbitrarily along edges incident to v other than (u, v) such thatthe total flow on these edges is at most p′v − α. The total flow reduction in thisstep is also bounded by α. Finally, we reduce p′u and p′v by α. This maintainsthe property that flow f ′ is feasible for capacities p′, and that f ′ + f is feasiblefor the original capacities p. In this way, the flow of f ′ is decreased by at most2α, and the flow of f on any one edge has been increased by at least 1

6q . utWe are now ready to prove Theorem 1.

Proof. [of Theorem 1] Fix i, j such that (Ai, Bj) is a superedge. First byLemma 2, we obtain a flow f from the flow f in LP 1 and use the properties of finstead of f in the statement of the lemma in the rest of the proof. Let px ≤ px,for each vertex x ∈ Ai ∪Bj , be the total flow of f passing through vertex x.

If px ≥ 1√q , for x ∈ Ai ∪Bj , then since

√qpx ≥ √

qpx ≥ 1, vertex x is chosenin our random selection with probability 1. Without loss of generality, assumethat x ∈ Ai. Let Nx be the set of all vertices y ∈ Bj for which x has positive flowfxy to y. If there is a y with py ≥ 1√

q , then vertex y is also chosen in our randomselection with probability 1. Thus in this case we will satisfy the superedge(Ai, Bj) with probability 1 and we are done. If it is not the case, then each vertexy ∈ Nx will be selected in our random process with probability at least

√qfxy.

This means that the probability that we do not select in our random process anyvertices in Nx is at most Πy∈Nx(1 − √qfxy) ≤ e−

√q

∑y∈Nx

fxy ≤ e−√q 1√

q = 1e

(since∑

y∈Nxfxy = px ≥ 1√

q ). Thus with probability at least 1− 1e , we select a

vertex in Nx and thus satisfy the superedge (Ai, Bj).In the rest of the proof, we assume px ≤ 1√

q , for x ∈ Ai ∪ Bj , and thus√

qfuv ≤ 1 for u ∈ Ai and v ∈ Bj .The outline of the rest of the proof is as follows. Instead of directly analyzing

the probability that the randomized rounding chooses both endpoints of someedge in G[Ai ∪ Bj ], for a general bipartite graph between Ai and Bj with q =|Ai| = |Bj |, we first transform the bipartite graph, in a natural way, into aperfect matching graph. We do this by replacing a vertex v of degree d(v) byd(v) “clones,” associating a different edge incident to v with each clone, andkeeping the flow values on edges unchanged. We then choose each clone withprobability

√q times the flow on the incident edge. (Note that the scaling factor

is the square root of q, not the square root of the number of boys or girls inthe perfect matching.) We argue that with at least positive constant probability,there is an edge e of G[Ai ∪ Bj ] with at least one clone of each endpoint ofe chosen. However, this is not what algorithm MinRepAlg does, in fact (itdoesn’t detour through a perfect matching graph), so we then argue that theprobability that algorithm MinRepAlg chooses both endpoints of some edge ofG[Ai ∪Bj ] is at least as high as it is in the perfect matching, and hence at leasta positive constant.

First, we construct a bipartite graph M = (A′, B′, E′), for the given i, j, inwhich for each vertex a ∈ Ai (resp., b ∈ Bj), we put r vertices a1, a2, . . . , ar

(resp., b1, b2, . . . , br) in A′ (resp., B′) called clones of vertex a (resp., b), where ris the number of edges incident to a (resp., b) in G[Ai ∪Bj ] that carry nonzeroflow (and thus a flow of at least 1

6q ) in f . We associate each clone ai of a (resp.,bj of b) with a different edge of G incident to a (resp., b). We put edges betweenvertices (clones) in E′ corresponding to edges in G[Ai∪Bj ] that carry a nonzeroflow in f (and we put this flow as the flow of the new edge). Since each edgecarrying positive flow in the bipartite graph between Ai and Bj gives rise to oneedge in M whose endpoints have degree 1, M is a bipartite perfect matching,with |A′| = |B′|, which is at most the number of edges in G[Ai ∪Bj ].

We now consider a random process in which we build a set S by selecting eachvertex (clone) c in M independently with probability

√q times the f flow of the

unique edge incident to c in M . Let the subset of Ai∪Bj chosen by MinRepAlgbe called S1. We will prove two things: (1) first, that the chance that, in theperfect matching graph M , S contains an edge, is at least 1 − e−1/54 > 0, and(2) second, the chance that S1 contains an edge in G is at least as large as thechance that S contains an edge in the perfect matching graph M .

Now we prove (1), that both endpoints of some edge in M are chosen withconstant probability. Consider one fixed edge d = (cA, cB) carrying a flow fd. Weselect both cA and cB with probability (

√qfd)2 = qf2

d . Thus with probability1− qf2

d , edge d will be not selected. The probability that no edges are selectedthen is at most Πd∈E′(1 − qf2

d ). (We have independence because the graph isa perfect matching.) Since each edge carries a flow of at least 1

6q and the total

flow is at most one (by Lemma 2), |E′| ≤ 6q. Hence, since flow of f is at least 13

by Lemma 2, Πd∈E′(1− qf2d ) ≤ e−q

∑d∈E′ f2

d ≤ e−q

(∑

d∈E′ fd)2

|E′| ≤ e−q( 13 )2

6q = e−154 .

Thus with constant probability 1−e−154 > 0 we satisfy any one given superedge.

This completes the proof of (1).Now we prove (2). Build a new probabilistic process as follows. Define p2

x,for x ∈ Ai ∪ Bj , to be the probability that at least one of the clones of x ischosen to be in S. This is, of course, at most the sum of the probabilities thateach individual clone is chosen to be in S, which is itself at most the probabilitythat x ∈ S1 (since the flow values add). Build a set S2 by choosing each nodex ∈ Ai ∪ Bj independently with probability p2

x. The algorithm, on the otherhand, builds S1 using probabilities p1

x ≥ p2x. It is a fairly obvious fact that, since

p1x ≥ p2

x, the chance that S1 contains an edge is no smaller than the chance thatS2 contains an edge, but we prove it anyway.

Lemma 3. Suppose we are given an r-node graph H and two probabilities p1x ≥

p2x for each vertex x. Consider experiment E`, for ` = 1, 2, with probability mea-

sure P`, in which we build set S` by putting each vertex x into S` with probabilityp`

x, independently. Then P1[S1 contains an edge] ≥ P2[S2 contains an edge].

Proof. We can pick one sequence of r independent random reals ξx in [0, 1] andput x into S` if ξx ≤ p`

x. As S2 ⊆ S1 always, in every run in which S2 containsan edge, so does S1. ut

But now we can view the construction of S2 as putting a node x into S2 ifand only if at least one of its clones is chosen for S. It is clear that S contains anedge in M implies that S2 contains an edge in G (but not the converse), so thatthe chance that S contains an edge is dominated by the chance that S2 containsan edge, which itself is dominated by the chance that S1 contains an edge, andwe are done with the proof of Theorem 1. ut

2.1 The Integrality Ratio of Min Rep

Next, we show that the integrality ratio of LP 1 is indeed Ω(n13−ε) for all ε > 0

and thus our algorithm in this section is essentially the best that we can hopefor using the LP.

Theorem 4. The integrality ratio of LP 1 for Min Rep is Ω(n1/3−ε) for anyε > 0, for all large enough n.

Proof. Consider an instance of Min Rep with k = n/q groups of q boys eachand k = n/q groups of q girls each. Between the ith group Ai of boys and thejth group Bj of girls there is a random perfect matching. It is clear that onecan assign fe = 1/q for any edge e and pu = 1/q for any vertex u. This impliesthat z∗LP ≤ 2n/q. To study the integrality ratio, we look at the smallest feasibleset S (i.e., the smallest set of vertices such that for all i, j, there is at leastone edge between Ai and Bj both of whose endpoints are in S). Let S be afeasible set, s = |S|. The size s of S is the sum of 2n/q terms, one for eachAi and Bj . Let a = s/(2n/q) = sq/(2n), the average size of the intersectionof S with some Ai or Bj . Of the 2n/q terms, whose sum is s, fewer than 1/4of them (i.e., (1/2)n/q) can exceed 4a = 2sq/n, and hence at least (3/2)n/q ofthem are at most 4a. Since at most n/q of them can be intersections with then/q Ai’s, at least (1/2)n/q of them are intersections with n/q Bj ’s. Similarly, atleast (1/2)n/q of them are intersections with the n/q Ai’s. Hence there are setsI ⊆ 1, 2, ..., n/q and J ⊆ 1, 2, ..., n/q, |I|, |J | = (1/2)n/q (provided that n/qis even), such that |S∩Ai| ≤ 4a = 2sq/n for all i ∈ I and |S∩Bj | ≤ 4a = 2sq/nfor all j ∈ J , and such that there is an edge between S ∩ Ai and T ∩ Bj . LetSi be any subset of Ai which contains S ∩ Ai and which has size exactly 4a.Analogously, let Tj be any subset of Bj which contains T ∩ Bj and which hassize exactly 4a. Clearly there is an edge between Si and Tj .

Fix an s and let a = sq/(2n). As just shown, the existence of an S, of size s,for graph G implies the existence of I, J ⊆ 1, 2, ..., n/q, |I| = |J | = (1/2)n/q,Si ⊆ Ai for all i ∈ I, and Tj ⊆ Bj for all j ∈ J , |Si| = |Tj | = 4a, such that forall i ∈ I, j ∈ J , the perfect matching in G between Ai and Bj contains an edgebetween Si and Tj . (Si, Tj have size exactly 4a.) Hence the probability that thereis an S is at most the probability that there exist I, J ⊆ 1, 2, ..., n/q, both ofsize (1/2)n/q, and Si ⊆ Ai, Tj ⊆ Bj for all i ∈ I, j ∈ J , with |Si| = |Tj | = 4a,

such that for all i ∈ I, j ∈ J , the perfect matching in G between Ai and Bj

contains an edge between Si and Tj .Given fixed I, J, (Si), (Tj), what is the probability that the random graph

contains, for each i ∈ I, j ∈ J , an edge whose left endpoint is in Si and whoseright one is in Tj? The chance that the random graph does not contain bothendpoints of some edge in the random perfect matching between Ai and Bj isthe chance that all the edges in the (i, j) perfect matching (the one between Ai

and Bj) emanating from Si end outside Tj . There are exactly 4a such edges.We will prove a lower bound on the probability that a random perfect matchingdoes not contain both endpoints of some edge whose left endpoint is in Si andwhose right one is in Tj . In order for the mate of each vertex in Si to lie outsideof Tj , the mate of the first one must be chosen to be one of q − 4a nodes notin Tj among the q vertices in Bj , the mate of the second must be chosen to beone of the remaining q− 4a− 1 nodes not in Tj among the remaining q− 1 ver-tices in Bj , etc. Hence the probability is exactly q−4a

qq−4a−1

q−1 · · · q−4a−(4a−1)q−(4a−1) ≥

(q−8a

q

)4a

=(1− 8a

q

)4a

. Hence the chance that the (i, j) perfect matching doescontain an edge whose left endpoint is in Si and whose right one is in Tj is at most1− (1− 8a/q)4a. The chance that the random matching works for all (n/q)2/4pairs (i, j) with i ∈ I, j ∈ J is at most [1 − (1 − 8a/q)4a](n/q)2/4. Let A =(

n/qn/(2q)

)2( q4a

)n/q[1−

(1− 8a

q

)4a](n/q)2/4

, the first binomial coefficient repre-

senting the choices of I and J , the second representing the subsets Si of Ai and Tj

of Bj . If A < 1, then there is a fixed graph for which no set S of size s is good. A ≤

22n/qq4an/q

[1−

(1− 8a

q

)4a](n/q)2/4

. We choose a =√

q/32 so that q/(8a) =

4a. Note that (1− 8a/q)4a = (1− 1/(q/(8a)))q/(8a) ≥ 1/4 for q/(8a) = 4a suffi-ciently large. So [1−(1−8a/q)4a](n/q)2/4 ≤ [1−(1/4)](n/q)2/4. Letting q = nδ for afixed δ, we have A ≤ 22n1−δ

(nδ)4an/q(3/4)(n/q)2/4. Since 4an/q = 4n1−δ√

q/32 =(1/√

2)n1−δ/2, we have A ≤ (3/4)(1/4)n2(1−δ)22n1−δ

nδ(1/√

2)n1−δ/2. We have A ≤

2−0.01n2(1−δ)+2n1−δ+(lg n)(δ/√

2)n1−δ/2. Since obviously 2(1 − δ) > 1 − δ, we will

have A < 1, in fact, A → 0, if 2(1 − δ) > 1 − δ/2, i.e., 2 − 2δ > 1 − δ/2, i.e.,1 > (3/2)δ, i.e., δ < 2/3. Hence if δ < 2/3, then for q = nδ and a =

√q/32, as

specified above, there is an instance for which no set S of size a(2n/q) is feasible.For this instance, z∗IP >

√q/32(2n/q). Since z∗LP ≤ 2n/q, the integrality ratio

exceeds√

q/32, which is Ω(nδ/2). Since δ < 2/3 is arbitrary, for all ε > 0 theintegrality ratio is Ω(n1/3−ε). ut

3 O(n13 )-Approximation Algorithm for Max Rep

In this section, we provide an O(n13 )-approximation algorithm for Max Rep.

However, in contrast to Section 2, in which we use a natural LP for the problem,we can show that the integrality gap of a natural LP for Max Rep is Ω(

√n). This

forces us to use a combinatorial approach to obtain a non-trivial approximation

factor O(n13 ).

We consider the best of three algorithms:

1. Matching: Find a maximal matching in the supergraph H. For each edge(Ai, Bj) in this matching, pick ai ∈ Ai and bj ∈ Bj such that (ai, bj) ∈ E(G).

2. Random-Choice: For each Bj , pick bj ∈ Bj at random. For each Ai, pickai ∈ Ai that has the maximum number of edges to the set of all selected bj

vertices. Repeat, flipping the roles of A and B.3. Random-Neighbor: For each a ∈ A, construct a solution in the following

fashion and eventually pick the best such solution: For each Bj , pick bj ∈ Bj

at random from amongst those vertices that are neighbors of a (if there isno neighbor of a in Bj , pick an arbitrary bj ∈ Bj). For each Ai, pick ai ∈ Ai

that has the maximum number of edges to the selected bj vertices over allj. Repeat, flipping the roles of A and B.

Theorem 5. The best of these three algorithms is a 2(2n)1/3-approximationalgorithm.

Proof. Suppose that the maximal matching in H has size `. Renumber the Ai’sand Bj ’s such that the matching has edges (Ai, Bi), i = 1, . . . , `. There are noedges between Ai and Bj for i, j > `. Let A′ = ∪`

i=1Ai, and B′ = ∪`j=1Bj . The

edges in the optimal solution can be decomposed into two groups: those that gobetween A′ and B and those that go between A and B′. (Edges between A′ andB′ appear in both.) Hence the optimal solution restricted to one of these twogroups much contain at least half the number of edges in the optimal solution.Without loss of generality, assume that the optimal solution restricted to edgesbetween A and B′ contains at least half the number of edges in the optimalsolution.

We introduce some notation to facilitate the analysis. Let Xij = 1 if there isan edge in the optimal solution from Ai to Bj (and 0 otherwise). Let Nij be thenumber of edges from the optimal vertex a∗i in Ai to the remaining vertices inBj , called “nonoptimal” since they’re not in the optimal solution.

Define p and r as follows:

k∑

i=1

∑

j=1

Xij = p(k`) (2)

k∑

i=1

∑

j=1

Nij = r(`n). (3)

Thus OPT ≤ 2∑k

i

∑`j=1 Xij = 2pk` and algorithm Matching gives a 2pk ap-

proximation.Next, we analyze algorithm Random-Choice. The algorithm picks random

vertices in B and picks the best vertices in A for the chosen vertices in B.

In order to obtain a lower bound on the number of superedges covered, wecompute the expected number of superedges covered if we pick random verticesin Bj , j = 1, . . . , `, and instead of the best vertex in Ai, we use the vertex a∗i ∈ Ai

which is in the optimal solution.

For a superedge (Ai, Bj), i = 1, . . . , k and j = 1, . . . , `, the probability thatthis edge is covered by Random-Choice is (Xij +Nij)/(n/k). Hence the expectednumber of superedges covered is at least k

n

∑ki=1

∑`j=1(Xij + Nij) = k

n (pk` +r`n). Hence the approximation ratio of algorithm Random-Choice is at most

2pk`kn (pk`+r`n)

≤ min

2nk , 2p

r

.

Finally, we analyze algorithm Random-Neighbor. Suppose the vertex a chosenby the algorithm in the first step is in, say, Ah, and also is in the optimal solution.Consider set Bj and the vertex b∗j ∈ Bj in the optimal solution. The number ofedges from a to Bj is Xhj +Nhj . The algorithm picks a random neighbor of a inBj . Thus the probability that b∗j is chosen is Xhj

Xhj+Nhj. As before, instead of pick-

ing the best choice of vertices in A for the chosen vertices in B, we lower boundthe expected number of superedges covered by replacing the vertex ai by thevertex a∗i ∈ Ai in the optimal solution. If b∗j ∈ B is chosen, the number of edgesfrom the set of a∗i ’s is

∑ki=1 Xij . Thus the expected number of superedges cov-

ered is at least∑`

j=1Xhj

Xhj+Nhj(∑k

i=1 Xij). In this calculation, we assumed thata = a∗h was chosen in the first step. We average over h = 1, . . . , k. Thus the ex-pected number of covered edges is at least 1

k

∑kh=1

∑`j=1

Xhj

Xhj+Nhj(∑k

i=1 Xij).

Let Cj =∑k

i=1 Xij and let Nj =∑k

i=1 Nij . Then the previous expression

is 1k

∑`j=1 Cj

∑kh=1

Xhj

Xhj+Nhj.Note that

∑kh=1

Xhj

Xhj+Nhj≥ Cj

(1

1+NjCj

)by the

arithmetic-geometric-harmonic means inequality. In order to obtain a lower

bound for this expression, consider the minimum value of∑

j

C3j

Cj + Njover

all choices of Cj and Nj subject to the constraint that∑

j Cj and∑

j Nj arefixed. Now let Cj , Nj be the respective values that minimize this expression.

Then for any indices f 6= g, the function (of x)Cf

3

Cf + (Nf − x)+

Cg3

Cg + (Ng + x)must be minimized for x = 0. Thus, the derivative of this function at x = 0must be zero. Hence Cf

3/(Cf + Nf )2 = Cg3/(Cg + Ng)2. Hence there is a con-

stant α such that for all indices f , (Cf + Nf )2 = αCf3. Hence α

∑j C

3/2j =∑

j Cj +∑

j Nj = pk` + r`n. Thus the expected number of superedges cov-

ered is at least 1k

∑`j=1

C3j

αC3/2j

= 1k

∑`j=1

C3/2j

α ≥ 1k

(∑`

j=1 C3/2j )2

(pk`+r`n) . Convexity of

f(x) = x3/2 shows that this expression is minimized when all Cj are equal.Hence a lower bound on the expected number of superedges covered is givenby 1

k(`(pk)3/2)2

(pk`+r`n) = `2p3k2

pk`+r`n . Thus the approximation ratio of this procedure is at

most 2pk`(pk`+r`n)`2p3k2 =

2(1+( rp ) n

k )

p .

Thus we have the following upper bounds on the approximation ratio of thealgorithm: 2pk, 2n

k , 2pr , 2

1+( rp ) n

k

p . We consider two cases:

Case 1: ( rp )n

k ≥ 1. In this case, the fourth bound is at most4( r

p ) nk

p . The product

of the first, third and fourth bounds is 16pk × pr ×

( rp ) n

k

p = 16n. Hence at leastone upper bound is at most 2(2n)1/3.Case 2: ( r

p )nk < 1. In this case, the fourth bound is at most 4

p . Now the productof the first, second and fourth bound is 2pk × 2n

k × 4p = 16n. Hence at least one

upper bound is at most 2(2n)1/3. ut

4 Reduction From Densest k-Subgraph to Max Rep

In this section, we consider the Densest k-Subgraph (DkS) problem, in whichthe goal is to find an induced subgraph of order k of a given graph with themaximum number of edges.

Theorem 6. An f(n)-approximation algorithm for Max Rep implies the exis-tence of a randomized O(f(n))-approximation algorithm for DkS.

Proof. From an instance of DkS, we produce an instance of Max Rep by ran-domly dividing vertices of the given graph for DkS into k groups of equal sizes = bn

k c, e.g., by using a random permutation of all vertices, and disregard therest of vertices. Next we place bk/2c groups on one side (call this L) and theother dk/2e groups on the other side (call this R) of the instance for Max Rep.Any feasible solution to the Max Rep instance obtained directly gives a solutionto the original DkS instance of the same value. Both instances have the samenumber n of vertices.

We claim that the expected value of the optimal solution to the Max Repinstance obtained thus is at least a constant times the optimal value for DkS.Consider the optimal solution S of size k to the DkS instance. We produce asolution to the Max Rep instance as follows. For every group in the instance,if the group contains a unique vertex of S, then this unique vertex is picked asthe group representative. If there are zero or at least 2 vertices from S then anarbitrary vertex is picked as the group representative (and we don’t count edgesincident to that vertex). We show that the expected value of this solution is atleast a constant times the value of the DkS optimal solution. For any vertexv ∈ S, with constant probability v is placed alone in its group. Furthermore, fortwo distinct vertices u, v ∈ S, the probability that u and v are both alone intheir groups, u is in the L side and v is on the R side, is bounded below by aconstant greater than 0. Hence E[z∗MaxRep] ≥ cz∗DkS , for a positive constant c.

Now the reduction is apparent. Given an f(n)-approximation algorithm Afor Max Rep, take an n-node instance I of DkS, randomly convert it as aboveinto an n-node instance I ′ of Max Rep, use A to generate a solution A(I ′) ofvalue at least f(n)z∗MaxRep, and report A(I ′) as a feasible solution to the DkSinstance. That E[z∗MaxRep] ≥ cz∗DkS implies that the expected size of the DkSsolution returned is at least cf(n) · z∗DkS . ut

5 Conclusion

Obtaining improvements over the approximation guarantees in this paper wouldbe instructive. Given the reduction demonstrated in Section 4, possibly onecan use ideas from the Densest k-Subgraph algorithm to build an n1/3−δ-approximation algorithms for some fixed δ > 0. However, the main remain-ing open problem is whether, for Max Rep or Min Rep, there is a O(nε)-approximation algorithm for each ε > 0.

References

1. A. Aazami and M. D. Stilp, Approximation algorithms and hardness for domi-nation with propagation, in APPROX 2007, pp. 1–15.

2. S. Arora, L. Babai, J. Stern, and Z. Sweedyk, The hardness of approximateoptima in lattices, codes, and systems of linear equations, J. Comput. System Sci.,54 (1997), pp. 317–331.

3. A. Bhattacharyya, E. Grigorescu, K. Jung, S. Raskhodnikova, and D. P.Woodruff, Transitive-Closure Spanners, ArXiv e-prints, (2008).

4. L. Breslau, I. Diakonikolas, N. Duffield, Y. Gu, M. Hajiaghayi, D. John-son, H. Karloff, M. Resende, and S. Sen, Optimal Node Placement For Path-Disjoint Network Monitoring, 2008, manuscript.

5. N. Chen, On the approximability of influence in social networks, in SODA 2008,pp. 1029–1037.

6. M. Elkin and D. Peleg, The hardness of approximating spanner problems, The-ory Comput. Syst., 41 (2007), pp. 691–729.

7. U. Feige, G. Kortsarz, and D. Peleg, The dense k-subgraph problem, Algo-rithmica, 29 (2001), pp. 410–421.

8. M. Feldman, G. Kortsarz, and Z. Nutov, Improved approximation for thedirected steiner forest problem, in SODA 2009, pp. 922-931.

9. A. Gupta, M. Hajiaghayi, and A. Kumar, Stochastic steiner tree with non-uniform inflation, in APPROX 2007, pp. 134–148.

10. M. T. Hajiaghayi, G. Kortsarz, V. S. Mirrokni, and Z. Nutov, Poweroptimization for connectivity problems, Math. Program., 110 (2007), pp. 195–208.

11. R. Hassin and D. Segev, The set cover with pairs problem, in FSTTCS 2005,pp. 164–176.

12. D. S. Hochbaum, ed., Approximation algorithms for NP-hard problems, PWSPublishing Co., Boston, MA, USA, 1997. see the section written by Arora andLund.

13. S. Khot, Ruling out PTAS for graph min-bisection, dense k-subgraph, and bipartiteclique, SIAM J. Comput., 36 (2006), pp. 1025–1071.

14. G. Kortsarz, On the hardness of approximating spanners, Algorithmica, 30(2001), pp. 432–450.

15. G. Kortsarz, R. Krauthgamer, and J. R. Lee, Hardness of approximationfor vertex-connectivity network design problems, SIAM Journal on Computing, 33(2004), pp. 185–199.

16. D. Peleg, Approximation algorithms for the Label-CoverMAX and Red-Blue SetCover problems, J. Discrete Algorithms, 5 (2007), pp. 55–64.

Documents

Improved Approximation Algorithms for Label Cover Problemshajiagha/LabelCover.pdf · Improved Approximation Algorithms for Label Cover Problems Moses Charikar1?, MohammadTaghi Hajiaghayi2,