Greedily Finding a Dense Subgraph

Journal of Algorithms 34, 203–221 (2000), doi:10.1006/jagm.1999.1062, available online at http://www.idealibrary.com


Yuichi Asahiro

Department of Computer Science & Communication Engineering, Kyushu University, Fukuoka 812, Japan

E-mail: [email protected]

Kazuo Iwama1

School of Informatics, Kyoto University, Kyoto 606-01, Japan
E-mail: [email protected]

Hisao Tamaki2

Department of Computer Science, Meiji University, Kawasaki 214-71, Japan
E-mail: [email protected]

and

Takeshi Tokuyama

IBM Tokyo Research Laboratory, Yamato 242, Japan
E-mail: [email protected]

Received August 20, 1996

Given an $n$-vertex graph with nonnegative edge weights and a positive integer $k \le n$, our goal is to find a $k$-vertex subgraph with the maximum weight. We study the following greedy algorithm for this problem: repeatedly remove a vertex with the minimum weighted-degree in the currently remaining graph, until exactly $k$ vertices are left. We derive tight bounds on the worst case approximation ratio $R$ of this greedy algorithm: $(1/2 + n/2k)^2 - O(n^{-1/3}) \le R \le (1/2 + n/2k)^2 + O(1/n)$ for $k$ in the range $n/3 \le k \le n$ and $2(n/k - 1) - O(1/k) \le R \le 2(n/k - 1) + O(n/k^2)$ for $k < n/3$. For $k = n/2$, for example, these bounds are $9/4 \pm O(1/n)$, improving on naive lower and upper bounds of 2 and 4, respectively.

1 Research supported in part by Science Research Grant 07458061, Ministry of Education, Japan.

2 Main part of the work done while author was at IBM Tokyo Research Laboratory.


0196-6774/00 $35.00 Copyright © 2000 by Academic Press

All rights of reproduction in any form reserved.


(The upper bound for general $k$ compares well with currently the best and much more complicated approximation algorithm based on semidefinite programming.)


1. INTRODUCTION

In the maximum edge subgraph problem, we are given an $n$-vertex graph with nonnegative edge weights and a positive integer $k \le n$ and are asked to find a $k$-vertex subgraph with the maximum weight. Since the problem is trivial for $k \le 2$, we assume $k \ge 3$ throughout the paper. Being a generalization of the maximum clique problem, this problem is obviously NP-hard. The decision problem version of the problem for unweighted graphs, in which one is asked if there is a $k$-vertex graph with more than $m$ edges, is NP-complete even if $m$ is restricted by $m \le k^{1+\epsilon}$ [AI95]. Naturally, we seek efficient algorithms for approximate solutions [RRT94, KP93, AKK95, AI95, FS97, SW98]. The problem is related to facility dispersion [RRT94] and to the sparsest spanner problem [KP94]. It may also find application in data mining, where diverse methods of data clustering are being actively explored.

In the following discussions, the approximation ratio of a solution is $\mathrm{OPT}/A$, where OPT is the weight of the optimal subgraph and $A$ is the weight of the given solution. The (worst case) approximation ratio of an algorithm is the supremum, over all inputs (and, when the algorithm is nondeterministic, over all nondeterministic choices the algorithm makes), of the approximation ratio of the solution it produces. Ravi et al. [RRT94] studied the case where the weight function satisfies the triangle inequality (in the context of the facility dispersion problem) and showed that a greedy algorithm achieves the approximation ratio 4. They also gave an algorithm with the approximation ratio 2 in this special case. Kortsarz and Peleg [KP93] studied the problem with a general weight function and constructed an algorithm with approximation ratio $O(n^{7/18})$. Recently, Arora et al. [AKK95] have developed a framework for approximately solving dense instances of NP-hard problems and applied it to the maximum edge subgraph problem: the result is a PTAS (i.e., a polynomial time algorithm $A_\epsilon$ with approximation ratio $1 + \epsilon$, for each constant $\epsilon > 0$) for the special cases where the graph is unweighted and (1) the graph has $\Omega(n^2)$ edges and $k = \Omega(n)$ or (2) each vertex of the graph has degree $\Omega(n)$.

Yet more recently, after the first submission of the present paper, some authors developed algorithms based on semidefinite programming relaxations: the algorithm of Feige and Seltser [FS97] achieves an approximation ratio of roughly $n/k$ and that of Srivastav and Wolf [SW98] slightly outperforms this ratio when $k$ is around $n/3$.


The greedy algorithm of Ravi et al. [RRT94] starts from a heaviest edge and repeats adding a vertex to the current subgraph that maximizes the weight of the resulting new subgraph, until $k$ vertices are chosen. Although this algorithm works well for a weight function with the triangle inequality, it fails miserably for a general weight function, as pointed out by Kortsarz and Peleg [KP93].

What we study in this paper is an alternative greedy algorithm which we call GREEDY: given $G$ and a weight function,

1. Set $X = V(G)$.

2. If $|X| = k$ then output $G[X]$. Otherwise, find a vertex in $X$ with the minimum weighted degree in $G[X]$ under the given weight function and remove it from $X$. Repeat Step 2.
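The two steps of GREEDY can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `greedy_dense_subgraph` and `induced_weight`, the integer vertex labels, and the smallest-label tie-breaking rule are assumptions made here, since the paper allows ties to be broken arbitrarily.

```python
from typing import Dict, Set, Tuple


def greedy_dense_subgraph(n: int, weights: Dict[Tuple[int, int], int], k: int) -> Set[int]:
    """Return the k vertices left after repeatedly deleting a minimum-degree vertex.

    Vertices are 0..n-1; `weights` maps an edge (u, v) to its nonnegative weight.
    """
    adjacency = [dict() for _ in range(n)]
    degree = [0] * n  # weighted degree in the current induced subgraph
    for (u, v), w in weights.items():
        adjacency[u][v] = w
        adjacency[v][u] = w
        degree[u] += w
        degree[v] += w

    remaining = set(range(n))      # Step 1: X = V(G)
    while len(remaining) > k:      # Step 2: stop once |X| = k
        # Assumed tie-breaking rule: smallest label among minimum-degree vertices.
        victim = min(remaining, key=lambda v: (degree[v], v))
        remaining.remove(victim)
        for neighbor, w in adjacency[victim].items():
            if neighbor in remaining:
                degree[neighbor] -= w
    return remaining


def induced_weight(vertices: Set[int], weights: Dict[Tuple[int, int], int]) -> int:
    """Total weight of the edges with both endpoints in `vertices`."""
    return sum(w for (u, v), w in weights.items() if u in vertices and v in vertices)


# A triangle of weight-5 edges with one pendant edge of weight 1.
example = {(0, 1): 5, (1, 2): 5, (0, 2): 5, (2, 3): 1}
solution = greedy_dense_subgraph(4, example, 3)
```

On this small example the pendant vertex 3 has the minimum weighted degree, so it is removed first and the triangle remains.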

Here, $G[X]$ denotes the subgraph of $G$ induced by $X$. This algorithm can be viewed as a derandomization (using the method of conditional probability [Rag88]) of a randomized algorithm that picks a random set of $k$ vertices. Since the expected weight of a random $k$-vertex subgraph of $G$ is $\frac{k(k-1)}{n(n-1)}$ times the total weight of $G$ (each edge of weight $w$ contributes $\frac{k(k-1)}{n(n-1)}\,w$ to the expectation), we immediately have an upper bound $\frac{n(n-1)}{k(k-1)}$ on the approximation ratio, which is, for example, $4 + o(1)$ when $k = n/2$. On the other hand, when $k = n/2$, there is a particularly simple lower bound of $2 - o(1)$: an unweighted graph consisting of $n/4$ independent edges and a path of $n/2$ vertices. When GREEDY is applied to this graph, it may leave the $n/4$ independent edges as its solution while the optimal is the path with $n/2 - 1$ edges.
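The expectation used in the naive upper bound can be checked by brute force on a small graph. The sketch below averages the induced weight over all $k$-subsets and compares it with $\frac{k(k-1)}{n(n-1)}$ times the total weight; the helper name `average_k_subgraph_weight` and the sample graph are made up for illustration.

```python
from fractions import Fraction
from itertools import combinations


def average_k_subgraph_weight(n, k, weights):
    """Exact average induced weight over all k-subsets of vertices 0..n-1."""
    total = Fraction(0)
    count = 0
    for subset in combinations(range(n), k):
        chosen = set(subset)
        total += sum(w for (u, v), w in weights.items() if u in chosen and v in chosen)
        count += 1
    return total / count


# Arbitrary small weighted graph, chosen only for illustration.
sample = {(0, 1): 3, (1, 2): 1, (2, 3): 4, (0, 4): 2, (3, 4): 5, (1, 4): 7}
n, k = 5, 3
average = average_k_subgraph_weight(n, k, sample)
predicted = Fraction(k * (k - 1), n * (n - 1)) * sum(sample.values())
```

Because some $k$-subset must weigh at least the average, any algorithm that does at least as well as the average is within a factor $\frac{n(n-1)}{k(k-1)}$ of the optimum.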

In what follows, we prove upper and lower bounds on the approximation ratio of GREEDY, which are tight up to an $o(1)$ additive term as long as $k = \omega(\sqrt{n})$.

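THEOREM 1. The worst-case approximation ratio $R$ of GREEDY satisfies

$$\left(\frac{1}{2} + \frac{n}{2k}\right)^2 - O(n^{-1/3}) \le R \le \left(\frac{1}{2} + \frac{n}{2k}\right)^2 + O(1/n) \quad \text{for } n/3 < k \le n,$$

and

$$2\left(\frac{n}{k} - 1\right) - O(1/k) \le R \le 2\left(\frac{n}{k} - 1\right) + O(n/k^2) \quad \text{for } k \le n/3.$$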
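To get a feel for the theorem, the leading terms of the two bounds can be evaluated exactly. The sketch below drops the $O(\cdot)$ error terms; the function names `main_term` and `naive_upper_bound` are hypothetical helpers introduced here, not notation from the paper.

```python
from fractions import Fraction


def main_term(n: int, k: int) -> Fraction:
    """Leading term of the bounds in Theorem 1 (error terms omitted)."""
    ratio = Fraction(n, k)
    if 3 * k > n:                                 # range n/3 < k <= n
        return (Fraction(1, 2) + ratio / 2) ** 2
    return 2 * (ratio - 1)                        # range k <= n/3


def naive_upper_bound(n: int, k: int) -> Fraction:
    """The bound n(n-1)/(k(k-1)) obtained from the random-subset argument."""
    return Fraction(n * (n - 1), k * (k - 1))


half = main_term(100, 50)          # k = n/2
third = main_term(90, 30)          # k = n/3, where the two formulas meet
naive_half = naive_upper_bound(100, 50)
```

At $k = n/3$ both expressions evaluate to 4, so the leading term is continuous across the two ranges.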


For the special case $k = n/2$, the bounds are $9/4 \pm o(1)$, improving the naive bounds of 2 and 4. For $\sqrt{n} < k \le n/3$, the upper bound improves the above naive analysis by a factor of $n/2k$.

The algorithm of Kortsarz and Peleg [KP93] uses a subprocedure called LARGE, which solves the problem with an approximation ratio $O(n/k)$, where the constant in the $O$ notation is at least 4. The advantage of GREEDY lies partly in its smaller constant and mostly in its simplicity: procedure LARGE is a combination of several ideas including binary search, "expanding kernels," and the method of conditional probability. Their algorithm also includes a subprocedure that solves the unweighted version of the problem with approximation ratio $O((n/k)^{2/3})$, but only when the number of edges in the optimal solution is larger than $\sqrt{k^5/n}$.

In combinatorial optimization problems in general, greedy heuristics are particularly attractive for their simplicity and it is important to determine how well they perform in terms of approximation. Our result adds to the interesting cases where tight bounds on the approximation ratio are obtained through nontrivial analysis. See Grötschel and Lovász [GL95] for other examples and general discussion on greedy heuristics.

Although we study the problem for weighted graphs, the special case of unweighted graphs may be of special interest. It is clear that our upper bounds apply to unweighted graphs. On the other hand, since the optimal solution can have at most $\binom{k}{2}$ edges, a trivial upper bound of $O(k)$ applies to any algorithm that outputs a subgraph with $\Omega(k)$ edges; it certainly applies to GREEDY since the solution of GREEDY can contain at most one isolated vertex (unless the original graph contains more than $n - k$ isolated vertices, in which case GREEDY finds the optimal solution). This upper bound is stronger than our general upper bound when $k \le \sqrt{n}$. For $k \ge cn^{3/4}$ where $c$ is some positive constant, however, our general upper bound is tight even when restricted to unweighted graphs, since a matching lower bound can be exhibited by an unweighted graph for $k$ in this range. For $k \le cn^{3/4}$, our lower bound example does use nonunit weights. Thus, we do not know whether our upper bound is tight for unweighted graphs for the range $\sqrt{n} \le k \le cn^{3/4}$.

The rest of the paper is organized as follows. Section 2 contains a few definitions and notation. In Section 3 we describe examples that establish the lower bounds in Theorem 1. Then in Section 4 we prove the upper bound. Although the two sections can be read independently of each other, the authors view them as an unsplittable pair: the upper bound proof would never have been discovered without the guidance of the lower bound examples and the understanding of one may give some insight into the other.


2. DEFINITIONS

In this paper, a graph is simple, i.e., without a self-loop or parallel edges, unless otherwise stated. For a graph $G$, we denote the vertex set by $V(G)$ and the edge set by $E(G)$, as usual. The subgraph of $G$ induced by a vertex set $U \subseteq V(G)$ is denoted by $G[U]$.

A weighted graph is a pair $(G, w)$, where $G$ is a graph and $w: E(G) \to \mathbb{R}^+$ is a weight function that assigns a nonnegative real to each edge of $G$. The weighted degree (or simply degree) of a vertex $v$ in the weighted graph $(G, w)$ is $\sum w(e)$ where $e$ ranges over all the edges incident to $v$. The average degree of a nonempty vertex set $U \subseteq V(G)$ in $(G, w)$ is $\sum_{v \in U} d(v)/|U|$ where $d(v)$ denotes the degree of $v$ in $(G, w)$. The average degree of a weighted graph $(G, w)$ is the average degree of $V(G)$ in $(G, w)$. Clearly, the average degree of $(G, w)$ is equal to $2w(G)/|V(G)|$, where $w(G)$ denotes the total weight of $G$. A weighted graph $(G, w)$ is $d$-regular if every vertex of it is of weighted degree $d$. We sometimes regard an unweighted graph $G$ as a weighted graph $(G, 1)$, where $1$ denotes the weight function that assigns 1 to each edge, and apply the above definitions. Finally, if $(G, w)$ is a weighted graph and $H$ is a subgraph of $G$, we abuse the notation and write $(H, w)$ to denote the weighted graph in which the weight function $w$ is restricted to the edges of $H$.
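The definitions of weighted degree and average degree translate directly into code. The following minimal sketch, with hypothetical helper names `weighted_degrees` and `average_degree` and an arbitrary example path, also confirms the identity that the average degree of $(G, w)$ equals $2w(G)/|V(G)|$.

```python
from fractions import Fraction


def weighted_degrees(vertices, weights):
    """Map each vertex to the total weight of its incident edges."""
    degree = {v: Fraction(0) for v in vertices}
    for (u, v), w in weights.items():
        degree[u] += w
        degree[v] += w
    return degree


def average_degree(subset, degree):
    """Average weighted degree of a nonempty vertex set."""
    return sum(degree[v] for v in subset) / len(subset)


# A weighted path 0 - 1 - 2 - 3, chosen only as an example.
vertices = [0, 1, 2, 3]
path_weights = {(0, 1): 2, (1, 2): 3, (2, 3): 1}
deg = weighted_degrees(vertices, path_weights)
whole_graph_average = average_degree(vertices, deg)
total_weight = sum(path_weights.values())
```

Each edge contributes its weight to exactly two degrees, which is why the factor 2 appears in the identity.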

3. LOWER BOUND EXAMPLES

The goal of this section is to exhibit examples that establish the lower bound in Theorem 1.

We first deal with the case $k \le n/3$. Let $K_n$ denote the complete weighted graph on $n$ vertices in which each edge is assigned a weight of 1 and $P_{n,d}$ a path on $n$ vertices in which each edge is assigned a weight of $d$. Our lower bound example is the graph consisting of two connected components $K_{n-k}$ and $P_{k,n-k-1}$. A possible execution of GREEDY on this weighted graph removes $P_{k,n-k-1}$ first and then peels off vertices of $V(K_{n-k})$ until it obtains $K_k$ as the solution. Since the weight of $P_{k,n-k-1}$ is $(n-k-1)(k-1)$ and the weight of $K_k$ is $k(k-1)/2$, the approximation ratio of this solution is $2(n/k - 1) - O(1/k)$. This gives the lower bound in Theorem 1 for the case $k \le n/3$. Note that, although this lower bound is valid as long as $k \le n/2$, it is weaker than the bound in the theorem when $k > n/3$.
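The bad execution on $K_{n-k} \cup P_{k,n-k-1}$ can be reproduced with a small simulation. The sketch below is an illustration under stated assumptions: vertices $0,\ldots,n-k-1$ form the clique, the remaining $k$ vertices form the path, and ties are broken in favor of the largest label so that the path is removed first. The names `lower_bound_instance` and `greedy_prefer_large_labels` are invented for this example.

```python
from fractions import Fraction


def lower_bound_instance(n, k):
    """Edge weights of K_{n-k} (unit weights) plus a k-vertex path of weight n-k-1 per edge."""
    weights = {}
    for u in range(n - k):
        for v in range(u + 1, n - k):
            weights[(u, v)] = 1
    for v in range(n - k, n - 1):
        weights[(v, v + 1)] = n - k - 1
    return weights


def greedy_prefer_large_labels(n, k, weights):
    """GREEDY with ties broken toward the largest label (an adversarial but legal choice)."""
    degree = [0] * n
    for (u, v), w in weights.items():
        degree[u] += w
        degree[v] += w
    remaining = set(range(n))
    while len(remaining) > k:
        victim = min(remaining, key=lambda v: (degree[v], -v))
        remaining.remove(victim)
        for (u, v), w in weights.items():
            if u == victim and v in remaining:
                degree[v] -= w
            elif v == victim and u in remaining:
                degree[u] -= w
    return remaining


n, k = 12, 4
instance = lower_bound_instance(n, k)
left = greedy_prefer_large_labels(n, k, instance)
greedy_weight = sum(w for (u, v), w in instance.items() if u in left and v in left)
path_weight = (n - k - 1) * (k - 1)
ratio = Fraction(path_weight, greedy_weight)
```

With $n = 12$ and $k = 4$ the greedy solution is a copy of $K_4$ of weight 6, while the path weighs 21, which matches $2(n/k - 1) - 2/k$.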

We now deal with the case $k > n/3$. Our construction in this case does not use nonunit weights; i.e., the graph we construct is an unweighted graph. We remark that the above construction for $k \le n/3$ can also be replaced by a nonweighted version if $k \ge cn^{3/4}$, where $c$ is a sufficiently large constant, using similar ideas.

Fix $k$ and $n$, with $n/3 < k$. We may assume that $k < n - n^{2/3}$, since otherwise the trivial lower bound of 1 can be expressed as $(1/2 + n/2k)^2 - O(n^{-1/3})$. For the following construction, it is convenient to assume that $k$ has a divisor $s$ in the range $n^{2/3} \le s < 2n^{2/3}$. We will later show how to remove this assumption. Let $k = sd$ and let $p$ be an integer in the range $1 \le p \le sk/n$; $p$ is a parameter to be tuned later in order to obtain a best possible lower bound. Note that, from the assumption on the range of $k$, we have $p < s(1 - n^{-1/3}) \le s - n^{1/3}$. Given these parameters $k$, $n$, $s$, $d$, and $p$, we define a graph $G$ as follows.

FIG. 1. Lower bound example: $|X_0| = |U| = k = sd$, $|X_i| = s(d+i)$, $|U_i| = p(d+i)$, and $|W_i| = (s-p)(d+i)$.

The structure of our graph $G$ is illustrated in Fig. 1. The vertex set $V(G)$ of $G$ is a disjoint union of $U$ and $W$, $|U| = k$. The subset $U$ is going to be the optimal solution; i.e., $G[U]$ will have the largest number of edges among all the $k$-vertex subgraphs. Within $V(G)$ is a chain of subsets $X_0 \subset X_1 \subset \cdots \subset X_h \subseteq V(G)$, such that $|X_i| = s(d+i)$, where $h$ will be determined later. The set of edges will be defined so that GREEDY can give $G[X_0]$ as its solution. Note $|X_0| = sd = k$. Let $U_i = X_i \cap U$ and $W_i = X_i \cap W$. Subsets $X_i$ are chosen so that $|U_i| = p(d+i)$ and $|W_i| = (s-p)(d+i)$. In other words, $X_i$ intersects with $U$ and $W$ at a fixed proportion $p : (s-p)$. The condition on $p$, $p \le sk/n$, implies that this ratio $p/(s-p)$ is not greater than $k/(n-k) = |U|/|W|$. This means that the maximum possible value of $h$ is determined from the size constraint on $W$: $(s-p)(d+h) \le n - k$. We set $h = \lfloor (n-k)/(s-p) \rfloor - d$. Our graph $G$ is the union of the following subgraphs.

1. arbitrary $(d+1)$-regular graphs $G[U_0]$ and $G[W_0]$ (since $|U_0|$ and $|W_0|$ are multiples of $d$ and hence even whenever $d + 1$ is odd, such regular graphs always exist);

2. for each $i$, $1 \le i \le h$,

(a) an arbitrary bipartite graph between vertex sets $U_{i-1}$ and $U_i \setminus U_{i-1}$ such that the degree of each vertex of $U_{i-1}$ is 1 and the degree of each vertex of $U_i \setminus U_{i-1}$ is $|U_{i-1}|/p = d + i - 1$,

(b) an arbitrary bipartite graph between vertex sets $W_{i-1}$ and $W_i \setminus W_{i-1}$ such that the degree of each vertex of $W_{i-1}$ is 1 and the degree of each vertex of $W_i \setminus W_{i-1}$ is $|W_{i-1}|/(s-p) = d + i - 1$, and

(c) a cycle on $X_i \setminus X_{i-1}$, such that only two edges are between $U_i \setminus U_{i-1}$ and $W_i \setminus W_{i-1}$;

3. an arbitrary bipartite graph between vertex sets $U_h$ and $U \setminus U_h$ such that the degree of each vertex in $U \setminus U_h$ is $d + h + 1$.

It is easy to confirm that $G[U_i \cup W_i]$ is $(d+i+1)$-regular, for $0 \le i \le h$. A possible execution of GREEDY on $G$ proceeds as follows. First, all the vertices in $W \setminus W_h$, which have degree 0, are removed. The minimum degree of the remaining graph is $d + h + 1$ and next all the vertices of $U \setminus U_h$ are removed in sequence. This leaves us with the $(d+h+1)$-regular graph $G[X_h]$. The remaining execution consists of $h$ phases, the $j$th of which removes all the vertices in $X_i \setminus X_{i-1}$, where $i = h - j + 1$. To confirm that such an execution is indeed allowed, suppose we are left with $X_i$ as the current set of vertices. Since the remaining graph is $(d+i+1)$-regular, we may start the next phase by removing a vertex from $X_i \setminus X_{i-1}$. Once this is done, we may always find a vertex of degree $d + i$ in $X_i \setminus X_{i-1}$ (or of degree $d + i - 1$ at the last step of this phase), while the degree of each vertex in $X_{i-1}$ remains at least $d + i$ during the phase. Thus, by induction, this execution of GREEDY leaves the vertex set $X_0$ as the solution.

Now we can compute the approximation ratio of this greedy solution. The number of edges in $G[X_0]$ is $k(d+1)/2$. The number of edges in $G[U]$ is calculated as follows. Let $x = |U_h|/k$. Since $G[U_h \cup W_h]$ is $(d+h+1)$-regular and there are only $2h$ edges between $U_h$ and $W_h$, the number of edges in $G[U_h]$ is $|U_h|(d+h+1)/2 - h = xk(d+h+1)/2 - h$. We have additional $k(1-x)(d+h+1)$ edges between $U_h$ and $U \setminus U_h$. Therefore, the total number of edges in $G[U]$ is $k(d+h+1)(1 - x/2) - h$. Since $s(d+h) = |X_h| \le n$, we have $h \le n/s - d < 3k/s - d = 2d$ and hence the approximation ratio $R_0$ of this greedy solution is

$$R_0 = \frac{k(d+h+1)(1 - x/2) - h}{k(d+1)/2} \tag{1}$$

$$\phantom{R_0} = \frac{(d+h+1)(2-x)}{d+1} - O(1/k). \tag{2}$$

We may represent the fraction $(d+h+1)/(d+1)$ in terms of $x$ as follows. We have

$$s(d+h) = |X_h| = n - |U \setminus U_h| - |W \setminus W_h| > n - k(1-x) - s,$$

where the last inequality follows from the choice of $h$ that implies $|W \setminus W_h| \le s$, and hence

$$\frac{d+h+1}{d+1} > \frac{n - k(1-x)}{s(d+1)} = \frac{n - k(1-x)}{k+s} = n/k - (1-x) - O(n^{-1/3}).$$

Thus, we have $R_0 \ge (2-x)\left(n/k - 1 + x - O(n^{-1/3})\right) - O(1/n)$, or

$$R_0 \ge (2-x)(n/k - 1 + x) - O(n^{-1/3}). \tag{3}$$

What remains to be done is to maximize the right-hand side choosing an appropriate value of $p$ (that determines $x$). Set $r = n/k$. The quadratic function $F(x) = (2-x)(r-1+x)$, $1 < r < 3$, has its maximum $(r+1)^2/4$ at $x = (3-r)/2$. Therefore, the right-hand side of inequality (3) as a function of $x$ attains its maximum $(1/2 + n/2k)^2 - O(n^{-1/3})$ at $x = x_0 = (3 - n/k)/2 + O(n^{-1/3})$. The following lemma establishes that we can make $x$ sufficiently close to $x_0$; more specifically $|x - x_0| \le O(n^{-1/3})$, by choosing $p$ appropriately. Thus, our construction gives the lower bound of $(1/2 + n/2k)^2 - O(n^{-1/3})$ as claimed in Theorem 1.
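The maximization of $F(x) = (2-x)(r-1+x)$ is a one-line calculus exercise: $F'(x) = 3 - r - 2x$ vanishes at $x = (3-r)/2$. The short sketch below checks the claimed maximum value exactly for the sample value $r = 2$ (that is, $k = n/2$); the function names `F` and `maximizer` are chosen here for illustration.

```python
from fractions import Fraction


def F(x, r):
    """The quadratic from inequality (3), with r standing for n/k."""
    return (2 - x) * (r - 1 + x)


def maximizer(r):
    """Stationary point of F, obtained from F'(x) = 3 - r - 2x = 0."""
    return (3 - r) / 2


r = Fraction(2)                 # sample value corresponding to k = n/2
x0 = maximizer(r)
peak = F(x0, r)
step = Fraction(1, 10)
```

For $r = 2$ the maximizer is $x_0 = 1/2$ and the peak value is $9/4$, the constant that appears in Theorem 1 for $k = n/2$.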

LEMMA 2. For each $y$, $0 < y < 1$, there is an integer value of $p$ such that the value of $x$ corresponding to $p$ satisfies $|x - y| \le O(n^{-1/3})$.

Proof. Recall from the definition of $h$ and $x$ that $x$ depends on $p$ as $x(p) = p\lfloor (n-k)/(s-p) \rfloor / k$. If we put $f(p) = p(n-k)/(k(s-p))$ ignoring the floor operation, we have $|f(p) - x(p)| \le p/k \le O(n^{-1/3})$. Therefore, it suffices to show that, for each $y$, $0 < y < 1$, there is an integer $p$ such that $|f(p) - y| \le O(n^{-1/3})$. Since $f(t)$, for $t$ real, is continuous and increasing in the range $0 \le t \le ks/n$ with $f(0) = 0$ and $f(ks/n) = 1$, it suffices to show that $f(p) - f(p-1) \le O(n^{-1/3})$ for every $1 \le p \le ks/n$. This quantity,

$$f(p) - f(p-1) = \frac{n-k}{k}\left(\frac{p}{s-p} - \frac{p-1}{s-p+1}\right) = \frac{(n-k)s}{k(s-p)(s-p+1)},$$

is increasing in $p$ in the range $1 \le p \le ks/n$ and hence its maximum is

$$\frac{(n-k)s}{k(s - ks/n)(s - ks/n + 1)} = \frac{n}{ks(1 - k/n + 1/s)}.$$

From our assumptions that $k > n/3$, $k < n - n^{2/3}$ (hence $1 - k/n > n^{-1/3}$), and $s \ge n^{2/3}$, we see that this value is $O(n^{-1/3})$.
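The quantities in the proof of Lemma 2 are easy to tabulate for concrete parameters. The sketch below uses the sample values $n = 1000$, $s = 100$, $k = 500$ (so $d = 5$), which satisfy the assumptions $n/3 < k < n - n^{2/3}$ and $n^{2/3} \le s < 2n^{2/3}$; these numbers and the helper names `x_of` and `f_of` are assumptions chosen only for illustration.

```python
from fractions import Fraction

n, s, k = 1000, 100, 500      # sample parameters; k = s * d with d = 5
p_max = s * k // n            # largest admissible p, here 50


def x_of(p):
    """x(p) = p * floor((n - k) / (s - p)) / k, the true fraction |U_h| / k."""
    return Fraction(p * ((n - k) // (s - p)), k)


def f_of(p):
    """f(p) = p (n - k) / (k (s - p)), the same expression without the floor."""
    return Fraction(p * (n - k), k * (s - p))


# Gap between f and x, and the step f(p) - f(p - 1), for every admissible p.
gaps = [f_of(p) - x_of(p) for p in range(1, p_max + 1)]
steps = [f_of(p) - f_of(p - 1) for p in range(1, p_max + 1)]
closed_form = [Fraction((n - k) * s, k * (s - p) * (s - p + 1)) for p in range(1, p_max + 1)]
largest_step = max(steps)
```

For these parameters the largest step is $2/51$, attained at $p = ks/n = 50$, in agreement with the closed form $n/(ks(1 - k/n + 1/s))$.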

Finally, we need to remove the assumption that $k$ is a multiple of some integer $s$ in the range $n^{2/3} \le s < 2n^{2/3}$. Fixing $n$, call $k$ simple if it satisfies this assumption. For simple $k$, let $\mathrm{OPT}(k)$ denote the number of edges of $G[U]$ in the above construction and $\mathrm{GREEDY}(k)$ denote the number of edges in $G[X_0]$. For $k$ that is not simple, we use the above construction $G$ for $k'$, where $k'$ is the largest simple integer such that $k' < k$. Note that $k' \ge k - O(n^{2/3}) = k(1 - O(n^{-1/3}))$. The execution of GREEDY on this graph proceeds exactly as before, except that it stops when $k$ vertices (instead of $k'$ vertices) are left. The heaviest $k$-vertex subgraph of $G$ has at least $\mathrm{OPT}(k')$ edges while the greedy solution contains at most $\mathrm{GREEDY}(k') + O(n)$ edges since the maximum degree of $G$ is $O(n^{1/3})$. Therefore, the approximation ratio of this greedy solution is at least $\mathrm{OPT}(k')/(\mathrm{GREEDY}(k') + O(n)) = \mathrm{OPT}(k')/\mathrm{GREEDY}(k') - O(n^{-1/3})$, since $\mathrm{GREEDY}(k') = \Theta(n^{4/3})$ and $\mathrm{OPT}(k') = \Theta(n^{4/3})$. But $\mathrm{OPT}(k')/\mathrm{GREEDY}(k') = (1/2 + n/2k')^2 - O(n^{-1/3}) = (1/2 + n/2k)^2 - O(n^{-1/3})$.

4. UPPER BOUND

In this section, we give an upper bound that matches the lower bound construction of the previous section. An $(n, k)$-instance (or an execution instance, when unambiguous) is a triple $(G, w, \sigma)$, where $(G, w)$ is a weighted graph with $|V(G)| = n$ and $\sigma$ is a sequence of distinct $n - k$ vertices of $G$. When we refer to an $(n, k)$-instance $(G, w, (v_n, \ldots, v_{k+1}))$, we will use $V_i$ to represent $V(G) \setminus \{v_{i+1}, \ldots, v_n\}$, for $k < i \le n$. We call an $(n, k)$-instance $(G, w, (v_n, \ldots, v_{k+1}))$ greedy if GREEDY may legally remove the vertices $v_n, \ldots, v_{k+1}$ in this order; i.e., $v_i$ is the vertex of $(G[V_i], w)$ with the minimum weighted degree.

For a very technical reason, we compare the weight of the greedy solution to the weight of a solution which is not necessarily optimal. We say that a set $U$ of $k$ vertices of a weighted graph $(G, w)$ is quasioptimal with respect to an execution instance $I = (G, w, (v_n, \ldots, v_{k+1}))$ if $w(G[U])$ is the maximum under the constraint that $U \cap V_k \ne \emptyset$ and $U \ne V_k$. Let $U$ be an optimal set of $k$ vertices of $(G, w)$. Replacing a vertex of the minimum degree in $(G[U], w)$ (whose degree is at most $2w(G[U])/k$) with some appropriate vertex, we obtain a vertex set $U'$ with weight at least $w(G[U])(1 - 2/k)$ such that $U' \cap V_k \ne \emptyset$ and $U' \ne V_k$. Therefore, the weight of a quasioptimal solution is at least $(1 - 2/k)$ times the optimal. In the following, instead of bounding the worst case approximation ratio $R$, we bound the worst case ratio $R'$ of a quasioptimal solution to a greedy solution. Since $R \le R'(1 + 8/k)$ (as we are assuming $k \ge 3$), the inaccuracy incurred is absorbed by the $O(\cdot)$ error terms in the upper bounds of Theorem 1.

The strategy of our upper bound proof is as follows. Given an arbitrary greedy execution instance we transform the instance (by which we roughly mean changing the weight function $w$) into a certain canonical form without decreasing the approximation ratio and then analyze this canonical form to obtain the upper bound. The objects we are going to manipulate are in fact slightly abstract versions of execution instances, which we call scenarios, rather than execution instances themselves. Before formally defining scenarios, let us first discuss some underlying ideas. As soon as we try to change the weight function in an execution instance in such a way that greediness is preserved, we notice that it is a tricky process: simply moving some weight from one edge to another can destroy the greediness of the instance in several ways. So, let us consider relaxing the greediness condition somewhat to make transformations easier.

We call an $(n, k)$-instance $(G, w, (v_n, \ldots, v_{k+1}))$ semigreedy if the weighted degree of $v_i$ in $G[V_i]$ is not greater than the average weighted degree of $(G[V_i], w)$ for $k < i \le n$. Note that any greedy execution instance is semigreedy and therefore an upper bound on the approximation ratios of semigreedy instances will be an upper bound for greedy instances. The advantage of dealing with semigreedy instances is that the actual weight function $w$ is not important in deciding if the given instance is semigreedy or not. For such a check, we only need the sequence $h(v_n), h(v_{n-1}), \ldots, h(v_{k+1})$, where $h(v_i)$ is the weighted degree of $v_i$ in $(G[V_i], w)$, and the average degree $\hat{h}_k$ of $(G[V_k], w)$. This is because the average degree $\hat{h}_i$ of $(G[V_i], w)$ is determined by $i\hat{h}_i = k\hat{h}_k + 2\sum_{k < j \le i} h(v_j)$, for $k < i \le n$. If we dealt with a structure in which the weight function $w$ is replaced by $h$ and $\hat{h}$, then preserving semigreediness in transformations would be easier than preserving greediness with explicit weight assignments. Unfortunately, by doing so we would lose information necessary to determine the approximation ratio of an execution instance: we would know the weight of the greedy solution but not the optimal (or quasioptimal) solution. To keep track of the weight of the quasioptimal solution $U$, we need to divide the weighted degree $h(v_i)$ into two parts: weights going into $U$ and those going into $V(G) \setminus U$. These considerations lead to the following definition of scenarios.

An $(n, k)$-scenario is a 7-tuple $S = (V, U, \sigma, \hat{f}, \hat{g}, f, g)$, where $V$ is a set of $n$ vertices, $U$ is a subset of $V$ with $|U| = k$, $\sigma$ is a sequence of distinct $n - k$ vertices of $V$ that contains at least one vertex of $U$ but not all the vertices of $U$, $\hat{f}$ and $\hat{g}$ are nonnegative reals, and $f$, $g$ are mappings from the set of vertices of $\sigma$ to the set of nonnegative reals.
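The semigreedy check described above needs only the removal degrees $h(v_i)$ and the average degree $\hat{h}_k$ of the final graph. The sketch below verifies the identity $i\hat{h}_i = k\hat{h}_k + 2\sum_{k<j\le i} h(v_j)$ on a small instance by comparing it with a direct computation; the function names, the sample graph, and the removal order are assumptions made for this illustration.

```python
from fractions import Fraction


def induced_weight(vertices, weights):
    """Total weight of edges with both endpoints in `vertices`."""
    return sum(w for (a, b), w in weights.items() if a in vertices and b in vertices)


def degree_in(v, vertices, weights):
    """Weighted degree of v inside the subgraph induced by `vertices`."""
    return sum(w for (a, b), w in weights.items()
               if (a == v and b in vertices) or (b == v and a in vertices))


def average_degrees(n, weights, order):
    """Return h(v_i), the direct averages, and the averages from the recurrence.

    `order` lists the removed vertices v_n, v_{n-1}, ..., v_{k+1}.
    """
    k = n - len(order)
    remaining = set(range(n))
    h, direct = {}, {}
    for i, v in zip(range(n, k, -1), order):
        h[i] = degree_in(v, remaining, weights)               # degree of v_i in G[V_i]
        direct[i] = Fraction(2 * induced_weight(remaining, weights), i)
        remaining.remove(v)
    running = Fraction(2 * induced_weight(remaining, weights))  # equals k * h_hat_k
    recurrence = {}
    for i in range(k + 1, n + 1):
        running += 2 * h[i]
        recurrence[i] = running / i
    return h, direct, recurrence


sample = {(0, 1): 3, (1, 2): 1, (2, 3): 4, (0, 4): 2, (3, 4): 5, (1, 4): 7}
h, direct, recurrence = average_degrees(5, sample, [2, 0])
is_semigreedy = all(h[i] <= recurrence[i] for i in h)
```

The point of the identity is that the check never looks at individual edge weights once the values $h(v_i)$ and $\hat{h}_k$ are known.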

The following notation will be used freely whenever we refer to an $(n, k)$-scenario $S = (V, U, \sigma, \hat{f}, \hat{g}, f, g)$. We let $\sigma = (v_n, v_{n-1}, \ldots, v_{k+1})$, $V_i = V \setminus \{v_{i+1}, \ldots, v_n\}$, and $W = V \setminus U$. We set $s_i = |V_i \cap U|$ and $t_i = |V_i \cap W|$, for $k \le i \le n$, and $h(v_i) = f(v_i) + g(v_i)$ for $k < i \le n$.

Given an $(n, k)$-instance $I = (G, w, \sigma)$ and $U \subseteq V(G)$ with $|U| = k$, we say that an $(n, k)$-scenario $S = (V', U', \sigma', \hat{f}, \hat{g}, f, g)$ corresponds to the pair $(I, U)$ if all of the following conditions hold.

1. $V' = V(G)$, $U' = U$, and $\sigma' = \sigma$;

2. $f(v_i)$, for $k < i \le n$, is the total weight of the edges between $v_i$ and $V_{i-1} \cap U$ under weight function $w$;

3. $g(v_i)$, for $k < i \le n$, is the total weight of the edges between $v_i$ and $V_{i-1} \cap W$ under $w$;

4. $\hat{f}$ is the average degree of $V_k \cap U$ in $(G[V_k], w)$;

5. $\hat{g}$ is the average degree of $V_k \cap W$ in $(G[V_k], w)$.

Note that (2) and (3) imply that $h(v_i) = f(v_i) + g(v_i)$ is the weighted degree of $v_i$ in $(G[V_i], w)$. We say that an $(n, k)$-scenario $S$ corresponds to an execution instance $I$ if $S$ corresponds to $(I, U)$ for some set of vertices $U$.

Given an $(n, k)$-scenario $S = (V, U, \sigma, \hat{f}, \hat{g}, f, g)$, define $\hat{f}_i$ and $\hat{g}_i$, $k \le i \le n$, by

$$s_i \hat{f}_i = s_k \hat{f} + \sum_{k < j \le i} f(v_j) + \sum_{k < j \le i,\ v_j \in U} h(v_j),$$

and

$$t_i \hat{g}_i = t_k \hat{g} + \sum_{k < j \le i} g(v_j) + \sum_{k < j \le i,\ v_j \in W} h(v_j).$$

In particular, we have $\hat{f}_k = \hat{f}$ and $\hat{g}_k = \hat{g}$. Note that the restriction on $\sigma$ in the definition of scenarios implies that both $V_k \cap U$ and $V_k \cap W$ are nonempty and hence $s_i > 0$ and $t_i > 0$ for $k \le i \le n$. The purpose of our earlier decision to compare the greedy solution to a quasioptimal solution is to eliminate the exceptional cases where $s_i = 0$ or $t_i = 0$. We remark that the awkwardness of this rather artificial arrangement can be avoided if one is willing to go through tedious details arising from such exceptional cases.

It is easy to verify the following proposition.

PROPOSITION 3. If scenario $S = (V, U, \sigma, \hat{f}, \hat{g}, f, g)$ corresponds to $(I, U)$ where $I = (G, w, \sigma)$, then for each $i$, $k < i \le n$, $\hat{f}_i$ is the average degree of $V_i \cap U$ in $(G[V_i], w)$ and $\hat{g}_i$ is the average degree of $V_i \cap W$ in $(G[V_i], w)$.

In view of this proposition, we call $\hat{f}_i$ ($\hat{g}_i$, resp.) the average degree of $V_i \cap U$ ($V_i \cap W$, resp.) in scenario $S$, even when $S$ does not explicitly correspond to an execution instance and hence no explicit weight function is involved.

We say that an $(n, k)$-scenario $S$ is valid if the following inequality is satisfied for $k < i \le n$:

$$h(v_i) \le \min(\hat{f}_i, \hat{g}_i). \tag{4}$$

LEMMA 4. Let $I = (G, w, \sigma)$ be a greedy $(n, k)$-instance and $U$ a quasioptimal $k$-vertex set with respect to $I$. Then, there is a unique $(n, k)$-scenario $S$ that corresponds to $(I, U)$ and, moreover, $S$ is valid.

Proof. Each component of the scenario $S = (V, U, \sigma, \hat{f}, \hat{g}, f, g)$ is uniquely determined from $(I, U)$ by each condition in the definition of correspondence. Note that the condition on $\sigma$ in the definition of a scenario is satisfied because $U$ is quasioptimal and hence $U \cap V_k \ne \emptyset$ and $U \ne V_k$. It remains to show that $S$ is valid. Since $S$ corresponds to $I$, $h(v_i)$ is the weighted degree of $v_i$ in $(G[V_i], w)$ and, because $I$ is greedy, $h(v_i)$ is not greater than the weighted degree of $v$ in $(G[V_i], w)$ for any vertex $v \in V_i$. Therefore, $h(v_i)$ is not greater than the average degree of any vertex set in $(G[V_i], w)$; in particular, it is not greater than the average degree $\hat{f}_i$ of $V_i \cap U$ or the average degree $\hat{g}_i$ of $V_i \cap W$.


Define the value of an $(n, k)$-scenario $S$ by

$$\mathrm{val}(S) = \frac{s_k \hat{f} + 2\sum_{k < i \le n,\ v_i \in U} f(v_i)}{s_k \hat{f} + t_k \hat{g}}.$$

LEMMA 5. Let $I = (G, w, \sigma)$ be a greedy $(n, k)$-instance and $U$ a quasioptimal solution for $(G, w)$. If $S$ is the $(n, k)$-scenario that corresponds to $(I, U)$, then $\mathrm{val}(S) \ge w(G[U])/w(G[V_k])$.

Proof. The numerator in the definition of $\mathrm{val}(S)$ upper-bounds $2w(G[U])$ while the denominator equals $2w(G[V_k])$.
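The value of a scenario can be computed directly from an execution instance and a comparison set $U$ by following the correspondence conditions. The sketch below is an illustration only: the function names, the sample graph, the removal order, and the choice of $U$ are assumptions, and $U$ is merely required to meet the scenario conditions rather than being verified as quasioptimal.

```python
from fractions import Fraction


def degree_into(v, targets, weights):
    """Total weight of the edges joining v to vertices in `targets`."""
    return sum(w for (a, b), w in weights.items()
               if (a == v and b in targets) or (b == v and a in targets))


def induced_weight(vertices, weights):
    """Total weight of edges with both endpoints in `vertices`."""
    return sum(w for (a, b), w in weights.items() if a in vertices and b in vertices)


def scenario_value(n, weights, order, U):
    """val(S) for the scenario corresponding to the instance and the set U.

    `order` lists the removed vertices v_n, ..., v_{k+1}.
    """
    V_k = set(range(n)) - set(order)
    # s_k * f_hat and t_k * g_hat are the degree sums of V_k ∩ U and V_k \ U in G[V_k].
    sk_fhat = sum(degree_into(v, V_k, weights) for v in V_k & U)
    tk_ghat = sum(degree_into(v, V_k, weights) for v in V_k - U)
    numerator = Fraction(sk_fhat)
    remaining = set(range(n))
    for v in order:
        remaining.remove(v)              # `remaining` is now V_{i-1} for v = v_i
        if v in U:
            numerator += 2 * degree_into(v, remaining & U, weights)   # adds 2 f(v_i)
    return numerator / (sk_fhat + tk_ghat)


sample = {(0, 1): 3, (1, 2): 1, (2, 3): 4, (0, 4): 2, (3, 4): 5, (1, 4): 7}
order = [2, 0]                           # a legal greedy removal order on this graph
U = {0, 1, 4}
value = scenario_value(5, sample, order, U)
V_k = set(range(5)) - set(order)
true_ratio = Fraction(induced_weight(U, sample), induced_weight(V_k, sample))
```

Here the numerator overcounts $2w(G[U])$ because $s_k\hat{f}$ also includes edges from $V_k \cap U$ to $V_k \cap W$, which is why $\mathrm{val}(S)$ is only an upper bound on the true ratio.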

Thus, to upper-bound the approximation ratio of the greedy algorithm (with respect to a quasioptimal solution), it suffices to upper-bound $\mathrm{val}(S)$ over all valid scenarios $S$. In what follows, we define several transformations of scenarios that bring a given valid scenario into a certain canonical form without decreasing the value of the scenario. Then, we analyze canonical scenarios to upper-bound their values. Defining the transformation directly on execution instances would involve global changes to the weight function; the local transformations below are made possible by our use of more general objects (scenarios) that abstract away actual edge weights.

We say that a valid scenario $S$ is saturated if the equality holds in inequality (4), for $k < i \le n$. Our first transformation, called saturate, converts an arbitrary valid scenario $S$ into a saturated one. Fixing other components of $S$, it successively redefines one of $f(v_i)$ or $g(v_i)$, for $i = k + 1$ up to $n$, increasing its value by the minimum amount needed to make the equality hold in (4). More specifically, it increases $f(v_i)$ and keeps $g(v_i)$ fixed if $v_i \in W$; it increases $g(v_i)$ and keeps $f(v_i)$ fixed if $v_i \in U$. If the amount of increase of $f(v_i)$ or $g(v_i)$ is $\Delta$, then, in either of the above two cases, each of $s_i\hat{f}_i$ and $t_i\hat{g}_i$ is increased by $\Delta$. Since $s_i + t_i = i > k \ge 3$, we have either $s_i \ge 2$ or $t_i \ge 2$. In the former case, the increase of $\hat{f}_i$ is at most $\Delta/2$; in the latter case the increase of $\hat{g}_i$ is at most $\Delta/2$. Therefore, with an appropriate value of $\Delta$, we can make $h(v_i)$ equal to $\min(\hat{f}_i, \hat{g}_i)$. It is clear that the result of saturate is a valid and saturated scenario. Moreover, this transformation never decreases the value of $S$, because the denominator is fixed and the numerator may only increase.

The purpose of transformation saturate is to make a scenario monotone: we call scenario $S$ monotone if $h(v_{k+1}) \le h(v_{k+2}) \le \cdots \le h(v_n)$.

PROPOSITION 6. If $S$ is valid and saturated then $S$ is monotone.

Proof. Since $S$ is saturated, it suffices to show that $\min(\hat{f}_{i+1}, \hat{g}_{i+1}) \ge \min(\hat{f}_i, \hat{g}_i)$ for $k \le i < n$. Fix $i$, $k \le i < n$, and suppose $v_{i+1} \in U$. Since $t_{i+1}\hat{g}_{i+1} \ge t_i\hat{g}_i$ and $t_{i+1} = t_i$, we have $\hat{g}_{i+1} \ge \hat{g}_i$. Therefore, it suffices to show that $\hat{f}_{i+1} \ge \min(\hat{f}_i, \hat{g}_i)$, i.e., $\hat{f}_{i+1} \ge \hat{f}_i$ or $\hat{f}_{i+1} \ge \hat{g}_i$. First note that $s_{i+1}\hat{f}_{i+1} \ge s_i\hat{f}_i + h(v_{i+1})$, since $v_{i+1} \in U$. If we set $y = (s_i\hat{f}_i + h(v_{i+1}))/s_{i+1}$ (hence $\hat{f}_{i+1} \ge y$) then $y$ is a convex combination of $\hat{f}_i$ and $h(v_{i+1})$ since $s_{i+1} = s_i + 1$. Therefore, either $h(v_{i+1}) \ge y \ge \hat{f}_i$ or $h(v_{i+1}) < y < \hat{f}_i$. The first case trivially implies $\hat{f}_{i+1} \ge \hat{f}_i$. In the second case, we must have $h(v_{i+1}) = \hat{g}_{i+1}$, since $S$ is saturated, and hence $\hat{f}_{i+1} \ge y > \hat{g}_{i+1} \ge \hat{g}_i$.

To deal with the case $v_{i+1} \in W$, swap the roles of $f$ and $U$ with those of $g$ and $W$.

Our two other transformations are defined on valid monotone scenarios. As we will see, they keep $h(v_i)$ unchanged for every $v_i$ and hence preserve monotonicity. Note, however, that they do not necessarily preserve saturatedness; this is not important since saturation is only a means for achieving monotonicity.

Call scenario $S$ lean if there exists some $i$, $k < i \le n$, such that $\hat f_i \le h(v_n)$. Our second transformation, called squeeze, converts an arbitrary monotone valid scenario $S$ into a lean one: fixing other components of $S$, it decreases $\hat f$ by the minimum amount needed to make $S$ lean. Note here that $\hat f_i$ for each $i$ is an increasing function of $\hat f$ and setting $\hat f = 0$ certainly makes $\hat f_k \le h(v_n)$ hold. Therefore, transformation squeeze is always well defined. Transformation squeeze does not change $h(v_i)$ or $\hat g_i$ for any $i$ and does not make any $\hat f_i$ smaller than $h(v_n)$. And since $h(v_n) \ge h(v_i)$ due to the monotonicity of $S$, the result of squeeze remains valid. Moreover, since $\hat f$ is common in the numerator and the denominator of $\mathrm{val}(S)$, squeeze does not decrease $\mathrm{val}(S)$ provided $\mathrm{val}(S) \ge 1$ originally (which we can always assume for our upper-bounding purposes).

Our last transformation, which is parameterized by integers $j$ and $l$, $k < j < l \le n$, and is called exchange$(j, l)$, applies to lean, monotone valid scenarios. Loosely speaking, this transformation corresponds to moving some weight from the edges between $v_j$ and $U$ to the edges between $v_j$ and $W$ while moving back the same amount of weight from the edges between $v_l$ and $W$ to the edges between $v_l$ and $U$. See Fig. 2. More formally, suppose a monotone valid scenario $S$ is given that is lean. Let $m(S)$ denote the largest integer $m$ such that $\hat f_m \le h(v_n)$. Transformation exchange$(j, l)$ is applicable to $S$ if $m(S) < j < l \le n$, $f(v_j) > 0$, $g(v_l) > 0$, and $v_l \in U$. Suppose these conditions are satisfied. Keeping fixed each of $h(v_j)$, $h(v_l)$, $f(v_j) + f(v_l)$, and hence $g(v_j) + g(v_l)$, the transformation decreases $f(v_j)$ (and hence $g(v_l)$) by the minimum amount such that one of the following happens: (a) $f(v_j) = 0$, (b) $g(v_l) = 0$, or (c) $\hat f_i \le h(v_n)$ for some $i$, $j \le i < l$. The result of this transformation is a valid scenario, because $\hat g_i$ for each $i$ is never decreased and, although $\hat f_i$ may be decreased, it may never become strictly smaller than $h(v_n) \ge h(v_i)$ due to condition (c). It is also clear that the scenario remains monotone (because $h$ is not changed) and lean (because neither $\hat f$ nor $f(v_i)$, for any $k < i \le m(S)$, is changed and hence $\hat f_i$ remains unchanged for every $i$ in that range). Finally note that this transformation does not decrease $\mathrm{val}(S)$, because the denominator is fixed and the numerator is either unchanged (when both $v_j$ and $v_l$ are in $U$) or increased (when $v_j \in W$ and $v_l \in U$).

FIG. 2. Operation exchange$(j, l)$.
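The claim that squeeze cannot decrease $\mathrm{val}(S)$ when $\mathrm{val}(S) \ge 1$ rests on an elementary fact about fractions: if the same nonnegative term appears in both the numerator and the denominator, and the ratio is at least 1, then shrinking that term can only increase the ratio. The following Python sketch checks this on exact rationals. The names `ratio`, `a`, `b`, and `c` are illustrative assumptions, not notation from the paper: `c` plays the role of the shared term contributed by $\hat f$, while `a` and `b` stand for the remaining parts of the numerator and the denominator.

```python
from fractions import Fraction

def ratio(a, b, c):
    """Value (a + c) / (b + c) of a fraction whose two parts share the term c."""
    return Fraction(a + c, b + c)

def shrinking_shared_term_never_hurts(a, b, c_values):
    """Check that the ratio does not decrease as c decreases, assuming a >= b > 0."""
    assert a >= b > 0, "the ratio must be at least 1"
    ordered = sorted(c_values, reverse=True)      # from largest c to smallest c
    values = [ratio(a, b, c) for c in ordered]
    return all(x <= y for x, y in zip(values, values[1:]))

# Hypothetical numbers: the non-shared parts are a = 9 and b = 4.
print(shrinking_shared_term_never_hurts(9, 4, [0, 1, 2, 5, 10]))
```

When the ratio is below 1 the same operation can decrease it, which is why the assumption $\mathrm{val}(S) \ge 1$ is needed.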

We call a scenario $S$ canonical, if it is valid, monotone, and lean and if, moreover, transformation exchange$(j, l)$ is not applicable to $S$ for any pair of indices $j$, $l$, i.e.,

(A) there is no pair $(j, l)$, $m(S) < j < l \le n$, such that $v_l \in U$, $f(v_j) > 0$, and $g(v_l) > 0$.

The following lemma summarizes the role of our transformations.

LEMMA 7. For any valid $(n, k)$-scenario $S$ such that $\mathrm{val}(S) \ge 1$, there is a canonical scenario $S'$ such that $\mathrm{val}(S') \ge \mathrm{val}(S)$.

Proof. $S$ can be made monotone by applying transformation saturate. Then, transformation squeeze can be applied to make it lean. Suppose $S$ is already lean. We repeatedly apply transformation exchange$(j, l)$, choosing the smallest possible $j$ and largest possible $l$ at each application, until $S$ is canonical. This process terminates in a finite number of steps, because $m(S)$ does not decrease in this process and, during the steps in which $m(S)$ is fixed (and hence only conditions (a) and (b) in the definition of exchange$(j, l)$ happen), we can apply exchange$(j, l)$ at most once for each fixed pair $(j, l)$. Therefore, we eventually arrive at a canonical scenario. As we have observed, none of the transformations decreases the value of the scenario.

The following property of a canonical scenario is crucial in our upper bound proof.

LEMMA 8. Let $S = (V, U, \sigma, f, g, \hat f, \hat g)$ be a canonical scenario. Then $g(v_i) = 0$ for every $v_i \in U$ with $m(S) + 2 \le i \le n$.

Proof. Let $S$ be canonical and $m = m(S)$. Suppose that $g(v_i) > 0$ for some $v_i \in U$, $m + 2 \le i \le n$. Then, from condition (A), we have $f(v_{m+1}) = 0$. We consider two cases: $v_{m+1} \in U$ or not. Suppose first that $v_{m+1} \in U$. Then, since $f(v_{m+1}) = 0$, we have $s_{m+1}\hat f_{m+1} = s_m \hat f_m + h(v_{m+1})$ and, since $s_{m+1} = s_m + 1$, we have $h(v_{m+1}) = (s_m + 1)\hat f_{m+1} - s_m \hat f_m$. Since $\hat f_m \le h(v_n)$ and $\hat f_{m+1} > h(v_n)$ from the definition of $m(S)$, it must follow that $h(v_{m+1}) > h(v_n)$, a contradiction to the monotonicity of $S$. Suppose next that $v_{m+1} \in W$. Then it follows from $f(v_{m+1}) = 0$ that $s_{m+1}\hat f_{m+1} = s_m \hat f_m$ and hence, since $s_{m+1} = s_m$, $\hat f_{m+1} = \hat f_m$, contradicting the definition of $m(S)$. Therefore, there must be no $i$ in the range $m + 2 \le i \le n$ such that $v_i \in U$ and $g(v_i) > 0$.

We are ready to prove the main lemma for the upper bound result.

LEMMA 9. For any canonical $(n, k)$-scenario $S$, $\mathrm{val}(S) \le (1/2 + n/2k)^2 + O(1/n)$ for $n/3 \le k \le n$ and $\mathrm{val}(S) \le 2(n/k - 1) + O(n/k^2)$ for $k < n/3$.

Proof. Let $S = (G, U, \sigma, f, g, \hat f, \hat g)$ be a canonical $(n, k)$-scenario and let $m = m(S)$. To evaluate $\mathrm{val}(S)$, we split the numerator of $\mathrm{val}(S)$ in two parts: $N_1 = s_k \hat f + 2\sum_{k < i \le m,\, v_i \in U} f(v_i)$ and $N_2 = 2\sum_{m < i \le n,\, v_i \in U} f(v_i)$. The first part is bounded by

$$N_1 \le s_m \hat f_m \le s_m h(v_n) \qquad (5)$$

and the second part by

$$N_2 \le 2(k - s_m)\, h(v_n). \qquad (6)$$

Therefore, we have

$$\mathrm{val}(S) \le \frac{(2k - s_m)\, h(v_n)}{s_k \hat f + t_k \hat g}. \qquad (7)$$

Let $n' = n$ if $m = n$ and $n' = n - k + s_{m+1}$ if $m < n$. Let $\hat h = (s_k \hat f + t_k \hat g)/k$. We claim the following bound on $h(v_n)$.

$$h(v_n) \le \frac{n' - 1}{k - 1}\, \hat h \qquad (8)$$

To prove this bound, we first consider the easier case where $n' = n$. For $k \le i \le n$, let $\hat h_i = (s_i \hat f_i + t_i \hat g_i)/i$ denote the "average degree" of $V_i$ in scenario $S$. Note here that $s_i + t_i = i$. Since $h(v_i) \le \min(\hat f_i, \hat g_i)$ from the validity of $S$, we have $h(v_i) \le \hat h_i$ for $k \le i \le n$. From the definition of $\hat f_i$ and $\hat g_i$, we have, for $k \le i < n$, $(i + 1)\hat h_{i+1} = i \hat h_i + 2h(v_{i+1})$ and hence $(i + 1)\hat h_{i+1} \le i \hat h_i + 2\hat h_{i+1}$ or $\hat h_{i+1}/i \le \hat h_i/(i - 1)$. Therefore, we have $\hat h_n/(n - 1) \le \hat h_k/(k - 1)$ by a simple induction. Since $h(v_n) \le \hat h_n$, and $\hat h_k = \hat h$, inequality (8), with $n' = n$, holds.
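As a sanity check on this induction, the recurrence $(i + 1)\hat h_{i+1} = i\hat h_i + 2h(v_{i+1})$ can be simulated directly. The Python sketch below is only an illustration under stated assumptions: it chooses each $h(v_{i+1})$ as a hypothetical fraction `theta` of $\hat h_{i+1}$, so that $h(v_{i+1}) \le \hat h_{i+1}$ holds by construction, and the function names are not from the paper.

```python
from fractions import Fraction
import random

def average_degrees(k, n, h_k, thetas):
    """Return {i: h_hat_i} for k <= i <= n.

    Assumes h(v_{i+1}) = theta * h_hat_{i+1} with 0 <= theta <= 1, so that
    (i + 1) * h_hat_{i+1} = i * h_hat_i + 2 * h(v_{i+1}) becomes
    h_hat_{i+1} = i * h_hat_i / (i + 1 - 2 * theta).
    """
    h_hat = {k: Fraction(h_k)}
    for i, theta in zip(range(k, n), thetas):
        h_hat[i + 1] = i * h_hat[i] / (i + 1 - 2 * theta)
    return h_hat

def bound_holds(k, n, h_hat):
    """Check h_hat_n / (n - 1) <= h_hat_k / (k - 1)."""
    return h_hat[n] / (n - 1) <= h_hat[k] / (k - 1)

random.seed(0)
k, n = 3, 12                                   # hypothetical sizes
thetas = [Fraction(random.randint(0, 10), 10) for _ in range(n - k)]
print(bound_holds(k, n, average_degrees(k, n, 1, thetas)))

# With theta = 1 at every step the bound is met with equality.
tight = average_degrees(k, n, 1, [Fraction(1)] * (n - k))
print(tight[n] / (n - 1) == tight[k] / (k - 1))
```

The equality case corresponds to every removed vertex having degree exactly equal to the current average, which is the situation in which the factor $(n - 1)/(k - 1)$ cannot be improved.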

Now suppose $m < n$ and hence $n' = n - k + s_{m+1}$. We consider an $(n', k)$-scenario $S' = (V', U', \sigma', f, g, \hat f, \hat g)$, where $U' = U \cap V_{m+1}$, $V' = U' \cup W$, and $\sigma'$ is the sequence obtained from $\sigma$ by striking out each $v_i$ not in $V'$; remaining components are common with $S$ except that functions $f$ and $g$ are restricted to the elements of $\sigma'$. For $k \le i \le n'$, let $j(i)$ denote the smallest integer $j$ such that $|V_j \cap V'| = i$. Note that $\sigma' = (v_{j(n')}, v_{j(n'-1)}, \ldots, v_{j(k+1)})$ with this notation. In particular, $j(i) = i$ for $k \le i \le m + 1$.

We claim that $S'$ is a valid scenario. We defer the proof and first argue that inequality (8) follows from this claim. Using the same argument as in the case $m(S) = n$, we obtain that

$$\hat h_{j(n')} \le \frac{n' - 1}{k - 1}\, \hat h.$$

Since $j(n') > m$, the definition of $m = m(S)$ implies that $h(v_n) < \hat f_{j(n')}$. We also have $h(v_n) \le \hat g_n$ from the validity of $S$. Furthermore, since $v_i \in U$ and $g(v_i) = 0$ for $j(n') < i \le n$ (the latter being implied by Lemma 8), we have $\hat g_n = \hat g_{j(n')}$ and hence $h(v_n) \le \hat g_{j(n')}$. Since $\hat h_{j(n')}$ is a weighted average of $\hat f_{j(n')}$ and $\hat g_{j(n')}$, we conclude that $h(v_n) \le \hat h_{j(n')}$ and hence that inequality (8) holds.

It remains to show that $S'$ is a valid scenario. Let $s'_i = |U' \cap V_{j(i)}|$ and $t'_i = |W \cap V_{j(i)}|$ and define $\hat f'_i$ and $\hat g'_i$ for $k \le i \le n'$ as $\hat f_i$, $\hat g_i$ are defined for $S$:

$$s'_i \hat f'_i = s_k \hat f + \sum_{k < l \le j(i),\, v_l \in V'} f(v_l) + \sum_{k < l \le j(i),\, v_l \in U'} h(v_l)$$

$$t'_i \hat g'_i = t_k \hat g + \sum_{k < l \le j(i),\, v_l \in V'} g(v_l) + \sum_{k < l \le j(i),\, v_l \in W} h(v_l).$$

Note that $s'_i = s_i$ for $k \le i \le m$ and $s'_i = s_{m+1}$ for $m + 1 \le i \le n'$ while $t'_i = t_{j(i)}$ for the entire range $k \le i \le n'$.

We need to show that $h(v_{j(i)}) \le \min(\hat f'_i, \hat g'_i)$ for $k < i \le n'$. For $k < i \le m + 1$, this follows from the validity of $S$, since $j(i) = i$, $\hat f'_i = \hat f_i$, and $\hat g'_i = \hat g_i$. Suppose $i > m + 1$. Since $g(v_l) = 0$ for every $v_l \in V \setminus V'$ from Lemma 8, we have

$$t'_i \hat g'_i = t_k \hat g + \sum_{k < l \le j(i),\, v_l \in V'} g(v_l) + \sum_{k < l \le j(i),\, v_l \in W} h(v_l)$$

$$= t_k \hat g + \sum_{k < l \le j(i),\, v_l \in V} g(v_l) + \sum_{k < l \le j(i),\, v_l \in W} h(v_l)$$

$$= t_{j(i)}\, \hat g_{j(i)}$$

and hence $\hat g'_i = \hat g_{j(i)}$, since $t'_i = t_{j(i)}$. Therefore, from the validity of $S$, we have $h(v_{j(i)}) \le \hat g_{j(i)} = \hat g'_i$. To compare $h(v_{j(i)})$ with $\hat f'_i$, first note that $\hat f'_{m+1} = \hat f_{m+1} > h(v_n)$ from the definition of $m = m(S)$. Since $s'_i \hat f'_i \ge s'_{m+1} \hat f'_{m+1}$ and hence $\hat f'_i \ge \hat f'_{m+1}$ (because $s'_i = s'_{m+1}$ for $i > m$), it follows that $h(v_n) < \hat f'_i$ and hence $h(v_{j(i)}) < \hat f'_i$ since $S$ is monotone. Therefore, we have $h(v_{j(i)}) \le \min(\hat f'_i, \hat g'_i)$ in the remaining range $m + 1 < i \le n'$, establishing the validity of $S'$. This concludes the proof of inequality (8).

Combining inequalities (7) and (8) we have

$$\mathrm{val}(S) \le \frac{(2k - s_m)(n' - 1)}{k(k - 1)}.$$

Recall that $n' = n - k + s_{m+1}$ if $m < n$ and $n' = n$ if $m = n$. In either case, we have $n' \le n - k + s_m + 1$ (since in the latter case $s_m = k$). Thus, we have $\mathrm{val}(S) \le (2k - s_m)(n - k + s_m)/(k(k - 1))$, or putting $x = s_m/k$ and $r = n/k$,

$$\mathrm{val}(S) \le (2 - x)(r - 1 + x)\,\frac{k}{k - 1}$$

$$\le (2 - x)(r - 1 + x)(1 + 2/k)$$

$$\le (2 - x)(r - 1 + x) + 4r/k,$$

arriving at the same quadratic function $F_r(x) = (2 - x)(r - 1 + x)$ as in Section 3. When $r < 3$, it attains its maximum $(r + 1)^2/4$ at $x = (3 - r)/2$ and when $r \ge 3$, its maximum is $2(r - 1)$ at $x = 0$. This concludes the proof of Lemma 9.
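The maximization of $F_r(x)$ over $0 \le x \le 1$ can be checked numerically. The following Python sketch compares a brute-force grid search with the closed forms stated above; the function names, the grid resolution, and the sample values of $r$ are assumptions made for illustration only.

```python
def F(r, x):
    """Quadratic F_r(x) = (2 - x)(r - 1 + x)."""
    return (2 - x) * (r - 1 + x)

def closed_form_max(r):
    """Maximum of F_r over 0 <= x <= 1 for r >= 1, as stated in the text."""
    if r < 3:
        return (r + 1) ** 2 / 4      # attained at x = (3 - r) / 2
    return 2 * (r - 1)               # attained at x = 0

def grid_max(r, steps=10_000):
    """Brute-force maximum of F_r on an evenly spaced grid over [0, 1]."""
    return max(F(r, i / steps) for i in range(steps + 1))

for r in (1, 1.5, 2, 3, 5):
    print(r, closed_form_max(r), grid_max(r))
```

For $r = 2$, that is $k = n/2$, both columns give $9/4$, which matches the bound quoted in the abstract for that case.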

The upper bound in Theorem 1 now follows from the observation on quasioptimal solutions and Lemmas 5, 7, and 9.

We remark that some insight into the above upper bound proof may be obtained if the reader examines how the lower bound examples in Section 3 achieve equality or near-equality in the inequalities (5), (6), and (8).

ACKNOWLEDGMENT

The authors thank the anonymous referees for their valuable comments. They especially thank the second referee for pointing out some errors in the original submission.

REFERENCES

[AI95] Y. Asahiro and K. Iwama, Finding dense subgraphs, in “Proc. International Symposium on Algorithms and Computation '95,” Lecture Notes in Computer Science, Vol. 1004, pp. 102–111, Springer-Verlag, Berlin, 1995.

[AKK95] S. Arora, D. Karger, and M. Karpinski, Polynomial time approximation schemes for dense instances of NP-hard problems, in “Proc. 27th ACM Symposium on Theory of Computing,” pp. 284–293, 1995.

[FS97] U. Feige and M. Seltser, “On the Densest k-Subgraph Problem,” Technical report, Department of Applied Mathematics and Computer Science, The Weizmann Institute, Rehobot, 1997.

[GL95] M. Grötschel and L. Lovász, Combinatorial optimization: A survey, in “Handbook of Combinatorics,” North-Holland, Amsterdam, 1995.

[KP94] G. Kortsarz and D. Peleg, Generating sparse 2-spanners, J. Algorithms 17 (1994), 222–236.

[KP93] G. Kortsarz and D. Peleg, On choosing a dense subgraph, in “Proc. 34th IEEE Symposium on Foundations of Computer Science,” pp. 692–701, 1993.

[Rag88] P. Raghavan, Probabilistic construction of deterministic algorithms: Approximating packing integer programs, J. Comput. System Sci. 37 (1988), 130–143.

[RRT94] S. S. Ravi, D. J. Rosenkrantz, and G. K. Tayi, Heuristic and special case algorithms for dispersion problems, Oper. Res. 42 (1994), 299–310.

[SW98] A. Srivastav and K. Wolf, Finding dense subgraphs with semidefinite programming, in “Approximation Algorithms for Combinatorial Optimization,” Lecture Notes in Computer Science, Vol. 1444, pp. 181–191, Springer-Verlag, Berlin, 1998.
