Transcript
Page 1: A local algorithm for finding dense subgraphs

60

A Local Algorithm for Finding Dense Subgraphs

REID ANDERSEN

Microsoft Research

Abstract. We describe a local algorithm for finding subgraphs with high density, according to ameasure of density introduced by Kannan and Vinay [1999]. The algorithm takes as input a bipartitegraph G, a starting vertex v , and a parameter k, and outputs an induced subgraph of G. It is localin the sense that it does not examine the entire input graph; instead, it adaptively explores a regionof the graph near the starting vertex. The running time of the algorithm is bounded by O(�k2),which depends on the maximum degree �, but is otherwise independent of the graph. We prove thefollowing approximation guarantee: for any subgraph S with k ′ vertices and density θ , there existsa set S′ ⊆ S for which the algorithm outputs a subgraph with density �(θ/ log �) whenever v ∈ S′

and k ≥ k ′. We prove that S′ contains at least half of the edges in S.

Categories and Subject Descriptors: G.2.2 [Discrete Mathematics]: Graph Theory—Graphalgorithms

General Terms: Algorithms, Theory

Additional Key Words and Phrases: Dense subgraphs, local algorithms, spectral graph theory

ACM Reference Format:Andersen, R. 2010. A local algorithm for finding dense subgraphs. ACM Trans. Algor. 6, 4, Article 60(August 2010), 12 pages.DOI = 10.1145/1824777.1824780 http://doi.acm.org/10.1145/1824777.1824780

1. Introduction

Algorithms for finding dense subgraphs are a useful tool for analyzing networks. Inparticular, they have been used to identify the cores of communities [Kumar et al.1999] and to combat link spam [Gibson et al. 2005].

The measure of density we consider was introduced by Kannan and Vinay [1999].As an example, consider the bipartite graph that describes the incidence relationshipbetween a set of groups G and a set of users U , where the edge between a user anda group exists when the user belongs to the group. The subgraph that consists ofthe set of groups S ⊆ G and the set of members T ⊆ U has density

d(S, T ) = e(S, T )√|S|√|T | , (1)

Author’s address: R. Andersen, Microsoft, One Microsoft Way, Redmond, WA 98052, e-mail:[email protected] to make digital or hard copies of part or all of this work for personal or classroom useis granted without fee provided that copies are not made or distributed for profit or commercialadvantage and that copies show this notice on the first page or initial screen of a display along withthe full citation. Copyrights for components of this work owned by others than ACM must be honored.Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistributeto lists, or to use any component of this work in other works requires prior specific permission and/ora fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701,New York, NY 10121-0701 USA, fax +1 (212) 869-0481, or [email protected]© 2010 ACM 1549-6325/2010/08-ART60 $10.00DOI 10.1145/1824777.1824780 http://doi.acm.org/10.1145/1824777.1824780

ACM Transactions on Algorithms, Vol. 6, No. 4, Article 60, Publication date: August 2010.

Page 2: A local algorithm for finding dense subgraphs

60:2 R. ANDERSEN

where e(S, T ) is the total number of incidences between the groups and the mem-bers of the subgraph. Here d(S, T ) can be viewed as the geometric mean of theaverage degree of the users and the average degree of the groups. We remark thatthe density measure used in Kannan and Vinay [1999] was defined for directedgraphs rather than bipartite ones. These two settings are equivalent, by a simplereduction described in Section 2.1.

Kannan and Vinay gave a spectral algorithm that produces a subgraph whered(S, T ) is within an O(log n) factor of optimal [Kannan and Vinay 1999]. Charikarshowed that a subgraph achieving the maximum value of d(S, T ) can be identifiedin polynomial time by solving a linear program [Charikar 2000]. He also gavea greedy algorithm that runs in linear time and produces a set with density atleast half of the optimal d(S, T ) when an additional parameter is set appropriately.Algorithms for optimizing d(S, T ) subject to additional size constraints were givenin Khuller and Saha [2009]. Other measures of density have been studied moreextensively that d(S, T ), most notably the average degree of a subgraph (see, forexample, Goldberg [1984], Gallo et al. [1989], and Kortsarz and Peleg [1994]),and variations with additional constraints on the size of the subgraph (see forexample, Feige and Seltser [1997] and Feige et al. [2001]). This related work willbe described in more detail in Section 2.2.

In this article we introduce an algorithm for finding dense subgraphs that couldbe described as a “local exploration algorithm.” The algorithm is given as input alarge graph and a specified starting vertex, and attempts to find a dense subgraphof G near the starting vertex by examining only a small subset of the graph. Weview the graph as being accessed by the algorithm via an oracle, from whichit it can request the degree of a vertex or the adjacency list of a vertex. Thisparticular style of local exploration algorithm and approximation guarantee wasfirst introduced by Spielman and Teng for the graph partitioning problem of findingsparse cuts [Spielman and Teng 2004, 2008].

Our algorithm takes as input a bipartite graph G, a starting vertex v ∈ G, and atarget size k. We prove that the running time is bounded by O(�k2), which dependson the target size k and the maximum degree �, but is otherwise independent of thegraph. We prove that for any subgraph (S, T ) there exists a large induced subgraph(S′, T ′) ⊆ (S, T ) for which the algorithm will produce a subgraph with highdensity. In particular, it will output a subgraph with density �(d(S, T )/ log �)whenever v ∈ (S′, T ′) and max(|S|, |T |) ≤ k, and we prove that the inducedsubgraph (S′, T ′) contains at least half of the edges in (S, T ). This algorithm hasseveral applications: it can find a dense subgraph near a targeted vertex of interest byexamining only a portion of the graph, it can be applied from many different startingvertices in parallel, and it provides an upper bound on the size of the subgraph itoutputs.

The main component of the algorithm is the pruned growth process, a determin-istic iterative process in which the current vector is multiplied by the adjacencymatrix of the graph, then normalized, then rounded to ensure that the number ofnonzero elements in the vector is small. We show that by computing these sparsevectors we can identify a dense subgraph. This is similar in spirit to the truncatedrandom walks used in the local partitioning algorithm of Spielman and Teng [2004,2008], and both processes can be viewed as sparse approximate versions of existingspectral algorithms.

ACM Transactions on Algorithms, Vol. 6, No. 4, Article 60, Publication date: August 2010.

Page 3: A local algorithm for finding dense subgraphs

A Local Algorithm for Finding Dense Subgraphs 60:3

In Section 2, we survey related work on finding dense subgraphs, and comparedifferent measures of density that have appeared in the literature. In Section 3, wedefine the pruned growth process. In Section 4, we state the algorithm and provethe local approximation guarantee.

2. Preliminaries and Related Work

Let G = (V, E) be a simple undirected bipartite graph, and fix a bipartition L , R.Let A be the adjacency matrix of the graph, let d(v) be the degree of a vertex v ,and let � be the maximum degree in the graph. We remark that the algorithmcan be generalized to weighted graphs without any surprises, and we omit thedetails.

For any pair of sets S ⊆ L and T ⊆ R, let (S, T ) denote the induced bipartitesubgraph of G on S ∪ T . We define e(S, T ) to be the number of edges between Sand T . We will use the inner product notation e(S, T ) = 〈1S A, 1T 〉, where 1S isthe indicator function on the set S. Unless stated otherwise, a vector is assumed tobe a row vector indexed by the set of vertices V . For a vector x , we let supp(x)denote the support of the vector (the set of vertices on which x is nonzero). We let‖x‖ denote the usual vector 2-norm, and let x := x/‖x‖ denote the unit vector inthe direction of x .

2.1. THE OBJECTIVE FUNCTION d(S, T ). We use the following objective func-tion to measure density, which was introduced in Kannan and Vinay [1999].

Definition 1. For any induced subgraph (S, T ),

d(S, T ) = e(S, T )√|S|√|T | .

We define the density of the graph d(G) to be the maximum value of d(S, T ) overall induced subgraphs in G.

This objective function was originally defined in the setting of arbitrary directedgraphs rather than undirected bipartite graphs. We now sketch a simple reductionthat shows the two settings are equivalent for the purpose of identifying densesubgraphs.

Given an arbitrary directed graph on the vertex set V , construct an undirectedbipartite graph where L and R are two distinct copies of V . For each edge x → yin the directed graph, place an undirected edge between the copy of x in L andthe copy of y in R. There is an obvious bijective correspondence between pairs ofsets (S, T ) in the directed graph, and pairs of sets (S ⊆ L , T ⊆ R) in the undi-rected bipartite graph. Kannan and Vinay [1999] originally defined the objectivefunction d ′(S, T ) = e(S, T )/

√|S||T |, where e(S, T ) is the number of edges fromS to T in the directed graph. It is easy to see that d ′(S, T ) = d(S, T ) under thecorrespondence described earlier.

2.2. RELATED WORK. A different measure of density, equivalent to the averagedegree of the subgraph, has been studied extensively [Goldberg 1984; Gallo et al.1989; Kortsarz and Peleg 1994; Charikar 2000].

ACM Transactions on Algorithms, Vol. 6, No. 4, Article 60, Publication date: August 2010.

Page 4: A local algorithm for finding dense subgraphs

60:4 R. ANDERSEN

Definition 2. Let G = (V, E) be an undirected graph (not necessarily bipar-tite). For any set S ⊆ V , we define

f (S) = e(S, S)

|S| .

We define f (G) to be the maximum value of f (S) over all subsets of V .A set achieving the optimal density f (G) can be found in polynomial time.

Goldberg showed how to compute such a set using a sequence of maximum flowcomputations [Goldberg 1984]. The basic idea is to guess a parameter α andmaximize the supermodular function e(S, S) − α|S|. This can be done by solvinga flow computation or applying a submodular function minimizer. By repeatedlyapplying this procedure in a binary search, one can find the optimal f (S). Aset achieving f (G) can also be found by a single applicaiton of the parametricflow algorithm of Gallo et al. [1989]. Kortsarz and Peleg [1994] gave a greedy2-approximation algorithm for f (G) that runs in time O(m) in an unweightedgraph. In contrast, the problem of finding an induced subgraph on exactly k verticeswith the maximum number of edges is NP-hard, and there is a large gap betweenthe best approximation algorithms and hardness results known for the problem(see, for example, Feige and Seltser [1997] and Feige et al. [2001]).

Kannan and Vinay gave a spectral approximation algorithm that produces asubgraph (S, T ) with density d(S, T ) = �(d(G)/ log n) from the top singularvectors of A. Charikar showed that a subgraph (S, T ) achieving d(S, T ) = d(G)can be found in polynomial time by solving a linear program [Charikar 2000],and gave a greedy algorithm that finds a subgraph achieving d(S, T ) = d(G)/2using an approach similar to Kortsarz and Peleg [1994]. That algorithm requires anadditional parameter as input, runs in time O(m) for any value of the parameter, andproduces the 2-approximation when the parameter is set correctly. A suitable valueof the parameter can be found using binary search, which requires applying thealgorithm O(log n) times, for a total running time of O(m log n) in an unweightedgraph.

We now compare the objective functions d(S, T ) and f (S). It is easier to comparethese two objective functions if we restrict f (S) to bipartite graphs.

Definition 3. Given an undirected bipartite graph G, for any subgraph (S, T )we define

f (S, T ) = e(S, T )

|S| + |T | .

We define f (G) to be the maximum value of f (S, T ) over all inducedsubgraphs.

The only difference between f (S, T ) and d(S, T ) is the denominator. Notethat d(S, T ) ≥ 2 f (S, T ), since 2

√|S|√|T | ≤ |S| + |T |, but d(S, T ) can bemuch larger than f (S, T ) if S and T have very different sizes. In the completebipartite graph Ka,b, we have d(G) = √

ab, while f (G) = ab/(a + b). The moststriking difference between the two objective functions is that a star has highdensity according to d, but very low density according to f ; In a star wherea = 1, we have d(G) = √

b while f (G) = b/(b + 1) ∼ 1. The merits of d as anobjective functions for density were discussed in Charikar [2000] and Kannan and

ACM Transactions on Algorithms, Vol. 6, No. 4, Article 60, Publication date: August 2010.

Page 5: A local algorithm for finding dense subgraphs

A Local Algorithm for Finding Dense Subgraphs 60:5

Vinay [1999]. In the author’s opinion, d is a weaker objective function than f ,because of the differences described before. We are considering d(S, T ) because itis more amenable to spectral techniques which are central to the algorithm in thisarticle.

A local algorithm for finding small subgraphs with large spectral radius wasintroduced in Andersen and Cioaba [2007], which was based on techniques fromthe conference version of the current article [Andersen 2008]. One can find a densesubgraph by first finding a subgraph with large spectral radius using the algorithmfrom Andersen and Cioaba [2007], and then applying Kannan and Vinay’s spectralalgorithm to the subgraph. The resulting algorithm has essentially the same runningtime as the one described here, but the local approximation guarantee applies to asmaller set of starting vertices.

We remark that the notion of local algorithm considered here is different fromthe one studied in distributed computing (for example, in Naor and Stockmeyer[1995]). In that context, each node in the graph is viewed as a processor and acomputation is considered local if it can be carried out by the distributed networkin a number of rounds independent of the network size.

3. The Pruned Growth Process

We now define the deterministic process on which the local algorithm is based.The process generates a sequence of vectors x0, . . . , xT from a starting vector x0.At each step the vector is multiplied by the adjacency matrix A, then normalized,and then pruned by setting to zero each entry whose value is below a specifiedthreshold. This pruning step reduces the size of the support, and therefore reducesthe amount of computation required to compute the sequence.

Definition 4. Given a vector z and a nonnegative real number ε, we defineprune(ε, z) to be the vector obtained by setting to zero any entry of z whose valueis at most ε,

[prune(ε, z)](u) ={

z(u) if z(u) ≥ ε,

0 otherwise.

Definition 5. Given a starting vector x0 and a sequence ε1, . . . , εt0 of nonneg-ative real numbers, we define a sequence of vectors x0, . . . , xt0 iteratively by thefollowing rule:

yt = xt−1 A/‖xt−1 A‖xt = prune(εt , yt )

We say x0, . . . , xt0 are the vectors generated by the pruned growth process.

The size of the support of xt can be bounded as follows. For any time t and vertexu, we have either xt (u) = 0 or xt (u) ≥ εt , so 1 ≥ ‖xt‖2 ≥ ε2

t |supp(xt )|, whichimplies the bound

|supp(xt )| ≤ ε−2t . (2)

The algorithm will identify a dense subgraph by partitioning the vertices of thegraph into a collection of disjoint sets determined by the vectors xt−1 and yt , asdescribed next.

ACM Transactions on Algorithms, Vol. 6, No. 4, Article 60, Publication date: August 2010.

Page 6: A local algorithm for finding dense subgraphs

60:6 R. ANDERSEN

Definition 6. For any nonnegative vector z and integer i , we define

Si (z) := {v : z(v) ∈ (2−(i+1), 2−i ]

}.

For each t < t0, we let yt = xt A, and define Xti = Si (xt ) and Y t

j = Sj (yt ).

Notice that for any nonnegative vector z,

z ≤∑i∈Z

2−i 1Si (z) ≤ 2z.

4. The Local Approximation Algorithm

In this section we state the local algorithm and prove the main theorems.

Algorithm. LocalDense(G, v, k)Input: A bipartite graph G, a vertex v , and a target size k.Output: A subgraph (X, Y ) of G.

(1) Let t0 = log(2√

k), and εt = √2k−12t .

(2) Compute the pruned growth process vectors x0, . . . , xt0 ,starting from x0 = 1v .

(3) For each time t < t0, compute the density d(Xti , Y t+1

j ) for each pair of indices (i, j) for whichXt

i �= ∅, Y t+1j �= ∅, and |i − j | ≤ 4 + log �. Output the densest subgraph among all such pairs.

THEOREM 1. Let (S, T ) be a subgraph such that d(S, T ) ≥ 16θ for someθ ≥ 1. There exists an induced subgraph (S′, T ′) ⊆ (S, T ) with the followingproperties.

(1) e(S′, T ′) ≥ e(S, T )/2,

(2) If v ∈ (S′, T ′) and k ≥ max(|S|, |T |), then LocalDense(G, v, k) outputs asubgraph (X, Y ) that satisfies

d(X, Y ) ≥ θ

72 + 16 log(�θ−1)= �

1 + log(�θ−1)

).

THEOREM 2. The running time of LocalDense(G, v, k) is O(�k2). For thisbound, we assume that basic arithmetic operations can be performed in unit time.The operands are real numbers bounded by O(poly(n)) and �(1/poly(n)).

The proofs of Theorems 1 and 2 are given in Section 4.3, after we prove two keylemmas. Here is an outline of how we will proceed. In Section 4.1 we show that ifthe ratio ‖xt A‖/‖xt‖ is large at some t < t0, then the algorithm will produce a densesubgraph. In Section 4.2 we show that for every dense subgraph, there is a vectorsupported on that subgraph that grows quickly when multiplied by the adjacencymatrix, and whose entries are fairly large for many vertices in the subgraph. In theproof of Theorem 1 we use this vector to give a bound on ‖xt‖, which will ensurethat the algorithm outputs a dense subgraph.

ACM Transactions on Algorithms, Vol. 6, No. 4, Article 60, Publication date: August 2010.

Page 7: A local algorithm for finding dense subgraphs

A Local Algorithm for Finding Dense Subgraphs 60:7

4.1. FINDING A DENSE SUBGRAPH FROM A FAST-GROWING VECTOR. The fol-lowing lemma shows that if ‖xt A‖/‖xt‖ is large, then one of the subgraphs(Xt

i , Y t+1j ) has high density. The proof combines elements from Kannan and Vinay’s

algorithm for finding dense subgraphs from singular vectors [Kannan and Vinay1999] and Bilu and Linial’s algorithm for finding sets with high discrepancy [Biluand Linial 2004].

LEMMA 1. Let x be a nonnegative vector. Let y = x A, let Xi = Si (x), and letY j = Sj (y). Let P be the set of pairs (i, j) such that Xi and Y j are nonempty and|i − j | ≤ 4 + log �, and let

d(x) = max(i, j)∈P

d(Xi , Y j ).

If ‖y‖ ≥ β‖x‖ for some β ≥ 1, then

d(x) ≥ β/(72 + 16 log(�β−1)).

PROOF OF LEMMA 1. Notice that 〈x A, y〉 = ‖y‖/‖x‖ ≥ β. To prove the lemma,we will bound 〈x A, y〉 in terms of d(x).

〈x A, y〉 ≤⟨∑

i≥0

2−i 1XiA,

∑j≥0

2− j 1Y j

=∑

(i, j)∈I

2−i 2− j⟨1Xi

A, 1Y j

Here, I is the set of pairs (i ≥ 0, j ≥ 0) for which Xi and Y j are nonempty.We partition I into three disjoint sets of pairs, P�, Q�, and R�, depending on thedifference between i and j .

P� = { (i, j) ∈ I : | j − i | ≤ � }Q� = { (i, j) ∈ I : i > j + � }R� = { (i, j) ∈ I : j > i + � }

We set � = 4 + log �β−1 ≤ 4 + log �, so P� ⊆ P .We sum over each set of pairs separately. The sum over indices in P� is

S(P�) =∑

(i, j)∈P�

2−i 2− j⟨1Xi

A, 1Y j

=∑

(i, j)∈P�

2−i 2− j d(Xi , Y j )√|Xi |

√|Y j |

≤ 1

2

∑(i, j)∈P�

d(Xi , Y j )(2−2i |Xi | + 2−2 j |Y j |

)

≤ 1

2d(x)

∑(i, j)∈P�

(2−2i |Xi | + 2−2 j |Y j |

).

The second-to-last line follows from the inequality 2ab ≤ a2 + b2, applied with

a = 2−i√|Xi | and b = 2− j

√|Y j |.

ACM Transactions on Algorithms, Vol. 6, No. 4, Article 60, Publication date: August 2010.

Page 8: A local algorithm for finding dense subgraphs

60:8 R. ANDERSEN

Now we observe that a given index i appears as the first element in at most 1+2�pairs in P�. Similarly, an index j appears as the second element in at most 1 + 2�pairs in P�. Therefore,

S(P�) ≤ 1

2(1 + 2�)d(x)

⎛⎝∑

i≥0

2−2i |Xi | +∑j≥0

2−2 j |Y j |⎞⎠

≤ 1

2(1 + 2�)d(x)(‖2x‖2 + ‖2y‖2)

≤ 4(1 + 2�)d(x).

We bound the sums over Q� and R� in terms of the maximum degree �. Thesum over indices in R� is

S(R�) =∑

(i, j)∈R�

2−i 2− j⟨1Xi

A, 1Y j

≤∑

(i, j)∈R�

2−i 2− j�|Xi |

≤∑i≥0

2−i�|Xi |∑

j>i+�

2− j

=∑i≥0

2−i�|Xi |2−(i+�)

= �2−�∑i≥0

2−2i |Xi |

≤ �2−�‖2x‖2

≤ 4�2−�.

Similarly, the sum over Q� is

S(Q�) ≤ �2−�‖2y‖2 ≤ 4�2−�.

Since we have set � = 4 + log �β−1, we have

β ≤ 〈x A, y〉 ≤ S(P�) + S(Q�) + S(R�)

≤ 4(1 + 2�)d(x) + 8�2−�

= 4(1 + 2�)d(x) + β/2.

Then, β ≤ 8(1 + 2�)d(x) = (72 + 16 log(�β−1))d(x).

4.2. LOWER BOUNDS ON GROWTH INSIDE A DENSE SUBGRAPH. The followinglemma identifies, for any subgraph (S, T ) with high density, a set of startingvertices for which we can give a good lower bound on ‖xt‖. We first identifya large subgraph (S′, T ′) inside of (S, T ) with high minimum degree, using awell-known vertex elimination argument also used in Kortsarz and Peleg [1994]and Charikar [2000]. We then construct a nonnegative vector ψ whose support is(S′, T ′) and for which the projection of xt onto ψ starts out reasonably large andgrows quickly at each step. These properties of ψ will be used to give a bound on‖xt‖ that is robust to the pruning step of the pruned growth process.

ACM Transactions on Algorithms, Vol. 6, No. 4, Article 60, Publication date: August 2010.

Page 9: A local algorithm for finding dense subgraphs

A Local Algorithm for Finding Dense Subgraphs 60:9

LEMMA 2. For every subgraph (S, T ), there exists a subgraph (S′, T ′) ⊆(S, T ) and a vector ψ with the following properties.

(1) ψ is nonnegative and ‖ψ‖ = 1.(2) ψ A ≥ (d(S, T )/4)ψ .(3) supp(ψ) = S′ ∪ T ′.(4) e(S′, T ′) ≥ e(S, T )/2.(5) If v ∈ (S′, T ′), then ψ(v) ≥ 1/

√2 max(|S|, |T |).

PROOF OF LEMMA 2. Given (S, T ), let dS = e(S, T )/|S| and dT = e(S, T )/|T |,which are the average degrees of the left and right sides of the subgraph. Note thatd(S, T ) = √

dSdT . We let a = dS/4 and b = dT /4.We will now use a well-known vertex elimination argument to show that there

exists a large induced subgraph of (S, T ) where the minimum degree on the leftside is a and the minimum degree on the right side is b. Let S1 and T1 be copies of Sand T . At each step, if there exists a vertex u ∈ S1 such that e(u, T1) ≤ a, or if thereexists a vertex u ∈ T1 such that e(u, S1) ≤ b, then select such a vertex arbitrarily andremove it from S1 or T1. Continue this process until no such vertex remains, thenlet S′ = S1 and T ′ = T1. Clearly minu∈S′ e(u, T ′) ≥ a and minu∈T ′e(u, S′) ≥ b.Note that the number of edges in e(S1, T1) decreased by at most a each time avertex from S1 was removed, and decreased by at most b each time a vertex fromT1 was removed, so the total number of edges remaining in e(S′, T ′) satisfies

e(S′, T ′) ≥ e(S, T ) − a|S| − b|T | ≥ e(S, T ) − dS|S|/4 − dT |T |/4 ≥ e(S, T )/2.

Now consider the following vector φ supported on (S′, T ′),

φ(u) =

⎧⎪⎨⎪⎩

√a if u ∈ S′

√b if u ∈ T ′

0 otherwise.

The following shows that φ A ≥ (d(S, T )/4)φ. For any u ∈ S and any v ∈ T ,

φ A(u)/φ(u) = (e(u, T ′)√

b)/√

a ≥ a√

b/√

a = √a√

b = d(S, T )/4,

φ A(v)/φ(v) = (e(v, S′)√

a)/√

b ≥ b√

a/√

b = √a√

b = d(S, T )/4.

Let ψ = φ/‖φ‖. Clearly ψ is a nonnegative unit vector whose support is S′ ∪ T ′.We have already shown that ψ A ≥ (d(S, T )/4)ψ and e(S′, T ′) ≥ e(S, T )/2. Wehave ‖φ‖2 = a|S| + b|T | = e(S, T )/2, so for any u ∈ S and any v ∈ T ,

ψ(u) = φ(u)/‖φ‖ =√

2a/e(S, T ) = 1/√

2|S|.ψ(v) = φ(v)/‖φ‖ =

√2b/e(S, T ) = 1/

√2|T |.

We remark that the conference version of this article used a different argument inplace of Lemma 2. There, it was shown that for each vertex u in a large subsetof (S, T ), there exists a vector ψu that has properties similar to ψ . Each ψu is aneigenvector of some induced subgraph of (S, T ).

4.3. ANALYSIS OF THE ALGORITHM.

PROOF OF THEOREM 1. Let (S, T ) be a subgraph for which d(S, T ) ≥ 16θ forsome θ ≥ 1. Assume that v is a vertex in the set S′ ∪ T ′ described in Lemma 2,

ACM Transactions on Algorithms, Vol. 6, No. 4, Article 60, Publication date: August 2010.

Page 10: A local algorithm for finding dense subgraphs

60:10 R. ANDERSEN

and assume that k ≥ max(|S|, |T |). We’ll show that LocalDense(G, v, k) outputsa subgraph with density at least θγ −1, where γ = (72 + 16 log �θ−1).

Let x0, . . . , xt0 be the pruned growth process vectors starting from x0 = 1v . If thevector xt satisfies ‖xt A‖ ≥ θ at some time t < t0, then d(xt ) ≥ θγ −1 by Lemma 1,in which case we are done. We assume that this does not happen, which implies‖xt A‖ ≤ θ for all t < t0, and attempt to derive a contradiction.

By Lemma 2 there exists a nonnegative unit vector ψ such that ψ A ≥ 4θψ , suchthat supp(ψ) ⊆ S ∪ T , and such that ψ(v) ≥ 1/

√2k. We will prove the following

lower bound on the projection of xt onto ψ .

〈xt , ψ〉 ≥ 1√2k

2t for all t ≤ t0. (3)

Since xt is a unit vector and t0 = log(2√

k), this will be a contradiction.We know that (3) holds for t = 0. The main difficulty in the inductive step

is to analyze how the pruning step affects the projection onto ψ . Recall thatyt = xt−1 A/‖xt−1 A‖. Let rt := yt − xt be the vector that is removed during thepruning step, and note that rt (u) ≤ εt for any vertex u. Let r ′

t be the vector that isequal to rt on the support of ψ , and zero elsewhere. Since supp(ψ) ⊆ S ∪ T , andsupp(rt ) is contained in either L or R, we have |supp(r ′

t )| ≤ max(|S|, |T |) ≤ k.Therefore,

〈rt , ψ〉 = ⟨r ′

t , ψ⟩ ≤ ‖r ′

t‖ ≤ εt

√k.

We have 〈xt−1 A, ψ〉 = 〈xt−1, ψ A〉 ≥ 4θ 〈xt−1, ψ〉. Here we have used thatψ A ≥ 4θψ , that A is symmetric, and that xt−1 is nonnegative. Then,

〈yt , ψ〉 = 〈xt−1 A, ψ〉‖xt−1 A‖ ≥ 4θ 〈xt−1, ψ〉

θ≥ 4 〈xt−1, ψ〉 .

We now use the induction hypothesis that 〈xt−1, ψ〉 ≥ (√

2k)−12t−1, and the factthat εt = √

2k−12t−1,

〈xt , ψ〉 = 〈yt , ψ〉 − 〈rt , ψ〉≥ 4 〈xt−1, ψ〉 − εt

√k

≥ 2 〈xt−1, ψ〉 .

This establishes the inductive step for (3), and completes the proof.

PROOF OF THEOREM 2. We bound the running time of LocalDense(G, v, k) bybounding the number of vertices in the support of xt at each step. By (2), we have|supp(xt )| ≤ ε−2

t . Let Dt be the sum of the degrees of the vertices in supp(xt ),which satisfies

Dt = O(�|supp(xt )|) = O(�ε−2t ) = O(�k22−2t ).

Consider the amount of computation required at step t . We can compute xt Aand xt+1 from xt using O(Dt ) multiplications and additions, which we assume canbe carried out in unit time. We will show in the following that this dominates the

ACM Transactions on Algorithms, Vol. 6, No. 4, Article 60, Publication date: August 2010.

Page 11: A local algorithm for finding dense subgraphs

A Local Algorithm for Finding Dense Subgraphs 60:11

running time of the algorithm, so the total running time of the algorithm ist0∑

t=0

O(�k22−2t ) = O(�k2).

We still need to show that the rest of the compuation at time t can be carried outin time O(Dt ). We must compute the density d(Xt

i , Y t+1j ) for each pair (i, j) ∈ Pt ,

where Pt is the set of pairs for which Xti and Y t+1

j are both nonempty and forwhich i and j satisfy |i − j | ≤ 4 + log �. To compute this, we maintain a set ofvariables Wi, j that record the number of edges between Xt

i and Y t+1j . The largest

value of i we need to consider is O(log(ε−1t )) = O(log n), and the largest value of

j that we need to consider is O(log(ε−1t )) + O(log �) = O(log n), so the number

of variables Wi, j we need to maintain is O(log2 n). We loop through the vertices insupp(xt ) and examine their incident edges, each of which belongs to at most oneof the subgraphs (Xt

i , Y t+1j ). For each edge, we add one to the appropriate variable

Wi, j . This can be done in time O(Dt ). Once we have the values e(Xti , Y t+1

j ) foreach pair in Pt , it is easy to compute the density of each subgraph (Xt

i , Y t+1j ) and

select the densest.

5. Discussion and Open Problems

The running time and the size of the set output byLocalDense(G, v, k) are boundedby O(�k2), and improving either of these bounds in an open problem. We remarkthat the local algorithms for graph partitioning are more efficient, producing sub-graphs with O(k) vertices [Spielman and Teng 2004, 2008; Andersen et al. 2006].Obtaining a local approximation algorithm for the objective function e(S, S)/|S|is also open. If one could design a similar algorithm for the objective functione(S, S)/|S|, and improve the vertex bound to O(k) while maintaining a polylog-arithmic approximation ratio, such an algorithm could be used to design betterapproximation algorithms for the densest k-subgraph problem (see Andersen andChellapilla [2009] and Khuller and Saha [2009] for examples of this type ofreduction).

REFERENCES

ANDERSEN, R. 2008. A local algorithm for finding dense subgraphs. In Proceedings of the 19th AnnualACM-SIAM Symposium on Discrete Algorithms (SODA’08). 1003–1009.

ANDERSEN, R. AND CHELLAPILLA, K. 2009. Finding dense subgraphs with size bounds. In Proceed-ings of the 6th International Workshop on Algorithms and Models for the Web-Graph (WAW’09).25–37.

ANDERSEN, R., CHUNG, F., AND LANG, K. 2006. Local graph partitioning using pagerank vectors. InProceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS’06).475–486.

ANDERSEN, R. AND CIOABA, S. M. 2007. Spectral densest subgraph and independence number of agraph. J. Univer. Comput. Sci. 13, 11, 1501–1513.

BILU, Y. AND LINIAL, N. 2004. Constructing expander graphs by 2-lifts and discrepancy vs. spectral gap.In Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science (FOCS’04).404–412.

CHARIKAR, M. 2000. Greedy approximation algorithms for finding dense components in a graph. InProceedings of the 3rd International Workshop on Approximation Algorithms for Combinatorial Opti-mization (APPROX’00).

ACM Transactions on Algorithms, Vol. 6, No. 4, Article 60, Publication date: August 2010.

Page 12: A local algorithm for finding dense subgraphs

60:12 R. ANDERSEN

FEIGE, U., PELEG, D., AND KORTSARZ, G. 2001. The dense k-subgraph problem. Algorithmica 29, 3,410–421.

FEIGE, U. AND SELTSER, M. 1997. On the densest k-subgraph problem. Tech. rep. CS97-16. 1,.GALLO, G., GRIGORIADIS, M. D., AND TARJAN, R. E. 1989. A fast parametric maximum flow algorithm

and applications. SIAM J. Comput. 18, 1, 30–55.GIBSON, D., KUMAR, R., AND TOMKINS, A. 2005. Discovering large dense subgraphs in massive graphs.

In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB’05). 721–732.GOLDBERG, A. 1984. Finding a maximum density subgraph. Tech. rep. UCB CSD 84/71, University of

California, Berkeley.KANNAN, R. AND VINAY, V. 1999. Analyzing the structure of large graphs. Manuscript.

http://research.microsoft.com/en-us/um/people/kannan/Papers/webgraph.pdf.KHULLER, S. AND SAHA, B. 2009. On finding dense subgraphs. In Proceedings of the 36th International

Colloquium on Automata, Languages and Programming (ICALP’09). 597–608.KORTSARZ, G. AND PELEG, D. 1994. Generating sparse 2-spanners. J. Algor. 17, 2, 222–236.KUMAR, R., RAGHAVAN, P., RAJAGOPALAN, S., AND TOMKINS, A. 1999. Trawling the web for emerging

cyber-communities. In Proceedings of the 8th World Wide Web Conference (WWW’99). 403–416.NAOR, M. AND STOCKMEYER, L. 1995. What can be computed locally? SIAM J. Comput. 24, 6, 1259–

1277.SPIELMAN, D. A. AND TENG, S.-H. 2004. Nearly-linear time algorithms for graph partitioning, graph

sparsification, and solving linear systems. In Proceedings of the 36th Annual ACM Symposium on Theoryof Computing (STOC’04). ACM Press, New York, 81–90.

SPIELMAN, D. A. AND TENG, S.-H. 2008. A local clustering algorithm for massive graphs and itsapplication to nearly-linear time graph partitioning. CoRR abs/0809.3232.

RECEIVED OCTOBER 2009; ACCEPTED OCTOBER 2009

ACM Transactions on Algorithms, Vol. 6, No. 4, Article 60, Publication date: August 2010.


Recommended