Fast Distributed Algorithms

for (Weakly) Connected Dominating Sets

and Linear-Size Skeletons

Devdatt Dubhashi a Alessandro Mei b,1,∗

Alessandro Panconesi b,1 Jaikumar Radhakrishnan c,2

Aravind Srinivasan d,3

a Computing Science, Chalmers University of Technology, Göteborg, Sweden.

b Informatica, Università di Roma La Sapienza, Italy.

c School of Technology and Computer Science, TIFR, Mumbai, India.

d Department of Computer Science and Institute for Advanced Computer Studies,

University of Maryland, College Park, MD, USA.

Abstract

Motivated by routing issues in ad hoc networks, we present polylogarithmic-time

distributed algorithms for two problems. Given a network, we first show how to

compute connected and weakly connected dominating sets whose size is at most

O(log ∆) times the optimum, ∆ being the maximum degree of the input network.

This is best-possible if NP ⊈ Dtime[n^{O(log log n)}] and if the processors are required

to run in polynomial-time. We then show how to construct dominating sets that have

the above properties, as well as the “low stretch” property that any two adjacent

nodes in the network have their dominators at a distance of at most O(log n) in

the output network. (Given a dominating set S, a dominator of a vertex u is any

v ∈ S such that the distance between u and v is at most one.) We also show our

time bounds to be essentially optimal.

Key words: Ad hoc networks, dominating sets, distributed algorithms.

Preprint submitted to Elsevier Science 11 April 2005

1 Introduction

We present fast distributed algorithms for computing connected and weakly

connected dominating sets with good “low stretch” properties, in a given dis-

tributed network. Our motivation comes from the study of ad hoc wireless

networks. A crucial way in which these differ from current cellular networks

is that they do not have a separate routing infrastructure such as a system of

base-stations; the mobiles have to conduct their own communication through

routing. In these networks it is necessary to set up a so-called backbone, i.e.,

a set of vertices and links among them that is in charge of routing. In the

specialised literature there is a general consensus that the backbone should be

a dominating set, i.e., each vertex is either in the backbone or next to some

vertex in it. To quote Rajaraman [15], “The most basic clustering that has

been studied in the context of ad hoc networks is based on dominating sets”.

∗ Corresponding author. Email addresses: [email protected] (Devdatt Dubhashi),

[email protected] (Alessandro Mei), [email protected] (Alessandro

Panconesi), [email protected] (Jaikumar Radhakrishnan),

[email protected] (Aravind Srinivasan).

1 Supported in part by the E. U. under grant IST-2001-34734 EYES.

2 Part of this work was done while visiting Informatica, Università di Roma La

Sapienza, Italy.

3 Supported in part by NSF Award CCR-0208005.

Moreover, the following additional features are considered to be appealing:

(a) the backbone should be “small” and (b) it should be connected or weakly

connected. Computing small connected dominating sets has been the focus of

many papers. A good starting point is the comparative analysis of [5] with its

comprehensive bibliography.

A dominating set D is connected if the subgraph induced by it is connected,

while D is weakly connected if the graph induced by the stars of vertices of

D is connected (the star of vertex u is the neighborhood of u including u

itself). An equivalent definition is the following: connect every two vertices

of D that are at distance 1 or 2. If the resulting graph (with vertex set D)

is connected, then D is weakly connected. While connectivity appears to be

a natural requirement, several authors have argued that the right notion to

apply in the wireless context is weak connectivity (see for instance [6]). As

a reasonable first approximation for many situations of interest, wireless net-

works can be modeled as message-passing, synchronous networks: vertices are

processors, edges are communication links and the network is synchronous.

Communication proceeds in synchronous rounds : in each round, every vertex

sends messages to its neighbors, receives messages from its neighbors, and

does some local computation. In this model, the running time is the number

of communication rounds. In this fashion, communication is charged for while

local computation is not. This simplification is justified since the former is or-

ders of magnitude greater than the latter. This is the standard model assumed

in this paper. What does it mean to “compute” a subset S of the vertices,

such as a (weakly) connected dominating set, in this model? What we aim for

is an efficient distributed algorithm, at the end of which each vertex knows

whether it is a member of S or not. Henceforth, we denote by n the number

of vertices of the network and by ∆ its maximum degree.
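The definitions above can be made concrete with a small centralized check. The following is a minimal sketch, assuming the graph is given as a dictionary mapping each vertex to the set of its neighbours; the function names and this representation are assumptions made here for illustration and are not part of the distributed algorithms of this paper.

```python
def is_dominating(adj, D):
    """Every vertex is in D or has a neighbour in D."""
    return all(v in D or adj[v] & D for v in adj)

def _connected(nodes, linked):
    """Connectivity test on `nodes`, where linked(a, b) says whether a and b are joined."""
    nodes = list(nodes)
    if not nodes:
        return True
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        a = stack.pop()
        for b in nodes:
            if b not in seen and linked(a, b):
                seen.add(b)
                stack.append(b)
    return len(seen) == len(nodes)

def is_connected_ds(adj, D):
    """D is dominating and the subgraph induced by D is connected."""
    return is_dominating(adj, D) and _connected(D, lambda a, b: b in adj[a])

def is_weakly_connected_ds(adj, D):
    """D is dominating and joining members of D at distance 1 or 2 gives a connected graph."""
    def near(a, b):
        return b in adj[a] or bool(adj[a] & adj[b])
    return is_dominating(adj, D) and _connected(D, near)

# Path 0-1-2-3-4: {1, 3} is dominating and weakly connected, but not connected.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
```

On the five-vertex path, the set {1, 3} illustrates the gap between the two notions: its members are at distance 2, so it is weakly connected without being connected.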

There are several papers describing distributed algorithms for computing (weakly)

connected dominating sets in our model. These algorithms fall into two cate-

gories. The first category consists of algorithms that are fast, but provide no

guarantee that the dominating set they return will be small. The algorithm

of [18] falls into this category. The second category consists of algorithms

that are guaranteed to compute a “smallest possible” dominating set, but

require a polynomial number of rounds in the number of vertices of the

network (see [17] for an analysis of several known dominating set algorithms).

By “smallest possible” we mean an O(log ∆)-approximation; this is the best

one can hope for, assuming that each vertex performs a polynomial-time-bounded

computation during every communication round and that NP ⊈

Dtime[n^{O(log log n)}] [7,13,16]. (Logarithms in this paper are to the base two,

unless specified otherwise.) Algorithms in this category are typically distributed

implementations of sequential approximation algorithms such as the ones in

[8]. There are also quite nice distributed algorithms for computing “best possi-

ble” dominating sets in polylogarithmically many rounds ([9] and, with some

modifications, [14]). But these dominating sets are, in general, neither con-

nected nor weakly connected.

In this paper we present several randomised distributed algorithms that com-

pute the “smallest possible” connected and weakly connected dominating set

in polylogarithmically (in n) many rounds. As above, by “smallest possible”

we mean dominating sets of size O(log ∆) · Opt. In fact, our guarantee is

in terms of the smallest, not necessarily (weakly) connected, dominating set;

since the smallest dominating set has size at most that of a smallest (weakly)

connected dominating set, this suffices.

Our results are based on a technique that might be of independent interest. We

show that given a graph G that is connected, G can be sparsified by deleting all

but a linear number of edges. Crucially, the resulting graph stays connected.

This is interesting in view of two facts. First, as shown by Proposition 1,

there is no deterministic or randomised algorithm running in o(n) rounds

that is capable of removing even one edge and preserving connectivity. In

particular, this implies that there are no o(n)-round protocols for computing

spanning or Steiner trees. Such trees are the tool of choice for connecting up a

dominating set D while at the same time introducing only linearly many (in

the size of D) new vertices. Second, the result is also interesting because real

networks sometimes are bound to have a superlinear number of edges. Indeed,

the following model has been proposed as a cost-effective way to set up sensor

networks [19]. The sensors (chips) are distributed essentially at random over

the area of interest that can be assumed to be the unit square; each chip is a

radio transmitter that sets up a connection to the k nearest chips. It is known

that if one wants the network to be connected then k must be Ω(log n) [19].

Therefore the resulting network will have a superlinear number of edges and

the same might very well hold for the graphs induced by dominating sets in

the graph. Furthermore, Proposition 11 shows that the O(log n) running time

of our sparsification is best-possible.

In addition to domination, (weak) connectivity and small size, there is yet an-

other feature of the backbone that would be attractive. Since the vertices in the

dominating set take care of routing, it is desirable that dominators of “nearby”

vertices are “relatively nearby”. In particular, we show that our above results

can be strengthened as follows. In Section 3, we present a polylogarithmic-time

randomised distributed algorithm to construct a connected dominating set S

of size at most O(log ∆) times the optimum, with the following additional

“low stretch” property: for any pair of nodes u and v in G, the distance (in

the subgraph induced by S) between their respective dominators is at most

O(log n) times the distance between u and v in G. To our knowledge, this is

the first result of this kind in this distributed setting.

Thus we present fast distributed algorithms for constructing dominating sets

with good size, connectivity, and stretch properties; our running times are

also shown to be essentially best-possible. We remark that in the case of the

unit-disk graph, which is sometimes considered to be a good model of ad hoc

networks, our algorithms deliver essentially a 28-approximation rather than

an O(log n)-approximation (to be precise, the approximated solution is of size

at most 28 · Opt + 7). Finally, one of our algorithms (Algorithm 2 in § 2.2)

admits a straightforward asynchronous implementation. Recent experimental

work has shown that the algorithms described in this paper perform quite well

in practice for ad hoc networks [5].

2 Distributed algorithms for connected dominating sets

In this section, we describe two randomised distributed algorithms that com-

pute small connected dominating sets in graphs. These algorithms run in time

polylogarithmic in the size of the graph (we assume that all processors know

the size of the graph, or at least a reasonable upper bound on it), and pro-

duce a solution that is within a factor O(log ∆) of the optimum. In fact,

both these algorithms have a common first phase, during which an O(log ∆)-

approximated dominating set is computed. (In other words, a dominating set

whose size is at most O(log ∆) times the minimum size of a dominating set,

is computed.) This can be achieved by using existing randomised distributed

algorithms such as the one in [9] or a suitable modification of that in [14]. The

running time of both algorithms is polylogarithmic. Once an approximation

for the dominating set is found, it remains only to “connect it up”. Let u and

v be two vertices in the dominating set at distance at most 3 and let Puv be a

shortest path connecting them. A connected dominating set can be obtained

by inserting all vertices lying on all such paths (see Proposition 5 below for

a formal proof). However, this method will not preserve the approximation

guarantee, for the number of vertices inserted can be quadratic in the size

of the original dominating set. The problem then is how to insert vertices

without destroying the approximation guarantee. Sequentially this is easy: a

dominating set of size ℓ can be connected up as a tree by adding at most

2(ℓ − 1) more vertices. This gives a connected dominating set of size within

a factor of O(log ∆) of the optimum dominating set. Guha and Khuller [8]

showed that one can connect the dominating set somewhat more efficiently,

for example, by using an approximation to the Steiner tree to connect up.

However we are interested in fast distributed algorithms and in this setting

the following impossibility result applies; this shows that a (Steiner) tree-based

approach will not lead to efficient algorithms.

Proposition 1 There is no deterministic distributed algorithm that in o(n)

time is able to remove at least one edge and preserve connectivity, whenever

this is possible. For any constant ε > 0, there is no randomised algorithm that

runs in o(n) time for this task, with a success probability of at least 1/2 + ε.

Clearly, it suffices to show the lower bound for randomized algorithms. Con-

sider the following argument.

Input: The simple cycle C on the vertex set [n] = {0, 1, . . . , n − 1} with edge

set {(i, (i + 1) mod n) : i ∈ [n]}, or the path P_i obtained by deleting the

edge (i, (i + 1) mod n) from the cycle C.

Claim 2 Suppose there is a randomised distributed algorithm that runs in

time t, such that

(i) on input C, with probability at least 1/2 + ε, the final graph has at least

one edge missing;

(ii) on input P_i (for each i), with probability at least 1/2 + ε, the final graph

is the same as P_i.

Then, t ≥ 2nε/(1 + 2ε) − 3/2.

Note that there is a trivial randomised algorithm (with no communication)

for ε = 0: the edge (0, 1) drops out with probability 1/2 (except in P_0, where

this edge does not exist and nothing is done). Our claim says that if we want

to improve this to any positive constant ε, we need linear time. For the rest

of the proof, when we mention vertices i − 1, i + 1, i + 2 etc., the addition is

meant modulo n.

PROOF. Pick i ∈ [n] uniformly at random, and let I be the set of edges in C that

are at distance at least t + 2 from the edge (i, i + 1) (for example, edges

(i+1, i+2) and (i−1, i) are at distance 1 from (i, i+1)). Then, |I| = n−2t−3.

Now, run the algorithm on C and observe the probability of the event E(I) ≡

“some edge in I is deleted by the algorithm”. Then, one can use part (i) of

the claim and an averaging argument to show that

Pr[E(I)] ≥ (1/2 + ε) · (n − 2t − 3)/n,

where the probability is over the internal coin tosses of the algorithm and the

random choice of i. So we can fix ı, I so that

Pr[E(I)] ≥ (1/2 + ε) · (n − 2t − 3)/n.

Now, one can verify easily that whenever the deleted edge is in I, neither

endpoint of that edge has received any communication emanating from vertices

ı or ı+1. So, the probability of E(I) when the input graph is P_ı is also at least

(1/2 + ε) · (n − 2t − 3)/n. But, whenever this happens in P_ı, there is an error. Hence,

(1/2 + ε) · (n − 2t − 3)/n ≤ 1/2 − ε. Our claim (as well as Proposition 1) follows by

rearranging this inequality. □
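For completeness, the rearrangement at the end of the proof can be spelled out as follows. This is a routine algebraic step using only the symbols n, t and ε of Claim 2.

```latex
\begin{align*}
\Bigl(\tfrac12+\varepsilon\Bigr)\cdot\frac{n-2t-3}{n} \le \tfrac12-\varepsilon
  &\iff n-2t-3 \le n\cdot\frac{1-2\varepsilon}{1+2\varepsilon} \\
  &\iff 2t \ge n\Bigl(1-\frac{1-2\varepsilon}{1+2\varepsilon}\Bigr)-3
        = \frac{4n\varepsilon}{1+2\varepsilon}-3 \\
  &\iff t \ge \frac{2n\varepsilon}{1+2\varepsilon}-\frac32 .
\end{align*}
```

For any constant ε > 0 the right-hand side is linear in n, which is the lower bound stated in Proposition 1.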

Thus, it is impossible to connect up a given dominating set as a tree in time

significantly better than the diameter of the graph, and since we want poly-

logarithmic running time we cannot afford this. To get around this difficulty,

we will give up the idea of connecting the given dominating set D using a

tree, but will, instead, be satisfied as long as the final connected graph has

a linear number of edges in the size of D. In this section, we present two

distributed algorithms, with polylogarithmic running time, for producing a

connected spanning subgraph with a linear number (in the number of ver-

tices) of edges. Note that this does not contradict Proposition 1. We then

show that our O(log n) running time is best-possible.

The first algorithm is randomised. It may (very rarely) fail to have a linear

number of edges, but the resulting graph is always connected and spanning. As

a consequence, the solution produced by our randomised algorithm for the con-

nected dominating set problem is always a connected dominating set, but with

negligibly small probability, its size may not be within our targeted O(log ∆)

factor of the optimal solution. The second algorithm is a deterministic ver-

sion of the randomised algorithm and never fails. Note that, in the sequel, we

give algorithms for computing connected dominating sets. The straightforward

modifications for computing weakly connected dominating sets are omitted.

We start with a useful definition and a simple proposition.

Definition 3 (Powers of Graphs) Given a graph G = (V, E) and a positive

integer d, the graph G^d has vertex-set V , and two vertices u and v are connected

by an edge in G^d iff they are distinct and are connected by a path of length at

most d in G. Also, given a set V ′ ⊆ V , let G^d[V ′] denote the subgraph of G^d

induced by V ′.

Proposition 4 Suppose a distributed algorithm A(H) for a synchronous

network H = (V, E) runs in at most T (|V |) rounds, where T (·) is a non-decreasing

function. Then, given a network G = (V, E), a set V ′ ⊆ V , and a positive

integer d, we can run A(G^d[V ′]) in our network G in at most d · T (|V |) rounds.

(This is so, since we can simulate one round in G^d[V ′] by d rounds in our

network G.)
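As an illustration of Definition 3, the following minimal sketch computes the adjacency structure of the induced graph power by a breadth-first search truncated at depth d from each vertex of the chosen subset. It is a centralized stand-in for the d-round simulation of Proposition 4; the function name `graph_power_induced` and the adjacency-dictionary representation are assumptions made here for illustration.

```python
from collections import deque

def graph_power_induced(adj, d, subset):
    """Adjacency of G^d[subset]: u, v in subset are adjacent iff 1 <= dist_G(u, v) <= d."""
    subset = set(subset)
    power = {}
    for u in subset:
        dist = {u: 0}
        queue = deque([u])
        while queue:
            x = queue.popleft()
            if dist[x] == d:
                continue  # do not explore beyond d hops
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        power[u] = {v for v in dist if v in subset and v != u}
    return power

# Path 0-1-...-6 with the dominating set {0, 3, 6}.
path = {i: {j for j in (i - 1, i + 1) if 0 <= j <= 6} for i in range(7)}
g3 = graph_power_induced(path, 3, {0, 3, 6})
```

In this example vertices 0 and 3, and 3 and 6, are at distance exactly 3, so the third power restricted to {0, 3, 6} is a path on three vertices.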

From now on, let S denote the O(log ∆)–approximated dominating set produced

by one of the algorithms in [9,14]. Let G′ denote G^3[S] (i.e., the graph

with vertex-set S, where two distinct vertices are connected by an edge iff

their distance in G is at most 3). In different language, the following result

has also been proved earlier in [8].

Proposition 5 If G is connected then G′ is also connected.

PROOF. Suppose G′ is not connected. Let v and w be vertices in different

components of G′ whose distance in G, d_G(v, w), is shortest. Clearly, d_G(v, w) ≥

4, for otherwise, {v, w} would be an edge of G′, and v and w would not be

in different components. Let v = v_0, v_1, v_2, v_3, . . . , v_d = w be a shortest path

from v to w in G. Let u be the vertex of S that is closest in G to v_2; since

S is a dominating set, we have d_G(u, v_2) ≤ 1. So, d_G(u, v) ≤ 3, and hence

u and v are in the same component of G′. But then, u and w must be in

different components, and d_G(u, w) ≤ d_G(v, w) − 1, contradicting the choice of

(v, w). Hence, G′ must be connected. □

Our goal now is to find a spanning subgraph H of G′ of linear size. The

subgraph H we obtain will have at most 3|S| edges, each edge corresponding

to a path of length at most 3 in G. By adding the intermediate vertices on

each such path (at most 2) to S, we get a connected dominating set of size at

most 6|S| + |S| = 7|S|, giving us a solution within a factor O(log ∆) of the

optimum. It is straightforward to see that once H has been found, the rest of

the work can be done in a constant number of deterministic steps performed

in a distributed fashion. So, it remains only to show a randomised distributed

algorithm for constructing a connected spanning subgraph H of G′ with at

most 3|V (G′)| edges. The following lemma is central to all our algorithms: to

ensure that H is sparse, it is enough to ensure that H has no small cycles.

Recall that the girth of a graph is the length of the shortest cycle in it.

Lemma 6 (see, e.g., [10, Lemma 15.3.1]) A graph on n vertices with girth g

has at most n^{1 + 2/(g−1)} + n edges.
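The bound of 3n edges used throughout follows from Lemma 6 by a short calculation, spelled out here for convenience. It assumes, as elsewhere in the paper, that logarithms are to the base two and that the graph has no cycle of length less than 1 + 2 log n, so that its girth g satisfies g − 1 ≥ 2 log n.

```latex
\[
\frac{2}{g-1} \le \frac{1}{\log n}
\quad\Longrightarrow\quad
n^{1+\frac{2}{g-1}} + n \;\le\; n \cdot n^{1/\log n} + n \;=\; 2n + n \;=\; 3n ,
\]
since $n^{1/\log n} = 2^{(\log n)/\log n} = 2$.
```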

This lemma suggests the following strategy. Consider the graph G′. Keep

deleting edges that appear in cycles of length less than 1 + 2 log n until no

such cycles remain. In the end, we will be left with at most 3n edges. While

implementing this strategy using a distributed algorithm, we must ensure that

the graph remains connected when we delete edges in parallel. We now present

two algorithms achieving this. For notational convenience, we present these

two algorithms as removing all cycles of length less than 1 + 2 log n from an

n-vertex network G = (V, E); note that in our application, these algorithms

will be run on G′. In view of Proposition 4, there is a blow-up of a constant

factor (three) in running the algorithms on G′.

2.1 Algorithm 1

1. We start with the graph G and delete edges in order to remove all short

cycles. For g = 3, 4, . . . , ⌊1 + 2 log n⌋, destroy cycles of length g from G

by deleting edges.

1.1. Repeat the following steps 10 log n times.

a. Mark each edge with probability 1/g independently.

b. If the edge e is the only marked edge in some cycle of length g,

then delete e.

c. Remove the marks on all the surviving edges.
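The steps above can be sketched as a centralized simulation. This is a minimal sketch for small graphs, not a distributed implementation: it assumes vertices are hashable labels, edges are unordered pairs, and it detects "e is the only marked edge in some cycle of length g" by searching for a simple path of g − 1 unmarked edges between the endpoints of e. The helper names (`build_adj`, `has_unmarked_path`, `algorithm1`) are assumptions made here for illustration.

```python
import math
import random
from itertools import combinations

def build_adj(vertices, edges):
    adj = {v: set() for v in vertices}
    for e in edges:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    return adj

def is_connected(vertices, edges):
    adj = build_adj(vertices, edges)
    seen, stack = {vertices[0]}, [vertices[0]]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(vertices)

def has_unmarked_path(adj, marked, src, dst, length):
    """True if a simple src-dst path with exactly `length` unmarked edges exists."""
    def dfs(node, remaining, visited):
        if remaining == 0:
            return node == dst
        for nxt in adj[node]:
            if frozenset((node, nxt)) in marked or nxt in visited:
                continue
            if nxt == dst and remaining != 1:
                continue  # dst may only appear as the final vertex
            if dfs(nxt, remaining - 1, visited | {nxt}):
                return True
        return False
    return dfs(src, length, {src})

def algorithm1(vertices, edges, rng):
    """Centralized simulation of Algorithm 1; returns the surviving edge set."""
    edges = {frozenset(e) for e in edges}
    log_n = math.log2(len(vertices))
    for g in range(3, math.floor(1 + 2 * log_n) + 1):
        for _ in range(math.ceil(10 * log_n)):
            adj = build_adj(vertices, edges)
            # Step a: mark each edge independently with probability 1/g.
            marked = {e for e in sorted(edges, key=sorted) if rng.random() < 1.0 / g}
            # Step b: all decisions use the graph at the start of the round.
            doomed = {e for e in marked
                      if has_unmarked_path(adj, marked, *tuple(e), g - 1)}
            edges -= doomed  # Step c is implicit: marks are recomputed each round.
    return edges

vertices = list(range(6))
all_edges = {frozenset(e) for e in combinations(vertices, 2)}
kept = algorithm1(vertices, all_edges, random.Random(0))
```

Every deleted edge is marked, and the path that certifies its deletion consists of unmarked edges, which survive the round; this is why the simulation never disconnects the graph, whatever the random choices are. Whether all short cycles are destroyed holds only with high probability, as analysed next.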

Correctness: We show that after the first execution of the outer loop (cor-

responding to g = 3), with high probability, there are no triangles in the

remaining graph. In general, after execution of the outer loop corresponding

to g = i, with high probability, there are no cycles of length i in the remaining

graph. To show this, we need two observations.

Claim 7 The probability that a cycle C of length ℓ survives at the end of the

iteration of the outer loop corresponding to g = ℓ is at most 1/n^5.

PROOF. The probability that the cycle C of length ℓ is removed in one

iteration of the inner loop is at least

Σ_{e∈C} Pr[e is the only marked edge of C];

i.e., at least

ℓ · (1/ℓ) · (1 − 1/ℓ)^{ℓ−1} ≥ 1/3.

Thus, the probability that the cycle survives after one iteration of the inner

loop is at most 2/3; the probability that it survives after 10 log n iterations is at

most 1/n^5. □
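The two numerical steps in this proof can be checked as follows. The calculation uses only the marking probability 1/ℓ and the 10 log n repetitions of Algorithm 1, with logarithms to the base two.

```latex
\[
\ell \cdot \frac{1}{\ell}\Bigl(1-\frac{1}{\ell}\Bigr)^{\ell-1}
  = \Bigl(1-\frac{1}{\ell}\Bigr)^{\ell-1} \;\ge\; \frac{1}{e} \;>\; \frac{1}{3},
\qquad
\Bigl(\frac{2}{3}\Bigr)^{10\log n}
  = n^{-10\log(3/2)} \;\le\; n^{-5},
\]
where the last inequality holds because $\log(3/2) \approx 0.585 \ge 1/2$.
```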

Claim 8 (Folklore) The number of cycles in G of length girth(G) is at most

(n choose 3) if girth(G) ≠ 4. (For girth(G) = 4, there are clearly at most n^4 such

cycles.)

PROOF. Pick three vertices on a cycle of length girth(G) as equally spaced

as possible. This set uniquely determines the cycle. If there were two cycles

of length girth(G) associated to the same set of three vertices, then we would

find a cycle strictly shorter than girth(G). □

The rest of the argument is straightforward. After the first iteration of the

outer loop the probability that any cycle of length 3 remains is at most 1/n^2.

Assuming that we did succeed in destroying all cycles of length 3 in the first

iteration of the outer loop, the probability that some cycle of length 4 remains

after the next iteration is at most 1/n. Proceeding in this manner, we see that

the probability that some cycle of length less than 1 + 2 log n survives at the

end is at most (1 + 2 log n)/n.

2.2 Algorithm 2

Consider the following deterministic, distributed algorithm for removing all

cycles of length less than 1 + 2 log n, while preserving connectivity. It is assumed

that each edge has a unique ID (if not, a simple distributed way of achieving

this is to make each edge choose a random real in [0, 1] as its ID).

Algorithm 2: Each edge, in parallel, drops out if it is the edge with the

smallest ID in a cycle of length less than 1 + 2 log n.

The algorithm admits a straightforward, O(log n)-time distributed implemen-

tation and it clearly breaks all cycles of the required length. We now prove

that it preserves connectivity.
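Algorithm 2 can be sketched as a centralized simulation in which every edge decides against the original graph, mirroring the parallel rule. This is a minimal sketch for small graphs: an edge e is the smallest-ID edge of a short cycle exactly when its endpoints are joined by a simple path of larger-ID edges that is short enough. The names `algorithm2` and `is_connected`, and the use of a shuffled list of integers as unique IDs, are assumptions made here for illustration.

```python
import math
import random
from itertools import combinations

def is_connected(vertices, edges):
    adj = {v: set() for v in vertices}
    for e in edges:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {vertices[0]}, [vertices[0]]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(vertices)

def algorithm2(vertices, edges, ids):
    """Keep an edge unless it has the smallest ID on some cycle of length < 1 + 2 log n."""
    vertices = list(vertices)
    edges = {frozenset(e) for e in edges}
    # Largest integer cycle length that is strictly less than 1 + 2 log n.
    max_len = math.ceil(1 + 2 * math.log2(len(vertices))) - 1
    adj = {v: set() for v in vertices}
    for e in edges:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)

    def is_min_on_short_cycle(e):
        u, v = tuple(e)

        def dfs(node, path_len, visited):
            # path_len = number of edges on the current path from u to node.
            if path_len + 2 > max_len:
                return False  # closing the cycle from here would make it too long
            for nxt in adj[node]:
                if ids[frozenset((node, nxt))] <= ids[e] or nxt in visited:
                    continue  # only edges with a larger ID than e may be used
                if nxt == v or dfs(nxt, path_len + 1, visited | {nxt}):
                    return True
            return False

        return dfs(u, 0, {u})

    return {e for e in edges if not is_min_on_short_cycle(e)}

vertices = list(range(6))
edges = list(combinations(vertices, 2))
order = list(range(len(edges)))
random.Random(1).shuffle(order)  # unique integer IDs in random order
ids = {frozenset(e): i for e, i in zip(edges, order)}
kept = algorithm2(vertices, edges, ids)
```

On the complete graph with six vertices every cycle has length at most 6, which is below 1 + 2 log 6, so the surviving edges contain no cycle at all and form a spanning tree.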

Lemma 9 Algorithm 2 preserves connectivity.

PROOF. Consider the following sequential process. Order the edges accord-

ing to their ID number. Consider the edges one-by-one from this list and delete

them if they are currently in a cycle of length less than 1 + 2 log n. Clearly,

when an edge is deleted it is in a cycle, so its deletion cannot disconnect the

graph. Thus, in the end we will be left with a connected graph of girth at least

1 + 2 log n.

The claim follows by observing that this sequential process and Algorithm 2

remove exactly the same set of edges. To see this, observe that when an edge e

is discarded by the sequential process, there is a cycle in which it is the lowest

ID edge. □

Thus we obtain the following.

Theorem 10 There is a distributed algorithm that, given an n-vertex con-

nected graph G, computes in time O(log n) a connected subgraph H of G with

a linear number of edges.

This result can be shown to be time-optimal, as described in the following

proposition.

Proposition 11 For constants C and ε > 0, suppose there is a randomised

distributed algorithm with the following property. Given any connected graph

G on n vertices, it produces a subgraph whose expected number of edges is

at most Cn. Also, with probability one it terminates within T (n) steps; with

probability at least ε, it returns a connected subgraph. Then, we must have

T (n) = Ω(log n).

PROOF. Suppose for a contradiction, there exist constants C, ε, as well as

an algorithm with T (n) = o(log n), which satisfy the conditions of the propo-

sition. Take a graph G on n vertices with girth 10 · T (n) and a superlinear

number of edges (the existence of such a graph can be shown by standard

probabilistic arguments). We can show that when G is fed to the algorithm,

some edge e = (a, b) must have a deletion probability of 1− o(1). In the graph

G − e, let A be the set of vertices at distance at most 2 · T (n) from a, and

B the set of vertices at distance at most 2 · T (n) from b. Consider a minimal

cut (set of edges) C of G− e that separates A and B (note that A and B are

disjoint). Run the algorithm on the connected graph G−C. With probability

1 − o(1), the algorithm will delete e and thus disconnect G − C, because in

time T (n), the edge e does not know what has happened to edges at distance

2T (n) or more. This leads to a contradiction. □

Remark on asynchronous implementation. Let H be the graph obtained

after running Algorithm 2 on the input graph G. Since the set of edges deleted

by Algorithm 2 only depends on G and the distribution of IDs, the edges can

be deleted in any order to obtain the same H. Therefore, Algorithm 2 admits a

straightforward asynchronous implementation. This property might be useful

from a practical point of view.

3 Dominating sets with low stretch

The algorithms in the previous section relied on the observation that if a graph

has no cycles of length less than 1+2 log n, then it must have at most 3n edges.

From this observation, one can infer the following stronger statement, which

is a well-known property of spanners.

Proposition 12 (see, e.g., [2]) For every graph G on n vertices, there is a

subgraph H with V (H) = V (G) and with a linear number of edges, such that

for every two vertices v, w ∈ V (G), d_H(v, w) ≤ d_G(v, w) · (1 + 2 log n).

PROOF. Start from the empty graph H. Consider the edges of G in some

order. If the edge closes a cycle of length less than 1 + 2 log n in H, then

discard it, otherwise add it to H. Clearly, the resulting graph H satisfies

d_H(v, w) ≤ d_G(v, w) · (1 + 2 log n). Since H has no cycle of length less than

1 + 2 log n, it has at most 3n edges. □
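The sequential procedure in this proof can be sketched directly. This is a minimal sketch, assuming edges are given as pairs in the order in which they are to be considered: an edge closes a cycle of length c in H exactly when its endpoints are already at distance c − 1 in H, so the edge is discarded when that distance is small enough. The names `bounded_dist` and `greedy_spanner` are assumptions made here for illustration.

```python
import math
from collections import deque
from itertools import combinations

def bounded_dist(adj, src, dst, limit):
    """BFS distance from src to dst, or None if it exceeds `limit`."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        if x == dst:
            return dist[x]
        if dist[x] == limit:
            continue  # do not explore beyond `limit` hops
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return None

def greedy_spanner(vertices, edges):
    """Add each edge to H unless it closes a cycle of length < 1 + 2 log n in H."""
    vertices = list(vertices)
    bound = 1 + 2 * math.log2(len(vertices))
    # A cycle of length c < bound corresponds to an H-path of length c - 1.
    limit = math.ceil(bound) - 2
    h_adj = {v: set() for v in vertices}
    for u, v in edges:
        if bounded_dist(h_adj, u, v, limit) is None:
            h_adj[u].add(v)
            h_adj[v].add(u)
    return h_adj

n = 8
h = greedy_spanner(range(n), combinations(range(n), 2))
num_edges = sum(len(nbrs) for nbrs in h.values()) // 2
```

For the complete graph on eight vertices and this edge order, the first edges considered form a star around vertex 0, after which every remaining edge closes a triangle and is discarded.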

This proposition and its sequential algorithmic proof lead us to the following

question.

Question: Is there an efficient distributed algorithm for generating a (weakly)

connected dominating set whose induced subgraph H has the “spanning”

property guaranteed by the above proposition? (Note: Algorithms 1 and 2

are not guaranteed to produce such a solution.)

Answer: Yes! Algorithm 3 below does precisely this.

We will make use of the following graph decomposition theorem of Linial and

Saks [11] (see also [3]).

Theorem 13 [11] There is a randomised distributed algorithm that with high

probability, in O(log^2 n) time, partitions the vertices of a graph G into O(log n)

colour classes, such that in the subgraph induced by any colour class, every two

vertices in the same connected component have distance O(log n) in G. (The

shortest path need not be confined to the connected component. Also, both the

running time and correctness of this algorithm hold with high probability.)

Our algorithm:

(1) Apply the Linial-Saks decomposition algorithm to the graph G^d, where

d ≜ ⌊(1/2)(1 + 2 log n)⌋. Let C_1, C_2, . . . , C_k be the colour classes, where k =

O(log n). For brevity, we call the connected components of the graph

induced by the colour classes blocks.

(2) Let H be the empty graph with vertex set V (G).

(3) Cycle through the colour classes sequentially. In the ith iteration, consider

the edges of G incident on vertices in C_i, and add them to H so that no

cycles of length less than 1 + 2 log n are formed, and yet every edge that

is not included closes a cycle of length less than 1 + 2 log n in H. Stated

formally,

a. In parallel for each block B of colour class C_i, consider the set E_B of

all edges of E(G) with at least one end-point in B.

b. Compute a maximal subset F_B ⊆ E_B such that there are no cycles of

length less than 1 + 2 log n in E(H) ∪ F_B. This can be done as follows.

Assume that the decomposition of Step 1 satisfies the guarantees of

Theorem 13 (which is the case with high probability).

Then, every pair of vertices in B is at a distance of at most O(d ·

log n) = O(log^2 n). So, this step can be performed in O(log^2 n) time,

for example, by collecting the required information at one node in

each block, computing F_B locally, and transmitting this information

back to all nodes in B.

c. Set E(H) ← E(H) ∪ F_B and E(G) ← E(G) − E_B.

Correctness: Since F_B is chosen to be maximal in Step 3(b), when an edge

e of G is removed from E(G) without being added to H, there is a cycle of

length less than 1 + 2 log n in H that e closes. Thus, for every edge of G, its end points are at distance

less than 1 + 2 log n in H. Now, to show that H has a linear number of edges,

we claim that in the end H has no cycles of length less than 1 + 2 log n. For,

suppose such a cycle was created for the first time in the iteration corresponding to

colour class C_i. Let H_{i−1} be the value of H at the beginning of the iteration.

Since E(H_{i−1}) ∪ F_B does not have any such cycle (by the definition of F_B), the

cycle must have edges from F_{B_1} and F_{B_2}, for distinct blocks B_1 and B_2 of C_i.

But this means that there are vertices in B_1 and B_2 that are within distance

⌊(1/2)(1 + 2 log n)⌋ of each other in G. This is not possible because each block is

a connected component of C_i in G^d, where d = ⌊(1/2)(1 + 2 log n)⌋.

Running time: The Linial-Saks algorithm takes time O(log^2 n); thus, since

d = Θ(log n), we see by Proposition 4 that Step 1 can be run in O(log^3 n)

time. The remaining steps are deterministic, and there is no further scope for

error. There are, with high probability, only O(log n) colour classes. Each class

is processed in O(log^2 n) time with high probability, as seen above. Thus, the

overall running time of our algorithm is O(log^3 n).
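The running-time accounting above can be summarised in one line. Here d and k are the quantities defined in Step 1, and the factor d in the first term is the simulation overhead of Proposition 4.

```latex
\[
\underbrace{d \cdot O(\log^2 n)}_{\text{Step 1 on } G^d}
\;+\;
\underbrace{k \cdot O(\log^2 n)}_{\text{Step 3, one pass per colour class}}
\;=\; \Theta(\log n)\cdot O(\log^2 n) + O(\log n)\cdot O(\log^2 n)
\;=\; O(\log^3 n).
\]
```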

Theorem 14 Suppose we are given a distributed network G. There is a ran-

domised distributed algorithm that in polylogarithmic time computes a span-

ning subgraph H of the graph G, such that

(a) H has no cycles of length less than 1 + 2 log n, and consequently, H has a linear

number of edges; and

(b) for every edge of G, there is a path in H of length at most 1 + 2 log n connecting its end points.

Corollary 15 Suppose we are given a distributed network G = (V, E). There

is a randomised distributed algorithm that in polylogarithmic time computes a

subset V ′ of V , such that

(1) V ′ is a connected dominating set, and has size O(log ∆) times the size of

a minimum-sized dominating set;

(2) for every pair of adjacent vertices in G, there is a path connecting them

with all internal vertices lying in V ′, with the path-length being O(log n).

(Thus, for any pair of vertices v, w in G, there is a path with all internal

vertices lying in V ′, with the path-length being at most O((log n) · d_G(v, w)).)

PROOF. As in Section 2, let S denote an O(log ∆)–approximated dominating

set, and let G′ denote G^3[S]. Run the algorithm guaranteed by Theorem 14

on the graph G′; by Proposition 4 and Theorem 14, this requires only polylogarithmic

time, with high probability. The resulting spanning subgraph H

of G′ has at most 3|V (G′)| edges. As in Section 2, the edges of H correspond

to paths of length at most three in G; we add to the dominating set S, at

most two intermediate vertices for each edge of H. This yields a connected

dominating set of size at most 7|S|. What about the “spanner” property (ii)?

If (u, v) is an edge in G, then it is easy to check that the dominators u′ ∈ S

and v′ ∈ S of u and v respectively, are adjacent in G′. By property (b) of

Theorem 14, we know that there is a path of length O(log n) connecting u′

and v′ in H. Finally, since each edge in H corresponds to a path of length at

most 3 in G, property (2) follows. □
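
The connection step used in this proof can be illustrated with a small sequential sketch. It is not the distributed algorithm of the paper: as a stand-in for the subgraph H of Theorem 14 it uses a plain BFS spanning tree of G′ = G^3[S], which gives connectivity but not the stretch guarantee. The function names, the adjacency-dictionary representation and the assumption that vertices are comparable (for example, integers) are all choices made here for illustration.

```python
from collections import deque

def paths_within_three(adj, source):
    """BFS from `source` to depth 3; maps each reached vertex to one shortest path."""
    paths = {source: [source]}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if len(paths[u]) == 4:  # this path already has three edges
            continue
        for w in sorted(adj[u]):
            if w not in paths:
                paths[w] = paths[u] + [w]
                queue.append(w)
    return paths

def connect_dominating_set(adj, S):
    """Return S plus connector vertices along a spanning tree of G^3[S].

    Assumption: a BFS tree of G^3[S] replaces the sparse subgraph H of
    Theorem 14, so only connectivity (not low stretch) is obtained.
    """
    S = set(S)
    root = min(S)
    reached, queue, result = {root}, deque([root]), set(S)
    while queue:
        s = queue.popleft()
        for t, path in paths_within_three(adj, s).items():
            if t in S and t not in reached:
                reached.add(t)
                queue.append(t)
                # An edge of G^3[S] is a path of length <= 3 in G,
                # so it contributes at most two intermediate vertices.
                result.update(path[1:-1])
    if reached != S:
        raise ValueError("G^3[S] is not connected")
    return result
```

On the path 0-1-2-3-4-5-6 with S = {1, 4, 6}, the call `connect_dominating_set(adj, {1, 4, 6})` adds the connectors 2, 3 and 5. A tree has |S| − 1 edges, so this variant adds at most 2(|S| − 1) vertices; the 7|S| bound in the proof comes from using H, which may have up to 3|S| edges.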

4 Constant-factor approximations for unit-disk graphs

Unit-disk graphs can in some cases be taken as good models for radio networks,

including ad hoc networks. Given a set of points in the plane, the graph is

obtained by having a vertex for each point and by connecting any two points

at Euclidean distance one or less. In [17] the following result is proven: if I is a

maximal independent set (MIS) in a unit-disk graph G, then |I| is at most one

more than four times the smallest dominating set size. Since an MIS is also

a dominating set, we can use Algorithm 1 or Algorithm 2 to connect it up

efficiently. This gives us a connected dominating set of size at most 7 · |I|, which

is at most 28 · Opt + 7, where Opt is the size of the smallest dominating set in

G. It is well known that an MIS can be computed in O(log n) communication

rounds in our distributed model using randomization [1,12].
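
As a rough illustration of the two ingredients of this section, the sketch below builds a unit-disk graph from points in the plane and then computes an MIS with a round-based random-priority rule in the spirit of [1,12]. It is a sequential simulation under assumptions made here, not the algorithm analysed in those references: the function names, the use of a fresh random priority per round, and the explicit `rng` parameter are all illustrative.

```python
import math
import random

def unit_disk_graph(points):
    """Connect two points whenever their Euclidean distance is at most one."""
    adj = {i: set() for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= 1.0:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def randomized_mis(adj, rng):
    """Simulate synchronous rounds; returns (MIS, number of rounds).

    In each round every live vertex draws a random priority and joins
    the MIS if it beats all of its live neighbours. Winners and their
    neighbours then leave the computation.
    """
    live, mis, rounds = set(adj), set(), 0
    while live:
        rounds += 1
        prio = {v: rng.random() for v in live}
        winners = {v for v in live
                   if all(prio[v] > prio[w] for w in adj[v] if w in live)}
        mis |= winners
        removed = set(winners)
        for v in winners:
            removed |= adj[v]
        live -= removed
    return mis, rounds
```

For example, `randomized_mis(unit_disk_graph(points), random.Random(0))` returns an independent set that is also dominating; connecting it up as described above would then give the connected dominating set.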

5 Open problems and summary of results

It would be nice to develop a more direct and computationally simpler method

to obtain connected dominating sets with good stretch properties. One way to

achieve this would be to solve the following problem. Let G = (A, B, E) be a

bipartite network. A subset A′ ⊆ A is a cover if N(A′), the set of neighbours

of A′, is the whole of B. We want a fast distributed algorithm that finds a

minimal such A′. In essence, this is the unweighted set cover problem where

the input instance is represented as a bipartite graph, and where one is to

find a minimal, as opposed to a small, cover. Besides being an interesting

symmetry-breaking problem per se, finding a fast distributed algorithm would

give as a by-product a way to compute a sparse, connected subgraph with low

stretch. This follows from the following discussion.

Definition 16 Given an input graph G, a cycle is small if its length is at

most c log n, for some arbitrary, but fixed, parameter c. Let C be the collection

of small cycles (these are the cycles that we want to break). A set A of edges is a

cover if G[V, E − A] contains no cycle of C.

Claim 17 Let A be a minimal cover with respect to C. Then, for each a ∈ A

there is a c_a ∈ C such that c_a is broken by a and only by a.

PROOF. If a ∈ A does not have a corresponding c_a, it can be removed from

A while A remains a cover. This would violate the minimality assumption. □

Claim 18 Let A be a minimal cover with respect to the set of small cycles C.

Then, G[V, E − A] has stretch O(log n).

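PROOF. Given any uv-path p of G, every edge e of p either is present in

G[V, E − A] or, if removed, can be bypassed by means of the corresponding

c_e ∈ C: by Claim 17, e is the only edge of A on c_e, so the rest of c_e is a

path of length less than c log n in G[V, E − A]. □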

The problem of finding a minimal set of edges to break all small cycles reduces

to that of computing a minimal cover in a bipartite graph, according to the

definition above. The two sides of the bipartition are edges on the A-side and

small cycles on the B-side. A cycle and an edge are connected in the bipartite

graph if the edge belongs to the cycle. An algorithm for finding a minimal

set of A-vertices covering all the B-vertices can be simulated in logarithmic

time in the bipartite edge-cycle graph. Thus, a fast distributed solution to

the minimal (as opposed to small) set cover problem would yield an efficient

algorithm for computing sparse, spanning subgraphs with low stretch.
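
To make Definition 16 and Claims 17 and 18 concrete, here is a sequential sketch that computes a minimal cover of the small cycles greedily, in the style of the greedy spanner construction. It does not address the open problem, which asks for a fast distributed algorithm; the function names and the parameter `max_len` (standing for the threshold c log n) are assumptions introduced here, and the input is assumed to be a simple graph.

```python
from collections import deque

def bounded_distance(adj, source, target, limit):
    """Hop distance from source to target if it is <= limit, else None."""
    if source == target:
        return 0
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == limit:
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                if w == target:
                    return dist[w]
                queue.append(w)
    return None

def break_small_cycles(vertices, edges, max_len):
    """Split `edges` into (kept, removed).

    The kept graph has no cycle of length <= max_len, and every removed
    edge closes such a cycle whose other edges are all kept, so the
    removed set is a minimal cover in the sense of Definition 16.
    """
    adj = {v: set() for v in vertices}
    kept, removed = [], []
    for u, v in edges:
        # (u, v) would close a cycle of length <= max_len exactly when
        # u and v are already within max_len - 1 hops of each other.
        if bounded_distance(adj, u, v, max_len - 1) is not None:
            removed.append((u, v))
        else:
            adj[u].add(v)
            adj[v].add(u)
            kept.append((u, v))
    return kept, removed
```

Because the kept graph only grows, the bypass found for a removed edge survives to the end, which mirrors the argument of Claim 18: each removed edge is replaced by a path of at most max_len − 1 kept edges.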

We now summarize the results contained in this paper.

Result 1 (based on Algorithms 1 and 2): There is a randomised dis-

tributed algorithm for computing an O(log ∆) approximation to the minimum

connected dominating set in time O((log n)^2).

Result 2 (based on Algorithm 3): There is a randomised distributed al-

gorithm that, given a graph G on n vertices, computes a connected subgraph

H of G, with the following properties:

Domination. H is dominating: every vertex in G−H has a neighbour in H.

Sparsity. The number of vertices in H is within a factor O(log ∆) of the

minimum size of a dominating set in G.

Stretch (or dilation). For every pair of vertices (v, w) in G, there is a path

using H (that is, with all internal vertices belonging to H) of length at most

O((log n) · d_G(v, w)).

(Note: the approximation factor is O(log ∆) for the size of the dominating set,

but O(log n) for the stretch.)

Acknowledgments

We thank Samir Khuller and the SODA 2003 referees for their helpful com-

ments.

References

[1] N. Alon, L. Babai, and A. Itai. A fast and simple randomized parallel algorithm

for the maximal independent set problem. J. Algorithms, 7:567–583, 1986.

[2] I. Althofer, G. Das, D. P. Dobkin, D. Joseph, and J. Soares. On sparse spanners

of weighted graphs. Discrete Comput. Geom., 9:81–100, 1993.

[3] B. Awerbuch and D. Peleg. Sparse partitions. 31st IEEE Symp. on Foundations

of Computer Science, pages 503–513, 1990.

[4] Khaled Alzoubi, Peng-Jun Wan, and Ophir Frieder. Message-optimal

connected-dominating-set construction for routing in mobile ad hoc networks.

Proceedings of Mobihoc 2002.

[5] S. Basagni, M. Mastrogiovanni, and C. Petrioli. A Performance Comparison

of Protocols for Clustering and Backbone Formation in Large Scale Ad Hoc

Networks. In Proceedings of the First IEEE International Conference on Mobile

Ad Hoc and Sensor Systems, MASS 2004 Fall, Fort Lauderdale, FL, October

25-27 2004.

[6] Yuanzhu Chen and Arthur Liestman. Approximating minimum size weakly-

connected dominating sets for clustering mobile ad hoc networks. Proceedings

of Mobihoc 2002.

[7] U. Feige. A threshold of ln n for approximating the set cover. Journal of the

ACM, 45(4), 634–652, July 1998.

[8] S. Guha and S. Khuller. Approximation algorithms for connected dominating

sets. Algorithmica, 20(4):374–387, 1998.

[9] L. Jia, R. Rajaraman and T. Suel. An efficient distributed algorithm

for constructing small dominating sets. Proceedings of the Annual ACM

Symposium on Principles of Distributed Computing, pages 33–42, 2001.

[10] J. Matousek. Lectures on discrete geometry. Graduate Texts in Mathematics,

Vol. 212, Springer, 2002.

[11] N. Linial and M. Saks. Low diameter graph decompositions. Combinatorica,

13:441–454, 1993.

[12] M. Luby. A simple parallel algorithm for the maximal independent set problem.

SIAM J. on Computing, November 1986, Vol. 15, No. 4, pp. 1036-1053.

[13] C. Lund and M. Yannakakis. On the hardness of approximating minimization

problems. Journal of the ACM, 41(5), 960–981, 1994.

[14] S. Rajagopalan and V.V. Vazirani. Primal-dual RNC approximation algorithms

for (multi)set (multi)cover and covering integer programs. SIAM J. on

Computing 28, 2 (1998) 525-540.

[15] R. Rajaraman. Topology control and routing in ad hoc networks: a survey.

SIGACT News, Vol. 33, Number 2, pages 60–73, 2002.

[16] L. Trevisan. Non-approximability results for optimization problems on bounded

degree instances. Proceedings of the 33rd ACM STOC, 453–461, 2001.

[17] Peng-Jun Wan, Khaled M. Alzoubi and Ophir Frieder. Distributed construction

of connected dominating set in wireless ad hoc networks. Proceedings of Infocom

2002.

[18] Jie Wu and Hailan Li. A dominating-set-based routing scheme in ad hoc wireless

networks. Special issue on Wireless Networks of the Telecommunication Systems

Journal, Vol. 3, 63–84, 2001.

[19] F. Xue and P. R. Kumar. The number of neighbors needed for connectivity

of wireless networks. Wireless Networks, to appear, 2003. Available from

http://black1.csl.uiuc.edu/∼prkumar/postscript files.html.
