

Approximate Distance Oracles for Unweighted Graphs in Expected O(n^2) Time

SURENDER BASWANA

Max-Planck Institut fuer Informatik, Saarbruecken, Germany

AND

SANDEEP SEN

Indian Institute of Technology Delhi, New Delhi, India

Abstract. Let G = (V, E) be an undirected graph on n vertices, and let δ(u, v) denote the distance in G between two vertices u and v. Thorup and Zwick showed that for any positive integer t, the graph G can be preprocessed to build a data structure that can efficiently report t-approximate distance between any pair of vertices. That is, for any u, v ∈ V, the distance reported is at least δ(u, v) and at most tδ(u, v). The remarkable feature of this data structure is that, for t ≥ 3, it occupies subquadratic space, that is, it does not store all-pairs distances explicitly, and still it can answer any t-approximate distance query in constant time. They named the data structure “approximate distance oracle” because of this feature. Furthermore, the trade-off between the stretch t and the size of the data structure is essentially optimal.

In this article, we show that we can actually construct approximate distance oracles in expected O(n^2) time if the graph is unweighted. One of the new ideas used in the improved algorithm also leads to the first expected linear-time algorithm for computing an optimal size (2, 1)-spanner of an unweighted graph. A (2, 1)-spanner of an undirected unweighted graph G = (V, E) is a subgraph (V, Ê), Ê ⊆ E, such that for any two vertices u and v in the graph, their distance in the subgraph is at most 2δ(u, v) + 1.

Categories and Subject Descriptors: E.1 [Data Structures]—Graphs and networks; F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems—Computations on discrete structures; G.2.2 [Discrete Mathematics]: Graph Theory—Graph algorithms

General Terms: Algorithms, Theory

Additional Key Words and Phrases: Approximate distance oracles, spanners, shortest paths, distance queries, distances

A preliminary version of this work appeared in Proceedings of the 15th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), ACM, New York, 2004, pp. 271–280.

Part of the work of Baswana was done while he was a Ph.D. student at I.I.T. Delhi and was supported by a Ph.D. fellowship from Infosys Technologies Limited, Bangalore.

Authors’ addresses: S. Baswana, Max-Planck Institut fuer Informatik, 66123 Saarbruecken, Germany, e-mail: [email protected]; S. Sen, Indian Institute of Technology Delhi, Hauz Khas, New Delhi-110016, India, e-mail: [email protected].

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212) 869-0481, or [email protected].
© 2006 ACM 1549-6325/06/1000-0557 $5.00

ACM Transactions on Algorithms, Vol. 2, No. 4, October 2006, pp. 557–577.

1. Introduction

The all-pairs shortest paths (APSP) problem is one of the most fundamental algorithmic graph problems. Every computer scientist has known this classical problem since a first course on algorithms. The problem is commonly phrased as follows: Given a graph on n vertices and m edges, compute shortest paths/distances between each pair of vertices.

In many applications, the aim is not to compute all distances, but to have a mechanism (data structure) through which we can extract the distance/shortest path for any pair of vertices efficiently. Therefore, the following is a useful alternate formulation of the APSP problem.

Preprocess a given graph efficiently to build a data structure that can answer a shortest-path query or a distance query for any pair of vertices.

Throughout this article, we stick to the above formulation of APSP. The objective is to construct a data structure for this problem such that it is efficient both in terms of the space and the preprocessing time. There is a lower bound of Ω(n^2) on the space requirement of any data structure for the APSP problem, and the space requirement of all the existing algorithms for APSP matches this bound. However, there is a huge gap in the preprocessing time. In its most generic version, that is, for directed graphs with real edge-weights, the best known algorithm for APSP is given by Pettie [2004] and it runs in O(mn + n^2 log log n) time. However, for graphs with m = Θ(n^2), this algorithm has a running time of Θ(n^3), which matches that of the classical algorithm of Floyd and Warshall. In fact the best known upper bound on the worst case time complexity of this problem is O(n^3 / log n) due to Chan [2005], which is marginally subcubic. Surprisingly, despite the fundamental importance of the problem, there does not exist any algorithm at present that can solve APSP in truly subcubic time, that is, O(n^{3−ε}) for some ε > 0. The existing lower bound on the time complexity of APSP is the trivial lower bound of Ω(n^2). This has motivated researchers to explore ways to achieve (truly) subcubic preprocessing time and/or subquadratic space data structures that can report approximate instead of exact shortest paths/distances.

A data structure is said to compute t-approximate distances for all pairs of vertices if, for any pair of vertices u, v ∈ V, the distance that it reports is at least δ(u, v) and at most tδ(u, v). Usually t-approximate distance is also termed as distance with stretch t. In the last ten years, many novel algorithms have been designed for the all-pairs approximate shortest paths (APASP) problem that achieve subcubic running time and/or subquadratic space. The approximate distance oracle designed by Thorup and Zwick [2005] is a milestone in this area. They showed that any given weighted undirected graph can be preprocessed in subcubic time to build a data structure of subquadratic size for answering a distance query with stretch 3 or more. Note that 3 is also the least stretch for which we can achieve subquadratic space for APASP (see Cohen and Zwick [2001]). There are two very impressive features of their data structure. First, the trade-off between stretch and the size of the data structure is essentially optimal assuming a 1963 girth lower bound conjecture of Erdős [1964], and second, in spite of its subquadratic size, their data structure can answer any distance query in constant time, hence the name “oracle”. In precise words, Thorup and Zwick [2005] achieved the following result.

THEOREM 1.1 [THORUP AND ZWICK 2005]. For any integer k ≥ 1, an undirected weighted graph on n vertices and m edges can be preprocessed in expected O(kmn^{1/k}) time to build a data structure of size O(kn^{1+1/k}) that can answer any (2k − 1)-approximate distance query in O(k) time.

As mentioned in Thorup and Zwick [2005], the oracle model for approximate distance has been considered in the past also, at least implicitly, by Awerbuch et al. [1998], Cohen [1998], and Dor et al. [2000]. However, the approximate distance oracles of Thorup and Zwick significantly improve all these previous results. Having achieved essentially optimal query time as well as space requirement for approximate distance oracles, the only aspect that can be potentially improved is the preprocessing time. Currently, the expected preprocessing time of the (2k − 1)-approximate distance oracle is O(kmn^{1/k}), which is certainly subcubic. Thorup and Zwick posed the following question: Can a (2k − 1)-approximate distance oracle be computed in O(n^2) time?

In this article, we answer their question in the affirmative for unweighted graphs. The following is the main result of this article.

THEOREM 1.2. For any integer k ≥ 1, an undirected unweighted graph on n vertices and m edges can be preprocessed in expected O(min(n^2, kmn^{1/k})) time to construct a data structure of size O(kn^{1+1/k}) that can answer any (2k − 1)-approximate distance query in O(k) time.

1.1. SUMMARY OF RELATED WORK ON APASP. The algorithms for all-pairs approximate shortest paths fall under two categories, depending upon whether the error in the distance is additive or multiplicative. An algorithm that computes all-pairs t-approximate shortest paths, as defined earlier, achieves a multiplicative error t. An algorithm is said to report distances with additive error k if, for any pair of vertices u, v ∈ V, the distance that it reports is at least δ(u, v) and at most δ(u, v) + k. The first algorithm for reporting distances with additive error was given by Aingworth et al. [1999]. Their algorithm works for unweighted graphs and reports distances with additive error 2. Dor et al. [2000] improved and extended this algorithm for arbitrary additive error. Their algorithm requires O(kn^{2−1/k} m^{1/k} polylog n) time to report distances with additive error 2(k − 1) for any pair of vertices. They also showed that all-pairs 3-approximate shortest paths can be computed in O(n^2 polylog n) time. Cohen and Zwick [2001] later extended this result to weighted graphs. There are also some algorithms that achieve multiplicative as well as additive errors simultaneously. Elkin [2005] presented the first such algorithm. For unweighted graphs, given arbitrarily small ζ, ε, ρ > 0, Elkin’s algorithm runs in O(mn^ρ + n^{2+ζ}) time, and for any pair of vertices u, v ∈ V, reports a distance which is at most (1 + ε)δ(u, v) + β, where β is a function of ζ, ε, ρ. If the two vertices u, v ∈ V are separated by a sufficiently long distance in the graph, the stretch ensured by Elkin’s algorithm is quite close to (1 + ε). But the stretch may be quite large for short paths. This is because β depends on ζ as (1/ζ)^{log 1/ζ}, and depends inverse exponentially on ρ and inverse polynomially on ε. The output of all the algorithms mentioned above is an n × n matrix that stores the pairwise approximate distances for all the vertices explicitly, so that each distance query is answered in constant time. Recently, Baswana et al. [2005a] presented a simple algorithm to compute nearly 2-approximate distances for unweighted graphs in expected O(n^2 polylog n) time. They essentially extend the idea of the 3-approximate distance oracle of Thorup and Zwick [2005] to achieve stretch 2 at the expense of introducing a very small additive error (less than 3). All the algorithms for reporting approximate distances that we mentioned above, including the new results of this article, can also report the corresponding approximate shortest path, and the time for doing so is proportional to the number of edges in the approximate shortest path.

1.2. ORGANIZATION OF THE ARTICLE. Our starting point will be the 3-approximate distance oracle of Thorup and Zwick [2005]. The following section explains notations, definitions, and lemmas, most of them adapted from Thorup and Zwick [2005]. In Section 3, we provide a sketch of the existing 3-approximate distance oracle, and identify the most time consuming tasks in its preprocessing algorithm. Subsequently we explore ways to execute them in expected O(n^2) time. We succeed in our goal by providing a tighter analysis of one of the tasks and by performing small but crucial changes in the remaining ones. An important tool used by our preprocessing algorithm is a special kind of (2, 1)-spanner. For an unweighted graph G = (V, E), a subgraph (V, Ê), Ê ⊆ E, is said to be an (α, β)-spanner if for each pair of vertices u, v ∈ V, the distance between them in the subgraph is at most αδ(u, v) + β. In Section 4, we present a parameterized (2, 1)-spanner and explain how its special features are helpful in improving the preprocessing time of the oracle without increasing the stretch. Finally in Section 5, we present and analyze our new 3-approximate distance oracle. In a straightforward manner, the techniques used in our 3-approximate distance oracle lead to an expected O(min(n^2, kmn^{1/k})) time algorithm for computing a (2k − 1)-approximate distance oracle in Section 6.

A second contribution of the article, which is important in its own right, is the first expected linear-time algorithm for computing a (2, 1)-spanner of (worst case optimal) size O(n^{3/2}). The previous linear-time algorithms [Baswana and Sen 2003; Halperin and Zwick 1996] compute spanners of stretch three or more.

2. Preliminaries

For a given undirected unweighted graph G = (V, E) and Y ⊆ V, we define the following notations.

- δ(u, v): the distance between vertex u and vertex v in the graph.

- δ(v, Y): min_{y∈Y} δ(v, y)

- p(v, Y): the vertex from the set Y which is nearest to v, that is, at distance δ(v, Y) from v. In case there are multiple vertices from Y at distance δ(v, Y), we break the tie arbitrarily to ensure a unique p(v, Y). Moreover, if v ∈ Y, we define p(v, Y) = v.

In this article, we deal with undirected unweighted graphs only. It is straightforward to observe that a shortest path tree rooted at a vertex v ∈ V in an unweighted graph is the same as the breadth-first-search (BFS) tree rooted at v, which can be computed in just O(m) time.

Given a set Y ⊂ V, it is also quite easy to compute δ(v, Y) and p(v, Y) for all v ∈ V in O(m) time as follows: Connect a dummy vertex o to all the vertices of set Y, and perform a BFS traversal on the graph starting from o.

LEMMA 2.1. Given a set Y ⊂ V, we can compute δ(v, Y) and p(v, Y) for all vertices v ∈ V in O(m) time.
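The construction behind Lemma 2.1 can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the article: the function name `nearest_in_set` and the adjacency-list dictionary `adj` are assumptions made here. Seeding the BFS queue with every vertex of Y plays the role of the dummy vertex o.

```python
from collections import deque

def nearest_in_set(adj, Y):
    """Multi-source BFS: returns (dist, nearest) with
    dist[v] = delta(v, Y) and nearest[v] = p(v, Y).
    `adj` maps each vertex to a list of its neighbors (hypothetical layout)."""
    dist = {v: None for v in adj}      # None marks "not reached yet"
    nearest = {v: None for v in adj}
    queue = deque()
    # Seeding the queue with all of Y is equivalent to starting a BFS
    # from a dummy vertex o adjacent to every vertex of Y.
    for y in Y:
        dist[y] = 0
        nearest[y] = y                 # p(v, Y) = v whenever v is in Y
        queue.append(y)
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if dist[w] is None:
                dist[w] = dist[v] + 1
                nearest[w] = nearest[v]  # ties are broken by BFS order
                queue.append(w)
    return dist, nearest
```

Each vertex is enqueued at most once and each adjacency list is scanned once, so the sketch runs in O(n + m) time, in line with the lemma. Vertices that cannot reach Y keep `None` as their distance.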

2.1. BALL AROUND A VERTEX. An important construct of the approximate distance oracles is a Ball, which is defined as follows.

Definition 2.2. Given a graph G = (V, E), a vertex v ∈ V, and two subsets of vertices X and Y, we define Ball(v, X, Y) as a set in the following way.

Ball(v, X, Y) = {x ∈ X | δ(v, x) < δ(v, Y)}

In simple words, Ball(v, X, Y) consists of all those vertices of the set X whose distance from v is less than the distance of p(v, Y) from v. As a simple observation, it can be noted that Ball(v, X, ∅) is the set X itself, whereas Ball(v, X, X) = ∅.

The following lemma from Thorup and Zwick [2005] suggests that if the set Y is formed by sampling vertices from the set X with suitable probability, the expected size of Ball(v, X, Y) will be sublinear in terms of |X|. This observation plays the key role in achieving the subquadratic bound on the size of the approximate distance oracles of Thorup and Zwick [2005]. These oracles basically store distances to a hierarchy of Balls around each vertex.

LEMMA 2.3 [THORUP AND ZWICK 2005]. Given a graph G = (V, E), let a set Y be formed by picking each vertex of a set X ⊆ V independently with probability q. The expected size of Ball(v, X, Y) is at most 1/q.

PROOF. Consider the sequence 〈x_1, x_2, . . .〉 of vertices of set X arranged in non-decreasing order of their distances from v. The vertex x_i will belong to Ball(v, X, Y) only if none of x_1, . . . , x_i is selected for the set Y. Therefore, x_i ∈ Ball(v, X, Y) with probability at most (1 − q)^i. Hence, using linearity of expectation, the expected number of vertices in Ball(v, X, Y) is at most Σ_i (1 − q)^i < 1/q.
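For readers who want the last step of the proof spelled out, the bound is a geometric series. The derivation below is a sketch in our own notation (it assumes 0 < q ≤ 1 and extends the finite sum over the vertices of X to an infinite one, which can only increase it).

```latex
\mathbb{E}\bigl[\,|\mathrm{Ball}(v, X, Y)|\,\bigr]
  = \sum_{i \ge 1} \Pr\bigl[x_i \in \mathrm{Ball}(v, X, Y)\bigr]
  \le \sum_{i \ge 1} (1 - q)^{i}
  = \frac{1 - q}{q}
  < \frac{1}{q}.
```

In particular, a sampling probability of q = n^{−1/2}, as used for the 3-approximate oracle in Section 3, gives an expected ball size below √n.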

In order to answer an approximate distance query in constant time, Ball(v, X, Y) can be kept in a hash table [Fredman et al. 1984] so that one can determine in worst case constant time whether or not x ∈ Ball(v, X, Y), and if so, report the distance δ(v, x). Once we have the set Ball(v, X, Y), the preprocessing time as well as the space requirement of the corresponding hash table would be of the order of |Ball(v, X, Y)|.

2.2. COMPUTING BALLS EFFICIENTLY. Thorup and Zwick [2005] presented a very efficient way to compute Balls. Their algorithm computes Ball(v, X, Y) for all v ∈ V in a somewhat indirect way: for each vertex x ∈ X\Y, compute all those vertices v ∈ V whose Ball(v, X, Y) contains x, together with their distances from x. This indirect way of computing Balls turns out to be quite efficient due to the following lemma.

LEMMA 2.4 [THORUP AND ZWICK 2005]. If a vertex x ∈ X belongs to Ball(v, X, Y), then x also belongs to Ball(u, X, Y) for every vertex u lying on a shortest path between x and v.

PROOF. The proof is by contradiction. Given that x belongs to Ball(v, X, Y), let u be a vertex lying on a shortest path between v and x. The vertex x would not belong to Ball(u, X, Y) if and only if there is some vertex w ∈ Y such that δ(u, w) ≤ δ(u, x). On the other hand, using the triangle inequality it follows that δ(v, w) ≤ δ(v, u) + δ(u, w). Since u lies on a shortest path between v and x, we also have δ(v, x) = δ(v, u) + δ(u, x). Therefore, δ(u, w) ≤ δ(u, x) would imply that δ(v, w) ≤ δ(v, x). In other words, for the vertex v, the vertex w ∈ Y is not farther than the vertex x. Hence, x does not belong to Ball(v, X, Y), a contradiction.

It follows from Lemma 2.4 that if we consider a shortest-path tree rooted at a vertex x ∈ X, the set of all those vertices v ∈ V satisfying the condition “x ∈ Ball(v, X, Y)” appears as a truncated sub-tree rooted at x. This implies that, for a vertex x ∈ X, computing the set of vertices v ∈ V satisfying “x ∈ Ball(v, X, Y)” amounts to performing a restricted BFS traversal from x, wherein we have to confine the BFS traversal within this sub-tree only. The algorithm is described below and is a variant of the Modified Dijkstra algorithm given by Thorup and Zwick [2005], specialized to unweighted graphs.

Algorithm Restricted BFS(x, Y)

    // Q is a queue which is empty initially, and
    // Visited is an array with Visited[u] ← false ∀u ∈ V initially.
    Visited[x] ← true; δ(x, x) ← 0; Enqueue(Q, x);
    while Q not empty
        v ← Dequeue(Q);
        for each neighbor w of v
            if not Visited[w] and (δ(x, v) + 1 < δ(w, p(w, Y)))
                Visited[w] ← true;
                δ(x, w) ← δ(x, v) + 1;
                Enqueue(Q, w);

We can now state the following lemma whose proof is immediate:

LEMMA 2.5. The algorithm Restricted BFS(x, Y) begins with exploring the adjacency list of x, and afterward, it explores the adjacency list of only those vertices v that satisfy “x ∈ Ball(v, X, Y)”.

In order to compute Ball(v, X, Y) for all v ∈ V, it suffices to perform Restricted BFS(x, Y) on all x ∈ X\Y. The running time of the algorithm is proportional to the number of edges traversed. To analyze this running time concisely, the following scheme is quite useful. For each edge that is traversed during Restricted BFS(x, Y), the cost of the traversal of the edge is assigned to the endpoint that traverses it. Using this charging scheme, it follows from Lemma 2.5 that Restricted BFS(x, Y) will charge deg(v) cost to a vertex v ∈ V if x ∈ Ball(v, X, Y) and nil otherwise. We can thus state the following lemma.

LEMMA 2.6 [THORUP AND ZWICK 2005]. Given a graph G = (V, E), and sets X, Y of vertices, we can compute Ball(v, X, Y), ∀v ∈ V, by running Restricted BFS(x, Y) on each x ∈ X\Y. During this computation, each vertex v ∈ V will be charged O(deg(v)|Ball(v, X, Y)|) cost.
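The procedure of Lemma 2.6 can be illustrated with a small Python sketch. It is a sketch under assumptions, not the authors' implementation: the function names, the dictionary-based adjacency lists, and the use of `math.inf` for δ(v, ∅) are choices made here for illustration.

```python
from collections import deque
from math import inf

def distances_to_set(adj, Y):
    """delta(v, Y) for every v via one multi-source BFS (inf if unreachable)."""
    dist = {v: inf for v in adj}
    queue = deque()
    for y in Y:
        dist[y] = 0
        queue.append(y)
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if dist[w] == inf:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def restricted_bfs(adj, x, dist_to_Y):
    """Returns {v: delta(x, v)} for exactly those v whose ball contains x.
    Assumes x is not in Y, so that x lies in its own ball."""
    dist = {x: 0}
    queue = deque([x])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            # Visit w only if x is strictly closer to w than Y is.
            if w not in dist and dist[v] + 1 < dist_to_Y[w]:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def compute_balls(adj, X, Y):
    """balls[v] maps each x in Ball(v, X, Y) to delta(v, x)."""
    dist_to_Y = distances_to_set(adj, Y)
    balls = {v: {} for v in adj}
    for x in X:
        if dist_to_Y[x] == 0:   # x is in Y, so it belongs to no ball
            continue
        for v, d in restricted_bfs(adj, x, dist_to_Y).items():
            balls[v][x] = d
    return balls
```

Storing each ball as a dictionary mirrors the hash tables mentioned in Section 2.1: membership of x in Ball(v, X, Y) and the distance δ(v, x) are both available by one lookup.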

3. The 3-Approximate Distance Oracle of Thorup and Zwick

Let G = (V, E) be an undirected graph. The 3-approximate distance oracle of Thorup and Zwick can be constructed as follows:

Let S ⊂ V be a set formed by selecting each vertex independently with probability n^{−1/2}. For each vertex v ∈ V, compute p(v, S) and the distances to vertices of set Ball(v, V, S). In addition, compute distances δ(v, x) for every x ∈ S (see Figure 1).

FIG. 1. Ball(v, V, S). For v, we store distance to the vertices pointed by arrows.

The final data structure would keep, for each vertex v ∈ V, the nearest vertex p(v, S) and two hash tables: one hash table storing the distances to vertices of Ball(v, V, S), and another hash table storing the distances to all vertices of the set S. Using these hash tables, any distance query can be answered as follows:

Answering a distance query. Let u, v ∈ V be any two vertices whose approximate distance is to be computed. First it is determined whether or not u ∈ Ball(v, V, S), and if so, the exact distance δ(u, v) is reported. Note that u ∉ Ball(v, V, S) would imply δ(v, p(v, S)) ≤ δ(v, u). In this case, report the distance δ(v, p(v, S)) + δ(u, p(v, S)), which is at least δ(u, v) (using the triangle inequality) and upper-bounded by 3δ(u, v) as shown below.

δ(v, p(v, S)) + δ(u, p(v, S)) ≤ δ(v, p(v, S)) + (δ(u, v) + δ(v, p(v, S))) (1)

= 2δ(v, p(v, S)) + δ(u, v) (2)

≤ 2δ(u, v) + δ(u, v) = 3δ(u, v) (3)

It follows from Lemma 2.3 that the expected size of Ball(v, V, S) would be at most √n, and hence the total expected size of the data structure would be O(n^{3/2}). Thorup and Zwick showed that it takes expected O(m√n) time to build the 3-approximate distance oracle.
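To make the data structure and the query rule concrete, here is a minimal Python sketch. It is deliberately naive and is an assumption-laden illustration rather than the preprocessing algorithm analyzed in this article: balls are computed by a full BFS from every vertex, the graph is assumed connected, S is assumed non-empty and is passed in explicitly instead of being sampled, and all names (`bfs`, `build_oracle`, `query`) are hypothetical.

```python
from collections import deque
from math import inf

def bfs(adj, sources):
    """BFS from a list of sources: (dist, near), where near[v] is the
    source closest to v (ties broken by BFS order)."""
    dist = {v: inf for v in adj}
    near = {v: None for v in adj}
    queue = deque()
    for s in sources:
        dist[s], near[s] = 0, s
        queue.append(s)
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if dist[w] == inf:
                dist[w], near[w] = dist[v] + 1, near[v]
                queue.append(w)
    return dist, near

def build_oracle(adj, S):
    """Stores p(v, S), the distances from every s in S, and Ball(v, V, S).
    Brute force for clarity; assumes a connected graph and non-empty S."""
    d_S, p = bfs(adj, list(S))
    from_S = {s: bfs(adj, [s])[0] for s in S}
    ball = {}
    for v in adj:
        d_v, _ = bfs(adj, [v])
        ball[v] = {x: d for x, d in d_v.items() if d < d_S[v]}
    return p, from_S, ball

def query(oracle, u, v):
    """Exact distance if u is in Ball(v, V, S); otherwise the detour
    through p(v, S), which is at most 3 * delta(u, v)."""
    p, from_S, ball = oracle
    if u in ball[v]:
        return ball[v][u]
    s = p[v]
    return from_S[s][v] + from_S[s][u]
```

For example, `query(build_oracle(adj, S), u, v)` returns an estimate between δ(u, v) and 3δ(u, v); only the preprocessing here is slow, while the query performs a constant number of dictionary lookups.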

3.1. IDEAS FOR IMPROVING THE PREPROCESSING TIME FOR COMPUTING THE ORACLE. Let us explore ways to improve the preprocessing time of the 3-approximate distance oracle for unweighted graphs. There are two main computational tasks involved in building the 3-approximate distance oracle. The first task is the computation of Ball(v, V, S) for all v ∈ V, and the second task is the computation of distances from the sampled vertices S to all the vertices in the graph. Note that there is an additional task of computing δ(v, S) and p(v, S) for every v ∈ V, which is performed in the beginning. But that can be quite easily performed in overall O(m) time as mentioned in Lemma 2.1.

In order to compute Ball(v, V, S) for all v ∈ V, the algorithm of Thorup and Zwick [2005] computes Restricted BFS trees from all the vertices of set V\S, and they show that the expected total time for this task is O(m√n). In their analysis, they first observe from Lemma 2.3 that the expected size of Ball(v, V, S) is O(√n), and then, using Lemma 2.6, they conclude that the expected cost charged to a vertex v is O(deg(v)√n). This analysis shows that the expected time for computing Ball(v, V, S) for all v ∈ V may be as large as Θ(n^{2.5}) for dense graphs. We provide a tighter analysis of the same algorithm and show that its expected time is always bounded by O(n^2) irrespective of how large m might be. The key observation is that the size of Ball(v, V, S) is negatively correlated to the degree of vertex v: if the degree of v is very large (> √n), then it is quite likely that v is going to have a neighbor from the sample S. In this situation, Ball(v, V, S) is going to contain just the vertex v, and so the total computation cost charged to it will be O(deg(v)) instead of O(deg(v)√n). We prove this observation more rigorously later in Lemma 5.2. This crucial observation was missing in the analysis of Thorup and Zwick [2005].

FIG. 2. Analyzing the distance reported by 3-approximate distance oracle of Thorup and Zwick.

It is less obvious how to perform the second task in O(n^2) time. The second task, which computes BFS trees from the vertices of set S, would require O(m|S|) time, which is certainly not O(n^2) when the graph is dense. To improve its preprocessing time to O(n^2) when the given graph is dense, one approach would be to perform the BFS traversal from the vertices of the set S on a sparse subgraph which, in spite of its sparseness, preserves pairwise distances approximately. Such a subgraph is called a spanner.

Definition 3.1. For a graph G = (V, E), and two real numbers α ≥ 1, β ≥ 0, a subgraph (V, Ê), Ê ⊆ E, is said to be an (α, β)-spanner if for all u, v ∈ V, the distance δ̂(u, v) in the spanner satisfies

δ(u, v) ≤ δ̂(u, v) ≤ αδ(u, v) + β.

Note that the sparsity of a spanner comes along with the stretching of the distances in the graph. So one has to be careful in employing an (α, β)-spanner (with α > 1) in the second step, lest one should end up computing a (2α + 1)-approximate distance oracle instead of a 3-approximate distance oracle. To verify it, replace the term δ(u, p(v, S)) in Inequality (1) with the spanner distance δ̂(u, p(v, S)), which may be as large as αδ(u, p(v, S)) + β. To explore the possibility of using a spanner in the second task of the algorithm, let us carefully revisit the distance reporting scheme of the 3-approximate distance oracle of Thorup and Zwick [2005]. For a pair of vertices u, v ∈ V, the distance reported is exact if u ∈ Ball(v, V, S). So let us analyze the case when u lies outside Ball(v, V, S).

Let us partition a shortest path between v and u into two sub-paths: the sub-path of length a covered by Ball(v, V, S), and the sub-path of length x lying outside the ball. With this partition, the distance δ(v, p(v, S)) would be a + 1. Now considering the path between p(v, S) and u that passes through v (see Figure 2), we can conclude that the distance δ(u, p(v, S)) would be bounded by 2a + 1 + x. Consequently, the distance “δ(v, p(v, S)) + δ(u, p(v, S))” as reported by the 3-approximate distance oracle of Thorup and Zwick [2005] is at most 3a + x + 2.
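As a quick check, the bound just derived is consistent with the stretch of 3 shown in Inequalities (1) to (3). The short derivation below is a sketch in the notation of this paragraph; it assumes x ≥ 1, which is our reading of the case under analysis, since u lies outside Ball(v, V, S) and so at least one edge of the path is not covered by the ball.

```latex
\delta(v, p(v, S)) + \delta(u, p(v, S))
  \le (a + 1) + (2a + 1 + x)
  = 3a + x + 2
  \le 3a + 3x
  = 3\,\delta(u, v),
\qquad \text{since } x \ge 1 \text{ and } \delta(u, v) = a + x .
```

The slack in the step 3a + x + 2 ≤ 3a + 3x is 2x − 2, and it is carried entirely by the uncovered part of the path.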

A comparison of this upper bound of 3a + x + 2 on the distance reported with the actual distance δ(u, v) = a + x suggests that there is a possibility of stretching the subpath (of length x) uncovered by Ball(v, V, S) by a factor of 3 and still keeping the distance reported to be 3-approximate. So we may employ a spanner for the second task of the preprocessing algorithm if for each vertex v ∈ V, the shortest paths from v to all the vertices of Ball(v, V, S) ∪ {p(v, S)} are preserved in the spanner. The reader must note that this feature is required in addition to the O(n^{3/2}) size and suitable stretch (α, β) that the spanner must have. We introduce a special kind of spanner, called a parameterized spanner, that has these additional features.

In the following section we define a parameterized (2, 1)-spanner and present a linear-time algorithm for computing it. The concept and the linear-time algorithm for the parameterized spanner are of independent interest. However, the reader who is interested in the parameterized spanner only as a tool for efficient computation of approximate distance oracles requires only the features of the parameterized spanner mentioned in Lemma 4.2 and Theorem 4.5. In Section 5, we describe the expected O(min(n^2, m√n)) time algorithm for 3-approximate distance oracles. We extend it to (2k − 1)-approximate distance oracles in Section 6.

4. A Parameterized (2, 1)-Spanner

Definition 4.1. Given a graph G = (V, E) and a parameter S ⊂ V, a subgraph (V, E_S), E_S ⊆ E, is said to be a parameterized (2, 1)-spanner with respect to S if

(i) (V, E_S) is a (2, 1)-spanner.

(ii) If a vertex has no neighbor in the set S, then all the edges incident on it are present in the spanner.

(iii) All edges incident on S are present in the spanner.

Feature (ii) ensures that any edge (u, v) for which either u or v is not adjacent to S is present in the spanner. Let G = (V, E) be the given undirected unweighted graph and S be a subset of vertices of the graph. We shall now present an algorithm that computes a parameterized (2, 1)-spanner (V, E_S) with respect to S. The algorithm starts with an empty spanner, that is, E_S = ∅, and adds edges to E_S in the following three steps.

(1) Forming the clusters: We form clusters (of vertices) around the vertices of the set S. Initially the clusters are {{u} | u ∈ S}. Each u ∈ S will be referred to as the center of its cluster. We process each vertex v ∈ V\S as follows.

(a) If v is not adjacent to any vertex in S, we add every edge incident on v to the set E_S.

(b) If v is adjacent to some vertex o ∈ S, assign v to the cluster centered at o ∈ S and add the edge (v, o) to the set E_S. In case there is more than one vertex in S adjacent to v, break ties arbitrarily. (We will break such ties in a definitive way in the context of the construction of approximate distance oracles.)


This is the end of the first step of the algorithm. At this stage let E′ denote the set of all the edges of the given graph excluding the edges added to the spanner and all the intra-cluster edges. Let V′ be the set of vertices corresponding to the end-points of the edges E′. It is evident that each vertex of the set V′ is either a vertex from S or adjacent to some vertex from S, and the first step has partitioned V′ into disjoint clusters, each centered around some vertex from S. The graph (V′, E′) and the corresponding clustering are passed on to the second step.

(2) Adding edges between vertices and clusters: For each vertex v ∈ V′, we group all its neighbors into their respective clusters. There will be at most |S| neighboring clusters of v. For each cluster adjacent to vertex v, we select an (arbitrary) edge from the set of edges between (the vertices of) the cluster and v, and add it to the set E_S.

(3) Adding edges incident on S: Finally, we also add to the set E_S all the edges incident on vertices of S that did not get added in the two steps given above, to complete the construction of the spanner.

Note that steps 1(a) and (3) ensure that the final spanner satisfies conditions (ii) and (iii) of Definition 4.1, so that it is a parameterized spanner with respect to S.

The algorithm described above is very similar to the algorithm for a 3-spanner given in Baswana and Sen [2003]. However, there is a subtle difference both in the construction and the analysis, and we shall highlight this difference towards the end. It is easy to verify that the implementation of the three steps of the algorithm merely requires traversal of the adjacency list of each vertex a constant number of times. Thus the running time of the algorithm is O(m). For details of the O(m) time implementation, the reader is referred to Baswana and Sen [2003].
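The three steps above can be sketched in Python as follows. This is a minimal illustration, not the O(m) implementation of Baswana and Sen [2003]: the function name `parameterized_spanner`, the adjacency-set representation, and the use of `min` to break ties in step 1(b) are all assumptions made for the example.

```python
def parameterized_spanner(adj, S):
    """Build the edge set E_S of a parameterized (2,1)-spanner w.r.t. S.

    adj: dict mapping each vertex to the set of its neighbours
         (undirected, unweighted, no self-loops; vertices assumed comparable).
    S:   set of sampled vertices (the cluster centers).
    Returns (edges, center): edges is a set of frozenset({u, v}),
    center maps every clustered vertex to the center of its cluster.
    """
    edges = set()
    center = {s: s for s in S}          # each u in S is its own center

    # Step 1: form clusters around the vertices of S.
    for v in adj:
        if v in S:
            continue
        sampled_nbrs = adj[v] & S
        if not sampled_nbrs:
            # 1(a): v is not adjacent to S, so keep every edge incident on v.
            edges.update(frozenset((v, w)) for w in adj[v])
        else:
            # 1(b): join the cluster of one adjacent center.
            # Tie-breaking by min() is an arbitrary choice for this sketch.
            o = min(sampled_nbrs)
            center[v] = o
            edges.add(frozenset((v, o)))

    # Step 2: for each clustered vertex, keep one edge to every other
    # adjacent cluster (intra-cluster edges are skipped).
    for v in center:
        seen = {center[v]}
        for w in adj[v]:
            c = center.get(w)
            if c is not None and c not in seen:
                seen.add(c)
                edges.add(frozenset((v, w)))

    # Step 3: keep all edges incident on S.
    for s in S:
        edges.update(frozenset((s, w)) for w in adj[s])
    return edges, center


# Usage: in the complete graph K4 with S = {0, 1}, only the edge (2, 3) is dropped.
k4 = {v: {w for w in range(4) if w != v} for v in range(4)}
edges, center = parameterized_spanner(k4, {0, 1})
print(sorted(tuple(sorted(e)) for e in edges))
```

In the K4 example both unsampled vertices join the cluster of vertex 0, so the edge (2, 3) is intra-cluster and is the only edge left out.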

LEMMA 4.2. If (u, v) ∈ E is an edge not present in the parameterized spanner (V, E_S) as constructed above, there is a one-edge or a two-edge path in the spanner between the vertex u and the center of the cluster containing vertex v, and vice versa.

PROOF. There are two cases depending upon whether u and v belong to the same cluster or to different clusters.

Case 1. (u and v belong to the same cluster). Let the cluster centered at o ∈ S contain both the vertices u and v. It follows from step 1(b) of the algorithm that the edges (u, o) and (v, o) are present in the spanner. So there is a one-edge path from v to the center of the cluster containing u, and vice versa.

Case 2. (u and v belong to different clusters). Let x and y be the centers of the clusters containing u and v, respectively. It follows that u is adjacent to the cluster centered at y; therefore, an edge between u and some vertex, say w, of this cluster has been added to the spanner in the second step of the algorithm. It follows from step 1(b) of the algorithm that the edge (w, y) is also present in the spanner. So there is a two-edge path u − w − y between u and y in the spanner. Using similar arguments, there is also a two-edge path between v and x in the spanner.

We can easily extend the statement of Lemma 4.2 to the edges in E_S, and so to the entire set E, with the following terminology. Let C(x) denote the vertex x if x does not belong to any cluster; otherwise let C(x) denote the center of the cluster containing vertex x (a vertex that belongs to S is its own center). With this terminology, Lemma 4.2 states that for each edge (u, v) ∉ E_S, there is a path in the spanner between u and C(v) (and vice versa) of length at most two. This statement trivially holds for (u, v) ∈ E_S too if u and v are not clustered. Note that the distance between a vertex x and C(x) is at most one. Using this observation, the statement holds for (u, v) ∈ E_S even when u or v belongs to some cluster. We shall now prove the following important theorem:

THEOREM 4.3. For a graph G = (V, E) and a subset S ⊂ V, the parameterized spanner (V, E_S) with respect to S computed by the algorithm above is a (2, 1)-spanner.

PROOF. To show that δ̂(u, w) ≤ 2δ(u, w) + 1, where δ̂ denotes distance in the subgraph (V, E_S), consider the sequence s_uw : 〈u(= v_0), v_1, . . . , v_{j−1}, w(= v_j)〉 of the vertices of a shortest path between u and w in the original graph, in the order in which they appear on it. Now consider another sequence c_uw : 〈v_0, C(v_1), v_2, C(v_3), . . .〉 formed by replacing each vertex v_{2i+1} of the sequence s_uw by C(v_{2i+1}), for i < j/2. It follows that each pair of consecutive vertices in the sequence c_uw is connected by a path of length at most two in the spanner. So there is a path of length at most 2j in the spanner between u and the last vertex of the sequence c_uw. Note that the last vertex of c_uw is w or C(w) depending on whether the path length j is even or odd. If j is even, this implies that δ̂(u, w) ≤ 2j. Otherwise (j is odd), note that either C(w) is the vertex w itself or there is an edge between C(w) and w in the spanner (V, E_S). Thus, δ̂(u, w) ≤ 2j + 1. Combining the two cases (even and odd j), and recalling that j = δ(u, w), we conclude that δ̂(u, w) ≤ 2δ(u, w) + 1.

Now let us analyze the size of the parameterized (2, 1)-spanner (V, E_S) when the set S is formed by selecting each vertex independently with probability q. In the first step of the algorithm, the expected number of edges contributed by a vertex v is at most 1 + deg(v)(1 − q)^{deg(v)}, which is O(1/q). Hence the expected number of edges added to the spanner during the first step is O(n/q). During the second step, the expected number of edges added to the spanner is at most n^2 q, since we add at most one edge for each vertex-cluster pair and the expected number of clusters is nq. The third step contributes an expected number of at most 2mq edges to the spanner, which is bounded by n^2 q. Hence, we can state the following lemma.

LEMMA 4.4. Let a set S be formed by selecting each vertex independently with probability q. Then the expected number of edges in the parameterized (2, 1)-spanner (V, E_S) is O(n/q + n^2 q).

As mentioned earlier, the running time of the algorithm is O(m). We repeat the algorithm if the size of the spanner exceeds twice its expected size; by Markov's inequality this happens with probability at most 1/2, so the expected number of iterations is O(1). Thus, a (2, 1)-spanner of size O(n/q + n^2 q) can be computed in O(m) expected time. Now we state the following theorem, which highlights the crucial role played by the parameterized (2, 1)-spanner in constructing approximate distance oracles in expected O(n^2) time.

THEOREM 4.5. Let G = (V, E) be a graph and let S be a set formed by selecting each vertex independently with probability q. There is an expected O(m) time computable parameterized (2, 1)-spanner (V, E_S) with the following properties:


(1) For every v ∈ V, the spanner preserves a shortest path from v to p(v, S) and to every vertex u such that δ(v, u) < δ(v, S).

(2) The number of edges in the spanner is O(min(m, n/q + n^2 q)).

PROOF. Let u be a vertex such that δ(v, u) < δ(v, S), and let v(= v_0), v_1, . . . , u(= v_j) be a shortest path between v and u in the original graph G = (V, E). Note that none of the vertices v_i, i < j, has any neighbor from the set S. Therefore, condition (ii) of Definition 4.1 implies that all the edges (v_i, v_{i+1}), i < j, are present in the parameterized spanner (V, E_S). Hence, the shortest path from v to u is preserved in the spanner.

We shall now show that a shortest path from v to p(v, S) is also preserved in the parameterized (2, 1)-spanner. If p(v, S) is adjacent to v in the graph, then the shortest path between the two vertices is just the edge (v, p(v, S)). Since p(v, S) ∈ S and we add to the spanner all the edges which are incident on S (see Definition 4.1, condition (iii)), the edge (v, p(v, S)) is surely present in the parameterized spanner. So let us consider the case when p(v, S) is not adjacent to v. In this case, δ(v, p(v, S)) is greater than 1; let v′ be the second last vertex on a shortest path from v to p(v, S). So δ(v, v′) = δ(v, S) − 1. As shown above, the shortest path from v to v′ is preserved in the parameterized spanner. The edge (v′, p(v, S)) is also present in the spanner, as implied by Definition 4.1 (condition (iii)). Hence the complete shortest path between v and p(v, S) is preserved in the parameterized spanner (V, E_S).

If we construct the set S by selecting each vertex independently with probability q = n^{−1/2}, we get a bound of O(n^{3/2}) on the size of the (2, 1)-spanner. Thorup and Zwick [2005] showed that the Θ(n^{3/2}) bound on the size of a (3, 0)-spanner is indeed worst-case optimal. Since a (2, 1)-spanner is also a (3, 0)-spanner, the Θ(n^{3/2}) bound is worst-case optimal for a (2, 1)-spanner too.
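As a quick check of the size bound, substituting q = n^{−1/2} into the expression n/q + n^2 q of Lemma 4.4 gives the following; the second line only illustrates that this choice of q balances the two terms.

```latex
\begin{align*}
\left.\frac{n}{q} + n^{2}q\,\right|_{q = n^{-1/2}}
  &= n \cdot n^{1/2} + n^{2} \cdot n^{-1/2} = 2n^{3/2}, \\
\frac{d}{dq}\left(\frac{n}{q} + n^{2}q\right)
  &= -\frac{n}{q^{2}} + n^{2} = 0 \iff q = n^{-1/2}.
\end{align*}
```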

THEOREM 4.6. An undirected unweighted graph on n vertices and m edges can be processed in expected O(m) time to compute its (2, 1)-spanner of size O(min(m, n^{3/2})).

Note that, although the two have the same asymptotic size, a (2, 1)-spanner is better than a 3-spanner since it achieves a better ratio δ̂(u, v)/δ(u, v) of spanner distance to actual distance, which is close to 2 for vertices separated by long distances.

We would also like to highlight the subtle difference between the algorithm for a (2, 1)-spanner presented in this section and the algorithm for a 3-spanner described in Baswana and Sen [2003]. The algorithm for a 3-spanner ensures that for each edge (u, v) not in the spanner, either there is a path from u to C(v) of length at most two or a path from v to C(u) of length at most two. But, as the reader may verify in Theorem 4.3, we require the existence of both these paths to prove that the spanner is a (2, 1)-spanner.

5. A 3-Approximate Distance Oracle in Expected O(min(n^2, m√n)) Time

We now describe the algorithm for computing a 3-approximate distance oracle that requires expected O(min(n^2, m√n)) time. The preprocessing algorithm is essentially the same as that of Thorup and Zwick [2005], except that, for every v ∈ V, we compute the distance to the vertices of the sampled set in a parameterized (2, 1)-spanner rather than in the original graph.

Algorithm for computing a 3-approximate distance oracle

Let S ⊆ V contain each vertex of V independently with probability n^{−1/2}.

(1) Compute p(v, S) and Ball(v, V, S) for each v ∈ V.

(2) Compute a parameterized (2, 1)-spanner (V, E_S) with respect to the set S (see Note below).

(3) Compute the distance δ̂(v, x) for each v ∈ V, x ∈ S in the spanner (V, E_S).

Note. While computing a (2, 1)-spanner in the second step of the preprocessing algorithm, we shall adopt the following strategy: In case a vertex v is a neighbor of more than one vertex from the set S, instead of assigning it to the cluster centered at an arbitrarily picked neighbor from S, we shall assign v to the cluster centered at p(v, S). As a result, Lemma 4.2 for the (2, 1)-spanner can be restated as follows:

LEMMA 5.1. If an edge (u, v) is not present in the parameterized (2, 1)-spanner (V, E_S), there is a one-edge or a two-edge path in the spanner between the vertex p(u, S) and the vertex v, and vice versa.
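Step (1) of the preprocessing can be illustrated with plain breadth-first searches. The sketch below is a simplified stand-in, not the Restricted BFS subroutine analyzed in the paper: it finds p(v, S) with one multi-source BFS and then grows each ball by a BFS from v that is cut off at radius δ(v, S). The names `nearest_in_S` and `ball_of` are hypothetical.

```python
from collections import deque


def nearest_in_S(adj, S):
    """Multi-source BFS from S.

    Returns (dist, p) where dist[v] = δ(v, S) and p[v] is a nearest
    vertex of S (ties are broken by BFS discovery order).
    """
    dist = {s: 0 for s in S}
    p = {s: s for s in S}
    queue = deque(S)
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                p[y] = p[x]
                queue.append(y)
    return dist, p


def ball_of(adj, v, dist_to_S):
    """Ball(v, V, S): vertices strictly closer to v than S is, with exact distances."""
    radius = dist_to_S.get(v, float("inf"))
    d = {v: 0} if radius > 0 else {}     # the ball is empty when v is in S
    queue = deque(d)
    while queue:
        x = queue.popleft()
        if d[x] + 1 >= radius:           # neighbours of x would fall outside the ball
            continue
        for y in adj[x]:
            if y not in d:
                d[y] = d[x] + 1
                queue.append(y)
    return d


# Usage on the path 0-1-2-3-4 with S = {4}.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
dist, p = nearest_in_S(path, {4})
print(ball_of(path, 0, dist))
```

This direct approach is easy to follow but does not by itself give the running-time bounds discussed in this section, which rely on the charging argument for Restricted BFS.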

In the next section, we shall show that the new oracle is indeed a 3-approximate distance oracle. But prior to that, we shall prove that the expected time to compute the oracle is O(min(n^2, m√n)) as follows.

There are three main tasks in the construction of our oracle. The second task involves the computation of a parameterized (2, 1)-spanner with respect to S. It follows from Theorem 4.5 that this step would require expected O(m) time and the size of the spanner will be O(min(m, n^{3/2})). The third task is accomplished by performing a BFS traversal from each vertex of S in the parameterized (2, 1)-spanner. This task would require expected O(min(m, n^{3/2}) · |S|) = O(min(m√n, n^2)) time since the expected size of S is √n. Let us now analyze the computation time for the first task. It follows from Lemma 2.1 that computing p(v, S) for all v ∈ V requires just O(m) time in total. Using arguments which essentially combine Lemmas 2.1 and 2.2, Thorup and Zwick showed that it takes expected O(m√n) time to compute Ball(v, V, S) for all v ∈ V. We shall now show that the net expected time required to compute Ball(v, V, S) for all v ∈ V never exceeds O(n^2), irrespective of how large m may be.

LEMMA 5.2. The expected time for computing Ball(v, V, S) for all v ∈ V never exceeds O(n^2).

PROOF. We compute Ball(v, V, S) for all vertices v ∈ V by executing the subroutine Restricted BFS(x, S) on each x ∈ V\S. It follows from Lemma 2.2 that the expected cost charged to vertex v during all these subroutines will be of the order of

\[ \deg(v) \cdot \sum_{u \in V} \Pr[u \in \mathrm{Ball}(v, V, S)]. \]

Therefore, in order to establish an O(n^2) bound on the expected time for computing Ball(v, V, S) for all v ∈ V, we just need to show that this expected cost is O(n). Let 〈v(= v_1), v_2, . . . , v_n〉 be the sequence of vertices V\{v} arranged in nondecreasing order of their distances from v. Now v_j ∈ Ball(v, V, S) if and only if none of the vertices within distance δ(v, v_j) from v is present in S. Certainly, there are at least max(j, deg(v)) such vertices. Furthermore, each vertex is selected in the set S independently with probability q = n^{−1/2}. So the expected cost charged to v can be bounded as follows.

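\begin{align*}
\sum_{u \in V} \deg(v) \cdot \Pr[u \in \mathrm{Ball}(v, V, S)]
  &\le \sum_{j} \deg(v) \cdot (1-q)^{\max(j,\,\deg(v))} \\
  &= \sum_{j \le \deg(v)} \deg(v) \cdot (1-q)^{\deg(v)} + \sum_{j > \deg(v)} \deg(v) \cdot (1-q)^{j} \\
  &\le (\deg(v))^{2} \cdot (1-q)^{\deg(v)} + \sum_{j > \deg(v)} j \cdot (1-q)^{j} \\
  &\le (\deg(v))^{2} \cdot (1-q)^{\deg(v)} + 1/q^{2} \\
  &\le 1/q^{2} + 1/q^{2} && \{\text{since } x^{2}(1-\alpha)^{x} \le 1/\alpha^{2} \text{ for } 0 < \alpha \le 1,\ x > 0\} \\
  &= O(n) && \{\text{since } q = n^{-1/2}\}
\end{align*}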
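The two elementary bounds used in the chain above can be sanity-checked numerically. The sketch below is only an illustration over a finite range of integers, not a proof; the helper names `worst_ratio` and `tail_sum` and the chosen ranges are assumptions of the example.

```python
def worst_ratio(alpha, x_max=5000):
    """Largest value of x^2 (1-alpha)^x * alpha^2 over integers x in [1, x_max].

    A value of at most 1 means x^2 (1-alpha)^x <= 1/alpha^2 on that range.
    """
    return max((x * alpha) ** 2 * (1 - alpha) ** x for x in range(1, x_max + 1))


def tail_sum(q, terms=20000):
    """Partial sum of sum_{j>=1} j (1-q)^j; the full series equals (1-q)/q^2."""
    return sum(j * (1 - q) ** j for j in range(1, terms + 1))


# Check both bounds for q = n^{-1/2} with two sample values of n.
for n in (100, 10_000):
    q = n ** -0.5
    print(n, worst_ratio(q) <= 1, tail_sum(q) <= 1 / q ** 2)
```

Both checks print True for the sampled values, consistent with each of the two terms being at most 1/q^2 = n.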

Based on Lemma 5.2 and the preceding discussion, we can conclude that the expected preprocessing time for the new approximate distance oracle is O(min(n^2, m√n)). The expected size of the oracle is E[∑_v |Ball(v, V, S)| + |S| · n] which, using Lemma 2.1, is at most 2n^{3/2}. We shall repeat the algorithm if the size exceeds 4n^{3/2}; by Markov's inequality, the probability of this event is at most 1/2. So the expected number of repetitions will be O(1). Hence, we can state the following theorem.

THEOREM 5.3. The new approximate distance oracle can be computed in expected O(min(n^2, m√n)) preprocessing time, and it occupies O(n^{3/2}) space.

5.1. REPORTING DISTANCE WITH STRETCH AT MOST 3. We describe below the query answering procedure for the new approximate distance oracle. It differs from the algorithm of Thorup and Zwick [2005] only for the case when u ∉ Ball(v, V, S).

Q(u, v): Reporting approximate distance between u and v

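    If u ∈ Ball(v, V, S)
        return δ(u, v)
    Else
        return the minimum of
            δ(u, p(u, S)) + δ̂(v, p(u, S))  and  δ(v, p(v, S)) + δ̂(u, p(v, S))

Here δ̂ denotes the distance in the spanner (V, E_S) computed in step (3) of the preprocessing algorithm.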
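The query procedure amounts to one hash-table lookup followed by at most two additions. A minimal Python sketch is given below, assuming the preprocessing has already produced four lookup tables; the names `ball`, `p`, `d_p` and `spanner_dist`, and their dictionary layout, are assumptions of this example rather than the paper's data structures.

```python
def query_3approx(u, v, ball, p, d_p, spanner_dist):
    """Report a 3-approximate distance between u and v.

    ball[v]         : dict x -> δ(v, x) for x in Ball(v, V, S)
    p[v]            : the nearest sampled vertex p(v, S)
    d_p[v]          : δ(v, p(v, S))
    spanner_dist[s] : dict x -> distance from s to x in the spanner, for s in S
    """
    if u in ball[v]:
        return ball[v][u]                        # exact distance is stored
    via_pu = d_p[u] + spanner_dist[p[u]][v]      # route through p(u, S)
    via_pv = d_p[v] + spanner_dist[p[v]][u]      # route through p(v, S)
    return min(via_pu, via_pv)


# Hand-built tables for the path 0-1-2-3-4 with S = {2}; for this graph the
# parameterized spanner contains every edge, so spanner distances are exact.
ball = {0: {0: 0, 1: 1}, 1: {1: 0}, 2: {}, 3: {3: 0}, 4: {4: 0, 3: 1}}
p = {v: 2 for v in range(5)}
d_p = {0: 2, 1: 1, 2: 0, 3: 1, 4: 2}
spanner_dist = {2: {0: 2, 1: 1, 2: 0, 3: 1, 4: 2}}

print(query_3approx(0, 4, ball, p, d_p, spanner_dist))  # 4, the exact distance
print(query_3approx(0, 1, ball, p, d_p, spanner_dist))  # 3, stretch 3 for true distance 1
```

The second query shows that the stretch of 3 can be attained: vertex 0 is not in Ball(1, V, S), so the answer is routed through the sampled vertex 2.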

We shall now show that the new oracle ensures a stretch of at most 3, that is, for any pair of vertices (u, v), the distance reported is at most 3 times δ(u, v). Clearly, since distances in the spanner are no smaller than distances in G, the triangle inequality implies that the reported distance is at least δ(u, v).

THEOREM 5.4. The query answering algorithm Q(u, v) reports a 3-approximate distance between u and v.

PROOF. It follows from step (3) in the construction of the new approximate distance oracle that if either u or v is a member of the set S, then the distance reported is at most 2δ(u, v) + 1 ≤ 3δ(u, v). So assume from now onwards that u ∉ S and v ∉ S.

FIG. 3. Analyzing the path from v to u when δ(u, p(u, S)) > 1.

If u ∈ Ball(v, V, S), we report the exact distance between u and v since we, just like Thorup and Zwick [2005], store the exact distance from v to all the vertices of Ball(v, V, S). So it is the case u ∉ Ball(v, V, S) that needs to be analyzed carefully. There are the following two subcases.

Case 1. δ(u, p(u, S)) = 1. Let v_0(= u), v_1, . . . , v_l(= v) be a shortest path between u and v. Observe that δ(u, p(u, S)) = 1 implies that u must be adjacent to p(u, S). It follows from Lemma 5.1 that there is a path between p(u, S) and v_1 in the spanner that consists of at most 2 edges. Furthermore, by the property of a (2, 1)-spanner, the distance δ̂(v_1, v_l) between v_1 and v_l in the spanner is not greater than 3(l − 1). Hence δ̂(v, p(u, S)) ≤ 3(l − 1) + 2. So

δ(u, p(u, S)) + δ̂(v, p(u, S)) ≤ 1 + 3(l − 1) + 2 = 3l = 3δ(u, v).

Case 2. δ(u, p(u, S)) > 1. In this case we show that δ(v, p(v, S)) + δ̂(u, p(v, S)) is at most 3δ(u, v) using the properties of the parameterized (2, 1)-spanner. This proof is along the lines sketched in Section 3.1.

A shortest path from v to u can be visualized as a concatenation of two subpaths (see Figure 3): the subpath P_vw of length a lying inside Ball(v, V, S) and the subpath P_wu of length x ≥ 1 outside the ball. (In case a = 0, the vertex w is the same as v.) Let the length of the path P_wu be stretched to x′ in the parameterized spanner. Since the spanner is a parameterized spanner with respect to S, it follows from Theorem 4.5 that δ̂(v, p(v, S)) = a + 1 and δ̂(v, w) = a. Now considering the path from u to p(v, S) passing through v, it follows that δ̂(u, p(v, S)) ≤ 2a + 1 + x′. Hence,

δ(v, p(v, S)) + δ̂(u, p(v, S)) ≤ 3a + 2 + x′

To ensure that this distance is not more than three times the actual distance δ(u, v) = a + x, all we need to show is that x′ ≤ 3x − 2. Let (u′, u) be the last edge of the path P_wu. Now observe that δ(u, p(u, S)) > 1 = δ(u, u′). So it follows from Theorem 4.5 that the edge (u′, u) must be present in the parameterized spanner. Moreover, the part of the subpath P_wu excluding the edge (u′, u) is of length x − 1, and cannot be stretched to more than 3(x − 1) in the (2, 1)-spanner. Hence, x′ ≤ 3(x − 1) + 1 = 3x − 2, and we are done.
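For reference, the arithmetic of Case 2 can be written out in one chain, with \hat{\delta} standing for distance in the spanner:

```latex
\begin{align*}
\delta(v, p(v,S)) + \hat{\delta}(u, p(v,S))
  &\le 3a + 2 + x' \\
  &\le 3a + 2 + (3x - 2) && \text{using } x' \le 3(x-1) + 1 \\
  &= 3(a + x) = 3\,\delta(u, v).
\end{align*}
```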

Combining Theorems 5.3 and 5.4, we can state the following theorem:

THEOREM 5.5. An undirected unweighted graph on n vertices and m edges can be preprocessed in expected O(min(n^2, m√n)) time to compute a 3-approximate distance oracle of size O(n^{3/2}).


FIG. 4. (i) Schematic description of Ball(v, S_{i−1}, S_i), i ≤ k. (ii) Hierarchy of balls around v.

6. (2k − 1)-Approximate Distance Oracle in Expected O(n^2) Time for k > 2

In this section, we shall describe the construction of a (2k − 1)-approximate distance oracle for k > 2, which can be viewed as a generalization of the new 3-approximate distance oracle.

Algorithm for computing a (2k − 1)-approximate distance oracle

S_1 ← V
For i ← 2 to k
    let S_i contain each element of S_{i−1}, independently, with probability n^{−1/k}

(1) For i ← 2 to k
        Compute p(v, S_i) and Ball(v, S_{i−1}, S_i) for all v ∈ V

(2) Compute a parameterized (2, 1)-spanner (V, E_S) with S = S_k (see Note below).

(3) Compute δ̂(v, x), the distance in the spanner (V, E_S) for S = S_k, for each v ∈ V, x ∈ S_k.

Note. While computing a (2, 1)-spanner in the second step of the preprocessing algorithm, we shall adopt the following strategy: In case a vertex v is a neighbor of more than one vertex from the set S_k, instead of assigning it to the cluster centered at an arbitrarily picked neighbor from S_k, we shall assign v to the cluster centered at p(v, S_k).

The final data structure keeps k hash tables per vertex v ∈ V as follows: For 1 < i ≤ k, the set of vertices Ball(v, S_{i−1}, S_i) ∪ {p(v, S_i)} and their distances from v are kept in a hash-table denoted by Ball_{i−1}(v) henceforth (see Figure 4). The distances from v to all the vertices of the set S_k in the (2, 1)-spanner are kept in a hash-table denoted by Ball_k(v).

Since S_i is formed by selecting each vertex of the set S_{i−1} with probability n^{−1/k}, it follows from Lemma 2.1 that the expected size of Ball(v, S_{i−1}, S_i) is at most n^{1/k}. Hence, each hash-table Ball_i(v), i ≤ k, would occupy expected O(n^{1/k}) space. It follows that the final data structure will require expected O(kn^{1+1/k}) space.
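The nested sampling S_1 ⊇ S_2 ⊇ · · · ⊇ S_k is straightforward to sketch. The function below is an illustration only; the name `sample_hierarchy`, the list-of-sets return value and the optional `rng` argument are assumptions of this example. It does not handle the situation where S_k turns out to be empty, which a full implementation would need to address (for instance by resampling).

```python
import random


def sample_hierarchy(vertices, k, rng=None):
    """Return [S_1, ..., S_k] with S_1 = V and each S_i sampled from S_{i-1}.

    Every element of S_{i-1} is kept in S_i independently with
    probability n^(-1/k), where n = |V| (assumed to be at least 1).
    """
    rng = rng or random.Random()
    n = len(vertices)
    prob = n ** (-1.0 / k)
    levels = [set(vertices)]                     # S_1 = V
    for _ in range(2, k + 1):
        prev = levels[-1]
        levels.append({v for v in prev if rng.random() < prob})
    return levels                                # levels[i - 1] is S_i


# Usage: with n = 1000 and k = 3 the expected sizes are 1000, 100 and 10.
levels = sample_hierarchy(range(1000), 3, random.Random(0))
print([len(s) for s in levels])
```

Since each level keeps a fraction n^{−1/k} of the previous one in expectation, the expected size of S_i is n^{1−(i−1)/k}.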


Let us first analyze the computation time required for the new distance oracle. There are three main tasks in the preprocessing algorithm. The second task involves the computation of a parameterized (2, 1)-spanner with respect to S_k. It follows from Theorem 4.5 that it would require expected O(m) time and the size of the spanner would be O(min(m, n^{2−1/k})). The third task requires computation of BFS trees in the spanner from each vertex of the set S_k, and hence can be carried out in expected O(min(mn^{1/k}, n^2)) time since the expected size of S_k is O(n^{1/k}). Now let us analyze the first task, which requires computation of p(v, S_i) and Ball(v, S_{i−1}, S_i). Similar to the 3-approximate distance oracle, Ball(v, S_{i−1}, S_i) for all v ∈ V is computed by performing Restricted BFS(x, S_i) on every vertex x ∈ S_{i−1}\S_i. Using arguments which essentially combine Lemmas 2.1 and 2.2, Thorup and Zwick showed that it takes expected O(mn^{1/k}) time to compute Ball(v, S_{i−1}, S_i) for all v ∈ V. Using an improved analysis, which is basically a generalization of Lemma 5.2, we shall now provide a tighter bound on the expected time required to compute Ball(v, S_{i−1}, S_i) for all v ∈ V.

LEMMA 6.1. The expected time required to compute Ball(v, S_{i−1}, S_i), ∀v ∈ V, never exceeds O(n^{(k+i)/k}).

PROOF. We compute Ball(v, S_{i−1}, S_i) for all v ∈ V by executing the subroutine Restricted BFS(x, S_i) on each x ∈ S_{i−1}\S_i. It follows from Lemma 2.2 that the expected computation cost charged to a vertex v during all these subroutines is of the order of ∑_u deg(v) · Pr[u ∈ Ball(v, S_{i−1}, S_i)]. It would suffice to show that this expected cost is O(n^{i/k}). To do so, let us first calculate Pr[u ∈ Ball(v, S_{i−1}, S_i)]. Let D_v^u denote the set of vertices within distance δ(u, v) from v. It follows from Definition 2.2 that u will belong to Ball(v, S_{i−1}, S_i) iff u ∈ S_{i−1} and none of the vertices of the set D_v^u is present in S_i. Now whether or not a vertex belongs to the set S_i is independent of any other vertex. Using this independence,

\begin{align*}
\Pr[u \in \mathrm{Ball}(v, S_{i-1}, S_i)]
  &= \Pr[u \in S_{i-1} \wedge u \notin S_i] \cdot \prod_{w \in D_v^u \setminus \{u\}} \Pr[w \notin S_i] \\
  &= \Pr[u \in S_{i-1}] \cdot \Pr[u \notin S_i \mid u \in S_{i-1}] \cdot \prod_{w \in D_v^u \setminus \{u\}} \Pr[w \notin S_i].
\end{align*}

Now note that Pr[u ∈ S_i | u ∈ S_{i−1}] ≥ Pr[u ∈ S_i] and, therefore, Pr[u ∉ S_i | u ∈ S_{i−1}] ≤ Pr[u ∉ S_i]. So

\begin{align*}
\Pr[u \in \mathrm{Ball}(v, S_{i-1}, S_i)]
  &\le \Pr[u \in S_{i-1}] \cdot \Pr[u \notin S_i] \cdot \prod_{w \in D_v^u \setminus \{u\}} \Pr[w \notin S_i] \\
  &= n^{\frac{-i+2}{k}} \cdot \prod_{w \in D_v^u} \Pr[w \notin S_i] \\
  &= n^{\frac{-i+2}{k}} \cdot \Pr[u \in \mathrm{Ball}(v, V, S_i)].
\end{align*}

Hence, the expected computation cost charged to v is at most n^{(−i+2)/k} ∑_u deg(v) · Pr[u ∈ Ball(v, V, S_i)]. Using arguments similar to those used in Lemma 5.2, it follows that ∑_u deg(v) · Pr[u ∈ Ball(v, V, S_i)] = O(1/q^2), where q = n^{(−i+1)/k}. Hence, the expected cost charged to vertex v is O(n^{i/k}), and we are done.
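The exponent arithmetic behind the last step, and the passage from the per-vertex cost to the bound in the statement of Lemma 6.1, can be spelled out as follows.

```latex
\begin{align*}
n^{\frac{-i+2}{k}} \cdot \frac{1}{q^{2}}
  &= n^{\frac{-i+2}{k}} \cdot n^{\frac{2(i-1)}{k}} && \text{since } q = n^{\frac{-i+1}{k}} \\
  &= n^{\frac{i}{k}}, \\
\sum_{v \in V} O\!\left(n^{\frac{i}{k}}\right)
  &= O\!\left(n^{1 + \frac{i}{k}}\right) = O\!\left(n^{\frac{k+i}{k}}\right).
\end{align*}
```

The bound is largest at i = k, where it equals O(n^2).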

The preprocessing algorithm computes Ball(v, S_{i−1}, S_i) for all v ∈ V and for each i ≤ k. Hence using Lemma 6.1 and the preceding discussion, we can conclude that the expected preprocessing time for the new approximate distance oracle is O(min(n^2, kmn^{1/k})). The expected size of the oracle is of the order of kn^{1+1/k}. We shall repeat the algorithm if the size exceeds twice this bound; by Markov's inequality, the probability of this event is at most 1/2. So the expected number of repetitions will be O(1). Hence we can state the following theorem.

THEOREM 6.2. The new approximate distance oracle can be constructed in expected O(min(n^2, kmn^{1/k})) time and occupies O(kn^{1+1/k}) space.

6.1. REPORTING APPROXIMATE DISTANCE WITH STRETCH AT MOST (2k − 1). Just as for the new 3-approximate distance oracle, there is only a subtle difference between the query answering procedure of the new oracle and that of the previously existing (2k − 1)-approximate distance oracle of Thorup and Zwick [2005]. First, we provide an overview of the underlying query answering procedure before we formally state it.

Let u and v be two vertices whose approximate distance is to be reported. In order to explain the query answering procedure, we introduce the following notation.

For a pair of vertices u and v, a vertex w is said to be t-near to u if δ(u, w) ≤ tδ(u, v). It follows from the triangle inequality that if a vertex is t-near to u then it is also (t + 1)-near to v.
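The claim about (t + 1)-nearness is a one-line application of the triangle inequality:

```latex
\[
\delta(v, w) \;\le\; \delta(v, u) + \delta(u, w) \;\le\; \delta(u, v) + t\,\delta(u, v) \;=\; (t+1)\,\delta(u, v).
\]
```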

The entire query answering process is a search for a t-near vertex of u whose distance to both u and v is known, with t < k. The search is performed iteratively as follows: The ith iteration begins with a vertex w ∈ S_i which is (i − 1)-near to u (and hence i-near to v) and whose distance from u is known. It is determined whether or not its distance to v is known. Since w ∈ S_i, we do so by checking whether or not w is present in the hash-table Ball_i(v). Now w ∉ Ball_i(v) only if, for the vertex v, the vertex p(v, S_{i+1}) is equidistant or nearer than the vertex w. But this would imply that p(v, S_{i+1}) is also i-near to v, and note that its distance from v is known. Therefore, if the distance δ(v, w) is not known, we continue in the next iteration with vertex w = p(v, S_{i+1}), and swap the roles of u and v. In this way we proceed gradually, searching over the sets S_1, . . . , S_k. We are bound to find such a vertex within at most k iterations since the distance from S_k is known to all the vertices. We now formally state the query answering procedure, which is essentially the same as that of Thorup and Zwick [2005] except for the 'If' instruction at the end.

Q(u, v): Reporting approximate distance between u and v

    i ← 1
    While p(u, S_i) ∉ Ball_i(v) do
        swap(u, v)
        i ← i + 1
    If i < k
        return δ(u, p(u, S_i)) + δ(v, p(u, S_i))
    Else
        return the minimum of
            δ(u, p(u, S_k)) + δ̂(v, p(u, S_k))  and  δ(v, p(v, S_k)) + δ̂(u, p(v, S_k))

Here δ̂ denotes the distance in the spanner (V, E_S) for S = S_k.
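The query loop can be sketched in Python as follows, assuming the preprocessing has filled three tables indexed by level. The names `p`, `d_p` and `balls` and their nested-dictionary layout are assumptions of this example; `balls[i][v]` plays the role of the hash-table Ball_i(v), with `balls[k][v]` holding spanner distances from v to every vertex of S_k.

```python
def query_oracle(u, v, k, p, d_p, balls):
    """Report a (2k-1)-approximate distance between u and v.

    p[i][v]     : p(v, S_i), with p[1][v] = v since S_1 = V
    d_p[i][v]   : δ(v, p(v, S_i))
    balls[i][v] : dict w -> distance, the hash-table Ball_i(v)
    """
    i = 1
    while p[i][u] not in balls[i][v]:
        u, v = v, u                      # swap the roles of u and v
        i += 1
    if i < k:
        w = p[i][u]
        return d_p[i][u] + balls[i][v][w]
    # i == k: take the better of the two routes through S_k.
    via_u = d_p[k][u] + balls[k][v][p[k][u]]
    via_v = d_p[k][v] + balls[k][u][p[k][v]]
    return min(via_u, via_v)


# Hand-built tables for the path 0-1-2-3-4 with k = 3, S_2 = {1, 3}, S_3 = {3}.
p = {1: {v: v for v in range(5)},
     2: {0: 1, 1: 1, 2: 1, 3: 3, 4: 3},
     3: {v: 3 for v in range(5)}}
d_p = {1: {v: 0 for v in range(5)},
       2: {0: 1, 1: 0, 2: 1, 3: 0, 4: 1},
       3: {0: 3, 1: 2, 2: 1, 3: 0, 4: 1}}
balls = {1: {0: {0: 0, 1: 1}, 1: {1: 0}, 2: {2: 0, 1: 1}, 3: {3: 0}, 4: {4: 0, 3: 1}},
         2: {0: {1: 1, 3: 3}, 1: {1: 0, 3: 2}, 2: {3: 1}, 3: {3: 0}, 4: {3: 1}},
         3: {0: {3: 3}, 1: {3: 2}, 2: {3: 1}, 3: {3: 0}, 4: {3: 1}}}

print(query_oracle(0, 4, 3, p, d_p, balls))  # stops at level 2
print(query_oracle(4, 2, 3, p, d_p, balls))  # reaches level k = 3
```

The loop always terminates by level k because p(u, S_k) belongs to S_k and every vertex stores its spanner distance to all of S_k.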

Based on the arguments above, the following lemma holds.


LEMMA 6.3 [THORUP AND ZWICK 2005]. At the beginning of the ith iteration of the While loop, δ(u, p(u, S_i)) ≤ (i − 1)δ(u, v).

THEOREM 6.4. The query answering algorithm Q(u, v) reports (2k − 1)-approximate distance between u and v.

PROOF. The query answering algorithm performs iterations of the While loop until the condition of the loop fails. As mentioned above, there will be at most (k − 1) successful iterations before the condition of the loop fails. Suppose the condition fails at the beginning of the ith iteration. Then the following inequality follows from Lemma 6.3 when we exit the While loop.

δ(u, p(u, S_i)) ≤ (i − 1)δ(u, v)    (4)

We shall analyze the distance reported by the oracle on the basis of the final valueof i .

If i < k, the reported distance is δ(u, p(u, S_i)) + δ(v, p(u, S_i)), which using the triangle inequality is at most 2δ(u, p(u, S_i)) + δ(u, v). Using Inequality (4), this distance is at most (2i − 1)δ(u, v) < (2k − 1)δ(u, v).

We have to analyze the case i = k now. It follows from the last step of the construction of the (2k − 1)-approximate distance oracle that if either u or v belongs to S_k, the distance reported will be at most 3δ(u, v). So from now onwards, assume that u ∉ S_k and v ∉ S_k. There are the following two cases:

Case 1. δ(u, v) < δ(u, p(u, S_k)). It follows from Theorem 4.5 that δ̂(v, u) = δ(v, u) and δ̂(u, p(u, S_k)) = δ(u, p(u, S_k)), where δ̂ denotes distance in the spanner. Combining these two equalities and using the triangle inequality, it follows that δ̂(v, p(u, S_k)) ≤ δ(v, u) + δ(u, p(u, S_k)). Hence, the distance reported is at most (2k − 1)δ(u, v), as shown below.

δ(u, p(u, S_k)) + δ̂(v, p(u, S_k)) ≤ 2δ(u, p(u, S_k)) + δ(u, v)
    ≤ 2(k − 1)δ(u, v) + δ(u, v)    {using Inequality (4)}
    = (2k − 1)δ(u, v)

Case 2. δ(u, v) ≥ δ(u, p(u, S_k)). It follows from the triangle inequality that δ̂(v, p(u, S_k)) ≤ δ̂(v, u) + δ̂(u, p(u, S_k)), where δ̂ denotes distance in the spanner. It follows from Theorem 4.5 that δ̂(u, p(u, S_k)) = δ(u, p(u, S_k)) and δ̂(v, u) ≤ 3δ(u, v). Hence, the distance reported by the oracle is bounded by

δ(u, p(u, S_k)) + δ̂(v, p(u, S_k)) ≤ δ(u, p(u, S_k)) + (δ(u, p(u, S_k)) + 3δ(u, v))
    = 2δ(u, p(u, S_k)) + 3δ(u, v)
    ≤ 5δ(u, v)    {since δ(u, v) ≥ δ(u, p(u, S_k))}.

Thus, the approximate distance oracle achieves stretch 5 in this case, which is at most (2k − 1) for any integer k > 2.

Combining Theorems 6.2 and 6.4, we can state the following theorem:

THEOREM 6.5. An undirected unweighted graph on n vertices and m edges can be preprocessed in expected O(min(n^2, kmn^{1/k})) time to compute a (2k − 1)-approximate distance oracle of size O(kn^{1+1/k}), for any integer k > 2.


7. Conclusion and Open Problems

In this article, we presented an expected O(min(n^2, kmn^{1/k}))-time algorithm for computing a (2k − 1)-approximate distance oracle for unweighted graphs. Recently Roditty et al. [2005] gave a deterministic algorithm that computes a (2k − 1)-approximate distance oracle for weighted graphs in O(kmn^{1/k}) time. It is an important open problem to explore whether it is possible to compute a (2k − 1)-approximate distance oracle for weighted graphs in expected or deterministic O(n^2) time.

Another result of the article is a simple, expected linear-time algorithm to compute a (2, 1)-spanner of O(n^{3/2}) size, which is worst-case optimal. The algorithm is obtained by a slight modification (for k = 2) of the algorithm of Baswana and Sen [2003], which computes a (2k − 1)-spanner of O(kn^{1+1/k}) size in expected O(km) time. It is a natural and interesting problem to extend our algorithm to arbitrary k, that is, to design a linear-time algorithm for computing a (k, k − 1)-spanner of size O(kn^{1+1/k}). Subsequent to the submission of our article, this problem was solved in Baswana et al. [2005b].

ACKNOWLEDGMENTS. We wish to thank the anonymous referees who provided very detailed and insightful comments that have improved the overall presentation and organization. Their feedback also resulted in an O(log n) improvement in the running time from a previous version as well as rectification of a subtle drawback in Step (2) of the algorithm for computing a (2k − 1)-approximate distance oracle.

REFERENCES

AINGWORTH, D., CHEKURI, C., INDYK, P., AND MOTWANI, R. 1999. Fast estimation of diameter and shortest paths (without matrix multiplication). SIAM J. Comput. 28, 1167–1181.

AWERBUCH, B., BERGER, B., COWEN, L., AND PELEG, D. 1998. Near-linear time construction of sparse neighborhood covers. SIAM J. Comput. 28, 263–277.

BASWANA, S., GOYAL, V., AND SEN, S. 2005a. All-pairs nearly 2-approximate shortest paths in O(n^2 polylog n) time. In Proceedings of 22nd Annual Symposium on Theoretical Aspects of Computer Science. Lecture Notes in Computer Science, vol. 3404. Springer-Verlag, New York, 666–679.

BASWANA, S. AND SEN, S. 2003. A simple linear-time algorithm for computing a (2k − 1)-spanner of O(n^{1+1/k}) size in weighted graphs. In Proceedings of the 30th International Colloquium on Automata, Languages and Programming. Lecture Notes in Computer Science, vol. 2719. Springer-Verlag, New York, 384–396.

BASWANA, S., TELIKEPALLI, K., MEHLHORN, K., AND PETTIE, S. 2005b. New construction of (α, β)-spanners and purely additive spanners. In Proceedings of 16th Annual ACM-SIAM Symposium on Discrete Algorithms (Vancouver, BC, Canada). ACM, New York, 672–681.

CHAN, T. 2005. All-pairs shortest paths with real edge weights in O(n^3 / log n) time. In Proceedings of Workshop on Algorithms and Data Structures. Lecture Notes in Computer Science, vol. 3608. Springer-Verlag, New York, 318–324.

COHEN, E. 1998. Fast algorithms for constructing t-spanners and paths with stretch t. SIAM J. Comput. 28, 210–236.

COHEN, E. AND ZWICK, U. 2001. All-pairs small stretch paths. J. Algor. 38, 335–353.

DOR, D., HALPERIN, S., AND ZWICK, U. 2000. All pairs almost shortest paths. SIAM J. Comput. 29, 1740–1759.

ELKIN, M. 2005. Computing almost shortest paths. ACM Trans. Algor. 1, 282–323.

ERDOS, P. 1964. Extremal problems in graph theory. In Theory of Graphs and its Applications (Proc. Sympos. Smolenice, 1963). Publ. House Czechoslovak Acad. Sci., Prague, 29–36.

FREDMAN, M. L., KOMLOS, J., AND SZEMEREDI, E. 1984. Storing a sparse table with O(1) worst case time. JACM 31, 538–544.

HALPERIN, S. AND ZWICK, U. 1996. Linear time deterministic algorithm for computing spanners for unweighted graphs. Unpublished manuscript.


PETTIE, S. 2004. A new approach to all-pairs shortest paths on real-weighted graphs. Theoret. Comput. Sci. 312, 47–74.

RODITTY, L., THORUP, M., AND ZWICK, U. 2005. Deterministic construction of approximate distance oracles and spanners. In Proceedings of 32nd International Colloquium on Automata, Languages and Programming. Lecture Notes in Computer Science, vol. 3580. Springer-Verlag, New York, 261–272.

THORUP, M. AND ZWICK, U. 2005. Approximate distance oracles. JACM 52, 1–24.

RECEIVED MAY 2004; REVISED JULY AND SEPTEMBER 2005; ACCEPTED JUNE 2006
