




Information Processing Letters 24 (1987) 217-221 North-Holland

2 March 1987

COMPUTING DOMINATORS IN PARALLEL

Shaunak R. PAWAGI *

Department of Computer Science, State University of New York at Stony Brook, Stony Brook, Long Island, NY 11794, U.S.A.

P.S. GOPALAKRISHNAN *

Department of Computer Science, University of Maryland, College Park, MD 20742, U.S.A.

I.V. RAMAKRISHNAN *

Department of Computer Science, State University of New York at Stony Brook, Stony Brook, Long Island, NY 11794, U.S.A.

Communicated by David Gries
Received 1 October 1985
Revised 6 June 1986

We present a fast parallel algorithm for computing the dominators of a directed acyclic graph. The model of computation used is a parallel random access machine that allows simultaneous reads but prohibits simultaneous writes into the same memory location. Let Pt(n) be the processor complexity of computing the transitive closure of an n-vertex directed graph on this model. The only known parallel algorithm for dominators requires O(log² n) time and uses O(nPt(n)) processors. Our algorithm for dominators has the same time complexity but uses O(Pt(n)) processors, thereby improving the processor complexity of the previously known algorithm by a factor of n.

Keywords: Parallel algorithm, dominator, transitive closure

1. Introduction

Computing the dominators of a directed acyclic graph (DAG) is a very important code optimization step in compilers (see [2] for details). Consequently, this problem has attracted widespread attention and several excellent sequential algorithms have been developed for it (see [1]). Of late, growing interest in parallel computation has led to a proliferation of parallel algorithms for

* The first author was supported by the Air Force Office of Scientific Research under Contract F-49620-85-K-0009, the second author by grants from the National Science Foundation to the Machine Intelligence and Pattern Analysis Laboratory, and the third author by the Office of Naval Research under Contract N00014-84-K-0530, and by the National Science Foundation under Grant ECS-84-04399.

graph problems on a model of synchronous parallel computation [6,7]. Surprisingly, despite the importance of the dominator problem, the only

known parallel algorithm for it is due to Savage [6]. This algorithm computes dominators using a

parallel random access machine that allows simultaneous reads but prohibits simultaneous writes into the same memory location. We refer to this model as R-PRAM. A powerful variation of this model that allows simultaneous writes into the same memory location by more than one processor is referred to as W-PRAM.

Let G = (V, E) be a DAG rooted at r, with

|V| = n and |E| = m. We say that vertex i is a dominator of vertex j if i is on every path from r to j. For each vertex k, Savage first constructs a graph G_k by deleting from G all edges leaving k. Next, the transitive closure of each of these graphs




is computed in parallel. Let G* and G_k* denote the transitive closure of G and G_k, respectively. Dominators are then computed by using the simple observation that if i is reachable from r in G* and not in G_k*, then k is a dominator for i. Savage's algorithm requires O(log² n) time ¹ and uses O(nPt(n)) processors, where Pt(n) is the processor complexity of computing the transitive closure in O(log² n) time.
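A sequential sketch of Savage's idea follows (the function and variable names below are ours, not Savage's): for each vertex k, the edges leaving k are deleted, and the vertices that thereby lose their reachability from r are exactly the vertices, other than k itself, that k dominates. Savage's algorithm performs the n transitive-closure computations in parallel; here they are simulated one after another by depth-first search.

```python
def savage_dominators(n, edges, r):
    """Return dom, where dom[k] is the set of vertices dominated by k.

    n      -- number of vertices, labelled 0..n-1
    edges  -- list of directed edges (u, w) of a DAG rooted at r
    r      -- the root
    """
    def reachable(adj, start):
        # Plain depth-first search; plays the role of one row of a transitive closure.
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return seen

    adj = {u: [] for u in range(n)}
    for (u, w) in edges:
        adj[u].append(w)

    in_g = reachable(adj, r)                       # reachability in G (the role of G*)
    dom = {}
    for k in range(n):
        # G_k: delete all edges leaving k.
        adj_k = {u: ([] if u == k else adj[u]) for u in range(n)}
        in_gk = reachable(adj_k, r)                # reachability in G_k (the role of G_k*)
        # i reachable in G but not in G_k  =>  k dominates i; k also dominates itself.
        dom[k] = {i for i in in_g if i not in in_gk} | {k}
    return dom
```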

In the following section we describe our algorithm for computing the dominators on an R-PRAM. Our approach is radically different from Savage's as well as from those used in sequential algorithms. The time and processor complexity of our algorithm are O(log² n) and O(Pt(n)), respectively. Observe here that the processor requirement of our algorithm has decreased by a factor of n over Savage's.

It is interesting to note that parallel algorithms for many problems on undirected graphs have the same processor bound as that of the connected component problem [7]. In the design of algorithms for problems on directed graphs, computing the transitive closure is as fundamental as computing the connected components for problems on undirected graphs. Therefore, parallel algorithms for problems on directed graphs should be considered processor-efficient if they achieve the same processor bound as that of computing the transitive closure. Our algorithm does achieve this bound for processor complexity.

2. Preliminaries

We begin with a review of graph-theoretic terms. Let G = (V, E) denote a graph where V is a

finite set of vertices and E is a set of pairs of vertices, called edges. If the edges are unordered pairs, then G is undirected; else it is directed. Throughout, we assume that V consists of the set of vertices {1, 2, ..., n} and |E| = m. We denote the undirected edge joining the vertices u and v by (u, v) and the directed edge from u to v by ⟨u, v⟩. An adjacency matrix A of G is an n × n Boolean matrix such that A[u, v] = 1 if and only if ⟨u, v⟩ ∈ E. A directed path in G joining two vertices i_0 and i_k is defined as a sequence of vertices (i_0, i_1, i_2, ..., i_k) such that all of them are distinct and, for each 0 ≤ p < k, ⟨i_p, i_{p+1}⟩ is an edge of G. An undirected path is defined similarly. If i_0 = i_k, then the path is called a cycle. A directed acyclic graph (DAG) has no cycles. We denote a directed path from u to v by [u → v]. We say that an undirected graph G is connected if, for every pair of vertices u and v in V, there is a path in G joining u and v. A tree is a connected undirected graph with no cycles in it. A rooted directed tree has a distinguished vertex called the root from which every other vertex is reachable via a directed path. We say that a vertex u is an ancestor of vertex v if u is on the path from the root to v. The parent of a vertex is its immediate ancestor. A descendant of a vertex is defined similarly. The lowest common ancestor (LCA) of vertices x and y in a rooted tree T is the vertex z such that z is a common ancestor of x and y, and any other common ancestor of x and y in T is also an ancestor of z in T.

¹ Throughout this paper, we use log n to denote ⌈log₂ n⌉.

A directed graph is rooted at r if there is a path from r to every vertex in V. For the rest of this paper, without loss of generality we assume that G is a directed acyclic graph rooted at r. We say that vertex i is a dominator of vertex j if i is on every path from r to j. In particular, for every i in V, r and i are dominators of i. Dominators exhibit transitivity, that is, for vertices i, j, and k in V, whenever i is a dominator of j and j is a dominator of k, i is a dominator of k. Therefore, it is easy to see that the set of dominators of a vertex j can be linearly ordered by their order of occurrence on a path from r to j. Such a path need not be a shortest path, as required in [1]. The dominator of j closest to j (other than j) is called the immediate dominator of j. It follows from the definition that the immediate dominator of every vertex is unique. We can now express the dominator relation as a directed tree T_d rooted at r, called the dominator tree. If u is the immediate dominator of v, then ⟨u, v⟩ is an edge of T_d. Now, note that i is a dominator of j if i is an ancestor of j in T_d.
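A small worked example (the graph below is ours, not taken from the paper) makes these definitions concrete. For the DAG rooted at r = 0 with edges ⟨0,1⟩, ⟨0,2⟩, ⟨1,3⟩, ⟨2,3⟩, ⟨3,4⟩, every path from 0 to 4 passes through 3, so the dominators of 4 are {0, 3, 4}, the immediate dominator of 4 is 3, and the dominator tree T_d has edges ⟨0,1⟩, ⟨0,2⟩, ⟨0,3⟩, ⟨3,4⟩. The sketch below checks the "on every path" definition directly by enumerating all paths of the DAG.

```python
def all_paths(adj, u, t, prefix=()):
    """Yield every directed path from u to t in a DAG given as an adjacency dict."""
    if u == t:
        yield prefix + (t,)
        return
    for w in adj[u]:
        yield from all_paths(adj, w, t, prefix + (u,))

# The example DAG: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3, 3 -> 4, rooted at 0.
adj = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}

# The dominators of 4 are the vertices common to every path from the root to 4.
doms_of_4 = set.intersection(*(set(p) for p in all_paths(adj, 0, 4)))
assert doms_of_4 == {0, 3, 4}
```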

The transitive closure matrix A* is a Boolean matrix such that A*[i, j] is 1 iff there is a directed path from i to j in G. A parallel algorithm for the transitive closure is based on repeated multiplication




of the adjacency matrix. In this parallel algorithm, and (∧) and or (∨) operations replace the multiplication and addition operations of an inner product step. We refer to this as the and-or multiplication of two matrices. The algorithm initializes the transitive closure matrix A* to the adjacency matrix A and then performs O(log n) iterations of the and-or multiplication of A* by itself. On an R-PRAM, given a sufficiently large number of processors, matrix multiplication can be done in O(log n) time, implying that the parallel-time complexity of the transitive closure of a directed graph is O(log² n).
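A minimal sequential sketch of this repeated and-or squaring is given below (the function name is ours). Each squaring also retains the current entries, i.e., it computes C ∨ (C and-or C), which is equivalent to squaring A ∨ I and keeps reachability by paths of every length; after about ⌈log₂ n⌉ squarings, C equals A*. In the parallel version, each of the n² inner and-or products of a squaring is evaluated by its own group of processors in O(log n) time.

```python
def transitive_closure(A):
    """Return A*: A*[i][j] = 1 iff there is a directed path (of length >= 1) from i to j.

    A is an n x n 0/1 adjacency matrix given as a list of lists.
    """
    n = len(A)
    C = [row[:] for row in A]                      # paths of length exactly 1
    for _ in range(max(1, n.bit_length())):        # roughly ceil(log2 n) squarings suffice
        # C := C or (C and-or C): if C captures paths of length <= L,
        # the new C captures paths of length <= 2L.
        C = [[1 if (C[i][j] or any(C[i][k] and C[k][j] for k in range(n))) else 0
              for j in range(n)]
             for i in range(n)]
    return C
```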

The best known sequential algorithm for matrix multiplication is due to Coppersmith and Winograd [4]. Their algorithm multiplies two n × n matrices in O(n^2.49) time. This time bound also represents the processor bound of an efficient parallel implementation of their algorithm on an R-PRAM [3]. Therefore, the best-known value of Pt(n), the processor complexity of computing the transitive closure, is n^2.49. A lower bound for the processor complexity of any algorithm that performs parallel matrix multiplication in O(log n) time is O(n²/log n). This is obvious since we need to examine O(n²) entries in both matrices, and the processor-time product of any parallel algorithm must be at least that much. Therefore, the lower bound for the processor complexity of computing the transitive closure of a directed graph in O(log² n) time (using log n plus-min multiplications of the adjacency matrix) is O(n²/log n).

We now proceed to describe our algorithm for computing dominators.

3. The algorithm

In order to compute the dominator tree, first construct a spanning tree for G that is rooted at r; then compute the set of dominators for each vertex in a matrix DOM such that DOM[i, j] = 1 if i is a dominator of j, and DOM[i, j] = 0 otherwise. The computational steps are as follows (a sequential sketch of the whole procedure is given after step (6)). (1) Compute the transitive closure matrix A* for

G. This computation requires O(log² n) time and uses O(Pt(n)) processors.

(2) Construct a directed spanning tree T_s from the

adjacency matrix A and the transitive closure matrix A*. This is done by specifying the parent of each vertex i to be the smallest vertex j such that ⟨j, i⟩ is an edge of G. This selection can be done in O(log n) time by assigning n/log n processors to each vertex. Since there are n vertices in G, we need O(n²/log n) processors for this step.

(3) For every vertex, mark all its ancestors in T_s as its dominators. That is, set DOM[i, j] to 1 if i is an ancestor of j, and set DOM[i, j] to 0 otherwise. Ancestor computation can be done in O(log n) time using O(n²/log n) processors (see [7]). Initialization of the matrix DOM requires constant time and O(n²) processors. By assigning one processor to log n elements, initialization can be done in O(log n) time using O(n²/log n) processors.

(4) For every vertex v, consider the nontree edges incident on it. For all such edges ⟨x, v⟩, compute the lowest common ancestor of x and v in T_s. Among these LCAs, let h(v) be the vertex closest to the root r (h stands for highest). The lowest common ancestors for all vertex pairs can be computed in O(log n) time using O(n²/log n) processors [7]. For each vertex v, h(v) can be determined in O(log n) time using O(n/log n) processors.

(5) We initialized the DOM matrix using paths in T_s. Now, the nontree edge ⟨x, v⟩ provides a path from r to v (passing through h(v) and x), other than the path present in T_s. Therefore, a vertex, say u, on the path in T_s from h(v) to v is not a dominator of v. Such a vertex u is identified by determining if v is a descendant of u and h(v) is an ancestor of u (vertex u_1 in Fig. 1). For all such vertices u, set DOM[u, v] = 0. To do the above computation in O(log n) time, we need O(n/log n) processors for each vertex. Since there are n vertices in the tree, we use O(n²/log n) processors for this step.

(6) For every vertex y, if u is not a dominator of y, then u is not a dominator of any vertex reachable from y. Therefore, for every vertex v, u is not a dominator for v if there exists at least one vertex y such that u is not a dominator for y and v is reachable from y (vertex u_2 in Fig. 1). Therefore, set DOM[u, v] to 0 if




⋁_{y=1}^{n} ((DOM[u, y] = 0) ∧ (A*[y, v] = 1))

evaluates to true. This computation can be reduced to the and-or multiplication of DOM and A*. Such a reduction can be done in constant time using O(n²) processors, or in O(log n) time using O(n²/log n) processors. Since matrix multiplication requires O(Pt(n)) processors (which is more than n²/log n), the processor complexity of this step is O(Pt(n)).

Fig. 1. (In this figure, u_1 and u_2 are instances of u.)
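The sketch below is a sequential simulation of steps (1)-(6); it is not the parallel implementation, and the helper names are ours. One reading assumption is made explicit: in step (6) we propagate along A* only the zeros introduced in step (5), which is how the and-or product is used in the proof of Lemma 3.1 below. The sketch reuses transitive_closure() from Section 2 and assumes vertices 0..n-1 with root r and a DAG given by its adjacency matrix.

```python
def dominator_matrix(A, r):
    """Sequential simulation of steps (1)-(6): DOM[i][j] = 1 iff i dominates j."""
    n = len(A)
    Astar = transitive_closure(A)                              # step (1)

    # Step (2): directed spanning tree T_s; the parent of i is the smallest j with <j, i> in E.
    parent = [None] * n
    for i in range(n):
        if i != r:
            parent[i] = min(j for j in range(n) if A[j][i])

    def tree_ancestors(v):
        """Ancestors of v in T_s, including v itself."""
        anc = set()
        while v is not None:
            anc.add(v)
            v = parent[v]
        return anc

    def lca(x, y):
        """Lowest common ancestor of x and y in T_s."""
        ax = tree_ancestors(x)
        while y not in ax:
            y = parent[y]
        return y

    def depth(v):
        return len(tree_ancestors(v)) - 1

    # Step (3): mark every tree ancestor of j as a tentative dominator of j.
    DOM = [[0] * n for _ in range(n)]
    for j in range(n):
        for i in tree_ancestors(j):
            DOM[i][j] = 1

    cleared = [[0] * n for _ in range(n)]          # entries zeroed in step (5)
    for v in range(n):
        # Step (4): h(v) = the LCA closest to the root over all nontree edges <x, v>.
        lcas = [lca(x, v) for x in range(n) if A[x][v] and parent[v] != x]
        if not lcas:
            continue
        hv = min(lcas, key=depth)
        # Step (5): vertices strictly between h(v) and v on the tree path do not dominate v.
        for u in range(n):
            if u not in (hv, v) and hv in tree_ancestors(u) and u in tree_ancestors(v):
                DOM[u][v] = 0
                cleared[u][v] = 1

    # Step (6): one and-or product -- if step (5) showed that u does not dominate y,
    # and v is reachable from y, then u does not dominate v either.
    for u in range(n):
        for v in range(n):
            if any(cleared[u][y] and Astar[y][v] for y in range(n)):
                DOM[u][v] = 0
    return DOM
```

On the example DAG used in Section 2 (with A its adjacency matrix and root 0), dominator_matrix(A, 0) marks exactly {0, 3, 4} as the dominators of vertex 4, matching the brute-force check given there.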

This completes the description of our algorithm for dominators. We now provide the proof of its correctness.

Lemma 3.1. If vertex u is not a dominator of vertex v, then at the end of our algorithm DOM[u, v] will be set to 0.

Proof. If u is not on the path from r to v in T_s, then, by definition, u is not a dominator of v. Therefore, DOM[u, v] is set to 0 in step (3) of our algorithm and it stays 0 for all following steps. If u is on the path from r to v in T_s but it is not a dominator of v, then there must exist a path from r to v that does not pass through u. There could be several such paths from r to v, but they are of two types (see Fig. 1). A path of the first type consists of edges in T_s and a nontree edge incident on v. For instance, the path from r to v that passes through LCA(x, v) and x is of the first type. Now, u_1 is on the path from LCA(x, v) to v in T_s. In step (4) of our algorithm, we select h(v) to be the LCA closest to the root r among all LCAs defined by the nontree edges incident on v. Therefore, u_1 must be on the path [h(v) → v], and DOM[u_1, v] will be set to 0 in step (5). The second type of path from r to v consists of edges of T_s and more than one nontree edge. For example, the path from r to v that passes through y is of the second type. Clearly, u_2 is not a dominator of y and, in step (5), DOM[u_2, y] will be set to 0. But v is reachable from y, providing a path from r to v and making u_2 a nondominator of v. Thus, in step (6), which involves the and-or multiplication of DOM and A*, DOM[u_2, v] will be set to 0. Hence, the claim of the lemma follows. □

Theorem 3.2. The above algorithm computes the dominator matrix DOM in O(log² n) time using O(Pt(n)) processors.

Proof. The correctness of our algorithm is proved in Lemma 3.1. All steps except (1) require O(log n) time. Step (1), which involves computing the transitive closure, requires O(log² n) time. Steps (1) and (6) involve matrix multiplication and therefore use O(Pt(n)) processors. All other steps use O(n²/log n) processors. Therefore, the processor complexity of our algorithm is O(Pt(n)), the same as that of computing the transitive closure. □

Given the matrix DOM, the dominator tree can be constructed by determining the immediate dominator for each vertex. Recall that the immediate dominator of a vertex is unique, and it is the closest dominator of that vertex. In the dominator tree, the closest dominator of a vertex




is its parent. The steps for the construction of the dominator tree are as follows (a sequential sketch is given after the list). (1) For every vertex i, count the number of

dominators of i by summing the entries in the ith column of DOM. This summation requires O(n²/log n) processors and O(log n) time.

(2) For every vertex i, determine the immediate dominator of i. If i has d dominators, then the immediate dominator of i is a dominator of i that has d - 1 dominators. This can be done in constant time using n² processors or in O(log n) time using O(n²/log n) processors.

(3) The parent of every vertex i in the dominator tree is its immediate dominator. The root of the tree is r.
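A sequential sketch of these three steps (the helper name is ours), given a dominator matrix DOM such as the one produced by dominator_matrix() above:

```python
def dominator_tree(DOM, r):
    """Return idom, where idom[i] is the parent of i in the dominator tree (None for the root r)."""
    n = len(DOM)
    # Step (1): the number of dominators of i is the column sum of DOM.
    count = [sum(DOM[i][j] for i in range(n)) for j in range(n)]
    idom = [None] * n
    for i in range(n):
        if i == r:
            continue
        # Step (2): the immediate dominator of i is the dominator j != i with count[j] == count[i] - 1.
        idom[i] = next(j for j in range(n)
                       if DOM[j][i] and j != i and count[j] == count[i] - 1)
    # Step (3): idom[i] is the parent of i in the dominator tree rooted at r.
    return idom
```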

The above procedure constructs the dominator tree from the matrix DOM in O(log n) time using O(n²/log n) processors. The time and processor complexities are immediate from the steps given above, and the correctness is proved by the following lemma.

Lemma 3.3. Let d be the number of dominators of i. A dominator j of i is the immediate dominator of i iff j has d - 1 dominators.

Proof. The number of ancestors of any vertex in the dominator tree is equal to the number of its dominators. Since the dominator tree is unique, the immediate dominator, which is the parent of i in the dominator tree, must have d - 1 dominators. □

4. Conclusion

We have described an O(log² n) time algorithm for dominators of a DAG that uses O(Pt(n)) processors, where Pt(n) is the processor complexity of computing the transitive closure of a directed graph. This improves the processor complexity of Savage's [6] algorithm by a factor of n. More importantly, we have tied the processor complexity of computing dominators in parallel to that of computing the transitive closure of a directed graph. Two steps of our algorithm use O(Pt(n)) processors and all other steps use O(n²/log n) processors. As explained earlier, O(n²/log n) is an obvious lower bound for the processor complexity of computing the transitive closure using repeated plus-min multiplication of the adjacency matrix. The best known value for Pt(n) is n^2.49. Therefore, if the processor complexity of computing the transitive closure improves (because of an efficient algorithm for matrix multiplication), the processor complexity of our algorithm will also drop.

Finally, it is worth mentioning that the algorithm presented here would require O(log n) time on a W-PRAM that allows simultaneous writing into the same memory location by more than one processor. The only step in our algorithm that requires O(log² n) time is computing the transitive-closure matrix. All other steps require O(log n) time. Since the transitive closure can be computed in O(log n) time on a concurrent-write model [5], our algorithm would therefore require O(log n) time on a W-PRAM.

Acknowledgment

The authors wish to thank the referees for their helpful comments.

References

[1] A.V. Aho, J.E. Hopcroft and J.D. Ullman, The Design and Analysis of Computer Algorithms (Addison-Wesley, Reading, MA, 1974).

[2] A.V. Aho and J.D. Ullman, Principles of Compiler Design (Addison-Wesley, Reading, MA, 1977).

[3] D. Coppersmith, Private communication, May 1986.

[4] D. Coppersmith and S. Winograd, On the asymptotic complexity of matrix multiplication, SIAM J. Comput. 11 (1982) 472-482.

[5] L. Kucera, Parallel computation and conflicts in memory access, Inform. Process. Lett. 14 (1982) 93-96.

[6] C. Savage, Parallel algorithms for some graph problems, Tech. Rept. #784, Dept. of Mathematics, Univ. of Illinois, Urbana, 1977.

[7] Y. Tsin and F. Chin, Efficient parallel algorithms for a class of graph-theoretic problems, SIAM J. Comput. 14 (1984) 580-599.
