Lecture 3: Parallel Algorithm Design


Techniques of Parallel Algorithm Design

• Balanced binary tree
• Pointer jumping
• Accelerated cascading
• Divide and conquer
• Pipelining
• Multi-level divide and conquer
• ...


Balanced binary tree

Processing on binary tree: Let the leaves correspond to input and internal nodes to processors.

Example

Find the sum of n integers (x1, x2, ... , xn).

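[Figure: a balanced binary tree. The inputs x1, x2, ..., xn are at the leaves and the processors P1, ..., Pn-1 are at the internal nodes. Step 1 adds adjacent pairs of inputs, each later step adds pairs of results from the level below, and step log n produces the output at the root.]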
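The summation example can be simulated level by level. The following is a minimal sketch, not part of the original slides: the function name `tree_sum` is hypothetical, n is assumed to be a power of two, and the list comprehension stands in for the additions that the processors of one level would perform at the same time.

```python
def tree_sum(xs):
    """Sum xs by pairwise addition, one tree level per round.

    Returns (total, number_of_rounds). Assumes len(xs) is a power of two.
    """
    n = len(xs)
    assert n > 0 and n & (n - 1) == 0, "this sketch assumes n is a power of two"
    level = list(xs)
    rounds = 0
    while len(level) > 1:
        # On a PRAM, every addition in this round belongs to a different
        # processor and happens in the same step; here they are simulated.
        level = [level[2 * j] + level[2 * j + 1] for j in range(len(level) // 2)]
        rounds += 1
    return level[0], rounds


print(tree_sum([1, 2, 3, 4, 5, 6, 7, 8]))
```

For n = 8 the loop runs log n = 3 rounds, matching the number of steps in the figure.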


Problem of finding Prefix Sum

Definition of Prefix Sum

Input: n integers stored in array A[1..n] in shared memory

Output: array B[1..n], where B[i] = A[1] + A[2] + ... + A[i] for each i (1 ≤ i ≤ n)

Example   Input: A[1..5] = (5, 8, -7, -10, 3) ,

   Output: B[1..5] = (5, 13, 6, -4, -1)

Sequential algorithm for Prefix Sum

main () {
    B[1] = A[1];
    for (i = 2; i <= n; i++) {
        B[i] = B[i-1] + A[i];
    }
}


Solving Prefix Sum problem on balanced binary tree (1)

Outline of the parallel algorithm for prefix sum

To simplify the problem, let n = 2^k (k is an integer).

(1) Calculate the sub-sums from the leaves to the root, bottom-up.

(2) Using the sub-sums obtained in (1), calculate the prefix sums from the root to the leaves, top-down.


Solving Prefix Sum problem on balanced binary tree (2)

(1) First read the input at the leaves. Then calculate the sub-sums from the leaves to the root, bottom-up.

(2) From the root to the leaves, each node does the following: it sends its right son the value it holds (for the root, the sub-sum obtained in (1)), and it sends its left son that value minus the right son's sub-sum.

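[Figure: the two steps on the input 4, 2, -9, 5, 8, -3, 7, -2, using processors P1 to P4 at the lowest level, P1 and P2 at the next, and P1 at the root.]

Step (1), bottom-up sub-sums:
  Input:   4  2  -9  5  8  -3  7  -2
  Level 1: 6  -4  5  5
  Level 2: 2  10
  Root:    12

Step (2), top-down values:
  Root:    12
  Level 2: 12-10 = 2,  12
  Level 1: 2-(-4) = 6,  2,  12-5 = 7,  12
  Leaves:  6-2 = 4,  6,  2-5 = -3,  2,  7-(-3) = 10,  7,  12-(-2) = 14,  12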


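Solving Prefix Sum problem on balanced binary tree (3)

Correctness of the algorithm

When step (1) has finished, the sub-sum in each internal node is the sum of the leaves in its subtree.

[Figure: the sub-sums after step (1) for the input 4, 2, -9, 5, 8, -3, 7, -2: level 1 holds 6, -4, 5, 5; level 2 holds 2, 10; the root holds 12.]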


Solving Prefix Sum problem on balanced binary tree (4)

Correctness of the algorithm (continued)

In step (2), at each internal node:

(a) The value sent to the right son is the node's own value. At the root this is the sum of its subtree.

(b) The value sent to the left son is the node's own value minus the sum of the right son's subtree.

Hence every node ends up holding the prefix sum up to the last leaf of its subtree.

[Figure: (a) the root sends 12 to its right son; (b) the root sends 12 - 10 = 2 to its left son.]


Solving Prefix Sum problem on balanced binary tree (5)

Algorithm Parallel-PrefixSum (EREW PRAM algorithm)

main () {
    /* processor i copies its input to leaf i */
    if (processor number == i) B[0, i] = A[i];

    /* step (1): bottom-up sub-sums */
    for (h = 1; h <= log n; h++) {
        if (processor number j <= n/2^h) {
            B[h, j] = B[h-1, 2j-1] + B[h-1, 2j];
        }
    }

    /* step (2): top-down prefix sums; the result is C[0, 1..n] */
    C[log n, 1] = B[log n, 1];
    for (h = (log n) - 1; h >= 0; h--) {
        if (processor number j <= n/2^h) {
            if (j is even) C[h, j] = C[h+1, j/2];
            if (j is odd)  C[h, j] = C[h+1, (j+1)/2] - B[h, j+1];
        }
    }
}
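The algorithm above can be checked with a sequential simulation. The following is a sketch under stated assumptions, not the lecture's own code: the function name `parallel_prefix_sum` is hypothetical, n is assumed to be a power of two, and the inner `for j` loops stand in for the processors that would run in parallel on a PRAM. The arrays B and C keep the 1-based index j used on the slide.

```python
def parallel_prefix_sum(a):
    """Simulate Parallel-PrefixSum level by level (n must be a power of two)."""
    n = len(a)
    k = n.bit_length() - 1          # k = log2(n) when n is a power of two
    assert n >= 1 and (1 << k) == n, "this sketch assumes n is a power of two"

    # B[h][j] and C[h][j] use 1-based j as on the slide; index 0 is padding.
    B = [[0] * (n // 2 ** h + 1) for h in range(k + 1)]
    C = [[0] * (n // 2 ** h + 1) for h in range(k + 1)]

    for i in range(1, n + 1):       # read the input at the leaves
        B[0][i] = a[i - 1]

    for h in range(1, k + 1):       # step (1): bottom-up sub-sums
        for j in range(1, n // 2 ** h + 1):
            B[h][j] = B[h - 1][2 * j - 1] + B[h - 1][2 * j]

    C[k][1] = B[k][1]
    for h in range(k - 1, -1, -1):  # step (2): top-down prefix sums
        for j in range(1, n // 2 ** h + 1):
            if j % 2 == 0:          # right son: same value as the parent
                C[h][j] = C[h + 1][j // 2]
            else:                   # left son: parent minus right sibling's sub-sum
                C[h][j] = C[h + 1][(j + 1) // 2] - B[h][j + 1]

    return C[0][1:]


print(parallel_prefix_sum([4, 2, -9, 5, 8, -3, 7, -2]))
```

Each outer loop runs log n times, which is where the O(log n) running time of the parallel version comes from.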


Solving Prefix Sum problem on balanced binary tree (6)

(First step)

[Figure: the array B for n = 8. The leaves B[0,1], ..., B[0,8] hold A[1], ..., A[8]; above them are B[1,1], ..., B[1,4], then B[2,1] and B[2,2], and the root B[3,1].]


Solving Prefix Sum problem on balanced binary tree (7)

(Second step)

[Figure: the array C for n = 8. The root is C[3,1]; below it are C[2,1] and C[2,2], then C[1,1], ..., C[1,4], and the leaves C[0,1], ..., C[0,8].]


Solving Prefix Sum problem on balanced binary tree (8)

Analysis of the algorithm
• Computing time: each for loop is repeated log n times and each iteration can be executed in O(1) time → O(log n) time
• Number of processors: not larger than n → n processors
• Speed up: O(n/log n)
• Cost: O(n log n)

It is not cost optimal, since the optimal sequential algorithm runs in Θ(n) time.


Accelerated cascading

To reduce the cost, solve the problem sequentially when the size of the problem is small.

Accelerated cascading is usually used together with the balanced binary tree and divide and conquer techniques.


Accelerated cascading for Prefix Sum problem

Policy for improving the algorithm

To make the algorithm cost optimal, we decrease the number of processors from n to n/log n.

(Note: the computing time of the algorithm is O(log n).)

Steps:

1. Instead of processing n elements in parallel, divide the n elements into n/log n groups of log n elements each.

2. Assign one processor to each group and solve the problem for the group sequentially.


Accelerated cascading for Prefix Sum problem

Improved algorithm Parallel-PrefixSum

(1) Divide the n elements of A[1..n] into n/log n groups of log n elements each.

(O(1) time, O(n/log n) processors)

(2) Assign one processor to each group and find the prefix sum of each group.

(O(log n) time, O(n/log n) processors)

[Figure: A[1..n] split into groups of log n elements.]


Accelerated cascading for Prefix Sum problem (3)

Improved algorithm Parallel-PrefixSum (continued)

(3) Let S be the set of the last elements of the groups (each is the sum of its group). Use algorithm Parallel-PrefixSum to find the prefix sum of S.

(O(log(n/log n)) = O(log n) time, O(n/log n) processors)

[Figure: the last element of each group is passed to algorithm Parallel-PrefixSum.]


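Accelerated cascading for Prefix Sum problem (4)

Improved algorithm Parallel-PrefixSum (continued)

(4) Use the prefix sum of S to find the prefix sum of the input A[1..n]: each processor adds the prefix sum of the preceding groups to every element of its own group.

(O(log n) time, O(n/log n) processors)

[Figure: the result of (3) is applied to the elements of each group.]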
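Steps (1) to (4) of the improved algorithm can be sketched as follows. This is an illustrative sequential simulation, not the lecture's code: the function name `accelerated_prefix_sum` is hypothetical, the group size is taken as the integer part of log2 n (at least 1), and `itertools.accumulate` stands in for algorithm Parallel-PrefixSum in step (3).

```python
import math
from itertools import accumulate


def accelerated_prefix_sum(a):
    """Prefix sum via accelerated cascading (sequential simulation)."""
    n = len(a)
    if n == 0:
        return []
    g = max(1, int(math.log2(n)))   # assumed group size, about log n

    # Step (1): split A into about n / log n groups of g elements each.
    groups = [a[i:i + g] for i in range(0, n, g)]

    # Step (2): one processor per group computes its local prefix sums.
    local = [list(accumulate(group)) for group in groups]

    # Step (3): prefix sum of S, the last element of each group.
    # accumulate() is only a stand-in for Parallel-PrefixSum here.
    s_prefix = list(accumulate(loc[-1] for loc in local))

    # Step (4): each group adds the total of all preceding groups.
    result = []
    for idx, loc in enumerate(local):
        offset = s_prefix[idx - 1] if idx > 0 else 0
        result.extend(x + offset for x in loc)
    return result


print(accelerated_prefix_sum([5, 8, -7, -10, 3]))
```

Only step (3) needs the tree-based algorithm, and it runs on n/log n values, which is what brings the processor count down.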


Accelerated cascading for Prefix Sum problem (5)

Analysis of the improved algorithm

Computing time and number of processors: each step takes O(log n) time with O(n/log n) processors.

→ In total, O(log n) time and O(n/log n) processors

• Speed up = O(n/log n)
• Cost: O(log n × n/log n) = O(n)

It is cost optimal. It is also time optimal (the proof is not shown here). Hence it is an optimal algorithm.


Divide and conquer

(1) 2 divide and conquer

(2) n^ε divide and conquer (ε < 1)


Divide and conquer technique
• Well-known technique in algorithm design
• Solves problems recursively
• Used very often in both sequential and parallel algorithms

How to divide and conquer

(1) Dividing step: divide the problem into a number of subproblems.

(2) Conquering step: solve each subproblem recursively.

(3) Merging step: merge the solutions of the subproblems into the solution of the original problem.


Convex hull problem

Input: a set of n points in the plane.

Output: the smallest convex polygon that contains all points of the input. (The convex polygon is represented by the list of its vertices in clockwise order.)

• A basic problem in computational geometry.
• Has many applications.
• Can be solved in O(n log n) time sequentially.

In the following we only consider the upper convex hull.

[Figure: ten points P0, ..., P9 and their convex hull, divided into the upper convex hull and the lower convex hull. Output: (P0, P3, P5, P9, P8, P1). Upper convex hull: (P9, P8, P1, P0).]


Merging of two upper convex hulls

[Figure: two upper convex hulls on the points p1, ..., p10. Upper common tangent = (p3, p8).]

Finding the upper common tangent

It is known that common tangents can be found in O(log n) time sequentially.
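To make the definition of the upper common tangent concrete, here is a brute-force sketch. It is deliberately not the O(log n) method mentioned above: it tries every pair of vertices and tests all points, so it is only suitable for illustration. The names `cross` and `upper_common_tangent_bruteforce` are hypothetical, points are (x, y) tuples, and every point of the left hull is assumed to have a smaller x coordinate than every point of the right hull.

```python
def cross(o, a, b):
    """Cross product of (a - o) and (b - o); positive if b is left of o -> a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def upper_common_tangent_bruteforce(left_hull, right_hull):
    """Return (l, r) such that no point lies above the line through l and r.

    Brute force over all vertex pairs; returns None if no pair qualifies.
    """
    points = left_hull + right_hull
    for l in left_hull:
        for r in right_hull:
            # l -> r points to the right, so "on or below the line"
            # means the cross product is not positive.
            if all(cross(l, r, p) <= 0 for p in points):
                return l, r
    return None


left = [(0, 0), (1, 2), (2, 1)]
right = [(3, 1), (4, 3), (5, 0)]
print(upper_common_tangent_bruteforce(left, right))
```

The faster method replaces the two nested loops by a binary search along each hull, which relies on the vertices being stored in order.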


2 divide and conquer (1)

Outline of the algorithm Parallel-UpperConvexHull

Preprocessing: Sort all the points according to their x coordinates, and let the result be the sequence (p1, p2, p3, ... , pn).

(1) If the size of the sequence is 2, return the sequence.

(2) Divide (p1, p2, p3, ... , pn) into the left half and the right half, and find the upper convex hull of each recursively.

(3) Find the upper common tangent of the two upper convex hulls obtained in (2), and output the solution of the problem.
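The recursive structure of the outline can be sketched sequentially as below. This is an assumption-laden illustration rather than the algorithm Parallel-UpperConvexHull itself: the merge in step (3) is replaced by a simple linear scan over the two hulls (a swapped-in technique, not the tangent search), the two recursive calls run one after the other instead of in parallel, and all x coordinates are assumed to be distinct. The function names are hypothetical.

```python
def cross(o, a, b):
    """Cross product of (a - o) and (b - o); negative means a right turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def merge_upper_hulls(left, right):
    """Merge two x-separated upper hulls with a linear stack scan.

    Stand-in for step (3); the slides use an upper common tangent search.
    """
    hull = []
    for p in left + right:
        # Drop the last vertex while it does not make a strict right turn.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def upper_convex_hull(points):
    """Upper convex hull, left to right, by 2 divide and conquer."""
    pts = sorted(points)                     # preprocessing: sort by x

    def solve(seq):
        if len(seq) <= 2:                    # step (1): base case
            return list(seq)
        mid = len(seq) // 2                  # step (2): split in halves
        return merge_upper_hulls(solve(seq[:mid]), solve(seq[mid:]))

    return solve(pts)


print(upper_convex_hull([(3, 1), (0, 0), (5, 0), (1, 2), (4, 3), (2, 1)]))
```

On a PRAM the two calls to `solve` at each level would run at the same time, so the depth of the recursion, not the number of calls, determines the running time.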


2 divide and conquer (2)

How 2 divide and conquer works

Find the upper common tangent for two upper convex hulls of two vertices each.

Find the upper common tangent for two upper convex hulls of four vertices each.

Find the upper common tangent for two upper convex hulls of eight vertices each.


2 divide and conquer (3)

Recursive execution

When the problem is divided once, the size of the subproblems becomes half. Suppose the size of the subproblems becomes 2 when the problem is divided k times.

n/2^k = 2 ⇒ k = log2 n - 1

[Figure: recursion tree with levels n; n/2, n/2; n/4, n/4, n/4, n/4; ...; 2, 2, ..., 2. Height = log2 n.]


2 divide and conquer (4)

Complexity of the algorithm

Preprocessing: Sort the sequence of the points according to their x coordinates.

(1) If the size of the sequence is 2, return the sequence.

(2) Divide the sequence into the left half and the right half, and find the upper convex hull of each recursively.

(3) Find the upper common tangent of the two upper convex hulls obtained in (2), and output the upper convex hull of the sequence.

Preprocessing: O(log n) time, n processors

Steps (1) to (3): each step runs in O(log n) time and uses n/2 processors

T(n) = T(n/2) + O(log n). Therefore, T(n) = O(log² n).

∴ The algorithm runs in O(log² n) time using O(n) processors.

Computational model: there is no concurrent access ⇒ EREW PRAM


2 divide and conquer (5)

Finding the complexity of the algorithm from the recursive tree

[Figure: recursion tree with levels n; n/2, n/2; n/4, n/4, n/4, n/4; ...; 2, 2, ..., 2. Height = log2 n.]

Computing time: the levels of the tree take c log n, c log(n/2), c log(n/4), ..., c time, from the root down to the leaves. In total: O(log² n).

Number of processors: the levels use 2, 4, ..., n/2 processors. At the level of the leaves, n/2 processors are used at the same time. ⇒ n/2 processors

T(n) × P(n) = O(n log² n): it is not cost optimal.
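The total time can be checked by summing the per-level costs of the recursion tree directly. The derivation below is a short sketch under two assumptions that are not stated on the slide: a merge on a subproblem of size m costs c log2 m for some constant c, and n is a power of two.

```latex
T(n) = \sum_{i=0}^{\log_2 n - 1} c \,\log_2 \frac{n}{2^i}
     = c \sum_{i=0}^{\log_2 n - 1} \left( \log_2 n - i \right)
     = c \,\frac{\log_2 n \,(\log_2 n + 1)}{2}
     = O(\log^2 n)
```

The last term of the sum, for subproblems of size 2, is c log2 2 = c, which matches the cost shown at the leaves of the tree.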


n^(1/2) divide and conquer

Outline of the algorithm

Preprocessing: Sort the sequence of the input points according to their x coordinates, and let the result be the sequence (p1, p2, p3, ... , pn).

(1) If the size of the sequence is 2, return the sequence.

(2) Divide (p1, p2, p3, ... , pn) into n^(1/2) equally-sized subsequences, and find the upper convex hull of each recursively.

(3) Merge the n^(1/2) upper convex hulls into the upper convex hull of the sequence.


Merging n^(1/2) upper convex hulls

Assign n^(1/2) processors to each upper convex hull to find the upper common tangents in O(log n) time, and then determine the edges which belong to the solution.

[Figure: Case 1 and Case 2.]


Recursive tree of n^(1/2) divide and conquer

When the problem is divided once, the size of the subproblems becomes n^(1/2). Suppose that the size of the subproblems becomes 2 when the problem is divided k times.

n^(1/2^k) = 2 ⇒ k = log log n

[Figure: recursion tree with levels n; n^(1/2), ..., n^(1/2); n^(1/4), ..., n^(1/4); ...; 2, 2, ..., 2. Height = log log n.]
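The height of this recursion tree can be checked numerically. The sketch below counts how many times the problem size must be replaced by its square root before it reaches 2. The function name `sqrt_division_depth` is hypothetical, and the check is only exact for sizes of the form n = 2^(2^k), where every square root is an integer.

```python
from math import isqrt


def sqrt_division_depth(n):
    """Number of square-root divisions until the subproblem size is 2."""
    depth = 0
    size = n
    while size > 2:
        size = isqrt(size)   # one level of n^(1/2) divide and conquer
        depth += 1
    return depth


# n = 2^16: sizes 65536 -> 256 -> 16 -> 4 -> 2, so k = log log n = 4
print(sqrt_division_depth(2 ** 16))
```

By comparison, halving the same n = 2^16 down to size 2 takes log2 n - 1 = 15 divisions.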


Analysis of the algorithm

Preprocessing: Sort the sequence of the points by their x coordinates.

(1) If the size of the sequence is 2, return the sequence.

(2) Divide the sequence into n^(1/2) equally-sized subsequences, and find the upper convex hull of each recursively.

(3) Find the upper common tangents of the upper convex hulls obtained in (2), and determine the solution.

Preprocessing: O(log n) time, n processors.

Steps (1) to (3): each step O(log n) time, n processors.

T(n) = T(n^(1/2)) + O(log n). Therefore, T(n) = O(log n).

∴ In total, the algorithm runs in O(log n) time using O(n) processors.

Computational model
• Concurrent reading happens in the procedure of finding the upper common tangents ⇒ CREW PRAM

T(n) × P(n) = O(n log n): optimal!
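The O(log n) bound follows from unrolling the recurrence over the k = log log n levels of the recursion tree. The derivation below is a sketch that assumes the merge on a subproblem of size m costs c log2 m for some constant c; the constant is a placeholder, not a value from the slides.

```latex
T(n) = \sum_{i=0}^{k-1} c \,\log_2 n^{1/2^i}
     = c \,\log_2 n \sum_{i=0}^{k-1} \frac{1}{2^i}
     < 2c \,\log_2 n
     = O(\log n),
\qquad k = \log_2 \log_2 n
```

The per-level costs form a geometric series, so the root level dominates. This is the difference from 2 divide and conquer, where all log n levels cost about the same and the total becomes O(log² n).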


Exercise

1. Suppose an n×n matrix A and an n×n matrix B are stored in two-dimensional arrays. Design PRAM algorithms for A×B using n processors and n×n processors, respectively. Answer the following questions:

(1) Which PRAM models do you use in your algorithms?

(2) What are the running times?

(3) Are your algorithms cost optimal?

(4) Are your algorithms time optimal?

2. Design a PRAM algorithm for A×B using k processors (k ≤ n×n). Answer the same questions.