Multi-threading model
High-level model of thread processes using spawn and sync. Does not consider the underlying hardware.
Algorithm Algorithm-A
begin
  · · ·
  spawn Algorithm-B   // do Algorithm-B in parallel with this code
  · · · other stuff · · ·
  sync                // wait here for all previously spawned parallel computations to complete
  · · ·
end
Multi-threading model
Many languages (e.g. Java) support the production of separately runnable processes called threads. Each thread looks like it is running on its own, and the operating system shares time and processors between the threads. In the multi-threading model, the exact parallel implementation is left to the operating system.
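As a concrete (hypothetical) illustration, the spawn/sync pattern of Algorithm-A maps directly onto Java's fork/join framework, where fork() plays the role of spawn and join() the role of sync. The class and method names below are my own; this is a minimal sketch, not code from the course.

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Sketch of Algorithm-A: spawn Algorithm-B, do other stuff, then sync.
class AlgorithmA extends RecursiveAction {
    @Override
    protected void compute() {
        AlgorithmB b = new AlgorithmB(); // hypothetical subtask
        b.fork();                        // "spawn": run Algorithm-B in parallel with this code
        otherStuff();                    // "... other stuff ..."
        b.join();                        // "sync": wait for the spawned computation to complete
    }

    private void otherStuff() { /* work that runs in parallel with Algorithm-B */ }
}

class AlgorithmB extends RecursiveAction {
    @Override
    protected void compute() { /* Algorithm-B's work */ }
}

It would be run with new ForkJoinPool().invoke(new AlgorithmA()); note that sync in the model waits for all spawned computations, whereas join() here waits for the single task that was forked.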
Multi-threading Model
We look at some examples:
- Fibonacci
- Complexity measures
See CLRS: Cormen, Leiserson, Rivest and Stein, Introduction to Algorithms (3rd edition), Chapter 27, Multithreaded Algorithms. https://mitpress.mit.edu/sites/default/files/titles/sample/0262533057chap27.pdf
Multi-threading Fibonacci
Reminder: Recursive Fibonacci
Algorithm FIB(n)
1: if n ≤ 1 return n
2: else
3:   return FIB(n − 1) + FIB(n − 2)
Parallel version
Algorithm Par-FIB(n)
1: if n ≤ 1 return n
2: else
3:   x = spawn Par-FIB(n − 2)
4:   y = Par-FIB(n − 1)
5:   sync
6:   return x + y
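As a runnable sketch (my own class names, not from CLRS), here is Par-FIB in Java's fork/join framework: fork() corresponds to the spawn on line 3, the direct compute() call to line 4, and join() to the sync on line 5.

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Parallel Fibonacci in the style of Par-FIB.
class ParFib extends RecursiveTask<Long> {
    private final int n;
    ParFib(int n) { this.n = n; }

    @Override
    protected Long compute() {
        if (n <= 1) return (long) n;          // line 1: base case
        ParFib x = new ParFib(n - 2);
        x.fork();                             // line 3: spawn Par-FIB(n - 2)
        long y = new ParFib(n - 1).compute(); // line 4: compute Par-FIB(n - 1) in this task
        return x.join() + y;                  // lines 5-6: sync, then return x + y
    }

    public static void main(String[] args) {
        System.out.println(new ForkJoinPool().invoke(new ParFib(20))); // prints 6765
    }
}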
Recursive Fibonacci: the recursion tree for FIB(6).
Figure: from CLRS, Introduction to Algorithms, Chapter 27.
Multi-threading Fibonacci
Complexity measures for multi-threading
DAG: directed acyclic graph. Vertices are the circles for spawn, sync or procedure call. For a problem of size n:
- Span S or T∞(n): the number of vertices on the longest directed path from start to finish in the computation DAG (the critical path). This is the run time when each vertex of the DAG has its own processor.
- Work W or T1(n): the total time to execute the entire computation on one processor, defined as the number of vertices in the computation DAG.
- Tp(n): the total time to execute the entire computation with p processors.
- Speed up = T1/Tp: how much faster the computation runs with p processors.
- Parallelism = T1/T∞: the maximum possible speed up.
Example 1: Fibonacci
Let's look at the answer first. (For details see pages 776–784 of CLRS.)
- T1(n) = Θ(φ^n) where φ ≈ 1.62 (see page 776).
- T∞(n) = Θ(n): the critical path has length proportional to n.
- Parallelism = T1/T∞ = Θ(φ^n/n). Let t = T1(n) ∝ (1.62)^n be the (sequential) time. Then log t ∝ n log 1.62 = Θ(n).
- So Parallelism = T1/T∞ = Θ(t/log t): almost linear speed up relative to our chosen sequential algorithm.
Span and work
Back to Fibonacci. FIB(n) is exponential.
- For Par-FIB(n) we have T∞(n) = max(T∞(n − 1), T∞(n − 2)) + Θ(1) = T∞(n − 1) + Θ(1) = Θ(n)
- The value (1.62)^n is tricky. We can show T1(n) = Ω((√2)^n):

  T1(n) = T1(n − 1) + T1(n − 2) + Θ(1)
        ≥ 2 T1(n − 2)
        ≥ 2^2 T1(n − 4) ≥ 2^3 T1(n − 6) ≥ · · · ≥ 2^(n/2) T1(0)
        = (√2)^n Θ(1)

  where √2 ≈ 1.41, so T1(n) ≥ (1.4)^n, an exponential run time.
Example 2. Add up numbers
S(n) = 1 + · · · + 1 (n ones), i.e. S(n) = n

Algorithm SUM1(n)

1: if n = 0 return 0
2: SUM1 = 0
3: for i = 1, . . . , n do SUM1 = SUM1 + 1
4: return SUM1

How to make SUM 'look parallel'? Recursive version!

Algorithm SUM(n)

1: if n = 1 return 1
2: else
3:   return SUM(n/2) + SUM(n/2)

Add up the first half and then the second half. Not very practical? But it has a good parallel counterpart.
Example 2. Add up numbers 1 + · · · + 1

Sequential recursion. Assume n is a power of 2, i.e. n = 2^m.

Algorithm SUM(n)

1: if n = 1 return 1
2: else
3:   return SUM(n/2) + SUM(n/2)

Parallel version

Algorithm Par-SUM(n)

1: if n = 1 return 1
2: else
3:   x = spawn Par-SUM(n/2)
4:   y = Par-SUM(n/2)
5:   sync
6:   return x + y
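The same fork/join pattern gives a runnable Java sketch of Par-SUM (class names are mine; n is assumed to be a power of 2, as on the slide):

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Parallel version of SUM(n) = 1 + ... + 1 (n ones); assumes n is a power of 2.
class ParSum extends RecursiveTask<Long> {
    private final long n;
    ParSum(long n) { this.n = n; }

    @Override
    protected Long compute() {
        if (n == 1) return 1L;                // line 1: base case
        ParSum x = new ParSum(n / 2);
        x.fork();                             // line 3: spawn Par-SUM(n/2)
        long y = new ParSum(n / 2).compute(); // line 4: second half in this task
        return x.join() + y;                  // lines 5-6: sync, then return x + y
    }

    public static void main(String[] args) {
        System.out.println(new ForkJoinPool().invoke(new ParSum(1 << 20))); // prints 1048576
    }
}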
Example 2. Complexity comparison
- T1(n) = Θ(n)
- T∞(n) = m = log2 n. Why? Answer: if n = 2^m then m = log2 n, and

  T∞(n) = Θ(1) + max(T∞(n/2), T∞(n/2))
        = Θ(1) + T∞(n/2)
        = (m − 1)Θ(1) + T∞(n/2^(m−1))
        = mΘ(1)

- Parallelism = T1(n)/T∞(n) = Θ(n/log2 n): almost linear speed up compared to our initial algorithm.
Example 3: Add up squares
S(n) = 1^2 + 2^2 + · · · + n^2 = n^2 + (n − 1)^2 + · · · + 1^2

Algorithm SQUARE(n)

1: if n = 1 return 1
2: x = SQUARE(n − 1)
3: y = n ∗ n
4: return x + y

Simple parallel version.

Algorithm Par-SQUARE(n)

1: if n = 1 return 1
2: x = spawn Par-SQUARE(n − 1)
3: y = n ∗ n
4: sync
5: return x + y
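For contrast, a Java sketch of Par-SQUARE (again my own naming). Each task forks a single subtask and then immediately waits for it, so the spawned tasks form a chain of depth Θ(n) rather than a balanced tree; the complexity comparison below shows why this gives no speed up.

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Parallel version of SQUARE(n) = 1^2 + 2^2 + ... + n^2, following Par-SQUARE.
class ParSquare extends RecursiveTask<Long> {
    private final long n;
    ParSquare(long n) { this.n = n; }

    @Override
    protected Long compute() {
        if (n == 1) return 1L;               // line 1: base case
        ParSquare x = new ParSquare(n - 1);
        x.fork();                            // line 2: spawn Par-SQUARE(n - 1)
        long y = n * n;                      // line 3: y = n * n
        return x.join() + y;                 // lines 4-5: sync, then return x + y
    }

    public static void main(String[] args) {
        System.out.println(new ForkJoinPool().invoke(new ParSquare(10))); // prints 385
    }
}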
Example 3: Computation DAG
Example 3. Complexity comparison
- T1(n) = Θ(n)
- T∞(n) = Θ(1) + max(1, T∞(n − 1)) = Θ(n)
- Parallelism = T1(n)/T∞(n) = Θ(1). No speed up over the sequential algorithm: a bad parallel implementation.
The bounds on speed up for p processors
- Speed up = T1/Tp.
- In reality: how much faster does the program run with p processors?
- What are the bounds on Tp(n)?
- Crude lower bound: Tp ≥ T1/p. Why?
- The p processors perform at most pTp units of work in total, and all T1 units of work must be done, so pTp ≥ T1. (In practice it is also difficult to divide the work perfectly between the p processors.)
- If p is very large this lower bound is inaccurate. Why? However many processors we use, Tp can never fall below the length of the critical path.

We need more accurate bounds.
Greedy scheduling I
- A scheduler is greedy if it immediately allocates any free processor to an available task.
- The greedy scheduling principle says that if a computation is run on p processors using a greedy scheduler, then the total time Tp is bounded by

  Tp ≤ W/p + S

- The span S measures the unavoidably sequential part of the algorithm.
Greedy scheduling II
- The lower bound is

  Tp ≥ max(W/p, S)

- W/p corresponds to allocating the work equally to the processors so that they all finish at the same time; S is the span.
- Thus

  max(W/p, S) ≤ Tp ≤ W/p + S

- This means that if we increase the number of processors p so that W/p ≪ S, we are wasting resources. The algorithm still takes time at least S.
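To make these bounds concrete, here is an illustrative calculation (the numbers are invented for the example). Suppose W = 10^6 and S = 10^3. With p = 10 processors, max(W/p, S) = 10^5 ≤ Tp ≤ 10^5 + 10^3, so Tp ≈ 10^5 and the speed up is nearly 10. With p = 10^4 processors, W/p = 100 < S, so 10^3 ≤ Tp ≤ 1.1 × 10^3: beyond roughly p = W/S = 10^3 processors, adding more gives almost no further speed up.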
This material (and much more) is covered in Cormen, Leiserson, Rivest and Stein, Introduction to Algorithms (3rd edition), Chapter 27, Multithreaded Algorithms, downloadable from https://mitpress.mit.edu/sites/default/files/titles/sample/0262533057chap27.pdf
See also the free book at: http://www.parallel-algorithms-book.com/ (Sections 3.3.2 and 3.4).