Multi-threading model
High-level model of thread processes using spawn and sync. Does not consider the underlying hardware.
Algorithm Algorithm-A
begin
  · · ·
  spawn Algorithm-B   // do Algorithm-B in parallel with this code
  · · · other stuff · · ·
  sync                // wait here for all previously spawned parallel computations to complete
  · · ·
end
Multi-threading model
Many languages (e.g. Java) support the production of separately runnable processes called threads. Each thread looks like it is running on its own, and the operating system shares time and processors between the threads. In the multi-threading model, the exact parallel implementation is left to the operating system.
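As a concrete (hypothetical) illustration, the spawn/sync pattern of Algorithm-A maps directly onto Java's fork/join framework, where fork() plays the role of spawn and join() the role of sync. The class and method names below are my own; this is a minimal sketch, not code from the course.

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Sketch of Algorithm-A: spawn Algorithm-B, do other stuff, then sync.
class AlgorithmA extends RecursiveAction {
    @Override
    protected void compute() {
        AlgorithmB b = new AlgorithmB(); // hypothetical subtask
        b.fork();                        // "spawn": run Algorithm-B in parallel with this code
        otherStuff();                    // "... other stuff ..."
        b.join();                        // "sync": wait for the spawned computation to complete
    }

    private void otherStuff() { /* work that runs in parallel with Algorithm-B */ }
}

class AlgorithmB extends RecursiveAction {
    @Override
    protected void compute() { /* Algorithm-B's work */ }
}

It would be run with new ForkJoinPool().invoke(new AlgorithmA()); note that sync in the model waits for all spawned computations, whereas join() here waits for the single task that was forked.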
Multi-threading Model
We look at some examples:
- Fibonacci
- Complexity measures
See CLRS: Cormen, Leiserson, Rivest and Stein, Introduction to Algorithms (3rd edition), Chapter 27, Multithreaded Algorithms. https://mitpress.mit.edu/sites/default/files/titles/sample/0262533057chap27.pdf
Multi-threading Fibonacci
Reminder: Recursive Fibonacci
Algorithm FIB(n)
1: if n ≤ 1 return n
2: else
3:   return FIB(n − 1) + FIB(n − 2)
Parallel version
Algorithm Par-FIB(n)
1: if n ≤ 1 return n
2: else
3:   x = spawn Par-FIB(n − 2)
4:   y = Par-FIB(n − 1)
5:   sync
6:   return x + y
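As a runnable sketch (my own class names, not from CLRS), here is Par-FIB in Java's fork/join framework: fork() corresponds to the spawn on line 3, the direct compute() call to line 4, and join() to the sync on line 5.

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Parallel Fibonacci in the style of Par-FIB.
class ParFib extends RecursiveTask<Long> {
    private final int n;
    ParFib(int n) { this.n = n; }

    @Override
    protected Long compute() {
        if (n <= 1) return (long) n;          // line 1: base case
        ParFib x = new ParFib(n - 2);
        x.fork();                             // line 3: spawn Par-FIB(n - 2)
        long y = new ParFib(n - 1).compute(); // line 4: compute Par-FIB(n - 1) in this task
        return x.join() + y;                  // lines 5-6: sync, then return x + y
    }

    public static void main(String[] args) {
        System.out.println(new ForkJoinPool().invoke(new ParFib(20))); // prints 6765
    }
}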
Recursive Fibonacci: the recursion tree for FIB(6).
Figure: from CLRS, Introduction to Algorithms, Chapter 27.
Multi-threading Fibonacci
Complexity measures for multi-threading
DAG: directed acyclic graph. Vertices are the circles for spawn, sync or procedure call. For a problem of size n:
- Span S or T∞(n): the number of vertices on the longest directed path from start to finish in the computation DAG (the critical path). This is the run time when each vertex of the DAG has its own processor.
- Work W or T1(n): the total time to execute the entire computation on one processor, defined as the number of vertices in the computation DAG.
- Tp(n): the total time to execute the entire computation with p processors.
- Speed up = T1/Tp: how much faster the computation runs with p processors.
- Parallelism = T1/T∞: the maximum possible speed up.
Example 1: Fibonacci
Let's look at the answer first. (For details see pages 776–784 of CLRS.)
- T1(n) = Θ(φ^n) where φ ≈ 1.62 (see page 776).
- T∞(n) = Θ(n): the critical path has length proportional to n.
- Parallelism = T1/T∞ = Θ(φ^n/n). Let t = T1(n) ∝ (1.62)^n be the (sequential) time. Then log t ∝ n log 1.62 = Θ(n).
- So Parallelism = T1/T∞ = Θ(t/log t): almost linear speed up relative to our chosen sequential algorithm.
Span and work
Back to Fibonacci. FIB(n) is exponential.
- For Par-FIB(n) we have T∞(n) = max(T∞(n − 1), T∞(n − 2)) + Θ(1) = T∞(n − 1) + Θ(1) = Θ(n)
- The value (1.62)^n is tricky. We can show T1(n) = Ω((√2)^n):

  T1(n) = T1(n − 1) + T1(n − 2) + Θ(1)
        ≥ 2 T1(n − 2)
        ≥ 2^2 T1(n − 4) ≥ 2^3 T1(n − 6) ≥ · · · ≥ 2^(n/2) T1(0)
        = (√2)^n Θ(1)

  where √2 ≈ 1.41, so T1(n) ≥ (1.4)^n, an exponential run time.
Example 2. Add up numbers
S(n) = 1 + · · · + 1 (n ones), i.e. S(n) = n

Algorithm SUM1(n)

1: if n = 0 return 0
2: SUM1 = 0
3: for i = 1, . . . , n do SUM1 = SUM1 + 1
4: return SUM1

How to make SUM 'look parallel'? Recursive version!

Algorithm SUM(n)

1: if n = 1 return 1
2: else
3:   return SUM(n/2) + SUM(n/2)

Add up the first half and then the second half. Not very practical? But it has a good parallel counterpart.
Example 2. Add up numbers 1 + · · · + 1

Sequential recursion. Assume n is a power of 2, i.e. n = 2^m.

Algorithm SUM(n)

1: if n = 1 return 1
2: else
3:   return SUM(n/2) + SUM(n/2)

Parallel version

Algorithm Par-SUM(n)

1: if n = 1 return 1
2: else
3:   x = spawn Par-SUM(n/2)
4:   y = Par-SUM(n/2)
5:   sync
6:   return x + y
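The same fork/join pattern gives a runnable Java sketch of Par-SUM (class names are mine; n is assumed to be a power of 2, as on the slide):

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Parallel version of SUM(n) = 1 + ... + 1 (n ones); assumes n is a power of 2.
class ParSum extends RecursiveTask<Long> {
    private final long n;
    ParSum(long n) { this.n = n; }

    @Override
    protected Long compute() {
        if (n == 1) return 1L;                // line 1: base case
        ParSum x = new ParSum(n / 2);
        x.fork();                             // line 3: spawn Par-SUM(n/2)
        long y = new ParSum(n / 2).compute(); // line 4: second half in this task
        return x.join() + y;                  // lines 5-6: sync, then return x + y
    }

    public static void main(String[] args) {
        System.out.println(new ForkJoinPool().invoke(new ParSum(1 << 20))); // prints 1048576
    }
}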
Example 2. Complexity comparison
- T1(n) = Θ(n)
- T∞(n) = m = log2 n. Why? Answer: if n = 2^m then m = log2 n, and

  T∞(n) = Θ(1) + max(T∞(n/2), T∞(n/2))
        = Θ(1) + T∞(n/2)
        = (m − 1)Θ(1) + T∞(n/2^(m−1))
        = mΘ(1)

- Parallelism = T1(n)/T∞(n) = Θ(n/log2 n): almost linear speed up compared to our initial algorithm.
Example 3: Add up squares
S(n) = 1^2 + 2^2 + · · · + n^2 = n^2 + (n − 1)^2 + · · · + 1^2

Algorithm SQUARE(n)

1: if n = 1 return 1
2: x = SQUARE(n − 1)
3: y = n ∗ n
4: return x + y

Simple parallel version.

Algorithm Par-SQUARE(n)

1: if n = 1 return 1
2: x = spawn Par-SQUARE(n − 1)
3: y = n ∗ n
4: sync
5: return x + y
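For contrast, a Java sketch of Par-SQUARE (again my own naming). Each task forks a single subtask and then immediately waits for it, so the spawned tasks form a chain of depth Θ(n) rather than a balanced tree; the complexity comparison below shows why this gives no speed up.

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Parallel version of SQUARE(n) = 1^2 + 2^2 + ... + n^2, following Par-SQUARE.
class ParSquare extends RecursiveTask<Long> {
    private final long n;
    ParSquare(long n) { this.n = n; }

    @Override
    protected Long compute() {
        if (n == 1) return 1L;               // line 1: base case
        ParSquare x = new ParSquare(n - 1);
        x.fork();                            // line 2: spawn Par-SQUARE(n - 1)
        long y = n * n;                      // line 3: y = n * n
        return x.join() + y;                 // lines 4-5: sync, then return x + y
    }

    public static void main(String[] args) {
        System.out.println(new ForkJoinPool().invoke(new ParSquare(10))); // prints 385
    }
}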
Example 3: Computation DAG
Example 3. Complexity comparison
- T1(n) = Θ(n)
- T∞(n) = Θ(1) + max(1, T∞(n − 1)) = Θ(n)
- Parallelism = T1(n)/T∞(n) = Θ(1). No speed up over the sequential algorithm: a bad parallel implementation.
The bounds on speed up for p processors
- Speed up = T1/Tp.
- In reality: how much faster does the program run with p processors?
- What are the bounds on Tp(n)?
- Crude lower bound: Tp ≥ T1/p. Why?
- The p processors perform at most pTp units of work in total, and all T1 units of work must be done, so pTp ≥ T1. (In practice it is also difficult to divide the work perfectly between the p processors.)
- If p is very large this lower bound is inaccurate. Why? However many processors we use, Tp can never fall below the length of the critical path.

We need more accurate bounds.
Greedy scheduling I
- A scheduler is greedy if it immediately allocates any free processor to an available task.
- The greedy scheduling principle says that if a computation is run on p processors using a greedy scheduler, then the total time Tp is bounded by

  Tp ≤ W/p + S

- The span S measures the unavoidably sequential part of the algorithm.
Greedy scheduling II
- The lower bound is

  Tp ≥ max(W/p, S)

- W/p corresponds to allocating the work equally to the processors so that they all finish at the same time; S is the span.
- Thus

  max(W/p, S) ≤ Tp ≤ W/p + S

- This means that if we increase the number of processors p so that W/p ≪ S, we are wasting resources. The algorithm still takes time at least S.
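To make these bounds concrete, here is an illustrative calculation (the numbers are invented for the example). Suppose W = 10^6 and S = 10^3. With p = 10 processors, max(W/p, S) = 10^5 ≤ Tp ≤ 10^5 + 10^3, so Tp ≈ 10^5 and the speed up is nearly 10. With p = 10^4 processors, W/p = 100 < S, so 10^3 ≤ Tp ≤ 1.1 × 10^3: beyond roughly p = W/S = 10^3 processors, adding more gives almost no further speed up.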
This material (and much more) is covered in Cormen, Leiserson, Rivest and Stein, Introduction to Algorithms (3rd edition), Chapter 27, Multithreaded Algorithms, downloadable from https://mitpress.mit.edu/sites/default/files/titles/sample/0262533057chap27.pdf
See also the free book at: http://www.parallel-algorithms-book.com/ (Sections 3.3.2 and 3.4).