Lecture #21: Shared Objects and Concurrent Programming
This material is not available in the textbook. The online PowerPoint presentations contain the text explanations given in class.
Art of Multiprocessor Programming 2
Moore’s Law
Clock speed flattening sharply
Transistor count still rising
Vanishing from your Desktops: The Uniprocessor
(Diagram: a single CPU connected to memory.)
Your Server: The Shared Memory Multiprocessor (SMP)
(Diagram: several processors, each with its own cache, connected by a bus to shared memory.)
Your New Server or Desktop: The Multicore Processor (CMP)
(Diagram: caches, bus, and shared memory as before, but all on the same chip. Example: Sun T2000 Niagara.)
From the 2008 press…
…Intel has announced a press conference in San Francisco on November 17th, where it will officially launch the Core i7 Nehalem processor…
…Sun’s next generation Enterprise T5140 and T5240 servers, based on the 3rd Generation UltraSPARC T2 Plus processor, were released two days ago…
Why is Kunle Smiling?
Niagara 1
© 2006 Herlihy and Shavit8
Traditional Software Scaling Process
(Chart: the same user code on a traditional uniprocessor speeds up 1.8x, then 3.6x, then 7x over time, following Moore’s law.)
Multicore Software Scaling Process
(Chart: user code on a multicore speeding up 1.8x, then 3.6x, then 7x as cores are added.)
Unfortunately, not so simple…
Real-World Software Scaling Process
(Chart: user code on a multicore speeding up only 1.8x, then 2x, then 2.9x.)
Parallelization and Synchronization require great care…
Concurrent Programming
(Diagram: objects residing in shared memory.)
Challenge: coordinating access
Persistent vs. Transient Communication
• Persistent communication medium: sending information changes the state of the medium forever. Example: a blackboard.
• Transient communication medium: the change of state lasts only for a limited time. Example: talking.
Parallel Primality Testing
Task: Print all primes from 1 to 10^10, in some order.
Available: A machine with 10 processors.
Solution: Speed the work up 10 times; that is, the new time to print all primes will be 1/10 of the time on a single processor.
Parallel Primality Testing
(Diagram: the range 1 to 10^10 cut at 10^9, 2x10^9, …, with one block each for P1, P2, …, P10.)
Split the work among processors!
Each processor Pi gets 10^9 numbers to test.
Parallel Primality Testing
(define (P i)
  (let ((counter (+ 1 (* (- i 1) (power 10 9))))  ; first number in Pi's block
        (upto (* i (power 10 9))))                ; last number in Pi's block
    (define (iter)
      (if (<= counter upto)
          (begin
            (if (prime? counter) (display counter) #f)
            (set! counter (+ counter 1))          ; advance the private counter
            (iter))
          'done))
    (iter)))
(parallel-execute (P 1) (P 2) ... (P 10))
Problem: work is split unevenly
Some processors have fewer primes to test… Some composite numbers are easier to test…
(Diagram: the same ten-way split of 1 to 10^10 among P1, P2, …, P10.)
Need to split the work range dynamically!
Shared Counter
each thread takes a number
A Shared Counter Object
(define (make-shared-counter value)
  (define (fetch) value)
  (define (increment) (set! value (+ 1 value)))
  (define (dispatch m)
    (cond ((eq? m 'fetch) (fetch))
          ((eq? m 'increment) (increment))
          (else (error "unknown request"))))
  dispatch)
(define shared-counter (make-shared-counter 1))
Using the Shared Counter
(define (P i)
  (define (iter)
    (let ((index (shared-counter 'fetch)))
      (if (< index (power 10 10))
          (begin
            (if (prime? index) (display index) #f)
            (shared-counter 'increment)
            (iter))
          'done)))
  (iter))
(parallel-execute (P 1) (P 2) ... (P 10))
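This Solution Doesn’t Work
Increment: (set! value (+ 1 value))
  time 1: P1 reads value: 77
  time 2: P2 increments 10 times: value is now 87
  time 3: P1 executes set!: value becomes 78. Error!
Fetch: (let ((index (shared-counter 'fetch))) …)
  time 1: P1 fetch returns 77
  time 2: P2 fetch returns 77. Error! Both processors test the same number.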
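The increment interleaving on this slide can be replayed deterministically in a single thread. The sketch below is only an illustration: the procedure name `replay-lost-update` and the variable `p1-old` are hypothetical and not part of the lecture's code. It splits P1's increment into its read half and its write half and runs ten complete increments by P2 in between.

```scheme
;; Hypothetical single-threaded replay of the interleaving on the slide.
;; An increment is really two steps: read value, then write value + 1.
(define (replay-lost-update)
  (let ((value 77))
    (let ((p1-old value))              ; P1 reads 77
      (do ((k 0 (+ k 1)))              ; P2 performs 10 complete increments
          ((= k 10))
        (set! value (+ 1 value)))      ; value climbs from 77 to 87
      (set! value (+ 1 p1-old))        ; P1 writes 77 + 1, overwriting 87
      value)))

(display (replay-lost-update))         ; 78, although 11 increments ran
(newline)
```

Eleven increments were issued starting from 77, so a correct counter would end at 88; the replay ends at 78 because P1's write is based on a stale read.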
Is this problem inherent?
If we could only glue reads and writes together…
(Diagram: each read glued to the write that follows it, forming a single step.)
The Fetch-and-Increment Operation
(define (make-shared-counter value)
  (define (fetch-and-increment)        ; must be instantaneous
    (let ((old value))
      (set! value (+ old 1))
      old))
  (define (dispatch m)
    (cond ((eq? m 'fetch-and-increment) (fetch-and-increment))
          (else (error "unknown request -- counter" m))))
  dispatch)
(Diagram: the shared counter, accessed only through fetch-and-inc.)
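To make the interface concrete, here is a small single-threaded usage sketch. The helper `take-three` is a hypothetical name introduced only for this illustration. It shows the contract of fetch-and-increment: each call returns the old value, so successive callers get distinct numbers. Running it in one thread says nothing about atomicity, which is the subject of the slides that follow.

```scheme
;; The counter from the slide, repeated so the example is self-contained.
(define (make-shared-counter value)
  (define (fetch-and-increment)
    (let ((old value))
      (set! value (+ old 1))
      old))
  (define (dispatch m)
    (cond ((eq? m 'fetch-and-increment) (fetch-and-increment))
          (else (error "unknown request -- counter" m))))
  dispatch)

;; Hypothetical helper: take three numbers from a fresh counter.
;; let* forces the three calls to happen in order.
(define (take-three)
  (let* ((counter (make-shared-counter 1))
         (a (counter 'fetch-and-increment))
         (b (counter 'fetch-and-increment))
         (c (counter 'fetch-and-increment)))
    (list a b c)))

(display (take-three))                 ; the list (1 2 3)
(newline)
```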
Where Things Reside
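(Diagram: processors with caches, connected by a bus to shared memory. The shared counter, currently 1, resides in shared memory; the code and its local variables reside with each processor.)
void primePrint() {
  long i = ThreadID.get();              // IDs in {0..9}
  long block = 1_000_000_000L;          // 10^9 numbers per thread
  for (long j = i * block + 1; j <= (i + 1) * block; j++) {
    if (isPrime(j)) print(j);
  }
}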
A Correct Shared Counter
(define shared-counter (make-shared-counter 1))
(define (P i)
  (define (iter)
    (let ((index (shared-counter 'fetch-and-increment)))
      (if (< index (power 10 10))
          (begin
            (if (prime? index) (display index) #f)
            (iter))
          'done)))
  (iter))
(parallel-execute (P 1) (P 2) ... (P 10))
Implementing Fetch-and-Inc
To make the program work we need an “instantaneous” implementation of fetch-and-increment. How can we do this?
• Special hardware: built-in synchronization instructions.
• Special software: use regular instructions. The solution will involve waiting.
Software: Mutual Exclusion
Mutual Exclusion
(define (fetch-and-increment)
  (mutex 'start)
  (let ((old value))
    (set! value (+ old 1))
    (mutex 'end)                       ; release before returning
    old))                              ; return the value we fetched
Only one process at a time can execute these instructions
(Diagram: P1, P2, …, P10 all reach the mutex around the counter; P2 gets in and its call returns 1.)
The Story of Alice and Bob
(Picture: Bob, Alice, and the yard they share.)
* As told by Leslie Lamport
The Mutual Exclusion Problem
Requirements:
• Mutual Exclusion: there will never be two dogs simultaneously in the yard.
• No Deadlock: if only one dog wants to be in the yard it will succeed, and if both dogs want to go out, at least one of them will succeed.
Cell Phone Solution
(Picture: Bob, Alice, and the yard.)
Coke Can Solution
(Picture: Bob, Alice, and the yard.)
Flag Solution -- Alice
(define (Alice)
  (let loop ()                         ; repeat forever
    (set! Alice-flag 'up)              ; Alice wants to enter
    (let wait ()                       ; spin until Bob lowers his flag
      (if (eq? Bob-flag 'up) (wait)))
    (Alice-dog-in-yard)                ; dog can enter the yard
    (set! Alice-flag 'down)            ; Alice is leaving
    (loop)))
Flag Solution -- Bob
(define (Bob)
  (let loop ()                         ; repeat forever
    (set! Bob-flag 'up)                ; Bob wants to enter
    (let retry ()
      (if (eq? Alice-flag 'up)         ; if Alice wants to enter
          (begin
            (set! Bob-flag 'down)      ; Bob is a gentleman
            (let wait ()               ; spin until Alice leaves
              (if (eq? Alice-flag 'up) (wait)))
            (set! Bob-flag 'up)        ; raise flag
            (retry))))                 ; and check Alice's flag again
    (Bob-dog-in-yard)                  ; dog can enter the yard
    (set! Bob-flag 'down)              ; Bob is leaving
    (loop)))
Flag Solution -- Both
(Alice’s and Bob’s procedures from the previous two slides, shown side by side.)
Intuition: Why Mutual Exclusion is Preserved
Each of them performs two steps:
• First, raise the flag to signal interest.
• Then, look to see whether the other one has raised the flag.
One can claim that the following flag principle holds: since Alice and Bob each raise their own flag and then look at the other’s flag, the last one to start looking must notice that both flags are up.
Proof of Mutual Exclusion
• Assume both dogs are in the yard
• Derive a contradiction
• By reasoning backwards
• Consider the last time Alice and Bob each looked before letting the dogs in
• Without loss of generality assume Alice was the last to look…
Proof
(Timeline: Bob last raised his flag and then took his last look; Alice last raised her flag and then took her last look, which came after Bob’s.)
Alice’s last look came after Bob last raised his flag, so Alice must have seen Bob’s flag. A contradiction. QED
Why is there no Deadlock?
Alice has priority over Bob: if both are repeatedly trying and neither is entering the critical section, Bob lowers his flag and waits, so Alice gets in.
Unfortunately, the algorithm is not a fair one, and Bob's dogs might eventually grow very anxious :-)
The Morals of our Story
• The Mutual Exclusion problem cannot be solved using transient communication (i.e., cell phones).
• The Mutual Exclusion problem cannot be solved using interrupts or interrupt bits (i.e., cans).
• The Mutual Exclusion problem can be solved with one-bit registers (i.e., flags): memory locations that can be read and written (set!-ed).
We cheated a little: the arbiter problem…
The Arbiter Problem (an aside)
Pick a point
The Solution and Conclusion
(define (Alice)
  (let loop ()                         ; repeat forever
    (mutex 'begin)
    (Alice-dog-in-yard)                ; critical section
    (mutex 'end)
    (loop)))
Question: then why not execute all the code of the parallel prime-printing algorithm in a critical section?
Answer: Amdahl’s Law
$$\text{Speedup} = \frac{\text{OldExecutionTime}}{\text{NewExecutionTime}}$$
…of computation given n CPUs instead of 1
Amdahl’s Law
$$\text{Speedup} = \frac{1}{(1 - p) + \frac{p}{n}}$$
where p is the parallel fraction, 1 - p is the sequential fraction, and n is the number of processors.
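The formula follows directly from the definition of speedup as old execution time over new execution time. As a short worked derivation, normalize the single-processor time to 1 and assume the parallel fraction p divides perfectly among the n processors while the sequential fraction 1 - p does not speed up at all:

$$\text{OldExecutionTime} = (1 - p) + p = 1$$
$$\text{NewExecutionTime} = (1 - p) + \frac{p}{n}$$
$$\text{Speedup} = \frac{\text{OldExecutionTime}}{\text{NewExecutionTime}} = \frac{1}{(1 - p) + \frac{p}{n}}$$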
Example
• Ten processors
• 60% concurrent, 40% sequential
• How close to 10-fold speedup?
$$\text{Speedup} = \frac{1}{(1 - 0.6) + \frac{0.6}{10}} = 2.17$$
Example
• Ten processors
• 80% concurrent, 20% sequential
• How close to 10-fold speedup?
$$\text{Speedup} = \frac{1}{(1 - 0.8) + \frac{0.8}{10}} = 3.57$$
Example
• Ten processors
• 90% concurrent, 10% sequential
• How close to 10-fold speedup?
$$\text{Speedup} = \frac{1}{(1 - 0.9) + \frac{0.9}{10}} = 5.26$$
Example
• Ten processors
• 99% concurrent, 1% sequential
• How close to 10-fold speedup?
$$\text{Speedup} = \frac{1}{(1 - 0.99) + \frac{0.99}{10}} = 9.17$$
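The four examples can be recomputed with a few lines of Scheme. This is a minimal sketch: the names `amdahl-speedup` and `round2` are hypothetical helpers introduced here, not procedures from the lecture.

```scheme
;; Amdahl's law: speedup on n processors when a fraction p
;; of the work is parallel and 1 - p is sequential.
(define (amdahl-speedup p n)
  (/ 1 (+ (- 1 p) (/ p n))))

;; Hypothetical helper: round to two decimal places for display.
(define (round2 x)
  (/ (round (* 100 x)) 100))

;; Recompute the four ten-processor examples.
(for-each
 (lambda (p)
   (display p)
   (display " parallel: speedup ")
   (display (round2 (amdahl-speedup p 10)))
   (newline))
 '(0.6 0.8 0.9 0.99))
```

The computed speedups are approximately 2.17, 3.57, 5.26, and 9.17, matching the slides; the exact way the numbers are printed depends on the Scheme implementation.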
Back to Real-World Multicore Scaling
(Chart: user code on a multicore speeding up only 1.8x, then 2x, then 2.9x.)
Why the bad performance?
As the number of cores grows, the effect of the 25% sequential part becomes more acute: the speedup is 2.3 on 4 cores, 2.9 on 8, 3.4 on 16, 3.7 on 32…
Amdahl’s Law: pay for N = 8 cores, SequentialPart = 25%
Speedup = only 2.9 times!
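Plugging the slide's numbers into Amdahl's law, with p = 0.75 and n = 8, gives the 2.9 figure, and letting n grow without bound shows the ceiling that the 25% sequential part imposes:

$$\text{Speedup}(8) = \frac{1}{0.25 + \frac{0.75}{8}} = \frac{1}{0.34375} \approx 2.9$$
$$\lim_{n \to \infty} \frac{1}{(1 - p) + \frac{p}{n}} = \frac{1}{1 - p} = \frac{1}{0.25} = 4$$

So with a 25% sequential part, no number of cores can deliver more than a 4-fold speedup.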
Must parallelize applications on a very fine grain!
Where is sequential code coming from…
Need Fine-Grained Locking
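(Diagram: 75% of the work is unshared and 25% is shared. With coarse-grained locking the cores queue up one at a time for the shared 25%, which is the reason we get only a 2.9x speedup. With fine-grained locking the cores pass through the shared part side by side.)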
Multicores are here …
“Life is the synchronicity of chance”
You just saw a bit of what concurrent programming is about
Today we do not yet have sufficient expertise in how to make use of multicore machines…
You guys are the generation that will get to use them and hopefully develop this expertise.
Programming Multicore Machines