CSL718 : Multiprocessors
Synchronization, Memory Consistency

Anshul Kumar, CSE IITD
17th April, 2006




Synchronization Problem

• Processes run on different processors independently

• At some point they need to know the status of each other for
  – communication
  – mutual exclusion etc.

• A hardware primitive for atomic read+write is required (e.g. test&set, exchange, fetch&increment etc.)
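
As a rough illustration of the three primitives named above, the sketch below wraps the standard C++ atomic operations that correspond to them. The helper names `tas`, `xchg` and `fetch_inc` are made up for this example; only the `std::atomic` calls inside them come from the C++ standard library.

```cpp
#include <atomic>

// Hypothetical helper names; each wraps one standard C++ atomic operation.

// test&set: set the flag and report whether it was already set.
inline bool tas(std::atomic_flag& f) {
    return f.test_and_set(std::memory_order_acq_rel);
}

// exchange: store v and return the value that was there before.
inline int xchg(std::atomic<int>& x, int v) {
    return x.exchange(v, std::memory_order_acq_rel);
}

// fetch&increment: add 1 and return the value seen before the add.
inline int fetch_inc(std::atomic<int>& x) {
    return x.fetch_add(1, std::memory_order_acq_rel);
}
```

In each case the read of the old value and the write of the new one happen as a single indivisible step, which is what the later lock and barrier code relies on.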


Spin Lock with Exchange Instr.

Lock: 0 indicates free and 1 indicates locked

Code to lock X :
        r2 ← 1
lockit: r2 ↔ X           ;atomic exchange
        if(r2≠0)lockit   ;already locked

locks are cached for efficiency, coherence is used

Better code to lock X :
lockit: r2 ← X           ;read lock
        if(r2≠0)lockit   ;not available
        r2 ← 1
        r2 ↔ X           ;atomic exchange
        if(r2≠0)lockit   ;already locked
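
The "better code" above (read first, exchange only when the lock looks free) could be written in C++ roughly as follows. This is a sketch, not the slide author's code: the class name `SpinLock` and the demo function `run_counter` are invented for illustration, and the memory orders are one reasonable choice.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Spin lock that reads before exchanging: 0 = free, 1 = locked.
class SpinLock {
    std::atomic<int> x{0};
public:
    void lock() {
        for (;;) {
            // Spin on an ordinary read; this can be served from the cache.
            while (x.load(std::memory_order_relaxed) != 0) { }
            // Lock looks free: now try the atomic exchange.
            if (x.exchange(1, std::memory_order_acquire) == 0) return;
        }
    }
    void unlock() { x.store(0, std::memory_order_release); }
};

// Hypothetical demo: each worker adds 1 to a shared counter `iters` times.
long run_counter(int threads, int iters) {
    SpinLock lk;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lk.lock();
                ++counter;          // protected by the lock
                lk.unlock();
            }
        });
    for (auto& th : pool) th.join();
    return counter;
}
```

With the lock in place, `run_counter(4, 5000)` should return exactly 20000; without it, increments from different threads could be lost.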


LD Locked & ST conditional

Simpler to implement

• atomic exchange using LL and SC
try:    r3 ← r2          ;move exchange value
        LL r1, X         ;load locked
        SC r3, X         ;store conditional
        if(r3=0)try      ;branch store fails
        r2 ← r1          ;put loaded value in r2

• fetch&increment using LL and SC
try:    LL r1, X         ;load locked
        r3 ← r1 + 1      ;increment
        SC r3, X         ;store conditional
        if(r3=0)try      ;branch store fails
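
Standard C++ does not expose LL and SC directly. The closest portable stand-in is a retry loop around `compare_exchange_weak`, which, like SC, is allowed to fail and be retried. The sketch below rebuilds the two operations above on that basis; the function names are hypothetical, and a compare-and-swap is not strictly equivalent to an LL/SC pair (it compares values rather than tracking intervening writes).

```cpp
#include <atomic>

// Exchange built from a retry loop; returns the old value (the slide's r2 <- r1).
inline int exchange_retry(std::atomic<int>& x, int new_value) {
    int old = x.load(std::memory_order_relaxed);   // plays the role of LL
    // On failure compare_exchange_weak reloads `old`, so the loop retries
    // with a fresh value, much like branching back to `try`.
    while (!x.compare_exchange_weak(old, new_value,
                                    std::memory_order_acq_rel,
                                    std::memory_order_relaxed)) { }
    return old;
}

// Fetch&increment built the same way; returns the value before the add.
inline int fetch_increment_retry(std::atomic<int>& x) {
    int old = x.load(std::memory_order_relaxed);
    // `old + 1` is recomputed on every attempt from the reloaded `old`.
    while (!x.compare_exchange_weak(old, old + 1,
                                    std::memory_order_acq_rel,
                                    std::memory_order_relaxed)) { }
    return old;
}
```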


Spin Lock with LL & SC

lockit: LL r2, X         ;load locked
        if(r2≠0)lockit   ;not available
        r2 ← 1
        SC r2, X         ;store cond
        if(r2=0)lockit   ;branch store fails

spin lock with exponential back-off reduces contention
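
A minimal sketch of the exponential back-off idea, assuming a back-off measured in calls to `std::this_thread::yield()`, a starting delay of 1 and a cap of 1024 (all three are choices made for this example, not values from the slides). The names `BackoffLock` and `backoff_counter` are likewise hypothetical.

```cpp
#include <algorithm>
#include <atomic>
#include <thread>
#include <vector>

class BackoffLock {
    std::atomic<int> x{0};            // 0 = free, 1 = locked
public:
    void lock() {
        int delay = 1;                // assumed initial back-off, in yields
        const int max_delay = 1024;   // assumed cap on the back-off
        for (;;) {
            int expected = 0;
            if (x.load(std::memory_order_relaxed) == 0 &&
                x.compare_exchange_weak(expected, 1,
                                        std::memory_order_acquire,
                                        std::memory_order_relaxed))
                return;
            // Failed attempt: wait, then double the wait for next time.
            for (int i = 0; i < delay; ++i) std::this_thread::yield();
            delay = std::min(delay * 2, max_delay);
        }
    }
    void unlock() { x.store(0, std::memory_order_release); }
};

// Hypothetical demo: each worker adds 1 to a shared counter `iters` times.
long backoff_counter(int threads, int iters) {
    BackoffLock lk;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lk.lock();
                ++counter;
                lk.unlock();
            }
        });
    for (auto& th : pool) th.join();
    return counter;
}
```

Each failed attempt makes the thread stay away from the lock variable for longer, so fewer processors hit the same cache line at the same moment.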


Barrier Synchronization

lock(X)
if(count=0) release ← 0
count++
unlock(X)
if(count=total) {count ← 0; release ← 1}
else spin(release=1)


Improved Barrier Synch.

local_sense ← !local_sense
lock(X)
count++
unlock(X)
if(count = total) {count ← 0; release ← local_sense}
else {spin(release = local_sense)}

tree based barrier reduces contention
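
The improved (sense-reversing) barrier above might look like this in C++. This is a sketch under one stated variation: an atomic `fetch_add` stands in for the `lock(X); count++; unlock(X)` sequence, which also tells each thread its arrival position. The names `SenseBarrier` and `run_phases` are invented for the example.

```cpp
#include <atomic>
#include <thread>
#include <vector>

class SenseBarrier {
    const int total;
    std::atomic<int> count{0};
    std::atomic<bool> release{false};
public:
    explicit SenseBarrier(int n) : total(n) {}

    // Each thread passes its own local_sense, initially false.
    void wait(bool& local_sense) {
        local_sense = !local_sense;
        if (count.fetch_add(1, std::memory_order_acq_rel) + 1 == total) {
            // Last arrival: reset the count, then release everyone.
            count.store(0, std::memory_order_relaxed);
            release.store(local_sense, std::memory_order_release);
        } else {
            while (release.load(std::memory_order_acquire) != local_sense)
                std::this_thread::yield();
        }
    }
};

// Hypothetical check: nobody leaves phase p before all have entered it.
bool run_phases(int threads, int phases) {
    SenseBarrier barrier(threads);
    std::atomic<int> arrived{0};
    std::atomic<bool> ok{true};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            bool local_sense = false;
            for (int p = 1; p <= phases; ++p) {
                arrived.fetch_add(1);
                barrier.wait(local_sense);
                if (arrived.load() < threads * p) ok = false;
            }
        });
    for (auto& th : pool) th.join();
    return ok;
}
```

Because the release value alternates between phases, a fast thread that re-enters the barrier cannot be confused by a release flag left over from the previous phase.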


Memory Consistency Problem

• When must a processor see the value that has been written by another processor? Atomicity of operations – system wide?

• Can memory operations be re-ordered?

Various models : http://rsim.cs.uiuc.edu/~sadve/Publications/models_tutorial.ps


Example

P1: A = 0                P2: B = 0
    ...                      ...
    A = 1                    B = 1
L1: if(B=0)S1            L2: if(A=0)S2

Which statements among S1 and S2 are done?

Both S1, S2 may be done if writes are delayed
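
The example can be tried out with C++ atomics. The sketch below runs the two "processors" as threads and reports whether both S1 and S2 were executed; the function names are hypothetical. With `std::memory_order_seq_cst` the C++ memory model forbids that outcome, while with `std::memory_order_relaxed` it is permitted (whether it actually shows up depends on the hardware and compiler).

```cpp
#include <atomic>
#include <thread>

// Runs the example once; returns true if BOTH S1 and S2 were executed.
bool both_done_once(std::memory_order mo) {
    std::atomic<int> A{0}, B{0};
    bool s1 = false, s2 = false;
    std::thread p1([&] {
        A.store(1, mo);
        if (B.load(mo) == 0) s1 = true;   // S1
    });
    std::thread p2([&] {
        B.store(1, mo);
        if (A.load(mo) == 0) s2 = true;   // S2
    });
    p1.join();
    p2.join();
    return s1 && s2;
}

// Counts how many of `trials` runs executed both S1 and S2.
int count_both_done(std::memory_order mo, int trials) {
    int n = 0;
    for (int i = 0; i < trials; ++i)
        if (both_done_once(mo)) ++n;
    return n;
}
```

Calling `count_both_done(std::memory_order_seq_cst, 1000)` should always give 0; the same call with `std::memory_order_relaxed` may give a non-zero count on machines that buffer writes.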


Sequential Consistency

• result of any execution is same as if the operations of all processors were executed in some sequential order

• operations of each processor occur in the order specified by its program

- it requires all memory operations to be atomic

- too restrictive, high overheads


Relaxing W→R order

Loads are allowed to overtake stores

Write buffering is permitted

1. Total Store Ordering : Writes are atomic

2. Processor Consistency : Writes need not be atomic - Invalidations may gradually propagate


Relaxing W→R & W→W order

Partial Store Ordering

• Loads are allowed to overtake stores

• Writes can be re-ordered

• Memory barriers (fences) are used to explicitly order operations

Further improves the performance
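
To make the role of fences concrete, here is a small C++ sketch in which every load and store is relaxed (so the language itself imposes no order between them) and two `std::atomic_thread_fence` calls restore the order that matters. The variable names follow the A/flag style of the slides; the function name is hypothetical.

```cpp
#include <atomic>
#include <thread>

// Passes the value of A from one thread to another using only fences
// for ordering. Returns the value the reader saw.
int pass_with_fences() {
    std::atomic<int> A{0}, flag{0};
    int seen = -1;
    std::thread p1([&] {
        A.store(1, std::memory_order_relaxed);
        // Fence: the write to A may not be reordered after the write to flag.
        std::atomic_thread_fence(std::memory_order_release);
        flag.store(1, std::memory_order_relaxed);
    });
    std::thread p2([&] {
        while (flag.load(std::memory_order_relaxed) == 0) { }
        // Fence: the read of A may not be reordered before the read of flag.
        std::atomic_thread_fence(std::memory_order_acquire);
        seen = A.load(std::memory_order_relaxed);
    });
    p1.join();
    p2.join();
    return seen;
}
```

With both fences present the reader is guaranteed to see 1. Removing either fence would make a result of 0 permissible.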


Examples

P1              P2
A = 1;          while(flag=0);
flag = 1;       print A;

SC ensures that “1” is printed
TSO, PC also do so
PSO does not

P1              P2
A = 1;          print B;
B = 1;          print A;

SC ensures that if B is printed as “1” then A is also printed as “1”
TSO, PC also do so
PSO does not


Examples - continued

P1              P2              P3
A = 1;          while(A=0);     while(B=0);
                B = 1;          print A;

SC ensures that “1” is printed. TSO and PSO also do that but PC does not

P1              P2
A = 1;          B = 1;
print B;        print A;

SC ensures that both can’t be printed as “0”. TSO, PC and PSO do not
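
The three-processor example can be sketched with C++ atomics as well. The default ordering of `std::atomic` operations is sequentially consistent, so in the code below P3 is guaranteed to read 1 from A. The function name is hypothetical, and threads stand in for processors.

```cpp
#include <atomic>
#include <thread>

// P1 writes A; P2 waits for A and then writes B; P3 waits for B and reads A.
// Returns the value of A that P3 "prints".
int causality_chain() {
    std::atomic<int> A{0}, B{0};
    int printed = -1;
    std::thread p1([&] { A.store(1); });
    std::thread p2([&] {
        while (A.load() == 0) { }
        B.store(1);
    });
    std::thread p3([&] {
        while (B.load() == 0) { }
        printed = A.load();
    });
    p1.join();
    p2.join();
    p3.join();
    return printed;
}
```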


Relaxing all R/W order

Weak Ordering or Weak Consistency

• Loads and Stores are not restricted to follow an order

• Explicit synchronization primitives are used

• Synchronization primitives follow a strict order

• Easy to achieve

• Low overhead


Release Consistency

• Further relaxation of weak ordering

• Synch primitives are divided into acquire and release operations

• R/W operations after an acquire cannot move before it, but those before it can be moved after

• R/W operations before a release cannot move after it, but those after it can be moved before
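
C++ exposes acquire and release directly as memory orders, so the two rules above can be illustrated with a small sketch. Here `data` is an ordinary variable, `ready` is the synchronization variable, and the function name is made up for the example.

```cpp
#include <atomic>
#include <thread>

// Returns the value of `data` observed by the consumer.
int publish_and_read() {
    int data = 0;                      // ordinary (non-atomic) variable
    std::atomic<bool> ready{false};    // synchronization variable
    int seen = -1;
    std::thread producer([&] {
        data = 42;                     // before the release: cannot move after it
        ready.store(true, std::memory_order_release);
    });
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire)) { }
        seen = data;                   // after the acquire: cannot move before it
    });
    producer.join();
    consumer.join();
    return seen;
}
```

The write to `data` is pinned before the release and the read of `data` is pinned after the acquire, so the consumer always observes 42. Operations on the other side of each synch operation remain free to move, which is where release consistency gains over weak ordering.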


WC and RC Comparison

[Figure: two columns, WC and RC. Each shows three groups of R/W operations numbered 1, 2 and 3. In WC the groups are separated by two synch operations; in RC they are separated by an acquire and a release.]