Computer Architecture: Memory Coherency & Consistency. By Dan Tsafrir, 11/4/2011. Presentation based on slides by David Patterson, Avi Mendelson, Lihu Rappoport, and Adi Yoaz.
Computer Architecture 2011 – coherency & consistency (lec 7)1
Computer Architecture
Memory Coherency & Consistency
By Dan Tsafrir, 11/4/2011
Presentation based on slides by David Patterson, Avi Mendelson, Lihu Rappoport, and Adi Yoaz
Coherency - intro
When there’s only one core
• Caching doesn’t affect correctness
But what happens when ≥ 2 cores work simultaneously on the same memory location?
• If both are reading, not a problem
• Otherwise, one might use a stale, out-of-date copy of the data
• The inconsistencies might lead to incorrect execution
Terminology
• Memory coherency <=> Cache coherency
[Figure: Processor 1 and Processor 2, each with its own L1 cache, on top of a shared L2 cache and memory]
The cache coherency problem for a single memory location
Time   Event                    CPU-1 cache   CPU-2 cache   Memory location X
0      -                        -             -             1
1      CPU-1 reads X            1             -             1
2      CPU-2 reads X            1             1             1
3      CPU-1 stores 0 into X    0             1             0
At time 3, CPU-2's cache holds a stale value, different from the corresponding memory location and from CPU-1's cache. (The next read by CPU-2 will yield “1”.)
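The table above can be reproduced with a small simulation. This is only a sketch under stated assumptions: the `WriteThroughCache` class is hypothetical, it models private write-through caches with no coherence protocol at all, and it tracks a single location named `"X"`.

```python
class WriteThroughCache:
    """Toy private cache: write-through, and deliberately NOT coherent."""

    def __init__(self, memory):
        self.memory = memory   # shared backing store (a dict)
        self.lines = {}        # this cache's private copies

    def read(self, addr):
        if addr not in self.lines:              # miss: fetch from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]                 # hit: the copy may be stale

    def store(self, addr, value):
        self.lines[addr] = value                # update own copy
        self.memory[addr] = value               # write through to memory
        # No invalidation is sent to the other cache: that is the bug.


memory = {"X": 1}
cpu1 = WriteThroughCache(memory)
cpu2 = WriteThroughCache(memory)

trace = []  # rows of (time, event, CPU-1 copy, CPU-2 copy, memory)


def snapshot(time, event):
    trace.append((time, event, cpu1.lines.get("X"), cpu2.lines.get("X"), memory["X"]))


snapshot(0, "")
cpu1.read("X")
snapshot(1, "CPU-1 reads X")
cpu2.read("X")
snapshot(2, "CPU-2 reads X")
cpu1.store("X", 0)
snapshot(3, "CPU-1 stores 0 into X")

stale = cpu2.read("X")  # served from CPU-2's private copy

for row in trace:
    print(row)
print("next read by CPU-2:", stale)
```

The last row of the trace shows CPU-1 and memory holding 0 while CPU-2 still holds 1, which is the stale copy the slide points at.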
A memory system is coherent if…
Informally, we could say (or we would like to say) that...
A memory system is coherent if…
Any read of a data item returns the most recently written value of that data item
(This definition is intuitive, but overly simplistic)
More formally…
A memory system is coherent if…
1. Processor P writes to location X, and later P reads from X, and no other processor writes to X between the above write & read
   => The read must return the value previously written by P
2. P1 writes to X, some time T elapses, and then P2 reads from X
   => For big enough T, P2 will read the value written by P1
3. Two writes to the same location by any two processors are serialized
   => They are seen in the same order by all processors (if “1” and then “2” are written, no processor would read “2” & then “1”)
Notes on the three conditions
• Condition 1 simply preserves program order (needed even on a uniprocessor).
• Condition 2 defines the notion of what it means to have a coherent view of memory; if X is never updated regardless of the duration of T, then the memory is not coherent.
• Condition 3: if P1 writes to X and then P2 writes to X, serialization of writes ensures that every processor will eventually see P2’s write; otherwise P1’s value might be maintained indefinitely.
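Condition 3 (write serialization) can be phrased as a check on what a single processor observes. The helper below is a hypothetical sketch, not part of the lecture: it assumes every write to the location stores a distinct value, so a read value identifies the write it came from.

```python
def respects_write_order(write_order, observed):
    """Return True if the values one processor read from a location never
    go backwards relative to the agreed (serialized) order of writes.

    write_order: values written to the location, in serialized order.
    observed:    values that one processor read, in program order.
    Assumption: written values are distinct.
    """
    position = {value: i for i, value in enumerate(write_order)}
    latest_seen = -1
    for value in observed:
        if position[value] < latest_seen:
            return False          # saw an older write after a newer one
        latest_seen = position[value]
    return True


# "1" and then "2" are written to the same location.
print(respects_write_order([1, 2], [1, 1, 2]))  # reads in serialized order
print(respects_write_order([1, 2], [2, 1]))     # reads "2" and then "1"
```

A coherent system would pass this check for every processor against one common `write_order`.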
Memory Consistency
The coherency definition is not enough
• So as to be able to write correct programs
• It must be supplemented by a consistency model
• Critical for program correctness
Coherency & consistency are 2 different, complementary aspects of memory systems
Coherency
• What values can be returned by a read
• Relates to behavior of reads & writes to the same memory location
Consistency
• When will a written value be returned by a subsequent read
• Relates to behavior of reads & writes to different memory locations
Memory Consistency (cont.)
“How consistent is the memory system?”
• A nontrivial question
Assume: locations A & B are originally cached by P1 & P2
• With initial value = 0
Processor P1            Processor P2
A = 0;                  B = 0;
…                       …
A = 1;                  B = 1;
if ( B == 0 ) …         if ( A == 0 ) …
If writes are immediately seen by other processors
• Impossible for both “if” conditions to be true
• Reaching “if” means either A or B must hold 1
But suppose:
(1) “Write invalidate” can be delayed, and
(2) Processor allowed to compute during this delay
=> It’s possible P1 & P2 haven’t seen the invalidations of B & A until after the reads; thus, both “if” conditions are true
Should this be allowed?
• Determined by consistency model
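The two cases in this example can be checked by brute force. The sketch below is a simplified model, not a description of real hardware: each processor has one "write becomes visible" event and one "read" event, and the hypothetical `outcomes` function enumerates every ordering of these four events. Requiring each write to become visible before the same processor's read models "writes are immediately seen"; dropping that requirement models a delayed invalidation.

```python
from itertools import permutations

EVENTS = [("P1", "visible"), ("P1", "read"), ("P2", "visible"), ("P2", "read")]


def outcomes(must_precede):
    """Enumerate all event orders that satisfy the (before, after) pairs in
    must_precede; return the set of (value P1 read from B, value P2 read from A)."""
    results = set()
    for order in permutations(EVENTS):
        pos = {event: i for i, event in enumerate(order)}
        if any(pos[before] > pos[after] for before, after in must_precede):
            continue
        mem = {"A": 0, "B": 0}
        seen = {}
        for proc, kind in order:
            if kind == "visible":                      # A = 1 (P1) or B = 1 (P2)
                mem["A" if proc == "P1" else "B"] = 1
            else:                                      # if (B == 0) / if (A == 0)
                seen[proc] = mem["B" if proc == "P1" else "A"]
        results.add((seen["P1"], seen["P2"]))
    return results


# Writes immediately seen: each write is visible before its processor's read.
immediate = outcomes([(("P1", "visible"), ("P1", "read")),
                      (("P2", "visible"), ("P2", "read"))])

# Delayed invalidation: a write may become visible after the read that follows it.
delayed = outcomes([])

print(sorted(immediate))
print(sorted(delayed))
```

The outcome `(0, 0)` means both “if” conditions are true; it is absent from `immediate` and present in `delayed`.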
Consistency models
From most strict to most relaxed
• Strict consistency
• Sequential consistency
• Weak consistency
• Release consistency
• […many…]
Stricter models are
• Easier to understand
• Harder to implement
• Slower
• Involve more communication
• Waste more energy
Strict consistency (“linearizability”)
All memory operations are ordered in time
Any read of location X returns the value of the most recent write to X
This is the intuitive notion of memory consistency
But too restrictive and thus unused
Sequential consistency
Relaxation of strict (defined by Lamport)
Requires the result of any execution be the same as if the memory accesses of all processors were executed in some sequential order, with each processor's accesses appearing in program order
• Can be a different order upon each run
Left is sequentially consistent (can be ordered as in the right)
Q. What if we flip the order of P2’s reads (on left)?
[Figure: two timelines of the same history; left as executed, right reordered into an equivalent sequential order]
P1: W(x)1
P2: R(x)1 R(x)2
P3: R(x)1 R(x)2
P4: W(x)2
time →
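One way to explore the question above is a brute-force checker. This is a sketch under assumptions: the hypothetical `is_sequentially_consistent` function handles a single location with initial value 0, and it searches for one interleaving that keeps each processor's program order and in which every read returns the latest written value.

```python
from functools import lru_cache


def is_sequentially_consistent(history, initial=0):
    """history: one list per processor of ("W", value) / ("R", value) operations
    on a single location, in program order."""
    procs = tuple(tuple(ops) for ops in history)

    @lru_cache(maxsize=None)
    def search(positions, value):
        # positions[i] = how many operations of processor i are already placed
        if all(pos == len(ops) for pos, ops in zip(positions, procs)):
            return True
        for i, ops in enumerate(procs):
            pos = positions[i]
            if pos == len(ops):
                continue
            kind, val = ops[pos]
            if kind == "R" and val != value:
                continue                      # this read cannot come next
            new_value = val if kind == "W" else value
            advanced = positions[:i] + (pos + 1,) + positions[i + 1:]
            if search(advanced, new_value):
                return True
        return False

    return search((0,) * len(procs), initial)


# History from the slide: P1 writes 1, P4 writes 2, P2 and P3 read 1 then 2.
as_on_slide = [[("W", 1)], [("R", 1), ("R", 2)], [("R", 1), ("R", 2)], [("W", 2)]]
# Same history with the order of P2's reads flipped.
p2_flipped = [[("W", 1)], [("R", 2), ("R", 1)], [("R", 1), ("R", 2)], [("W", 2)]]

print(is_sequentially_consistent(as_on_slide))
print(is_sequentially_consistent(p2_flipped))
```

With the reads flipped, P2 needs the write of 2 to precede the write of 1 while P3 needs the opposite, so no single order satisfies both.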
Weak consistency
1. Accesses to “synchronization variables” are sequentially consistent
2. No access to a synchronization variable is allowed to be performed until all previous writes have completed everywhere
3. No data access (read or write) is allowed to be performed until all previous accesses to synchronization variables have been performed
In other words, the processor doesn’t need to broadcast values at all, until a synchronization access happens
• But then it broadcasts all values to all cores
[Figure: example history; S denotes a synchronization access]
P1: W(x)1 W(x)2 S
P2: R(x)0 R(x)2 S R(x)2
P3: R(x)1 S R(x)2
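The "nothing is broadcast until a synchronization access" reading can be sketched with a toy model. Everything here is an assumption for illustration: the `Core` class is hypothetical, `sync()` stands in for an access to a synchronization variable, and the model shows only one of the behaviors weak consistency permits (real hardware may propagate a write earlier, as P2's `R(x)2` before `S` in the figure shows).

```python
class Core:
    """Toy core under weak consistency: writes stay local until sync()."""

    def __init__(self, shared):
        self.shared = shared          # globally visible memory (a dict)
        self.local = dict(shared)     # this core's possibly stale view
        self.pending = {}             # writes not yet made visible

    def write(self, var, value):
        self.local[var] = value
        self.pending[var] = value

    def read(self, var):
        return self.local[var]

    def sync(self):
        # Rule 2: previous writes complete before the sync is performed.
        self.shared.update(self.pending)
        self.pending.clear()
        # Rule 3: later data accesses happen after the sync, on fresh values.
        self.local = dict(self.shared)


shared = {"x": 0}
p1, p2 = Core(shared), Core(shared)

p1.write("x", 1)
p1.write("x", 2)
before_sync = p2.read("x")   # P2 may still see the old value

p1.sync()
p2.sync()
after_sync = p2.read("x")    # after both synchronize, P2 must see 2

print(before_sync, after_sync)
```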
Release consistency
Before accessing shared variable
• Acquire op must be completed
Before a release allowed
• All accesses must be completed
Acquire/release calls are sequentially consistent
Serves as “lock”
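The acquire/release discipline looks like ordinary lock usage from the programmer's side. The sketch below uses Python's `threading.Lock` only as an analogy: the shared variable is touched strictly between `acquire()` and `release()`. It says nothing about how hardware implements release consistency, and the counter and thread counts are arbitrary choices for illustration.

```python
import threading

counter = 0                    # the shared variable
lock = threading.Lock()


def worker(iterations):
    global counter
    for _ in range(iterations):
        lock.acquire()         # acquire completes before the shared access
        try:
            counter += 1       # access to the shared variable
        finally:
            lock.release()     # release happens after the access is done


threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 4 threads x 1000 increments
```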
MESI Protocol
Each cache line can be in one of 4 states
• Invalid – Line data is not valid (as in simple cache)
• Shared – Line is valid & not dirty, copies may exist in other caches
• Exclusive – Line is valid & not dirty, other processors do not have the line in their local caches
• Modified – Line is valid & dirty, other processors do not have the line in their local caches
(MESI = Modified, Exclusive, Shared, Invalid)
Achieves sequential consistency
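The four states above can be tied together with a per-line transition function. This is a minimal sketch of the textbook MESI transitions, not the exact variant of any particular processor; the event names (`local_read`, `local_write`, `snoop_read`, `snoop_write`) and the `others_have_copy` flag are made up for this example.

```python
def next_state(state, event, others_have_copy=False):
    """Return the next MESI state of one cache line.

    state: "M", "E", "S" or "I"
    event: "local_read" / "local_write" (this core's accesses) or
           "snoop_read" / "snoop_write" (another core's accesses seen on the bus)
    others_have_copy: only matters for a local read that misses (state "I").
    """
    if event == "local_read":
        if state == "I":
            return "S" if others_have_copy else "E"
        return state                       # M, E, S: read hit, no change
    if event == "local_write":
        return "M"                         # line becomes dirty and exclusive
    if event == "snoop_read":
        return "I" if state == "I" else "S"  # M writes back, then shares
    if event == "snoop_write":
        return "I"                         # another core takes ownership
    raise ValueError(f"unknown event: {event}")


state = "I"
history = [state]
for event in ["local_read", "local_write", "snoop_read", "snoop_write"]:
    state = next_state(state, event)
    history.append(state)

print(" -> ".join(history))
```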
Two classes of protocols to track sharing
Directory based
• Status of each memory block kept in just 1 location (=directory)
• Directory-based coherence has bigger overhead
• But can scale to bigger core counts
Snooping
• Every cache holding a copy of the data has a copy of the state
• No centralized state
• All caches are accessible via broadcast (bus or switch)
• All cache controllers monitor (or “snoop”) the broadcasts
  • To determine if they have a copy of what’s requested
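The scaling argument can be illustrated by counting who must be notified on a write. This is a rough sketch with hypothetical names (`snooping_messages`, `Directory`); it counts only invalidation recipients and ignores the request to the directory itself, acknowledgements, and data transfers.

```python
def snooping_messages(num_caches):
    """With snooping, a write is broadcast and every other cache must check it."""
    return num_caches - 1


class Directory:
    """Toy directory: one record per block listing the caches that hold it."""

    def __init__(self):
        self.sharers = {}   # block address -> set of cache ids

    def read(self, cache_id, block):
        self.sharers.setdefault(block, set()).add(cache_id)

    def write(self, cache_id, block):
        """Invalidate only the recorded sharers; return how many were notified."""
        others = self.sharers.get(block, set()) - {cache_id}
        self.sharers[block] = {cache_id}
        return len(others)


NUM_CACHES = 64
directory = Directory()
directory.read(0, 0x1000)
directory.read(1, 0x1000)

directory_msgs = directory.write(0, 0x1000)   # only cache 1 holds a copy
snoop_msgs = snooping_messages(NUM_CACHES)    # all other caches must snoop

print(directory_msgs, snoop_msgs)
```

The directory pays for its bookkeeping on every access, but the number of caches it disturbs depends on the sharers rather than on the core count.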
Multi-processor System: Example
[Figure: Processor 1 and Processor 2, each with its own L1 cache, on top of a shared L2 cache and memory; the steps below are animated on it]
Initially: memory holds [1000]: 5, and no cache holds the line
P1 reads 1000
• Miss in P1's L1, miss in L2; the line is fetched from memory
• L2 holds [1000]: 5; P1's L1 holds [1000]: 5 in state E
• Bits shown at L2: 00 → 10
P1 writes 1000
• P1's L1 holds [1000]: 6; state E → M
• L2 and memory still hold [1000]: 5
P2 reads 1000
• Miss in P2's L1
• L2 snoops 1000
• P1 writes back 1000: L2 now holds [1000]: 6; P1's state M → S
• P2 gets 1000: P2's L1 holds [1000]: 6 in state S
• Bits shown at L2: 10 → 11; memory still holds [1000]: 5
P2 requests ownership with write intent
• P1's line: S → I
• P2's line: S → E
• Bits shown at L2: 11 → 01
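The example can be replayed in code. The `System` class below is a hypothetical, heavily simplified model written for this walkthrough: two private L1 caches holding `[state, value]` per address, a shared L2, and memory. It ignores evictions and the L2 bits shown in the figure.

```python
class System:
    """Toy two-core system with MESI states in the L1 caches."""

    def __init__(self, memory):
        self.memory = dict(memory)
        self.l2 = {}
        self.l1 = {"P1": {}, "P2": {}}    # addr -> [state, value]

    def _fill_l2(self, addr):
        if addr not in self.l2:           # L2 miss: fetch from memory
            self.l2[addr] = self.memory[addr]

    def read(self, core, addr):
        line = self.l1[core].get(addr)
        if line and line[0] != "I":
            return line[1]                # L1 hit
        shared = False
        for other, cache in self.l1.items():
            held = cache.get(addr)
            if other != core and held and held[0] != "I":
                if held[0] == "M":
                    self.l2[addr] = held[1]   # snoop hit on dirty line: write back
                held[0] = "S"
                shared = True
        self._fill_l2(addr)
        self.l1[core][addr] = ["S" if shared else "E", self.l2[addr]]
        return self.l2[addr]

    def request_ownership(self, core, addr):
        own = self.l1[core].get(addr)
        if own and own[0] in ("M", "E"):
            return                        # already the exclusive owner
        for other, cache in self.l1.items():
            held = cache.get(addr)
            if other != core and held:
                if held[0] == "M":
                    self.l2[addr] = held[1]
                held[0] = "I"             # invalidate the other copy
        self._fill_l2(addr)
        self.l1[core][addr] = ["E", self.l2[addr]]

    def write(self, core, addr, value):
        self.request_ownership(core, addr)
        self.l1[core][addr] = ["M", value]

    def states(self, addr):
        return tuple(self.l1[c].get(addr, ["I"])[0] for c in ("P1", "P2"))


system = System({1000: 5})
log = []
system.read("P1", 1000)
log.append(system.states(1000))           # P1 reads 1000
system.write("P1", 1000, 6)
log.append(system.states(1000))           # P1 writes 1000
p2_value = system.read("P2", 1000)
log.append(system.states(1000))           # P2 reads 1000
system.request_ownership("P2", 1000)
log.append(system.states(1000))           # P2 requests ownership

print(log, p2_value, system.l2[1000], system.memory[1000])
```

Each entry of `log` is the pair of (P1 state, P2 state) after the corresponding step of the walkthrough.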
The alternative: incoherent memory
As core counts grow, many argue that maintaining coherence
• Will slow down the machines
• Will waste a lot of energy
• Will not scale
Intel SCC
• Single chip cloud computer – for research purposes
• 48 cores
• Shared, incoherent memory
• Software is responsible for correctness
The Barrelfish operating system
• By Microsoft & ETH (Zurich)
• Assumes no coherency as the baseline
Intel SCC
[Figure: Intel SCC chip, with shared (incoherent) memory]