Outline
• Introduction
  • What is a memory consistency model?
  • Who should care?
• Memory consistency models
  • Strict consistency, sequential consistency
  • Relaxed consistency models: processor consistency, weak ordering, release consistency
• Summary
April 21, 2023 SoC Architecture 2
Shared memory architectures
Memory Consistency Model
• Specifies constraints on the order in which memory operations (from any process) can appear to execute with respect to one another
  • What orders are preserved?
  • Given a load, constrains the possible values returned by it
• Without it, we can’t tell much about the execution of a shared-memory program
Example of Orders
• What’s the intuition?
• Cache coherence does not say anything about the order between accesses to different variables A and B
• Whatever it is, we need an ordering model for clear semantics across different locations as well, so programmers can reason about what results are possible
P1 P2
/*Assume initial values of A and B are 0*/
(1a) A = 1; (2a) print B;
(1b) B = 2; (2b) print A;
Memory Consistency Model
• Implications for both programmer and system designer
  • The programmer uses it to reason about correctness and possible results
  • The system designer can use it to constrain how much accesses can be reordered by compiler or hardware
• Contract between programmer and system
Many Consistency Models
• Strict consistency (linearizability, or atomic consistency)
• Sequential consistency
• Causal consistency
• Release consistency
• Eventual consistency
• Delta consistency
• PRAM consistency (also known as FIFO consistency)
• Weak consistency
• Vector-field consistency
• Fork consistency
• One-copy serializability
• Entry consistency
Goals of consistency models
• Programmability: enables programmers to reason about the behavior and correctness of programs
• Performance: imposes ordering constraints that strike a good balance between programming complexity and performance
• Portability: should be portable to different machines
Strict Consistency Model
• Strict consistency: any read to a memory location X returns the value stored by the most recent (last) write operation to X, where "most recent" is defined with respect to a global clock.
• For uniprocessors, the ’last’ write follows the program order.
• What is ’last’ for multiprocessors?
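Example 1: OK
P1: W(x)1
P2: R(x)1 R(x)1
Example 2: OK (R(x)0 is issued before W(x)1 in global time)
P1: W(x)1
P2: R(x)0 R(x)1
Example 3: NO (R(x)0 is issued after W(x)1 in global time; OK for Sequential Consistency)
P1: W(x)1
P2: R(x)0 R(x)1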
Assume that all variables initially have a value of 0.
Sequential Consistency
Sequential Consistency
“A multiprocessor is sequentially consistent if the result of any execution is the same as if the operations of all the processors were
executed in some sequential order, and the operations of each individual processor
appear in this sequence in the order specified by its program.” [Lamport, 1979]
Sequential Consistency
• (as if there were no caches, and a single memory)
• Total order achieved by interleaving accesses from different processes
• Maintains program order, and memory operations from all processes appear to [issue, execute, complete] atomically w.r.t. others
• Programmer’s intuition is maintained
[Figure: processors P1, P2, ..., Pn issue memory references in program order to a single Memory through a switch. The “switch” is randomly set after each memory reference.]
SC example
• Program order among operations from a single processor
• Atomic execution of memory operations
SC Example
• What matters is the order in which the program appears to execute
• Possible outcomes for the printed values (A,B): (0,0), (1,0), (1,2); impossible under SC: (0,2)
  • We know 1a->1b and 2a->2b by program order
  • A = 0 implies 2b->1a, which implies 2a->1b
  • B = 2 implies 1b->2a, which leads to a contradiction
• The actual execution 1b->2a->2b->1a is not SC
P1 P2
/*Assume initial values of A and B are 0*/
(1a) A = 1; (2a) print B;
(1b) B = 2; (2b) print A;
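The outcome analysis for this program can be checked by brute force. The following is a minimal sketch, assuming each memory operation executes atomically and that an SC execution is any interleaving that preserves each processor's program order. The helper names (`interleavings`, `sc_outcomes`) are illustrative and do not come from the slides.

```python
# Program from the slide: P1 runs (1a), (1b); P2 runs (2a), (2b).
P1 = [("write", "A", 1), ("write", "B", 2)]
P2 = [("read", "B"), ("read", "A")]


def interleavings(p, q):
    """Yield every merge of p and q that keeps each program's own order."""
    if not p:
        yield list(q)
        return
    if not q:
        yield list(p)
        return
    for rest in interleavings(p[1:], q):
        yield [p[0]] + rest
    for rest in interleavings(p, q[1:]):
        yield [q[0]] + rest


def sc_outcomes():
    """Return the set of printed (A, B) pairs over all SC executions."""
    outcomes = set()
    for order in interleavings(P1, P2):
        memory = {"A": 0, "B": 0}  # initial values are 0, as on the slide
        printed = {}
        for op in order:
            if op[0] == "write":
                memory[op[1]] = op[2]
            else:
                printed[op[1]] = memory[op[1]]
        outcomes.add((printed["A"], printed["B"]))
    return outcomes


print(sorted(sc_outcomes()))  # [(0, 0), (1, 0), (1, 2)]
```

The pair (0, 2) never appears, which matches the contradiction argument above.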
Discussion on SC
• Sequential consistency model
  • Intuitive semantics to the programmer
  • Easily implementable by satisfying its sufficient conditions
    • Write completion
    • Write atomicity: writes visible to all processes
• Restricts many performance optimizations in hardware and compiler techniques
  • Write buffer
  • General interconnect with multiple memory modules
  • Overlapping write operations
  • Non-blocking read operations
Canonical hardware optimization (without caches)
• Write buffer
  • A write transaction is not complete until acknowledged
  • On a write, a processor simply inserts the write operation into the write buffer and proceeds without waiting for the write to complete
  • Subsequent reads are allowed to bypass any previous writes in the write buffer for faster completion
• Purpose: hide the latency of write operations
• Write buffers are safe to use in a uniprocessor, since bypassing between operations to different locations does not violate uniprocessor data dependences
• What happens in a multiprocessor?
Write buffer
• If write buffers are used, both reads of the flags can return 0. This violates SC, because the Write2Read program order (to different locations) is not preserved.
• The terms t1, t2, t3, t4 in the figure indicate the order in which the corresponding read/write operations execute at memory.
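The effect can be reproduced with a small simulation. This is a sketch under assumptions: the figure's program is not reproduced in the text, so a flag-based program (P1 writes Flag1 and then reads Flag2; P2 writes Flag2 and then reads Flag1) is assumed here, and the `Processor` class and `run_flags` function are hypothetical names.

```python
class Processor:
    """A processor with a FIFO write buffer in front of shared memory."""

    def __init__(self, memory):
        self.memory = memory
        self.buffer = []  # pending (location, value) writes, oldest first

    def write(self, loc, value):
        # The write is only buffered; the processor does not wait for it.
        self.buffer.append((loc, value))

    def read(self, loc):
        # Forward the newest buffered value for this location if there is
        # one; otherwise bypass the buffered writes and read memory.
        for pending_loc, value in reversed(self.buffer):
            if pending_loc == loc:
                return value
        return self.memory[loc]

    def drain(self):
        # Buffered writes finally reach memory, in FIFO order.
        for loc, value in self.buffer:
            self.memory[loc] = value
        self.buffer.clear()


def run_flags():
    """Assumed program: each processor sets its flag, then reads the other."""
    memory = {"Flag1": 0, "Flag2": 0}
    p1, p2 = Processor(memory), Processor(memory)
    p1.write("Flag1", 1)   # buffered at P1
    p2.write("Flag2", 1)   # buffered at P2
    r1 = p1.read("Flag2")  # bypasses P1's buffered write
    r2 = p2.read("Flag1")  # bypasses P2's buffered write
    p1.drain()
    p2.drain()
    return r1, r2, memory
```

Both reads return 0 even though both writes eventually reach memory. No interleaving of the four operations that respects program order can produce this result.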
Overlapping writes
• Allowing writes to different locations to be re-ordered is safe for uniprocessor programs.
• What about multiprocessors? The write completion may be out of program order.
• An example
  • The interconnection network allows concurrent transactions.
  • Multiple memory modules.
  • To exploit the concurrency allowed by the network and memory, a write to another location starts before the previous one is complete (acknowledged).
Overlapping writes
• For P2, when Head=1, what is the value of Data?
• Since there is no guarantee that the write to Data completes before the write to Head, there is no guarantee that Data = 2000. This violates SC, because the Write2Write program order (to different locations) is not preserved.
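A minimal sketch of this scenario, assuming P1 issues Data = 2000 and then Head = 1 without waiting for acknowledgements, P2 spins until Head = 1 and then reads Data, and Data is initially 0. The per-module latencies and the function name `run_overlapping_writes` are hypothetical.

```python
def run_overlapping_writes(latency):
    """Return the value of Data that P2 reads once it observes Head == 1.

    latency maps a location to the assumed number of time steps (a
    non-negative integer) its memory module needs to make a write visible.
    """
    memory = {"Data": 0, "Head": 0}
    # P1 issues both writes back to back, at times 0 and 1 (program order).
    in_flight = [
        (0 + latency["Data"], "Data", 2000),
        (1 + latency["Head"], "Head", 1),
    ]
    time = 0
    while True:
        # Writes whose arrival time has come are applied to memory.
        for arrival, loc, value in in_flight:
            if arrival == time:
                memory[loc] = value
        # P2 spins on Head and reads Data as soon as it sees Head == 1.
        if memory["Head"] == 1:
            return memory["Data"]
        time += 1


print(run_overlapping_writes({"Data": 1, "Head": 1}))  # 2000
print(run_overlapping_writes({"Data": 5, "Head": 1}))  # 0
```

When the module holding Data is slower, the write to Head completes first and P2 reads the stale value.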
Nonblocking read operations
• Many processors do not stall for the return value of a read operation.
  • They can proceed past a read operation by using techniques such as speculative execution and dynamic scheduling.
  • Reads (Read2Read to different locations) complete out of program order.
• What does this mean for multiprocessors?
Nonblocking read operations
• P2 reads Data before the updated Head. This violates SC, because the Read2Read program order (to different locations) is not preserved.
Architectures with caches
• More chances to reorder operations in ways that can violate sequential consistency
  • E.g. a write-through cache behaves similarly to a write buffer.
  • Even if a read hits the cache, the processor cannot read the cached value until its previous operations in program order are complete!
• Additional issues:
  • A cache coherence protocol is needed to propagate (update, invalidate) a newly written value to all cached copies of the modified location.
  • Detecting when a write is complete needs more transactions.
  • It is hard to make propagation to multiple copies atomic, so preserving the program order is more challenging.
Detect the completion of write operations
• Suppose a write-through cache for P1 and P2
  • P2 initially has Data in its cache
• What if P2 reads Data from its cache after it sees Head=1, but before Data is updated?
• This can be avoided if P1 waits for P2’s cache copy of Data to be updated or invalidated before proceeding with the write to Head.
Maintain the illusion of atomicity for writes
• All processors see writes to the same location in the same order, making writes appear atomic.
• Example
  • A, B, C are cached
  • P3 and P4 may see the writes to A by P1 and P2 in a different order. Register1 and register2 may get 1 and 2, respectively. This violates SC.
Maintain the illusion of atomicity for writes
• The value of a write must not be returned by a read until all invalidates are acked. Otherwise, SC is violated.
• Example
  • A, B, C are cached
  • P2 sees A=1 and P3 sees B=1, but the write A=1 is not yet seen when A is read, so register1=0, violating SC.
Compiler optimization
• Compilers re-order memory references, similar to hardware-generated re-orderings
• Register allocation example
  • If the compiler register-allocates the location Head on P2 (by doing a single read of Head and then reading the value within the register), the while loop may never terminate in some executions (if the single read on P2 returns the old value of Head).
  • This violates SC, because the loop is guaranteed to terminate in every sequentially consistent execution of the code.
Summary of SC
• Sequential consistency requirements:
  • Program order requirement: a processor must ensure that its previous memory operation is complete before proceeding with the next memory operation in program order.
    • A write is complete only after all invalidates (or updates) are acked.
  • Write atomicity requirement (for cached architectures):
    • Writes to the same location must be made visible in the same order to all processors.
    • The value of a write must not be returned by a read until all invalidates are acked.
• These requirements make many hardware and compiler optimizations invalid.
  • Memory reference order must be strictly enforced.
  • Instruction scheduling, register allocation, etc.
Relax the requirements
• To improve performance, need to
  • Relax the program order requirement
    • Read/write order for different addresses: Write2Read, Write2Write, and Read2Read/Write
    • Read/write order for the same address must always be enforced.
  • Relax the write atomicity requirement
    • Allow a read to return the value of another processor’s write before the write is complete (visible to all processors)
  • Relaxation related to both program order and write atomicity
    • Allow a read to return the value of its own previous write before the write is complete.
Relaxed consistency models
• Relaxed models that relax program orders
  • Processor consistency (PC) [Goodman]
  • Weak consistency (weak ordering, WC or WO) [Dubois et al]
  • Release consistency (RC) [Gharachorloo et al]
Processor Consistency
• Processor consistency (PC)
  • Writes done by a single processor are received by all other processors in the order in which they were issued, but writes from different processors may be seen in a different order by different processors
• The basic idea
  • To better reflect the reality of networks in which the latency between different nodes can be different.
Processor Consistency
• Rules: 2 memory access conditions
  • On a given processor, before a read is allowed to perform, all previous read accesses must be performed.
  • On a given processor, before a write is allowed to perform, all previous read or write accesses must be performed.
Example
P1: W(x)1 W(x)2
P2: R(x)2 R(x)1    NO

P1: A = 1;
P2: while (A==0); B = 1;
P3: while (B==0); print A;
SC: print 1
PC: print 0 or 1
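The three-processor example can be sketched by modelling only P3's local view of memory. This is an illustration under assumptions: the function name `p3_prints` is hypothetical, and the argument stands for the order in which the two writes become visible at P3.

```python
def p3_prints(arrival_order_at_p3):
    """Return what P3 prints, given the order in which writes reach P3.

    arrival_order_at_p3 lists the writes "A" (from P1) and "B" (from P2)
    in the order they become visible in P3's local copy of memory.
    """
    local = {"A": 0, "B": 0}   # P3's view of memory, initially all 0
    for loc in arrival_order_at_p3:
        local[loc] = 1
        if local["B"] != 0:    # P3's loop `while (B==0);` exits
            return local["A"]  # print A
    raise RuntimeError("B never arrived, P3 keeps spinning")


print(p3_prints(["A", "B"]))  # 1
print(p3_prints(["B", "A"]))  # 0
```

P2 writes B only after it has seen A = 1, so under SC the write to A always reaches P3 first and P3 prints 1. Under PC the two writes come from different processors, so P3 may see them in either order and may print 0 or 1.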
Weak Consistency (WC)
• Idea: Accesses to shared variables should be done within critical sections; exploit this fact
  • Memory accesses are distinguished as either data or sync operations.
• Rules: 3 memory access conditions
  • All previous synchronization accesses must be performed before a read or a write access is allowed to perform with respect to any other processor.
  • All previous read and write accesses must be performed before a synchronization access is allowed to perform with respect to any other processor.
  • Synchronization accesses are sequentially consistent with respect to one another.
• The WO model ensures that writes always appear atomic to the programmer
Implementing Weak consistency
• Program: identify/label memory accesses as data or sync operations.
  • Program construct(s)
  • Define a special data type
• Compiler: translate the high-level intention to machine language:
  • Associate the type with a particular address (region)
  • Or, map the special type to, for example, a SYNC instruction, if the hardware provides such primitives.
• Hardware support:
  • Each processor uses a counter to track outstanding transactions
When should an operation be a sync operation?
• Race: given a sequentially consistent execution, an operation forms a race with another operation if
  • the two operations access the same location;
  • at least one of the operations is a write;
  • there are no other intervening operations between the two operations.
• Example
  • The operations on Data are data operations, because the write and read of Data will always be separated by the intervening operations of the write and read of Head.
  • The operations on Head are not always separated by other operations. Therefore, they are sync operations.
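The race definition can be turned into a small checker over one sequentially consistent execution. This is a sketch under assumptions: the trace format and the name `find_races` are hypothetical, and the check additionally requires the two operations to come from different processors, which the slide leaves implicit.

```python
def find_races(execution):
    """Return pairs of adjacent operations that form a race.

    execution is one sequentially consistent execution, given as a list of
    (processor, kind, location) tuples with kind "R" or "W".
    """
    races = []
    # "No intervening operations" means the two operations are adjacent.
    for first, second in zip(execution, execution[1:]):
        proc1, kind1, loc1 = first
        proc2, kind2, loc2 = second
        same_location = loc1 == loc2
        one_is_write = "W" in (kind1, kind2)
        if proc1 != proc2 and same_location and one_is_write:
            races.append((first, second))
    return races


# Assumed program: P1 writes Data then Head; P2 reads Head, then Data.
trace = [
    ("P1", "W", "Data"),
    ("P1", "W", "Head"),
    ("P2", "R", "Head"),
    ("P2", "R", "Data"),
]
print(find_races(trace))
# [(('P1', 'W', 'Head'), ('P2', 'R', 'Head'))]
```

Only the operations on Head are reported, so in this execution Head would be labelled sync and Data would stay a data location.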
Programmer-centric view on Weak consistency: Sync or Data?
[Flowchart: given a memory location, ask "Never races?" Yes: data operation. No or don’t know: sync operation.]
Hardware support for WC
• Each processor uses a counter to track outstanding transactions
  • The counter is incremented when the processor issues an operation and decremented when a previously issued operation completes
• Each processor must ensure that
  • a sync operation is not issued until all previous operations are complete, i.e., count = 0;
  • no operations are issued until the previous sync operation completes.
• Note: memory operations between two sync operations may still be reordered and overlapped with respect to one another.
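The counter scheme can be sketched as a small state machine. This is an illustrative model, not a hardware description; the class `WeakOrderingProcessor` and its method names are hypothetical.

```python
class WeakOrderingProcessor:
    """Tracks outstanding operations to decide when an issue is allowed."""

    def __init__(self):
        self.outstanding = 0       # issued but not yet completed operations
        self.sync_pending = False  # True while a sync is outstanding

    def can_issue(self, is_sync=False):
        if self.sync_pending:
            # No operation is issued until the previous sync completes.
            return False
        if is_sync and self.outstanding > 0:
            # A sync waits until all previous operations are complete.
            return False
        return True

    def issue(self, is_sync=False):
        """Try to issue an operation; return True if it was issued."""
        if not self.can_issue(is_sync):
            return False
        self.outstanding += 1
        if is_sync:
            self.sync_pending = True
        return True

    def complete(self):
        """One previously issued operation completes."""
        assert self.outstanding > 0, "nothing is outstanding"
        self.outstanding -= 1
        if self.sync_pending:
            # Only the sync itself can be outstanding while it is pending.
            self.sync_pending = False


p = WeakOrderingProcessor()
p.issue()                     # data operation, may overlap with the next
p.issue()                     # second data operation
print(p.issue(is_sync=True))  # False: count is 2, the sync must wait
```

Data operations issued between two syncs are not ordered against each other in this model, which mirrors the note above.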
Release Consistency (RC)
• Idea: Extends weak consistency by considering lock (acquire) and unlock (release) operations on synchronization variables
• Rules: 3 memory access conditions
  • Before a read or write operation on shared data is performed, all previous acquires done by the process must have completed successfully.
  • Before a release is allowed to be performed, all previous reads and writes by the process must have completed (flush writes).
  • Accesses to synchronization variables are FIFO consistent (sequential consistency is not required).
Example: valid ordering for release consistency
Release Consistency
If all accesses to shared variables are surrounded by acquire and release operations, results are the same as with sequential consistency
Blocks of operations within critical section are made atomic via acquire/release operations
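At the programming level, the discipline looks like ordinary lock-based code. The sketch below uses Python's `threading.Lock` as a stand-in for acquire and release; it illustrates how shared accesses are bracketed, not how hardware implements release consistency. The function name `run_counter` and its parameters are hypothetical.

```python
import threading


def run_counter(num_threads=4, increments=1000):
    """Increment a shared counter, with every access inside a lock."""
    lock = threading.Lock()
    shared = {"count": 0}

    def worker():
        for _ in range(increments):
            with lock:                # acquire on entry, release on exit
                shared["count"] += 1  # shared access inside the section

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["count"]


print(run_counter())  # 4000
```

Because every access to the shared counter sits between an acquire and a release, the result is the same as in a sequentially consistent execution.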
Weak vs Release Consistency
[Figure: under weak consistency, blocks 1, 2, 3 of reads/writes are separated by sync operations. Under release consistency, the same blocks are separated by an Acquire (read) and a Release (write).]
• Orders enforced by release consistency: Acquire -> all; all -> Release
• RCsc: special -> special (special: acquire/release)
Weak vs Release Consistency
• Weak consistency: S (Sync) waits for earlier writes; R/W after S wait for S
• Release consistency: R/W wait for Acq. (Lock); Rel. (Unlock) waits for earlier R/W
• Release consistency further relaxes synchronization constraints by distinguishing between Acquire (Lock) and Release (Unlock) operations
Summary of consistency models
• Strict Consistency: a read always returns the value of the most recent write to the same memory location
• Sequential Consistency: the result of any execution appears as an interleaving of the individual programs, each strictly in its sequential program order
• Processor Consistency: writes issued by each processor are seen in program order, but writes from different processors can be seen out of order
• Weak Consistency: the programmer uses synchronization operators to enforce sequential consistency
• Release Consistency: weak consistency with two synchronization operators, acquire and release; each operator is guaranteed to be processor-consistent
[Figure labels: Uniprocessors; Multiprocessors]
Summary of consistency models
Relaxation | W2R | W2W | R2RW | Read others’ write early | Read own write early | Safety net
SC √
PC √ √ √ Read-modify-write
WC √ √ √ √ sync
RC √ √ √ √ Acquire, release
A consistency model: what it is; what conditions and primitives to enforce; what order (relaxation) a processor sees; how does it differ from others.
Conclusion
• A memory consistency model is a contract between a shared memory machine and its programs
  • 3P: programmability, performance and portability
• Different consistency models exist. They have subtle but important differences.
  • Different performance, overhead, hardware cost etc.
  • Programmers prefer an intuitive interface, like SC.