
Memory Consistency

Zhonghai Lu

[email protected]

Outline

• Introduction
  • What is a memory consistency model?
  • Who should care?
• Memory consistency models
  • Strict consistency, sequential consistency
  • Relaxed consistency models: processor consistency, weak ordering, release consistency
• Summary

April 21, 2023 SoC Architecture 2


Shared memory architectures


Memory Consistency Model

• Specifies constraints on the order in which memory operations (from any process) can appear to execute with respect to one another
  • What orders are preserved?
  • Given a load, constrains the possible values returned by it
• Without it, we can’t tell much about the execution of a shared-memory program


Example of Orders

• What’s the intuition?
• Cache coherence does not say anything about the order between accesses to different variables A and B
• Whatever it is, we need an ordering model for clear semantics across different locations as well, so programmers can reason about what results are possible

/* Assume initial values of A and B are 0 */

P1                P2
(1a) A = 1;       (2a) print B;
(1b) B = 2;       (2b) print A;


Memory Consistency Model

• Implications for both programmer and system designer
  • Programmer uses it to reason about correctness and possible results
  • System designer can use it to constrain how much accesses can be reordered by compiler or hardware
• Contract between programmer and system

Many Consistency Models

• Strict consistency (linearizability, or atomic consistency)
• Sequential consistency
• Causal consistency
• Release consistency
• Eventual consistency
• Delta consistency
• PRAM consistency (also known as FIFO consistency)
• Weak consistency
• Vector-field consistency
• Fork consistency
• One-copy serializability
• Entry consistency



Goals of consistency models

• Programmability: enables programmers to reason about the behavior and correctness of programs
• Performance: imposes ordering constraints that strike a good balance between programming complexity and performance
• Portability: should be portable to different machines

Strict Consistency Model

Strict consistency: any read to a memory location X returns the value stored by the most recent (last) write operation to X, relative to a global clock.

• For uniprocessors, ’last’ write follows the program order.
• What is ’last’ for multiprocessors?

Assume that all variables initially have a value of 0. Time runs left to right.

(a) P1: W(x)1
    P2:        R(x)1 R(x)1      OK

(b) P1:        W(x)1
    P2: R(x)0        R(x)1      OK

(c) P1: W(x)1
    P2:        R(x)0 R(x)1      NO (OK for Sequential Consistency)
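The strict-consistency rule can be checked mechanically once every operation carries a global timestamp. The sketch below is only an illustration: the timestamps and the names `strictly_consistent`, `case_a`, `case_b`, and `case_c` are assumptions chosen to match the three timelines above, not values given on the slide.

```python
# Each event on location x: (global_time, processor, kind, value).
# The timestamps are assumed values chosen for illustration.

def strictly_consistent(events, initial=0):
    """Return True if every read returns the most recent write in global time."""
    current = initial
    for _, _, kind, value in sorted(events):
        if kind == "W":
            current = value
        elif value != current:      # a read that misses the latest write
            return False
    return True

# Write first, then both reads see it.
case_a = [(1, "P1", "W", 1), (2, "P2", "R", 1), (3, "P2", "R", 1)]
# First read happens before the write, so returning 0 is fine.
case_b = [(1, "P2", "R", 0), (2, "P1", "W", 1), (3, "P2", "R", 1)]
# First read happens after the write but still returns 0.
case_c = [(1, "P1", "W", 1), (2, "P2", "R", 0), (3, "P2", "R", 1)]

print(strictly_consistent(case_a))  # True
print(strictly_consistent(case_b))  # True
print(strictly_consistent(case_c))  # False
```

The third case fails only because of the global clock; a model without that clock, such as sequential consistency, can still order the stale read before the write.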

Sequential Consistency



“A multiprocessor is sequentially consistent if the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program.” [Lamport, 1979]

(as if there were no caches, and a single memory)

• Total order achieved by interleaving accesses from different processes
• Maintains program order, and memory operations from all processes appear to [issue, execute, complete] atomically w.r.t. others
• Programmer’s intuition is maintained

[Figure: processors P1, P2, ..., Pn issue memory references in program order to a single memory through a switch. The “switch” is randomly set after each memory reference.]


SC example

• Program order among operations from a single processor
• Atomic execution of memory operations


SC Example

• What matters is the order in which the program appears to execute
• Possible outcomes for (A,B): (0,0), (1,0), (1,2); impossible under SC: (0,2)
• We know 1a->1b and 2a->2b by program order
• A = 0 implies 2b->1a, which implies 2a->1b
• B = 2 implies 1b->2a, which leads to a contradiction
• The actual execution 1b->2a->2b->1a is not SC

/* Assume initial values of A and B are 0 */

P1                P2
(1a) A = 1;       (2a) print B;
(1b) B = 2;       (2b) print A;
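The set of SC outcomes for this two-processor program can be confirmed by brute force: enumerate every interleaving that keeps each processor's program order and run it against a single memory. This is a minimal sketch; the tuple encoding and the helper names `interleavings` and `run` are assumptions made for illustration.

```python
from itertools import combinations

# (label, kind, location, value) in program order for each processor
P1 = [("1a", "W", "A", 1), ("1b", "W", "B", 2)]
P2 = [("2a", "R", "B", None), ("2b", "R", "A", None)]

def interleavings(p, q):
    """Yield every merge of p and q that keeps each list's internal order."""
    n = len(p) + len(q)
    for slots in combinations(range(n), len(p)):
        pi, qi = iter(p), iter(q)
        yield [next(pi) if i in slots else next(qi) for i in range(n)]

def run(order):
    """Execute one total order on a single memory; return the printed (A, B)."""
    mem = {"A": 0, "B": 0}
    seen = {}
    for _, kind, loc, val in order:
        if kind == "W":
            mem[loc] = val
        else:
            seen[loc] = mem[loc]
    return (seen["A"], seen["B"])

outcomes = {run(o) for o in interleavings(P1, P2)}
print(sorted(outcomes))  # [(0, 0), (1, 0), (1, 2)]
```

None of the six legal interleavings produces (0,2), which matches the contradiction argument based on program order.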


Discussion on SC

Sequential consistency model
• Intuitive semantics for the programmer
• Easily implementable by satisfying its sufficient conditions
  • Write completion
  • Write atomicity: writes are visible to all processes
• Restricts many performance optimizations in hardware and compilers
  • Write buffer
  • General interconnect with multiple memory modules
  • Overlapping write operations
  • Non-blocking read operations


Canonical hardware optimization (without caches)

Write buffer
• A write transaction is not complete until it is acknowledged.
• On a write, a processor simply inserts the write operation into the write buffer and proceeds without waiting for the write to complete.
• Subsequent reads are allowed to bypass any previous writes in the write buffer for faster completion.
• Purpose: hide the latency of write operations.

Write buffers are safe to use in a uniprocessor, since bypassing between operations to different locations does not violate uniprocessor data dependences.

What happens in a multiprocessor?

Write buffer



If write buffers are used, both reads of the flags can return 0. This violates SC because the Write2Read program order (to different locations) is not maintained.

Terms t1, t2, t3, t4 indicate the order in which the corresponding read/write operations execute at memory.
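The effect can be reproduced with a small software model of a write buffer. This is a sketch under assumptions: the slide's figure is not available here, so the variable names `Flag1` and `Flag2`, the `Processor` class, and the exact t1..t4 schedule are hypothetical stand-ins for a Dekker-style program in which each processor sets its own flag and then reads the other's.

```python
from collections import deque

class Processor:
    """A processor with a FIFO write buffer in front of shared memory."""
    def __init__(self, memory):
        self.memory = memory
        self.buffer = deque()

    def write(self, loc, value):
        self.buffer.append((loc, value))      # buffered, not yet in memory

    def read(self, loc):
        # Same-location reads are served from the buffer (newest entry first)...
        for pending_loc, value in reversed(self.buffer):
            if pending_loc == loc:
                return value
        return self.memory[loc]               # ...others bypass the buffered writes

    def drain(self):
        while self.buffer:
            loc, value = self.buffer.popleft()
            self.memory[loc] = value

memory = {"Flag1": 0, "Flag2": 0}
p1, p2 = Processor(memory), Processor(memory)

p1.write("Flag1", 1)      # sits in P1's write buffer
p2.write("Flag2", 1)      # sits in P2's write buffer
r1 = p1.read("Flag2")     # t1: the read reaches memory before either write
r2 = p2.read("Flag1")     # t2
p1.drain()                # t3: write of Flag1 reaches memory
p2.drain()                # t4: write of Flag2 reaches memory
print(r1, r2)  # 0 0
```

Each processor on its own still sees correct uniprocessor behavior; the violation only appears when the two executions are combined.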

Overlapping writes


Allowing writes to different locations to be reordered is safe for uniprocessor programs.

What about multiprocessors? Write completion may be out of program order.

An example
• The interconnection network allows concurrent transactions.
• There are multiple memory modules.
• To exploit the concurrency allowed by the network and memory, a write to another location starts before the previous one is complete (acknowledged).

Overlapping writes


For P2, when Head = 1, what is the value of Data? Since there is no guarantee that the write to Data completes before the write to Head, there is no guarantee that Data = 2000. This violates SC because the Write2Write program order (to different locations) is not maintained.
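A toy timing model makes the Data/Head outcome concrete. This is a sketch under assumptions: the function `simulate` and the per-module latency numbers are invented for illustration, and P2 is modeled as reading Data at the instant it first observes Head = 1.

```python
import heapq

def simulate(latency):
    """P1 issues Data = 2000, then Head = 1, without waiting for acks.
    Each write arrives at its memory module after that module's latency.
    Returns the value of Data that P2 reads once it observes Head == 1."""
    memory = {"Data": 0, "Head": 0}
    in_flight = []
    for issue_time, (loc, value) in enumerate([("Data", 2000), ("Head", 1)]):
        heapq.heappush(in_flight, (issue_time + latency[loc], loc, value))
    while in_flight:
        _, loc, value = heapq.heappop(in_flight)   # next write to complete
        memory[loc] = value
        if memory["Head"] == 1:                    # P2 leaves its spin loop
            return memory["Data"]

print(simulate({"Data": 1, "Head": 1}))  # 2000
print(simulate({"Data": 5, "Head": 1}))  # 0
```

With equal latencies the writes complete in program order; a slow Data module lets the later write to Head overtake the earlier one.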

Nonblocking read operations


Many processors do not stall for the return value of a read operation. They can proceed past a read operation by using techniques such as speculative execution and dynamic scheduling.

Reads (Read2Read to different locations) complete out of program order. What does this mean for multiprocessors?

Nonblocking read operations


P2 reads Data before it reads the updated Head. This violates SC because the Read2Read program order (to different locations) is not maintained.

Architectures with caches

• More chances to reorder operations in ways that can violate sequential consistency. E.g., a write-through cache behaves similarly to a write buffer.
• Even if a read hits in the cache, the processor cannot read the cached value until its previous operations in program order are complete!
• Additional issues:
  • A cache coherence protocol is needed to propagate (update, invalidate) a newly written value to all cached copies of the modified location.
  • Detecting when a write is complete needs more transactions.
  • It is hard to make propagation to multiple copies atomic, so preserving the program order is more challenging.

Detect the completion of write operations

• Suppose P1 and P2 have write-through caches, and P2 initially has Data in its cache.
• What if P2 reads Data from its cache after it sees Head = 1, but before Data is updated?
• This can be avoided if P1 waits for P2’s cached copy of Data to be updated or invalidated before proceeding with the write to Head.

Maintain the illusion of atomicity for writes


All processors see writes to the same location in the same order, making writes appear atomic.

Example
• A, B, C are cached.
• P3 and P4 may see the writes to A by P1 and P2 in a different order.
• Register1 and register2 may get 1 and 2, respectively. This violates SC.

Maintain the illusion of atomicity for writes


The value of a write is not returned by a read until all invalidates are acked. Otherwise, SC is violated.

Example
• A, B, C are cached.
• P2 sees A = 1 and P3 sees B = 1, but P3 may not yet see A = 1, so register1 = 0, violating SC.

Compiler optimization

Compilers reorder memory references in ways similar to hardware-generated reorderings.

Register allocation example
• If the compiler register-allocates the location Head on P2 (by doing a single read of Head and then reading the value within the register), the while loop may never terminate in some executions (if the single read on P2 returns the old value of Head).
• This violates SC, because the loop is guaranteed to terminate in every sequentially consistent execution of the code.
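The register-allocation hazard can be imitated in a bounded simulation. This is a sketch under assumptions: `spin`, `head_at`, the `hoist` flag, and the step at which P1's write becomes visible are all hypothetical, and the iteration cap stands in for "never terminates".

```python
def spin(read_head, max_iters=5, hoist=False):
    """Model P2's loop `while (Head == 0);`.
    read_head(i) gives the shared value of Head at step i.
    Returns the step at which the loop exits, or None if it does not
    exit within max_iters."""
    if hoist:
        reg = read_head(0)            # single read, value kept in a "register"
        for i in range(max_iters):
            if reg != 0:
                return i
        return None
    for i in range(max_iters):
        if read_head(i) != 0:         # shared memory re-read on every iteration
            return i
    return None

# Assumed schedule: P1's write Head = 1 becomes visible at step 2.
head_at = lambda step: 1 if step >= 2 else 0

print(spin(head_at))              # 2
print(spin(head_at, hoist=True))  # None
```

The hoisted version is a legal uniprocessor optimization, which is why a compiler unaware of the shared access may apply it.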

Summary of SC

Sequential consistency requirements:
• Program order requirement: a processor must ensure that its previous memory operation is complete before proceeding with the next memory operation in program order.
  • A write is complete only after all invalidates (or updates) are acked.
• Write atomicity requirement (for cached architectures):
  • Writes to the same location must be made visible in the same order to all processors.
  • The value of a write is not returned by a read until all invalidates are acked.

These requirements make many hardware and compiler optimizations invalid.
• Memory reference order must be strictly enforced.
• Instruction scheduling, register allocation, etc.

Relax the requirements

To improve performance, we need to:
• Relax the program order requirement
  • Read/write order for different addresses: Write2Read, Write2Write, and Read2Read/Write
  • Read/write order for the same address must always be enforced.
• Relax the write atomicity requirement
  • Allow a read to return the value of another processor’s write before the write is complete (visible to all processors).
• Relaxation related to both program order and write atomicity
  • Allow a read to return the value of its own previous write before the write is complete.

Relaxed consistency models

Relaxed models that relax some or all program orders
• Processor consistency (PC) [Goodman]
• Weak consistency (weak ordering, WC or WO) [Dubois et al]
• Release consistency (RC) [Gharachorloo et al]

Processor Consistency

Processor consistency (PC)
• Writes done by a single processor are received by all other processors in the order in which they were issued, but writes from different processors may be seen in a different order by different processors.

The basic idea
• To better reflect the reality of networks in which the latency between different nodes can be different.

Processor Consistency

Rules: 2 memory access conditions
• On a given processor, before a read is allowed to perform, all previous read accesses must be performed.
• On a given processor, before a write is allowed to perform, all previous read or write accesses must be performed.

Example

P1: W(x)1 W(x)2
P2:             R(x)2 R(x)1      NO

P1            P2                 P3
A = 1;        while (A == 0);    while (B == 0);
              B = 1;             print A;

SC: print 1
PC: print 0 or 1
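The three-processor example can be traced with a toy model in which every processor keeps its own view of memory and a write reaches each remote view after a link-specific delay. This is a sketch under assumptions: the function `p3_prints` and the delay values are invented for illustration, and only P3's view is tracked.

```python
def p3_prints(delay_a_to_p3):
    """P1 writes A = 1 at time 0. P2 sees it at time 1 and then writes B = 1,
    which reaches P3 at time 2. The write to A reaches P3 after
    delay_a_to_p3 time units. Returns the value of A that P3 prints."""
    deliveries = [
        (0 + 1, "P2", "A", 1),               # A arrives at P2, ending its loop
        (0 + delay_a_to_p3, "P3", "A", 1),   # A arrives at P3
        (1 + 1, "P3", "B", 1),               # B, written by P2, arrives at P3
    ]
    view_p3 = {"A": 0, "B": 0}
    for _, dest, loc, value in sorted(deliveries):
        if dest == "P3":
            view_p3[loc] = value
            if view_p3["B"] == 1:            # P3 leaves its loop and prints A
                return view_p3["A"]

print(p3_prints(1))  # 1
print(p3_prints(5))  # 0
```

Each processor's own writes still arrive everywhere in issue order; only writes from different processors are seen in different orders, which is what processor consistency permits.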

Weak Consistency (WC)

• Idea: accesses to shared variables should be done within critical sections; exploit this fact.
• Memory accesses are distinguished as either data or sync operations.
• Rules: 3 memory access conditions
  • All previous synchronization accesses must be performed before a read or a write access is allowed to perform with respect to any other processor.
  • All previous read and write accesses must be performed before a synchronization access is allowed to perform with respect to any other processor.
  • Synchronization accesses are sequentially consistent with respect to one another.
• The WO model ensures that writes always appear atomic to the programmer.

Implementing Weak consistency

• Program: identify/label memory accesses as data or sync operations.
  • Program construct(s)
  • Define a special data type
• Compiler: translate the high-level intention to machine language.
  • Associate the type with a particular address (region)
  • Or, map the special type to, for example, a SYNC instruction, if the hardware provides such primitives.
• Hardware support:
  • Each processor uses a counter to track outstanding transactions

When should an operation be a sync operation?

Race
• Given a sequentially consistent execution, an operation forms a race with another operation if
  • the two operations access the same location;
  • at least one of the operations is a write;
  • there are no other intervening operations between the two operations.

Example
• The operations on Data are data operations, because the write and read of Data will always be separated by the intervening operations of the write and read of Head.
• The operations on Head are not always separated by other operations. Therefore, they are sync operations.
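The race definition translates directly into a scan over adjacent operations of one sequentially consistent execution. This is a sketch under assumptions: the function `racing_locations`, the trace encoding, and the sample trace are hypothetical, and the check that the two operations come from different processors is an added assumption not stated in the definition. A location counts as data only if it races in no SC execution, whereas this code inspects a single trace.

```python
def racing_locations(execution):
    """execution: list of (processor, kind, location) in SC total order.
    Returns the locations that have two adjacent conflicting operations."""
    races = set()
    for (p1, k1, l1), (p2, k2, l2) in zip(execution, execution[1:]):
        same_location = l1 == l2
        one_is_write = "W" in (k1, k2)
        if same_location and one_is_write and p1 != p2:
            races.add(l1)
    return races

trace = [
    ("P1", "W", "Data"),
    ("P2", "R", "Head"),   # P2 still sees Head == 0
    ("P1", "W", "Head"),
    ("P2", "R", "Head"),   # P2 now sees Head == 1
    ("P2", "R", "Data"),
]
print(racing_locations(trace))  # {'Head'}
```

In this trace the write and read of Data are never adjacent, so only Head is flagged and would be labeled as a sync location.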

Programmer-centric view on Weak consistency

Sync or Data? Given a memory location, ask whether its operations never race:
• Yes: data operation
• No: sync operation
• Don’t know: sync operation (the conservative choice)

Hardware support for WC

• Each processor uses a counter to track outstanding transactions.
  • The counter is incremented when the processor issues an operation.
  • It is decremented when a previously issued operation completes.
• Each processor must ensure that
  • a sync operation is not issued until all previous operations are complete, i.e., count = 0;
  • no operations are issued until the previous sync operation completes.
• Note: memory operations between two sync operations may still be reordered and overlapped with respect to one another.
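The counter scheme can be sketched as a small issue gate. This is an illustrative model, not a hardware description: the class `WeakOrderingProcessor`, its method names, and the operation labels are assumptions.

```python
class WeakOrderingProcessor:
    """Issue logic for weak ordering driven by an outstanding-operation counter."""
    def __init__(self):
        self.outstanding = 0       # issued but not yet completed operations
        self.sync_pending = False  # True while a sync operation is in flight
        self.log = []              # names of operations issued so far

    def can_issue(self, kind):
        if self.sync_pending:      # nothing issues until the sync completes
            return False
        if kind == "sync":         # a sync waits until count == 0
            return self.outstanding == 0
        return True                # data operations may overlap freely

    def issue(self, kind, name):
        if not self.can_issue(kind):
            return False           # the processor would stall here
        self.outstanding += 1
        self.sync_pending = kind == "sync"
        self.log.append(name)
        return True

    def complete(self, kind):
        self.outstanding -= 1
        if kind == "sync":
            self.sync_pending = False

p = WeakOrderingProcessor()
print(p.issue("data", "W Data"))   # True
print(p.issue("data", "W X"))      # True, overlaps with the previous write
print(p.issue("sync", "S"))        # False, two operations still outstanding
p.complete("data"); p.complete("data")
print(p.issue("sync", "S"))        # True, count is 0
print(p.issue("data", "R Y"))      # False, sync not yet complete
p.complete("sync")
print(p.issue("data", "R Y"))      # True
```

The two data writes issue back to back, which reflects the note that operations between two syncs may still overlap.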

Release Consistency (RC)

• Idea: extends weak consistency by considering lock (acquire) and unlock (release) operations on synchronization variables
• Rules: 3 memory access conditions
  • Before a read or write operation on shared data is performed, all previous acquires done by the process must have completed successfully.
  • Before a release is allowed to be performed, all previous reads and writes by the process must have completed (flush writes).
  • Accesses to synchronization variables are FIFO consistent (sequential consistency is not required).
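The first two rules can be checked against a single process's trace. This is a sketch under assumptions: the function `obeys_release_consistency`, the trace format, and the perform times are hypothetical, and the third rule (FIFO consistency of synchronization accesses across processes) is not checked.

```python
def obeys_release_consistency(trace):
    """trace: one process's operations in program order, each a tuple
    (kind, performed_at) with kind in {"acquire", "release", "data"}.
    performed_at is an assumed global time at which the operation performs."""
    last_acquire = float("-inf")   # latest perform time of an earlier acquire
    last_data = float("-inf")      # latest perform time of an earlier data access
    for kind, t in trace:
        if kind == "data":
            if t <= last_acquire:  # rule 1: data waits for earlier acquires
                return False
            last_data = max(last_data, t)
        elif kind == "acquire":
            last_acquire = max(last_acquire, t)
        elif kind == "release":
            if t <= last_data:     # rule 2: release waits for earlier reads/writes
                return False
    return True

# Data accesses inside the critical section may perform out of order.
ok = [("acquire", 1), ("data", 3), ("data", 2), ("release", 4)]
# The release performs before an earlier write has completed.
early_release = [("acquire", 1), ("data", 2), ("data", 5), ("release", 4)]
# A data access performs before the preceding acquire.
early_data = [("acquire", 3), ("data", 2), ("release", 4)]

print(obeys_release_consistency(ok))             # True
print(obeys_release_consistency(early_release))  # False
print(obeys_release_consistency(early_data))     # False
```

A data access that follows a release in program order is not ordered against it by these rules, so a trace such as `[("release", 5), ("data", 2)]` passes the check.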

Example: valid ordering for release consistency

Release Consistency

If all accesses to shared variables are surrounded by acquire and release operations, results are the same as with sequential consistency

Blocks of operations within critical section are made atomic via acquire/release operations

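Weak vs Release Consistency

[Figure: under weak consistency, read/write blocks 1, 2, and 3 are separated by sync operations; under release consistency, the sync operations become an acquire (read) and a release (write).]

• Release consistency orders: acquire -> all; all -> release
• RCsc: special -> special, where special means acquire/release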

Weak vs Release Consistency

• Weak consistency: S (sync) waits for earlier writes; R/W after S wait for S.
• Release consistency: R/W wait for Acq. (lock); Rel. (unlock) waits for earlier R/W.
• Release consistency further relaxes the synchronization constraints by distinguishing between Acquire (Lock) and Release (Unlock) operations.


Summary of consistency models

• Strict Consistency: a read always returns the value of the most recent write to the same memory location
• Sequential Consistency: the result of any execution appears as an interleaving of the individual programs, each strictly in sequential program order
• Processor Consistency: writes issued by each processor are seen in program order, but writes from different processors can be seen out of order
• Weak Consistency: the programmer uses synchronization operators to enforce sequential consistency
• Release Consistency: weak consistency with two synchronization operators, acquire and release. Each operator is guaranteed to be processor-consistent

Summary of consistency models

Relaxation   W2R   W2W   R2RW   Read others’ write early   Read own write early   Safety net
SC           -     -     -      -                          √                      -
PC           √     -     -      √                          √                      Read-modify-write
WC           √     √     √      -                          √                      sync
RC           √     √     √      -                          √                      Acquire, release

A consistency model: what it is; what conditions and primitives to enforce; what order (relaxation) a processor sees; how does it differ from others.

Conclusion

• A memory consistency model is a contract between a shared-memory machine and its programs
  • 3P: programmability, performance, and portability
• Different consistency models exist.
  • They have subtle but important differences.
  • Different performance, overhead, hardware cost, etc.
  • Programmers prefer an intuitive interface, like SC.

