Memory Models
In Software and in Hardware
Practical Considerations
Agenda
• Motivation
• Factors
• Levels of Memory Models
– Models for software: Java, CLI
– Models for hardware: IA-32, IA-64
MM Motivation and Factors
http://citeseer.nj.nec.com/adve95shared.html
MM Motivation
• Multithreaded programming
– Shared memory
• An example: producer/consumer queue
• Does it work correctly?
– The program performs the operations in the correct order!
Thread 1:
Task t = new Task();
queue.insert(t);
Thread 2:
Task t = queue.get();
t.run();
Memory Model Levels
• Programmer-Level Models: Java MM, CLI MM, SC, Coherence, Release Consistency, etc.
• Implementor-Level Models (Virtual Machine): Java Memory Model (Implementor View), Microsoft CLI
• Implementor-Level Models (Hardware): IA-32, IA-64, Alpha, PowerPC, TSO, PSO, etc.
• Layers in between: Compiler, VM
Factors that Affect MM
• Compiler: performs optimizations
• [Virtual Machine]: yet more optimizations
• Processor: performs operations out of order
• Memory subsystem: delivers updates out of order
MM Factors: Compiler & VM
• Compilers
– Store values in registers
– Reorder operations
• Example
Source:
int x = 0, answer = 0;
void f() {
  while (!answer) { x = x+1; }
}
After optimization:
int x = 0, answer = 0;
void f() {
  int tmp1 = x;
  int tmp2 = answer;
  while (!tmp2) { tmp1 = tmp1+1; }
  x = tmp1;
}
– answer: no read from memory inside the loop
– x: no write to memory inside the loop
– Both are held in registers all the time
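A minimal Java sketch of the same loop, assuming the flag is declared volatile so that it cannot be held in a register across iterations. The class name SpinUntilAnswered and the helper runOnce are hypothetical; only x, answer, and f come from the slide.

```java
// Sketch: a volatile flag has to be re-read on every loop iteration.
class SpinUntilAnswered {
    static int x = 0;                        // plain field, as on the slide
    static volatile boolean answer = false;  // assumption: made volatile for this sketch

    static void f() {
        while (!answer) {   // volatile read: not hoisted out of the loop
            x = x + 1;
        }
    }

    // Hypothetical helper: runs f() in a worker thread, sets the flag,
    // and reports whether the worker noticed it and stopped.
    static boolean runOnce() {
        x = 0;
        answer = false;
        Thread worker = new Thread(SpinUntilAnswered::f);
        worker.setDaemon(true);
        worker.start();
        answer = true;      // volatile write: becomes visible to the worker
        try {
            worker.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker terminated: " + runOnce());
    }
}
```

Without volatile, the transformation shown on the slide is allowed, and the worker might never leave the loop.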
MM Factors: Processor
• Includes a lot of features that help it tolerate memory latency
– Most of them change the order of memory operations
• Examples
– Out-of-order execution: the most important performance enabler of modern processors
– Write combining: reads/writes to the same cache line
– Read/write buffers
– Many more
MM Factors: Memory Subsystem
• Hardware
– Cache Coherence Protocols
• Software
– DSM Coherence Protocols
The Tradeoff
The more optimizations there are in the system, the less transparent it is to the programmer
[Figure: a scale from Sequential Consistency to Any Order; transparency decreases and performance increases along it]
Programmer View Models
Java – Original specification
Java – New specification
Microsoft’s CLI (.NET) specification
Java MM – Original Spec
• Java Language Specification, Chapter 17 http://java.sun.com/docs/books/jls/
• A. Gontmakher, A. Schuster, ACM TOCS, vol. 18, No. 4, pp. 333-386 http://www.cs.technion.ac.il/~assaf/publications/java.ps
• Defines an abstract virtual machine
– Really hard to understand
– Non-compliant implementation by SUN (!!!)
– Many other problems
Java MM: Motivation
• Built-in synchronization
– Modeled after monitors
– Integrated with memory model
• Performance: Avoid synchronization
– Immutable objects
Java MM: The Abstract Model
[Figure: Thread 1 and Thread 2 each have an execution engine and a local memory. The execution engine and local memory are connected by use/assign; local memory and main memory are connected by load/store on the thread side and read/write on the main memory side.]
Java MM: The Constraints
• read x,v → load x,v → use x,v
• assign x,v → store x,v → write x,v
• read x,v → load x,v; store x,v → write x,v
• load x,v → use x,v
• assign x,v → store x,v: not always (Prescient Stores)
• … and more
[Figure: Thread 1's execution engine, local memory, and main memory, connected by use/assign, load/store, and read/write]
Java MM: Applying The Model
Program: one thread observes x==1 and then performs y=1; the other observes y==1 and then performs x=1
read y,1      read x,1
load y,1      load x,1
use y,1       use x,1
assign x,1    assign y,1
store x,1     store y,1
write x,1     write y,1
Java MM: How To Deal With
• Determine the dependencies between use/assigns that follow from the constraints
• Then, ignore all the operations except for use/assigns
• Non-Operational Model!
Java MM - Views
[Figure: the operation layers use/assign, load/store, and read/write, related to four views: Programmer View (non-operational), Implementor View (non-operational), Programmer View (operational), Implementor View (operational)]
Java MM: Characterizations
• Java is stronger than Coherence
– Proof below
• Volatile variables: Sequential Consistency
• Locks: variant of Release Consistency
– Semantics of locks is not SC or PC (and not stated explicitly at all).
Java MM – Characterizations 2
• Full definition: regular variables
– Based on Legal Serialization. Constraints:
[Figure: constraint diagrams over the operation pairs (r x,v; w y,w) and (r/w x; r/w x). Legend: sees a value written by another thread; Same Variable rule; Transistor rule]
– Excludes Prescient Stores
– Proof: 5+ pages
Java MM – Characterizations 3
• Java: full definition (regular variables only)
– Constraints:
[Figure: constraint diagrams over the operation sequences (r x,v; r y,1; r y,2; w y,w), (r x,v; w y,1; r y,2; w y,w), (r x,v; w y,2; w y,w), and (r/w x; r/w x). Legend: writes a value seen by another thread]
– Includes Prescient Stores
– Proof: 20+ pages!
– Coherence follows from the first Constraint
Java MM – Coherence Proof 1: Java is not weaker than Coherence
• Take operations for variable X from all threads.
• Divide each thread into blocks:
load-block: load (use)*
store-block: assign (use|assign) store (use)*
• Each block: one load/store operation.
• Sort the blocks by their memory accesses.
• Result: legal serialization of use/assigns to X.
Java MM – Coherence Proof 2: Java is stronger than Coherence
• Coherence: easily shown
• Java (without Prescient Stores):
– Transistor Rule: 1.1 → 1.2, 2.1 → 2.2
– Legal Serialization: 2.2 → 1.1, 1.2 → 2.1
– Cycle of dependencies!
Thread 1          Thread 2
1 use x,1         1 use y,1
2 assign y,1      2 assign x,1
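The cycle argument can be checked mechanically. The sketch below builds the dependency graph for this execution and looks for a cycle with a depth-first search. It assumes the arrow directions are: program-order edges 1.1 before 1.2 and 2.1 before 2.2, and value-flow edges 2.2 before 1.1 and 1.2 before 2.1 (each use must follow the assign whose value it sees). The class DependencyCycle and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch: detect a cycle in the "must happen before" graph of an execution.
class DependencyCycle {
    private final Map<String, List<String>> edges = new HashMap<>();

    void addEdge(String from, String to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        edges.computeIfAbsent(to, k -> new ArrayList<>());
    }

    boolean hasCycle() {
        Set<String> done = new HashSet<>();
        Set<String> onPath = new HashSet<>();
        for (String node : edges.keySet()) {
            if (visit(node, done, onPath)) return true;
        }
        return false;
    }

    // Depth-first search; a node reached again while still on the path closes a cycle.
    private boolean visit(String node, Set<String> done, Set<String> onPath) {
        if (onPath.contains(node)) return true;
        if (done.contains(node)) return false;
        onPath.add(node);
        for (String next : edges.get(node)) {
            if (visit(next, done, onPath)) return true;
        }
        onPath.remove(node);
        done.add(node);
        return false;
    }

    // The execution from the slide: operation "t.n" is step n of thread t.
    static DependencyCycle slideExample() {
        DependencyCycle g = new DependencyCycle();
        g.addEdge("1.1", "1.2"); // Thread 1: use x,1 before assign y,1
        g.addEdge("2.1", "2.2"); // Thread 2: use y,1 before assign x,1
        g.addEdge("2.2", "1.1"); // use x,1 sees the value of assign x,1
        g.addEdge("1.2", "2.1"); // use y,1 sees the value of assign y,1
        return g;
    }

    public static void main(String[] args) {
        System.out.println("cycle: " + slideExample().hasCycle());
    }
}
```

With only the two program-order edges the graph is acyclic; the two value-flow edges are what close the cycle.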
Java MM – Coherence Proof 3: Prescient Stores
• A store can move presciently up
– Before its corresponding assign
– But not before another load/store
• The previous execution is now valid
– But it can still be fixed…
Thread 1: read x,1; read y,0; read y,2; write y,1
Thread 2: read y,1; read x,0; read x,2; write x,1
Thread 3: write x,2; write y,2
– Necessarily has a load
– The store, even prescient, now cannot move up
Java MM: Conclusions
• Programming with Locks: easy
• Programming with volatile variables: easy
• Programming with regular variables:
– Using just Coherence – OK
– Using full definition – hard
– Really accounting for Prescient Stores – nightmare
New Java MM
In progress, by Bill Pugh et al.
http://www.javasoft.com/aboutJava/communityprocess/jsr/jsr_133.html
http://www.cs.umd.edu/~pugh/java/memoryModel/semantics.pdf
New Java MM: Motivation
• Correctly synchronized programs must have SC semantics
• Incorrectly synchronized programs must have (safe) semantics
– Safety: JVM must never fail
– Security: Prevent attacks based on unsynchronized code
New Java MM: Requirements
• Backward Compatibility
– No new language constructs
– No new VM instructions
– No system-specific artifacts, e.g. garbage collection
• Clear Distinction between compiler and VM
– No optimizations in the compiler
– Thus, VM model is the same as the one visible to the programmer
• Implementability
– No unrealistic requirements on software or hardware
New Java MM: The Approach
• Exact semantics for all memory accesses
– Not really relevant
– Except that SC for Properly Labelled (no data races) programs can be shown
• Semantics for support of established idioms
– Final fields
– Volatile variables
– Locks
• Quite practical
New Semantics of Final: Immutable objects
• Many objects in Java are designed to be immutable
– Rationale: avoiding synchronization
– Best known example – java.lang.String
• The problem: String not really immutable
– Can see writes to the buffer, but not to the length and offset!
• Security hole
New Semantics of Final: Fixing immutable objects
• Solution 1: Make ALL String methods synchronized
– Serious hit at performance
– Not needed on single-processor machines
• Solution 2: Extending semantics of final fields
– Access that reads a final field sees it initialized
– An object must not escape the constructor
• Problem: String: array elements cannot be final
– “weak acquire semantics”: reads dependent on the final field are seen initialized too
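A minimal sketch of an immutable, String-like class written to rely on the final-field rules above: all fields are final, the array is reached only through a final field, and `this` does not escape the constructor. The class ImmutableText is hypothetical and only mirrors the buffer/offset/length layout mentioned on the slides.

```java
import java.util.Arrays;

// Sketch: an immutable value object that can be shared without synchronization.
final class ImmutableText {
    private final char[] buffer;  // elements are reached through a final field
    private final int offset;
    private final int length;

    ImmutableText(char[] source, int offset, int length) {
        // Defensive copy: no other code holds a reference to this array.
        this.buffer = Arrays.copyOfRange(source, offset, offset + length);
        this.offset = 0;
        this.length = length;
        // `this` is not stored or passed anywhere before the constructor returns.
    }

    int length() {
        return length;
    }

    @Override
    public String toString() {
        return new String(buffer, offset, length);
    }

    public static void main(String[] args) {
        char[] source = "hello world".toCharArray();
        ImmutableText text = new ImmutableText(source, 6, 5);
        source[6] = 'X';           // later writes to the source do not affect the copy
        System.out.println(text);
    }
}
```

The defensive copy is a design choice of this sketch; it keeps outside writers away from the array, which the final-field rule alone does not do.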
New Semantics for Volatile
• Previously: Sequential Consistency
– But: no relation with the regular operations
– Not really useful for synchronization (recall the producer/consumer example)
• Now: Acquire/Release Semantics
– Read works as Acquire
– Write works as Release
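A small sketch of how acquire/release volatiles make a producer/consumer hand-off work: the producer fills in a plain field and then publishes the object through a volatile write; the consumer's volatile read that sees the object also sees the field. The names VolatileHandoff, Task.payload, and slot are hypothetical.

```java
// Sketch: publishing an object through a volatile reference.
class VolatileHandoff {
    static final class Task {
        int payload;  // plain (non-volatile, non-final) field
    }

    static volatile Task slot;  // write acts as release, read acts as acquire

    // Hypothetical helper: hands one task to a consumer thread and
    // returns the payload the consumer observed (-1 if interrupted).
    static int produceAndConsume(int value) {
        slot = null;
        int[] seen = new int[1];
        Thread consumer = new Thread(() -> {
            Task t;
            while ((t = slot) == null) {  // volatile read (acquire)
                Thread.yield();
            }
            seen[0] = t.payload;          // ordered after the acquire
        });
        consumer.start();

        Task task = new Task();
        task.payload = value;             // plain write, before the release
        slot = task;                      // volatile write (release)
        try {
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(produceAndConsume(42));
    }
}
```

Under the previous semantics described above, the volatile accesses had no ordering relation with the plain write to payload, so the consumer could have seen a stale value.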
New Semantics of Volatile: Double-Checked Locking
• An object s must be created the first time it is requested
synchronized(this) { if (s==null) s = new S(); }
– Slow! Locking on each access
• Double-Checking:
if (s==null) {
  synchronized(this) {
    if (s==null) s = new S();
  }
}
• The reader can reorder access to s and to its fields
• But, if s is volatile, it works!
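A self-contained sketch of the double-checked idiom with a volatile field, under the acquire/release semantics described above. The class LazyHolder, the empty class S, and the helper countInstances are hypothetical; the local variable is an optional refinement that keeps the fast path to a single volatile read.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: double-checked locking with a volatile field.
class LazyHolder {
    static final class S { }

    private volatile S s;  // volatile is what makes the idiom safe

    S get() {
        S local = s;                  // first check, no lock
        if (local == null) {
            synchronized (this) {
                local = s;            // second check, under the lock
                if (local == null) {
                    local = new S();
                    s = local;        // volatile write publishes the object
                }
            }
        }
        return local;
    }

    // Hypothetical helper: many threads call get(); returns how many
    // distinct S instances they observed (expected: one).
    static int countInstances(int threads) {
        LazyHolder holder = new LazyHolder();
        Set<S> seen = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> seen.add(holder.get()));
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("distinct instances: " + countInstances(8));
    }
}
```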
New Semantics of Volatile: Advanced Double-Checking
static volatile boolean initialized = false;
if (!initialized) {
  synchronized(this) {
    if (!initialized) {
      s1 = new S();
      s1.connect(…);
      initialized = true;
    }
  }
}
Final fields won’t help
New Semantics of Locks
• Only locks on the same variable have acquire/release semantics
– Simplifies implementation
– Different locks do not synchronize anyway, so no need for acquire
• In original spec, each lock is a memory barrier
– Even synchronized(new Object()) {}
– Compiler cannot safely remove locks
– In the new semantics, recursive locks are no-op
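A minimal sketch of the "same lock" rule: both threads synchronize on one lock object, so each release of the lock is paired with the next acquire and no increment is lost. The class SameLockCounter and the helper runTwoThreads are hypothetical names.

```java
// Sketch: two threads synchronizing on the same lock object.
class SameLockCounter {
    private final Object lock = new Object();
    private int count;  // guarded by lock

    void increment() {
        synchronized (lock) {   // acquire
            count++;
        }                       // release
    }

    int get() {
        synchronized (lock) {
            return count;
        }
    }

    // Hypothetical helper: two threads increment the same counter.
    static int runTwoThreads(int perThread) {
        SameLockCounter counter = new SameLockCounter();
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                counter.increment();
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runTwoThreads(10000));
    }
}
```

If each thread synchronized on its own private object instead, the two critical sections would not be ordered with respect to each other, which is why a lock such as synchronized(new Object()) {} can be removed under the new semantics.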
CLI Memory Model
The VM for Microsoft’s .NET
http://www.ecma.ch/ecma1/STAND/ecma-335.htm
Standard ECMA-335, Common Language Infrastructure
Chapter 11.6, Memory Model and Optimizations
CLI Memory Model
• So Short!!! Just 4 pages
• The system
– Flat shared memory
– Threads access the same memory
• Any reordering of operations is permitted
– Except volatile reads/writes
– Except synchronous exceptions
• Atomic access defined for some operations
• Threading APIs define synchronization semantics
CLI: Volatile Consistency
• Volatile reads and writes
– Accesses to volatile variables
– Explicit methods: Thread.VolatileRead, Thread.VolatileWrite
– Thread.MemoryBarrier – same as both VolatileRead and VolatileWrite
• Volatile read – acquire semantics, volatile write – release semantics
• Different threads can see different orders of volatile writes of different threads
CLI: Locks
• Usual locking semantics: obtaining and releasing locks
– Synchronized methods
– System.Threading.Monitor class – simulates C.A.R. Hoare’s monitor (only tries to; simulation is no more complete than in Java)
• Acquiring a lock has acquire semantics, releasing it has release semantics
CLI: Atomic Memory Accesses
• Word-length accesses, aligned 4-byte accesses are atomic
• System.Threading.Interlocked: atomic read-modify-write operations
– Increment, Decrement, Exchange, CompareExchange
• One- and two-byte reads are atomic. Byte writes may write the whole word
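For readers more familiar with Java, the sketch below shows the closest Java analogue of atomic read-modify-write operations, using java.util.concurrent.atomic.AtomicInteger. This is an analogy only, not the CLI API: incrementAndGet plays the role of Increment and compareAndSet is similar in spirit to CompareExchange (it returns a success flag rather than the old value). The class AtomicCounterSketch and its helpers are hypothetical. Note also that Java's atomic classes carry volatile memory semantics.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: atomic read-modify-write operations in Java.
class AtomicCounterSketch {
    // Several threads increment one counter without any lock.
    static int concurrentIncrements(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.incrementAndGet();  // atomic increment
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter.get();
    }

    // Compare-and-set replaces the value only if it equals the expected one.
    static int compareAndSetTwice() {
        AtomicInteger value = new AtomicInteger(5);
        value.compareAndSet(5, 7);  // succeeds: 5 -> 7
        value.compareAndSet(5, 9);  // fails: the value is 7, not 5
        return value.get();
    }

    public static void main(String[] args) {
        System.out.println(concurrentIncrements(4, 1000));
        System.out.println(compareAndSetTwice());
    }
}
```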
Conclusions: Using CLI
• All concurrent accesses might be synchronized using synchronized methods or Monitor class
• Volatile variables: no common order. Probably usable in the simplest cases
– Designed for accessing hardware registers. There it fits
• Atomic memory access: no memory barrier semantics
– Probably just forgotten
– Useful in some simple cases
Conclusions: Implementing CLI
• Lots of disclaimers in the spec – no unimplementable requirements. Thus, implementation is straightforward
– For instance, Alpha has no instruction to write a byte – implementation of atomic write would be problematic. Java has this problem
• On the other hand, all low-level mechanisms are present (Interlocked)
Conclusions: JVM vs. CLI
• Similar semantics for locks
– Except that in Java, nested locks are no-op, thus locks can be eliminated by the compiler
– In Java, acquire/release happens only if synchronizing on the same lock object. In CLI – full acquire/release.
• Similar semantics for volatiles
– Except that in CLI, the consistency of volatiles is weaker. It is unclear if the Double Checked Locking idiom should work
• Similarly unusable semantics for regular variables
– Except for Java’s provisions for object construction (semantics of final fields)
• CLI adds low-level interlocked accesses
Hardware Memory Models
IA-64 and IA-32
IA-32
• Memory reads: acquire semantics
– Except that reads can see local writes early; see below
• Memory writes: release semantics
– Except that there is no global order of writes; see below
• Interlocked memory accesses: using processor lock prefix
IA-64: Memory Accesses
• Regular memory accesses – unordered
• Attributes to memory accesses: release or acquire
– Acquire: ld.acq instruction
– Release: st.rel instruction
• Memory Fence (mf)
– AKA Memory Barrier, is both acquire and release.
IA-64: Atomic Accesses
• CMPXCHG (Compare and Exchange)
– Compare memory with a given value. Exchange if equal
– Can have either acquire (cmpxchg.acq) or release (cmpxchg.rel) semantics
• FAA (fetch and add)
– Also acquire or release semantics
• XCHG (Exchange)
– Only acquire semantics
IA-64: Semantics of ld.acq, st.rel
• Constraints (>> is program order, → is execution order):
– Acquire >> X ⇒ Acquire → X
– X >> Release ⇒ X → Release
– Fence >> X ⇒ Fence → X
– X >> Fence ⇒ X → Fence
• Global order of all the strong write operations
T1: st.rel [x]=1
T2: ld.acq r1=[x]; ld r2=[y]
T3: st.rel [y]=1
T4: ld.acq r3=[y]; ld r4=[x]
Forbidden: r1=1, r3=1, r2=0, r4=0
IA-64 Semantics: Exceptions
• Load may see value from store buffer
• Inserting mf between st.rel and ld.acq solves the problem
• But: in Java semantics, this execution is OK!
T1 T2
st.rel [x]=1 st.rel [y]=1
ld.acq r1=[x] ld.acq r3=[y]
ld r2=[y] ld r4=[x]
Permitted: r1=1, r3=1, r2=0, r4=0
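To see why this outcome counts as an exception, it helps to compare it with Sequential Consistency. The sketch below enumerates every interleaving of the two threads over a single shared memory (the SC model, with no store buffers) and collects the possible register values. It is a model of SC only, not of IA-64; the class StoreBufferLitmus and its encoding of the operations are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: enumerate all sequentially consistent executions of the two-thread test.
// T1: x=1; r1=x; r2=y      T2: y=1; r3=y; r4=x
class StoreBufferLitmus {
    static Set<String> outcomes() {
        Set<String> results = new HashSet<>();
        explore(0, 0, new int[2], new int[4], results);
        return results;
    }

    // mem[0]=x, mem[1]=y; regs[0..3]=r1..r4; pc1/pc2 count executed operations.
    private static void explore(int pc1, int pc2, int[] mem, int[] regs, Set<String> results) {
        if (pc1 == 3 && pc2 == 3) {
            results.add("r1=" + regs[0] + ",r2=" + regs[1]
                    + ",r3=" + regs[2] + ",r4=" + regs[3]);
            return;
        }
        if (pc1 < 3) {  // let T1 take the next step
            int[] m = mem.clone();
            int[] r = regs.clone();
            switch (pc1) {
                case 0: m[0] = 1; break;       // x = 1
                case 1: r[0] = m[0]; break;    // r1 = x
                default: r[1] = m[1]; break;   // r2 = y
            }
            explore(pc1 + 1, pc2, m, r, results);
        }
        if (pc2 < 3) {  // let T2 take the next step
            int[] m = mem.clone();
            int[] r = regs.clone();
            switch (pc2) {
                case 0: m[1] = 1; break;       // y = 1
                case 1: r[2] = m[1]; break;    // r3 = y
                default: r[3] = m[0]; break;   // r4 = x
            }
            explore(pc1, pc2 + 1, m, r, results);
        }
    }

    public static void main(String[] args) {
        Set<String> sc = outcomes();
        System.out.println(sc);
        System.out.println("slide outcome reachable under SC: "
                + sc.contains("r1=1,r2=0,r3=1,r4=0"));
    }
}
```

No interleaving produces r2=0 together with r4=0, so the outcome on the slide is only reachable when a load can take its value from the local store buffer before the store is globally visible.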
IA-64 Semantics: Conclusion
• Simple. Clean
• Very usable: direct mapping to both Java and CLI memory models
– Especially fits the new Java Memory Model (or more reasonably, the new Java Memory Model especially fits IA-64 ;)
• IA-32: Obviously developed before MP systems became common (for Intel processors)
– Cannot change architecture now