Formal Design and Verification Methods for Shared Memory Systems
Ratan Nalumasu
Dissertation Defense
September 10, 1998
9/10/1998 Design Complexity 2
Problems Facing Digital Design
• Complexity
• Longer design time
• Shorter time to market
Current Debugging Technology
+ Full model
– Partial examination
– No assurance
– Weaker properties
– Difficult correctness metrics
– Full model
Formal Methods
• Formal methods = Math based techniques
• Continuous math : Engineering = Discrete math : Digital system design
“It is what the designers want. It’s just challenging to prove.”
Formal Methods based Design
– Reduced model
+ Complete examination
+ Better assurances (on the reduced model)
+ Stronger property language
+ Better correctness metrics
+ Reduced model
FM Taxonomy
• Manual verification techniques: Interactive theorem provers
• Automatic verification techniques: Model checkers
• Compilation techniques: Refinement rules
Interactive Theorem Provers
+ Can deal with infinite state systems
– Extensive manual reasoning
Proof of a compilation scheme
+ Good for algorithm verification
Model Checking
process p(x) {
  global G; local L;
  while (...) { recv ...; send ...; }
}
process q(x,y) ...
[Figure: state graph with states 0, 1, 2, 3; each state is a valuation such as (G=0, p.L=0, ...)]
Model Checking Strengths
• Automatic
• If a property fails, the model checker shows the error trace
– Deadlock: how the deadlocked state is reached from the initial state
– Assertion: how the violation is reached from the initial state
– Starvation: a loop where no progress is made
Model Checking: Example
• Construct graph of the system, and check the property: Deadlock at (22)
[Figure: two processes, each with local states 0, 1, 2, and the graph of their global states 00, 10, 20, 21, 22, 01, 11, 02, 12]
• State explosion → Partial order reductions
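The example can be replayed with a few lines of explicit-state search. This is only a sketch under assumptions: each of the two processes is taken to step 0 → 1 → 2 and then stop, and the names `successors` and `explore` are illustrative, not taken from any model checker. The search builds the global graph, reports a state with no successors as a deadlock, and rebuilds the trace from the initial state.

```python
from collections import deque

FINAL = 2  # assumed: each process steps 0 -> 1 -> 2 and then stops


def successors(state):
    """Interleaving semantics: exactly one process takes one step."""
    p, q = state
    nxt = []
    if p < FINAL:
        nxt.append((p + 1, q))
    if q < FINAL:
        nxt.append((p, q + 1))
    return nxt


def explore(initial):
    """Breadth-first search; returns (visited states, trace to a deadlock or None)."""
    parent = {initial: None}
    queue = deque([initial])
    deadlock = None
    while queue:
        state = queue.popleft()
        succ = successors(state)
        if not succ and deadlock is None:
            deadlock = state  # no enabled transition: deadlock
        for s in succ:
            if s not in parent:
                parent[s] = state
                queue.append(s)
    trace = None
    if deadlock is not None:
        # Walk the parent links back to the initial state.
        trace = []
        node = deadlock
        while node is not None:
            trace.append(node)
            node = parent[node]
        trace.reverse()
    return set(parent), trace


visited, trace = explore((0, 0))
print(len(visited), "states")
print("deadlock trace:", trace)
```

Two processes with three local states each already give nine global states; the product grows multiplicatively with every added process, which is the state explosion the slide refers to.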
Refinement Algorithms
• Need to verify only high-level protocols
• Domain-specific compilers can generate efficient implementations
Refinement rules for DSM protocols
State of the art of Applied FM
+ General purpose
+ Widely applicable techniques
– Inefficient algorithms
– Inefficient “compilers”
– Do not help with domain specific concerns
Thesis Statement
Domain specific formal methods
• Efficient verification techniques
• Address domain specific concerns
Domain:
[Figure: CPUs and Memory blocks]
Overview
• Introduction to formal verification
• Shared memory systems
• Contributions
• Conclusions
Memory Bottleneck
• Processor speed increases by 55% a year, while memory speed increases by 7%
– Caches
• Tendency toward multiprocessors
– Further imbalance → complex protocols
– SMP systems
– DSM systems
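To get a feel for how quickly the two growth rates diverge, the ratio can be compounded year over year. This is a small worked calculation, assuming both rates stay constant; the function name `speed_gap` is illustrative.

```python
def speed_gap(years, cpu_rate=0.55, mem_rate=0.07):
    """Relative processor/memory speed gap after `years` of compound growth."""
    return ((1 + cpu_rate) / (1 + mem_rate)) ** years


for years in (1, 5, 10):
    print(years, "years:", round(speed_gap(years), 1), "x")
```

Under these assumptions the gap widens by roughly 45% every year and reaches about a factor of 40 within a decade.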
Symmetric Multiprocessors
Can scale up to tens of processors
Modern caches have support for such SMP protocols
[Figure: three CPUs, each with a cache ($), connected to one memory]
SMP Protocol Design
• Bus protocols
– Bus arbitration algorithm
– Cache invalidation scheme
– Lack of atomicity on the bus
• Bus and CPU interaction
– Does CPU have out-of-order execution?
– Does bus allow out-of-order completion?
• Are these decisions visible to software?
Distributed Shared Memory
[Figure: three nodes (NODE), each with its own memory (MEM), connected by a network]
Each node may be an SMP or a single CPU
DSM Protocol Design
• Network port arbitration
• Coherency maintenance across the network
– Maintaining distributed state
– Little atomicity
– “Ghost” messages
– Transient states
• Are these decisions visible to software?
Shared Memory Correctness
• Low level:
– deadlock
– forward progress
– bus arbitration
• Intermediate level:
– at most one owner of a cache line at a time
• High-level:
– abstraction provided to the software
Abstraction Provided to Software
Multiprocessor:
P1: write(a,new); read(b)    P2: write(b,new); read(a)
P1: read(b); write(a,new)    P2: read(a); write(b,new)
Not OK under S.C.
Uniprocessor:
P1: write(a,new); read(b)
P1: read(b); write(a,new)
OK (cache/compiler/out-of-order execution)
Test model checking
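Why the reordering is harmless on a uniprocessor but not on a multiprocessor can be checked by brute force. The sketch below is an illustration, not the verification method of the talk: it assumes both locations start with an old value, treats sequential consistency as "some interleaving on a single memory", and uses illustrative names (`interleavings`, `outcomes`). It enumerates every interleaving of the original and of the reordered programs and collects what the two reads can return.

```python
OLD, NEW = 0, 1  # assumed initial value and written value


def interleavings(p1, p2):
    """Yield every merge of p1 and p2 that keeps each program's own order."""
    if not p1:
        yield list(p2)
        return
    if not p2:
        yield list(p1)
        return
    for rest in interleavings(p1[1:], p2):
        yield [p1[0]] + rest
    for rest in interleavings(p1, p2[1:]):
        yield [p2[0]] + rest


def outcomes(p1, p2):
    """Run each interleaving on one shared memory; collect (P1 read, P2 read)."""
    results = set()
    for schedule in interleavings(p1, p2):
        mem = {"a": OLD, "b": OLD}
        seen = {}
        for proc, op, addr in schedule:
            if op == "write":
                mem[addr] = NEW
            else:
                seen[proc] = mem[addr]
        results.add((seen["P1"], seen["P2"]))
    return results


original = outcomes(
    [("P1", "write", "a"), ("P1", "read", "b")],
    [("P2", "write", "b"), ("P2", "read", "a")],
)
reordered = outcomes(
    [("P1", "read", "b"), ("P1", "write", "a")],
    [("P2", "read", "a"), ("P2", "write", "b")],
)
print("both reads old, original programs: ", (OLD, OLD) in original)
print("both reads old, reordered programs:", (OLD, OLD) in reordered)
```

No interleaving of the original programs lets both processors read the old values, while the reordered programs allow it, so software running on the multiprocessor can observe the reordering.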
Overview
• Introduction to formal verification
• Shared memory systems
• Contributions
– mitigating state explosion
  • Partial order reduction algorithm
– facilitating high-level design
  • Protocol synthesis algorithm
– enhancing applicability
  • High-level correctness such as SC
• Conclusions
Contributions
[Figure: contributions around the protocol, with labels PO algorithm (1), test model checking (2), refinement rules (3), efficient implementation (3)]
Contribution #1
Mitigating State Explosion Problem
Partial Order Reductions
Partial Order Reductions
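[Figure: the two processes (local states 0, 1, 2), a reduced graph over global states 00, 10, 20, 21, 22, and the full graph over 00, 10, 20, 21, 22, 01, 11, 02, 12]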
If two transitions are independent, then explore one of them, postponing the other
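The effect on the running example can be sketched in a few lines. This is a deliberately simplified illustration, not the algorithm of the talk: it assumes the two processes each step 0 → 1 → 2, that steps of different processes are always independent, and that the graph has no loops, so postponing can never last forever. The names `enabled` and `search` are illustrative.

```python
FINAL = 2  # assumed: each process steps 0 -> 1 -> 2 and then stops


def enabled(state):
    """Enabled transitions as (process index, successor state) pairs."""
    moves = []
    for i in range(2):
        if state[i] < FINAL:
            nxt = list(state)
            nxt[i] += 1
            moves.append((i, tuple(nxt)))
    return moves


def search(initial, reduced):
    """Depth-first search; with reduced=True, explore one independent move only."""
    visited = {initial}
    stack = [initial]
    while stack:
        state = stack.pop()
        moves = enabled(state)
        if reduced and moves:
            # Assumption: all enabled moves are mutually independent here,
            # so one representative is explored and the rest are postponed.
            moves = moves[:1]
        for _, nxt in moves:
            if nxt not in visited:
                visited.add(nxt)
                stack.append(nxt)
    return visited


full = search((0, 0), reduced=False)
part = search((0, 0), reduced=True)
print(len(full), "states in the full graph")
print(len(part), "states in the reduced graph:", sorted(part))
```

The reduced search still reaches the deadlock at (2, 2) while visiting five of the nine states.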
Ignoring Problem
Select some transitions, and postpone others but do not postpone forever
[Figure: a loop through states S0 and S1, with a transition marked Postponed at each state]
Proviso based Solution
The solutions of Godefroid, Valmari, Holzmann, and Peled are very similar: Proviso
– Expands the “last” state of the loop completely
[Figure: a loop through states S0 and S1, labeled Postponed and Expand]
Problem with Proviso
[Figure: processes P and Q, each with local states 0, 1, 2; the search from 00 with Q postponed visits 00, 10, 20, 01, 11, 21, 02, 12, 22]
ALL 9 states
Our Algorithm: 2-phase
[Figure: processes P and Q, each with local states 0, 1, 2; the search visits 00, 01, 10, 20, 02]
Only 5 states
Performance Comparison
Protocol (tool)   States      Time
Mig (Spin)        113,628     13.6
Mig (2 PV)        9,185       1.7
Inv (Spin)        > 620,446   DNF
Inv (2 PV)        135,404     21.2
[Chart: SPIN vs. PV on SC2, SC3, SC4, Pftp, Snpy; axis from 0 to 20,000; annotation “(20x)”]
Contribution #2
Facilitating High-level Design
Protocol Refinement
Protocol Refinement
• PO reductions not sufficient, theorem provers ruled out
• Compile from high-level protocol specification
– easier to design
– easier to verify
– can generate efficient implementation using domain knowledge
9/10/1998 Refinement Algorithms 33
Unexpected Messages
[Figure: P sends a req to Q and waits to recv an ack from Q; while waiting, some request arrives: ???]
Always nack → no forward progress
Always silence → deadlock
Refinement Procedures
• Debug the high-level specification: synchronous communication with no transient states
• Automatic refinement procedure transforms it into a detailed implementation
– No need to verify the implementation
– Needs domain specific knowledge for efficiency
Related Work
• Buckley & Silberschatz, 83
– For OS environments, not fit for hardware
• Gribomont, 90
– Protocols where synchronous messages can be simply replaced by asynchronous messages
Related Work (contd)
• Teapot, 96 for DSM systems (Chandra)
– Protocol programming language
– “Suspend” construct for transient states
– Not high-level: Suspend states still specify what to do in a transient state
Context: DSM Protocols
Protocol per cache line
1 home, n “remote” nodes per line
Home is responsible for maintaining consistency (Hub)
[Figure: three nodes (NODE), each with its own memory (MEM), connected by a network]
Refinement Rules
[Figure: two Home/Remote message diagrams; in each, a Req is followed by an Ack or Nack]
Refinement Rules (2)
[Figure: Home/Remote message diagram with Req1, Req2, and Ack or Nack]
Req1 is ignored by both processes
Debugging Effort
Protocol N Low-level High-level spec
Mig 2 54 23,164/2.8
4 235/0.4
8 965/0.5
Inv 2 546/0.6 193389/20.6
4 18686/18.4
Protocol compilation scheme has been proved using a theorem prover
Contribution #3
Enhancing Applicability
Shared Memory Model Verification
Relaxing Instruction Orders
P1: write(a,new); read(b)
P2: write(b,new); read(a)
P1: read(b); write(a,new)
P2: read(a); write(b,new)
Under SC
Verification of HW/SW Interface
SC: The result can be explained by some interleaving of the instructions.
Test model checking
[Figure: three CPUs, each with a cache ($), connected to one memory]
Current Verification Techniques
• Simulation
– Must study lengthy executions
– Must choose non-trivial programs
• Formal techniques (next slide)
Related Work
• Graf’s lazy caching in ACTL*
• Gibbons’ approach: run programs and check if the results are SC
• McMillan’s thesis: data abstraction for a test
• Hojati: data abstraction in a different context
• Undecidability result by Alur et al.
ACTL* for (stronger than) SC
• AG(enabled( read(a,d) )) avail(a,d)
• AG(avail(a,d) AND EF(enable(read(a,d)))) A[NOT avail(a,d) W AG NOT avail(a,d)]
• ...
• init AG[after(write(a,d)) A(NOT enabled(read(a,d) W avail(a,d))]
Such MODEL DEPENDENT SPECS do not fit in an iterative industrial frame
Test Model Checking
• Adaptation of simulation to model checking
– model checking (full coverage) + testing (“black box approach”)
• Tests are independent of the model being verified → manual effort is considerably reduced
– Test model checking can be used early in the design cycle
Results
• Defined a shared memory description language
– “data is not used for control decisions”
– “addresses are symmetric”
– Can specify HP’s Runway/PA, ...
• Model checking technique
– “Small number of addresses is sufficient”
• Application to Runway/PA using PV
Read Order, Write Order
If P1 executes two write instructions, then P2 sees them in the program order of P1
P1: A := 1; A := 2; A := 3; ....; A := k
P2: X1 := A; X2 := A; X3 := A; ....; Xk := A
Check: X(i+1) ≥ X(i)
Many deficiencies
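The test can be tried out on a toy memory. The sketch below makes several assumptions that are not in the slides: A starts at 0, the memory is a single serial cell, and the interleaving of P1 and P2 is chosen at random. The names `run_test` and `in_order` are illustrative. Each run collects the values X1..Xk read by P2, and the check is that they never decrease.

```python
import random


def run_test(k, seed):
    """Interleave P1's writes A := 1..k with P2's k reads of A on a serial memory."""
    rng = random.Random(seed)
    memory = 0       # assumed initial value of A
    next_write = 1   # next value P1 will write
    reads = []       # X1, X2, ... as observed by P2
    while next_write <= k or len(reads) < k:
        can_write = next_write <= k
        can_read = len(reads) < k
        if can_write and (not can_read or rng.random() < 0.5):
            memory = next_write
            next_write += 1
        else:
            reads.append(memory)
    return reads


def in_order(reads):
    """The slide's check: X(i+1) >= X(i) for every consecutive pair of reads."""
    return all(later >= earlier for earlier, later in zip(reads, reads[1:]))


for seed in range(3):
    xs = run_test(5, seed)
    print(xs, in_order(xs))
```

A serial memory passes on every schedule. A run such as `[1, 3, 2]` would fail the check, because P2 would have seen the write of 2 after the write of 3.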
Deficiencies of the Test
• Finite k
– What if an error occurs for a really large k?
• Location “A” is never written by P2
– What if an error occurs when the ownership changes?
• Only 1-address
– The definitions of RO and WO are not restricted to a single address at a time
– How many addresses to consider?
Data abstraction + non-determinism
Unbounded k
[Figure: test automaton with transitions wr(0), wr(1), rd(0), rd(1) and a non-deterministic change]
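One way to read this slide: instead of writing 1..k, the writer keeps writing 0 and at some non-deterministically chosen point switches to writing 1, so the reader must never see 0 after it has seen 1, and the state space stays finite for any number of writes. The sketch below follows that reading, which is an interpretation rather than the author's exact construction. The stale reader cache is a made-up fault used only to show a violation being found, and the name `check` is illustrative.

```python
def check(stale_cache):
    """Explore the abstract test exhaustively; True if an order violation is reachable."""
    # State: (data the writer currently writes, memory, reader cache, reader has seen 1)
    initial = (0, 0, 0, False)
    visited = {initial}
    stack = [initial]
    while stack:
        data, mem, cache, seen_one = stack.pop()
        succ = [(data, data, cache, seen_one)]       # wr(data)
        if data == 0:
            succ.append((1, mem, cache, seen_one))   # non-deterministic change 0 -> 1
        sources = [mem]
        if stale_cache:
            # Hypothetical faulty design: the cache is refreshed only now and then,
            # and a read may be served from the cache instead of memory.
            succ.append((data, mem, mem, seen_one))
            sources.append(cache)
        for value in sources:                        # rd(value)
            if seen_one and value == 0:
                return True                          # read 0 after reading 1
            succ.append((data, mem, cache, seen_one or value == 1))
        for s in succ:
            if s not in visited:
                visited.add(s)
                stack.append(s)
    return False


print("serial memory violates the order:", check(stale_cache=False))
print("stale cache violates the order:  ", check(stale_cache=True))
```

Because the data values are abstracted to 0 and 1, the search terminates even though the number of writes is unbounded.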
Ownership Changes
[Figure: the test automaton with transitions wr(0), wr(1), rd(0), rd(1), extended with the alternatives “or rd(-)” and “or wr(2)”]
Complete 1-address test
2-address (RO, WO) test
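[Figure: 1-address automaton with transitions wr(1), rd(0), rd(1) and the alternatives rd(-) OR wr(0), rd(-) OR wr(1), rd(-) OR wr(2)]
[Figure: 2-address automaton with transitions wr(A,1), rd(B,1), rd(A,0) and the alternatives
rd(A,-) OR rd(B,-) OR wr(A,0) OR wr(B,0);
rd(A,-) OR wr(A,1) OR rd(B,-) OR wr(B,1);
rd(A,-) OR rd(B,-) OR wr(A,2) OR wr(B,2)]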
Complete Test for (RO, WO)
• Theorem: A system implements (RO, WO) if and only if it has no errors on all 1- and 2-address programs
• Complete 1-address and 2-address tests
Program Order
• PO generalizes RO and WO to include orderings between a read followed by write, and write followed by read
[Figure: operations rd(A), rd(B), wr(A), rd(B) with orderings labeled RO, RW, WR, and PO]
Write Atomicity
• All processors agree on the order of writes
– WO imposes the order only if the writes are from the same program
[Figure: writes wr(A,0) and wr(B,1)]
SC is (PO, WA)
1-address SC test
[Figure: 1-address SC test with a barrier
P0: A := 0, rd(A); A := 1; A := 2, rd(A)
P1: A := 3, rd(A); A := 4; A := 5, rd(A)
ORDER: 1, 4 OR 4, 1]
Complete Tests for SC
• Theorem: A system with N processors implements SC if and only if it has no errors on n<N address programs
• Scheme for N processors
– N barriers
– Data written before, at, and after barrier are different
  • data 0, 1, 2 for P0, and data 3, 4, 5 for P1
Case Studies
• Serial memory (operational semantics of SC)
• Lazy caching
• Runway/PA system model
– Bus based design
– An aggressive split transaction protocol
– Out-of-order completion of transactions on Runway for high-performance
– In-order completion of instructions in PA for sequential consistency
Test Model Checking of HP/Runway
Test    Spin        PV
PO-1    56K         2412
PO-2    > 5M/DNF    285K
SC-1    499K        7880
SC-2a   > 5M/DNF    5.9M
SC-2b   > 4M/DNF    574K
Conclusion
Showed that specializing formal methods for a particular domain (shared memory) leads to efficient verification techniques for the domain, and increases the applicability of the formal methods
– Two phase algorithm
– Refinement procedure
– Memory model verification
Future Work
• Model checking algorithms
– better partial order algorithms
– tune for test model checking
• Protocol synthesis
– More optimizations
• Test model checking
– Weaker memory models, other objects
– Application to other fields