
1

Shared Memory Consistency Protocol Verification against Weak Memory Models: Refinement via Model-checking

Prosenjit Chatterjee, Hemanthkumar Sivaraj, Ganesh Gopalakrishnan

School of Computing, University of Utah

http://www.cs.utah.edu/formal_verification/

Supported by NSF awards CCR 9987516 and 0081406, and an equipment gift from Intel Corpn.

[email protected]

{ hemanth, ganesh } @ cs.utah.edu

2

Shared memory multiprocessors

Desktop machines: cpu, cpu, cpu, … and mem on a snoopy bus

Servers and Supercomputers: dir, dir

3

How is the programmer’s view classically specified?

“sequential consistency”

Logical View: Processors (cpu, cpu) sharing one Memory (mem)

(“Coherence” means “per location SC”)

Initial memory contents = 0

One disallowed scenario:

P1: st(a,1); ld(b,0);
P2: st(b,2); ld(a,0);

Peterson? No!
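The disallowed scenario can be checked by brute force. The sketch below is not from the slides: it assumes a single shared memory initialized to 0 and enumerates every interleaving of the two programs that respects program order. The names `interleavings`, `run_sc` and `sc_outcomes` are illustrative only.

```python
def interleavings(p1, p2):
    """Yield every merge of p1 and p2 that keeps each program's own order."""
    if not p1 or not p2:
        yield list(p1) + list(p2)
        return
    for rest in interleavings(p1[1:], p2):
        yield [p1[0]] + rest
    for rest in interleavings(p1, p2[1:]):
        yield [p2[0]] + rest


def run_sc(trace):
    """Execute one interleaving against a single memory; return each load's value."""
    mem = {}      # an absent address holds the initial value 0
    loads = {}
    for proc, op, addr, val in trace:
        if op == "st":
            mem[addr] = val
        else:
            loads[(proc, addr)] = mem.get(addr, 0)
    return loads


# The two programs of the slide; load values are left open (None).
P1 = [("P1", "st", "a", 1), ("P1", "ld", "b", None)]
P2 = [("P2", "st", "b", 2), ("P2", "ld", "a", None)]


def sc_outcomes(p1, p2):
    """Set of (value P1 loads from b, value P2 loads from a) over all SC runs."""
    results = set()
    for trace in interleavings(p1, p2):
        loads = run_sc(trace)
        results.add((loads[("P1", "b")], loads[("P2", "a")]))
    return results


outcomes = sc_outcomes(P1, P2)
print(sorted(outcomes))
```

The pair (0, 0), that is ld(b,0) together with ld(a,0), never appears among the outcomes, which is why the scenario is disallowed under sequential consistency.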

4

Growing CPU / Memory performance gap necessitates weakenings…

[Figure: cpu, cpu, cpu, …; mem; dir, dir]

Aggressive load/store reorderings

‘Bypassing’ (read back own store before others)

Strong orderings only at acquires/releases

…all that and more!

5

Overall Features of Weak Memory Models

• Support ‘ordinary’ as well as ‘special’ loads and stores

• Support fences and synchronization primitives

• Orderings may even depend on dynamic context

=> Provide a much larger range of load-values

Therefore…

• Writing a formal specification is highly non-trivial

• Writing a spec that supports verification is even trickier

6

A variety of highly intricate weak memory models exist

Itanium RC_tso

7

“sequential consistency IS good”

One almost wishes to go back to SC…

It does not seem a realistic goal for now…

• Simplifies programming

• Some hardware tricks to hide latencies

• Range of such tricks limited

• Complexity for end-users is containable

8

The Verification Problem

Given

• a formal specification of a weak consistency model (SPEC)

• a finite-state model of the shared memory system (IMP)

Verify that

• the executions of IMP are a subset of the executions allowed by SPEC

Our work enables this check to be carried out using finite-state reachability

9

Related Work

For SC

• Qadeer [CAV’99, SRC TR #176]
• Condon, Hu, et al. [SPAA’01]
• Nalumasu et al. [CAV’98]
• Dist. Computing Special Issue ‘99
• MPV Workshop [Post FMCAD’00]

For Weak Models

• Qadeer [MPV workshop]
• Condon et al. [HPCA’99]
• Ghughal and Gopalakrishnan [FMPPTA’00]

10

Our Emphasis

• Simple and intuitive SPECs

• Support a wide range of memory models

• Support automated (finite-state) verification

• Avoid backtracking search over SPEC’s executions

• Avoid bloating state-space beyond that of IMP

11

Verification Criterion Illustrated…

Show that

IMP: program P1: st(a,1); ld(a);  P2: ld(a);  execution st(a,1) ; ld(a,1) ; ld(a,0)

Implies

Spec: Same program, Same execution

12

Idea: Employ a model-checker to establish refinement

[Figure: IMP and Executable SPEC step through the same recorded events (load, store, …); at each load, check that load values agree]

• Must do a non-backtracking search over SPEC’s executions

therefore

• SPEC must be deterministic with respect to recorded events
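A minimal sketch of this lockstep check, not taken from the slides: IMP is explored breadth-first, and a deterministic SPEC is advanced by each recorded event, so no backtracking over SPEC is needed. The toy IMP (one store buffer on P1, running P1: st(a,1); ld(a); P2: ld(a);), the event names, and the injected bug are all assumptions made for illustration.

```python
from collections import deque


def check_refinement(imp_init, imp_next, spec_init, spec_step):
    """Breadth-first search over (IMP state, SPEC state) pairs.

    imp_next(s) yields (event, next_state) pairs; spec_step(t, event) returns
    the unique next SPEC state, or None if SPEC does not allow the event.
    Returns None on success, or the event trace leading to the mismatch.
    """
    start = (imp_init, spec_init)
    seen = {start}
    queue = deque([(start, ())])
    while queue:
        (s, t), trace = queue.popleft()
        for event, s2 in imp_next(s):
            t2 = spec_step(t, event)
            if t2 is None:
                return trace + (event,)
            if (s2, t2) not in seen:
                seen.add((s2, t2))
                queue.append(((s2, t2), trace + (event,)))
    return None


def make_imp(p2_snoops_store_buffer):
    """Toy IMP for P1: st(a,1); ld(a);  P2: ld(a);  with a store buffer on P1."""
    def imp_next(state):
        pc1, pc2, buffered, mem = state
        if pc1 == 0:    # P1's store enters its store buffer
            yield ("st_L", "P1", 1), (1, pc2, True, mem)
        if pc1 == 1:    # P1's load bypasses from its own buffer if possible
            yield ("ld", "P1", 1 if buffered else mem), (2, pc2, buffered, mem)
        if buffered:    # the buffered store drains to memory
            yield ("st_G", "P1", 1), (pc1, pc2, False, 1)
        if pc2 == 0:    # P2's load; the injected bug lets it read P1's buffer
            val = 1 if (buffered and p2_snoops_store_buffer) else mem
            yield ("ld", "P2", val), (pc1, 1, buffered, mem)
    return imp_next


def spec_step(state, event):
    """Deterministic executable SPEC over the recorded events."""
    pending, mem = state          # pending: P1's local store not yet global
    kind, proc, val = event
    if kind == "st_L":
        return (val, mem) if pending is None else None
    if kind == "st_G":
        return (None, val) if pending == val else None
    expected = pending if (proc == "P1" and pending is not None) else mem
    return state if val == expected else None


IMP_INIT, SPEC_INIT = (0, 0, False, 0), (None, 0)
ok = check_refinement(IMP_INIT, make_imp(False), SPEC_INIT, spec_step)
bug = check_refinement(IMP_INIT, make_imp(True), SPEC_INIT, spec_step)
print(ok, bug)
```

Because `spec_step` returns at most one successor per event, the product state space is no larger than IMP's own, which is the point of requiring determinism with respect to recorded events.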

13

What events do we record? Not just Loads and Stores!

Program: P1: st(a,1); ld(a);  P2: ld(a);

[Figure: Imp, and SPEC = Carbon-copy of Imp; P1 and P2 each have an LB and an SB (ld, st) in front of memory M]

Imp: st(a,1) ; ld(a,1) ; ld(a,0)

SPEC: st(a,1) ; ld(a,1) ; ld(a,1)

eh? phew!

st(a,1) ; ld(a,1) ; ld(a,1)

• st(a,1) drained to M
• ld(a,1), ld(a,1) read from M

st(a,1) ; ld(a,1) ; ld(a,0)

• st(a,1) in SB ; ld(a,1) from SB
• ld(a,0) from M

14

Use Visibility Order style SPECs

-- Already growing in use (Itanium spec, Neiger, Condon, …)

-- Helps export internal events to determinize SPEC’s executions

-- Defines Read Values to depend on most recent write

Program: P1: st(a,1); ld(a);  P2: ld(a);

[Figure: Imp, and SPEC = Carbon-copy of Imp]

Choices revealed..

st_L(a,1) ; ld(a,1) ; st_G(a,1) ; ld(a,1)

st_L(a,1) ; ld(a,1) ; ld(a,0) ; st_G(a,1)

15

Example of Visibility Order Spec (Condon, HPCA’99)

In non-Visibility Order style

<p : program order, <m : memory order; an execution is in TSO if

(Memory order constraints)

• X <p Y /\ isLD(X) /\ isST(Y) => X <m Y
• X <p MB <p Y => X <m Y

Read value rule

Value of LD ‘X’ == Value of closest store ‘Y’ before or after ‘X’ in <m (local bypassing detail is messy)

In Visibility Order style

<p : program order, <v : a total order of LD, ST_L, ST_G; an execution is in TSO if

(Memory order constraints)

• conditions on split stores

Read value rule

Value of LD ‘X’ ==
-- most recent ST_L, when ST_G is after X (local bypassing)
-- most recent ST_G, otherwise (local bypassing not exercised)
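The visibility-order read value rule can be sketched as a single pass over the total order. This is an illustration, not the authors' executable SPEC: the event encoding and the function name `load_values` are assumptions, memory is assumed to start at 0, and each processor's ST_G events are assumed to appear in the same order as its ST_L events.

```python
from collections import defaultdict, deque


def load_values(order):
    """Return the value each load must see, in the order the loads appear.

    order is a list of events:
      ("st_L", proc, addr, val)  store becomes visible to its own processor
      ("st_G", proc, addr, val)  the same store becomes globally visible
      ("ld",   proc, addr)       load
    """
    mem = {}                          # globally visible values, initially 0
    pending = defaultdict(deque)      # proc -> stores with st_L seen, st_G not yet
    values = []
    for ev in order:
        kind, proc, addr = ev[0], ev[1], ev[2]
        if kind == "st_L":
            pending[proc].append((addr, ev[3]))
        elif kind == "st_G":
            if not pending[proc] or pending[proc][0] != (addr, ev[3]):
                raise ValueError("st_G without a matching earlier st_L")
            pending[proc].popleft()
            mem[addr] = ev[3]
        else:
            # Local bypassing: an own store whose st_G comes later wins.
            local = [v for a, v in pending[proc] if a == addr]
            values.append(local[-1] if local else mem.get(addr, 0))
    return values


# Program P1: st(a,1); ld(a);  P2: ld(a);  under two visibility orders.
order_1 = [("st_L", "P1", "a", 1), ("ld", "P1", "a"),
           ("st_G", "P1", "a", 1), ("ld", "P2", "a")]
order_2 = [("st_L", "P1", "a", 1), ("ld", "P1", "a"),
           ("ld", "P2", "a"), ("st_G", "P1", "a", 1)]
print(load_values(order_1), load_values(order_2))
```

Each load value is a function of the events seen so far, so once ST_L and ST_G are recorded the rule needs no search over alternatives.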

16

Our Contributions

• Visibility order SPECs for a wide range of mem models

• Built executable SPEC generator prototype (runnable over web)

• Verification of refinement using Parallel Murphi (ported to MPI at Utah)

• Verification without bloating IMP’s state-space and without backtracking on SPEC’s executions

• Two snoopy-bus protocols modeled after Alpha and Itanium

• Two snoopy protocols where temporal order != visibility order

• One directory-based protocol (‘Avalanche’ multiprocessor)

17

Details of our solution: Addressing the large “abstraction distance”

[Figure: SPEC and IMP; dir, dir]

18

Approach: Exploit Bug-classification

[Figure: cpu with execution pipeline, SB, LB, L1 cache, L2 cache; ld]

Inside CPU chips (Fewer design groups have control over this)

Inside Directories, Interconnects, … (More design groups have control over this)

So… develop Intermediate Abstraction that Retains Internal Partition

19

The Intermediate Abstraction

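[Figure: IMP, Intermediate Abstraction, SPEC (Visibility order, Read-value rule); dir, dir]

Intermediate Abstraction: Retain internal partition; Simplify external partition

IMP vs. Intermediate Abstraction: THIS PAPER

Intermediate Abstraction vs. SPEC: FUTURE WORK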

20

External Partition Replacement Depends on SPEC Memory Model

Strong:  S C*, IBM370
Weak:    TSO, PSO, RMO, Alpha
Weakest: PC, PowerPC, PRAM, Slow Memory, Cache C*, Causal C*
Hybrid:  Itanium, Weak C*, Entry C*, Release C*

( ‘C*’ means ‘Consistency’ )

21

Abstraction Method for External Partition

Memory Model   Splitting of store instructions    External Partition
Strong         store: unsplit                     single port memory
Weak           store: local, global               single port memory
Weakest        store: local, global, global       Memory & re-order buffer per processor
Hybrid         store: local, global, global       Memory & re-order buffer per processor

22

Creating the Intermediate Abstraction

[Figure: IMP: CPU1 and CPU2, each with Pipe, RB and SB (ld, st), on a Snoopy-bus or Directory-based Memory Subsystem]

[Figure: Intermediate Abstraction: CPU1 and CPU2, each with Pipe, RB and SB (ld, st), on One memory (strong/weak) or One memory per CPU (weakest/hybrid)]

23

Overall approach

[Flowchart in three phases (Phase 1, Phase 2, Phase 3), from Start to Final Spec and Final Imp]

Spec side: Define Spec; Generate Executable Spec; Run it, and gain understanding; Final Spec

Imp side: Design Imp; Annotate Imp with events (Annotated Imp); Derive Impabs; Verify against Impabs; Verify Impabs; Success or Failure; Final Imp

24

Verification

Event   IMP                           Intermediate Abstraction
st_L    store in SB                   store in SB
st_G    store in Cache                store in M
ld      load from LB or from Cache    load from M

load values agree?

25

                            Alpha model w/o Barriers and LL/SC    Itanium w/o weak ld/st, Semaphores (RC_tso)
Protocol                    States (M)  Trans (M)  Time (h)       States (M)  Trans (M)  Time (h)
Split Trans Bus                 64         470       0.95            111         985       1.75
-- with Scheurich Opt          251        1794       3.4             325        2769       4.8
Multiple Interleaved Bus       255        1820       3.6             773        2686      11
-- with Scheurich              278        1946       3.9             927        3402      12

Runs on 16 CPU Parallel Murphi ported to MPI at Utah
Each CPU @ 850 MHz, 256 Mb per node (LAN communication)

26

Features of Examples

• Examples with Scheurich’s optimization: -- Logical order != Temporal order

• Directory Protocols: -- a Migratory directory protocol using PV and SPIN found no errors (parallel search not tried)

• Other directory protocols as well as Itanium (hybrid) memory model soon to be tried

27

Bugs likely to be caught

- Not just coherence

- SC violations

- Write atomicity violations

- Hybrid memory ordering violations

- Bugs in internal partition: will be caught when intermediate abstraction compared against SPEC

28

How to scale up?

- Improve parallel model-checker

- Approximate search (e.g., parallel random-walk)

- Bounded model-checking (enumerative or SAT)

- Exploit data independence

- Try many examples, and refine methodology

29

Conclusions

- Efficient use of reachability analysis to verify IMP against weak memory model SPEC

- Applicable to a whole range of weak models

- Selection of Intermediate Abstraction is systematic

- Annotating Intermediate Abstractions is not hard

- State explosion problem is not worsened

An easy-to-use verification technique that multiprocessor designers can use readily.

30

Extra Slides

31

“Visibility Order” explained using SC

• SC executions have a single visibility order, V
• Stores present in V consistent with prog. order (single store order)
• Loads present in V consistent with prog. order
• Each load to address A returns value D that the most recent store in V to A wrote

NON-SC

P1: st(a,1); ld(b,0);  P2: st(b,2); ld(a,0);

st(a,1); ld(b,0); st(b,2); ld(a,0)   whoops!

SC

P1: st(p,1); st(q,2);  P2: ld(q,2); ld(p,1);

st(p,1); st(q,2); ld(q,2); ld(p,1)   OK!
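The SC rules above amount to a search for one total order V. The sketch below is illustrative and not from the slides: it tries every permutation of the events, which is only feasible for tiny programs, and the names `respects_program_order`, `read_values_ok` and `find_sc_visibility_order` are made up here. Memory is assumed to start at 0.

```python
from itertools import permutations


def respects_program_order(order, programs):
    """True if every processor's events appear in V in program order."""
    for prog in programs:
        positions = [order.index(ev) for ev in prog]
        if positions != sorted(positions):
            return False
    return True


def read_values_ok(order):
    """True if each load returns the most recent store in V to its address."""
    mem = {}
    for proc, kind, addr, val in order:
        if kind == "st":
            mem[addr] = val
        elif mem.get(addr, 0) != val:
            return False
    return True


def find_sc_visibility_order(programs):
    """Return some legal visibility order V, or None if the execution is non-SC."""
    events = [ev for prog in programs for ev in prog]
    for candidate in permutations(events):
        if respects_program_order(candidate, programs) and read_values_ok(candidate):
            return list(candidate)
    return None


non_sc = [[("P1", "st", "a", 1), ("P1", "ld", "b", 0)],
          [("P2", "st", "b", 2), ("P2", "ld", "a", 0)]]
sc = [[("P1", "st", "p", 1), ("P1", "st", "q", 2)],
      [("P2", "ld", "q", 2), ("P2", "ld", "p", 1)]]

print(find_sc_visibility_order(non_sc))
print(find_sc_visibility_order(sc))
```

For the first execution no order satisfies both rules, so the search returns `None`; for the second it returns an order in which both stores precede both loads.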

32

Writing visibility order specs for weak memory models…

Can use single or multiple visibility orders [MPV workshop slides, see http://www.cs.utah.edu/mpv]

Single visibility order for TSO: Split stores into Local and Global; Single Global-store Order

P1: st(a,1); ld(a,1);  P2: xxx; ld(a,0); ld(a,1);

st_L(a,1) ; ld(a,1) ; ld(a,0) ; st_G(a,1) ; ld(a,1)

Multiple VO needed for some weak mem models…. (Stores kept unsplit)

P1: ld(p,1); ld(p,2);  P2: st(p,1); st(p,2);  P3: ld(p,2); ld(p,1);

Visibility order of P1: st(p,1) ; ld(p,1) ; st(p,2) ; ld(p,2)

..of P3: st(p,2) ; ld(p,2) ; st(p,1) ; ld(p,1)

33

P1: ld(p,1); ld(p,2);  P2: st(p,1); st(p,2);  P3: ld(p,2); ld(p,1);

st_1(p,1) ; ld(p,1) ; st_1(p,2) ; ld(p,2) ; st_2(p,2) ; ld(p,2) ; st_2(p,1) ; ld(p,1)

Single visibility order for Itanium, obtained by splitting every Store into N copies

Our main idea

• Always use single Visibility Order
• Makes specification more intuitive
• Can annotate Implementation model with coherency events to obtain generated VO
• Can compare against reliable Spec that encompasses all legal VO using reachability analysis

34

Related Work on Verifying Against Weak Memory Models

• Ghughal et al. [FMPPTA’00]:
  -- Extension of Collier’s work to weak memory models
  -- Finite-state abstraction of “ARCHTESTs” to detect ordering violations

• Condon, Hill, Plakal, Sorin et al. [HPCA’99] (Main inspiration for our work):
  -- Idea based on “Lamport Clocks”
  -- Define “Wisconsin TSO” ordering for execution events
  -- Assign Lamport Clock values to coherency events
  -- Manual proof that Lamport Ordering (which traces causalities, and hence read values) implies Wisconsin TSO
  -- Defines single visibility order idea, but shows it only for subsets of TSO and Alpha

35

What are the observable effects on programs?

[Figure: cpu, cpu, …; mem]

lost atomicity:
P1: ld(a,2); st(b,1);  P2: ld(b,1); st(a,2);

only certain guarantees on executions:
P1: st(a,1); ld(b,0);  P2: st(b,2); ld(a,0);

P1: st(p,1); st.rel(q,2);  P2: ld.acq(q,2); ld(p,1);

36

The Verification Problem

• Shared Memory Implementations are very complex

• Spec (shared memory consistency models) also highly non-trivial

=> Verification engineers face a “double-whammy”

Mini Roadmap:

… Identifying the sources of memory model related bugs

… Related work on verifying against weak memory models

… How to verify against a broad taxonomy of mem models

37

[Figure: Proc, Proc, each with RB and SB (ld, st), connected to a Single Port Memory]


39

Where are Ordering Relaxations Made?

[Figure: cpu with execution pipeline, SB, LB, L1 cache, L2 cache; ld]

Inside CPU chips (Fewer design groups have control over this)

Inside Directories, Interconnects, … (More design groups have control over this)

Techniques that focus on the “external partition” can still be quite useful…

40

Methodology

[Figure: IMP, Intermediate Abstraction]

• Annotate Imp protocol with events of visibility order
  -- designer reflects his/her understanding of mem model and Imp

• Replace external partition specific to target memory model

• Annotate intermediate abstraction thus obtained

• Run reachability, matching every visibility event of Imp by one produced by Intermediate Abstraction

41

Taxonomy of memory models, and external partitions for them (can use these in combination for hybrid models)

Strong:  Write Atomicity; No local bypassing
Weak:    Write Atomicity; Local bypassing
Weakest: No Write Atomicity; Coherence
Hybrid:  Instructions of many varieties; Fences, Acq / Rel

Pictures of ext partitions as well as brief explanation (pictorial) of how event-splitting is done

42

One allowed scenario under a weak memory model (e.g. Sparc TSO)

[Figure: cpu, cpu; mem]

P1: st(a,1); ld(b,0);
P2: st(b,2); ld(a,0);
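To see why a TSO-style machine admits this scenario, here is a small exhaustive exploration. It is a sketch under assumptions not stated on the slide: each processor has a FIFO store buffer, a load reads its own buffer first and memory otherwise, buffered stores drain to memory at arbitrary times, and memory starts at 0. The function name `tso_outcomes` and the state encoding are illustrative.

```python
def tso_outcomes(programs):
    """Explore all runs of a store-buffer machine; return the set of load results.

    programs: tuple of per-processor instruction tuples, each instruction being
    ("st", addr, val) or ("ld", addr). Each result is a sorted tuple of
    (processor index, addr, value loaded).
    """
    n = len(programs)
    # state = (program counters, store buffers, memory items, loads so far)
    start = ((0,) * n, ((),) * n, (), ())
    seen, stack, outcomes = {start}, [start], set()
    while stack:
        pcs, sbs, mem, loads = stack.pop()
        memory = dict(mem)
        succs = []
        for i in range(n):
            if pcs[i] < len(programs[i]):          # execute the next instruction
                ins = programs[i][pcs[i]]
                new_pcs = pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:]
                if ins[0] == "st":                 # store goes into the buffer
                    new_sb = sbs[i] + ((ins[1], ins[2]),)
                    succs.append((new_pcs, sbs[:i] + (new_sb,) + sbs[i + 1:], mem, loads))
                else:                              # load: own buffer first, then memory
                    own = [v for a, v in sbs[i] if a == ins[1]]
                    val = own[-1] if own else memory.get(ins[1], 0)
                    succs.append((new_pcs, sbs, mem, loads + ((i, ins[1], val),)))
            if sbs[i]:                             # drain the oldest buffered store
                addr, val = sbs[i][0]
                new_memory = dict(memory)
                new_memory[addr] = val
                new_sbs = sbs[:i] + (sbs[i][1:],) + sbs[i + 1:]
                succs.append((pcs, new_sbs, tuple(sorted(new_memory.items())), loads))
        if not succs:                              # programs done, buffers empty
            outcomes.add(tuple(sorted(loads)))
        for s in succs:
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return outcomes


PROGRAMS = ((("st", "a", 1), ("ld", "b")),
            (("st", "b", 2), ("ld", "a")))
results = tso_outcomes(PROGRAMS)
print(((0, "b", 0), (1, "a", 0)) in results)
```

Both stores can sit in their buffers while both loads read the initial 0 from memory, so the outcome ld(b,0), ld(a,0) is reachable here although it is disallowed under SC.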