Debugging Concurrent Software by Context-Bounded Analysis Shaz Qadeer Microsoft Research Joint work with: Jakob Rehof, Microsoft Research Dinghao Wu, Princeton

Debugging Concurrent Software by Context-Bounded Analysis

Shaz Qadeer

Microsoft Research

Joint work with:
• Jakob Rehof, Microsoft Research
• Dinghao Wu, Princeton University

Concurrent software

• Operating systems, device drivers
• Databases, web servers, browsers, GUIs, ...
• Modern languages: C#, Java

[Figure: threads 1 to 4 running on processors 1 and 2]

Concurrency is increasingly important

• Single-chip multiprocessors are an architectural inflexion point
  – Software running on these chips will be even more concurrent

• Embedded systems
  – Airplanes, cars, PDAs, cellphones

• Web services

Reliable concurrent software?

• Correctness problem
  – does the program behave correctly for all inputs and all interleavings?

• Bugs due to concurrency are insidious
  – nondeterministic, timing dependent
  – difficult to detect, reproduce, eliminate
  – coverage from testing is very poor

Analysis of concurrent programs is difficult (1)

• Finite-data single-procedure program
  – n lines
  – m states for global data variables

• 1 thread
  – n * m states

• K threads
  – n^K * m states
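To see how quickly this bound grows, here is a minimal C sketch that simply evaluates n^K * m for given n, m and K. The function name state_bound is illustrative, and the sketch multiplies out the formula; it does not analyze any program.

```c
#include <assert.h>
#include <stdint.h>

/* n^k * m: each of the k threads has its own program counter
   (n choices), and all threads share the global data (m choices). */
uint64_t state_bound(uint64_t n, uint64_t m, unsigned k) {
    uint64_t states = m;
    for (unsigned i = 0; i < k; i++) {
        assert(n == 0 || states <= UINT64_MAX / n);  /* guard against overflow */
        states *= n;                                 /* one more program counter */
    }
    return states;
}
```

For example, with n = 100 lines and m = 4 global data states, state_bound(100, 4, 1) gives 400 states for one thread, while state_bound(100, 4, 3) gives 4,000,000 for three threads.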

Analysis of concurrent programs is difficult (2)

• Finite-data program with procedures
  – n lines
  – m states for global data variables

• 1 thread
  – Infinite number of states
  – Can still decide assertions in O(n * m^3)
  – SLAM, ESP, BLAST implement this algorithm

• K ≥ 2 threads
  – Undecidable! (Ramalingam 00)

Context-bounded verification of concurrent software

[Figure: an execution divided into contexts separated by context switches]

Analyze all executions with a small number of context switches!

Why context-bounded analysis?

• Many subtle concurrency errors are manifested in executions with a small number of contexts

• Context-bounded analysis can be performed efficiently
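One way to see how a context bound shrinks the search is to count schedules. The C sketch below assumes each thread is a fixed sequence of atomic steps with no branching, and counts the interleavings of two threads that use at most a given number of context switches. The names bounded_schedules and count_from are hypothetical; this counts schedules and is not the context-bounded analysis itself.

```c
#include <assert.h>
#include <stdint.h>

/* Counts completions of a schedule prefix. left1 and left2 are the
   steps each thread still has to run; last is the thread that ran the
   previous step (0 before the first step); switches counts the context
   switches taken so far. */
static uint64_t count_from(unsigned left1, unsigned left2, int last,
                           unsigned switches, unsigned max_switches) {
    if (switches > max_switches) return 0;   /* bound exceeded: prune */
    if (left1 == 0 && left2 == 0) return 1;  /* one complete schedule */
    uint64_t total = 0;
    if (left1 > 0)  /* run thread 1 next; a switch if thread 2 ran last */
        total += count_from(left1 - 1, left2, 1, switches + (last == 2), max_switches);
    if (left2 > 0)  /* run thread 2 next; a switch if thread 1 ran last */
        total += count_from(left1, left2 - 1, 2, switches + (last == 1), max_switches);
    return total;
}

/* Interleavings of two straight-line threads with steps1 and steps2
   atomic steps that use at most max_switches context switches. */
uint64_t bounded_schedules(unsigned steps1, unsigned steps2, unsigned max_switches) {
    return count_from(steps1, steps2, 0, 0, max_switches);
}
```

With two steps per thread, bounded_schedules(2, 2, 1), (2, 2, 2) and (2, 2, 3) return 2, 4 and 6. With ten steps per thread, a bound of two switches leaves 20 schedules, while covering all 184,756 interleavings requires a bound of 19.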

KISS: A static checker for concurrent software

• An implementation of context-bounded analysis
  – Technique to use any sequential checker to perform context-bounded concurrency analysis

• Has found a number of concurrency errors in NT device drivers

[Figure: KISS translates a concurrent program P into a sequential program Q, which an existing sequential checker (SDV, PREfix, ESP) analyzes. Either no error is found, or an error in Q indicates an error in P.]

Inside a static checker for sequential programs

    int x, y, z;

    void foo() {
        if (x > y) { y = x; }
        if (y > z) { z = y; }
        assert(x <= z);
    }

• Symbolically analyze all paths

• Check the assertion for each path

• Interprocedural analysis
  – e.g., PREfix, ESP, SLAM, BLAST
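For a procedure as small as foo, the all-paths check can be imitated by brute force. The C sketch below passes the globals in explicitly and tries every initial valuation of x, y and z in a small range; the function names are hypothetical, and enumerating concrete values stands in for the symbolic reasoning a real checker performs.

```c
#include <assert.h>
#include <stdbool.h>

/* foo from the slide, with the globals x, y, z passed in by value;
   returns whether assert(x <= z) would hold at the end. */
static bool foo_assertion_holds(int x, int y, int z) {
    if (x > y) { y = x; }
    if (y > z) { z = y; }
    return x <= z;
}

/* Brute-force stand-in for an all-paths check: tries every initial
   valuation in [lo, hi]^3 and counts those that violate the assertion.
   Any range with at least two values exercises all four paths. */
int count_violations(int lo, int hi) {
    assert(lo <= hi);
    int violations = 0;
    for (int x = lo; x <= hi; x++)
        for (int y = lo; y <= hi; y++)
            for (int z = lo; z <= hi; z++)
                if (!foo_assertion_holds(x, y, z))
                    violations++;
    return violations;
}
```

count_violations(-3, 3) returns 0: after the two conditionals y >= x and z >= y, so x <= z holds on every path.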

KISS strategy

• Q encodes executions of P with a small number of context switches
  – instrumentation introduces many extra paths to mimic context switches

• Leverage the all-path analysis of sequential checkers



Concurrent program P: PnpStop and DispatchRoutine run in separate threads and share the device extension de.

    PnpStop() {
        int t;
        de->stopping = T;
        t = AtomicDecr(&de->count);
        if (t == 0) SetEvent(&de->stopEvent);
        WaitEvent(&de->stopEvent);
    }

    DispatchRoutine() {
        int t;
        if (!de->stopping) {
            AtomicIncr(&de->count);
            // do useful work
            // …
            t = AtomicDecr(&de->count);
            if (t == 0) SetEvent(&de->stopEvent);
        }
    }


Instrumented PnpStop ($ denotes a nondeterministic choice, so PnpStop may stop at any point):

    PnpStop() {
        int t;
        if ($) return;
        de->stopping = T;
        if ($) return;
        t = AtomicDecr(&de->count);
        if ($) return;
        if (t == 0) SetEvent(&de->stopEvent);
        if ($) return;
        WaitEvent(&de->stopEvent);
    }

Instrumented DispatchRoutine: CODE, inserted between statements, starts PnpStop at most once, at a nondeterministically chosen point.

    bool done = F;

    CODE:
        if (!done) {
            if ($) { done = T; PnpStop(); }
        }

    DispatchRoutine() {
        int t;
        CODE;
        if (!de->stopping) {
            CODE;
            AtomicIncr(&de->count);
            // do useful work
            // …
            CODE;
            t = AtomicDecr(&de->count);
            CODE;
            if (t == 0) SetEvent(&de->stopEvent);
            CODE;
        }
    }


    main() { DispatchRoutine(); }

Symmetrically, PnpStop receives CODE and DispatchRoutine receives the early returns:

    PnpStop() {
        int t;
        CODE;
        de->stopping = T;
        CODE;
        t = AtomicDecr(&de->count);
        CODE;
        if (t == 0) SetEvent(&de->stopEvent);
        CODE;
        WaitEvent(&de->stopEvent);
        CODE;
    }

    DispatchRoutine() {
        int t;
        if ($) return;
        if (!de->stopping) {
            if ($) return;
            AtomicIncr(&de->count);
            // do useful work
            // …
            if ($) return;
            t = AtomicDecr(&de->count);
            if ($) return;
            if (t == 0) SetEvent(&de->stopEvent);
        }
    }

    bool done = F;

    main() { PnpStop(); }

KISS features

• KISS trades off soundness for scalability

• Cost of analyzing a concurrent program P = cost of analyzing a sequential program Q
  – Size of Q asymptotically same as size of P

• Unsoundness is precisely quantifiable
  – for a 2-thread program, explores all executions with up to two context switches
  – for an n-thread program, explores up to 2n - 2 context switches

• Allows any sequential checker to analyze concurrency

Experimental Evaluation of KISS

Driver Stopping Error in Bluetooth Driver (1 KLOC)

    DispatchRoutine() {
        int t;
        if (!de->stopping) {
            AtomicIncr(&de->count);
            assert(!driverStopped);
            // do useful work
            // …
            t = AtomicDecr(&de->count);
            if (t == 0) SetEvent(&de->stopEvent);
        }
    }

    PnpStop() {
        int t;
        de->stopping = T;
        t = AtomicDecr(&de->count);
        if (t == 0) SetEvent(&de->stopEvent);
        WaitEvent(&de->stopEvent);
        driverStopped = T;
    }

Error trace with two context switches:

    // Thread 1: DispatchRoutine
    int t;
    if (!de->stopping) {

    // Thread 2: PnpStop runs to completion
    int t;
    de->stopping = T;
    t = AtomicDecr(&de->count);
    if (t == 0) SetEvent(&de->stopEvent);
    WaitEvent(&de->stopEvent);
    driverStopped = T;

    // Thread 1 resumes
        AtomicIncr(&de->count);
        assert(!driverStopped);
        // do useful work
        // …
        t = AtomicDecr(&de->count);
        if (t == 0) SetEvent(&de->stopEvent);
    }

Assertion fails!
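The KISS transformation can be exercised on this example. The C sketch below builds the sequential program for main() { DispatchRoutine(); }, with CODE and the early returns as in the transformation, and resolves each $ from a bit vector that a driver loop enumerates exhaustively; a sequential checker would instead explore the choices symbolically. The struct, the oracle, the initial value 1 for de->count, and modeling a blocking WaitEvent as an early return are all assumptions of this sketch, and the symmetric program main() { PnpStop(); } is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the shared state; names mirror the slide. */
struct state {
    bool stopping;        /* de->stopping */
    int  count;           /* de->count, assumed to start at 1 */
    bool stop_event;      /* de->stopEvent */
    bool driver_stopped;  /* driverStopped */
    bool done;            /* KISS flag: has PnpStop been started? */
    bool violated;        /* records a failed assert(!driverStopped) */
};

static struct state g;
static unsigned choice_bits;  /* oracle that resolves every $ */
static unsigned choice_idx;   /* next oracle bit to consume */

/* $: each call consumes the next bit of the oracle. */
static bool nondet(void) {
    bool b = (choice_bits >> choice_idx) & 1u;
    choice_idx++;
    return b;
}

/* PnpStop, instrumented so it may stop before any statement. */
static void pnp_stop(void) {
    int t;
    if (nondet()) return;
    g.stopping = true;
    if (nondet()) return;
    t = --g.count;                     /* AtomicDecr returns the new value */
    if (nondet()) return;
    if (t == 0) g.stop_event = true;   /* SetEvent */
    if (nondet()) return;
    if (!g.stop_event) return;         /* WaitEvent would block: stop here */
    if (nondet()) return;
    g.driver_stopped = true;
}

/* CODE: start PnpStop at most once, at a nondeterministic point. */
static void code(void) {
    if (!g.done && nondet()) {
        g.done = true;
        pnp_stop();
    }
}

/* DispatchRoutine with CODE inserted between statements. */
static void dispatch_routine(void) {
    int t;
    code();
    if (!g.stopping) {
        code();
        g.count++;                                /* AtomicIncr */
        code();
        if (g.driver_stopped) g.violated = true;  /* assert(!driverStopped) */
        code();
        t = --g.count;                            /* AtomicDecr */
        code();
        if (t == 0) g.stop_event = true;          /* SetEvent */
        code();
    }
}

/* A run consumes at most 11 bits: up to 6 at CODE points, 5 in PnpStop. */
#define MAX_CHOICES 11

/* Runs main() { DispatchRoutine(); } once per oracle value and returns
   the first value that violates the assertion, or -1 if none does. */
int find_violation(void) {
    for (unsigned bits = 0; bits < (1u << MAX_CHOICES); bits++) {
        g = (struct state){ .count = 1 };
        choice_bits = bits;
        choice_idx = 0;
        dispatch_routine();
        assert(choice_idx <= MAX_CHOICES);  /* oracle was wide enough */
        if (g.violated) return (int)bits;
    }
    return -1;
}
```

find_violation() returns 2: bit 0 declines to start PnpStop, bit 1 starts it right after DispatchRoutine has tested de->stopping, and PnpStop then runs to completion before DispatchRoutine resumes, which is the two-context-switch trace above.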

IRP Cancellation Error in Packet Driver (2.5 KLOC)

    DispatchRoutine(IRP *irp) {
        …
        irp->CancelRoutine = PacketCancelRoutine;
        Enqueue(irp);
        IoMarkIrpPending(irp);
        …
    }

    IoCancelIrp(IRP *irp) {
        IoAcquireCancelSpinLock();
        if (irp->CancelRoutine) {
            (irp->CancelRoutine)(irp);
        }
        …
    }

    PacketCancelRoutine(IRP *irp) {
        …
        Dequeue(irp);
        IoCompleteRequest(irp);
        IoReleaseCancelSpinLock();
        …
    }

    // Thread 1: DispatchRoutine
    …
    irp->CancelRoutine = PacketCancelRoutine;
    Enqueue(irp);

    // Thread 2: IoCancelIrp
    IoAcquireCancelSpinLock();
    if (irp->CancelRoutine) {
        // inline PacketCancelRoutine(irp)
        …
        Dequeue(irp);
        IoCompleteRequest(irp);
        IoReleaseCancelSpinLock();

    // Thread 1 resumes
    IoMarkIrpPending(irp);

Error: An irp should not be marked pending after it has been completed!
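The violated rule is a property of the sequence of operations on a single IRP. A minimal monitor in C, using a hypothetical event enumeration (these are not WDK types), flags the first IoMarkIrpPending that follows IoCompleteRequest.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical events on one IRP, named after the calls on the slide. */
enum irp_event {
    EV_SET_CANCEL_ROUTINE,  /* irp->CancelRoutine = ... */
    EV_ENQUEUE,             /* Enqueue(irp) */
    EV_MARK_PENDING,        /* IoMarkIrpPending(irp) */
    EV_DEQUEUE,             /* Dequeue(irp) */
    EV_COMPLETE             /* IoCompleteRequest(irp) */
};

/* Returns the index of the first event that marks the IRP pending after
   it has been completed, or -1 if the trace obeys the rule. */
int first_violation(const enum irp_event *trace, int len) {
    assert(len >= 0);
    bool completed = false;
    for (int i = 0; i < len; i++) {
        if (trace[i] == EV_COMPLETE)
            completed = true;
        else if (trace[i] == EV_MARK_PENDING && completed)
            return i;
    }
    return -1;
}
```

On the interleaving above (set cancel routine, enqueue, dequeue, complete, mark pending) it reports index 4; on the order in which IoMarkIrpPending runs before cancellation, it reports -1.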

Data-race Conditions in DDK Sample Drivers

• Device extension shared among threads
• Data races on device extension fields
• 18 sample DDK drivers
  – Range 0.5-9.2 KLOC
  – Total 70 KLOC

• Each field checked separately with a resource limit of 20 minutes and 800 MB

• Two threads: each calls a nondeterministically chosen dispatch routine

Driver              KLOC   # Fields   # Races
Tracedrv             0.5          3         0
Moufiltr             1.0         14         0
Kbfiltr              1.1         15         0
Imca                 1.1          5         1
Startio              1.1          9         0
Toaster/toastmon     1.4          8         1
Diskperf             2.4         16         0
1394diag             2.7         18         0
1394vdev             2.8         18         1
Fakemodem            2.9         39         6
Toaster/bus          5.0         30         0
Serenum              5.9         41         2
Toaster/func         6.6         24         5
Mouclass             7.0         34         1
Kbdclass             7.4         36         1
Mouser               7.6         34         1
Fdc                  9.2         92         9

Total: 30 races

DevicePnpState Field in Toaster/toastmon

    ToastMon_DispatchPnp(DEVICE_OBJECT *obj, IRP *irp) {
        …
        IoAcquireRemoveLock();
        …
        case IRP_MN_QUERY_STOP_DEVICE:
            // Race: write access
            deviceExt->DevicePnPState = StopPending;
            …
            break;
        …
        IoReleaseRemoveLock();
        …
    }

    ToastMon_DispatchPower(DEVICE_OBJECT *obj, IRP *irp) {
        …
        // Race: read access
        if (deviceExt->DevicePnpState == Deleted) { … }
        …
    }

Acknowledgments

• Tom Ball
• Byron Cook
• John Henry
• Doron Holan
• Vladimir Levin
• Jakob Lichtenberg
• Adrian Oney
• Sriram Rajamani
• Peter Wieland
• …

Keep It Simple and Sequential

• Context-bounded analysis by leveraging existing sequential checkers

• Validates the hypothesis that many concurrency errors require few context switches to show up

However…

• Hard limit on number of explored contexts
  – e.g., two context switches for a concurrent program with two threads

• Case study: Concurrent transaction management code written in C# (Naik-Rehof 04)
  – Analyzed by the Zing model checker after automatically translating to the Zing input language
  – Found three bugs, each requiring between three and four context switches

Is a tuning knob possible?

Given a concurrent boolean program P, does P go wrong by failing an assertion? Undecidable

Given a concurrent boolean program P and a positive integer c, does P go wrong by failing an assertion via an execution with at most c contexts? Decidable

Given a concurrent boolean program P with unbounded fork-join parallelism and a positive integer c, does P go wrong by failing an assertion via an execution with at most c contexts? Decidable

[Figure: an execution divided into contexts separated by context switches]

Problem:
• Unbounded computation possible within each context!
• Unbounded execution depth and reachable state space
• Different from bounded-depth model checking

Sequential boolean program

• Global store g, valuation to global variables
• Local store l, valuation to local variables
• Stack s, sequence of local stores
• State (g, s)

Example

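    bool a = F;

    void main() {
    L1: a = T;
    L2: flip(a);
    L3: }

    void flip(bool x) {
    L4: a = !x;
    L5: }

Execution, with each state written as (a, stack), each stack frame as (x, pc), the bottom of the stack first, and _ marking main's frame, which has no local x:

(F, ⟨_, L1⟩)
(T, ⟨_, L2⟩)
(T, ⟨_, L3⟩ ⟨T, L4⟩)
(F, ⟨_, L3⟩ ⟨T, L5⟩)
(F, ⟨_, L3⟩)
(F, ε)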
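The transition relation of this example can be run directly. The C sketch below is a hypothetical encoding: the five labels become an enum, each stack frame holds flip's parameter x and a program counter, and step performs one transition (g, s) → (g’, s’). The run visits the six states listed on this slide.

```c
#include <assert.h>
#include <stdbool.h>

/* Program locations of the example. */
enum pc { L1, L2, L3, L4, L5 };

/* A local store: flip's parameter x plus the program counter.
   main has no locals, so x is ignored in main's frame. */
struct frame { bool x; enum pc pc; };

/* A state (g, s): the global a and a stack with the bottom at index 0. */
struct config {
    bool a;
    struct frame stack[2];
    int depth;  /* 0 means the stack is empty */
};

/* One step of the transition relation (g, s) -> (g', s').
   Returns false once the stack is empty. */
bool step(struct config *c) {
    if (c->depth == 0) return false;
    struct frame *top = &c->stack[c->depth - 1];
    switch (top->pc) {
    case L1: c->a = true; top->pc = L2; break;        /* a = T */
    case L2:                                           /* call flip(a) */
        top->pc = L3;                                  /* return location */
        assert(c->depth < 2);                          /* two frames suffice here */
        c->stack[c->depth++] = (struct frame){ .x = c->a, .pc = L4 };
        break;
    case L3: c->depth--; break;                        /* main returns */
    case L4: c->a = !top->x; top->pc = L5; break;      /* a = !x */
    case L5: c->depth--; break;                        /* flip returns */
    }
    return true;
}

/* Runs main from a = F with pc L1; returns the number of states visited,
   including the first and the last, and stores the final value of a. */
int run_example(bool *final_a) {
    struct config c = { .a = false, .depth = 1 };
    c.stack[0] = (struct frame){ .x = false, .pc = L1 };
    int states = 1;
    while (step(&c)) states++;
    *final_a = c.a;
    return states;
}
```

run_example returns 6 and leaves a == false, matching the final state with a = F and an empty stack.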


Sequential boolean program

Transition relation: (g, s) → (g’, s’)

Reachability problem for sequential boolean program

Given (g, s), is there s’ such that (g, s) →* (error, s’)?

Aggregate state

• Set of stacks ss
• Aggregate state (g, ss) = { (g, s) | s ∈ ss }
• Reach(g, ss) = { (g’, s’) | there exists s ∈ ss such that (g, s) →* (g’, s’) }

Aggregate transition relation

Observations:
• There is a unique smallest partition of Reach(g, ss) into aggregate states (g’1, ss’1), …, (g’n, ss’n)
• The number of elements in the partition is bounded by the number of global stores

(g, ss) ⇒ (g’1, ss’1)
…
(g, ss) ⇒ (g’n, ss’n)

Theorem (Büchi; Schwoon 00)

• If ss is regular and (g, ss) ⇒ (g’, ss’), then ss’ is regular.

• If ss is given as a finite automaton A, then a finite automaton A’ for ss’ can be constructed from A in polynomial time.

Algorithm

Problem: Given (g, s), is there s’ such that (g, s) →* (error, s’)?

Solution: Compute an automaton for ss’ such that (g, {s}) ⇒ (error, ss’) and check if ss’ is nonempty.

Concurrent boolean program

• Global store g, valuation to global variables
• Local store l, valuation to local variables
• Stack s, sequence of local stores
• State (g, s1, s2)

Transition relation:

If (g, s1) → (g’, s’1) in thread 1, then (g, s1, s2) →₁ (g’, s’1, s2)

If (g, s2) → (g’, s’2) in thread 2, then (g, s1, s2) →₂ (g’, s1, s’2)

Reachability problem for concurrent boolean program

Given (g, s1, s2), are there s’1 and s’2 such that (g, s1, s2) reaches (error, s’1, s’2) via an execution with at most c contexts?

Aggregate transition relation

(g, ss1, ss2) = { (g, s1, s2) | s1 ∈ ss1, s2 ∈ ss2 }

If (g, ss1) ⇒ (g’, ss’1) in thread 1, then (g, ss1, ss2) ⇒₁ (g’, ss’1, ss2)

If (g, ss2) ⇒ (g’, ss’2) in thread 2, then (g, ss1, ss2) ⇒₂ (g’, ss1, ss’2)

Algorithm: 2 threads, c contexts

[Figure: tree of aggregate states rooted at (g, {s1}, {s2}); at each node the next context runs thread 1 or thread 2; depth c]

Compute the set of reachable aggregate states. Report an error if (g, ss1, ss2) is reachable and g = error, ss1 is nonempty, and ss2 is nonempty.
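The sketch below keeps the idea of exploring at most c contexts but swaps in a simpler technique: instead of aggregate states with regular sets of stacks, it assumes threads without procedure calls (the local state is just a program counter) and uses an explicit depth-first search. It encodes the driver-stopping example from the KISS evaluation as a hypothetical finite-state model; the struct layout, pc numbering, and initial count of 1 are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical finite-state model of the driver-stopping example. */
struct cstate {
    bool stopping, event, stopped, error;  /* global store */
    int count;                             /* de->count, starts at 1 */
    int pc[2];                             /* [0] DispatchRoutine, [1] PnpStop */
};

/* One atomic step of thread t; returns false if t is finished or blocked. */
static bool step_thread(struct cstate *s, int t) {
    assert(t == 0 || t == 1);
    int *pc = &s->pc[t];
    if (t == 0) {                                        /* DispatchRoutine */
        switch (*pc) {
        case 0: *pc = s->stopping ? 4 : 1; return true;  /* if (!stopping) */
        case 1: s->count++; *pc = 2; return true;         /* AtomicIncr */
        case 2: if (s->stopped) s->error = true;          /* assert(!driverStopped) */
                *pc = 3; return true;
        case 3: if (--s->count == 0) s->event = true;     /* AtomicDecr, SetEvent */
                *pc = 4; return true;
        default: return false;                            /* finished */
        }
    }
    switch (*pc) {                                        /* PnpStop */
    case 0: s->stopping = true; *pc = 1; return true;
    case 1: if (--s->count == 0) s->event = true;         /* AtomicDecr, SetEvent */
            *pc = 2; return true;
    case 2: if (!s->event) return false;                  /* WaitEvent blocks */
            *pc = 3; return true;
    case 3: s->stopped = true; *pc = 4; return true;
    default: return false;                                /* finished */
    }
}

/* DFS over executions where thread cur runs in context number contexts;
   switching threads opens a new context. Every step advances a pc, so
   the search terminates without a visited set. */
static bool explore(struct cstate s, int cur, int contexts, int max_contexts) {
    if (s.error) return true;
    for (int t = 0; t < 2; t++) {
        int next = contexts + (t != cur);
        if (next > max_contexts) continue;
        struct cstate succ = s;
        if (step_thread(&succ, t) && explore(succ, t, next, max_contexts))
            return true;
    }
    return false;
}

/* True if the assertion can fail in some execution with at most
   max_contexts contexts (either thread may run first). */
bool bounded_error_reachable(int max_contexts) {
    struct cstate init = { .count = 1 };
    for (int first = 0; first < 2; first++)
        if (explore(init, first, 1, max_contexts)) return true;
    return false;
}
```

bounded_error_reachable(3) returns true and bounded_error_reachable(2) returns false: the assertion failure needs three contexts (DispatchRoutine, PnpStop, DispatchRoutine), that is, two context switches.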

Complexity: 2 threads, c contexts

[Figure: the same tree of aggregate states rooted at (g, {s1}, {s2}), of depth c]

• Depth of tree = context bound c
• Branching factor bounded by G * 2 (G = # of global stores)
• Number of edges bounded by (G * 2)^(c+1)
• Each edge computable in polynomial time

Context-bounded analysis of concurrent software

• Many subtle concurrency errors are manifested in executions with few context switches
  – Experience with KISS on Windows drivers
  – Experience with Zing on transaction manager

• Algorithms for context-bounded analysis are more efficient than those for unbounded analysis
  – Reducibility to sequential checking with KISS
  – Decidability of assertion checking for concurrent boolean programs

Applications of context-bounded analysis

• Coverage metric for testing concurrent software

• Analysis of computer protocols
  – networking
  – cache coherence

Unbounded fork-join parallelism

• Fork operation: x = fork
• Join operation: join(x)
• Copy thread identifier from one variable to another

Algorithm: unbounded fork-join parallelism, c contexts

• At most c threads may perform a transition

• Reduce to previously solved problem with c threads and c contexts
  – Nondeterministically pick c forked threads for execution

• c statically created threads
• thread i starts execution when start[i] is true
• thread i sets end[i] to true on termination

start : {1, …, c} → boolean, initialized to λi. (i == 1)
end : {1, …, c} → boolean, initialized to λi. false
count : {1, …, c}, initialized to 1

x = fork translates to

    if ($) {
        assume(count < c);
        count = count + 1;
        x = count;
        start[count] = true;
    } else {
        x = c + 1;
    }

join(x) translates to

    assume(x <= c);
    assume(end[x]);
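The translation can be sketched in C as follows. The bound C = 3, the pick parameter standing in for $, and modeling assume() as a function that records whether the current path is still feasible are all assumptions of this sketch; a verifier would instead discard infeasible paths.

```c
#include <assert.h>
#include <stdbool.h>

#define C 3  /* context bound c; hypothetical value for this sketch */

/* Instrumentation state from the translation (index 0 is unused). */
static int  count;          /* threads picked so far, thread 1 included */
static bool start[C + 2];   /* start[i]: thread i may begin executing */
static bool end[C + 2];     /* end[i]: thread i has terminated */
static bool feasible;       /* false once some assume() has failed */

/* assume(cond): a false condition blocks the current path. */
static bool assume(bool cond) {
    if (!cond) feasible = false;
    return cond;
}

/* Initial values: count = 1, start[i] = (i == 1), end[i] = false. */
void reset_translation(void) {
    count = 1;
    for (int i = 0; i < C + 2; i++) {
        start[i] = (i == 1);
        end[i] = false;
    }
    feasible = true;
}

/* x = fork, where pick plays the role of $.
   Returns the new thread identifier, or 0 if the path is blocked. */
int translated_fork(bool pick) {
    if (pick) {
        if (!assume(count < C)) return 0;
        count = count + 1;
        start[count] = true;
        return count;
    }
    return C + 1;  /* not picked: this thread never runs */
}

/* join(x): blocks unless x is a picked thread that has terminated. */
bool translated_join(int x) {
    assert(x >= 1 && x <= C + 1);  /* x must come from a feasible fork */
    return assume(x <= C) && assume(end[x]);
}
```

After reset_translation(), the first two picked forks return identifiers 2 and 3, a third picked fork blocks (returns 0), and an unpicked fork returns C + 1, which translated_join then rejects.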
