1
Distributed Dynamic Partial Order Reduction based Verification of Threaded Software
Yu Yang (PhD student; summer intern at CBL)
Xiaofang Chen (PhD student; summer intern at IBM)
Ganesh Gopalakrishnan, Robert M. Kirby
School of Computing, University of Utah
SPIN 2007 Workshop Presentation
Supported by: Microsoft HPC Institutes
NSF CNS 0509379
2
Thread Programming will become more prevalent
Formal verification (FV) of thread programs will grow in importance
3
Why FV for Threaded Programs
> 80% of chips shipped will be multi-core
(Photo courtesy of Intel Corporation.)
4
Model Checking will Increasingly be through Dynamic Methods
Also known as Runtime or In-Situ methods
5
Why Dynamic Verification Methods
• Even after early life-cycle modeling and validation, the final code will have far more details
• Early life-cycle modeling is often impossible
- Use of libraries (API) such as MPI, OpenMP, Shmem, …
- Library function semantics can be tricky
- The bug may be in the library function implementation
6
Model Checking will often be “stateless”
7
Why Stateless
• One may not be able to access a lot of the state
- e.g. state of the OS
• It is expensive to hash and look up revisits
• Stateless is easier to parallelize
8
Partial Order Reduction is Crucial !
9
Why POR?
Process P0:
0: MPI_Init
1: MPI_Win_lock
2: MPI_Accumulate
3: MPI_Win_unlock
4: MPI_Barrier
5: MPI_Finalize
Process P1:
0: MPI_Init
1: MPI_Win_lock
2: MPI_Accumulate
3: MPI_Win_unlock
4: MPI_Barrier
5: MPI_Finalize
ONLY DEPENDENT OPERATIONS
• 504 interleavings without POR: (2 * (10!)) / (5!)^2
• 2 interleavings with POR!
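The count quoted on this slide can be checked with a few lines of arithmetic. The sketch below only evaluates the slide's formula; reading 10!/(5!)^2 as the number of ways to interleave two sequences of five independent steps is an interpretation on our part, not something the slide states.

```python
from math import comb, factorial

def interleavings_without_por() -> int:
    # Slide formula: (2 * 10!) / (5!)^2
    return 2 * factorial(10) // factorial(5) ** 2

print(interleavings_without_por())  # 504
print(2 * comb(10, 5))              # 504, same value via a binomial coefficient
```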
10
Dynamic POR is almost a “must” !
( Dynamic POR as in Flanagan and Godefroid, POPL 2005)
11
Why Dynamic POR ?
a[ j ]++ a[ k ]--
• Ample Set depends on whether j == k
• Can be very difficult to determine statically
• Can determine dynamically
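As a minimal sketch of why the dynamic check is easy, assume each executed operation is reported as a hypothetical `(array, index)` pair. At run time the concrete values of j and k are known, so dependence reduces to a comparison. The function names here are illustrative, not part of any tool described in the talk.

```python
def dependent(op1, op2):
    """Two array updates conflict only when they touch the same cell."""
    (array1, index1), (array2, index2) = op1, op2
    return array1 == array2 and index1 == index2

def observed_dependence(j, k):
    # Operations as a running program would report them: a[j]++ and a[k]--
    return dependent(("a", j), ("a", k))

print(observed_dependence(3, 3))  # True: j == k, both orders must be explored
print(observed_dependence(3, 4))  # False: the two updates commute
```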
12
Why Dynamic POR ?
The notion of action dependence (crucial to POR methods) is a function of the execution
13
Computation of “ample” sets in Static POR versus in DPOR
Ample determined using “local” criteria
Current State
Next move of Red process
Nearest Dependent Transition Looking Back
Add Red Process to “Backtrack Set”
This builds the Ample set incrementally based on observed dependencies
Blue is in “Done” set
{ BT }, { Done }
14
Putting it all together …
• We target C/C++ PThread programs
• Instrument the given program (largely automated)
• Run the concurrent program “till the end”
• Record interleaving variants while advancing
• When the number of recorded backtrack points reaches a soft limit, spill work to other nodes
• In one larger example, an 11-hour run was finished in 11 minutes using 64 nodes
• A heuristic to avoid recomputations was essential for speed-up
• First known distributed DPOR
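To put the 11-hour versus 11-minute figure in perspective, the following sketch computes the implied speedup and parallel efficiency. It assumes the two times are exact, which the slide does not claim, so the results are rough.

```python
def speedup(sequential_minutes, parallel_minutes):
    return sequential_minutes / parallel_minutes

def efficiency(speedup_value, nodes):
    return speedup_value / nodes

s = speedup(11 * 60, 11)   # 11 hours versus 11 minutes
print(s)                   # 60.0
print(efficiency(s, 64))   # 0.9375
```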
15
A Simple DPOR Example
{}, {}
t0:
lock(t)
unlock(t)
t1:
lock(t)
unlock(t)
t2:
lock(t)
unlock(t)
16
t0: lock
{}, {}
t0:
lock(t)
unlock(t)
t1:
lock(t)
unlock(t)
t2:
lock(t)
unlock(t)
A Simple DPOR Example
17
t0: lock
t0: unlock
{}, {}
t0:
lock(t)
unlock(t)
t1:
lock(t)
unlock(t)
t2:
lock(t)
unlock(t)
A Simple DPOR Example
18
t0: lock
t0: unlock
t1: lock
{}, {}
t0:
lock(t)
unlock(t)
t1:
lock(t)
unlock(t)
t2:
lock(t)
unlock(t)
A Simple DPOR Example
19
t0: lock
t0: unlock
t1: lock
{t1}, {t0}
t0:
lock(t)
unlock(t)
t1:
lock(t)
unlock(t)
t2:
lock(t)
unlock(t)
A Simple DPOR Example
20
t0: lock
t0: unlock
t1: lock
t1: unlock
t2: lock
{t1}, {t0}
{}, {}
t0:
lock(t)
unlock(t)
t1:
lock(t)
unlock(t)
t2:
lock(t)
unlock(t)
A Simple DPOR Example
21
t0: lock
t0: unlock
t1: lock
t1: unlock
t2: lock
{t1}, {t0}
{t2}, {t1}
t0:
lock(t)
unlock(t)
t1:
lock(t)
unlock(t)
t2:
lock(t)
unlock(t)
A Simple DPOR Example
22
t0: lock
t0: unlock
t1: lock
t1: unlock
t2: lock
t2: unlock
{t1}, {t0}
{t2}, {t1}
t0:
lock(t)
unlock(t)
t1:
lock(t)
unlock(t)
t2:
lock(t)
unlock(t)
A Simple DPOR Example
23
t0: lock
t0: unlock
t1: lock
t1: unlock
t2: lock
{t1}, {t0}
{t2}, {t1}
t0:
lock(t)
unlock(t)
t1:
lock(t)
unlock(t)
t2:
lock(t)
unlock(t)
A Simple DPOR Example
24
t0: lock
t0: unlock
{t1}, {t0}
{t2}, {t1}
t0:
lock(t)
unlock(t)
t1:
lock(t)
unlock(t)
t2:
lock(t)
unlock(t)
A Simple DPOR Example
25
t0: lock
t0: unlock
t2: lock
{t1,t2}, {t0}
{}, {t1, t2}
t0:
lock(t)
unlock(t)
t1:
lock(t)
unlock(t)
t2:
lock(t)
unlock(t)
A Simple DPOR Example
26
t0: lock
t0: unlock
t2: lock
t2: unlock
{t1,t2}, {t0}
{}, {t1, t2}
t0:
lock(t)
unlock(t)
t1:
lock(t)
unlock(t)
t2:
lock(t)
unlock(t)
A Simple DPOR Example
…
27
t0: lock
t0: unlock
{t1,t2}, {t0}
{}, {t1, t2}
t0:
lock(t)
unlock(t)
t1:
lock(t)
unlock(t)
t2:
lock(t)
unlock(t)
A Simple DPOR Example
28
{t2}, {t0,t1}
t0:
lock(t)
unlock(t)
t1:
lock(t)
unlock(t)
t2:
lock(t)
unlock(t)
A Simple DPOR Example
29
t1: lock
t1: unlock
{t2}, {t0, t1}
t0:
lock(t)
unlock(t)
t1:
lock(t)
unlock(t)
t2:
lock(t)
unlock(t)
A Simple DPOR Example
…
30
For this example, all the paths are explored during DPOR
For other programs, the explored paths will be a proper subset
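The walk-through above can be reproduced with a small model. The sketch below is a simplified illustration, not the Inspect implementation: it keeps model states on a stack instead of re-executing a real program, treats only two `lock` operations on the same mutex by different threads as dependent, and omits the happens-before filtering of the Flanagan and Godefroid algorithm. All names (`PROGRAM`, `explore`, `mark_backtrack`) are made up for this example.

```python
from itertools import permutations

# Program from the slides: three threads, each doing lock(t); unlock(t).
PROGRAM = {tid: [("lock", "t"), ("unlock", "t")] for tid in ("t0", "t1", "t2")}

def next_op(pc, tid):
    ops = PROGRAM[tid]
    return ops[pc[tid]] if pc[tid] < len(ops) else None

def enabled(pc, owner):
    """Threads whose next operation can execute now."""
    result = []
    for tid in PROGRAM:
        op = next_op(pc, tid)
        if op is None:
            continue                      # thread has finished
        if op[0] == "lock" and owner.get(op[1]) is not None:
            continue                      # mutex is held, so lock blocks
        result.append(tid)
    return result

def explore():
    """Depth-first search with backtrack/done sets; returns lock orders seen."""
    lock_orders = []
    stack = [{"pc": {t: 0 for t in PROGRAM}, "owner": {}, "bt": set(), "done": set()}]
    trace = []                            # trace[i] = (tid, op) taken from stack[i]

    def mark_backtrack(tid, op):
        # Find the nearest earlier lock on the same mutex by another thread
        # and add tid to the backtrack set of the state it was taken from.
        if op[0] != "lock":
            return
        for i in range(len(trace) - 1, -1, -1):
            other, prev = trace[i]
            if other != tid and prev == op:
                pre = stack[i]
                en = enabled(pre["pc"], pre["owner"])
                pre["bt"].update([tid] if tid in en else en)
                return

    def dfs():
        node = stack[-1]
        en = enabled(node["pc"], node["owner"])
        if not en:                        # end of run (all threads finished here)
            lock_orders.append(tuple(t for t, op in trace if op[0] == "lock"))
            return
        node["bt"].add(en[0])
        while node["bt"] - node["done"]:
            tid = min(node["bt"] - node["done"])
            node["done"].add(tid)
            op = next_op(node["pc"], tid)
            mark_backtrack(tid, op)
            pc = dict(node["pc"])
            pc[tid] += 1
            owner = dict(node["owner"])
            owner[op[1]] = tid if op[0] == "lock" else None
            stack.append({"pc": pc, "owner": owner, "bt": set(), "done": set()})
            trace.append((tid, op))
            dfs()
            stack.pop()
            trace.pop()

    dfs()
    return lock_orders

orders = explore()
print(len(orders))                                 # 6
print(set(orders) == set(permutations(PROGRAM)))   # True
```

All six lock-acquisition orders are visited, which matches the slide's remark that no path is pruned for this particular program.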
31
Idea for parallelization: Explore computations from the backtrack set in other processes.
“Embarrassingly Parallel” – it seems so, anyway !
32
We first built a sequential DPOR explorer for C / Pthreads programs, called “Inspect”
[Figure: Inspect workflow. A multithreaded C/C++ program is instrumented, then compiled together with a thread library wrapper into an executable. At run time, thread 1 … thread n exchange request/permit messages with the scheduler.]
33
We then made the following observations
• Stateless search does not maintain search history
• Different branches of an acyclic space can be explored concurrently
• Simple master-slave scheme can work here
– one load balancer + workers
34
worker a worker b
Request unloading
idle node id
work description
report result
load balancer
We then devised a work-distribution scheme…
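A minimal sketch of the load balancer and worker roles follows, under several simplifying assumptions: workers are simulated sequentially in one process, a "work description" is taken to be a schedule prefix to replay, every enabled alternative is spilled (no partial order reduction), and the program is the three-thread lock/unlock example. None of these names or choices come from the talk.

```python
from collections import deque

THREADS = ("t0", "t1", "t2")   # each thread does lock(t); unlock(t)

def enabled(prefix):
    """Threads that can move after replaying the given schedule prefix."""
    steps = {t: prefix.count(t) for t in THREADS}
    holder = next((t for t in THREADS if steps[t] == 1), None)
    if holder is not None:
        return [holder]                    # only the lock holder can move
    return [t for t in THREADS if steps[t] == 0]

def run_work_item(prefix):
    """Worker: replay the prefix, run to the end, report unexplored siblings."""
    spilled = []
    schedule = list(prefix)
    while True:
        en = enabled(schedule)
        if not en:
            break
        for alternative in en[1:]:
            spilled.append(tuple(schedule) + (alternative,))
        schedule.append(en[0])
    return tuple(schedule), spilled

def load_balancer():
    """Hand out work descriptions until no backtrack points remain."""
    queue = deque([()])                    # start from the empty prefix
    results = []
    while queue:
        schedule, spilled = run_work_item(queue.popleft())
        results.append(schedule)
        queue.extend(spilled)
    return results

results = load_balancer()
print(len(results))        # 6
print(len(set(results)))   # 6, no schedule is explored twice
```

In this toy version no work is duplicated because each sibling branch is spilled exactly once. The redundancy discussed on the following slides arises once backtrack sets are updated dynamically by different nodes.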
35
We got zero speedup! Why?
Deeper investigation revealed that multiple nodes
ended up exploring the same interleavings
36
Illustration of the problem (1 of 5)
t0: lock
t0: unlock
t1: lock
t1: unlock
t2: lock
t2: unlock
{t1}, {t0}
{t2}, {t1}
37
Illustration of the problem (2 of 5)
t0: lock
t0: unlock
t1: lock
t1: unlock
t2: lock
t2: unlock
{t1}, {t0}
{t2}, {t1}
Heuristic : Handoff DEEPEST backtrack point for another node to explore
Reason : Largest number of paths emanate from there
To Node 1
38
Detail of (2 of 5)
t0: lock
t0: unlock
t1: lock
t1: unlock
t2: lock
t2: unlock
{t1}, {t0}
{t2}, {t1}
Node 0
t0: lock
t0: unlock
t1: lock
t1: unlock
t2: lock
t2: unlock
{ }, {t0,t1}
{t2}, {t1}
39
Detail of (2 of 5)
t0: lock
t0: unlock
t1: lock
t1: unlock
t2: lock
t2: unlock
{t1}, {t0}
{t2}, {t1}
Node 1
Node 0
t0: lock
t0: unlock
t1: lock
t1: unlock
t2: lock
t2: unlock
{ }, {t0,t1}
{t2}, {t1}
t0: lock
{t1}, {t0}
40
Detail of (2 of 5)
t0: lock
t0: unlock
t1: lock
t1: unlock
t2: lock
t2: unlock
{t1}, {t0}
{t2}, {t1}
Node 1
Node 0
t0: lock
t0: unlock
t1: lock
t1: unlock
t2: lock
t2: unlock
{ }, { t0,t1 }
{t2}, {t1}
t0: lock
{ t1 }, {t0}
t1 is forced into DONE set before work is handed to Node 1
Node 1 keeps t1 in backtrack set
41
Illustration of the problem (3 of 5)
t0: lock
t0: unlock
t1: lock
t1: unlock
t2: lock
t2: unlock
{t1}, {t0}
{t2}, {t1}
To Node 1
Decide to do THIS work at Node 0 itself…
42
t0: lock
t0: unlock
{}, {t0,t1}
{t2}, {t1}
{t1}, {t0}
Illustration of the problem (4 of 5)
Being expanded by Node 0
Being expanded by Node 1
43
Illustration of the problem (5 of 5)
t0: lock
t0: unlock
{t2}, {t0,t1}
{}, {t2}
t2: lock
t2: unlock
44
Illustration of the problem (5 of 5)
t0: lock
t0: unlock
{t2}, {t0,t1}
{}, {t2}
{t1}, {t0}
t1: lock
t1: unlock
t2: lock
t2: unlock
45
Illustration of the problem (5 of 5)
t0: lock
t0: unlock
{t2}, {t0,t1}
{}, {t2}
{t2}, {t0, t1}
t1: lock
t1: unlock
t2: lock
t2: unlock
t2: lock
t2: unlock
{}, {t2}
Redundancy!
46
New Backtrack Set Computation: Aggressively mark up the stack!
t0: lock
t0: unlock
t1: lock
t1: unlock
t2: lock
t2: unlock
{t1,t2}, {t0}
{t2}, {t1}
Update the backtrack sets of ALL dependent operations!
• Forms a good allocation scheme
• Does not involve any synchronizations
• Redundant work may still be performed
• Likelihood is reduced because a node aggressively “owns” one operation and all its dependants
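The difference between the original update and the aggressive one can be sketched as follows. This is an illustration under the same simplifying assumption used elsewhere in these notes (only `lock` operations on the same mutex by different threads count as dependent); the function `backtrack_updates` is hypothetical and returns the stack indices whose backtrack sets would receive the thread.

```python
def backtrack_updates(trace, tid, op, aggressive):
    """Indices of earlier states whose backtrack set should receive tid."""
    hits = []
    if op[0] != "lock":
        return hits
    for i in range(len(trace) - 1, -1, -1):
        other, prev = trace[i]
        if other != tid and prev == op:
            hits.append(i)
            if not aggressive:
                break          # original DPOR: nearest dependent transition only
    return hits

LOCK, UNLOCK = ("lock", "t"), ("unlock", "t")
trace = [("t0", LOCK), ("t0", UNLOCK), ("t1", LOCK), ("t1", UNLOCK)]

print(backtrack_updates(trace, "t2", LOCK, aggressive=False))  # [2]
print(backtrack_updates(trace, "t2", LOCK, aggressive=True))   # [2, 0]
```

With the aggressive variant, t2 is added at index 0 as well as index 2, which corresponds to the `{t1,t2}, {t0}` entry shown on this slide.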
47
Implementation and Evaluation
• Using MPI for communication among nodes
• Did experiments on a 72-node cluster
– 2.4 GHz Intel Xeon processors, 2 GB memory/node
– Two (small) benchmarks: indexer & file system benchmark used in Flanagan and Godefroid’s DPOR paper
– Aget: a multithreaded ftp client
– Bbuf: an implementation of bounded buffer
48
Sequential Checking Time
Benchmark   Threads   Runs        Time (sec)
fsbench     26        8,192       291.32
indexer     16        32,768      1188.73
aget        6         113,400     5662.96
bbuf        8         1,938,816   39710.43
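The table invites a quick derived figure: the average cost of one re-execution. The sketch below divides total time by the number of runs for each benchmark; the numbers are copied from the table, and the per-run interpretation (total time spread evenly over runs) is an assumption.

```python
# (runs, total seconds) copied from the table above
BENCHMARKS = {
    "fsbench": (8_192, 291.32),
    "indexer": (32_768, 1188.73),
    "aget": (113_400, 5662.96),
    "bbuf": (1_938_816, 39710.43),
}

def seconds_per_run(name):
    runs, seconds = BENCHMARKS[name]
    return seconds / runs

for name, (runs, seconds) in BENCHMARKS.items():
    per_run_ms = 1000 * seconds_per_run(name)
    print(f"{name}: {per_run_ms:.1f} ms per run, {seconds / 3600:.2f} h total")
```

Each run costs a few tens of milliseconds, so total time is dominated by the number of interleavings. The bbuf total is roughly 11 hours, which may be the 11-hour run mentioned earlier in the talk, although the slides do not say so explicitly.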
49
Speedup on indexer & fs (small examples, so diminishing returns beyond 40 nodes)
50
Speedup on aget
51
Speedup on bbuf
52
Conclusions and Future Work
• Method described is VERY promising
• We have an in-situ model checker for MPI programs also! (EuroPVM / MPI 2007)
– Will be parallelized using MPI for work distribution!
• The C/PThread work needs to be pushed a lot more:
– Automate instrumentation
– Try many new examples
– Improve work-distribution heuristic in response to findings
– Release tool
53
Questions?
54
Answers !
Properties: Currently
– Local “assert”s
– Deadlocks
– Uninitialized Variables
No plans for liveness
Tool release likely in 6 months
That is a very good question. Let’s talk!
55
Extra Slides
56
Concurrent operations on some database
Class A operations:
pthread_mutex_lock(mutex);
a_count++;
if (a_count == 1) pthread_mutex_lock(res);
pthread_mutex_unlock(mutex);
…
pthread_mutex_lock(mutex);
a_count--;
if (a_count == 0) pthread_mutex_unlock(res);
pthread_mutex_unlock(mutex);
Class B operations:
pthread_mutex_lock(mutex);
b_count++;
if (b_count == 1) pthread_mutex_lock(res);
pthread_mutex_unlock(mutex);
…
pthread_mutex_lock(mutex);
b_count--;
if (b_count == 0) pthread_mutex_unlock(res);
pthread_mutex_unlock(mutex);
57
Initial random execution
a1 : acquire mutex
a2 : a_count++
a3 : a_count == 1
a4 : acquire res
a5 : release mutex
a6 : acquire mutex
a7 : a_count--
a8 : a_count == 0
a9 : release res
a10 : release mutex
b1 : acquire mutex
b2 : b_count++
b3 : b_count == 1
b4 : acquire res
b5 : release mutex
b6 : acquire mutex
b7 : b_count--
b8 : b_count == 0
b9 : release res
b10 : release mutex
Class A operations:
pthread_mutex_lock(mutex);
a_count++;
if (a_count == 1) pthread_mutex_lock(res);
pthread_mutex_unlock(mutex);
…
pthread_mutex_lock(mutex);
a_count--;
if (a_count == 0) pthread_mutex_unlock(res);
pthread_mutex_unlock(mutex);
70
Dependent operations?
a1 : acquire mutex
a2 : a_count++
a3 : a_count == 1
a4 : acquire res
a5 : release mutex
a6 : acquire mutex
a7 : a_count--
a8 : a_count == 0
a9 : release res
a10 : release mutex
b1 : acquire mutex
b2 : b_count++
b3 : b_count == 1
b4 : acquire res
b5 : release mutex
b6 : acquire mutex
b7 : b_count--
b8 : b_count == 0
b9 : release res
b10 : release mutex
Class B operations:
pthread_mutex_lock(mutex);
b_count++;
if (b_count == 1) pthread_mutex_lock(res);
pthread_mutex_unlock(mutex);
…
pthread_mutex_lock(mutex);
b_count--;
if (b_count == 0) pthread_mutex_unlock(res);
pthread_mutex_unlock(mutex);
71
Start an alternative execution
a1 : acquire mutex
a2 : a_count++
a3 : a_count == 1
a4 : acquire res
a5 : release mutex
a6 : acquire mutex
a7 : a_count--
a8 : a_count == 0
a9 : release res
a10 : release mutex
b1 : acquire mutex
b2 : b_count++
b3 : b_count == 1
b4 : acquire res
b5 : release mutex
b6 : acquire mutex
b7 : b_count--
b8 : b_count == 0
b9 : release res
b10 : release mutex
Class A operations:
pthread_mutex_lock(mutex);
a_count++;
if (a_count == 1) pthread_mutex_lock(res);
pthread_mutex_unlock(mutex);
…
pthread_mutex_lock(mutex);
a_count--;
if (a_count == 0) pthread_mutex_unlock(res);
pthread_mutex_unlock(mutex);
72
Get a deadlock!
a1 : acquire mutex
a2 : a_count++
a3 : a_count == 1
a4 : acquire res
a5 : release mutex
b1 : acquire mutex
b2 : b_count++
b3 : b_count == 1
a6 : acquire mutex
a7 : a_count--
a8 : a_count == 0
a9 : release res
a10 : release mutex
b4 : acquire res
b5 : release mutex
b6 : acquire mutex
b7 : b_count--
b8 : b_count == 0
b9 : release res
b10 : release mutex
Class A operations:
pthread_mutex_lock(mutex);
a_count++;
if (a_count == 1) pthread_mutex_lock(res);
pthread_mutex_unlock(mutex);
pthread_mutex_lock(mutex);
Class B operations:
pthread_mutex_lock(mutex);
b_count++;
if (b_count == 1) pthread_mutex_lock(res);
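The deadlock can be replayed on a small model of the two locks. The sketch below assumes exactly one Class A thread and one Class B thread, so both conditional operations on `res` are taken, and it models the counter updates as lock-free local steps. The names `STEPS` and `replay` are illustrative only.

```python
# Steps a1..a10 (and b1..b10) reduced to their effect on the two locks.
# Assumption: one thread per class, so the conditional res operations execute.
STEPS = [
    ("acquire", "mutex"), ("local", None), ("local", None), ("acquire", "res"),
    ("release", "mutex"), ("acquire", "mutex"), ("local", None), ("local", None),
    ("release", "res"), ("release", "mutex"),
]

def replay(schedule):
    """Replay a schedule such as "aaaaabbb"; return (deadlocked, blocked)."""
    pc = {"a": 0, "b": 0}
    owner = {"mutex": None, "res": None}
    for tid in schedule:
        kind, lock = STEPS[pc[tid]]
        if kind == "acquire":
            if owner[lock] is not None:
                raise ValueError(f"{tid} cannot acquire {lock} here")
            owner[lock] = tid
        elif kind == "release":
            owner[lock] = None
        pc[tid] += 1
    unfinished = [t for t in pc if pc[t] < len(STEPS)]
    blocked = {}
    for tid in unfinished:
        kind, lock = STEPS[pc[tid]]
        if kind == "acquire" and owner[lock] is not None:
            blocked[tid] = (lock, owner[lock])   # (wanted lock, current holder)
    deadlocked = bool(unfinished) and len(blocked) == len(unfinished)
    return deadlocked, blocked

print(replay("a" * 10 + "b" * 10))  # (False, {})
print(replay("aaaaa" + "bbb"))      # (True, {'a': ('mutex', 'b'), 'b': ('res', 'a')})
```

After a1..a5 and b1..b3, thread A waits for `mutex` held by B while B waits for `res` held by A, which is the cycle the alternative execution exposes.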