Parallel and Distributed Systems Laboratory
Paradise: A Toolkit for Building Reliable Concurrent Systems
Trace Verification for Parallel Systems
Vijay K. Garg
Department of Electrical and Computer Engineering
The University of Texas at Austin
Austin, TX 78712
email: [email protected]
Talk Outline
Motivation and Overview
Instrumentation
– Clock : Tracking Dependency
Property Checking
– Sensor : Detecting Global Properties
– Slicer : Computation Slicing
Motivation: Reliable System
Concurrent systems are prone to errors.
– Concurrency, nondeterminism, process and channel failures
Techniques to ensure correctness
Modeling: Model Checking and Formal Verification
Bug Hunting: Simulation, Debugging and Verification
Fault-Tolerance
Paradise Environment
[Figure: the Paradise environment, with components labeled Program, Monitor, Slicer, and Predicate, and interactions labeled Observe and Control.]
Trace Model: Total Order vs Partial Order
Total order: interleaving of events in a trace
Partial order: Lamport’s happened-before model
[Figure: a partial order trace of processes P1 and P2, with events e1, e2, f1, f2 and critical sections CS1 and CS2, shown next to two of its total-order interleavings: a successful trace and a faulty trace. Specification label in the figure: CS1 ∧ CS2.]
Tracking Dependency
Computation: a set of events ordered by the “happened-before” relation
Problem: Timestamp events to answer
– e happened before f ?
– e concurrent with f ?
Clocks in a Distributed System
Vector Clocks [Fidge 89, Mattern 89]
Result: s happened before t if and only if the vector at s is less than the vector at t.
P1: (1,0,0) (2,1,0) (3,1,0)
P2: (0,1,0) (0,2,0)
P3: (0,0,1) (0,0,2) (2,1,3)
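The timestamps above can be reproduced with a short Python sketch. The `Process` helper and the message pattern (P2's first event sends to P1, P1's second event sends to P3) are assumptions inferred from the vectors on the slide, not something the slide states.

```python
class Process:
    """One process carrying a vector clock of length n (hypothetical helper)."""

    def __init__(self, pid, n):
        self.pid = pid
        self.clock = [0] * n

    def local_event(self):
        # Internal or send event: advance only this process's own component.
        self.clock[self.pid] += 1
        return tuple(self.clock)

    def receive(self, msg_clock):
        # Receive event: componentwise max with the sender's timestamp, then tick.
        self.clock = [max(a, b) for a, b in zip(self.clock, msg_clock)]
        self.clock[self.pid] += 1
        return tuple(self.clock)


def happened_before(u, v):
    # u < v: every component is <= and the two vectors differ.
    return u != v and all(a <= b for a, b in zip(u, v))


def concurrent(u, v):
    return u != v and not happened_before(u, v) and not happened_before(v, u)


p1, p2, p3 = (Process(i, 3) for i in range(3))
a1 = p1.local_event()
b1 = p2.local_event()   # assumed: this event sends a message to P1
a2 = p1.receive(b1)     # assumed: this event also sends a message to P3
a3 = p1.local_event()
b2 = p2.local_event()
c1 = p3.local_event()
c2 = p3.local_event()
c3 = p3.receive(a2)

print(a2, c3)
print(happened_before(b1, c3), concurrent(a3, b2))
```

Under these assumptions the run yields (2,1,0) on P1 and (2,1,3) on P3, matching the slide; (0,1,0) happened before (2,1,3), while (3,1,0) and (0,2,0) are concurrent.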
Dynamic Chain Clocks
Problem with vector clocks: scalability, dynamic process structure
Idea: compute the “chains” online, for relevant events only [Aggarwal and Garg PODC 05]
[Figure: a computation with 4 processes (P1 to P4) containing the relevant events a to h, and the relevant subcomputation, drawn as the two rows a b c d and e f g h.]
Experimental Results
Simulation of a computation with 1% relevant events
Measured
– number of components vs number of threads
– total time overhead vs number of threads
Global Property Detection
Predicate: a global condition expressed using variables on processes
– e.g., more than one process is in the critical section; there is no token in the system
Problem: find a global state that satisfies the given predicate
[Figure: processes P1 and P2 with their critical sections marked, and two global states G1 and G2.]
The Main Difficulty in Partial Order
Algorithm for general predicate [Cooper and Marzullo 91]
Too many global states: a computation may contain as many as O(k^n) global states
• k: maximum number of events on a process
• n: number of processes
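[Figure: processes P1 (events e1, e2) and P2 (events f1, f2) and the lattice of their global states, from bottom ⊥ to top ⊤:]
{⊥}
{e1, ⊥} {f1, ⊥}
{e2, e1, ⊥} {e1, f1, ⊥}
{e2, e1, f1, ⊥} {e1, f2, f1, ⊥}
{e2, e1, f2, f1, ⊥}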
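A brute-force enumeration makes the blow-up concrete. This is a minimal sketch, not the algorithm of [Cooper and Marzullo 91]: the function names and the vector-clock encoding are illustrative, and the dependency of f2 on e1 is an assumption inferred from the lattice, which lists eight states and omits {f2, f1, ⊥}.

```python
from itertools import product


def consistent_cuts(clocks):
    """clocks[i][c] is the vector clock of process i after its first c events
    (index 0 is the initial state). Returns every consistent global state."""
    n = len(clocks)
    ranges = [range(len(proc)) for proc in clocks]
    cuts = []
    for cut in product(*ranges):
        # Consistent: no included event depends on an event that is excluded.
        if all(clocks[i][cut[i]][j] <= cut[j] for i in range(n) for j in range(n)):
            cuts.append(cut)
    return cuts


def independent(n, k):
    # n processes with k events each and no messages between them.
    return [[tuple(c if j == i else 0 for j in range(n)) for c in range(k + 1)]
            for i in range(n)]


two_proc = [
    [(0, 0), (1, 0), (2, 0)],  # P1: initial state, e1, e2
    [(0, 0), (0, 1), (1, 2)],  # P2: initial state, f1, f2 (assumed to depend on e1)
]

print(len(consistent_cuts(two_proc)))
print(len(consistent_cuts(independent(3, 4))))
```

The two-process example gives 8 global states. With no messages at all, 3 processes of 4 events each already give (4+1)^3 = 125, which is the exponential growth in n that the O(k^n) bound describes.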
Efficient Predicate Detection for Special Cases
Stable predicate [Chandy and Lamport 85]
• once the predicate becomes true, it stays true, e.g., deadlock
Unstable predicates:
• observer-independent predicate [Charron-Bost et al 95]: if it occurs in one interleaving, it occurs in all interleavings, e.g., any disjunction of local predicates
• linear predicate [Chase and Garg 95], e.g., conjunctive predicates such as "there is no leader in the system"
• relational predicate: x1 + x2 + … + xn ≥ k [Chase and Garg 95], e.g., violation of k-mutual exclusion
Algorithms for Conjunctive Predicates
Centralized Algorithm [Garg and Waldecker 92]
Each non-checker process maintains its local vector and sends the chain clock to the checker process whenever
– the local predicate is true
– at most once in each message interval.
Time complexity: the checker requires at most O(n^2 m) comparisons.
– token based algorithm [Garg and Chase 95]
– completely distributed algorithm [Garg and Chase 95]
– keeping queues shorter [Chiou and Korfhage 95]
– avoiding control messages [Hurfin, Mizuno, Raynal, Singhal 96]
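The checker side of the centralized scheme can be sketched as follows. This is a simplified illustration under stated assumptions, not the published algorithm verbatim: it uses plain vector timestamps where the slide mentions chain clocks, it assumes all reports have already arrived in per-process queues, and the name `detect_conjunctive` is hypothetical.

```python
from collections import deque


def happened_before(u, v):
    return u != v and all(a <= b for a, b in zip(u, v))


def detect_conjunctive(queues):
    """queues[i] holds, in local order, the vector timestamps of the states of
    process i in which its local predicate was true. Returns one timestamp per
    process forming a consistent global state, or None if no such state exists."""
    qs = [deque(q) for q in queues]
    while all(qs):  # stop as soon as any queue runs empty
        heads = [q[0] for q in qs]
        # A head that happened before another head also precedes every later
        # candidate on that process, so it can never be part of a solution.
        stale = {i for i in range(len(qs)) for j in range(len(qs))
                 if i != j and happened_before(heads[i], heads[j])}
        if not stale:
            return heads  # heads are pairwise concurrent
        for i in stale:
            qs[i].popleft()
    return None


found = detect_conjunctive([
    [(1, 0, 0), (3, 1, 0)],  # P1 reports
    [(0, 2, 0)],             # P2 reports
    [(2, 1, 3)],             # P3 reports
])
print(found)
```

With these hypothetical reports, (1,0,0) is discarded because it happened before (2,1,3), and the remaining heads (3,1,0), (0,2,0), (2,1,3) are pairwise concurrent, so the conjunction is detected.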
Other Special Classes of Predicates
Relational Predicates
– Let xi be the number of tokens at Pi
– Σxi < k: loss of tokens
– Algorithms: max-flow techniques [Groselj 93, Chase and Garg 95, Wu and Chen 98]
– Dilworth's partition [Tomlinson and Garg 96]
The Main Idea of Computation Slicing
[Figure: slicing maps the partial order trace, which suffers from state explosion, to a slice that keeps all red global states.]
How does Computation Slicing Help?
[Figure: to check b1 ∧ b2 on the partial order trace, first slice the trace for b1. The slice retains all global states that satisfy b1, so only b2 remains to be checked on the slice.]
Example
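Detect predicate (x*y + z < 5) ∧ (x ≥ 1) ∧ (z ≤ 3)
Computation (each event is labeled with the value of its process's variable):
P1 (x): a = 1, b = 2, c = -1, d = 0
P2 (y): e = 0, f = 2, g = 1, h = 3
P3 (z): u = 4, v = 1, w = 2, x = 4
Slice with respect to (x ≥ 1) ∧ (z ≤ 3), shown as groups of events:
{a,e,f,u,v} {b} {w} {g}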
Computation Slice
computation slice: a sub-computation such that: [Mittal and Garg 01]
1. it contains all global states of the computation satisfying the given predicate, and
2. it contains the least number of global states
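The definition can be illustrated by brute force on a tiny example. This is only a sketch: it enumerates global states exhaustively, which is exactly what the polynomial-time slicing algorithms avoid, and the two-process computation and the variable values are hypothetical.

```python
from itertools import product

# Hypothetical computation: index 0 is the initial state of each process.
CLOCKS = [
    [(0, 0), (1, 0), (2, 0)],  # P1: vector clock after 0, 1, 2 events
    [(0, 0), (0, 1), (1, 2)],  # P2: its second event depends on P1's first
]
VALUES = [
    [0, 1, 2],  # x on P1 after 0, 1, 2 events
    [5, 3, 1],  # z on P2 after 0, 1, 2 events
]


def is_consistent(cut):
    n = len(cut)
    return all(CLOCKS[i][cut[i]][j] <= cut[j] for i in range(n) for j in range(n))


def predicate(cut):
    # Conjunctive predicate (x >= 1) and (z <= 3).
    return VALUES[0][cut[0]] >= 1 and VALUES[1][cut[1]] <= 3


all_cuts = [c for c in product(range(3), repeat=2) if is_consistent(c)]
satisfying = [c for c in all_cuts if predicate(c)]

# For a conjunctive predicate the satisfying states are closed under
# componentwise min and max, so no extra state is needed to hold them.
sat = set(satisfying)
closed = all(tuple(map(min, a, b)) in sat and tuple(map(max, a, b)) in sat
             for a in satisfying for b in satisfying)

print(len(all_cuts), satisfying, closed)
```

Here 4 of the 8 global states satisfy the predicate, and they are closed under meet and join, so a further clause would only need to be checked on those 4 states rather than on the full computation.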
POTA Architecture [Sen Dissertation 04]
[Figure: POTA architecture. Components: Instrumentor, Translator, Slicer, Predicate Detector, Analyzer, Execute Program, Execute SPIN. Artifacts: Program, Instrumented Program, Specification (Predicate), Trace, Slice, Trace Slice, Promela. Outcomes: yes/witness, no/counterexample.]
Results
Efficient polynomial-time algorithms for computing the slice for:
– linear predicates [Garg and Mittal 01]
• time complexity: O(n^2 m)
– general predicates
• Theorem: Given a computation, if a predicate b can be detected efficiently, then the slice for b can also be computed efficiently. [Mittal, Sen and Garg 03]
– combining slices: Boolean operators
– temporal logic operators: EF, AG, EG
– approximate slice: for arbitrary Boolean expressions
• n: number of processes
• m: number of events
Experiments: Dining Philosophers Trace Verification
POTA: Partial Order Trace Analyzer (based on slicing) [Sen and Garg 03]
SPIN: A widely used model checking tool [Holzmann 97]
– SPIN: 250 seconds for n = 6, runs out of memory for n > 6.
– POTA: handles n = 200 in 400 seconds.
Predicate: Two neighboring dining philosophers do not eat concurrently
Conclusions
Bug-hunting in concurrent systems
Total order vs. Partial Order
Abstractions such as slicing help combat the state-space explosion problem
Questions?