Fast and Precise Symbolic Analysis of Concurrency Bugs in Device Drivers
Pantazis Deligiannis, Alastair Donaldson, Zvonimir Rakamaric
Intel — June 2015
Concurrency errors, such as data races, make device drivers hard to develop and
debug without automated tool support
Whoop, a new fully automated tool that:
- statically analyses drivers for data races
- exploits any found race-freedom guarantees to achieve a sound partial-order reduction and accelerate bug-finding using Corral
Corral is an industrial-strength bug-finder for device drivers from Microsoft, used as the backend of the Static Driver Verifier
Our approach
We applied Whoop to 16 drivers from the Linux 4.0 kernel:
- block, char, ethernet, nfc, usb and watchdog (250 to 7300 LoC)
- detected some potential races (confirming them requires domain expertise)
- using Whoop we significantly accelerated Corral (1.5x to 20x)
Results sneak-peek
[Figure: the Whoop toolchain in three phases. A. Translation Phase: Chauffeur (entry point information) and Clang / LLVM (LLVM-IR) process the Linux driver source code in C and the Linux environmental model; SMACK produces Boogie IVL code. B. Symbolic Lockset Analysis Phase: Whoop (instrumentation, sequentialization, invariant generation) with the Boogie verification engine and Z3 produces data race reports. C. Bug-Finding Phase: Corral takes Boogie IVL code instrumented with yields and produces error traces or no errors (under given bounds).]
New tools: Whoop and Chauffeur
The rest: industrial-strength tools that are robust and battle-proven via their
use in many complex software projects
Input:
- Linux driver source code in C
- Linux environmental model (used to “close” the driver)
Chauffeur:
- Clang frontend that traverses the driver AST and identifies all entry points
- outputs related information in an XML file (to be parsed and used by Whoop)
Clang/LLVM:
- compiles the C source code (and the model) into LLVM-IR
- preserves function calls (e.g. locks/unlocks) — we do not need to track them separately
- also preserves debugging information so we can map errors back to source code
SMACK:
- translates the LLVM-IR into the Boogie intermediate verification language
- leverages LLVM pointer-alias analyses to efficiently model the heap manipulation operations of C programs
SMACK uses a split-memory model that:
- to achieve scalability, soundly partitions memory locations into non-overlapping equivalence classes that do not alias
- is based on memory regions, which are maps of integers that model the heap — distinct memory regions denote disjoint sections of the heap
- we leverage this knowledge to guide and optimise Whoop
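As a rough illustration of partitioning into non-aliasing equivalence classes, the sketch below merges pointers that may alias with a union-find structure and treats each resulting class as one memory region. This is not SMACK's implementation: the function `build_regions`, the pointer names, and the `may_alias_pairs` input are all hypothetical, and a real alias analysis would supply the pairs.

```python
class UnionFind:
    """Minimal union-find over hashable items."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving keeps the trees shallow.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb


def build_regions(pointers, may_alias_pairs):
    """Group pointers into disjoint regions; pointers that may alias share one."""
    uf = UnionFind()
    for p in pointers:
        uf.find(p)
    for a, b in may_alias_pairs:
        uf.union(a, b)
    regions = {}
    for p in pointers:
        regions.setdefault(uf.find(p), set()).add(p)
    return sorted(sorted(r) for r in regions.values())


# Hypothetical pointers; only dev->rx and buf may alias.
regions = build_regions(
    ["dev->rx", "dev->tx", "buf", "stats"],
    [("dev->rx", "buf")],
)
```

Accesses that fall into different regions cannot touch the same location, so a race check only needs to compare accesses within one region.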
Whoop is based on symbolic pairwise lockset analysis, a novel technique for
data race analysis in device drivers
Lightweight race detection method:
- proposed in the context of Eraser (TOCS’97), a dynamic data race detector — key idea:
- track the set of locks that are consistently used to protect a memory location during program execution
- if that lockset ever becomes empty, the analysis reports a potential race on that memory location
- this is because an empty lockset suggests that a memory location may be accessed simultaneously by two or more threads
Lockset analysis
[Figure: lockset analysis example with two threads, T1 and T2, accessing a shared location A. One thread runs lock (M); lock (N); write (A); unlock (N); write (A); unlock (M); the other runs lock (M); write (A); unlock (M); write (A). Columns: Program, CLS_T1 and CLS_T2 (current lockset of each thread), and LS_A (lockset of A), initially { M, N }. The analysis computes a set intersection at each access point; LS_A ends up empty, giving "warning: access to A may not be protected".]
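The lockset idea can be replayed on the two small programs from the example. The sketch below is a simplified version of the basic algorithm only (it omits the state machine of the Eraser paper), and it assumes that the longer program belongs to T1 and that T1 runs to completion before T2; the function name `run_lockset` and the trace encoding are hypothetical.

```python
def run_lockset(trace, all_locks):
    """Replay (thread, op, arg) events and refine a lockset per location."""
    held = {}       # thread -> locks currently held (the thread's CLS)
    lockset = {}    # location -> candidate locks protecting it (LS)
    warnings = []
    for thread, op, arg in trace:
        cls = held.setdefault(thread, set())
        if op == "lock":
            cls.add(arg)
        elif op == "unlock":
            cls.discard(arg)
        elif op in ("read", "write"):
            # Intersect the location's lockset with the locks held right now.
            ls = lockset.setdefault(arg, set(all_locks))
            ls &= cls
            if not ls and arg not in warnings:
                warnings.append(arg)
    return lockset, warnings


trace = [
    ("T1", "lock", "M"), ("T1", "lock", "N"), ("T1", "write", "A"),
    ("T1", "unlock", "N"), ("T1", "write", "A"), ("T1", "unlock", "M"),
    ("T2", "lock", "M"), ("T2", "write", "A"), ("T2", "unlock", "M"),
    ("T2", "write", "A"),
]
final_locksets, warnings = run_lockset(trace, {"M", "N"})
```

The lockset of A shrinks from { M, N } to { M } at the second write of T1, and becomes empty at the last write of T2, which happens with no lock held.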
Advantages of lockset analysis:
- easy to implement, lightweight, has the potential to scale well (in contrast with happens-before based analysis)
Limitations of lockset analysis:
- imprecision (a violation of locking discipline is not always a race)
- code coverage in dynamic tools is limited by execution paths that are explored
- to counter the latter, we apply lockset analysis in a static context
For a given driver:
- we consider every pair of entry points that can potentially execute concurrently
- for each pair we use symbolic verification to check whether the two entry points can race on a shared memory location
- we soundly model the effects of any other entry point by over-approximating the driver shared state
Symbolic pairwise lockset analysis
For a given pair of entry points:
- we instrument each entry point with additional state to record locksets (for lockset analysis)
- we attempt to verify a sequential program that executes the instrumented entry points in sequence, and then …
- we assert, for each shared location, that the locksets for each entry point with respect to this location have a non-empty intersection
Symbolic verification
1. Initialise current locksets, read and write sets to empty for each entry point in the pair
2. For each shared variable s, initialise the lockset of s to the set of all possible locks
3. Call entry point T
4. Call entry point U
5. Assert that for each shared variable s, if s is written by T and accessed by U, or if s is written by U and accessed by T, then the lockset of s in T and the lockset of s in U must have at least one common lock (non-empty intersection)
Sequentialisation
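The five steps can be sketched on concrete data. Whoop performs this check symbolically on Boogie code with the shared state over-approximated; the sketch below only runs two fixed statement lists, so it illustrates the final assertion rather than the verification itself. The names `summarise` and `racy_locations`, and the example entry points, are hypothetical.

```python
def summarise(entry_point, all_locks):
    """Steps 1 to 3: run one entry point, recording locksets and read/write sets."""
    current = set()                      # current lockset
    locksets, reads, writes = {}, set(), set()
    for op, arg in entry_point:
        if op == "lock":
            current.add(arg)
        elif op == "unlock":
            current.discard(arg)
        else:                            # "read" or "write" of a shared variable
            (writes if op == "write" else reads).add(arg)
            # Each lockset starts as the set of all locks and is refined here.
            locksets[arg] = locksets.get(arg, set(all_locks)) & current
    return locksets, reads, writes


def racy_locations(t, u, all_locks):
    """Steps 3 to 5: run T, then U, then check the lockset intersections."""
    ls_t, r_t, w_t = summarise(t, all_locks)
    ls_u, r_u, w_u = summarise(u, all_locks)
    racy = []
    for s in sorted((r_t | w_t) & (r_u | w_u)):
        written_by_one = s in w_t or s in w_u
        if written_by_one and not (ls_t[s] & ls_u[s]):
            racy.append(s)
    return racy


# Hypothetical entry points: "flags" is read by T without holding any lock.
T = [("lock", "M"), ("write", "stats"), ("unlock", "M"), ("read", "flags")]
U = [("lock", "M"), ("write", "stats"), ("write", "flags"), ("unlock", "M")]
result = racy_locations(T, U, {"M", "N"})
```

Two reads of the same variable never count as a conflict, which matches the condition in step 5.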
Invariant generation:
- procedure summaries (for scalability)
- loop invariants
- we use Houdini (built in Boogie) — given a generated set of candidate invariants it finds the inductive invariants
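The Houdini idea (start from all candidate invariants and repeatedly drop those that are not inductive relative to the rest) can be sketched on a toy finite-state system. This is not the Boogie implementation, which discharges the checks with a theorem prover; here the checks are done by enumerating states, and the function `houdini`, the transition system, and the candidates are hypothetical.

```python
def houdini(candidates, init_states, states, successors):
    """Return the names of the candidates that are jointly inductive."""
    current = dict(candidates)
    changed = True
    while changed:
        changed = False
        for name, pred in list(current.items()):
            holds_initially = all(pred(s) for s in init_states)
            # Inductive step: assume every remaining candidate before the step.
            preserved = all(
                pred(t)
                for s in states
                if all(p(s) for p in current.values())
                for t in successors(s)
            )
            if not (holds_initially and preserved):
                del current[name]
                changed = True
    return set(current)


# Toy system: x starts at 0 and grows by 2 while it is below 10.
def successors(x):
    return [x + 2] if x < 10 else [x]


candidates = {
    "nonneg": lambda x: x >= 0,
    "even": lambda x: x % 2 == 0,
    "le5": lambda x: x <= 5,     # not inductive: 4 steps to 6
    "ne7": lambda x: x != 7,     # inductive only together with "even"
}
invariants = houdini(candidates, [0], range(-4, 14), successors)
```

The candidate `ne7` survives only because `even` is kept alongside it, which shows why the candidates are checked as a set rather than one at a time.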
Verification:
- each instrumented pair is sent to Boogie
- Boogie generates verification conditions (VCs) and feeds them to Z3
- successful verification implies race-freedom
- a counter-example denotes a potential race
- the Linux kernel can serialise calls to entry points, thus forcing them to run in sequence instead of an interleaved manner (e.g. RTNL)
- Whoop exploits this knowledge and does not create pairs for entry points that are mutually serialised by the kernel
- ongoing manual effort (requires domain expertise)
Kernel imposed serialisation
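Pair generation with kernel-imposed serialisation taken into account might look like the sketch below. The entry point names and the serialised group are made up for illustration, `analysis_pairs` is a hypothetical helper, and pairing an entry point with itself is an assumption of this sketch, not something the slides state.

```python
from itertools import combinations_with_replacement


def analysis_pairs(entry_points, serialised_groups):
    """List the entry point pairs that still need a race check."""

    def serialised(a, b):
        # Two entry points in the same group never run concurrently.
        return any(a in group and b in group for group in serialised_groups)

    return [
        (a, b)
        for a, b in combinations_with_replacement(entry_points, 2)
        if not serialised(a, b)
    ]


# Hypothetical driver: two entry points assumed to be serialised by one lock.
entry_points = ["ndo_open", "ndo_stop", "irq_handler", "get_stats"]
pairs = analysis_pairs(entry_points, [{"ndo_open", "ndo_stop"}])
```

Every pair that is filtered out is one fewer verification task, so accurate serialisation knowledge reduces both analysis time and false alarms.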
- Whoop is “soundy”: aims to perform a sound analysis, but suffers from some known sources of unsoundness
- we assume that the formal parameters of an entry point do not alias, and thus cannot race
- we rely on the soundness of our best-effort environmental model
- we inherit potential unsoundness from the tools we use (e.g. integers in SMACK)
Assumptions
- can be imprecise as it inherits the limitations of lockset analysis
- uses over-approximation, can lead to false alarms
- does not check for dynamically created locks or locks from external libraries
- we currently do not handle interrupt handlers in a special way; we just assume they execute concurrently at all times
- we over-approximate lock-free data structures
- we perform static analysis and, thus, need to close the environment
Limitations of Whoop
Accelerating Corral:
- Whoop is sound but imprecise
- we exploit any race-freedom guarantees from phase B to speed up precise bug-finding with Corral (in this work we only consider races as bugs)
Accelerating Corral:
- Corral is a bounded symbolic verifier for Boogie
- sequentialises the driver using a context-switch bound
- attempts to prove bounded (in terms of number of loop iterations and recursion depth) sequential reachability of a bug in a goal-directed, lazy fashion to postpone state-space explosion when analysing a large program
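To give a feel for what a context-switch bound does, the toy count below enumerates the schedules of two threads, each split into a number of atomic blocks, with at most a given number of context switches. It is not Corral's sequentialisation; `count_schedules` is a hypothetical helper, and moving to the second thread after the first one finishes is counted as a switch here.

```python
from functools import lru_cache


def count_schedules(blocks_a, blocks_b, bound):
    """Count interleavings of two threads using at most `bound` context switches."""

    @lru_cache(maxsize=None)
    def go(a, b, last, switches):
        if a == 0 and b == 0:
            return 1
        total = 0
        for thread, remaining in ((0, a), (1, b)):
            if remaining == 0:
                continue
            # Running a different thread than the last one costs one switch.
            cost = 0 if last in (None, thread) else 1
            if switches + cost > bound:
                continue
            total += go(a - (thread == 0), b - (thread == 1), thread, switches + cost)
        return total

    return go(blocks_a, blocks_b, None, 0)


# Two threads with two atomic blocks each, under growing bounds.
counts = [count_schedules(2, 2, k) for k in (1, 2, 3)]
```

A larger bound admits more schedules, and so does a larger number of blocks per thread, which is why the number of yield points matters.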
Default sequentialisation:
- By default, and assuming no race-freedom guarantees, Whoop instruments a yield after each shared memory access of each entry point, and after every lock and unlock operation
- Whoop then sends this instrumented program to Corral, which explores all possible thread interleavings up to a pre-defined bound
- The default sequentialisation can explode!
- our solution: if Whoop has shown that a given statement that accesses shared memory cannot be involved in a data race, then we do not instrument a yield after this statement
- this tames the sequentialisation and can greatly speed up Corral
Sound partial-order reduction
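The yield placement rule can be sketched as a pass over a statement list. The encoding is hypothetical: statements are (op, arg) tuples, "read" and "write" stand for shared memory accesses, and `race_free` holds the indices of accesses assumed to have been proven race-free by the lockset analysis. The real instrumentation works on Boogie code.

```python
def instrument_yields(statements, race_free=frozenset()):
    """Insert a yield after lock/unlock and after each possibly racy access."""
    out = []
    for index, (op, arg) in enumerate(statements):
        out.append((op, arg))
        is_sync = op in ("lock", "unlock")
        is_access = op in ("read", "write")
        # Accesses proven race-free get no yield after them.
        if is_sync or (is_access and index not in race_free):
            out.append(("yield", None))
    return out


def count_yields(statements):
    return sum(1 for op, _ in statements if op == "yield")


body = [
    ("lock", "M"),
    ("write", "A"),      # index 1: protected by M
    ("unlock", "M"),
    ("write", "B"),
    ("local", "x"),      # not a shared access, never followed by a yield
]
default_yields = count_yields(instrument_yields(body))
reduced_yields = count_yields(instrument_yields(body, race_free={1}))
```

Each yield that is dropped removes a point at which Corral would otherwise have to consider a context switch.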
We applied Whoop to 16 drivers from the Linux 4.0 kernel:
- block, char, ethernet, nfc, usb and watchdog (250 to 7300 LoC)
- detected some potential races (confirming them requires domain expertise)
- using Whoop we significantly accelerated Corral (1.5x to 20x)
Evaluation
The symbols +, o and x represent context-switch bounds of 2, 5 and 9, respectively
Thanks!
http://www.doc.ic.ac.uk/~pd1113/
https://github.com/pdeligia