Fast and Precise Symbolic Analysis of Concurrency Bugs in Device Drivers

Pantazis Deligiannis, Alastair Donaldson, Zvonimir Rakamarić

Intel, June 2015



Concurrency errors, such as data races, make device drivers hard to develop and debug without automated tool support

Our approach

Whoop, a new fully automated tool that:

- statically analyses drivers for data races

- exploits any found race-freedom guarantees to achieve a sound partial-order reduction and accelerate bug-finding using Corral

Corral is an industrial-strength bug-finder for device drivers from Microsoft that is used as the backend of the Static Driver Verifier

Results sneak-peek

We applied Whoop to 16 drivers from the Linux 4.0 kernel:

- block, char, ethernet, nfc, usb and watchdog drivers (250 to 7300 LoC)

- Whoop detected some potential races (confirming them requires domain expertise)

- using Whoop, we significantly accelerated Corral (1.5-20x)

[Figure: the Whoop toolchain, in three phases.

A. Translation Phase: Linux driver source code in C and the Linux environmental model are processed by Chauffeur (entry point information) and by Clang / LLVM (LLVM-IR), then by SMACK (Boogie IVL code).

B. Symbolic Lockset Analysis Phase: Whoop (instrumentation, sequentialization, invariant generation, Boogie verification engine) produces data race reports and Boogie IVL code instrumented with yields.

C. Bug-Finding Phase: Corral produces error traces, or no errors (under given bounds). Z3 is the underlying solver.]

New tools: Whoop and Chauffeur

The rest: industrial-strength tools that are robust and battle-proven via their use in many complex software projects


Input:

- Linux driver source code in C

- Linux environmental model (used to “close” the driver)


Chauffeur:

- Clang frontend that traverses the driver AST and identifies all entry points

- outputs related information in an XML file (to be parsed and used by Whoop)


Clang/LLVM:

- compiles the C source code (and the model) into LLVM-IR

- preserves function calls (e.g. locks/unlocks), so we do not need to track them separately

- also preserves debugging information so we can map errors back to source code


SMACK:

- translates the LLVM-IR into the Boogie intermediate verification language

- leverages LLVM pointer-alias analyses to efficiently model the heap manipulation operations of C programs


SMACK uses a split-memory model that:

- soundly partitions memory locations into non-overlapping equivalence classes that do not alias, which helps the analysis scale

- is based on memory regions, which are maps of integers that model the heap; distinct memory regions denote disjoint sections of the heap

We leverage this knowledge to guide and optimise Whoop.
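
A minimal sketch of the split-memory idea, not SMACK's actual encoding: each region is its own map from integer addresses to values, and two accesses are only treated as possibly aliasing when they fall in the same region. The class name `SplitMemory`, the helper `may_alias` and the region names `M0` and `M1` are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


class SplitMemory:
    """Hypothetical model: one integer-indexed map per memory region."""

    def __init__(self):
        # region id -> {integer address: value}
        self.regions = defaultdict(dict)

    def store(self, region, address, value):
        self.regions[region][address] = value

    def load(self, region, address):
        # Unwritten locations read as 0 in this toy model.
        return self.regions[region].get(address, 0)


def may_alias(access_a, access_b):
    """Two (region, address) accesses can only alias inside one region."""
    region_a, _ = access_a
    region_b, _ = access_b
    return region_a == region_b


mem = SplitMemory()
mem.store("M0", 8, 42)  # same integer address ...
mem.store("M1", 8, 7)   # ... but a different, disjoint region
```

Under this assumption, a race check only needs to compare accesses that share a region, which is one way the partition could be used to prune work.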


Whoop is based on symbolic pairwise lockset analysis, a novel technique for data race analysis in device drivers


Lockset analysis

A lightweight race detection method, proposed in the context of Eraser (TOCS’97), a dynamic data race detector. The key idea:

- track the set of locks that are consistently used to protect a memory location during program execution

- if that lockset ever becomes empty, the analysis reports a potential race on that memory location

- this is because an empty lockset suggests that a memory location may be accessed simultaneously by two or more threads

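Lockset analysis example: compute set intersection at access points

CLS_T1 and CLS_T2 are the current locksets of threads T1 and T2; LS_A is the lockset of location A.

Initial: CLS_T1 = { }, CLS_T2 = { }, LS_A = { M, N }

T1:

lock (M); CLS_T1 = { M }, LS_A = { M, N }

lock (N); CLS_T1 = { M, N }, LS_A = { M, N }

write (A); CLS_T1 = { M, N }, LS_A = { M, N }

unlock (N); CLS_T1 = { M }, LS_A = { M, N }

write (A); CLS_T1 = { M }, LS_A = { M }

unlock (M); CLS_T1 = { }, LS_A = { M }

T2:

lock (M); CLS_T2 = { M }, LS_A = { M }

write (A); CLS_T2 = { M }, LS_A = { M }

unlock (M); CLS_T2 = { }, LS_A = { M }

write (A); CLS_T2 = { }, LS_A = { }

warning: access to A may not be protected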
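
The two-thread example above can be replayed with a short sketch of Eraser-style lockset refinement. This is an illustration on concrete traces, not Whoop's implementation; the function `run_lockset` and the tuple encoding of operations are assumptions made for the example.

```python
def run_lockset(threads, all_locks, location="A"):
    """Replay thread traces in order and refine the lockset of `location`."""
    lockset = set(all_locks)  # LS_A: candidate locks protecting the location
    history = []
    for name, ops in threads:
        held = set()  # CLS: locks currently held by this thread
        for op, arg in ops:
            if op == "lock":
                held.add(arg)
            elif op == "unlock":
                held.discard(arg)
            elif op == "write" and arg == location:
                lockset &= held  # intersect at every access point
            history.append((name, op, arg, frozenset(held), frozenset(lockset)))
    return lockset, history


T1 = [("lock", "M"), ("lock", "N"), ("write", "A"),
      ("unlock", "N"), ("write", "A"), ("unlock", "M")]
T2 = [("lock", "M"), ("write", "A"), ("unlock", "M"), ("write", "A")]

final_lockset, history = run_lockset([("T1", T1), ("T2", T2)], {"M", "N"})
if not final_lockset:
    print("warning: access to A may not be protected")
```

The last write in T2 happens with no lock held, so the intersection empties and the warning is printed.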


Advantages of lockset analysis:

- easy to implement, lightweight, has the potential to scale well (in contrast with happens-before based analysis)

Limitations of lockset analysis:

- imprecision (a violation of the locking discipline is not always a race)

- in dynamic tools, code coverage is limited to the execution paths that are explored

- to counter the limited coverage, we apply lockset analysis in a static context

Symbolic pairwise lockset analysis

For a given driver:

- we consider every pair of entry points that can potentially execute concurrently

- for each pair, we use symbolic verification to check whether the pair can race on a shared memory location

- we soundly model the effects of any other entry point by over-approximating the driver shared state

Symbolic verification

For a given pair of entry points:

- we instrument each entry point with additional state to record locksets (for lockset analysis)

- we attempt to verify a sequential program that executes the instrumented entry points in sequence

- after both have run, we assert, for each shared location, that the locksets of the two entry points with respect to this location have a non-empty intersection

Sequentialisation

1. Initialise the current lockset, read set and write set to empty for each entry point in the pair

2. For each shared variable s, initialise the lockset of s to the set of all possible locks

3. Call entry point T

4. Call entry point U

5. Assert that, for each shared variable s, if s is written by T and accessed by U, or written by U and accessed by T, then the lockset of s in T and the lockset of s in U have at least one lock in common (non-empty intersection)
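
The five steps above can be sketched on concrete traces. Whoop performs this check symbolically on Boogie code, so the following is only an illustration of the final assertion; the function names and the `(operation, argument)` encoding are hypothetical.

```python
def run_entry_point(ops, all_locks):
    """Run one instrumented entry point and collect its lockset state."""
    held = set()                  # step 1: current lockset starts empty
    reads, writes = set(), set()  # step 1: read and write sets start empty
    locksets = {}                 # step 2: defaults to all locks on first access
    for op, arg in ops:
        if op == "lock":
            held.add(arg)
        elif op == "unlock":
            held.discard(arg)
        else:  # "read" or "write" of shared variable `arg`
            locksets[arg] = locksets.get(arg, set(all_locks)) & held
            (writes if op == "write" else reads).add(arg)
    return locksets, reads, writes


def racy_variables(ops_t, ops_u, all_locks):
    """Steps 3 to 5: run T, run U, then check the lockset intersections."""
    ls_t, rd_t, wr_t = run_entry_point(ops_t, all_locks)
    ls_u, rd_u, wr_u = run_entry_point(ops_u, all_locks)
    # A conflict needs at least one write and an access by the other side.
    conflicting = (wr_t & (rd_u | wr_u)) | (wr_u & (rd_t | wr_t))
    return {s for s in conflicting if not (ls_t[s] & ls_u[s])}


T = [("lock", "M"), ("write", "x"), ("unlock", "M"), ("read", "y")]
U = [("lock", "M"), ("read", "x"), ("unlock", "M"), ("write", "y")]
print(racy_variables(T, U, {"M"}))
```

Here `x` is protected by `M` in both entry points, while `y` is accessed with no lock held, so only `y` is reported.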


Invariant generation:

- procedure summaries (for scalability)

- loop invariants

- we use Houdini (built into Boogie): given a generated set of candidate invariants, it finds the largest subset that is inductive
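
The Houdini idea can be illustrated with a toy fixpoint loop over a small finite-state system. The real algorithm discharges each check with a theorem prover; this sketch substitutes brute-force enumeration, and the function `houdini`, the candidate names and the example transition system are all hypothetical.

```python
def houdini(candidates, initial_states, states, step):
    """Return the largest subset of `candidates` that is inductive.

    candidates: dict mapping a name to a predicate over a state
    step:       function mapping a state to an iterable of successors
    """
    # Keep only candidates that hold in every initial state.
    kept = {n for n, p in candidates.items()
            if all(p(s) for s in initial_states)}
    changed = True
    while changed:
        changed = False
        # States permitted by the conjunction of the candidates still kept.
        allowed = [s for s in states if all(candidates[n](s) for n in kept)]
        for name in sorted(kept):
            if any(not candidates[name](t) for s in allowed for t in step(s)):
                kept.remove(name)  # not preserved: drop it and re-check the rest
                changed = True
                break
    return kept


# Toy system: a counter that starts at 0 and advances by 2 modulo 10.
candidates = {
    "even": lambda x: x % 2 == 0,
    "nonneg": lambda x: x >= 0,
    "small": lambda x: x < 5,
}
result = houdini(candidates, [0], range(10), lambda x: [(x + 2) % 10])
print(sorted(result))
```

The candidate `small` holds initially but is not preserved (4 steps to 6), so it is dropped, and the remaining two candidates form an inductive set.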


Verification:

- each instrumented pair is sent to Boogie

- Boogie generates verification conditions (VCs) and feeds them to Z3

- successful verification implies race-freedom for the pair

- a counter-example denotes a potential race

Kernel imposed serialisation

- the Linux kernel can serialise calls to entry points, forcing them to run in sequence rather than interleaved (e.g. via the RTNL lock)

- Whoop exploits this knowledge and does not create pairs for entry points that are mutually serialised by the kernel

- identifying such serialisation is an ongoing manual effort (requires domain expertise)
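
A sketch of how pair creation might skip kernel-serialised entry points. The entry point names and the serialised group are made up for illustration, and pairing an entry point with itself is an assumption of this sketch, not something the slides state.

```python
from itertools import combinations_with_replacement


def entry_point_pairs(entry_points, serialised_groups):
    """Yield pairs of entry points that may execute concurrently.

    serialised_groups: sets of entry points the kernel runs one at a time
    (assumed to be written down by hand from domain knowledge).
    """
    def serialised(a, b):
        return any(a in group and b in group for group in serialised_groups)

    # Assumption: an entry point may also run concurrently with itself.
    for a, b in combinations_with_replacement(sorted(entry_points), 2):
        if not serialised(a, b):
            yield (a, b)


# Hypothetical driver with three entry points, two of them serialised.
pairs = list(entry_point_pairs({"open", "stop", "get_stats"},
                               [{"open", "stop"}]))
print(pairs)
```

Every pair drawn from the serialised group is dropped, including `open` with itself, so only the pairs involving `get_stats` remain to be analysed.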

Assumptions

- Whoop is “soundy”: aims to perform a sound analysis, but suffers from some known sources of unsoundness

- we assume that the formal parameters of an entry point do not alias, and thus cannot race

- we rely on the soundness of our best-effort environmental model

- we inherit potential unsoundness from the tools we use (e.g. integers in SMACK)

Limitations of Whoop

- can be imprecise, as it inherits the limitations of lockset analysis

- uses over-approximation, which can lead to false alarms

- does not check for dynamically created locks or locks from external libraries

- we currently do not handle interrupt handlers in a special way; we just assume they execute concurrently at all times

- we over-approximate lock-free data structures

- we perform static analysis and, thus, need to close the environment


Accelerating Corral:

- Whoop is sound but imprecise

- we exploit any race-freedom guarantees from phase B to speed up precise bug-finding with Corral (in this work we only consider races as bugs)


Accelerating Corral:

- Corral is a bounded symbolic verifier for Boogie

- sequentialises the driver using a context-switch bound

- attempts to prove bounded (in terms of number of loop iterations and recursion depth) sequential reachability of a bug in a goal-directed, lazy fashion to postpone state-space explosion when analysing a large program


Default sequentialisation:

- By default, and assuming no race-freedom guarantees, Whoop instruments a yield after each shared memory access of each entry point, and after every lock and unlock operation

- Whoop then sends this instrumented program to Corral, which explores all possible thread interleavings up to a pre-defined bound

Sound partial-order reduction

- the default sequentialisation can explode

- our solution: if Whoop establishes that a given statement that accesses shared memory cannot be involved in a data race, we do not instrument a yield after this statement

- this tames the sequentialisation and can greatly speed up Corral
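
The reduction can be pictured as a filter on where yields are placed. In this sketch, statements are plain labels and the set of race-free accesses is assumed to come from the lockset analysis phase; the function `instrument_yields` and the statement encoding are hypothetical and do not reflect Whoop's Boogie-level instrumentation.

```python
def instrument_yields(statements, race_free=frozenset()):
    """Insert a yield after lock operations and racy shared accesses.

    statements: list of (label, kind), with kind one of
                "access", "lock", "unlock" or "local"
    race_free:  labels of shared accesses known to be race-free
    """
    out = []
    for label, kind in statements:
        out.append(label)
        if kind in ("lock", "unlock"):
            out.append("yield")
        elif kind == "access" and label not in race_free:
            out.append("yield")  # a context switch may matter here
    return out


entry_point = [
    ("lock(M)", "lock"),
    ("write(A)", "access"),
    ("unlock(M)", "unlock"),
    ("tmp = 1", "local"),
    ("write(B)", "access"),
]
default = instrument_yields(entry_point)
reduced = instrument_yields(entry_point, race_free={"write(A)"})
print(default.count("yield"), reduced.count("yield"))
```

Each yield removed is one fewer context-switch point for the bounded exploration to consider, which is where the speed-up would come from.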

Evaluation

We applied Whoop to 16 drivers from the Linux 4.0 kernel:

- block, char, ethernet, nfc, usb and watchdog drivers (250 to 7300 LoC)

- Whoop detected some potential races (confirming them requires domain expertise)

- using Whoop, we significantly accelerated Corral (1.5-20x)


The symbols +, o and x represent context-switch bounds of 2, 5 and 9, respectively


Thanks!

http://www.doc.ic.ac.uk/~pd1113/


https://github.com/pdeligia