Pacer: Proportional Detection of Data Races

Michael Bond, Katherine Coons, Kathryn McKinley
University of Texas at Austin

Detecting data races in production

Overhead

FastTrack [Flanagan & Freund ’09]: 80x → 8x slowdown

FastTrack: c_reads&writes + c_sync × n (n = number of threads)
▪ c_reads&writes: problem today
▪ c_sync × n: problem in future

Pacer: (c_reads&writes + c_sync × n) × r + c_non-sampling × (1 – r) (r = sampling rate)
▪ (c_reads&writes + c_sync × n) × r: sampling periods
▪ c_non-sampling × (1 – r): non-sampling periods
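The overhead formulas can be explored numerically. The following is a minimal sketch, not the authors' code: the class and method names are made up for illustration, and the cost constants in `main` are invented placeholders, since the slides give only the symbolic form.

```java
// Sketch of the overhead model from the slides.
// All names and all numeric constants here are hypothetical.
final class OverheadModel {

    /** FastTrack overhead: c_reads&writes + c_sync * n. */
    static double fastTrack(double cReadsWrites, double cSync, int n) {
        return cReadsWrites + cSync * n;
    }

    /** Pacer overhead: (c_reads&writes + c_sync * n) * r + c_non-sampling * (1 - r). */
    static double pacer(double cReadsWrites, double cSync, double cNonSampling,
                        int n, double r) {
        return fastTrack(cReadsWrites, cSync, n) * r + cNonSampling * (1 - r);
    }

    public static void main(String[] args) {
        // Invented constants, chosen only to show the shape of the model.
        double cReadsWrites = 6.0, cSync = 0.25, cNonSampling = 0.3;
        int n = 8;
        for (double r : new double[] {0.0, 0.01, 0.03, 0.25, 1.0}) {
            System.out.printf("r=%.2f  Pacer=%.3f  FastTrack=%.3f%n",
                    r,
                    pacer(cReadsWrites, cSync, cNonSampling, n, r),
                    fastTrack(cReadsWrites, cSync, n));
        }
    }
}
```

At r = 1 the Pacer term reduces to the FastTrack cost, and at r = 0 only the non-sampling cost remains, which is the behavior the formula is meant to capture.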

Probability (detecting any race)
▪ FastTrack: 1
▪ Pacer: r

A race is detected if its first access is sampled.

[Figure: timeline of Thread A and Thread B alternating between sampling periods and non-sampling periods; accesses shown: write x, read x, read y, write y]

Insight #1: Stop tracking a variable after a non-sampled access

Thread A: write x; unlock m; read x
Thread B: lock m; write x

Race! (Thread A’s read x and Thread B’s write x)

Vector clocks (entries ordered A, B): Thread A = 5 2, Thread B = 3 4

▪ write x (Thread A): record 5@A for x
▪ unlock m (Thread A): m’s clock = 5 2; increment clock, Thread A = 6 2
▪ lock m (Thread B): join clocks, Thread B = 5 4
▪ write x (Thread B): happens before? 5@A vs. 5 4: yes

Pacer on this example:
▪ Thread B’s write x not sampled: x is no longer tracked. At Thread A’s read x, no work performed. Race uncaught.
▪ Thread B’s write x sampled: record 4@B for x. At Thread A’s read x: happens before? 4@B vs. 6 2: no. Race!
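The vector-clock steps in this example can be replayed in a few lines of code. This is a minimal sketch assuming two threads and the clock values shown on the slides; the class, the `Epoch` type, and the method names are illustrative and are not taken from the Pacer or FastTrack implementations.

```java
import java.util.Arrays;

// Replays the slide example with plain int[] vector clocks.
// Index 0 is Thread A's entry, index 1 is Thread B's entry.
final class VectorClockDemo {
    static final int A = 0, B = 1;

    /** An epoch "c@t": the clock value c of thread t at the time of an access. */
    static final class Epoch {
        final int clock, thread;
        Epoch(int clock, int thread) { this.clock = clock; this.thread = thread; }
    }

    /** True if the access recorded as e happens before a thread whose clock is vc. */
    static boolean happensBefore(Epoch e, int[] vc) {
        return e.clock <= vc[e.thread];
    }

    /** Element-wise maximum of two clocks: O(n) in the number of threads. */
    static int[] join(int[] a, int[] b) {
        int[] out = new int[a.length];
        for (int i = 0; i < a.length; i++) out[i] = Math.max(a[i], b[i]);
        return out;
    }

    /** Returns true if the only race reported is at Thread A's read of x. */
    static boolean replay() {
        int[] clockA = {5, 2};
        int[] clockB = {3, 4};

        // Thread A: write x -> remember epoch 5@A for x.
        Epoch lastWriteX = new Epoch(clockA[A], A);

        // Thread A: unlock m -> m's clock copies A's clock (5 2), then A increments (6 2).
        int[] clockM = clockA.clone();
        clockA[A]++;

        // Thread B: lock m -> join B's clock with m's clock (5 4).
        clockB = join(clockB, clockM);

        // Thread B: write x -> 5@A vs. 5 4: ordered, so no race; record 4@B.
        boolean raceAtWrite = !happensBefore(lastWriteX, clockB);
        lastWriteX = new Epoch(clockB[B], B);

        // Thread A: read x -> 4@B vs. 6 2: not ordered, so this is a race.
        boolean raceAtRead = !happensBefore(lastWriteX, clockA);

        return !raceAtWrite && raceAtRead;
    }

    public static void main(String[] args) {
        System.out.println("race at A's read only: " + replay());
        System.out.println("join: " + Arrays.toString(join(new int[] {3, 4}, new int[] {5, 2})));
    }
}
```

The sketch models full tracking only. It does not model sampling, so it corresponds to the case in which every access is sampled.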

Insight #2: We only care whether “A happens before B” if A is sampled

[Figure: timeline of Thread A and Thread B. Sampling periods: increment clocks. Non-sampling periods: don’t increment clocks.]

Do events in non-sampling periods happen before other events? We don’t care!

Thread A: unlock m1; unlock m2
Thread B: lock m1; lock m2

Vector clocks (entries ordered A, B): Thread A = 5 2, Thread B = 3 4

▪ unlock m1 (Thread A): m1’s clock = 5 2; no clock increment
▪ unlock m2 (Thread A): m2’s clock = 5 2
▪ lock m1 (Thread B): join clocks, Thread B = 5 4
▪ lock m2 (Thread B): unnecessary join, Thread B stays 5 4

Cost of the unnecessary join: O(n) → O(1)

Implementation

http://jikesrvm.org/Research+Archive

Performance

[Figure: slowdown (y-axis, 0 to 20) vs. sampling rate (x-axis, 0% to 100%) for eclipse, hsqldb, xalan, and pseudojbb]

Qualitative improvement in time & space

Accuracy

Probability (detecting any race) = r?

Per-Race Accuracy (eclipse, r = 1%)

[Figure: detection rate (y-axis, ticks at 0%, 1%, 8%) for each distinct race (x-axis, ordered by detection rate)]

Related Work

LiteRace [Marino et al. ’09]

Cold-region hypothesis [Chilimbi & Hauswirth ’04]

Full analysis at synchronization operations

Deployable Race Detection

▪ Accuracy, time, and space are proportional to the sampling rate
▪ A race is detected if its first access is sampled

Qualitative improvement
▪ Help developers fix difficult-to-reproduce bugs

Thank you

Backup

Example: “Timeless” Non-Sampling Periods

Thread A: unlock m1; unlock m2
Thread B: lock m1; lock m2

Vector clocks (entries ordered A, B): Thread A = 5 2, Thread B = 3 4

Vector clock versions:
▪ Thread A’s clock 5 2 is at version v6
▪ unlock m1 (Thread A): m1’s clock = 5 2 v6
▪ unlock m2 (Thread A): m2’s clock = 5 2 v6
▪ lock m1 (Thread B): join, Thread B = 5 4, having seen v6
▪ lock m2 (Thread B): join unnecessary, v6 was already joined
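The idea of skipping a redundant join by comparing versions can be sketched as follows. This is a simplified illustration under stated assumptions, not Pacer's actual data structure: it assumes each lock clock is a plain copy of one thread's clock tagged with that thread's version number, and every name in it (`VersionedClock`, `ThreadState`, `acquire`) is hypothetical.

```java
// Simplified sketch: a receiving thread remembers, per source thread, the newest
// clock version it has already joined, and skips the O(n) join when it sees that
// version (or an older one) again.
final class VersionedClockDemo {
    static final int A = 0, B = 1;

    /** A clock snapshot tagged with its source thread and that thread's version. */
    static final class VersionedClock {
        final int[] values;
        final int owner;
        final int version;
        VersionedClock(int[] values, int owner, int version) {
            this.values = values.clone();
            this.owner = owner;
            this.version = version;
        }
    }

    static final class ThreadState {
        final int[] clock;
        final int[] lastJoinedVersion; // per source thread; -1 means never joined
        int fullJoins = 0, skippedJoins = 0;

        ThreadState(int[] clock) {
            this.clock = clock.clone();
            this.lastJoinedVersion = new int[clock.length];
            java.util.Arrays.fill(lastJoinedVersion, -1);
        }

        /** Lock acquire: O(1) when the incoming version was already joined, else O(n). */
        void acquire(VersionedClock lockClock) {
            if (lockClock.version <= lastJoinedVersion[lockClock.owner]) {
                skippedJoins++; // clocks only grow, so an already-seen version adds nothing
                return;
            }
            for (int i = 0; i < clock.length; i++) {
                clock[i] = Math.max(clock[i], lockClock.values[i]);
            }
            lastJoinedVersion[lockClock.owner] = lockClock.version;
            fullJoins++;
        }
    }

    /** Replays the slide example; returns {fullJoins, skippedJoins, B's A entry, B's B entry}. */
    static int[] replay() {
        // Thread A does not increment its clock between the two unlocks,
        // so both locks receive the same snapshot: 5 2 at version 6.
        VersionedClock m1 = new VersionedClock(new int[] {5, 2}, A, 6);
        VersionedClock m2 = new VersionedClock(new int[] {5, 2}, A, 6);

        ThreadState b = new ThreadState(new int[] {3, 4});
        b.acquire(m1); // full join: 3 4 becomes 5 4
        b.acquire(m2); // same version: join skipped
        return new int[] {b.fullJoins, b.skippedJoins, b.clock[A], b.clock[B]};
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(replay()));
    }
}
```

The sketch leaves out how a lock clock that merges several threads' clocks would be versioned, and how a thread's own version advances when its clock changes. Both would need handling in a real detector.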

Per-Race Accuracy (Eclipse)

[Figure: detection rate (y-axis, ticks at 0%, 1%, 10%, 100%) for distinct races 1 to 27 (each line sorted by detection rate), one line per sampling rate: r = 25%, 10%, 5%, 3%, 1%]

Space Performance (eclipse)

[Figure: live memory (MB, 0 to 800) vs. fraction of program execution (0 to 1) for r = 100%, 25%, 5%, 1%, and Base]

Performance (0-10% sampling rate)

[Figure: slowdown (y-axis, 0 to 4) vs. sampling rate (x-axis, 0% to 10%) for eclipse, hsqldb, xalan, and pseudojbb. Annotations: 33% base overhead; 52% overhead.]

Qualitative improvement

Methodology

▪ Core 2 Quad (4 cores)
▪ Multithreaded benchmarks (DaCapo & SPECjbb2000)

Evaluating sampling-based race detection
▪ Need 100s of trials to evaluate
▪ Some races are rare
▪ Evaluate only frequent races

Data Races

Two accesses to same variable (one is a write)

One access doesn’t happen before the other
▪ Program order
▪ Synchronization order
  ▪ Acquire-release
  ▪ Wait-notify
  ▪ Fork-join
  ▪ Volatile read-write

Thread A: write x; unlock m; read x
Thread B: lock m; write x

Race! (Thread A’s read x and Thread B’s write x)

Why Do We Care?

Races indicate
▪ Atomicity violations
▪ Order violations

Races lead to
▪ Sequential consistency violations

No races ⇒ sequential consistency (Java/C++)
Races ⇒ writes observed out of order

Most races potentially harmful [Flanagan & Freund ’10]

Producer-Consumer Example

    class ProducerConsumer {
      boolean ready;
      int x;

      produce() {            // runs on T1
        x = … ;
        ready = true;
      }

      consume() {            // runs on T2
        while (!ready) { }
        … = x;
      }
    }

Does It Race?

T1 runs produce() while T2 runs consume().

So What?

Can read old value. Legal reordering by compiler or hardware:

    consume() {
      … = x;
      while (!ready) { }
    }

How to Fix?

Properly Synchronized

    class ProducerConsumer {
      volatile boolean ready;
      int x;
      …
    }

Happens-before edge
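A runnable version of the properly synchronized producer-consumer might look like the sketch below. The value passed to `produce` and the `ProducerConsumerDemo` harness are additions for illustration only; the slides elide what is written to and read from `x`.

```java
// Runnable sketch of the volatile fix. The harness class and the integer payload
// are illustrative additions; only the volatile flag pattern comes from the slides.
final class ProducerConsumerDemo {

    static final class ProducerConsumer {
        volatile boolean ready; // volatile write/read pair creates the happens-before edge
        int x;

        void produce(int value) {
            x = value;      // ordered before the volatile write below
            ready = true;
        }

        int consume() {
            while (!ready) { } // spin until the volatile read observes true
            return x;          // guaranteed to see the value written in produce()
        }
    }

    /** Runs consume() on a second thread and returns the value it observed. */
    static int run(int value) {
        ProducerConsumer pc = new ProducerConsumer();
        int[] seen = new int[1];
        Thread consumer = new Thread(() -> seen[0] = pc.consume());
        consumer.start();
        pc.produce(value);
        try {
            consumer.join(); // join orders the consumer's write to seen[0] before this read
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(run(42));
    }
}
```

Without `volatile` on `ready`, the spin loop could legally run forever or `consume()` could return a stale `x`, which is the failure the slides describe.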

Example #2

    class LibraryBook {
      Set<Person> borrowers;
    }

Initialization on Demand

    addBorrower(Person p) {
      if (borrowers == null) {
        borrowers = new HashSet<Person>();
      }
      borrowers.add(p);
    }

Synchronized but Slow?

    addBorrower(Person p) {
      synchronized (this) {
        if (borrowers == null) {
          borrowers = new HashSet<Person>();
        }
      }
      borrowers.add(p);
    }

Double-Checked Locking

    addBorrower(Person p) {
      if (borrowers == null) {
        synchronized (this) {
          if (borrowers == null) {
            borrowers = new HashSet<Person>();
          }
        }
      }
      borrowers.add(p);
    }

Does It Race?

One thread runs the initialization inside the synchronized block while a second thread evaluates the unsynchronized check and then calls borrowers.add(p):

    addBorrower(Person p) {
      if (borrowers == null) {
        ...
      }
      borrowers.add(p);
    }

So What?

The initialization expands to three steps:

    HashSet obj = alloc HashSet;
    obj.<init>();
    borrowers = obj;

These may be reordered to:

    HashSet obj = alloc HashSet;
    borrowers = obj;
    obj.<init>();

The second thread can then see a non-null borrowers and call borrowers.add(p) before obj.<init>() has run.
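The slides stop at the problem. One common repair in Java 5 and later is to declare the field `volatile`, sketched below. This fix is not taken from the slides. The sketch also substitutes `String` for the slides' `Person` type, wraps the set with `Collections.synchronizedSet` so that concurrent `add` calls are themselves safe, and adds a `borrowerCount` helper and demo harness purely for illustration.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Demo harness (hypothetical): two threads add borrowers concurrently.
final class LibraryBookDemo {
    static int run() {
        LibraryBook book = new LibraryBook();
        Thread t1 = new Thread(() -> book.addBorrower("alice"));
        Thread t2 = new Thread(() -> book.addBorrower("bob"));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return book.borrowerCount();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}

// Double-checked locking with a volatile field. String stands in for Person.
final class LibraryBook {
    // volatile: the publishing write happens before any read that sees it non-null,
    // so no thread can observe a partially constructed set.
    private volatile Set<String> borrowers;

    void addBorrower(String p) {
        Set<String> local = borrowers;       // first check, no lock
        if (local == null) {
            synchronized (this) {
                local = borrowers;           // second check, under the lock
                if (local == null) {
                    local = Collections.synchronizedSet(new HashSet<>());
                    borrowers = local;       // publish only after construction
                }
            }
        }
        local.add(p);                        // the wrapper serializes concurrent adds
    }

    int borrowerCount() {
        Set<String> local = borrowers;
        return local == null ? 0 : local.size();
    }
}
```

Reading the field once into a local variable avoids a second volatile read on the fast path. The synchronized wrapper addresses a separate issue: a plain `HashSet` is not safe for concurrent `add` calls even after it is correctly published.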

Performance vs. Accuracy

[Figure: slowdown (y-axis, 0 to 20) vs. detection rate (x-axis, 0% to 100%, plotted against sampling rate) for eclipse, hsqldb, xalan, and pseudojbb]

[Figure: the same plot for detection rates up to 15% (slowdown 0 to 5). Annotations: 33% base overhead; ~50% overhead.]

Accuracy & Performance

                  Program alone   FastTrack         Pacer
Detection rate    0               occurrence rate   occurrence rate × r
Running time      t               t(c1 + c2n)       t[(c1 + c2n)r + c3]

▪ Evaluate only frequent races
▪ Evaluate scaling with r
▪ Don’t evaluate scaling with n

Northeast Blackout of 2003

50 million people

Energy Management System
▪ Alarm and Event Processing Routine (1 MLOC)

Post-mortem analysis: 8 weeks
“This fault was so deeply embedded, it took them weeks of poring through millions of lines of code and data to find it.” –Ralph DiNicola, FirstEnergy

Race condition
▪ Two threads writing to data structure simultaneously
▪ Usually occurs without error
▪ Small window for causing data corruption

http://www.securityfocus.com/news/8412

Vector Clock-Based Race Detection

Tracks happens-before: sound & precise
▪ 80X slowdown
▪ Each analysis step: O(n) time (n = # of threads)

FastTrack [Flanagan & Freund ’09]
▪ Reads & writes (97%): O(1) time (problem today)
▪ Synchronization (3%): O(n) time (problem in future)
▪ 8X slowdown

Vector Clock-Based Race Detection

Vector clocks (entries ordered A, B):
▪ Thread A = 5 2: Thread A’s logical time is 5; last logical time “received” from B is 2
▪ Thread B = 3 4: last logical time “received” from A is 3; Thread B’s logical time is 4

Synchronization:
▪ unlock m (Thread A): m’s clock = 5 2; increment clock, Thread A = 6 2
▪ lock m (Thread B): join clocks, Thread B = 5 4
▪ Each join takes O(n) time (n = # of threads)

Thread A: write x; unlock m; read x
Thread B: lock m; write x

▪ write x (Thread A): record 5@A for x
▪ unlock m (Thread A): m’s clock = 5 2; Thread A = 6 2
▪ lock m (Thread B): Thread B = 5 4
▪ write x (Thread B): happens before? 5@A vs. 5 4: yes (5 ≤ 5). Record 4@B for x
▪ read x (Thread A): happens before? 4@B vs. 6 2: no (4 > 2). Race!

Prior Work Isn’t Deployable

(Theoretical) Accuracy & Performance

                  FastTrack [Flanagan & Freund ’09]   Pacer
Detection rate    occurrence rate                     occurrence rate × r
Running time      t(c1 + c2n)                         t[(c1 + c2n)r + c3]

▪ r: sampling rate
▪ n: no. of threads
▪ c1: reads & writes (problem today)
▪ c2n: synchronization (problem in future)
▪ (c1 + c2n)r: overhead in sampling periods
▪ c3: overhead in non-sampling periods (small)

Pacer

Detecting Data Races in Production

Data race occurs extremely rarely

Data race occurs periodically

Pre-deployment Deployed

Detecting Data Races in Production

“We test exhaustively … we had in excess of three million online operational hours [342 years] in which nothing had ever exercised that bug.” –Mike Unum, manager of commercial solutions, GE Energy

http://www.securityfocus.com/news/8412

Detecting Data Races in Production

Data race buggy execution
