Enabling and Supporting the Debugging of Software Failures (PhD Defense)

ENABLING AND SUPPORTING THE DEBUGGING OF SOFTWARE FAILURES

Thesis Defense

James Clause

DEFINITIONS

‣ mistake: a human action that produces an incorrect result

‣ fault: an incorrect step, process, or data definition in a computer program

‣ failure: the inability of a system or component to perform its required functions within specified requirements

Debugging

DEBUGGING IS EXPENSIVE

• “...departments tend to spend about half of their applications staff time on maintenance” – Lientz and Swanson, 1981



• “Boehm, Brooks, Myers, and Yourdon and Constantine indicate that testing and debugging alone represent approximately half the cost of new system development.” – Vessey, 1985




• “According to an informal industry poll, 85 to 90 percent of the IS [Information Services] budget goes to legacy system operation and maintenance.” – Erlikh, 2000





• “...the national annual costs of an inadequate infrastructure for software testing is estimated to range from $22.2 to $59.5 billion” – NIST, 2002





• “24,191 people … were involved in either opening, handling, commenting on, or resolving Windows Vista bugs. That is an order of magnitude greater than the ∼2,000 developers who wrote code for Vista” – Guo, 2010


THESIS STATEMENT

Program analysis techniques can enable and support the debugging of failures in widely-used applications by:

1) capturing, replaying, minimizing, and, as much as possible, anonymizing failing executions

2) highlighting subsets of failure-inducing inputs that are likely to be helpful for debugging such failures

TECHNICAL CONTRIBUTIONS

Enable
• Recording and replaying executions
• Input minimization
• Input anonymization

Support
• Highlighting failure-relevant inputs

MOTIVATION

Failures can be difficult to reproduce.

ENVIRONMENT INTERACTIONS

[Figure: an application's interactions with its environment; labels: Streams, Files]

LIMITATIONS

Not applicable in every situation

• May not be enough space to store accessed data
  • databases
  • long running executions

• May have unacceptable runtime overhead
  • webservers, real-time applications

Evaluation demonstrates that it can be useful for some common application types.

EVALUATION

Acceptable runtime overhead; failures reproduced successfully.

Prototype implementation:
• maps libc function calls to interaction events

Subjects:
• several CPU-intensive applications (e.g., bzip, gcc)

Results:
• negligible overheads
• data size is acceptable
• all failures successfully replayed




PRACTICALITY ISSUES

• Large in size
• Contain sensitive information

Minimize, Anonymize, Highlight




MINIMIZATION

[Figure: a failing execution passes through time minimization and data minimization; labels: 24:15, 2:55, Oracle (twice)]

TIME MINIMIZATION

Event log:
FILE foo.1
POLL KEYBOARD NOK
POLL KEYBOARD OK
PULL KEYBOARD 5
POLL NETWORK OK
PULL NETWORK 1024
FILE bar.1
POLL NETWORK NOK
POLL NETWORK OK
FILE foo.2
...
PULL NETWORK 1024
FILE foo.2
POLL KEYBOARD NOK
...

Environment data (streams):
KEYBOARD: {5680}hello ❙ {4056}c ❙ {300}...
NETWORK: {3405}<html><body>... ❙ {202}...





Remove idle time

Remove delays
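The two steps can be sketched as follows. This is only an illustration, not the thesis implementation. It assumes that idle time shows up in the event log as POLL events that returned NOK, and that delays are the numbers in braces in front of each stream chunk (the slide does not state their units). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of time minimization over a recorded event log. */
final class TimeMinimizer {

    /** Removes idle time: drops POLL events that found no data (NOK). Assumed reading of the slide. */
    static List<String> removeIdleTime(List<String> eventLog) {
        List<String> kept = new ArrayList<>();
        for (String event : eventLog) {
            boolean idlePoll = event.startsWith("POLL ") && event.endsWith(" NOK");
            if (!idlePoll) {
                kept.add(event);
            }
        }
        return kept;
    }

    /** Removes delays: rewrites every "{n}" delay marker in a stream to "{0}". */
    static String removeDelays(String stream) {
        return stream.replaceAll("\\{\\d+\\}", "{0}");
    }

    public static void main(String[] args) {
        List<String> log = List.of(
                "FILE foo.1", "POLL KEYBOARD NOK", "POLL KEYBOARD OK", "PULL KEYBOARD 5");
        System.out.println(removeIdleTime(log));
        System.out.println(removeDelays("{5680}hello | {4056}c"));
    }
}
```

Both steps leave the data that the application actually consumed untouched; only the waiting around it is dropped.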

DATA MINIMIZATION

Environment data (files): foo.1, foo.2, bar.1

Minimization levels:
• Whole entities
• Chunks
• Atoms

[Animation: whole entities are removed first (foo.1 disappears), then the Lorem ipsum contents of a remaining file are cut down by chunks and then by atoms, leaving "sadipscing elitr, eirmod invidunt ut labore dolore magna erat, voluptua."; the last frame shows this text and foo.2]

ANALYSIS

Correctness (assuming a correct oracle)

1. Original and minimized executions produce the same failure

2. Minimized execution is not larger than the original execution

Worst case performance

polynomial in the size of the captured data (assuming delta debugging)
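A simplified delta debugging loop illustrates both properties. The sketch below is an assumption-laden illustration, not the thesis implementation: it works on a flat list of atoms only (the whole-entity and chunk levels are not modeled), and the oracle is a hypothetical predicate that returns true when a candidate still produces the failure. A candidate is accepted only if the oracle confirms the failure, and every accepted candidate is strictly smaller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch of data minimization using a simplified delta debugging loop. */
final class DataMinimizer {

    /**
     * Shrinks input while the oracle still reports the failure.
     * Assumes the oracle returns true for the full input; never tries the empty input.
     */
    static <T> List<T> minimize(List<T> input, Predicate<List<T>> oracle) {
        List<T> current = new ArrayList<>(input);
        int granularity = 2; // number of chunks the input is split into
        while (current.size() >= 2) {
            int chunkSize = (int) Math.ceil(current.size() / (double) granularity);
            boolean reduced = false;
            for (int start = 0; start < current.size(); start += chunkSize) {
                int end = Math.min(start + chunkSize, current.size());
                // Candidate: everything except the chunk [start, end).
                List<T> candidate = new ArrayList<>(current.subList(0, start));
                candidate.addAll(current.subList(end, current.size()));
                if (oracle.test(candidate)) {
                    current = candidate;
                    granularity = Math.max(granularity - 1, 2);
                    reduced = true;
                    break;
                }
            }
            if (!reduced) {
                if (granularity >= current.size()) {
                    break; // single-element chunks were tried; nothing more can be removed
                }
                granularity = Math.min(granularity * 2, current.size());
            }
        }
        return current;
    }
}
```

For example, with the atoms a through h and an oracle that requires both "c" and "g" to be present, the loop ends with the two-element list containing "c" and "g".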

EVALUATION

Can the technique produce, in a reasonable amount of time, minimized executions that can be used to debug the original failure?

Subject: Pine email and news client
• two real field failures
• 20 failing executions, 10 per failure

Minimized executions generated by
• randomly generating interaction scripts
• manually performing the scripts (while recording)
• minimizing the captured executions

RESULTS

[Figure: bar chart of the average value after minimization (0% to 100%) for the header-color fault and the address book fault; categories: # entities, streams size, files size]

Results are likely to be conservative; recorded executions only contain the minimal amount of data needed to perform an action.

Inputs can be minimized in a reasonable amount of time (less than 75 minutes).




ANONYMIZATION

[Figure: nested regions of the input domain: the inputs that cause F and, within them, the inputs that satisfy F’s path condition. The sensitive input (I) that causes F and the anonymized input (I’) that also causes F both lie in the path-condition region.]

PATH CONDITION GENERATION

Path condition: set of constraints on a program’s inputs that encode the conditions necessary for a specific path to be executed.

boolean foo(int x, int y, int z) {
  if (x <= 5) {
    int a = x * 2;
    if (y + a > 10) {
      if (z == 0) {
        return true;
      }
    }
  }
  return false;
}






Inputs (sensitive): 5 3 0

Symbolic State:
x→i1, y→i2, z→i3, a→i1*2

Path Condition:
i1 <= 5 ∧ i2+i1*2 > 10 ∧ i3 == 0
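The walk-through can be mimicked by hand-instrumenting foo so that each branch records the constraint it took. This is a sketch under stated assumptions: the symbolic names i1, i2, i3 follow the slides, but the class, the string representation of constraints, and the constraints recorded on the branches not taken are hypothetical additions. A real tool would instrument the program automatically.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: records the path condition of foo for one concrete input. */
final class PathConditionRecorder {

    /** Runs foo's logic on concrete values while tracking symbolic expressions over i1, i2, i3. */
    static List<String> pathCondition(int x, int y, int z) {
        List<String> constraints = new ArrayList<>();
        String sx = "i1";
        String sy = "i2";
        String sz = "i3";                    // symbolic state for the three inputs
        if (x <= 5) {
            constraints.add(sx + " <= 5");
            int a = x * 2;
            String sa = sx + "*2";           // symbolic state: a -> i1*2
            if (y + a > 10) {
                constraints.add(sy + "+" + sa + " > 10");
                if (z == 0) {
                    constraints.add(sz + " == 0");
                } else {
                    constraints.add(sz + " != 0");
                }
            } else {
                constraints.add(sy + "+" + sa + " <= 10");
            }
        } else {
            constraints.add(sx + " > 5");
        }
        return constraints;
    }

    public static void main(String[] args) {
        // Prints the conjunction recorded for the sensitive input 5 3 0.
        System.out.println(String.join(" && ", pathCondition(5, 3, 0)));
    }
}
```

For the input 5 3 0 the recorded constraints match the slide: i1 <= 5, i2+i1*2 > 10, and i3 == 0.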

CHOOSING ANONYMIZED INPUTS

Path Condition:
i1 <= 5 ∧ i2+i1*2 > 10 ∧ i3 == 0

Constraint Solver (path condition only):
i1 == 5, i2 == 3, i3 == 0 (the original input: 5 3 0)

Input Constraints (breakable):
i1 != 5 ∧ i2 != 3 ∧ i3 != 0

Constraint Solver (path condition and input constraints):
i1 == 4, i2 == 10, i3 == 0
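The role of the breakable input constraints can be illustrated with a brute-force search standing in for the constraint solver. This is a sketch under assumptions, not the solver used in the thesis: the search range, the class, and the method names are hypothetical. The path condition is treated as mandatory, while the input constraints are satisfied where possible (here, i3 != 0 must be broken because the path condition requires i3 == 0).

```java
/** Hypothetical sketch: brute-force stand-in for a constraint solver with breakable constraints. */
final class AnonymizedInputChooser {

    /** Path condition from the slides: i1 <= 5 AND i2 + i1*2 > 10 AND i3 == 0. */
    static boolean satisfiesPathCondition(int i1, int i2, int i3) {
        return i1 <= 5 && i2 + i1 * 2 > 10 && i3 == 0;
    }

    /** Counts the input constraints (i1 != 5, i2 != 3, i3 != 0) that a candidate satisfies. */
    static int satisfiedInputConstraints(int i1, int i2, int i3) {
        int count = 0;
        if (i1 != 5) count++;
        if (i2 != 3) count++;
        if (i3 != 0) count++;
        return count;
    }

    /** Searches [-bound, bound]^3 for a solution that breaks as few input constraints as possible. */
    static int[] choose(int bound) {
        int[] best = null;
        int bestScore = -1;
        for (int i1 = -bound; i1 <= bound; i1++) {
            for (int i2 = -bound; i2 <= bound; i2++) {
                for (int i3 = -bound; i3 <= bound; i3++) {
                    if (!satisfiesPathCondition(i1, i2, i3)) {
                        continue; // the path condition is never breakable
                    }
                    int score = satisfiedInputConstraints(i1, i2, i3);
                    if (score > bestScore) {
                        bestScore = score;
                        best = new int[] {i1, i2, i3};
                    }
                }
            }
        }
        return best; // null if no input in range satisfies the path condition
    }
}
```

Any solution found this way follows the same path as the original input while differing from it in i1 and i2. The particular values depend on the search order, so they need not equal the 4, 10, 0 shown on the slide.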

PATH CONDITION RELAXATION

[Figure: the sensitive input (I) that causes F, shown within the input domain]

EVALUATION

Feasibility: Can the approach generate, in a reasonable amount of time, anonymized inputs that reproduce the failure?

Strength: How much information about the original inputs is revealed?

Effectiveness: Are the anonymized inputs safe to send to developers?

SUBJECTS

• Columba: 1 fault
• htmlparser: 1 fault
• Printtokens: 2 faults
• NanoXML: 16 faults

(20 faults, total)

Select sensitive failure-inducing inputs
• manually generated or included with subject
• several hundred bytes to 5 MB in size

(Assume all of each input is potentially sensitive)

RQ1: FEASIBILITY

[Figure: per-fault bar chart of execution time (s, axis 0 to 600) and solver time (s, axis 0 to 20) for columba, htmlparser, printtokens 1 and 2, and nanoxml 1 to 16]

Inputs can be anonymized in a reasonable amount of time (easily done overnight).

RQ2: STRENGTH

Average % Bits Revealed: measures how many inputs satisfy the path condition. Many satisfying inputs: little information revealed. Few satisfying inputs: lots of information revealed.

Average % Residue: measures how much of the anonymized input is identical to the original input.

Example (I and I’):
AAAAAAsecretAAAAAA...
BBBBBBsecretBBBBBB...
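One plausible way to compute a residue figure is to compare the two inputs position by position. The sketch below is an assumption, not the thesis's definition: it requires inputs of equal length and counts identical bytes at identical offsets. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a residue measurement between an original and an anonymized input. */
final class Residue {

    /** Percentage of byte positions at which the anonymized input equals the original. */
    static double percentResidue(byte[] original, byte[] anonymized) {
        if (original.length != anonymized.length) {
            throw new IllegalArgumentException("inputs must have equal length");
        }
        if (original.length == 0) {
            return 0.0;
        }
        int same = 0;
        for (int i = 0; i < original.length; i++) {
            if (original[i] == anonymized[i]) {
                same++;
            }
        }
        return 100.0 * same / original.length;
    }

    public static void main(String[] args) {
        byte[] input = "AAAAAAsecretAAAAAA".getBytes(StandardCharsets.US_ASCII);
        byte[] anonymized = "BBBBBBsecretBBBBBB".getBytes(StandardCharsets.US_ASCII);
        System.out.println(percentResidue(input, anonymized));
    }
}
```

In the truncated 18-byte example strings, only the 6 bytes of "secret" survive, so the residue is 6 out of 18 positions.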

RQ2: STRENGTH

[Figure: per-fault bar charts of average % bits revealed and average % residue (both axes 0 to 100) for columba, htmlparser, printtokens 1 and 2, and nanoxml 1 to 16]

Anonymized inputs reveal, on average, between 60% (worst case) and 2% (best case) of the information in the original inputs.

RQ3: EFFECTIVENESS

HTMLPARSER

<?xml version="1.0" encoding="UTF-8" ?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"><head><title>james clause @ gatech | home</title>

<style type="text/css" media="screen" title=""><![CDATA[</style></head><body> ...</body>













The portions of the inputs that remain after anonymization tend to be structural in nature and therefore are safe to send to developers.




OVERVIEW

Inputs: Foo (512B), Bar (1KB), Baz (1.5GB)

1. Taint inputs

2. Propagate taint marks

3. Identify relevant inputs

foo: 512 ... bar: 1024 ... baz: 150... total: 150...
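The three steps can be sketched with a small value wrapper that carries taint marks. This is an illustration under assumptions rather than Penumbra's implementation: it models data-flow propagation only (no control flow), the class and method names are hypothetical, and the byte count used for Baz is an illustrative value chosen to match the "150..." prefix on the slide.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of data-flow taint propagation over long values. */
final class TaintedLong {
    final long value;
    final Set<String> marks; // taint marks: names of the inputs this value derives from

    private TaintedLong(long value, Set<String> marks) {
        this.value = value;
        this.marks = marks;
    }

    /** Step 1: taint an input by attaching a mark that identifies it. */
    static TaintedLong input(String mark, long value) {
        Set<String> marks = new TreeSet<>();
        marks.add(mark);
        return new TaintedLong(value, marks);
    }

    /** Step 2: propagate marks; the result carries the union of both operands' marks. */
    TaintedLong plus(TaintedLong other) {
        Set<String> union = new TreeSet<>(marks);
        union.addAll(other.marks);
        return new TaintedLong(value + other.value, union);
    }

    public static void main(String[] args) {
        TaintedLong foo = input("foo", 512);
        TaintedLong bar = input("bar", 1024);
        TaintedLong baz = input("baz", 1_500_000_000L); // illustrative size for Baz
        TaintedLong total = foo.plus(bar).plus(baz);
        // Step 3: the marks on the value involved in the failure identify the relevant inputs.
        System.out.println(total.value + " derives from " + total.marks);
    }
}
```

A value such as the per-file size for foo carries only the mark "foo", while the total carries all three marks, so inspecting the marks at the point of failure tells the developer which inputs to look at.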

EVALUATION

Study 1: Effectiveness for debugging real failures

Study 2: Comparison with Delta Debugging

Subjects:

Application      KLoC    Fault location
bc 1.06          10.5    more_arrays : 177
gzip 1.24        6.3     get_istat : 828
ncompress 4.24   1.4     comprexx : 896
pine 4.44        239.1   rfc822_cat : 260
squid 2.3        69.9    ftpBuildTitleUrl : 1024

We selected a failure-revealing input vector for each subject.

STUDY 1: EFFECTIVENESS

Is the information that Penumbra provides helpful for debugging real failures?

STUDY 1 RESULTS: GZIP & NCOMPRESS

Crash when a file name is longer than 1,024 characters.

[Figure: ./gzip run on the files foo, bar, and a file with a long file name; each file supplies contents and attributes as inputs]

# Inputs: 10,000,056
# Relevant (DF): 1
# Relevant (DF + CF): 3

STUDY 1: CONCLUSIONS

1. Data-flow propagation is always effective; data- and control-flow propagation is sometimes effective.

➡ Use data-flow propagation first; then, if necessary, use control-flow propagation.

2. Highlighted inputs correspond to the failure conditions.

➡ Our technique is effective in assisting the debugging of real failures.

STUDY 2: COMPARISON WITH DELTA DEBUGGING

RQ1: How much manual effort does each technique require?

RQ2: How long does it take to fix a considered failure given the information provided by each technique?

RQ1: MANUAL EFFORT

Use setup-time as a proxy for manual (developer) effort.

[Figure: bar chart of setup-time (s) for Penumbra and Delta Debugging on gzip, ncompress, bc, pine, and squid; legible value labels include 1,800 (twice), 5,400, and 12,600]

Penumbra requires considerably less setup time than Delta Debugging (although more time overall for gzip and ncompress).

RQ2: DEBUGGING EFFORT

Use number of relevant inputs as a proxy for debugging effort.

Subject      Penumbra (DF)   Penumbra (DF + CF)   Delta Debugging
bc           209             743                  285
gzip         1               3                    1
ncompress    1               3                    1
pine         26              15,100,344           90
squid        89              2,056                -

• Penumbra (DF) is comparable to (slightly better than) Delta Debugging.

• Penumbra (DF + CF) is likely less effective for bc, pine, and squid.

CONCLUSIONS

Program analysis techniques can enable and support the debugging of failures in widely-used applications by:

1) capturing, replaying, minimizing, and, as much as possible, anonymizing failing executions

2) highlighting subsets of failure-inducing inputs that are likely to be helpful for debugging such failures
