Systematic Debugging Zeller

Systematic Debugging Zeller. DAIMIHenrik Bærbak Christensen2 Literature “Why Programs Fail”, 2nd Ed. –Chap 1: TRAFFIC / Overview –Chap 6: Scientific Method

Systematic Debugging

Zeller

DAIMI Henrik Bærbak Christensen 2

Literature

  “Why Programs Fail”, 2nd Ed.
    – Chap 1: TRAFFIC / Overview
    – Chap 6: Scientific Method in debugging

  “Beautiful Code”, eds. Oram & Wilson
    – Chap 28: Delta debugging

Terminology Increment

  The story goes
    – The programmer creates a defect
    – The defect causes an infection: program state that differs from the intended state
    – The infection propagates
      • Erroneous state is fed into functions and the infection spreads
    – The infection causes a failure
      • An externally observable error

Debugging

  Debugging is:
    – Identify the infection chain
    – Find the root cause, and thereby the defect
    – Remove the defect

  Note:
    – The chain can be long from infection to failure
    – Not all infections lead to failure

TRAFFIC

  Seven-step process
    – T: Track the problem in the database
    – R: Reproduce the failure
    – A: Automate and simplify the test case
    – F: Find possible infection regions
    – F: Focus on the most likely origins
    – I: Isolate the infection chain
    – C: Correct the defect

Debugging as Search

  Searching:
    – Finding the transition from sane to infected state
      • In time and in place (= variable)

  Principles:
    – Separate sane from infected state
      • If a state is sane, there is no infection to propagate
    – Separate relevant from irrelevant
      • A variable value is the result of a limited number of earlier variable values, i.e. only part of the earlier state may be relevant for the failure

Searching

  State dependency
    – Infected state is a result of changes in a limited set of state variables
    – Finding these of course limits the search space a lot…

  Yet another argument in favor of loose coupling of highly cohesive software units!

Scientific Method

Wikipedia and Zeller Chap 6

Process

  The essential elements of the scientific method are iterations and recursions of the following four steps:
    – Characterization (clear terminology, careful logging)
    – Hypothesis (a theoretical, hypothetical explanation)
    – Prediction (logical deduction from the hypothesis)
    – Experiment (test of all of the above)

    – Observation & Conclusion: the hypothesis is rejected or confirmed
    – Theory: A hypothesis that cannot be rejected even though it has been thoroughly tested…

Characterization

[Figure: diagram for natural science relating Observation, Hypothesis, Experiment, Theory / Literature, and Experience]

My Best Debugging Story

Utilizing the Scientific Method

Debugging…

  The method
    – Observe, hypothesis, prediction, experiment
    – and detailed record keeping

  … served me well in an EU project in which I was hired as an advisor…

  Domain: Symphony orchestra sheet music editor.

Case: Haken-Blostein algorithm

  Haken-Blostein is a complex algorithm for spacing sheet music.

  Basically it treats the graphical objects like a system of connected springs with rods inserted.

Observation

  Observation: Each invocation of the algorithm spaced the sheet music differently. Something gets accumulated or changed between invocations?

  Hypothesis: The observed behavior is due to some kind of accumulating defect in the code. Some variable(s) change value during invocation.

  But: too vague to be useful...

Physics: Spring system

  A classic problem in physics is connected springs, e.g. three connected springs.

  Problem:
    – How will the connection points move when you push/pull the system?
    – When the spring constants are different?
    – Inserting rods will set a lower limit on how compressed the springs can become
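
The spring problem can be sketched in a few lines of Python. This is only an illustration of the physics, not the Haken-Blostein algorithm itself: it assumes the springs are joined end to end with the left end fixed, that every spring carries the same force (Hooke's law), and that a rod simply stops its spring from becoming shorter than the rod. The function name and all numeric values are hypothetical.

```python
from itertools import accumulate


def connection_points(force, spring_constants, rest_lengths, rod_lengths):
    """Positions of the connection points of springs joined end to end.

    Assumption: all springs in the chain carry the same force, so Hooke's
    law gives each spring the length rest + force / k. A rod sets a lower
    limit on that length.
    """
    lengths = [
        max(rod, rest + force / k)
        for k, rest, rod in zip(spring_constants, rest_lengths, rod_lengths)
    ]
    # The left end is fixed at 0; each point is the running sum of lengths.
    return list(accumulate(lengths))


# Hypothetical values: three springs with constants 1, 2 and 4,
# rest length 10 and rods of length 8.
constants = [1.0, 2.0, 4.0]
pulled = connection_points(4.0, constants, [10.0] * 3, [8.0] * 3)
pushed = connection_points(-4.0, constants, [10.0] * 3, [8.0] * 3)
print(pulled)  # [14.0, 26.0, 37.0]
print(pushed)  # [8.0, 16.0, 25.0]
```

When pulled, the softest spring stretches the most. When pushed, the two softest springs are stopped by their rods at length 8, while the stiffest one only shrinks to 9.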

Iteration 1

  Parameters:
    – external algorithm parameters: a
    – spring constants ci, rod sizes Rj

  Hypothesis 1:
    – The algorithm parameter, a, changes value

  Experiment:
    – Inspect the value of a before and after invocation
    – Easy: there is a dialogue box where it can be inspected.

  Result: Hypothesis false.

Iteration 2

  Hypothesis 2:
    – Spring constants change

  Experiment:
    – Inspect the values of the spring constants ci before and after invocation
    – Difficult:
      • There are a lot of springs and no user interface.
      • Inserting debug output into code.
      • Manually comparing long lists of double values.

  Result: Hypothesis false.

Iteration 3

  Hypothesis 3:
    – Rod sizes change

  Experiment:
    – Inspect the values of the rod sizes Rj before and after invocation
    – Even more difficult:
      • There are hundreds of rods and no user interface.
      • Inserting debug output into code.
      • Using a diff tool to compare very long lists of double values.

  Result: Hypothesis accepted.
    – A few rod sizes somehow accumulate width…
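
The before/after comparison can also be done inside the program instead of with debug output and a diff tool. The following is a minimal sketch of that idea, not the code from the project: `buggy_respace` is a hypothetical stand-in for the spacing algorithm, and the rod sizes are made-up values.

```python
import copy


def changed_entries(before, after, tolerance=0.0):
    """Return {index: (old, new)} for every value that differs."""
    return {
        i: (old, new)
        for i, (old, new) in enumerate(zip(before, after))
        if abs(new - old) > tolerance
    }


def buggy_respace(rod_sizes):
    # Hypothetical stand-in for the spacing algorithm: it should leave
    # the rod sizes untouched, but one rod accumulates extra width.
    rod_sizes[2] += 0.5


rods = [4.0, 2.5, 3.0, 6.0]
snapshot = copy.copy(rods)   # state before the invocation
buggy_respace(rods)          # the experiment
print(changed_entries(snapshot, rods))  # {2: (3.0, 3.5)}
```

An empty result rejects the hypothesis that the values change; a non-empty one points directly at the variables that carry the infection.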

Zeller

Process

Our aids

  The writing template
    – Hypothesis
    – Prediction
    – Experiment
    – Observation
    – Conclusion

  Be explicit! Write it down and keep a log

Example: Haken-Blostein 1

  Hypothesis H1
    – The spacing algorithm changes the value of a

  Prediction
    – The value of a before spacing differs from that after

  Experiment
    – Inspect a (dialog box); respace; inspect a

  Observation
    – The value of a is the same before and after respacing

  Conclusion
    – H1 rejected.
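
If the log is kept in a file or a script rather than on paper, the template maps naturally onto a small record type. The sketch below is one possible way to do it; the class name `LogEntry` and the method `as_text` are hypothetical, and the field values are the H1 entry from this example.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    """One iteration of the writing template."""
    hypothesis: str
    prediction: str
    experiment: str
    observation: str = ""   # filled in after the experiment has been run
    conclusion: str = ""

    def as_text(self):
        fields = ("hypothesis", "prediction", "experiment",
                  "observation", "conclusion")
        return "\n".join(
            f"{name.capitalize()}: {getattr(self, name)}" for name in fields
        )


log = []
h1 = LogEntry(
    hypothesis="The spacing algorithm changes the value of a",
    prediction="The value of a before spacing differs from that after",
    experiment="Inspect a (dialog box); respace; inspect a",
)
h1.observation = "The value of a is the same before and after respacing"
h1.conclusion = "H1 rejected"
log.append(h1)
print(h1.as_text())
```

Writing the hypothesis and prediction before running the experiment, and the observation and conclusion afterwards, keeps the record honest.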

Exercise

  Discuss this experiment/iteration using the terminology of
    – Defect, infection, failure
    – Principles:
      • Separate sane from infected state
        – If a state is sane, there is no infection to propagate
      • Separate relevant from irrelevant
        – A variable value is the result of a limited number of earlier variable values, i.e. only part of the earlier state may be relevant for the failure

Creating New Hypotheses

  If a hypothesis is rejected – we have to formulate a new one!
    – Creative – but effective!

  Sources
    – Problem description (be explicit and precise)
    – Program code
    – Failing run/test case
    – Alternative runs/test cases

  Earlier hypotheses
    – A new hypothesis must
      • Include all confirmed earlier hypotheses
      • Exclude all rejected earlier hypotheses

Reasoning

  Fig 6.5

  Deduction (0 runs)
    – Concluding from abstract to concrete
      • I do not have to measure if the sum of angles in a triangle is 180 degrees. I deduce it from a mathematical theory.

  Observation (1 run)
    – Observe a phenomenon once

  Induction (n runs)
    – Collecting many concrete observations to form an abstraction
      • “I have met 15 stupid men, thus all men are stupid…”

  Experimentation (n controlled runs)
    – Induction, but controlled by scientific method

Delta Debugging

Automating the Search

  TRAFFIC
    – T: Track the problem in the database
    – R: Reproduce the failure
    – A: Automate and simplify the test case
    – F: Find possible infection regions
    – F: Focus on the most likely origins
    – I: Isolate the infection chain
    – C: Correct the defect

  Debugging = Search

  But can we automate the search?

Case

  ddd
    – Front end to gdb
    – Bug: command-line arguments no longer remembered between runs
    – Pass: gdb 4.16; Fail: gdb 4.17
    – Reasoning
      • Some code change, Δ, between 4.16 and 4.17 is the cause of the failure.
      • Can find it by reviewing the set of Δ’s
    – Diff:
      • 178,200 changed lines
      • 8721 places

Idea

  Apply one Δ at a time
    – Test to see if test case fails

  Starting at 4.16 (pass) until fail (4.16’)
    – Review the last applied Δ – that is it!

  But…
    – The order of Δ is not known
      • Δ1 must be applied before Δ2 and Δ3
        – Ex: Δ1 declares a new variable, used by Δ2

Permutations?

  All orderings of the set of Δ?
    – 8721!

  All subsets of the set of Δ?
    – 2^8721 ≈ 10^2625

  There are approx. 3 x 10^7 seconds in a year, so even the ‘fast’ approach (all subsets) would take on the order of 10^2617 years if each iteration takes 1 second.
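
The size of the two search spaces can be checked with a few lines of Python. This is just arithmetic on the number of deltas from the diff; the variable names are made up for the illustration.

```python
import math

N_DELTAS = 8721  # number of changed places in the diff

# floor(log10(x)) is the exponent of x in scientific notation.
subsets_exponent = math.floor(N_DELTAS * math.log10(2))

# log10(8721!) via the log-gamma function, since lgamma(n + 1) = ln(n!).
orderings_exponent = math.floor(math.lgamma(N_DELTAS + 1) / math.log(10))

seconds_per_year = 365 * 24 * 3600

print(subsets_exponent)    # 2625, i.e. 2^8721 is about 10^2625
print(orderings_exponent)  # exponent of 8721!, far larger still
print(seconds_per_year)    # 31536000, roughly 3 x 10^7
```

Trying all orderings is hopeless compared with even the already hopeless number of subsets, which is why a smarter search strategy is needed.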

A Simpler Problem

  First – let us look at a simpler, but related problem.
    – I generate HTML from my XML database
      • schedule.xml, lesson.xml, exercise.xml, …
    – Using an XSLT translator ‘xt’

  xt xml_src/schedule.xml xsl/gen-schedule.xsl schedule.html

  But the problem is that if the XML is not well formed, I generally get an error without any hint of what went wrong
    – Like the ‘good old days’ of a C core dump…

The Scientific Method + Divide&Conquer

  I then resort to binary search:
    – H1: The defect is in the first half of the XML file
      • I cut out the last (meaningful) half, and run ‘xt’
      • If the error is there, H1 is confirmed, otherwise rejected
    – If H1 is rejected then H2: defect in second half
    – If H1 (or H2) is confirmed, I further divide the half with the defect into two and define two new hypotheses
      • i.e. call myself recursively
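
The manual procedure above can be sketched as a small binary search. This is an illustration under simplifying assumptions: the file is modelled as a list of self-contained XML chunks, exactly one chunk is malformed, and Python's `xml.etree.ElementTree` parser stands in for running ‘xt’. The function names and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET


def fails(chunks):
    """Stand-in for running 'xt': True if the chunks are not well-formed XML."""
    try:
        ET.fromstring("<root>" + "".join(chunks) + "</root>")
        return False
    except ET.ParseError:
        return True


def locate_defect(chunks):
    """Index of the single malformed chunk, found by repeated halving."""
    lo, hi = 0, len(chunks)          # the defect is somewhere in chunks[lo:hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(chunks[lo:mid]):    # H1: the defect is in the first half
            hi = mid                 # H1 confirmed
        else:
            lo = mid                 # H1 rejected, so H2: the second half
    return lo


lessons = [
    '<lesson id="1"/>', '<lesson id="2"/>', '<lesson id="3"/>',
    '<lesson id="4"/>', '<lesson id="5">', '<lesson id="6"/>',
]
print(locate_defect(lessons))  # 4 (zero-based): the unclosed lesson 5
```

Each round halves the region that can contain the defect, so about log2(n) runs are enough for n chunks.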

ddmin

  ddmin is an algorithmic version of this procedure
    – Test(c) only returns X in case the exact same failure occurs
    – If xt fails for other reasons, return ?

What Happens?

  N = 8

  Test(Cx\C4) = X => call recursively with the new set and N = 7
    – N = max(N-1, 2)

The Math Formulation

Back to gdb Problem

  You apply smaller and smaller change sets to gdb 4.16 and test until the change set is small enough to
    – A) compile and run
    – B) find one that reproduces the defect

  Note: No guarantee that it will be found!
    – The change that has the defect may rely on a previous change that is not part of the change set…

The Full dd Algorithm

  The dd algorithm in the handed-out chapter is the extended version, dd, of ddmin.

  It tests from both ends
    – Failing test case set to be minimized (like ddmin)
      • All deltas / gdb 4.17
    – Passing test case set to be maximized
      • No deltas / gdb 4.16

  I tried to hand-run it but …

Summary

  TRAFFIC
    – An algorithm to help in debugging

  Scientific Process
    – Debugging as iterations over hypothesis, prediction, observation, and conclusion.

  Delta-debugging
    – Automating the search for the defect
      • By divide-n-conquer of the search space…