
Systematic Debugging

Zeller

DAIMI, Henrik Bærbak Christensen

Literature

“Why Programs Fail”, 2nd Ed.
– Chap 1: TRAFFIC / Overview
– Chap 6: Scientific Method in debugging

“Beautiful Code”, eds. Oram & Wilson
– Chap 28: Delta debugging


Terminology Increment

The story goes:
– The programmer creates a defect.
– The defect causes an infection: program state that differs from the intended state.
– The infection propagates.
  • Erroneous state is fed into functions, and the infection spreads.
– The infection causes a failure.
  • An externally observable error.


Debugging

Debugging is:
– Identify the infection chain.
– Find the root cause, and thereby the defect.
– Remove the defect.

Note:
– The chain from infection to failure can be long.
– Not all infections lead to failure.


TRAFFIC

Seven-step process:
– T: Track the problem in the database
– R: Reproduce the failure
– A: Automate and simplify the test case
– F: Find possible infection regions
– F: Focus on the most likely origins
– I: Isolate the infection chain
– C: Correct the defect


Debugging as Search

Searching:
– Find the transition from sane to infected state
  • in time and in place (i.e., which variable).

Principles:
– Separate sane from infected state.
  • If a state is sane, there is no infection to propagate.
– Separate relevant from irrelevant.
  • A variable's value is the result of a limited number of earlier variable values, i.e., only part of the earlier state may be relevant for the failure.
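If a sequence of program states can be recorded, and any single state can be judged sane or infected, then the "in time" part of the search becomes a binary search. The following is a minimal Python sketch under hypothetical assumptions: the trace is a list of states, `is_sane` is a predicate that can be evaluated on one state, the first state is sane and the last is infected, and an infected state never becomes sane again.

```python
from typing import Callable, Sequence, TypeVar

State = TypeVar("State")


def first_infected(states: Sequence[State], is_sane: Callable[[State], bool]) -> int:
    """Return the index of the first infected state in a recorded trace.

    Assumes states[0] is sane, states[-1] is infected, and that once the
    state is infected it stays infected (no "healing" later in the run).
    """
    lo, hi = 0, len(states) - 1
    if not is_sane(states[lo]) or is_sane(states[hi]):
        raise ValueError("trace must start sane and end infected")
    # Invariant: states[lo] is sane and states[hi] is infected.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_sane(states[mid]):
            lo = mid  # sane here, so the infection happens later
        else:
            hi = mid  # already infected, so it happened at or before mid
    return hi


# Hypothetical trace of one variable that must never become negative.
trace = [0, 1, 2, 3, -1, -5, -7]
print(first_infected(trace, lambda x: x >= 0))  # prints 4
```

Each call to `is_sane` is one experiment. A trace of n states therefore needs only about log2(n) of them. The "no healing" assumption matters: if an infection can disappear again, bisection may skip right over it.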


Searching

State dependency:
– An infected state is the result of changes in a limited set of state variables.
– Finding these, of course, narrows the search space a lot…

Yet another argument in favor of loose coupling of highly cohesive software units!

Scientific Method

Wikipedia and Zeller Chap 6


Process

The essential elements of the scientific method are iterations and recursions of the following four steps:
– Characterization (clear terminology, careful logging)
– Hypothesis (a theoretical, hypothetical explanation)
– Prediction (logical deduction from the hypothesis)
– Experiment (test of all of the above)

– Observation & Conclusion: reject or confirm the hypothesis
– Theory: a hypothesis that cannot be rejected even though it has been thoroughly tested…


Characterization

[Figure: Natural science. The cycle Observation, Hypothesis, Experiment, drawing on Theory / Literature and Experience]

My Best Debugging Story

Utilizing the Scientific Method


Debugging…

The method:
– Observation, hypothesis, prediction, experiment
– and detailed record keeping

… served me well in an EU project in which I was hired as an advisor…

Domain: a sheet music editor for symphony orchestras.


Case: Haken-Blostein algorithm

Haken-Blostein is a complex algorithm for spacing sheet music.

Basically it treats the graphical objects like a system of connected springs with rods inserted.


Observation

Observation: Each invocation of the algorithm spaced the sheet music differently. Does something accumulate or change between invocations?

Hypothesis: The observed behavior is due to some kind of accumulating defect in the code; some variable(s) change value during each invocation.

But: too vague to be useful...


Physics: Spring system

A classic problem in physics is a system of connected springs, e.g., three springs connected to each other.

Problem:
– How will the connection points move when you push or pull the system?
– What if the spring constants differ?
– Inserting rods sets a lower limit on how far the springs can be compressed.
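A small toy model can make the physics concrete. It is not the Haken-Blostein algorithm. It is only a generic Python sketch under the following assumptions: the springs sit in series, each spring has a natural length and a positive spring constant, a rod inside a spring acts as a minimum length, and the whole chain is squeezed to a target length. Every spring carries the same force F, so spring i has length max(rod_i, natural_i - F / c_i). The sketch binary-searches for the F that produces the target length.

```python
def compressed_lengths(natural, stiffness, rods, target):
    """Lengths of springs in series after squeezing the chain to `target`.

    Toy model: spring i has natural length natural[i] and spring constant
    stiffness[i] > 0; a rod of length rods[i] inside it stops further compression.
    """
    def total(force):
        return sum(max(r, l - force / c) for l, c, r in zip(natural, stiffness, rods))

    if target >= sum(natural):
        return list(natural)  # no compression needed (stretching is ignored)
    if target < sum(rods):
        raise ValueError("the rods do not allow compression to that length")

    lo, hi = 0.0, 1.0
    while total(hi) > target:  # find a force large enough to reach the target
        hi *= 2
    for _ in range(100):  # bisect on the force; total() shrinks as force grows
        mid = (lo + hi) / 2
        if total(mid) > target:
            lo = mid
        else:
            hi = mid
    return [max(r, l - hi / c) for l, c, r in zip(natural, stiffness, rods)]


# Three springs; the stiffest one contains a rod of length 1.8.
print(compressed_lengths([2.0, 2.0, 2.0], [1.0, 2.0, 4.0], [0.0, 0.0, 1.8], 4.5))
```

The softest spring absorbs most of the compression. The rod stops the third spring at 1.8, so the result is roughly [1.133, 1.567, 1.8].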


Iteration 1

Parameters:
– external algorithm parameter: a
– spring constants c_i, rod sizes R_j

Hypothesis 1:
– The algorithm parameter a changes value.

Experiment:
– Inspect the value of a before and after invocation.
– Easy: there is a dialogue box where it can be inspected.

Result: Hypothesis false.


Iteration 2

Hypothesis 2:
– The spring constants change.

Experiment:
– Inspect the spring constants c_i before and after invocation.
– Difficult:
  • There are a lot of springs and no user interface.
  • Inserting debug output into the code.
  • Manually comparing long lists of double values.

Result: Hypothesis false.


Iteration 3

Hypothesis 3:
– The rod sizes change.

Experiment:
– Inspect the rod sizes R_j before and after invocation.
– Even more difficult:
  • There are hundreds of rods and no user interface.
  • Inserting debug output into the code.
  • Using a diff tool to compare very long lists of double values.

Result: Hypothesis accepted.
– A few rod sizes somehow accumulate width…
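Iterations 2 and 3 were hard mainly because the before/after comparison was done by hand. The Python sketch below shows one way to automate such an experiment: snapshot the state, run the algorithm, and report every value that changed. The state layout (`a`, `spring_constants`, `rod_sizes`) and the `buggy_spacing` function with its in-place defect are hypothetical stand-ins, not the editor's real code.

```python
import copy


def changed_values(before, after, tol=0.0):
    """List (index, old, new) for every value that differs by more than tol."""
    return [(i, b, a) for i, (b, a) in enumerate(zip(before, after)) if abs(a - b) > tol]


def run_experiment(algorithm, state):
    """Snapshot the state, run the algorithm once, and diff every variable."""
    before = copy.deepcopy(state)
    algorithm(state)
    return {name: changed_values(before[name], state[name]) for name in state}


def buggy_spacing(state):
    # Hypothetical defect: a rod is widened in place instead of in a local copy,
    # so the extra width accumulates from one invocation to the next.
    state["rod_sizes"][2] += 0.5


state = {"a": [0.7], "spring_constants": [1.0, 2.0, 4.0], "rod_sizes": [1.0, 1.0, 1.0]}
print(run_experiment(buggy_spacing, state))
```

In this run only `rod_sizes` reports a change, `[(2, 1.0, 1.5)]`. The variables `a` and `spring_constants` stay empty, which mirrors hypotheses 1 and 2 being rejected and hypothesis 3 being accepted.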

Zeller


Process


Our aids

The writing template:
– Hypothesis
– Prediction
– Experiment
– Observation
– Conclusion

Be explicit! Write it down and keep a log.


Example: Haken-Blostein 1

Hypothesis H1:
– The spacing algorithm changes the value of a.

Prediction:
– The value of a before spacing differs from the value after.

Experiment:
– Inspect a (dialog box); respace; inspect a.

Observation:
– The value of a is the same before and after respacing.

Conclusion:
– H1 rejected.
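If you want to keep the log next to the code or test suite, the writing template maps naturally onto a record type. The Python sketch below is minimal. Its field names simply mirror the template. The second entry is a hypothetical open H2 modelled on iteration 2, and the `open_hypotheses` helper is an illustrative convenience, not part of the method.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LogEntry:
    """One iteration of the scientific method, following the writing template."""
    hypothesis: str
    prediction: str
    experiment: str
    observation: str = ""
    conclusion: Optional[str] = None  # "confirmed", "rejected", or None while open


def open_hypotheses(log: List[LogEntry]) -> List[str]:
    """Hypotheses that still await an experiment or a conclusion."""
    return [e.hypothesis for e in log if e.conclusion is None]


log = [
    LogEntry(
        hypothesis="H1: The spacing algorithm changes the value of a",
        prediction="The value of a before spacing differs from that after",
        experiment="Inspect a (dialog box); respace; inspect a",
        observation="The value of a is the same before and after respacing",
        conclusion="rejected",
    ),
    LogEntry(
        hypothesis="H2: The spacing algorithm changes the spring constants",
        prediction="Some spring constant before spacing differs from that after",
        experiment="Print all spring constants; respace; print and compare",
    ),
]
print(open_hypotheses(log))
```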


Exercise

Discuss this experiment/iteration using the terminology of:
– Defect, infection, failure
– Principles:
  • Separate sane from infected state: if a state is sane, there is no infection to propagate.
  • Separate relevant from irrelevant: a variable's value is the result of a limited number of earlier variable values, i.e., only part of the earlier state may be relevant for the failure.


Creating New Hypotheses

If a hypothesis is rejected, we have to formulate a new one!
– Creative, but effective!

Sources:
– Problem description (be explicit and precise)
– Program code
– Failing run/test case
– Alternative runs/test cases

Earlier hypotheses:
– A new hypothesis must
  • include all confirmed earlier hypotheses;
  • exclude all rejected earlier hypotheses.


Reasoning

Fig 6.5
– Deduction (0 runs)
  • Concluding from the abstract to the concrete.
  • I do not have to measure whether the angles of a triangle sum to 180 degrees; I deduce it from a mathematical theory.
– Observation (1 run)
  • Observe a phenomenon once.
– Induction (n runs)
  • Collect many concrete observations to form an abstract rule.
  • “I have met 15 stupid men, thus all men are stupid…”
– Experimentation (n controlled runs)
  • Induction, but controlled by the scientific method.

Delta Debugging


Automating the Search

TRAFFIC
– T: Track the problem in the database
– R: Reproduce the failure
– A: Automate and simplify the test case
– F: Find possible infection regions
– F: Focus on the most likely origins
– I: Isolate the infection chain
– C: Correct the defect

Debugging = Search

But can we automate the search?


Case

ddd
– A front end to gdb
– Bug: command-line arguments are no longer remembered between runs
– Passes with gdb 4.16, fails with gdb 4.17
– Reasoning:
  • Some code change, Δ, between 4.16 and 4.17 is the cause of the failure.
  • We can find it by reviewing the set of Δ’s.
– Diff:
  • 178,200 changed lines
  • 8721 places


Idea

Apply one Δ at a time
– and test whether the test case fails.
– Start at 4.16 (pass) and continue until it fails (4.16’).
– Review the last applied Δ: that is it!

But…
– The order of the Δ’s is not known.
  • Δ1 must be applied before Δ2 and Δ3.
  • Ex: Δ1 declares a new variable that is used by Δ2.
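The Python sketch below implements the naive idea to show where it breaks. Deltas are applied in a given order on top of the old version until the test no longer passes. The three-outcome test (pass, fail, unresolved) and the toy deltas `d1`, `d2`, `d3` are hypothetical. Here `d2` stands for a change that cannot even be built without `d1`.

```python
from enum import Enum


class Outcome(Enum):
    PASS = "pass"
    FAIL = "fail"
    UNRESOLVED = "unresolved"  # e.g. the patched version does not even build


def apply_one_at_a_time(deltas, test):
    """Apply deltas in the given order until the test stops passing.

    Returns (last applied delta, outcome), or None if every prefix passes.
    """
    applied = []
    for delta in deltas:
        applied.append(delta)
        outcome = test(applied)
        if outcome is not Outcome.PASS:
            return delta, outcome
    return None


def toy_test(applied):
    if "d2" in applied and "d1" not in applied:
        return Outcome.UNRESOLVED  # d2 uses a variable that d1 declares
    return Outcome.FAIL if "d3" in applied else Outcome.PASS


print(apply_one_at_a_time(["d1", "d2", "d3"], toy_test))  # the right order
print(apply_one_at_a_time(["d2", "d1", "d3"], toy_test))  # a wrong order
```

With the right order, the search ends at `d3`. With a wrong order, it stops at `d2` with an unresolved outcome and reveals nothing about the failure.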


Permutations?

All orderings of the set of Δ’s?
– 8721!

All subsets of the set of Δ’s?
– 2^8721 ≈ 10^2625

There are approximately 3 × 10^7 seconds in a year, so even the ‘faster’ subset approach would take more than 10^2617 years if each iteration took 1 second.
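You can check these orders of magnitude with logarithms, without ever computing the huge numbers. The Python sketch below does this. `math.lgamma(n + 1)` returns ln(n!). The 8721 deltas and the 3 × 10^7 seconds per year are taken from the slide.

```python
import math

N_DELTAS = 8721
SECONDS_PER_YEAR = 3e7  # the approximation used above

log10_orderings = math.lgamma(N_DELTAS + 1) / math.log(10)  # log10(8721!)
log10_subsets = N_DELTAS * math.log10(2)                     # log10(2**8721)
log10_years = log10_subsets - math.log10(SECONDS_PER_YEAR)   # one test per second

print(f"orderings: about 10^{log10_orderings:.0f}")
print(f"subsets:   about 10^{log10_subsets:.0f}")
print(f"years:     about 10^{log10_years:.0f}")
```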


A Simpler Problem

First, let us look at a simpler but related problem.
– I generate HTML from my XML database
  • schedule.xml, lesson.xml, exercise.xml, …
– using the XSLT translator ‘xt’:

xt xml_src/schedule.xml xsl/gen-schedule.xsl schedule.html

But if the XML is not well-formed, I generally get an error without any hint of what went wrong.
– Like the ‘good old days’ of a C core dump…


The Scientific Method + Divide&Conquer

I then resort to binary search:
– H1: The defect is in the first half of the XML file.
  • I cut out the last (meaningful) half and run ‘xt’.
  • If the error is still there, H1 is confirmed; otherwise it is rejected.
– If H1 is rejected, then H2: the defect is in the second half.
– Once I know which half contains the defect, I divide that half into two and define two new hypotheses,
  • i.e., I call myself recursively.
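The manual procedure can be written down as a recursive function. The Python sketch below rests on simplifying assumptions: the file is split into a list of top-level elements (the "meaningful" units), exactly one of them is malformed, and Python's `xml.etree.ElementTree` parser stands in for ‘xt’. The `<schedule>` and `<lesson>` content is made up for the example.

```python
import xml.etree.ElementTree as ET


def fails(elements):
    """Experiment: does the document built from these elements fail to parse?"""
    try:
        ET.fromstring("<schedule>" + "".join(elements) + "</schedule>")
        return False
    except ET.ParseError:
        return True


def find_bad_element(elements):
    """Binary search for the single malformed element (assumed to exist)."""
    if len(elements) == 1:
        return elements[0]
    half = len(elements) // 2
    first, second = elements[:half], elements[half:]
    if fails(first):  # H1 confirmed: the defect is in the first half
        return find_bad_element(first)
    return find_bad_element(second)  # H1 rejected, so H2: the second half


doc = ["<lesson>A</lesson>", "<lesson>B</lesson>", "<lesson>C</lessn>", "<lesson>D</lesson>"]
print(find_bad_element(doc))
```

The assumption that exactly one element is bad is important. With two independent defects, or with a defect that only shows up together with another element, plain bisection can go wrong. This is one motivation for ddmin.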


ddmin

ddmin is an algorithmic version of this procedure.
– Test(c) returns X (fail) only if the exact same failure occurs.
– If xt fails for other reasons, Test(c) returns ? (unresolved).
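Below is a compact Python sketch of ddmin, based on the textbook formulation. It first tries each subset, then each complement, and otherwise doubles the granularity. `Outcome`, `split` and `toy_test` are illustrative names. The toy test fails whenever both 3 and 7 are present. It returns unresolved when 4 appears without 3, which mimics a change that depends on another change.

```python
from enum import Enum


class Outcome(Enum):
    PASS = "pass"
    FAIL = "X"
    UNRESOLVED = "?"


def split(c, n):
    """Split list c into n contiguous, non-empty parts of (almost) equal size."""
    k, r = divmod(len(c), n)
    parts, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < r else 0)
        parts.append(c[start:end])
        start = end
    return parts


def ddmin(test, c):
    """Reduce the failing configuration c to a 1-minimal failing one."""
    assert test(c) is Outcome.FAIL, "ddmin needs a failing configuration"
    n = 2
    while len(c) >= 2:
        parts = split(c, n)
        reduced = False
        for part in parts:  # reduce to a subset
            if test(part) is Outcome.FAIL:
                c, n, reduced = part, 2, True
                break
        if not reduced:
            for i in range(n):  # reduce to a complement
                complement = [x for j, p in enumerate(parts) if j != i for x in p]
                if test(complement) is Outcome.FAIL:
                    c, n, reduced = complement, max(n - 1, 2), True
                    break
        if not reduced:
            if n >= len(c):
                break  # finest granularity reached
            n = min(len(c), 2 * n)  # increase granularity
    return c


def toy_test(c):
    if 4 in c and 3 not in c:
        return Outcome.UNRESOLVED  # e.g. a change that needs another change
    return Outcome.FAIL if 3 in c and 7 in c else Outcome.PASS


print(ddmin(toy_test, list(range(1, 9))))
```

Unresolved outcomes are simply treated as "not the same failure", so they never cause a reduction.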


What Happens?

N = 8

Test(Cx \ C4) = X ⇒ call recursively with the new set Cx \ C4 and N = 7
– N = max(N-1, 2)


The Math Formulation
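The formula itself did not survive text extraction. For reference, here is a transcription of ddmin along the lines of Zeller's definition, using the notation of the previous slide: c_X is the failing configuration, split into n disjoint parts C_1, …, C_n of roughly equal size. Treat it as a reminder rather than an exact copy of the slide.

```latex
\mathit{ddmin}(c_X) = \mathit{ddmin}_2(c_X, 2)

\mathit{ddmin}_2(c_X, n) =
\begin{cases}
  \mathit{ddmin}_2(C_i, 2)
    & \text{if } \exists i \in \{1,\dots,n\} : \mathit{Test}(C_i) = X \\
  \mathit{ddmin}_2(c_X \setminus C_i, \max(n-1, 2))
    & \text{else if } \exists i \in \{1,\dots,n\} : \mathit{Test}(c_X \setminus C_i) = X \\
  \mathit{ddmin}_2(c_X, \min(|c_X|, 2n))
    & \text{else if } n < |c_X| \\
  c_X & \text{otherwise}
\end{cases}

\text{where } c_X = C_1 \cup \dots \cup C_n,\quad
C_i \cap C_j = \emptyset \ (i \neq j),\quad
|C_i| \approx |c_X| / n
```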


Back to gdb Problem

You apply smaller and smaller change sets to gdb 4.16 and test, until the change set is small enough to
– A) compile and run, and
– B) find one that reproduces the failure.

Note: there is no guarantee that it will be found!
– The change that contains the defect may rely on a previous change that is not part of the change set…


The Full dd Algorithm

The dd algorithm in the handed-out chapter is an extended version of ddmin.

It works from both ends:
– a failing set of deltas to be minimized (like ddmin)
  • all deltas / gdb 4.17
– a passing set of deltas to be maximized
  • no deltas / gdb 4.16

I tried to hand-run it but …
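Running dd by hand is tedious, so a small program may help. The Python sketch below follows one reading of Zeller's two-sided definition. It keeps a passing configuration, which grows from no deltas, and a failing one, which shrinks from all deltas. It splits their difference into n parts and moves either end towards the other. The names are illustrative, and so is the toy test: it fails when deltas 3 and 7 are both present and is unresolved when 4 appears without 3.

```python
from enum import Enum


class Outcome(Enum):
    PASS = "pass"
    FAIL = "X"
    UNRESOLVED = "?"


def split(c, n):
    """Split list c into n contiguous, non-empty parts of (almost) equal size."""
    k, r = divmod(len(c), n)
    parts, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < r else 0)
        parts.append(c[start:end])
        start = end
    return parts


def dd(test, c_pass, c_fail):
    """Narrow c_pass (a subset of c_fail) and c_fail towards each other;
    returns the final (c_pass, c_fail) pair."""
    assert test(c_pass) is Outcome.PASS and test(c_fail) is Outcome.FAIL
    n = 2

    def first(configs, outcome):
        return next((c for c in configs if test(c) is outcome), None)

    while True:
        delta = [x for x in c_fail if x not in c_pass]
        if len(delta) <= 1:
            return c_pass, c_fail
        n = min(n, len(delta))
        parts = split(delta, n)
        unions = [[x for x in c_fail if x in c_pass or x in d] for d in parts]
        minuses = [[x for x in c_fail if x not in d] for d in parts]
        if (c := first(unions, Outcome.FAIL)) is not None:
            c_fail, n = c, 2  # c_pass plus one part already fails
        elif (c := first(minuses, Outcome.PASS)) is not None:
            c_pass, n = c, 2  # c_fail minus one part already passes
        elif (c := first(unions, Outcome.PASS)) is not None:
            c_pass, n = c, max(n - 1, 2)  # grow the passing end
        elif (c := first(minuses, Outcome.FAIL)) is not None:
            c_fail, n = c, max(n - 1, 2)  # shrink the failing end
        elif n < len(delta):
            n = min(2 * n, len(delta))  # increase granularity
        else:
            return c_pass, c_fail


def toy_test(c):
    if 4 in c and 3 not in c:
        return Outcome.UNRESOLVED
    return Outcome.FAIL if 3 in c and 7 in c else Outcome.PASS


print(dd(toy_test, [], list(range(1, 9))))
```

The difference between the two returned configurations (here delta 3) is the failure-inducing change. This is what the gdb case asks for: one difference between a working version and a failing one. The sketch needs Python 3.8 or later for the `:=` operator.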


Summary

TRAFFIC
– An algorithm to help in debugging

Scientific process
– Debugging as iterations over hypothesis, prediction, observation, and conclusion.

Delta debugging
– Automating the search for the defect
  • by divide-and-conquer of the search space…
