SCIENTIFIC DEBUGGING CHAPTER 6.

What we have covered so far:

How to reproduce the problem effectively

How to simplify the problem

What we will cover in Chapter 6:

Understanding how the failure came to be

Applying the scientific method to debugging

Creating and verifying hypotheses by making experiments

Creating a systematic and explicit process for debugging


What exactly is an error in software code?

“An error (or fault) is a design flaw or a deviation from a desired or intended state. An error won't yield a failure without the conditions that trigger it. Example, if the program yields 2+2=5 on the 10th time you use it, you won't see the error before or after the 10th use. The failure is the program's actual incorrect or missing behavior under the error-triggering conditions. A symptom might be a characteristic of a failure that helps you recognize that the program has failed. Defect might refer to the failure or to the underlying error.”

- Mike Kelly (Director of Testing and Quality Assurance for Interactions, President of the Association for Software Testing and co-founder of the Indianapolis Workshops on Software Testing)

Source: http://searchsoftwarequality.techtarget.com/guide/Advice-from-Mike-Kelly

Errors: Cause and Effect

Causes

Syntax errors – incorrect grammar; not following language syntax

Misspelled variable name

Missing semicolons

Unmatched parentheses

Run-time errors – occur while the program is being run

Division by zero

Logical errors – produce incorrect results even though the program compiles and executes

How to prove them

Compiler warnings and messages – users are alerted to errors when the program is compiled or attempts to run

Result observation – comparing the actual results to those that are expected

Multiple, consistent runs/experiments – the higher the sample size, the more confident you can be about an error and its cause

Source: https://msdn.microsoft.com/en-us/library/s9ek7a19(v=vs.90).aspx
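The three kinds of error surface at different moments. The following is a minimal Python sketch, not taken from the cited source; the functions ratio and mean and the planted off-by-one bug are hypothetical and exist only for illustration.

```python
# Illustrative only: `ratio`, `mean`, and the planted bug are hypothetical.

# 1. Syntax error: the snippet is rejected before any of it runs.
try:
    compile("print('unmatched'", "<example>", "exec")
    syntax_outcome = "compiled"
except SyntaxError:
    syntax_outcome = "syntax error"

# 2. Run-time error: the code is well formed but fails while executing.
def ratio(a, b):
    return a / b

try:
    ratio(1, 0)
    runtime_outcome = "ok"
except ZeroDivisionError:
    runtime_outcome = "division by zero"

# 3. Logical error: runs to completion but returns the wrong result.
def mean(values):
    return sum(values) / (len(values) + 1)  # planted bug: divisor is off by one

expected = 2.0
actual = mean([1, 2, 3])
logic_outcome = "as expected" if actual == expected else "wrong result"

print(syntax_outcome, runtime_outcome, logic_outcome, sep=" | ")
```

Only the third case needs result observation: nothing is reported, so the failure shows up only when the actual value is compared with the expected one.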

Intuition comes with experience

Debugging “gurus” – have the ability to locate errors quickly and effectively

Gained such an ability through the development of intuition

Debugging intuition is built by past experiences with similar errors (the past is the greatest teacher)

The more experience you have working with a specific error, the easier it is to identify in different occurrences later

You can gather experience by working through a variety of errors and noticing patterns in how they develop

Speed up the process of locating errors by training your reasoning

Method for identifying failures

What is an effective method for finding an explanation for a failure within code?

Does not require a priori knowledge – We should not be expected to have experience working with this error

Works in a systematic and reproducible fashion – We should be able to follow a systematic process that allows us to reproduce our findings if needed.

Therefore: How do we systematically find an explanation for a failure?

Scientific Method

If our program fails:

We can no longer rely on our model of the program

We must explore the program independently

We treat the failing program as something that occurs naturally – a natural phenomenon

Scientific Method – method for developing or examining a theory that explains and predicts a natural phenomenon



Steps of the Scientific Method (General Processes)

1. Observe some aspect of the universe. What behavior would you like to examine in further detail?

2. Establish a hypothesis that is consistent with the observation.

Hypothesis – proposed explanation made on the basis of limited evidence as a starting point for further investigation

3. Use the hypothesis to make a specific prediction about the results.

4. Test the predictions by experiments and modify the hypothesis based on the results found.

5. Repeat steps 3 and 4 until there are no longer any discrepancies between the hypothesis and experiment.

Steps of the Scientific Method (General Processes) Cont.






Specific aspect that can be indisputably tested.

Simplify the event.

Make a plausible guess.

Compare expected results with actual.

Modify the original hypothesis with new knowledge.

Example of Scientific Method

Example situation: Turning on a flashlight but the light does not come on.

1. Observation – Even though I switch it on, the light does not come on.

2. Hypothesis – Since the flashlight has worked in the past, it could be possible that the batteries are dead.

3. Prediction – If I replace the batteries, the flashlight should turn on.

4. Experimentation – Replace the current batteries with new ones and switch the flashlight on. The light comes on.

Conclusion: The flashlight needed new batteries.

Theory – all discrepancies are gone from the hypothesis

Conceptual framework that:

explains earlier observations

predicts future observations

Theory vs. Hypothesis

Based on

Theory: Certainty, evidence, repeated testing

Hypothesis: Possibility, projection, prediction

Data

Theory: Wide set of data tested under multiple repeatable experimentations

Hypothesis: Limited scope

Instance

Theory: General; may be applied to various

Hypothesis: Very specific; limited to one observable instance

Source: http://www.diffen.com/difference/Hypothesis_vs_Theory

Steps of the Scientific Method (Specific to Debugging)

1. Observe a failure – should be found in the problem description.

2. Create a hypothesis that explains the failure cause.

3. Use the hypothesis to make reasonable predictions.

4. Test the hypothesis by experimentation and further observations.

5. Repeat steps 3 and 4 until hypothesis is fully refined.

Does the experiment satisfy the prediction? Refine the hypothesis.

Does it NOT satisfy the prediction? Create an alternative hypothesis.
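The debugging steps can be sketched as a small loop over explicit hypotheses. This is a simplified illustration under stated assumptions: the function percent and its bug are hypothetical, and the loop only records verdicts; choosing the next hypothesis after a confirmation or rejection stays with the programmer.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

def percent(part, whole):
    """Hypothetical function under investigation: percent(1, 4) should be 25."""
    return part // whole * 100

@dataclass
class Hypothesis:
    statement: str
    prediction: str
    experiment: Callable[[], bool]  # returns True when the prediction holds

def investigate(hypotheses: List[Hypothesis]) -> List[Tuple[str, str]]:
    """Run every experiment in order and record whether its prediction held."""
    findings = []
    for h in hypotheses:
        verdict = "confirmed" if h.experiment() else "rejected"
        findings.append((h.statement, verdict))
    return findings

findings = investigate([
    Hypothesis("percent() works",
               "percent(1, 4) == 25",
               lambda: percent(1, 4) == 25),
    Hypothesis("integer division truncates before the multiplication",
               "1 // 4 == 0",
               lambda: 1 // 4 == 0),
    Hypothesis("multiplying before dividing removes the failure",
               "1 * 100 // 4 == 25",
               lambda: 1 * 100 // 4 == 25),
])

for statement, verdict in findings:
    print(f"{verdict:>9}: {statement}")
```

The rejected first hypothesis forces an alternative; the two confirmed ones refine it step by step toward a diagnosis.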

Example of the scientific method that illustrates the error where MOZILLA crashes when attempting to print.

Gather sources in order to make a reasonable hypothesis, such as:

Problem description

Program code

Observation of a failed run

Alternative runs that offer deeper insight into the problem

The hypothesis can either be rejected or confirmed.

Confirmed – The prediction is testable and holds; the expected results match the actual results.

Rejected – The experiment produces actual results that do not match the expected results.

Results of Scientific Method

A successful iteration through the Scientific Method will produce a valid theory that explains the failure cause

It can clearly explain not only the current failure, but past failures as well (past observations)

It predicts future observations

This newly developed theory is a diagnosis

Applying Scientific Method: Sample Program Revisited

Let’s reintroduce the “sample” problem from earlier chapters:

First, let’s note exactly what our error is and why we classify it as an error:

What happened during the failing run? Why didn’t it meet expectations? What does it need to do?

The sample program is designed to take in command-line arguments and sort them in ascending order. The program fails with arguments “11 14”: the output contains an unexplained “0”.

Baseline Hypothesis: Sample Program Revisited

We can start with a simple and obvious baseline hypothesis that describes our failure:

Hypothesis The sample program works.

Prediction The output of sample 11 14 is “11 14”.

Experiment We run sample as previously.

Observation The output of sample 11 14 is “0 11”.

Conclusion The hypothesis is rejected.
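The baseline experiment can be replayed as a predicted-versus-observed check. The sketch below is a Python model of the C program, not the original code: shell_sort() is a stand-in built on sorted(), and the trailing 0 models, as an assumption, the memory cell that happens to follow the array.

```python
def shell_sort(a, size):
    """Stand-in for the C shell_sort(): sorts the first `size` cells in place."""
    a[:size] = sorted(a[:size])

def sample(argv):
    """Model of `sample`: argv[0] is the program name, the rest are numbers."""
    argc = len(argv)
    # Assumption: the cell after the argc - 1 numbers happens to hold 0.
    memory = [int(arg) for arg in argv[1:]] + [0]
    shell_sort(memory, argc)
    return " ".join(str(x) for x in memory[:argc - 1])

predicted = "11 14"
observed = sample(["sample", "11", "14"])
conclusion = "confirmed" if observed == predicted else "rejected"
print(f"predicted {predicted!r}, observed {observed!r}: hypothesis {conclusion}")
```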

Hypothesis #1 – Start simple.

Is the reported “0” an actual value in the program state? Let’s attempt to reduce the scope within which the infection can be found. We will create a simple hypothesis first that will help us target the obvious problem:

Hypothesis The execution causes a[0] to be 0.

Prediction a[0] = 0 should hold at line 37.

Experiment Determine the value of a[0] at line 37 using a debugger.

Observation a[0] = 0 holds as predicted.

Conclusion The hypothesis is confirmed.

Hypothesis #2 – Delve deeper.

With some information from the first hypothesis, we can start to make an assumption about the “0”. Let’s take a look at the shell_sort() function now:

Hypothesis The infection does not take place until


Prediction a[ ] = [11, 14] and size = 2 should hold at line 6.

Experiment Observe the resulting values of a[ ] and size.

Observation a[ ] = [11, 14, 0] and size = 3 holds.

Conclusion The hypothesis is rejected.
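Observing the arguments at function entry can also be done by instrumenting the function instead of using a debugger. This is a sketch on a Python model of the program; the stand-in shell_sort(), the entry_snapshots list, and the trailing 0 cell are assumptions made for illustration.

```python
entry_snapshots = []  # what we would read off in a debugger at function entry

def shell_sort(a, size):
    """Stand-in for the C shell_sort(): sorts the first `size` cells in place."""
    entry_snapshots.append((list(a[:size]), size))  # observe before sorting
    a[:size] = sorted(a[:size])

def sample(argv):
    argc = len(argv)
    # Assumption: the cell after the argc - 1 numbers happens to hold 0.
    memory = [int(arg) for arg in argv[1:]] + [0]
    shell_sort(memory, argc)
    return " ".join(str(x) for x in memory[:argc - 1])

sample(["sample", "11", "14"])

predicted = ([11, 14], 2)
observed = entry_snapshots[0]
conclusion = "confirmed" if observed == predicted else "rejected"
print(f"predicted {predicted}, observed {observed}: hypothesis {conclusion}")
```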

Hypothesis #3 – Becoming clearer.

Since the 0 is already in a[ ] BEFORE execution reaches the shell_sort() function, we can conclude that the infection does NOT take place within the function. Bad arguments are causing the failure:

Hypothesis Invocation of shell_sort() with size = 3 causes the failure.

Prediction If we manually change size from 3 to 2, the output should be “11 14”.

Experiment 1. Stop execution at shell_sort() (line 6). 2. Set size to 2. 3. Resume execution.

Observation The output is “11 14”, as predicted.

Conclusion The hypothesis is confirmed.
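Altering a value in the running program, as done here with the debugger, can be imitated with an override parameter. This is a sketch on a Python model of the program; forced_size is a hypothetical parameter that plays the role of setting size by hand, and the trailing 0 cell is an assumption.

```python
def shell_sort(a, size):
    """Stand-in for the C shell_sort(): sorts the first `size` cells in place."""
    a[:size] = sorted(a[:size])

def sample(argv, forced_size=None):
    argc = len(argv)
    # Assumption: the cell after the argc - 1 numbers happens to hold 0.
    memory = [int(arg) for arg in argv[1:]] + [0]
    size = argc
    if forced_size is not None:
        size = forced_size  # plays the role of "set size to 2" in the debugger
    shell_sort(memory, size)
    return " ".join(str(x) for x in memory[:argc - 1])

unchanged = sample(["sample", "11", "14"])
with_size_2 = sample(["sample", "11", "14"], forced_size=2)
print(f"size = 3: {unchanged!r}   size = 2: {with_size_2!r}")
```

Changing only size and nothing else is what makes the experiment conclusive: the single altered value is the only difference between the failing and the passing run.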

Hypothesis Invocation of shell_sort() with size = argc causes the failure.

Prediction If we change argc to argc – 1, the run should be successful.

Experiment Change argc to argc – 1 and run.

Observation As predicted.

Conclusion The hypothesis is confirmed.

Hypothesis #4 – Last iteration.

The infection location is now clear: the value of size comes from argc. You can observe that argc is the size of the array plus 1, because argc also counts the program name. We should be able to fix the failure by using (size = argc – 1) rather than (size = argc).


argc causes the failure

argc – 1 fixes the failure.
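The final diagnosis can be checked by running the same model with and without the fix. As before, this is a Python sketch rather than the original C code; the fixed flag, the stand-in shell_sort(), and the trailing 0 cell are assumptions.

```python
def shell_sort(a, size):
    """Stand-in for the C shell_sort(): sorts the first `size` cells in place."""
    a[:size] = sorted(a[:size])

def sample(argv, fixed):
    argc = len(argv)
    # Assumption: the cell after the argc - 1 numbers happens to hold 0.
    memory = [int(arg) for arg in argv[1:]] + [0]
    size = argc - 1 if fixed else argc  # the one-token fix under test
    shell_sort(memory, size)
    return " ".join(str(x) for x in memory[:argc - 1])

before = sample(["sample", "11", "14"], fixed=False)
after = sample(["sample", "11", "14"], fixed=True)
print(f"size = argc: {before!r}   size = argc - 1: {after!r}")

# A diagnosis should also predict future observations, so try more inputs.
more_inputs = (["5"], ["9", "8", "7"], ["3", "1", "2", "1"])
all_sorted = all(
    sample(["sample"] + numbers, fixed=True)
    == " ".join(str(n) for n in sorted(int(x) for x in numbers))
    for numbers in more_inputs
)
print("fix holds on further inputs:", all_sorted)
```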

Explicit Debugging

The Scientific Method process is explicit:

Stated clearly and in sufficient detail, without any room for doubt or confusion

Simply stating the problem itself can help you rethink your assumptions

This can also reveal crucial clues to the solution you are trying to find

Unfortunately, many programmers tend to be implicit with the debugging process

They are forced to memorize the relevant experiments and outcomes

Solution to implicit debugging: Keep a Logbook

Avoid storing observations and experiment results solely in your mind:

Keep a logbook – any form you like (written, electronic)

Keeping a well-written logbook lets you make evident, incremental progress in debugging

Benefits:

Log what observations you have previously made – each new observation can lead to new conclusions

Log what experiments you have already developed – avoid repeated work and save time

Log how you can proceed in the future – narrow down the possible causes of the failure one by one

Logbook Structure

Thorough statement of the problem

Hypotheses to target problem cause

Logical predictions corresponding to hypotheses

Experiments that were performed

Observed results of these experiments

Conclusions that a tester makes that can be used to narrow down the infection cause

An example logbook entry pertaining to the “sample” program.
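An electronic logbook can be as small as a list of records with the fields from the logbook structure. The sketch below is one possible layout, not a prescribed format; the LogEntry class and its render() method are hypothetical, while the entry texts are taken from the first two iterations on the sample program.

```python
from dataclasses import dataclass

FIELDS = ("hypothesis", "prediction", "experiment", "observation", "conclusion")

@dataclass
class LogEntry:
    hypothesis: str
    prediction: str
    experiment: str
    observation: str
    conclusion: str  # "confirmed" or "rejected"

    def render(self):
        """Format the entry as one labeled line per field."""
        return "\n".join(
            f"{name.capitalize():<12}{getattr(self, name)}" for name in FIELDS
        )

problem = "sample 11 14 prints '0 11' instead of '11 14'."
logbook = [
    LogEntry("The sample program works.",
             "The output of sample 11 14 is '11 14'.",
             "Run sample as previously.",
             "The output of sample 11 14 is '0 11'.",
             "rejected"),
    LogEntry("The execution causes a[0] to be 0.",
             "a[0] = 0 should hold at line 37.",
             "Determine the value of a[0] at line 37 using a debugger.",
             "a[0] = 0 holds as predicted.",
             "confirmed"),
]

print("Problem:", problem)
for entry in logbook:
    print()
    print(entry.render())

# Confirmed hypotheses are the ones the next hypothesis has to build on.
confirmed = [e.hypothesis for e in logbook if e.conclusion == "confirmed"]
```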

“Quick-And-Dirty” Approach

The formal, methodical approach of developing hypotheses and testing them may be tedious and/or unnecessary for simple problems:

Identify a problem you would like to

a solution in 30

Success? No

debugging necessary.

Use the Scientific Method and record results in a logbook

Problem is too complex and/or difficult to solve ad hoc.

Algorithmic Debugging

Partially automating the debugging process

Tool is used to guide the tester/programmer along an interactive debugging process

Asks a series of questions about the program output in order to locate the error in the code

Similar to a decision tree: new possible outcomes are derived from the user’s input until the final target is found

Possibilities are “branched” from top to bottom, starting with the first question asked

Example Python Sort Program: Applying algorithmic debugging

A sample Python program attempting to sort elements using the sort() function. The highlighted portion indicates the location where the error in the code can be found.

Example run through of an algorithmic debugging process.
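The question-and-answer process can be sketched in a few lines. The program below is not the one shown on the slide: the buggy insertion sort, the traced decorator, and the automatic oracle (which answers the questions a programmer would normally be asked) are all assumptions made for this illustration.

```python
class Call:
    """One node of the computation tree: a call, its result, and its sub-calls."""
    def __init__(self, name, args):
        self.name, self.args = name, args
        self.result = None
        self.children = []

roots, stack = [], []

def traced(fn):
    """Record every call of `fn` as a node in the computation tree."""
    def wrapper(*args):
        node = Call(fn.__name__, args)
        (stack[-1].children if stack else roots).append(node)
        stack.append(node)
        try:
            node.result = fn(*args)
        finally:
            stack.pop()
        return node.result
    return wrapper

@traced
def insert(elem, items):
    if not items:
        return [elem]
    head, tail = items[0], items[1:]
    if elem <= head:
        return items + [elem]  # planted bug: should be [elem] + items
    return [head] + insert(elem, tail)

@traced
def sort(items):
    if len(items) <= 1:
        return items
    return insert(items[0], sort(items[1:]))

def oracle(call):
    """Answer "is this result correct?"; a human would normally do this."""
    if call.name == "sort":
        return call.result == sorted(call.args[0])
    elem, items = call.args
    return call.result == sorted(items + [elem])

def locate(call):
    """Return a call whose result is wrong although all calls it made are right."""
    if oracle(call):
        return None
    for child in call.children:
        culprit = locate(child)
        if culprit is not None:
            return culprit
    return call

sort([2, 1, 3])
culprit = locate(roots[0])
print(f"{culprit.name}{culprit.args} = {culprit.result}")
```

Each oracle call corresponds to one question of the form “is sort([1, 3]) = [3, 1] correct?”; the search descends only into calls answered with “no”.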


Algorithmic Debugging (Continued)

Advantages

Systematic – follows a distinct set of questions

Compares the results of many computations to the programmer’s expected results

Works best for functional and logic programming languages – few or no side effects

Disadvantages

Not effective for real-world applications:

Process does not scale – too many functions to be inspected

Programmers prefer driving to being driven – process is rigid: Do not like being

SOLUTION: replace the tester with an oracle – an automatic device that tells correct results from incorrect ones (does not exist)

Deriving a Hypothesis

Stronger hypothesis

Less experimental


Faster the diagnosis of the


“effective” debugging:

Leverage as many knowledge sources as possible:

1. Description of the problem

2. Program code

3. Failing run results

4. Alternative runs

5. Earlier hypotheses

Resource #1: Description of the Problem

Should be concise, but sufficient

Utilize a simplified problem report

Source: https://www.cs.ubc.ca/labs/spl/projects/hipikat/scenarios.html

Resource #2: Program Code

Basis for all debugging techniques

Must be familiar with the internals of the program

Without program code, you are forced to create a workaround rather than fix the code itself

All tools must have access to the source code to be used successfully

Resource #3: The Failing Run

The program code allows you to speculate what the problem may be, but a concrete failing run will give you actual facts

Problem state

Contents of the running program code

Resource #4: Alternate Runs

Most interested in anomalies – aspects of the failing run that are very different from normal, successful runs

We must know what a “normal” run is – what it is meant to do and how

Used to compare between “normal” and “abnormal”

The more runs we complete, the better our chance of discovering what “normal” is supposed to be and how to create a successful run

Resource #5: Earlier hypotheses

All new hypotheses must be a refined result from previous hypotheses

Include all earlier hypotheses that passed – predictions were true and testable

Exclude all hypotheses that failed – predictions that did not match expectations

It is impossible to reach a final, target hypothesis if you cannot draw logical conclusions from previous experiments

Must be able to explain earlier observations

Reasoning Techniques

Different reasoning techniques used to debug programs based on a hierarchy of the number of runs required

Levels 0 – 1:

Deduction – Level 0: 0 runs

Move from the general to the particular

Reasoning from program code to concrete runs

In the form of mathematical proofs

Does not require knowledge of the concrete

Static analysis – findings obtained without running a program

Observation – Level 1: 1 run

Inspect aspects of an individual program run

Actual run of the program is required (dynamic)

Finds actual facts that cannot be denied

Levels 2 – 3:

Induction – Level 2: 2 runs

Reasoning from the particular to the general

Opposite of deduction

Used to summarize multiple program runs to some abstraction

Generates findings from multiple executions of the program

Experimentation – Level 3: “n” runs

A series of refined and detailed experiments as a result of previous iterations

Generates findings from multiple executions of the program that are controlled by the reasoning technique

1. "Hypothesis vs Theory." - Difference and Comparison. N.p., n.d. Web. 08 Oct. 2015. <http://www.diffen.com/difference/Hypothesis_vs_Theory>.

2. Silva, Josep. "A Comparative Study of Algorithmic Debugging Strategies." Logic-Based Program Synthesis and Transformation Lecture Notes in Computer Science (2007): 143-59. Web.

3. Zeller, Andreas. Why Programs Fail: A Guide to Systematic Debugging. Amsterdam: Elsevier/Morgan Kaufmann, 2006. Print.