Experimentation in Computer Science (Part 2)


Page 1: Experimentation in Computer Science (Part 2). Experimentation in Software Engineering --- Outline  Empirical Strategies  Measurement  Experiment Process

Experimentation in Computer Science (Part 2)

Page 2

Experimentation in Software Engineering --- Outline

Empirical Strategies
Measurement
Experiment Process

Page 3

Experiment Process: Phases

(Figure: the experiment process. Experiment Idea → Experiment Definition → Experiment Planning → Experiment Operation → Analysis & Interpretation → Presentation & Package → Conclusions)

Page 4

Experiment Process: Phases Defined

Experiment Idea: ask the right question (insight)
Experiment Definition: ask the question right
Experiment Planning: design the experiment to answer the question
Experiment Operation: collect metrics
Analysis and Interpretation: statistically evaluate and determine practical consequences
Presentation: disseminate the results

Page 5

Experiment Process: Phases

(Figure: the experiment process. Experiment Idea → Experiment Definition → Experiment Planning → Experiment Operation → Analysis & Interpretation → Presentation & Package → Conclusions)

Page 6

Experiment Definition: Overview

Formulate the experiment idea -- ask the right question
Define goals -- why conduct the experiment
State research questions:
Descriptive – what percentage of developers use OO?
Relational – what percentage of experienced / novice developers use OO?
Causal – what is the average productivity of developers using OO versus developers using non-OO?

Page 7

Experiment Definition: Overview – Example

How do test suite size and test case composition affect the costs and benefits of web testing methodologies?

Page 8

Experiment Process: Phases

(Figure: the experiment process. Experiment Idea → Experiment Definition → Experiment Planning → Experiment Operation → Analysis & Interpretation → Presentation & Package → Conclusions)

Page 9

Experiment Planning: Overview

(Figure: within the experiment process, Experiment Definition feeds Experiment Planning, which feeds Experiment Operation. Planning comprises: Context Selection → Hypothesis Formulation → Variables Selection → Selection of Subjects → Experiment Design → Instrumentation → Validity Evaluation)

Page 10

Experiment Planning: Context Selection

Context: environment and personnel. Dimensions include:
off-line vs. on-line
student vs. professional personnel
toy vs. real problems
specific vs. general software engineering domain
Selection drivers: validity vs. cost

Page 11

Experiment Planning: Hypothesis Formulation

Hypothesis: a formal statement related to a research question.
It forms the basis for statistical analysis of the results through hypothesis testing.
Data collected in the experiment are used, if possible, to reject the hypothesis.

Page 12

Experiment Planning: Hypothesis Formulation

There are two hypotheses for each question of interest:
Null hypothesis, H0: describes the state in which the prediction does not hold.
Alternative hypothesis, Ha, H1, etc.: describes the prediction we believe will be supported by evidence.
The goal of the experiment is to reject H0 with as high significance as possible; rejecting H0 then supports the alternative hypothesis.

Page 13

Experiment Planning: Hypothesis Formulation

Hypothesis testing involves risks:
Type I error: the probability of rejecting a true null hypothesis. In this case we infer a pattern or relationship that does not exist.
Type II error: the probability of not rejecting a false null hypothesis. In this case we fail to identify a pattern or relationship that does exist.
Power of a statistical test: the probability that the test will reveal a true pattern if the null hypothesis is false, i.e., 1 − P(Type II error).
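Power can be estimated by simulation: repeatedly draw data in which the null hypothesis is false, run the test, and count how often it rejects. The sketch below (not from the slides; sample size, effect size, and alpha are arbitrary choices) does this for a two-sample t-test using numpy and scipy.

```python
# Illustrative sketch: estimating the power of a two-sample t-test
# by simulation. All numbers here are made-up example values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, effect, alpha, reps = 20, 1.0, 0.05, 2000

rejections = 0
for _ in range(reps):
    control = rng.normal(0.0, 1.0, n)      # group where H0 would hold
    treated = rng.normal(effect, 1.0, n)   # group with a true effect
    if stats.ttest_ind(control, treated).pvalue < alpha:
        rejections += 1

power = rejections / reps   # estimate of 1 - P(Type II error)
print(f"estimated power: {power:.2f}")
```

With these settings the estimate lands near the theoretical power of roughly 0.87; shrinking the effect size or the sample size drives it down, which is exactly the Type II risk the slide describes.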

Page 14

Experiment Planning: Variable Selection

Types of variables to select:
Independent: manipulated by the investigator or by nature
Dependent: affected by changes in the independent variables
Also select:
measures and measurement scales
ranges for the variables
specific levels of the independent variables to be used

Page 15

Experiment Planning: Selection of Subjects/Objects

The selection process strongly affects the ability to generalize results.
Process for selecting subjects/objects:
identify the population U
draw a sample from U using a sampling technique

Page 16

Experiment Planning: Selection of Subjects/Objects

Probability sampling:
Simple random: randomly select from U
Systematic random: select the first subject from U at random, then select every nth subject after that
Stratified random: divide U into strata following a known distribution, then apply random sampling within each stratum
Non-probability sampling:
Convenience: select the nearest, most convenient subjects
Quota: used to get subjects from various elements of a population; convenience sampling is used within each element
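The three probability techniques can be sketched in a few lines. This is an illustrative example, not from the slides: the population is a toy list of 100 subject IDs, and the even/odd split standing in for strata is an arbitrary choice.

```python
# Illustrative sketch of simple, systematic, and stratified random sampling
# over a toy population of 100 subject IDs.
import random

random.seed(1)
population = list(range(100))

# Simple random: every subject equally likely to be chosen.
simple = random.sample(population, 10)

# Systematic random: random start, then every n-th subject (n = 10).
step = 10
start = random.randrange(step)
systematic = population[start::step]

# Stratified random: divide the population into strata (here a made-up
# even/odd split standing in for e.g. novice vs. experienced), then
# sample randomly within each stratum.
strata = {
    "novice": [p for p in population if p % 2 == 0],
    "experienced": [p for p in population if p % 2 == 1],
}
stratified = {name: random.sample(group, 5) for name, group in strata.items()}

print(len(simple), len(systematic), sum(len(s) for s in stratified.values()))
```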

Page 17

Experiment Planning: Selection of Subjects/Objects

Larger sample sizes result in lower error.
If the population has large variability, a larger sample size is needed.
Data analysis methods may influence the choice of sample size.
However, a higher sample size implies higher cost.
Hence, we want a sample as small as possible, but large enough that we can generalize!
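The precision/cost trade-off can be made concrete with the classic sample-size formula for estimating a population mean, n = (z·σ / E)². The sketch below is illustrative only (σ and the margins E are made-up values, not from the slides): halving the acceptable error roughly quadruples the required sample.

```python
# Illustrative sketch: required sample size n = (z * sigma / E)^2 for
# estimating a mean. sigma and the margins of error are made-up values.
import math

z = 1.96       # z-value for 95% confidence
sigma = 15.0   # assumed population standard deviation
margins = (10.0, 5.0, 2.5)

ns = [math.ceil((z * sigma / e) ** 2) for e in margins]
for e, n in zip(margins, ns):
    print(f"margin of error {e}: need n = {n}")
```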

Page 18

Experiment Planning: Experiment Design – Principles

Randomization: statistical methods require that observations be made from independent random variables; this applies to subjects, objects, and treatments.
Blocking: given a factor that may affect results but that we are not interested in, we block subjects, objects, or techniques with respect to that factor and analyze the blocks independently (e.g., program in the TSE paper).
Balancing: assign treatments such that each has an equal number of subjects; not essential, but it simplifies and strengthens the statistical analysis.

Page 19

Experiment Planning: Experiment Design – Design Types

We will consider several design types, suitable for experiments with:
one factor with two treatments
one factor with more than two treatments
two factors with two treatments
more than two factors, each with two treatments

Notation: μi is the mean of the dependent variable for treatment i.

Page 20

Experiment Planning: Experiment Design – 1 Fctr, 2 Trtmts

• Design type: completely randomized
• Description: simple means comparison
• Example hypotheses:
  H0: μ1 = μ2
  H1: μ1 ≠ μ2, μ1 > μ2, or μ1 < μ2
• Examples of analyses: t-test, Mann-Whitney

(Table: subjects 1–6, each randomly assigned an X under exactly one of Trtmt 1 or Trtmt 2)

Page 21

Experiment Planning: Experiment Design – 1 Fctr, 2 Trtmts

• Design type: completely randomized
• Description: simple means comparison
• Example hypotheses:
  H0: μ1 = μ2
  H1: μ1 ≠ μ2, μ1 > μ2, or μ1 < μ2
• Examples of analyses: t-test, Mann-Whitney

EXAMPLE: Investigate whether humans using a new testing method detect faults better than humans using a previous method. The factor is the method, the treatments are the old and new methods, and the dependent variable could be the number of faults found.
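An analysis of this completely randomized design might look like the sketch below. It is illustrative only: the fault counts are made up, and scipy is assumed as the analysis library.

```python
# Illustrative sketch: comparing two independent groups with a t-test
# (parametric) and Mann-Whitney (nonparametric). Data are made up.
from scipy import stats

faults_old = [3, 4, 2, 5, 3, 4]   # subjects assigned the old method
faults_new = [6, 7, 5, 8, 6, 7]   # subjects assigned the new method

t_res = stats.ttest_ind(faults_old, faults_new)            # H0: mu1 = mu2
u_res = stats.mannwhitneyu(faults_old, faults_new,
                           alternative="two-sided")        # rank-based analogue
print(f"t-test p = {t_res.pvalue:.4f}, Mann-Whitney p = {u_res.pvalue:.4f}")
```

With these (deliberately well-separated) groups both tests reject H0 at the 0.05 level; the Mann-Whitney test is the fallback when the normality assumption behind the t-test is doubtful.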

Page 22

Experiment Planning: Experiment Design – 1 Fctr, 2 Trtmts

• Design type: paired comparison
• Description: compare differences between techniques more precisely; beware learning effects
• Example hypotheses:
  H0: μd = 0 (μd = mean of the differences)
  H1: μd ≠ 0, μd > 0, or μd < 0
• Examples of analyses: paired t-test, Sign test, Wilcoxon

Subjects  Trtmt 1  Trtmt 2
1         2        1
2         1        2
3         2        1
4         2        1
5         1        2
6         1        2
(Entries give the order in which each subject applies each treatment.)

Page 23

Experiment Planning: Experiment Design – 1 Fctr, 2 Trtmts

EXAMPLE: Investigate whether a new testing criterion facilitates fault detection better than a previous criterion. The factor is the criterion, the treatments are use of the old and new criteria, and the dependent variable could be the number of faults found.

• Design type: paired comparison
• Description: compare differences between techniques more precisely; beware learning effects
• Example hypotheses:
  H0: μd = 0 (μd = mean of the differences)
  H1: μd ≠ 0, μd > 0, or μd < 0
• Examples of analyses: paired t-test, Sign test, Wilcoxon
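Because every subject applies both criteria, the analysis works on per-subject differences rather than group means. The sketch below is illustrative (made-up fault counts; scipy assumed).

```python
# Illustrative sketch: paired comparison of two criteria applied by the
# same subjects. Tests the per-subject differences. Data are made up.
from scipy import stats

faults_old = [3, 4, 2, 5, 3, 4]   # per-subject, old criterion
faults_new = [5, 6, 4, 8, 5, 6]   # same subjects, new criterion

t_res = stats.ttest_rel(faults_old, faults_new)   # H0: mu_d = 0
w_res = stats.wilcoxon(faults_old, faults_new)    # nonparametric analogue
print(f"paired t p = {t_res.pvalue:.4f}, Wilcoxon p = {w_res.pvalue:.4f}")
```

Pairing removes the between-subject variability from the comparison, which is why the slide says it compares differences "more precisely"; the learning-effect caveat is why the application order is counterbalanced in the table above.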

Page 24

Experiment Planning: Experiment Design – 1 Fctr, 3+ Trtmts

• Design type: completely randomized
• Description: means comparison
• Example hypotheses:
  H0: μ1 = μ2 = μ3 = … = μa
  H1: μi ≠ μj for some (i, j)
• Examples of analyses: ANOVA, Kruskal-Wallis

(Table: subjects 1–6, each randomly assigned an X under exactly one of Trtmt 1, Trtmt 2, or Trtmt 3)

Page 25

Experiment Planning: Experiment Design – 1 Fctr, 3+ Trtmts

• Design type: completely randomized
• Description: means comparison
• Example hypotheses:
  H0: μ1 = μ2 = μ3 = … = μa
  H1: μi ≠ μj for some (i, j)
• Examples of analyses: ANOVA, Kruskal-Wallis

EXAMPLE: Investigate whether humans using a new testing method detect faults better than humans using two previous methods. The factor is the method, the treatments are the new method and the two old methods, and the dependent variable could be the number of faults found.
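With three treatment groups the pairwise t-test is replaced by an omnibus test. The sketch below is illustrative (made-up fault counts; scipy assumed): one-way ANOVA tests H0: μ1 = μ2 = μ3, and Kruskal-Wallis is its rank-based analogue.

```python
# Illustrative sketch: one factor with three treatments, analyzed with
# one-way ANOVA and Kruskal-Wallis. Fault counts are made up.
from scipy import stats

faults_m1 = [3, 4, 2, 5, 3, 4]   # old method 1
faults_m2 = [4, 5, 3, 5, 4, 4]   # old method 2
faults_m3 = [7, 8, 6, 9, 7, 8]   # new method

f_res = stats.f_oneway(faults_m1, faults_m2, faults_m3)   # H0: mu1 = mu2 = mu3
k_res = stats.kruskal(faults_m1, faults_m2, faults_m3)    # nonparametric
print(f"ANOVA p = {f_res.pvalue:.4f}, Kruskal-Wallis p = {k_res.pvalue:.4f}")
```

A rejection only says some μi ≠ μj; identifying which treatment differs requires a follow-up (post hoc) comparison.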

Page 26

Experiment Planning: Experiment Design – 1 Fctr, 3+ Trtmts

• Design type: randomized complete block
• Description: compare differences; especially useful when there is large variability between subjects
• Example hypotheses:
  H0: μ1 = μ2 = μ3 = … = μa
  H1: μi ≠ μj for some (i, j)
• Examples of analyses: ANOVA, Kruskal-Wallis

Subjects  Trtmt 1  Trtmt 2  Trtmt 3
1         1        3        2
2         3        1        2
3         2        3        1
4         2        1        3
5         3        2        1
6         1        2        3
(Entries give the order in which each subject applies each treatment.)

Page 27

Experiment Planning: Experiment Design – 1 Fctr, 3+ Trtmts

• Design type: randomized complete block
• Description: compare differences; especially useful when there is large variability between subjects
• Example hypotheses:
  H0: μ1 = μ2 = μ3 = … = μa
  H1: μi ≠ μj for some (i, j)
• Examples of analyses: ANOVA, Kruskal-Wallis

EXAMPLE: Investigate whether a new testing criterion facilitates fault detection better than two previous criteria. The factor is the criterion, the treatments are use of the new and old criteria, and the dependent variable could be the number of faults found.

Page 28

Experiment Planning: Experiment Design – 2 Fctrs, 2 Trtmts

• Design type: 2*2 factorial, 2 treatments per factor
• Three hypotheses:
  effect of treatment Ai
  effect of treatment Bi
  effect of the interaction between Ai and Bi
• Example hypotheses (instantiated for each treatment and for the interaction):
  H0: τ1 = τ2 = 0
  H1: at least one τi ≠ 0
• Examples of analyses: ANOVA

                    Factor A
          Trtmt A1      Trtmt A2
Factor B
Trtmt B1  Subjects 4,6  Subjects 1,7
Trtmt B2  Subjects 2,3  Subjects 5,8

Page 29

Experiment Planning: Experiment Design – 2 Fctrs, 2 Trtmts

Example: Investigate regression testability of code using retest-all and regression test selection, in the case where tests are coarse-grained and the case where they are fine-grained. Factor A is the technique, Factor B is the granularity. The design is 2*2 factorial because both factors have 2 treatments and every combination of treatments occurs.

• Design type: 2*2 factorial, 2 treatments per factor
• Three hypotheses:
  effect of treatment Ai
  effect of treatment Bi
  effect of the interaction between Ai and Bi
• Example hypotheses (instantiated for each treatment and for the interaction):
  H0: τ1 = τ2 = 0
  H1: at least one τi ≠ 0
• Examples of analyses: ANOVA
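The three effects a 2*2 factorial separates can be computed directly from the four cell means, which is a useful sanity check before running a full ANOVA. The sketch below is illustrative only; the cell means are made-up numbers, not from any study in the slides.

```python
# Illustrative sketch: main effects and interaction in a 2*2 factorial,
# computed from cell means. The numbers are made up (e.g. mean cost per
# technique/granularity combination).
import numpy as np

# rows: Trtmt B1, Trtmt B2; columns: Trtmt A1, Trtmt A2
cell_means = np.array([[10.0, 14.0],
                       [12.0, 20.0]])

# Main effect of A: average change when moving from A1 to A2.
effect_A = cell_means[:, 1].mean() - cell_means[:, 0].mean()
# Main effect of B: average change when moving from B1 to B2.
effect_B = cell_means[1, :].mean() - cell_means[0, :].mean()
# Interaction: half the difference between the A-effect at B2 and at B1.
interaction = ((cell_means[1, 1] - cell_means[1, 0])
               - (cell_means[0, 1] - cell_means[0, 0])) / 2

print(effect_A, effect_B, interaction)  # 6.0 4.0 2.0
```

A nonzero interaction means the effect of the technique depends on the granularity, which is exactly the third hypothesis the slide lists.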

Page 30

Experiment Planning: Experiment Design – k Fctrs, 2 Trtmts

Given k factors, results can depend on each factor or on interactions among them.
A 2^k design has k factors, each with two treatments, and tests all combinations.
Hypotheses and analyses are the same as for the 2*2 factorial.

Fctr A  Fctr B  Fctr C  Sbjcts
A1      B1      C1      2, 3
A2      B1      C1      1, 13
A1      B2      C1      5, 6
A2      B2      C1      10, 16
A1      B1      C2      7, 15
A2      B1      C2      8, 11
A1      B2      C2      4, 9
A2      B2      C2      12, 14

Page 31

Experiment Planning: Experiment Design – k Fctrs, 2 Trtmts

As the number of factors grows, expense grows. If high-order interactions can be assumed to be negligible, it is possible to run a fraction of the complete factorial.
This approach may be used, in particular, for exploratory studies, to identify factors having large effects.
Strengthen results by running other fractions in sequence.

Fctr A  Fctr B  Fctr C  Sbjcts
A2      B1      C1      2, 3
A1      B2      C1      1, 8
A1      B1      C2      5, 6
A2      B2      C2      4, 7

One-half fractional factorial design of the 2^k factorial design. Select combinations such that if one factor is removed, the remaining design is a full 2^(k-1) factorial.
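For k = 3 the table above matches the standard construction: code the two treatments of each factor as -1/+1 and keep the runs whose levels multiply to +1 (the defining relation I = ABC). The sketch below is illustrative, not from the slides, and checks the property the slide states: dropping any one factor from the half fraction leaves a full 2^(k-1) design.

```python
# Illustrative sketch: a full 2^3 factorial and its one-half fraction
# built from the defining relation I = ABC (keep runs with product +1).
from itertools import product

k = 3
full = list(product([-1, +1], repeat=k))                    # all 2^3 = 8 runs
half = [run for run in full if run[0] * run[1] * run[2] == +1]  # 4 runs

print(len(full), len(half))   # 8 4
# Dropping any single factor projects the half onto a full 2^(k-1) design:
for drop in range(k):
    projected = {run[:drop] + run[drop + 1:] for run in half}
    print(f"without factor {drop}: {len(projected)} distinct runs")
```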