48

Click here to load reader

Data Mining: The Missing Link in Empirical SE

Embed Size (px)

DESCRIPTION

There is an old joke where two scientists are staring at a blackboard. In between the equations on the left and the equations on the right, it is written "and then a miracle occurs". The older scientist points to this missing link and comments "I think you need to be more explicit in step two" (see http://socsci2.ucsd.edu/~aronatas/project/cartoon.math.gif) If you read the empirical SE literature, you can see a similar "missing link" in recent descriptions of how to conduct empirical SE studies. In one paper I just read, there was: • 9 pages: on selecting methods • 3 pages: on research questions • 2 pages: on different forms of "empirical truth" • 2 pages: on empirical validity • 1 page: on the role of theory building • 1 page: on data collection techniques • And 0 pages: on data analysis (and then a miracle happens) Speaking as a data mining researcher, I'm asserting that selecting data analysis methods deserves more than 0 pages. Data analysis is not a simple matter of loading up a box with data, then pressing a button to produce the output. The actual procedure is far more complex and, if done correctly, can lead to radical new insights that drive the next generation of the theorizing and experimentation. This talk will illustrate this process with detailed examples from effort estimation and defect prediction.

Citation preview

Page 1: Data Mining: The Missing Link in Empirical SE

Data Mining: the Missing Link in Empirical Software Engineering

Tim Menzies, Adam Brady : WVU, USA Jacky Keung : NICTA, Oz

Page 2: Data Mining: The Missing Link in Empirical SE

Sound bites •  An observation:

–  Surprisingly few general SE results.

•  A requirement: –  Need simple methods for finding local lessons.

•  Take home lesson: –  Finding useful local lessons is remarkably simple –  E.g. using “W”

Page 3: Data Mining: The Missing Link in Empirical SE

Remember the old joke?

Page 4: Data Mining: The Missing Link in Empirical SE

SE Methods Guidance (2007) •  9 pages: selecting methods •  3 pages: research questions •  2 pages: empirical validity •  2 pages: different forms of "empirical truth" •  1 page: role of theory building •  1 page: conclusions •  1 page: data collection techniques •  0 pages: data analysis

–  and then a miracle happens

•  Data analysis needs more than 0 pages –  Properly done, data analysis replaces, not augments, standard empirical methods

Page 5: Data Mining: The Missing Link in Empirical SE

Two views Olde: Positivist tradition •  Problem statement •  Research objective, context •  Experiment (goals, materials,

tasks, hypotheses, design) •  Collection, hypothesis testing •  Etc

•  Release the Ph.D. students •  Wait N years •  Do it once, then celebrate

Page 6: Data Mining: The Missing Link in Empirical SE

Two views Olde: Positivist tradition •  Problem statement •  Research objective, context •  Experiment (goals, materials,

tasks, hypotheses, design) •  Collection, hypothesis testing •  Etc

•  Release the Ph.D. students •  Wait N years •  Do it once, then celebrate

Me, 1988

Page 7: Data Mining: The Missing Link in Empirical SE

Two views Olde: Positivist tradition •  Problem statement •  Research objective, context •  Experiment (goals, materials,

tasks, hypotheses, design) •  Collection, hypothesis testing •  Etc

•  Release the Ph.D. students •  Wait N years •  Do it once, then celebrate

Me, 1988

Page 8: Data Mining: The Missing Link in Empirical SE

Two views New : “SAIL” Olde: Positivist tradition

•  Problem statement •  Research objective, context •  Experiment (goals, materials,

tasks, hypotheses, design) •  Collection, hypothesis testing •  Etc

•  Release the Ph.D. students •  Wait N years •  Do it once, then celebrate

•  S imple, semi-automated •  A gent •  I ncremental, instantaneous •  L essons learned, librarian

•  Real-time, anytime monitoring/alert/repair on organizational data

•  Do it all the time, every time

Page 9: Data Mining: The Missing Link in Empirical SE

9/44

Page 10: Data Mining: The Missing Link in Empirical SE

Have we lived up to our PROMISE?

Few general results

•  PROMISE 2005 … 2009 : 64 presentations •  48 papers

–  tried a new analysis on old data –  Or reported a new method that worked once for one project.

•  4 papers –  argued against model generality

•  9 papers –  questioned validity of prior results

•  E.g. Menzies et al. Promise 2006 –  100 times

•  Select 90% of the training data •  Find<a,b> in effort = x.a.LOC b

Page 11: Data Mining: The Missing Link in Empirical SE

Only 11% of papers proposed general models

Have we lived up to our PROMISE?

•  E.g. Ostrand, Weyuker, Bell ‘08, ‘09 –  Same functional form –  Predicts defects for generations of AT&T software

•  E.g. Turhan, Menzies, Bener ’08, ‘09 –  10 projects

•  Learn on 9 •  Apply to the 10th

–  Defect models learned from NASA projects work for Turkish whitegoods software

•  Caveat: need to filter irrelevant training examples

Page 12: Data Mining: The Missing Link in Empirical SE

Lessons learned are very localized

Less Promising Results

•  FSE’09: Zimmerman et al. –  Defect models

not generalizable •  Learn “there”, apply

“here” only works in 4% of their 600+ experiments

–  Opposite to Turhan’09 results •  ?add relevancy filter

•  ASE’09: Green, Menzies et al. –  AI search for better software project options –  Conclusions highly dependent on

local business value proposition

•  And others –  TSE’06: Menzies, Greenwald –  Menzies et al. in ISSE 2007 –  Zannier et al ICSE’06

Page 13: Data Mining: The Missing Link in Empirical SE

The gods are (a little) angry

Overall

•  Fenton at PROMISE’ 07 –  "... much of the current software metrics research is

inherently irrelevant to the industrial mix ...” –  "... any software metrics program that depends on some

extensive metrics collection is doomed to failure ...”

•  Budgen & Kitchenham: –  “Is Evidence Based Software Engineering mature

enough for Practice & Policy? ” –  Need for better reporting: more reviews. –  Empirical SE results too immature for making

policy.

•  Basili : still far to go –  But we should celebrate the progress made over

the last 30 years. –  And we are turning the corner

Page 14: Data Mining: The Missing Link in Empirical SE

Experience Factories

•  Basili’09 (pers. comm.): –  “All my papers have the same form. –  “For the project being studied, we find that changing X improved Y.”

•  Translation (mine): –  Even if we can’t find general models (which seem to be quite rare)…. –  … we can still research general methods for finding local lessons

learned

Methods to find local lessons

Page 15: Data Mining: The Missing Link in Empirical SE

If we can’t find general models, is it science?

Popper ‘60 –  Everything is a “hypothesis” –  And the good ones have

weathered the most attack –  SE “theories” aren’t even

“hypotheses”

Endres & Rombach ‘03 –  Distinguish “observations”,

“laws”, “theory” –  Laws predict repeatable

observations –  Theories explain laws –  Laws are

•  Hypotheses (tentatively accepted) •  Conjectures (guesses)

Gregor’06 : five types of “theory”: 1.  Analysis (e.g. ontologies,

taxonomies) 2.  Explanation (but it is hard to

explain “explanation”) 3.  Prediction (some predictors do

not explain) 4.  Explanation and prediction 5.  “models” for design + action

–  Don’t have to be “right” –  Just “useful” –  A.k.a. Endres & Rombach’s

“laws”?

Page 16: Data Mining: The Missing Link in Empirical SE

If we can’t find general models, is it science?

Popper ‘60 –  Everything is a “hypothesis” –  And the good ones have

weathered the most attack –  SE “theories” aren’t even

“hypotheses”

Endres & Rombach ‘03 –  Distinguish “observations”,

“laws”, “theory” –  Laws predict repeatable

observations –  Theories explain laws –  Laws are

•  Hypotheses (tentatively accepted) •  Conjectures (guesses)

Gregor’06 : five types of “theory”: 1.  Analysis (e.g. ontologies,

taxonomies) 2.  Explanation (but it is hard to

explain “explanation”) 3.  Prediction (some predictors do

not explain) 4.  Explanation and prediction 5.  “models” for design + action

–  Don’t have to be “right” –  Just “useful” –  A.k.a. Endres & Rombach’s

“laws”?

Need ways to quickly build and maintain domain- specific SE models

Page 17: Data Mining: The Missing Link in Empirical SE

The rest of this talk: “W”

•  Bayesian case-based contrast-set learner

–  uses greedy search –  illustrates the “local lessons” effect –  offers functionality missing in

the effort-estimation literature

•  Fast generator of baseline results –  There are too few baseline results –  And baseline results can be very

interesting (humbling).

•  A very (very) simple algorithm –  Should add it to your toolkit –  At least, as the “one to beat”

17/46

A “local lessons” finder

Holte’85 • C4: builds decision trees “N” deep • 1R: builds decision trees “1” deep • For datasets with 2 classes, 1R ≈ C4

accuracy

Page 18: Data Mining: The Missing Link in Empirical SE

18/47

“W”= a minimal machine for making models

Page 19: Data Mining: The Missing Link in Empirical SE

“W”= Simple (Bayesian) Contrast Set Learning (in linear time)

•  “best” = target class (e.g. “survive”) •  “rest” = other classes •  x = any range (e.g. “sex=female”) •  f(x|c) = frequency of x in class c

•  b = f( x | best ) / F(best) •  r = f( x | rest ) / F(rest)

•  LOR= log(odds ratio) = log(b/r) –  ? normalize 0 to max = 1 to 100

•  s = sum of LORs –  e = 2.7183 … –  p = F(B) / (F(B) + F(R))

–  P(B) = 1 / (1 + e^(-1*ln(p/(1 - p)) - s ))

Mozina: KDD’04

Page 20: Data Mining: The Missing Link in Empirical SE

“W”= Simple (Bayesian) Contrast Set Learning (in linear time)

•  “best” = target class •  “rest” = other classes •  x = any range (e.g. sex = female) •  f(x|c) = frequency of x in class c

•  b = f( x | best ) / F(best) •  r = f( x | rest ) / F(rest)

•  LOR= log(odds ratio) = log(b/r) –  ? normalize 0 to max = 1 to 100

•  s = sum of un-normalized LORs –  e = 2.7183 … –  p = F(B) / (F(B) + F(R)) –  P(B) = 1 / (1 + e^(-1*ln(p/(1 - p)) - s ))

Mozina: KDD’04

Page 21: Data Mining: The Missing Link in Empirical SE

“W”= Simple (Bayesian) Contrast Set Learning (in linear time)

•  “best” = target class •  “rest” = other classes •  x = any range (e.g. sex = female) •  f(x|c) = frequency of x in class c

•  b = f( x | best ) / F(best) •  r = f( x | rest ) / F(rest)

•  LOR= log(odds ratio) = log(b/r) –  ? normalize 0 to max = 1 to 100

•  s = sum of un-normalized LORs –  e = 2.7183 … –  p = F(B) / (F(B) + F(R)) –  P(B) = 1 / (1 + e^(-1*ln(p/(1 - p)) - s ))

Mozina: KDD’04

Page 22: Data Mining: The Missing Link in Empirical SE

“W”:Simpler (Bayesian) Contrast Set Learning (in linear time)

•  “best” = target class •  “rest” = other classes •  x = any range (e.g. sex = female) •  f(x|c) = frequency of x in class c

•  b = f( x | best ) / F(best) •  r = f( x | rest ) / F(rest)

•  LOR= log(odds ratio) = log(b/r) –  ? normalize 0 to max = 1 to 100

•  s = sum of LORs –  e = 2.7183 … –  p = F(B) / (F(B) + F(R)) –  P(B) = 1 / (1 + e^(-1*ln(p/(1 - p)) - s ))

Mozina: KDD’04

“W”: 1)  Discretize data and outcomes 2)  Count frequencies of ranges in classes 3)  Sort ranges by LOR 4) Greedy search on top ranked ranges

Page 23: Data Mining: The Missing Link in Empirical SE

LOR and Defect Prediction

•  Data from Norman Fenton’s Bayes Net

•  Classes= –  round(log2(defects/KLOC))/2)

•  Target class = max(class) –  I.e. worse defects

•  Only a few features matter

•  Only a few ranges of those features matter

Page 24: Data Mining: The Missing Link in Empirical SE

Preliminaries

“W” + CBR

•  “Query” –  What kind of project you want to analyze; e.g.

•  Analysts not so clever, •  High reliability system •  Small KLOC

•  “Cases” –  Historical records, with their development effort

•  Output: –  A recommendation on how to change our projects

in order to reduce development effort

Page 25: Data Mining: The Missing Link in Empirical SE

Cases

train test

Cases map features F to a utility F= Controllables + others

Page 26: Data Mining: The Missing Link in Empirical SE

Cases

train test

Cases map features F to a utility F= Controllables + others

(query ⊆ ranges)

relevant

k-NN

Page 27: Data Mining: The Missing Link in Empirical SE

Cases

train test

(query ⊆ ranges)

relevant

Best utilities

rest

x

x

b = F(x | best) / F(best)

r = F(x | rest) / F(rest)

k-NN

Cases map features F to a utility F= Controllables + others

Page 28: Data Mining: The Missing Link in Empirical SE

Cases

train test

(query ⊆ ranges)

relevant

Best utilities

rest

x

x

S = all x sorted descending by score

if controllable(x) && b > r && b > min then score(x) = log(b/r) else score(x) = 0 fi

k-NN

Cases map features F to a utility F= Controllables + others

b = F(x | best) / F(best)

r = F(x | rest) / F(rest)

Page 29: Data Mining: The Missing Link in Empirical SE

Cases

train test

(query ⊆ ranges)

relevant

Best utilities

rest

x

x

S = all x sorted descending by score

queryi* = query + ∪iSi

treatedi

k-NN

k-NN

Cases map features F to a utility F= Controllables + others

if controllable(x) && b > r && b > min then score(x) = log(b/r) else score(x) = 0 fi

b = F(x | best) / F(best)

r = F(x | rest) / F(rest)

Page 30: Data Mining: The Missing Link in Empirical SE

Cases

train test

(query ⊆ ranges)

relevant

Best utilities

rest

x

x

S = all x sorted descending by score

queryi* = query + ∪iSi

treatedi

k-NN

k-NN i

utility

spread

Cases map features F to a utility F= Controllables + others

if controllable(x) && b > r && b > min then score(x) = log(b/r) else score(x) = 0 fi

b = F(x | best) / F(best)

r = F(x | rest) / F(rest)

median

Page 31: Data Mining: The Missing Link in Empirical SE

Cases

train test

(query ⊆ ranges)

relevant

Best utilities

rest

x

x

S = all x sorted descending by score

queryi* = query + ∪iSi

treatedi

k-NN

k-NN i

q0* qi*

As is To be

Cases map features F to a utility F= Controllables + others

if controllable(x) && b > r && b > min then score(x) = log(b/r) else score(x) = 0 fi

treatment

b = F(x | best) / F(best)

r = F(x | rest) / F(rest)

i

utility

spread

median

Page 32: Data Mining: The Missing Link in Empirical SE

#1: Brooks’s Law

Page 33: Data Mining: The Missing Link in Empirical SE

Some tasks have inherent temporal constraints

Which no amount of $$$ can change

Page 34: Data Mining: The Missing Link in Empirical SE

Brooks’s Law (1975) “Adding manpower (sic) to a late project makes it later.”

Inexperience of new comers • Extra communication overhead • Slower progress

Page 35: Data Mining: The Missing Link in Empirical SE

“W”, CBR, & Brooks’s law Can we mitigate for decreased experience?

•  Data: –  Nasa93.arff (from promisedata.org)

•  Query: –  Applications Experience

•  “aexp=1” : under 2 months –  Platform Experience

•  “plex=1” : under 2 months –  Language and tool experience

•  “ltex = 1” : under 2 months

•  For nasa93, inexperience does not always delay the project –  if you can reign in the DB requirements.

•  So generalities may be false –  in specific circumstances

Need ways to quickly build and maintain domain- specific SE models

Page 36: Data Mining: The Missing Link in Empirical SE

#2 , #3,…. #13

Page 37: Data Mining: The Missing Link in Empirical SE

Results (distribution of development efforts in qi*)

Cases from promisedata.org/data

Median = 50% percentile Spread = 75% - 25% percentile

Improvement = (X - Y) / X •  X = as is •  Y = to be •  more is better

Usually: •  spread ≥ 75% improvement •  median ≥ 60% improvement

0%

50%

100%

150%

-50% 0% 50% 100% 150%

median improvement

spre

ad im

pro

vem

ent

Using cases from http://promisedata.org

Page 38: Data Mining: The Missing Link in Empirical SE

And that’s the whole point But that was so easy

•  Yes, finding local lessons learned need not be difficult

•  Strange to say… – There are no references in the CBR effort

estimation literature for anything else than estimate = nearest neighbors

– No steps beyond into planning , etc – Even though that next steps is easy

Page 39: Data Mining: The Missing Link in Empirical SE

What should be change?

“W” learns treatments = qj* - qo*

Page 40: Data Mining: The Missing Link in Empirical SE

Good news Improving estimates requires very few changes

Page 41: Data Mining: The Missing Link in Empirical SE

Not-so-good news Local lessons are very localized

Page 42: Data Mining: The Missing Link in Empirical SE

Not-so-good news Local lessons are very localized

Page 43: Data Mining: The Missing Link in Empirical SE

Most certainly!

Can we do better than “W”?

•  “W” contains at least a dozen arbitrary design decisions –  Which is best?

•  But the algorithm is so simple –  It should least be a baseline tool –  Against which we compare supposedly

more sophisticated methods. –  The straw man

•  Methodological advice –  Before getting complex, get simple –  Warning: often: my straw men don’t burn

Page 44: Data Mining: The Missing Link in Empirical SE

In conclusion …

Page 45: Data Mining: The Missing Link in Empirical SE

Certainly, we should always strive for generality

But don’t be alarmed if you can’t find it

•  SAIL: –  Semi-automated, simple,

agent incremental lessons learned librarian

–  Not one piece of software –  But a “call to arms” –  Run it anytime, every time

•  E.g. “W” –  Even though very simple,

•  Adds significant functionality to standard case-based effort estimation

•  The experience to date is that, –  with rare exceptions, –  SAILing does not lead to

general theories

•  But that’s ok –  Very few others have found

general models (in SE) –  E.g. Turhan, Menzies, Ayse’09

•  Anyway –  If there are few general

results, there may be general methods to find local results

Page 46: Data Mining: The Missing Link in Empirical SE

Case-based reasoning

Btw, constantly (re)building local models is a general model

•  Kolodner’s theory of reconstructive memory

•  The Yale group –  Shank & Riesbeck et al. –  Memory, not models –  Don’t “think”, remember

Page 47: Data Mining: The Missing Link in Empirical SE

Minsky’86: “Society of Mind” •  The brain is a set of 1000+ kludges

Feigenbaum'78 •  Don't take your heart attack

to the Maths Dept. –  Were they will diagnose and treat you

using first principles

•  Instead, go to the E.R room –  Staffed by doctors who spent

decades learning the quirks of drugs, organs, diseases, people, etc

Seek out those that study kludges. •  You'll be treated faster •  You'll live longer

Ask some good old fashioned AI types

Kludges work

Page 48: Data Mining: The Missing Link in Empirical SE

See you at PROMISE’10?

http://promisedata.org/2010