PROMISE 2011: "Does Measuring Code Change Improve Fault Prediction?" Robert Bell, Thomas Ostrand and Elaine Weyuker.
© 2007 AT&T Knowledge Ventures. All rights reserved. AT&T and the AT&T logo are trademarks of AT&T Knowledge Ventures.
Code Change and Fault Prediction
Tom Ostrand, Robert Bell, Elaine Weyuker
AT&T Labs – Research, Florham Park, NJ, USA
PROMISE 2011
Banff, Alberta, September 20-21, 2011
Overview
• Do measures of code change or churn provide useful input to fault prediction models?
• Standard model
• Base models
• Churn-augmented models
The Standard Model
• Underlying statistical model
• Negative binomial regression
• Output (dependent) variable
• Predicted fault count in each file of release n
• Predictor (independent) variables
• KLOC (n)
• Previous faults (n-1)
• Previous changes (n-1, n-2)
• File age (number of releases)
• File type (C, C++, Java, SQL, make, sh, Perl, ...)
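For reference, one common way to write such a model (the NB2 form of negative binomial regression with a log link) is sketched below. The parameterization, the dispersion symbol k, the use of log(KLOC) (taken from the later base-model slide), and the single PrevChanges term standing in for the n-1 and n-2 change counts are assumptions for illustration, not details stated on this slide. Here Y_i is the fault count of file i in release n, and files are ranked by their predicted mean.

```latex
\begin{aligned}
Y_i &\sim \operatorname{NB}(\mu_i, k), \qquad
  \operatorname{E}[Y_i] = \mu_i, \qquad
  \operatorname{Var}[Y_i] = \mu_i + k\,\mu_i^{2}, \qquad k \ge 0, \\
\log \mu_i &= \beta_0 + \beta_1 \log \mathrm{KLOC}_i
  + \beta_2\, \mathrm{PrevFaults}_i
  + \beta_3\, \mathrm{PrevChanges}_i
  + \beta_4\, \mathrm{Age}_i
  + \sum_{t} \gamma_t\, \mathbf{1}[\mathrm{Type}_i = t]
\end{aligned}
```

With k = 0 the variance equals the mean and the model reduces to Poisson regression; k > 0 allows the over-dispersion typical of fault counts, where most files have no faults and a few have many.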
Evaluating prediction models
• Model produces ranking of files in a release, from predicted most faults to fewest faults
• Choose cutoff point in ranking, X%
• Yield = percent of all faults in the release that are in the first X% of the ranked files
We’ve usually evaluated models at a 20% cutoff.
• Fault-percentile average (FPA) is the average yield over all values of X
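A minimal Python sketch of both measures, assuming predicted and actual fault counts are given as parallel lists indexed by file. The function names are hypothetical, the number of files in the top X% is rounded down, and ties in the ranking keep file order. FPA is taken here as the mean, over m = 1..K, of the percent of faults in the top m of the K ranked files, which is one reading of "average yield over all values of X".

```python
from typing import Sequence


def rank_files(predicted: Sequence[float]) -> list[int]:
    """File indices ordered from most to fewest predicted faults.

    Ties keep their original order (Python's sort is stable, even with reverse=True).
    """
    return sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)


def yield_at(predicted: Sequence[float], actual: Sequence[int], cutoff_pct: float) -> float:
    """Percent of all actual faults that fall in the top cutoff_pct% of ranked files."""
    total = sum(actual)
    if total == 0:
        raise ValueError("yield is undefined for a release with no faults")
    order = rank_files(predicted)
    top_m = int(len(order) * cutoff_pct / 100)  # files in the top X%, rounded down
    return 100.0 * sum(actual[i] for i in order[:top_m]) / total


def fpa(predicted: Sequence[float], actual: Sequence[int]) -> float:
    """Fault-percentile average: mean of the yields for the top m files, m = 1..K."""
    total = sum(actual)
    if total == 0:
        raise ValueError("FPA is undefined for a release with no faults")
    order = rank_files(predicted)
    k = len(order)
    running = 0  # faults in the top m files so far
    acc = 0      # sum of the running totals over m = 1..K
    for i in order:
        running += actual[i]
        acc += running
    return 100.0 * acc / (total * k)
```

With predicted = [5.0, 0.1, 2.0, 0.5, 1.0] and actual = [3, 0, 1, 0, 1], the top 20% (one file) holds 3 of the 5 faults, a yield of 60%, and the FPA is 88%.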
Prediction Results, from the Standard Model
[Chart: percent of faults in the top 20% of files, and FPA, for the standard model; y-axis 0 to 100]
Measures of Code Change
• Changed/not changed
• Number of changes during a release
• Number of lines added
• Number of lines deleted
• Number of lines modified
• Relative churn (line changes/LOC)
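A sketch of how these measures could be derived for one file in one release, assuming a hypothetical CheckIn record with per-check-in counts of added, deleted and modified lines. It also assumes that each check-in counts as one change and that relative churn divides the sum of all three line counts by the file's LOC at the end of the release; the slide does not say which LOC is used.

```python
from dataclasses import dataclass


@dataclass
class CheckIn:
    """Hypothetical per-file check-in record: line counts taken from a diff."""
    added: int
    deleted: int
    modified: int


def change_measures(checkins: list[CheckIn], loc: int) -> dict:
    """Change measures for one file over one release.

    Assumes each check-in is one change and that relative churn is
    (added + deleted + modified) / LOC, with LOC taken at the end of the release.
    """
    if loc <= 0:
        raise ValueError("LOC must be positive to compute relative churn")
    added = sum(c.added for c in checkins)
    deleted = sum(c.deleted for c in checkins)
    modified = sum(c.modified for c in checkins)
    return {
        "changed": len(checkins) > 0,      # changed/not changed flag
        "num_changes": len(checkins),      # number of changes during the release
        "lines_added": added,
        "lines_deleted": deleted,
        "lines_modified": modified,
        "relative_churn": (added + deleted + modified) / loc,
    }
```

For two check-ins touching (10, 2, 3) and (5, 0, 0) lines in a 200-line file, this gives 2 changes, 15 added, 2 deleted, 3 modified, and a relative churn of 20/200 = 0.1.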
Two Subject Systems
Large provisioning system
• 18 releases, 5 year lifespan
• 6 programming languages:
• Java (60%), C, C++, SQL, SQL-C, SQL-C++
• 3000+ files
• 1.5 million LOC
• Average of 395 faults/release
Utility, data aggregation system
• 18 releases, 5 year lifespan
• More than 10 programming languages:
• Java (77%), Perl, XML, sh, ...
• 800 files
• 280K LOC
• Average of 90 faults/release
Distribution of files, averages over all releases.
• Percent of files, Provisioning: New 6.8%, Changed 11.0%, Unchanged 82.2%
• Percent of files, Utility: New 1.6%, Changed 15.1%, Unchanged 84.4%
Where do faults occur?
Distribution of faults over files
• Faults/file, Provisioning: New 0.24, Changed 0.80, Unchanged 0.02
• Faults/file, Utility: New 0.12, Changed 0.82
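Per-status figures like these are ratios of total faults to number of files in each group. A minimal sketch, assuming each file is supplied as a hypothetical (status, fault_count) pair; the slides do not say whether files are pooled across releases or per-release means are averaged, and this version simply pools whatever it is given.

```python
from collections import defaultdict


def faults_per_file(files):
    """Mean faults per file for each change status.

    `files` is an iterable of (status, fault_count) pairs, where status is a
    label such as "New", "Changed" or "Unchanged" (hypothetical labels).
    """
    faults = defaultdict(int)  # total faults per status
    counts = defaultdict(int)  # number of files per status
    for status, n_faults in files:
        faults[status] += n_faults
        counts[status] += 1
    return {status: faults[status] / counts[status] for status in counts}
```

For example, two new files with 1 and 0 faults, three changed files with 2, 0 and 1 faults, and one unchanged file with no faults give 0.5, 1.0 and 0.0 faults per file respectively.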
Provisioning system faults per file, by release
[Chart: Faults per File, by Change Status and Release; x-axis Release (1 to 18), y-axis faults per file (0 to 2.5); series New (Mean=0.24), Unchanged (Mean=0.02), Changed (Mean=0.80)]
Utility system faults per file, by release
[Chart: Faults per File, by Change Status and Release; x-axis Release (1 to 18), y-axis faults per file (0 to 3); series New (Mean=0.09), Unchanged (Mean=0.002), Changed (Mean=0.92)]
Potential predictor combinations
• Added lines only
• Deleted lines only
• Modified lines only
• Adds & Deletes
• Adds & Mods
• Deletes & Mods
• Adds & Deletes & Mods
• Relative values: changed lines/LOC
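The seven non-empty combinations of mods, deletes and adds can be enumerated, and a single check-in assigned to one of them, as in this sketch. The names KINDS, all_combinations and change_combination are hypothetical; a check-in that touches no lines is labeled "None" here by assumption.

```python
from itertools import combinations

# Order chosen so that labels read "Mods & Deletes", "Mods & Adds", etc.
KINDS = ("Mods", "Deletes", "Adds")


def all_combinations() -> list[str]:
    """The seven non-empty combinations of change kinds, e.g. "Mods & Adds"."""
    return [" & ".join(combo)
            for r in range(1, len(KINDS) + 1)
            for combo in combinations(KINDS, r)]


def change_combination(added: int, deleted: int, modified: int) -> str:
    """Label a check-in by which kinds of line change it contains."""
    counts = (modified, deleted, added)  # aligned with KINDS
    present = [kind for kind, n in zip(KINDS, counts) if n > 0]
    return " & ".join(present) if present else "None"
```

A check-in that adds 5 lines and modifies 2 falls under "Mods & Adds"; one that touches all three kinds falls under "Mods & Deletes & Adds".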
Distribution of change combinations, all check-ins, all releases: Provisioning system
Number of files, by combination of changes:
• Mods: 683
• Deletes: 296
• Adds: 597
• Mods & Deletes: 168
• Mods & Adds: 1894
• Deletes & Adds: 126
• Mods & Deletes & Adds: 2625
Average lines touched for each combination of changes
• Mods: 4
• Deletes: 5
• Adds: 21
• Mods & Deletes: 23
• Mods & Adds: 37
• Deletes & Adds: 21
• Mods & Deletes & Adds: 210
Faults per file, changed files only: Provisioning system
• Mods: 0.19
• Deletes: 0.04
• Adds: 0.3
• Mods & Deletes: 0.36
• Mods & Adds: 0.55
• Deletes & Adds: 0.5
• Mods & Deletes & Adds: 1.38
Fault prediction models
• Univariate models
• Base model: log(KLOC), File age, File type
• Augmented models:
• Previous Changes
• Previous {Adds / Deletes / Mods}
• Previous Adds + Deletes + Modifications
• Previous {Adds / Deletes / Mods} / LOC (relative churn)
• Previous Developers
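A sketch of how a base-model predictor vector might be assembled and then augmented with one change variable from the previous release. The file-type list, the one-hot encoding with the first type as reference category, and the function names are all assumptions for illustration; the slides do not describe how file type is encoded.

```python
import math

# Illustrative file types; the first one serves as the reference category.
FILE_TYPES = ("c", "c++", "java", "sql", "make", "sh", "perl")


def base_features(kloc: float, age: int, file_type: str) -> list[float]:
    """Base model predictors: log(KLOC), file age in releases, one-hot file type."""
    if kloc <= 0:
        raise ValueError("KLOC must be positive to take its log")
    if file_type not in FILE_TYPES:
        raise ValueError(f"unknown file type: {file_type}")
    one_hot = [1.0 if file_type == t else 0.0 for t in FILE_TYPES[1:]]
    return [math.log(kloc), float(age)] + one_hot


def augmented_features(kloc: float, age: int, file_type: str, extra=None) -> list[float]:
    """Base predictors plus one optional change variable from the previous release."""
    features = base_features(kloc, age, file_type)
    if extra is not None:
        features.append(float(extra))
    return features
```

Under these assumptions, each augmented model passes a different previous-release variable as `extra`, for example a count of previous changes, previous lines added, or (adds + deletes + mods)/LOC, possibly after a square-root transform.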
Fault-percentile averages for univariate predictor models: Provisioning system (best result from raw variable, square root, fourth root)
[Chart: FPA, univariate models, x-axis 0% to 100%; predictors: log(KLOC), Prior Changes, Prior Adds+Deletes+Mods, Prior Developers, Prior Lines Added, Prior Lines Modified, Prior Changed, Prior Faults, Prior Lines Deleted, Language, Age; and the Standard Model]
Base Model 1
• KLOC
• File age (number of releases)
• File type (C, C++, Java, SQL, make, sh, Perl, ...)
Base Model 1, and added variables
[Chart: Mean FPA, Provisioning System, x-axis 89 to 94; bars for Base 1, Base 1 with each added variable (prev-prev changes, prev-deletes, prev-mods, prev-changed, prev-adds, prev-developers, (prev-adds,dels,mods)/LOC, prev-adds,dels,mods, prev-changes), and the Standard Model]
[Chart: Mean FPA, Utility System, x-axis 87 to 93; bars for Base 1, Base 1 with each added variable (prev-prev changes, prev-deletes, prev-mods, prev-changed, prev-adds, prev-developers, prev-adds,dels,mods, prev-changes), and the Standard Model]
Base Model 2
• KLOC
• File age (number of releases)
• File type (C, C++, Java, SQL, make, sh, Perl, ...)
• Square root of previous changes
Base Model 2, and added variables
[Chart: Mean FPA, Provisioning System, x-axis 93.2 to 93.55; bars for Base 2 and Base 2 with each added variable (prev-changed, prev-deletes, (prev-adds,dels,mods)/LOC, prev-developers, prev-mods, prev-adds, prev-adds,dels,mods, prev-prev changes)]
Summary
• Churn can be an effective aid for improving fault prediction
• {Adds+Deletes+Mods} improves the accuracy of a model that doesn’t include any change information
BUT
• a simple count of prior changes slightly outperforms {Adds+Deletes+Mods}
• The prior changed/not changed flag is nearly as good as either, when added to a model without change information
• Lines added is the most effective single predictor
• Lines deleted is least effective single predictor
• Relative churn is no better than absolute churn for predicting total fault count
Recommended