© ABB Corporate Research - January 2004
Experiences and Results from Initiating Field Defect Prediction and Product Test Prioritization Efforts at ABB Inc.
Paul Luo Li (Carnegie Mellon University)
James Herbsleb (Carnegie Mellon University)
Mary Shaw (Carnegie Mellon University)
Brian Robinson (ABB Research)
Field defects matter
Citizens Thermal Energy, Indianapolis, Indiana, USA
Molson, Montreal, Canada
Heineken Spain, Madrid, Spain
Shajiao Power Station, Guangdong, China
ABB’s risk mitigation activities include…
Remove potential field defects by focusing systems/integration testing
Higher quality product
Earlier defect detection (when it’s cheaper to fix)
Plan for maintenance by predicting the number of field defects within the first year
Faster response for customers
Less stress (e.g. for developers)
More accurate budgets
Plan for future process improvement efforts by identifying important characteristics (i.e. categories of metrics)
More effective improvement efforts
Practical issues
How to conduct analysis with incomplete information?
How to select an appropriate modeling method for systems/integration testing prioritization and process improvement planning?
How to evaluate the accuracy of predictions across multiple releases in time?
We also share our experience of how we arrived at the results
Talk outline
ABB systems overview
Field defect modeling overview: outputs, inputs
Insight 1: Use available information
Modeling methods
Insight 2: Select a modeling method based on explicability and quantifiability
Insight 3: Evaluate accuracy using forward prediction evaluation
Empirical results
Conclusions
We examined two systems at ABB
Product A
Real-time monitoring system
Growing code base of ~300 KLOC
13 major/minor/fix-pack releases
~127 thousand changes committed by ~40 different people
Dates back to 2000 (~5 years)
Product B
Tool suite for managing real-time modules
Stable code base of ~780 KLOC
15 major/minor/fix-pack releases
~50 people have worked on the project
Dates back to 1996 (~9 years)
We collected data from…
Request tracking system (Serena Tracker)
Version control system (Microsoft Visual Source Safe)
Experts (e.g. team leads and area leads)
Breakdown of field defect modeling
Predictors (metrics available before release)
Product (Khoshgoftaar and Munson, 1989)
Development (Ostrand et al., 2004)
Deployment and usage (Mockus et al., 2005)
Software and hardware configurations (Li et al., 2006)
Inputs
Breakdown of field defect modeling: metrics-based methods take Inputs through a Modeling method to produce Outputs (field defects).
Modeling process: take historical Inputs and Outputs to construct the model.
Outputs
Field defects: valid customer-reported problems attributable to a release, recorded in Serena Tracker
Relationships
What predictors are related to field defects?
Quantities
What is the number of field defects?
Outputs
Field defects: valid customer-reported problems attributable to a release, recorded in Serena Tracker
Relationships
Plans for improvement
Targeted systems testing
Quantities
Maintenance resource planning
Remember these objectives
Inputs (Predictors)
Product metrics
Lines of code
Fan-in
Halstead’s difficulty …
Development metrics
Open issues
Deltas
Authors …
Deployment and usage (DU) metrics
… we’ll talk more about this
Software and hardware configuration (SH) metrics
Sub-system
Windows configuration …
Insight 1: use available information
ABB did not officially collect DU information (e.g. the number of installations). Do analysis without the information?
We collected data from available data sources that provided information on possible deployment and usage:
Type of release
Elapsed time between releases
Benefits: improved validity, more accurate models, justification for better data collection
Methods to establish relationships
Rank Correlation (for improvement planning)
Single predictor
Defect modeling (for improvement planning and for systems/integration test prioritization)
Multiple predictors that complement each other
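As a sketch of the rank-correlation approach above, the snippet below computes a Spearman rank correlation between a single predictor and field defect counts. It is an illustration only: the release data is invented, not ABB's.

```python
# Hypothetical illustration: rank correlation between one predictor
# (e.g. deltas per release) and field defect counts across releases.
# All numbers below are invented for the sketch.

def ranks(values):
    """Assign 1-based ranks, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the tied positions, 1-based
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

deltas        = [120, 340, 210, 500, 90]   # predictor per release (made up)
field_defects = [4, 11, 7, 15, 3]          # field defects per release (made up)
print(spearman(deltas, field_defects))     # perfectly monotonic -> 1.0
```

A correlation near +1 or -1 flags the predictor as a candidate characteristic for improvement planning; defect modeling (next) combines several such predictors.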
Insight 2: select a modeling method based on explicability and quantifiability
Previous work selects modeling methods by accuracy; however…
To prioritize product testing:
Identify faulty configurations
Quantify the relative fault-proneness of configurations
For process improvement:
Identify characteristics related to field defects
Quantify the relative importance of characteristics
Explicability and quantifiability: not all models have these qualities, e.g. neural networks and models built with Principal Component Analysis.
The modeling method we used: linear modeling with model selection
~39% less accurate than neural networks (Khoshgoftaar et al.)
Example only, not a real model: Function(Field defects) = B1*Input1 + B2*Input2 + B3*Input4
Explicability: the effect of each predictor can be distinguished
Quantifiability: the effects of predictors can be compared
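To make the linear-model idea concrete, here is a minimal sketch (not ABB's actual model, and with invented data) of fitting a log-linear model like the one in the talk; exponentiating a coefficient Bi reads as "exp(Bi) times more (or fewer) field defects per unit of that predictor".

```python
# A minimal sketch of log(field defects) = B1*x1 + B2*x2 + intercept,
# fit by ordinary least squares. All numbers are invented for illustration.
import math

def fit_linear(X, y):
    """Ordinary least squares via normal equations + Gaussian elimination."""
    n, p = len(X), len(X[0])
    A = [row + [1.0] for row in X]          # append intercept column
    # Normal equations: (A^T A) b = A^T y
    ata = [[sum(A[k][i] * A[k][j] for k in range(n)) for j in range(p + 1)]
           for i in range(p + 1)]
    aty = [sum(A[k][i] * y[k] for k in range(n)) for i in range(p + 1)]
    # Gaussian elimination with partial pivoting
    for col in range(p + 1):
        piv = max(range(col, p + 1), key=lambda r: abs(ata[r][col]))
        ata[col], ata[piv] = ata[piv], ata[col]
        aty[col], aty[piv] = aty[piv], aty[col]
        for r in range(col + 1, p + 1):
            f = ata[r][col] / ata[col][col]
            for c in range(col, p + 1):
                ata[r][c] -= f * ata[col][c]
            aty[r] -= f * aty[col]
    b = [0.0] * (p + 1)
    for r in range(p, -1, -1):
        s = aty[r] - sum(ata[r][c] * b[c] for c in range(r + 1, p + 1))
        b[r] = s / ata[r][r]
    return b  # [B1, ..., Bp, intercept]

# Invented release data: [open issues, months before next release]
X = [[10, 2], [25, 1], [40, 3], [60, 2], [80, 4]]
y = [math.log(d) for d in [3, 6, 14, 22, 50]]   # log(field defects)
coefs = fit_linear(X, y)
for name, b in zip(["Open Issues", "Months Before Next Release"], coefs):
    print(f"{name}: {math.exp(b):.2f}x per unit")
```

Model selection (choosing which predictors enter the model) is omitted here; the point of the sketch is the explicable, quantifiable per-predictor effect that neural networks lack.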
Systems/Integration test prioritization
Model: Log(Field defects) = B1*Input1 + B2*Input2 + …

Product A predictors             Estimated effect
Sub-system:
  Sub-system 1                   9.85x more
  Sub-system 2                   8.39x more
  Sub-system 3                   8.13x more
  Sub-system 4                   7.22x more
Software platforms:
  Not Windows Server versions    1.91x more
Other predictors:
  Service Pack                   5.55x less
  Open Issues                    1.01x less
  Num Authors                    1.08x less
  Months Before Next Release     1.16x more
  Months Since 1st Release       1.03x less
Experts validated the results
Quantitative justification for action
ABB found additional defects
Risk mitigation activities enabled
Focusing systems/integration testing
Found additional defects
Plan for maintenance by predicting the number of field defects within the first year
Do not yet know if results are accurate enough for planning purposes
Plan for future process improvement efforts
May be combined with the prediction method to enable process adjustments
Experiences recapped
Use available information when direct/preferred information is unavailable
Consider the explicability and quantifiability of a modeling method when the objectives are improvement planning and test prioritization
Use a forward prediction evaluation procedure to assess the accuracy of predictions for multiple releases in time
Details on insights and results in our paper
Thanks to:
Ann Poorman
Janet Kaufman
Rob Davenport
Pat Weckerly
Insight 3: use forward prediction evaluation
Accuracy is the correct criterion when predicting the number of field defects for maintenance resource planning
Current accuracy evaluation methods are not well-suited for multi-release systems:
Cross-validation
Random data withholding
With cross-validation and random data withholding, only a non-random sub-set of the data is available, and predicting for a past release is not the same as predicting for a future release. Not realistic!
We use a forward prediction evaluation method: fit the model on earlier releases (e.g. Releases 1-3) and predict the next release (Release 4).
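The walk-forward idea can be sketched as follows; the model, the data, and the minimum training window below are all invented for illustration, and each prediction is scored with its absolute relative error (ARE).

```python
# A minimal sketch (invented data) of forward prediction evaluation:
# for each release k, fit on releases 1..k-1 and predict release k;
# the model never sees the future.

def forward_prediction_eval(defects_per_release, fit, predict, min_train=2):
    """Walk forward through releases, scoring each prediction with ARE."""
    errors = []
    for k in range(min_train, len(defects_per_release)):
        model = fit(defects_per_release[:k])        # history only
        pred = predict(model)
        actual = defects_per_release[k]
        errors.append(abs(pred - actual) / actual)  # ARE for release k
    return errors

# Simplest possible "model": predict the mean of past releases
fit = lambda history: sum(history) / len(history)
predict = lambda model: model

defects = [10, 12, 8, 11, 9]          # invented field defect counts
errs = forward_prediction_eval(defects, fit, predict)
print([round(e, 3) for e in errs], "avg:", round(sum(errs) / len(errs), 3))
```

This is the key difference from cross-validation: the split is chronological, so the evaluation mimics the real prediction task.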
Field defect prediction

ARE, Product A                            R1.0    R1.1    R1.2    Avg
Moving Average, 1 Release                 3.0%    51.7%   19.2%   24.6%
Linear Regression with Model Selection      -       -     17.6%   17.6%
Tree Split with 2 Releases                3.0%    51.7%   19.2%   24.6%
Tree Split with 3 Releases                3.0%    54.0%   19.2%   25.4%

Combined with the cost of field defects, these predictions can be used to allocate initial maintenance resources. We still need to evaluate whether ~24% average error is adequate for planning.
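For clarity, here is how the ARE figures in a table like the one above are computed; the predicted and actual counts below are invented, not ABB's data.

```python
# Hypothetical sketch: average relative error (ARE) for per-release
# field defect predictions. All numbers are invented.

def are(predicted, actual):
    """Absolute relative error of one prediction."""
    return abs(predicted - actual) / actual

preds   = [9, 14, 30]     # predicted field defects per release (made up)
actuals = [10, 12, 25]    # actual field defects per release (made up)
errors = [are(p, a) for p, a in zip(preds, actuals)]
print([f"{e:.1%}" for e in errors], f"avg: {sum(errors) / len(errors):.1%}")
```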
Improvement planning
Model selection selected these predictors:
Product A: Open Issues, Service Pack
Product B: Open Issues, Months Before Next Release
Improvement planning
Given these predictors, ABB can delay deployment to conduct more testing, or reduce the scope of the next release to resolve field defects.
Explicability example: neural networks (image from smig.usgs.gov)
Each hidden unit is a squashed weighted sum of the inputs:
Z1 = 1 / (1 + e^-(Input1*weight1 + Input2*weight2 + Input3*weight3 + Input4*weight4))
The output is, in turn, a squashed weighted sum of the hidden units:
Output = 1 / (1 + e^-(Z1*weight1 + Z2*weight2 + Z3*weight3 + Z4*weight4))
How does an input relate to the output?
Improvement planning and test prioritization both need to attribute effects to predictors.
Quantifiability example: neural networks
How do the predictors compare?
Improvement planning and test prioritization both need to compare the importance of predictors.