Heterogeneous Defect Prediction
ESEC/FSE 2015, September 3, 2015
Jaechang Nam and Sunghun Kim
Department of Computer Science and Engineering
HKUST
Software Defect Prediction: Within-Project Defect Prediction (WPDP)

[Figure: a prediction model is trained on the labeled instances of Project A (metric values labeled buggy or clean) and then predicts the project's unlabeled instances.]

Related work: Munson@TSE`92, Basili@TSE`95, Menzies@TSE`07, Hassan@ICSE`09, Bird@FSE`11, D'Ambros@EMSE`12, Lee@FSE`11, ...
What if labeled instances do not exist?

[Figure: Project X has only an unlabeled dataset (instances with metric values but no buggy/clean labels), so no model can be trained. This is the case for new projects and projects lacking historical data.]
Existing Solutions?

[Figure: the unlabeled dataset of (new) Project X again, asking which existing techniques apply.]
Cross-Project Defect Prediction (CPDP)

[Figure: a model is trained on the labeled instances of Project A (source) and predicts the unlabeled instances of Project X (target).]

Related work: Watanabe@PROMISE`08, Turhan@EMSE`09, Zimmermann@FSE`09, Ma@IST`12, Zhang@MSR`14, Panichella@WCRE`14, Canfora@STVR`15

Challenge: CPDP requires the same metric set (same feature space) in source and target, so it cannot handle heterogeneous metrics between source and target.
Motivation

[Figure: training a model on Project A (source) and testing it on Project C (target) fails when the two projects have heterogeneous metric sets (different feature spaces or different domains).]

If heterogeneous metric sets could be handled, all the existing defect datasets could be reused for CPDP!

Heterogeneous Defect Prediction (HDP)
Key Idea
• Consistent defect-proneness tendency of metrics
  – Defect prediction metrics measure the complexity of software and its development process, e.g.:
    • the number of developers touching a source code file (Bird@FSE`11)
    • the number of methods in a class (D'Ambros@EMSE`12)
    • the number of operands (Menzies@TSE`08)
  – More complexity implies more defect-proneness (Rahman@ICSE`13).
• The distributions of source and target metrics should be similar to build a strong prediction model.

=> Match source and target metrics that have similar distributions.
Heterogeneous Defect Prediction (HDP): Overview

[Figure: the HDP pipeline. The source (Project A) has labeled instances over metrics X1-X4; the target (Project B) has unlabeled instances over metrics Y1-Y7. Metric selection first removes uninformative source metrics; metric matching then pairs each remaining source metric with a distributionally similar target metric. A model is built on the matched source data (training) and used to predict the target data (test).]
Metric Selection
• Why? (Guyon@JMLR`03)
  – Select informative metrics: remove redundant and irrelevant metrics.
  – Decrease the number of metric matching combinations.
• Feature selection approaches (Gao@SPE`11, Shivaji@TSE`13), sketched below:
  – Gain Ratio
  – Chi-square
  – Relief-F
  – Significance attribute evaluation
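A minimal sketch of the metric selection step, assuming NumPy feature matrices and scikit-learn. Mutual information stands in here for the entropy-based rankers listed above (Gain Ratio itself is not in scikit-learn), and the 15% keep-ratio is an illustrative assumption:

    import numpy as np
    from sklearn.feature_selection import mutual_info_classif

    def select_metrics(X_src, y_src, keep_ratio=0.15):
        """Rank source metrics by informativeness and keep the top ones."""
        # Entropy-based relevance of each metric to the buggy/clean label.
        scores = mutual_info_classif(X_src, y_src, random_state=0)
        n_keep = max(1, int(np.ceil(keep_ratio * X_src.shape[1])))
        # Indices of the highest-scoring metrics, returned in column order.
        return np.sort(np.argsort(scores)[::-1][:n_keep])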
Metric Matching

[Figure: a bipartite graph between source metrics (X1, X2) and target metrics (Y1, Y2), with matching scores on the edges: X1-Y1 = 0.8, X1-Y2 = 0.4, X2-Y1 = 0.5, X2-Y2 = 0.3.]

* Different cutoff values can be applied to the matching scores.
* It is possible that no metrics match at all.
Computing Matching Scores: KSAnalyzer
• Use the p-value of the Kolmogorov-Smirnov test (Massey@JASA`51).
• The matching score M of the i-th source metric and the j-th target metric is

    M_ij = p_ij

A sketch of the analyzer follows.
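A minimal sketch of KSAnalyzer, assuming SciPy: the score of a metric pair is the p-value of a two-sample KS test on their value distributions (M_ij = p_ij), and pairs at or below the cutoff are discarded. Resolving the surviving candidate pairs greedily into a one-to-one matching is a simplification assumed here, not necessarily the exact matching strategy of the paper:

    import numpy as np
    from scipy.stats import ks_2samp

    def match_metrics(X_src, X_tgt, cutoff=0.05):
        """Return one-to-one (src_idx, tgt_idx, score) metric matches."""
        pairs = []
        for i in range(X_src.shape[1]):
            for j in range(X_tgt.shape[1]):
                # Matching score M_ij = p-value of the two-sample KS test.
                score = ks_2samp(X_src[:, i], X_tgt[:, j]).pvalue
                if score > cutoff:
                    pairs.append((i, j, score))
        pairs.sort(key=lambda p: p[2], reverse=True)  # best scores first
        used_src, used_tgt, matches = set(), set(), []
        for i, j, score in pairs:
            if i not in used_src and j not in used_tgt:
                matches.append((i, j, score))
                used_src.add(i)
                used_tgt.add(j)
        return matches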
EVALUATION
Baselines
• WPDP: within-project defect prediction.
• CPDP-CM (Turhan@EMSE`09, Ma@IST`12, He@IST`14)
  – Cross-project defect prediction using only the common metrics between source and target datasets.
• CPDP-IFS (He@CoRR`14)
  – Cross-project defect prediction on Imbalanced Feature Sets (i.e., heterogeneous metric sets).
  – Uses 16 distributional characteristics of each instance's values as features (e.g., mean, std, maximum, ...); see the sketch below.
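A minimal sketch of CPDP-IFS's feature construction, assuming NumPy: each instance is re-described by summary statistics of its own row of metric values, which puts source and target into a common feature space regardless of their original metric sets. The slides name only mean, std, and maximum of the 16 characteristics; the remaining statistics chosen here are assumptions:

    import numpy as np

    def distributional_features(X):
        """Map each instance (row) to summary statistics of its values."""
        stats = [np.mean, np.std, np.max, np.min, np.median,
                 lambda r: np.percentile(r, 25),   # first quartile
                 lambda r: np.percentile(r, 75)]   # third quartile
        return np.array([[f(row) for f in stats] for row in X])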
Research Questions (RQs)
• RQ1: Is heterogeneous defect prediction comparable to WPDP?
• RQ2: Is heterogeneous defect prediction comparable to CPDP-CM?
• RQ3: Is heterogeneous defect prediction comparable to CPDP-IFS?
Benchmark Datasets

Group     Dataset        # of instances        # of      Granularity
                         All     Buggy (%)     metrics
AEEEM     EQ             325     129 (39.7%)   61        Class
          JDT            997     206 (20.7%)
          LC             399     64  (9.36%)
          ML             1862    245 (13.2%)
          PDE            1492    209 (14.0%)
MORPH     ant-1.3        125     20  (16.0%)   20        Class
          arc            234     27  (11.5%)
          camel-1.0      339     13  (3.8%)
          poi-1.5        237     141 (75.0%)
          redaktor       176     27  (15.3%)
          skarbonka      45      9   (20.0%)
          tomcat         858     77  (9.0%)
          velocity-1.4   196     147 (75.0%)
          xalan-2.4      723     110 (15.2%)
          xerces-1.2     440     71  (16.1%)
ReLink    Apache         194     98  (50.5%)   26        File
          Safe           56      22  (39.3%)
          ZXing          399     118 (29.6%)
NASA      cm1            327     42  (12.8%)   37        Function
          mw1            253     27  (10.7%)
          pc1            705     61  (8.7%)
          pc3            1077    134 (12.4%)
          pc4            1458    178 (12.2%)
SOFTLAB   ar1            121     9   (7.4%)    29        Function
          ar3            63      8   (12.7%)
          ar4            107     20  (18.7%)
          ar5            36      8   (22.2%)
          ar6            101     15  (14.9%)

600 prediction combinations in total!
Experimental Settings
• Logistic Regression
• HDP vs. WPDP, CPDP-CM, and CPDP-IFS

[Figure: experimental design. Each target project is split into a 50% training set and a 50% test set, and the split is repeated 1000 times. WPDP trains on the target's training half, while the cross-project approaches (CPDP-CM, CPDP-IFS, HDP) train on other projects (Project 1 ... Project n); all approaches are evaluated on the same test halves.]

A sketch of one repetition follows.
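A minimal sketch of one repetition, assuming scikit-learn and the select_metrics/match_metrics helpers sketched earlier; the exact wiring is an illustrative assumption, while logistic regression, the 50/50 target split, and the 1000 repetitions come from the slides:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    def hdp_one_run(X_src, y_src, X_tgt, y_tgt, seed):
        kept = select_metrics(X_src, y_src)              # metric selection
        matches = match_metrics(X_src[:, kept], X_tgt)   # metric matching
        if not matches:
            return None  # no metrics matched; prediction is infeasible
        s_idx = [kept[i] for i, _, _ in matches]         # source columns
        t_idx = [j for _, j, _ in matches]               # target columns
        # Hold out 50% of the target as the test set for this repetition.
        _, X_test, _, y_test = train_test_split(
            X_tgt[:, t_idx], y_tgt, test_size=0.5, random_state=seed)
        model = LogisticRegression(max_iter=1000).fit(X_src[:, s_idx], y_src)
        return roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

    # aucs = [hdp_one_run(Xa, ya, Xx, yx, seed) for seed in range(1000)]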
Evaluation Measures
• False Positive Rate = FP / (TN + FP)
• True Positive Rate = Recall = TP / (TP + FN)
• AUC: Area Under the receiver operating characteristic Curve

[Figure: an ROC curve plotting true positive rate against false positive rate, both on a 0-1 scale; AUC is the area under this curve.]
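A minimal sketch of these measures, assuming scikit-learn and NumPy arrays; the 0.5 threshold used to binarize scores for the confusion matrix is an illustrative assumption:

    from sklearn.metrics import confusion_matrix, roc_auc_score

    def evaluate(y_true, y_score, threshold=0.5):
        """Compute FPR, TPR (recall), and AUC for one prediction result."""
        y_pred = (y_score >= threshold).astype(int)
        tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
        return {"fpr": fp / (tn + fp),    # FP / (TN + FP)
                "tpr": tp / (tp + fn),    # recall
                "auc": roc_auc_score(y_true, y_score)}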
Evaluation Measures: Win/Tie/Loss (Valentini@ICML`03, Li@JASE`12, Kocaguneli@TSE`13)
• Wilcoxon signed-rank test (p < 0.05) over the 1000 prediction results of each combination, as sketched below:
  – Win: # of prediction combinations where HDP outperforms the baseline with statistical significance (p < 0.05).
  – Tie: # of prediction combinations with no statistically significant difference (p ≥ 0.05).
  – Loss: # of prediction combinations where the baseline outperforms HDP with statistical significance (p < 0.05).
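A minimal sketch of the Win/Tie/Loss labeling, assuming SciPy; deciding the direction of a significant difference by comparing medians is an assumption of this sketch:

    import numpy as np
    from scipy.stats import wilcoxon

    def win_tie_loss(hdp_aucs, base_aucs, alpha=0.05):
        """Label one prediction combination as 'win', 'tie', or 'loss'."""
        _, p = wilcoxon(hdp_aucs, base_aucs)  # paired signed-rank test
        if p >= alpha:
            return "tie"  # no statistically significant difference
        return "win" if np.median(hdp_aucs) > np.median(base_aucs) else "loss"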
RESULTS
Prediction Results (median AUC)

Target        WPDP   CPDP-CM  CPDP-IFS  HDP_KS (cutoff=0.05)
EQ            0.583  0.776    0.461     0.783
JDT           0.795  0.781    0.543     0.767
LC            0.575  0.636    0.584     0.655
ML            0.734  0.651    0.557     0.692*
PDE           0.684  0.682    0.566     0.717
ant-1.3       0.670  0.611    0.500     0.701
arc           0.670  0.611    0.523     0.701
camel-1.0     0.550  0.590    0.500     0.639
poi-1.5       0.707  0.676    0.606     0.537
redaktor      0.744  0.500    0.500     0.537
skarbonka     0.569  0.736    0.528     0.694*
tomcat        0.778  0.746    0.640     0.818
velocity-1.4  0.725  0.609    0.500     0.391
xalan-2.4     0.755  0.658    0.499     0.751
xerces-1.2    0.624  0.453    0.500     0.489
Apache        0.714  0.689    0.635     0.717*
Safe          0.706  0.749    0.616     0.818*
ZXing         0.605  0.619    0.530     0.650*
cm1           0.653  0.622    0.551     0.717*
mw1           0.612  0.584    0.614     0.727
pc1           0.787  0.675    0.564     0.752*
pc3           0.794  0.665    0.500     0.738*
pc4           0.900  0.773    0.589     0.682*
ar1           0.582  0.464    0.500     0.734*
ar3           0.574  0.862    0.682     0.823*
ar4           0.657  0.588    0.575     0.816*
ar5           0.804  0.875    0.585     0.911*
ar6           0.654  0.611    0.527     0.640
All           0.657  0.636    0.555     0.724*

HDP_KS: heterogeneous defect prediction using KSAnalyzer.
Win/Tie/Loss Results

              Against WPDP    Against CPDP-CM   Against CPDP-IFS
Target        W    T    L     W    T    L       W    T    L
EQ            4    0    0     2    2    0       4    0    0
JDT           0    0    5     3    0    2       5    0    0
LC            6    0    1     3    3    1       3    1    3
ML            0    0    6     4    2    0       6    0    0
PDE           3    0    2     2    0    3       5    0    0
ant-1.3       6    0    1     6    0    1       5    0    2
arc           3    1    0     3    0    1       4    0    0
camel-1.0     3    0    2     3    0    2       4    0    1
poi-1.5       2    0    2     3    0    1       2    0    2
redaktor      0    0    4     2    0    2       3    0    1
skarbonka     11   0    0     4    0    7       9    0    2
tomcat        2    0    0     1    1    0       2    0    0
velocity-1.4  0    0    3     0    0    3       0    0    3
xalan-2.4     0    0    1     1    0    0       1    0    0
xerces-1.2    0    0    3     3    0    0       1    0    2
Apache        6    0    5     8    1    2       9    0    2
Safe          14   0    3     12   0    5       15   0    2
ZXing         8    0    0     6    0    2       7    0    1
cm1           7    1    2     8    0    2       9    0    1
mw1           5    0    1     4    0    2       4    0    2
pc1           1    0    5     5    0    1       6    0    0
pc3           0    0    7     7    0    0       7    0    0
pc4           0    0    7     2    0    5       7    0    0
ar1           14   0    1     14   0    1       11   0    4
ar3           15   0    0     5    0    10      10   2    3
ar4           16   0    0     14   1    1       15   0    1
ar5           14   0    4     14   0    4       16   0    2
ar6           7    1    7     8    4    3       12   0    3
Total         147  3    72    147  14   61      182  3    35
%             66.2 1.4  32.4  66.2 6.3  27.5    82.0 1.3  16.7
Matched Metrics (Win)

[Figure: metric value distributions of a well-matched pair. Source metric: RFC (the number of methods invoked by a class); target metric: the number of operands.]

Matching score = 0.91
AUC = 0.946 (ant-1.3 → ar5)
Matched Metrics (Loss)

[Figure: metric value distributions of a poorly matched pair. Source metric: LOC; target metric: the average number of LOC in a method.]

Matching score = 0.13
AUC = 0.391 (Safe → velocity-1.4)
Different Feature Selections (median AUC and Win %)

              Against WPDP   Against CPDP-CM  Against CPDP-IFS  HDP
Approach      AUC    Win%    AUC    Win%      AUC    Win%       AUC
Gain Ratio    0.657  63.7%   0.645  63.2%     0.536  80.2%      0.720
Chi-square    0.657  64.7%   0.651  66.4%     0.556  82.3%      0.727
Significance  0.657  66.2%   0.636  66.2%     0.553  82.0%      0.724
Relief-F      0.670  57.0%   0.657  63.1%     0.543  80.5%      0.709
None          0.657  47.3%   0.624  50.3%     0.536  66.3%      0.663
Results in Different Cutoffs

        Against WPDP   Against CPDP-CM  Against CPDP-IFS  HDP     Target
Cutoff  AUC    Win%    AUC    Win%      AUC    Win%       AUC     Coverage
0.05    0.657  66.2%   0.636  66.2%     0.553  82.4%      0.724*  100%
0.90    0.657  100%    0.761  71.4%     0.624  100%       0.852*  21%
Conclusion
• HDP shows potential for CPDP across datasets with different metric sets.
• Future work:
  – Filtering out noisy metric matches.
  – Determining the best prediction probability threshold.

Q&A
THANK YOU!