SAIL, School of Computing, Queen’s University, Kingston, Canada
A Case Study of Bias in Bug-Fix Datasets
Thanh H. D. Nguyen, Bram Adams, Ahmed E. Hassan
We need bug prediction
• Problem:
  • Quality improvement resources are limited.
• Solution:
  • Bug prediction identifies defect-prone modules.
Our focus is data quality
What if there is sample bias?
We should consider bias in our studies
Stanford graduate student housing survey
Linkage Bias
Unlinked bugs have:
• Higher severity
• Less experienced developers
[Bird et al. 2009]
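One way to look for linkage bias in a dataset is to compare a feature such as severity between bug reports that are linked to a commit and those that are not. The sketch below is only an illustration: the `BugReport` record, its `severity` labels, and the eight toy reports are hypothetical and are not data from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BugReport:
    bug_id: int
    severity: str  # hypothetical labels: "high" or "low"
    linked: bool   # True if some commit references this bug report


def high_severity_share(reports, linked):
    """Fraction of high-severity reports within the linked or unlinked group."""
    group = [r for r in reports if r.linked == linked]
    if not group:
        raise ValueError("no reports in the requested group")
    return sum(r.severity == "high" for r in group) / len(group)


# Made-up toy records for illustration only.
reports = [
    BugReport(1, "low", True),
    BugReport(2, "low", True),
    BugReport(3, "high", True),
    BugReport(4, "low", True),
    BugReport(5, "high", False),
    BugReport(6, "high", False),
    BugReport(7, "low", False),
    BugReport(8, "high", False),
]

linked_share = high_severity_share(reports, linked=True)
unlinked_share = high_severity_share(reports, linked=False)
print(f"high severity among linked bugs:   {linked_share:.2f}")
print(f"high severity among unlinked bugs: {unlinked_share:.2f}")
```

A large gap between the two shares would suggest that the linked sample does not represent all fixed bugs. A real analysis would back this up with a statistical test on the full dataset rather than a raw comparison.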
Tagging Bias
About 2/3 of all bug reports are not defects [Antoniol et al. 2008].
Biases are threats to validity of software quality studies
• Because of linkage bias, our models:
  • neglect higher severity bugs.
  • neglect less experienced developers.
• Because of tagging bias, our models:
  • inaccurately count more bugs than actually existed.
Do biases really exist?
How do biases affect our research?
Near ideal data:
• Linkage is enforced.
• Tagging is provided.
Severity   Experience   Maturity   Release pressure   Collaboration
✔          ✔            −          −                  −
−          ✔            ✔          −                  −
Conjecture: Biases are properties of the software process, not of missing links.
Do linkage biases exist in Jazz?
Severity   Experience   Maturity   Release pressure   Collaboration
✔          ✔            −          −                  ✔
Question: How do tagging biases affect our research?
Do tagging biases exist in Jazz?
How do tagging biases affect our research?
Files   Defects + Tasks   Defects only
A       5                 3
B       4                 4
C       6                 4
D       1                 1
Defects + Tasks: biased, which we normally use.
Defects only: not biased, which we should use.
Spearman: .94
Pearson: .97
Conjecture: It might be ok to use biased data.
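The comparison behind this conjecture can be sketched as a correlation between per-file counts taken from the biased data (defects plus tasks) and from the unbiased data (defects only). The sketch below uses the four-file illustration from the slide (files A to D) and hand-written Pearson and Spearman helpers. The function names are assumptions made for this example, and the toy table is not expected to reproduce the .94 and .97 reported on the slide.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Per-file counts from the slide's illustration (files A, B, C, D).
defects_plus_tasks = [5, 4, 6, 1]  # biased: what we normally use
defects_only = [3, 4, 4, 1]        # not biased: what we should use

pearson_r = pearson(defects_plus_tasks, defects_only)
spearman_rho = spearman(defects_plus_tasks, defects_only)
print(f"Pearson:  {pearson_r:.2f}")
print(f"Spearman: {spearman_rho:.2f}")
```

High correlations on the full dataset would mean that the biased and unbiased counts order files in nearly the same way, which is the reasoning behind the conjecture.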