DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis


This is my presentation at ICSM 2013.


Yuan Tian¹, David Lo¹, Chengnian Sun² (¹Singapore Management University, ²National University of Singapore)

Bug tracking systems allow developers to prioritize which bugs should be fixed first. Priority assignment is:

• A manual process
• Dependent on other bugs
• Time consuming

What is priority, and when is it assigned?

[Diagram: a reported bug moves from New to Assigned. With roughly 300 reports to triage daily, the bug triager performs validity checks, duplicate checks, priority assignment, and developer assignment.]

Priority vs. Severity

"Severity is assigned by customers [users] while priority is provided by developers ... customer [user] reported severity does impact the developer when they assign a priority level to a bug report, but it's not the only consideration. For example, it may be a critical issue for a particular reporter that a bug is fixed but it still may not be the right thing for the eclipse team to fix." (Eclipse PMC member)

Example Importance field: "P5 major", i.e., the lowest priority level (P5) combined with a high severity (major).

Outline

• Background
• Approach
  - Overall Framework
  - Features
  - Classification Module
• Experiment
  - Dataset
  - Research Questions
  - Results
• Conclusion


Bug Report

A bug report contains: (1) summary, (2) description, (3) product, (4) component, (5) author, (6) severity, (7) priority, plus time-related information.


Overall Framework

[Diagram: In the training phase, training reports (together with related reports) pass through the Feature Extraction Module, which extracts temporal, textual, author, severity, related-report, and product features; the Classifier Module's model builder then learns a model. In the testing phase, testing reports go through the same feature extraction, and model application produces the predicted priority.]

Temporal Factor
  TMP1      Number of bugs reported within 7 days before the reporting of BR.
  TMP2      Number of bugs reported with the same severity within 7 days before the reporting of BR.
  TMP3      Number of bugs reported with the same or higher severity within 7 days before the reporting of BR.
  TMP4-6    The same as TMP1-3, except the time duration is 1 day.
  TMP7-9    The same as TMP1-3, except the time duration is 3 days.
  TMP10-12  The same as TMP1-3, except the time duration is 30 days.

Textual Factor
  TXT1-n    Stemmed words from the description field of BR, excluding stop words.

Severity Factor
  SEV       BR's severity field.
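As an illustration, here is a minimal sketch (in Python) of how the TMP features could be computed, assuming each historical report carries a timestamp and an ordinal severity; all names are illustrative, not the paper's implementation:

from datetime import datetime, timedelta

# Illustrative report history: (timestamp, severity), severity as an
# ordinal integer (higher = more severe).
reports = [
    (datetime(2007, 1, 1), 3),
    (datetime(2007, 1, 5), 5),
    (datetime(2007, 1, 6), 4),
]

def temporal_features(br_time, br_severity, history, days):
    """TMP-style counts over the `days`-day window before br_time."""
    window_start = br_time - timedelta(days=days)
    in_window = [s for (t, s) in history if window_start <= t < br_time]
    tmp1 = len(in_window)                                 # all reports
    tmp2 = sum(1 for s in in_window if s == br_severity)  # same severity
    tmp3 = sum(1 for s in in_window if s >= br_severity)  # same or higher
    return tmp1, tmp2, tmp3

# TMP1-3 use a 7-day window; TMP4-6, TMP7-9, and TMP10-12 reuse the same
# counts with 1-, 3-, and 30-day windows.
br_time, br_severity = datetime(2007, 1, 7), 4
print([temporal_features(br_time, br_severity, reports, d) for d in (7, 1, 3, 30)])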

Author Factor
  AUT1  Mean priority of all bug reports made by the author of BR prior to the reporting of BR.
  AUT2  Median priority of all bug reports made by the author of BR prior to the reporting of BR.
  AUT3  Number of bug reports made by the author of BR prior to the reporting of BR.

Related-Reports Factor [REP, Sun et al.]
  REP1     Mean priority of the top-20 bug reports most similar to BR, as measured using REP, prior to the reporting of BR.
  REP2     Median priority of the top-20 bug reports most similar to BR, as measured using REP, prior to the reporting of BR.
  REP3-4   The same as REP1-2, except only the top 10 bug reports are considered.
  REP5-6   The same as REP1-2, except only the top 5 bug reports are considered.
  REP7-8   The same as REP1-2, except only the top 3 bug reports are considered.
  REP9-10  The same as REP1-2, except only the top bug report is considered.
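The related-report features only aggregate the priorities of the earlier reports most similar to BR; the similarity measure itself is REP (sketched in the appendix). A minimal sketch of the aggregation step, where `similarity` is a stand-in for REP:

from statistics import mean, median

def rep_features(br, prior_reports, similarity):
    """REP1-10: mean and median priority of the top-k (k = 20, 10, 5, 3, 1)
    earlier reports most similar to br; `prior_reports` holds
    (report, priority) pairs filed before br."""
    ranked = sorted(prior_reports,
                    key=lambda rp: similarity(br, rp[0]), reverse=True)
    feats = []
    for k in (20, 10, 5, 3, 1):
        top = [priority for _, priority in ranked[:k]] or [0]  # default if no history
        feats += [mean(top), median(top)]
    return feats  # [REP1, REP2, ..., REP9, REP10]

# Toy usage: similarity as the number of shared words between summaries.
prior = [("crash on startup", 1), ("typo in dialog", 4), ("crash in editor", 2)]
sim = lambda a, b: len(set(a.split()) & set(b.split()))
print(rep_features("editor crash", prior, sim))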

Product Factor
  PRO1      BR's product field (a categorical feature).
  PRO2      Number of bug reports made for the same product as that of BR prior to the reporting of BR.
  PRO3      Number of bug reports made for the same product, with the same severity as that of BR, prior to the reporting of BR.
  PRO4      Number of bug reports made for the same product, with the same or higher severity than that of BR, prior to the reporting of BR.
  PRO5      Proportion of bug reports made for the same product as that of BR prior to the reporting of BR that are assigned priority P1.
  PRO6-9    The same as PRO5, except for priorities P2-P5 respectively.
  PRO10     Mean priority of bug reports made for the same product as that of BR prior to the reporting of BR.
  PRO11     Median priority of bug reports made for the same product as that of BR prior to the reporting of BR.
  PRO12-22  The same as PRO1-11, except for the component field of BR.
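A sketch of the product-factor computations over prior reports; PRO1 (the raw categorical product field) is omitted, and PRO12-22 follow the same pattern over the component field. Names are illustrative:

from statistics import mean, median

def product_features(br_product, br_severity, prior):
    """PRO2-PRO11 for one report BR; `prior` lists (product, severity,
    priority) tuples of reports filed before BR."""
    same = [(sev, pri) for (prod, sev, pri) in prior if prod == br_product]
    n = len(same)
    pro2 = n                                                  # same product
    pro3 = sum(1 for sev, _ in same if sev == br_severity)    # same severity
    pro4 = sum(1 for sev, _ in same if sev >= br_severity)    # same or higher
    # PRO5-9: proportion of prior same-product reports at priority P1..P5
    pro5_9 = [sum(1 for _, p in same if p == lvl) / n if n else 0.0
              for lvl in (1, 2, 3, 4, 5)]
    pro10 = mean(p for _, p in same) if n else 0.0            # mean priority
    pro11 = median(p for _, p in same) if n else 0.0          # median priority
    return [pro2, pro3, pro4, *pro5_9, pro10, pro11]

prior = [("JDT", 4, 3), ("JDT", 5, 1), ("PDE", 3, 3)]
print(product_features("JDT", 4, prior))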


GRAY: Thresholding and Linear Regression to Classify Imbalanced Data

[Diagram, training phase, step 1 (model building): training features extracted from the data are fed to linear regression, which maps each report's feature values to a real number.]
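GRAY's first step treats the priority labels as ordinal values (P1-P5 mapped to 1-5) and fits linear regression over the extracted features, so that each report is scored with a real number. A minimal sketch with scikit-learn and toy data (the paper's actual implementation is not shown here):

import numpy as np
from sklearn.linear_model import LinearRegression

# Toy training data: feature vectors and ordinal priority labels (P1..P5 -> 1..5).
X_train = np.array([[0.9, 12], [0.1, 3], [0.4, 7], [0.2, 5], [0.8, 10]])
y_train = np.array([1, 3, 3, 3, 2])  # skewed toward P3, as in the Eclipse data

model = LinearRegression().fit(X_train, y_train)

# The model maps each report's features to a real-valued score; the
# thresholding step (next) converts scores back to discrete priority levels.
print(model.predict(X_train))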

Model Building

Data

Training Features

Linear Regression

Model Application

Model

Validation Data

Thresholding

Thresholds

GRAY: Thresholding and Linear Regression to Classify Imbalanced Data.

14

Training Phase

Model Building

Data

Training Features

Linear Regression

Model Application

Model

Validation Data

Thresholding

Thresholds

GRAY: Thresholding and Linear Regression to Classify Imbalanced Data.

15

Thresholding Process (Training Phase)

• The thresholding process maps real numbers to priority levels.

Example: regression scores on ten validation reports:
  BR1 1.2, BR2 1.4, BR3 3.1, BR4 3.5, BR5 2.1, BR6 3.2, BR7 3.4, BR8 3.7, BR9 1.3, BR10 4.5

Sorted: BR1 1.2, BR9 1.3, BR2 1.4, BR5 2.1, BR3 3.1, BR6 3.2, BR7 3.4, BR4 3.5, BR8 3.7, BR10 4.5

Thresholds at 1.2, 1.4, 3.4, and 3.7 split the sorted scores into the five priority levels: P1 = {BR1}, P2 = {BR9, BR2}, P3 = {BR5, BR3, BR6, BR7}, P4 = {BR4, BR8}, P5 = {BR10}.
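A minimal sketch of this mapping with the thresholds from the example; a score at or below a threshold falls into that level, and anything above the last threshold is P5:

import bisect

def to_priority(score, thresholds=(1.2, 1.4, 3.4, 3.7)):
    """Map a regression score to a priority level 1..5: scores at or below
    thresholds[i] fall into level i+1; above the last threshold is P5."""
    return bisect.bisect_left(thresholds, score) + 1

scores = {"BR1": 1.2, "BR9": 1.3, "BR2": 1.4, "BR5": 2.1, "BR3": 3.1,
          "BR6": 3.2, "BR7": 3.4, "BR4": 3.5, "BR8": 3.7, "BR10": 4.5}
print({br: f"P{to_priority(s)}" for br, s in scores.items()})
# BR1 -> P1; BR9, BR2 -> P2; BR5, BR3, BR6, BR7 -> P3; BR4, BR8 -> P4; BR10 -> P5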

[Diagram, testing phase: testing features are fed to the learned model, and the thresholds from the training phase convert the resulting scores into the predicted priority.]


Dataset

• Eclipse project: 2001-10-10 to 2007-12-14, 178,609 bug reports.
• The reports are split into a REP training set, a DRONE training set (model building + validation), and a DRONE testing set.
• Priority distribution: P1 4.50%, P2 6.89%, P3 85.45%, P4 1.95%, P5 1.21% (heavily imbalanced toward P3).

Research Questions & Measurements

• RQ1: Accuracy (precision, recall, F-measure), compared against SEVERISprio [Menzies & Marcus] and SEVERISprio+.
• RQ2: Efficiency (run time).
• RQ3: Top features (Fisher score).
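For RQ3, features are ranked by Fisher score. A sketch using the standard definition (between-class scatter over within-class scatter; the paper may use a variant):

import numpy as np

def fisher_score(x, y):
    """Fisher score of one feature over the priority classes.
    x: feature values; y: class labels (e.g., 1..5 for P1..P5)."""
    mu = x.mean()
    between = within = 0.0
    for c in np.unique(y):
        xc = x[y == c]
        between += len(xc) * (xc.mean() - mu) ** 2
        within += len(xc) * xc.var()
    return between / within if within else 0.0

x = np.array([0.9, 0.1, 0.2, 0.8, 0.15])
y = np.array([1, 3, 3, 1, 3])
print(fisher_score(x, y))  # higher = more discriminative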

RQ1: How accurate?

[Bar chart: F-measure (0-100%) for each priority level P1-P5, comparing DRONE, SEVERISprio, and SEVERISprio+.]

1. The baselines predict everything as P3.
2. DRONE improves the average F-measure from 18.75% to 29.47%.
3. This is a relative improvement of 57.17%.

RQ2: How efficient?

Run time (in seconds):

  Approach      Feature Extraction (train)  Model Building  Feature Extraction (test)  Model Application
  SEVERISprio   <0.01                       812.18          <0.01                      <0.01
  SEVERISprio+  <0.01                       773.62          <0.01                      <0.01
  DRONE         0.01                        69.25           <0.01                      <0.01

Our approach is much faster in model building!

RQ3: What are the top features?

Top-10 features by Fisher score: PRO5, PRO16, REP1, REP3, PRO18, PRO10, PRO21, PRO7, REP5, and the textual feature "1663".

• 6 of the top-10 features belong to the product factor family.
• 3 of the top-10 features come from the related-report factor family.
• The textual feature "1663" comes from the stack-trace line org.eclipse.ui.internal.Workbench.run(Workbench.java:1663), which appears in 15% of P5 reports.

Conclusion (yuan.tian.2012@smu.edu.sg)

• Priority prediction is an ordinal, imbalanced classification problem; linear regression + thresholding is one option for it.
• DRONE improves the average F-measure of the baselines from 18.75% to 29.47%, a relative improvement of 57.17%.
• Product factor features are the most discriminative, followed by related-report factor features.

I acknowledge the support of Google and the ICSM organizers in the form of a Female Student Travel Grant, which enabled me to attend this conference.

Thank you!


APPENDIX


Appendix: Threshold Initialization

Proportions of each priority level in the validation data: P1 10%, P2 20%, P3 40%, P4 20%, P5 10%.

Applying the linear regression model to the validation data gives the scores:
  BR1 1.2, BR2 1.4, BR3 3.1, BR4 3.5, BR5 2.1, BR6 3.2, BR7 3.4, BR8 3.7, BR9 1.3, BR10 4.5

Sorted: BR1 1.2, BR9 1.3, BR2 1.4, BR5 2.1, BR3 3.1, BR6 3.2, BR7 3.4, BR4 3.5, BR8 3.7, BR10 4.5

The initial thresholds 1.2, 1.4, 3.4, and 3.7 are placed so that the predicted priority levels reproduce these proportions: the lowest 10% of scores (up to 1.2) are P1, the next 20% (up to 1.4) P2, the next 40% (up to 3.4) P3, the next 20% (up to 3.7) P4, and the remaining 10% P5.
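A sketch of this initialization, assuming each threshold is placed at the score of the last report that should fall into its level under the given proportions:

def initial_thresholds(scores, proportions=(0.10, 0.20, 0.40, 0.20)):
    """Place each threshold at the score of the last report belonging to
    that priority level; P5 takes whatever remains above the last one."""
    ordered = sorted(scores)
    thresholds, cum = [], 0.0
    for p in proportions:
        cum += p
        idx = round(cum * len(ordered)) - 1  # last report in this level
        thresholds.append(ordered[idx])
    return thresholds

scores = [1.2, 1.4, 3.1, 3.5, 2.1, 3.2, 3.4, 3.7, 1.3, 4.5]
print(initial_thresholds(scores))  # [1.2, 1.4, 3.4, 3.7], as in the example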

Starting from the initialized thresholds (1.2, 1.4, 3.4, 3.7), the thresholds are tuned one at a time: candidate values for the first threshold (e.g., 1.1 and 1.3) are tried, and the F-measure on the validation data is computed for each candidate.

The candidate with the higher F-measure (here 1.3) replaces the old value. Threshold 1 is then fixed, and tuning moves on to the next threshold in the same way.
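A sketch of the greedy tuning loop, reusing `to_priority` from the thresholding sketch above; `candidates_for` is an assumed helper that supplies trial values for a threshold (how candidates are generated is not shown in the slides), and the objective is the average F-measure over the five levels:

from statistics import mean

def avg_f_measure(y_true, y_pred, levels=(1, 2, 3, 4, 5)):
    """Average F-measure over the five priority levels."""
    fs = []
    for c in levels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        fs.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return mean(fs)

def tune_thresholds(thresholds, scores, y_true, candidates_for):
    """Greedily tune one threshold at a time: try each candidate value,
    keep the one with the best validation F-measure, fix it, move on."""
    ths = list(thresholds)
    for i in range(len(ths)):
        best = avg_f_measure(y_true, [to_priority(s, ths) for s in scores])
        for cand in candidates_for(i, ths):
            trial = ths[:i] + [cand] + ths[i + 1:]
            f = avg_f_measure(y_true, [to_priority(s, trial) for s in scores])
            if f > best:
                best, ths[i] = f, cand
    return ths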

Previous Research Work: Severity Prediction

• Menzies and Marcus (ICSM 2008)
  - Analyze reports from NASA
  - Textual features + feature selection + RIPPER
• Lamkanfi et al. (MSR 2010, CSMR 2011)
  - Predict coarse-grained severity labels (severe vs. non-severe)
  - Analyze reports from open-source systems
  - Compare and contrast various algorithms
• Tian et al. (WCRE 2012)
  - Information retrieval + k-nearest neighbour

Text Pre-processing

• Tokenization: splitting a document into tokens according to delimiters.
• Stop-word removal: e.g., "are", "is", "I", "he".
• Stemming: e.g., "working", "works", "worked" -> "work".
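A toy sketch of the three steps; the suffix-stripping stemmer is a deliberately naive stand-in for a real (e.g., Porter) stemmer, and the stop-word list is illustrative:

import re

STOP_WORDS = {"are", "is", "i", "he", "the", "a", "to"}  # tiny illustrative set

def naive_stem(token):
    """Toy suffix stripper; a real system would use a Porter-style stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    tokens = re.split(r"\W+", text.lower())                    # tokenization
    tokens = [t for t in tokens if t and t not in STOP_WORDS]  # stop words
    return [naive_stem(t) for t in tokens]                     # stemming

print(preprocess("He is working; it worked and works."))
# ['work', 'it', 'work', 'and', 'work']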


Appendix: Similarity Between Bug Reports (REP)

• Textual features: compute BM25F_ext scores
  - Feature 1: unigrams
  - Feature 2: bigrams
• Non-textual features:
  - Feature 3: product field
  - Feature 4: component field

Note: weights are learned from duplicate bug reports.
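REP's textual component is BM25F_ext with field weights learned from duplicate reports; as a simplified stand-in, here is plain unigram BM25 over tokenized documents (not the paper's full measure):

import math
from collections import Counter

def bm25_score(query_tokens, doc_tokens, corpus, k1=1.2, b=0.75):
    """Simplified unigram BM25 (not the full BM25F_ext with learned field
    weights used by REP). `corpus` is a list of token lists."""
    n = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n
    tf = Counter(doc_tokens)
    df = Counter()
    for d in corpus:
        df.update(set(d))  # document frequency of each term
    score = 0.0
    for t in set(query_tokens):
        if tf[t] == 0:
            continue
        idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1)
        norm = tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(doc_tokens) / avgdl))
        score += idf * norm
    return score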

Appendix: Definitions of the Top-10 Features

  PRO5   Proportion of bug reports made for the same product as that of BR prior to the reporting of BR that are assigned priority P1.
  PRO16  Proportion of bug reports made for the same component as that of BR prior to the reporting of BR that are assigned priority P1.
  REP1   Mean priority of the top-20 bug reports most similar to BR, as measured using REP, prior to the reporting of BR.
  REP3   Mean priority of the top-10 bug reports most similar to BR, as measured using REP, prior to the reporting of BR.
  PRO18  Proportion of bug reports made for the same component as that of BR prior to the reporting of BR that are assigned priority P3.
  PRO10  Mean priority of bug reports made for the same product as that of BR prior to the reporting of BR.
  PRO21  Mean priority of bug reports made for the same component as that of BR prior to the reporting of BR.
  PRO7   Proportion of bug reports made for the same product as that of BR prior to the reporting of BR that are assigned priority P3.
  REP5   Mean priority of the top-5 bug reports most similar to BR, as measured using REP, prior to the reporting of BR.
  Text "1663"  The token "1663" from the stack-trace line Workbench.java:1663 (see RQ3).
