MSR2012 - Explaining Software Defects Using Topic Models

Explaining Software Defects Using Topic Models

Tse-Hsun (Peter) Chen, Stephen W. Thomas, Meiyappan Nagappan, Ahmed E. Hassan

int readFile(String filePath) {
    fp = readFile(filePath)
    if fp == NULL
        return -1
    else
        return fp
}

int manageMemory(int index) {
    if mem[index] is not NULL {
        freeInd = findFreeMemoryLoc()
        goto(freeInd)
    }
}

More Risky Concern

Can we use concerns to study defects?

Capturing Concerns Using Topic Models

manage memory index mem free ind find free memory loc

read file file path fp file path fp

Topic Models (LDA)

Topic 1: read, file, path, fp, file, index, ind

Topic 2: manage, memory, mem, free, find, loc
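The word lists above imply a preprocessing step in which identifiers such as readFile are split into separate words before the topic model runs. A minimal sketch of that step follows. It is not the authors' tooling: the function names, the regular expressions, and the stop-word list are all assumptions made for illustration.

```python
import re

# Hypothetical stop list: keywords and literals from the slide's pseudocode
# that say nothing about the concern a function implements.
STOP_WORDS = {"int", "string", "if", "else", "return", "is", "not", "null", "goto"}


def split_identifier(identifier):
    """Split a camelCase or snake_case identifier into lowercase words."""
    words = []
    for part in identifier.split("_"):
        # Matches acronym runs ("HTTP"), capitalized or lowercase words, digits.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [word.lower() for word in words]


def preprocess(source):
    """Turn source text into the bag of words a topic model would consume."""
    bag = []
    for identifier in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source):
        bag.extend(w for w in split_identifier(identifier) if w not in STOP_WORDS)
    return bag


print(preprocess("freeInd = findFreeMemoryLoc()"))
# → ['free', 'ind', 'find', 'free', 'memory', 'loc']
```

The resulting bags of words, one per source file, would then be passed to an LDA implementation of your choice.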

How defect prone are topics?

Can topics explain software defects?

Case Studies

3 versions of each system

0.4 - 8.8 MLOC

2.8K - 17K files

1,300 - 6,500 post-release defects

If some topics are more defect-prone than others, we can allocate MORE testing resources to these topics!

Measuring Topic Defect-proneness

[Figure: files F1, F2, F3 and topics T1, T2, T3, T4]
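The slides show the file and topic diagram but not a formula. One plausible scoring, assumed here and not taken from the slides, weights each file's post-release defect density (defects per line of code) by the file's membership in a topic and sums over all files. The function name, data layout, and every number below are hypothetical.

```python
def topic_defect_density(memberships, defects, loc):
    """Sum each file's defect density, weighted by its membership in each topic.

    memberships: {file_name: {topic: weight}}, weights between 0 and 1
    defects:     {file_name: post-release defect count}
    loc:         {file_name: lines of code}
    """
    density = {}
    for name, topics in memberships.items():
        file_density = defects[name] / loc[name]
        for topic, weight in topics.items():
            density[topic] = density.get(topic, 0.0) + weight * file_density
    return density


# Hypothetical memberships and defect counts for three files and four topics.
memberships = {
    "F1": {"T1": 0.5, "T2": 0.5},
    "F2": {"T2": 1.0},
    "F3": {"T3": 0.25, "T4": 0.75},
}
defects = {"F1": 2, "F2": 0, "F3": 4}
loc = {"F1": 100, "F2": 200, "F3": 100}

scores = topic_defect_density(memberships, defects, loc)
for topic in sorted(scores, key=scores.get, reverse=True):
    print(topic, round(scores[topic], 4))
```

Under this assumed scoring, a topic ranks high when it dominates files that have many defects relative to their size, which is the kind of ranking the testing-resource argument relies on.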

What Is the Relationship Between Defects and Topics?

[Figure: chart over topics T3, T2, T1, T4]

Few Topics are Defect-prone

[Figure: chart annotated with example topic labels]

Lower color, JFace, Comparison check

Task, Eclipse, Eclipse Mylyn, Task ui, Core, Repository

How defect prone are topics? Few Topics are Defect-prone!

Can topics explain software defects?

Explaining Defects

Static: Lines of Code

Historical: Pre-release Defects, Code Churn

Topics: Topic Metrics

Using Topics to Explain Defects

[Figure: files F1, F2, F3 and topics T1, T2, T3, T4]

Explainability of Metrics

Static: Deviance Explained (D1) and AIC1

Static + Topics: D2 and AIC2

Improvement in Explainability = D2 - D1 and AIC2 - AIC1
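The comparison above can be sketched with the standard definitions: deviance explained is one minus the ratio of residual to null deviance, and AIC is 2k - 2 ln(L) for a model with k parameters and maximized likelihood L. The slides do not give the fitted values, so every number below is hypothetical, as are the function and variable names.

```python
def deviance_explained(null_deviance, residual_deviance):
    """Fraction of the null model's deviance that the fitted model accounts for."""
    return 1.0 - residual_deviance / null_deviance


def aic(log_likelihood, num_parameters):
    """Akaike information criterion, 2k - 2 ln(L). Lower values are better."""
    return 2 * num_parameters - 2 * log_likelihood


# Hypothetical fit statistics for the two models being compared.
null_deviance = 1000.0
static = {"residual_deviance": 800.0, "log_likelihood": -400.0, "num_parameters": 2}
static_topics = {"residual_deviance": 600.0, "log_likelihood": -300.0, "num_parameters": 4}

d1 = deviance_explained(null_deviance, static["residual_deviance"])
d2 = deviance_explained(null_deviance, static_topics["residual_deviance"])
aic1 = aic(static["log_likelihood"], static["num_parameters"])
aic2 = aic(static_topics["log_likelihood"], static_topics["num_parameters"])

print("D2 - D1 =", round(d2 - d1, 4))
print("AIC2 - AIC1 =", aic2 - aic1)
```

Because lower AIC is better, a negative AIC2 - AIC1 means the extra topic metrics improved the fit by more than the penalty for the additional parameters.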

More Topics, More Defects in File

[Figure: chart, y-axis: % Avg. Imp. in D2]

Topic Membership Metrics: Few Topics are Defect-prone

[Figure: files F1, F2, F3 and topics T1, T2, T3, T4]

Dealing with Large # of Metrics

Topic membership metrics may have as many as 500 variables!

Solution: Use PCA to reduce the number of metrics
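To illustrate how PCA collapses correlated metrics, here is a minimal pure-Python sketch that finds only the first principal component by power iteration on the covariance matrix. It is an illustration under stated assumptions, not the authors' pipeline: a real analysis would use a statistics library and keep several components, and the data and function names below are hypothetical.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, explained_ratio) for the first principal component.

    rows: one list per file, one column per metric; needs at least two rows.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]

    # Deterministic, non-uniform start. Power iteration fails if the start
    # happens to be orthogonal to the dominant direction.
    vec = [float(j + 1) for j in range(d)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break  # no variance along the current direction
        vec = [x / norm for x in nxt]

    eigenvalue = sum(vec[i] * sum(cov[i][j] * vec[j] for j in range(d))
                     for i in range(d))
    total_variance = sum(cov[i][i] for i in range(d))
    ratio = eigenvalue / total_variance if total_variance else 0.0
    return vec, ratio


# Hypothetical topic memberships: column 2 is exactly twice column 1 and
# column 3 is constant, so one component captures all of the variance.
rows = [
    [0.1, 0.2, 0.5],
    [0.2, 0.4, 0.5],
    [0.3, 0.6, 0.5],
    [0.4, 0.8, 0.5],
]
direction, ratio = first_principal_component(rows)
print([round(x, 4) for x in direction], round(ratio, 4))
```

In this toy data three metrics reduce to a single variable with no loss, which is the effect the slide relies on when shrinking hundreds of topic membership variables.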

Topic Membership Metrics Explain Defects Even More

[Figure: chart, y-axis: % Avg. Imp. in AIC]

How defect prone are topics? Few Topics are Defect-prone!

Can topics explain software defects? YES!

Limitations

1. Parameter Choices
• Number of topics
• Thresholds

2. Used Baseline Metrics
• Static
• Historical

3. Studied Three Subject Systems

Summary
