An Empirical Study of Adoption of Software Testing in Open Source Projects

Pavneet Singh Kochhar (1), Tegawendé F. Bissyandé (2), David Lo (1), Lingxiao Jiang (1)

(1) Singapore Management University, (2) University of Luxembourg



Importance of Software Testing

Functionality -- Requirements

Debugging -- Software complexity

Costs -- $59 billion* for inadequate testing

What is the adoption of test cases in open-source projects?

*G. Tassey, “The economic impacts of inadequate infrastructure for software testing,” National Institute of Standards and Technology, RTI Project, 2002.


Objective & Contributions

Popularity of test cases

Presence of test cases – project characteristics

Influence of software development artifacts

Large Scale Study on over 20,000 GitHub projects


Dataset & Statistical computations

Downloaded over 100,000 projects from GitHub

Randomly selected 50,000 projects for a preliminary study

Filtered out projects with < 500 lines of code (LOC), leaving 20,817 projects
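The selection steps on this slide can be sketched as follows. This is a minimal illustration, not the authors' tooling: the `Project` record, the `select_projects` helper, and the fixed random seed are hypothetical. Only the sample size (50,000) and the 500 LOC threshold come from the slide.

```python
import random
from dataclasses import dataclass


@dataclass
class Project:
    name: str  # hypothetical identifier, e.g. "owner/repo"
    loc: int   # total lines of code measured for the project


MIN_LOC = 500         # threshold stated on the slide
SAMPLE_SIZE = 50_000  # sample size stated on the slide


def select_projects(projects, sample_size=SAMPLE_SIZE, min_loc=MIN_LOC, seed=0):
    """Randomly sample projects, then drop those below the LOC threshold."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    k = min(sample_size, len(projects))
    sampled = rng.sample(projects, k)
    # Projects with fewer than min_loc lines are filtered out.
    return [p for p in sampled if p.loc >= min_loc]
```

Sampling before filtering mirrors the order given on the slide; filtering first would change which projects end up in the sample.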


Dataset & Statistical computations

Lines of code              LOC*, by programming language
Number of test cases       Count of test files
Developer contributions    Project team size
Bug count                  Tags
Bug reporters              User names

*SLOCCount (http://www.dwheeler.com/sloccount/)
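Counting test files, as listed above, could look like the sketch below. The slide does not spell out the detection rule, so the rule used here (a file name containing "test", case-insensitive) is an assumption, and both function names are hypothetical.

```python
import os


def is_test_file(filename):
    """Assumed heuristic: the file name contains 'test' (case-insensitive)."""
    return "test" in filename.lower()


def count_test_files(root):
    """Walk a checked-out project tree and count files matching the heuristic."""
    count = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        count += sum(1 for name in filenames if is_test_file(name))
    return count
```

A name-based rule like this one misclassifies files such as `latest.txt`, which illustrates why detection heuristics are listed among the threats to validity.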


Lines of Code (LOC)


RQ1 – Popularity of Test Cases

Projects              % of Projects
Without Test Cases    38.34%
With Test Cases       61.65%

Distribution of Test Cases

84.87% of the projects have < 100 test cases
10.7% of the projects have > 100 and < 500 test cases
4.4% of the projects have > 500 test cases
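The bucket percentages above can be reproduced from a list of per-project test counts along these lines. The function name is hypothetical, and because the slide leaves the boundary cases open, placing exactly 100 and exactly 500 in the middle bucket is an assumption.

```python
def bucket_shares(test_counts):
    """Return the percentage of projects falling in each test-count bucket."""
    if not test_counts:
        raise ValueError("need at least one project")
    buckets = {"< 100": 0, "100-500": 0, "> 500": 0}
    for n in test_counts:
        if n < 100:
            buckets["< 100"] += 1
        elif n <= 500:  # boundary handling is an assumption
            buckets["100-500"] += 1
        else:
            buckets["> 500"] += 1
    total = len(test_counts)
    return {label: 100.0 * c / total for label, c in buckets.items()}
```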


Box Plot

[Figure: annotated box plot showing the median, the lower and upper quartiles (the box, 50% of the data), the lower and upper whiskers (25% of the data each), and outliers]
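The quantities named on this slide can be computed as in the sketch below. The slide does not say how whiskers and outliers are delimited, so the common Tukey convention (fences at 1.5 times the interquartile range) is assumed here; the function name and dictionary keys are hypothetical.

```python
import statistics


def box_plot_summary(values, whisker_factor=1.5):
    """Compute box-plot statistics using Tukey fences (an assumed convention)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker_factor * iqr
    high_fence = q3 + whisker_factor * iqr
    # Whiskers extend to the most extreme points still inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return {
        "lower_whisker": min(inside),
        "q1": q1,
        "median": median,
        "q3": q3,
        "upper_whisker": max(inside),
        "outliers": outliers,
    }
```

Under this convention each whisker covers roughly 25% of the data only when few points fall outside the fences.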


RQ1 – Popularity of Test Cases

LOC (Projects with & without Test cases)

Difference between the distributions is statistically significant (p-value < 0.05)
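A significance check of this kind could be run as sketched below. The slide does not name the test; a rank-based test such as Mann-Whitney U is a common choice for skewed data like LOC and is assumed here. The implementation uses the normal approximation without tie or continuity correction, so it is only suitable for reasonably large samples. All function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Return (U statistic of sample_a, two-sided p-value)."""
    n_a, n_b = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return u_a, p_value
```

Here `sample_a` and `sample_b` would hold the LOC of projects with and without test cases; a p-value below 0.05 corresponds to the threshold quoted on the slide.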


RQ1 – Popularity of Test Cases

LOC & Test Cases

Positive correlation between #LOC and #Test Cases (ρ=0.427) (p-value < 0.05)
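The symbol ρ conventionally denotes Spearman's rank correlation, which is assumed here since the slide does not name the coefficient. A minimal sketch, with hypothetical function names, computes it as the Pearson correlation of the tie-averaged ranks:

```python
import math


def ranks_with_ties(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(ranks_with_ties(xs), ranks_with_ties(ys))
```

Because only ranks matter, the coefficient is insensitive to the very large projects that dominate raw LOC and test counts.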


RQ1 – Popularity of Test Cases

LOC & Test cases/LOC

Negative correlation between #LOC and #Test Cases/LOC (ρ=-0.451) (p-value < 0.05)


RQ2 – Developers & Test Cases

Developers (Projects with & without Test cases)

Difference between the distributions is statistically significant (p-value < 0.05)


RQ2 – Developers & Test Cases

Developers & Test cases

Weak correlation between #Developers and #Test Cases (ρ=0.207) (p-value < 0.05)


RQ2 – Developers & Test Cases

Developers & Test cases/developer

Negative correlation between Team size and #Test Cases per developer (ρ=-0.444) (p-value < 0.05)


RQ3 – Bug Count and Test Cases

Identifying bugs (Tags)

bug       bug; T bug; Bug Confirmed; bugs; starter bug; bug fix etc.
defect    defect; Type-Defect; minor defect
error     error; Wow error; build error; error page; user error etc.
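The tag families above suggest a keyword match on issue tags. The sketch below assumes a case-insensitive substring match, which is consistent with the listed examples (for instance "Type-Defect" and "Wow error") but is not stated on the slide; the function names and the list-of-tag-lists input shape are hypothetical.

```python
BUG_KEYWORDS = ("bug", "defect", "error")  # the three keywords on the slide


def is_bug_tag(tag):
    """Assumed rule: the tag contains one of the keywords, ignoring case."""
    lowered = tag.lower()
    return any(keyword in lowered for keyword in BUG_KEYWORDS)


def count_bugs(issues):
    """Count issues carrying at least one bug-like tag.

    `issues` is an iterable of tag lists, one list per issue.
    """
    return sum(1 for tags in issues if any(is_bug_tag(t) for t in tags))
```

A substring rule also matches unrelated tags such as "debug", so counts obtained this way are approximate.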


RQ3 – Bug Count and Test Cases

Test cases & Bugs

Weak correlation between # bugs and #Test Cases (ρ=0.181) (p-value < 0.05)


RQ4 – Bug Reporters and Test Cases

Bug reporters (Projects with & without Test cases)

Difference between the distributions is statistically significant (p-value < 0.05)


RQ4 – Bug Reporters and Test Cases

Test cases & Bug reporters

Weak correlation between # bug reporters and #Test Cases (ρ=0.171) (p-value < 0.05)


RQ5 – Programming Languages and Test Cases

Projects (Top 10 Languages)

1. Java
2. Ruby
3. PHP
4. Python
5. ANSI C
6. C++
7. Objective-C
8. C#
9. JavaScript
10. Perl


RQ5 – Programming Languages and Test Cases

Test Cases/Project (Top 10 Languages)

Language       # of Projects   # of Test Cases   Test Cases/Project
C++            1,920           648,773           337.90
ANSI C         2,197           286,009           130.18
PHP            2,902           255,553            88.06
C#             1,042            81,334            78.05
Java           3,112           196,703            63.20
Ruby           3,016           173,864            57.64
JavaScript       819            39,070            47.70
Python         2,536           103,600            40.85
Objective-C    1,153            21,343            18.51
Perl             630             7,690            12.20
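The last column of the table is the total number of test cases divided by the number of projects. The sketch below recomputes it from the totals shown on the slide; the dictionary layout and function name are hypothetical.

```python
# (number of projects, number of test cases) per language, from the slide
TOTALS = {
    "C++": (1920, 648773),
    "ANSI C": (2197, 286009),
    "PHP": (2902, 255553),
    "C#": (1042, 81334),
    "Java": (3112, 196703),
    "Ruby": (3016, 173864),
    "JavaScript": (819, 39070),
    "Python": (2536, 103600),
    "Objective-C": (1153, 21343),
    "Perl": (630, 7690),
}


def mean_tests_per_project(totals):
    """Mean number of test cases per project for each language."""
    return {lang: tests / projects for lang, (projects, tests) in totals.items()}


if __name__ == "__main__":
    means = mean_tests_per_project(TOTALS)
    # Print languages from highest to lowest mean.
    for lang in sorted(means, key=means.get, reverse=True):
        print(f"{lang:<12} {means[lang]:8.2f}")
```

A mean of this kind is sensitive to a handful of very large projects, so it is worth reading alongside the per-language medians.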


RQ5 – Programming Languages and Test Cases

Test Cases (Median) (Top 10 Languages)


Distribution of test cases (C++)


Threats to Validity

Heuristics to detect test cases

Counting bugs via tags: bug, error, defect

Not all projects use GitHub’s issue tracking system


Conclusion

Findings:
o Projects with test cases are bigger in size.
o # of test cases per LOC decreases with increasing LOC.
o The more developers, the more test cases.
o The more developers, the lower the ratio of test cases per developer.
o Weak correlation between # of test cases and # of bugs.
o # of test cases and # of bug reporters have a weak positive correlation.
o Projects written in popular languages such as C++, ANSI C & PHP have higher mean numbers of test cases.

Future agenda:
- Exploration of the influence of more project characteristics/metrics
- Check with other open source datasets
- Use language-specific heuristics


Appendix


Bug Tags


installation rich Improvement Reporting

duplicated pat New feature community

feature mark Confirmed documentation

routing needs review In Progress categorization

optimization Samples Feature request publishing

security Unable to reproduce Wont fix ranker

translations nack Resolved server

ui rich Bug confirmed Fatal

TODO pat backend Build System

low priority mark low-priority MS AspNet

Sam presentation frontend OAuth2


C++ test cases

URL Language # of test cases

https://github.com/isis-project/WebKit cpp 166,488

https://github.com/cswei/Olympia_on_Desktop cpp 94,591

https://github.com/librelab/qtmoko-test cpp 52,039

https://github.com/mozilla/mozilla-central cpp 36,671

https://github.com/weissms/owb-mirror cpp 29,340


Distribution of test cases (C#)


RQ5 – Programming Languages and Test Cases

Test Cases (Top 10 Languages)

[Figure: box plot annotated with the median, lower and upper quartiles (50% of the data), lower and upper whiskers, and outliers]