Exploring the Use of Labels to Categorize Issues in Open-Source Software Projects

Jordi Cabot, Javier Luis Cánovas Izquierdo, Valerio Cosentino, Belén Rolandi

SANER conference, March 2015


Open-Source Systems

…computer software with its source code made available with a license in which the copyright holder provides the rights to study, change and distribute the software to anyone and for any purpose.

…Open-Source Software (OSS) is developed in a collaborative public manner.

Contributing to OSS

doc, code, support, pull request

request, question, bug

Label Issues in GitHub

Issue fields: Title, Author, Description, Assignee, Status, Labels

Default labels: bug, duplicate, enhancement, help wanted, invalid, question, wontfix


GitHub Analysis

GHTorrent

GiLA


RQ1. Label Usage

How many labels are used in GitHub? How many labels are used per project? What are the most popular ones?

RQ2. Label Influence

For those projects using labels, does their usage influence the evolution of the project?

Early Research Achievement

Can we detect groups of labels commonly used together? Are there label families?

RQ1. Label Usage

(Image: flickr/tiffanyTerry)

Label Usage in GitHub

Projects using labels: 122,012 (3%)

Projects not using labels: 3,635,026 (97%)

Lesson: Labels are scarcely used in GitHub
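The 3% / 97% split follows directly from the two project counts on the slide. A minimal sketch of that arithmetic, where the variable names are illustrative and only the two counts come from the slide:

```python
# Project counts reported on the slide.
using_labels = 122_012
not_using_labels = 3_635_026

# Share of projects that use labels, as a percentage of all projects.
total_projects = using_labels + not_using_labels
share_using = 100 * using_labels / total_projects

print(f"{total_projects:,} projects in total, {share_using:.2f}% use labels")
```

Rounded to a whole percent, this reproduces the 3% shown on the slide.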

Main Labels

Lesson: Default labels dominate, but documentation and feature are also widely used

Labels used together

Lesson: bug and enhancement are the labels most often used together
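One way to find which labels are used together is to count label pairs that appear on the same issue. The sketch below is only an illustration of that idea: the function name and the sample issues are made up, and the slides do not say how the authors computed co-occurrence.

```python
from collections import Counter
from itertools import combinations


def label_pair_counts(issues):
    """Count how often each unordered pair of labels appears on the same issue."""
    counts = Counter()
    for labels in issues:
        # Sort and de-duplicate so that (a, b) and (b, a) share one key.
        for pair in combinations(sorted(set(labels)), 2):
            counts[pair] += 1
    return counts


# Made-up issues for illustration only (not data from the study).
sample_issues = [
    ["bug", "enhancement"],
    ["bug"],
    ["enhancement", "bug", "question"],
    ["question"],
]

pair_counts = label_pair_counts(sample_issues)
top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count)
```

Issues with a single label contribute no pairs, so they do not affect the ranking.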

Projects using labels

# labels used in the project    # projects
1      55561
2      31026
3      13390
4      6910
5      4206
6      3011
7      1934
8      1378
9      955
10     723
>10    2918

Total: 122,012

Chart annotations: 1.47%, 0.82%, 0.94%

Labels/Issue

# labels used in the project    # projects    Avg. labels/issue
1      55561    1
2      31026    1.02
3      13390    1.04
4      6910     1.06
5      4206     1.09
6      3011     1.08
7      1934     1.13
8      1378     1.18
9      955      1.2
10     723      1.25
>10    2918     1.52

Avg: 1.14
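The reported average of 1.14 labels per issue matches a plain, unweighted mean over the eleven buckets of the chart. The sketch below checks that reading. That the authors averaged over buckets rather than over projects or issues is an assumption, and the variable names are illustrative.

```python
# Average labels per issue for projects using 1, 2, ..., 10 and >10 labels,
# as read from the chart.
avg_labels_per_issue = [1, 1.02, 1.04, 1.06, 1.09, 1.08, 1.13, 1.18, 1.2, 1.25, 1.52]

# Unweighted mean over the buckets (assumption: each bucket counts equally).
bucket_mean = sum(avg_labels_per_issue) / len(avg_labels_per_issue)

print(round(bucket_mean, 2))
```

A mean weighted by the number of projects per bucket would be lower, because most projects sit in the buckets with one or two labels.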

% Labeled Issues

# labels used in the project    # projects    % labeled issues
1      55561    59.87
2      31026    61
3      13390    58.89
4      6910     58.84
5      4206     59.72
6      3011     56.16
7      1934     57.83
8      1378     58.99
9      955      58.83
10     723      55.06
>10    2918     55.88

Avg: 58.29%

Users involved in labeled issues

# labels used in the project    # projects    % labeled issues    % users involved in labeled issues
1      55561    59.87    80.98
2      31026    61       72.06
3      13390    58.89    77.73
4      6910     58.84    75.81
5      4206     59.72    75.22
6      3011     56.16    72.05
7      1934     57.83    72.87
8      1378     58.99    75.52
9      955      58.83    72.06
10     723      55.06    69.25
>10    2918     55.88    70.43

Avg: 78.72%

RQ2. Label Influence

(Image: flickr/JorisLouwes)

Label Influence

Projects using labels, compared on: Time to Solve, Issue Age, % Solved, People Involved

Label Influence

# labels used in the project    Med. time to solve    % solved
0      26.93    22.53
1      46.18    43.51
2      74.92    48.76
3      101.3    53.21
4      111.8    55.27
5      145.7    56.3
6      116.4    58.82
7      127.2    57.95
8      116.4    59.28
9      70.4     63.23
10     306.4    47.59
>10    148.1    60.19

On average, the percentage of solved labeled issues tends to increase with the number of labels used in the project, which may confirm that the effort of categorizing issues benefits the project's advancement.

This might come at the cost of taking more time to solve those labeled issues.

ρ = 0.80

ρ = 0.73
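The slide does not say which correlation coefficient ρ denotes, nor which series each value belongs to. As a sketch under the assumption that ρ is Spearman's rank correlation, the code below computes it between the number of labels used and the percentage of solved issues, using the twelve chart values. The helper names are illustrative, and the ">10" bucket is encoded as 11 purely to give it a rank.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(xs, ys):
    """Spearman's rho via the no-ties formula 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Buckets 0..10 labels, with ">10" encoded as 11 (an assumption for ranking only).
labels_used = list(range(12))
# % solved per bucket, as read from the chart.
pct_solved = [22.53, 43.51, 48.76, 53.21, 55.27, 56.3,
              58.82, 57.95, 59.28, 63.23, 47.59, 60.19]

rho = spearman_rho(labels_used, pct_solved)
print(round(rho, 2))  # prints 0.73
```

The no-ties formula is only valid here because all twelve % solved values are distinct; the median time-to-solve series contains a repeated value (116.4) and would need a tie-aware ranking instead.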

Going further

Detecting families

(Image: flickr/RichBrooks)

Detecting families

Example labels: bug, build, contribution, documentation, duplicate, 0 - backlog, 1 - ready, 2 - working, 3 - done, docs, enhancement, invalid, urgent, priority-high, high-priority, priority-low, question, priority-medium, usability, component-logic, component-notyi, component-ui, component-mode-perl, component-ui-gtk, frontend-gtk, frontend-pango, milestone, imported, 0.0.1, 0.0.3, 1.0.0.rc1, 0.2.0, 0.5.0, 1.0.0, update, type-cleanup, p1, p2, p3, taken, fixed, discuss, milestone-release0.4, milestone-release0.7, performance, medium-priority, wontfix, new, type-ask, low-priority

Families?

Family          # Labels          % Projects
Priority        1,027 (2.33%)     4.33%
Version         2,703 (6.14%)     1.68%
Workflow        1,972 (4.48%)     5.67%
Architecture    1,104 (2.51%)     2.00%

Priority family, highlighted labels: urgent, priority-high, high-priority, priority-low, priority-medium, low-priority, medium-priority

Version family, highlighted labels: milestone, 1.0.0, milestone-release0.4, 0.2.0, 0.5.0, milestone-release0.7, new, 0.0.1, 0.0.3, 1.0.0.rc1

Workflow family, highlighted labels: 0 - backlog, 1 - ready, 2 - working, 3 - done, p1, p2, p3, taken

Architecture family, highlighted labels: component-notyi, milestone-release0.7, component-ui-gtk, frontend-pango, frontend-gtk, component-ui, component-logic, component-mode-perl
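The slides do not describe how labels were assigned to families. As a rough illustration only, the sketch below guesses a family from the label name with a few regular expressions. The patterns and the function name are hypothetical, chosen to fit the example labels shown above; they are not the authors' method.

```python
import re

# Hypothetical patterns, written only to match the example labels on the slides.
FAMILY_PATTERNS = {
    "Priority": re.compile(r"priority|urgent"),
    "Version": re.compile(r"^\d+\.\d+|milestone"),
    "Workflow": re.compile(r"^\d+ - "),
    "Architecture": re.compile(r"^(component|frontend)-"),
}


def guess_family(label):
    """Return the first family whose pattern matches the label, or None."""
    name = label.lower()
    for family, pattern in FAMILY_PATTERNS.items():
        if pattern.search(name):
            return family
    return None


for label in ["high-priority", "1.0.0.rc1", "0 - backlog", "component-ui-gtk", "bug"]:
    print(label, "->", guess_family(label))
```

Name-based rules like these miss labels whose family only shows in how they are used (for example p1, p2, p3 or taken), so they are at best a first filter before manual review.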

Conclusion

Early result

• Label mechanism is scarcely used

• When used, it may have a positive impact on the project

• Confirmed the existence of families when using labels

Future

• Further research is needed to better classify their use

• How families influence project success

• Why projects choose a specific label family

• How labels evolve during the life cycle of the project

• Extend the analysis to other web-based code hosting services

Except where otherwise noted, content on this presentation is licensed under a Creative Commons Attribution 3.0 License.

Thanks!

Come see our awesome demonstration!

Belén Rolandi, Jordi Cabot, Javier L. Cánovas Izquierdo, Valerio Cosentino

Label Usage (issues)

Projects with 0 to 9 issues (Total: 68,216; Avg: 70.10%)

# labels used in the project    # projects    %
1      45150    69.55
2      17268    75.94
3      3915     79.65
4      1071     82.18
5      421      84.31
6      223      78.1
7      84       84.65
8      49       84.64
9      19       83.57
10     4        85
>10    12       82.64

Projects with 10 to 99 issues (Total: 45,836; Avg: 45.68%)

# labels used in the project    # projects    %
1      9996     18.39
2      13203    43.48
3      8771     52.64
4      5121     58.36
5      3115     62.72
6      1995     63.68
7      1177     66.88
8      823      67.81
9      518      72.16
10     337      69.74
>10    780      73.85

Projects with 100 to 999 issues (Total: 7,432; Avg: 34.81%)

# labels used in the project    # projects    %
1      407      6.03
2      545      11.81
3      694      31.15
4      703      28.68
5      656      30.52
6      773      31.52
7      651      39.11
8      481      43.11
9      394      41.96
10     363      42.89
>10    1765     52.25

Projects with more than 999 issues (Total: 528; Avg: 29.54%)

# labels used in the project    # projects    %
1      8        28.67
2      10       14.41
3      10       5.78
4      15       14.45
5      14       17.28
6      20       12.8
7      22       24.43
8      25       22.83
9      24       28.31
10     19       21.6
>10    361      33.95
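The four size classes partition the projects that use labels, so their per-bucket project counts should add up to the overall "Projects using labels" distribution. The sketch below performs that consistency check on the counts from the charts; the variable names are illustrative.

```python
# Projects per bucket (1, 2, ..., 10, >10 labels used), by project size class.
by_size = {
    "0 to 9 issues": [45150, 17268, 3915, 1071, 421, 223, 84, 49, 19, 4, 12],
    "10 to 99 issues": [9996, 13203, 8771, 5121, 3115, 1995, 1177, 823, 518, 337, 780],
    "100 to 999 issues": [407, 545, 694, 703, 656, 773, 651, 481, 394, 363, 1765],
    "more than 999 issues": [8, 10, 10, 15, 14, 20, 22, 25, 24, 19, 361],
}

# Overall distribution from the "Projects using labels" chart.
overall = [55561, 31026, 13390, 6910, 4206, 3011, 1934, 1378, 955, 723, 2918]

# Sum the four size classes bucket by bucket and compare with the overall chart.
summed = [sum(column) for column in zip(*by_size.values())]

print(summed == overall)
print(sum(overall))
```

Every bucket agrees, and the grand total equals the 122,012 projects reported as using labels.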

Label Influence: time to solve

# labels used in the project    Avg. time to solve    Med. time to solve
1      795.4    46.18
2      808.6    74.92
3      937.3    101.3
4      998.5    111.8
5      1111     145.7
6      1060     116.4
7      1152     127.2
8      1139     116.4
9      982.9    70.4
10     1425     306.4
>10    1148     148.1

Label Influence: issue age

# labels used in the project    Avg. issue age    Med. issue age
1      4516     2577
2      5867     3747
3      7540     6346
4      8646     7427
5      9196     8081
6      9729     8335
7      9330     8268
8      10610    9154
9      10100    8612
10     8321     7276
>10    8302     5918

Label Influence: % solved

# labels used in the project    % solved
1      43.51
2      48.76
3      53.21
4      55.27
5      56.3
6      58.82
7      57.95
8      59.28
9      63.23
10     47.59
>10    60.19

Label Influence: people involved

# labels used in the project    Avg. people involved
1      3.4
2      5.44
3      9.73
4      15.61
5      20.93
6      25.91
7      35.18
8      46.79
9      51.24
10     71.67
>10    129.18