How Developers’ Collaborations Identified from Different Sources Tell us About Code Changes

Sebastiano Panichella, Gabriele Bavota, Massimiliano Di Penta, Gerardo Canfora, Giuliano Antoniol

10/31/2022

DESCRIPTION

Written communications recorded through channels such as mailing lists or issue trackers, but also code co-changes, have been used to identify emerging collaborations in software projects. Such data has also been used to identify the relation between developers’ roles in communication networks and source code changes, or to identify mentors who help newcomers evolve the software project. However, the results of such analyses may differ depending on the communication channel being mined. This paper investigates how collaboration links vary and complement each other when they are identified through data from three different kinds of communication channels, i.e., mailing lists, issue trackers, and IRC chat logs. The study also investigates how such links overlap with links mined from code changes, and how the use of different sources would influence (i) the identification of project mentors, and (ii) the presence of a correlation between the social role of a developer and her changes. Results of a study conducted on seven open source projects indicate that the overlap of communication links between the various sources is relatively low, and that the application of networks obtained from different sources may lead to different results.

Page 1: How Developers’ Collaborations Identified from Different Sources Tell us About Code Changes

04/09/2023

How Developers’ Collaborations Identified from Different Sources Tell us About Code Changes

Sebastiano Panichella, Gabriele Bavota, Massimiliano Di Penta, Gerardo Canfora, Giuliano Antoniol

Outline

Context and Motivations
- Software Development

Case Study
- Seven Open Source Projects

Results
- Evaluation of Developers’ Collaborations Identified from Different Sources
- Application of Networks Obtained from Different Sources

Different Sources of Information…

‘‘…In everybody’s experience, different communication channels play different, sometimes complementary sometimes alternative, roles: news can be gathered (and shared) from the radio, by reading a newspaper, watching a TV broadcast or surfing blogs.’’

Academic Paper Preparation

‘‘Study Design’’

‘‘Results’’

‘‘Abstract and Introduction’’

‘‘Conclusion’’

Once the Paper is Ready…

We face an important decision: determining the right order of the authors…

Focusing on a single source:

I would put Bavota as first author…

I would put Panichella as first author…

Merging all the sources:

I would put Panichella as first author…

I say Panichella…

I would put Panichella as first author…

I would put Bavota as first author…

I say Bavota…

Software Development Environment

Example: Hibernate OSS Project

Previous Work…

Bird et al. - MSR 2006

Canfora et al. - FSE 2012

Guzzi et al. - MSR 2013

Elliot et al. - ACM GROUP 2003

IRC CHAT LOG

VERSIONING SYSTEM

ISSUE TRACKER

MAILING LIST

How Do Developers’ Collaboration Networks Identified from Different Sources Differ?

Case Study

Goal: to investigate how different communication channels provide different views of developers’ interactions, and how the use of such information in recommender systems could produce different results.

Research questions:

• RQ1: To what extent do developers discuss through the different communication channels?

• RQ2: How do the inferred links between developers overlap when using different sources of information?

• RQ3: How do social network metrics change when using different sources, and how would this affect the use of such information to build recommenders?

Context - Objects

Project          Period                 KLOC
Apache HTTPD     June 2011-June 2013    2,021–2,240
Apache CXF       June 2011-June 2013    593–771
Hibernate        June 2011-June 2013    984–1,096
Infinispan       June 2011-June 2013    146–286
Apache Lucene    June 2011-June 2013    198–437
Samba            June 2010-June 2012    1,278–1,426
Weld             June 2011-June 2013    108–139

Data Extraction

[Figure labels: Class 1, Class 2, Class 3, Class 4]

Identifying people that use more than one source
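The slides do not say how people who appear in more than one source are matched. As a minimal sketch, assuming that two identities belong to the same person when they coincide after lower-casing and stripping punctuation (the function names and sample identities below are hypothetical, not taken from the study):

```python
import re
from collections import defaultdict
from typing import Dict, Set


def normalize(identity: str) -> str:
    """Lower-case an identity and keep only letters and digits.

    For an e-mail address only the part before '@' is used, so
    'John.Doe@example.org' and 'john doe' both become 'johndoe'.
    This matching rule is an assumption made for illustration.
    """
    local_part = identity.split("@", 1)[0]
    return re.sub(r"[^a-z0-9]", "", local_part.lower())


def people_in_multiple_sources(sources: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each normalized identity to the sources it appears in,
    keeping only identities seen in more than one source."""
    seen = defaultdict(set)
    for source_name, identities in sources.items():
        for identity in identities:
            seen[normalize(identity)].add(source_name)
    return {person: srcs for person, srcs in seen.items() if len(srcs) > 1}


# Hypothetical identities from the three communication channels.
sources = {
    "mail": {"John.Doe@example.org", "alice@example.org"},
    "issue": {"john doe", "bob"},
    "chat": {"jdoe", "Alice"},
}
shared = people_in_multiple_sources(sources)
```

Note that the nickname `jdoe` is not matched to `John Doe` here; a real study would likely need additional heuristics or manual checks for such cases.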

RQ1: To what extent do developers discuss through the different communication channels?

[Figure: one panel per project (Apache Httpd, Apache Lucene, Samba, Hibernate, Apache CXF, Infinispan, Weld)]

Developers mainly use two out of three communication channels, whereas the third one is only used sporadically.

While in the past developers used email as their main communication channel, nowadays they make heavy use of chats or issue trackers.

Developers Overlap between Different Sources

[Figure: one panel per project (Apache Httpd, Apache Lucene, Samba, Hibernate, Apache CXF, Infinispan, Weld)]

ISSUE and CHAT: <35%
ISSUE and MAIL: 56%
MAIL and CHAT: <50%
MAIL and ISSUE: 86%

RQ2: How do the inferred links between developers overlap when using different sources of information?

[Figure: one panel per project (Apache Httpd, Apache Lucene, Samba, Hibernate, Apache CXF, Infinispan, Weld)]

ISSUE and CHAT: <26%
ISSUE and MAIL: 38%
MAIL and CHAT: <20%
MAIL and ISSUE: 30%
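The slides report overlap percentages but not the formula behind them. The sketch below shows one plausible reading, assuming links are undirected developer pairs and that overlap is measured relative to the first source; that assumption would explain why "ISSUE and MAIL" and "MAIL and ISSUE" can differ. The names and sample links are hypothetical.

```python
def undirected(links):
    """Turn (developer, developer) pairs into order-independent links,
    dropping self-links."""
    return {frozenset(pair) for pair in links if pair[0] != pair[1]}


def overlap(reference, other):
    """Fraction of the links in `reference` that also appear in `other`."""
    reference, other = undirected(reference), undirected(other)
    if not reference:
        return 0.0
    return len(reference & other) / len(reference)


# Hypothetical links mined from two channels.
issue_links = [("ann", "bob"), ("bob", "cat"), ("cat", "dan"), ("ann", "dan")]
mail_links = [("bob", "ann"), ("cat", "bob")]

issue_in_mail = overlap(issue_links, mail_links)  # share of issue links also found in mail
mail_in_issue = overlap(mail_links, issue_links)  # share of mail links also found in issues
```

With this toy data the two directions give different values, because the issue network has more links than the mail network.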

During an IRC Chat Meeting

1) Brainstorming

“is there a better way? dunno like I said this is brainstorming and I have not given lots of thought to these cases”

“but we also need to create the attributes and values in the entity binding..”

2) Planning (e.g. Testing activities)

“however planning a pure standalone test suite would make things easier...”

3) Open an Issue

“okay I think it is a bug and I’m going to create a jira first”

Similarity Measure of Topics Extracted from Different Communication Channels

Project          issues vs. mails    issues vs. chat    mails vs. chat
Apache Httpd     0.17                0.09               0.06
Apache CXF       0.86                0.11               0.01
Hibernate        0.11                0.02               0.03
Infinispan       0.07                0.03               0.03
Apache Lucene    0.08                0.03               0.02
Samba            0.06                0.02               0.02
Weld             0.11                0.04               0.03

[Overlay on the table: comparison marks >, ≥, >>]
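The slides do not name the similarity measure used for the topics. Purely as an illustration of how such a pairwise comparison could be set up, the sketch below uses the Jaccard index over sets of topic terms; the measure, the function name, and the sample terms are all assumptions and are not the study's data.

```python
def jaccard(terms_a, terms_b):
    """Jaccard index: shared terms divided by all distinct terms."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical topic terms extracted from each channel of one project.
issue_topics = {"bug", "patch", "release", "test"}
mail_topics = {"release", "vote", "patch", "license"}
chat_topics = {"meeting", "test", "idea", "branch"}

scores = {
    "issues vs. mails": jaccard(issue_topics, mail_topics),
    "issues vs. chat": jaccard(issue_topics, chat_topics),
    "mails vs. chat": jaccard(mail_topics, chat_topics),
}
```

Each project row in a table like the one above would then hold three such scores, one per pair of channels.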

RQ3: How do social network metrics change when using different sources, and how would this affect the use of such information to build recommenders?

Social Network Metrics:

- Identifying high-degree developers;

- Identifying mentors (Canfora et al. - FSE 2012).

Social Network Metrics vs. Code Changes:

- Correlation between social roles and change activities (replicating the study by Bird et al. - MSR 2006).
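To make the two ingredients of RQ3 concrete, here is a minimal sketch that computes each developer's degree in a communication network and correlates it with a change count. The slides do not state which correlation coefficient is used; Spearman's rank correlation is assumed here, and all names and numbers are hypothetical.

```python
from collections import defaultdict


def degrees(links):
    """Number of distinct neighbours of each developer in an undirected network."""
    neighbours = defaultdict(set)
    for a, b in links:
        if a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {dev: len(ns) for dev, ns in neighbours.items()}


def ranks(values):
    """Average ranks (1-based), so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks.
    Assumes neither list is constant (otherwise the variance is zero)."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical mailing-list links and per-developer change counts.
mail_links = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"), ("bob", "cat")]
commits = {"ann": 40, "bob": 12, "cat": 15, "dan": 1}

degree = degrees(mail_links)
devs = sorted(degree)
rho = spearman([degree[d] for d in devs], [commits[d] for d in devs])
```

Repeating the same computation with links from the issue tracker or the chat logs instead of `mail_links` is what would show whether the correlation depends on the source.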

Mentors Overlap between Different Sources

[Figure: one panel per project (Apache Httpd, Apache Lucene, Samba, Hibernate, Apache CXF, Infinispan, Weld)]

Considering ALL SOURCES: <41%
MAIL and ISSUE: 47%

High Degree Contributors Overlap between Different Sources

[Figure: one panel per project (Apache Httpd, Apache Lucene, Samba, Hibernate, Apache CXF, Infinispan, Weld)]

Mentors: considering ALL SOURCES <41%; MAIL and ISSUE 47%
High-degree contributors: considering ALL SOURCES <36%; MAIL and ISSUE 46%

Ohloh Kudos Score

Kudos score: the level of appreciation or respect for a developer working on a project. It is based on the judgement of other project members.

http://www.ohloh.net/p/apache/contributors

Issue, Chat and Email to Identify Leaders

[Figure: bar charts of "Precision in Recommending Leaders" for Apache Lucene and Samba; axis from 0% to 90%]
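Precision in recommending leaders can be read as the fraction of recommended developers who really are leaders. A minimal sketch, assuming the recommendation is the top-k developers by some network score and that a reference set of leaders is available (the slides do not spell out either step; all names and values below are hypothetical):

```python
def top_k(scores, k):
    """The k developers with the highest score (ties broken by name)."""
    ranked = sorted(scores, key=lambda dev: (-scores[dev], dev))
    return ranked[:k]


def precision(recommended, actual_leaders):
    """Fraction of recommended developers who are actual leaders."""
    if not recommended:
        return 0.0
    hits = sum(1 for dev in recommended if dev in actual_leaders)
    return hits / len(recommended)


# Hypothetical degree values from one source and a hypothetical reference set.
degree_by_dev = {"ann": 9, "bob": 7, "cat": 7, "dan": 3, "eve": 1}
reference_leaders = {"ann", "cat", "eve"}

recommended = top_k(degree_by_dev, 3)
p = precision(recommended, reference_leaders)
```

Computing `p` once per source (issues, chat, email) and once for their combination would give one bar per configuration, in the spirit of the chart described above.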

Replication of the Work by Bird et al.

Bird et al. - MSR 2006

‘‘Developers who actually commit changes, play much more significant roles in the email community than non-developers’’

Social Network Metrics vs. Source Code Changes

[Figure: SNA metrics vs. code metrics for Apache HTTPD]

Do the results vary when we consider, for example, issue trackers?

[Figure: Apache HTTPD, two panels]

[Figure: Hibernate, three panels]

Conclusion
