Upload
sebastiano-panichella
View
366
Download
1
Embed Size (px)
Citation preview
Supporting Newcomers in SoftwareDevelopment Projects
Post-doctoral Dissertationby
Sebastiano Panichella
2
Open Source Projects (OSS)
Team 1 Team 2 Team n
...
Open Source Projects (OSS)
3
Team 1 Team 2 Team n
...
New FeaturesBugs fixing ...................
...................
...................
Open Source Projects (OSS)
New FeaturesBugs fixing ...................
...................
...................
New FeaturesBugs fixing ...................
...................
...................
4
Newcomers in Open Source Projects
5
Team 1 Team 2 Team n
...
Newcomers in Open Source Projects
6
Team 1 Team 2 Team n
...
Dealing with Newcomers in Open Source Projects
7Kraut et al.
Dealing with Newcomers in Open Source Projects
8
1) Recruitment
Kraut et al.
Dealing with Newcomers in Open Source Projects
9
1) Recruitment
2) Selection
Kraut et al.
Dealing with Newcomers in Open Source Projects
10
1) Recruitment
2) Selection
3) Retention
Kraut et al.
Dealing with Newcomers in Open Source Projects
11
4) Socialization
1) Recruitment
2) Selection
3) Retention
Kraut et al.
Dealing with Newcomers in Open Source Projects
12Kraut et al.
[1] Steinmacher et al. “Why do newcomers abandon open source software projects?” CHASE 2013
[2] Zhou et al. “Does the initial environment impact the future of developers?” ICSE 2011
[3] Dagenais et al. “Moving into a new software project landscape” ICSE 2010
[4] Steinmacher et al. “How to support newcomers on boarding to open source software projects”
13
Newcomer Learning Path…
Perform Development Tasks Mentoring Team Collaborations
14
Newcomer Training Process
Steinmacher et al. “The hard life of open source software project newcomers” (CHASE 2014)
Dagenais et al. “Moving in a New Project Landscape” (ICSE 2011)
Perform Development Tasks Mentoring
Recommenders
Team Collaborations
15
Newcomer Training Process
Steinmacher et al. “The hard life of open source software project newcomers” (CHASE 2014)
Dagenais et al. “Moving in a New Project Landscape” (ICSE 2011)
1) Recommend Mentors
Perform Development Tasks Mentoring
Recommenders
Team Collaborations
16
Newcomer Training Process
Steinmacher et al. “The hard life of open source software project newcomers” (CHASE 2014)
Dagenais et al. “Moving in a New Project Landscape” (ICSE 2011)
1) Recommend Mentors
2) Supporting Source Code Comprehension and Re-‐documentation
Perform Development Tasks Mentoring
Recommenders
Team Collaborations
17
Newcomer Training Process
3) Detecting Emerging TeamsSteinmacher et al. “The hard life of open source software project newcomers” (CHASE 2014)
Dagenais et al. “Moving in a New Project Landscape” (ICSE 2011)
18
Newcomer Learning Path…
Data Extraction
• Versioning Systems • Mailing Lists • Issue trackers • Q&A site (e.g. StackOverflow)
19
Newcomer Learning Path…
Data Extraction
Empirical Studies
• Versioning Systems • Mailing Lists • Issue trackers • Q&A site (e.g. StackOverflow)
20
Newcomer Learning Path…
Data Extraction
Recommenders
• Versioning Systems • Mailing Lists • Issue trackers • Q&A site (e.g. StackOverflow)
Empirical Studies
21
Newcomer Learning Path…
Thesis Structure
• PART I : analyzing data from software repositories to support team work.
• rs developers and support the team work.• PART II : analyzing how developers use software artifacts to
help newcomers in program comprehension task. Si• on ta• sk.• PART III : developing recommenders to support concretely
project newcomers
22
• PART I: analyzing data from software repositories to support team work.
• rs developers and support the team work.• PART II : analyzing how developers use software artifacts to
help newcomers in program comprehension task. Si• on ta• sk.• PART III : developing recommenders to support concretely
project newcomers
23
Thesis Structure
• PART I: analyzing data from software repositories to support team work.
• rs developers and support the team work.• PART II: analyzing how developers use software artifacts to
help newcomers in program comprehension task. Si• on ta• sk.• PART III : presents recommenders to support concretely
project newcomers
24
Thesis Structure
• PART I: analyzing data from software repositories to support team work.
• rs developers and support the team work.• PART II: analyzing how developers use software artifacts to
help newcomers in program comprehension task. Si• on ta• sk.• PART III: developing recommenders to support concretely
project newcomers.
25
Thesis Structure
PART I
Analysis of Developers’Communication
26
How Developers’ Collaborations Networks Identified from Different Sources Differ?
IRC CHAT LOG
VERSIONING SYSTEM
ISSUE TRACKERMAILING LIST
Sebastiano Panichella, Gabriele Bavota, Massimiliano Di Penta, Gerardo Canfora, Giuliano AntoniolHow Developers’ Collaborations Identified from Different Sources Tell us About Code Changes. The 30th International Conference on Software Maintenance and Evolution (IEEE ICSME 2014)
27
Example: Hibernate OSS Project
28
How Developers’ Collaborations Networks Identified from Different Sources Differ?
Example: Hibernate OSS Project
29
How Developers’ Collaborations Networks Identified from Different Sources Differ?
Example: Hibernate OSS Project
30
How Developers’ Collaborations Networks Identified from Different Sources Differ?
Example: Hibernate OSS Project
31
How Developers’ Collaborations Networks Identified from Different Sources Differ?
Example: Hibernate OSS Project
32
How Developers’ Collaborations Networks Identified from Different Sources Differ?
Developers Overlap between Different Sources
Apache Httpd
Apache Lucene
Samba
Hibernate
ISSUE and CHAT ISSUE and MAIL
<35% 56%
MAIL and CHAT MAIL and ISSUE
<50%
86%
33
Overlap of Developers Social Links
ISSUE and CHAT ISSUE and MAIL
<26% 38%
MAIL and CHAT MAIL and ISSUE
<20%
30%
34
Apache Httpd
Apache Lucene
Samba
Hibernate
During an IRC Chat MeetingPROJECT: Hibernate
“is there a better way? dunno like I said this is brainstorming and I have not given lots of thought to these cases”
“but we also need to create the attributes and values in the entity binding..”
35
PROJECT: Hibernate
“is there a better way? dunno like I said this is brainstorming and I have not given lots of thought to these cases”
1) Brainstorming
36
“however planning a pure standalone test suite would make things easier...”
During an IRC Chat Meeting
PROJECT: Hibernate
“is there a better way? dunno like I said this is brainstorming and I have not given lots of thought to these cases”
“however planning a pure standalone test suite would make things easier...”
1) Brainstorming2) Planning (e.g. Testing activities)
37
During an IRC Chat Meeting
PROJECT: Hibernate
“okay I think it is a bug and I’m going to create a jira first”
“however planning a pure standalone test suite would make things easier...”
1) Brainstorming2) Planning (e.g. Testing activities)3) Open an Issue
38
During an IRC Chat Meeting
Apache HTTPD
Apache Lucene
Hibernate
Samba
Precision in Recommending Leaders0% 20% 40% 60% 80%
0.8
0.6
0.6
0.6
0.6
0.6
0.4
0.2
0.2
0.4
0.2
0
CHAT ISSUE MAIL
Use Issue, Chat and Mail to Identify Leaders
39
By use FUZZY CLUSTER ALGORITHMS
Teams Identification from Emergent Collaborations
40
Analysis of the Evolution of Teams: How?
Sebastiano Panichella, Gerardo Canfora, Massimiliano Di Penta, Rocco Oliveto: How the evolution of emerging collaborations relates to code changes: an empirical study. The 22nd International Conference on Program Comprehension (IEEE ICPC 2014)
Teams Merge in a New Release:
-‐ in 20%-‐35% of the cases
Teams Split in a New Release:
-‐ In 15%-‐35% of the cases
How do Emerging Collaborations Change across Software Releases?
Teams Disappeared:
22%-‐45%
Teams Survived:
50%-‐70%
41
Analysis of Developers’Communication
1) Analyzing developers collaboration/communication through specific channels would only provide a partial view of the reality.
2) Social network recommenders should not limit their information mining a single source.
3) Issue and mail can be used to identify leaders with high accuracy.
PART I
Summary
42
Analysis of Developers’Communication
1) Analysing developers collaboration/communication from mailing list and issue tracker to profile developers roles.
PART I
Possible Future Directions:
43
2) Analysing developers collabora]on/communica]on from mailing list and issue tracker to profile teams of developers specialized in the topics of interest of newcomers.
PART IIHow Developers Browse
and Understand Software Artifacts
44
45
PART II – Experiment A
PART II – Experiment B
Two Empirical Studies Aimed at Understanding
46
PART II – Experiment AHow such documentation is browsed b y d e v e l o p e r s t o p e r f o r m maintenance activities?
PART II – Experiment B
Two Empirical Studies Aimed at Understanding
PART II – Experiment AHow such documentation is browsed b y d e v e l o p e r s t o p e r f o r m maintenance activities?
PART II – Experiment BWhat code elements are often used by humans when labeling a source code artifacts?
word1word2
word3word4
word5word6
47
Two Empirical Studies Aimed at Understanding
Maintenance TasksBug Fixing:
Add a new feature:
Improve existing features:
48G. Bavota, G. Canfora, M. Di Penta, R.Oliveto, Sebastiano Panichella
An Empirical Investigation on Documentation Usage Patterns in Maintenance Tasks. The 29th International Conference on Software Maintenance (ICSM 2013)
• 33 participants
How Much Time did Participants Spend on Different Kinds of Artifacts?
72%
13%10%3%2%
49
72%
13%10%3%2%
Undergraduates students used Source Code and Javadoc significantly
more than Graduate students
Undergraduate Students
50
How Much Time did Participants Spend on Different Kinds of Artifacts?
72%
13%10%3%2%
Graduate Students
Graduate students used Class Diagrams significantly more than Undergraduates
51
How Much Time did Participants Spend on Different Kinds of Artifacts?
More experienced participants use a more “Integrated approach”
S= Sequence Diagram
D= Class Diagram
U= Use Case
J= Javadoc
52
Most Frequent Navigation Patterns Before Reaching Source Code
18%$
8%$
2%$
2%$
1%$
1%$
3%$
0%$
1%$
0%$
4%$
12%$
16%$
12%$
10%$
4%$
4%$
1%$
2%$
1%$
1%$
5%$
0%# 2%# 4%# 6%# 8%# 10%# 12%# 14%# 16%# 18%# 20%#
S##D##
(SD)+##(US)+##
U(SD)+##(DS)+##
J##U#
S(US)+##SU(SD)+##Other##
Graduate#Undegraduate#
PART II – Experiment B
What Code Elements are Often Used by Humans When Labeling
a Source Code Artifact?
word1word2
word3word4
word5word6
53
• Object:
Experiment B: Context
eXVantage (industrial test data generation tool)
• Subjects: 38 Bachelor Student CS
…
book hotel room reservation arrival departure smoking double card breakfastJava Class
54
Experiment B: ContextComparison of Different Labeling Techniques
Andrea De Lucia, Massimiliano Di Penta, Rocco Oliveto, Annibale Panichella, Sebastiano Panichella : Labeling Source Code with Information Retrieval Methods: An Empirical Study. EMSE 2014.
55
Experiment B: ContextComparison of Different Labeling Techniques
Data extracted from signature of methods match very well the mental model of newcomers when describing source code
Andrea De Lucia, Massimiliano Di Penta, Rocco Oliveto, Annibale Panichella, Sebastiano Panichella : Labeling Source Code with Information Retrieval Methods: An Empirical Study. EMSE 2014.
56
How Developers Browse andUnderstand Software Artifacts
1) Newcomers spend more time to analyze low-level artifacts as compared to high-level artifacts
2) Heuristics based on data extracted form signature of methods are able to match very well the mental model of newcomers when describing source code elements
PART II
Summary
57
How Developers Browse andUnderstand Software Artifacts
1) Develop new recommenders able to suggest to newcomers how to proper navigate software artefacts during maintenance tasks
2) Build on top of the the automatic labelling approach more accurate summarization techniques to support newcomers during program comprehension and maintenance activity.
PART II
Possible Future Directions:
58
PART III
Recommenders
Data Extraction (Software Repositories)
Empirical Studies
PART I and PART II
Recommenders
PART III
59
Two Recommenders to SupportProject Newcomers
PART III - A)Suggest Appropriate Mentors to Help Newcomers in Open Source Projects
PART III – B)Mining Source Code Descriptions from Developers’ Communication to Improve Newcomers’ Program Comprehension
60
PART III – A)
Suggest Appropriate Mentors to Help Newcomers in Open Source Projects
61
Mentoring of Project Newcomers is Highly Desirable….
Previous Work
Dagenais et al. ICSE 2010
MENTOR
62
Motivation
https://community.apache.org/mentoringprogramme.html
https://community.apache.org/mentoringprogramme.html
63
Identifying Mentors in Software Projects
Characteristics of a Good Mentor
64
Enough ability to help other people.
Enough expertise about the topic of interest for the newcomer.
https://community.apache.org/mentoringprogramme.html
1) Find Past Successful Mentors
2) Suggest Mentors Having Specific Skills
YODA (Young and newcOmer Developer Assistant)
65
Approach for Mentors Identification in Open Source Projects
Gerardo Canfora, Massimiliano Di Penta, Rocco Oliveto, Sebastiano Panichella: Who is Going to Mentor Newcomers in Open Source Projects?
International Symposium on the Foundations of Software Engineering (SIGSOFT FSE 2012)
http://www.ifi.uzh.ch/seal/people/panichella/tools.html
Score Computed Aggregating the Factors in a Weighted Sum
Identify Past Successful Mentors
∑=
5
1iii fw
F1
F2 >F3
F4 -‐ 1st
F5
Recommending Mentors
67
Time
Project Developers
68
Time
Project Developers
Newcomer
t
Recommending Mentors
69
Time
Project Developers
Newcomer
t
?Mentor with Adequate Skills
Recommending Mentors
70
Time
Past Mentors
Newcomer
t
Inspired to the Work on Bug Triaging by J. Anvik et al., TOSEM 2011
Recommending Mentors
71
Time
Past Mentors
Newcomer
t
Inspired to the Work on Bug Triaging by J. Anvik et al., TOSEM 2011
Recommending Mentors
72
Time
Past Mentors
Newcomer
t
Inspired to the Work on Bug Triaging by J. Anvik et al., TOSEM 2011
Recommending Mentors
73
Time
Past Mentors
Newcomer
t
Inspired to the Work on Bug Triaging by J. Anvik et al., TOSEM 2011
Recommending Mentors
74
Is it Possible to Recommend Mentors To Project Newcomers?
85%$
30%$
100%$
64%$
94%$
81%$
24%$
100%$
77%$82%$
0%$
10%$
20%$
30%$
40%$
50%$
60%$
70%$
80%$
90%$
100%$
Apache$ FreeBSD$ PostgreSQL$ Python$ Samba$
Precision$
Mentor$RecommendaGons:$Precision$on$Top$1$and$Top$2$
Top$1$
Top$2$
75
Results When are Used Both Mails and Issues
80%$
67%$
90%$ 91%$ 92%$
72%$
45%$
89%$84%$
76%$
0%$
10%$
20%$
30%$
40%$
50%$
60%$
70%$
80%$
90%$
100%$
Apache$ FreeBSD$ PostgreSQL$ Python$ Samba$
Precision$
Mentor$RecommendaGons:$Precision$on$Top$1$and$Top$2$
Top$1$
Top$2$
76
Mentor Recommenda]ons: Precision on Top 1 and Top 2
Precision
0%
25%
50%
75%
100%
Apache FreeBSD PostgreSQL Python Samba
76%
84%89%
45%
72%
92%91%90%
67%
80%
Top 1Top 2
YODA Make it Possible to Recommend Mentors
It is Possible to Recommend Mentors To Project Newcomers?
PART III – B)
Mining Source Code Descriptions from Developer Communications to Improve
Newcomers Program Comprehension
77
Effort in Program Comprehension
78
Mining Summaries
We argue that messages exchanged among contributors/developers are a useful source of information to help understanding source code.
In such situations developers need to infer knowledge from,
the source code itself
Newcomer
Can find Source code description
79
Mining Summaries
We argue that messages exchanged among contributors/developers are a useful source of information to help understanding source code.
In such situations developers need to infer knowledge from,
the source code itself source code descriptions in external artifacts.
Newcomer
Can find Source code description
.................................................. When call the method IndexSplitter.split(File destDir, String[] segs) from the Lucene cotrib directory(contrib/misc/src/java/org/apache/lucene/index) it creates an index with segments descriptor file with wrong data. Namely wrong is the number representing the name of segment that would be created next in this index. ..................................................
CLASS: IndexSplitter METHOD: split
80
StackOverflow
81
• Step 1: Downloading SO discussions relying on its REST interface and tracing them onto classes
• Step 2: Extracting paragraphs
• Step 3: Tracing paragraphs onto methods ( Discards Paragraphs of discussions with 0 Votes)• Step 4: Heuristic based Filtering • Step 5: Similarity based Filtering
CODES:Approach for Mining Method Descriptions
Carmine Vassallo, Sebastiano Panichella, Massimiliano Di Penta, Gerardo Canfora: CODES: mining source code descriptions from developers discussions. BEST TOOL AWARD at the 22nd International Conference on Program Comprehension (IEEE ICPC 2014)
82
http://www.ifi.uzh.ch/seal/people/panichella/tools.html
Sebastiano Panichella, Jairo Aponte, Massimiliano Di Penta, Andrian Marcus, Gerardo Canfora: Mining source code descriptions from developer communications. International Conference on Program Comprehension (IEEE ICPC 2012)
Help Newcomer Program Comprehension with extraction of summaries of code elements from
Newcomer Q&A SITE
ISSUE TRACKER
MAILING LIST
83
Supporting Software Development
Approach Precision vs. Number of Method Covered
84
Approach Precision vs. Number of Method Covered
We mine useful java methods description from developers discussions
85
PART III
Recommenders
1) YODA make it possible to recommend mentors with a precision higher than 67%
3) Combining Mails and Issues improve recommenders’ performance
2) CODES identifies relevant descriptions with a precision higher than 79%
86
87
Future Work and
Conclusion
PART I
PART II
PART III
88
Future Work…
Analysing developers collaboration/communication from mailing list and issue tracker to profile developers roles or teams.
We will aim at building recommenders to help newcomer in the choice of appropriate patterns to navigate software documentation during maintenance tasks.
New Recommenders
89
Tester Maintainer Mentor
Future Work…
Develop new recommenders able to describing dependencies with third-‐party libraries and help newcomers to avoid changes that could break the dependency with used libraries.
New Recommenders
90
Build on top of the the automatic labelling approach more accurate summarization techniques to support newcomers during program comprehension
Advice to Young Newcomers (Researchers)
91
Advice to Young Newcomers (Researchers)
92
Pleasures and rewards of research:
-‐ PASSION is the most important thing!
-‐ Be aware that with this work you will not earn so much moneys. If you have ANY DOUBT don’t do it!
-‐ WORK HARD (even in the week-‐end) and do good work!
Do you really want to be a PhD student?
Ways to fail a Ph.D
93
Learn too much… - Some students go to Ph.D. school because they want to learn. - Let there be no mistake: Ph.D. school involves a lot of learning.
But, it requires focused learning directed toward an eventual thesis. - Taking (or sitting in on) non-required classes outside one's focus is almost always a waste of time, and it's always unnecessary.
Expect perfection…Perfection cannot be attained. It is approached in the limit…
Something that is “good enough” can be achieved…
Procrastinate… Chronic perfectionists also tend to be procrastinators. So do eternal students with a drive to learn instead of research.
Ways to fail a Ph.D
94
Ignore Conference Meetings…
Start complaining instead working…- Complaining about some of the following staff instead working is a big mistake:
a) Your working environment. b) The absence of your advisor.
c) Anything else.
-> “PhD student” doesn’t means “student” …It means, “Young Researcher”…
…we can expect ADVISES from our advisor NOT ideas…
- Conferences are the best places to be aware of “new emerging ideas” - Be in contact with other peers and share ideas. “Best idea born in a collaborative
matter”.
Thing to do during a Ph.D
95
Be focused!
Be focused on real problems!
- It is important, certainly work in a wider research has a wide impact, however, it should be around a precise problem.
A wide research will get you better job offers, but proven depth is more likely to get you fame.
- The best research often comes from REAL PROBLEMS: I think that much academic research is, well, academic.
- Talk with local companies: you may find that your best ideas, even general theoretical ideas, arise from their needs.
Validate your research work!- Validate your research ideas is a very effort consuming task, it is very
important try to involve developers (doing a survey or controlled experiments) in this loop.
Thanks for your Attention!
96
PART I
PART II
PART III
97