What Would Users Change in My App? Summarizing App Reviews for
Recommending Software Changes.
Andrea Di Sorbo, Sebastiano Panichella, Carol V. Alexandru, Junji Shimagaki, Corrado A. Visaggio, Gerardo Canfora, Harald Gall
UNIVERSITÀ DEGLI STUDI DEL
SANNIO
2
OUTLINE
Context: Manual vs. Automated Analysis of User Reviews
Proposed Solution: Generating Summaries of User Reviews
Case Study: Assessment of the Summaries Involving 23 Developers
Conclusion and Future Work
3
Manual vs. Automated Analysis of User Reviews
4
Maintenance of Mobile Applications
“About one third of app reviews contain useful information for developers”
Pagano et al., RE 2013
5
Manual Analysis of Reviews
6
PAST WORK: Chen et al., ICSE 2014
Text Analysis to filter out non-informative reviews
Topic Analysis to recognize topics treated in the reviews classified as informative
7
PAST WORK: Panichella et al., ICSME 2015
Review categories: Feature Request, Problem Discovery, Information Seeking, Information Giving, Other
Techniques: Sentiment Analysis + Natural Language Parsing + Text Analysis
8
The Problem
Feature Requests Bug Reports
9
Generating Summaries of User Reviews
SURF (Summarizer of User Review Feedback)
10
USER REVIEWS MODEL
“I love this app but it crashes my whole iPad and it has to restart itself”
• User intention: Problem Discovery
• Review topics: App, Model
“…The User Reviews Model proposed by the authors is impressive in how it analyzes a review sentence by sentence and is able to characterize a sentence with multiple labels…” (one of the FSE reviewers)
12
SUMMARIZER OF USER REVIEW FEEDBACK
13
1. Data Collection
14
2. Intention Classification
machine learning
15
3. Topics Classification
Sentence: “Can't change position of icons on main screen and can't close bookmarks icon too.”
GUI-related dictionary: screen, trajectory, button, white, background, interface, usability, tap, switch, icon, orientation, position, picture, show, list, category, cover, scroll, touch, website, swipe, sensitive, view, roll, side, sort, click, small, colorful, glitch, page, corner, bookmark…
P(SENTENCE, GUI) = 5/14 = 0.357 (5 of the sentence's 14 words match the GUI dictionary: position, icons, screen, bookmarks, icon)
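The 5/14 ratio on this slide can be reproduced with a few lines of Python. This is a minimal sketch, not SURF's actual implementation: the tokenizer, the naive plural-stripping rule, and the function names are assumptions made for illustration, and the dictionary contains only the terms visible on the slide (the slide truncates the list).

```python
import re

# Terms of the GUI-related dictionary as shown on the slide (the slide's list is truncated).
GUI_DICTIONARY = {
    "screen", "trajectory", "button", "white", "background", "interface",
    "usability", "tap", "switch", "icon", "orientation", "position",
    "picture", "show", "list", "category", "cover", "scroll", "touch",
    "website", "swipe", "sensitive", "view", "roll", "side", "sort",
    "click", "small", "colorful", "glitch", "page", "corner", "bookmark",
}


def tokenize(sentence):
    """Lowercase the sentence and split it into words, keeping apostrophes."""
    return re.findall(r"[a-z']+", sentence.lower())


def singular(word):
    """Naive plural stripping (an assumption): 'icons' -> 'icon'."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def topic_probability(sentence, dictionary):
    """Share of the sentence's words that appear in the topic dictionary."""
    words = tokenize(sentence)
    if not words:
        return 0.0
    hits = sum(1 for w in words if singular(w) in dictionary)
    return hits / len(words)


sentence = ("Can't change position of icons on main screen "
            "and can't close bookmarks icon too.")
print(round(topic_probability(sentence, GUI_DICTIONARY), 3))  # 0.357
```

The five matching words are position, icons, screen, bookmarks, and icon; the remaining nine words are not in the dictionary.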
18
4. Sentence Scoring
Obs1) User feedback discussing bug reports and feature requests is more important for developers than all other review types.
Intention Class        Score
Problem Discovery      3.0
Feature Request        3.0
Information Seeking    1.0
Information Giving     1.5
Other                  0.5
SENTENCE = Can't change position of icons on main screen and can't close bookmarks icon too
IRS_SENTENCE = 3.0
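The intention-class scores on this slide amount to a simple lookup table. The sketch below uses the scores from the slide; the function name `intention_score` and the fallback to "Other" for unknown labels are assumptions made for illustration.

```python
# Scores per intention class, copied from the slide.
INTENTION_SCORES = {
    "Problem Discovery": 3.0,
    "Feature Request": 3.0,
    "Information Seeking": 1.0,
    "Information Giving": 1.5,
    "Other": 0.5,
}


def intention_score(intention):
    """Return the score of an intention class.

    Unknown labels fall back to the 'Other' score (an assumption).
    """
    return INTENTION_SCORES.get(intention, INTENTION_SCORES["Other"])


# The slide does not say which class the example sentence received;
# both top-scoring classes give the value 3.0 shown on the slide.
print(intention_score("Problem Discovery"))  # 3.0
print(intention_score("Feature Request"))    # 3.0
```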
19
4. Sentence Scoring
Obs2) Developers need reasonably useful sentences that discuss specific aspects of an app, compared with other review sentences.
SENTENCE = Can't change position of icons on main screen and can't close bookmarks icon too
P(SENTENCE, GUI) = 5/14 = 0.357
20
4. Sentence Scoring
Obs3) Longer sentences are usually more informative than shorter ones.
SENTENCE = Can't change position of icons on main screen and can't close bookmarks icon too
L_SENTENCE = 80 (characters)
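The value 80 matches the character count of the example sentence without its final period. A minimal sketch, assuming length is measured in characters and trailing punctuation is ignored (the slide does not say how SURF handles punctuation):

```python
def sentence_length(sentence):
    """Character count, ignoring surrounding whitespace and trailing punctuation (assumption)."""
    return len(sentence.strip().rstrip(".!?"))


sentence = ("Can't change position of icons on main screen "
            "and can't close bookmarks icon too")
print(sentence_length(sentence))        # 80
print(sentence_length(sentence + "."))  # 80
```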
21
4. Sentence Scoring
Obs4) Reviews about frequently discussed features may attract more attention from developers than reviews about features that users rarely use or discuss.
SENTENCE = Can't change position of icons on main screen and can't close bookmarks icon too
MFWR(SENTENCE, GUI) = 2/14 = 0.143
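The slide does not define MFWR. The sketch below reads it as the share of a sentence's words that are among the most frequent words of the topic, which is one interpretation consistent with 2/14. The topic sentences, the `top_k` cutoff, and all function names are hypothetical and were chosen only to illustrate the computation.

```python
import re
from collections import Counter


def words(text):
    """Lowercase tokens with a naive plural strip (assumption), e.g. 'icons' -> 'icon'."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens]


def most_frequent_words(topic_sentences, top_k):
    """The top_k most common words across the sentences assigned to a topic."""
    counts = Counter(w for s in topic_sentences for w in words(s))
    return {w for w, _ in counts.most_common(top_k)}


def mfwr(sentence, frequent_words):
    """Share of the sentence's words that are among the topic's frequent words."""
    tokens = words(sentence)
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in frequent_words) / len(tokens)


# Hypothetical GUI-topic sentences, invented for illustration only.
gui_sentences = ["Icon size is wrong", "New icon looks blurry", "Dark background please"]
frequent = most_frequent_words(gui_sentences, top_k=1)  # {'icon'}

sentence = ("Can't change position of icons on main screen "
            "and can't close bookmarks icon too")
print(round(mfwr(sentence, frequent), 3))  # 0.143
```

With "icon" as the only frequent word, the two matches are "icons" and "icon", giving 2 of 14 words.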
23
5. Summary Generation
24
Case Study
Involving 23 Developers
3439 Reviews of 17 Apps
27
Research Questions
RQ1: Is URM (the User Reviews Model) a robust and suitable model for representing user needs in meaningful maintenance tasks for developers?
RQ2: To what extent does a summarization technique developed on top of URM help mobile developers better understand the users' needs?
28
Study Procedure
29
TWO Experiments
Experiment I (Italy, Switzerland, Netherlands)
1) Summaries for 15 apps
2) Involving 16 developers (6 were the original developers)
3) We assigned an app to each participant.
35
TWO Experiments
Experiment II (Japan)
1) Summaries of 2 apps
2) Involving 7 employees from
38
TWO Experiments
Experiment II: Group 1 (3 subjects), Group 2 (4 subjects)
Experiment II-A: participants in both groups classified reviews according to URM.
Experiment II-B: participants in both groups validated the summaries generated by SURF.
41
RQ1: Is URM a robust and suitable model for representing user needs in meaningful maintenance tasks for developers?
Experiment I & Experiment II
78.26% of participants declared that URM is not missing any relevant information and that the topics considered in URM are exhaustive.
82% of participants declared that the most important topics modeled in URM are the App, GUI, and Feature or Functionality categories.
“I found the classification GUI-BUG, APP-BUG, etc. very useful...”
“...in case I'm searching for BUGs, I can just look for the category, instead of reading everything over and over again...”
SUMMARY: Most participants consider URM a robust and suitable model for representing user needs in meaningful maintenance tasks for developers.
48
RQ2: To what extent does a summarization technique developed on top of URM help mobile developers better understand the users' needs?
The validation task performed by the survey participants highlights the very high classification accuracy of SURF, which is 91%.
SURF works reasonably well in summarizing user feedback regarding change requests concerning GUI, APP, and FEATURE improvements, with the only exception of the maintenance topic “COMPANY”.
52
How do app review summaries generated by SURF impact the time required by developers to analyze user reviews?
The time-saving capability of SURF perceived by all developers is at least 50%.
94% of participants believe that the time-saving capability of SURF is 75%.
SURF helps save more than 50% of the time required by developers for analyzing user feedback and planning software changes.
66% of the feedback manually extracted by the participants also appears in the summaries automatically generated by SURF.
SUMMARY: 1) SURF helps save more than half of the time required by developers for analyzing user feedback and planning software changes.
2) 66% of manually extracted feedback also appears in the automatically generated summaries.
58
Quality of SURF's Summaries
61
Conclusion
1) URM is a robust and suitable model for representing user needs in meaningful maintenance tasks for developers.
2) SURF helps save more than half of the time required for analyzing user feedback and planning software changes.
3) 66% of manually extracted feedback also appears in the automatically generated summaries.
4) Summaries generated by SURF are reasonably correct, adequate, concise, and expressive.
Thanks for the Attention!
Questions?
SURF (Summarizer of User Review Feedback)