Transcript
Page 1: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

Analyzing Reviews and Code of Mobile Apps for Better Release Planning

Adelina Ciurumelea, Andreas Schaufenbühl, Sebastiano Panichella, Harald C. Gall

software evolution & architecture lab

Page 2: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

Extremely Popular Apps: 8,087,067 reviews; 3,505,905 reviews; 38,742,600 reviews

Page 3: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

Open Source Apps: 62,707 reviews

Page 4: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

The number of reviews is large compared to the available development resources.

Page 5: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

Importance of reviews

• reviews contain valuable feedback directly from the users

• users often report bugs, describe their user experience and request features

• the review content influences the number of downloads

Page 6: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

INFORMATIVE / NON-INFORMATIVE

“AR-Miner: Mining informative reviews for developers from mobile app marketplace” N. Chen, J. Lin, S. Hoi, X. Xiao, and B. Zhang

Page 7: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

BUG / FEATURE REQUEST / OTHER

“Release planning of mobile apps based on user reviews” L. Villarroel, G. Bavota, B. Russo, R. Oliveto, and M. Di Penta

Page 8: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

BUG / FEATURE REQUEST

• the developer has to manually analyse the unstructured groups of reviews, understand what they talk about and extract actionable change tasks

• what does a particular cluster talk about? Does it talk about the UI or about the performance of the app, etc.?

Page 9: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

What are the mobile specific topics users talk about in their reviews?

Page 10: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

manual analysis of ~1600 reviews

Page 11: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

Hmmm...

Mm No…

This is IT

Nope Nopity nope

• not all reviews are useful

Page 12: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

Hmmm...

Mm No…

This is IT

Nope Nopity nope

Sucks Way to many errors

0 stars Garbage.

problem bro

Garbage Bla bla bla

• not all reviews are useful

• some are even offensive

Page 13: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

Pretty close to perfect, this app is way better than any comic book reader I've ever used. It's small, it operates fast, and the interface is incredibly clean and simple.

• others can provide valuable information for the developer

Page 14: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

Pretty close to perfect, this app is way better than any comic book reader I've ever used. It's small, it operates fast, and the interface is incredibly clean and simple.

Resources

Usage

Page 15: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

For info (in case dev not already aware!), there is a graphical glitch when scrolling output in marshmallow on a nexus 5.

Compatibility

Usage

Complaint

Page 16: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

Building the taxonomy

Content analysis in 2 passes:

• start with an empty list of categories

• analyse each review and add a new category (including definition and keywords) if necessary

• label the review with all the matching categories

• second pass: revisit the list of reviews and label them with the appropriate categories

Page 17: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

High Level Taxonomy

Category       Description
Compatibility  mentions the OS, mobile device or a specific hardware component.
Usage          talks about the UI or the usability of the app.
Resources      mentions the app’s influence on the battery and memory usage or the performance of the app/phone.
Pricing        statements mentioning the license model or the price of the app.
Protection     statements referring to security or privacy issues.
Complaint      the user reports or complains about an issue with the app.

Page 18: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

specialise the taxonomy further

Page 19: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

Liked it and worked very well in lollipop, but not MM The plugins don't refresh, manual navigation to next image doesn't work. Some plugins give error. Altogether seems broken after MM update on Note 4.

Compatibility

Page 20: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

Liked it and worked very well in lollipop, but not MM The plugins don't refresh, manual navigation to next image doesn't work. Some plugins give error. Altogether seems broken after MM update on Note 4.

Compatibility

Device

Android Version

Page 21: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

Low Level Taxonomy

High Level     Low Level Categories
Compatibility  Device, Android Version, Hardware
Usage          App Usability, UI
Resources      Performance, Battery, Memory
Pricing        Licensing, Price
Protection     Security, Privacy
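
The two taxonomy levels can be written down as a small data structure. The sketch below is only an illustration; the names TAXONOMY, all_categories and parent_of are made up here and do not come from the talk. Counting the entries gives 6 high-level plus 12 low-level categories, 18 in total.

```python
# High-level categories mapped to their low-level specialisations, as listed on the slides.
# Complaint has no low-level categories in the talk.
TAXONOMY = {
    "Compatibility": ["Device", "Android Version", "Hardware"],
    "Usage": ["App Usability", "UI"],
    "Resources": ["Performance", "Battery", "Memory"],
    "Pricing": ["Licensing", "Price"],
    "Protection": ["Security", "Privacy"],
    "Complaint": [],
}


def all_categories(taxonomy):
    """Flatten the taxonomy into one list of high- and low-level category names."""
    names = list(taxonomy)
    for low_level in taxonomy.values():
        names.extend(low_level)
    return names


def parent_of(taxonomy, low_level_name):
    """Return the high-level category a low-level category belongs to, or None."""
    for high_level, low_level in taxonomy.items():
        if low_level_name in low_level:
            return high_level
    return None


CATEGORIES = all_categories(TAXONOMY)
```

A review mentioning a Note 4 and Marshmallow would then carry the labels Compatibility, Device and Android Version.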

Page 22: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

Automated Classification

Page 23: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

ML Approach

Preprocessing & Feature Extraction → Training (Gradient Boosted Trees) → Multi-label Classification

Page 24: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

Preprocessing & Feature Extraction

• preprocessing: stop word removal and stemming

• feature extraction: TF-IDF scores and counts of 2-grams and 3-grams
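
The two steps above can be sketched in plain Python. This is a minimal illustration under assumptions, not the authors' pipeline: the stop word list is a tiny made-up sample, crude_stem is a rough stand-in for a real stemmer such as Porter's, and the TF-IDF variant (term frequency times log of N over document frequency) is one common choice that the slides do not specify.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop word list (assumption); a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "this", "to", "of", "on", "in", "my", "i"}


def crude_stem(token):
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, tokenize, drop stop words, then stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def ngram_counts(tokens, sizes=(2, 3)):
    """Count 2-grams and 3-grams over the preprocessed token sequence."""
    counts = Counter()
    for n in sizes:
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i : i + n])] += 1
    return counts


def tfidf(docs):
    """TF-IDF per document: (term count / doc length) * log(N / document frequency)."""
    df = Counter(term for doc in docs for term in set(doc))
    n_docs = len(docs)
    return [
        {term: (count / len(doc)) * math.log(n_docs / df[term])
         for term, count in Counter(doc).items()}
        for doc in docs
    ]


reviews = [
    "The battery drains fast on this app",
    "The app crashes on Marshmallow",
]
token_lists = [preprocess(r) for r in reviews]
unigram_scores = tfidf(token_lists)
bigram_trigram_counts = [ngram_counts(t) for t in token_lists]
```

In this toy corpus the term "app" occurs in both reviews and so gets a TF-IDF score of zero, while "battery" and "crash" keep a positive weight.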

Page 25: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

Training

• one-vs-all strategy: separate classifier for each high and low level category (18 in total)

• used the Gradient Boosted Trees model
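
The one-vs-all setup could look roughly like the following. This is a minimal sketch, not the authors' implementation: the slides do not state the loss, tree depth or other hyperparameters, so the code assumes squared-error loss, depth-1 trees (stumps) and binary word-presence features. The vocabulary, the four training reviews and all function names are made up for illustration.

```python
def fit_stump(X, residuals):
    """Depth-1 regression tree on binary features: pick the feature whose
    split gives the lowest squared error on the residuals."""
    best = None
    for j in range(len(X[0])):
        on = [r for x, r in zip(X, residuals) if x[j]]
        off = [r for x, r in zip(X, residuals) if not x[j]]
        mean_on = sum(on) / len(on) if on else 0.0
        mean_off = sum(off) / len(off) if off else 0.0
        error = sum((r - mean_on) ** 2 for r in on) + sum((r - mean_off) ** 2 for r in off)
        if best is None or error < best[0]:
            best = (error, j, mean_on, mean_off)
    return best[1:]


def fit_boosted(X, y, rounds=20, learning_rate=0.5):
    """Gradient boosting with squared-error loss: each stump fits the current residuals."""
    base = sum(y) / len(y)
    scores = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - s for yi, s in zip(y, scores)]
        j, mean_on, mean_off = fit_stump(X, residuals)
        stumps.append((j, mean_on, mean_off))
        scores = [s + learning_rate * (mean_on if x[j] else mean_off)
                  for x, s in zip(X, scores)]
    return base, learning_rate, stumps


def predict_boosted(model, x):
    """Binary decision: the boosted score is thresholded at 0.5."""
    base, learning_rate, stumps = model
    score = base + sum(learning_rate * (on if x[j] else off) for j, on, off in stumps)
    return score > 0.5


def fit_one_vs_all(X, label_sets, labels):
    """One separate binary classifier per category."""
    return {label: fit_boosted(X, [1.0 if label in s else 0.0 for s in label_sets])
            for label in labels}


def predict_labels(models, x):
    """Multi-label output: every category whose classifier says yes."""
    return {label for label, model in models.items() if predict_boosted(model, x)}


# Hypothetical vocabulary: battery, drain, button, layout, crash (1 = word present).
X = [[1, 1, 0, 0, 0], [1, 0, 0, 0, 1], [0, 0, 1, 1, 0], [0, 0, 1, 0, 1]]
label_sets = [
    {"Resources", "Battery"},
    {"Resources", "Battery", "Complaint"},
    {"Usage", "UI"},
    {"Usage", "UI", "Complaint"},
]
models = fit_one_vs_all(X, label_sets, ["Resources", "Battery", "Usage", "UI", "Complaint"])
predicted = predict_labels(models, [1, 1, 0, 0, 1])  # battery, drain, crash
```

Because each category has its own classifier, a single review can receive several high- and low-level labels at once.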

Page 26: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

Multi-label Classification

Preprocessing → Feature Extraction → Classification → High & Low Level Categories

Battery, UI, Complaint, Resources, Usage

Page 27: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

Example

RQ2: Does our approach correctly recommend the software artifacts that need to be modified in order to handle user requests and complaints?

• 752 user reviews from our dataset belong to AcDisplay

• analyse Compatibility and Complaint reviews (61 reviews)

• Complaint and Android Version (22 reviews)

Page 28: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

Example

“Good but has some issues with Marshmallow I used this on my old phone and if was flawless and I loved it. I noticed that sometimes when I had AcDisplay activated I would not be able to use the fingerprint sensor even after I unlocked AcDisplay and had to enter a password. This is very frustrating so I cannot use AcDisplay.”

“Love the design I love the app. It’s super sleek and nice. But ever since my phone updated to marshmallow it’s stopped working. Hope it comes back soon.”

“On Marshmallow, the screen is buggy and sometimes shows the notification shade.”

Page 29: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

Source Code Localisation

• can we link reviews to the related source code?

• information retrieval (IR) methods based on the vector space model (VSM) (hard task: the vocabulary used by reviews and source code is different)

• use additional Android project specific information (e.g. UI functionality is implemented in Activity classes)
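
A vector-space approach to this linking step might be sketched as follows. Everything here is an assumption made for illustration: the file names and contents are invented and are not AcDisplay's real sources, the TF-IDF weighting is a common smoothed variant, and the Activity filter is only one possible way of using Android project structure information.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split on non-letters and camelCase boundaries, then lowercase."""
    return [w.lower() for w in re.findall(r"[A-Za-z][a-z]*", text)]


def tfidf_vectors(docs):
    """Smoothed TF-IDF: count * (log((1 + N) / (1 + df)) + 1)."""
    df = Counter(t for d in docs for t in set(d))
    n = len(docs)
    return [{t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(d).items()}
            for d in docs]


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def rank_files(review, files, ui_related=False):
    """Rank source files by similarity to the review text.

    files maps a file name to its source text. If ui_related is set, only
    Activity classes are considered (a heuristic assumed for this sketch).
    """
    if ui_related:
        activities = {n: s for n, s in files.items() if n.endswith("Activity.java")}
        files = activities or files
    names = list(files)
    docs = [tokenize(review)] + [tokenize(n + " " + files[n]) for n in names]
    vectors = tfidf_vectors(docs)
    scored = [(cosine(vectors[0], v), n) for v, n in zip(vectors[1:], names)]
    return [n for score, n in sorted(scored, reverse=True) if score > 0]


# Hypothetical project files.
files = {
    "FingerprintUnlockActivity.java":
        "class FingerprintUnlockActivity { void onFingerprintSensorResult() {} void unlockScreen() {} }",
    "NotificationListener.java":
        "class NotificationListener { void onNotificationPosted() {} }",
    "BatteryStats.java":
        "class BatteryStats { int readBatteryLevel() { return 0; } }",
}
review = "I would not be able to use the fingerprint sensor after I unlocked the app"
ranking = rank_files(review, files)
```

The example also shows the vocabulary problem: the review says "unlocked" while the code says "unlock", so without stemming only "fingerprint" and "sensor" contribute to the match.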

Page 30: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

Source Code Localisation

Android Project Structure Info

IR - VSM

Software Artifacts

App’s Source Code

User Reviews

Page 31: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

Evaluation

RQ1: To what extent does our approach organise reviews according to meaningful maintenance and evolution tasks for developers?

RQ2: Does our approach correctly recommend the software artifacts that need to be modified in order to handle user requests and complaints?

Page 32: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

Reviews Source Code

Page 33: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

Study RQ1

• ~7800 user reviews from 39 apps

Page 34: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

Study RQ1

• 2 external evaluators

• evaluate 200 reviews for each category (3600 total)

Page 35: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

Results RQ1

High Level Category  Precision  Recall  F1 Score
Compatibility        71%        97%     82%
Usage                89%        94%     91%
Resources            79%        99%     88%
Pricing              85%        97%     90%
Protection           89%        98%     93%
Complaint            85%        80%     82%
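
The F1 score is the harmonic mean of precision and recall. The short sketch below recomputes it for the Compatibility row; the function name is made up, and because the inputs are already rounded percentages, a recomputed value can differ from a reported one by a point.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Compatibility: precision 71%, recall 97%
compatibility_f1 = f1_score(0.71, 0.97)
```

For Compatibility this gives roughly 0.82, matching the reported 82%.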

Page 36: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

Results RQ1

High Level Category  Low Level Category  Precision  Recall  F1 Score
Compatibility        Device              85%        98%     91%
Compatibility        OS Version          89%        86%     87%
Compatibility        Hardware            61%        95%     74%
Usage                App Usability       92%        91%     91%
Usage                UI                  83%        93%     88%
Resources            Performance         64%        97%     77%
Resources            Battery             78%        95%     86%
Resources            Memory              68%        95%     79%
Pricing              Licensing           91%        98%     94%
Pricing              Price               85%        96%     90%
Protection           Security            87%        98%     92%
Protection           Privacy             83%        96%     89%

Page 37: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

Results RQ1

Our approach is able to classify reviews with high precision and recall according to the mobile specific topics we derived. The most important categories are Usage, Resources and Compatibility.

Page 38: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

Study RQ2

• 1 external evaluator

• 91 user reviews from 2 apps

Page 39: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

Results RQ2

Quality of Reviews  Precision  Recall  F1 Score
Difficult to Link   41%        83%     55%
Easier to Link      52%        79%     63%
All                 51%        79%     62%

Page 40: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

Results RQ2

Our approach achieves promising results in recommending related software artifacts for specific user reviews. Furthermore, better-quality reviews are easier to link than lower-quality ones.

Page 41: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

Conclusion & Future Work

• reviews can be classified with high precision and recall using machine learning according to mobile specific topics

• linking reviews to source code using textual similarity based methods is difficult

• future work: summarise reviews, improve localisation (static analysis)

Page 42: Analyzing Reviews and Code of Mobile Apps for Better Release Planning

Discussion

What mechanisms can we adopt for enabling a reliable and practical solution for code localisation?

