12
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 On the Relationship between User Churn and Soſtware Issues Omar El Zarif [email protected] Queen’s University School Of Computing Kingston, Ontario, Canada Daniel Alencar Da Costa [email protected] University of Otago Department of Information Science Dunedin, Otago, New Zealand Safwat Hassan [email protected] Queen’s University School Of Computing Kingston, Ontario, Canada Ying Zou [email protected] Queen’s University Department of Electrical and Computer Engineering Kingston, Ontario, Canada ABSTRACT The satisfaction of users is only part of the success of a software product, since a strong competition can easily detract users from a software product/service. User churn is the jargon used to denote when a user changes from a product/service to the one offered by the competition. In this study, we empirically investigate the relationship between the issues that are present in a software prod- uct and user churn. For this purpose, we investigate a new dataset provided by the alternativeto.net platform. Alternativeto.net has a unique feature that allows users to recommend alternatives for a specific software product, which signals the intention to switch from one software product to another. Through our empirical study, we observe that (i) the intention to change software is tightly asso- ciated to the issues that are present in these software; (ii) we can predict the rate of potential churn using machine learning models; (iii) the longer the issue takes to be fixed, the higher the chances of user churn; and (iv) issues within more general software modules are more likely to be associated with user churn. Our study can provide more insights on the prioritization of issues that need to be fixed to proactively minimize the chances of user churn. KEYWORDS software issues, users churn, software alternatives, deep learning ACM Reference Format: Omar El Zarif, Daniel Alencar Da Costa, Safwat Hassan, and Ying Zou. 2020. On the Relationship between User Churn and Software Issues. In 17th International Conference on Mining Software Repositories (MSR ’20), October 5–6, 2020, Seoul, Republic of Korea. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3379597.3387456 Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. MSR’20, October 5–6, 2020, Seoul, Republic of Korea © 2020 Association for Computing Machinery. ACM ISBN 978-1-4503-7517-7/20/05. . . $15.00 https://doi.org/10.1145/3379597.3387456 1 INTRODUCTION User satisfaction is essential for the success of software projects [22, 23, 60]. In fact, even when the budget and the schedule of a project are in control, user dissatisfaction may still lead the project to failure [43, 76]. Users tend to discuss in public (e.g., public forums) their preferences and disappointments regarding software, in which they often mention what made them love or hate a software product (e.g., by providing ratings and detailed comments). Such data about user satisfaction has been continuously used by researchers to study the most important factors to explain user satisfaction. For example, Panichella et al. [52] identified useful user reviews of mobile apps, so that developers can improve their apps accordingly (e.g., by addressing feature requests within such reviews). Other research works have extracted user feedback [31], and studied the planning process of future releases based on user reviews [72]. Nevertheless, the success of software projects is not only defined by the relationship between the software product and its users, but also by the strengths and weaknesses of competitors. For instance, studies have shown that a poor user experience may create a point of no return in which the user is led to the competition or searches for alternative solutions [62]. In addition, a poor user experience may create a bad reputation for the software product, which impairs the adherence of new users [11] (which would likely adhere to the competitors). Therefore, an important area of study is to unveil the underlying reasons for losing users to competitors. User churn is the jargon used to denote when a user decides to change from a product/service to those offered by the competition. User churn has been studied extensively in areas other than software engineering, such as mobile operators and telecommuni- cation networks [20, 53, 73]. In fact, Wei et al. [73] observed that the cost of acquiring new users is more expensive than retaining users. Several services investigate their own user base characteris- tics to predict user churn. There has also been research targeting Yahoo Answers [24], StumbleUpon (a web content recommenda- tion system) [21], Top Eleven - Be A Football Manager (an online mobile game) [41], Pengyou (a Chinese social network) [37], using prediction models for predicting user churn. Although prior research has studied the concerns of users re- garding software products (including mobile apps), there has been a lack of empirical research to investigate user churn in the context of software products. Filling this gap is important to help open 1

On the Relationship between User Churn and Software Issues · relationship between the issues that are present in a software prod-uct and user churn. For this purpose, we investigate

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: On the Relationship between User Churn and Software Issues · relationship between the issues that are present in a software prod-uct and user churn. For this purpose, we investigate

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60

61

62

63

64

65

66

67

68

69

70

71

72

73

74

75

76

77

78

79

80

81

82

83

84

85

86

87

88

89

90

91

92

93

94

95

96

97

98

99

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

On the Relationship between User Churn and Software IssuesOmar El Zarif

oelzarifcsqueensucaQueenrsquos University

School Of ComputingKingston Ontario Canada

Daniel Alencar Da Costadanielcalencarotagoacnz

University of OtagoDepartment of Information Science

Dunedin Otago New Zealand

Safwat HassanshassancsqueensucaQueenrsquos University

School Of ComputingKingston Ontario Canada

Ying ZouyingzouqueensucaQueenrsquos University

Department of Electrical and Computer EngineeringKingston Ontario Canada

ABSTRACTThe satisfaction of users is only part of the success of a softwareproduct since a strong competition can easily detract users from asoftware productservice User churn is the jargon used to denotewhen a user changes from a productservice to the one offeredby the competition In this study we empirically investigate therelationship between the issues that are present in a software prod-uct and user churn For this purpose we investigate a new datasetprovided by the alternativetonet platform Alternativetonet hasa unique feature that allows users to recommend alternatives fora specific software product which signals the intention to switchfrom one software product to another Through our empirical studywe observe that (i) the intention to change software is tightly asso-ciated to the issues that are present in these software (ii) we canpredict the rate of potential churn using machine learning models(iii) the longer the issue takes to be fixed the higher the chances ofuser churn and (iv) issues within more general software modulesare more likely to be associated with user churn Our study canprovide more insights on the prioritization of issues that need tobe fixed to proactively minimize the chances of user churn

KEYWORDSsoftware issues users churn software alternatives deep learning

ACM Reference FormatOmar El Zarif Daniel Alencar Da Costa Safwat Hassan and Ying Zou2020 On the Relationship between User Churn and Software Issues In 17thInternational Conference on Mining Software Repositories (MSR rsquo20) October5ndash6 2020 Seoul Republic of Korea ACM New York NY USA 12 pageshttpsdoiorg10114533795973387456

Permission to make digital or hard copies of all or part of this work for personal orclassroom use is granted without fee provided that copies are not made or distributedfor profit or commercial advantage and that copies bear this notice and the full citationon the first page Copyrights for components of this work owned by others than ACMmust be honored Abstracting with credit is permitted To copy otherwise or republishto post on servers or to redistribute to lists requires prior specific permission andor afee Request permissions from permissionsacmorgMSRrsquo20 October 5ndash6 2020 Seoul Republic of Koreacopy 2020 Association for Computing MachineryACM ISBN 978-1-4503-7517-72005 $1500httpsdoiorg10114533795973387456

1 INTRODUCTIONUser satisfaction is essential for the success of software projects [2223 60] In fact even when the budget and the schedule of a projectare in control user dissatisfaction may still lead the project tofailure [43 76] Users tend to discuss in public (eg public forums)their preferences and disappointments regarding software in whichthey often mention what made them love or hate a software product(eg by providing ratings and detailed comments) Such data aboutuser satisfaction has been continuously used by researchers to studythe most important factors to explain user satisfaction For examplePanichella et al [52] identified useful user reviews of mobile appsso that developers can improve their apps accordingly (eg byaddressing feature requests within such reviews) Other researchworks have extracted user feedback [31] and studied the planningprocess of future releases based on user reviews [72]

Nevertheless the success of software projects is not only definedby the relationship between the software product and its users butalso by the strengths and weaknesses of competitors For instancestudies have shown that a poor user experience may create a pointof no return in which the user is led to the competition or searchesfor alternative solutions [62] In addition a poor user experiencemay create a bad reputation for the software product which impairsthe adherence of new users [11] (which would likely adhere to thecompetitors) Therefore an important area of study is to unveil theunderlying reasons for losing users to competitors User churn isthe jargon used to denote when a user decides to change from aproductservice to those offered by the competition

User churn has been studied extensively in areas other thansoftware engineering such as mobile operators and telecommuni-cation networks [20 53 73] In fact Wei et al [73] observed thatthe cost of acquiring new users is more expensive than retainingusers Several services investigate their own user base characteris-tics to predict user churn There has also been research targetingYahoo Answers [24] StumbleUpon (a web content recommenda-tion system) [21] Top Eleven - Be A Football Manager (an onlinemobile game) [41] Pengyou (a Chinese social network) [37] usingprediction models for predicting user churn

Although prior research has studied the concerns of users re-garding software products (including mobile apps) there has beena lack of empirical research to investigate user churn in the contextof software products Filling this gap is important to help open

1

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

139

140

141

142

143

144

145

146

147

148

149

150

151

152

153

154

155

156

157

158

159

160

161

162

163

164

165

166

167

168

169

170

171

172

173

174

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

175

176

177

178

179

180

181

182

183

184

185

186

187

188

189

190

191

192

193

194

195

196

197

198

199

200

201

202

203

204

205

206

207

208

209

210

211

212

213

214

215

216

217

218

219

220

221

222

223

224

225

226

227

228

229

230

231

232

source and corporate software organizations to retain their usersand improve their probability of success (not to mention the overallsatisfaction of users)

In this paper we use data obtained from the alternativetonet1website which has a unique feature that allows users to recommendalternatives for a specific software product The recommendationof alternatives can signal the intention to switch from one softwareproduct to another (we provide an example of this scenario in Sec-tion 2) For the sake of simplicity we refer to the recommendationfor an alternative software product as simply potential user churn

By using the alternativetonet dataset we formulate an empiricalstudy to investigate the (i) Web Browsers (ii) IDEs and (iii) WebServers domains on the alternativetonet website We first extract3556 reviews and 10081 comments to better understand the overallconcerns of users regarding the software products in the studieddomains Next we extract 12549 recommendations for alterna-tives by users (ie potential user churn) from the Firefox Eclipseand Apache projects We specifically choose these projects for thedeeper analyses because these projects are established in the WebBrowser IDE andWeb Server domains respectively

To investigate the underlying reasons for the potential user churnof software products we pose four research questions that guidesour study

bull RQ0 What are the most significant usersrsquo concerns for BrowsersIDEs and Web Servers Understanding usersrsquo concerns will pavethe way to grasp the reasons behind potential user churn Whendiscussing the pros and cons of a software product and competi-tors users voice varied concerns (eg security amp privacy issues ormemory issues)bull RQ1 Can we find a correlation between potential user churn andissue reports Given that the issues that are present in the studiedprojects are the predominant topic in usersrsquo concerns we studywhether there exists a correlation between the issue reports of agiven software project (ie Firefox Eclipse and Apache) and thepotential user churnbull RQ2 Can we use machine learning models to predict the rate ofpotential user churn After correlating issue reports with potentialuser churn we study whether machine learning can be used topredict the rate of potential user churn (expressed by high or lowuser churn) To build our machine learning models we use latentfeatures from the issue reports of our studied projectbull RQ3 What are the most important factors related to potential userchurn given that machine learning models can be used to predictthe rate of potential user churn we can then extract the mostimportant features that are associated to the potential user churnThe reoccurring bugs the time alive factor and the issues acrossmultiple platforms are common factors that are highly associatedwith the potential user churn

The paper is organized as the following we showcase a motivat-ing example in Section 2 We describe our data collection processin Section 3 We delve into our approach and show our obtainedresults in Section 4 We discuss the threats to validity in Section 5In Section 6 we present the related work Finally we conclude thestudy in Section 7

1httpsalternativetonet

Figure 1 Example of a pdf reader on the alternativetonetwebsite

2 MOTIVATING EXAMPLEThe goal of the alternativetonet website is to help users to findsoftware alternatives that can better address the usersrsquo necessitiesFor instance let us consider that a user needs a better pdf reader(eg the current reader freezes occasionally) The first challengeoccurs because the user is not aware of all the available softwarealternatives In addition choosing an alternative for a softwareis not always simple since users may be already familiar with aset of features (from the software in use) which they would notlike to compromise Considering the pdf reader example whilethe user wishes a freezing-free alternative the user may only feelcomfortable to change if the alternative provides the same level ofcommenting capabilities (as compared to the reader in use)

With such challenges in mind the alternativetonet website wasdesigned to help users choose the best software alternative for theirneeds Figure 1 shows an example of an initial page of a softwareproduct on alternativetonet This initial page provides a descriptionof the software features categories and tags The features cate-gories and tags are used to organize the software alternatives onalternativetonet and can be used to search for software alternativesInteresting to note is the ldquoAlternativesrdquo tab depicted in Figure 1which shows all the available alternatives for a software product(as deemed by the community)

Alternativetonet allows users to provide reviews for softwarealternatives along with ratings (similarly to Google Play whichallows reviews to be provided for mobile apps) However what setsalternativetonet apart from other platforms is that it allows users tovoice their opinions by placing a software product in perspective toits competitors For example Figure 2 shows the opinions voiced byusers as to whether (and why) Chrome is a good (or bad) alternativeto Firefox2

As shown in Figure 2 users do not only comment on the prod-uct being evaluated but also put the product in perspective to its

2httpsalternativetonetsoftwarefirefox

2

233

234

235

236

237

238

239

240

241

242

243

244

245

246

247

248

249

250

251

252

253

254

255

256

257

258

259

260

261

262

263

264

265

266

267

268

269

270

271

272

273

274

275

276

277

278

279

280

281

282

283

284

285

286

287

288

289

290

On the Relationship between User Churn and Software Issues MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea

291

292

293

294

295

296

297

298

299

300

301

302

303

304

305

306

307

308

309

310

311

312

313

314

315

316

317

318

319

320

321

322

323

324

325

326

327

328

329

330

331

332

333

334

335

336

337

338

339

340

341

342

343

344

345

346

347

348

Figure 2 httpsalternativetonet provides in-perspectivecomments where users specify why a certain software alter-native (eg Firefox) can be better than the current software(eg Chrome)

Figure 3 Recording the recommendation for an alternativefor Firefox

competitors For example a user mentions ldquo[Chrome] does not haveas many languages and spell checking support as Firefox does3rdquo

Because the opinions are voiced with respect to competitors (iethe in-perspective comments) alternativetonet provides a uniqueopportunity for empirical studies Particularly in our study thein-perspective comments allow us to investigate the most recurrentconcerns within the competition scene of a software domain (weperform this investigation in RQ0)

Another interesting and unique characteristic in the aterna-tivetonet data is the register of users that recommended alternativesfor a given software product (ie the potential user churn) Forexample Figure 3 shows how alternativetonet records when a userindicates that Chrome can be a good alternative to the Firefox webbrowser (along with the reason why) This data is important be-cause it can be used to proactively identify potential user churn insoftware products (eg by using machine learning) Based on thisinformation we build machine learning models and report on theirresults in RQ2 and RQ3

Finally the potential user churn provided by alternativetonetallow us to investigate whether it is related to prominent issuesthat may be present in a software product For example while auser has expressed the reason for a possible user churn in Figure 3(ie Jupyter is slow on Firefox) the development team of Firefoxis also working on an issue report that expresses the same issue(see Figure 4) In this paper we hypothesize that there may existstrong links between potential user churn and the issue reports thatare submitted to software products (we perform this investigation inRQ1)

3To provide an opinion as to whether a software product is a good or a bad alternativeto another product users must prove they are not robots Only afterwards a text boxcontaining the submission form will be provided

Figure 4 Issue report expressing the problem voiced by theuser in Figure 3

We capitalize on these unique characteristics of the alterna-tivetonet data to perform our empirical study on the relationshipbetween potential user churn and software issues

3 DATA COLLECTIONAlternativetonet organizes the software alternatives through severalmeans such as categories features or tags Categories are the broad-est way to categorize software alternatives and alternativetonet hasover 25 categories of software alternatives at the time of this studyTags and features are more specific ways to categorize softwarealternatives (eg the ldquoIDErdquo tag or the ldquoresponsive designrdquo feature)Another characteristic of tags and features is that they can be addedby users However these new tags and features must be verified bythe alternativetonet community before being added to the websiteFor the purposes of our study we use tags to filter our data forsoftware alternatives because it is more precise than using cate-gories For example we can easily filter for the software alternativeswithin the Web Browser domain using the ldquoweb browserrdquo tag Onthe other hand it would be hard to filter our data based on thefeatures For example filtering by ldquoresponsive designrdquo may retrievesoftware from very distinct domains

For the sake of data robustness it is tempting to analyze allthe software alternatives from all the software domains that areavailable on the alternativetonetwebsite However since one of ourmain objectives is to study the relationship between potential userchurn and the issues that are present in the software products weneed to restrict our analyses to the domains of certain software Forinstance it would be virtually impossible to collect and analyze allthe issue reports from every software alternative listed on the alter-nativetonet website Every different software alternative may usea different Issue Tracking System which would require differentways for collecting data Therefore we analyze the software alter-natives within the domains of three well established open sourceprojects Eclipse4 Firefox5 and the Apache server6 Our choice forthese three projects was based on the extensive prior empiricalsoftware engineering research that has been conducted using theseprojects over the years [2 7 18 19 29 36 42 54]

To collect the data for our study we followed the process de-picted in Figure 5 Given our chosen studied projects we collectall the software alternatives tagged with the ldquoBrowserrdquo ldquoIDErdquo andldquoWeb Serverrdquo tags which are the respective domains of the EclipseFirefox and Apache projects In total we collect 290 browser alter-natives 296 IDE alternatives and 134 web-server alternatives Oncethe software alternatives are fetched we collect the in-perspective4httpswwweclipseorg5httpswwwmozillaorgen-USfirefoxnew6httpshttpdapacheorg

3

349

350

351

352

353

354

355

356

357

358

359

360

361

362

363

364

365

366

367

368

369

370

371

372

373

374

375

376

377

378

379

380

381

382

383

384

385

386

387

388

389

390

391

392

393

394

395

396

397

398

399

400

401

402

403

404

405

406

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

407

408

409

410

411

412

413

414

415

416

417

418

419

420

421

422

423

424

425

426

427

428

429

430

431

432

433

434

435

436

437

438

439

440

441

442

443

444

445

446

447

448

449

450

451

452

453

454

455

456

457

458

459

460

461

462

463

464

Table 1 A summary of our collected data

Domain Alternatives Reviews Comments Potential churn Issue reports

IDEs 296 1088 3036 4319 (Eclipse) 22758 (Eclipse)

Browsers 290 1754 5112 6421 (Firefox) 40749 (Firefox)

Web Servers 124 714 1933 1809 (Apache) 20965 (Apache)

comments (explained in Section 2) and the reviews for all the soft-ware alternatives We also collect the potential user churn (explainedin Section 2) for the Eclipse Firefox and Apache projects Finally inaddition to the data collected from alternativetonet we also collectthe issue reports from the respective Issue Tracking Systems of ourstudied projects We use the potential user churn data combinedwith the issue reports to perform our investigations in RQ1-RQ3Table 1 summarizes the data collected for our study (ie the datatype and the amount of collected data)

4 RESEARCH QUESTIONS amp RESULTSIn this section we present our research questions (RQs) along withtheir results For each RQ we present its motivation and explainthe approach that is used to conduct the RQ

RQ0 What are the most significant usersrsquoconcerns for Browsers IDEs and Web ServersMotivation The motivation behind RQ0 is to understand the mainconcerns within the competition scene of a specific software do-main This investigation is important to highlight what are themain strengths and weaknesses within a particular domain (egBrowsers) This knowledge can be useful because for example anorganization can study the main weaknesses within a domain andbetter assess whether its productsservices fall within the sameflawsApproach To study the most significant concerns related toBrowsers IDEs and Web Servers we use the in-perspect commentsand reviews data that were explained in Section 3 To analyze suchdata we adopt the Latent Dirichlet Allocation (LDA) to find themain topics that are present in the in-perspective comments andreviews data We use the LDA implementation provided by theMALLET framework [40] LDA is a probabilistic generative methodthat assumes a Dirichlet distribution of latent topics LDA is partic-ular useful for us since we are interested in finding the main topicsthat are present in the in-perspective comments and reviews dataHowever The LDA model requires the data to be pre-processedin a certain format for optimal results Therefore we follow thestandard text pre-processing steps [48]

Fixing Typographical Errors Typographical errors can bedue to the usage of internet slang in a written text or simply due tocommon language mistakes The most recurrent of errors that wefind in our data is the repetition of characters For example userstend to use words such as ldquobeeeestrdquo to emphasize their feelings Weuse the Pattern For Python [63] package to fix such typographicalerrors

Removing Stop Words Examples of stop words are of byand the These are words that are recurrently used in written textbut do not possess meaning by themselves We use the package

nltk [8] to remove such stop words This package contains a well-known corpus of stop words and is widely used for text filtering

Stemming andLemmatization For better results words shouldbe repeated as frequently as they can since LDA relies heavily onthe occurrences of words in different sentences Hence the usageof stemming and lemmatization is essential These methods trans-form inflectional forms of a word to its basic form For examplethe words ldquofixedrdquo ldquofixingrdquo and ldquofixesrdquo are transformed to the basicform ldquofixrdquo Stemming works by removing the suffix of the word(eg the word ldquocatsrdquo becomes ldquocatrdquo by removing the ldquosrdquo) whilelemmatization retrieves the basic form of a word from a dictionaryknown as lemma (by using morphological analysis) We use bothstemming and lemmatization to ensure that we obtain the mostunique words possible

Forming Bigrams and Trigrams Forming bigrams and tri-grams is essential to group the words which frequently appeartogether Some words do not inflict any significant meaning for theLDA model on their own However when grouped together suchwords may indicate a specific topic One example of these wordsis ldquocustomer supportrdquo which has a precise meaning (as opposed toonly ldquocustomerrdquo or ldquosupportrdquo which can hold diverse meanings)

To choose the optimal number of topics in our LDAmodel we usethe coherence scoremetric [56] This metric was found to be the bestin alliance with human perception [49 65] However even whenrelying on the coherence score (which naturally limits redundancyin the produced topics) there is still the possibility of duplicatetopics being generated Therefore we perform a manual analysisof the produced topics to eliminate duplicates Our LDA modelproduces topics in the form of keywords and their percentage-of-significance with respect to the topic For example (0097 lowastldquo119904119900119906119903119888119890rdquo + 0085 lowast ldquo119901119903119894119907119886119888119910rdquo + 0053 lowast ldquo119905119903119886119888119896rdquo + 0050 lowast ldquo119901119903119900 119891 119894119905rdquo +0038 lowast ldquo119902119906119886119899119905119906119898rdquo + 0038 lowast ldquo119888119900119897119897119890119888119905rdquo + 0035 lowast ldquo119901119903119900119905119890119888119905rdquo + 0032 lowastldquo119886119903119888ℎ119894119907119890rdquo + 0029 lowast ldquo119898119900119899119890119910rdquo + 0026 lowast ldquo119901119890119903 119891 119900119903119898119886119899119888119890rdquo) are a set ofkeywords that indicate Privacy as high level topic But some otherdistribution of keywords could refer to the same topic Two of theauthors conduct together the manual analyses of the topics Thefinal set of topics is only produced when full consensus is reachedbetween authors (ie the authors went over and discussed all thetopics) Because both authors have analyzed together all topics andagreements were reached on all the topics in the end we did notcompute inter-rater agreement

Once the topics are defined we extract 100 examples (aiming fora 95 confidence level with a 10 margin error) from our data foreach studied domain (ie IDE Browser and Web Server) totalling300 examples For each example two of the authors assign theirtopic Again two authors analyze and discuss each example sothat consensus is reached regarding the assigned topic Afterwardswe compute the topic distribution for each domain which canrepresent the most recurrent concerns for that domainFindings Tables 2 and 3 show the topics obtained by our LDAmodelWe also show some of the examples for each topic in the table(all the examples and analyses are available in our replication pack-age which is hosted on httpsdoiorg105281zenodo3610584

Our obtained topics can be categorized into (i) general topicswhich are present in the three domains and (ii) domain-specifictopics which might still be present in other domains but are more

4

465

466

467

468

469

470

471

472

473

474

475

476

477

478

479

480

481

482

483

484

485

486

487

488

489

490

491

492

493

494

495

496

497

498

499

500

501

502

503

504

505

506

507

508

509

510

511

512

513

514

515

516

517

518

519

520

521

522

On the Relationship between User Churn and Software Issues MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea

523

524

525

526

527

528

529

530

531

532

533

534

535

536

537

538

539

540

541

542

543

544

545

546

547

548

549

550

551

552

553

554

555

556

557

558

559

560

561

562

563

564

565

566

567

568

569

570

571

572

573

574

575

576

577

578

579

580

Alternativetonet

Issue tracking system

Fetch data from

studied domains

IDE Browser

and Web Server

alternatives

Fetch in-

perspective

comment and

reviews

In-perspective

comments

Software

reviews

RQ0

Competition

scene of a

domain

Fetch issues

from Firefox

Eclipse and

Apache

Fetch potential

churn from

Eclipse Firefox

and Apache

Potential churns

Issue reports

Issues amp

Potential churn

pairs

RQ1 Correlating

issues with

potential churns

RQ2 Machine

learning to

predict potential

churns

RQ3 Important

features

associated with

potential churns

Figure 5 An overview of our data collection process

dominant in a specific domain Table 4 shows the distribution oftopics per domain based on our manual analyses We observe thatrelease and updates memory issues bugscrashes and cross-platformavailability can be considered as general topics (ie they are im-portant in all the three analyzed domains) Indeed these topics areintuitively present when handling any kind of software

On the other hand Documentation and testing were presentfor web servers and IDEs signalling that a robust documentation(which is often overlooked [55]) and with good support for test-ing tools may definitely attractretain more users when competingin the IDE and Web Sever Market For example an organizationmay direct their attention to these aspects when studying theircompetition

The bookmarks privacy and security were exclusively present inthe Browsers domain signalling that these are essential features toretain or attract users Simplistic design debug options code refac-toring and auto-completion are topics that target IDEs exclusively

Finally complaints about release and updates memory issues andbugscrashes can be considered as the main topics that drive usersto churn These topics were present in the three domains with thehighest percentages among each of the domains The percentagescombined exceed 50 in each of the domains (see Table 4)Summary Our observations suggest that the usersrsquo concerns aretightly related to issues (eg bugs or crashes) that are present inthe respective software products

RQ1 Can we find a correlation betweenpotential user churn and issue reportsMotivation In RQ0 we observe that software issues (eg bugsamp crashes and memory issues) are common concerns that maymotivate users to churn With such an observation we hypothesizein RQ1 that the potential usersrsquochurn observed on alternativetonetmay be correlated with the issue reports which developers addressin their respective projects This analysis is important because ifsuch a correlation exists the potential user churn information canbe possibly used to prioritize issue reports (for example the moreassociated with potential user churn the more important the issuereport)

Approach In our work we approach the recommendations foralternatives (ie potential usersrsquochurn) and the issue reports asevents occurring over time and interpret them as two time-seriesSuch an interpretation is handy for two reasons (i) we can studythe correlation between two trends (potential usersrsquochurn and issuereports) and (ii) we can study whether the specific peaks in issuereports can be associated with peaks in potential usersrsquochurn

In our study we group the two time series in a weekly basisie all the potential user churn that occurred within a week aregrouped together We use the weekly grouping because groupingissue reports in a daily-basis is too narrow to form a pattern whilegrouping them in a monthly-basis can over-generalize the patternsin issues The time-series of potential usersrsquochurn and the time-series of issues showcase the events (grouped in a weekly-basis)from March 2015 (the earliest date on alternativetonet) to July 2019forming 200 weeks grouping

The first step to correlated our two time-series is to verifywhetherthere exists a global correlation between them (ie whether thereexist common patterns among the increasing and decreasing trendsover time) To do so we use the cross-correlation which is a metricused in signal processing [77] This correlation calculates the simi-larity between two time-series as a function of the displacementfrom one time-series to another For example given two functions119891 and 119892 the cross-correlation calculates the degree to which a shiftin 119892 (along the x-axis) is identical to the shift in 119891

However since we are also interested in analyzing the relation-ship between the peaks of our two time-series we use the DynamicTimeWarping (DTW) [58] which was designed for comparing time-series varying at different speeds A peak in potential usersrsquochurnmay be followed by a peak in issue reports (or vice-versa) ie thepeaks may not necessarily occur at the same time Differently fromthe Eucledian distance which assumes that a point 119894 in a time-seriesmust be aligned to the 119894119905ℎ point in another time-series [38 75] DTWallows the variance in the offset between two time-series [59]

To employ DTW we use the tslearn [68] which is a machinelearning toolkit specialized for time series analysis The DTW com-putation produces a matrix that shows the distance between everytwo points in the graph of the two time-series The algorithm isdesigned in a way to choose the closest point that respects the

5

581

582

583

584

585

586

587

588

589

590

591

592

593

594

595

596

597

598

599

600

601

602

603

604

605

606

607

608

609

610

611

612

613

614

615

616

617

618

619

620

621

622

623

624

625

626

627

628

629

630

631

632

633

634

635

636

637

638

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

639

640

641

642

643

644

645

646

647

648

649

650

651

652

653

654

655

656

657

658

659

660

661

662

663

664

665

666

667

668

669

670

671

672

673

674

675

676

677

678

679

680

681

682

683

684

685

686

687

688

689

690

691

692

693

694

695

696

Table 2 The predominant topics in alternativetonet

Release and updatesKeywords ldquo119907119890119903119904119894119900119899rdquo ldquo119897119900119899119892119890119903 rdquo ldquo119903119890119897119890119886119904119890rdquo ldquo119889119900119908119899119897119900119886119889rdquo ldquo119900 119891 119891 119894119888119894119886119897rdquoExamples - The program seems to be no longer updated Last versions (2018) can be still downloaded from the official website

- Firefox version 60+ (Quantum) is better than Chrome Compare to previous versions of Firefox

Bookmarks FunctionalityKeywords ldquo119887119900119900119896119898119886119903119896rdquo ldquo119901119886119899119890119897rdquo ldquo119891 119886119907119900119903119894119905119890rdquo ldquo119904119906119901119901119900119903119905rdquo ldquo119904119910119899119888rdquoExamples - Firefox now allows bookmarks importing from Chrome which is a huge plus

- With the recent upgrade to 20 (including the bookmarks syncing feature that I liked in Firefox) Vivaldi just got better

Flash supportKeywords ldquo119901119897119906119892119894119899rdquo ldquo119907119894119889119890119900rdquo ldquo119894119899119904119905119886119897119897rdquo ldquo119891 119897119886119904ℎrdquo ldquo119898119890119889119894119886 minus 119901119897119886119910119890119903 rdquoExamples - Since its the only way to watch Flash video on an iPhone without jailbreaking Skyfire is a killer app thats worth getting for anyone Playback of Flash video over WiFi is

flawless and the quality is at very least watchable

Memory issuesKeywords ldquo119898119890119898119900119903119910rdquo ldquo119877119860119872rdquo ldquo119897119886119892rdquo ldquo119897119894119898119894119905rdquo ldquoℎ119890119886119907119910rdquoExamples - Great C++ IDE which balances memory usage and indexing solution Contains some basic refactoring functions Good for large projects and limited RAM with remaining

some capabilities of more complex C++ IDEs- Eclipse is really lagging after the last update The memory is filled whenever I run a Java swing applet- Chrome is so heavy it lags alot when I open multiple tabs

Privacy and securityKeywords ldquo119901119903119894119907119886119888119910rdquo ldquo119904119890119888119906119903119894119905119910rdquo ldquo119894119898119901119900119903119905119886119899119905rdquo ldquo119886119897119897119900119908rdquo ldquo119888119900119899119905119903119900119897rdquoExamples - Firefox allows far more control over your privacy than does Chrome (or even Chromium on which Chrome is based)

- Google Chrome is a great browser and a lot better than Vivaldi Browser Chrome keeps updating with cool stuff like the new cast feature better extensions and lot betterprivacy and security Vivaldi has more customizing features but has a long way to go to compete with Chrome

BugsCrashesKeywords ldquo119888119903119886119904ℎrdquo ldquo119887119906119892rdquo ldquo119891 119894119909rdquo ldquo119890119909119901119890119903119894119890119899119888119890rdquo ldquo119894119904119904119906119890rdquoExamples - Lots of users experience Adobe Flash Player hangs or crashes I dont know why it isnt fixed yet

If Chrome ever got its proxy support bugs fixed it would leap to the top of the list and become pretty much the only web browser Id ever use- I worked with NetBeans for a couple of years and always loved it but for professional users the yearly fee is well spent Nicer interface better Code Intelligence moreoptions to fine tune what happens on saving a file (eg saving or uploading to various locations) You get a couple of feature update each year plus bugfixes whereasNetBeans is sometimes very slow to update things (support for new version of a language) or to fix known bugs

Table 3 The predominant topics in alternativetonet (continued)

Debug optionsKeywords ldquo119891 119890119886119905119906119903119890rdquo ldquo119889119890119907119890119897119900119901119898119890119899119905rdquo ldquo119889119890119887119906119892rdquo ldquo119894119899119905119890119892119903119886119905119894119900119899rdquo ldquo119901119890119903 119891 119890119888119905rdquoExamples - Not enough debug and run options just a text editor Atom has not got any solid debugging features (and packages) out of the box or at all

- This is my favorite code editor It is open source and portable you can get a portable version that will fit onto a flash drive if youwant It is just as configurableas Sublime Text and has a large community producing great plugins it has GIT support built in and excellent support for Javascript HTML CSS and Pythonand others of course It has an integrated debugging system

Simplistic designKeywords ldquo119904119894119898119901119897119890rdquo ldquo119890119889119894119905119900119903 rdquo ldquo119897119894119892ℎ119905119908119890119894119892ℎ119905rdquo ldquo119889119890119904119894119892119899rdquo ldquo119899119900119905119890119901119886119889rdquoExamples - Really lightweight with some advanced not complete but easy to use auto complete features Simple design and customizable

Cross-platformKeywords ldquo119888119903119900119904119904 minus 119901119897119886119905 119891 119900119903119898rdquo ldquo119871119894119899119906119909rdquo ldquo119900119901119890119899 minus 119904119900119906119903119888119890rdquo ldquo119886119907119886119894119897119886119887119897119890rdquo ldquo119894119899119904119905119886119897119897rdquoExamples - Its cross platform and open source and bring the benefits of the JetBrains IDE platform which is well polished and powerful

- One of the best open source browsers that work on Linux and Windows

Code refactoring and auto-completionKeywords ldquo119888119900119889119890rdquo ldquordquo119886119906119905119900 minus 119888119900119898119901119897119890119905119894119900119899rdquo ldquo119903119890 119891 119886119888119905119900119903 rdquo ldquo119900119901119905119894119900119899rdquo ldquo119906119904119890rdquoExamples - For its young age it is well matured and a very powerful IDE with great CMake Support and other goodies like great refactoring tools and a sophisticated

auto-completion system

TestingKeywords ldquo119905119890119904119905rdquo ldquo119904119890119903119907119890119903 rdquo ldquo119889119890119901119897119900119910rdquo ldquo119908119890119887119904119894119905119890rdquo ldquo119904119890119905rdquoExamples - XAMPP is great tool to develop and test your website (particularly if it uses php and mysql databases) offline before putting it on a live server You could

also use XAMPP as an easy method of setting up a live server- Eclipse is far more superior than netbeans in unit testing functionality

DocumentationKeywords ldquo119889119900119888119906119898119890119899119905119886119905119894119900119899rdquo ldquo119904119890119903119907119890119903 rdquo ldquo119901119900119903119905rdquo ldquo119889119886119905119886119887119886119904119890rdquo ldquo119891 119900119903119906119898rdquoExamples - Easiest way to run a web server on Windows The performance is surprisingly good and lots of documentation on how to manage it

- Support is via forum only Documentation can be very poor particularly around product upgrades (one expects more from commercial packages) Due totheir poor documentation I lost several years of development databases when porting from an old to a new computer (they dont accurately document thelocation of the database files)- Eclipse lacks in adding plugins documentation

time constraints for continuity and monotonicity Finally two co-authors performed a manual analysis of the observed peaks in the

two time-series7 The goal of the manual analyses is to find whetherthe identified peaks are indeed likely related (eg the issue reports

7We share the data of our manual analyses at httpsdoiorg105281zenodo3610584

6

697

698

699

700

701

702

703

704

705

706

707

708

709

710

711

712

713

714

715

716

717

718

719

720

721

722

723

724

725

726

727

728

729

730

731

732

733

734

735

736

737

738

739

740

741

742

743

744

745

746

747

748

749

750

751

752

753

754

On the Relationship between User Churn and Software Issues MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea

755

756

757

758

759

760

761

762

763

764

765

766

767

768

769

770

771

772

773

774

775

776

777

778

779

780

781

782

783

784

785

786

787

788

789

790

791

792

793

794

795

796

797

798

799

800

801

802

803

804

805

806

807

808

809

810

811

812

Table 4 The extracted topics percentages distribution overthe three domains

Topics Browsers() IDEs() Web servers()Release and updates 18 17 21Bookmarks functionality 12 0 0Memory issues 23 22 11Privacy and security 8 0 0BugsCrashes 24 19 20Debug options 0 10 0Simplistic design 0 8 2Cross-platform 8 6 5Code refactoring and auto-completion 0 10 0Testing 0 2 17Documentation 0 2 15Miscellaneous 7 4 9

within a peak actually describe problems mentioned in the poten-tial usersrsquochurn) Again both co-authors analyze and discuss everyanalyzed sample

In this RQ we perform our analyses only in our studied projects(as opposed to analyzing all the software alternatives in their do-mains) As discussed in Section 3 it would be impracticable toextract the data from all the different Issue Tracking Systems of thesoftware alternatives in the studied domainsFindingsWe respectively observe a cross-correlation of 076 072and 070 in Firefox Eclipse and Apache These cross-correlationscores indicate that there exist a global commonality between thetrends of the two time-series regardless of the timing factor

Regarding our DTW analysis we observe that the time fromthe filing of an issue report to the time of a spark in potentialusersrsquochurn span from 2 to 4 weeks (on the median for the threeprojects) Additionally our DTW correlation reveals a bidirectionalrelationship between the peaks in issue reports and the peaks inpotential usersrsquochurn For the cases where a peak in issue reportsleads to a peak in potential usersrsquochurn it is intuitive to think thatthe peak in potential usersrsquochurn is due to the frustration generatedby the issues described in the reports On the other hand the caseswhere a peak in potential usersrsquochurn is followed by a peak inissue reports may occur due to users being frustrated over non-obvious issues For example if we consider the potential user churndepicted in Figure 3 (ie ldquoJupyter is slow on Firefoxrdquo) the users mightnot find it obvious that such a problem should be reported to theFirefox team (eg users may simply interpret it as a characteristicof Firefox) Therefore in such cases issue reports would be filledonly a while after the potential usersrsquochurn have been expressed

We are specifically interested in studying the relationship be-tween reported issues (ie the documented issues in the trackersystems) and the potential user churn Our rationale is that under-standing the characteristics of issues that are associated with userchurn can help developers to prioritize such issues and avoid futurechurn (thereby retaining more users)

Finally Table 5 shows a subset of the examples found in ourmanual analyses We can indeed observe the apparent relationshipbetween the comments within the potential usersrsquochurn and thedescriptions (or titles) of issue reports The examples are followinga unidirectional link in which the issue report that occurs at acertain time would induce the potential user churn at a later timeSummary Our results suggest that there exists a significant cor-relation between software issues and potential user churn

RQ2 Can we use machine learning models topredict the rate of potential user churnMotivation In the practice most of the issue reports undergo hu-man estimations of the priority and the severity [5] usually basedon internal business-oriented factors (such as ROI [25]) Issuesmight be prioritized by their recency the affected platforms theversions of the software where the issue is present Feeding a ma-chine learning model to identify the potential user churn that arecaused by issues may help automate the issue report prioritizationprocess and ultimately save huge costs in software developmentApproach We train three different machine learning models topredict low and high user churn rates by using information from is-sue reports The first model is a Feed-Forward Neural Network withtwo hidden layers developed by using the framework Keras [17]Keras is widely used to train simple neural networks as it hasa higher abstraction than TensorFlow [1] Feed-Forward NeuralNetworks [64] are the most suitable type of neural networks tobe trained on tabular data [27 33 50] This network evaluates allthe possible combinations of different metrics values to guide thepredictions

Our second model is the XGboost model which is a gradientboosting tree [12 13] XGboost is widely used for binary classifica-tion problems Research has shown that XGboost performs betterthan neural networks and classic classification machine learningalgorithms in problems dealing with tabular data [14 45 69]

We also train Random Forest models [35] Random Forest isa classical machine learning algorithm which is widely used inproblems dealing with tabular data [26 51 57] In addition Ran-dom Forest models have been widely used in software engineeringresearch such as in defect prediction [67]

The goal of ourmodels is to predict the rate of potential usersrsquochurngiven the characteristics of issue reports The data used to fit ourmodels is the data obtained from our DTW associations Simply putthe DTW provides us with 119875 pairs of issue reports 119868 and potentialusersrsquochurn 119862 (ie 119875 lt 119868 119862 gt) Therefore we study the character-istics of the issues 119868 to predict the amount of potential user churn119862 We split the distribution of 119862 into two percentiles (ie low andhigh) The low percentile being from 0 to 50 and the high from51 to 100 Thus our models output dichotomous predictions iewhether the potential usersrsquochurn is low or high

The characteristics of issue reports that we include in our models(henceforth referred to as features) are described in Table 6 Thesefeatures are collected from the Issue Tracking systems of our threestudied projects We study features such as Component Hardwareand OS to account for the locality and the spread of the issuesFor example these features can indicate whether platform-specificissues or more general issues have more potential user churn

Other features such as the Severity and Priority showcase theprioritization used by developers The Severity and Priority areimportant for us to verify whether the prioritization performed bydevelopers is aligned with the rate of potential user churn

Features related to the type of the bugs (ie whether it is acrash) whether a version number is present or the target milestoneprovide us information regarding the tracking process of issuesThey are important to understand whether better-tracked issues

7

813

814

815

816

817

818

819

820

821

822

823

824

825

826

827

828

829

830

831

832

833

834

835

836

837

838

839

840

841

842

843

844

845

846

847

848

849

850

851

852

853

854

855

856

857

858

859

860

861

862

863

864

865

866

867

868

869

870

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

871

872

873

874

875

876

877

878

879

880

881

882

883

884

885

886

887

888

889

890

891

892

893

894

895

896

897

898

899

900

901

902

903

904

905

906

907

908

909

910

911

912

913

914

915

916

917

918

919

920

921

922

923

924

925

926

927

928

Table 5 Manual investigation of DTW correlation linking

FirefoxExample 1Correlation Linking Week 164 in issues to week 169 in switches (May 2018)Issue Report Title Firefox version 5303 appears high memory high resources usage and might causing for SSD damagesIssue Description Mozilla Firefox it is caching all of this data 4-5GB on my new SSD and its lost 4 from its health within 5 months due continuously writesoverwrites firefox cache on my SSDAltrnativeto Comment Switching to Chrome after version 5303 high memory is damaging my SSDExample 2Correlation Linking Week 66 in issues to week 68 in switches (November 2016)Issue Report Title Sessions are not cleared in the private windowIssue Description After closing private window (not a tab) and opening a new one I still logged inAltrnativeto Comment Chrome incognito functionality is more stable than Firefox Firefox doesnrsquot resets sessions

EclipseExample 1Correlation Linking Week 10 in issues to week 11 in switches (July 2015)Issue Report Title Not Java 8 CompatibleIssue Description When downloading Eclipse for the first time for Java development 64-bit (which is what my computer is) it is not running because it is not compatible with Java version 8

(which is the latest version of Java) but appears to be compatible with version 7 still -DosgirequiredJavaVersion=17Altrnativeto Comment Using Netbeans because Eclipse is not installingExample 2Correlation Linking Week 211 in issues to week 213 in switches (February 2019)Issue Report Title jar files in the exported application do not contain class filesIssue Description The application has many plugins It builds and runs well within eclipse but when I export the application using the eclipse export product wizard the jar files (corresponding to

the plugins) do not contain any class files they include meta-inf pluginxml and all the extra folders (eg lib assets etc when available) but they do NOT contain any class filesAltrnativeto Comment Didnrsquot find any problems in exporting jar (when recommending IntelliJ)

ApacheExample 1Correlation Linking Week 113 in issues to week 117 in switches (July 2017)Issue Report Title httpd 2426 no longer building against lua 531 or lua 534Issue Description Building Apache 2426 against lua 531 or 534 compiled from scratch (latest version as of this writing) does not work Compilation against lua 531 did work with 2425Altrnativeto Comment Some issues while building with lua (when recommending XAMPP)Example 2Correlation Linking Week 12 in issues to week 13 in switches (September 2019)Issue Report Title Apache Server is restarted time to time (Release 2410)Issue Description I am using Apache release 2410 in Windows Server 2008 Time to time the Apache server is getting restarted The service is unavailable for 3-5 minutes and after that the service

is started up automatically This issue occurs frequently twice a day two days once etcAltrnativeto Comment XAMPP is more stable I am experiencing random restarts throughout the day when using Apache

Table 6 The issue reports attributes for Firefox Eclipse andApache

Metric Description

Component consists of different components of the system such as bookmarkstheme toolbar etc

Status describes the status of the issue such as unconfirmed open re-solved etc

Resolution describes the resolution type of the issue such as fixed duplicateinvalid etc

Severity the assigned severity such as blocker critical normal etcHardware the different hardware affected by the issue such as desktop mobile

32 or 64 bits etcOS the different OS affected by the issue such as OSX Windows etcPriority the assigned priority from 1 to 5Type the assigned issue type such as defect enhancement or taskTime Alive the time difference between the date opened and the date resolved

in minutesTime since last change the time difference between the date opened and the date of last

change in minutesIs a Crash a boolean value to indicate if it is a crash reportHas a Version Number a boolean whether the issue report has a version numberHas a Target Milestone a boolean whether the issue report has a target numberHas a Regression Range a boolean value to indicate if the bug causes a regression testingNumber of comments the number of comments for the reportNumber of votes the number of votes for the report which validates the integrity of

the report by other usersNumber of Blocks the number of issues that are blocked by that issue until it is re-

solved

(ie with more tracking information) are associated with potentialusersrsquochurn

The number of comments votes and blocks showcase the effortinvested on discussing an issue Finally the time-alive and the time-since-last-change are two features related to the life-cycle of an issuereport Longer time-alive values indicate a lingering issue that hasnot been resolved yet while a short time-since-last-change values

indicate issues that have been recently addressed (and thus can stillbe reoccurring)

Our studied features are inspired by previous research in soft-ware engineering that have built machine learning models usinginformation present in issue reports [15 16 16 18 19 29 32 39 74]

We chose features that are common across the issue reports ofFirefox Eclipse and Apache We filtered out features that are uniqueto an issue tracker system only to ensure a fair comparison betweenour studied issue reports Afterwards we performed Spearmancorrelation tests [44] on the selected features in which each featureis tested against the set of all the other features We did not observeany correlation obtaining a value higher than 07 which suggeststhat our selected features are not correlated and can be safely fedinto our machine learning models [30]

To evaluate the performance of our models we use the Area Un-der the Curvemetric (AUC) AUC is useful for evaluating our modelsbecause our models output probabilities Therefore AUC showsthe discrimination power of our models in every probability thresh-old (unlike Precision Recall or F-measure which are limited toevaluating models at a single probability threshold [66]) The AUCvalues range from 0 to 1 An AUC of 05 denotes a random guessingmodel while an AUC of 1 denotes a perfect distinguishing powerAn AUC of 0 denotes a model with perfect inverse predictions

Given the temporal nature of our data (ie future issue reportscannot be used to predict the potential user churn of past issuereports) we adopt a leave-one-out validation approach to obtainour AUC values

First our issue reports have already been sorted when we per-form the time-series analyses Second we split our data into two

8

929

930

931

932

933

934

935

936

937

938

939

940

941

942

943

944

945

946

947

948

949

950

951

952

953

954

955

956

957

958

959

960

961

962

963

964

965

966

967

968

969

970

971

972

973

974

975

976

977

978

979

980

981

982

983

984

985

986

On the Relationship between User Churn and Software Issues MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea

987

988

989

990

991

992

993

994

995

996

997

998

999

1000

1001

1002

1003

1004

1005

1006

1007

1008

1009

1010

1011

1012

1013

1014

1015

1016

1017

1018

1019

1020

1021

1022

1023

1024

1025

1026

1027

1028

1029

1030

1031

1032

1033

1034

1035

1036

1037

1038

1039

1040

1041

1042

1043

1044

sets one containing 80 of the data ie the training set and an-other containing 20 ie the validation set In the first iterationof our models we use 80 of the data to train our models and weuse the first element of the validation set to test out models Thenthe leave-one-out validation iteratively increases the training setsize by one element (from the validation set) and predicts the nextelement of the validation set This process is repeated until thetraining set reaches the size of 119873 minus 1 (where 119873 is the size of ourdata set) Finally our obtained AUC values are the aggregation(ie median) of all the AUC values obtained in the leave-one-outiterationsFindingsTheAUC values obtained for our threemodels are shownin our online appendix8 The XGboost model performs the best inthe three studied projects obtaining AUC values that exceed 083There exists a subtle difference between the AUC values of theXGboost model and the Feed Forward Neural Networks AUCssince both the XGboost model and the Neural Networks can graspthe complex relationships in our issue reports dataSummary Our results suggest that machine learning models suchas XGboost and Feed Forward Neural Networks can predict therate of user churn with relatively high accuracies (in terms of AUC)Such predictions could help in the prioritization of issue reports

RQ3 What are the most important factorsrelated to potential user churnMotivation In RQ2 we observe that machine learning modelscan be used to predict the rate of potential usersrsquochurn Howeverdevelopers would hardly blindly trust a machine learning modelto help on their decisions (and they should not do so) Developerswould mostly benefit from understanding the reasons behind thepredictions of our machine learning models so that they couldpossibly adapt (or not) their development process Therefore inRQ3 we investigate the most important features in the predictionsof our machine learning modelApproach In this RQ we study the most important features in thepredictions of the XGboost models since they obtain the best AUCvalues in our three studied projects

To find the most important features we adopt a simple featureextraction algorithm [61 70] Consider that our models are trainedon a feature set 119883 = 1199091 1199092 119909119899 (as explained in RQ2) The featureextraction process consists of iteratively (i) removing each feature 119909119894from our models and (ii) computing the AUC without each feature119909119894 (by using the same leave-one-out validation process explainedin RQ2) The AUC values from the models without a feature 119909119894 arecompared to the models containing all the features (ie we takethe difference 119860119880119862119900119903119894119892119894119899119886119897 minus119860119880119862119903119890119898119900119907119890119889 ) The higher the drop inthe AUC value caused by removing a certain feature 119909119894 the higherthe importance of such a feature 119909119894 [4 9 10]Findings Table 7 shows our obtained results after performing thefeature extraction process The table shows the drop in the AUC foreach feature (ie ΔAUC) We also show the minimum maximummean and median values of the features in the 119897119900119908 user churn rate(ie 0) and ℎ119894119892ℎ user churn rate (ie 1) prediction classes

8httpsdoiorg105281zenodo3610584

The Issue Alive Time is the most important feature obtaining thehighest ΔAUC in our three studied projects This result suggeststhat it is unhealthy to leave issues hanging for a long time as theycan be perceived as a reason for users to churn

The Number of Comments obtains a considerable ΔAUC in ourthree studied systems The association between the Number ofComments and the rate of user churnmay occur due to themultitudenumber of users being affected by an issue

The Time-since-last-change is also an important feature in theFirefox and Eclipse projects This result suggests that issues withrecurring changes or modifications may signal a certain instabilityin the fixing process which might cause a potential user churn inthe future

It is worthy noting that features such as Severity Priority andType of Bugs do not obtain a high importance in the predictionswhich might suggest that the existing prioritization fields used inissue reports are misaligned with the potential user churnSummaryOur results suggest that long lived issues can potentiallylead to more potential userrsquos churn and that the existing processesfor prioritizing issues should be augmented to capture the highlyinteractive and long lived issues

5 THREATS TO VALIDITYConstruct Validity The main construct validity of our study isrelated to the assumption of a potential user churn The alterna-tivetonet website does not record exactly when a user has chosen asoftware alternative over another software Instead alternativetonetrecords that a user thinks that a software may be a good alternativeto another software The motivation behind the act of signaling analternative is not known by us For example there might be usersthat are simply driven by the passion of contributing to the alterna-tivetonet community which lead them to provide several opinionsas to which software could make good alternatives to others Wehave deliberately adopted the term potential user churn instead ofsimply user churn due to this unknown motivations behind signal-ing alternatives Moreover it is reasonable to think that if a givensoftware has received a considerable amount of recommendationfor alternatives this may likely indeed represent that users areconsidering to change the software

Internal Validity Our internal threats to validity are mainlyrelated to (i) our chosen features and (ii) our prediction classes Wesplit the peaks of potential user churn into the low and high classesOther research may find different results if the potential user churnis modeled in different way In addition we acknowledge that ourset of metrics is not exhaustive For example we have not studiedwhether the presence of a stack-trace in an issue report can berelated to future potential user churn We plan to (i) model thepotential user churn as a continuous variable and (ii) extend our setof features to predict the potential user churn in future research

External Validity Our study targeted three domains that areweb servers IDEs and web browsers These three domains werechosen due to their availability since we could fetch their issuereports data Although our study showed common factors in ouranalyses we cannot generalize our results to other domains orprojects with a different scale (ie smaller projects) Regardless ourwork provides insights of the possible reasons behind user churn

9

1045

1046

1047

1048

1049

1050

1051

1052

1053

1054

1055

1056

1057

1058

1059

1060

1061

1062

1063

1064

1065

1066

1067

1068

1069

1070

1071

1072

1073

1074

1075

1076

1077

1078

1079

1080

1081

1082

1083

1084

1085

1086

1087

1088

1089

1090

1091

1092

1093

1094

1095

1096

1097

1098

1099

1100

1101

1102

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

1103

1104

1105

1106

1107

1108

1109

1110

1111

1112

1113

1114

1115

1116

1117

1118

1119

1120

1121

1122

1123

1124

1125

1126

1127

1128

1129

1130

1131

1132

1133

1134

1135

1136

1137

1138

1139

1140

1141

1142

1143

1144

1145

1146

1147

1148

1149

1150

1151

1152

1153

1154

1155

1156

1157

1158

1159

1160

Table 7 The most important features for Firefox Eclipse and Apache ranked by their drop in AUC

Factor ΔAUC Min 0 Max 0 Mean 0 Median 0 Min 1 Max 1 Mean 1 Median 1

FirefoxTime Alive 013 0 120 1102 13 10 1487 4872 63OS All 007 0 1 014 0 0 1 083 1Number of Comments 006 1 65 894 12 1 458 2811 45Time Since Last Change 006 9 30 1090 22 0 5 410 3Has a Target Milestone 005 0 1 022 0 0 1 079 1Has a Version Number 004 0 1 033 0 0 1 064 1Component UI Web Payments 003 0 1 042 0 0 1 061 1

EclipseTime Alive 012 0 103 901 12 8 1334 4396 49Has a Target Milestone 006 0 1 019 0 0 1 082 1Time since last change 006 7 27 851 19 0 6 312 3Has a Version Number 005 0 1 034 0 0 1 077 1Component Debug 004 0 1 047 0 0 1 072 1Number of Comments 003 1 72 1092 11 1 632 2701 72OS All 002 0 1 050 0 0 1 068 1

ApacheTime Alive 014 0 136 1132 14 14 1521 5310 70Component Utils 010 0 1 022 0 0 1 091 1Has a Version Number 009 0 1 038 0 0 1 086 1Number of Comments 005 1 92 1232 6 1 352 4207 79OS All 004 0 1 033 0 0 1 079 1Status REOPENED 003 0 1 040 0 0 1 073 1Hardware All 002 0 1 051 0 0 1 078 1

6 RELATEDWORKGiven that we study the relationship between potential user churnand issue reports our research is related to the areas of usercustomerchurn and software quality Therefore we survey the related re-search around these two areas in this section

UserCustomerChurnUser or customer churn has beenwidelystudied in the area of telecommunication services [3 28 71] Aminet al [3] proposed a cross-company machine learning model topredict customer churn The authors also propose the use of datatransformation (eg by using the log function) to improve the train-ing data Ullah et al [71] used feature selection techniques such asInformation Gain and Correlation Attributes Ranking Filter to selectwhich features better explained the churn behaviour of differentcustomer groups The authors observed that a Random Forest modelperformed the best in their study While the Business to Customers(B2C) churn has been widely studied Figalist et al [28] recentlyinvestigated the Business to Business (B2B) churn ie when thecustomer business changes shifts to the services provided by thecompetition Figalist et al [28] allude that although B2B relation-ships tend to be more stable they have a much bigger financial im-pact when they change In terms of software engineering there hasbeen a lack of studies that have investigated usercostumer churndirectly Instead much research has been invested on customerreviews regarding software products [6 46 47] Noei et al [46]studied which features from mobile apps are associated with betterranks in the Google Play store While the work by Noei et al [46]does not study the customer churn directly striving for better ranksin Google Play may avoid user churn in mobile apps Bavota etal [6] investigated the relationship between the usage of fault- andchange-prone APIs in Android apps and user reviews Differentlyfrom all the aforementioned work our study investigates the userchurn is software products

Software Quality Although there exist a lack of studies ana-lyzing the user churn of software productsservices there has beena considerable amount of research on software quality [5 34 39 6667 74] Research on software quality is important since a quality

software is more stable and less defect-prone Ultimately such char-acteristics are tightly related to the observations of our study(iesoftware issues are correlated with user churn) A prominent areaof software quality research in software engineering is the defectprediction area [66 67] Tantithamthavorn et al [66] studied howmuch improvement can be obtained by automatically optimizingdefect prediction models Their research shows that automatic op-timization can yield significant or insignificant performance gainsdepending on the machine learning algorithm Other considerableamount of research has been invested on the triaging part of issuereports [5] and on the effort estimation of an issue report (in termsof time to address an issue) [39 74] In this paper we complementprior research by studying the relationship shared between issuereports (eg defects or required enhancements) with the potentialuser churn of users

7 CONCLUSIONIn this work we study the data available on alternativetonet tobetter understand the relationship between software issues and thepotential user churn of users Having observed that user concernsare tightly related to software issues (eg bugs) we investigatethe relationship between issue reports and the potential user churnof users Our study reveals key issues that must be addressed forthe success of a software product (depending on the domain) Forexample we observe that the potential user churn of users may betightly related to the lack of a robust documentation and support fortesting tools (in the ldquoIDErdquo and ldquoWeb Serverrdquo domains) Finally ourmachine learning models reveal that (i) the longer the issue takes tobe fixed the higher the chances of user churn and (ii) issues withinmore general software components are more likely to be associatedwith user churn Finally we suggest that the current prioritizationperformed by developers should be augmented to encompass thelong lived and highly interactive issues In overall our study sug-gests that the prioritization process of issues can be improved byconsidering the potential user churn of users associated with suchissues

10

1161

1162

1163

1164

1165

1166

1167

1168

1169

1170

1171

1172

1173

1174

1175

1176

1177

1178

1179

1180

1181

1182

1183

1184

1185

1186

1187

1188

1189

1190

1191

1192

1193

1194

1195

1196

1197

1198

1199

1200

1201

1202

1203

1204

1205

1206

1207

1208

1209

1210

1211

1212

1213

1214

1215

1216

1217

1218

On the Relationship between User Churn and Software Issues MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea

1219

1220

1221

1222

1223

1224

1225

1226

1227

1228

1229

1230

1231

1232

1233

1234

1235

1236

1237

1238

1239

1240

1241

1242

1243

1244

1245

1246

1247

1248

1249

1250

1251

1252

1253

1254

1255

1256

1257

1258

1259

1260

1261

1262

1263

1264

1265

1266

1267

1268

1269

1270

1271

1272

1273

1274

1275

1276

REFERENCES[1] Martiacuten Abadi Paul Barham Jianmin Chen Zhifeng Chen Andy Davis Jeffrey

Dean Matthieu Devin Sanjay Ghemawat Geoffrey Irving Michael Isard Man-junath Kudlur Josh Levenberg Rajat Monga Sherry Moore Derek G MurrayBenoit Steiner Paul Tucker Vijay Vasudevan Pete Warden Martin Wicke YuanYu and Xiaoqiang Zheng 2016 Tensorflow A system for large-scale machinelearning In 12th USENIX Symposium on Operating Systems Design and Imple-mentation (OSDI 16) 265ndash283

[2] W Abdelmoez Mohamed Kholief and Fayrouz M Elsalmy 2012 Bug fix-time pre-diction model using naiumlve bayes classifier In 2012 22nd International Conferenceon Computer Theory and Applications (ICCTA) IEEE 167ndash172

[3] Adnan Amin Babar Shah Asad Masood Khattak Thar Baker and Sajid Anwar2018 Just-in-time customer churn prediction with and without data transforma-tion In 2018 IEEE Congress on Evolutionary Computation (CEC) IEEE 1ndash6

[4] Hafeez Ullah Amin Aamir Saeed Malik Rana Fayyaz Ahmad Nasreen BadruddinNidal Kamel Muhammad Hussain and Weng-Tink Chooi 2015 Feature extrac-tion and classification for EEG signals using wavelet transform and machinelearning techniques Australasian physical amp engineering sciences in medicine 381 (2015) 139ndash149

[5] John Anvik Lyndon Hiew and Gail C Murphy 2006 Who should fix this bugIn Proceedings of the 28th international conference on Software engineering ACM361ndash370

[6] Gabriele Bavota Mario Linares-Vasquez Carlos Eduardo Bernal-Cardenas Mas-similiano Di Penta Rocco Oliveto and Denys Poshyvanyk 2014 The impactof api change-and fault-proneness on the user ratings of android apps IEEETransactions on Software Engineering 41 4 (2014) 384ndash407

[7] Pamela Bhattacharya and Iulian Neamtiu 2011 Bug-fix time prediction modelscan we do better In Proceedings of the 8thWorking Conference on Mining SoftwareRepositories ACM 207ndash210

[8] Steven Bird and Edward Loper 2004 NLTK the natural language toolkit InProceedings of the ACL 2004 on Interactive poster and demonstration sessionsAssociation for Computational Linguistics 31

[9] Andrew P Bradley 1997 The use of the area under the ROC curve in theevaluation of machine learning algorithms Pattern recognition 30 7 (1997)1145ndash1159

[10] Leo Breiman 2001 Random forests Machine learning 45 1 (2001) 5ndash32[11] Albert Caruana and Michael T Ewing 2010 How corporate reputation quality

and value influence online loyalty Journal of Business Research 63 9-10 (2010)1103ndash1110

[12] Tianqi Chen and Carlos Guestrin 2016 Xgboost A scalable tree boosting systemIn Proceedings of the 22nd acm sigkdd international conference on knowledgediscovery and data mining ACM 785ndash794

[13] Tianqi Chen Tong He Michael Benesty Vadim Khotilovich and Yuan Tang 2015Xgboost extreme gradient boosting R package version 04-2 (2015) 1ndash4

[14] Wenbin Chen Kun Fu Jiawei Zuo Xinwei Zheng Tinglei Huang and WenjuanRen 2017 Radar emitter classification for large data set based on weighted-xgboost IET Radar Sonar amp Navigation 11 8 (2017) 1203ndash1207

[15] Morakot Choetkiertikul Hoa Khanh Dam Truyen Tran and Aditya Ghose 2015Predicting delays in software projects using networked classification (t) In 201530th IEEEACM International Conference on Automated Software Engineering (ASE)IEEE 353ndash364

[16] Morakot Choetkiertikul Hoa Khanh Dam Truyen Tran Aditya Ghose and JohnGrundy 2017 Predicting delivery capability in iterative software developmentIEEE Transactions on Software Engineering 44 6 (2017) 551ndash573

[17] Franccedilois Chollet 2017 Keras[18] Daniel Alencar Da Costa Shane McIntosh Uiraacute Kulesza Ahmed E Hassan and

Surafel Lemma Abebe 2018 An empirical study of the integration time of fixedissues Empirical Software Engineering 23 1 (2018) 334ndash383

[19] Daniel Alencar da Costa Shane McIntosh Christoph Treude Uiraacute Kulesza andAhmed E Hassan 2018 The impact of rapid release cycles on the integrationdelay of fixed issues Empirical Software Engineering 23 2 (2018) 835ndash904

[20] Piew Datta Brij Masand Deepak R Mani and Bin Li 2000 Automated cellularmodeling and prediction on a large scale Artificial Intelligence Review 14 6 (2000)485ndash502

[21] Kushal S Dave Vishal Vaingankar Sumanth Kolar and Vasudeva Varma 2013Timespent based models for predicting user retention In Proceedings of the 22ndinternational conference on World Wide Web ACM 331ndash342

[22] William H DeLone and Ephraim R McLean 1992 Information systems successThe quest for the dependent variable Information systems research 3 1 (1992)60ndash95

[23] William H DeLone and Ephraim R McLean 2003 Model of information systemssuccess a ten years update Journal of Management (2003)

[24] Gideon Dror Dan Pelleg Oleg Rokhlenko and Idan Szpektor 2012 Churnprediction in new users of Yahoo answers In Proceedings of the 21st InternationalConference on World Wide Web ACM 829ndash834

[25] Khaled El Emam 2005 The ROI from software quality Auerbach Publications[26] Katherine Ellis Jacqueline Kerr Suneeta Godbole Gert Lanckriet David Wing

and Simon Marshall 2014 A random forest classifier for the prediction of energy

expenditure and type of physical activity from wrist and hip accelerometersPhysiological measurement 35 11 (2014) 2191

[27] David Enke and Suraphan Thawornwong 2005 The use of datamining and neuralnetworks for forecasting stock market returns Expert Systems with applications29 4 (2005) 927ndash940

[28] Iris Figalist Christoph Elsner Jan Bosch and Helena Holmstroumlm Olsson 2019Customer churn prediction in B2B contexts In International Conference on Soft-ware Business Springer 378ndash386

[29] Emanuel Giger Martin Pinzger and Harald Gall 2010 Predicting the fix timeof bugs In Proceedings of the 2nd International Workshop on RecommendationSystems for Software Engineering ACM 52ndash56

[30] Frank E Harrell Jr 2015 Regression modeling strategies with applications to linearmodels logistic and ordinal regression and survival analysis Springer

[31] Steffen Hedegaard and Jakob Grue Simonsen 2013 Extracting usability and userexperience information from online user reviews In Proceedings of the SIGCHIConference on Human Factors in Computing Systems ACM 2089ndash2098

[32] Yujuan Jiang Bram Adams and Daniel M German 2013 Will my patch make itand how fast Case study on the linux kernel In Proceedings of the 10th WorkingConference on Mining Software Repositories IEEE Press 101ndash110

[33] Guolin Ke Jia Zhang Zhenhui Xu Jiang Bian and Tie-Yan Liu 2019 TabNNa universal neural network solution for tabular data httpsopenreviewnetforumid=r1eJssCqY7

[34] Foutse Khomh Brian Chan Ying Zou and Ahmed E Hassan 2011 An entropyevaluation approach for triaging field crashes A case study of mozilla firefox InReverse Engineering (WCRE) 2011 18th Working Conference on IEEE 261ndash270

[35] Andy Liaw and Matthew Wiener 2002 Classification and regression by randomforest R news 2 3 (2002) 18ndash22

[36] Erik Linstead Paul Rigor Sushil Bajracharya Cristina Lopes and Pierre Baldi2007 Mining eclipse developer contributions via author-topic models In FourthInternational Workshop on Mining Software Repositories (MSRrsquo07 ICSE Workshops2007) IEEE 30ndash30

[37] Xi Long Wenjing Yin Le An Haiying Ni Lixian Huang Qi Luo and Yan Chen2012 Churn analysis of online social network users using data mining techniquesIn Proceedings of the international MultiConference of Engineers and ConputerScientists Vol 1

[38] James MacQueen 1967 Some methods for classification and analysis of multivari-ate observations In Proceedings of the fifth Berkeley symposium on mathematicalstatistics and probability Vol 1 Oakland CA USA 281ndash297

[39] Lionel Marks Ying Zou and Ahmed E Hassan 2011 Studying the fix-timefor bugs in large open source projects In Proceedings of the 7th InternationalConference on Predictive Models in Software Engineering ACM 11

[40] Andrew Kachites McCallum 2002 Mallet A machine learning for languagetoolkit (2002)

[41] Miloš Milošević Nenad Živić and Igor Andjelković 2017 Early churn predic-tion with personalized targeting in mobile social games Expert Systems withApplications 83 (2017) 326ndash332

[42] Audris Mockus Roy T Fielding and James Herbsleb 2000 A case study ofopen source software development the Apache server In Proceedings of the 22ndinternational conference on Software engineering Acm 263ndash272

[43] Gustavo Percio Zimmermann Montesdioca and Antocircnio Carlos Gastaud Maccedilada2015 Measuring user satisfaction with information security practices Computersamp Security 48 (2015) 267ndash280

[44] Leann Myers and Maria J Sirois 2004 Spearman correlation coefficients differ-ences between Encyclopedia of statistical sciences 12 (2004)

[45] Didrik Nielsen 2016 Tree boosting with XGBoost-why does XGBoost win EveryMachine Learning Competition Masterrsquos thesis NTNU

[46] Ehsan Noei Daniel Alencar Da Costa and Ying Zou 2018 Winning the appproduction rally In Proceedings of the 2018 26th ACM Joint Meeting on EuropeanSoftware Engineering Conference and Symposium on the Foundations of SoftwareEngineering ACM 283ndash294

[47] Ehsan Noei Feng Zhang Shaohua Wang and Ying Zou 2019 Towards pri-oritizing user-related issue reports of mobile applications Empirical SoftwareEngineering 24 4 (2019) 1964ndash1996

[48] Ehsan Noei Feng Zhang and Ying Zou 2019 Too Many User-Reviews WhatShould App Developers Look at First IEEE Transactions on Software Engineering(2019)

[49] Derek Orsquocallaghan Derek Greene Joe Carthy and Paacutedraig Cunningham 2015An analysis of the coherence of descriptors in topic modeling Expert Systemswith Applications 42 13 (2015) 5645ndash5657

[50] Ya A Pachepsky Dennis Timlin and GY Varallyay 1996 Artificial neural net-works to estimate soil water retention from easily measurable data Soil ScienceSociety of America Journal 60 3 (1996) 727ndash733

[51] Mahesh Pal 2005 Random forest classifier for remote sensing classificationInternational Journal of Remote Sensing 26 1 (2005) 217ndash222

[52] Sebastiano Panichella Andrea Di Sorbo Emitza Guzman Corrado A VisaggioGerardo Canfora and Harald C Gall 2015 How can i improve my app classifyinguser reviews for software maintenance and evolution In Software maintenanceand evolution (ICSME) 2015 IEEE international conference on IEEE 281ndash290

11

1277

1278

1279

1280

1281

1282

1283

1284

1285

1286

1287

1288

1289

1290

1291

1292

1293

1294

1295

1296

1297

1298

1299

1300

1301

1302

1303

1304

1305

1306

1307

1308

1309

1310

1311

1312

1313

1314

1315

1316

1317

1318

1319

1320

1321

1322

1323

1324

1325

1326

1327

1328

1329

1330

1331

1332

1333

1334

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

1335

1336

1337

1338

1339

1340

1341

1342

1343

1344

1345

1346

1347

1348

1349

1350

1351

1352

1353

1354

1355

1356

1357

1358

1359

1360

1361

1362

1363

1364

1365

1366

1367

1368

1369

1370

1371

1372

1373

1374

1375

1376

1377

1378

1379

1380

1381

1382

1383

1384

1385

1386

1387

1388

1389

1390

1391

1392

[53] Chitra Phadke Huseyin Uzunalioglu Veena B Mendiratta Dan Kushnir andDerek Doran 2013 Prediction of subscriber churn using social network analysisBell Labs Technical Journal 17 4 (2013) 63ndash76

[54] Peter C Rigby Daniel M German and Margaret-Anne Storey 2008 Open sourcesoftware peer review practices a case study of the apache server In Proceedingsof the 30th international conference on Software engineering ACM 541ndash550

[55] Martin P Robillard and Robert Deline 2011 A field study of API learning obstaclesEmpirical Software Engineering 16 6 (2011) 703ndash732

[56] Michael Roumlder Andreas Both and Alexander Hinneburg 2015 Exploring thespace of topic coherence measures In Proceedings of the eighth ACM internationalconference on Web search and data mining ACM 399ndash408

[57] Victor Francisco Rodriguez-Galiano Bardan Ghimire John Rogan Mario Chica-Olmo and Juan Pedro Rigol-Sanchez 2012 An assessment of the effectivenessof a random forest classifier for land-cover classification ISPRS Journal of Pho-togrammetry and Remote Sensing 67 (2012) 93ndash104

[58] Hiroaki Sakoe and Seibi Chiba 1978 Dynamic programming algorithm opti-mization for spoken word recognition IEEE transactions on acoustics speech andsignal processing 26 1 (1978) 43ndash49

[59] Stan Salvador and Philip Chan 2007 Toward accurate dynamic time warping inlinear time and space Intelligent Data Analysis 11 5 (2007) 561ndash580

[60] Peter B Seddon 1997 A respecification and extension of the DeLone and McLeanmodel of IS success Information systems research 8 3 (1997) 240ndash253

[61] Rudy Setiono Bart Baesens and Christophe Mues 2008 Recursive neural net-work rule extraction for data with mixed attributes IEEE Transactions on NeuralNetworks 19 2 (2008) 299ndash307

[62] Dong-Hee Shin 2015 Effect of the customer experience on satisfaction withsmartphones Assessing smart satisfaction index with partial least squaresTelecommunications Policy 39 8 (2015) 627ndash641

[63] Tom De Smedt and Walter Daelemans 2012 Pattern for python Journal ofMachine Learning Research 13 Jun (2012) 2063ndash2067

[64] Daniel Svozil Vladimir Kvasnicka and Jiri Pospichal 1997 Introduction to multi-layer feed-forward neural networks Chemometrics and intelligent laboratorysystems 39 1 (1997) 43ndash62

[65] Shaheen Syed and Marco Spruit 2017 Full-text or abstract Examining topiccoherence scores using latent dirichlet allocation In 2017 IEEE Internationalconference on data science and advanced analytics (DSAA) IEEE 165ndash174

[66] Chakkrit Tantithamthavorn Shane McIntosh Ahmed E Hassan and KenichiMatsumoto 2016 Automated parameter optimization of classification techniquesfor defect prediction models In 2016 IEEEACM 38th International Conference onSoftware Engineering (ICSE) IEEE 321ndash332

[67] Chakkrit Tantithamthavorn Shane McIntosh Ahmed E Hassan and KenichiMatsumoto 2018 The impact of automated parameter optimization on defectprediction models IEEE Transactions on Software Engineering (2018)

[68] Romain Tavenard 2017 tslearn A machine learning toolkit dedicated to time-series data httpsgithubcomrtavenartslearn

[69] L Torlay Marcela Perrone-Bertolotti Elizabeth Thomas and Monica Baciu 2017Machine learningndashXGBoost analysis of language networks to classify patientswith epilepsy Brain informatics 4 3 (2017) 159

[70] Geoffrey G Towell and Jude W Shavlik 1993 Extracting refined rules fromknowledge-based neural networks Machine learning 13 1 (1993) 71ndash101

[71] Irfan Ullah Basit Raza Ahmad Kamran Malik Muhammad Imran Saif Ul Islamand SungWon Kim 2019 A churn prediction model using random forest analysisof machine learning techniques for churn prediction and factor identification intelecom sector IEEE Access 7 (2019) 60134ndash60149

[72] Lorenzo Villarroel Gabriele Bavota Barbara Russo Rocco Oliveto and Massimil-iano Di Penta 2016 Release planning of mobile apps based on user reviews InProceedings of the 38th International Conference on Software Engineering ACM14ndash24

[73] Chih-Ping Wei and I-Tang Chiu 2002 Turning telecommunications call detailsto churn prediction a data mining approach Expert systems with applications 232 (2002) 103ndash112

[74] Cathrin Weiss Rahul Premraj Thomas Zimmermann and Andreas Zeller 2007How long will it take to fix this bug In Fourth International Workshop on MiningSoftware Repositories (MSRrsquo07 ICSE Workshops 2007) IEEE 1ndash1

[75] Ian H Witten Eibe Frank and A Mark 2016 Hall and Christopher J Pal DataMining Practical machine learning tools and techniques (2016)

[76] Barbara H Wixom and Peter A Todd 2005 A theoretical integration of usersatisfaction and technology acceptance Information systems research 16 1 (2005)85ndash102

[77] Jae-Chern Yoo and Tae Hee Han 2009 Fast normalized cross-correlation Circuitssystems and signal processing 28 6 (2009) 819

12

  • Abstract
  • 1 Introduction
  • 2 Motivating Example
  • 3 Data Collection
  • 4 Research Questions amp Results
  • 5 Threats To Validity
  • 6 Related Work
  • 7 Conclusion
  • References
Page 2: On the Relationship between User Churn and Software Issues · relationship between the issues that are present in a software prod-uct and user churn. For this purpose, we investigate

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

139

140

141

142

143

144

145

146

147

148

149

150

151

152

153

154

155

156

157

158

159

160

161

162

163

164

165

166

167

168

169

170

171

172

173

174

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

175

176

177

178

179

180

181

182

183

184

185

186

187

188

189

190

191

192

193

194

195

196

197

198

199

200

201

202

203

204

205

206

207

208

209

210

211

212

213

214

215

216

217

218

219

220

221

222

223

224

225

226

227

228

229

230

231

232

source and corporate software organizations to retain their usersand improve their probability of success (not to mention the overallsatisfaction of users)

In this paper we use data obtained from the alternativetonet1website which has a unique feature that allows users to recommendalternatives for a specific software product The recommendationof alternatives can signal the intention to switch from one softwareproduct to another (we provide an example of this scenario in Sec-tion 2) For the sake of simplicity we refer to the recommendationfor an alternative software product as simply potential user churn

By using the alternativetonet dataset we formulate an empiricalstudy to investigate the (i) Web Browsers (ii) IDEs and (iii) WebServers domains on the alternativetonet website We first extract3556 reviews and 10081 comments to better understand the overallconcerns of users regarding the software products in the studieddomains Next we extract 12549 recommendations for alterna-tives by users (ie potential user churn) from the Firefox Eclipseand Apache projects We specifically choose these projects for thedeeper analyses because these projects are established in the WebBrowser IDE andWeb Server domains respectively

To investigate the underlying reasons for the potential user churnof software products we pose four research questions that guidesour study

bull RQ0 What are the most significant usersrsquo concerns for BrowsersIDEs and Web Servers Understanding usersrsquo concerns will pavethe way to grasp the reasons behind potential user churn Whendiscussing the pros and cons of a software product and competi-tors users voice varied concerns (eg security amp privacy issues ormemory issues)bull RQ1 Can we find a correlation between potential user churn andissue reports Given that the issues that are present in the studiedprojects are the predominant topic in usersrsquo concerns we studywhether there exists a correlation between the issue reports of agiven software project (ie Firefox Eclipse and Apache) and thepotential user churnbull RQ2 Can we use machine learning models to predict the rate ofpotential user churn After correlating issue reports with potentialuser churn we study whether machine learning can be used topredict the rate of potential user churn (expressed by high or lowuser churn) To build our machine learning models we use latentfeatures from the issue reports of our studied projectbull RQ3 What are the most important factors related to potential userchurn given that machine learning models can be used to predictthe rate of potential user churn we can then extract the mostimportant features that are associated to the potential user churnThe reoccurring bugs the time alive factor and the issues acrossmultiple platforms are common factors that are highly associatedwith the potential user churn

The paper is organized as the following we showcase a motivat-ing example in Section 2 We describe our data collection processin Section 3 We delve into our approach and show our obtainedresults in Section 4 We discuss the threats to validity in Section 5In Section 6 we present the related work Finally we conclude thestudy in Section 7

1httpsalternativetonet

Figure 1 Example of a pdf reader on the alternativetonetwebsite

2 MOTIVATING EXAMPLEThe goal of the alternativetonet website is to help users to findsoftware alternatives that can better address the usersrsquo necessitiesFor instance let us consider that a user needs a better pdf reader(eg the current reader freezes occasionally) The first challengeoccurs because the user is not aware of all the available softwarealternatives In addition choosing an alternative for a softwareis not always simple since users may be already familiar with aset of features (from the software in use) which they would notlike to compromise Considering the pdf reader example whilethe user wishes a freezing-free alternative the user may only feelcomfortable to change if the alternative provides the same level ofcommenting capabilities (as compared to the reader in use)

With such challenges in mind the alternativetonet website wasdesigned to help users choose the best software alternative for theirneeds Figure 1 shows an example of an initial page of a softwareproduct on alternativetonet This initial page provides a descriptionof the software features categories and tags The features cate-gories and tags are used to organize the software alternatives onalternativetonet and can be used to search for software alternativesInteresting to note is the ldquoAlternativesrdquo tab depicted in Figure 1which shows all the available alternatives for a software product(as deemed by the community)

Alternativetonet allows users to provide reviews for softwarealternatives along with ratings (similarly to Google Play whichallows reviews to be provided for mobile apps) However what setsalternativetonet apart from other platforms is that it allows users tovoice their opinions by placing a software product in perspective toits competitors For example Figure 2 shows the opinions voiced byusers as to whether (and why) Chrome is a good (or bad) alternativeto Firefox2

As shown in Figure 2 users do not only comment on the prod-uct being evaluated but also put the product in perspective to its

2httpsalternativetonetsoftwarefirefox

2

233

234

235

236

237

238

239

240

241

242

243

244

245

246

247

248

249

250

251

252

253

254

255

256

257

258

259

260

261

262

263

264

265

266

267

268

269

270

271

272

273

274

275

276

277

278

279

280

281

282

283

284

285

286

287

288

289

290

On the Relationship between User Churn and Software Issues MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea

291

292

293

294

295

296

297

298

299

300

301

302

303

304

305

306

307

308

309

310

311

312

313

314

315

316

317

318

319

320

321

322

323

324

325

326

327

328

329

330

331

332

333

334

335

336

337

338

339

340

341

342

343

344

345

346

347

348

Figure 2 httpsalternativetonet provides in-perspectivecomments where users specify why a certain software alter-native (eg Firefox) can be better than the current software(eg Chrome)

Figure 3 Recording the recommendation for an alternativefor Firefox

competitors For example a user mentions ldquo[Chrome] does not haveas many languages and spell checking support as Firefox does3rdquo

Because the opinions are voiced with respect to competitors (iethe in-perspective comments) alternativetonet provides a uniqueopportunity for empirical studies Particularly in our study thein-perspective comments allow us to investigate the most recurrentconcerns within the competition scene of a software domain (weperform this investigation in RQ0)

Another interesting and unique characteristic in the aterna-tivetonet data is the register of users that recommended alternativesfor a given software product (ie the potential user churn) Forexample Figure 3 shows how alternativetonet records when a userindicates that Chrome can be a good alternative to the Firefox webbrowser (along with the reason why) This data is important be-cause it can be used to proactively identify potential user churn insoftware products (eg by using machine learning) Based on thisinformation we build machine learning models and report on theirresults in RQ2 and RQ3

Finally the potential user churn provided by alternativetonetallow us to investigate whether it is related to prominent issuesthat may be present in a software product For example while auser has expressed the reason for a possible user churn in Figure 3(ie Jupyter is slow on Firefox) the development team of Firefoxis also working on an issue report that expresses the same issue(see Figure 4) In this paper we hypothesize that there may existstrong links between potential user churn and the issue reports thatare submitted to software products (we perform this investigation inRQ1)

3To provide an opinion as to whether a software product is a good or a bad alternativeto another product users must prove they are not robots Only afterwards a text boxcontaining the submission form will be provided

Figure 4 Issue report expressing the problem voiced by theuser in Figure 3

We capitalize on these unique characteristics of the alterna-tivetonet data to perform our empirical study on the relationshipbetween potential user churn and software issues

3 DATA COLLECTIONAlternativetonet organizes the software alternatives through severalmeans such as categories features or tags Categories are the broad-est way to categorize software alternatives and alternativetonet hasover 25 categories of software alternatives at the time of this studyTags and features are more specific ways to categorize softwarealternatives (eg the ldquoIDErdquo tag or the ldquoresponsive designrdquo feature)Another characteristic of tags and features is that they can be addedby users However these new tags and features must be verified bythe alternativetonet community before being added to the websiteFor the purposes of our study we use tags to filter our data forsoftware alternatives because it is more precise than using cate-gories For example we can easily filter for the software alternativeswithin the Web Browser domain using the ldquoweb browserrdquo tag Onthe other hand it would be hard to filter our data based on thefeatures For example filtering by ldquoresponsive designrdquo may retrievesoftware from very distinct domains

For the sake of data robustness it is tempting to analyze allthe software alternatives from all the software domains that areavailable on the alternativetonetwebsite However since one of ourmain objectives is to study the relationship between potential userchurn and the issues that are present in the software products weneed to restrict our analyses to the domains of certain software Forinstance it would be virtually impossible to collect and analyze allthe issue reports from every software alternative listed on the alter-nativetonet website Every different software alternative may usea different Issue Tracking System which would require differentways for collecting data Therefore we analyze the software alter-natives within the domains of three well established open sourceprojects Eclipse4 Firefox5 and the Apache server6 Our choice forthese three projects was based on the extensive prior empiricalsoftware engineering research that has been conducted using theseprojects over the years [2 7 18 19 29 36 42 54]

To collect the data for our study we followed the process de-picted in Figure 5 Given our chosen studied projects we collectall the software alternatives tagged with the ldquoBrowserrdquo ldquoIDErdquo andldquoWeb Serverrdquo tags which are the respective domains of the EclipseFirefox and Apache projects In total we collect 290 browser alter-natives 296 IDE alternatives and 134 web-server alternatives Oncethe software alternatives are fetched we collect the in-perspective4httpswwweclipseorg5httpswwwmozillaorgen-USfirefoxnew6httpshttpdapacheorg

3

349

350

351

352

353

354

355

356

357

358

359

360

361

362

363

364

365

366

367

368

369

370

371

372

373

374

375

376

377

378

379

380

381

382

383

384

385

386

387

388

389

390

391

392

393

394

395

396

397

398

399

400

401

402

403

404

405

406

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

407

408

409

410

411

412

413

414

415

416

417

418

419

420

421

422

423

424

425

426

427

428

429

430

431

432

433

434

435

436

437

438

439

440

441

442

443

444

445

446

447

448

449

450

451

452

453

454

455

456

457

458

459

460

461

462

463

464

Table 1 A summary of our collected data

Domain Alternatives Reviews Comments Potential churn Issue reports

IDEs 296 1088 3036 4319 (Eclipse) 22758 (Eclipse)

Browsers 290 1754 5112 6421 (Firefox) 40749 (Firefox)

Web Servers 124 714 1933 1809 (Apache) 20965 (Apache)

comments (explained in Section 2) and the reviews for all the soft-ware alternatives We also collect the potential user churn (explainedin Section 2) for the Eclipse Firefox and Apache projects Finally inaddition to the data collected from alternativetonet we also collectthe issue reports from the respective Issue Tracking Systems of ourstudied projects We use the potential user churn data combinedwith the issue reports to perform our investigations in RQ1-RQ3Table 1 summarizes the data collected for our study (ie the datatype and the amount of collected data)

4 RESEARCH QUESTIONS amp RESULTSIn this section we present our research questions (RQs) along withtheir results For each RQ we present its motivation and explainthe approach that is used to conduct the RQ

RQ0 What are the most significant usersrsquoconcerns for Browsers IDEs and Web ServersMotivation The motivation behind RQ0 is to understand the mainconcerns within the competition scene of a specific software do-main This investigation is important to highlight what are themain strengths and weaknesses within a particular domain (egBrowsers) This knowledge can be useful because for example anorganization can study the main weaknesses within a domain andbetter assess whether its productsservices fall within the sameflawsApproach To study the most significant concerns related toBrowsers IDEs and Web Servers we use the in-perspect commentsand reviews data that were explained in Section 3 To analyze suchdata we adopt the Latent Dirichlet Allocation (LDA) to find themain topics that are present in the in-perspective comments andreviews data We use the LDA implementation provided by theMALLET framework [40] LDA is a probabilistic generative methodthat assumes a Dirichlet distribution of latent topics LDA is partic-ular useful for us since we are interested in finding the main topicsthat are present in the in-perspective comments and reviews dataHowever The LDA model requires the data to be pre-processedin a certain format for optimal results Therefore we follow thestandard text pre-processing steps [48]

Fixing Typographical Errors Typographical errors can bedue to the usage of internet slang in a written text or simply due tocommon language mistakes The most recurrent of errors that wefind in our data is the repetition of characters For example userstend to use words such as ldquobeeeestrdquo to emphasize their feelings Weuse the Pattern For Python [63] package to fix such typographicalerrors

Removing Stop Words Examples of stop words are of byand the These are words that are recurrently used in written textbut do not possess meaning by themselves We use the package

nltk [8] to remove such stop words This package contains a well-known corpus of stop words and is widely used for text filtering

Stemming andLemmatization For better results words shouldbe repeated as frequently as they can since LDA relies heavily onthe occurrences of words in different sentences Hence the usageof stemming and lemmatization is essential These methods trans-form inflectional forms of a word to its basic form For examplethe words ldquofixedrdquo ldquofixingrdquo and ldquofixesrdquo are transformed to the basicform ldquofixrdquo Stemming works by removing the suffix of the word(eg the word ldquocatsrdquo becomes ldquocatrdquo by removing the ldquosrdquo) whilelemmatization retrieves the basic form of a word from a dictionaryknown as lemma (by using morphological analysis) We use bothstemming and lemmatization to ensure that we obtain the mostunique words possible

Forming Bigrams and Trigrams Forming bigrams and tri-grams is essential to group the words which frequently appeartogether Some words do not inflict any significant meaning for theLDA model on their own However when grouped together suchwords may indicate a specific topic One example of these wordsis ldquocustomer supportrdquo which has a precise meaning (as opposed toonly ldquocustomerrdquo or ldquosupportrdquo which can hold diverse meanings)

To choose the optimal number of topics in our LDAmodel we usethe coherence scoremetric [56] This metric was found to be the bestin alliance with human perception [49 65] However even whenrelying on the coherence score (which naturally limits redundancyin the produced topics) there is still the possibility of duplicatetopics being generated Therefore we perform a manual analysisof the produced topics to eliminate duplicates Our LDA modelproduces topics in the form of keywords and their percentage-of-significance with respect to the topic For example (0097 lowastldquo119904119900119906119903119888119890rdquo + 0085 lowast ldquo119901119903119894119907119886119888119910rdquo + 0053 lowast ldquo119905119903119886119888119896rdquo + 0050 lowast ldquo119901119903119900 119891 119894119905rdquo +0038 lowast ldquo119902119906119886119899119905119906119898rdquo + 0038 lowast ldquo119888119900119897119897119890119888119905rdquo + 0035 lowast ldquo119901119903119900119905119890119888119905rdquo + 0032 lowastldquo119886119903119888ℎ119894119907119890rdquo + 0029 lowast ldquo119898119900119899119890119910rdquo + 0026 lowast ldquo119901119890119903 119891 119900119903119898119886119899119888119890rdquo) are a set ofkeywords that indicate Privacy as high level topic But some otherdistribution of keywords could refer to the same topic Two of theauthors conduct together the manual analyses of the topics Thefinal set of topics is only produced when full consensus is reachedbetween authors (ie the authors went over and discussed all thetopics) Because both authors have analyzed together all topics andagreements were reached on all the topics in the end we did notcompute inter-rater agreement

Once the topics are defined we extract 100 examples (aiming fora 95 confidence level with a 10 margin error) from our data foreach studied domain (ie IDE Browser and Web Server) totalling300 examples For each example two of the authors assign theirtopic Again two authors analyze and discuss each example sothat consensus is reached regarding the assigned topic Afterwardswe compute the topic distribution for each domain which canrepresent the most recurrent concerns for that domainFindings Tables 2 and 3 show the topics obtained by our LDAmodelWe also show some of the examples for each topic in the table(all the examples and analyses are available in our replication pack-age which is hosted on httpsdoiorg105281zenodo3610584

Our obtained topics can be categorized into (i) general topicswhich are present in the three domains and (ii) domain-specifictopics which might still be present in other domains but are more

4

465

466

467

468

469

470

471

472

473

474

475

476

477

478

479

480

481

482

483

484

485

486

487

488

489

490

491

492

493

494

495

496

497

498

499

500

501

502

503

504

505

506

507

508

509

510

511

512

513

514

515

516

517

518

519

520

521

522

On the Relationship between User Churn and Software Issues MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea

523

524

525

526

527

528

529

530

531

532

533

534

535

536

537

538

539

540

541

542

543

544

545

546

547

548

549

550

551

552

553

554

555

556

557

558

559

560

561

562

563

564

565

566

567

568

569

570

571

572

573

574

575

576

577

578

579

580

Alternativetonet

Issue tracking system

Fetch data from

studied domains

IDE Browser

and Web Server

alternatives

Fetch in-

perspective

comment and

reviews

In-perspective

comments

Software

reviews

RQ0

Competition

scene of a

domain

Fetch issues

from Firefox

Eclipse and

Apache

Fetch potential

churn from

Eclipse Firefox

and Apache

Potential churns

Issue reports

Issues amp

Potential churn

pairs

RQ1 Correlating

issues with

potential churns

RQ2 Machine

learning to

predict potential

churns

RQ3 Important

features

associated with

potential churns

Figure 5 An overview of our data collection process

dominant in a specific domain Table 4 shows the distribution oftopics per domain based on our manual analyses We observe thatrelease and updates memory issues bugscrashes and cross-platformavailability can be considered as general topics (ie they are im-portant in all the three analyzed domains) Indeed these topics areintuitively present when handling any kind of software

On the other hand Documentation and testing were presentfor web servers and IDEs signalling that a robust documentation(which is often overlooked [55]) and with good support for test-ing tools may definitely attractretain more users when competingin the IDE and Web Sever Market For example an organizationmay direct their attention to these aspects when studying theircompetition

The bookmarks privacy and security were exclusively present inthe Browsers domain signalling that these are essential features toretain or attract users Simplistic design debug options code refac-toring and auto-completion are topics that target IDEs exclusively

Finally complaints about release and updates memory issues andbugscrashes can be considered as the main topics that drive usersto churn These topics were present in the three domains with thehighest percentages among each of the domains The percentagescombined exceed 50 in each of the domains (see Table 4)Summary Our observations suggest that the usersrsquo concerns aretightly related to issues (eg bugs or crashes) that are present inthe respective software products

RQ1 Can we find a correlation betweenpotential user churn and issue reportsMotivation In RQ0 we observe that software issues (eg bugsamp crashes and memory issues) are common concerns that maymotivate users to churn With such an observation we hypothesizein RQ1 that the potential usersrsquochurn observed on alternativetonetmay be correlated with the issue reports which developers addressin their respective projects This analysis is important because ifsuch a correlation exists the potential user churn information canbe possibly used to prioritize issue reports (for example the moreassociated with potential user churn the more important the issuereport)

Approach In our work we approach the recommendations foralternatives (ie potential usersrsquochurn) and the issue reports asevents occurring over time and interpret them as two time-seriesSuch an interpretation is handy for two reasons (i) we can studythe correlation between two trends (potential usersrsquochurn and issuereports) and (ii) we can study whether the specific peaks in issuereports can be associated with peaks in potential usersrsquochurn

In our study we group the two time series in a weekly basisie all the potential user churn that occurred within a week aregrouped together We use the weekly grouping because groupingissue reports in a daily-basis is too narrow to form a pattern whilegrouping them in a monthly-basis can over-generalize the patternsin issues The time-series of potential usersrsquochurn and the time-series of issues showcase the events (grouped in a weekly-basis)from March 2015 (the earliest date on alternativetonet) to July 2019forming 200 weeks grouping

The first step to correlated our two time-series is to verifywhetherthere exists a global correlation between them (ie whether thereexist common patterns among the increasing and decreasing trendsover time) To do so we use the cross-correlation which is a metricused in signal processing [77] This correlation calculates the simi-larity between two time-series as a function of the displacementfrom one time-series to another For example given two functions119891 and 119892 the cross-correlation calculates the degree to which a shiftin 119892 (along the x-axis) is identical to the shift in 119891

However since we are also interested in analyzing the relation-ship between the peaks of our two time-series we use the DynamicTimeWarping (DTW) [58] which was designed for comparing time-series varying at different speeds A peak in potential usersrsquochurnmay be followed by a peak in issue reports (or vice-versa) ie thepeaks may not necessarily occur at the same time Differently fromthe Eucledian distance which assumes that a point 119894 in a time-seriesmust be aligned to the 119894119905ℎ point in another time-series [38 75] DTWallows the variance in the offset between two time-series [59]

To employ DTW we use the tslearn [68] which is a machinelearning toolkit specialized for time series analysis The DTW com-putation produces a matrix that shows the distance between everytwo points in the graph of the two time-series The algorithm isdesigned in a way to choose the closest point that respects the

5

581

582

583

584

585

586

587

588

589

590

591

592

593

594

595

596

597

598

599

600

601

602

603

604

605

606

607

608

609

610

611

612

613

614

615

616

617

618

619

620

621

622

623

624

625

626

627

628

629

630

631

632

633

634

635

636

637

638

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

639

640

641

642

643

644

645

646

647

648

649

650

651

652

653

654

655

656

657

658

659

660

661

662

663

664

665

666

667

668

669

670

671

672

673

674

675

676

677

678

679

680

681

682

683

684

685

686

687

688

689

690

691

692

693

694

695

696

Table 2 The predominant topics in alternativetonet

Release and updatesKeywords ldquo119907119890119903119904119894119900119899rdquo ldquo119897119900119899119892119890119903 rdquo ldquo119903119890119897119890119886119904119890rdquo ldquo119889119900119908119899119897119900119886119889rdquo ldquo119900 119891 119891 119894119888119894119886119897rdquoExamples - The program seems to be no longer updated Last versions (2018) can be still downloaded from the official website

- Firefox version 60+ (Quantum) is better than Chrome Compare to previous versions of Firefox

Bookmarks FunctionalityKeywords ldquo119887119900119900119896119898119886119903119896rdquo ldquo119901119886119899119890119897rdquo ldquo119891 119886119907119900119903119894119905119890rdquo ldquo119904119906119901119901119900119903119905rdquo ldquo119904119910119899119888rdquoExamples - Firefox now allows bookmarks importing from Chrome which is a huge plus

- With the recent upgrade to 20 (including the bookmarks syncing feature that I liked in Firefox) Vivaldi just got better

Flash supportKeywords ldquo119901119897119906119892119894119899rdquo ldquo119907119894119889119890119900rdquo ldquo119894119899119904119905119886119897119897rdquo ldquo119891 119897119886119904ℎrdquo ldquo119898119890119889119894119886 minus 119901119897119886119910119890119903 rdquoExamples - Since its the only way to watch Flash video on an iPhone without jailbreaking Skyfire is a killer app thats worth getting for anyone Playback of Flash video over WiFi is

flawless and the quality is at very least watchable

Memory issuesKeywords ldquo119898119890119898119900119903119910rdquo ldquo119877119860119872rdquo ldquo119897119886119892rdquo ldquo119897119894119898119894119905rdquo ldquoℎ119890119886119907119910rdquoExamples - Great C++ IDE which balances memory usage and indexing solution Contains some basic refactoring functions Good for large projects and limited RAM with remaining

some capabilities of more complex C++ IDEs- Eclipse is really lagging after the last update The memory is filled whenever I run a Java swing applet- Chrome is so heavy it lags alot when I open multiple tabs

Privacy and securityKeywords ldquo119901119903119894119907119886119888119910rdquo ldquo119904119890119888119906119903119894119905119910rdquo ldquo119894119898119901119900119903119905119886119899119905rdquo ldquo119886119897119897119900119908rdquo ldquo119888119900119899119905119903119900119897rdquoExamples - Firefox allows far more control over your privacy than does Chrome (or even Chromium on which Chrome is based)

- Google Chrome is a great browser and a lot better than Vivaldi Browser Chrome keeps updating with cool stuff like the new cast feature better extensions and lot betterprivacy and security Vivaldi has more customizing features but has a long way to go to compete with Chrome

BugsCrashesKeywords ldquo119888119903119886119904ℎrdquo ldquo119887119906119892rdquo ldquo119891 119894119909rdquo ldquo119890119909119901119890119903119894119890119899119888119890rdquo ldquo119894119904119904119906119890rdquoExamples - Lots of users experience Adobe Flash Player hangs or crashes I dont know why it isnt fixed yet

If Chrome ever got its proxy support bugs fixed it would leap to the top of the list and become pretty much the only web browser Id ever use- I worked with NetBeans for a couple of years and always loved it but for professional users the yearly fee is well spent Nicer interface better Code Intelligence moreoptions to fine tune what happens on saving a file (eg saving or uploading to various locations) You get a couple of feature update each year plus bugfixes whereasNetBeans is sometimes very slow to update things (support for new version of a language) or to fix known bugs

Table 3 The predominant topics in alternativetonet (continued)

Debug optionsKeywords ldquo119891 119890119886119905119906119903119890rdquo ldquo119889119890119907119890119897119900119901119898119890119899119905rdquo ldquo119889119890119887119906119892rdquo ldquo119894119899119905119890119892119903119886119905119894119900119899rdquo ldquo119901119890119903 119891 119890119888119905rdquoExamples - Not enough debug and run options just a text editor Atom has not got any solid debugging features (and packages) out of the box or at all

- This is my favorite code editor It is open source and portable you can get a portable version that will fit onto a flash drive if youwant It is just as configurableas Sublime Text and has a large community producing great plugins it has GIT support built in and excellent support for Javascript HTML CSS and Pythonand others of course It has an integrated debugging system

Simplistic designKeywords ldquo119904119894119898119901119897119890rdquo ldquo119890119889119894119905119900119903 rdquo ldquo119897119894119892ℎ119905119908119890119894119892ℎ119905rdquo ldquo119889119890119904119894119892119899rdquo ldquo119899119900119905119890119901119886119889rdquoExamples - Really lightweight with some advanced not complete but easy to use auto complete features Simple design and customizable

Cross-platformKeywords ldquo119888119903119900119904119904 minus 119901119897119886119905 119891 119900119903119898rdquo ldquo119871119894119899119906119909rdquo ldquo119900119901119890119899 minus 119904119900119906119903119888119890rdquo ldquo119886119907119886119894119897119886119887119897119890rdquo ldquo119894119899119904119905119886119897119897rdquoExamples - Its cross platform and open source and bring the benefits of the JetBrains IDE platform which is well polished and powerful

- One of the best open source browsers that work on Linux and Windows

Code refactoring and auto-completionKeywords ldquo119888119900119889119890rdquo ldquordquo119886119906119905119900 minus 119888119900119898119901119897119890119905119894119900119899rdquo ldquo119903119890 119891 119886119888119905119900119903 rdquo ldquo119900119901119905119894119900119899rdquo ldquo119906119904119890rdquoExamples - For its young age it is well matured and a very powerful IDE with great CMake Support and other goodies like great refactoring tools and a sophisticated

auto-completion system

TestingKeywords ldquo119905119890119904119905rdquo ldquo119904119890119903119907119890119903 rdquo ldquo119889119890119901119897119900119910rdquo ldquo119908119890119887119904119894119905119890rdquo ldquo119904119890119905rdquoExamples - XAMPP is great tool to develop and test your website (particularly if it uses php and mysql databases) offline before putting it on a live server You could

also use XAMPP as an easy method of setting up a live server- Eclipse is far more superior than netbeans in unit testing functionality

DocumentationKeywords ldquo119889119900119888119906119898119890119899119905119886119905119894119900119899rdquo ldquo119904119890119903119907119890119903 rdquo ldquo119901119900119903119905rdquo ldquo119889119886119905119886119887119886119904119890rdquo ldquo119891 119900119903119906119898rdquoExamples - Easiest way to run a web server on Windows The performance is surprisingly good and lots of documentation on how to manage it

- Support is via forum only Documentation can be very poor particularly around product upgrades (one expects more from commercial packages) Due totheir poor documentation I lost several years of development databases when porting from an old to a new computer (they dont accurately document thelocation of the database files)- Eclipse lacks in adding plugins documentation

time constraints for continuity and monotonicity Finally two co-authors performed a manual analysis of the observed peaks in the

two time-series7 The goal of the manual analyses is to find whetherthe identified peaks are indeed likely related (eg the issue reports

7We share the data of our manual analyses at httpsdoiorg105281zenodo3610584

6

697

698

699

700

701

702

703

704

705

706

707

708

709

710

711

712

713

714

715

716

717

718

719

720

721

722

723

724

725

726

727

728

729

730

731

732

733

734

735

736

737

738

739

740

741

742

743

744

745

746

747

748

749

750

751

752

753

754

On the Relationship between User Churn and Software Issues MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea

755

756

757

758

759

760

761

762

763

764

765

766

767

768

769

770

771

772

773

774

775

776

777

778

779

780

781

782

783

784

785

786

787

788

789

790

791

792

793

794

795

796

797

798

799

800

801

802

803

804

805

806

807

808

809

810

811

812

Table 4 The extracted topics percentages distribution overthe three domains

Topics Browsers() IDEs() Web servers()Release and updates 18 17 21Bookmarks functionality 12 0 0Memory issues 23 22 11Privacy and security 8 0 0BugsCrashes 24 19 20Debug options 0 10 0Simplistic design 0 8 2Cross-platform 8 6 5Code refactoring and auto-completion 0 10 0Testing 0 2 17Documentation 0 2 15Miscellaneous 7 4 9

within a peak actually describe problems mentioned in the poten-tial usersrsquochurn) Again both co-authors analyze and discuss everyanalyzed sample

In this RQ we perform our analyses only in our studied projects(as opposed to analyzing all the software alternatives in their do-mains) As discussed in Section 3 it would be impracticable toextract the data from all the different Issue Tracking Systems of thesoftware alternatives in the studied domainsFindingsWe respectively observe a cross-correlation of 076 072and 070 in Firefox Eclipse and Apache These cross-correlationscores indicate that there exist a global commonality between thetrends of the two time-series regardless of the timing factor

Regarding our DTW analysis we observe that the time fromthe filing of an issue report to the time of a spark in potentialusersrsquochurn span from 2 to 4 weeks (on the median for the threeprojects) Additionally our DTW correlation reveals a bidirectionalrelationship between the peaks in issue reports and the peaks inpotential usersrsquochurn For the cases where a peak in issue reportsleads to a peak in potential usersrsquochurn it is intuitive to think thatthe peak in potential usersrsquochurn is due to the frustration generatedby the issues described in the reports On the other hand the caseswhere a peak in potential usersrsquochurn is followed by a peak inissue reports may occur due to users being frustrated over non-obvious issues For example if we consider the potential user churndepicted in Figure 3 (ie ldquoJupyter is slow on Firefoxrdquo) the users mightnot find it obvious that such a problem should be reported to theFirefox team (eg users may simply interpret it as a characteristicof Firefox) Therefore in such cases issue reports would be filledonly a while after the potential usersrsquochurn have been expressed

We are specifically interested in studying the relationship be-tween reported issues (ie the documented issues in the trackersystems) and the potential user churn Our rationale is that under-standing the characteristics of issues that are associated with userchurn can help developers to prioritize such issues and avoid futurechurn (thereby retaining more users)

Finally Table 5 shows a subset of the examples found in ourmanual analyses We can indeed observe the apparent relationshipbetween the comments within the potential usersrsquochurn and thedescriptions (or titles) of issue reports The examples are followinga unidirectional link in which the issue report that occurs at acertain time would induce the potential user churn at a later timeSummary Our results suggest that there exists a significant cor-relation between software issues and potential user churn

RQ2 Can we use machine learning models topredict the rate of potential user churnMotivation In the practice most of the issue reports undergo hu-man estimations of the priority and the severity [5] usually basedon internal business-oriented factors (such as ROI [25]) Issuesmight be prioritized by their recency the affected platforms theversions of the software where the issue is present Feeding a ma-chine learning model to identify the potential user churn that arecaused by issues may help automate the issue report prioritizationprocess and ultimately save huge costs in software developmentApproach We train three different machine learning models topredict low and high user churn rates by using information from is-sue reports The first model is a Feed-Forward Neural Network withtwo hidden layers developed by using the framework Keras [17]Keras is widely used to train simple neural networks as it hasa higher abstraction than TensorFlow [1] Feed-Forward NeuralNetworks [64] are the most suitable type of neural networks tobe trained on tabular data [27 33 50] This network evaluates allthe possible combinations of different metrics values to guide thepredictions

Our second model is the XGboost model which is a gradientboosting tree [12 13] XGboost is widely used for binary classifica-tion problems Research has shown that XGboost performs betterthan neural networks and classic classification machine learningalgorithms in problems dealing with tabular data [14 45 69]

We also train Random Forest models [35] Random Forest isa classical machine learning algorithm which is widely used inproblems dealing with tabular data [26 51 57] In addition Ran-dom Forest models have been widely used in software engineeringresearch such as in defect prediction [67]

The goal of ourmodels is to predict the rate of potential usersrsquochurngiven the characteristics of issue reports The data used to fit ourmodels is the data obtained from our DTW associations Simply putthe DTW provides us with 119875 pairs of issue reports 119868 and potentialusersrsquochurn 119862 (ie 119875 lt 119868 119862 gt) Therefore we study the character-istics of the issues 119868 to predict the amount of potential user churn119862 We split the distribution of 119862 into two percentiles (ie low andhigh) The low percentile being from 0 to 50 and the high from51 to 100 Thus our models output dichotomous predictions iewhether the potential usersrsquochurn is low or high

The characteristics of issue reports that we include in our models(henceforth referred to as features) are described in Table 6 Thesefeatures are collected from the Issue Tracking systems of our threestudied projects We study features such as Component Hardwareand OS to account for the locality and the spread of the issuesFor example these features can indicate whether platform-specificissues or more general issues have more potential user churn

Other features such as the Severity and Priority showcase theprioritization used by developers The Severity and Priority areimportant for us to verify whether the prioritization performed bydevelopers is aligned with the rate of potential user churn

Features related to the type of the bugs (ie whether it is acrash) whether a version number is present or the target milestoneprovide us information regarding the tracking process of issuesThey are important to understand whether better-tracked issues

7

813

814

815

816

817

818

819

820

821

822

823

824

825

826

827

828

829

830

831

832

833

834

835

836

837

838

839

840

841

842

843

844

845

846

847

848

849

850

851

852

853

854

855

856

857

858

859

860

861

862

863

864

865

866

867

868

869

870

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

871

872

873

874

875

876

877

878

879

880

881

882

883

884

885

886

887

888

889

890

891

892

893

894

895

896

897

898

899

900

901

902

903

904

905

906

907

908

909

910

911

912

913

914

915

916

917

918

919

920

921

922

923

924

925

926

927

928

Table 5 Manual investigation of DTW correlation linking

FirefoxExample 1Correlation Linking Week 164 in issues to week 169 in switches (May 2018)Issue Report Title Firefox version 5303 appears high memory high resources usage and might causing for SSD damagesIssue Description Mozilla Firefox it is caching all of this data 4-5GB on my new SSD and its lost 4 from its health within 5 months due continuously writesoverwrites firefox cache on my SSDAltrnativeto Comment Switching to Chrome after version 5303 high memory is damaging my SSDExample 2Correlation Linking Week 66 in issues to week 68 in switches (November 2016)Issue Report Title Sessions are not cleared in the private windowIssue Description After closing private window (not a tab) and opening a new one I still logged inAltrnativeto Comment Chrome incognito functionality is more stable than Firefox Firefox doesnrsquot resets sessions

EclipseExample 1Correlation Linking Week 10 in issues to week 11 in switches (July 2015)Issue Report Title Not Java 8 CompatibleIssue Description When downloading Eclipse for the first time for Java development 64-bit (which is what my computer is) it is not running because it is not compatible with Java version 8

(which is the latest version of Java) but appears to be compatible with version 7 still -DosgirequiredJavaVersion=17Altrnativeto Comment Using Netbeans because Eclipse is not installingExample 2Correlation Linking Week 211 in issues to week 213 in switches (February 2019)Issue Report Title jar files in the exported application do not contain class filesIssue Description The application has many plugins It builds and runs well within eclipse but when I export the application using the eclipse export product wizard the jar files (corresponding to

the plugins) do not contain any class files they include meta-inf pluginxml and all the extra folders (eg lib assets etc when available) but they do NOT contain any class filesAltrnativeto Comment Didnrsquot find any problems in exporting jar (when recommending IntelliJ)

ApacheExample 1Correlation Linking Week 113 in issues to week 117 in switches (July 2017)Issue Report Title httpd 2426 no longer building against lua 531 or lua 534Issue Description Building Apache 2426 against lua 531 or 534 compiled from scratch (latest version as of this writing) does not work Compilation against lua 531 did work with 2425Altrnativeto Comment Some issues while building with lua (when recommending XAMPP)Example 2Correlation Linking Week 12 in issues to week 13 in switches (September 2019)Issue Report Title Apache Server is restarted time to time (Release 2410)Issue Description I am using Apache release 2410 in Windows Server 2008 Time to time the Apache server is getting restarted The service is unavailable for 3-5 minutes and after that the service

is started up automatically This issue occurs frequently twice a day two days once etcAltrnativeto Comment XAMPP is more stable I am experiencing random restarts throughout the day when using Apache

Table 6 The issue reports attributes for Firefox Eclipse andApache

Metric Description

Component consists of different components of the system such as bookmarkstheme toolbar etc

Status describes the status of the issue such as unconfirmed open re-solved etc

Resolution describes the resolution type of the issue such as fixed duplicateinvalid etc

Severity the assigned severity such as blocker critical normal etcHardware the different hardware affected by the issue such as desktop mobile

32 or 64 bits etcOS the different OS affected by the issue such as OSX Windows etcPriority the assigned priority from 1 to 5Type the assigned issue type such as defect enhancement or taskTime Alive the time difference between the date opened and the date resolved

in minutesTime since last change the time difference between the date opened and the date of last

change in minutesIs a Crash a boolean value to indicate if it is a crash reportHas a Version Number a boolean whether the issue report has a version numberHas a Target Milestone a boolean whether the issue report has a target numberHas a Regression Range a boolean value to indicate if the bug causes a regression testingNumber of comments the number of comments for the reportNumber of votes the number of votes for the report which validates the integrity of

the report by other usersNumber of Blocks the number of issues that are blocked by that issue until it is re-

solved

(ie with more tracking information) are associated with potentialusersrsquochurn

The number of comments votes and blocks showcase the effortinvested on discussing an issue Finally the time-alive and the time-since-last-change are two features related to the life-cycle of an issuereport Longer time-alive values indicate a lingering issue that hasnot been resolved yet while a short time-since-last-change values

indicate issues that have been recently addressed (and thus can stillbe reoccurring)

Our studied features are inspired by previous research in soft-ware engineering that have built machine learning models usinginformation present in issue reports [15 16 16 18 19 29 32 39 74]

We chose features that are common across the issue reports ofFirefox Eclipse and Apache We filtered out features that are uniqueto an issue tracker system only to ensure a fair comparison betweenour studied issue reports Afterwards we performed Spearmancorrelation tests [44] on the selected features in which each featureis tested against the set of all the other features We did not observeany correlation obtaining a value higher than 07 which suggeststhat our selected features are not correlated and can be safely fedinto our machine learning models [30]

To evaluate the performance of our models we use the Area Un-der the Curvemetric (AUC) AUC is useful for evaluating our modelsbecause our models output probabilities Therefore AUC showsthe discrimination power of our models in every probability thresh-old (unlike Precision Recall or F-measure which are limited toevaluating models at a single probability threshold [66]) The AUCvalues range from 0 to 1 An AUC of 05 denotes a random guessingmodel while an AUC of 1 denotes a perfect distinguishing powerAn AUC of 0 denotes a model with perfect inverse predictions

Given the temporal nature of our data (ie future issue reportscannot be used to predict the potential user churn of past issuereports) we adopt a leave-one-out validation approach to obtainour AUC values

First our issue reports have already been sorted when we per-form the time-series analyses Second we split our data into two

8

929

930

931

932

933

934

935

936

937

938

939

940

941

942

943

944

945

946

947

948

949

950

951

952

953

954

955

956

957

958

959

960

961

962

963

964

965

966

967

968

969

970

971

972

973

974

975

976

977

978

979

980

981

982

983

984

985

986

On the Relationship between User Churn and Software Issues MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea

987

988

989

990

991

992

993

994

995

996

997

998

999

1000

1001

1002

1003

1004

1005

1006

1007

1008

1009

1010

1011

1012

1013

1014

1015

1016

1017

1018

1019

1020

1021

1022

1023

1024

1025

1026

1027

1028

1029

1030

1031

1032

1033

1034

1035

1036

1037

1038

1039

1040

1041

1042

1043

1044

sets one containing 80 of the data ie the training set and an-other containing 20 ie the validation set In the first iterationof our models we use 80 of the data to train our models and weuse the first element of the validation set to test out models Thenthe leave-one-out validation iteratively increases the training setsize by one element (from the validation set) and predicts the nextelement of the validation set This process is repeated until thetraining set reaches the size of 119873 minus 1 (where 119873 is the size of ourdata set) Finally our obtained AUC values are the aggregation(ie median) of all the AUC values obtained in the leave-one-outiterationsFindingsTheAUC values obtained for our threemodels are shownin our online appendix8 The XGboost model performs the best inthe three studied projects obtaining AUC values that exceed 083There exists a subtle difference between the AUC values of theXGboost model and the Feed Forward Neural Networks AUCssince both the XGboost model and the Neural Networks can graspthe complex relationships in our issue reports dataSummary Our results suggest that machine learning models suchas XGboost and Feed Forward Neural Networks can predict therate of user churn with relatively high accuracies (in terms of AUC)Such predictions could help in the prioritization of issue reports

RQ3 What are the most important factorsrelated to potential user churnMotivation In RQ2 we observe that machine learning modelscan be used to predict the rate of potential usersrsquochurn Howeverdevelopers would hardly blindly trust a machine learning modelto help on their decisions (and they should not do so) Developerswould mostly benefit from understanding the reasons behind thepredictions of our machine learning models so that they couldpossibly adapt (or not) their development process Therefore inRQ3 we investigate the most important features in the predictionsof our machine learning modelApproach In this RQ we study the most important features in thepredictions of the XGboost models since they obtain the best AUCvalues in our three studied projects

To find the most important features we adopt a simple featureextraction algorithm [61 70] Consider that our models are trainedon a feature set 119883 = 1199091 1199092 119909119899 (as explained in RQ2) The featureextraction process consists of iteratively (i) removing each feature 119909119894from our models and (ii) computing the AUC without each feature119909119894 (by using the same leave-one-out validation process explainedin RQ2) The AUC values from the models without a feature 119909119894 arecompared to the models containing all the features (ie we takethe difference 119860119880119862119900119903119894119892119894119899119886119897 minus119860119880119862119903119890119898119900119907119890119889 ) The higher the drop inthe AUC value caused by removing a certain feature 119909119894 the higherthe importance of such a feature 119909119894 [4 9 10]Findings Table 7 shows our obtained results after performing thefeature extraction process The table shows the drop in the AUC foreach feature (ie ΔAUC) We also show the minimum maximummean and median values of the features in the 119897119900119908 user churn rate(ie 0) and ℎ119894119892ℎ user churn rate (ie 1) prediction classes

8httpsdoiorg105281zenodo3610584

The Issue Alive Time is the most important feature obtaining thehighest ΔAUC in our three studied projects This result suggeststhat it is unhealthy to leave issues hanging for a long time as theycan be perceived as a reason for users to churn

The Number of Comments obtains a considerable ΔAUC in ourthree studied systems The association between the Number ofComments and the rate of user churnmay occur due to themultitudenumber of users being affected by an issue

The Time-since-last-change is also an important feature in theFirefox and Eclipse projects This result suggests that issues withrecurring changes or modifications may signal a certain instabilityin the fixing process which might cause a potential user churn inthe future

It is worthy noting that features such as Severity Priority andType of Bugs do not obtain a high importance in the predictionswhich might suggest that the existing prioritization fields used inissue reports are misaligned with the potential user churnSummaryOur results suggest that long lived issues can potentiallylead to more potential userrsquos churn and that the existing processesfor prioritizing issues should be augmented to capture the highlyinteractive and long lived issues

5 THREATS TO VALIDITYConstruct Validity The main construct validity of our study isrelated to the assumption of a potential user churn The alterna-tivetonet website does not record exactly when a user has chosen asoftware alternative over another software Instead alternativetonetrecords that a user thinks that a software may be a good alternativeto another software The motivation behind the act of signaling analternative is not known by us For example there might be usersthat are simply driven by the passion of contributing to the alterna-tivetonet community which lead them to provide several opinionsas to which software could make good alternatives to others Wehave deliberately adopted the term potential user churn instead ofsimply user churn due to this unknown motivations behind signal-ing alternatives Moreover it is reasonable to think that if a givensoftware has received a considerable amount of recommendationfor alternatives this may likely indeed represent that users areconsidering to change the software

Internal Validity Our internal threats to validity are mainlyrelated to (i) our chosen features and (ii) our prediction classes Wesplit the peaks of potential user churn into the low and high classesOther research may find different results if the potential user churnis modeled in different way In addition we acknowledge that ourset of metrics is not exhaustive For example we have not studiedwhether the presence of a stack-trace in an issue report can berelated to future potential user churn We plan to (i) model thepotential user churn as a continuous variable and (ii) extend our setof features to predict the potential user churn in future research

External Validity Our study targeted three domains that areweb servers IDEs and web browsers These three domains werechosen due to their availability since we could fetch their issuereports data Although our study showed common factors in ouranalyses we cannot generalize our results to other domains orprojects with a different scale (ie smaller projects) Regardless ourwork provides insights of the possible reasons behind user churn

9

1045

1046

1047

1048

1049

1050

1051

1052

1053

1054

1055

1056

1057

1058

1059

1060

1061

1062

1063

1064

1065

1066

1067

1068

1069

1070

1071

1072

1073

1074

1075

1076

1077

1078

1079

1080

1081

1082

1083

1084

1085

1086

1087

1088

1089

1090

1091

1092

1093

1094

1095

1096

1097

1098

1099

1100

1101

1102

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

1103

1104

1105

1106

1107

1108

1109

1110

1111

1112

1113

1114

1115

1116

1117

1118

1119

1120

1121

1122

1123

1124

1125

1126

1127

1128

1129

1130

1131

1132

1133

1134

1135

1136

1137

1138

1139

1140

1141

1142

1143

1144

1145

1146

1147

1148

1149

1150

1151

1152

1153

1154

1155

1156

1157

1158

1159

1160

Table 7 The most important features for Firefox Eclipse and Apache ranked by their drop in AUC

Factor ΔAUC Min 0 Max 0 Mean 0 Median 0 Min 1 Max 1 Mean 1 Median 1

FirefoxTime Alive 013 0 120 1102 13 10 1487 4872 63OS All 007 0 1 014 0 0 1 083 1Number of Comments 006 1 65 894 12 1 458 2811 45Time Since Last Change 006 9 30 1090 22 0 5 410 3Has a Target Milestone 005 0 1 022 0 0 1 079 1Has a Version Number 004 0 1 033 0 0 1 064 1Component UI Web Payments 003 0 1 042 0 0 1 061 1

EclipseTime Alive 012 0 103 901 12 8 1334 4396 49Has a Target Milestone 006 0 1 019 0 0 1 082 1Time since last change 006 7 27 851 19 0 6 312 3Has a Version Number 005 0 1 034 0 0 1 077 1Component Debug 004 0 1 047 0 0 1 072 1Number of Comments 003 1 72 1092 11 1 632 2701 72OS All 002 0 1 050 0 0 1 068 1

ApacheTime Alive 014 0 136 1132 14 14 1521 5310 70Component Utils 010 0 1 022 0 0 1 091 1Has a Version Number 009 0 1 038 0 0 1 086 1Number of Comments 005 1 92 1232 6 1 352 4207 79OS All 004 0 1 033 0 0 1 079 1Status REOPENED 003 0 1 040 0 0 1 073 1Hardware All 002 0 1 051 0 0 1 078 1

6 RELATEDWORKGiven that we study the relationship between potential user churnand issue reports our research is related to the areas of usercustomerchurn and software quality Therefore we survey the related re-search around these two areas in this section

UserCustomerChurnUser or customer churn has beenwidelystudied in the area of telecommunication services [3 28 71] Aminet al [3] proposed a cross-company machine learning model topredict customer churn The authors also propose the use of datatransformation (eg by using the log function) to improve the train-ing data Ullah et al [71] used feature selection techniques such asInformation Gain and Correlation Attributes Ranking Filter to selectwhich features better explained the churn behaviour of differentcustomer groups The authors observed that a Random Forest modelperformed the best in their study While the Business to Customers(B2C) churn has been widely studied Figalist et al [28] recentlyinvestigated the Business to Business (B2B) churn ie when thecustomer business changes shifts to the services provided by thecompetition Figalist et al [28] allude that although B2B relation-ships tend to be more stable they have a much bigger financial im-pact when they change In terms of software engineering there hasbeen a lack of studies that have investigated usercostumer churndirectly Instead much research has been invested on customerreviews regarding software products [6 46 47] Noei et al [46]studied which features from mobile apps are associated with betterranks in the Google Play store While the work by Noei et al [46]does not study the customer churn directly striving for better ranksin Google Play may avoid user churn in mobile apps Bavota etal [6] investigated the relationship between the usage of fault- andchange-prone APIs in Android apps and user reviews Differentlyfrom all the aforementioned work our study investigates the userchurn is software products

Software Quality Although there exist a lack of studies ana-lyzing the user churn of software productsservices there has beena considerable amount of research on software quality [5 34 39 6667 74] Research on software quality is important since a quality

software is more stable and less defect-prone Ultimately such char-acteristics are tightly related to the observations of our study(iesoftware issues are correlated with user churn) A prominent areaof software quality research in software engineering is the defectprediction area [66 67] Tantithamthavorn et al [66] studied howmuch improvement can be obtained by automatically optimizingdefect prediction models Their research shows that automatic op-timization can yield significant or insignificant performance gainsdepending on the machine learning algorithm Other considerableamount of research has been invested on the triaging part of issuereports [5] and on the effort estimation of an issue report (in termsof time to address an issue) [39 74] In this paper we complementprior research by studying the relationship shared between issuereports (eg defects or required enhancements) with the potentialuser churn of users

7 CONCLUSIONIn this work we study the data available on alternativetonet tobetter understand the relationship between software issues and thepotential user churn of users Having observed that user concernsare tightly related to software issues (eg bugs) we investigatethe relationship between issue reports and the potential user churnof users Our study reveals key issues that must be addressed forthe success of a software product (depending on the domain) Forexample we observe that the potential user churn of users may betightly related to the lack of a robust documentation and support fortesting tools (in the ldquoIDErdquo and ldquoWeb Serverrdquo domains) Finally ourmachine learning models reveal that (i) the longer the issue takes tobe fixed the higher the chances of user churn and (ii) issues withinmore general software components are more likely to be associatedwith user churn Finally we suggest that the current prioritizationperformed by developers should be augmented to encompass thelong lived and highly interactive issues In overall our study sug-gests that the prioritization process of issues can be improved byconsidering the potential user churn of users associated with suchissues

10

1161

1162

1163

1164

1165

1166

1167

1168

1169

1170

1171

1172

1173

1174

1175

1176

1177

1178

1179

1180

1181

1182

1183

1184

1185

1186

1187

1188

1189

1190

1191

1192

1193

1194

1195

1196

1197

1198

1199

1200

1201

1202

1203

1204

1205

1206

1207

1208

1209

1210

1211

1212

1213

1214

1215

1216

1217

1218

On the Relationship between User Churn and Software Issues MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea

1219

1220

1221

1222

1223

1224

1225

1226

1227

1228

1229

1230

1231

1232

1233

1234

1235

1236

1237

1238

1239

1240

1241

1242

1243

1244

1245

1246

1247

1248

1249

1250

1251

1252

1253

1254

1255

1256

1257

1258

1259

1260

1261

1262

1263

1264

1265

1266

1267

1268

1269

1270

1271

1272

1273

1274

1275

1276

REFERENCES[1] Martiacuten Abadi Paul Barham Jianmin Chen Zhifeng Chen Andy Davis Jeffrey

Dean Matthieu Devin Sanjay Ghemawat Geoffrey Irving Michael Isard Man-junath Kudlur Josh Levenberg Rajat Monga Sherry Moore Derek G MurrayBenoit Steiner Paul Tucker Vijay Vasudevan Pete Warden Martin Wicke YuanYu and Xiaoqiang Zheng 2016 Tensorflow A system for large-scale machinelearning In 12th USENIX Symposium on Operating Systems Design and Imple-mentation (OSDI 16) 265ndash283

[2] W Abdelmoez Mohamed Kholief and Fayrouz M Elsalmy 2012 Bug fix-time pre-diction model using naiumlve bayes classifier In 2012 22nd International Conferenceon Computer Theory and Applications (ICCTA) IEEE 167ndash172

[3] Adnan Amin Babar Shah Asad Masood Khattak Thar Baker and Sajid Anwar2018 Just-in-time customer churn prediction with and without data transforma-tion In 2018 IEEE Congress on Evolutionary Computation (CEC) IEEE 1ndash6

[4] Hafeez Ullah Amin Aamir Saeed Malik Rana Fayyaz Ahmad Nasreen BadruddinNidal Kamel Muhammad Hussain and Weng-Tink Chooi 2015 Feature extrac-tion and classification for EEG signals using wavelet transform and machinelearning techniques Australasian physical amp engineering sciences in medicine 381 (2015) 139ndash149

[5] John Anvik Lyndon Hiew and Gail C Murphy 2006 Who should fix this bugIn Proceedings of the 28th international conference on Software engineering ACM361ndash370

[6] Gabriele Bavota Mario Linares-Vasquez Carlos Eduardo Bernal-Cardenas Mas-similiano Di Penta Rocco Oliveto and Denys Poshyvanyk 2014 The impactof api change-and fault-proneness on the user ratings of android apps IEEETransactions on Software Engineering 41 4 (2014) 384ndash407

[7] Pamela Bhattacharya and Iulian Neamtiu 2011 Bug-fix time prediction modelscan we do better In Proceedings of the 8thWorking Conference on Mining SoftwareRepositories ACM 207ndash210

[8] Steven Bird and Edward Loper 2004 NLTK the natural language toolkit InProceedings of the ACL 2004 on Interactive poster and demonstration sessionsAssociation for Computational Linguistics 31

[9] Andrew P Bradley 1997 The use of the area under the ROC curve in theevaluation of machine learning algorithms Pattern recognition 30 7 (1997)1145ndash1159

[10] Leo Breiman 2001 Random forests Machine learning 45 1 (2001) 5ndash32[11] Albert Caruana and Michael T Ewing 2010 How corporate reputation quality

and value influence online loyalty Journal of Business Research 63 9-10 (2010)1103ndash1110

[12] Tianqi Chen and Carlos Guestrin 2016 Xgboost A scalable tree boosting systemIn Proceedings of the 22nd acm sigkdd international conference on knowledgediscovery and data mining ACM 785ndash794

[13] Tianqi Chen Tong He Michael Benesty Vadim Khotilovich and Yuan Tang 2015Xgboost extreme gradient boosting R package version 04-2 (2015) 1ndash4

[14] Wenbin Chen Kun Fu Jiawei Zuo Xinwei Zheng Tinglei Huang and WenjuanRen 2017 Radar emitter classification for large data set based on weighted-xgboost IET Radar Sonar amp Navigation 11 8 (2017) 1203ndash1207

[15] Morakot Choetkiertikul Hoa Khanh Dam Truyen Tran and Aditya Ghose 2015Predicting delays in software projects using networked classification (t) In 201530th IEEEACM International Conference on Automated Software Engineering (ASE)IEEE 353ndash364

[16] Morakot Choetkiertikul Hoa Khanh Dam Truyen Tran Aditya Ghose and JohnGrundy 2017 Predicting delivery capability in iterative software developmentIEEE Transactions on Software Engineering 44 6 (2017) 551ndash573

[17] Franccedilois Chollet 2017 Keras[18] Daniel Alencar Da Costa Shane McIntosh Uiraacute Kulesza Ahmed E Hassan and

Surafel Lemma Abebe 2018 An empirical study of the integration time of fixedissues Empirical Software Engineering 23 1 (2018) 334ndash383

[19] Daniel Alencar da Costa Shane McIntosh Christoph Treude Uiraacute Kulesza andAhmed E Hassan 2018 The impact of rapid release cycles on the integrationdelay of fixed issues Empirical Software Engineering 23 2 (2018) 835ndash904

[20] Piew Datta Brij Masand Deepak R Mani and Bin Li 2000 Automated cellularmodeling and prediction on a large scale Artificial Intelligence Review 14 6 (2000)485ndash502

[21] Kushal S Dave Vishal Vaingankar Sumanth Kolar and Vasudeva Varma 2013Timespent based models for predicting user retention In Proceedings of the 22ndinternational conference on World Wide Web ACM 331ndash342

[22] William H DeLone and Ephraim R McLean 1992 Information systems successThe quest for the dependent variable Information systems research 3 1 (1992)60ndash95

[23] William H DeLone and Ephraim R McLean 2003 Model of information systemssuccess a ten years update Journal of Management (2003)

[24] Gideon Dror Dan Pelleg Oleg Rokhlenko and Idan Szpektor 2012 Churnprediction in new users of Yahoo answers In Proceedings of the 21st InternationalConference on World Wide Web ACM 829ndash834

[25] Khaled El Emam 2005 The ROI from software quality Auerbach Publications[26] Katherine Ellis Jacqueline Kerr Suneeta Godbole Gert Lanckriet David Wing

and Simon Marshall 2014 A random forest classifier for the prediction of energy

expenditure and type of physical activity from wrist and hip accelerometersPhysiological measurement 35 11 (2014) 2191

[27] David Enke and Suraphan Thawornwong 2005 The use of datamining and neuralnetworks for forecasting stock market returns Expert Systems with applications29 4 (2005) 927ndash940

[28] Iris Figalist Christoph Elsner Jan Bosch and Helena Holmstroumlm Olsson 2019Customer churn prediction in B2B contexts In International Conference on Soft-ware Business Springer 378ndash386

[29] Emanuel Giger Martin Pinzger and Harald Gall 2010 Predicting the fix timeof bugs In Proceedings of the 2nd International Workshop on RecommendationSystems for Software Engineering ACM 52ndash56

[30] Frank E Harrell Jr 2015 Regression modeling strategies with applications to linearmodels logistic and ordinal regression and survival analysis Springer

[31] Steffen Hedegaard and Jakob Grue Simonsen 2013 Extracting usability and userexperience information from online user reviews In Proceedings of the SIGCHIConference on Human Factors in Computing Systems ACM 2089ndash2098

[32] Yujuan Jiang Bram Adams and Daniel M German 2013 Will my patch make itand how fast Case study on the linux kernel In Proceedings of the 10th WorkingConference on Mining Software Repositories IEEE Press 101ndash110

[33] Guolin Ke Jia Zhang Zhenhui Xu Jiang Bian and Tie-Yan Liu 2019 TabNNa universal neural network solution for tabular data httpsopenreviewnetforumid=r1eJssCqY7

[34] Foutse Khomh Brian Chan Ying Zou and Ahmed E Hassan 2011 An entropyevaluation approach for triaging field crashes A case study of mozilla firefox InReverse Engineering (WCRE) 2011 18th Working Conference on IEEE 261ndash270

[35] Andy Liaw and Matthew Wiener 2002 Classification and regression by randomforest R news 2 3 (2002) 18ndash22

[36] Erik Linstead Paul Rigor Sushil Bajracharya Cristina Lopes and Pierre Baldi2007 Mining eclipse developer contributions via author-topic models In FourthInternational Workshop on Mining Software Repositories (MSRrsquo07 ICSE Workshops2007) IEEE 30ndash30

[37] Xi Long Wenjing Yin Le An Haiying Ni Lixian Huang Qi Luo and Yan Chen2012 Churn analysis of online social network users using data mining techniquesIn Proceedings of the international MultiConference of Engineers and ConputerScientists Vol 1

[38] James MacQueen 1967 Some methods for classification and analysis of multivari-ate observations In Proceedings of the fifth Berkeley symposium on mathematicalstatistics and probability Vol 1 Oakland CA USA 281ndash297

[39] Lionel Marks Ying Zou and Ahmed E Hassan 2011 Studying the fix-timefor bugs in large open source projects In Proceedings of the 7th InternationalConference on Predictive Models in Software Engineering ACM 11

[40] Andrew Kachites McCallum 2002 Mallet A machine learning for languagetoolkit (2002)

[41] Miloš Milošević Nenad Živić and Igor Andjelković 2017 Early churn predic-tion with personalized targeting in mobile social games Expert Systems withApplications 83 (2017) 326ndash332

[42] Audris Mockus Roy T Fielding and James Herbsleb 2000 A case study ofopen source software development the Apache server In Proceedings of the 22ndinternational conference on Software engineering Acm 263ndash272

[43] Gustavo Percio Zimmermann Montesdioca and Antocircnio Carlos Gastaud Maccedilada2015 Measuring user satisfaction with information security practices Computersamp Security 48 (2015) 267ndash280

[44] Leann Myers and Maria J Sirois 2004 Spearman correlation coefficients differ-ences between Encyclopedia of statistical sciences 12 (2004)

[45] Didrik Nielsen 2016 Tree boosting with XGBoost-why does XGBoost win EveryMachine Learning Competition Masterrsquos thesis NTNU

[46] Ehsan Noei Daniel Alencar Da Costa and Ying Zou 2018 Winning the appproduction rally In Proceedings of the 2018 26th ACM Joint Meeting on EuropeanSoftware Engineering Conference and Symposium on the Foundations of SoftwareEngineering ACM 283ndash294

[47] Ehsan Noei Feng Zhang Shaohua Wang and Ying Zou 2019 Towards pri-oritizing user-related issue reports of mobile applications Empirical SoftwareEngineering 24 4 (2019) 1964ndash1996

[48] Ehsan Noei Feng Zhang and Ying Zou 2019 Too Many User-Reviews WhatShould App Developers Look at First IEEE Transactions on Software Engineering(2019)

[49] Derek Orsquocallaghan Derek Greene Joe Carthy and Paacutedraig Cunningham 2015An analysis of the coherence of descriptors in topic modeling Expert Systemswith Applications 42 13 (2015) 5645ndash5657

[50] Ya A Pachepsky Dennis Timlin and GY Varallyay 1996 Artificial neural net-works to estimate soil water retention from easily measurable data Soil ScienceSociety of America Journal 60 3 (1996) 727ndash733

[51] Mahesh Pal 2005 Random forest classifier for remote sensing classificationInternational Journal of Remote Sensing 26 1 (2005) 217ndash222

[52] Sebastiano Panichella Andrea Di Sorbo Emitza Guzman Corrado A VisaggioGerardo Canfora and Harald C Gall 2015 How can i improve my app classifyinguser reviews for software maintenance and evolution In Software maintenanceand evolution (ICSME) 2015 IEEE international conference on IEEE 281ndash290

11

1277

1278

1279

1280

1281

1282

1283

1284

1285

1286

1287

1288

1289

1290

1291

1292

1293

1294

1295

1296

1297

1298

1299

1300

1301

1302

1303

1304

1305

1306

1307

1308

1309

1310

1311

1312

1313

1314

1315

1316

1317

1318

1319

1320

1321

1322

1323

1324

1325

1326

1327

1328

1329

1330

1331

1332

1333

1334

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

1335

1336

1337

1338

1339

1340

1341

1342

1343

1344

1345

1346

1347

1348

1349

1350

1351

1352

1353

1354

1355

1356

1357

1358

1359

1360

1361

1362

1363

1364

1365

1366

1367

1368

1369

1370

1371

1372

1373

1374

1375

1376

1377

1378

1379

1380

1381

1382

1383

1384

1385

1386

1387

1388

1389

1390

1391

1392

[53] Chitra Phadke Huseyin Uzunalioglu Veena B Mendiratta Dan Kushnir andDerek Doran 2013 Prediction of subscriber churn using social network analysisBell Labs Technical Journal 17 4 (2013) 63ndash76

[54] Peter C Rigby Daniel M German and Margaret-Anne Storey 2008 Open sourcesoftware peer review practices a case study of the apache server In Proceedingsof the 30th international conference on Software engineering ACM 541ndash550

[55] Martin P Robillard and Robert Deline 2011 A field study of API learning obstaclesEmpirical Software Engineering 16 6 (2011) 703ndash732

[56] Michael Roumlder Andreas Both and Alexander Hinneburg 2015 Exploring thespace of topic coherence measures In Proceedings of the eighth ACM internationalconference on Web search and data mining ACM 399ndash408

[57] Victor Francisco Rodriguez-Galiano Bardan Ghimire John Rogan Mario Chica-Olmo and Juan Pedro Rigol-Sanchez 2012 An assessment of the effectivenessof a random forest classifier for land-cover classification ISPRS Journal of Pho-togrammetry and Remote Sensing 67 (2012) 93ndash104

[58] Hiroaki Sakoe and Seibi Chiba 1978 Dynamic programming algorithm opti-mization for spoken word recognition IEEE transactions on acoustics speech andsignal processing 26 1 (1978) 43ndash49

[59] Stan Salvador and Philip Chan 2007 Toward accurate dynamic time warping inlinear time and space Intelligent Data Analysis 11 5 (2007) 561ndash580

[60] Peter B Seddon 1997 A respecification and extension of the DeLone and McLeanmodel of IS success Information systems research 8 3 (1997) 240ndash253

[61] Rudy Setiono Bart Baesens and Christophe Mues 2008 Recursive neural net-work rule extraction for data with mixed attributes IEEE Transactions on NeuralNetworks 19 2 (2008) 299ndash307

[62] Dong-Hee Shin 2015 Effect of the customer experience on satisfaction withsmartphones Assessing smart satisfaction index with partial least squaresTelecommunications Policy 39 8 (2015) 627ndash641

[63] Tom De Smedt and Walter Daelemans 2012 Pattern for python Journal ofMachine Learning Research 13 Jun (2012) 2063ndash2067

[64] Daniel Svozil Vladimir Kvasnicka and Jiri Pospichal 1997 Introduction to multi-layer feed-forward neural networks Chemometrics and intelligent laboratorysystems 39 1 (1997) 43ndash62

[65] Shaheen Syed and Marco Spruit 2017 Full-text or abstract Examining topiccoherence scores using latent dirichlet allocation In 2017 IEEE Internationalconference on data science and advanced analytics (DSAA) IEEE 165ndash174

[66] Chakkrit Tantithamthavorn Shane McIntosh Ahmed E Hassan and KenichiMatsumoto 2016 Automated parameter optimization of classification techniquesfor defect prediction models In 2016 IEEEACM 38th International Conference onSoftware Engineering (ICSE) IEEE 321ndash332

[67] Chakkrit Tantithamthavorn Shane McIntosh Ahmed E Hassan and KenichiMatsumoto 2018 The impact of automated parameter optimization on defectprediction models IEEE Transactions on Software Engineering (2018)

[68] Romain Tavenard 2017 tslearn A machine learning toolkit dedicated to time-series data httpsgithubcomrtavenartslearn

[69] L Torlay Marcela Perrone-Bertolotti Elizabeth Thomas and Monica Baciu 2017Machine learningndashXGBoost analysis of language networks to classify patientswith epilepsy Brain informatics 4 3 (2017) 159

[70] Geoffrey G Towell and Jude W Shavlik 1993 Extracting refined rules fromknowledge-based neural networks Machine learning 13 1 (1993) 71ndash101

[71] Irfan Ullah Basit Raza Ahmad Kamran Malik Muhammad Imran Saif Ul Islamand SungWon Kim 2019 A churn prediction model using random forest analysisof machine learning techniques for churn prediction and factor identification intelecom sector IEEE Access 7 (2019) 60134ndash60149

[72] Lorenzo Villarroel Gabriele Bavota Barbara Russo Rocco Oliveto and Massimil-iano Di Penta 2016 Release planning of mobile apps based on user reviews InProceedings of the 38th International Conference on Software Engineering ACM14ndash24

[73] Chih-Ping Wei and I-Tang Chiu 2002 Turning telecommunications call detailsto churn prediction a data mining approach Expert systems with applications 232 (2002) 103ndash112

[74] Cathrin Weiss Rahul Premraj Thomas Zimmermann and Andreas Zeller 2007How long will it take to fix this bug In Fourth International Workshop on MiningSoftware Repositories (MSRrsquo07 ICSE Workshops 2007) IEEE 1ndash1

[75] Ian H Witten Eibe Frank and A Mark 2016 Hall and Christopher J Pal DataMining Practical machine learning tools and techniques (2016)

[76] Barbara H Wixom and Peter A Todd 2005 A theoretical integration of usersatisfaction and technology acceptance Information systems research 16 1 (2005)85ndash102

[77] Jae-Chern Yoo and Tae Hee Han 2009 Fast normalized cross-correlation Circuitssystems and signal processing 28 6 (2009) 819

12

  • Abstract
  • 1 Introduction
  • 2 Motivating Example
  • 3 Data Collection
  • 4 Research Questions amp Results
  • 5 Threats To Validity
  • 6 Related Work
  • 7 Conclusion
  • References
Page 3: On the Relationship between User Churn and Software Issues · relationship between the issues that are present in a software prod-uct and user churn. For this purpose, we investigate

233

234

235

236

237

238

239

240

241

242

243

244

245

246

247

248

249

250

251

252

253

254

255

256

257

258

259

260

261

262

263

264

265

266

267

268

269

270

271

272

273

274

275

276

277

278

279

280

281

282

283

284

285

286

287

288

289

290

On the Relationship between User Churn and Software Issues MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea

291

292

293

294

295

296

297

298

299

300

301

302

303

304

305

306

307

308

309

310

311

312

313

314

315

316

317

318

319

320

321

322

323

324

325

326

327

328

329

330

331

332

333

334

335

336

337

338

339

340

341

342

343

344

345

346

347

348

Figure 2 httpsalternativetonet provides in-perspectivecomments where users specify why a certain software alter-native (eg Firefox) can be better than the current software(eg Chrome)

Figure 3 Recording the recommendation for an alternativefor Firefox

competitors For example a user mentions ldquo[Chrome] does not haveas many languages and spell checking support as Firefox does3rdquo

Because the opinions are voiced with respect to competitors (iethe in-perspective comments) alternativetonet provides a uniqueopportunity for empirical studies Particularly in our study thein-perspective comments allow us to investigate the most recurrentconcerns within the competition scene of a software domain (weperform this investigation in RQ0)

Another interesting and unique characteristic in the aterna-tivetonet data is the register of users that recommended alternativesfor a given software product (ie the potential user churn) Forexample Figure 3 shows how alternativetonet records when a userindicates that Chrome can be a good alternative to the Firefox webbrowser (along with the reason why) This data is important be-cause it can be used to proactively identify potential user churn insoftware products (eg by using machine learning) Based on thisinformation we build machine learning models and report on theirresults in RQ2 and RQ3

Finally the potential user churn provided by alternativetonetallow us to investigate whether it is related to prominent issuesthat may be present in a software product For example while auser has expressed the reason for a possible user churn in Figure 3(ie Jupyter is slow on Firefox) the development team of Firefoxis also working on an issue report that expresses the same issue(see Figure 4) In this paper we hypothesize that there may existstrong links between potential user churn and the issue reports thatare submitted to software products (we perform this investigation inRQ1)

3To provide an opinion as to whether a software product is a good or a bad alternativeto another product users must prove they are not robots Only afterwards a text boxcontaining the submission form will be provided

Figure 4 Issue report expressing the problem voiced by theuser in Figure 3

We capitalize on these unique characteristics of the alterna-tivetonet data to perform our empirical study on the relationshipbetween potential user churn and software issues

3 DATA COLLECTIONAlternativetonet organizes the software alternatives through severalmeans such as categories features or tags Categories are the broad-est way to categorize software alternatives and alternativetonet hasover 25 categories of software alternatives at the time of this studyTags and features are more specific ways to categorize softwarealternatives (eg the ldquoIDErdquo tag or the ldquoresponsive designrdquo feature)Another characteristic of tags and features is that they can be addedby users However these new tags and features must be verified bythe alternativetonet community before being added to the websiteFor the purposes of our study we use tags to filter our data forsoftware alternatives because it is more precise than using cate-gories For example we can easily filter for the software alternativeswithin the Web Browser domain using the ldquoweb browserrdquo tag Onthe other hand it would be hard to filter our data based on thefeatures For example filtering by ldquoresponsive designrdquo may retrievesoftware from very distinct domains

For the sake of data robustness it is tempting to analyze allthe software alternatives from all the software domains that areavailable on the alternativetonetwebsite However since one of ourmain objectives is to study the relationship between potential userchurn and the issues that are present in the software products weneed to restrict our analyses to the domains of certain software Forinstance it would be virtually impossible to collect and analyze allthe issue reports from every software alternative listed on the alter-nativetonet website Every different software alternative may usea different Issue Tracking System which would require differentways for collecting data Therefore we analyze the software alter-natives within the domains of three well established open sourceprojects Eclipse4 Firefox5 and the Apache server6 Our choice forthese three projects was based on the extensive prior empiricalsoftware engineering research that has been conducted using theseprojects over the years [2 7 18 19 29 36 42 54]

To collect the data for our study we followed the process de-picted in Figure 5 Given our chosen studied projects we collectall the software alternatives tagged with the ldquoBrowserrdquo ldquoIDErdquo andldquoWeb Serverrdquo tags which are the respective domains of the EclipseFirefox and Apache projects In total we collect 290 browser alter-natives 296 IDE alternatives and 134 web-server alternatives Oncethe software alternatives are fetched we collect the in-perspective4httpswwweclipseorg5httpswwwmozillaorgen-USfirefoxnew6httpshttpdapacheorg

3

349

350

351

352

353

354

355

356

357

358

359

360

361

362

363

364

365

366

367

368

369

370

371

372

373

374

375

376

377

378

379

380

381

382

383

384

385

386

387

388

389

390

391

392

393

394

395

396

397

398

399

400

401

402

403

404

405

406

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

407

408

409

410

411

412

413

414

415

416

417

418

419

420

421

422

423

424

425

426

427

428

429

430

431

432

433

434

435

436

437

438

439

440

441

442

443

444

445

446

447

448

449

450

451

452

453

454

455

456

457

458

459

460

461

462

463

464

Table 1 A summary of our collected data

Domain Alternatives Reviews Comments Potential churn Issue reports

IDEs 296 1088 3036 4319 (Eclipse) 22758 (Eclipse)

Browsers 290 1754 5112 6421 (Firefox) 40749 (Firefox)

Web Servers 124 714 1933 1809 (Apache) 20965 (Apache)

comments (explained in Section 2) and the reviews for all the soft-ware alternatives We also collect the potential user churn (explainedin Section 2) for the Eclipse Firefox and Apache projects Finally inaddition to the data collected from alternativetonet we also collectthe issue reports from the respective Issue Tracking Systems of ourstudied projects We use the potential user churn data combinedwith the issue reports to perform our investigations in RQ1-RQ3Table 1 summarizes the data collected for our study (ie the datatype and the amount of collected data)

4 RESEARCH QUESTIONS amp RESULTSIn this section we present our research questions (RQs) along withtheir results For each RQ we present its motivation and explainthe approach that is used to conduct the RQ

RQ0 What are the most significant usersrsquoconcerns for Browsers IDEs and Web ServersMotivation The motivation behind RQ0 is to understand the mainconcerns within the competition scene of a specific software do-main This investigation is important to highlight what are themain strengths and weaknesses within a particular domain (egBrowsers) This knowledge can be useful because for example anorganization can study the main weaknesses within a domain andbetter assess whether its productsservices fall within the sameflawsApproach To study the most significant concerns related toBrowsers IDEs and Web Servers we use the in-perspect commentsand reviews data that were explained in Section 3 To analyze suchdata we adopt the Latent Dirichlet Allocation (LDA) to find themain topics that are present in the in-perspective comments andreviews data We use the LDA implementation provided by theMALLET framework [40] LDA is a probabilistic generative methodthat assumes a Dirichlet distribution of latent topics LDA is partic-ular useful for us since we are interested in finding the main topicsthat are present in the in-perspective comments and reviews dataHowever The LDA model requires the data to be pre-processedin a certain format for optimal results Therefore we follow thestandard text pre-processing steps [48]

Fixing Typographical Errors Typographical errors can bedue to the usage of internet slang in a written text or simply due tocommon language mistakes The most recurrent of errors that wefind in our data is the repetition of characters For example userstend to use words such as ldquobeeeestrdquo to emphasize their feelings Weuse the Pattern For Python [63] package to fix such typographicalerrors

Removing Stop Words Examples of stop words are of byand the These are words that are recurrently used in written textbut do not possess meaning by themselves We use the package

nltk [8] to remove such stop words This package contains a well-known corpus of stop words and is widely used for text filtering

Stemming andLemmatization For better results words shouldbe repeated as frequently as they can since LDA relies heavily onthe occurrences of words in different sentences Hence the usageof stemming and lemmatization is essential These methods trans-form inflectional forms of a word to its basic form For examplethe words ldquofixedrdquo ldquofixingrdquo and ldquofixesrdquo are transformed to the basicform ldquofixrdquo Stemming works by removing the suffix of the word(eg the word ldquocatsrdquo becomes ldquocatrdquo by removing the ldquosrdquo) whilelemmatization retrieves the basic form of a word from a dictionaryknown as lemma (by using morphological analysis) We use bothstemming and lemmatization to ensure that we obtain the mostunique words possible

Forming Bigrams and Trigrams Forming bigrams and tri-grams is essential to group the words which frequently appeartogether Some words do not inflict any significant meaning for theLDA model on their own However when grouped together suchwords may indicate a specific topic One example of these wordsis ldquocustomer supportrdquo which has a precise meaning (as opposed toonly ldquocustomerrdquo or ldquosupportrdquo which can hold diverse meanings)

To choose the optimal number of topics in our LDAmodel we usethe coherence scoremetric [56] This metric was found to be the bestin alliance with human perception [49 65] However even whenrelying on the coherence score (which naturally limits redundancyin the produced topics) there is still the possibility of duplicatetopics being generated Therefore we perform a manual analysisof the produced topics to eliminate duplicates Our LDA modelproduces topics in the form of keywords and their percentage-of-significance with respect to the topic For example (0097 lowastldquo119904119900119906119903119888119890rdquo + 0085 lowast ldquo119901119903119894119907119886119888119910rdquo + 0053 lowast ldquo119905119903119886119888119896rdquo + 0050 lowast ldquo119901119903119900 119891 119894119905rdquo +0038 lowast ldquo119902119906119886119899119905119906119898rdquo + 0038 lowast ldquo119888119900119897119897119890119888119905rdquo + 0035 lowast ldquo119901119903119900119905119890119888119905rdquo + 0032 lowastldquo119886119903119888ℎ119894119907119890rdquo + 0029 lowast ldquo119898119900119899119890119910rdquo + 0026 lowast ldquo119901119890119903 119891 119900119903119898119886119899119888119890rdquo) are a set ofkeywords that indicate Privacy as high level topic But some otherdistribution of keywords could refer to the same topic Two of theauthors conduct together the manual analyses of the topics Thefinal set of topics is only produced when full consensus is reachedbetween authors (ie the authors went over and discussed all thetopics) Because both authors have analyzed together all topics andagreements were reached on all the topics in the end we did notcompute inter-rater agreement

Once the topics are defined we extract 100 examples (aiming fora 95 confidence level with a 10 margin error) from our data foreach studied domain (ie IDE Browser and Web Server) totalling300 examples For each example two of the authors assign theirtopic Again two authors analyze and discuss each example sothat consensus is reached regarding the assigned topic Afterwardswe compute the topic distribution for each domain which canrepresent the most recurrent concerns for that domainFindings Tables 2 and 3 show the topics obtained by our LDAmodelWe also show some of the examples for each topic in the table(all the examples and analyses are available in our replication pack-age which is hosted on httpsdoiorg105281zenodo3610584

Our obtained topics can be categorized into (i) general topicswhich are present in the three domains and (ii) domain-specifictopics which might still be present in other domains but are more

4

465

466

467

468

469

470

471

472

473

474

475

476

477

478

479

480

481

482

483

484

485

486

487

488

489

490

491

492

493

494

495

496

497

498

499

500

501

502

503

504

505

506

507

508

509

510

511

512

513

514

515

516

517

518

519

520

521

522

On the Relationship between User Churn and Software Issues MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea

523

524

525

526

527

528

529

530

531

532

533

534

535

536

537

538

539

540

541

542

543

544

545

546

547

548

549

550

551

552

553

554

555

556

557

558

559

560

561

562

563

564

565

566

567

568

569

570

571

572

573

574

575

576

577

578

579

580

Alternativetonet

Issue tracking system

Fetch data from

studied domains

IDE Browser

and Web Server

alternatives

Fetch in-

perspective

comment and

reviews

In-perspective

comments

Software

reviews

RQ0

Competition

scene of a

domain

Fetch issues

from Firefox

Eclipse and

Apache

Fetch potential

churn from

Eclipse Firefox

and Apache

Potential churns

Issue reports

Issues amp

Potential churn

pairs

RQ1 Correlating

issues with

potential churns

RQ2 Machine

learning to

predict potential

churns

RQ3 Important

features

associated with

potential churns

Figure 5 An overview of our data collection process

dominant in a specific domain Table 4 shows the distribution oftopics per domain based on our manual analyses We observe thatrelease and updates memory issues bugscrashes and cross-platformavailability can be considered as general topics (ie they are im-portant in all the three analyzed domains) Indeed these topics areintuitively present when handling any kind of software

On the other hand Documentation and testing were presentfor web servers and IDEs signalling that a robust documentation(which is often overlooked [55]) and with good support for test-ing tools may definitely attractretain more users when competingin the IDE and Web Sever Market For example an organizationmay direct their attention to these aspects when studying theircompetition

The bookmarks privacy and security were exclusively present inthe Browsers domain signalling that these are essential features toretain or attract users Simplistic design debug options code refac-toring and auto-completion are topics that target IDEs exclusively

Finally complaints about release and updates memory issues andbugscrashes can be considered as the main topics that drive usersto churn These topics were present in the three domains with thehighest percentages among each of the domains The percentagescombined exceed 50 in each of the domains (see Table 4)Summary Our observations suggest that the usersrsquo concerns aretightly related to issues (eg bugs or crashes) that are present inthe respective software products

RQ1 Can we find a correlation betweenpotential user churn and issue reportsMotivation In RQ0 we observe that software issues (eg bugsamp crashes and memory issues) are common concerns that maymotivate users to churn With such an observation we hypothesizein RQ1 that the potential usersrsquochurn observed on alternativetonetmay be correlated with the issue reports which developers addressin their respective projects This analysis is important because ifsuch a correlation exists the potential user churn information canbe possibly used to prioritize issue reports (for example the moreassociated with potential user churn the more important the issuereport)

Approach In our work we approach the recommendations foralternatives (ie potential usersrsquochurn) and the issue reports asevents occurring over time and interpret them as two time-seriesSuch an interpretation is handy for two reasons (i) we can studythe correlation between two trends (potential usersrsquochurn and issuereports) and (ii) we can study whether the specific peaks in issuereports can be associated with peaks in potential usersrsquochurn

In our study we group the two time series in a weekly basisie all the potential user churn that occurred within a week aregrouped together We use the weekly grouping because groupingissue reports in a daily-basis is too narrow to form a pattern whilegrouping them in a monthly-basis can over-generalize the patternsin issues The time-series of potential usersrsquochurn and the time-series of issues showcase the events (grouped in a weekly-basis)from March 2015 (the earliest date on alternativetonet) to July 2019forming 200 weeks grouping

The first step to correlated our two time-series is to verifywhetherthere exists a global correlation between them (ie whether thereexist common patterns among the increasing and decreasing trendsover time) To do so we use the cross-correlation which is a metricused in signal processing [77] This correlation calculates the simi-larity between two time-series as a function of the displacementfrom one time-series to another For example given two functions119891 and 119892 the cross-correlation calculates the degree to which a shiftin 119892 (along the x-axis) is identical to the shift in 119891

However since we are also interested in analyzing the relation-ship between the peaks of our two time-series we use the DynamicTimeWarping (DTW) [58] which was designed for comparing time-series varying at different speeds A peak in potential usersrsquochurnmay be followed by a peak in issue reports (or vice-versa) ie thepeaks may not necessarily occur at the same time Differently fromthe Eucledian distance which assumes that a point 119894 in a time-seriesmust be aligned to the 119894119905ℎ point in another time-series [38 75] DTWallows the variance in the offset between two time-series [59]

To employ DTW we use the tslearn [68] which is a machinelearning toolkit specialized for time series analysis The DTW com-putation produces a matrix that shows the distance between everytwo points in the graph of the two time-series The algorithm isdesigned in a way to choose the closest point that respects the

5

581

582

583

584

585

586

587

588

589

590

591

592

593

594

595

596

597

598

599

600

601

602

603

604

605

606

607

608

609

610

611

612

613

614

615

616

617

618

619

620

621

622

623

624

625

626

627

628

629

630

631

632

633

634

635

636

637

638

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

639

640

641

642

643

644

645

646

647

648

649

650

651

652

653

654

655

656

657

658

659

660

661

662

663

664

665

666

667

668

669

670

671

672

673

674

675

676

677

678

679

680

681

682

683

684

685

686

687

688

689

690

691

692

693

694

695

696

Table 2 The predominant topics in alternativetonet

Release and updatesKeywords ldquo119907119890119903119904119894119900119899rdquo ldquo119897119900119899119892119890119903 rdquo ldquo119903119890119897119890119886119904119890rdquo ldquo119889119900119908119899119897119900119886119889rdquo ldquo119900 119891 119891 119894119888119894119886119897rdquoExamples - The program seems to be no longer updated Last versions (2018) can be still downloaded from the official website

- Firefox version 60+ (Quantum) is better than Chrome Compare to previous versions of Firefox

Bookmarks FunctionalityKeywords ldquo119887119900119900119896119898119886119903119896rdquo ldquo119901119886119899119890119897rdquo ldquo119891 119886119907119900119903119894119905119890rdquo ldquo119904119906119901119901119900119903119905rdquo ldquo119904119910119899119888rdquoExamples - Firefox now allows bookmarks importing from Chrome which is a huge plus

- With the recent upgrade to 20 (including the bookmarks syncing feature that I liked in Firefox) Vivaldi just got better

Flash supportKeywords ldquo119901119897119906119892119894119899rdquo ldquo119907119894119889119890119900rdquo ldquo119894119899119904119905119886119897119897rdquo ldquo119891 119897119886119904ℎrdquo ldquo119898119890119889119894119886 minus 119901119897119886119910119890119903 rdquoExamples - Since its the only way to watch Flash video on an iPhone without jailbreaking Skyfire is a killer app thats worth getting for anyone Playback of Flash video over WiFi is

flawless and the quality is at very least watchable

Memory issuesKeywords ldquo119898119890119898119900119903119910rdquo ldquo119877119860119872rdquo ldquo119897119886119892rdquo ldquo119897119894119898119894119905rdquo ldquoℎ119890119886119907119910rdquoExamples - Great C++ IDE which balances memory usage and indexing solution Contains some basic refactoring functions Good for large projects and limited RAM with remaining

some capabilities of more complex C++ IDEs- Eclipse is really lagging after the last update The memory is filled whenever I run a Java swing applet- Chrome is so heavy it lags alot when I open multiple tabs

Privacy and securityKeywords ldquo119901119903119894119907119886119888119910rdquo ldquo119904119890119888119906119903119894119905119910rdquo ldquo119894119898119901119900119903119905119886119899119905rdquo ldquo119886119897119897119900119908rdquo ldquo119888119900119899119905119903119900119897rdquoExamples - Firefox allows far more control over your privacy than does Chrome (or even Chromium on which Chrome is based)

- Google Chrome is a great browser and a lot better than Vivaldi Browser Chrome keeps updating with cool stuff like the new cast feature better extensions and lot betterprivacy and security Vivaldi has more customizing features but has a long way to go to compete with Chrome

BugsCrashesKeywords ldquo119888119903119886119904ℎrdquo ldquo119887119906119892rdquo ldquo119891 119894119909rdquo ldquo119890119909119901119890119903119894119890119899119888119890rdquo ldquo119894119904119904119906119890rdquoExamples - Lots of users experience Adobe Flash Player hangs or crashes I dont know why it isnt fixed yet

If Chrome ever got its proxy support bugs fixed it would leap to the top of the list and become pretty much the only web browser Id ever use- I worked with NetBeans for a couple of years and always loved it but for professional users the yearly fee is well spent Nicer interface better Code Intelligence moreoptions to fine tune what happens on saving a file (eg saving or uploading to various locations) You get a couple of feature update each year plus bugfixes whereasNetBeans is sometimes very slow to update things (support for new version of a language) or to fix known bugs

Table 3 The predominant topics in alternativetonet (continued)

Debug optionsKeywords ldquo119891 119890119886119905119906119903119890rdquo ldquo119889119890119907119890119897119900119901119898119890119899119905rdquo ldquo119889119890119887119906119892rdquo ldquo119894119899119905119890119892119903119886119905119894119900119899rdquo ldquo119901119890119903 119891 119890119888119905rdquoExamples - Not enough debug and run options just a text editor Atom has not got any solid debugging features (and packages) out of the box or at all

- This is my favorite code editor It is open source and portable you can get a portable version that will fit onto a flash drive if youwant It is just as configurableas Sublime Text and has a large community producing great plugins it has GIT support built in and excellent support for Javascript HTML CSS and Pythonand others of course It has an integrated debugging system

Simplistic designKeywords ldquo119904119894119898119901119897119890rdquo ldquo119890119889119894119905119900119903 rdquo ldquo119897119894119892ℎ119905119908119890119894119892ℎ119905rdquo ldquo119889119890119904119894119892119899rdquo ldquo119899119900119905119890119901119886119889rdquoExamples - Really lightweight with some advanced not complete but easy to use auto complete features Simple design and customizable

Cross-platformKeywords ldquo119888119903119900119904119904 minus 119901119897119886119905 119891 119900119903119898rdquo ldquo119871119894119899119906119909rdquo ldquo119900119901119890119899 minus 119904119900119906119903119888119890rdquo ldquo119886119907119886119894119897119886119887119897119890rdquo ldquo119894119899119904119905119886119897119897rdquoExamples - Its cross platform and open source and bring the benefits of the JetBrains IDE platform which is well polished and powerful

- One of the best open source browsers that work on Linux and Windows

Code refactoring and auto-completionKeywords ldquo119888119900119889119890rdquo ldquordquo119886119906119905119900 minus 119888119900119898119901119897119890119905119894119900119899rdquo ldquo119903119890 119891 119886119888119905119900119903 rdquo ldquo119900119901119905119894119900119899rdquo ldquo119906119904119890rdquoExamples - For its young age it is well matured and a very powerful IDE with great CMake Support and other goodies like great refactoring tools and a sophisticated

auto-completion system

TestingKeywords ldquo119905119890119904119905rdquo ldquo119904119890119903119907119890119903 rdquo ldquo119889119890119901119897119900119910rdquo ldquo119908119890119887119904119894119905119890rdquo ldquo119904119890119905rdquoExamples - XAMPP is great tool to develop and test your website (particularly if it uses php and mysql databases) offline before putting it on a live server You could

also use XAMPP as an easy method of setting up a live server- Eclipse is far more superior than netbeans in unit testing functionality

DocumentationKeywords ldquo119889119900119888119906119898119890119899119905119886119905119894119900119899rdquo ldquo119904119890119903119907119890119903 rdquo ldquo119901119900119903119905rdquo ldquo119889119886119905119886119887119886119904119890rdquo ldquo119891 119900119903119906119898rdquoExamples - Easiest way to run a web server on Windows The performance is surprisingly good and lots of documentation on how to manage it

- Support is via forum only Documentation can be very poor particularly around product upgrades (one expects more from commercial packages) Due totheir poor documentation I lost several years of development databases when porting from an old to a new computer (they dont accurately document thelocation of the database files)- Eclipse lacks in adding plugins documentation

time constraints for continuity and monotonicity Finally two co-authors performed a manual analysis of the observed peaks in the

two time-series7 The goal of the manual analyses is to find whetherthe identified peaks are indeed likely related (eg the issue reports

7We share the data of our manual analyses at httpsdoiorg105281zenodo3610584

6

697

698

699

700

701

702

703

704

705

706

707

708

709

710

711

712

713

714

715

716

717

718

719

720

721

722

723

724

725

726

727

728

729

730

731

732

733

734

735

736

737

738

739

740

741

742

743

744

745

746

747

748

749

750

751

752

753

754

On the Relationship between User Churn and Software Issues MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea

755

756

757

758

759

760

761

762

763

764

765

766

767

768

769

770

771

772

773

774

775

776

777

778

779

780

781

782

783

784

785

786

787

788

789

790

791

792

793

794

795

796

797

798

799

800

801

802

803

804

805

806

807

808

809

810

811

812

Table 4 The extracted topics percentages distribution overthe three domains

Topics Browsers() IDEs() Web servers()Release and updates 18 17 21Bookmarks functionality 12 0 0Memory issues 23 22 11Privacy and security 8 0 0BugsCrashes 24 19 20Debug options 0 10 0Simplistic design 0 8 2Cross-platform 8 6 5Code refactoring and auto-completion 0 10 0Testing 0 2 17Documentation 0 2 15Miscellaneous 7 4 9

within a peak actually describe problems mentioned in the poten-tial usersrsquochurn) Again both co-authors analyze and discuss everyanalyzed sample

In this RQ we perform our analyses only in our studied projects(as opposed to analyzing all the software alternatives in their do-mains) As discussed in Section 3 it would be impracticable toextract the data from all the different Issue Tracking Systems of thesoftware alternatives in the studied domainsFindingsWe respectively observe a cross-correlation of 076 072and 070 in Firefox Eclipse and Apache These cross-correlationscores indicate that there exist a global commonality between thetrends of the two time-series regardless of the timing factor

Regarding our DTW analysis we observe that the time fromthe filing of an issue report to the time of a spark in potentialusersrsquochurn span from 2 to 4 weeks (on the median for the threeprojects) Additionally our DTW correlation reveals a bidirectionalrelationship between the peaks in issue reports and the peaks inpotential usersrsquochurn For the cases where a peak in issue reportsleads to a peak in potential usersrsquochurn it is intuitive to think thatthe peak in potential usersrsquochurn is due to the frustration generatedby the issues described in the reports On the other hand the caseswhere a peak in potential usersrsquochurn is followed by a peak inissue reports may occur due to users being frustrated over non-obvious issues For example if we consider the potential user churndepicted in Figure 3 (ie ldquoJupyter is slow on Firefoxrdquo) the users mightnot find it obvious that such a problem should be reported to theFirefox team (eg users may simply interpret it as a characteristicof Firefox) Therefore in such cases issue reports would be filledonly a while after the potential usersrsquochurn have been expressed

We are specifically interested in studying the relationship be-tween reported issues (ie the documented issues in the trackersystems) and the potential user churn Our rationale is that under-standing the characteristics of issues that are associated with userchurn can help developers to prioritize such issues and avoid futurechurn (thereby retaining more users)

Finally Table 5 shows a subset of the examples found in ourmanual analyses We can indeed observe the apparent relationshipbetween the comments within the potential usersrsquochurn and thedescriptions (or titles) of issue reports The examples are followinga unidirectional link in which the issue report that occurs at acertain time would induce the potential user churn at a later timeSummary Our results suggest that there exists a significant cor-relation between software issues and potential user churn

RQ2 Can we use machine learning models topredict the rate of potential user churnMotivation In the practice most of the issue reports undergo hu-man estimations of the priority and the severity [5] usually basedon internal business-oriented factors (such as ROI [25]) Issuesmight be prioritized by their recency the affected platforms theversions of the software where the issue is present Feeding a ma-chine learning model to identify the potential user churn that arecaused by issues may help automate the issue report prioritizationprocess and ultimately save huge costs in software developmentApproach We train three different machine learning models topredict low and high user churn rates by using information from is-sue reports The first model is a Feed-Forward Neural Network withtwo hidden layers developed by using the framework Keras [17]Keras is widely used to train simple neural networks as it hasa higher abstraction than TensorFlow [1] Feed-Forward NeuralNetworks [64] are the most suitable type of neural networks tobe trained on tabular data [27 33 50] This network evaluates allthe possible combinations of different metrics values to guide thepredictions

Our second model is the XGboost model which is a gradientboosting tree [12 13] XGboost is widely used for binary classifica-tion problems Research has shown that XGboost performs betterthan neural networks and classic classification machine learningalgorithms in problems dealing with tabular data [14 45 69]

We also train Random Forest models [35] Random Forest isa classical machine learning algorithm which is widely used inproblems dealing with tabular data [26 51 57] In addition Ran-dom Forest models have been widely used in software engineeringresearch such as in defect prediction [67]

The goal of ourmodels is to predict the rate of potential usersrsquochurngiven the characteristics of issue reports The data used to fit ourmodels is the data obtained from our DTW associations Simply putthe DTW provides us with 119875 pairs of issue reports 119868 and potentialusersrsquochurn 119862 (ie 119875 lt 119868 119862 gt) Therefore we study the character-istics of the issues 119868 to predict the amount of potential user churn119862 We split the distribution of 119862 into two percentiles (ie low andhigh) The low percentile being from 0 to 50 and the high from51 to 100 Thus our models output dichotomous predictions iewhether the potential usersrsquochurn is low or high

The characteristics of issue reports that we include in our models(henceforth referred to as features) are described in Table 6 Thesefeatures are collected from the Issue Tracking systems of our threestudied projects We study features such as Component Hardwareand OS to account for the locality and the spread of the issuesFor example these features can indicate whether platform-specificissues or more general issues have more potential user churn

Other features such as the Severity and Priority showcase theprioritization used by developers The Severity and Priority areimportant for us to verify whether the prioritization performed bydevelopers is aligned with the rate of potential user churn

Features related to the type of the bugs (ie whether it is acrash) whether a version number is present or the target milestoneprovide us information regarding the tracking process of issuesThey are important to understand whether better-tracked issues

7

813

814

815

816

817

818

819

820

821

822

823

824

825

826

827

828

829

830

831

832

833

834

835

836

837

838

839

840

841

842

843

844

845

846

847

848

849

850

851

852

853

854

855

856

857

858

859

860

861

862

863

864

865

866

867

868

869

870

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

871

872

873

874

875

876

877

878

879

880

881

882

883

884

885

886

887

888

889

890

891

892

893

894

895

896

897

898

899

900

901

902

903

904

905

906

907

908

909

910

911

912

913

914

915

916

917

918

919

920

921

922

923

924

925

926

927

928

Table 5 Manual investigation of DTW correlation linking

FirefoxExample 1Correlation Linking Week 164 in issues to week 169 in switches (May 2018)Issue Report Title Firefox version 5303 appears high memory high resources usage and might causing for SSD damagesIssue Description Mozilla Firefox it is caching all of this data 4-5GB on my new SSD and its lost 4 from its health within 5 months due continuously writesoverwrites firefox cache on my SSDAltrnativeto Comment Switching to Chrome after version 5303 high memory is damaging my SSDExample 2Correlation Linking Week 66 in issues to week 68 in switches (November 2016)Issue Report Title Sessions are not cleared in the private windowIssue Description After closing private window (not a tab) and opening a new one I still logged inAltrnativeto Comment Chrome incognito functionality is more stable than Firefox Firefox doesnrsquot resets sessions

EclipseExample 1Correlation Linking Week 10 in issues to week 11 in switches (July 2015)Issue Report Title Not Java 8 CompatibleIssue Description When downloading Eclipse for the first time for Java development 64-bit (which is what my computer is) it is not running because it is not compatible with Java version 8

(which is the latest version of Java) but appears to be compatible with version 7 still -DosgirequiredJavaVersion=17Altrnativeto Comment Using Netbeans because Eclipse is not installingExample 2Correlation Linking Week 211 in issues to week 213 in switches (February 2019)Issue Report Title jar files in the exported application do not contain class filesIssue Description The application has many plugins It builds and runs well within eclipse but when I export the application using the eclipse export product wizard the jar files (corresponding to

the plugins) do not contain any class files they include meta-inf pluginxml and all the extra folders (eg lib assets etc when available) but they do NOT contain any class filesAltrnativeto Comment Didnrsquot find any problems in exporting jar (when recommending IntelliJ)

ApacheExample 1Correlation Linking Week 113 in issues to week 117 in switches (July 2017)Issue Report Title httpd 2426 no longer building against lua 531 or lua 534Issue Description Building Apache 2426 against lua 531 or 534 compiled from scratch (latest version as of this writing) does not work Compilation against lua 531 did work with 2425Altrnativeto Comment Some issues while building with lua (when recommending XAMPP)Example 2Correlation Linking Week 12 in issues to week 13 in switches (September 2019)Issue Report Title Apache Server is restarted time to time (Release 2410)Issue Description I am using Apache release 2410 in Windows Server 2008 Time to time the Apache server is getting restarted The service is unavailable for 3-5 minutes and after that the service

is started up automatically This issue occurs frequently twice a day two days once etcAltrnativeto Comment XAMPP is more stable I am experiencing random restarts throughout the day when using Apache

Table 6 The issue reports attributes for Firefox Eclipse andApache

Metric Description

Component consists of different components of the system such as bookmarkstheme toolbar etc

Status describes the status of the issue such as unconfirmed open re-solved etc

Resolution describes the resolution type of the issue such as fixed duplicateinvalid etc

Severity the assigned severity such as blocker critical normal etcHardware the different hardware affected by the issue such as desktop mobile

32 or 64 bits etcOS the different OS affected by the issue such as OSX Windows etcPriority the assigned priority from 1 to 5Type the assigned issue type such as defect enhancement or taskTime Alive the time difference between the date opened and the date resolved

in minutesTime since last change the time difference between the date opened and the date of last

change in minutesIs a Crash a boolean value to indicate if it is a crash reportHas a Version Number a boolean whether the issue report has a version numberHas a Target Milestone a boolean whether the issue report has a target numberHas a Regression Range a boolean value to indicate if the bug causes a regression testingNumber of comments the number of comments for the reportNumber of votes the number of votes for the report which validates the integrity of

the report by other usersNumber of Blocks the number of issues that are blocked by that issue until it is re-

solved

(ie with more tracking information) are associated with potentialusersrsquochurn

The number of comments votes and blocks showcase the effortinvested on discussing an issue Finally the time-alive and the time-since-last-change are two features related to the life-cycle of an issuereport Longer time-alive values indicate a lingering issue that hasnot been resolved yet while a short time-since-last-change values

indicate issues that have been recently addressed (and thus can stillbe reoccurring)

Our studied features are inspired by previous research in soft-ware engineering that have built machine learning models usinginformation present in issue reports [15 16 16 18 19 29 32 39 74]

We chose features that are common across the issue reports ofFirefox Eclipse and Apache We filtered out features that are uniqueto an issue tracker system only to ensure a fair comparison betweenour studied issue reports Afterwards we performed Spearmancorrelation tests [44] on the selected features in which each featureis tested against the set of all the other features We did not observeany correlation obtaining a value higher than 07 which suggeststhat our selected features are not correlated and can be safely fedinto our machine learning models [30]

To evaluate the performance of our models we use the Area Un-der the Curvemetric (AUC) AUC is useful for evaluating our modelsbecause our models output probabilities Therefore AUC showsthe discrimination power of our models in every probability thresh-old (unlike Precision Recall or F-measure which are limited toevaluating models at a single probability threshold [66]) The AUCvalues range from 0 to 1 An AUC of 05 denotes a random guessingmodel while an AUC of 1 denotes a perfect distinguishing powerAn AUC of 0 denotes a model with perfect inverse predictions

Given the temporal nature of our data (ie future issue reportscannot be used to predict the potential user churn of past issuereports) we adopt a leave-one-out validation approach to obtainour AUC values

First our issue reports have already been sorted when we per-form the time-series analyses Second we split our data into two

8

929

930

931

932

933

934

935

936

937

938

939

940

941

942

943

944

945

946

947

948

949

950

951

952

953

954

955

956

957

958

959

960

961

962

963

964

965

966

967

968

969

970

971

972

973

974

975

976

977

978

979

980

981

982

983

984

985

986

On the Relationship between User Churn and Software Issues MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea

987

988

989

990

991

992

993

994

995

996

997

998

999

1000

1001

1002

1003

1004

1005

1006

1007

1008

1009

1010

1011

1012

1013

1014

1015

1016

1017

1018

1019

1020

1021

1022

1023

1024

1025

1026

1027

1028

1029

1030

1031

1032

1033

1034

1035

1036

1037

1038

1039

1040

1041

1042

1043

1044

sets one containing 80 of the data ie the training set and an-other containing 20 ie the validation set In the first iterationof our models we use 80 of the data to train our models and weuse the first element of the validation set to test out models Thenthe leave-one-out validation iteratively increases the training setsize by one element (from the validation set) and predicts the nextelement of the validation set This process is repeated until thetraining set reaches the size of 119873 minus 1 (where 119873 is the size of ourdata set) Finally our obtained AUC values are the aggregation(ie median) of all the AUC values obtained in the leave-one-outiterationsFindingsTheAUC values obtained for our threemodels are shownin our online appendix8 The XGboost model performs the best inthe three studied projects obtaining AUC values that exceed 083There exists a subtle difference between the AUC values of theXGboost model and the Feed Forward Neural Networks AUCssince both the XGboost model and the Neural Networks can graspthe complex relationships in our issue reports dataSummary Our results suggest that machine learning models suchas XGboost and Feed Forward Neural Networks can predict therate of user churn with relatively high accuracies (in terms of AUC)Such predictions could help in the prioritization of issue reports

RQ3 What are the most important factorsrelated to potential user churnMotivation In RQ2 we observe that machine learning modelscan be used to predict the rate of potential usersrsquochurn Howeverdevelopers would hardly blindly trust a machine learning modelto help on their decisions (and they should not do so) Developerswould mostly benefit from understanding the reasons behind thepredictions of our machine learning models so that they couldpossibly adapt (or not) their development process Therefore inRQ3 we investigate the most important features in the predictionsof our machine learning modelApproach In this RQ we study the most important features in thepredictions of the XGboost models since they obtain the best AUCvalues in our three studied projects

To find the most important features we adopt a simple featureextraction algorithm [61 70] Consider that our models are trainedon a feature set 119883 = 1199091 1199092 119909119899 (as explained in RQ2) The featureextraction process consists of iteratively (i) removing each feature 119909119894from our models and (ii) computing the AUC without each feature119909119894 (by using the same leave-one-out validation process explainedin RQ2) The AUC values from the models without a feature 119909119894 arecompared to the models containing all the features (ie we takethe difference 119860119880119862119900119903119894119892119894119899119886119897 minus119860119880119862119903119890119898119900119907119890119889 ) The higher the drop inthe AUC value caused by removing a certain feature 119909119894 the higherthe importance of such a feature 119909119894 [4 9 10]Findings Table 7 shows our obtained results after performing thefeature extraction process The table shows the drop in the AUC foreach feature (ie ΔAUC) We also show the minimum maximummean and median values of the features in the 119897119900119908 user churn rate(ie 0) and ℎ119894119892ℎ user churn rate (ie 1) prediction classes

8httpsdoiorg105281zenodo3610584

The Issue Alive Time is the most important feature obtaining thehighest ΔAUC in our three studied projects This result suggeststhat it is unhealthy to leave issues hanging for a long time as theycan be perceived as a reason for users to churn

The Number of Comments obtains a considerable ΔAUC in ourthree studied systems The association between the Number ofComments and the rate of user churnmay occur due to themultitudenumber of users being affected by an issue

The Time-since-last-change is also an important feature in theFirefox and Eclipse projects This result suggests that issues withrecurring changes or modifications may signal a certain instabilityin the fixing process which might cause a potential user churn inthe future

It is worthy noting that features such as Severity Priority andType of Bugs do not obtain a high importance in the predictionswhich might suggest that the existing prioritization fields used inissue reports are misaligned with the potential user churnSummaryOur results suggest that long lived issues can potentiallylead to more potential userrsquos churn and that the existing processesfor prioritizing issues should be augmented to capture the highlyinteractive and long lived issues

5 THREATS TO VALIDITYConstruct Validity The main construct validity of our study isrelated to the assumption of a potential user churn The alterna-tivetonet website does not record exactly when a user has chosen asoftware alternative over another software Instead alternativetonetrecords that a user thinks that a software may be a good alternativeto another software The motivation behind the act of signaling analternative is not known by us For example there might be usersthat are simply driven by the passion of contributing to the alterna-tivetonet community which lead them to provide several opinionsas to which software could make good alternatives to others Wehave deliberately adopted the term potential user churn instead ofsimply user churn due to this unknown motivations behind signal-ing alternatives Moreover it is reasonable to think that if a givensoftware has received a considerable amount of recommendationfor alternatives this may likely indeed represent that users areconsidering to change the software

Internal Validity Our internal threats to validity are mainlyrelated to (i) our chosen features and (ii) our prediction classes Wesplit the peaks of potential user churn into the low and high classesOther research may find different results if the potential user churnis modeled in different way In addition we acknowledge that ourset of metrics is not exhaustive For example we have not studiedwhether the presence of a stack-trace in an issue report can berelated to future potential user churn We plan to (i) model thepotential user churn as a continuous variable and (ii) extend our setof features to predict the potential user churn in future research

External Validity Our study targeted three domains that areweb servers IDEs and web browsers These three domains werechosen due to their availability since we could fetch their issuereports data Although our study showed common factors in ouranalyses we cannot generalize our results to other domains orprojects with a different scale (ie smaller projects) Regardless ourwork provides insights of the possible reasons behind user churn

9

1045

1046

1047

1048

1049

1050

1051

1052

1053

1054

1055

1056

1057

1058

1059

1060

1061

1062

1063

1064

1065

1066

1067

1068

1069

1070

1071

1072

1073

1074

1075

1076

1077

1078

1079

1080

1081

1082

1083

1084

1085

1086

1087

1088

1089

1090

1091

1092

1093

1094

1095

1096

1097

1098

1099

1100

1101

1102

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

1103

1104

1105

1106

1107

1108

1109

1110

1111

1112

1113

1114

1115

1116

1117

1118

1119

1120

1121

1122

1123

1124

1125

1126

1127

1128

1129

1130

1131

1132

1133

1134

1135

1136

1137

1138

1139

1140

1141

1142

1143

1144

1145

1146

1147

1148

1149

1150

1151

1152

1153

1154

1155

1156

1157

1158

1159

1160

Table 7 The most important features for Firefox Eclipse and Apache ranked by their drop in AUC

Factor ΔAUC Min 0 Max 0 Mean 0 Median 0 Min 1 Max 1 Mean 1 Median 1

FirefoxTime Alive 013 0 120 1102 13 10 1487 4872 63OS All 007 0 1 014 0 0 1 083 1Number of Comments 006 1 65 894 12 1 458 2811 45Time Since Last Change 006 9 30 1090 22 0 5 410 3Has a Target Milestone 005 0 1 022 0 0 1 079 1Has a Version Number 004 0 1 033 0 0 1 064 1Component UI Web Payments 003 0 1 042 0 0 1 061 1

EclipseTime Alive 012 0 103 901 12 8 1334 4396 49Has a Target Milestone 006 0 1 019 0 0 1 082 1Time since last change 006 7 27 851 19 0 6 312 3Has a Version Number 005 0 1 034 0 0 1 077 1Component Debug 004 0 1 047 0 0 1 072 1Number of Comments 003 1 72 1092 11 1 632 2701 72OS All 002 0 1 050 0 0 1 068 1

ApacheTime Alive 014 0 136 1132 14 14 1521 5310 70Component Utils 010 0 1 022 0 0 1 091 1Has a Version Number 009 0 1 038 0 0 1 086 1Number of Comments 005 1 92 1232 6 1 352 4207 79OS All 004 0 1 033 0 0 1 079 1Status REOPENED 003 0 1 040 0 0 1 073 1Hardware All 002 0 1 051 0 0 1 078 1

6 RELATEDWORKGiven that we study the relationship between potential user churnand issue reports our research is related to the areas of usercustomerchurn and software quality Therefore we survey the related re-search around these two areas in this section

UserCustomerChurnUser or customer churn has beenwidelystudied in the area of telecommunication services [3 28 71] Aminet al [3] proposed a cross-company machine learning model topredict customer churn The authors also propose the use of datatransformation (eg by using the log function) to improve the train-ing data Ullah et al [71] used feature selection techniques such asInformation Gain and Correlation Attributes Ranking Filter to selectwhich features better explained the churn behaviour of differentcustomer groups The authors observed that a Random Forest modelperformed the best in their study While the Business to Customers(B2C) churn has been widely studied Figalist et al [28] recentlyinvestigated the Business to Business (B2B) churn ie when thecustomer business changes shifts to the services provided by thecompetition Figalist et al [28] allude that although B2B relation-ships tend to be more stable they have a much bigger financial im-pact when they change In terms of software engineering there hasbeen a lack of studies that have investigated usercostumer churndirectly Instead much research has been invested on customerreviews regarding software products [6 46 47] Noei et al [46]studied which features from mobile apps are associated with betterranks in the Google Play store While the work by Noei et al [46]does not study the customer churn directly striving for better ranksin Google Play may avoid user churn in mobile apps Bavota etal [6] investigated the relationship between the usage of fault- andchange-prone APIs in Android apps and user reviews Differentlyfrom all the aforementioned work our study investigates the userchurn is software products

Software Quality Although there exist a lack of studies ana-lyzing the user churn of software productsservices there has beena considerable amount of research on software quality [5 34 39 6667 74] Research on software quality is important since a quality

software is more stable and less defect-prone Ultimately such char-acteristics are tightly related to the observations of our study(iesoftware issues are correlated with user churn) A prominent areaof software quality research in software engineering is the defectprediction area [66 67] Tantithamthavorn et al [66] studied howmuch improvement can be obtained by automatically optimizingdefect prediction models Their research shows that automatic op-timization can yield significant or insignificant performance gainsdepending on the machine learning algorithm Other considerableamount of research has been invested on the triaging part of issuereports [5] and on the effort estimation of an issue report (in termsof time to address an issue) [39 74] In this paper we complementprior research by studying the relationship shared between issuereports (eg defects or required enhancements) with the potentialuser churn of users

7 CONCLUSIONIn this work we study the data available on alternativetonet tobetter understand the relationship between software issues and thepotential user churn of users Having observed that user concernsare tightly related to software issues (eg bugs) we investigatethe relationship between issue reports and the potential user churnof users Our study reveals key issues that must be addressed forthe success of a software product (depending on the domain) Forexample we observe that the potential user churn of users may betightly related to the lack of a robust documentation and support fortesting tools (in the ldquoIDErdquo and ldquoWeb Serverrdquo domains) Finally ourmachine learning models reveal that (i) the longer the issue takes tobe fixed the higher the chances of user churn and (ii) issues withinmore general software components are more likely to be associatedwith user churn Finally we suggest that the current prioritizationperformed by developers should be augmented to encompass thelong lived and highly interactive issues In overall our study sug-gests that the prioritization process of issues can be improved byconsidering the potential user churn of users associated with suchissues

10

1161

1162

1163

1164

1165

1166

1167

1168

1169

1170

1171

1172

1173

1174

1175

1176

1177

1178

1179

1180

1181

1182

1183

1184

1185

1186

1187

1188

1189

1190

1191

1192

1193

1194

1195

1196

1197

1198

1199

1200

1201

1202

1203

1204

1205

1206

1207

1208

1209

1210

1211

1212

1213

1214

1215

1216

1217

1218

On the Relationship between User Churn and Software Issues MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea

1219

1220

1221

1222

1223

1224

1225

1226

1227

1228

1229

1230

1231

1232

1233

1234

1235

1236

1237

1238

1239

1240

1241

1242

1243

1244

1245

1246

1247

1248

1249

1250

1251

1252

1253

1254

1255

1256

1257

1258

1259

1260

1261

1262

1263

1264

1265

1266

1267

1268

1269

1270

1271

1272

1273

1274

1275

1276

REFERENCES[1] Martiacuten Abadi Paul Barham Jianmin Chen Zhifeng Chen Andy Davis Jeffrey

Dean Matthieu Devin Sanjay Ghemawat Geoffrey Irving Michael Isard Man-junath Kudlur Josh Levenberg Rajat Monga Sherry Moore Derek G MurrayBenoit Steiner Paul Tucker Vijay Vasudevan Pete Warden Martin Wicke YuanYu and Xiaoqiang Zheng 2016 Tensorflow A system for large-scale machinelearning In 12th USENIX Symposium on Operating Systems Design and Imple-mentation (OSDI 16) 265ndash283

[2] W Abdelmoez Mohamed Kholief and Fayrouz M Elsalmy 2012 Bug fix-time pre-diction model using naiumlve bayes classifier In 2012 22nd International Conferenceon Computer Theory and Applications (ICCTA) IEEE 167ndash172

[3] Adnan Amin Babar Shah Asad Masood Khattak Thar Baker and Sajid Anwar2018 Just-in-time customer churn prediction with and without data transforma-tion In 2018 IEEE Congress on Evolutionary Computation (CEC) IEEE 1ndash6

[4] Hafeez Ullah Amin Aamir Saeed Malik Rana Fayyaz Ahmad Nasreen BadruddinNidal Kamel Muhammad Hussain and Weng-Tink Chooi 2015 Feature extrac-tion and classification for EEG signals using wavelet transform and machinelearning techniques Australasian physical amp engineering sciences in medicine 381 (2015) 139ndash149

[5] John Anvik Lyndon Hiew and Gail C Murphy 2006 Who should fix this bugIn Proceedings of the 28th international conference on Software engineering ACM361ndash370

[6] Gabriele Bavota Mario Linares-Vasquez Carlos Eduardo Bernal-Cardenas Mas-similiano Di Penta Rocco Oliveto and Denys Poshyvanyk 2014 The impactof api change-and fault-proneness on the user ratings of android apps IEEETransactions on Software Engineering 41 4 (2014) 384ndash407

[7] Pamela Bhattacharya and Iulian Neamtiu 2011 Bug-fix time prediction modelscan we do better In Proceedings of the 8thWorking Conference on Mining SoftwareRepositories ACM 207ndash210

[8] Steven Bird and Edward Loper 2004 NLTK the natural language toolkit InProceedings of the ACL 2004 on Interactive poster and demonstration sessionsAssociation for Computational Linguistics 31

[9] Andrew P Bradley 1997 The use of the area under the ROC curve in theevaluation of machine learning algorithms Pattern recognition 30 7 (1997)1145ndash1159

[10] Leo Breiman 2001 Random forests Machine learning 45 1 (2001) 5ndash32[11] Albert Caruana and Michael T Ewing 2010 How corporate reputation quality

and value influence online loyalty Journal of Business Research 63 9-10 (2010)1103ndash1110

[12] Tianqi Chen and Carlos Guestrin 2016 Xgboost A scalable tree boosting systemIn Proceedings of the 22nd acm sigkdd international conference on knowledgediscovery and data mining ACM 785ndash794

[13] Tianqi Chen Tong He Michael Benesty Vadim Khotilovich and Yuan Tang 2015Xgboost extreme gradient boosting R package version 04-2 (2015) 1ndash4

[14] Wenbin Chen Kun Fu Jiawei Zuo Xinwei Zheng Tinglei Huang and WenjuanRen 2017 Radar emitter classification for large data set based on weighted-xgboost IET Radar Sonar amp Navigation 11 8 (2017) 1203ndash1207

[15] Morakot Choetkiertikul Hoa Khanh Dam Truyen Tran and Aditya Ghose 2015Predicting delays in software projects using networked classification (t) In 201530th IEEEACM International Conference on Automated Software Engineering (ASE)IEEE 353ndash364

[16] Morakot Choetkiertikul Hoa Khanh Dam Truyen Tran Aditya Ghose and JohnGrundy 2017 Predicting delivery capability in iterative software developmentIEEE Transactions on Software Engineering 44 6 (2017) 551ndash573

[17] Franccedilois Chollet 2017 Keras[18] Daniel Alencar Da Costa Shane McIntosh Uiraacute Kulesza Ahmed E Hassan and

Surafel Lemma Abebe 2018 An empirical study of the integration time of fixedissues Empirical Software Engineering 23 1 (2018) 334ndash383

[19] Daniel Alencar da Costa Shane McIntosh Christoph Treude Uiraacute Kulesza andAhmed E Hassan 2018 The impact of rapid release cycles on the integrationdelay of fixed issues Empirical Software Engineering 23 2 (2018) 835ndash904

[20] Piew Datta Brij Masand Deepak R Mani and Bin Li 2000 Automated cellularmodeling and prediction on a large scale Artificial Intelligence Review 14 6 (2000)485ndash502

[21] Kushal S Dave Vishal Vaingankar Sumanth Kolar and Vasudeva Varma 2013Timespent based models for predicting user retention In Proceedings of the 22ndinternational conference on World Wide Web ACM 331ndash342

[22] William H DeLone and Ephraim R McLean 1992 Information systems successThe quest for the dependent variable Information systems research 3 1 (1992)60ndash95

[23] William H DeLone and Ephraim R McLean 2003 Model of information systemssuccess a ten years update Journal of Management (2003)

[24] Gideon Dror Dan Pelleg Oleg Rokhlenko and Idan Szpektor 2012 Churnprediction in new users of Yahoo answers In Proceedings of the 21st InternationalConference on World Wide Web ACM 829ndash834

[25] Khaled El Emam 2005 The ROI from software quality Auerbach Publications[26] Katherine Ellis Jacqueline Kerr Suneeta Godbole Gert Lanckriet David Wing

and Simon Marshall 2014 A random forest classifier for the prediction of energy

expenditure and type of physical activity from wrist and hip accelerometersPhysiological measurement 35 11 (2014) 2191

[27] David Enke and Suraphan Thawornwong 2005 The use of datamining and neuralnetworks for forecasting stock market returns Expert Systems with applications29 4 (2005) 927ndash940

[28] Iris Figalist Christoph Elsner Jan Bosch and Helena Holmstroumlm Olsson 2019Customer churn prediction in B2B contexts In International Conference on Soft-ware Business Springer 378ndash386

[29] Emanuel Giger Martin Pinzger and Harald Gall 2010 Predicting the fix timeof bugs In Proceedings of the 2nd International Workshop on RecommendationSystems for Software Engineering ACM 52ndash56

[30] Frank E Harrell Jr 2015 Regression modeling strategies with applications to linearmodels logistic and ordinal regression and survival analysis Springer

[31] Steffen Hedegaard and Jakob Grue Simonsen 2013 Extracting usability and userexperience information from online user reviews In Proceedings of the SIGCHIConference on Human Factors in Computing Systems ACM 2089ndash2098

[32] Yujuan Jiang Bram Adams and Daniel M German 2013 Will my patch make itand how fast Case study on the linux kernel In Proceedings of the 10th WorkingConference on Mining Software Repositories IEEE Press 101ndash110

[33] Guolin Ke Jia Zhang Zhenhui Xu Jiang Bian and Tie-Yan Liu 2019 TabNNa universal neural network solution for tabular data httpsopenreviewnetforumid=r1eJssCqY7

[34] Foutse Khomh Brian Chan Ying Zou and Ahmed E Hassan 2011 An entropyevaluation approach for triaging field crashes A case study of mozilla firefox InReverse Engineering (WCRE) 2011 18th Working Conference on IEEE 261ndash270

[35] Andy Liaw and Matthew Wiener 2002 Classification and regression by randomforest R news 2 3 (2002) 18ndash22

[36] Erik Linstead Paul Rigor Sushil Bajracharya Cristina Lopes and Pierre Baldi2007 Mining eclipse developer contributions via author-topic models In FourthInternational Workshop on Mining Software Repositories (MSRrsquo07 ICSE Workshops2007) IEEE 30ndash30

[37] Xi Long Wenjing Yin Le An Haiying Ni Lixian Huang Qi Luo and Yan Chen2012 Churn analysis of online social network users using data mining techniquesIn Proceedings of the international MultiConference of Engineers and ConputerScientists Vol 1

[38] James MacQueen 1967 Some methods for classification and analysis of multivari-ate observations In Proceedings of the fifth Berkeley symposium on mathematicalstatistics and probability Vol 1 Oakland CA USA 281ndash297

[39] Lionel Marks Ying Zou and Ahmed E Hassan 2011 Studying the fix-timefor bugs in large open source projects In Proceedings of the 7th InternationalConference on Predictive Models in Software Engineering ACM 11

[40] Andrew Kachites McCallum 2002 Mallet A machine learning for languagetoolkit (2002)

[41] Miloš Milošević Nenad Živić and Igor Andjelković 2017 Early churn predic-tion with personalized targeting in mobile social games Expert Systems withApplications 83 (2017) 326ndash332

[42] Audris Mockus Roy T Fielding and James Herbsleb 2000 A case study ofopen source software development the Apache server In Proceedings of the 22ndinternational conference on Software engineering Acm 263ndash272

[43] Gustavo Percio Zimmermann Montesdioca and Antocircnio Carlos Gastaud Maccedilada2015 Measuring user satisfaction with information security practices Computersamp Security 48 (2015) 267ndash280

[44] Leann Myers and Maria J Sirois 2004 Spearman correlation coefficients differ-ences between Encyclopedia of statistical sciences 12 (2004)

[45] Didrik Nielsen 2016 Tree boosting with XGBoost-why does XGBoost win EveryMachine Learning Competition Masterrsquos thesis NTNU

[46] Ehsan Noei Daniel Alencar Da Costa and Ying Zou 2018 Winning the appproduction rally In Proceedings of the 2018 26th ACM Joint Meeting on EuropeanSoftware Engineering Conference and Symposium on the Foundations of SoftwareEngineering ACM 283ndash294

[47] Ehsan Noei Feng Zhang Shaohua Wang and Ying Zou 2019 Towards pri-oritizing user-related issue reports of mobile applications Empirical SoftwareEngineering 24 4 (2019) 1964ndash1996

[48] Ehsan Noei Feng Zhang and Ying Zou 2019 Too Many User-Reviews WhatShould App Developers Look at First IEEE Transactions on Software Engineering(2019)

[49] Derek Orsquocallaghan Derek Greene Joe Carthy and Paacutedraig Cunningham 2015An analysis of the coherence of descriptors in topic modeling Expert Systemswith Applications 42 13 (2015) 5645ndash5657

[50] Ya A Pachepsky Dennis Timlin and GY Varallyay 1996 Artificial neural net-works to estimate soil water retention from easily measurable data Soil ScienceSociety of America Journal 60 3 (1996) 727ndash733

[51] Mahesh Pal 2005 Random forest classifier for remote sensing classificationInternational Journal of Remote Sensing 26 1 (2005) 217ndash222

[52] Sebastiano Panichella Andrea Di Sorbo Emitza Guzman Corrado A VisaggioGerardo Canfora and Harald C Gall 2015 How can i improve my app classifyinguser reviews for software maintenance and evolution In Software maintenanceand evolution (ICSME) 2015 IEEE international conference on IEEE 281ndash290

11

1277

1278

1279

1280

1281

1282

1283

1284

1285

1286

1287

1288

1289

1290

1291

1292

1293

1294

1295

1296

1297

1298

1299

1300

1301

1302

1303

1304

1305

1306

1307

1308

1309

1310

1311

1312

1313

1314

1315

1316

1317

1318

1319

1320

1321

1322

1323

1324

1325

1326

1327

1328

1329

1330

1331

1332

1333

1334

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

1335

1336

1337

1338

1339

1340

1341

1342

1343

1344

1345

1346

1347

1348

1349

1350

1351

1352

1353

1354

1355

1356

1357

1358

1359

1360

1361

1362

1363

1364

1365

1366

1367

1368

1369

1370

1371

1372

1373

1374

1375

1376

1377

1378

1379

1380

1381

1382

1383

1384

1385

1386

1387

1388

1389

1390

1391

1392

[53] Chitra Phadke Huseyin Uzunalioglu Veena B Mendiratta Dan Kushnir andDerek Doran 2013 Prediction of subscriber churn using social network analysisBell Labs Technical Journal 17 4 (2013) 63ndash76

[54] Peter C Rigby Daniel M German and Margaret-Anne Storey 2008 Open sourcesoftware peer review practices a case study of the apache server In Proceedingsof the 30th international conference on Software engineering ACM 541ndash550

[55] Martin P Robillard and Robert Deline 2011 A field study of API learning obstaclesEmpirical Software Engineering 16 6 (2011) 703ndash732

[56] Michael Roumlder Andreas Both and Alexander Hinneburg 2015 Exploring thespace of topic coherence measures In Proceedings of the eighth ACM internationalconference on Web search and data mining ACM 399ndash408

[57] Victor Francisco Rodriguez-Galiano Bardan Ghimire John Rogan Mario Chica-Olmo and Juan Pedro Rigol-Sanchez 2012 An assessment of the effectivenessof a random forest classifier for land-cover classification ISPRS Journal of Pho-togrammetry and Remote Sensing 67 (2012) 93ndash104

[58] Hiroaki Sakoe and Seibi Chiba 1978 Dynamic programming algorithm opti-mization for spoken word recognition IEEE transactions on acoustics speech andsignal processing 26 1 (1978) 43ndash49

[59] Stan Salvador and Philip Chan 2007 Toward accurate dynamic time warping inlinear time and space Intelligent Data Analysis 11 5 (2007) 561ndash580

[60] Peter B Seddon 1997 A respecification and extension of the DeLone and McLeanmodel of IS success Information systems research 8 3 (1997) 240ndash253

[61] Rudy Setiono Bart Baesens and Christophe Mues 2008 Recursive neural net-work rule extraction for data with mixed attributes IEEE Transactions on NeuralNetworks 19 2 (2008) 299ndash307

[62] Dong-Hee Shin 2015 Effect of the customer experience on satisfaction withsmartphones Assessing smart satisfaction index with partial least squaresTelecommunications Policy 39 8 (2015) 627ndash641

[63] Tom De Smedt and Walter Daelemans 2012 Pattern for python Journal ofMachine Learning Research 13 Jun (2012) 2063ndash2067

[64] Daniel Svozil Vladimir Kvasnicka and Jiri Pospichal 1997 Introduction to multi-layer feed-forward neural networks Chemometrics and intelligent laboratorysystems 39 1 (1997) 43ndash62

[65] Shaheen Syed and Marco Spruit 2017 Full-text or abstract Examining topiccoherence scores using latent dirichlet allocation In 2017 IEEE Internationalconference on data science and advanced analytics (DSAA) IEEE 165ndash174

[66] Chakkrit Tantithamthavorn Shane McIntosh Ahmed E Hassan and KenichiMatsumoto 2016 Automated parameter optimization of classification techniquesfor defect prediction models In 2016 IEEEACM 38th International Conference onSoftware Engineering (ICSE) IEEE 321ndash332

[67] Chakkrit Tantithamthavorn Shane McIntosh Ahmed E Hassan and KenichiMatsumoto 2018 The impact of automated parameter optimization on defectprediction models IEEE Transactions on Software Engineering (2018)

[68] Romain Tavenard 2017 tslearn A machine learning toolkit dedicated to time-series data httpsgithubcomrtavenartslearn

[69] L Torlay Marcela Perrone-Bertolotti Elizabeth Thomas and Monica Baciu 2017Machine learningndashXGBoost analysis of language networks to classify patientswith epilepsy Brain informatics 4 3 (2017) 159

[70] Geoffrey G Towell and Jude W Shavlik 1993 Extracting refined rules fromknowledge-based neural networks Machine learning 13 1 (1993) 71ndash101

[71] Irfan Ullah Basit Raza Ahmad Kamran Malik Muhammad Imran Saif Ul Islamand SungWon Kim 2019 A churn prediction model using random forest analysisof machine learning techniques for churn prediction and factor identification intelecom sector IEEE Access 7 (2019) 60134ndash60149

[72] Lorenzo Villarroel Gabriele Bavota Barbara Russo Rocco Oliveto and Massimil-iano Di Penta 2016 Release planning of mobile apps based on user reviews InProceedings of the 38th International Conference on Software Engineering ACM14ndash24

[73] Chih-Ping Wei and I-Tang Chiu 2002 Turning telecommunications call detailsto churn prediction a data mining approach Expert systems with applications 232 (2002) 103ndash112

[74] Cathrin Weiss Rahul Premraj Thomas Zimmermann and Andreas Zeller 2007How long will it take to fix this bug In Fourth International Workshop on MiningSoftware Repositories (MSRrsquo07 ICSE Workshops 2007) IEEE 1ndash1

[75] Ian H Witten Eibe Frank and A Mark 2016 Hall and Christopher J Pal DataMining Practical machine learning tools and techniques (2016)

[76] Barbara H Wixom and Peter A Todd 2005 A theoretical integration of usersatisfaction and technology acceptance Information systems research 16 1 (2005)85ndash102

[77] Jae-Chern Yoo and Tae Hee Han 2009 Fast normalized cross-correlation Circuitssystems and signal processing 28 6 (2009) 819

12

  • Abstract
  • 1 Introduction
  • 2 Motivating Example
  • 3 Data Collection
  • 4 Research Questions amp Results
  • 5 Threats To Validity
  • 6 Related Work
  • 7 Conclusion
  • References
Page 4: On the Relationship between User Churn and Software Issues · relationship between the issues that are present in a software prod-uct and user churn. For this purpose, we investigate

349

350

351

352

353

354

355

356

357

358

359

360

361

362

363

364

365

366

367

368

369

370

371

372

373

374

375

376

377

378

379

380

381

382

383

384

385

386

387

388

389

390

391

392

393

394

395

396

397

398

399

400

401

402

403

404

405

406

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

407

408

409

410

411

412

413

414

415

416

417

418

419

420

421

422

423

424

425

426

427

428

429

430

431

432

433

434

435

436

437

438

439

440

441

442

443

444

445

446

447

448

449

450

451

452

453

454

455

456

457

458

459

460

461

462

463

464

Table 1 A summary of our collected data

Domain Alternatives Reviews Comments Potential churn Issue reports

IDEs 296 1088 3036 4319 (Eclipse) 22758 (Eclipse)

Browsers 290 1754 5112 6421 (Firefox) 40749 (Firefox)

Web Servers 124 714 1933 1809 (Apache) 20965 (Apache)

comments (explained in Section 2) and the reviews for all the soft-ware alternatives We also collect the potential user churn (explainedin Section 2) for the Eclipse Firefox and Apache projects Finally inaddition to the data collected from alternativetonet we also collectthe issue reports from the respective Issue Tracking Systems of ourstudied projects We use the potential user churn data combinedwith the issue reports to perform our investigations in RQ1-RQ3Table 1 summarizes the data collected for our study (ie the datatype and the amount of collected data)

4 RESEARCH QUESTIONS amp RESULTSIn this section we present our research questions (RQs) along withtheir results For each RQ we present its motivation and explainthe approach that is used to conduct the RQ

RQ0 What are the most significant usersrsquoconcerns for Browsers IDEs and Web ServersMotivation The motivation behind RQ0 is to understand the mainconcerns within the competition scene of a specific software do-main This investigation is important to highlight what are themain strengths and weaknesses within a particular domain (egBrowsers) This knowledge can be useful because for example anorganization can study the main weaknesses within a domain andbetter assess whether its productsservices fall within the sameflawsApproach To study the most significant concerns related toBrowsers IDEs and Web Servers we use the in-perspect commentsand reviews data that were explained in Section 3 To analyze suchdata we adopt the Latent Dirichlet Allocation (LDA) to find themain topics that are present in the in-perspective comments andreviews data We use the LDA implementation provided by theMALLET framework [40] LDA is a probabilistic generative methodthat assumes a Dirichlet distribution of latent topics LDA is partic-ular useful for us since we are interested in finding the main topicsthat are present in the in-perspective comments and reviews dataHowever The LDA model requires the data to be pre-processedin a certain format for optimal results Therefore we follow thestandard text pre-processing steps [48]

Fixing Typographical Errors Typographical errors can bedue to the usage of internet slang in a written text or simply due tocommon language mistakes The most recurrent of errors that wefind in our data is the repetition of characters For example userstend to use words such as ldquobeeeestrdquo to emphasize their feelings Weuse the Pattern For Python [63] package to fix such typographicalerrors

Removing Stop Words Examples of stop words are of byand the These are words that are recurrently used in written textbut do not possess meaning by themselves We use the package

nltk [8] to remove such stop words This package contains a well-known corpus of stop words and is widely used for text filtering

Stemming andLemmatization For better results words shouldbe repeated as frequently as they can since LDA relies heavily onthe occurrences of words in different sentences Hence the usageof stemming and lemmatization is essential These methods trans-form inflectional forms of a word to its basic form For examplethe words ldquofixedrdquo ldquofixingrdquo and ldquofixesrdquo are transformed to the basicform ldquofixrdquo Stemming works by removing the suffix of the word(eg the word ldquocatsrdquo becomes ldquocatrdquo by removing the ldquosrdquo) whilelemmatization retrieves the basic form of a word from a dictionaryknown as lemma (by using morphological analysis) We use bothstemming and lemmatization to ensure that we obtain the mostunique words possible

Forming Bigrams and Trigrams Forming bigrams and tri-grams is essential to group the words which frequently appeartogether Some words do not inflict any significant meaning for theLDA model on their own However when grouped together suchwords may indicate a specific topic One example of these wordsis ldquocustomer supportrdquo which has a precise meaning (as opposed toonly ldquocustomerrdquo or ldquosupportrdquo which can hold diverse meanings)

To choose the optimal number of topics in our LDAmodel we usethe coherence scoremetric [56] This metric was found to be the bestin alliance with human perception [49 65] However even whenrelying on the coherence score (which naturally limits redundancyin the produced topics) there is still the possibility of duplicatetopics being generated Therefore we perform a manual analysisof the produced topics to eliminate duplicates Our LDA modelproduces topics in the form of keywords and their percentage-of-significance with respect to the topic For example (0097 lowastldquo119904119900119906119903119888119890rdquo + 0085 lowast ldquo119901119903119894119907119886119888119910rdquo + 0053 lowast ldquo119905119903119886119888119896rdquo + 0050 lowast ldquo119901119903119900 119891 119894119905rdquo +0038 lowast ldquo119902119906119886119899119905119906119898rdquo + 0038 lowast ldquo119888119900119897119897119890119888119905rdquo + 0035 lowast ldquo119901119903119900119905119890119888119905rdquo + 0032 lowastldquo119886119903119888ℎ119894119907119890rdquo + 0029 lowast ldquo119898119900119899119890119910rdquo + 0026 lowast ldquo119901119890119903 119891 119900119903119898119886119899119888119890rdquo) are a set ofkeywords that indicate Privacy as high level topic But some otherdistribution of keywords could refer to the same topic Two of theauthors conduct together the manual analyses of the topics Thefinal set of topics is only produced when full consensus is reachedbetween authors (ie the authors went over and discussed all thetopics) Because both authors have analyzed together all topics andagreements were reached on all the topics in the end we did notcompute inter-rater agreement

Once the topics are defined we extract 100 examples (aiming fora 95 confidence level with a 10 margin error) from our data foreach studied domain (ie IDE Browser and Web Server) totalling300 examples For each example two of the authors assign theirtopic Again two authors analyze and discuss each example sothat consensus is reached regarding the assigned topic Afterwardswe compute the topic distribution for each domain which canrepresent the most recurrent concerns for that domainFindings Tables 2 and 3 show the topics obtained by our LDAmodelWe also show some of the examples for each topic in the table(all the examples and analyses are available in our replication pack-age which is hosted on httpsdoiorg105281zenodo3610584

Our obtained topics can be categorized into (i) general topicswhich are present in the three domains and (ii) domain-specifictopics which might still be present in other domains but are more

4

465

466

467

468

469

470

471

472

473

474

475

476

477

478

479

480

481

482

483

484

485

486

487

488

489

490

491

492

493

494

495

496

497

498

499

500

501

502

503

504

505

506

507

508

509

510

511

512

513

514

515

516

517

518

519

520

521

522

On the Relationship between User Churn and Software Issues MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea

523

524

525

526

527

528

529

530

531

532

533

534

535

536

537

538

539

540

541

542

543

544

545

546

547

548

549

550

551

552

553

554

555

556

557

558

559

560

561

562

563

564

565

566

567

568

569

570

571

572

573

574

575

576

577

578

579

580

Alternativetonet

Issue tracking system

Fetch data from

studied domains

IDE Browser

and Web Server

alternatives

Fetch in-

perspective

comment and

reviews

In-perspective

comments

Software

reviews

RQ0

Competition

scene of a

domain

Fetch issues

from Firefox

Eclipse and

Apache

Fetch potential

churn from

Eclipse Firefox

and Apache

Potential churns

Issue reports

Issues amp

Potential churn

pairs

RQ1 Correlating

issues with

potential churns

RQ2 Machine

learning to

predict potential

churns

RQ3 Important

features

associated with

potential churns

Figure 5 An overview of our data collection process

dominant in a specific domain Table 4 shows the distribution oftopics per domain based on our manual analyses We observe thatrelease and updates memory issues bugscrashes and cross-platformavailability can be considered as general topics (ie they are im-portant in all the three analyzed domains) Indeed these topics areintuitively present when handling any kind of software

On the other hand Documentation and testing were presentfor web servers and IDEs signalling that a robust documentation(which is often overlooked [55]) and with good support for test-ing tools may definitely attractretain more users when competingin the IDE and Web Sever Market For example an organizationmay direct their attention to these aspects when studying theircompetition

The bookmarks privacy and security were exclusively present inthe Browsers domain signalling that these are essential features toretain or attract users Simplistic design debug options code refac-toring and auto-completion are topics that target IDEs exclusively

Finally complaints about release and updates memory issues andbugscrashes can be considered as the main topics that drive usersto churn These topics were present in the three domains with thehighest percentages among each of the domains The percentagescombined exceed 50 in each of the domains (see Table 4)Summary Our observations suggest that the usersrsquo concerns aretightly related to issues (eg bugs or crashes) that are present inthe respective software products

RQ1 Can we find a correlation betweenpotential user churn and issue reportsMotivation In RQ0 we observe that software issues (eg bugsamp crashes and memory issues) are common concerns that maymotivate users to churn With such an observation we hypothesizein RQ1 that the potential usersrsquochurn observed on alternativetonetmay be correlated with the issue reports which developers addressin their respective projects This analysis is important because ifsuch a correlation exists the potential user churn information canbe possibly used to prioritize issue reports (for example the moreassociated with potential user churn the more important the issuereport)

Approach In our work we approach the recommendations foralternatives (ie potential usersrsquochurn) and the issue reports asevents occurring over time and interpret them as two time-seriesSuch an interpretation is handy for two reasons (i) we can studythe correlation between two trends (potential usersrsquochurn and issuereports) and (ii) we can study whether the specific peaks in issuereports can be associated with peaks in potential usersrsquochurn

In our study we group the two time series in a weekly basisie all the potential user churn that occurred within a week aregrouped together We use the weekly grouping because groupingissue reports in a daily-basis is too narrow to form a pattern whilegrouping them in a monthly-basis can over-generalize the patternsin issues The time-series of potential usersrsquochurn and the time-series of issues showcase the events (grouped in a weekly-basis)from March 2015 (the earliest date on alternativetonet) to July 2019forming 200 weeks grouping

The first step to correlated our two time-series is to verifywhetherthere exists a global correlation between them (ie whether thereexist common patterns among the increasing and decreasing trendsover time) To do so we use the cross-correlation which is a metricused in signal processing [77] This correlation calculates the simi-larity between two time-series as a function of the displacementfrom one time-series to another For example given two functions119891 and 119892 the cross-correlation calculates the degree to which a shiftin 119892 (along the x-axis) is identical to the shift in 119891

However since we are also interested in analyzing the relation-ship between the peaks of our two time-series we use the DynamicTimeWarping (DTW) [58] which was designed for comparing time-series varying at different speeds A peak in potential usersrsquochurnmay be followed by a peak in issue reports (or vice-versa) ie thepeaks may not necessarily occur at the same time Differently fromthe Eucledian distance which assumes that a point 119894 in a time-seriesmust be aligned to the 119894119905ℎ point in another time-series [38 75] DTWallows the variance in the offset between two time-series [59]

To employ DTW we use the tslearn [68] which is a machinelearning toolkit specialized for time series analysis The DTW com-putation produces a matrix that shows the distance between everytwo points in the graph of the two time-series The algorithm isdesigned in a way to choose the closest point that respects the

5

581

582

583

584

585

586

587

588

589

590

591

592

593

594

595

596

597

598

599

600

601

602

603

604

605

606

607

608

609

610

611

612

613

614

615

616

617

618

619

620

621

622

623

624

625

626

627

628

629

630

631

632

633

634

635

636

637

638

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

639

640

641

642

643

644

645

646

647

648

649

650

651

652

653

654

655

656

657

658

659

660

661

662

663

664

665

666

667

668

669

670

671

672

673

674

675

676

677

678

679

680

681

682

683

684

685

686

687

688

689

690

691

692

693

694

695

696

Table 2 The predominant topics in alternativetonet

Release and updatesKeywords ldquo119907119890119903119904119894119900119899rdquo ldquo119897119900119899119892119890119903 rdquo ldquo119903119890119897119890119886119904119890rdquo ldquo119889119900119908119899119897119900119886119889rdquo ldquo119900 119891 119891 119894119888119894119886119897rdquoExamples - The program seems to be no longer updated Last versions (2018) can be still downloaded from the official website

- Firefox version 60+ (Quantum) is better than Chrome Compare to previous versions of Firefox

Bookmarks FunctionalityKeywords ldquo119887119900119900119896119898119886119903119896rdquo ldquo119901119886119899119890119897rdquo ldquo119891 119886119907119900119903119894119905119890rdquo ldquo119904119906119901119901119900119903119905rdquo ldquo119904119910119899119888rdquoExamples - Firefox now allows bookmarks importing from Chrome which is a huge plus

- With the recent upgrade to 20 (including the bookmarks syncing feature that I liked in Firefox) Vivaldi just got better

Flash supportKeywords ldquo119901119897119906119892119894119899rdquo ldquo119907119894119889119890119900rdquo ldquo119894119899119904119905119886119897119897rdquo ldquo119891 119897119886119904ℎrdquo ldquo119898119890119889119894119886 minus 119901119897119886119910119890119903 rdquoExamples - Since its the only way to watch Flash video on an iPhone without jailbreaking Skyfire is a killer app thats worth getting for anyone Playback of Flash video over WiFi is

flawless and the quality is at very least watchable

Memory issuesKeywords ldquo119898119890119898119900119903119910rdquo ldquo119877119860119872rdquo ldquo119897119886119892rdquo ldquo119897119894119898119894119905rdquo ldquoℎ119890119886119907119910rdquoExamples - Great C++ IDE which balances memory usage and indexing solution Contains some basic refactoring functions Good for large projects and limited RAM with remaining

some capabilities of more complex C++ IDEs- Eclipse is really lagging after the last update The memory is filled whenever I run a Java swing applet- Chrome is so heavy it lags alot when I open multiple tabs

Privacy and securityKeywords ldquo119901119903119894119907119886119888119910rdquo ldquo119904119890119888119906119903119894119905119910rdquo ldquo119894119898119901119900119903119905119886119899119905rdquo ldquo119886119897119897119900119908rdquo ldquo119888119900119899119905119903119900119897rdquoExamples - Firefox allows far more control over your privacy than does Chrome (or even Chromium on which Chrome is based)

- Google Chrome is a great browser and a lot better than Vivaldi Browser Chrome keeps updating with cool stuff like the new cast feature better extensions and lot betterprivacy and security Vivaldi has more customizing features but has a long way to go to compete with Chrome

BugsCrashesKeywords ldquo119888119903119886119904ℎrdquo ldquo119887119906119892rdquo ldquo119891 119894119909rdquo ldquo119890119909119901119890119903119894119890119899119888119890rdquo ldquo119894119904119904119906119890rdquoExamples - Lots of users experience Adobe Flash Player hangs or crashes I dont know why it isnt fixed yet

If Chrome ever got its proxy support bugs fixed it would leap to the top of the list and become pretty much the only web browser Id ever use- I worked with NetBeans for a couple of years and always loved it but for professional users the yearly fee is well spent Nicer interface better Code Intelligence moreoptions to fine tune what happens on saving a file (eg saving or uploading to various locations) You get a couple of feature update each year plus bugfixes whereasNetBeans is sometimes very slow to update things (support for new version of a language) or to fix known bugs

Table 3 The predominant topics in alternativetonet (continued)

Debug optionsKeywords ldquo119891 119890119886119905119906119903119890rdquo ldquo119889119890119907119890119897119900119901119898119890119899119905rdquo ldquo119889119890119887119906119892rdquo ldquo119894119899119905119890119892119903119886119905119894119900119899rdquo ldquo119901119890119903 119891 119890119888119905rdquoExamples - Not enough debug and run options just a text editor Atom has not got any solid debugging features (and packages) out of the box or at all

- This is my favorite code editor It is open source and portable you can get a portable version that will fit onto a flash drive if youwant It is just as configurableas Sublime Text and has a large community producing great plugins it has GIT support built in and excellent support for Javascript HTML CSS and Pythonand others of course It has an integrated debugging system

Simplistic designKeywords ldquo119904119894119898119901119897119890rdquo ldquo119890119889119894119905119900119903 rdquo ldquo119897119894119892ℎ119905119908119890119894119892ℎ119905rdquo ldquo119889119890119904119894119892119899rdquo ldquo119899119900119905119890119901119886119889rdquoExamples - Really lightweight with some advanced not complete but easy to use auto complete features Simple design and customizable

Cross-platformKeywords ldquo119888119903119900119904119904 minus 119901119897119886119905 119891 119900119903119898rdquo ldquo119871119894119899119906119909rdquo ldquo119900119901119890119899 minus 119904119900119906119903119888119890rdquo ldquo119886119907119886119894119897119886119887119897119890rdquo ldquo119894119899119904119905119886119897119897rdquoExamples - Its cross platform and open source and bring the benefits of the JetBrains IDE platform which is well polished and powerful

- One of the best open source browsers that work on Linux and Windows

Code refactoring and auto-completionKeywords ldquo119888119900119889119890rdquo ldquordquo119886119906119905119900 minus 119888119900119898119901119897119890119905119894119900119899rdquo ldquo119903119890 119891 119886119888119905119900119903 rdquo ldquo119900119901119905119894119900119899rdquo ldquo119906119904119890rdquoExamples - For its young age it is well matured and a very powerful IDE with great CMake Support and other goodies like great refactoring tools and a sophisticated

auto-completion system

TestingKeywords ldquo119905119890119904119905rdquo ldquo119904119890119903119907119890119903 rdquo ldquo119889119890119901119897119900119910rdquo ldquo119908119890119887119904119894119905119890rdquo ldquo119904119890119905rdquoExamples - XAMPP is great tool to develop and test your website (particularly if it uses php and mysql databases) offline before putting it on a live server You could

also use XAMPP as an easy method of setting up a live server- Eclipse is far more superior than netbeans in unit testing functionality

DocumentationKeywords ldquo119889119900119888119906119898119890119899119905119886119905119894119900119899rdquo ldquo119904119890119903119907119890119903 rdquo ldquo119901119900119903119905rdquo ldquo119889119886119905119886119887119886119904119890rdquo ldquo119891 119900119903119906119898rdquoExamples - Easiest way to run a web server on Windows The performance is surprisingly good and lots of documentation on how to manage it

- Support is via forum only Documentation can be very poor particularly around product upgrades (one expects more from commercial packages) Due totheir poor documentation I lost several years of development databases when porting from an old to a new computer (they dont accurately document thelocation of the database files)- Eclipse lacks in adding plugins documentation

time constraints for continuity and monotonicity Finally two co-authors performed a manual analysis of the observed peaks in the

two time-series7 The goal of the manual analyses is to find whetherthe identified peaks are indeed likely related (eg the issue reports

7We share the data of our manual analyses at httpsdoiorg105281zenodo3610584

6

697

698

699

700

701

702

703

704

705

706

707

708

709

710

711

712

713

714

715

716

717

718

719

720

721

722

723

724

725

726

727

728

729

730

731

732

733

734

735

736

737

738

739

740

741

742

743

744

745

746

747

748

749

750

751

752

753

754

On the Relationship between User Churn and Software Issues MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea

755

756

757

758

759

760

761

762

763

764

765

766

767

768

769

770

771

772

773

774

775

776

777

778

779

780

781

782

783

784

785

786

787

788

789

790

791

792

793

794

795

796

797

798

799

800

801

802

803

804

805

806

807

808

809

810

811

812

Table 4 The extracted topics percentages distribution overthe three domains

Topics Browsers() IDEs() Web servers()Release and updates 18 17 21Bookmarks functionality 12 0 0Memory issues 23 22 11Privacy and security 8 0 0BugsCrashes 24 19 20Debug options 0 10 0Simplistic design 0 8 2Cross-platform 8 6 5Code refactoring and auto-completion 0 10 0Testing 0 2 17Documentation 0 2 15Miscellaneous 7 4 9

within a peak actually describe problems mentioned in the poten-tial usersrsquochurn) Again both co-authors analyze and discuss everyanalyzed sample

In this RQ we perform our analyses only in our studied projects(as opposed to analyzing all the software alternatives in their do-mains) As discussed in Section 3 it would be impracticable toextract the data from all the different Issue Tracking Systems of thesoftware alternatives in the studied domainsFindingsWe respectively observe a cross-correlation of 076 072and 070 in Firefox Eclipse and Apache These cross-correlationscores indicate that there exist a global commonality between thetrends of the two time-series regardless of the timing factor

Regarding our DTW analysis we observe that the time fromthe filing of an issue report to the time of a spark in potentialusersrsquochurn span from 2 to 4 weeks (on the median for the threeprojects) Additionally our DTW correlation reveals a bidirectionalrelationship between the peaks in issue reports and the peaks inpotential usersrsquochurn For the cases where a peak in issue reportsleads to a peak in potential usersrsquochurn it is intuitive to think thatthe peak in potential usersrsquochurn is due to the frustration generatedby the issues described in the reports On the other hand the caseswhere a peak in potential usersrsquochurn is followed by a peak inissue reports may occur due to users being frustrated over non-obvious issues For example if we consider the potential user churndepicted in Figure 3 (ie ldquoJupyter is slow on Firefoxrdquo) the users mightnot find it obvious that such a problem should be reported to theFirefox team (eg users may simply interpret it as a characteristicof Firefox) Therefore in such cases issue reports would be filledonly a while after the potential usersrsquochurn have been expressed

We are specifically interested in studying the relationship be-tween reported issues (ie the documented issues in the trackersystems) and the potential user churn Our rationale is that under-standing the characteristics of issues that are associated with userchurn can help developers to prioritize such issues and avoid futurechurn (thereby retaining more users)

Finally Table 5 shows a subset of the examples found in ourmanual analyses We can indeed observe the apparent relationshipbetween the comments within the potential usersrsquochurn and thedescriptions (or titles) of issue reports The examples are followinga unidirectional link in which the issue report that occurs at acertain time would induce the potential user churn at a later timeSummary Our results suggest that there exists a significant cor-relation between software issues and potential user churn

RQ2 Can we use machine learning models topredict the rate of potential user churnMotivation In the practice most of the issue reports undergo hu-man estimations of the priority and the severity [5] usually basedon internal business-oriented factors (such as ROI [25]) Issuesmight be prioritized by their recency the affected platforms theversions of the software where the issue is present Feeding a ma-chine learning model to identify the potential user churn that arecaused by issues may help automate the issue report prioritizationprocess and ultimately save huge costs in software developmentApproach We train three different machine learning models topredict low and high user churn rates by using information from is-sue reports The first model is a Feed-Forward Neural Network withtwo hidden layers developed by using the framework Keras [17]Keras is widely used to train simple neural networks as it hasa higher abstraction than TensorFlow [1] Feed-Forward NeuralNetworks [64] are the most suitable type of neural networks tobe trained on tabular data [27 33 50] This network evaluates allthe possible combinations of different metrics values to guide thepredictions

Our second model is the XGboost model which is a gradientboosting tree [12 13] XGboost is widely used for binary classifica-tion problems Research has shown that XGboost performs betterthan neural networks and classic classification machine learningalgorithms in problems dealing with tabular data [14 45 69]

We also train Random Forest models [35] Random Forest isa classical machine learning algorithm which is widely used inproblems dealing with tabular data [26 51 57] In addition Ran-dom Forest models have been widely used in software engineeringresearch such as in defect prediction [67]

The goal of ourmodels is to predict the rate of potential usersrsquochurngiven the characteristics of issue reports The data used to fit ourmodels is the data obtained from our DTW associations Simply putthe DTW provides us with 119875 pairs of issue reports 119868 and potentialusersrsquochurn 119862 (ie 119875 lt 119868 119862 gt) Therefore we study the character-istics of the issues 119868 to predict the amount of potential user churn119862 We split the distribution of 119862 into two percentiles (ie low andhigh) The low percentile being from 0 to 50 and the high from51 to 100 Thus our models output dichotomous predictions iewhether the potential usersrsquochurn is low or high

The characteristics of issue reports that we include in our models(henceforth referred to as features) are described in Table 6 Thesefeatures are collected from the Issue Tracking systems of our threestudied projects We study features such as Component Hardwareand OS to account for the locality and the spread of the issuesFor example these features can indicate whether platform-specificissues or more general issues have more potential user churn

Other features such as the Severity and Priority showcase theprioritization used by developers The Severity and Priority areimportant for us to verify whether the prioritization performed bydevelopers is aligned with the rate of potential user churn

Features related to the type of the bugs (ie whether it is acrash) whether a version number is present or the target milestoneprovide us information regarding the tracking process of issuesThey are important to understand whether better-tracked issues

7

813

814

815

816

817

818

819

820

821

822

823

824

825

826

827

828

829

830

831

832

833

834

835

836

837

838

839

840

841

842

843

844

845

846

847

848

849

850

851

852

853

854

855

856

857

858

859

860

861

862

863

864

865

866

867

868

869

870

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

871

872

873

874

875

876

877

878

879

880

881

882

883

884

885

886

887

888

889

890

891

892

893

894

895

896

897

898

899

900

901

902

903

904

905

906

907

908

909

910

911

912

913

914

915

916

917

918

919

920

921

922

923

924

925

926

927

928

Table 5 Manual investigation of DTW correlation linking

FirefoxExample 1Correlation Linking Week 164 in issues to week 169 in switches (May 2018)Issue Report Title Firefox version 5303 appears high memory high resources usage and might causing for SSD damagesIssue Description Mozilla Firefox it is caching all of this data 4-5GB on my new SSD and its lost 4 from its health within 5 months due continuously writesoverwrites firefox cache on my SSDAltrnativeto Comment Switching to Chrome after version 5303 high memory is damaging my SSDExample 2Correlation Linking Week 66 in issues to week 68 in switches (November 2016)Issue Report Title Sessions are not cleared in the private windowIssue Description After closing private window (not a tab) and opening a new one I still logged inAltrnativeto Comment Chrome incognito functionality is more stable than Firefox Firefox doesnrsquot resets sessions

EclipseExample 1Correlation Linking Week 10 in issues to week 11 in switches (July 2015)Issue Report Title Not Java 8 CompatibleIssue Description When downloading Eclipse for the first time for Java development 64-bit (which is what my computer is) it is not running because it is not compatible with Java version 8

(which is the latest version of Java) but appears to be compatible with version 7 still -DosgirequiredJavaVersion=17Altrnativeto Comment Using Netbeans because Eclipse is not installingExample 2Correlation Linking Week 211 in issues to week 213 in switches (February 2019)Issue Report Title jar files in the exported application do not contain class filesIssue Description The application has many plugins It builds and runs well within eclipse but when I export the application using the eclipse export product wizard the jar files (corresponding to

the plugins) do not contain any class files they include meta-inf pluginxml and all the extra folders (eg lib assets etc when available) but they do NOT contain any class filesAltrnativeto Comment Didnrsquot find any problems in exporting jar (when recommending IntelliJ)

ApacheExample 1Correlation Linking Week 113 in issues to week 117 in switches (July 2017)Issue Report Title httpd 2426 no longer building against lua 531 or lua 534Issue Description Building Apache 2426 against lua 531 or 534 compiled from scratch (latest version as of this writing) does not work Compilation against lua 531 did work with 2425Altrnativeto Comment Some issues while building with lua (when recommending XAMPP)Example 2Correlation Linking Week 12 in issues to week 13 in switches (September 2019)Issue Report Title Apache Server is restarted time to time (Release 2410)Issue Description I am using Apache release 2410 in Windows Server 2008 Time to time the Apache server is getting restarted The service is unavailable for 3-5 minutes and after that the service

is started up automatically This issue occurs frequently twice a day two days once etcAltrnativeto Comment XAMPP is more stable I am experiencing random restarts throughout the day when using Apache

Table 6 The issue reports attributes for Firefox Eclipse andApache

Metric Description

Component consists of different components of the system such as bookmarkstheme toolbar etc

Status describes the status of the issue such as unconfirmed open re-solved etc

Resolution describes the resolution type of the issue such as fixed duplicateinvalid etc

Severity the assigned severity such as blocker critical normal etcHardware the different hardware affected by the issue such as desktop mobile

32 or 64 bits etcOS the different OS affected by the issue such as OSX Windows etcPriority the assigned priority from 1 to 5Type the assigned issue type such as defect enhancement or taskTime Alive the time difference between the date opened and the date resolved

in minutesTime since last change the time difference between the date opened and the date of last

change in minutesIs a Crash a boolean value to indicate if it is a crash reportHas a Version Number a boolean whether the issue report has a version numberHas a Target Milestone a boolean whether the issue report has a target numberHas a Regression Range a boolean value to indicate if the bug causes a regression testingNumber of comments the number of comments for the reportNumber of votes the number of votes for the report which validates the integrity of

the report by other usersNumber of Blocks the number of issues that are blocked by that issue until it is re-

solved

(ie with more tracking information) are associated with potentialusersrsquochurn

The number of comments votes and blocks showcase the effortinvested on discussing an issue Finally the time-alive and the time-since-last-change are two features related to the life-cycle of an issuereport Longer time-alive values indicate a lingering issue that hasnot been resolved yet while a short time-since-last-change values

indicate issues that have been recently addressed (and thus can stillbe reoccurring)

Our studied features are inspired by previous research in soft-ware engineering that have built machine learning models usinginformation present in issue reports [15 16 16 18 19 29 32 39 74]

We chose features that are common across the issue reports ofFirefox Eclipse and Apache We filtered out features that are uniqueto an issue tracker system only to ensure a fair comparison betweenour studied issue reports Afterwards we performed Spearmancorrelation tests [44] on the selected features in which each featureis tested against the set of all the other features We did not observeany correlation obtaining a value higher than 07 which suggeststhat our selected features are not correlated and can be safely fedinto our machine learning models [30]

To evaluate the performance of our models we use the Area Un-der the Curvemetric (AUC) AUC is useful for evaluating our modelsbecause our models output probabilities Therefore AUC showsthe discrimination power of our models in every probability thresh-old (unlike Precision Recall or F-measure which are limited toevaluating models at a single probability threshold [66]) The AUCvalues range from 0 to 1 An AUC of 05 denotes a random guessingmodel while an AUC of 1 denotes a perfect distinguishing powerAn AUC of 0 denotes a model with perfect inverse predictions

Given the temporal nature of our data (ie future issue reportscannot be used to predict the potential user churn of past issuereports) we adopt a leave-one-out validation approach to obtainour AUC values

First our issue reports have already been sorted when we per-form the time-series analyses Second we split our data into two

8

929

930

931

932

933

934

935

936

937

938

939

940

941

942

943

944

945

946

947

948

949

950

951

952

953

954

955

956

957

958

959

960

961

962

963

964

965

966

967

968

969

970

971

972

973

974

975

976

977

978

979

980

981

982

983

984

985

986

On the Relationship between User Churn and Software Issues MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea

987

988

989

990

991

992

993

994

995

996

997

998

999

1000

1001

1002

1003

1004

1005

1006

1007

1008

1009

1010

1011

1012

1013

1014

1015

1016

1017

1018

1019

1020

1021

1022

1023

1024

1025

1026

1027

1028

1029

1030

1031

1032

1033

1034

1035

1036

1037

1038

1039

1040

1041

1042

1043

1044

sets one containing 80 of the data ie the training set and an-other containing 20 ie the validation set In the first iterationof our models we use 80 of the data to train our models and weuse the first element of the validation set to test out models Thenthe leave-one-out validation iteratively increases the training setsize by one element (from the validation set) and predicts the nextelement of the validation set This process is repeated until thetraining set reaches the size of 119873 minus 1 (where 119873 is the size of ourdata set) Finally our obtained AUC values are the aggregation(ie median) of all the AUC values obtained in the leave-one-outiterationsFindingsTheAUC values obtained for our threemodels are shownin our online appendix8 The XGboost model performs the best inthe three studied projects obtaining AUC values that exceed 083There exists a subtle difference between the AUC values of theXGboost model and the Feed Forward Neural Networks AUCssince both the XGboost model and the Neural Networks can graspthe complex relationships in our issue reports dataSummary Our results suggest that machine learning models suchas XGboost and Feed Forward Neural Networks can predict therate of user churn with relatively high accuracies (in terms of AUC)Such predictions could help in the prioritization of issue reports

RQ3 What are the most important factorsrelated to potential user churnMotivation In RQ2 we observe that machine learning modelscan be used to predict the rate of potential usersrsquochurn Howeverdevelopers would hardly blindly trust a machine learning modelto help on their decisions (and they should not do so) Developerswould mostly benefit from understanding the reasons behind thepredictions of our machine learning models so that they couldpossibly adapt (or not) their development process Therefore inRQ3 we investigate the most important features in the predictionsof our machine learning modelApproach In this RQ we study the most important features in thepredictions of the XGboost models since they obtain the best AUCvalues in our three studied projects

To find the most important features we adopt a simple featureextraction algorithm [61 70] Consider that our models are trainedon a feature set 119883 = 1199091 1199092 119909119899 (as explained in RQ2) The featureextraction process consists of iteratively (i) removing each feature 119909119894from our models and (ii) computing the AUC without each feature119909119894 (by using the same leave-one-out validation process explainedin RQ2) The AUC values from the models without a feature 119909119894 arecompared to the models containing all the features (ie we takethe difference 119860119880119862119900119903119894119892119894119899119886119897 minus119860119880119862119903119890119898119900119907119890119889 ) The higher the drop inthe AUC value caused by removing a certain feature 119909119894 the higherthe importance of such a feature 119909119894 [4 9 10]Findings Table 7 shows our obtained results after performing thefeature extraction process The table shows the drop in the AUC foreach feature (ie ΔAUC) We also show the minimum maximummean and median values of the features in the 119897119900119908 user churn rate(ie 0) and ℎ119894119892ℎ user churn rate (ie 1) prediction classes

8httpsdoiorg105281zenodo3610584

The Issue Alive Time is the most important feature obtaining thehighest ΔAUC in our three studied projects This result suggeststhat it is unhealthy to leave issues hanging for a long time as theycan be perceived as a reason for users to churn

The Number of Comments obtains a considerable ΔAUC in ourthree studied systems The association between the Number ofComments and the rate of user churnmay occur due to themultitudenumber of users being affected by an issue

The Time-since-last-change is also an important feature in theFirefox and Eclipse projects This result suggests that issues withrecurring changes or modifications may signal a certain instabilityin the fixing process which might cause a potential user churn inthe future

It is worthy noting that features such as Severity Priority andType of Bugs do not obtain a high importance in the predictionswhich might suggest that the existing prioritization fields used inissue reports are misaligned with the potential user churnSummaryOur results suggest that long lived issues can potentiallylead to more potential userrsquos churn and that the existing processesfor prioritizing issues should be augmented to capture the highlyinteractive and long lived issues

5 THREATS TO VALIDITYConstruct Validity The main construct validity of our study isrelated to the assumption of a potential user churn The alterna-tivetonet website does not record exactly when a user has chosen asoftware alternative over another software Instead alternativetonetrecords that a user thinks that a software may be a good alternativeto another software The motivation behind the act of signaling analternative is not known by us For example there might be usersthat are simply driven by the passion of contributing to the alterna-tivetonet community which lead them to provide several opinionsas to which software could make good alternatives to others Wehave deliberately adopted the term potential user churn instead ofsimply user churn due to this unknown motivations behind signal-ing alternatives Moreover it is reasonable to think that if a givensoftware has received a considerable amount of recommendationfor alternatives this may likely indeed represent that users areconsidering to change the software

Internal Validity Our internal threats to validity are mainlyrelated to (i) our chosen features and (ii) our prediction classes Wesplit the peaks of potential user churn into the low and high classesOther research may find different results if the potential user churnis modeled in different way In addition we acknowledge that ourset of metrics is not exhaustive For example we have not studiedwhether the presence of a stack-trace in an issue report can berelated to future potential user churn We plan to (i) model thepotential user churn as a continuous variable and (ii) extend our setof features to predict the potential user churn in future research

External Validity Our study targeted three domains that areweb servers IDEs and web browsers These three domains werechosen due to their availability since we could fetch their issuereports data Although our study showed common factors in ouranalyses we cannot generalize our results to other domains orprojects with a different scale (ie smaller projects) Regardless ourwork provides insights of the possible reasons behind user churn

9

1045

1046

1047

1048

1049

1050

1051

1052

1053

1054

1055

1056

1057

1058

1059

1060

1061

1062

1063

1064

1065

1066

1067

1068

1069

1070

1071

1072

1073

1074

1075

1076

1077

1078

1079

1080

1081

1082

1083

1084

1085

1086

1087

1088

1089

1090

1091

1092

1093

1094

1095

1096

1097

1098

1099

1100

1101

1102

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

1103

1104

1105

1106

1107

1108

1109

1110

1111

1112

1113

1114

1115

1116

1117

1118

1119

1120

1121

1122

1123

1124

1125

1126

1127

1128

1129

1130

1131

1132

1133

1134

1135

1136

1137

1138

1139

1140

1141

1142

1143

1144

1145

1146

1147

1148

1149

1150

1151

1152

1153

1154

1155

1156

1157

1158

1159

1160

Table 7 The most important features for Firefox Eclipse and Apache ranked by their drop in AUC

Factor ΔAUC Min 0 Max 0 Mean 0 Median 0 Min 1 Max 1 Mean 1 Median 1

FirefoxTime Alive 013 0 120 1102 13 10 1487 4872 63OS All 007 0 1 014 0 0 1 083 1Number of Comments 006 1 65 894 12 1 458 2811 45Time Since Last Change 006 9 30 1090 22 0 5 410 3Has a Target Milestone 005 0 1 022 0 0 1 079 1Has a Version Number 004 0 1 033 0 0 1 064 1Component UI Web Payments 003 0 1 042 0 0 1 061 1

EclipseTime Alive 012 0 103 901 12 8 1334 4396 49Has a Target Milestone 006 0 1 019 0 0 1 082 1Time since last change 006 7 27 851 19 0 6 312 3Has a Version Number 005 0 1 034 0 0 1 077 1Component Debug 004 0 1 047 0 0 1 072 1Number of Comments 003 1 72 1092 11 1 632 2701 72OS All 002 0 1 050 0 0 1 068 1

ApacheTime Alive 014 0 136 1132 14 14 1521 5310 70Component Utils 010 0 1 022 0 0 1 091 1Has a Version Number 009 0 1 038 0 0 1 086 1Number of Comments 005 1 92 1232 6 1 352 4207 79OS All 004 0 1 033 0 0 1 079 1Status REOPENED 003 0 1 040 0 0 1 073 1Hardware All 002 0 1 051 0 0 1 078 1

6 RELATEDWORKGiven that we study the relationship between potential user churnand issue reports our research is related to the areas of usercustomerchurn and software quality Therefore we survey the related re-search around these two areas in this section

UserCustomerChurnUser or customer churn has beenwidelystudied in the area of telecommunication services [3 28 71] Aminet al [3] proposed a cross-company machine learning model topredict customer churn The authors also propose the use of datatransformation (eg by using the log function) to improve the train-ing data Ullah et al [71] used feature selection techniques such asInformation Gain and Correlation Attributes Ranking Filter to selectwhich features better explained the churn behaviour of differentcustomer groups The authors observed that a Random Forest modelperformed the best in their study While the Business to Customers(B2C) churn has been widely studied Figalist et al [28] recentlyinvestigated the Business to Business (B2B) churn ie when thecustomer business changes shifts to the services provided by thecompetition Figalist et al [28] allude that although B2B relation-ships tend to be more stable they have a much bigger financial im-pact when they change In terms of software engineering there hasbeen a lack of studies that have investigated usercostumer churndirectly Instead much research has been invested on customerreviews regarding software products [6 46 47] Noei et al [46]studied which features from mobile apps are associated with betterranks in the Google Play store While the work by Noei et al [46]does not study the customer churn directly striving for better ranksin Google Play may avoid user churn in mobile apps Bavota etal [6] investigated the relationship between the usage of fault- andchange-prone APIs in Android apps and user reviews Differentlyfrom all the aforementioned work our study investigates the userchurn is software products

Software Quality Although there exist a lack of studies ana-lyzing the user churn of software productsservices there has beena considerable amount of research on software quality [5 34 39 6667 74] Research on software quality is important since a quality

software is more stable and less defect-prone Ultimately such char-acteristics are tightly related to the observations of our study(iesoftware issues are correlated with user churn) A prominent areaof software quality research in software engineering is the defectprediction area [66 67] Tantithamthavorn et al [66] studied howmuch improvement can be obtained by automatically optimizingdefect prediction models Their research shows that automatic op-timization can yield significant or insignificant performance gainsdepending on the machine learning algorithm Other considerableamount of research has been invested on the triaging part of issuereports [5] and on the effort estimation of an issue report (in termsof time to address an issue) [39 74] In this paper we complementprior research by studying the relationship shared between issuereports (eg defects or required enhancements) with the potentialuser churn of users

7 CONCLUSIONIn this work we study the data available on alternativetonet tobetter understand the relationship between software issues and thepotential user churn of users Having observed that user concernsare tightly related to software issues (eg bugs) we investigatethe relationship between issue reports and the potential user churnof users Our study reveals key issues that must be addressed forthe success of a software product (depending on the domain) Forexample we observe that the potential user churn of users may betightly related to the lack of a robust documentation and support fortesting tools (in the ldquoIDErdquo and ldquoWeb Serverrdquo domains) Finally ourmachine learning models reveal that (i) the longer the issue takes tobe fixed the higher the chances of user churn and (ii) issues withinmore general software components are more likely to be associatedwith user churn Finally we suggest that the current prioritizationperformed by developers should be augmented to encompass thelong lived and highly interactive issues In overall our study sug-gests that the prioritization process of issues can be improved byconsidering the potential user churn of users associated with suchissues

10

1161

1162

1163

1164

1165

1166

1167

1168

1169

1170

1171

1172

1173

1174

1175

1176

1177

1178

1179

1180

1181

1182

1183

1184

1185

1186

1187

1188

1189

1190

1191

1192

1193

1194

1195

1196

1197

1198

1199

1200

1201

1202

1203

1204

1205

1206

1207

1208

1209

1210

1211

1212

1213

1214

1215

1216

1217

1218

On the Relationship between User Churn and Software Issues MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea

1219

1220

1221

1222

1223

1224

1225

1226

1227

1228

1229

1230

1231

1232

1233

1234

1235

1236

1237

1238

1239

1240

1241

1242

1243

1244

1245

1246

1247

1248

1249

1250

1251

1252

1253

1254

1255

1256

1257

1258

1259

1260

1261

1262

1263

1264

1265

1266

1267

1268

1269

1270

1271

1272

1273

1274

1275

1276

REFERENCES[1] Martiacuten Abadi Paul Barham Jianmin Chen Zhifeng Chen Andy Davis Jeffrey

Dean Matthieu Devin Sanjay Ghemawat Geoffrey Irving Michael Isard Man-junath Kudlur Josh Levenberg Rajat Monga Sherry Moore Derek G MurrayBenoit Steiner Paul Tucker Vijay Vasudevan Pete Warden Martin Wicke YuanYu and Xiaoqiang Zheng 2016 Tensorflow A system for large-scale machinelearning In 12th USENIX Symposium on Operating Systems Design and Imple-mentation (OSDI 16) 265ndash283

[2] W Abdelmoez Mohamed Kholief and Fayrouz M Elsalmy 2012 Bug fix-time pre-diction model using naiumlve bayes classifier In 2012 22nd International Conferenceon Computer Theory and Applications (ICCTA) IEEE 167ndash172

[3] Adnan Amin Babar Shah Asad Masood Khattak Thar Baker and Sajid Anwar2018 Just-in-time customer churn prediction with and without data transforma-tion In 2018 IEEE Congress on Evolutionary Computation (CEC) IEEE 1ndash6

[4] Hafeez Ullah Amin Aamir Saeed Malik Rana Fayyaz Ahmad Nasreen BadruddinNidal Kamel Muhammad Hussain and Weng-Tink Chooi 2015 Feature extrac-tion and classification for EEG signals using wavelet transform and machinelearning techniques Australasian physical amp engineering sciences in medicine 381 (2015) 139ndash149

[5] John Anvik Lyndon Hiew and Gail C Murphy 2006 Who should fix this bugIn Proceedings of the 28th international conference on Software engineering ACM361ndash370

[6] Gabriele Bavota Mario Linares-Vasquez Carlos Eduardo Bernal-Cardenas Mas-similiano Di Penta Rocco Oliveto and Denys Poshyvanyk 2014 The impactof api change-and fault-proneness on the user ratings of android apps IEEETransactions on Software Engineering 41 4 (2014) 384ndash407

[7] Pamela Bhattacharya and Iulian Neamtiu 2011 Bug-fix time prediction modelscan we do better In Proceedings of the 8thWorking Conference on Mining SoftwareRepositories ACM 207ndash210

[8] Steven Bird and Edward Loper 2004 NLTK the natural language toolkit InProceedings of the ACL 2004 on Interactive poster and demonstration sessionsAssociation for Computational Linguistics 31

[9] Andrew P Bradley 1997 The use of the area under the ROC curve in theevaluation of machine learning algorithms Pattern recognition 30 7 (1997)1145ndash1159

[10] Leo Breiman 2001 Random forests Machine learning 45 1 (2001) 5ndash32[11] Albert Caruana and Michael T Ewing 2010 How corporate reputation quality

and value influence online loyalty Journal of Business Research 63 9-10 (2010)1103ndash1110

[12] Tianqi Chen and Carlos Guestrin 2016 Xgboost A scalable tree boosting systemIn Proceedings of the 22nd acm sigkdd international conference on knowledgediscovery and data mining ACM 785ndash794

[13] Tianqi Chen Tong He Michael Benesty Vadim Khotilovich and Yuan Tang 2015Xgboost extreme gradient boosting R package version 04-2 (2015) 1ndash4

[14] Wenbin Chen Kun Fu Jiawei Zuo Xinwei Zheng Tinglei Huang and WenjuanRen 2017 Radar emitter classification for large data set based on weighted-xgboost IET Radar Sonar amp Navigation 11 8 (2017) 1203ndash1207

[15] Morakot Choetkiertikul Hoa Khanh Dam Truyen Tran and Aditya Ghose 2015Predicting delays in software projects using networked classification (t) In 201530th IEEEACM International Conference on Automated Software Engineering (ASE)IEEE 353ndash364

[16] Morakot Choetkiertikul Hoa Khanh Dam Truyen Tran Aditya Ghose and JohnGrundy 2017 Predicting delivery capability in iterative software developmentIEEE Transactions on Software Engineering 44 6 (2017) 551ndash573

[17] Franccedilois Chollet 2017 Keras[18] Daniel Alencar Da Costa Shane McIntosh Uiraacute Kulesza Ahmed E Hassan and

Surafel Lemma Abebe 2018 An empirical study of the integration time of fixedissues Empirical Software Engineering 23 1 (2018) 334ndash383

[19] Daniel Alencar da Costa Shane McIntosh Christoph Treude Uiraacute Kulesza andAhmed E Hassan 2018 The impact of rapid release cycles on the integrationdelay of fixed issues Empirical Software Engineering 23 2 (2018) 835ndash904

[20] Piew Datta Brij Masand Deepak R Mani and Bin Li 2000 Automated cellularmodeling and prediction on a large scale Artificial Intelligence Review 14 6 (2000)485ndash502

[21] Kushal S Dave Vishal Vaingankar Sumanth Kolar and Vasudeva Varma 2013Timespent based models for predicting user retention In Proceedings of the 22ndinternational conference on World Wide Web ACM 331ndash342

[22] William H DeLone and Ephraim R McLean 1992 Information systems successThe quest for the dependent variable Information systems research 3 1 (1992)60ndash95

[23] William H DeLone and Ephraim R McLean 2003 Model of information systemssuccess a ten years update Journal of Management (2003)

[24] Gideon Dror Dan Pelleg Oleg Rokhlenko and Idan Szpektor 2012 Churnprediction in new users of Yahoo answers In Proceedings of the 21st InternationalConference on World Wide Web ACM 829ndash834

[25] Khaled El Emam 2005 The ROI from software quality Auerbach Publications[26] Katherine Ellis Jacqueline Kerr Suneeta Godbole Gert Lanckriet David Wing

and Simon Marshall 2014 A random forest classifier for the prediction of energy

expenditure and type of physical activity from wrist and hip accelerometersPhysiological measurement 35 11 (2014) 2191

[27] David Enke and Suraphan Thawornwong 2005 The use of datamining and neuralnetworks for forecasting stock market returns Expert Systems with applications29 4 (2005) 927ndash940

[28] Iris Figalist Christoph Elsner Jan Bosch and Helena Holmstroumlm Olsson 2019Customer churn prediction in B2B contexts In International Conference on Soft-ware Business Springer 378ndash386

[29] Emanuel Giger Martin Pinzger and Harald Gall 2010 Predicting the fix timeof bugs In Proceedings of the 2nd International Workshop on RecommendationSystems for Software Engineering ACM 52ndash56

[30] Frank E Harrell Jr 2015 Regression modeling strategies with applications to linearmodels logistic and ordinal regression and survival analysis Springer

[31] Steffen Hedegaard and Jakob Grue Simonsen 2013 Extracting usability and userexperience information from online user reviews In Proceedings of the SIGCHIConference on Human Factors in Computing Systems ACM 2089ndash2098

[32] Yujuan Jiang Bram Adams and Daniel M German 2013 Will my patch make itand how fast Case study on the linux kernel In Proceedings of the 10th WorkingConference on Mining Software Repositories IEEE Press 101ndash110

[33] Guolin Ke Jia Zhang Zhenhui Xu Jiang Bian and Tie-Yan Liu 2019 TabNNa universal neural network solution for tabular data httpsopenreviewnetforumid=r1eJssCqY7

[34] Foutse Khomh Brian Chan Ying Zou and Ahmed E Hassan 2011 An entropyevaluation approach for triaging field crashes A case study of mozilla firefox InReverse Engineering (WCRE) 2011 18th Working Conference on IEEE 261ndash270

[35] Andy Liaw and Matthew Wiener 2002 Classification and regression by randomforest R news 2 3 (2002) 18ndash22

[36] Erik Linstead Paul Rigor Sushil Bajracharya Cristina Lopes and Pierre Baldi2007 Mining eclipse developer contributions via author-topic models In FourthInternational Workshop on Mining Software Repositories (MSRrsquo07 ICSE Workshops2007) IEEE 30ndash30

[37] Xi Long Wenjing Yin Le An Haiying Ni Lixian Huang Qi Luo and Yan Chen2012 Churn analysis of online social network users using data mining techniquesIn Proceedings of the international MultiConference of Engineers and ConputerScientists Vol 1

[38] James MacQueen 1967 Some methods for classification and analysis of multivari-ate observations In Proceedings of the fifth Berkeley symposium on mathematicalstatistics and probability Vol 1 Oakland CA USA 281ndash297

[39] Lionel Marks Ying Zou and Ahmed E Hassan 2011 Studying the fix-timefor bugs in large open source projects In Proceedings of the 7th InternationalConference on Predictive Models in Software Engineering ACM 11

[40] Andrew Kachites McCallum 2002 Mallet A machine learning for languagetoolkit (2002)

[41] Miloš Milošević Nenad Živić and Igor Andjelković 2017 Early churn predic-tion with personalized targeting in mobile social games Expert Systems withApplications 83 (2017) 326ndash332

[42] Audris Mockus Roy T Fielding and James Herbsleb 2000 A case study ofopen source software development the Apache server In Proceedings of the 22ndinternational conference on Software engineering Acm 263ndash272

[43] Gustavo Percio Zimmermann Montesdioca and Antocircnio Carlos Gastaud Maccedilada2015 Measuring user satisfaction with information security practices Computersamp Security 48 (2015) 267ndash280

[44] Leann Myers and Maria J Sirois 2004 Spearman correlation coefficients differ-ences between Encyclopedia of statistical sciences 12 (2004)

[45] Didrik Nielsen 2016 Tree boosting with XGBoost-why does XGBoost win EveryMachine Learning Competition Masterrsquos thesis NTNU

[46] Ehsan Noei Daniel Alencar Da Costa and Ying Zou 2018 Winning the appproduction rally In Proceedings of the 2018 26th ACM Joint Meeting on EuropeanSoftware Engineering Conference and Symposium on the Foundations of SoftwareEngineering ACM 283ndash294

[47] Ehsan Noei Feng Zhang Shaohua Wang and Ying Zou 2019 Towards pri-oritizing user-related issue reports of mobile applications Empirical SoftwareEngineering 24 4 (2019) 1964ndash1996

[48] Ehsan Noei Feng Zhang and Ying Zou 2019 Too Many User-Reviews WhatShould App Developers Look at First IEEE Transactions on Software Engineering(2019)

[49] Derek Orsquocallaghan Derek Greene Joe Carthy and Paacutedraig Cunningham 2015An analysis of the coherence of descriptors in topic modeling Expert Systemswith Applications 42 13 (2015) 5645ndash5657

[50] Ya A Pachepsky Dennis Timlin and GY Varallyay 1996 Artificial neural net-works to estimate soil water retention from easily measurable data Soil ScienceSociety of America Journal 60 3 (1996) 727ndash733

[51] Mahesh Pal 2005 Random forest classifier for remote sensing classificationInternational Journal of Remote Sensing 26 1 (2005) 217ndash222

[52] Sebastiano Panichella Andrea Di Sorbo Emitza Guzman Corrado A VisaggioGerardo Canfora and Harald C Gall 2015 How can i improve my app classifyinguser reviews for software maintenance and evolution In Software maintenanceand evolution (ICSME) 2015 IEEE international conference on IEEE 281ndash290

11

1277

1278

1279

1280

1281

1282

1283

1284

1285

1286

1287

1288

1289

1290

1291

1292

1293

1294

1295

1296

1297

1298

1299

1300

1301

1302

1303

1304

1305

1306

1307

1308

1309

1310

1311

1312

1313

1314

1315

1316

1317

1318

1319

1320

1321

1322

1323

1324

1325

1326

1327

1328

1329

1330

1331

1332

1333

1334

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

1335

1336

1337

1338

1339

1340

1341

1342

1343

1344

1345

1346

1347

1348

1349

1350

1351

1352

1353

1354

1355

1356

1357

1358

1359

1360

1361

1362

1363

1364

1365

1366

1367

1368

1369

1370

1371

1372

1373

1374

1375

1376

1377

1378

1379

1380

1381

1382

1383

1384

1385

1386

1387

1388

1389

1390

1391

1392

[53] Chitra Phadke Huseyin Uzunalioglu Veena B Mendiratta Dan Kushnir andDerek Doran 2013 Prediction of subscriber churn using social network analysisBell Labs Technical Journal 17 4 (2013) 63ndash76

[54] Peter C Rigby Daniel M German and Margaret-Anne Storey 2008 Open sourcesoftware peer review practices a case study of the apache server In Proceedingsof the 30th international conference on Software engineering ACM 541ndash550

[55] Martin P Robillard and Robert Deline 2011 A field study of API learning obstaclesEmpirical Software Engineering 16 6 (2011) 703ndash732

[56] Michael Roumlder Andreas Both and Alexander Hinneburg 2015 Exploring thespace of topic coherence measures In Proceedings of the eighth ACM internationalconference on Web search and data mining ACM 399ndash408

[57] Victor Francisco Rodriguez-Galiano Bardan Ghimire John Rogan Mario Chica-Olmo and Juan Pedro Rigol-Sanchez 2012 An assessment of the effectivenessof a random forest classifier for land-cover classification ISPRS Journal of Pho-togrammetry and Remote Sensing 67 (2012) 93ndash104

[58] Hiroaki Sakoe and Seibi Chiba 1978 Dynamic programming algorithm opti-mization for spoken word recognition IEEE transactions on acoustics speech andsignal processing 26 1 (1978) 43ndash49

[59] Stan Salvador and Philip Chan 2007 Toward accurate dynamic time warping inlinear time and space Intelligent Data Analysis 11 5 (2007) 561ndash580

[60] Peter B Seddon 1997 A respecification and extension of the DeLone and McLeanmodel of IS success Information systems research 8 3 (1997) 240ndash253

[61] Rudy Setiono Bart Baesens and Christophe Mues 2008 Recursive neural net-work rule extraction for data with mixed attributes IEEE Transactions on NeuralNetworks 19 2 (2008) 299ndash307

[62] Dong-Hee Shin 2015 Effect of the customer experience on satisfaction withsmartphones Assessing smart satisfaction index with partial least squaresTelecommunications Policy 39 8 (2015) 627ndash641

[63] Tom De Smedt and Walter Daelemans 2012 Pattern for python Journal ofMachine Learning Research 13 Jun (2012) 2063ndash2067

[64] Daniel Svozil Vladimir Kvasnicka and Jiri Pospichal 1997 Introduction to multi-layer feed-forward neural networks Chemometrics and intelligent laboratorysystems 39 1 (1997) 43ndash62

[65] Shaheen Syed and Marco Spruit 2017 Full-text or abstract Examining topiccoherence scores using latent dirichlet allocation In 2017 IEEE Internationalconference on data science and advanced analytics (DSAA) IEEE 165ndash174

[66] Chakkrit Tantithamthavorn Shane McIntosh Ahmed E Hassan and KenichiMatsumoto 2016 Automated parameter optimization of classification techniquesfor defect prediction models In 2016 IEEEACM 38th International Conference onSoftware Engineering (ICSE) IEEE 321ndash332

[67] Chakkrit Tantithamthavorn Shane McIntosh Ahmed E Hassan and KenichiMatsumoto 2018 The impact of automated parameter optimization on defectprediction models IEEE Transactions on Software Engineering (2018)

[68] Romain Tavenard 2017 tslearn A machine learning toolkit dedicated to time-series data httpsgithubcomrtavenartslearn

[69] L Torlay Marcela Perrone-Bertolotti Elizabeth Thomas and Monica Baciu 2017Machine learningndashXGBoost analysis of language networks to classify patientswith epilepsy Brain informatics 4 3 (2017) 159

[70] Geoffrey G Towell and Jude W Shavlik 1993 Extracting refined rules fromknowledge-based neural networks Machine learning 13 1 (1993) 71ndash101

[71] Irfan Ullah Basit Raza Ahmad Kamran Malik Muhammad Imran Saif Ul Islamand SungWon Kim 2019 A churn prediction model using random forest analysisof machine learning techniques for churn prediction and factor identification intelecom sector IEEE Access 7 (2019) 60134ndash60149

[72] Lorenzo Villarroel Gabriele Bavota Barbara Russo Rocco Oliveto and Massimil-iano Di Penta 2016 Release planning of mobile apps based on user reviews InProceedings of the 38th International Conference on Software Engineering ACM14ndash24

[73] Chih-Ping Wei and I-Tang Chiu 2002 Turning telecommunications call detailsto churn prediction a data mining approach Expert systems with applications 232 (2002) 103ndash112

[74] Cathrin Weiss Rahul Premraj Thomas Zimmermann and Andreas Zeller 2007How long will it take to fix this bug In Fourth International Workshop on MiningSoftware Repositories (MSRrsquo07 ICSE Workshops 2007) IEEE 1ndash1

[75] Ian H Witten Eibe Frank and A Mark 2016 Hall and Christopher J Pal DataMining Practical machine learning tools and techniques (2016)

[76] Barbara H Wixom and Peter A Todd 2005 A theoretical integration of usersatisfaction and technology acceptance Information systems research 16 1 (2005)85ndash102

[77] Jae-Chern Yoo and Tae Hee Han 2009 Fast normalized cross-correlation Circuitssystems and signal processing 28 6 (2009) 819

12

  • Abstract
  • 1 Introduction
  • 2 Motivating Example
  • 3 Data Collection
  • 4 Research Questions amp Results
  • 5 Threats To Validity
  • 6 Related Work
  • 7 Conclusion
  • References
Page 5: On the Relationship between User Churn and Software Issues · relationship between the issues that are present in a software prod-uct and user churn. For this purpose, we investigate

465

466

467

468

469

470

471

472

473

474

475

476

477

478

479

480

481

482

483

484

485

486

487

488

489

490

491

492

493

494

495

496

497

498

499

500

501

502

503

504

505

506

507

508

509

510

511

512

513

514

515

516

517

518

519

520

521

522

On the Relationship between User Churn and Software Issues MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea

523

524

525

526

527

528

529

530

531

532

533

534

535

536

537

538

539

540

541

542

543

544

545

546

547

548

549

550

551

552

553

554

555

556

557

558

559

560

561

562

563

564

565

566

567

568

569

570

571

572

573

574

575

576

577

578

579

580

Alternativetonet

Issue tracking system

Fetch data from

studied domains

IDE Browser

and Web Server

alternatives

Fetch in-

perspective

comment and

reviews

In-perspective

comments

Software

reviews

RQ0

Competition

scene of a

domain

Fetch issues

from Firefox

Eclipse and

Apache

Fetch potential

churn from

Eclipse Firefox

and Apache

Potential churns

Issue reports

Issues amp

Potential churn

pairs

RQ1 Correlating

issues with

potential churns

RQ2 Machine

learning to

predict potential

churns

RQ3 Important

features

associated with

potential churns

Figure 5 An overview of our data collection process

dominant in a specific domain Table 4 shows the distribution oftopics per domain based on our manual analyses We observe thatrelease and updates memory issues bugscrashes and cross-platformavailability can be considered as general topics (ie they are im-portant in all the three analyzed domains) Indeed these topics areintuitively present when handling any kind of software

On the other hand Documentation and testing were presentfor web servers and IDEs signalling that a robust documentation(which is often overlooked [55]) and with good support for test-ing tools may definitely attractretain more users when competingin the IDE and Web Sever Market For example an organizationmay direct their attention to these aspects when studying theircompetition

The bookmarks privacy and security were exclusively present inthe Browsers domain signalling that these are essential features toretain or attract users Simplistic design debug options code refac-toring and auto-completion are topics that target IDEs exclusively

Finally complaints about release and updates memory issues andbugscrashes can be considered as the main topics that drive usersto churn These topics were present in the three domains with thehighest percentages among each of the domains The percentagescombined exceed 50 in each of the domains (see Table 4)Summary Our observations suggest that the usersrsquo concerns aretightly related to issues (eg bugs or crashes) that are present inthe respective software products

RQ1 Can we find a correlation betweenpotential user churn and issue reportsMotivation In RQ0 we observe that software issues (eg bugsamp crashes and memory issues) are common concerns that maymotivate users to churn With such an observation we hypothesizein RQ1 that the potential usersrsquochurn observed on alternativetonetmay be correlated with the issue reports which developers addressin their respective projects This analysis is important because ifsuch a correlation exists the potential user churn information canbe possibly used to prioritize issue reports (for example the moreassociated with potential user churn the more important the issuereport)

Approach In our work we approach the recommendations foralternatives (ie potential usersrsquochurn) and the issue reports asevents occurring over time and interpret them as two time-seriesSuch an interpretation is handy for two reasons (i) we can studythe correlation between two trends (potential usersrsquochurn and issuereports) and (ii) we can study whether the specific peaks in issuereports can be associated with peaks in potential usersrsquochurn

In our study we group the two time series in a weekly basisie all the potential user churn that occurred within a week aregrouped together We use the weekly grouping because groupingissue reports in a daily-basis is too narrow to form a pattern whilegrouping them in a monthly-basis can over-generalize the patternsin issues The time-series of potential usersrsquochurn and the time-series of issues showcase the events (grouped in a weekly-basis)from March 2015 (the earliest date on alternativetonet) to July 2019forming 200 weeks grouping

The first step to correlated our two time-series is to verifywhetherthere exists a global correlation between them (ie whether thereexist common patterns among the increasing and decreasing trendsover time) To do so we use the cross-correlation which is a metricused in signal processing [77] This correlation calculates the simi-larity between two time-series as a function of the displacementfrom one time-series to another For example given two functions119891 and 119892 the cross-correlation calculates the degree to which a shiftin 119892 (along the x-axis) is identical to the shift in 119891

However since we are also interested in analyzing the relation-ship between the peaks of our two time-series we use the DynamicTimeWarping (DTW) [58] which was designed for comparing time-series varying at different speeds A peak in potential usersrsquochurnmay be followed by a peak in issue reports (or vice-versa) ie thepeaks may not necessarily occur at the same time Differently fromthe Eucledian distance which assumes that a point 119894 in a time-seriesmust be aligned to the 119894119905ℎ point in another time-series [38 75] DTWallows the variance in the offset between two time-series [59]

To employ DTW we use the tslearn [68] which is a machinelearning toolkit specialized for time series analysis The DTW com-putation produces a matrix that shows the distance between everytwo points in the graph of the two time-series The algorithm isdesigned in a way to choose the closest point that respects the

5

581

582

583

584

585

586

587

588

589

590

591

592

593

594

595

596

597

598

599

600

601

602

603

604

605

606

607

608

609

610

611

612

613

614

615

616

617

618

619

620

621

622

623

624

625

626

627

628

629

630

631

632

633

634

635

636

637

638

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

639

640

641

642

643

644

645

646

647

648

649

650

651

652

653

654

655

656

657

658

659

660

661

662

663

664

665

666

667

668

669

670

671

672

673

674

675

676

677

678

679

680

681

682

683

684

685

686

687

688

689

690

691

692

693

694

695

696

Table 2 The predominant topics in alternativetonet

Release and updatesKeywords ldquo119907119890119903119904119894119900119899rdquo ldquo119897119900119899119892119890119903 rdquo ldquo119903119890119897119890119886119904119890rdquo ldquo119889119900119908119899119897119900119886119889rdquo ldquo119900 119891 119891 119894119888119894119886119897rdquoExamples - The program seems to be no longer updated Last versions (2018) can be still downloaded from the official website

- Firefox version 60+ (Quantum) is better than Chrome Compare to previous versions of Firefox

Bookmarks FunctionalityKeywords ldquo119887119900119900119896119898119886119903119896rdquo ldquo119901119886119899119890119897rdquo ldquo119891 119886119907119900119903119894119905119890rdquo ldquo119904119906119901119901119900119903119905rdquo ldquo119904119910119899119888rdquoExamples - Firefox now allows bookmarks importing from Chrome which is a huge plus

- With the recent upgrade to 20 (including the bookmarks syncing feature that I liked in Firefox) Vivaldi just got better

Flash supportKeywords ldquo119901119897119906119892119894119899rdquo ldquo119907119894119889119890119900rdquo ldquo119894119899119904119905119886119897119897rdquo ldquo119891 119897119886119904ℎrdquo ldquo119898119890119889119894119886 minus 119901119897119886119910119890119903 rdquoExamples - Since its the only way to watch Flash video on an iPhone without jailbreaking Skyfire is a killer app thats worth getting for anyone Playback of Flash video over WiFi is

flawless and the quality is at very least watchable

Memory issuesKeywords ldquo119898119890119898119900119903119910rdquo ldquo119877119860119872rdquo ldquo119897119886119892rdquo ldquo119897119894119898119894119905rdquo ldquoℎ119890119886119907119910rdquoExamples - Great C++ IDE which balances memory usage and indexing solution Contains some basic refactoring functions Good for large projects and limited RAM with remaining

some capabilities of more complex C++ IDEs- Eclipse is really lagging after the last update The memory is filled whenever I run a Java swing applet- Chrome is so heavy it lags alot when I open multiple tabs

Privacy and securityKeywords ldquo119901119903119894119907119886119888119910rdquo ldquo119904119890119888119906119903119894119905119910rdquo ldquo119894119898119901119900119903119905119886119899119905rdquo ldquo119886119897119897119900119908rdquo ldquo119888119900119899119905119903119900119897rdquoExamples - Firefox allows far more control over your privacy than does Chrome (or even Chromium on which Chrome is based)

- Google Chrome is a great browser and a lot better than Vivaldi Browser Chrome keeps updating with cool stuff like the new cast feature better extensions and lot betterprivacy and security Vivaldi has more customizing features but has a long way to go to compete with Chrome

BugsCrashesKeywords ldquo119888119903119886119904ℎrdquo ldquo119887119906119892rdquo ldquo119891 119894119909rdquo ldquo119890119909119901119890119903119894119890119899119888119890rdquo ldquo119894119904119904119906119890rdquoExamples - Lots of users experience Adobe Flash Player hangs or crashes I dont know why it isnt fixed yet

If Chrome ever got its proxy support bugs fixed it would leap to the top of the list and become pretty much the only web browser Id ever use- I worked with NetBeans for a couple of years and always loved it but for professional users the yearly fee is well spent Nicer interface better Code Intelligence moreoptions to fine tune what happens on saving a file (eg saving or uploading to various locations) You get a couple of feature update each year plus bugfixes whereasNetBeans is sometimes very slow to update things (support for new version of a language) or to fix known bugs

Table 3 The predominant topics in alternativetonet (continued)

Debug optionsKeywords ldquo119891 119890119886119905119906119903119890rdquo ldquo119889119890119907119890119897119900119901119898119890119899119905rdquo ldquo119889119890119887119906119892rdquo ldquo119894119899119905119890119892119903119886119905119894119900119899rdquo ldquo119901119890119903 119891 119890119888119905rdquoExamples - Not enough debug and run options just a text editor Atom has not got any solid debugging features (and packages) out of the box or at all

- This is my favorite code editor It is open source and portable you can get a portable version that will fit onto a flash drive if youwant It is just as configurableas Sublime Text and has a large community producing great plugins it has GIT support built in and excellent support for Javascript HTML CSS and Pythonand others of course It has an integrated debugging system

Simplistic designKeywords ldquo119904119894119898119901119897119890rdquo ldquo119890119889119894119905119900119903 rdquo ldquo119897119894119892ℎ119905119908119890119894119892ℎ119905rdquo ldquo119889119890119904119894119892119899rdquo ldquo119899119900119905119890119901119886119889rdquoExamples - Really lightweight with some advanced not complete but easy to use auto complete features Simple design and customizable

Cross-platformKeywords ldquo119888119903119900119904119904 minus 119901119897119886119905 119891 119900119903119898rdquo ldquo119871119894119899119906119909rdquo ldquo119900119901119890119899 minus 119904119900119906119903119888119890rdquo ldquo119886119907119886119894119897119886119887119897119890rdquo ldquo119894119899119904119905119886119897119897rdquoExamples - Its cross platform and open source and bring the benefits of the JetBrains IDE platform which is well polished and powerful

- One of the best open source browsers that work on Linux and Windows

Code refactoring and auto-completionKeywords ldquo119888119900119889119890rdquo ldquordquo119886119906119905119900 minus 119888119900119898119901119897119890119905119894119900119899rdquo ldquo119903119890 119891 119886119888119905119900119903 rdquo ldquo119900119901119905119894119900119899rdquo ldquo119906119904119890rdquoExamples - For its young age it is well matured and a very powerful IDE with great CMake Support and other goodies like great refactoring tools and a sophisticated

auto-completion system

TestingKeywords ldquo119905119890119904119905rdquo ldquo119904119890119903119907119890119903 rdquo ldquo119889119890119901119897119900119910rdquo ldquo119908119890119887119904119894119905119890rdquo ldquo119904119890119905rdquoExamples - XAMPP is great tool to develop and test your website (particularly if it uses php and mysql databases) offline before putting it on a live server You could

also use XAMPP as an easy method of setting up a live server- Eclipse is far more superior than netbeans in unit testing functionality

DocumentationKeywords ldquo119889119900119888119906119898119890119899119905119886119905119894119900119899rdquo ldquo119904119890119903119907119890119903 rdquo ldquo119901119900119903119905rdquo ldquo119889119886119905119886119887119886119904119890rdquo ldquo119891 119900119903119906119898rdquoExamples - Easiest way to run a web server on Windows The performance is surprisingly good and lots of documentation on how to manage it

- Support is via forum only Documentation can be very poor particularly around product upgrades (one expects more from commercial packages) Due totheir poor documentation I lost several years of development databases when porting from an old to a new computer (they dont accurately document thelocation of the database files)- Eclipse lacks in adding plugins documentation

time constraints for continuity and monotonicity Finally two co-authors performed a manual analysis of the observed peaks in the

two time-series7 The goal of the manual analyses is to find whetherthe identified peaks are indeed likely related (eg the issue reports

7We share the data of our manual analyses at httpsdoiorg105281zenodo3610584

6

697

698

699

700

701

702

703

704

705

706

707

708

709

710

711

712

713

714

715

716

717

718

719

720

721

722

723

724

725

726

727

728

729

730

731

732

733

734

735

736

737

738

739

740

741

742

743

744

745

746

747

748

749

750

751

752

753

754

On the Relationship between User Churn and Software Issues MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea

755

756

757

758

759

760

761

762

763

764

765

766

767

768

769

770

771

772

773

774

775

776

777

778

779

780

781

782

783

784

785

786

787

788

789

790

791

792

793

794

795

796

797

798

799

800

801

802

803

804

805

806

807

808

809

810

811

812

Table 4 The extracted topics percentages distribution overthe three domains

Topics Browsers() IDEs() Web servers()Release and updates 18 17 21Bookmarks functionality 12 0 0Memory issues 23 22 11Privacy and security 8 0 0BugsCrashes 24 19 20Debug options 0 10 0Simplistic design 0 8 2Cross-platform 8 6 5Code refactoring and auto-completion 0 10 0Testing 0 2 17Documentation 0 2 15Miscellaneous 7 4 9

within a peak actually describe problems mentioned in the poten-tial usersrsquochurn) Again both co-authors analyze and discuss everyanalyzed sample

In this RQ we perform our analyses only in our studied projects(as opposed to analyzing all the software alternatives in their do-mains) As discussed in Section 3 it would be impracticable toextract the data from all the different Issue Tracking Systems of thesoftware alternatives in the studied domainsFindingsWe respectively observe a cross-correlation of 076 072and 070 in Firefox Eclipse and Apache These cross-correlationscores indicate that there exist a global commonality between thetrends of the two time-series regardless of the timing factor

Regarding our DTW analysis we observe that the time fromthe filing of an issue report to the time of a spark in potentialusersrsquochurn span from 2 to 4 weeks (on the median for the threeprojects) Additionally our DTW correlation reveals a bidirectionalrelationship between the peaks in issue reports and the peaks inpotential usersrsquochurn For the cases where a peak in issue reportsleads to a peak in potential usersrsquochurn it is intuitive to think thatthe peak in potential usersrsquochurn is due to the frustration generatedby the issues described in the reports On the other hand the caseswhere a peak in potential usersrsquochurn is followed by a peak inissue reports may occur due to users being frustrated over non-obvious issues For example if we consider the potential user churndepicted in Figure 3 (ie ldquoJupyter is slow on Firefoxrdquo) the users mightnot find it obvious that such a problem should be reported to theFirefox team (eg users may simply interpret it as a characteristicof Firefox) Therefore in such cases issue reports would be filledonly a while after the potential usersrsquochurn have been expressed

We are specifically interested in studying the relationship be-tween reported issues (ie the documented issues in the trackersystems) and the potential user churn Our rationale is that under-standing the characteristics of issues that are associated with userchurn can help developers to prioritize such issues and avoid futurechurn (thereby retaining more users)

Finally Table 5 shows a subset of the examples found in ourmanual analyses We can indeed observe the apparent relationshipbetween the comments within the potential usersrsquochurn and thedescriptions (or titles) of issue reports The examples are followinga unidirectional link in which the issue report that occurs at acertain time would induce the potential user churn at a later timeSummary Our results suggest that there exists a significant cor-relation between software issues and potential user churn

RQ2 Can we use machine learning models topredict the rate of potential user churnMotivation In the practice most of the issue reports undergo hu-man estimations of the priority and the severity [5] usually basedon internal business-oriented factors (such as ROI [25]) Issuesmight be prioritized by their recency the affected platforms theversions of the software where the issue is present Feeding a ma-chine learning model to identify the potential user churn that arecaused by issues may help automate the issue report prioritizationprocess and ultimately save huge costs in software developmentApproach We train three different machine learning models topredict low and high user churn rates by using information from is-sue reports The first model is a Feed-Forward Neural Network withtwo hidden layers developed by using the framework Keras [17]Keras is widely used to train simple neural networks as it hasa higher abstraction than TensorFlow [1] Feed-Forward NeuralNetworks [64] are the most suitable type of neural networks tobe trained on tabular data [27 33 50] This network evaluates allthe possible combinations of different metrics values to guide thepredictions

Our second model is the XGboost model which is a gradientboosting tree [12 13] XGboost is widely used for binary classifica-tion problems Research has shown that XGboost performs betterthan neural networks and classic classification machine learningalgorithms in problems dealing with tabular data [14 45 69]

We also train Random Forest models [35] Random Forest isa classical machine learning algorithm which is widely used inproblems dealing with tabular data [26 51 57] In addition Ran-dom Forest models have been widely used in software engineeringresearch such as in defect prediction [67]

The goal of ourmodels is to predict the rate of potential usersrsquochurngiven the characteristics of issue reports The data used to fit ourmodels is the data obtained from our DTW associations Simply putthe DTW provides us with 119875 pairs of issue reports 119868 and potentialusersrsquochurn 119862 (ie 119875 lt 119868 119862 gt) Therefore we study the character-istics of the issues 119868 to predict the amount of potential user churn119862 We split the distribution of 119862 into two percentiles (ie low andhigh) The low percentile being from 0 to 50 and the high from51 to 100 Thus our models output dichotomous predictions iewhether the potential usersrsquochurn is low or high

The characteristics of issue reports that we include in our models(henceforth referred to as features) are described in Table 6 Thesefeatures are collected from the Issue Tracking systems of our threestudied projects We study features such as Component Hardwareand OS to account for the locality and the spread of the issuesFor example these features can indicate whether platform-specificissues or more general issues have more potential user churn

Other features such as the Severity and Priority showcase theprioritization used by developers The Severity and Priority areimportant for us to verify whether the prioritization performed bydevelopers is aligned with the rate of potential user churn

Features related to the type of the bugs (ie whether it is acrash) whether a version number is present or the target milestoneprovide us information regarding the tracking process of issuesThey are important to understand whether better-tracked issues

7

813

814

815

816

817

818

819

820

821

822

823

824

825

826

827

828

829

830

831

832

833

834

835

836

837

838

839

840

841

842

843

844

845

846

847

848

849

850

851

852

853

854

855

856

857

858

859

860

861

862

863

864

865

866

867

868

869

870

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

871

872

873

874

875

876

877

878

879

880

881

882

883

884

885

886

887

888

889

890

891

892

893

894

895

896

897

898

899

900

901

902

903

904

905

906

907

908

909

910

911

912

913

914

915

916

917

918

919

920

921

922

923

924

925

926

927

928

Table 5 Manual investigation of DTW correlation linking

FirefoxExample 1Correlation Linking Week 164 in issues to week 169 in switches (May 2018)Issue Report Title Firefox version 5303 appears high memory high resources usage and might causing for SSD damagesIssue Description Mozilla Firefox it is caching all of this data 4-5GB on my new SSD and its lost 4 from its health within 5 months due continuously writesoverwrites firefox cache on my SSDAltrnativeto Comment Switching to Chrome after version 5303 high memory is damaging my SSDExample 2Correlation Linking Week 66 in issues to week 68 in switches (November 2016)Issue Report Title Sessions are not cleared in the private windowIssue Description After closing private window (not a tab) and opening a new one I still logged inAltrnativeto Comment Chrome incognito functionality is more stable than Firefox Firefox doesnrsquot resets sessions

EclipseExample 1Correlation Linking Week 10 in issues to week 11 in switches (July 2015)Issue Report Title Not Java 8 CompatibleIssue Description When downloading Eclipse for the first time for Java development 64-bit (which is what my computer is) it is not running because it is not compatible with Java version 8

(which is the latest version of Java) but appears to be compatible with version 7 still -DosgirequiredJavaVersion=17Altrnativeto Comment Using Netbeans because Eclipse is not installingExample 2Correlation Linking Week 211 in issues to week 213 in switches (February 2019)Issue Report Title jar files in the exported application do not contain class filesIssue Description The application has many plugins It builds and runs well within eclipse but when I export the application using the eclipse export product wizard the jar files (corresponding to

the plugins) do not contain any class files they include meta-inf pluginxml and all the extra folders (eg lib assets etc when available) but they do NOT contain any class filesAltrnativeto Comment Didnrsquot find any problems in exporting jar (when recommending IntelliJ)

ApacheExample 1Correlation Linking Week 113 in issues to week 117 in switches (July 2017)Issue Report Title httpd 2426 no longer building against lua 531 or lua 534Issue Description Building Apache 2426 against lua 531 or 534 compiled from scratch (latest version as of this writing) does not work Compilation against lua 531 did work with 2425Altrnativeto Comment Some issues while building with lua (when recommending XAMPP)Example 2Correlation Linking Week 12 in issues to week 13 in switches (September 2019)Issue Report Title Apache Server is restarted time to time (Release 2410)Issue Description I am using Apache release 2410 in Windows Server 2008 Time to time the Apache server is getting restarted The service is unavailable for 3-5 minutes and after that the service

is started up automatically This issue occurs frequently twice a day two days once etcAltrnativeto Comment XAMPP is more stable I am experiencing random restarts throughout the day when using Apache

Table 6 The issue reports attributes for Firefox Eclipse andApache

Metric Description

Component consists of different components of the system such as bookmarkstheme toolbar etc

Status describes the status of the issue such as unconfirmed open re-solved etc

Resolution describes the resolution type of the issue such as fixed duplicateinvalid etc

Severity the assigned severity such as blocker critical normal etcHardware the different hardware affected by the issue such as desktop mobile

32 or 64 bits etcOS the different OS affected by the issue such as OSX Windows etcPriority the assigned priority from 1 to 5Type the assigned issue type such as defect enhancement or taskTime Alive the time difference between the date opened and the date resolved

in minutesTime since last change the time difference between the date opened and the date of last

change in minutesIs a Crash a boolean value to indicate if it is a crash reportHas a Version Number a boolean whether the issue report has a version numberHas a Target Milestone a boolean whether the issue report has a target numberHas a Regression Range a boolean value to indicate if the bug causes a regression testingNumber of comments the number of comments for the reportNumber of votes the number of votes for the report which validates the integrity of

the report by other usersNumber of Blocks the number of issues that are blocked by that issue until it is re-

solved

(ie with more tracking information) are associated with potentialusersrsquochurn

The number of comments votes and blocks showcase the effortinvested on discussing an issue Finally the time-alive and the time-since-last-change are two features related to the life-cycle of an issuereport Longer time-alive values indicate a lingering issue that hasnot been resolved yet while a short time-since-last-change values

indicate issues that have been recently addressed (and thus can stillbe reoccurring)

Our studied features are inspired by previous research in soft-ware engineering that have built machine learning models usinginformation present in issue reports [15 16 16 18 19 29 32 39 74]

We chose features that are common across the issue reports ofFirefox Eclipse and Apache We filtered out features that are uniqueto an issue tracker system only to ensure a fair comparison betweenour studied issue reports Afterwards we performed Spearmancorrelation tests [44] on the selected features in which each featureis tested against the set of all the other features We did not observeany correlation obtaining a value higher than 07 which suggeststhat our selected features are not correlated and can be safely fedinto our machine learning models [30]

To evaluate the performance of our models we use the Area Un-der the Curvemetric (AUC) AUC is useful for evaluating our modelsbecause our models output probabilities Therefore AUC showsthe discrimination power of our models in every probability thresh-old (unlike Precision Recall or F-measure which are limited toevaluating models at a single probability threshold [66]) The AUCvalues range from 0 to 1 An AUC of 05 denotes a random guessingmodel while an AUC of 1 denotes a perfect distinguishing powerAn AUC of 0 denotes a model with perfect inverse predictions

Given the temporal nature of our data (ie future issue reportscannot be used to predict the potential user churn of past issuereports) we adopt a leave-one-out validation approach to obtainour AUC values

First our issue reports have already been sorted when we per-form the time-series analyses Second we split our data into two

8

929

930

931

932

933

934

935

936

937

938

939

940

941

942

943

944

945

946

947

948

949

950

951

952

953

954

955

956

957

958

959

960

961

962

963

964

965

966

967

968

969

970

971

972

973

974

975

976

977

978

979

980

981

982

983

984

985

986

On the Relationship between User Churn and Software Issues MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea

987

988

989

990

991

992

993

994

995

996

997

998

999

1000

1001

1002

1003

1004

1005

1006

1007

1008

1009

1010

1011

1012

1013

1014

1015

1016

1017

1018

1019

1020

1021

1022

1023

1024

1025

1026

1027

1028

1029

1030

1031

1032

1033

1034

1035

1036

1037

1038

1039

1040

1041

1042

1043

1044

sets one containing 80 of the data ie the training set and an-other containing 20 ie the validation set In the first iterationof our models we use 80 of the data to train our models and weuse the first element of the validation set to test out models Thenthe leave-one-out validation iteratively increases the training setsize by one element (from the validation set) and predicts the nextelement of the validation set This process is repeated until thetraining set reaches the size of 119873 minus 1 (where 119873 is the size of ourdata set) Finally our obtained AUC values are the aggregation(ie median) of all the AUC values obtained in the leave-one-outiterationsFindingsTheAUC values obtained for our threemodels are shownin our online appendix8 The XGboost model performs the best inthe three studied projects obtaining AUC values that exceed 083There exists a subtle difference between the AUC values of theXGboost model and the Feed Forward Neural Networks AUCssince both the XGboost model and the Neural Networks can graspthe complex relationships in our issue reports dataSummary Our results suggest that machine learning models suchas XGboost and Feed Forward Neural Networks can predict therate of user churn with relatively high accuracies (in terms of AUC)Such predictions could help in the prioritization of issue reports

RQ3 What are the most important factorsrelated to potential user churnMotivation In RQ2 we observe that machine learning modelscan be used to predict the rate of potential usersrsquochurn Howeverdevelopers would hardly blindly trust a machine learning modelto help on their decisions (and they should not do so) Developerswould mostly benefit from understanding the reasons behind thepredictions of our machine learning models so that they couldpossibly adapt (or not) their development process Therefore inRQ3 we investigate the most important features in the predictionsof our machine learning modelApproach In this RQ we study the most important features in thepredictions of the XGboost models since they obtain the best AUCvalues in our three studied projects

To find the most important features we adopt a simple featureextraction algorithm [61 70] Consider that our models are trainedon a feature set 119883 = 1199091 1199092 119909119899 (as explained in RQ2) The featureextraction process consists of iteratively (i) removing each feature 119909119894from our models and (ii) computing the AUC without each feature119909119894 (by using the same leave-one-out validation process explainedin RQ2) The AUC values from the models without a feature 119909119894 arecompared to the models containing all the features (ie we takethe difference 119860119880119862119900119903119894119892119894119899119886119897 minus119860119880119862119903119890119898119900119907119890119889 ) The higher the drop inthe AUC value caused by removing a certain feature 119909119894 the higherthe importance of such a feature 119909119894 [4 9 10]Findings Table 7 shows our obtained results after performing thefeature extraction process The table shows the drop in the AUC foreach feature (ie ΔAUC) We also show the minimum maximummean and median values of the features in the 119897119900119908 user churn rate(ie 0) and ℎ119894119892ℎ user churn rate (ie 1) prediction classes

8httpsdoiorg105281zenodo3610584

The Issue Alive Time is the most important feature obtaining thehighest ΔAUC in our three studied projects This result suggeststhat it is unhealthy to leave issues hanging for a long time as theycan be perceived as a reason for users to churn

The Number of Comments obtains a considerable ΔAUC in ourthree studied systems The association between the Number ofComments and the rate of user churnmay occur due to themultitudenumber of users being affected by an issue

The Time-since-last-change is also an important feature in theFirefox and Eclipse projects This result suggests that issues withrecurring changes or modifications may signal a certain instabilityin the fixing process which might cause a potential user churn inthe future

It is worthy noting that features such as Severity Priority andType of Bugs do not obtain a high importance in the predictionswhich might suggest that the existing prioritization fields used inissue reports are misaligned with the potential user churnSummaryOur results suggest that long lived issues can potentiallylead to more potential userrsquos churn and that the existing processesfor prioritizing issues should be augmented to capture the highlyinteractive and long lived issues

5 THREATS TO VALIDITYConstruct Validity The main construct validity of our study isrelated to the assumption of a potential user churn The alterna-tivetonet website does not record exactly when a user has chosen asoftware alternative over another software Instead alternativetonetrecords that a user thinks that a software may be a good alternativeto another software The motivation behind the act of signaling analternative is not known by us For example there might be usersthat are simply driven by the passion of contributing to the alterna-tivetonet community which lead them to provide several opinionsas to which software could make good alternatives to others Wehave deliberately adopted the term potential user churn instead ofsimply user churn due to this unknown motivations behind signal-ing alternatives Moreover it is reasonable to think that if a givensoftware has received a considerable amount of recommendationfor alternatives this may likely indeed represent that users areconsidering to change the software

Internal Validity Our internal threats to validity are mainlyrelated to (i) our chosen features and (ii) our prediction classes Wesplit the peaks of potential user churn into the low and high classesOther research may find different results if the potential user churnis modeled in different way In addition we acknowledge that ourset of metrics is not exhaustive For example we have not studiedwhether the presence of a stack-trace in an issue report can berelated to future potential user churn We plan to (i) model thepotential user churn as a continuous variable and (ii) extend our setof features to predict the potential user churn in future research

External Validity Our study targeted three domains that areweb servers IDEs and web browsers These three domains werechosen due to their availability since we could fetch their issuereports data Although our study showed common factors in ouranalyses we cannot generalize our results to other domains orprojects with a different scale (ie smaller projects) Regardless ourwork provides insights of the possible reasons behind user churn

9

1045

1046

1047

1048

1049

1050

1051

1052

1053

1054

1055

1056

1057

1058

1059

1060

1061

1062

1063

1064

1065

1066

1067

1068

1069

1070

1071

1072

1073

1074

1075

1076

1077

1078

1079

1080

1081

1082

1083

1084

1085

1086

1087

1088

1089

1090

1091

1092

1093

1094

1095

1096

1097

1098

1099

1100

1101

1102

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

1103

1104

1105

1106

1107

1108

1109

1110

1111

1112

1113

1114

1115

1116

1117

1118

1119

1120

1121

1122

1123

1124

1125

1126

1127

1128

1129

1130

1131

1132

1133

1134

1135

1136

1137

1138

1139

1140

1141

1142

1143

1144

1145

1146

1147

1148

1149

1150

1151

1152

1153

1154

1155

1156

1157

1158

1159

1160

Table 7 The most important features for Firefox Eclipse and Apache ranked by their drop in AUC

Factor ΔAUC Min 0 Max 0 Mean 0 Median 0 Min 1 Max 1 Mean 1 Median 1

FirefoxTime Alive 013 0 120 1102 13 10 1487 4872 63OS All 007 0 1 014 0 0 1 083 1Number of Comments 006 1 65 894 12 1 458 2811 45Time Since Last Change 006 9 30 1090 22 0 5 410 3Has a Target Milestone 005 0 1 022 0 0 1 079 1Has a Version Number 004 0 1 033 0 0 1 064 1Component UI Web Payments 003 0 1 042 0 0 1 061 1

EclipseTime Alive 012 0 103 901 12 8 1334 4396 49Has a Target Milestone 006 0 1 019 0 0 1 082 1Time since last change 006 7 27 851 19 0 6 312 3Has a Version Number 005 0 1 034 0 0 1 077 1Component Debug 004 0 1 047 0 0 1 072 1Number of Comments 003 1 72 1092 11 1 632 2701 72OS All 002 0 1 050 0 0 1 068 1

ApacheTime Alive 014 0 136 1132 14 14 1521 5310 70Component Utils 010 0 1 022 0 0 1 091 1Has a Version Number 009 0 1 038 0 0 1 086 1Number of Comments 005 1 92 1232 6 1 352 4207 79OS All 004 0 1 033 0 0 1 079 1Status REOPENED 003 0 1 040 0 0 1 073 1Hardware All 002 0 1 051 0 0 1 078 1

6 RELATEDWORKGiven that we study the relationship between potential user churnand issue reports our research is related to the areas of usercustomerchurn and software quality Therefore we survey the related re-search around these two areas in this section

UserCustomerChurnUser or customer churn has beenwidelystudied in the area of telecommunication services [3 28 71] Aminet al [3] proposed a cross-company machine learning model topredict customer churn The authors also propose the use of datatransformation (eg by using the log function) to improve the train-ing data Ullah et al [71] used feature selection techniques such asInformation Gain and Correlation Attributes Ranking Filter to selectwhich features better explained the churn behaviour of differentcustomer groups The authors observed that a Random Forest modelperformed the best in their study While the Business to Customers(B2C) churn has been widely studied Figalist et al [28] recentlyinvestigated the Business to Business (B2B) churn ie when thecustomer business changes shifts to the services provided by thecompetition Figalist et al [28] allude that although B2B relation-ships tend to be more stable they have a much bigger financial im-pact when they change In terms of software engineering there hasbeen a lack of studies that have investigated usercostumer churndirectly Instead much research has been invested on customerreviews regarding software products [6 46 47] Noei et al [46]studied which features from mobile apps are associated with betterranks in the Google Play store While the work by Noei et al [46]does not study the customer churn directly striving for better ranksin Google Play may avoid user churn in mobile apps Bavota etal [6] investigated the relationship between the usage of fault- andchange-prone APIs in Android apps and user reviews Differentlyfrom all the aforementioned work our study investigates the userchurn is software products

Software Quality Although there exist a lack of studies ana-lyzing the user churn of software productsservices there has beena considerable amount of research on software quality [5 34 39 6667 74] Research on software quality is important since a quality

software is more stable and less defect-prone Ultimately such char-acteristics are tightly related to the observations of our study(iesoftware issues are correlated with user churn) A prominent areaof software quality research in software engineering is the defectprediction area [66 67] Tantithamthavorn et al [66] studied howmuch improvement can be obtained by automatically optimizingdefect prediction models Their research shows that automatic op-timization can yield significant or insignificant performance gainsdepending on the machine learning algorithm Other considerableamount of research has been invested on the triaging part of issuereports [5] and on the effort estimation of an issue report (in termsof time to address an issue) [39 74] In this paper we complementprior research by studying the relationship shared between issuereports (eg defects or required enhancements) with the potentialuser churn of users

7 CONCLUSIONIn this work we study the data available on alternativetonet tobetter understand the relationship between software issues and thepotential user churn of users Having observed that user concernsare tightly related to software issues (eg bugs) we investigatethe relationship between issue reports and the potential user churnof users Our study reveals key issues that must be addressed forthe success of a software product (depending on the domain) Forexample we observe that the potential user churn of users may betightly related to the lack of a robust documentation and support fortesting tools (in the ldquoIDErdquo and ldquoWeb Serverrdquo domains) Finally ourmachine learning models reveal that (i) the longer the issue takes tobe fixed the higher the chances of user churn and (ii) issues withinmore general software components are more likely to be associatedwith user churn Finally we suggest that the current prioritizationperformed by developers should be augmented to encompass thelong lived and highly interactive issues In overall our study sug-gests that the prioritization process of issues can be improved byconsidering the potential user churn of users associated with suchissues

10

1161

1162

1163

1164

1165

1166

1167

1168

1169

1170

1171

1172

1173

1174

1175

1176

1177

1178

1179

1180

1181

1182

1183

1184

1185

1186

1187

1188

1189

1190

1191

1192

1193

1194

1195

1196

1197

1198

1199

1200

1201

1202

1203

1204

1205

1206

1207

1208

1209

1210

1211

1212

1213

1214

1215

1216

1217

1218

On the Relationship between User Churn and Software Issues MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea

1219

1220

1221

1222

1223

1224

1225

1226

1227

1228

1229

1230

1231

1232

1233

1234

1235

1236

1237

1238

1239

1240

1241

1242

1243

1244

1245

1246

1247

1248

1249

1250

1251

1252

1253

1254

1255

1256

1257

1258

1259

1260

1261

1262

1263

1264

1265

1266

1267

1268

1269

1270

1271

1272

1273

1274

1275

1276

REFERENCES[1] Martiacuten Abadi Paul Barham Jianmin Chen Zhifeng Chen Andy Davis Jeffrey

Dean Matthieu Devin Sanjay Ghemawat Geoffrey Irving Michael Isard Man-junath Kudlur Josh Levenberg Rajat Monga Sherry Moore Derek G MurrayBenoit Steiner Paul Tucker Vijay Vasudevan Pete Warden Martin Wicke YuanYu and Xiaoqiang Zheng 2016 Tensorflow A system for large-scale machinelearning In 12th USENIX Symposium on Operating Systems Design and Imple-mentation (OSDI 16) 265ndash283

[2] W Abdelmoez Mohamed Kholief and Fayrouz M Elsalmy 2012 Bug fix-time pre-diction model using naiumlve bayes classifier In 2012 22nd International Conferenceon Computer Theory and Applications (ICCTA) IEEE 167ndash172

[3] Adnan Amin Babar Shah Asad Masood Khattak Thar Baker and Sajid Anwar2018 Just-in-time customer churn prediction with and without data transforma-tion In 2018 IEEE Congress on Evolutionary Computation (CEC) IEEE 1ndash6

[4] Hafeez Ullah Amin Aamir Saeed Malik Rana Fayyaz Ahmad Nasreen BadruddinNidal Kamel Muhammad Hussain and Weng-Tink Chooi 2015 Feature extrac-tion and classification for EEG signals using wavelet transform and machinelearning techniques Australasian physical amp engineering sciences in medicine 381 (2015) 139ndash149

[5] John Anvik Lyndon Hiew and Gail C Murphy 2006 Who should fix this bugIn Proceedings of the 28th international conference on Software engineering ACM361ndash370

[6] Gabriele Bavota Mario Linares-Vasquez Carlos Eduardo Bernal-Cardenas Mas-similiano Di Penta Rocco Oliveto and Denys Poshyvanyk 2014 The impactof api change-and fault-proneness on the user ratings of android apps IEEETransactions on Software Engineering 41 4 (2014) 384ndash407

[7] Pamela Bhattacharya and Iulian Neamtiu 2011 Bug-fix time prediction modelscan we do better In Proceedings of the 8thWorking Conference on Mining SoftwareRepositories ACM 207ndash210

[8] Steven Bird and Edward Loper 2004 NLTK the natural language toolkit InProceedings of the ACL 2004 on Interactive poster and demonstration sessionsAssociation for Computational Linguistics 31

[9] Andrew P Bradley 1997 The use of the area under the ROC curve in theevaluation of machine learning algorithms Pattern recognition 30 7 (1997)1145ndash1159

[10] Leo Breiman 2001 Random forests Machine learning 45 1 (2001) 5ndash32[11] Albert Caruana and Michael T Ewing 2010 How corporate reputation quality

and value influence online loyalty Journal of Business Research 63 9-10 (2010)1103ndash1110

[12] Tianqi Chen and Carlos Guestrin 2016 Xgboost A scalable tree boosting systemIn Proceedings of the 22nd acm sigkdd international conference on knowledgediscovery and data mining ACM 785ndash794

[13] Tianqi Chen Tong He Michael Benesty Vadim Khotilovich and Yuan Tang 2015Xgboost extreme gradient boosting R package version 04-2 (2015) 1ndash4

[14] Wenbin Chen Kun Fu Jiawei Zuo Xinwei Zheng Tinglei Huang and WenjuanRen 2017 Radar emitter classification for large data set based on weighted-xgboost IET Radar Sonar amp Navigation 11 8 (2017) 1203ndash1207

[15] Morakot Choetkiertikul Hoa Khanh Dam Truyen Tran and Aditya Ghose 2015Predicting delays in software projects using networked classification (t) In 201530th IEEEACM International Conference on Automated Software Engineering (ASE)IEEE 353ndash364

[16] Morakot Choetkiertikul Hoa Khanh Dam Truyen Tran Aditya Ghose and JohnGrundy 2017 Predicting delivery capability in iterative software developmentIEEE Transactions on Software Engineering 44 6 (2017) 551ndash573

[17] Franccedilois Chollet 2017 Keras[18] Daniel Alencar Da Costa Shane McIntosh Uiraacute Kulesza Ahmed E Hassan and

Surafel Lemma Abebe 2018 An empirical study of the integration time of fixedissues Empirical Software Engineering 23 1 (2018) 334ndash383

[19] Daniel Alencar da Costa Shane McIntosh Christoph Treude Uiraacute Kulesza andAhmed E Hassan 2018 The impact of rapid release cycles on the integrationdelay of fixed issues Empirical Software Engineering 23 2 (2018) 835ndash904

[20] Piew Datta Brij Masand Deepak R Mani and Bin Li 2000 Automated cellularmodeling and prediction on a large scale Artificial Intelligence Review 14 6 (2000)485ndash502

[21] Kushal S Dave Vishal Vaingankar Sumanth Kolar and Vasudeva Varma 2013Timespent based models for predicting user retention In Proceedings of the 22ndinternational conference on World Wide Web ACM 331ndash342

[22] William H DeLone and Ephraim R McLean 1992 Information systems successThe quest for the dependent variable Information systems research 3 1 (1992)60ndash95

[23] William H DeLone and Ephraim R McLean 2003 Model of information systemssuccess a ten years update Journal of Management (2003)

[24] Gideon Dror Dan Pelleg Oleg Rokhlenko and Idan Szpektor 2012 Churnprediction in new users of Yahoo answers In Proceedings of the 21st InternationalConference on World Wide Web ACM 829ndash834

[25] Khaled El Emam 2005 The ROI from software quality Auerbach Publications[26] Katherine Ellis Jacqueline Kerr Suneeta Godbole Gert Lanckriet David Wing

and Simon Marshall 2014 A random forest classifier for the prediction of energy

expenditure and type of physical activity from wrist and hip accelerometersPhysiological measurement 35 11 (2014) 2191

[27] David Enke and Suraphan Thawornwong 2005 The use of datamining and neuralnetworks for forecasting stock market returns Expert Systems with applications29 4 (2005) 927ndash940

[28] Iris Figalist Christoph Elsner Jan Bosch and Helena Holmstroumlm Olsson 2019Customer churn prediction in B2B contexts In International Conference on Soft-ware Business Springer 378ndash386

[29] Emanuel Giger Martin Pinzger and Harald Gall 2010 Predicting the fix timeof bugs In Proceedings of the 2nd International Workshop on RecommendationSystems for Software Engineering ACM 52ndash56

[30] Frank E Harrell Jr 2015 Regression modeling strategies with applications to linearmodels logistic and ordinal regression and survival analysis Springer

[31] Steffen Hedegaard and Jakob Grue Simonsen 2013 Extracting usability and userexperience information from online user reviews In Proceedings of the SIGCHIConference on Human Factors in Computing Systems ACM 2089ndash2098

[32] Yujuan Jiang Bram Adams and Daniel M German 2013 Will my patch make itand how fast Case study on the linux kernel In Proceedings of the 10th WorkingConference on Mining Software Repositories IEEE Press 101ndash110

[33] Guolin Ke Jia Zhang Zhenhui Xu Jiang Bian and Tie-Yan Liu 2019 TabNNa universal neural network solution for tabular data httpsopenreviewnetforumid=r1eJssCqY7

[34] Foutse Khomh Brian Chan Ying Zou and Ahmed E Hassan 2011 An entropyevaluation approach for triaging field crashes A case study of mozilla firefox InReverse Engineering (WCRE) 2011 18th Working Conference on IEEE 261ndash270

[35] Andy Liaw and Matthew Wiener 2002 Classification and regression by randomforest R news 2 3 (2002) 18ndash22

[36] Erik Linstead Paul Rigor Sushil Bajracharya Cristina Lopes and Pierre Baldi2007 Mining eclipse developer contributions via author-topic models In FourthInternational Workshop on Mining Software Repositories (MSRrsquo07 ICSE Workshops2007) IEEE 30ndash30

[37] Xi Long Wenjing Yin Le An Haiying Ni Lixian Huang Qi Luo and Yan Chen2012 Churn analysis of online social network users using data mining techniquesIn Proceedings of the international MultiConference of Engineers and ConputerScientists Vol 1

[38] James MacQueen 1967 Some methods for classification and analysis of multivari-ate observations In Proceedings of the fifth Berkeley symposium on mathematicalstatistics and probability Vol 1 Oakland CA USA 281ndash297

[39] Lionel Marks Ying Zou and Ahmed E Hassan 2011 Studying the fix-timefor bugs in large open source projects In Proceedings of the 7th InternationalConference on Predictive Models in Software Engineering ACM 11

[40] Andrew Kachites McCallum 2002 Mallet A machine learning for languagetoolkit (2002)

[41] Miloš Milošević Nenad Živić and Igor Andjelković 2017 Early churn predic-tion with personalized targeting in mobile social games Expert Systems withApplications 83 (2017) 326ndash332

[42] Audris Mockus Roy T Fielding and James Herbsleb 2000 A case study ofopen source software development the Apache server In Proceedings of the 22ndinternational conference on Software engineering Acm 263ndash272

[43] Gustavo Percio Zimmermann Montesdioca and Antocircnio Carlos Gastaud Maccedilada2015 Measuring user satisfaction with information security practices Computersamp Security 48 (2015) 267ndash280

[44] Leann Myers and Maria J Sirois 2004 Spearman correlation coefficients differ-ences between Encyclopedia of statistical sciences 12 (2004)

[45] Didrik Nielsen 2016 Tree boosting with XGBoost-why does XGBoost win EveryMachine Learning Competition Masterrsquos thesis NTNU

[46] Ehsan Noei Daniel Alencar Da Costa and Ying Zou 2018 Winning the appproduction rally In Proceedings of the 2018 26th ACM Joint Meeting on EuropeanSoftware Engineering Conference and Symposium on the Foundations of SoftwareEngineering ACM 283ndash294

[47] Ehsan Noei Feng Zhang Shaohua Wang and Ying Zou 2019 Towards pri-oritizing user-related issue reports of mobile applications Empirical SoftwareEngineering 24 4 (2019) 1964ndash1996

[48] Ehsan Noei Feng Zhang and Ying Zou 2019 Too Many User-Reviews WhatShould App Developers Look at First IEEE Transactions on Software Engineering(2019)

[49] Derek Orsquocallaghan Derek Greene Joe Carthy and Paacutedraig Cunningham 2015An analysis of the coherence of descriptors in topic modeling Expert Systemswith Applications 42 13 (2015) 5645ndash5657

[50] Ya A Pachepsky Dennis Timlin and GY Varallyay 1996 Artificial neural net-works to estimate soil water retention from easily measurable data Soil ScienceSociety of America Journal 60 3 (1996) 727ndash733

[51] Mahesh Pal 2005 Random forest classifier for remote sensing classificationInternational Journal of Remote Sensing 26 1 (2005) 217ndash222

[52] Sebastiano Panichella Andrea Di Sorbo Emitza Guzman Corrado A VisaggioGerardo Canfora and Harald C Gall 2015 How can i improve my app classifyinguser reviews for software maintenance and evolution In Software maintenanceand evolution (ICSME) 2015 IEEE international conference on IEEE 281ndash290

11

1277

1278

1279

1280

1281

1282

1283

1284

1285

1286

1287

1288

1289

1290

1291

1292

1293

1294

1295

1296

1297

1298

1299

1300

1301

1302

1303

1304

1305

1306

1307

1308

1309

1310

1311

1312

1313

1314

1315

1316

1317

1318

1319

1320

1321

1322

1323

1324

1325

1326

1327

1328

1329

1330

1331

1332

1333

1334

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

1335

1336

1337

1338

1339

1340

1341

1342

1343

1344

1345

1346

1347

1348

1349

1350

1351

1352

1353

1354

1355

1356

1357

1358

1359

1360

1361

1362

1363

1364

1365

1366

1367

1368

1369

1370

1371

1372

1373

1374

1375

1376

1377

1378

1379

1380

1381

1382

1383

1384

1385

1386

1387

1388

1389

1390

1391

1392

[53] Chitra Phadke Huseyin Uzunalioglu Veena B Mendiratta Dan Kushnir andDerek Doran 2013 Prediction of subscriber churn using social network analysisBell Labs Technical Journal 17 4 (2013) 63ndash76

[54] Peter C Rigby Daniel M German and Margaret-Anne Storey 2008 Open sourcesoftware peer review practices a case study of the apache server In Proceedingsof the 30th international conference on Software engineering ACM 541ndash550

[55] Martin P Robillard and Robert Deline 2011 A field study of API learning obstaclesEmpirical Software Engineering 16 6 (2011) 703ndash732

[56] Michael Roumlder Andreas Both and Alexander Hinneburg 2015 Exploring thespace of topic coherence measures In Proceedings of the eighth ACM internationalconference on Web search and data mining ACM 399ndash408

[57] Victor Francisco Rodriguez-Galiano Bardan Ghimire John Rogan Mario Chica-Olmo and Juan Pedro Rigol-Sanchez 2012 An assessment of the effectivenessof a random forest classifier for land-cover classification ISPRS Journal of Pho-togrammetry and Remote Sensing 67 (2012) 93ndash104

[58] Hiroaki Sakoe and Seibi Chiba 1978 Dynamic programming algorithm opti-mization for spoken word recognition IEEE transactions on acoustics speech andsignal processing 26 1 (1978) 43ndash49

[59] Stan Salvador and Philip Chan 2007 Toward accurate dynamic time warping inlinear time and space Intelligent Data Analysis 11 5 (2007) 561ndash580

[60] Peter B Seddon 1997 A respecification and extension of the DeLone and McLeanmodel of IS success Information systems research 8 3 (1997) 240ndash253

[61] Rudy Setiono Bart Baesens and Christophe Mues 2008 Recursive neural net-work rule extraction for data with mixed attributes IEEE Transactions on NeuralNetworks 19 2 (2008) 299ndash307

[62] Dong-Hee Shin 2015 Effect of the customer experience on satisfaction withsmartphones Assessing smart satisfaction index with partial least squaresTelecommunications Policy 39 8 (2015) 627ndash641

[63] Tom De Smedt and Walter Daelemans 2012 Pattern for python Journal ofMachine Learning Research 13 Jun (2012) 2063ndash2067

[64] Daniel Svozil Vladimir Kvasnicka and Jiri Pospichal 1997 Introduction to multi-layer feed-forward neural networks Chemometrics and intelligent laboratorysystems 39 1 (1997) 43ndash62

[65] Shaheen Syed and Marco Spruit 2017 Full-text or abstract Examining topiccoherence scores using latent dirichlet allocation In 2017 IEEE Internationalconference on data science and advanced analytics (DSAA) IEEE 165ndash174

[66] Chakkrit Tantithamthavorn Shane McIntosh Ahmed E Hassan and KenichiMatsumoto 2016 Automated parameter optimization of classification techniquesfor defect prediction models In 2016 IEEEACM 38th International Conference onSoftware Engineering (ICSE) IEEE 321ndash332

[67] Chakkrit Tantithamthavorn Shane McIntosh Ahmed E Hassan and KenichiMatsumoto 2018 The impact of automated parameter optimization on defectprediction models IEEE Transactions on Software Engineering (2018)

[68] Romain Tavenard 2017 tslearn A machine learning toolkit dedicated to time-series data httpsgithubcomrtavenartslearn

[69] L Torlay Marcela Perrone-Bertolotti Elizabeth Thomas and Monica Baciu 2017Machine learningndashXGBoost analysis of language networks to classify patientswith epilepsy Brain informatics 4 3 (2017) 159

[70] Geoffrey G Towell and Jude W Shavlik 1993 Extracting refined rules fromknowledge-based neural networks Machine learning 13 1 (1993) 71ndash101

[71] Irfan Ullah Basit Raza Ahmad Kamran Malik Muhammad Imran Saif Ul Islamand SungWon Kim 2019 A churn prediction model using random forest analysisof machine learning techniques for churn prediction and factor identification intelecom sector IEEE Access 7 (2019) 60134ndash60149

[72] Lorenzo Villarroel Gabriele Bavota Barbara Russo Rocco Oliveto and Massimil-iano Di Penta 2016 Release planning of mobile apps based on user reviews InProceedings of the 38th International Conference on Software Engineering ACM14ndash24

[73] Chih-Ping Wei and I-Tang Chiu 2002 Turning telecommunications call detailsto churn prediction a data mining approach Expert systems with applications 232 (2002) 103ndash112

[74] Cathrin Weiss Rahul Premraj Thomas Zimmermann and Andreas Zeller 2007How long will it take to fix this bug In Fourth International Workshop on MiningSoftware Repositories (MSRrsquo07 ICSE Workshops 2007) IEEE 1ndash1

[75] Ian H Witten Eibe Frank and A Mark 2016 Hall and Christopher J Pal DataMining Practical machine learning tools and techniques (2016)

[76] Barbara H Wixom and Peter A Todd 2005 A theoretical integration of usersatisfaction and technology acceptance Information systems research 16 1 (2005)85ndash102

[77] Jae-Chern Yoo and Tae Hee Han 2009 Fast normalized cross-correlation Circuitssystems and signal processing 28 6 (2009) 819

12

  • Abstract
  • 1 Introduction
  • 2 Motivating Example
  • 3 Data Collection
  • 4 Research Questions amp Results
  • 5 Threats To Validity
  • 6 Related Work
  • 7 Conclusion
  • References
Page 6: On the Relationship between User Churn and Software Issues · relationship between the issues that are present in a software prod-uct and user churn. For this purpose, we investigate

581

582

583

584

585

586

587

588

589

590

591

592

593

594

595

596

597

598

599

600

601

602

603

604

605

606

607

608

609

610

611

612

613

614

615

616

617

618

619

620

621

622

623

624

625

626

627

628

629

630

631

632

633

634

635

636

637

638

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

639

640

641

642

643

644

645

646

647

648

649

650

651

652

653

654

655

656

657

658

659

660

661

662

663

664

665

666

667

668

669

670

671

672

673

674

675

676

677

678

679

680

681

682

683

684

685

686

687

688

689

690

691

692

693

694

695

696

Table 2 The predominant topics in alternativetonet

Release and updatesKeywords ldquo119907119890119903119904119894119900119899rdquo ldquo119897119900119899119892119890119903 rdquo ldquo119903119890119897119890119886119904119890rdquo ldquo119889119900119908119899119897119900119886119889rdquo ldquo119900 119891 119891 119894119888119894119886119897rdquoExamples - The program seems to be no longer updated Last versions (2018) can be still downloaded from the official website

- Firefox version 60+ (Quantum) is better than Chrome Compare to previous versions of Firefox

Bookmarks FunctionalityKeywords ldquo119887119900119900119896119898119886119903119896rdquo ldquo119901119886119899119890119897rdquo ldquo119891 119886119907119900119903119894119905119890rdquo ldquo119904119906119901119901119900119903119905rdquo ldquo119904119910119899119888rdquoExamples - Firefox now allows bookmarks importing from Chrome which is a huge plus

- With the recent upgrade to 20 (including the bookmarks syncing feature that I liked in Firefox) Vivaldi just got better

Flash supportKeywords ldquo119901119897119906119892119894119899rdquo ldquo119907119894119889119890119900rdquo ldquo119894119899119904119905119886119897119897rdquo ldquo119891 119897119886119904ℎrdquo ldquo119898119890119889119894119886 minus 119901119897119886119910119890119903 rdquoExamples - Since its the only way to watch Flash video on an iPhone without jailbreaking Skyfire is a killer app thats worth getting for anyone Playback of Flash video over WiFi is

flawless and the quality is at very least watchable

Memory issuesKeywords ldquo119898119890119898119900119903119910rdquo ldquo119877119860119872rdquo ldquo119897119886119892rdquo ldquo119897119894119898119894119905rdquo ldquoℎ119890119886119907119910rdquoExamples - Great C++ IDE which balances memory usage and indexing solution Contains some basic refactoring functions Good for large projects and limited RAM with remaining

some capabilities of more complex C++ IDEs- Eclipse is really lagging after the last update The memory is filled whenever I run a Java swing applet- Chrome is so heavy it lags alot when I open multiple tabs

Privacy and securityKeywords ldquo119901119903119894119907119886119888119910rdquo ldquo119904119890119888119906119903119894119905119910rdquo ldquo119894119898119901119900119903119905119886119899119905rdquo ldquo119886119897119897119900119908rdquo ldquo119888119900119899119905119903119900119897rdquoExamples - Firefox allows far more control over your privacy than does Chrome (or even Chromium on which Chrome is based)

- Google Chrome is a great browser and a lot better than Vivaldi Browser Chrome keeps updating with cool stuff like the new cast feature better extensions and lot betterprivacy and security Vivaldi has more customizing features but has a long way to go to compete with Chrome

BugsCrashesKeywords ldquo119888119903119886119904ℎrdquo ldquo119887119906119892rdquo ldquo119891 119894119909rdquo ldquo119890119909119901119890119903119894119890119899119888119890rdquo ldquo119894119904119904119906119890rdquoExamples - Lots of users experience Adobe Flash Player hangs or crashes I dont know why it isnt fixed yet

If Chrome ever got its proxy support bugs fixed it would leap to the top of the list and become pretty much the only web browser Id ever use- I worked with NetBeans for a couple of years and always loved it but for professional users the yearly fee is well spent Nicer interface better Code Intelligence moreoptions to fine tune what happens on saving a file (eg saving or uploading to various locations) You get a couple of feature update each year plus bugfixes whereasNetBeans is sometimes very slow to update things (support for new version of a language) or to fix known bugs

Table 3 The predominant topics in alternativetonet (continued)

Debug optionsKeywords ldquo119891 119890119886119905119906119903119890rdquo ldquo119889119890119907119890119897119900119901119898119890119899119905rdquo ldquo119889119890119887119906119892rdquo ldquo119894119899119905119890119892119903119886119905119894119900119899rdquo ldquo119901119890119903 119891 119890119888119905rdquoExamples - Not enough debug and run options just a text editor Atom has not got any solid debugging features (and packages) out of the box or at all

- This is my favorite code editor It is open source and portable you can get a portable version that will fit onto a flash drive if youwant It is just as configurableas Sublime Text and has a large community producing great plugins it has GIT support built in and excellent support for Javascript HTML CSS and Pythonand others of course It has an integrated debugging system

Simplistic designKeywords ldquo119904119894119898119901119897119890rdquo ldquo119890119889119894119905119900119903 rdquo ldquo119897119894119892ℎ119905119908119890119894119892ℎ119905rdquo ldquo119889119890119904119894119892119899rdquo ldquo119899119900119905119890119901119886119889rdquoExamples - Really lightweight with some advanced not complete but easy to use auto complete features Simple design and customizable

Cross-platformKeywords ldquo119888119903119900119904119904 minus 119901119897119886119905 119891 119900119903119898rdquo ldquo119871119894119899119906119909rdquo ldquo119900119901119890119899 minus 119904119900119906119903119888119890rdquo ldquo119886119907119886119894119897119886119887119897119890rdquo ldquo119894119899119904119905119886119897119897rdquoExamples - Its cross platform and open source and bring the benefits of the JetBrains IDE platform which is well polished and powerful

- One of the best open source browsers that work on Linux and Windows

Code refactoring and auto-completionKeywords ldquo119888119900119889119890rdquo ldquordquo119886119906119905119900 minus 119888119900119898119901119897119890119905119894119900119899rdquo ldquo119903119890 119891 119886119888119905119900119903 rdquo ldquo119900119901119905119894119900119899rdquo ldquo119906119904119890rdquoExamples - For its young age it is well matured and a very powerful IDE with great CMake Support and other goodies like great refactoring tools and a sophisticated

auto-completion system

TestingKeywords ldquo119905119890119904119905rdquo ldquo119904119890119903119907119890119903 rdquo ldquo119889119890119901119897119900119910rdquo ldquo119908119890119887119904119894119905119890rdquo ldquo119904119890119905rdquoExamples - XAMPP is great tool to develop and test your website (particularly if it uses php and mysql databases) offline before putting it on a live server You could

also use XAMPP as an easy method of setting up a live server- Eclipse is far more superior than netbeans in unit testing functionality

DocumentationKeywords ldquo119889119900119888119906119898119890119899119905119886119905119894119900119899rdquo ldquo119904119890119903119907119890119903 rdquo ldquo119901119900119903119905rdquo ldquo119889119886119905119886119887119886119904119890rdquo ldquo119891 119900119903119906119898rdquoExamples - Easiest way to run a web server on Windows The performance is surprisingly good and lots of documentation on how to manage it

- Support is via forum only Documentation can be very poor particularly around product upgrades (one expects more from commercial packages) Due totheir poor documentation I lost several years of development databases when porting from an old to a new computer (they dont accurately document thelocation of the database files)- Eclipse lacks in adding plugins documentation

time constraints for continuity and monotonicity Finally two co-authors performed a manual analysis of the observed peaks in the

two time-series7 The goal of the manual analyses is to find whetherthe identified peaks are indeed likely related (eg the issue reports

7We share the data of our manual analyses at httpsdoiorg105281zenodo3610584

6

697

698

699

700

701

702

703

704

705

706

707

708

709

710

711

712

713

714

715

716

717

718

719

720

721

722

723

724

725

726

727

728

729

730

731

732

733

734

735

736

737

738

739

740

741

742

743

744

745

746

747

748

749

750

751

752

753

754

On the Relationship between User Churn and Software Issues MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea

755

756

757

758

759

760

761

762

763

764

765

766

767

768

769

770

771

772

773

774

775

776

777

778

779

780

781

782

783

784

785

786

787

788

789

790

791

792

793

794

795

796

797

798

799

800

801

802

803

804

805

806

807

808

809

810

811

812

Table 4 The extracted topics percentages distribution overthe three domains

Topics Browsers() IDEs() Web servers()Release and updates 18 17 21Bookmarks functionality 12 0 0Memory issues 23 22 11Privacy and security 8 0 0BugsCrashes 24 19 20Debug options 0 10 0Simplistic design 0 8 2Cross-platform 8 6 5Code refactoring and auto-completion 0 10 0Testing 0 2 17Documentation 0 2 15Miscellaneous 7 4 9

within a peak actually describe problems mentioned in the poten-tial usersrsquochurn) Again both co-authors analyze and discuss everyanalyzed sample

In this RQ we perform our analyses only in our studied projects(as opposed to analyzing all the software alternatives in their do-mains) As discussed in Section 3 it would be impracticable toextract the data from all the different Issue Tracking Systems of thesoftware alternatives in the studied domainsFindingsWe respectively observe a cross-correlation of 076 072and 070 in Firefox Eclipse and Apache These cross-correlationscores indicate that there exist a global commonality between thetrends of the two time-series regardless of the timing factor

Regarding our DTW analysis we observe that the time fromthe filing of an issue report to the time of a spark in potentialusersrsquochurn span from 2 to 4 weeks (on the median for the threeprojects) Additionally our DTW correlation reveals a bidirectionalrelationship between the peaks in issue reports and the peaks inpotential usersrsquochurn For the cases where a peak in issue reportsleads to a peak in potential usersrsquochurn it is intuitive to think thatthe peak in potential usersrsquochurn is due to the frustration generatedby the issues described in the reports On the other hand the caseswhere a peak in potential usersrsquochurn is followed by a peak inissue reports may occur due to users being frustrated over non-obvious issues For example if we consider the potential user churndepicted in Figure 3 (ie ldquoJupyter is slow on Firefoxrdquo) the users mightnot find it obvious that such a problem should be reported to theFirefox team (eg users may simply interpret it as a characteristicof Firefox) Therefore in such cases issue reports would be filledonly a while after the potential usersrsquochurn have been expressed

We are specifically interested in studying the relationship be-tween reported issues (ie the documented issues in the trackersystems) and the potential user churn Our rationale is that under-standing the characteristics of issues that are associated with userchurn can help developers to prioritize such issues and avoid futurechurn (thereby retaining more users)

Finally Table 5 shows a subset of the examples found in ourmanual analyses We can indeed observe the apparent relationshipbetween the comments within the potential usersrsquochurn and thedescriptions (or titles) of issue reports The examples are followinga unidirectional link in which the issue report that occurs at acertain time would induce the potential user churn at a later timeSummary Our results suggest that there exists a significant cor-relation between software issues and potential user churn

RQ2 Can we use machine learning models topredict the rate of potential user churnMotivation In the practice most of the issue reports undergo hu-man estimations of the priority and the severity [5] usually basedon internal business-oriented factors (such as ROI [25]) Issuesmight be prioritized by their recency the affected platforms theversions of the software where the issue is present Feeding a ma-chine learning model to identify the potential user churn that arecaused by issues may help automate the issue report prioritizationprocess and ultimately save huge costs in software developmentApproach We train three different machine learning models topredict low and high user churn rates by using information from is-sue reports The first model is a Feed-Forward Neural Network withtwo hidden layers developed by using the framework Keras [17]Keras is widely used to train simple neural networks as it hasa higher abstraction than TensorFlow [1] Feed-Forward NeuralNetworks [64] are the most suitable type of neural networks tobe trained on tabular data [27 33 50] This network evaluates allthe possible combinations of different metrics values to guide thepredictions

Our second model is the XGboost model which is a gradientboosting tree [12 13] XGboost is widely used for binary classifica-tion problems Research has shown that XGboost performs betterthan neural networks and classic classification machine learningalgorithms in problems dealing with tabular data [14 45 69]

We also train Random Forest models [35] Random Forest isa classical machine learning algorithm which is widely used inproblems dealing with tabular data [26 51 57] In addition Ran-dom Forest models have been widely used in software engineeringresearch such as in defect prediction [67]

The goal of ourmodels is to predict the rate of potential usersrsquochurngiven the characteristics of issue reports The data used to fit ourmodels is the data obtained from our DTW associations Simply putthe DTW provides us with 119875 pairs of issue reports 119868 and potentialusersrsquochurn 119862 (ie 119875 lt 119868 119862 gt) Therefore we study the character-istics of the issues 119868 to predict the amount of potential user churn119862 We split the distribution of 119862 into two percentiles (ie low andhigh) The low percentile being from 0 to 50 and the high from51 to 100 Thus our models output dichotomous predictions iewhether the potential usersrsquochurn is low or high

The characteristics of issue reports that we include in our models(henceforth referred to as features) are described in Table 6 Thesefeatures are collected from the Issue Tracking systems of our threestudied projects We study features such as Component Hardwareand OS to account for the locality and the spread of the issuesFor example these features can indicate whether platform-specificissues or more general issues have more potential user churn

Other features such as the Severity and Priority showcase theprioritization used by developers The Severity and Priority areimportant for us to verify whether the prioritization performed bydevelopers is aligned with the rate of potential user churn

Features related to the type of the bugs (ie whether it is acrash) whether a version number is present or the target milestoneprovide us information regarding the tracking process of issuesThey are important to understand whether better-tracked issues

7

813

814

815

816

817

818

819

820

821

822

823

824

825

826

827

828

829

830

831

832

833

834

835

836

837

838

839

840

841

842

843

844

845

846

847

848

849

850

851

852

853

854

855

856

857

858

859

860

861

862

863

864

865

866

867

868

869

870

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

871

872

873

874

875

876

877

878

879

880

881

882

883

884

885

886

887

888

889

890

891

892

893

894

895

896

897

898

899

900

901

902

903

904

905

906

907

908

909

910

911

912

913

914

915

916

917

918

919

920

921

922

923

924

925

926

927

928

Table 5 Manual investigation of DTW correlation linking

FirefoxExample 1Correlation Linking Week 164 in issues to week 169 in switches (May 2018)Issue Report Title Firefox version 5303 appears high memory high resources usage and might causing for SSD damagesIssue Description Mozilla Firefox it is caching all of this data 4-5GB on my new SSD and its lost 4 from its health within 5 months due continuously writesoverwrites firefox cache on my SSDAltrnativeto Comment Switching to Chrome after version 5303 high memory is damaging my SSDExample 2Correlation Linking Week 66 in issues to week 68 in switches (November 2016)Issue Report Title Sessions are not cleared in the private windowIssue Description After closing private window (not a tab) and opening a new one I still logged inAltrnativeto Comment Chrome incognito functionality is more stable than Firefox Firefox doesnrsquot resets sessions

EclipseExample 1Correlation Linking Week 10 in issues to week 11 in switches (July 2015)Issue Report Title Not Java 8 CompatibleIssue Description When downloading Eclipse for the first time for Java development 64-bit (which is what my computer is) it is not running because it is not compatible with Java version 8

(which is the latest version of Java) but appears to be compatible with version 7 still -DosgirequiredJavaVersion=17Altrnativeto Comment Using Netbeans because Eclipse is not installingExample 2Correlation Linking Week 211 in issues to week 213 in switches (February 2019)Issue Report Title jar files in the exported application do not contain class filesIssue Description The application has many plugins It builds and runs well within eclipse but when I export the application using the eclipse export product wizard the jar files (corresponding to

the plugins) do not contain any class files they include meta-inf pluginxml and all the extra folders (eg lib assets etc when available) but they do NOT contain any class filesAltrnativeto Comment Didnrsquot find any problems in exporting jar (when recommending IntelliJ)

ApacheExample 1Correlation Linking Week 113 in issues to week 117 in switches (July 2017)Issue Report Title httpd 2426 no longer building against lua 531 or lua 534Issue Description Building Apache 2426 against lua 531 or 534 compiled from scratch (latest version as of this writing) does not work Compilation against lua 531 did work with 2425Altrnativeto Comment Some issues while building with lua (when recommending XAMPP)Example 2Correlation Linking Week 12 in issues to week 13 in switches (September 2019)Issue Report Title Apache Server is restarted time to time (Release 2410)Issue Description I am using Apache release 2410 in Windows Server 2008 Time to time the Apache server is getting restarted The service is unavailable for 3-5 minutes and after that the service

is started up automatically This issue occurs frequently twice a day two days once etcAltrnativeto Comment XAMPP is more stable I am experiencing random restarts throughout the day when using Apache

Table 6 The issue reports attributes for Firefox Eclipse andApache

Metric Description

Component consists of different components of the system such as bookmarkstheme toolbar etc

Status describes the status of the issue such as unconfirmed open re-solved etc

Resolution describes the resolution type of the issue such as fixed duplicateinvalid etc

Severity the assigned severity such as blocker critical normal etcHardware the different hardware affected by the issue such as desktop mobile

32 or 64 bits etcOS the different OS affected by the issue such as OSX Windows etcPriority the assigned priority from 1 to 5Type the assigned issue type such as defect enhancement or taskTime Alive the time difference between the date opened and the date resolved

in minutesTime since last change the time difference between the date opened and the date of last

change in minutesIs a Crash a boolean value to indicate if it is a crash reportHas a Version Number a boolean whether the issue report has a version numberHas a Target Milestone a boolean whether the issue report has a target numberHas a Regression Range a boolean value to indicate if the bug causes a regression testingNumber of comments the number of comments for the reportNumber of votes the number of votes for the report which validates the integrity of

the report by other usersNumber of Blocks the number of issues that are blocked by that issue until it is re-

solved

(ie with more tracking information) are associated with potentialusersrsquochurn

The number of comments votes and blocks showcase the effortinvested on discussing an issue Finally the time-alive and the time-since-last-change are two features related to the life-cycle of an issuereport Longer time-alive values indicate a lingering issue that hasnot been resolved yet while a short time-since-last-change values

indicate issues that have been recently addressed (and thus can stillbe reoccurring)

Our studied features are inspired by previous research in soft-ware engineering that have built machine learning models usinginformation present in issue reports [15 16 16 18 19 29 32 39 74]

We chose features that are common across the issue reports ofFirefox Eclipse and Apache We filtered out features that are uniqueto an issue tracker system only to ensure a fair comparison betweenour studied issue reports Afterwards we performed Spearmancorrelation tests [44] on the selected features in which each featureis tested against the set of all the other features We did not observeany correlation obtaining a value higher than 07 which suggeststhat our selected features are not correlated and can be safely fedinto our machine learning models [30]

To evaluate the performance of our models we use the Area Un-der the Curvemetric (AUC) AUC is useful for evaluating our modelsbecause our models output probabilities Therefore AUC showsthe discrimination power of our models in every probability thresh-old (unlike Precision Recall or F-measure which are limited toevaluating models at a single probability threshold [66]) The AUCvalues range from 0 to 1 An AUC of 05 denotes a random guessingmodel while an AUC of 1 denotes a perfect distinguishing powerAn AUC of 0 denotes a model with perfect inverse predictions

Given the temporal nature of our data (ie future issue reportscannot be used to predict the potential user churn of past issuereports) we adopt a leave-one-out validation approach to obtainour AUC values

First our issue reports have already been sorted when we per-form the time-series analyses Second we split our data into two

8

929

930

931

932

933

934

935

936

937

938

939

940

941

942

943

944

945

946

947

948

949

950

951

952

953

954

955

956

957

958

959

960

961

962

963

964

965

966

967

968

969

970

971

972

973

974

975

976

977

978

979

980

981

982

983

984

985

986

On the Relationship between User Churn and Software Issues MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea

987

988

989

990

991

992

993

994

995

996

997

998

999

1000

1001

1002

1003

1004

1005

1006

1007

1008

1009

1010

1011

1012

1013

1014

1015

1016

1017

1018

1019

1020

1021

1022

1023

1024

1025

1026

1027

1028

1029

1030

1031

1032

1033

1034

1035

1036

1037

1038

1039

1040

1041

1042

1043

1044

sets one containing 80 of the data ie the training set and an-other containing 20 ie the validation set In the first iterationof our models we use 80 of the data to train our models and weuse the first element of the validation set to test out models Thenthe leave-one-out validation iteratively increases the training setsize by one element (from the validation set) and predicts the nextelement of the validation set This process is repeated until thetraining set reaches the size of 119873 minus 1 (where 119873 is the size of ourdata set) Finally our obtained AUC values are the aggregation(ie median) of all the AUC values obtained in the leave-one-outiterationsFindingsTheAUC values obtained for our threemodels are shownin our online appendix8 The XGboost model performs the best inthe three studied projects obtaining AUC values that exceed 083There exists a subtle difference between the AUC values of theXGboost model and the Feed Forward Neural Networks AUCssince both the XGboost model and the Neural Networks can graspthe complex relationships in our issue reports dataSummary Our results suggest that machine learning models suchas XGboost and Feed Forward Neural Networks can predict therate of user churn with relatively high accuracies (in terms of AUC)Such predictions could help in the prioritization of issue reports

RQ3 What are the most important factorsrelated to potential user churnMotivation In RQ2 we observe that machine learning modelscan be used to predict the rate of potential usersrsquochurn Howeverdevelopers would hardly blindly trust a machine learning modelto help on their decisions (and they should not do so) Developerswould mostly benefit from understanding the reasons behind thepredictions of our machine learning models so that they couldpossibly adapt (or not) their development process Therefore inRQ3 we investigate the most important features in the predictionsof our machine learning modelApproach In this RQ we study the most important features in thepredictions of the XGboost models since they obtain the best AUCvalues in our three studied projects

To find the most important features we adopt a simple featureextraction algorithm [61 70] Consider that our models are trainedon a feature set 119883 = 1199091 1199092 119909119899 (as explained in RQ2) The featureextraction process consists of iteratively (i) removing each feature 119909119894from our models and (ii) computing the AUC without each feature119909119894 (by using the same leave-one-out validation process explainedin RQ2) The AUC values from the models without a feature 119909119894 arecompared to the models containing all the features (ie we takethe difference 119860119880119862119900119903119894119892119894119899119886119897 minus119860119880119862119903119890119898119900119907119890119889 ) The higher the drop inthe AUC value caused by removing a certain feature 119909119894 the higherthe importance of such a feature 119909119894 [4 9 10]Findings Table 7 shows our obtained results after performing thefeature extraction process The table shows the drop in the AUC foreach feature (ie ΔAUC) We also show the minimum maximummean and median values of the features in the 119897119900119908 user churn rate(ie 0) and ℎ119894119892ℎ user churn rate (ie 1) prediction classes

8httpsdoiorg105281zenodo3610584

The Issue Alive Time is the most important feature obtaining thehighest ΔAUC in our three studied projects This result suggeststhat it is unhealthy to leave issues hanging for a long time as theycan be perceived as a reason for users to churn

The Number of Comments obtains a considerable ΔAUC in ourthree studied systems The association between the Number ofComments and the rate of user churnmay occur due to themultitudenumber of users being affected by an issue

The Time-since-last-change is also an important feature in theFirefox and Eclipse projects This result suggests that issues withrecurring changes or modifications may signal a certain instabilityin the fixing process which might cause a potential user churn inthe future

It is worthy noting that features such as Severity Priority andType of Bugs do not obtain a high importance in the predictionswhich might suggest that the existing prioritization fields used inissue reports are misaligned with the potential user churnSummaryOur results suggest that long lived issues can potentiallylead to more potential userrsquos churn and that the existing processesfor prioritizing issues should be augmented to capture the highlyinteractive and long lived issues

5 THREATS TO VALIDITYConstruct Validity The main construct validity of our study isrelated to the assumption of a potential user churn The alterna-tivetonet website does not record exactly when a user has chosen asoftware alternative over another software Instead alternativetonetrecords that a user thinks that a software may be a good alternativeto another software The motivation behind the act of signaling analternative is not known by us For example there might be usersthat are simply driven by the passion of contributing to the alterna-tivetonet community which lead them to provide several opinionsas to which software could make good alternatives to others Wehave deliberately adopted the term potential user churn instead ofsimply user churn due to this unknown motivations behind signal-ing alternatives Moreover it is reasonable to think that if a givensoftware has received a considerable amount of recommendationfor alternatives this may likely indeed represent that users areconsidering to change the software

Internal Validity Our internal threats to validity are mainlyrelated to (i) our chosen features and (ii) our prediction classes Wesplit the peaks of potential user churn into the low and high classesOther research may find different results if the potential user churnis modeled in different way In addition we acknowledge that ourset of metrics is not exhaustive For example we have not studiedwhether the presence of a stack-trace in an issue report can berelated to future potential user churn We plan to (i) model thepotential user churn as a continuous variable and (ii) extend our setof features to predict the potential user churn in future research

External Validity Our study targeted three domains that areweb servers IDEs and web browsers These three domains werechosen due to their availability since we could fetch their issuereports data Although our study showed common factors in ouranalyses we cannot generalize our results to other domains orprojects with a different scale (ie smaller projects) Regardless ourwork provides insights of the possible reasons behind user churn

9

1045

1046

1047

1048

1049

1050

1051

1052

1053

1054

1055

1056

1057

1058

1059

1060

1061

1062

1063

1064

1065

1066

1067

1068

1069

1070

1071

1072

1073

1074

1075

1076

1077

1078

1079

1080

1081

1082

1083

1084

1085

1086

1087

1088

1089

1090

1091

1092

1093

1094

1095

1096

1097

1098

1099

1100

1101

1102

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

1103

1104

1105

1106

1107

1108

1109

1110

1111

1112

1113

1114

1115

1116

1117

1118

1119

1120

1121

1122

1123

1124

1125

1126

1127

1128

1129

1130

1131

1132

1133

1134

1135

1136

1137

1138

1139

1140

1141

1142

1143

1144

1145

1146

1147

1148

1149

1150

1151

1152

1153

1154

1155

1156

1157

1158

1159

1160

Table 7 The most important features for Firefox Eclipse and Apache ranked by their drop in AUC

Factor ΔAUC Min 0 Max 0 Mean 0 Median 0 Min 1 Max 1 Mean 1 Median 1

FirefoxTime Alive 013 0 120 1102 13 10 1487 4872 63OS All 007 0 1 014 0 0 1 083 1Number of Comments 006 1 65 894 12 1 458 2811 45Time Since Last Change 006 9 30 1090 22 0 5 410 3Has a Target Milestone 005 0 1 022 0 0 1 079 1Has a Version Number 004 0 1 033 0 0 1 064 1Component UI Web Payments 003 0 1 042 0 0 1 061 1

EclipseTime Alive 012 0 103 901 12 8 1334 4396 49Has a Target Milestone 006 0 1 019 0 0 1 082 1Time since last change 006 7 27 851 19 0 6 312 3Has a Version Number 005 0 1 034 0 0 1 077 1Component Debug 004 0 1 047 0 0 1 072 1Number of Comments 003 1 72 1092 11 1 632 2701 72OS All 002 0 1 050 0 0 1 068 1

ApacheTime Alive 014 0 136 1132 14 14 1521 5310 70Component Utils 010 0 1 022 0 0 1 091 1Has a Version Number 009 0 1 038 0 0 1 086 1Number of Comments 005 1 92 1232 6 1 352 4207 79OS All 004 0 1 033 0 0 1 079 1Status REOPENED 003 0 1 040 0 0 1 073 1Hardware All 002 0 1 051 0 0 1 078 1

6 RELATEDWORKGiven that we study the relationship between potential user churnand issue reports our research is related to the areas of usercustomerchurn and software quality Therefore we survey the related re-search around these two areas in this section

UserCustomerChurnUser or customer churn has beenwidelystudied in the area of telecommunication services [3 28 71] Aminet al [3] proposed a cross-company machine learning model topredict customer churn The authors also propose the use of datatransformation (eg by using the log function) to improve the train-ing data Ullah et al [71] used feature selection techniques such asInformation Gain and Correlation Attributes Ranking Filter to selectwhich features better explained the churn behaviour of differentcustomer groups The authors observed that a Random Forest modelperformed the best in their study While the Business to Customers(B2C) churn has been widely studied Figalist et al [28] recentlyinvestigated the Business to Business (B2B) churn ie when thecustomer business changes shifts to the services provided by thecompetition Figalist et al [28] allude that although B2B relation-ships tend to be more stable they have a much bigger financial im-pact when they change In terms of software engineering there hasbeen a lack of studies that have investigated usercostumer churndirectly Instead much research has been invested on customerreviews regarding software products [6 46 47] Noei et al [46]studied which features from mobile apps are associated with betterranks in the Google Play store While the work by Noei et al [46]does not study the customer churn directly striving for better ranksin Google Play may avoid user churn in mobile apps Bavota etal [6] investigated the relationship between the usage of fault- andchange-prone APIs in Android apps and user reviews Differentlyfrom all the aforementioned work our study investigates the userchurn is software products

Software Quality Although there exist a lack of studies ana-lyzing the user churn of software productsservices there has beena considerable amount of research on software quality [5 34 39 6667 74] Research on software quality is important since a quality

software is more stable and less defect-prone Ultimately such char-acteristics are tightly related to the observations of our study(iesoftware issues are correlated with user churn) A prominent areaof software quality research in software engineering is the defectprediction area [66 67] Tantithamthavorn et al [66] studied howmuch improvement can be obtained by automatically optimizingdefect prediction models Their research shows that automatic op-timization can yield significant or insignificant performance gainsdepending on the machine learning algorithm Other considerableamount of research has been invested on the triaging part of issuereports [5] and on the effort estimation of an issue report (in termsof time to address an issue) [39 74] In this paper we complementprior research by studying the relationship shared between issuereports (eg defects or required enhancements) with the potentialuser churn of users

7 CONCLUSIONIn this work we study the data available on alternativetonet tobetter understand the relationship between software issues and thepotential user churn of users Having observed that user concernsare tightly related to software issues (eg bugs) we investigatethe relationship between issue reports and the potential user churnof users Our study reveals key issues that must be addressed forthe success of a software product (depending on the domain) Forexample we observe that the potential user churn of users may betightly related to the lack of a robust documentation and support fortesting tools (in the ldquoIDErdquo and ldquoWeb Serverrdquo domains) Finally ourmachine learning models reveal that (i) the longer the issue takes tobe fixed the higher the chances of user churn and (ii) issues withinmore general software components are more likely to be associatedwith user churn Finally we suggest that the current prioritizationperformed by developers should be augmented to encompass thelong lived and highly interactive issues In overall our study sug-gests that the prioritization process of issues can be improved byconsidering the potential user churn of users associated with suchissues

10

1161

1162

1163

1164

1165

1166

1167

1168

1169

1170

1171

1172

1173

1174

1175

1176

1177

1178

1179

1180

1181

1182

1183

1184

1185

1186

1187

1188

1189

1190

1191

1192

1193

1194

1195

1196

1197

1198

1199

1200

1201

1202

1203

1204

1205

1206

1207

1208

1209

1210

1211

1212

1213

1214

1215

1216

1217

1218

On the Relationship between User Churn and Software Issues MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea

1219

1220

1221

1222

1223

1224

1225

1226

1227

1228

1229

1230

1231

1232

1233

1234

1235

1236

1237

1238

1239

1240

1241

1242

1243

1244

1245

1246

1247

1248

1249

1250

1251

1252

1253

1254

1255

1256

1257

1258

1259

1260

1261

1262

1263

1264

1265

1266

1267

1268

1269

1270

1271

1272

1273

1274

1275

1276

REFERENCES[1] Martiacuten Abadi Paul Barham Jianmin Chen Zhifeng Chen Andy Davis Jeffrey

Dean Matthieu Devin Sanjay Ghemawat Geoffrey Irving Michael Isard Man-junath Kudlur Josh Levenberg Rajat Monga Sherry Moore Derek G MurrayBenoit Steiner Paul Tucker Vijay Vasudevan Pete Warden Martin Wicke YuanYu and Xiaoqiang Zheng 2016 Tensorflow A system for large-scale machinelearning In 12th USENIX Symposium on Operating Systems Design and Imple-mentation (OSDI 16) 265ndash283

[2] W Abdelmoez Mohamed Kholief and Fayrouz M Elsalmy 2012 Bug fix-time pre-diction model using naiumlve bayes classifier In 2012 22nd International Conferenceon Computer Theory and Applications (ICCTA) IEEE 167ndash172

[3] Adnan Amin Babar Shah Asad Masood Khattak Thar Baker and Sajid Anwar2018 Just-in-time customer churn prediction with and without data transforma-tion In 2018 IEEE Congress on Evolutionary Computation (CEC) IEEE 1ndash6

[4] Hafeez Ullah Amin Aamir Saeed Malik Rana Fayyaz Ahmad Nasreen BadruddinNidal Kamel Muhammad Hussain and Weng-Tink Chooi 2015 Feature extrac-tion and classification for EEG signals using wavelet transform and machinelearning techniques Australasian physical amp engineering sciences in medicine 381 (2015) 139ndash149

[5] John Anvik Lyndon Hiew and Gail C Murphy 2006 Who should fix this bugIn Proceedings of the 28th international conference on Software engineering ACM361ndash370

[6] Gabriele Bavota Mario Linares-Vasquez Carlos Eduardo Bernal-Cardenas Mas-similiano Di Penta Rocco Oliveto and Denys Poshyvanyk 2014 The impactof api change-and fault-proneness on the user ratings of android apps IEEETransactions on Software Engineering 41 4 (2014) 384ndash407

[7] Pamela Bhattacharya and Iulian Neamtiu 2011 Bug-fix time prediction modelscan we do better In Proceedings of the 8thWorking Conference on Mining SoftwareRepositories ACM 207ndash210

[8] Steven Bird and Edward Loper 2004 NLTK the natural language toolkit InProceedings of the ACL 2004 on Interactive poster and demonstration sessionsAssociation for Computational Linguistics 31

[9] Andrew P Bradley 1997 The use of the area under the ROC curve in theevaluation of machine learning algorithms Pattern recognition 30 7 (1997)1145ndash1159

[10] Leo Breiman 2001 Random forests Machine learning 45 1 (2001) 5ndash32[11] Albert Caruana and Michael T Ewing 2010 How corporate reputation quality

and value influence online loyalty Journal of Business Research 63 9-10 (2010)1103ndash1110

[12] Tianqi Chen and Carlos Guestrin 2016 Xgboost A scalable tree boosting systemIn Proceedings of the 22nd acm sigkdd international conference on knowledgediscovery and data mining ACM 785ndash794

[13] Tianqi Chen Tong He Michael Benesty Vadim Khotilovich and Yuan Tang 2015Xgboost extreme gradient boosting R package version 04-2 (2015) 1ndash4

[14] Wenbin Chen Kun Fu Jiawei Zuo Xinwei Zheng Tinglei Huang and WenjuanRen 2017 Radar emitter classification for large data set based on weighted-xgboost IET Radar Sonar amp Navigation 11 8 (2017) 1203ndash1207

[15] Morakot Choetkiertikul Hoa Khanh Dam Truyen Tran and Aditya Ghose 2015Predicting delays in software projects using networked classification (t) In 201530th IEEEACM International Conference on Automated Software Engineering (ASE)IEEE 353ndash364

[16] Morakot Choetkiertikul Hoa Khanh Dam Truyen Tran Aditya Ghose and JohnGrundy 2017 Predicting delivery capability in iterative software developmentIEEE Transactions on Software Engineering 44 6 (2017) 551ndash573

[17] Franccedilois Chollet 2017 Keras[18] Daniel Alencar Da Costa Shane McIntosh Uiraacute Kulesza Ahmed E Hassan and

Surafel Lemma Abebe 2018 An empirical study of the integration time of fixedissues Empirical Software Engineering 23 1 (2018) 334ndash383

[19] Daniel Alencar da Costa Shane McIntosh Christoph Treude Uiraacute Kulesza andAhmed E Hassan 2018 The impact of rapid release cycles on the integrationdelay of fixed issues Empirical Software Engineering 23 2 (2018) 835ndash904

[20] Piew Datta Brij Masand Deepak R Mani and Bin Li 2000 Automated cellularmodeling and prediction on a large scale Artificial Intelligence Review 14 6 (2000)485ndash502

[21] Kushal S Dave Vishal Vaingankar Sumanth Kolar and Vasudeva Varma 2013Timespent based models for predicting user retention In Proceedings of the 22ndinternational conference on World Wide Web ACM 331ndash342

[22] William H DeLone and Ephraim R McLean 1992 Information systems successThe quest for the dependent variable Information systems research 3 1 (1992)60ndash95

[23] William H DeLone and Ephraim R McLean 2003 Model of information systemssuccess a ten years update Journal of Management (2003)

[24] Gideon Dror Dan Pelleg Oleg Rokhlenko and Idan Szpektor 2012 Churnprediction in new users of Yahoo answers In Proceedings of the 21st InternationalConference on World Wide Web ACM 829ndash834

[25] Khaled El Emam 2005 The ROI from software quality Auerbach Publications[26] Katherine Ellis Jacqueline Kerr Suneeta Godbole Gert Lanckriet David Wing

and Simon Marshall 2014 A random forest classifier for the prediction of energy

expenditure and type of physical activity from wrist and hip accelerometersPhysiological measurement 35 11 (2014) 2191

[27] David Enke and Suraphan Thawornwong 2005 The use of datamining and neuralnetworks for forecasting stock market returns Expert Systems with applications29 4 (2005) 927ndash940

[28] Iris Figalist Christoph Elsner Jan Bosch and Helena Holmstroumlm Olsson 2019Customer churn prediction in B2B contexts In International Conference on Soft-ware Business Springer 378ndash386

[29] Emanuel Giger Martin Pinzger and Harald Gall 2010 Predicting the fix timeof bugs In Proceedings of the 2nd International Workshop on RecommendationSystems for Software Engineering ACM 52ndash56

[30] Frank E Harrell Jr 2015 Regression modeling strategies with applications to linearmodels logistic and ordinal regression and survival analysis Springer

[31] Steffen Hedegaard and Jakob Grue Simonsen 2013 Extracting usability and userexperience information from online user reviews In Proceedings of the SIGCHIConference on Human Factors in Computing Systems ACM 2089ndash2098

[32] Yujuan Jiang Bram Adams and Daniel M German 2013 Will my patch make itand how fast Case study on the linux kernel In Proceedings of the 10th WorkingConference on Mining Software Repositories IEEE Press 101ndash110

[33] Guolin Ke Jia Zhang Zhenhui Xu Jiang Bian and Tie-Yan Liu 2019 TabNNa universal neural network solution for tabular data httpsopenreviewnetforumid=r1eJssCqY7

[34] Foutse Khomh Brian Chan Ying Zou and Ahmed E Hassan 2011 An entropyevaluation approach for triaging field crashes A case study of mozilla firefox InReverse Engineering (WCRE) 2011 18th Working Conference on IEEE 261ndash270

[35] Andy Liaw and Matthew Wiener 2002 Classification and regression by randomforest R news 2 3 (2002) 18ndash22

[36] Erik Linstead Paul Rigor Sushil Bajracharya Cristina Lopes and Pierre Baldi2007 Mining eclipse developer contributions via author-topic models In FourthInternational Workshop on Mining Software Repositories (MSRrsquo07 ICSE Workshops2007) IEEE 30ndash30

[37] Xi Long Wenjing Yin Le An Haiying Ni Lixian Huang Qi Luo and Yan Chen2012 Churn analysis of online social network users using data mining techniquesIn Proceedings of the international MultiConference of Engineers and ConputerScientists Vol 1

[38] James MacQueen 1967 Some methods for classification and analysis of multivari-ate observations In Proceedings of the fifth Berkeley symposium on mathematicalstatistics and probability Vol 1 Oakland CA USA 281ndash297

[39] Lionel Marks Ying Zou and Ahmed E Hassan 2011 Studying the fix-timefor bugs in large open source projects In Proceedings of the 7th InternationalConference on Predictive Models in Software Engineering ACM 11

[40] Andrew Kachites McCallum 2002 Mallet A machine learning for languagetoolkit (2002)

[41] Miloš Milošević Nenad Živić and Igor Andjelković 2017 Early churn predic-tion with personalized targeting in mobile social games Expert Systems withApplications 83 (2017) 326ndash332

[42] Audris Mockus Roy T Fielding and James Herbsleb 2000 A case study ofopen source software development the Apache server In Proceedings of the 22ndinternational conference on Software engineering Acm 263ndash272

[43] Gustavo Percio Zimmermann Montesdioca and Antocircnio Carlos Gastaud Maccedilada2015 Measuring user satisfaction with information security practices Computersamp Security 48 (2015) 267ndash280

[44] Leann Myers and Maria J Sirois 2004 Spearman correlation coefficients differ-ences between Encyclopedia of statistical sciences 12 (2004)

[45] Didrik Nielsen 2016 Tree boosting with XGBoost-why does XGBoost win EveryMachine Learning Competition Masterrsquos thesis NTNU

[46] Ehsan Noei Daniel Alencar Da Costa and Ying Zou 2018 Winning the appproduction rally In Proceedings of the 2018 26th ACM Joint Meeting on EuropeanSoftware Engineering Conference and Symposium on the Foundations of SoftwareEngineering ACM 283ndash294

[47] Ehsan Noei Feng Zhang Shaohua Wang and Ying Zou 2019 Towards pri-oritizing user-related issue reports of mobile applications Empirical SoftwareEngineering 24 4 (2019) 1964ndash1996

[48] Ehsan Noei Feng Zhang and Ying Zou 2019 Too Many User-Reviews WhatShould App Developers Look at First IEEE Transactions on Software Engineering(2019)

[49] Derek Orsquocallaghan Derek Greene Joe Carthy and Paacutedraig Cunningham 2015An analysis of the coherence of descriptors in topic modeling Expert Systemswith Applications 42 13 (2015) 5645ndash5657

[50] Ya A Pachepsky Dennis Timlin and GY Varallyay 1996 Artificial neural net-works to estimate soil water retention from easily measurable data Soil ScienceSociety of America Journal 60 3 (1996) 727ndash733

[51] Mahesh Pal 2005 Random forest classifier for remote sensing classificationInternational Journal of Remote Sensing 26 1 (2005) 217ndash222

[52] Sebastiano Panichella Andrea Di Sorbo Emitza Guzman Corrado A VisaggioGerardo Canfora and Harald C Gall 2015 How can i improve my app classifyinguser reviews for software maintenance and evolution In Software maintenanceand evolution (ICSME) 2015 IEEE international conference on IEEE 281ndash290

11

1277

1278

1279

1280

1281

1282

1283

1284

1285

1286

1287

1288

1289

1290

1291

1292

1293

1294

1295

1296

1297

1298

1299

1300

1301

1302

1303

1304

1305

1306

1307

1308

1309

1310

1311

1312

1313

1314

1315

1316

1317

1318

1319

1320

1321

1322

1323

1324

1325

1326

1327

1328

1329

1330

1331

1332

1333

1334

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

1335

1336

1337

1338

1339

1340

1341

1342

1343

1344

1345

1346

1347

1348

1349

1350

1351

1352

1353

1354

1355

1356

1357

1358

1359

1360

1361

1362

1363

1364

1365

1366

1367

1368

1369

1370

1371

1372

1373

1374

1375

1376

1377

1378

1379

1380

1381

1382

1383

1384

1385

1386

1387

1388

1389

1390

1391

1392

[53] Chitra Phadke Huseyin Uzunalioglu Veena B Mendiratta Dan Kushnir andDerek Doran 2013 Prediction of subscriber churn using social network analysisBell Labs Technical Journal 17 4 (2013) 63ndash76

[54] Peter C Rigby Daniel M German and Margaret-Anne Storey 2008 Open sourcesoftware peer review practices a case study of the apache server In Proceedingsof the 30th international conference on Software engineering ACM 541ndash550

[55] Martin P Robillard and Robert Deline 2011 A field study of API learning obstaclesEmpirical Software Engineering 16 6 (2011) 703ndash732

[56] Michael Roumlder Andreas Both and Alexander Hinneburg 2015 Exploring thespace of topic coherence measures In Proceedings of the eighth ACM internationalconference on Web search and data mining ACM 399ndash408

[57] Victor Francisco Rodriguez-Galiano Bardan Ghimire John Rogan Mario Chica-Olmo and Juan Pedro Rigol-Sanchez 2012 An assessment of the effectivenessof a random forest classifier for land-cover classification ISPRS Journal of Pho-togrammetry and Remote Sensing 67 (2012) 93ndash104

[58] Hiroaki Sakoe and Seibi Chiba 1978 Dynamic programming algorithm opti-mization for spoken word recognition IEEE transactions on acoustics speech andsignal processing 26 1 (1978) 43ndash49

[59] Stan Salvador and Philip Chan 2007 Toward accurate dynamic time warping inlinear time and space Intelligent Data Analysis 11 5 (2007) 561ndash580

[60] Peter B Seddon 1997 A respecification and extension of the DeLone and McLeanmodel of IS success Information systems research 8 3 (1997) 240ndash253

[61] Rudy Setiono Bart Baesens and Christophe Mues 2008 Recursive neural net-work rule extraction for data with mixed attributes IEEE Transactions on NeuralNetworks 19 2 (2008) 299ndash307

[62] Dong-Hee Shin 2015 Effect of the customer experience on satisfaction withsmartphones Assessing smart satisfaction index with partial least squaresTelecommunications Policy 39 8 (2015) 627ndash641

[63] Tom De Smedt and Walter Daelemans 2012 Pattern for python Journal ofMachine Learning Research 13 Jun (2012) 2063ndash2067

[64] Daniel Svozil Vladimir Kvasnicka and Jiri Pospichal 1997 Introduction to multi-layer feed-forward neural networks Chemometrics and intelligent laboratorysystems 39 1 (1997) 43ndash62

[65] Shaheen Syed and Marco Spruit 2017 Full-text or abstract Examining topiccoherence scores using latent dirichlet allocation In 2017 IEEE Internationalconference on data science and advanced analytics (DSAA) IEEE 165ndash174

[66] Chakkrit Tantithamthavorn Shane McIntosh Ahmed E Hassan and KenichiMatsumoto 2016 Automated parameter optimization of classification techniquesfor defect prediction models In 2016 IEEEACM 38th International Conference onSoftware Engineering (ICSE) IEEE 321ndash332

[67] Chakkrit Tantithamthavorn Shane McIntosh Ahmed E Hassan and KenichiMatsumoto 2018 The impact of automated parameter optimization on defectprediction models IEEE Transactions on Software Engineering (2018)

[68] Romain Tavenard 2017 tslearn A machine learning toolkit dedicated to time-series data httpsgithubcomrtavenartslearn

[69] L Torlay Marcela Perrone-Bertolotti Elizabeth Thomas and Monica Baciu 2017Machine learningndashXGBoost analysis of language networks to classify patientswith epilepsy Brain informatics 4 3 (2017) 159

[70] Geoffrey G Towell and Jude W Shavlik 1993 Extracting refined rules fromknowledge-based neural networks Machine learning 13 1 (1993) 71ndash101

[71] Irfan Ullah Basit Raza Ahmad Kamran Malik Muhammad Imran Saif Ul Islamand SungWon Kim 2019 A churn prediction model using random forest analysisof machine learning techniques for churn prediction and factor identification intelecom sector IEEE Access 7 (2019) 60134ndash60149

[72] Lorenzo Villarroel Gabriele Bavota Barbara Russo Rocco Oliveto and Massimil-iano Di Penta 2016 Release planning of mobile apps based on user reviews InProceedings of the 38th International Conference on Software Engineering ACM14ndash24

[73] Chih-Ping Wei and I-Tang Chiu 2002 Turning telecommunications call detailsto churn prediction a data mining approach Expert systems with applications 232 (2002) 103ndash112

[74] Cathrin Weiss Rahul Premraj Thomas Zimmermann and Andreas Zeller 2007How long will it take to fix this bug In Fourth International Workshop on MiningSoftware Repositories (MSRrsquo07 ICSE Workshops 2007) IEEE 1ndash1

[75] Ian H Witten Eibe Frank and A Mark 2016 Hall and Christopher J Pal DataMining Practical machine learning tools and techniques (2016)

[76] Barbara H Wixom and Peter A Todd 2005 A theoretical integration of usersatisfaction and technology acceptance Information systems research 16 1 (2005)85ndash102

[77] Jae-Chern Yoo and Tae Hee Han 2009 Fast normalized cross-correlation Circuitssystems and signal processing 28 6 (2009) 819

12

  • Abstract
  • 1 Introduction
  • 2 Motivating Example
  • 3 Data Collection
  • 4 Research Questions amp Results
  • 5 Threats To Validity
  • 6 Related Work
  • 7 Conclusion
  • References
Page 7: On the Relationship between User Churn and Software Issues · relationship between the issues that are present in a software prod-uct and user churn. For this purpose, we investigate

697

698

699

700

701

702

703

704

705

706

707

708

709

710

711

712

713

714

715

716

717

718

719

720

721

722

723

724

725

726

727

728

729

730

731

732

733

734

735

736

737

738

739

740

741

742

743

744

745

746

747

748

749

750

751

752

753

754

On the Relationship between User Churn and Software Issues MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea

755

756

757

758

759

760

761

762

763

764

765

766

767

768

769

770

771

772

773

774

775

776

777

778

779

780

781

782

783

784

785

786

787

788

789

790

791

792

793

794

795

796

797

798

799

800

801

802

803

804

805

806

807

808

809

810

811

812

Table 4 The extracted topics percentages distribution overthe three domains

Topics Browsers() IDEs() Web servers()Release and updates 18 17 21Bookmarks functionality 12 0 0Memory issues 23 22 11Privacy and security 8 0 0BugsCrashes 24 19 20Debug options 0 10 0Simplistic design 0 8 2Cross-platform 8 6 5Code refactoring and auto-completion 0 10 0Testing 0 2 17Documentation 0 2 15Miscellaneous 7 4 9

within a peak actually describe problems mentioned in the poten-tial usersrsquochurn) Again both co-authors analyze and discuss everyanalyzed sample

In this RQ we perform our analyses only in our studied projects(as opposed to analyzing all the software alternatives in their do-mains) As discussed in Section 3 it would be impracticable toextract the data from all the different Issue Tracking Systems of thesoftware alternatives in the studied domainsFindingsWe respectively observe a cross-correlation of 076 072and 070 in Firefox Eclipse and Apache These cross-correlationscores indicate that there exist a global commonality between thetrends of the two time-series regardless of the timing factor

Regarding our DTW analysis we observe that the time fromthe filing of an issue report to the time of a spark in potentialusersrsquochurn span from 2 to 4 weeks (on the median for the threeprojects) Additionally our DTW correlation reveals a bidirectionalrelationship between the peaks in issue reports and the peaks inpotential usersrsquochurn For the cases where a peak in issue reportsleads to a peak in potential usersrsquochurn it is intuitive to think thatthe peak in potential usersrsquochurn is due to the frustration generatedby the issues described in the reports On the other hand the caseswhere a peak in potential usersrsquochurn is followed by a peak inissue reports may occur due to users being frustrated over non-obvious issues For example if we consider the potential user churndepicted in Figure 3 (ie ldquoJupyter is slow on Firefoxrdquo) the users mightnot find it obvious that such a problem should be reported to theFirefox team (eg users may simply interpret it as a characteristicof Firefox) Therefore in such cases issue reports would be filledonly a while after the potential usersrsquochurn have been expressed

We are specifically interested in studying the relationship be-tween reported issues (ie the documented issues in the trackersystems) and the potential user churn Our rationale is that under-standing the characteristics of issues that are associated with userchurn can help developers to prioritize such issues and avoid futurechurn (thereby retaining more users)

Finally Table 5 shows a subset of the examples found in ourmanual analyses We can indeed observe the apparent relationshipbetween the comments within the potential usersrsquochurn and thedescriptions (or titles) of issue reports The examples are followinga unidirectional link in which the issue report that occurs at acertain time would induce the potential user churn at a later timeSummary Our results suggest that there exists a significant cor-relation between software issues and potential user churn

RQ2 Can we use machine learning models topredict the rate of potential user churnMotivation In the practice most of the issue reports undergo hu-man estimations of the priority and the severity [5] usually basedon internal business-oriented factors (such as ROI [25]) Issuesmight be prioritized by their recency the affected platforms theversions of the software where the issue is present Feeding a ma-chine learning model to identify the potential user churn that arecaused by issues may help automate the issue report prioritizationprocess and ultimately save huge costs in software developmentApproach We train three different machine learning models topredict low and high user churn rates by using information from is-sue reports The first model is a Feed-Forward Neural Network withtwo hidden layers developed by using the framework Keras [17]Keras is widely used to train simple neural networks as it hasa higher abstraction than TensorFlow [1] Feed-Forward NeuralNetworks [64] are the most suitable type of neural networks tobe trained on tabular data [27 33 50] This network evaluates allthe possible combinations of different metrics values to guide thepredictions

Our second model is the XGboost model which is a gradientboosting tree [12 13] XGboost is widely used for binary classifica-tion problems Research has shown that XGboost performs betterthan neural networks and classic classification machine learningalgorithms in problems dealing with tabular data [14 45 69]

We also train Random Forest models [35] Random Forest isa classical machine learning algorithm which is widely used inproblems dealing with tabular data [26 51 57] In addition Ran-dom Forest models have been widely used in software engineeringresearch such as in defect prediction [67]

The goal of ourmodels is to predict the rate of potential usersrsquochurngiven the characteristics of issue reports The data used to fit ourmodels is the data obtained from our DTW associations Simply putthe DTW provides us with 119875 pairs of issue reports 119868 and potentialusersrsquochurn 119862 (ie 119875 lt 119868 119862 gt) Therefore we study the character-istics of the issues 119868 to predict the amount of potential user churn119862 We split the distribution of 119862 into two percentiles (ie low andhigh) The low percentile being from 0 to 50 and the high from51 to 100 Thus our models output dichotomous predictions iewhether the potential usersrsquochurn is low or high

The characteristics of issue reports that we include in our models(henceforth referred to as features) are described in Table 6 Thesefeatures are collected from the Issue Tracking systems of our threestudied projects We study features such as Component Hardwareand OS to account for the locality and the spread of the issuesFor example these features can indicate whether platform-specificissues or more general issues have more potential user churn

Other features such as the Severity and Priority showcase theprioritization used by developers The Severity and Priority areimportant for us to verify whether the prioritization performed bydevelopers is aligned with the rate of potential user churn

Features related to the type of the bugs (ie whether it is acrash) whether a version number is present or the target milestoneprovide us information regarding the tracking process of issuesThey are important to understand whether better-tracked issues

7

813

814

815

816

817

818

819

820

821

822

823

824

825

826

827

828

829

830

831

832

833

834

835

836

837

838

839

840

841

842

843

844

845

846

847

848

849

850

851

852

853

854

855

856

857

858

859

860

861

862

863

864

865

866

867

868

869

870

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

871

872

873

874

875

876

877

878

879

880

881

882

883

884

885

886

887

888

889

890

891

892

893

894

895

896

897

898

899

900

901

902

903

904

905

906

907

908

909

910

911

912

913

914

915

916

917

918

919

920

921

922

923

924

925

926

927

928

Table 5 Manual investigation of DTW correlation linking

FirefoxExample 1Correlation Linking Week 164 in issues to week 169 in switches (May 2018)Issue Report Title Firefox version 5303 appears high memory high resources usage and might causing for SSD damagesIssue Description Mozilla Firefox it is caching all of this data 4-5GB on my new SSD and its lost 4 from its health within 5 months due continuously writesoverwrites firefox cache on my SSDAltrnativeto Comment Switching to Chrome after version 5303 high memory is damaging my SSDExample 2Correlation Linking Week 66 in issues to week 68 in switches (November 2016)Issue Report Title Sessions are not cleared in the private windowIssue Description After closing private window (not a tab) and opening a new one I still logged inAltrnativeto Comment Chrome incognito functionality is more stable than Firefox Firefox doesnrsquot resets sessions

EclipseExample 1Correlation Linking Week 10 in issues to week 11 in switches (July 2015)Issue Report Title Not Java 8 CompatibleIssue Description When downloading Eclipse for the first time for Java development 64-bit (which is what my computer is) it is not running because it is not compatible with Java version 8

(which is the latest version of Java) but appears to be compatible with version 7 still -DosgirequiredJavaVersion=17Altrnativeto Comment Using Netbeans because Eclipse is not installingExample 2Correlation Linking Week 211 in issues to week 213 in switches (February 2019)Issue Report Title jar files in the exported application do not contain class filesIssue Description The application has many plugins It builds and runs well within eclipse but when I export the application using the eclipse export product wizard the jar files (corresponding to

the plugins) do not contain any class files they include meta-inf pluginxml and all the extra folders (eg lib assets etc when available) but they do NOT contain any class filesAltrnativeto Comment Didnrsquot find any problems in exporting jar (when recommending IntelliJ)

ApacheExample 1Correlation Linking Week 113 in issues to week 117 in switches (July 2017)Issue Report Title httpd 2426 no longer building against lua 531 or lua 534Issue Description Building Apache 2426 against lua 531 or 534 compiled from scratch (latest version as of this writing) does not work Compilation against lua 531 did work with 2425Altrnativeto Comment Some issues while building with lua (when recommending XAMPP)Example 2Correlation Linking Week 12 in issues to week 13 in switches (September 2019)Issue Report Title Apache Server is restarted time to time (Release 2410)Issue Description I am using Apache release 2410 in Windows Server 2008 Time to time the Apache server is getting restarted The service is unavailable for 3-5 minutes and after that the service

is started up automatically This issue occurs frequently twice a day two days once etcAltrnativeto Comment XAMPP is more stable I am experiencing random restarts throughout the day when using Apache

Table 6 The issue reports attributes for Firefox Eclipse andApache

Metric Description

Component consists of different components of the system such as bookmarkstheme toolbar etc

Status describes the status of the issue such as unconfirmed open re-solved etc

Resolution describes the resolution type of the issue such as fixed duplicateinvalid etc

Severity the assigned severity such as blocker critical normal etcHardware the different hardware affected by the issue such as desktop mobile

32 or 64 bits etcOS the different OS affected by the issue such as OSX Windows etcPriority the assigned priority from 1 to 5Type the assigned issue type such as defect enhancement or taskTime Alive the time difference between the date opened and the date resolved

in minutesTime since last change the time difference between the date opened and the date of last

change in minutesIs a Crash a boolean value to indicate if it is a crash reportHas a Version Number a boolean whether the issue report has a version numberHas a Target Milestone a boolean whether the issue report has a target numberHas a Regression Range a boolean value to indicate if the bug causes a regression testingNumber of comments the number of comments for the reportNumber of votes the number of votes for the report which validates the integrity of

the report by other usersNumber of Blocks the number of issues that are blocked by that issue until it is re-

solved

(ie with more tracking information) are associated with potentialusersrsquochurn

The number of comments votes and blocks showcase the effortinvested on discussing an issue Finally the time-alive and the time-since-last-change are two features related to the life-cycle of an issuereport Longer time-alive values indicate a lingering issue that hasnot been resolved yet while a short time-since-last-change values

indicate issues that have been recently addressed (and thus can stillbe reoccurring)

Our studied features are inspired by previous research in soft-ware engineering that have built machine learning models usinginformation present in issue reports [15 16 16 18 19 29 32 39 74]

We chose features that are common across the issue reports ofFirefox Eclipse and Apache We filtered out features that are uniqueto an issue tracker system only to ensure a fair comparison betweenour studied issue reports Afterwards we performed Spearmancorrelation tests [44] on the selected features in which each featureis tested against the set of all the other features We did not observeany correlation obtaining a value higher than 07 which suggeststhat our selected features are not correlated and can be safely fedinto our machine learning models [30]

To evaluate the performance of our models we use the Area Un-der the Curvemetric (AUC) AUC is useful for evaluating our modelsbecause our models output probabilities Therefore AUC showsthe discrimination power of our models in every probability thresh-old (unlike Precision Recall or F-measure which are limited toevaluating models at a single probability threshold [66]) The AUCvalues range from 0 to 1 An AUC of 05 denotes a random guessingmodel while an AUC of 1 denotes a perfect distinguishing powerAn AUC of 0 denotes a model with perfect inverse predictions

Given the temporal nature of our data (ie future issue reportscannot be used to predict the potential user churn of past issuereports) we adopt a leave-one-out validation approach to obtainour AUC values

First our issue reports have already been sorted when we per-form the time-series analyses Second we split our data into two

8

929

930

931

932

933

934

935

936

937

938

939

940

941

942

943

944

945

946

947

948

949

950

951

952

953

954

955

956

957

958

959

960

961

962

963

964

965

966

967

968

969

970

971

972

973

974

975

976

977

978

979

980

981

982

983

984

985

986

On the Relationship between User Churn and Software Issues MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea

987

988

989

990

991

992

993

994

995

996

997

998

999

1000

1001

1002

1003

1004

1005

1006

1007

1008

1009

1010

1011

1012

1013

1014

1015

1016

1017

1018

1019

1020

1021

1022

1023

1024

1025

1026

1027

1028

1029

1030

1031

1032

1033

1034

1035

1036

1037

1038

1039

1040

1041

1042

1043

1044

sets one containing 80 of the data ie the training set and an-other containing 20 ie the validation set In the first iterationof our models we use 80 of the data to train our models and weuse the first element of the validation set to test out models Thenthe leave-one-out validation iteratively increases the training setsize by one element (from the validation set) and predicts the nextelement of the validation set This process is repeated until thetraining set reaches the size of 119873 minus 1 (where 119873 is the size of ourdata set) Finally our obtained AUC values are the aggregation(ie median) of all the AUC values obtained in the leave-one-outiterationsFindingsTheAUC values obtained for our threemodels are shownin our online appendix8 The XGboost model performs the best inthe three studied projects obtaining AUC values that exceed 083There exists a subtle difference between the AUC values of theXGboost model and the Feed Forward Neural Networks AUCssince both the XGboost model and the Neural Networks can graspthe complex relationships in our issue reports dataSummary Our results suggest that machine learning models suchas XGboost and Feed Forward Neural Networks can predict therate of user churn with relatively high accuracies (in terms of AUC)Such predictions could help in the prioritization of issue reports

RQ3 What are the most important factorsrelated to potential user churnMotivation In RQ2 we observe that machine learning modelscan be used to predict the rate of potential usersrsquochurn Howeverdevelopers would hardly blindly trust a machine learning modelto help on their decisions (and they should not do so) Developerswould mostly benefit from understanding the reasons behind thepredictions of our machine learning models so that they couldpossibly adapt (or not) their development process Therefore inRQ3 we investigate the most important features in the predictionsof our machine learning modelApproach In this RQ we study the most important features in thepredictions of the XGboost models since they obtain the best AUCvalues in our three studied projects

To find the most important features we adopt a simple featureextraction algorithm [61 70] Consider that our models are trainedon a feature set 119883 = 1199091 1199092 119909119899 (as explained in RQ2) The featureextraction process consists of iteratively (i) removing each feature 119909119894from our models and (ii) computing the AUC without each feature119909119894 (by using the same leave-one-out validation process explainedin RQ2) The AUC values from the models without a feature 119909119894 arecompared to the models containing all the features (ie we takethe difference 119860119880119862119900119903119894119892119894119899119886119897 minus119860119880119862119903119890119898119900119907119890119889 ) The higher the drop inthe AUC value caused by removing a certain feature 119909119894 the higherthe importance of such a feature 119909119894 [4 9 10]Findings Table 7 shows our obtained results after performing thefeature extraction process The table shows the drop in the AUC foreach feature (ie ΔAUC) We also show the minimum maximummean and median values of the features in the 119897119900119908 user churn rate(ie 0) and ℎ119894119892ℎ user churn rate (ie 1) prediction classes

8httpsdoiorg105281zenodo3610584

The Issue Alive Time is the most important feature obtaining thehighest ΔAUC in our three studied projects This result suggeststhat it is unhealthy to leave issues hanging for a long time as theycan be perceived as a reason for users to churn

The Number of Comments obtains a considerable ΔAUC in ourthree studied systems The association between the Number ofComments and the rate of user churnmay occur due to themultitudenumber of users being affected by an issue

The Time-since-last-change is also an important feature in theFirefox and Eclipse projects This result suggests that issues withrecurring changes or modifications may signal a certain instabilityin the fixing process which might cause a potential user churn inthe future

It is worthy noting that features such as Severity Priority andType of Bugs do not obtain a high importance in the predictionswhich might suggest that the existing prioritization fields used inissue reports are misaligned with the potential user churnSummaryOur results suggest that long lived issues can potentiallylead to more potential userrsquos churn and that the existing processesfor prioritizing issues should be augmented to capture the highlyinteractive and long lived issues

5 THREATS TO VALIDITYConstruct Validity The main construct validity of our study isrelated to the assumption of a potential user churn The alterna-tivetonet website does not record exactly when a user has chosen asoftware alternative over another software Instead alternativetonetrecords that a user thinks that a software may be a good alternativeto another software The motivation behind the act of signaling analternative is not known by us For example there might be usersthat are simply driven by the passion of contributing to the alterna-tivetonet community which lead them to provide several opinionsas to which software could make good alternatives to others Wehave deliberately adopted the term potential user churn instead ofsimply user churn due to this unknown motivations behind signal-ing alternatives Moreover it is reasonable to think that if a givensoftware has received a considerable amount of recommendationfor alternatives this may likely indeed represent that users areconsidering to change the software

Internal Validity Our internal threats to validity are mainlyrelated to (i) our chosen features and (ii) our prediction classes Wesplit the peaks of potential user churn into the low and high classesOther research may find different results if the potential user churnis modeled in different way In addition we acknowledge that ourset of metrics is not exhaustive For example we have not studiedwhether the presence of a stack-trace in an issue report can berelated to future potential user churn We plan to (i) model thepotential user churn as a continuous variable and (ii) extend our setof features to predict the potential user churn in future research

External Validity Our study targeted three domains that areweb servers IDEs and web browsers These three domains werechosen due to their availability since we could fetch their issuereports data Although our study showed common factors in ouranalyses we cannot generalize our results to other domains orprojects with a different scale (ie smaller projects) Regardless ourwork provides insights of the possible reasons behind user churn

9

1045

1046

1047

1048

1049

1050

1051

1052

1053

1054

1055

1056

1057

1058

1059

1060

1061

1062

1063

1064

1065

1066

1067

1068

1069

1070

1071

1072

1073

1074

1075

1076

1077

1078

1079

1080

1081

1082

1083

1084

1085

1086

1087

1088

1089

1090

1091

1092

1093

1094

1095

1096

1097

1098

1099

1100

1101

1102

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

1103

1104

1105

1106

1107

1108

1109

1110

1111

1112

1113

1114

1115

1116

1117

1118

1119

1120

1121

1122

1123

1124

1125

1126

1127

1128

1129

1130

1131

1132

1133

1134

1135

1136

1137

1138

1139

1140

1141

1142

1143

1144

1145

1146

1147

1148

1149

1150

1151

1152

1153

1154

1155

1156

1157

1158

1159

1160

Table 7 The most important features for Firefox Eclipse and Apache ranked by their drop in AUC

Factor ΔAUC Min 0 Max 0 Mean 0 Median 0 Min 1 Max 1 Mean 1 Median 1

FirefoxTime Alive 013 0 120 1102 13 10 1487 4872 63OS All 007 0 1 014 0 0 1 083 1Number of Comments 006 1 65 894 12 1 458 2811 45Time Since Last Change 006 9 30 1090 22 0 5 410 3Has a Target Milestone 005 0 1 022 0 0 1 079 1Has a Version Number 004 0 1 033 0 0 1 064 1Component UI Web Payments 003 0 1 042 0 0 1 061 1

EclipseTime Alive 012 0 103 901 12 8 1334 4396 49Has a Target Milestone 006 0 1 019 0 0 1 082 1Time since last change 006 7 27 851 19 0 6 312 3Has a Version Number 005 0 1 034 0 0 1 077 1Component Debug 004 0 1 047 0 0 1 072 1Number of Comments 003 1 72 1092 11 1 632 2701 72OS All 002 0 1 050 0 0 1 068 1

ApacheTime Alive 014 0 136 1132 14 14 1521 5310 70Component Utils 010 0 1 022 0 0 1 091 1Has a Version Number 009 0 1 038 0 0 1 086 1Number of Comments 005 1 92 1232 6 1 352 4207 79OS All 004 0 1 033 0 0 1 079 1Status REOPENED 003 0 1 040 0 0 1 073 1Hardware All 002 0 1 051 0 0 1 078 1

6 RELATEDWORKGiven that we study the relationship between potential user churnand issue reports our research is related to the areas of usercustomerchurn and software quality Therefore we survey the related re-search around these two areas in this section

UserCustomerChurnUser or customer churn has beenwidelystudied in the area of telecommunication services [3 28 71] Aminet al [3] proposed a cross-company machine learning model topredict customer churn The authors also propose the use of datatransformation (eg by using the log function) to improve the train-ing data Ullah et al [71] used feature selection techniques such asInformation Gain and Correlation Attributes Ranking Filter to selectwhich features better explained the churn behaviour of differentcustomer groups The authors observed that a Random Forest modelperformed the best in their study While the Business to Customers(B2C) churn has been widely studied Figalist et al [28] recentlyinvestigated the Business to Business (B2B) churn ie when thecustomer business changes shifts to the services provided by thecompetition Figalist et al [28] allude that although B2B relation-ships tend to be more stable they have a much bigger financial im-pact when they change In terms of software engineering there hasbeen a lack of studies that have investigated usercostumer churndirectly Instead much research has been invested on customerreviews regarding software products [6 46 47] Noei et al [46]studied which features from mobile apps are associated with betterranks in the Google Play store While the work by Noei et al [46]does not study the customer churn directly striving for better ranksin Google Play may avoid user churn in mobile apps Bavota etal [6] investigated the relationship between the usage of fault- andchange-prone APIs in Android apps and user reviews Differentlyfrom all the aforementioned work our study investigates the userchurn is software products

Software Quality Although there exist a lack of studies ana-lyzing the user churn of software productsservices there has beena considerable amount of research on software quality [5 34 39 6667 74] Research on software quality is important since a quality

software is more stable and less defect-prone Ultimately such char-acteristics are tightly related to the observations of our study(iesoftware issues are correlated with user churn) A prominent areaof software quality research in software engineering is the defectprediction area [66 67] Tantithamthavorn et al [66] studied howmuch improvement can be obtained by automatically optimizingdefect prediction models Their research shows that automatic op-timization can yield significant or insignificant performance gainsdepending on the machine learning algorithm Other considerableamount of research has been invested on the triaging part of issuereports [5] and on the effort estimation of an issue report (in termsof time to address an issue) [39 74] In this paper we complementprior research by studying the relationship shared between issuereports (eg defects or required enhancements) with the potentialuser churn of users

7 CONCLUSIONIn this work we study the data available on alternativetonet tobetter understand the relationship between software issues and thepotential user churn of users Having observed that user concernsare tightly related to software issues (eg bugs) we investigatethe relationship between issue reports and the potential user churnof users Our study reveals key issues that must be addressed forthe success of a software product (depending on the domain) Forexample we observe that the potential user churn of users may betightly related to the lack of a robust documentation and support fortesting tools (in the ldquoIDErdquo and ldquoWeb Serverrdquo domains) Finally ourmachine learning models reveal that (i) the longer the issue takes tobe fixed the higher the chances of user churn and (ii) issues withinmore general software components are more likely to be associatedwith user churn Finally we suggest that the current prioritizationperformed by developers should be augmented to encompass thelong lived and highly interactive issues In overall our study sug-gests that the prioritization process of issues can be improved byconsidering the potential user churn of users associated with suchissues

10

1161

1162

1163

1164

1165

1166

1167

1168

1169

1170

1171

1172

1173

1174

1175

1176

1177

1178

1179

1180

1181

1182

1183

1184

1185

1186

1187

1188

1189

1190

1191

1192

1193

1194

1195

1196

1197

1198

1199

1200

1201

1202

1203

1204

1205

1206

1207

1208

1209

1210

1211

1212

1213

1214

1215

1216

1217

1218

On the Relationship between User Churn and Software Issues MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea

1219

1220

1221

1222

1223

1224

1225

1226

1227

1228

1229

1230

1231

1232

1233

1234

1235

1236

1237

1238

1239

1240

1241

1242

1243

1244

1245

1246

1247

1248

1249

1250

1251

1252

1253

1254

1255

1256

1257

1258

1259

1260

1261

1262

1263

1264

1265

1266

1267

1268

1269

1270

1271

1272

1273

1274

1275

1276

REFERENCES[1] Martiacuten Abadi Paul Barham Jianmin Chen Zhifeng Chen Andy Davis Jeffrey

Dean Matthieu Devin Sanjay Ghemawat Geoffrey Irving Michael Isard Man-junath Kudlur Josh Levenberg Rajat Monga Sherry Moore Derek G MurrayBenoit Steiner Paul Tucker Vijay Vasudevan Pete Warden Martin Wicke YuanYu and Xiaoqiang Zheng 2016 Tensorflow A system for large-scale machinelearning In 12th USENIX Symposium on Operating Systems Design and Imple-mentation (OSDI 16) 265ndash283

[2] W Abdelmoez Mohamed Kholief and Fayrouz M Elsalmy 2012 Bug fix-time pre-diction model using naiumlve bayes classifier In 2012 22nd International Conferenceon Computer Theory and Applications (ICCTA) IEEE 167ndash172

[3] Adnan Amin Babar Shah Asad Masood Khattak Thar Baker and Sajid Anwar2018 Just-in-time customer churn prediction with and without data transforma-tion In 2018 IEEE Congress on Evolutionary Computation (CEC) IEEE 1ndash6

[4] Hafeez Ullah Amin Aamir Saeed Malik Rana Fayyaz Ahmad Nasreen BadruddinNidal Kamel Muhammad Hussain and Weng-Tink Chooi 2015 Feature extrac-tion and classification for EEG signals using wavelet transform and machinelearning techniques Australasian physical amp engineering sciences in medicine 381 (2015) 139ndash149

[5] John Anvik Lyndon Hiew and Gail C Murphy 2006 Who should fix this bugIn Proceedings of the 28th international conference on Software engineering ACM361ndash370

[6] Gabriele Bavota Mario Linares-Vasquez Carlos Eduardo Bernal-Cardenas Mas-similiano Di Penta Rocco Oliveto and Denys Poshyvanyk 2014 The impactof api change-and fault-proneness on the user ratings of android apps IEEETransactions on Software Engineering 41 4 (2014) 384ndash407

[7] Pamela Bhattacharya and Iulian Neamtiu 2011 Bug-fix time prediction modelscan we do better In Proceedings of the 8thWorking Conference on Mining SoftwareRepositories ACM 207ndash210

[8] Steven Bird and Edward Loper 2004 NLTK the natural language toolkit InProceedings of the ACL 2004 on Interactive poster and demonstration sessionsAssociation for Computational Linguistics 31

[9] Andrew P Bradley 1997 The use of the area under the ROC curve in theevaluation of machine learning algorithms Pattern recognition 30 7 (1997)1145ndash1159

[10] Leo Breiman 2001 Random forests Machine learning 45 1 (2001) 5ndash32[11] Albert Caruana and Michael T Ewing 2010 How corporate reputation quality

and value influence online loyalty Journal of Business Research 63 9-10 (2010)1103ndash1110

[12] Tianqi Chen and Carlos Guestrin 2016 Xgboost A scalable tree boosting systemIn Proceedings of the 22nd acm sigkdd international conference on knowledgediscovery and data mining ACM 785ndash794

[13] Tianqi Chen Tong He Michael Benesty Vadim Khotilovich and Yuan Tang 2015Xgboost extreme gradient boosting R package version 04-2 (2015) 1ndash4

[14] Wenbin Chen Kun Fu Jiawei Zuo Xinwei Zheng Tinglei Huang and WenjuanRen 2017 Radar emitter classification for large data set based on weighted-xgboost IET Radar Sonar amp Navigation 11 8 (2017) 1203ndash1207

[15] Morakot Choetkiertikul Hoa Khanh Dam Truyen Tran and Aditya Ghose 2015Predicting delays in software projects using networked classification (t) In 201530th IEEEACM International Conference on Automated Software Engineering (ASE)IEEE 353ndash364

[16] Morakot Choetkiertikul Hoa Khanh Dam Truyen Tran Aditya Ghose and JohnGrundy 2017 Predicting delivery capability in iterative software developmentIEEE Transactions on Software Engineering 44 6 (2017) 551ndash573

[17] Franccedilois Chollet 2017 Keras[18] Daniel Alencar Da Costa Shane McIntosh Uiraacute Kulesza Ahmed E Hassan and

Surafel Lemma Abebe 2018 An empirical study of the integration time of fixedissues Empirical Software Engineering 23 1 (2018) 334ndash383

[19] Daniel Alencar da Costa Shane McIntosh Christoph Treude Uiraacute Kulesza andAhmed E Hassan 2018 The impact of rapid release cycles on the integrationdelay of fixed issues Empirical Software Engineering 23 2 (2018) 835ndash904

[20] Piew Datta Brij Masand Deepak R Mani and Bin Li 2000 Automated cellularmodeling and prediction on a large scale Artificial Intelligence Review 14 6 (2000)485ndash502

[21] Kushal S Dave Vishal Vaingankar Sumanth Kolar and Vasudeva Varma 2013Timespent based models for predicting user retention In Proceedings of the 22ndinternational conference on World Wide Web ACM 331ndash342

[22] William H DeLone and Ephraim R McLean 1992 Information systems successThe quest for the dependent variable Information systems research 3 1 (1992)60ndash95

[23] William H DeLone and Ephraim R McLean 2003 Model of information systemssuccess a ten years update Journal of Management (2003)

[24] Gideon Dror Dan Pelleg Oleg Rokhlenko and Idan Szpektor 2012 Churnprediction in new users of Yahoo answers In Proceedings of the 21st InternationalConference on World Wide Web ACM 829ndash834

[25] Khaled El Emam 2005 The ROI from software quality Auerbach Publications[26] Katherine Ellis Jacqueline Kerr Suneeta Godbole Gert Lanckriet David Wing

and Simon Marshall 2014 A random forest classifier for the prediction of energy

expenditure and type of physical activity from wrist and hip accelerometersPhysiological measurement 35 11 (2014) 2191

[27] David Enke and Suraphan Thawornwong 2005 The use of datamining and neuralnetworks for forecasting stock market returns Expert Systems with applications29 4 (2005) 927ndash940

[28] Iris Figalist Christoph Elsner Jan Bosch and Helena Holmstroumlm Olsson 2019Customer churn prediction in B2B contexts In International Conference on Soft-ware Business Springer 378ndash386

[29] Emanuel Giger Martin Pinzger and Harald Gall 2010 Predicting the fix timeof bugs In Proceedings of the 2nd International Workshop on RecommendationSystems for Software Engineering ACM 52ndash56

[30] Frank E Harrell Jr 2015 Regression modeling strategies with applications to linearmodels logistic and ordinal regression and survival analysis Springer

[31] Steffen Hedegaard and Jakob Grue Simonsen 2013 Extracting usability and userexperience information from online user reviews In Proceedings of the SIGCHIConference on Human Factors in Computing Systems ACM 2089ndash2098

[32] Yujuan Jiang Bram Adams and Daniel M German 2013 Will my patch make itand how fast Case study on the linux kernel In Proceedings of the 10th WorkingConference on Mining Software Repositories IEEE Press 101ndash110

[33] Guolin Ke Jia Zhang Zhenhui Xu Jiang Bian and Tie-Yan Liu 2019 TabNNa universal neural network solution for tabular data httpsopenreviewnetforumid=r1eJssCqY7

[34] Foutse Khomh Brian Chan Ying Zou and Ahmed E Hassan 2011 An entropyevaluation approach for triaging field crashes A case study of mozilla firefox InReverse Engineering (WCRE) 2011 18th Working Conference on IEEE 261ndash270

[35] Andy Liaw and Matthew Wiener 2002 Classification and regression by randomforest R news 2 3 (2002) 18ndash22

[36] Erik Linstead Paul Rigor Sushil Bajracharya Cristina Lopes and Pierre Baldi2007 Mining eclipse developer contributions via author-topic models In FourthInternational Workshop on Mining Software Repositories (MSRrsquo07 ICSE Workshops2007) IEEE 30ndash30

[37] Xi Long Wenjing Yin Le An Haiying Ni Lixian Huang Qi Luo and Yan Chen2012 Churn analysis of online social network users using data mining techniquesIn Proceedings of the international MultiConference of Engineers and ConputerScientists Vol 1

[38] James MacQueen 1967 Some methods for classification and analysis of multivari-ate observations In Proceedings of the fifth Berkeley symposium on mathematicalstatistics and probability Vol 1 Oakland CA USA 281ndash297

[39] Lionel Marks Ying Zou and Ahmed E Hassan 2011 Studying the fix-timefor bugs in large open source projects In Proceedings of the 7th InternationalConference on Predictive Models in Software Engineering ACM 11

[40] Andrew Kachites McCallum 2002 Mallet A machine learning for languagetoolkit (2002)

[41] Miloš Milošević Nenad Živić and Igor Andjelković 2017 Early churn predic-tion with personalized targeting in mobile social games Expert Systems withApplications 83 (2017) 326ndash332

[42] Audris Mockus Roy T Fielding and James Herbsleb 2000 A case study ofopen source software development the Apache server In Proceedings of the 22ndinternational conference on Software engineering Acm 263ndash272

[43] Gustavo Percio Zimmermann Montesdioca and Antocircnio Carlos Gastaud Maccedilada2015 Measuring user satisfaction with information security practices Computersamp Security 48 (2015) 267ndash280

[44] Leann Myers and Maria J Sirois 2004 Spearman correlation coefficients differ-ences between Encyclopedia of statistical sciences 12 (2004)

[45] Didrik Nielsen 2016 Tree boosting with XGBoost-why does XGBoost win EveryMachine Learning Competition Masterrsquos thesis NTNU

[46] Ehsan Noei Daniel Alencar Da Costa and Ying Zou 2018 Winning the appproduction rally In Proceedings of the 2018 26th ACM Joint Meeting on EuropeanSoftware Engineering Conference and Symposium on the Foundations of SoftwareEngineering ACM 283ndash294

[47] Ehsan Noei Feng Zhang Shaohua Wang and Ying Zou 2019 Towards pri-oritizing user-related issue reports of mobile applications Empirical SoftwareEngineering 24 4 (2019) 1964ndash1996

[48] Ehsan Noei Feng Zhang and Ying Zou 2019 Too Many User-Reviews WhatShould App Developers Look at First IEEE Transactions on Software Engineering(2019)

[49] Derek Orsquocallaghan Derek Greene Joe Carthy and Paacutedraig Cunningham 2015An analysis of the coherence of descriptors in topic modeling Expert Systemswith Applications 42 13 (2015) 5645ndash5657

[50] Ya A Pachepsky Dennis Timlin and GY Varallyay 1996 Artificial neural net-works to estimate soil water retention from easily measurable data Soil ScienceSociety of America Journal 60 3 (1996) 727ndash733

[51] Mahesh Pal 2005 Random forest classifier for remote sensing classificationInternational Journal of Remote Sensing 26 1 (2005) 217ndash222

[52] Sebastiano Panichella Andrea Di Sorbo Emitza Guzman Corrado A VisaggioGerardo Canfora and Harald C Gall 2015 How can i improve my app classifyinguser reviews for software maintenance and evolution In Software maintenanceand evolution (ICSME) 2015 IEEE international conference on IEEE 281ndash290

11

1277

1278

1279

1280

1281

1282

1283

1284

1285

1286

1287

1288

1289

1290

1291

1292

1293

1294

1295

1296

1297

1298

1299

1300

1301

1302

1303

1304

1305

1306

1307

1308

1309

1310

1311

1312

1313

1314

1315

1316

1317

1318

1319

1320

1321

1322

1323

1324

1325

1326

1327

1328

1329

1330

1331

1332

1333

1334

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

1335

1336

1337

1338

1339

1340

1341

1342

1343

1344

1345

1346

1347

1348

1349

1350

1351

1352

1353

1354

1355

1356

1357

1358

1359

1360

1361

1362

1363

1364

1365

1366

1367

1368

1369

1370

1371

1372

1373

1374

1375

1376

1377

1378

1379

1380

1381

1382

1383

1384

1385

1386

1387

1388

1389

1390

1391

1392

[53] Chitra Phadke Huseyin Uzunalioglu Veena B Mendiratta Dan Kushnir andDerek Doran 2013 Prediction of subscriber churn using social network analysisBell Labs Technical Journal 17 4 (2013) 63ndash76

[54] Peter C Rigby Daniel M German and Margaret-Anne Storey 2008 Open sourcesoftware peer review practices a case study of the apache server In Proceedingsof the 30th international conference on Software engineering ACM 541ndash550

[55] Martin P Robillard and Robert Deline 2011 A field study of API learning obstaclesEmpirical Software Engineering 16 6 (2011) 703ndash732

[56] Michael Roumlder Andreas Both and Alexander Hinneburg 2015 Exploring thespace of topic coherence measures In Proceedings of the eighth ACM internationalconference on Web search and data mining ACM 399ndash408

[57] Victor Francisco Rodriguez-Galiano Bardan Ghimire John Rogan Mario Chica-Olmo and Juan Pedro Rigol-Sanchez 2012 An assessment of the effectivenessof a random forest classifier for land-cover classification ISPRS Journal of Pho-togrammetry and Remote Sensing 67 (2012) 93ndash104

[58] Hiroaki Sakoe and Seibi Chiba 1978 Dynamic programming algorithm opti-mization for spoken word recognition IEEE transactions on acoustics speech andsignal processing 26 1 (1978) 43ndash49

[59] Stan Salvador and Philip Chan 2007 Toward accurate dynamic time warping inlinear time and space Intelligent Data Analysis 11 5 (2007) 561ndash580

[60] Peter B Seddon 1997 A respecification and extension of the DeLone and McLeanmodel of IS success Information systems research 8 3 (1997) 240ndash253

[61] Rudy Setiono Bart Baesens and Christophe Mues 2008 Recursive neural net-work rule extraction for data with mixed attributes IEEE Transactions on NeuralNetworks 19 2 (2008) 299ndash307

[62] Dong-Hee Shin 2015 Effect of the customer experience on satisfaction withsmartphones Assessing smart satisfaction index with partial least squaresTelecommunications Policy 39 8 (2015) 627ndash641

[63] Tom De Smedt and Walter Daelemans 2012 Pattern for python Journal ofMachine Learning Research 13 Jun (2012) 2063ndash2067

[64] Daniel Svozil Vladimir Kvasnicka and Jiri Pospichal 1997 Introduction to multi-layer feed-forward neural networks Chemometrics and intelligent laboratorysystems 39 1 (1997) 43ndash62

[65] Shaheen Syed and Marco Spruit 2017 Full-text or abstract Examining topiccoherence scores using latent dirichlet allocation In 2017 IEEE Internationalconference on data science and advanced analytics (DSAA) IEEE 165ndash174

[66] Chakkrit Tantithamthavorn Shane McIntosh Ahmed E Hassan and KenichiMatsumoto 2016 Automated parameter optimization of classification techniquesfor defect prediction models In 2016 IEEEACM 38th International Conference onSoftware Engineering (ICSE) IEEE 321ndash332

[67] Chakkrit Tantithamthavorn Shane McIntosh Ahmed E Hassan and KenichiMatsumoto 2018 The impact of automated parameter optimization on defectprediction models IEEE Transactions on Software Engineering (2018)

[68] Romain Tavenard 2017 tslearn A machine learning toolkit dedicated to time-series data httpsgithubcomrtavenartslearn

[69] L Torlay Marcela Perrone-Bertolotti Elizabeth Thomas and Monica Baciu 2017Machine learningndashXGBoost analysis of language networks to classify patientswith epilepsy Brain informatics 4 3 (2017) 159

[70] Geoffrey G Towell and Jude W Shavlik 1993 Extracting refined rules fromknowledge-based neural networks Machine learning 13 1 (1993) 71ndash101

[71] Irfan Ullah Basit Raza Ahmad Kamran Malik Muhammad Imran Saif Ul Islamand SungWon Kim 2019 A churn prediction model using random forest analysisof machine learning techniques for churn prediction and factor identification intelecom sector IEEE Access 7 (2019) 60134ndash60149

[72] Lorenzo Villarroel Gabriele Bavota Barbara Russo Rocco Oliveto and Massimil-iano Di Penta 2016 Release planning of mobile apps based on user reviews InProceedings of the 38th International Conference on Software Engineering ACM14ndash24

[73] Chih-Ping Wei and I-Tang Chiu 2002 Turning telecommunications call detailsto churn prediction a data mining approach Expert systems with applications 232 (2002) 103ndash112

[74] Cathrin Weiss Rahul Premraj Thomas Zimmermann and Andreas Zeller 2007How long will it take to fix this bug In Fourth International Workshop on MiningSoftware Repositories (MSRrsquo07 ICSE Workshops 2007) IEEE 1ndash1

[75] Ian H Witten Eibe Frank and A Mark 2016 Hall and Christopher J Pal DataMining Practical machine learning tools and techniques (2016)

[76] Barbara H Wixom and Peter A Todd 2005 A theoretical integration of usersatisfaction and technology acceptance Information systems research 16 1 (2005)85ndash102

[77] Jae-Chern Yoo and Tae Hee Han 2009 Fast normalized cross-correlation Circuitssystems and signal processing 28 6 (2009) 819

12

  • Abstract
  • 1 Introduction
  • 2 Motivating Example
  • 3 Data Collection
  • 4 Research Questions amp Results
  • 5 Threats To Validity
  • 6 Related Work
  • 7 Conclusion
  • References
Page 8: On the Relationship between User Churn and Software Issues · relationship between the issues that are present in a software prod-uct and user churn. For this purpose, we investigate

813

814

815

816

817

818

819

820

821

822

823

824

825

826

827

828

829

830

831

832

833

834

835

836

837

838

839

840

841

842

843

844

845

846

847

848

849

850

851

852

853

854

855

856

857

858

859

860

861

862

863

864

865

866

867

868

869

870

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

871

872

873

874

875

876

877

878

879

880

881

882

883

884

885

886

887

888

889

890

891

892

893

894

895

896

897

898

899

900

901

902

903

904

905

906

907

908

909

910

911

912

913

914

915

916

917

918

919

920

921

922

923

924

925

926

927

928

Table 5 Manual investigation of DTW correlation linking

FirefoxExample 1Correlation Linking Week 164 in issues to week 169 in switches (May 2018)Issue Report Title Firefox version 5303 appears high memory high resources usage and might causing for SSD damagesIssue Description Mozilla Firefox it is caching all of this data 4-5GB on my new SSD and its lost 4 from its health within 5 months due continuously writesoverwrites firefox cache on my SSDAltrnativeto Comment Switching to Chrome after version 5303 high memory is damaging my SSDExample 2Correlation Linking Week 66 in issues to week 68 in switches (November 2016)Issue Report Title Sessions are not cleared in the private windowIssue Description After closing private window (not a tab) and opening a new one I still logged inAltrnativeto Comment Chrome incognito functionality is more stable than Firefox Firefox doesnrsquot resets sessions

EclipseExample 1Correlation Linking Week 10 in issues to week 11 in switches (July 2015)Issue Report Title Not Java 8 CompatibleIssue Description When downloading Eclipse for the first time for Java development 64-bit (which is what my computer is) it is not running because it is not compatible with Java version 8

(which is the latest version of Java) but appears to be compatible with version 7 still -DosgirequiredJavaVersion=17Altrnativeto Comment Using Netbeans because Eclipse is not installingExample 2Correlation Linking Week 211 in issues to week 213 in switches (February 2019)Issue Report Title jar files in the exported application do not contain class filesIssue Description The application has many plugins It builds and runs well within eclipse but when I export the application using the eclipse export product wizard the jar files (corresponding to

the plugins) do not contain any class files they include meta-inf pluginxml and all the extra folders (eg lib assets etc when available) but they do NOT contain any class filesAltrnativeto Comment Didnrsquot find any problems in exporting jar (when recommending IntelliJ)

ApacheExample 1Correlation Linking Week 113 in issues to week 117 in switches (July 2017)Issue Report Title httpd 2426 no longer building against lua 531 or lua 534Issue Description Building Apache 2426 against lua 531 or 534 compiled from scratch (latest version as of this writing) does not work Compilation against lua 531 did work with 2425Altrnativeto Comment Some issues while building with lua (when recommending XAMPP)Example 2Correlation Linking Week 12 in issues to week 13 in switches (September 2019)Issue Report Title Apache Server is restarted time to time (Release 2410)Issue Description I am using Apache release 2410 in Windows Server 2008 Time to time the Apache server is getting restarted The service is unavailable for 3-5 minutes and after that the service

is started up automatically This issue occurs frequently twice a day two days once etcAltrnativeto Comment XAMPP is more stable I am experiencing random restarts throughout the day when using Apache

Table 6 The issue reports attributes for Firefox Eclipse andApache

Metric Description

Component consists of different components of the system such as bookmarkstheme toolbar etc

Status describes the status of the issue such as unconfirmed open re-solved etc

Resolution describes the resolution type of the issue such as fixed duplicateinvalid etc

Severity the assigned severity such as blocker critical normal etcHardware the different hardware affected by the issue such as desktop mobile

32 or 64 bits etcOS the different OS affected by the issue such as OSX Windows etcPriority the assigned priority from 1 to 5Type the assigned issue type such as defect enhancement or taskTime Alive the time difference between the date opened and the date resolved

in minutesTime since last change the time difference between the date opened and the date of last

change in minutesIs a Crash a boolean value to indicate if it is a crash reportHas a Version Number a boolean whether the issue report has a version numberHas a Target Milestone a boolean whether the issue report has a target numberHas a Regression Range a boolean value to indicate if the bug causes a regression testingNumber of comments the number of comments for the reportNumber of votes the number of votes for the report which validates the integrity of

the report by other usersNumber of Blocks the number of issues that are blocked by that issue until it is re-

solved

(ie with more tracking information) are associated with potentialusersrsquochurn

The number of comments votes and blocks showcase the effortinvested on discussing an issue Finally the time-alive and the time-since-last-change are two features related to the life-cycle of an issuereport Longer time-alive values indicate a lingering issue that hasnot been resolved yet while a short time-since-last-change values

indicate issues that have been recently addressed (and thus can stillbe reoccurring)

Our studied features are inspired by previous research in soft-ware engineering that have built machine learning models usinginformation present in issue reports [15 16 16 18 19 29 32 39 74]

We chose features that are common across the issue reports ofFirefox Eclipse and Apache We filtered out features that are uniqueto an issue tracker system only to ensure a fair comparison betweenour studied issue reports Afterwards we performed Spearmancorrelation tests [44] on the selected features in which each featureis tested against the set of all the other features We did not observeany correlation obtaining a value higher than 07 which suggeststhat our selected features are not correlated and can be safely fedinto our machine learning models [30]

To evaluate the performance of our models we use the Area Un-der the Curvemetric (AUC) AUC is useful for evaluating our modelsbecause our models output probabilities Therefore AUC showsthe discrimination power of our models in every probability thresh-old (unlike Precision Recall or F-measure which are limited toevaluating models at a single probability threshold [66]) The AUCvalues range from 0 to 1 An AUC of 05 denotes a random guessingmodel while an AUC of 1 denotes a perfect distinguishing powerAn AUC of 0 denotes a model with perfect inverse predictions

Given the temporal nature of our data (ie future issue reportscannot be used to predict the potential user churn of past issuereports) we adopt a leave-one-out validation approach to obtainour AUC values

First our issue reports have already been sorted when we per-form the time-series analyses Second we split our data into two

8

929

930

931

932

933

934

935

936

937

938

939

940

941

942

943

944

945

946

947

948

949

950

951

952

953

954

955

956

957

958

959

960

961

962

963

964

965

966

967

968

969

970

971

972

973

974

975

976

977

978

979

980

981

982

983

984

985

986

On the Relationship between User Churn and Software Issues MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea

987

988

989

990

991

992

993

994

995

996

997

998

999

1000

1001

1002

1003

1004

1005

1006

1007

1008

1009

1010

1011

1012

1013

1014

1015

1016

1017

1018

1019

1020

1021

1022

1023

1024

1025

1026

1027

1028

1029

1030

1031

1032

1033

1034

1035

1036

1037

1038

1039

1040

1041

1042

1043

1044

sets one containing 80 of the data ie the training set and an-other containing 20 ie the validation set In the first iterationof our models we use 80 of the data to train our models and weuse the first element of the validation set to test out models Thenthe leave-one-out validation iteratively increases the training setsize by one element (from the validation set) and predicts the nextelement of the validation set This process is repeated until thetraining set reaches the size of 119873 minus 1 (where 119873 is the size of ourdata set) Finally our obtained AUC values are the aggregation(ie median) of all the AUC values obtained in the leave-one-outiterationsFindingsTheAUC values obtained for our threemodels are shownin our online appendix8 The XGboost model performs the best inthe three studied projects obtaining AUC values that exceed 083There exists a subtle difference between the AUC values of theXGboost model and the Feed Forward Neural Networks AUCssince both the XGboost model and the Neural Networks can graspthe complex relationships in our issue reports dataSummary Our results suggest that machine learning models suchas XGboost and Feed Forward Neural Networks can predict therate of user churn with relatively high accuracies (in terms of AUC)Such predictions could help in the prioritization of issue reports

RQ3 What are the most important factorsrelated to potential user churnMotivation In RQ2 we observe that machine learning modelscan be used to predict the rate of potential usersrsquochurn Howeverdevelopers would hardly blindly trust a machine learning modelto help on their decisions (and they should not do so) Developerswould mostly benefit from understanding the reasons behind thepredictions of our machine learning models so that they couldpossibly adapt (or not) their development process Therefore inRQ3 we investigate the most important features in the predictionsof our machine learning modelApproach In this RQ we study the most important features in thepredictions of the XGboost models since they obtain the best AUCvalues in our three studied projects

To find the most important features we adopt a simple featureextraction algorithm [61 70] Consider that our models are trainedon a feature set 119883 = 1199091 1199092 119909119899 (as explained in RQ2) The featureextraction process consists of iteratively (i) removing each feature 119909119894from our models and (ii) computing the AUC without each feature119909119894 (by using the same leave-one-out validation process explainedin RQ2) The AUC values from the models without a feature 119909119894 arecompared to the models containing all the features (ie we takethe difference 119860119880119862119900119903119894119892119894119899119886119897 minus119860119880119862119903119890119898119900119907119890119889 ) The higher the drop inthe AUC value caused by removing a certain feature 119909119894 the higherthe importance of such a feature 119909119894 [4 9 10]Findings Table 7 shows our obtained results after performing thefeature extraction process The table shows the drop in the AUC foreach feature (ie ΔAUC) We also show the minimum maximummean and median values of the features in the 119897119900119908 user churn rate(ie 0) and ℎ119894119892ℎ user churn rate (ie 1) prediction classes

8httpsdoiorg105281zenodo3610584

The Issue Alive Time is the most important feature obtaining thehighest ΔAUC in our three studied projects This result suggeststhat it is unhealthy to leave issues hanging for a long time as theycan be perceived as a reason for users to churn

The Number of Comments obtains a considerable ΔAUC in ourthree studied systems The association between the Number ofComments and the rate of user churnmay occur due to themultitudenumber of users being affected by an issue

The Time-since-last-change is also an important feature in theFirefox and Eclipse projects This result suggests that issues withrecurring changes or modifications may signal a certain instabilityin the fixing process which might cause a potential user churn inthe future

It is worthy noting that features such as Severity Priority andType of Bugs do not obtain a high importance in the predictionswhich might suggest that the existing prioritization fields used inissue reports are misaligned with the potential user churnSummaryOur results suggest that long lived issues can potentiallylead to more potential userrsquos churn and that the existing processesfor prioritizing issues should be augmented to capture the highlyinteractive and long lived issues

5 THREATS TO VALIDITYConstruct Validity The main construct validity of our study isrelated to the assumption of a potential user churn The alterna-tivetonet website does not record exactly when a user has chosen asoftware alternative over another software Instead alternativetonetrecords that a user thinks that a software may be a good alternativeto another software The motivation behind the act of signaling analternative is not known by us For example there might be usersthat are simply driven by the passion of contributing to the alterna-tivetonet community which lead them to provide several opinionsas to which software could make good alternatives to others Wehave deliberately adopted the term potential user churn instead ofsimply user churn due to this unknown motivations behind signal-ing alternatives Moreover it is reasonable to think that if a givensoftware has received a considerable amount of recommendationfor alternatives this may likely indeed represent that users areconsidering to change the software

Internal Validity Our internal threats to validity are mainlyrelated to (i) our chosen features and (ii) our prediction classes Wesplit the peaks of potential user churn into the low and high classesOther research may find different results if the potential user churnis modeled in different way In addition we acknowledge that ourset of metrics is not exhaustive For example we have not studiedwhether the presence of a stack-trace in an issue report can berelated to future potential user churn We plan to (i) model thepotential user churn as a continuous variable and (ii) extend our setof features to predict the potential user churn in future research

External Validity Our study targeted three domains that areweb servers IDEs and web browsers These three domains werechosen due to their availability since we could fetch their issuereports data Although our study showed common factors in ouranalyses we cannot generalize our results to other domains orprojects with a different scale (ie smaller projects) Regardless ourwork provides insights of the possible reasons behind user churn

9

1045

1046

1047

1048

1049

1050

1051

1052

1053

1054

1055

1056

1057

1058

1059

1060

1061

1062

1063

1064

1065

1066

1067

1068

1069

1070

1071

1072

1073

1074

1075

1076

1077

1078

1079

1080

1081

1082

1083

1084

1085

1086

1087

1088

1089

1090

1091

1092

1093

1094

1095

1096

1097

1098

1099

1100

1101

1102

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

1103

1104

1105

1106

1107

1108

1109

1110

1111

1112

1113

1114

1115

1116

1117

1118

1119

1120

1121

1122

1123

1124

1125

1126

1127

1128

1129

1130

1131

1132

1133

1134

1135

1136

1137

1138

1139

1140

1141

1142

1143

1144

1145

1146

1147

1148

1149

1150

1151

1152

1153

1154

1155

1156

1157

1158

1159

1160

Table 7 The most important features for Firefox Eclipse and Apache ranked by their drop in AUC

Factor ΔAUC Min 0 Max 0 Mean 0 Median 0 Min 1 Max 1 Mean 1 Median 1

FirefoxTime Alive 013 0 120 1102 13 10 1487 4872 63OS All 007 0 1 014 0 0 1 083 1Number of Comments 006 1 65 894 12 1 458 2811 45Time Since Last Change 006 9 30 1090 22 0 5 410 3Has a Target Milestone 005 0 1 022 0 0 1 079 1Has a Version Number 004 0 1 033 0 0 1 064 1Component UI Web Payments 003 0 1 042 0 0 1 061 1

EclipseTime Alive 012 0 103 901 12 8 1334 4396 49Has a Target Milestone 006 0 1 019 0 0 1 082 1Time since last change 006 7 27 851 19 0 6 312 3Has a Version Number 005 0 1 034 0 0 1 077 1Component Debug 004 0 1 047 0 0 1 072 1Number of Comments 003 1 72 1092 11 1 632 2701 72OS All 002 0 1 050 0 0 1 068 1

ApacheTime Alive 014 0 136 1132 14 14 1521 5310 70Component Utils 010 0 1 022 0 0 1 091 1Has a Version Number 009 0 1 038 0 0 1 086 1Number of Comments 005 1 92 1232 6 1 352 4207 79OS All 004 0 1 033 0 0 1 079 1Status REOPENED 003 0 1 040 0 0 1 073 1Hardware All 002 0 1 051 0 0 1 078 1

6 RELATEDWORKGiven that we study the relationship between potential user churnand issue reports our research is related to the areas of usercustomerchurn and software quality Therefore we survey the related re-search around these two areas in this section

UserCustomerChurnUser or customer churn has beenwidelystudied in the area of telecommunication services [3 28 71] Aminet al [3] proposed a cross-company machine learning model topredict customer churn The authors also propose the use of datatransformation (eg by using the log function) to improve the train-ing data Ullah et al [71] used feature selection techniques such asInformation Gain and Correlation Attributes Ranking Filter to selectwhich features better explained the churn behaviour of differentcustomer groups The authors observed that a Random Forest modelperformed the best in their study While the Business to Customers(B2C) churn has been widely studied Figalist et al [28] recentlyinvestigated the Business to Business (B2B) churn ie when thecustomer business changes shifts to the services provided by thecompetition Figalist et al [28] allude that although B2B relation-ships tend to be more stable they have a much bigger financial im-pact when they change In terms of software engineering there hasbeen a lack of studies that have investigated usercostumer churndirectly Instead much research has been invested on customerreviews regarding software products [6 46 47] Noei et al [46]studied which features from mobile apps are associated with betterranks in the Google Play store While the work by Noei et al [46]does not study the customer churn directly striving for better ranksin Google Play may avoid user churn in mobile apps Bavota etal [6] investigated the relationship between the usage of fault- andchange-prone APIs in Android apps and user reviews Differentlyfrom all the aforementioned work our study investigates the userchurn is software products

Software Quality Although there exist a lack of studies ana-lyzing the user churn of software productsservices there has beena considerable amount of research on software quality [5 34 39 6667 74] Research on software quality is important since a quality

software is more stable and less defect-prone Ultimately such char-acteristics are tightly related to the observations of our study(iesoftware issues are correlated with user churn) A prominent areaof software quality research in software engineering is the defectprediction area [66 67] Tantithamthavorn et al [66] studied howmuch improvement can be obtained by automatically optimizingdefect prediction models Their research shows that automatic op-timization can yield significant or insignificant performance gainsdepending on the machine learning algorithm Other considerableamount of research has been invested on the triaging part of issuereports [5] and on the effort estimation of an issue report (in termsof time to address an issue) [39 74] In this paper we complementprior research by studying the relationship shared between issuereports (eg defects or required enhancements) with the potentialuser churn of users

7 CONCLUSIONIn this work we study the data available on alternativetonet tobetter understand the relationship between software issues and thepotential user churn of users Having observed that user concernsare tightly related to software issues (eg bugs) we investigatethe relationship between issue reports and the potential user churnof users Our study reveals key issues that must be addressed forthe success of a software product (depending on the domain) Forexample we observe that the potential user churn of users may betightly related to the lack of a robust documentation and support fortesting tools (in the ldquoIDErdquo and ldquoWeb Serverrdquo domains) Finally ourmachine learning models reveal that (i) the longer the issue takes tobe fixed the higher the chances of user churn and (ii) issues withinmore general software components are more likely to be associatedwith user churn Finally we suggest that the current prioritizationperformed by developers should be augmented to encompass thelong lived and highly interactive issues In overall our study sug-gests that the prioritization process of issues can be improved byconsidering the potential user churn of users associated with suchissues

10

1161

1162

1163

1164

1165

1166

1167

1168

1169

1170

1171

1172

1173

1174

1175

1176

1177

1178

1179

1180

1181

1182

1183

1184

1185

1186

1187

1188

1189

1190

1191

1192

1193

1194

1195

1196

1197

1198

1199

1200

1201

1202

1203

1204

1205

1206

1207

1208

1209

1210

1211

1212

1213

1214

1215

1216

1217

1218

On the Relationship between User Churn and Software Issues MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea

1219

1220

1221

1222

1223

1224

1225

1226

1227

1228

1229

1230

1231

1232

1233

1234

1235

1236

1237

1238

1239

1240

1241

1242

1243

1244

1245

1246

1247

1248

1249

1250

1251

1252

1253

1254

1255

1256

1257

1258

1259

1260

1261

1262

1263

1264

1265

1266

1267

1268

1269

1270

1271

1272

1273

1274

1275

1276

REFERENCES[1] Martiacuten Abadi Paul Barham Jianmin Chen Zhifeng Chen Andy Davis Jeffrey

Dean Matthieu Devin Sanjay Ghemawat Geoffrey Irving Michael Isard Man-junath Kudlur Josh Levenberg Rajat Monga Sherry Moore Derek G MurrayBenoit Steiner Paul Tucker Vijay Vasudevan Pete Warden Martin Wicke YuanYu and Xiaoqiang Zheng 2016 Tensorflow A system for large-scale machinelearning In 12th USENIX Symposium on Operating Systems Design and Imple-mentation (OSDI 16) 265ndash283

[2] W Abdelmoez Mohamed Kholief and Fayrouz M Elsalmy 2012 Bug fix-time pre-diction model using naiumlve bayes classifier In 2012 22nd International Conferenceon Computer Theory and Applications (ICCTA) IEEE 167ndash172

[3] Adnan Amin Babar Shah Asad Masood Khattak Thar Baker and Sajid Anwar2018 Just-in-time customer churn prediction with and without data transforma-tion In 2018 IEEE Congress on Evolutionary Computation (CEC) IEEE 1ndash6

[4] Hafeez Ullah Amin Aamir Saeed Malik Rana Fayyaz Ahmad Nasreen BadruddinNidal Kamel Muhammad Hussain and Weng-Tink Chooi 2015 Feature extrac-tion and classification for EEG signals using wavelet transform and machinelearning techniques Australasian physical amp engineering sciences in medicine 381 (2015) 139ndash149

[5] John Anvik Lyndon Hiew and Gail C Murphy 2006 Who should fix this bugIn Proceedings of the 28th international conference on Software engineering ACM361ndash370

[6] Gabriele Bavota Mario Linares-Vasquez Carlos Eduardo Bernal-Cardenas Mas-similiano Di Penta Rocco Oliveto and Denys Poshyvanyk 2014 The impactof api change-and fault-proneness on the user ratings of android apps IEEETransactions on Software Engineering 41 4 (2014) 384ndash407

[7] Pamela Bhattacharya and Iulian Neamtiu 2011 Bug-fix time prediction modelscan we do better In Proceedings of the 8thWorking Conference on Mining SoftwareRepositories ACM 207ndash210

[8] Steven Bird and Edward Loper 2004 NLTK the natural language toolkit InProceedings of the ACL 2004 on Interactive poster and demonstration sessionsAssociation for Computational Linguistics 31

[9] Andrew P Bradley 1997 The use of the area under the ROC curve in theevaluation of machine learning algorithms Pattern recognition 30 7 (1997)1145ndash1159

[10] Leo Breiman 2001 Random forests Machine learning 45 1 (2001) 5ndash32[11] Albert Caruana and Michael T Ewing 2010 How corporate reputation quality

and value influence online loyalty Journal of Business Research 63 9-10 (2010)1103ndash1110

[12] Tianqi Chen and Carlos Guestrin 2016 Xgboost A scalable tree boosting systemIn Proceedings of the 22nd acm sigkdd international conference on knowledgediscovery and data mining ACM 785ndash794

[13] Tianqi Chen Tong He Michael Benesty Vadim Khotilovich and Yuan Tang 2015Xgboost extreme gradient boosting R package version 04-2 (2015) 1ndash4

[14] Wenbin Chen Kun Fu Jiawei Zuo Xinwei Zheng Tinglei Huang and WenjuanRen 2017 Radar emitter classification for large data set based on weighted-xgboost IET Radar Sonar amp Navigation 11 8 (2017) 1203ndash1207

[15] Morakot Choetkiertikul Hoa Khanh Dam Truyen Tran and Aditya Ghose 2015Predicting delays in software projects using networked classification (t) In 201530th IEEEACM International Conference on Automated Software Engineering (ASE)IEEE 353ndash364

[16] Morakot Choetkiertikul Hoa Khanh Dam Truyen Tran Aditya Ghose and JohnGrundy 2017 Predicting delivery capability in iterative software developmentIEEE Transactions on Software Engineering 44 6 (2017) 551ndash573

[17] Franccedilois Chollet 2017 Keras[18] Daniel Alencar Da Costa Shane McIntosh Uiraacute Kulesza Ahmed E Hassan and

Surafel Lemma Abebe 2018 An empirical study of the integration time of fixedissues Empirical Software Engineering 23 1 (2018) 334ndash383

[19] Daniel Alencar da Costa Shane McIntosh Christoph Treude Uiraacute Kulesza andAhmed E Hassan 2018 The impact of rapid release cycles on the integrationdelay of fixed issues Empirical Software Engineering 23 2 (2018) 835ndash904

[20] Piew Datta Brij Masand Deepak R Mani and Bin Li 2000 Automated cellularmodeling and prediction on a large scale Artificial Intelligence Review 14 6 (2000)485ndash502

[21] Kushal S Dave Vishal Vaingankar Sumanth Kolar and Vasudeva Varma 2013Timespent based models for predicting user retention In Proceedings of the 22ndinternational conference on World Wide Web ACM 331ndash342

[22] William H DeLone and Ephraim R McLean 1992 Information systems successThe quest for the dependent variable Information systems research 3 1 (1992)60ndash95

[23] William H DeLone and Ephraim R McLean 2003 Model of information systemssuccess a ten years update Journal of Management (2003)

[24] Gideon Dror Dan Pelleg Oleg Rokhlenko and Idan Szpektor 2012 Churnprediction in new users of Yahoo answers In Proceedings of the 21st InternationalConference on World Wide Web ACM 829ndash834

[25] Khaled El Emam 2005 The ROI from software quality Auerbach Publications[26] Katherine Ellis Jacqueline Kerr Suneeta Godbole Gert Lanckriet David Wing

and Simon Marshall 2014 A random forest classifier for the prediction of energy

expenditure and type of physical activity from wrist and hip accelerometersPhysiological measurement 35 11 (2014) 2191

[27] David Enke and Suraphan Thawornwong 2005 The use of datamining and neuralnetworks for forecasting stock market returns Expert Systems with applications29 4 (2005) 927ndash940

[28] Iris Figalist Christoph Elsner Jan Bosch and Helena Holmstroumlm Olsson 2019Customer churn prediction in B2B contexts In International Conference on Soft-ware Business Springer 378ndash386

[29] Emanuel Giger Martin Pinzger and Harald Gall 2010 Predicting the fix timeof bugs In Proceedings of the 2nd International Workshop on RecommendationSystems for Software Engineering ACM 52ndash56

[30] Frank E Harrell Jr 2015 Regression modeling strategies with applications to linearmodels logistic and ordinal regression and survival analysis Springer

[31] Steffen Hedegaard and Jakob Grue Simonsen 2013 Extracting usability and userexperience information from online user reviews In Proceedings of the SIGCHIConference on Human Factors in Computing Systems ACM 2089ndash2098

[32] Yujuan Jiang Bram Adams and Daniel M German 2013 Will my patch make itand how fast Case study on the linux kernel In Proceedings of the 10th WorkingConference on Mining Software Repositories IEEE Press 101ndash110

[33] Guolin Ke Jia Zhang Zhenhui Xu Jiang Bian and Tie-Yan Liu 2019 TabNNa universal neural network solution for tabular data httpsopenreviewnetforumid=r1eJssCqY7

[34] Foutse Khomh Brian Chan Ying Zou and Ahmed E Hassan 2011 An entropyevaluation approach for triaging field crashes A case study of mozilla firefox InReverse Engineering (WCRE) 2011 18th Working Conference on IEEE 261ndash270

[35] Andy Liaw and Matthew Wiener 2002 Classification and regression by randomforest R news 2 3 (2002) 18ndash22

[36] Erik Linstead Paul Rigor Sushil Bajracharya Cristina Lopes and Pierre Baldi2007 Mining eclipse developer contributions via author-topic models In FourthInternational Workshop on Mining Software Repositories (MSRrsquo07 ICSE Workshops2007) IEEE 30ndash30

[37] Xi Long Wenjing Yin Le An Haiying Ni Lixian Huang Qi Luo and Yan Chen2012 Churn analysis of online social network users using data mining techniquesIn Proceedings of the international MultiConference of Engineers and ConputerScientists Vol 1

[38] James MacQueen 1967 Some methods for classification and analysis of multivari-ate observations In Proceedings of the fifth Berkeley symposium on mathematicalstatistics and probability Vol 1 Oakland CA USA 281ndash297

[39] Lionel Marks Ying Zou and Ahmed E Hassan 2011 Studying the fix-timefor bugs in large open source projects In Proceedings of the 7th InternationalConference on Predictive Models in Software Engineering ACM 11

[40] Andrew Kachites McCallum 2002 Mallet A machine learning for languagetoolkit (2002)

[41] Miloš Milošević Nenad Živić and Igor Andjelković 2017 Early churn predic-tion with personalized targeting in mobile social games Expert Systems withApplications 83 (2017) 326ndash332

[42] Audris Mockus Roy T Fielding and James Herbsleb 2000 A case study ofopen source software development the Apache server In Proceedings of the 22ndinternational conference on Software engineering Acm 263ndash272

[43] Gustavo Percio Zimmermann Montesdioca and Antocircnio Carlos Gastaud Maccedilada2015 Measuring user satisfaction with information security practices Computersamp Security 48 (2015) 267ndash280

[44] Leann Myers and Maria J Sirois 2004 Spearman correlation coefficients differ-ences between Encyclopedia of statistical sciences 12 (2004)

[45] Didrik Nielsen 2016 Tree boosting with XGBoost-why does XGBoost win EveryMachine Learning Competition Masterrsquos thesis NTNU

[46] Ehsan Noei Daniel Alencar Da Costa and Ying Zou 2018 Winning the appproduction rally In Proceedings of the 2018 26th ACM Joint Meeting on EuropeanSoftware Engineering Conference and Symposium on the Foundations of SoftwareEngineering ACM 283ndash294

[47] Ehsan Noei Feng Zhang Shaohua Wang and Ying Zou 2019 Towards pri-oritizing user-related issue reports of mobile applications Empirical SoftwareEngineering 24 4 (2019) 1964ndash1996

[48] Ehsan Noei Feng Zhang and Ying Zou 2019 Too Many User-Reviews WhatShould App Developers Look at First IEEE Transactions on Software Engineering(2019)

[49] Derek Orsquocallaghan Derek Greene Joe Carthy and Paacutedraig Cunningham 2015An analysis of the coherence of descriptors in topic modeling Expert Systemswith Applications 42 13 (2015) 5645ndash5657

[50] Ya A Pachepsky Dennis Timlin and GY Varallyay 1996 Artificial neural net-works to estimate soil water retention from easily measurable data Soil ScienceSociety of America Journal 60 3 (1996) 727ndash733

[51] Mahesh Pal 2005 Random forest classifier for remote sensing classificationInternational Journal of Remote Sensing 26 1 (2005) 217ndash222

[52] Sebastiano Panichella Andrea Di Sorbo Emitza Guzman Corrado A VisaggioGerardo Canfora and Harald C Gall 2015 How can i improve my app classifyinguser reviews for software maintenance and evolution In Software maintenanceand evolution (ICSME) 2015 IEEE international conference on IEEE 281ndash290

11

1277

1278

1279

1280

1281

1282

1283

1284

1285

1286

1287

1288

1289

1290

1291

1292

1293

1294

1295

1296

1297

1298

1299

1300

1301

1302

1303

1304

1305

1306

1307

1308

1309

1310

1311

1312

1313

1314

1315

1316

1317

1318

1319

1320

1321

1322

1323

1324

1325

1326

1327

1328

1329

1330

1331

1332

1333

1334

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

1335

1336

1337

1338

1339

1340

1341

1342

1343

1344

1345

1346

1347

1348

1349

1350

1351

1352

1353

1354

1355

1356

1357

1358

1359

1360

1361

1362

1363

1364

1365

1366

1367

1368

1369

1370

1371

1372

1373

1374

1375

1376

1377

1378

1379

1380

1381

1382

1383

1384

1385

1386

1387

1388

1389

1390

1391

1392

[53] Chitra Phadke Huseyin Uzunalioglu Veena B Mendiratta Dan Kushnir andDerek Doran 2013 Prediction of subscriber churn using social network analysisBell Labs Technical Journal 17 4 (2013) 63ndash76

[54] Peter C Rigby Daniel M German and Margaret-Anne Storey 2008 Open sourcesoftware peer review practices a case study of the apache server In Proceedingsof the 30th international conference on Software engineering ACM 541ndash550

[55] Martin P Robillard and Robert Deline 2011 A field study of API learning obstaclesEmpirical Software Engineering 16 6 (2011) 703ndash732

[56] Michael Roumlder Andreas Both and Alexander Hinneburg 2015 Exploring thespace of topic coherence measures In Proceedings of the eighth ACM internationalconference on Web search and data mining ACM 399ndash408

[57] Victor Francisco Rodriguez-Galiano Bardan Ghimire John Rogan Mario Chica-Olmo and Juan Pedro Rigol-Sanchez 2012 An assessment of the effectivenessof a random forest classifier for land-cover classification ISPRS Journal of Pho-togrammetry and Remote Sensing 67 (2012) 93ndash104

[58] Hiroaki Sakoe and Seibi Chiba 1978 Dynamic programming algorithm opti-mization for spoken word recognition IEEE transactions on acoustics speech andsignal processing 26 1 (1978) 43ndash49

[59] Stan Salvador and Philip Chan 2007 Toward accurate dynamic time warping inlinear time and space Intelligent Data Analysis 11 5 (2007) 561ndash580

[60] Peter B Seddon 1997 A respecification and extension of the DeLone and McLeanmodel of IS success Information systems research 8 3 (1997) 240ndash253

[61] Rudy Setiono Bart Baesens and Christophe Mues 2008 Recursive neural net-work rule extraction for data with mixed attributes IEEE Transactions on NeuralNetworks 19 2 (2008) 299ndash307

[62] Dong-Hee Shin 2015 Effect of the customer experience on satisfaction withsmartphones Assessing smart satisfaction index with partial least squaresTelecommunications Policy 39 8 (2015) 627ndash641

[63] Tom De Smedt and Walter Daelemans 2012 Pattern for python Journal ofMachine Learning Research 13 Jun (2012) 2063ndash2067

[64] Daniel Svozil Vladimir Kvasnicka and Jiri Pospichal 1997 Introduction to multi-layer feed-forward neural networks Chemometrics and intelligent laboratorysystems 39 1 (1997) 43ndash62

[65] Shaheen Syed and Marco Spruit 2017 Full-text or abstract Examining topiccoherence scores using latent dirichlet allocation In 2017 IEEE Internationalconference on data science and advanced analytics (DSAA) IEEE 165ndash174

[66] Chakkrit Tantithamthavorn Shane McIntosh Ahmed E Hassan and KenichiMatsumoto 2016 Automated parameter optimization of classification techniquesfor defect prediction models In 2016 IEEEACM 38th International Conference onSoftware Engineering (ICSE) IEEE 321ndash332

[67] Chakkrit Tantithamthavorn Shane McIntosh Ahmed E Hassan and KenichiMatsumoto 2018 The impact of automated parameter optimization on defectprediction models IEEE Transactions on Software Engineering (2018)

[68] Romain Tavenard 2017 tslearn A machine learning toolkit dedicated to time-series data httpsgithubcomrtavenartslearn

[69] L Torlay Marcela Perrone-Bertolotti Elizabeth Thomas and Monica Baciu 2017Machine learningndashXGBoost analysis of language networks to classify patientswith epilepsy Brain informatics 4 3 (2017) 159

[70] Geoffrey G Towell and Jude W Shavlik 1993 Extracting refined rules fromknowledge-based neural networks Machine learning 13 1 (1993) 71ndash101

[71] Irfan Ullah Basit Raza Ahmad Kamran Malik Muhammad Imran Saif Ul Islamand SungWon Kim 2019 A churn prediction model using random forest analysisof machine learning techniques for churn prediction and factor identification intelecom sector IEEE Access 7 (2019) 60134ndash60149

[72] Lorenzo Villarroel Gabriele Bavota Barbara Russo Rocco Oliveto and Massimil-iano Di Penta 2016 Release planning of mobile apps based on user reviews InProceedings of the 38th International Conference on Software Engineering ACM14ndash24

[73] Chih-Ping Wei and I-Tang Chiu 2002 Turning telecommunications call detailsto churn prediction a data mining approach Expert systems with applications 232 (2002) 103ndash112

[74] Cathrin Weiss Rahul Premraj Thomas Zimmermann and Andreas Zeller 2007How long will it take to fix this bug In Fourth International Workshop on MiningSoftware Repositories (MSRrsquo07 ICSE Workshops 2007) IEEE 1ndash1

[75] Ian H Witten Eibe Frank and A Mark 2016 Hall and Christopher J Pal DataMining Practical machine learning tools and techniques (2016)

[76] Barbara H Wixom and Peter A Todd 2005 A theoretical integration of usersatisfaction and technology acceptance Information systems research 16 1 (2005)85ndash102

[77] Jae-Chern Yoo and Tae Hee Han 2009 Fast normalized cross-correlation Circuitssystems and signal processing 28 6 (2009) 819

12

  • Abstract
  • 1 Introduction
  • 2 Motivating Example
  • 3 Data Collection
  • 4 Research Questions amp Results
  • 5 Threats To Validity
  • 6 Related Work
  • 7 Conclusion
  • References
Page 9: On the Relationship between User Churn and Software Issues · relationship between the issues that are present in a software prod-uct and user churn. For this purpose, we investigate

929

930

931

932

933

934

935

936

937

938

939

940

941

942

943

944

945

946

947

948

949

950

951

952

953

954

955

956

957

958

959

960

961

962

963

964

965

966

967

968

969

970

971

972

973

974

975

976

977

978

979

980

981

982

983

984

985

986

On the Relationship between User Churn and Software Issues MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea

987

988

989

990

991

992

993

994

995

996

997

998

999

1000

1001

1002

1003

1004

1005

1006

1007

1008

1009

1010

1011

1012

1013

1014

1015

1016

1017

1018

1019

1020

1021

1022

1023

1024

1025

1026

1027

1028

1029

1030

1031

1032

1033

1034

1035

1036

1037

1038

1039

1040

1041

1042

1043

1044

sets one containing 80 of the data ie the training set and an-other containing 20 ie the validation set In the first iterationof our models we use 80 of the data to train our models and weuse the first element of the validation set to test out models Thenthe leave-one-out validation iteratively increases the training setsize by one element (from the validation set) and predicts the nextelement of the validation set This process is repeated until thetraining set reaches the size of 119873 minus 1 (where 119873 is the size of ourdata set) Finally our obtained AUC values are the aggregation(ie median) of all the AUC values obtained in the leave-one-outiterationsFindingsTheAUC values obtained for our threemodels are shownin our online appendix8 The XGboost model performs the best inthe three studied projects obtaining AUC values that exceed 083There exists a subtle difference between the AUC values of theXGboost model and the Feed Forward Neural Networks AUCssince both the XGboost model and the Neural Networks can graspthe complex relationships in our issue reports dataSummary Our results suggest that machine learning models suchas XGboost and Feed Forward Neural Networks can predict therate of user churn with relatively high accuracies (in terms of AUC)Such predictions could help in the prioritization of issue reports

RQ3 What are the most important factorsrelated to potential user churnMotivation In RQ2 we observe that machine learning modelscan be used to predict the rate of potential usersrsquochurn Howeverdevelopers would hardly blindly trust a machine learning modelto help on their decisions (and they should not do so) Developerswould mostly benefit from understanding the reasons behind thepredictions of our machine learning models so that they couldpossibly adapt (or not) their development process Therefore inRQ3 we investigate the most important features in the predictionsof our machine learning modelApproach In this RQ we study the most important features in thepredictions of the XGboost models since they obtain the best AUCvalues in our three studied projects

To find the most important features we adopt a simple featureextraction algorithm [61 70] Consider that our models are trainedon a feature set 119883 = 1199091 1199092 119909119899 (as explained in RQ2) The featureextraction process consists of iteratively (i) removing each feature 119909119894from our models and (ii) computing the AUC without each feature119909119894 (by using the same leave-one-out validation process explainedin RQ2) The AUC values from the models without a feature 119909119894 arecompared to the models containing all the features (ie we takethe difference 119860119880119862119900119903119894119892119894119899119886119897 minus119860119880119862119903119890119898119900119907119890119889 ) The higher the drop inthe AUC value caused by removing a certain feature 119909119894 the higherthe importance of such a feature 119909119894 [4 9 10]Findings Table 7 shows our obtained results after performing thefeature extraction process The table shows the drop in the AUC foreach feature (ie ΔAUC) We also show the minimum maximummean and median values of the features in the 119897119900119908 user churn rate(ie 0) and ℎ119894119892ℎ user churn rate (ie 1) prediction classes

8httpsdoiorg105281zenodo3610584

The Issue Alive Time is the most important feature obtaining thehighest ΔAUC in our three studied projects This result suggeststhat it is unhealthy to leave issues hanging for a long time as theycan be perceived as a reason for users to churn

The Number of Comments obtains a considerable ΔAUC in ourthree studied systems The association between the Number ofComments and the rate of user churnmay occur due to themultitudenumber of users being affected by an issue

The Time-since-last-change is also an important feature in theFirefox and Eclipse projects This result suggests that issues withrecurring changes or modifications may signal a certain instabilityin the fixing process which might cause a potential user churn inthe future

It is worthy noting that features such as Severity Priority andType of Bugs do not obtain a high importance in the predictionswhich might suggest that the existing prioritization fields used inissue reports are misaligned with the potential user churnSummaryOur results suggest that long lived issues can potentiallylead to more potential userrsquos churn and that the existing processesfor prioritizing issues should be augmented to capture the highlyinteractive and long lived issues

5 THREATS TO VALIDITYConstruct Validity The main construct validity of our study isrelated to the assumption of a potential user churn The alterna-tivetonet website does not record exactly when a user has chosen asoftware alternative over another software Instead alternativetonetrecords that a user thinks that a software may be a good alternativeto another software The motivation behind the act of signaling analternative is not known by us For example there might be usersthat are simply driven by the passion of contributing to the alterna-tivetonet community which lead them to provide several opinionsas to which software could make good alternatives to others Wehave deliberately adopted the term potential user churn instead ofsimply user churn due to this unknown motivations behind signal-ing alternatives Moreover it is reasonable to think that if a givensoftware has received a considerable amount of recommendationfor alternatives this may likely indeed represent that users areconsidering to change the software

Internal Validity Our internal threats to validity are mainlyrelated to (i) our chosen features and (ii) our prediction classes Wesplit the peaks of potential user churn into the low and high classesOther research may find different results if the potential user churnis modeled in different way In addition we acknowledge that ourset of metrics is not exhaustive For example we have not studiedwhether the presence of a stack-trace in an issue report can berelated to future potential user churn We plan to (i) model thepotential user churn as a continuous variable and (ii) extend our setof features to predict the potential user churn in future research

External Validity Our study targeted three domains that areweb servers IDEs and web browsers These three domains werechosen due to their availability since we could fetch their issuereports data Although our study showed common factors in ouranalyses we cannot generalize our results to other domains orprojects with a different scale (ie smaller projects) Regardless ourwork provides insights of the possible reasons behind user churn

9

1045

1046

1047

1048

1049

1050

1051

1052

1053

1054

1055

1056

1057

1058

1059

1060

1061

1062

1063

1064

1065

1066

1067

1068

1069

1070

1071

1072

1073

1074

1075

1076

1077

1078

1079

1080

1081

1082

1083

1084

1085

1086

1087

1088

1089

1090

1091

1092

1093

1094

1095

1096

1097

1098

1099

1100

1101

1102

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

1103

1104

1105

1106

1107

1108

1109

1110

1111

1112

1113

1114

1115

1116

1117

1118

1119

1120

1121

1122

1123

1124

1125

1126

1127

1128

1129

1130

1131

1132

1133

1134

1135

1136

1137

1138

1139

1140

1141

1142

1143

1144

1145

1146

1147

1148

1149

1150

1151

1152

1153

1154

1155

1156

1157

1158

1159

1160

Table 7 The most important features for Firefox Eclipse and Apache ranked by their drop in AUC

Factor ΔAUC Min 0 Max 0 Mean 0 Median 0 Min 1 Max 1 Mean 1 Median 1

FirefoxTime Alive 013 0 120 1102 13 10 1487 4872 63OS All 007 0 1 014 0 0 1 083 1Number of Comments 006 1 65 894 12 1 458 2811 45Time Since Last Change 006 9 30 1090 22 0 5 410 3Has a Target Milestone 005 0 1 022 0 0 1 079 1Has a Version Number 004 0 1 033 0 0 1 064 1Component UI Web Payments 003 0 1 042 0 0 1 061 1

EclipseTime Alive 012 0 103 901 12 8 1334 4396 49Has a Target Milestone 006 0 1 019 0 0 1 082 1Time since last change 006 7 27 851 19 0 6 312 3Has a Version Number 005 0 1 034 0 0 1 077 1Component Debug 004 0 1 047 0 0 1 072 1Number of Comments 003 1 72 1092 11 1 632 2701 72OS All 002 0 1 050 0 0 1 068 1

ApacheTime Alive 014 0 136 1132 14 14 1521 5310 70Component Utils 010 0 1 022 0 0 1 091 1Has a Version Number 009 0 1 038 0 0 1 086 1Number of Comments 005 1 92 1232 6 1 352 4207 79OS All 004 0 1 033 0 0 1 079 1Status REOPENED 003 0 1 040 0 0 1 073 1Hardware All 002 0 1 051 0 0 1 078 1

6 RELATEDWORKGiven that we study the relationship between potential user churnand issue reports our research is related to the areas of usercustomerchurn and software quality Therefore we survey the related re-search around these two areas in this section

UserCustomerChurnUser or customer churn has beenwidelystudied in the area of telecommunication services [3 28 71] Aminet al [3] proposed a cross-company machine learning model topredict customer churn The authors also propose the use of datatransformation (eg by using the log function) to improve the train-ing data Ullah et al [71] used feature selection techniques such asInformation Gain and Correlation Attributes Ranking Filter to selectwhich features better explained the churn behaviour of differentcustomer groups The authors observed that a Random Forest modelperformed the best in their study While the Business to Customers(B2C) churn has been widely studied Figalist et al [28] recentlyinvestigated the Business to Business (B2B) churn ie when thecustomer business changes shifts to the services provided by thecompetition Figalist et al [28] allude that although B2B relation-ships tend to be more stable they have a much bigger financial im-pact when they change In terms of software engineering there hasbeen a lack of studies that have investigated usercostumer churndirectly Instead much research has been invested on customerreviews regarding software products [6 46 47] Noei et al [46]studied which features from mobile apps are associated with betterranks in the Google Play store While the work by Noei et al [46]does not study the customer churn directly striving for better ranksin Google Play may avoid user churn in mobile apps Bavota etal [6] investigated the relationship between the usage of fault- andchange-prone APIs in Android apps and user reviews Differentlyfrom all the aforementioned work our study investigates the userchurn is software products

Software Quality Although there exist a lack of studies ana-lyzing the user churn of software productsservices there has beena considerable amount of research on software quality [5 34 39 6667 74] Research on software quality is important since a quality

software is more stable and less defect-prone Ultimately such char-acteristics are tightly related to the observations of our study(iesoftware issues are correlated with user churn) A prominent areaof software quality research in software engineering is the defectprediction area [66 67] Tantithamthavorn et al [66] studied howmuch improvement can be obtained by automatically optimizingdefect prediction models Their research shows that automatic op-timization can yield significant or insignificant performance gainsdepending on the machine learning algorithm Other considerableamount of research has been invested on the triaging part of issuereports [5] and on the effort estimation of an issue report (in termsof time to address an issue) [39 74] In this paper we complementprior research by studying the relationship shared between issuereports (eg defects or required enhancements) with the potentialuser churn of users

7 CONCLUSIONIn this work we study the data available on alternativetonet tobetter understand the relationship between software issues and thepotential user churn of users Having observed that user concernsare tightly related to software issues (eg bugs) we investigatethe relationship between issue reports and the potential user churnof users Our study reveals key issues that must be addressed forthe success of a software product (depending on the domain) Forexample we observe that the potential user churn of users may betightly related to the lack of a robust documentation and support fortesting tools (in the ldquoIDErdquo and ldquoWeb Serverrdquo domains) Finally ourmachine learning models reveal that (i) the longer the issue takes tobe fixed the higher the chances of user churn and (ii) issues withinmore general software components are more likely to be associatedwith user churn Finally we suggest that the current prioritizationperformed by developers should be augmented to encompass thelong lived and highly interactive issues In overall our study sug-gests that the prioritization process of issues can be improved byconsidering the potential user churn of users associated with suchissues

10

1161

1162

1163

1164

1165

1166

1167

1168

1169

1170

1171

1172

1173

1174

1175

1176

1177

1178

1179

1180

1181

1182

1183

1184

1185

1186

1187

1188

1189

1190

1191

1192

1193

1194

1195

1196

1197

1198

1199

1200

1201

1202

1203

1204

1205

1206

1207

1208

1209

1210

1211

1212

1213

1214

1215

1216

1217

1218

On the Relationship between User Churn and Software Issues MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea

1219

1220

1221

1222

1223

1224

1225

1226

1227

1228

1229

1230

1231

1232

1233

1234

1235

1236

1237

1238

1239

1240

1241

1242

1243

1244

1245

1246

1247

1248

1249

1250

1251

1252

1253

1254

1255

1256

1257

1258

1259

1260

1261

1262

1263

1264

1265

1266

1267

1268

1269

1270

1271

1272

1273

1274

1275

1276

REFERENCES[1] Martiacuten Abadi Paul Barham Jianmin Chen Zhifeng Chen Andy Davis Jeffrey

Dean Matthieu Devin Sanjay Ghemawat Geoffrey Irving Michael Isard Man-junath Kudlur Josh Levenberg Rajat Monga Sherry Moore Derek G MurrayBenoit Steiner Paul Tucker Vijay Vasudevan Pete Warden Martin Wicke YuanYu and Xiaoqiang Zheng 2016 Tensorflow A system for large-scale machinelearning In 12th USENIX Symposium on Operating Systems Design and Imple-mentation (OSDI 16) 265ndash283

[2] W Abdelmoez Mohamed Kholief and Fayrouz M Elsalmy 2012 Bug fix-time pre-diction model using naiumlve bayes classifier In 2012 22nd International Conferenceon Computer Theory and Applications (ICCTA) IEEE 167ndash172

[3] Adnan Amin Babar Shah Asad Masood Khattak Thar Baker and Sajid Anwar2018 Just-in-time customer churn prediction with and without data transforma-tion In 2018 IEEE Congress on Evolutionary Computation (CEC) IEEE 1ndash6

[4] Hafeez Ullah Amin Aamir Saeed Malik Rana Fayyaz Ahmad Nasreen BadruddinNidal Kamel Muhammad Hussain and Weng-Tink Chooi 2015 Feature extrac-tion and classification for EEG signals using wavelet transform and machinelearning techniques Australasian physical amp engineering sciences in medicine 381 (2015) 139ndash149

[5] John Anvik Lyndon Hiew and Gail C Murphy 2006 Who should fix this bugIn Proceedings of the 28th international conference on Software engineering ACM361ndash370

[6] Gabriele Bavota Mario Linares-Vasquez Carlos Eduardo Bernal-Cardenas Mas-similiano Di Penta Rocco Oliveto and Denys Poshyvanyk 2014 The impactof api change-and fault-proneness on the user ratings of android apps IEEETransactions on Software Engineering 41 4 (2014) 384ndash407

[7] Pamela Bhattacharya and Iulian Neamtiu 2011 Bug-fix time prediction modelscan we do better In Proceedings of the 8thWorking Conference on Mining SoftwareRepositories ACM 207ndash210

[8] Steven Bird and Edward Loper 2004 NLTK the natural language toolkit InProceedings of the ACL 2004 on Interactive poster and demonstration sessionsAssociation for Computational Linguistics 31

[9] Andrew P Bradley 1997 The use of the area under the ROC curve in theevaluation of machine learning algorithms Pattern recognition 30 7 (1997)1145ndash1159

[10] Leo Breiman 2001 Random forests Machine learning 45 1 (2001) 5ndash32[11] Albert Caruana and Michael T Ewing 2010 How corporate reputation quality

and value influence online loyalty Journal of Business Research 63 9-10 (2010)1103ndash1110

[12] Tianqi Chen and Carlos Guestrin 2016 Xgboost A scalable tree boosting systemIn Proceedings of the 22nd acm sigkdd international conference on knowledgediscovery and data mining ACM 785ndash794

[13] Tianqi Chen Tong He Michael Benesty Vadim Khotilovich and Yuan Tang 2015Xgboost extreme gradient boosting R package version 04-2 (2015) 1ndash4

[14] Wenbin Chen Kun Fu Jiawei Zuo Xinwei Zheng Tinglei Huang and WenjuanRen 2017 Radar emitter classification for large data set based on weighted-xgboost IET Radar Sonar amp Navigation 11 8 (2017) 1203ndash1207

[15] Morakot Choetkiertikul Hoa Khanh Dam Truyen Tran and Aditya Ghose 2015Predicting delays in software projects using networked classification (t) In 201530th IEEEACM International Conference on Automated Software Engineering (ASE)IEEE 353ndash364

[16] Morakot Choetkiertikul Hoa Khanh Dam Truyen Tran Aditya Ghose and JohnGrundy 2017 Predicting delivery capability in iterative software developmentIEEE Transactions on Software Engineering 44 6 (2017) 551ndash573

[17] Franccedilois Chollet 2017 Keras[18] Daniel Alencar Da Costa Shane McIntosh Uiraacute Kulesza Ahmed E Hassan and

Surafel Lemma Abebe 2018 An empirical study of the integration time of fixedissues Empirical Software Engineering 23 1 (2018) 334ndash383

[19] Daniel Alencar da Costa Shane McIntosh Christoph Treude Uiraacute Kulesza andAhmed E Hassan 2018 The impact of rapid release cycles on the integrationdelay of fixed issues Empirical Software Engineering 23 2 (2018) 835ndash904

[20] Piew Datta Brij Masand Deepak R Mani and Bin Li 2000 Automated cellularmodeling and prediction on a large scale Artificial Intelligence Review 14 6 (2000)485ndash502

[21] Kushal S Dave Vishal Vaingankar Sumanth Kolar and Vasudeva Varma 2013Timespent based models for predicting user retention In Proceedings of the 22ndinternational conference on World Wide Web ACM 331ndash342

[22] William H DeLone and Ephraim R McLean 1992 Information systems successThe quest for the dependent variable Information systems research 3 1 (1992)60ndash95

[23] William H DeLone and Ephraim R McLean 2003 Model of information systemssuccess a ten years update Journal of Management (2003)

[24] Gideon Dror Dan Pelleg Oleg Rokhlenko and Idan Szpektor 2012 Churnprediction in new users of Yahoo answers In Proceedings of the 21st InternationalConference on World Wide Web ACM 829ndash834

[25] Khaled El Emam 2005 The ROI from software quality Auerbach Publications[26] Katherine Ellis Jacqueline Kerr Suneeta Godbole Gert Lanckriet David Wing

and Simon Marshall 2014 A random forest classifier for the prediction of energy

expenditure and type of physical activity from wrist and hip accelerometersPhysiological measurement 35 11 (2014) 2191

[27] David Enke and Suraphan Thawornwong 2005 The use of datamining and neuralnetworks for forecasting stock market returns Expert Systems with applications29 4 (2005) 927ndash940

[28] Iris Figalist Christoph Elsner Jan Bosch and Helena Holmstroumlm Olsson 2019Customer churn prediction in B2B contexts In International Conference on Soft-ware Business Springer 378ndash386

[29] Emanuel Giger Martin Pinzger and Harald Gall 2010 Predicting the fix timeof bugs In Proceedings of the 2nd International Workshop on RecommendationSystems for Software Engineering ACM 52ndash56

[30] Frank E Harrell Jr 2015 Regression modeling strategies with applications to linearmodels logistic and ordinal regression and survival analysis Springer

[31] Steffen Hedegaard and Jakob Grue Simonsen 2013 Extracting usability and userexperience information from online user reviews In Proceedings of the SIGCHIConference on Human Factors in Computing Systems ACM 2089ndash2098

[32] Yujuan Jiang Bram Adams and Daniel M German 2013 Will my patch make itand how fast Case study on the linux kernel In Proceedings of the 10th WorkingConference on Mining Software Repositories IEEE Press 101ndash110

[33] Guolin Ke Jia Zhang Zhenhui Xu Jiang Bian and Tie-Yan Liu 2019 TabNNa universal neural network solution for tabular data httpsopenreviewnetforumid=r1eJssCqY7

[34] Foutse Khomh Brian Chan Ying Zou and Ahmed E Hassan 2011 An entropyevaluation approach for triaging field crashes A case study of mozilla firefox InReverse Engineering (WCRE) 2011 18th Working Conference on IEEE 261ndash270

[35] Andy Liaw and Matthew Wiener 2002 Classification and regression by randomforest R news 2 3 (2002) 18ndash22

[36] Erik Linstead Paul Rigor Sushil Bajracharya Cristina Lopes and Pierre Baldi2007 Mining eclipse developer contributions via author-topic models In FourthInternational Workshop on Mining Software Repositories (MSRrsquo07 ICSE Workshops2007) IEEE 30ndash30

[37] Xi Long Wenjing Yin Le An Haiying Ni Lixian Huang Qi Luo and Yan Chen2012 Churn analysis of online social network users using data mining techniquesIn Proceedings of the international MultiConference of Engineers and ConputerScientists Vol 1

[38] James MacQueen 1967 Some methods for classification and analysis of multivari-ate observations In Proceedings of the fifth Berkeley symposium on mathematicalstatistics and probability Vol 1 Oakland CA USA 281ndash297

[39] Lionel Marks Ying Zou and Ahmed E Hassan 2011 Studying the fix-timefor bugs in large open source projects In Proceedings of the 7th InternationalConference on Predictive Models in Software Engineering ACM 11

[40] Andrew Kachites McCallum 2002 Mallet A machine learning for languagetoolkit (2002)

[41] Miloš Milošević Nenad Živić and Igor Andjelković 2017 Early churn predic-tion with personalized targeting in mobile social games Expert Systems withApplications 83 (2017) 326ndash332

[42] Audris Mockus Roy T Fielding and James Herbsleb 2000 A case study ofopen source software development the Apache server In Proceedings of the 22ndinternational conference on Software engineering Acm 263ndash272

[43] Gustavo Percio Zimmermann Montesdioca and Antocircnio Carlos Gastaud Maccedilada2015 Measuring user satisfaction with information security practices Computersamp Security 48 (2015) 267ndash280

[44] Leann Myers and Maria J Sirois 2004 Spearman correlation coefficients differ-ences between Encyclopedia of statistical sciences 12 (2004)

[45] Didrik Nielsen 2016 Tree boosting with XGBoost-why does XGBoost win EveryMachine Learning Competition Masterrsquos thesis NTNU

[46] Ehsan Noei Daniel Alencar Da Costa and Ying Zou 2018 Winning the appproduction rally In Proceedings of the 2018 26th ACM Joint Meeting on EuropeanSoftware Engineering Conference and Symposium on the Foundations of SoftwareEngineering ACM 283ndash294

[47] Ehsan Noei Feng Zhang Shaohua Wang and Ying Zou 2019 Towards pri-oritizing user-related issue reports of mobile applications Empirical SoftwareEngineering 24 4 (2019) 1964ndash1996

[48] Ehsan Noei Feng Zhang and Ying Zou 2019 Too Many User-Reviews WhatShould App Developers Look at First IEEE Transactions on Software Engineering(2019)

[49] Derek Orsquocallaghan Derek Greene Joe Carthy and Paacutedraig Cunningham 2015An analysis of the coherence of descriptors in topic modeling Expert Systemswith Applications 42 13 (2015) 5645ndash5657

[50] Ya A Pachepsky Dennis Timlin and GY Varallyay 1996 Artificial neural net-works to estimate soil water retention from easily measurable data Soil ScienceSociety of America Journal 60 3 (1996) 727ndash733

[51] Mahesh Pal 2005 Random forest classifier for remote sensing classificationInternational Journal of Remote Sensing 26 1 (2005) 217ndash222

[52] Sebastiano Panichella Andrea Di Sorbo Emitza Guzman Corrado A VisaggioGerardo Canfora and Harald C Gall 2015 How can i improve my app classifyinguser reviews for software maintenance and evolution In Software maintenanceand evolution (ICSME) 2015 IEEE international conference on IEEE 281ndash290

11

1277

1278

1279

1280

1281

1282

1283

1284

1285

1286

1287

1288

1289

1290

1291

1292

1293

1294

1295

1296

1297

1298

1299

1300

1301

1302

1303

1304

1305

1306

1307

1308

1309

1310

1311

1312

1313

1314

1315

1316

1317

1318

1319

1320

1321

1322

1323

1324

1325

1326

1327

1328

1329

1330

1331

1332

1333

1334

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

1335

1336

1337

1338

1339

1340

1341

1342

1343

1344

1345

1346

1347

1348

1349

1350

1351

1352

1353

1354

1355

1356

1357

1358

1359

1360

1361

1362

1363

1364

1365

1366

1367

1368

1369

1370

1371

1372

1373

1374

1375

1376

1377

1378

1379

1380

1381

1382

1383

1384

1385

1386

1387

1388

1389

1390

1391

1392

[53] Chitra Phadke Huseyin Uzunalioglu Veena B Mendiratta Dan Kushnir andDerek Doran 2013 Prediction of subscriber churn using social network analysisBell Labs Technical Journal 17 4 (2013) 63ndash76

[54] Peter C Rigby Daniel M German and Margaret-Anne Storey 2008 Open sourcesoftware peer review practices a case study of the apache server In Proceedingsof the 30th international conference on Software engineering ACM 541ndash550

[55] Martin P Robillard and Robert Deline 2011 A field study of API learning obstaclesEmpirical Software Engineering 16 6 (2011) 703ndash732

[56] Michael Roumlder Andreas Both and Alexander Hinneburg 2015 Exploring thespace of topic coherence measures In Proceedings of the eighth ACM internationalconference on Web search and data mining ACM 399ndash408

[57] Victor Francisco Rodriguez-Galiano Bardan Ghimire John Rogan Mario Chica-Olmo and Juan Pedro Rigol-Sanchez 2012 An assessment of the effectivenessof a random forest classifier for land-cover classification ISPRS Journal of Pho-togrammetry and Remote Sensing 67 (2012) 93ndash104

[58] Hiroaki Sakoe and Seibi Chiba 1978 Dynamic programming algorithm opti-mization for spoken word recognition IEEE transactions on acoustics speech andsignal processing 26 1 (1978) 43ndash49

[59] Stan Salvador and Philip Chan 2007 Toward accurate dynamic time warping inlinear time and space Intelligent Data Analysis 11 5 (2007) 561ndash580

[60] Peter B Seddon 1997 A respecification and extension of the DeLone and McLeanmodel of IS success Information systems research 8 3 (1997) 240ndash253

[61] Rudy Setiono Bart Baesens and Christophe Mues 2008 Recursive neural net-work rule extraction for data with mixed attributes IEEE Transactions on NeuralNetworks 19 2 (2008) 299ndash307

[62] Dong-Hee Shin 2015 Effect of the customer experience on satisfaction withsmartphones Assessing smart satisfaction index with partial least squaresTelecommunications Policy 39 8 (2015) 627ndash641

[63] Tom De Smedt and Walter Daelemans 2012 Pattern for python Journal ofMachine Learning Research 13 Jun (2012) 2063ndash2067

[64] Daniel Svozil Vladimir Kvasnicka and Jiri Pospichal 1997 Introduction to multi-layer feed-forward neural networks Chemometrics and intelligent laboratorysystems 39 1 (1997) 43ndash62

[65] Shaheen Syed and Marco Spruit 2017 Full-text or abstract Examining topiccoherence scores using latent dirichlet allocation In 2017 IEEE Internationalconference on data science and advanced analytics (DSAA) IEEE 165ndash174

[66] Chakkrit Tantithamthavorn Shane McIntosh Ahmed E Hassan and KenichiMatsumoto 2016 Automated parameter optimization of classification techniquesfor defect prediction models In 2016 IEEEACM 38th International Conference onSoftware Engineering (ICSE) IEEE 321ndash332

[67] Chakkrit Tantithamthavorn Shane McIntosh Ahmed E Hassan and KenichiMatsumoto 2018 The impact of automated parameter optimization on defectprediction models IEEE Transactions on Software Engineering (2018)

[68] Romain Tavenard 2017 tslearn A machine learning toolkit dedicated to time-series data httpsgithubcomrtavenartslearn

[69] L Torlay Marcela Perrone-Bertolotti Elizabeth Thomas and Monica Baciu 2017Machine learningndashXGBoost analysis of language networks to classify patientswith epilepsy Brain informatics 4 3 (2017) 159

[70] Geoffrey G Towell and Jude W Shavlik 1993 Extracting refined rules fromknowledge-based neural networks Machine learning 13 1 (1993) 71ndash101

[71] Irfan Ullah Basit Raza Ahmad Kamran Malik Muhammad Imran Saif Ul Islamand SungWon Kim 2019 A churn prediction model using random forest analysisof machine learning techniques for churn prediction and factor identification intelecom sector IEEE Access 7 (2019) 60134ndash60149

[72] Lorenzo Villarroel Gabriele Bavota Barbara Russo Rocco Oliveto and Massimil-iano Di Penta 2016 Release planning of mobile apps based on user reviews InProceedings of the 38th International Conference on Software Engineering ACM14ndash24

[73] Chih-Ping Wei and I-Tang Chiu 2002 Turning telecommunications call detailsto churn prediction a data mining approach Expert systems with applications 232 (2002) 103ndash112

[74] Cathrin Weiss Rahul Premraj Thomas Zimmermann and Andreas Zeller 2007How long will it take to fix this bug In Fourth International Workshop on MiningSoftware Repositories (MSRrsquo07 ICSE Workshops 2007) IEEE 1ndash1

[75] Ian H Witten Eibe Frank and A Mark 2016 Hall and Christopher J Pal DataMining Practical machine learning tools and techniques (2016)

[76] Barbara H Wixom and Peter A Todd 2005 A theoretical integration of usersatisfaction and technology acceptance Information systems research 16 1 (2005)85ndash102

[77] Jae-Chern Yoo and Tae Hee Han 2009 Fast normalized cross-correlation Circuitssystems and signal processing 28 6 (2009) 819

12

  • Abstract
  • 1 Introduction
  • 2 Motivating Example
  • 3 Data Collection
  • 4 Research Questions amp Results
  • 5 Threats To Validity
  • 6 Related Work
  • 7 Conclusion
  • References
Page 10: On the Relationship between User Churn and Software Issues · relationship between the issues that are present in a software prod-uct and user churn. For this purpose, we investigate

1045

1046

1047

1048

1049

1050

1051

1052

1053

1054

1055

1056

1057

1058

1059

1060

1061

1062

1063

1064

1065

1066

1067

1068

1069

1070

1071

1072

1073

1074

1075

1076

1077

1078

1079

1080

1081

1082

1083

1084

1085

1086

1087

1088

1089

1090

1091

1092

1093

1094

1095

1096

1097

1098

1099

1100

1101

1102

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

1103

1104

1105

1106

1107

1108

1109

1110

1111

1112

1113

1114

1115

1116

1117

1118

1119

1120

1121

1122

1123

1124

1125

1126

1127

1128

1129

1130

1131

1132

1133

1134

1135

1136

1137

1138

1139

1140

1141

1142

1143

1144

1145

1146

1147

1148

1149

1150

1151

1152

1153

1154

1155

1156

1157

1158

1159

1160

Table 7 The most important features for Firefox Eclipse and Apache ranked by their drop in AUC

Factor ΔAUC Min 0 Max 0 Mean 0 Median 0 Min 1 Max 1 Mean 1 Median 1

FirefoxTime Alive 013 0 120 1102 13 10 1487 4872 63OS All 007 0 1 014 0 0 1 083 1Number of Comments 006 1 65 894 12 1 458 2811 45Time Since Last Change 006 9 30 1090 22 0 5 410 3Has a Target Milestone 005 0 1 022 0 0 1 079 1Has a Version Number 004 0 1 033 0 0 1 064 1Component UI Web Payments 003 0 1 042 0 0 1 061 1

EclipseTime Alive 012 0 103 901 12 8 1334 4396 49Has a Target Milestone 006 0 1 019 0 0 1 082 1Time since last change 006 7 27 851 19 0 6 312 3Has a Version Number 005 0 1 034 0 0 1 077 1Component Debug 004 0 1 047 0 0 1 072 1Number of Comments 003 1 72 1092 11 1 632 2701 72OS All 002 0 1 050 0 0 1 068 1

ApacheTime Alive 014 0 136 1132 14 14 1521 5310 70Component Utils 010 0 1 022 0 0 1 091 1Has a Version Number 009 0 1 038 0 0 1 086 1Number of Comments 005 1 92 1232 6 1 352 4207 79OS All 004 0 1 033 0 0 1 079 1Status REOPENED 003 0 1 040 0 0 1 073 1Hardware All 002 0 1 051 0 0 1 078 1

6 RELATEDWORKGiven that we study the relationship between potential user churnand issue reports our research is related to the areas of usercustomerchurn and software quality Therefore we survey the related re-search around these two areas in this section

UserCustomerChurnUser or customer churn has beenwidelystudied in the area of telecommunication services [3 28 71] Aminet al [3] proposed a cross-company machine learning model topredict customer churn The authors also propose the use of datatransformation (eg by using the log function) to improve the train-ing data Ullah et al [71] used feature selection techniques such asInformation Gain and Correlation Attributes Ranking Filter to selectwhich features better explained the churn behaviour of differentcustomer groups The authors observed that a Random Forest modelperformed the best in their study While the Business to Customers(B2C) churn has been widely studied Figalist et al [28] recentlyinvestigated the Business to Business (B2B) churn ie when thecustomer business changes shifts to the services provided by thecompetition Figalist et al [28] allude that although B2B relation-ships tend to be more stable they have a much bigger financial im-pact when they change In terms of software engineering there hasbeen a lack of studies that have investigated usercostumer churndirectly Instead much research has been invested on customerreviews regarding software products [6 46 47] Noei et al [46]studied which features from mobile apps are associated with betterranks in the Google Play store While the work by Noei et al [46]does not study the customer churn directly striving for better ranksin Google Play may avoid user churn in mobile apps Bavota etal [6] investigated the relationship between the usage of fault- andchange-prone APIs in Android apps and user reviews Differentlyfrom all the aforementioned work our study investigates the userchurn is software products

Software Quality Although there exist a lack of studies ana-lyzing the user churn of software productsservices there has beena considerable amount of research on software quality [5 34 39 6667 74] Research on software quality is important since a quality

software is more stable and less defect-prone Ultimately such char-acteristics are tightly related to the observations of our study(iesoftware issues are correlated with user churn) A prominent areaof software quality research in software engineering is the defectprediction area [66 67] Tantithamthavorn et al [66] studied howmuch improvement can be obtained by automatically optimizingdefect prediction models Their research shows that automatic op-timization can yield significant or insignificant performance gainsdepending on the machine learning algorithm Other considerableamount of research has been invested on the triaging part of issuereports [5] and on the effort estimation of an issue report (in termsof time to address an issue) [39 74] In this paper we complementprior research by studying the relationship shared between issuereports (eg defects or required enhancements) with the potentialuser churn of users

7 CONCLUSIONIn this work we study the data available on alternativetonet tobetter understand the relationship between software issues and thepotential user churn of users Having observed that user concernsare tightly related to software issues (eg bugs) we investigatethe relationship between issue reports and the potential user churnof users Our study reveals key issues that must be addressed forthe success of a software product (depending on the domain) Forexample we observe that the potential user churn of users may betightly related to the lack of a robust documentation and support fortesting tools (in the ldquoIDErdquo and ldquoWeb Serverrdquo domains) Finally ourmachine learning models reveal that (i) the longer the issue takes tobe fixed the higher the chances of user churn and (ii) issues withinmore general software components are more likely to be associatedwith user churn Finally we suggest that the current prioritizationperformed by developers should be augmented to encompass thelong lived and highly interactive issues In overall our study sug-gests that the prioritization process of issues can be improved byconsidering the potential user churn of users associated with suchissues

10

1161

1162

1163

1164

1165

1166

1167

1168

1169

1170

1171

1172

1173

1174

1175

1176

1177

1178

1179

1180

1181

1182

1183

1184

1185

1186

1187

1188

1189

1190

1191

1192

1193

1194

1195

1196

1197

1198

1199

1200

1201

1202

1203

1204

1205

1206

1207

1208

1209

1210

1211

1212

1213

1214

1215

1216

1217

1218

On the Relationship between User Churn and Software Issues MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea

1219

1220

1221

1222

1223

1224

1225

1226

1227

1228

1229

1230

1231

1232

1233

1234

1235

1236

1237

1238

1239

1240

1241

1242

1243

1244

1245

1246

1247

1248

1249

1250

1251

1252

1253

1254

1255

1256

1257

1258

1259

1260

1261

1262

1263

1264

1265

1266

1267

1268

1269

1270

1271

1272

1273

1274

1275

1276

REFERENCES[1] Martiacuten Abadi Paul Barham Jianmin Chen Zhifeng Chen Andy Davis Jeffrey

Dean Matthieu Devin Sanjay Ghemawat Geoffrey Irving Michael Isard Man-junath Kudlur Josh Levenberg Rajat Monga Sherry Moore Derek G MurrayBenoit Steiner Paul Tucker Vijay Vasudevan Pete Warden Martin Wicke YuanYu and Xiaoqiang Zheng 2016 Tensorflow A system for large-scale machinelearning In 12th USENIX Symposium on Operating Systems Design and Imple-mentation (OSDI 16) 265ndash283

[2] W Abdelmoez Mohamed Kholief and Fayrouz M Elsalmy 2012 Bug fix-time pre-diction model using naiumlve bayes classifier In 2012 22nd International Conferenceon Computer Theory and Applications (ICCTA) IEEE 167ndash172

[3] Adnan Amin Babar Shah Asad Masood Khattak Thar Baker and Sajid Anwar2018 Just-in-time customer churn prediction with and without data transforma-tion In 2018 IEEE Congress on Evolutionary Computation (CEC) IEEE 1ndash6

[4] Hafeez Ullah Amin Aamir Saeed Malik Rana Fayyaz Ahmad Nasreen BadruddinNidal Kamel Muhammad Hussain and Weng-Tink Chooi 2015 Feature extrac-tion and classification for EEG signals using wavelet transform and machinelearning techniques Australasian physical amp engineering sciences in medicine 381 (2015) 139ndash149

[5] John Anvik Lyndon Hiew and Gail C Murphy 2006 Who should fix this bugIn Proceedings of the 28th international conference on Software engineering ACM361ndash370

[6] Gabriele Bavota Mario Linares-Vasquez Carlos Eduardo Bernal-Cardenas Mas-similiano Di Penta Rocco Oliveto and Denys Poshyvanyk 2014 The impactof api change-and fault-proneness on the user ratings of android apps IEEETransactions on Software Engineering 41 4 (2014) 384ndash407

[7] Pamela Bhattacharya and Iulian Neamtiu 2011 Bug-fix time prediction modelscan we do better In Proceedings of the 8thWorking Conference on Mining SoftwareRepositories ACM 207ndash210

[8] Steven Bird and Edward Loper 2004 NLTK the natural language toolkit InProceedings of the ACL 2004 on Interactive poster and demonstration sessionsAssociation for Computational Linguistics 31

[9] Andrew P Bradley 1997 The use of the area under the ROC curve in theevaluation of machine learning algorithms Pattern recognition 30 7 (1997)1145ndash1159

[10] Leo Breiman 2001 Random forests Machine learning 45 1 (2001) 5ndash32[11] Albert Caruana and Michael T Ewing 2010 How corporate reputation quality

and value influence online loyalty Journal of Business Research 63 9-10 (2010)1103ndash1110

[12] Tianqi Chen and Carlos Guestrin 2016 Xgboost A scalable tree boosting systemIn Proceedings of the 22nd acm sigkdd international conference on knowledgediscovery and data mining ACM 785ndash794

[13] Tianqi Chen Tong He Michael Benesty Vadim Khotilovich and Yuan Tang 2015Xgboost extreme gradient boosting R package version 04-2 (2015) 1ndash4

[14] Wenbin Chen Kun Fu Jiawei Zuo Xinwei Zheng Tinglei Huang and WenjuanRen 2017 Radar emitter classification for large data set based on weighted-xgboost IET Radar Sonar amp Navigation 11 8 (2017) 1203ndash1207

[15] Morakot Choetkiertikul Hoa Khanh Dam Truyen Tran and Aditya Ghose 2015Predicting delays in software projects using networked classification (t) In 201530th IEEEACM International Conference on Automated Software Engineering (ASE)IEEE 353ndash364

[16] Morakot Choetkiertikul Hoa Khanh Dam Truyen Tran Aditya Ghose and JohnGrundy 2017 Predicting delivery capability in iterative software developmentIEEE Transactions on Software Engineering 44 6 (2017) 551ndash573

[17] Franccedilois Chollet 2017 Keras[18] Daniel Alencar Da Costa Shane McIntosh Uiraacute Kulesza Ahmed E Hassan and

Surafel Lemma Abebe 2018 An empirical study of the integration time of fixedissues Empirical Software Engineering 23 1 (2018) 334ndash383

[19] Daniel Alencar da Costa Shane McIntosh Christoph Treude Uiraacute Kulesza andAhmed E Hassan 2018 The impact of rapid release cycles on the integrationdelay of fixed issues Empirical Software Engineering 23 2 (2018) 835ndash904

[20] Piew Datta Brij Masand Deepak R Mani and Bin Li 2000 Automated cellularmodeling and prediction on a large scale Artificial Intelligence Review 14 6 (2000)485ndash502

[21] Kushal S Dave Vishal Vaingankar Sumanth Kolar and Vasudeva Varma 2013Timespent based models for predicting user retention In Proceedings of the 22ndinternational conference on World Wide Web ACM 331ndash342

[22] William H DeLone and Ephraim R McLean 1992 Information systems successThe quest for the dependent variable Information systems research 3 1 (1992)60ndash95

[23] William H DeLone and Ephraim R McLean 2003 Model of information systemssuccess a ten years update Journal of Management (2003)

[24] Gideon Dror Dan Pelleg Oleg Rokhlenko and Idan Szpektor 2012 Churnprediction in new users of Yahoo answers In Proceedings of the 21st InternationalConference on World Wide Web ACM 829ndash834

[25] Khaled El Emam 2005 The ROI from software quality Auerbach Publications[26] Katherine Ellis Jacqueline Kerr Suneeta Godbole Gert Lanckriet David Wing

and Simon Marshall 2014 A random forest classifier for the prediction of energy

expenditure and type of physical activity from wrist and hip accelerometersPhysiological measurement 35 11 (2014) 2191

[27] David Enke and Suraphan Thawornwong 2005 The use of datamining and neuralnetworks for forecasting stock market returns Expert Systems with applications29 4 (2005) 927ndash940

[28] Iris Figalist Christoph Elsner Jan Bosch and Helena Holmstroumlm Olsson 2019Customer churn prediction in B2B contexts In International Conference on Soft-ware Business Springer 378ndash386

[29] Emanuel Giger Martin Pinzger and Harald Gall 2010 Predicting the fix timeof bugs In Proceedings of the 2nd International Workshop on RecommendationSystems for Software Engineering ACM 52ndash56

[30] Frank E Harrell Jr 2015 Regression modeling strategies with applications to linearmodels logistic and ordinal regression and survival analysis Springer

[31] Steffen Hedegaard and Jakob Grue Simonsen 2013 Extracting usability and userexperience information from online user reviews In Proceedings of the SIGCHIConference on Human Factors in Computing Systems ACM 2089ndash2098

[32] Yujuan Jiang Bram Adams and Daniel M German 2013 Will my patch make itand how fast Case study on the linux kernel In Proceedings of the 10th WorkingConference on Mining Software Repositories IEEE Press 101ndash110

[33] Guolin Ke Jia Zhang Zhenhui Xu Jiang Bian and Tie-Yan Liu 2019 TabNNa universal neural network solution for tabular data httpsopenreviewnetforumid=r1eJssCqY7

[34] Foutse Khomh Brian Chan Ying Zou and Ahmed E Hassan 2011 An entropyevaluation approach for triaging field crashes A case study of mozilla firefox InReverse Engineering (WCRE) 2011 18th Working Conference on IEEE 261ndash270

[35] Andy Liaw and Matthew Wiener 2002 Classification and regression by randomforest R news 2 3 (2002) 18ndash22

[36] Erik Linstead Paul Rigor Sushil Bajracharya Cristina Lopes and Pierre Baldi2007 Mining eclipse developer contributions via author-topic models In FourthInternational Workshop on Mining Software Repositories (MSRrsquo07 ICSE Workshops2007) IEEE 30ndash30

[37] Xi Long Wenjing Yin Le An Haiying Ni Lixian Huang Qi Luo and Yan Chen2012 Churn analysis of online social network users using data mining techniquesIn Proceedings of the international MultiConference of Engineers and ConputerScientists Vol 1

[38] James MacQueen 1967 Some methods for classification and analysis of multivari-ate observations In Proceedings of the fifth Berkeley symposium on mathematicalstatistics and probability Vol 1 Oakland CA USA 281ndash297

[39] Lionel Marks Ying Zou and Ahmed E Hassan 2011 Studying the fix-timefor bugs in large open source projects In Proceedings of the 7th InternationalConference on Predictive Models in Software Engineering ACM 11

[40] Andrew Kachites McCallum 2002 Mallet A machine learning for languagetoolkit (2002)

[41] Miloš Milošević Nenad Živić and Igor Andjelković 2017 Early churn predic-tion with personalized targeting in mobile social games Expert Systems withApplications 83 (2017) 326ndash332

[42] Audris Mockus Roy T Fielding and James Herbsleb 2000 A case study ofopen source software development the Apache server In Proceedings of the 22ndinternational conference on Software engineering Acm 263ndash272

[43] Gustavo Percio Zimmermann Montesdioca and Antocircnio Carlos Gastaud Maccedilada2015 Measuring user satisfaction with information security practices Computersamp Security 48 (2015) 267ndash280

[44] Leann Myers and Maria J Sirois 2004 Spearman correlation coefficients differ-ences between Encyclopedia of statistical sciences 12 (2004)

[45] Didrik Nielsen 2016 Tree boosting with XGBoost-why does XGBoost win EveryMachine Learning Competition Masterrsquos thesis NTNU

[46] Ehsan Noei Daniel Alencar Da Costa and Ying Zou 2018 Winning the appproduction rally In Proceedings of the 2018 26th ACM Joint Meeting on EuropeanSoftware Engineering Conference and Symposium on the Foundations of SoftwareEngineering ACM 283ndash294

[47] Ehsan Noei Feng Zhang Shaohua Wang and Ying Zou 2019 Towards pri-oritizing user-related issue reports of mobile applications Empirical SoftwareEngineering 24 4 (2019) 1964ndash1996

[48] Ehsan Noei Feng Zhang and Ying Zou 2019 Too Many User-Reviews WhatShould App Developers Look at First IEEE Transactions on Software Engineering(2019)

[49] Derek Orsquocallaghan Derek Greene Joe Carthy and Paacutedraig Cunningham 2015An analysis of the coherence of descriptors in topic modeling Expert Systemswith Applications 42 13 (2015) 5645ndash5657

[50] Ya A Pachepsky Dennis Timlin and GY Varallyay 1996 Artificial neural net-works to estimate soil water retention from easily measurable data Soil ScienceSociety of America Journal 60 3 (1996) 727ndash733

[51] Mahesh Pal 2005 Random forest classifier for remote sensing classificationInternational Journal of Remote Sensing 26 1 (2005) 217ndash222

[52] Sebastiano Panichella Andrea Di Sorbo Emitza Guzman Corrado A VisaggioGerardo Canfora and Harald C Gall 2015 How can i improve my app classifyinguser reviews for software maintenance and evolution In Software maintenanceand evolution (ICSME) 2015 IEEE international conference on IEEE 281ndash290

11

1277

1278

1279

1280

1281

1282

1283

1284

1285

1286

1287

1288

1289

1290

1291

1292

1293

1294

1295

1296

1297

1298

1299

1300

1301

1302

1303

1304

1305

1306

1307

1308

1309

1310

1311

1312

1313

1314

1315

1316

1317

1318

1319

1320

1321

1322

1323

1324

1325

1326

1327

1328

1329

1330

1331

1332

1333

1334

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

1335

1336

1337

1338

1339

1340

1341

1342

1343

1344

1345

1346

1347

1348

1349

1350

1351

1352

1353

1354

1355

1356

1357

1358

1359

1360

1361

1362

1363

1364

1365

1366

1367

1368

1369

1370

1371

1372

1373

1374

1375

1376

1377

1378

1379

1380

1381

1382

1383

1384

1385

1386

1387

1388

1389

1390

1391

1392

[53] Chitra Phadke Huseyin Uzunalioglu Veena B Mendiratta Dan Kushnir andDerek Doran 2013 Prediction of subscriber churn using social network analysisBell Labs Technical Journal 17 4 (2013) 63ndash76

[54] Peter C Rigby Daniel M German and Margaret-Anne Storey 2008 Open sourcesoftware peer review practices a case study of the apache server In Proceedingsof the 30th international conference on Software engineering ACM 541ndash550

[55] Martin P Robillard and Robert Deline 2011 A field study of API learning obstaclesEmpirical Software Engineering 16 6 (2011) 703ndash732

[56] Michael Roumlder Andreas Both and Alexander Hinneburg 2015 Exploring thespace of topic coherence measures In Proceedings of the eighth ACM internationalconference on Web search and data mining ACM 399ndash408

[57] Victor Francisco Rodriguez-Galiano Bardan Ghimire John Rogan Mario Chica-Olmo and Juan Pedro Rigol-Sanchez 2012 An assessment of the effectivenessof a random forest classifier for land-cover classification ISPRS Journal of Pho-togrammetry and Remote Sensing 67 (2012) 93ndash104

[58] Hiroaki Sakoe and Seibi Chiba 1978 Dynamic programming algorithm opti-mization for spoken word recognition IEEE transactions on acoustics speech andsignal processing 26 1 (1978) 43ndash49

[59] Stan Salvador and Philip Chan 2007 Toward accurate dynamic time warping inlinear time and space Intelligent Data Analysis 11 5 (2007) 561ndash580

[60] Peter B Seddon 1997 A respecification and extension of the DeLone and McLeanmodel of IS success Information systems research 8 3 (1997) 240ndash253

[61] Rudy Setiono Bart Baesens and Christophe Mues 2008 Recursive neural net-work rule extraction for data with mixed attributes IEEE Transactions on NeuralNetworks 19 2 (2008) 299ndash307

[62] Dong-Hee Shin 2015 Effect of the customer experience on satisfaction withsmartphones Assessing smart satisfaction index with partial least squaresTelecommunications Policy 39 8 (2015) 627ndash641

[63] Tom De Smedt and Walter Daelemans 2012 Pattern for python Journal ofMachine Learning Research 13 Jun (2012) 2063ndash2067

[64] Daniel Svozil Vladimir Kvasnicka and Jiri Pospichal 1997 Introduction to multi-layer feed-forward neural networks Chemometrics and intelligent laboratorysystems 39 1 (1997) 43ndash62

[65] Shaheen Syed and Marco Spruit 2017 Full-text or abstract Examining topiccoherence scores using latent dirichlet allocation In 2017 IEEE Internationalconference on data science and advanced analytics (DSAA) IEEE 165ndash174

[66] Chakkrit Tantithamthavorn Shane McIntosh Ahmed E Hassan and KenichiMatsumoto 2016 Automated parameter optimization of classification techniquesfor defect prediction models In 2016 IEEEACM 38th International Conference onSoftware Engineering (ICSE) IEEE 321ndash332

[67] Chakkrit Tantithamthavorn Shane McIntosh Ahmed E Hassan and KenichiMatsumoto 2018 The impact of automated parameter optimization on defectprediction models IEEE Transactions on Software Engineering (2018)

[68] Romain Tavenard 2017 tslearn A machine learning toolkit dedicated to time-series data httpsgithubcomrtavenartslearn

[69] L Torlay Marcela Perrone-Bertolotti Elizabeth Thomas and Monica Baciu 2017Machine learningndashXGBoost analysis of language networks to classify patientswith epilepsy Brain informatics 4 3 (2017) 159

[70] Geoffrey G Towell and Jude W Shavlik 1993 Extracting refined rules fromknowledge-based neural networks Machine learning 13 1 (1993) 71ndash101

[71] Irfan Ullah Basit Raza Ahmad Kamran Malik Muhammad Imran Saif Ul Islamand SungWon Kim 2019 A churn prediction model using random forest analysisof machine learning techniques for churn prediction and factor identification intelecom sector IEEE Access 7 (2019) 60134ndash60149

[72] Lorenzo Villarroel Gabriele Bavota Barbara Russo Rocco Oliveto and Massimil-iano Di Penta 2016 Release planning of mobile apps based on user reviews InProceedings of the 38th International Conference on Software Engineering ACM14ndash24

[73] Chih-Ping Wei and I-Tang Chiu 2002 Turning telecommunications call detailsto churn prediction a data mining approach Expert systems with applications 232 (2002) 103ndash112

[74] Cathrin Weiss Rahul Premraj Thomas Zimmermann and Andreas Zeller 2007How long will it take to fix this bug In Fourth International Workshop on MiningSoftware Repositories (MSRrsquo07 ICSE Workshops 2007) IEEE 1ndash1

[75] Ian H Witten Eibe Frank and A Mark 2016 Hall and Christopher J Pal DataMining Practical machine learning tools and techniques (2016)

[76] Barbara H Wixom and Peter A Todd 2005 A theoretical integration of usersatisfaction and technology acceptance Information systems research 16 1 (2005)85ndash102

[77] Jae-Chern Yoo and Tae Hee Han 2009 Fast normalized cross-correlation Circuitssystems and signal processing 28 6 (2009) 819

12

  • Abstract
  • 1 Introduction
  • 2 Motivating Example
  • 3 Data Collection
  • 4 Research Questions amp Results
  • 5 Threats To Validity
  • 6 Related Work
  • 7 Conclusion
  • References
Page 11: On the Relationship between User Churn and Software Issues · relationship between the issues that are present in a software prod-uct and user churn. For this purpose, we investigate

1161

1162

1163

1164

1165

1166

1167

1168

1169

1170

1171

1172

1173

1174

1175

1176

1177

1178

1179

1180

1181

1182

1183

1184

1185

1186

1187

1188

1189

1190

1191

1192

1193

1194

1195

1196

1197

1198

1199

1200

1201

1202

1203

1204

1205

1206

1207

1208

1209

1210

1211

1212

1213

1214

1215

1216

1217

1218

On the Relationship between User Churn and Software Issues MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea

1219

1220

1221

1222

1223

1224

1225

1226

1227

1228

1229

1230

1231

1232

1233

1234

1235

1236

1237

1238

1239

1240

1241

1242

1243

1244

1245

1246

1247

1248

1249

1250

1251

1252

1253

1254

1255

1256

1257

1258

1259

1260

1261

1262

1263

1264

1265

1266

1267

1268

1269

1270

1271

1272

1273

1274

1275

1276

REFERENCES[1] Martiacuten Abadi Paul Barham Jianmin Chen Zhifeng Chen Andy Davis Jeffrey

Dean Matthieu Devin Sanjay Ghemawat Geoffrey Irving Michael Isard Man-junath Kudlur Josh Levenberg Rajat Monga Sherry Moore Derek G MurrayBenoit Steiner Paul Tucker Vijay Vasudevan Pete Warden Martin Wicke YuanYu and Xiaoqiang Zheng 2016 Tensorflow A system for large-scale machinelearning In 12th USENIX Symposium on Operating Systems Design and Imple-mentation (OSDI 16) 265ndash283

[2] W Abdelmoez Mohamed Kholief and Fayrouz M Elsalmy 2012 Bug fix-time pre-diction model using naiumlve bayes classifier In 2012 22nd International Conferenceon Computer Theory and Applications (ICCTA) IEEE 167ndash172

[3] Adnan Amin Babar Shah Asad Masood Khattak Thar Baker and Sajid Anwar2018 Just-in-time customer churn prediction with and without data transforma-tion In 2018 IEEE Congress on Evolutionary Computation (CEC) IEEE 1ndash6

[4] Hafeez Ullah Amin Aamir Saeed Malik Rana Fayyaz Ahmad Nasreen BadruddinNidal Kamel Muhammad Hussain and Weng-Tink Chooi 2015 Feature extrac-tion and classification for EEG signals using wavelet transform and machinelearning techniques Australasian physical amp engineering sciences in medicine 381 (2015) 139ndash149

[5] John Anvik Lyndon Hiew and Gail C Murphy 2006 Who should fix this bugIn Proceedings of the 28th international conference on Software engineering ACM361ndash370

[6] Gabriele Bavota Mario Linares-Vasquez Carlos Eduardo Bernal-Cardenas Mas-similiano Di Penta Rocco Oliveto and Denys Poshyvanyk 2014 The impactof api change-and fault-proneness on the user ratings of android apps IEEETransactions on Software Engineering 41 4 (2014) 384ndash407

[7] Pamela Bhattacharya and Iulian Neamtiu 2011 Bug-fix time prediction modelscan we do better In Proceedings of the 8thWorking Conference on Mining SoftwareRepositories ACM 207ndash210

[8] Steven Bird and Edward Loper 2004 NLTK the natural language toolkit InProceedings of the ACL 2004 on Interactive poster and demonstration sessionsAssociation for Computational Linguistics 31

[9] Andrew P Bradley 1997 The use of the area under the ROC curve in theevaluation of machine learning algorithms Pattern recognition 30 7 (1997)1145ndash1159

[10] Leo Breiman 2001 Random forests Machine learning 45 1 (2001) 5ndash32[11] Albert Caruana and Michael T Ewing 2010 How corporate reputation quality

and value influence online loyalty Journal of Business Research 63 9-10 (2010)1103ndash1110

[12] Tianqi Chen and Carlos Guestrin 2016 Xgboost A scalable tree boosting systemIn Proceedings of the 22nd acm sigkdd international conference on knowledgediscovery and data mining ACM 785ndash794

[13] Tianqi Chen Tong He Michael Benesty Vadim Khotilovich and Yuan Tang 2015Xgboost extreme gradient boosting R package version 04-2 (2015) 1ndash4

[14] Wenbin Chen Kun Fu Jiawei Zuo Xinwei Zheng Tinglei Huang and WenjuanRen 2017 Radar emitter classification for large data set based on weighted-xgboost IET Radar Sonar amp Navigation 11 8 (2017) 1203ndash1207

[15] Morakot Choetkiertikul Hoa Khanh Dam Truyen Tran and Aditya Ghose 2015Predicting delays in software projects using networked classification (t) In 201530th IEEEACM International Conference on Automated Software Engineering (ASE)IEEE 353ndash364

[16] Morakot Choetkiertikul Hoa Khanh Dam Truyen Tran Aditya Ghose and JohnGrundy 2017 Predicting delivery capability in iterative software developmentIEEE Transactions on Software Engineering 44 6 (2017) 551ndash573

[17] Franccedilois Chollet 2017 Keras[18] Daniel Alencar Da Costa Shane McIntosh Uiraacute Kulesza Ahmed E Hassan and

Surafel Lemma Abebe 2018 An empirical study of the integration time of fixedissues Empirical Software Engineering 23 1 (2018) 334ndash383

[19] Daniel Alencar da Costa Shane McIntosh Christoph Treude Uiraacute Kulesza andAhmed E Hassan 2018 The impact of rapid release cycles on the integrationdelay of fixed issues Empirical Software Engineering 23 2 (2018) 835ndash904

[20] Piew Datta Brij Masand Deepak R Mani and Bin Li 2000 Automated cellularmodeling and prediction on a large scale Artificial Intelligence Review 14 6 (2000)485ndash502

[21] Kushal S Dave Vishal Vaingankar Sumanth Kolar and Vasudeva Varma 2013Timespent based models for predicting user retention In Proceedings of the 22ndinternational conference on World Wide Web ACM 331ndash342

[22] William H DeLone and Ephraim R McLean 1992 Information systems successThe quest for the dependent variable Information systems research 3 1 (1992)60ndash95

[23] William H DeLone and Ephraim R McLean 2003 Model of information systemssuccess a ten years update Journal of Management (2003)

[24] Gideon Dror Dan Pelleg Oleg Rokhlenko and Idan Szpektor 2012 Churnprediction in new users of Yahoo answers In Proceedings of the 21st InternationalConference on World Wide Web ACM 829ndash834

[25] Khaled El Emam 2005 The ROI from software quality Auerbach Publications[26] Katherine Ellis Jacqueline Kerr Suneeta Godbole Gert Lanckriet David Wing

and Simon Marshall 2014 A random forest classifier for the prediction of energy

expenditure and type of physical activity from wrist and hip accelerometersPhysiological measurement 35 11 (2014) 2191

[27] David Enke and Suraphan Thawornwong 2005 The use of datamining and neuralnetworks for forecasting stock market returns Expert Systems with applications29 4 (2005) 927ndash940

[28] Iris Figalist Christoph Elsner Jan Bosch and Helena Holmstroumlm Olsson 2019Customer churn prediction in B2B contexts In International Conference on Soft-ware Business Springer 378ndash386

[29] Emanuel Giger Martin Pinzger and Harald Gall 2010 Predicting the fix timeof bugs In Proceedings of the 2nd International Workshop on RecommendationSystems for Software Engineering ACM 52ndash56

[30] Frank E Harrell Jr 2015 Regression modeling strategies with applications to linearmodels logistic and ordinal regression and survival analysis Springer

[31] Steffen Hedegaard and Jakob Grue Simonsen 2013 Extracting usability and userexperience information from online user reviews In Proceedings of the SIGCHIConference on Human Factors in Computing Systems ACM 2089ndash2098

[32] Yujuan Jiang Bram Adams and Daniel M German 2013 Will my patch make itand how fast Case study on the linux kernel In Proceedings of the 10th WorkingConference on Mining Software Repositories IEEE Press 101ndash110

[33] Guolin Ke Jia Zhang Zhenhui Xu Jiang Bian and Tie-Yan Liu 2019 TabNNa universal neural network solution for tabular data httpsopenreviewnetforumid=r1eJssCqY7

[34] Foutse Khomh Brian Chan Ying Zou and Ahmed E Hassan 2011 An entropyevaluation approach for triaging field crashes A case study of mozilla firefox InReverse Engineering (WCRE) 2011 18th Working Conference on IEEE 261ndash270

[35] Andy Liaw and Matthew Wiener 2002 Classification and regression by randomforest R news 2 3 (2002) 18ndash22

[36] Erik Linstead Paul Rigor Sushil Bajracharya Cristina Lopes and Pierre Baldi2007 Mining eclipse developer contributions via author-topic models In FourthInternational Workshop on Mining Software Repositories (MSRrsquo07 ICSE Workshops2007) IEEE 30ndash30

[37] Xi Long Wenjing Yin Le An Haiying Ni Lixian Huang Qi Luo and Yan Chen2012 Churn analysis of online social network users using data mining techniquesIn Proceedings of the international MultiConference of Engineers and ConputerScientists Vol 1

[38] James MacQueen 1967 Some methods for classification and analysis of multivari-ate observations In Proceedings of the fifth Berkeley symposium on mathematicalstatistics and probability Vol 1 Oakland CA USA 281ndash297

[39] Lionel Marks Ying Zou and Ahmed E Hassan 2011 Studying the fix-timefor bugs in large open source projects In Proceedings of the 7th InternationalConference on Predictive Models in Software Engineering ACM 11

[40] Andrew Kachites McCallum 2002 Mallet A machine learning for languagetoolkit (2002)

[41] Miloš Milošević Nenad Živić and Igor Andjelković 2017 Early churn predic-tion with personalized targeting in mobile social games Expert Systems withApplications 83 (2017) 326ndash332

[42] Audris Mockus Roy T Fielding and James Herbsleb 2000 A case study ofopen source software development the Apache server In Proceedings of the 22ndinternational conference on Software engineering Acm 263ndash272

[43] Gustavo Percio Zimmermann Montesdioca and Antocircnio Carlos Gastaud Maccedilada2015 Measuring user satisfaction with information security practices Computersamp Security 48 (2015) 267ndash280

[44] Leann Myers and Maria J Sirois 2004 Spearman correlation coefficients differ-ences between Encyclopedia of statistical sciences 12 (2004)

[45] Didrik Nielsen 2016 Tree boosting with XGBoost-why does XGBoost win EveryMachine Learning Competition Masterrsquos thesis NTNU

[46] Ehsan Noei Daniel Alencar Da Costa and Ying Zou 2018 Winning the appproduction rally In Proceedings of the 2018 26th ACM Joint Meeting on EuropeanSoftware Engineering Conference and Symposium on the Foundations of SoftwareEngineering ACM 283ndash294

[47] Ehsan Noei Feng Zhang Shaohua Wang and Ying Zou 2019 Towards pri-oritizing user-related issue reports of mobile applications Empirical SoftwareEngineering 24 4 (2019) 1964ndash1996

[48] Ehsan Noei Feng Zhang and Ying Zou 2019 Too Many User-Reviews WhatShould App Developers Look at First IEEE Transactions on Software Engineering(2019)

[49] Derek Orsquocallaghan Derek Greene Joe Carthy and Paacutedraig Cunningham 2015An analysis of the coherence of descriptors in topic modeling Expert Systemswith Applications 42 13 (2015) 5645ndash5657

[50] Ya A Pachepsky Dennis Timlin and GY Varallyay 1996 Artificial neural net-works to estimate soil water retention from easily measurable data Soil ScienceSociety of America Journal 60 3 (1996) 727ndash733

[51] Mahesh Pal 2005 Random forest classifier for remote sensing classificationInternational Journal of Remote Sensing 26 1 (2005) 217ndash222

[52] Sebastiano Panichella Andrea Di Sorbo Emitza Guzman Corrado A VisaggioGerardo Canfora and Harald C Gall 2015 How can i improve my app classifyinguser reviews for software maintenance and evolution In Software maintenanceand evolution (ICSME) 2015 IEEE international conference on IEEE 281ndash290

11

1277

1278

1279

1280

1281

1282

1283

1284

1285

1286

1287

1288

1289

1290

1291

1292

1293

1294

1295

1296

1297

1298

1299

1300

1301

1302

1303

1304

1305

1306

1307

1308

1309

1310

1311

1312

1313

1314

1315

1316

1317

1318

1319

1320

1321

1322

1323

1324

1325

1326

1327

1328

1329

1330

1331

1332

1333

1334

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

1335

1336

1337

1338

1339

1340

1341

1342

1343

1344

1345

1346

1347

1348

1349

1350

1351

1352

1353

1354

1355

1356

1357

1358

1359

1360

1361

1362

1363

1364

1365

1366

1367

1368

1369

1370

1371

1372

1373

1374

1375

1376

1377

1378

1379

1380

1381

1382

1383

1384

1385

1386

1387

1388

1389

1390

1391

1392

[53] Chitra Phadke Huseyin Uzunalioglu Veena B Mendiratta Dan Kushnir andDerek Doran 2013 Prediction of subscriber churn using social network analysisBell Labs Technical Journal 17 4 (2013) 63ndash76

[54] Peter C Rigby Daniel M German and Margaret-Anne Storey 2008 Open sourcesoftware peer review practices a case study of the apache server In Proceedingsof the 30th international conference on Software engineering ACM 541ndash550

[55] Martin P Robillard and Robert Deline 2011 A field study of API learning obstaclesEmpirical Software Engineering 16 6 (2011) 703ndash732

[56] Michael Roumlder Andreas Both and Alexander Hinneburg 2015 Exploring thespace of topic coherence measures In Proceedings of the eighth ACM internationalconference on Web search and data mining ACM 399ndash408

[57] Victor Francisco Rodriguez-Galiano Bardan Ghimire John Rogan Mario Chica-Olmo and Juan Pedro Rigol-Sanchez 2012 An assessment of the effectivenessof a random forest classifier for land-cover classification ISPRS Journal of Pho-togrammetry and Remote Sensing 67 (2012) 93ndash104

[58] Hiroaki Sakoe and Seibi Chiba 1978 Dynamic programming algorithm opti-mization for spoken word recognition IEEE transactions on acoustics speech andsignal processing 26 1 (1978) 43ndash49

[59] Stan Salvador and Philip Chan 2007 Toward accurate dynamic time warping inlinear time and space Intelligent Data Analysis 11 5 (2007) 561ndash580

[60] Peter B Seddon 1997 A respecification and extension of the DeLone and McLeanmodel of IS success Information systems research 8 3 (1997) 240ndash253

[61] Rudy Setiono Bart Baesens and Christophe Mues 2008 Recursive neural net-work rule extraction for data with mixed attributes IEEE Transactions on NeuralNetworks 19 2 (2008) 299ndash307

[62] Dong-Hee Shin 2015 Effect of the customer experience on satisfaction withsmartphones Assessing smart satisfaction index with partial least squaresTelecommunications Policy 39 8 (2015) 627ndash641

[63] Tom De Smedt and Walter Daelemans 2012 Pattern for python Journal ofMachine Learning Research 13 Jun (2012) 2063ndash2067

[64] Daniel Svozil Vladimir Kvasnicka and Jiri Pospichal 1997 Introduction to multi-layer feed-forward neural networks Chemometrics and intelligent laboratorysystems 39 1 (1997) 43ndash62

[65] Shaheen Syed and Marco Spruit 2017 Full-text or abstract Examining topiccoherence scores using latent dirichlet allocation In 2017 IEEE Internationalconference on data science and advanced analytics (DSAA) IEEE 165ndash174

[66] Chakkrit Tantithamthavorn Shane McIntosh Ahmed E Hassan and KenichiMatsumoto 2016 Automated parameter optimization of classification techniquesfor defect prediction models In 2016 IEEEACM 38th International Conference onSoftware Engineering (ICSE) IEEE 321ndash332

[67] Chakkrit Tantithamthavorn Shane McIntosh Ahmed E Hassan and KenichiMatsumoto 2018 The impact of automated parameter optimization on defectprediction models IEEE Transactions on Software Engineering (2018)

[68] Romain Tavenard 2017 tslearn A machine learning toolkit dedicated to time-series data httpsgithubcomrtavenartslearn

[69] L Torlay Marcela Perrone-Bertolotti Elizabeth Thomas and Monica Baciu 2017Machine learningndashXGBoost analysis of language networks to classify patientswith epilepsy Brain informatics 4 3 (2017) 159

[70] Geoffrey G Towell and Jude W Shavlik 1993 Extracting refined rules fromknowledge-based neural networks Machine learning 13 1 (1993) 71ndash101

[71] Irfan Ullah Basit Raza Ahmad Kamran Malik Muhammad Imran Saif Ul Islamand SungWon Kim 2019 A churn prediction model using random forest analysisof machine learning techniques for churn prediction and factor identification intelecom sector IEEE Access 7 (2019) 60134ndash60149

[72] Lorenzo Villarroel Gabriele Bavota Barbara Russo Rocco Oliveto and Massimil-iano Di Penta 2016 Release planning of mobile apps based on user reviews InProceedings of the 38th International Conference on Software Engineering ACM14ndash24

[73] Chih-Ping Wei and I-Tang Chiu 2002 Turning telecommunications call detailsto churn prediction a data mining approach Expert systems with applications 232 (2002) 103ndash112

[74] Cathrin Weiss Rahul Premraj Thomas Zimmermann and Andreas Zeller 2007How long will it take to fix this bug In Fourth International Workshop on MiningSoftware Repositories (MSRrsquo07 ICSE Workshops 2007) IEEE 1ndash1

[75] Ian H Witten Eibe Frank and A Mark 2016 Hall and Christopher J Pal DataMining Practical machine learning tools and techniques (2016)

[76] Barbara H Wixom and Peter A Todd 2005 A theoretical integration of usersatisfaction and technology acceptance Information systems research 16 1 (2005)85ndash102

[77] Jae-Chern Yoo and Tae Hee Han 2009 Fast normalized cross-correlation Circuitssystems and signal processing 28 6 (2009) 819

12

  • Abstract
  • 1 Introduction
  • 2 Motivating Example
  • 3 Data Collection
  • 4 Research Questions amp Results
  • 5 Threats To Validity
  • 6 Related Work
  • 7 Conclusion
  • References
Page 12: On the Relationship between User Churn and Software Issues · relationship between the issues that are present in a software prod-uct and user churn. For this purpose, we investigate

1277

1278

1279

1280

1281

1282

1283

1284

1285

1286

1287

1288

1289

1290

1291

1292

1293

1294

1295

1296

1297

1298

1299

1300

1301

1302

1303

1304

1305

1306

1307

1308

1309

1310

1311

1312

1313

1314

1315

1316

1317

1318

1319

1320

1321

1322

1323

1324

1325

1326

1327

1328

1329

1330

1331

1332

1333

1334

MSRrsquo20 October 5ndash6 2020 Seoul Republic of Korea Anonymous et al

1335

1336

1337

1338

1339

1340

1341

1342

1343

1344

1345

1346

1347

1348

1349

1350

1351

1352

1353

1354

1355

1356

1357

1358

1359

1360

1361

1362

1363

1364

1365

1366

1367

1368

1369

1370

1371

1372

1373

1374

1375

1376

1377

1378

1379

1380

1381

1382

1383

1384

1385

1386

1387

1388

1389

1390

1391

1392

[53] Chitra Phadke Huseyin Uzunalioglu Veena B Mendiratta Dan Kushnir andDerek Doran 2013 Prediction of subscriber churn using social network analysisBell Labs Technical Journal 17 4 (2013) 63ndash76

[54] Peter C Rigby Daniel M German and Margaret-Anne Storey 2008 Open sourcesoftware peer review practices a case study of the apache server In Proceedingsof the 30th international conference on Software engineering ACM 541ndash550

[55] Martin P Robillard and Robert Deline 2011 A field study of API learning obstaclesEmpirical Software Engineering 16 6 (2011) 703ndash732

[56] Michael Roumlder Andreas Both and Alexander Hinneburg 2015 Exploring thespace of topic coherence measures In Proceedings of the eighth ACM internationalconference on Web search and data mining ACM 399ndash408

[57] Victor Francisco Rodriguez-Galiano Bardan Ghimire John Rogan Mario Chica-Olmo and Juan Pedro Rigol-Sanchez 2012 An assessment of the effectivenessof a random forest classifier for land-cover classification ISPRS Journal of Pho-togrammetry and Remote Sensing 67 (2012) 93ndash104

[58] Hiroaki Sakoe and Seibi Chiba 1978 Dynamic programming algorithm opti-mization for spoken word recognition IEEE transactions on acoustics speech andsignal processing 26 1 (1978) 43ndash49

[59] Stan Salvador and Philip Chan 2007 Toward accurate dynamic time warping inlinear time and space Intelligent Data Analysis 11 5 (2007) 561ndash580

[60] Peter B Seddon 1997 A respecification and extension of the DeLone and McLeanmodel of IS success Information systems research 8 3 (1997) 240ndash253

[61] Rudy Setiono Bart Baesens and Christophe Mues 2008 Recursive neural net-work rule extraction for data with mixed attributes IEEE Transactions on NeuralNetworks 19 2 (2008) 299ndash307

[62] Dong-Hee Shin 2015 Effect of the customer experience on satisfaction withsmartphones Assessing smart satisfaction index with partial least squaresTelecommunications Policy 39 8 (2015) 627ndash641

[63] Tom De Smedt and Walter Daelemans 2012 Pattern for python Journal ofMachine Learning Research 13 Jun (2012) 2063ndash2067

[64] Daniel Svozil Vladimir Kvasnicka and Jiri Pospichal 1997 Introduction to multi-layer feed-forward neural networks Chemometrics and intelligent laboratorysystems 39 1 (1997) 43ndash62

[65] Shaheen Syed and Marco Spruit 2017 Full-text or abstract Examining topiccoherence scores using latent dirichlet allocation In 2017 IEEE Internationalconference on data science and advanced analytics (DSAA) IEEE 165ndash174

[66] Chakkrit Tantithamthavorn Shane McIntosh Ahmed E Hassan and KenichiMatsumoto 2016 Automated parameter optimization of classification techniquesfor defect prediction models In 2016 IEEEACM 38th International Conference onSoftware Engineering (ICSE) IEEE 321ndash332

[67] Chakkrit Tantithamthavorn Shane McIntosh Ahmed E Hassan and KenichiMatsumoto 2018 The impact of automated parameter optimization on defectprediction models IEEE Transactions on Software Engineering (2018)

[68] Romain Tavenard 2017 tslearn A machine learning toolkit dedicated to time-series data httpsgithubcomrtavenartslearn

[69] L Torlay Marcela Perrone-Bertolotti Elizabeth Thomas and Monica Baciu 2017Machine learningndashXGBoost analysis of language networks to classify patientswith epilepsy Brain informatics 4 3 (2017) 159

[70] Geoffrey G Towell and Jude W Shavlik 1993 Extracting refined rules fromknowledge-based neural networks Machine learning 13 1 (1993) 71ndash101

[71] Irfan Ullah Basit Raza Ahmad Kamran Malik Muhammad Imran Saif Ul Islamand SungWon Kim 2019 A churn prediction model using random forest analysisof machine learning techniques for churn prediction and factor identification intelecom sector IEEE Access 7 (2019) 60134ndash60149

[72] Lorenzo Villarroel Gabriele Bavota Barbara Russo Rocco Oliveto and Massimil-iano Di Penta 2016 Release planning of mobile apps based on user reviews InProceedings of the 38th International Conference on Software Engineering ACM14ndash24

[73] Chih-Ping Wei and I-Tang Chiu 2002 Turning telecommunications call detailsto churn prediction a data mining approach Expert systems with applications 232 (2002) 103ndash112

[74] Cathrin Weiss Rahul Premraj Thomas Zimmermann and Andreas Zeller 2007How long will it take to fix this bug In Fourth International Workshop on MiningSoftware Repositories (MSRrsquo07 ICSE Workshops 2007) IEEE 1ndash1

[75] Ian H Witten Eibe Frank and A Mark 2016 Hall and Christopher J Pal DataMining Practical machine learning tools and techniques (2016)

[76] Barbara H Wixom and Peter A Todd 2005 A theoretical integration of usersatisfaction and technology acceptance Information systems research 16 1 (2005)85ndash102

[77] Jae-Chern Yoo and Tae Hee Han 2009 Fast normalized cross-correlation Circuitssystems and signal processing 28 6 (2009) 819

12

  • Abstract
  • 1 Introduction
  • 2 Motivating Example
  • 3 Data Collection
  • 4 Research Questions amp Results
  • 5 Threats To Validity
  • 6 Related Work
  • 7 Conclusion
  • References