Managing open-source projects’ manpower

By Forecasting the Number of Daily Issues on GitHub Repositories

Group 2: Aditya Utama Wijaya - V K Sanjeed - Renaud Jollet De Lorenzo - Katy Wen Lee - Cindy Soh


Business problem

Open-source collaboration platform

14 million users

35 million repositories

Open-source software development cycle

[Diagram: Contribution and Community]

The Open-Source Software Development Problem

[Diagram: Contribution and Community]

Business Goals

To manage manpower allocation using a forecast of the number of daily issues over the upcoming 3 weeks.

Opportunities

1. Our forecast will serve as a proof of concept for whether our method can successfully forecast the number of issues.

2. Accurate forecasts will bring value to the foundation by helping it allocate manpower efficiently.

Challenges

1. Diverse characteristics of the repository community

Stakeholders / Clients

Company
1. Maintain the repo
2. Add features
3. Solve issues

Users (Individual & Corporation)
1. Submit issues
2. Contribute: add features or solve issues

[Diagram: Company, Users, and Repository, linked by ecosystem contribution and donations ($)]

Forecasting goal

Y_t: number of issues per day
t: 1, 2, 3, …, 365 (1 year long)
k: 1, 2, 3, …, 20, 21
F_{t+k}: forecast number of daily issues for days up to t+k
Goal’s type: forward-looking

Purpose of the forecast

Forecast the number of issues per day in the coming 3 weeks.

How to be used

Use the forecast number of issues to determine the manpower allocation.
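One way the forecast could feed a staffing decision is sketched below. The slides do not say how issue counts map to people, so the capacity figure `issues_per_person_per_day`, the function name `staffing_plan`, and the sample numbers are all hypothetical and stand in for whatever rule a foundation actually uses.

```python
import math

def staffing_plan(daily_forecast, issues_per_person_per_day):
    """Turn a daily issue forecast into a per-day headcount.

    issues_per_person_per_day is a hypothetical capacity figure;
    the source does not state one.
    """
    if issues_per_person_per_day <= 0:
        raise ValueError("capacity must be positive")
    # Round up so that every forecast issue is covered.
    return [math.ceil(f / issues_per_person_per_day) for f in daily_forecast]

# Illustrative 21-day forecast (k = 1..21), not project data.
forecast = [17, 15, 16, 18, 14, 6, 5] * 3
plan = staffing_plan(forecast, issues_per_person_per_day=4)
peak_staff = max(plan)
```

Rounding up is a conservative choice. A team that tolerates a backlog might round to the nearest integer instead, or plan per week rather than per day.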

Foundation / Corporation    Repository    Description
Apache                      Spark         Large-scale data-processing engine
Apple                       Swift         Programming language for iOS / OSX
NumFOCUS                    Julia         Programming language for data analysis
Node.js Foundation          Node          Runtime for running JavaScript on the server side
Google                      TensorFlow    Machine-learning framework

Data Description

Data source

Measurement frequency: daily

Period: the last 1 year

*Issue: any problem or question that users report

Experiments

1. Create 28 training and validation samples

2. Notes on methods
a. MAPE & RMSE are the average of the 28 results
b. Holt-Winters → select the best among the additive and multiplicative variants, with / without trend
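The experiment setup above can be sketched as a rolling-origin evaluation. The slides do not say how the 28 samples were built, so this sketch assumes each forecast origin moves forward by one day and that the validation window is the 21-day horizon. The weekly season length, the function names, and the synthetic series are also assumptions. The seasonal naive forecaster is only a placeholder; any other forecaster with the same signature, such as a Holt-Winters variant, could be passed in.

```python
import math

HORIZON = 21   # k = 1..21, from the forecasting goal
N_SPLITS = 28  # number of training/validation samples
SEASON = 7     # assumed weekly seasonality for daily data

def rolling_splits(series, n_splits=N_SPLITS, horizon=HORIZON):
    """Yield (train, valid) pairs; each origin moves one day forward (assumption)."""
    first_origin = len(series) - horizon - (n_splits - 1)
    if first_origin < SEASON:
        raise ValueError("series too short for the requested splits")
    for i in range(n_splits):
        origin = first_origin + i
        yield series[:origin], series[origin:origin + horizon]

def seasonal_naive(train, horizon=HORIZON, season=SEASON):
    """Placeholder forecaster: repeat the last observed week."""
    last = train[-season:]
    return [last[k % season] for k in range(horizon)]

def rmse(actual, pred):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))

def mape(actual, pred):
    """Mean absolute percentage error; days with zero issues are skipped."""
    terms = [abs(a - p) / a for a, p in zip(actual, pred) if a != 0]
    return 100 * sum(terms) / len(terms) if terms else float("nan")

def evaluate(series, forecaster):
    """Average RMSE and MAPE over all splits."""
    rmses, mapes = [], []
    for train, valid in rolling_splits(series):
        pred = forecaster(train)
        rmses.append(rmse(valid, pred))
        mapes.append(mape(valid, pred))
    return sum(rmses) / len(rmses), sum(mapes) / len(mapes)

# Synthetic one-year series with a weekly pattern, for illustration only.
series = [10 + (d % 7) for d in range(365)]
avg_rmse, avg_mape = evaluate(series, seasonal_naive)
```

Skipping zero-issue days in MAPE is one way to avoid division by zero. For repositories with very low daily counts, RMSE may be the more stable of the two measures.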

Evaluation

[Figure: Apache Spark issue forecasts for the training and validation periods (actual vs. forecast)]

[Figure: ACF of the ensemble residuals]

[Figure: Forecast result for Apache, box plot of the ensemble forecast interval (number of issues) by forecast horizon]

[Figure: Forecasted number of issues by period; 17 issues]

Results..

Final Model

1. Able to capture time-series components (trend, seasonality, autocorrelation)

2. Improved upon the seasonal naive benchmark by ~17%

But..

❖ Our prediction intervals are very large, e.g. (+/-10)

❖ Daily issue counts for the 5 repositories are very low

Recommendations

❖ Improved forecasts must be substantial enough to justify the extra effort

❖ Repositories with a larger volume of daily issues are likely to benefit more
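The roughly 17% improvement over the seasonal naive benchmark can be read as a relative reduction in forecast error. The slides do not state which error measure the figure refers to, so the sketch below works for any positive error value. The function name and the two error values are hypothetical and chosen only to illustrate the arithmetic.

```python
def relative_improvement(benchmark_error, model_error):
    """Percentage reduction in error relative to the benchmark."""
    if benchmark_error <= 0:
        raise ValueError("benchmark error must be positive")
    return 100 * (benchmark_error - model_error) / benchmark_error

# Hypothetical error values, not results from the slides.
improvement = relative_improvement(benchmark_error=6.0, model_error=5.0)
```

With these made-up inputs the reduction is about 16.7%, the same order as the reported figure. A negative result would mean the model is worse than the benchmark.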