Managing open-source projects’ manpower
By forecasting the number of daily issues in GitHub repositories
Group 2: Aditya Utama Wijaya - V K Sanjeed - Renaud Jollet De Lorenzo - Katy Wen Lee - Cindy Soh
Business problem
Open-source collaboration platform
14 million users
35 million repositories
Open-source software development cycle
[Diagram: Contribution, Community]
The open-source software development problem
[Diagram: Contribution, Community]
Business goals
To manage manpower allocation using forecasts of the number of daily issues over the upcoming 3 weeks.
Opportunities
1. Our forecast will be a proof of concept of whether our method can successfully forecast the number of issues.
2. Accurate forecasts will help the foundation allocate manpower efficiently.
Challenges
1. Diverse characteristics of the repository community
Stakeholders / Clients
Ecosystem contribution
Users (individuals and corporations)
1. Submit issues
2. Contribute: add features or solve issues
Company
1. Maintain the repo
2. Add features
3. Solve issues
[Diagram labels: Donation $, Repository]
Forecasting goal
Y_t: number of issues per day
t = 1, 2, 3, …, 365 (1 year long)
k = 1, 2, 3, …, 20, 21
F_{t+k}: forecast number of issues for day t+k
Goal type: forward-looking
Purpose of the forecast
Forecast the number of issues per day in the coming 3 weeks.
How it will be used
Use the forecast number of issues to determine the manpower allocation.
Foundation / Corporation | Repository | Description
Apache | Spark | Large-scale data-processing engine
Apple | Swift | Programming language for iOS / OSX
NumFOCUS | Julia | Programming language for data analysis
Node.js Foundation | Node | Programming language to use JavaScript on the server side
Google | TensorFlow | Machine-learning framework
Data description
Data source
Measurement frequency: daily
*Issue: any problem or question that users report
Period: the last 1 year
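A daily series like the one described above could be built by counting issues per calendar day. The following is a minimal sketch, assuming each issue comes with a creation timestamp in the ISO 8601 form used by GitHub's `created_at` field; the function name `daily_issue_counts` and the sample timestamps are hypothetical and not taken from the slides.

```python
from collections import Counter
from datetime import date, datetime, timedelta

def daily_issue_counts(created_at, start, end):
    """Count issues per calendar day from start to end (both inclusive)."""
    counts = Counter(
        datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").date() for ts in created_at
    )
    n_days = (end - start).days + 1
    # Days without any reported issue get an explicit 0 so the series has no gaps.
    return [counts.get(start + timedelta(days=i), 0) for i in range(n_days)]

# Made-up timestamps, for illustration only.
timestamps = [
    "2016-03-01T09:15:00Z",
    "2016-03-01T17:40:12Z",
    "2016-03-03T08:00:00Z",
]
series = daily_issue_counts(timestamps, date(2016, 3, 1), date(2016, 3, 4))
```

Filling empty days with zeros matters here because several of the repositories have a low daily issue volume, and a forecasting method needs one observation per day.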
Experiments
1. Create 28 training and validation samples
2. Notes on methods
a. MAPE and RMSE are the averages of the 28 results
b. Holt-Winters → select the best among the additive and multiplicative variants, with / without trend
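The evaluation procedure above can be sketched as follows. The slides do not say how the 28 samples were created, so this assumes a rolling forecast origin that moves forward one day at a time, with a 21-day validation window and weekly seasonality. Two simple forecasters (seasonal naive and naive) stand in for the Holt-Winters variants, and the selection criterion (lowest average RMSE) is also an assumption. All names and the synthetic series are hypothetical.

```python
import math

HORIZON = 21    # 3-week forecast horizon (k = 1..21)
N_SAMPLES = 28  # number of training/validation samples
SEASON = 7      # assumed weekly seasonality

def rolling_splits(series, n_samples=N_SAMPLES, horizon=HORIZON):
    """Yield (train, valid) pairs, moving the forecast origin forward one day at a time."""
    n = len(series)
    for i in range(n_samples):
        origin = n - horizon - (n_samples - 1 - i)
        yield series[:origin], series[origin:origin + horizon]

def seasonal_naive(train, horizon, season=SEASON):
    """Repeat the last observed seasonal cycle over the horizon."""
    last_cycle = train[-season:]
    return [last_cycle[k % season] for k in range(horizon)]

def naive(train, horizon):
    """Repeat the last observed value over the horizon."""
    return [train[-1]] * horizon

def rmse(actual, forecast):
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mape(actual, forecast):
    # MAPE is undefined for days with zero issues, so those days are skipped here.
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when all actual values are zero")
    return 100 * sum(abs(a - f) / a for a, f in pairs) / len(pairs)

def evaluate(series, forecaster):
    """Return (average MAPE, average RMSE) over all rolling samples."""
    mapes, rmses = [], []
    for train, valid in rolling_splits(series):
        forecast = forecaster(train, len(valid))
        mapes.append(mape(valid, forecast))
        rmses.append(rmse(valid, forecast))
    return sum(mapes) / len(mapes), sum(rmses) / len(rmses)

# Synthetic one-year series with a pure weekly pattern, for illustration only.
series = [10 + (t % 7) for t in range(365)]
candidates = {"seasonal_naive": seasonal_naive, "naive": naive}
scores = {name: evaluate(series, f) for name, f in candidates.items()}
# Pick the candidate with the lowest average RMSE.
best = min(scores, key=lambda name: scores[name][1])
```

Averaging over many forecast origins makes the comparison less dependent on any single 3-week window, which matters when daily counts are low and noisy.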
[Figure: Output: Apache Spark issue forecasts for the training and validation periods; series: Actual, Forecast; periods: Train, Valid]
[Figure: ACF on the residuals of the ensemble]
[Figure: Forecast result (Apache): box plot of the ensemble forecast result; axes: forecast horizon, forecast interval, number of issues]
[Figure: Forecasted number of issues by period; labelled value: 17 issues]
Recommendations
But..
❖ Our prediction intervals are very large, e.g. (+/-10)
❖ Daily issue counts for the 5 repositories are very low
❖ Improved forecasts must be substantial enough to justify extra effort
❖ Repositories with a larger volume of daily issues are likely to benefit more
Results..
Final model
1. Able to capture time-series components (trend, seasonality, autocorrelation)
2. Improved upon the seasonal naive benchmark by ~17%
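An improvement over a benchmark is usually expressed as the relative reduction in an error measure. The slides do not state which measure the ~17% refers to, so the sketch below is generic. The function name and the error values are hypothetical and do not reproduce the reported result.

```python
def relative_improvement(benchmark_error, model_error):
    """Percentage reduction in error relative to the benchmark."""
    if benchmark_error <= 0:
        raise ValueError("benchmark error must be positive")
    return 100 * (benchmark_error - model_error) / benchmark_error

# Hypothetical error values, for illustration only.
improvement = relative_improvement(benchmark_error=6.0, model_error=5.0)
```

A negative result would mean the model performs worse than the benchmark on that measure.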