Data from Strava.com
Pace
Time series, demographic, and aggregated running data on 10,000 runners. 1,000 with half-marathon times.
Analysis
5-fold cross-validation; metrics are regression r2 and RMSE.
Benchmarking with a linear model: r2 = 0.73, RMSE = 6.5 min
Reducing number of features
Ensemble partial least squares regression: r2 = 0.73, RMSE = 6.4 min
Validation (72 runners): r2 = 0.63, RMSE = 7.2 min
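The cross-validated r2 and RMSE figures on this slide could be computed along the following lines. This is a minimal sketch, not the analysis behind the slides: it uses a single-feature least-squares line as a stand-in for the 22-feature linear model, and the data are synthetic (the `pace` and `half_time` lists and the helper names `fit_line`, `r2_and_rmse`, and `k_fold_scores` are all illustrative assumptions).

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def r2_and_rmse(y_true, y_pred):
    """Coefficient of determination and root-mean-square error."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / n)


def k_fold_scores(xs, ys, k=5, seed=0):
    """Score a model on each of k held-out folds after fitting on the rest."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        preds = [slope * xs[i] + intercept for i in held_out]
        scores.append(r2_and_rmse([ys[i] for i in held_out], preds))
    return scores


# Synthetic stand-in data (NOT the Strava data): training pace in min/mile
# and a noisy half-marathon time in minutes.
rng = random.Random(1)
pace = [rng.uniform(7.0, 11.0) for _ in range(200)]
half_time = [13.1 * p + rng.gauss(0, 6) for p in pace]

scores = k_fold_scores(pace, half_time, k=5)
mean_r2 = sum(s[0] for s in scores) / len(scores)
mean_rmse = sum(s[1] for s in scores) / len(scores)
print(f"mean r2 = {mean_r2:.2f}, mean RMSE = {mean_rmse:.1f} min")
```

Averaging the per-fold scores gives one r2 and one RMSE per model, which is the form the slide reports. The separate validation set would be scored once with `r2_and_rmse` after fitting on all cross-validation data.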
About me: Alexis Yelton, MIT postdoc
Chitinase in marine cyanobacteria
[Figure: chitinase activity]
My first half marathon: 1:56:30
Personal best: 1:47:56
22 Features
Month Distance
Month Runs
Month Elevation
Month Pace
Month Time
6 Month Distance
6 Month Runs
6 Month Elevation
6 Month Pace
6 Month Time
Age Range
Weight Range
Gender
Rest Days / Week
Fast Days / Week
Long Days / Week
5K Time
Marathon Time
Minimum Pace
Minimum Pace > 2 mi
Minimum Pace > 3 mi
SD Pace
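Several of these features are simple aggregates over a runner's activity log. The sketch below shows one plausible way to derive a few of them. The slides do not define the features, so the definitions here are assumptions: "Minimum Pace > 2 mi" is read as the fastest pace among runs longer than 2 miles, and "SD Pace" as the sample standard deviation of per-run pace. The `(distance_miles, duration_minutes)` record layout and the function name `pace_features` are hypothetical.

```python
from statistics import stdev


def pace_features(runs):
    """Aggregate one month of runs into model features.

    runs: non-empty list of (distance_miles, duration_minutes) tuples.
    Pace is minutes per mile, so a smaller value means a faster run.
    """
    paces = [t / d for d, t in runs]
    total_distance = sum(d for d, _ in runs)
    total_time = sum(t for _, t in runs)

    def min_pace_over(miles):
        # Fastest pace among runs strictly longer than `miles`, if any.
        eligible = [t / d for d, t in runs if d > miles]
        return min(eligible) if eligible else None

    return {
        "month_distance": total_distance,
        "month_runs": len(runs),
        "month_time": total_time,
        "month_pace": total_time / total_distance,
        "sd_pace": stdev(paces) if len(paces) > 1 else 0.0,
        "min_pace": min(paces),
        "min_pace_over_2mi": min_pace_over(2),
        "min_pace_over_3mi": min_pace_over(3),
    }


# Made-up example log, for illustration only.
runs = [(1.0, 7.5), (3.1, 27.9), (5.0, 47.5), (2.5, 21.25)]
features = pace_features(runs)
for name, value in features.items():
    print(name, value)
```

Restricting the minimum pace to longer runs keeps a single fast mile from dominating the feature, which is presumably why the deck lists the 2-mile and 3-mile variants separately.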
Results
[Figure: axis label "Half Marathon Time"]
Errors vary with half marathon time. A larger data set would allow for better predictions for faster and slower runners.
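One way to check how errors vary with half marathon time is to group runners by actual finish time and compute the RMSE within each group. The following is a minimal sketch of that idea. The 15-minute bin width, the function name `rmse_by_time_bin`, and the example numbers are all made up for illustration and are not results from the slides.

```python
import math
from collections import defaultdict


def rmse_by_time_bin(actual, predicted, bin_width=15):
    """Return {bin_start_minutes: (runner_count, rmse)} keyed by actual time."""
    squared_errors = defaultdict(list)
    for a, p in zip(actual, predicted):
        bin_start = int(a // bin_width) * bin_width
        squared_errors[bin_start].append((a - p) ** 2)
    return {
        start: (len(errs), math.sqrt(sum(errs) / len(errs)))
        for start, errs in sorted(squared_errors.items())
    }


# Made-up finish times in minutes (actual vs. predicted), illustration only.
actual = [88, 95, 104, 110, 112, 118, 125, 131, 140, 156]
predicted = [97, 101, 106, 109, 114, 117, 122, 127, 131, 143]

for start, (count, rmse) in rmse_by_time_bin(actual, predicted).items():
    print(f"{start}-{start + 15} min: n={count}, RMSE={rmse:.1f} min")
```

Bins with few runners give noisy RMSE estimates, which is consistent with the slide's point that a larger data set would help at the fast and slow extremes.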
Analysis
3-fold cross-validation; metrics are regression r2 and RMSE.
Benchmarking with a linear model: r2 = 0.73, RMSE = 6.5 min
Dealing with collinear features (and reducing number of features)
1. Ensemble partial least squares regression: r2 = 0.72, RMSE = 6.6 min
2. Linear model: r2 = 0.71, RMSE = 6.7 min
3. Lasso regression: r2 = 0.69, RMSE = 6.8 min
4. Ridge regression: r2 = 0.72, RMSE = 6.7 min
5. Random forest regression: r2 = 0.67, RMSE = 7.1 min
Validation (69 runners): r2 = 0.63, RMSE = 7.2 min
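Collinearity among features such as monthly distance and monthly time can be spotted before modeling by looking at pairwise correlations. The sketch below flags feature pairs whose Pearson correlation exceeds a threshold. It is an illustration rather than the procedure used in the slides: the 0.9 threshold, the helper names, and the example values are assumptions.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient; assumes neither column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def collinear_pairs(columns, threshold=0.9):
    """columns: dict of feature name -> list of values, one per runner.

    Returns (name_a, name_b, r) for every pair with |r| >= threshold.
    """
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(columns.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Made-up values for six runners, illustration only.
columns = {
    "month_distance": [40, 55, 62, 80, 95, 120],
    "month_time": [400, 520, 600, 740, 850, 1050],
    "rest_days_per_week": [3, 4, 2, 3, 1, 2],
}
flagged = collinear_pairs(columns)
print(flagged)
```

Pairs flagged this way are candidates for dropping one member, or for methods that tolerate correlated inputs, such as the partial least squares and ridge regressions listed on this slide.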
Analysis
3-fold cross-validation; metrics are regression r2 and RMSE.
Benchmarking with a linear model: r2 = 0.73, RMSE = 6.5 min
Reducing number of features
1. Ensemble partial least squares regression: r2 = 0.72, RMSE = 6.6 min
2. Linear model: r2 = 0.71, RMSE = 6.7 min
3. Lasso regression: r2 = 0.69, RMSE = 6.8 min
Other models with these features
4. Ridge regression: r2 = 0.72, RMSE = 6.7 min
5. Random forest regression: r2 = 0.67, RMSE = 7.1 min
Validation (69 runners): r2 = 0.63, RMSE = 7.2 min