April 20th, 2016
Raunak Mundada, Peter Wendel, Addie Olson
Outline
• Question of interest and background
• Data Cleaning
• Seed prediction
• Bracket adjustment
• NCAA tournament predictions
• Results
• Conclusion
Question of Interest
• How would SMU have performed in the 2016 NCAA Division I Men’s Basketball Championship?
• Predict the seed for SMU
• Adjust the 2016 bracket for the tournament and predict the result of each match
• Simulate the entire tournament and estimate SMU's odds
Background
NCAA Men's Division I Basketball Tournament
• Single-elimination tournament played each spring in the United States, currently featuring 68 college basketball teams, to determine the national champion of the major college basketball teams.
• The NCAA sanctioned the SMU men's basketball program for multiple violations; the penalty kept SMU out of the 2016 postseason tournament.
• SMU Men’s Basketball was the last undefeated team to lose this season
• The team spent most of the season in the top 25, peaking at #8 in the country
Data Cleaning
Raw Data for each season
Data source: https://www.kaggle.com/c/march-machine-learning-mania-2016
Step 1:
Wteam ID | Wscore | ... | Lteam ID | Lscore | ...
1374 | 75 | ... | 1106 | 70 | ...

Step 2:
Team ID | AvgScore | ...
1374 | 73.2 | ...
1106 | 75.8 | ...

Step 3:
Wteam ID | WAvgScore | ... | Lteam ID | LAvgScore | ...
1374 | 75.8 | ... | 1106 | 73.2 | ...

Step 4:
Team ID | TeamAvgScore | ... | Team1Won
1374 | 75.8 | ... | 1
1374 | 73.2 | ... | 0

Step 5:
Team1 ID | Team2 ID | ScoreRatio | ... | Team1Won
1374 | 1106 | 75.8/73.2 ≈ 1.04 | ... | 1

1. Raw data with winning and losing team game stats for each matchup in the season
2. Calculate season average stats for each team
3. Bind the season-average stats to each team for each matchup in previous tournaments
4. Convert the result of each previous March Madness matchup to a binary win variable
5. Use ratios of the two teams' season-average statistics as regressors
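The five steps above can be sketched in a few lines. This is a minimal illustration, not the code behind the slides: only the first game row comes from the example tables, the other games are made up, and the names `season_averages` and `matchup_row` are hypothetical.

```python
from collections import defaultdict

# Regular-season results: (winner_id, winner_score, loser_id, loser_score).
# Only the first row appears in the slides; the others are made up.
season_games = [
    (1374, 75, 1106, 70),
    (1106, 82, 1374, 72),
    (1374, 78, 1200, 66),
]

def season_averages(games):
    """Step 2: average points scored per team over the season."""
    totals = defaultdict(lambda: [0, 0])  # team_id -> [points, games played]
    for w_id, w_score, l_id, l_score in games:
        for team, score in ((w_id, w_score), (l_id, l_score)):
            totals[team][0] += score
            totals[team][1] += 1
    return {team: pts / n for team, (pts, n) in totals.items()}

def matchup_row(team1, team2, winner, avg):
    """Steps 3-5: attach averages, encode the outcome, form the ratio."""
    return {
        "Team1": team1,
        "Team2": team2,
        "ScoreRatio": avg[team1] / avg[team2],
        "Team1Won": 1 if winner == team1 else 0,
    }

avg = season_averages(season_games)
# Hypothetical past tournament matchup in which team 1374 beat team 1106.
row = matchup_row(1374, 1106, 1374, avg)
print(row)
```

The same ratio would be computed for every season-average statistic, giving one row of regressors per past tournament matchup.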
Data set after preprocessing
Seed Prediction & Bracket Adjustment
Decision Tree for Seeding
• Used 2003 - 2014 regular season data to build a decision tree, with a max depth of 5, to predict seed
• Tested on 2015 and 2016 regular season data
SMU’s seed estimate: 6
Training Data Test Data
Bracket Adjustments (SMU as a 6 seed)
• Find and eliminate the least qualified team that did not win its conference (Tulsa, 11 seed)
• Insert SMU as a 6 seed
• Bump the least qualified 6 seed to 7
• Repeat until the least qualified 10 seed takes Tulsa’s initial position as an 11 seed
• Use data from the actual play-in games to select winners
Bracket Adjustments (SMU as a 7 seed)
• Find and eliminate the least qualified team that did not win its conference (Tulsa, 11 seed)
• Insert SMU as a 7 seed
• Bump the least qualified 7 seed to 8
• Repeat until the least qualified 10 seed takes Tulsa’s initial position as an 11 seed
• Use data from the actual play-in games to select winners
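The bump-down cascade can be sketched as below. The slides do not say how "least qualified" was measured, so the numeric `rating` here is an assumed stand-in, and every team name except SMU and Tulsa is a placeholder. The sketch also assumes the team to bump is chosen among the teams already on a seed line, before the incoming team arrives.

```python
def insert_team(bracket, new_team, new_seed, dropped_team, dropped_seed):
    """Return a copy of `bracket` with `dropped_team` removed and `new_team` added.

    bracket maps seed -> list of (team_name, rating); a lower rating means
    "less qualified" (an assumed measure, not taken from the slides).
    """
    adjusted = {seed: list(teams) for seed, teams in bracket.items()}
    # Removing the eliminated team opens one slot on the dropped_seed line.
    adjusted[dropped_seed] = [t for t in adjusted[dropped_seed] if t[0] != dropped_team]
    incoming = new_team
    for seed in range(new_seed, dropped_seed):
        # Bump the weakest team currently on this line, then seat the incoming team.
        weakest = min(adjusted[seed], key=lambda t: t[1])
        adjusted[seed].remove(weakest)
        adjusted[seed].append(incoming)
        incoming = weakest
    # The last bumped team fills the slot left by the dropped team.
    adjusted[dropped_seed].append(incoming)
    return adjusted

# Toy bracket slice with two placeholder teams per seed line.
bracket = {
    6: [("A6", 80), ("B6", 78)],
    7: [("A7", 76), ("B7", 74)],
    8: [("A8", 72), ("B8", 70)],
    9: [("A9", 68), ("B9", 66)],
    10: [("A10", 64), ("B10", 62)],
    11: [("Tulsa", 60), ("A11", 58)],
}
adjusted = insert_team(bracket, ("SMU", 79), 6, "Tulsa", 11)
for seed in sorted(adjusted):
    print(seed, [name for name, _ in adjusted[seed]])
```

Calling the same function with `new_seed=7` gives the second scenario, in which the cascade starts one line lower.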
NCAA tournament predictions
Model Building Process
Model Accuracy
Model | Accuracy (Training) | Accuracy (Test) | Sensitivity (Training) | Sensitivity (Test) | Specificity (Training) | Specificity (Test)
Logistic Regression | 67.3% | 71.6% | 66.5% | 66.2% | 68.2% | 77.3%
Logistic Regression - Significant Features | 68.8% | 65.7% | 60.34% | 54.4% | 77.5% | 77.3%
Penalized Logistic Regression | 68.3% | 70.9% | 66.3% | 71% | 71% | 72.3%
Random Forest | 69% | 65.7% | 69.3% | 63.2% | 68.7% | 68.2%
Logistic Regression - Significant Features
Variable Importance
Logistic Regression Details
Confusion Matrix (rows: Prediction, columns: Reference)
Prediction \ Reference | Team 2 Wins | Team 1 Wins
Team 2 Wins | 45 | 15
Team 1 Wins | 23 | 51
• Model output on the test data set
• The test data set includes season averages from 2013, 2014 and 2015
• The outcome variable corresponds to the results of the 2013, 2014 and 2015 NCAA March Madness matchups
AUC = 74.3%
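As a check, the test-set metrics reported for Logistic Regression can be recomputed from the confusion matrix counts. The sketch below assumes "Team 2 Wins" is the positive class, which is the choice that reproduces the reported 71.6% accuracy, 66.2% sensitivity and 77.3% specificity.

```python
# Counts from the confusion matrix (rows = prediction, columns = reference).
# Assumption: "Team 2 Wins" is treated as the positive class.
tp = 45  # predicted Team 2 wins, Team 2 actually won
fp = 15  # predicted Team 2 wins, Team 1 actually won
fn = 23  # predicted Team 1 wins, Team 2 actually won
tn = 51  # predicted Team 1 wins, Team 1 actually won

accuracy = (tp + tn) / (tp + fp + fn + tn)
sensitivity = tp / (tp + fn)   # share of actual Team 2 wins predicted correctly
specificity = tn / (tn + fp)   # share of actual Team 1 wins predicted correctly

print(f"accuracy    = {accuracy:.1%}")
print(f"sensitivity = {sensitivity:.1%}")
print(f"specificity = {specificity:.1%}")
```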
Simulation Process
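A minimal Monte Carlo sketch of a single-elimination simulation follows. It is not the simulator used for the slides: the 8-team field, the `strength` values and `strength_win_prob` are hypothetical stand-ins for the adjusted bracket and the fitted model's win probabilities.

```python
import random
from collections import Counter

def simulate_bracket(teams, win_prob, rng):
    """Play one single-elimination bracket; return {team: games won}.

    `teams` is in bracket order and its length must be a power of two.
    `win_prob(a, b)` returns the probability that team a beats team b.
    """
    wins = {team: 0 for team in teams}
    alive = list(teams)
    while len(alive) > 1:
        next_round = []
        # Adjacent entries meet: (0 vs 1), (2 vs 3), ...
        for a, b in zip(alive[::2], alive[1::2]):
            winner = a if rng.random() < win_prob(a, b) else b
            wins[winner] += 1
            next_round.append(winner)
        alive = next_round
    return wins

def wins_distribution(team, teams, win_prob, n_sims, seed=0):
    """Share of simulations in which `team` won exactly k games."""
    rng = random.Random(seed)
    counts = Counter(
        simulate_bracket(teams, win_prob, rng)[team] for _ in range(n_sims)
    )
    return {k: counts[k] / n_sims for k in sorted(counts)}

# Hypothetical strengths; the slides use a fitted model instead.
strength = {"SMU": 5.0, "B": 6.0, "C": 4.0, "D": 7.0,
            "E": 3.0, "F": 8.0, "G": 2.0, "H": 9.0}

def strength_win_prob(a, b):
    return strength[a] / (strength[a] + strength[b])

dist = wins_distribution("SMU", list(strength), strength_win_prob, n_sims=200)
print(dist)  # keys: games won before exiting (3 means champion in this toy field)
```

Counting how often a team exits in each round across many simulated brackets gives the kind of exit-round distribution reported in the following slides.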
SMU Predicted Performance
The graph is read as follows:
• When SMU is a 6 seed, its run ends in the Sweet 16 in 1% of simulations
• Likewise, SMU has a 0.5% chance of winning the tournament when seeded 7
SMU Predicted Performance for NCAA 2016 (x-axis: Exit Round, y-axis: Total Visits)
Exit Round | SMU Seed 6 | SMU Seed 7
Round 1 | 42.0% | 10.5%
Round 2 | 55.5% | 66.5%
Sweet 16 | 1.0% | 9.5%
Elite 8 | 1.0% | 7.5%
Final 4 | 0.5% | 4.0%
Championship Game | 0.0% | 1.5%
Champions | 0.0% | 0.5%
SMU Predicted Performance
The cumulative probability graph shows that SMU, when seeded 6, has a 1.5% probability of making it to at least the Elite 8.
Cumulative Probability of SMU Advancing to Each Round
Round | SMU 7 Seed | SMU 6 Seed
Round 2 | 89.5% | 58.0%
Sweet 16 | 23.0% | 2.5%
Elite 8 | 13.5% | 1.5%
Final 4 | 6.0% | 0.5%
Championship Game | 2.0% | 0.0%
Champions | 0.5% | 0.0%
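The cumulative figures follow from the exit-round percentages: the probability of reaching a round is the sum of the probabilities of exiting in that round or any later one. A short sketch using the exit-round values from the previous chart (the function name `reach_at_least` is illustrative):

```python
ROUNDS = ["Round 1", "Round 2", "Sweet 16", "Elite 8",
          "Final 4", "Championship Game", "Champions"]

# Exit-round percentages from the "SMU Predicted Performance" chart.
exit_pct = {
    "SMU Seed 6": [42.0, 55.5, 1.0, 1.0, 0.5, 0.0, 0.0],
    "SMU Seed 7": [10.5, 66.5, 9.5, 7.5, 4.0, 1.5, 0.5],
}

def reach_at_least(exit_percentages):
    """Percent of runs that reach each round: a suffix sum of exit percentages."""
    running = 0.0
    reached = []
    for pct in reversed(exit_percentages):
        running += pct
        reached.append(running)
    return reached[::-1]

for label, pcts in exit_pct.items():
    cumulative = dict(zip(ROUNDS, reach_at_least(pcts)))
    print(label, cumulative)
```

For the 6 seed this gives 1.5% for the Elite 8, matching the statement above.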
Comparison with Vegas probabilities
Championship Probability for top 8 teams & SMU
Series | Values in chart order (top 8 teams, then SMU)
Vegas Probability | 15.38%, 14.29%, 12.50%, 8.33%, 8.33%, 6.25%, 5.56%, 5.56%, 0.00%
Probability - SMU Seed 6 | 13.0%, 27.0%, 14.5%, 3.5%, 4.5%, 0.0%, 20.5%, 2.5%, 0.0%
Probability - SMU Seed 7 | 13.50%, 22.00%, 17.00%, 4.50%, 4.50%, 0.00%, 21.50%, 2.00%, 1.50%
Overall, the model overestimates the probability of winning the championship for some teams and underestimates it for others.
The model underestimates the probability of winning for Villanova (the ultimate winner of NCAA March Madness 2016).
North Carolina and Villanova Predicted Performance
Round | North Carolina | Villanova
Round 1 | 6.0% | 10.5%
Round 2 | 7.5% | 22.0%
Sweet 16 | 26.5% | 26.5%
Elite 8 | 19.5% | 22.5%
Final 4 | 18.5% | 8.5%
Championship Game | 7.5% | 7.5%
Champions | 14.5% | 2.5%
Predicted performance for the finalists
• North Carolina and Villanova were the finalists
• Our model gives North Carolina a 14.5% chance of winning
• However, the winner was Villanova (2.5% probability of winning according to our model)
Feature Importance across models
Conclusion
According to our seeding model and bracket simulator, SMU would have had a 1.5% probability of winning the championship had they been eligible for postseason play. As the tournament goes on, the probabilities dwindle, but the ever-present possibility of an upset and their strong regular-season performance would have made the Mustangs a formidable opponent for any team in any round.
Further Questions
• Further investigation into prediction of seeding
• Simulate more brackets of different seedings, readings
• Investigate independence of games and include player information in the model
• Include past March Madness results
Thank you