April 20th, 2016
Raunak Mundada, Peter Wendel, Addie Olson

April Madness


Page 1: April Madness


Page 2: April Madness

Outline

• Question of interest and background

• Data Cleaning

• Seed prediction

• Bracket adjustment

• NCAA tournament predictions

• Results

• Conclusion

Page 3: April Madness

Question of Interest

• How would SMU have performed in the 2016 NCAA Division I Men’s Basketball Championship?

• Predict the seed for SMU

• Adjust the 2016 bracket for the tournament and predict the result of each match

• Simulate the entire tournament and estimate SMU’s odds of reaching each round

Page 4: April Madness

Background

NCAA Men's Division I Basketball Tournament

• Single-elimination tournament played each spring in the United States, currently featuring 68 college basketball teams, to determine the national championship of the major college basketball teams.

• The NCAA sanctioned the SMU men's basketball program for multiple violations; the penalty kept SMU out of the 2016 postseason tournament.

• SMU Men’s Basketball was the last undefeated team to lose this season

• The team spent most of the season in the top 25, peaking at #8 in the country

Page 5: April Madness

Data Cleaning

Page 6: April Madness

Raw Data for each season

Data source: https://www.kaggle.com/c/march-machine-learning-mania-2016

Page 7: April Madness

1. Raw data with winning and losing team game stats for each matchup in the season

Wteam ID   Wscore   ...   Lteam ID   Lscore   ...
1374       75       ...   1106       70       ...

2. Calculate season average stats for each team

Team ID   AvgScore   ...
1374      73.2       ...
1106      75.8       ...

3. Bind season average score to each team for each matchup in the previous tourneys

Wteam ID   WAvgScore   ...   Lteam ID   LAvgScore   ...
1374       75.8        ...   1106       73.2        ...

4. Convert win to binary variable based on previous March Madness results

Team ID   TeamAvgScore   ...   Team1Won
1374      75.8           ...   1

Team ID   TeamAvgScore   ...   Team1Won
1374      73.2           ...   0

5. Use ratios of team average statistics as regressors

Team1 ID   Team2 ID   ScoreRatio           ...   Team1Won
1374       1106       75.8/73.2 ≈ 1.04     ...   1
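The five steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the dictionary keys (`Wteam`, `Wscore`, `Lteam`, `Lscore`) mirror the column names on the slide, the function names are hypothetical, only the scoring average is used as a statistic, and emitting each tournament game from both teams' perspectives (label 1 and label 0) is an assumption based on the two `Team1Won` tables in step 4.

```python
from collections import defaultdict


def season_averages(games):
    """Step 2: average points scored per team over the regular-season games."""
    totals = defaultdict(lambda: [0.0, 0])  # team id -> [total points, games played]
    for g in games:
        for team, score in ((g["Wteam"], g["Wscore"]), (g["Lteam"], g["Lscore"])):
            totals[team][0] += score
            totals[team][1] += 1
    return {team: points / played for team, (points, played) in totals.items()}


def matchup_rows(tourney_games, avg_score):
    """Steps 3-5: attach season averages to each tournament game, label the
    outcome as a binary variable, and use the ratio of averages as a regressor."""
    rows = []
    for g in tourney_games:
        w, l = g["Wteam"], g["Lteam"]
        # Winner listed as Team1 -> Team1Won = 1
        rows.append({"Team1": w, "Team2": l,
                     "ScoreRatio": avg_score[w] / avg_score[l], "Team1Won": 1})
        # Same game with the loser listed as Team1 -> Team1Won = 0 (assumed)
        rows.append({"Team1": l, "Team2": w,
                     "ScoreRatio": avg_score[l] / avg_score[w], "Team1Won": 0})
    return rows


# Toy data (made up for illustration, not taken from the Kaggle files)
regular_season = [
    {"Wteam": 1374, "Wscore": 75, "Lteam": 1106, "Lscore": 70},
    {"Wteam": 1106, "Wscore": 80, "Lteam": 1374, "Lscore": 60},
]
averages = season_averages(regular_season)
training_rows = matchup_rows([{"Wteam": 1374, "Lteam": 1106}], averages)
```

Using ratios rather than raw averages keeps each regressor symmetric: swapping Team1 and Team2 inverts the ratio and flips the label.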

Page 8: April Madness

Data set after preprocessing

Page 9: April Madness

Seed Prediction & Bracket Adjustment

Page 10: April Madness

Decision Tree for Seeding

• Used 2003-2014 regular season data to build a decision tree, with a max depth of 5, to predict seed

• Tested on 2015 and 2016 regular season data

SMU’s seed estimate: 6

[Figure: decision tree predictions on the training data and the test data]
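To illustrate what a depth-limited tree does, here is a minimal pure-Python regression tree. It is a sketch under assumptions: the slides do not say which software, features, or split criterion were used, so the squared-error criterion, the function names, and the single "win percentage" feature in the toy data are all hypothetical.

```python
def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def build_tree(rows, seeds, max_depth=5, depth=0):
    """Grow a regression tree; rows are feature lists, seeds are the targets."""
    if depth == max_depth or len(set(seeds)) == 1:
        return {"leaf": sum(seeds) / len(seeds)}
    best = None
    for j in range(len(rows[0])):
        # Every unique value except the largest is a valid threshold,
        # so both sides of the split are always non-empty.
        for t in sorted({r[j] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, seeds) if r[j] <= t]
            right = [y for r, y in zip(rows, seeds) if r[j] > t]
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, j, t)
    if best is None:  # no feature can separate the rows
        return {"leaf": sum(seeds) / len(seeds)}
    _, j, t = best
    left = [(r, y) for r, y in zip(rows, seeds) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, seeds) if r[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left], max_depth, depth + 1),
        "right": build_tree([r for r, _ in right], [y for _, y in right], max_depth, depth + 1),
    }


def predict(tree, row):
    """Follow the splits down to a leaf and return its mean seed."""
    while "leaf" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]


# Toy data: one hypothetical feature (win percentage) and the seed received
toy_rows = [[0.9], [0.8], [0.6], [0.5]]
toy_seeds = [1, 2, 10, 12]
toy_tree = build_tree(toy_rows, toy_seeds, max_depth=5)
```

Capping the depth limits how finely the tree can partition the teams, which trades some training accuracy for less overfitting on later seasons.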

Page 11: April Madness

Bracket Adjustments

• Find and eliminate the least qualified team that did not win its conference (Tulsa, an 11 seed)

• Insert SMU as a 6 seed

• Bump the least qualified 6 seed to a 7 seed

• Repeat until the least qualified 10 seed takes Tulsa’s initial position as an 11 seed

• Use data from the actual play-in games to select winners

Page 12: April Madness
Page 13: April Madness

Bracket Adjustments

• Find and eliminate the least qualified team that did not win its conference (Tulsa, an 11 seed)

• Insert SMU as a 7 seed

• Bump the least qualified 7 seed to an 8 seed

• Repeat until the least qualified 10 seed takes Tulsa’s initial position as an 11 seed

• Use data from the actual play-in games to select winners
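The bump-down procedure works the same way for either starting seed and can be sketched as a short cascade. This is an illustration only: the slides do not say how "least qualified" was measured, so the sketch assumes each seed line is already ordered from most to least qualified, and the function name and toy team labels are hypothetical.

```python
def insert_team(bracket, new_team, new_seed, removed_team, removed_seed):
    """Insert new_team at new_seed and cascade the least qualified team on
    each seed line down one seed until the removed team's slot is filled.

    bracket maps seed -> list of teams ordered from most to least qualified
    (an assumed ordering; the ranking criterion is not given in the slides).
    """
    adjusted = {seed: list(teams) for seed, teams in bracket.items()}
    adjusted[removed_seed].remove(removed_team)
    incoming = new_team
    for seed in range(new_seed, removed_seed):
        line = adjusted[seed]
        bumped = line.pop()       # least qualified team on this seed line
        line.insert(0, incoming)  # the arriving team joins this line
        incoming = bumped         # the bumped team moves one seed down
    adjusted[removed_seed].insert(0, incoming)
    return adjusted


# Toy bracket with two teams per seed line (real seed lines hold more teams)
toy_bracket = {6: ["A", "B"], 7: ["C", "D"], 8: ["E", "F"]}
adjusted = insert_team(toy_bracket, "SMU", 6, removed_team="F", removed_seed=8)
```

Every seed line keeps its original size, so the rest of the bracket structure is unchanged.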

Page 14: April Madness

NCAA tournament predictions

Page 15: April Madness

Model Building Process

Page 16: April Madness

Model Building Process

Model                                        Accuracy          Sensitivity       Specificity
                                             Train    Test     Train    Test     Train    Test
Logistic Regression                          67.3%    71.6%    66.5%    66.2%    68.2%    77.3%
Logistic Regression - Significant Features   68.8%    65.7%    60.34%   54.4%    77.5%    77.3%
Penalized Logistic Regression                68.3%    70.9%    66.3%    71%      71%      72.3%
Random Forest                                69%      65.7%    69.3%    63.2%    68.7%    68.2%

Page 17: April Madness

Logistic Regression - Significant Features

Variable Importance

Page 18: April Madness

Logistic Regression Details

Confusion Matrix

                Reference
Prediction      Team 2 Wins   Team 1 Wins
Team 2 Wins     45            15
Team 1 Wins     23            51

• Model output on test data set

• Test dataset includes season averages from 2013, 2014 and 2015

• The outcome variable corresponds to the results of the 2013, 2014 and 2015 NCAA March Madness match-ups

AUC = 74.3%
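The test-set accuracy, sensitivity, and specificity reported for logistic regression can be recomputed from this confusion matrix. The sketch below treats "Team 2 Wins" as the positive class; the slides do not state this, but it is the choice that reproduces the reported 71.6%, 66.2%, and 77.3%. The function name is hypothetical.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of actual positives predicted positive
        "specificity": tn / (tn + fp),  # share of actual negatives predicted negative
    }


# Positive class assumed to be "Team 2 Wins" (rows = prediction, columns = reference)
metrics = classification_metrics(
    tp=45,  # predicted Team 2 wins, Team 2 won
    fn=23,  # predicted Team 1 wins, Team 2 won
    fp=15,  # predicted Team 2 wins, Team 1 won
    tn=51,  # predicted Team 1 wins, Team 1 won
)
percentages = {name: round(100 * value, 1) for name, value in metrics.items()}
```

With 134 test match-ups, accuracy is 96/134, sensitivity is 45/68, and specificity is 51/66.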

Page 19: April Madness

Simulation Process
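A single-elimination simulation of this kind can be sketched as a Monte Carlo loop. This is a minimal illustration, not the authors' simulator: `win_prob` is a hypothetical stand-in for the fitted match-up model, the bracket is assumed to hold a power-of-two number of teams in pairing order, and the default of 200 runs is a guess (the reported percentages are all multiples of 0.5%, which would be consistent with 200 simulated brackets, but the slides do not state the count).

```python
import random


def simulate_bracket(teams, win_prob, rng):
    """Play one single-elimination bracket; return {team: number of wins}.

    teams is listed in bracket order, so neighbours meet in the first round.
    win_prob(a, b) returns the probability that a beats b.
    """
    wins = {team: 0 for team in teams}
    alive = list(teams)
    while len(alive) > 1:
        next_round = []
        for a, b in zip(alive[0::2], alive[1::2]):
            winner = a if rng.random() < win_prob(a, b) else b
            wins[winner] += 1
            next_round.append(winner)
        alive = next_round
    return wins


def exit_round_shares(teams, win_prob, team, n_sims=200, seed=0):
    """Share of simulations in which `team` exits after 0, 1, 2, ... wins.

    The last entry is the share of simulations in which the team wins it all.
    """
    rng = random.Random(seed)
    n_rounds = len(teams).bit_length() - 1  # 4 teams -> 2 rounds, 64 -> 6
    counts = [0] * (n_rounds + 1)
    for _ in range(n_sims):
        counts[simulate_bracket(teams, win_prob, rng)[team]] += 1
    return [count / n_sims for count in counts]


# Toy usage with a coin-flip model (hypothetical; a real run would plug in
# the predicted probability from the match-up model)
toy_teams = ["A", "B", "C", "D"]
toy_shares = exit_round_shares(toy_teams, lambda a, b: 0.5, "A")
```

Fixing the random seed makes a run reproducible, which helps when comparing the 6-seed and 7-seed scenarios.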

Page 20: April Madness

SMU Predicted Performance

The graph is read as follows:

• When SMU is a 6 seed, its tournament ends in the Sweet 16 in 1% of simulations

• Likewise, when seeded 7, SMU has a 0.5% chance of winning the tournament

SMU Predicted Performance for NCAA 2016 (y-axis: Total Visits, as a share of simulations)

Exit Round          SMU Seed 6   SMU Seed 7
Round 1             42.0%        10.5%
Round 2             55.5%        66.5%
Sweet 16            1.0%         9.5%
Elite 8             1.0%         7.5%
Final 4             0.5%         4.0%
Championship Game   0.0%         1.5%
Champions           0.0%         0.5%

Page 21: April Madness

SMU Predicted Performance

The cumulative probability graph tells us that when SMU is seeded 6, its probability of making it to at least the Elite 8 is 1.5%.

Cumulative Probability of SMU Advancing to Each Round

Round               SMU 7 Seed   SMU 6 Seed
Round 2             89.5%        58.0%
Sweet 16            23.0%        2.5%
Elite 8             13.5%        1.5%
Final 4             6.0%         0.5%
Championship Game   2.0%         0.0%
Champions           0.5%         0.0%
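The cumulative values follow directly from the exit-round percentages: the chance of reaching a round is whatever remains after subtracting every earlier exit. A short sketch (the function name is hypothetical) reproduces both series from the exit-round data:

```python
ROUNDS = ["Round 1", "Round 2", "Sweet 16", "Elite 8",
          "Final 4", "Championship Game", "Champions"]


def advance_probabilities(exit_shares):
    """Convert exit-round percentages into the percentage reaching each later round.

    exit_shares[i] is the percentage of simulations that end at ROUNDS[i],
    with the last entry being tournaments won.
    """
    reach = {}
    remaining = sum(exit_shares)  # 100% of simulations start in Round 1
    for round_name, share in zip(ROUNDS[1:], exit_shares[:-1]):
        remaining -= share        # drop the simulations that ended one round earlier
        reach[round_name] = round(remaining, 1)
    return reach


seed6 = advance_probabilities([42.0, 55.5, 1.0, 1.0, 0.5, 0.0, 0.0])
seed7 = advance_probabilities([10.5, 66.5, 9.5, 7.5, 4.0, 1.5, 0.5])
```

For example, as a 6 seed SMU reaches the Elite 8 in 100 - 42.0 - 55.5 - 1.0 = 1.5% of simulations.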

Page 22: April Madness

Comparison with Vegas probabilities

[Figure: Championship Probability for top 8 teams & SMU]

Data labels by series:

Vegas Probability          15.38%   14.29%   12.50%   8.33%   8.33%   6.25%   5.56%    5.56%   0.00%
Probability - SMU Seed 6   13.0%    27.0%    14.5%    3.5%    4.5%    0.0%    20.5%    2.5%    0.0%
Probability - SMU Seed 7   13.50%   22.00%   17.00%   4.50%   4.50%   0.00%   21.50%   2.00%   1.50%

Overall, the model overestimates the championship probability for some teams and underestimates it for others.

The model underestimates the probability of winning for Villanova (the ultimate winner of NCAA March Madness 2016).
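The slides do not say how the Vegas probabilities were derived. The reported values are, however, consistent with the usual conversion of fractional betting odds to an implied probability, p = 1 / (odds + 1); for instance, 6/1 gives 14.29% and 17/1 gives 5.56%. The sketch below shows that conversion as an assumption; the function name and the odds listed are hypothetical, chosen only because they reproduce the chart values.

```python
def implied_probability(numerator, denominator=1):
    """Implied win probability for fractional odds numerator/denominator (e.g. 6/1)."""
    return denominator / (numerator + denominator)


# Hypothetical odds that reproduce the distinct Vegas values in the chart
assumed_odds = [(11, 2), (6, 1), (7, 1), (11, 1), (15, 1), (17, 1)]
vegas_percentages = [round(100 * implied_probability(n, d), 2) for n, d in assumed_odds]
```

Implied probabilities taken straight from odds include the bookmaker's margin, so across a full field they typically sum to more than 100%; that is worth keeping in mind when comparing them with simulated frequencies.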

Page 23: April Madness

North Carolina and Villanova Predicted Performance

Round               North Carolina   Villanova
Round 1             6.0%             10.5%
Round 2             7.5%             22.0%
Sweet 16            26.5%            26.5%
Elite 8             19.5%            22.5%
Final 4             18.5%            8.5%
Championship Game   7.5%             7.5%
Champions           14.5%            2.5%

Predicted performance for the finalists

• North Carolina and Villanova were the finalists

• Our model gives North Carolina a 14.5% chance of winning

• However, the winner was Villanova (2.5% probability of winning according to our model)

Page 24: April Madness

Feature Importance across models

Page 25: April Madness

Conclusion

According to our seeding model and bracket simulator, SMU would have had a 1.5% probability of winning the championship had they been eligible for postseason play. As the tournament goes on, the probabilities dwindle, but the ever-present possibility of an upset and their strong regular season performance would have made the Mustangs a formidable opponent for any team in any round.

Further Questions

• Further investigation into prediction of seeding

• Simulate more brackets of different seedings, readings

• Investigate independence of games and include player information in the model

• Include past March Madness results

Page 26: April Madness

Thank you