The Math Behind the March Madness Tournament and College Football Playoff
Laura Albert McLay
Associate Professor, ISYE
[email protected]
@lauramclay
@badgerbrackets
http://bracketology.engr.wisc.edu/
Let’s start with the 2 minute version of my talk
https://www.facebook.com/UWMadison/videos/10154004638653114/
First of all…
I’m an industrial and systems engineering professor by day and a bracketologist by night!
I study systems
A system is a set of things—people, cells, vehicles, basketball teams, or whatever—interconnected in such a way that they produce their own pattern of behavior over time.
My discipline is operations research: the science of making decisions using advanced analytical methods
Our world is becoming increasingly complex and increasingly connected
Systems matter!
Mathematical models and systems thinking help us study systems and navigate the complex, interconnected world we live in.
What do we hope to learn from probability models like Markov chains?
• How do we draw conclusions from limited data?
• How can we make data-driven decisions in the presence of uncertainty?
How I got started in bracketology
In 2014 someone suggested I examine bracketology in the context of the first College Football Playoff…
…and so began Badger Bracketology
My objective: forecast which teams would make the first college football playoff before the season was over.
Markov chains: The Little Engine that Could
Markov chains: a type of math model for understanding how a system can evolve over time.
Uses: finance, epidemiology, queues, zombies
Markov chains for ranking teams in a nutshell
Each team is a state. A team “votes” for the teams that it loses to.
http://sumnous.github.io/blog/2014/07/24/gephi-on-mac/
Graph of 2014 college football season
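The "voting" idea above can be sketched in a few lines. This is a minimal illustration, not the model used in the talk: the teams and results are made up, each game counts as one full vote (no score differentials or home/away adjustment), and the `damping` term is an assumption borrowed from PageRank so that an undefeated team does not absorb every vote.

```python
import numpy as np

def markov_ratings(teams, games, damping=0.9, iters=1000):
    """Rate teams by the long-run share of 'votes' each one collects.

    games is a list of (winner, loser) pairs. The damping value is an
    assumption for this sketch, not a parameter from the talk.
    """
    idx = {t: k for k, t in enumerate(teams)}
    n = len(teams)
    P = np.zeros((n, n))
    for winner, loser in games:
        w, l = idx[winner], idx[loser]
        P[l, w] += 1.0  # the loser votes for the team it lost to
        P[w, w] += 1.0  # the winner keeps its own vote
    # Normalize each row; a team with no games spreads its vote evenly.
    row_sums = P.sum(axis=1, keepdims=True)
    P = np.where(row_sums > 0, P / np.where(row_sums > 0, row_sums, 1.0), 1.0 / n)
    # Mix in a small uniform jump so the chain has a unique long-run distribution.
    P = damping * P + (1.0 - damping) / n
    pi = np.full(n, 1.0 / n)
    for _ in range(iters):
        pi = pi @ P  # power iteration toward the stationary distribution
    return dict(zip(teams, pi))

# Made-up results for three hypothetical teams: (winner, loser)
teams = ["A", "B", "C"]
games = [("A", "B"), ("A", "C"), ("B", "C")]
ratings = markov_ratings(teams, games)
ranking = sorted(teams, key=ratings.get, reverse=True)
print(ranking)
```

Because votes flow along the schedule graph, a win over a team that itself collects many votes is worth more, which is how strength of schedule enters without a human in the loop.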
Simple yet powerful idea
Automatically rate and rank teams by taking advantage of the network structure of the matchups
• Use Markov chains to account for strength of schedule
• Do not need a human in the loop
Simple data requirements:
1. Game outcomes (score differentials),
2. Home/away status
Takes difficulty of future games into account in football playoff forecasts
• Polls give the ranking right now, which only gives insight into a playoff held today
Google PageRank is a Markov model!
Source: google.com
Do you remember Internet searches before Google?
https://www.wordstream.com/articles/internet-search-engines-history
First, let’s talk about ranking basketball teams
Transitions
Rutgers 52 @ Wisconsin 72
[Diagram: two-state Markov chain with states Wisconsin and Rutgers; the transitions are labeled 𝑊 and 1 − 𝑊]
How much credit should Wisconsin get for beating Rutgers by 20 at home?
𝑊 = effective wins (fraction of a vote), which help us compute our Markov chain transition probabilities
Let’s find a data-driven answer!
Given that team 𝑖 beat team 𝑗 by 𝑥 points at home, what is the
probability that 𝑖 is a better team than 𝑗 on a neutral court?
Data: Some teams play twice per season (home and away)
Given that team 𝑖 beat team 𝑗 by 𝑥 points at home, what is the probability that 𝑖 is a better team than 𝑗 on 𝑗's home court?
$r_x^H$ ($r_x^A$) = probability that a team that outscores its opponent by $x$ points at home $H$ (away $A$) is better than its opponent at a neutral site $N$
Developed by Sokol, Kvam, Nemhauser, and Brown at Georgia Tech to rank NCAA men’s basketball teams https://www2.isye.gatech.edu/~jsokol/lrmc/
What is the probability you win your next game (on the road) given that you win by 20 at home?
Logistic regression to the rescue!
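A minimal sketch of that regression step follows. The data here are synthetic, generated from made-up coefficients (`true_a`, `true_b`), and are not real game results; the fitting routine is a plain Newton's method written with NumPy rather than whatever software was used for the talk.

```python
import numpy as np

def fit_logistic(x, y, steps=25):
    """Fit P(y = 1 | x) = 1 / (1 + exp(-(a*x + b))) by Newton's method."""
    X = np.column_stack([x, np.ones_like(x)])
    beta = np.zeros(2)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                       # gradient of the log-likelihood
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # negative Hessian
        beta += np.linalg.solve(hess, grad)
    return beta  # (a, b)

# Synthetic home-and-home pairs (NOT real data): x is the home margin in the
# first game, y is 1 if the same team also won the return game on the road.
rng = np.random.default_rng(0)
true_a, true_b = 0.03, -0.3  # hypothetical values chosen for illustration
x = rng.integers(-30, 31, size=2000).astype(float)
y = (rng.random(2000) < 1.0 / (1.0 + np.exp(-(true_a * x + true_b)))).astype(float)

a, b = fit_logistic(x, y)
p20 = 1.0 / (1.0 + np.exp(-(a * 20 + b)))
print(f"Estimated P(win on the road | won by 20 at home) = {p20:.3f}")
```

The fitted curve answers the question on the slide for any margin 𝑥, not only 20, which is what lets every game feed the Markov chain.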
Problem 1: must win by 50+ points to get a lot of credit for a win!
Winning/losing close games gives you the same amount of “credit”
[Figure: probability of winning on the road next time versus margin of victory 𝑥]
Problem 2: We need to get neutral site win probabilities
Logistic regression for NCAA men’s basketball
• Use log(point differentials) instead!
• Do not truncate point differentials
[Figure: effective wins (0 to 1) versus point differential (-30 to 30)]
Winning matters
• Average in a pure win/loss model to give more credit for winning the game
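The averaging step can be sketched as below. The coefficients `a` and `b` and the default `weight` are hypothetical placeholders (the deck's plots show weights of 1/3 and 2/3 but do not list the fitted coefficients), so treat this as an illustration of the shape of the curve only.

```python
import math

def effective_wins(x, a=0.03, b=0.0, weight=1/3):
    """Blend a logistic curve with a pure win/loss step.

    x is the point differential from the winner-or-loser's own view
    (positive = won). a, b, and weight are illustrative assumptions.
    """
    logistic = 1.0 / (1.0 + math.exp(-(a * x + b)))
    win = 1.0 if x > 0 else 0.0  # pure win/loss model (no ties in these sports)
    return weight * win + (1.0 - weight) * logistic

for margin in (-20, -1, 1, 20):
    print(margin, round(effective_wins(margin), 4))
```

The step at zero is the point of the blend: a one-point win and a one-point loss no longer earn nearly identical credit, while larger margins still move the value along the logistic curve.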
[Figures: two panels of effective wins (0 to 1) versus point differential (-30 to 30)]
Putting it all together
• End up with the red line!
[Figures: two panels of effective wins (0 to 1) versus point differential (-30 to 30)]
Markov chain transition probabilities
Rutgers 52 @ Wisconsin 72 *
[Diagram: two-state Markov chain with states Wisconsin and Rutgers; the transitions are labeled 𝑊 and 1 − 𝑊]
How much credit should Wisconsin get for beating Rutgers by 20 at home?
P(UW beats Rutgers on a neutral court) = 0.6255
𝑊 = 0.6817 effective wins (fraction of a vote)
* Wisconsin 61 @ Rutgers 54 later on 1/28/2017
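As a small worked example, the single game above can be written as a two-state chain. The assignment of 𝑊 and 1 − 𝑊 to specific arrows is an assumption here (the vote stays with or moves to Wisconsin with probability 𝑊), since the extracted diagram does not preserve which label sits on which arrow.

```python
import numpy as np

W = 0.6817  # effective wins for Wisconsin from this game (value from the slide)

teams = ["Wisconsin", "Rutgers"]
# Assumed arrow labels: from either state, the vote ends at Wisconsin with
# probability W and at Rutgers with probability 1 - W.
P = np.array([
    [W, 1.0 - W],  # from Wisconsin: stay with W, move to Rutgers with 1 - W
    [W, 1.0 - W],  # from Rutgers: move to Wisconsin with W, stay with 1 - W
])

pi = np.array([0.5, 0.5])
for _ in range(50):
    pi = pi @ P  # long-run share of votes for each team

print(dict(zip(teams, pi.round(4))))
```

With only one game, the long-run vote share simply reproduces 𝑊 and 1 − 𝑊; the ranking becomes informative once every game in the season contributes its own pair of transitions.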
Transitions
Same idea for the rest of the games…
[Diagram: Markov chain with states Wisconsin, Minnesota, Northwestern, Rutgers, and Illinois]
Current rankings
3/12/2017 Selection Sunday
1. Gonzaga
2. Villanova
3. Kentucky
4. SMU
5. Wichita St
6. Arizona
7. UCLA
8. Duke
9. Cincinnati
10. Oregon
11. MTSU
12. North Carolina
13. St Marys CA
14. West Virginia
15. Kansas
16. Nevada
17. Purdue
18. Vermont
19. UNC Wilmington
20. Michigan
21. Florida St
22. VA Commonwealth
23. Notre Dame
24. Bucknell
25. Wisconsin
The B1G, ranked.
3/12/2017
17. Purdue
20. Michigan
25. Wisconsin
41. Northwestern
43. Minnesota
54. Maryland
78. Indiana
87. Michigan St
121. Iowa
130. Illinois
141. Ohio St
176. Penn St
187. Rutgers
242. Nebraska
How did we do last year?
3/13/2016 Selection Sunday
1. North Carolina
2. Kansas
3. Villanova
4. Michigan St
5. Virginia
6. West Virginia
7. Oklahoma
8. Kentucky
9. Oregon
10. Purdue
11. Xavier
12. Miami FL
13. Duke
14. Utah
15. Texas A&M
16. Louisville
17. Maryland
18. Arizona
19. Seton Hall
20. Iowa St
21. Indiana
22. California
23. Baylor
24. St Josephs PA
25. Iowa
Now let’s talk about the College Football Playoff
College Football Playoff
Objective: determine which teams would make the first college football playoff.
Goal: to forecast the top 4 teams weeks before the season ends.
Solution method: a ranking method.
Challenge: need to simulate the remainder of the season and rank the teams at the end of the (simulated) season.
Giant assumption
• We assume the selection committee will pick the top four ranked teams for the playoff.
• History suggests that humans prefer the most deserving teams rather than the best teams in the national championship game.
• E.g., 2013 Alabama lost on a fluke play.
• …but the College Football Selection Committee might have changed this!
2013 BCS Rankings just before bowl bids
College football playoff committee rankings
2014 Playoff rankings 2015 Playoff rankings
How we did last year
2016 Playoff Rankings
Badger Bracketology rankings
1 Alabama
2 Ohio State
3 Clemson
3 Washington
5 Michigan
6 Penn State
7 Western Michigan
8 Louisville
9 Oklahoma
10 Wisconsin :(
Model: two parts
0. Observe a few (7-8) weeks of game outcomes
1. Ranking.
• Assign a rating to each team to rank the teams.
• Similar to what we had before but with college football data
2. Game simulation.
• Determine who wins a game based on the team ratings.
• Simulate the next week’s game outcomes.
Combine these:
• Re-rate and re-rank after each week of games.
• Simulate the remainder of the season.
• Report teams most likely to be in the top 4
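The rate, simulate, and re-rate loop above can be sketched as follows. Everything specific here is an assumption for illustration: `rate` is a stand-in that uses win fraction (the talk uses a Markov chain rating in that slot), `scale` is a made-up steepness for turning rating gaps into win probabilities, and the teams and schedule are invented.

```python
import math
import random
from collections import Counter

def rate(teams, results):
    """Stand-in rating: fraction of games won (placeholder for the Markov chain rating)."""
    wins = Counter(w for w, _ in results)
    played = Counter()
    for w, l in results:
        played[w] += 1
        played[l] += 1
    return {t: (wins[t] / played[t] if played[t] else 0.5) for t in teams}

def simulate_season(teams, observed, remaining_weeks, n_sims=1000, top_k=4,
                    scale=4.0, seed=0):
    """Count how often each team finishes in the top_k over n_sims simulated seasons."""
    rng = random.Random(seed)
    made_top = Counter()
    for _ in range(n_sims):
        results = list(observed)
        for week in remaining_weeks:
            ratings = rate(teams, results)  # re-rate after each week of games
            for home, away in week:
                gap = ratings[home] - ratings[away]
                p_home = 1.0 / (1.0 + math.exp(-scale * gap))
                if rng.random() < p_home:
                    results.append((home, away))  # (winner, loser)
                else:
                    results.append((away, home))
        final = rate(teams, results)
        ranked = sorted(teams, key=final.get, reverse=True)
        made_top.update(ranked[:top_k])
    return made_top

# Invented four-team example: two observed games, two weeks still to play.
teams = ["A", "B", "C", "D"]
observed = [("A", "B"), ("C", "D")]
remaining = [[("A", "C"), ("B", "D")], [("A", "D"), ("B", "C")]]
counts = simulate_season(teams, observed, remaining, n_sims=200, top_k=2)
print(counts.most_common())
```

The counts play the role of the "out of 1000" figures reported later in the deck: a forecast is the number of simulated seasons in which a team ends inside the cutoff.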
Score differentials
Yes, running up the score matters, mathematically.
[Figure: Histogram of score differentials, 2012-2014; x-axis: home score - away score; y-axis: frequency]
Capped score differentials
38% of conference games fall beyond the cap
[Figure: Histogram of score differentials capped at +/-21, 2012-2014; x-axis: home score - away score; y-axis: frequency]
Note: Rating systems used by College Football Playoff committee must use wins/losses only (not score differentials). Running up the score makes a difference!
[Figure: effective wins (0 to 1) versus point differential (-20 to 20), with curves labeled $S_x^H$, $r_x^H$, and $r_x^N$]
Build the Markov chain for football
• Use 3 seasons of data (score differentials truncated at +/-21)
• Use games played in consecutive years to identify win probabilities to feed into the Markov chain
[Figure: three curves plotted over point differentials from -20 to 20 on a 0 to 1 scale: logistic regression; logistic regression averaged with win (weight = 2/3); logistic regression averaged with win (weight = 1/3)]
Modified Log Logistic Regression Markov Chain (ln(mLRMC))
• Same as mLRMC except that we consider log point differentials to dampen big score differentials
• Do not truncate point differentials
[Figure: effective wins (0 to 1) versus point differential (-20 to 20), with curves: logistic regression (home team); logistic regression averaged with win]
Markov chain transitions
Use mLRMC and ln(mLRMC) for all games
Simulate the rest of the season!
0. Observe a few (7-8) weeks of game outcomes
1. Ranking.
• Assign a rating to each team to rank the teams.
2. Game simulation.
• Determine who wins a game based on the team ratings.
• Simulate the next week’s game outcomes.
Combine these:
• Re-rate and re-rank after each week of games.
• Simulate the remainder of the season.
• Report teams most likely to be in the top 4
Win probability parameters
The win probability between teams $i$ and $j$, where $i$ is the home team, is captured by the best-fit logistic regression model using two years of game data:
$$p_{ij} = \frac{e^{b + a(r_i - r_j)}}{1 + e^{b + a(r_i - r_j)}}$$
where $r_i - r_j$ is the difference in ratings between the two teams.
Simulated games also assign a point differential to the winner.
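A direct transcription of the win probability formula, plus one way to draw a simulated winner from it, might look like this. The values passed for `a` and `b` in the example are hypothetical (the deck does not list the fitted parameters), and how a point differential is assigned to the winner is not specified in the slides, so it is left out.

```python
import math
import random

def win_probability(r_home, r_away, a, b):
    """p_ij = e^(b + a(r_i - r_j)) / (1 + e^(b + a(r_i - r_j))), with i the home team."""
    z = b + a * (r_home - r_away)
    return 1.0 / (1.0 + math.exp(-z))  # algebraically equal to e^z / (1 + e^z)

def simulate_game(r_home, r_away, a, b, rng):
    """Return True if the home team wins one simulated game."""
    return rng.random() < win_probability(r_home, r_away, a, b)

rng = random.Random(0)
# Hypothetical ratings and parameters, chosen only to exercise the functions.
p = win_probability(r_home=0.030, r_away=0.020, a=50.0, b=0.2)
home_wins = sum(simulate_game(0.030, 0.020, 50.0, 0.2, rng) for _ in range(1000))
print(round(p, 3), home_wins)
```

In this form, `b` acts as the home advantage: with equal ratings the home team's win probability is above one half whenever `b` is positive.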
Game prediction accuracy (averaged per game)
Statistic             Model       Training set   Test set
Mean Absolute Error   mLRMC       0.2043         0.3152
                      ln(mLRMC)   0.2026         0.3162
Mean Squared Error    mLRMC       0.1006         0.1885
                      ln(mLRMC)   0.0999         0.1897
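For readers who want to reproduce statistics of this kind, here is one plausible way to compute them. It assumes each game is scored by comparing the predicted win probability with the 0/1 outcome, which the slides do not state explicitly; the outcomes and probabilities below are made-up numbers.

```python
def mean_absolute_error(outcomes, probs):
    """Average of |outcome - predicted probability| over games."""
    return sum(abs(y - p) for y, p in zip(outcomes, probs)) / len(outcomes)

def mean_squared_error(outcomes, probs):
    """Average of (outcome - predicted probability)^2 over games (the Brier score)."""
    return sum((y - p) ** 2 for y, p in zip(outcomes, probs)) / len(outcomes)

# Made-up example: 1 = home team won, paired with the predicted home win probability.
outcomes = [1, 0, 1, 1]
probs = [0.8, 0.3, 0.6, 0.9]
mae = mean_absolute_error(outcomes, probs)
mse = mean_squared_error(outcomes, probs)
print(round(mae, 4), round(mse, 4))
```

Under this reading, a model that always predicted 0.5 would score 0.5 and 0.25, so lower values indicate sharper, better-calibrated forecasts.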
College football playoff committee rankings
2016 Playoff rankings 2015 Playoff rankings
2016 Results: Rankings (NOW)
Team        Method          Week 7  Week 8  Week 9  Week 10  Week 11  Week 12  Week 13  Week 14
Alabama     CFP Committee   -       -       1       1        1        1        1        1
            ln(mLRMC)       1       1       1       1        1        1        1        1
Clemson     CFP Committee   -       -       2       2        4        4        3        2
            ln(mLRMC)       3       4       3       3        5        4        4        3
OSU         CFP Committee   -       -       6       5        2        2        2        3
            ln(mLRMC)       4       5       5       4        3        3        2        2
Washington  CFP Committee   -       -       5       4        6        5        4        4
            ln(mLRMC)       8       7       6       6        6        6        5        3*
* Clemson and Washington were tied
2016 Results: Rankings
Forecasted ranking of likelihood to make playoff (any seed, out of 1000)
Team Method Week 7 Week 8 Week 9 Week 10 Week 11 Week 12 Week 13 Week 14
Alabama
  ln(mLRMC) Forecasted ranking: 1 1 1 1 1 1 NA
  ln(mLRMC) now ranking: 1 1 1 1 1 1 1 1
Clemson
  ln(mLRMC) Forecasted ranking: 2 2 2 2 2 3 NA
  ln(mLRMC) now ranking: 3 4 3 3 5 4 4 3
OSU
  ln(mLRMC) Forecasted ranking: 5 5 5 4 5 2 NA
  ln(mLRMC) now ranking: 4 5 5 4 3 3 2 2
Washington
  ln(mLRMC) Forecasted ranking: 3 4 4 6 4 4 NA
  ln(mLRMC) now ranking: 8 7 6 6 6 6 5 3*
* Clemson and Washington were tied
2015 Results: Rankings (NOW)
Team        Method          Week 7  Week 8  Week 9  Week 10  Week 11  Week 12  Week 13  Week 14
Clemson     CFP Committee   -       -       1       1        1        1        1        1
            mLRMC           2       3       1       1        2        1        3        2
            ln(mLRMC)       7       5       1       1        2        1        1        2
Alabama     CFP Committee   -       -       4       2        2        2        2        2
            mLRMC           4       5       8       3        1        2        1        1
            ln(mLRMC)       5       4       6       2        1        2        2        1
MSU         CFP Committee   -       -       7       13       9        5        5        3
            mLRMC           6       4       5       9        9        5        5        3
            ln(mLRMC)       6       2       4       7        8        4        4        3
Oklahoma    CFP Committee   -       -       15      12       7        3        3        4
            mLRMC           16      13      13      8        5        3        1        4
            ln(mLRMC)       18      12      16      10       5        5        3        4
2015 Results: Forecasted number of times to make playoff (out of 1000)
Team        Method      Week 7  Week 8  Week 9  Week 10  Week 11  Week 12  Week 13
Clemson     mLRMC       667     897     931     905      915      949      956
            ln(mLRMC)   749     840     897     893      955      923      976
Alabama     mLRMC       361     209     166     837      913      943      995
            ln(mLRMC)   427     240     197     858      847      931      996
MSU         mLRMC       179     213     261     54       24       569      675
            ln(mLRMC)   226     349     354     115      162      573      706
Oklahoma    mLRMC       20      46      71      119      393      758      1000
            ln(mLRMC)   12      73      16      63       142      247      1000
• Nebraska beats MSU
• MSU beats The OSU
• No Big12 championship
• Slight difference in rankings: 3rd/4th vs. 5th/6th
2015 Results: Forecasted ranking of likelihood to make playoff (any seed, out of 1000)
Team        Method      Week 7  Week 8  Week 9  Week 10  Week 11  Week 12  Week 13  Week 14
Clemson     mLRMC       2       1       1       1        1        1        3        2
            ln(mLRMC)   2       1       1       1        2        2        3        2
Alabama     mLRMC       5       7       6       2        2        2        2        1
            ln(mLRMC)   4       6       8       2        1        1        2        1
MSU         mLRMC       7       6       7       11       10       4        4        3
            ln(mLRMC)   6       5       5       9        6        3        4        3
Oklahoma    mLRMC       18      13      13      9        4        3        1        4
            ln(mLRMC)   21      15      14      12       8        6        1        4
• No Big12 championship
• No simulation: the season is over. We think the committee got it right!
• Ranked 2nd & 7th after week 7
• Ranked 5th & 4th after week 9
2015 Results: What happened to The Ohio State University?
Rankings after week 12    Forecasted rankings after week 12
1. Clemson                1. Clemson
2. Alabama                2. Alabama
3. Oklahoma               3. Oklahoma
4. Notre Dame             4. Michigan State
5. Michigan State         5. Iowa
6. Ohio State             6. Notre Dame
7. Iowa                   7. Stanford
8. Florida                8. Florida
9. Michigan               9. Ohio State
10. Stanford              (no other teams have >1% chance of making the playoff)
Final thoughts about March Madness
Picking the perfect bracket
There are about 9.2 quintillion ways to fill out a bracket…
…and 1 way to fill out a perfect bracket
The odds of filling out a perfect bracket are not 9-quintillion-to-1 because:
(a) the tournament isn’t like the lottery where every outcome is equally likely, and
(b) monkeys are not randomly selecting game outcomes. Instead, people are purposefully selecting outcomes.
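The 9.2 quintillion figure is the number of ways to pick 63 games, and point (a) can be made concrete with a short calculation. The per-game accuracy of 0.7 used below is a hypothetical value for illustration, and treating every pick as independent with the same accuracy is a simplifying assumption.

```python
n_games = 63  # games in a 64-team single-elimination bracket
total_brackets = 2 ** n_games
print(f"{total_brackets:,} possible brackets")

def perfect_bracket_chance(p_correct, n_games=63):
    """Chance of a perfect bracket if each pick is right with probability p_correct.

    Assumes independent games with identical accuracy, which real brackets do not satisfy.
    """
    return p_correct ** n_games

coin_flip = perfect_bracket_chance(0.5)
informed = perfect_bracket_chance(0.7)  # 0.7 is a hypothetical accuracy
print(f"coin flips: 1 in {1 / coin_flip:,.0f}")
print(f"informed picks: 1 in {1 / informed:,.0f}")
```

Even a modest edge per game shrinks the odds by many orders of magnitude, which is why estimates for real brackets are in the billions or trillions rather than quintillions.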
Can math help our odds?
FiveThirtyEight notes that the typical bracket has 2.5 trillion-to-1 odds of being perfect:
• https://fivethirtyeight.com/features/march-madness-perfect-bracket-odds/
BracketOdds at Illinois estimates that a historical average winning bracket performs at 4.4 billion-to-1
• Warren Buffett may have to pay out!
The thing with perfect brackets
They depend on the year.
Let’s only look at how many people correctly select all Final Four teams:
– 1140 of 13 million brackets correctly picked all Final Four teams in 2016
– 182,709 of 11.57 million brackets correctly picked all Final Four teams in 2015 *
– 612 of 11 million brackets correctly picked all Final Four teams in 2014
– 47 of 8.15 million brackets correctly picked all Final Four teams in 2013
– 23,304 of 6.45 million brackets correctly picked all Final Four teams in 2012
– 2 of 5.9 million brackets correctly picked all Final Four teams in 2011
* Only 1 bracket emerged from the round of 64 with all 32 correct picks
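Converting the counts above into rates makes the year-to-year swing easier to see. The figures are the ones listed on the slide; only the arithmetic is added.

```python
# (brackets with all Final Four teams correct, total brackets), from the slide
final_four = {
    2016: (1140, 13_000_000),
    2015: (182_709, 11_570_000),
    2014: (612, 11_000_000),
    2013: (47, 8_150_000),
    2012: (23_304, 6_450_000),
    2011: (2, 5_900_000),
}

rates = {year: correct / total for year, (correct, total) in final_four.items()}
for year in sorted(rates):
    print(f"{year}: about 1 in {round(1 / rates[year]):,}")
```

The spread between the easiest and hardest years covers several orders of magnitude, so any single "odds of a perfect bracket" number hides how much depends on how chalky the tournament turns out to be.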
Tips for winning your office pool
1. Don’t use RPI
• Badger Bracketology (my favorite tool!)
• Logistic Regression Markov Chain (LRMC)
• FiveThirtyEight rankings of tournament teams
• Ken Pomeroy’s rankings
• Sagarin rankings
• Massey Ratings
• ESPN’s BPI rankings
Rankings clearinghouse: http://www.masseyratings.com/cb/compare.htm
2. Pay attention to the seeds
Some seeds generate more upsets than others
• 7/10 seeds and 5/12 seeds
Historically, 6/11 seeds go the longest before facing a 1 or 2 seed.
3. Don’t pick Kansas
• Be strategic. The point is NOT to maximize your points, it’s to get more points than your opponents
• Differentiate your Final Four
• Check ESPN for the top picked teams. Some top teams are overvalued and others are undervalued
• Last year:
• Kansas was selected as the overall winner in 27% of brackets (and in 62% of Final Fours) with a 19% chance of winning (538)
• UNC selected as overall winner in 8% of brackets (with a 15% win probability) and Villanova in 5.5%
http://games.espn.com/tournament-challenge-bracket/2016/en/whopickedwhom
https://projects.fivethirtyeight.com/2016-march-madness-predictions/
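One simple way to quantify "overvalued" and "undervalued" is the ratio of a team's win probability to the share of brackets picking it. The ratio itself is an illustrative heuristic, not a method named in the talk; the inputs are the Kansas and UNC figures quoted above.

```python
# (share of brackets picking the team to win it all, estimated win probability)
champion_picks = {
    "Kansas": (0.27, 0.19),
    "UNC": (0.08, 0.15),
}

# Ratio above 1: the public picks the team less often than its chances warrant.
leverage = {team: win_prob / pick_share
            for team, (pick_share, win_prob) in champion_picks.items()}

for team, value in sorted(leverage.items(), key=lambda kv: kv[1], reverse=True):
    label = "undervalued" if value > 1 else "overvalued"
    print(f"{team}: {value:.2f} ({label})")
```

A pick with a ratio above 1 is shared with fewer opponents when it hits, which matters because the goal is to finish ahead of the pool rather than to maximize expected points.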
4. It’s totally random
A good process yields good outcomes on average
• It does not guarantee the best outcome in any given tournament
Small pools are better if you have a good process
• Scoring can be random
• The more brackets, the higher chance that a “random” bracket will be the best
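The effect of pool size can be illustrated with a toy Monte Carlo. Every modeling choice here is an assumption: bracket scores are drawn from normal distributions, the "good process" is represented by a higher mean (`skill_edge`), and the pool sizes are arbitrary.

```python
import random

def chance_best_bracket_wins(pool_size, skill_edge=1.0, n_trials=2000, seed=0):
    """Estimate how often one skilled bracket outscores pool_size - 1 random ones.

    Scores are modeled as normal draws; skill_edge shifts the skilled
    bracket's mean. Both choices are illustrative assumptions.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_trials):
        mine = rng.gauss(skill_edge, 1.0)
        best_other = max(rng.gauss(0.0, 1.0) for _ in range(pool_size - 1))
        wins += mine > best_other
    return wins / n_trials

small_pool = chance_best_bracket_wins(pool_size=10)
large_pool = chance_best_bracket_wins(pool_size=200)
print(small_pool, large_pool)
```

The skilled bracket is the same in both pools; what changes is the best of the competing scores, which climbs as more entries get a chance at a lucky draw.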
Topics in Sports Analytics
ISYE 601 in Spring 2017!
• Goal: teach students data-driven methods for making better decisions using sports as a vehicle
• Course topics:
  • Linear regression
  • Logistic regression
  • Empirical Bayes
  • Ranking methods
  • Probability models and Markov chains
  • Forecasting
  • Game theory
  • Tournament scheduling
  • Networks (is my team mathematically eliminated from the playoffs?)
  …and more!
In the news!
https://punkrockor.com/in-the-news/
Thank you!
Laura Albert McLay
punkrockOR.com
bracketology.engr.wisc.edu
Twitter: @lauramclay, @badgerbrackets