The Math Behind the March Madness Tournament and College Football Playoff Laura Albert McLay Associate Professor, ISYE [email protected] @lauramclay @badgerbrackets http://bracketology.engr.wisc.edu/

Bracketology talk at the Crossroads of ideas


Page 1: Bracketology talk at the Crossroads of ideas

The Math Behind the March Madness Tournament and College Football Playoff

Laura Albert McLay
Associate Professor, ISYE

[email protected]
@lauramclay
@badgerbrackets
http://bracketology.engr.wisc.edu/

Page 2: Bracketology talk at the Crossroads of ideas

Let’s start with the 2 minute version of my talk

https://www.facebook.com/UWMadison/videos/10154004638653114/

Page 3: Bracketology talk at the Crossroads of ideas

First of all…

I’m an industrial and systems engineering professor by day and a bracketologist by night!

Page 4: Bracketology talk at the Crossroads of ideas

I study systems

A system is a set of things—people, cells, vehicles, basketball teams, or whatever—interconnected in such a way that they produce their own pattern of behavior over time.

My discipline is operations research: the science of making decisions using advanced analytical methods

Page 5: Bracketology talk at the Crossroads of ideas

Our world is becoming increasingly complex and increasingly connected

Systems matter!

Mathematical models and systems thinking help us study systems and navigate the complex, interconnected world we live in.

Page 6: Bracketology talk at the Crossroads of ideas

What do we hope to learn from probability models like Markov chains?

• How do we draw conclusions from limited data?

• How can we make data-driven decisions in the presence of uncertainty?

Page 7: Bracketology talk at the Crossroads of ideas

How I got started in bracketology

In 2014 someone suggested I examine bracketology in the context of the first College Football Playoff…

…and so began Badger Bracketology

My objective: forecast which teams would make the first college football playoff before the season was over.

Page 8: Bracketology talk at the Crossroads of ideas

Markov chains: The Little Engine that Could

Markov chains:

A type of math model for understanding how a system can evolve over time.

Uses: finance, epidemiology, queues, zombies

Page 9: Bracketology talk at the Crossroads of ideas

Markov chains for ranking teams in a nutshell

Each team is a state. A team “votes” for teams that it loses to

http://sumnous.github.io/blog/2014/07/24/gephi-on-mac/

Graph of 2014 college football season

Page 10: Bracketology talk at the Crossroads of ideas

Simple yet powerful idea

Automatically rate and rank teams by taking advantage of the network structure of the matchups

• Use Markov chains to account for strength of schedule

• Do not need a human in the loop

Simple data requirements:

1. Game outcomes (score differentials),

2. Home/away status

Takes difficulty of future games into account in football playoff forecasts

• Polls give the ranking right now, which only gives insight into a playoff held today

Page 11: Bracketology talk at the Crossroads of ideas

Google PageRank is a Markov model!

Source: google.com

Page 12: Bracketology talk at the Crossroads of ideas

Do you remember Internet searches before Google?

https://www.wordstream.com/articles/internet-search-engines-history

Page 13: Bracketology talk at the Crossroads of ideas

First, let’s talk about ranking basketball teams

Page 14: Bracketology talk at the Crossroads of ideas

Transitions

Rutgers 52 @ Wisconsin 72

[Diagram: two-state Markov chain with states Wisconsin and Rutgers; transitions labeled 𝑊 and 1 − 𝑊]

How much credit should Wisconsin get for beating Rutgers by 20 at home?

𝑊 = effective wins (fraction of a vote), which help us compute our Markov chain transition probabilities

Page 15: Bracketology talk at the Crossroads of ideas

Let’s find a data-driven answer!

Given that team 𝑖 beat team 𝑗 by 𝑥 points at home, what is the probability that 𝑖 is a better team than 𝑗 on a neutral court?

Data: Some teams play twice per season (home and away)

Given that team 𝑖 beat team 𝑗 by 𝑥 points at home, what is the probability that 𝑖 is a better team than 𝑗 on 𝑗’s home court?

$r_x^H$ ($r_x^A$) = probability that a team that outscores its opponent by 𝑥 points at home 𝐻 (away 𝐴) is better than its opponent on a neutral 𝑁 site

Developed by Sokol, Kvam, Nemhauser, and Brown at Georgia Tech to rank NCAA men’s basketball teams https://www2.isye.gatech.edu/~jsokol/lrmc/

Page 16: Bracketology talk at the Crossroads of ideas

What is the probability you win your next game (on the road) given that you win by 20 at home?

Page 17: Bracketology talk at the Crossroads of ideas

Logistic regression to the rescue!

Problem 1: must win by 50+ points to get a lot of credit for a win! Winning/losing close games gives you the same amount of “credit”

[Figure: probability of winning on the road next time (vertical axis) vs. margin of victory 𝑥 (horizontal axis)]

Problem 2: We need to get neutral site win probabilities
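As a rough illustration of the curve this slide describes, the sketch below evaluates a logistic function of the home margin of victory. The coefficients `a` and `b` are hypothetical placeholders chosen only to show the shape; they are not the fitted values from the talk or from LRMC.

```python
import math

# Hypothetical coefficients for illustration only (not fitted values).
A_SLOPE = 0.03
B_INTERCEPT = -0.6


def road_win_probability(home_margin, a=A_SLOPE, b=B_INTERCEPT):
    """Logistic estimate of winning the return game on the road,
    given the margin of victory in the home game."""
    z = a * home_margin + b
    return 1.0 / (1.0 + math.exp(-z))


# Print the estimate for a few margins to see how slowly the curve rises.
for margin in (1, 10, 20, 50):
    print(margin, round(road_win_probability(margin), 3))
```

With a shallow slope like this one, the curve climbs slowly, which is the behavior Problem 1 complains about: only very large margins earn a probability close to 1.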

Page 18: Bracketology talk at the Crossroads of ideas

Logistic regression for NCAA men’s basketball

• Use log(point differentials) instead!

• Do not truncate point differentials

[Figure: effective wins (vertical axis, 0 to 1) vs. point differential (horizontal axis, -30 to 30)]

Page 19: Bracketology talk at the Crossroads of ideas

Winning matters

• Average in a pure win/loss model to give more credit for winning the game

[Figures: two plots of effective wins (vertical axis, 0 to 1) vs. point differential (horizontal axis, -30 to 30)]
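A minimal sketch of the averaging idea, assuming a simple weighted average between a step-shaped win/loss model and a logistic curve. The function names, the logistic coefficients, and the default weight are all hypothetical; the slide does not state the values used for basketball.

```python
import math


def logistic_credit(point_diff, a=0.1, b=0.0):
    """Logistic share of a vote for a given point differential.
    a and b are hypothetical coefficients, for illustration only."""
    return 1.0 / (1.0 + math.exp(-(a * point_diff + b)))


def win_loss_credit(point_diff):
    """Pure win/loss model: full credit for a win, none for a loss."""
    if point_diff > 0:
        return 1.0
    if point_diff < 0:
        return 0.0
    return 0.5


def effective_wins(point_diff, win_weight=1 / 3):
    """Weighted average of the win/loss model and the logistic model."""
    return (win_weight * win_loss_credit(point_diff)
            + (1.0 - win_weight) * logistic_credit(point_diff))
```

The blend creates a jump at a point differential of zero, so a one-point win earns noticeably more credit than a one-point loss.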

Page 20: Bracketology talk at the Crossroads of ideas

Putting it all together

• End up with the red line!

[Figures: two plots of effective wins (vertical axis, 0 to 1) vs. point differential (horizontal axis, -30 to 30)]

Page 21: Bracketology talk at the Crossroads of ideas

Markov chain transition probabilities

Rutgers 52 @ Wisconsin 72 *

[Diagram: two-state Markov chain with states Wisconsin and Rutgers; transitions labeled 𝑊 and 1 − 𝑊]

How much credit should Wisconsin get for beating Rutgers by 20 at home?

P(UW beats Rutgers on a neutral court) = 0.6255

𝑊 = 0.6817 effective wins (fraction of a vote)

* Wisconsin 61 @ Rutgers 54 later on 1/28/2017
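To make the voting idea concrete, here is a small sketch that turns effective wins into a transition matrix and approximates the long-run vote share by repeated multiplication. It assumes each team splits one vote per game (𝑊 to the credited team, 1 − 𝑊 to the opponent) and averages over its games. The function names and that normalization are assumptions, not the exact formulation from the talk. Only the first Wisconsin–Rutgers game is used.

```python
def transition_matrix(teams, games):
    """Build a row-stochastic matrix from (team_a, team_b, w_a) records.

    w_a is the effective-win fraction credited to team_a for that game
    (team_b receives 1 - w_a). Assumes every team has played at least once.
    """
    index = {team: k for k, team in enumerate(teams)}
    n = len(teams)
    votes = [[0.0] * n for _ in range(n)]
    games_played = [0] * n
    for team_a, team_b, w_a in games:
        i, j = index[team_a], index[team_b]
        # Each team splits one vote: w_a goes to team_a, 1 - w_a to team_b.
        votes[i][i] += w_a
        votes[i][j] += 1.0 - w_a
        votes[j][i] += w_a
        votes[j][j] += 1.0 - w_a
        games_played[i] += 1
        games_played[j] += 1
    return [[v / games_played[r] for v in votes[r]] for r in range(n)]


def stationary_distribution(P, iterations=1000):
    """Approximate the long-run share of votes by repeated multiplication."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[r] * P[r][c] for r in range(n)) for c in range(n)]
    return pi


teams = ["Wisconsin", "Rutgers"]
games = [("Wisconsin", "Rutgers", 0.6817)]  # W from the slide above
P = transition_matrix(teams, games)
ratings = dict(zip(teams, stationary_distribution(P)))
```

With a single game, the rating of each team is just its share of the vote. Adding more teams and games lets strength of schedule flow through the network.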

Page 22: Bracketology talk at the Crossroads of ideas

Transitions

Same idea for the rest of the games…

[Diagram: Markov chain with states Wisconsin, Minnesota, Northwestern, Rutgers, and Illinois]

Page 23: Bracketology talk at the Crossroads of ideas

Current rankings

3/12/2017 Selection Sunday

1. Gonzaga
2. Villanova
3. Kentucky
4. SMU
5. Wichita St
6. Arizona
7. UCLA
8. Duke
9. Cincinnati
10. Oregon
11. MTSU
12. North Carolina
13. St Marys CA
14. West Virginia
15. Kansas
16. Nevada
17. Purdue
18. Vermont
19. UNC Wilmington
20. Michigan
21. Florida St
22. VA Commonwealth
23. Notre Dame
24. Bucknell
25. Wisconsin

Page 24: Bracketology talk at the Crossroads of ideas

The B1G, ranked.

3/12/2017

17. Purdue
20. Michigan
25. Wisconsin
41. Northwestern
43. Minnesota
54. Maryland
78. Indiana
87. Michigan St
121. Iowa
130. Illinois
141. Ohio St
176. Penn St
187. Rutgers
242. Nebraska

Page 25: Bracketology talk at the Crossroads of ideas

How did we do last year?

3/13/2016 Selection Sunday

1. North Carolina

2. Kansas

3. Villanova

4. Michigan St

5. Virginia

6. West Virginia

7. Oklahoma

8. Kentucky

9. Oregon

10. Purdue

11. Xavier

12. Miami FL

13. Duke

14. Utah

15. Texas A&M

16. Louisville

17. Maryland

18. Arizona

19. Seton Hall

20. Iowa St

21. Indiana

22. California

23. Baylor

24. St Josephs PA

25. Iowa

Page 26: Bracketology talk at the Crossroads of ideas

Now let’s talk about the College Football Playoff

Page 27: Bracketology talk at the Crossroads of ideas

College Football Playoff

Objective: determine which teams would make the first college football playoff.

Goal: to forecast the top 4 teams weeks before the season ends.

Solution method: a ranking method.

Challenge: need to simulate the remainder of the season and rank the teams at the end of the (simulated) season.

Page 28: Bracketology talk at the Crossroads of ideas

Giant assumption

• We assume the selection committee will pick the top four ranked teams for the playoff.

• History suggests that humans prefer the most deserving teams rather than the best teams in the national championship game.

• E.g., 2013 Alabama lost on a fluke play.

• …but the College Football Selection Committee might have changed this!

2013 BCS Rankings just before bowl bids

Page 29: Bracketology talk at the Crossroads of ideas

College football playoff committee rankings

2014 Playoff rankings 2015 Playoff rankings

Page 30: Bracketology talk at the Crossroads of ideas

How we did last year

2016 Playoff Rankings [figure]

Badger Bracketology rankings:

1 Alabama

2 Ohio State

3 Clemson

3 Washington

5 Michigan

6 Penn State

7 Western Michigan

8 Louisville

9 Oklahoma

10 Wisconsin :(

Page 31: Bracketology talk at the Crossroads of ideas

Model: two parts

0. Observe a few (7-8) weeks of game outcomes

1. Ranking.
• Assign a rating to each team to rank the teams.
• Similar to what we had before but with college football data

2. Game simulation.
• Determine who wins a game based on the team ratings.
• Simulate the next week’s game outcomes.

Combine these:
• Re-rate and re-rank after each week of games.
• Simulate the remainder of the season.
• Report teams most likely to be in the top 4

Page 32: Bracketology talk at the Crossroads of ideas

Score differentials

Yes, running up the score matters, mathematically.

[Figure: histogram of score differentials, 2012-2014; horizontal axis: home score - away score; vertical axis: frequency]

Page 33: Bracketology talk at the Crossroads of ideas

Capped score differentials

38% of conference games fall beyond the cap

[Figure: histogram of score differentials capped at +/-21, 2012-2014; horizontal axis: home score - away score; vertical axis: frequency]

Note: Rating systems used by College Football Playoff committee must use wins/losses only (not score differentials). Running up the score makes a difference!

Page 34: Bracketology talk at the Crossroads of ideas

Build the Markov chain for football

• Used 3 seasons of data (truncate scores by +/-21)

• Use games played in consecutive years to identify win probabilities to feed into the Markov chain

[Figure: effective wins (vertical axis, 0 to 1) vs. point differential (horizontal axis, -20 to 20), with curves labeled $S_x^H$, $r_x^H$, and $r_x^N$]

[Figure: same axes, with curves for logistic regression, logistic regression averaged with win (weight = 2/3), and logistic regression averaged with win (weight = 1/3)]

Page 35: Bracketology talk at the Crossroads of ideas

Modified Log Logistic Regression Markov Chain (ln(mLRMC))

• Same as mLRMC except that we consider log point differentials to dampen big score differentials

• Do not truncate point differentials

[Figure: effective wins (vertical axis, 0 to 1) vs. point differential (horizontal axis, -20 to 20), with curves for logistic regression (home team) and logistic regression averaged with win]
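The two ways of handling blowouts can be sketched side by side. The cap of 21 comes from the mLRMC slides. The signed log transform is an assumption, since the slides say only that log point differentials are used.

```python
import math


def capped_differential(point_diff, cap=21):
    """Truncate the score differential to [-cap, +cap], as mLRMC does."""
    return max(-cap, min(cap, point_diff))


def log_differential(point_diff):
    """Signed log transform: keeps the sign, compresses large margins.
    The exact transform used in ln(mLRMC) is not given on the slide;
    sign(x) * ln(1 + |x|) is an assumption."""
    return math.copysign(math.log1p(abs(point_diff)), point_diff)
```

Under the cap, a 35-point win and a 21-point win look identical. Under the log transform they still differ, but by much less than 14 points.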

Page 36: Bracketology talk at the Crossroads of ideas

Markov chain transitions

Use mLRMC and ln(mLRMC) for all games

Page 37: Bracketology talk at the Crossroads of ideas

Simulate the rest of the season!

0. Observe a few (7-8) weeks of game outcomes

1. Ranking.
• Assign a rating to each team to rank the teams.

2. Game simulation.
• Determine who wins a game based on the team ratings.
• Simulate the next week’s game outcomes.

Combine these:
• Re-rate and re-rank after each week of games.
• Simulate the remainder of the season.
• Report teams most likely to be in the top 4

Page 38: Bracketology talk at the Crossroads of ideas

Win probability parameters

The win probability between teams 𝑖 and 𝑗, where 𝑖 is the home team, is captured by the best-fit logistic regression model using two years of game data:

$$p_{ij} = \frac{e^{b + a(r_i - r_j)}}{1 + e^{b + a(r_i - r_j)}}$$

where $r_i - r_j$ = the difference in ratings between the two teams,

and assign a point differential to the winner.
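The following sketch shows one way the win probability formula could drive the season simulation: re-rate, simulate a week, repeat, then count top-4 finishes. It is a simplified illustration. The `rate` argument, the `win_fraction` stand-in rating, the parameter values for `a` and `b`, and the toy league are all hypothetical, and the sketch skips the step of assigning a point differential to the winner.

```python
import math
import random


def win_probability(r_home, r_away, a, b):
    """Logistic home-win probability from the rating difference."""
    z = b + a * (r_home - r_away)
    return 1.0 / (1.0 + math.exp(-z))


def playoff_counts(played, remaining_weeks, rate, a, b, n_sims=1000, seed=0):
    """Count how often each team finishes in the top 4.

    played: list of (home, away, home_won) results observed so far.
    remaining_weeks: list of weeks, each a list of (home, away) games.
    rate: function mapping a list of results to {team: rating}.
    """
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_sims):
        results = list(played)
        for week in remaining_weeks:
            ratings = rate(results)  # re-rate before each simulated week
            for home, away in week:
                p = win_probability(ratings[home], ratings[away], a, b)
                results.append((home, away, rng.random() < p))
        final = rate(results)
        for team in sorted(final, key=final.get, reverse=True)[:4]:
            counts[team] = counts.get(team, 0) + 1
    return counts


def win_fraction(results):
    """Stand-in rating (fraction of games won), not the Markov chain rating."""
    wins, games = {}, {}
    for home, away, home_won in results:
        for team in (home, away):
            games[team] = games.get(team, 0) + 1
            wins.setdefault(team, 0)
        wins[home if home_won else away] += 1
    return {team: wins[team] / games[team] for team in games}


# Toy five-team league with made-up results, purely for illustration.
played = [("A", "B", True), ("C", "D", True), ("E", "A", False),
          ("B", "C", True), ("D", "E", True)]
remaining_weeks = [[("A", "C"), ("B", "D")], [("E", "B"), ("C", "A")]]
counts = playoff_counts(played, remaining_weeks, win_fraction,
                        a=2.0, b=0.1, n_sims=200, seed=1)
```

Each entry of `counts` plays the role of the "out of 1000" numbers in the results tables, here out of 200 simulated seasons.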

Game prediction accuracy (averaged per game)

Statistic            Model      Training set  Test set
Mean Absolute Error  mLRMC      0.2043        0.3152
Mean Absolute Error  ln(mLRMC)  0.2026        0.3162
Mean Squared Error   mLRMC      0.1006        0.1885
Mean Squared Error   ln(mLRMC)  0.0999        0.1897
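For reference, the two statistics can be computed as below. This assumes the errors compare a predicted home-win probability with the 0/1 game outcome, which the slide does not state explicitly. The four forecasts are made up.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute gap between forecast probability and outcome."""
    return sum(abs(p - y) for p, y in zip(predicted, actual)) / len(actual)


def mean_squared_error(predicted, actual):
    """Average squared gap between forecast probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(predicted, actual)) / len(actual)


# Made-up forecasts for four games, for illustration only.
predicted_home_win = [0.8, 0.4, 0.65, 0.3]
home_won = [1, 0, 1, 1]
mae = mean_absolute_error(predicted_home_win, home_won)
mse = mean_squared_error(predicted_home_win, home_won)
```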

Page 39: Bracketology talk at the Crossroads of ideas

College football playoff committee rankings

2016 Playoff rankings2015 Playoff rankings

Page 40: Bracketology talk at the Crossroads of ideas

2016 Results: Rankings (NOW)

Team        Method         Week 7  Week 8  Week 9  Week 10  Week 11  Week 12  Week 13  Week 14
Alabama     CFP Committee  -       -       1       1        1        1        1        1
Alabama     ln(mLRMC)      1       1       1       1        1        1        1        1
Clemson     CFP Committee  -       -       2       2        4        4        3        2
Clemson     ln(mLRMC)      3       4       3       3        5        4        4        3
OSU         CFP Committee  -       -       6       5        2        2        2        3
OSU         ln(mLRMC)      4       5       5       4        3        3        2        2
Washington  CFP Committee  -       -       5       4        6        5        4        4
Washington  ln(mLRMC)      8       7       6       6        6        6        5        3*

* Clemson and Washington were tied

Page 41: Bracketology talk at the Crossroads of ideas

2016 Results: Rankings

Forecasted ranking of likelihood to make playoff (any seed, out of 1000)

Team Method Week 7 Week 8 Week 9 Week 10 Week 11 Week 12 Week 13 Week 14

Alabama
ln(mLRMC) forecasted ranking: 1 1 1 1 1 1 NA
ln(mLRMC) now ranking: 1 1 1 1 1 1 1 1

Clemson
ln(mLRMC) forecasted ranking: 2 2 2 2 2 3 NA
ln(mLRMC) now ranking: 3 4 3 3 5 4 4 3

OSU
ln(mLRMC) forecasted ranking: 5 5 5 4 5 2 NA
ln(mLRMC) now ranking: 4 5 5 4 3 3 2 2

Washington
ln(mLRMC) forecasted ranking: 3 4 4 6 4 4 NA
ln(mLRMC) now ranking: 8 7 6 6 6 6 5 3*

* Clemson and Washington were tied

Page 42: Bracketology talk at the Crossroads of ideas

2015 Results: Rankings (NOW)

Team      Method         Week 7  Week 8  Week 9  Week 10  Week 11  Week 12  Week 13  Week 14
Clemson   CFP Committee  -       -       1       1        1        1        1        1
Clemson   mLRMC          2       3       1       1        2        1        3        2
Clemson   ln(mLRMC)      7       5       1       1        2        1        1        2
Alabama   CFP Committee  -       -       4       2        2        2        2        2
Alabama   mLRMC          4       5       8       3        1        2        1        1
Alabama   ln(mLRMC)      5       4       6       2        1        2        2        1
MSU       CFP Committee  -       -       7       13       9        5        5        3
MSU       mLRMC          6       4       5       9        9        5        5        3
MSU       ln(mLRMC)      6       2       4       7        8        4        4        3
Oklahoma  CFP Committee  -       -       15      12       7        3        3        4
Oklahoma  mLRMC          16      13      13      8        5        3        1        4
Oklahoma  ln(mLRMC)      18      12      16      10       5        5        3        4

Page 43: Bracketology talk at the Crossroads of ideas

2015 Results: Forecasted number of times to make playoff (out of 1000)

Team      Method     Week 7  Week 8  Week 9  Week 10  Week 11  Week 12  Week 13
Clemson   mLRMC      667     897     931     905      915      949      956
Clemson   ln(mLRMC)  749     840     897     893      955      923      976
Alabama   mLRMC      361     209     166     837      913      943      995
Alabama   ln(mLRMC)  427     240     197     858      847      931      996
MSU       mLRMC      179     213     261     54       24       569      675
MSU       ln(mLRMC)  226     349     354     115      162      573      706
Oklahoma  mLRMC      20      46      71      119      393      758      1000
Oklahoma  ln(mLRMC)  12      73      16      63       142      247      1000

Annotations:
• Nebraska beats MSU
• MSU beats The OSU
• No Big12 championship
• Slight difference in rankings: 3rd/4th vs. 5th/6th

Page 44: Bracketology talk at the Crossroads of ideas

2015 Results: Forecasted ranking of likelihood to make playoff (any seed, out of 1000)

Team      Method     Week 7  Week 8  Week 9  Week 10  Week 11  Week 12  Week 13  Week 14
Clemson   mLRMC      2       1       1       1        1        1        3        2
Clemson   ln(mLRMC)  2       1       1       1        2        2        3        2
Alabama   mLRMC      5       7       6       2        2        2        2        1
Alabama   ln(mLRMC)  4       6       8       2        1        1        2        1
MSU       mLRMC      7       6       7       11       10       4        4        3
MSU       ln(mLRMC)  6       5       5       9        6        3        4        3
Oklahoma  mLRMC      18      13      13      9        4        3        1        4
Oklahoma  ln(mLRMC)  21      15      14      12       8        6        1        4

Annotations:
• No Big12 championship
• No simulation: the season is over. We think the committee got it right!
• Ranked 2nd & 7th after week 7
• Ranked 5th & 4th after week 9

Page 45: Bracketology talk at the Crossroads of ideas

2015 Results: What happened to The Ohio State University?

Rankings after week 12    Forecasted rankings after week 12
1. Clemson                1. Clemson
2. Alabama                2. Alabama
3. Oklahoma               3. Oklahoma
4. Notre Dame             4. Michigan State
5. Michigan State         5. Iowa
6. Ohio State             6. Notre Dame
7. Iowa                   7. Stanford
8. Florida                8. Florida
9. Michigan               9. Ohio State
10. Stanford              (no other teams have >1% chance of making the playoff)

Page 46: Bracketology talk at the Crossroads of ideas

Final thoughts about March Madness

Page 47: Bracketology talk at the Crossroads of ideas

Picking the perfect bracket

There are about 9.2 quintillion ways to fill out a bracket…

And 1 way to fill out a perfect bracket

The odds of filling out a perfect bracket are not 9-quintillion-to-1 because:

(a) the tournament isn’t like the lottery where every outcome is equally likely, and

(b) monkeys are not randomly selecting game outcomes. Instead, people are purposefully selecting outcomes.
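The 9.2 quintillion figure is the number of ways to pick a winner in each of the 63 games of a 64-team bracket. The short sketch below reproduces it and, under the simplifying assumption that every pick is right independently with the same probability, shows how quickly the odds improve once picks are better than coin flips. The function name and the accuracy parameter are illustrative.

```python
GAMES_IN_BRACKET = 63  # a 64-team single-elimination bracket has 63 games

# Two possible winners per game, chosen independently across games.
total_brackets = 2 ** GAMES_IN_BRACKET


def perfect_bracket_probability(pick_accuracy, games=GAMES_IN_BRACKET):
    """Chance of a perfect bracket if each pick is right independently
    with the same probability (a simplifying assumption)."""
    return pick_accuracy ** games
```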

Page 48: Bracketology talk at the Crossroads of ideas

Can math help our odds?

FiveThirtyEight notes that the typical bracket has 2.5 trillion-to-1 odds of being perfect:

• https://fivethirtyeight.com/features/march-madness-perfect-bracket-odds/

BracketOdds at Illinois estimates that a historical average winning bracket performs at 4.4 billion-to-1

• Warren Buffett may have to pay out!

Page 49: Bracketology talk at the Crossroads of ideas

The thing with perfect brackets

They depend on the year.

Let’s only look at how many people correctly select all Final Four teams:

– 1140 of 13 million brackets correctly picked all Final Four teams in 2016
– 182,709 of 11.57 million brackets correctly picked all Final Four teams in 2015 *
– 612 of 11 million brackets correctly picked all Final Four teams in 2014
– 47 of 8.15 million brackets correctly picked all Final Four teams in 2013
– 23,304 of 6.45 million brackets correctly picked all Final Four teams in 2012
– 2 of 5.9 million brackets correctly picked all Final Four teams in 2011

* Only 1 bracket emerged from the round of 64 with all 32 correct picks
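To compare the years on a common scale, the counts above can be converted to "1 in N" odds. The numbers are taken from the slide; the variable names are illustrative.

```python
# (year, brackets with all four Final Four teams, total brackets), from the slide
final_four_picks = [
    (2016, 1_140, 13_000_000),
    (2015, 182_709, 11_570_000),
    (2014, 612, 11_000_000),
    (2013, 47, 8_150_000),
    (2012, 23_304, 6_450_000),
    (2011, 2, 5_900_000),
]

# Total brackets divided by correct brackets gives the "1 in N" odds per year.
one_in = {year: total / correct for year, correct, total in final_four_picks}

for year, odds in sorted(one_in.items()):
    print(f"{year}: about 1 in {odds:,.0f}")
```

The spread is enormous: roughly 1 in 63 in 2015 versus 1 in 2.95 million in 2011.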

Page 50: Bracketology talk at the Crossroads of ideas

Tips for winning your office pool

Page 51: Bracketology talk at the Crossroads of ideas

1. Don’t use RPI

• Badger Bracketology (my favorite tool!)

• Logistic Regression Markov Chain (LRMC)

• FiveThirtyEight rankings of tournament teams

• Ken Pomeroy’s rankings

• Sagarin rankings

• Massey Ratings

• ESPN’s BPI rankings

Rankings clearinghouse: http://www.masseyratings.com/cb/compare.htm

Page 52: Bracketology talk at the Crossroads of ideas

2. Pay attention to the seeds

Some seeds generate more upsets than others

• 7/10 seeds and 5/12 seeds

Historically, 6/11 seeds go the longest before facing a 1 or 2 seed.

Page 53: Bracketology talk at the Crossroads of ideas

3. Don’t pick Kansas

• Be strategic. The point is NOT to maximize your points, it’s to get more points than your opponents

• Differentiate your Final Four

• Check ESPN for the top picked teams. Some top teams are overvalued and others are undervalued

• Last year:

  • Kansas was selected as the overall winner in 27% of brackets (and in 62% of Final Fours) with a 19% chance of winning (538)

  • UNC was selected as overall winner in 8% of brackets (with a 15% win probability) and Villanova in 5.5%

http://games.espn.com/tournament-challenge-bracket/2016/en/whopickedwhom

https://projects.fivethirtyeight.com/2016-march-madness-predictions/
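One simple way to spot overvalued and undervalued picks is to divide a team's title probability by the share of brackets that picked it. The ratio is an illustrative heuristic rather than a method from the talk. The figures are the 2016 numbers quoted above.

```python
# 2016 figures quoted on the slide: share of brackets picking the team as
# champion, and the team's title probability.
picked_share = {"Kansas": 0.27, "UNC": 0.08}
win_probability = {"Kansas": 0.19, "UNC": 0.15}

# Ratio above 1: the team is picked less often than its chances warrant.
value_ratio = {team: win_probability[team] / picked_share[team]
               for team in picked_share}
```

Kansas comes out below 1 (overvalued by the pool) and UNC well above 1 (undervalued), which is the point of the "Don’t pick Kansas" tip.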

Page 54: Bracketology talk at the Crossroads of ideas

4. It’s totally random

A good process yields good outcomes on average

• It does not guarantee the best outcome in any given tournament

Small pools are better if you have a good process

• Scoring can be random

• The more brackets, the higher chance that a “random” bracket will be the best
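The pool-size effect can be illustrated with a small Monte Carlo sketch. It assumes bracket scores are normally distributed and that a good process adds a fixed scoring edge. The `edge` and `noise` values are hypothetical, not estimates from real pools.

```python
import random


def chance_best_bracket(pool_size, edge=5.0, noise=10.0, n_sims=2000, seed=0):
    """Estimate how often a bracket with a scoring edge tops the pool.

    Scores are drawn as normal random variables; edge and noise are
    hypothetical numbers. pool_size must be at least 2.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_sims):
        mine = rng.gauss(edge, noise)
        best_other = max(rng.gauss(0.0, noise) for _ in range(pool_size - 1))
        if mine > best_other:
            wins += 1
    return wins / n_sims


small_pool = chance_best_bracket(pool_size=5)
large_pool = chance_best_bracket(pool_size=100)
```

Even with an edge, the chance of finishing first drops sharply as the pool grows, because the best of many "random" brackets tends to score very high.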

Page 55: Bracketology talk at the Crossroads of ideas

Topics in Sports Analytics

ISYE 601 in Spring 2017!

• Goal: teach students data-driven methods for making better decisions using sports as a vehicle

• Course topics:
  • Linear regression
  • Logistic regression
  • Empirical Bayes
  • Ranking methods
  • Probability models and Markov chains
  • Forecasting
  • Game theory
  • Tournament scheduling
  • Networks (is my team mathematically eliminated from the playoffs?)
  …and more!

Page 56: Bracketology talk at the Crossroads of ideas

In the news!

https://punkrockor.com/in-the-news/

Page 57: Bracketology talk at the Crossroads of ideas

Thank you!

Laura Albert McLay

[email protected]

punkrockOR.com

bracketology.engr.wisc.edu

Twitter: @lauramclay, @badgerbrackets