Bracketology talk at the Crossroads of ideas

The Math Behind the March Madness Tournament and College Football Playoff

Laura Albert McLay, Associate Professor, ISYE

[email protected]

Twitter: @lauramclay, @badgerbrackets

http://bracketology.engr.wisc.edu/

Let’s start with the 2 minute version of my talk

https://www.facebook.com/UWMadison/videos/10154004638653114/


First of all…

I’m an industrial and systems engineering professor by day and a bracketologist by night!

I study systems

A system is a set of things—people, cells, vehicles, basketball teams, or whatever—interconnected in such a way that they produce their own pattern of behavior over time.

My discipline is operations research: the science of making decisions using advanced analytical methods

Our world is becoming increasingly complex and increasingly connected

Systems matter!

Mathematical models and systems thinking help us study systems and navigate the complex, interconnected world we live in.

What do we hope to learn from probability models like Markov chains?

• How do we draw conclusions from limited data?

• How can we make data-driven decisions in the presence of uncertainty?

How I got started in bracketology

In 2014 someone suggested I examine bracketology in the context of the first College Football Playoff…

…and so began Badger Bracketology

My objective: forecast which teams would make the first college football playoff before the season was over.

Markov chains: The Little Engine that Could

Markov chains: a type of math model for understanding how a system can evolve over time.

Uses: finance, epidemiology, queues, zombies

Markov chains for ranking teams in a nutshell

Each team is a state. A team “votes” for the teams that it loses to.
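To make the voting idea concrete, here is a minimal sketch. The four teams and their results are made up, and the rule that a winner votes for itself is an assumption for illustration. The ratings are the long-run fraction of time a random walk over the votes spends at each team.

```python
# Made-up results for four hypothetical teams: (winner, loser).
games = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "D"), ("B", "D")]
teams = sorted({team for game in games for team in game})

# Each game gives two votes: the loser votes for the winner,
# and the winner votes for itself (an assumption of this sketch).
votes = {t: {u: 0.0 for u in teams} for t in teams}
for winner, loser in games:
    votes[loser][winner] += 1.0
    votes[winner][winner] += 1.0

# Normalize each team's votes into Markov chain transition probabilities.
P = {}
for t in teams:
    total = sum(votes[t].values())
    P[t] = {u: votes[t][u] / total for u in teams}

# Power iteration: repeatedly push the rating mass along the votes.
rating = {t: 1.0 / len(teams) for t in teams}
for _ in range(200):
    rating = {u: sum(rating[t] * P[t][u] for t in teams) for u in teams}

ranking = sorted(teams, key=rating.get, reverse=True)
print(ranking, {t: round(r, 3) for t, r in rating.items()})
```

Team D never wins, so nobody votes for it and its rating drains to zero. A real season needs more care, for example with undefeated teams, which would otherwise absorb all of the votes.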

http://sumnous.github.io/blog/2014/07/24/gephi-on-mac/

Graph of 2014 college football season

Simple yet powerful idea

Automatically rate and rank teams by taking advantage of the network structure of the matchups

• Use Markov chains to account for strength of schedule

• Do not need a human in the loop

Simple data requirements:

1. Game outcomes (score differentials),

2. Home/away status

Takes difficulty of future games into account in football playoff forecasts

• Polls give the ranking right now, which only gives insight into a playoff held today

Google PageRank is a Markov model!

Source: google.com

Do you remember Internet searches before Google?

https://www.wordstream.com/articles/internet-search-engines-history


First, let’s talk about ranking basketball teams

Transitions

Rutgers 52 @ Wisconsin 72

[Diagram: two-state Markov chain between Wisconsin and Rutgers, with transition probabilities 𝑊 and 1 − 𝑊]

How much credit should Wisconsin get for beating Rutgers by 20 at home?

𝑊 = effective wins (fraction of a vote), which help us compute our Markov chain transition probabilities

Let’s find a data-driven answer!

Given that team 𝑖 beat team 𝑗 by 𝑥 points at home, what is the probability that 𝑖 is a better team than 𝑗 on a neutral court?

Data: Some teams play each other twice per season (home and away)

Given that team 𝑖 beat team 𝑗 by 𝑥 points at home, what is the probability that 𝑖 is a better team than 𝑗 on 𝑗’s home court?

$r_x^H$ ($r_x^A$) = probability that a team that outscores its opponent by 𝑥 points at home 𝐻 (away 𝐴) is better than its opponent on a neutral site 𝑁

Developed by Sokol, Kvam, Nemhauser, and Brown at Georgia Tech to rank NCAA men’s basketball teams https://www2.isye.gatech.edu/~jsokol/lrmc/

What is the probability you win your next game (on the road) given that you win by 20 at home?

Logistic regression to the rescue!

Problem 1: must win by 50+ points to get a lot of credit for a win!

Winning/losing close games gives you the same amount of “credit”

[Figure: probability of winning on the road next time vs. margin of victory 𝑥]

Problem 2: We need to get neutral site win probabilities

Logistic regression for NCAA men’s basketball

• Use log(point differentials) instead!

• Do not truncate point differentials
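Here is a rough sketch of what a log point differential could look like inside a logistic curve. The signed-log transform and the coefficients `a` and `b` are placeholders chosen for illustration, not the fitted model from the talk.

```python
import math

def signed_log(diff):
    """Compress a point differential while keeping its sign (assumed transform)."""
    return math.copysign(math.log1p(abs(diff)), diff)

def logistic_credit(diff, a=1.0, b=0.0):
    """Logistic curve on the log differential; a and b are placeholder values."""
    return 1.0 / (1.0 + math.exp(-(a * signed_log(diff) + b)))

for diff in (-20, -5, 0, 5, 20, 40):
    print(diff, round(logistic_credit(diff), 3))
```

Because the log flattens large margins, going from a 20-point win to a 30-point win adds much less credit than going from 10 to 20, so there is no need to cut off blowouts at a fixed cap.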

[Figure: effective wins (0 to 1) vs. point differential (-30 to 30)]

Winning matters

• Average in a pure win/loss model to give more credit for winning the game
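One way to read "average in a pure win/loss model" is as a weighted average of a win indicator and the regression probability. The sketch below assumes that form. The function name and the example weight are illustrative.

```python
def blend_with_win(regression_prob, won, weight):
    """Weighted average of a pure win/loss indicator and a regression probability.

    weight is the share given to the win/loss indicator (an assumed parameter).
    """
    indicator = 1.0 if won else 0.0
    return weight * indicator + (1.0 - weight) * regression_prob

# A narrow win that the regression scores as a coin flip still earns extra credit.
print(blend_with_win(0.5, won=True, weight=1 / 3))   # winner's side
print(blend_with_win(0.5, won=False, weight=1 / 3))  # loser's side
```

With any positive weight the curve jumps at a point differential of zero, so winning by one point is worth noticeably more than losing by one.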

[Figure, two panels: effective wins (0 to 1) vs. point differential (-30 to 30)]

Putting it all together

• End up with the red line!

[Figure, two panels: effective wins (0 to 1) vs. point differential (-30 to 30)]

Markov chain transition probabilities

Rutgers 52 @ Wisconsin 72 *

[Diagram: two-state Markov chain between Wisconsin and Rutgers, with transition probabilities 𝑊 and 1 − 𝑊]

How much credit should Wisconsin get for beating Rutgers by 20 at home?

P(UW beats Rutgers on a neutral court) = 0.6255

𝑊 = 0.6817 effective wins (fraction of a vote)

* Wisconsin 61 @ Rutgers 54 later on 1/28/2017
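A small sketch of how this one game could enter the transition matrix. It assumes my reading of the diagram: both teams send a share 𝑊 of their vote to Wisconsin and 1 − 𝑊 to Rutgers. The helper names are made up for illustration.

```python
def add_game(votes, team_a, team_b, w_a):
    """Add one game's votes; w_a is team_a's effective wins (between 0 and 1)."""
    for team in (team_a, team_b):
        row = votes.setdefault(team, {})
        row[team_a] = row.get(team_a, 0.0) + w_a          # share of the vote for team_a
        row[team_b] = row.get(team_b, 0.0) + (1.0 - w_a)  # share of the vote for team_b

def transition_matrix(votes):
    """Normalize each team's votes so that every row sums to 1."""
    return {team: {other: v / sum(row.values()) for other, v in row.items()}
            for team, row in votes.items()}

votes = {}
add_game(votes, "Wisconsin", "Rutgers", w_a=0.6817)  # W from the slide
P = transition_matrix(votes)
print(P)

# Back-solving: if W = weight * 1 + (1 - weight) * 0.6255, what is the weight?
implied_weight = (0.6817 - 0.6255) / (1.0 - 0.6255)
print(round(implied_weight, 3))
```

The slide's two numbers are consistent with putting a weight of about 0.15 on the win itself, if 𝑊 is a simple weighted average of a win and the neutral-court probability. That weight is back-solved here, not stated in the talk. Calling `add_game` once per game and normalizing at the end extends this to a full schedule.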

Transitions

Same idea for the rest of the games…

[Diagram: Markov chain with states Wisconsin, Minnesota, Northwestern, Rutgers, and Illinois]

Current rankings

3/12/2017 Selection Sunday

1. Gonzaga
2. Villanova
3. Kentucky
4. SMU
5. Wichita St
6. Arizona
7. UCLA
8. Duke
9. Cincinnati
10. Oregon
11. MTSU
12. North Carolina
13. St Marys CA
14. West Virginia
15. Kansas
16. Nevada
17. Purdue
18. Vermont
19. UNC Wilmington
20. Michigan
21. Florida St
22. VA Commonwealth
23. Notre Dame
24. Bucknell
25. Wisconsin

The B1G, ranked.

3/12/2017

17. Purdue
20. Michigan
25. Wisconsin
41. Northwestern
43. Minnesota
54. Maryland
78. Indiana
87. Michigan St
121. Iowa
130. Illinois
141. Ohio St
176. Penn St
187. Rutgers
242. Nebraska

How did we do last year?

3/13/2016 Selection Sunday

1. North Carolina

2. Kansas

3. Villanova

4. Michigan St

5. Virginia

6. West Virginia

7. Oklahoma

8. Kentucky

9. Oregon

10. Purdue

11. Xavier

12. Miami FL

13. Duke

14. Utah

15. Texas A&M

16. Louisville

17. Maryland

18. Arizona

19. Seton Hall

20. Iowa St

21. Indiana

22. California

23. Baylor

24. St Josephs PA

25. Iowa

Now let’s talk about the College Football Playoff

College Football Playoff

Objective: determine which teams would make the first college football playoff.

Goal: to forecast the top 4 teams weeks before the season ends.

Solution method: a ranking method.

Challenge: need to simulate the remainder of the season and rank the teams at the end of the (simulated) season.

Giant assumption

• We assume the selection committee will pick the top four ranked teams for the playoff.

• History suggests that humans prefer the most deserving teams rather than the best teams in the national championship game.

• E.g., 2013 Alabama lost on a fluke play.

• …but the College Football Selection Committee might have changed this!

2013 BCS Rankings just before bowl bids

College football playoff committee rankings

2014 Playoff rankings 2015 Playoff rankings

How we did last year

2016 Playoff Rankings

Badger Bracketology rankings

1 Alabama

2 Ohio State

3 Clemson

3 Washington

5 Michigan

6 Penn State

7 Western Michigan

8 Louisville

9 Oklahoma

10 Wisconsin :(

Model: two parts

0. Observe a few (7-8) weeks of game outcomes

1. Ranking.
• Assign a rating to each team to rank the teams.
• Similar to what we had before but with college football data

2. Game simulation.
• Determine who wins a game based on the team ratings.
• Simulate the next week’s game outcomes.

Combine these:
• Re-rate and re-rank after each week of games.
• Simulate the remainder of the season.
• Report teams most likely to be in the top 4

Score differentials

Yes, running up the score matters, mathematically.

[Figure: histogram of score differentials, 2012-2014. x-axis: home score - away score; y-axis: frequency]

Capped score differentials

38% of conference games fall beyond the cap

[Figure: histogram of score differentials capped at +/-21, 2012-2014. x-axis: home score - away score; y-axis: frequency]

Note: Rating systems used by College Football Playoff committee must use wins/losses only (not score differentials). Running up the score makes a difference!

[Figure: effective wins (0 to 1) vs. point differential (-20 to 20), with curves labeled $S_x^H$, $r_x^H$, and $r_x^N$]

Build the Markov chain for football

• Used 3 seasons of data (truncate scores by +/-21)

• Use games played in consecutive years to identify win probabilities to feed into the Markov chain
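Truncating at +/-21 just means clipping each score differential before it goes into the regression. A tiny sketch, using made-up differentials:

```python
def cap(diff, limit=21):
    """Clip a score differential to the range [-limit, +limit]."""
    return max(-limit, min(limit, diff))

# Made-up home-minus-away score differentials, for illustration only.
diffs = [3, -7, 24, 35, -28, 10, 14, -21]
capped = [cap(d) for d in diffs]
share_beyond_cap = sum(abs(d) > 21 for d in diffs) / len(diffs)

print(capped)
print(share_beyond_cap)
```

After capping, a 35-point blowout and a 24-point win look identical to the model, which is the trade-off the log version is meant to soften.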

[Figure: effective wins (0 to 1) vs. point differential (-20 to 20). Curves: logistic regression; logistic regression averaged with win (weight = 2/3); logistic regression averaged with win (weight = 1/3)]

Modified Log Logistic Regression Markov Chain (ln(mLRMC))

• Same as mLRMC except that we consider log point differentials to dampen big score differentials

• Do not truncate point differentials

[Figure: effective wins (0 to 1) vs. point differential (-20 to 20). Curves: logistic regression (home team); logistic regression averaged with win]

Markov chain transitions

Use mLRMC and ln(mLRMC) for all games

Simulate the rest of the season!

0. Observe a few (7-8) weeks of game outcomes

1. Ranking.
• Assign a rating to each team to rank the teams.

2. Game simulation.
• Determine who wins a game based on the team ratings.
• Simulate the next week’s game outcomes.

Combine these:
• Re-rate and re-rank after each week of games.
• Simulate the remainder of the season.
• Report teams most likely to be in the top 4

Win probability parameters

The win probability between teams $i$ and $j$, where $i$ is the home team, is captured by the best-fit logistic regression model using two years of game data:

$$p_{ij} = \frac{e^{b + a(r_i - r_j)}}{1 + e^{b + a(r_i - r_j)}}$$

where $r_i - r_j$ is the difference in ratings between the two teams. We draw each game's winner from $p_{ij}$ and assign a point differential to the winner.
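The loop of re-rating, simulating a week, and repeating can be sketched as below. Everything specific is an assumption for illustration: the six-team league, the coefficients `a` and `b`, and especially `rate`, which uses plain win fraction as a stand-in for the Markov chain ratings. The sketch also skips the point differential step.

```python
import math
import random

def win_prob(r_home, r_away, a=4.0, b=0.2):
    """Logistic home win probability; a and b are placeholder coefficients."""
    return 1.0 / (1.0 + math.exp(-(b + a * (r_home - r_away))))

def rate(wins, games):
    """Stand-in rating (win fraction); the talk uses Markov chain ratings here."""
    return {t: wins[t] / games[t] if games[t] else 0.5 for t in wins}

def forecast_top4(wins, games, schedule, n_sims=1000, seed=0):
    """Count how often each team finishes in the top 4 across simulated seasons."""
    rng = random.Random(seed)
    top4_counts = {t: 0 for t in wins}
    for _ in range(n_sims):
        w, g = dict(wins), dict(games)
        for week in schedule:
            ratings = rate(w, g)  # re-rate before each simulated week
            for home, away in week:
                home_wins = rng.random() < win_prob(ratings[home], ratings[away])
                w[home if home_wins else away] += 1
                g[home] += 1
                g[away] += 1
        final = rate(w, g)
        for t in sorted(final, key=final.get, reverse=True)[:4]:
            top4_counts[t] += 1
    return top4_counts

# Hypothetical league: records after 7 weeks, then two remaining weeks (home, away).
wins = {"A": 7, "B": 6, "C": 6, "D": 5, "E": 4, "F": 2}
games = {t: 7 for t in wins}
schedule = [[("A", "B"), ("C", "D"), ("E", "F")],
            [("B", "C"), ("D", "A"), ("F", "E")]]
counts = forecast_top4(wins, games, schedule)
print(counts)
```

The output is in the same units as the forecast tables in this talk: the number of simulated seasons, out of 1000, in which a team lands in the top 4.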

Game prediction accuracy (averaged per game)

Statistic            Model      Training set  Test set
Mean Absolute Error  mLRMC      0.2043        0.3152
                     ln(mLRMC)  0.2026        0.3162
Mean Squared Error   mLRMC      0.1006        0.1885
                     ln(mLRMC)  0.0999        0.1897
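For reference, here is one common way to compute per-game errors like these. It assumes the error is the gap between the predicted home win probability and the 0/1 outcome, which the slide does not spell out. The predictions and outcomes below are made up.

```python
def mean_absolute_error(probs, outcomes):
    """Average absolute gap between predicted probability and the 0/1 outcome."""
    return sum(abs(p - y) for p, y in zip(probs, outcomes)) / len(probs)

def mean_squared_error(probs, outcomes):
    """Average squared gap between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

# Made-up predictions (home win probability) and results (1 = home team won).
probs = [0.8, 0.6, 0.3]
outcomes = [1, 0, 0]
mae = mean_absolute_error(probs, outcomes)
mse = mean_squared_error(probs, outcomes)
print(round(mae, 4), round(mse, 4))
```

Under this definition, a model that always predicts 0.5 scores 0.5 on absolute error and 0.25 on squared error, which gives a baseline for reading the table.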

College football playoff committee rankings

2016 Playoff rankings

2015 Playoff rankings

2016 Results: Rankings (NOW)

Team        Method         Week 7  Week 8  Week 9  Week 10  Week 11  Week 12  Week 13  Week 14
Alabama     CFP Committee  -       -       1       1        1        1        1        1
            ln(mLRMC)      1       1       1       1        1        1        1        1
Clemson     CFP Committee  -       -       2       2        4        4        3        2
            ln(mLRMC)      3       4       3       3        5        4        4        3
OSU         CFP Committee  -       -       6       5        2        2        2        3
            ln(mLRMC)      4       5       5       4        3        3        2        2
Washington  CFP Committee  -       -       5       4        6        5        4        4
            ln(mLRMC)      8       7       6       6        6        6        5        3*

* Clemson and Washington were tied

2016 Results: Rankings

Forecasted ranking of likelihood to make playoff (any seed, out of 1000)

Alabama
ln(mLRMC) forecasted ranking: 1 1 1 1 1 1 NA
ln(mLRMC) now ranking: 1 1 1 1 1 1 1 1

Clemson
ln(mLRMC) forecasted ranking: 2 2 2 2 2 3 NA
ln(mLRMC) now ranking: 3 4 3 3 5 4 4 3

OSU
ln(mLRMC) forecasted ranking: 5 5 5 4 5 2 NA
ln(mLRMC) now ranking: 4 5 5 4 3 3 2 2

Washington
ln(mLRMC) forecasted ranking: 3 4 4 6 4 4 NA
ln(mLRMC) now ranking: 8 7 6 6 6 6 5 3*

* Clemson and Washington were tied

2015 Results: Rankings (NOW)


Team      Method         Week 7  Week 8  Week 9  Week 10  Week 11  Week 12  Week 13  Week 14
Clemson   CFP Committee  -       -       1       1        1        1        1        1
          mLRMC          2       3       1       1        2        1        3        2
          ln(mLRMC)      7       5       1       1        2        1        1        2
Alabama   CFP Committee  -       -       4       2        2        2        2        2
          mLRMC          4       5       8       3        1        2        1        1
          ln(mLRMC)      5       4       6       2        1        2        2        1
MSU       CFP Committee  -       -       7       13       9        5        5        3
          mLRMC          6       4       5       9        9        5        5        3
          ln(mLRMC)      6       2       4       7        8        4        4        3
Oklahoma  CFP Committee  -       -       15      12       7        3        3        4
          mLRMC          16      13      13      8        5        3        1        4
          ln(mLRMC)      18      12      16      10       5        5        3        4

2015 Results: Forecasted number of times to make playoff (out of 1000)

Team      Method     Week 7  Week 8  Week 9  Week 10  Week 11  Week 12  Week 13
Clemson   mLRMC      667     897     931     905      915      949      956
          ln(mLRMC)  749     840     897     893      955      923      976
Alabama   mLRMC      361     209     166     837      913      943      995
          ln(mLRMC)  427     240     197     858      847      931      996
MSU       mLRMC      179     213     261     54       24       569      675
          ln(mLRMC)  226     349     354     115      162      573      706
Oklahoma  mLRMC      20      46      71      119      393      758      1000
          ln(mLRMC)  12      73      16      63       142      247      1000

Nebraska beats MSU

MSU beats The OSU

No Big12 championship

Slight difference in rankings: 3rd/4th vs. 5th/6th

2015 Results: Forecasted ranking of likelihood to make playoff (any seed, out of 1000)

Team      Method     Week 7  Week 8  Week 9  Week 10  Week 11  Week 12  Week 13  Week 14
Clemson   mLRMC      2       1       1       1        1        1        3        2
          ln(mLRMC)  2       1       1       1        2        2        3        2
Alabama   mLRMC      5       7       6       2        2        2        2        1
          ln(mLRMC)  4       6       8       2        1        1        2        1
MSU       mLRMC      7       6       7       11       10       4        4        3
          ln(mLRMC)  6       5       5       9        6        3        4        3
Oklahoma  mLRMC      18      13      13      9        4        3        1        4
          ln(mLRMC)  21      15      14      12       8        6        1        4

No Big12 championship

No simulation: the season is over. We think the committee got it right!

Ranked 2nd & 7th after week 7

Ranked 5th & 4th after week 9

2015 Results: What happened to The Ohio State University?

Rankings after week 12   Forecasted rankings after week 12
1. Clemson               1. Clemson
2. Alabama               2. Alabama
3. Oklahoma              3. Oklahoma
4. Notre Dame            4. Michigan State
5. Michigan State        5. Iowa
6. Ohio State            6. Notre Dame
7. Iowa                  7. Stanford
8. Florida               8. Florida
9. Michigan              9. Ohio State
10. Stanford             (no other teams have >1% chance of making the playoff)

Final thoughts about March Madness

Picking the perfect bracket

There are about 9.2 quintillion ways to fill out a bracket…

And 1 way to fill out a perfect bracket

The odds of filling out a perfect bracket are not 9-quintillion-to-1 because:

(a) the tournament isn’t like the lottery where every outcome is equally likely, and

(b) monkeys are not randomly selecting game outcomes. Instead, people are purposefully selecting outcomes.
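The 9.2 quintillion figure is just a count of coin flips. A quick check, assuming a 64-team bracket with 63 games and ignoring any play-in games:

```python
games = 63              # a 64-team single-elimination bracket has 63 games
brackets = 2 ** games   # two possible winners per game

print(f"{brackets:,}")
print(f"{brackets:.1e}")

# Chance that a pure coin-flip bracket is perfect.
coin_flip_chance = 1 / brackets
print(coin_flip_chance)
```

That is the monkey's probability. Anyone who picks a 1 seed over a 16 seed is already doing better than a coin flip, which is why the realistic odds quoted in this talk are so much shorter.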

Can math help our odds?

FiveThirtyEight notes that the typical bracket has 2.5 trillion-to-1 odds of being perfect:

• https://fivethirtyeight.com/features/march-madness-perfect-bracket-odds/

BracketOdds at Illinois estimates that a historical average winning bracket performs at 4.4 billion-to-1

• Warren Buffett may have to pay out!


The thing with perfect brackets

They depend on the year.

Let’s only look at how many people correctly select all Final Four teams:

– 1140 of 13 million brackets correctly picked all Final Four teams in 2016
– 182,709 of 11.57 million brackets correctly picked all Final Four teams in 2015 *
– 612 of 11 million brackets correctly picked all Final Four teams in 2014
– 47 of 8.15 million brackets correctly picked all Final Four teams in 2013
– 23,304 of 6.45 million brackets correctly picked all Final Four teams in 2012
– 2 of 5.9 million brackets correctly picked all Final Four teams in 2011
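Turning those counts into rates makes the year-to-year swing easier to see. The numbers below are copied from the list above. The code only does the division.

```python
# (brackets with all four Final Four teams correct, total brackets), from the slide.
final_four = {
    2016: (1140, 13_000_000),
    2015: (182_709, 11_570_000),
    2014: (612, 11_000_000),
    2013: (47, 8_150_000),
    2012: (23_304, 6_450_000),
    2011: (2, 5_900_000),
}

share = {year: hits / total for year, (hits, total) in final_four.items()}
for year in sorted(share, reverse=True):
    print(f"{year}: {100 * share[year]:.5f}% (about 1 in {round(1 / share[year]):,})")
```

The best year (2015, about 1.6% of brackets) and the worst (2011, 2 brackets out of 5.9 million) differ by a factor of tens of thousands, so how hard the bracket is depends heavily on how the tournament happens to play out.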

* Only 1 bracket emerged from the round of 64 with all 32 correct picks

http://ftw.usatoday.com/2016/03/ncaa-bracket-final-four

http://espn.go.com/blog/collegebasketballnation/post/_/id/106391/tournament-challenge-1-6-percent-nailed-final-four

http://espn.go.com/blog/collegebasketballnation/post/_/id/97906/tournament-challenge-unlikely-foursome

http://articles.courant.com/2013-04-01/business/hc-espn-final-four-brackets-20130401_1_wichita-state-brackets-tournament-challenge

http://espn.go.com/blog/collegebasketballnation/post/_/id/56444/less-than-one-percent-nailed-final-four

http://espn.go.com/blog/collegebasketballnation/post/_/id/28729/tournament-challenge-two-have-all-four

Tips for winning your office pool

1. Don’t use RPI

• Badger Bracketology (my favorite tool!)

• Logistic Regression Markov Chain (LRMC)

• FiveThirtyEight rankings of tournament teams

• Ken Pomeroy’s rankings

• Sagarin rankings

• Massey Ratings

• ESPN’s BPI rankings

Rankings clearinghouse: http://www.masseyratings.com/cb/compare.htm


2. Pay attention to the seeds

Some seeds generate more upsets than others

• 7/10 seeds and 5/12 seeds

Historically, 6/11 seeds go the longest before facing a 1 or 2 seed.

3. Don’t pick Kansas

• Be strategic. The point is NOT to maximize your points, it’s to get more points than your opponents

• Differentiate your Final Four

• Check ESPN for the top picked teams. Some top teams are overvalued and others are undervalued

• Last year:

  • Kansas was selected as the overall winner in 27% of brackets (and in 62% of Final Fours) with a 19% chance of winning (538)

  • UNC was selected as the overall winner in 8% of brackets (with a 15% win probability) and Villanova in 5.5%
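A rough way to spot overvalued and undervalued champions is to divide a team's win probability by the share of brackets picking it. The ratio is my own illustrative measure, not something from the talk. The inputs are the numbers quoted above.

```python
# team: (title win probability per 538, share of brackets picking it as champion)
picks = {
    "Kansas": (0.19, 0.27),
    "UNC": (0.15, 0.08),
}

# Ratio above 1: the team wins more often than the pool picks it (undervalued).
value = {team: p_win / p_picked for team, (p_win, p_picked) in picks.items()}
for team, ratio in sorted(value.items(), key=lambda item: item[1], reverse=True):
    print(f"{team}: {ratio:.2f}")
```

Kansas comes out near 0.70 and UNC near 1.88. Picking Kansas means sharing the payoff with about a quarter of the pool, while a UNC title would have separated you from most of your opponents.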

http://games.espn.com/tournament-challenge-bracket/2016/en/whopickedwhom

https://projects.fivethirtyeight.com/2016-march-madness-predictions/


4. It’s totally random

A good process yields good outcomes on average

• It does not guarantee the best outcome in any given tournament

Small pools are better if you have a good process

• Scoring can be random

• The more brackets, the higher chance that a “random” bracket will be the best
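A toy simulation shows why pool size matters. The model is an assumption for illustration only: every bracket's score is a random draw from a bell curve, and a "good process" shifts your average up by one standard deviation (`edge=1.0`).

```python
import random

def chance_skilled_wins(n_opponents, edge=1.0, trials=1000, seed=0):
    """Estimate how often a skilled entrant outscores every random opponent."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        mine = rng.gauss(edge, 1.0)  # skilled entrant: higher average score
        best_other = max(rng.gauss(0.0, 1.0) for _ in range(n_opponents))
        wins += mine > best_other
    return wins / trials

results = {n: chance_skilled_wins(n) for n in (5, 50, 500)}
for n, p in results.items():
    print(f"{n} opponents: skilled entrant wins {p:.1%}, "
          f"one average entrant would win {1 / (n + 1):.1%}")
```

In this toy model the skilled entrant always beats an average entrant's odds, but the chance of finishing first shrinks quickly as the pool grows, because somebody among hundreds of random brackets is likely to get lucky.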

Topics in Sports Analytics

ISYE 601 in Spring 2017!

• Goal: teach students data-driven methods for making better decisions using sports as a vehicle

• Course topics:
  • Linear regression
  • Logistic regression
  • Empirical Bayes
  • Ranking methods
  • Probability models and Markov chains
  • Forecasting
  • Game theory
  • Tournament scheduling
  • Networks (is my team mathematically eliminated from the playoffs?)
  …and more!

In the news!

https://punkrockor.com/in-the-news/

Thank you!

Laura Albert McLay

[email protected]

punkrockOR.com

bracketology.engr.wisc.edu

Twitter: @lauramclay, @badgerbrackets