Simulating Sports: The Inputs and the Engines Paul Bessire General Manager, Co-Founder PredictionMachine.com September 29, 2010


Simulating Sports: The Inputs and the Engines

Paul Bessire, General Manager, Co-Founder

PredictionMachine.com, September 29, 2010


Table of Contents

• Intro

• PredictionMachine.com & Simulation Overview

• Simulating Baseball

• Plate Appearance Decision Tree

• Examples (more in second presentation)


Introduction

• 2004 University of Cincinnati BBA, Finance and QA

• 2005 MSQA - Master’s Project (with Dr. Fry):

Measuring Individual and Team Effectiveness in the NBA Through Multivariate Regression

• 2004 – 2009 WhatIfSports.com/FOXSports.com, Director, Content and Quantitative Analysis

• 2010 Launched PredictionMachine.com in February


About PredictionMachine.com

• “We play the game 50,000 times before it’s actually played.”

• Built by Paul Bessire to focus on content after six years at WhatIfSports.com/FOXSports

• February 2010 - Launched with Super Bowl Prediction (Indianapolis 28 – New Orleans 27)

• “Predictalator” – Simulation engine plays entire NFL season 50,000 times in 8 seconds

• March Madness, NBA Playoffs, MLB Daily, College Football, NFL

• Customizable Predictalator – Any teams, anywhere, any line

• Fantasy Football Projections

• Live simulator built to analyze in-game winning probabilities and value in coaching decisions


Sports Simulation

• Play-by-play

– A “play” means something different for each sport

– Probabilities for every individual outcome

– Random number generation

– Pitch-by-pitch (or basketball/hockey pass-by-pass) not needed

– Account for every possible statistical interaction during a game

• Can be recreated quickly

– 50,000+ games/second

– All data tracked

– Every outcome is different

– Boxscores
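The play-by-play idea above (a probability for every outcome, then a random number to pick one) can be sketched as follows. This is only an illustration: the outcome names and probabilities in `PLAY_PROBS` are hypothetical, and the functions are not taken from the Predictalator.

```python
import random

def sample_outcome(probabilities, rng):
    """Pick one outcome from a {name: probability} mapping using one random draw."""
    draw = rng.random()
    cumulative = 0.0
    for outcome, p in probabilities.items():
        cumulative += p
        if draw < cumulative:
            return outcome
    return outcome  # guard against floating-point round-off in the final bucket

def simulate_many(probabilities, n_plays, seed=0):
    """Tally outcomes over n_plays independent plays."""
    rng = random.Random(seed)
    counts = {name: 0 for name in probabilities}
    for _ in range(n_plays):
        counts[sample_outcome(probabilities, rng)] += 1
    return counts

# Hypothetical probabilities for a single "play" (illustrative values only).
PLAY_PROBS = {"out": 0.68, "single": 0.16, "walk": 0.09, "double": 0.04, "home_run": 0.03}

counts = simulate_many(PLAY_PROBS, 50_000)
print(counts)
```

Because every play is tallied, box-score style totals fall out of the same loop, and a different seed produces a different set of outcomes.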


Significant Stats

Pitchers

• HBP/BF

• BB/(BF – HBP)

• OAV

• 1B/Hit Allowed

• 2B/Hit Allowed

• 3B/Hit Allowed

• HR/Hit Allowed

• K/Out

• GO/FO

• BF

• Pitches Thrown/BF

• Relative Range Factor

• Fielding Percentage

• Handedness

• Ballpark Effects

• League Averages

Hitters

• HBP/PA

• BB/(PA – HBP)

• AVG

• 1B/Hit

• 2B/Hit

• 3B/Hit

• HR/Hit

• K/Out

• GO/FO

• PA

• Relative Range Factor

• Fielding Percentage

• Catcher Arm Rating

• CS% (Runner)

• Speed Rating

• Handedness

• Ballpark Effects

• League Averages
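Most of the hitter inputs above are conditional rates rather than raw totals. A minimal sketch of deriving them from a counting-stat line is shown below; the function name, the sample stat line, and the simplification that outs equal AB minus hits are all assumptions made for illustration.

```python
def hitter_rates(pa, hbp, bb, ab, hits, doubles, triples, hr, so):
    """Convert counting stats into the conditional rates listed on the slide."""
    singles = hits - doubles - triples - hr
    outs = ab - hits  # simplification: ignores errors, fielder's choices, etc.
    return {
        "HBP/PA": hbp / pa,
        "BB/(PA-HBP)": bb / (pa - hbp),
        "AVG": hits / ab,
        "1B/Hit": singles / hits,
        "2B/Hit": doubles / hits,
        "3B/Hit": triples / hits,
        "HR/Hit": hr / hits,
        "K/Out": so / outs,
    }

# Hypothetical season line (not a real player).
rates = hitter_rates(pa=600, hbp=5, bb=60, ab=530, hits=150,
                     doubles=30, triples=3, hr=20, so=100)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

Each rate is conditioned on the event before it not having happened (for example, walks per plate appearance that was not a hit-by-pitch), which matches the order of the decision tree later in the deck.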


Insignificant Stats

Pitchers

• Wins

• Losses

• Saves

• Holds

• Complete Games

• Shutouts

• ERA (kind of – 2B and 3B approx)

• Unearned Runs

• Games Started

• Pitch Types

• Performance in Counts

• Other Situational Stats

Hitters

• RBI

• IBB

• Runs (kind of – in Speed Formula)

• GIDP (kind of – in Speed Formula)

• SF (kind of – in PA, but also situational)

• SH (kind of – in PA, but also situational)

• SBA (kind of – attempts, but also setting)

• Performance in Counts

• Other Situational Stats


Ballpark Effects


Ballparks – Extremes (Min. 3 seasons)

Effect          Ballpark (High)       Multiplier   Ballpark (Low)            Multiplier
Hits            Coors Field           1.182        Petco Park                .908
2B              Baker Bowl            1.291        Dodger Stadium            .795
3B              Palace of the Fans    1.868        Great American Ballpark   .523
HR_RF           Coors Field           1.374        Municipal Stadium         .636
HR_LF           Coors Field           1.385        Municipal Stadium         .634
Runs (unused)   Coors Field           1.380        Petco Park                .830


PA Decision Tree - Normalization

Every step in PA uses modified* log5 normalization (Bill James AVG example):

H/AB = ((AVG * OAV) / LgAVG) / ((AVG * OAV) / LgAVG + (1 - AVG) * (1 - OAV) / (1 - LgAVG))

Where LgAVG = (PLgAVG + BLgAVG) / 2, with PLgAVG and BLgAVG the league averages for the pitcher's and the batter's seasons

2000 Pedro vs. 1923 Ruth Example:

H/AB = ((.393 * .167) / .2791) / ((.393 * .167) / .2791 + (1 - .393) * (1 - .167) / (1 - .2791))

Where LgAVG = (.283 + .276) / 2 or .2791

Result = .2504

* Modified due to a flaw in the assumption above that the batter and pitcher carry equal (50/50) weights on each possible outcome of the PA event. Also accounts for handedness and ballpark.
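The unmodified log5 formula above can be written as a small function. This is a sketch of the 50/50 version only, with a hypothetical function name and no handedness or ballpark adjustment. Using the rounded inputs shown on the slide, the league average works out to .2795 and the result to roughly .2507, close to the slide's .2504; the small gap may come from rounding in the displayed inputs.

```python
def log5(batter_rate, pitcher_rate, batter_league_rate, pitcher_league_rate):
    """Unweighted log5: combine batter and pitcher rates against a league baseline."""
    league = (pitcher_league_rate + batter_league_rate) / 2
    success = (batter_rate * pitcher_rate) / league
    failure = ((1 - batter_rate) * (1 - pitcher_rate)) / (1 - league)
    return success / (success + failure)

# 1923 Ruth (AVG .393, league .283) vs. 2000 Pedro (OAV .167, league .276)
ruth_vs_pedro = log5(0.393, 0.167, 0.283, 0.276)
print(round(ruth_vs_pedro, 4))
```

A quick sanity check on the form: when batter, pitcher, and both leagues share the same rate, the function returns that rate unchanged.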


PA Decision Tree – Step 1*

Plate Appearance
• Unusual Event (IBB, WP, PB, SB, CS, SH, Hit and Run, Pickoff, Balk)
• Normal PA
  – HBP (per PA or BFP)
  – Not HBP
      – BB (per PA or BFP – HBP)
      – At Bat…

* No ballpark or handedness adjustments made yet.
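Step 1 can be sketched as a chain of conditional draws, each one made only if the previous branch was not taken. The function and argument names are hypothetical, the probabilities are illustrative, and the sketch assumes each probability has already been normalized for the batter/pitcher matchup.

```python
import random

def step1_plate_appearance(p_unusual, p_hbp, p_bb, rng):
    """Walk the first branches of the plate-appearance tree.

    p_hbp is per PA; p_bb is per PA that was not an HBP.
    """
    if rng.random() < p_unusual:
        return "unusual_event"  # IBB, WP, PB, SB, CS, SH, etc. handled elsewhere
    if rng.random() < p_hbp:
        return "HBP"
    if rng.random() < p_bb:
        return "BB"
    return "at_bat"  # continues to the next step of the tree

rng = random.Random(2010)
# Hypothetical matchup probabilities (illustrative values only).
results = [step1_plate_appearance(0.03, 0.01, 0.09, rng) for _ in range(10_000)]
print({name: results.count(name) for name in sorted(set(results))})
```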


PA Decision Tree – Step 2

At-Bat
• Hit… (AVG vs. OAV)*
• Out
  – Strikeout (K/Out)
  – Normal (logic to determine direction and GO or FO)
      – Hit (Poor Play)
      – Error (Fielding Percentage)
      – Normal

* Historical handedness adjustment and ballpark hits multiplier used.
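A minimal sketch of the at-bat branch follows. The names are hypothetical, the direction and GO/FO logic is omitted, and treating "Hit (Poor Play)" and "Error" as two slices of a single draw on a normal out is an assumption about how the branches combine, not something the slide specifies.

```python
import random

def step2_at_bat(p_hit, p_k_per_out, p_poor_play_hit, p_error, rng):
    """Resolve an at-bat into a hit, a strikeout, or a ball-in-play result."""
    if rng.random() < p_hit:          # normalized AVG vs. OAV
        return "hit"                  # continues to the hit-type step
    if rng.random() < p_k_per_out:    # strikeouts as a share of outs
        return "strikeout"
    # Normal out: a fielder is involved, so the defense can change the result.
    draw = rng.random()
    if draw < p_poor_play_hit:
        return "hit_poor_play"        # out converted to a hit by poor range
    if draw < p_poor_play_hit + p_error:
        return "error"                # driven by fielding percentage
    return "out_in_play"

rng = random.Random(12)
# Hypothetical probabilities (illustrative values only).
results = [step2_at_bat(0.26, 0.25, 0.01, 0.02, rng) for _ in range(10_000)]
print({name: results.count(name) for name in sorted(set(results))})
```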


PA Decision Tree – Step 3

Hit*
• HR* (HR/Hit)
• Normal – In Play
  – Out (Plus Play)
  – Normal Hit
      – 3B* (3B/Hit * multiplier for lost HR)
      – 2B* (2B/Hit * multiplier for lost HR)
      – 1B

* Ballpark multipliers used.
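The hit branch can be sketched the same way. The slide does not define the "multiplier for lost HR", so this sketch assumes it rescales the per-hit triple and double shares by 1 / (1 - HR/Hit) once home runs are out of the pool; that reading, the function name, and the sample probabilities are assumptions, and ballpark multipliers are left out.

```python
import random

def step3_hit(p_hr, p_plus_play_out, p_3b, p_2b, rng):
    """Resolve a hit into HR, an out on a plus defensive play, 3B, 2B, or 1B."""
    if rng.random() < p_hr:                  # HR/Hit
        return "HR"
    if rng.random() < p_plus_play_out:       # fielder takes the hit away
        return "out_plus_play"
    # Assumed "lost HR" multiplier: renormalize shares over non-HR hits.
    lost_hr_multiplier = 1.0 / (1.0 - p_hr)
    p_triple = p_3b * lost_hr_multiplier
    p_double = p_2b * lost_hr_multiplier
    draw = rng.random()
    if draw < p_triple:
        return "3B"
    if draw < p_triple + p_double:
        return "2B"
    return "1B"

rng = random.Random(13)
# Hypothetical per-hit shares (illustrative values only).
results = [step3_hit(0.12, 0.02, 0.02, 0.20, rng) for _ in range(10_000)]
print({name: results.count(name) for name in sorted(set(results))})
```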


PA Decision Tree – Matchup Weights

Addresses previous 50/50 assumption using League-Adjusted Variance to form batter and pitcher weights for each step:

Weight      HBP/PA   BB/(PA-HBP)   H/AB   K/Out   HR/Hit   2B/Hit   3B/Hit
Pitcher %   47.8     43.5          46.7   45.6    39.7     15.2     11.6
Hitter %    52.2     56.5          53.3   54.4    60.3     84.8     88.4
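The batting average example later in the deck uses multipliers of .934 for the pitcher and 1.066 for the hitter, which equal twice the H/AB weights in this table (so a 50/50 split gives 1.0 and 1.0). The sketch below applies the same scaling to every column; extending it beyond H/AB is an assumption, and the names are hypothetical.

```python
# (pitcher %, hitter %) for each decision-tree step, copied from the table above.
MATCHUP_WEIGHTS = {
    "HBP/PA": (47.8, 52.2),
    "BB/(PA-HBP)": (43.5, 56.5),
    "H/AB": (46.7, 53.3),
    "K/Out": (45.6, 54.4),
    "HR/Hit": (39.7, 60.3),
    "2B/Hit": (15.2, 84.8),
    "3B/Hit": (11.6, 88.4),
}

def matchup_multipliers(step):
    """Scale percentage weights so that an even 50/50 split maps to (1.0, 1.0)."""
    pitcher_pct, hitter_pct = MATCHUP_WEIGHTS[step]
    return 2 * pitcher_pct / 100, 2 * hitter_pct / 100

for step in MATCHUP_WEIGHTS:
    pitcher_mult, hitter_mult = matchup_multipliers(step)
    print(f"{step}: pitcher x{pitcher_mult:.3f}, hitter x{hitter_mult:.3f}")
```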


Matchup Weights: What does this mean?

• Batter always has more control (even with HBP and BB)

– Makes final decision (swing or not)

– Dictates strike zone

– Less consistent

• Doubles and Triples are (mostly) out of pitcher’s control (BABIP)

• Does not necessarily mean batting is more important

– 9 vs. 1

– Fewer pitcher outliers means elite pitchers are more valuable


PA Decision Tree - Normalization

Batting Average Example using Matchup Weights:

H/AB = ((1.066*AVG * .934*OAV) / LgAVG) / ((1.066*AVG * .934*OAV) / LgAVG + (1.066 - 1.066*AVG) * (.934 - .934*OAV) / (1 - LgAVG))

Where LgAVG = (.934*PLgAVG + 1.066*BLgAVG) / 2

2000 Pedro vs. 1923 Ruth Example (with handedness):

H/AB = ((1.066*.393 * .934*.167) / .2795) / ((1.066*.393 * .934*.167) / .2795 + (1.066 - 1.066*.393) * (.934 - .934*.167) / (1 - .2795))

Where LgAVG = (.934*.276 + 1.066*.283) / 2 or .2795

Result * Handedness = .2502 * 1.045

Final Result = .2614
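The weighted formula above can be checked with a short function. This is a sketch under the formula exactly as written on the slide, with hypothetical names; the production engine may differ. With the rounded inputs shown, it gives roughly .2505 before the handedness multiplier and .2618 after, close to but not identical to the slide's .2502 and .2614, which may reflect rounding in the displayed inputs.

```python
def weighted_log5(avg, oav, batter_league, pitcher_league,
                  hitter_mult=1.066, pitcher_mult=0.934):
    """Log5 with matchup multipliers applied as in the slide's H/AB formula."""
    league = (pitcher_mult * pitcher_league + hitter_mult * batter_league) / 2
    success = (hitter_mult * avg) * (pitcher_mult * oav) / league
    failure = ((hitter_mult - hitter_mult * avg)
               * (pitcher_mult - pitcher_mult * oav)) / (1 - league)
    return success / (success + failure)

HANDEDNESS = 1.045  # handedness multiplier quoted on the slide

base = weighted_log5(0.393, 0.167, batter_league=0.283, pitcher_league=0.276)
final = base * HANDEDNESS
print(round(base, 4), round(final, 4))
```

In this written form the product of the two multipliers appears in both the success and failure terms and cancels, so the weights change the result only through the weighted league average.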