A Multiresolution Stochastic Process Model for Basketball ...dcervone.com/slides/epv_jsm.pdfNBA optical tracking data (x;y) locations for all 10 players (5 on each team) at 25Hz. (x;y;z)

A Multiresolution Stochastic Process Model forBasketball Possession Outcomes

Dan Cervone, Alex D’Amour, Luke Bornn, Kirk Goldsberry

Harvard Statistics Department

August 11, 2015

Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015

NBA optical tracking data



(x , y) locations for all 10 players (5 on each team) at 25Hz.

(x , y , z) locations for the ball at 25Hz.

Event annotations (shots, passes, fouls, etc.).

1230 games from 2013-14 NBA, each 48 minutes, featuring 461 players intotal.


http://dcervone.com/wp-content/uploads/2015/01/nba_data.gif


(x , y) locations for all 10 players (5 on each team) at 25Hz.

(x , y , z) locations for the ball at 25Hz.

Event annotations (shots, passes, fouls, etc.).

1230 games from 2013-14 NBA, each 48 minutes, featuring 461 players intotal.


http://dcervone.com/wp-content/uploads/2015/01/nba_data.gif

Expected Possession Value (EPV)

JamesshotNORRIS COLE POSSESSION

RASHARDLEWISPOSS.

LEBRON JAMES POSSESSION

EPV constant whilepass is en route

PassPass

Slight dip in EPV aftercrossing 3 point line


Acceleratestowards basket

Runs behindbasket and

defenders close

Acceleratesinto the paint

Splits the defense;clear path to

basket

0.8

1.0

1.2

1.4

1.6

time

EP

V

0 3 6 9 12 15 18


EPV definition

Let Ω be the space of all possible basketball possessions. For ω ∈ Ω

X (ω) ∈ 0, 2, 3: point value of possession ω.

T (ω): possession length

Zt(ω), 0 ≤ t ≤ T (ω): time series of optical tracking data.

F (Z)t = σ(Z−1

s : 0 ≤ s ≤ t): natural filtration.

Definition

The expected possession value (EPV) at time t ≥ 0 during a possession is

νt = E[X |F (Z)t ].

EPV provides an instantaneous snapshot of the possession’s value, given itsfull spatiotemporal history.

νt is a Martingale: E[νt+h|F (Z)t ] = νt for all h > 0.


EPV definition





F (Z)t = σ(Z−1


Definition


νt = E[X |F (Z)t ].




EPV definition





F (Z)t = σ(Z−1


Definition


νt = E[X |F (Z)t ].




EPV definition





F (Z)t = σ(Z−1


Definition


νt = E[X |F (Z)t ].




EPV definition





F (Z)t = σ(Z−1


Definition


νt = E[X |F (Z)t ].




EPV definition





F (Z)t = σ(Z−1


Definition


νt = E[X |F (Z)t ].




EPV definition





F (Z)t = σ(Z−1


Definition


νt = E[X |F (Z)t ].




Calculating EPVνt = E[X |F (Z)

t ]

Regression-type prediction methods:

– Data are not traditional input/output pairs.

– No guarantee of stochastic consistency.

Markov chains:

+ Stochastically consistent.

– Information is lost through discretization.

– Many rare transitions.

Brute force, “God model” for basketball.

+ Allows Monte Carlo calculation of νt by simulating future possession paths.

– Zt is high dimensional and includes discrete events (passes, shots, turnovers).



t ]




Markov chains:









t ]




Markov chains:








A coarsened process

Finite collection of states C = Cposs ∪ Cend ∪ Ctrans.

Cposs: Ball possession states

player × region × defender within 5 feet

Cend: End states

made 2, made 3, turnover

Ctrans: Transition states

pass linking c , c ′ ∈ Cposs,shot attempt from c ∈ Cposs,

turnover in progress,rebound in progress .

Ct ∈ C: state of the possession at time t.

C (0),C (1), . . . ,C (K): discrete sequence of distinct states.


A coarsened process



player × region × defender within 5 feetCend: End states








A coarsened process











A coarsened process











Possible paths for Ct

Cposs

Player 1

Passto

P2 Player 2

Passto

P3 Player 3

Passto

P4 Player 4

Passto

P5 Player 5

Ctrans Turnover Shot Rebound

Cend Made 2pt Made 3pt End of possession

Figure 1: Schematic of the coarsened possession process C, with states (rectangles) and possiblestate transitions (arrows) shown. The unshaded states in the first row compose Cposs. Here, statescorresponding to distinct ballhandlers are grouped together (Player 1 through 5), and the discretizedcourt in each group represents the player’s coarsened position and defended state. The gray shadedrectangles are transition states, Ctrans, while the rectangles in the third row represent the end states,Cend. Blue arrows represent possible macrotransition entrances (and red arrows, macrotransitionexits) when Player 1 has the ball; these terms are introduced in Section ??.

1

τt =

mins : s > t,Cs ∈ Ctrans if Ct ∈ Cposs

t if Ct 6∈ Cposs

δt = mins : s ≥ τt ,Cs 6∈ Ctrans.


Possible paths for Ct

Cposs

Player 1

Passto

P2 Player 2

Passto

P3 Player 3

Passto

P4 Player 4

Passto

P5 Player 5

Ctrans Turnover Shot Rebound

Cend Made 2pt Made 3pt End of possession

Figure 1: Schematic of the coarsened possession process C, with states (rectangles) and possiblestate transitions (arrows) shown. The unshaded states in the first row compose Cposs. Here, statescorresponding to distinct ballhandlers are grouped together (Player 1 through 5), and the discretizedcourt in each group represents the player’s coarsened position and defended state. The gray shadedrectangles are transition states, Ctrans, while the rectangles in the third row represent the end states,Cend. Blue arrows represent possible macrotransition entrances (and red arrows, macrotransitionexits) when Player 1 has the ball; these terms are introduced in Section ??.

1

τt =


t if Ct 6∈ Cposs



Stopping times for switching resolutions

τt =


t if Ct 6∈ Cposs


Key assumptions:

A1 C is marginally semi-Markov.

A2 For all s > δt , P(Cs |Cδt ,F (Z)t ) = P(Cs |Cδt ).

Theorem

Assume (A1)–(A2), then for all 0 ≤ t < T,

νt =∑

c∈Ctrans∪Cend

E[X |Cδt = c]P(Cδt = c |F (Z)t ).



τt =


t if Ct 6∈ Cposs


Key assumptions:



Theorem


νt =∑

c∈Ctrans∪Cend




τt =


t if Ct 6∈ Cposs


Key assumptions:



Theorem


νt =∑

c∈Ctrans∪Cend




τt =


t if Ct 6∈ Cposs


Key assumptions:



Theorem


νt =∑

c∈Ctrans∪Cend



Multiresolution models

EPV:νt =

∑c∈Ctrans∪Cend


Let M(t) be the event τt ≤ t + ε.M1 P(Zt+ε|M(t)c ,F (Z)

t ): the microtransition model.

M2 P(M(t)|F (Z)t ): the macrotransition entry model.

M3 P(Cδt |M(t),F (Z)t ): the macrotransition exit model.

M4 P, with Pqr = P(C (n+1) = cr |C (n) = cq): the Markov transition probabilitymatrix.

Monte Carlo computation of νt :

Draw Cδt |F (Z)t using (M1)–(M3).

Calculate E[X |Cδt ] using (M4).



EPV:νt =













EPV:νt =












Microtransition model

Player `’s position at time t is z`(t) = (x`(t), y `(t)).

x`(t + ε) = x`(t) + α`x [x`(t)− x`(t − ε)] + η`x(t)

η`x(t) ∼ N (µ`x(z`(t)), (σ`x)2).µx has Gaussian Process prior.y `(t) modeled analogously (and independently).

Defensive microtransition model based on defensive matchups[Franks et al., 2015].






TONY PARKER WITH BALL DWIGHT HOWARD WITH BALL







TONY PARKER WITHOUT BALL DWIGHT HOWARD WITHOUT BALL







TONY PARKER WITHOUT BALL DWIGHT HOWARD WITHOUT BALL



Macrotransition entry model

Recall M(t) = τt ≤ t + ε:Six different “types”, based on entry state Cτt ,∪6

j=1Mj(t) = M(t).

Hazards: λj(t) = limε→0P(Mj (t)|F (Z)

t )ε .

log(λj(t)) = [W`j (t)]′β`j + ξ`j

(z`(t)

)




j=1Mj(t) = M(t).


t )ε .


(z`(t)

)




j=1Mj(t) = M(t).


t )ε .


(z`(t)

)+ ξ`j (zj(t)) 1[j ≤ 4]


Hierarchical modeling

Dwight Howard’s shot chart:


Hierarchical modeling

Shrinkage needed:

Across space.

Across different players.


Basis representation of spatial effects

Spatial effects ξ`j`: ballcarrier identity.

j : macrotransition type (pass, shot, turnover).

Functional basis representation

ξ`j (z) = [w`j ]′φj(z).

φj = (φji . . . φjd)′: d spatial basis functions.

w`j : weights/loadings.

Information sharing

φj allows for non-stationarity, correlations between disjoint regions[Higdon, 2002].

w`j : weights across players follow a CAR model [Besag, 1974] based on playersimilarity graph H.






ξ`j (z) = [w`j ]′φj(z).



Information sharing








ξ`j (z) = [w`j ]′φj(z).



Information sharing





Basis functions φj learned in pre-processing step:

Graph H learned from players’ court occupancy distribution:



Basis functions φj learned in pre-processing step:

Graph H learned from players’ court occupancy distribution:


Inference

“Partially Bayes” estimation of all model parameters:

Multiresolution transition models provide partial likelihood factorization[Cox, 1975].

All model parameters estimated using R-INLA software[Rue et al., 2009, Lindgren et al., 2011].

Distributed computing implementation:

Preprocessing involves low-resource, highly parallelizable tasks.

Parameter estimation involves several CPU- and memory-intensive tasks.

Calculating EPV from parameter estimates involves low-resource, highlyparallelizable tasks.


Inference









Inference









New insights from basketball possessions

JamesshotNORRIS COLE POSSESSION

RASHARDLEWISPOSS.

LEBRON JAMES POSSESSION

EPV constant whilepass is en route

PassPass



Acceleratestowards basket

Runs behindbasket and

defenders close

Acceleratesinto the paint

Splits the defense;clear path to

basket

0.8

1.0

1.2

1.4

1.6

time

EP

V

0 3 6 9 12 15 18


New insights from basketball possessions0.

81.

01.

21.

41.

6

time

EP

V

0 3 6 9 12 15 18

A

B

C

D

1

2 3

4

5

1

2

34

51

2 3

4

5

1

2

34

51

2 3

4

5

1

2

34

5

A

V 1.42P 0.83

V 1.09P 0.00

V 0.98P 0.04

V 1.01P 0.06

V 1.03P 0.04

TURNOVERV 0.00P 0.02

OTHERV 0.99P 0.02

LEGENDMOVEMENT

Macro History Micro

MACROTRANSITION VALUE (V)

High Medium Low

MACROTRANSITION PROBABILITY (P)High Medium Low

OFFENSE1

2

3

4

5

Norris ColeRay AllenRashard LewisLeBron JamesChris Bosh

DEFENSE1

2

3

4

5

Deron WilliamsJason Terry

Joe JohnsonAndray Blatche

Brook Lopez

1

23

4

5

1

2

3

451

23

4

5

1

2

3

451

23

4

5

1

2

3

45

B

V 1.17P 0.27

V 1.02P 0.03 V 0.98

P 0.04

V 1.01P 0.23

V 1.03P 0.12


OTHERV 1.00P 0.16

1

23

4

5

1

23

45

1

23

4

5

1

23

45

1

23

4

5

1

23

45

C

V 1.01P 0.05

V 1.00P 0.13

V 1.01P 0.04

V 1.01P 0.61

V 1.05P 0.04 TURNOVER

V 0.00P 0.01

OTHERV 0.97P 0.13

12

3

4

5

12

3

4

5

12

3

4

5

12

3

4

5

12

3

4

5

12

3

4

5

D

V 1.50P 0.83

V 1.01P 0.01

V 1.09P 0.01

V 0.98P 0.04

V 1.03P 0.00


OTHERV 1.06P 0.06


New metrics for player performance

EPV-added:

Rank Player EPVA

1 Kevin Durant 3.262 LeBron James 2.963 Jose Calderon 2.794 Dirk Nowitzki 2.695 Stephen Curry 2.506 Kyle Korver 2.017 Serge Ibaka 1.708 Channing Frye 1.659 Al Horford 1.55

10 Goran Dragic 1.54

Rank Player EPVA

277 Zaza Pachulia -1.55278 DeMarcus Cousins -1.59279 Gordon Hayward -1.61280 Jimmy Butler -1.61281 Rodney Stuckey -1.63282 Ersan Ilyasova -1.89283 DeMar DeRozan -2.03284 Rajon Rondo -2.27285 Ricky Rubio -2.36286 Rudy Gay -2.59

Table : Top 10 and bottom 10 players by EPV-added (EPVA) per game in 2013-14,minimum 500 touches during season.


New metrics for player performance

Shot satisfaction:

Rank Player Satis.

1 Mason Plumlee 0.352 Pablo Prigioni 0.313 Mike Miller 0.274 Andre Drummond 0.265 Brandan Wright 0.246 DeAndre Jordan 0.247 Kyle Korver 0.248 Jose Calderon 0.229 Jodie Meeks 0.22

10 Anthony Tolliver 0.22

Rank Player Satis.

277 Garrett Temple -0.02278 Kevin Garnett -0.02279 Shane Larkin -0.02280 Tayshaun Prince -0.03281 Dennis Schroder -0.04282 LaMarcus Aldridge -0.04283 Ricky Rubio -0.04284 Roy Hibbert -0.05285 Will Bynum -0.05286 Darrell Arthur -0.05

Table : Top 10 and bottom 10 players by shot satisfaction in 2013-14, minimum 500touches during season.


Acknowledgements and future work

Our EPV framework can be extended to better incorporate unique basketballstrategies:

Additional macrotransitions can be defined, such as pick and rolls, screens,and other set plays.

Use more information in defensive matchups (only defender locations, notidentities, are currently used).

Summarize and aggregate EPV estimates into useful player- or team-specificmetrics.

Thanks to:

Co-authors: Alex D’Amour, Luke Bornn, Kirk Goldsberry.

Colleagues: Alex Franks, Andrew Miller.


Acknowledgements and future work

Our EPV framework can be extended to better incorporate unique basketballstrategies:

Additional macrotransitions can be defined, such as pick and rolls, screens,and other set plays.

Use more information in defensive matchups (only defender locations, notidentities, are currently used).

Summarize and aggregate EPV estimates into useful player- or team-specificmetrics.

Thanks to:

Co-authors: Alex D’Amour, Luke Bornn, Kirk Goldsberry.

Colleagues: Alex Franks, Andrew Miller.


References I

[0] Besag, J. (1974).Spatial interaction and the statistical analysis of lattice systems.Journal of the Royal Statistical Society: Series B (Methodological), 36(2):192–236.

[0] Cox, D. R. (1975).A note on partially bayes inference and the linear model.Biometrika, 62(3):651–654.

[0] Franks, A., Miller, A., Bornn, L., and Goldsberry, K. (2015).Characterizing the spatial structure of defensive skill in professional basketball.Annals of Applied Statistics.

[0] Higdon, D. (2002).Space and space-time modeling using process convolutions.In Quantitative Methods for Current Environmental Issues, pages 37–56. Springer, New York, NY.

[0] Lindgren, F., Rue, H., and Lindstrom, J. (2011).An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach.Journal of the Royal Statistical Society: Series B (Methodological), 73(4):423–498.

[0] Rue, H., Martino, S., and Chopin, N. (2009).Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations.Journal of the Royal Statistical Society: Series B (Methodological), 71(2):319–392.


Documents

A Multiresolution Stochastic Process Model for Basketball ...dcervone.com/slides/epv_jsm.pdfNBA optical tracking data (x;y) locations for all 10 players (5 on each team) at 25Hz. (x;y;z)