27
Winning Daily Fantasy Sports Hockey Contests Using Integer Programming David Scott Hunter Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA 02139, [email protected] Juan Pablo Vielma Department of Operations Research, Sloan School of Management, Massachusetts Institute of Technology, Cambridge, MA 02139, [email protected] Tauhid Zaman Department of Operations Management, Sloan School of Management, Massachusetts Institute of Technology, Cambridge, MA 02139, [email protected] We present an integer programming approach to winning daily fantasy sports hockey contests which have top heavy payoff structures (i.e. most of the winnings go to the top ranked entries). Our approach incorporates publicly available predictions on player and team performance into a series of integer programming problems that compute optimal lineups to enter into the contests. We find that the lineups produced by our approach perform well in practice and are even able to come in first place in contests with thousands of entries. We also show through simulations how the profit margin varies as a function of the number of lineups entered and the points distribution of the entered lineups. Studies on real contest data show how different versions of our integer programming approach perform in practice. We find two major insights for winning top heavy contests. First, each lineup entered should be constructed in a way to maximize its individual mean, and more importantly, variance. Second, multiple lineups should be entered and they should be constructed in a way to limit their correlation with each other. Our approach can easily be extended to other sports besides hockey, such as American football and baseball. Key words : sports analytics, daily fantasy sports, statistics, integer programming History : 1. Introduction Daily fantasy sports allow users to enter lineups of players in a sport (such as hockey or American football) into daily contests. If the players on a user’s lineup perform well, the user can win a substantial amount of money in the contests. Daily fantasy sports have recently become a huge industry. For instance, DraftKings, the leading company in this area, reported $30 million in revenues and $304 million in entry fees in 2014 (Heitner 2015). While there is ongoing legal debate as to whether or not daily fantasy sports contests are gambling (Board 2015), there is a strong element of skill required to continuously win contests as evidenced by the fact that the top one percent of players pay 40 percent of the entry fees and gain 91 percent of the 1 arXiv:1604.01455v1 [stat.OT] 6 Apr 2016

Winning Daily Fantasy Sports Hockey Contests Using Integer ... · into daily contests. If the players on a user’s lineup perform well, the user can win a substantial amount of money

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Winning Daily Fantasy Sports Hockey Contests Using Integer ... · into daily contests. If the players on a user’s lineup perform well, the user can win a substantial amount of money

Winning Daily Fantasy Sports Hockey Contests UsingInteger Programming

David Scott HunterOperations Research Center, Massachusetts Institute of Technology, Cambridge, MA 02139, [email protected]

Juan Pablo VielmaDepartment of Operations Research, Sloan School of Management, Massachusetts Institute of Technology, Cambridge, MA 02139,

[email protected]

Tauhid ZamanDepartment of Operations Management, Sloan School of Management, Massachusetts Institute of Technology, Cambridge, MA 02139,

[email protected]

We present an integer programming approach to winning daily fantasy sports hockey contests which have top heavy

payoff structures (i.e. most of the winnings go to the top ranked entries). Our approach incorporates publicly available

predictions on player and team performance into a series of integer programming problems that compute optimal lineups

to enter into the contests. We find that the lineups produced by our approach perform well in practice and are even able to

come in first place in contests with thousands of entries. We also show through simulations how the profit margin varies

as a function of the number of lineups entered and the points distribution of the entered lineups. Studies on real contest

data show how different versions of our integer programming approach perform in practice. We find two major insights

for winning top heavy contests. First, each lineup entered should be constructed in a way to maximize its individual

mean, and more importantly, variance. Second, multiple lineups should be entered and they should be constructed in a

way to limit their correlation with each other. Our approach can easily be extended to other sports besides hockey, such

as American football and baseball.

Key words: sports analytics, daily fantasy sports, statistics, integer programming

History:

1. IntroductionDaily fantasy sports allow users to enter lineups of players in a sport (such as hockey or American football)

into daily contests. If the players on a user’s lineup perform well, the user can win a substantial amount of

money in the contests. Daily fantasy sports have recently become a huge industry. For instance, DraftKings,

the leading company in this area, reported $30 million in revenues and $304 million in entry fees in 2014

(Heitner 2015). While there is ongoing legal debate as to whether or not daily fantasy sports contests are

gambling (Board 2015), there is a strong element of skill required to continuously win contests as evidenced

by the fact that the top one percent of players pay 40 percent of the entry fees and gain 91 percent of the

1

arX

iv:1

604.

0145

5v1

[st

at.O

T]

6 A

pr 2

016

Page 2: Winning Daily Fantasy Sports Hockey Contests Using Integer ... · into daily contests. If the players on a user’s lineup perform well, the user can win a substantial amount of money

Hunter, Vielma, and Zaman: Winning2

profits (Harwell 2015). Daily fantasy sports therefore provide a great opportunity for users to compete and

earn profits with analytics. Several sites such as Rotogrinders have been created which solely focus on daily

fantasy sports analytics (Rotogrinders 2015). Other sites such as Daily Fantasy Nerd even offer to provide

optimized contest lineups for a fee (DailyFantasyNerd 2015).

There are a variety of contests in daily fantasy sports with different payoff structures. Some contests

distribute all the entry fees equally among all lineups that score above the median score of the contest. These

are known as double-up or 50/50 contests. They are relatively easy to win, but do not pay out very much.

Other contests give a disproportionately high fraction of the entry fees to the top scoring lineups. We refer

to these as top heavy contests. Winning a top heavy contest is much more difficult than winning a double-up

or a 50/50 contest, but the payoff is also much greater. Typical entry fees range from $3 to $30, while the

potential winnings range from thousands to a million dollars (DraftKings 2015).

A daily fantasy sports company sets a price (in virtual dollars) for each player in a sport. These prices

vary day to day and are based upon past and predicted performance (Steinberg 2015). Lineups must then be

constructed within a fixed budget. For instance, in DraftKings hockey contests, there is a $50,000 budget

for each lineup. There are further constraints on the maximum number of each type of player allowed and

the number of different teams represented in a lineup. Finding an optimal lineup is challenging because

the performance of each player is unknown and there are hundreds of players to choose from. However,

predictions for player points can be found on a host of sites such as Rotogrinders (Rotogrinders 2015) or

Daily Fantasy Nerd (DailyFantasyNerd 2015) which are fairly accurate, and many fantasy sports players

have a good sense about how players will perform. This suggests that the main challenge is not in predicting

player performance, but in using predictions and other information to construct optimal lineups.

The strategy for constructing lineups depends upon the type of contest being entered. For double-up and

50/50 contests all that is needed is a lineup that beats the median. In this case, one would want a lineup that

has a high average predicted score but also low variance. In contrast, top heavy contests provide extreme

profits if a lineup can achieve a very high rank. If the predictions were perfect, then this could be done with a

single lineup, similar to what one would do for a double-up or 50/50. The problem with this approach is that

the predictions are noisy, so one could not consistently achieve a high rank with a single lineup. However,

two things can be done to increase the chance of victory. First, one can use lineups which have an inherently

large variance, which would increase the chance of the lineup having an extremely good performance and

achieving a high rank. Second, one can enter multiple lineups, thereby increasing the chance that at least

one of them will rank high. Because of the payoff structure of a top heavy contest, having at least one lineup

achieve a high rank will be enough to make a substantial profit. Therefore, a good approach to winning top

heavy contests is to enter many lineups with high variance.

In this work we use integer programming to win top heavy hockey contests in DraftKings. Our integer

programming approach allows us to systematically construct multiple lineups which have high variance. We

Page 3: Winning Daily Fantasy Sports Hockey Contests Using Integer ... · into daily contests. If the players on a user’s lineup perform well, the user can win a substantial amount of money

Hunter, Vielma, and Zaman: Winning3

November 15, 2015 November 16, 2015 November 17, 2015 November 23, 2015

Figure 1 Screenshots of the performance of our top integer programming lineups in real DraftKings hockey

contests with 200 linueps entered. All profits are being donated to charity.

chose hockey because we are not familiar with it. This way, any success we have is strictly due to our use of

integer programming and not any of our own expert knowledge. The contests we focused on had thousands

of entrants and top payoffs of several thousand dollars.

During the period when this work was conducted, DraftKings allowed a single user to enter 200 lineups

into a contest. We began competing with our integer programming approach and performed very well several

days in a row. We even placed third or higher in three consecutive contests. We show screenshots of our

performance with 200 lineups in Figure 1. A week after we began winning with our integer programming

approach, DraftKings reduced the number of multiple entries to 100. This reduced our performance, but

we were still able to perform well in the contests. This demonstrates the power of integer programming for

winning daily fantasy sports contests.

The remainder of the paper is structured as follows. We begin by reviewing the literature on sports ana-

lytics with a particular focus on hockey in Section 1.1. We present an empirical analysis of NHL player

performance and the prediction accuracy of sites which forecast player performance in Section 2. We present

a simulation study of the multiple lineups strategy in Section 3 which shows how much of an edge a set

of lineups needs in order to win top heavy contests. Section 4 presents our integer programming approach

for constructing multiple lineups. We evaluate the performance of our integer programming approach in

actual DraftKings hockey contests in Section 5. We conclude in Section 6 with a discussion on extending

our integer programming approach to other sports.

1.1. Previous Work

Much of the extant work in hockey analytics has focused on statistical models of player performance. The

traditional measure of a player’s ability is the plus-minus statistic, which measures the number of goals

Page 4: Winning Daily Fantasy Sports Hockey Contests Using Integer ... · into daily contests. If the players on a user’s lineup perform well, the user can win a substantial amount of money

Hunter, Vielma, and Zaman: Winning4

scored by a player’s team minus the number of goals scored by the opposing team when the player in on

the ice. Many modifications have been suggested for this simple metric in order to provide a more clear

assessment of a player’s quality. The adjusted minus/plus probability approach of Schuckers et al. (2011)

attempts to add in other data such as hits or face-offs. Controlling for team strength by using the team-

average plus-minus was proposed in Awad (2009). Other approaches attempt to statistically infer the effects

of individual factors on team performance (Thomas et al. 2013). An approach using a regression on goals

per hour to assess player performance in proposed in Macdonald (2011). To deal with the large number of

features present in the high dimensional regressions that occur for player performance, a regularized logistic

regression approach is used in Gramacy et al. (2013). The Added Goal Value metric is presented in Pettigrew

(2015) which incorporates information on what are known as power plays in hockey. A methodology known

as total hockey rating is proposed in Schuckers and Curro (2013) for performance evaluation of skaters

which incorporates data on factors such as zone starts and non-shooting events such as turnovers.

While the above models focus on skating players, a different approach is needed when evaluating goalie

performance. Typically a goalie is evaluated by their save percentage. However, this may depend upon the

strength of the defense of the team. A defense independent rating of goalies is presented in Schuckers (2011)

which uses data on the spatial locations of shots they faced. In Beaudoin and Swartz (2010) a simulator is

developed which assesses strategies for pulling goalies in hockey games.

In the area of fantasy sports there are a plethora of websites which provide strategies for drafting lineups

or algorithmic tools which provide optimized lineups. However, there is very little formal academic work

on this topic. In Becker and Sun (2014) an integer programming approach is presented which optimizes

lineups for an entire fantasy American football season. This work has elements similar to our work, such as

the use of integer programming to optimize lineups, but there are a few key differences. First the focus is on

season long fantasy contests, whereas we focus on daily fantasy contests. Second, the model is developed

for American football, which has a very different structure from hockey. Third, the approach produces a

single optimized lineup, whereas our strategy is to generate multiple lineups.

2. Hockey Data AnalysisThere are two capabilities we utilize in our approach to winning daily fantasy sports hockey contests. The

first is exploiting the structure of hockey teams and fantasy points scoring systems in order to construct

lineups that can score many fantasy points. The second is to be able to predict the performance of players so

that our lineups do indeed achieve the scores we expect. In this section we present data analysis which will

show how to predict player performance and guide the construction of an integer programming formulation

to select multiple lineups in Section 4. The daily fantasy contests we focus on are from the site DraftKings

and are based on players in the National Hockey League (NHL), which is the main hockey league in North

America. We begin by describing the basic elements of hockey and hockey daily fantasy sports contests.

Page 5: Winning Daily Fantasy Sports Hockey Contests Using Integer ... · into daily contests. If the players on a user’s lineup perform well, the user can win a substantial amount of money

Hunter, Vielma, and Zaman: Winning5

We then evaluate the performance of different public websites that provide predictions for player and team

performance. After this we study the correlations in player performance which result from the structure of

hockey teams. While our analysis focuses on DraftKings, we note that other daily fantasy sports sites have

very similar contests and point scoring systems. All data in this section comes from October 21, 2015 until

December 31, 2015.

2.1. Hockey Basics

A hockey team consists of six players: one center, two wingers, two defensemen, and a goalie. The centers,

wingers, and defensemen are known as skaters because they are free to skate on the ice rink, whereas the

goalie primarily stays in front of the goal. In hockey the aim is for a team to shoot a puck into the goal of

the opposing team and the goalie’s task is to protect the goal from any opponent scores.

In daily fantasy sports hockey contests there are constraints on the types of players that must be present

in a lineup. For DraftKings, each lineup must consist of two centers, three wingers, two defensemen, one

goalie, and one utility player which is any skater, resulting in a lineup of nine players. We show a screenshot

of one possible DraftKings lineup in Figure 2. The figure also shows a column labeled FPPG, which stands

for fantasy points per game. This is the average number of fantasy points per game the player has earned

throughout the current NHL season. DraftKings provides this information to help users construct their

lineups. The total fantasy points for a lineup is the sum of each player’s fantasy points. The lineup shown

has 34.4 fantasy points per game. This score is in the typical range of NHL lineups in DraftKings and gives

a sense of the value of different scoring actions. In a daily fantasy sports contest, there is a specific way in

which players score fantasy points. Because we test our approach in DraftKings, we will focus on its points

scoring system1. We summarize the details of the DraftKings scoring system in Table 1. In Figure 2 we

show the basic statistics of points per game for each position averaged over all NHL players. We also show

the average DraftKings cost per player in the figure. As can be seen, on average goalies tend to score almost

twice as many points as all other players and also cost twice as much. Also, the standard deviation of each

position’s points per game is roughly the same size as the average value.

The centers and wingers are known as offensive players. In the NHL each hockey team plays its offensive

players in fixed sets known as lines. This means that if one player from the line, such as the center, is on

the ice, then so are the other two players from the line. Each team has four standard lines and two power

play lines, which only play when there is a power play as a result of a penalty. We show the average points

per game for players based upon their line in Figure 2. The rank of the line typically suggests how well it

performs. As can be seen, line one is generally the highest scoring line.

The points system of DraftKings is very similar to most daily fantasy hockey points systems. It gives

more points for players that perform better, but it also has some important properties which we use to

1 Full scoring rules can be found at www.draftkings.com/rules.

Page 6: Winning Daily Fantasy Sports Hockey Contests Using Integer ... · into daily contests. If the players on a user’s lineup perform well, the user can win a substantial amount of money

Hunter, Vielma, and Zaman: Winning6

Score type PointsGoal 3Assist 2Shot on Goal 0.5Blocked Shot 0.5Short Handed Point Bonus (Goal/Assist) 1Shootout Goal 0.2Hat Trick Bonus 1.5Win (goalie only) 3Save (goalie only) 0.2Goal allowed (goalie only) -1Shutout Bonus (goalie only) 2

Table 1 Points system for NHL contests in DraftKings.

construct our lineups. The first property concerns assists, which is passing the puck to a player who either

scores or who passes the puck to the player that scores. Each assist is worth two points and DraftKings

awards points for up to two assists per goal. This is an important aspect of the scoring system because it

means that for each goal scored, it is possible for a single lineup to receive seven points (three points for the

goal, and four points for the two assists that led to the goal). However, such a situation can only occur if all

players involved are on a single lineup. This suggests that to score many points, a lineup should contain all

players from a single line. By putting all players from a single line in a lineup, we are essentially increasing

the variance of the lineup. This is a popular fantasy sports tactic known as stacking. Many expert daily

fantasy sports players use stacking to win in a variety of sports contests (Drewby 2015), (Bales 2015b),

(Bales 2015a). The second property of the points system concerns goalies. If a goalie’s team wins the game,

an additional three points is awarded to the goalie. This is important because as we will see, it is possible

to predict who will win an NHL hockey game with relatively good accuracy, so by choosing goalies on

winning teams, an extra three points can be obtained, which is a significant amount in hockey contests given

that lineups average between 30 to 40 points.

2.2. Prediction Sites

Many fantasy sports players devote a substantial amount of effort in developing elaborate prediction mod-

els for players’ performance. Their advantage comes mainly from the power of their predictive models.

However, we do not have the domain knowledge required to improve upon the prediction models of the top

fantasy players. Therefore, we take a different approach. Publicly available player point predictions are pro-

vided by the websites Rotogrinders (Rotogrinders 2015) and Daily Fantasy Nerd (DailyFantasyNerd 2015).

We find the predictions from these sites to be relatively accurate, but the accuracy varies by position. We

plot in Figure 2 the mean absolute error in points of the two sites’ predictions for different hockey positions

using our data from the 2015 NHL season. We see that the prediction errors of both sites are very similar.

Therefore, it does not seem that either one has a superior prediction model. In Section 4.1 we build a simple

regression based prediction model which uses the data from both of these sites.

Page 7: Winning Daily Fantasy Sports Hockey Contests Using Integer ... · into daily contests. If the players on a user’s lineup perform well, the user can win a substantial amount of money

Hunter, Vielma, and Zaman: Winning7

Goalie Center Winger Defenseman Skaters0

0.5

1

1.5

2

2.5

3

3.5

Mea

n A

bsol

ute

Err

or [F

PP

G]

Daily Fantasy NerdRotogrinders

Goalie Center Winger Defenseman0

1

2

3

4

5

6

7

8Average FPPGStandard Deviation FPPGAverage Player Cost [$ thousands]

Winger Center0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5Line 1Line 2Line 3Line 4Power Play Line 1Power Play Line 2

Ave

rage

FP

PG

Figure 2 (top left) Mean and standard deviation of fantasy points per game (FPPG) and average player cost

in DraftKings for each hockey position. (bottom left) Mean absolute error of NHL player FPPG

predictions of Daily Fantasy Nerd and Rotogrinders for each hockey position. (top right) Mean

FFPG of offensive skaters in different types of lines. (bottom right) Screenhsot of an NHL lineup

from DraftKings.

2.3. Player Correlations

In Section 4 we will present an integer programming formulation which creates lineups with a large vari-

ance. The key to this approach is putting players on lineups with positive correlations in their FPPG. This

way a lineup will either perform very well or very poorly. For top heavy contests, we will see that this

strategy is effective. To create high variance lineups we must first understand the key correlations between

players. There are two important correlations we look at. The first concerns players in lines. The second

concerns offensive players and goalies.

We first begin by examining the correlations of FPPG between players in the same line versus those in

different lines. We calculate the Pearson correlation coefficient between the FPPG for every pair of NHL

offensive players (centers and wingers) in the same line and in the same team but different line. We plot

the resulting average correlation coefficients for each group in Figure 3. We see that the players on the

same line have a much higher correlation than players on different lines. This is not surprising given the

way fantasy points are awarded. This correlation is most likely due to the assist/goal combinations that give

Page 8: Winning Daily Fantasy Sports Hockey Contests Using Integer ... · into daily contests. If the players on a user’s lineup perform well, the user can win a substantial amount of money

Hunter, Vielma, and Zaman: Winning8

Line 1 Line 2 Line 3 Line 4 Different Line−0.2

−0.15

−0.1

−0.05

0

0.05

0.1

0.15

0.2

0.25

0.3

Cor

r. C

oef.

of O

ffens

ive

Pla

yer

FPP

G

Same Team Different Team-0.2

-0.15

-0.1

-0.05

0

0.05

0.1

0.15

0.2

0.25

0.3

Cor

r. C

oeff.

of G

oalie

-Ska

ter F

PP

G Offensive SkaterDefensemen

Figure 3 (left) Average correlation coefficient of FPPG for offensive players in the same and different lines.

(right) Average correlation coefficient of FPPG for goalies and skaters in the same and opposing

teams.

points to multiple players who are simultaneously on the ice. This correlation should be included in a lineup

to increase the variance.

We next look at the correlation between goalies and skaters in the same game. We consider the correlation

of FPPG for offensive players and defensemen with goalies on the same and opposing team in a game. We

plot the resulting average Pearson correlation coefficient of the FPPG for these groups in Figure 3. We find

that the correlation is negative for goalies and skaters on opposing teams. For the same team the correlation

is near zero. The negative correlation is due to the fact that if a skater scores fantasy points, it is typically

because a goal was scored, which gives negative fantasy points to the goalie. On the same team there is no

structural condition that would correlate skaters and goalies on the same team. The negative correlation is

something we want to avoid within a lineup as it would reduce the overall variance. Therefore, to increase

lineup variance, we will avoid putting goalies and skaters on opposing teams in a single lineup.

3. Simulation Study of the Multiple Lineups StrategyBecause of the excessively high payoff for the best lineups in top heavy contests, the key to winning is to

have at least one lineup achieve a high rank. For instance, in a DraftKings NHL Sniper contest, nothing

is paid to the bottom 79% of lineups but $4000 is paid to the first place lineup. The entry fee is $3 per

lineup and an individual user can enter up to 100 or 200 lineups2. If a user entered 200 lineups in the

Sniper contest and one of the lineups came in at least seventh place, then all $600 in entry fees would be

recovered. The Sniper contest payoff structure is similar to other top heavy contests. Profit can be made in

these contests even if all lineups entered do poorly, as long as at least one does very well. We provide an

2 Originally a maximum of 200 lineups per user were allowed in any contest, but during the course of this research DraftKingsreduced this to 100 lineups.

Page 9: Winning Daily Fantasy Sports Hockey Contests Using Integer ... · into daily contests. If the players on a user’s lineup perform well, the user can win a substantial amount of money

Hunter, Vielma, and Zaman: Winning9

integer programming approach to construct multiple lineups in Section 4 that have some sort of edge over

the other contest lineups. We wish to understand how much of an edge the integer programming approach

needs to provide in order for it to be profitable.

To understand how the multiple lineup strategy will perform in a top heavy contest, we model the lineups

as follows. Let n be the total number of lineups in a contest. Let m be the number of lineups we will enter

using our integer programming approach. The rules of daily fantasy sports force m to be no more than 1%

or 2% of n. To analyze the behavior of the profit, we model the points of each lineup as an independent

normal random variable. In reality this will not be the case because many lineups will share players, but

we make the assumption here to simplify our analysis. Let the points of the integer programming lineups

be given by normal random variables Xi, 1≤ i≤m, with mean and variance µ1 and σ21 . Let the points of

the other lineups in the contest, which we refer to as the population lineups, be given by the normal random

variables Yi, 1≤ i≤ n−m with mean and variance µ0 and σ20 . We wish to understand how the profit of the

integer programming lineups behaves as a function of the contest payoff structure, the number of integer

programming lineups entered, and the edge provided by the integer programming approach.

We let the profit of each rank in a contest be given by a vector p = [p1, p2, ..., pn] where pi is the total

profit (winnings minus entry fee) of a lineup ranking i out of n. We have that∑n

i=1 pi < 0 since a fixed

percentage of the entry fees are claimed by DraftKings as revenue. We define the order statistics of the

integer programming lineups as X(1) ≥X(2) ≥ ...≥X(m) and the order statistics of the population lineups

as Y(1) ≥ Y(2) ≥ ...≥ Y(n−m). The rank of the ith best integer programming lineup X(i) in the contest is ri

and the profit margin from a contest is

Q=

∑m

i=1 pricm

. (3.1)

This profit margin is a function of n,m, p, c, and the distributions of Xi, and Yi. To understand the rela-

tionship between Q and the other variables, we perform a simulation study. We consider the NHL Sniper

contest which has a top heavy payoff structure shown in Figure 4. The size of the contest varies from day

to day, but typically has several thousand entries. In our simulations we use n= 21,073 which is the actual

number of entries in one of the Sniper contests. To simulate the Sniper contest we fix the distribution param-

eters, simulate m integer programming lineups and n−m population lineups, and calculate the resulting

profit margin using equation (3.1). We repeat this for 10,000 trials for each set of parameter values. We then

investigate the relationship between profit margin and the different parameters of the model.

We plot in Figure 5 the median profit as a function of m for the Sniper contest with µ0 = 27.5 and

σ0 = 7.1, which are actual population values for one of the Sniper contests. We use different values for

µ1 and σ1. We simulate up to m = 1000, but in reality m is no more than 100 or 200. We see that when

the algorithm and population have equivalent variances, the integer programming mean must be two points

above the population to make profit. From the figure, the profit margin becomes positive at approximately

Page 10: Winning Daily Fantasy Sports Hockey Contests Using Integer ... · into daily contests. If the players on a user’s lineup perform well, the user can win a substantial amount of money

Hunter, Vielma, and Zaman: Winning10

100 101 102 103 104100

101

102

103

104

Rank

Pay

off [

$]

Figure 4 Payoff versus lineup rank of the Sniper contest with 21,073 entries. Ranks which receive zero payoff

are not shown.

140 integer programming lineups when µ1−µ0 > 2. The profit margin also tends to increases in m for this

range. Therefore, if the integer programming approach could only increase µ1, then more lineups would

always be better in this range. The profit margin clearly decreases at some point because if m = n the

profit margin would be negative since the net payoff of the contest is negative (due to fees collected by

DraftKings). However, this occurs far beyond m= 1,000 and so is not relevant in real contests.

We see a more interesting behavior if we set the integer programming mean equal to the population mean

and vary the integer programming variance. Here we see that with σ1 only two points above σ0, positive

profit can be achieved with 35 lineups. If σ1 is four points above σ0, something different occurs. Now the

peak median profit margin of 1,250 % is achieved at around 100 lineups, but for more than 100 lineups, the

profit margin actually begins to decrease. For the range of m allowed by DraftKings, an increase in σ1 has

much more impact on the profit margin than an increase in µ1. This is a result of the payoff structure of top

heavy contests.

To further understand how the algorithm lineup distribution affects the profit margin for values of m that

are used in practice, we fix µ0 and σ0 as before and we set m equal to 100 and 200 (the new and original

limits on multiple lineups set by DraftKings). We then sweep over µ1 and σ1 and trace the profit margin

contours in Figure 6. We see from this figure that if µ0 = µ1 then we need σ1 to be about one point above

σ0 to make profit. If σ0 = σ1, µ1 must be approximately two points above µ0 to make profit.

The integer programming approach can even make profit if its mean or variance are below the population.

For instance, if µ1 − µ0 =−1, then the integer programming approach will make profit if σ1 is more than

one point above σ0. But if σ1 − σ0 = −1 then the integer programming approach needs µ1 to be at least

three points above µ0. This shows that the key to winning top heavy contests is having lineups which have

a higher variance than the population, even if this comes at the cost of a lower mean.

Page 11: Winning Daily Fantasy Sports Hockey Contests Using Integer ... · into daily contests. If the players on a user’s lineup perform well, the user can win a substantial amount of money

Hunter, Vielma, and Zaman: Winning11

0 200 400 600 800 1000−150

−100

−50

0

50

100

150

200

250

300

Number of integer program lineups

Med

ian

prof

it m

argi

n (p

erce

nt)

µ1−µ

0 = −2

µ1−µ

0 = 0

µ1−µ

0 = 2

µ1−µ

0 = 4

0 200 400 600 800 1000−200

0

200

400

600

800

1000

1200

1400

Number of integer program lineups

Med

ian

prof

it m

argi

n (p

erce

nt)

σ1−σ

0 = −2

σ1−σ

0 = 0

σ1−σ

0 = 2

σ1−σ

0 = 4

Figure 5 The median profit as a function of the number of integer programming lineups for the Sniper con-

test with n= 21,073, µ0 = 27.5, and σ0 = 7.1. (right) σ1 = σ and curves for different values of µ1 are

shown. (left) µ1 = µ and curves for different values of σ1 are shown.

0

0

0

0

200

200

200

200

400

400

400

400

600

600

600

600

200 integer program lineups

µ1−µ0

σ 1−σ 0

−5 −4 −3 −2 −1 0 1 2 3 4 5−5

−4

−3

−2

−1

0

1

2

3

4

5

0

0

0

0

200

200

200

200

400

400

400

600

600

600

100 integer program lineups

µ1−µ0

σ 1−σ 0

−5 −4 −3 −2 −1 0 1 2 3 4 5−5

−4

−3

−2

−1

0

1

2

3

4

5

Figure 6 Contour plot of the median profit margin (in percent) for the Sniper contest with n = 21,073, µ0 =

27.5, σ0 = 7.1 and different numbers of integer programming lineups.

3.1. Double-ups versus Top Heavy Contests

Our simulation analysis shows that the multiple lineups strategy will be very profitable if our integer pro-

gramming approach can produce lineups with an edge of a few points above the population. We now inves-

tigate if a similar result holds for another popular contest, the double-up. Recall that double-up contests

divide all the entry fees evenly among roughly the top half of the lineups. We consider an actual DraftKings

hockey double-up contest with n = 1,136 and m = 30. This contest has a smaller overall size and multi-

ple entry limit than the top heavy competitions we analyzed. We simulate performance on this contest as

a function of the integer programming edge and show the result in Figure 7. Two things are evident from

this figure. First, the profit margin is much lower than for top heavy contests. In the figure the maximum

Page 12: Winning Daily Fantasy Sports Hockey Contests Using Integer ... · into daily contests. If the players on a user’s lineup perform well, the user can win a substantial amount of money

Hunter, Vielma, and Zaman: Winning12

profit margin is 90 percent. Second, the profit margin decreases as σ1 increases, in contrast to the top heavy

contests. In fact, the highest profits are made when the integer programming approach has a positive edge

in the mean and a negative edge in the variance. This difference in strategies for double-up and top heavy

contests is due to the different payoff structures of the contests. Top heavy contests have a high variance

payoff structure and require the integer programming approach to have a high variance as well, whereas

double-up contests have a low variance payoff structure and prefer a lower variance.

The simplicity of the double-up payoff structure allows us to accurately approximate the profit margin

with a closed form expression. We assume the entry fee is c and that m lineups are entered. For our approx-

imation we assume that n is very large so that the median score in the competition is given by µ0 (recall

that for normal random variables the mean and median are equal). For a double-up, the payoff is approxi-

mately 2c for any lineup above the median and zero for the rest. We say approximately because the cutoff

for winning is actually slightly above the median in order for some of the entry fees to be collected by the

daily fantasy sports company. Then, using equation (3.1) the profit margin for m lineups is

Qm =c∑m

i=1 1{Xi >µ0}− c∑m

i=1 1{Xi ≤ µ0}cm

=2

m

m∑i=1

1{Xi >µ0}− 1

Recall that we model Xi as normal random variables. We define F (x;µ,σ) as the cumulative distribution

function (CDF) of a normal random variable with mean µ and standard deviation σ. Then we have the

following result for the limiting profit margin of a double-up.

Theorem 3.1 Let Q= 1− 2F (µ0;µ1, σ1). Then Qm converges almost surely to Q.

Proof. From the strong law of large numbers we have that 1m

∑m

i=1 1{Xi >µ0}a.s.−→ 1−F (µ0;µ1, σ1). The

expression for Q follows from this. �

The expression Q is our approximation for the profit margin of a double up contest when we have many

lineups entered and the contest is very large. We plot Q in Figure 7 along side the simulated profit margin

for 30 integer programming lineups. As can be seen, the approximation is very close to the simulated value.

Therefore, we can use Q to very easily map the profit margin surface of double-up contests. We find the

same basic conclusions as with the simulation, i.e. the best lineups have high mean and low variance. While

we can find a closed form expression for the profit margin surface for double-ups, such an expression cannot

be easily found for top heavy competitions because of the complexity of the payoff structure. Instead we

must rely on simulations to map the profit margin surface.

Page 13: Winning Daily Fantasy Sports Hockey Contests Using Integer ... · into daily contests. If the players on a user’s lineup perform well, the user can win a substantial amount of money

Hunter, Vielma, and Zaman: Winning13

00

0

1010

10

10

20

2020

20

20

30

30

30

30

40

40

40

50

50

50

60

60

70

70

80

90

30 algorithm lineups, double-up

µ1−µ0

σ 1−σ0

−5 −4 −3 −2 −1 0 1 2 3 4 5−5

−4

−3

−2

−1

0

1

2

3

4

0

00

1010

1020

20

20

30

30

30

40

40

40

50

50

50

60

60

70

70

80

80

90

Theory, double−up

µ1−µ0

σ 1−σ0

−5 −4 −3 −2 −1 0 1 2 3 4 5−5

−4

−3

−2

−1

0

1

2

3

4

Figure 7 (left) Contour plot of the median profit margin (in percent) for a double-up contest with n= 1,136,

m= 30, µ0 = 27.5, and σ0 = 7.1. (right) Contour plot of the theoretical profit margin (in percent) for a

double-up contest with µ0 = 27.5, and σ0 = 7.1.

4. Multiple Lineup Integer Programming ApproachWe now present our integer programming approach for selecting multiple lineups for hockey contests. The

integer programming approach takes as input the player point predictions, which we calculate using a simple

regression model. We then construct and solve an integer programming formulation to produce a single

lineup with maximum predicted points, subject to a set of stacking constraints which increase the lineup

variance. Then, we iteratively add constraints to the integer programming formulation in order to produce

multiple lineups which are distinct from previously constructed lineups. We now go into the details of the

prediction model and the integer programming formulation.

4.1. Player Point Predictions

We first build a predictive model for the fantasy points for each player in the NHL. Rather than build

complex statistical model to make predictions, we instead choose a much simpler approach. We saw that

sites such as Rotogrinders and Daily Fantasy Nerd have relatively accurate predictions for player points.

Further, these sites are run by fantasy sports experts who most likely have much domain expertise and have

spent a great amount of effort to construct their predictions. Therefore, it is unlikely that any statistical

model we build would have an edge over these sites, especially given our lack of domain knowledge for

hockey. Rather, it is sensible and very simple to use these predictions for the players’ points. Here we will

analyze a set of simple predictive models and evaluate their prediction accuracy. Later in Section 5 we will

evaluate these different models by their profitability in real contests.

Page 14: Winning Daily Fantasy Sports Hockey Contests Using Integer ... · into daily contests. If the players on a user’s lineup perform well, the user can win a substantial amount of money

Hunter, Vielma, and Zaman: Winning14

(1) (2) (3) (4) (5) (6)skater points goalie points goalie points goalie points goalie points goalie points

β1 0.634∗∗∗ 0.628∗∗∗ 0.203 0.203(Rotogrinders) (0.0203) (0.0864) (0.144) (0.144)

β2 0.282∗∗∗ -0.0173 -0.0309 -0.0301(Daily Fantasy Nerd) (0.0242) (0.106) (0.113) (0.113)

β3 4.310∗ 5.304∗∗ 4.176∗ 5.172∗∗

(Win Probability) (2.123) (2.005) (2.064) (1.941)

β0 1.334 1.686(0.0314) (0.549) (1.032) (1.003) (1.013) (0.983)

R2 0.238 0.0858 0.0179 0.0140 0.0178 0.0139Samples 10825 565 506 506 506 506Standard errors in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001

Table 2 Regression coefficients, R squared, and sample count for different prediction models of skater and

goalie points.

For a player u, let xu1 and xu2 be the predictions by Rotogrinders and Daily Fantasy Nerd, respectively.

For skaters, we perform a linear regression on the player’s points using these predictions as features. We

denote yu as the predicted points of player u using our model. For skater u our points prediction is

yu = β0 +β1f1u +β2f2u. (4.1)

For goalies, we must also factor in the outcome of the game since there is a three point bonus for victory.

Daily Fantasy Nerd provides win probabilities for each goalie, and we have found these estimates to be

relatively accurate. We perform a linear regression on the goalie points using as predictor variables the

predictions of Rotogrinders, Daily Fantasy Nerd, and the win probability of the goalie’s team. For a goalie

u, let this win probability by pu. Then the points prediction for goalie u is

fu = β0 +β1f1u +β2f2u +β3pu. (4.2)

We compare different prediction models in Table 2. For skaters, both Rotogrinders and Daily Fantasy

Nerd predictions are significant. For goalies when we do not include the win probability we find that only

Rotogrinders is significant. When the win probability is included, it is the only significant feature. The

highest R squared value for goalies is the model which does not include the win probability. This suggests

that we should not include the win probability in our goalie model. However, we will see in Section 5 that

in terms of profit in real contests, the win probability model can have better performance.

4.2. Feasibility Constraints

Now that we have the predictions for the player points, we can construct the integer programming formula-

tion. This integer programming formulation will be solved iteratively with new added constraints to produce

Page 15: Winning Daily Fantasy Sports Hockey Contests Using Integer ... · into daily contests. If the players on a user’s lineup perform well, the user can win a substantial amount of money

Hunter, Vielma, and Zaman: Winning15

multiple lineups. We now introduce the basic notation for our algorithm. We will construct m lineups. We

assume there areN NHL players to choose from and we let the variable xuk equal one if player u is selected

for lineup k and zero otherwise. Each player u is predicted to achieve fu fantasy points. We let the fantasy

price of player u be cu. There are NT NHL teams playing in a contest, and we denote the set of players on

team i by Ti. Also, we denote the set of players who are centers, wingers, defenseman, and goalies by C,

W , D, and G, respectively.

We begin by formulating the basic feasibility constraints of a lineup. Daily fantasy sports contests all

impose a budget B constraint on a lineup. For DraftKings, this budget is B = 50,000 fantasy dollars. There

is also a constraint on the number of players of each type of position. In DraftKings, a lineup consists of

nine players: two centers, three wingers, two defensemen, a goalie, and a utility player who can be any

skater type. The position constraints are given by

2≤∑u∈C

xuk ≤ 3,

3≤∑u∈W

xuk ≤ 4,

2≤∑u∈D

xuk ≤ 3,∑u∈G

xuk = 1,

(4.3)

DraftKings also requires each lineup to have players that come from at least three different NHL teams. The

team constraints areti ≤

∑u∈Ti

xuk, ∀ i∈ {1, . . . ,NT}

NT∑i=1

ti ≥ 3,

ti ∈ {0, 1} , ∀ i∈ {1, . . . ,NT} .

(4.4)

We put these constraints together to obtain the basic constraints for a feasible lineup:

N∑u=1

cuxuk ≤B, (budget constraint)

N∑u=1

xuk = 9, (lineup size constraint)

Equations (4.3), (position constraint)

Equations (4.4), (team constraint)

xuk ∈ {0,1} , 1≤ u≤N.

(4.5)

4.3. Stacking Lineups Using Structural Correlations

As we discussed in Section 2.3, there are correlations that are induced by the structure of the game. If we

were able to perfectly predict every player’s performance, then we would not need this structural correlation

Page 16: Winning Daily Fantasy Sports Hockey Contests Using Integer ... · into daily contests. If the players on a user’s lineup perform well, the user can win a substantial amount of money

Hunter, Vielma, and Zaman: Winning16

information to win a given contest. However, uncertainty in the player performance requires us to take

advantage of these structural correlations in order to win the contests. In essence, what we want to do is use

the structural correlations to create lineups which have very high variance, which is referred to as stacking.

By stacking our lineups, we increase the chance that at least one lineup can achieve a top rank in a contest.

In this section we propose ways to stack lineups by incorporate this structural correlation information into

our integer programming formulation.

Goalie Stacking. As discussed in Section 2.3, goalies are negatively correlated with the skaters that they

are opposing. Therefore, to increase the variance of a lineup, we would like to avoid having this negative

correlation by making sure all of the lineup’s skaters are not opposing its goalie. We will refer to this

constraint as goalie stacking. To model this constraint in our integer programming formulation, we denote

the set of skaters who are opposing goalie u byOu. Due to the team constraint given by (4.4), the maximum

number of players we can have in one lineup from a given NHL team is six. For lineup k and goalie u, the

goalie stacking constraint is

6xuk +∑v∈Ou

xvk ≤ 6, ∀u∈G. (4.6)

This constraint will force the goalie variable xuk to be zero if the lineup has any skater opposing the goalie.

Line Stacking. Additionally, in Section 2.3 we found that players on the same line are positively cor-

related. Therefore, to increase the lineup variance, we would like to impose additional constraints on our

formulation that will create lineups with players on the same line. We will refer to this as line stacking.

There are eight possible skaters on a team and a line consists of three skaters. Therefore, the maximum

number of lines we can include in a lineup is two. To include as much positive correlation as possible

amongst the players, we can make a lineup to have one complete line of three players, and two partial lines

with at least two players each. To formulate the line stacking constraints, we introduce some notation. Let

NL denote the total number of lines, and let Li denote the set of players on line i. With this notation we can

formulate the constraint of having at least one complete line in lineup k as

3vi ≤∑u∈Li

xuk, ∀i∈ {1, . . . ,NL}

NL∑i=1

vi ≥ 1

vi ∈ {0,1} , ∀i∈ {1, . . . ,NL} .

(4.7)

Under these constraints vi is one if we have all three players from line i, and zero otherwise, and we require

at least one of the vi to be one. To formulate the constraint of having at least two players from two distinct

Page 17: Winning Daily Fantasy Sports Hockey Contests Using Integer ... · into daily contests. If the players on a user’s lineup perform well, the user can win a substantial amount of money

Hunter, Vielma, and Zaman: Winning17

lines, we use the following constraints:

2wi ≤∑u∈Li

xuk, ∀i∈ {1, . . . ,NL}

NL∑i=1

wi ≥ 2

wi ∈ {0,1} , ∀i∈ {1, . . . ,NL} .

(4.8)

Similar to the complete line constraints, this constraint sets wi to one if at least two out of three players of

line i are in the lineup, and there must be at least two wi equal to one.

Defensemen Stacking. Lastly, we have found that defensemen that are on the first power play line score

substantially higher than defensemen that are not. Therefore, to increase a lineup’s average points, one could

choose defensemen that are on the first power play line. We will refer to this as defensemen stacking. To

model this constraint, let PD denote the set of defensemen that are on any team’s first power play line. Then,

to use only these defensemen in lineup k we add the following constraints to our integer programming

formulation:

xuk = 0 ∀u∈D \PD. (4.9)

This constraint makes sure that any defenseman that is chosen for the lineup is also in the first power play

line of some team.

There are many other constraints that could be added to the model, such as forcing a lineup to have every

player on a team’s first power play, imposing constraints to have seven players from one game, etc. We

did not investigate all the possible types of stacking, but they can all be naturally incorporated into our

integer programming formulation. In this work we focus on the constraints presented because they capture

the major structural correlations in hockey.

4.4. Multiple Lineup Constraints

Multiple lineups are created by iteratively solving the integer programming formulation. For each new

lineup, we add additional linear constraints to the integer programming formulation in order to refine the

solution space by removing previously found lineups. Let x∗ul be the optimal value of the variable for player

u in lineup l. We define the multiple lineup constraints for lineup k as

N∑u=1

x∗ulxuk ≤ γ, l= 1, . . . , k− 1. (4.10)

These constraints make sure that lineup k will have no more than γ players in common with any previously

constructed lineup. We refer to γ as the maximum lineup overlap. Decreasing γ forces the lineups to be

more diverse, whereas for larger γ the lineups will share more players. We will see in Section 5 how the

choice of γ is impacted by the number of NHL games being played on a given night.

Page 18: Winning Daily Fantasy Sports Hockey Contests Using Integer ... · into daily contests. If the players on a user’s lineup perform well, the user can win a substantial amount of money

Hunter, Vielma, and Zaman: Winning18

4.5. Integer Programming Approach for Multiple Lineups

We now present the final integer programming formulation for constructing m lineups. We obtain lineup k

(1≤ k≤m) by solving

maximizex

N∑u=1

fuxuk

subject to Equation (4.5) (feasibility constraints)

Equation (4.10) (multiple lineup constraints)

Any subset of equations (4.6), (4.7), (4.8), (4.9) (stacking constraints).

(4.11)

By solving this integer programming formulation iterativelym times, each time updating the multiple lineup

constraints with the previously found solutions, one obtains m lineups. This formulation allows flexibility

in the choice of stacking constraints, prediction model, maximum lineup overlap, and number of linueps

constructed. We will see next how to select the best integer programming settings.

5. PerformanceWe now evaluate the performance of our algorithm on real daily fantasy sports hockey contests. We obtained

data on the performance of the population lineups in several actual DraftKings hockey contests by entering

the contest. As entrants, we are able to download the lineups and points of every lineup in the contest. In

total we have data for 38 different top heavy contests which contain several thousand entrants. We run our

algorithm and evaluate how much profit our lineups would have made in each contest competing against the

actual population lineups. We investigate a variety of integer programming settings, including the number

of lineups entered, the type of lineup stacking used, the prediction model used, and other features.

We combine the different constraints from Section 4.3 to create six different types of stacking which

are summarized in Table 3. The first type is no stacking, which only uses the basic constraints and none

of the stacking constraints. All the other types of stacking use goalie stacking (equation (4.6)) as this is

a very natural negative correlation we wish to avoid. Also, all the other types of stacking require at least

one complete line (equation (4.7)), and at least two partial lines (equation (4.8)). Only two defensemen

are allowed in Type 2, the logic being that the utility player should be an offensive skater that will score

more points, as was seen in Figure 2. Types 3 and 4 use defensemen stacking (equation (4.9)). Types 4 and

5 require exactly three different teams to be represented on a lineup. This constraint increases the lineup

variance by having more players on the same team in the lineup.

5.1. Impact of Stacking and Number of Lineups

We begin by investigating the performance of each type of stacking versus the number of lineups entered.

We create different numbers of lineups for each type of stacking with a maximum lineup overlap of seven

and calculate the mean profit margin across all contests. We plot the results in Figure 8. We observe that the

Page 19: Winning Daily Fantasy Sports Hockey Contests Using Integer ... · into daily contests. If the players on a user’s lineup perform well, the user can win a substantial amount of money

Hunter, Vielma, and Zaman: Winning19

No stacking Type 1 Type 2 Type 3 Type 4 Type 5Ck Ck

Equations (4.6)Equations (4.7)Equations (4.8)

CkEquations (4.6)Equations (4.7)Equations (4.8)∑

u∈D

xuk = 2

CkEquations (4.6)Equations (4.7)Equations (4.8)Equations (4.9)

CkEquations (4.6)Equations (4.7)Equations (4.8)Equations (4.9)

NT∑i=1

ti = 3

CkEquations (4.6)Equations (4.7)Equations (4.8)

NT∑i=1

ti = 3

Table 3 Summary of the constraints for different types of stacking in our integer programming formulation

for constructing hockey lineups.

lowest profit margin is achieved by no stacking. This shows the importance of stacking lineups to increase

lineup variance. Type 4 stacking achieves the highest profit margin for 100 and 200 lineups. For one lineup,

Type 3 stacking has a higher mean profit margin due to a single contest where the profit margin is 124%.

Excluding this single outlier, Type 4 stacking has the highest mean profit margin for one lineup. Recall that

Type 4 stacking includes goalie stacking, line stacking (complete and partial), defensemen stacking, and at

most three different teams. It has more stacking than all other types, which we expect to give its lineups

more variance. We see from the data that this maximizing variance strategy is effective for winning top

heavy contests.

It seems from Figure 8 that the mean profit margin is largest for one lineup. However, this is a deceiving

statistic because the mean is dictated essentially by a single outlier for these top heavy contests. A more

accurate picture is provided by the boxplot in Figure 9. Here we show the profit margin for Type 4 stacking

with a maximum lineup overlap of seven. We see that with one lineup, the median profit margin is -100%.

However, with more lineups, the median profit margin becomes larger. In addition, the 75th percentile is

larger for 200 lineups than for 100 lineups. This shows that with more lineups the distribution of the profit

margin is shifted upward and given a slightly fatter tail. Therefore, to win more money consistently, it is

advantageous to use more lineups.

5.2. Impact of Prediction Models

We next investigate the impact of different prediction models on our performance. We consider Type 4

stacking, a maximum lineup overlap of seven, and 200 lineups. The prediction models we consider use

Rotogrinders predictions, Daily Fantasy Nerd predictions, both sites’ predictions, and both sites plus the

win probability. From Table 2 we saw that the best model (in terms of R squared) for predicting goalie points

used only the Rotogrinders and Daily Fantasy Nerd predictions and not the win probability information.

However, we find that in terms of profit margin, this is not the case. We show in Figure 10 the mean profit

margin of different prediction models for different numbers of integer programming lineups. We see here

that the model with all three features does the best or near the best for each number of lineups. The reason

Page 20: Winning Daily Fantasy Sports Hockey Contests Using Integer ... · into daily contests. If the players on a user’s lineup perform well, the user can win a substantial amount of money

Hunter, Vielma, and Zaman: Winning20

1 lineup 100 lineups 200 lineups−100

0

100

200

300

400

500

Mea

n pr

ofit

mar

gin

[%]

No StackingType 1Type 2Type 3Type 4Type 5

Figure 8 Plot of the mean profit margin in DraftKings hockey contests versus the number of integer pro-

gramming lineups. The plots are for different types of stacking constraints. The maximum lineup

overlap allowed is seven.

−100

0

100

200

300

400

500

1 lineup 100 lineups 200 lineups

Pro�

t mar

gin

[%]

Figure 9 Boxplot of the profit margin in DraftKings hockey contests versus the number of integer program-

ming lineups. The lineups use Type 4 stacking and a maximum lineup overlap of seven.

here may be that by including the win probability, the integer programming lineups are more biased towards

goalies expected to win the games, resulting in a better chance of obtaining the game win bonus of three

points. Therefore, though the regression analysis suggested that the win probability feature resulted in worse

predictions, it actually biased our lineups towards goalies more likely to win, which gave us the edge in the

contests.

5.3. Impact of Maximum Lineup Overlap

One parameter we can adjust in our model is the maximum lineup overlap. By decreasing the allowed

overlap, we force the integer programming approach to explore more of the space of possible lineups. We

find that the degree to which one wants to explore this space depends upon how large it is. The size of the

lineup space on a given night is larger if there are more NHL games being played. We plot the mean profit

Page 21: Winning Daily Fantasy Sports Hockey Contests Using Integer ... · into daily contests. If the players on a user’s lineup perform well, the user can win a substantial amount of money

Hunter, Vielma, and Zaman: Winning21

1 lineup 100 lineups 200 lineups−50

0

50

100

150

200

Mea

n pr

ofit

mar

gin

[%]

DFNRotogrindersDFN and RotogrindersDFN, Rotogrinders, and Vegas odds

Figure 10 Plot of the mean profit margin of our integer programming approach in DraftKings hockey con-

tests with different prediction models. The models considered are Daily Fantasy Nerd predictions

(DFN), Rotogrinders predictions, DFN and Rotogrinders predictions, and DFN and Rotogrinders

predictions with Vegas odds. The maximum lineup overlap allowed is seven and constraint type

four is used for the integer programming lineups.

margin for 200 integer programming lineups as a function of the number of games played in a night for

different values of the maximum overlap in Figure 11. These lineups were created using Type 4 stacking.

For nights with less than four games, a maximum overlap of seven does best. For a larger number of games,

it seems that decreasing the maximum overlap improves performance. For instance, for nights with more

than nine games, an overlap of four does best. The reason for this is that when there are few games being

played, the lineup space is smaller and probably has fewer good lineups. By reducing the maximum overlap,

we end up not selecting these good lineups and instead choose many poor lineups. However, on nights with

many games, there is a huge lineup space that has several different good lineups. A smaller overlap causes

the integer programming approach to choose several of these good lineups, increasing the chance that one of

them will achieve a high rank. For maximum overlaps smaller than four we found that it becomes difficult

to obtain 200 feasible lineups. Maximum overlaps larger than seven produce lineups that are too similar and

do not provide enough diversity. Therefore, the maximum overlap should be tuned between four and seven

depending on how many games are being played on a given night.

5.4. Lineup Creation Order Versus Lineup Performance Rank

The integer programming formulation is solved iteratively to create multiple lineups. One question to ask is

do the lineups which are created earlier perform better? To investigate this, we set the integer programming

approach to create 200 linueps with Type 4 stacking and a maximum overlap of seven. We then evaluate

Page 22: Winning Daily Fantasy Sports Hockey Contests Using Integer ... · into daily contests. If the players on a user’s lineup perform well, the user can win a substantial amount of money

Hunter, Vielma, and Zaman: Winning22

<4 games 4 to 6 games 7−9 games >9 games−50

0

50

100

150

200

250

300

Mea

n pr

ofit

mar

gin

[%]

Max. overlap = 4

Max. overlap = 7

Max. overlap = 5

Max. overlap = 6

Figure 11 Plot of the mean profit margin 200 integer programming lineups with Type 4 stacking in DraftKings

hockey contests versus maximum lineup overlap for different number of NHL hockey games being

played.

their performance rank relative to each other in the contests. This allows us to look at several statistics

relating the creation rank and the performance rank of the lineups. We first investigate the Spearman rank

correlation coefficient between these two rankings. We find that the average value of the correlation coeffi-

cient across the different contests is 0.09 with a standard deviation of 0.1. This suggests that there is very

little correlation between the creation rank and performance rank of the lineups. To dive deeper into the

analysis, we plot in Figure 12 two different sets of data. First, we show a boxplot of the creation rank of

the best performing lineup. Second, we show the performance rank of the first created lineup. We see that

the median performance rank of the first created lineup is 74.5. This shows that the first created lineup has

a slightly, but not substantially improved performance. For the best performing lineup, the median creation

rank is 124.5. Therefore, while the first created lineup is slightly better, the winning lineup is generally one

of the later created lineups. This supports our finding of a very low correlation between the creation and

performance rank of the lineups.

5.5. Mean and Standard Deviation of Lineup Points Versus Profit Margin

We also look at the empirical mean and standard deviation of the integer programming lineups (µ1 and σ1)

versus the population lineups (µ0 and σ0) for each contest. We use 200 integer programming lineups with

Type 4 stacking and a maximum overlap of seven and plot the profit margin versus the difference in the

mean and standard deviation in Figure 13. If one looks at the spread of the mean difference of the integer

programming lineups, we see that there is a substantial variation. The mean difference swings between -10

and 20 points. When it is below zero, no money is made, but when it is above zero, a substantial profit

is made. This is a result of the fact that the lineups must satisfy the stacking constraints which increase

Page 23: Winning Daily Fantasy Sports Hockey Contests Using Integer ... · into daily contests. If the players on a user’s lineup perform well, the user can win a substantial amount of money

Hunter, Vielma, and Zaman: Winning23

0

50

100

150

200

250

Contest rank offirst created lineup

Creation rank ofbest lineup

Figure 12 Plot of the performance rank of the first created lineup and the creation rank of the best performing

lineup. There were 200 integer programming lineups created with Type 4 stacking and a maximum

overlap of seven.

their individual variance, but also maximize predicted points. The result is that when the actual points are

realized, we are either far above or far below the population mean. The standard deviation of the integer

programming lineups is generally below the population standard deviation. This is because we are choosing

lineups that have some correlation because they are designed to maximize points, even though the individual

lineups are designed to have a high variance. The variance we induce in the lineups is manifested through

the swing of their mean across contests. From Figure 13 it seems that we are equally likely to be either

above or below the population mean. However, because the contest payoff structure is so asymmetric, with

a maximum loss of 100%, but a maximum gain which is at least an order of magnitude larger, our strategy

ends up being profitable.

5.6. Implementation and Runtime

All formulations were constructed using the JuMP algebraic modeling language (Dunning et al. 2015,

Lubin and Dunning 2015), which is written in the Julia programming language (Bezanson et al. 2012).

This allowed us to test the runtime of our approach for various integer programming solvers. In prac-

tice, we must wait for information about which goalies are being played to be posted before we con-

struct our lineups. Goalies do not play every night, so if we do not wait for this information, we may

put goalies in lineups who are not playing and receive zero points for them. This goalie information is

generally posted on public websites about 30 minutes before the games begin. We must be able to solve

the integer programming formulation in this time to be able to enter the contest. We show in Figure

14 the time needed to solve the integer programming formulation for 200 lineups in each of our con-

tests using different integer programming solvers. We consider the free solvers CBC (COIN-OR) and

GLPK (GNU) and the commercial solvers Gurobi (Gurobi Optimization). All computations were done

using a Intel Core i5-4570 3.20GHz with 8GB of RAM. We find that all solvers are able to solve the

Page 24: Winning Daily Fantasy Sports Hockey Contests Using Integer ... · into daily contests. If the players on a user’s lineup perform well, the user can win a substantial amount of money

Hunter, Vielma, and Zaman: Winning24

-20 -10 0 10 20-4

-3

-2

-1

0

1

2

3

4

41%

1220%

114%

435%

178%4%

102%

1445%

77%

642%8%

13%161%

127%

554%

744%

200

400

600

800

1000

1200

1400

σ 1-σ

0

µ1-µ0

Figure 13 Plot of profit margin versus µ1 − µ0 and σ1 − σ0 for DraftKings hockey contests with 200 integer

programming lineups created using Type 4 stacking and a maximum overlap of seven. The profit

margin for each contest is indicated through the color and size of the markers and also listed next

to the marker. The x markers indicate contests where the profit margin is not positive.

0.5

1

1.5

2

2.5

3

3.5

4

CBC GLPK Gurobi

Solver

Run

time

[min

utes

]

Figure 14 The runtime of our algorithm for generating 100 lineups with different integer programming soft-

ware.

integer programming formulations in under four minutes, giving sufficient time to enter the lineups into

the contest. The complete software used to construct the integer programming lineups is available at

https://github.com/dscotthunter/Fantasy-Hockey-IP-Code.

Page 25: Winning Daily Fantasy Sports Hockey Contests Using Integer ... · into daily contests. If the players on a user’s lineup perform well, the user can win a substantial amount of money

Hunter, Vielma, and Zaman: Winning25

6. ConclusionWe have shown here that integer programming can provide a significant edge in daily fantasy sports contests.

Our integer programming approach was able to win several top heavy hockey contests. We have studied

how different settings of the integer programming approach impact performance. Furthermore, we provided

an analysis of the edge needed from the integer programming approach in order to win top heavy contests.

At a high level, our approach can be summarized as follows.

1. Use publicly available predictions for player points.

2. Construct an integer programming formulation to construct a lineup that maximizes points while forc-

ing the lineup to have a high variance by including lines (also known as stacking).

3. Solve the integer programming formulation multiple times, each time adding constraints to produce

distinct lineups.

The success we were able to achieve given the relative simplicity of our approach shows the power of inte-

ger programming in daily fantasy sports contests. An interesting question is how easily can this integer

programming approach be ported over to other sports besides hockey? Of the items listed above, the first

and third are generic to any sport. The only one unique to hockey is the way in which we formulated stack-

ing constraints. There are structural correlation in hockey which we exploited to produce higher variance

lineups. While other sports may not have these specific structural correlations, they do have other structural

correlations which can be used to stack lineups. We now briefly discuss these for a few popular sports.

American Football. In American football daily fantasy sports contests, lineups are required to have a

single quarterback who throws the ball, and receivers who catch the ball. Fantasy points are awarded to the

quarterback and receiver for any completed passes, yards achieved on the completion, and any touchdown

scored on the completion. This leads to a very natural structural correlation. By creating a lineup with a

quarterback and receivers on the same team, the variance of the lineup will increase. This is a well known

approach to stacking football lineups (Bales 2015b).

Baseball. In baseball, a pitcher throws the ball to a batter who must hit the ball and then run to four

bases to score runs. There is a fixed set of batters for a team, and usually a single pitcher for the majority

of a game. Fantasy points are given to the batters for hits and runs. The structural correlation in baseball is

created by the pitcher. If a pitcher is weak, then the all the opposing batters will likely score many fantasy

points. Therefore, to create high variance lineups in baseball, batters should be selected from the same team

if the opposing pitcher is expected to be weak. This stacking approach is also well known (Bales 2015a).

ReferencesT. Awad. Numbers on ice: Fixing plus/minus. Hockey Prospectus, apr 2009.

Jonathan Bales. The art of stacking (or not) in daily fantasy baseball. http://playbook.draftkings.com/mlb/the-art-of-

stacking-or-not-in-daily-fantasy-baseball/, June 2015a.

Page 26: Winning Daily Fantasy Sports Hockey Contests Using Integer ... · into daily contests. If the players on a user’s lineup perform well, the user can win a substantial amount of money

Hunter, Vielma, and Zaman: Winning26

Jonathan Bales. Pairing a QB with his receiver(s). https://rotogrinders.com/articles/pairing-a-qb-with-his-receiver-s-

481544, December 2015b.

David Beaudoin and Tim B Swartz. Strategies for pulling the goalie in hockey. The American Statistician, 64(3),

2010.

A. Becker and A. Sun. An analytical approach for fantasy football draft and lineup management. Journal for Quanti-

tative Analysis in Sports, 2014.

Jeff Bezanson, Stefan Karpinski, Viral Shah, and Alan Edelman. Julia: A fast dynamic language for technical com-

puting. CoRR, abs/1209.5145, 2012.

New York Times Editorial Board. Rein in online fantasy sports gambling.

http://www.nytimes.com/2015/10/05/opinion/rein-in-online-fantasy-sports-gambling.html, October 2015.

COIN-OR. CBC: COIN-OR Branch and Cut. https://projects.coin-or.org/Cbc.

DailyFantasyNerd. Daily fantasy nerd. http://dailyfantasynerd.com, December 2015.

DraftKings. Draftkings. http://draftkings.com, December 2015.

Drewby. Rackem stackem: Dfs hockey. http://dailyroto.com/nhl-dfs-stacking-strategy-draftkings-fanduel, October

2015.

Iain Dunning, Joey Huchette, and Miles Lubin. JuMP: A modeling language for mathematical optimization.

arXiv:1508.01982 [math.OC], 2015.

GNU. GLPK: GNU Linear Programming Kit. https://www.gnu.org/software/glpk/.

Robert B Gramacy, Shane T Jensen, and Matt Taddy. Estimating player contribution in hockey with regularized logistic

regression. Journal of Quantitative Analysis in Sports, 9(1):97–111, 2013.

Gurobi Optimization. The Gurobi Optimizer. http://www.gurobi.com.

Drew Harwell. Why you (probably) won’t win money playing Draftkings, FanDuel.

http://www.dailyherald.com/article/20151012/business/151019683/, October 2015.

Darren Heitner. Draftkings reports $304 million of entry fees in 2014.

http://www.forbes.com/sites/darrenheitner/2015/01/22/draftkings-reports-304-million-of-entry-fees-in-2014/,

January 2015.

Miles Lubin and Iain Dunning. Computing in operations research using Julia. INFORMS Journal on Computing, 27

(2):238–248, Spring 2015.

Brian Macdonald. An improved adjusted plus-minus statistic for nhl players. In Proceedings of the MIT Sloan Sports

Analytics Conference, URL http://www. sloansportsconference. com, 2011.

Stephen Pettigrew. Assessing the offensive productivity of nhl players using in-game win probabilities. 2015.

Rotogrinders. Rotogrinders. https://rotogrinders.com/, December 2015.

Page 27: Winning Daily Fantasy Sports Hockey Contests Using Integer ... · into daily contests. If the players on a user’s lineup perform well, the user can win a substantial amount of money

Hunter, Vielma, and Zaman: Winning27

M Schuckers. Digr: A defense independent rating of nhl goaltenders using spatially smoothed save percentage maps.

In Proceedings of the MIT Sloan Sports Analytics Conference, 2011.

Michael Schuckers and James Curro. Total hockey rating (thor): A comprehensive statistical rating of national hockey

league forwards and defensemen based upon all on-ice events. In Proceedings of the 2013 MIT Sloan Sports

Analytics Conference, http://www. statsportsconsulting. com/thor, 2013.

Michael E Schuckers, Dennis F Lock, Chris Wells, CJ Knickerbocker, and Robin H Lock. National hockey league

skater ratings based upon all on-ice events: An adjusted minus/plus probability (ampp) approach. Unpub-

lished manuscript. Available at http://myslu. stlawu. edu/˜ msch/sports/LockSchuckersProbPlusMinus113010.

pdf, 2011.

Daniel Steinberg. How daily fantasy football pricing algorithms work. https://dailyfantasywinners.com/fantasy-

categories/featured/daily-fantasy-football-pricing-algorithms-work/, 2015.

AC Thomas, Samuel L Ventura, Shane T Jensen, Stephen Ma, et al. Competing process hazard function models for

player ratings in ice hockey. The Annals of Applied Statistics, 7(3):1497–1524, 2013.