Upload
others
View
5
Download
0
Embed Size (px)
Citation preview
Winning Daily Fantasy Sports Hockey Contests UsingInteger Programming
David Scott HunterOperations Research Center, Massachusetts Institute of Technology, Cambridge, MA 02139, [email protected]
Juan Pablo VielmaDepartment of Operations Research, Sloan School of Management, Massachusetts Institute of Technology, Cambridge, MA 02139,
Tauhid ZamanDepartment of Operations Management, Sloan School of Management, Massachusetts Institute of Technology, Cambridge, MA 02139,
We present an integer programming approach to winning daily fantasy sports hockey contests which have top heavy
payoff structures (i.e. most of the winnings go to the top ranked entries). Our approach incorporates publicly available
predictions on player and team performance into a series of integer programming problems that compute optimal lineups
to enter into the contests. We find that the lineups produced by our approach perform well in practice and are even able to
come in first place in contests with thousands of entries. We also show through simulations how the profit margin varies
as a function of the number of lineups entered and the points distribution of the entered lineups. Studies on real contest
data show how different versions of our integer programming approach perform in practice. We find two major insights
for winning top heavy contests. First, each lineup entered should be constructed in a way to maximize its individual
mean, and more importantly, variance. Second, multiple lineups should be entered and they should be constructed in a
way to limit their correlation with each other. Our approach can easily be extended to other sports besides hockey, such
as American football and baseball.
Key words: sports analytics, daily fantasy sports, statistics, integer programming
History:
1. IntroductionDaily fantasy sports allow users to enter lineups of players in a sport (such as hockey or American football)
into daily contests. If the players on a user’s lineup perform well, the user can win a substantial amount of
money in the contests. Daily fantasy sports have recently become a huge industry. For instance, DraftKings,
the leading company in this area, reported $30 million in revenues and $304 million in entry fees in 2014
(Heitner 2015). While there is ongoing legal debate as to whether or not daily fantasy sports contests are
gambling (Board 2015), there is a strong element of skill required to continuously win contests as evidenced
by the fact that the top one percent of players pay 40 percent of the entry fees and gain 91 percent of the
1
arX
iv:1
604.
0145
5v1
[st
at.O
T]
6 A
pr 2
016
Hunter, Vielma, and Zaman: Winning2
profits (Harwell 2015). Daily fantasy sports therefore provide a great opportunity for users to compete and
earn profits with analytics. Several sites such as Rotogrinders have been created which solely focus on daily
fantasy sports analytics (Rotogrinders 2015). Other sites such as Daily Fantasy Nerd even offer to provide
optimized contest lineups for a fee (DailyFantasyNerd 2015).
There are a variety of contests in daily fantasy sports with different payoff structures. Some contests
distribute all the entry fees equally among all lineups that score above the median score of the contest. These
are known as double-up or 50/50 contests. They are relatively easy to win, but do not pay out very much.
Other contests give a disproportionately high fraction of the entry fees to the top scoring lineups. We refer
to these as top heavy contests. Winning a top heavy contest is much more difficult than winning a double-up
or a 50/50 contest, but the payoff is also much greater. Typical entry fees range from $3 to $30, while the
potential winnings range from thousands to a million dollars (DraftKings 2015).
A daily fantasy sports company sets a price (in virtual dollars) for each player in a sport. These prices
vary day to day and are based upon past and predicted performance (Steinberg 2015). Lineups must then be
constructed within a fixed budget. For instance, in DraftKings hockey contests, there is a $50,000 budget
for each lineup. There are further constraints on the maximum number of each type of player allowed and
the number of different teams represented in a lineup. Finding an optimal lineup is challenging because
the performance of each player is unknown and there are hundreds of players to choose from. However,
predictions for player points can be found on a host of sites such as Rotogrinders (Rotogrinders 2015) or
Daily Fantasy Nerd (DailyFantasyNerd 2015) which are fairly accurate, and many fantasy sports players
have a good sense about how players will perform. This suggests that the main challenge is not in predicting
player performance, but in using predictions and other information to construct optimal lineups.
The strategy for constructing lineups depends upon the type of contest being entered. For double-up and
50/50 contests all that is needed is a lineup that beats the median. In this case, one would want a lineup that
has a high average predicted score but also low variance. In contrast, top heavy contests provide extreme
profits if a lineup can achieve a very high rank. If the predictions were perfect, then this could be done with a
single lineup, similar to what one would do for a double-up or 50/50. The problem with this approach is that
the predictions are noisy, so one could not consistently achieve a high rank with a single lineup. However,
two things can be done to increase the chance of victory. First, one can use lineups which have an inherently
large variance, which would increase the chance of the lineup having an extremely good performance and
achieving a high rank. Second, one can enter multiple lineups, thereby increasing the chance that at least
one of them will rank high. Because of the payoff structure of a top heavy contest, having at least one lineup
achieve a high rank will be enough to make a substantial profit. Therefore, a good approach to winning top
heavy contests is to enter many lineups with high variance.
In this work we use integer programming to win top heavy hockey contests in DraftKings. Our integer
programming approach allows us to systematically construct multiple lineups which have high variance. We
Hunter, Vielma, and Zaman: Winning3
November 15, 2015 November 16, 2015 November 17, 2015 November 23, 2015
Figure 1 Screenshots of the performance of our top integer programming lineups in real DraftKings hockey
contests with 200 linueps entered. All profits are being donated to charity.
chose hockey because we are not familiar with it. This way, any success we have is strictly due to our use of
integer programming and not any of our own expert knowledge. The contests we focused on had thousands
of entrants and top payoffs of several thousand dollars.
During the period when this work was conducted, DraftKings allowed a single user to enter 200 lineups
into a contest. We began competing with our integer programming approach and performed very well several
days in a row. We even placed third or higher in three consecutive contests. We show screenshots of our
performance with 200 lineups in Figure 1. A week after we began winning with our integer programming
approach, DraftKings reduced the number of multiple entries to 100. This reduced our performance, but
we were still able to perform well in the contests. This demonstrates the power of integer programming for
winning daily fantasy sports contests.
The remainder of the paper is structured as follows. We begin by reviewing the literature on sports ana-
lytics with a particular focus on hockey in Section 1.1. We present an empirical analysis of NHL player
performance and the prediction accuracy of sites which forecast player performance in Section 2. We present
a simulation study of the multiple lineups strategy in Section 3 which shows how much of an edge a set
of lineups needs in order to win top heavy contests. Section 4 presents our integer programming approach
for constructing multiple lineups. We evaluate the performance of our integer programming approach in
actual DraftKings hockey contests in Section 5. We conclude in Section 6 with a discussion on extending
our integer programming approach to other sports.
1.1. Previous Work
Much of the extant work in hockey analytics has focused on statistical models of player performance. The
traditional measure of a player’s ability is the plus-minus statistic, which measures the number of goals
Hunter, Vielma, and Zaman: Winning4
scored by a player’s team minus the number of goals scored by the opposing team when the player in on
the ice. Many modifications have been suggested for this simple metric in order to provide a more clear
assessment of a player’s quality. The adjusted minus/plus probability approach of Schuckers et al. (2011)
attempts to add in other data such as hits or face-offs. Controlling for team strength by using the team-
average plus-minus was proposed in Awad (2009). Other approaches attempt to statistically infer the effects
of individual factors on team performance (Thomas et al. 2013). An approach using a regression on goals
per hour to assess player performance in proposed in Macdonald (2011). To deal with the large number of
features present in the high dimensional regressions that occur for player performance, a regularized logistic
regression approach is used in Gramacy et al. (2013). The Added Goal Value metric is presented in Pettigrew
(2015) which incorporates information on what are known as power plays in hockey. A methodology known
as total hockey rating is proposed in Schuckers and Curro (2013) for performance evaluation of skaters
which incorporates data on factors such as zone starts and non-shooting events such as turnovers.
While the above models focus on skating players, a different approach is needed when evaluating goalie
performance. Typically a goalie is evaluated by their save percentage. However, this may depend upon the
strength of the defense of the team. A defense independent rating of goalies is presented in Schuckers (2011)
which uses data on the spatial locations of shots they faced. In Beaudoin and Swartz (2010) a simulator is
developed which assesses strategies for pulling goalies in hockey games.
In the area of fantasy sports there are a plethora of websites which provide strategies for drafting lineups
or algorithmic tools which provide optimized lineups. However, there is very little formal academic work
on this topic. In Becker and Sun (2014) an integer programming approach is presented which optimizes
lineups for an entire fantasy American football season. This work has elements similar to our work, such as
the use of integer programming to optimize lineups, but there are a few key differences. First the focus is on
season long fantasy contests, whereas we focus on daily fantasy contests. Second, the model is developed
for American football, which has a very different structure from hockey. Third, the approach produces a
single optimized lineup, whereas our strategy is to generate multiple lineups.
2. Hockey Data AnalysisThere are two capabilities we utilize in our approach to winning daily fantasy sports hockey contests. The
first is exploiting the structure of hockey teams and fantasy points scoring systems in order to construct
lineups that can score many fantasy points. The second is to be able to predict the performance of players so
that our lineups do indeed achieve the scores we expect. In this section we present data analysis which will
show how to predict player performance and guide the construction of an integer programming formulation
to select multiple lineups in Section 4. The daily fantasy contests we focus on are from the site DraftKings
and are based on players in the National Hockey League (NHL), which is the main hockey league in North
America. We begin by describing the basic elements of hockey and hockey daily fantasy sports contests.
Hunter, Vielma, and Zaman: Winning5
We then evaluate the performance of different public websites that provide predictions for player and team
performance. After this we study the correlations in player performance which result from the structure of
hockey teams. While our analysis focuses on DraftKings, we note that other daily fantasy sports sites have
very similar contests and point scoring systems. All data in this section comes from October 21, 2015 until
December 31, 2015.
2.1. Hockey Basics
A hockey team consists of six players: one center, two wingers, two defensemen, and a goalie. The centers,
wingers, and defensemen are known as skaters because they are free to skate on the ice rink, whereas the
goalie primarily stays in front of the goal. In hockey the aim is for a team to shoot a puck into the goal of
the opposing team and the goalie’s task is to protect the goal from any opponent scores.
In daily fantasy sports hockey contests there are constraints on the types of players that must be present
in a lineup. For DraftKings, each lineup must consist of two centers, three wingers, two defensemen, one
goalie, and one utility player which is any skater, resulting in a lineup of nine players. We show a screenshot
of one possible DraftKings lineup in Figure 2. The figure also shows a column labeled FPPG, which stands
for fantasy points per game. This is the average number of fantasy points per game the player has earned
throughout the current NHL season. DraftKings provides this information to help users construct their
lineups. The total fantasy points for a lineup is the sum of each player’s fantasy points. The lineup shown
has 34.4 fantasy points per game. This score is in the typical range of NHL lineups in DraftKings and gives
a sense of the value of different scoring actions. In a daily fantasy sports contest, there is a specific way in
which players score fantasy points. Because we test our approach in DraftKings, we will focus on its points
scoring system1. We summarize the details of the DraftKings scoring system in Table 1. In Figure 2 we
show the basic statistics of points per game for each position averaged over all NHL players. We also show
the average DraftKings cost per player in the figure. As can be seen, on average goalies tend to score almost
twice as many points as all other players and also cost twice as much. Also, the standard deviation of each
position’s points per game is roughly the same size as the average value.
The centers and wingers are known as offensive players. In the NHL each hockey team plays its offensive
players in fixed sets known as lines. This means that if one player from the line, such as the center, is on
the ice, then so are the other two players from the line. Each team has four standard lines and two power
play lines, which only play when there is a power play as a result of a penalty. We show the average points
per game for players based upon their line in Figure 2. The rank of the line typically suggests how well it
performs. As can be seen, line one is generally the highest scoring line.
The points system of DraftKings is very similar to most daily fantasy hockey points systems. It gives
more points for players that perform better, but it also has some important properties which we use to
1 Full scoring rules can be found at www.draftkings.com/rules.
Hunter, Vielma, and Zaman: Winning6
Score type PointsGoal 3Assist 2Shot on Goal 0.5Blocked Shot 0.5Short Handed Point Bonus (Goal/Assist) 1Shootout Goal 0.2Hat Trick Bonus 1.5Win (goalie only) 3Save (goalie only) 0.2Goal allowed (goalie only) -1Shutout Bonus (goalie only) 2
Table 1 Points system for NHL contests in DraftKings.
construct our lineups. The first property concerns assists, which is passing the puck to a player who either
scores or who passes the puck to the player that scores. Each assist is worth two points and DraftKings
awards points for up to two assists per goal. This is an important aspect of the scoring system because it
means that for each goal scored, it is possible for a single lineup to receive seven points (three points for the
goal, and four points for the two assists that led to the goal). However, such a situation can only occur if all
players involved are on a single lineup. This suggests that to score many points, a lineup should contain all
players from a single line. By putting all players from a single line in a lineup, we are essentially increasing
the variance of the lineup. This is a popular fantasy sports tactic known as stacking. Many expert daily
fantasy sports players use stacking to win in a variety of sports contests (Drewby 2015), (Bales 2015b),
(Bales 2015a). The second property of the points system concerns goalies. If a goalie’s team wins the game,
an additional three points is awarded to the goalie. This is important because as we will see, it is possible
to predict who will win an NHL hockey game with relatively good accuracy, so by choosing goalies on
winning teams, an extra three points can be obtained, which is a significant amount in hockey contests given
that lineups average between 30 to 40 points.
2.2. Prediction Sites
Many fantasy sports players devote a substantial amount of effort in developing elaborate prediction mod-
els for players’ performance. Their advantage comes mainly from the power of their predictive models.
However, we do not have the domain knowledge required to improve upon the prediction models of the top
fantasy players. Therefore, we take a different approach. Publicly available player point predictions are pro-
vided by the websites Rotogrinders (Rotogrinders 2015) and Daily Fantasy Nerd (DailyFantasyNerd 2015).
We find the predictions from these sites to be relatively accurate, but the accuracy varies by position. We
plot in Figure 2 the mean absolute error in points of the two sites’ predictions for different hockey positions
using our data from the 2015 NHL season. We see that the prediction errors of both sites are very similar.
Therefore, it does not seem that either one has a superior prediction model. In Section 4.1 we build a simple
regression based prediction model which uses the data from both of these sites.
Hunter, Vielma, and Zaman: Winning7
Goalie Center Winger Defenseman Skaters0
0.5
1
1.5
2
2.5
3
3.5
Mea
n A
bsol
ute
Err
or [F
PP
G]
Daily Fantasy NerdRotogrinders
Goalie Center Winger Defenseman0
1
2
3
4
5
6
7
8Average FPPGStandard Deviation FPPGAverage Player Cost [$ thousands]
Winger Center0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5Line 1Line 2Line 3Line 4Power Play Line 1Power Play Line 2
Ave
rage
FP
PG
Figure 2 (top left) Mean and standard deviation of fantasy points per game (FPPG) and average player cost
in DraftKings for each hockey position. (bottom left) Mean absolute error of NHL player FPPG
predictions of Daily Fantasy Nerd and Rotogrinders for each hockey position. (top right) Mean
FFPG of offensive skaters in different types of lines. (bottom right) Screenhsot of an NHL lineup
from DraftKings.
2.3. Player Correlations
In Section 4 we will present an integer programming formulation which creates lineups with a large vari-
ance. The key to this approach is putting players on lineups with positive correlations in their FPPG. This
way a lineup will either perform very well or very poorly. For top heavy contests, we will see that this
strategy is effective. To create high variance lineups we must first understand the key correlations between
players. There are two important correlations we look at. The first concerns players in lines. The second
concerns offensive players and goalies.
We first begin by examining the correlations of FPPG between players in the same line versus those in
different lines. We calculate the Pearson correlation coefficient between the FPPG for every pair of NHL
offensive players (centers and wingers) in the same line and in the same team but different line. We plot
the resulting average correlation coefficients for each group in Figure 3. We see that the players on the
same line have a much higher correlation than players on different lines. This is not surprising given the
way fantasy points are awarded. This correlation is most likely due to the assist/goal combinations that give
Hunter, Vielma, and Zaman: Winning8
Line 1 Line 2 Line 3 Line 4 Different Line−0.2
−0.15
−0.1
−0.05
0
0.05
0.1
0.15
0.2
0.25
0.3
Cor
r. C
oef.
of O
ffens
ive
Pla
yer
FPP
G
Same Team Different Team-0.2
-0.15
-0.1
-0.05
0
0.05
0.1
0.15
0.2
0.25
0.3
Cor
r. C
oeff.
of G
oalie
-Ska
ter F
PP
G Offensive SkaterDefensemen
Figure 3 (left) Average correlation coefficient of FPPG for offensive players in the same and different lines.
(right) Average correlation coefficient of FPPG for goalies and skaters in the same and opposing
teams.
points to multiple players who are simultaneously on the ice. This correlation should be included in a lineup
to increase the variance.
We next look at the correlation between goalies and skaters in the same game. We consider the correlation
of FPPG for offensive players and defensemen with goalies on the same and opposing team in a game. We
plot the resulting average Pearson correlation coefficient of the FPPG for these groups in Figure 3. We find
that the correlation is negative for goalies and skaters on opposing teams. For the same team the correlation
is near zero. The negative correlation is due to the fact that if a skater scores fantasy points, it is typically
because a goal was scored, which gives negative fantasy points to the goalie. On the same team there is no
structural condition that would correlate skaters and goalies on the same team. The negative correlation is
something we want to avoid within a lineup as it would reduce the overall variance. Therefore, to increase
lineup variance, we will avoid putting goalies and skaters on opposing teams in a single lineup.
3. Simulation Study of the Multiple Lineups StrategyBecause of the excessively high payoff for the best lineups in top heavy contests, the key to winning is to
have at least one lineup achieve a high rank. For instance, in a DraftKings NHL Sniper contest, nothing
is paid to the bottom 79% of lineups but $4000 is paid to the first place lineup. The entry fee is $3 per
lineup and an individual user can enter up to 100 or 200 lineups2. If a user entered 200 lineups in the
Sniper contest and one of the lineups came in at least seventh place, then all $600 in entry fees would be
recovered. The Sniper contest payoff structure is similar to other top heavy contests. Profit can be made in
these contests even if all lineups entered do poorly, as long as at least one does very well. We provide an
2 Originally a maximum of 200 lineups per user were allowed in any contest, but during the course of this research DraftKingsreduced this to 100 lineups.
Hunter, Vielma, and Zaman: Winning9
integer programming approach to construct multiple lineups in Section 4 that have some sort of edge over
the other contest lineups. We wish to understand how much of an edge the integer programming approach
needs to provide in order for it to be profitable.
To understand how the multiple lineup strategy will perform in a top heavy contest, we model the lineups
as follows. Let n be the total number of lineups in a contest. Let m be the number of lineups we will enter
using our integer programming approach. The rules of daily fantasy sports force m to be no more than 1%
or 2% of n. To analyze the behavior of the profit, we model the points of each lineup as an independent
normal random variable. In reality this will not be the case because many lineups will share players, but
we make the assumption here to simplify our analysis. Let the points of the integer programming lineups
be given by normal random variables Xi, 1≤ i≤m, with mean and variance µ1 and σ21 . Let the points of
the other lineups in the contest, which we refer to as the population lineups, be given by the normal random
variables Yi, 1≤ i≤ n−m with mean and variance µ0 and σ20 . We wish to understand how the profit of the
integer programming lineups behaves as a function of the contest payoff structure, the number of integer
programming lineups entered, and the edge provided by the integer programming approach.
We let the profit of each rank in a contest be given by a vector p = [p1, p2, ..., pn] where pi is the total
profit (winnings minus entry fee) of a lineup ranking i out of n. We have that∑n
i=1 pi < 0 since a fixed
percentage of the entry fees are claimed by DraftKings as revenue. We define the order statistics of the
integer programming lineups as X(1) ≥X(2) ≥ ...≥X(m) and the order statistics of the population lineups
as Y(1) ≥ Y(2) ≥ ...≥ Y(n−m). The rank of the ith best integer programming lineup X(i) in the contest is ri
and the profit margin from a contest is
Q=
∑m
i=1 pricm
. (3.1)
This profit margin is a function of n,m, p, c, and the distributions of Xi, and Yi. To understand the rela-
tionship between Q and the other variables, we perform a simulation study. We consider the NHL Sniper
contest which has a top heavy payoff structure shown in Figure 4. The size of the contest varies from day
to day, but typically has several thousand entries. In our simulations we use n= 21,073 which is the actual
number of entries in one of the Sniper contests. To simulate the Sniper contest we fix the distribution param-
eters, simulate m integer programming lineups and n−m population lineups, and calculate the resulting
profit margin using equation (3.1). We repeat this for 10,000 trials for each set of parameter values. We then
investigate the relationship between profit margin and the different parameters of the model.
We plot in Figure 5 the median profit as a function of m for the Sniper contest with µ0 = 27.5 and
σ0 = 7.1, which are actual population values for one of the Sniper contests. We use different values for
µ1 and σ1. We simulate up to m = 1000, but in reality m is no more than 100 or 200. We see that when
the algorithm and population have equivalent variances, the integer programming mean must be two points
above the population to make profit. From the figure, the profit margin becomes positive at approximately
Hunter, Vielma, and Zaman: Winning10
100 101 102 103 104100
101
102
103
104
Rank
Pay
off [
$]
Figure 4 Payoff versus lineup rank of the Sniper contest with 21,073 entries. Ranks which receive zero payoff
are not shown.
140 integer programming lineups when µ1−µ0 > 2. The profit margin also tends to increases in m for this
range. Therefore, if the integer programming approach could only increase µ1, then more lineups would
always be better in this range. The profit margin clearly decreases at some point because if m = n the
profit margin would be negative since the net payoff of the contest is negative (due to fees collected by
DraftKings). However, this occurs far beyond m= 1,000 and so is not relevant in real contests.
We see a more interesting behavior if we set the integer programming mean equal to the population mean
and vary the integer programming variance. Here we see that with σ1 only two points above σ0, positive
profit can be achieved with 35 lineups. If σ1 is four points above σ0, something different occurs. Now the
peak median profit margin of 1,250 % is achieved at around 100 lineups, but for more than 100 lineups, the
profit margin actually begins to decrease. For the range of m allowed by DraftKings, an increase in σ1 has
much more impact on the profit margin than an increase in µ1. This is a result of the payoff structure of top
heavy contests.
To further understand how the algorithm lineup distribution affects the profit margin for values of m that
are used in practice, we fix µ0 and σ0 as before and we set m equal to 100 and 200 (the new and original
limits on multiple lineups set by DraftKings). We then sweep over µ1 and σ1 and trace the profit margin
contours in Figure 6. We see from this figure that if µ0 = µ1 then we need σ1 to be about one point above
σ0 to make profit. If σ0 = σ1, µ1 must be approximately two points above µ0 to make profit.
The integer programming approach can even make profit if its mean or variance are below the population.
For instance, if µ1 − µ0 =−1, then the integer programming approach will make profit if σ1 is more than
one point above σ0. But if σ1 − σ0 = −1 then the integer programming approach needs µ1 to be at least
three points above µ0. This shows that the key to winning top heavy contests is having lineups which have
a higher variance than the population, even if this comes at the cost of a lower mean.
Hunter, Vielma, and Zaman: Winning11
0 200 400 600 800 1000−150
−100
−50
0
50
100
150
200
250
300
Number of integer program lineups
Med
ian
prof
it m
argi
n (p
erce
nt)
µ1−µ
0 = −2
µ1−µ
0 = 0
µ1−µ
0 = 2
µ1−µ
0 = 4
0 200 400 600 800 1000−200
0
200
400
600
800
1000
1200
1400
Number of integer program lineups
Med
ian
prof
it m
argi
n (p
erce
nt)
σ1−σ
0 = −2
σ1−σ
0 = 0
σ1−σ
0 = 2
σ1−σ
0 = 4
Figure 5 The median profit as a function of the number of integer programming lineups for the Sniper con-
test with n= 21,073, µ0 = 27.5, and σ0 = 7.1. (right) σ1 = σ and curves for different values of µ1 are
shown. (left) µ1 = µ and curves for different values of σ1 are shown.
0
0
0
0
200
200
200
200
400
400
400
400
600
600
600
600
200 integer program lineups
µ1−µ0
σ 1−σ 0
−5 −4 −3 −2 −1 0 1 2 3 4 5−5
−4
−3
−2
−1
0
1
2
3
4
5
0
0
0
0
200
200
200
200
400
400
400
600
600
600
100 integer program lineups
µ1−µ0
σ 1−σ 0
−5 −4 −3 −2 −1 0 1 2 3 4 5−5
−4
−3
−2
−1
0
1
2
3
4
5
Figure 6 Contour plot of the median profit margin (in percent) for the Sniper contest with n = 21,073, µ0 =
27.5, σ0 = 7.1 and different numbers of integer programming lineups.
3.1. Double-ups versus Top Heavy Contests
Our simulation analysis shows that the multiple lineups strategy will be very profitable if our integer pro-
gramming approach can produce lineups with an edge of a few points above the population. We now inves-
tigate if a similar result holds for another popular contest, the double-up. Recall that double-up contests
divide all the entry fees evenly among roughly the top half of the lineups. We consider an actual DraftKings
hockey double-up contest with n = 1,136 and m = 30. This contest has a smaller overall size and multi-
ple entry limit than the top heavy competitions we analyzed. We simulate performance on this contest as
a function of the integer programming edge and show the result in Figure 7. Two things are evident from
this figure. First, the profit margin is much lower than for top heavy contests. In the figure the maximum
Hunter, Vielma, and Zaman: Winning12
profit margin is 90 percent. Second, the profit margin decreases as σ1 increases, in contrast to the top heavy
contests. In fact, the highest profits are made when the integer programming approach has a positive edge
in the mean and a negative edge in the variance. This difference in strategies for double-up and top heavy
contests is due to the different payoff structures of the contests. Top heavy contests have a high variance
payoff structure and require the integer programming approach to have a high variance as well, whereas
double-up contests have a low variance payoff structure and prefer a lower variance.
The simplicity of the double-up payoff structure allows us to accurately approximate the profit margin
with a closed form expression. We assume the entry fee is c and that m lineups are entered. For our approx-
imation we assume that n is very large so that the median score in the competition is given by µ0 (recall
that for normal random variables the mean and median are equal). For a double-up, the payoff is approxi-
mately 2c for any lineup above the median and zero for the rest. We say approximately because the cutoff
for winning is actually slightly above the median in order for some of the entry fees to be collected by the
daily fantasy sports company. Then, using equation (3.1) the profit margin for m lineups is
Qm =c∑m
i=1 1{Xi >µ0}− c∑m
i=1 1{Xi ≤ µ0}cm
=2
m
m∑i=1
1{Xi >µ0}− 1
Recall that we model Xi as normal random variables. We define F (x;µ,σ) as the cumulative distribution
function (CDF) of a normal random variable with mean µ and standard deviation σ. Then we have the
following result for the limiting profit margin of a double-up.
Theorem 3.1 Let Q= 1− 2F (µ0;µ1, σ1). Then Qm converges almost surely to Q.
Proof. From the strong law of large numbers we have that 1m
∑m
i=1 1{Xi >µ0}a.s.−→ 1−F (µ0;µ1, σ1). The
expression for Q follows from this. �
The expression Q is our approximation for the profit margin of a double up contest when we have many
lineups entered and the contest is very large. We plot Q in Figure 7 along side the simulated profit margin
for 30 integer programming lineups. As can be seen, the approximation is very close to the simulated value.
Therefore, we can use Q to very easily map the profit margin surface of double-up contests. We find the
same basic conclusions as with the simulation, i.e. the best lineups have high mean and low variance. While
we can find a closed form expression for the profit margin surface for double-ups, such an expression cannot
be easily found for top heavy competitions because of the complexity of the payoff structure. Instead we
must rely on simulations to map the profit margin surface.
Hunter, Vielma, and Zaman: Winning13
00
0
1010
10
10
20
2020
20
20
30
30
30
30
40
40
40
50
50
50
60
60
70
70
80
90
30 algorithm lineups, double-up
µ1−µ0
σ 1−σ0
−5 −4 −3 −2 −1 0 1 2 3 4 5−5
−4
−3
−2
−1
0
1
2
3
4
0
00
1010
1020
20
20
30
30
30
40
40
40
50
50
50
60
60
70
70
80
80
90
Theory, double−up
µ1−µ0
σ 1−σ0
−5 −4 −3 −2 −1 0 1 2 3 4 5−5
−4
−3
−2
−1
0
1
2
3
4
Figure 7 (left) Contour plot of the median profit margin (in percent) for a double-up contest with n= 1,136,
m= 30, µ0 = 27.5, and σ0 = 7.1. (right) Contour plot of the theoretical profit margin (in percent) for a
double-up contest with µ0 = 27.5, and σ0 = 7.1.
4. Multiple Lineup Integer Programming ApproachWe now present our integer programming approach for selecting multiple lineups for hockey contests. The
integer programming approach takes as input the player point predictions, which we calculate using a simple
regression model. We then construct and solve an integer programming formulation to produce a single
lineup with maximum predicted points, subject to a set of stacking constraints which increase the lineup
variance. Then, we iteratively add constraints to the integer programming formulation in order to produce
multiple lineups which are distinct from previously constructed lineups. We now go into the details of the
prediction model and the integer programming formulation.
4.1. Player Point Predictions
We first build a predictive model for the fantasy points for each player in the NHL. Rather than build
complex statistical model to make predictions, we instead choose a much simpler approach. We saw that
sites such as Rotogrinders and Daily Fantasy Nerd have relatively accurate predictions for player points.
Further, these sites are run by fantasy sports experts who most likely have much domain expertise and have
spent a great amount of effort to construct their predictions. Therefore, it is unlikely that any statistical
model we build would have an edge over these sites, especially given our lack of domain knowledge for
hockey. Rather, it is sensible and very simple to use these predictions for the players’ points. Here we will
analyze a set of simple predictive models and evaluate their prediction accuracy. Later in Section 5 we will
evaluate these different models by their profitability in real contests.
Hunter, Vielma, and Zaman: Winning14
(1) (2) (3) (4) (5) (6)skater points goalie points goalie points goalie points goalie points goalie points
β1 0.634∗∗∗ 0.628∗∗∗ 0.203 0.203(Rotogrinders) (0.0203) (0.0864) (0.144) (0.144)
β2 0.282∗∗∗ -0.0173 -0.0309 -0.0301(Daily Fantasy Nerd) (0.0242) (0.106) (0.113) (0.113)
β3 4.310∗ 5.304∗∗ 4.176∗ 5.172∗∗
(Win Probability) (2.123) (2.005) (2.064) (1.941)
β0 1.334 1.686(0.0314) (0.549) (1.032) (1.003) (1.013) (0.983)
R2 0.238 0.0858 0.0179 0.0140 0.0178 0.0139Samples 10825 565 506 506 506 506Standard errors in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Table 2 Regression coefficients, R squared, and sample count for different prediction models of skater and
goalie points.
For a player u, let xu1 and xu2 be the predictions by Rotogrinders and Daily Fantasy Nerd, respectively.
For skaters, we perform a linear regression on the player’s points using these predictions as features. We
denote yu as the predicted points of player u using our model. For skater u our points prediction is
yu = β0 +β1f1u +β2f2u. (4.1)
For goalies, we must also factor in the outcome of the game since there is a three point bonus for victory.
Daily Fantasy Nerd provides win probabilities for each goalie, and we have found these estimates to be
relatively accurate. We perform a linear regression on the goalie points using as predictor variables the
predictions of Rotogrinders, Daily Fantasy Nerd, and the win probability of the goalie’s team. For a goalie
u, let this win probability by pu. Then the points prediction for goalie u is
fu = β0 +β1f1u +β2f2u +β3pu. (4.2)
We compare different prediction models in Table 2. For skaters, both Rotogrinders and Daily Fantasy
Nerd predictions are significant. For goalies when we do not include the win probability we find that only
Rotogrinders is significant. When the win probability is included, it is the only significant feature. The
highest R squared value for goalies is the model which does not include the win probability. This suggests
that we should not include the win probability in our goalie model. However, we will see in Section 5 that
in terms of profit in real contests, the win probability model can have better performance.
4.2. Feasibility Constraints
Now that we have the predictions for the player points, we can construct the integer programming formula-
tion. This integer programming formulation will be solved iteratively with new added constraints to produce
Hunter, Vielma, and Zaman: Winning15
multiple lineups. We now introduce the basic notation for our algorithm. We will construct m lineups. We
assume there areN NHL players to choose from and we let the variable xuk equal one if player u is selected
for lineup k and zero otherwise. Each player u is predicted to achieve fu fantasy points. We let the fantasy
price of player u be cu. There are NT NHL teams playing in a contest, and we denote the set of players on
team i by Ti. Also, we denote the set of players who are centers, wingers, defenseman, and goalies by C,
W , D, and G, respectively.
We begin by formulating the basic feasibility constraints of a lineup. Daily fantasy sports contests all
impose a budget B constraint on a lineup. For DraftKings, this budget is B = 50,000 fantasy dollars. There
is also a constraint on the number of players of each type of position. In DraftKings, a lineup consists of
nine players: two centers, three wingers, two defensemen, a goalie, and a utility player who can be any
skater type. The position constraints are given by
2≤∑u∈C
xuk ≤ 3,
3≤∑u∈W
xuk ≤ 4,
2≤∑u∈D
xuk ≤ 3,∑u∈G
xuk = 1,
(4.3)
DraftKings also requires each lineup to have players that come from at least three different NHL teams. The
team constraints areti ≤
∑u∈Ti
xuk, ∀ i∈ {1, . . . ,NT}
NT∑i=1
ti ≥ 3,
ti ∈ {0, 1} , ∀ i∈ {1, . . . ,NT} .
(4.4)
We put these constraints together to obtain the basic constraints for a feasible lineup:
N∑u=1
cuxuk ≤B, (budget constraint)
N∑u=1
xuk = 9, (lineup size constraint)
Equations (4.3), (position constraint)
Equations (4.4), (team constraint)
xuk ∈ {0,1} , 1≤ u≤N.
(4.5)
4.3. Stacking Lineups Using Structural Correlations
As we discussed in Section 2.3, there are correlations that are induced by the structure of the game. If we
were able to perfectly predict every player’s performance, then we would not need this structural correlation
Hunter, Vielma, and Zaman: Winning16
information to win a given contest. However, uncertainty in the player performance requires us to take
advantage of these structural correlations in order to win the contests. In essence, what we want to do is use
the structural correlations to create lineups which have very high variance, which is referred to as stacking.
By stacking our lineups, we increase the chance that at least one lineup can achieve a top rank in a contest.
In this section we propose ways to stack lineups by incorporate this structural correlation information into
our integer programming formulation.
Goalie Stacking. As discussed in Section 2.3, goalies are negatively correlated with the skaters that they
are opposing. Therefore, to increase the variance of a lineup, we would like to avoid having this negative
correlation by making sure all of the lineup’s skaters are not opposing its goalie. We will refer to this
constraint as goalie stacking. To model this constraint in our integer programming formulation, we denote
the set of skaters who are opposing goalie u byOu. Due to the team constraint given by (4.4), the maximum
number of players we can have in one lineup from a given NHL team is six. For lineup k and goalie u, the
goalie stacking constraint is
6xuk +∑v∈Ou
xvk ≤ 6, ∀u∈G. (4.6)
This constraint will force the goalie variable xuk to be zero if the lineup has any skater opposing the goalie.
Line Stacking. Additionally, in Section 2.3 we found that players on the same line are positively cor-
related. Therefore, to increase the lineup variance, we would like to impose additional constraints on our
formulation that will create lineups with players on the same line. We will refer to this as line stacking.
There are eight possible skaters on a team and a line consists of three skaters. Therefore, the maximum
number of lines we can include in a lineup is two. To include as much positive correlation as possible
amongst the players, we can make a lineup to have one complete line of three players, and two partial lines
with at least two players each. To formulate the line stacking constraints, we introduce some notation. Let
NL denote the total number of lines, and let Li denote the set of players on line i. With this notation we can
formulate the constraint of having at least one complete line in lineup k as
3vi ≤∑u∈Li
xuk, ∀i∈ {1, . . . ,NL}
NL∑i=1
vi ≥ 1
vi ∈ {0,1} , ∀i∈ {1, . . . ,NL} .
(4.7)
Under these constraints vi is one if we have all three players from line i, and zero otherwise, and we require
at least one of the vi to be one. To formulate the constraint of having at least two players from two distinct
Hunter, Vielma, and Zaman: Winning17
lines, we use the following constraints:
2wi ≤∑u∈Li
xuk, ∀i∈ {1, . . . ,NL}
NL∑i=1
wi ≥ 2
wi ∈ {0,1} , ∀i∈ {1, . . . ,NL} .
(4.8)
Similar to the complete line constraints, this constraint sets wi to one if at least two out of three players of
line i are in the lineup, and there must be at least two wi equal to one.
Defensemen Stacking. Lastly, we have found that defensemen that are on the first power play line score
substantially higher than defensemen that are not. Therefore, to increase a lineup’s average points, one could
choose defensemen that are on the first power play line. We will refer to this as defensemen stacking. To
model this constraint, let PD denote the set of defensemen that are on any team’s first power play line. Then,
to use only these defensemen in lineup k we add the following constraints to our integer programming
formulation:
xuk = 0 ∀u∈D \PD. (4.9)
This constraint makes sure that any defenseman that is chosen for the lineup is also in the first power play
line of some team.
There are many other constraints that could be added to the model, such as forcing a lineup to have every
player on a team’s first power play, imposing constraints to have seven players from one game, etc. We
did not investigate all the possible types of stacking, but they can all be naturally incorporated into our
integer programming formulation. In this work we focus on the constraints presented because they capture
the major structural correlations in hockey.
4.4. Multiple Lineup Constraints
Multiple lineups are created by iteratively solving the integer programming formulation. For each new
lineup, we add additional linear constraints to the integer programming formulation in order to refine the
solution space by removing previously found lineups. Let x∗ul be the optimal value of the variable for player
u in lineup l. We define the multiple lineup constraints for lineup k as
N∑u=1
x∗ulxuk ≤ γ, l= 1, . . . , k− 1. (4.10)
These constraints make sure that lineup k will have no more than γ players in common with any previously
constructed lineup. We refer to γ as the maximum lineup overlap. Decreasing γ forces the lineups to be
more diverse, whereas for larger γ the lineups will share more players. We will see in Section 5 how the
choice of γ is impacted by the number of NHL games being played on a given night.
Hunter, Vielma, and Zaman: Winning18
4.5. Integer Programming Approach for Multiple Lineups
We now present the final integer programming formulation for constructing m lineups. We obtain lineup k
(1≤ k≤m) by solving
maximizex
N∑u=1
fuxuk
subject to Equation (4.5) (feasibility constraints)
Equation (4.10) (multiple lineup constraints)
Any subset of equations (4.6), (4.7), (4.8), (4.9) (stacking constraints).
(4.11)
By solving this integer programming formulation iterativelym times, each time updating the multiple lineup
constraints with the previously found solutions, one obtains m lineups. This formulation allows flexibility
in the choice of stacking constraints, prediction model, maximum lineup overlap, and number of linueps
constructed. We will see next how to select the best integer programming settings.
5. PerformanceWe now evaluate the performance of our algorithm on real daily fantasy sports hockey contests. We obtained
data on the performance of the population lineups in several actual DraftKings hockey contests by entering
the contest. As entrants, we are able to download the lineups and points of every lineup in the contest. In
total we have data for 38 different top heavy contests which contain several thousand entrants. We run our
algorithm and evaluate how much profit our lineups would have made in each contest competing against the
actual population lineups. We investigate a variety of integer programming settings, including the number
of lineups entered, the type of lineup stacking used, the prediction model used, and other features.
We combine the different constraints from Section 4.3 to create six different types of stacking which
are summarized in Table 3. The first type is no stacking, which only uses the basic constraints and none
of the stacking constraints. All the other types of stacking use goalie stacking (equation (4.6)) as this is
a very natural negative correlation we wish to avoid. Also, all the other types of stacking require at least
one complete line (equation (4.7)), and at least two partial lines (equation (4.8)). Only two defensemen
are allowed in Type 2, the logic being that the utility player should be an offensive skater that will score
more points, as was seen in Figure 2. Types 3 and 4 use defensemen stacking (equation (4.9)). Types 4 and
5 require exactly three different teams to be represented on a lineup. This constraint increases the lineup
variance by having more players on the same team in the lineup.
5.1. Impact of Stacking and Number of Lineups
We begin by investigating the performance of each type of stacking versus the number of lineups entered.
We create different numbers of lineups for each type of stacking with a maximum lineup overlap of seven
and calculate the mean profit margin across all contests. We plot the results in Figure 8. We observe that the
Hunter, Vielma, and Zaman: Winning19
No stacking Type 1 Type 2 Type 3 Type 4 Type 5Ck Ck
Equations (4.6)Equations (4.7)Equations (4.8)
CkEquations (4.6)Equations (4.7)Equations (4.8)∑
u∈D
xuk = 2
CkEquations (4.6)Equations (4.7)Equations (4.8)Equations (4.9)
CkEquations (4.6)Equations (4.7)Equations (4.8)Equations (4.9)
NT∑i=1
ti = 3
CkEquations (4.6)Equations (4.7)Equations (4.8)
NT∑i=1
ti = 3
Table 3 Summary of the constraints for different types of stacking in our integer programming formulation
for constructing hockey lineups.
lowest profit margin is achieved by no stacking. This shows the importance of stacking lineups to increase
lineup variance. Type 4 stacking achieves the highest profit margin for 100 and 200 lineups. For one lineup,
Type 3 stacking has a higher mean profit margin due to a single contest where the profit margin is 124%.
Excluding this single outlier, Type 4 stacking has the highest mean profit margin for one lineup. Recall that
Type 4 stacking includes goalie stacking, line stacking (complete and partial), defensemen stacking, and at
most three different teams. It has more stacking than all other types, which we expect to give its lineups
more variance. We see from the data that this maximizing variance strategy is effective for winning top
heavy contests.
It seems from Figure 8 that the mean profit margin is largest for one lineup. However, this is a deceiving
statistic because the mean is dictated essentially by a single outlier for these top heavy contests. A more
accurate picture is provided by the boxplot in Figure 9. Here we show the profit margin for Type 4 stacking
with a maximum lineup overlap of seven. We see that with one lineup, the median profit margin is -100%.
However, with more lineups, the median profit margin becomes larger. In addition, the 75th percentile is
larger for 200 lineups than for 100 lineups. This shows that with more lineups the distribution of the profit
margin is shifted upward and given a slightly fatter tail. Therefore, to win more money consistently, it is
advantageous to use more lineups.
5.2. Impact of Prediction Models
We next investigate the impact of different prediction models on our performance. We consider Type 4
stacking, a maximum lineup overlap of seven, and 200 lineups. The prediction models we consider use
Rotogrinders predictions, Daily Fantasy Nerd predictions, both sites’ predictions, and both sites plus the
win probability. From Table 2 we saw that the best model (in terms of R squared) for predicting goalie points
used only the Rotogrinders and Daily Fantasy Nerd predictions and not the win probability information.
However, we find that in terms of profit margin, this is not the case. We show in Figure 10 the mean profit
margin of different prediction models for different numbers of integer programming lineups. We see here
that the model with all three features does the best or near the best for each number of lineups. The reason
Hunter, Vielma, and Zaman: Winning20
1 lineup 100 lineups 200 lineups−100
0
100
200
300
400
500
Mea
n pr
ofit
mar
gin
[%]
No StackingType 1Type 2Type 3Type 4Type 5
Figure 8 Plot of the mean profit margin in DraftKings hockey contests versus the number of integer pro-
gramming lineups. The plots are for different types of stacking constraints. The maximum lineup
overlap allowed is seven.
−100
0
100
200
300
400
500
1 lineup 100 lineups 200 lineups
Pro�
t mar
gin
[%]
Figure 9 Boxplot of the profit margin in DraftKings hockey contests versus the number of integer program-
ming lineups. The lineups use Type 4 stacking and a maximum lineup overlap of seven.
here may be that by including the win probability, the integer programming lineups are more biased towards
goalies expected to win the games, resulting in a better chance of obtaining the game win bonus of three
points. Therefore, though the regression analysis suggested that the win probability feature resulted in worse
predictions, it actually biased our lineups towards goalies more likely to win, which gave us the edge in the
contests.
5.3. Impact of Maximum Lineup Overlap
One parameter we can adjust in our model is the maximum lineup overlap. By decreasing the allowed
overlap, we force the integer programming approach to explore more of the space of possible lineups. We
find that the degree to which one wants to explore this space depends upon how large it is. The size of the
lineup space on a given night is larger if there are more NHL games being played. We plot the mean profit
Hunter, Vielma, and Zaman: Winning21
1 lineup 100 lineups 200 lineups−50
0
50
100
150
200
Mea
n pr
ofit
mar
gin
[%]
DFNRotogrindersDFN and RotogrindersDFN, Rotogrinders, and Vegas odds
Figure 10 Plot of the mean profit margin of our integer programming approach in DraftKings hockey con-
tests with different prediction models. The models considered are Daily Fantasy Nerd predictions
(DFN), Rotogrinders predictions, DFN and Rotogrinders predictions, and DFN and Rotogrinders
predictions with Vegas odds. The maximum lineup overlap allowed is seven and constraint type
four is used for the integer programming lineups.
margin for 200 integer programming lineups as a function of the number of games played in a night for
different values of the maximum overlap in Figure 11. These lineups were created using Type 4 stacking.
For nights with less than four games, a maximum overlap of seven does best. For a larger number of games,
it seems that decreasing the maximum overlap improves performance. For instance, for nights with more
than nine games, an overlap of four does best. The reason for this is that when there are few games being
played, the lineup space is smaller and probably has fewer good lineups. By reducing the maximum overlap,
we end up not selecting these good lineups and instead choose many poor lineups. However, on nights with
many games, there is a huge lineup space that has several different good lineups. A smaller overlap causes
the integer programming approach to choose several of these good lineups, increasing the chance that one of
them will achieve a high rank. For maximum overlaps smaller than four we found that it becomes difficult
to obtain 200 feasible lineups. Maximum overlaps larger than seven produce lineups that are too similar and
do not provide enough diversity. Therefore, the maximum overlap should be tuned between four and seven
depending on how many games are being played on a given night.
5.4. Lineup Creation Order Versus Lineup Performance Rank
The integer programming formulation is solved iteratively to create multiple lineups. One question to ask is
do the lineups which are created earlier perform better? To investigate this, we set the integer programming
approach to create 200 linueps with Type 4 stacking and a maximum overlap of seven. We then evaluate
Hunter, Vielma, and Zaman: Winning22
<4 games 4 to 6 games 7−9 games >9 games−50
0
50
100
150
200
250
300
Mea
n pr
ofit
mar
gin
[%]
Max. overlap = 4
Max. overlap = 7
Max. overlap = 5
Max. overlap = 6
Figure 11 Plot of the mean profit margin 200 integer programming lineups with Type 4 stacking in DraftKings
hockey contests versus maximum lineup overlap for different number of NHL hockey games being
played.
their performance rank relative to each other in the contests. This allows us to look at several statistics
relating the creation rank and the performance rank of the lineups. We first investigate the Spearman rank
correlation coefficient between these two rankings. We find that the average value of the correlation coeffi-
cient across the different contests is 0.09 with a standard deviation of 0.1. This suggests that there is very
little correlation between the creation rank and performance rank of the lineups. To dive deeper into the
analysis, we plot in Figure 12 two different sets of data. First, we show a boxplot of the creation rank of
the best performing lineup. Second, we show the performance rank of the first created lineup. We see that
the median performance rank of the first created lineup is 74.5. This shows that the first created lineup has
a slightly, but not substantially improved performance. For the best performing lineup, the median creation
rank is 124.5. Therefore, while the first created lineup is slightly better, the winning lineup is generally one
of the later created lineups. This supports our finding of a very low correlation between the creation and
performance rank of the lineups.
5.5. Mean and Standard Deviation of Lineup Points Versus Profit Margin
We also look at the empirical mean and standard deviation of the integer programming lineups (µ1 and σ1)
versus the population lineups (µ0 and σ0) for each contest. We use 200 integer programming lineups with
Type 4 stacking and a maximum overlap of seven and plot the profit margin versus the difference in the
mean and standard deviation in Figure 13. If one looks at the spread of the mean difference of the integer
programming lineups, we see that there is a substantial variation. The mean difference swings between -10
and 20 points. When it is below zero, no money is made, but when it is above zero, a substantial profit
is made. This is a result of the fact that the lineups must satisfy the stacking constraints which increase
Hunter, Vielma, and Zaman: Winning23
0
50
100
150
200
250
Contest rank offirst created lineup
Creation rank ofbest lineup
Figure 12 Plot of the performance rank of the first created lineup and the creation rank of the best performing
lineup. There were 200 integer programming lineups created with Type 4 stacking and a maximum
overlap of seven.
their individual variance, but also maximize predicted points. The result is that when the actual points are
realized, we are either far above or far below the population mean. The standard deviation of the integer
programming lineups is generally below the population standard deviation. This is because we are choosing
lineups that have some correlation because they are designed to maximize points, even though the individual
lineups are designed to have a high variance. The variance we induce in the lineups is manifested through
the swing of their mean across contests. From Figure 13 it seems that we are equally likely to be either
above or below the population mean. However, because the contest payoff structure is so asymmetric, with
a maximum loss of 100%, but a maximum gain which is at least an order of magnitude larger, our strategy
ends up being profitable.
5.6. Implementation and Runtime
All formulations were constructed using the JuMP algebraic modeling language (Dunning et al. 2015,
Lubin and Dunning 2015), which is written in the Julia programming language (Bezanson et al. 2012).
This allowed us to test the runtime of our approach for various integer programming solvers. In prac-
tice, we must wait for information about which goalies are being played to be posted before we con-
struct our lineups. Goalies do not play every night, so if we do not wait for this information, we may
put goalies in lineups who are not playing and receive zero points for them. This goalie information is
generally posted on public websites about 30 minutes before the games begin. We must be able to solve
the integer programming formulation in this time to be able to enter the contest. We show in Figure
14 the time needed to solve the integer programming formulation for 200 lineups in each of our con-
tests using different integer programming solvers. We consider the free solvers CBC (COIN-OR) and
GLPK (GNU) and the commercial solvers Gurobi (Gurobi Optimization). All computations were done
using a Intel Core i5-4570 3.20GHz with 8GB of RAM. We find that all solvers are able to solve the
Hunter, Vielma, and Zaman: Winning24
-20 -10 0 10 20-4
-3
-2
-1
0
1
2
3
4
41%
1220%
114%
435%
178%4%
102%
1445%
77%
642%8%
13%161%
127%
554%
744%
200
400
600
800
1000
1200
1400
σ 1-σ
0
µ1-µ0
Figure 13 Plot of profit margin versus µ1 − µ0 and σ1 − σ0 for DraftKings hockey contests with 200 integer
programming lineups created using Type 4 stacking and a maximum overlap of seven. The profit
margin for each contest is indicated through the color and size of the markers and also listed next
to the marker. The x markers indicate contests where the profit margin is not positive.
0.5
1
1.5
2
2.5
3
3.5
4
CBC GLPK Gurobi
Solver
Run
time
[min
utes
]
Figure 14 The runtime of our algorithm for generating 100 lineups with different integer programming soft-
ware.
integer programming formulations in under four minutes, giving sufficient time to enter the lineups into
the contest. The complete software used to construct the integer programming lineups is available at
https://github.com/dscotthunter/Fantasy-Hockey-IP-Code.
Hunter, Vielma, and Zaman: Winning25
6. ConclusionWe have shown here that integer programming can provide a significant edge in daily fantasy sports contests.
Our integer programming approach was able to win several top heavy hockey contests. We have studied
how different settings of the integer programming approach impact performance. Furthermore, we provided
an analysis of the edge needed from the integer programming approach in order to win top heavy contests.
At a high level, our approach can be summarized as follows.
1. Use publicly available predictions for player points.
2. Construct an integer programming formulation to construct a lineup that maximizes points while forc-
ing the lineup to have a high variance by including lines (also known as stacking).
3. Solve the integer programming formulation multiple times, each time adding constraints to produce
distinct lineups.
The success we were able to achieve given the relative simplicity of our approach shows the power of inte-
ger programming in daily fantasy sports contests. An interesting question is how easily can this integer
programming approach be ported over to other sports besides hockey? Of the items listed above, the first
and third are generic to any sport. The only one unique to hockey is the way in which we formulated stack-
ing constraints. There are structural correlation in hockey which we exploited to produce higher variance
lineups. While other sports may not have these specific structural correlations, they do have other structural
correlations which can be used to stack lineups. We now briefly discuss these for a few popular sports.
American Football. In American football daily fantasy sports contests, lineups are required to have a
single quarterback who throws the ball, and receivers who catch the ball. Fantasy points are awarded to the
quarterback and receiver for any completed passes, yards achieved on the completion, and any touchdown
scored on the completion. This leads to a very natural structural correlation. By creating a lineup with a
quarterback and receivers on the same team, the variance of the lineup will increase. This is a well known
approach to stacking football lineups (Bales 2015b).
Baseball. In baseball, a pitcher throws the ball to a batter who must hit the ball and then run to four
bases to score runs. There is a fixed set of batters for a team, and usually a single pitcher for the majority
of a game. Fantasy points are given to the batters for hits and runs. The structural correlation in baseball is
created by the pitcher. If a pitcher is weak, then the all the opposing batters will likely score many fantasy
points. Therefore, to create high variance lineups in baseball, batters should be selected from the same team
if the opposing pitcher is expected to be weak. This stacking approach is also well known (Bales 2015a).
ReferencesT. Awad. Numbers on ice: Fixing plus/minus. Hockey Prospectus, apr 2009.
Jonathan Bales. The art of stacking (or not) in daily fantasy baseball. http://playbook.draftkings.com/mlb/the-art-of-
stacking-or-not-in-daily-fantasy-baseball/, June 2015a.
Hunter, Vielma, and Zaman: Winning26
Jonathan Bales. Pairing a QB with his receiver(s). https://rotogrinders.com/articles/pairing-a-qb-with-his-receiver-s-
481544, December 2015b.
David Beaudoin and Tim B Swartz. Strategies for pulling the goalie in hockey. The American Statistician, 64(3),
2010.
A. Becker and A. Sun. An analytical approach for fantasy football draft and lineup management. Journal for Quanti-
tative Analysis in Sports, 2014.
Jeff Bezanson, Stefan Karpinski, Viral Shah, and Alan Edelman. Julia: A fast dynamic language for technical com-
puting. CoRR, abs/1209.5145, 2012.
New York Times Editorial Board. Rein in online fantasy sports gambling.
http://www.nytimes.com/2015/10/05/opinion/rein-in-online-fantasy-sports-gambling.html, October 2015.
COIN-OR. CBC: COIN-OR Branch and Cut. https://projects.coin-or.org/Cbc.
DailyFantasyNerd. Daily fantasy nerd. http://dailyfantasynerd.com, December 2015.
DraftKings. Draftkings. http://draftkings.com, December 2015.
Drewby. Rackem stackem: Dfs hockey. http://dailyroto.com/nhl-dfs-stacking-strategy-draftkings-fanduel, October
2015.
Iain Dunning, Joey Huchette, and Miles Lubin. JuMP: A modeling language for mathematical optimization.
arXiv:1508.01982 [math.OC], 2015.
GNU. GLPK: GNU Linear Programming Kit. https://www.gnu.org/software/glpk/.
Robert B Gramacy, Shane T Jensen, and Matt Taddy. Estimating player contribution in hockey with regularized logistic
regression. Journal of Quantitative Analysis in Sports, 9(1):97–111, 2013.
Gurobi Optimization. The Gurobi Optimizer. http://www.gurobi.com.
Drew Harwell. Why you (probably) won’t win money playing Draftkings, FanDuel.
http://www.dailyherald.com/article/20151012/business/151019683/, October 2015.
Darren Heitner. Draftkings reports $304 million of entry fees in 2014.
http://www.forbes.com/sites/darrenheitner/2015/01/22/draftkings-reports-304-million-of-entry-fees-in-2014/,
January 2015.
Miles Lubin and Iain Dunning. Computing in operations research using Julia. INFORMS Journal on Computing, 27
(2):238–248, Spring 2015.
Brian Macdonald. An improved adjusted plus-minus statistic for nhl players. In Proceedings of the MIT Sloan Sports
Analytics Conference, URL http://www. sloansportsconference. com, 2011.
Stephen Pettigrew. Assessing the offensive productivity of nhl players using in-game win probabilities. 2015.
Rotogrinders. Rotogrinders. https://rotogrinders.com/, December 2015.
Hunter, Vielma, and Zaman: Winning27
M Schuckers. Digr: A defense independent rating of nhl goaltenders using spatially smoothed save percentage maps.
In Proceedings of the MIT Sloan Sports Analytics Conference, 2011.
Michael Schuckers and James Curro. Total hockey rating (thor): A comprehensive statistical rating of national hockey
league forwards and defensemen based upon all on-ice events. In Proceedings of the 2013 MIT Sloan Sports
Analytics Conference, http://www. statsportsconsulting. com/thor, 2013.
Michael E Schuckers, Dennis F Lock, Chris Wells, CJ Knickerbocker, and Robin H Lock. National hockey league
skater ratings based upon all on-ice events: An adjusted minus/plus probability (ampp) approach. Unpub-
lished manuscript. Available at http://myslu. stlawu. edu/˜ msch/sports/LockSchuckersProbPlusMinus113010.
pdf, 2011.
Daniel Steinberg. How daily fantasy football pricing algorithms work. https://dailyfantasywinners.com/fantasy-
categories/featured/daily-fantasy-football-pricing-algorithms-work/, 2015.
AC Thomas, Samuel L Ventura, Shane T Jensen, Stephen Ma, et al. Competing process hazard function models for
player ratings in ice hockey. The Annals of Applied Statistics, 7(3):1497–1524, 2013.