ACCUPREDICT: A Method for Forecasting NASCAR

AVAILABLE BY SUBSCRIPTION AT FANTASYRACINGCHEATSHEET.COM

This report examines several driver performance measures and develops a method for predicting the
finishing order of NASCAR Sprint Cup races. The author can be contacted at [email protected].

Cliff DeJong
1/9/2012
INTRODUCTION
Several years ago, I entered a friendly NASCAR fantasy league competition with my brother, who has
beaten me in just about everything. But, I’m a nerd and he is not, so I soon started to look at statistics
to improve my picks. It then became an obsession that has consumed untold hours of my time.
There is a lot of randomness in NASCAR. The plot below shows one of the better predictive measures I
have found: each driver’s actual finish in a 2011 race plotted against his average finish over the 18 races
prior to that race, giving 1260 data points across the season. Only finishes of 35th and better are
included. I have also shown the trendline as a summary of these data.
Figure 1
2011 Actual Finish Compared to Average of The Last 18 Races
The spread of the data is amazing, and it is not obvious that this can be useful. Yet there are tendencies
that are valuable since the data are clustered about the trendline. The important fact is that the order
of drivers in a specific race can be predicted in a meaningful way.
This paper will identify several metrics that will be used to forecast NASCAR outcomes. It will also
address how to combine these to get the best forecast. It is not a rigorous scientific paper, but intended
to show the methods used in general terms.
The implementation of this method is available as ACCUPREDICT, on FantasyRacingCheatSheet.com. We
will also provide an estimate of fantasy points for NASCAR.com’s Fantasy Live Game on the website,
using similar methods but not presented here. That estimate was used without alteration to score 22nd
overall in 2011 out of several thousand competitors.
SUMMARY
NASCAR data from 1991 through 2011 are used to develop performance metrics. The key driver
performance measures identified here are
average finish over the last 15 races,
year-to-date driver rating,
finishes at the last eight tracks of the same type,
driver ratings at the same track for the last eight races,
practice and
starting position.
Driver Rating is the NASCAR Loop Driver Rating, a formula that combines wins, finishes, green flag
passes and several other driver performance measures.
In this paper, track types are examined and a regrouping of types is suggested by statistical
considerations. Restrictor plate races are scored by a subset of the key measures listed above.
Driver scores based on the above measures are correlated with the actual finishes for the 2011 season
with a value of 0.554. During the 2011 season, ACCUPREDICT achieved a correlation of 0.538.
DATABASE
Predictions of almost anything are either historically based, assuming the past repeats itself, or based on
first principles of physics, like your daily weather forecast. Predictions based on historical databases are
looking for similarities with the past: if a situation has come up before, what has happened and how
does that apply to this week’s race? In other words, if a driver has done well at a particular track in the
past, does this mean he will do well this weekend? Maybe… you can also consider how well he is doing
this year, and at similar tracks, and how he practiced and qualified.
For NASCAR, there is a rich dataset of past races: I have each race back to 1991 in my database with the
finishing positions of each driver. This database is from the LeonardFrye.com website, which is an
excellent source for NASCAR statistics. There are over 19000 data points. The database is in a
computer-readable form, not scattered over various web sites, so it is relatively easy to process. Plus,
each week in the season, and for past races, there are driver loop data, practice data and qualifying
results, and other data such as bonus points earned, laps led, etc. My primary source for these data is
FantasyRacingCheatSheet.com.
There are some very good expert picks available on the web at no cost and some better ones that cost a
subscription fee, including ACCUPREDICT, which is the result of this analysis. Not all of these expert
picks rank all the drivers; some only give a list of the top 5 or so drivers, and perhaps a dark horse or
two.
Success in the fantasy leagues often depends on how well the low-ranked drivers do. These guys are
necessary picks because of fantasy salary constraints. So, I wanted to be able to rank each driver… not
just get someone’s opinion on who would do well at the next track.
METRICS
A metric is a quantifiable measure of a driver’s performance. Metrics available each week for each
driver include
Performance in the last several races
Performance at the same track
Performance at the same type track
Practice
Qualifying
Expert opinions (cheat sheets)
Performance can be measured in two primary ways: finishing position and Driver Rating. Lots of other
data reflecting performance are also available, for example, laps led, fast laps, green flag passes, quality
passes, etc. These latter metrics are not as easy to process, but they are available on web sites such as
fantasyracingcheatsheet.com, and will be addressed for the 2011 season only.
Cheat sheets are opinions of experts, often based on unspecified statistics, and not used in this analysis.
I have found that other cheat sheets often do not score middle and lower ranked drivers.
DNFs or other major problems during a race can easily move a top ranked driver from a predicted top
five to a finish of 40th. I define a DNF as finishing behind anyone who does not complete the race—that
is a clear indication of a major problem, not just poor performance. Typical DNF rates are 15-20%. Since
DNFs are unpredictable, there is no obvious way to include them. Their effects on finishing position are
in the database that is used.
The process of combining the various metrics is a complex subject that takes serious effort but, as
will be seen, provides little gain beyond simple measures. The metrics are not independent—a driver
who has done well at a particular track has generally done well at the same track types, and he is likely
to practice well and qualify well.
STATISTICAL MEASURES
How do you measure the effectiveness of a metric or combinations of metrics? There are two primary
ways that I use: (1) correlation with the predicted finish and (2) the standard deviation of predicted
finish. Less frequently, I also use the likelihood that a higher-ranked driver will finish ahead of a lower-
ranked driver.
Correlation is a standard statistical measure that essentially plots one variable (the actual finish, for
example) as a function of the other variable (the metric, practice speed, for example), and measures
how well a straight line will fit the data. Correlation ranges between -1 and 1, with the two extremes
indicating a perfect fit. A correlation of zero indicates that the result is independent of the metric. In
other words, a very low correlation indicates the metric is not a useful indicator of a driver’s finishing
position. I will show some plots later to make this a lot clearer. Correlation can also be expressed as a
percentage: -100% to 100%. Typical NASCAR values range from 30 to 50%; that is, there is a lot
of randomness in NASCAR. The data shown in the introduction have a correlation of about 0.50, or 50%.
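As a concrete sketch of this measure, the correlation between a metric and the actual finishes can be computed directly from two paired lists. The numbers below are invented for illustration; in practice the metric column would be, for example, each driver's average finish over recent races.

```python
# A minimal sketch of the correlation measure described above. The
# metric and finish values are hypothetical, not real race data.

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

metric = [3.2, 8.5, 12.0, 15.4, 20.1, 25.3, 30.0]  # hypothetical averages
finish = [1, 10, 8, 18, 22, 21, 33]                # hypothetical finishes
r = pearson(metric, finish)
print(f"correlation: {r:.2f}")  # close to +1: the metric tracks the finish
```

A correlation near zero for the same lists would mean the metric carries essentially no information about the finish.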
A negative correlation means that as the metric gets larger, the actual finish gets smaller. Correlations
for NASCAR finishing positions are positive when past performance is measured by finishing positions,
that is, a small actual finish is expected when the average finish over the last several races is good (or
low). When performance is measured by Driver Rating, correlations are negative since a high driver
rating number implies a better driver and therefore a better predicted finish. In this paper, I deal only
with positive correlations by scaling the metrics—for example, the Driver Rating becomes a simple
ranking of the drivers, with the best driver scored a one, second best a two, etc.
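The rescaling step above can be sketched in a few lines: a Driver Rating (where higher is better) is replaced by a rank, so that, like finishing position, a smaller number always means a better driver. The names and rating values below are hypothetical.

```python
# Sketch of converting Driver Rating (higher = better) into a rank
# (1 = best), so all metrics share the "small = good" convention.
# Driver names and ratings are invented for illustration.

ratings = {"Driver A": 104.2, "Driver B": 92.7, "Driver C": 98.1}

# Sort by rating, best first, and assign ranks starting at 1.
ranked = sorted(ratings, key=ratings.get, reverse=True)
rank = {driver: i + 1 for i, driver in enumerate(ranked)}
print(rank)  # {'Driver A': 1, 'Driver C': 2, 'Driver B': 3}
```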
The standard deviation of the predicted finish is a measure of how accurate the prediction is. In
essence, it is a measure of how much you are wrong on average. Almost 70% of the data are within plus
or minus one standard deviation. It is larger than you might think: typical numbers are 9 to 10,
showing, again, a lot of variability in NASCAR. This is not at all unreasonable if you think about a DNF
rate of about 20%. A driver that finishes 1, 2, 3, 4 and 35 (due to an accident), will average only a 9th
place finish for these five races, despite four outstanding races. The relative average finishing positions
among drivers are the important point.
Drivers will be ranked by a score, based on the metrics selected. The likelihood that a higher ranked
driver finishes ahead of a lower ranked driver is calculated by comparing each driver with every other
driver ranked below him. The percentage of correct rankings is then calculated, and averages about
70%. This percentage is higher if the difference in rankings is high, and less if differences are small. This
measure is not used often, since it is related closely to correlations.
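The pairwise check described above can be sketched as follows: for every pair of drivers, ask whether the better-ranked one actually finished ahead, and report the fraction of pairs answered correctly. The ranks and finishes below are invented for illustration.

```python
# Sketch of the pairwise-likelihood measure: fraction of driver pairs
# where the higher-ranked driver also finished ahead. Data are invented.

def pairwise_accuracy(predicted_rank, actual_finish):
    """Fraction of pairs where the better-ranked driver finished ahead."""
    drivers = list(predicted_rank)
    correct = total = 0
    for i, a in enumerate(drivers):
        for b in drivers[i + 1:]:
            hi, lo = (a, b) if predicted_rank[a] < predicted_rank[b] else (b, a)
            total += 1
            if actual_finish[hi] < actual_finish[lo]:
                correct += 1
    return correct / total

rank   = {"A": 1, "B": 2, "C": 3, "D": 4}
finish = {"A": 2, "B": 1, "C": 5, "D": 30}
print(pairwise_accuracy(rank, finish))  # 5 of 6 pairs correct ≈ 0.83
```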
RECENT PAST PERFORMANCE-ALL RACES
One metric is how well a driver has done in the last several races, counting every track. If you use a
small number of races, you will measure how a particular driver has done lately and be able to react to a
driver on a hot streak, such as Tony Stewart at the end of the 2011 season (or Kyle Busch’s annual
collapse during the Chase). A small number of races will better reflect how a driver has improved with
time as well. On the other hand, using a large number of past races will not be sensitive to one bad race
caused, for example, by an accident, and will be a better estimate of how consistent a driver is.
Using the database from 1991 through 2011, I evaluated the correlation of actual finishing position to
the driver’s average finish over the last N races. If a driver was only in some of the last N races, the
average is over those races he was in. The plot below shows correlations for the entire database, and
with the years 2010 and 2011 separated out.
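The last-N averaging used here can be sketched simply: take the driver's most recent N races, drop any he was not in, and average the rest. The history below is hypothetical, with None marking a missed race.

```python
# Sketch of the last-N average: average the driver's finishes over his
# most recent n races, using only races he actually entered.
# The race history is invented; None marks a race the driver missed.

def last_n_average(finishes, n):
    """Average finish over the most recent n races, skipping None."""
    recent = [f for f in finishes[-n:] if f is not None]
    return sum(recent) / len(recent) if recent else None

history = [4, 12, None, 7, 2, 35, 9]   # oldest race first
print(last_n_average(history, 5))       # averages 7, 2, 35, 9 -> 13.25
```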
Figure 2
Correlation for All Races Averages
The curves shown have similar shapes, and all show that averaging over the last 10 races or more gives
the best results. However, an obvious question is why 2010 and 2011 are so much better than the
entire span of data over the years from 1991 to 2011. I believe that this is due to recent phenomena of
start-and-park, where some drivers with little or no sponsorship will attempt to qualify and then only
run a few laps due to cost issues. Those drivers are almost certain to finish very poorly every race and
are therefore easy to predict. This raises the overall correlation for those races in a misleading manner.
Since the top 35 cars in owner points are locked into each race and do not start-and-park, I repeated the
calculations using only cars that finished in the top 35; this is shown in Figure 3 below. It would be
better to filter directly on drivers in the top 35 in points (those locked into the race and not likely to
start-and-park), but that information is not readily available, and adding it to the database would take
significant effort.
Figure 3
Correlation for All Races (Top 35 Finishers Only)
Again, the curve shapes are similar and correlations for 2011 are somewhat above the long-term
averages, but the differences are much smaller. There is rapid improvement as the number of races
averaged increases to about 12-15, with little or no improvement above that. Trying to read differences
of less than a percent is pushing the data beyond what is reasonable, so I have selected 15 as a
reasonable number to average over all races. I wanted as small a number as possible to preserve any
information about drivers on hot streaks.
RECENT PAST PERFORMANCE-SAME TRACK
Often a driver will excel at a particular track. Denny Hamlin, for example, has always done very well at
Pocono. Using the 1991-2011 data, I calculated correlations of actual finishes for all drivers with his
average finishing position for each track. Again, only the top 35 finishers are counted. Figure 4 shows
the results for three typical tracks: Phoenix, Atlanta and Daytona.
Figure 4
Representative Same Track Averages
Phoenix and Atlanta show typical curve shapes, with the correlations rising as the number of races
averaged gets larger and then flattening out. The curves peak at six or more races. Daytona, on the
other hand, has a very poor correlation, no matter how many races are averaged. There may be a few
drivers that have done well in the past and will do well at Daytona in the future, but in general, at
Daytona, past performance at that track does not imply continued success. Conversely, poor past
performance at Daytona does not necessarily imply another poor finish.
In Figure 5, I have taken each track’s correlation as a function of the number of races and averaged all
the tracks together to give the curve in red. The figure also shows in blue the average correlation for
averages from the most recent N races at any track (from Figure 3 above).
Figure 5
Average of Race Correlations at the Same Track vs All Tracks
The two curves have similar shapes: both start relatively low and then improve as the number of races
averaged increases. The correlation for averages at each individual track starts to level out above five or
six, while the average for the correlations using all tracks climbs more slowly, and peaks at around 15.
Consideration of each individual track’s correlation curve suggests that averaging eight races at the
same track gives very good performance for this measure. The table below in Figure 6 gives each track’s
correlation for performance averaged over the last eight races. I have also included the last four race
averages, since I have frequently used that in the past.
For almost all tracks, averaging over eight races improves correlation over the four-race averages, but
not by much. These correlations are also seen to be lower than the correlations over the last 15 races at
all tracks (see Figure 5). In other words, average driver finishes over the last 15 races at all tracks is a
better indicator of how well he will do at a particular track than his past finishes at the same track. Of
course, both performance measures will be used to estimate driver finishes. Notice also that some
tracks are not correlated well at all to past performance at that track: California, Chicago, Daytona,
Sonoma and Talladega.
Figure 6
Same Track Correlations
One possible explanation for the most recent 15 races at all tracks being a better indicator than the
most recent eight races at a specific track is the time interval covered. The last 15 races at any track
span almost half a season, or about half a year, while the last eight races at one track cover the past four
or eight years of data, depending on whether one or two races are run each season at that track (such as
the once-yearly race at Chicago or the twice-yearly races at Martinsville). I suspect that the reason for needing
to average over several races, even over several years at some tracks, is due to accidents or other
problems, like flat tires, that could skew a driver’s performance downward and distort his performance
unfairly. It is interesting to note that the need to use several races is more important than reflecting a
hot streak over a few races, that is, driver consistency for the long haul is more important.
RECENT PAST PERFORMANCE-SIMILAR TRACKS
One way to decrease the time interval for measurement of a driver’s performance is to look at
performance at similar tracks. For example, Jeff Gordon always does well at flat tracks, such as
Martinsville and Loudon. There are several races each year at flat tracks, so averaging over the past
eight races at flat tracks would only cover races during the past year and therefore would reflect more
recent performance, rather than requiring several years’ performance at an individual track.
Tracks can be grouped in several ways; the most accurate grouping that I have seen so far is
illustrated below in Figure 7. It is based on physical similarities of track length and
corner banking, and was made by Christopher Harris of ESPN in 2007. The ODD tracks are those that do
not fit nicely into the other categories, but have some similarities to others in the same ODD listing.
Figure 7
Traditional Track Groupings
The theory for similar track groupings is that a driver’s performance at all alike tracks will be consistent.
To evaluate this, I looked at the 1991-2011 database and correlated each driver’s finish to his average
performance at previous races at tracks in the same grouping. For example, I looked at Bristol finishes
for each driver against the average finish over the last N races at any steep track. This may include
previous races at Bristol. When these averages for Bristol based on steep tracks are plotted against
averages over all steep tracks, you would expect similar shaped curves if Bristol is properly classified as a
steep track and the supposition of similar performance holds.
Figure 8 shows the results for all eight groupings of tracks. Curves are generally tightly clustered for Flat
Tracks, Steep Tracks and Restrictor Plate Tracks, and have similar shapes for Road Course Tracks. Curve
shapes are somewhat different for Shallow Tracks and Cookie Cutter Tracks. The ODD1 and ODD2
Tracks, plus Road Courses, have curves that are spread out. Kentucky is based on only one result so
little can be determined for that track. Note also that some of the tracks, such as Chicago, Watkins Glen
and the plate tracks are poorly correlated to other tracks in their categories. Most of the curves in
general show their best correlation for an average taken over the past eight races of tracks in the same
category.
Figure 8-Traditional Track Groupings
Some tracks were regrouped because of these results. It was found that including the ODD tracks with
the Cookie Cutter Tracks was advantageous. This revised category was named Large Ovals and its
performance is shown in Figure 9. Again, the average over the last eight races is a good measure of
performance.
Figure 9
Large Oval Track Grouping
This shows a fairly tight clustering of the curves and improved correlations for each of the member
tracks with average finishes over the other tracks in the Large Ovals category. The exception is still
Kentucky, based on only one race, and therefore not a concern. The correlations for these tracks in the
new Large Oval category are shown below in Figure 10 for their previous grouping of track types and the
revised category of Large Oval. Results are for eight races averaged. All tracks in this new category
perform better and Chicago’s correlation is much improved and is now on a par with other tracks in this
category. Michigan and Atlanta are significantly improved as well.
Figure 10
Improvement in Regrouping as Large Oval Tracks
Indianapolis and Pocono, grouped as Shallow Tracks, offered only three races per year between them,
and therefore were regrouped into the Flat Tracks category. Results are given in Figure 11 for eight-race
averages, and
show improvement in all Flat Track correlations. Pocono in particular is better correlated with this new
grouping of Flat Tracks.
Figure 11
Revised Flat Track Category Performance
There are no obvious reclassifications for Road Course or Restrictor Plate Tracks. When the four of them
were grouped together as an excursion, the correlations were improved for Plate Tracks, but the Road
Course results were worse than the original groupings. There does not appear to be any justification for
regrouping these tracks.
Figure 12
Revised Similar Track Categories
Below in Figure 13 are the correlation averages for the revised track types. The curves all increase with
the number of races averaged and plateau near eight races. The Large Oval and Expanded Flat Tracks
have the highest correlations, with Steep Tracks close behind. Road Course and Restrictor Plate Tracks
are relatively poorly correlated.
Figure 13
Averages for Revised Track Categories
PRACTICE
There are at least two practices each race weekend, except in the case of rainouts, and sometimes there
are more. The last practice is called Happy Hour. Happy Hour sometimes is after qualifying, but more
often in recent years, it is before qualifying. When before qualifying, some drivers are making mock
qualifying runs and those are generally faster than practice in race trim. So, performance in Happy Hour
may not be the best comparative measure of a driver’s upcoming performance. In addition, practice
speeds are sometimes measured as the fastest lap a driver has made, and sometimes as the best 10-lap
average speed for those drivers that run at least 10 consecutive laps. TV commentators have often
commented on the value of the 10-lap averages. Average speeds for each practice session are also
available.
Practice is not usually intended to give the driver a rehearsal at a particular track, that is to familiarize
him with the track itself, but serves as a means to dial in the car’s handling characteristics and to
understand how to adjust the car as the track changes. A driver who dominates in practice almost
always does well in the following race.
A comprehensive database of the various practice measures does not seem to be available, so I went
back over the 2011 season and put together various statistics for each race. The correlations of finishing
position to several practice measures are shown below.
Figure 14
Practice Measures Comparison for 2011
The bar labeled Happy Hour is a ranking of the fastest Happy Hour speeds, while Average HH is the
average speed during Happy Hour. The best 10-lap averages in Happy Hour are also shown. Because of the mixture
of race trim and qualifying trim during Happy Hour, I also looked at peak lap speeds in the practice just
prior to Happy Hour. This is often the first practice, which also serves to set the qualifying order and is
therefore important to the driver. As the chart shows, the fastest laps in the next to last practice are the
best measure of practice, and the correlation achieved of 42% is about the same as the other measures
discussed so far in this analysis.
QUALIFYING
Qualifying is important in multiple ways to a driver. It is an obvious measure of how fast a driver can run
a single lap. Pit selection is chosen by a team in the order of the qualifying results, and it may give the
driver an easier (faster) entry and/or exit from his pit stall, and less chance of a pit road mishap. The
qualifying result is also the starting position and this can be very important at tracks where it is difficult
to pass.
The correlation of starting position to finish position is shown in Figure 15 for the last several seasons.
Some drivers are required to start at the end of the field because of an engine change, for example, and
those drivers are treated here by how they qualified.
Figure 15
Correlation of Finishing Position with Starting Position
It is not clear why the correlation has improved with time. Correlations for the latest season (2011) are
again about 40%, which is about the same as the other performance measures examined thus far.
OTHER METRICS
Thus far, we have looked at performance in past races in a number of ways: the past finish positions of
all recent races, races at the same track, and races at similar tracks. A revision of track types from
similar physical characteristics to those with similar statistics was assessed and found to be beneficial.
Practice speeds at the next to last practice are useful, and the starting position (or qualifying results) is
also valuable.
Each season is a new start for drivers and their teams, and may bring a new crew chief or even a new
team for some drivers. I have observed that each year has drivers that seem to do consistently better
(or worse) than expected, based solely on their past performance from previous years. As a
consequence, another measure that I have found to be useful is the current year-to-date standings of
the drivers. I do not use year-to-date statistics until after four races have been run. I have used point
standings and Driver Ratings in past years, and found that driver rankings based on Driver Ratings are a
little better to use. I have not done a formal analysis, but at the end of the 2010 season found that the
Driver Rating was correlated to average finish with a value of 0.93 while the correlation of points (after
taking out the Chase points adjustment) to average finish was 0.76. These are correlations to the final
values for the entire 2010 season and cannot be compared to other correlation measures in this report
that are for single races.
Driver Rating, as defined by NASCAR loop data, combines several measures of driver performance,
including green flag passes, green flag times passed, fast laps, laps led, and more. I collected several of
these measures during the 2011 season for assessment and will show these in the next section.
2011 SEASON ASSESSMENT OF SELECTED METRICS
For the 2011 season, a number of performance measures were collected for each race. The figure
below shows those measures and their correlations with finishing position.
Figure 16
2011 Driver Performance Measures and Correlations
Here are the definitions of each measure:
L18-F: Average of the finishing position of the last 18 races
L4-DR: Ranking of average Driver Rating for the last 4 races
YTD-DR: Ranking of average Driver Rating for the year to date
Not used in the first four races
SType-F: Average finish position for races at the same type track
This uses the traditional track groupings, before revisions above, and averages over 4-12
races for different tracks.
SType-DR: Average Driver Rating for races at the same type track, as above
SType-4F: Average finish position for the last four races at the same type track
STrack-F: Average finish position for races at the same track, over 2-11 races
STrack-DR: Ranking of average Driver Ratings at the same track, over 2-11 races
STrack-Pwr: Ranking of average Driver Ratings at the same track over the past two years
Start: Start Position, defined as qualifying results
Practice: Ranking of fast speeds in the next to the last practice
Bonus Points: Average of bonus points earned
Pass Dif: Average of green flag passes, less green flag times passed, over the last 2-11 races
Laps Led: Number of Laps Led at the same track, averaged over the last 2-11 races
Fast Laps: Number of Fast Laps at the same track, averaged over the last 2-11 races
These metrics were developed as possibly useful in unpublished analyses of past seasons. I offer no
rationale for their selection; they have evolved over time. Some of the measures are much better than
others; the average finish over the last 18 races is the best, with year-to-date driver rating the second
best. Others like the green flag pass differential have little information with relatively poor correlations.
COMBINING METRICS
Given these 15 measures of driver performance, how can they be combined to give the best estimate of
finish position for each driver and race? This is far from an obvious question, because all of these are
heavily correlated with each other; that is, a driver who has finished well in the last 18 races is also placed
highly in the year-to-date Driver Ratings, etc. If two measures are highly correlated, then the second
measure adds little new information to the information in the first measure. An additional complication
is that not all measures are always available for each driver; Trevor Bayne, for example, had no prior
Sprint Cup history at Daytona before the 2011 season opening race.
The desire is to find a simple method for combining selected measures. First, all measures must be
transformed mathematically so that a small number will indicate a likely good finish. The easy way to do
this is to fit the various measures to the average finish and then use the curve fit data to represent the
measure in question.
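This transformation step can be sketched as a simple least-squares fit: each raw measure is regressed against actual finishes, and the fitted line maps the measure onto the common predicted-finish scale. The fit data below are made up for illustration.

```python
# Sketch of the transformation: fit a measure to historical finishes
# with a linear least-squares fit, then use the fitted line to express
# the measure on the finish scale. All numbers are hypothetical.

def linear_fit(xs, ys):
    """Least-squares slope and intercept for y ~ a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# Fit a measure (say, a practice ranking) to historical finishes...
practice_rank = [1, 5, 10, 20, 30]
finish        = [4, 8, 14, 22, 28]
a, b = linear_fit(practice_rank, finish)

# ...then map this week's raw values onto the finish scale.
transformed = [a * x + b for x in practice_rank]
```

After this step every measure is in the same units (expected finishing position), so measures can be averaged directly.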
A score defined as a simple average of all the transformed measures gives a correlation of 0.538 to the
actual finishes. The standard deviation of the estimated finishes based on the simple average for 2011 is
9.45. A large number of perturbations on combinations of the measures were examined, and the best
approach was to average L18-F, YTD-DR, SType-F, STrack-DR, Start and Practice. This gave a correlation for
the score to actual finish of 0.550 and a standard deviation of 9.36 for the estimated finishes.
To determine the best possible fit of the metrics to the finishing positions, a multiple regression was
calculated, using all 15 metrics as inputs. Regression, strictly speaking, is only valid for independent
variables, and these are not independent. Still, in practice, regression can be very useful even here. For
data points without all 15 metrics, the simple averages of the best combinations in the previous
paragraph were used. This gave a correlation of 0.559 and a standard deviation of 9.29. The
disadvantage of using this approach, however, is complexity and the fact that the regression is highly
tuned to the data in the 2011 season. The regression for 2012 data will almost certainly be different.
Plus, the 0.559 is not dramatically better than the 0.550 found by experiment.
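The regression comparison can be sketched as an ordinary least-squares fit of all metrics jointly to finishing position. The data below are randomly generated stand-ins (three toy metric columns, not the real 15), so only the mechanics carry over.

```python
# Sketch of the multiple-regression approach: fit all metric columns
# jointly to finishes by least squares and measure the in-sample
# correlation. Data are synthetic stand-ins, not real race metrics.

import numpy as np

rng = np.random.default_rng(0)
n = 200
metrics = rng.normal(size=(n, 3))            # three toy metric columns
finish = metrics @ np.array([3.0, 2.0, 0.5]) + rng.normal(scale=5.0, size=n)

X = np.column_stack([metrics, np.ones(n)])   # add an intercept column
coefs, *_ = np.linalg.lstsq(X, finish, rcond=None)
predicted = X @ coefs
r = np.corrcoef(predicted, finish)[0, 1]
print(f"in-sample correlation: {r:.2f}")
```

As noted above, such a fit is tuned to the season it was trained on; coefficients refit on a new season's data will differ.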
There are other approaches to maximize the correlation of a combination of correlated variables. For
statistics geeks, I tried Principal Component Analysis and a method in a paper by Keller and Olkin. You
can Google these for more info. The required assumptions are only partially met, and results were a
correlation of 0.550-0.551, not quite as good as the regression results. These have also been tried in
earlier seasons, with similar results, and have the same drawbacks as the regression method.
Another interesting approach that I tried was to look at each driver as ranked for all metrics. If a driver
was ranked ahead of another driver on more of the metrics, then he was ranked higher in the
combination. This was only slightly different from the averages of the metrics, and performance was
slightly worse.
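This metric-by-metric voting idea can be sketched directly: driver A beats driver B if A is ranked ahead of B on a majority of the metrics, and drivers are ordered by their total pairwise wins. The metric ranks below are invented, and ties within a metric are assumed away for simplicity.

```python
# Sketch of the ranking-vote combination: a driver outranks another if
# he is ahead on more of the metrics. Metric ranks are hypothetical;
# ties within a metric are not handled in this sketch.

def vote_ranking(metric_ranks):
    """metric_ranks: {driver: [rank on metric 1, metric 2, ...]}.
    Returns drivers ordered by number of pairwise wins, best first."""
    drivers = list(metric_ranks)
    wins = {d: 0 for d in drivers}
    for i, a in enumerate(drivers):
        for b in drivers[i + 1:]:
            a_ahead = sum(ra < rb for ra, rb in
                          zip(metric_ranks[a], metric_ranks[b]))
            b_ahead = len(metric_ranks[a]) - a_ahead
            wins[a if a_ahead > b_ahead else b] += 1
    return sorted(drivers, key=lambda d: wins[d], reverse=True)

ranks = {"A": [1, 2, 1], "B": [2, 1, 2], "C": [3, 3, 3]}
print(vote_ranking(ranks))  # ['A', 'B', 'C']
```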
For all approaches, the likelihood of a driver finishing ahead of a lower ranked driver was calculated. It
varied by race, but all of the best approaches averaged about 70%.
The approach of a simple average of the metrics was chosen. With this, the performance of the best
combination is very poor for the restrictor plate races, so they were split out. Best combinations for
these were L18-F, YTD-DR and Practice. Correlations for the plate races improved from 0.243 to 0.318,
and finish standard deviation went from 11.5 to 11.2. This is still poor performance.
When this was combined with the non-plate races and their best combinations, the final correlation of
score with finish is 0.554, and the standard deviation is 9.29. Corresponding ACCUPREDICT results were
0.535 and 9.53.
FINAL ACCUPREDICT METHOD FOR 2012
The method proposed is somewhat better than the approach used in 2011. For each race, the top 35
drivers in points are identified, and their finishes in the last 15 races are averaged. Their performance in
year-to-date driver ratings is ranked. Similar track types are identified, using the revised definitions in a
previous section, and finishing position is averaged over the last eight races on those tracks. The
average driver ratings at the last eight races at the same track are ranked. Practice speeds at the next to
last practice are ranked, and the start position is used. These six performance measures, or whatever
exist for each driver, are averaged, and the resulting score gives the expected finishing position by a
simple curve fit to the 2011 data. For restrictor plate races, the average of the last 15 races, year-to-
date Driver Ratings, and the practice rankings are used.
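The final scoring step above can be sketched as follows: average whichever of the six transformed measures exist for each driver, skipping any that are missing (as for a driver with no history at a track), then rank drivers by score. The driver names and values below are hypothetical and assumed already transformed onto the common "small = good" scale.

```python
# Sketch of the final scoring step: average the available transformed
# measures per driver and rank by the result. All values are invented
# and assumed already mapped onto the predicted-finish scale.

def score(measures):
    """Average of the available measures; None marks a missing metric."""
    available = [m for m in measures if m is not None]
    return sum(available) / len(available) if available else None

drivers = {
    # [last-15 avg finish, YTD rating rank, same-type avg finish,
    #  same-track rating rank, practice rank, start position]
    "Veteran": [8.2, 5, 9.1, 4, 7, 6],
    "Rookie":  [18.5, 22, None, None, 12, 15],  # no same-track history
}
scores = {d: score(m) for d, m in drivers.items()}
ranking = sorted(scores, key=scores.get)  # best (lowest) score first
print(ranking)  # ['Veteran', 'Rookie']
```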
It is noted that the 2012 proposed metrics are slightly different from the 2011 season metrics used to
select combinations. The last 15 races are used, and the track groupings into related track types have
been revised. The number of races averaged for 2012 for same type and same track metrics is eight,
rather than variable 2-14 races in 2011. The rationale for this is that the numbers and groupings have
been changed to improve correlations and they are measuring very similar information.
This new approach was applied to five sample races from 2011, one from each track type, and
correlations improved on average from 0.470 to 0.502.
A similar approach has been defined for NASCAR.com’s Fantasy Live to predict fantasy points for each
driver. In 2012, it will be included in the ACCUPREDICT forecasts. The method in 2011 finished 22nd
overall out of several thousand competitors.