Machine Learning in Football By Andrew Finley


Page 1: Machine  Learning in Football

Machine Learning in Football

By Andrew Finley

Page 2: Machine  Learning in Football

Research Question

Is it possible to predict a football player’s professional performance based on collegiate performance? That is, is it possible to accurately predict some player’s NFL statistic using only their collegiate statistics?

Why – Too many “busts”
How –
  Gather statistics for both NCAA and NFL players
  Use statistics and ML algorithms to train a program
  Use the program to predict unseen examples

Page 3: Machine  Learning in Football

Presentation Outline

Related Works
  Alternate applications of machine learning in sport
My Approach
  Machine Learning - Classification
  Decision Tree Algorithm
Implementation
  Statistics to predict
  Gather and Format Statistics
  Insert into Weka (ML software)
  Build Decision Tree
Results and Analysis
  Cross-validation
  Feature Selection

Page 4: Machine  Learning in Football

Related Works

Mr. NFL/NCAA (Predicts Games)
  Classification using Linear Regression on Team Statistics
FFtoday.com (Predicts Fantasy Football Stats)
  Linear Regression on Fantasy Football Statistics
Draft Tek (Predicts NFL Draft)
  Ranks college players and takes a matrix of team needs at every position
SABRmetrics
  Use statistical analysis to create new baseball statistics
  Example:
    RUNS = (.41) 1B + (.82) 2B + (1.06) 3B + (1.42) HR
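The RUNS formula above is a plain weighted sum, so it can be evaluated directly. The following is a minimal Python sketch; the function name and the hit counts in the usage line are made up for illustration and do not describe a real player.

```python
# Coefficients copied from the RUNS formula on the slide.
RUN_WEIGHTS = {"1B": 0.41, "2B": 0.82, "3B": 1.06, "HR": 1.42}

def estimated_runs(hits):
    """Weighted sum of hit counts; hit types missing from `hits` count as 0."""
    return sum(weight * hits.get(kind, 0) for kind, weight in RUN_WEIGHTS.items())

# Hypothetical counts: 100 singles, 30 doubles, 5 triples, 20 home runs.
example = estimated_runs({"1B": 100, "2B": 30, "3B": 5, "HR": 20})
print(round(example, 1))
```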

Page 5: Machine  Learning in Football

Machine Learning

Type – Supervised Learning (Classification)
  The program is given a set of examples (instances) from which it learns to classify unseen examples
  Each instance is a set of attribute values with a known class
  The goal is to generate a set of rules that will correctly classify new examples
Algorithm: Decision Tree

Page 6: Machine  Learning in Football

Decision Tree

Create a graph (tree) from the training data.
The leaves are the classes, and branches are attribute values.
Goal is to make the smallest tree possible that covers all instances.
Use the tree to make a set of classification rules.
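To make the idea concrete, here is a small ID3-style sketch in Python that builds a tree from nominal attributes and flattens it into classification rules. It is only an illustration: the tree learner Weka applies may differ, and the attribute names and toy data are hypothetical, not taken from the study.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def best_attribute(rows, labels, attributes):
    """Pick the attribute whose split leaves the lowest weighted entropy."""
    def remainder(attr):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[attr], []).append(label)
        return sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return min(attributes, key=remainder)

def build_tree(rows, labels, attributes):
    """Return a class label (leaf) or (attribute, {value: subtree}) (branch)."""
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    attr = best_attribute(rows, labels, attributes)
    remaining = [a for a in attributes if a != attr]
    branches = {}
    for value in set(row[attr] for row in rows):
        subset = [(r, l) for r, l in zip(rows, labels) if r[attr] == value]
        sub_rows, sub_labels = zip(*subset)
        branches[value] = build_tree(list(sub_rows), list(sub_labels), remaining)
    return (attr, branches)

def classify(tree, row, default=None):
    """Follow branches matching the row's attribute values until a leaf."""
    while isinstance(tree, tuple):
        attr, branches = tree
        if row[attr] not in branches:
            return default  # attribute value never seen in training
        tree = branches[row[attr]]
    return tree

def rules(tree, conditions=()):
    """Flatten the tree into (conditions, class) classification rules."""
    if not isinstance(tree, tuple):
        return [(list(conditions), tree)]
    attr, branches = tree
    out = []
    for value, subtree in branches.items():
        out.extend(rules(subtree, conditions + ((attr, value),)))
    return out

# Hypothetical, already-discretized toy data (not from the study).
rows = [
    {"rush_yds": "high", "conference": "major"},
    {"rush_yds": "high", "conference": "minor"},
    {"rush_yds": "low", "conference": "major"},
    {"rush_yds": "low", "conference": "minor"},
]
labels = [True, True, False, False]
tree = build_tree(rows, labels, ["rush_yds", "conference"])
for conditions, predicted in rules(tree):
    print(conditions, "->", predicted)
```

In this toy data the class depends only on rush_yds, so the tree needs a single split; that is the "smallest tree that covers all instances" goal in miniature.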

Page 7: Machine  Learning in Football

My Data

I narrowed my predictions down to just quarterbacks and running backs.
Input (NCAA):
  Individual and team stats from every year of college play, as well as team rankings, strength of schedule, height, and weight
  Combine data not included due to lack of participation
Output (NFL):
  RB: Yards/Carry, Total Rushing Yards, and Rushing TDs for each of the first 3 seasons, plus whether the player is starting after 3 seasons
  QB: Total Passing Yards, Passing TDs, Interceptions, and QB Rating for each of the first 3 seasons, plus whether the player is starting after 3 seasons

Page 8: Machine  Learning in Football

Data Retrieval

Step 1 – Find statistics
  Online: NFL.com, NCAA.org
  Collegio Football: Database Software
Step 2 – Extract data
  Python scripts parsed the necessary statistics off websites
  Statistics from Collegio were exported manually
Step 3 – Convert data into correct format
  Python scripts were used to combine the data into 2 large .csv files, one for RB and one for QB
  Missing data is filled in as accurately as possible
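Step 3 can be sketched roughly as follows. This is not the original script: the function names, the NCAA_/NFL_ prefixes, and the use of "?" for missing cells are assumptions made for illustration. It turns one-row-per-season records into a single wide row per player, numbering each season's columns 1, 2, 3 in chronological order.

```python
import csv
import io

def widen(rows, key, season_field, prefix):
    """Collapse one-row-per-season records into one wide dict per player."""
    wide = {}
    counts = {}
    for row in sorted(rows, key=lambda r: (r[key], int(r[season_field]))):
        n = counts[row[key]] = counts.get(row[key], 0) + 1  # season number
        target = wide.setdefault(row[key], {})
        for name, value in row.items():
            if name != key:
                target[f"{prefix}{name}{n}"] = value
    return wide

def combine(ncaa_rows, nfl_rows, key="Player"):
    """Join NCAA and NFL wide rows; keep only players present in both."""
    ncaa = widen(ncaa_rows, key, "Year", "NCAA_")
    nfl = widen(nfl_rows, key, "Season", "NFL_")
    return [{key: p, **ncaa[p], **nfl[p]} for p in sorted(ncaa) if p in nfl]

def to_csv(records):
    """Write records to CSV text; cells a player lacks are written as '?'."""
    fields = []
    for record in records:
        fields.extend(f for f in record if f not in fields)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, restval="?")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()

# A few of Ronnie Brown's rushing totals, reduced to one statistic per season.
ncaa_rows = [
    {"Player": "Ronnie Brown", "Year": "2003", "Rush Yds": "446"},
    {"Player": "Ronnie Brown", "Year": "2002", "Rush Yds": "1008"},
]
nfl_rows = [{"Player": "Ronnie Brown", "Season": "2005", "RushYds": "907"}]
combined = combine(ncaa_rows, nfl_rows)
print(to_csv(combined))
```

The prefixes are there because some column names (for example G1 and Rec1) would otherwise occur in both the NCAA and the NFL halves of the row.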

Page 9: Machine  Learning in Football

Example

Player: Ronnie Brown    School: Auburn    Height: 6'-1''    Weight: 230

NCAA data (blue on the slide). In the data file each column name carries the college-year suffix 1, 2, or 3 (Year1, Rush Yds1, ...):

Attribute    Year 1    Year 2    Year 3
Year         2002      2003      2004
Pos          RB        RB        RB
Cl           So        Jr        Sr
G            12        6         12
Rush Yds     1008      446       913
Car          175       95        153
Rush TD      13        5         8
Yds/Car      5.76      4.7       5.97
RushYds/G    84        74.3      76.1
Rec Yds      166       80        313
Rec          9         8         34
Rec TD       1         0         1
Yds/Rec      18.4      10        9.2
Rec/G        0         1         2
RecYds/G     13.8      13.3      26.1
PR           0         0         0
PR Yds       0         0         0
PR TD        0         0         0
Yds/PR       0         0         0
PR/G         0         0         0
KR           0         0         0
KR Yds       0         0         0
KR TD        0         0         0
Yds/KR       0         0         0
KR/G         0         0         0
Ret TD       0         0         0
Tot Yds      1174      526       1226
Tot TD       14        5         9
TotYds/G     97.8      87.6      102.2

NFL data (red on the slide). Column names carry the season suffix 1, 2, or 3 (Season1, RushYds1, ...):

Attribute    Season 1          Season 2          Season 3
Season       2005              2006              2007
Team         Miami Dolphins    Miami Dolphins    Miami Dolphins
G            15                13                7
GS           14                12                7
Att          207               241               119
RushYds      907               1008              602
RushAvg      4.4               4.2               5.1
RushLng      65                47                60
RushTD       4                 5                 4
Rec          32                33                39
RecYds       232               276               389
RecAvg       7.3               8.4               10
RecLng       38                24                43
RecTD        1                 0                 1
FUM          4                 4                 0
Lost         4                 2                 0
Starting     TRUE              TRUE              TRUE

Page 10: Machine  Learning in Football

Weka Data Processing

Weka is a collection of machine learning algorithms built in Java. It only accepts .csv files in a particular format.
Preprocessing:
  Apply filters to fix missing stats
  Remove all NFL data except the statistic being predicted
  Classify the desired statistic: if numeric, separate into ranges; if nominal, separate by values
  Specify attributes
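The two numeric preprocessing steps can be approximated outside Weka as below. This is a rough pure-Python analogue, not Weka's own filters: filling gaps with the column mean and cutting the target into equal-width ranges are assumed policies, and both function names are hypothetical.

```python
def fill_missing(values):
    """Replace None with the mean of the known values (one simple policy)."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]

def to_ranges(values, bins):
    """Map each number to an equal-width range label such as '[602, 805)'."""
    low, high = min(values), max(values)
    width = (high - low) / bins or 1  # avoid zero width when all values match
    labels = []
    for v in values:
        # The maximum value is folded into the last range.
        i = min(int((v - low) / width), bins - 1)
        labels.append(f"[{low + i * width:g}, {low + (i + 1) * width:g})")
    return labels

# Ronnie Brown's three NFL rushing totals plus one hypothetical missing value.
rush_yds = fill_missing([907, 1008, None, 602])
classes = to_ranges(rush_yds, bins=2)
print(rush_yds)
print(classes)
```

After this step the predicted statistic is nominal (a range label), which is what a classifier such as a decision tree expects as its class.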

Page 11: Machine  Learning in Football

Building the Tree

Tree is constructed from specified attributes.
Weka converts tree to classification rules.
Accuracy is measured using cross-validation.
Cross-validation: Break the training data into a specified number of sets, use each set once as the test data, while the rest is used as training data.
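The cross-validation procedure described above can be sketched in a few lines of Python. The fold assignment (every k-th example) and the stand-in majority-class learner are assumptions chosen to keep the example short; Weka's own splitting may differ, and the labels are hypothetical.

```python
from collections import Counter

def k_fold_accuracy(rows, labels, k, train, predict):
    """Fraction classified correctly when each example is tested exactly once."""
    correct = 0
    for fold in range(k):
        test_idx = set(range(fold, len(rows), k))  # every k-th example
        train_rows = [r for i, r in enumerate(rows) if i not in test_idx]
        train_labels = [l for i, l in enumerate(labels) if i not in test_idx]
        model = train(train_rows, train_labels)
        correct += sum(predict(model, rows[i]) == labels[i] for i in test_idx)
    return correct / len(rows)

# Stand-in learner: always predict the most common training class.
def train_majority(rows, labels):
    return Counter(labels).most_common(1)[0][0]

def predict_majority(model, row):
    return model

# Hypothetical labels: 7 players not starting, 3 starting.
rows = list(range(10))
labels = [False] * 7 + [True] * 3
accuracy = k_fold_accuracy(rows, labels, 5, train_majority, predict_majority)
print(accuracy)
```

Any learner can be plugged in through the train and predict arguments, so the same loop would serve for comparing a decision tree against other algorithms.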

Page 12: Machine  Learning in Football

Initial Results

Initial runs using all attributes failed: they produced a 1-layer tree that mapped every example to false for the predicted statistic.
The accuracy varies greatly with slight changes to the attributes used.
Tree size seems to increase as the number of attributes used decreases.

Page 13: Machine  Learning in Football

Analysis

The initial 1-layer tree gave an accuracy of 68%.
This is the worst possible tree, so I should be able to get better accuracy than this.
Attribute selection needs to improve.
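One way to read the 68% figure: a tree with a single leaf predicts the same class for every example, so its accuracy is simply the share of examples that belong to that class. The sketch below shows the arithmetic in Python; the 68/32 split is a hypothetical dataset chosen to reproduce the number, not the actual data.

```python
def single_leaf_accuracy(labels, predicted_class):
    """Accuracy of a tree that predicts `predicted_class` for every example."""
    return sum(label == predicted_class for label in labels) / len(labels)

# Hypothetical class distribution: 68 examples labeled False, 32 labeled True.
labels = [False] * 68 + [True] * 32
baseline = single_leaf_accuracy(labels, False)
print(baseline)
```

Under that reading, any tree built from a better attribute selection is worth keeping only if its cross-validated accuracy clears this baseline.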

Page 14: Machine  Learning in Football

Next

Improve attribute selection to optimize accuracy.
(If time) Implement other algorithms to compare accuracy.

Page 15: Machine  Learning in Football

Questions?