
Forecasting the FIFA World Cup

Combining goal- and result-based team ability parameters

Pieter Robberechts, Jesse Davis

http://people.cs.kuleuven.be/pieter.robberechts

Introduction

A popular research topic since the 1960s

Two popular approaches:

1. Goal-based models

Model the number of goals scored by both teams

2. Result-based models

Model win-draw-loss outcomes directly

Typical approach:

1. Estimate team abilities based on historical match data

2. Use them to predict future match outcomes

Match outcome prediction

Data → Team ratings → Predictions


Data scraped from:

- post-World War II international games from http://eloratings.net

- betting odds from http://betexplorer.com/



Two rating systems were explored:

- ELO ratings (result-based)

- ODM ratings (goal-based)

[Table: example team strengths: 2320, 2237, 2220, 2207, ...]

The ELO rating system

A result-based rating system

Given:

- $R_H$, $R_A$: current home and away team ratings

Then:

- Expected score for the home team: $E_H = \dfrac{1}{1 + 10^{-(R_H - R_A)/400}}$
- Actual score of the home team: $S_H = 1$ if the home team won, $0.5$ for a draw, $0$ if the home team lost
- Updated rating of the home team: $R'_H = R_H + k(S_H - E_H)$
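A minimal Python sketch of the Elo update on this slide may make the roles of the expected and actual score clearer. The function names, the starting ratings of 1500 and the value k = 20 are illustrative assumptions, not values from the talk.

```python
def expected_score(r_home: float, r_away: float) -> float:
    """Expected score of the home team, a value between 0 and 1."""
    return 1.0 / (1.0 + 10.0 ** (-(r_home - r_away) / 400.0))


def update_rating(r_home: float, r_away: float, s_home: float, k: float) -> float:
    """Return the home team's rating after one match.

    s_home is 1 for a home win, 0.5 for a draw and 0 for a home loss.
    """
    return r_home + k * (s_home - expected_score(r_home, r_away))


# Hypothetical match: equal ratings, home win, k = 20 (illustrative value).
new_rating = update_rating(1500.0, 1500.0, 1.0, k=20.0)
print(new_rating)  # 1510.0
```

With equal ratings the expected score is 0.5, so a win moves the home team up by half of k.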

The ELO rating system

A result-based rating system

Problem:

- Not all games are handled with the same seriousness
- Most games are played against weak opponents

‣ Competitiveness factor
‣ Margin of victory

$R'_H = R_H + k(S_H - E_H)$ with $k = k_0 w_i (1 + \delta)^{\gamma}$

- $k_0$: recentness factor
- $w_i$: competitiveness factor
- $\delta$: margin of victory
- $\gamma$: margin of victory weight
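The match-dependent update factor can be sketched in a few lines of Python. This assumes that δ is the absolute goal difference and that the competitiveness weight is looked up per match type; the parameter names and the numbers below are hypothetical.

```python
def k_factor(k0: float, w: float, goal_diff: int, gamma: float) -> float:
    """Match-dependent update factor k = k0 * w * (1 + delta) ** gamma.

    k0        base factor (assumed to play the role of the recentness factor)
    w         competitiveness weight of the match type (assumption)
    goal_diff goal difference; its absolute value is used as delta (assumption)
    gamma     margin of victory weight
    """
    return k0 * w * (1.0 + abs(goal_diff)) ** gamma


# Hypothetical values: a 3-goal win in a match with weight 1.0.
print(k_factor(k0=20.0, w=1.0, goal_diff=3, gamma=0.5))  # 40.0
```

Setting gamma to 0 switches the margin of victory off, so every match of the same type gets the same k.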

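Offense-Defense ratings

A goal-based rating system

Given:

- $A_{ij}$: score team $j$ generated against team $i$; $A_{ij} = 0$ if the two teams did not play each other

Then:

- Offensive rating of team $j$: $o_j = \sum_{i=1}^{n} \dfrac{A_{ij}}{d_i}$
- Defensive rating of team $i$: $d_i = \sum_{j=1}^{n} \dfrac{A_{ij}}{o_j}$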
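Because the offensive and defensive ratings are defined in terms of each other, they are typically computed by alternating the two updates. The sketch below does this in plain Python on a hypothetical 3-team score matrix; the fixed iteration count and the all-ones initialisation are assumptions, and teams that never scored or never conceded would need extra smoothing that is not shown here.

```python
def odm_ratings(A, iterations=100):
    """Alternate the offense and defense updates of the ODM method.

    A[i][j] is the score team j generated against team i
    (0 if the two teams did not play each other).
    Returns (offense, defense); a lower defense value means a better defense.
    """
    n = len(A)
    offense = [1.0] * n
    defense = [1.0] * n
    for _ in range(iterations):
        # o_j = sum_i A_ij / d_i
        offense = [sum(A[i][j] / defense[i] for i in range(n)) for j in range(n)]
        # d_i = sum_j A_ij / o_j
        defense = [sum(A[i][j] / offense[j] for j in range(n)) for i in range(n)]
    return offense, defense


# Hypothetical scores: row i lists what each team j scored against team i.
scores = [
    [0, 2, 1],
    [1, 0, 2],
    [3, 1, 0],
]
offense, defense = odm_ratings(scores)
print(offense, defense)
```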

Offense-Defense ratings

A goal-based rating system

Problem:

- Large disparities between the number of games played and the strength of the opponents
- Teams in different confederations rarely play each other

Solution:

Update ratings sequentially

For each team:

- Pre-game ratings = weighted sum of a team's post-game ratings
- Post-game ratings = ODM procedure with pre-game ratings as initial ratings

Match outcome prediction

Via team rating systems

Two prediction models were explored:

- Ordered logit regression (result-based)

- Bivariate Poisson regression (goal-based)
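To illustrate how a goal-based model yields win, draw and loss probabilities, the sketch below uses the standard bivariate Poisson construction: home goals X = W1 + W3 and away goals Y = W2 + W3 with independent Poisson variables W1, W2, W3. How the rates would be regressed on the team ratings is not shown; the rate values and the truncation at 15 goals are hypothetical.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of k events under a Poisson distribution with rate lam."""
    return exp(-lam) * lam ** k / factorial(k)


def bivariate_poisson_pmf(x: int, y: int, lam1: float, lam2: float, lam3: float) -> float:
    """P(X = x, Y = y) for X = W1 + W3, Y = W2 + W3 (shared component W3)."""
    return sum(
        poisson_pmf(x - k, lam1) * poisson_pmf(y - k, lam2) * poisson_pmf(k, lam3)
        for k in range(min(x, y) + 1)
    )


def outcome_probabilities(lam1: float, lam2: float, lam3: float, max_goals: int = 15):
    """Return (home win, draw, away win), summed over a truncated score grid."""
    home = draw = away = 0.0
    for x in range(max_goals + 1):
        for y in range(max_goals + 1):
            p = bivariate_poisson_pmf(x, y, lam1, lam2, lam3)
            if x > y:
                home += p
            elif x == y:
                draw += p
            else:
                away += p
    total = home + draw + away  # renormalise the truncated grid
    return home / total, draw / total, away / total


# Hypothetical rates: the home side is expected to score more.
print(outcome_probabilities(lam1=1.6, lam2=1.0, lam3=0.1))
```

The shared component W3 introduces a positive correlation between the two scores, which mainly shifts probability mass towards draws.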

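[Figure: the Elo, attack and defense ratings of both teams are fed to the predictor, which outputs outcome probabilities, e.g. [ 0.43 0.33 0.24 ] for "Belgium wins", "It's a tie" and "England wins". Home advantage?]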
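As an illustration of the result-based predictor, an ordered logit model turns a single latent strength score into three ordered outcome probabilities by means of two cut points. The sketch below assumes the score is some linear function of the rating difference; the cut point values and function names are hypothetical.

```python
from math import exp


def sigmoid(z: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + exp(-z))


def ordered_logit_probs(score: float, cut_low: float, cut_high: float):
    """Return (home win, draw, away win) for a latent score favouring the home team.

    Requires cut_low < cut_high. A higher score shifts mass towards a home win.
    """
    p_away = sigmoid(cut_low - score)           # P(outcome <= away win)
    p_away_or_draw = sigmoid(cut_high - score)  # P(outcome <= draw)
    return 1.0 - p_away_or_draw, p_away_or_draw - p_away, p_away


# Hypothetical inputs: a latent score of 0.4 in favour of the home team.
print(ordered_logit_probs(score=0.4, cut_low=-0.5, cut_high=0.5))
```

Because the draw always sits between the two cut points, the model respects the natural ordering loss < draw < win.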

Tuning the predictive power

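$\mathrm{RPS} = \dfrac{1}{r - 1} \sum_{k=1}^{r-1} \left( \sum_{l=1}^{k} (\hat{p}_l - y_l) \right)^2$

with $r$ the number of possible outcomes, $\hat{p}_l$ the predicted probability of outcome $l$, and $y_l = 1$ if outcome $l$ occurred and $0$ otherwise.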

How accurate are our predictions?

3 possible interpretations:

1. How many games are predicted correctly? → Accuracy
2. How certain was the model about the true outcome? → Logarithmic loss
3. How certain was the model about the true ordered outcome? → Ranked Probability Score (RPS)
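The three measures can be sketched for a single prediction as follows. Outcomes are encoded as an index into the ordered list (home win, draw, away win); the example probabilities are the ones shown earlier in the deck, while the assumption that the home team actually won is made up for illustration.

```python
from math import log


def is_correct(probs, outcome: int) -> bool:
    """Accuracy contribution: was the most likely outcome the true one?"""
    return max(range(len(probs)), key=lambda i: probs[i]) == outcome


def log_loss(probs, outcome: int) -> float:
    """Negative log of the probability assigned to the true outcome."""
    return -log(probs[outcome])


def rps(probs, outcome: int) -> float:
    """Ranked Probability Score: squared differences of cumulative distributions."""
    cum_pred = cum_true = total = 0.0
    for l in range(len(probs) - 1):
        cum_pred += probs[l]
        cum_true += 1.0 if l == outcome else 0.0
        total += (cum_pred - cum_true) ** 2
    return total / (len(probs) - 1)


probs = [0.43, 0.33, 0.24]  # home win, draw, away win
outcome = 0                 # hypothetical result: home win
print(is_correct(probs, outcome), log_loss(probs, outcome), rps(probs, outcome))
# RPS is about 0.191 for this example
```

Unlike the logarithmic loss, the RPS also rewards probability placed on outcomes close to the true one: predicting a draw when the home team wins is penalised less than predicting an away win.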

Tuning the predictive power

[Diagram: Dataset split into Training set, Validation set and Test set; apply the best model to the test set]

Minimise RPS with the L-BFGS algorithm:

Until convergence:
    For each game ∈ Training set:
        update_rating(game)
        If game ∈ Validation set:
            make_prediction(game)
        End if
    End for
    Compute average RPS
    Update rating and prediction model parameters
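The tuning loop can be illustrated with a deliberately simplified Python sketch. It swaps L-BFGS for a plain grid search over a single parameter (the Elo factor k), uses a toy ordered-logit predictor with fixed cut points, and runs on made-up games, so everything below is an assumption rather than the authors' setup. Predictions are made from pre-match ratings, and the away team receives the mirrored update.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def predict(r_home, r_away, scale=400.0, cut=0.5):
    """Hypothetical predictor: returns [home win, draw, away win]."""
    z = (r_home - r_away) / scale
    p_away = sigmoid(-cut - z)
    p_not_home = sigmoid(cut - z)
    return [1.0 - p_not_home, p_not_home - p_away, p_away]


def rps(probs, outcome):
    cum_pred = cum_true = total = 0.0
    for l in range(len(probs) - 1):
        cum_pred += probs[l]
        cum_true += 1.0 if l == outcome else 0.0
        total += (cum_pred - cum_true) ** 2
    return total / (len(probs) - 1)


def average_rps(k, games, validation_start):
    """Replay all games in order; score predictions from validation_start on."""
    ratings, scores = {}, []
    for idx, (home, away, outcome) in enumerate(games):
        r_h, r_a = ratings.get(home, 1500.0), ratings.get(away, 1500.0)
        if idx >= validation_start:
            scores.append(rps(predict(r_h, r_a), outcome))
        s_home = (1.0, 0.5, 0.0)[outcome]  # 0 = home win, 1 = draw, 2 = away win
        e_home = 1.0 / (1.0 + 10.0 ** (-(r_h - r_a) / 400.0))
        ratings[home] = r_h + k * (s_home - e_home)
        ratings[away] = r_a - k * (s_home - e_home)  # mirrored update (assumption)
    return sum(scores) / len(scores)


# Made-up games: (home team, away team, outcome index).
games = [("A", "B", 0), ("B", "C", 1), ("C", "A", 2), ("A", "B", 0),
         ("B", "C", 0), ("A", "C", 0), ("B", "A", 2), ("C", "B", 1)]
candidates = [5.0, 10.0, 20.0, 40.0]
best_k = min(candidates, key=lambda k: average_rps(k, games, validation_start=4))
print(best_k)
```

A gradient-based optimiser such as L-BFGS would replace the grid search when several rating and prediction parameters are tuned jointly.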

Challenge I: Match outcome prediction

The models were validated on the 2002, 2006, 2010 and 2014 World Cups.

[Figure: Accuracy, LogLoss and RPS per World Cup (2002, 2006, 2010, 2014, all) for: ELO ordered logit, ELO bivariate Poisson, ODM ordered logit, ODM bivariate Poisson, ELO+ODM ordered logit, ELO+ODM bivariate Poisson, Random forest, Bookmakers]

Challenge I: Match outcome prediction

And compared with the 2017 Soccer Prediction Challenge submissions.

[Figure: Accuracy and RPS for: Bookmakers, ELO ordered logit, ELO+ODM ordered logit, Berrar et al., Hubáček et al., Constantinou, Tsokos et al.]

[Figure: Accuracy, LogLoss and RPS of Elo and Elo+ODM for the 2002, 2006, 2010 and 2014 World Cups, with FiveThirtyEight included for 2014]

Challenge II: Tournament elimination

How accurately can we predict the round of elimination of each team in previous World Cups?

Our predictions

Others' predictions

[Figure: Accuracy, LogLoss and RPS of the tournament elimination predictions for: FiveThirtyEight, Zeileis et al., Groll et al., UBS, Our model]

Tournament elimination

Interactive online: https://dtai.cs.kuleuven.be/sports/worldcup18/

Thanks! Any questions?

Interactive at: https://dtai.cs.kuleuven.be/sports/worldcup18/
