
Page 1

Multiobjective Prediction with Expert Advice

Alexey Chernov

Computer Learning Research Centre and Department of Computer Science
Royal Holloway, University of London

GTP Workshop, June 2010

Alexey Chernov (RHUL) Multiobjective Prediction with Expert Advice GTP Workshop, June 2010 1 / 18

Page 2

Example: Prediction of Sport Match Outcome

V. Vovk, F. Zhdanov. Predictions with Expert Advice for Brier Game. ICML’08

Bookmakers data:
- 4 bookmakers, odds for ∼10000 tennis matches (2 outcomes)
- 8 bookmakers, odds for ∼9000 football matches (3 outcomes)

Odds a_i can be transformed to probabilities Prob[i]:

Prob[i] = (1/a_i) / ∑_j (1/a_j)
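The normalisation above can be sketched in a few lines of Python (the odds values below are made-up examples, not the bookmakers' data from the talk):

```python
def odds_to_probs(odds):
    """Convert bookmaker odds to an implied probability distribution by
    normalising the inverse odds (this also removes the bookmaker's margin)."""
    inv = [1.0 / a for a in odds]
    s = sum(inv)
    return [x / s for x in inv]

# Hypothetical two-outcome tennis odds:
probs = odds_to_probs([1.5, 2.8])
```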

The loss is measured by the square (Brier) loss function.

Learner’s strategy is the Aggregating Algorithm.
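As a rough illustration of the Learner's strategy (a sketch, not the talk's implementation), here is a minimal Aggregating Algorithm for the log loss, where the AA reduces to the Bayesian mixture of the experts' predictions; the function name and data layout are assumptions:

```python
import math

def aa_log_loss(expert_preds, outcomes):
    """expert_preds[t][k] = expert k's probability of outcome 1 at step t.
    Returns the Learner's forecasts (Bayes mixture, i.e. the AA with eta = 1)."""
    K = len(expert_preds[0])
    log_w = [0.0] * K                       # log-weights, start uniform
    learner = []
    for preds, omega in zip(expert_preds, outcomes):
        z = max(log_w)                      # stabilise the exponentiation
        w = [math.exp(lw - z) for lw in log_w]
        s = sum(w)
        p = sum(wk * pk for wk, pk in zip(w, preds)) / s   # mixture forecast
        learner.append(p)
        for k, pk in enumerate(preds):      # weight update: w_k *= e^{-loss_k}
            pk = min(max(pk, 1e-12), 1 - 1e-12)
            log_w[k] -= -math.log(pk if omega == 1 else 1 - pk)
    return learner
```

With one reliable and one unreliable expert, the mixture quickly concentrates on the better one.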


Page 3

Tennis Prediction, Square Loss

[Plot over ∼12000 matches, y-axis −4 to 16; curves: Theoretical bounds, Learner, Bookmakers]

Graph of the negative regret Loss_{E_k}(T) − Loss(T), 4 Experts.
Learner is the AA for the square loss.


Page 4

Tennis Prediction, Log Loss

[Plot over ∼12000 matches, y-axis −5 to 25; curves: Theoretical bounds, Learner, Bookmakers]

Graph of the negative regret Loss_{E_k}(T) − Loss(T), 4 Experts.
Learner is the AA for the log loss (Bayes mixture).


Page 5

Tennis Predictions: “Wrong” Losses

Graphs of the negative regret Loss_{E_k}(T) − Loss(T)

[Two plots over ∼12000 matches; curves: Theoretical bounds, Learner, Bookmakers.
Left: log loss, Learner is the AA for the square loss.
Right: square loss, Learner is the AA for the log loss.]

Learner optimizes for a “wrong” loss function


Page 6

Aggregating Algorithm with Wrong Losses

Fact. For the game with 2 outcomes, one can construct a sequence of predictions of 2 Experts and a sequence of outcomes with the following property: if Learner's predictions are generated by the Aggregating Algorithm for the log loss, then for almost all T

Loss(T) ≥ Loss_{E_1}(T) + T/10,

where Loss(T) and Loss_{E_1}(T) are the square losses of Learner and Expert 1.

A similar statement holds for the Aggregating Algorithm for the square loss evaluated by the log loss.


Page 7

New Settings

Experts: γ^(1)_t, …, γ^(K)_t
Learner: γ_t
Reality: ω_t

Many loss functions:

Loss^(m)_{E_k}(T) = ∑_{t=1}^T λ^(m)(γ^(k)_t, ω_t),    Loss^(m)(T) = ∑_{t=1}^T λ^(m)(γ_t, ω_t)

Expert Evaluator's advice:

Loss^(k)_{E_k}(T) = ∑_{t=1}^T λ^(k)(γ^(k)_t, ω_t),    Loss^(k)(T) = ∑_{t=1}^T λ^(k)(γ_t, ω_t)
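The bookkeeping behind these definitions can be sketched as follows (a hypothetical helper, not from the talk, with the square and log losses standing in for the λ^(m)):

```python
import math

def sq(p, omega):                 # square (Brier) loss, binary outcomes
    return (p - omega) ** 2

def logloss(p, omega):            # log loss, binary outcomes
    return -math.log(p if omega == 1 else 1 - p)

def cumulative_losses(loss_fns, expert_preds, outcomes):
    """table[m][k] = Loss^(m)_{E_k}(T) after T = len(outcomes) rounds."""
    K = len(expert_preds[0])
    table = [[0.0] * K for _ in loss_fns]
    for preds, omega in zip(expert_preds, outcomes):
        for m, lam in enumerate(loss_fns):
            for k in range(K):
                table[m][k] += lam(preds[k], omega)
    return table
```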



Page 9

Bound for New Settings

Theorem. If λ^(k) are η^(k)-mixable proper loss functions, k = 1, …, K, Learner has a strategy (e.g. the Defensive Forecasting algorithm) that guarantees, for all T and for all k, that

Loss^(k)(T) ≤ Loss^(k)_{E_k}(T) + (1/η^(k)) ln K.

Corollary. If λ^(m) are η^(m)-mixable proper loss functions, m = 1, …, M, Learner has a strategy that guarantees, for all T, for all k, and for all m, that

Loss^(m)(T) ≤ Loss^(m)_{E_k}(T) + (1/η^(m)) (ln K + ln M).
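Plugging concrete constants into the Corollary's bound gives a feel for its size (a sanity-check sketch; the mixability constants η = 1 for the log loss and η = 2 for the square loss on binary outcomes are the standard values, assumed here):

```python
import math

def regret_bound(eta, K, M=1):
    """The Corollary's additive regret term (ln K + ln M) / eta."""
    return (math.log(K) + math.log(M)) / eta

# 4 bookmakers, two loss functions tracked simultaneously:
b_log = regret_bound(1.0, K=4, M=2)   # log-loss regret bound, ln 8 ~ 2.08
b_sq  = regret_bound(2.0, K=4, M=2)   # square-loss regret bound, half of that
```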


Page 10

Tennis Predictions, Two Losses

Graphs of the negative regret Loss^(m)_{E_k}(T) − Loss^(m)(T)

[Two plots over ∼12000 matches; curves: Theoretical bounds, Learner, Bookmakers.
Left: square loss. Right: log loss.]

Learner optimizes for both loss functions, using the DF algorithm.


Page 11

Defensive Forecasting Algorithm

∃π ∀ω:  ∑_{k=1}^K p^(k)_{t−1} e^{η(λ(π,ω) − λ(π^(k)_t,ω))} ≤ 1,

where p^(k)_{t−1} = p^(k)_0 e^{η(Loss(t−1) − Loss_{E_k}(t−1))}.

To get this (from Levin's Lemma) we need that λ(π, ω) is continuous and, for all π, π′,

E_π e^{η(λ(π,·) − λ(π′,·))} = ∑_{ω∈Ω} π(ω) e^{η(λ(π,ω) − λ(π′,ω))} ≤ 1
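One step of this idea can be sketched numerically: search a grid of forecasts for one that keeps the weighted exponential sum at most 1 for every outcome. The grid search stands in for the continuity argument of Levin's Lemma; the function name, parameters, and grid size are illustrative assumptions:

```python
import math

def sq(p, omega):
    return (p - omega) ** 2

def df_step(weights, expert_preds, loss=sq, eta=2.0, grid=1001):
    """Find a forecast p minimising the worst-case (over outcomes 0 and 1)
    weighted exponential sum; for a mixable proper loss the minimum is <= 1."""
    best_p, best_val = None, float("inf")
    for i in range(grid):
        p = i / (grid - 1)
        val = max(
            sum(w * math.exp(eta * (loss(p, omega) - loss(pk, omega)))
                for w, pk in zip(weights, expert_preds))
            for omega in (0, 1)
        )
        if val < best_val:
            best_p, best_val = p, val
    return best_p, best_val
```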


Page 12

Defensive Forecasting Algorithm

∃π ∀ω:  ∑_{k=1}^K p^(k)_{t−1} e^{η^(k)(λ^(k)(π,ω) − λ^(k)(π^(k)_t,ω))} ≤ 1,

where p^(k)_{t−1} = p^(k)_0 e^{η^(k)(Loss^(k)(t−1) − Loss^(k)_{E_k}(t−1))}.

To get this (from Levin's Lemma) we need that λ^(k)(π, ω) is continuous and, for all π, π′,

E_π e^{η^(k)(λ^(k)(π,·) − λ^(k)(π′,·))} = ∑_{ω∈Ω} π(ω) e^{η^(k)(λ^(k)(π,ω) − λ^(k)(π′,ω))} ≤ 1


Page 13

The DFA and the AA

λ is continuous and ∀π, π′: E_π e^{η(λ(π,·) − λ(π′,·))} ≤ 1  ⇒  λ is η-mixable

∀π, π′: E_π e^{η(λ(π,·) − λ(π′,·))} ≤ 1  ⇒  λ is proper

λ is η-mixable and ?  ⇒  ∀π, π′: E_π e^{η(λ(π,·) − λ(π′,·))} ≤ 1


Page 14

The DFA and the AA

λ is continuous and ∀π, π′: E_π e^{η(λ(π,·) − λ(π′,·))} ≤ 1  ⇒  λ is η-mixable

∀π, π′: E_π e^{η(λ(π,·) − λ(π′,·))} ≤ 1  ⇒  λ is proper

λ is η-mixable and proper  ⇒  ∀π, π′: E_π e^{η(λ(π,·) − λ(π′,·))} ≤ 1
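The last implication can be spot-checked numerically for the square loss, which is 2-mixable and proper on binary outcomes (the grid resolution below is an arbitrary choice):

```python
import math

def check_square_loss(eta=2.0, grid=101):
    """Worst case over a grid of forecasts p = pi(1) and q = pi'(1) of
    E_pi e^{eta (lambda(pi,.) - lambda(pi',.))} for the square loss.
    The condition holds iff the returned value is at most 1."""
    worst = 0.0
    for i in range(grid):
        p = i / (grid - 1)          # pi(1)
        for j in range(grid):
            q = j / (grid - 1)      # pi'(1)
            e = ((1 - p) * math.exp(eta * (p ** 2 - q ** 2)) +
                 p * math.exp(eta * ((1 - p) ** 2 - (1 - q) ** 2)))
            worst = max(worst, e)
    return worst
```

The supremum 1 is attained at q = p, so the check should come out at (numerically) exactly 1.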


Page 15

Proper Loss Functions

λ is proper if for any π, π′ ∈ P(Ω)

E_π λ(π, ·) ≤ E_π λ(π′, ·)

If ω ∼ π, then E_π λ(π′, ·) is the expected loss for prediction π′.

The expected loss is minimal for the true distribution ⇒ the forecaster is encouraged to give the true probabilities.

The square loss and the log loss are proper.
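Properness of these two losses is easy to check numerically: over a grid of reported probabilities q, the expected loss under a true probability p is smallest at q = p (p = 0.3 and the grid are illustrative choices):

```python
import math

sq = lambda q, w: (q - w) ** 2                                # square loss
log_loss = lambda q, w: -math.log(q if w == 1 else 1 - q)     # log loss

def expected_loss(lam, p, q):
    # expected loss of report q when outcome 1 has true probability p
    return (1 - p) * lam(q, 0) + p * lam(q, 1)

def best_report(lam, p, grid=101):
    qs = [i / (grid - 1) for i in range(1, grid - 1)]  # avoid log(0)
    return min(qs, key=lambda q: expected_loss(lam, p, q))
```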


Page 16

Example: Hellinger Loss

λ_Hellinger(γ, ω) = (1/2) ∑_{j=1}^r (√γ(j) − √I_{ω=j})²

The Hellinger loss is √2-mixable.

The Hellinger loss is not proper
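The failure of properness can be seen numerically: with true probability p = 0.3 of outcome 1, the expected Hellinger loss is minimised by reporting a value different from 0.3 (in fact p²/(p² + (1−p)²); the grid size is an illustrative choice):

```python
import math

def hellinger(g, omega):
    """Hellinger loss over outcomes {0, 1}; g = predicted prob. of outcome 1."""
    return 0.5 * ((math.sqrt(1 - g) - (1 - omega)) ** 2 +
                  (math.sqrt(g) - omega) ** 2)

def best_hellinger_report(p, grid=10001):
    # report minimising the expected Hellinger loss when P(omega = 1) = p
    gs = [i / (grid - 1) for i in range(grid)]
    return min(gs, key=lambda g: (1 - p) * hellinger(g, 0) + p * hellinger(g, 1))
```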


Page 17

Proper Mixable Loss Functions

Each mixable loss function λ(γ, ω) has a proper analogue λ_proper(π, ω) such that

1. ∀π ∃γ ∀ω: λ_proper(π, ω) = λ(γ, ω)
2. ∀π ∀γ: E_π λ_proper(π, ·) ≤ E_π λ(γ, ·)

For the Hellinger loss, the proper analogue is the spherical loss

λ_spherical(π, ω) = 1 − π(ω) / √(∑_{j=1}^r (π(j))²)

λ_spherical(π, ω) = λ_Hellinger(γ, ω)  for  γ(ω) = (π(ω))² / ∑_{j=1}^r (π(j))²
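The substitution can be verified numerically for an arbitrary probability vector (the three-outcome vector below is a made-up example):

```python
import math

def spherical(pi, omega):
    # spherical loss of distribution pi on outcome omega
    return 1 - pi[omega] / math.sqrt(sum(x * x for x in pi))

def hellinger(gamma, omega):
    # Hellinger loss of prediction gamma on outcome omega
    return 0.5 * sum((math.sqrt(g) - (1 if j == omega else 0)) ** 2
                     for j, g in enumerate(gamma))

pi = [0.2, 0.5, 0.3]
s = sum(x * x for x in pi)
gamma = [x * x / s for x in pi]     # the substitution from the slide
```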



Page 19

Example: Mixable and Non-Mixable Losses

Experts 1, …, K predict π^(k) ∈ P({0, 1}).
Experts 1, …, N predict γ^(n) ∈ {0, 1}.
Learner predicts (π, π̃) ∈ P({0, 1}) × P({0, 1}) such that
if π(0) > 1/2 then π̃(0) = 1 and if π(1) > 1/2 then π̃(1) = 1.

There exists a strategy for Learner that guarantees for any k

∑_{t=1}^T λ_square(π_t, ω_t) ≤ ∑_{t=1}^T λ_square(π^(k)_t, ω_t) + ln(K + N)

and for any n

∑_{t=1}^T λ_abs(π̃_t, ω_t) ≤ ∑_{t=1}^T λ_simple(γ^(n)_t, ω_t) + O(√(T ln(K + N) + T ln ln T))


Page 20

Tennis Predictions, Square and Absolute Losses

Graphs of the negative regret Loss^(m)_{E_k}(T) − Loss^(m)(T)

[Two plots over the first 400 matches; curves: DF, experts (theoretical bounds on the left plot only).
Left: square loss. Right: absolute loss.]

Learner optimizes for both loss functions, using the DF algorithm with“mixability” and Hoeffding supermartingales.


Page 21

The Mixed Supermartingale

(1/(K + N)) ∑_{k=1}^K e^{2 ∑_{t=1}^{T−1} ((p_t − ω_t)² − (p^(k)_t − ω_t)²)} × e^{2((p − ω)² − (p^(k)_T − ω)²)}

+ (1/(K + N)) ∑_{n=1}^N ∫_0^{1/e} [dη / (η (ln(1/η))²)] e^{∑_{t=1}^{T−1} (η(|p̃_t − ω_t| − [γ^(n)_t ≠ ω_t]) − η²/2)} × e^{η(|p̃ − ω| − [γ^(n)_T ≠ ω]) − η²/2}

where p_t = π_t(1), p^(k)_t = π^(k)_t(1), p̃_t = π̃_t(1).

[x ≠ y] = 1 if x ≠ y and [x ≠ y] = 0 if x = y.


Page 22

References

V. Vovk, F. Zhdanov. Predictions with Expert Advice for Brier Game. ICML 2008. http://arxiv.org/abs/0710.0485

A. Chernov, Y. Kalnishkan, F. Zhdanov, V. Vovk. Supermartingales in Prediction with Expert Advice. ALT 2008 and TCS. http://arxiv.org/abs/1003.2218

A. Chernov, V. Vovk. Prediction with expert evaluators' advice. ALT 2009. http://arxiv.org/abs/0902.4127

A. Chernov, V. Vovk. Prediction with Advice of Unknown Number of Experts. UAI 2010. http://arxiv.org/abs/1006.0475

http://onlineprediction.net/
