Multiobjective Prediction with Expert Advice
Alexey Chernov
Computer Learning Research Centre and Department of Computer Science, Royal Holloway, University of London
GTP Workshop, June 2010
Alexey Chernov (RHUL) Multiobjective Prediction with Expert Advice GTP Workshop, June 2010 1 / 18
Example: Prediction of Sport Match Outcome
V. Vovk, F. Zhdanov. Predictions with Expert Advice for Brier Game. ICML’08
Bookmakers data:
4 bookmakers, odds for ∼ 10000 tennis matches (2 outcomes)
8 bookmakers, odds for ∼ 9000 football matches (3 outcomes)

Odds a_i can be transformed to probabilities Prob[i]:

Prob[i] = (1/a_i) / Σ_j (1/a_j)
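The conversion above can be sketched as follows (the odds values here are made up for the example, not the bookmakers' data):

```python
def odds_to_probs(odds):
    # Prob[i] = (1/a_i) / sum_j (1/a_j): invert the odds, then normalize
    inv = [1.0 / a for a in odds]
    s = sum(inv)
    return [x / s for x in inv]

probs = odds_to_probs([1.5, 2.6])  # hypothetical odds for 2 outcomes
```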
The loss is measured by the square (Brier) loss function.
Learner’s strategy is the Aggregating Algorithm.
Tennis Prediction, Square Loss
[Figure over ∼ 12000 tennis matches, with curves for the theoretical bounds, Learner, and the bookmakers]

Graph of the negative regret Loss_Ek(T) − Loss(T), 4 Experts.
Learner is the AA for the square loss.
Tennis Prediction, Log Loss
[Figure over ∼ 12000 tennis matches, with curves for the theoretical bounds, Learner, and the bookmakers]

Graph of the negative regret Loss_Ek(T) − Loss(T), 4 Experts.
Learner is the AA for the log loss (Bayes mixture).
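For the log loss the AA reduces to the Bayes mixture, which can be sketched as follows (a minimal illustration with invented expert probabilities, not the bookmakers' data):

```python
def bayes_mixture(expert_probs, outcomes):
    # expert_probs[k][t] is expert k's probability of outcome 1 at step t.
    # Weights are proportional to exp(-log loss), i.e. the product of the
    # probabilities each expert assigned to the realised outcomes.
    K = len(expert_probs)
    w = [1.0 / K] * K
    preds = []
    for t, omega in enumerate(outcomes):
        preds.append(sum(w_k * expert_probs[k][t] for k, w_k in enumerate(w)))
        # multiply each weight by the probability the expert gave the outcome
        for k in range(K):
            pk = expert_probs[k][t]
            w[k] *= pk if omega == 1 else (1.0 - pk)
        s = sum(w)
        w = [x / s for x in w]
    return preds
```

With one well-calibrated and one badly calibrated expert, the mixture drifts toward the better one as its weight grows.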
Tennis Predictions: “Wrong” Losses
Graphs of the negative regret Loss_Ek(T) − Loss(T)

[Two figures over ∼ 12000 tennis matches, with curves for the theoretical bounds, Learner, and the bookmakers. Left: log loss, Learner is the AA for the square loss. Right: square loss, Learner is the AA for the log loss.]

Learner optimizes for a “wrong” loss function.
Aggregating Algorithm with Wrong Losses
Fact
For the game with 2 outcomes, one can construct a sequence of predictions of 2 Experts and a sequence of outcomes with the following property. If Learner’s predictions are generated by the Aggregating Algorithm for the log loss, then for almost all T

Loss(T) ≥ Loss_E1(T) + T/10,

where Loss(T) and Loss_E1(T) are the square losses of Learner and Expert 1.

A similar statement holds for the Aggregating Algorithm for the square loss evaluated by the log loss.
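For reference, one step of the AA for the square loss on binary outcomes can be sketched as follows; η = 2 is the mixability constant of the square loss, and γ = (1 + g(0) − g(1))/2 is one standard substitution function (a sketch under these assumptions, not the exact implementation used for the experiments):

```python
import math

def aa_square_step(expert_preds, weights, eta=2.0):
    # weights must sum to 1.  g(omega) is the generalized prediction;
    # the substitution maps it to a single number, clipped to [0, 1].
    def g(omega):
        s = sum(w * math.exp(-eta * (p - omega) ** 2)
                for w, p in zip(weights, expert_preds))
        return -math.log(s) / eta
    gamma = (1.0 + g(0) - g(1)) / 2.0
    return min(1.0, max(0.0, gamma))
```

When all experts agree, the AA prediction coincides with theirs; maximally disagreeing equal-weight experts yield 1/2.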
New Settings
Experts: γ_t^(1), …, γ_t^(K)
Learner: γ_t
Reality: ω_t

Many loss functions

Loss_Ek^(m)(T) = Σ_{t=1}^T λ^(m)(γ_t^(k), ω_t)

Loss^(m)(T) = Σ_{t=1}^T λ^(m)(γ_t, ω_t)

Expert Evaluator’s advice

Loss_Ek^(k)(T) = Σ_{t=1}^T λ^(k)(γ_t^(k), ω_t)

Loss^(k)(T) = Σ_{t=1}^T λ^(k)(γ_t, ω_t)
Bound for New Settings
Theorem
If λ^(k) are η^(k)-mixable proper loss functions, k = 1, …, K, Learner has a strategy (e.g. the Defensive Forecasting algorithm) that guarantees, for all T and for all k, that

Loss^(k)(T) ≤ Loss_Ek^(k)(T) + (1/η^(k)) ln K.

Corollary
If λ^(m) are η^(m)-mixable proper loss functions, m = 1, …, M, Learner has a strategy that guarantees, for all T, for all k and for all m, that

Loss^(m)(T) ≤ Loss_Ek^(m)(T) + (1/η^(m)) (ln K + ln M).
Tennis Predictions, Two Losses
Graphs of the negative regret Loss_Ek^(m)(T) − Loss^(m)(T)

[Two figures over ∼ 12000 tennis matches, with curves for the theoretical bounds, Learner, and the bookmakers. Left: square loss. Right: log loss.]
Learner optimizes for both loss functions, using the DF algorithm.
Defensive Forecasting Algorithm
∃π ∀ω: Σ_{k=1}^K p_{t−1}^(k) e^{η(λ(π,ω) − λ(π_t^(k),ω))} ≤ 1,

where p_{t−1}^(k) = p_0^(k) e^{η(Loss(t−1) − Loss_Ek(t−1))}

To get this (from Levin’s Lemma) we need that λ(π, ω) is continuous and, for all π, π′,

E_π e^{η(λ(π,·) − λ(π′,·))} = Σ_{ω∈Ω} π(ω) e^{η(λ(π,ω) − λ(π′,ω))} ≤ 1
Defensive Forecasting Algorithm
∃π ∀ω: Σ_{k=1}^K p_{t−1}^(k) e^{η^(k)(λ^(k)(π,ω) − λ^(k)(π_t^(k),ω))} ≤ 1,

where p_{t−1}^(k) = p_0^(k) e^{η^(k)(Loss^(k)(t−1) − Loss_Ek^(k)(t−1))}

To get this (from Levin’s Lemma) we need that λ^(k)(π, ω) is continuous and, for all π, π′,

E_π e^{η^(k)(λ^(k)(π,·) − λ^(k)(π′,·))} = Σ_{ω∈Ω} π(ω) e^{η^(k)(λ^(k)(π,ω) − λ^(k)(π′,ω))} ≤ 1
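A minimal sketch of one Defensive Forecasting step, for a single loss (the square loss, η = 2) and binary outcomes: search a grid of candidate predictions for one at which the displayed sum is at most 1 for both outcomes. The grid search stands in for whatever root-finding an actual implementation would use, and the weights here are illustrative:

```python
import math

def df_step(expert_preds, weights, eta=2.0, grid=1001):
    # total(p, omega) is the sum that must be <= 1 for every omega
    def total(p, omega):
        return sum(w * math.exp(eta * ((p - omega) ** 2
                                       - (pk - omega) ** 2))
                   for w, pk in zip(weights, expert_preds))
    best_p, best_val = None, float("inf")
    for i in range(grid):
        p = i / (grid - 1)
        val = max(total(p, 0), total(p, 1))
        if val < best_val:
            best_p, best_val = p, val
    return best_p, best_val  # theory guarantees best_val <= 1 here
```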
The DFA and the AA

λ is continuous and ∀π, π′ E_π e^{η(λ(π,·) − λ(π′,·))} ≤ 1 ⇒ λ is η-mixable

∀π, π′ E_π e^{η(λ(π,·) − λ(π′,·))} ≤ 1 ⇒ λ is proper

λ is η-mixable and proper ⇒ ∀π, π′ E_π e^{η(λ(π,·) − λ(π′,·))} ≤ 1
Proper Loss Functions
λ is proper if for any π, π′ ∈ P(Ω)

E_π λ(π, ·) ≤ E_π λ(π′, ·)

If ω ∼ π then E_π λ(π′, ·) is the expected loss for prediction π′.

The expected loss is minimal for the true distribution
⇒ the forecaster is encouraged to give the true probabilities.

The square loss and the log loss are proper.
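Propriety of the square loss can be checked numerically: over a grid of candidate predictions π′, the expected loss E_π λ(π′, ·) is minimised at π′ = π (a small sanity check, not part of the original slides):

```python
def expected_square_loss(pi, pi_prime):
    # pi, pi_prime are probabilities of outcome 1 (binary outcome space)
    loss0 = pi_prime ** 2          # lambda(pi', 0)
    loss1 = (1.0 - pi_prime) ** 2  # lambda(pi', 1)
    return (1.0 - pi) * loss0 + pi * loss1

pi = 0.3
vals = [expected_square_loss(pi, g / 100) for g in range(101)]
# the minimum over the grid is attained at pi' = pi
```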
Example: Hellinger Loss
λ_Hellinger(γ, ω) = (1/2) Σ_{j=1}^r (√γ(j) − √I_{ω=j})²

The Hellinger loss is √2-mixable

The Hellinger loss is not proper
Proper Mixable Loss Functions
Each mixable loss function λ(γ, ω) has a proper analogue λ_proper(π, ω) such that

1. ∀π ∃γ ∀ω: λ_proper(π, ω) = λ(γ, ω)
2. ∀π ∀γ: E_π λ_proper(π, ·) ≤ E_π λ(γ, ·)

For the Hellinger loss, the proper analogue is the spherical loss

λ_spherical(π, ω) = 1 − π(ω) / √(Σ_{j=1}^r (π(j))²)

λ_spherical(π, ω) = λ_Hellinger(γ, ω) for γ(ω) = (π(ω))² / Σ_{j=1}^r (π(j))²
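The stated identity can be checked numerically (a small sanity check with a made-up distribution, not part of the original slides):

```python
import math

def hellinger(gamma, omega):
    # (1/2) sum_j (sqrt(gamma(j)) - sqrt(I_{omega=j}))^2
    return 0.5 * sum((math.sqrt(g) - (1.0 if j == omega else 0.0)) ** 2
                     for j, g in enumerate(gamma))

def spherical(pi, omega):
    # 1 - pi(omega) / sqrt(sum_j pi(j)^2)
    return 1.0 - pi[omega] / math.sqrt(sum(p * p for p in pi))

pi = [0.2, 0.5, 0.3]
s = sum(p * p for p in pi)
gamma = [p * p / s for p in pi]  # the substitution from the slide
```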
Example: Mixable and Non-Mixable Losses
Experts 1, …, K predict π^(k) ∈ P({0, 1}).
Experts 1, …, N predict γ^(n) ∈ {0, 1}.
Learner predicts (π, π̃) ∈ P({0, 1}) × P({0, 1}) such that
if π(0) > 1/2 then π̃(0) = 1 and if π(1) > 1/2 then π̃(1) = 1.

There exists a strategy for Learner that guarantees for any k

Σ_{t=1}^T λ_square(π_t, ω_t) ≤ Σ_{t=1}^T λ_square(π_t^(k), ω_t) + ln(K + N)

and for any n

Σ_{t=1}^T λ_abs(π̃_t, ω_t) ≤ Σ_{t=1}^T λ_simple(γ_t^(n), ω_t) + O(√(T ln(K + N) + T ln ln T))
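The discretization used for the absolute-loss part (predict an outcome with certainty once its probability exceeds 1/2) can be sketched as follows; the helper names are made up for the illustration:

```python
def discretize(p):
    # p is the probability of outcome 1; ties at 1/2 are left as 1/2
    if p > 0.5:
        return 1.0
    if p < 0.5:
        return 0.0
    return 0.5

def abs_loss(p, omega):
    return abs(p - omega)

def simple_loss(gamma, omega):
    # [gamma != omega]: 1 on a mistake, 0 otherwise
    return 1.0 if gamma != omega else 0.0
```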
Tennis Predictions, Square and Absolute Losses
Graphs of the negative regret Loss_Ek^(m)(T) − Loss^(m)(T)

[Two figures over ∼ 400 matches, with curves for the theoretical bounds, DF, and the experts. Left: square loss. Right: absolute loss.]

Learner optimizes for both loss functions, using the DF algorithm with “mixability” and Hoeffding supermartingales.
The Mixed Supermartingale
(1/(K + N)) Σ_{k=1}^K e^{2 Σ_{t=1}^{T−1} ((p_t − ω_t)² − (p_t^(k) − ω_t)²)} × e^{2((p − ω)² − (p_T^(k) − ω)²)}

+ (1/(K + N)) Σ_{n=1}^N ∫_0^{1/e} (dη / (η (ln(1/η))²)) e^{η Σ_{t=1}^{T−1} (|p̃_t − ω_t| − [γ_t^(n) ≠ ω_t]) − (T−1)η²/2} × e^{η(|p̃ − ω| − [γ_T^(n) ≠ ω]) − η²/2}

where p_t = π_t(1), p_t^(k) = π_t^(k)(1), p̃_t = π̃_t(1).

[x ≠ y] = 1 if x ≠ y and [x ≠ y] = 0 if x = y.
References
V. Vovk, F. Zhdanov. Predictions with Expert Advice for Brier Game. ICML 2008. http://arxiv.org/abs/0710.0485

A. Chernov, Y. Kalnishkan, F. Zhdanov, V. Vovk. Supermartingales in Prediction with Expert Advice. ALT 2008 and TCS. http://arxiv.org/abs/1003.2218

A. Chernov, V. Vovk. Prediction with expert evaluators’ advice. ALT 2009. http://arxiv.org/abs/0902.4127

A. Chernov, V. Vovk. Prediction with Advice of Unknown Number of Experts. UAI 2010. http://arxiv.org/abs/1006.0475
http://onlineprediction.net/