
An Introduction to

Game-theoretic Online Learning

Tim van Erven

General Mathematics Colloquium, Leiden, December 4, 2014

[Diagram, built up over several slides: Statistics splits into Bayesian Statistics and Frequentist Statistics; off to one side sit the "rogue factions", including Adversarial Bandits and Game-theoretic Online Learning, which is marked "You are here".]

Outline

● Introduction to Online Learning

– Game-theoretic Model

– Regression example: Electricity

– Classification example: Spam

● Three algorithms

● Tuning the Learning Rate

Game-Theoretic Online Learning

● Predict data that arrive one by one

● Model: a repeated game against an adversary

● Applications:

– spam detection

– data compression

– online convex optimization

– predicting electricity consumption

– predicting air pollution levels

– ...

Repeated Game (Informally)

● Sequentially predict outcomes $y_1, y_2, \ldots, y_T$

● Measure the quality of prediction $p_t$ by a loss $\ell(p_t, y_t)$

● Before predicting $y_t$, get predictions (= advice) $f_t^1, \ldots, f_t^K$ from $K$ experts

● Goal: predict as well as the best expert over $T$ rounds

● Data and advice can be adversarial

Repeated Game

● Every round $t = 1, \ldots, T$:

1. Get expert predictions $f_t^1, \ldots, f_t^K$

2. Predict $p_t$

3. Outcome $y_t$ is revealed

4. Measure losses $\ell(p_t, y_t)$ and $\ell(f_t^1, y_t), \ldots, \ell(f_t^K, y_t)$

● Best expert: $k^* = \arg\min_k \sum_{t=1}^T \ell(f_t^k, y_t)$

● Goal: minimize the regret $R_T = \sum_{t=1}^T \ell(p_t, y_t) - \sum_{t=1}^T \ell(f_t^{k^*}, y_t)$ (a code sketch of this protocol follows below)
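To make the protocol concrete, here is a minimal sketch in Python. It is illustrative only: the `learner` object, its `predict`/`update` methods and the `loss` argument are naming conventions introduced here, not part of the talk.

```python
import numpy as np

def play_game(learner, expert_preds, outcomes, loss):
    """Run the repeated game and return the learner's regret.

    expert_preds: array of shape (T, K); row t holds the K expert predictions f_t^k
    outcomes:     array of shape (T,); the outcomes y_1, ..., y_T
    loss:         function loss(prediction, outcome) -> float
    """
    T, K = expert_preds.shape
    learner_loss = np.zeros(T)
    expert_loss = np.zeros((T, K))
    for t in range(T):
        p_t = learner.predict(expert_preds[t])        # steps 1-2: get advice, predict
        y_t = outcomes[t]                             # step 3: outcome is revealed
        learner_loss[t] = loss(p_t, y_t)              # step 4: measure losses
        expert_loss[t] = [loss(f, y_t) for f in expert_preds[t]]
        learner.update(expert_loss[t])                # show the learner all experts' losses
    return learner_loss.sum() - expert_loss.sum(axis=0).min()  # regret against the best expert
```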

Outline

● Introduction to Online Learning

– Game-theoretic Model

– Regression example: Electricity

– Classification example: Spam

● Three algorithms

● Tuning the Learning Rate

Regression Example: Electricity

● Électricité de France: predict electricity demand one day ahead, every day

● Experts: complicated regression models

● Loss: squared error, $\ell(p_t, y_t) = (p_t - y_t)^2$

● Best model after one year:

● How much worse are we?

[Devaine, Gaillard, Goude, Stoltz, 2013]

Outline

● Introduction to Online Learning

– Game-theoretic Model

– Regression example: Electricity

– Classification example: Spam

● Three algorithms

● Tuning the Learning Rate

Example: Spam Detection

[Figure: a stream of incoming e-mail messages, each labelled spam or ham]

Classification Example: Spam

● Experts: spam detection algorithms

● Messages: $y_t \in \{\text{spam}, \text{ham}\}$

● Predictions: $p_t \in \{\text{spam}, \text{ham}\}$

● Loss: a mistake costs 1, $\ell(p_t, y_t) = \mathbf{1}\{p_t \neq y_t\}$

● Regret: the number of extra mistakes we make over the best algorithm on $T$ messages

Outline

● Introduction to Online Learning

● Three algorithms:

1. Halving

2. Follow the Leader (FTL)

3. Follow the Regularized Leader (FTRL)

● Tuning the Learning Rate

A First Algorithm: Halving

● Suppose one of the spam detectors is perfect

● Keep track of the set of experts without mistakes so far: $V_t = \{k : \text{expert } k \text{ made no mistakes on } y_1, \ldots, y_{t-1}\}$

● Halving algorithm: predict with the majority vote of the experts in $V_t$

● Theorem: regret $\leq \log_2 K$

A First Algorithm: Halving

Theorem: regret $\leq \log_2 K$

● Does not grow with $T$

Proof:

● Suppose Halving makes $M$ mistakes; since the perfect expert makes none, regret $= M$

● Every mistake by the majority vote eliminates at least half of $V_t$

● $V_t$ always contains the perfect expert, so $M$ is at most $\log_2 K$ mistakes
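A minimal sketch of the Halving algorithm for 0/1 predictions, under the perfect-expert assumption above; the class and method names are chosen here to be compatible with the protocol sketch earlier, they are not from the talk.

```python
class Halving:
    """Majority vote over the experts that have made no mistakes so far."""

    def __init__(self, K):
        self.consistent = set(range(K))      # V_t: experts without mistakes so far

    def predict(self, advice):
        # advice[k] in {0, 1}; follow the majority vote of the consistent experts
        votes = sum(advice[k] for k in self.consistent)
        return int(2 * votes >= len(self.consistent))

    def update(self, losses):
        # losses[k] is 1 if expert k just made a mistake; every mistake of the
        # majority vote removes at least half of the set, and with a perfect
        # expert the set never becomes empty
        self.consistent = {k for k in self.consistent if losses[k] == 0}
```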

Outline

● Introduction to Online Learning

● Three algorithms:

1. Halving

2. Follow the Leader (FTL)

3. Follow the Regularized Leader (FTRL)

● Tuning the Learning Rate

Follow the Leader

● Want to remove the unrealistic assumption that one expert is perfect

● FTL: copy the leader's prediction (sketched in code below)

● The leader at time $t$: $\hat{k}_t = \arg\min_k L_{t-1}^k$,

where $L_{t-1}^k = \sum_{s=1}^{t-1} \ell(f_s^k, y_s)$ is the cumulative loss of expert $k$

(break ties randomly)
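A minimal sketch of FTL with random tie-breaking, compatible with the protocol sketch above (the interface is introduced here, not taken from the talk):

```python
import numpy as np

class FollowTheLeader:
    """Copy the prediction of an expert with the smallest cumulative loss."""

    def __init__(self, K, seed=0):
        self.cum_loss = np.zeros(K)              # L_{t-1}^k for every expert k
        self.rng = np.random.default_rng(seed)

    def predict(self, advice):
        leaders = np.flatnonzero(self.cum_loss == self.cum_loss.min())
        return advice[self.rng.choice(leaders)]  # break ties randomly

    def update(self, losses):
        self.cum_loss += losses                  # losses of all K experts this round
```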

FTL Works with a Perfect Expert

Theorem: Suppose one of the spam detectors is perfect. Then the expected regret is at most $\ln K$.

Proof:

● Expected regret = E[nr. of mistakes] $-$ 0

● Worst case: the imperfect experts each pick up one loss in turn

● When $j$ experts are still tied for the lead, FTL follows the one about to err with probability $1/j$, so E[nr. of mistakes] $\leq \frac{1}{K} + \frac{1}{K-1} + \cdots + \frac{1}{2} \leq \ln K$

FTL: More Good News

● No assumption of a perfect expert

Theorem: regret $\leq$ the number of leader changes

● Proof sketch:

– No leader change: our loss = loss of the leader, so the regret stays the same

– Leader change: our regret increases by at most 1 (the range of the losses)

● Works well for i.i.d. losses, because the leader changes only finitely many times w.h.p.

FTL on IID Losses

● 4 experts with Bernoulli(0.1, 0.2, 0.3, 0.4) losses; regret = O(log K)
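An experiment in this spirit is easy to reproduce. The sketch below (written for this document, not the talk's code) follows a random current leader on i.i.d. Bernoulli losses; the printed regret stays small once the leader stops changing.

```python
import numpy as np

rng = np.random.default_rng(0)
T, means = 10_000, np.array([0.1, 0.2, 0.3, 0.4])
losses = rng.binomial(1, means, size=(T, 4)).astype(float)   # i.i.d. Bernoulli losses

cum, ftl_loss = np.zeros(4), 0.0
for t in range(T):
    leaders = np.flatnonzero(cum == cum.min())
    ftl_loss += losses[t, rng.choice(leaders)]   # FTL suffers its leader's loss
    cum += losses[t]

print(ftl_loss - cum.min())   # small: the leader settles on the best expert
```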

FTL Worst-case Losses

● Two experts with a tie/leader change every round:

Round     1    2    3    4    5    6   ...
Expert 1  1    0    1    0    1    0
Expert 2  0    1    0    1    0    1
FTL       1/2  1    1/2  1    1/2  1

● Both experts have cumulative loss $\approx T/2$

● FTL's expected cumulative loss $\approx 3T/4$, so the regret is linear in $T$

● Problem: FTL is too sure of itself when there are no ties!
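Running the same loop as in the i.i.d. sketch on these alternating losses shows the regret growing like $T/4$ (again an illustrative sketch, not code from the talk):

```python
import numpy as np

T = 1000
losses = np.zeros((T, 2))
losses[0::2, 0] = 1            # expert 1 loses on rounds 1, 3, 5, ...
losses[1::2, 1] = 1            # expert 2 loses on rounds 2, 4, 6, ...

rng = np.random.default_rng(0)
cum, ftl_loss = np.zeros(2), 0.0
for t in range(T):
    leaders = np.flatnonzero(cum == cum.min())
    ftl_loss += losses[t, rng.choice(leaders)]   # follow a (random) current leader
    cum += losses[t]

print(ftl_loss - cum.min())    # about T/4: regret linear in T
```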

Outline

● Introduction to Online Learning

● Three algorithms:

1. Halving

2. Follow the Leader (FTL)

3. Follow the Regularized Leader (FTRL)

● Tuning the Learning Rate

Solution: Be Less Sure!

● Pull FTL's choices towards the uniform distribution

● Follow the Leader:

– leader: $\hat{k}_t = \arg\min_k L_{t-1}^k$

– as a distribution: $w_t(k) = 1$ if $k = \hat{k}_t$, and $w_t(k) = 0$ otherwise

● Follow the Regularized Leader (a code sketch follows below):

– add a penalty for being away from uniform; the Kullback-Leibler divergence in this talk:
$w_t = \arg\min_{w} \Big( \sum_k w(k)\, L_{t-1}^k + \tfrac{1}{\eta}\, \mathrm{KL}(w \,\|\, \mathrm{uniform}) \Big)$
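Minimizing this objective over probability distributions has the classical closed form of exponential weights, $w_t(k) \propto \exp(-\eta L_{t-1}^k)$. The sketch below uses that form; the class name and the choice to predict with a weighted average of the advice (natural for the regression example) are assumptions made here, not part of the talk.

```python
import numpy as np

class FTRL:
    """Exponential weights: w_t(k) proportional to exp(-eta * L_{t-1}^k)."""

    def __init__(self, K, eta):
        self.eta = eta
        self.cum_loss = np.zeros(K)                      # L_{t-1}^k for every expert k

    def weights(self):
        shifted = self.cum_loss - self.cum_loss.min()    # shift for numerical stability
        w = np.exp(-self.eta * shifted)
        return w / w.sum()

    def predict(self, advice):
        return float(self.weights() @ np.asarray(advice, dtype=float))

    def update(self, losses):
        self.cum_loss += np.asarray(losses, dtype=float)
```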

The Learning Rate

● Follow the Regularized Leader: $w_t = \arg\min_{w} \Big( \sum_k w(k)\, L_{t-1}^k + \tfrac{1}{\eta}\, \mathrm{KL}(w \,\|\, \mathrm{uniform}) \Big)$

● Very sensitive to the choice of the learning rate $\eta$:

$\eta \to \infty$: Follow the Leader        $\eta \to 0$: Don't learn at all

Outline

● Introduction to Online Learning

● Three algorithms

● Tuning the Learning Rate:

– Safe tuning

– The Price of Robustness

– Learning the Learning Rate

The Worst-case Safe Learning Rate

Theorem: For FTRL regret

regret

● No (probabilistic) assumptions about data!● Optimal● is standard in online learning
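As a concrete instance, the textbook tuning for losses in $[0, 1]$ sets $\eta = \sqrt{8 \ln K / T}$, which gives regret at most $\sqrt{(T/2)\ln K}$ (Cesa-Bianchi and Lugosi, 2006). The exact constant used in the talk is not recoverable from these slides, so treat the value below as the standard choice rather than the talk's.

```python
import numpy as np

def safe_eta(K, T):
    """Worst-case safe learning rate for losses in [0, 1] (standard Hedge tuning)."""
    return np.sqrt(8 * np.log(K) / T)

# reuses the FTRL sketch from above
learner = FTRL(K=10, eta=safe_eta(K=10, T=1000))
```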

Outline

● Introduction to Online Learning

● Three algorithms

● Tuning the Learning Rate:

– Safe tuning

– The Price of Robustness

– Learning the Learning Rate

The Price of Robustness

● Safe tuning does much worse than FTL on i.i.d. losses

The Price of Robustness

Method                            Special case   Perfect        IID data           Worst-case
                                  of FTRL?       expert data                       data
Halving                           no             ≤ log₂ K       undefined          undefined
Follow the Leader                 yes (η → ∞)    ≤ ln K         constant (w.h.p.)  linear in T (very large)
FTRL, worst-case safe tuning      yes            O(√(T ln K))   O(√(T ln K))       O(√(T ln K)) (small)

Can we adapt to the optimal η automatically?

Outline

● Introduction to Online Learning

● Three algorithms

● Tuning the Learning Rate:

– Safe tuning

– The Price of Robustness

– Learning the Learning Rate

A Failed Approach: Being Meta

● We want the smaller of the two regrets: that of FTL and that of FTRL with safe tuning

● Idea: treat this as a meta-problem with two meta-experts:

– Expert 1: FTL

– Expert 2: FTRL with Safe Tuning

● Regret $\leq$ meta-regret $+\ \min(\text{regret of FTL},\ \text{regret of FTRL with safe tuning})$

● Best of both worlds if the meta-regret is small!

● But if the meta-algorithm itself uses safe tuning, its meta-regret is of order $\sqrt{T}$; when the regret of FTL is small (say $O(\ln K)$), that $\sqrt{T}$ term is too big!

Partial Progress

● Safe tuning: regret $= O\big(\sqrt{T \ln K}\big)$

● Improvement for small losses: regret $= O\big(\sqrt{L_T^* \ln K}\big)$, where $L_T^*$ is the cumulative loss of the best expert

● Variance bounds: regret $= O\big(\sqrt{V_T \ln K}\big)$, where $V_T$ is a cumulative variance of the losses

● Cesa-Bianchi, Mansour, Stoltz, 2007

● vE, Grünwald, Koolen, De Rooij, 2011

● De Rooij, vE, Grünwald, Koolen, 2014

Regret is a Difficult Function 1/2

● All of the previous tunings could potentially end up in the wrong local minimum of the regret as a function of $\eta$

Regret is a Difficult Function 2/2

● The regret as a function of $T$ for fixed $\eta$ is non-monotonic

● This means some $\eta$ may look very bad for a while, but end up being very good after all

● How do we see which $\eta$'s are good?

● Use the (best-possible) monotonic lower bound per $\eta$

Learning the Learning Rate

● Koolen, vE, Grünwald, 2014

Learning the Learning Rate

● Track the performance of a grid of learning rates $\eta$

● Switch between them

– Pay for switching, but not too much

● Running time as fast as for a single fixed $\eta$

– does not depend on the size of the grid

Learning the Learning Rate

Theorems:

● For every interesting $\eta$: the regret of LLR is at most the regret of FTRL with that fixed $\eta$, up to a polylog(K, T) factor

● Hence LLR is as good as all previous methods


The Price of Robustness

Method                            Special case   Perfect        IID data           Worst-case                Compared to
                                  of FTRL?       expert data                       data                      optimal
Follow the Leader                 yes (η → ∞)    ≤ ln K         constant (w.h.p.)  linear in T (very large)  no useful guarantees
FTRL, worst-case safe tuning      yes            O(√(T ln K))   O(√(T ln K))       O(√(T ln K)) (small)      no useful guarantees
LLR                               —              —              —                  —                         polylog(K, T) factor

Can we adapt to the optimal η automatically? Yes!

Summary

● Game-theoretic Online Learning

– e.g. electricity forecasting, spam detection

● Three algorithms:

– Halving, Follow the (Regularized) Leader

● Tuning the Learning Rate:

– Safe tuning pays the Price of Robustness

– Learning the learning rate adapts to the optimal learning rate automatically

[The same diagram as in the introduction: Game-theoretic Online Learning among the "rogue factions", next to Bayesian and Frequentist Statistics, still marked "You are here".]

Future Work

● So far: compete with the best expert

● Online Convex Optimization: compete with the best convex combination of experts

● Future work: extend LLR to online convex optimization

References

● Cesa-Bianchi and Lugosi. Prediction, learning, and games. 2006.

● Cesa-Bianchi, Mansour, Stoltz. Improved second-order bounds for prediction with expert advice. Machine Learning, 66(2/3):321–352, 2007.

● Devaine, Gaillard, Goude, Stoltz. Forecasting electricity consumption by aggregating specialized experts. Machine Learning, 90(2):231–260, 2013.

● Van Erven, Grünwald, Koolen and De Rooij. Adaptive Hedge. NIPS, 2011.

● De Rooij, Van Erven, Grünwald, Koolen. Follow the Leader If You Can, Hedge If You Must. Journal of Machine Learning Research, 2014.

● Koolen, Van Erven, Grünwald. Learning the Learning Rate for Prediction with Expert Advice. NIPS, 2014.
