
Sequential Optimization and Computer Experiments
An introduction

Aurélien Garivier
March 23rd, 2016

Institut de Mathématiques de Toulouse, LabeX CIMI, Université Paul Sabatier


Table of contents

1. Sequential Optimizing Without Noise: Some Ideas

2. Kriging: Gaussian Process Regression

3. Optimizing in the presence of noise

4. The Bandit Approach

5. Bandit Algorithms for the Continuous Case


Framework

Given an input x ∈ X, a (complex) code returns

F(x,U) = f(x) + η

where U is an independent U[0, 1] random variable and E[η] = 0. Possibly, η = 0.

Source: freesourcecode.net

Goal: maximize f

using a sequential choice of inputs.

Examples:

• Numerical Dosimetry (foetus exposure to Radio Frequency Electromagnetic Fields) - Jalla et al., Mascotnum 2013

• Traffic Optimization (find the shortest path from A to B)


Sequential Optimizing Without Noise: Some Ideas


Methods not mentioned here

Gradient descent. Here: search for a global optimum, no convexity hypothesis.

Source: www.d.umn.edu/~deoka001/

Simulated annealing: slowly lower the temperature of a hot material, minimizing the system energy.

Source: freesourcecode.net

Genetic Algorithms, Cutting Plane methods, Sum of Squares, ...


The Branch-and-Bound Paradigm [Munos, 2014, de Freitas et al., 2012]

Also used for discrete and combinatorial optimization problems

• Branching = hierarchical partitioning (recursive splitting) of X
• Each cell C has a representative x_C ∈ C
• Assumption: possibility to compute an upper-bound of f on each cell (using the regularity of f)
• Start with one active cell = X and x̂ = x_X
• At each iteration (see the sketch below):
  • Pick an active cell C
  • If f(x_C) > f(x̂), update x̂ := x_C
  • Split C into sub-cells and deactivate C
  • Set all sub-cells with upper-bounds larger than f(x̂) to be active

Source: veendeta.wordpress.com

Source: de Freitas et al. [2012]
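To make the loop above concrete, here is a minimal, purely illustrative Python sketch on a one-dimensional interval; the upper_bound oracle and the Lipschitz-type example are assumptions made here, not part of the talk.

# Minimal branch-and-bound sketch on X = [lo, hi] (1-D, for illustration only).
# `f` is the objective; `upper_bound(lo, hi)` is an assumed oracle returning an
# upper bound of f on the cell [lo, hi] (e.g. from a Lipschitz constant).

def branch_and_bound(f, upper_bound, lo, hi, n_iter=100):
    x_hat = 0.5 * (lo + hi)              # representative of the root cell
    best = f(x_hat)
    active = [(lo, hi)]                  # start with one active cell = X
    for _ in range(n_iter):
        if not active:
            break
        a, b = active.pop()              # pick an active cell
        x_c = 0.5 * (a + b)              # its representative
        if f(x_c) > best:                # improve the current best point
            best, x_hat = f(x_c), x_c
        mid = 0.5 * (a + b)
        for cell in [(a, mid), (mid, b)]:      # split, then deactivate the cell
            if upper_bound(*cell) >= best:     # keep only promising sub-cells
                active.append(cell)
    return x_hat, best


# Example: maximize f(x) = -(x - 0.3)**2 on [0, 1] with a Lipschitz-type bound.
if __name__ == "__main__":
    L = 2.0
    f = lambda x: -(x - 0.3) ** 2
    ub = lambda a, b: f(0.5 * (a + b)) + L * (b - a) / 2
    print(branch_and_bound(f, ub, 0.0, 1.0))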


The SOO algorithm Munos [2011]

SOO = Simultaneous Optimistic Optimization

Requires a multi-scale decomposition of X: for all h ≥ 0,

X = ∪_{i=1}^{N_h} C_{h,i} .

Ex: binary splitting. Source: veendeta.wordpress.com

SOO:
  FOR r = 1..R
    FOR every non-empty depth h
      SPLIT the cell C_{h,i} of depth h with highest f(x_{h,i})

No need to know the (possibly anisotropic) regularity of f!
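As an illustration, the pseudocode above could be instantiated as follows in Python on X = [0, 1] with binary splitting; the v_max test inside each sweep is the refinement used in SOO [Munos, 2011], and all names are illustrative.

# Minimal SOO sketch on X = [0, 1] with binary splitting (illustrative names).
# Leaves are cells (a, b) evaluated at their centre; at each sweep, for every
# non-empty depth, the best leaf is split, subject to the v_max test.
from collections import defaultdict

def soo(f, n_rounds=20):
    leaves = defaultdict(list)                   # depth -> list of (value, a, b)
    root_val = f(0.5)
    leaves[0] = [(root_val, 0.0, 1.0)]
    best_x, best_v = 0.5, root_val
    for _ in range(n_rounds):
        v_max = float("-inf")
        for h in sorted(leaves.keys()):
            if not leaves[h]:
                continue
            v, a, b = max(leaves[h])             # leaf of depth h with highest value
            if v < v_max:
                continue
            v_max = v
            leaves[h].remove((v, a, b))          # split it into two children
            for (u, w) in [(a, (a + b) / 2), ((a + b) / 2, b)]:
                x = 0.5 * (u + w)
                val = f(x)
                leaves[h + 1].append((val, u, w))
                if val > best_v:
                    best_v, best_x = val, x
    return best_x, best_v

# Example: print(soo(lambda x: -(x - 0.3) ** 2))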


SOO: an Example

f(x, y) = (x − c_1)^2 − 0.05 |y − c_2|

[Figure slides showing the SOO evaluations and partition after n = 10, 20, 30, 50, 90, and 120 evaluations.]


Analysis: Near-Optimality Dimension

For every ϵ > 0, let

X_ϵ = {x ∈ X : f(x) ≥ f* − ϵ} .

Definition: Near-Optimality Dimension

The near-optimality dimension of f is the smallest d ≥ 0 such that there exists C > 0 for which, for all ϵ > 0, the maximal number of disjoint balls of radius ϵ with center in X_ϵ is less than C ϵ^{−d}.


Speed of convergence of SOO [Valko et al., 2013]

Theorem: If δ(h) = c γ^h and if the near-optimality dimension of f is d = 0, then

f* − f(x̂_t) = O(γ^t) .

If the near-optimality dimension of f is d > 0, then

f* − f(x̂_t) = O(1 / t^{1/d}) .

Idea of the proof: for every scale h, let

δ(h) = max_i sup_{x, x' ∈ C_{h,i}} f(x) − f(x')   and   I_h = {C_{h,i} : f(x_{h,i}) + δ(h) ≥ f*} .

At every level h, the number of cells split before the one containing x* is at most |I_h| ≤ C δ(h)^{−d}. Thus, after t splits, the algorithm has split a cell containing x* of level at least h*_t such that C ∑_{l=0}^{h*_t} δ(l)^{−d} ≥ t.


Kriging: Gaussian Process Regression


Kriging

Bayesian model: f is drawn from a random distribution.

Gaussian Process: for every t and every x_1, . . . , x_t ∈ X,

( f(x_1), f(x_2), . . . , f(x_t) )ᵀ ∼ N( 0, K_t ) ,   where K_t = ( k(x_i, x_j) )_{1 ≤ i, j ≤ t}

and k : X × X → R is a covariance function.

Possibility to incorporate Gaussian noise: Y⃗t = f⃗t + ϵ⃗t.


Why kriging?

Conditionally on F_t, f is still a Gaussian process:

L(f | F_t) = GP( μ_t : u ↦ k⃗_t(u)ᵀ K_t⁻¹ Y⃗_t ,   k_t : (u, v) ↦ k(u, v) − k⃗_t(u)ᵀ K_t⁻¹ k⃗_t(v) )

where k⃗_t(u) = ( k(u, x_1), k(u, x_2), . . . , k(u, x_t) )ᵀ.

Source: wikipedia
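For concreteness, a minimal numpy sketch of these posterior formulas, assuming a squared-exponential kernel and a small jitter term (both are choices made here, not prescribed by the slides):

import numpy as np

def sq_exp_kernel(u, v, ell=0.2):
    # Squared-exponential covariance k(u, v) (assumed kernel, for illustration).
    return np.exp(-0.5 * ((u[:, None] - v[None, :]) / ell) ** 2)

def gp_posterior(x_train, y_train, x_test, ell=0.2, jitter=1e-10):
    # Posterior mean mu_t(u) = k_t(u)^T K_t^{-1} Y_t and covariance
    # k_t(u, v) = k(u, v) - k_t(u)^T K_t^{-1} k_t(v), as on the slide.
    K = sq_exp_kernel(x_train, x_train, ell) + jitter * np.eye(len(x_train))
    k_star = sq_exp_kernel(x_train, x_test, ell)        # k_t(u) for each test u
    K_inv = np.linalg.inv(K)
    mu = k_star.T @ K_inv @ y_train
    cov = sq_exp_kernel(x_test, x_test, ell) - k_star.T @ K_inv @ k_star
    return mu, cov

# Example usage:
# x = np.array([0.1, 0.4, 0.9]); y = np.sin(6 * x)
# mu, cov = gp_posterior(x, y, np.linspace(0, 1, 50))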


The GP-UCB Algorithm [Srinivas et al., 2012]

• Initialization: space-filling (LHS)
• Iteration t:
  • For every x ∈ X, compute u(x) = quantile of f(x) conditionally on F_{t−1} of level 1 − 1/t
  • Choose X_t = argmax_{x ∈ X} u(x)
  • Observe Y_t = F(X_t, U_t)

Source: de Freitas et al. [2012]

Source: Srinivas et al. [2012]
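A possible numpy sketch of this loop on a finite grid of candidate points; the kernel, the crude initialization, and the noise jitter are assumptions made for the example, and the 1 − 1/t Gaussian quantile is computed with scipy:

import numpy as np
from scipy.stats import norm

def gp_ucb(F, grid, T=50, ell=0.2, noise=1e-6):
    # GP-UCB sketch: pick the point maximizing the (1 - 1/t) posterior quantile.
    kern = lambda u, v: np.exp(-0.5 * ((u[:, None] - v[None, :]) / ell) ** 2)
    X, Y = [grid[len(grid) // 2]], [F(grid[len(grid) // 2])]   # crude initialization
    for t in range(2, T + 1):
        x_tr = np.array(X)
        K = kern(x_tr, x_tr) + noise * np.eye(len(x_tr))
        k_star = kern(x_tr, grid)
        K_inv = np.linalg.inv(K)
        mu = k_star.T @ K_inv @ np.array(Y)
        var = np.clip(1.0 - np.einsum("it,ij,jt->t", k_star, K_inv, k_star), 0, None)
        u = mu + norm.ppf(1 - 1 / t) * np.sqrt(var)   # quantile of level 1 - 1/t
        x_next = grid[np.argmax(u)]                   # X_t = argmax u(x)
        X.append(x_next); Y.append(F(x_next))         # observe Y_t = F(X_t, U_t)
    return X, Y

# Example: grid = np.linspace(0, 1, 200); gp_ucb(lambda x: -(x - 0.3) ** 2, grid)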


GP-UCB: Convergence

Two kinds of results:

• If f is really drawn from the Gaussian process: for the Gaussian kernel the cumulated regret is bounded as

E[ ∑_{t=1}^{T} f* − f(X_t) ] = O( √T (log T)^{(d+1)/2} ) .

• If f has a small norm in the RKHS corresponding to the kernel k (= regularity condition), similar results hold.

Also Expected Improvement (similar idea, slightly different criterion), see Vazquez and Bect [2010].


GP-UCB is not limited to smooth functions: BrownUCB

[Figure: R plot of a BrownUCB run (res$f against res$x), with the sampled points marked.]


Optimizing in the presence of noise


Sequential Strategies

A strategy is a triple:

• Sampling rule: X_t is F_{t−1}-measurable, where

Y_t = F(X_t, U_t) and F_t = σ(X_1, Y_1, . . . , X_t, Y_t) .

• Stopping rule: the number of observations τ is a stopping time w.r.t. (F_t)_t.

• Decision rule: x̂ is F_τ-measurable.


Objectives

At least three relevant goals:

• Cumulated regret: τ = T, maximize E[ ∑_{t=1}^{T} Y_t ]. Ex: clinical study protocol.
• Simple regret: τ = T, minimize f* − E[ f(x̂_T) ].
• PAC analysis: among (ϵ, δ)-Probably Approximately Correct methods satisfying

P( f(x̂) ≥ f* − ϵ ) ≥ 1 − δ ,

minimize the sample complexity E[τ].


The Bandit Approach


Advertisement: Perchet Course

In this workshop, 2 introductory lectures by Vianney Perchet:

• Lecture 1 (Thursday 9:00): the stochastic case (as above)

• Lecture 2 (Friday 9:00): the adversarial case (game-theoretic/robust approach)

Here: optimization point of view.


Simplistic case: X finite

• X = {1, . . . , K}
• f ∈ [0, 1]^K, no structure
• F(x, U) ∼ B(f_x)
• (ϵ, δ)-PAC analysis
• ϵ = 0.

Ex: extreme clinical trials in dictatorship.

Not so simple!


Racing Algorithms: Successive Elimination [Even-Dar et al., 2006, Kaufmann and Kalyanakrishnan, 2013]

• At start, all arms are active;
• Then, repeatedly cycle through the active arms until only one arm is still active;
• At the end of a cycle, eliminate arms with statistical evidence of sub-optimality (see the sketch below): deactivate x if

max_y f̂_y(t) − f̂_x(t) ≥ 2 √( log(K t^2 / δ) / t ) .

Theorem: Successive Elimination is (0, δ)-PAC and, with probability at least 1 − δ,

τ_δ = O( ∑_{x ≠ x*} log( K / (δ ∆_x) ) / ∆_x^2 )

where for all x ∈ {1, . . . , K}, ∆_x = f* − f(x).
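A compact, illustrative Python sketch of this racing scheme for Bernoulli arms, using the confidence width from the slide (the arm simulation and the cycle cap are assumptions made here):

import math, random

def successive_elimination(means, delta=0.05, max_cycles=100000):
    # Racing sketch: cycle through active Bernoulli arms and eliminate those
    # whose empirical mean is provably sub-optimal (slide's confidence width).
    K = len(means)
    active = set(range(K))
    counts = [0] * K
    sums = [0.0] * K
    for t in range(1, max_cycles + 1):
        for x in active:                      # one pull of every active arm
            sums[x] += 1.0 if random.random() < means[x] else 0.0
            counts[x] += 1
        width = 2 * math.sqrt(math.log(K * t * t / delta) / t)
        est = {x: sums[x] / counts[x] for x in active}
        best = max(est.values())
        active = {x for x in active if best - est[x] < width}
        if len(active) == 1:
            return active.pop()
    return max(active, key=lambda x: sums[x] / counts[x])

# Example: print(successive_elimination([0.5, 0.4, 0.3]))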


The LUCB Algorithm [Kaufmann and Kalyanakrishnan, 2013]

See also Kalyanakrishnan et al. [2012], Gabillon et al. [2012], Jamieson et al. [2014].

• Maintain, at every step, a lower- and an upper-confidence bound for each arm;
• Successively draw the best empirical arm and the challenger with highest upper-confidence bound;
• Stop when, for some x ∈ X, the lower bound on f_x is within ϵ of the highest upper-bound of the other arms (see the sketch below).

Theorem: The sample complexity E[τ] of LUCB (with adequate confidence bounds) is upper-bounded by O( H_ϵ log(H_ϵ / δ) ), where

H_ϵ = ∑_x 1 / (∆_x ∨ ϵ/2)^2 ,

∆_{x*} = f(x*) − max_{x ≠ x*} f(x) and, for x ≠ x*, ∆_x = f(x*) − f(x).
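A minimal illustrative Python sketch of LUCB for Bernoulli arms; the Hoeffding-style confidence radius below is a stand-in for the "adequate confidence bounds" of the theorem, not the exact choice of the paper:

import math, random

def lucb(means, eps=0.0, delta=0.05, max_steps=200000):
    # LUCB sketch: at each step, sample the empirical best arm and the
    # challenger with highest UCB; stop when the leader's LCB is within eps
    # of the challenger's UCB.
    K = len(means)
    counts, sums = [0] * K, [0.0] * K

    def pull(x):
        sums[x] += 1.0 if random.random() < means[x] else 0.0
        counts[x] += 1

    for x in range(K):                        # initialization: one pull per arm
        pull(x)
    for t in range(K, max_steps):
        rad = [math.sqrt(math.log(4 * K * t * t / delta) / (2 * counts[x]))
               for x in range(K)]
        mu = [sums[x] / counts[x] for x in range(K)]
        best = max(range(K), key=lambda x: mu[x])
        challenger = max((x for x in range(K) if x != best),
                         key=lambda x: mu[x] + rad[x])
        if mu[best] - rad[best] >= mu[challenger] + rad[challenger] - eps:
            return best                       # stopping condition met
        pull(best)                            # draw both the leader ...
        pull(challenger)                      # ... and the challenger
    return max(range(K), key=lambda x: sums[x] / counts[x])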


Lower bound [Garivier and Kaufmann, 2016]

Let Σ_K = {ω ∈ R_+^K : ω_1 + · · · + ω_K = 1} and

A(f) := {g ∈ [0, 1]^K : argmax f ≠ argmax g} .

Theorem: For any δ-PAC strategy and any function f ∈ [0, 1]^K,

E[τ_δ] ≥ T*(f) kl(δ, 1 − δ) ,

where

T*(f)⁻¹ := sup_{w ∈ Σ_K} inf_{g ∈ A(f)} ∑_{x=1}^{K} w_x kl(f_x, g_x)

and kl(x, y) := x log(x/y) + (1 − x) log((1 − x)/(1 − y)).

Note: kl(δ, 1 − δ) ≈ log(1/δ) when δ → 0.


About the Computation of T∗(f) and w∗

The proof shows that the maximizer w*(f) of

sup_{w ∈ Σ_K} inf_{g ∈ A(f)} ∑_{x=1}^{K} w_x kl(f_x, g_x)

is the optimal proportion of arm draws.

Introducing

I_α(y, z) := α kl( y, αy + (1 − α)z ) + (1 − α) kl( z, αy + (1 − α)z ) ,

one can see that

T*(f)⁻¹ = sup_{w ∈ Σ_K} min_{x ≠ 1} (w_1 + w_x) I_{w_1/(w_1 + w_x)}(f_1, f_x) .

T*(f) and w*(f) can be computed by a succession of scalar equation resolutions, and one proves that:

1. For all f ∈ [0, 1]^K and all 1 ≤ x ≤ K, w*_x(f) > 0;
2. w*(f) is continuous at every f;
3. If f_1 > f_2 ≥ · · · ≥ f_K, one has w*_2(f) ≥ · · · ≥ w*_K(f).
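These quantities are easy to evaluate numerically. The sketch below computes kl and I_α and approximates T*(f)⁻¹ by a direct optimization over the simplex with scipy; this brute-force parametrization is an assumption made for illustration and is not the scalar-equation method mentioned above (it also assumes, as on the slide, that arm 1 is the unique best arm).

import numpy as np
from scipy.optimize import minimize

def kl(x, y, eps=1e-12):
    # Bernoulli KL divergence kl(x, y), with clamping for numerical safety.
    x, y = min(max(x, eps), 1 - eps), min(max(y, eps), 1 - eps)
    return x * np.log(x / y) + (1 - x) * np.log((1 - x) / (1 - y))

def I_alpha(alpha, y, z):
    m = alpha * y + (1 - alpha) * z
    return alpha * kl(y, m) + (1 - alpha) * kl(z, m)

def inner(w, f):
    # min over x != 1 of (w_1 + w_x) I_{w_1/(w_1+w_x)}(f_1, f_x); f[0] is the best arm.
    return min((w[0] + w[x]) * I_alpha(w[0] / (w[0] + w[x]), f[0], f[x])
               for x in range(1, len(f)))

def T_star_inv(f):
    # Approximate the sup over the simplex via softmax-parametrized weights.
    f = np.asarray(f, dtype=float)
    def neg(theta):
        w = np.exp(theta) / np.exp(theta).sum()
        return -inner(w, f)
    res = minimize(neg, np.zeros(len(f)), method="Nelder-Mead")
    return -res.fun

# Example: print(1.0 / T_star_inv([0.5, 0.45, 0.4]))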


Tracking Strategy: Sampling Rule [Garivier and Kaufmann, 2016]

Let f̂(s) be the ML-estimator of f based on the observations Y_1, . . . , Y_s.

For every ϵ ∈ (0, 1/K], let w^ϵ(f) be an L∞ projection of w*(f) on

Σ_K^ϵ = {(w_1, . . . , w_K) ∈ [ϵ, 1]^K : w_1 + · · · + w_K = 1} .

Let ϵ_t = (K^2 + t)^{−1/2}/2 and

X_{t+1} ∈ argmax_{1 ≤ x ≤ K} ∑_{s=0}^{t} w_x^{ϵ_s}(f̂(s)) − 1{X_s = x} .

Then for all t ≥ 1 and x ∈ {1, . . . , K}, N_x(t) = ∑_{s=0}^{t} 1{X_s = x} ≥ √(t + K^2) − 2K and

max_{1 ≤ x ≤ K} | N_x(t) − ∑_{s=0}^{t−1} w*_x(f̂(s)) | ≤ K(1 + √t) .


Tracking Strategy: Stopping Rule [Garivier and Kaufmann, 2016]

For x ∈ {0, 1}* let p_θ(x) = θ^{∑ x_i} (1 − θ)^{∑ (1 − x_i)}.

Chernoff's Stopping Rule (see Chernoff [1959]): for 1 ≤ x, z ≤ K, writing X_x^{N_x(t)} for the observations gathered from arm x up to time t, let

Z_{x,z}(t) = log [ max_{f'_x ≥ f'_z} p_{f'_x}(X_x^{N_x(t)}) p_{f'_z}(X_z^{N_z(t)}) / max_{f'_x ≤ f'_z} p_{f'_x}(X_x^{N_x(t)}) p_{f'_z}(X_z^{N_z(t)}) ]
          = N_x(t) kl( f̂_x(t), f̂_{x,z}(t) ) + N_z(t) kl( f̂_z(t), f̂_{x,z}(t) )

if f̂_x(t) ≥ f̂_z(t), and Z_{x,z}(t) = −Z_{z,x}(t), where f̂_{x,z}(t) = ( N_x(t) f̂_x(t) + N_z(t) f̂_z(t) ) / ( N_x(t) + N_z(t) ).

The stopping rule is defined by

τ_δ = inf{ t ≥ 1 : Z(t) := max_{x ∈ {1,...,K}} min_{z ∈ {1,...,K}\{x}} Z_{x,z}(t) > β(t, δ) }

where β(t, δ) is a threshold to be tuned.
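For Bernoulli observations, the closed form of Z_{x,z}(t) is easy to compute from the counts and empirical means; a small illustrative sketch (variable names are mine):

import math

def bern_kl(p, q, eps=1e-12):
    p, q = min(max(p, eps), 1 - eps), min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def Z(n_x, mean_x, n_z, mean_z):
    # Closed form: Z_{x,z}(t) = N_x kl(fhat_x, fhat_xz) + N_z kl(fhat_z, fhat_xz)
    # when fhat_x >= fhat_z, and Z_{x,z} = -Z_{z,x} otherwise.
    if mean_x < mean_z:
        return -Z(n_z, mean_z, n_x, mean_x)
    pooled = (n_x * mean_x + n_z * mean_z) / (n_x + n_z)   # fhat_{x,z}(t)
    return n_x * bern_kl(mean_x, pooled) + n_z * bern_kl(mean_z, pooled)

def should_stop(counts, means, beta):
    # Chernoff stopping test: max_x min_{z != x} Z_{x,z}(t) > beta(t, delta).
    K = len(counts)
    stat = max(min(Z(counts[x], means[x], counts[z], means[z])
                   for z in range(K) if z != x)
               for x in range(K))
    return stat > beta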


Optimality Result

Proposition: For every δ ∈ (0, 1), whatever the sampling strategy, Chernoff's stopping rule with

β(t, δ) = log( 2t(K − 1) / δ )

ensures that for all f ∈ [0, 1]^K, P( τ_δ < ∞, x̂_{τ_δ} ≠ x* ) ≤ δ.

Theorem: With the sampling rule and the stopping rule given above, τ_δ < ∞ a.s. and

P( limsup_{δ → 0} τ_δ / log(1/δ) ≤ T*(f) ) = 1 .


Bandit Algorithms for the Continuous Case


Kriging: GP-UCB Srinivas et al. [2012]

If f ∼ GP(0, k) and if for all t ≥ 1

• Yt = f(Xt) + ϵt

• the noise ϵt is Gaussian

then Y⃗t is still a Gaussian vector.

⇒ the covariance kernel is modified, but one can still compute E[f(x) | F_t] for every x, and apply the GP-UCB algorithm!

Works well in practice, but with limited guarantees.


Extensions to Continuous Spaces [Munos, 2014]

HOO maintains, for every cell C_{h,i}, two upper-confidence bounds (UCB) on max_{x ∈ C_{h,i}} f(x): B_{h,i}, based on all observations in the cell, and U_{h,i} = min{B_{h,i}, U_{h+1,j}}, computed from the children j of C_{h,i}.

Source: veendeta.wordpress.com

The HOO Algorithm [Bubeck et al., 2011]
  FOR t = 1..T
    GO DOWN the tree, picking each time the node with highest U_{h,i}, until a leaf is met
    PICK a point at random in the leaf cell
    UPDATE U_{h,i} and B_{h,i} of all nodes on the path

Theorem: If f has near-optimality dimension d, then the cumulated regret of HOO is upper-bounded as

R_T = O( T^{(d+1)/(d+2)} log^{1/(d+2)}(T) ) .


The HOO Algorithm

Source: aresearch.wordpress.com


The StoSOO Algorithm of Valko et al. [2013]

Stochastic Simultaneous Optimistic Optimization: instead of f(x), use an upper-confidence bound.

StoSOO:
  FOR r = 1..R
    FOR every non-empty depth h
      PICK the cell C_{h,i} of depth h with highest upper-confidence bound on f(x_{C_{h,i}})
      IF x_{C_{h,i}} has been evaluated T / log^3(T) times
        THEN SPLIT it
        ELSE evaluate at x_{C_{h,i}}

Theorem: If the near-optimality dimension of f is d = 0, then

E[ f* − f(x̂_T) ] = O( log^2(T) / √T ) .


Conclusion

Still a lot to be done, from both ends:

• Kriging: powerful and versatile algorithms, but with only weak guarantees.

• Optimal bandit algorithms for very limited settings, to be extended!

• Adaptivity to the problem difficulty: function regularity,partitioning scheme.


References

S. Bubeck, R. Munos, G. Stoltz, and C. Szepesvári. X-armed bandits. Journal of Machine Learning Research, 12:1587–1627, 2011.

H. Chernoff. Sequential design of experiments. The Annals of Mathematical Statistics, 30(3):755–770, 1959.

Nando de Freitas, Alex Smola, and Masrour Zoghi. Regret Bounds for Deterministic Gaussian Process Bandits. Technical Report arXiv:1203.2177, March 2012. URL https://cds.cern.ch/record/1430976.


E. Even-Dar, S. Mannor, and Y. Mansour. Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems. Journal of Machine Learning Research, 7:1079–1105, 2006.

V. Gabillon, M. Ghavamzadeh, and A. Lazaric. Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence. In Advances in Neural Information Processing Systems, 2012.

A. Garivier and E. Kaufmann. Optimal Best Arm Identification with Fixed Confidence, 2016.

K. Jamieson, M. Malloy, R. Nowak, and S. Bubeck. lil'UCB: an Optimal Exploration Algorithm for Multi-Armed Bandits. In Proceedings of the 27th Conference on Learning Theory, 2014.


S. Kalyanakrishnan, A. Tewari, P. Auer, and P. Stone. PAC subset selection in stochastic multi-armed bandits. In International Conference on Machine Learning (ICML), 2012.

E. Kaufmann and S. Kalyanakrishnan. Information complexity in bandit subset selection. In Proceedings of the 26th Conference On Learning Theory, 2013.

R. Munos. From bandits to Monte-Carlo Tree Search: The optimistic principle applied to optimization and planning, volume 7 of Foundations and Trends in Machine Learning. 2014.


Rémi Munos. Optimistic optimization of a deterministic function without the knowledge of its smoothness. In John Shawe-Taylor, Richard S. Zemel, Peter L. Bartlett, Fernando C. N. Pereira, and Kilian Q. Weinberger, editors, NIPS, pages 783–791, 2011. URL http://dblp.uni-trier.de/db/conf/nips/nips2011.html#Munos11.

N. Srinivas, A. Krause, S. Kakade, and M. Seeger. Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting. IEEE Transactions on Information Theory, 58(5):3250–3265, 2012.


Michal Valko, Alexandra Carpentier, and Rémi Munos. Stochastic simultaneous optimistic optimization. In Sanjoy Dasgupta and David McAllester, editors, Proceedings of the 30th International Conference on Machine Learning (ICML-13), volume 28, pages 19–27. JMLR Workshop and Conference Proceedings, May 2013. URL http://jmlr.csail.mit.edu/proceedings/papers/v28/valko13.pdf.

Emmanuel Vazquez and Julien Bect. Convergence properties of the expected improvement algorithm with fixed mean and covariance functions. Journal of Statistical Planning and Inference, 140(11):3088–3095, 2010. doi:10.1016/j.jspi.2010.04.018. URL http://www.sciencedirect.com/science/article/pii/S0378375810001850.
