Optimal Learning: an Overview
Peter I. Frazier
Operations Research & Information Engineering, Cornell University
Thursday June 12, 2014
Guest Lecture, Operations Research 3 – Decision-making
Tsinghua University
Research supported by AFOSR and NSF
Frazier (Cornell University) Optimal Learning talk 1 / 45
What is optimal learning?
In many applications, we make decisions about which data to collect.
In making these decisions we trade the benefit of information (the ability to make better decisions in the future) against its cost (money, time, or opportunity cost).
Statistical learning is making predictions or decisions based on data.
Optimal learning is making decisions about which data to collect in an optimal way.
Optimal learning overlaps with other fields
Optimal learning overlaps with these fields:
Bayesian statistics, and machine learning.
Decision-making under uncertainty, and dynamic programming.
Outline
1 Example Optimal Learning Problems
2 Bayesian Selection of the Best
   Problem summary
   Bayesian inference
   The Knowledge-Gradient (KG) method
   Optimality analysis using dynamic programming
3 Conclusion
Dynamic Pricing
Our goal is to price airline tickets to maximize revenue.
We learn about demand for a flight as we sell tickets.
The information collected depends on how we price each ticket: we only observe whether the price that the customer was willing to pay was above or below the offered price.
Collecting more information now may provide the ability to improve revenues later.
[Screenshot: an airline fare-search results page for a round trip from New York/Newark, NJ (EWR) to Los Angeles, CA (LAX) on Sun., Jan. 11, 2009 — nonstop flights from $937, flights with stops from $1,182, with higher fares in other fare classes.]
AIDS Treatment and Prevention
We would like to treat and prevent AIDS in Africa.
We are uncertain about the effectiveness of experimental treatments and untried prevention methods, but we can learn about them by using them in practice, or by conducting scientific studies.
To which treatment and prevention methods should we allocate our investigative resources?
How should we balance using those methods that appear to be most effective, with those untried methods that may be very good?
Exploration vs. Exploitation in News Feeds
We would like to design an automatic document screening system that forwards documents (e.g., webpages) of interest to a human.
The screening system earns a reward if the forwarded document is of interest, and pays a penalty if not.
Even if the expected immediate reward of forwarding a particular document is negative, the system may still want to do so because human feedback may allow the system to improve future performance.
Adaptive Web Design (multi-armed bandits)
Product development (optimization of expensive functions)
We have a product whose features we are selecting based on a sequence of focus groups.
We have the time and budget for a fixed number of focus groups, through which we want to learn more about underlying consumer preferences for these features.
After conducting these focus groups, we will choose a particular set of features with which to bring our product to market and receive a reward based on the resulting sales revenue and manufacturing and development costs.
Other examples
Materials informatics / Designing novel materials
Simulation optimization
Optimization of long-running computer codes
Clinical trials (sequential hypothesis testing)
Inventory control with censored demand
Quality control (changepoint detection)
We consider an optimal learning problem called Bayesian Ranking & Selection
We consider an optimal learning problem called “Bayesian Ranking & Selection (R&S)” or “Bayesian Selection of the Best”.
In this problem, we wish to know which of a finite number of options is the best.
To figure out the quality of an option, we can sample it (try it out).
When we sample an option, we get a noisy observation of its quality.
We can take a limited number of samples.
We wish to allocate this sampling budget efficiently, so as to best support selecting the best.
Example: Drug Discovery
A pharmaceutical company has a library of millions of compounds that it would like to screen for potential cancer drugs. Robots will do the initial assay by performing a fixed test one or several times on some subset of the compounds.
Sources: http://www.paa.co.uk/img/labauto/inst highres/ssi/mini dispenser.jpg,
http://www.kalyx.com/store/images/Images SW/SW 201442-51.jpg
Example: Queuing Control
We would like to choose a nurse/doctor staffing policy in a hospital to minimize expected patient waiting time.
To figure out the patient waiting time under a particular staffing policy, we can simulate it using a discrete event simulation.
Each simulation takes about 1 minute.
We want to choose the best among 100 possible staffing policies, using at most 24 hours of simulation effort.
[Screenshot: a page from Shi, Chen, and Yucesan (1999) on optimal computing budget allocation (OCBA), formulating a stochastic resource allocation problem — buffer allocation in a 10-node supply-chain network — where the objective function has no analytical expression and is estimated through simulation.]
Source: Shi, Chen, and Yucesan (1999)
Mathematical Model
We consider k alternative options.
The underlying value of alternative x is θx ∈ R. We do not observe this, and must try to learn it through sampling. Let θ = (θ1, …, θk).
At each time n = 1, …, N, we choose an alternative to sample, xn ∈ {1, …, k}. We observe a sample,
yn | xn, θ1:k ∼ Normal(θxn, λ²).
To keep things simple, we assume that λ² is known and is the same for all options. It is also possible to allow λ² to be unknown, and to vary with x.
At time N, we select an option x̂ ∈ {1, …, k}, which we hope is the best option.
We receive a reward of θx̂, which is the true value of the selected option x̂.
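This sampling model is easy to simulate. A minimal sketch in Python, with hypothetical values for θ and λ (all names here are my own, for illustration):

```python
import random

# Hypothetical instance: k = 3 alternatives with true (unknown) values theta,
# and a common, known sampling standard deviation lam.
theta = [0.2, 0.5, 0.35]
lam = 1.0

def sample(x, rng):
    """Draw one noisy observation y ~ Normal(theta_x, lam^2)."""
    return rng.gauss(theta[x], lam)

rng = random.Random(0)
# Spend a budget of N = 6 samples, here simply cycling through the options.
observations = [(n % 3, sample(n % 3, rng)) for n in range(6)]
```

Each entry pairs the sampled alternative xn with its noisy observation yn; the question the rest of the talk addresses is how to choose the xn more cleverly than round-robin.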
Example, Times 0 through 10
[Plots: posterior estimates for alternatives x = 1, …, 5 on a common axis (about −2 to 2), updated after each measurement at times 0, 1, 2, 3, 4, 5, and 10; the final frame also overlays the true values.]
We put a Bayesian prior probability distribution on θ
The underlying value of alternative x is θx .
We do not know θx, but based on intuition, experience, and data from other similar problems, we may be able to make statements like “The expected waiting time for this nurse staffing policy is probably between 15 minutes and 4 hours.”
We formalize this by supposing that θx was drawn by nature at random from a Bayesian prior probability distribution.
Once θx is drawn by nature (at time n = 0), it stays fixed (over n = 1, 2, …).
We use a normal prior probability distribution, because it is flexible, and allows easy computation:
θx ∼ Normal(µ0,x, σ²0,x).
We can use Bayesian statistics to estimate θx, based on noisy samples.
Suppose our first sample is from option x , so x1 = x .
We observe
y1 | x1 = x, θ1:k ∼ Normal(θx, λ²).
We can use Bayes’ rule to calculate the conditional distribution of θx given this sample.
The conditional distribution given the data is called the posterior distribution.
We can use Bayesian statistics to estimate θx, based on noisy samples.
Bayes’ rule shows us that the posterior distribution on θx is
θx | x1, y1 ∼ Normal(µ1,x, σ²1,x),
where
µ1,x = (σ0,x^−2 µ0,x + λ^−2 y1) / (σ0,x^−2 + λ^−2),
σ²1,x = [σ0,x^−2 + λ^−2]^−1.
The posterior distribution on θx′, where x′ ≠ x, does not change.
There is a nice expression for the posterior distribution
In general,
θx | x1, …, xn, y1, …, yn ∼ Normal(µn,x, σ²n,x),
where µn,x and σn,x can be computed recursively. For x = xn, the posterior is updated via:
µn+1,x = (σn,x^−2 µn,x + λ^−2 yn+1) / (σn,x^−2 + λ^−2),
σ²n+1,x = [σn,x^−2 + λ^−2]^−1,
and the posterior for x ≠ xn does not change:
µn+1,x = µn,x for x ≠ xn,
σn+1,x = σn,x for x ≠ xn.
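These recursions are a few lines of code. A sketch in Python (variable names are my own):

```python
def posterior_update(mu, sigma2, x, y, lam2):
    """One conjugate normal update after observing y for alternative x.

    mu, sigma2: lists of posterior means and variances, one per alternative.
    lam2: the known sampling variance lambda^2.
    Alternatives other than x are left unchanged.
    """
    mu, sigma2 = list(mu), list(sigma2)
    prec = 1.0 / sigma2[x] + 1.0 / lam2            # posterior precision
    mu[x] = (mu[x] / sigma2[x] + y / lam2) / prec  # precision-weighted average
    sigma2[x] = 1.0 / prec
    return mu, sigma2

# Example: a Normal(0, 1) prior and one observation y = 1 with lam2 = 1
# gives posterior mean 0.5 and variance 0.5 for the sampled alternative.
mu, sigma2 = posterior_update([0.0, 0.0], [1.0, 1.0], 0, 1.0, 1.0)
```

Note that the update blends the prior mean and the observation in proportion to their precisions, which is exactly the formula on this slide.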
Example of the posterior distribution
[Animation: a sequence of frames showing the posterior distributions evolving as successive samples arrive.]
We can use the posterior distribution to choose x̂
Recall that x̂ is our selection of the best, and it is chosen at time N based on all previous samples x1, …, xN, y1, …, yN.
Based on these samples, the posterior is
θx | x1, …, xN, y1, …, yN ∼ Normal(µN,x, σ²N,x).
Recall that the reward for choosing x̂ = x is θx.
The conditional expected reward for choosing x̂ = x is
E[θx | x1, …, xN, y1, …, yN] = µN,x.
Thus, the choice that gives the biggest conditional expected reward is arg maxx µN,x, and it has value maxx µN,x.
Example of choosing x̂
[Plots: the posterior at time N, from which x̂ is chosen as the alternative with the largest posterior mean.]
How should we choose the xn?
Our ability to choose x̂ accurately depends on the choices we make for x1, …, xN.
Intuitively, a good way to choose these should spend the first part of the budget exploring the options to figure out which ones are among the best, and then focus the rest of the budget on these options.
But how precisely should we accomplish this?
One way to choose the xn is through the knowledge-gradient (KG) method for independent beliefs.
Later, in the seminar, I will talk about the knowledge-gradient method for correlated beliefs.
The knowledge-gradient factor quantifies a sample’s value
The knowledge-gradient method is created via the following thought experiment.
If we were to stop at time n, and select x̂ based on x1:n, y1:n, we would earn an expected reward of
µ*n = maxx µn,x.
If we were to take one more sample, xn+1, and observe yn+1, and then select x̂, we would earn an expected reward of
µ*n+1 = maxx µn+1,x.
The knowledge-gradient factor quantifies a sample’s value
Before the new sample, our value was µ*n. After, it was µ*n+1.
The additional sample xn+1, yn+1 has increased our solution’s value by
µ*n+1 − µ*n.
At time n, we don’t know yn+1, so we can’t compute this quantity.
We can, however, compute its expected value,
KGn(x) = En[µ*n+1 − µ*n | xn+1 = x].
We call this quantity the knowledge-gradient (KG) factor, because it measures the change in the value of our knowledge.
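This expectation can be estimated by brute-force simulation: draw a hypothetical yn+1 from its predictive distribution, update the posterior mean, and average the improvement. A sketch on a hypothetical two-alternative instance (names are my own):

```python
import math
import random

def kg_monte_carlo(mu, sigma2, x, lam2, n_samples=200000, seed=1):
    """Monte Carlo estimate of KG_n(x) = E_n[max mu_{n+1} - max mu_n | x_{n+1} = x]."""
    rng = random.Random(seed)
    best_now = max(mu)
    best_other = max(m for i, m in enumerate(mu) if i != x)
    prec = 1.0 / sigma2[x] + 1.0 / lam2
    total = 0.0
    for _ in range(n_samples):
        # y_{n+1} has predictive distribution Normal(mu_x, sigma2_x + lam2).
        y = rng.gauss(mu[x], math.sqrt(sigma2[x] + lam2))
        mu_x_new = (mu[x] / sigma2[x] + y / lam2) / prec  # updated mean for x
        total += max(mu_x_new, best_other) - best_now
    return total / n_samples

# Two alternatives with identical Normal(0, 1) beliefs and lam2 = 1; the
# closed-form KG factor (given on a later slide) is ~0.2821 here.
est = kg_monte_carlo([0.0, 0.0], [1.0, 1.0], 0, 1.0)
```

The closed-form expression on the following slides makes this simulation unnecessary in practice, but the Monte Carlo version is a useful sanity check.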
Computing the KG factor requires us to think about how the next measurement will change our posterior.
At time n, suppose we decide to measure xn+1 = x .
Before we observe yn+1, it is random.
We can calculate its conditional distribution given x1, …, xn+1, y1, …, yn:
yn+1 | x1, …, xn+1, y1, …, yn ∼ Normal(µn,x, σ²n,x + λ²).
From this, and the formula for µn+1,x in terms of µn,x, σ²n,x, and yn+1, we can calculate that
µn+1,x | x1, …, xn+1, y1, …, yn ∼ Normal(µn,x, σ̃²n,x),
where σ̃n,x = σ²n,x / √(σ²n,x + λ²).
This distribution is called the “posterior predictive distribution”.
The KG factor has a convenient formula.
The VOI / KG factor for measuring alternative x at time n is
KGn(x) = σ̃n,x f(−∆n,x / σ̃n,x),
where
∆n,x = |µn,x − maxx′≠x µn,x′|,
f(c) = cΦ(c) + ϕ(c),
Φ is the normal cdf, and ϕ is the normal pdf.
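This formula needs only the normal pdf and cdf, so it can be coded with the standard library alone (math.erf gives the cdf). A sketch, with a hypothetical three-alternative instance:

```python
import math

def kg_factor(mu, sigma2, x, lam2):
    """Closed-form KG factor for alternative x under independent normal beliefs."""
    sigma_tilde = sigma2[x] / math.sqrt(sigma2[x] + lam2)
    delta = abs(mu[x] - max(m for i, m in enumerate(mu) if i != x))
    c = -delta / sigma_tilde
    pdf = math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(c / math.sqrt(2.0)))
    return sigma_tilde * (c * cdf + pdf)  # sigma_tilde * f(-delta / sigma_tilde)

# The KG policy measures the alternative with the largest factor; here the
# third alternative wins because its belief is most uncertain.
mu, sigma2, lam2 = [0.1, 0.3, 0.0], [1.0, 0.5, 2.0], 1.0
x_next = max(range(len(mu)), key=lambda x: kg_factor(mu, sigma2, x, lam2))
```

Note how the factor balances exploitation (small ∆n,x) against exploration (large σ̃n,x) in a single number.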
Animation of the KG method
[Animation: frames for n = 0 through n = 20 plotting the prior, Y_x, and the observations (yhat) as the KG method sequentially chooses which alternative to measure.]
The KG method works well
[Histograms: Value(KG) − Value(Boltzmann), Value(KG) − Value(EqualAllocation), Value(KG) − Value(Exploit), Value(KG) − Value(IE), Value(KG) − Value(OCBA), and Value(KG) − Value(LL(S)).]
Histogram of the sampled difference in value for competing policies, aggregated across the 100 randomly generated problems.
The knowledge-gradient method is good, but it is not optimal in general
The KG method works well against other algorithms proposed for this problem.
The KG method is optimal if we have only one sample remaining.
But in general, multiple samples remain.
What is the best algorithm in general?
The optimal algorithm is the solution to a dynamic program
The conditional expected value we receive, given what we know at time N, is maxx µN,x.
Define VN = VN(µN, σ²N) = maxx µN,x.
At time N−1, the optimal choice of xN is the one that maximizes the expected value of this reward,
arg maxxN EN−1[VN | xN],
and this maximal expected value is
VN−1 = VN−1(µN−1, σ²N−1) = maxxN EN−1[VN | xN].
Notation: En means the conditional expectation with respect to µn and σ²n; µN = (µN,x : x = 1, …, k) and similarly for σ²N.
In principle, we can repeat this to find the optimal rule for every xn
We iterate backward over n = N, N−1, N−2, …, 1, where in each stage n:
We computed Vn+1(µn+1, σ²n+1) in the previous stage.
The optimal choice for xn+1 is
xn+1 ∈ arg maxxn+1 En[Vn+1(µn+1, σ²n+1) | xn+1].
The value of this decision is
Vn(µn, σ²n) = maxxn+1 En[Vn+1(µn+1, σ²n+1) | xn+1].
This is dynamic programming.
We can solve the DP exactly for small problems
Here is the value function for a Bayesian ranking and selection problem with Bernoulli (0/1) observations, and independent beta prior distributions.
[Plot: value of the optimal policy vs. number of measurements N (0 through 8), with one curve for each of k = 2, 3, 4 alternatives; values range from roughly 0.45 to 0.75.]
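The figure's setting can be solved exactly by backward induction over the posterior parameters. A sketch for k = 2 alternatives with Bernoulli observations and independent Beta(1, 1) priors (my own construction, mirroring the figure's setup):

```python
from functools import lru_cache

def optimal_value(N):
    """Value of the optimal sampling policy with budget N, for k = 2 Bernoulli
    alternatives with independent Beta(1, 1) priors. The DP state is the
    vector of beta parameters (a1, b1, a2, b2)."""
    @lru_cache(maxsize=None)
    def V(n, a1, b1, a2, b2):
        if n == N:
            # Terminal reward: posterior mean of the selected alternative.
            return max(a1 / (a1 + b1), a2 / (a2 + b2))
        p1 = a1 / (a1 + b1)  # predictive success probability for alternative 1
        p2 = a2 / (a2 + b2)
        v1 = p1 * V(n + 1, a1 + 1, b1, a2, b2) + (1 - p1) * V(n + 1, a1, b1 + 1, a2, b2)
        v2 = p2 * V(n + 1, a1, b1, a2 + 1, b2) + (1 - p2) * V(n + 1, a1, b1, a2, b2 + 1)
        return max(v1, v2)  # measure whichever alternative is more valuable
    return V(0, 1, 1, 1, 1)

# With no measurements the value is 1/2; one measurement raises it to 7/12.
values = [optimal_value(N) for N in range(9)]
```

The resulting values are nondecreasing in N, matching the shape of the curves in the figure.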
For large problems, this does not work because of the curse of dimensionality
To use dynamic programming, we need to compute and store Vn(µn, σ²n) for each possible value of µn and σ²n. (We need to compute Vn for every n, but at any given time we only need Vn and Vn+1 in memory.)
There are infinitely many possible values for µn. We can discretize, but it is a vector in k dimensions, and so discretizing into m pieces in each dimension allows for m^k possible values.
σ²n only takes finitely many values, since (σ²n,x)^−1/λ^−2 is the number of samples of alternative x, but there are still k^n/n! possible values.
For large values of k (say, k > 10), solving the dynamic program is computationally intractable.
For such large values of k, we recommend using the KG policy.
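Putting the pieces together, running the KG policy is a short loop: compute the KG factor for each alternative, measure the argmax, and update the posterior. A self-contained sketch on a hypothetical instance (helper names are my own):

```python
import math
import random

def kg_factor(mu, sigma2, x, lam2):
    """Closed-form KG factor under independent normal beliefs."""
    s = sigma2[x] / math.sqrt(sigma2[x] + lam2)
    c = -abs(mu[x] - max(m for i, m in enumerate(mu) if i != x)) / s
    return s * (c * 0.5 * (1 + math.erf(c / math.sqrt(2)))
                + math.exp(-0.5 * c * c) / math.sqrt(2 * math.pi))

def run_kg(theta, lam2, N, seed=0):
    """Sample N times with the KG policy, then select x_hat = argmax mu."""
    rng = random.Random(seed)
    k = len(theta)
    mu, sigma2 = [0.0] * k, [1.0] * k  # independent Normal(0, 1) priors
    for _ in range(N):
        x = max(range(k), key=lambda i: kg_factor(mu, sigma2, i, lam2))
        y = rng.gauss(theta[x], math.sqrt(lam2))
        prec = 1 / sigma2[x] + 1 / lam2
        mu[x] = (mu[x] / sigma2[x] + y / lam2) / prec
        sigma2[x] = 1 / prec
    return max(range(k), key=lambda i: mu[i]), mu, sigma2

x_hat, mu, sigma2 = run_kg(theta=[0.0, 0.3, 0.6, -0.2, 0.1], lam2=0.25, N=40)
```

Each iteration costs O(k), so unlike the dynamic program this loop scales to large k.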
The KG method has nice optimality properties
The dynamic programming equations can be used to prove certain optimality properties of the KG policy:
The knowledge-gradient policy is optimal when N = 1.
The knowledge-gradient policy is asymptotically optimal as N → ∞.
For other N, the knowledge-gradient policy’s suboptimality is bounded by
V^KG,n(S^n) ≥ V^n(S^n) − ((N−n−1)/√(2π)) maxx σ̃n,x,
where V^KG,n gives the value of the knowledge-gradient policy and V^n the value of the optimal policy, both with N−n measurements remaining.
The KG method has nice optimality properties
If there are exactly 2 alternatives (M = 2), the knowledge-gradient policy is optimal. In this case, the optimal policy reduces to
xn = arg maxx σn,x.
The KG method has nice optimality properties
If there is no measurement noise, and alternatives may be reordered so that
µ0,1 ≥ µ0,2 ≥ … ≥ µ0,M,
σ0,1 ≥ σ0,2 ≥ … ≥ σ0,M,
then the knowledge-gradient policy is optimal.
Conclusion
We gave an introduction to Bayesian ranking and selection, which is one of many optimal learning problems.
We showed how Bayesian statistics and a one-step optimality analysis can be used to derive the KG policy for this problem.
In the seminar today, we will look at another optimal learning problem: simulation optimization, with correlated Bayesian prior distributions.
Knowledge-gradient methods offer a convenient yet principled way to develop algorithms for a wide variety of optimal learning problems.
For further reading
P.I. Frazier, “Tutorial: Optimization via Simulation with Bayesian Statistics and Dynamic Programming,” Winter Simulation Conference, 2012. (available on my website)
W.B. Powell & I.O. Ryzhov, “Optimal Learning,” 2012. (textbook)
The original paper on the KG method: P.I. Frazier, W.B. Powell, and S. Dayanik, “A Knowledge-Gradient Policy for Sequential Information Collection,” SIAM Journal on Control and Optimization, 2008.
Other introductory materials available on my website,http://people.orie.cornell.edu/pfrazier/