
MS&E 223 Lecture Notes #11
Simulation: Making Decisions via Simulation
Peter J. Haas, Spring Quarter 2003-04

    Making Decisions via Simulation

Ref: Law and Kelton, Chapter 10; Handbook of Simulation, Chapters 8 and 9; Haas, Sec. 6.3.6.

We give an introduction to some methods for selecting the best system design (or parameter setting for a given design), where the performance under each alternative is estimated via simulation.

The methods presented are chosen for their simplicity; see the references for more details on exactly when they are applicable, and whether improved versions of the algorithms are available. Current understanding of the behavior of these algorithms is incomplete, and they should be applied with due caution.

    1. Continuous Stochastic Optimization

    Robbins-Monro Algorithm

Here we will consider the problem min_{θ∈Θ} f(θ) where, for a given value of θ, we are not able to evaluate f(θ) analytically or numerically, but must obtain (noisy) estimates of f(θ) using simulation. We will assume for now that the set Θ of possible solutions is uncountably infinite. In particular, suppose that Θ is an interval of real numbers. One approach to solving the problem min f(θ) is to estimate f'(θ) using simulation and then use an iterative method to solve the equation f'(θ) = 0. The most basic such method is called the Robbins-Monro algorithm, and can be viewed as a stochastic version of Newton's method for solving nonlinear equations.

The basic iteration is

θ_{n+1} = θ_n − (a/n) Z_n,

where a > 0 is a fixed parameter of the algorithm (the quantity a/n is sometimes called the gain), Z_n is an unbiased estimator of f'(θ_n), and, under appropriate conditions, θ_n converges to the global minimizer θ*


as n → ∞ with probability 1. (Otherwise the algorithm can converge to some other local minimizer.) For large n, it can be shown that θ_n has approximately a normal distribution. Thus, if the procedure is run m independent times (where m = 5 to 10) with n iterations for each run, generating i.i.d. replicates θ_{n,1}, ..., θ_{n,m}, then a point estimator for θ* is

θ̄_m = (1/m) Σ_{j=1}^m θ_{n,j},

and confidence intervals can be formed in the usual way, based on the Student-t distribution with m − 1 degrees of freedom.
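The procedure above can be sketched as follows. This is a minimal illustration: the quadratic objective, its noisy gradient estimator, and all numerical values are assumptions chosen for the example, not part of the notes.

```python
import math
import random
import statistics

def robbins_monro(grad_estimate, theta0, a=1.0, n_iters=5000, rng=None):
    """One Robbins-Monro run: theta_{n+1} = theta_n - (a/n) * Z_n,
    where Z_n is an unbiased (noisy) estimate of f'(theta_n)."""
    rng = rng or random.Random()
    theta = theta0
    for n in range(1, n_iters + 1):
        theta -= (a / n) * grad_estimate(theta, rng)
    return theta

# Toy problem (an assumption for illustration): f(theta) = E[(theta - W)^2]
# with W ~ N(2, 1), so f'(theta) = 2*(theta - E[W]) and theta* = 2.
def noisy_grad(theta, rng):
    return 2.0 * (theta - rng.gauss(2.0, 1.0))  # unbiased estimate of f'(theta)

# m independent runs -> point estimator and t-based confidence interval.
m = 10
reps = [robbins_monro(noisy_grad, theta0=0.0, rng=random.Random(seed))
        for seed in range(m)]
mean = statistics.mean(reps)
half = 2.262 * statistics.stdev(reps) / math.sqrt(m)  # t quantile, m - 1 = 9 d.f.
print(f"theta* estimate: {mean:.3f} +/- {half:.3f}")
```

Each of the m runs uses its own random number stream, so the replicates θ_{n,j} are independent and the usual t-based interval applies.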

    Remark: Variants of the above procedure are available for multi-parameter problems.

    Remark: The main drawback of the above algorithm is that convergence can be slow and sensitive to

    the choice of the parameter a; current research is focused on finding methods that converge faster. One

easy, practical way to stabilize the algorithm a bit is to keep track of, and return, the best parameter value seen so far.

Remark: A variant of the Robbins-Monro procedure, due to Kiefer and Wolfowitz, replaces the gradient estimate by a finite difference that is obtained by running the simulation at various values of the parameter θ. This procedure breaks down as the dimension of the parameter space increases. Recent work on simultaneous perturbation stochastic approximation (SPSA) tries to avoid the dimensionality problem as follows. At the kth iteration of a d-dimensional problem, a finite difference is obtained by running the simulation at the two parameter values θ ± cΔ. Here θ is the current value of the parameter vector (of length d), c is a nonnegative real number that is a parameter of the algorithm, and Δ is a d-vector of independent Bernoulli random variables, each equal to +1 or −1 with equal probability.
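A single SPSA gradient estimate can be sketched as follows. The noisy quadratic objective `f_sim`, the parameter values, and the averaging over M estimates are illustrative assumptions:

```python
import random

def spsa_gradient(f_sim, theta, c, rng):
    """One SPSA estimate of the gradient of f at theta: simulate only at
    theta + c*Delta and theta - c*Delta (two runs, whatever the dimension d),
    where Delta has independent components equal to +1 or -1 with equal
    probability."""
    delta = [rng.choice((-1.0, 1.0)) for _ in theta]
    up = f_sim([t + c * d for t, d in zip(theta, delta)], rng)
    dn = f_sim([t - c * d for t, d in zip(theta, delta)], rng)
    return [(up - dn) / (2.0 * c * d) for d in delta]

# Hypothetical noisy objective (an assumption): f(theta) = sum(theta_i^2) + noise,
# whose true gradient at (1, -2, 0.5) is (2, -4, 1).
def f_sim(theta, rng):
    return sum(t * t for t in theta) + rng.gauss(0.0, 0.01)

# A single estimate is noisy; averaging many shows it is unbiased.
rng = random.Random(1)
M, theta = 2000, [1.0, -2.0, 0.5]
avg = [0.0] * len(theta)
for _ in range(M):
    g = spsa_gradient(f_sim, theta, c=0.1, rng=rng)
    avg = [a + gi / M for a, gi in zip(avg, g)]
print(avg)  # close to the true gradient (2, -4, 1)
```

Note that each estimate costs two simulation runs regardless of d, which is the point of SPSA.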

How do we obtain the derivative estimates needed for the Robbins-Monro algorithm? One popular method for estimating f'(θ) is via likelihood ratios.

    Tutorial on Likelihood Ratios

Suppose that we wish to estimate α = E[g_n(X_0, X_1, ..., X_n)], where X_0, X_1, ..., X_n are i.i.d. replicates of a discrete random variable X with state space S. The straightforward approach would be as follows: for k = 1, 2, ..., K, generate n + 1 i.i.d. replicates of X, say X_{0,k}, X_{1,k}, ..., X_{n,k}, and set

α(k) = g_n(X_{0,k}, X_{1,k}, ..., X_{n,k}).

Then estimate α by the average of α(1), ..., α(K).

Suppose that Y is another random variable with state space S. Also suppose, for all s ∈ S, that q(s) = P{Y = s} is positive whenever p(s) = P{X = s} is positive. Then α can be estimated based on simulation of Y instead of X. Let Y_0, ..., Y_n be i.i.d. replicates of Y, and set

  • 7/27/2019 Arena Stanfordlecturenotes11

    3/9

    MS&E 223 Lecture Notes #11

    Simulation Making Decisions via Simulation

    Peter J. Haas Spring Quarter 2003-04

    Page 3 of9

L_n = ∏_{i=0}^n p(Y_i) / q(Y_i).

L_n is called a likelihood ratio and represents the likelihood of seeing the observations under the distribution p relative to the likelihood of seeing the observations under distribution q. We claim that

(1)  E[g_n(Y_0, Y_1, ..., Y_n) L_n] = E[g_n(X_0, X_1, ..., X_n)].

To see this, observe that

E[g_n(Y_0, ..., Y_n) L_n]
  = Σ_{s_0 ∈ S} ⋯ Σ_{s_n ∈ S} g_n(s_0, ..., s_n) ( ∏_{i=0}^n p(s_i)/q(s_i) ) P{Y_0 = s_0, ..., Y_n = s_n}
  = Σ_{s_0 ∈ S} ⋯ Σ_{s_n ∈ S} g_n(s_0, ..., s_n) ( ∏_{i=0}^n p(s_i)/q(s_i) ) ∏_{i=0}^n q(s_i)
  = Σ_{s_0 ∈ S} ⋯ Σ_{s_n ∈ S} g_n(s_0, ..., s_n) ∏_{i=0}^n p(s_i)
  = E[g_n(X_0, ..., X_n)].

Thus, we can simulate Y's instead of X's, as long as the performance measure g is multiplied by the likelihood-ratio correction factor.
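The claim in (1) can be checked numerically. Below is a minimal sketch; the pmfs p and q, the rare-event function g, and all numerical values are assumptions for illustration:

```python
import random

def lr_estimate(n, K, p, q, g, rng):
    """Estimate E[g(X_0,...,X_n)] under pmf p by simulating i.i.d. draws from
    pmf q and weighting each replicate by L_n = prod_i p(Y_i)/q(Y_i)."""
    states = list(q)
    weights = [q[s] for s in states]
    total = 0.0
    for _ in range(K):
        ys = rng.choices(states, weights=weights, k=n + 1)
        L = 1.0
        for y in ys:
            L *= p[y] / q[y]
        total += g(ys) * L
    return total / K

# Toy example (assumed for illustration): X is uniform on {0, 1}, and we
# estimate the probability that X_0 = ... = X_9 = 1, i.e. alpha = 2**-10.
p = {0: 0.5, 1: 0.5}
q = {0: 0.1, 1: 0.9}   # simulate Y biased toward the event of interest
g = lambda ys: 1.0 if all(y == 1 for y in ys) else 0.0

rng = random.Random(7)
est = lr_estimate(n=9, K=20000, p=p, q=q, g=g, rng=rng)
print(est, 2.0 ** -10)  # the estimate is close to the exact value
```

Choosing q to put more mass on the event of interest is exactly the importance-sampling use of this identity.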

This idea can be applied in great generality. For example, if X and Y are real-valued random variables with density functions f_X and f_Y, respectively, then (1) holds with

L_n = ∏_{i=0}^n f_X(Y_i) / f_Y(Y_i).

As another example, if X_0, X_1, ..., X_n are observations of a discrete-time Markov chain with initial distribution ν and transition matrix P, and Y_0, Y_1, ..., Y_n are observations of a discrete-time Markov chain with initial distribution ν̃ and transition matrix P̃, then (1) holds with

  • 7/27/2019 Arena Stanfordlecturenotes11

    4/9

    MS&E 223 Lecture Notes #11

    Simulation Making Decisions via Simulation

    Peter J. Haas Spring Quarter 2003-04

    Page 4 of9

(2)  L_n = [ ν(Y_0) ∏_{i=1}^n P(Y_{i−1}, Y_i) ] / [ ν̃(Y_0) ∏_{i=1}^n P̃(Y_{i−1}, Y_i) ].

    Remark: The relation in (1) continues to hold if n is replaced by a random time N.

    Remark: Ln can easily be computed in a recursive manner.
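The recursive computation can be illustrated for the Markov-chain likelihood ratio in (2). The two-state chains and the observed path below are hypothetical:

```python
def update_lr(L, y_prev, y_curr, P, P_tilde):
    """One recursive update of the likelihood ratio in (2): multiply the
    running value by P(y_prev, y_curr) / P_tilde(y_prev, y_curr)."""
    return L * P[y_prev][y_curr] / P_tilde[y_prev][y_curr]

# Hypothetical two-state chains for illustration.
P       = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.4, 1: 0.6}}
P_tilde = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.5, 1: 0.5}}
nu, nu_tilde = {0: 0.5, 1: 0.5}, {0: 0.5, 1: 0.5}

path = [0, 0, 1, 1, 0]                    # observed path Y_0, ..., Y_4
L = nu[path[0]] / nu_tilde[path[0]]       # initial-distribution factor
for y_prev, y_curr in zip(path, path[1:]):
    L = update_lr(L, y_prev, y_curr, P, P_tilde)
print(L)  # 0.7 * 0.3 * 0.6 * 0.4 / 0.5**4 = 0.8064
```

The update touches only the most recent transition, so the ratio can be maintained online as the chain is simulated.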

Likelihood ratios can be computed in the context of GSMPs in an analogous way. The idea is to initially set the running likelihood ratio equal to 1. Then, whenever there is a state transition from S_n to S_{n+1} caused by the occurrence of the events in E*_n, the running likelihood ratio is multiplied by p(S_{n+1}; S_n, E*_n) / p̃(S_{n+1}; S_n, E*_n). Moreover, for each new clock reading X_i generated for an event e_i, the running likelihood ratio is multiplied by f(X_i; S_{n+1}, e_i, S_n, E*_n) / f̃(X_i; S_{n+1}, e_i, S_n, E*_n). (Here f and f̃ denote the densities of the new clock-setting distributions.) Similar multiplications are required initially to account for any differences in the initial distribution.

    Application of Likelihood Ratios to Gradient Estimation

Likelihood ratio techniques can be used to yield an estimate of f'(θ) based on simulation at the single parameter value θ. Suppose that we have the representation f(θ) = E_θ[c(X, θ)]; we write E_θ to indicate that the distribution of the random variable X depends on θ.

Example: Consider an M/M/1 queue with arrival rate λ and service rate μ. A performance measure of the above form is obtained by letting X be the average waiting time for the first k customers and setting c(x, μ) = aμ + bx. This cost function reflects the tradeoff between the operating costs due to an increase in the service rate and the costs due to increased waiting times for the customers. We assume that μ > λ, so that the queue is stable for all possible choices of μ.

As discussed above, we have

f(θ + h) = E_{θ+h}[c(X, θ + h)] = E_θ[c(X, θ + h) L(h)],

where L(h) is an appropriate likelihood ratio. Noting that L(0) = 1 and assuming that we can interchange the expectation and limit operations, we have

f'(θ) = lim_{h→0} ( f(θ + h) − f(θ) ) / h = lim_{h→0} E_θ[ ( c(X, θ + h) L(h) − c(X, θ) L(0) ) / h ]

  • 7/27/2019 Arena Stanfordlecturenotes11

    5/9

    MS&E 223 Lecture Notes #11

    Simulation Making Decisions via Simulation

    Peter J. Haas Spring Quarter 2003-04

    Page 5 of9

= E_θ[ lim_{h→0} ( c(X, θ + h) L(h) − c(X, θ) L(0) ) / h ] = E_θ[ d/dh ( c(X, θ + h) L(h) ) |_{h=0} ] = E_θ[ c'(X, θ) + c(X, θ) L'(0) ].

(Here c' = ∂c/∂θ.) To estimate r(θ) = f'(θ), generate i.i.d. replicates X_1, ..., X_n and compute the estimator

r_n(θ) = (1/n) Σ_{i=1}^n [ c'(X_i, θ) + c(X_i, θ) L'(0) ].

Confidence intervals can be formed in the usual way. In the context of the Robbins-Monro algorithm (where computing confidence intervals is not an issue), we often take n to be small, perhaps even n = 1, because in practice it is often more efficient to execute many R-M iterations with relatively noisy derivative estimates than to execute fewer iterations with more precise derivative estimates.

Example: For the M/M/1 queue discussed above, denote by V_1, ..., V_k the k service times generated in the course of simulating the k waiting times. Then the likelihood ratio L(h) is given by

L(h) = ∏_{i=1}^k [ (μ + h) e^{−(μ+h)V_i} ] / [ μ e^{−μV_i} ] = ( (μ + h)/μ )^k e^{−h Σ_{i=1}^k V_i},

so that

L'(0) = Σ_{i=1}^k (1/μ − V_i)  and  r_n(μ) = (1/n) Σ_{j=1}^n [ a + (aμ + b X̄_j) Σ_{i=1}^k (1/μ − V_{ij}) ],

where X̄_j is the average of the k waiting times generated during the jth simulation and V_{ij} is the ith service time generated during the jth simulation. Confidence intervals for f'(μ) can be formed in the usual way.

Remark: The gradient information is useful even outside the context of stochastic optimization, because it can be used to explore the sensitivity of system performance to specified parameters.

Remark: There are other methods that can be used to estimate gradients, such as infinitesimal perturbation analysis (IPA). See Chapter 9 in the Handbook of Simulation for further discussion.

    Remark: The main drawback to the likelihood-ratio approach is that the variance of the likelihood ratio

    increases with the length of the simulation. Thus the estimates can be very noisy for long simulation runs,

    and many simulation replications must be performed to get reasonable estimates.


2. Small Number of Alternatives: Ranking and Selection

Now let's look at the case of a finite number of alternatives. Law and Kelton discuss the case of two competing alternatives in Section 10.2. We focus on the more typical case where there are a number of alternatives to be compared. Unless stated otherwise, we will assume that "bigger is better," i.e., our goal is to maximize some performance measure. (Any minimization problem can be converted to a maximization problem by maximizing the negative of the original performance measure.)

Suppose that we are comparing k systems, where k is on the order of 3 to 20. Denote by θ_i the value of the performance measure for the ith alternative. (E.g., θ_i might be the expected revenue from system i over one year.) Sometimes we write θ_(1) < θ_(2) < ... < θ_(k), so that θ_(k) is the optimal value of the performance measure.

The goal is to select the best system with (high) probability 1 − α whenever the performance measure for the best system is at least δ better than the others, i.e., whenever θ_(k) − θ_(l) > δ for l ≠ k. The parameter δ is called the indifference zone. If in fact there exist solutions within the indifference zone (e.g., when θ_(k) − θ_(l) ≤ δ for one or more values of l), the algorithm described below typically selects a "good" solution (i.e., a system lying within the indifference zone) with probability 1 − α.

The inputs to the procedure are observations Y_ij, where Y_ij is the jth estimate of θ_i. For a fixed alternative i, the observations (Y_ij : j ≥ 1) are assumed to be i.i.d. (i.e., based on independent simulation runs). Each Y_ij is required to have (at least approximately) a normal distribution. Thus, an observation Y_ij may be

- an average of a large enough number of i.i.d. observations of a finite-horizon performance measure so that the CLT is applicable,
- an estimate of a steady-state performance measure based on a large enough number of regenerative cycles, or
- an estimate of a steady-state performance measure based on a large enough number of batch means.

The following algorithm (due to Rinott, with modifications by Nelson and Matejcik) will not only select the best system as defined above, but will also provide simultaneous 100(1 − α)% confidence intervals J_i on the quantities

θ_i − max_{j≠i} θ_j

for 1 ≤ i ≤ k. That is, P( θ_i − max_{j≠i} θ_j ∈ J_i for 1 ≤ i ≤ k ) ≥ 1 − α. (This is a stronger assertion than P( θ_i − max_{j≠i} θ_j ∈ J_i ) ≥ 1 − α for each 1 ≤ i ≤ k, as with individual confidence intervals.) This "multiple comparisons with the best" approach indicates how close each of the inferior systems is to being the best. This information is useful in identifying other good solutions; one of them may be needed if the best solution is ruled out because of secondary


    considerations (ease of implementation, maintenance costs, etc.). See Handbook of Simulation, Section

    8.3.2 for further details.

Algorithm

1. Specify α, δ, and an initial sample size n_0 (n_0 ≥ 20).
2. For i = 1 to k, take an i.i.d. sample Y_{i1}, Y_{i2}, ..., Y_{i,n_0}.
3. For i = 1 to k, compute Ȳ_i = (1/n_0) Σ_{j=1}^{n_0} Y_{ij} and s_i² = (1/(n_0 − 1)) Σ_{j=1}^{n_0} (Y_{ij} − Ȳ_i)².
4. Look up the value of h from Table 8.3 (given at the end of the lecture notes).
5. For i = 1 to k, take N_i − n_0 additional samples Y_{ij}, independent of the first-stage sample, where N_i = max( n_0, ⌈(h s_i / δ)²⌉ ).
6. For i = 1 to k, compute θ̂(i) = (1/N_i) Σ_{j=1}^{N_i} Y_{ij}.
7. Select the system with the largest value of θ̂(i) as the best.
8. Form the simultaneous 100(1 − α)% confidence intervals for θ_i − max_{j≠i} θ_j as [a_i, b_i], where

   a_i = min( 0, θ̂(i) − max_{j≠i} θ̂(j) − δ )
   b_i = max( 0, θ̂(i) − max_{j≠i} θ̂(j) + δ ).
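The two-stage procedure can be sketched as follows. The `simulate` function and the system means are illustrative assumptions, and the value of h below is a placeholder rather than an entry from Table 8.3:

```python
import math
import random
import statistics

def rinott_select(simulate, k, n0, delta, h, rng):
    """Sketch of the two-stage procedure: simulate(i, rng) returns one
    (approximately normal) observation Y_ij for system i; h is the constant
    looked up from Table 8.3 for the given k, n0, and confidence level."""
    means = []
    for i in range(k):
        first = [simulate(i, rng) for _ in range(n0)]
        s2 = statistics.variance(first)                        # first-stage s_i^2
        Ni = max(n0, math.ceil((h * math.sqrt(s2) / delta) ** 2))
        second = [simulate(i, rng) for _ in range(Ni - n0)]    # second stage
        means.append(statistics.mean(first + second))
    best = max(range(k), key=lambda i: means[i])
    # Simultaneous MCB intervals [a_i, b_i] for theta_i - max_{j != i} theta_j.
    intervals = []
    for i in range(k):
        others = max(m for j, m in enumerate(means) if j != i)
        intervals.append((min(0.0, means[i] - others - delta),
                          max(0.0, means[i] - others + delta)))
    return best, means, intervals

# Hypothetical systems: normal observations with means 0, 0.5, 3 (system 2 best).
# h = 2.3 is a placeholder value, not taken from the actual table.
rng = random.Random(5)
mus = [0.0, 0.5, 3.0]
simulate = lambda i, rng: rng.gauss(mus[i], 1.0)
best, means, intervals = rinott_select(simulate, k=3, n0=20, delta=0.5, h=2.3, rng=rng)
print(best)
```

Note how the second-stage sample size N_i grows with the first-stage variance s_i², so noisier systems are simulated longer.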

The justification of this algorithm is similar to (but more complicated than) the discussion in Section 10A in Law and Kelton. See Rinott's 1978 paper in Commun. Statist. A: Theory and Methods for details. (The algorithm presented here is simpler to execute, however, than the algorithm discussed in Law and Kelton.)

Remark: Improved versions of this procedure can be obtained through the use of common random numbers. See Handbook of Simulation, Chapter 8.

    3. Large Number of Alternatives: Discrete Stochastic Optimization

We wish to solve the problem max_{θ∈Θ} f(θ), where the value of f(θ) is estimated using simulation and Θ is a very large set of discrete design alternatives. For example, θ might be the number of servers at a service center and f(θ) might be the expected value of the total revenue over a year of operation.


Development of good random search algorithms is a topic of ongoing research. Most of the current algorithms generate a sequence (θ_n : n ≥ 0) with θ_{n+1} ∈ N(θ_n) ∪ {θ_n}, where N(θ) is a neighborhood of θ. A currently-optimal solution θ*_n is maintained. The algorithms differ in the structure of the neighborhood, the rule used to determine θ_{n+1} from θ_n, the rule used to compute θ*_n, and the rule used to stop the algorithm.

We present a typical algorithm of this type (due to Andradottir) which uses a simulated annealing-type rule to determine θ_{n+1} from θ_n. This rule usually accepts a better solution but, with a small probability, sometimes accepts a worse solution in order to avoid getting stuck at a local maximum.

Algorithm

1. Fix T, N > 0. Select a starting point θ_0, and generate an estimate f̂(θ_0) of f(θ_0) via simulation. Set A(θ_0) = f̂(θ_0) and C(θ_0) = 1. Set n = 0 and θ*_0 = θ_0.
2. Select a candidate solution θ'_n randomly and uniformly from Θ \ {θ_n}. Generate an estimate f̂(θ'_n) of f(θ'_n) via simulation.
3. If f̂(θ'_n) ≥ f̂(θ_n), then set θ_{n+1} = θ'_n. Otherwise, generate a random variable U distributed uniformly on [0, 1]. If

   U ≤ exp( ( f̂(θ'_n) − f̂(θ_n) ) / T ),

   then set θ_{n+1} = θ'_n; otherwise, set θ_{n+1} = θ_n.
4. Set A(θ_{n+1}) = A(θ_{n+1}) + f̂(θ_{n+1}) and C(θ_{n+1}) = C(θ_{n+1}) + 1 (with A(θ) and C(θ) taken to be 0 for points not yet visited). If A(θ_{n+1}) / C(θ_{n+1}) > A(θ*_n) / C(θ*_n), then set θ*_{n+1} = θ_{n+1}; otherwise, set θ*_{n+1} = θ*_n.
5. If n ≥ N, return the final answer θ*_{n+1} with estimated objective function value A(θ*_{n+1}) / C(θ*_{n+1}). Otherwise, set n = n + 1 and go to Step 2.

In the above algorithm, the neighborhood was defined as N(θ) = Θ \ {θ}. If the problem has more structure, the neighborhood may be defined in a problem-specific manner.
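The algorithm above can be sketched as follows; the noisy objective, T, N, and the set Θ are illustrative assumptions:

```python
import math
import random
from collections import defaultdict

def anneal_search(f_sim, Theta, T, N, rng):
    """Sketch of the random-search algorithm above, with the neighborhood
    equal to Theta minus {theta}: a worse candidate is accepted with
    probability exp((fhat_cand - fhat_curr)/T); the answer is the point
    with the best running average A(theta)/C(theta)."""
    A = defaultdict(float)   # sum of the estimates generated at each theta
    C = defaultdict(int)     # number of estimates generated at each theta
    theta = rng.choice(Theta)
    fhat = f_sim(theta, rng)
    A[theta] += fhat; C[theta] += 1
    star = theta
    for _ in range(N):
        cand = rng.choice([t for t in Theta if t != theta])
        fcand = f_sim(cand, rng)
        if fcand >= fhat or rng.random() <= math.exp((fcand - fhat) / T):
            theta, fhat = cand, fcand        # accept the candidate
        A[theta] += fhat; C[theta] += 1      # update the running averages
        if A[theta] / C[theta] > A[star] / C[star]:
            star = theta
    return star, A[star] / C[star]

# Hypothetical problem: Theta = {0, ..., 10}, f(theta) = -(theta - 6)^2 + noise,
# so the true maximizer is theta = 6.
f_sim = lambda th, rng: -(th - 6) ** 2 + rng.gauss(0.0, 0.5)
rng = random.Random(11)
best, value = anneal_search(f_sim, Theta=list(range(11)), T=1.0, N=500, rng=rng)
print(best, round(value, 2))
```

Returning the point with the best running average A/C, rather than the last point visited, is what protects the answer against a single lucky noisy estimate.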
