Output Analysis: Variance EstimationIE680Output Analysis : Variance EstimationIE680 Output Analysis: Variance Estimation Jong-hyun Ryu 1

Output Analysis: Variance EstimationIE680 Output Analysis : Variance EstimationIE680

Output Analysis: Variance Estimation

Jong-hyun Ryu

1

Output Analysis: Variance EstimationIE680

Contents

• Motivation (Quality Control Chart)

• Collecting output data– Type of simulations

– Steady-State Simulation

– Stochastic Stationary Process

• Variance Estimation Methods– Replication Method

– Batch mean methods• Non-overlapping vs. Overlapping

– Standardized Time Series

• Additional Methods

2


Motivation

• Quality Control Chart– Simple Control Chart (Shewhart Control Chart)

• Design of Shewhart Control Chart– Control Limits

• UCL & LCL

• Results from incorrect control limits– If overestimated detection delay – If underestimated high false alarm rate– Estimating process variance (standard deviation) is critical.

ˆ3

ˆ3

UCL x

LCL x

3


Type of simulations

• Terminating vs. non-terminating simulations

• Terminating simulations– Runs for some duration of time TE, where E is a specified event

that stops the simulation.

– Starts at time 0 under well-specified initial conditions.

– Ends at the stopping time TE.

– Bank example: Opens at 8:30 am (time 0) with no customers present and 8 of the 11 teller working (initial conditions), and closes at 4:30 pm (Te)

– The simulation analyst chooses to consider it a terminatingsystem because the object of interest is one day’s operation.

4


Type of simulations

• Non-terminating simulation– Runs continuously, or at least over a very long period of time.– Examples: assembly lines that shut down infrequently, telephone

systems, hospital emergency rooms. – Initial conditions defined by the analyst.– Runs for some analyst-specified period of time TE.– Study the steady-state (long-run) properties of the system,

properties that are not influenced by the initial conditions of the model.

• Terminating or non-terminating – The objectives of the simulation study – The nature of the system.

5


Output Analysis for Steady-State Simulation

• Consider a single run of a simulation model to estimate a steady-state or long-run characteristics of the system.– The single run produces observations Y1, Y2, ... (generally

the samples of an autocorrelated time series).

• Performance measures:– Point estimator and confidence interval

– Independent of the initial conditions

6


Output Analysis for Steady-State Simulation

• The sample size is a design choice, with several considerations in mind:– Any bias in the point estimator that is due to artificial or

arbitrary initial conditions (bias can be severe if run length is too short).

– Desired precision of the point estimator.

– Budget constraints on computer resources.

7


Initialization Bias

• Methods to reduce the point-estimator bias caused by using artificial and unrealistic initial conditions:

– Intelligent initialization.– Divide simulation into an initialization phase and data-collection

phase.

• Intelligent initialization

– Initialize the simulation in a state that is more representative of long-run conditions.

– If the system exists, collect data on it and use these data to specify more nearly typical initial conditions.

– If the system can be simplified enough to make it mathematically solvable, e.g. queueing models, solve the simplified model to find long-run expected or most likely conditions, use that to initialize the simulation.

8


Initialization Bias

• No widely accepted, objective and proven technique to guide how much data to delete to reduce initialization bias to a negligible level.

• Plots can be misleading but they are still recommended.– If averages reveal a smoother and more precise trend as the # of

replications increases.– If averages can be smoothed further by plotting a moving

average.– Cumulative average becomes less variable as more data are

averaged.– The more correlation present, the longer it takes for to approach

steady state.

9


Introduction to variance estimation

• Simulation output analysis – Point estimator and confidence interval

– Variance estimation confidence interval

• Independent and identically distributed (IID)– Suppose X1,…Xm are iid

10


Stochastic Stationary Process

• Stationary time series with positive autocorrelation

• Stationary time series with negative autocorrelation

• Nonstationary time series with an upward trend

The stochastic process X is stationary for t1,…,tk, t T, if∈

1 1" "( ,..., ) ( ,..., )

d

k k

d

t t t t t t where denotes equality in distributionX X X X

11


Stochastic Stationary Process (2)

• A discrete-time stationary process X = {Xi : i≥1} with mean µ and variance σX

2 = Cov(Xi,Xi),

• Variance of the sample mean

• Easy to see

• For iid,

2 21, 1

1

2 ( )X jj

Cov X X

2lim ( )nnnVar X

2 12

1, 11

( ) 1 2(1 ) ( )

1

nn

X jj

S X jE Cov X X

n n n n

12

1, 11

1( ) 2 (1 ) ( )

n

n X jj

jVar X Cov X X

n n

12


Stochastic Stationary Process (3)• The expected value of the variance estimator is:

– If Xi are independent, then is an unbiased estimator of

– If the autocorrelation is positive, then is biased low as an estimator of

– If the autocorrelation is negative, then is biased high as an estimator of

2

2

1( )

( )1

( )( )

n nn

nn

nS X a

E Var Xn n

S XE Var X when positively correlated

n

( )nVar X

( )nVar X

( )nVar X2 ( )nS X

n

2 ( )nS X

n

2 ( )nS X

n

13


Replication Method

• Use to estimate point-estimator variability and to construct a confidence interval.

• Approach: make R replications, initializing and deleting from each one the same way.

• Important to do a thorough job of investigating the initial-condition bias:– Bias is not affected by the number of replications, instead, it is affected only by

deleting more data or extending the length of each run (i.e. increasing TE).

• Basic raw output data {Xrj, r = 1, ..., R; j = 1, …, n} is derived by:– Individual observation from within replication r.– Batch mean from within replication r of some number of discrete-time observations.– Batch mean of a continuous-time process over time interval j.

14


Replication Method

• Length of each replication (n) beyond deletion point (d):(n - d) > 10d

• Number of replications (R) should be as many as time permits, up to about 25 replications.

• For a fixed total sample size (n), as fewer data are deleted (d decreases)

– C.I. shifts: greater bias.– Decrease Variance– Trade off between bias and variance.

15


Batch means method

– When using a single, long replication:

– Problem :• Data are dependent so the usual estimator is biased

• Replication method (more than 25 replications) is expensive

– One solution: Batch means method

16


• Non-overlapping batch mean (NBM)

m observations

with batch mean Y1,m

Batch 1

m observations

with batch mean Yk,m

Batch k

Batch means method (Nonoverlapping batch mean)

1, 2, , ,

1 1 2 , ( 1) 1 , ( 1) 1,..., , ,..., ... ,..., ... ,...,

m m i m k m

n k m

m m m i m im k m km

Y Y Y Y

X

X X X X X X X X

17


Nonoverlapping batch mean (NBM)

• Suppose the batch means become uncorrelated

as m ∞

• NBM estimator for σ2

• Confidence Interval

,,

( )( ) ( )i m

n i m

nVar YnVar X mVar Y

k

2,

1

ˆ ( , ) ( )1

k

B i m ni

mV k m Y X

k

1,1 / 2

ˆ ( , )Bn k

V k mX t

n

18


Overlapping Batch Mean (OBM)

• OBM estimator for σ2

1 2 3 , 1 2 3, , , ... , , , , ...m m m mX X X X X X X

Y1,m

Y2,m

12

,1

ˆ ( ) ( )( 1)( )

n m

O i m ni

nmV m Y X

n m n m

19


NBM vs. OBM

• Under mild conditions

– Thus, both have similar bias

• Variance of the estimators

– Thus, the OBM method gives better MSE

2ˆ ( , ) ( 1) (1 )BE V k m k n o n 2ˆ ( ) (1 )OE V m m o m

ˆ ( , ) 3, ,

ˆ 2( )

B

O

Var V k mas k m

Var V m

20


Standardized Time Series

• Define the ‘centered’ partial sums of Xi as

• Central Limit Theorem

• Define the continuous time process

Question: How does Zn(t) behave as n increases?

21


n=100

22


n=1000

23


n=10000

24


n=1,000,000

25


Functional Central Limit Theorem (FCLT)

where B(t) is the standard Brownian motion (drift coefficient =0, diffusion coefficient =1)

• Brownian motion with drift coefficient and diffusion coefficient 2 is a real valued stochastic process with stationary and independent increments having continuous sample paths where

• Thus,

1( ) ( ) (0 1)

nt

ii

n

X ntZ t B t as n t

n

22 1 2 1 2 1( ) ( ) ~ ( ( ), ( ))

( ) ( ) (0) ~ (0, )n n n

B t B t Normal t t t t

Z t Z t Z Normal t

26


Functional Central Limit Theorem (FCLT)

• While CLT says that for any t,

FCLT also shows that Zn(t) converges to an asymptotically continuous stochastic process with independent increments.

27


The Weighted Area Estimator

1

0( ) ( ) 1Var f t B t dt

1

0( ) ( ) ~ (0,1)f t B t dt N

21 2 2 2

10( ) ( ) ( ) ~ ( )A f f t B t dt E A f

28


The Weighted Area Estimator

• Since

and

is a variance estimator

1( )

nt

ii

n

X ntZ t

n

21

0

21

( ; ) ( ) ( ) ( ) ( ) ( )1

n i iA f n f Z A f f t B t dt as n

nn n ni

2 2 2 21( ; ) ( ) ~ ( ) ( )dA f n A f E A f E A f

( ; )A f n

29


Additional Methods

• Regenerative Method (Fishman 1973; Shedler 1993)

• Spectral Theory (Heidelberger and Welch 1981, 1983; Damerdji 1991)

• Quantile estimation (Heidelberger and Lewis 1984; Seila 1982)

• Estimation of functions of means (Munoz and Glynn 1997)

• Estimation of multivariate means (Anderson 1984; Charnes 1989, 1995; Seila 1984)

30


Reference

• Alexopoulos, C., D. Goldsman, and R. F. Serfozo. 2005. Stationary processes: statistical estimation. In The Encyclopedia of Statistical Sciences, ed. N. L. Johnson, and C. B. Read, 2nd edition. New York: John Wiley & Sons

• Kim, S.-H., C. Alexopoulos, K.-L. Tsui, and J. R. Wilson. 2006. A distribution-free tabular CUSUM chart for autocorrelated data. IIE Transactions, forthcoming.

• Alexopoulos, C., N. T. Argon, D. Goldsman, and G. Tokol. 2004. Overlapping variance estimators for simulations. Technical Report, School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia.

• Law, A. M., and W. D. Kelton. 2000 Simulation Modeling and Analysis, 3rd ed. New York: McGraw-Hill.

31

Documents

Output Analysis: Variance EstimationIE680Output Analysis : Variance EstimationIE680 Output Analysis: Variance Estimation Jong-hyun Ryu 1