Automating the Analysis of Simulation Output Data Katy Hoad, Stewart Robinson, Ruth Davies SSIG Meeting, 24th October 2007 http://www.wbs.ac.uk/go/autosimoa



Page 1: Automating the Analysis of Simulation Output Data

Automating the Analysis of Simulation Output Data

Katy Hoad, Stewart Robinson, Ruth Davies

SSIG Meeting, 24th October 2007

http://www.wbs.ac.uk/go/autosimoa

Page 2: Automating the Analysis of Simulation Output Data

The Problem

• Prevalence of simulation software: ‘easy-to-develop’ models and use by non-experts.

• Simulation software generally has very limited facilities for directing or advising the user on how to run the model to get accurate estimates of performance.

• With a lack of the necessary skills and support, it is highly likely that simulation users are using their models poorly.

Page 3: Automating the Analysis of Simulation Output Data

3 Main Decisions:

• How long a warm-up is needed?

• How long a run length is needed?

• How many replications should be run?

Page 4: Automating the Analysis of Simulation Output Data

Continuing theoretical developments

BUT little put into practical use.

Why?

• Limited testing of methods

• Requirement for detailed statistical knowledge

• Methods generally not implemented in simulation software (AutoMod/AutoStat is an exception)

A solution?

Provide an automated output ‘Analyser’.

Page 5: Automating the Analysis of Simulation Output Data

An Automated Output Analyser

[Flowchart: Simulation model → Output data → Analyser. Inside the Analyser: Warm-up analysis; Use replications or long-run?; Replications analysis; Run-length analysis; Recommendation possible? If so: Recommendation. If not: Obtain more output data.]

Analyser advises user on:

• Warm-up length

• Run-length

• Number of replications

Page 6: Automating the Analysis of Simulation Output Data

The AutoSimOA Project

A 3-year, EPSRC-funded project in collaboration with SIMUL8 Corporation.

Main Objective:

• To propose a procedure for automated output analysis of warm-up, replications and run-length

Only looking at analysis of a single scenario

Page 7: Automating the Analysis of Simulation Output Data

The AutoSimOA Project

WORK CARRIED OUT TO DATE:

1. Creation of a representative and sufficient set of models / data output for testing chosen simulation output analysis methods.

2. Development of an automated algorithm for estimating the number of replications to run.

3. Selection and testing of warm-up methods from the literature.

Page 8: Automating the Analysis of Simulation Output Data

Part 1.

Creation of models and data sets

Page 9: Automating the Analysis of Simulation Output Data

AIMS:

Provide a representative and sufficient set of models / data output for use in discrete event simulation research.

Use models / data sets to test the chosen simulation output analysis methods in the AutoSimOA Project.

Page 10: Automating the Analysis of Simulation Output Data

Categorising Output Data Sets by Shape & Characteristics

[Diagram: output data sets are sorted into groups (Group A, Group B, … Group N) according to: auto-correlation; normality; cycling/seasonality; terminating; non-terminating; steady state; in/out of control; transient.]

Page 11: Automating the Analysis of Simulation Output Data

Model characteristics

Deterministic or random

Significant pre-determined model changes (by time)

Dynamic internal changes i.e. ‘feed-back’

Empty-to-empty pattern

Initial transient (warm-up)

Out of control trend ρ≥1

Cycle

Auto-correlation

Statistical distribution

Output data characteristics

Page 12: Automating the Analysis of Simulation Output Data

Modelling Warm-up Period:

Shapes of Initial Bias Functions

• Mean Shift:

• Linear:

• Quadratic:

• Exponential:

• Oscillating (decreasing):

[Plots of the bias shapes; surviving panel labels: Linear, Quadratic, Exponential.]

Page 13: Automating the Analysis of Simulation Output Data

Artificial Data: Construct data which resembles real model output with known values for some specific attribute.

Example: Known steady state mean and variance.

Example data: AR(1) with N(0,1) errors & linear initial bias.

Real Models: Collect range of models created in “real circumstances”. Examples:

• Swimming Pool complex: average number in system

• Production Line Manufacturing Plant: through-put / hour

• Fast Food Store: average queuing time
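The artificial example data on this slide (AR(1) with N(0,1) errors and a linear initial bias) could be generated along the following lines. This is only a sketch: the function name, the autoregressive coefficient `phi`, the size of the starting bias, the warm-up length, and the choice to add the bias to the series are all assumptions, since the slides do not give these details.

```python
import random


def ar1_with_linear_bias(length, phi=0.5, bias_start=5.0, warmup=100, seed=0):
    """Return an AR(1) series with N(0,1) errors plus a linearly decaying bias.

    Assumed form: the bias starts at bias_start and falls in a straight
    line to zero after `warmup` observations, then stays at zero.
    """
    rng = random.Random(seed)
    series = []
    y = 0.0
    for t in range(length):
        # AR(1) recursion with standard normal errors
        y = phi * y + rng.gauss(0.0, 1.0)
        # linear initial bias, zero once the warm-up is over
        bias = bias_start * max(0.0, 1.0 - t / warmup)
        series.append(y + bias)
    return series


data = ar1_with_linear_bias(2000)
print(len(data), sum(data[:50]) / 50, sum(data[1000:]) / 1000)
```

With this construction the steady state mean is 0 and the steady state variance is 1 / (1 - phi**2), so the "true" values needed for testing are known in advance.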

Page 14: Automating the Analysis of Simulation Output Data

Part 2.

WORK IN PROGRESS

Automating estimation of warm-up length

Page 15: Automating the Analysis of Simulation Output Data

The Initial Bias Problem

• Model may not start in a “typical” state.

• This may cause initial bias in the output.

• Many methods proposed for dealing with initial bias: e.g. Initial steady state conditions; run model for ‘long’ time…

• This project uses: Deletion of the initial transient data by specifying a warm-up period.
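Deleting the initial transient amounts to discarding the first observations of each run before any statistics are computed. A minimal sketch, assuming the output is a plain list of observations and the warm-up length is counted in observations (the function name and the sample numbers are illustrative only):

```python
from statistics import mean


def delete_warmup(output, warmup_length):
    """Drop the first warmup_length observations of one simulation run."""
    if not 0 <= warmup_length < len(output):
        raise ValueError("warm-up must be shorter than the run")
    return output[warmup_length:]


# Illustrative run that starts empty and settles around 5
run = [0, 1, 3, 4, 5, 6, 4, 5, 5, 5]
print(mean(run), mean(delete_warmup(run, 4)))
```

How to choose `warmup_length` is the open question the following slides address.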

Page 16: Automating the Analysis of Simulation Output Data

Question is:

How do you estimate the length of the warm-up period required?

Page 17: Automating the Analysis of Simulation Output Data

5 main types of methods:

1. Graphical Methods.

2. Heuristic Approaches.

3. Statistical Methods.

4. Initialisation Bias Tests.

5. Hybrid Methods.

Page 18: Automating the Analysis of Simulation Output Data

Literature search – 42 methods

Summary of methods and literature references on project web site:

http://www.wbs.ac.uk/go/autosimoa

Currently testing methods

Page 19: Automating the Analysis of Simulation Output Data

Part 3.

Automating analysis of number of replications

Page 20: Automating the Analysis of Simulation Output Data

Introduction

• Initial Setup:

Any warm-up problems already dealt with.

Run length (m) decided upon.

Modeller decided to use multiple replications to obtain better estimate of mean performance.

• Multiple replications performed by changing the random number streams used by the model and re-running the simulation.

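Output data from model (one row per replication) and the response measure of interest:

\[
\begin{array}{llll@{\qquad}l}
X_{11}, & X_{12}, & \ldots, & X_{1m} & \hat{X}_1 = \text{summary statistic from rep 1} \\
X_{21}, & X_{22}, & \ldots, & X_{2m} & \hat{X}_2 \\
\vdots  &         &         &        & \vdots \\
X_{N1}, & X_{N2}, & \ldots, & X_{Nm} & \hat{X}_N = \text{summary statistic from rep } N
\end{array}
\qquad
\bar{X} = \frac{1}{N} \sum_{j=1}^{N} \hat{X}_j
\]

N replications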
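The replication scheme on this slide can be sketched as follows. The "model" here is a hypothetical stand-in that simply draws normal values; `run_replication`, its distribution and the seeding scheme are assumptions made for illustration, not part of the project. Each replication uses its own random number stream, is reduced to one summary statistic, and the summaries are then averaged.

```python
import random
from statistics import mean


def run_replication(seed, run_length):
    """Hypothetical stand-in for one simulation run of length m."""
    rng = random.Random(seed)  # a separate random number stream per replication
    return [rng.gauss(50.0, 5.0) for _ in range(run_length)]


def replicate(n_reps, run_length, base_seed=0):
    """Run n_reps replications; return per-replication summaries and their mean."""
    summaries = []
    for j in range(n_reps):
        output = run_replication(base_seed + j, run_length)
        summaries.append(mean(output))  # summary statistic from rep j
    return summaries, mean(summaries)


summaries, overall = replicate(n_reps=5, run_length=100)
print(summaries, overall)
```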

Page 21: Automating the Analysis of Simulation Output Data

QUESTION IS… How many replications are needed?

• Limiting factors: computing time and expense.

If performing N replications achieves a sufficient estimate of mean performance:

> N replications: Unnecessary use of computer time and money.

< N replications: Inaccurate results → incorrect decisions.

Page 22: Automating the Analysis of Simulation Output Data

Cumulative mean graph

[Figure: cumulative mean (y-axis, 46 to 56) plotted against number of replications, n (x-axis, 1 to 106).]
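The series plotted in a cumulative mean graph is the running average of the replication results. A small sketch, with an illustrative function name and made-up input values:

```python
def cumulative_means(results):
    """Return the mean of the first n results, for n = 1 .. len(results)."""
    means = []
    total = 0.0
    for n, x in enumerate(results, start=1):
        total += x
        means.append(total / n)
    return means


print(cumulative_means([50, 54, 46, 50]))
```

Plotting this list against n and judging by eye where the line becomes flat is the graphical approach; the Confidence Interval Method replaces that judgement with a numerical criterion.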

Page 23: Automating the Analysis of Simulation Output Data

Confidence Interval Method

• User decides size of error they can tolerate.

• Run increasing numbers of replications.

• Construct Confidence Intervals around sequential cumulative mean of output variable until desired precision achieved.

Advantages:

Relies upon statistical inference to determine number of replications required.

Allows the user to tailor accuracy of output results to their particular requirement or purpose for that model and result.

Disadvantage: Many simulation users do not have the skills to apply such an approach.

Page 24: Automating the Analysis of Simulation Output Data

AUTOMATE Confidence Interval Method: Algorithm interacts with simulation model sequentially.

[Flowchart: START: Load Input → Run Model → Produce Output Results → Run Replication Algorithm → Precision criteria met? YES: Recommend replication number. NO: Run one more replication.]
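The sequential interaction in the flowchart can be sketched as a simple loop. Both callables are placeholders supplied by the caller: `run_model` stands for one replication of the simulation and `criteria_met` for the precision check. The `max_reps` safety cap is an assumption added here; the default of 3 initial replications follows the value stated later in the talk.

```python
def recommend_replications(run_model, criteria_met, initial_reps=3, max_reps=1000):
    """Run replications one at a time until the precision criteria are met.

    Returns (recommended number of replications or None, list of results).
    """
    results = [run_model(i) for i in range(initial_reps)]
    while not criteria_met(results):
        if len(results) >= max_reps:
            return None, results  # no recommendation possible within the cap
        # "NO" branch of the flowchart: run one more replication
        results.append(run_model(len(results)))
    # "YES" branch: recommend the current number of replications
    return len(results), results


# Toy usage: the "model" returns its replication index and the
# "criteria" are met once seven results are available.
n, results = recommend_replications(lambda i: float(i), lambda r: len(r) >= 7)
print(n)
```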

Page 25: Automating the Analysis of Simulation Output Data

ALGORITHM DEFINITIONS

We define the precision, d_n, as the ½ width of the Confidence Interval expressed as a percentage of the cumulative mean:

\[
d_n = \frac{100 \; t_{n-1,\alpha/2} \; s_n / \sqrt{n}}{\bar{X}_n}
\]

Where

n is the current number of replications carried out,

t_{n-1,α/2} is the Student t value for n-1 df and a significance of 1-α,

X̄_n is the cumulative mean,

s_n is the estimate of the standard deviation, calculated using results X_i (i = 1 to n) of the n current replications.
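The precision defined on this slide can be computed with the standard library alone. In the sketch below the Student t value is obtained numerically (Simpson's rule for the CDF, bisection for the quantile), which is an implementation choice made here to avoid external dependencies; a statistics package would normally supply it. Taking the absolute value of the mean is also an assumption, added so that a negative mean does not give a negative precision.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=1000):
    """CDF for x >= 0: 0.5 plus a Simpson's rule integral of the density on [0, x]."""
    if x == 0:
        return 0.5
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Quantile for 0.5 < p < 1, found by bisection on the CDF."""
    lo, hi = 0.0, 1000.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def precision_percent(results, alpha=0.05):
    """Half width of the confidence interval as a percentage of the mean."""
    n = len(results)
    if n < 2:
        raise ValueError("need at least two replications")
    t_value = t_quantile(1 - alpha / 2, n - 1)
    half_width = t_value * stdev(results) / math.sqrt(n)
    return 100 * half_width / abs(mean(results))  # undefined if the mean is 0


print(precision_percent([48, 52, 50, 51, 49]))
```

Calling `precision_percent` on the first 3, 4, 5, … results in turn gives the sequence of d_n values that the stopping rule examines.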

Page 26: Automating the Analysis of Simulation Output Data

Stopping Criteria

• Simplest method:

Stop when d_n is first found to be ≤ the desired precision, d_required, and recommend that number of replications, Nsol, to the user.

• Problem: The data series could converge prematurely, by chance, to an incorrect estimate of the mean with precision d_required, and then diverge again.

• ‘Look-ahead’ procedure: When d_n is first found to be ≤ d_required, the algorithm performs a set number of extra replications to check that the precision remains ≤ d_required.
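The stopping rule with a look-ahead can be sketched as a scan over the sequence of precision values. The slides label the look-ahead length f(kLimit) without defining f on this slide; the sketch assumes the simplest case, a fixed look-ahead of `k_limit` extra replications. The function name and the sample values are illustrative.

```python
def nsol_with_look_ahead(precisions, d_required, k_limit, first_n=3):
    """Return Nsol, or None if the series ends before a decision is reached.

    precisions[i] is the precision d_n for n = first_n + i.
    A candidate Nsol is accepted only if the precision stays within
    d_required for k_limit further replications; otherwise the
    candidate is discarded and the search continues.
    """
    candidate = None
    for offset, d in enumerate(precisions):
        n = first_n + offset
        if d <= d_required:
            if candidate is None:
                candidate = n  # precision first met here
            if n - candidate >= k_limit:
                return candidate  # precision held throughout the look-ahead
        else:
            candidate = None  # precision lost again: discard the candidate
    return None


d_series = [8.0, 6.0, 4.9, 5.2, 4.8, 4.7, 4.6, 4.5, 4.4, 4.3]  # n = 3 .. 12
print(nsol_with_look_ahead(d_series, d_required=5.0, k_limit=0))
print(nsol_with_look_ahead(d_series, d_required=5.0, k_limit=5))
```

In the sample series the precision dips below 5% at n = 5 and then rises again, so the rule without a look-ahead stops at a point that the rule with a look-ahead rejects.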

Page 27: Automating the Analysis of Simulation Output Data

Replication Algorithm

[Figure: cumulative mean, X̄, with 95% confidence limits, plotted against replication number (n), 3 to 20 (y-axis 23 to 37). Marked on the plot: Nsol, the look-ahead period f(kLimit), Nsol + f(kLimit), and the region where precision ≤ 5%.]

Page 28: Automating the Analysis of Simulation Output Data

[Figure: plot against replication number (n), 3 to 22 (y-axis 0.8 to 1.2). Regions labelled in order: Precision ≤ 5%, Precision > 5%, Precision ≤ 5%. Marked on the plot: Nsol1, Nsol2, the look-ahead period f(kLimit), and Nsol2 + f(kLimit).]

Page 29: Automating the Analysis of Simulation Output Data

TESTING METHODOLOGY

• 24 artificial data sets created: Left skewed, symmetric, right skewed; Varying values of relative standard deviation (stdev/mean).

• Advantage: true mean and variance known.

• Artificial data set: 100 sequences of 2000 data values.

• 8 real models selected.

• Different lengths of ‘look ahead’ period tested: kLimit values = 0 (i.e. no ‘look ahead’ period), 5, 10, 25.

• d_required value kept constant at 5%.

Page 30: Automating the Analysis of Simulation Output Data

5 performance measures

1. Coverage of the true mean

2. Bias

3. Absolute Bias

4. Average Nsol value

5. Comparison of 4. with Theoretical Nsol value

• For real models: ‘true’ mean & variance values estimated from whole sets of output data (3000 to 11000 data points).
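The first four measures could be computed from the repeated algorithm runs roughly as below. The slides do not give formulas, so the definitions used here are assumptions: coverage is taken as the proportion of runs whose confidence interval contains the true mean, bias as the average of (estimate minus true mean), and absolute bias as the average of its absolute value. The data layout and the sample numbers are illustrative.

```python
from statistics import mean


def performance_measures(runs, true_mean):
    """Summarise algorithm runs given as (estimate, half_width, nsol) tuples."""
    errors = [estimate - true_mean for estimate, _, _ in runs]
    covered = [abs(estimate - true_mean) <= half_width
               for estimate, half_width, _ in runs]
    return {
        "coverage": sum(covered) / len(runs),
        "bias": mean(errors),
        "absolute_bias": mean(abs(e) for e in errors),
        "average_nsol": mean(nsol for _, _, nsol in runs),
    }


runs = [(10.5, 1.0, 8), (9.0, 0.5, 12), (10.0, 0.2, 10)]
print(performance_measures(runs, true_mean=10.0))
```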


Page 31: Automating the Analysis of Simulation Output Data

Results

• Nsol values for individual algorithm runs are very variable.

• Average Nsol values for 100 runs per model close to the theoretical values of Nsol.

• Normality assumption appears robust.

• Using a ‘look ahead’ period improves performance of the algorithm.

Page 32: Automating the Analysis of Simulation Output Data

Look-ahead | Models | Mean bias significantly different to zero | Failed in coverage of true mean | Mean est. Nsol significantly different to theoretical Nsol (>3)
No ‘look-ahead’ period | Proportion of Artificial models | 4/24 | 2/24 | 9/18
No ‘look-ahead’ period | Proportion of Real models | 1/8 | 1/8 | 3/5
kLimit = 5 | Proportion of Artificial models | 1/24 | 0 | 1/18
kLimit = 5 | Proportion of Real models | 0 | 0 | 0

Page 33: Automating the Analysis of Simulation Output Data

Impact of different look ahead periods on performance of algorithm

% decrease in absolute mean bias | kLimit = 0 to kLimit = 5 | kLimit = 5 to kLimit = 10 | kLimit = 10 to kLimit = 25
Artificial Models | 8.76% | 0.07% | 0.26%
Real Models | 10.45% | 0.14% | 0.33%

Page 34: Automating the Analysis of Simulation Output Data

Examples of changes in Nsol & improvement in estimate of true mean

Model ID | kLimit | Nsol | Theoretical Nsol (approx) | Mean estimate significantly different to the true mean?
A9 | 0 | 4 | 112 | Yes
A9 | 5 | 120 | | No
A24 | 0 | 3 | 755 | Yes
A24 | 5 | 718 | | No
R7 | 0 | 3 | 10 | Yes
R7 | 5 | 8 | | No
R4 | 0 | 3 | 6 | Yes
R4 | 5 | 7 | | No
R8 | 0 | 3 | 45 | Yes
R8 | 5 | 46 | | No

Page 35: Automating the Analysis of Simulation Output Data

Replication Work Discussion

• kLimit default value set to 5.

• Initial number of replications set to 3.

• Multiple response variables - Algorithm run with each response - use maximum estimated value for Nsol.

• Different scenarios - advisable to repeat algorithm every few scenarios to check that precision has not degraded significantly.

• Inclusion into SIMUL8 package: Full explanations of algorithm and results.
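The rule for multiple response variables is a one-line calculation: run the algorithm once per response and keep the largest recommendation. In the sketch below the response names and Nsol values are made up for illustration.

```python
def overall_nsol(nsol_by_response):
    """Return the number of replications that satisfies every response."""
    return max(nsol_by_response.values())


# Hypothetical per-response recommendations from separate algorithm runs
print(overall_nsol({"queue_time": 12, "throughput": 7, "utilisation": 20}))
```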

Page 36: Automating the Analysis of Simulation Output Data

Summary Of Replications Work

• Selection and automation of Confidence Interval Method for estimating the number of replications to be run in a simulation.

• Algorithm created with ‘look ahead’ period: efficient, and performs well on a wide selection of artificial and real model output.

• ‘Black box’ - fully automated and does not require user intervention.

Page 37: Automating the Analysis of Simulation Output Data

PROJECT OVERVIEW

• Created set of artificial and “real” model data including warm-up bias functions.

• Created replication algorithm.

Currently:

• Testing warm-up methods.

Page 38: Automating the Analysis of Simulation Output Data

ACKNOWLEDGMENTS

This work is part of the Automating Simulation Output Analysis (AutoSimOA) project that is funded by the UK Engineering and Physical Sciences Research Council (EPSRC) (EP/D033640/1). The work is being carried out in collaboration with SIMUL8 Corporation, who are also providing sponsorship for the project.

Stewart Robinson, Katy Hoad, Ruth Davies

SSIG Meeting, 24th October 2007

http://www.wbs.ac.uk/go/autosimoa