
Basic Concepts of Probability and Statistics for Reliability Engineering

Ernesto Gutierrez-Miravete

Spring 2007

1 Introduction

1.1 Probability

Events and sample space are fundamental concepts in probability. A sample space S is the set of all possible outcomes of an experiment whose outcome cannot be determined in advance, while an event E is a subset of S. The probability of the event E, P(E), is a number satisfying the following axioms

0 ≤ P (E) ≤ 1

P (S) = 1

P(⋃ Ei) = ∑ P(Ei)

where the various Ei's are mutually exclusive events.

One can associate with each occurrence in S a numerical value. A random variable X is a function assigning a real number to each member of S. Random variables can adopt discrete values or continuous values.

1.2 Discrete random variables

If X is a random variable (i.e. a function defined over the elements of a sample space) with a finite number of possible values xi ∈ RX with i = 1, 2, ..., where RX is the range of values of the random variable, then it is a discrete random variable.

The probability of X having a specific value xi, p(xi) = P(X = xi), is a number such that


p(xi) ≥ 0

for every i = 1, 2, ..., and

∑_{i=1}^{∞} p(xi) = 1

The collection of pairs (xi, p(xi)), i = 1, 2, ..., is called the probability distribution of X, and p(xi) is the probability mass function of X.

Two examples of discrete random variables are:

• Number of jobs arriving at a job shop each week.

• Tossing a loaded die.
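As a quick numerical sketch of these definitions (plain Python; the loaded-die probabilities below are invented for illustration), the pmf conditions can be checked directly:

```python
# Hypothetical pmf for a loaded die that favors 6 (illustrative values only).
pmf = {1: 0.10, 2: 0.10, 3: 0.15, 4: 0.15, 5: 0.20, 6: 0.30}

# Axiom checks: every p(xi) >= 0 and the probabilities sum to 1.
assert all(p >= 0 for p in pmf.values())
assert abs(sum(pmf.values()) - 1.0) < 1e-12

# Probability of the event "roll is even": sum over the member outcomes.
p_even = pmf[2] + pmf[4] + pmf[6]
print(round(p_even, 2))  # 0.55
```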

1.3 Continuous random variables

If RX is an interval rather than a discrete set then X is a continuous random variable. The probability that X ∈ [a, b] is

P(a ≤ X ≤ b) = ∫_a^b f(x) dx

where f(x) is the probability density function of X satisfying (for all x ∈ RX)

f(x) ≥ 0

∫_{RX} f(x) dx = 1

and, if x ∉ RX,

f(x) = 0

Two examples of continuous random variables are:

• The life of a device.

• Temperature readings in a turbulent flow field.


1.4 Cumulative distribution function

The probability that X ≤ x, P(X ≤ x) = F(x), is the cumulative distribution function. The CDF is defined as

F(x) = ∑_{i=1}^{n} p(xi)

for discrete X with xn ≤ x, and as

F(x) = ∫_{−∞}^{x} f(t) dt

for continuous X.

Note that if a < b then F(a) ≤ F(b), lim_{x→∞} F(x) = 1, lim_{x→−∞} F(x) = 0 and P(a ≤ X ≤ b) = F(b) − F(a).

Exercise. Determine the probabilities of various outcomes in tossing a loaded die and also the probability that a device has a certain life.
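A minimal sketch of this exercise in Python. The die pmf and the 1000-hour mean device life are assumed values, not from the text:

```python
import math

# Discrete case: CDF of a hypothetical loaded die, F(x) = sum of p(xi) for xi <= x.
pmf = {1: 0.10, 2: 0.10, 3: 0.15, 4: 0.15, 5: 0.20, 6: 0.30}

def F_die(x):
    return sum(p for xi, p in pmf.items() if xi <= x)

# Continuous case: device life assumed exponential with a 1000-hour mean,
# so F(t) = 1 - exp(-t / mean_life) gives P(life <= t).
mean_life = 1000.0

def F_life(t):
    return 1.0 - math.exp(-t / mean_life)

print(round(F_die(3), 2))     # 0.35
print(round(F_life(500), 4))  # probability the device fails within 500 h
```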

1.5 Expectation and Moment Generating Function

The expected value of a random variable X, also called the expectation of X, is

E(X) = ∑_{i=1}^{n} xi p(xi)

for discrete X and

E(X) = ∫_{−∞}^{∞} x f(x) dx

for continuous X. E(X) is also called the mean or the first moment of X. Generalizing, the nth moment of X is

E(X^n) = ∑_{i=1}^{n} xi^n p(xi)

for discrete X and

E(X^n) = ∫_{−∞}^{∞} x^n f(x) dx

for continuous X.

A moment generating function of a random variable X can be defined as

ψ(t) = E(e^{tX}) = ∫ e^{tx} dF(x)


Moments of all orders for X are obtained as the derivatives of ψ evaluated at t = 0. The existence of a moment generating function uniquely determines the distribution of X.

The variance of X, V(X) = var(X) = σ^2, is

σ^2 = E((X − E(X))^2) = E(X^2) − (E(X))^2

The standard deviation of X is σ = √(σ^2). The third and fourth moments of a distribution are associated with its skewness and its kurtosis, respectively.

Exercises. Determine the expectations of various outcomes in tossing a loaded die and that of a certain device having a certain life.

Another important statistic is the covariance of two random variables X and Y, Cov(X,Y).

This is defined as

Cov(X,Y ) = E(XY ) − E(X)E(Y )

If Cov(X,Y) = 0 the variables are said to be uncorrelated. Further, the correlation coefficient ρ(X,Y) is defined as

ρ(X,Y) = Cov(X,Y) / (var(X) var(Y))^{1/2}

The conditional probability gives the probability that a random variable X = x given that Y = y and is defined as

P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y)

Exercise. In a population of N people NA are color blind, NH are female and NAH are color blind females. If a person chosen at random turns out to be a female, what is the probability that she will also be color blind?
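A sketch of this exercise with made-up population counts (the values of N, NA, NH and NAH below are illustrative only): by the definition above, P(color blind | female) = (NAH/N)/(NH/N) = NAH/NH.

```python
# Hypothetical population counts for the conditional-probability exercise.
N = 1000     # total population
N_A = 40     # color blind people
N_H = 520    # females
N_AH = 2     # color blind females

# P(color blind | female) = P(color blind and female) / P(female).
p_female = N_H / N
p_both = N_AH / N
p_cond = p_both / p_female
print(p_cond)  # equals N_AH / N_H

assert abs(p_cond - N_AH / N_H) < 1e-12
```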

1.6 Law of Large Numbers and the Central Limit Theorem

The following limit theorems are of fundamental and practical importance. They are given here without proof.

The strong law of large numbers states that if the random variables X1, X2, ..., Xn are independent and identically distributed (iid) with mean µ then the limit

lim_{n→∞} (∑_{i=1}^{n} Xi)/n = lim_{n→∞} X̄ = µ

with probability P = 1.

Furthermore, if the variance of the distribution of the Xi above is σ^2, the central limit theorem states that

lim_{n→∞} P[(X̄ − µ)/(σ/√n) ≤ a] = ∫_{−∞}^{a} (1/√(2π)) e^{−x^2/2} dx


In words, the theorem states that the distribution of the normalized random variable (X̄ − µ)/(σ/√n) approaches the standard normal distribution of mean 0 and standard deviation 1.
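A Monte Carlo sketch of the theorem (plain Python; the uniform population and the sample sizes are arbitrary choices): sample means of a uniform [0, 1] population, which has µ = 0.5 and σ^2 = 1/12, should cluster around µ with spread close to σ/√n.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible
n = 400         # size of each sample (arbitrary choice)

# 2000 sample means, each over n uniform [0, 1] draws.
means = [statistics.fmean(random.random() for _ in range(n)) for _ in range(2000)]

mu_hat = statistics.fmean(means)
sd_hat = statistics.stdev(means)
print(round(mu_hat, 3), round(sd_hat, 4))  # near 0.5 and sqrt(1/12)/20 ≈ 0.0144

assert abs(mu_hat - 0.5) < 0.01
```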

2 Discrete Distributions

2.1 Bernoulli distribution

Consider an experiment consisting of n independent trials, each with two possible outcomes, namely success and failure. If Xj = 1 for a success and Xj = 0 for a failure and the probability of success remains constant from trial to trial, the probability of success at the jth trial is given by the Bernoulli distribution as follows

pj(xj) = p(xj) = p for xj = 1, 1 − p = q for xj = 0, and 0 otherwise (j = 1, 2, ..., n)

Note that E(Xj) = p and V(Xj) = pq.

The outcome of tossing a fair coin n times can be represented by a Bernoulli distribution with p = q = 1/2.
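A simulation sketch of these moments (p = 0.3 is an arbitrary illustration value):

```python
import random

p, q = 0.3, 0.7  # assumed success and failure probabilities
random.seed(2)   # reproducible run

# 100,000 Bernoulli trials: Xj = 1 with probability p, else 0.
trials = [1 if random.random() < p else 0 for _ in range(100_000)]

mean = sum(trials) / len(trials)
var = sum((x - mean) ** 2 for x in trials) / len(trials)
print(round(mean, 3), round(var, 3))  # near E(X) = p and V(X) = pq = 0.21

assert abs(mean - p) < 0.01 and abs(var - p * q) < 0.01
```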

2.2 Binomial distribution

The number of successes in n Bernoulli trials is a random variable X with the binomial distribution p(x)

p(x) = (n choose x) p^x q^{n−x} for x = 0, 1, 2, ..., n, and 0 otherwise

where (n choose x) = n!/(x!(n − x)!)

Note that E(X) = np and V(X) = npq.

Consider as an example the following situation from quality control in chip manufacture, where the probability of finding more than 2 nonconforming chips in a sample of 50 is

P(X > 2) = 1 − P(X ≤ 2) = 1 − ∑_{x=0}^{2} (50 choose x) p^x q^{50−x}
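This sum can be evaluated directly with `math.comb`. Since the text does not give the nonconformance probability, p = 0.02 below is an assumed figure:

```python
from math import comb

n, p = 50, 0.02  # sample size from the text; p is an assumed figure
q = 1 - p

# P(X <= 2): binomial sum for x = 0, 1, 2.
p_le_2 = sum(comb(n, x) * p**x * q**(n - x) for x in range(3))
p_gt_2 = 1 - p_le_2
print(round(p_gt_2, 4))  # probability of more than 2 nonconforming chips

assert 0 < p_gt_2 < 1
```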


2.3 Geometric distribution

The number of trials required to achieve the first success is a random variable X with the geometric distribution p(x)

p(x) = q^{x−1} p for x = 1, 2, ..., and 0 otherwise

Note that E(X) = 1/p and V(X) = q/p^2.

Exercise. In acceptance sampling one must determine, for example, the probability that the first acceptable item found is the third one inspected, given that 40% of items are rejected during inspection. Determine the values of x and q and find p(x).
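A worked sketch of this exercise: 40% rejected means q = 0.4 and p = 0.6, and "first acceptable item is the third inspected" means x = 3.

```python
p, q, x = 0.6, 0.4, 3  # values implied by the exercise statement

# Geometric pmf: two rejects followed by the first acceptable item.
prob = q ** (x - 1) * p
print(round(prob, 3))  # 0.4^2 * 0.6 = 0.096

assert abs(prob - 0.096) < 1e-12
```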

2.4 Poisson distribution

If α > 0, the Poisson probability mass function is

p(x) = exp(−α) α^x / x! for x = 0, 1, 2, ..., and 0 otherwise

Note that E(X) = V (X) = α. The cumulative distribution function is

F(x) = ∑_{i=0}^{x} exp(−α) α^i / i!

Examples of Poisson distributed random variables include

• The number of customers arriving at a bank.

• Beeper calls to an on-call service person.

• Lead time demand in inventory systems.
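A sketch of the pmf and CDF in plain Python; the rate α = 4 events per period is an assumed value:

```python
from math import exp, factorial

alpha = 4.0  # assumed mean number of events per period

def pmf(x):
    return exp(-alpha) * alpha**x / factorial(x)

def cdf(x):
    return sum(pmf(i) for i in range(x + 1))

# Check E(X) = alpha by truncating the infinite sum (the tail is negligible).
mean = sum(x * pmf(x) for x in range(60))
print(round(mean, 6), round(cdf(6), 4))

assert abs(mean - alpha) < 1e-9
```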

3 Continuous Distributions

3.1 Uniform distribution

For a random variable X which is uniformly distributed in [a, b] the uniform probability density function is

f(x) = 1/(b − a) for a ≤ x ≤ b, and 0 otherwise


while its cumulative distribution function is

F(x) = 0 for x < a, (x − a)/(b − a) for a ≤ x < b, and 1 for x ≥ b

Note that P(x1 < X < x2) = F(x2) − F(x1) = (x2 − x1)/(b − a). Note also that E(X) = (a + b)/2 and V(X) = (b − a)^2/12.

Examples of uniformly distributed random variables could be:

• Inter arrival time for calls seeking a forklift in warehouse operations.

• Five minute wait probability for passenger at a bus stop.

• Readings from a table of random numbers.

3.2 Exponential distribution

If λ > 0, the exponential probability density function of X is

f(x) = λ exp(−λx) for x ≥ 0, and 0 elsewhere

while its cumulative distribution function is

F(x) = 0 for x < 0, and F(x) = ∫_0^x λ exp(−λt) dt = 1 − exp(−λx) for x ≥ 0

Note that E(X) = 1/λ and V(X) = 1/λ^2.

Examples of exponentially distributed random variables include:

• Inter arrival times of commercial aircraft at an airport.

• Life of a device.

The exponential distribution possesses the memoryless property, i.e. if s ≥ 0 and t ≥ 0 then P(X > s + t | X > s) = P(X > t). Clearly, unless there is agreement beforehand, the time one person arrives at the bank is independent of the arrival time of the next person. Another example is that of the life of a used component which is as good as new. In the discrete case the geometric distribution also possesses the memoryless property.
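The memoryless property can be checked by simulation (λ, s and t below are arbitrary choices):

```python
import math
import random

lam, s, t = 0.5, 1.0, 1.0  # arbitrary rate and time offsets
random.seed(3)             # reproducible run

lives = [random.expovariate(lam) for _ in range(200_000)]

# Estimate P(X > s+t | X > s) from the sample and compare with P(X > t).
survived_s = [x for x in lives if x > s]
p_cond = sum(x > s + t for x in survived_s) / len(survived_s)
p_t = math.exp(-lam * t)  # theoretical P(X > t) = 1 - F(t)

print(round(p_cond, 3), round(p_t, 3))  # the two should nearly agree
assert abs(p_cond - p_t) < 0.01
```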


3.3 Gamma distribution

The gamma function of parameter β > 0, Γ(β) is

Γ(β) = ∫_0^∞ x^{β−1} exp(−x) dx

Note that Γ(β) = (β − 1)Γ(β − 1) and, for integer β, Γ(β) = (β − 1)!.

A random variable X has a gamma probability density function with shape parameter β and scale parameter θ if

f(x) = (βθ/Γ(β)) (βθx)^{β−1} exp(−βθx) for x > 0, and 0 otherwise

The cumulative distribution function is

F(x) = 1 − ∫_x^∞ (βθ/Γ(β)) (βθt)^{β−1} exp(−βθt) dt for x > 0, and 0 for x ≤ 0

Note that E(X) = 1/θ and V(X) = 1/(βθ^2).

3.4 Erlang distribution

If above β = k where k is an integer, the Erlang distribution of order k is obtained. The cumulative distribution function is

F(x) = 1 − ∑_{i=0}^{k−1} exp(−kθx)(kθx)^i / i! for x > 0, and 0 for x ≤ 0

Examples of gamma distributions occur for random variables associated with the reliability function and in the probability that a process consisting of several steps will have a given duration.
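The Erlang CDF above is straightforward to evaluate; a sketch with assumed parameters (k = 3 stages, θ = 0.5) follows:

```python
from math import exp, factorial

def erlang_cdf(x, k, theta):
    """F(x) = 1 - sum_{i=0}^{k-1} exp(-k*theta*x) (k*theta*x)^i / i! for x > 0."""
    if x <= 0:
        return 0.0
    a = k * theta * x
    return 1.0 - sum(exp(-a) * a**i / factorial(i) for i in range(k))

# Assumed parameters: a 3-stage process with theta = 0.5.
F = erlang_cdf(2.0, 3, 0.5)
print(round(F, 4))  # probability the whole process finishes by x = 2

assert 0 < F < 1 and erlang_cdf(0.0, 3, 0.5) == 0.0
```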

3.5 Normal distribution

A random variable X with mean µ and variance σ^2 has a normal distribution (X ∼ N(µ, σ)) if its probability density function for x ∈ (−∞, ∞) is

f(x) = (1/(σ√(2π))) e^{−(1/2)((x−µ)/σ)^2}

The cumulative distribution function is

F(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/(σ√(2π))) e^{−(1/2)((t−µ)/σ)^2} dt


The standardized random variable Z = (X − µ)/σ has a mean of zero and standard deviation of 1. Its probability density function is:

φ(z) = (1/√(2π)) e^{−z^2/2}

and the cumulative distribution function is

Φ(z) = P(Z ≤ z) = ∫_{−∞}^{z} (1/√(2π)) e^{−t^2/2} dt

Examples of normally distributed random variables abound. A few of them are:

• Time to perform a task.

• Time waiting in a queue.

• Lead time demand for an item.
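Φ has no closed form, but it can be evaluated through the error function: Φ(z) = (1 + erf(z/√2))/2. A sketch (the task-time parameters µ and σ below are assumed for illustration):

```python
from math import erf, sqrt

def phi(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# P(X <= x) for X ~ N(mu, sigma) after standardizing; mu = 10 and
# sigma = 2 are assumed values (e.g. a task time in minutes).
mu, sigma = 10.0, 2.0
x = 13.0
p = phi((x - mu) / sigma)
print(round(p, 4))  # Phi(1.5) ≈ 0.9332

assert abs(p - 0.9332) < 1e-3
```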

3.6 Lognormal distribution

A random variable X has a lognormal distribution (X ∼ LN(θ, m, σ)) if its probability density function for x ∈ [0, ∞) is given by

f(x) = (1/((x − θ)σ√(2π))) e^{−[ln((x−θ)/m)]^2/(2σ^2)}

where θ is the location parameter (often = 0) and m is the scale parameter. When θ = 0 and m = 1 one has the standard lognormal distribution.

The cumulative distribution function is

F(x) = P(X ≤ x) = Φ(ln(x/m)/σ) (for θ = 0)

where Φ is the cumulative distribution function of the normal distribution.

Because of its relation with the normal distribution of mean µ and variance σ^2, the probability density function of the lognormal distribution with location parameter θ = 0 is sometimes expressed as

f(x) = (1/(xσ√(2π))) e^{−(ln x − µ)^2/(2σ^2)}

here, µ and σ^2 are the mean and variance of the random variable's logarithm. The expected value (mean) of the lognormally distributed random variable is

E(X) = exp(µ + σ^2/2)

and the variance is

var(X) = exp(2µ + 2σ^2) − exp(2µ + σ^2) = exp(2µ + σ^2)(exp(σ^2) − 1)
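These moment formulas can be checked by simulating exp(normal) draws (µ = 0.5 and σ = 0.4 are arbitrary illustration values):

```python
import math
import random
import statistics

mu, sigma = 0.5, 0.4  # assumed parameters of the underlying normal

mean_theory = math.exp(mu + sigma**2 / 2)
var_theory = math.exp(2*mu + 2*sigma**2) - math.exp(2*mu + sigma**2)

# A lognormal draw is exp of a normal draw.
random.seed(4)
sample = [math.exp(random.gauss(mu, sigma)) for _ in range(200_000)]

print(round(mean_theory, 4), round(statistics.fmean(sample), 4))
assert abs(statistics.fmean(sample) - mean_theory) < 0.02
```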


3.7 Weibull distribution

A random variable X associated with the three parameters −∞ < ν < ∞ (location), α > 0 (scale) and β > 0 (shape), has a Weibull distribution if its probability density function is

f(x) = (β/α)((x − ν)/α)^{β−1} exp(−((x − ν)/α)^β) for x ≥ ν, and 0 otherwise

The cumulative probability distribution function is

F(x) = 0 for x < ν, and F(x) = 1 − exp(−((x − ν)/α)^β) otherwise

If ν = 0, the probability density function becomes

f(x) = (β/α)(x/α)^{β−1} exp(−(x/α)^β) for x ≥ 0, and 0 otherwise

The corresponding cumulative distribution function is

F(x) = 0 for x < 0, and F(x) = 1 − exp(−(x/α)^β) otherwise

If ν = 0 and β = 1, the probability density function becomes

f(x) = (1/α) exp(−x/α) for x ≥ 0, and 0 otherwise

i.e. the exponential distribution with parameter λ = 1/α.

The mean and variance of the Weibull distribution are, respectively, E(X) = ν + αΓ(1/β + 1) and V(X) = α^2(Γ(2/β + 1) − Γ(1/β + 1)^2).

• Mean time to failure of flat panel screens.

• Probability of clearing an airport runway within a given time.
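The mean and variance formulas above can be evaluated with `math.gamma`. In the sketch below ν = 0, α = 1000 h and β = 1.5 are assumed component-life parameters; as a sanity check, β = 1 must reduce to the exponential case (mean α, variance α^2):

```python
import math

nu, alpha, beta = 0.0, 1000.0, 1.5  # assumed Weibull life parameters

mean = nu + alpha * math.gamma(1/beta + 1)
var = alpha**2 * (math.gamma(2/beta + 1) - math.gamma(1/beta + 1)**2)
print(round(mean, 1), round(math.sqrt(var), 1))  # mean life and its std. dev.

# Sanity check: beta = 1 gives the exponential distribution.
m1 = alpha * math.gamma(1/1 + 1)
v1 = alpha**2 * (math.gamma(2/1 + 1) - math.gamma(1/1 + 1)**2)
assert abs(m1 - alpha) < 1e-6 and abs(v1 - alpha**2) < 1e-3
```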

3.8 Extreme Value (Gumbel) distribution

A random variable X has an Extreme Value (Gumbel) distribution (X ∼ EV(µ, β)) if its probability density function for x ∈ (−∞, ∞) is given by

f(x) = (1/β) e^{(x−µ)/β} e^{−e^{(x−µ)/β}}


where µ is the location parameter and β is the scale parameter. When µ = 0 and β = 1 one has the standard Gumbel distribution.

The cumulative distribution function of the standard Gumbell distribution is given by

F(x) = P(X ≤ x) = 1 − e^{−e^{x}}

This distribution has been found useful for the description of extreme events such as floods or earthquakes.

3.9 Triangular distribution

The triangular probability density function is

f(x) = 2(x − a)/((b − a)(c − a)) for a ≤ x ≤ b, 2(c − x)/((c − b)(c − a)) for b < x ≤ c, and 0 otherwise

while its cumulative distribution function is

F(x) = 0 for x ≤ a, (x − a)^2/((b − a)(c − a)) for a < x ≤ b, 1 − (c − x)^2/((c − b)(c − a)) for b < x ≤ c, and 1 for x > c

The mean E(X) = (a + b + c)/3 and the mode M = b. The median is obtained by setting F(x) = 0.5 and solving for x.

The triangular distribution is a useful one when the only information available about the random variable is its extreme (minimum and maximum) and most likely values.
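A sketch of solving F(x) = 0.5 in closed form (a, b and c below are assumed task-time estimates): on [a, b] the solution is x = a + √(0.5(b−a)(c−a)), on [b, c] it is x = c − √(0.5(c−b)(c−a)), and which branch applies depends on whether F(b) = (b−a)/(c−a) reaches 0.5.

```python
import math

a, b, c = 2.0, 5.0, 10.0  # assumed minimum, most likely and maximum values

mean = (a + b + c) / 3
mode = b

# Solve F(x) = 0.5, choosing the branch according to F(b) = (b-a)/(c-a).
if (b - a) / (c - a) >= 0.5:
    median = a + math.sqrt(0.5 * (b - a) * (c - a))
else:
    median = c - math.sqrt(0.5 * (c - b) * (c - a))
print(round(mean, 3), round(median, 3))

# The CDF evaluated at the median must be 0.5.
if median <= b:
    F = (median - a) ** 2 / ((b - a) * (c - a))
else:
    F = 1 - (c - median) ** 2 / ((c - b) * (c - a))
assert abs(F - 0.5) < 1e-12
```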

4 Empirical Distributions

If the distribution function of a random variable cannot be specified in terms of a known distribution and field data is available, one can use an empirical distribution. Empirical distributions can be discrete or continuous.

5 Inferences, Estimation and Test of Hypotheses

Statistical inference is a collection of methods designed to investigate the characteristics of a certain population using only information obtained from a random sample extracted from such population. Inference is an aid in making decisions when confronted with uncertainty and it is the foundation of modern decision theory.


Estimation consists of the determination of the value or range of values of a parameter of the population using the sample data. Confidence intervals with a specified degree of confidence are used in interval estimation.

Sometimes, rather than in the value of a parameter, one is interested in the validity of a certain statement (hypothesis testing). In such cases one can encounter the following situations:

• Accept the statement, it being true (No error is involved).

• Reject the statement, it being true (Type I error).

• Accept the statement, it being false (Type II error).

• Reject the statement, it being false (No error is involved).

One is then interested in the probabilities of incurring Type I and Type II errors (respectively, α and β).

Two commonly used statistical inference tests in simulation modeling are the Chi-squared and Kolmogorov-Smirnov tests.

Exercise. Do some research and find out how the Chi-squared and Kolmogorov-Smirnov tests are performed.

6 Useful Probabilistic and Statistical Models

6.1 Stochastic Processes

A stochastic process takes place in a system when the state of the system changes with time in a random manner. Many if not most natural and/or human-made processes are stochastic processes, although in some cases the random aspects can be neglected.

6.2 Poisson Process

Often one is interested in the number of events which occur over a certain interval of time, i.e. a counting process (N(t), t ≥ 0). A counting process is a Poisson process if it involves

• One arrival at a time.

• Random arrivals without rush or slack periods (stationary increments).

• Independent increments.


Under these circumstances, the probability that N(t) = n for t ≥ 0 and n = 0, 1, 2, ... is

P(N(t) = n) = exp(−λt)(λt)^n / n!

This means that N(t) has a Poisson distribution with parameter α = λt. Its mean and variance are E(N(t)) = V(N(t)) = α = λt. It can be shown that if the number of arrivals has a Poisson distribution, the inter arrival times have an exponential distribution.

The random splitting property of Poisson processes states that if N(t) = N1(t) + N2(t) is Poisson with rate λ, then N1 and N2 are independent Poisson with rates λp and λ(1 − p), where p and 1 − p are the probabilities of the branches N1 and N2. Similarly, if N1(t) + N2(t) = N(t), the reverse is true (random pooling property).

6.3 Markov Chains and the Kolmogorov Balance Equations

If the future probability characteristics of a system in which a stochastic process is taking place depend only on the state of the system at the current time, one has a Markov process or chain. The effect of the past on the future is contained in the present state of the system.

As a simple example of a Markov chain consider a machine that works until it fails (randomly) and then resumes work once it is repaired. There are two states for this system, namely

• The machine is busy (S0)

• The machine is being repaired (S1)

The system moves from state S0 to S1 at a rate λ and from S1 back to S0 at a rate µ.

Exercise. Make a graph representing the above Markov chain.

As a second example consider now a facility where two machines A and B perform an operation. The machines fail randomly but resume work once they are repaired. The four possible states of this system are

• Both machines are busy (S0)

• Machine A is being repaired while B is busy (S1)

• Machine B is being repaired while A is busy (S2)

• Both machines are being repaired (S3)

Now λ1 and λ2 are, respectively, the failure rates of machines A and B while µ1 and µ2 are the corresponding repair rates.

Exercise. Make a graph representing the above Markov chain.


The Kolmogorov Balance Equations are differential equations relating the probabilities of the various states involved in a Markov chain P0, P1, P2 and P3. They are obtained by a probability balance on the states. For the second example above they are

dP0/dt = µ1 P1 + µ2 P2 − (λ1 + λ2) P0

dP1/dt = λ1 P0 + µ2 P3 − (µ1 + λ2) P1

dP2/dt = λ2 P0 + µ1 P3 − (λ1 + µ2) P2

dP3/dt = λ2 P1 + λ1 P2 − (µ1 + µ2) P3

Under steady state or equilibrium conditions the time derivatives are zero and the probabilities are then related by a system of simultaneous linear algebraic equations.
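For the simpler one-machine chain above the steady-state system can be solved by hand: λP0 = µP1 with P0 + P1 = 1. The sketch below checks this; the failure and repair rates are assumed illustration values.

```python
lam, mu = 0.1, 0.5  # assumed failure and repair rates (per hour)

# Steady state of the two-state machine chain: lam*P0 = mu*P1, P0 + P1 = 1.
P0 = mu / (lam + mu)   # probability the machine is busy
P1 = lam / (lam + mu)  # probability the machine is under repair
print(round(P0, 4), round(P1, 4))

# Balance check: probability flow out of S0 equals flow into S0.
assert abs(lam * P0 - mu * P1) < 1e-12
assert abs(P0 + P1 - 1.0) < 1e-12
```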

6.4 Queueing Systems and Little’s Formula

A queueing system involves one or more servers which provide some service to customers who arrive, line up and wait for service at a queue when all the servers are busy. Typically, both arrival and service times are random variables. The single server queue consists of a single server and a single queue. If the inter arrival times of customers and the service times are exponentially distributed the resulting queue is known as the M/M/1 queue.

Inter arrival and service times in queues are often modeled probabilistically. Two examples of queueing systems are:

• Inter arrival times of mechanics at a centralized tool crib.

• Number of mechanics arriving at a centralized tool crib per time period.

Random inter arrival and service times are often simulated using exponential distributions. However, sometimes a normal distribution or a truncated normal distribution may be more appropriate. Gamma and Weibull distributions are also used.

An important parameter of the queueing system is the server utilization ρ given by

ρ = λ/µ

where λ is the mean arrival rate of customers from the outside world into the queueing system and µ is the mean service rate.


The single server queue can also be regarded as a Markov chain in which the various states are distinguished only by the number of customers waiting in the queue. Let us call the corresponding states S0, S1, ..., Sn. The system can then move into state Si either from Si−1 (if a new customer arrives before service is completed for the customer being served) or from Si+1 if service is completed and the next customer in line begins service before any new arrival. Let λi,j be the rate at which the system transitions from state Si to state Sj.

Exercise. Make a graph representing the Markov chain for the single teller queue.

If the queue is at steady state, the Kolmogorov equations yield

Pn = (λn−1,n ⋯ λ1,2 λ0,1)/(λn,n−1 ⋯ λ2,1 λ1,0) P0

where Pn is the probability of encountering n customers in the system and λ0,1 = λ.

Exercise. Derive the above expression.

In investigating queueing systems one is interested in performance measures such as the expected number of customers in the system L, the expected number of customers in the queue Lq, the expected wait time of customers in the system W, and the expected wait time of customers in the queue Wq. The above expectancies are related by Little's Formula. The formula simply states that

L = λW

or that

Lq = λWq

Exercise. Derive the above expression relating L and W.

A number of queueing problems have been solved yielding closed form expressions for the above performance parameters. For instance, for the M/M/1 queue at steady state the results are as follows

L = λ/(µ − λ) = ρ/(1 − ρ)

W = 1/(µ − λ) = 1/(µ(1 − ρ))

Lq = λ^2/(µ(µ − λ)) = ρ^2/(1 − ρ)

Wq = λ/(µ(µ − λ)) = ρ/(µ(1 − ρ))

Pn = (1 − λ/µ)(λ/µ)^n
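A sketch evaluating the M/M/1 formulas and confirming Little's formula (λ = 4 and µ = 5 are assumed rates with λ < µ):

```python
lam, mu = 4.0, 5.0  # assumed arrival and service rates (customers/hour)
rho = lam / mu

# M/M/1 steady-state measures.
L = lam / (mu - lam)
W = 1.0 / (mu - lam)
Lq = lam**2 / (mu * (mu - lam))
Wq = lam / (mu * (mu - lam))
print(L, W, Lq, Wq)  # 4.0 1.0 3.2 0.8 for these rates

# Little's formula: L = lam*W and Lq = lam*Wq.
assert abs(L - lam * W) < 1e-12
assert abs(Lq - lam * Wq) < 1e-12
```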

Further, for the M/G/1 queue in which the service times have a mean of 1/µ and a variance σ^2 the corresponding results are

L = ρ + λ^2(1/µ^2 + σ^2)/(2(1 − ρ)) = ρ + ρ^2(1 + σ^2µ^2)/(2(1 − ρ))

W = 1/µ + λ(1/µ^2 + σ^2)/(2(1 − ρ))

Lq = λ^2(1/µ^2 + σ^2)/(2(1 − ρ)) = ρ^2(1 + σ^2µ^2)/(2(1 − ρ))

Wq = λ(1/µ^2 + σ^2)/(2(1 − ρ))

P0 = 1 − ρ


7 Matching Data with Distributions

For the sake of computational convenience in reliability analysis and modeling, raw data which is known to consist of independent and identically distributed (i.i.d.) entries is fitted to a theoretical distribution function. This is nowadays easily done using programs such as Stat::Fit or Expert.Fit.

Time to failure of complex components is usually represented with the Weibull distribution. If failures are completely random, the exponential distribution is used, but if failure times fluctuate equally around a mean, the normal distribution may be more appropriate. The lognormal distribution can also be used. For incomplete data the uniform, triangular and beta distributions are used.

Data must then be tested for independence. Some useful tools are:

• Scatter Plots. Contiguous values in a string of values of a random variable are plotted on an x–y plane. The resulting pattern of points is characteristic of the distribution.

• Autocorrelation Plots. The covariance of values separated by a specified lag in a string of values of a random variable is plotted as a function of the number of data points.

• Runs Tests. This searches for peculiar patterns in substrings of numbers from a larger stream.

Data must also be tested to see if they are Identically Distributed (Homogeneity Tests). Some useful tools are:

• Histograms.

• Distribution Plots.

• Quantile-Quantile Plots.

• Kolmogorov-Smirnov.

• Chi-squared.

• Time Dependency of Distributions.

• ANOVA.

The collected data typically consists of a limited number of data values. Simulation modeling requires large numbers of samples, therefore the data must be converted to a frequency distribution.

Raw data can be used as input for the simulation project but this is usually not recommended except in special cases. More commonly, once data have been tested for independence and correlation they are converted to a form suitable for use in the simulation model. This is


done by fitting it to some distribution. Once a distribution fitting the data has been determined, input for the simulation program is produced as random variates sampled from the fitted distribution. The frequency distribution selected can be empirical or theoretical, discrete or continuous. Discrete distributions are rarely directly used. Instead, numerical values of discrete probabilities are directly used. Effectively, continuous, theoretical distributions are almost always employed.

Of the many available theoretical distributions 12 or so are commonly used in simulation modeling. Data are fitted to theoretical distributions by identifying the theoretical distribution which best represents the data. Stat::Fit provides a ranking of distributions fitting a particular data set together with a goodness of fit (Chi-squared or Kolmogorov-Smirnov) diagnostic. Note also that if the fitted distribution is unbounded, values for simulation should rather be taken from a truncated version of the selected distribution in order to avoid unrealistic extreme values.

7.1 Physical Basis of Common Distributions

Each statistical distribution function has a physical basis. An understanding of this basis is useful in determining candidate distributions to represent field data. Following is a brief summary of the physical basis of selected distributions.

• Binomial. This represents the distribution of a random variable giving the number of successes in n independent trials each yielding either success or failure with probabilities p and 1 − p, respectively.

• Geometric. This represents the distribution of a random variable giving the number of independent trials required in an experiment before k successes are achieved.

• Poisson. This represents the distribution of a random variable giving the number of independent events occurring within a fixed amount of time.

• Normal. This represents the distribution of a random variable which is itself the result of the sum of component processes.

• Lognormal. This represents the distribution of a random variable which is itself the result of the product of component processes.

• Exponential. This represents the distribution of a random variable giving the time interval between independent events.

• Gamma. A distribution of broad applicability restricted to non-negative random variables.

• Beta. A distribution of broad applicability restricted to bounded random variables.

• Beta. A distribution of broad applicability restricted to bounded random variables.


• Erlang. This represents the distribution of a random variable which is itself the result of the sum of exponential component processes.

• Weibull. This represents the distribution of a random variable giving the time to failure of a component.

• Uniform. This represents the distribution of a random variable whose values are completely uncertain.

• Triangular. This represents the distribution of a random variable for which only minimum, most likely and maximum values are known.

7.2 Common Situations where Specific Distributions are Useful Representations of Collected Data

Input data for DES models must often be created according to a specific statistical distribution. The required distribution must be identified based on how well it represents the collected data. Following is a brief summary of the real-life situations where the distributions mentioned above are likely to be encountered.

• Binomial. Useful when there are only two possible outcomes of an experiment which is repeated multiple times.

• Geometric. Useful also when there are only two possible outcomes of an experiment which is repeated multiple times.

• Poisson. Useful to represent the number of incoming customers or requests into a system.

• Normal. Useful to represent the distribution of errors of all kinds.

• Lognormal. Useful for representation of times required to perform a given task or accomplish a certain goal.

• Exponential. Useful to represent inter arrival times in all kinds of situations.

• Gamma. Useful also for representation of times required to perform a given task or accomplish a certain goal, but more general.

• Beta. Useful as a rough model under situations of ignorance and/or to represent the proportion of non-conforming items in a set.

• Erlang. Useful to represent systems making simultaneous requests for attention from a server.


• Weibull. Useful to represent the life and/or reliability of components.

• Uniform. Useful when one knows nothing about the system.

• Triangular. Useful when one knows little about the system.

7.3 Short-Cut Methods for Distribution Identification

In DES modeling the collected input data is often replaced by computer generated random variate values which accurately represent the original data. Typically, several distributions will be considered good candidates. A histogram of the data can provide a first inkling about the family of distribution function(s) which can well represent the data.

Another simple test which can be used to quickly determine whether a given set of data are adequately represented by a specific distribution is the construction of quantile-quantile plots.

Assume that X is a random variable whose cumulative distribution function is F. The q-quantile of X is the value γ of the random variable which satisfies the equation

F(γ) = P(X ≤ γ) = q

If n data values are arranged in increasing order then the j(n+1)/k-th value will be denoted by gj/k. Therefore, the median is g1/2, i.e. the (n+1)/2-th value, half-way through the data set.

A common application of this concept is in investigating the distribution of income in a population, where the total number of households is divided into five quintiles (q = 0.2, 0.4, 0.6, 0.8 and 1.0) by increasing values of income. Specifically, here in the USA, if your household income is more than about γ = 80,000 dollars per year then you belong in the top quintile. One in five households is in that quintile.

Consider a collection of n values of the random variable X, xi for i = 1, 2, ..., n. If the values are arranged according to their magnitude a new string of values is obtained which we call yj with j = 1, 2, ..., n. The new variable immediately becomes an estimate for the (j − 1/2)/n quantile of X, i.e.

yj ≈ F−1((j − 1/2)/n)
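The quantile estimate above is the basis of the quantile-quantile plot. A small Python sketch follows (the function name, seed, sample size and the choice of an exponential candidate distribution are illustrative, not from the text):

```python
import math
import random

def qq_points(data, inv_cdf):
    """Pair each sorted observation y_j with the candidate
    distribution's (j - 1/2)/n quantile, F^{-1}((j - 1/2)/n)."""
    ys = sorted(data)
    n = len(ys)
    return [(inv_cdf((j - 0.5) / n), ys[j - 1]) for j in range(1, n + 1)]

# Illustrative data: exponential sample checked against the
# exponential inverse CDF with the same (assumed known) rate.
random.seed(1)
lam = 2.0
sample = [random.expovariate(lam) for _ in range(500)]
points = qq_points(sample, lambda q: -math.log(1.0 - q) / lam)
```

For a well-fitting candidate distribution, plotting these pairs yields points close to the 45-degree line.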

Once an appropriate family of distributions has been selected one proceeds to determine the various distribution parameters using the collected data values. Following is a summary of distribution parameters and their estimators for three commonly used distributions.

• Poisson. Parameter: E(X) = α. Estimator: sample mean.

• Exponential. Parameter: rate λ, with E(X) = 1/λ. Estimator: reciprocal of the sample mean.

• Normal. Parameters: µ and σ2. Estimators: sample mean and variance.


7.4 Goodness of Fit Testing

To determine the appropriateness of a given distribution in a particular situation goodness-of-fit tests are required. The tests verify the validity of the null hypothesis H0 which states that the random variable X follows a specific distribution.

In this section we examine two commonly used tests for this purpose, the Chi-square test (applicable to large samples) and the Kolmogorov-Smirnov test (applicable to small samples and restricted to continuous distributions).

For the Chi-square test the n data points are arranged into a desired number of cells (k). The expected number of points to fall inside the i-th cell, Ei, is then given by

Ei = n pi

where pi is the probability associated with that interval and is obtained from the specified distribution.

For instance, consider the case of reliability data consisting of a total of nf failures, binned into cells representing the number of failures within time intervals of uniform duration ∆ti = ti+1 − ti. Assume that direct calculation of the failure rate (hazard function value) yields a reasonably constant trend and that an average value is computed. Introduce then the null hypothesis that the data are exponentially distributed with constant failure rate λ̂ estimated as the average failure rate computed from the data. The expected number of failures inside each time bin is then given by

Ei = nf × [exp(−λ̂ti) − exp(−λ̂ti+1)]

Next, using the actual number of data points contained in each cell, Oi, one computes the statistic

χ₀² = Σ_{i=1}^{k} (Oi − Ei)²/Ei

To test the null hypothesis, the critical value of the statistic, χ²_{α,k−s−1}, is then determined, where α is the significance level, k − s − 1 is the number of degrees of freedom, and s is the number of parameters estimated for the candidate distribution (s = 1 in the case of the exponential distribution). Finally, if χ₀² > χ²_{α,k−s−1} then H0 is rejected, but if χ₀² < χ²_{α,k−s−1} the hypothesis cannot be rejected at the given significance level.

If the null hypothesis cannot be rejected one can then calculate a confidence interval for the distribution parameters. For instance, considering again the case of the exponentially distributed reliability data above, one can show that a 100(1 − α)% confidence interval for λ is given by

[λ̂ χ²(2nf, 1 − α/2)/(2nf), λ̂ χ²(2nf, α/2)/(2nf)]
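A minimal sketch of the chi-square computation for binned failure data follows (the bin edges, counts and estimated rate are made-up illustrative values; the critical value 7.815 is the tabulated χ² for α = 0.05 with 3 degrees of freedom):

```python
import math

def chi_square_exponential(bin_edges, observed, lam_hat):
    """Chi-square statistic for H0: failure times are exponential
    with rate lam_hat, given failure counts binned by time."""
    nf = sum(observed)
    chi2 = 0.0
    for i, o in enumerate(observed):
        t_lo, t_hi = bin_edges[i], bin_edges[i + 1]
        # Expected count in bin i per the formula in the text.
        e = nf * (math.exp(-lam_hat * t_lo) - math.exp(-lam_hat * t_hi))
        chi2 += (o - e) ** 2 / e
    return chi2

# Hypothetical failure counts in five equal 100-hour bins.
edges = [0, 100, 200, 300, 400, 500]
counts = [40, 26, 15, 10, 9]
lam_hat = 0.004  # assumed average failure rate from the data
chi2 = chi_square_exponential(edges, counts, lam_hat)
# k - s - 1 = 5 - 1 - 1 = 3 degrees of freedom; from tables
# the critical value is 7.815 at alpha = 0.05.
reject = chi2 > 7.815
```

For these illustrative counts the statistic is well below the critical value, so the exponential hypothesis would not be rejected.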


For the Kolmogorov-Smirnov test the n data points are also arranged in increasing order. If possible the data are made dimensionless by dividing each value by the largest value in the set. Then, one calculates the statistics

D+ = max_i (i/n − Ri)

D− = max_i (Ri − (i − 1)/n)

and

D = max(D+, D−)

and compares D against the critical value Dc. When D < Dc the null hypothesis H0 cannot be rejected.
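The three statistics are straightforward to compute; a Python sketch follows (the five data values are illustrative, and the quoted critical value comes from a standard K-S table):

```python
def ks_statistic(values):
    """Kolmogorov-Smirnov D for H0: values ~ U[0, 1]."""
    r = sorted(values)
    n = len(r)
    d_plus = max((i + 1) / n - r[i] for i in range(n))
    d_minus = max(r[i] - i / n for i in range(n))
    return max(d_plus, d_minus)

# Toy data set; from a K-S table, Dc is about 0.565 for
# n = 5 at alpha = 0.05, so here H0 would not be rejected.
d = ks_statistic([0.11, 0.32, 0.48, 0.69, 0.87])
```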

7.5 Input in the Absence of Data

Sometimes input data for DES models is just not easily available. In those instances one must rely on related engineering data, expert opinion and physical or other limitations to produce reasonable input values for the model. A few data values combined with the assumption of a uniform, triangular or beta distribution can provide a solid starting point for research.

7.6 Correlated Input Data

In some situations various inputs may be related to each other, or the same input quantity may exhibit autocorrelation over time. Typical examples are in inventory modeling, where demand data affect lead time data, and in stock trading, where buy and sell orders called in to the broker tend to arrive in bursts.

When two correlated input variables X1 and X2 are involved one uses their covariance Cov(X1, X2) or their correlation

ρ = Cov(X1, X2)/(σ1 σ2)

If the collected data values for the two variables yield ρ >> 0 then one needs to generate correlated variates.

The following algorithm generates two correlated random variates with normal distributions with parameters µ1, σ1 and µ2, σ2, respectively.

• Generate two independent standard normal variates Z1 and Z2.

• Set X1 = µ1 + σ1 Z1.

• Set X2 = µ2 + σ2 (ρ Z1 + √(1 − ρ²) Z2).
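The algorithm above can be sketched in Python (the seed and parameter values are illustrative, not from the text):

```python
import math
import random

def correlated_normal_pair(mu1, s1, mu2, s2, rho, rng=random):
    """One draw of (X1, X2) with the stated means, standard
    deviations and correlation rho, per the algorithm above."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    x1 = mu1 + s1 * z1
    x2 = mu2 + s2 * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return x1, x2

random.seed(7)
pairs = [correlated_normal_pair(10.0, 2.0, 5.0, 1.0, 0.8)
         for _ in range(20000)]
```

The sample correlation of a large batch of pairs should be close to the requested ρ.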

If data correspond to a time series of values of a single variable X1, X2, X3, ... all from the same distribution, then one uses the lag-h autocovariance Cov(Xi, Xi+h) or the lag-h correlation

ρh = Cov(Xi, Xi+h)/(σi σi+h)

Autoregressive order-1 (AR(1)) and exponential autoregressive order-1 (EAR(1)) models are commonly used to generate autocorrelated time series.

The algorithm for the AR(1) model is as follows

• Using the collected data, determine the values of the parameters µ ≈ X̄, φ = Cov(Xt, Xt+1)/S² (the lag-1 autocorrelation) and σ²ε = S²(1 − φ²).

• Generate X1 from a normal distribution with mean µ and variance σ²ε/(1 − φ²).

• Generate εt from a normal distribution with mean 0 and variance σ²ε.

• Set Xt = µ + φ(Xt−1 − µ) + εt.

• Repeat.
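A Python sketch of the AR(1) generator follows (the seed and parameter values are illustrative, not from the text):

```python
import math
import random

def ar1_series(mu, phi, sigma_eps, n, rng=random):
    """AR(1) generator: X_t = mu + phi*(X_{t-1} - mu) + eps_t, with
    X_1 drawn from the stationary distribution, whose variance is
    sigma_eps^2 / (1 - phi^2)."""
    x = mu + rng.gauss(0.0, sigma_eps / math.sqrt(1.0 - phi ** 2))
    out = [x]
    for _ in range(n - 1):
        x = mu + phi * (x - mu) + rng.gauss(0.0, sigma_eps)
        out.append(x)
    return out

random.seed(3)
series = ar1_series(mu=100.0, phi=0.6, sigma_eps=4.0, n=50000)
```

A long generated series should show a sample mean near µ and a lag-1 sample autocorrelation near φ.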

The algorithm for the EAR(1) model is as follows

• Using the collected data, determine the values of the parameters λ ≈ 1/X̄ and φ = Cov(Xt, Xt+1)/S² (the lag-1 autocorrelation).

• Generate X1 from an exponential distribution with mean 1/λ.

• Generate U from a uniform [0, 1].

• If U ≤ φ, set Xt = φ Xt−1.

• If U > φ, generate εt from an exponential distribution with mean 1/λ and set Xt = φ Xt−1 + εt.

• Repeat.
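The EAR(1) steps translate directly into Python (seed and parameter values are illustrative):

```python
import random

def ear1_series(lam, phi, n, rng=random):
    """EAR(1) generator: exponential marginals with mean 1/lam and
    lag-1 autocorrelation phi (0 <= phi < 1)."""
    x = rng.expovariate(lam)
    out = [x]
    for _ in range(n - 1):
        if rng.random() <= phi:
            x = phi * x
        else:
            x = phi * x + rng.expovariate(lam)
        out.append(x)
    return out

random.seed(11)
series = ear1_series(lam=0.5, phi=0.4, n=40000)
```

The marginal distribution of the series remains exponential with mean 1/λ, which a long sample average should confirm.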


8 Generation of Random Numbers and Pseudo-Random Numbers

Recall that for a random variable X which is uniformly distributed in [0, 1] the uniform probability density function is

f(x) = 1 for 0 ≤ x ≤ 1, and f(x) = 0 otherwise

while its cumulative distribution function is

F(x) = 0 for x < 0; F(x) = x for 0 ≤ x < 1; F(x) = 1 for x ≥ 1

A random number (RN) stream is a collection of uniformly distributed random variables. A truly random stream of numbers has the following characteristics:

• Uniformly distributed.

• Continuous-valued.

• E(R) = 1/2.

• σ² = 1/12.

• No autocorrelation between numbers.

• No runs.

In practice one always works with streams of pseudo-random numbers (PRN). These have approximately the same characteristics as RN's. PRN's are generated with a computer using a numerical algorithm embedded in a computer program or routine. The requirements of a good PRNG routine are:

• Fast.

• Portable.

• Long Cycle.

• Replicability.

• Produce PRN with the desired characteristics.


8.1 The Linear Congruential Method

The established algorithm for PRN generation is the linear congruential method (LCM). More sophisticated approaches still use this method as their foundation. The fundamental relationship of the LCM is

Xi+1 = (a Xi + c) mod m

This means that the value of Xi+1 is the remainder left from integer division of a Xi + c by m. Note that the normalized values Ri = Xi/m obtained from the LCM belong to the set I = {0, 1/m, 2/m, ..., (m − 1)/m}.
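The recurrence is a one-liner in Python (the multiplier, increment and modulus below are one published example choice, shown only for illustration):

```python
def lcg_stream(seed, n, a=1664525, c=1013904223, m=2 ** 32):
    """Linear congruential generator X_{i+1} = (a X_i + c) mod m,
    returning the normalized values R_i = X_i / m in [0, 1)."""
    x = seed
    out = []
    for _ in range(n):
        x = (a * x + c) % m
        out.append(x / m)
    return out

stream = lcg_stream(seed=12345, n=1000)
```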

One key feature of the method is its period P, the number of numbers that can be generated before the same number appears twice. The period is related to the values of m and c as follows:

• If m = 2^b and c > 0, P = m = 2^b (for suitable choices of a and c).

• If m = 2^b and c = 0, P = m/4 = 2^(b−2).

• If m is prime and c = 0, P = m − 1.

8.2 The Combined Linear Congruential Method

Large simulations require large collections of PRNs and there is a need for still longer periods. These can be obtained by the use of combined linear congruential methods (CLCM). The fundamental theorem associated with the CLCM is L'Ecuyer's:

If Wi,1, Wi,2, ..., Wi,k are independent discrete-valued random variables with at least one of them (say Wi,1) uniformly distributed between 0 and m1 − 2, then

Wi = (Σ_{j=1}^{k} Wi,j) mod (m1 − 1)

is a uniformly distributed random variable between 0 and m1 − 2.

More specifically, consider the following algorithm

Xi = (Σ_{j=1}^{k} (−1)^(j−1) Xi,j) mod (m1 − 1)

where the Xi,j are produced by individual linear congruential generators, and with

Ri = Xi/m1 when Xi > 0, and Ri = (m1 − 1)/m1 when Xi = 0


It can be shown that the maximum period obtained with this algorithm is

P = (m1 − 1)(m2 − 1)...(mk − 1)/2^(k−1)

Example. L'Ecuyer proposed the following CLCM: the two generators

X1,j+1 = 40014 X1,j mod 2147483563

X2,j+1 = 40692 X2,j mod 2147483399

produce the combined PRNG

Xj+1 = (X1,j+1 − X2,j+1) mod 2147483562

which yields

Rj+1 = Xj+1/2147483563 when Xj+1 > 0, and Rj+1 = 2147483562/2147483563 when Xj+1 = 0
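This combined generator is easy to code; a Python sketch with the constants quoted above follows (the two seed values are illustrative):

```python
def lecuyer_stream(s1, s2, n):
    """Combined LCG using the two component generators and the
    combination rule quoted above."""
    m1, m2 = 2147483563, 2147483399
    x1, x2 = s1, s2
    out = []
    for _ in range(n):
        x1 = (40014 * x1) % m1
        x2 = (40692 * x2) % m2
        x = (x1 - x2) % (m1 - 1)
        out.append(x / m1 if x > 0 else (m1 - 1) / m1)
    return out

stream = lecuyer_stream(s1=12345, s2=67890, n=1000)
```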

9 Tests for Random Numbers

Since one always works in practice with PRN streams it is necessary to check how close their characteristics are to those of real RN streams. Assume a stream containing N PRN's has been produced. To verify its characteristics the stream is subjected to various tests. In all cases, one states a hypothesis about a given characteristic of the stream and then accepts it or rejects it with a given level of significance α, where

α = P(rejecting H0 | H0 is true)

(i.e. the probability of a Type I error).

In testing for uniformity the null hypothesis H0 is that Ri ∼ U[0, 1], while the alternative hypothesis H1 is that the Ri are not U[0, 1].

In testing for independence the null hypothesis H0 is that the Ri are independent, while the alternative hypothesis H1 is that they are not.


9.1 Kolmogorov-Smirnov Frequency Test

For this test the numbers are first arranged in increasing order

R1 < R2 < ... < RN

The test makes use of the new variables

D+ = max_i (i/N − Ri)

D− = max_i (Ri − (i − 1)/N)

and

D = max(D+, D−)

Once D has been computed, a critical value Dc is obtained from the K-S statistical table for the desired α and the given N. Finally

• If D > Dc, H0 is rejected (H1 is accepted).

• If D ≤ Dc, H0 is not rejected (i.e. the numbers are uniformly distributed).

9.2 Chi-square Frequency Test

In this test the numbers are arranged into n classes by subdividing the range [0, 1] into n subintervals and determining how many of the numbers end up in each class i, (Oi).

The test uses the statistic

χ₀² = Σ_{i=1}^{n} (Oi − Ei)²/Ei

where Ei = N/n is the expected number of numbers in each class for a uniform distribution.

Once χ₀² has been computed, a critical value χ²_{α,n−1} is obtained from the Chi-square statistical table. Finally

• If χ₀² > χ²_{α,n−1}, H0 is rejected (H1 is accepted).

• If χ₀² ≤ χ²_{α,n−1}, H0 is not rejected (i.e. the numbers are uniformly distributed).


9.3 Runs Test

This test aims to detect whether there are patterns in substrings of the stream. One examines the stream and checks whether each number is followed by a larger (+) or a smaller (−) number. Runs are the resulting patterns of +'s and −'s. In a truly random sequence the mean and variance of the number of up and down runs a are given by

µa = (2N − 1)/3

and

σ²a = (16N − 29)/90

When N > 20 the distribution of a is close to normal, so the test statistic

Z0 = (a − µa)/σa

has the normal distribution with mean zero and unit standard deviation (N(0, 1)).

Once Z0 has been computed, a critical value zα/2 is obtained from the normal statistical table. Finally

• If Z0 < −zα/2 or Z0 > zα/2, H0 is rejected (H1 is accepted).

• If −zα/2 ≤ Z0 ≤ zα/2, H0 is not rejected (i.e. the numbers are independent).
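A Python sketch of the up-and-down runs test follows (the seed and stream length are illustrative; the stream is drawn from Python's own generator just to have something to test):

```python
import math
import random

def runs_test_z(stream):
    """Up-and-down runs test statistic Z0 for a number stream."""
    n = len(stream)
    # True where the next number is larger (+), False where smaller (-).
    signs = [stream[i + 1] > stream[i] for i in range(n - 1)]
    # A run ends wherever the sign changes.
    a = 1 + sum(1 for i in range(len(signs) - 1) if signs[i] != signs[i + 1])
    mu_a = (2 * n - 1) / 3
    var_a = (16 * n - 29) / 90
    return (a - mu_a) / math.sqrt(var_a)

random.seed(5)
z0 = runs_test_z([random.random() for _ in range(1000)])
# |Z0| <= z_{alpha/2} = 1.96 at alpha = 0.05 fails to reject H0.
```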

Other types of runs tests are also possible, for instance runs above and below the mean, and run lengths. For runs above and below the mean a test similar to the one above is used, but with the mean and variance of the number of runs b given by

µb = 2 n1 n2/N + 1/2

and

σ²b = 2 n1 n2 (2 n1 n2 − N)/(N²(N − 1))

where n1 and n2 are, respectively, the numbers of observations above and below the mean.

For run lengths one uses the Chi-square test to compare the observed number of runs of given lengths against the expected number obtained in a truly independent stream.


9.4 Autocorrelation Test

This test aims to detect correlation among numbers in the stream separated by a specific number of positions (the lag). Consider the autocorrelation test for a lag m. One then investigates the behavior of the numbers Ri and Ri+jm. If the autocorrelation ρim > 0 there is positive correlation (i.e. high numbers follow high numbers and vice versa) and if ρim < 0 one has negative correlation. The autocorrelation is estimated by

ρ′im = [1/(M + 1)] Σ_{k=0}^{M} Ri+km Ri+(k+1)m − 0.25

where M is the largest integer satisfying i + (M + 1)m ≤ N. The test statistic is in this case given by

Z0 = ρ′im/σρ′im

where

σρ′im = √(13M + 7)/[12(M + 1)]

Once Z0 has been computed, a critical value zα/2 is obtained from the normal statistical table. Finally

• If Z0 < −zα/2 or Z0 > zα/2, H0 is rejected (H1 is accepted).

• If −zα/2 ≤ Z0 ≤ zα/2, H0 is not rejected (i.e. the numbers are independent).
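The estimator and test statistic above can be sketched in Python (the seed, stream length and the chosen i and m are illustrative; indices are 1-based as in the text):

```python
import math
import random

def autocorr_test_z(stream, i, m):
    """Test statistic Z0 for the lag-m autocorrelation of the
    numbers R_i, R_{i+m}, R_{i+2m}, ... (1-based index i)."""
    n = len(stream)
    # M is the largest integer with i + (M + 1) m <= n.
    big_m = 0
    while i + (big_m + 2) * m <= n:
        big_m += 1
    s = sum(stream[i - 1 + k * m] * stream[i - 1 + (k + 1) * m]
            for k in range(big_m + 1))
    rho = s / (big_m + 1) - 0.25
    sigma = math.sqrt(13 * big_m + 7) / (12 * (big_m + 1))
    return rho / sigma

random.seed(9)
z0 = autocorr_test_z([random.random() for _ in range(2000)], i=3, m=5)
```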

9.5 Gap Test

This test checks for independence by tracking the pattern of gaps between occurrences of a given digit in the stream. The test is performed using the Kolmogorov-Smirnov scheme.

9.6 Poker Test

This test checks for independence based on the repetition of certain digits in the sequence. The test is performed using the Chi-square scheme.

10 Generation of Random Variates

Discrete event simulation models require as inputs the values of random variables with specified probability distributions. Such random variables are called random variates.


Input data for DES models are collected from the field and/or produced from best available estimates. However, the amount of data collected is rarely enough to run simulation models and one must use the data to create PRN streams with statistical characteristics similar to those of the original data.

So, on the one hand one needs to identify the statistical characteristics of the original data, and on the other one must be able to produce large collections of random variates with statistical characteristics similar to those of the original data. Here we focus on the second aspect: once we have determined the probability distribution applicable to our data we proceed to generate random variate streams for use in the simulation. This is accomplished by the inverse transform method.

10.1 The Inverse Transform Method

Given a random (or pseudo-random) number R and a random variate X,

• Determine the cumulative distribution function of X, F (X).

• Set F (X) = R.

• Solve the equation F (X) = R for X in terms of R, i.e. X = F−1(R).

• Repeat the above for the stream of random (or pseudo-random) numbers R1, R2, ..., Rn to obtain the stream of random variates X1, X2, ..., Xn.

Next, the formulae obtained by the inverse transform method for several commonly used random variates are given.

10.2 Inverse Transform for the Exponential Distribution

Following are the specific steps required to obtain exponentially distributed random variates with rate λ (mean 1/λ) from a random number stream using the inverse transform method.

• F(x) = 1 − e^(−λx).

• Set F(X) = 1 − e^(−λX) = R.

• X = −(1/λ) ln(1 − R).

• For i = 1, 2, ..., n, compute Xi = −(1/λ) ln(1 − Ri).
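A Python sketch of these steps follows (the seed, rate and stream length are illustrative):

```python
import math
import random

def exponential_variates(lam, rs):
    """Inverse transform: X = -(1/lam) * ln(1 - R) for each R."""
    return [-math.log(1.0 - r) / lam for r in rs]

random.seed(2)
xs = exponential_variates(0.5, [random.random() for _ in range(10000)])
```

The sample mean of a long stream should approach 1/λ.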


10.3 Inverse Transform for the Uniform Distribution

Following are the specific steps required to obtain uniformly distributed random variates between a and b from a random number stream using the inverse transform method.

• F(x) = (x − a)/(b − a).

• Set F(X) = (X − a)/(b − a) = R.

• X = a + (b − a)R.

• For i = 1, 2, ..., n, compute Xi = a + (b − a)Ri.

10.4 Inverse Transform for the Weibull Distribution

Following are the specific steps required to obtain Weibull distributed random variates with parameters α and β from a random number stream using the inverse transform method.

• F(x) = 1 − e^(−(x/α)^β).

• Set F(X) = 1 − e^(−(X/α)^β) = R.

• X = α[− ln(1 − R)]^(1/β).

• For i = 1, 2, ..., n, compute Xi = α[− ln(1 − Ri)]^(1/β).

10.5 Inverse Transform for the Triangular Distribution

Following are the specific steps required to obtain random variates with a triangular distribution between 0 and 2 with mode 1 from a random number stream using the inverse transform method.

• F(x) = 0 for x ≤ 0; F(x) = x²/2 for 0 < x ≤ 1; F(x) = 1 − (2 − x)²/2 for 1 < x ≤ 2; F(x) = 1 for x > 2.

• Xi = √(2Ri) for 0 < Ri ≤ 1/2, and Xi = 2 − √(2(1 − Ri)) for 1/2 < Ri ≤ 1.
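The two-branch inverse for this triangular(0, 1, 2) case is short in Python (the function name is illustrative):

```python
import math

def triangular_variate(r):
    """Inverse transform for the triangular distribution between
    0 and 2 with mode 1, per the two branches above."""
    if r <= 0.5:
        return math.sqrt(2.0 * r)
    return 2.0 - math.sqrt(2.0 * (1.0 - r))
```

Note that R = 1/2 maps to the mode, 1.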


10.6 Inverse Transform for Empirical Distributions

If no appropriate distribution can be found for the data one can resort to resampling the data. This creates an empirical distribution. A simple empirical distribution can be produced from given data by piecewise linear approximation.

Assume the available data points (observations) are arranged in increasing order x1, x2, ..., xn. Assume also that a probability is assigned to each interval (xj−1, xj] such that the cumulative probability of the first j intervals is cj. The associated random variate is obtained as

Xi = xj−1 + [(xj − xj−1)/(cj − cj−1)](Ri − cj−1)

when cj−1 < Ri ≤ cj.
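The piecewise-linear inverse can be sketched as follows (the data points and cumulative probabilities below are illustrative):

```python
import bisect

def empirical_variate(xs, cs, r):
    """Piecewise-linear empirical inverse CDF. xs are the sorted
    data points and cs the cumulative probabilities, with
    cs[0] = 0 at xs[0] and cs[-1] = 1 at xs[-1]."""
    j = bisect.bisect_left(cs, r)   # interval with c_{j-1} < r <= c_j
    j = max(j, 1)
    return xs[j - 1] + (xs[j] - xs[j - 1]) / (cs[j] - cs[j - 1]) * (r - cs[j - 1])

# Hypothetical data: half the probability mass lies below 2.0.
xs = [0.0, 2.0, 5.0, 10.0]
cs = [0.0, 0.5, 0.8, 1.0]
```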

10.7 Inverse and Direct Transforms for the Normal Distribution

The normal distribution does not have a closed-form inverse transformation. However, the following expression is an excellent approximation to the inverse cumulative distribution function of the standard normal distribution:

Xi ≈ [Ri^0.135 − (1 − Ri)^0.135]/0.1975

From the above, random variates with a normal distribution of mean µ and standard deviation σ are readily obtained as

Xi ≈ µ + σ[Ri^0.135 − (1 − Ri)^0.135]/0.1975

A direct transformation can be used to produce two independent standard normal variates Z1 and Z2 from two random numbers R1 and R2 according to

Z1 = √(−2 ln R1) cos(2πR2)

and

Z2 = √(−2 ln R1) sin(2πR2)

Normal random variates Xi with mean µ and standard deviation σ can then be obtained from

Xi = µ + σZi
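The direct transformation is easy to sketch in Python (the seed and sample size are illustrative; 1 − random() is used so the argument of the logarithm stays in (0, 1]):

```python
import math
import random

def box_muller_pair(r1, r2):
    """Direct transformation of two uniform random numbers into
    two independent standard normal variates."""
    root = math.sqrt(-2.0 * math.log(r1))
    return (root * math.cos(2.0 * math.pi * r2),
            root * math.sin(2.0 * math.pi * r2))

random.seed(4)
zs = []
for _ in range(5000):
    z1, z2 = box_muller_pair(1.0 - random.random(), random.random())
    zs.extend([z1, z2])
```

A large batch of such variates should have sample mean near 0 and sample variance near 1.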


10.8 Inverse and Direct Transforms for the Lognormal Distribution

If the random variable Y has the normal distribution with mean µ and variance σ², the associated random variable X = exp(Y) has the lognormal distribution with parameters µ and σ².

Thus, random variates with a standard lognormal distribution can be generated from the expression

Xi ≈ exp{[Ri^0.135 − (1 − Ri)^0.135]/0.1975}

Random variates with a lognormal distribution of parameters µ and σ are then generated by

Xi ≈ exp{µ + σ[Ri^0.135 − (1 − Ri)^0.135]/0.1975}

10.9 Inverse Transform for Discrete Distributions

A similar procedure to the one indicated above can be used to produce discretely distributed random variates. Since the cumulative distribution functions for discrete distributions consist of discrete jumps separated by horizontal plateaus, lookup tables are a convenient and very efficient method of generating inverses.

10.10 Other Methods of Generating Random Variates

When two or more random variables are added together to produce a new random variable with a desired distribution one is using the method of convolution.

If one generates the random variate by selectively accepting or rejecting numbers from a random number stream one is using the acceptance-rejection technique.

Detailed descriptions of these two methods, as well as examples, can be found in your textbook.
