
Chapter 1

Motivation and Application

Example Suppose X ∼ f(x) and we want to evaluate E[h(X)] = ∫ h(x) f(x) dx.

Monte Carlo computation: simulate Xi ∼ f(x), i = 1, · · · , n.

θ̂ = (1/n) ∑_{i=1}^{n} h(Xi) −→ E[h(X)]

θ̂ is unbiased,

E(θ̂) = (1/n) ∑_{i=1}^{n} E[h(Xi)] = E[h(X)],

with variance

Var(θ̂) = (1/n²) ∑_{i=1}^{n} Var[h(Xi)] = Var[h(X)] / n,

where

Var[h(X)] = E[h²(X)] − {E[h(X)]}² = ∫ h²(x) f(x) dx − {E[h(X)]}².

Case 1: X is a univariate random variable. We may approximate E(X) by

X̄ = (∑_{i=1}^{n} Xi) / n.

Case 2 Suppose (U, V ) ∼ f(u, v) and we want to evaluate E[h(U, V )]. Monte Carlo

computation: simulate (Ui, Vi) ∼ f(u, v), i = 1, · · · , n.

(1/n) ∑_{i=1}^{n} h(Ui, Vi) −→ E[h(U, V)]

Approximate Cov(U, V) by

(∑_{i=1}^{n} Ui Vi) / n − Ū V̄.

Example Suppose we want to evaluate ∫ H(x) dx.


Monte Carlo method: find h(x) and a pdf f(x) such that

H(x) = h(x) f(x), X ∼ f(x),

∫ H(x) dx = ∫ h(x) f(x) dx = E[h(X)].

Sample data X1, · · · , Xn from distribution f(x) and estimate E[h(X)] by

(1/n) ∑_{i=1}^{n} h(Xi).

Case 1 Compute ∫_{−∞}^{∞} log|x| exp[−(x + 1)²/8] dx. Take X ∼ N(−1, 4) and

f(x) = (8π)^{−1/2} exp[−(x + 1)²/8], h(x) = (8π)^{1/2} log|x|.

Simulate Xi, i = 1, · · · , n, from N(−1, 4); compute h(Xi) = (8π)^{1/2} log|Xi| and approximate the integral by

(8π)^{1/2} (1/n) ∑_{i=1}^{n} log|Xi|.
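A minimal Python sketch of this estimator; the function name, the sample size n = 100000, and the seed are illustrative choices, not part of the notes:

import numpy as np

rng = np.random.default_rng(0)

def mc_integral_case1(n=100_000):
    """Approximate the integral of log|x| * exp(-(x+1)^2/8) by sampling X ~ N(-1, 4)."""
    x = rng.normal(loc=-1.0, scale=2.0, size=n)      # N(-1, 4) has standard deviation 2
    h = np.sqrt(8 * np.pi) * np.log(np.abs(x))       # h(x) = (8*pi)^(1/2) * log|x|
    return h.mean(), h.std(ddof=1) / np.sqrt(n)      # estimate and its standard error

est, se = mc_integral_case1()
print(est, se)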

Case 2 Compute ∫_{0}^{5} √x e^{−x} dx.

Method a: take X ∼ uniform on (0, 5), and f(x) = 1/5 for x ∈ (0, 5) and zero otherwise. Then

h(x) = 5 √x e^{−x}.

Simulate random numbers Ui, i = 1, · · · , n; let Xi = 5 Ui; compute Yi = h(Xi) = 5 √Xi e^{−Xi} and approximate the integral by

Ȳ = (1/n) ∑_{i=1}^{n} Yi.

Method b: take X ∼ exponential distribution with mean one, and f(x) = e^{−x} for x > 0 and zero otherwise. Then

h(x) = √x I(0 < x < 5).

Simulate random numbers Ui, i = 1, · · · , n; let Xi = −log(1 − Ui); compute

Yi = h(Xi) = √Xi if 0 < Xi < 5, and Yi = 0 if Xi ≥ 5,

and approximate the integral by

Ȳ = (1/n) ∑_{i=1}^{n} Yi.
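A short Python sketch carrying out both methods side by side; the number of draws and the seed are arbitrary illustrative values:

import numpy as np

rng = np.random.default_rng(1)
n = 100_000
u = rng.random(n)

# Method a: X uniform on (0, 5), h(x) = 5 * sqrt(x) * exp(-x)
xa = 5 * u
ya = 5 * np.sqrt(xa) * np.exp(-xa)

# Method b: X standard exponential, h(x) = sqrt(x) * 1(0 < x < 5)
xb = -np.log(1 - u)
yb = np.where(xb < 5, np.sqrt(xb), 0.0)

print("method a:", ya.mean(), "se:", ya.std(ddof=1) / np.sqrt(n))
print("method b:", yb.mean(), "se:", yb.std(ddof=1) / np.sqrt(n))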

Example Let U1, U2, · · · be a sequence of uniform random variables on (0, 1). Define N to be the number of random numbers that must be summed to exceed one, that is,

N = min{ n : ∑_{i=1}^{n} Ui > 1 }.


a) Outline a simulation to approximate E(N) and V ar(N).

b) Estimate them by generating 100 values of N .

c) Estimate them by generating 1000 values of N .

d) Estimate them by generating 10000 values of N .

e) What do you think is the value of E(N) ?
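As an illustration of part (a), here is a hedged Python sketch; the replication counts mirror parts (b)–(d), and the seed is arbitrary:

import numpy as np

rng = np.random.default_rng(2)

def sample_N():
    """Return the number of uniforms needed for their running sum to exceed 1."""
    total, count = 0.0, 0
    while total <= 1.0:
        total += rng.random()
        count += 1
    return count

for reps in (100, 1_000, 10_000):
    draws = np.array([sample_N() for _ in range(reps)])
    print(reps, draws.mean(), draws.var(ddof=1))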

Example Suppose St and Xt, t = 1, · · · , 100, are stock prices and returns over T = 100 days. Evaluate

E(max_t Xt − min_t Xt) and E[(S_T − K) 1(min_t Xt ≥ c)].


Chapter 2

Random Variable Simulation

2.1 Random numbers

The building block of simulation is the ability to generate random numbers, each of which represents the value of a random variable uniformly distributed on (0, 1). Random numbers were originally generated either manually or mechanically, by using techniques such as spinning wheels, dice rolling, or card shuffling; the modern approach is to use a computer to successively generate pseudorandom numbers. These pseudorandom numbers constitute a sequence of values which, although they are deterministically generated, have all the appearances of being independent uniform (0,1) random variables.

Multiplicative Congruential method: start with an initial value x0, called the seed, and then recursively compute successive values xn, n ≥ 1, by letting

xn = a xn−1 modulo m,

where a and m are given positive integers, and where the above means that a xn−1 is divided by m and the remainder is taken as the value of xn. Thus, each xn is one of 0, 1, · · · , m − 1, and the quantity xn/m, called a pseudorandom number, is taken as an approximation to the value of a uniform (0,1) random variable.

Since each of the numbers xn assumes one of the values 0, 1, · · · , m − 1, it follows that after some finite number (of at most m) of generated values a value must repeat itself; and once this happens, the whole sequence will begin to repeat. Thus, we want to choose the constants a and m so that, for any initial seed x0, the number of values that can be generated before this repetition occurs is large. a and m should be chosen to satisfy the following three criteria:

1. For any initial seed, the resultant sequence has the “appearance” of being a sequence of independent uniform (0,1) random variables.


2. For any initial seed, the number of variables that can be generated before repetition

begins is large.

3. The values can be computed efficiently on a digital computer.

A guideline is that m should be chosen to be a large prime number that can be fitted to the computer word size. For a 32-bit word machine (where the first bit is a sign bit) it has been shown that the choices of m = 2^31 − 1 and a = 7^5 = 16,807 result in desirable properties. For a 36-bit word machine, the choices of m = 2^35 − 31 and a = 5^5 appear to work well.

Mixed Congruential method: another generator of pseudorandom numbers uses the recursion

xn = (a xn−1 + c) modulo m.

Such generators are called mixed congruential generators, as they involve both an additive and a multiplicative term. m is often chosen to equal the computer's word length, since this makes the computation of a xn−1 + c modulo m quite efficient.
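A minimal Python sketch of a congruential generator with the 32-bit parameters mentioned above (setting c = 0 gives the multiplicative case; the seed value is arbitrary):

def congruential(seed, a=16807, c=0, m=2**31 - 1):
    """Yield pseudorandom numbers x_n / m from x_n = (a*x_{n-1} + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x / m

gen = congruential(seed=12345)
print([next(gen) for _ in range(5)])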

Most computer languages already have a built-in random number generator which can

be called to generate random numbers. As our starting point, we suppose that we can

generate a sequence of pseudorandom numbers which can be taken as an approximation to

the values of a sequence of independent uniform (0,1) random variables. We do not explore

the interesting theoretical questions relating to the construction of “good” pseudorandom

number generators. Rather, we assume that we have a “black box” that gives a random

number on request.

1.2 Generating discrete random variables

1.2.1 The inverse transformation method

Purpose: to generate the value of a discrete random variable X having probabilities

P(X = xj) = pj, j = 0, 1, · · · , with ∑_j pj = 1.

Approach: Generate a random number U [i.e. a uniform random variable on (0,1)] and set

X = x0 if U < p0, and X = xj if ∑_{i=0}^{j−1} pi ≤ U < ∑_{i=0}^{j} pi, j = 1, 2, · · · .


Justification:

P(X = xj) = P( ∑_{i=0}^{j−1} pi ≤ U < ∑_{i=0}^{j} pi ) = ∑_{i=0}^{j} pi − ∑_{i=0}^{j−1} pi = pj.

Thus, X has the desired distribution. Let

F(x) = ∑_{xj ≤ x} pj = P(X ≤ x).

Then

F(xj) = ∑_{i=0}^{j} pi, and X = F^{−1}(U).
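A small Python sketch of this general approach, assuming the support values and probabilities are supplied as lists (the values below are those of Example 2):

import numpy as np

rng = np.random.default_rng(3)

def discrete_inverse_transform(values, probs):
    """Return one draw X with P(X = values[j]) = probs[j], via the inverse CDF."""
    u = rng.random()
    cum = 0.0
    for x, p in zip(values, probs):
        cum += p
        if u < cum:
            return x
    return values[-1]          # guard against round-off in the cumulative sum

print(discrete_inverse_transform([1, 2.5, 4, 10], [0.2, 0.15, 0.25, 0.4]))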

Example 1. (Bernoulli distribution) Let X ∼ Bernoulli(p), namely,

P (X = 1) = p = 1 − P (X = 0).

Then, we set

X = 0 if U < 1 − p, and X = 1 otherwise; that is, X = 1(1 − U ≤ p).

Alternatively, we may set

X = 1 if U < p, and X = 0 otherwise; that is, X = 1(U < p).

Example 2. Generate random variable X with distribution

P (X = 1) = 0.2, P (X = 2.5) = 0.15, P (X = 4) = 0.25,

P (X = 10) = 0.4.

Then we set

X = 1 if U < 0.2,
X = 2.5 if 0.2 ≤ U < 0.35,
X = 4 if 0.35 ≤ U < 0.60,
X = 10 if 0.60 ≤ U.

Example 3. (Discrete uniform) Wish to simulate X with

P(X = xj) = 1/N, j = 1, · · · , N.

Set

X = xj if (j − 1)/N ≤ U < j/N, or equivalently j − 1 ≤ N U < j.


We may simply write

X = xj, if int(N U) = j − 1.

In particular, if xj = j, then

X = j, if int(N U) = j − 1 ⇐⇒ X = int(N U) + 1.

Example 4. (Draw at random without replacement) Set k = N .

step 1. Generate U and set I = int(k U) + 1.

step 2. Interchange the values of xI with xk.

step 3. Let k = k − 1 and if k > N − n, go to step 1; otherwise stop.

Then, the new values x*_N, · · · , x*_{N−n+1} are a random sample from x1, · · · , xN.

In particular, the above method applies to double-blind controlled experiments. Suppose

1000 subjects are to be divided into two groups:

Control Group: no treatment (placebo group)
Treatment Group: receives treatment

In this case, xj = j, the label of the j-th subject, N = 1000 and n = 500.
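A hedged Python sketch of Example 4, assuming the items are labeled 1, · · · , N and a sample of size n is wanted; the function name and seed are illustrative:

import random

def draw_without_replacement(items, n, rng=random.Random(4)):
    """Select n items at random without replacement by partial Fisher-Yates swaps."""
    x = list(items)
    k = len(x)
    while k > len(x) - n:
        i = int(k * rng.random())       # index in 0..k-1, i.e. I - 1
        x[i], x[k - 1] = x[k - 1], x[i]
        k -= 1
    return x[k:]                         # the last n positions hold the sample

# e.g. assign 500 of 1000 subjects to the treatment group
treatment = draw_without_replacement(range(1, 1001), 500)
print(len(treatment), treatment[:5])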

Example 5. (Binomial distribution)

P(X = x) = (n choose x) p^x (1 − p)^{n−x}, x = 0, 1, · · · , n.

Note that

P(X = x + 1) = [(n − x)/(x + 1)] [p/(1 − p)] P(X = x).

Algorithm: set c = p/(1 − p), i = 0, prob = (1 − p)^n and F = prob.

step 1. Generate a random number U .

step 2. If U < F , set x = i and stop.

step 3. prob = prob c (n − i)/(i + 1), F = F + prob, and i = i + 1.

step 4. Go to step 2.
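The algorithm above as a brief Python sketch; the parameter values n = 10, p = 0.3 and the seed are only illustrative:

import random

rng = random.Random(5)

def binomial_inverse(n, p):
    """Sequential-search inverse transform for Binomial(n, p)."""
    c = p / (1 - p)
    i, prob = 0, (1 - p) ** n
    F = prob
    u = rng.random()
    while u >= F:
        prob *= c * (n - i) / (i + 1)   # P(X = i+1) from P(X = i)
        F += prob
        i += 1
    return i

print([binomial_inverse(10, 0.3) for _ in range(10)])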

Remark.

1. Since E(X) = n p, the search takes about n p + 1 steps.

2. If X ∼ Binomial(n, p), then n − X ∼ Binomial(n, 1 − p). Thus, if p > 1/2, it is more efficient to generate Y ∼ Binomial(n, 1 − p) and then set X = n − Y.


3. A simpler method is to use X = X1 + · · · + Xn, where the Xi are i.i.d. Bernoulli(p) random variables. With n i.i.d. random numbers Ui, we set X = 1(1 − U1 ≤ p) + · · · + 1(1 − Un ≤ p).

Example 6. (Poisson distribution) Generate X with

P(X = x) = λ^x e^{−λ} / x!, x = 0, 1, · · · .

Note that

P(X = x + 1) = [λ/(x + 1)] P(X = x).

Algorithm: set i = 0, prob = e^{−λ} and F = prob.

step 1. Generate a random number U .

step 2. If U < F , set x = i and stop.

step 3. prob = prob λ/(i + 1), F = F + prob, and i = i + 1.

step 4. Go to step 2.
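The same sequential search for the Poisson case, as a short Python sketch; λ = 2.5 and the seed are arbitrary illustrative values:

import math
import random

rng = random.Random(6)

def poisson_inverse(lam):
    """Sequential-search inverse transform for Poisson(lam)."""
    i, prob = 0, math.exp(-lam)
    F = prob
    u = rng.random()
    while u >= F:
        prob *= lam / (i + 1)          # P(X = i+1) from P(X = i)
        F += prob
        i += 1
    return i

print([poisson_inverse(2.5) for _ in range(10)])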

Example 7. (Geometric distribution) Simulate X with

P(X = x) = p (1 − p)^{x−1}, x = 1, 2, · · · .

Note that

∑_{x=1}^{j} P(X = x) = p ∑_{x=1}^{j} (1 − p)^{x−1} = 1 − (1 − p)^j.

Thus,

X = j when 1 − (1 − p)^{j−1} ≤ U < 1 − (1 − p)^j,

or equivalently,

(1 − p)^j < 1 − U ≤ (1 − p)^{j−1} ⇐⇒ j log(1 − p) < log(1 − U) ≤ (j − 1) log(1 − p) ⇐⇒ j − 1 ≤ log(1 − U)/log(1 − p) < j.

Then

X = int( log(1 − U) / log(1 − p) ) + 1.
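In code this closed form is a one-liner; a Python sketch with an arbitrary p = 0.3 and seed:

import math
import random

rng = random.Random(7)

def geometric(p):
    """Geometric(p) on {1, 2, ...} via X = int(log(1-U)/log(1-p)) + 1."""
    u = rng.random()
    return int(math.log(1 - u) / math.log(1 - p)) + 1

print([geometric(0.3) for _ in range(10)])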


1.2.2 Rejection method

The rejection method provides a useful way to generate random variables without

inverting the cumulative distribution function.

Aim: Draw a random variable with

P (X = xj) = pj, j = 1, 2, · · · ,

Assumption:

(a). We have an efficient way to generate Y such that P(Y = xj) = qj, j = 1, 2, · · · .

(b). max_j (pj/qj) ≤ c.

Idea: Generate the value of Y as a proposal and accept the proposal Y = xj with probability

pj/(c qj).

Algorithm

step 1. Simulate Y , and denote the obtained value by xJ .

step 2. Generate a random number U .

step 3. If U ≤ pJ/(c qJ), set X = Y and stop. Otherwise, go to step 1.

Justification:

P(Y = xj, xj is accepted) = P(Y = xj) P(xj is accepted | Y = xj) = qj · pj/(c qj) = pj/c.

Thus,

P(a value of Y is accepted) = ∑_j P(Y = xj, xj is accepted) = ∑_j pj/c = 1/c.

Now,

P(X = xj) = ∑_{k=1}^{∞} P(xj is accepted on the k-th iteration) = ∑_{k=1}^{∞} (pj/c) (1 − 1/c)^{k−1} = pj.

Thus, the generated random value for X has distribution pj.


Remark. The number of iterations N needed to generate one value of X is a random

variable, having geometric distribution with mean c. Thus, c represents the expected number

of iterations. The closer c is to 1, the more efficient the algorithm is. Note c ≥ 1, since pj ≤ c qj implies 1 = ∑_j pj ≤ c ∑_j qj = c.

Example 8. Simulate a random variable X with

P (X = 1) = 0.25, P (X = 2) = 0.18, P (X = 3) = 0.22,

P (X = 4) = 0.2, P (X = 5) = 0.15.

Let Y = int(5 U) + 1 have the uniform distribution on {1, 2, 3, 4, 5}, namely, P(Y = j) = 0.2 = qj, j = 1, 2, 3, 4, 5. Let c = max_j (pj/qj) = 1.25.

Algorithm

step 1. Simulate a random number U1 and set Y = int(5 U1) + 1.

step 2. Generate another random number U2.

step 3. If U2 ≤ pY /0.25, set X = Y and stop. Otherwise, go to step 1.
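A Python sketch of this discrete rejection sampler, with the target probabilities of Example 8 hard-coded; the seed is arbitrary:

import random

rng = random.Random(8)
p = {1: 0.25, 2: 0.18, 3: 0.22, 4: 0.20, 5: 0.15}   # target distribution
c_q = 1.25 * 0.20                                    # c * q_j = 0.25 for every j

def rejection_discrete():
    """Propose Y uniform on {1,...,5}; accept with probability p_Y / (c * q_Y)."""
    while True:
        y = int(5 * rng.random()) + 1
        if rng.random() <= p[y] / c_q:
            return y

print([rejection_discrete() for _ in range(10)])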

1.3 Generating continuous random variables

1.3.1 The inverse transformation method

The method is based on the following result. Let F be the cumulative distribution

function. Let

F−1(u) = inf{x : F (x) ≥ u}.

This definition holds for both discrete and continuous random variables. For a strictly

increasing function F , if x = F−1(u), then F (x) = u.

Theorem 1 Let U be a random number. Then X = F^{−1}(U) has distribution function F.

Proof.

P (X ≤ x) = P (F−1(U) ≤ x) = P (U ≤ F (x)) = F (x).

The above theorem provides a simple algorithm to generate a random variable X from a random number: compute F^{−1}(U). But this can be expensive to compute.

Example 1. (Exponential distribution) Simulate X with density

f(x) = e−x, x ≥ 0.


The distribution function

F (x) = 1 − e−x, x ≥ 0.

Let x = F−1(u). Then

u = F (x) = 1 − e−x ⇐⇒ x = −log(1 − u).

Hence,

X = F−1(U) = −log(1 − U).

Since 1 − U is uniform on (0,1), it follows that X = −logU has the standard exponential

distribution.

If Y = β X, then Y has the exponential distribution with mean β:

f(y) = (1/β) exp(−y/β), y ≥ 0.

It can be simply generated as

Y = −β log(U).

Remark. It is easy to verify that the exponential distribution with mean 2 is χ²₂ = Gamma(1, 2), so 2X ∼ χ²₂. Thus, Y ∼ χ²_{2m} can be generated as

Y = −2 log U1 − · · · − 2 log Um = −2 log(U1 · · · Um).

Also, Y ∼ Gamma(m,β) can be simulated by

Y = −β log(U1 · · · Um).
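A Python sketch of these inverse-transform recipes for the exponential, chi-square with even degrees of freedom, and Gamma with integer shape; the function names and test values are illustrative:

import math
import random

rng = random.Random(9)

def exponential(beta=1.0):
    """Exponential with mean beta via Y = -beta * log(U)."""
    return -beta * math.log(rng.random())

def gamma_int_shape(m, beta):
    """Gamma(m, beta) with integer shape m via Y = -beta * log(U1 ... Um)."""
    return -beta * math.log(math.prod(rng.random() for _ in range(m)))

def chi2_even(df):
    """Chi-square with even df = 2m, using the Gamma recipe with beta = 2."""
    return gamma_int_shape(df // 2, 2.0)

print(exponential(3.0), gamma_int_shape(4, 1.5), chi2_even(6))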

Example 2. (Cauchy distribution) The Cauchy distribution has density

f(x) = 1 / [π (1 + x²)].

It has heavy tails: the mean does not exist. The distribution function is

F(x) = (1/π){arctan(x) + π/2} = (1/π){tan^{−1}(x) + π/2}.

Let

u = (1/π){tan^{−1}(x) + π/2}.

Then

x = tan(π[u − 1/2]).

Thus,

F−1(u) = tan(π[u − 1/2]), X = tan(π[U − 1/2]).


This method requires evaluating trigonometric functions, which is relatively expensive.

Example 3. (Normal distribution) Let Φ(x) be the cumulative distribution function of a

standard normal random variable. Then

X = Φ−1(U)

has the standard normal distribution. The drawback is that one has to compute Φ−1(u),

which is quite expensive.

1.3.2 Rejection method

The rejection method provides an alternative way to generate X ∼ f .

Let g be a density that can easily be simulated from. Suppose

f(x)/g(x) ≤ c for all x.

Necessarily, g(x) has a heavier tail than f(x) since

lim sup_{x→−∞} f(x)/g(x) ≤ c and lim sup_{x→∞} f(x)/g(x) ≤ c.

Further,

f(x)/g(x) ≤ c =⇒ 1 = ∫ f(x) dx ≤ c ∫ g(x) dx = c.

Rejection method:

step 1. Generate Y ∼ g as a proposal.

step 2. Generate a random number U .

step 3. If U ≤ f(Y )/[c g(Y )], set X = Y . Otherwise, return to step 1.

The unconditional probability of acceptance is

P(acceptance) = E[P(acceptance | Y)] = E[ f(Y)/(c g(Y)) ] = ∫ [f(y)/(c g(y))] g(y) dy = 1/c.

Hence, letting N = the number of iterations in the algorithm to get a value of X, we have

P(N = n) = (1/c) (1 − 1/c)^{n−1}, n = 1, 2, · · · ;

this is a geometric distribution with mean c.


Theorem 2 The random variable generated by the algorithm has density f . The number of

iterations in the algorithm follows a geometric distribution with mean c.

Proof. The second statement is already shown above. Let us prove the first one. Let Xsim be the outcome of the above algorithm. The algorithm sets

Xsim = Y only when U ≤ f(Y)/[c g(Y)].

Let h(y) = f(y)/[c g(y)]. Then

P(Xsim ≤ x) = P[Y ≤ x | U ≤ h(Y)] = P[Y ≤ x, U ≤ h(Y)] / P[U ≤ h(Y)].

By using a property of conditional probability

P[Y ≤ x, U ≤ h(Y)] = E{P[Y ≤ x, U ≤ h(Y) | Y]} = E{1(Y ≤ x) h(Y)} = ∫ 1(y ≤ x) h(y) g(y) dy = (1/c) ∫_{−∞}^{x} f(y) dy = F(x)/c,

where F (x) is the distribution function of X. Similarly,

P[U ≤ h(Y)] = P[Y < ∞, U ≤ h(Y)] = F(∞)/c = 1/c.

Hence,

P (Xsim ≤ x) = F (x).

From the above theorem, the smaller the constant c is, the more efficient the algorithm. The smallest possible c is

c = max_x f(x)/g(x).

Example 5. Use the rejection method to generate a random variable with

f(x) = (2/π) √(1 − x²), |x| ≤ 1.

Let g(x) = 1/2 for −1 ≤ x ≤ 1 be the density of the uniform distribution on (-1,1). Then

f(x)/g(x) = (4/π) √(1 − x²)

and

c = max_x f(x)/g(x) = 4/π ≈ 1.27.

Algorithm:


[Figure 1.1: Density Function]

step 1. Generate random numbers U1 and U2.

step 2. Let Y = 2 U1 − 1.

step 3. If U2 ≤ √(1 − Y²), set X = Y; otherwise go to step 1.
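A Python sketch of this rejection sampler for the semicircle density; the seed and sample size are illustrative:

import math
import random

rng = random.Random(10)

def semicircle():
    """Rejection sampling for f(x) = (2/pi) * sqrt(1 - x^2) on [-1, 1]."""
    while True:
        y = 2 * rng.random() - 1            # proposal, uniform on (-1, 1)
        if rng.random() <= math.sqrt(1 - y * y):
            return y

print([semicircle() for _ in range(5)])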

Example 6. (Normal-Exponential) Use the rejection method to generate Z ∼ N(0, 1).

Since the normal distribution is symmetric about zero, we can simulate |Z| first and then

add negative sign with probability 0.5. Let X = |Z|. Then X has density

f(x) = √(2/π) e^{−x²/2}, x > 0.

Let

g(x) = e−x, x > 0.

Then

f(x)/g(x) = √(2/π) e^{−x²/2 + x}.

Maximizing the above function is the same as maximizing

h(x) = −x²/2 + x.

By calculus,

h′(x) = −x + 1 = 0 =⇒ x = 1.


So the maximum is attained at x0 = 1 (strictly, one needs to verify h′′(x) < 0). Thus,

c = √(2/π) e^{−1/2 + 1} = √(2e/π) ≈ 1.32,

and

f(x)/[c g(x)] = exp{−(x − 1)²/2}.

Algorithm:

step 1. Generate random numbers U1 and U2.

step 2. Let Y = −logU1.

step 3. If U2 ≤ exp {−(Y − 1)2/2}, set X = Y ; otherwise go to step 1.

Note that U2 ≤ exp {−(Y − 1)2/2} is equivalent to −logU2 ≥ (Y − 1)2/2 and −logU2 has a

standard exponential distribution. If one wants to generate Z ∼ N(0, 1), modify step 3 as

follows.

step 3. If U2 ≤ exp {−(Y − 1)2/2}, go to step 4; otherwise go to step 1.

step 4. Generate random number U3. If U3 < 0.5 set Z = Y ; otherwise set Z = −Y .
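The full procedure for Z ∼ N(0, 1) as a Python sketch; the function name and seed are illustrative:

import math
import random

rng = random.Random(11)

def normal_by_rejection():
    """Generate |Z| by rejection from Exp(1), then attach a random sign."""
    while True:
        y = -math.log(rng.random())                      # proposal Y ~ Exp(1)
        if rng.random() <= math.exp(-(y - 1) ** 2 / 2):  # accept with prob f/(c g)
            return y if rng.random() < 0.5 else -y

print([round(normal_by_rejection(), 3) for _ in range(5)])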

Example 7. Suppose we want to generate a Gamma(3/2, 1) random variable with density

f(x) = [1/Γ(3/2)] x^{1/2} e^{−x}, x > 0,

where Γ(3/2) = √π/2. As Gamma(3/2, 1) has mean 3/2, we choose the exponential distribution with mean 3/2 and

g(x) = (2/3) e^{−2x/3}, x > 0.

Now

f(x)/g(x) = (3/√π) x^{1/2} e^{−x/3}.

Differentiating and setting the derivative equal to zero, we find the ratio is maximized at x = 3/2. Thus

c = 3 √(3/(2πe)) ≈ 1.257, and f(x)/[c g(x)] = √(2e/3) x^{1/2} e^{−x/3}.

Algorithm:

step 1. Generate random numbers U1 and U2.

step 2. Let Y = −(3/2) log U1.

step 3. If U2 ≤ √(2e/3) Y^{1/2} e^{−Y/3}, set X = Y; otherwise return to step 1.


1.4 Generating normal random variables

1.4.1 Box-Muller Method

Let X1 ∼ N(0, 1) and X2 ∼ N(0, 1) be independent. Observe that

X1² + X2² ∼ χ²₂, which is the distribution of −2 log U1,

where U1 ∼ unif(0,1) is a random number.

[Figure 1.2: Polar Coordinate]

Question: Given X1² + X2² = −2 log U1, how to obtain X1 and X2?

Let Θ be the random angle of the point (X1, X2) in polar coordinates. Then Θ is uniformly distributed on (0, 2π), leading to

X1 = √(−2 log U1) cos(Θ) = √(−2 log U1) cos(2π U2),
X2 = √(−2 log U1) sin(Θ) = √(−2 log U1) sin(2π U2),

where U2 = Θ/(2π) is a random number.

Justification: Let

x1² + x2² = −2 log u1 =⇒ u1 = exp{−(x1² + x2²)/2},

tan(2π u2) = x2/x1 =⇒ u2 = (1/(2π)) tan^{−1}(x2/x1).


The Jacobian of the polar transformation is

|∂(u1, u2)/∂(x1, x2)| = (1/(2π)) exp{−(x1² + x2²)/2}.

By the formula for the random variable transformation, we have

f(x1, x2) = f(u1, u2) |∂(u1, u2)/∂(x1, x2)| = (1/(2π)) exp{−(x1² + x2²)/2}.

Hence X1 and X2 are independent N(0, 1).

Algorithm:

step 1. Generate random numbers U1 and U2.

step 2. Compute

X1 = √(−2 log U1) cos(2π U2), X2 = √(−2 log U1) sin(2π U2).
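Box-Muller in Python, as a brief sketch; the function name and seed are illustrative:

import math
import random

rng = random.Random(12)

def box_muller():
    """Return a pair of independent N(0, 1) variables from two uniforms."""
    u1, u2 = rng.random(), rng.random()
    r = math.sqrt(-2 * math.log(u1))
    return r * math.cos(2 * math.pi * u2), r * math.sin(2 * math.pi * u2)

print(box_muller())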

1.4.2 Polar Method

[Figure 1.3: Uniform Distributions in Square and Circle]

Let Vi = 2 Ui − 1. Then V1 and V2 are independent and uniformly distributed on (-1,1).

Suppose we continuously generate such pairs (V1, V2) until we obtain one that is contained


in the circle of radius 1 centred at the origin, that is, (V1, V2) with V1² + V2² ≤ 1. It then follows that such a pair (V1, V2) is uniformly distributed in the circle. If we let R and Θ denote the polar coordinates of the pair, then R and Θ are independent, with R² being uniformly distributed on (0,1) and Θ being uniformly distributed on (0, 2π). Since Θ is a random angle, we generate its sine and cosine by generating a random point (V1, V2) in the circle and then setting

sin(Θ) = V2/R = V2/√(V1² + V2²), cos(Θ) = V1/R = V1/√(V1² + V2²).

Since R² is uniform on (0,1) and independent of Θ, from the Box-Muller transformation,

X1 = √(−2 log R²) V1/R, X2 = √(−2 log R²) V2/R

are independent standard normal random variables.

Algorithm:

step 1. Generate random numbers U1 and U2.

step 2. Set V1 = 2 U1 − 1, V2 = 2 U2 − 1, and S = V1² + V2².

step 3. If S > 1, return to step 1. Otherwise, return the independent normal random variables

X1 = √(−2 log S / S) V1, X2 = √(−2 log S / S) V2.
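The polar (Marsaglia) method as a Python sketch; the function name and seed are illustrative:

import math
import random

rng = random.Random(13)

def polar_method():
    """Return a pair of independent N(0, 1) variables, avoiding trig functions."""
    while True:
        v1, v2 = 2 * rng.random() - 1, 2 * rng.random() - 1
        s = v1 * v1 + v2 * v2
        if 0 < s <= 1:
            factor = math.sqrt(-2 * math.log(s) / s)
            return factor * v1, factor * v2

print(polar_method())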

Suppose Z ∼ N(0, 1). Then use

X = µ + σ Z ∼ N(µ, σ²)

to simulate a normal distribution with mean µ and variance σ².

1.4.3 Multivariate normal random variables

Suppose X ∼ Nk(µ, Σ), a k-dimensional normal random variable.

Cholesky decomposition: for a positive definite Σ there exists a unique lower-triangular matrix L with positive diagonal entries such that Σ = L L^T. Then

X = µ + LZ,

where Z = (Z1, · · · , Zk)T , and Z1, · · · , Zk are i.i.d. N(0,1) random variables.

Example Suppose X = (X1, X2)T ∼ N2(µ, Σ), where µ = (µ1, µ2),

Σ = ( σ1²       ρ σ1 σ2
      ρ σ1 σ2   σ2²     ).


First simulate Z1 and Z2 i.i.d. with distribution N(0, 1). Let

Z2* = ρ Z1 + √(1 − ρ²) Z2.

Then Z2* ∼ N(0, 1), and Z1 and Z2* have correlation ρ. We simulate (X1, X2) by

X1 = µ1 + σ1 Z1,
X2 = µ2 + σ2 Z2* = µ2 + σ2 ( ρ Z1 + √(1 − ρ²) Z2 ).
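A Python sketch of the bivariate case; the function name, parameter values, and seed are illustrative choices:

import numpy as np

rng = np.random.default_rng(14)

def bivariate_normal(mu1, mu2, sigma1, sigma2, rho, n=1):
    """Simulate n draws of (X1, X2) ~ N2(mu, Sigma) via the explicit 2x2 Cholesky factor."""
    z1 = rng.standard_normal(n)
    z2 = rng.standard_normal(n)
    x1 = mu1 + sigma1 * z1
    x2 = mu2 + sigma2 * (rho * z1 + np.sqrt(1 - rho**2) * z2)
    return x1, x2

x1, x2 = bivariate_normal(0.0, 1.0, 2.0, 1.5, 0.6, n=5)
print(np.column_stack([x1, x2]))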

1.5 ARMA models

Suppose the time series X1, · · · , XN is stationary, and let

µ = E(Xt), γ(h) = Cov(Xt, Xt+h), ρ(h) = γ(h)/γ(0),

where γ(h) is called the autocovariance function (ACVF) and ρ(h) is called the autocorrelation function (ACF). Note γ(0) = Var(Xt).

MA(1) Consider a moving average of order 1

Xt = α0 + εt + β1 εt−1,

where εt are i.i.d. N(0, σ2).

µ = E(Xt) = α0, γ(0) = Var(Xt) = (1 + β1²) σ²,

ρ(1) = β1/(1 + β1²), ρ(h) = 0 for h ≥ 2.

Proof.

E(Xt) = α0 + E εt + β1 E εt−1 = α0.

Var(Xt) = Var(εt) + β1² Var(εt−1) = σ² + β1² σ².

γ(1) = Cov(Xt, Xt+1) = Cov(εt + β1 εt−1, εt+1 + β1 εt) = β1 Cov(εt, εt) = β1 σ².

For h ≥ 2,

γ(h) = Cov(εt + β1 εt−1, εt+h + β1 εt+h−1) = 0.

AR(1) Consider an autoregressive process of order 1

Xt = α0 + α1 Xt−1 + εt,


where εt are i.i.d. N(0, σ2).

µ = E(Xt) = α0/(1 − α1), γ(0) = Var(Xt) = σ²/(1 − α1²), ρ(h) = α1^h.

Proof.

µ = α0 + α1 E(Xt−1) + E(εt) = α0 + α1 µ.

γ(0) = Var(α1 Xt−1 + εt) = α1² Var(Xt−1) + Var(εt) = α1² γ(0) + σ².

γ(h) = Cov(α0 + α1 Xt−1 + εt, α0 + α1 Xt+h−1 + εt+h)
     = α1 Cov(α0 + α1 Xt−1 + εt, Xt+h−1)
     = α1 Cov(Xt, Xt+h−1) = α1 γ(h − 1)
     = · · · = α1^h γ(0).

ARMA(1,1) Consider an ARMA(1,1) process

Xt = α0 + α1 Xt−1 + εt + β1 εt−1,

where εt are i.i.d. N(0, σ2).

µ = E(Xt) = α0/(1 − α1),

γ(0) = Var(Xt) = σ² ( 1 + (α1 + β1)²/(1 − α1²) ) = σ² (1 + 2 α1 β1 + β1²)/(1 − α1²),

γ(1) = σ² ( α1 + β1 + α1 (α1 + β1)²/(1 − α1²) ) = α1 γ(0) + β1 σ²,

ρ(1) = α1 + β1 σ²/γ(0), ρ(h) = α1 ρ(h − 1) = α1^{h−1} ρ(1) for h ≥ 2.

Proof.

µ = α0 + α1 E(Xt−1) = α0 + α1 µ.

γ(0) = Var(Xt) = α1² Var(Xt−1) + Var(εt) + β1² Var(εt−1) + 2 α1 β1 Cov(Xt−1, εt−1)
     = α1² γ(0) + σ² + β1² σ² + 2 α1 β1 σ²,

where the last equality is due to

Cov(Xt−1, εt−1) = Cov(α0 + α1 Xt−2 + εt−1 + β1 εt−2, εt−1) = Cov(εt−1, εt−1) = σ².

Algorithm to generate an ARMA(1,1) process without a given initial value: for t = −200 (say), −199, · · · , 1, 2, · · · :

step 1. Generate independent εt ∼ N(0, σ²).


step 2. Compute Xt from

Xt = α0 + α1 Xt−1 + εt + β1 εt−1,

with X−201 = 0 and ε−201 = 0 (say).

The initial segment {X−200, · · · , X0} is a burn-in period and is discarded.
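A Python sketch of this burn-in scheme for ARMA(1,1); the parameter values, burn-in length, and seed are illustrative:

import numpy as np

rng = np.random.default_rng(15)

def simulate_arma11(alpha0, alpha1, beta1, sigma, T, burn=200):
    """Simulate X_1, ..., X_T from an ARMA(1,1) model, discarding a burn-in segment."""
    eps = rng.normal(0.0, sigma, size=burn + T + 1)
    x = np.zeros(burn + T + 1)
    for t in range(1, burn + T + 1):
        x[t] = alpha0 + alpha1 * x[t - 1] + eps[t] + beta1 * eps[t - 1]
    return x[burn + 1:]        # drop the initial burn-in values

series = simulate_arma11(alpha0=0.1, alpha1=0.5, beta1=0.3, sigma=1.0, T=500)
print(series[:5], series.mean())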

Estimation: covariance Based on observations X1, · · · , XN, we define the sample mean and sample variance

X̄ = (1/N) ∑_{t=1}^{N} Xt, S² = (1/N) ∑_{t=1}^{N} (Xt − X̄)².

We use X̄ and S² to estimate µ and γ(0) = Var(Xt). Define the sample autocovariance

γ̂(h) = (1/N) ∑_{t=1}^{N−h} (Xt − X̄)(Xt+h − X̄)

(note γ̂(0) = S²) and the sample autocorrelation

ρ̂(h) = γ̂(h)/γ̂(0).

Then the ACVF γ(h) and ACF ρ(h) are estimated by γ̂(h) and ρ̂(h), respectively. Plot the ACF ρ̂(h) against h and check its correlation pattern. The pattern helps us to select a model.

Estimation: parameter Consider the AR(1) model

Xt = α0 + α1 Xt−1 + εt,

where the εt are i.i.d. N(0, σ²).

Suppose we have observations X1, · · · , Xn from the model.

Method of Moments Let X̄, S²_n, and ρ̂(1) be the sample mean, sample variance, and sample autocorrelation at lag 1. Then set

α0/(1 − α1) = X̄, σ²/(1 − α1²) = S²_n, α1 = ρ̂(1),

and obtain the solution

α̂1 = ρ̂(1), α̂0 = (1 − α̂1) X̄, σ̂² = (1 − α̂1²) S²_n.

MLE For each Xk, conditional on X1, · · · , Xk−1, the conditional distribution of Xk is a normal distribution with mean α0 + α1 Xk−1 and variance σ². Thus, the joint density of X1, · · · , Xn is

f(X1) ∏_{i=2}^{n} (1/σ) φ( (Xi − α0 − α1 Xi−1)/σ ) = f(X1) (1/(2πσ²))^{(n−1)/2} exp{ −(1/(2σ²)) ∑_{i=2}^{n} (Xi − α0 − α1 Xi−1)² },


which gives the likelihood as a function of the αi and σ². To find the MLEs of the αi and σ² we need to maximize the likelihood function. First, we note that maximizing the likelihood over the αi (ignoring the contribution of f(X1)) is equivalent to minimizing

∑_{i=2}^{n} (Xi − α0 − α1 Xi−1)²,

which is the least squares criterion of the regression problem for the pairs

(X2, X1), (X3, X2), · · · , (Xn, Xn−1).

Denote the least squares estimators by α̂i. Then

α̂1 = [ ∑_{i=2}^{n} Xi Xi−1 − (∑_{i=2}^{n} Xi)(∑_{i=2}^{n} Xi−1)/(n − 1) ] / [ ∑_{i=2}^{n} Xi−1² − (∑_{i=2}^{n} Xi−1)²/(n − 1) ],

α̂0 = (1/(n − 1)) ( ∑_{i=2}^{n} Xi − α̂1 ∑_{i=2}^{n} Xi−1 ).

Then the estimator of σ² is

σ̂² = (1/(n − 3)) ∑_{i=2}^{n} (Xi − α̂0 − α̂1 Xi−1)².

For large samples, the estimator α̂1 is approximately normal with mean α1 and variance (1 − α1²)/n.
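A Python sketch of this conditional least squares fit for AR(1), under the assumption that x is a one-dimensional array of observations; the function name is illustrative:

import numpy as np

def fit_ar1(x):
    """Conditional least squares / approximate MLE for X_t = a0 + a1 X_{t-1} + eps_t."""
    x = np.asarray(x, dtype=float)
    y, lag = x[1:], x[:-1]
    m = len(y)                                  # number of pairs, n - 1
    a1 = (np.sum(y * lag) - y.sum() * lag.sum() / m) / \
         (np.sum(lag ** 2) - lag.sum() ** 2 / m)
    a0 = (y.sum() - a1 * lag.sum()) / m
    resid = y - a0 - a1 * lag
    sigma2 = np.sum(resid ** 2) / (len(x) - 3)
    se_a1 = np.sqrt((1 - a1 ** 2) / len(x))     # approximate standard error of a1
    return a0, a1, sigma2, se_a1

# e.g. fit the model to a simulated series such as the one generated above:
# print(fit_ar1(series))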

Example The Dow Jones Utilities Index, August 28 to December 18, 1972. Denote by Dt the index. The difference at lag 1, Xt = Dt − Dt−1, seems to be stationary. There are 77 values of Xt. Fitting the AR(1) model to the lag-1 difference gives α̂0 = 0.0699 and α̂1 = 0.4471. The standard deviation of α̂1 is

√((1 − 0.4471²)/77) = 0.1019.

A 95% confidence interval for α1 is 0.4471 ± 1.96 × 0.1019 = 0.4471 ± 0.1997 = (0.2474, 0.6468). The estimated value of σ is σ̂ = 0.1455.

The fitted model is

Xt = 0.0699 + 0.4471 Xt−1 + εt,

where the εt are i.i.d. normal with mean zero and standard deviation 0.1455.
