7/23/2019 ACTL2003 Summary Notes
http://slidepdf.com/reader/full/actl2003-summary-notes 1/62
1
Zhi Ying Feng
ACTL2003 Stochastic Modelling
By Zhi Ying Feng
Contents

Part 1: Stochastic Processes ................................................................................................................. 4
Stochastic Processes ......................................................................................................................... 4
Increment Properties .................................................................................................................... 4
Markov Processes ............................................................................................................................ 4
Transition Probability .................................................................................................................. 5
Chapman-Kolmogorov Equations................................................................................................ 5
Classification of States ................................................................................................................. 6
Irreducible Markov Chain ............................................................................................................ 6
Recurrent and Transient States .................................................................................................... 6
Class Properties ............................................................................................................................ 8
Limiting Probabilities .................................................................................................................. 8
Mean Time in Transient States .................................................................................................... 9
Branching Processes .................................................................................................................... 9
Probability Generating Functions .............................................................................................. 10
Time Reversible Markov Chains ............................................................................................... 11
Exponential, Poisson and Gamma Distributions........................................................................ 11
Counting Process............................................................................................................................ 12
Poisson Process .......................................................................................................................... 13
Inter-arrival and Waiting Time .................................................................................................. 13
Order Statistics ........................................................................................................................... 13
Conditional Distribution of Arrival Time .................................................................................. 14
Thinning of Poisson Process ...................................................................................................... 14
Non-homogenous Poisson Process ............................................................................................ 15
Compound Poisson Process ....................................................................................................... 15
Continuous-time Markov Chains ................................................................................................... 16
Time Spent in a State ................................................................................................................. 16
Transition Rates and Probabilities ............................................................................................. 17
Chapman-Kolmogorov Equations.............................................................................................. 18
Limiting Probabilities ................................................................................................................ 19
Embedded Markov Chain .......................................................................................................... 19
Time Reversibility...................................................................................................................... 20
Birth and Death Process ................................................................................................................. 21
Transition Rates and Embedded Markov Chain ........................................................................ 21
Examples of Birth and Death Processes .................................................................................... 21
Expected Time in States ............................................................................................................. 22
Kolmogorov Equations .............................................................................................................. 23
Limiting Probabilities ................................................................................................................ 23
Application: Pure Birth Process ................................................................................................. 24
Application: Simple Sickness Model ......................................................................................... 24
Occupancy Probabilities and Time ............................................................................................ 25
First Holding Time ..................................................................................................................... 25
Non-homogenous Markov Jump Processes ................................................................................... 26
Residual Holding Time .............................................................................................................. 27
Current Holding Time ................................................................................................................ 27
Part 2: Time Series ............................................................................................................................. 28
Introduction to Time Series ............................................................................................................ 28
Classical Decomposition Model ................................................................................................ 28
Moving Average Linear Filters .................................................................................................. 28
Differencing ............................................................................................................................... 29
Stationarity ................................................................................................................................. 30
Sample Statistics ........................................................................................................................ 30
Noise .......................................................................................................................................... 31
Linear Processes ......................................................................................................................... 31
Time Series Models ....................................................................................................................... 32
Moving Average Process ........................................................................................................... 32
Autoregressive Process .............................................................................................................. 33
Autoregressive Moving Average Models .................................................................................. 34
Causality..................................................................................................................................... 34
Invertibility................................................................................................................................. 35
Calculation of ACF .................................................................................................................... 35
Partial Autocorrelation Function ................................................................................................ 37
Model Building .............................................................................................................................. 38
Model Selection ......................................................................................................................... 38
Parameter Estimation ................................................................................................................. 39
Model Diagnosis ........................................................................................................................ 40
Non-Stationarity ............................................................................................................................. 41
Stochastic Trends ....................................................................................................................... 41
ARIMA Model ........................................................................................................................... 42
SARIMA Model ......................................................................................................................... 42
Dickey-Fuller Test ..................................................................................................................... 43
Overdifferencing ........................................................................................................................ 44
Cointegrated Time Series ........................................................................................................... 44
Time Series Forecasting ................................................................................................................. 45
Time Series and Markov Property ............................................................................................. 45
k-step Ahead Predictor ............................................................................................................... 46
Best Linear Predictor ................................................................................................................. 47
Part 3: Brownian Motion.................................................................................................................... 48
Definitions .................................................................................................................................. 48
Properties of Brownian Motion.................................................................................................. 48
Brownian Motion and Symmetric Random Walk...................................................................... 49
Brownian Motion with Drift ...................................................................................................... 49
Geometric Brownian Motion ..................................................................................................... 49
Gaussian Processes .................................................................................................................... 50
Differential Form of Brownian Motion ..................................................................................... 50
Stochastic Differential Equations............................................................................................... 51
Stochastic Integration ................................................................................................................. 52
Part 4: Simulation............................................................................................................................... 53
Continuous Random Variables Simulation .................................................................................... 53
Pseudo-Random Numbers.......................................................................................................... 53
Inverse Transform Method......................................................................................................... 53
Discrete Random Variable Simulation ...................................................................................... 54
Acceptance-Rejection Method ................................................................................................... 55
Simulation Using Distributional Relationships.......................................................................... 56
Monte Carlo Simulation ................................................................................................................. 57
Expectation and Variance .......................................................................................................... 57
Antithetic Variables ................................................................................................................... 58
Control Variates ......................................................................................................................... 59
Importance Sampling ................................................................................................................. 60
Number of Simulations .............................................................................................................. 60
Part 1: Stochastic Processes
Stochastic Processes
A stochastic process is any collection of random variables, denoted as:

\{X(t), t \in T\}

- T is the index set of the process, usually the time parameter
- X(t) is the state of the process at time t
- S is the state space, i.e. the set of values that X(t) can take on
Stochastic processes can be classified according to the nature of the:
Index Set T
- Discrete time process: index set is finite or countably infinite
- Continuous time process: index set is continuous
State Space S
- Discrete state space: X(t) is a discrete random variable
- Continuous state space: X(t) is a continuous random variable
A sample path or a realisation of a stochastic process is a particular assignment of possible values
of X(t) for all t ∈ T .
Increment Properties
In a stochastic process, an increment is the random variable X(t_2) - X(t_1) for any t_1 < t_2.

A stochastic process has independent increments if X(t_1) - X(t_0), X(t_2) - X(t_1), \ldots, X(t_n) - X(t_{n-1}) are
independent for all t_0 < t_1 < \cdots < t_n. Equivalently, the r.v. X(t+s) - X(t) and X(t) are independent
for all s, t \ge 0, i.e. future increases are independent of the past or present.

A stochastic process has stationary increments if X(t_2 + s) - X(t_1 + s) and X(t_2) - X(t_1), i.e.
increments of the same length, have the same probability distribution, for all t_1 < t_2 and s > 0.
Markov Processes
A Markov process is a stochastic process that has the Markov property, where given the present
state, the future state is independent of the past states:

\Pr[X(t_{n+1}) = x_{n+1} \mid X(t_1) = x_1, \ldots, X(t_n) = x_n] = \Pr[X(t_{n+1}) = x_{n+1} \mid X(t_n) = x_n]

A Markov Chain is a Markov process on a discrete index set T = \{0, 1, 2, \ldots\}, denoted by

\{X_n, n = 0, 1, 2, \ldots\}

where X_n = k means the process is in state k at time n, with a finite or countable state space:

S = \{0, 1, 2, \ldots, n\} \quad \text{or} \quad S = \{0, 1, 2, \ldots\}

The Markov property for a Markov chain is given by:

\Pr[X_{n+1} = x_{n+1} \mid X_0 = x_0, \ldots, X_n = x_n] = \Pr[X_{n+1} = x_{n+1} \mid X_n = x_n]
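The Markov property lends itself directly to simulation: the next state is drawn using only the current state. A minimal sketch, assuming a hypothetical two-state chain (the transition matrix below is an illustration, not taken from these notes):

```python
import random

# Hypothetical two-state chain (states 0 and 1); the probabilities are
# assumed purely for illustration.
P = [[0.9, 0.1],
     [0.5, 0.5]]

def simulate_chain(P, x0, n_steps, rng):
    """Simulate X_0, X_1, ..., X_n: the next state depends only on the current one."""
    path = [x0]
    for _ in range(n_steps):
        current = path[-1]
        u = rng.random()
        # move to state 0 with probability P[current][0], else to state 1
        path.append(0 if u < P[current][0] else 1)
    return path

rng = random.Random(42)
path = simulate_chain(P, 0, 10, rng)
print(path)  # one realisation (sample path) of the chain
```

Each `path` produced this way is a sample path of the chain in the sense defined earlier.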
Transition Probability

The one-step transition probabilities of the Markov chain are the conditional probabilities of
moving to state j in one step, given that the process is in state i at present:

P_{ij}^{(n, n+1)} = \Pr[X_{n+1} = j \mid X_n = i]

If the one-step transition probabilities do NOT depend on time, i.e. are the same for all n to n+1, then the
Markov chain is homogenous with stationary transition probabilities:

P_{ij}^{(n, n+1)} = P_{ij} = \Pr[X_{n+1} = j \mid X_n = i]

The transition probability matrix is the matrix consisting of all transition probabilities:

P = [P_{ij}] = \begin{pmatrix} P_{00} & P_{01} & P_{02} & \cdots \\ P_{10} & P_{11} & P_{12} & \cdots \\ P_{20} & P_{21} & P_{22} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}

This matrix satisfies the properties:

P_{ij} \ge 0 \text{ for all } i, j = 0, 1, 2, \ldots \quad \text{and} \quad \sum_{j} P_{ij} = 1 \text{ for } i = 0, 1, 2, \ldots

Note: in the non-homogenous case, the time must be specified.

The n-step transition probability is the probability that a process in state i will be in state j after n
steps:

P_{ij}^{(n)} = \Pr[X_{n+m} = j \mid X_m = i]
Chapman-Kolmogorov Equations

The Chapman-Kolmogorov equations compute the (n+m)-step transition probabilities:

P_{ij}^{(n+m)} = \sum_{k=0}^{\infty} P_{ik}^{(n)} P_{kj}^{(m)} \quad \text{for all } n, m \ge 0

This is the sum of the transition probabilities of reaching an intermediate state k after n steps, then
reaching state j after m more steps. Or in matrix multiplication:

P^{(n+m)} = P^{(n)} P^{(m)}, \quad P^{(n)} = \underbrace{P \cdot P \cdots P}_{n \text{ times}}

where P^{(n)} = [P_{ij}^{(n)}] is the matrix consisting of the n-step transition probabilities.

Consider the matrix multiplication A * B = AB:
- To get the ith row of AB, multiply the ith row of A by B
- To get the jth column of AB, multiply A by the jth column of B

Kolmogorov Forward Equations: start in state i, transition into state k after n steps, then a one-
step transition to state j:

P_{ij}^{(n+1)} = \sum_{k=0}^{\infty} P_{ik}^{(n)} P_{kj} \quad \text{for all } n \ge 0

Kolmogorov Backward Equations: start in state i, make a one-step transition into state k, then an n-step
transition into state j:

P_{ij}^{(n+1)} = \sum_{k=0}^{\infty} P_{ik} P_{kj}^{(n)} \quad \text{for all } n \ge 0
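These identities are easy to check numerically, since the n-step transition matrix is just the nth matrix power of P. A small sketch with NumPy, using an assumed 2x2 transition matrix:

```python
import numpy as np

# Illustrative 2-state transition matrix (an assumption, not from the notes).
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# n-step transition matrices: P^(n) is the nth matrix power of P
P2 = np.linalg.matrix_power(P, 2)
P3 = np.linalg.matrix_power(P, 3)
P5 = np.linalg.matrix_power(P, 5)

# Chapman-Kolmogorov: P^(2+3) = P^(2) @ P^(3)
assert np.allclose(P5, P2 @ P3)

# Forward equations: P^(n+1) = P^(n) @ P  (one-step transition last)
assert np.allclose(np.linalg.matrix_power(P, 4), P3 @ P)
# Backward equations: P^(n+1) = P @ P^(n)  (one-step transition first)
assert np.allclose(np.linalg.matrix_power(P, 4), P @ P3)
```

Every row of each `P^(n)` still sums to 1, since each power is itself a transition matrix.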
Classification of States
Absorbing state

A state i is said to be an absorbing state if P_{ii} = 1, i.e. P_{ij} = 0 for all j \ne i. An absorbing state is a state
in which, once the process arrives, it stays forever.

Accessible

State j is accessible from state i if P_{ij}^{(n)} > 0 for some n \ge 0. That is, state j is accessible from i if there is a
positive probability that the process will be in state j at some future time, given that the process is
currently in state i. This is written as:

i \to j

Note that:

\text{if } i \to j \text{ and } j \to k, \text{ then } i \to k

For an absorbing state, the only accessible state is itself.
Communicate

States i and j communicate if:

i \to j \text{ and } j \to i, \quad \text{i.e. } i \leftrightarrow j

That is, there is a positive probability that if the process is currently in state i, then at a future point
in time the process will return to state i in between visiting state j. Note that:

- i \leftrightarrow i for all i
- if i \leftrightarrow j then j \leftrightarrow i
- if i \leftrightarrow j and j \leftrightarrow k then i \leftrightarrow k

The class of states that communicate with state i is the set:

C(i) = \{ j \in S : i \leftrightarrow j \}

That is, there is a positive probability that if the process is in state i, then at a future point in time the
process will return to state i in between visiting at least one state in class C(i).
Irreducible Markov Chain
A Markov chain is irreducible if there is only ONE class, i.e. all states communicate with each
other. Properties:

- All states in a finite, irreducible Markov chain are recurrent
- The probability of returning to the current state in the long run is positive
- The probability of being in any state in the long run is also positive
Recurrent and Transient States
Let f_i be the probability that the process, starting in state i, will return to state i sometime in the future:

f_i = \Pr[X_n = i \text{ for some } n \ge 1 \mid X_0 = i]

A state is:

- Recurrent if f_i = 1
- Transient if f_i < 1
If state i is recurrent then, starting in state i, the process will return to state i at some point in the
future, and the process will enter state i infinitely often.
If state i is transient then, starting in state i, whether the process ever returns to state i is a Bernoulli trial:

X = \begin{cases} 0 & \text{if the process returns to state } i \\ 1 & \text{if the process does not return to state } i \end{cases}, \quad X \sim \text{Ber}(p = 1 - f_i)

Then the probability that the process will be in state i for exactly n time periods can be modelled
using the geometric distribution:

f_i^{\,n-1} (1 - f_i)

The total number of periods that the process is in state i is given by:

\sum_{n=0}^{\infty} I_n, \quad \text{where } I_n = \begin{cases} 1 & \text{if } X_n = i \\ 0 & \text{if } X_n \ne i \end{cases}

with the expected number of time periods, given that the process starts in state i:

E\left[\sum_{n=0}^{\infty} I_n \,\Big|\, X_0 = i\right] = \sum_{n=0}^{\infty} E[I_n \mid X_0 = i] = \sum_{n=0}^{\infty} \Pr[X_n = i \mid X_0 = i] = \sum_{n=0}^{\infty} P_{ii}^{(n)} = \frac{1}{1 - f_i}

An alternate definition for recurrent/transient states: a state i is

\text{recurrent if } \sum_{n=1}^{\infty} P_{ii}^{(n)} = \infty; \quad \text{transient if } \sum_{n=1}^{\infty} P_{ii}^{(n)} < \infty

That is, a transient state will only be visited a finite number of times.
Note:

- In a finite state Markov chain, AT LEAST one state must be recurrent and NOT ALL states
can be transient. Otherwise, after a finite number of steps, there would be no state left to visit!
- An absorbing state is recurrent, since it revisits itself infinitely often
- If state i communicates with state j and state i is:
  - Recurrent, then state j is also recurrent
  - Transient, then state j is also transient

A class of states is:

- Recurrent if all states in the class are recurrent
- Transient if all states in the class are transient
- Closed if all of the states in the class can only lead to states WITHIN the class. Hence,
states in a closed class are recurrent
- Open if the states in the class can lead to states OUTSIDE the class. Hence states in an open
class are transient
Class Properties

Period

State i has period d(i) if d(i) is the greatest common divisor of all n \ge 1 for which P_{ii}^{(n)} > 0, i.e. it is
the g.c.d. of the number of steps in all possible paths back to state i, if the process starts in state i. Note that:

- If P_{ii}^{(n)} = 0 for all n, then d(i) = 0
- If i \leftrightarrow j, then d(i) = d(j)
- A state with period 1 is called aperiodic

Positive Recurrent

A recurrent state i is positive recurrent if the expected time of return to itself is finite. In a finite
state Markov chain, ALL recurrent states are positive recurrent.

Ergodic

A state i is ergodic if it is positive recurrent and aperiodic.
Limiting Probabilities

Limiting probabilities are the long run probabilities that a process is in a certain state. For an
irreducible (only one class, all states communicate) and ergodic Markov chain, the limiting
probability exists and is independent of i, i.e. of where the process starts from:

\pi_j = \lim_{n \to \infty} P_{ij}^{(n)} > 0

where \pi_j is the unique non-negative solution to the set of equations:

\pi_j = \sum_{i=0}^{\infty} \pi_i P_{ij} \quad \text{and} \quad \sum_{j=0}^{\infty} \pi_j = 1

Or, in matrix form:

\pi = \pi P, \quad \text{where } \pi = (\pi_0, \pi_1, \pi_2, \ldots)

This can be interpreted as:

- The probability that the process is in state j at time t is the same as at t+1, as t \to \infty
- \pi_j is the long run proportion of time that the Markov chain is in state j
- Note that the limiting probabilities may not exist at all, or only for even/odd transitions
- If state j is transient and state i is recurrent, then \lim_{n \to \infty} P_{ij}^{(n)} = 0 and \lim_{n \to \infty} P_{ii}^{(n)} = 1

If the distribution of the initial states is chosen to be the limiting distribution, then the probability
of being in state j initially is the same as the probability of being in state j at time n:

\text{if } \Pr[X_0 = j] = \pi_j, \text{ then } \Pr[X_n = j] = \pi_j

The mean time between visits to state j is the expected number of transitions m_{jj} until a Markov
chain which starts in state j returns to state j, given by the mean of the geometric distribution:

m_{jj} = \frac{1}{\pi_j}

That is, the proportion of time in state j equals the inverse of the mean time between visits to state j.
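In practice the system \pi = \pi P with \sum_j \pi_j = 1 is a linear solve. A sketch for an assumed 2-state chain, also checking that a high matrix power of P converges to the limiting distribution and that m_{jj} = 1/\pi_j:

```python
import numpy as np

# Illustrative irreducible, aperiodic 2-state chain (assumed matrix).
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# Solve pi = pi P together with sum(pi) = 1: stack (P^T - I) with a
# normalisation row and least-squares solve the consistent system.
n = P.shape[0]
A = np.vstack([P.T - np.eye(n), np.ones(n)])
b = np.zeros(n + 1)
b[-1] = 1.0
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

# Every row of a high power of P converges to pi
assert np.allclose(np.linalg.matrix_power(P, 100)[0], pi)

# Mean time between visits to state j: m_jj = 1 / pi_j
m = 1.0 / pi
print(pi, m)
```

For this matrix the solution is \pi = (4/7, 3/7), so the mean return times are m = (7/4, 7/3).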
Mean Time in Transient States

For a finite state Markov chain, let its transient states be denoted by T = \{1, 2, \ldots, t\}. Let P_T denote the
transition matrix that ONLY contains the transient states. Note that its rows will sum to less than 1.

P_T = \begin{pmatrix} P_{11} & P_{12} & \cdots & P_{1t} \\ P_{21} & P_{22} & \cdots & P_{2t} \\ \vdots & \vdots & & \vdots \\ P_{t1} & P_{t2} & \cdots & P_{tt} \end{pmatrix}

The mean time spent in transient states s_{ij} is the expected number of periods that the Markov
chain is in state j, given that the process starts in state i:

S = \begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1t} \\ s_{21} & s_{22} & \cdots & s_{2t} \\ \vdots & \vdots & & \vdots \\ s_{t1} & s_{t2} & \cdots & s_{tt} \end{pmatrix}

Conditioning on the initial transition, we have for i, j \in T:

s_{ij} = \begin{cases} \sum_{k=1}^{t} P_{ik} s_{kj} & \text{if } i \ne j \\ 1 + \sum_{k=1}^{t} P_{ik} s_{kj} & \text{if } i = j \end{cases}

i.e. the mean time spent in state j, given the process is currently in state i, is the transition
probability into state k starting from state i, multiplied by the time spent in state j starting from state k,
summed over all possible k (plus the current period when i = j).

Then we have:

S = I + P_T S \quad \Rightarrow \quad S = (I - P_T)^{-1}

Note the 2x2 matrix inverse:

A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \quad A^{-1} = \frac{1}{\det A} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}, \quad \det A = ad - bc
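The identity S = (I - P_T)^{-1} is a one-liner to verify numerically. A sketch, assuming a hypothetical chain with two transient states (the remaining probability mass in each row leads to recurrent states outside P_T):

```python
import numpy as np

# Hypothetical restriction of a transition matrix to transient states {1, 2};
# rows sum to less than 1 (the rest leads out of the transient class).
P_T = np.array([[0.4, 0.3],
                [0.2, 0.5]])

# Mean time in transient states: S = (I - P_T)^{-1}
S = np.linalg.inv(np.eye(2) - P_T)

# s_ij = expected number of periods in transient state j, starting from i;
# S satisfies the conditioning identity S = I + P_T S.
assert np.allclose(S, np.eye(2) + P_T @ S)
print(S)
```

Here det(I - P_T) = 0.6 * 0.5 - 0.3 * 0.2 = 0.24, so by the 2x2 inverse formula s_{11} = 0.5 / 0.24.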
Branching Processes

A branching process is a Markov chain that describes the size of a population where each member
in each generation produces a random number of offspring in the next generation.

- Let p_j, j \ge 0, be the probability that each individual in each generation produces j offspring
- Assume that the number of offspring of each individual is independent of the numbers produced by others
- Let X_0 be the number of individuals initially present, i.e. the zeroth generation
- Individuals produced from the nth generation belong to the (n+1)th generation
- Let X_n be the size of the nth generation.
Then \{X_n, n = 0, 1, \ldots\} is a branching process with:

- P_{00} = 1, and state 0 is recurrent
- If p_0 > 0, all states other than 0 are transient, i.e. the population will either die out or
converge to infinity, since transient states are visited a finite number of times

After n+1 generations, the size of the population is a random sum:

X_{n+1} = \sum_{i=1}^{X_n} Y_i

where the Y_i are i.i.d. r.v. that represent the number of offspring of the ith individual of the nth generation:

\Pr[Y_i = k] = p_k \text{ for } k = 0, 1, \ldots, \quad \sum_{k=0}^{\infty} p_k = 1

The mean and variance of the number of offspring for an individual are:

\mu = E[Y_i] = \sum_{k=0}^{\infty} k p_k, \quad \sigma^2 = \text{var}(Y_i) = \sum_{k=0}^{\infty} (k - \mu)^2 p_k

The mean and variance of the population size of the nth generation are:

E[X_n] = \mu^n, \quad \text{var}(X_n) = \begin{cases} \sigma^2 \mu^{n-1} \dfrac{\mu^n - 1}{\mu - 1} & \text{if } \mu \ne 1 \\ n \sigma^2 & \text{if } \mu = 1 \end{cases}

Under the assumption that X_0 = 1, the probability of extinction \pi_0, i.e. that everyone eventually dies, is given by the smallest non-negative solution of:

\pi_0 = \sum_{k=0}^{\infty} \pi_0^k p_k

Since the pgf of Y is G_Y(s) = \sum_{k} p_k s^k, we have \pi_0 = G_Y(\pi_0).

- If \mu > 1 then \pi_0 < 1
- If \mu \le 1 then \pi_0 = 1, i.e. extinction is certain
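The fixed-point equation \pi_0 = G_Y(\pi_0) can be solved by iterating the pgf from 0, which converges to the smallest root. A sketch with an assumed offspring distribution p_0 = 0.25, p_1 = 0.25, p_2 = 0.5 (so \mu = 1.25 > 1 and extinction is not certain):

```python
# Assumed offspring distribution, chosen for illustration: mu = 1.25 > 1.
p = [0.25, 0.25, 0.5]

def G(s, p):
    """pgf of the offspring distribution: G(s) = sum_k p_k s^k."""
    return sum(pk * s**k for k, pk in enumerate(p))

# Iterate pi0 <- G(pi0) from 0; this converges to the smallest
# non-negative root of pi0 = G(pi0), the extinction probability.
pi0 = 0.0
for _ in range(200):
    pi0 = G(pi0, p)

print(pi0)
```

For this distribution the equation 0.5 s^2 - 0.75 s + 0.25 = 0 factors as (2s - 1)(s - 1) = 0, so the smallest root, and hence the extinction probability, is 0.5.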
Probability Generating Functions

Let X be an integer-valued random variable with \Pr[X = i] = p_i for i = 0, 1, 2, \ldots. The p.g.f. is:

G_X(t) = E[t^X] = \sum_{x=0}^{\infty} t^x \Pr[X = x]

Properties:

- Relationship with the probability mass function of X:

p_x = \Pr[X = x] = \frac{1}{x!} \left. \frac{d^x}{dt^x} G_X(t) \right|_{t=0}

- Relationship with the moment generating function of X:

m_X(t) = E[e^{Xt}] = G_X(e^t), \quad G_X(t) = m_X(\log t)
Time Reversible Markov Chains

Consider a stationary ergodic Markov chain with transition probabilities P_{ij} and stationary
probabilities \pi_i. If we start at time n and work backwards, the reversed sequence of states X_n, X_{n-1}, \ldots
is also a Markov chain, since independence is a symmetric relationship.

The transition probabilities of this reversed process are:

Q_{ij} = \Pr[X_m = j \mid X_{m+1} = i] = \frac{\Pr[X_m = j, X_{m+1} = i]}{\Pr[X_{m+1} = i]} = \frac{\Pr[X_m = j] \Pr[X_{m+1} = i \mid X_m = j]}{\Pr[X_{m+1} = i]} = \frac{\pi_j P_{ji}}{\pi_i}

A time reversible Markov chain is one where:

Q_{ij} = P_{ij} \quad \text{for all } i, j

For a time reversible Markov chain:

\pi_i P_{ij} = \pi_j P_{ji} \quad \text{for all } i, j

Thus, the rate at which the process goes from state i to state j (LHS) is the same as the rate at which
it goes from state j to state i.
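The detailed-balance condition is easy to check directly. A sketch for a two-state chain (any ergodic two-state chain is time reversible; the matrix and its stationary distribution below are the same assumed example used throughout, not from the notes):

```python
import numpy as np

# Assumed two-state ergodic chain with stationary distribution pi = (4/7, 3/7).
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
pi = np.array([4/7, 3/7])

assert np.allclose(pi @ P, pi)                       # pi is stationary: pi = pi P
assert np.isclose(pi[0] * P[0, 1], pi[1] * P[1, 0])  # detailed balance

# Reversed-chain transition probabilities: Q_ij = pi_j * P_ji / pi_i
Q = (P.T * pi[None, :]) / pi[:, None]
assert np.allclose(Q, P)  # Q = P, so this chain is time reversible
```

The rows of Q sum to 1, confirming the reversed process is itself a Markov chain.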
Exponential, Poisson and Gamma Distributions

A r.v. X has an exponential distribution if its probability density function is of the form:

f(x) = \begin{cases} \lambda e^{-\lambda x} & \text{if } x \ge 0 \\ 0 & \text{otherwise} \end{cases}

The cumulative distribution function and the survival function are of the form:

F(x) = \begin{cases} 1 - e^{-\lambda x} & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases}, \quad S(x) = 1 - F(x) = \begin{cases} e^{-\lambda x} & \text{if } x \ge 0 \\ 1 & \text{if } x < 0 \end{cases}

The moment generating function is given by:

M_X(t) = E[e^{tX}] = \frac{\lambda}{\lambda - t} \quad \text{for } t < \lambda

The mean and variance of the exponential distribution are:

E[X] = \frac{1}{\lambda}, \quad \text{var}(X) = \frac{1}{\lambda^2}

The hazard (or failure) rate is given by:

\mu(x) = \frac{f(x)}{S(x)} = \frac{-\frac{d}{dx} S(x)}{1 - F(x)} = \frac{\lambda e^{-\lambda x}}{e^{-\lambda x}} = \lambda
A key property of the exponential distribution is that it is memoryless. X is said to be memoryless if,
for all s, t > 0:

\Pr[X > s + t \mid X > t] = \Pr[X > s], \quad \text{or} \quad \Pr[X > s + t] = \Pr[X > s] \Pr[X > t]

Consider independent X_i for i = 1, 2, \ldots, n, exponential with parameters \lambda_i. Then:

- If all the parameters are equal (\lambda_i = \lambda), then Y_n = X_1 + X_2 + \cdots + X_n has a Gamma(n, \lambda)
distribution with pdf:

f_{Y_n}(t) = \lambda e^{-\lambda t} \frac{(\lambda t)^{n-1}}{(n-1)!}

- X = \min(X_1, X_2, \ldots, X_n) also has an exponential distribution, with parameter and mean:

\lambda = \sum_{i=1}^{n} \lambda_i \quad \text{and} \quad E[X] = \frac{1}{\sum_{i=1}^{n} \lambda_i}

- The probability that X_i is the smallest is given by:

\frac{\lambda_i}{\sum_{j=1}^{n} \lambda_j}

- If the parameters are not all equal, i.e. \lambda_i \ne \lambda_j, then the sum of the n independent exponential
r.v. has a hypoexponential (generalised Erlang) distribution with pdf:

f_{X_1 + \cdots + X_n}(x) = \sum_{i=1}^{n} C_{i,n} \lambda_i e^{-\lambda_i x}, \quad \text{where } C_{i,n} = \prod_{j \ne i} \frac{\lambda_j}{\lambda_j - \lambda_i}
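The "which exponential is smallest" result is a useful Monte Carlo exercise. A sketch with assumed rates (1, 2, 3), so the smallest should be X_1, X_2, X_3 with probabilities 1/6, 2/6, 3/6:

```python
import random

# Monte Carlo check that Pr[X_i is the smallest] = lam_i / sum(lam).
# The rates below are assumptions chosen for illustration.
rng = random.Random(0)
lams = [1.0, 2.0, 3.0]
n_trials = 200_000

wins = [0, 0, 0]
for _ in range(n_trials):
    xs = [rng.expovariate(lam) for lam in lams]   # independent exponentials
    wins[xs.index(min(xs))] += 1                  # record which was smallest

probs = [w / n_trials for w in wins]
print(probs)  # should be close to [1/6, 2/6, 3/6]
```

With 200,000 trials the Monte Carlo error on each proportion is below 0.01.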
Counting Process

A counting process \{N(t), t \ge 0\} represents the number of events that occur up to time t, i.e. it is a
discrete state space, continuous time stochastic process. A counting process has the properties:

- N(t) \ge 0
- N(t) is integer-valued
- N(s) \le N(t) for s \le t, i.e. it must be non-decreasing
- For s < t, N(t) - N(s) is the number of events that occurred in the interval (s, t]

A counting process has independent increments if the number of events that occur in the interval
(s, t], N(t) - N(s), is independent of the number of events that occur up to time s.

A counting process has stationary increments if N(t_2 + s) - N(t_1 + s) has the same distribution as
N(t_2) - N(t_1), i.e. the number of events that occur in any time interval depends only on
the length of the interval.
Poisson Process

A Poisson process, denoted by \{N(t), t \ge 0\}, is a counting process that counts the number of events
which occur at a rate \lambda from time 0. It has the properties:

- N(0) = 0
- Independent and stationary increments
- The number of events in any interval of length t has a Poisson distribution with mean \lambda t:

\Pr[N(s + t) - N(s) = n] = \Pr[N(t) = n] = e^{-\lambda t} \frac{(\lambda t)^n}{n!} \quad \text{(stationary increments)}
Inter-arrival and Waiting Time

The inter-arrival or holding times \{T_n, n = 1, 2, \ldots\} are the times between the (n-1)th and nth events. For a
Poisson process, the inter-arrival times are i.i.d. exponential r.v. with:

f_{T_n}(t) = \lambda e^{-\lambda t}, \quad \Pr[T_n > t] = e^{-\lambda t}, \quad E[T_n] = \frac{1}{\lambda}

The waiting time S_n is the time until the nth event, given by:

S_n = T_1 + T_2 + \cdots + T_n = \sum_{i=1}^{n} T_i

The sum of exponential random variables each with the SAME parameter \lambda has a Gamma(n, \lambda)
distribution, therefore:

f_{S_n}(t) = \lambda e^{-\lambda t} \frac{(\lambda t)^{n-1}}{(n-1)!}

Note: if S_n is the waiting time in the aggregate of m independent claim processes, each with rate \lambda, then it has a Gamma(n, m\lambda) distribution. E.g.
for a motor insurer, each insured motorist makes a claim at a rate of 0.2 per year, and there are a
total of 200 insured. Then S_{100}, the waiting time until the 100th claim, has the distribution:

S_{100} \sim \text{Gamma}(100, 0.2 \times 200)
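Since the inter-arrival times are i.i.d. Exp(\lambda), a Poisson process can be simulated by accumulating exponential draws until the clock passes t. A sketch with assumed values \lambda = 2 and t = 5, checking that E[N(t)] = \lambda t:

```python
import random

def poisson_process_count(lam, t, rng):
    """Count events in [0, t] by summing i.i.d. Exp(lam) inter-arrival times."""
    n, clock = 0, rng.expovariate(lam)
    while clock <= t:
        n += 1
        clock += rng.expovariate(lam)  # next inter-arrival time
    return n

rng = random.Random(1)
lam, t, n_sims = 2.0, 5.0, 50_000
counts = [poisson_process_count(lam, t, rng) for _ in range(n_sims)]
mean = sum(counts) / n_sims
print(mean)  # should be close to lam * t = 10
```

The kth accumulated clock value here is exactly the waiting time S_k from the text.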
Order Statistics

Let Y_{(i)} be the ith smallest value among Y_1, Y_2, \ldots, Y_n; then (Y_{(1)}, Y_{(2)}, \ldots, Y_{(n)}) is called the order statistic. If
the Y_i are i.i.d. continuous r.v. with pdf f(y), then the joint pdf of the order statistics is:

f_{Y_{(1)}, \ldots, Y_{(n)}}(y_1, y_2, \ldots, y_n) = n! \prod_{i=1}^{n} f(y_i), \quad \text{where } y_1 < y_2 < \cdots < y_n

If the Y_i are uniformly distributed over (0, t), then the joint pdf of the order statistics is:

f_{Y_{(1)}, \ldots, Y_{(n)}}(y_1, y_2, \ldots, y_n) = \frac{n!}{t^n}
Conditional Distribution of Arrival Times
Given that $N(t) = n$, i.e. $n$ events have occurred up until time $t$, the $n$ arrival times $S_1, \dots, S_n$ have the same distribution as the order statistics of $n$ independent r.v. uniformly distributed over $(0, t)$.
Note that the following two events are equivalent:
$$\{S_1 = s_1, S_2 = s_2, \dots, S_n = s_n, N(t) = n\} = \{T_1 = s_1, T_2 = s_2 - s_1, \dots, T_n = s_n - s_{n-1}, T_{n+1} > t - s_n\}$$
i.e. the inter-arrival time between the $(n-1)$th and $n$th events equals the waiting time for $n$ events minus the waiting time for $n-1$ events. Hence the joint conditional density is:
$$f(s_1, s_2, \dots, s_n \mid N(t) = n) = \frac{f(s_1, s_2, \dots, s_n,\, n)}{\Pr(N(t) = n)} = \frac{\lambda e^{-\lambda s_1} \cdot \lambda e^{-\lambda(s_2 - s_1)} \cdots \lambda e^{-\lambda(s_n - s_{n-1})} \cdot e^{-\lambda(t - s_n)}}{e^{-\lambda t}(\lambda t)^n / n!} = \frac{n!}{t^n}$$
The conditional marginal distribution, for $0 < s < t$ and $m \le n$, is binomial:
$$\Pr(N(s) = m \mid N(t) = n) = \frac{\Pr(N(s) = m,\, N(t) = n)}{\Pr(N(t) = n)} = \frac{\Pr(N(s) = m)\Pr(N(t) - N(s) = n - m)}{\Pr(N(t) = n)} = \binom{n}{m}\left(\frac{s}{t}\right)^m \left(1 - \frac{s}{t}\right)^{n-m}$$
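The binomial form of the conditional marginal can be computed directly; a small sketch (the values of $n$, $s$, $t$ are illustrative):

```python
import math

def cond_pmf(m, n, s, t):
    """Pr(N(s) = m | N(t) = n) for a Poisson process: Binomial(n, s/t)."""
    p = s / t
    return math.comb(n, m) * p**m * (1 - p)**(n - m)

n, s, t = 10, 2.0, 5.0
pmf = [cond_pmf(m, n, s, t) for m in range(n + 1)]
mean = sum(m * p_m for m, p_m in enumerate(pmf))   # binomial mean: n * s / t = 4
```

Note that the rate $\lambda$ cancels entirely, as the derivation above shows.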
Thinning of Poisson Process
Consider a Poisson process $\{N(t), t \ge 0\}$ with rate $\lambda$, and suppose that each time an event occurs, independently of all other events, it is either:
- a type I event with probability $p$
- a type II event with probability $1 - p$

Let $N_1(t)$ and $N_2(t)$ denote the numbers of type I and type II events occurring in $[0, t]$, so that $N(t) = N_1(t) + N_2(t)$. Then:
- $\{N_1(t), t \ge 0\}$ is a Poisson process with rate $\lambda p$
- $\{N_2(t), t \ge 0\}$ is a Poisson process with rate $\lambda(1 - p)$
- the two processes are independent, and $\{N(t), t \ge 0\}$ is a Poisson process with rate equal to the sum of the two rates, i.e. $\lambda = \lambda p + \lambda(1 - p)$
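Thinning is easy to verify by simulation; a minimal sketch (the rate, probability and horizon are illustrative):

```python
import random

random.seed(1)

lam, p, horizon = 5.0, 0.3, 1000.0   # illustrative rate, type-I probability, horizon

# Generate the Poisson process by exponential gaps and classify each event.
n1 = n2 = 0
clock = random.expovariate(lam)
while clock < horizon:
    if random.random() < p:
        n1 += 1   # type I event
    else:
        n2 += 1   # type II event
    clock += random.expovariate(lam)

rate1 = n1 / horizon   # should be near lam * p       = 1.5
rate2 = n2 / horizon   # should be near lam * (1 - p) = 3.5
```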
7/23/2019 ACTL2003 Summary Notes
http://slidepdf.com/reader/full/actl2003-summary-notes 15/62
15
Zhi Ying Feng
Non-homogenous Poisson Process
The counting process $\{N(t), t \ge 0\}$ is a non-homogeneous Poisson process with intensity function $\lambda(t)$, i.e. a non-constant rate, if:
- $N(0) = 0$
- It has independent increments
- It has unit jumps, i.e.
$$\Pr(N(t+h) - N(t) = 1) = \lambda(t)h + o(h), \qquad \Pr(N(t+h) - N(t) \ge 2) = o(h)$$

$\lambda(t)$ can be deterministic or itself a process. Note that if $\lambda(t) \equiv \lambda$ then it is a homogeneous Poisson process.
The mean value function of a non-homogeneous Poisson process is:
$$m(t) = \int_0^t \lambda(y)\,dy$$
Then:
- $N(t)$ is a Poisson r.v. with mean $m(t)$
- $N(s+t) - N(s)$ is a Poisson r.v. with mean $m(s+t) - m(s)$
- $\{N(m^{-1}(t)), t \ge 0\}$ is a homogeneous Poisson process with intensity 1
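The first bullet can be sampled directly from the mean value function; a small sketch with the illustrative choice $\lambda(u) = 2u$, so $m(t) = t^2$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Intensity lambda(u) = 2u gives mean value function m(t) = t^2 (illustrative choice).
t = 3.0
m_t = t**2                               # = 9
counts = rng.poisson(m_t, size=100_000)  # N(t) ~ Poisson(m(t))
emp_mean = counts.mean()
emp_var = counts.var()                   # Poisson: variance equals the mean
```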
Compound Poisson Process
A stochastic process $\{X(t), t \ge 0\}$ is a compound Poisson process if
$$X(t) = \sum_{i=1}^{N(t)} Y_i$$
Where:
- $\{N(t), t \ge 0\}$ is a Poisson process with rate $\lambda$
- $Y_i, i = 1, 2, \dots$ are i.i.d. random variables

This is useful for insurance companies, where the uncertainty in the total claim amount $X(t)$ comes both from the number of claims $N(t)$, which follows a Poisson process, and from the claim sizes $Y_i$, which can have any distribution as long as they are i.i.d. Note that the claim sizes $Y_i$ are independent of the number of claims $N(t)$.
The mean and variance of a compound Poisson process are given by:
$$E[X(t)] = \lambda t\,E[Y_i], \qquad \text{var}(X(t)) = \lambda t\,E[Y_i^2]$$
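Both moment formulas can be checked by simulation; a minimal sketch assuming exponential claim sizes (all parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

lam, t = 2.0, 10.0   # claim rate and time horizon (illustrative)
mean_y = 5.0         # exponential claim sizes: E[Y] = 5, E[Y^2] = 2 * 5^2 = 50

trials = 100_000
n = rng.poisson(lam * t, size=trials)                             # claims per path
x = np.array([rng.exponential(mean_y, size=k).sum() for k in n])  # aggregate claims

th_mean = lam * t * mean_y           # lambda * t * E[Y]   = 100
th_var = lam * t * 2 * mean_y**2     # lambda * t * E[Y^2] = 1000
```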
7/23/2019 ACTL2003 Summary Notes
http://slidepdf.com/reader/full/actl2003-summary-notes 16/62
16
Zhi Ying Feng
Continuous-time Markov Chains
A continuous-time Markov chain, or Markov jump process, is a Markov process in continuous time with a discrete state space. The Markov jump process $\{X(t), t \ge 0\}$ has a continuous version of the Markov property: for all $s, t \ge 0$ and states $i, j, x(u),\, 0 \le u < t$:
$$\Pr(X(t+s) = j \mid X(t) = i,\; X(u) = x(u),\, 0 \le u < t) = \Pr(X(t+s) = j \mid X(t) = i)$$
i.e. the process at time $t+s$ is conditional only on the state at time $t$ and is independent of its past states $x(u),\, 0 \le u < t$. The time path up to the state at time $t$ does not matter.
The Markov jump process is homogeneous, i.e. all transition probabilities are stationary, if:
$$\Pr(X(t+s) = j \mid X(t) = i) = \Pr(X(s) = j \mid X(0) = i) = P_{ij}(s), \quad \text{i.e. independent of } t$$
Hence the transition probability over a time period depends only on the duration of the period.
The Poisson process is a continuous-time Markov chain: for $j \ge i$,
$$\begin{aligned}
\Pr(X(t+s) = j \mid X(s) = i) &= \Pr(j - i \text{ jumps in } (s, s+t] \mid i \text{ jumps in } (0, s]) \\
&= \Pr(j - i \text{ jumps in } (s, s+t]) && \text{(independent increments)} \\
&= \Pr(j - i \text{ jumps in } (0, t]) && \text{(stationary increments)} \\
&= \frac{(\lambda t)^{j-i}}{(j-i)!}\,e^{-\lambda t}
\end{aligned}$$
Time Spent in a State
The time spent in a state $i$ before the next jump, denoted $T_i$, for a continuous-time Markov chain has the memoryless property: for $s, t \ge 0$,
$$\begin{aligned}
\Pr(T_i > s + t \mid T_i > s) &= \Pr(X(v) = i,\, s \le v \le s+t \mid X(u) = i,\, 0 \le u \le s) \\
&= \Pr(X(v) = i,\, s \le v \le s+t \mid X(s) = i) && \text{(Markov property)} \\
&= \Pr(X(v) = i,\, 0 \le v \le t \mid X(0) = i) && \text{(stationary increments)} \\
&= \Pr(T_i > t)
\end{aligned}$$
The only continuous distribution with the memoryless property is the exponential distribution. Hence the time spent in a state is exponentially distributed with:
$$T_i \sim \exp(v_i), \qquad \Pr(T_i \le t) = 1 - e^{-v_i t}$$
The expected time spent in state $i$ before transitioning into another state is:
$$E[T_i] = \frac{1}{v_i}$$
Where $v_i$ is the transition rate when in state $i$, i.e. the transition rate out of state $i$ to another state. It is the sum of all the rates of jumping from state $i$ to another state $j$, denoted $q_{ij}$:
$$v_i = \sum_{j \ne i} q_{ij} = -q_{ii}$$
Transition Rates and Probabilities
The transition probability of the continuous-time Markov chain is defined as:
$$P_{ij}(t, t+s) = \Pr(X(t+s) = j \mid X(t) = i)$$
For a homogeneous (stationary) process we have stationary transition probabilities:
$$P_{ij}(t, t+s) = \Pr(X(t+s) = j \mid X(t) = i) = \Pr(X(s) = j \mid X(0) = i) = P_{ij}(s)$$
With the initial conditions:
$$P_{ij}(0) = \begin{cases} 0 & \text{if } i \ne j \\ 1 & \text{if } i = j \end{cases}$$

Transition Rates of Homogenous Processes
The instantaneous transition rate $q_{ij}$ from state $i$ to state $j$ of a continuous-time Markov chain is defined as the rate of change of the transition probability over a small period of time:
$$q_{ij} = \lim_{h \to 0^+} \frac{P_{ij}(h) - P_{ij}(0)}{h} = \frac{d}{dt}P_{ij}(t)\Big|_{t=0} = \begin{cases} \lim_{h \to 0^+} \dfrac{P_{ij}(h)}{h} & \text{if } i \ne j \\[4pt] \lim_{h \to 0^+} \dfrac{P_{ii}(h) - 1}{h} = -v_i & \text{if } i = j \end{cases}$$
Where $v_i = -q_{ii}$ denotes the transition rate out of state $i$ when the process is in state $i$. Note that:
$$\sum_j P_{ij}(h) = 1 \quad \text{(sum of all transition probabilities is 1)}, \qquad \sum_j q_{ij} = 0 \quad \text{(sum of all transition rates is 0)}$$
Therefore, we have that:
$$\sum_{j \ne i} q_{ij} + q_{ii} = 0 \quad \Rightarrow \quad \sum_{j \ne i} q_{ij} = -q_{ii} = v_i$$
Equivalently, the transition probability over a small time $h$ can be written as:
$$P_{ij}(h) = \begin{cases} q_{ij}h + o(h) & \text{if } i \ne j \\ 1 - v_i h + o(h) & \text{if } i = j \end{cases}$$
Transition Rates of Non-homogenous Processes
In the non-homogenous case, the transition rates are also a derivative of the transition probabilities:
0
0
0
,lim if
, ,, lim
, 1lim if
ij
ijhij ij
ijh
t s iiii i
h
P s s hq s i j
P s s h P s s h P s t
t h P s s hq s v s i j
h
Equivalently, the transition probability over a small time h can be defined as:
if ,
1 if ij
ijii
q s h o h i j P s s hq s h o h i j
Now, ignore when the transitions occur and how long is spent in each state. Consider only the series of states that the process transitions into. Let:
- $v_i$ be the transition rate OUT of state $i$ when the process is in state $i$
- $P_{ij}$ be the probability that the transition is into state $j$, conditional on the fact that a transition has OCCURRED and the process is currently in state $i$ (see embedded Markov chain)

Then $q_{ij}$, the transition rate INTO state $j$ when the process is in state $i$, is given by:
$$q_{ij} = v_i P_{ij}$$
using $\Pr(A \cap B) = \Pr(A)\Pr(B \mid A)$, where $A$ is the event of moving out of state $i$ and $B$ is the event of moving to state $j$. Then we also have:
$$\sum_{j \ne i} q_{ij} = v_i \sum_{j \ne i} P_{ij} = v_i = -q_{ii}$$
Therefore,
$$P_{ij} = \frac{q_{ij}}{v_i} = \frac{q_{ij}}{\sum_{k \ne i} q_{ik}}$$
Chapman-Kolmogorov Equations
For a homogeneous continuous-time Markov chain the Chapman-Kolmogorov equation is:
$$P_{ij}(t+s) = \sum_{k=0}^{\infty} P_{ik}(t)P_{kj}(s) \quad \text{for } s, t \ge 0 \text{ and all states } i, j$$
With the initial conditions:
$$P_{ij}(0) = \begin{cases} 0 & \text{if } i \ne j \\ 1 & \text{if } i = j \end{cases}$$
Kolmogorov's backward equation is:
$$\frac{d}{dt}P_{ij}(t) = \sum_{k \ne i} q_{ik}P_{kj}(t) - v_i P_{ij}(t)$$
Kolmogorov's forward equation is:
$$\frac{d}{dt}P_{ij}(t) = \sum_{k \ne j} P_{ik}(t)q_{kj} - v_j P_{ij}(t)$$
Define the matrices $\mathbf{P}(t) = [P_{ij}(t)]$ and $\mathbf{Q} = [q_{ij}]$. In matrix form we have:
$$\begin{aligned}
\mathbf{P}(t+s) &= \mathbf{P}(t)\mathbf{P}(s) && \text{(Chapman-Kolmogorov equations)} \\
\mathbf{P}'(t) &= \mathbf{Q}\mathbf{P}(t) && \text{(Kolmogorov's backward equations)} \\
\mathbf{P}'(t) &= \mathbf{P}(t)\mathbf{Q} && \text{(Kolmogorov's forward equations)} \\
\mathbf{P}(0) &= \mathbf{I} && \text{(initial condition)}
\end{aligned}$$
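These matrix equations have the solution $\mathbf{P}(t) = e^{\mathbf{Q}t}$. A minimal sketch for a two-state chain (the rates are illustrative), computing the matrix exponential from its power series and checking against the known two-state closed form:

```python
import numpy as np

lam, mu = 2.0, 3.0   # illustrative rates: 0 -> 1 at rate lam, 1 -> 0 at rate mu
Q = np.array([[-lam, lam],
              [mu, -mu]])

def transition_matrix(Q, t, terms=60):
    """P(t) = exp(Qt), summed from the power series sum_k (Qt)^k / k!."""
    P = np.eye(Q.shape[0])
    term = np.eye(Q.shape[0])
    for k in range(1, terms):
        term = term @ Q * (t / k)   # (Qt)^k / k! built up incrementally
        P = P + term
    return P

t = 0.7
P = transition_matrix(Q, t)
# closed form for two states: P_00(t) = mu/(lam+mu) + lam/(lam+mu) * exp(-(lam+mu) t)
p00 = mu / (lam + mu) + lam / (lam + mu) * np.exp(-(lam + mu) * t)
```

Each row of $\mathbf{P}(t)$ should sum to one, since it is a probability distribution over states.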
Limiting Probabilities
The limiting probability $P_j$ of a continuous-time Markov chain is the long-run probability, or the long-run proportion of time, that the process will be in state $j$, independent of the initial state:
$$P_j = \lim_{t \to \infty} P_{ij}(t)$$
To find the limiting probabilities, if they exist, we solve the balance equations together with the normalisation condition:
$$v_j P_j = \sum_{k \ne j} q_{kj}P_k \quad \text{(equivalently } \mathbf{P}\mathbf{Q} = \mathbf{0}\text{)}, \qquad \sum_k P_k = 1$$
The balance equation says that the rate at which the process leaves state $j$ equals the rate at which the process enters state $j$:
- $v_j P_j$ is the rate at which the process leaves state $j$, since $P_j$ is the long-run proportion of time the process is in state $j$ and $v_j$ is the rate of transition out of state $j$ when the process is in state $j$
- $\sum_{k \ne j} q_{kj}P_k$ is the rate at which the process enters state $j$, since $P_k$ is the long-run proportion of time the process is in state $k$ and $q_{kj}$ is the rate of transition from state $k$ to state $j$

For the limiting probabilities $\lim_{t \to \infty} P_{ij}(t)$ to exist, the conditions are:
- All states communicate, so that starting in state $i$ there is a positive probability of being in state $j$
- The Markov chain is positive recurrent, so that starting in any state, the mean time to return to that state is finite
- When the limiting probabilities exist, the Markov chain is ergodic; note that aperiodicity is not required, as periodicity does not apply to continuous-time Markov chains

If the initial state is chosen according to the limiting distribution, then the probability of being in state $j$ is the same for all $t$ (stationarity):
$$\Pr(X(0) = j) = P_j \;\Rightarrow\; \Pr(X(t) = j) = P_j \text{ for all } t$$
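The linear system $\mathbf{P}\mathbf{Q} = \mathbf{0}$, $\sum_k P_k = 1$ can be solved directly; a small sketch with an illustrative three-state generator (one redundant balance equation is replaced by the normalisation row):

```python
import numpy as np

# Illustrative generator: off-diagonal entries are q_ij, each row sums to zero.
Q = np.array([[-2.0, 1.0, 1.0],
              [1.0, -3.0, 2.0],
              [2.0, 2.0, -4.0]])

# Solve P Q = 0 subject to sum(P) = 1: transpose the system and
# replace one (redundant) balance equation with the normalisation.
A = Q.T.copy()
A[-1, :] = 1.0
b = np.zeros(Q.shape[0])
b[-1] = 1.0
P = np.linalg.solve(A, b)   # limiting distribution
```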
Embedded Markov Chain
For a continuous-time Markov chain that is ergodic with limiting probabilities $P_i$, the sequence of states visited (ignoring the time spent in each state) is a discrete-time Markov chain, known as the embedded Markov chain.
The embedded Markov chain has transition probabilities:
$$P_{ij} = \Pr(\text{transition to state } j \mid \text{transition out of state } i) = \frac{\Pr(\text{transition from } i \text{ to } j)}{\Pr(\text{transition out of } i)} = \frac{q_{ij}}{v_i} = \frac{q_{ij}}{\sum_{k \ne i} q_{ik}}$$
Note: in general $P_{ij} \ne P_{ij}(t)$! Do not get confused!
Assuming that the embedded Markov chain is ergodic, its limiting probabilities $\pi_i$, i.e. the long-run proportions of transitions into state $i$, are the unique solutions of the set of equations:
$$\pi_i = \sum_j \pi_j P_{ji} \quad \text{and} \quad \sum_i \pi_i = 1$$
The proportion of time the continuous-time process spends in state $i$, i.e. the limiting probability of the original continuous-time Markov chain, can then be found using:
$$P_i = \frac{\pi_i / v_i}{\sum_j \pi_j / v_j}$$
Here $\pi_i$, the limiting probability of the embedded Markov chain, is the proportion of transitions into state $i$; it is multiplied by $1/v_i$, the mean time spent in state $i$ during a visit.
Time Reversibility
Going backwards: given the process is in state $i$ at time $t$, the probability that the process has been in state $i$ for an amount of time greater than $s$ is $e^{-v_i s}$:
$$\begin{aligned}
\Pr(\text{process in state } i \text{ throughout } [t-s, t] \mid X(t) = i) &= \frac{\Pr(\text{process in state } i \text{ throughout } [t-s, t])}{\Pr(X(t) = i)} \\
&= \frac{\Pr(X(t-s) = i)\Pr(T_i > s)}{\Pr(X(t) = i)} \\
&= e^{-v_i s}
\end{aligned}$$
since for large $t$ the limiting probabilities give $\Pr(X(t-s) = i) \approx \Pr(X(t) = i) \approx P_i$. Therefore, going back in time, the amount of time the process spends in state $i$ is also exponential with rate $v_i$. Thus the reversed continuous-time Markov chain has the same holding-time rates as the forward-time process.
The sequence of states visited by the reversed process, i.e. its embedded chain, is a discrete-time Markov chain with transition probabilities:
$$Q_{ij} = \frac{\pi_j P_{ji}}{\pi_i}$$
Thus, the continuous-time Markov chain will be time reversible, i.e. have the same probability structure as the original process, if the embedded chain is time reversible, i.e.
$$\pi_i P_{ij} = \pi_j P_{ji}$$
Using the proportion of time the continuous-time chain spends in state $i$, $P_i = (\pi_i/v_i)\big/\sum_j (\pi_j/v_j)$, the condition $\pi_i P_{ij} = \pi_j P_{ji}$ becomes $P_i v_i P_{ij} = P_j v_j P_{ji}$. Since $q_{ij} = v_i P_{ij}$, we have an equivalent condition for time reversibility:
$$P_i q_{ij} = P_j q_{ji}$$
i.e. the rate at which the process goes directly from $i$ to $j$ is the same as the rate at which it goes directly from $j$ to $i$.
Birth and Death Process
A birth and death process is a continuous-time Markov chain with states $0, 1, 2, \dots$ for which transitions from state $n$ may only go into state $n+1$ (a birth) or state $n-1$ (a death). Suppose that the number of people in a population/system is $n$:
- New arrivals enter the population/system at a birth/arrival rate $\lambda_n$; the time until the next arrival is exponentially distributed with mean $1/\lambda_n$
- People leave the population/system at a death/departure rate $\mu_n$; the time until the next departure is exponentially distributed with mean $1/\mu_n$

Transition Rates and Embedded Markov Chain
Let $v_i$ be the transition rate out of state $i$ when the process is in state $i$. Initially, with a population of 0, there can only be a birth (the population can't be negative), so $\mu_0 = 0$ and:
$$v_0 = \lambda_0, \qquad v_i = \lambda_i + \mu_i \quad \text{for } i \ge 1$$
For the corresponding embedded Markov chain, let $P_{ij}$ denote the transition probabilities between states. If the Markov chain is in state 0 and a transition occurs, it must transition into state 1, therefore:
$$P_{01} = 1$$
To derive the transition probabilities of the embedded Markov chain, consider a population of $i \ge 1$. It will jump to $i+1$ if a birth occurs before a death, and jump to $i-1$ if a death occurs before a birth:
- The time to a birth $T_b$ is exponential with rate $\lambda_i$
- The time to a death $T_d$ is exponential with rate $\mu_i$

Therefore, the probability that a birth occurs before a death is given by (the minimum of independent exponentials):
$$\Pr(T_b < T_d) = \frac{\lambda_i}{\lambda_i + \mu_i}$$
Then the transition probabilities are:
$$P_{i,i+1} = \frac{\lambda_i}{\lambda_i + \mu_i}, \qquad P_{i,i-1} = \frac{\mu_i}{\lambda_i + \mu_i}, \qquad P_{i,k} = 0 \text{ for } k \ne i \pm 1$$
Also, since the time to the next jump, regardless of whether it is a birth or a death, is the minimum of the two exponential times, it is exponential with rate:
$$v_i = \lambda_i + \mu_i$$
Examples of Birth and Death Processes
Birth and death rates independent of n
A supermarket has one server and customers join a queue. Customers arrive at rate $\lambda$ with exponential inter-arrival times (a Poisson process). The server serves at rate $\mu$, with service times also exponentially distributed. Let $X(t)$ be the number in the queue at time $t$; then:
$$\lambda_n = \lambda \text{ for } n \ge 0, \qquad \mu_n = \begin{cases} \mu & \text{for } n \ge 1 \\ 0 & \text{for } n = 0 \end{cases}$$
The embedded chain has transition probabilities:
$$P_{i,i+1} = \begin{cases} 1 & \text{for } i = 0 \\ \dfrac{\lambda}{\lambda + \mu} & \text{for } i = 1, 2, \dots \end{cases} \qquad P_{i,i-1} = \begin{cases} 0 & \text{for } i = 0 \\ \dfrac{\mu}{\lambda + \mu} & \text{for } i = 1, 2, \dots \end{cases}$$
Birth and death rates dependent on n
If there are now $s$ checkout operators serving the queue, with customers joining a single queue and going to the first available server, then:
$$\lambda_n = \lambda \text{ for } n \ge 0, \qquad \mu_n = \begin{cases} n\mu & \text{for } 1 \le n \le s \\ s\mu & \text{for } n > s \end{cases}$$
The embedded chain has transition probabilities, for $i \ge 1$:
$$P_{i,i+1} = \frac{\lambda}{\lambda + \min(i, s)\mu}, \qquad P_{i,i-1} = \frac{\min(i, s)\mu}{\lambda + \min(i, s)\mu}$$

Poisson Process
A Poisson process is a special case of a birth and death process, with no deaths:
$$\lambda_n = \lambda \text{ for } n \ge 0, \qquad \mu_n = 0 \text{ for } n \ge 0$$
$$P_{i,i+1} = 1, \qquad P_{i,i-1} = 0 \qquad \text{for } i = 0, 1, 2, \dots$$

Population Growth
Each individual in a population gives birth at an exponential rate $\lambda$, plus there is immigration at an exponential rate $\theta$. Each individual has an exponential death rate $\mu$:
$$\lambda_n = n\lambda + \theta \text{ for } n \ge 0, \qquad \mu_n = n\mu \text{ for } n \ge 1$$
$$P_{n,n+1} = \frac{n\lambda + \theta}{n\lambda + \theta + n\mu}, \qquad P_{n,n-1} = \frac{n\mu}{n\lambda + \theta + n\mu} \qquad \text{for } n = 0, 1, 2, \dots$$
Expected Time in States
Let $T_i$ be the time for the process, starting from state $i$, to enter state $i+1$, i.e. the time until the population increases by one. Define the indicator $I_i$ as:
$$I_i = \begin{cases} 1 & \text{if the first transition from } i \text{ is to } i+1 \\ 0 & \text{if the first transition from } i \text{ is to } i-1 \end{cases}$$
Then the expected value of $T_i$ can be calculated by conditioning on the first transition:
$$E[T_i] = \frac{1}{\lambda_i + \mu_i} + \frac{\mu_i}{\lambda_i + \mu_i}\left(E[T_{i-1}] + E[T_i]\right) \quad \Rightarrow \quad E[T_i] = \frac{1}{\lambda_i} + \frac{\mu_i}{\lambda_i}E[T_{i-1}]$$
since the time to the first transition, regardless of whether it is a birth or death, is exponential with rate $\lambda_i + \mu_i$, and the time taken to reach $i+1$ if there is a death is the time from $i-1$ to $i$ plus the time from $i$ to $i+1$. Note:
$$E[T_0] = \frac{1}{\lambda_0}$$
In the case where the rates are homogeneous, i.e. $\lambda_i = \lambda,\, \mu_i = \mu$ for all $i$:
$$E[T_i] = \frac{1}{\lambda}\left(1 + \frac{\mu}{\lambda} + \left(\frac{\mu}{\lambda}\right)^2 + \dots + \left(\frac{\mu}{\lambda}\right)^i\right)$$
Then the expected time to go from state $i$ to a higher state $j$ is:
$$E[T_{i \to j}] = E[T_i] + E[T_{i+1}] + \dots + E[T_{j-1}]$$
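The recursion is easy to evaluate numerically; a minimal sketch with illustrative constant rates, checked against the closed form for the homogeneous case:

```python
lam, mu = 2.0, 1.0   # illustrative homogeneous birth and death rates

def expected_up_times(n, lam, mu):
    """E[T_i] for i = 0..n-1 via E[T_i] = 1/lam + (mu/lam) * E[T_{i-1}]."""
    et = [1.0 / lam]   # E[T_0] = 1/lam
    for _ in range(1, n):
        et.append(1.0 / lam + (mu / lam) * et[-1])
    return et

et = expected_up_times(5, lam, mu)
# closed form for constant rates: E[T_i] = (1/lam) * sum_{k=0}^{i} (mu/lam)^k
closed = [(1 / lam) * sum((mu / lam) ** k for k in range(i + 1)) for i in range(5)]
time_1_to_3 = et[1] + et[2]   # expected time to climb from state 1 to state 3
```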
Kolmogorov Equations
We can use the transition rates of a birth and death process to write down Kolmogorov's equations, and then solve these differential equations to find the transition probabilities. The rates are:
$$q_{0,1} = \lambda_0, \qquad q_{0,0} = -\lambda_0 = -v_0, \qquad q_{0,j} = 0 \text{ for all } j > 1$$
$$q_{i,i+1} = \lambda_i, \qquad q_{i,i-1} = \mu_i, \qquad q_{i,i} = -(\lambda_i + \mu_i) = -v_i, \qquad \text{otherwise zero}$$
Kolmogorov's backward equations:
$$\frac{d}{dt}P_{0,j}(t) = \lambda_0\left[P_{1,j}(t) - P_{0,j}(t)\right]$$
$$\frac{d}{dt}P_{i,j}(t) = \lambda_i P_{i+1,j}(t) + \mu_i P_{i-1,j}(t) - (\lambda_i + \mu_i)P_{i,j}(t)$$
Kolmogorov's forward equations:
$$\frac{d}{dt}P_{i,0}(t) = \mu_1 P_{i,1}(t) - \lambda_0 P_{i,0}(t)$$
$$\frac{d}{dt}P_{i,j}(t) = \lambda_{j-1}P_{i,j-1}(t) + \mu_{j+1}P_{i,j+1}(t) - (\lambda_j + \mu_j)P_{i,j}(t)$$
Limiting Probabilities
The limiting probabilities are determined by the balance equations:
$$v_j P_j = \sum_{k \ne j} q_{kj}P_k \quad \text{for all } j, \qquad \sum_k P_k = 1$$
This states that the rate at which the process leaves state $j$ (LHS) is equal to the rate at which the process enters state $j$ (RHS). For a birth and death process the balance equations are:
$$(\lambda_n + \mu_n)P_n = \lambda_{n-1}P_{n-1} + \mu_{n+1}P_{n+1}, \qquad \lambda_0 P_0 = \mu_1 P_1$$
which break down iteratively into:
$$\lambda_n P_n = \mu_{n+1}P_{n+1} \quad \Rightarrow \quad P_{n+1} = \frac{\lambda_n}{\mu_{n+1}}P_n$$
Then in general:
$$P_n = \frac{\lambda_0 \lambda_1 \cdots \lambda_{n-1}}{\mu_1 \mu_2 \cdots \mu_n}P_0$$
Substituting this into the other balance equation $\sum_{n \ge 0} P_n = 1$:
$$P_0\left(1 + \sum_{n=1}^{\infty}\frac{\lambda_0 \lambda_1 \cdots \lambda_{n-1}}{\mu_1 \mu_2 \cdots \mu_n}\right) = 1 \quad \Rightarrow \quad P_0 = \frac{1}{1 + \displaystyle\sum_{n=1}^{\infty}\frac{\lambda_0 \lambda_1 \cdots \lambda_{n-1}}{\mu_1 \mu_2 \cdots \mu_n}}$$
Substituting back for $P_n$ gives:
$$P_n = \frac{\dfrac{\lambda_0 \lambda_1 \cdots \lambda_{n-1}}{\mu_1 \mu_2 \cdots \mu_n}}{1 + \displaystyle\sum_{m=1}^{\infty}\frac{\lambda_0 \lambda_1 \cdots \lambda_{m-1}}{\mu_1 \mu_2 \cdots \mu_m}}$$
This also gives the condition for the long-run probabilities to exist, i.e.:
$$\sum_{n=1}^{\infty}\frac{\lambda_0 \lambda_1 \cdots \lambda_{n-1}}{\mu_1 \mu_2 \cdots \mu_n} < \infty$$
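The product formula can be evaluated numerically by truncating the state space; a minimal sketch (the M/M/1 rates and truncation level are illustrative), checked against the known geometric answer $P_n = (1-\rho)\rho^n$ with $\rho = \lambda/\mu$:

```python
def bd_limiting(lams, mus, n_max):
    """P_n proportional to (lam_0...lam_{n-1}) / (mu_1...mu_n), truncated at n_max."""
    w = [1.0]
    for n in range(1, n_max + 1):
        w.append(w[-1] * lams[n - 1] / mus[n])
    total = sum(w)
    return [x / total for x in w]

lam, mu = 1.0, 2.0   # illustrative M/M/1 rates, rho = 1/2 < 1
n_max = 200          # truncation level; the tail beyond it is negligible here
P = bd_limiting([lam] * n_max, [0.0] + [mu] * n_max, n_max)

rho = lam / mu
geometric = [(1 - rho) * rho**n for n in range(n_max + 1)]   # known M/M/1 answer
```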
Application: Pure Birth Process
The Poisson process is an example of a pure birth process, i.e.
$$\lambda_n = \lambda \text{ for } n \ge 0, \qquad \mu_n = 0 \text{ for } n \ge 0$$
In this case, rearranging the Kolmogorov backward equations for $i \ne j$:
$$\frac{d}{dt}P_{i,j}(t) = \lambda P_{i+1,j}(t) - \lambda P_{i,j}(t) \quad \Rightarrow \quad \frac{d}{dt}P_{i,j}(t) + \lambda P_{i,j}(t) = \lambda P_{i+1,j}(t)$$
Multiply both sides by $e^{\lambda t}$; by the product rule, this is equivalent to:
$$\frac{d}{dt}\left[e^{\lambda t}P_{i,j}(t)\right] = \lambda e^{\lambda t}P_{i+1,j}(t)$$
Integrate both sides w.r.t. $t$ from 0 to $s$, using $P_{i,j}(0) = 0$ for $i \ne j$:
$$e^{\lambda s}P_{i,j}(s) = \lambda\int_0^s e^{\lambda t}P_{i+1,j}(t)\,dt \quad \Rightarrow \quad P_{i,j}(s) = \int_0^s \lambda e^{-\lambda(s-t)}P_{i+1,j}(t)\,dt$$
- The LHS is the probability of moving from state $i$ to $j$ over the time period 0 to $s$
- The RHS is the probability of the process staying in state $i$ for a time $s - t$, i.e. $e^{-\lambda(s-t)}$, then jumping to state $i+1$ over the time interval $(s-t, s-t+dt)$, i.e. $\lambda\,dt$, then the probability that the process starts in $i+1$ and finishes in $j$ over the remaining interval, i.e. $P_{i+1,j}(t)$, integrated over all possible jump times, which is equivalent to $t$ varying from 0 to $s$
Application: Simple Sickness Model
In a simple sickness model, an individual can be in state 0 (healthy) or in state 1 (sick).
- An individual remains healthy for an exponential time with mean $1/\sigma$ before becoming sick
- An individual remains sick for an exponential time with mean $1/\rho$ before becoming healthy

In this birth and death process, we have:
$$\lambda_0 = \sigma \quad \text{and} \quad \mu_1 = \rho$$
with all other $\lambda_i, \mu_i$ being zero. Using Kolmogorov's backward/forward equations, the transition probabilities are:
$$P_{0,0}(s) = \frac{\rho}{\sigma + \rho} + \frac{\sigma}{\sigma + \rho}e^{-(\sigma+\rho)s} \qquad P_{0,1}(s) = \frac{\sigma}{\sigma + \rho}\left(1 - e^{-(\sigma+\rho)s}\right)$$
$$P_{1,0}(s) = \frac{\rho}{\sigma + \rho}\left(1 - e^{-(\sigma+\rho)s}\right) \qquad P_{1,1}(s) = \frac{\sigma}{\sigma + \rho} + \frac{\rho}{\sigma + \rho}e^{-(\sigma+\rho)s}$$
Occupancy Probabilities and Time
In the simple sickness model, the occupancy probabilities are defined as:
- $\bar{P}_{0,0}(s)$, the probability that a healthy individual stays healthy throughout a period of length $s$
- $\bar{P}_{1,1}(s)$, the probability that a sick individual remains sick throughout the same period

By the Markov property, the time spent in state 0 or state 1 is exponential, with the memoryless property:
$$\bar{P}_{0,0}(s) = \Pr(T_0 > s) = e^{-\sigma s}, \qquad \bar{P}_{1,1}(s) = \Pr(T_1 > s) = e^{-\rho s}$$
The occupancy time $O(t)$ is the total time that the process spends in a state during the interval $(0, t)$. If we define the indicator function:
$$I(s) = \begin{cases} 1 & \text{if } X(s) = 1 \\ 0 & \text{if } X(s) = 0 \end{cases}$$
then the occupation time for being sick is:
$$O(t) = \int_0^t I(s)\,ds$$
The expected occupation time being sick, given that the initial state is healthy, is:
$$\begin{aligned}
E[O(t) \mid X(0) = 0] &= E\left[\int_0^t I(s)\,ds \,\Big|\, X(0) = 0\right] = \int_0^t E[I(s) \mid X(0) = 0]\,ds \\
&= \int_0^t \Pr(X(s) = 1 \mid X(0) = 0)\,ds = \int_0^t P_{0,1}(s)\,ds \\
&= \frac{\sigma}{\sigma + \rho}\int_0^t \left(1 - e^{-(\sigma+\rho)s}\right)ds \\
&= \frac{\sigma}{\sigma + \rho}\left(t - \frac{1 - e^{-(\sigma+\rho)t}}{\sigma + \rho}\right)
\end{aligned}$$
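The closed form can be checked by integrating $P_{0,1}(s)$ numerically; a minimal sketch, writing the healthy-to-sick rate as `sigma` and the recovery rate as `rho` (the values are illustrative):

```python
import math

sigma, rho = 0.4, 1.2   # illustrative sickness and recovery rates
t = 5.0

def p01(s):
    """P_{0,1}(s): probability of being sick at time s, given healthy at time 0."""
    return sigma / (sigma + rho) * (1 - math.exp(-(sigma + rho) * s))

# Trapezoidal-rule integral of P_{0,1}(s) over (0, t).
n = 200_000
h = t / n
integral = h * (0.5 * p01(0) + sum(p01(k * h) for k in range(1, n)) + 0.5 * p01(t))

# Closed form for E[O(t) | X(0) = 0] derived above.
closed = sigma / (sigma + rho) * (t - (1 - math.exp(-(sigma + rho) * t)) / (sigma + rho))
```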
First Holding Time
The first holding time $T_0$ is the first time the process jumps out of the initial state:
$$T_0 = \inf\{t : X_t \ne X_0\}$$
For a homogeneous Markov jump process this is exponentially distributed: given $X_0 = i$, the rate is $\lambda_i$, the rate of jumping out of state $i$, previously denoted $v_i$:
$$\lambda_i = v_i = -q_{ii} = \sum_{j \ne i} q_{ij}$$
Thus, the first holding time has the c.d.f. and p.d.f.:
$$\Pr(T_0 \le t \mid X_0 = i) = 1 - e^{-\lambda_i t}, \qquad f_{T_0}(t \mid X_0 = i) = \lambda_i e^{-\lambda_i t}$$
The probability of the state to which the process jumps is:
$$\Pr(X_{T_0} = j \mid X_0 = i) = \frac{q_{ij}}{\lambda_i} = P_{ij}$$
where $X_{T_0}$ is independent of $T_0$.
Non-homogenous Markov Jump Processes
Let the transition rates be denoted $\sigma_{ij}(s)$, equivalent to the $q_{ij}(s)$ previously:
$$\sigma_{ij}(s) = \frac{\partial}{\partial t}P_{ij}(s, t)\Big|_{t=s}$$
so that the non-homogeneous transition probabilities over a small time period $h$ are:
$$P_{ij}(s, s+h) = \begin{cases} \sigma_{ij}(s)h + o(h) & \text{if } i \ne j \\ 1 + \sigma_{ii}(s)h + o(h) & \text{if } i = j \end{cases} \qquad \text{where } \sigma_{ii}(s) = -\sum_{j \ne i}\sigma_{ij}(s)$$
Hence:
$$\frac{P_{ij}(s, s+h) - P_{ij}(s, s)}{h} = \sigma_{ij}(s) + \frac{o(h)}{h}$$
and letting $h \to 0$ recovers $\sigma_{ij}(s)$ as the derivative of $P_{ij}(s, t)$ w.r.t. $t$ at $t = s$.
The Chapman-Kolmogorov equations are:
$$P_{ij}(s, t) = \sum_k P_{ik}(s, u)P_{kj}(u, t), \qquad s \le u \le t$$
or in matrix form $\mathbf{P}(s, t) = \mathbf{P}(s, u)\mathbf{P}(u, t)$ with $\mathbf{P}(s, s) = \mathbf{I}$.
Kolmogorov's forward equations are derived by differentiating the Chapman-Kolmogorov equation w.r.t. $t$, then setting $u = t$:
$$\frac{\partial}{\partial t}P_{ij}(s, t) = \sum_{k \ne j} P_{ik}(s, t)\sigma_{kj}(t) - v_j(t)P_{ij}(s, t), \qquad \frac{\partial}{\partial t}\mathbf{P}(s, t) = \mathbf{P}(s, t)\mathbf{Q}(t)$$
Kolmogorov's backward equations are derived by differentiating the Chapman-Kolmogorov equation w.r.t. $s$, then setting $u = s$:
$$\frac{\partial}{\partial s}P_{ij}(s, t) = -\sum_{k \ne i}\sigma_{ik}(s)P_{kj}(s, t) + v_i(s)P_{ij}(s, t), \qquad \frac{\partial}{\partial s}\mathbf{P}(s, t) = -\mathbf{Q}(s)\mathbf{P}(s, t)$$
where $\mathbf{Q}(t)$ is the matrix containing the $\sigma_{ij}(t)$ for all $i, j$.
Residual Holding Time
The residual holding time $R_s$ is the time between $s$ and the NEXT jump:
$$\{R_s > w,\, X_s = i\} = \{X_u = i,\; s \le u \le s + w\}$$
i.e. the process remains in the same state $i$ between times $s$ and $s + w$. Note that this is the non-homogeneous case of the first holding time. We can show that:
$$\Pr(R_s > w \mid X_s = i) = \exp\left(-\int_s^{s+w} v_i(u)\,du\right)$$
The density of $R_s \mid X_s = i$ is given by:
$$\frac{\partial}{\partial w}\Pr(R_s \le w \mid X_s = i) = v_i(s+w)\exp\left(-\int_s^{s+w} v_i(u)\,du\right)$$
Define $X_{s+R_s}$ as the state the process jumps to at the next jump. The density for this is:
$$\Pr(X_{s+R_s} = j \mid X_s = i,\, R_s = w) = \frac{\sigma_{ij}(s+w)}{v_i(s+w)}$$
Then we have the transition probability, for $i \ne j$:
$$P_{ij}(s, t) = \int_0^{t-s} \exp\left(-\int_s^{s+w} v_i(u)\,du\right)\sum_{I \ne i}\sigma_{iI}(s+w)\,P_{Ij}(s+w, t)\,dw$$
This is the conditional probability that the process is in state $j$ at time $t$, given that it started in state $i$ at time $s$. The transition can be built up by:
(1) staying in state $i$ from time $s$ for a duration $w$, i.e. $\exp\left(-\int_s^{s+w} v_i(u)\,du\right)$;
(2) jumping out of state $i$, at rate $v_i(s+w)$;
(3) given that a jump has occurred, jumping to an intermediate state $I$, i.e. $\sigma_{iI}(s+w)/v_i(s+w)$ — steps (2) and (3) combine to give $\sigma_{iI}(s+w)$;
(4) then jumping from state $I$ to $j$ over $(s+w, t)$ and being in state $j$ at time $t$, i.e. $P_{Ij}(s+w, t)$;
(5) integrated over all possible $w$ and summed over all possible intermediate states.
This is the integral form of the Kolmogorov backward equation, i.e. we consider the first jump (the $\sigma$ term) first.
Current Holding Time
For non-homogeneous processes the current holding time $C_t$ is the time between the last jump and $t$:
$$\{C_t > w,\, X_t = j\} = \{X_u = j,\; t - w \le u \le t\}$$
It can be shown that:
$$\Pr(C_t > w \mid X_t = j) = \exp\left(-\int_{t-w}^{t} v_j(u)\,du\right)$$
The density of $C_t \mid X_t = j$ is given by:
$$\frac{\partial}{\partial w}\Pr(C_t \le w \mid X_t = j) = v_j(t-w)\exp\left(-\int_{t-w}^{t} v_j(u)\,du\right)$$
Then we have the transition probability, for $i \ne j$:
$$P_{ij}(s, t) = \int_0^{t-s}\sum_{k \ne j} P_{ik}(s, t-w)\,\sigma_{kj}(t-w)\exp\left(-\int_{t-w}^{t} v_j(u)\,du\right)dw$$
This is the integral form of the Kolmogorov forward equation, i.e. we consider the last jump (the $\sigma$ term) last.
Part 2: Time Series
Introduction to Time Series
A time series is a sequence of observations that are recorded at regular time intervals, usually discrete and evenly spaced, e.g. daily or monthly. Let $x_t$ denote the observed data:
$$x_1, x_2, \dots, x_t$$
A time series model for $\{x_t\}$ is a family of distributions to which the joint distribution of $\{X_t\}$ is assumed to belong, where each $X_t$ is a random variable:
$$\dots, X_{t-1}, X_t, X_{t+1}, \dots$$
Therefore, $x_{t-1}, x_t, x_{t+1}$ are realisations of the random variables $X_{t-1}, X_t, X_{t+1}$.
Classical Decomposition Model
The classical decomposition model decomposes the original data $X_t$ into 3 components:
$$X_t = T_t + S_t + N_t, \qquad E[N_t] = 0, \qquad S_{t+d} = S_t, \qquad \sum_{j=1}^{d} S_j = 0$$
Where:
- $T_t$ is a deterministic trend component that is slowly changing and perfectly predictable
- $S_t$ is a deterministic seasonal component with a known period $d$ (e.g. $d = 4$ for quarterly data), also perfectly predictable; the seasonal component sums to zero over a complete cycle
- $N_t$ is a random component with expected value 0, as all systematic information should be captured in the trend and seasonal components; $N_t$ may be correlated over time and hence partially predictable
Moving Average Linear Filters
A moving average linear filter has the form:
$$\hat{T}_t = \sum_{j=-q}^{q} a_j X_{t+j} = \frac{1}{2q+1}\sum_{j=-q}^{q} X_{t+j}$$
i.e. a $(2q+1)$-point average with equal weights $a_j = 1/(2q+1)$.
Eliminating the Seasonality
A moving average linear filter can also be passed through $X_t$ to eliminate the seasonal component:
- If the period $d$ is odd, i.e. $d = 2q + 1$, use the filter:
$$\hat{T}_t = \frac{1}{d}\left(X_{t-q} + X_{t-q+1} + \dots + X_{t+q}\right)$$
- If the period $d$ is even, i.e. $d = 2q$, use the filter:
$$\hat{T}_t = \frac{1}{d}\left(\tfrac{1}{2}X_{t-q} + X_{t-q+1} + \dots + X_{t+q-1} + \tfrac{1}{2}X_{t+q}\right)$$
Estimating the Trend
A moving average linear filter can be applied to estimate the trend. Consider $X_t = T_t + N_t$ where the trend $T_t$ is linear in $t$; applying a $(2q+1)$-point moving average filter to $X_t$ gives:
$$\hat{T}_t = \frac{1}{2q+1}\sum_{j=-q}^{q} X_{t+j} = \frac{1}{2q+1}\sum_{j=-q}^{q}\left(T_{t+j} + N_{t+j}\right) \approx T_t$$
since the linear trend terms average back to $T_t$ and the noise terms average to approximately zero.
T T N t j N t T q q
7/23/2019 ACTL2003 Summary Notes
http://slidepdf.com/reader/full/actl2003-summary-notes 29/62
29
Zhi Ying Feng
For the classic decomposition model:
1 1 1
2 1 2 1 2 1
q q q q
t j t j t j t j t j
j q j q j q j q
T a X T S N q q q
This model works well, i.e. t t T T if:
The trend is approximately linear
The sum of N t is close to zero
The sum of S t is zero
The estimated noise, or the residuals, is:
t t t t N x T S
Differencing
The backshift operator, or lag operator, $B$ is defined by:
$$BX_t = X_{t-1}, \qquad B^j X_t = X_{t-j}$$
The difference operator $\nabla$ is defined by:
$$\nabla X_t = (1 - B)X_t = X_t - X_{t-1}$$
The powers of $\nabla$ are defined by:
$$\nabla^j X_t = (1 - B)^j X_t, \qquad \nabla^0 X_t = X_t$$
The difference operator with lag $d$ is defined by:
$$\nabla_d X_t = (1 - B^d)X_t = X_t - X_{t-d}$$
Eliminating the Trend
The trend can be eliminated by using differencing. E.g. consider a linear trend:
$$X_t = T_t + N_t, \qquad T_t = c_0 + c_1 t$$
Apply differencing to the power of one:
$$\nabla X_t = \nabla T_t + \nabla N_t = (c_0 + c_1 t) - (c_0 + c_1(t-1)) + \nabla N_t = c_1 + \nabla N_t$$
i.e. the trend becomes the constant $c_1$, which is stationary. This constant can be estimated by the sample average of $\nabla x_t$. In general, a trend that is a polynomial of degree $k$ can be reduced to a constant by differencing $k$ times:
$$T_t = c_0 + c_1 t + \dots + c_k t^k \quad \Rightarrow \quad \nabla^k X_t = k!\,c_k + \nabla^k N_t$$
which is a random process with mean $k!\,c_k$.
Differencing to Eliminate Seasonality
Seasonality can also be eliminated by differencing: difference once only, but at a lag $d$ equal to the period of the seasonality:
$$\nabla_d S_t = S_t - S_{t-d} = 0$$
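Both differencing results can be seen on synthetic data; a minimal sketch (the coefficients and period are illustrative):

```python
import numpy as np

# Quadratic trend T_t = c0 + c1 t + c2 t^2 (degree k = 2), no noise for clarity.
c0, c1, c2 = 3.0, -1.0, 0.5
t = np.arange(30, dtype=float)
x = c0 + c1 * t + c2 * t**2

d2 = np.diff(x, n=2)   # apply the difference operator twice: constant 2! * c2 = 1.0

# Seasonal component with period d = 4 (sums to zero over one cycle):
season = np.tile([1.0, -2.0, 0.5, 0.5], 10)
lag4 = season[4:] - season[:-4]   # lag-4 differencing removes it entirely
```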
Stationarity
The theory of stationary random processes is important in modelling time series, as stationarity
allows parameters to be estimated efficiently, since we can treat all samples as from the same
distribution. A non-stationary random process must be transformed into a stationary one before
analysis and modelling, i.e. by removing trend and seasonality, or by applying transformations.
A random process $\{X_t\}$ is said to be:
- Integrated of order 0, i.e. $I(0)$, if $X_t$ itself is a stationary time series process
- Integrated of order 1, i.e. $I(1)$, if $X_t$ is not stationary, but the increments obtained through differencing, $Y_t = \nabla X_t = X_t - X_{t-1}$, form a stationary process
- Integrated of order $n$, i.e. $I(n)$, if $\nabla^{n-1}X_t$ is still not stationary, but the $n$th differenced series $Y_t = \nabla^n X_t$ is a stationary time series process

However, the deseasonalised, detrended residuals, i.e. the random component, may still contain information: they may be correlated over time. Assuming stationarity in the residuals, we can try to fit a probability model to the residuals for forecasting purposes. To do so, we will need to look at some sample statistics of the residuals:
Sample Statistics
Let $\{X_t\}$ be a stochastic process such that $\text{var}(X_t) < \infty$ for all $t$.
The autocovariance function (ACVF) is defined by:
$$\gamma(\tau) = \text{Cov}(X_t, X_{t+\tau}), \qquad \gamma(0) = \text{Cov}(X_t, X_t) = \text{var}(X_t)$$
The autocorrelation function (ACF) is defined by:
$$\rho(\tau) = \frac{\gamma(\tau)}{\gamma(0)} = \text{Corr}(X_t, X_{t+\tau})$$
The covariance function is defined to be:
$$\text{Cov}(X, Y) = E\left[(X - E[X])(Y - E[Y])\right] = E[XY] - E[X]E[Y]$$
Some properties of the covariance function are:
$$\text{Cov}(X, a) = 0$$
$$\text{Cov}(aX, bY) = ab\,\text{Cov}(X, Y)$$
$$\text{Cov}(aX + bY, cU + dV) = ac\,\text{Cov}(X, U) + ad\,\text{Cov}(X, V) + bc\,\text{Cov}(Y, U) + bd\,\text{Cov}(Y, V)$$
A process $\{X_t\}$ is said to be weakly stationary if:
$$E[X_t] = \mu \text{ for all } t, \qquad \text{Cov}(X_t, X_{t+\tau}) = \gamma(\tau)$$
i.e. the mean is constant and the covariance of the process depends only on the time difference $\tau$.
Noise
I.I.D Noise
$\{X_t\}$ is i.i.d. noise if the $X_t$ are independently and identically distributed with mean zero, i.e. with no covariance between terms:
$$X_t \sim \text{IID}(0, \sigma^2)$$
Assuming $E[X_t^2] = \sigma^2 < \infty$, as for a weakly stationary series we need bounded first and second moments, then:
$$\gamma_X(h) = \begin{cases} \sigma^2 & \text{if } h = 0 \\ 0 & \text{if } h \ne 0 \end{cases} \qquad \rho_X(h) = \begin{cases} 1 & \text{if } h = 0 \\ 0 & \text{if } h \ne 0 \end{cases}$$

White Noise
$\{X_t\}$ is white noise with zero mean, written $X_t \sim \text{WN}(0, \sigma^2)$, if it has mean zero and:
$$\gamma_X(h) = \begin{cases} \sigma^2 & \text{if } h = 0 \\ 0 & \text{if } h \ne 0 \end{cases}$$
- IID noise is white noise, but white noise is not necessarily IID noise
- White noise is weakly stationary
- Usually, we assume that the error terms have a normal distribution for the purpose of parameter estimation etc.:
$$X_t \sim N(0, \sigma^2)$$
Linear Processes
$\{X_t\}$ is a linear process if it can be represented as:
$$X_t = \sum_{j=-\infty}^{\infty} \psi_j Z_{t-j} = \left(\sum_{j=-\infty}^{\infty} \psi_j B^j\right)Z_t = \psi(B)Z_t$$
where $\{\psi_j\}$ is a sequence of constants with $\sum_j |\psi_j| < \infty$ and $Z_t \sim \text{WN}(0, \sigma^2)$.
Linear processes are stationary because they are linear combinations of stationary white noise terms. The regularity condition $\sum_j |\psi_j| < \infty$ means the series $\sum_j \psi_j$ is absolutely convergent. This ensures that the infinite sum can be manipulated the same way as a finite sum, i.e. two absolutely convergent series can be added or multiplied together.
In general, if $\{Y_t\}$ is stationary and $\sum_j |\psi_j| < \infty$ holds, then $X_t = \psi(B)Y_t$ is stationary. Essentially, ALL stationary processes can be represented by a linear process.
Time Series Models
Moving Average Process
$\{X_t\}$ is a first-order moving average process, MA(1), if there is a process $Z_t \sim \text{WN}(0, \sigma^2)$ and a constant $\theta$ such that:
$$X_t = Z_t + \theta Z_{t-1}$$
i.e. the current value is the current noise plus a proportion of the previous period's noise.
The mean is:
$$E[X_t] = E[Z_t] + \theta E[Z_{t-1}] = 0$$
The autocovariance function (ACVF) is:
$$\gamma(h) = \text{Cov}(X_t, X_{t+h}) = \begin{cases} \text{var}(Z_t + \theta Z_{t-1}) = (1 + \theta^2)\sigma^2 & \text{if } h = 0 \\ \text{Cov}(Z_t + \theta Z_{t-1},\, Z_{t+1} + \theta Z_t) = \theta\sigma^2 & \text{if } |h| = 1 \\ 0 & \text{if } |h| > 1 \end{cases}$$
The autocorrelation function (ACF) is:
$$\rho(h) = \frac{\gamma(h)}{\gamma(0)} = \begin{cases} 1 & \text{if } h = 0 \\ \dfrac{\theta}{1 + \theta^2} & \text{if } |h| = 1 \\ 0 & \text{if } |h| > 1 \end{cases}$$
The conditional mean is stochastic and depends on the history:
$$E[X_{t+1} \mid X_t, X_{t-1}, \dots] = E[Z_{t+1} + \theta Z_t \mid X_t, X_{t-1}, \dots] = \theta Z_t$$
The conditional variance is constant, i.e. independent of $X_t$:
$$\text{var}(X_{t+1} \mid X_t, X_{t-1}, \dots) = \text{var}(Z_{t+1}) = \sigma^2$$
Therefore, the conditional and unconditional means and variances are different.
$\{X_t\}$ is a moving average process of order $q$, MA($q$), if the process depends on its previous $q$ noise realisations:
$$X_t = Z_t + \theta_1 Z_{t-1} + \dots + \theta_q Z_{t-q}$$
Note:
- Moving average processes are stationary, as $X_t$ is a linear combination of stationary white noise terms
- The ACF of an MA($q$) process has non-zero values up until lag $q$, and near-zero sample values for all lags greater than $q$
- The conditional mean/variance is used for forecasts, whereas the unconditional mean/variance gives the long-run results
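The cut-off in the ACF at lag $q$ shows up clearly in simulation; a minimal MA(1) sketch (parameters and sample size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

theta, sigma, n = 0.5, 1.0, 200_000   # illustrative MA(1) parameters
z = rng.normal(0.0, sigma, size=n + 1)
x = z[1:] + theta * z[:-1]            # MA(1): X_t = Z_t + theta * Z_{t-1}

def sample_acf(x, h):
    """Sample autocorrelation at lag h."""
    xc = x - x.mean()
    return float(np.dot(xc[:-h], xc[h:]) / np.dot(xc, xc))

rho1 = sample_acf(x, 1)   # theory: theta / (1 + theta^2) = 0.4
rho2 = sample_acf(x, 2)   # theory: 0 for any lag beyond q = 1
```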
Autoregressive Process
t X is a first-order autoregressive process AR(1) if t X is stationary and there is a process
2~ WN 0,t Z , uncorrelated with all X s for s t and a constant such that:
1t t t X X Z
i.e. the next term depends on the previous value of the process and slowly converges to its mean. Note that an AR(1) can also be written as:
1
1 1
1
1
0
...h h
t t h t t t h
hh j
t h t j
j
X X Z Z Z
X Z
The mean is:
E[X_t] = Σ_{j=0}^∞ ϕ^j E[Z_{t-j}] = 0
The autocovariance (ACVF) is:
γ(h) = Cov(X_t, X_{t+h}) = σ² ϕ^{|h|} / (1 − ϕ²)
in particular γ(0) = σ²/(1 − ϕ²).
The autocorrelation (ACF) is:
ρ(h) = γ(h)/γ(0) = ϕ^{|h|}, with ρ(0) = 1
The conditional mean is stochastic and depends on X_t:
E[X_{t+1} | X_t, X_{t-1}, ...] = ϕX_t + E[Z_{t+1} | X_t, ...] = ϕX_t
The conditional variance is constant, i.e. independent of X_t:
Var(X_{t+1} | X_t, X_{t-1}, ...) = Var(Z_{t+1}) = σ²
Therefore, the conditional and unconditional mean and variance are different.
X_t is an autoregressive process of order p, AR(p), if:
X_t = ϕ_1X_{t-1} + ϕ_2X_{t-2} + ... + ϕ_pX_{t-p} + Z_t
Note: For an AR(1) process to be stationary we need |ϕ| < 1, so that it can be expressed as a linear process. When ϕ = 1, the process is known as a random walk.
For a higher order AR(p) process, there are also conditions on ϕ_1, ϕ_2, ..., ϕ_p for stationarity.
An AR(1) process is equivalent to an MA(∞) process, and an invertible MA(1) process is equivalent to an AR(∞) process; in the AR(1) case:
X_{t+h} = ϕ^h X_t + Σ_{j=0}^{h-1} ϕ^j Z_{t+h-j} → Σ_{j=0}^∞ ϕ^j Z_{t+h-j} as h → ∞
An AR(p) process has, in absolute value, a generally decaying non-zero ACF at all lags.
The smaller |ϕ|, the faster the ACF decays. If ϕ is negative, the ACF has alternating signs.
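A small simulation sketch of the geometric ACF decay ρ(h) = ϕ^h (assuming Python with numpy; ϕ = 0.7 is an illustrative choice, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(1)
phi, sigma, n = 0.7, 1.0, 100_000

# AR(1): X_t = phi * X_{t-1} + Z_t, started at 0 (effect of the start dies out)
x = np.zeros(n)
z = rng.normal(0.0, sigma, n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + z[t]

def sample_acf(x, h):
    xc = x - x.mean()
    return (xc[h:] * xc[:len(x) - h]).sum() / (xc ** 2).sum()

# theoretical ACF is rho(h) = phi^|h|; compare at the first few lags
errs = [abs(sample_acf(x, h) - phi ** h) for h in (1, 2, 3)]
```

The sample ACF tracks ϕ^h to within sampling error, illustrating the exponential decay used later for model identification.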
Autoregressive Moving Average Models
X_t is a first-order autoregressive moving average model ARMA(1,1) if X_t is stationary and:
X_t = ϕX_{t-1} + Z_t + θZ_{t-1}
where ϕ and θ are constants and Z_t ~ WN(0, σ²). Alternatively, in operator form:
ϕ(B)X_t = θ(B)Z_t
where ϕ(B) = 1 − ϕB and θ(B) = 1 + θB.
Note that for X_t to be stationary, the condition is the same as for the AR(1) process, i.e. |ϕ| < 1.
X_t is an ARMA process of order (p,q), i.e. ARMA(p,q), if:
X_t = ϕ_1X_{t-1} + ... + ϕ_pX_{t-p} + Z_t + θ_1Z_{t-1} + ... + θ_qZ_{t-q}
i.e. ϕ(B)X_t = θ(B)Z_t
where:
ϕ(z) = 1 − ϕ_1z − ϕ_2z² − ... − ϕ_pz^p
θ(z) = 1 + θ_1z + θ_2z² + ... + θ_qz^q
and the polynomials ϕ(z) and θ(z) have no common roots.
The ARMA(p,q) process X_t is stationary if the equation ϕ(z) = 0 has no roots on the unit circle (note that the roots can be complex), i.e.:
ϕ(z) ≠ 0 for |z| = 1
X_t is an ARMA(p,q) process with drift μ if it is of the form:
X_t = μ + ϕ_1X_{t-1} + ... + ϕ_pX_{t-p} + Z_t + θ_1Z_{t-1} + ... + θ_qZ_{t-q}
If we estimate and then remove the mean, X becomes a standard ARMA(p,q) process. For a differenced series, the drift represents the expected change after differencing.
Causality
An ARMA(p,q) process X_t is causal if there exist constants ψ_j with Σ_j |ψ_j| < ∞ such that:
X_t = Σ_{j=0}^∞ ψ_j Z_{t-j}
i.e. X_t is expressible in terms of current and past noise terms. Causal processes are a subset of stationary processes, i.e. to be causal a process must first be stationary. Causality is important in practice, since if X_t is not causal then it depends on future noise terms, which does not make sense.
Theorem:
The X_t satisfying ϕ(B)X_t = θ(B)Z_t is causal if and only if all the roots of the equation ϕ(z) = 0 lie outside the unit circle.
Invertibility
An ARMA(p,q) process X_t is invertible if there exist constants π_j with Σ_j |π_j| < ∞ such that:
Z_t = Σ_{j=0}^∞ π_j X_{t-j}
i.e. Z_t is expressible in terms of current and past X_t. If X_t is not invertible then the noise depends on future values of the process, which again does not make sense.
Theorem:
The X_t satisfying ϕ(B)X_t = θ(B)Z_t is invertible if and only if all the roots of the equation θ(z) = 0 lie outside the unit circle.
Note:
All MA processes are causal, but AR processes might not be
All AR processes are invertible, but MA processes might not be
An equivalent condition for an AR(1) process to be causal is |ϕ| < 1
An equivalent condition for an MA(1) process to be invertible is |θ| < 1
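The root conditions in the two theorems are easy to check numerically. A minimal sketch (assuming Python with numpy; the example polynomials are illustrative, not from the notes):

```python
import numpy as np

def outside_unit_circle(coeffs_ascending):
    """True if all roots of the polynomial (coefficients in ascending
    powers of z) lie strictly outside the unit circle."""
    roots = np.roots(coeffs_ascending[::-1])  # np.roots expects descending order
    return bool(np.all(np.abs(roots) > 1.0))

# AR part phi(z) = 1 - 0.5z has root z = 2, outside the circle: causal
causal = outside_unit_circle([1.0, -0.5])

# MA part theta(z) = 1 + 2z has root z = -0.5, inside the circle: not invertible
invertible = outside_unit_circle([1.0, 2.0])
```

The same helper works for higher-order ϕ(z) or θ(z), including complex roots, since numpy returns them with their moduli intact.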
Calculation of ACF
Linear Filter Method
Consider the causal ARMA(p,q) process:
X_t = ϕ_1X_{t-1} + ... + ϕ_pX_{t-p} + Z_t + θ_1Z_{t-1} + ... + θ_qZ_{t-q}
This can be written as:
X_t = [θ(B)/ϕ(B)] Z_t = ψ(B)Z_t = Σ_{j=0}^∞ ψ_j Z_{t-j}
Note that the summation runs only from 0 to infinity, as the process is causal.
The first step is to determine the ψ_j by equating coefficients in ϕ(B)ψ(B)Z_t = θ(B)Z_t:
(1 − ϕ_1B − ... − ϕ_pB^p)(ψ_0 + ψ_1B + ψ_2B² + ...) = 1 + θ_1B + ... + θ_qB^q
Then calculate the ACVF by replacing the X_t by their linear filter form. For h ≥ 0:
γ(h) = Cov(X_{t+h}, X_t)
= Cov(Σ_{j=0}^∞ ψ_j Z_{t+h-j}, Σ_{j=0}^∞ ψ_j Z_{t-j})
= Cov(Σ_{j=0}^∞ ψ_{j+h} Z_{t-j}, Σ_{j=0}^∞ ψ_j Z_{t-j})   (push the index back so both sums run over Z_{t-j})
= σ² Σ_{j=0}^∞ ψ_{j+h} ψ_j   (since Cov(Z_t, Z_t) = σ² and Cov(Z_t, Z_s) = 0 for s ≠ t)
This method is convenient for MA processes, since they are already expressed as linear processes.
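The coefficient-matching step and the sum σ² Σ ψ_{j+h}ψ_j can be sketched numerically for an ARMA(1,1), where ψ_0 = 1, ψ_1 = ϕ + θ and ψ_j = ϕψ_{j-1} for j ≥ 2 (assuming Python with numpy; the parameter values are illustrative):

```python
import numpy as np

phi, theta, sigma2 = 0.5, 0.4, 1.0

# psi weights from equating coefficients in
# (1 - phi B)(psi_0 + psi_1 B + ...) = 1 + theta B:
# psi_0 = 1, psi_1 = phi + theta, psi_j = phi * psi_{j-1} for j >= 2
J = 200  # truncation; psi_j decays geometrically so the tail is negligible
psi = np.empty(J)
psi[0] = 1.0
psi[1] = phi + theta
for j in range(2, J):
    psi[j] = phi * psi[j - 1]

def acvf(h):
    # gamma(h) = sigma^2 * sum_j psi_{j+h} psi_j
    return sigma2 * (psi[h:] * psi[:J - h]).sum()

gamma0 = acvf(0)
# closed form for the ARMA(1,1) variance:
# gamma(0) = sigma^2 * (1 + (phi+theta)^2 / (1-phi^2))
gamma0_closed = sigma2 * (1 + (phi + theta) ** 2 / (1 - phi ** 2))
```

The truncated sum agrees with the closed form to machine precision, since the discarded tail is of order ϕ^{2J}.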
Yule-Walker Equations
Consider the causal ARMA(p,q) process:
X_t = ϕ_1X_{t-1} + ... + ϕ_pX_{t-p} + Z_t + θ_1Z_{t-1} + ... + θ_qZ_{t-q}
First, multiply both sides by X_{t-h} and then take the covariance:
Cov(X_{t-h}, X_t) = ϕ_1Cov(X_{t-h}, X_{t-1}) + ... + ϕ_pCov(X_{t-h}, X_{t-p}) + C_h
which is equivalent to:
γ(h) = ϕ_1γ(h − 1) + ... + ϕ_pγ(h − p) + C_h
where C_h collects the moving average components on the RHS:
C_h = Cov(X_{t-h}, Z_t) + θ_1Cov(X_{t-h}, Z_{t-1}) + ... + θ_qCov(X_{t-h}, Z_{t-q})
For h = 0, 1, 2, ..., p there are a total of p + 1 equations, which can be expressed in matrix form in the unknowns γ(0), γ(1), ..., γ(p), using γ(−h) = γ(h). Thus, given the C_j and ϕ_j, the ACF can be computed by solving this linear system, i.e. by taking the matrix inverse.
For h > p, we can find the ACVF recursively through:
γ(h) = ϕ_1γ(h − 1) + ... + ϕ_pγ(h − p) + C_h
To work out C_0, C_1, ..., C_p, if the ψ_j are available, we can use:
X_{t-h} = Σ_{j=0}^∞ ψ_j Z_{t-h-j} = Σ_{j=h}^∞ ψ_{j-h} Z_{t-j}
This gives C_h as:
C_h = Cov(Σ_{j=h}^∞ ψ_{j-h}Z_{t-j}, Z_t + θ_1Z_{t-1} + ... + θ_qZ_{t-q}) = σ² Σ_{j=h}^q θ_j ψ_{j-h}
where θ_0 = 1 and C_h = 0 for h > q. However, if the ψ_j are available, we may as well use method 1!
Therefore, this method is more convenient for AR processes, where C_0 = σ² and C_h = 0 for h ≥ 1.
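For an AR(2), the Yule-Walker system above (with C_0 = σ², C_h = 0 for h ≥ 1) can be solved directly as a 3×3 linear system, a minimal sketch assuming Python with numpy and illustrative parameter values:

```python
import numpy as np

phi1, phi2, sigma2 = 0.5, 0.3, 1.0

# Yule-Walker for AR(2): gamma(h) = phi1*gamma(h-1) + phi2*gamma(h-2) + C_h,
# with C_0 = sigma^2 and C_h = 0 for h >= 1, using gamma(-h) = gamma(h).
A = np.array([
    [1.0,   -phi1,       -phi2],  # h = 0
    [-phi1,  1.0 - phi2,  0.0],   # h = 1
    [-phi2, -phi1,        1.0],   # h = 2
])
g = np.linalg.solve(A, np.array([sigma2, 0.0, 0.0]))  # (gamma0, gamma1, gamma2)
rho1, rho2 = g[1] / g[0], g[2] / g[0]

# known closed forms for the AR(2) ACF:
# rho(1) = phi1 / (1 - phi2),  rho(2) = phi1 * rho(1) + phi2
```

Solving the matrix system reproduces the closed-form ACF values exactly, which is the "taking the inverse" step in the notes.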
Partial Autocorrelation Function
The autocorrelation function is used to determine the number of lags of an MA(q) process, while the partial autocorrelation function is needed to determine the number of lags of an AR(p) process. In finding the ACF, we set up the Yule-Walker equations and converted them into matrix form to solve for γ(h).
By omitting the first equation (h = 0), we can rewrite the system to solve for the parameters ϕ_j:
[ γ(0) γ(1) ... γ(p−1); γ(1) γ(0) ... γ(p−2); ... ; γ(p−1) γ(p−2) ... γ(0) ] (ϕ_1, ϕ_2, ..., ϕ_p)^T = (γ(1) − C_1, γ(2) − C_2, ..., γ(p) − C_p)^T
or equivalently:
Γ_p ϕ_p = γ_p − C_p
where Γ_p is the covariance matrix and the C_h are:
C_h = Cov(X_{t-h}, Z_t) + θ_1Cov(X_{t-h}, Z_{t-1}) + ... + θ_qCov(X_{t-h}, Z_{t-q})
Therefore, the parameters are given by:
ϕ_p = Γ_p^{-1}(γ_p − C_p)
Note that for an AR(p) process the C_h are all zero for h ≥ 1, since the noise terms are uncorrelated with the past:
ϕ_p = Γ_p^{-1} γ_p
Define the vector ϕ_h = (ϕ_{h1}, ϕ_{h2}, ..., ϕ_{hh})^T = Γ_h^{-1} γ_h. Then for an AR(p) process:
- If h = p, then ϕ_p = (ϕ_1, ..., ϕ_p)^T, which implies ϕ_pp = ϕ_p
- If h > p, then ϕ_h = (ϕ_1, ..., ϕ_p, 0, ..., 0)^T, which implies ϕ_hh = 0
  (because an AR(p) process is a special case of an AR(h) process with ϕ_t = 0 for t > p)
- If h < p, then we do not know precisely what ϕ_hh is
Therefore, for an AR(p) process X_t = ϕ_1X_{t-1} + ... + ϕ_pX_{t-p} + Z_t we have:
ϕ_hh = ? if h < p;  ϕ_p if h = p;  0 if h > p
The h-th partial autocorrelation function is defined by:
α(h) = 1 if h = 0;  ϕ_hh if h > 0
where ϕ_hh is the last element in the vector Γ_h^{-1} γ_h.
E.g., for an AR(2) process, the PACF at lag 2 is given by:
α(2) = ϕ_22, where (ϕ_21, ϕ_22)^T = [ γ(0) γ(1); γ(1) γ(0) ]^{-1} (γ(1), γ(2))^T
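The cut-off property ϕ_hh = 0 for h > p can be verified numerically. A sketch (assuming Python with numpy): for an AR(1) with ϕ = 0.6, the lag-2 PACF computed as the last element of Γ_2^{-1}γ_2 should be exactly zero:

```python
import numpy as np

phi = 0.6
gamma = lambda h: phi ** abs(h)  # AR(1) ACVF up to the factor gamma(0)

# PACF at lag 2: last element of Gamma_2^{-1} gamma_2
Gamma2 = np.array([[gamma(0), gamma(1)],
                   [gamma(1), gamma(0)]])
gamma2 = np.array([gamma(1), gamma(2)])
phi_21, phi_22 = np.linalg.solve(Gamma2, gamma2)

# for an AR(1): alpha(1) = phi and alpha(h) = 0 for h > 1,
# so phi_21 recovers phi and phi_22 vanishes
```

Note that the common factor γ(0) cancels in the system, so working with the ACF instead of the ACVF gives the same ϕ_hh.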
Model Building
For a stationary time series, model building can be classified into 3 stages:
Model Selection
To determine the appropriate model we need to look at the sample ACF and PACF:
The sample autocovariance function is:
γ̂(h) = (1/n) Σ_{t=1}^{n−h} (x_{t+h} − x̄)(x_t − x̄)
The sample autocorrelation function (SACF) is:
ρ̂(h) = γ̂(h) / γ̂(0)
Note that since SPACF will be calculated from SACF, both measures will have estimation error.
Once we have graphed the sample ACF and sample PACF, we first check that the sample ACF
should quickly converge to zero, which shows that the time series is stationary.
If the sample ACF decreases slowly but steadily from a value near 1, then the data need to
be differenced before fitting the model
If the sample ACF exhibits a periodic oscillation, then there may be some seasonality still.
Then we compare the sample ACF and PACF to the theoretical ACF and PACF of different
processes to see if there is a match.
For an AR(p) process:
- Sample ACF shows exponential decay towards near-zero values
- Sample PACF shows significant values up to lag p, then near-zero values thereafter
For an MA(q) process:
- Sample ACF shows significant values up to lag q, then near-zero values thereafter
- Sample PACF shows exponential decay towards near-zero values
If neither of these situations occurs, then consider an ARMA(p,q) process. The sample ACF and PACF of an ARMA(p,q) process are very flexible, but in general they combine the behaviour of the AR(p) and MA(q) components. An ARMA(p,q) model should display:
- ACF that decays towards zero after lag q, either directly or with oscillation
- PACF that decays towards zero after lag p, either directly or with oscillation
Model Selection Criteria
When deciding the number of parameters, it is a trade-off between goodness of fit and the
simplicity of the model. More parameters mean more flexibility and a better sample fit. However,
more parameters also mean that each parameter is estimated with more uncertainty, i.e. higher
standard error.
A systematic method to choose p and q, i.e. the parameters, is to minimise an information criterion.
Information criteria have the form:
IC(p, q) = −2 log L(β̂) + P(n, p, q)
where the first term always decreases as parameters are added (the maximised likelihood can only improve), while the second "penalty" term, a function of the number of observations and the number of parameters, always increases with the number of parameters. Thus the IC seeks to balance out bias and variance.
Three common ICs are:
Information Criterion | Penalty Term P(n, p, q)
AIC  | 2(p + q + 1)
BIC  | (p + q + 1) log n
AICc | 2(p + q + 1)n / (n − p − q − 2)
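A minimal sketch of comparing candidate orders by information criterion (assuming Python; the log-likelihood values and n are hypothetical, purely for illustration):

```python
import math

def aic(loglik, p, q):
    return -2 * loglik + 2 * (p + q + 1)

def bic(loglik, p, q, n):
    return -2 * loglik + (p + q + 1) * math.log(n)

def aicc(loglik, p, q, n):
    return -2 * loglik + 2 * (p + q + 1) * n / (n - p - q - 2)

# hypothetical fitted models on n = 200 observations:
# ARMA(1,1) with log-likelihood -310.0 vs ARMA(2,2) with -308.5
n = 200
cands = {(1, 1): -310.0, (2, 2): -308.5}
best_bic = min(cands, key=lambda pq: bic(cands[pq], *pq, n))
```

Here the larger model improves the likelihood slightly, but BIC's heavier penalty still prefers the ARMA(1,1), illustrating the fit-versus-simplicity trade-off.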
Parameter Estimation
Mean Estimation
Let X_t be a weakly stationary process with mean μ. An estimator of μ is the sample mean:
X̄ = (1/n) Σ_{t=1}^n X_t
This estimator is unbiased:
E[X̄] = (1/n) Σ_{t=1}^n E[X_t] = μ
This estimator is also consistent:
Var(X̄) = (1/n²) Σ_{t=1}^n Σ_{s=1}^n Cov(X_t, X_s)
= (1/n²) Σ_{|h|<n} (n − |h|) γ(h)
= (1/n) Σ_{|h|<n} (1 − |h|/n) γ(h)
Thus, the variance → 0 as n → ∞, i.e. the estimator is consistent, provided Σ_h |γ(h)| < ∞, which is true since for a stationary time series the ACF should eventually converge to zero regardless of whether it is AR or MA. For a more detailed proof, see ACTL2003 Proofs.
Parameter Estimation
If a sufficient number of sample ACF values is known, then one way to estimate the parameters of a model, i.e. θ, ϕ, σ², is by equating them to the theoretical ACF derived from the Yule-Walker equations. Use this to set up equations in terms of the parameters and solve. If there is more than one solution, choose the one that makes the model causal and/or invertible.
Another method is to use maximum likelihood estimation. Suppose we have a set of errors that are assumed to be normal; then the X_t themselves are also normal, so:
X_n = (X_1, ..., X_n)^T ~ N(0, Γ_n(β))
where:
- β is the vector of parameters in the model, e.g. β = (θ, σ²)^T for an MA(1) model
- Γ_n is the symmetric covariance matrix of X_n, expressed in terms of the parameters:
Γ_n = [ γ(0) γ(1) γ(2) ... γ(n−1); γ(1) γ(0) γ(1) ... γ(n−2); γ(2) γ(1) γ(0) ... γ(n−3); ... ; γ(n−1) γ(n−2) γ(n−3) ... γ(0) ]
Assuming that the observations follow a multivariate normal distribution, the likelihood function is:
L(β) = (2π)^{−n/2} det(Γ_n)^{−1/2} exp(−½ X_n^T Γ_n^{−1} X_n)
The maximum likelihood estimator β̂ is the value that maximises the likelihood function L(β).
Under the normality assumption, we have the asymptotic distribution of the MLE:
β̂ ~ N(β, Var(β̂)) approximately
where the variance can be estimated by the inverse of the negative Hessian of the log-likelihood:
Var(β̂) ≈ [ −∂² ln L(β)/∂β ∂β^T ]^{−1}
This result can be used to compute confidence intervals for the parameters and for hypothesis testing about the parameters, e.g. whether to include certain parameters.
Model Diagnosis
The residuals of the proposed model are:
Ẑ_t = X_t − X̂_t
where X̂_t are the fitted values computed using the estimated parameters. If the proposed model is a good approximation to the underlying time series process, then the residuals should be approximately a white noise process. There are several methods to check this:
Plot of Residuals
If the plot of Ẑ_t against t shows any trend or pattern in the fluctuations, then the model is inadequate.
SACF of Residuals
If Ẑ_t is white noise, then it can be shown that the sample ACF of Ẑ_t at lag h ≥ 1 is approximately distributed as:
ρ̂_Z(h) ~ N(0, 1/n)
Therefore, at the 95% level, the sample ACF should lie within the range:
0 ± 1.96/√n
since for a white noise process ρ(h) = 0 for h ≥ 1. If too many values of the SACF lie outside this range, then the model does not fit the process well and more parameters will be needed.
Ljung-Box Test
We can test the null hypothesis that Ẑ_t is white noise using the Ljung-Box test statistic. This tests whether, jointly, all correlations at lags greater than zero are zero. Under the null hypothesis, for large n, the Ljung-Box test statistic is:
Q = n(n + 2) Σ_{j=1}^h ρ̂_Z²(j)/(n − j) ~ χ²_{h−p−q}
where n is the number of time series observations. In practice, h is chosen to be between 15 and 30, and n should be large, i.e. n ≥ 100.
Using the Ljung-Box test, we reject the null hypothesis, i.e. conclude the residuals are not white noise, if:
Q > χ²_{h−p−q, 1−α}
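The Q statistic is straightforward to compute from the residual SACF. A sketch (assuming Python with numpy; the simulated series stand in for actual model residuals):

```python
import numpy as np

def ljung_box_q(resid, h):
    """Q = n(n+2) * sum_{j=1}^h rho_hat(j)^2 / (n - j)."""
    n = len(resid)
    rc = resid - resid.mean()
    denom = (rc ** 2).sum()
    rho = np.array([(rc[j:] * rc[:n - j]).sum() / denom for j in range(1, h + 1)])
    return n * (n + 2) * float(np.sum(rho ** 2 / (n - np.arange(1, h + 1))))

rng = np.random.default_rng(2)
white = rng.normal(size=500)
q_white = ljung_box_q(white, h=20)   # under H0, roughly chi^2 with 20 df

# strongly autocorrelated "residuals" should give a much larger Q
ar = np.empty(500)
ar[0] = white[0]
for t in range(1, 500):
    ar[t] = 0.8 * ar[t - 1] + white[t]
q_corr = ljung_box_q(ar, h=20)
```

For the white-noise series Q stays near its χ² mean of h = 20, while the autocorrelated series produces a Q far beyond any reasonable critical value, so the null would be rejected.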
Non-Stationarity
In practice, a non-stationary time series may exhibit a non-stationary mean, variance, or both. Transformations can be used to remove non-stationarity, e.g. taking the logarithm of an exponential trend removes the non-stationary mean and can also smooth out the variance.
Stochastic Trends
Apart from a deterministic trend or seasonality, a stochastic trend also causes non-stationarity. A
stochastic trend is when the noise terms have a permanent effect on the process. Consider the
random walk, rewritten iteratively:
X_t = X_{t-1} + Z_t = X_0 + Σ_{j=1}^t Z_j,  where Z_t ~ WN(0, σ²)
In this case, the effect of any Z_t on X_{t+h} is the SAME for all h ≥ 0, since ϕ = 1. This is not true for stationary processes like AR(1) or ARMA(1,1), where |ϕ| < 1: depending on h, Z_t has a different level of impact, since the coefficient ϕ^j changes, e.g.:
X_t = Σ_{j=0}^∞ ϕ^j Z_{t-j}
Since the noise terms have a lasting impact, the correlation between X_t and X_{t-h} is relatively high, so a distinctive feature of a random walk is a very slowly decaying positive ACF. Note that differencing the random walk once yields a stationary series:
Y_t = (1 − B)X_t = X_t − X_{t-1} = Z_t
ARIMA Model
The process Y_t is an autoregressive integrated moving average model ARIMA(p,d,q) with order of integration d if:
ϕ(B)(1 − B)^d Y_t = θ(B)Z_t
That is, Y_t becomes a stationary ARMA(p,q) process after differencing d times, i.e.:
W_t = (1 − B)^d Y_t ~ ARMA(p, q),  with ϕ(B)W_t = θ(B)Z_t
E.g. consider the process defined by:
X_t = 0.6X_{t-1} + 0.3X_{t-2} + 0.1X_{t-3} + Z_t − 0.25Z_{t-1}
This process can be rewritten as:
(1 − 0.6B − 0.3B² − 0.1B³)X_t = (1 − 0.25B)Z_t
(1 − B)(1 + 0.4B + 0.1B²)X_t = (1 − 0.25B)Z_t
Therefore, this is an ARIMA(2,1,1) process.
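The factorisation used in the example can be checked by multiplying the polynomials back together, a sketch assuming Python with numpy (polynomial coefficients stored in ascending powers of B):

```python
import numpy as np

# AR polynomial of the example: phi(B) = 1 - 0.6B - 0.3B^2 - 0.1B^3
full = np.array([1.0, -0.6, -0.3, -0.1])

# claimed factorisation: (1 - B)(1 + 0.4B + 0.1B^2)
factored = np.convolve([1.0, -1.0], [1.0, 0.4, 0.1])

# after removing the unit root (1 - B), the remaining AR(2) factor
# should have all roots outside the unit circle, i.e. be stationary
remaining = np.array([1.0, 0.4, 0.1])
stationary = bool(np.all(np.abs(np.roots(remaining[::-1])) > 1.0))
```

The convolution reproduces the original coefficients, confirming exactly one unit root, and the remaining quadratic factor is stationary, which is why d = 1 and p = 2.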
SARIMA Model
Suppose that X_t exhibits a stochastic seasonal trend, i.e. X_t depends not only on X_{t-1}, X_{t-2}, ... but also on X_{t-s}, X_{t-2s}, ....
To model this, we can use a SARIMA(p,d,q)×(P,D,Q)_s process, given by:
ϕ(B) Φ(B^s) (1 − B)^d (1 − B^s)^D X_t = θ(B) Θ(B^s) Z_t
where ϕ(B) is the AR(p) polynomial, θ(B) the MA(q) polynomial, (1 − B)^d the I(d) factor, and the seasonal AR(P), MA(Q) and I(D) factors Φ(B^s), Θ(B^s), (1 − B^s)^D are polynomials in B^s.
E.g. consider a SARIMA(1,0,0)×(0,1,1)_12 process given by:
(1 − ϕB)(1 − B^12)X_t = (1 + ΘB^12)Z_t
This can be written as:
(1 − ϕB − B^12 + ϕB^13)X_t = Z_t + ΘZ_{t-12}
X_t = ϕX_{t-1} + X_{t-12} − ϕX_{t-13} + Z_t + ΘZ_{t-12}
Note that X_t depends on X_{t-1}, X_{t-12}, X_{t-13} as well as Z_{t-12}.
SARIMA models can always be rewritten as ARIMA models, with some constraints on the new parameters. However, the unconstrained ARIMA form requires more parameters than the SARIMA model, which leads to a better in-sample fit but a worse out-of-sample fit, i.e. worse predictions. Thus, when forecasting, it is better to use the SARIMA model than to convert it to an ARIMA model.
Dickey-Fuller Test
The Dickey-Fuller test is a unit root test, i.e. it tests whether there is a unit root in the time series. Note that if the polynomial ϕ(z) has a unit root, then the time series is not stationary and requires differencing.
Consider a causal time series process X_t with:
X_t = α + βt + ϕX_{t-1} + Z_t,  Z_t ~ WN(0, σ²)
To test for a stochastic trend, we test the hypotheses:
H_0: ϕ = 1 against H_1: |ϕ| < 1
Note that if ϕ = 1 then there is a unit root, which leads to a stochastic trend, so X_t is not stationary.
We can rewrite the above model as:
X_t − X_{t-1} = α + βt + (ϕ − 1)X_{t-1} + Z_t
∇X_t = α + βt + ϕ*X_{t-1} + Z_t,  where ϕ* = ϕ − 1
Then the alternative and equivalent null hypothesis is:
H_0: ϕ* = 0
This is known as the Dickey-Fuller test, and the test statistic is:
τ = ϕ̂* / se(ϕ̂*)
Once the parameters α, β, ϕ* have been estimated, reject the null hypothesis ϕ* = 0 if τ is LESS than the critical value, which will be negative since ϕ* is negative under the alternative. Note that rejecting the null hypothesis implies that the time series is stationary, and accepting the null hypothesis implies that differencing is required.
The distribution of this test statistic is non-standard, depending on α and β, with asymptotic percentiles:
Probability to the left | 0.01  | 0.05  | 0.10
Standard Normal         | −2.33 | −1.65 | −1.28
DF with α = 0, β = 0    | −2.58 | −1.95 | −1.62
DF with β = 0           | −3.43 | −2.86 | −2.57
DF (unconstrained)      | −3.96 | −3.41 | −3.12
Note that the DF distributions are much more spread out than the standard normal. When choosing whether or not to impose α = 0 and/or β = 0, there is a trade-off:
- If α or β is set to 0 when in fact the true values are nonzero, the test becomes inconsistent and the asymptotic critical values are no longer valid. Decisions based on the test are likely to be wrong, i.e. it might confuse deterministic and stochastic trends.
- However, allowing α or β to be non-zero reduces the power of the test, i.e. it becomes harder to detect a false null hypothesis.
How to treat α and β usually depends on what type of series we have. E.g. if a linear trend exists, then we expect the differenced series to have only a constant, so α ≠ 0, β = 0.
Overdifferencing
Let U_t be an ARMA(p,q) process:
ϕ(B)U_t = θ(B)Z_t
Define V_t = (1 − B)^d U_t, i.e. d differences. Then we have:
ϕ(B)V_t = ϕ(B)(1 − B)^d U_t = (1 − B)^d θ(B)Z_t
Therefore, V_t becomes an ARMA(p, q+d) process, since (1 − B)^d θ(B) is a polynomial of order q + d.
However, this MA polynomial has a unit root, so the process V_t is not invertible. Therefore, we should avoid overdifferencing, as it gives a non-invertible process, even though that process is still stationary.
Cointegrated Time Series
Many time series in finance and economics are non-stationary (random walks), e.g. CPI and GDP, but at the same time do not move too far apart from each other. Cointegration is used to model non-stationary series that move together.
For a bivariate process X_t = (X_{t,1}, X_{t,2})^T, we define:
∇^d X_t = (∇^d X_{t,1}, ∇^d X_{t,2})^T
A bivariate process X_t = (X_{t,1}, X_{t,2})^T is integrated of order d, i.e. I(d), if ∇^d X_t is stationary but ∇^{d−1} X_t is not.
An I(d) bivariate process is cointegrated if there is a cointegrating vector β = (β_1, β_2)^T such that β^T X_t becomes stationary, i.e.:
β_1X_{t,1} + β_2X_{t,2} ~ I(0)
If X_{t,1} and X_{t,2} are cointegrated (with d = 1), then:
- X_{t,1} and X_{t,2} are both I(1)
- e_t = β^T X_t is I(0)
- Cointegration implies that there is a common stochastic trend between X_{t,1} and X_{t,2}
In many financial applications, the cointegrating vector is of the form:
β = (1, −a)^T
That is, X_{t,1} and X_{t,2} are random walks themselves, but the difference X_{t,1} − aX_{t,2} is stationary. The coefficient a can be estimated by the regression:
X_{t,1} = aX_{t,2} + ε_t
Then we expect that in the long run the two processes satisfy:
X_{t,1} ≈ aX_{t,2}
Time Series Forecasting
Time Series and Markov Property
Recall that a process {X_t, t = 0, 1, ...} has the Markov property if all future states of the process depend on its present state alone and not on any of its past states:
Pr(X_t ∈ A | X_{s_1} = x_1, ..., X_{s_n} = x_n, X_s = x) = Pr(X_t ∈ A | X_s = x)
for all times s_1 < s_2 < ... < s_n < s < t, all states x_1, x_2, ..., x_n, x ∈ S, and all subsets A of S.
AR Processes
An AR(1) process has the Markov property, since the conditional distribution of X_{n+1} given all previous values depends only on X_n. However, an AR(2) process does not have the Markov property, since the conditional distribution of X_{n+1} given all previous values depends on both X_n and X_{n-1}. Thus, in general, AR(p) processes do not have the Markov property for p greater than 1.
However, for an AR(2) process, if we define a vector-valued process Y by Y_t = (X_t, X_{t-1})^T, then Y has the Markov property, since the conditional distribution of Y_{n+1} given all previous Y_t depends only on Y_n. In general, for an AR(p) process we can define a vector-valued process with p elements that has the Markov property.
MA Processes
An MA(q) process can never have the Markov property, even in vector form, since the distribution of X_{n+1} depends on the value of Z_n, and in theory no knowledge of the value of X_n or of any finite collection (X_n, ..., X_{n-q})^T will ever be enough to deduce the value of Z_n. In practice, however, we can estimate Z_n very accurately, so Markov simulation techniques still apply.
ARIMA Processes
For an ARIMA(p,d,q) process, if q is zero, i.e. there is no moving average component, then the process behaves like AR(p) in terms of the Markov property: it might not be Markov itself, but an appropriate vector form can be.
If d is also zero and p is greater than 1, then it is essentially an AR(p) process. If both p and d are equal to 1, then the model can be written as:
(1 − ϕB)(1 − B)X_t = Z_t
(1 − (1 + ϕ)B + ϕB²)X_t = Z_t
X_t = (1 + ϕ)X_{t-1} − ϕX_{t-2} + Z_t
This is clearly not Markov. However, it can still be written as a vector-valued process that has the Markov property. In general, the vector process needs p + d components to be Markov.
If q is not equal to zero, i.e. there is a moving average part, then the process will never be Markov, for the same reason that MA(q) is never Markov.
k -step Ahead Predictor
Assume that we have the following information:
(1) All observations of X_t up until time n: x_1, ..., x_n
(2) An ARMA model has been fitted to the data
(3) All parameters of the model (θ, ϕ, σ²) have been estimated
(4) The process Z_t is known up until time n
The k-step ahead forecast X̂_{n+k|n} is one method to forecast/predict X_{n+k} using the given observations up until time n:
X̂_{n+k|n} = E[X_{n+k} | X_1, ..., X_n]
This is obtained by:
- Replacing the random variables X_1, ..., X_n by their observed values x_1, ..., x_n
- Replacing the random variables X_{n+1}, ..., X_{n+k-1} by their forecast values X̂_{n+1|n}, ..., X̂_{n+k-1|n}
- Replacing the random variables Z_{n+1}, ..., Z_{n+k} by their expectations, i.e. 0
- If Z_1, ..., Z_n are unknown, replacing them by the residuals Ẑ_1, ..., Ẑ_n, where Ẑ_i = E[Z_i | X_1, ..., X_n]
To forecast one step ahead:
X̂_{n+1|n} = E[X_{n+1} | X_1, ..., X_n]
= E[ϕ_1X_n + ... + ϕ_pX_{n+1-p} + Z_{n+1} + θ_1Z_n + ... + θ_qZ_{n+1-q} | X_1, ..., X_n]
= ϕ_1x_n + ... + ϕ_px_{n+1-p} + θ_1Ẑ_n + ... + θ_qẐ_{n+1-q}
To forecast two steps ahead:
X̂_{n+2|n} = E[X_{n+2} | X_1, ..., X_n]
= E[ϕ_1X_{n+1} + ϕ_2X_n + ... + ϕ_pX_{n+2-p} + Z_{n+2} + θ_1Z_{n+1} + θ_2Z_n + ... + θ_qZ_{n+2-q} | X_1, ..., X_n]
= ϕ_1X̂_{n+1|n} + ϕ_2x_n + ... + ϕ_px_{n+2-p} + θ_2Ẑ_n + ... + θ_qẐ_{n+2-q}
(the terms Z_{n+2} and Z_{n+1} have conditional expectation 0).
In practice we do not observe Z_t, so if there are MA terms in the model, then there are more values of Z_t than of X_t, and there is no way of determining all of them from the data. Consider the MA(1) process:
X_t = Z_t + 0.5Z_{t-1}  ⟺  Z_t = X_t − 0.5Z_{t-1}
Then by repeated substitution we have:
Z_n = Σ_{j=0}^{n-1} (−0.5)^j X_{n-j} + (−0.5)^n Z_0
To determine Z_n we first need Z_0; one simple way is to assume that Z_0 = 0. If the process is invertible, then this assumption will have a negligible effect on X̂_{n+k|n} if n is large, since (0.5)^n
will be small.
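The recursive recovery of the residuals under the Z_0 = 0 assumption can be sketched numerically (assuming Python with numpy; the simulated MA(1) stands in for real data):

```python
import numpy as np

rng = np.random.default_rng(3)
theta, n = 0.5, 2000

z = rng.normal(size=n + 1)
x = z[1:] + theta * z[:-1]          # MA(1): X_t = Z_t + 0.5 Z_{t-1}

# recover residuals recursively, assuming Z_0 = 0:
# Z_t = X_t - 0.5 * Z_{t-1}
z_hat = np.zeros(n)
prev = 0.0
for t in range(n):
    z_hat[t] = x[t] - theta * prev
    prev = z_hat[t]

# one-step-ahead forecast: X_hat_{n+1|n} = theta * Z_hat_n
forecast = theta * z_hat[-1]
err = abs(z_hat[-1] - z[-1])        # effect of Z_0 = 0 decays like (0.5)^n
```

The error from the Z_0 = 0 assumption contaminates the first few residuals but decays geometrically, so the late residuals (and hence the forecast) are essentially exact, which is why invertibility matters here.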
Best Linear Predictor
Under assumptions (1), (2) & (3), the best linear predictor P_nX_{n+h} of X_{n+h} has the form:
P_nX_{n+h} = a_0 + a_1X_n + a_2X_{n-1} + ... + a_nX_1
We need the values of a_0, ..., a_n that minimise the mean squared error:
MSE = E[(X_{n+h} − P_nX_{n+h})²]
The general solution is found from the n + 1 first-order conditions:
∂MSE/∂a_0 = 0:  E[X_{n+h} − P_nX_{n+h}] = 0
∂MSE/∂a_i = 0:  E[(X_{n+h} − P_nX_{n+h})X_{n+1-i}] = 0,  i = 1, ..., n
Note that, due to the very first condition, the expected prediction error is:
E[X_{n+h} − P_nX_{n+h}] = 0
i.e. the prediction is unbiased.
The first equation can be simplified to:
a_0 = μ(1 − a_1 − a_2 − ... − a_n)
while the remaining ones can be simplified using a trick: since the prediction error has mean zero,
E[(X_{n+h} − P_nX_{n+h})X_{n+1-i}] = Cov(X_{n+h} − P_nX_{n+h}, X_{n+1-i}) = 0
which gives, for i = 1, ..., n:
γ(h + i − 1) = a_1γ(i − 1) + a_2γ(i − 2) + ... + a_nγ(i − n)
Applying this trick to every subsequent MSE-minimising condition, we end up with a system of n equations (excluding the first one) that is very similar to the system for the PACF coefficients. This system can be represented in matrix form:
Γ_n a_n = γ_n(h)
where:
Γ_n = [ γ(0) γ(1) ... γ(n−1); γ(1) γ(0) ... γ(n−2); ... ; γ(n−1) γ(n−2) ... γ(0) ]
a_n = (a_1, a_2, ..., a_n)^T
γ_n(h) = (γ(h), γ(h+1), ..., γ(h+n−1))^T
Therefore, the solution is:
a_n = Γ_n^{-1} γ_n(h)
Once this is known, a_0 can be found by rewriting the first equation in matrix form:
a_0 = μ(1 − 1^T a_n)
where 1^T represents a row vector of 1's.
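A sketch of the solution a_n = Γ_n^{-1}γ_n(h) (assuming Python with numpy): for a zero-mean AR(1) with ACVF γ(k) ∝ ϕ^{|k|}, the best linear predictor of X_{n+h} is known to be ϕ^h X_n, so the solved weights should be (ϕ^h, 0, ..., 0):

```python
import numpy as np

phi, mu, n, h = 0.7, 0.0, 5, 2
gamma = lambda k: phi ** abs(k)   # AR(1) ACVF, scaled so gamma(0) = 1

# Gamma_n a_n = gamma_n(h)
Gamma = np.array([[gamma(i - j) for j in range(n)] for i in range(n)])
rhs = np.array([gamma(h + i) for i in range(n)])
a = np.linalg.solve(Gamma, rhs)

# intercept: a_0 = mu * (1 - 1^T a)
a0 = mu * (1 - a.sum())
```

Only the coefficient on the most recent observation X_n is non-zero, consistent with the AR(1) being Markov: the older observations add no linear predictive value.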
Part 3: Brownian Motion
Definitions
A stochastic process {X_t, t ≥ 0} is a Brownian motion process if:
(1) X_0 = 0
(2) {X_t, t ≥ 0} has stationary and independent increments
(3) For all t > 0, X_t ~ N(0, σ²t)
Conditions (2) and (3) together are equivalent to:
- X_t ~ N(0, σ²t) for t > 0
- {X_t, t ≥ 0} has stationary increments and Cov(X_s, X_t − X_s) = 0 for 0 < s < t, since two normal random variables with zero covariance are independent
A standard Brownian motion {B_t, t ≥ 0} is one whose volatility is one, i.e. σ = 1. Any Brownian motion {X_t, t ≥ 0} can be standardised by setting:
B_t = X_t / σ
Note that Brownian motion is continuous for all values of t.
A Brownian motion process exhibits some strange behaviour:
A Brownian motion is continuous w.r.t. time t everywhere, but differentiable nowhere
Brownian motion will eventually hit any and every real value no matter how large, or how
negative. Likewise, no matter how large, it will always (with probability 1) come back down
to zero at some future time
Once a Brownian motion hits a certain value, it immediately hits it again infinitely often,
and will continue to return after arbitrarily large times
Brownian motion is fractal, i.e. it looks the same regardless of what scale you examine it
Properties of Brownian Motion
Consider a Brownian motion {X_t, t ≥ 0} with volatility σ:
- For any s with 0 < s < t, X_s and X_t are NOT independent. However, by independence of increments, X_s and X_t − X_s are independent.
This can be shown by finding the covariance between any X_s and X_t:
Cov(X_s, X_t) = Cov(X_s, X_s + (X_t − X_s)) = Cov(X_s, X_s) + Cov(X_s, X_t − X_s) = σ²s
- For any s and t with s < t, X_s and X_t − X_s are both normally distributed, i.e.:
X_s ~ N(0, σ²s)
X_t − X_s ~ N(0, σ²(t − s))
Brownian Motion and Symmetric Random Walk
Define a symmetric random walk using the random variables:
Y_i = +1 w.p. 0.5,  −1 w.p. 0.5
with mean E[Y_i] = 0 and Var(Y_i) = 1.
Now divide the interval [0, t] into equal intervals of length Δt, so that there are n = t/Δt intervals. Suppose that at each step the process can go up or down by size Δx. Letting X(t) be the position at time t:
X(t) = Δx (Y_1 + Y_2 + ... + Y_{t/Δt})
Then, if we let Δx = σ√Δt and let Δt → 0, the limiting process of X(t) is a Brownian motion. This is because:
- {X_t, t ≥ 0} has independent increments, since the Y's are independent (changes in the value of the random walk over non-overlapping time intervals are independent)
- {X_t, t ≥ 0} has stationary increments, since the distribution of the change in position of the random walk over any time interval depends only on the length of the interval
- X(t) has an approximately normal distribution with mean 0 and variance σ²t as Δt converges to 0, by the Central Limit Theorem applied to the i.i.d. steps
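The scaled-random-walk construction can be sketched by simulation (assuming Python with numpy; σ, t and the step size are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)
sigma, t, dt = 2.0, 1.0, 1e-3
n = int(t / dt)
paths = 5_000

# X(t) = dx * (Y_1 + ... + Y_n) with dx = sigma * sqrt(dt), Y_i = +/-1 w.p. 1/2
y = rng.choice([-1.0, 1.0], size=(paths, n))
x_t = sigma * np.sqrt(dt) * y.sum(axis=1)

mean_hat, var_hat = x_t.mean(), x_t.var()  # should approach 0 and sigma^2 * t
```

Across many simulated paths the terminal value has mean near 0 and variance near σ²t = 4, matching the N(0, σ²t) limit.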
Brownian Motion with Drift
A stochastic process {X_t, t ≥ 0} is a Brownian motion process with drift coefficient μ and variance parameter σ² if it satisfies:
- X_0 = 0
- {X_t, t ≥ 0} has stationary and independent increments
- For all t > 0, X_t ~ N(μt, σ²t)
A Brownian motion with drift can be converted to a standard Brownian motion by defining:
B_t = (X_t − μt) / σ
Similarly, a standard Brownian motion can be converted to a Brownian motion with drift by defining:
X_t = μt + σB_t
Geometric Brownian Motion
Let {X_t, t ≥ 0} be a Brownian motion process with drift μ and volatility σ². Then the process {Y_t, t ≥ 0} defined by:
Y_t = exp(X_t)
is called a geometric Brownian motion. This is useful when you do not want negative values, e.g. for asset prices.
Now consider the behaviour of dt·dB_t (equivalently dB_t·dt):
E[Δt(B_{t+Δt} − B_t)] = Δt E[B_{t+Δt} − B_t] = 0
Var(Δt(B_{t+Δt} − B_t)) = (Δt)² Var(B_{t+Δt} − B_t) = (Δt)³ = o(Δt)
Then in differential form:
E[dt·dB_t] = 0,  Var(dt·dB_t) = o(dt)
Taking o(dt) as zero, this is again reduced to a constant:
dt·dB_t = 0
Finally, consider (dt)²:
(dt)² = o(dt) = 0
The following table summarises the results:
×    | dt | dB_t
dt   | 0  | 0
dB_t | 0  | dt
Stochastic Differential Equations
Assume that a stochastic process X_t satisfies a stochastic differential equation (SDE) of the form:
dX_t = μ(X_t, t)dt + σ(X_t, t)dB_t
Consider another stochastic process defined as:
Y_t = F(X_t, t)
Then the stochastic differential equation satisfied by Y_t is given by Itô's formula:
dY_t = ∂F/∂x dX_t + ∂F/∂t dt + ½ ∂²F/∂x² σ(X_t, t)² dt
with all partial derivatives evaluated at x = X_t.
Proof:
The differential of the second-order Taylor series expansion for a function of two variables is:
df(x, t) = f_x dx + f_t dt + ½ f_xx (dx)² + ½ f_tt (dt)² + f_xt dx dt
Applying this to Y_t gives:
dY_t = F_x dX_t + F_t dt + ½ F_xx (dX_t)² + ½ F_tt (dt)² + F_xt dX_t dt
However, we know from the previous section that (dt)² = 0 and dt·dB_t = 0, so:
dt·dX_t = μ(X_t, t)(dt)² + σ(X_t, t) dt·dB_t = 0
And:
(dX_t)² = (μ(X_t, t)dt + σ(X_t, t)dB_t)²
= μ(X_t, t)²(dt)² + 2μ(X_t, t)σ(X_t, t) dt·dB_t + σ(X_t, t)²(dB_t)²
= σ(X_t, t)² dt
Therefore, the expansion reduces to Itô's formula.
E.g. consider a Brownian motion {X_t, t ≥ 0} with zero drift and variance σ²; find the SDE for tX_t².
We have the SDE for X_t:
dX_t = σ dB_t
Let Y_t = tX_t² = F(X_t, t), where F(x, t) = tx², so the derivatives are:
∂F/∂x = 2tx,  ∂F/∂t = x²,  ∂²F/∂x² = 2t
Then, using Itô's formula, the SDE for tX_t² is given by:
d(tX_t²) = 2tX_t dX_t + X_t² dt + ½(2t)σ² dt
= 2σ tX_t dB_t + (X_t² + σ²t) dt
Stochastic Integration
Consider a Brownian motion {X_t, t ≥ 0} with zero drift and variance σ². Let f be a function with a continuous derivative on [a, b]. The stochastic process defined by the stochastic/Itô integral is:
Y = ∫_a^b f(t) dX_t = lim_{max(t_k − t_{k−1}) → 0} Σ_{k=1}^n f(t_{k−1})(X_{t_k} − X_{t_{k−1}})
where a = t_0 < t_1 < ... < t_n = b is a partition of [a, b].
This stochastic process satisfies the properties:
- The mean of the Itô integral is zero, i.e.
E[∫_a^b f(t) dX_t] = lim Σ_{k=1}^n E[f(t_{k−1})(X_{t_k} − X_{t_{k−1}})] = 0
  (Note that "functions of t" in this context may themselves be stochastic, e.g. ∫_0^t B_s dB_s.)
- Thus, the variance and the expectation of the square are equal:
Var(∫_a^b f(t) dX_t) = E[(∫_a^b f(t) dX_t)²]
= lim Σ_{k=1}^n f(t_{k−1})² σ²(t_k − t_{k−1})
= σ² ∫_a^b f(t)² dt
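Both properties can be sketched by Monte Carlo (assuming Python with numpy; f(t) = t on [0, 1] is an illustrative choice, for which σ²∫t²dt = σ²/3):

```python
import numpy as np

rng = np.random.default_rng(5)
sigma, n_steps, n_paths = 1.0, 400, 10_000
dt = 1.0 / n_steps
t_grid = np.arange(n_steps) * dt              # left endpoints t_{k-1}

# Riemann-Ito sums: sum_k f(t_{k-1}) * (X_{t_k} - X_{t_{k-1}}) with f(t) = t
dX = sigma * np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
Y = (t_grid * dX).sum(axis=1)

mean_hat, var_hat = Y.mean(), Y.var()
var_theory = sigma ** 2 / 3                    # sigma^2 * integral of t^2 on [0,1]
```

Evaluating the integrand at the *left* endpoint of each subinterval is what makes this an Itô sum; the sample mean is near 0 and the sample variance near σ²/3, as the two properties predict.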
Part 4: Simulation

Continuous Random Variable Simulation

When simulating a continuous random variable, we work from its probability/cumulative distribution function. Cumulative probabilities always follow a Uniform(0,1) distribution: if $U \sim \text{Uniform}(0,1)$ then $F^{-1}(U)$ is distributed as $F$. Equivalently, subintervals of $(0,1)$ of equal length are equally likely to contain $U$. Therefore, the first step in simulation is usually to generate random variables from a Uniform(0,1).
Pseudo-Random Numbers

The procedure to generate pseudo-random numbers, i.e. approximately Uniform(0,1) values, is:

1. Start with a seed $X_0$ and specify positive integers $a$, $c$ and $m$, which are usually given
2. Generate pseudo-random numbers recursively using:

$$ X_{n+1} = (aX_n + c) \bmod m $$

3. $X_{n+1}/m$ will be an approximation to a Uniform(0,1) random variable
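The recursion above is a linear congruential generator. A minimal sketch in Python, using illustrative constants $a$, $c$, $m$ (the specific values here are assumptions, not part of the notes):

```python
def lcg(seed, a=1103515245, c=12345, m=2**31, n=5):
    """Linear congruential generator: X_{n+1} = (a*X_n + c) mod m, scaled to (0,1)."""
    x = seed
    out = []
    for _ in range(n):
        x = (a * x + c) % m
        out.append(x / m)  # X_{n+1}/m approximates Uniform(0,1)
    return out

us = lcg(seed=2003)  # five values in [0,1), reproducible for a fixed seed
```

The same seed always reproduces the same stream, which is why these are called pseudo-random numbers.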
Inverse Transform Method

Consider a continuous random variable with cumulative distribution function $F_X$ and a Uniform(0,1) random variable $U$. Define:

$$ X = F_X^{-1}(U) $$

Then $X$ will have the distribution function:

$$ \Pr(X \le x) = \Pr\left( F_X^{-1}(U) \le x \right) = \Pr\left( U \le F_X(x) \right) \quad \text{since } F_X \text{ is strictly increasing} $$
$$ = F_X(x) \quad \text{using } \Pr(U \le y) = y $$

Then the inverse transform procedure to generate a r.v. from a cumulative distribution $F$ is:

1. Compute $F_X^{-1}$, from the p.d.f. or c.d.f., if possible
2. Generate a Uniform(0,1) random variable $U$
3. Set $X = F_X^{-1}(U)$; then $X$ will be from the distribution $F$

Note that for the inverse transform method to work, we must be able to calculate the inverse of the c.d.f., i.e. $F_X^{-1}$ must have an explicit expression.
E.g. to simulate an exponential random variable, first find $F_X^{-1}$:

$$ F_X(x) = \int_0^x \lambda e^{-\lambda s}\,ds = 1 - e^{-\lambda x} \quad \Rightarrow \quad F_X^{-1}(y) = -\frac{1}{\lambda}\ln(1-y) $$

Suppose we generate a random variable $U$ from Uniform(0,1). Then we set:

$$ x = F_X^{-1}(U) = -\frac{1}{\lambda}\ln(1-U) $$

Therefore $x$ is a random variable with an exponential distribution. (Since $1-U$ is also Uniform(0,1), setting $x = -\frac{1}{\lambda}\ln U$ works equally well.)
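The exponential example above can be sketched directly in Python; the function name and the sanity check on the sample mean are illustrative additions:

```python
import math
import random

def exp_inverse_transform(lam, rng=random.random):
    """Simulate Exp(lam) via the inverse transform X = -ln(1-U)/lam, U ~ Uniform(0,1)."""
    u = rng()
    return -math.log(1.0 - u) / lam

random.seed(0)
sample = [exp_inverse_transform(2.0) for _ in range(100_000)]
mean = sum(sample) / len(sample)  # should be close to 1/lam = 0.5
```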
Discrete Random Variable Simulation

The inverse transform method can also be applied to simulate discrete random variables. Consider one with probability mass function:

$$ \Pr(X = x_j) = P_j, \quad j = 1, 2, \dots, \qquad \sum_j P_j = 1 $$

Then the procedure to simulate a r.v. from this p.m.f. is:

1. Generate a Uniform(0,1) random variable $U$
2. Set:

$$ X = \begin{cases} x_1 & \text{if } U < P_1 \\ x_2 & \text{if } P_1 \le U < P_1 + P_2 \\ \;\vdots & \\ x_j & \text{if } \sum_{i=1}^{j-1} P_i \le U < \sum_{i=1}^{j} P_i \end{cases} $$
E.g. simulate random variables from a geometric distribution.

Consider the probability mass function for a geometric random variable:

$$ P_j = \Pr(X = j) = p(1-p)^{j-1}, \quad j \ge 1 $$

Notice that:

$$ \sum_{i=1}^{j-1} P_i = 1 - \Pr(X > j-1) = 1 - \sum_{i=j}^{\infty} p(1-p)^{i-1} = 1 - p\,\frac{(1-p)^{j-1}}{1-(1-p)} = 1 - (1-p)^{j-1} \quad \text{(geometric sum)} $$

Then we have $X = j$ when:

$$ \sum_{i=1}^{j-1} P_i \le U < \sum_{i=1}^{j} P_i \quad \Longleftrightarrow \quad 1 - (1-p)^{j-1} \le U < 1 - (1-p)^{j} \quad \Longleftrightarrow \quad (1-p)^{j} < 1-U \le (1-p)^{j-1} $$

Since $U$ is a Uniform(0,1) random variable, $1-U$ is as well, so this is equivalent to:

$$ (1-p)^{j} < U \quad \Longleftrightarrow \quad j\ln(1-p) < \ln U \quad \Longleftrightarrow \quad j > \frac{\ln U}{\ln(1-p)} \quad \text{since } \ln(1-p) < 0 $$

Therefore to generate a random variable from a geometric distribution:

1. Generate a uniform random number $U$
2. Set $X$ as $j$, where $j$ is the first integer for which:

$$ j > \frac{\ln U}{\ln(1-p)} $$
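The "first integer $j$ exceeding $\ln U / \ln(1-p)$" rule above is just a floor-plus-one. A sketch in Python (function name and the mean check are illustrative):

```python
import math
import random

def geometric_inverse_transform(p, rng=random.random):
    """X = first integer j with j > ln(U)/ln(1-p), i.e. floor(ln U / ln(1-p)) + 1."""
    u = rng()
    return math.floor(math.log(u) / math.log(1.0 - p)) + 1

random.seed(1)
sample = [geometric_inverse_transform(0.3) for _ in range(100_000)]
mean = sum(sample) / len(sample)  # E[X] = 1/p ≈ 3.33
```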
Acceptance-Rejection Method

Suppose we have a method, e.g. inverse transform, to simulate an r.v. with density $g(y)$. Then we can use this as the basis for simulating from a continuous distribution with density $f(x)$.

The procedure to simulate random variables using the rejection method is:

1. Choose a distribution $g$ for which you know you can simulate outcomes
2. Set $c$ to be some constant such that $f(y)/g(y) \le c$ for all $y$
3. Simulate a random variable $Y$ with density function $g(y)$
4. Simulate a Uniform(0,1) random variable $U$
5. Accept this as the random number, i.e. set $X = Y$, if:

$$ U \le \frac{f(Y)}{c\,g(Y)} $$

6. Otherwise, reject and return to step 3.

Therefore, the value for $X$ is $Y_N$ where $N$ is the number of iterations until a random number is accepted. We want to be as efficient as possible, i.e. minimise the number of iterations, by:

- choosing a density $g(y)$ similar in shape to $f(y)$, e.g. an exponential for a half-normal
- choosing the smallest value of $c$ that satisfies the inequality, found using calculus, i.e.:

$$ c = \max_y \frac{f(y)}{g(y)} $$

E.g. simulate the absolute value of a standard normal random variable, $X = |Z|$.

Firstly, $X = |Z|$ has the density:

$$ f(x) = \frac{2}{\sqrt{2\pi}} \exp\left( -\frac{x^2}{2} \right), \quad x > 0 $$

1. Let another random variable $Y$ be from the density $g(x) = e^{-x}$, $x > 0$; note that this is comparable in shape to $f(x)$
2. Choose the smallest value of $c$ such that $f(y)/g(y) \le c$:

$$ c = \max_x \frac{f(x)}{g(x)} = \max_x \sqrt{\frac{2}{\pi}}\, e^{x - x^2/2} = \sqrt{\frac{2e}{\pi}} $$

since $x - x^2/2$ is maximised at $x = 1$

3. Generate $U_1$ and $U_2$ from Uniform(0,1)
4. Compute $Y$ from the density $g$ using the inverse transform with $U_1$, i.e.:

$$ Y = -\ln U_1 $$

5. Now check if:

$$ U_2 \le \frac{f(Y)}{c\,g(Y)} = \exp\left( -\frac{(Y-1)^2}{2} \right) $$

6. If true, then set $X = Y = -\ln U_1$; if false, return to step 3 and repeat.
7. To now generate a standard normal random variable $Z$, generate $U_3$ and set:

$$ Z = \begin{cases} X & \text{if } U_3 \le 0.5 \\ -X & \text{if } U_3 > 0.5 \end{cases} $$
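The $|Z|$ example above can be sketched as follows in Python; the function names and the moment checks at the end are illustrative:

```python
import math
import random

def abs_normal_rejection(rng=random.random):
    """Sample |Z| by rejection from g(x)=e^{-x}: accept Y=-ln(U1) when U2 <= exp(-(Y-1)^2/2)."""
    while True:
        y = -math.log(rng())                              # Y ~ Exp(1) via inverse transform
        if rng() <= math.exp(-((y - 1.0) ** 2) / 2.0):
            return y                                      # accepted: distributed as |Z|

def standard_normal(rng=random.random):
    """Attach a random sign to |Z| to recover Z ~ N(0,1)."""
    x = abs_normal_rejection(rng)
    return x if rng() <= 0.5 else -x

random.seed(2)
zs = [standard_normal() for _ in range(100_000)]
mean = sum(zs) / len(zs)                # ≈ 0
var = sum(z * z for z in zs) / len(zs)  # ≈ 1
```

On average the loop runs $c = \sqrt{2e/\pi} \approx 1.32$ times per accepted value.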
Simulation Using Distributional Relationships

Gamma Distribution

Recall that the waiting time to the $n$th event in a continuous-time Markov chain has a Gamma distribution. To simulate random variables from a Gamma$(n, \lambda)$ distribution, the procedure is:

1. Generate $n$ independent Uniform(0,1) random variables $U_1, U_2, \dots, U_n$
2. Simulate exponential random variables using the inverse transform method:

$$ F^{-1}(U_i) = -\frac{1}{\lambda}\ln U_i \sim \text{Exp}(\lambda) $$

3. Since the sum of $n$ independent Exp$(\lambda)$ random variables has a Gamma distribution, set:

$$ X = -\frac{1}{\lambda}\sum_{i=1}^{n}\ln U_i \sim \text{Gamma}(n, \lambda) $$
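The sum-of-exponentials construction can be sketched in one line of Python (function name and the mean check are illustrative):

```python
import math
import random

def gamma_from_uniforms(n, lam, rng=random.random):
    """Gamma(n, lam) as a sum of n Exp(lam) variates: X = -(1/lam) * sum(ln U_i)."""
    return -sum(math.log(rng()) for _ in range(n)) / lam

random.seed(3)
sample = [gamma_from_uniforms(5, 2.0) for _ in range(50_000)]
mean = sum(sample) / len(sample)  # E[X] = n/lam = 2.5
```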
Chi-Squared Distribution

The sum of squares of $n$ standard normal r.v.s has a chi-squared distribution with $n$ degrees of freedom:

$$ \sum_{i=1}^{n} Z_i^2 \sim \chi^2_n $$

Alternatively, a chi-squared distribution with an even degree of freedom $2k$ is equivalent to a Gamma$(k, 1/2)$ distribution. If the degree of freedom is odd, i.e. $2k+1$, we can add on an extra $Z^2$ term, where $Z$ is standard normal. That is:

$$ -2\sum_{i=1}^{k}\ln U_i \sim \chi^2_{2k}, \qquad Z^2 - 2\sum_{i=1}^{k}\ln U_i \sim \chi^2_{2k+1} $$
Poisson Distribution

Recall that the number of events within one period, where the inter-event times follow an Exp$(\lambda)$ distribution, is Poi$(\lambda)$. To simulate random variables from a Poisson$(\lambda)$ distribution, the procedure is:

1. Generate Exp$(\lambda)$ inter-arrival times $-\frac{1}{\lambda}\ln U_i$ using the above steps
2. Since the waiting time until the $n$th event has a Gamma$(n,\lambda)$ distribution, which is a sum of independent Exp$(\lambda)$ random variables, set:

$$ X = \max\left\{ n : -\frac{1}{\lambda}\sum_{i=1}^{n}\ln U_i \le 1 \right\} \sim \text{Poi}(\lambda) $$

i.e. the total number of arrivals within one period.
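Counting arrivals within one period can be sketched as follows (function name and the mean check are illustrative):

```python
import math
import random

def poisson_from_exponentials(lam, rng=random.random):
    """Count Exp(lam) inter-arrival times until the total waiting time exceeds 1."""
    n, total = 0, 0.0
    while True:
        total += -math.log(rng()) / lam  # next inter-arrival time
        if total > 1.0:
            return n                     # arrivals that fit within one period
        n += 1

random.seed(4)
sample = [poisson_from_exponentials(3.0) for _ in range(100_000)]
mean = sum(sample) / len(sample)  # E[X] = lam = 3
```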
Normal Distribution

One method of simulating random variables from a normal distribution is the Box-Muller approach:

1. Generate two random variables $U_1$ and $U_2$ from Uniform(0,1).
2. Set:

$$ X = \sqrt{-2\ln U_1}\,\cos(2\pi U_2), \qquad Y = \sqrt{-2\ln U_1}\,\sin(2\pi U_2) $$

Then $X$ and $Y$ are a pair of independent standard normal random variables.

3. A pair of uncorrelated $N(\mu, \sigma^2)$ random variables can then be derived using $\mu + \sigma X$ and $\mu + \sigma Y$.
4. Note that if $X$ is normal, then to generate a lognormal random variable $L$ we set $L = e^X$.
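The Box-Muller transform above can be sketched directly (function name and the moment checks are illustrative):

```python
import math
import random

def box_muller(rng=random.random):
    """Return a pair of independent standard normals from two Uniform(0,1) variates."""
    u1, u2 = rng(), rng()
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2.0 * math.pi * u2), r * math.sin(2.0 * math.pi * u2)

random.seed(5)
pairs = [box_muller() for _ in range(50_000)]
xs = [x for x, _ in pairs]
mean_x = sum(xs) / len(xs)                          # ≈ 0
cov_xy = sum(x * y for x, y in pairs) / len(pairs)  # ≈ 0, since X and Y are independent
```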
Monte Carlo Simulation

Suppose that $X = (X_1, \dots, X_n)$ is a random vector with a given joint density function $f(x_1, \dots, x_n)$ and we want to compute:

$$ E[g(X)] = \int \cdots \int g(x_1, \dots, x_n)\, f(x_1, \dots, x_n)\, dx_1 \cdots dx_n $$

but with a function $g$ for which this is impossible to compute analytically. Then we can approximate this integral by using Monte Carlo simulation.

Let $X^i$ and $Y^i$ be the $i$th simulated sample of $X$ and $Y$. Then the MC simulation procedure is:

1. Generate a random vector $X^1 = (X_1^1, \dots, X_n^1)$ with joint density $f(x_1, \dots, x_n)$ and compute $Y^1 = g(X^1)$
2. Generate a second random vector $X^2 = (X_1^2, \dots, X_n^2)$, independent of step 1, with joint density $f(x_1, \dots, x_n)$ and compute $Y^2 = g(X^2)$
3. Repeat this process until $r$ (a fixed number) i.i.d. random vectors are generated:

$$ Y^i = g(X^i), \quad i = 1, 2, \dots, r $$

4. Estimate $E[g(X)]$ by using the arithmetic average of the generated $Y$'s:

$$ \bar{Y} = \frac{Y^1 + \dots + Y^r}{r} \approx E[g(X)] $$

This method works due to the strong law of large numbers:

$$ \lim_{r\to\infty} \frac{Y^1 + \dots + Y^r}{r} = E[Y^i] = E[g(X)] $$
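The procedure above can be sketched with a one-dimensional example where the answer is known in closed form, $E[e^U] = e - 1$ for $U \sim$ Uniform(0,1) (the helper name and the choice of test integrand are illustrative):

```python
import math
import random

def monte_carlo(g, sample_x, r):
    """Average r i.i.d. evaluations Y^i = g(X^i) to estimate E[g(X)]."""
    return sum(g(sample_x()) for _ in range(r)) / r

random.seed(6)
# Estimate E[e^U] for U ~ Uniform(0,1); exact value is e - 1 ≈ 1.71828
estimate = monte_carlo(math.exp, random.random, 200_000)
```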
Expectation and Variance

$\bar{Y}$ is an unbiased estimate of $E[g(X)]$:

$$ E[\bar{Y}] = \frac{1}{r}\sum_{i=1}^{r} E[Y^i] = E\left[ g(X_1, \dots, X_n) \right] $$

since each $Y^i$ is independent and identically distributed with the distribution of $g(X_1, \dots, X_n)$.

The variance of $\bar{Y}$ is given by:

$$ \operatorname{var}(\bar{Y}) = \operatorname{var}\left( \frac{1}{r}\sum_{i=1}^{r} Y^i \right) = \frac{1}{r^2}\sum_{i=1}^{r}\operatorname{var}(Y^i) = \frac{\operatorname{var}(Y^i)}{r} $$

Note that usually we do not know $\operatorname{var}(Y^i)$, so we estimate it using the sample estimate:

$$ \widehat{\operatorname{var}}(Y^i) = \frac{1}{r-1}\sum_{i=1}^{r}\left( Y^i - \bar{Y} \right)^2 $$

In the next 3 sections, we will describe some techniques to REDUCE this variance.
Antithetic Variables

Reducing the variance of the estimate using antithetic variables involves generating pairs of estimates with negative correlation, then averaging these estimates to obtain a final estimate.

Assume that $r$ is even. The antithetic variates procedure is:

1. Generate a set of $n$ variates $X_1^1, \dots, X_n^1$ and determine $Y^1 = g(X_1^1, \dots, X_n^1)$
2. Generate a set of $n$ variates $\tilde{X}_1^1, \dots, \tilde{X}_n^1$, which are negatively correlated with $X_1^1, \dots, X_n^1$, and determine $\tilde{Y}^1 = g(\tilde{X}_1^1, \dots, \tilde{X}_n^1)$
3. Repeat steps 1 and 2 $r/2$ times to form $Y^1, \dots, Y^{r/2}$ and $\tilde{Y}^1, \dots, \tilde{Y}^{r/2}$
4. Calculate the arithmetic averages of the $Y$'s:

$$ \bar{Y}^{(1)} = \frac{2}{r}\sum_{i=1}^{r/2} Y^i \quad \text{and} \quad \bar{Y}^{(2)} = \frac{2}{r}\sum_{i=1}^{r/2} \tilde{Y}^i $$

5. Use:

$$ \bar{Y}_{AV} = \frac{\bar{Y}^{(1)} + \bar{Y}^{(2)}}{2} $$

as the final estimate for $E[g(X)]$.

Using this method, as long as the correlation between $\bar{Y}^{(1)}$ and $\bar{Y}^{(2)}$ is negative, the variance will be reduced. One example of this is by using:

- $X \sim$ Uniform(0,1) for the first set of $n$ variates, e.g. $Y^1$
- $1 - X$, which is also Uniform(0,1), for the second set of $n$ variates, e.g. $\tilde{Y}^1$

Then as long as $g(X)$ is a monotonic increasing/decreasing function, we will have a negative correlation, since:

$$ \operatorname{cov}(X, 1-X) = -\operatorname{var}(X) < 0 $$

To show why this method reduces the variance, consider the variance of the estimator using the antithetic variable, writing $\rho = \operatorname{corr}(\bar{Y}^{(1)}, \bar{Y}^{(2)})$:

$$ \operatorname{var}(\bar{Y}_{AV}) = \operatorname{var}\left( \frac{\bar{Y}^{(1)} + \bar{Y}^{(2)}}{2} \right) = \frac{1}{4}\left[ \operatorname{var}(\bar{Y}^{(1)}) + \operatorname{var}(\bar{Y}^{(2)}) + 2\operatorname{cov}\left( \bar{Y}^{(1)}, \bar{Y}^{(2)} \right) \right] $$
$$ = \frac{\operatorname{var}(\bar{Y}^{(1)})}{2}(1 + \rho) \quad \text{since } \operatorname{var}(\bar{Y}^{(1)}) = \operatorname{var}(\bar{Y}^{(2)}) $$
$$ = \frac{\operatorname{var}(Y^i)}{r}(1 + \rho) \quad \text{using } \operatorname{var}(\bar{Y}^{(1)}) = \frac{\operatorname{var}(Y^i)}{r/2}, \text{ i.e. only the first set up to } r/2 $$
$$ < \frac{\operatorname{var}(Y^i)}{r} = \operatorname{var}(\bar{Y}) \quad \text{for } \rho < 0 $$
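The $U$ and $1-U$ pairing can be sketched with the same test integrand as before, $g(u) = e^u$, which is monotone increasing, so the antithetic pair is negatively correlated (function name and integrand are illustrative):

```python
import math
import random

def antithetic_estimate(g, r):
    """Average g(U) and g(1-U) over r/2 antithetic pairs; g monotone => variance reduction."""
    total = 0.0
    for _ in range(r // 2):
        u = random.random()
        total += g(u) + g(1.0 - u)  # one negatively correlated pair
    return total / r

random.seed(7)
estimate = antithetic_estimate(math.exp, 100_000)  # E[e^U] = e - 1
```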
Control Variates

If we wish to evaluate the expected value $E[g(X)]$ and there is a function $f$ such that the expected value of $f(X)$ can be evaluated analytically, with:

$$ E[f(X)] = \mu $$

then we can evaluate $E[g(X)]$ using:

$$ E[g(X)] = E[g(X)] + a\left( E[f(X)] - \mu \right) = E\left[ g(X) + a\left( f(X) - \mu \right) \right] \tag{1} $$

where $a$ is a parameter we can choose.

Therefore, instead of evaluating $E[g(X)]$ directly, we evaluate (1) using the control variate estimator:

$$ \bar{Y}_{cv} = \frac{1}{r}\sum_{i=1}^{r}\left[ g(X^i) + a\left( f(X^i) - \mu \right) \right] $$

Then this estimator will have the variance:

$$ \operatorname{var}(\bar{Y}_{cv}) = \frac{1}{r}\operatorname{var}\left[ g(X^i) + a f(X^i) \right] = \frac{1}{r}\left[ \operatorname{var} g(X^i) + a^2 \operatorname{var} f(X^i) + 2a\operatorname{Cov}\left( g(X^i), f(X^i) \right) \right] $$

This variance can be minimised by solving:

$$ \frac{\partial \operatorname{var}(\bar{Y}_{cv})}{\partial a} = 0 $$

with the solution being the best choice of $a$:

$$ a^* = -\frac{\operatorname{Cov}\left( g(X^i), f(X^i) \right)}{\operatorname{var} f(X^i)} $$

Substituting this value back, we have that the minimised variance is:

$$ \operatorname{var}(\bar{Y}_{cv}) = \frac{1}{r}\left[ \operatorname{var} g(X^i) - \frac{\operatorname{Cov}^2\left( g(X^i), f(X^i) \right)}{\operatorname{var} f(X^i)} \right] \le \frac{\operatorname{var} g(X^i)}{r} $$

Therefore, using a control variate has decreased the variance compared to the original estimator.
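A minimal sketch, again estimating $E[e^U]$, using the control $f(U) = U$ with known mean $\mu = 1/2$. The coefficient $a \approx -1.69$ is my computation of the optimum $a^* = -\operatorname{Cov}(e^U, U)/\operatorname{var}(U)$, not a value from the notes:

```python
import math
import random

def control_variate_estimate(r, a=-1.69):
    """Estimate E[e^U] using f(U)=U as a control variate with E[f(U)] = 0.5."""
    total = 0.0
    for _ in range(r):
        u = random.random()
        total += math.exp(u) + a * (u - 0.5)  # g(X) + a*(f(X) - mu)
    return total / r

random.seed(8)
estimate = control_variate_estimate(100_000)  # E[e^U] = e - 1
```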
Importance Sampling

We can write:

$$ E_X[g(X)] = \int g(x)\,f(x)\,dx $$

where $f(x)$ is the density of $X$.

Now consider another probability density $h(x)$, equivalent to $f(x)$ in the sense that the zero-probability events agree under both densities, i.e. $f(x) = 0$ if and only if $h(x) = 0$. Let $Y$ be the random variable with density function $h(x)$.

Then we can write:

$$ E_X[g(X)] = \int g(x)\,\frac{f(x)}{h(x)}\,h(x)\,dx = E_Y\left[ g(Y)\,\frac{f(Y)}{h(Y)} \right] $$

Then, we can simulate $Y^i$ from density $h(x)$ and estimate the expectation $E_X[g(X)]$ with:

$$ \bar{Y}_h = \frac{1}{r}\sum_{i=1}^{r} g(Y^i)\,\frac{f(Y^i)}{h(Y^i)} $$

If it is possible to select a probability density $h(x)$ so that the random variable

$$ g(X)\,\frac{f(X)}{h(X)} $$

has a smaller variance, then the estimator will be more efficient, i.e.:

$$ \operatorname{var}\left[ g(X)\,\frac{f(X)}{h(X)} \right] < \operatorname{var}\left[ g(X) \right] $$

To do this, we need to select a density $h(x)$ such that the ratio of the two densities, i.e. $f(x)/h(x)$, is large when $g(x)$ is small and vice versa.
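A minimal sketch under my own choice of example, not one from the notes: estimate $E[U^4] = \int_0^1 x^4\,dx = 0.2$ by sampling from $h(y) = 2y$ (via $Y = \sqrt{U}$) and weighting by $f(y)/h(y) = 1/(2y)$. Here $f/h$ is large where $g(y) = y^4$ is small, matching the guidance above:

```python
import math
import random

def importance_sampling_estimate(r):
    """Estimate E[U^4] = 0.2 by sampling Y from h(y)=2y and weighting by f(y)/h(y)=1/(2y)."""
    total = 0.0
    for _ in range(r):
        y = math.sqrt(random.random())         # inverse transform for h(y) = 2y on (0,1)
        total += (y ** 4) * (1.0 / (2.0 * y))  # g(Y) * f(Y)/h(Y)
    return total / r

random.seed(9)
estimate = importance_sampling_estimate(200_000)  # exact value 0.2
```

For this integrand the weighted variate $y^3/2$ has a smaller variance than plain $U^4$, so the importance-sampling estimator is more efficient.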
Number of Simulations

Ideally, we want to carry out simulations as efficiently as possible, since the number of simulations required for accuracy can be quite large. Assume we generate $n$ samples from a known distribution. To estimate its mean we will use the sample average as an estimator:

$$ \bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i $$

This is an unbiased estimator, i.e.:

$$ E[\bar{Y}] = \mu $$

Its variance is given by:

$$ \operatorname{var}(\bar{Y}) = \frac{\operatorname{var}(Y_i)}{n} = \frac{\sigma^2}{n} $$
However, the value of $\operatorname{var}(Y_i)$ is usually not known, so we estimate it using the sample variance from the first $k$ runs, where $k < n$ and usually $k \ge 30$. Then the estimates of the mean and variance become:

$$ \bar{Y} = \frac{1}{k}\sum_{i=1}^{k} Y_i, \qquad \widehat{\operatorname{var}}(Y_i) = \frac{1}{k-1}\sum_{i=1}^{k}\left( Y_i - \bar{Y} \right)^2 $$

For large values of $n$, we know that by the Central Limit Theorem this estimator will be approximately normal:

$$ \bar{Y} \sim N\left( \mu, \frac{\sigma^2}{n} \right) $$

Then, we can select $n$ such that the estimate is within a desired accuracy of the true mean, i.e. a percentage of the true mean with a specified probability.

E.g. random variates are generated from a Gamma distribution:

$$ f(x) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\lambda x}, \quad x > 0 $$

Twenty values are generated for each sample, and the mean and standard deviation of each sample are given as:
Sample 1 2 3 4 5 6 7 8 9 10
Mean 12.01 11.79 13.43 14.01 11.44 11.19 11.24 12.42 12.91 12.29
SD 6.15 4.73 6.42 7.02 4.30 3.90 3.84 4.35 3.59 4.60
Determine the number of simulated values required for the estimate of the mean to be within 5% of the true mean with 95% certainty.

We require $n$ such that:

$$ \Pr\left( \left| \bar{X} - \mu \right| < 0.05\mu \right) \ge 0.95 $$
$$ \Pr\left( -\frac{0.05\mu}{\sigma/\sqrt{n}} < Z < \frac{0.05\mu}{\sigma/\sqrt{n}} \right) \ge 0.95 $$
$$ 2\Pr\left( Z < \frac{0.05\mu\sqrt{n}}{\sigma} \right) - 1 \ge 0.95 $$
$$ \frac{0.05\mu\sqrt{n}}{\sigma} \ge 1.96 $$

We do not know the true values of $\mu$ and $\sigma$, so we must use the sample estimates. The sample estimate for the mean is given by averaging the mean of each sample:

$$ \bar{X} = \frac{1}{10 \times 20}\sum_{i=1}^{10}\sum_{j=1}^{20} X_{ij} = \frac{1}{10}\sum_{i=1}^{10}\bar{X}_i = 12.273 $$
To estimate the sample variance, use:

$$ s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left( X_i - \bar{X} \right)^2 = \frac{1}{n-1}\left[ \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 \right] $$

So the estimated variance is given by:

$$ s^2 = \frac{1}{10 \times 20 - 1}\left[ \sum_{i=1}^{10}\sum_{j=1}^{20} x_{ij}^2 - (10 \times 20)\,\bar{X}^2 \right] $$

where we have:

$$ \sum_{i=1}^{10}\sum_{j=1}^{20} x_{ij}^2 = \sum_{i=1}^{10}\left[ (20-1)\,s_i^2 + 20\,\bar{X}_i^2 \right] = 19\sum_{i=1}^{10} s_i^2 + 20\sum_{i=1}^{10}\bar{X}_i^2 = 4790.0596 + 30285.702 = 35075.7616 $$

Therefore the estimated variance and standard deviation are:

$$ s^2 = \frac{1}{199}\left[ 35075.7616 - 200 \times 12.273^2 \right] = 24.87666, \qquad s = 4.98765 $$

Substituting back into the previous equation, we have:

$$ \frac{0.05\mu\sqrt{n}}{\sigma} \ge 1.96 \quad \Rightarrow \quad \sqrt{n} \ge \frac{1.96 \times 4.98765}{0.05 \times 12.273} \quad \Rightarrow \quad n \ge 254 $$

We can also estimate the parameters of the Gamma distribution, since we know that:

$$ E[X] = \frac{\alpha}{\lambda}, \qquad \operatorname{var}(X) = \frac{\alpha}{\lambda^2} $$

Using our estimated values, $\alpha/\lambda = 12.273$ and $\alpha/\lambda^2 = 24.87666$, so $\hat{\lambda} = 12.273/24.87666 \approx 0.4934$ and $\hat{\alpha} = 12.273 \times \hat{\lambda} \approx 6.055$.
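The whole worked example can be reproduced from the table of sample means and SDs; this sketch follows the same pooled-variance and method-of-moments steps as above:

```python
import math

# Sample means and SDs from the table above (10 samples of 20 values each)
means = [12.01, 11.79, 13.43, 14.01, 11.44, 11.19, 11.24, 12.42, 12.91, 12.29]
sds = [6.15, 4.73, 6.42, 7.02, 4.30, 3.90, 3.84, 4.35, 3.59, 4.60]
m = 20  # values per sample

xbar = sum(means) / len(means)
sum_sq = sum((m - 1) * s**2 + m * x**2 for s, x in zip(sds, means))  # sum of x_ij^2
s2 = (sum_sq - len(means) * m * xbar**2) / (len(means) * m - 1)      # pooled variance

# Smallest n with 0.05 * mu * sqrt(n) / sigma >= 1.96
n = math.ceil((1.96 * math.sqrt(s2) / (0.05 * xbar)) ** 2)

# Method-of-moments Gamma parameters: E[X] = alpha/lam, var(X) = alpha/lam^2
lam_hat = xbar / s2
alpha_hat = xbar * lam_hat
```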