Examples

For an independent white noise $(\varepsilon_t)_{t\in\mathbb{Z}}$, we consider a stochastic process $X$ satisfying for all $t \in \mathbb{Z}$ the recursion $X_t = \phi X_{t-1} + \varepsilon_t$ for some $\phi \in \mathbb{R}$. Then we have in case of $|\phi| < 1$:

1. $X_t = \varepsilon_t + \phi\varepsilon_{t-1} + \phi^2\varepsilon_{t-2} + \ldots$,
2. $\sigma(X_{t-1}, X_{t-2}, \ldots) = \sigma(\varepsilon_{t-1}, \varepsilon_{t-2}, \ldots)$,
3. $E(X_t \mid X_{t-1}, X_{t-2}, \ldots) = E(X_t \mid X_{t-1}) = \phi X_{t-1}$,
4. $E(X_t - E(X_t \mid X_{t-1}, X_{t-2}, \ldots))^2 = E(X_t - E(X_t \mid X_{t-1}))^2 = \sigma_\varepsilon^2$.

For $|\phi| > 1$, we have:

1. $X_t = -\frac{\varepsilon_{t+1}}{\phi} - \frac{\varepsilon_{t+2}}{\phi^2} - \ldots$,
2. $\sigma(X_{t-1}, X_{t-2}, \ldots) = \sigma(X_{t-1}, \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots)$,
3. $E(X_t \mid X_{t-1}, X_{t-2}, \ldots) = E(X_t \mid X_{t-1})$,
4. if $\varepsilon$ is Gaussian white noise, we have $E(X_t \mid X_{t-1}) = \frac{1}{\phi} X_{t-1}$.
Time Series Analysis (SS 2019) Lecture 3 Slide 1
Linear forecasting of stationary time series
Starting point: a weakly stationary time series $X$ with known mean $\mu := E(X_t)$ and autocovariance function $\gamma_X$; we are looking for an 'optimal' linear combination
$$\hat{X}_{n+h} := a_0 + a_1 X_n + \cdots + a_n X_1$$
to forecast $X_{n+h}$ ($h \in \mathbb{N}$) when given $X_1, \ldots, X_n$, where 'optimal' stands for minimizing the mean squared forecast error
$$E(X_{n+h} - \hat{X}_{n+h})^2.$$
More general problem: for a random vector $W = (W_n, \ldots, W_1)'$ with covariance matrix $\Gamma$ and a random variable $Y$ with finite variance, we want to find a linear combination
$$\hat{Y} := a_0 + a_1 W_n + \cdots + a_n W_1$$
with minimal mean squared error $E(Y - \hat{Y})^2$.
Theorem
In the above situation, we have:

1. $\hat{Y} = E(Y) + a'(W - E(W))$, with $a$ any solution of $\Gamma a = \mathrm{Cov}(W, Y)$ (such an $a$ always exists),
2. $E(Y - \hat{Y})^2 = \mathrm{Var}(Y - \hat{Y}) = \mathrm{Var}(Y) - a'\,\mathrm{Cov}(W, Y) = \mathrm{Var}(Y) - a'\Gamma a$,
3. $\mathrm{Cov}(W, Y - \hat{Y}) = 0$.
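The theorem translates directly into a few lines of linear algebra. A minimal NumPy sketch: the covariance matrix and the dependence $Y = c'W + e$ (with $e$ uncorrelated with $W$) are illustrative choices, not taken from the lecture.

```python
import numpy as np

# Illustrative covariance matrix of W = (W_n, ..., W_1)' and a "true" linear
# dependence Y = c'W + e with e uncorrelated with W, Var(e) = s2.
Gamma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.5, 0.4],
                  [0.3, 0.4, 1.0]])
c = np.array([0.7, -0.2, 0.5])
s2 = 0.25

cov_WY = Gamma @ c                 # Cov(W, Y)
var_Y = c @ Gamma @ c + s2         # Var(Y)

# Part 1 of the theorem: a solves Gamma a = Cov(W, Y)
a = np.linalg.solve(Gamma, cov_WY)

# Part 2: mean squared error Var(Y) - a' Cov(W, Y); here it equals Var(e)
mse = var_Y - a @ cov_WY

# Part 3: Cov(W, Y - Yhat) = Cov(W, Y) - Gamma a should vanish
orth = cov_WY - Gamma @ a
```

With this construction the solution recovers $a = c$ and the residual error is exactly $\mathrm{Var}(e)$, and the orthogonality in part 3 holds by construction of $a$.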
Definition (Linear prediction)

Given a random vector $W = (W_n, \ldots, W_1)'$ with covariance matrix $\Gamma$, a random variable $Y$ with finite variance, and $a$ any solution of $\Gamma a = \mathrm{Cov}(W, Y)$, we call $\hat{Y} := E(Y) + a'(W - E(W))$ the linear prediction of $Y$ given $W$, in symbols: $P(Y|W)$.
Theorem
The linear prediction has the following properties:

1. $E(Y - P(Y|W)) = 0$,
2. $P(\alpha_1 Y_1 + \alpha_2 Y_2 \mid W) = \alpha_1 P(Y_1|W) + \alpha_2 P(Y_2|W)$ for all $\alpha_1, \alpha_2 \in \mathbb{R}$,
3. $P\left(\sum_{i=1}^n \alpha_i W_i + \beta \,\middle|\, W\right) = \sum_{i=1}^n \alpha_i W_i + \beta$ for all $\alpha_1, \ldots, \alpha_n, \beta \in \mathbb{R}$,
4. $P(Y|W) = E(Y)$ when $\mathrm{Cov}(W, Y) = 0$.
Remarks

Obviously, $P(Y|W)$ is $\sigma(W_1, \ldots, W_n)$-measurable, which implies

1. $E(Y - E(Y|W_1, \ldots, W_n))^2 \le E(Y - P(Y|W))^2$,
2. if $E(Y|W_1, \ldots, W_n)$ is linear in $W_1, \ldots, W_n$, we have $P(Y|W) = E(Y|W_1, \ldots, W_n)$.

If $W$ is a (univariate) random variable with positive variance, we have
$$P(Y|W) = \beta_0 + \beta_1 W \quad\text{and}\quad E(Y - P(Y|W))^2 = \mathrm{Var}(Y) - \frac{\mathrm{Cov}(W, Y)^2}{\mathrm{Var}(W)}$$
with
$$\beta_1 = \frac{\mathrm{Cov}(W, Y)}{\mathrm{Var}(W)} \quad\text{and}\quad \beta_0 = E(Y) - \beta_1 E(W).$$
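The univariate case needs no matrix inversion at all; a tiny sketch with freely chosen (hypothetical) moments:

```python
# Freely chosen moments of (W, Y) for illustration only.
EW, EY = 1.0, 2.0
var_W, var_Y, cov_WY = 4.0, 3.0, 1.5

beta1 = cov_WY / var_W               # slope Cov(W, Y) / Var(W)
beta0 = EY - beta1 * EW              # intercept E(Y) - beta1 * E(W)
mse = var_Y - cov_WY**2 / var_W      # prediction error of P(Y|W)
```

These are exactly the simple-linear-regression coefficients, which is why $P(Y|W)$ is often called the (population) regression of $Y$ on $W$.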
Examples
For a white noise $(\varepsilon_t)_{t\in\mathbb{Z}}$, we investigate the stationary process $X$ with $X_t = \varepsilon_t + \theta\varepsilon_{t-1}$ for $|\theta| < 1$.

1. $P(X_t|X_{t-1}) = \frac{\theta}{1+\theta^2}\, X_{t-1}$,
2. $E(X_t - P(X_t|X_{t-1}))^2 = \sigma_\varepsilon^2 \left(1 + \frac{\theta^4}{1+\theta^2}\right)$,
3. $P(X_t|X_{t-1}, X_{t-2}) = \frac{\theta(1+\theta^2)}{1+\theta^2+\theta^4}\, X_{t-1} - \frac{\theta^2}{1+\theta^2+\theta^4}\, X_{t-2}$,
4. $E(X_t - P(X_t|X_{t-1}, X_{t-2}))^2 = \sigma_\varepsilon^2 \left(1 + \frac{\theta^6}{1+\theta^2+\theta^4}\right)$,
5. $P(X_{t+h}|X_t, X_{t-1}) = 0$ for all $h \ge 2$.

For a random walk $(X_t)_{t\in\mathbb{N}_0}$ with drift $\alpha_0$ and initial value $x \in \mathbb{R}$, driven by an independent white noise $\varepsilon$, we have for $t \ge 1$:

1. $P(X_t|X_{t-1}) = X_{t-1} + \alpha_0$,
2. $E(X_t - P(X_t|X_{t-1}))^2 = \sigma_\varepsilon^2$.
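The MA(1) formulas above can be checked with exact rational arithmetic. A sketch using Python's `fractions` module with the illustrative choice $\theta = 1/2$, $\sigma_\varepsilon^2 = 1$, solving the small prediction systems by hand:

```python
from fractions import Fraction as F

theta = F(1, 2)          # illustrative MA(1) parameter; sigma_eps^2 = 1
g0 = 1 + theta**2        # gamma_X(0)
g1 = theta               # gamma_X(1); gamma_X(h) = 0 for h >= 2

# One lag: solve gamma(0) * a = gamma(1)
a1 = g1 / g0
mse1 = g0 - g1**2 / g0

a1_formula = theta / (1 + theta**2)
mse1_formula = 1 + theta**4 / (1 + theta**2)

# Two lags: solve [[g0, g1], [g1, g0]] a = (g1, 0) via the 2x2 inverse
det = g0**2 - g1**2
b1 = g0 * g1 / det       # coefficient of X_{t-1}
b2 = -g1**2 / det        # coefficient of X_{t-2}
mse2 = g0 - b1 * g1      # gamma(0) - a' gamma_n(1); gamma(2) = 0 drops out

b1_formula = theta * (1 + theta**2) / (1 + theta**2 + theta**4)
b2_formula = -theta**2 / (1 + theta**2 + theta**4)
mse2_formula = 1 + theta**6 / (1 + theta**2 + theta**4)
```

For $\theta = 1/2$ this gives the exact values $a_1 = 2/5$, $(b_1, b_2) = (10/21, -4/21)$, with mean squared errors $21/20$ and $85/84$.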
Linear forecasting of time series
We apply the linear prediction operator $P$ to calculate the linear forecast of $X_{n+h}$ given $X_1, \ldots, X_n$, when the mean $\mu := E(X_t)$ and the autocovariance function $\gamma_X$ are known:
$$P(X_{n+h}|X_n, \ldots, X_1) = \mu + \sum_{i=1}^n a_i (X_{n+1-i} - \mu),$$
with $a = (a_1, \ldots, a_n)'$ any solution of $\Gamma_n a = \gamma_n(h)$, where
$$\Gamma_n = [\gamma_X(|i-j|)]_{i,j=1}^n = \begin{pmatrix} \gamma_X(0) & \gamma_X(1) & \ldots & \gamma_X(n-1) \\ \gamma_X(1) & \gamma_X(0) & \ldots & \gamma_X(n-2) \\ \vdots & \ddots & \ddots & \vdots \\ \gamma_X(n-1) & \gamma_X(n-2) & \ldots & \gamma_X(0) \end{pmatrix}$$
and $\gamma_n(h) = (\gamma_X(h), \gamma_X(h+1), \ldots, \gamma_X(h+n-1))'$.
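As a numerical sketch of this system (parameter values illustrative): for a causal AR(1) with $\gamma_X(h) = \sigma^2 \phi^{|h|}/(1-\phi^2)$, the solution of $\Gamma_n a = \gamma_n(1)$ should be $a = (\phi, 0, \ldots, 0)'$, i.e. the one-step forecast is $\phi X_n$.

```python
import numpy as np

phi, sig2, n, h = 0.6, 1.0, 5, 1    # illustrative AR(1) parameters

def gamma(k):
    """Autocovariance function of a causal AR(1) process."""
    return sig2 * phi ** abs(k) / (1 - phi ** 2)

idx = np.arange(n)
Gamma_n = np.array([[gamma(i - j) for j in idx] for i in idx])  # Toeplitz
gamma_nh = np.array([gamma(h + i) for i in idx])                # right-hand side

a = np.linalg.solve(Gamma_n, gamma_nh)   # forecast weights a_1, ..., a_n
```

Only the most recent observation gets a nonzero weight, matching the Markov structure of the AR(1) recursion.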
Remarks I
The equations comprising the system $\Gamma_n a = \gamma_n(1)$ are called Yule-Walker equations.

Dividing the Yule-Walker equations by $\gamma_X(0)$, we arrive at a system of equations making use only of the autocorrelations $\rho_X(h)$ with $h = 1, \ldots, n$: $R_n a = \rho_n$, with
$$R_n = \begin{pmatrix} 1 & \rho_X(1) & \ldots & \ldots & \rho_X(n-1) \\ \rho_X(1) & 1 & \rho_X(1) & \ldots & \rho_X(n-2) \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ \rho_X(n-1) & \rho_X(n-2) & \ldots & \ldots & 1 \end{pmatrix}$$
and $\rho_n = (\rho_X(1), \rho_X(2), \ldots, \rho_X(n))'$.
Remarks II
$\gamma_X(0) > 0$ and asymptotic uncorrelatedness are sufficient conditions for $\Gamma_n$ to be non-singular for all $n \in \mathbb{N}$, and therefore also for the existence of a unique solution to the Yule-Walker equations.

In the following, for $x = (x_1, \ldots, x_n)'$, $x^{(r)}$ denotes the reversed vector $x^{(r)} := (x_n, \ldots, x_1)'$.

Due to the special structure of $R_n$ ($R_n$ is a symmetric Toeplitz matrix), we have the following lemma:
$$R_n x^{(r)} = (R_n x)^{(r)} \quad\text{for all } x \in \mathbb{R}^n.$$
Levinson-Durbin recursion
Theorem (Levinson-Durbin recursion)

For a weakly stationary process $X$ with acf $\rho_X$, we denote $v_n := E(X_{n+1} - P(X_{n+1}|X_1, \ldots, X_n))^2$ and $\tilde{v}_n := \frac{v_n}{\gamma_X(0)}$. We then have, if $v_n > 0$:

1. If $a_n$ solves the equation $R_n a_n = \rho_n$, then $a_{n+1} := (a_{n+1,1}, \ldots, a_{n+1,n+1})'$ with
$$a_{n+1,n+1} := \frac{\gamma_X(n+1) - a_n' \gamma_n(1)^{(r)}}{v_n} = \frac{\rho_X(n+1) - a_n' \rho_n^{(r)}}{\tilde{v}_n}$$
and $(a_{n+1,1}, \ldots, a_{n+1,n})' = a_n - a_{n+1,n+1}\, a_n^{(r)}$ solves the equation $R_{n+1} a_{n+1} = \rho_{n+1}$.

2. $v_{n+1} = v_n(1 - a_{n+1,n+1}^2)$ and $\tilde{v}_{n+1} = \tilde{v}_n(1 - a_{n+1,n+1}^2)$.
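The recursion lends itself to a direct implementation. Below is a minimal NumPy sketch (the function name `levinson_durbin` is mine); it is checked against the MA(1) example with $\theta = 1/2$ that appears later in this lecture, where $\rho_X(1) = 2/5$ and $\rho_X(h) = 0$ for $h \ge 2$.

```python
import numpy as np

def levinson_durbin(rho, n):
    """Solve R_k a_k = rho_k recursively for k = 1, ..., n.

    rho is the array (rho_X(1), rho_X(2), ...); returns the coefficient
    vectors a_k and the normalized mses (the v-tilde_k of the theorem)."""
    a = np.array([rho[0]])          # a_1 = rho(1) solves the 1x1 system
    v = 1.0 - rho[0] ** 2           # v~_1 = 1 - rho(1)^2
    coeffs, mses = [a], [v]
    for k in range(1, n):
        # reflection coefficient a_{k+1,k+1} = (rho(k+1) - a_k' rho_k^(r)) / v~_k
        kappa = (rho[k] - a @ rho[:k][::-1]) / v
        # (a_{k+1,1}, ..., a_{k+1,k})' = a_k - kappa * a_k^(r), then append kappa
        a = np.concatenate((a - kappa * a[::-1], [kappa]))
        v *= 1.0 - kappa ** 2       # v~_{k+1} = v~_k (1 - kappa^2)
        coeffs.append(a)
        mses.append(v)
    return coeffs, mses

# MA(1) with theta = 1/2 (sigma_eps^2 = 1): rho(1) = 2/5, rho(h) = 0 otherwise
rho = np.array([0.4, 0.0, 0.0])
coeffs, mses = levinson_durbin(rho, 3)
```

For this input, `coeffs[2]` reproduces the weights $(42/85, -4/17, 8/85)$ of the example below, and $\gamma_X(0)\,\tilde{v}_3 = 1.25 \cdot \tilde{v}_3$ reproduces the mse $341/340$; the result also agrees with directly solving $R_3 a = \rho_3$.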
Remarks
1. The condition $v_n > 0$ appearing in the preceding theorem is equivalent to $\Gamma_{n+1}$ and $R_{n+1}$ being non-singular.
2. Therefore, under the condition stated in the theorem, $a_{n+1}$ is the unique solution to the Yule-Walker equations used to compute $P(X_{n+2}|X_1, \ldots, X_{n+1})$.
Examples
1. For the weakly stationary process $X_t = \varepsilon_t + \frac{1}{2}\varepsilon_{t-1}$ with a white noise $(\varepsilon_t)_{t\in\mathbb{Z}}$, we have:
   a) $P(X_4|X_3, X_2, X_1) = \frac{42}{85} X_3 - \frac{4}{17} X_2 + \frac{8}{85} X_1$,
   b) $E(X_4 - P(X_4|X_3, X_2, X_1))^2 = \frac{341}{340}\,\sigma_\varepsilon^2$.
2. For the weakly stationary process $X_t = A\cos(\theta t) + B\sin(\theta t)$, with uncorrelated random variables $A$ and $B$ with vanishing mean and unit variance, we have:
   a) $P(X_2|X_1) = \cos(\theta)\, X_1$,
   b) $E(X_2 - P(X_2|X_1))^2 = \sin^2(\theta)$,
   c) $P(X_3|X_2, X_1) = 2\cos(\theta)\, X_2 - X_1$,
   d) $E(X_3 - P(X_3|X_2, X_1))^2 = 0$.

3. For a white noise $\varepsilon$ and a random variable $Y$ with mean $\mu_Y$ and positive variance $\sigma_Y^2$, uncorrelated with all $\varepsilon_t$, we define a weakly stationary process $X$ by $X_t := Y + \varepsilon_t$. Then we have for all $n \in \mathbb{N}$:
   a) $P(X_{n+1}|X_n, \ldots, X_1) = \frac{\sigma_\varepsilon^2}{n\sigma_Y^2 + \sigma_\varepsilon^2}\,\mu_Y + \frac{\sigma_Y^2}{\sigma_Y^2 + \frac{\sigma_\varepsilon^2}{n}} \cdot \frac{X_n + \ldots + X_1}{n}$,
   b) $E(X_{n+1} - P(X_{n+1}|X_n, \ldots, X_1))^2 = \sigma_\varepsilon^2\left(1 + \frac{\sigma_Y^2}{\sigma_\varepsilon^2 + n\sigma_Y^2}\right)$.
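Example 3 can be verified numerically: for $X_t = Y + \varepsilon_t$ the covariance matrix is $\Gamma_n = \sigma_Y^2 J + \sigma_\varepsilon^2 I$ (with $J$ the all-ones matrix), so every past observation gets the same weight. A NumPy sketch with illustrative variances:

```python
import numpy as np

sY2, se2, n = 2.0, 0.5, 4     # illustrative sigma_Y^2, sigma_eps^2, sample size

# acvf of X_t = Y + eps_t: gamma(0) = sY2 + se2, gamma(h) = sY2 for h >= 1
Gamma_n = sY2 * np.ones((n, n)) + se2 * np.eye(n)
gamma_n1 = sY2 * np.ones(n)

a = np.linalg.solve(Gamma_n, gamma_n1)

# Closed form from the slide: a common weight for all past values, and the mse
weight = sY2 / (n * sY2 + se2)
mse = (sY2 + se2) - a @ gamma_n1
mse_formula = se2 * (1 + sY2 / (se2 + n * sY2))
```

As $n \to \infty$ the weight on $\mu_Y$ vanishes and the forecast approaches the sample mean of the past, with mse decreasing to $\sigma_\varepsilon^2$.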
Partial correlation
For three random variables $Y_1$, $Y_2$ and $Y_3$, it might happen that $Y_3$ is highly correlated with both $Y_1$ and $Y_2$. In that case $Y_1$ and $Y_2$ may also be highly correlated, but this correlation stems mostly from the impact of $Y_3$ on $Y_1$ and $Y_2$. Partial correlation is used to measure the correlation that remains once this impact is removed.
Definition (Partial correlation)
We take as given two univariate random variables $Y_1$, $Y_2$ and a (possibly multivariate) random variable $Y_3$. We then call the correlation of $Y_1 - P(Y_1|Y_3)$ and $Y_2 - P(Y_2|Y_3)$ the partial correlation of $Y_1$ and $Y_2$ given $Y_3$, in symbols: $\mathrm{Corr}(Y_1, Y_2|Y_3) := \mathrm{Corr}(Y_1 - P(Y_1|Y_3),\, Y_2 - P(Y_2|Y_3))$.
Examples
For a white noise $(\varepsilon_t)_{t\in\mathbb{Z}}$, we consider the stochastic process $X$ with $X_t = \phi X_{t-1} + \varepsilon_t$ for all $t \in \mathbb{Z}$ and some $\phi \in \mathbb{R}$ with $|\phi| < 1$. We then have
$$\mathrm{Corr}(X_0, X_2|X_1) = 0.$$

For the weakly stationary process $X_t = \varepsilon_t + \theta\varepsilon_{t-1}$ with an independent white noise $(\varepsilon_t)_{t\in\mathbb{Z}}$ and $|\theta| < 1$, we have
$$\mathrm{Corr}(X_0, X_2|X_1) = -\frac{\theta^2}{1 + \theta^2 + \theta^4}.$$
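The MA(1) value can be checked from the definition with exact arithmetic: with zero means, $P(X_0|X_1) = \beta X_1$ and $P(X_2|X_1) = \beta X_1$ for $\beta = \gamma_X(1)/\gamma_X(0)$, and the partial correlation is the correlation of the two residuals. A sketch with the illustrative choice $\theta = 1/2$, $\sigma_\varepsilon^2 = 1$:

```python
from fractions import Fraction as F

theta = F(1, 2)             # illustrative MA(1) parameter; sigma_eps^2 = 1
g0 = 1 + theta**2           # gamma(0)
g1 = theta                  # gamma(1); gamma(2) = 0

beta = g1 / g0              # P(X0|X1) = beta*X1 and P(X2|X1) = beta*X1

# Cov(X0 - beta X1, X2 - beta X1) = gamma(2) - 2 beta gamma(1) + beta^2 gamma(0)
cov_res = 0 - 2 * beta * g1 + beta**2 * g0
# Both residuals share the variance gamma(0) - gamma(1)^2 / gamma(0)
var_res = g0 - g1**2 / g0
pcorr = cov_res / var_res

pcorr_formula = -theta**2 / (1 + theta**2 + theta**4)
```

For $\theta = 1/2$ both expressions give exactly $-4/21$.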
Partial autocorrelation function (PACF)
Definition (Partial autocorrelation function, PACF)
For a weakly stationary process $(X_t)_{t\in\mathbb{Z}}$ and $h \in \mathbb{N} \setminus \{1\}$, the partial correlation of $X_0$ and $X_h$ given $X_1, \ldots, X_{h-1}$ is called the partial autocorrelation $\alpha_X(h)$ of lag $h$. For $h = 1$, one defines $\alpha_X(1) := \rho_X(1)$. The function $\alpha_X : \mathbb{N} \to \mathbb{R},\ h \mapsto \alpha_X(h)$ is called the partial autocorrelation function (PACF) of $X$.
Theorem
For a weakly stationary process $X$ with PACF $\alpha_X$,
$$\alpha_X(n+1) = a_{n+1,n+1},$$
with $a_{n+1,n+1}$ denoting the coefficients obtained from the Levinson-Durbin recursion.
Definition (h-step forecast, forecast error)

1. For a weakly stationary process $X = (X_t)_{t\in\mathbb{Z}}$, we call
   a) $P(X_{t+h}|X_t, X_{t-1}, \ldots, X_{t+1-n})$ for $h \in \mathbb{N}$ and $n \in \mathbb{N}$ the h-step forecast of order $n$,
   b) $\Delta_n(h) := E(X_{t+h} - P(X_{t+h}|X_t, X_{t-1}, \ldots, X_{t+1-n}))^2$ the mean squared (forecast) error (mse) of the h-step forecast of order $n$,
   c) $P(X_{t+h}|X_t, X_{t-1}, \ldots) := \lim_{n\to\infty} P(X_{t+h}|X_t, X_{t-1}, \ldots, X_{t+1-n})$ the h-step forecast (based on an infinite past),
   d) $\Delta(h) := \lim_{n\to\infty} \Delta_n(h) = E(X_{t+h} - P(X_{t+h}|X_t, X_{t-1}, \ldots))^2$ the mse of the h-step forecast.

2. If $\Delta(1) = 0$, the process $X$ is called deterministic, singular or exactly linearly predictable.

3. If $\lim_{h\to\infty} \Delta(h) = \mathrm{Var}(X_0)$, we call the process $X$ purely non-deterministic.
Remarks I
We always have $\Delta(h) \le \Delta_{n+1}(h) \le \Delta_n(h)$ for all $n, h \in \mathbb{N}$.

The limit appearing in the definition of the h-step forecast, $P(X_{t+h}|X_t, X_{t-1}, \ldots) := \lim_{n\to\infty} P(X_{t+h}|X_t, X_{t-1}, \ldots, X_{t+1-n})$, is to be understood as mse-convergence, i.e.
$$\lim_{n\to\infty} E\big(P(X_{t+h}|X_t, X_{t-1}, \ldots) - P(X_{t+h}|X_t, X_{t-1}, \ldots, X_{t+1-n})\big)^2 = 0.$$

A weakly stationary process is deterministic if and only if $\Delta(h) = 0$ for all $h \in \mathbb{N}$.
Remarks II
As $X_{t+h}$ can always be forecast by $E(X_{t+h})$ with an mse of $\mathrm{Var}(X_{t+h}) = \mathrm{Var}(X_0)$, we have $\Delta_n(h) \le \mathrm{Var}(X_0)$ and $\Delta(h) \le \mathrm{Var}(X_0)$. The defining condition for purely non-deterministic processes, $\lim_{h\to\infty} \Delta(h) = \mathrm{Var}(X_0)$, therefore corresponds to the intuition that the process's past does not significantly contribute to the forecasting of the process's values far in the future.
Examples
1. The weakly stationary process $X_t = A\cos(\theta t) + B\sin(\theta t)$ with uncorrelated random variables $A$ and $B$ with zero mean and unit variance is deterministic.

2. The MA(1) process $X_t := \varepsilon_t + \theta\varepsilon_{t-1}$ is purely non-deterministic, as $P(X_{t+h}|X_t, X_{t-1}, \ldots, X_{t+1-n}) = 0$ for all $h \ge 2$ and $n \in \mathbb{N}$.

3. The AR(1) process $X_t = \phi X_{t-1} + \varepsilon_t$ with $|\phi| \ne 1$ is purely non-deterministic.

4. The process $X_t := Y + \varepsilon_t$ with a white noise $\varepsilon$ and a random variable $Y$ with positive variance, uncorrelated with all $\varepsilon_t$, is neither deterministic nor purely non-deterministic.
Theorem (Wold decomposition)

If $X = (X_t)_{t\in\mathbb{Z}}$ is weakly stationary but not deterministic, the following holds true:

1. $Z_t := X_t - P(X_t|X_{t-1}, X_{t-2}, \ldots)$ is a white noise with variance $\Delta(1)$, and $Z_t = P(Z_t|X_t, X_{t-1}, \ldots)$ for all $t \in \mathbb{Z}$.

2. For $\psi_j := \frac{E(X_t Z_{t-j})}{\Delta(1)}$ ($j \in \mathbb{N}_0$), we have $\sum_{j=0}^\infty \psi_j^2 < \infty$. The process $U_t := \sum_{j=0}^\infty \psi_j Z_{t-j}$ is weakly stationary and purely non-deterministic.

3. $V_t := X_t - U_t = X_t - \sum_{j=0}^\infty \psi_j Z_{t-j}$ is weakly stationary and deterministic. We have $V_t = P(V_t|X_s, X_{s-1}, \ldots)$ for all $s, t \in \mathbb{Z}$.

4. $U$ and $V$ are uncorrelated, i.e. $\mathrm{Cov}(U_s, V_t) = 0$ for all $s, t \in \mathbb{Z}$.
Remarks

1. The Wold decomposition $X_t = U_t + V_t$ with uncorrelated, purely non-deterministic $U$ and deterministic $V$ is unique.

2. The purely non-deterministic component $U$ has an MA($\infty$)-representation as a weighted average of the white noise $Z$: $U_t = \sum_{j=0}^\infty \psi_j Z_{t-j}$.

3. $Z_t = P(Z_t|X_t, X_{t-1}, \ldots)$ means that $Z_t$ is known when the (infinite) past of $X$ up to time $t$ is known.

4. $V_t = P(V_t|X_s, X_{s-1}, \ldots)$ means that $V_t$ is known as soon as the (infinite) past of $X$ up to some point $s$ (possibly preceding $t$!) is known.

5. Because of
   a) $\mathrm{Cov}(Z_t, X_{t-j}) = 0$ for all $t \in \mathbb{Z}$, $j \in \mathbb{N}$, and
   b) $\mathrm{Cov}(Z_t, P(X_t|X_{t-1}, X_{t-2}, \ldots)) = 0$ for all $t \in \mathbb{Z}$,
   $Z_t$ is called the innovation.
Examples
1. For a white noise $\varepsilon$ and a random variable $Y$ with mean $\mu_Y$ and positive variance $\sigma_Y^2$, uncorrelated with all $\varepsilon_t$, we define a weakly stationary process $X$ by $X_t := Y + \varepsilon_t$. Then we have:
   a) $P(X_t|X_{t-1}, X_{t-2}, \ldots) = Y$, $Z_t = \varepsilon_t$,
   b) $\psi_j = \begin{cases} 1, & j = 0 \\ 0, & j > 0 \end{cases}$, $U_t = \varepsilon_t$, $V_t = Y$.

2. For the AR(1)-process $X_t = \phi X_{t-1} + \varepsilon_t$ with $|\phi| > 1$ and a white noise $\varepsilon$, we have:
   a) $P(X_t|X_{t-1}, X_{t-2}, \ldots) = \frac{1}{\phi} X_{t-1}$,
   b) $Z_t = X_t - \frac{1}{\phi} X_{t-1}$, $\Delta(1) = \frac{\sigma_\varepsilon^2}{\phi^2}$,
   c) $\psi_j = \frac{1}{\phi^j}$, $U_t = X_t$, $V_t = 0$.
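The second example can be checked with exact rational arithmetic. The stationary solution for $|\phi| > 1$ has acvf $\gamma_X(h) = \sigma_\varepsilon^2\, \phi^{-|h|}/(\phi^2 - 1)$ (a standard computation from $X_t = -\sum_{j\ge 1}\phi^{-j}\varepsilon_{t+j}$), from which one verifies that $Z_t = X_t - \frac{1}{\phi}X_{t-1}$ has variance $\sigma_\varepsilon^2/\phi^2$ and vanishing autocovariances. A sketch with the illustrative choice $\phi = 2$, $\sigma_\varepsilon^2 = 1$:

```python
from fractions import Fraction as F

phi = F(2)    # illustrative AR(1) coefficient with |phi| > 1; sigma_eps^2 = 1

def gamma(h):
    """acvf of the stationary (non-causal) AR(1) solution for |phi| > 1."""
    return phi ** (-abs(h)) / (phi**2 - 1)

# Innovation Z_t = X_t - (1/phi) X_{t-1}: its variance ...
var_Z = gamma(0) * (1 + phi**-2) - 2 * gamma(1) / phi

# ... and its autocovariances at lags h = 1, ..., 4, which should all vanish
cov_Z = [gamma(h) * (1 + phi**-2) - (gamma(h + 1) + gamma(h - 1)) / phi
         for h in range(1, 5)]
```

With $\phi = 2$ this gives $\mathrm{Var}(Z_t) = 1/4 = \sigma_\varepsilon^2/\phi^2$ exactly, confirming that $Z$ is a white noise with variance $\Delta(1)$.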
Definition (Causality, Invertibility)

1. We call a weakly stationary process $X$ causal w.r.t. a white noise $\varepsilon$ if there exist real numbers $(\psi_j)_{j\in\mathbb{N}_0}$ with $\sum_{j=0}^\infty |\psi_j| < \infty$ and $X_t = \sum_{j=0}^\infty \psi_j \varepsilon_{t-j}$ for all $t \in \mathbb{Z}$.

2. We call a weakly stationary process $X$ invertible w.r.t. a white noise $\varepsilon$ if there exist real numbers $(\psi_j)_{j\in\mathbb{N}_0}$ with $\sum_{j=0}^\infty |\psi_j| < \infty$ and $\varepsilon_t = \sum_{j=0}^\infty \psi_j X_{t-j}$ for all $t \in \mathbb{Z}$.
Examples
1. For the AR(1)-process $X_t = \phi X_{t-1} + \varepsilon_t$ with $|\phi| \ne 1$ and a white noise $\varepsilon$, we have:
   a) $X$ is invertible w.r.t. $\varepsilon$.
   b) If $|\phi| < 1$, $X$ is causal w.r.t. $\varepsilon$.
   c) If $|\phi| > 1$, $X$ is not causal w.r.t. $\varepsilon$. In this case, $X$ is causal w.r.t. the white noise $Z_t := X_t - \frac{1}{\phi} X_{t-1}$.

2. For the MA(1) process $X_t := \varepsilon_t + \theta\varepsilon_{t-1}$ with $\theta \in \mathbb{R}$ and a white noise $\varepsilon$, we have:
   a) $X$ is causal w.r.t. $\varepsilon$.
   b) If $|\theta| < 1$, $X$ is invertible w.r.t. $\varepsilon$.
   c) If $|\theta| > 1$, $X$ is not invertible w.r.t. $\varepsilon$. In this case, $X$ is invertible w.r.t. the white noise $Z_t := X_t - \frac{1}{\theta} X_{t-1} + \frac{1}{\theta^2} X_{t-2} \mp \ldots$.