Examples

For an independent white noise $(\varepsilon_t)_{t\in\mathbb{Z}}$, we consider a stochastic process $X$ satisfying for all $t \in \mathbb{Z}$ the recursion $X_t = \phi X_{t-1} + \varepsilon_t$ for some $\phi \in \mathbb{R}$. Then we have in case of $|\phi| < 1$:

1. $X_t = \varepsilon_t + \phi\varepsilon_{t-1} + \phi^2\varepsilon_{t-2} + \ldots$,
2. $\sigma(X_{t-1}, X_{t-2}, \ldots) = \sigma(\varepsilon_{t-1}, \varepsilon_{t-2}, \ldots)$,
3. $E(X_t \mid X_{t-1}, X_{t-2}, \ldots) = E(X_t \mid X_{t-1}) = \phi X_{t-1}$,
4. $E(X_t - E(X_t \mid X_{t-1}, X_{t-2}, \ldots))^2 = E(X_t - E(X_t \mid X_{t-1}))^2 = \sigma_\varepsilon^2$.

For $|\phi| > 1$, we have:

1. $X_t = -\frac{\varepsilon_{t+1}}{\phi} - \frac{\varepsilon_{t+2}}{\phi^2} - \ldots$,
2. $\sigma(X_{t-1}, X_{t-2}, \ldots) = \sigma(X_{t-1}, \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots)$,
3. $E(X_t \mid X_{t-1}, X_{t-2}, \ldots) = E(X_t \mid X_{t-1})$,
4. if $\varepsilon$ is Gaussian white noise, we have $E(X_t \mid X_{t-1}) = \frac{1}{\phi} X_{t-1}$.
Time Series Analysis (SS 2019) Lecture 3 Slide 1
Linear forecasting of stationary time series
Starting point: a weakly stationary time series $X$ with known mean $\mu := E(X_t)$ and autocovariance function $\gamma_X$; we are looking for an 'optimal' linear combination
$$\hat{X}_{n+h} := a_0 + a_1 X_n + \cdots + a_n X_1$$
to forecast $X_{n+h}$ ($h \in \mathbb{N}$) when given $X_1, \ldots, X_n$, where 'optimal' stands for minimizing the mean squared forecast error
$$E(X_{n+h} - \hat{X}_{n+h})^2.$$
More general problem: for a random vector $W = (W_n, \ldots, W_1)'$ with covariance matrix $\Gamma$ and a random variable $Y$ with finite variance, we want to find a linear combination
$$\hat{Y} := a_0 + a_1 W_n + \cdots + a_n W_1$$
with minimal mean squared error $E(Y - \hat{Y})^2$.
Theorem
In the above situation, we have:

1. $\hat{Y} = E(Y) + a'(W - E(W))$, with $a$ any solution of $\Gamma a = \mathrm{Cov}(W, Y)$ (such an $a$ always exists),
2. $E(Y - \hat{Y})^2 = \mathrm{Var}(Y - \hat{Y}) = \mathrm{Var}(Y) - a'\,\mathrm{Cov}(W, Y) = \mathrm{Var}(Y) - a'\Gamma a$,
3. $\mathrm{Cov}(W, Y - \hat{Y}) = 0$.
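The theorem translates directly into a few lines of linear algebra. A minimal NumPy sketch: the covariance matrix and the dependence $Y = c'W + e$ (with $e$ uncorrelated with $W$) are illustrative choices, not taken from the lecture.

```python
import numpy as np

# Illustrative covariance matrix of W = (W_n, ..., W_1)' and a "true" linear
# dependence Y = c'W + e with e uncorrelated with W, Var(e) = s2.
Gamma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.5, 0.4],
                  [0.3, 0.4, 1.0]])
c = np.array([0.7, -0.2, 0.5])
s2 = 0.25

cov_WY = Gamma @ c                 # Cov(W, Y)
var_Y = c @ Gamma @ c + s2         # Var(Y)

# Part 1 of the theorem: a solves Gamma a = Cov(W, Y)
a = np.linalg.solve(Gamma, cov_WY)

# Part 2: mean squared error Var(Y) - a' Cov(W, Y); here it equals Var(e)
mse = var_Y - a @ cov_WY

# Part 3: Cov(W, Y - Yhat) = Cov(W, Y) - Gamma a should vanish
orth = cov_WY - Gamma @ a
```

With this construction the solution recovers $a = c$ and the residual error is exactly $\mathrm{Var}(e)$, and the orthogonality in part 3 holds by construction of $a$.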
Definition (Linear prediction)

Given a random vector $W = (W_n, \ldots, W_1)'$ with covariance matrix $\Gamma$, a random variable $Y$ with finite variance, and $a$ any solution of $\Gamma a = \mathrm{Cov}(W, Y)$, we call $\hat{Y} := E(Y) + a'(W - E(W))$ the linear prediction of $Y$ given $W$, in symbols: $P(Y|W)$.
Theorem
The linear prediction has the following properties:

1. $E(Y - P(Y|W)) = 0$,
2. $P(\alpha_1 Y_1 + \alpha_2 Y_2 \mid W) = \alpha_1 P(Y_1|W) + \alpha_2 P(Y_2|W)$ for all $\alpha_1, \alpha_2 \in \mathbb{R}$,
3. $P\left(\sum_{i=1}^n \alpha_i W_i + \beta \,\middle|\, W\right) = \sum_{i=1}^n \alpha_i W_i + \beta$ for all $\alpha_1, \ldots, \alpha_n, \beta \in \mathbb{R}$,
4. $P(Y|W) = E(Y)$ when $\mathrm{Cov}(W, Y) = 0$.
Remarks

Obviously, $P(Y|W)$ is $\sigma(W_1, \ldots, W_n)$-measurable, which implies

1. $E(Y - E(Y|W_1, \ldots, W_n))^2 \le E(Y - P(Y|W))^2$,
2. if $E(Y|W_1, \ldots, W_n)$ is linear in $W_1, \ldots, W_n$, we have $P(Y|W) = E(Y|W_1, \ldots, W_n)$.

If $W$ is a (univariate) random variable with positive variance, we have
$$P(Y|W) = \beta_0 + \beta_1 W \quad\text{and}\quad E(Y - P(Y|W))^2 = \mathrm{Var}(Y) - \frac{\mathrm{Cov}(W, Y)^2}{\mathrm{Var}(W)}$$
with
$$\beta_1 = \frac{\mathrm{Cov}(W, Y)}{\mathrm{Var}(W)} \quad\text{and}\quad \beta_0 = E(Y) - \beta_1 E(W).$$
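The univariate case needs no matrix inversion at all; a tiny sketch with freely chosen (hypothetical) moments:

```python
# Freely chosen moments of (W, Y) for illustration only.
EW, EY = 1.0, 2.0
var_W, var_Y, cov_WY = 4.0, 3.0, 1.5

beta1 = cov_WY / var_W               # slope Cov(W, Y) / Var(W)
beta0 = EY - beta1 * EW              # intercept E(Y) - beta1 * E(W)
mse = var_Y - cov_WY**2 / var_W      # prediction error of P(Y|W)
```

These are exactly the simple-linear-regression coefficients, which is why $P(Y|W)$ is often called the (population) regression of $Y$ on $W$.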
Examples
For a white noise $(\varepsilon_t)_{t\in\mathbb{Z}}$, we investigate the stationary process $X$ with $X_t = \varepsilon_t + \theta\varepsilon_{t-1}$ for $|\theta| < 1$.

1. $P(X_t|X_{t-1}) = \frac{\theta}{1+\theta^2}\, X_{t-1}$,
2. $E(X_t - P(X_t|X_{t-1}))^2 = \sigma_\varepsilon^2 \left(1 + \frac{\theta^4}{1+\theta^2}\right)$,
3. $P(X_t|X_{t-1}, X_{t-2}) = \frac{\theta(1+\theta^2)}{1+\theta^2+\theta^4}\, X_{t-1} - \frac{\theta^2}{1+\theta^2+\theta^4}\, X_{t-2}$,
4. $E(X_t - P(X_t|X_{t-1}, X_{t-2}))^2 = \sigma_\varepsilon^2 \left(1 + \frac{\theta^6}{1+\theta^2+\theta^4}\right)$,
5. $P(X_{t+h}|X_t, X_{t-1}) = 0$ for all $h \ge 2$.

For a random walk $(X_t)_{t\in\mathbb{N}_0}$ with drift $\alpha_0$ and initial value $x \in \mathbb{R}$, driven by an independent white noise $\varepsilon$, we have for $t \ge 1$:

1. $P(X_t|X_{t-1}) = X_{t-1} + \alpha_0$,
2. $E(X_t - P(X_t|X_{t-1}))^2 = \sigma_\varepsilon^2$.
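The MA(1) formulas above can be checked with exact rational arithmetic. A sketch using Python's `fractions` module with the illustrative choice $\theta = 1/2$, $\sigma_\varepsilon^2 = 1$, solving the small prediction systems by hand:

```python
from fractions import Fraction as F

theta = F(1, 2)          # illustrative MA(1) parameter; sigma_eps^2 = 1
g0 = 1 + theta**2        # gamma_X(0)
g1 = theta               # gamma_X(1); gamma_X(h) = 0 for h >= 2

# One lag: solve gamma(0) * a = gamma(1)
a1 = g1 / g0
mse1 = g0 - g1**2 / g0

a1_formula = theta / (1 + theta**2)
mse1_formula = 1 + theta**4 / (1 + theta**2)

# Two lags: solve [[g0, g1], [g1, g0]] a = (g1, 0) via the 2x2 inverse
det = g0**2 - g1**2
b1 = g0 * g1 / det       # coefficient of X_{t-1}
b2 = -g1**2 / det        # coefficient of X_{t-2}
mse2 = g0 - b1 * g1      # gamma(0) - a' gamma_n(1); gamma(2) = 0 drops out

b1_formula = theta * (1 + theta**2) / (1 + theta**2 + theta**4)
b2_formula = -theta**2 / (1 + theta**2 + theta**4)
mse2_formula = 1 + theta**6 / (1 + theta**2 + theta**4)
```

For $\theta = 1/2$ this gives the exact values $a_1 = 2/5$, $(b_1, b_2) = (10/21, -4/21)$, with mean squared errors $21/20$ and $85/84$.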
Linear forecasting of time series
We apply the linear prediction operator $P$ to calculate the linear forecast of $X_{n+h}$ given $X_1, \ldots, X_n$, when the mean $\mu := E(X_t)$ and the autocovariance function $\gamma_X$ are known:
$$P(X_{n+h}|X_n, \ldots, X_1) = \mu + \sum_{i=1}^n a_i (X_{n+1-i} - \mu),$$
with $a = (a_1, \ldots, a_n)'$ any solution of $\Gamma_n a = \gamma_n(h)$, where
$$\Gamma_n = [\gamma_X(|i-j|)]_{i,j=1}^n = \begin{pmatrix} \gamma_X(0) & \gamma_X(1) & \ldots & \gamma_X(n-1) \\ \gamma_X(1) & \gamma_X(0) & \ldots & \gamma_X(n-2) \\ \vdots & \ddots & \ddots & \vdots \\ \gamma_X(n-1) & \gamma_X(n-2) & \ldots & \gamma_X(0) \end{pmatrix}$$
and $\gamma_n(h) = (\gamma_X(h), \gamma_X(h+1), \ldots, \gamma_X(h+n-1))'$.
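As a numerical sketch of this system (parameter values illustrative): for a causal AR(1) with $\gamma_X(h) = \sigma^2 \phi^{|h|}/(1-\phi^2)$, the solution of $\Gamma_n a = \gamma_n(1)$ should be $a = (\phi, 0, \ldots, 0)'$, i.e. the one-step forecast is $\phi X_n$.

```python
import numpy as np

phi, sig2, n, h = 0.6, 1.0, 5, 1    # illustrative AR(1) parameters

def gamma(k):
    """Autocovariance function of a causal AR(1) process."""
    return sig2 * phi ** abs(k) / (1 - phi ** 2)

idx = np.arange(n)
Gamma_n = np.array([[gamma(i - j) for j in idx] for i in idx])  # Toeplitz
gamma_nh = np.array([gamma(h + i) for i in idx])                # right-hand side

a = np.linalg.solve(Gamma_n, gamma_nh)   # forecast weights a_1, ..., a_n
```

Only the most recent observation gets a nonzero weight, matching the Markov structure of the AR(1) recursion.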
Remarks I
The equations comprising the system $\Gamma_n a = \gamma_n(1)$ are called Yule-Walker equations.

Dividing the Yule-Walker equations by $\gamma_X(0)$, we arrive at a system of equations making use only of the autocorrelations $\rho_X(h)$ with $h = 1, \ldots, n$: $R_n a = \rho_n$, with
$$R_n = \begin{pmatrix} 1 & \rho_X(1) & \ldots & \ldots & \rho_X(n-1) \\ \rho_X(1) & 1 & \rho_X(1) & \ldots & \rho_X(n-2) \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ \rho_X(n-1) & \rho_X(n-2) & \ldots & \ldots & 1 \end{pmatrix}$$
and $\rho_n = (\rho_X(1), \rho_X(2), \ldots, \rho_X(n))'$.
Remarks II
$\gamma_X(0) > 0$ and asymptotic uncorrelatedness are sufficient conditions for $\Gamma_n$ to be non-singular for all $n \in \mathbb{N}$, and therefore also for the existence of a unique solution to the Yule-Walker equations.

In the following, for $x = (x_1, \ldots, x_n)'$, $x^{(r)}$ denotes the reversed vector $x^{(r)} := (x_n, \ldots, x_1)'$.

Due to the special structure of $R_n$ ($R_n$ is a symmetric Toeplitz matrix), we have the following lemma:
$$R_n x^{(r)} = (R_n x)^{(r)} \quad\text{for all } x \in \mathbb{R}^n.$$
Levinson-Durbin recursion
Theorem (Levinson-Durbin recursion)

For a weakly stationary process $X$ with acf $\rho_X$, we denote $v_n := E(X_{n+1} - P(X_{n+1}|X_1, \ldots, X_n))^2$ and $\tilde{v}_n := \frac{v_n}{\gamma_X(0)}$. We then have, if $v_n > 0$:

1. If $a_n$ solves the equation $R_n a_n = \rho_n$, then $a_{n+1} := (a_{n+1,1}, \ldots, a_{n+1,n+1})'$ with
$$a_{n+1,n+1} := \frac{\gamma_X(n+1) - a_n' \gamma_n(1)^{(r)}}{v_n} = \frac{\rho_X(n+1) - a_n' \rho_n^{(r)}}{\tilde{v}_n}$$
and $(a_{n+1,1}, \ldots, a_{n+1,n})' = a_n - a_{n+1,n+1}\, a_n^{(r)}$ solves the equation $R_{n+1} a_{n+1} = \rho_{n+1}$.

2. $v_{n+1} = v_n(1 - a_{n+1,n+1}^2)$ and $\tilde{v}_{n+1} = \tilde{v}_n(1 - a_{n+1,n+1}^2)$.
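The recursion lends itself to a direct implementation. Below is a minimal NumPy sketch (the function name `levinson_durbin` is mine); it is checked against the MA(1) example with $\theta = 1/2$ that appears later in this lecture, where $\rho_X(1) = 2/5$ and $\rho_X(h) = 0$ for $h \ge 2$.

```python
import numpy as np

def levinson_durbin(rho, n):
    """Solve R_k a_k = rho_k recursively for k = 1, ..., n.

    rho is the array (rho_X(1), rho_X(2), ...); returns the coefficient
    vectors a_k and the normalized mses (the v-tilde_k of the theorem)."""
    a = np.array([rho[0]])          # a_1 = rho(1) solves the 1x1 system
    v = 1.0 - rho[0] ** 2           # v~_1 = 1 - rho(1)^2
    coeffs, mses = [a], [v]
    for k in range(1, n):
        # reflection coefficient a_{k+1,k+1} = (rho(k+1) - a_k' rho_k^(r)) / v~_k
        kappa = (rho[k] - a @ rho[:k][::-1]) / v
        # (a_{k+1,1}, ..., a_{k+1,k})' = a_k - kappa * a_k^(r), then append kappa
        a = np.concatenate((a - kappa * a[::-1], [kappa]))
        v *= 1.0 - kappa ** 2       # v~_{k+1} = v~_k (1 - kappa^2)
        coeffs.append(a)
        mses.append(v)
    return coeffs, mses

# MA(1) with theta = 1/2 (sigma_eps^2 = 1): rho(1) = 2/5, rho(h) = 0 otherwise
rho = np.array([0.4, 0.0, 0.0])
coeffs, mses = levinson_durbin(rho, 3)
```

For this input, `coeffs[2]` reproduces the weights $(42/85, -4/17, 8/85)$ of the example below, and $\gamma_X(0)\,\tilde{v}_3 = 1.25 \cdot \tilde{v}_3$ reproduces the mse $341/340$; the result also agrees with directly solving $R_3 a = \rho_3$.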
Remarks
1. The condition $v_n > 0$ appearing in the preceding theorem is equivalent to $\Gamma_{n+1}$ and $R_{n+1}$ being non-singular.
2. Therefore, under the condition stated in the theorem, $a_{n+1}$ is the unique solution to the Yule-Walker equations used to compute $P(X_{n+2}|X_1, \ldots, X_{n+1})$.
Examples
1. For the weakly stationary process $X_t = \varepsilon_t + \frac{1}{2}\varepsilon_{t-1}$ with a white noise $(\varepsilon_t)_{t\in\mathbb{Z}}$, we have:
   a) $P(X_4|X_3, X_2, X_1) = \frac{42}{85} X_3 - \frac{4}{17} X_2 + \frac{8}{85} X_1$,
   b) $E(X_4 - P(X_4|X_3, X_2, X_1))^2 = \frac{341}{340}\,\sigma_\varepsilon^2$.
2. For the weakly stationary process $X_t = A\cos(\theta t) + B\sin(\theta t)$, with uncorrelated random variables $A$ and $B$ with vanishing mean and unit variance, we have:
   a) $P(X_2|X_1) = \cos(\theta)\, X_1$,
   b) $E(X_2 - P(X_2|X_1))^2 = \sin^2(\theta)$,
   c) $P(X_3|X_2, X_1) = 2\cos(\theta)\, X_2 - X_1$,
   d) $E(X_3 - P(X_3|X_2, X_1))^2 = 0$.

3. For a white noise $\varepsilon$ and a random variable $Y$ with mean $\mu_Y$ and positive variance $\sigma_Y^2$, uncorrelated with all $\varepsilon_t$, we define a weakly stationary process $X$ by $X_t := Y + \varepsilon_t$. Then we have for all $n \in \mathbb{N}$:
   a) $P(X_{n+1}|X_n, \ldots, X_1) = \frac{\sigma_\varepsilon^2}{n\sigma_Y^2 + \sigma_\varepsilon^2}\,\mu_Y + \frac{\sigma_Y^2}{\sigma_Y^2 + \frac{\sigma_\varepsilon^2}{n}} \cdot \frac{X_n + \ldots + X_1}{n}$,
   b) $E(X_{n+1} - P(X_{n+1}|X_n, \ldots, X_1))^2 = \sigma_\varepsilon^2\left(1 + \frac{\sigma_Y^2}{\sigma_\varepsilon^2 + n\sigma_Y^2}\right)$.
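Example 3 can be verified numerically: for $X_t = Y + \varepsilon_t$ the covariance matrix is $\Gamma_n = \sigma_Y^2 J + \sigma_\varepsilon^2 I$ (with $J$ the all-ones matrix), so every past observation gets the same weight. A NumPy sketch with illustrative variances:

```python
import numpy as np

sY2, se2, n = 2.0, 0.5, 4     # illustrative sigma_Y^2, sigma_eps^2, sample size

# acvf of X_t = Y + eps_t: gamma(0) = sY2 + se2, gamma(h) = sY2 for h >= 1
Gamma_n = sY2 * np.ones((n, n)) + se2 * np.eye(n)
gamma_n1 = sY2 * np.ones(n)

a = np.linalg.solve(Gamma_n, gamma_n1)

# Closed form from the slide: a common weight for all past values, and the mse
weight = sY2 / (n * sY2 + se2)
mse = (sY2 + se2) - a @ gamma_n1
mse_formula = se2 * (1 + sY2 / (se2 + n * sY2))
```

As $n \to \infty$ the weight on $\mu_Y$ vanishes and the forecast approaches the sample mean of the past, with mse decreasing to $\sigma_\varepsilon^2$.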
Partial correlation
For three random variables $Y_1$, $Y_2$ and $Y_3$, it might happen that $Y_3$ is highly correlated with both $Y_1$ and $Y_2$. In that case $Y_1$ and $Y_2$ may also be highly correlated, but this correlation stems mostly from the impact of $Y_3$ on $Y_1$ and $Y_2$. Partial correlation is used to measure the correlation that remains once this impact is removed.
Definition (Partial correlation)
We take as given two univariate random variables $Y_1$, $Y_2$ and a (possibly multivariate) random variable $Y_3$. We then call the correlation of $Y_1 - P(Y_1|Y_3)$ and $Y_2 - P(Y_2|Y_3)$ the partial correlation of $Y_1$ and $Y_2$ given $Y_3$, in symbols: $\mathrm{Corr}(Y_1, Y_2|Y_3) := \mathrm{Corr}(Y_1 - P(Y_1|Y_3),\, Y_2 - P(Y_2|Y_3))$.
Examples
For a white noise $(\varepsilon_t)_{t\in\mathbb{Z}}$, we consider the stochastic process $X$ with $X_t = \phi X_{t-1} + \varepsilon_t$ for all $t \in \mathbb{Z}$ and some $\phi \in \mathbb{R}$ with $|\phi| < 1$. We then have
$$\mathrm{Corr}(X_0, X_2|X_1) = 0.$$

For the weakly stationary process $X_t = \varepsilon_t + \theta\varepsilon_{t-1}$ with an independent white noise $(\varepsilon_t)_{t\in\mathbb{Z}}$ and $|\theta| < 1$, we have
$$\mathrm{Corr}(X_0, X_2|X_1) = -\frac{\theta^2}{1 + \theta^2 + \theta^4}.$$
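The MA(1) value can be checked from the definition with exact arithmetic: with zero means, $P(X_0|X_1) = \beta X_1$ and $P(X_2|X_1) = \beta X_1$ for $\beta = \gamma_X(1)/\gamma_X(0)$, and the partial correlation is the correlation of the two residuals. A sketch with the illustrative choice $\theta = 1/2$, $\sigma_\varepsilon^2 = 1$:

```python
from fractions import Fraction as F

theta = F(1, 2)             # illustrative MA(1) parameter; sigma_eps^2 = 1
g0 = 1 + theta**2           # gamma(0)
g1 = theta                  # gamma(1); gamma(2) = 0

beta = g1 / g0              # P(X0|X1) = beta*X1 and P(X2|X1) = beta*X1

# Cov(X0 - beta X1, X2 - beta X1) = gamma(2) - 2 beta gamma(1) + beta^2 gamma(0)
cov_res = 0 - 2 * beta * g1 + beta**2 * g0
# Both residuals share the variance gamma(0) - gamma(1)^2 / gamma(0)
var_res = g0 - g1**2 / g0
pcorr = cov_res / var_res

pcorr_formula = -theta**2 / (1 + theta**2 + theta**4)
```

For $\theta = 1/2$ both expressions give exactly $-4/21$.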
Partial autocorrelation function (PACF)
Definition (Partial autocorrelation function, PACF)
For a weakly stationary process $(X_t)_{t\in\mathbb{Z}}$ and $h \in \mathbb{N} \setminus \{1\}$, the partial correlation of $X_0$ and $X_h$ given $X_1, \ldots, X_{h-1}$ is called the partial autocorrelation $\alpha_X(h)$ of lag $h$. For $h = 1$, one defines $\alpha_X(1) := \rho_X(1)$. The function $\alpha_X : \mathbb{N} \to \mathbb{R},\ h \mapsto \alpha_X(h)$ is called the partial autocorrelation function (PACF) of $X$.
Theorem
For a weakly stationary process $X$ with PACF $\alpha_X$,
$$\alpha_X(n+1) = a_{n+1,n+1},$$
with $a_{n+1,n+1}$ denoting the coefficients obtained from the Levinson-Durbin recursion.
Definition (h-step forecast, forecast error)

1. For a weakly stationary process $X = (X_t)_{t\in\mathbb{Z}}$, we call
   a) $P(X_{t+h}|X_t, X_{t-1}, \ldots, X_{t+1-n})$ for $h \in \mathbb{N}$ and $n \in \mathbb{N}$ the h-step forecast of order $n$,
   b) $\Delta_n(h) := E(X_{t+h} - P(X_{t+h}|X_t, X_{t-1}, \ldots, X_{t+1-n}))^2$ the mean squared (forecast) error (mse) of the h-step forecast of order $n$,
   c) $P(X_{t+h}|X_t, X_{t-1}, \ldots) := \lim_{n\to\infty} P(X_{t+h}|X_t, X_{t-1}, \ldots, X_{t+1-n})$ the h-step forecast (based on an infinite past),
   d) $\Delta(h) := \lim_{n\to\infty} \Delta_n(h) = E(X_{t+h} - P(X_{t+h}|X_t, X_{t-1}, \ldots))^2$ the mse of the h-step forecast.

2. If $\Delta(1) = 0$, the process $X$ is called deterministic, singular or exactly linearly predictable.

3. If $\lim_{h\to\infty} \Delta(h) = \mathrm{Var}(X_0)$, we call the process $X$ purely non-deterministic.
Remarks I
We always have $\Delta(h) \le \Delta_{n+1}(h) \le \Delta_n(h)$ for all $n, h \in \mathbb{N}$.

The limit appearing in the definition of the h-step forecast, $P(X_{t+h}|X_t, X_{t-1}, \ldots) := \lim_{n\to\infty} P(X_{t+h}|X_t, X_{t-1}, \ldots, X_{t+1-n})$, is to be understood as mse-convergence, i.e.
$$\lim_{n\to\infty} E\big(P(X_{t+h}|X_t, X_{t-1}, \ldots) - P(X_{t+h}|X_t, X_{t-1}, \ldots, X_{t+1-n})\big)^2 = 0.$$

A weakly stationary process is deterministic if and only if $\Delta(h) = 0$ for all $h \in \mathbb{N}$.
Remarks II
As $X_{t+h}$ can always be forecast by $E(X_{t+h})$ with an mse of $\mathrm{Var}(X_{t+h}) = \mathrm{Var}(X_0)$, we have $\Delta_n(h) \le \mathrm{Var}(X_0)$ and $\Delta(h) \le \mathrm{Var}(X_0)$. The defining condition for purely non-deterministic processes, $\lim_{h\to\infty} \Delta(h) = \mathrm{Var}(X_0)$, therefore corresponds to the intuition that the process's past does not significantly contribute to the forecasting of the process's values far in the future.
Examples
1. The weakly stationary process $X_t = A\cos(\theta t) + B\sin(\theta t)$ with uncorrelated random variables $A$ and $B$ with zero mean and unit variance is deterministic.

2. The MA(1) process $X_t := \varepsilon_t + \theta\varepsilon_{t-1}$ is purely non-deterministic, as $P(X_{t+h}|X_t, X_{t-1}, \ldots, X_{t+1-n}) = 0$ for all $h \ge 2$ and $n \in \mathbb{N}$.

3. The AR(1) process $X_t = \phi X_{t-1} + \varepsilon_t$ with $|\phi| \ne 1$ is purely non-deterministic.

4. The process $X_t := Y + \varepsilon_t$ with a white noise $\varepsilon$ and a random variable $Y$ with positive variance, uncorrelated with all $\varepsilon_t$, is neither deterministic nor purely non-deterministic.
Theorem (Wold decomposition)

If $X = (X_t)_{t\in\mathbb{Z}}$ is weakly stationary but not deterministic, the following holds true:

1. $Z_t := X_t - P(X_t|X_{t-1}, X_{t-2}, \ldots)$ is a white noise with variance $\Delta(1)$, and $Z_t = P(Z_t|X_t, X_{t-1}, \ldots)$ for all $t \in \mathbb{Z}$.

2. For $\psi_j := \frac{E(X_t Z_{t-j})}{\Delta(1)}$ ($j \in \mathbb{N}_0$), we have $\sum_{j=0}^\infty \psi_j^2 < \infty$. The process $U_t := \sum_{j=0}^\infty \psi_j Z_{t-j}$ is weakly stationary and purely non-deterministic.

3. $V_t := X_t - U_t = X_t - \sum_{j=0}^\infty \psi_j Z_{t-j}$ is weakly stationary and deterministic. We have $V_t = P(V_t|X_s, X_{s-1}, \ldots)$ for all $s, t \in \mathbb{Z}$.

4. $U$ and $V$ are uncorrelated, i.e. $\mathrm{Cov}(U_s, V_t) = 0$ for all $s, t \in \mathbb{Z}$.
Remarks

1. The Wold decomposition $X_t = U_t + V_t$ with uncorrelated, purely non-deterministic $U$ and deterministic $V$ is unique.

2. The purely non-deterministic component $U$ has an MA($\infty$)-representation as a weighted average of the white noise $Z$: $U_t = \sum_{j=0}^\infty \psi_j Z_{t-j}$.

3. $Z_t = P(Z_t|X_t, X_{t-1}, \ldots)$ means that $Z_t$ is known when the (infinite) past of $X$ up to time $t$ is known.

4. $V_t = P(V_t|X_s, X_{s-1}, \ldots)$ means that $V_t$ is known as soon as the (infinite) past of $X$ up to some point $s$ (possibly preceding $t$!) is known.

5. Because of
   a) $\mathrm{Cov}(Z_t, X_{t-j}) = 0$ for all $t \in \mathbb{Z}$, $j \in \mathbb{N}$, and
   b) $\mathrm{Cov}(Z_t, P(X_t|X_{t-1}, X_{t-2}, \ldots)) = 0$ for all $t \in \mathbb{Z}$,
   $Z_t$ is called the innovation.
Examples
1. For a white noise $\varepsilon$ and a random variable $Y$ with mean $\mu_Y$ and positive variance $\sigma_Y^2$, uncorrelated with all $\varepsilon_t$, we define a weakly stationary process $X$ by $X_t := Y + \varepsilon_t$. Then we have:
   a) $P(X_t|X_{t-1}, X_{t-2}, \ldots) = Y$, $Z_t = \varepsilon_t$,
   b) $\psi_j = \begin{cases} 1, & j = 0 \\ 0, & j > 0 \end{cases}$, $U_t = \varepsilon_t$, $V_t = Y$.

2. For the AR(1)-process $X_t = \phi X_{t-1} + \varepsilon_t$ with $|\phi| > 1$ and a white noise $\varepsilon$, we have:
   a) $P(X_t|X_{t-1}, X_{t-2}, \ldots) = \frac{1}{\phi} X_{t-1}$,
   b) $Z_t = X_t - \frac{1}{\phi} X_{t-1}$, $\Delta(1) = \frac{\sigma_\varepsilon^2}{\phi^2}$,
   c) $\psi_j = \frac{1}{\phi^j}$, $U_t = X_t$, $V_t = 0$.
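The second example can be checked with exact rational arithmetic. The stationary solution for $|\phi| > 1$ has acvf $\gamma_X(h) = \sigma_\varepsilon^2\, \phi^{-|h|}/(\phi^2 - 1)$ (a standard computation from $X_t = -\sum_{j\ge 1}\phi^{-j}\varepsilon_{t+j}$), from which one verifies that $Z_t = X_t - \frac{1}{\phi}X_{t-1}$ has variance $\sigma_\varepsilon^2/\phi^2$ and vanishing autocovariances. A sketch with the illustrative choice $\phi = 2$, $\sigma_\varepsilon^2 = 1$:

```python
from fractions import Fraction as F

phi = F(2)    # illustrative AR(1) coefficient with |phi| > 1; sigma_eps^2 = 1

def gamma(h):
    """acvf of the stationary (non-causal) AR(1) solution for |phi| > 1."""
    return phi ** (-abs(h)) / (phi**2 - 1)

# Innovation Z_t = X_t - (1/phi) X_{t-1}: its variance ...
var_Z = gamma(0) * (1 + phi**-2) - 2 * gamma(1) / phi

# ... and its autocovariances at lags h = 1, ..., 4, which should all vanish
cov_Z = [gamma(h) * (1 + phi**-2) - (gamma(h + 1) + gamma(h - 1)) / phi
         for h in range(1, 5)]
```

With $\phi = 2$ this gives $\mathrm{Var}(Z_t) = 1/4 = \sigma_\varepsilon^2/\phi^2$ exactly, confirming that $Z$ is a white noise with variance $\Delta(1)$.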
Definition (Causality, Invertibility)

1. We call a weakly stationary process $X$ causal w.r.t. a white noise $\varepsilon$ if there exist real numbers $(\psi_j)_{j\in\mathbb{N}_0}$ with $\sum_{j=0}^\infty |\psi_j| < \infty$ and $X_t = \sum_{j=0}^\infty \psi_j \varepsilon_{t-j}$ for all $t \in \mathbb{Z}$.

2. We call a weakly stationary process $X$ invertible w.r.t. a white noise $\varepsilon$ if there exist real numbers $(\psi_j)_{j\in\mathbb{N}_0}$ with $\sum_{j=0}^\infty |\psi_j| < \infty$ and $\varepsilon_t = \sum_{j=0}^\infty \psi_j X_{t-j}$ for all $t \in \mathbb{Z}$.
Examples
1. For the AR(1)-process $X_t = \phi X_{t-1} + \varepsilon_t$ with $|\phi| \ne 1$ and a white noise $\varepsilon$, we have:
   a) $X$ is invertible w.r.t. $\varepsilon$.
   b) If $|\phi| < 1$, $X$ is causal w.r.t. $\varepsilon$.
   c) If $|\phi| > 1$, $X$ is not causal w.r.t. $\varepsilon$. In this case, $X$ is causal w.r.t. the white noise $Z_t := X_t - \frac{1}{\phi} X_{t-1}$.

2. For the MA(1) process $X_t := \varepsilon_t + \theta\varepsilon_{t-1}$ with $\theta \in \mathbb{R}$ and a white noise $\varepsilon$, we have:
   a) $X$ is causal w.r.t. $\varepsilon$.
   b) If $|\theta| < 1$, $X$ is invertible w.r.t. $\varepsilon$.
   c) If $|\theta| > 1$, $X$ is not invertible w.r.t. $\varepsilon$. In this case, $X$ is invertible w.r.t. the white noise $Z_t := X_t - \frac{1}{\theta} X_{t-1} + \frac{1}{\theta^2} X_{t-2} \mp \ldots$.