
1 Today's Agenda

1. Announcement: Project due date: August 9th, 11:59pm. Send it by email to Ilona and me.

2. Kalman Filtering (Review and Finish)

3. Discuss two basic models in finance and their applications

4. Autocovariance Generating Functions

5. Time Domain vs Frequency Domain Decomposition

6. Spectral Representation of time series

7. Finish Filtering


2 Kalman Filtering

Setup:

$$y_t = H' z_t + w_t$$
$$z_t = F z_{t-1} + v_t$$

• $y_t$ is the observable variable (think "returns").
  – The first equation is the "space" or the "observation" equation.

• $z_t$ is the unobservable variable (think volatility or the state of the economy).
  – The second equation is called the "state" equation.

• Note: Here I have suppressed the $x_t$ variable from last lecture.

• Note: The system is linear. There are results for non-linear Kalman filters, but they are not very clear.

• $v_t$ and $w_t$ are iid and assumed to be uncorrelated at all lags.

$$E(w_t v_t') = 0, \qquad E(v_t v_t') = Q, \qquad E(w_t w_t') = R$$

• This is a state-space system.


• Any time series can be written as a state space.

• In finance, the goal is to estimate $H, F, Q, R$.

• We want to forecast $y_{t+1|t}$. But

$$y_{t+1|t} = H' z_{t+1|t}$$

• Hence, we need to produce a forecast $z_{t+1|t}$.


• Here is an AR(2) example: $Y_{t+1} = \phi_1 Y_t + \phi_2 Y_{t-1} + \varepsilon_{t+1}$

• State equation:

$$\begin{bmatrix} Y_{t+1} \\ Y_t \end{bmatrix} = \begin{bmatrix} \phi_1 & \phi_2 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} Y_t \\ Y_{t-1} \end{bmatrix} + \begin{bmatrix} \varepsilon_{t+1} \\ 0 \end{bmatrix}$$

• Space equation:

$$Y_{t+1} = \begin{bmatrix} 1 & 0 \end{bmatrix}\begin{bmatrix} Y_{t+1} \\ Y_t \end{bmatrix}$$

• But this representation is not unique. Here is another one:

• State equation:

$$\begin{bmatrix} Y_{t+1} \\ \phi_2 Y_t \end{bmatrix} = \begin{bmatrix} \phi_1 & 1 \\ \phi_2 & 0 \end{bmatrix}\begin{bmatrix} Y_t \\ \phi_2 Y_{t-1} \end{bmatrix} + \begin{bmatrix} \varepsilon_{t+1} \\ 0 \end{bmatrix}$$

• Space equation:

$$Y_{t+1} = \begin{bmatrix} 1 & 0 \end{bmatrix}\begin{bmatrix} Y_{t+1} \\ \phi_2 Y_t \end{bmatrix}$$
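To make the first representation concrete, here is a minimal Python sketch (the coefficient values are illustrative) that builds the companion-form matrices and checks that the state-space recursion reproduces the AR(2) dynamics:

```python
import numpy as np

phi1, phi2 = 0.5, 0.3            # illustrative AR(2) coefficients
F = np.array([[phi1, phi2],      # state equation: z_{t+1} = F z_t + (eps_{t+1}, 0)'
              [1.0,  0.0]])
H = np.array([[1.0, 0.0]])       # space equation: Y_t = [1 0] z_t  (H' in the notation above)

rng = np.random.default_rng(0)
T = 200
eps = rng.standard_normal(T)

z = np.zeros(2)                  # state z_t = (Y_t, Y_{t-1})'
y_ss = np.zeros(T)               # output of the state-space recursion
y_ar = np.zeros(T)               # output of the plain AR(2) recursion
y1 = y2 = 0.0                    # Y_{t-1}, Y_{t-2}
for t in range(T):
    z = F @ z + np.array([eps[t], 0.0])
    y_ss[t] = (H @ z)[0]
    y_ar[t] = phi1 * y1 + phi2 * y2 + eps[t]
    y1, y2 = y_ar[t], y1

print(np.allclose(y_ss, y_ar))   # True: the two recursions coincide
```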


Here are some interesting models that enter into the state-space framework:

• Time-Varying Parameter (TVP) models:

$$y_t = \phi_t x_t + \varepsilon_t$$
$$\phi_t = \rho\,\phi_{t-1} + u_t$$

• We can let $x_t = y_{t-1}$ for a recursive model.

• Such models are naively estimated every day by rolling regressions. In the recursive scheme, we are always faced with a trade-off:
  – Longer sample, to have better estimates.
  – Shorter sample, to capture the dynamics in $\phi_t$.
  – The question is: what is the optimal window length?

• Kalman filtering gives the best answer.

• The optimal forecast of $\phi_{t+1|t}$ in the MSE sense is provided by the Kalman filter.

• No other linear filter can do better!


• The Kalman filter calculates the forecasts $z_{t+1|t}$ recursively, starting with $z_{1|0}$, then $z_{2|1}$, ..., until $z_{T|T-1}$.

• Since $z_{t|t-1}$ is a forecast, we can ask how good a forecast it is.

• Therefore, we define

$$P_{t|t-1} = E\left[\left(z_t - z_{t|t-1}\right)\left(z_t - z_{t|t-1}\right)'\right],$$

which is the mean squared error of the recursive forecast $z_{t|t-1}$.


• The Kalman filter can be broken down into 5 steps (a code sketch follows step 5).

1. Initialization of the recursion. We need $z_{1|0}$. Usually, we take $z_{1|0}$ to be the unconditional mean, or $z_{1|0} = E(z_1)$. (Q: how can we estimate $E(z_1)$?) The associated error with this forecast is

$$P_{1|0} = E\left[\left(z_{1|0} - z_1\right)\left(z_{1|0} - z_1\right)'\right]$$


2. Forecasting $y_t$ (intermediate step). The ultimate goal is to calculate $z_{t|t-1}$, but we do that recursively. We will first need to forecast the value of $y_t$, based on available information:

$$E(y_t | y_{t-1}, \ldots, z_t) = H' z_t$$

From the law of iterated expectations,

$$E_{t-1}(E_t(y_t)) = E_{t-1}(y_t) = H' z_{t|t-1}$$

The error from this forecast is

$$y_t - y_{t|t-1} = H'\left(z_t - z_{t|t-1}\right) + w_t$$

with MSE

$$E\left[\left(y_t - y_{t|t-1}\right)\left(y_t - y_{t|t-1}\right)'\right] = E\left[H'\left(z_t - z_{t|t-1}\right)\left(z_t - z_{t|t-1}\right)'H\right] + E\left[w_t w_t'\right] = H' P_{t|t-1} H + R$$


3. Updating step ($z_{t|t}$)

– Once we observe $y_t$, we can update our forecast of $z_t$, denoting it by $z_{t|t}$, before making the new forecast $z_{t+1|t}$.

– We do this by calculating $E(z_t | y_t, x_t, \ldots) = z_{t|t}$:

$$z_{t|t} = z_{t|t-1} + E\left[\left(z_t - z_{t|t-1}\right)\left(y_t - y_{t|t-1}\right)'\right]\left(E\left[\left(y_t - y_{t|t-1}\right)\left(y_t - y_{t|t-1}\right)'\right]\right)^{-1}\left(y_t - y_{t|t-1}\right)$$

– We can write this a bit more intuitively as:

$$z_{t|t} = z_{t|t-1} + \beta\left(y_t - y_{t|t-1}\right)$$

where $\beta$ is the OLS coefficient from regressing $\left(z_t - z_{t|t-1}\right)$ on $\left(y_t - y_{t|t-1}\right)$.

– The stronger the relationship between the two forecasting errors, the bigger the correction must be.


– It can be shown that

$$z_{t|t} = z_{t|t-1} + P_{t|t-1} H\left(H' P_{t|t-1} H + R\right)^{-1}\left(y_t - H' z_{t|t-1}\right)$$

– This updated forecast uses the old forecast $z_{t|t-1}$ and the just-observed value of $y_t$.


4. Forecast $z_{t+1|t}$.

– Once we have an update of the old forecast, we can produce a new forecast, the forecast $z_{t+1|t}$:

$$E_t(z_{t+1}) = E(z_{t+1} | y_t, x_t, \ldots) = E(F z_t + v_{t+1} | y_t, x_t, \ldots) = F\,E(z_t | y_t, x_t, \ldots) + 0 = F z_{t|t}$$

– We can use the above equation to write

$$E_t(z_{t+1}) = F\left[z_{t|t-1} + P_{t|t-1} H\left(H' P_{t|t-1} H + R\right)^{-1}\left(y_t - A' x_t - H' z_{t|t-1}\right)\right]$$
$$= F z_{t|t-1} + F P_{t|t-1} H\left(H' P_{t|t-1} H + R\right)^{-1}\left(y_t - A' x_t - H' z_{t|t-1}\right)$$

– We can also derive an equation for the forecast error as a recursion:

$$P_{t+1|t} = F\left[P_{t|t-1} - P_{t|t-1} H\left(H' P_{t|t-1} H + R\right)^{-1} H' P_{t|t-1}\right]F' + Q$$

5. Go to step 2, until we reach $T$. Then we are done.
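Collecting the five steps, here is a minimal Python sketch of the recursion (one of many possible implementations; the matrix `H` below is applied directly to the state, i.e. it plays the role of $H'$ in the slide notation, and the $A'x_t$ term is omitted, as above):

```python
import numpy as np

def kalman_filter(y, H, F, Q, R, z0, P0):
    """Run the Kalman recursion for y_t = H z_t + w_t, z_t = F z_{t-1} + v_t,
    with E[v v'] = Q and E[w w'] = R. Returns the one-step-ahead state
    forecasts z_{t|t-1}, their MSE matrices P_{t|t-1}, and the updates z_{t|t}."""
    T, n = len(y), F.shape[0]
    z_pred = np.zeros((T, n))
    P_pred = np.zeros((T, n, n))
    z_filt = np.zeros((T, n))
    z, P = z0, P0                            # step 1: initialization (z_{1|0}, P_{1|0})
    for t in range(T):
        z_pred[t], P_pred[t] = z, P
        y_hat = H @ z                        # step 2: forecast of y_t
        S = H @ P @ H.T + R                  #         its MSE, H'P_{t|t-1}H + R
        K = P @ H.T @ np.linalg.inv(S)       # step 3: the "regression coefficient"
        z_filt[t] = z + K @ (y[t] - y_hat)   #         update z_{t|t}
        P_filt = P - K @ H @ P               #         P_{t|t}
        z = F @ z_filt[t]                    # step 4: forecast z_{t+1|t} = F z_{t|t}
        P = F @ P_filt @ F.T + Q             #         P_{t+1|t} = F P_{t|t} F' + Q
    return z_pred, P_pred, z_filt            # step 5: the loop ran until t = T
```

For the AR(2) system above, for instance, one would pass `Q = np.diag([1.0, 0.0])` and (since that system has no observation noise) a tiny `R = np.array([[1e-8]])` to keep `S` invertible.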


• Summary: The Kalman filter produces
  – the optimal forecasts $z_{t+1|t}$ and $y_{t+1|t}$ (optimal within the class of linear forecasts).
  – We need some initialization assumptions.
  – We need to know the parameters of the system, i.e. $A, H, F, Q, R$.

• Now, we need to find a way to estimate the parameters $A, H, F, Q, R$.

• By far the most popular method is MLE.

• Aside: simulation methods, for getting away from the restrictive assumptions on $\varepsilon_t$.


3 Estimation of Kalman Filters

• Suppose that $z_1$ and the shocks $(w_t, v_t)$ are jointly normally distributed.

• Under such an assumption, we can make the very strong claim that the forecasts $z_{t+1|t}$ and $y_{t+1|t}$ are optimal among any functions of $y_{t-1}, \ldots$

• In other words, if we have normal errors, we cannot produce better forecasts using the past data than the Kalman forecasts!!!

• If the errors are normal, then all variables in the linear system have a normal distribution.

• More specifically, the distribution of $y_t$ conditional on $y_{t-1}, \ldots$ is normal, or

$$y_t | y_{t-1}, \ldots \sim N\left(H' z_{t|t-1},\; H' P_{t|t-1} H + R\right)$$

• Therefore, we can specify the likelihood function of $y_t | y_{t-1}$ as we did above:

$$f_{y_t | x_t, y_{t-1}} = (2\pi)^{-n/2}\left|H' P_{t|t-1} H + R\right|^{-1/2}\exp\left[-\frac{1}{2}\left(y_t - H' z_{t|t-1}\right)'\left(H' P_{t|t-1} H + R\right)^{-1}\left(y_t - H' z_{t|t-1}\right)\right]$$


• The problem is to maximize

$$\max_{H,F,Q,R}\;\sum_{t=1}^{T}\log f_{y_t | y_{t-1}}$$

• Words of wisdom:

– This maximization problem can easily become unmanageable, even using modern computers. The problem is that searching for the global maximum is very tricky.
  ∗ A possible solution is to impose as many restrictions as possible and then to relax them one by one.
  ∗ A second solution is to write a model that gives theoretical restrictions.

– Recall that there is more than one state-space representation of an AR process. This implies that some of the parameters in the state-space system are not identified. In other words, more than one value of the parameters (different combinations) can give rise to the same likelihood function.
  ∗ Then, which likelihood do we choose?
  ∗ We have to make restrictions so that we have an exactly identified problem.
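As an illustration of what the maximization looks like in practice, here is a sketch of the prediction-error-decomposition log-likelihood for a scalar system $y_t = z_t + w_t$, $z_t = f z_{t-1} + v_t$ (an illustrative parameterization, not the general case), which one would hand to a numerical optimizer:

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(params, y):
    """-sum_t log f(y_t | y_{t-1}) for y_t = z_t + w_t, z_t = f z_{t-1} + v_t,
    with Var(v) = Q, Var(w) = R; each term is Gaussian with mean z_{t|t-1}
    and variance S_t = P_{t|t-1} + R, as in the density on the slide."""
    f = params[0]
    Q, R = np.exp(params[1]), np.exp(params[2])   # exp() keeps the variances positive
    z, P = 0.0, Q / max(1.0 - f**2, 1e-6)         # initialize at unconditional moments
    ll = 0.0
    for yt in y:
        S = P + R                                  # MSE of the forecast of y_t
        e = yt - z                                 # prediction error
        ll += -0.5 * (np.log(2.0 * np.pi * S) + e**2 / S)
        z = f * (z + P / S * e)                    # z_{t+1|t} = f * z_{t|t}
        P = f**2 * (P - P**2 / S) + Q              # P_{t+1|t}
    return -ll

# Given data y, a (local!) maximizer of the likelihood:
# res = minimize(neg_loglik, x0=[0.5, 0.0, 0.0], args=(y,), method="Nelder-Mead")
```

The restriction-then-relax advice above applies directly: fixing, say, `R` and estimating only `f` and `Q` first makes the search much better behaved.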


4 The Gordon-Growth Model

• Recall the definition of return:

$$R_{t+1} = \frac{P_{t+1} + D_{t+1}}{P_t} - 1$$

which can be rewritten, if we assume that $E_t R_{t+1} = R$, as:

$$P_t = E_t\left[\frac{P_{t+1} + D_{t+1}}{1+R}\right] = E_t\left[\sum_{i=1}^{K}\left(\frac{1}{1+R}\right)^i D_{t+i}\right] + E_t\left[\left(\frac{1}{1+R}\right)^K P_{t+K}\right] \approx E_t\left[\sum_{i=1}^{\infty}\left(\frac{1}{1+R}\right)^i D_{t+i}\right]$$

Note:

$$E_t P_{t+1} = (1+R)P_t - E_t D_{t+1}$$

1. A constant expected return, as assumed, does not imply that $P_t$ would follow a martingale.

2. To obtain a martingale, we must construct a portfolio for which all dividend payments are reinvested in the stock. The value of this portfolio is a martingale (Campbell et al., p. 257)!


• Now, suppose that $D_{t+1} = (1+G)D_t + \varepsilon_t$, where $\varepsilon_t$ is iid. Then $E_t D_{t+i} = (1+G)^i D_t$ and

$$P_t = E_t\left[\sum_{i=1}^{\infty}\left(\frac{1}{1+R}\right)^i D_{t+i}\right] = \frac{(1+G)D_t}{R-G} = \frac{E_t D_{t+1}}{R-G}$$

• This is called the Gordon-Growth model.

• Note the unrealistic assumptions: $E_t R_{t+1} = R$ and $G$ is constant.

• This is how people think about prices.

• But the definition seems circular: prices are determined by the discount rate; what is the discount rate determined by?


• But it must be the case that (our model):

$$E_t R_{t+1} = \alpha + \gamma\sigma_t^2$$

• Hence:

$$P_t \approx \frac{E_t D_{t+1}}{E_t R_{t+1} - G}$$

• Q: Why is this useful?


• A: Suppose $\alpha = 0.08$, $\gamma = 2.5$, and $\sigma_t^2 = 0.15^2$. Then $E_t R_{t+1} = 0.1362$.

• Suppose that $E_t D_{t+1} = 1$ and $G = 0$. Then:

$$P_t \approx \frac{1}{E_t R_{t+1}} = \frac{1}{0.1362} = 7.34$$

• Now, suppose that volatility increases at $t+1$, or $\sigma_{t+1}^2 = 0.25^2$. Hence, $E_{t+1} R_{t+2} = 0.2363$. This implies that:

$$P_{t+1} \approx \frac{1}{0.2363} = 4.23$$

• What happened to $R_{t+1}$ (no dividend got paid)?

$$R_{t+1} = \frac{P_{t+1} - P_t}{P_t} = \frac{4.23 - 7.34}{7.34} < 0$$

• MORAL: $Cov\left(R_{t+1}, \sigma_{t+1}^2\right) < 0$.

• BUT: $Cov\left(R_{t+1}, \sigma_t^2\right) > 0$, on average.

• ALSO: $Cov\left(R_{t+1}, \Delta\sigma_{t+1}^2\right) < 0$.
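The arithmetic is easy to verify; a short sketch using the slide's numbers:

```python
alpha, gamma, G = 0.08, 2.5, 0.0

ER1 = alpha + gamma * 0.15**2        # E_t R_{t+1}     = 0.08 + 2.5 * 0.0225 = 0.13625
ER2 = alpha + gamma * 0.25**2        # E_{t+1} R_{t+2} = 0.08 + 2.5 * 0.0625 = 0.23625
P_t  = 1.0 / (ER1 - G)               # Gordon price with E_t D_{t+1} = 1:  ~7.34
P_t1 = 1.0 / (ER2 - G)               #                                     ~4.23
R_t1 = (P_t1 - P_t) / P_t            # realized return when volatility jumps

print(round(P_t, 2), round(P_t1, 2), round(R_t1, 2))   # 7.34 4.23 -0.42
```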


• When you look for a positive risk-return trade-off, you must look at the correct lead-lag relationship.

• You should not look at the contemporaneous relationship.

• Forecasting is not the same as estimation!


5 A Basic Structural Model: Consumption-Based Model (CCAPM)

• So far, we have estimated the APT and the CAPM.

• The CAPM and the APT capture risk and return, but are they related to our more fundamental needs, i.e., consumption of goods?

• Some of you have asked me to clarify: what do we mean by "the equity premium is too high" (the equity premium puzzle)?

• We will work out a "simple" model where assets are priced explicitly relative to our utility from consumption.

• This explicit model will generate a familiar stochastic discount factor pricing relation.


• First, we have to model the behavior of a representative investor.

• Think of this investor as the average person in the economy.

• The investor invests primarily so she can consume goods (cheese, Ferraris, etc.) tomorrow.

• The utility function of this investor, today and tomorrow, is:

$$U(c_t, c_{t+1}) = u(c_t) + \beta u(c_{t+1})$$

where $c_t$ denotes consumption at date $t$ and $\beta$ is a subjective discount factor.


• At the beginning, we will work with $u(c_t)$, where
  – $u(\cdot)$ is concave, reflecting a decreasing marginal value of consumption;
  – $u(\cdot)$ is increasing, reflecting an insatiable desire for more consumption;
  – the curvature generates aversion to risk and to inter-temporal substitution: the investor prefers a consumption stream that is steady over time and across states of nature.

• But to make the model operational (estimable), we have to give a functional form to $u(\cdot)$.

• We will assume that

$$u(c_t) = \frac{1}{1-\gamma}\,c_t^{1-\gamma}$$

where $\gamma$ captures the curvature.


• Here is the game:
  – We want to value an asset with an uncertain payoff $x_{t+1}$.
  – It is a two-period problem (today/tomorrow, or young/old, etc.).
  – The problem is:

$$\max_{\zeta}\; u(c_t) + E_t\left[\beta u(c_{t+1})\right]$$
$$\text{such that: } c_t = e_t - p_t\zeta, \qquad c_{t+1} = e_{t+1} + x_{t+1}\zeta$$

  – $e_t$ is the original endowment of the individual (cash he inherited from his parents).
  – At time $t$, he has an endowment $e_t$.
  – He decides to purchase $\zeta$ shares of the asset at price $p_t$.
  – Whatever is left over after the purchase of the asset, $e_t - p_t\zeta$, is used for consumption.
  – At time $t+1$, he has an endowment $e_{t+1}$, but also the payoff from the asset $x_{t+1}$ (think $p_{t+1} + d_{t+1}$) times the number of shares.
  – Since the individual "dies" at $t+1$, he must consume everything, so $c_{t+1} = e_{t+1} + x_{t+1}\zeta$.


• To recapitulate: we want to find the number of shares that would be bought at price $p_t$, given that the payoff from this investment is $x_{t+1}$.

• Implicitly, we will find the price of the asset as a function of the payoff and everything else.

• This is called an equilibrium model.

• This is also a structural model.

• Solving the maximization problem yields the FOC:

$$p_t u'(c_t) = E_t\left[\beta u'(c_{t+1})\,x_{t+1}\right]$$

or

$$p_t = E_t\left[\beta\frac{u'(c_{t+1})}{u'(c_t)}\,x_{t+1}\right]$$


• The equation

$$p_t = E_t\left[\beta\frac{u'(c_{t+1})}{u'(c_t)}\,x_{t+1}\right]$$

is quite interesting.

1. Note that if we write $m_{t+1} = \beta\frac{u'(c_{t+1})}{u'(c_t)}$ and interpret $m_{t+1}$ as the "stochastic" discount factor, then we get the familiar pricing equation, where $m_{t+1}$ is a function of consumption. Hence, this is called a consumption-based pricing model.

$$p_t = E_t\left[m_{t+1}x_{t+1}\right]$$

2. Note that we cannot go beyond this point without specifying a functional form for $u(\cdot)$, and hence for $u'(\cdot)$. In the above parameterization, $u'(c) = c^{-\gamma}$ and the FOC is:

$$p_t = E_t\left[\beta\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma}x_{t+1}\right]$$

3. Now, we can rewrite the model as:

$$1 = E_t\left[\beta\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma}\frac{x_{t+1}}{p_t}\right] = E_t\left[\beta\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma}(1+R_{t+1})\right]$$

4. If we have data on $R_{t+1}$ and on $c_t$, we can think of a way to estimate the parameters $\gamma$ and $\beta$.

5. Problem: the above relationship is nonlinear... we only know how to run linear regressions....


6. No problem...GMM
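A sketch of the GMM idea: replace $E_t[\cdot]$ with sample averages of the Euler-equation error interacted with instruments, and choose $(\beta, \gamma)$ to set those averages to zero. The instrument choice below (a constant and lagged consumption growth) and the simulated data are illustrative:

```python
import numpy as np
from scipy.optimize import least_squares

def euler_moments(params, R, cg, Z):
    """Sample moments (1/T) sum_t z_t * [beta * cg_t^{-gamma} * (1 + R_t) - 1],
    where cg_t = c_{t+1}/c_t is gross consumption growth and Z holds instruments."""
    beta, gamma = params
    u = beta * cg**(-gamma) * (1.0 + R) - 1.0     # Euler-equation error
    return (Z * u[:, None]).mean(axis=0)          # one sample moment per instrument

# Illustrative data in which the Euler equation holds exactly:
rng = np.random.default_rng(0)
T = 500
beta_true, gamma_true = 0.98, 2.0
cg = np.exp(0.02 + 0.01 * rng.standard_normal(T))   # gross consumption growth
R = cg**gamma_true / beta_true - 1.0                 # implied returns

Z = np.column_stack([np.ones(T - 1), cg[:-1]])       # constant + lagged growth
fit = least_squares(euler_moments, x0=[0.9, 1.0], args=(R[1:], cg[1:], Z))
print(fit.x)                                         # ~[0.98, 2.0]: exactly identified
```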


• Here is a good insight: the equation

$$1 = E_t\left[\beta\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma}(1+R_{t+1})\right]$$

must hold for any asset. Indeed, we did not specify a particular asset when deriving the above equations.

– Let's consider a particular asset, the risk-free asset with return $R^f$ (and forget uncertainty and expectations for a moment).

– The pricing equation can be rewritten as

$$(1+R^f) = \frac{1}{\beta}\left(\frac{c_{t+1}}{c_t}\right)^{\gamma}$$

– Take logs:

$$\ln(1+R^f) = r^f = -\ln\beta + \gamma\left(\ln c_{t+1} - \ln c_t\right)$$

– Note that $\ln c_{t+1} - \ln c_t$ is nothing but the growth in consumption between $t$ and $t+1$.

– We can estimate the above equation if we have data on $r^f$ and $c_t$ (a regression sketch follows below).

– What is the interpretation of $\gamma$?
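To see that the slope of this regression recovers $\gamma$, here is a mechanical sketch on simulated data (the consumption-growth process and parameter values are illustrative, and realized growth stands in for expected growth):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 400
gamma_true, beta = 2.0, 0.97
dc = 0.02 + 0.01 * rng.standard_normal(T)    # consumption growth ln c_{t+1} - ln c_t
rf = -np.log(beta) + gamma_true * dc          # r^f from the loglinear formula

X = np.column_stack([np.ones(T), dc])         # regress r^f on a constant and dc
coef, *_ = np.linalg.lstsq(X, rf, rcond=None)
print(coef[1])                                # ~2.0: the slope is gamma; the
                                              # intercept recovers -ln(beta)
```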


• Now, we have to take care of the uncertainty. For that we will use log-normality.

• If $X$ is conditionally lognormally distributed, it has the convenient property

$$\ln E_t(X) = E_t(\ln X) + \frac{1}{2}Var_t(\ln X)$$

• Recall our discussion (lecture 1) of $g(E(X)) \neq E(g(X))$. In the above example, $g(\cdot) = \ln(\cdot)$.

• In addition, we will assume that $Var_t(\ln X) = Var(\ln X) = \sigma_x^2$. (What is this assumption called?)


• Recall

$$1 = E_t\left[\beta\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma}(1+R_{t+1})\right]$$

• Taking logs,

$$0 = \ln E_t\left[\beta\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma}(1+R_{t+1})\right]$$
$$= E_t\left[\ln\left(\beta\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma}(1+R_{t+1})\right)\right] + \frac{1}{2}Var\left(\ln\left(\beta\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma}(1+R_{t+1})\right)\right)$$
$$= E_t\left[\ln(1+R_{t+1}) + \ln\beta - \gamma\left(\ln c_{t+1} - \ln c_t\right)\right] + \frac{1}{2}Var\left(\ln(1+R_{t+1}) + \ln\beta - \gamma\left(\ln c_{t+1} - \ln c_t\right)\right)$$
$$= E_t r_{t+1} + \ln\beta - \gamma E_t\left[\ln c_{t+1} - \ln c_t\right] + \frac{1}{2}\left[\sigma_r^2 + \gamma^2\sigma_{\Delta c}^2 - 2\gamma\sigma_{r,\Delta c}\right]$$


• The equation

$$0 = E_t r_{t+1} + \ln\beta - \gamma E_t\left[\ln c_{t+1} - \ln c_t\right] + \frac{1}{2}\left[\sigma_r^2 + \gamma^2\sigma_{\Delta c}^2 - 2\gamma\sigma_{r,\Delta c}\right]$$

is valid for any asset. It holds for the risk-free asset, where $\sigma_{r^f}^2 = \sigma_{r^f,\Delta c} = 0$. Therefore, we can write

$$r^f = -\ln\beta + \gamma E_t\left[\ln c_{t+1} - \ln c_t\right] - \frac{1}{2}\gamma^2\sigma_{\Delta c}^2$$

• Comparing this formula to what we had before, the term $-\frac{1}{2}\gamma^2\sigma_{\Delta c}^2$ adjusts for the variance, or uncertainty, in consumption, provided that all processes are lognormal.


• The advantage of this formula is that we can handle uncertainty and any other asset.

• From

$$E_t r_{t+1} = -\ln\beta + \gamma E_t\left[\ln c_{t+1} - \ln c_t\right] - \frac{1}{2}\left[\sigma_r^2 + \gamma^2\sigma_{\Delta c}^2 - 2\gamma\sigma_{r,\Delta c}\right]$$

• and

$$r^f = -\ln\beta + \gamma E_t\left[\ln c_{t+1} - \ln c_t\right] - \frac{1}{2}\gamma^2\sigma_{\Delta c}^2$$

we obtain the cool expression

$$E_t r_{t+1} - r^f = \gamma\sigma_{r,\Delta c} - \frac{\sigma_r^2}{2} = \gamma\,Cov\left(r_{t+1}, \ln c_{t+1} - \ln c_t\right) - \frac{Var(r_{t+1})}{2}$$

• In words, the (Jensen-adjusted) excess return equals $\gamma$ times the covariance of the asset return with consumption growth.

• What is this result reminiscent of?

• This is called the Consumption CAPM (or CCAPM).

• But then:
  – The CAPM does not hold. Does the CCAPM hold?
  – Why is this model better? (Before, $\beta$; now, $\gamma$.)


• The CCAPM is as successful as the CAPM, or even less so, but:
  – The coefficient $\gamma$ has a very nice interpretation: it measures our aversion to risk.
  – We have a consumption variable in the pricing kernel.
  – To test the CAPM we needed the market portfolio (Roll's critique). Similarly, now we need consumption.
  – The CCAPM (with the added log-linearity restrictions) is easy to test using regressions.


• Note that we have two regressions that we can run in order to estimate $\gamma$.

• First, using the riskless rate:

$$r^f = -\ln\beta + \gamma E_t\left[\ln c_{t+1} - \ln c_t\right] - \frac{1}{2}\gamma^2\sigma_{\Delta c}^2$$

• Second, using the risky rate:

$$E_t r_{t+1} - r^f = \gamma\sigma_{r,\Delta c} - \frac{\sigma_r^2}{2}$$

• Note that both equations must give us the same result (statistically speaking, at least).

• The trouble is that the estimates of $\gamma$ in those regressions are in total disagreement.

• Either the risky rate has been "too high" or the riskless rate has been "too low" to reconcile the model with the data.

• What's next: the assumption that the risk premium $E_t r_{t+1} - r^f$ does not vary with time has lately been seen as particularly bothersome.

• A few models have tried to relax this assumption, while keeping the economic story of the model.


6 Autocovariance Generating Function

Before we go on to spectral decomposition, we have to define the autocovariance generating function, $g$.

• A seemingly vacuous definition: for each covariance stationary process $Y_t$ with autocovariance function $\rho(j) = E(Y_t Y_{t-j})$, one way of summarizing the autocovariances is by defining an autocovariance generating function:

$$g_Y(z) = \sum_{h=-\infty}^{\infty}\rho(h)\,z^h$$

• This function is constructed by taking the $j$-th autocovariance, multiplying it by $z^j$, and summing over all $j$.

• For instance, $g_Y(1) = \sum_{h=-\infty}^{\infty}\rho(h)$ = the long-run variance of the process (or the sum of all its autocovariances).


• This definition would be pretty useless, were it not for the following observation:

– Suppose that

$$X_t = F(L)\varepsilon_t$$

– where $F(L)$ is the filter (a polynomial in $L$) and $\varepsilon_t$ is white noise.

• The autocovariance generating functions of $X_t$ and $\varepsilon_t$ are related by:

$$g_X(z) = F(z)F\left(z^{-1}\right)\sigma_\varepsilon^2$$

• We can generate the autocovariances of $X$ easily, by figuring out the filter $F(L)$.


• Example: AR(1)

$$(1-\phi L)X_t = \varepsilon_t$$
$$X_t = (1-\phi L)^{-1}\varepsilon_t$$

or $F(L) = (1-\phi L)^{-1}$. Then, we claim that:

$$g_x(z) = F(z)F\left(z^{-1}\right)\sigma_\varepsilon^2 = \frac{\sigma_\varepsilon^2}{(1-\phi z)\left(1-\phi z^{-1}\right)}$$

• Check:

$$\frac{\sigma_\varepsilon^2}{(1-\phi z)\left(1-\phi z^{-1}\right)} = \sigma_\varepsilon^2\left(1+\phi z+\phi^2 z^2+\ldots\right)\left(1+\phi z^{-1}+\phi^2 z^{-2}+\ldots\right)$$

• The coefficient on $z^j$ is:

$$\sigma_\varepsilon^2\left(\phi^j + \phi^{j+1}\phi + \phi^{j+2}\phi^2 + \ldots\right) = \phi^j\sigma_\varepsilon^2\left(1+\phi^2+\phi^4+\ldots\right) = \frac{\phi^j\sigma_\varepsilon^2}{1-\phi^2}$$

• So, looking at the coefficient on $z^j$ will give us the $j$-th autocovariance of $X_t$, once we know $g_x(z)$.

• Hence, $g_x(z)$ is the autocovariance-generating function of $X_t$.
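A quick numerical check of the claim $\rho(j) = \phi^j\sigma_\varepsilon^2/(1-\phi^2)$ against the sample autocovariances of a long simulated AR(1) (parameter values illustrative):

```python
import numpy as np

phi, sigma2, T = 0.7, 1.0, 200_000
rng = np.random.default_rng(1)
eps = np.sqrt(sigma2) * rng.standard_normal(T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = phi * x[t - 1] + eps[t]

for j in range(4):
    sample = np.mean(x[j:] * x[:T - j])           # sample E[X_t X_{t-j}]
    theory = phi**j * sigma2 / (1 - phi**2)       # coefficient on z^j in g_x(z)
    print(j, round(sample, 3), round(theory, 3))  # the two agree closely
```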


• Example: MA(1)

$$X_t = (1+\theta L)\varepsilon_t$$

or $F(L) = 1+\theta L$. Then we claim that

$$g_x(z) = F(z)F\left(z^{-1}\right)\sigma_\varepsilon^2 = (1+\theta z)\left(1+\theta z^{-1}\right)\sigma_\varepsilon^2 = \sigma_\varepsilon^2\left[1+\theta^2+\theta z+\theta z^{-1}\right]$$

• The coefficient on $z^0$ is $\left(1+\theta^2\right)\sigma_\varepsilon^2$ (this is $\rho(0)$).

• The coefficient on $z^1$ is $\sigma_\varepsilon^2\theta$ (this is $\rho(1)$).

• The coefficient on $z^{-1}$ is $\sigma_\varepsilon^2\theta$ (this is $\rho(-1)$).

• The coefficients on all other powers of $z$ are 0.


• The autocovariance generating function is THE way to compute autocovariances of general ARMA(p,q) processes:

$$\left(1-\phi_1 L-\phi_2 L^2-\ldots-\phi_p L^p\right)X_t = \left(1+\theta_1 L+\theta_2 L^2+\ldots+\theta_q L^q\right)\varepsilon_t$$
$$P(L)X_t = Q(L)\varepsilon_t$$
$$X_t = P(L)^{-1}Q(L)\varepsilon_t$$
$$X_t = F(L)\varepsilon_t$$

where $F(L) = P(L)^{-1}Q(L)$.

• Therefore:

$$g_x(z) = F(z)F\left(z^{-1}\right)\sigma_\varepsilon^2 = \sigma_\varepsilon^2\,\frac{\left(1+\theta_1 z+\ldots+\theta_q z^q\right)\left(1+\theta_1 z^{-1}+\ldots+\theta_q z^{-q}\right)}{\left(1-\phi_1 z-\ldots-\phi_p z^p\right)\left(1-\phi_1 z^{-1}-\ldots-\phi_p z^{-p}\right)}$$

• The coefficient on $z^j$ represents the $j$-th autocovariance of $X_t$.
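In practice one expands $F(L) = P(L)^{-1}Q(L)$ into its MA($\infty$) weights $\psi_j$ and reads off the coefficient on $z^j$ in $F(z)F(z^{-1})\sigma_\varepsilon^2$ as $\sigma_\varepsilon^2\sum_k\psi_k\psi_{k+j}$. A sketch that truncates the infinite expansion:

```python
import numpy as np

def arma_autocov(phi, theta, sigma2, nlags, trunc=500):
    """Autocovariances of an ARMA(p, q) from its ACGF g_x(z) = F(z) F(1/z) sigma2,
    with F(L) = Q(L)/P(L) expanded as psi_0 + psi_1 L + psi_2 L^2 + ..."""
    psi = np.zeros(trunc)
    psi[0] = 1.0
    th = np.zeros(trunc)
    th[1:len(theta) + 1] = theta
    for j in range(1, trunc):      # psi_j = theta_j + sum_k phi_k psi_{j-k}
        psi[j] = th[j] + sum(phi[k] * psi[j - 1 - k] for k in range(min(j, len(phi))))
    # the coefficient on z^j: sigma2 * sum_k psi_k psi_{k+j}
    return [sigma2 * float(np.dot(psi[:trunc - j], psi[j:])) for j in range(nlags + 1)]

print(arma_autocov(phi=[0.7], theta=[], sigma2=1.0, nlags=3))
# ~[1.961, 1.373, 0.961, 0.673]: matches phi^j sigma2/(1 - phi^2) from the AR(1) example
```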


• CONCLUSION: Since the autocovariance function $\rho(j)$ captures the dynamic behavior of $X_t$, the autocovariance generating function $g_x(z)$ does too.

• We will use transformations of $\rho(j)$ or of $g_x(z)$, but the information in those functions will be preserved.


7 Time Domain vs Frequency Domain

• Thus far, we have explicitly assumed that any covariance stationary process $Y_t$ can be written as

$$Y_t = \mu + \sum_{j=0}^{\infty}\theta_j\varepsilon_{t-j}$$

• where the $\theta_j$ are parameters and $\{\varepsilon_t\}$ is a sequence of white noise, or the error that one would make in forecasting $Y_t$ using past $Y_t$'s.

• Why? Well, if $\varepsilon_t$ is a white noise sequence, we argued that we can apply a (finite or infinite) filter $\theta(L)$ to obtain any covariance stationary process.

• This is known as the Wold Decomposition of a time series.

• It captures the behavior of the variable $Y$ at different points in time.

• We analyzed the time-domain properties of time series by looking at autocovariances, etc.

• Note: The increments are uncorrelated. Each innovation provides new information.


• Instead of looking at the time-series properties of the series, we ask the following question: how often does the time series exhibit certain periodic behavior?

• We will decompose a covariance stationary process into orthogonal cyclical components.

• Result: a covariance stationary process $Y_t$ can be written as:

$$Y_t = \int_0^{\pi}\alpha(\omega)\cos(\omega t)\,d\omega + \int_0^{\pi}\beta(\omega)\sin(\omega t)\,d\omega$$

• The goal is to determine how important the fluctuations at different frequencies $\omega$ are.

• In other words, we want to characterize the mean-zero random variables $\alpha(\omega)$ and $\beta(\omega)$, much as we were trying to find the parameters $\theta_j$ in the time-domain representation.


Recall:

$$\text{Period} \times \text{Frequency} = 2\pi$$

• High-frequency movements: things that happen very often in one period.

• Low-frequency movements: things that happen rarely.

• Result: Let $f(x)$ be an absolutely integrable function. Then we can write it as the Fourier series

$$f(x) = \frac{a_0}{2} + \sum_{k=1}^{\infty}\left(a_k\cos(kx) + b_k\sin(kx)\right)$$

• The function $f(\omega)$ defined by:

$$f(\omega) = \frac{1}{2\pi}\sum_{h=-\infty}^{\infty}\rho(h)e^{-i\omega h}$$

is the Fourier transform of the function $\rho(h)$. It is called the spectral density associated with $\rho(h)$.

• The function:

$$\rho(h) = \int_{-\pi}^{\pi}f(\omega)e^{i\omega h}\,d\omega$$

is the inverse Fourier transform of $f(\omega)$.

• The functions $\rho(\cdot)$ and $f(\cdot)$ are Fourier pairs.


Let's put the pieces together:

• First, recall that for a covariance stationary random variable $Y_t$ with autocovariances $\rho(h)$, we have the autocovariance generating function:

$$g_y(z) = \sum_{h=-\infty}^{\infty}\rho(h)\,z^h$$

• Second, the Fourier transform of $\rho(h)$ is

$$f(\omega) = \frac{1}{2\pi}\sum_{h=-\infty}^{\infty}\rho(h)e^{-i\omega h} = \frac{1}{2\pi}\,g_y\left(e^{-i\omega}\right)$$

• The function $f_y(\omega)$ is called the SPECTRUM of $Y_t$.

• For now, we can only say that the spectrum $f_y(\omega)$ is a transformation of $g_y(\cdot)$.


• From above,

$$f(\omega) = \frac{1}{2\pi}\sum_{h=-\infty}^{\infty}\rho(h)e^{-i\omega h}$$

• But de Moivre's theorem allows us to write

$$e^{-i\omega h} = \cos(\omega h) - i\sin(\omega h)$$

• Substituting above,

$$f(\omega) = \frac{1}{2\pi}\sum_{h=-\infty}^{\infty}\rho(h)\left[\cos(\omega h) - i\sin(\omega h)\right]$$

• But we know that $\rho(h) = \rho(-h)$. (Why?)

• Therefore:

$$f(\omega) = \frac{1}{2\pi}\rho(0)\left[\cos(0) - i\sin(0)\right] + \frac{1}{2\pi}\sum_{h=1}^{\infty}\rho(h)\left[\cos(\omega h) - i\sin(\omega h) + \cos(-\omega h) - i\sin(-\omega h)\right]$$


• But note that:

$$\cos(0) = 1,\quad \sin(0) = 0,\quad \sin(-\omega) = -\sin(\omega),\quad \cos(-\omega) = \cos(\omega)$$

• Therefore

$$f(\omega) = \frac{1}{2\pi}\rho(0) + \frac{1}{2\pi}\sum_{h=1}^{\infty}\rho(h)\left[\cos(\omega h) - i\sin(\omega h) + \cos(\omega h) + i\sin(\omega h)\right] = \frac{1}{2\pi}\rho(0) + 2\,\frac{1}{2\pi}\sum_{h=1}^{\infty}\rho(h)\cos(\omega h)$$

• Final two observations:
  – Since $\cos(\omega) = \cos(-\omega)$, the spectrum is symmetric around $\omega = 0$.
  – Since $\cos(\omega + 2\pi k) = \cos(\omega)$, if we know the value of $f(\omega)$ on $(0, \pi)$, we can infer its value for any $\omega$.
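The final formula is all a computer needs: given (truncated) autocovariances, evaluate $f(\omega) = \frac{1}{2\pi}\left[\rho(0) + 2\sum_{h\ge 1}\rho(h)\cos(\omega h)\right]$ on a grid. A sketch, fed with the AR(1) autocovariances derived earlier:

```python
import numpy as np

def spectrum_from_autocov(rho, omega):
    """f(omega) = (1/2pi) [rho(0) + 2 sum_{h>=1} rho(h) cos(omega h)],
    truncated at the length of the supplied autocovariance sequence rho."""
    h = np.arange(1, len(rho))
    tail = 2.0 * np.sum(rho[1:] * np.cos(np.outer(omega, h)), axis=1)
    return (rho[0] + tail) / (2.0 * np.pi)

phi = 0.5
rho = phi ** np.arange(200) / (1.0 - phi**2)   # AR(1) autocovariances, sigma2 = 1
w = np.linspace(0.0, np.pi, 5)
print(spectrum_from_autocov(rho, w))
# matches (1/2pi) / (1 + phi^2 - 2 phi cos w), the AR(1) spectrum derived below
```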


• Example: MA(1) process:

$$X_t = (1+\theta L)\varepsilon_t$$

• Recall that the spectrum of $X_t$ at frequency $\omega$ is defined as:

$$f_x(\omega) = \frac{1}{2\pi}\,g_x\left(e^{-i\omega}\right)$$

where

$$g_x\left(e^{-i\omega}\right) = \sigma_\varepsilon^2\left(1+\theta e^{-i\omega}\right)\left(1+\theta e^{i\omega}\right)$$

• Therefore,

$$f_x(\omega) = \frac{1}{2\pi}\sigma_\varepsilon^2\left(1+\theta e^{-i\omega}\right)\left(1+\theta e^{i\omega}\right) = \frac{1}{2\pi}\sigma_\varepsilon^2\left(1+\theta e^{-i\omega}+\theta e^{i\omega}+\theta^2\right)$$
$$= \frac{1}{2\pi}\sigma_\varepsilon^2\left(1+\theta^2+\theta\{\cos\omega - i\sin\omega\}+\theta\{\cos\omega + i\sin\omega\}\right) = \frac{1}{2\pi}\sigma_\varepsilon^2\left(1+\theta^2+2\theta\cos\omega\right)$$


• Plot of $f_x(\omega) = \frac{1}{2\pi}\sigma^2\left(1+\theta^2+2\theta\cos\omega\right)$ as a function of $\omega$, the frequency, for $\theta = 0.5$ and $\sigma^2 = 1$. [Figure: spectrum over $\omega \in (0, \pi)$, decreasing from low to high frequencies.]

• Plot of $f_x(\omega) = \frac{1}{2\pi}\sigma^2\left(1+\theta^2+2\theta\cos\omega\right)$ as a function of $\omega$, the frequency, for $\theta = -0.5$ and $\sigma^2 = 1$. [Figure: spectrum over $\omega \in (0, \pi)$, increasing from low to high frequencies.]
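The two figures are straightforward to reproduce; a sketch (assuming matplotlib is available):

```python
import numpy as np
import matplotlib.pyplot as plt

w = np.linspace(0.0, np.pi, 200)
sigma2 = 1.0
for theta in (0.5, -0.5):
    f = sigma2 * (1.0 + theta**2 + 2.0 * theta * np.cos(w)) / (2.0 * np.pi)
    plt.plot(w, f, label=f"theta = {theta}")
# theta = 0.5 concentrates variance at low frequencies (smooth series);
# theta = -0.5 concentrates it at high frequencies (choppy series).
plt.xlabel("frequency (w)")
plt.ylabel("f_x(w)")
plt.legend()
plt.show()
```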


• EXAMPLE: AR(1): $X_t = \phi X_{t-1} + \varepsilon_t$, or

$$(1-\phi L)X_t = \varepsilon_t$$

• Recall that the spectrum of $X_t$ at frequency $\omega$ is defined as:

$$f_x(\omega) = \frac{1}{2\pi}\,g_x\left(e^{-i\omega}\right)$$

where

$$g_x\left(e^{-i\omega}\right) = \frac{\sigma_\varepsilon^2}{\left(1-\phi e^{-i\omega}\right)\left(1-\phi e^{i\omega}\right)}$$

• Therefore

$$f_x(\omega) = \frac{1}{2\pi}\,\frac{\sigma_\varepsilon^2}{\left(1-\phi e^{-i\omega}\right)\left(1-\phi e^{i\omega}\right)} = \frac{1}{2\pi}\,\frac{\sigma_\varepsilon^2}{1-\phi e^{-i\omega}-\phi e^{i\omega}+\phi^2} = \frac{1}{2\pi}\,\frac{\sigma_\varepsilon^2}{1+\phi^2-2\phi\cos\omega}$$

• For $\phi > 0$, the denominator is monotonically increasing in $\omega$ over $[0, \pi]$, so $f_x(\omega)$ is monotonically decreasing in $\omega$.


• Graph of the spectrum of an AR(1) with $\phi = 0.2, 0.5, 0.7$. [Figure: spectra over $\omega \in (0, \pi)$, each decreasing in $\omega$, and steeper for larger $\phi$.]

• Graph of the spectrum of an AR(1) with $\phi = -0.2, -0.5, -0.7$. [Figure: spectra over $\omega \in (0, \pi)$, each increasing in $\omega$.]


• Q: What is the spectrum of a white noise process?

• Plot it as a function of $\omega$.


• Since we know the autocovariance generating function of an ARMA(p, q), we can derive its spectrum, as done before. You can convince yourselves that the spectrum of an ARMA(p, q) process is

$$f_x(\omega) = \frac{\sigma_\varepsilon^2}{2\pi}\,\frac{\prod_{j=1}^{q}\left(1+\eta_j^2-2\eta_j\cos\omega\right)}{\prod_{k=1}^{p}\left(1+\lambda_k^2-2\lambda_k\cos\omega\right)}$$

for parameters $\eta_j$ and $\lambda_k$.

• Such functions will usually not have the nice monotonic properties of AR(1) or MA(1) processes.

• It is not a problem to compute them with a computer.
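Indeed, a few lines suffice: evaluate $|Q(e^{-i\omega})|^2 / |P(e^{-i\omega})|^2$ on a grid of frequencies. A sketch, with illustrative ARMA(1,1) parameters:

```python
import numpy as np

def arma_spectrum(phi, theta, sigma2, omega):
    """Spectrum (sigma2 / 2pi) |Q(e^{-iw})|^2 / |P(e^{-iw})|^2 for
    P(z) = 1 - phi_1 z - ... - phi_p z^p, Q(z) = 1 + theta_1 z + ... + theta_q z^q."""
    z = np.exp(-1j * omega)
    q_coef = np.r_[np.asarray(theta)[::-1], 1.0]    # highest power first for polyval
    p_coef = np.r_[-np.asarray(phi)[::-1], 1.0]
    Qz = np.polyval(q_coef, z)
    Pz = np.polyval(p_coef, z)
    return sigma2 * np.abs(Qz)**2 / np.abs(Pz)**2 / (2.0 * np.pi)

w = np.linspace(0.0, np.pi, 5)
print(arma_spectrum(phi=[0.5], theta=[0.3], sigma2=1.0, omega=w))  # no monotonicity in general
```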


• CONCLUSION:

– If we know the autocovariances (and thus the autocovariance generating function), we can calculate $f_x(\omega)$ for any value of $\omega$:

$$f_x(\omega) = \frac{1}{2\pi}\sum_{h=-\infty}^{\infty}\rho(h)e^{-i\omega h}$$

– If we know $f_x(\omega)$ for any value of $\omega$, we can find all the autocovariances of the process. This follows because $f(\omega)$ and $\rho(h)$ are Fourier pairs, and we can write

$$\rho(h) = \int_{-\pi}^{\pi}f_x(\omega)e^{i\omega h}\,d\omega = \int_{-\pi}^{\pi}f_x(\omega)\cos(\omega h)\,d\omega$$

– Thus, the spectrum and the autocovariances contain EXACTLY the same information about the time series process.


• Q: OK, but why is the spectrum useful? We have $\rho(h)$ and it is quite intuitive.

• Note: for $h = 0$, we have

$$\rho(0) = Var(X_t) = \int_{-\pi}^{\pi}f_x(\omega)\,d\omega$$

• In other words, the variance of $X_t$ is the area under the spectrum between $-\pi$ and $\pi$.

• More generally,

$$\int_{-s}^{s}f_x(\omega)\,d\omega$$

is the portion of the variance of $X_t$ coming from frequencies between $-s$ and $s$!
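A numerical check of $\rho(0) = \int_{-\pi}^{\pi} f_x(\omega)\,d\omega$ for an AR(1), using a simple trapezoidal rule (parameter values illustrative):

```python
import numpy as np

phi, sigma2 = 0.6, 1.0
w = np.linspace(-np.pi, np.pi, 10_001)
f = sigma2 / (2.0 * np.pi) / (1.0 + phi**2 - 2.0 * phi * np.cos(w))  # AR(1) spectrum

dw = w[1] - w[0]
var_from_spectrum = np.sum((f[:-1] + f[1:]) / 2.0) * dw   # area under the spectrum
var_analytic = sigma2 / (1.0 - phi**2)                    # rho(0) for an AR(1)
print(var_from_spectrum, var_analytic)                    # both ~1.5625
```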
