Forecasting
Lecture 1: Foundations
Bruce E. Hansen
Central Bank of Chile
October 29-31, 2013
3-Day Course
Tuesday
- Point Forecasting
- Forecast Selection
- Leading Indicators
- Variance Forecasting
- Interval Forecasting

Wednesday
- Combination Forecasts
- Multi-Step Forecasting
- Fan Charts
Thursday: Structural Breaks
Course Website
www.ssc.wisc.edu/~bhansen/cbc
Slides for all lectures
Data for the lectures
R code for empirical analysis for lectures 1 & 2
Related: www.ssc.wisc.edu/~bhansen/crete
- 5-day forecasting course
- All the empirical analysis reported here was done in spring 2012, so the "forecasts" appear to be about the past
- We can compare the forecasts with realizations
Today’s Schedule
What is Forecasting?
Point Forecasting
Linear Forecasting Models
Forecast Selection: BIC, AIC, AICc, Mallows, Robust Mallows, FPE, Cross-Validation, PLS, LASSO
Leading Indicators
Variance Forecasting
Interval Forecasting
Example 1
U.S. Quarterly Real GDP
- 1960:1-2012:1
Figure: U.S. Real Quarterly GDP
Transformations
It is mathematically equivalent to forecast y_{n+h} or any monotonic transformation of y_{n+h} and lagged values.
- It is equivalent to forecast the level of GDP, its logarithm, or its percentage growth rate
- Given a forecast of one, we can construct the forecast of the other

Statistically, it is best to forecast a transformation which is close to iid.
- For output and prices, this typically means forecasting growth rates
- For interest rates, this typically means forecasting changes
Annualized Growth Rate
y_t = 400 (log(Y_t) − log(Y_{t−1}))
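As an illustration, the transformation is one line in R (the language of the course code on the website); a minimal sketch with made-up GDP levels, not the actual series:

```r
# Annualized quarterly growth: y_t = 400*(log(Y_t) - log(Y_{t-1}))
growth <- function(Y) 400 * diff(log(Y))

Y <- c(9666, 9720, 9801, 9855)  # hypothetical quarterly GDP levels
y <- growth(Y)                  # three growth-rate observations
```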
Figure: U.S. Real GDP Quarterly Growth
Example 2
U.S. Monthly 10-Year Treasury Rate
- 1960:1-2012:4
Figure: U.S. 10-Year Treasury Rate
Monthly Change
y_t = Y_t − Y_{t−1}
Figure: U.S. 10-Year Treasury Rate Change
Notation
- y_t: the time series to forecast
- n: the last observation
- n + h: the time period to forecast
- h: the forecast horizon
  - We often want to forecast at long, and multiple, horizons
  - For the first days we focus on one-step (h = 1) forecasts, as they are the simplest
- I_n: the information available at time n to forecast y_{n+h}
  - Univariate: I_n = (y_n, y_{n−1}, ...)
  - Multivariate: I_n = (x_n, x_{n−1}, ...), where x_t includes y_t, "leading indicators", covariates, dummy indicators
Forecast Distribution
When we say we want to forecast y_{n+h} given I_n:
- We mean that y_{n+h} is uncertain
- y_{n+h} has a (conditional) distribution
- y_{n+h} | I_n ~ F(y_{n+h} | I_n)

A complete forecast of y_{n+h} is the conditional distribution F(y_{n+h} | I_n) or density f(y_{n+h} | I_n).
F(y_{n+h} | I_n) contains all information about the unknown y_{n+h}.
Since F(y_{n+h} | I_n) is complicated (a distribution), we typically report low-dimensional summaries, and these are typically called forecasts.
Standard Forecast Objects
Point Forecast
Variance Forecast
Interval Forecast
Density Forecast
Fan Chart
All of these forecast objects are features of the conditional distribution
Today, we focus on point forecasts
Point Forecasts

f_{n+h|n}, the most common forecast object.
The "best guess" for y_{n+h} given the distribution F(y_{n+h} | I_n).
We can measure its accuracy by a loss function, typically squared error:
ℓ(f, y) = (y − f)²
The risk is the expected loss
E_n ℓ(f, y_{n+h}) = E((y_{n+h} − f)² | I_n)

The "best" point forecast is the one with the smallest risk:

f = argmin_f E((y_{n+h} − f)² | I_n) = E(y_{n+h} | I_n)

Thus the optimal point forecast is the true conditional expectation.
Point forecasts are estimates of the conditional expectation.
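A short derivation of the argmin step, expanding around the conditional mean (a standard calculation, written here in LaTeX):

```latex
% Let \mu_n = E(y_{n+h} \mid I_n). For any candidate forecast f,
E\left[(y_{n+h}-f)^2 \mid I_n\right]
  = E\left[(y_{n+h}-\mu_n)^2 \mid I_n\right] + (\mu_n - f)^2,
% because the cross term 2(\mu_n - f)\,E\left[(y_{n+h}-\mu_n)\mid I_n\right] = 0.
% The first term does not depend on f, so the risk is minimized at f = \mu_n.
```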
Estimation
The conditional distribution F(y_{n+h} | I_n) and the ideal point forecast E(y_{n+h} | I_n) are unknown.
They need to be estimated from data and economic models.

Estimation involves:
- Approximating E(y_{n+h} | I_n) with a parametric family
- Selecting a model within this parametric family
- Selecting a sample period (window width)
- Estimating the parameters

The goal of the above steps is not to uncover the "true" E(y_{n+h} | I_n), but to construct a good approximation.
Information Set
What variables are in the information set I_n? All past lags:
- I_n = (x_n, x_{n−1}, ...)

What is x_t?
- Own lags, "leading indicators", covariates, dummy indicators
Markov Approximation
E(y_{n+1} | I_n) = E(y_{n+1} | x_n, x_{n−1}, ...)
- Depends on the infinite past

We typically approximate the dependence on the infinite past with a Markov (finite memory) approximation.
For some p,

E(y_{n+1} | x_n, x_{n−1}, ...) ≈ E(y_{n+1} | x_n, ..., x_{n−p})

This should not be interpreted as true, but rather as an approximation.
Linear Approximation
While the true E(y_{n+1} | x_n, ..., x_{n−p}) is probably a nasty non-linear function, we typically approximate it by a linear function:

E(y_{n+1} | x_n, ..., x_{n−p}) ≈ β_0 + β_1′x_n + · · · + β_p′x_{n−p} = β′x_n

Again, this should not be interpreted as true, but rather as an approximation.
The error is defined as the difference between y_{t+1} and the linear function:

e_{t+1} = y_{t+1} − β′x_t
Linear Forecasting Model
We now have the linear point forecasting model
y_{t+1} = β′x_t + e_{t+1}

As this is an approximation, the coefficient and error are defined by projection:

β = (E(x_t x_t′))⁻¹ (E(x_t y_{t+1}))

e_{t+1} = y_{t+1} − β′x_t

E(x_t e_{t+1}) = 0

σ² = E(e²_{t+1})

The conditional variance σ²_t = E(e²_{t+1} | I_t) may be time-varying.
Univariate (Autoregressive) Model
x_t = (y_t, y_{t−1}, ..., y_{t−k+1})
A linear forecasting model is
y_{t+1} = β_0 + β_1 y_t + β_2 y_{t−1} + · · · + β_k y_{t−k+1} + e_{t+1}

AR(k): autoregression of order k.
- Typical AR(k) models add a stronger assumption about the error e_{t+1}:
  - IID (independent)
  - MDS (unpredictable)
  - white noise (linearly unpredictable/uncorrelated)
- These assumptions are convenient for analytic purposes (calculations, simulations)
- But they are unlikely to be true
  - Making an assumption does not make the assumption true
  - Do not confuse assumptions with truth
Least Squares Estimation
β̂ = (Σ_{t=0}^{n−1} x_t x_t′)⁻¹ (Σ_{t=0}^{n−1} x_t y_{t+1})

ŷ_{n+1|n} = f̂_{n+1|n} = β̂′x_n
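A minimal R sketch of the estimator and the one-step forecast above; 'y' is assumed to be a stationary series such as annualized GDP growth (illustrative, not the course's posted code):

```r
# Least-squares AR(k) fit and the one-step point forecast f_{n+1|n} = b'x_n
ar_forecast <- function(y, k) {
  Z   <- embed(y, k + 1)      # row t: (y_t, y_{t-1}, ..., y_{t-k})
  fit <- lm(Z[, 1] ~ Z[, -1]) # regress y_t on its k lags, with intercept
  b   <- coef(fit)
  xn  <- rev(tail(y, k))      # regressors at the forecast origin: (y_n, ..., y_{n-k+1})
  c(forecast = sum(b * c(1, xn)))
}
```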
GDP Example
y_t = Δlog(GDP_t), quarterly
AR(4) (a reasonable benchmark for quarterly data)

y_{t+1} = β_0 + β_1 y_t + β_2 y_{t−1} + β_3 y_{t−2} + β_4 y_{t−3} + e_{t+1}

                  β̂      s(β̂)
Intercept         1.54   (0.45)
Δlog(GDP_t)       0.29   (0.09)
Δlog(GDP_{t−1})   0.18   (0.10)
Δlog(GDP_{t−2})  −0.05   (0.08)
Δlog(GDP_{t−3})   0.06   (0.10)
Point Forecast - GDP Growth
AR(4)
         Data   Forecast   Actual
2011:1   0.36
2011:2   1.33
2011:3   1.80
2011:4   2.91
2012:1   1.84
2012:2          2.59       1.20
Interest Rate Example

y_t = ΔRate_t
AR(12) (a reasonable benchmark for monthly data)

               β̂       s(β̂)
Intercept     −0.002   (0.01)
ΔRate_t        0.40    (0.06)
ΔRate_{t−1}   −0.26    (0.07)
ΔRate_{t−2}    0.11    (0.06)
ΔRate_{t−3}   −0.07    (0.07)
ΔRate_{t−4}    0.10    (0.07)
ΔRate_{t−5}   −0.08    (0.07)
ΔRate_{t−6}   −0.05    (0.06)
ΔRate_{t−7}   −0.09    (0.06)
ΔRate_{t−8}   −0.01    (0.07)
ΔRate_{t−9}    0.03    (0.07)
ΔRate_{t−10}   0.09    (0.07)
ΔRate_{t−11}  −0.08    (0.06)
Point Forecast - 10-year Treasury Rate
AR(12)
         Data            Forecast        Actual
         Level  Change   Level  Change   Level  Change
2012:1   1.97   −0.01
2012:2   1.97    0.00
2012:3   2.17    0.20
2012:4   2.05   −0.12
2012:5                   1.93   −0.12    1.82   −0.23
Forecast Selection
Forecast Selection

We used (arbitrarily) an AR(4) for GDP, and an AR(12) for the 10-year rate.
The forecasts will be sensitive to this choice.

GDP Example:

Model    Forecast
AR(0)    2.99
AR(1)    2.59
AR(2)    2.65
AR(3)    2.68
AR(4)    2.59
AR(5)    2.83
AR(6)    2.83
AR(7)    2.83
AR(8)    2.78
AR(9)    2.87
AR(10)   2.87
AR(11)   2.91
AR(12)   3.45
Forecast Selection - Big Picture
What is the goal?
- Accurate forecasts
  - Low risk (low MSFE)

Finding the "true" model is irrelevant.
- The true model may be an AR(∞) or have a very large number of non-zero coefficients
Testing
It is common to use statistical tests to select empirical models
This is inappropriate.
- Tests answer the scientific question: is there sufficient evidence to reject the hypothesis that this coefficient is zero?
- Tests are not designed to answer the question: which estimate yields the better forecast?

This is not a minor issue.
- There is a lengthy statistics literature documenting the poor properties of "post-selection" estimators.
- Estimators based on testing have particularly bad properties

Tests are appropriate for answering scientific questions about parameters.
Standard errors are appropriate for measuring estimation precision.
For model selection, we want something different.
Model Selection: Framework
Set of estimates (models)
- β̂(m), m = 1, ..., M

Corresponding forecasts f_{n+1|n}(m).

There is some population criterion C(m) which evaluates the accuracy of f_{n+1|n}(m).
- m₀ = argmin_m C(m) is the infeasible best estimator

There is a sample estimate Ĉ(m) of C(m).
m̂ = argmin_m Ĉ(m) is the empirical analog of m₀.
β̂(m̂) is the selected estimator.
f_{n+1|n}(m̂) is the selected forecast.
Point Forecast and MSFE

Given an estimate β̂(m) of β, the point forecast for y_{n+1} is

f̂_{n+1|n} = β̂(m)′x_n

The forecast error is

y_{n+1} − f̂_{n+1|n} = x_n′β + e_{n+1} − x_n′β̂(m) = e_{n+1} − x_n′(β̂(m) − β)

The mean-squared forecast error (MSFE) is

MSFE(m) = E(e_{n+1} − x_n′(β̂(m) − β))² ≃ σ² + E((β̂(m) − β)′ Q(m) (β̂(m) − β))

where Q(m) = E(x_n x_n′).
A good forecast has low MSFE.
Selection Criterion
Bayesian Information Criterion (BIC)
- C(m) = P(m is true)

Akaike Information Criterion (AIC), corrected AIC (AICc)
- C(m) = KLIC

Mallows, Predictive Least Squares, Final Prediction Error, Leave-one-out Cross-Validation:
- C(m) = MSFE

LASSO
- Penalized least squares
Important: Sample must be constant when comparing models
This requires careful treatment of samples
Suppose you observe y_t, t = 1, ..., n.
Estimation of an AR(k) requires k initial conditions, so the effective sample is for observations t = 1 + k, ..., n.
The sample varies with k; the sample size is n − k.
For valid comparison of AR(k) models for k = 1, ..., K (see the sketch below):
- Fix the sample with observations t = 1 + K, ..., n
- n − K observations
- Estimate all AR(k) models using these same n − K observations
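A sketch of this construction in R; 'y' is the observed series and K the largest lag order under consideration (names are mine):

```r
# Estimate AR(k), k = 1..K, on the common sample t = K+1, ..., n (n - K obs).
fit_common_sample <- function(y, K) {
  Z <- embed(y, K + 1)  # row t: (y_t, y_{t-1}, ..., y_{t-K}), t = K+1..n
  lapply(1:K, function(k)
    lm(Z[, 1] ~ Z[, 2:(k + 1), drop = FALSE]))  # AR(k) on the same rows
}
```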
Bayesian Information Criterion (BIC)
M models, with equal prior probability that each is the "true" model.
Compute posterior probability that model m is true, given data
Schwarz showed that in the normal linear regression model the posterior probability is proportional to

p(m) ∝ exp(−BIC(m)/2)

BIC(m) = n log σ̂²(m) + log(n) k(m)

where
- k(m) = # of parameters
- σ̂²(m) = n⁻¹ Σ_{t=0}^{n−1} ê²_{t+1}(m) = the MLE estimate of σ² in model m

The model with the highest probability maximizes p(m), or equivalently minimizes BIC(m).
Bayesian Information Criterion - Properties
Consistent
- If the true model is finite dimensional, BIC will identify it asymptotically

Conservative
- Tends to pick small models

Inefficient in nonparametric settings
- If there is no true finite-dimensional model, BIC is sub-optimal
- It does not select a finite-sample optimal model

We are not interested in "truth"; rather, we want good performance.
Akaike Information Criterion (AIC)
Estimates the Kullback-Leibler information criterion (KLIC) distance between the true and estimated density.
AIC
In the linear regression model
AIC = 2L(θ̂) + 2k = n log σ̂²(m) + 2k(m)

Similar in form to BIC, but "2" replaces log(n).
Picking the model with the smallest AIC is picking the model with the smallest estimated KLIC.
Corrected AIC
In the normal linear regression model, Hurvich-Tsai (1989) calculated the exact AIC:

AICc(m) = AIC(m) + 2k(m)(k(m) + 1) / (n − k(m) − 1)

Works better in finite samples than the uncorrected AIC.
It is an exact correction when the true model is a linear regression (not time series) with iid normal errors.
With time-series data or non-normal errors, it is not an exact correction.
Comments on AIC Selection
Widely used, partially because of its simplicity
Full justification requires correct specification
- normal linear regression

Critical specification assumption: homoskedasticity
- AIC is a biased estimate of the KLIC under heteroskedasticity

Criterion: KLIC
- Not a natural measure of forecast accuracy
Mallows Criterion
C_n(m) = σ̂²(m) + (2/n) σ̃² k(m)

Uses a preliminary estimate σ̃² of the variance.
C_n(m) is an (approximately) unbiased estimate of the MSFE under homoskedasticity and one-step forecasting.
The model m which minimizes the Mallows criterion is an estimate of the lowest-MSFE model.
Final Prediction Error (FPE) Criterion
FPE_n(m) = σ̂²(m) (1 + 2k(m)/n)
Relation between Mallows, FPE, and Akaike
Take the log of FPE and multiply by n:

n log(FPE_n(m)) = n log(σ̂²(m)) + n log(1 + 2k(m)/n)
                ≃ n log(σ̂²(m)) + 2k(m)
                = AIC(m)

Thus Mallows, FPE and Akaike model selection are quite similar.
Mallows, FPE, and exp(AIC(m)/n) are estimates of the MSFE under homoskedasticity.
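A sketch computing these criteria for one fitted model in R; 'fit' is an lm object from the common-sample construction above, and 'sig2' is the preliminary variance estimate for Mallows (e.g. from the largest model considered), both my naming choices:

```r
criteria <- function(fit, sig2) {
  e  <- residuals(fit)
  n  <- length(e)
  k  <- length(coef(fit))  # number of parameters, including the intercept
  s2 <- mean(e^2)          # MLE-type variance estimate for model m
  c(BIC     = n * log(s2) + log(n) * k,
    AIC     = n * log(s2) + 2 * k,
    AICc    = n * log(s2) + 2 * k + 2 * k * (k + 1) / (n - k - 1),
    Mallows = s2 + (2 / n) * sig2 * k,
    FPE     = s2 * (1 + 2 * k / n))
}
```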
Robust Mallows
C*_n(m) = σ̂²(m) + (2/n) tr(Q̂(m)⁻¹ Ω̂(m))

Q̂(m) = (1/n) Σ_{t=0}^{n−1} x_t x_t′

Ω̂(m) = (1/n) Σ_{t=0}^{n−1} x_t x_t′ ẽ²_{t+1}

where ẽ_{t+1} is the residual from a preliminary estimate.

An estimate of the MSFE, robust to heteroskedasticity and serial correlation (similar to HAC standard errors).
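A sketch of C*_n under these definitions; 'X' is assumed to be the regressor matrix with rows x_t (intercept column included) and 'y1' the aligned vector of y_{t+1}, with the preliminary residuals taken, as a simplification, from a least-squares fit of the same model:

```r
robust_mallows <- function(X, y1) {
  n  <- nrow(X)
  e  <- lm.fit(X, y1)$residuals  # preliminary residuals
  Q  <- crossprod(X) / n         # (1/n) sum_t x_t x_t'
  Om <- crossprod(X * e) / n     # (1/n) sum_t x_t x_t' e_{t+1}^2
  mean(e^2) + (2 / n) * sum(diag(solve(Q, Om)))
}
```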
Cross-Validation
Leave-one-out estimator and prediction residual
β̂_{−t}(m) = (Σ_{j≠t} x_j(m) x_j(m)′)⁻¹ (Σ_{j≠t} x_j(m) y_{j+1})

ẽ_{t+1}(m) = y_{t+1} − β̂_{−t}(m)′x_t(m)

ẽ_{t+1}(m) is a forecast error based on estimation without observation t:

E(ẽ_{t+1}(m)²) ≃ MSFE_n(m)

CV_n(m) = (1/n) Σ_{t=0}^{n−1} ẽ_{t+1}(m)² is an estimate of MSFE_n(m).

Called the leave-one-out cross-validation (CV) criterion.
Similar to the Robust Mallows criterion.
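For least squares there is no need to re-estimate n times: the standard algebraic identity ẽ_{t+1} = ê_{t+1}/(1 − h_t), with h_t the leverage of observation t, gives the criterion directly. A sketch:

```r
cv_criterion <- function(fit) {
  e_loo <- residuals(fit) / (1 - hatvalues(fit))  # leave-one-out prediction residuals
  mean(e_loo^2)                                   # CV_n(m), an estimate of MSFE_n(m)
}
```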
Comments on CV Selection
Selecting one-step forecast models by cross-validation is computationally simple, generally valid, and robust to heteroskedasticity.
Does not require correct specification
Similar to robust Mallows
Similar to Mallows, AIC and FPE under homoskedasticity
Conceptually easy to generalize beyond least-squares estimation
Predictive Least Squares (Out-of-Sample MSFE)
Sequential estimates
β̂_t(m) = (Σ_{j=0}^{t−1} x_j(m) x_j(m)′)⁻¹ (Σ_{j=0}^{t−1} x_j(m) y_{j+1})

Sequential prediction residuals:

ẽ_{t+1}(m) = y_{t+1} − β̂_t(m)′x_t(m)

Predictive least squares: for some P,

PLS_n(m) = (1/P) Σ_{t=n−P}^{n−1} ẽ_{t+1}(m)²

Major difficulty: PLS is very sensitive to P.
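A sketch of PLS under these definitions; 'X' has rows x_t aligned so that row t pairs with y1[t] = y_{t+1}, and P is the number of pseudo out-of-sample forecasts (P must leave enough initial observations to fit the model):

```r
pls_criterion <- function(X, y1, P) {
  n <- nrow(X)
  e <- numeric(P)
  for (i in 1:P) {
    t0 <- n - P + i                               # forecast origin
    b  <- lm.fit(X[1:(t0 - 1), , drop = FALSE],
                 y1[1:(t0 - 1)])$coefficients     # fit on data through t0 - 1
    e[i] <- y1[t0] - sum(b * X[t0, ])             # sequential prediction residual
  }
  mean(e^2)
}
```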
Comments on Predictive Least Squares
Conceptually simple, easy to generalize beyond least squares
- Can be applied to actual forecasts, without need to know the forecast method

The ẽ_{t+1}(m) are fully valid prediction errors.

Possibly more robust to structural change than CV
- Intuitive, but this claim has not been formally justified

Very common in applied forecasting
- Frequently asserted as "empirical performance"

On the negative side, PLS over-estimates the MSFE
- ẽ_{t+1}(m) is a prediction error from a sample of length t < n
- PLS will tend to be overly parsimonious
- Very sensitive to the number of pseudo out-of-sample observations P
Theory of Optimal Selection
MSFE_n(m) is the MSFE from model m.
inf_m MSFE_n(m) is the (infeasible) best MSFE.
Let m̂ be the selected model.
Let MSFE_n(m̂) denote the MSFE using the selected estimator.
We say that selection is asymptotically optimal if

MSFE_n(m̂) / inf_m MSFE_n(m) →_p 1
Theory of Optimal Selection
A series of papers have shown that AIC, Mallows, FPE are asymptotically optimal for selection.

Assumptions
- Autoregressions
- Errors are iid, homoskedastic
- True model is AR(∞)

Shibata (Annals of Statistics, 1980), Ching-Kang Ing with co-authors (2003, 2005, etc.)

Proof method: show that the selection criterion is uniformly close to the MSFE.
Theory of Optimal Selection - Regression Case
In the regression (iid data) case:
Li (1987), Andrews (1991), Hansen (2007), Hansen and Racine (2012)
AIC, Mallows, FPE, CV are asymptotically optimal for selection under homoskedasticity.
CV is asymptotically optimal for selection under heteroskedasticity.
Forecast Selection - Summary
Testing inappropriate for forecast selection
Feasible selection criteria: BIC, AIC, AICc, Mallows, Robust Mallows, FPE, PLS, CV
Valid comparisons require holding sample constant across models
All methods except CV and PLS require conditional homoskedasticity
PLS sensitive to choice of P
BIC appropriate when true structure is sparse
CV is quite general and flexible
- Recommended method
GDP Example

Methods: BIC, AICc, Robust Mallows (C*_n), CV

Model    BIC   AICc   C*_n   CV
AR(1)    473   466    10.7   10.7
AR(2)    472   462    10.6   10.5
AR(3)    477   464    10.7   10.7
AR(4)    481   465    10.8   10.8
AR(5)    483   464    10.8   10.8
AR(6)    489   466    11.0   10.9
AR(7)    494   468    11.1   11.1
AR(8)    498   470    11.3   11.2
AR(9)    500   469    11.3   11.2
AR(10)   505   471    11.4   11.4
AR(11)   511   473    11.5   11.5
AR(12)   511   471    11.4   11.3

All methods select AR(2).
10-Year Treasury Rate
Model    BIC      AICc     C*_n      CV
AR(1)    −1518    −1527    0.0798    0.0798
AR(2)    −1541*   −1554    0.0768*   0.0768*
AR(3)    −1538    −1555    0.0769    0.0769
AR(4)    −1532    −1554    0.0773    0.0773
AR(6)    −1531    −1561    0.0772    0.0770
AR(8)    −1522    −1562    0.0777    0.0774
AR(10)   −1513    −1561    0.0784    0.0781
AR(12)   −1506    −1563    0.079     0.0787
AR(20)   −1471    −1561    0.081     0.080
AR(22)   −1470    −1570*   0.081     0.080
AR(24)   −1458    −1565    0.081     0.081

(* marks each criterion's minimum)

Mallows, AICc, FPE select AR(22).
Robust Mallows, CV select AR(2).
The difference is due to conditional heteroskedasticity.
AR(2) through AR(6) are near equivalent with respect to C*_n and CV.
Point Forecast - GDP Growth
AR(2)
         Data   Forecast   Actual
2011:1   0.36
2011:2   1.33
2011:3   1.80
2011:4   2.91
2012:1   1.84
2012:2          2.65       1.20
Point Forecast - 10-year Treasury Rate
AR(2)
         Data            Forecast        Actual
         Level  Change   Level  Change   Level  Change
2012:1   1.97   −0.01
2012:2   1.97    0.00
2012:3   2.17    0.20
2012:4   2.05   −0.12
2012:5                   1.96   −0.09    1.82   −0.23
Forecasting with Leading Indicators
Recall, the ideal forecast is
E(y_{n+1} | I_n) = E(y_{n+1} | x_n, x_{n−1}, ...)

where I_n contains all information.

x_n = lags + leading indicators
- Variables which help predict y_{t+1}
- We have focused on univariate lags
- Typically there is more information in related series
- Which?
Good Leading Indicators
Measured quickly
Anticipatory
Varies by forecast variable
Interest Rate Spreads
Difference between Long and Short Rate
Measured immediately
Indicate monetary policy, aggregate demand
Term Structure of Interest Rates:
The long rate is the market expectation of the average of future short rates.
The spread is the market expectation of future short rates.
I use U.S. Treasury rates: the difference between the 10-year and 3-month rates.
Figure: 10-Year and 3-Month T-Bill Rates
Figure: Term Spread
High Yield Spread
“Riskless” rate: U.S. Treasury
Low-risk rate: AAA grade corporate bond
High Yield rate: Low grade corporate bond
Theory: high-yield rate includes premium for probability of default
Low-grade bond rates increase with the probability of default, which rises when real activity is expected to fall.

Spread: the difference between corporate bond rates.
I use the difference between AAA and BAA bond rates.
Figure: AAA and BAA Corporate Bond Rates
Figure: High Yield Spread
Construction Indicators
Building Permits
Housing Starts
Anticipate construction spending
Figure: Housing Starts, Building Permits
Mixed Frequency Data
U.S. GDP is measured quarterly
Interest rates: Daily
Permits: Monthly
Simplest approach: quarterly aggregation
- Aggregate (average) the daily and monthly variables to the quarterly level

Mixed-frequency approach
- Use the higher-frequency data directly as predictors

For now, we use aggregate (quarterly) data.
Timing
Variables reported in separate sequences
Should we use only "quarter 1" variables to forecast "quarter 2"?
Or should we use whatever is available?
- E.g., use quarter 2 interest rates to forecast quarter 1 GDP?

Let's use quarter 1 data to forecast quarter 2.
Model Selection by CV

All estimates include an intercept plus two lags of GDP growth.

Model             CV      Forecast
Spread            10.4    2.8
HY Spread         10.6    2.5
Housing Starts    10.3    1.4
Building Permits  10.3    1.7
Sp+HY             10.3    2.7
Sp+HS             10.02   1.5
Sp+BP             10.1    1.9
HY+HS             10.4    1.4
HY+BP             10.4    1.6
HS+BP             10.4    1.4
Sp+HY+HS          10.00   1.3
Sp+HY+BP          10.1    1.7
Sp+HS+BP          10.05   1.3
HY+HS+BP          10.5    1.3
Sp+HY+HS+BP       10.02   1.1

CV-selected forecast: 1.3%
Actual: 1.2%
Coefficient Estimates

Δlog(GDP_{t+1})        β̂      s(β̂)
Intercept             −0.33   (1.03)
Δlog(GDP_t)            0.16   (0.10)
Δlog(GDP_{t−1})        0.09   (0.10)
Bond Spread_t          0.61   (0.23)
High Yield Spread_t   −1.10   (0.75)
Housing Starts_t       1.86   (0.65)
Variance Forecasting
Variance Forecasts
Forecast uncertainty
- Point forecasts are insufficient!

σ²_{t+1} = var(y_{t+1} | I_t)

In the model y_{t+1} = β′x_t + e_{t+1}:
- σ²_{t+1} = var(e_{t+1} | I_t) = E(e²_{t+1} | I_t)
10-Year Bond Rate
- Prediction residuals
- Squares
Figure: Leave-One-Out Prediction Residuals
Figure: Squared Prediction Residuals
Variance Forecast Methods
Constant variance: σ²_t = σ²
- Uncertainty is not state-dependent

GARCH
- Common in financial data
- Estimated by MLE

Regression approach
- σ²_t = E(e²_{t+1} | I_t) ≈ α′x_t
2-Step Variance Estimation
Start with residuals ê_{t+1}
- Better choice: leave-one-out residuals ẽ_{t+1}

Estimate the variance model (constant, ARCH, or regression).
Obtain σ̂²_n from the fitted model.
Which Residuals?
The least-squares residual variance is biased toward zero
- The forecast variance would be biased towards zero

The leave-one-out residual variance estimates the out-of-sample MSFE
- Better choice

ẽ_{t+1}(m) = y_{t+1} − β̂_{−t}(m)′x_t(m)

β̂_{−t}(m) = (Σ_{j≠t} x_j(m) x_j(m)′)⁻¹ (Σ_{j≠t} x_j(m) y_{j+1})
Constant Variance Model
σ²_t = σ²

σ̂²_n = σ̂² = (1/(n−1)) Σ_{t=1}^{n−1} ẽ²_{t+1}
Regression Variance Model
σ²_t ≈ α′x_t

ẽ²_{t+1} = α′x_t + η_t

α̂ = (Σ_{t=1}^{n−1} x_t x_t′)⁻¹ (Σ_{t=1}^{n−1} x_t ẽ²_{t+1})

σ̂²_n = α̂′x_n
- Easy, but not constrained to (0, ∞)
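A sketch of this regression approach, assuming 'X' holds the rows x_t, 'e2' the squared leave-one-out residuals ẽ²_{t+1}, and 'xn' the forecast-origin regressors; the positivity floor is my addition, reflecting the caveat above:

```r
variance_regression <- function(X, e2, xn) {
  a <- lm.fit(X, e2)$coefficients  # regress squared residuals on x_t
  max(sum(a * xn), 1e-8)           # crude floor, since a'x_n can be negative
}
```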
GARCH Models
σ²_t = ω + βσ²_{t−1} + αe²_t

The conditional variance of e_{t+1}.
Specifies the conditional variance as a function of recent squared innovations.
Large innovations (in magnitude) raise the conditional variance.
The lagged variance smooths σ²_t.
Non-negativity constraints: ω > 0, β ≥ 0, α > 0.
GARCH with Regressors
σ²_t = ω + βσ²_{t−1} + αe²_t + γx_t

x_t > 0: it is useful to constrain the regressor to be positive.
Estimation by Quasi-Likelihood
Numerical optimization
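A self-contained sketch of Gaussian quasi-likelihood estimation for a GARCH(1,1), log-parameterized so the positivity constraints hold automatically; in practice one would typically use a packaged routine, and the initialization choices here are assumptions:

```r
garch11 <- function(e) {
  n <- length(e)
  filt <- function(omega, alpha, beta) {  # variance recursion
    s2 <- numeric(n); s2[1] <- var(e)     # initialize at the sample variance
    for (t in 2:n) s2[t] <- omega + beta * s2[t - 1] + alpha * e[t - 1]^2
    s2
  }
  negll <- function(p) {                  # negative Gaussian quasi-log-likelihood
    s2 <- filt(exp(p[1]), exp(p[2]), exp(p[3]))
    0.5 * sum(log(s2) + e^2 / s2)
  }
  p  <- optim(log(c(0.1 * var(e), 0.1, 0.8)), negll)$par
  th <- exp(p); s2 <- filt(th[1], th[2], th[3])
  list(omega = th[1], alpha = th[2], beta = th[3],
       s2.next = th[1] + th[3] * s2[n] + th[2] * e[n]^2)  # sigma^2_{n+1} forecast
}
```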
Model Selection
A model with 2 ARCH lags and 2 regressors:

σ²_t = ω + βσ²_{t−1} + α₁e²_t + α₂e²_{t−1} + γ₁x_{1t} + γ₂x_{2t}

How many lags? How many regressors?

The presence of the lagged σ²_{t−1} complicates issues
- β is not identified when α₁ = α₂ = γ₁ = γ₂ = 0
- This means conventional tests and information criteria are not correct when the process is close to constant variance
- We typically ignore this complication

Since estimation is nonlinear MLE, much of the model selection & combination literature is not relevant
- AIC is appropriate
- Unfortunately, it is not easy to compute with standard packages
AIC for GARCH models
If model m has parameter vector θ(m) with k(m) elements:

AIC(m) = 2L(θ̂(m)) + 2k(m)

Not standard output.
Variance Forecast from GARCH model
σ²_{n+1} = ω + βσ²_n + α₁e²_n

σ̂²_{n+1} = ω̂ + β̂σ̂²_n + α̂₁ê²_n

σ̂²_{n+1} is the estimated conditional variance of y_{n+1}.
Standard deviation: √σ̂²_{n+1}
Example: 10-Year Bond Rate
GARCH(1,1)
σ²_t = ω + αe²_t + βσ²_{t−1}

     Estimate   s.e.
ω    0.0001     0.0001
α    0.200      0.041
β    0.835      0.025
Variance Forecast
Conditional variance
- σ̂²_{n+1} = 0.054
- σ̂_{n+1} = 0.23

Unconditional
- σ̂² = 0.076
- σ̂ = 0.28

The conditional variance at present is similar to, but somewhat smaller than, the unconditional variance.
Figure: Estimated Variance
Example: GDP Growth
Figure: GDP: Leave-One-Out Prediction Residuals
Figure: GDP: Squared Prediction Residuals
GARCH(1,1)

σ²_t = ω + αe²_t + βσ²_{t−1}

     Estimate   s.e.
ω    0.81       0.46
α    0.21       0.06
β    0.72       0.06

Conditional variance
- σ̂²_{n+1} = 4.1
- σ̂_{n+1} = 2.0

Unconditional
- σ̂² = 9.8
- σ̂ = 3.1
Figure: GDP: Estimated Variance
Interval Forecasting
Interval Forecasts
Take the form [a, b]
Should contain y_{n+1} with probability 1 − 2α:

1 − 2α = P_n(y_{n+1} ∈ [a, b]) = P_n(y_{n+1} ≤ b) − P_n(y_{n+1} ≤ a) = F_n(b) − F_n(a)

where F_n(y) is the forecast distribution.
It follows that

a = q_n(α)
b = q_n(1 − α)

i.e., a is the α'th and b the (1 − α)'th quantile of the conditional distribution.
Interval Forecasts are Conditional Quantiles
The ideal 80% forecast interval is the 10% and 90% quantiles of the conditional distribution of y_{n+1} given I_n.
Our feasible forecast intervals are estimates of the 10% and 90% quantiles of the conditional distribution of y_{n+1} given I_n.
The goal is to estimate conditional quantiles.
Mean-Variance Model
Write
y_{t+1} = μ_t + σ_t ε_{t+1}

μ_t = E(y_{t+1} | I_t)
σ²_t = var(y_{t+1} | I_t)

Assume that ε_{t+1} is independent of I_t.
Let q_t(α) and q_ε(α) be the α'th quantiles of y_{t+1} and ε_{t+1}. Then

q_t(α) = μ_t + σ_t q_ε(α)

Thus a (1 − 2α) forecast interval for y_{n+1} is

[μ_n + σ_n q_ε(α), μ_n + σ_n q_ε(1 − α)]
Mean-Variance Model
Given the conditional mean μ_n and variance σ²_n, the conditional quantile of y_{n+1} is a linear function μ_n + σ_n q_ε(α) of the conditional quantile q_ε(α) of the normalized error

ε_{n+1} = e_{n+1} / σ_n

Interval forecasts thus can be summarized by μ_n, σ²_n, and q_ε(α).
Normal Error Quantile Forecasts
Make the approximation ε_{t+1} ~ N(0, 1)
- Then q_ε(α) = Z(α), the normal quantiles
- A useful simplification, especially in small samples

The 0.10, 0.25, 0.75, 0.90 quantiles are
- −1.285, −0.675, 0.675, 1.285

Forecast intervals:

[μ_n + σ_n Z(α), μ_n + σ_n Z(1 − α)]
Nonparametric Error Quantile Forecasts
Let ε_{t+1} ~ F be unknown
- We can estimate q_ε(α) as the empirical quantiles of the residuals
- Set ε̃_{t+1} = ẽ_{t+1} / σ̂_t
- Sort ε̃_1, ..., ε̃_n
- q̂_ε(α) and q̂_ε(1 − α) are the α'th and (1 − α)'th percentiles

[μ̂_n + σ̂_n q̂_ε(α), μ̂_n + σ̂_n q̂_ε(1 − α)]

Computationally simple.
Reasonably accurate when n ≥ 100.
Allows asymmetric and fat-tailed error distributions.
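A sketch assembling the interval from an estimated mean 'mu_n', standard deviation 's_n', and standardized residuals 'eps' (all assumed computed earlier, e.g. from the GARCH fit above):

```r
interval_forecast <- function(mu_n, s_n, eps, alpha = 0.10) {
  q <- unname(quantile(eps, c(alpha, 1 - alpha)))  # empirical error quantiles
  c(lower = mu_n + s_n * q[1], upper = mu_n + s_n * q[2])
}
```

For instance, interval_forecast(1.96, 0.23, eps) with the standardized residual quantiles −1.16 and 1.26 reproduces the 80% interval [1.69, 2.25] reported in the interest-rate example below.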
Constant Variance Case
If σ_t = σ is constant, there is no advantage to estimating σ for the forecast interval.
Let q̂_e(α) and q̂_e(1 − α) be the α'th and (1 − α)'th percentiles of the original residuals ẽ_{t+1}.
Forecast interval:

[μ̂_n + q̂_e(α), μ̂_n + q̂_e(1 − α)]

When the estimated variance is constant, this is numerically identical to the definition with rescaled errors ε̃_{t+1}.
Example: Interest Rate Forecast
n = 603 observations
ε̃_{t+1} = ẽ_{t+1} / σ̂_t from the GARCH(1,1) model

0.10, 0.25, 0.75, 0.90 quantiles: −1.16, −0.59, 0.62, 1.26

Point forecast = 1.96
50% forecast interval = [1.82, 2.10]
80% forecast interval = [1.69, 2.25]
Actual: 1.82
Example: GDP
n = 207 observations
ε̃_{t+1} = ẽ_{t+1} / σ̂_t from the GARCH(1,1) model

0.10, 0.25, 0.75, 0.90 quantiles: −1.18, −0.63, 0.57, 1.26

Point forecast = 1.31
50% forecast interval = [0.04, 2.4]
80% forecast interval = [−1.07, 3.8]
Actual: 1.20
Mean-Variance Model Interval Forecasts - Summary
The key is to break the distribution into the mean μ_t, the variance σ²_t, and the normalized error ε_{t+1}:

y_{t+1} = μ_t + σ_t ε_{t+1}

Then the distribution of y_{n+1} is determined by μ_n, σ²_n, and the distribution of ε_{n+1}.
Each of these three components can be separately approximated and estimated.
Typically, we put the most work into modeling (estimating) the mean μ_t
- The remainder is modeled more simply
- For macro forecasts, this reflects a belief (assumption?) that most of the predictability is in the mean, not the higher features.