Econometrics II – Heij et al. Chapter 7.7
Panel Data, SUR and GLS
Marius Ooms
Tinbergen Institute Amsterdam
TI Econometrics II 2006/2007, §7.7.1-7.7.3 – p. 1/22
Heij et al. (2004) §7.7.1-7.7.3
• Panel data
• Seemingly unrelated regression model (SUR)
  Generalized least squares (GLS), Feasible GLS
• Panel data with fixed effects
• Panel data with random effects
Panel data
• Panel data consist of cross-section observations for different time points.
• We observe one dependent variable $y_{it}$ for individual $i$ at time $t$, where $i = 1, \dots, m$ and $t = 1, \dots, n$.
• In most applications $m$ is (much) larger than $n$: $m \gg n$.
• We have $k$ strongly exogenous explanatory variables in a vector $x_{it}$ and $n \gg k$.
In the literature one often uses $N$ (big N) for $m$ and $T$ (big T) for $n$. As Heij et al. is mostly time-series oriented, they use $n$ as the time-series dimension and $m$ as the cross-section dimension (or "number of equations" in multivariate time series).
For $n = 1$ and large $m$, we have simple cross-section data. When $m = 1$ and $n$ is large, we have a univariate time series.
General Panel data Model
The models considered are of the following regression form:
$$ y_{it} = \alpha_{it} + x_{it}'\gamma_{it} + \varepsilon_{it}, \qquad \mathrm{Var}(\varepsilon) = \Omega, $$
where $\Omega$ is the $mn \times mn$ covariance matrix of the $mn \times 1$ disturbance vector $\varepsilon$ and $x_{it}$ is a $(k-1) \times 1$ vector of explanatory variables. Parameters depend on time $t$ and on individual $i$.
This general model is not empirically identified since it contains more parameters than observations!
Therefore, we have to impose restrictions on the regression parameters $(\alpha_{it}, \gamma_{it})$ and on the covariance matrix $\Omega$ before we can estimate parameters.
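The identification problem can be made concrete with a simple count. This is only a sketch, assuming $\Omega$ is an unrestricted symmetric matrix and that every $(i, t)$ pair carries its own intercept and slope vector:

```latex
\underbrace{mn}_{\alpha_{it}}
\;+\; \underbrace{mn\,(k-1)}_{\gamma_{it}}
\;+\; \underbrace{\frac{mn\,(mn+1)}{2}}_{\text{distinct elements of } \Omega}
\;>\; \underbrace{mn}_{\text{observations } y_{it}}
```

The intercepts alone already use up all $mn$ observations, so some parameters must be restricted to be common across $i$ or $t$.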
Seemingly unrelated regression model (SUR)
The SUR model is given by
$$ y_{it} = \alpha_i + x_{it}'\gamma_i + \varepsilon_{it}, $$
where $E[\varepsilon_{it}\varepsilon_{jt}] = \sigma_{ij}$ and $E[\varepsilon_{it}\varepsilon_{js}] = 0$ for all $i, j$ and $t \neq s$.
All individuals have their own regression parameters, but these are restricted to be constant over time.
The regression relations for the different individuals are only related via the correlation of the error terms, but the error covariance across individuals is unrestricted.
No error covariance across time: no serial correlation and no serial cross-correlation.
SUR model specification and estimation per individual
Denote the observations for unit $i$ by the $n \times 1$ vector $y_i$ and by the $n \times k$ matrix $X_i$, with corresponding parameter vector $\beta_i = (\alpha_i, \gamma_i')'$ and $n \times 1$ vector of disturbances $\varepsilon_i$. The model for SUR unit $i$ can be written as
$$ y_i = X_i\beta_i + \varepsilon_i. $$
Estimating the parameters $\beta_i$ by OLS per equation is consistent, but is inefficient if the disturbances for the different individuals display contemporaneous correlation and the regressor sets differ from individual (equation) to individual (equation).
This is easily shown if we combine the data for all the units in one big transformed regression model, where we can apply the OLS theory of §3.1.4. See next slides.
Complete SUR model in matrix notation
Combining the models for the m units gives
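$$
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}
=
\begin{pmatrix}
X_1 & 0 & \cdots & 0 \\
0 & X_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & X_m
\end{pmatrix}
\begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_m \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_m \end{pmatrix}
$$
with $E[\varepsilon] = 0$ and
$$
\mathrm{var}(\varepsilon) = \Omega =
\begin{pmatrix}
\sigma_{11}I & \sigma_{12}I & \cdots & \sigma_{1m}I \\
\sigma_{12}I & \sigma_{22}I & \cdots & \sigma_{2m}I \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{1m}I & \sigma_{2m}I & \cdots & \sigma_{mm}I
\end{pmatrix}
$$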
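The stacked system can be assembled numerically as a rough illustration. The sketch below uses NumPy with simulated regressors; the dimensions and the matrix `Sigma` of the $\sigma_{ij}$ are hypothetical values chosen only for the example. It relies on the fact that the block pattern $\sigma_{ij}I$ is the Kronecker product $\Sigma \otimes I_n$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 3, 50, 2   # units, time points, parameters per unit (constant + 1 slope)

# Hypothetical contemporaneous covariance matrix Sigma = (sigma_ij), m x m.
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 2.0, 0.3],
                  [0.2, 0.3, 1.5]])

# Per-unit regressor matrices X_i (n x k); the first column is the constant.
X_list = [np.column_stack([np.ones(n), rng.normal(size=n)]) for _ in range(m)]

# Block-diagonal X (mn x mk): X_i on the diagonal blocks, zeros elsewhere.
X = np.zeros((m * n, m * k))
for i, Xi in enumerate(X_list):
    X[i * n:(i + 1) * n, i * k:(i + 1) * k] = Xi

# Omega = Sigma kron I_n: block (i, j) equals sigma_ij * I_n.
Omega = np.kron(Sigma, np.eye(n))
```

With the data stacked unit by unit, block $(i, j)$ of `Omega` is `Sigma[i, j] * np.eye(n)`, matching the matrix above.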
Inefficiency of OLS for SUR model
Simultaneous OLS estimation of the $mk \times 1$ parameter vector containing the $\beta_i$, $i = 1, \dots, m$, in the above model is equivalent to applying OLS per unit.
Exercise (1) Prove this proposition. Hint: Consider the method-of-moments equations for OLS.
This simultaneous OLS estimator is not BLUE since the covariance matrix $\Omega$ is not of the form $\sigma^2 I$. Next we will discuss a general method to deal with this problem.
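A numerical check is no substitute for the proof asked for in Exercise (1), but it can make the proposition plausible. The sketch below uses simulated data with arbitrary illustrative dimensions, and compares OLS on the stacked block-diagonal system with OLS unit by unit.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 3, 40, 2   # illustrative dimensions, not taken from the slides

X_list = [np.column_stack([np.ones(n), rng.normal(size=n)]) for _ in range(m)]
y_list = [Xi @ rng.normal(size=k) + rng.normal(size=n) for Xi in X_list]

# OLS per unit, coefficients concatenated into one mk-vector.
b_per_unit = np.concatenate(
    [np.linalg.lstsq(Xi, yi, rcond=None)[0] for Xi, yi in zip(X_list, y_list)]
)

# OLS on the stacked system with block-diagonal X.
X = np.zeros((m * n, m * k))
for i, Xi in enumerate(X_list):
    X[i * n:(i + 1) * n, i * k:(i + 1) * k] = Xi
y = np.concatenate(y_list)
b_stacked = np.linalg.lstsq(X, y, rcond=None)[0]

# Largest absolute difference between the two sets of estimates.
max_diff = np.max(np.abs(b_stacked - b_per_unit))
```

The two coefficient vectors agree up to floating-point error because $X'X$ is block diagonal, so the normal equations separate by unit.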
Generalized least squares (GLS) idea
First suppose we know Ω up to one constant term.
The idea is to transform the model so that the error covariance matrix becomes "scalar", $\sigma^2 I$. This idea is similar to WLS. Transform the joint model by a (big) square and invertible but nondiagonal weighting matrix $A$, that is, transform
$$ y = X\beta + \varepsilon $$
into
$$ Ay = AX\beta + A\varepsilon. $$
As the variance matrix of $A\varepsilon$ is $A\Omega A'$, we choose $A$ such that $A\Omega A' = I$, or $A^{-1}(A')^{-1} = (A'A)^{-1} = \Omega$.
$A$ is a square root of $\Omega^{-1}$. The decomposition of $\Omega$ is standard in matrix algebra, see section A.6; think of $A$ as $A = \Omega^{-1/2}$. Note that $A$ is not uniquely defined.
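The non-uniqueness of $A$ can be seen numerically. The following sketch uses a small hypothetical $\Omega$ chosen only for illustration, and constructs two different valid choices: one from the Cholesky factor of $\Omega^{-1}$ and one from its symmetric square root.

```python
import numpy as np

# Small hypothetical positive definite covariance matrix.
Omega = np.array([[2.0, 0.6, 0.0],
                  [0.6, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
Omega_inv = np.linalg.inv(Omega)

# Choice 1: Cholesky factor. Omega_inv = L L' with L lower triangular, so A = L'.
L = np.linalg.cholesky(Omega_inv)
A_chol = L.T

# Choice 2: symmetric square root from the eigendecomposition of Omega_inv.
eigval, eigvec = np.linalg.eigh(Omega_inv)
A_sym = eigvec @ np.diag(np.sqrt(eigval)) @ eigvec.T

# Both choices satisfy A Omega A' = I and A'A = Omega^{-1}.
for A in (A_chol, A_sym):
    assert np.allclose(A @ Omega @ A.T, np.eye(3))
    assert np.allclose(A.T @ A, Omega_inv)
```

Both matrices whiten the disturbances although they are different matrices, which is all that the GLS transformation requires.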
GLS, theoretical infeasible version
Assume the following notation for the transformed variables:
$$ y_* = Ay, \qquad X_* = AX, \qquad \varepsilon_* = A\varepsilon, $$
so that
$$ y_* = X_*\beta + \varepsilon_* $$
with $E[\varepsilon_*] = 0$ and $\mathrm{Var}(\varepsilon_*) = I_{nm}$. Now the BLUE estimator of $\beta$ is given by
$$ b_{GLS} = (X_*'X_*)^{-1}X_*'y_* = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y. $$
This is called the Generalized Least Squares estimator. In practice this estimator is infeasible as we do not know $\Omega$.
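The equality of the two expressions for $b_{GLS}$ can be checked on simulated data. In this sketch the covariance matrix `Omega` is a hypothetical AR(1)-type pattern picked purely for illustration (it is not the SUR structure), and `A` is taken as the transposed Cholesky factor of $\Omega^{-1}$, one of several valid choices.

```python
import numpy as np

rng = np.random.default_rng(2)
N, k = 30, 3

# Hypothetical known covariance matrix with entries 0.7^|i-j| (illustration only).
Omega = 0.7 ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
X = np.column_stack([np.ones(N), rng.normal(size=(N, k - 1))])
beta = np.array([1.0, 2.0, -1.0])
# Disturbances with covariance Omega: Cholesky factor times standard normals.
y = X @ beta + np.linalg.cholesky(Omega) @ rng.normal(size=N)

Omega_inv = np.linalg.inv(Omega)
A = np.linalg.cholesky(Omega_inv).T          # satisfies A'A = Omega^{-1}

# Route 1: OLS on the transformed data (y*, X*) = (Ay, AX).
b_transformed = np.linalg.lstsq(A @ X, A @ y, rcond=None)[0]

# Route 2: closed form (X' Omega^{-1} X)^{-1} X' Omega^{-1} y.
b_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)
```

Both routes give the same estimate up to rounding; the first is often more convenient because any OLS routine can be reused.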
Two step Feasible GLS in SUR model
When $\Omega$ is unknown, we first estimate it from OLS residuals. We then perform the so-called Feasible GLS in two steps:
• Estimate $\Omega$. Do $m$ regressions, one per unit, to estimate $\beta_j$ by OLS, $j = 1, \dots, m$. Let $e_i$ be the $n \times 1$ vector of OLS residuals for unit $i$. The unknown (co)variances $\sigma_{ij}$ are then estimated by
$$ s_{ij} = \frac{1}{n}\sum_{t=1}^{n} e_{it}e_{jt}. $$
Then obtain $\hat\Omega$ by replacing $\sigma_{ij}$ by $s_{ij}$ in $\Omega$.
• Estimate the parameters $\beta_j$ jointly by GLS with $\hat\Omega$ in place of $\Omega$. That is,
$$ b_{FGLS} = (X'\hat\Omega^{-1}X)^{-1}X'\hat\Omega^{-1}y. $$
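The two steps can be sketched in NumPy as follows. The data are simulated, and the true parameters `betas` and covariance `Sigma` are hypothetical values for the example. The sketch assumes observations are stacked unit by unit, so that $\hat\Omega = S \otimes I_n$ and $\hat\Omega^{-1} = S^{-1} \otimes I_n$.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, k = 2, 200, 2
Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])               # hypothetical sigma_ij
betas = [np.array([1.0, 2.0]), np.array([-1.0, 0.5])]    # hypothetical beta_i

X_list = [np.column_stack([np.ones(n), rng.normal(size=n)]) for _ in range(m)]
eps = rng.multivariate_normal(np.zeros(m), Sigma, size=n)   # n x m, rows i.i.d.
y_list = [X_list[i] @ betas[i] + eps[:, i] for i in range(m)]

# Step 1: OLS per unit; collect residuals in E (n x m) and form S = E'E / n.
E = np.column_stack([
    yi - Xi @ np.linalg.lstsq(Xi, yi, rcond=None)[0]
    for Xi, yi in zip(X_list, y_list)
])
S = E.T @ E / n

# Step 2: GLS on the stacked system with Omega_hat^{-1} = S^{-1} kron I_n.
X = np.zeros((m * n, m * k))
for i, Xi in enumerate(X_list):
    X[i * n:(i + 1) * n, i * k:(i + 1) * k] = Xi
y = np.concatenate(y_list)
Omega_hat_inv = np.kron(np.linalg.inv(S), np.eye(n))
XtOi = X.T @ Omega_hat_inv
b_fgls = np.linalg.solve(XtOi @ X, XtOi @ y)

# Estimated covariance matrix (X' Omega_hat^{-1} X)^{-1} of b_fgls.
cov_fgls = np.linalg.inv(XtOi @ X)
```

For large $m$ or $n$ one would avoid forming the $mn \times mn$ Kronecker product explicitly; it is kept here only to mirror the formula.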
Asymptotic distribution of FGLS in SUR application
Under the assumption of correct specification and normally distributed error terms, it can be shown that the FGLS estimator in the SUR model has the same asymptotic properties as ML. In this case, for $n \to \infty$ and $m$ fixed, one can derive
$$ b_{FGLS} \approx N\big(\beta, (X_*'X_*)^{-1}\big) \approx N\big(\beta, (X'\Omega^{-1}X)^{-1}\big). $$
We can use this result to perform 'asymptotic' $t$- and $F$-tests. In practice $n$ is finite and one has to take extra precautions to make sure $X'\hat\Omega^{-1}X$ is a full-rank positive definite matrix, see also next slide.
NB: if the number of unknown parameters in $\Omega$ increases linearly with $n$, FGLS does not work. Compare the GMM standard error derivation for general $\Omega$ in §5.5.2.
Finite Sample Rank condition SUR, efficiency SUR
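The estimated SUR covariance matrix $\hat\Omega$ is an $mn \times mn$ matrix which, with the observations ordered by time period, is block diagonal with the $m \times m$ matrix $S$ on the diagonal blocks (in the unit-stacked ordering used above, $\hat\Omega = S \otimes I_n$). $S$ has elements $s_{ij} = \frac{1}{n}e_i'e_j$, with $e_i$ the $n \times 1$ vector of OLS residuals of unit $i$. This means that $\hat\Omega$ is invertible, and 2-step FGLS for SUR possible, if and only if the $m \times m$ matrix $S$ is invertible, i.e. if and only if $\mathrm{rank}(S) = m$.
Necessary condition for Feasible GLS in SUR: define the $n \times m$ matrix $E = (e_1, \cdots, e_m)$; then $S = \frac{1}{n}E'E$, and invertibility requires $m = \mathrm{rank}(S) = \mathrm{rank}(E) \leq n$. Therefore $S$ and $\hat\Omega$ can be invertible, and we can estimate $\beta$ with FGLS, only if $m \leq n$.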
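The rank condition is easy to see in a small simulation. The helper `residual_cov_rank` below is a hypothetical name introduced only for this illustration; it builds $S$ from OLS residuals of simulated data and reports its rank, once with $m \leq n$ and once with $m > n$.

```python
import numpy as np

rng = np.random.default_rng(4)

def residual_cov_rank(m, n):
    """Rank of S = E'E/n, with E the n x m matrix of per-unit OLS residuals."""
    cols = []
    for _ in range(m):
        Xi = np.column_stack([np.ones(n), rng.normal(size=n)])
        yi = rng.normal(size=n)
        cols.append(yi - Xi @ np.linalg.lstsq(Xi, yi, rcond=None)[0])
    E = np.column_stack(cols)
    S = E.T @ E / n
    return np.linalg.matrix_rank(S)

rank_ok = residual_cov_rank(m=3, n=20)   # m <= n: S can have full rank m
rank_bad = residual_cov_rank(m=8, n=5)   # m > n: rank(S) <= n < m, S is singular
```

In the second case $S$ is an $8 \times 8$ matrix of rank at most 5, so it cannot be inverted and the second FGLS step breaks down.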
There are special cases in which OLS is efficient for SUR models. Exercise (2) 7.10, page 715.
Panel data with fixed effects
When $m \gg n$ the data are 'typical' panel data or longitudinal data. We cannot apply the SUR model, as this requires $m \leq n$. The model has to be simplified by parameter restrictions.
E.g., the coefficients of the explanatory variables are assumed to be the same for all units ('pooling'): we impose (pooling) restrictions on the slope parameters $\gamma_i$: $\gamma_i = \gamma$, $i = 1, \dots, m$.
We then obtain the panel data model with fixed effects:
$$ y_{it} = \alpha_i + x_{it}'\gamma + \varepsilon_{it}, \qquad \varepsilon_{it} \sim IID(0, \sigma^2). $$
The constant terms $\alpha_i$ are fixed unknown parameters, but they differ from unit to unit (not pooled). The errors are independent and homoskedastic in time and across units.
NB: the number of parameters increases linearly in $m$, so standard asymptotic theory still requires $n \to \infty$, although nearly all parameters are pooled in the cross section.
Fixed effects model in matrix notation I
We can rewrite the model in standard regression form using unit dummy variables
$$ D_{it}(j) = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases} \qquad i = 1, \dots, m, \; j = 1, \dots, m, $$
to get
$$ y_{it} = \sum_{j=1}^{m} \alpha_j D_{it}(j) + x_{it}'\gamma + \varepsilon_{it}, \qquad \varepsilon_{it} \sim IID(0, \sigma^2). $$
Next, define the $n \times 1$ vector $y_i$ with elements $y_{it}$, define $\varepsilon_i$ accordingly, define the $n \times (k-1)$ matrix $X_i$ with $t$th row $x_{it}'$, $t = 1, \dots, n$, and let $\iota$ be an $n \times 1$ vector of ones.
For the $i$th unit we obtain the matrix notation
$$ y_i = \iota\alpha_i + X_i\gamma + \varepsilon_i. $$
Fixed effects model in matrix notation II
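Now stack the equations $y_i = \iota\alpha_i + X_i\gamma + \varepsilon_i$ for the $m$ time series.
Next, define the $mn \times 1$ vector $y$ consisting of the stacked $y_i$, define $\varepsilon$ accordingly, define the $mn \times (k-1)$ matrix $X$ as the matrix of stacked $X_i$ and define the stacked $mn \times m$ matrix $D$ as
$$
D = \begin{pmatrix}
\iota & 0 & \cdots & 0 \\
0 & \iota & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \iota
\end{pmatrix}.
$$
If $\alpha = (\alpha_1, \cdots, \alpha_m)'$, then the following single regression model arises:
$$ y = X\gamma + D\alpha + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I). $$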
Fast Fixed Effects estimation in regression form I
Efficient estimators of $\alpha$ and $\gamma$ can be obtained by OLS. When $m$ is large, direct OLS is computationally unattractive as it requires the inverse of the $(m+k-1) \times (m+k-1)$ matrix $(X \; D)'(X \; D)$. An intuitive and easier method applies partial regression, following the Frisch-Waugh theorem (§3.2.5), in matrix notation:
• 1. 'Regress' $y$ and (all columns of) $X$ on $D$ and save the residuals, $M_D y$ and $M_D X$, with $M_D = I - D(D'D)^{-1}D'$. Since $(D'D)^{-1} = \frac{1}{n}I$, $M_D y$ and $M_D X$ have elements $y_{it} - \bar y_i$ and $x_{it}' - \bar x_i'$: just removing individual sample means!
• 2. Regress $M_D y$ on $M_D X$ to obtain
$$ \hat\gamma_{OLS} = (X'M_D X)^{-1}X'M_D y. $$
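The two-step procedure above can be compared with the direct dummy-variable regression on simulated data. In this sketch the dimensions and true parameters are hypothetical, and `demean_by_unit` is an illustrative helper that subtracts unit means without ever forming the $mn \times mn$ matrix $M_D$.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, k = 50, 4, 3                        # k - 1 = 2 slope coefficients

alpha = rng.normal(size=m)                # hypothetical fixed effects
gamma = np.array([1.5, -0.5])             # hypothetical common slopes
X = rng.normal(size=(m * n, k - 1))       # stacked unit by unit
D = np.kron(np.eye(m), np.ones((n, 1)))   # mn x m matrix of unit dummies
y = D @ alpha + X @ gamma + rng.normal(size=m * n)

# Direct LSDV: regress y on (X D) and keep the first k - 1 coefficients.
coef = np.linalg.lstsq(np.hstack([X, D]), y, rcond=None)[0]
gamma_lsdv = coef[:k - 1]

def demean_by_unit(Z):
    """Subtract each unit's time average; equivalent to premultiplying by M_D."""
    Z3 = Z.reshape(m, n, -1)
    return (Z3 - Z3.mean(axis=1, keepdims=True)).reshape(m * n, -1)

# Steps 1 and 2: OLS of unit-demeaned y on unit-demeaned X.
Xw = demean_by_unit(X)
yw = demean_by_unit(y).ravel()
gamma_within = np.linalg.lstsq(Xw, yw, rcond=None)[0]
```

`gamma_lsdv` and `gamma_within` coincide up to rounding, but the second route only inverts a $(k-1) \times (k-1)$ matrix.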
Interpretation Fixed effects estimation in regression II
The Fixed Effect estimator or Least Squares Dummy Variable Estimator (LSDVE) of $\gamma$ is therefore obtained by regressing unit-mean adjusted $y$ on unit-mean adjusted $X$. The fixed effect OLS estimates $\hat\alpha$ follow from the last $m$ OLS normal equations (3.41). In matrix regression form:
$$ D'X\hat\gamma + D'D\hat\alpha = D'y, $$
so that
$$ \hat\alpha = (D'D)^{-1}(D'y - D'X\hat\gamma), $$
which has the familiar interpretation of the estimates of constant terms in regressions per individual (but here with a given common $\hat\gamma$):
$$ \hat\alpha_i = \bar y_i - \bar x_i'\hat\gamma. $$
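The formula for $\hat\alpha_i$ can be checked against the intercepts from the full dummy-variable regression. The sketch below uses simulated data with hypothetical parameter values and stores the panel as arrays indexed by (unit, time), which makes the unit means a one-line computation.

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 30, 5
alpha = rng.normal(size=m)                  # hypothetical fixed effects
gamma = np.array([2.0, -1.0])               # hypothetical common slopes
X = rng.normal(size=(m, n, 2))              # X[i, t] holds x_it'
y = alpha[:, None] + X @ gamma + rng.normal(size=(m, n))

# Within estimate of gamma from unit-demeaned data.
Xw = (X - X.mean(axis=1, keepdims=True)).reshape(m * n, 2)
yw = (y - y.mean(axis=1, keepdims=True)).ravel()
gamma_hat = np.linalg.lstsq(Xw, yw, rcond=None)[0]

# alpha_hat_i = ybar_i - xbar_i' gamma_hat, for all units at once.
alpha_hat = y.mean(axis=1) - X.mean(axis=1) @ gamma_hat

# Cross-check: coefficients from the regression of y on (X D).
D = np.kron(np.eye(m), np.ones((n, 1)))
coef = np.linalg.lstsq(np.hstack([X.reshape(m * n, 2), D]), y.ravel(), rcond=None)[0]
```

Here `coef[:2]` reproduces `gamma_hat` and `coef[2:]` reproduces `alpha_hat`, up to floating-point error.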
Panel data model with random effects
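The model with fixed effects cannot be consistently estimated in full if $n$ is fixed and $m \to \infty$ (each $\alpha_i$ is estimated from only $n$ observations), and it cannot be used to forecast a new unit $y_{m+1}$ given $x_{m+1}$: $\alpha_i$ is not modelled.
The simplest model for this purpose is the random effects model, which has a random intercept with a common mean $\alpha$ for all units. In social sciences (SPSS) this specification is called a mixed model (mix of random and fixed coefficients).
$$ \alpha_i = \alpha + \eta_i, \qquad \eta_i \sim IID(0, \sigma_\alpha^2) $$
$$ y_{it} = \alpha + x_{it}'\gamma + \omega_{it}, \qquad \omega_{it} = \varepsilon_{it} + \eta_i, \qquad \varepsilon_{it} \sim IID(0, \sigma^2) $$
with $\eta_i$ and $\varepsilon_{it}$ independent. The disturbances $\omega_{it}$ are correlated with their own past because of the $\eta_i$. The properties of $\omega_{it}$ are:
$$ E[\omega_{it}] = 0, \qquad E[\omega_{it}^2] = \sigma^2 + \sigma_\alpha^2, \qquad E[\omega_{it}\omega_{is}] = \sigma_\alpha^2 \text{ for } t \neq s, $$
$$ E[\omega_{it}\omega_{js}] = 0 \text{ for all } t, s \text{ and } i \neq j. $$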
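These moments imply that, with observations stacked unit by unit, the disturbance covariance matrix is block diagonal with identical $n \times n$ blocks $\sigma^2 I_n + \sigma_\alpha^2 \iota\iota'$. The sketch below builds this matrix for hypothetical variance values and compares one block with a Monte Carlo estimate, under the added assumption of normal draws.

```python
import numpy as np

m, n = 3, 4
sigma2, sigma2_alpha = 1.0, 0.5          # hypothetical variance values

# Covariance of (omega_i1, ..., omega_in) for one unit:
# sigma^2 + sigma_alpha^2 on the diagonal, sigma_alpha^2 off the diagonal.
iota = np.ones((n, 1))
V = sigma2 * np.eye(n) + sigma2_alpha * (iota @ iota.T)

# Units are uncorrelated, so the full mn x mn matrix is I_m kron V.
Omega = np.kron(np.eye(m), V)

# Monte Carlo check of V: simulate omega_it = eps_it + eta_i for many units.
rng = np.random.default_rng(7)
R = 200_000
eta = rng.normal(scale=np.sqrt(sigma2_alpha), size=(R, 1))
eps = rng.normal(scale=np.sqrt(sigma2), size=(R, n))
omega = eps + eta                        # eta_i is shared by all t within a unit
V_mc = omega.T @ omega / R
```

The simulated second-moment matrix `V_mc` is close to `V`, showing the equal correlation between any two periods of the same unit.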
Random effects models FGLS, I
We can estimate the parameters $\alpha$ and $\gamma$ by OLS, but this estimator is not BLUE since the disturbances $\omega_{it}$ are correlated over time within each unit. An efficient estimator can be obtained by feasible GLS.
In the first step of FGLS we need to estimate $\sigma^2$ and $\sigma_\alpha^2$. Since $\eta_i$ is constant within the $i$th unit, it can be removed from the model by taking the unit de-meaned variables. Consider
$$ y_{it} - \bar y_i = (x_{it} - \bar x_i)'\gamma + (\varepsilon_{it} - \bar\varepsilon_i), \qquad i = 1, \dots, m, \; t = 1, \dots, n. $$
Let $\hat\gamma$ be the OLS estimate of $\gamma$ for the above model. Then the within variance, $\sigma^2 = E(\varepsilon_{it}^2)$, is estimated by
$$ \hat\sigma^2 = \frac{1}{m(n-1)} \sum_{i=1}^{m} \sum_{t=1}^{n} \big(y_{it} - \bar y_i - (x_{it} - \bar x_i)'\hat\gamma\big)^2. $$
Random effects models FGLS, II
To estimate $\sigma_\alpha^2$ we combine the within variance estimator $\hat\sigma^2$ and the between variance estimator, which estimates the unexplained variance between unit means in:
$$ \bar y_i = \alpha + \bar x_i'\gamma + (\bar\varepsilon_i + \eta_i), \qquad i = 1, \dots, m. $$
The variance estimate of this regression, denoted by $\hat\sigma_B^2$, estimates $\mathrm{var}(\bar\varepsilon_i + \eta_i) = n^{-1}\sigma^2 + \sigma_\alpha^2$. Combining $\hat\sigma_B^2$ and $\hat\sigma^2$ one derives the estimator
$$ \hat\sigma_\alpha^2 = \hat\sigma_B^2 - n^{-1}\hat\sigma^2. $$
Given $\hat\sigma_\alpha^2$ and $\hat\sigma^2$ one can do the second step of FGLS to re-estimate $\alpha$ and $\gamma$. The resulting estimator is also known as the EGLS (Estimated GLS) estimator of $\gamma$.
Exercise (3): Check the derivation on pages 695-696 for $m = 3$, $n = 2$.
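The complete two-step procedure for the random effects model can be sketched on simulated data. Everything numerical here is hypothetical. Two choices are assumptions of this sketch and are not taken from the slides: the between variance uses the degrees-of-freedom divisor $m - k$, and a negative $\hat\sigma_\alpha^2$ is truncated at zero.

```python
import numpy as np

rng = np.random.default_rng(8)
m, n = 500, 4
alpha, gamma = 1.0, np.array([2.0, -1.0])      # hypothetical true parameters
sigma2, sigma2_alpha = 1.0, 0.5                # hypothetical true variances

X = rng.normal(size=(m, n, 2))                 # X[i, t] holds x_it'
eta = rng.normal(scale=np.sqrt(sigma2_alpha), size=(m, 1))
y = alpha + X @ gamma + eta + rng.normal(scale=np.sqrt(sigma2), size=(m, n))

# Step 1a: within regression gives sigma2_hat with divisor m(n - 1).
Xw = (X - X.mean(axis=1, keepdims=True)).reshape(m * n, 2)
yw = (y - y.mean(axis=1, keepdims=True)).ravel()
g_within = np.linalg.lstsq(Xw, yw, rcond=None)[0]
sigma2_hat = np.sum((yw - Xw @ g_within) ** 2) / (m * (n - 1))

# Step 1b: between regression of unit means on a constant and mean regressors.
Zb = np.column_stack([np.ones(m), X.mean(axis=1)])
yb = y.mean(axis=1)
resid_b = yb - Zb @ np.linalg.lstsq(Zb, yb, rcond=None)[0]
sigma2_B_hat = resid_b @ resid_b / (m - Zb.shape[1])   # divisor m - k: assumption
sigma2_alpha_hat = max(sigma2_B_hat - sigma2_hat / n, 0.0)  # truncation: assumption

# Step 2: GLS with estimated n x n block V_hat = sigma2_hat I + sigma2_alpha_hat iota iota'.
V_inv = np.linalg.inv(sigma2_hat * np.eye(n) + sigma2_alpha_hat * np.ones((n, n)))
Z = np.concatenate([np.ones((m, n, 1)), X], axis=2)    # constant plus regressors
ZtVZ = np.einsum('itk,ts,isl->kl', Z, V_inv, Z)        # sum_i Z_i' V^{-1} Z_i
ZtVy = np.einsum('itk,ts,is->k', Z, V_inv, y)          # sum_i Z_i' V^{-1} y_i
b_re = np.linalg.solve(ZtVZ, ZtVy)                     # (alpha_hat, gamma_hat')
```

Because the covariance matrix is block diagonal with identical blocks, the GLS sums run over units and only an $n \times n$ matrix is inverted.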
Conclusion
The courses Econometrics I and II have introduced you to
• the main ideas of parametric econometric modelling (Data analysis, parsimonious specification, consequences of modelling errors, diagnostic checking, testing) and
• the basics of econometric (asymptotic) inference (Exact statistical inference, likelihood based inference, moment based inference, stationarity, rate of convergence)
in
• Static linear and nonlinear single equation models
• Binary Discrete choice models
• Dynamic linear single- and multiple equation models
• Panel data models
Many different parametric models and methods exist, but these are (all) based on (combinations of) the ideas mentioned above.
Not discussed: Bayesian and nonparametric estimation and inference