5 Panel Data
• panel data include multiple draws on the same basic unit of observation
(‘group’)
• typically, multiple draws over time, but panel data need not have a
time dimension
• examples
– individuals, firms, or countries observed at multiple time periods
– multiple individuals within a household observed at a point in time
– multiple employees within a firm observed at a point in time
• can have 3 (or higher) dimensional panels (e.g., multiple individuals
within a household observed at multiple points in time)
• data structure
$$\{y_{it}, x_{it}\}_{i=1,\dots,N;\ t=1,\dots,T}$$
where total sample size = NT
– examples
∗ i indexes individuals, firms, or countries; t indexes time periods
∗ i indexes individuals; t indexes households
∗ i indexes employees; t indexes firms
• in microeconometric studies, N is typically large and T small; in
macroeconometrics the reverse may hold
• asymptotics can be performed with $N \to \infty$, $T \to \infty$, or both
5.1 Pooled OLS
• model
$$y_{it} = \alpha + x_{it}\beta + \varepsilon_{it}, \quad \varepsilon_{it} \overset{iid}{\sim} N(0, \sigma^2)$$
• estimation via OLS
• identical to usual OLS, only now sample size is NT
• usual assumptions required for unbiasedness, consistency, etc.
• extensions
– time trend
∗ linear time trend
$$y_{it} = \alpha + \lambda t + x_{it}\beta + \varepsilon_{it}, \quad \varepsilon_{it} \overset{iid}{\sim} N(0, \sigma^2)$$
which allows the intercept to trend linearly over time, changing by
λ each period
∗ quadratic time trend
$$y_{it} = \alpha + \lambda_1 t + \lambda_2 t^2 + x_{it}\beta + \varepsilon_{it}, \quad \varepsilon_{it} \overset{iid}{\sim} N(0, \sigma^2)$$
which allows the intercept to follow a more general time trend
– structural break
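$$y_{it} = \begin{cases} \alpha_1 + x_{it}\beta_1 + \varepsilon_{it} & \text{if } t \le T^* \\ \alpha_2 + x_{it}\beta_2 + \varepsilon_{it} & \text{if } t > T^* \end{cases}$$
where $T^*$ is the date of the structural break (distinct from T, the number of periods)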
∗ could have multiple breaks if panel is long enough
∗ Chow test
$H_0: \alpha_1 = \alpha_2,\ \beta_1 = \beta_2$
$H_1$: not all equal
has a test statistic of
$$F_{K+1,\,NT-2K-2} = \frac{(SSR_R - SSR_1 - SSR_2)/(K+1)}{(SSR_1 + SSR_2)/(NT - 2K - 2)}$$
where
· $SSR_R$ = SSR from pooled (restricted) model
· $SSR_1$ = SSR from OLS using only obs with $t \le T^*$, where $T^*$ is the break date
· $SSR_2$ = SSR from OLS using only obs with $t > T^*$
· K = # of x’s
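The split-sample Chow test can be sketched in Python with NumPy. This is a minimal sketch, not a reference implementation: `ssr` and `chow_break` are hypothetical helper names, the break date is passed as `t_star`, and the simulated panel (N = 50, T = 6, an intercept shift of 0.5 after period 3) is made up for illustration.

```python
import numpy as np

def ssr(y, X):
    """Sum of squared OLS residuals from regressing y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def chow_break(y, x, t, t_star):
    """Chow F statistic for a structural break after period t_star.

    y: (NT,) outcomes; x: (NT, K) regressors; t: (NT,) period of each obs.
    Returns (F, df1, df2) with df1 = K + 1 and df2 = NT - 2K - 2.
    """
    nt, k = x.shape
    X = np.column_stack([np.ones(nt), x])   # constant + K x's
    early = t <= t_star                      # obs with t <= T*
    ssr_r = ssr(y, X)                        # pooled (restricted) model
    ssr_1 = ssr(y[early], X[early])          # regime 1 only
    ssr_2 = ssr(y[~early], X[~early])        # regime 2 only
    df1, df2 = k + 1, nt - 2 * k - 2
    F = ((ssr_r - ssr_1 - ssr_2) / df1) / ((ssr_1 + ssr_2) / df2)
    return F, df1, df2

# Illustrative simulated panel: intercept shifts by 0.5 after t = 3
rng = np.random.default_rng(0)
N, T = 50, 6
t = np.tile(np.arange(1, T + 1), N)
x = rng.normal(size=(N * T, 1))
y = 1.0 + 0.5 * (t > 3) + 2.0 * x[:, 0] + rng.normal(size=N * T)
F, df1, df2 = chow_break(y, x, t, t_star=3)
print(f"F({df1}, {df2}) = {F:.2f}")
```

A p-value would come from the $F_{K+1,\,NT-2K-2}$ distribution, e.g. `scipy.stats.f.sf(F, df1, df2)` if SciPy is available.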
∗ alternative
· define
$$I_{it} = \begin{cases} 1 & \text{if } t > T^* \\ 0 & \text{if } t \le T^* \end{cases}$$
· estimate via OLS
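$$y_{it} = \alpha_1 + \alpha_2 I_{it} + x_{it}\beta_1 + x_{it}I_{it}\beta_2 + \varepsilon_{it}, \quad \varepsilon_{it} \overset{iid}{\sim} N(0, \sigma^2)$$
(here $\alpha_2$ and $\beta_2$ are the post-break shifts in the intercept and slopes, not the second-regime levels used above)
and test
$H_0: \alpha_2 = 0,\ \beta_2 = 0$
$H_1$: not all equal to 0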
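The dummy-interaction version can be sketched the same way, which also checks that it reproduces the split-sample statistic. Again a hedged sketch: `ssr` and `chow_dummy` are hypothetical names, and the simulated data (slope on x rising by 0.4 after period 3) are illustrative only.

```python
import numpy as np

def ssr(y, X):
    """Sum of squared OLS residuals from regressing y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def chow_dummy(y, x, t, t_star):
    """Chow test via y = a1 + a2*I + x*b1 + x*I*b2 + e, testing H0: a2 = 0, b2 = 0."""
    nt, k = x.shape
    I = (t > t_star).astype(float)                    # I_it = 1 if t > T*
    X_r = np.column_stack([np.ones(nt), x])           # restricted: common a and b
    X_u = np.column_stack([X_r, I, x * I[:, None]])   # unrestricted: adds I and x*I
    q, df2 = k + 1, nt - X_u.shape[1]                 # df2 = NT - 2K - 2
    ssr_r, ssr_u = ssr(y, X_r), ssr(y, X_u)
    return ((ssr_r - ssr_u) / q) / (ssr_u / df2)

# Illustrative simulated panel: slope on x changes after t = 3
rng = np.random.default_rng(1)
N, T = 50, 6
t = np.tile(np.arange(1, T + 1), N)
x = rng.normal(size=(N * T, 1))
y = 1.0 + (2.0 + 0.4 * (t > 3)) * x[:, 0] + rng.normal(size=N * T)
F_dummy = chow_dummy(y, x, t, t_star=3)
print(f"F = {F_dummy:.2f}")
```

Because the fully interacted regression lets each regime have its own intercept and slopes, its SSR equals $SSR_1 + SSR_2$, so both versions of the test give the same F.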
– time-specific intercepts
$$y_{it} = \alpha + \sum_{s=2}^{T} \lambda_s D_{st} + x_{it}\beta + \varepsilon_{it}$$
where
$$D_{st} = \begin{cases} 1 & \text{if } s = t \\ 0 & \text{otherwise} \end{cases}$$
∗ equivalent to
$$y_{it} = \sum_{s=1}^{T} \lambda_s D_{st} + x_{it}\beta + \varepsilon_{it}$$
where the constant is now omitted to avoid perfect multicollinearity
∗ λ’s capture effects of all variables that do not vary across i at a
point in time
∗ any x’s that do not vary across individuals are subsumed by λ’s
even if they vary over time
∗ more general than a time trend, since the intercepts can vary
arbitrarily from period to period
∗ Chow test
$H_0: \lambda_1 = \cdots = \lambda_T$
$H_1$: not all equal
has a test statistic of
$$F_{T-1,\,NT-K-T} = \frac{(SSR_R - SSR_U)/(T-1)}{SSR_U/(NT - K - T)}$$
where
· $SSR_R$ = SSR from restricted model (single intercept)
· $SSR_U$ = SSR from unrestricted model (T intercepts)
· K = # of x’s
∗ can interact time dummies with x’s to allow β’s to vary over time
∗ including all time dummies and all interactions between the time
dummies and the x’s is equivalent to running OLS separately period by period
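The F test for time-specific intercepts can be sketched as follows. Assumptions: Python with NumPy, hypothetical helper names (`ssr`, `time_dummy_f`), a balanced panel with periods coded 1 to T, and simulated period effects chosen purely for illustration.

```python
import numpy as np

def ssr(y, X):
    """Sum of squared OLS residuals from regressing y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def time_dummy_f(y, x, t):
    """F test of H0: lambda_1 = ... = lambda_T against T period-specific intercepts."""
    nt, k = x.shape
    periods = np.unique(t)
    T = len(periods)
    D = (t[:, None] == periods[None, :]).astype(float)  # D_st, one column per period
    ssr_r = ssr(y, np.column_stack([np.ones(nt), x]))   # single intercept
    ssr_u = ssr(y, np.column_stack([D, x]))             # T intercepts, no constant
    df1, df2 = T - 1, nt - k - T
    return ((ssr_r - ssr_u) / df1) / (ssr_u / df2), df1, df2

# Illustrative simulated panel with period-specific intercepts
rng = np.random.default_rng(2)
N, T = 40, 5
t = np.tile(np.arange(1, T + 1), N)
x = rng.normal(size=(N * T, 1))
lam = np.array([0.0, 0.3, -0.2, 0.5, 0.1])
y = lam[t - 1] + 1.5 * x[:, 0] + rng.normal(size=N * T)
F, df1, df2 = time_dummy_f(y, x, t)
print(f"F({df1}, {df2}) = {F:.2f}")
```

Dropping one dummy and adding a constant (the $\alpha + \sum_{s=2}^{T}$ form) gives exactly the same SSR, so either parameterisation can serve as the unrestricted model.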
– difference-in-difference estimation
∗ frequently used in policy analyses
∗ examples
· What was the impact of NJ’s minimum wage hike?
· What is the impact of legalized abortion on crime?
· What is the impact of the death penalty on crime?
∗ cross-sectional model
$$y_i = \alpha + x_i\beta + \delta D_i + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2), \quad i = 1, \dots, N$$
where, say,
· y = unemployment rate
· x = macroeconomic variables
· D = 1 if NJ (high MW), 0 for all other states (low MW)
∗ potential shortcoming: there may be unobservable differences
between observations with the policy and those without it
· e.g., if NJ differs from other states for reasons not included
in x, then $\mathrm{Cov}(D, \varepsilon) \neq 0$
· $\Rightarrow \hat\delta_{OLS}$ (and perhaps $\hat\beta_{OLS}$) will be biased
∗ panel data offers a potential solution
∗ involves collecting data prior to policy implementation
∗ intuition
· cross-sectional model identifies δ by comparing the level of y
in states with the policy to the level of y in states without the
policy
· difference-in-difference model identifies δ by comparing the change
in y from before to after the policy in states that adopted it with the
change in y over the same period in states with no policy change
∗ panel model
$$y_{it} = \alpha + x_{it}\beta + \lambda_1 D_i + \lambda_2 D2_t + \delta D_i D2_t + \varepsilon_{it}, \quad \varepsilon_{it} \sim N(0, \sigma^2)$$
where, say,
· y = unemployment rate
· x = macroeconomic variables
· $D_i$ = 1 if NJ (‘policy changer’), 0 for all other states (no policy
change)
· $D2_t$ = 1 for periods in which NJ has a high MW, 0 for previous
time periods
· $D_i D2_t$ = 1 if NJ after policy change, 0 for all other observations
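∗ interpretation of parameters in the panel model (ignoring x’s)
$$\begin{array}{lcc}
 & \text{pre-policy change } (t = 1) & \text{post-policy change } (t = 2) \\
\text{no policy change } (D = 0) & \alpha & \alpha + \lambda_2 \\
\text{policy changer } (D = 1) & \alpha + \lambda_1 & \alpha + \lambda_1 + \lambda_2 + \delta
\end{array}$$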
which implies
· $\lambda_2$ = difference (or change) over time in states without the policy
change ($\alpha + \lambda_2 - \alpha = \lambda_2$)
· $\lambda_2 + \delta$ = difference (or change) over time in states with the policy
change ($\alpha + \lambda_1 + \lambda_2 + \delta - (\alpha + \lambda_1) = \lambda_2 + \delta$)
· $\delta$ = difference in the two differences ($\lambda_2 + \delta - \lambda_2 = \delta$), which
is the additional change in states with the policy change
· $\hat\delta_{POLS}$ is known as the DID estimator
∗ notes
· λ1 captures time-invariant differences in states with the policy
change vs. states with no policy change; solves the omitted
variable bias problem in cross-sectional models if the relevant
omitted vars do not change over time
· λ2 captures changes over time that affect all states – policy
changers and non-changers – equally
· $\hat\delta_{POLS}$ is an unbiased estimate of the policy impact if (i) $\lambda_1$ captures
all differences between policy changers and non-changers, and
(ii) the change in y over time (equal to $\lambda_2$) is identical for both
policy changers and non-changers
· if no x’s, then
$$\hat\delta_{POLS} = (\bar y_{12} - \bar y_{11}) - (\bar y_{02} - \bar y_{01})$$
where $\bar y_{Dt}$ = mean outcome in period t of states of type D
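For the no-x case, a small NumPy sketch can confirm that the pooled OLS coefficient on $D_i D2_t$ equals the difference of the four cell means. The function names and the simulated 40-state, two-period data (true δ = 1) are illustrative assumptions.

```python
import numpy as np

def did_means(y, d, post):
    """(ybar_12 - ybar_11) - (ybar_02 - ybar_01) from the four cell means."""
    def m(dv, pv):
        return y[(d == dv) & (post == pv)].mean()
    return (m(1, 1) - m(1, 0)) - (m(0, 1) - m(0, 0))

def did_pooled_ols(y, d, post):
    """delta_hat from pooled OLS of y on 1, D_i, D2_t and D_i * D2_t (no x's)."""
    X = np.column_stack([np.ones_like(y), d, post, d * post])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[3]

# Illustrative data: 40 states observed in 2 periods, the first 10 change policy
rng = np.random.default_rng(3)
n_states = 40
treated = (np.arange(n_states) < 10).astype(float)
d = np.repeat(treated, 2)                  # D_i
post = np.tile([0.0, 1.0], n_states)       # D2_t
y = 2.0 + 1.5 * d + 0.7 * post + 1.0 * d * post + rng.normal(scale=0.5, size=2 * n_states)
print(did_means(y, d, post), did_pooled_ols(y, d, post))
```

The two numbers agree because the regression on a constant, $D_i$, $D2_t$ and $D_i D2_t$ is saturated: its fitted values are exactly the four cell means.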
5.2 Fixed Effects
• motivation
– OLS is biased if omitted vars are correlated with included x’s
– not always possible to find valid IVs
– if omitted var does not vary over time (time invariant), panel data
can yield estimates free from omitted variable bias
• same setup as before, but allow for individual-specific intercepts
$$y_{it} = \alpha_i + x_{it}\beta + \varepsilon_{it}, \quad \varepsilon_{it} \overset{iid}{\sim} N(0, \sigma^2)$$
– $\alpha_i$ = FE for group i (also referred to as the unobserved effect or
unobserved heterogeneity)
– εit = idiosyncratic error
• FEs subsume all time invariant x’s
• FEs capture all time invariant attributes – observable and unobservable
– of individual i
• pooled OLS equivalent to estimating
$$y_{it} = \alpha + x_{it}\beta + \nu_{it}$$
$$\nu_{it} = (\alpha_i - \alpha) + \varepsilon_{it}$$
where $\nu_{it}$ is known as a composite error
– unbiasedness of $\hat\beta_{POLS}$ requires $\mathrm{Cov}(\alpha_i, x_{it}) = 0$ and $\mathrm{Cov}(\varepsilon_{it}, x_{it}) = 0$
– bias due to $\mathrm{Cov}(\alpha_i, x_{it}) \neq 0$ is known as heterogeneity bias
• pulling $\alpha_i$ out of the error term permits unbiased estimates of β even
if $\mathrm{Cov}(\alpha_i, x_{it}) \neq 0$
• if $\mathrm{Cov}(\alpha_i, x_{it}) = 0$, then random effects estimation is more efficient
• if $\alpha_i = \alpha\ \forall i$, then pooled OLS is more efficient
• estimation methods when $\mathrm{Cov}(\alpha_i, x_{it}) \neq 0$
– LSDV (Least Squares Dummy Variable Model)
– FD (first-differencing)
– mean-differencing (FE estimator; within estimator)
• STATA: -xtreg, fe fd -, -areg-
• LSDV (Least Squares Dummy Variable Model)
$$y_{it} = \sum_{j=1}^{N} \alpha_j D_{ji} + x_{it}\beta + \varepsilon_{it}$$
where
$$D_{ji} = \begin{cases} 1 & \text{if } j = i \\ 0 & \text{otherwise} \end{cases}$$
• amounts to including N dummy vars, 1 for each group
• estimated by pooled OLS
• $\hat\beta_{LSDV}$ is consistent even if $\mathrm{Cov}(\alpha_i, x_{it}) \neq 0$ (regressors, here the
dummies and the x’s, are always allowed to be correlated with one another)
• only computationally feasible if N is of reasonable size
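A minimal NumPy sketch of LSDV: build one dummy per group, drop the common constant, and run OLS. The data-generating process (with $\alpha_i$ deliberately correlated with x, so pooled OLS would be biased) and the name `lsdv` are assumptions for illustration.

```python
import numpy as np

def lsdv(y, x, g):
    """LSDV: OLS of y on N group dummies (no common constant) and x.

    y: (NT,); x: (NT, K); g: (NT,) integer group ids 0, ..., N-1.
    Returns (beta_hat, alpha_hat).
    """
    n_groups = g.max() + 1
    D = np.eye(n_groups)[g]                # D_ji = 1 if obs belongs to group j
    coef, *_ = np.linalg.lstsq(np.column_stack([D, x]), y, rcond=None)
    return coef[n_groups:], coef[:n_groups]

# Illustrative panel in which alpha_i is correlated with x
rng = np.random.default_rng(4)
N, T = 200, 5
g = np.repeat(np.arange(N), T)
alpha = rng.normal(size=N)
x = (alpha[g] + rng.normal(size=N * T))[:, None]
y = alpha[g] + 2.0 * x[:, 0] + rng.normal(size=N * T)
beta_hat, alpha_hat = lsdv(y, x, g)
print(beta_hat)
```

Even with N = 200 the regressor matrix already has 201 columns, which hints at why LSDV becomes impractical as N grows.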
• FD (first-differencing)
$$y_{it} = \alpha_i + x_{it}\beta + \varepsilon_{it}$$
– implies
$$y_{i1} = \alpha_i + x_{i1}\beta + \varepsilon_{i1}$$
$$\vdots$$
$$y_{iT} = \alpha_i + x_{iT}\beta + \varepsilon_{iT}$$
– taking differences between consecutive years yields
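$$y_{i2} - y_{i1} = (x_{i2} - x_{i1})\beta + (\varepsilon_{i2} - \varepsilon_{i1})$$
$$\vdots$$
$$y_{iT} - y_{i,T-1} = (x_{iT} - x_{i,T-1})\beta + (\varepsilon_{iT} - \varepsilon_{i,T-1})$$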
or, using new notation,
$$\Delta y_{i2} = \Delta x_{i2}\beta + \Delta\varepsilon_{i2}$$
$$\vdots$$
$$\Delta y_{iT} = \Delta x_{iT}\beta + \Delta\varepsilon_{iT}$$
where ∆ represents the change from the preceding year
– the model to be estimated is
$$\Delta y_{it} = \Delta x_{it}\beta + \Delta\varepsilon_{it}, \quad i = 1, \dots, N;\ t = 2, \dots, T$$
– notes
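∗ FD the data, then regress $\Delta y_{it}$ on $\Delta x_{it}$ using $N(T-1)$ observations
∗ interpretation of $\hat\beta_{FD}$ is same as original β
∗ differencing eliminates $\alpha_i$ and any time invariant x’s
∗ consistency requires $\mathrm{Cov}(\Delta x_{it}, \Delta\varepsilon_{it}) = 0$
· known as strict exogeneity
· requires $\mathrm{Cov}(x_{it}, \varepsilon_{it}) = \mathrm{Cov}(x_{it}, \varepsilon_{i,t-1}) = \mathrm{Cov}(x_{it}, \varepsilon_{i,t+1}) = 0\ \forall t$
∗ estimator of FEs
$$\hat\alpha_i = \bar y_i - \bar x_i\hat\beta$$
which is unbiased, but consistency requires $T \to \infty$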
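The FD estimator and the implied estimates of the fixed effects can be sketched as below, assuming a balanced panel stored with observations sorted by group and then by time (so a reshape to N × T works). The function name `first_difference` and the simulated data are hypothetical.

```python
import numpy as np

def first_difference(y, x, N, T):
    """FD estimator for a balanced panel sorted by group, then by time.

    y: (N*T,); x: (N*T, K). Returns (beta_fd, alpha_hat) with
    alpha_hat_i = ybar_i - xbar_i @ beta_fd.
    """
    Y = y.reshape(N, T)
    X = x.reshape(N, T, -1)
    dy = np.diff(Y, axis=1).ravel()                   # N(T-1) values of dy_it
    dx = np.diff(X, axis=1).reshape(N * (T - 1), -1)  # matching rows of dx_it
    beta, *_ = np.linalg.lstsq(dx, dy, rcond=None)    # no constant: alpha_i differenced out
    alpha_hat = Y.mean(axis=1) - X.mean(axis=1) @ beta
    return beta, alpha_hat

# Illustrative panel in which alpha_i is correlated with x
rng = np.random.default_rng(5)
N, T = 200, 5
g = np.repeat(np.arange(N), T)
alpha = rng.normal(size=N)
x = (alpha[g] + rng.normal(size=N * T))[:, None]
y = alpha[g] + 2.0 * x[:, 0] + rng.normal(size=N * T)
beta_fd, alpha_hat = first_difference(y, x, N, T)
print(beta_fd)
```

Each $\hat\alpha_i$ averages only T = 5 observations, so it stays noisy no matter how large N is; that is the sense in which its consistency requires $T \to \infty$.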
• mean-differencing (FE estimator; within estimator)
$$y_{it} = \alpha_i + x_{it}\beta + \varepsilon_{it}$$
– implies
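$$\bar y_i = \alpha_i + \bar x_i\beta + \bar\varepsilon_i$$
and
$$\bar{\bar y} = \alpha + \bar{\bar x}\beta + \bar{\bar\varepsilon}$$
where single bars indicate averages over the T obs within group i, double bars
indicate averages over the entire sample, and α is the average of the $\alpha_i$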
– taking differences yields
$$y_{it} - \bar y_i = (x_{it} - \bar x_i)\beta + (\varepsilon_{it} - \bar\varepsilon_i)$$
or, using new notation,
$$\ddot y_{it} = \ddot x_{it}\beta + \ddot\varepsilon_{it}$$
– alternative representation
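$$y_{it} - \bar y_i + \bar{\bar y} = \alpha + (x_{it} - \bar x_i + \bar{\bar x})\beta + (\varepsilon_{it} - \bar\varepsilon_i + \bar{\bar\varepsilon})$$
or, using new notation,
$$\tilde y_{it} = \alpha + \tilde x_{it}\beta + \tilde\varepsilon_{it}$$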
– notes
∗ demean the data, then regress $\ddot y_{it}$ on $\ddot x_{it}$ (or $\tilde y_{it}$ on a constant and $\tilde x_{it}$), using NT
observations
∗ need to adjust degrees of freedom for the N estimated group means (residual df = NT − N − K)
∗ interpretation of $\hat\beta_{FE}$ is same as original β
∗ demeaning eliminates $\alpha_i$ and any time invariant x’s
∗ consistency requires $\mathrm{Cov}(x_{is}, \varepsilon_{it}) = 0\ \forall s, t$
· again, strict exogeneity
· requires $x_{it}$ to be independent of the error term from every time
period
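A sketch of the within estimator with the degrees-of-freedom correction, in NumPy. Group means are computed with `np.bincount`; the name `within`, the two-regressor simulated panel and its parameter values are assumptions.

```python
import numpy as np

def within(y, x, g):
    """Within (FE) estimator: demean y and x by group, then OLS without a constant.

    Returns (beta_hat, se) using residual degrees of freedom NT - N - K.
    """
    n_groups = g.max() + 1
    counts = np.bincount(g, minlength=n_groups)
    ybar = np.bincount(g, weights=y, minlength=n_groups) / counts
    xbar = np.column_stack([np.bincount(g, weights=x[:, j], minlength=n_groups)
                            for j in range(x.shape[1])]) / counts[:, None]
    yd, xd = y - ybar[g], x - xbar[g]          # y_it - ybar_i, x_it - xbar_i
    beta, *_ = np.linalg.lstsq(xd, yd, rcond=None)
    resid = yd - xd @ beta
    nt, k = x.shape
    s2 = resid @ resid / (nt - n_groups - k)   # df adjusted for the N group means
    se = np.sqrt(np.diag(s2 * np.linalg.inv(xd.T @ xd)))
    return beta, se

# Illustrative panel: first regressor correlated with alpha_i, second not
rng = np.random.default_rng(6)
N, T = 100, 4
g = np.repeat(np.arange(N), T)
alpha = rng.normal(size=N)
x = np.column_stack([alpha[g] + rng.normal(size=N * T), rng.normal(size=N * T)])
y = alpha[g] + x @ np.array([2.0, -1.0]) + rng.normal(size=N * T)
beta_fe, se_fe = within(y, x, g)
print(beta_fe, se_fe)
```

With residual df NT − N − K, both coefficients and standard errors coincide with those from LSDV, consistent with LSDV and mean-differencing being identical; dividing by NT − K instead would understate the standard errors.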
• comparisons
– LSDV and mean-differencing are identical
– T = 2 ⇒ all three are identical
– T ≥ 3 ⇒ FD and FE (LSDV) differ, but both are unbiased
• extensions
– time dummies (DID estimator)
5.3 Random Effects
• motivation
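– if $\mathrm{Cov}(x_{it}, \alpha_i) = 0$, then estimating the N parameters $\alpha_i$ is inefficient
(equivalently, losing N obs to FD or MD is inefficient)
– but if $\alpha_i \neq \alpha$ for some i, then pooled OLS yields incorrect std errors, since
the composite errors $\nu_{it} = (\alpha_i - \alpha) + \varepsilon_{it}$ are correlated within groups,
$\mathrm{Cov}(\nu_{it}, \nu_{it'}) \neq 0$ (within-group serial correlation):
$$\mathrm{Cov}(\nu_{it}, \nu_{it'}) = \begin{cases} \sigma^2_\alpha + \sigma^2_\varepsilon & \text{if } t = t' \\ \sigma^2_\alpha & \text{if } t \neq t' \end{cases}$$
which implies positive serial correlation within groups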
– solution
∗ leave αi as part of the composite error
∗ transform the data to a model with serially uncorrelated errors
∗ known as Generalized Least Squares (GLS) estimation in general,
RE estimation in this special case
• same setup as before
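$$y_{it} = \alpha + x_{it}\beta + \nu_{it}, \quad \nu_{it} = \alpha_i + \varepsilon_{it}, \quad \varepsilon_{it} \sim N(0, \sigma^2_\varepsilon)$$
where $\nu_{it}$ is the composite error term
• assume $\alpha_i \overset{iid}{\sim} N(0, \sigma^2_\alpha)$ = RE for group i
• assume $\mathrm{Cov}(\alpha_i, \varepsilon_{it}) = 0\ \forall t$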
• RE estimation
– transform the data to a model with serially uncorrelated errors
– RE covariance structure
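$$\underbrace{\Sigma_i}_{T \times T} = \begin{pmatrix} \sigma^2_\alpha + \sigma^2_\varepsilon & \sigma^2_\alpha & \cdots & \sigma^2_\alpha \\ \sigma^2_\alpha & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & \sigma^2_\alpha \\ \sigma^2_\alpha & \cdots & \sigma^2_\alpha & \sigma^2_\alpha + \sigma^2_\varepsilon \end{pmatrix}$$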
and
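$$\underbrace{\Sigma}_{NT \times NT} = \begin{pmatrix} \Sigma_1 & 0 & \cdots & 0 \\ 0 & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \Sigma_N \end{pmatrix}$$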
– define
$$\lambda = 1 - \sqrt{\frac{\sigma^2_\varepsilon}{T\sigma^2_\alpha + \sigma^2_\varepsilon}} \in [0, 1]$$
– λ-difference the data
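$$y_{it} - \lambda\bar y_i = (1 - \lambda)\alpha + (x_{it} - \lambda\bar x_i)\beta + (\nu_{it} - \lambda\bar\nu_i)$$
where $\nu_{it} = \alpha_i + \varepsilon_{it}$ is the composite error, which implies $\mu_{it} \equiv \nu_{it} - \lambda\bar\nu_i \overset{iid}{\sim} N(0, \sigma^2_\mu)$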
• steps
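– estimate the model using fixed effect methods or pooled OLS (to obtain
estimates of $\sigma^2_\varepsilon$ and $\sigma^2_\alpha$)
– obtain an estimate, $\hat\lambda$
– difference the data using $\hat\lambda$
– regress $y_{it} - \hat\lambda\bar y_i$ on $x_{it} - \hat\lambda\bar x_i$ (with the constant transformed to $1 - \hat\lambda$)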
• notes
– special cases
∗ λ = 0 =⇒ pooled OLS
∗ λ = 1 =⇒ FE estimation
– RE allows time invariant x’s
– consistency requires Cov(αi, xit) = Cov(εit, xit) = 0
• STATA: -xtreg, re-
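The RE steps can be sketched in NumPy as follows. This is one simple feasible-GLS recipe, assumed for illustration: $\hat\sigma^2_\varepsilon$ comes from the within residuals, $\hat\sigma^2_\alpha$ from the pooled OLS residual variance minus $\hat\sigma^2_\varepsilon$ (truncated at zero), and the constant column becomes $1 - \hat\lambda$ after λ-differencing. Other variance-component estimators exist and give slightly different $\hat\lambda$; the helper names and simulated data are hypothetical.

```python
import numpy as np

def group_means(a, g, n):
    """Group means of a (1-D or 2-D array) for integer group ids g in 0..n-1."""
    a2 = a.reshape(len(g), -1)
    sums = np.zeros((n, a2.shape[1]))
    np.add.at(sums, g, a2)
    means = sums / np.bincount(g, minlength=n)[:, None]
    return means if a.ndim == 2 else means[:, 0]

def random_effects(y, x, g, T):
    """Feasible RE via lambda-differencing; returns ([alpha, beta...], lambda_hat)."""
    n = g.max() + 1
    nt, k = x.shape
    # Step 1a: within residuals estimate sigma2_eps
    yd = y - group_means(y, g, n)[g]
    xd = x - group_means(x, g, n)[g]
    b_fe, *_ = np.linalg.lstsq(xd, yd, rcond=None)
    e_fe = yd - xd @ b_fe
    s2_eps = e_fe @ e_fe / (nt - n - k)
    # Step 1b: pooled OLS residuals estimate sigma2_alpha + sigma2_eps
    X = np.column_stack([np.ones(nt), x])
    b_pols, *_ = np.linalg.lstsq(X, y, rcond=None)
    e_pols = y - X @ b_pols
    s2_alpha = max(e_pols @ e_pols / (nt - k - 1) - s2_eps, 0.0)
    # Step 2: lambda_hat = 1 - sqrt(s2_eps / (T * s2_alpha + s2_eps))
    lam = 1.0 - np.sqrt(s2_eps / (T * s2_alpha + s2_eps))
    # Steps 3-4: lambda-difference the data (constant becomes 1 - lambda), then OLS
    ys = y - lam * group_means(y, g, n)[g]
    xs = x - lam * group_means(x, g, n)[g]
    coef, *_ = np.linalg.lstsq(np.column_stack([np.full(nt, 1.0 - lam), xs]), ys, rcond=None)
    return coef, lam

rng = np.random.default_rng(7)
N, T = 200, 5
g = np.repeat(np.arange(N), T)
alpha = rng.normal(size=N)                    # sigma2_alpha = 1, independent of x
x = rng.normal(size=(N * T, 1))
y = 1.0 + alpha[g] + 2.0 * x[:, 0] + rng.normal(size=N * T)
coef_re, lam_hat = random_effects(y, x, g, T)
print(coef_re, lam_hat)
```

With $\sigma^2_\alpha = \sigma^2_\varepsilon = 1$ and T = 5 the true λ is $1 - \sqrt{1/6} \approx 0.59$, so $\hat\lambda$ should land near that value.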
5.4 Specification Tests
• Hausman test of FE vs. RE
– intuition
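∗ if $\mathrm{Cov}(\alpha_i, x_{it}) = 0$, then RE and FE are both consistent, but RE
is more efficient
$\Rightarrow \hat\beta_{RE} \approx \hat\beta_{FE}$
∗ if $\mathrm{Cov}(\alpha_i, x_{it}) \neq 0$, then RE is inconsistent, but FE is consistent
$\Rightarrow \hat\beta_{RE} \neq \hat\beta_{FE}$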
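– define test statistic based on difference $\hat\beta_{FE} - \hat\beta_{RE}$
$$H = (\hat\beta_{FE} - \hat\beta_{RE})'(\hat\Sigma_{FE} - \hat\Sigma_{RE})^{-1}(\hat\beta_{FE} - \hat\beta_{RE}) \sim \chi^2_K$$
where $\hat\Sigma_{FE}$ and $\hat\Sigma_{RE}$ are the estimated covariance matrices of the coefficient
estimates and K = # of (time-varying) x’s
– if H exceeds the $\chi^2_K$ critical value, then reject $\mathrm{Cov}(\alpha_i, x_{it}) = 0$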
– STATA: -hausman-
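Given FE and RE estimates of the coefficients on the time-varying x’s and their estimated covariance matrices, the statistic is a quadratic form. The sketch below assumes those inputs are already available; the function name and the numbers in the example are hypothetical, chosen only to show the arithmetic.

```python
import numpy as np

def hausman(b_fe, V_fe, b_re, V_re):
    """H = d' (V_FE - V_RE)^{-1} d with d = b_FE - b_RE; returns (H, degrees of freedom)."""
    d = np.atleast_1d(np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float))
    V = np.atleast_2d(np.asarray(V_fe, dtype=float) - np.asarray(V_re, dtype=float))
    return float(d @ np.linalg.solve(V, d)), d.size

# Hypothetical estimates for a single x (K = 1), for illustration only
H, K = hausman([0.52], [[0.0025]], [0.45], [[0.0016]])
print(H, K)
```

Here H = 0.07²/0.0009 ≈ 5.44, which exceeds 3.84 (the 5% critical value of $\chi^2_1$), so these made-up estimates would lead to rejecting $\mathrm{Cov}(\alpha_i, x_{it}) = 0$. In real data $\hat\Sigma_{FE} - \hat\Sigma_{RE}$ is not guaranteed to be positive definite, in which case H can come out negative and needs careful interpretation.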
• RE vs. pooled OLS
– hypothesis
$H_0: \sigma^2_\alpha = 0$
$H_1: \sigma^2_\alpha \neq 0$
– Breusch-Pagan (1980) test
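$$\lambda_{LM} = \frac{NT}{2(T-1)}\left[\frac{\sum_{i=1}^{N}(T\bar{\hat\varepsilon}_i)^2}{\sum_{i=1}^{N}\sum_{t=1}^{T}\hat\varepsilon_{it}^2} - 1\right]^2 \sim \chi^2_1$$
where $\hat\varepsilon_{it}$ are the pooled OLS residuals and $\bar{\hat\varepsilon}_i$ is their mean within group i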
– STATA: -xttest1 - after -xtreg, re-
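The LM statistic can be computed from pooled OLS residuals arranged as an N × T array. In the sketch below the “residuals” are simulated directly (with and without a group component) rather than coming from an actual regression; that shortcut and the function name are assumptions.

```python
import numpy as np

def breusch_pagan_re(resid, N, T):
    """Breusch-Pagan (1980) LM statistic for H0: sigma2_alpha = 0 (compare with chi2(1)).

    resid: pooled OLS residuals of length N*T, sorted by group, then by time.
    """
    e = np.asarray(resid, dtype=float).reshape(N, T)
    num = (e.sum(axis=1) ** 2).sum()   # sum_i (T * ebar_i)^2 = sum_i (sum_t e_it)^2
    den = (e ** 2).sum()
    return N * T / (2 * (T - 1)) * (num / den - 1) ** 2

rng = np.random.default_rng(8)
N, T = 100, 5
u = rng.normal(size=(N, T))
lm_none = breusch_pagan_re(u.ravel(), N, T)                            # no group effect
lm_re = breusch_pagan_re((u + rng.normal(size=(N, 1))).ravel(), N, T)  # alpha_i present
print(lm_none, lm_re)
```

Large values relative to $\chi^2_1$ (5% critical value 3.84) favour RE over pooled OLS.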
• groupwise heteroskedasticity
– errors are homoskedastic within groups, heteroskedastic across groups
– e.g., errors for a given individual have same variance in each period,
but each individual has a unique variance
– structure:
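$$\underbrace{\Sigma_i}_{T \times T} = \begin{pmatrix} \sigma^2_i & 0 & \cdots & 0 \\ 0 & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \sigma^2_i \end{pmatrix}$$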
and
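$$\underbrace{\Sigma}_{NT \times NT} = \begin{pmatrix} \Sigma_1 & 0 & \cdots & 0 \\ 0 & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \Sigma_N \end{pmatrix}$$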
– hypothesis
$H_0: \sigma^2_i = \sigma^2\ \forall i$
$H_1: \sigma^2_i \neq \sigma^2$ for some i
– modified Wald test statistic
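$$W' = \sum_{i=1}^{N}\frac{(\hat\sigma^2_i - \hat\sigma^2)^2}{V_i} \sim \chi^2_N$$
where
$$V_i = \frac{1}{T(T-1)}\sum_{t=1}^{T}(\hat\varepsilon^2_{it} - \hat\sigma^2_i)^2$$
$$\hat\sigma^2_i = \frac{1}{T}\sum_{t=1}^{T}\hat\varepsilon^2_{it}$$
and $\hat\sigma^2 = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\hat\varepsilon^2_{it}$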
– notes
∗ valid in presence of non-normality
∗ lower power in ‘large N , small T ’ FE models
– STATA: -xttest3 - after -xtreg, fe-
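A NumPy sketch of the modified Wald statistic, taking FE residuals as input. It assumes $V_i$ is the estimated sampling variance of $\hat\sigma^2_i$, i.e. $\frac{1}{T(T-1)}\sum_t(\hat\varepsilon^2_{it} - \hat\sigma^2_i)^2$ (the variance of $\hat\varepsilon^2_{it}$ divided by T, because $\hat\sigma^2_i$ is a mean of T terms), and uses simulated residuals in place of real ones; the function name is hypothetical.

```python
import numpy as np

def modified_wald(resid, N, T):
    """Modified Wald statistic for groupwise heteroskedasticity (compare with chi2(N)).

    resid: FE residuals of length N*T, sorted by group, then by time.
    """
    e2 = np.asarray(resid, dtype=float).reshape(N, T) ** 2
    s2_i = e2.mean(axis=1)                                      # sigma2_i hat
    s2 = e2.mean()                                              # pooled sigma2 hat
    V_i = ((e2 - s2_i[:, None]) ** 2).sum(axis=1) / (T * (T - 1))
    return float(((s2_i - s2) ** 2 / V_i).sum())

rng = np.random.default_rng(3)
N, T = 30, 20
e_hom = rng.normal(size=(N, T)).ravel()                         # common variance
sd = rng.uniform(0.5, 3.0, size=N)                              # group-specific std devs
e_het = (rng.normal(size=(N, T)) * sd[:, None]).ravel()
W_hom, W_het = modified_wald(e_hom, N, T), modified_wald(e_het, N, T)
print(W_hom, W_het)
```

The heteroskedastic draws give a statistic far larger than the homoskedastic ones; either value would be compared with the $\chi^2_N$ critical value.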
• cross-sectional dependence
$H_0: \mathrm{Cov}(\varepsilon_{it}, \varepsilon_{jt}) = 0\ \forall i \neq j$
$H_1: \mathrm{Cov}(\varepsilon_{it}, \varepsilon_{jt}) \neq 0$ for some $i \neq j$
– T > N
∗ Breusch-Pagan (1980) test
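$$\lambda_{LM} = T\sum_{i=2}^{N}\sum_{j=1}^{i-1}\rho^2_{ij} \sim \chi^2_d$$
where
· $d = N(N-1)/2$
· $\rho_{ij} = \mathrm{Corr}(\varepsilon_i, \varepsilon_j)$, $i \neq j$; specifically,
$$\rho_{ij} = \frac{\sum_{t=1}^{T}\varepsilon_{it}\varepsilon_{jt}}{\sqrt{\sum_{t=1}^{T}\varepsilon^2_{it}}\sqrt{\sum_{t=1}^{T}\varepsilon^2_{jt}}}$$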
∗ intuition
· compute N × N correlation matrix
$$R = \begin{pmatrix} \rho_{11} & \cdots & \rho_{1N} \\ \vdots & \ddots & \vdots \\ \rho_{1N} & \cdots & \rho_{NN} \end{pmatrix}$$
· no correlation $\Rightarrow R = I_N$
∗ test does not have good statistical properties when T < N, and is
likely to do worse as $N \to \infty$
∗ STATA: -xttest2- after -xtreg, fe-
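The Breusch-Pagan cross-sectional LM test can be sketched by forming the full correlation matrix R and summing the squared off-diagonal elements once per pair. Simulated residuals (with and without a common shock) stand in for real FE residuals, and the function names are hypothetical.

```python
import numpy as np

def residual_correlations(resid, N, T):
    """N x N matrix R of rho_ij from residuals (no demeaning, as in the formula)."""
    E = np.asarray(resid, dtype=float).reshape(N, T)   # row i holds e_i1, ..., e_iT
    norms = np.sqrt((E ** 2).sum(axis=1))
    return (E @ E.T) / np.outer(norms, norms)

def bp_cross_section_lm(resid, N, T):
    """LM = T * sum over pairs i != j of rho_ij^2; returns (LM, d = N(N-1)/2)."""
    R = residual_correlations(resid, N, T)
    iu = np.triu_indices(N, k=1)                       # each pair counted once
    return T * float((R[iu] ** 2).sum()), N * (N - 1) // 2

rng = np.random.default_rng(9)
N, T = 10, 200                                          # T > N, as the test requires
u = rng.normal(size=(N, T))
f = rng.normal(size=(1, T))                             # common shock hitting every unit
lm_indep, d = bp_cross_section_lm(u.ravel(), N, T)
lm_common, _ = bp_cross_section_lm((u + f).ravel(), N, T)
print(lm_indep, lm_common, d)
```

Without cross-sectional dependence R is close to $I_N$ and the statistic is of the order of its $\chi^2_d$ mean; a shared shock pushes every $\rho_{ij}$ up and inflates it.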
– T < N
∗ Pesaran (2004) test
$$\lambda_{CD} = \sqrt{\frac{2T}{N(N-1)}}\sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\rho_{ij} \sim N(0, 1)$$
∗ STATA: -xtcsd, pes- after -xtreg, fe- or -xtreg, re-
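Pesaran’s CD statistic uses the same pairwise correlations but sums them rather than their squares. The NumPy sketch below, with a hypothetical function name and simulated residuals for N = 50 units and T = 10 periods, is illustrative only.

```python
import numpy as np

def pesaran_cd(resid, N, T):
    """Pesaran (2004) CD statistic, approximately N(0, 1) under H0."""
    E = np.asarray(resid, dtype=float).reshape(N, T)
    norms = np.sqrt((E ** 2).sum(axis=1))
    R = (E @ E.T) / np.outer(norms, norms)    # rho_ij as defined above
    iu = np.triu_indices(N, k=1)              # pairs with j > i
    return float(np.sqrt(2 * T / (N * (N - 1))) * R[iu].sum())

rng = np.random.default_rng(10)
N, T = 50, 10                                  # T < N
u = rng.normal(size=(N, T))
f = rng.normal(size=(1, T))                    # common shock
cd_indep = pesaran_cd(u.ravel(), N, T)
cd_common = pesaran_cd((u + f).ravel(), N, T)
print(cd_indep, cd_common)
```

Because positive and negative correlations can offset each other in the sum, CD has little power against dependence whose pairwise correlations average out to about zero.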
5.5 Dynamic Panel Model
• model
$$y_{it} = x_{it}\beta + \gamma y_{i,t-1} + \alpha_i + \varepsilon_{it}, \quad \varepsilon_{it} \overset{iid}{\sim} N(0, \sigma^2)$$
where $i = 1, \dots, N$; $t = 2, \dots, T$; and $T \ge 3$
• estimation
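– even if $\mathrm{Cov}(\alpha_i, x_{it}) = 0$, RE is not applicable since $\mathrm{Cov}(\alpha_i, y_{i,t-1}) \neq 0$
– need FE/FD estimator
– FD ⇒
$$\Delta y_{it} = \Delta x_{it}\beta + \gamma\Delta y_{i,t-1} + \Delta\varepsilon_{it}, \quad i = 1, \dots, N;\ t = 3, \dots, T$$
– but OLS on this model is inconsistent since $\mathrm{Cov}(\Delta\varepsilon_{it}, \Delta y_{i,t-1}) \neq 0$:
$\Delta\varepsilon_{it}$ contains $\varepsilon_{i,t-1}$, $\Delta y_{i,t-1}$ contains $y_{i,t-1}$, and $\mathrm{Cov}(\varepsilon_{i,t-1}, y_{i,t-1}) \neq 0$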
• solutions
– FD, then estimate via IV, treating ∆yit−1 as endogenous
– what are potential IVs?
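∗ need vars that are correlated with $\Delta y_{i,t-1}$ and uncorrelated with $\Delta\varepsilon_{it}$
∗ suitable candidates
· $x_{i,t-2}$ (through $y_{i,t-2}$)
· $y_{i,t-2}$ (directly, since $\Delta y_{i,t-1} = y_{i,t-1} - y_{i,t-2}$)
· $y_{i,t-3}, y_{i,t-4}, y_{i,t-5}, \dots$ (through the autoregressive process), e.g.,
$$\mathrm{Cov}(\Delta y_{i,t-1}, y_{i,t-3}) = \mathrm{Cov}(y_{i,t-1}, y_{i,t-3}) - \mathrm{Cov}(y_{i,t-2}, y_{i,t-3}) \neq 0$$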
∗ lots of instruments (beware of weak IVs)
– simple solution: FD, then use TSLS with xit−2, yit−2 as IVs
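The simple FD-then-TSLS approach (often called the Anderson–Hsiao estimator) can be sketched in NumPy. Assumptions: the data-generating process in `simulate` (β = 1, γ = 0.5, iid standard normal x and ε, a 50-period burn-in), the helper names, and the choice of $\Delta x_{it}$ as an included exogenous regressor with $y_{i,t-2}$ and $x_{i,t-2}$ as the excluded instruments.

```python
import numpy as np

def tsls(y, X, Z):
    """2SLS: regress X on Z (first stage), then y on the fitted values."""
    Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    return np.linalg.lstsq(Xhat, y, rcond=None)[0]

def anderson_hsiao(y, x):
    """y, x: (N, T) arrays. Uses the FD equation for t = 3, ..., T."""
    dy = (y[:, 2:] - y[:, 1:-1]).ravel()      # dy_it
    dx = (x[:, 2:] - x[:, 1:-1]).ravel()      # dx_it (exogenous)
    dyl = (y[:, 1:-1] - y[:, :-2]).ravel()    # dy_i,t-1 (endogenous)
    ylag2 = y[:, :-2].ravel()                 # y_i,t-2 (instrument)
    xlag2 = x[:, :-2].ravel()                 # x_i,t-2 (instrument)
    X = np.column_stack([dx, dyl])
    Z = np.column_stack([dx, ylag2, xlag2])
    return tsls(dy, X, Z)                      # [beta_hat, gamma_hat]

def simulate(N, T, beta, gamma, rng, burn=50):
    """Hypothetical DGP: y_it = beta*x_it + gamma*y_i,t-1 + alpha_i + e_it."""
    alpha = rng.normal(size=N)
    x = rng.normal(size=(N, T + burn))
    y = np.zeros((N, T + burn))
    for s in range(1, T + burn):
        y[:, s] = beta * x[:, s] + gamma * y[:, s - 1] + alpha + rng.normal(size=N)
    return y[:, burn:], x[:, burn:]

rng = np.random.default_rng(11)
y, x = simulate(N=2000, T=6, beta=1.0, gamma=0.5, rng=rng)
b_hat, g_hat = anderson_hsiao(y, x)
print(b_hat, g_hat)
```

Running OLS on the same differenced equation would instead give a downward-biased $\hat\gamma$, because $\mathrm{Cov}(\Delta y_{i,t-1}, \Delta\varepsilon_{it}) = -\sigma^2 < 0$.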
– more complex solution
∗ estimation by GMM to utilize more instruments
· writing out model for each period yields
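$$\Delta y_{i3} = \Delta x_{i3}\beta + \gamma\Delta y_{i2} + \Delta\varepsilon_{i3}$$
$$\vdots$$
$$\Delta y_{iT} = \Delta x_{iT}\beta + \gamma\Delta y_{i,T-1} + \Delta\varepsilon_{iT}$$
where IVs for $\Delta y_{i2}$ are $x_{i1}, y_{i1}$; IVs for $\Delta y_{i3}$ are $x_{i2}, y_{i2}, y_{i1}$; …;
IVs for $\Delta y_{i,T-1}$ are $x_{i,T-2}, y_{i,T-2}, \dots, y_{i2}, y_{i1}$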
· not usual TSLS set-up
· GMM allows moment conditions to be derived using as many
IVs as desired
· requires $\varepsilon_{it}$ to be serially uncorrelated; equivalently, $\Delta\varepsilon_{it}$
should show first-order but no second-order serial correlation (i.e., be MA(1))
∗ STATA: -xtabond - (Arellano & Bond 1991)
• persistence
– Blundell & Bond (1998) show that if |γ| > 0.8 or so, TSLS and A-B
estimator do not work very well (weak IVs)
– solution
∗ add additional moment conditions derived from the model in levels
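$$y_{it} = x_{it}\beta + \gamma y_{i,t-1} + \alpha_i + \varepsilon_{it}$$
∗ what are IVs for $y_{i,t-1}$?
· $\Delta y_{i,t-1}$ (independent of $\alpha_i$, $\varepsilon_{it}$)
· $\Delta x_{i,t-1}$ (independent of $\alpha_i$, $\varepsilon_{it}$)
· $\Delta y_{i,t-2}, \Delta y_{i,t-3}, \dots$ (through the autoregressive process)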
– STATA: -xtabond2 - (system estimator)