Hidehiko Ichimura: Semiparametric Least Squares (SLS) and weighted SLS estimation of single-index models¹

Péter Tóth ([email protected])
UT Austin

¹ Journal of Econometrics 58 (1993), pp. 71-120.
Introduction
Index models
- Semiparametric model: the index parametrizing the DGP (a distribution) consists of two parts, θ and γ, where θ ∈ Θ, a finite-dimensional space, while γ ∈ Γ, an infinite-dimensional space
- Single-index model: the DGP is assumed to be

  y_i = φ(h(x_i, θ0)) + ε_i,  i = 1, 2, ..., n

  - where {(x_i, y_i)} is the observed iid sample
  - θ0 ∈ R^M is the true (finite-dimensional) parameter vector
  - E[ε_i | x_i] = 0
  - h(·) is known up to the parameter θ, but φ(·) is not
- Examples (a simulated one follows below).
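A minimal simulated instance of such a DGP (all numbers and the link function below are hypothetical illustrations, not from the paper): the econometrician observes (x_i, y_i) and knows h(x, θ) = x′θ, but never sees φ.

```python
# Hypothetical single-index DGP: y_i = phi(x_i' theta0) + eps_i.
# phi = tanh and theta0 = (1, -0.5) are illustrative choices only.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
theta0 = np.array([1.0, -0.5])   # first coefficient normalized to 1 (scale is not identified)

x = rng.normal(size=(n, 2))      # observed regressors
index = x @ theta0               # linear index h(x, theta0) = x' theta0
phi = np.tanh                    # the "unknown" smooth link function
y = phi(index) + 0.1 * rng.normal(size=n)   # E[eps | x] = 0, sd(eps) = 0.1
```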
(W)SLS estimator
Three observations
- 1. The variation in y results from the variation in ε and the variation in x (or in h(x, θ0))
- 2. On a 'contour line', where h(x, θ0) = const, all the variation in y comes from ε
- 3. (2) does not (necessarily) hold for θ ≠ θ0
- So what will be the identification strategy for θ0?
- What would be the estimation strategy?
- Caveats: we have conditional moments (a quick numeric check of (2)-(3) follows below)
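A quick numeric check of observations (2)-(3), reusing x, y, theta0 from the simulation sketch above: within narrow quantile bins of the index (a crude stand-in for conditioning on h(x, θ)), the variance of y is close to Var(ε) at θ0 but inflated at a wrong θ, because variation in φ leaks into the bins.

```python
# Within-bin variance of y: ~ Var(eps) at theta0, larger at a wrong theta.
import numpy as np

def within_bin_variance(y, index, n_bins=30):
    """Average variance of y within quantile bins of the index."""
    edges = np.quantile(index, np.linspace(0, 1, n_bins + 1))
    ids = np.clip(np.digitize(index, edges[1:-1]), 0, n_bins - 1)
    return np.mean([y[ids == b].var() for b in range(n_bins) if (ids == b).sum() > 1])

print(within_bin_variance(y, x @ theta0))                # ~ 0.01 = Var(eps)
print(within_bin_variance(y, x @ np.array([1.0, 0.5])))  # noticeably larger
```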
SLS estimator 1
- An extremum estimator
- If we knew the conditional mean, the objective function would be the sample analogue of some measure of variation, in particular here of the variance:

  J_n^h(θ) = (1/n) ∑_i [y_i − E(y_i | h(x_i, θ))]²

- One can also just sum over all the h-values
- Since we do not know the conditional mean, we estimate it nonparametrically with a smooth kernel estimator (why?), so

  J_n(θ) = (1/n) ∑_i I(x_i ∈ X) [y_i − Ê(x_i, θ)]² + o_p(n⁻¹)
SLS estimator 2
- Here θ ∈ R^L
- The leave-one-out kernel regression estimate (a runnable sketch follows below):

  Ê(x_i, θ) = ∑_{j≠i} y_j I(x_j ∈ X_n) K((h(x_i, θ) − h(x_j, θ))/a_n) / ∑_{j≠i} I(x_j ∈ X_n) K((h(x_i, θ) − h(x_j, θ))/a_n)

- if ∑_{j≠i} I(x_j ∈ X_n) K((h(x_i, θ) − h(x_j, θ))/a_n) ≠ 0
- otherwise: if y_i ≤ (y_min + y_max)/2, then Ê(x_i, θ) = y_min; otherwise Ê(x_i, θ) = y_max
- where X_n = {x : ||x − x′|| < 2a_n for some x′ ∈ X}
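A sketch of this estimator and the resulting objective in code (Gaussian kernel; the trimming indicator I(x_j ∈ X_n) and the y_min/y_max fallback are simplified to a small-denominator guard, so this is an illustration rather than the paper's exact construction):

```python
# Leave-one-out Nadaraya-Watson estimate of E(y | h(x, theta)) and the SLS
# objective J_n. Trimming is simplified; the Gaussian kernel is illustrative.
import numpy as np

def E_hat(y, index, a_n):
    """Leave-one-out kernel estimate of E[y_i | index_i] for every i."""
    d = (index[:, None] - index[None, :]) / a_n   # (h(x_i) - h(x_j)) / a_n
    K = np.exp(-0.5 * d ** 2)                     # smooth kernel K
    np.fill_diagonal(K, 0.0)                      # leave own observation out (j != i)
    return (K @ y) / np.maximum(K.sum(axis=1), 1e-12)  # guard for zero denominators

def J_n(theta, y, x, a_n):
    """Sample SLS objective: mean squared deviation from the kernel fit."""
    return np.mean((y - E_hat(y, x @ theta, a_n)) ** 2)
```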
SLS estimator 3
- contd.:
  - a_n → 0, but √n · a_n → ∞ (a positive sequence)
  - K : R → R is a smooth kernel function
  - if all denominators of the kernel regression function are zero, θ is assumed to be 0
- NLS vs SLS
- WSLS: a weighting function W(x) is introduced in the kernel numerator and denominator, and in the objective function itself (in front of the quadratic term)
- 0 < W(x) < W̄ for some constant W̄
- From now on we will restrict our attention to linear h(·)-s (an end-to-end sketch follows below)
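An end-to-end sketch under the simulated DGP, reusing y, x and J_n from the snippets above: minimize J_n with the scale normalization θ_1 = 1. The bandwidth a_n = n^(−1/5) is only an illustrative choice satisfying a_n → 0 and √n · a_n → ∞; it is not the rate the paper prescribes.

```python
# Minimize J_n over the free coefficient, holding theta_1 = 1 fixed.
import numpy as np
from scipy.optimize import minimize_scalar

a_n = len(y) ** (-1 / 5)   # a_n -> 0 and sqrt(n) * a_n = n^(3/10) -> infinity
res = minimize_scalar(lambda c: J_n(np.array([1.0, c]), y, x, a_n),
                      bounds=(-3.0, 3.0), method="bounded")
print(res.x)               # should land near theta0[1] = -0.5
```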
Identification

Assumptions 4.1-4.2
- Assumption 4.1: the function φ(·) is differentiable and not constant on the whole support of x′θ0
- Nominal regressors: these are the x vectors with l-th member (variable) x_l, which can actually be thought of as functions x_l(z) of underlying regressors (z_1, z_2, ..., z_{L′})
- We assume the underlying regressors are either continuous or discrete
- The first L_1 nominal and L′_1 underlying regressors have continuous marginals (the rest are discrete)
- Assumption 4.2:
  - (1) every x_l(·) has partial derivatives wrt the continuous underlying regressors
  - (2) for the discrete nominal regressors, ∂x_l/∂z_{l′} = 0 for l = L_1+1, L_1+2, ..., L and l′ = 1, 2, ..., L′_1, almost everywhere in z
- contd.:
  - (3) ⋂_{l′=1}^{L′_1} { (s^{l′}_1, ..., s^{l′}_{L_1}) : s^{l′}_l = ∂x_l/∂z_{l′} for some z ∈ Z }^⊥ = {0}
  - (4) (i) for each θ ∈ Θ there are an open set T and at least L − L_1 + 1 constant vectors c^l = (c^l_{L_1+1}, ..., c^l_L), l = 0, 1, ..., L − L_1, such that
    - the c^l − c^0 are linearly independent (for l = 1, 2, ..., L − L_1)
    - T is contained in

      ⋂_{l=0}^{L−L_1} { t : t = θ_1 x_1(z) + ... + θ_{L_1} x_{L_1}(z) + θ_{L_1+1} c^l_{L_1+1} + ... + θ_L c^l_L, z ∈ Z(c^l) },

      where Z(c^l) = { z ∈ Z : x_{L_1+1}(z) = c^l_{L_1+1}, ..., x_L(z) = c^l_L }
  - (4) (ii) and φ is not periodic on T
Identification: Theorem 4.1

Let us have the linear single-index model defined above. If there is a continuous regressor with a non-zero coefficient, then Assumptions 4.1 and 4.2 (1)-(3) imply that θ0 is identified up to a scalar constant for all continuous regressors. In addition, if 4.2 (4) is satisfied, then the coefficients of the discrete regressors are also identified (up to a scalar constant).

- Intuition for the proof: assume there is another θ′ that minimizes the objective function and derive a contradiction.
What does assumption 4.2 rule out?
- Ex. 4.2: when x_1 = z and x_2 = z²
- Ex. 4.3: x_1 = z_1 ∈ [0, 1] and x_2 = z_2 ∈ {0, 1}
- So 4.2 (3) is mostly a non-constancy/invertibility condition...
- ...while 4.2 (4) is a support-like condition
- We either need a small enough Θ or a large enough support (full support!) for the continuous variable
- What is the problem with a periodic φ(·)? (a numeric look at Ex. 4.2 follows below)
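A numeric look at Ex. 4.2, reusing within_bin_variance from the earlier sketch: on a support where t = z + c·z² is strictly monotone in z, conditioning on x′θ is the same as conditioning on z for every such θ, so the objective is flat and no unique minimizer exists.

```python
# Ex. 4.2: x1 = z, x2 = z^2 on z in [1, 2]; t = z + c*z^2 is increasing for
# every c >= 0, so E[y | x'theta] = E[y | z] for a continuum of theta.
import numpy as np

rng = np.random.default_rng(1)
z = rng.uniform(1.0, 2.0, size=2000)
x2 = np.column_stack([z, z ** 2])
y2 = np.sin(x2 @ np.array([1.0, 0.3])) + 0.05 * rng.normal(size=2000)

for c in [0.0, 0.3, 0.6]:   # all print roughly Var(eps): the objective is flat
    print(c, within_bin_variance(y2, x2 @ np.array([1.0, c])))
```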
Asymptotic properties
Assumptions 5.1-2
- Some objects:
  - X is a subset of the support of x
  - T_θ(X) = { t ∈ R : t = h(x, θ) for some x ∈ X }
  - f(t, θ) is the Lebesgue density of t = h(x, θ) (aww Lord)
- Assumption 5.1: iid sample
- Assumption 5.2: θ0 ∈ Int(Θ), where Θ ⊂ R^M is a compact set
Assumptions 5.3-6
- Assumption 5.3:
  - 1. X is compact
  - 2. inf_{x∈X} f(h(x, θ), θ) > 0
  - 3. f(t, θ) and E[y | h(x, θ) = t] are three times differentiable wrt t, and the third derivative is Lipschitz jointly in both arguments
- Assumption 5.4: y is in L^m (for some m ≥ 2), and the conditional variance of y (given x) is uniformly bounded and bounded away from 0 on X
- Assumption 5.5: h(x, θ) is Lipschitz jointly on X × Θ
- Assumption 5.6: on the kernel K; besides the usual conditions, the second derivative is Lipschitz
Consistency: Theorem 5.1

If Assumptions 5.1-5.6 hold, the (W)SLS estimator defined above is consistent.

- The proof uses that P[J_n(θ̂) ≤ J_n(θ0)] = 1 (θ̂ is the minimizer), and then observes that

  P[J_n(θ̂) ≤ J_n(θ0)] = P[J_n(θ̂) ≤ J_n(θ0), θ̂ ∈ B_ε(θ0)] + P[J_n(θ̂) ≤ J_n(θ0), θ̂ ∉ B_ε(θ0)]
                       ≤ P[θ̂ ∈ B_ε(θ0)] + P[inf_{θ ∈ Θ∖B_ε(θ0)} J_n(θ) ≤ J_n(θ0)]

- where B_ε(θ0) is an open ball of radius ε around θ0, and P[inf_{θ ∈ Θ∖B_ε(θ0)} J_n(θ) ≤ J_n(θ0)] → 0.
- Alternatively, I think after establishing uniform convergence of J_n one could use the continuous mapping theorem and the consistency theorem for extremum estimators (a tiny Monte Carlo check follows below)
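A hedged Monte Carlo glance at consistency under the simulated DGP used throughout (one replication per n; purely illustrative, reusing J_n from the earlier sketch):

```python
# The minimizer of J_n approaches theta0[1] = -0.5 as n grows.
import numpy as np
from scipy.optimize import minimize_scalar

for n in [200, 800, 3200]:
    rng = np.random.default_rng(n)
    xs = rng.normal(size=(n, 2))
    ys = np.tanh(xs @ np.array([1.0, -0.5])) + 0.1 * rng.normal(size=n)
    a_n = n ** (-1 / 5)
    est = minimize_scalar(lambda c: J_n(np.array([1.0, c]), ys, xs, a_n),
                          bounds=(-3.0, 3.0), method="bounded").x
    print(n, est)
```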
Asymptotic normality: Theorem 5.2
Under Assumptions 5.1-5.6, if y has at least 3 absolute moments, θ0 is identified, and the regularity conditions on a_n are satisfied, then

  √n (θ̂ − θ0) →_d N(0, V⁻¹ΣV⁻¹),

where the variance is just the usual sandwich form.

- Note that the usual sandwich formula is not feasible here, since it contains the derivative of the φ(·) function, which is unknown...
Some remarks
- Optimal reweighting
  - 'inner weighting'
  - the weights reduce variance AND bias
  - the optimal weighting is the usual Σ(x)⁻¹
  - one can show it achieves the semiparametric lower bound of Newey (1990)
- Estimation of the covariance matrix
  - he introduces a kernel estimate for ∂Ê_W(x_i, θ)/∂θ (a toy sketch of differentiating a kernel fit follows below)
- Small-sample properties (example): comparable with MRC, better than MS
  - the further we go from normality, the better this performs in relative terms
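Since the unknown φ′ enters the sandwich, one needs a derivative estimate. A hedged toy sketch of the idea (not the paper's estimator, which targets the θ-derivative of the weighted kernel fit directly): the t-derivative of a Gaussian Nadaraya-Watson fit, obtained by the quotient rule, is itself a kernel statistic.

```python
# d/dt of [sum_j y_j K((t - t_j)/a_n)] / [sum_j K((t - t_j)/a_n)], Gaussian K.
import numpy as np

def nw_derivative(t, t_j, y_j, a_n):
    u = (t - t_j) / a_n
    K = np.exp(-0.5 * u ** 2)
    dK = -u * K / a_n                 # d/dt K((t - t_j)/a_n)
    N, D = (K * y_j).sum(), K.sum()
    return (dK * y_j).sum() / D - N * dK.sum() / D ** 2   # quotient rule
```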
Proofs
Identification 1
- Suppose some θ∗ minimizes the objective function, so that

  E{ W(x) [φ(x, θ0) − E(φ(·) | x′θ∗)]² } = 0.

- Moreover, since W(x) > 0 for all x ∈ X,

  E(φ(·) | x′θ∗ = t) = φ(x(z), θ0),

- but then, after taking derivatives wrt z (normalize θ01 = θ∗1 = 1),

  φ′(x, θ0) [ γ_2 ∂x_2/∂z_{l′} + ... + γ_{L_1} ∂x_{L_1}/∂z_{l′} ] = 0

- for all l′ ∈ {1, ..., L′_1}, a.s. for z ∈ Z
- where γ_l = θ_{0l} − θ_{01} θ∗_l
- Now we need Assumption 4.2 (3) to hold for those z-s where φ′(θ0, x(z)) ≠ 0; then the first statement is proved.
- Common trick: t = x′(θ∗ − θ0); this is what you really condition on...
Identification 2
- So now we have identified the coefficients θ01, ..., θ0L1 up to a constant r
- This leaves us with

  φ(x, θ0) = E[φ(·) | x′θ∗] = φ( t/r + (θ0,L1+1/r − θ∗L1+1) x_{L1+1} + ... + (θ0L/r − θ∗L) x_L )

- After staring at it for a minute or two, you realize that Assumption 4.2 (4) is ready-made for this...
Consistency
- We only have to show

  P[ inf_{θ ∈ Θ∖B_ε(θ0)} J_n(θ) ≤ J_n(θ0) ] → 0.

- Tedious algebra; the only "idea": build a bridge of ε-s by using the true E_W(·) in place of the estimated Ê_W(·), then the triangle inequality plus identification gives the desired result
- Intuition
Asymptotic Normality

- basically the standard proof from Newey-McFadden
- assuming the kernel estimator is suitably consistent under some restrictions (Lipschitz)
Thank you for your attention!