Hidehiko Ichimura: Semiparametric Least Squares (SLS) and weighted SLS estimation of single-index models¹

Péter Tóth ([email protected])
UT Austin

¹ Journal of Econometrics 58 (1993), pp. 71-120.
Introduction
Index models
- Semiparametric model: the index parametrizing the DGP (a distribution) consists of two parts, θ and γ, where θ ∈ Θ, a finite-dimensional space, while γ ∈ Γ, an infinite-dimensional space
- Single-index model: the DGP is assumed to be

  y_i = φ(h(x_i, θ0)) + ε_i,  i = 1, 2, ..., n

  - where {(x_i, y_i)} is the observed iid sample
  - θ0 ∈ R^M is the true (finite-dimensional) parameter vector
  - E[ε_i | x_i] = 0
  - h(·) is known up to the parameter θ, but φ(·) is not
- Examples (a simulated one follows below).
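A minimal simulated instance of such a DGP (all numbers and the link function below are hypothetical illustrations, not from the paper): the econometrician observes (x_i, y_i) and knows h(x, θ) = x′θ, but never sees φ.

```python
# Hypothetical single-index DGP: y_i = phi(x_i' theta0) + eps_i.
# phi = tanh and theta0 = (1, -0.5) are illustrative choices only.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
theta0 = np.array([1.0, -0.5])   # first coefficient normalized to 1 (scale is not identified)

x = rng.normal(size=(n, 2))      # observed regressors
index = x @ theta0               # linear index h(x, theta0) = x' theta0
phi = np.tanh                    # the "unknown" smooth link function
y = phi(index) + 0.1 * rng.normal(size=n)   # E[eps | x] = 0, sd(eps) = 0.1
```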
(W)SLS estimator
Three observations
- 1. The variation in y results from the variation in ε and the variation in x (or in h(x, θ0))
- 2. On a 'contour line', where h(x, θ0) = const, all the variation in y comes from ε
- 3. (2) does not (necessarily) hold for θ ≠ θ0
- So what will be the identification strategy for θ0?
- What would be the estimation strategy?
- Caveats: we have conditional moments (a quick numeric check of (2)-(3) follows below)
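A quick numeric check of observations (2)-(3), reusing x, y, theta0 from the simulation sketch above: within narrow quantile bins of the index (a crude stand-in for conditioning on h(x, θ)), the variance of y is close to Var(ε) at θ0 but inflated at a wrong θ, because variation in φ leaks into the bins.

```python
# Within-bin variance of y: ~ Var(eps) at theta0, larger at a wrong theta.
import numpy as np

def within_bin_variance(y, index, n_bins=30):
    """Average variance of y within quantile bins of the index."""
    edges = np.quantile(index, np.linspace(0, 1, n_bins + 1))
    ids = np.clip(np.digitize(index, edges[1:-1]), 0, n_bins - 1)
    return np.mean([y[ids == b].var() for b in range(n_bins) if (ids == b).sum() > 1])

print(within_bin_variance(y, x @ theta0))                # ~ 0.01 = Var(eps)
print(within_bin_variance(y, x @ np.array([1.0, 0.5])))  # noticeably larger
```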
SLS estimator 1
- An extremum estimator
- If we knew the conditional mean, the objective function would be the sample analogue of some measure of variation, in particular here of the variance:

  J_n^h(θ) = (1/n) ∑_i [y_i − E(y_i | h(x_i, θ))]²

- One can also just sum over all the h-values
- Since we do not know the conditional mean, we estimate it nonparametrically with a smooth kernel estimator (why?), so

  J_n(θ) = (1/n) ∑_i I(x_i ∈ X) [y_i − Ê(x_i, θ)]² + o_p(n⁻¹)
SLS estimator 2
- Here θ ∈ R^L
- The leave-one-out kernel regression estimate (a runnable sketch follows below):

  Ê(x_i, θ) = ∑_{j≠i} y_j I(x_j ∈ X_n) K((h(x_i, θ) − h(x_j, θ))/a_n) / ∑_{j≠i} I(x_j ∈ X_n) K((h(x_i, θ) − h(x_j, θ))/a_n)

- if ∑_{j≠i} I(x_j ∈ X_n) K((h(x_i, θ) − h(x_j, θ))/a_n) ≠ 0
- otherwise: if y_i ≤ (y_min + y_max)/2, then Ê(x_i, θ) = y_min; otherwise Ê(x_i, θ) = y_max
- where X_n = {x : ||x − x′|| < 2a_n for some x′ ∈ X}
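A sketch of this estimator and the resulting objective in code (Gaussian kernel; the trimming indicator I(x_j ∈ X_n) and the y_min/y_max fallback are simplified to a small-denominator guard, so this is an illustration rather than the paper's exact construction):

```python
# Leave-one-out Nadaraya-Watson estimate of E(y | h(x, theta)) and the SLS
# objective J_n. Trimming is simplified; the Gaussian kernel is illustrative.
import numpy as np

def E_hat(y, index, a_n):
    """Leave-one-out kernel estimate of E[y_i | index_i] for every i."""
    d = (index[:, None] - index[None, :]) / a_n   # (h(x_i) - h(x_j)) / a_n
    K = np.exp(-0.5 * d ** 2)                     # smooth kernel K
    np.fill_diagonal(K, 0.0)                      # leave own observation out (j != i)
    return (K @ y) / np.maximum(K.sum(axis=1), 1e-12)  # guard for zero denominators

def J_n(theta, y, x, a_n):
    """Sample SLS objective: mean squared deviation from the kernel fit."""
    return np.mean((y - E_hat(y, x @ theta, a_n)) ** 2)
```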
SLS estimator 3
- contd.:
  - a_n → 0, but √n · a_n → ∞ (a positive sequence)
  - K : R → R is a smooth kernel function
  - if all denominators of the kernel regression function are zero, θ is assumed to be 0
- NLS vs SLS
- WSLS: a weighting function W(x) is introduced in the kernel numerator and denominator, and in the objective function itself (in front of the quadratic term)
- 0 < W(x) < W̄ for some constant W̄
- From now on we will restrict our attention to linear h(·)-s (an end-to-end sketch follows below)
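An end-to-end sketch under the simulated DGP, reusing y, x and J_n from the snippets above: minimize J_n with the scale normalization θ_1 = 1. The bandwidth a_n = n^(−1/5) is only an illustrative choice satisfying a_n → 0 and √n · a_n → ∞; it is not the rate the paper prescribes.

```python
# Minimize J_n over the free coefficient, holding theta_1 = 1 fixed.
import numpy as np
from scipy.optimize import minimize_scalar

a_n = len(y) ** (-1 / 5)   # a_n -> 0 and sqrt(n) * a_n = n^(3/10) -> infinity
res = minimize_scalar(lambda c: J_n(np.array([1.0, c]), y, x, a_n),
                      bounds=(-3.0, 3.0), method="bounded")
print(res.x)               # should land near theta0[1] = -0.5
```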
Identification

Assumptions 4.1-4.2
- Assumption 4.1: the function φ(·) is differentiable and not constant on the whole support of x′θ0
- Nominal regressors: these are the x vectors with l-th member (variable) x_l, which can actually be thought of as functions x_l(z) of underlying regressors (z_1, z_2, ..., z_{L′})
- We assume the underlying regressors are either continuous or discrete
- The first L_1 nominal and L′_1 underlying regressors have continuous marginals (the rest are discrete)
- Assumption 4.2:
  - (1) every x_l(·) has partial derivatives wrt the continuous underlying regressors
  - (2) for the discrete nominal regressors, ∂x_l/∂z_{l′} = 0 for l = L_1+1, L_1+2, ..., L and l′ = 1, 2, ..., L′_1, almost everywhere in z
- contd.:
  - (3) ⋂_{l′=1}^{L′_1} { (s^{l′}_1, ..., s^{l′}_{L_1}) : s^{l′}_l = ∂x_l/∂z_{l′} for some z ∈ Z }^⊥ = {0}
  - (4) (i) for each θ ∈ Θ there are an open set T and at least L − L_1 + 1 constant vectors c^l = (c^l_{L_1+1}, ..., c^l_L), l = 0, 1, ..., L − L_1, such that
    - the c^l − c^0 are linearly independent (for l = 1, 2, ..., L − L_1)
    - T is contained in

      ⋂_{l=0}^{L−L_1} { t : t = θ_1 x_1(z) + ... + θ_{L_1} x_{L_1}(z) + θ_{L_1+1} c^l_{L_1+1} + ... + θ_L c^l_L, z ∈ Z(c^l) },

      where Z(c^l) = { z ∈ Z : x_{L_1+1}(z) = c^l_{L_1+1}, ..., x_L(z) = c^l_L }
  - (4) (ii) and φ is not periodic on T
Identification: Theorem 4.1

Let us have the linear single-index model defined above. If there is a continuous regressor with a non-zero coefficient, then Assumptions 4.1 and 4.2 (1)-(3) imply that θ0 is identified up to a scalar constant for all continuous regressors. In addition, if 4.2 (4) is satisfied, then the coefficients of the discrete regressors are also identified (up to a scalar constant).

- Intuition for the proof: assume there is another θ′ that minimizes the objective function and derive a contradiction.
What does assumption 4.2 rule out?
- Ex. 4.2: when x_1 = z and x_2 = z²
- Ex. 4.3: x_1 = z_1 ∈ [0, 1] and x_2 = z_2 ∈ {0, 1}
- So 4.2 (3) is mostly a non-constancy/invertibility condition...
- ...while 4.2 (4) is a support-like condition
- We either need a small enough Θ or a large enough support (full support!) for the continuous variable
- What is the problem with a periodic φ(·)? (a numeric look at Ex. 4.2 follows below)
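A numeric look at Ex. 4.2, reusing within_bin_variance from the earlier sketch: on a support where t = z + c·z² is strictly monotone in z, conditioning on x′θ is the same as conditioning on z for every such θ, so the objective is flat and no unique minimizer exists.

```python
# Ex. 4.2: x1 = z, x2 = z^2 on z in [1, 2]; t = z + c*z^2 is increasing for
# every c >= 0, so E[y | x'theta] = E[y | z] for a continuum of theta.
import numpy as np

rng = np.random.default_rng(1)
z = rng.uniform(1.0, 2.0, size=2000)
x2 = np.column_stack([z, z ** 2])
y2 = np.sin(x2 @ np.array([1.0, 0.3])) + 0.05 * rng.normal(size=2000)

for c in [0.0, 0.3, 0.6]:   # all print roughly Var(eps): the objective is flat
    print(c, within_bin_variance(y2, x2 @ np.array([1.0, c])))
```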
Asymptotic properties
Assumptions 5.1-2
- Some objects:
  - X is a subset of the support of x
  - T_θ(X) = { t ∈ R : t = h(x, θ) for some x ∈ X }
  - f(t, θ) is the Lebesgue density of t = h(x, θ) (aww Lord)
- Assumption 5.1: iid sample
- Assumption 5.2: θ0 ∈ Int(Θ), where Θ ⊂ R^M is a compact set
Assumptions 5.3-6
- Assumption 5.3:
  - 1. X is compact
  - 2. inf_{x∈X} f(h(x, θ), θ) > 0
  - 3. f(t, θ) and E[y | h(x, θ) = t] are three times differentiable wrt t, and the third derivative is Lipschitz jointly in both arguments
- Assumption 5.4: y is in L^m (for some m ≥ 2), and the conditional variance of y (given x) is uniformly bounded and bounded away from 0 on X
- Assumption 5.5: h(x, θ) is Lipschitz jointly on X × Θ
- Assumption 5.6: on the kernel K; besides the usual conditions, the second derivative is Lipschitz
Consistency: Theorem 5.1

If Assumptions 5.1-5.6 hold, the (W)SLS estimator defined above is consistent.

- The proof uses that P[J_n(θ̂) ≤ J_n(θ0)] = 1 (θ̂ is the minimizer), and then observes that

  P[J_n(θ̂) ≤ J_n(θ0)] = P[J_n(θ̂) ≤ J_n(θ0), θ̂ ∈ B_ε(θ0)] + P[J_n(θ̂) ≤ J_n(θ0), θ̂ ∉ B_ε(θ0)]
                       ≤ P[θ̂ ∈ B_ε(θ0)] + P[inf_{θ ∈ Θ∖B_ε(θ0)} J_n(θ) ≤ J_n(θ0)]

- where B_ε(θ0) is an open ball of radius ε around θ0, and P[inf_{θ ∈ Θ∖B_ε(θ0)} J_n(θ) ≤ J_n(θ0)] → 0.
- Alternatively, I think after establishing uniform convergence of J_n one could use the continuous mapping theorem and the consistency theorem for extremum estimators (a tiny Monte Carlo check follows below)
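A hedged Monte Carlo glance at consistency under the simulated DGP used throughout (one replication per n; purely illustrative, reusing J_n from the earlier sketch):

```python
# The minimizer of J_n approaches theta0[1] = -0.5 as n grows.
import numpy as np
from scipy.optimize import minimize_scalar

for n in [200, 800, 3200]:
    rng = np.random.default_rng(n)
    xs = rng.normal(size=(n, 2))
    ys = np.tanh(xs @ np.array([1.0, -0.5])) + 0.1 * rng.normal(size=n)
    a_n = n ** (-1 / 5)
    est = minimize_scalar(lambda c: J_n(np.array([1.0, c]), ys, xs, a_n),
                          bounds=(-3.0, 3.0), method="bounded").x
    print(n, est)
```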
Asymptotic normality: Theorem 5.2
Under Assumptions 5.1-5.6, if y has at least 3 absolute moments, θ0 is identified, and the regularity conditions on a_n are satisfied, then

  √n (θ̂ − θ0) →_d N(0, V⁻¹ΣV⁻¹),

where the variance is just the usual sandwich form.

- Note that the usual sandwich formula is not feasible here, since it contains the derivative of the φ(·) function, which is unknown...
Some remarks
- Optimal reweighting
  - 'inner weighting'
  - the weights reduce variance AND bias
  - the optimal weighting is the usual Σ(x)⁻¹
  - one can show it achieves the semiparametric lower bound of Newey (1990)
- Estimation of the covariance matrix
  - he introduces a kernel estimate for ∂Ê_W(x_i, θ)/∂θ (a toy sketch of differentiating a kernel fit follows below)
- Small-sample properties (example): comparable with MRC, better than MS
  - the further we go from normality, the better this performs in relative terms
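Since the unknown φ′ enters the sandwich, one needs a derivative estimate. A hedged toy sketch of the idea (not the paper's estimator, which targets the θ-derivative of the weighted kernel fit directly): the t-derivative of a Gaussian Nadaraya-Watson fit, obtained by the quotient rule, is itself a kernel statistic.

```python
# d/dt of [sum_j y_j K((t - t_j)/a_n)] / [sum_j K((t - t_j)/a_n)], Gaussian K.
import numpy as np

def nw_derivative(t, t_j, y_j, a_n):
    u = (t - t_j) / a_n
    K = np.exp(-0.5 * u ** 2)
    dK = -u * K / a_n                 # d/dt K((t - t_j)/a_n)
    N, D = (K * y_j).sum(), K.sum()
    return (dK * y_j).sum() / D - N * dK.sum() / D ** 2   # quotient rule
```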
Proofs
Identification 1
- Suppose some θ∗ minimizes the objective function, so that

  E{ W(x) [φ(x, θ0) − E(φ(·) | x′θ∗)]² } = 0.

- Moreover, since W(x) > 0 for all x ∈ X,

  E(φ(·) | x′θ∗ = t) = φ(x(z), θ0),

- but then, after taking derivatives wrt z (normalize θ01 = θ∗1 = 1),

  φ′(x, θ0) [ γ_2 ∂x_2/∂z_{l′} + ... + γ_{L_1} ∂x_{L_1}/∂z_{l′} ] = 0

- for all l′ ∈ {1, ..., L′_1}, a.s. for z ∈ Z
- where γ_l = θ_{0l} − θ_{01} θ∗_l
- Now we need Assumption 4.2 (3) to hold for those z-s where φ′(θ0, x(z)) ≠ 0; then the first statement is proved.
- Common trick: t = x′(θ∗ − θ0); this is what you really condition on...
Identification 2
- So now we have identified the coefficients θ01, ..., θ0L1 up to a constant r
- This leaves us with

  φ(x, θ0) = E[φ(·) | x′θ∗] = φ( t/r + (θ0,L1+1/r − θ∗L1+1) x_{L1+1} + ... + (θ0L/r − θ∗L) x_L )

- After staring at it for a minute or two, you realize that Assumption 4.2 (4) is ready-made for this...
Consistency
- We only have to show

  P[ inf_{θ ∈ Θ∖B_ε(θ0)} J_n(θ) ≤ J_n(θ0) ] → 0.

- Tedious algebra; the only "idea": build a bridge of ε-s by using the true E_W(·) in place of the estimated Ê_W(·), then the triangle inequality plus identification gives the desired result
- Intuition
Asymptotic Normality

- basically the standard proof from Newey-McFadden
- assuming the kernel estimator is suitably consistent under some restrictions (Lipschitz)
Thank you for your attention!