LIMITED DEPENDENT VARIABLES - BASIC
[1] Binary choice models
• Motivation:
• Dependent variable (yt) is a yes/no variable (e.g., unionism, migration, labor force participation, or death ...).
(1) Linear Model (Somewhat Defective)
Digression to Bernoulli's Distribution:
• Y is a Bernoulli random variable with p = Pr(Y=1) and (1-p) = Pr(Y=0).
• f(y) = p^y(1-p)^(1-y), y = 0, 1.
• E(Y) = Σy y·f(y) = 1•p + 0•(1-p) = p;
• var(Y) = Σy y²·f(y) - [E(Y)]² = p - p² = p(1-p).
End of Digression
• Linear Model:
Limited_Basic-1
yt = xt•′β + εt, where yt = 1 if yes and yt = 0 if no.
• Assume that E(εt|xt•) = 0.
• E(yt|xt•) = xt•′β ≡ pt = Pr(yt = 1|xt•).
• ∂pt/∂xtj = βj: So, the coefficients measure effects of xtj on pt.
• Problems in the linear model:
1) The εt are nonnormal and heteroskedastic.
• Note that yt = 1 or 0.
→ εt = 1 - xt•′β with prob = pt = xt•′β;
  εt = -xt•′β with prob = 1 - pt = 1 - xt•′β.
• E(εt|xt•) = (1 - xt•′β)(xt•′β) + (-xt•′β)(1 - xt•′β) = 0.
• var(εt|xt•) = E(εt²|xt•) = (1 - xt•′β)²(xt•′β) + (xt•′β)²(1 - xt•′β) = xt•′β(1 - xt•′β)
→ Not constant over t.
• OLS is unbiased but not efficient.
• GLS using σ̂t² = xt•′β̂(1 - xt•′β̂) is more efficient than OLS.
2) Suppose that we wish to predict po = Pr(yo = 1|xo•) at xo•. The natural predictor of po is xo•′β̂, where β̂ is OLS or GLS. But xo•′β̂ could lie outside the range (0,1).
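Both defects can be seen with a few numbers. A minimal sketch (the coefficient values and the x grid below are made up for illustration):

```python
import numpy as np

# Hypothetical linear-probability-model coefficients: p_t = b0 + b1*x_t
b0, b1 = 0.2, 0.3
x = np.array([-2.0, 0.0, 1.0, 2.0, 3.0])

p = b0 + b1 * x          # "probabilities" implied by the linear model
var_eps = p * (1 - p)    # implied error variance x'beta(1 - x'beta)

print(p)        # some fitted values fall outside (0,1)
print(var_eps)  # varies with x: heteroskedastic
```

Note that at the x values where p is outside (0,1), the implied "variance" p(1-p) is even negative, which is another symptom of the same misspecification.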
(2) Probit Model
• Model:
yt* = xt•′β + εt, t = 1, ..., T,
where yt* is an unobservable latent variable (e.g., level of utility);
yt = 1 if yt* > 0
   = 0 if yt* < 0;
E(εt|xt•) = 0; -εt ~ N(0,1) conditional on xt•;
E(εt|εt-1, ..., ε1, x1•, ..., xt•) = 0;
the xt• are ergodic and stationary.
Digression to normal pdf and cdf
• X ~ N(μ,σ²): f(x) = [1/(√(2π)σ)]exp(-(x-μ)²/(2σ²)), -∞ < x < ∞.
• Z ~ N(0,1): φ(z) = [1/√(2π)]exp(-z²/2); Φ(z) = Pr(Z < z) = ∫_{-∞}^{z} φ(v)dv.
• In GAUSS, φ(z) = pdfn(z) and Φ(z) = cdfn(z).
• Some useful facts:
dΦ(z)/dz = φ(z); dφ/dz = -zφ(z); Φ(-z) = 1 - Φ(z); φ(z) = φ(-z).
End of digression
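These facts are easy to verify numerically; a quick sketch using scipy (assuming scipy is available — in GAUSS the same objects are pdfn and cdfn):

```python
import numpy as np
from scipy.stats import norm

z = 0.7
eps = 1e-6

# dPhi(z)/dz = phi(z): check with a central finite difference
deriv_cdf = (norm.cdf(z + eps) - norm.cdf(z - eps)) / (2 * eps)
assert abs(deriv_cdf - norm.pdf(z)) < 1e-6

# dphi(z)/dz = -z*phi(z)
deriv_pdf = (norm.pdf(z + eps) - norm.pdf(z - eps)) / (2 * eps)
assert abs(deriv_pdf - (-z * norm.pdf(z))) < 1e-6

# Phi(-z) = 1 - Phi(z) and phi(z) = phi(-z)
assert np.isclose(norm.cdf(-z), 1 - norm.cdf(z))
assert np.isclose(norm.pdf(z), norm.pdf(-z))
```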
• Return to the Probit model
• PDF of the yt: Conditional on xt•,
• Pr(yt = 1) = Pr(yt* > 0) = Pr(xt•′β + εt > 0) = Pr(xt•′β > -εt)
= Pr(-εt < xt•′β) = Φ(xt•′β).
→ This guarantees that pt ≡ Pr(yt = 1) lies in the range (0,1).
• f(yt|xt•) = [Φ(xt•′β)]^(yt)[1 - Φ(xt•′β)]^(1-yt).
Short Digression: Suppose yt* = xt•′β + εt, t = 1, ..., T, where the (-εt) are i.i.d. U(0,1).
Then, Pr(yt = 1|xt•) = xt•′β (linear) (Heckman and Snyder, Rand, 1997).
End of Digression
• Log-likelihood Function of the Probit model
• LT(β) = Π_{t=1}^{T} f(yt|xt•).
• lT(β) = Σt ln f(yt|xt•) = Σt [yt ln Φ(xt•′β) + (1 - yt) ln(1 - Φ(xt•′β))].
• Some useful facts:
• E(yt|xt•) = Φ(xt•′β) .
• ∂Φ(xt•′β)/∂β = φ(xt•′β)xt• [k×1].
• ∂²Φ(xt•′β)/∂β∂β′ = -(xt•′β)φ(xt•′β)xt•xt•′ [k×k].
• ∂lT(β)/∂β = Σt {yt ∂lnΦ(xt•′β)/∂β + (1 - yt) ∂ln(1 - Φ(xt•′β))/∂β}
= Σt {yt [φ(xt•′β)/Φ(xt•′β)]xt• - (1 - yt)[φ(xt•′β)/(1 - Φ(xt•′β))]xt•}
= Σt {[(yt - Φ(xt•′β))φ(xt•′β)] / [Φ(xt•′β)(1 - Φ(xt•′β))]} xt•.
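The log-likelihood and the score just derived translate directly into code. A sketch on simulated data (the design matrix and parameter values are hypothetical), with the analytic score checked against a finite-difference gradient:

```python
import numpy as np
from scipy.stats import norm

def probit_loglik(beta, y, X):
    """l_T(b) = sum_t [ y_t ln Phi(x_t'b) + (1 - y_t) ln(1 - Phi(x_t'b)) ]."""
    p = norm.cdf(X @ beta)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def probit_score(beta, y, X):
    """dl_T/db = sum_t [ (y_t - Phi_t) phi_t / (Phi_t (1 - Phi_t)) ] x_t."""
    xb = X @ beta
    Phi, phi = norm.cdf(xb), norm.pdf(xb)
    xi = (y - Phi) * phi / (Phi * (1 - Phi))
    return X.T @ xi

# Simulated data, just to exercise the formulas
rng = np.random.default_rng(0)
T = 200
X = np.column_stack([np.ones(T), rng.normal(size=T)])
beta_true = np.array([0.3, 0.8])
y = (X @ beta_true + rng.normal(size=T) > 0).astype(float)

# Analytic score vs. finite-difference gradient at an arbitrary point
b = np.array([0.1, 0.5])
eps = 1e-6
num_grad = np.array([
    (probit_loglik(b + eps * e, y, X) - probit_loglik(b - eps * e, y, X)) / (2 * eps)
    for e in np.eye(2)])
print(np.max(np.abs(probit_score(b, y, X) - num_grad)))  # close to 0
```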
• Numerical Property of the MLE of β (β̂)
• ∂lT(β̂)/∂β = Σt {[(yt - Φ(xt•′β̂))φ(xt•′β̂)] / [Φ(xt•′β̂)(1 - Φ(xt•′β̂))]} xt• = 0_{k×1}.
• HT(β̂) = ∂²lT(β̂)/∂β∂β′ should be negative definite.
[See Judge, et al. for the exact form of HT.]
• lT(β) is globally concave with respect to β; that is, HT(β) is negative definite. [Amemiya (1985, Advanced Econometrics)]
• Use -[HT(β̂)]^{-1} as an estimate of Cov(β̂).
• How to find MLE (See Greene Ch. 5 or Hamilton, Ch. 5)
1. Newton-Raphson’s algorithm:
STEP 1: Choose an initial θ̂0. Then compute
(*) θ̂1 = θ̂0 - [HT(θ̂0)]^{-1} sT(θ̂0).
STEP 2: Using θ̂1, compute θ̂2 by (*).
STEP 3: Continue until θ̂_{q+1} ≅ θ̂_q.
Note: The N-R method is the best if lT(θ) is globally concave (i.e., the Hessian matrix is negative definite for any θ). N-R may not work if lT(θ) is not globally concave.
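The update rule (*) can be illustrated on a toy globally concave objective (a made-up scalar example, not a likelihood):

```python
# Toy concave objective l(th) = -(th - 2)^4, with its score and Hessian
score   = lambda th: -4 * (th - 2) ** 3
hessian = lambda th: -12 * (th - 2) ** 2

th = 0.0                              # initial value theta_0
for _ in range(100):                  # iterate (*) until theta_{q+1} ~ theta_q
    step = -score(th) / hessian(th)   # -[H(th)]^{-1} s(th), scalar case
    th = th + step
    if abs(step) < 1e-10:
        break

print(th)   # converges to the maximizer, 2
```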
2. BHHH [Berndt, Hall, Hall, Hausman]
• lT(θ) = Σt ln[ft(θ)].
• Define:
gt(θ) = ∂ln[ft(θ)]/∂θ [p×1] (sT(θ) = Σt gt(θ));
BT(θ) = Σt gt(θ)gt(θ)′ [cross product of first derivatives].
Theorem: Under suitable regularity conditions,
(1/T)BT(θo) →p lim_{T→∞} E[-(1/T)HT(θo)].
Implication:
• BT(θ) ≈ -HT(θ), as T → ∞.
• Cov(θ̂) can be estimated by [BT(θ̂)]^{-1} or -[HT(θ̂)]^{-1}.
• BHHH algorithm uses
θ̂1 = θ̂0 + λ[BT(θ̂0)]^{-1} sT(θ̂0),
where λ is called the step length.
• When BHHH is used, there is no need to compute second derivatives.
• Other available algorithms: BFGS, BFGS-SC, DFP.
BHHH for Probit:
Can show gt(β) = ξt xt•, where
ξt = (yt - Φt)φt / [Φt(1 - Φt)]; φt = φ(xt•′β); Φt = Φ(xt•′β).
→ BT(β̂) = Σt ĝt ĝt′ = Σt ξ̂t² xt• xt•′.
[BT(β̂)]^{-1} is the BHHH estimate of Cov(β̂).
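A quick numerical check that the two expressions for BT agree — the sum of outer products Σt gt gt′ and the weighted cross-product Σt ξt² xt•xt•′ (simulated data, arbitrary evaluation point):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
T = 300
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = (X @ np.array([0.2, 0.6]) + rng.normal(size=T) > 0).astype(float)
beta = np.array([0.2, 0.6])                       # any evaluation point

xb = X @ beta
Phi, phi = norm.cdf(xb), norm.pdf(xb)
xi = (y - Phi) * phi / (Phi * (1 - Phi))          # xi_t

g = xi[:, None] * X                               # row t is g_t' = xi_t x_t'
B_outer = sum(np.outer(gt, gt) for gt in g)       # sum_t g_t g_t'
B_weighted = X.T @ (xi[:, None] ** 2 * X)         # sum_t xi_t^2 x_t x_t'

print(np.allclose(B_outer, B_weighted))  # True
```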
• Interpretation of β
1) βj shows the direction of the influence of xtj on Pr(yt = 1|xt•) = Φ(xt•′β).
→ βj > 0 means that Pr(yt = 1|xt•) increases with xtj.
2) Rate of change:
∂Pr(yt = 1|xt•)/∂xtj = ∂Φ(xt•′β)/∂xtj = φ(xt•′β)βj.
• Estimation of probabilities and rates of change
• Estimation of pt = Pr(yt = 1|xt•) at the mean of xt•:
• Use p̂ = Φ(x̄′β̂).
• var(p̂) = [φ(x̄′β̂)]² x̄′Ω̂x̄, where Ω̂ is the estimate of Cov(β̂) [by delta-method].
• Estimation of rates of change:
• Use p̂j = ∂Φ(x̄′β̂)/∂xj = φ(x̄′β̂)β̂j.
• var(p̂j) = [∂pj(β̂)/∂β′] Ω̂ [∂pj(β̂)/∂β] [by delta-method].
• Note that:
∂pj(β)/∂β′ = -(x̄′β)φ(x̄′β)βj x̄′ + φ(x̄′β)Jj,
where Jj = 1×k vector of zeros except that the j'th element = 1.
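The two delta-method calculations can be sketched together (the values of β̂, Ω̂, and x̄ below are made-up stand-ins for probit output):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical probit estimates, their covariance, and the regressor mean
beta_hat = np.array([0.4, 0.9])
Omega    = np.array([[0.010, 0.002],
                     [0.002, 0.020]])          # stand-in for Cov(beta_hat)
xbar     = np.array([1.0, 0.5])                # includes the constant

xb = xbar @ beta_hat

# Estimated probability and its delta-method variance
p_hat = norm.cdf(xb)
var_p = (norm.pdf(xb) ** 2) * (xbar @ Omega @ xbar)

# Marginal effect of x_j (j = 1) and its delta-method variance
j = 1
me_hat = norm.pdf(xb) * beta_hat[j]
Jj = np.eye(2)[j]                              # selection vector J_j
grad = -xb * norm.pdf(xb) * beta_hat[j] * xbar + norm.pdf(xb) * Jj
var_me = grad @ Omega @ grad

print(p_hat, var_p, me_hat, var_me)
```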
• Note on normalization:
• Model: yt* = xt•′β + εt, -εt ~ N(0,σ2).
yt = 1 iff yt* > 0.
• pt = Pr(yt = 1|xt•) = Pr(yt* > 0|xt•) = Pr(xt•′β + εt > 0|xt•)
= Pr(-εt < xt•′β|xt•) = Pr(-εt/σ < xt•′(β/σ)|xt•) = Φ[xt•′(β/σ)].
• Can estimate β/σ, but not β and σ separately.
• Testing Hypothesis:
1. Wald test:
• Ho: w(β) = 0.
• WT = w(β̂)′[W(β̂)Ω̂W(β̂)′]^{-1}w(β̂) →d χ²(df = # of restrictions),
where β̂ = probit MLE and W(β) = ∂w(β)/∂β′.
2. LR test:
• Easy for equality or zero restrictions (i.e., Ho: β2 = β3, or Ho: β2 = β3 =
0).
• EX 1: Suppose you wish to test Ho: β4 = β5 = 0.
STEP 1: Do Probit without restriction and get lT,UR = ln(LT,UR).
STEP 2: Do Probit with the restrictions and get lT,R = ln(LT,R).
→ Probit without xt4 and xt5.
STEP 3: LRT = 2[lT,UR - lT,R] →d χ²(df = 2).
• EX 2: Suppose you wish to test Ho: β2 = ... = βk = 0.
(Overall significance test)
• Let n = Σtyt.
• lT* = n ln(n/T) + (T-n) ln[(T-n)/T].
• LRT = 2[lT,UR – lT*] →d χ²(k-1).
• Pseudo-R2 (McFadden, 1974)
• ρ2 = 1 – lT,UR /lT*.
→ 0 ≤ ρ2 ≤ 1.
• If Φ(xt•′β̂) = 1 whenever yt = 1, and if Φ(xt•′β̂) = 0 whenever yt = 0, then ρ² = 1.
• If 0 < ρ² < 1, no clear meaning.
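Since the restricted model in EX 2 has only an intercept, lT* is available in closed form, and the overall LR statistic and McFadden's ρ² follow immediately. A sketch, fitting the unrestricted probit by scipy.optimize on simulated data:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def neg_loglik(beta, y, X):
    p = norm.cdf(X @ beta)
    p = np.clip(p, 1e-12, 1 - 1e-12)     # guard the logs
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

rng = np.random.default_rng(2)
T = 500
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = (X @ np.array([0.2, 1.0]) + rng.normal(size=T) > 0).astype(float)

# Unrestricted probit MLE
res = minimize(neg_loglik, x0=np.zeros(2), args=(y, X), method="BFGS")
l_ur = -res.fun

# Restricted model (intercept only): l_T* = n ln(n/T) + (T-n) ln((T-n)/T)
n = y.sum()
l_star = n * np.log(n / T) + (T - n) * np.log((T - n) / T)

LR = 2 * (l_ur - l_star)       # ~ chi2(k - 1) under Ho
rho2 = 1 - l_ur / l_star       # McFadden pseudo-R2

print(LR, rho2)
```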
(3) Logit Models
• Model: yt* = xt•′β + εt,
εt ~ logistic with g(ε) = e^ε/(1+e^ε)² and G(ε) = e^ε/(1+e^ε).
• Use Pr(yt = 1|xt•) = G(xt•′β) (instead of Φ(xt•′β)).
• Logit MLE β̂logit maximizes
lT(β) = Σt [yt ln G(xt•′β) + (1 - yt) ln(1 - G(xt•′β))].
• Use -[HT(β̂logit)]^{-1} or [BT(β̂logit)]^{-1} as the estimate of Cov(β̂logit).
• ∂pt/∂xtj = g(xt•′β)βj.
• Facts:
• The logistic dist. is quite similar to the standard normal dist., except that the logistic dist. has thicker tails (similar to t(7)).
• If data contain few obs. with y = 1 or y = 0, then probit and logit may be
quite different. Other than that, probit and logit yield very similar
predictions. Especially, marginal effects are quite similar.
• Roughly, β̂logit ≈ 1.6 β̂probit.
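The 1.6 rule of thumb comes from matching the two marginal effects at xt•′β = 0: φ(0)/g(0) = 0.3989/0.25 ≈ 1.6. A one-line check:

```python
from scipy.stats import norm, logistic

# Marginal effects at x'beta = 0 are phi(0)*b_probit and g(0)*b_logit.
# Equating them gives b_logit / b_probit = phi(0) / g(0).
ratio = norm.pdf(0) / logistic.pdf(0)   # 0.3989... / 0.25
print(ratio)   # about 1.6
```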
[2] Censoring vs. Truncation
(Greene, ch. 20)
(1) Classical distinction
• Consider shots on target.
Truncation: cases where you have data on “hole” only.
Censoring: cases where you know how many shots missed.
(2) Censoring
• y* ~ pdf: f(y*).
• Observe y = y* if A < y* < B ; A if y* ≤ A ; B if y* ≥ B.
(For obs. with y = A or y = B, y* is unknown.)
• Log-likelihood function:
OB = {t | yt* observed}; NOB = {t | yt* unobserved};
lT = Σ_{t∈OB} ln[f(yt*|t∈OB) × Pr(t∈OB)] + Σ_{t∈NOB} ln[Pr(t∈NOB)].
• Note:
f(yt*|t∈OB)Pr(t∈OB)
= f(yt*|A < yt* < B)Pr(A < yt* < B)
= [f(yt*)/Pr(A < yt* < B)]Pr(A < yt* < B) = f(yt*).
→ lT = Σ_{A<yt<B} ln f(yt*) + Σ_{yt=A} ln Pr(yt* ≤ A) + Σ_{yt=B} ln Pr(yt* ≥ B).
(3) Truncation
• Observe y = y* iff A < y* < B
• Log-likelihood function:
pdf of yt: g(yt) = f(yt*|A < yt* < B) = f(yt*)/Pr(A < yt* < B).
→ lT = Σt {ln f(yt*) - ln[Pr(A < yt* < B)]}.
(4) Tobit (A censored model)
1) Latent model: yt* = xt•′β + εt, εt ~ N(0,σ²) conditional on xt•. [yt* ~ N(xt•′β, σ²)]
2) 3 possible cases:
A. Observe yt = yt* if yt* > 0; = 0 otherwise. → yt = max(0, yt*).
B. Observe yt = yt* if yt* < 0; = 0 otherwise. → yt = min(0, yt*).
C. Observe yt = yt* if yt* < Lt; = Lt otherwise.
3) Log-likelihood for A
• Pr(yt* ≤ 0|xt•) = Pr(xt•′β + εt ≤ 0) = Pr(εt ≤ -xt•′β) = Pr(εt/σ ≤ -xt•′(β/σ))
= Φ[-xt•′(β/σ)] = 1 - Φ[xt•′(β/σ)].
• f(yt*) = [1/(√(2π)σ)]exp(-(yt* - xt•′β)²/(2σ²)).
• Therefore,
lT(β,σ) = Σ_{yt>0} ln f(yt) + Σ_{yt=0} ln[1 - Φ(xt•′β/σ)]
= Σ_{yt>0} [-(1/2)ln(2π) - ln(σ) - (1/(2σ²))(yt - xt•′β)²] + Σ_{yt=0} ln[1 - Φ(xt•′β/σ)].
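The tobit log-likelihood translates line by line into code. A sketch on simulated censored data (case A, with hypothetical parameter values):

```python
import numpy as np
from scipy.stats import norm

def tobit_loglik(beta, sigma, y, X):
    """l_T = sum_{y>0} ln f(y_t) + sum_{y=0} ln[1 - Phi(x_t'(beta/sigma))]."""
    xb = X @ beta
    pos = y > 0
    ll_pos = -0.5 * np.log(2 * np.pi) - np.log(sigma) \
             - (y[pos] - xb[pos]) ** 2 / (2 * sigma ** 2)
    ll_zero = np.log(1 - norm.cdf(xb[~pos] / sigma))
    return ll_pos.sum() + ll_zero.sum()

# Simulated censored data: y_t = max(0, y_t*)
rng = np.random.default_rng(3)
T = 400
X = np.column_stack([np.ones(T), rng.normal(size=T)])
beta_true, sigma_true = np.array([0.5, 1.0]), 1.0
ystar = X @ beta_true + sigma_true * rng.normal(size=T)
y = np.maximum(0.0, ystar)

print(tobit_loglik(beta_true, sigma_true, y, X))   # a finite log-likelihood value
```

Maximizing this function over (β, σ) would give the tobit MLE; only the evaluation is shown here.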
5) Interpretation:
(i) E(yt*|xt•) = E[latent var. (e.g., desired consumption)|xt•] = xt•′β.
→ βj = ∂E(yt*|xt•)/∂xtj.
(ii) E(yt|xt•) = E[observed variable (e.g., actual expenditure)]
= Pr(yt* ≥ 0)E(yt|yt* ≥ 0) + Pr(yt* < 0)E(yt|yt* < 0)
= Pr(yt* ≥ 0)E(yt*|yt* ≥ 0) + Pr(yt* < 0)E(0|yt* < 0)
= Φ(xt•′β/σ)E(xt•′β + εt|εt ≥ -xt•′β)
= Φ(xt•′β/σ)[xt•′β + σλ(xt•′β/σ)]
[where λ(xt•′β/σ) = φ(xt•′β/σ)/Φ(xt•′β/σ) (inverse Mills ratio)]
= Φ(xt•′β/σ)xt•′β + σφ(xt•′β/σ).
Short Digression:
Suppose that εt ~ N(0,σ2). Then,
( / )( | )( / )
tt t t
t
hE hh
φ σε ε σσ
> − =Φ
.
End of Digression
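This moment formula can be checked exactly against scipy's truncated normal (no simulation needed):

```python
import numpy as np
from scipy.stats import norm, truncnorm

sigma, h = 2.0, 1.5

# E(eps | eps > -h) for eps ~ N(0, sigma^2), via scipy's truncnorm
# (truncnorm takes its bounds in standardized units)
exact = truncnorm.mean(a=-h / sigma, b=np.inf, loc=0, scale=sigma)

# Formula from the digression: sigma * phi(h/sigma) / Phi(h/sigma)
formula = sigma * norm.pdf(h / sigma) / norm.cdf(h / sigma)

print(exact, formula)   # the two agree
```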
Note: Conditional on xt•,
∂E(yt|xt•)/∂xtj = φ(xt•′β/σ)(βj/σ)(xt•′β) + Φ(xt•′β/σ)βj - (xt•′β/σ)φ(xt•′β/σ)βj
= Φ(xt•′β/σ)βj.
6) Estimation of E(y*|x) and E(y|x).
• Let g1(β) = x̄′β.
• Estimated E(yt*) at the sample mean = g1(β̂).
• se = [G1Ω̂G1′]^{1/2}, where Ω̂ is the estimate of Cov(β̂) and G1 = ∂g1(β)/∂β′ = x̄′.
• Let g2(β,σ) = Φ(x̄′β/σ)x̄′β + σφ(x̄′β/σ).
• Estimated E(yt) at the sample mean = g2(β̂,σ̂).
• G2(β,σ) = ∂g2/∂(β′,σ) = [Φ(x̄′β/σ)x̄′, φ(x̄′β/σ)].
• se = [G2Ω̂*G2′]^{1/2}, where Ω̂* is the estimate of Cov((β̂′,σ̂)′).
(5) Truncation (Maddala, Ch. 6)
1) Example 1:
• Earnings function from a sample of poor people (Hausman and Wise, Econometrica, 1979):
yt* = xt•′β + εt, εt ~ N(0,σ²) conditional on xt•.
Observe yt = yt* iff yt* < Lt (Lt = 1.5 × "poverty line", dep. on family size).
• Log-likelihood function:
• pdf of yt: g(yt|xt•) = f(yt*|yt* ≤ Lt, xt•) = f(yt*|xt•)/Pr(yt* ≤ Lt|xt•).
• Pr(yt* ≤ Lt|xt•) = Pr(εt ≤ Lt - xt•′β|xt•) = Pr(εt/σ ≤ (Lt - xt•′β)/σ|xt•) = Φ[(Lt - xt•′β)/σ].
→ ln L = Σt {ln f(yt|xt•) - ln Φ[(Lt - xt•′β)/σ]},
where f is the normal density function.
• E(yt|xt•) = E(yt*|yt* < Lt, xt•)
= xt•′β + E(εt|εt < Lt - xt•′β)
= xt•′β - E(-εt|-εt > -(Lt - xt•′β))
= xt•′β - σλ((Lt - xt•′β)/σ),
where λ(z) = φ(z)/Φ(z).
2) Example 2:
• Observe yt = yt* iff yt* > Lt.
• f(yt*|yt* ≥ Lt, xt•) = f(yt*|xt•)/Pr(yt* ≥ Lt|xt•).
• Pr(yt* ≥ Lt|xt•) = 1 - Φ[(Lt - xt•′β)/σ].
→ lT = Σt {ln f(yt|xt•) - ln[1 - Φ((Lt - xt•′β)/σ)]}.
• E(yt|xt•) = E(yt*|yt* ≥ Lt, xt•) = xt•′β + σλ(-(Lt - xt•′β)/σ).
3) Link between Truncation and Tobit
• Suppose Lt = 0 for all t in Example 2. Then
1 - Φ((Lt - xt•′β)/σ) = 1 - Φ(-xt•′β/σ) = Φ(xt•′β/σ).
• Then, the log-likelihood function becomes:
lT = Σt {-(1/2)ln(2π) - ln(σ) - (1/(2σ²))(yt - xt•′β)² - ln Φ(xt•′β/σ)}.
• Consider tobit. Choose observations with yt > 0 and do truncation MLE.
This is the case where we observe yt = yt* iff yt
* > Lt = 0. The truncation
MLE using the truncated data is consistent even if it is inefficient.
• If the estimation results from truncation and tobit MLE are quite different,
it means that the tobit model is not correctly specified.
(6) Two-part Model
Cragg (Econometrica, 1971); Lin and Schmidt (Review of Economics and Statistics (RESTAT), 1984).
1) Model:
• yt* = xt•′β + εt,
where g(εt|xt•,zt•,vt) = [1/(√(2π)σ)]exp(-εt²/(2σ²)) / Φ(xt•′β/σ), for εt > -xt•′β;
• ht* = zt•′γ + vt with vt ~ N(0,1);
• ht = 1 iff ht* > 0; = 0, otherwise.
3) Example:
yt: desired spending on clothing; ht: timing to buy.
4) Log-likelihood function:
• g(yt|xt•) = [1/(√(2π)σ)]exp(-(yt - xt•′β)²/(2σ²)) / Φ(xt•′β/σ); Pr(ht* > 0) = Φ(zt•′γ).
→ lT = Σ_{ht=1} ln[g(yt|ht* > 0)Pr(ht* > 0)] + Σ_{ht=0} ln[Pr(ht* < 0)].
Note:
g(yt|ht* > 0) = g(yt), because εt and vt are stochastically independent.
lT = Σ_{ht=1} {-(1/2)ln(2π) - ln(σ) - (1/(2σ²))(yt - xt•′β)² - ln Φ(xt•′β/σ) + ln Φ(zt•′γ)}
+ Σ_{ht=0} ln[1 - Φ(zt•′γ)]
= Σ_{ht=1} {-(1/2)ln(2π) - ln(σ) - (1/(2σ²))(yt - xt•′β)² - ln Φ(xt•′β/σ)}
+ Σ_{ht=1} ln Φ(zt•′γ) + Σ_{ht=0} ln[1 - Φ(zt•′γ)].
→ trunc. for yt > 0 + probit for all obs.
→ Estimate (β,σ) by trunc. and γ by probit.
→ lCragg = ltrunc + lprobit.
Note:
Let zt• = xt•. If γ = β/σ, Cragg becomes tobit!!!
5) LR test for tobit specification
STEP 1: Do tobit and get ltobit
STEP 2: Do trunc using observations with yt > 0 and get ltrunc.
STEP 3: Do probit using all observations, and get lprobit
STEP 4: lcragg = ltrunc + lprobit.
STEP 5: LR = 2[lcragg - ltobit] →d χ2(k).
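The identity behind STEP 4 — with zt• = xt• and γ = β/σ, the tobit log-likelihood splits exactly into a truncated part plus a probit part — can be verified numerically. A sketch at an arbitrary parameter value (simulated data):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
T = 300
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = np.maximum(0.0, X @ np.array([0.3, 1.0]) + rng.normal(size=T))

beta, sigma = np.array([0.2, 0.8]), 1.1   # arbitrary evaluation point
xb = X @ beta
pos = y > 0

# Tobit: sum_{y>0} ln f(y_t) + sum_{y=0} ln[1 - Phi(x'b/sigma)]
l_tobit = np.sum(norm.logpdf(y[pos], loc=xb[pos], scale=sigma)) \
          + np.sum(np.log(1 - norm.cdf(xb[~pos] / sigma)))

# Truncated regression on the y > 0 observations (L_t = 0)
l_trunc = np.sum(norm.logpdf(y[pos], loc=xb[pos], scale=sigma)
                 - np.log(norm.cdf(xb[pos] / sigma)))

# Probit on all observations with gamma = beta / sigma
g = xb / sigma
l_probit = np.sum(pos * np.log(norm.cdf(g)) + (~pos) * np.log(1 - norm.cdf(g)))

print(np.isclose(l_tobit, l_trunc + l_probit))   # True
```

The identity holds at every (β, σ), not just at the MLEs; the LR test exploits the fact that the unrestricted Cragg model attains at least this value.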
[3] Selection Model
• Heckman, Econometrica, 1979.
Motivation:
• Model of interest:
y1t = x1t′β1 + ε1t .
• Observe y1t (or/and x1t,•) under a certain condition (“selection rule”).
Example:
• Observe a woman’s market wage if she works.
Complete Model:
y1t = x1t′β1 + ε1t ,
y2t* = x2t′β2 + ε2t .
y2t = 1 if y2t* > 0;
= 0 if y2t* < 0.
We observe y1t iff y2t = 1 (x2t must be observable for any t.)
Assumptions: Conditional on (x1t′, x2t′)′,
(ε1t, ε2t)′ ~ N[(0, 0)′, Σ], where Σ = [σ1² σ12; σ12 σ2²].
Theorem:
Suppose (h1, h2)′ ~ N[(0, 0)′, Σ] with Σ = [σ1² σ12; σ12 σ2²] and σ2² = 1.
Then, E(h1|h2 > -a) = σ12 φ(a)/Φ(a).
Facts: Conditional on (x1t′, x2t′)′:
• E(ε1t|y2t* > 0) = E(ε1t|ε2t > -x2t′β2) = σ12 λ(x2t′β2),
where λ(x2t′β2) = φ(x2t′β2)/Φ(x2t′β2) ≡ λt [inverse Mills ratio].
• E(y1t|y2t* > 0) = x1t′β1 + E(ε1t|ε2t > -x2t′β2) = x1t′β1 + σ12 λ(x2t′β2).
• y1t = x1t′β1 + σ12 λt + vt,
where E(vt|ε2t > -x2t′β2) = 0; var(vt|ε2t > -x2t′β2) ≡ σ1² - ξt;
ξt = σ12²[(x2t′β2)λt + λt²].
Two-Step Estimation:
STEP 1: Do probit for all t, and get β̂2 and λ̂t = φ(x2t′β̂2)/Φ(x2t′β̂2).
STEP 2: Do OLS on y1t = x1t′β1 + σ12 λ̂t + ηt, and get β̂1 and σ̂12.
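The two steps can be sketched on simulated data. To keep the sketch short, the true β2 stands in for the STEP 1 probit estimate (an assumption; in practice β̂2 comes from the probit MLE), so only the λ̂ construction and the STEP 2 OLS are shown:

```python
import numpy as np
from scipy.stats import norm

# Simulated selection model (hypothetical parameter values)
rng = np.random.default_rng(5)
T = 20000
x1 = np.column_stack([np.ones(T), rng.normal(size=T)])
x2 = np.column_stack([np.ones(T), rng.normal(size=T)])
beta1, beta2, s12 = np.array([1.0, 0.5]), np.array([0.3, 1.0]), 0.5

# (eps1, eps2) jointly normal with cov(eps1, eps2) = s12, var(eps2) = 1
eps2 = rng.normal(size=T)
eps1 = s12 * eps2 + np.sqrt(1.0 - s12 ** 2) * rng.normal(size=T)

y2 = (x2 @ beta2 + eps2 > 0)          # selection indicator
y1 = x1 @ beta1 + eps1                # observed only when y2 = 1

# STEP 1 (true beta2 in place of the probit estimate, per the note above):
# inverse Mills ratio for the selected sample
xb2 = x2[y2] @ beta2
lam = norm.pdf(xb2) / norm.cdf(xb2)

# STEP 2: OLS of y1 on [x1, lambda_hat] over the selected sample
Z = np.column_stack([x1[y2], lam])
gam = np.linalg.lstsq(Z, y1[y2], rcond=None)[0]

print(gam)   # approximately (beta1', sigma_12) = (1.0, 0.5, 0.5)
```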
Facts on the Two-Step Estimator:
• Consistent.
• t-test for Ho: σ12 = 0 (no selection) in STEP 2 is the LM test (Melino,
Review of Economic Studies, 1982)
• But all other t-tests are wrong!!! → s2(X′X)-1 is inconsistent.
• So, one has to compute the corrected covariance matrix.
[See Heckman (1979, Econometrica), Greene (1981, Econometrica).]
• Sometimes, the corrected covariance matrix is not computable (Greene, Econometrica, 1981).
Covariance Matrix of the Two-Step Estimator:
• Let Ω̂ = the estimate of Cov(β̂2).
• y1t = x1t′β1 + σ12 λt + vt
→ y1t = x1t′β1 + σ12 λ̂t + [-σ12(λ̂t - λt) + vt].
Short Digression:
By Taylor expansion around the true value of β2,
λ̂t = λ(x2t′β̂2) ≈ λ(x2t′β2) + [∂λ(x2t′β2)/∂β2′](β̂2 - β2).
End of Digression
→ y1t = x1t′β1 + σ12 λ̂t + [ht′(β̂2 - β2) + vt] = zt′γ + [ht′(β̂2 - β2) + vt],
where zt′ = (x1t′, λ̂t), γ = (β1′, σ12)′, and ht = σ12[(x2t′β2)λt + λt²]x2t.
→ In matrix notation,
y1 = Zγ + [H(β̂2 - β2) + v].
→ γ̂TS = (Z′Z)^{-1}Z′y1
= γ + (Z′Z)^{-1}Z′H(β̂2 - β2) + (Z′Z)^{-1}Z′v.
→ Can show that (β̂2 - β2) and v are uncorrelated. Then, intuitively,
Cov(γ̂TS) = Cov[(Z′Z)^{-1}Z′H(β̂2 - β2) + (Z′Z)^{-1}Z′v]
= Cov[(Z′Z)^{-1}Z′H(β̂2 - β2)] + Cov[(Z′Z)^{-1}Z′v]
= (Z′Z)^{-1}Z′H Cov(β̂2 - β2) H′Z(Z′Z)^{-1} + (Z′Z)^{-1}Z′Cov(v)Z(Z′Z)^{-1}
= (Z′Z)^{-1}Z′HΩH′Z(Z′Z)^{-1} + (Z′Z)^{-1}Z′ΠZ(Z′Z)^{-1},
where Π = diag(π1, ..., πT).
≈ (Z′Z)^{-1}Z′ĤΩ̂Ĥ′Z(Z′Z)^{-1} + (Z′Z)^{-1}Z′Π̂Z(Z′Z)^{-1},
where Π̂ = diag(v̂1², ..., v̂T²) and Ĥ is an estimate of H.
• MLE (which is more efficient than the two-step estimator)
• Conditional on (x1t′, x2t′)′:
• Pr(y1t is not observed) = Pr(y2t* < 0) = 1 - Φ(x2t′β2).
• Pr(y1t is observed) = Pr(y2t* > 0) = Φ(x2t′β2).
• f(y1t|y1t is observed) = f(y1t|y2t* ≥ 0)
= [1/(√(2π)σ1)]exp[-(y1t - x1t′β1)²/(2σ1²)]
× Φ[(x2t′β2 + (σ12/σ1²)(y1t - x1t′β1)) / (1 - σ12²/σ1²)^{1/2}]
÷ Φ(x2t′β2).
• lT = Σ_{y1t observed} ln[f(y1t|y1t is observed)Pr(y1t is observed)]
+ Σ_{y1t not observed} ln[Pr(y1t is not observed)].