LIMITED DEPENDENT VARIABLES - BASIC
[1] Binary choice models
• Motivation:
• Dependent variable (yt) is a yes/no variable (e.g., unionism, migration, labor force participation, or death ...).
(1) Linear Model (Somewhat Defective)
Digression to Bernoulli's Distribution:
• Y is a Bernoulli random variable with p = Pr(Y=1) and (1-p) = Pr(Y=0).
• f(y) = p^y(1-p)^(1-y), y = 0, 1.
• E(Y) = Σy y·f(y) = 1•p + 0•(1-p) = p;
• var(Y) = Σy y²·f(y) - [E(Y)]² = p - p² = p(1-p).
End of Digression
• Linear Model:
Limited_Basic-1
yt = xt•′β + εt, where yt = 1 if yes and yt = 0 if no.
• Assume that E(εt|xt•) = 0.
• E(yt|xt•) = xt•′β ≡ pt = Pr(yt = 1|xt•).
• ∂pt/∂xtj = βj: So, the coefficients measure effects of xtj on pt.
• Problems in the linear model:
1) The εt are nonnormal and heteroskedastic.
• Note that yt = 1 or 0.
→ εt = 1 - xt•′β with prob = pt = xt•′β;
  εt = -xt•′β with prob = 1 - pt = 1 - xt•′β.
• E(εt|xt•) = (1 - xt•′β)(xt•′β) + (-xt•′β)(1 - xt•′β) = 0.
• var(εt|xt•) = E(εt²|xt•) = (1 - xt•′β)²(xt•′β) + (xt•′β)²(1 - xt•′β) = xt•′β(1 - xt•′β)
→ Not constant over t.
• OLS is unbiased but not efficient.
• GLS using σ̂t² = xt•′β̂(1 - xt•′β̂) is more efficient than OLS.
2) Suppose that we wish to predict po = Pr(yo = 1|xo•) at xo•. The natural predictor of po is xo•′β̂, where β̂ is OLS or GLS. But xo•′β̂ could lie outside the range (0,1).
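Both defects can be seen with a few numbers. A minimal sketch (the coefficient values and the x grid below are made up for illustration):

```python
import numpy as np

# Hypothetical linear-probability-model coefficients: p_t = b0 + b1*x_t
b0, b1 = 0.2, 0.3
x = np.array([-2.0, 0.0, 1.0, 2.0, 3.0])

p = b0 + b1 * x          # "probabilities" implied by the linear model
var_eps = p * (1 - p)    # implied error variance x'beta(1 - x'beta)

print(p)        # some fitted values fall outside (0,1)
print(var_eps)  # varies with x: heteroskedastic
```

Note that at the x values where p is outside (0,1), the implied "variance" p(1-p) is even negative, which is another symptom of the same misspecification.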
(2) Probit Model
• Model:
yt* = xt•′β + εt, t = 1, ..., T,
where yt* is an unobservable latent variable (e.g., level of utility);
yt = 1 if yt* > 0
   = 0 if yt* < 0;
E(εt|xt•) = 0; -εt ~ N(0,1) conditional on xt•;
E(εt|εt-1, ..., ε1, x1•, ..., xt•) = 0;
the xt• are ergodic and stationary.
Digression to normal pdf and cdf
• X ~ N(μ,σ²): f(x) = [1/(√(2π)σ)]exp(-(x-μ)²/(2σ²)), -∞ < x < ∞.
• Z ~ N(0,1): φ(z) = [1/√(2π)]exp(-z²/2); Φ(z) = Pr(Z < z) = ∫_{-∞}^{z} φ(v)dv.
• In GAUSS, φ(z) = pdfn(z) and Φ(z) = cdfn(z).
• Some useful facts:
dΦ(z)/dz = φ(z); dφ/dz = -zφ(z); Φ(-z) = 1 - Φ(z); φ(z) = φ(-z).
End of digression
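These facts are easy to verify numerically; a quick sketch using scipy (assuming scipy is available — in GAUSS the same objects are pdfn and cdfn):

```python
import numpy as np
from scipy.stats import norm

z = 0.7
eps = 1e-6

# dPhi(z)/dz = phi(z): check with a central finite difference
deriv_cdf = (norm.cdf(z + eps) - norm.cdf(z - eps)) / (2 * eps)
assert abs(deriv_cdf - norm.pdf(z)) < 1e-6

# dphi(z)/dz = -z*phi(z)
deriv_pdf = (norm.pdf(z + eps) - norm.pdf(z - eps)) / (2 * eps)
assert abs(deriv_pdf - (-z * norm.pdf(z))) < 1e-6

# Phi(-z) = 1 - Phi(z) and phi(z) = phi(-z)
assert np.isclose(norm.cdf(-z), 1 - norm.cdf(z))
assert np.isclose(norm.pdf(z), norm.pdf(-z))
```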
• Return to the Probit model
• PDF of the yt: Conditional on xt•,
• Pr(yt = 1) = Pr(yt* > 0) = Pr(xt•′β + εt > 0) = Pr(xt•′β > -εt)
= Pr(-εt < xt•′β) = Φ(xt•′β).
→ This guarantees that pt ≡ Pr(yt = 1) lies in the range (0,1).
• f(yt|xt•) = [Φ(xt•′β)]^(yt)[1 - Φ(xt•′β)]^(1-yt).
Short Digression: Suppose yt* = xt•′β + εt, t = 1, ..., T, where the (-εt) are i.i.d. U(0,1).
Then, Pr(yt = 1|xt•) = xt•′β (linear) (Heckman and Snyder, Rand, 1997).
End of Digression
• Log-likelihood Function of the Probit model
• LT(β) = Π_{t=1}^{T} f(yt|xt•).
• lT(β) = Σt ln f(yt|xt•) = Σt [yt ln Φ(xt•′β) + (1 - yt) ln(1 - Φ(xt•′β))].
• Some useful facts:
• E(yt|xt•) = Φ(xt•′β) .
• ∂Φ(xt•′β)/∂β = φ(xt•′β)xt• [k×1].
• ∂²Φ(xt•′β)/∂β∂β′ = -(xt•′β)φ(xt•′β)xt•xt•′ [k×k].
• ∂lT(β)/∂β = Σt {yt ∂lnΦ(xt•′β)/∂β + (1 - yt) ∂ln(1 - Φ(xt•′β))/∂β}
= Σt {yt [φ(xt•′β)/Φ(xt•′β)]xt• - (1 - yt)[φ(xt•′β)/(1 - Φ(xt•′β))]xt•}
= Σt {[(yt - Φ(xt•′β))φ(xt•′β)] / [Φ(xt•′β)(1 - Φ(xt•′β))]} xt•.
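The log-likelihood and the score just derived translate directly into code. A sketch on simulated data (the design matrix and parameter values are hypothetical), with the analytic score checked against a finite-difference gradient:

```python
import numpy as np
from scipy.stats import norm

def probit_loglik(beta, y, X):
    """l_T(b) = sum_t [ y_t ln Phi(x_t'b) + (1 - y_t) ln(1 - Phi(x_t'b)) ]."""
    p = norm.cdf(X @ beta)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def probit_score(beta, y, X):
    """dl_T/db = sum_t [ (y_t - Phi_t) phi_t / (Phi_t (1 - Phi_t)) ] x_t."""
    xb = X @ beta
    Phi, phi = norm.cdf(xb), norm.pdf(xb)
    xi = (y - Phi) * phi / (Phi * (1 - Phi))
    return X.T @ xi

# Simulated data, just to exercise the formulas
rng = np.random.default_rng(0)
T = 200
X = np.column_stack([np.ones(T), rng.normal(size=T)])
beta_true = np.array([0.3, 0.8])
y = (X @ beta_true + rng.normal(size=T) > 0).astype(float)

# Analytic score vs. finite-difference gradient at an arbitrary point
b = np.array([0.1, 0.5])
eps = 1e-6
num_grad = np.array([
    (probit_loglik(b + eps * e, y, X) - probit_loglik(b - eps * e, y, X)) / (2 * eps)
    for e in np.eye(2)])
print(np.max(np.abs(probit_score(b, y, X) - num_grad)))  # close to 0
```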
• Numerical Property of the MLE of β (β̂)
• ∂lT(β̂)/∂β = Σt {[(yt - Φ(xt•′β̂))φ(xt•′β̂)] / [Φ(xt•′β̂)(1 - Φ(xt•′β̂))]} xt• = 0_{k×1}.
• HT(β̂) = ∂²lT(β̂)/∂β∂β′ should be negative definite.
[See Judge, et al. for the exact form of HT.]
• lT(β) is globally concave with respect to β; that is, HT(β) is negative definite. [Amemiya (1985, Advanced Econometrics)]
• Use -[HT(β̂)]^{-1} as an estimate of Cov(β̂).
• How to find MLE (See Greene Ch. 5 or Hamilton, Ch. 5)
1. Newton-Raphson’s algorithm:
STEP 1: Choose an initial θ̂0. Then compute
(*) θ̂1 = θ̂0 - [HT(θ̂0)]^{-1} sT(θ̂0).
STEP 2: Using θ̂1, compute θ̂2 by (*).
STEP 3: Continue until θ̂_{q+1} ≅ θ̂_q.
Note: The N-R method is the best if lT(θ) is globally concave (i.e., the Hessian matrix is negative definite for any θ). N-R may not work if lT(θ) is not globally concave.
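The update rule (*) can be illustrated on a toy globally concave objective (a made-up scalar example, not a likelihood):

```python
# Toy concave objective l(th) = -(th - 2)^4, with its score and Hessian
score   = lambda th: -4 * (th - 2) ** 3
hessian = lambda th: -12 * (th - 2) ** 2

th = 0.0                              # initial value theta_0
for _ in range(100):                  # iterate (*) until theta_{q+1} ~ theta_q
    step = -score(th) / hessian(th)   # -[H(th)]^{-1} s(th), scalar case
    th = th + step
    if abs(step) < 1e-10:
        break

print(th)   # converges to the maximizer, 2
```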
2. BHHH [Berndt, Hall, Hall, Hausman]
• lT(θ) = Σt ln[ft(θ)].
• Define:
gt(θ) = ∂ln[ft(θ)]/∂θ [p×1] (sT(θ) = Σt gt(θ));
BT(θ) = Σt gt(θ)gt(θ)′ [cross product of first derivatives].
Theorem: Under suitable regularity conditions,
(1/T)BT(θo) →p lim_{T→∞} E[-(1/T)HT(θo)].
Implication:
• BT(θ) ≈ -HT(θ), as T → ∞.
• Cov(θ̂) can be estimated by [BT(θ̂)]^{-1} or -[HT(θ̂)]^{-1}.
• BHHH algorithm uses
θ̂1 = θ̂0 + λ[BT(θ̂0)]^{-1} sT(θ̂0),
where λ is called the step length.
• When BHHH is used, there is no need to compute second derivatives.
• Other available algorithms: BFGS, BFGS-SC, DFP.
BHHH for Probit:
Can show gt(β) = ξt xt•, where
ξt = (yt - Φt)φt / [Φt(1 - Φt)]; φt = φ(xt•′β); Φt = Φ(xt•′β).
→ BT(β̂) = Σt ĝt ĝt′ = Σt ξ̂t² xt• xt•′.
[BT(β̂)]^{-1} is the BHHH estimate of Cov(β̂).
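A quick numerical check that the two expressions for BT agree — the sum of outer products Σt gt gt′ and the weighted cross-product Σt ξt² xt•xt•′ (simulated data, arbitrary evaluation point):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
T = 300
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = (X @ np.array([0.2, 0.6]) + rng.normal(size=T) > 0).astype(float)
beta = np.array([0.2, 0.6])                       # any evaluation point

xb = X @ beta
Phi, phi = norm.cdf(xb), norm.pdf(xb)
xi = (y - Phi) * phi / (Phi * (1 - Phi))          # xi_t

g = xi[:, None] * X                               # row t is g_t' = xi_t x_t'
B_outer = sum(np.outer(gt, gt) for gt in g)       # sum_t g_t g_t'
B_weighted = X.T @ (xi[:, None] ** 2 * X)         # sum_t xi_t^2 x_t x_t'

print(np.allclose(B_outer, B_weighted))  # True
```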
• Interpretation of β
1) βj shows the direction of the influence of xtj on Pr(yt = 1|xt•) = Φ(xt•′β).
→ βj > 0 means that Pr(yt = 1|xt•) increases with xtj.
2) Rate of change:
∂Pr(yt = 1|xt•)/∂xtj = ∂Φ(xt•′β)/∂xtj = φ(xt•′β)βj.
• Estimation of probabilities and rates of change
• Estimation of pt = Pr(yt = 1|xt•) at the mean of xt•:
• Use p̂ = Φ(x̄′β̂).
• var(p̂) = [φ(x̄′β̂)]² x̄′Ω̂x̄, where Ω̂ is the estimate of Cov(β̂) [by delta-method].
• Estimation of rates of change:
• Use p̂j = ∂Φ(x̄′β̂)/∂xj = φ(x̄′β̂)β̂j.
• var(p̂j) = [∂pj(β̂)/∂β′] Ω̂ [∂pj(β̂)/∂β] [by delta-method].
• Note that:
∂pj(β)/∂β′ = -(x̄′β)φ(x̄′β)βj x̄′ + φ(x̄′β)Jj,
where Jj = 1×k vector of zeros except that the j'th element = 1.
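The two delta-method calculations can be sketched together (the values of β̂, Ω̂, and x̄ below are made-up stand-ins for probit output):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical probit estimates, their covariance, and the regressor mean
beta_hat = np.array([0.4, 0.9])
Omega    = np.array([[0.010, 0.002],
                     [0.002, 0.020]])          # stand-in for Cov(beta_hat)
xbar     = np.array([1.0, 0.5])                # includes the constant

xb = xbar @ beta_hat

# Estimated probability and its delta-method variance
p_hat = norm.cdf(xb)
var_p = (norm.pdf(xb) ** 2) * (xbar @ Omega @ xbar)

# Marginal effect of x_j (j = 1) and its delta-method variance
j = 1
me_hat = norm.pdf(xb) * beta_hat[j]
Jj = np.eye(2)[j]                              # selection vector J_j
grad = -xb * norm.pdf(xb) * beta_hat[j] * xbar + norm.pdf(xb) * Jj
var_me = grad @ Omega @ grad

print(p_hat, var_p, me_hat, var_me)
```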
• Note on normalization:
• Model: yt* = xt•′β + εt, -εt ~ N(0,σ2).
yt = 1 iff yt* > 0.
• pt = Pr(yt = 1|xt•) = Pr(yt* > 0|xt•) = Pr(xt•′β + εt > 0|xt•)
= Pr(-εt < xt•′β|xt•) = Pr(-εt/σ < xt•′(β/σ)|xt•) = Φ[xt•′(β/σ)].
• Can estimate β/σ, but not β and σ separately.
• Testing Hypothesis:
1. Wald test:
• Ho: w(β) = 0.
• WT = w(β̂)′[W(β̂)Ω̂W(β̂)′]^{-1}w(β̂) →d χ²(df = # of restrictions),
where β̂ = probit MLE and W(β) = ∂w(β)/∂β′.
2. LR test:
• Easy for equality or zero restrictions (i.e., Ho: β2 = β3, or Ho: β2 = β3 =
0).
• EX 1: Suppose you wish to test Ho: β4 = β5 = 0.
STEP 1: Do Probit without restriction and get lT,UR = ln(LT,UR).
STEP 2: Do Probit with the restrictions and get lT,R = ln(LT,R).
→ Probit without xt4 and xt5.
STEP 3: LRT = 2[lT,UR - lT,R] →d χ²(df = 2).
• EX 2: Suppose you wish to test Ho: β2 = ... = βk = 0.
(Overall significance test)
• Let n = Σtyt.
• lT* = n ln(n/T) + (T-n) ln[(T-n)/T].
• LRT = 2[lT,UR – lT*] →d χ²(k-1).
• Pseudo-R2 (McFadden, 1974)
• ρ2 = 1 – lT,UR /lT*.
→ 0 ≤ ρ2 ≤ 1.
• If Φ(xt•′β̂) = 1 whenever yt = 1, and if Φ(xt•′β̂) = 0 whenever yt = 0, then ρ² = 1.
• If 0 < ρ² < 1, no clear meaning.
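Since the restricted model in EX 2 has only an intercept, lT* is available in closed form, and the overall LR statistic and McFadden's ρ² follow immediately. A sketch, fitting the unrestricted probit by scipy.optimize on simulated data:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def neg_loglik(beta, y, X):
    p = norm.cdf(X @ beta)
    p = np.clip(p, 1e-12, 1 - 1e-12)     # guard the logs
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

rng = np.random.default_rng(2)
T = 500
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = (X @ np.array([0.2, 1.0]) + rng.normal(size=T) > 0).astype(float)

# Unrestricted probit MLE
res = minimize(neg_loglik, x0=np.zeros(2), args=(y, X), method="BFGS")
l_ur = -res.fun

# Restricted model (intercept only): l_T* = n ln(n/T) + (T-n) ln((T-n)/T)
n = y.sum()
l_star = n * np.log(n / T) + (T - n) * np.log((T - n) / T)

LR = 2 * (l_ur - l_star)       # ~ chi2(k - 1) under Ho
rho2 = 1 - l_ur / l_star       # McFadden pseudo-R2

print(LR, rho2)
```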
(3) Logit Models
• Model: yt* = xt•′β + εt,
εt ~ logistic with g(ε) = e^ε/(1+e^ε)² and G(ε) = e^ε/(1+e^ε).
• Use Pr(yt = 1|xt•) = G(xt•′β) (instead of Φ(xt•′β)).
• Logit MLE β̂logit maximizes
lT(β) = Σt [yt ln G(xt•′β) + (1 - yt) ln(1 - G(xt•′β))].
• Use -[HT(β̂logit)]^{-1} or [BT(β̂logit)]^{-1} as the estimate of Cov(β̂logit).
• ∂pt/∂xtj = g(xt•′β)βj.
• Facts:
• The logistic dist. is quite similar to the standard normal dist., except that the logistic dist. has thicker tails (similar to t(7)).
• If data contain few obs. with y = 1 or y = 0, then probit and logit may be
quite different. Other than that, probit and logit yield very similar
predictions. Especially, marginal effects are quite similar.
• Roughly, β̂logit ≈ 1.6 β̂probit.
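The 1.6 rule of thumb comes from matching the two marginal effects at xt•′β = 0: φ(0)/g(0) = 0.3989/0.25 ≈ 1.6. A one-line check:

```python
from scipy.stats import norm, logistic

# Marginal effects at x'beta = 0 are phi(0)*b_probit and g(0)*b_logit.
# Equating them gives b_logit / b_probit = phi(0) / g(0).
ratio = norm.pdf(0) / logistic.pdf(0)   # 0.3989... / 0.25
print(ratio)   # about 1.6
```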
[2] Censoring vs. Truncation
(Greene, ch. 20)
(1) Classical distinction
• Consider shots on target.
Truncation: cases where you have data on “hole” only.
Censoring: cases where you know how many shots missed.
(2) Censoring
• y* ~ pdf: f(y*).
• Observe y = y* if A < y* < B ; A if y* ≤ A ; B if y* ≥ B.
(For obs. with y = A or y = B, y* is unknown.)
• Log-likelihood function:
OB = {t | yt* observed}; NOB = {t | yt* unobserved};
lT = Σ_{t∈OB} ln[f(yt*|t∈OB) × Pr(t∈OB)] + Σ_{t∈NOB} ln[Pr(t∈NOB)].
• Note:
f(yt*|t∈OB)Pr(t∈OB)
= f(yt*|A < yt* < B)Pr(A < yt* < B)
= [f(yt*)/Pr(A < yt* < B)]Pr(A < yt* < B) = f(yt*).
→ lT = Σ_{A<yt<B} ln f(yt*) + Σ_{yt=A} ln Pr(yt* ≤ A) + Σ_{yt=B} ln Pr(yt* ≥ B).
(3) Truncation
• Observe y = y* iff A < y* < B
• Log-likelihood function:
pdf of yt: g(yt) = f(yt*|A < yt* < B) = f(yt*)/Pr(A < yt* < B).
→ lT = Σt {ln f(yt*) - ln[Pr(A < yt* < B)]}.
(4) Tobit (A censored model)
1) Latent model: yt* = xt•′β + εt, εt ~ N(0,σ²) conditional on xt•. [yt* ~ N(xt•′β, σ²)]
2) 3 possible cases:
A. Observe yt = yt* if yt* > 0; = 0 otherwise. → yt = max(0, yt*).
B. Observe yt = yt* if yt* < 0; = 0 otherwise. → yt = min(0, yt*).
C. Observe yt = yt* if yt* < Lt; = Lt otherwise.
3) Log-likelihood for A
• Pr(yt* ≤ 0|xt•) = Pr(xt•′β + εt ≤ 0) = Pr(εt ≤ -xt•′β) = Pr(εt/σ ≤ -xt•′(β/σ))
= Φ[-xt•′(β/σ)] = 1 - Φ[xt•′(β/σ)].
• f(yt*) = [1/(√(2π)σ)]exp(-(yt* - xt•′β)²/(2σ²)).
• Therefore,
lT(β,σ) = Σ_{yt>0} ln f(yt) + Σ_{yt=0} ln[1 - Φ(xt•′β/σ)]
= Σ_{yt>0} [-(1/2)ln(2π) - ln(σ) - (1/(2σ²))(yt - xt•′β)²] + Σ_{yt=0} ln[1 - Φ(xt•′β/σ)].
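The tobit log-likelihood translates line by line into code. A sketch on simulated censored data (case A, with hypothetical parameter values):

```python
import numpy as np
from scipy.stats import norm

def tobit_loglik(beta, sigma, y, X):
    """l_T = sum_{y>0} ln f(y_t) + sum_{y=0} ln[1 - Phi(x_t'(beta/sigma))]."""
    xb = X @ beta
    pos = y > 0
    ll_pos = -0.5 * np.log(2 * np.pi) - np.log(sigma) \
             - (y[pos] - xb[pos]) ** 2 / (2 * sigma ** 2)
    ll_zero = np.log(1 - norm.cdf(xb[~pos] / sigma))
    return ll_pos.sum() + ll_zero.sum()

# Simulated censored data: y_t = max(0, y_t*)
rng = np.random.default_rng(3)
T = 400
X = np.column_stack([np.ones(T), rng.normal(size=T)])
beta_true, sigma_true = np.array([0.5, 1.0]), 1.0
ystar = X @ beta_true + sigma_true * rng.normal(size=T)
y = np.maximum(0.0, ystar)

print(tobit_loglik(beta_true, sigma_true, y, X))   # a finite log-likelihood value
```

Maximizing this function over (β, σ) would give the tobit MLE; only the evaluation is shown here.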
5) Interpretation:
(i) E(yt*|xt•) = E[latent var. (e.g., desired consumption)|xt•] = xt•′β.
→ βj = ∂E(yt*|xt•)/∂xtj.
(ii) E(yt|xt•) = E[observed variable (e.g., actual expenditure)]
= Pr(yt* ≥ 0)E(yt|yt* ≥ 0) + Pr(yt* < 0)E(yt|yt* < 0)
= Pr(yt* ≥ 0)E(yt*|yt* ≥ 0) + Pr(yt* < 0)E(0|yt* < 0)
= Φ(xt•′β/σ)E(xt•′β + εt|εt ≥ -xt•′β)
= Φ(xt•′β/σ)[xt•′β + σλ(xt•′β/σ)]
[where λ(xt•′β/σ) = φ(xt•′β/σ)/Φ(xt•′β/σ) (inverse Mills ratio)]
= Φ(xt•′β/σ)xt•′β + σφ(xt•′β/σ).
Short Digression:
Suppose that εt ~ N(0,σ2). Then,
( / )( | )( / )
tt t t
t
hE hh
φ σε ε σσ
> − =Φ
.
End of Digression
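This moment formula can be checked exactly against scipy's truncated normal (no simulation needed):

```python
import numpy as np
from scipy.stats import norm, truncnorm

sigma, h = 2.0, 1.5

# E(eps | eps > -h) for eps ~ N(0, sigma^2), via scipy's truncnorm
# (truncnorm takes its bounds in standardized units)
exact = truncnorm.mean(a=-h / sigma, b=np.inf, loc=0, scale=sigma)

# Formula from the digression: sigma * phi(h/sigma) / Phi(h/sigma)
formula = sigma * norm.pdf(h / sigma) / norm.cdf(h / sigma)

print(exact, formula)   # the two agree
```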
Note: Conditional on xt•,
∂E(yt|xt•)/∂xtj = φ(xt•′β/σ)(βj/σ)(xt•′β) + Φ(xt•′β/σ)βj - (xt•′β/σ)φ(xt•′β/σ)βj
= Φ(xt•′β/σ)βj.
6) Estimation of E(y*|x) and E(y|x).
• Let g1(β) = x̄′β.
• Estimated E(yt*) at the sample mean = g1(β̂).
• se = [G1Ω̂G1′]^{1/2}, where Ω̂ is the estimate of Cov(β̂) and G1 = ∂g1(β)/∂β′ = x̄′.
• Let g2(β,σ) = Φ(x̄′β/σ)x̄′β + σφ(x̄′β/σ).
• Estimated E(yt) at the sample mean = g2(β̂,σ̂).
• G2(β,σ) = ∂g2/∂(β′,σ) = [Φ(x̄′β/σ)x̄′, φ(x̄′β/σ)].
• se = [G2Ω̂*G2′]^{1/2}, where Ω̂* is the estimate of Cov((β̂′,σ̂)′).
(5) Truncation (Maddala, Ch. 6)
1) Example 1:
• Earnings function from a sample of poor people (Hausman and Wise, Econometrica, 1979):
yt* = xt•′β + εt, εt ~ N(0,σ²) conditional on xt•.
Observe yt = yt* iff yt* < Lt (Lt = 1.5 × "poverty line", dep. on family size).
• Log-likelihood function:
• pdf of yt: g(yt|xt•) = f(yt*|yt* ≤ Lt, xt•) = f(yt*|xt•)/Pr(yt* ≤ Lt|xt•).
• Pr(yt* ≤ Lt|xt•) = Pr(εt ≤ Lt - xt•′β|xt•) = Pr(εt/σ ≤ (Lt - xt•′β)/σ|xt•) = Φ[(Lt - xt•′β)/σ].
→ ln L = Σt {ln f(yt|xt•) - ln Φ[(Lt - xt•′β)/σ]},
where f is the normal density function.
• E(yt|xt•) = E(yt*|yt* < Lt, xt•)
= xt•′β + E(εt|εt < Lt - xt•′β)
= xt•′β - E(-εt|-εt > -(Lt - xt•′β))
= xt•′β - σλ((Lt - xt•′β)/σ),
where λ(z) = φ(z)/Φ(z).
2) Example 2:
• Observe yt = yt* iff yt* > Lt.
• f(yt*|yt* ≥ Lt, xt•) = f(yt*|xt•)/Pr(yt* ≥ Lt|xt•).
• Pr(yt* ≥ Lt|xt•) = 1 - Φ[(Lt - xt•′β)/σ].
→ lT = Σt {ln f(yt|xt•) - ln[1 - Φ((Lt - xt•′β)/σ)]}.
• E(yt|xt•) = E(yt*|yt* ≥ Lt, xt•) = xt•′β + σλ(-(Lt - xt•′β)/σ).
3) Link between Truncation and Tobit
• Suppose Lt = 0 for all t in Example 2. Then
1 - Φ((Lt - xt•′β)/σ) = 1 - Φ(-xt•′β/σ) = Φ(xt•′β/σ).
• Then, the log-likelihood function becomes:
lT = Σt {-(1/2)ln(2π) - ln(σ) - (1/(2σ²))(yt - xt•′β)² - ln Φ(xt•′β/σ)}.
• Consider tobit. Choose observations with yt > 0 and do truncation MLE.
This is the case where we observe yt = yt* iff yt
* > Lt = 0. The truncation
MLE using the truncated data is consistent even if it is inefficient.
• If the estimation results from truncation and tobit MLE are quite different,
it means that the tobit model is not correctly specified.
(6) Two-part Model
Cragg (Econometrica, 1971); Lin and Schmidt (Review of Economics and Statistics (RESTAT), 1984).
1) Model:
• yt* = xt•′β + εt,
where g(εt|xt•,zt•,vt) = [1/(√(2π)σ)]exp(-εt²/(2σ²)) / Φ(xt•′β/σ), for εt > -xt•′β;
• ht* = zt•′γ + vt with vt ~ N(0,1);
• ht = 1 iff ht* > 0; = 0, otherwise.
3) Example:
yt: desired spending on clothing; ht: timing to buy.
4) Log-likelihood function:
• g(yt|xt•) = [1/(√(2π)σ)]exp(-(yt - xt•′β)²/(2σ²)) / Φ(xt•′β/σ); Pr(ht* > 0) = Φ(zt•′γ).
→ lT = Σ_{ht=1} ln[g(yt|ht* > 0)Pr(ht* > 0)] + Σ_{ht=0} ln[Pr(ht* < 0)].
Note:
g(yt|ht* > 0) = g(yt), because εt and vt are stochastically independent.
lT = Σ_{ht=1} {-(1/2)ln(2π) - ln(σ) - (1/(2σ²))(yt - xt•′β)² - ln Φ(xt•′β/σ) + ln Φ(zt•′γ)}
+ Σ_{ht=0} ln[1 - Φ(zt•′γ)]
= Σ_{ht=1} {-(1/2)ln(2π) - ln(σ) - (1/(2σ²))(yt - xt•′β)² - ln Φ(xt•′β/σ)}
+ Σ_{ht=1} ln Φ(zt•′γ) + Σ_{ht=0} ln[1 - Φ(zt•′γ)].
→ trunc. for yt > 0 + probit for all obs.
→ Estimate (β,σ) by trunc. and γ by probit.
→ lCragg = ltrunc + lprobit.
Note:
Let zt• = xt•. If γ = β/σ, Cragg becomes tobit!!!
5) LR test for tobit specification
STEP 1: Do tobit and get ltobit
STEP 2: Do trunc using observations with yt > 0 and get ltrunc.
STEP 3: Do probit using all observations, and get lprobit
STEP 4: lcragg = ltrunc + lprobit.
STEP 5: LR = 2[lcragg - ltobit] →d χ2(k).
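The identity behind STEP 4 — with zt• = xt• and γ = β/σ, the tobit log-likelihood splits exactly into a truncated part plus a probit part — can be verified numerically. A sketch at an arbitrary parameter value (simulated data):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
T = 300
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = np.maximum(0.0, X @ np.array([0.3, 1.0]) + rng.normal(size=T))

beta, sigma = np.array([0.2, 0.8]), 1.1   # arbitrary evaluation point
xb = X @ beta
pos = y > 0

# Tobit: sum_{y>0} ln f(y_t) + sum_{y=0} ln[1 - Phi(x'b/sigma)]
l_tobit = np.sum(norm.logpdf(y[pos], loc=xb[pos], scale=sigma)) \
          + np.sum(np.log(1 - norm.cdf(xb[~pos] / sigma)))

# Truncated regression on the y > 0 observations (L_t = 0)
l_trunc = np.sum(norm.logpdf(y[pos], loc=xb[pos], scale=sigma)
                 - np.log(norm.cdf(xb[pos] / sigma)))

# Probit on all observations with gamma = beta / sigma
g = xb / sigma
l_probit = np.sum(pos * np.log(norm.cdf(g)) + (~pos) * np.log(1 - norm.cdf(g)))

print(np.isclose(l_tobit, l_trunc + l_probit))   # True
```

The identity holds at every (β, σ), not just at the MLEs; the LR test exploits the fact that the unrestricted Cragg model attains at least this value.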
[3] Selection Model
• Heckman, Econometrica, 1979.
Motivation:
• Model of interest:
y1t = x1t′β1 + ε1t .
• Observe y1t (or/and x1t,•) under a certain condition (“selection rule”).
Example:
• Observe a woman’s market wage if she works.
Complete Model:
y1t = x1t′β1 + ε1t ,
y2t* = x2t′β2 + ε2t .
y2t = 1 if y2t* > 0;
= 0 if y2t* < 0.
We observe y1t iff y2t = 1 (x2t must be observable for any t.)
Assumptions: Conditional on (x1t′, x2t′)′,
(ε1t, ε2t)′ ~ N[(0, 0)′, Σ], where Σ = [σ1² σ12; σ12 σ2²].
Theorem:
Suppose (h1, h2)′ ~ N[(0, 0)′, Σ] with Σ = [σ1² σ12; σ12 σ2²] and σ2² = 1.
Then, E(h1|h2 > -a) = σ12 φ(a)/Φ(a).
Facts: Conditional on (x1t′, x2t′)′:
• E(ε1t|y2t* > 0) = E(ε1t|ε2t > -x2t′β2) = σ12 λ(x2t′β2),
where λ(x2t′β2) = φ(x2t′β2)/Φ(x2t′β2) ≡ λt [inverse Mills ratio].
• E(y1t|y2t* > 0) = x1t′β1 + E(ε1t|ε2t > -x2t′β2) = x1t′β1 + σ12 λ(x2t′β2).
• y1t = x1t′β1 + σ12 λt + vt,
where E(vt|ε2t > -x2t′β2) = 0; var(vt|ε2t > -x2t′β2) ≡ σ1² - ξt;
ξt = σ12²[(x2t′β2)λt + λt²].
Two-Step Estimation:
STEP 1: Do probit for all t, and get β̂2 and λ̂t = φ(x2t′β̂2)/Φ(x2t′β̂2).
STEP 2: Do OLS on y1t = x1t′β1 + σ12 λ̂t + ηt, and get β̂1 and σ̂12.
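The two steps can be sketched on simulated data. To keep the sketch short, the true β2 stands in for the STEP 1 probit estimate (an assumption; in practice β̂2 comes from the probit MLE), so only the λ̂ construction and the STEP 2 OLS are shown:

```python
import numpy as np
from scipy.stats import norm

# Simulated selection model (hypothetical parameter values)
rng = np.random.default_rng(5)
T = 20000
x1 = np.column_stack([np.ones(T), rng.normal(size=T)])
x2 = np.column_stack([np.ones(T), rng.normal(size=T)])
beta1, beta2, s12 = np.array([1.0, 0.5]), np.array([0.3, 1.0]), 0.5

# (eps1, eps2) jointly normal with cov(eps1, eps2) = s12, var(eps2) = 1
eps2 = rng.normal(size=T)
eps1 = s12 * eps2 + np.sqrt(1.0 - s12 ** 2) * rng.normal(size=T)

y2 = (x2 @ beta2 + eps2 > 0)          # selection indicator
y1 = x1 @ beta1 + eps1                # observed only when y2 = 1

# STEP 1 (true beta2 in place of the probit estimate, per the note above):
# inverse Mills ratio for the selected sample
xb2 = x2[y2] @ beta2
lam = norm.pdf(xb2) / norm.cdf(xb2)

# STEP 2: OLS of y1 on [x1, lambda_hat] over the selected sample
Z = np.column_stack([x1[y2], lam])
gam = np.linalg.lstsq(Z, y1[y2], rcond=None)[0]

print(gam)   # approximately (beta1', sigma_12) = (1.0, 0.5, 0.5)
```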
Facts on the Two-Step Estimator:
• Consistent.
• t-test for Ho: σ12 = 0 (no selection) in STEP 2 is the LM test (Melino,
Review of Economic Studies, 1982)
• But all other t-tests are wrong!!! → s2(X′X)-1 is inconsistent.
• So, one has to compute the corrected covariance matrix.
[See Heckman (1979, Econometrica), Greene (1981, Econometrica).]
• Sometimes, the corrected covariance matrix is not computable (Greene, Econometrica, 1981).
Covariance Matrix of the Two-Step Estimator:
• Let Ω̂ = the estimate of Cov(β̂2).
• y1t = x1t′β1 + σ12 λt + vt
→ y1t = x1t′β1 + σ12 λ̂t + [-σ12(λ̂t - λt) + vt].
Short Digression:
By Taylor expansion around the true value of β2,
λ̂t = λ(x2t′β̂2) ≈ λ(x2t′β2) + [∂λ(x2t′β2)/∂β2′](β̂2 - β2).
End of Digression
→ y1t = x1t′β1 + σ12 λ̂t + [ht′(β̂2 - β2) + vt] = zt′γ + [ht′(β̂2 - β2) + vt],
where zt′ = (x1t′, λ̂t), γ = (β1′, σ12)′, and ht = σ12[(x2t′β2)λt + λt²]x2t.
→ In matrix notation,
y1 = Zγ + [H(β̂2 - β2) + v].
→ γ̂TS = (Z′Z)^{-1}Z′y1
= γ + (Z′Z)^{-1}Z′H(β̂2 - β2) + (Z′Z)^{-1}Z′v.
→ Can show that (β̂2 - β2) and v are uncorrelated. Then, intuitively,
Cov(γ̂TS) = Cov[(Z′Z)^{-1}Z′H(β̂2 - β2) + (Z′Z)^{-1}Z′v]
= Cov[(Z′Z)^{-1}Z′H(β̂2 - β2)] + Cov[(Z′Z)^{-1}Z′v]
= (Z′Z)^{-1}Z′H Cov(β̂2 - β2) H′Z(Z′Z)^{-1} + (Z′Z)^{-1}Z′Cov(v)Z(Z′Z)^{-1}
= (Z′Z)^{-1}Z′HΩH′Z(Z′Z)^{-1} + (Z′Z)^{-1}Z′ΠZ(Z′Z)^{-1},
where Π = diag(π1, ..., πT).
≈ (Z′Z)^{-1}Z′ĤΩ̂Ĥ′Z(Z′Z)^{-1} + (Z′Z)^{-1}Z′Π̂Z(Z′Z)^{-1},
where Π̂ = diag(v̂1², ..., v̂T²) and Ĥ is an estimate of H.
• MLE (which is more efficient than the two-step estimator)
• Conditional on (x1t′, x2t′)′:
• Pr(y1t is not observed) = Pr(y2t* < 0) = 1 - Φ(x2t′β2).
• Pr(y1t is observed) = Pr(y2t* > 0) = Φ(x2t′β2).
• f(y1t|y1t is observed) = f(y1t|y2t* ≥ 0)
= [1/(√(2π)σ1)]exp[-(y1t - x1t′β1)²/(2σ1²)]
× Φ[(x2t′β2 + (σ12/σ1²)(y1t - x1t′β1)) / (1 - σ12²/σ1²)^{1/2}]
÷ Φ(x2t′β2).
• lT = Σ_{y1t observed} ln[f(y1t|y1t is observed)Pr(y1t is observed)]
+ Σ_{y1t not observed} ln[Pr(y1t is not observed)].