Sample Selection Regression Models (Ch. 17)
froelich.vwl.uni-mannheim.de/fileadmin/user_upload/froelich/...

Until now we have always assumed that a random sample is available.

Now we cover cases where no random sample is available.

We look at two different cases:

- the sample was collected/selected according to some value of y

- the sample is selected by the behaviour of the population under consideration (self-selection)

The assumption that a random sample from the underlying population is available is not always realistic. A selected sample is a nonrandom sample; the selection mechanism may be due to the sample design or to the behaviour of the persons being sampled (including nonresponse on survey questions and attrition from social programs).

Microeconometrics


Examples: Saving function

Estimate a saving function for all families in a given country:

$saving = \beta_0 + \beta_1\, income + \beta_2\, age + \beta_3\, married + \beta_4\, children + u$,

where age is the age of the household head. We only have data on families whose household head is older than 45.

This leads to a sample selection problem, because we are interested in all families but have a random sample only for a subset of the population.

Selection on the basis of x

Examples: Family wealth function

Effect of a pension plan on wealth accumulation

Estimate the effect of worker eligibility for a pension plan on family wealth:

$wealth = \beta_0 + \beta_1\, plan + \beta_2\, educ + \beta_3\, age + \beta_4\, income + u$,

where plan is an indicator for eligibility. In compact form:

(17.2) $y = \beta_0 + \beta_1\, plan + \mathbf{x}\beta_2 + u$

The sample only contains people with wealth less than 100,000.

Selection on the basis of y (the endogenous variable)

Examples: Wage offer function

Estimation of a wage function for the population of working age

- But wages are only observed for workers.

- y is only observable for a subsample that is defined by another variable (working).

- Self-selection: the decision to work depends on the wage.

We want to estimate a wage offer equation for people of working age. However, wage data are only available for working people. This sample selection problem is often called incidental truncation, because the wage is missing as a result of another outcome, participation in the labour force.

When Can Sample Selection Be Ignored?

Conditions under which 2SLS using the selected sample is consistent.

Population represented by the random vector $(\mathbf{x}, y, \mathbf{z})$, where $\mathbf{x}$ is $1 \times K$, $y$ is $1 \times 1$ and $\mathbf{z}$ is $1 \times L$.

Population model:

(17.3) $y = \beta_1 + \beta_2 x_2 + \dots + \beta_K x_K + u$

(17.4) $E(u \mid \mathbf{z}) = 0$

This is stronger than we need for 2SLS to be consistent!

Special case: $\mathbf{z} = \mathbf{x}$, so $\mathbf{x}$ is exogenous.

General treatment: $\mathbf{x}$ can be endogenous.

With a random sample, (17.3) can be estimated consistently with 2SLS (if rank $E(\mathbf{z}'\mathbf{x}) = K$).

In the special case $\mathbf{z} = \mathbf{x}$:

(17.5) $E(y \mid \mathbf{x}) = \beta_1 + \beta_2 x_2 + \dots + \beta_K x_K$

No random sample is available: the data follow a selection rule.

s: binary selection indicator

s = 1: observation is used; s = 0: observation is not used

Key assumption:

(17.6) $E(u \mid \mathbf{z}, s) = 0$

(17.6) can follow directly from (17.4):

- s is a deterministic function of $\mathbf{z}$, so that $E(u \mid \mathbf{z}, s) = E(u \mid \mathbf{z})$. In this case selection follows a fixed rule which only depends on exogenous variables.

- Selection is independent of $(\mathbf{z}, u)$, so that $E(u \mid \mathbf{z}, s) = E(u \mid \mathbf{z})$.

In estimating (17.3) we apply 2SLS to the observations with s = 1.

The observed sample is $\{(\mathbf{x}_i, y_i, \mathbf{z}_i, s_i) : i = 1, \dots, N\}$. Observation i is used if $s_i = 1$.

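The 2SLS estimator with the selected sample is

$$\hat\beta = \left[ \left( N^{-1} \sum_{i=1}^N s_i \mathbf{z}_i' \mathbf{x}_i \right)' \left( N^{-1} \sum_{i=1}^N s_i \mathbf{z}_i' \mathbf{z}_i \right)^{-1} \left( N^{-1} \sum_{i=1}^N s_i \mathbf{z}_i' \mathbf{x}_i \right) \right]^{-1} \times \left( N^{-1} \sum_{i=1}^N s_i \mathbf{z}_i' \mathbf{x}_i \right)' \left( N^{-1} \sum_{i=1}^N s_i \mathbf{z}_i' \mathbf{z}_i \right)^{-1} \left( N^{-1} \sum_{i=1}^N s_i \mathbf{z}_i' y_i \right)$$

Substituting $y_i = \mathbf{x}_i \beta + u_i$ gives

$$\hat\beta = \beta + \left[ \left( N^{-1} \sum_{i=1}^N s_i \mathbf{z}_i' \mathbf{x}_i \right)' \left( N^{-1} \sum_{i=1}^N s_i \mathbf{z}_i' \mathbf{z}_i \right)^{-1} \left( N^{-1} \sum_{i=1}^N s_i \mathbf{z}_i' \mathbf{x}_i \right) \right]^{-1} \times \left( N^{-1} \sum_{i=1}^N s_i \mathbf{z}_i' \mathbf{x}_i \right)' \left( N^{-1} \sum_{i=1}^N s_i \mathbf{z}_i' \mathbf{z}_i \right)^{-1} \left( N^{-1} \sum_{i=1}^N s_i \mathbf{z}_i' u_i \right)$$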

By assumption $E(u_i \mid \mathbf{z}_i, s_i) = 0$, and so, by the law of iterated expectations,

$E(s_i \mathbf{z}_i' u_i) = E[s_i \mathbf{z}_i' E(u_i \mid \mathbf{z}_i, s_i)] = 0$.

Hence plim $\hat\beta = \beta$ (by the law of large numbers).

Theorem 17.1 (Consistency of 2SLS under Sample Selection)

In model (17.3) assume that

- (17.6) $E(u \mid \mathbf{z}, s) = 0$

- (17.8) rank $E(\mathbf{z}'\mathbf{z} \mid s = 1) = L$

- (17.9) rank $E(\mathbf{z}'\mathbf{x} \mid s = 1) = K$

Then the 2SLS estimator using the selected sample is consistent for $\beta$ and asymptotically normally distributed.

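Under homoskedasticity, $E(u^2 \mid \mathbf{z}, s) = \sigma^2$,

Avar $\sqrt{N}(\hat\beta - \beta) = \sigma^2 \left\{ E(s\mathbf{z}'\mathbf{x})' \left[ E(s\mathbf{z}'\mathbf{z}) \right]^{-1} E(s\mathbf{z}'\mathbf{x}) \right\}^{-1}$

How to estimate $\sigma^2$? Use the mean of the squared residuals in the selected sample:

$\left( \sum_{i=1}^N s_i \right)^{-1} \sum_{i=1}^N s_i \hat u_i^2 \xrightarrow{p} \sigma^2$

Why is this consistent? By iterated expectations, $E[s u^2] = E[s] \cdot \sigma^2$. Hence

$\sigma^2 = \frac{E[s u^2]}{E[s]} = \frac{N \cdot E[s u^2]}{N \cdot E[s]}$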

Example 4: Nonrandomly Missing IQ Scores

$\log(wage) = \mathbf{z}_1 \delta_1 + abil + v$, $E(v \mid \mathbf{z}_1, abil, IQ) = 0$

Assume that IQ is a valid proxy for abil (a good proxy: correlated with abil, with the error e mean independent of IQ conditional on $\mathbf{z}_1$):

$abil = \theta_1 IQ + e$, $E(e \mid \mathbf{z}_1, IQ) = 0$

Substituting gives

$\log(wage) = \mathbf{z}_1 \delta_1 + \theta_1 IQ + u$, $u = v + e$

Under these assumptions, $E(u \mid \mathbf{z}_1, IQ) = 0$. By Theorem 17.1, if we choose the sample excluding all people with IQs below a fixed value, then OLS estimation of the last equation will be consistent (selection on an exogenous variable).
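The contrast between selection on an exogenous variable and selection on y can be checked with a small simulation. The following is a minimal sketch in Python (standard library only); the data-generating process, the thresholds and the function names `ols_slope` and `simulate` are illustrative assumptions and not part of the lecture.

```python
import random

def ols_slope(xs, ys):
    """Slope from a simple regression of y on a constant and x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def simulate(n=50000, seed=0):
    """Return OLS slopes for the full sample, an x-selected and a y-selected sample."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        u = rng.gauss(0.0, 1.0)              # E(u | x) = 0 by construction
        data.append((x, 1.0 + 2.0 * x + u))  # assumed true slope: 2
    sel_x = [(x, y) for x, y in data if x > 0.0]  # s = 1[x > 0]: selection on x
    sel_y = [(x, y) for x, y in data if y > 1.0]  # s = 1[y > 1]: selection on y
    return (ols_slope(*zip(*data)),
            ols_slope(*zip(*sel_x)),
            ols_slope(*zip(*sel_y)))

if __name__ == "__main__":
    full, by_x, by_y = simulate()
    print(f"full sample: {full:.3f}  selected on x: {by_x:.3f}  selected on y: {by_y:.3f}")
```

In this design the slope from the x-selected sample should stay close to 2, while the slope from the y-selected sample is attenuated, which is the truncation problem treated later in the chapter.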


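6.2.2 Nonlinear Models

1. If $E(y \mid \mathbf{x}, s) = E(y \mid \mathbf{x})$, then selection is ignorable and NLS on the selected sample is consistent:

$\min_{\beta} N^{-1} \sum_{i=1}^N s_i [y_i - m(\mathbf{x}_i, \beta)]^2$

Why? With $u \equiv y - m(\mathbf{x}, \beta_0)$ we use that

$y - m(\mathbf{x}, \beta) = [y - m(\mathbf{x}, \beta_0)] + [m(\mathbf{x}, \beta_0) - m(\mathbf{x}, \beta)] = u + [m(\mathbf{x}, \beta_0) - m(\mathbf{x}, \beta)]$

and that $E[y - m(\mathbf{x}, \beta_0) \mid \mathbf{x}, s] = 0$, so the cross term drops out:

$E\{s [y - m(\mathbf{x}, \beta)]^2\} = E\{s\, E([y - m(\mathbf{x}, \beta)]^2 \mid \mathbf{x}, s)\} = E\{s\, E(u^2 \mid \mathbf{x}, s)\} + E\{s [m(\mathbf{x}, \beta_0) - m(\mathbf{x}, \beta)]^2\}$

The first term does not depend on $\beta$ and the second is minimized at $\beta = \beta_0$. Hence $\beta_0$ in $E(y \mid \mathbf{x}) = m(\mathbf{x}, \beta_0)$ minimizes $E\{s [y - m(\mathbf{x}, \beta)]^2\}$. In the linear case, $\beta_0$ is estimated by OLS of y on x, using the selected sample.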

2. General conditional ML setup

If the conditional distribution satisfies $D(y \mid \mathbf{x}, s) = D(y \mid \mathbf{x})$, then selection again is ignorable. This assumption holds if s is a nonrandom function of x or if s is independent of (x, y).

In this case, MLE on the selected sample is consistent:

$\max_{\theta} N^{-1} \sum_{i=1}^N s_i\, l(y_i, \mathbf{x}_i, \theta)$

because, for each x, $\theta_0$ maximizes $E[l(y, \mathbf{x}, \theta) \mid \mathbf{x}]$ over $\Theta$. Now write

$E[s\, l(y, \mathbf{x}, \theta)] = E\{s\, E[l(y, \mathbf{x}, \theta) \mid \mathbf{x}, s]\} = E\{s\, E[l(y, \mathbf{x}, \theta) \mid \mathbf{x}]\}$

Because, for every x, $E[l(y, \mathbf{x}, \theta) \mid \mathbf{x}]$ is maximized at $\theta_0$, it must also be the case that $E\{s\, E[l(y, \mathbf{x}, \theta) \mid \mathbf{x}]\}$ is maximized at $\theta_0$.

Truncated Regression (Selection on Response Variable)

$(\mathbf{x}_i, y_i)$ is a random draw from the population; we want to estimate $E(y_i \mid \mathbf{x}_i)$ in this population.

But we only observe a sample selected on the value of y.

Examples: wealth, wage in specific samples, ...

$y_i$ is a continuous variable.

The selection rule is $s_i = 1[a_1 < y_i < a_2]$, where $a_1$ and $a_2$ are known.

We observe $(\mathbf{x}_i, y_i)$ if $a_1 < y_i < a_2$; otherwise we observe neither y nor x.

In most cases we want to estimate $E(y_i \mid \mathbf{x}_i) = \mathbf{x}_i \beta$.

We need a specification of the full conditional distribution of $y_i \mid \mathbf{x}_i$.

Specify the conditional density of $y_i \mid \mathbf{x}_i$ as $f(\cdot \mid \mathbf{x}_i; \beta, \gamma)$, where $\gamma$ are additional parameters, e.g. the variance.

The cdf of $y_i \mid \mathbf{x}_i$ is given by $F(\cdot \mid \mathbf{x}_i; \beta, \gamma)$.

In estimating $E(y_i \mid \mathbf{x}_i)$ we must condition on $a_1 < y_i < a_2$, i.e. $s_i = 1$.

The selection rule indicates that if $y_i$ falls in $(a_1, a_2)$, then both $y_i$ and $\mathbf{x}_i$ are observed; if $y_i$ is outside this interval, then we observe neither $y_i$ nor $\mathbf{x}_i$. In estimation, we use the density of $y_i$ conditional on $\mathbf{x}_i$ and on the fact that we observe $(\mathbf{x}_i, y_i)$.

The cdf of $y_i \mid \mathbf{x}_i, s_i = 1$ is

$P(y_i \le c \mid \mathbf{x}_i, s_i = 1) = \frac{P(y_i \le c,\ s_i = 1 \mid \mathbf{x}_i)}{P(s_i = 1 \mid \mathbf{x}_i)}$

The denominator is

$P(s_i = 1 \mid \mathbf{x}_i) = P(a_1 < y_i < a_2 \mid \mathbf{x}_i) = F(a_2 \mid \mathbf{x}_i; \beta, \gamma) - F(a_1 \mid \mathbf{x}_i; \beta, \gamma)$

If y is truncated from one side only, then either $a_1 = -\infty$ or $a_2 = \infty$.

To obtain the numerator we write, for $a_1 < c < a_2$,

$P(y_i \le c,\ s_i = 1 \mid \mathbf{x}_i) = P(a_1 < y_i \le c \mid \mathbf{x}_i) = F(c \mid \mathbf{x}_i; \beta, \gamma) - F(a_1 \mid \mathbf{x}_i; \beta, \gamma)$

Plugging this into the expression for the cdf and taking the derivative with respect to c, we get the density of $y_i$ given $(\mathbf{x}_i, s_i = 1)$:

(17.14) $p(c \mid \mathbf{x}_i, s_i = 1) = \frac{f(c \mid \mathbf{x}_i; \beta, \gamma)}{F(a_2 \mid \mathbf{x}_i; \beta, \gamma) - F(a_1 \mid \mathbf{x}_i; \beta, \gamma)}$ for $a_1 < c < a_2$

(17.14) is valid irrespective of the specific distributional assumption.

Usually we assume a normal distribution for f.

Assume further that $E(y \mid \mathbf{x}) = \mathbf{x}\beta$.

Then we have

$f(y \mid \mathbf{x}_i, s_i = 1) = \frac{\sigma^{-1} \phi[(y - \mathbf{x}_i\beta)/\sigma]}{\Phi[(a_2 - \mathbf{x}_i\beta)/\sigma] - \Phi[(a_1 - \mathbf{x}_i\beta)/\sigma]}$

In many cases $a_1 = 0$ and $a_2 = \infty$.

The CMLEs of $\beta$ and $\gamma$ using the selected sample are efficient in the class of estimators that do not use information about the distribution of x.

6.3 Selection on Basis of the Response Variable: Truncated Regression

In most applications of truncated samples, the population conditional distribution is assumed to be $N(\mathbf{x}\beta, \sigma^2)$: the truncated Tobit model or truncated normal regression model.

The truncated Tobit model is related to the censored Tobit model for data-censoring applications (see Chapter 5). The key difference between censored and truncated regression is that in censored regression we observe x for all people, even if y is not known.

Heteroskedasticity or nonnormality in truncated regression results in inconsistent estimators of $\beta$.
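The truncated normal density and the implied log-likelihood contribution can be written down directly. The following is a minimal sketch in Python (standard library only) for the normal case; the function names and the default truncation points $a_1 = 0$, $a_2 = \infty$ are illustrative assumptions.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def truncated_density(y, xb, sigma, a1=0.0, a2=math.inf):
    """Density of y given x and a1 < y < a2 when y | x ~ N(xb, sigma^2)."""
    if not (a1 < y < a2):
        return 0.0
    numerator = norm_pdf((y - xb) / sigma) / sigma
    denominator = norm_cdf((a2 - xb) / sigma) - norm_cdf((a1 - xb) / sigma)
    return numerator / denominator

def truncated_loglik(ys, xbs, sigma, a1=0.0, a2=math.inf):
    """Log-likelihood of a truncated sample; every y must lie inside (a1, a2)."""
    return sum(math.log(truncated_density(y, xb, sigma, a1, a2))
               for y, xb in zip(ys, xbs))

def truncated_mean(xb, sigma):
    """E(y | x, y > 0) for truncation from below at zero."""
    c = xb / sigma
    return xb + sigma * norm_pdf(c) / norm_cdf(c)
```

The truncated mean lies above $\mathbf{x}\beta$ whenever the sample is truncated from below, which is why OLS on the truncated sample does not recover $\beta$; maximizing `truncated_loglik` over $(\beta, \sigma)$ would give the CMLE.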

Example:

    set obs 10000
    g x = uniform()
    g u = invnorm(uniform())                /* standard normal draws */
    replace u = 2*u                         /* error with standard deviation 2 */
    g y = 1 + x + u
    drop if y <= 0                          /* truncate the sample at y = 0 */
    reg y x
    outreg using d:\stata\micro\out\trunc_sim, replace
    truncreg y x, ll(0)
    outreg using d:\stata\micro\out\trunc_sim, append

                    OLS          TRUNCREG
    x               0.536        0.951
                    (9.11)**     (9.00)**
    Constant        2.011        1.047
                    (57.84)**    (13.24)**
    σ                            1.97
                                 (64.5)**
    Observations    7708         7708

Absolute value of t statistics in parentheses; * significant at 5%; ** significant at 1%

17.4 Probit Selection

Sample selection is not the result of the sample design but is due to decisions made by members of the population (self-selection).

Exogenous explanatory variables

Classic example: labour force participation and wages

We want to know $E(w_i \mid \mathbf{x}_i)$ for a person randomly drawn from the population (w: wage).

w is only observed for working people.

Model of labour supply:

(17.15) $\max_h U_i(w_i h + a_i, h)$ subject to $0 \le h \le 168$

h: hours of work per week; $a_i$: non-labour income

Define $s_i(h) \equiv U_i(w_i h + a_i, h)$ and assume $h < 168$.

Possible solutions: h = 0 or 0 < h < 168

If $ds_i/dh \le 0$ at h = 0, then h = 0.

This implies that h = 0 if

(17.16) $w_i \le -mu_i^h(a_i, 0) / mu_i^q(a_i, 0)$

where $mu^h$ is the marginal disutility of work and $mu^q$ is the marginal utility of income. The right-hand side of (17.16) is called the reservation wage $w_i^r$.

Parametric assumptions:

(17.17) $w_i = \exp(\mathbf{x}_{i1}\beta_1 + u_{i1})$, $w_i^r = \exp(\mathbf{x}_{i2}\beta_2 + \gamma_2 a_i + u_{i2})$

$(u_{i1}, u_{i2})$ is independent of $(\mathbf{x}_{i1}, \mathbf{x}_{i2}, a_i)$. $\mathbf{x}_{i1}$ contains productivity characteristics and $\mathbf{x}_{i2}$ contains characteristics that determine the marginal utility of leisure and income (there may be an overlap).

(17.18) $\log w_i = \mathbf{x}_{i1}\beta_1 + u_{i1}$

But the wage is only observed if $w_i > w_i^r$, i.e.

$\log w_i - \log w_i^r = \mathbf{x}_{i1}\beta_1 - \mathbf{x}_{i2}\beta_2 - \gamma_2 a_i + u_{i1} - u_{i2} \equiv \mathbf{x}_i\delta_2 + v_{i2} > 0$

Problem: $w^r$ is not observed and depends on $\mathbf{x}_{i2}$ and $u_{i2}$. Because $w^r$ is not a known constant, we need another estimation procedure.

Notation: drop the subscript i; $y_1 \equiv \log w$ and $y_2$ is a binary indicator (equal to 1 if the person works).

(17.19) $y_1 = \mathbf{x}_1\beta_1 + u_1$

(17.20) $y_2 = 1[\mathbf{x}\delta_2 + v_2 > 0]$

(17.20) is a probit if $v_2$ is normally distributed.

Assumptions 17.1:

(a) $(\mathbf{x}, y_2)$ are always observed, $y_1$ is only observed if $y_2 = 1$

(b) $(u_1, v_2)$ is independent of $\mathbf{x}$ with zero mean

(c) $v_2 \sim N(0, 1)$

(d) $E(u_1 \mid v_2) = \gamma_1 v_2$

(a) describes the selection process;

(b) is a strong exogeneity assumption;

(c) is necessary to derive a conditional expectation given the selected sample;

(d) requires linearity of the regression of $u_1$ on $v_2$.

(d) always holds if $(u_1, v_2)$ is bivariate normal (but it is not necessary to assume that $u_1$ is normally distributed).

Estimation of Selection Model

Let $(y_1, y_2, \mathbf{x}, u_1, v_2)$ denote a random draw from the population. Given the selection rule we can hope to estimate

$E(y_1 \mid \mathbf{x}, y_2 = 1)$ and $P(y_2 = 1 \mid \mathbf{x})$

How does $E(y_1 \mid \mathbf{x}, y_2 = 1)$ depend on $\beta_1$?

First, note that

(17.21) $E(y_1 \mid \mathbf{x}, v_2) = \mathbf{x}_1\beta_1 + E(u_1 \mid \mathbf{x}, v_2) = \mathbf{x}_1\beta_1 + E(u_1 \mid v_2) = \mathbf{x}_1\beta_1 + \gamma_1 v_2$

where the second equality follows because $(u_1, v_2)$ is independent of $\mathbf{x}$.

If $\gamma_1 = 0$: no selection problem!

What if $\gamma_1 \ne 0$? Using iterated expectations on (17.21) gives

$E(y_1 \mid \mathbf{x}, y_2) = \mathbf{x}_1\beta_1 + \gamma_1 E(v_2 \mid \mathbf{x}, y_2) = \mathbf{x}_1\beta_1 + \gamma_1 h(\mathbf{x}, y_2)$

where $h(\mathbf{x}, y_2) \equiv E(v_2 \mid \mathbf{x}, y_2)$.

If we knew $h(\mathbf{x}, y_2)$, we could estimate $\beta_1$ and $\gamma_1$ from the regression of $y_1$ on $\mathbf{x}_1$ and $h(\mathbf{x}, y_2)$ (in the selected sample).

In the selected sample, $y_2 = 1$, so we only have to find $h(\mathbf{x}, 1)$:

$h(\mathbf{x}, 1) = E(v_2 \mid v_2 > -\mathbf{x}\delta_2) = \lambda(\mathbf{x}\delta_2)$, where $\lambda(\cdot) = \phi(\cdot)/\Phi(\cdot)$

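This follows from a special property of the normal distribution:

If $z \sim N(0,1)$ then $E(z \mid z > c) = \frac{\phi(c)}{1 - \Phi(c)}$

With $c = -\mathbf{x}\delta_2$ and the symmetry of the normal distribution, $\phi(-\mathbf{x}\delta_2)/[1 - \Phi(-\mathbf{x}\delta_2)] = \phi(\mathbf{x}\delta_2)/\Phi(\mathbf{x}\delta_2)$.

The term $\lambda(\cdot) = \phi(\cdot)/\Phi(\cdot)$ is called the inverse Mills ratio.

This implies

(17.22) $E(y_1 \mid \mathbf{x}, y_2 = 1) = \mathbf{x}_1\beta_1 + \gamma_1 \lambda(\mathbf{x}\delta_2)$

From (17.22) it is obvious that OLS of $y_1$ on $\mathbf{x}_1$ in the selected sample omits the term $\lambda(\mathbf{x}\delta_2)$: omitted variable bias.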
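The truncated-mean property behind the inverse Mills ratio can be verified numerically. A minimal sketch in Python (standard library only); the function names and the simulation settings are illustrative assumptions.

```python
import math
import random

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def inverse_mills(c):
    """lambda(c) = phi(c) / Phi(c)."""
    return norm_pdf(c) / norm_cdf(c)

def simulated_truncated_mean(c, n=200000, seed=1):
    """Monte Carlo estimate of E(v | v > -c) for v ~ N(0, 1)."""
    rng = random.Random(seed)
    kept = [v for v in (rng.gauss(0.0, 1.0) for _ in range(n)) if v > -c]
    return sum(kept) / len(kept)

if __name__ == "__main__":
    for c in (-1.0, 0.0, 0.5, 2.0):
        print(c, round(inverse_mills(c), 4), round(simulated_truncated_mean(c), 4))
```

The two printed columns should agree up to simulation noise; $\lambda(c)$ is large for very negative indices (rarely selected observations) and close to zero for large positive ones.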

(17.22) also shows a way to consistently estimate $\beta_1$.

Heckman (1979) has shown that $\beta_1$ and $\gamma_1$ can be consistently estimated in the selected sample by regressing $y_1$ on $\mathbf{x}_1$ and $\lambda(\mathbf{x}\delta_2)$.

But $\delta_2$ is unknown and must be estimated in a first step (using probit).

Heckman Estimator

Step 1: Estimate the probit model

(17.23) $P(y_{i2} = 1 \mid \mathbf{x}_i) = \Phi(\mathbf{x}_i\delta_2)$

using all observations. Obtain $\hat\lambda_{i2} \equiv \lambda(\mathbf{x}_i\hat\delta_2)$.

Step 2: Estimate $\hat\beta_1$ and $\hat\gamma_1$ using OLS in the selected sample:

(17.24) $y_{i1} = \mathbf{x}_{i1}\beta_1 + \gamma_1 \hat\lambda_{i2} + u_i$

This estimator is consistent and asymptotically normally distributed.
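The second step can be illustrated with a simulation. The sketch below is in Python (standard library only) and makes a strong simplifying assumption: the selection index $\mathbf{x}\delta_2$ is treated as known, so the probit first step is skipped; in practice $\delta_2$ must be estimated as in Step 1. The data-generating process and all function names are illustrative assumptions.

```python
import math
import random

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, ys):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)

def simulate(n=100000, seed=0):
    """Return (naive OLS, OLS with inverse Mills ratio) on the selected sample."""
    rng = random.Random(seed)
    naive_rows, heck_rows, ys = [], [], []
    for _ in range(n):
        x, z = rng.random(), rng.random()
        v2 = rng.gauss(0.0, 1.0)
        u1 = 1.0 * v2 + rng.gauss(0.0, math.sqrt(3.0))  # gamma1 = 1, Var(u1) = 4
        index = -0.1 + 0.5 * x + 0.5 * z                # x*delta2, assumed known here
        if index + v2 > 0.0:                            # y2 = 1, so y1 is observed
            ys.append(1.0 + x + u1)                     # assumed true slope on x: 1
            naive_rows.append([1.0, x])
            heck_rows.append([1.0, x, norm_pdf(index) / norm_cdf(index)])
    return ols(naive_rows, ys), ols(heck_rows, ys)

if __name__ == "__main__":
    naive, corrected = simulate()
    print("naive OLS (const, x):", [round(b, 3) for b in naive])
    print("with lambda (const, x, lambda):", [round(b, 3) for b in corrected])
```

The variable z enters the selection index but not the outcome equation, so it plays the role of an exclusion restriction; without it, x and $\lambda$ would be nearly collinear.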

Simple test for selection bias:

Under H0 (no selection bias), $\gamma_1 = 0$ in (17.24), so use a t test for $\gamma_1$.

IMPORTANT: this test is only valid if the model is correctly specified (distributional assumptions).

If $\gamma_1 \ne 0$ the standard errors of $\hat\beta_1$ must be corrected

- for heteroskedasticity

- because $\delta_2$ has been estimated in the first step

Stata does this for you if you use the command heckman.

Theoretically, it is not necessary that $\mathbf{x}_1$ is a strict subset of $\mathbf{x}$.

$\beta_1$ is identified even if $\mathbf{x} = \mathbf{x}_1$ (because $\lambda$ is a nonlinear function of $\mathbf{x}$).

However, in practice $\lambda$ is often almost a linear function of $\mathbf{x}$; the result is severe multicollinearity and very imprecise estimates.

Strong recommendation: you should have at least one element in $\mathbf{x}$ that is not in $\mathbf{x}_1$ (exclusion restriction).

[Figure: relation between xβ and λ; vertical axis: lambda (0 to 3), horizontal axis: xb (-4 to 4)]

    #delimit ;
    use d:\stata\micro\data\mroz;
    reg lwage educ exper expersq;
    heckman lwage educ exper expersq, select(inlf = educ exper expersq age kidslt6 kidsge6 nwifeinc) twostep;
    heckman lwage educ exper expersq, select(inlf = educ exper expersq) twostep;

Table 17.1

                      OLS          Heckman 2 step    Heckman 2 step,
                                                     no excl. restr.
    wage equation
    educ              0.107        0.109             0.093
                      (7.60)**     (7.03)**          (1.82)
    exper             0.042        0.044             0.021
                      (3.15)**     (2.70)**          (0.28)
    expersq           -0.001       -0.001            -0.000
                      (2.06)*      (1.96)            (0.27)
    mills:lambda                   0.032             -0.270
                                   (0.24)            (0.28)
    Constant          -0.522       -0.578            -0.010
                      (2.63)**     (1.90)            (0.01)

    selection equation
    inlf:educ                      0.131             0.097
                                   (5.18)**          (4.38)**
    inlf:exper                     0.123             0.127
                                   (6.59)**          (7.12)**
    inlf:expersq                   -0.002            -0.002
                                   (3.15)**          (4.12)**
    inlf:age                       -0.053
                                   (6.23)**
    inlf:kidslt6                   -0.868
                                   (7.33)**
    inlf:kidsge6                   0.036
                                   (0.83)
    inlf:nwifeinc                  -0.012
                                   (2.48)*
    inlf:Constant                  0.270             -1.925
                                   (0.53)            (6.67)**
    lambda                         .032              -.270
                                   (0.24)            (-0.28)
    sigma                          .663              .691
    Observations      428          753               753
    R-squared         0.16

Absolute value of t statistics in parentheses; * significant at 5%; ** significant at 1%

Data generation for selection problem

    set obs 10000
    g x = uniform()
    g z = uniform()
    matrix c = (4, 1 \ 1, 1)                /* covariance matrix of u1, v2 */
    drawnorm u1 v2, n(10000) cov(c)         /* correlated error terms */
    g y1 = 1 + x + u1
    g y2star = 0.5 + 0.5*x + 0.5*z + v2
    g y2 = y2star>0.6
    replace y1 = . if y2==0
    reg y1 x
    heckman y1 x, select(y2 = x z) twostep
    heckman y1 x, select(y2 = x z)
    heckman y1 x, select(y2 = x) twostep
    heckman y1 x z, select(y2 = x z) twostep

Simulation results

                    OLS         Heckman     Heckman     2 step,           2 step,
                                2 step      ML          no excl. restr.   no excl. restr.
    Structural equation
    x               0.758       1.206       1.115       -0.258            -1.432
                    (9.48)**    (8.96)**    (11.94)**   (0.08)            (0.48)
    z                                                                     -2.264
                                                                          (0.88)
    lambda                      1.611                   -3.585            -7.603
                                (4.44)**                (0.32)            (0.73)
    Constant        1.683       0.557       0.793       4.216             8.219
                    (35.23)**   (2.15)*     (7.54)**    (0.53)            (0.95)

    Selection equation
    x                           0.535       0.525       0.526             0.535
                                (11.83)**   (11.65)**   (11.67)**         (11.83)**
    z                           0.457       0.474                         0.457
                                (10.12)**   (11.23)**                     (10.12)**
    Constant                    -0.093      -0.097      0.137             -0.093
                                (2.73)**    (2.93)**    (5.36)**          (2.73)**
    rho                                     0.712
                                            (8.93)**

Absolute value of t statistics in parentheses; * significant at 5%; ** significant at 1%

Predictions after estimation of selection models

Often selection models are used to predict the dependent variable for the observations not in the selected subsample.

Example: expected wage of nonworkers

Correct prediction:

$\hat E(y_{i1} \mid \mathbf{x}_i) = \mathbf{x}_{i1}\hat\beta_1$

and NOT

$\hat E(y_{i1} \mid \mathbf{x}_i, y_{i2} = 1) = \mathbf{x}_{i1}\hat\beta_1 + \hat\gamma_1\hat\lambda_{i2} \ne \hat E(y_{i1} \mid \mathbf{x}_i)$

Stata:

    heckman lnlohn ...                      /* selection model for ln(wage) */
    predict lnlohn_pred, e(.,.)             /* prediction of ln(wage) */
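The difference between the two predictions is just the selection term $\hat\gamma_1\hat\lambda_{i2}$. A minimal sketch in Python (standard library only); the coefficient values in the usage lines are made up for illustration and the function names are hypothetical.

```python
import math

def inverse_mills(c):
    """lambda(c) = phi(c) / Phi(c)."""
    pdf = math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(c / math.sqrt(2.0)))
    return pdf / cdf

def predict_unconditional(x1, beta1):
    """E(y1 | x) = x1 * beta1: the prediction to use for the whole population."""
    return sum(x * b for x, b in zip(x1, beta1))

def predict_conditional(x1, beta1, gamma1, index):
    """E(y1 | x, y2 = 1) = x1 * beta1 + gamma1 * lambda(x * delta2)."""
    return predict_unconditional(x1, beta1) + gamma1 * inverse_mills(index)

if __name__ == "__main__":
    x1 = (1.0, 0.5)        # constant and one regressor (illustrative values)
    beta1 = (1.0, 1.0)     # made-up coefficient estimates
    print(predict_unconditional(x1, beta1))
    print(predict_conditional(x1, beta1, gamma1=1.0, index=0.4))
```

With a positive $\hat\gamma_1$ the conditional prediction exceeds the unconditional one, so using it for nonworkers would overstate their expected wage.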


Joint ML estimator

If (c) and (d) in Assumption 17.1 are replaced by the stronger assumption that $(u_1, v_2)$ is bivariate normal with mean 0, $Var(u_1) = \sigma_1^2$, $Cov(u_1, v_2) = \sigma_{12}$ and $Var(v_2) = 1$, then partial likelihood estimation can be used. Partial MLE will be more efficient than the 2-step procedure. (Partial MLE uses the density of $y_1$ when $y_2 = 1$.)

$f(y_1 \mid y_2 = 1, \mathbf{x}) = P(y_2 = 1 \mid y_1, \mathbf{x})\, f(y_1 \mid \mathbf{x}) / P(y_2 = 1 \mid \mathbf{x})$, with

$P(y_2 = 1 \mid y_1, \mathbf{x}) = \Phi\left\{ \left[ \mathbf{x}\delta_2 + \sigma_{12}\sigma_1^{-2}(y_1 - \mathbf{x}_1\beta_1) \right] \left( 1 - \sigma_{12}^2\sigma_1^{-2} \right)^{-1/2} \right\}$

The log-likelihood for observation i is:

$l_i(\theta) = (1 - y_{i2}) \log[1 - \Phi(\mathbf{x}_i\delta_2)] + y_{i2} \left( \log \Phi\left\{ \left[ \mathbf{x}_i\delta_2 + \sigma_{12}\sigma_1^{-2}(y_{i1} - \mathbf{x}_{i1}\beta_1) \right] \left( 1 - \sigma_{12}^2\sigma_1^{-2} \right)^{-1/2} \right\} + \log \phi[(y_{i1} - \mathbf{x}_{i1}\beta_1)/\sigma_1] - \log \sigma_1 \right)$

Endogenous Explanatory Variables

One element of $\mathbf{x}_1$ is correlated with $u_1$.

(17.25) $y_1 = \mathbf{z}_1\delta_1 + \alpha_1 y_2 + u_1$

(17.26) $y_2 = \mathbf{z}\delta_2 + v_2$

(17.27) $y_3 = 1[\mathbf{z}\delta_3 + v_3 > 0]$

(17.25) is the structural equation to be estimated,

(17.26) is a linear projection for the endogenous variable $y_2$ (i.e. not structural),

(17.27) is the selection equation.

The correlations between $u_1$, $v_2$, $v_3$ are unrestricted.

3 interesting cases:

- $y_2$ is always observed, but endogenous in (17.25) (e.g. education in a wage equation)

- $y_2$ is, like $y_1$, only observed if $y_3 = 1$. In this case $y_2$ can be exogenous in the population, but it becomes endogenous due to selection.

- $y_1$ is always observed, but $y_2$ only sometimes

If $y_1$ and $y_2$ were always observed along with $\mathbf{z}$, we would estimate (17.25) with 2SLS if $y_2$ is endogenous.

In the case of selection, 2SLS with the inverse Mills ratio added to the regressors is consistent (using only the selected sample in the second step).

Assumptions 17.2:

(a) $(\mathbf{z}, y_3)$ are always observed, $(y_1, y_2)$ are only observed if $y_3 = 1$

(b) $(u_1, v_3)$ is independent of $\mathbf{z}$ with mean 0

(c) $v_3 \sim N(0, 1)$

(d) $E(u_1 \mid v_3) = \gamma_1 v_3$

(e) $E(\mathbf{z}'v_2) = 0$ and, writing $\mathbf{z}\delta_2 = \mathbf{z}_1\delta_{21} + \mathbf{z}_2\delta_{22}$, $\delta_{22} \ne 0$

Parts (b), (c) and (d) are identical to Assumptions 17.1. Assumption (e) is new; it corresponds to the usual assumptions needed for identification in 2SLS.

Derivation of estimating equation

Write

(17.28) $y_1 = \mathbf{z}_1\delta_1 + \alpha_1 y_2 + g(\mathbf{z}, y_3) + e_1$

with $g(\mathbf{z}, y_3) \equiv E(u_1 \mid \mathbf{z}, y_3)$ and $e_1 \equiv u_1 - E(u_1 \mid \mathbf{z}, y_3)$. Thus $E(e_1 \mid \mathbf{z}, y_3) = 0$.

Note that $cov(g, e_1) = 0$.

If we knew $g(\mathbf{z}, y_3)$, we could estimate (17.28) with 2SLS in the selected sample, with instruments $(\mathbf{z}, g(\mathbf{z}, 1))$.

We know $g(\mathbf{z}, 1)$ up to some parameters: $E(u_1 \mid \mathbf{z}, y_3 = 1) = \gamma_1 \lambda(\mathbf{z}\delta_3)$

$\delta_3$ can be estimated consistently with probit, which gives a two-step procedure.

Step 1: Estimate $\delta_3$ with a probit of $y_3$ on $\mathbf{z}$ using all observations and calculate $\hat\lambda_{i3} = \lambda(\mathbf{z}_i\hat\delta_3)$.

Step 2: Estimate in the selected sample

(17.29) $y_{i1} = \mathbf{z}_{i1}\delta_1 + \alpha_1 y_{i2} + \gamma_1 \hat\lambda_{i3} + e_{i1}$

with 2SLS, using $(\mathbf{z}_i, \hat\lambda_{i3})$ as instruments.

This procedure applies to any kind of endogenous variable $y_2$, including discrete variables (because the reduced form (17.26) for $y_2$ is a linear projection without distributional assumptions).

$\mathbf{z}_2$ must have predictive power in the regression of $y_2$ on $\mathbf{z}_1$, $\mathbf{z}_2$, $\lambda(\mathbf{z}_i\delta_3)$: two exclusion restrictions are needed (otherwise identification comes from functional form only).

The hypothesis of no selection bias can be tested with the t-value for $\hat\lambda_{i3}$.

Example:

wage offer equation with education being endogenous

IV for education: mother’s and father’s education

IV for selection: number and age of children, non-labour income


How to do it with Stata

Possible endogeneity of education in the wage equation of married women.

Instruments: education of parents and husband

    #delimit ;
    use d:\stata\micro\data\mroz;
    /* step 1: probit for participation, then inverse Mills ratio */
    probit inlf exper expersq age kidslt6 kidsge6 nwifeinc motheduc fatheduc huseduc;
    predict xb, xb;
    g lambda = normden(xb)/norm(xb);
    /* step 2: 2SLS on the selected sample with lambda as extra regressor */
    ivreg lwage exper expersq lambda (educ = motheduc fatheduc huseduc kidslt6 kidsge6 nwifeinc) if inlf==1;

6.4.3 Binary Response Models with Sample Selection

Assume that the latent errors are bivariate normal and independent of the regressors:

$y_1 = 1[\mathbf{x}_1\beta_1 + u_1 > 0]$

$y_2 = 1[\mathbf{x}\delta_2 + v_2 > 0]$

$y_1$ is observed only when $y_2 = 1$; $\mathbf{x}$ is always observed.

Example: $y_1$ is an employment indicator and $\mathbf{x}$ contains a job training indicator. We can lose track of some people who are eligible to participate in the program; this is an example of sample attrition. If attrition is systematically related to $u_1$, then estimation using the selected sample leads to an inconsistent estimator of $\beta_1$.

If we assume that $(u_1, v_2)$ is independent of $\mathbf{x}$ with a zero-mean normal distribution and unit variances, we can apply partial MLE using the density of $y_1$ conditional on $\mathbf{x}$ and $y_2 = 1$.

A 2-step procedure can also be applied: (1) estimate $\delta_2$ by probit of $y_2$ on $\mathbf{x}$; (2) estimate $\beta_1$ and $\rho_1$ (the correlation between $u_1$ and $v_2$) along with $P(y_1 = 0 \mid \mathbf{x}, y_2 = 1)$.

6.5 A Tobit Selection Equation

6.5.1 Exogenous Explanatory Variables

The selection equation is a censored Tobit equation. The population model is:

$y_1 = \mathbf{x}_1\beta_1 + u_1$

$y_2 = \max(0, \mathbf{x}\delta_2 + v_2)$

where $(\mathbf{x}, y_2)$ is always observed, but $y_1$ is observed only when $y_2 > 0$.

Example: $y_1 = \log(wage)$ and $y_2 = \log(hours)$

Assumption 3: Type III Tobit model

(a) $(\mathbf{x}, y_2)$ is always observed, but $y_1$ is observed only when $y_2 > 0$

(b) $(u_1, v_2)$ is independent of $\mathbf{x}$

(c) $v_2 \sim N(0, \tau_2^2)$ (here we do not have to normalize the variance)

(d) $E(u_1 \mid v_2) = \gamma_1 v_2$

Estimate $E(y_1 \mid \mathbf{x}, v_2, s_2) = \mathbf{x}_1\beta_1 + \gamma_1 v_2$, where $s_2 = 1[y_2 > 0]$.

If we knew $v_2$ we could estimate this equation. For observations with $s_2 = 1$ we can estimate $v_2 = y_2 - \mathbf{x}\delta_2$. This was not possible in the probit selection model.

Procedure 3:

(a) Obtain $\hat\delta_2$, the standard Tobit estimate from the selection equation, using all N observations; then compute $\hat v_{i2} = y_{i2} - \mathbf{x}_i\hat\delta_2$ for those observations with $y_{i2} > 0$.

(b) Estimate $\beta_1$ and $\gamma_1$ from the OLS regression of $y_{i1}$ on $\mathbf{x}_{i1}$ and $\hat v_{i2}$ using the selected sample $y_{i2} > 0$.

The estimators are consistent and $\sqrt{N}$-asymptotically normal. No instrument (exclusion restriction) is needed, because variation in $y_2$ produces variation in $v_2$.

For partial likelihood estimation, assume that $(u_1, v_2)$ is jointly normal with $Var(u_1) = \sigma_1^2$, $Cov(u_1, v_2) = \sigma_{12}$ and $Var(v_2) = \tau_2^2$.

The density $f(y_2 \mid \mathbf{x})$ is used for the entire sample and the conditional density $f(y_1 \mid \mathbf{x}, y_2)$ for the selected sample. The log-likelihood for observation i is:

$l_i(\theta) = s_{i2} \log f(y_{i1} \mid \mathbf{x}_i, y_{i2}; \theta) + \log f(y_{i2} \mid \mathbf{x}_i; \delta_2, \tau_2^2)$, with $s_{i2} = 1[y_{i2} > 0]$

where

$f(y_{i1} \mid \mathbf{x}_i, y_{i2}; \theta)$ is the Normal$[\mathbf{x}_{i1}\beta_1 + \gamma_1(y_{i2} - \mathbf{x}_i\delta_2),\ \eta_1^2]$ density, with $\gamma_1 = \sigma_{12}/\tau_2^2$ and $\eta_1^2 \equiv \sigma_1^2 - \sigma_{12}^2/\tau_2^2$,

$f(y_{i2} \mid \mathbf{x}_i; \delta_2, \tau_2^2)$ is the standard censored Tobit density (see Chapter 5):

$f(y \mid \mathbf{x}_i) = \{1 - \Phi(\mathbf{x}_i\beta/\sigma)\}^{1[y = 0]} \{\sigma^{-1}\phi[(y - \mathbf{x}_i\beta)/\sigma]\}^{1[y > 0]}$

6.5.2 Endogenous Explanatory Variables (in Tobit model)

The model in the population is:

(4) $y_1 = \mathbf{z}_1\delta_1 + \alpha_1 y_2 + u_1$

(5) $y_2 = \mathbf{z}\delta_2 + v_2$

(6) $y_3 = \max(0, \mathbf{z}\delta_3 + v_3)$

Assumption 4:

(a) $(\mathbf{z}, y_3)$ is always observed, $(y_1, y_2)$ is observed when $y_3 > 0$

(b) $(u_1, v_3)$ is independent of $\mathbf{z}$

(c) $v_3 \sim N(0, \tau_3^2)$

(d) $E(u_1 \mid v_3) = \gamma_1 v_3$

(e) $E(\mathbf{z}'v_2) = 0$ and, writing $\mathbf{z}\delta_2 = \mathbf{z}_1\delta_{21} + \mathbf{z}_2\delta_{22}$, $\delta_{22} \ne 0$

We need only one instrument (for the selection equation).

Write

$y_1 = \mathbf{z}_1\delta_1 + \alpha_1 y_2 + \gamma_1 v_3 + e_1$, where $e_1 = u_1 - E[u_1 \mid v_3]$

Procedure 4:

(a) Obtain $\hat\delta_3$ from a Tobit of $y_3$ on $\mathbf{z}$ using all observations (eq. (6)). Obtain the Tobit residuals $\hat v_{i3} = y_{i3} - \mathbf{z}_i\hat\delta_3$ for $y_{i3} > 0$.

(b) Using the selected subsample (for which $y_1$ and $y_2$ are observed), estimate the equation

$y_{i1} = \mathbf{z}_{i1}\delta_1 + \alpha_1 y_{i2} + \gamma_1 \hat v_{i3} + error_i$

by 2SLS using instruments $(\mathbf{z}_i, \hat v_{i3})$.

6.6 Estimating Structural Tobit Equations with Sample Selection

Structural labor supply model involving simultaneity and sample selection:

$y_1 \equiv \log(w^o) = \mathbf{z}_1\beta_1 + u_1$

$y_2 \equiv h = \max(0, \mathbf{z}_2\beta_2 + \alpha_2 y_1 + u_2)$

Reduced form: substitute equation 1 into equation 2. What is different from the previous analysis? Now we are interested in $\alpha_2$.

Assumption 5:

(a) $(\mathbf{z}, y_2)$ is always observed, $y_1$ is observed when $y_2 > 0$

(b) $(u_1, u_2)$ is independent of $\mathbf{z}$ with a zero-mean bivariate normal distribution

(c) $\mathbf{z}_1$ contains at least one element with non-zero coefficient that is not in $\mathbf{z}_2$, i.e. we need an instrument for $y_1$ in the second equation

Assumption (c) is needed to identify $(\alpha_2, \beta_2)$, whereas $\beta_1$ is always identified.

Estimating the structural equation for $y_2$ requires new methods, whether or not $u_1$ and $u_2$ are uncorrelated, because $y_1$ is not observed when $y_2 = 0$.

$(\alpha_2, \beta_2)$ are easy to estimate once $\beta_1$ has been estimated.

Procedure 5:

(a) use Procedure 3 to obtain $\hat\beta_1$

(b) obtain $\hat\beta_2$ and $\hat\alpha_2$ from the Tobit $y_{i2} = \max(0, \mathbf{z}_{i2}\beta_2 + \alpha_2(\mathbf{z}_{i1}\hat\beta_1) + error_i)$

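6.7 Sample Selection and Attrition in Linear Panel Data Models

Unbalanced panel: time periods for some persons are missing because of a rotating panel, attrition, or an incidental truncation problem.

6.7.1 Fixed Effects Estimation with Unbalanced Panels

Model:

$y_{it} = \mathbf{x}_{it}\beta + c_i + u_{it}$, $t = 1, \dots, T$

where $\mathbf{x}_{it}$ is a $1 \times K$ and $\beta$ a $K \times 1$ vector. For a random draw i from the population, let $\mathbf{s}_i \equiv (s_{i1}, s_{i2}, \dots, s_{iT})'$ be the $T \times 1$ vector of selection indicators: $s_{it} = 1$ if $(\mathbf{x}_{it}, y_{it})$ is observed.

Random sample from the population: $\{(\mathbf{x}_i, \mathbf{y}_i, \mathbf{s}_i) : i = 1, 2, \dots, N\}$

Fixed effects estimator:

$\hat\beta = \beta + \left( N^{-1} \sum_{i=1}^N \sum_{t=1}^T s_{it}\, \ddot{\mathbf{x}}_{it}' \ddot{\mathbf{x}}_{it} \right)^{-1} \left( N^{-1} \sum_{i=1}^N \sum_{t=1}^T s_{it}\, \ddot{\mathbf{x}}_{it}' u_{it} \right)$

where $\ddot{\mathbf{x}}_{it} \equiv \mathbf{x}_{it} - T_i^{-1} \sum_{r=1}^T s_{ir}\mathbf{x}_{ir}$ is demeaned over the observed periods of unit i and $T_i \equiv \sum_{t=1}^T s_{it}$.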

Assumption 6:

(a) $E(u_{it} \mid \mathbf{x}_i, \mathbf{s}_i, c_i) = 0$, $t = 1, 2, \dots, T$

(b) $\sum_{t=1}^T E(s_{it}\, \ddot{\mathbf{x}}_{it}' \ddot{\mathbf{x}}_{it})$ is nonsingular

(c) $E(\mathbf{u}_i \mathbf{u}_i' \mid \mathbf{s}_i, \mathbf{x}_i, c_i) = \sigma_u^2 \mathbf{I}_T$

Under Assumption 6, FE on the unbalanced panel is consistent and asymptotically normal (T fixed and large N), with

$\widehat{Avar}(\hat\beta) = \hat\sigma_u^2 \left( \sum_{i=1}^N \sum_{t=1}^T s_{it}\, \ddot{\mathbf{x}}_{it}' \ddot{\mathbf{x}}_{it} \right)^{-1}$

and

$\hat\sigma_u^2 = \left[ \sum_{i=1}^N (T_i - 1) \right]^{-1} \sum_{i=1}^N \sum_{t=1}^T s_{it}\, \hat u_{it}^2 \to \sigma_u^2$ as $N \to \infty$,

where $T_i = \sum_{t=1}^T s_{it}$ is the number of observed periods for unit i.

6.7.2 Testing and Correcting for Sample Selection Bias

Model: $y_{it1} = \mathbf{x}_{it1}\beta_1 + c_{i1} + u_{it1}$, $t = 1, \dots, T$

Selection equation: $s_{it2} = 1[\mathbf{x}_i\psi_{t2} + v_{it2} > 0]$, $v_{it2} \mid \mathbf{x}_i \sim N(0, 1)$, where $\mathbf{x}_i$ contains unity

(Note: no fixed effect in the selection equation!)

Testing: under the null of Assumption 6 (a), the inverse Mills ratio $\lambda_{it2}$ should not be significant in the equation estimated by fixed effects. A valid test of the null is therefore a t statistic on $\hat\lambda_{it2}$ in the FE estimation on the unbalanced panel. Under Assumption 6 (c), the usual t statistic is valid.

Correcting for sample selection: adding $\lambda_{it2}$ to the equation and using FE does not produce consistent estimators (if there is a fixed effect in the selection equation). Chamberlain's approach to panel data models works, but we need some linearity assumptions.

Assumption 7:

(a) the selection equation is given above;

(b) $E(u_{it1} \mid \mathbf{x}_i, v_{it2}) = E(u_{it1} \mid v_{it2}) = \rho_{t1} v_{it2}$, $t = 1, \dots, T$; and

(c) $E(c_{i1} \mid \mathbf{x}_i, v_{it2}) = \mathbf{x}_i\pi_1 + \phi_{t1} v_{it2}$

Under Assumption 7,

$E(y_{it1} \mid \mathbf{x}_i, s_{it2} = 1) = \mathbf{x}_{it1}\beta_1 + \mathbf{x}_i\pi_1 + \gamma_{t1} \lambda(\mathbf{x}_i\psi_{t2})$

We can consistently estimate $\beta_1$ in two steps:

1) estimate a probit of $s_{it2}$ on $\mathbf{x}_i$ for each t and compute the inverse Mills ratio $\hat\lambda_{it2}$ for all i and t;

2) run the pooled OLS regression, using the selected sample, of $y_{it1}$ on $\mathbf{x}_{it1}, \mathbf{x}_i, \hat\lambda_{it2}, d2_t\hat\lambda_{it2}, \dots, dT_t\hat\lambda_{it2}$ for all $s_{it2} = 1$,

where $d2_t, \dots, dT_t$ are time dummies.

6.7.3 Attrition

Test and correct for attrition in a linear panel data model where attrition is assumed to be an absorbing state. Assume $(\mathbf{x}_{it}, y_{it})$ is observed for all i when t = 1, and let $s_{it} = 1$ if $(\mathbf{x}_{it}, y_{it})$ is observed.

To remove the unobserved effect, use first differencing:

$\Delta y_{it} = \Delta\mathbf{x}_{it}\beta + \Delta u_{it}$, $t = 2, \dots, T$

Selection equation for $t \ge 2$:

$s_{it} = 1[\mathbf{w}_{it}\delta_t + v_{it} > 0]$, $v_{it} \mid \{\mathbf{w}_{it}, s_{i,t-1} = 1\} \sim N(0, 1)$

Under the assumptions that the $\mathbf{x}_{it}$ are strictly exogenous and that selection does not depend on $\Delta\mathbf{x}_{it}$ once $\mathbf{w}_{it}$ is controlled for,

$E(\Delta u_{it} \mid \Delta\mathbf{x}_{it}, \mathbf{w}_{it}, v_{it}, s_{i,t-1} = 1) = E(\Delta u_{it} \mid v_{it}) = \rho_t v_{it}$

Then

$E(\Delta y_{it} \mid \Delta\mathbf{x}_{it}, \mathbf{w}_{it}, s_{it} = 1) = \Delta\mathbf{x}_{it}\beta + \rho_t \lambda(\mathbf{w}_{it}\delta_t)$, $t = 2, \dots, T$.

The pooled OLS regression, using the selected sample, of $\Delta y_{it}$ on $\Delta\mathbf{x}_{it}, d2_t\hat\lambda_{it}, \dots, dT_t\hat\lambda_{it}$ is consistent for $\beta$ and the $\rho_t$.

Relaxing exogeneity of the x's: $\mathbf{z}_{it}$ is a vector of variables that is redundant in the selection equation and exogenous. In this case, we can estimate

$\Delta y_{it} = \Delta\mathbf{x}_{it}\beta + \rho_2\, d2_t\hat\lambda_{it} + \dots + \rho_T\, dT_t\hat\lambda_{it} + error_{it}$

by IV using instruments $(\mathbf{z}_{it}, d2_t\hat\lambda_{it}, \dots, dT_t\hat\lambda_{it})$ in the selected sample.

Estimating linear panel data models under possible nonrandom attrition: $(\mathbf{x}_{it}, y_{it})$ is observed only if $s_{it} = 1$. Under the assumption called selection on observables,

$P(s_{it} = 1 \mid y_{it}, \mathbf{x}_{it}, \mathbf{z}_{i1}) = P(s_{it} = 1 \mid \mathbf{z}_{i1})$

Estimation method using Inverse Probability Weighting (IPW): 2 steps

1. for each t, a probit or logit of $s_{it}$ on $\mathbf{z}_{i1}$ is estimated; get the fitted values $\hat p_{it}$

2. weight the objective function by $1/\hat p_{it}$

The argument for IPW is that the probability limit of the weighted objective function is identical to that of the unweighted function if we had no attrition problem. Under this argument, Wooldridge (2000) shows that IPW produces a consistent, $\sqrt{N}$-asymptotically normal estimator.

For the case where attrition is an absorbing state, the following probabilities can be used in the IPW procedure:

$\hat p_{it} \equiv \hat\pi_{i2}\hat\pi_{i3} \cdots \hat\pi_{it}$, where $\pi_{it} \equiv P(s_{it} = 1 \mid \mathbf{z}_{it}, s_{i,t-1} = 1)$

under the key assumption that

$P(s_{it} = 1 \mid \mathbf{v}_{i1}, \dots, \mathbf{v}_{iT}, s_{i,t-1} = 1) = P(s_{it} = 1 \mid \mathbf{z}_{it}, s_{i,t-1} = 1)$, where $\mathbf{v}_{it} = (\mathbf{w}_{it}, \mathbf{z}_{it})$
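The weighting idea can be illustrated in the simplest possible setting. The sketch below is in Python (standard library only) and simplifies heavily: a single cross-section instead of a panel, the population mean as the only parameter, and response probabilities that are known rather than estimated by probit or logit. The data-generating process and the function name `simulate` are illustrative assumptions.

```python
import random

def simulate(n=100000, seed=0):
    """Return (unweighted mean, IPW mean) of y among units that remain in the sample."""
    rng = random.Random(seed)
    observed = []
    weighted_sum = 0.0
    weight_total = 0.0
    for _ in range(n):
        z = rng.random()
        y = 2.0 + z + rng.gauss(0.0, 1.0)  # population mean of y is 2.5
        p = 0.2 + 0.6 * z                  # P(s = 1 | z): selection on observables
        if rng.random() < p:               # s = 1: the unit stays in the sample
            observed.append(y)
            weighted_sum += y / p          # weight each retained unit by 1 / p
            weight_total += 1.0 / p
    unweighted = sum(observed) / len(observed)
    ipw = weighted_sum / weight_total      # minimizes sum of (1/p) * (y - mu)^2
    return unweighted, ipw

if __name__ == "__main__":
    unweighted, ipw = simulate()
    print(f"unweighted: {unweighted:.3f}  IPW: {ipw:.3f}  truth: 2.5")
```

Units with a high z are over-represented among the retained observations, so the unweighted mean is too large; weighting by the inverse retention probability restores the population mean.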


6.8 Stratified Sampling

6.8.1 Standard Stratified Sampling and Variable Probability Sampling

The 2 most common kinds of stratification used in the social sciences:

- standard stratified sampling (SS sampling) and

- variable probability sampling (VP sampling).

SS Sampling

The population is partitioned into J groups $W_1, W_2, \dots, W_J$, assumed to be nonoverlapping and exhaustive. Let $\mathbf{w}$ be a random vector representing the population of interest. For j = 1,...,J, draw a random sample of size $N_j$ from stratum j. For each j, denote this random sample by $\{\mathbf{w}_{ij} : i = 1, 2, \dots, N_j\}$.

The strata sample sizes are nonrandom, thus the total sample size N is also nonrandom. Observations within a stratum are iid; across strata they are independent but not identically distributed.

VP Sampling: repeat the following steps N times

1. Draw an observation $\mathbf{w}_i$ at random from the population.

2. If $\mathbf{w}_i$ is in stratum j, toss a coin with probability $p_j$ of turning up heads. Let $h_{ij} = 1$ if the coin turns up heads and 0 otherwise.

3. Keep observation i if $h_{ij} = 1$; otherwise omit it from the sample.

6.8.2 Weighted Estimators to Account for Stratification

With VP sampling: define a set of binary variables $r_{ij}$ that indicate whether a draw $\mathbf{w}_i$ is kept in the sample and, if so, which stratum it falls into.

The weighted M-estimator:

$\hat\theta_w = \arg\min_{\theta \in \Theta} \sum_{i=1}^N \sum_{j=1}^J p_j^{-1} r_{ij}\, q(\mathbf{w}_i, \theta)$

Wooldridge (1999) shows that under the same assumptions as Theorem 2 in chapter 1, the weighted M-estimator is consistent. Asymptotic normality follows under the same regularity conditions as in chapter 1.

With SS sampling: weights are defined by $Q_j = P(\mathbf{w} \in W_j)$ (the population frequency for stratum j). Using a random sample obtained from each stratum, we can obtain a consistent estimator as in VP sampling with the weights

$Q_{j_i} / H_{j_i}$, with $H_j \equiv N_j / N$,

instead of $p_j^{-1}$, where $j_i$ denotes the stratum of observation i.
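The three VP sampling steps and the weighting by $p_j^{-1}$ can be sketched as follows. This is a minimal illustration in Python (standard library only) for the special case $q(\mathbf{w}, \theta) = (w - \theta)^2$, where the weighted M-estimator is a weighted mean; the two strata, the retention probabilities and the function names are assumptions made for the example.

```python
import random

def vp_sample(n_draws=100000, p=(0.25, 1.0), seed=0):
    """Variable probability sampling from a N(0, 1) population with two strata."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_draws):
        w = rng.gauss(0.0, 1.0)       # step 1: random draw from the population
        j = 0 if w <= 0.0 else 1      # stratum of the draw (w <= 0 or w > 0)
        if rng.random() < p[j]:       # step 2: coin toss with P(heads) = p_j
            kept.append((w, p[j]))    # step 3: keep the draw only if heads
    return kept

def weighted_mean(kept):
    """Weighted M-estimator for q(w, theta) = (w - theta)^2 with weights 1 / p_j."""
    return sum(w / pj for w, pj in kept) / sum(1.0 / pj for _, pj in kept)

def unweighted_mean(kept):
    return sum(w for w, _ in kept) / len(kept)

if __name__ == "__main__":
    sample = vp_sample()
    print(f"unweighted: {unweighted_mean(sample):.3f}  weighted: {weighted_mean(sample):.3f}")
```

Because the strata here are defined by w itself, the unweighted mean of the retained draws is far from the population mean of 0, while the weighted estimator is close to it.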


6.8.3 Stratification Based on Exogenous Variables (does not matter!)

$\mathbf{w}$ is partitioned as $(\mathbf{x}, y)$, where $\mathbf{x}$ is exogenous in the sense that

$\theta_0 = \arg\min_{\theta \in \Theta} E[q(\mathbf{w}, \theta) \mid \mathbf{x}]$

With VP sampling: the unweighted M-estimator on the stratified sample is

$\hat\theta_u = \arg\min_{\theta \in \Theta} \sum_{i=1}^N \sum_{j=1}^J h_{ij} s_{ij}\, q(\mathbf{w}_i, \theta)$

where $s_{ij} = 1[\mathbf{w}_i \in W_j]$.

Wooldridge (1999) shows that, when stratification is based on $\mathbf{x}$, the unweighted estimator is more efficient than the weighted estimator under the key assumption

$E[\nabla_\theta q(\mathbf{w}, \theta_0)' \nabla_\theta q(\mathbf{w}, \theta_0) \mid \mathbf{x}] = \sigma_0^2\, E[\nabla_\theta^2 q(\mathbf{w}, \theta_0) \mid \mathbf{x}]$

(a type of information matrix equality).

With SS sampling: similar conclusions are obtained.

One useful fact is that when stratification is based on $\mathbf{x}$ we need not compute within-strata variation in the estimated score to obtain consistent estimators for parameters that do not vary in the population.
