Sample Selection Regression Models (Ch. 17)
Until now we have always assumed that a random sample is available
Now we cover cases where no random sample is available
We look at two different cases
- the sample was collected/selected according to some value of y
- the sample is selected by behaviour of the population under
consideration (self-selection)
The assumption that a random sample from the underlying population is available is not always realistic. A selected sample is a nonrandom sample: the selection mechanism may be due to the sample design or to the behaviour of the persons being sampled (including nonresponse on survey questions and attrition from social programs).
Microeconometrics
Examples: Saving function
Estimate a saving function for all families in a given country:
$\text{saving} = \beta_0 + \beta_1\,\text{income} + \beta_2\,\text{age} + \beta_3\,\text{married} + \beta_4\,\text{children} + u$,
where age is the age of the household head.
We only have data on families whose household head is older than 45. This leads to a sample selection problem, because we are interested in all families but have a random sample only for a subset of the population.
Selection on the basis of x
Examples
Family wealth function
Effect of pension plan on wealth accumulation
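Estimate the effect of worker eligibility in a pension plan on family wealth:
$\text{wealth} = \beta_0 + \beta_1\,\text{plan} + \beta_2\,\text{educ} + \beta_3\,\text{age} + \beta_4\,\text{income} + u$,
where plan is an indicator for eligibility. In compact form:
(17.2) $y = \beta_0 + \beta_1\,\text{plan} + \mathbf{x}\beta_2 + u$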
The sample only contains people with wealth less than 100'000
Selection on basis of y (endogenous variable)
Wage offer function
Estimation of a wage function for the working-age population
But wages are only observed for workers
y is only observable for a subsample which is defined by another variable (working)
Self-selection: the decision to work depends on the wage
Estimate a wage offer equation for people of working age. However, wage data are only available for working people. This sample selection problem is often called incidental truncation, because the wage is missing as a result of another outcome, labour force participation.
When can Sample Selection Be Ignored?
Conditions under which 2SLS using the selected sample is consistent.
Population represented by the random vector (x, y, z)
x: 1 × K, y: scalar, z: 1 × L
Population model:
(17.3) $y = \beta_1 + \beta_2 x_2 + \dots + \beta_K x_K + u$
(17.4) $E(u \mid \mathbf{z}) = 0$
This is stronger than we need for 2SLS to be consistent!
Special case: z = x (x is exogenous)
General treatment: x can be endogenous
With a random sample, (17.3) can be estimated consistently with 2SLS (if rank[E(z'x)] = K)
In the special case z = x:
(17.5) $E(y \mid \mathbf{x}) = \beta_1 + \beta_2 x_2 + \dots + \beta_K x_K$
No random sample available: the data follow a selection rule.
s: binary selection indicator
s = 1: observation is used; s = 0: observation is not used
Key assumption
(17.6) $E(u \mid \mathbf{z}, s) = 0$
(17.6) can follow directly from (17.4):
- s is a deterministic function of z ⇒ $E(u \mid \mathbf{z}, s) = E(u \mid \mathbf{z})$. In this case selection follows a fixed rule which only depends on exogenous variables
- Selection is independent of (z, u) ⇒ $E(u \mid \mathbf{z}, s) = E(u \mid \mathbf{z})$
In estimating (17.3) we apply 2SLS to the observations with s = 1.
The observed sample is $\{(\mathbf{x}_i, y_i, \mathbf{z}_i, s_i) : i = 1, \dots, N\}$. Observation i is used if $s_i = 1$.
The 2SLS estimator with the selected sample is
$\hat\beta = \left[\left(N^{-1}\sum_{i=1}^N s_i \mathbf{z}_i'\mathbf{x}_i\right)'\left(N^{-1}\sum_{i=1}^N s_i \mathbf{z}_i'\mathbf{z}_i\right)^{-1}\left(N^{-1}\sum_{i=1}^N s_i \mathbf{z}_i'\mathbf{x}_i\right)\right]^{-1} \times \left(N^{-1}\sum_{i=1}^N s_i \mathbf{z}_i'\mathbf{x}_i\right)'\left(N^{-1}\sum_{i=1}^N s_i \mathbf{z}_i'\mathbf{z}_i\right)^{-1}\left(N^{-1}\sum_{i=1}^N s_i \mathbf{z}_i' y_i\right)$
Substituting $y_i = \mathbf{x}_i\beta + u_i$ gives
$\hat\beta = \beta + \left[\left(N^{-1}\sum_{i=1}^N s_i \mathbf{z}_i'\mathbf{x}_i\right)'\left(N^{-1}\sum_{i=1}^N s_i \mathbf{z}_i'\mathbf{z}_i\right)^{-1}\left(N^{-1}\sum_{i=1}^N s_i \mathbf{z}_i'\mathbf{x}_i\right)\right]^{-1} \times \left(N^{-1}\sum_{i=1}^N s_i \mathbf{z}_i'\mathbf{x}_i\right)'\left(N^{-1}\sum_{i=1}^N s_i \mathbf{z}_i'\mathbf{z}_i\right)^{-1}\left(N^{-1}\sum_{i=1}^N s_i \mathbf{z}_i' u_i\right)$
By assumption $E(u_i \mid \mathbf{z}_i, s_i) = 0$, and so $E(s_i \mathbf{z}_i' u_i) = 0$ (law of iterated expectations)
⇒ $\text{plim}\,\hat\beta = \beta$ (by the law of large numbers)
Theorem 17.1 (Consistency of 2SLS under Sample Selection)
In model (17.3) assume that
- (17.6) $E(u \mid \mathbf{z}, s) = 0$
- (17.8) $\text{rank}\,E(\mathbf{z}'\mathbf{z} \mid s = 1) = L$
- (17.9) $\text{rank}\,E(\mathbf{z}'\mathbf{x} \mid s = 1) = K$
Then the 2SLS estimator using the selected sample is consistent for β and asymptotically normally distributed
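A small simulation can make Theorem 17.1 concrete for the special case z = x (OLS). The sketch below is illustrative only: the population model y = 1 + x + u, the cutoffs 0.4 and 1.5, the seed and the helper names are assumptions, not part of the slides. Selecting on x leaves the slope close to 1, while selecting on y violates (17.6) and pulls the slope toward zero.

```python
import random

def ols_slope(xs, ys):
    """Slope of a simple OLS regression of ys on xs (intercept included)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def draw_population(n, seed):
    """Random sample from the assumed population model y = 1 + x + u."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        u = rng.gauss(0.0, 1.0)
        sample.append((x, 1.0 + x + u))
    return sample

def slope_on(sample, keep):
    """OLS slope using only observations with s = 1, where s = keep(x, y)."""
    selected = [(x, y) for x, y in sample if keep(x, y)]
    xs, ys = zip(*selected)
    return ols_slope(xs, ys)

sample = draw_population(n=50000, seed=1)
slope_full = slope_on(sample, lambda x, y: True)
slope_sel_x = slope_on(sample, lambda x, y: x > 0.4)  # s is a function of x only
slope_sel_y = slope_on(sample, lambda x, y: y > 1.5)  # s depends on y, hence on u

print(f"full sample:    {slope_full:.3f}")
print(f"selection on x: {slope_sel_x:.3f}")
print(f"selection on y: {slope_sel_y:.3f}")
```

The true slope is 1 by construction, so the third number shows the bias that the truncated regression and Heckman methods are meant to remove.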
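Under homoskedasticity, $E(u^2 \mid \mathbf{z}, s) = \sigma^2$:
$\text{Avar}\,\sqrt{N}(\hat\beta - \beta) = \sigma^2\left\{E(s\,\mathbf{z}'\mathbf{x})'\left[E(s\,\mathbf{z}'\mathbf{z})\right]^{-1}E(s\,\mathbf{z}'\mathbf{x})\right\}^{-1}$
How to estimate $\sigma^2$?
$\left(\sum_{i=1}^N s_i\right)^{-1}\sum_{i=1}^N s_i \hat u_i^2 \xrightarrow{p} \sigma^2$ (mean of the squared residuals in the selected sample)
Why is this consistent?
$E[s u^2] = E[s]\cdot\sigma^2$. Hence $\sigma^2 = \dfrac{E[s u^2]}{E[s]} = \dfrac{N\cdot E[s u^2]}{N\cdot E[s]}$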
Example 4: Nonrandomly Missing IQ Scores
$\log(\text{wage}) = \mathbf{z}_1\delta_1 + \text{abil} + v$, $E(v \mid \mathbf{z}_1, \text{abil}, IQ) = 0$
Assume that IQ is a valid proxy for abil (correlated with abil, with the remaining error e unrelated to IQ conditional on $\mathbf{z}_1$):
$\text{abil} = \theta_1 IQ + e$, $E(e \mid \mathbf{z}_1, IQ) = 0$
$\log(\text{wage}) = \mathbf{z}_1\delta_1 + \theta_1 IQ + u$, $u = v + e$
Under these assumptions, $E(u \mid \mathbf{z}_1, IQ) = 0$. By Theorem 17.1, if we choose the sample by excluding all people with IQs below a fixed value, then OLS estimation of the last equation will be consistent (selection on an exogenous variable).
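6.2.2 Nonlinear Models
1. If $E(y \mid \mathbf{x}, s) = E(y \mid \mathbf{x})$, then selection is ignorable and NLS on the selected sample is consistent:
$\min_\beta N^{-1}\sum_{i=1}^N s_i\left[y_i - m(\mathbf{x}_i, \beta)\right]^2$
Why? We use that $y - m(\mathbf{x}, \beta) = [y - m(\mathbf{x}, \beta_0)] + [m(\mathbf{x}, \beta_0) - m(\mathbf{x}, \beta)]$ and that $E[y - m(\mathbf{x}, \beta_0) \mid \mathbf{x}, s] = 0$. With $u \equiv y - m(\mathbf{x}, \beta_0)$:
$E\{s[y - m(\mathbf{x}, \beta)]^2\} = E\{s\,E([y - m(\mathbf{x}, \beta)]^2 \mid \mathbf{x}, s)\} = E\{s\,E(u^2 \mid \mathbf{x}, s)\} + E\{s[m(\mathbf{x}, \beta_0) - m(\mathbf{x}, \beta)]^2\}$
The cross term drops out because $E(u \mid \mathbf{x}, s) = 0$; the first term does not depend on β and the second is minimized at $\beta = \beta_0$. Hence $\beta_0$, defined by $E(y \mid \mathbf{x}) = m(\mathbf{x}, \beta_0)$, minimizes $E\{s[y - m(\mathbf{x}, \beta)]^2\}$. In the linear case this is estimated by OLS of y on x, using the selected sample.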
2. General conditional ML setup
If the distribution satisfies $D(y \mid \mathbf{x}, s) = D(y \mid \mathbf{x})$, then selection again is ignorable. This assumption holds if s is a nonrandom function of x or if s is independent of (x, y).
In this case, MLE on the selected sample is consistent:
$\max_\theta N^{-1}\sum_{i=1}^N s_i\,l(y_i, \mathbf{x}_i, \theta)$
because for each x, $\theta_0$ maximizes $E[l(y, \mathbf{x}, \theta) \mid \mathbf{x}]$ over Θ. Now write
$E[s\,l(y, \mathbf{x}, \theta)] = E\{s\,E[l(y, \mathbf{x}, \theta) \mid \mathbf{x}, s]\} = E\{s\,E[l(y, \mathbf{x}, \theta) \mid \mathbf{x}]\}$
Because, for every x, $E[l(y, \mathbf{x}, \theta) \mid \mathbf{x}]$ is maximized at $\theta_0$, it must also be the case that $E\{s\,E[l(y, \mathbf{x}, \theta) \mid \mathbf{x}]\}$ is maximized at $\theta_0$.
Truncated Regression (Selection on Response Variable)
$(\mathbf{x}_i, y_i)$ is a random draw from the population: estimate $E(y_i \mid \mathbf{x}_i)$ in this population.
But we only observe a sample selected on the value of y
Examples: wealth, wage in specific samples, ...
$y_i$ is a continuous variable
The selection rule is $s_i = 1[a_1 < y_i < a_2]$, where $a_1$ and $a_2$ are known.
We observe $(\mathbf{x}_i, y_i)$ if $a_1 < y_i < a_2$; otherwise we observe neither y nor x.
In most cases we want to estimate $E(y_i \mid \mathbf{x}_i) = \mathbf{x}_i\beta$.
We need a specification of the full conditional distribution of $y_i \mid \mathbf{x}_i$
Specify the conditional density of $y_i$ given $\mathbf{x}_i$ as $f(\cdot \mid \mathbf{x}_i; \beta, \gamma)$,
where γ are additional parameters, e.g. the variance
The cdf of $y_i$ given $\mathbf{x}_i$ is $F(\cdot \mid \mathbf{x}_i; \beta, \gamma)$
In estimating $E(y_i \mid \mathbf{x}_i)$ we must condition on $a_1 < y_i < a_2$, i.e. $s_i = 1$
The selection rule indicates that if $y_i$ falls in $(a_1, a_2)$, then both $y_i$ and $\mathbf{x}_i$ are observed; if $y_i$ is outside this interval, then we do not observe $y_i$ and $\mathbf{x}_i$. In estimation, use the density of $y_i$ conditional on $\mathbf{x}_i$ and on the fact that we observe $(y_i, \mathbf{x}_i)$.
The cdf of $y_i$ given $(\mathbf{x}_i, s_i = 1)$ is
$P(y_i \le c \mid \mathbf{x}_i, s_i = 1) = \dfrac{P(y_i \le c, s_i = 1 \mid \mathbf{x}_i)}{P(s_i = 1 \mid \mathbf{x}_i)}$
$P(s_i = 1 \mid \mathbf{x}_i) = P(a_1 < y_i < a_2 \mid \mathbf{x}_i) = F(a_2 \mid \mathbf{x}_i; \beta, \gamma) - F(a_1 \mid \mathbf{x}_i; \beta, \gamma)$
If y is truncated from one side only, then either $a_1 = -\infty$ or $a_2 = \infty$
To obtain the numerator above we write
$P(y_i \le c, s_i = 1 \mid \mathbf{x}_i) = P(a_1 < y_i \le c \mid \mathbf{x}_i) = F(c \mid \mathbf{x}_i; \beta, \gamma) - F(a_1 \mid \mathbf{x}_i; \beta, \gamma)$
Plugging this into the above equation and taking the derivative with respect to c, we get the density of $y_i$ given $(\mathbf{x}_i, s_i = 1)$:
(17.14) $p(c \mid \mathbf{x}_i, s_i = 1) = \dfrac{f(c \mid \mathbf{x}_i; \beta, \gamma)}{F(a_2 \mid \mathbf{x}_i; \beta, \gamma) - F(a_1 \mid \mathbf{x}_i; \beta, \gamma)}$ for $a_1 < c < a_2$
(17.14) is valid irrespective of the specific distributional assumption.
Usually we assume a normal distribution for f.
Assume further that $E(y \mid \mathbf{x}) = \mathbf{x}\beta$
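Then we have
$f(y_i \mid \mathbf{x}_i, s_i = 1) = \dfrac{\frac{1}{\sigma}\,\phi\!\left(\frac{y_i - \mathbf{x}_i\beta}{\sigma}\right)}{\Phi\!\left(\frac{a_2 - \mathbf{x}_i\beta}{\sigma}\right) - \Phi\!\left(\frac{a_1 - \mathbf{x}_i\beta}{\sigma}\right)}$
In many cases $a_1 = 0$ and $a_2 = \infty$.
The CMLEs of β and γ using the selected sample are efficient in the class of estimators not using information about the distribution of x.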
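The truncated normal density above can be checked numerically. The following is a minimal sketch using only the Python standard library; the parameter values (xβ = 1.5, σ = 2, truncation from below at 0) and the function names are illustrative assumptions. It verifies that the density integrates to one over (a1, a2) and that its mean matches the closed form xβ + σ·φ(xβ/σ)/Φ(xβ/σ), which holds for truncation from below at zero.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal: STD.pdf is phi, STD.cdf is Phi

def truncated_normal_pdf(y, xb, sigma, a1, a2):
    """Density of y given a1 < y < a2 when y ~ N(xb, sigma^2) in the population."""
    if not (a1 < y < a2):
        return 0.0
    numerator = STD.pdf((y - xb) / sigma) / sigma
    denominator = STD.cdf((a2 - xb) / sigma) - STD.cdf((a1 - xb) / sigma)
    return numerator / denominator

def midpoint_integral(f, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))

xb, sigma = 1.5, 2.0               # illustrative values
a1, a2 = 0.0, float("inf")         # truncation from below at zero
upper = xb + 12.0 * sigma          # numerical stand-in for infinity

total_mass = midpoint_integral(
    lambda y: truncated_normal_pdf(y, xb, sigma, a1, a2), a1, upper)
numeric_mean = midpoint_integral(
    lambda y: y * truncated_normal_pdf(y, xb, sigma, a1, a2), a1, upper)
formula_mean = xb + sigma * STD.pdf(xb / sigma) / STD.cdf(xb / sigma)

print(f"integral of density: {total_mass:.6f}")
print(f"mean (numerical):    {numeric_mean:.6f}")
print(f"mean (closed form):  {formula_mean:.6f}")
```

The gap between the truncated mean and xβ is the term that OLS on the truncated sample fails to account for.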
6.3 Selection on Basis of the Response Variable: Truncated Regression
In most applications of truncated samples, the population conditional distribution is assumed to be $N(\mathbf{x}\beta, \sigma^2)$: the truncated Tobit model or truncated normal regression model.
The truncated Tobit model is related to the censored Tobit model for data-censoring applications (see Chapter 5). The key difference between censored and truncated regression is that in censored regression we observe x for all people, even if y is not known.
Heteroskedasticity or nonnormality in truncated regression results in inconsistent estimators of β.
Example:
set obs 10000
g x = uniform()
g u = invnorm(uniform())
replace u = 2*u
g y = 1 + x + u
drop if y <= 0
reg y x
outreg using d:\stata\micro\out\trunc_sim, replace
truncreg y x, ll(0)
outreg using d:\stata\micro\out\trunc_sim, append

                OLS          TRUNCREG
x               0.536        0.951
                (9.11)**     (9.00)**
Constant        2.011        1.047
                (57.84)**    (13.24)**
σ                            1.97
                             (64.5)**
Observations    7708         7708
Absolute value of t statistics in parentheses
* significant at 5%; ** significant at 1%
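The OLS column of this example can be mimicked without Stata. The sketch below is a rough Python analogue under the same data-generating process (y = 1 + x + 2·N(0,1), keep y > 0); the seed, sample size and helper names are assumptions, and it does not reproduce `truncreg`. It compares the OLS slope on the truncated sample with the slope implied by the truncated conditional mean xβ + σ·φ(xβ/σ)/Φ(xβ/σ).

```python
import random
from statistics import NormalDist

STD = NormalDist()

def ols(xs, ys):
    """Return (intercept, slope) of a simple OLS regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def truncated_mean(x, sigma=2.0):
    """E(y | x, y > 0) when y ~ N(1 + x, sigma^2) in the population."""
    xb = 1.0 + x
    return xb + sigma * STD.pdf(xb / sigma) / STD.cdf(xb / sigma)

rng = random.Random(2024)          # arbitrary seed
n_draws = 40000
xs, ys = [], []
for _ in range(n_draws):
    x = rng.random()
    y = 1.0 + x + 2.0 * rng.gauss(0.0, 1.0)
    if y > 0:                      # truncation: drop the observation entirely
        xs.append(x)
        ys.append(y)

ols_const, ols_slope = ols(xs, ys)
_, implied_slope = ols(xs, [truncated_mean(x) for x in xs])
share_kept = len(xs) / n_draws

print(f"share kept:           {share_kept:.3f}")
print(f"OLS slope, truncated: {ols_slope:.3f}  (true slope is 1)")
print(f"slope implied by truncated mean: {implied_slope:.3f}")
```

The attenuation of the OLS slope is therefore not sampling noise: it is what the truncated conditional mean predicts.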
17.4 Probit Selection
Sample selection is not the result of the sample design but is due to decisions made by members of the population (self-selection)
Exogenous explanatory variables
Classic example: labour force participation and wages
We want to know $E(w_i \mid \mathbf{x}_i)$ for a person randomly drawn from the population (w: wage)
w is only observed for working people.
Model of labour supply:
(17.15) $\max_h U_i(w_i h + a_i, h)$ subject to $0 \le h \le 168$
h: hours of work per week; $a_i$: non-labour income
Define $s_i(h) \equiv U_i(w_i h + a_i, h)$ and rule out the corner solution h = 168, so that h < 168
Possible solutions: h = 0 or 0 < h < 168
If $ds_i/dh \le 0$ at h = 0 ⇒ h = 0
This implies that h = 0 if
(17.16) $w_i \le -mu_i^h(a_i, 0) / mu_i^q(a_i, 0)$
where $mu^h$ is the marginal disutility of work and $mu^q$ is the marginal utility of income. The right-hand side of (17.16) is called the reservation wage $w^r$.
Parametric assumptions:
(17.17) $w_i = \exp(\mathbf{x}_{i1}\beta_1 + u_{i1})$, $w_i^r = \exp(\mathbf{x}_{i2}\beta_2 + \gamma_2 a_i + u_{i2})$
$(u_{i1}, u_{i2})$ is independent of $(\mathbf{x}_{i1}, \mathbf{x}_{i2}, a_i)$. $\mathbf{x}_{i1}$ contains productivity characteristics and $\mathbf{x}_{i2}$ contains characteristics that determine the marginal utility of leisure and income (there may be an overlap)
(17.18) $\log w_i = \mathbf{x}_{i1}\beta_1 + u_{i1}$
But the wage is only observed if $w > w^r$, i.e.
$\log w_i - \log w_i^r = \mathbf{x}_{i1}\beta_1 - \mathbf{x}_{i2}\beta_2 - \gamma_2 a_i + u_{i1} - u_{i2} \equiv \mathbf{x}_i\delta_2 + v_{i2} > 0$
Problem: $w^r$ is not observed and depends on $\mathbf{x}_{i2}$ and $u_{i2}$.
Because $w^r$ is not a known constant, we need another estimation procedure
Notation: drop the subscript i, let $y_1 \equiv \log w$, and let $y_2$ be a binary indicator
(17.19) $y_1 = \mathbf{x}_1\beta_1 + u_1$
(17.20) $y_2 = 1[\mathbf{x}\delta_2 + v_2 > 0]$
(17.20) is a probit if $v_2$ is normally distributed
Assumptions 17.1:
(a) $(\mathbf{x}, y_2)$ are always observed, $y_1$ is only observed if $y_2 = 1$
(b) $(u_1, v_2)$ is independent of x with zero mean
(c) $v_2 \sim N(0, 1)$
(d) $E(u_1 \mid v_2) = \gamma_1 v_2$
(a) describes the selection process;
(b) is a strong exogeneity assumption;
(c) is necessary to derive a conditional expectation given the selected sample;
(d) requires linearity of the regression of $u_1$ on $v_2$.
(d) always holds if $(u_1, v_2)$ is bivariate normal (but it is not necessary to assume that $u_1$ is normally distributed).
Estimation of Selection Model
Let $(y_1, y_2, \mathbf{x}, u_1, v_2)$ denote a random draw from the population. Given the selection rule we can hope to estimate
$E(y_1 \mid \mathbf{x}, y_2 = 1)$ and $P(y_2 = 1 \mid \mathbf{x})$
How does $E(y_1 \mid \mathbf{x}, y_2 = 1)$ depend on $\beta_1$?
First, note that
(17.21) $E(y_1 \mid \mathbf{x}, v_2) = \mathbf{x}_1\beta_1 + E(u_1 \mid \mathbf{x}, v_2) = \mathbf{x}_1\beta_1 + E(u_1 \mid v_2) = \mathbf{x}_1\beta_1 + \gamma_1 v_2$
where the second equality follows because $(u_1, v_2)$ is independent of x
If $\gamma_1 = 0$: no selection problem!
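What if $\gamma_1 \ne 0$? Using iterated expectations on (17.21) gives
$E(y_1 \mid \mathbf{x}, y_2) = \mathbf{x}_1\beta_1 + \gamma_1 E(v_2 \mid \mathbf{x}, y_2) = \mathbf{x}_1\beta_1 + \gamma_1 h(\mathbf{x}, y_2)$
where $h(\mathbf{x}, y_2) \equiv E(v_2 \mid \mathbf{x}, y_2)$
If we knew $h(\mathbf{x}, y_2)$, we could estimate $\beta_1$ and $\gamma_1$ from the regression of $y_1$ on $\mathbf{x}_1$ and $h(\mathbf{x}, y_2)$ (in the selected sample).
In the selected sample ($y_2 = 1$) we only have to find $h(\mathbf{x}, 1)$:
$h(\mathbf{x}, 1) = E(v_2 \mid v_2 > -\mathbf{x}\delta_2) = \lambda(\mathbf{x}\delta_2)$, where $\lambda(\cdot) = \dfrac{\phi(\cdot)}{\Phi(\cdot)}$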
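This follows from a special property of the normal distribution:
If $z \sim N(0, 1)$ then $E(z \mid z > c) = \dfrac{\phi(c)}{1 - \Phi(c)}$
With $c = -\mathbf{x}\delta_2$ and the symmetry of the normal distribution, $\dfrac{\phi(-\mathbf{x}\delta_2)}{1 - \Phi(-\mathbf{x}\delta_2)} = \dfrac{\phi(\mathbf{x}\delta_2)}{\Phi(\mathbf{x}\delta_2)}$.
The term $\lambda(\cdot) = \dfrac{\phi(\cdot)}{\Phi(\cdot)}$ is called the inverse Mills ratio
This implies
(17.22) $E(y_1 \mid \mathbf{x}, y_2 = 1) = \mathbf{x}_1\beta_1 + \gamma_1\lambda(\mathbf{x}\delta_2)$
From (17.22) it is obvious that OLS of $y_1$ on $\mathbf{x}_1$ in the selected sample omits the term $\lambda(\mathbf{x}\delta_2)$ ⇒ omitted variable bias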
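The normal-distribution property behind the inverse Mills ratio is easy to check by simulation. The sketch below uses only the standard library; the cutoff c = -0.5, the seed and the function names are illustrative assumptions. It compares a Monte Carlo estimate of E(z | z > c) with φ(c)/(1 − Φ(c)) and confirms that this equals λ(−c).

```python
import random
from statistics import NormalDist

STD = NormalDist()

def inverse_mills(c):
    """lambda(c) = phi(c) / Phi(c)."""
    return STD.pdf(c) / STD.cdf(c)

def truncated_mean_mc(c, n=200000, seed=0):
    """Monte Carlo estimate of E(z | z > c) for z ~ N(0, 1)."""
    rng = random.Random(seed)
    draws = [z for z in (rng.gauss(0.0, 1.0) for _ in range(n)) if z > c]
    return sum(draws) / len(draws)

c = -0.5                                   # plays the role of -x*delta_2
mc_mean = truncated_mean_mc(c)
formula = STD.pdf(c) / (1.0 - STD.cdf(c))  # phi(c) / (1 - Phi(c))
via_lambda = inverse_mills(-c)             # lambda(x*delta_2)

print(f"Monte Carlo E(z | z > c): {mc_mean:.4f}")
print(f"phi(c) / (1 - Phi(c)):    {formula:.4f}")
print(f"lambda(-c):               {via_lambda:.4f}")
```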
(17.22) also shows a way to consistently estimate $\beta_1$.
Heckman (1979) has shown that $\beta_1$ and $\gamma_1$ can be consistently estimated in the selected sample by regressing $y_1$ on $\mathbf{x}_1$ and $\lambda(\mathbf{x}\delta_2)$.
But $\delta_2$ is unknown and must be estimated in a first step (using probit).
Heckman Estimator
Step 1: Estimate the probit model
(17.23) $P(y_2 = 1 \mid \mathbf{x}) = \Phi(\mathbf{x}\delta_2)$ using all observations.
Obtain $\hat\lambda_{i2} \equiv \lambda(\mathbf{x}_i\hat\delta_2)$
Step 2: Estimate $\hat\beta_1$ and $\hat\gamma_1$ using OLS in the selected sample
(17.24) $y_{i1} = \mathbf{x}_{i1}\beta_1 + \gamma_1\hat\lambda_{i2} + u_i$
This estimator is consistent and asymptotically normally distributed
Simple test for selection bias:
Under H0 (no selection bias), $\gamma_1 = 0$ in (17.24) ⇒ t-test for $\gamma_1$.
IMPORTANT: this test is only valid if the model is correctly specified
(distributional assumptions)
If γ1 ≠ 0 the standard errors of β1 must be corrected
- for heteroskedasticity
- because δ2 has been estimated in the first step
Stata does this for you if you use the command heckman
Theoretically, it is not necessary that $\mathbf{x}_1$ is a strict subset of x
$\beta_1$ is identified if $\mathbf{x} = \mathbf{x}_1$ (because λ is a nonlinear function of x)
However, in practice λ is often almost a linear function of x
⇒ severe multicollinearity ⇒ very imprecise estimates
Strong recommendation: you should have at least one element in x that is not in $\mathbf{x}_1$ (exclusion restriction)
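How close to linear λ is depends on the range of the probit index. The sketch below (standard library only; the two grids are arbitrary illustrative choices) computes the correlation between an index value and λ(index) over a narrow and a wide range. A correlation near −1 means that λ adds almost no independent variation when it is driven by the same regressors as the outcome equation.

```python
from statistics import NormalDist

STD = NormalDist()

def inverse_mills(c):
    """lambda(c) = phi(c) / Phi(c)."""
    return STD.pdf(c) / STD.cdf(c)

def correlation(a, b):
    """Pearson correlation of two equally long lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5

def grid(lo, hi, points=201):
    """Equally spaced index values between lo and hi."""
    return [lo + (hi - lo) * k / (points - 1) for k in range(points)]

narrow = grid(-1.0, 1.0)   # index varies little
wide = grid(-4.0, 4.0)     # index varies a lot
corr_narrow = correlation(narrow, [inverse_mills(v) for v in narrow])
corr_wide = correlation(wide, [inverse_mills(v) for v in wide])

print(f"corr(index, lambda) on [-1, 1]: {corr_narrow:.4f}")
print(f"corr(index, lambda) on [-4, 4]: {corr_wide:.4f}")
```

An excluded variable in the selection equation helps precisely because it moves the index without moving the regressors of the outcome equation.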
Relation between xβ and λ
[Figure: lambda (vertical axis, 0 to 3) plotted against xb (horizontal axis, -4 to 4)]
#delimit ;
use d:\stata\micro\data\mroz;
reg lwage educ exper expersq;
heckman lwage educ exper expersq, select(inlf = educ exper expersq age kidslt6 kidsge6 nwifeinc) twostep;
heckman lwage educ exper expersq, select(inlf = educ exper expersq) twostep;

Table 17.1
                   OLS          Heckman      Heckman 2 step,
                                2 step       no excl. restr.
wage equation
educ               0.107        0.109        0.093
                   (7.60)**     (7.03)**     (1.82)
exper              0.042        0.044        0.021
                   (3.15)**     (2.70)**     (0.28)
expersq            -0.001       -0.001       -0.000
                   (2.06)*      (1.96)       (0.27)
mills:lambda                    0.032        -0.270
                                (0.24)       (0.28)
Constant           -0.522       -0.578       -0.010
                   (2.63)**     (1.90)       (0.01)
selection equation
inlf:educ                       0.131        0.097
                                (5.18)**     (4.38)**
inlf:exper                      0.123        0.127
                                (6.59)**     (7.12)**
inlf:expersq                    -0.002       -0.002
                                (3.15)**     (4.12)**
inlf:age                        -0.053
                                (6.23)**
inlf:kidslt6                    -0.868
                                (7.33)**
inlf:kidsge6                    0.036
                                (0.83)
inlf:nwifeinc                   -0.012
                                (2.48)*
inlf:Constant                   0.270        -1.925
                                (0.53)       (6.67)**
lambda                          .032         -.270
                                (0.24)       (-0.28)
sigma                           .663         .691
Observations       428          753          753
R-squared          0.16
Absolute value of t statistics in parentheses
* significant at 5%; ** significant at 1%
Data generation for selection problem
set obs 10000
g x = uniform()
g z = uniform()
matrix c = (4, 1 \ 1, 1)             /* covariance matrix of (u1, v2) */
drawnorm u1 v2, n(10000) cov(c)      /* correlated error terms */
g y1 = 1 + x + u1
g y2star = 0.5 + 0.5*x + 0.5*z + v2
g y2 = y2star>0.6
replace y1 = . if y2==0
reg y1 x
heckman y1 x, select(y2 = x z) twostep
heckman y1 x, select(y2 = x z)
heckman y1 x, select(y2 = x) twostep
heckman y1 x z, select(y2 = x z) twostep
Simulation results
                     OLS         Heckman     Heckman     2 step,           2 step,
                                 2 step      ML          no excl. restr.   no excl. restr.
                                                         (z omitted)       (z in both eqs.)
Structural equation
x                    0.758       1.206       1.115       -0.258            -1.432
                     (9.48)**    (8.96)**    (11.94)**   (0.08)            (0.48)
z                                                                          -2.264
                                                                           (0.88)
lambda                           1.611                   -3.585            -7.603
                                 (4.44)**                (0.32)            (0.73)
Constant             1.683       0.557       0.793       4.216             8.219
                     (35.23)**   (2.15)*     (7.54)**    (0.53)            (0.95)
Selection equation
x                                0.535       0.525       0.526             0.535
                                 (11.83)**   (11.65)**   (11.67)**         (11.83)**
z                                0.457       0.474                         0.457
                                 (10.12)**   (11.23)**                     (10.12)**
Constant                         -0.093      -0.097      0.137             -0.093
                                 (2.73)**    (2.93)**    (5.36)**          (2.73)**
rho                                          0.712
                                             (8.93)**
Absolute value of t statistics in parentheses
* significant at 5%; ** significant at 1%
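For readers without Stata, the two-step column can be approximated with a short, dependency-free Python sketch. It follows the data-generating process of the slide above (y1 = 1 + x + u1, selection if 0.5 + 0.5x + 0.5z + v2 > 0.6, Var(u1) = 4, Cov(u1, v2) = 1), but the seed, the sample size, the Fisher-scoring probit and all helper names are assumptions; it is not what `heckman` does internally and it reports no standard errors.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x

def ols(rows, ys):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * y for r, y in zip(rows, ys)) for a in range(k)]
    return solve(xtx, xty)

def probit(rows, ys, max_iter=25):
    """Probit MLE by Fisher scoring, starting from zero coefficients."""
    k = len(rows[0])
    d = [0.0] * k
    for _ in range(max_iter):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for r, y in zip(rows, ys):
            idx = sum(c * v for c, v in zip(d, r))
            p = min(max(STD.cdf(idx), 1e-10), 1.0 - 1e-10)
            f = STD.pdf(idx)
            g = (y - p) * f / (p * (1.0 - p))
            w = f * f / (p * (1.0 - p))
            for a in range(k):
                score[a] += g * r[a]
                for b in range(k):
                    info[a][b] += w * r[a] * r[b]
        step = solve(info, score)
        d = [c + s for c, s in zip(d, step)]
        if max(abs(s) for s in step) < 1e-8:
            break
    return d

rng = random.Random(123)                     # arbitrary seed
sel_rows, sel_y, selected = [], [], []
for _ in range(20000):
    x, z = rng.random(), rng.random()
    v2 = rng.gauss(0.0, 1.0)
    u1 = v2 + math.sqrt(3.0) * rng.gauss(0.0, 1.0)   # Var(u1) = 4, Cov(u1, v2) = 1
    y2 = 1 if 0.5 + 0.5 * x + 0.5 * z + v2 > 0.6 else 0
    w = [1.0, x, z]
    sel_rows.append(w)
    sel_y.append(y2)
    if y2:                                   # y1 is only observed when y2 = 1
        selected.append((x, 1.0 + x + u1, w))

# Step 1: probit of y2 on (1, x, z) using all observations
delta = probit(sel_rows, sel_y)

def mills(w):
    idx = sum(c * v for c, v in zip(delta, w))
    return STD.pdf(idx) / STD.cdf(idx)

# Step 2: OLS of y1 on (1, x, lambda_hat) in the selected sample
y1 = [y for _, y, _ in selected]
ols_b = ols([[1.0, x] for x, _, _ in selected], y1)
heck_b = ols([[1.0, x, mills(w)] for x, _, w in selected], y1)

print("probit (const, x, z):        ", [round(v, 3) for v in delta])
print("OLS (const, x):              ", [round(v, 3) for v in ols_b])
print("two-step (const, x, lambda): ", [round(v, 3) for v in heck_b])
```

In this design the population values are (−0.1, 0.5, 0.5) for the probit, 1 for the slope on x and 1 for the coefficient on λ, which helps in reading the output.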
Predictions after estimation of selection models
Often selection models are used to predict the dependent variable for the
observations not in the selected subsample
Example: expected wage of nonworkers
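Correct prediction:
$\hat E(y_{i1} \mid \mathbf{x}_i) = \mathbf{x}_{i1}\hat\beta_1$
and NOT
$\hat E(y_{i1} \mid \mathbf{x}_i, y_{i2} = 1) = \mathbf{x}_{i1}\hat\beta_1 + \hat\gamma_1\hat\lambda_{i2} \ne \hat E(y_{i1} \mid \mathbf{x}_i)$
Stata:
heckman lnlohn ... /* selection model for ln(wage) */
predict lnlohn_pred, e(.,.) /* prediction of ln(wage) */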
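As a supplementary step (not on the slide, but implied by Assumptions 17.1), the conditional mean for the non-selected group can be written out. It shows that the unconditional prediction lies between the two conditional ones, weighted by the selection probabilities:

```latex
\begin{aligned}
E(y_1 \mid \mathbf{x}, y_2 = 0)
  &= \mathbf{x}_1\beta_1 + \gamma_1 E(v_2 \mid v_2 \le -\mathbf{x}\delta_2)
   = \mathbf{x}_1\beta_1 - \gamma_1 \frac{\phi(\mathbf{x}\delta_2)}{1 - \Phi(\mathbf{x}\delta_2)}, \\
E(y_1 \mid \mathbf{x})
  &= \Phi(\mathbf{x}\delta_2)\,E(y_1 \mid \mathbf{x}, y_2 = 1)
   + \left[1 - \Phi(\mathbf{x}\delta_2)\right] E(y_1 \mid \mathbf{x}, y_2 = 0)
   = \mathbf{x}_1\beta_1 .
\end{aligned}
```

The second line uses $E(y_1 \mid \mathbf{x}, y_2 = 1) = \mathbf{x}_1\beta_1 + \gamma_1\phi(\mathbf{x}\delta_2)/\Phi(\mathbf{x}\delta_2)$ from (17.22): the two correction terms cancel once they are weighted by $\Phi(\mathbf{x}\delta_2)$ and $1 - \Phi(\mathbf{x}\delta_2)$.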
Joint ML estimator
If (c) and (d) in Assumptions 17.1 are replaced by the stronger assumption that $(u_1, v_2)$ is bivariate normal with mean 0, $Var(u_1) = \sigma_1^2$, $Cov(u_1, v_2) = \sigma_{12}$ and $Var(v_2) = 1$, then partial likelihood estimation can be used. Partial MLE will be more efficient than the 2-step procedure (partial MLE uses the density of $y_1$ when $y_2 = 1$):
$f(y_1 \mid y_2 = 1, \mathbf{x}) = P(y_2 = 1 \mid y_1, \mathbf{x})\,f(y_1 \mid \mathbf{x}) / P(y_2 = 1 \mid \mathbf{x})$, with
$P(y_2 = 1 \mid y_1, \mathbf{x}) = \Phi\left\{\left[\mathbf{x}\delta_2 + \sigma_{12}\sigma_1^{-2}(y_1 - \mathbf{x}_1\beta_1)\right]\left(1 - \sigma_{12}^2\sigma_1^{-2}\right)^{-1/2}\right\}$
The log-likelihood for observation i is:
$l_i(\theta) = (1 - y_{i2})\log\left[1 - \Phi(\mathbf{x}_i\delta_2)\right] + y_{i2}\left(\log\Phi\left\{\left[\mathbf{x}_i\delta_2 + \sigma_{12}\sigma_1^{-2}(y_{i1} - \mathbf{x}_{i1}\beta_1)\right]\left(1 - \sigma_{12}^2\sigma_1^{-2}\right)^{-1/2}\right\} + \log\phi\left[(y_{i1} - \mathbf{x}_{i1}\beta_1)/\sigma_1\right] - \log\sigma_1\right)$
Endogenous Explanatory Variables
One element of x1 correlated with u1
(17.25) $y_1 = \mathbf{z}_1\delta_1 + \alpha_1 y_2 + u_1$
(17.26) $y_2 = \mathbf{z}\delta_2 + v_2$
(17.27) $y_3 = 1[\mathbf{z}\delta_3 + v_3 > 0]$
(17.25) is the structural equation to be estimated,
(17.26) is linear projection of the endogenous variable y2 (i.e. not structural)
(17.27) is the selection equation.
The correlations between u1, v2, v3 are unrestricted.
3 interesting cases:
- y2 is always observed, but endogenous in (17.25) (e.g. education in wage
equation)
- y2 is also only observed if y3 = 1. In this case y2 can be exogenous in the population, but due to selection it becomes endogenous.
- y1 is always observed, but y2 only sometimes
If y1 and y2 were always observed along with z, we would estimate (17.25) with 2SLS if y2 is endogenous.
In the case of selection, 2SLS with the inverse Mills ratio added to the regressors is consistent (using only the selected sample in the second step).
Assumptions 17.2:
(a) $(\mathbf{z}, y_3)$ are always observed, $(y_1, y_2)$ are only observed if $y_3 = 1$
(b) $(u_1, v_3)$ are independent of z with mean 0
(c) $v_3 \sim N(0, 1)$
(d) $E(u_1 \mid v_3) = \gamma_1 v_3$
(e) $E(\mathbf{z}'v_2) = 0$ and, writing $\mathbf{z}\delta_2 = \mathbf{z}_1\delta_{21} + \mathbf{z}_2\delta_{22}$, $\delta_{22} \ne 0$
Parts (b), (c) and (d) are identical to Assumptions 17.1. Assumption (e) is new; it corresponds to the usual assumptions needed for identification in 2SLS
Derivation of estimating equation
Write
(17.28) $y_1 = \mathbf{z}_1\delta_1 + \alpha_1 y_2 + g(\mathbf{z}, y_3) + e_1$
with $g(\mathbf{z}, y_3) \equiv E(u_1 \mid \mathbf{z}, y_3)$ and $e_1 \equiv u_1 - E(u_1 \mid \mathbf{z}, y_3)$. Thus $E(e_1 \mid \mathbf{z}, y_3) = 0$.
Note that cov(g, e1) = 0.
If we knew $g(\mathbf{z}, y_3)$, we could estimate (17.28) with 2SLS in the selected sample, with instruments $(\mathbf{z}, g(\mathbf{z}, 1))$.
We know $g(\mathbf{z}, 1)$ up to some parameters: $E(u_1 \mid \mathbf{z}, y_3 = 1) = \gamma_1\lambda(\mathbf{z}\delta_3)$
$\delta_3$ can be estimated consistently with probit ⇒ two-step procedure
Step 1: Estimate $\delta_3$ with a probit of $y_3$ on z using all observations and calculate $\hat\lambda_{i3} = \lambda(\mathbf{z}_i\hat\delta_3)$
Step 2: Estimate in the selected sample
(17.29) $y_{i1} = \mathbf{z}_{i1}\delta_1 + \alpha_1 y_{i2} + \gamma_1\hat\lambda_{i3} + e_i$ with 2SLS and instruments $(\mathbf{z}_i, \hat\lambda_{i3})$
This procedure applies to any kind of endogenous variable y2, including discrete variables (because the reduced form for y2, (17.26), is a linear projection without distributional assumptions)
$\mathbf{z}_2$ must have predictive power in the regression of $y_2$ on $\mathbf{z}_1$, $\mathbf{z}_2$, $\lambda(\mathbf{z}\delta_3)$
⇒ two exclusion restrictions are needed (otherwise identification rests on functional form)
The hypothesis of no selection bias can be tested with the t-value for $\hat\lambda_{i3}$.
Example:
wage offer equation with education being endogenous
IV for education: mother’s and father’s education
IV for selection: number and age of children, non-labour income
How to do it with STATA
Possible endogeneity of education in wage equation of married women:
Instruments: education of parents and husband
#delimit ;
use d:\stata\micro\data\mroz;
probit inlf exper expersq age kidslt6 kidsge6 nwifeinc motheduc fatheduc huseduc;
predict xb, xb;
g lambda = normden(xb)/norm(xb);
ivreg lwage exper expersq lambda (educ=motheduc fatheduc huseduc kidslt6 kidsge6 nwifeinc ) if inlf==1;
6.4.3 Binary Response Models with Sample Selection
Assume that the latent errors are bivariate normal and independent of the regressors:
$y_1 = 1[\mathbf{x}_1\beta_1 + u_1 > 0]$
$y_2 = 1[\mathbf{x}\delta_2 + v_2 > 0]$
$y_1$ is observed only when $y_2 = 1$; x is always observed.
Example: $y_1$ is an employment indicator, x contains a job training indicator. We can lose track of some people who are eligible to participate in the program; this is an example of sample attrition.
If attrition is systematically related to $u_1$, then estimation using the selected sample leads to an inconsistent estimator of $\beta_1$.
If we assume that $(u_1, v_2)$ is independent of x with a zero-mean normal distribution and unit variances, we can apply partial MLE using the density of $y_1$ conditional on x and $y_2 = 1$.
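A 2-step procedure can also be applied:
(1) estimate $\delta_2$ by probit of $y_2$ on x;
(2) estimate $\beta_1$ and $\rho_1$ (the correlation between $u_1$ and $v_2$) in a second step on the selected sample, using $P(y_1 = 1 \mid \mathbf{x}, y_2 = 1)$ along with $P(y_1 = 0 \mid \mathbf{x}, y_2 = 1)$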
6.5 A Tobit Selection Equation
6.5.1 Exogenous Explanatory Variables
The selection equation is a censored Tobit equation. The population model is:
$y_1 = \mathbf{x}_1\beta_1 + u_1$
$y_2 = \max(0, \mathbf{x}\delta_2 + v_2)$
where $(\mathbf{x}, y_2)$ is always observed, but $y_1$ is observed only when $y_2 > 0$
Example: $y_1 = \log(\text{wage})$ and $y_2 = \log(\text{hours})$
Assumption 3: Type III Tobit model
(a) $(\mathbf{x}, y_2)$ is always observed, but $y_1$ is observed only when $y_2 > 0$
(b) $(u_1, v_2)$ is independent of x
(c) $v_2 \sim N(0, \tau_2^2)$ (here we do not have to normalize the variance)
(d) $E(u_1 \mid v_2) = \gamma_1 v_2$
Estimate $E(y_1 \mid \mathbf{x}, v_2, s_2) = \mathbf{x}_1\beta_1 + \gamma_1 v_2$
If we knew $v_2$ we could estimate this equation. For observations with $s_2 = 1$ we can estimate $v_2 = y_2 - \mathbf{x}\delta_2$
This was not possible in the probit selection model
Procedure 3:
(a) Obtain $\hat\delta_2$, the standard Tobit estimate from the selection equation, using all N observations; then compute $\hat v_{i2} = y_{i2} - \mathbf{x}_i\hat\delta_2$ for those observations with $y_{i2} > 0$
(b) Estimate $\beta_1$ and $\gamma_1$ from the OLS regression of $y_{i1}$ on $\mathbf{x}_{i1}$ and $\hat v_{i2}$ using the selected sample $y_{i2} > 0$
The estimators are consistent and $\sqrt{N}$-asymptotically normal.
No exclusion restriction is needed, because variation in $y_2$ produces variation in $\hat v_2$
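For partial likelihood estimation, assume that $(u_1, v_2)$ is jointly normal such that $Var(u_1) = \sigma_1^2$, $Cov(u_1, v_2) = \sigma_{12}$ and $Var(v_2) = \tau_2^2$
The density $f(y_2 \mid \mathbf{x})$ is used for the entire sample and the conditional density $f(y_1 \mid \mathbf{x}, y_2)$ for the selected sample.
The log-likelihood for observation i is:
$l_i(\theta) = s_{i2}\log f(y_{i1} \mid \mathbf{x}_i, y_{i2}; \theta) + \log f(y_{i2} \mid \mathbf{x}_i; \delta_2, \tau_2^2)$, with $s_{i2} = 1[y_{i2} > 0]$
where $f(y_{i1} \mid \mathbf{x}_i, y_{i2}; \theta)$ is the $\text{Normal}\left[\mathbf{x}_{i1}\beta_1 + \gamma_1(y_{i2} - \mathbf{x}_i\delta_2),\ \eta_1^2\right]$ density, with $\eta_1^2 \equiv \sigma_1^2 - \sigma_{12}^2/\tau_2^2$,
and $f(y_{i2} \mid \mathbf{x}_i; \delta_2, \tau_2^2)$ is the standard censored Tobit density (see Chapter 5):
$f(y \mid \mathbf{x}_i) = \left\{1 - \Phi\!\left(\frac{\mathbf{x}_i\beta}{\sigma}\right)\right\}^{1[y = 0]}\left\{\frac{1}{\sigma}\,\phi\!\left[\frac{y - \mathbf{x}_i\beta}{\sigma}\right]\right\}^{1[y > 0]}$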
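6.5.2 Endogenous Explanatory Variables (in Tobit model)
The model in the population is:
(4) $y_1 = \mathbf{z}_1\delta_1 + \alpha_1 y_2 + u_1$
(5) $y_2 = \mathbf{z}\delta_2 + v_2$
(6) $y_3 = \max(0, \mathbf{z}\delta_3 + v_3)$
Assumption 4:
(a) $(\mathbf{z}, y_3)$ is always observed, $(y_1, y_2)$ is observed when $y_3 > 0$
(b) $(u_1, v_3)$ is independent of z
(c) $v_3 \sim N(0, \tau_3^2)$
(d) $E(u_1 \mid v_3) = \gamma_1 v_3$
(e) $E(\mathbf{z}'v_2) = 0$ and, writing $\mathbf{z}\delta_2 = \mathbf{z}_1\delta_{21} + \mathbf{z}_2\delta_{22}$, $\delta_{22} \ne 0$
We need only one instrument (for $y_2$, by part (e)); no exclusion restriction is needed for the selection equation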
Write (17.3)
$y_1 = \mathbf{z}_1\delta_1 + \alpha_1 y_2 + \gamma_1 v_3 + e_1$, where $e_1 = u_1 - E(u_1 \mid v_3)$
Procedure 4:
(a) Obtain $\hat\delta_3$ from a Tobit of $y_3$ on z using all observations (eq. (6)). Obtain the Tobit residuals $\hat v_{i3} = y_{i3} - \mathbf{z}_i\hat\delta_3$ for $y_{i3} > 0$.
(b) Using the selected subsample (for which $y_1$ and $y_2$ are observed), estimate the equation $y_{i1} = \mathbf{z}_{i1}\delta_1 + \alpha_1 y_{i2} + \gamma_1\hat v_{i3} + \text{error}_i$ by 2SLS using instruments $(\mathbf{z}_i, \hat v_{i3})$
6.6 Estimating Structural Tobit Equations with Sample Selection
Structural labor supply model involving simultaneity and sample selection:
$y_1 \equiv \log(w^o) = \mathbf{z}_1\beta_1 + u_1$
$y_2 \equiv h = \max(0, \mathbf{z}_2\beta_2 + \alpha_2 y_1 + u_2)$
Reduced form: substitute equation 1 into equation 2.
What is different from the previous analysis? Now we are interested in $\alpha_2$
Assumption 5:
(a) $(\mathbf{z}, y_2)$ is always observed, $y_1$ is observed when $y_2 > 0$
(b) $(u_1, u_2)$ is independent of z with a zero-mean bivariate normal distribution
(c) $\mathbf{z}_1$ contains at least one element with a non-zero coefficient that is not in $\mathbf{z}_2$, i.e. we need an IV for the first equation
Assumption (c) is needed to identify $\alpha_2, \beta_2$, whereas $\beta_1$ is always identified
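Estimating the second (labor supply) equation requires new methods, whether or not $u_1$ and $u_2$ are uncorrelated, because $y_1$ is not observed when $y_2 = 0$.
Estimates of $(\alpha_2, \beta_2)$ are easy to obtain after having estimated $\beta_1$.
Procedure 5:
(a) use Procedure 3 to obtain $\hat\beta_1$
(b) obtain $\hat\beta_2$ and $\hat\alpha_2$ from the Tobit $y_{i2} = \max\left(0, \mathbf{z}_{i2}\beta_2 + \alpha_2(\mathbf{z}_{i1}\hat\beta_1) + \text{error}_i\right)$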
6.7 Sample Selection and Attrition in Linear Panel Models
Unbalanced panel: time periods for some persons are missing because of a rotating panel, attrition, or an incidental truncation problem
6.7.1 Fixed Effects Estimation with Unbalanced Panels
Model:
$y_{it} = \mathbf{x}_{it}\beta + c_i + u_{it}$, $t = 1, \dots, T$
where $\mathbf{x}_{it}$ is 1 × K and β is a K × 1 vector.
For a random draw i from the population, let $\mathbf{s}_i \equiv (s_{i1}, s_{i2}, \dots, s_{iT})'$ be the T × 1 vector of selection indicators:
$s_{it} = 1$ if $(\mathbf{x}_{it}, y_{it})$ is observed
Random sample from the population: $\{(\mathbf{x}_i, \mathbf{y}_i, \mathbf{s}_i) : i = 1, 2, \dots, N\}$
Fixed effects estimator (with $\ddot{\mathbf{x}}_{it}$ denoting $\mathbf{x}_{it}$ time-demeaned over the observed periods of unit i):
$\hat\beta = \beta + \left(N^{-1}\sum_{i=1}^N\sum_{t=1}^T s_{it}\ddot{\mathbf{x}}_{it}'\ddot{\mathbf{x}}_{it}\right)^{-1}\left(N^{-1}\sum_{i=1}^N\sum_{t=1}^T s_{it}\ddot{\mathbf{x}}_{it}'u_{it}\right)$
Assumption 6:
(a) $E(u_{it} \mid \mathbf{x}_i, \mathbf{s}_i, c_i) = 0$, $t = 1, 2, \dots, T$
(b) $\sum_{t=1}^T E(s_{it}\ddot{\mathbf{x}}_{it}'\ddot{\mathbf{x}}_{it})$ is nonsingular
(c) $E(\mathbf{u}_i\mathbf{u}_i' \mid \mathbf{s}_i, \mathbf{x}_i, c_i) = \sigma_u^2 I_T$
Under Assumption 6, the FE estimator on the unbalanced panel is consistent and asymptotically normal (T fixed and N large), with
$\widehat{\text{Avar}}(\hat\beta) = \hat\sigma_u^2\left(\sum_{i=1}^N\sum_{t=1}^T s_{it}\ddot{\mathbf{x}}_{it}'\ddot{\mathbf{x}}_{it}\right)^{-1}$
and
$\hat\sigma_u^2 = \left[\sum_{i=1}^N (T_i - 1)\right]^{-1}\sum_{i=1}^N\sum_{t=1}^T s_{it}\hat u_{it}^2 \longrightarrow \sigma_u^2$ as $N \to \infty$,
where $T_i = \sum_{t=1}^T s_{it}$ is the number of observed periods for unit i.
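The role of Assumption 6(a) can be illustrated with a simulated unbalanced panel. In the sketch below (standard library only) the design is an assumption chosen for illustration: one regressor, true slope 2, x correlated with the unit effect c, and selection that depends on (x, c) and independent noise but not on u. Under this design the within estimator on the observed periods should stay close to 2, while pooled OLS on the selected observations is biased by the unit effect.

```python
import random

def fe_slope(panel):
    """Within (fixed effects) slope using only the observed periods of each unit."""
    num = 0.0
    den = 0.0
    for obs in panel:              # obs: list of observed (x, y) pairs of one unit
        t_i = len(obs)
        if t_i < 2:
            continue               # one observed period carries no within variation
        mean_x = sum(x for x, _ in obs) / t_i
        mean_y = sum(y for _, y in obs) / t_i
        num += sum((x - mean_x) * (y - mean_y) for x, y in obs)
        den += sum((x - mean_x) ** 2 for x, _ in obs)
    return num / den

def pooled_slope(panel):
    """Pooled OLS slope (with intercept) over all observed unit-periods."""
    pairs = [p for obs in panel for p in obs]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    den = sum((x - mean_x) ** 2 for x, _ in pairs)
    return num / den

rng = random.Random(7)             # arbitrary seed
panel = []
for _ in range(5000):              # N units, T = 4 periods
    c = rng.gauss(0.0, 1.0)
    obs = []
    for _ in range(4):
        x = c + rng.gauss(0.0, 1.0)            # x is correlated with c
        u = rng.gauss(0.0, 1.0)
        y = 2.0 * x + c + u
        # s_it depends on x, c and independent noise, never on u
        if c + x + rng.gauss(0.0, 1.0) > -0.5:
            obs.append((x, y))
    panel.append(obs)

fe = fe_slope(panel)
pooled = pooled_slope(panel)
print(f"FE on unbalanced panel: {fe:.3f}  (true slope is 2)")
print(f"pooled OLS:             {pooled:.3f}")
```

If the selection rule were changed to depend on y (and therefore on u), Assumption 6(a) would fail and the within estimator would no longer be reliable.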
6.7.2 Testing and Correcting for Sample Selection Bias
Model: $y_{it1} = \mathbf{x}_{it1}\beta_1 + c_{i1} + u_{it1}$, $t = 1, \dots, T$
Selection equation: $s_{it2} = 1[\mathbf{x}_i\psi_{t2} + v_{it2} > 0]$, $v_{it2} \mid \mathbf{x}_i \sim N(0, 1)$, where $\mathbf{x}_i$ contains unity
(note: no fixed effect in the selection equation!)
Under the null of Assumption 6(a), the inverse Mills ratio $\hat\lambda_{it2}$ should not be significant in the equation estimated by fixed effects. A valid test of the null is therefore a t statistic on $\hat\lambda_{it2}$ in the FE estimation on the unbalanced panel. Under Assumption 6(c), the usual t statistic is valid.
Correcting for sample selection: adding $\hat\lambda_{it2}$ to the equation and using FE does not produce consistent estimators (if there is a fixed effect in the selection equation). Chamberlain's approach to panel data models works, but we need some linearity assumptions.
Assumption 7:
(a) the selection equation is given above;
(b) $E(u_{it1} \mid \mathbf{x}_i, v_{it2}) = E(u_{it1} \mid v_{it2}) = \rho_{t1}v_{it2}$, $t = 1, \dots, T$; and
(c) $E(c_{i1} \mid \mathbf{x}_i, v_{it2}) = \mathbf{x}_i\pi_1 + \phi_{t1}v_{it2}$
Under Assumption 7, $E(y_{it1} \mid \mathbf{x}_i, s_{it2} = 1) = \mathbf{x}_{it1}\beta_1 + \mathbf{x}_i\pi_1 + \gamma_{t1}\lambda(\mathbf{x}_i\psi_{t2})$
We can consistently estimate $\beta_1$ as follows:
1) estimate a probit of $s_{it2}$ on $\mathbf{x}_i$ for each t and compute the inverse Mills ratio $\hat\lambda_{it2}$ for all i and t;
2) run the pooled OLS regression, using the selected sample, of $y_{it1}$ on $\mathbf{x}_{it1}, \mathbf{x}_i, \hat\lambda_{it2}, d2_t\hat\lambda_{it2}, \dots, dT_t\hat\lambda_{it2}$ for all $s_{it2} = 1$,
where $d2_t, \dots, dT_t$ are time dummies.
6.7.3 Attrition
Test and correct for attrition in a linear panel data model where attrition is assumed to be an absorbing state.
Assume $(\mathbf{x}_{it}, y_{it})$ is observed for all i when t = 1.
$s_{it} = 1$ if $(\mathbf{x}_{it}, y_{it})$ is observed
To remove the unobserved effect, use first differencing:
$\Delta y_{it} = \Delta\mathbf{x}_{it}\beta + \Delta u_{it}$, $t = 2, \dots, T$
Selection equation for t ≥ 2:
$s_{it} = 1[\mathbf{w}_{it}\delta_t + v_{it} > 0]$, $v_{it} \mid \{\mathbf{w}_{it}, s_{i,t-1} = 1\} \sim N(0, 1)$
Assume that the $\mathbf{x}_{it}$ are strictly exogenous and that selection does not depend on $\Delta\mathbf{x}_{it}$ once $\mathbf{w}_{it}$ is controlled for:
$E(\Delta u_{it} \mid \Delta\mathbf{x}_{it}, \mathbf{w}_{it}, v_{it}, s_{i,t-1} = 1) = E(\Delta u_{it} \mid v_{it}) = \rho_t v_{it}$
Then $E(\Delta y_{it} \mid \Delta\mathbf{x}_{it}, \mathbf{w}_{it}, s_{it} = 1) = \Delta\mathbf{x}_{it}\beta + \rho_t\lambda(\mathbf{w}_{it}\delta_t)$, $t = 2, \dots, T$.
The pooled OLS regression, using the selected sample, of $\Delta y_{it}$ on $\Delta\mathbf{x}_{it}, d2_t\hat\lambda_{it}, \dots, dT_t\hat\lambda_{it}$ is consistent for β and the $\rho_t$.
Relaxing exogeneity of the x's: $\mathbf{z}_{it}$ is a vector of variables that are redundant in the selection equation and exogenous.
In this case, we can estimate
$\Delta y_{it} = \Delta\mathbf{x}_{it}\beta + \rho_2 d2_t\hat\lambda_{it} + \dots + \rho_T dT_t\hat\lambda_{it} + \text{error}_{it}$
by IV using instruments $(\mathbf{z}_{it}, d2_t\hat\lambda_{it}, \dots, dT_t\hat\lambda_{it})$ in the selected sample.
Estimating a linear panel data model under possible nonrandom attrition: $(\mathbf{x}_{it}, y_{it})$ is observed only if $s_{it} = 1$. Under the assumption called selection on observables,
$P(s_{it} = 1 \mid y_{it}, \mathbf{x}_{it}, \mathbf{z}_{i1}) = P(s_{it} = 1 \mid \mathbf{z}_{i1})$
Estimation method using Inverse Probability Weighting (IPW), in 2 steps:
1. for each t, estimate a probit or logit of $s_{it}$ on $\mathbf{z}_{i1}$ → get the fitted values $\hat p_{it}$
2. weight the objective function by $1/\hat p_{it}$
The argument for IPW is that the probability limit of the weighted objective function is identical to that of the unweighted function if we had no attrition problem. Using this argument, Wooldridge (2000) shows that IPW produces a consistent, $\sqrt{N}$-asymptotically normal estimator.
For the case where attrition is an absorbing state, the following probabilities can be used in the IPW procedure:
$\hat p_{it} \equiv \hat\pi_{i2}\hat\pi_{i3}\cdots\hat\pi_{it}$, where $\pi_{it} \equiv P(s_{it} = 1 \mid \mathbf{z}_{it}, s_{i,t-1} = 1)$
under the key assumption that
$P(s_{it} = 1 \mid \mathbf{v}_{i1}, \dots, \mathbf{v}_{iT}, s_{i,t-1} = 1) = P(s_{it} = 1 \mid \mathbf{z}_{it}, s_{i,t-1} = 1)$, where $\mathbf{v}_{it} = (\mathbf{w}_{it}, \mathbf{z}_{it})$
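A stripped-down, single-period illustration of the IPW idea is sketched below, using only the standard library. Everything in it is an assumption made for illustration: one binary observable z, retention probabilities 0.9 and 0.3, the objective function (y − θ)², and retention frequencies within each z cell as the first-step fitted values (these coincide with the fitted values of a logit or probit that is saturated in z).

```python
import random

rng = random.Random(11)            # arbitrary seed
n = 20000
records = []                       # (z, y, s) for every unit in the original sample
for _ in range(n):
    z = 1 if rng.random() < 0.5 else 0
    y = 1.0 + 2.0 * z + rng.gauss(0.0, 1.0)   # population mean of y is 2
    keep_prob = 0.3 if z == 1 else 0.9        # selection on observables: depends on z only
    s = 1 if rng.random() < keep_prob else 0
    records.append((z, y, s))

# Step 1: fitted retention probabilities p_hat, one per cell of z
p_hat = {}
for cell in (0, 1):
    kept_flags = [s for z, _, s in records if z == cell]
    p_hat[cell] = sum(kept_flags) / len(kept_flags)

# Step 2: weight the objective function sum_i s_i (y_i - theta)^2 by 1 / p_hat.
# Its minimizer is the weighted mean of y over the retained observations.
weight_sum = sum(1.0 / p_hat[z] for z, _, s in records if s)
ipw_mean = sum(y / p_hat[z] for z, y, s in records if s) / weight_sum

kept_y = [y for _, y, s in records if s]
naive_mean = sum(kept_y) / len(kept_y)

print(f"naive mean of retained sample: {naive_mean:.3f}")
print(f"IPW mean:                      {ipw_mean:.3f}  (population mean is 2)")
```

The naive mean is pulled toward the group that is retained more often; the weights undo this by counting each retained unit as 1/p̂ units.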
6.8 Stratified Sampling
6.8.1 Standard Stratified Sampling and Variable Probability Sampling
The 2 most common kinds of stratification used in the social sciences:
- standard stratified sampling (SS sampling) and
- variable probability sampling (VP sampling).
SS Sampling
The population is partitioned into J groups $W_1, W_2, \dots, W_J$, assumed to be nonoverlapping and exhaustive.
Let w be a random vector representing the population of interest.
For j = 1, ..., J, draw a random sample of size $N_j$ from stratum j. For each j, denote this random sample by $\{\mathbf{w}_{ij} : i = 1, 2, \dots, N_j\}$
The strata sample sizes are nonrandom, thus the total sample size N is also nonrandom.
Observations within a stratum are iid; across strata they are not.
VP Sampling: repeat the following steps N times
1. Draw an observation $\mathbf{w}_i$ at random from the population.
2. If $\mathbf{w}_i$ is in stratum j, toss a coin with probability $p_j$ of turning up heads. Let $h_{ij} = 1$ if the coin turns up heads and 0 otherwise.
3. Keep observation i if $h_{ij} = 1$; otherwise omit it from the sample.
6.8.2 Weighted Estimators to Account for Stratification
With VP sampling: define a set of binary variables $r_{ij}$ that indicate whether draw $\mathbf{w}_i$ is kept in the sample and, if so, which stratum it falls into.
The weighted M-estimator:
$\hat\theta_w = \arg\min_{\theta\in\Theta}\sum_{i=1}^N\sum_{j=1}^J p_j^{-1} r_{ij}\,q(\mathbf{w}_i, \theta)$
Wooldridge (1999) shows that under the same assumptions as Theorem 2 in chapter 1, the weighted M-estimator is consistent. Asymptotic normality follows under the same regularity conditions as in chapter 1.
With SS sampling: weights are defined by $Q_j = P(\mathbf{w} \in W_j)$ (the population frequency for stratum j). Using a random sample obtained from each stratum, we can obtain a consistent estimator as in VP sampling with the weights
$Q_{j_i}/H_{j_i}$, with $H_j \equiv N_j/N$, instead of $p_j^{-1}$
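The three VP sampling steps and the weighted M-estimator can be sketched together in a few lines of standard-library Python. The design is an illustrative assumption: w ~ N(1, 1), two strata split at w = 1, retention probabilities p1 = 0.2 and p2 = 1.0, and q(w, θ) = (w − θ)², for which the weighted M-estimator is simply a weighted mean.

```python
import random

rng = random.Random(3)             # arbitrary seed
keep_prob = {1: 0.2, 2: 1.0}       # p_j: retention probability in stratum j
kept = []                          # retained draws as (w, stratum)

for _ in range(50000):             # repeat the VP sampling steps N times
    w = rng.gauss(1.0, 1.0)        # 1. draw an observation from the population
    j = 1 if w <= 1.0 else 2       #    stratum is defined by the value of w itself
    heads = rng.random() < keep_prob[j]   # 2. toss a coin with P(heads) = p_j
    if heads:                      # 3. keep the draw only if the coin shows heads
        kept.append((w, j))

# Unweighted estimator: minimizes sum r_ij (w_i - theta)^2, i.e. the plain mean
unweighted = sum(w for w, _ in kept) / len(kept)

# Weighted M-estimator: minimizes sum p_j^{-1} r_ij (w_i - theta)^2,
# which for this q is the mean weighted by 1 / p_j
weights = [1.0 / keep_prob[j] for _, j in kept]
weighted = sum(wt * w for wt, (w, _) in zip(weights, kept)) / sum(weights)

print(f"unweighted mean: {unweighted:.3f}")
print(f"weighted mean:   {weighted:.3f}  (population mean is 1)")
```

Here the strata are defined by the outcome itself, so weighting matters; the next section covers the case where stratification is based on exogenous variables.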
6.8.3 Stratification Based on Exogenous Variables (does not matter!)
w is partitioned as (x, y), where x is exogenous in the sense that
$\theta_0 = \arg\min_{\theta\in\Theta} E[q(\mathbf{w}, \theta) \mid \mathbf{x}]$
With VP sampling: the unweighted M-estimator on the stratified sample is
$\hat\theta_u = \arg\min_{\theta\in\Theta}\sum_{i=1}^N\sum_{j=1}^J h_{ij}s_{ij}\,q(\mathbf{w}_i, \theta)$
Wooldridge (1999) shows that, when stratification is based on x, the unweighted estimator is more efficient than the weighted estimator under the key assumption
$E[\nabla_\theta q(\mathbf{w}, \theta_0)'\nabla_\theta q(\mathbf{w}, \theta_0) \mid \mathbf{x}] = \sigma_0^2\,E[\nabla_\theta^2 q(\mathbf{w}, \theta_0) \mid \mathbf{x}]$
(a type of information matrix equality)
With SS sampling: similar conclusions are obtained.
One useful fact is that when stratification is based on x, we need not compute within-strata variation in the estimated score to obtain consistent estimators for parameters that do not vary in the population.