
Sample Selection, Heteroscedasticity, and Quantile Regression

Blaise Melly, Martin Huber

Preliminary
First draft: December 2006, last changes: February 2008

Abstract:

Independence of the error term and the covariates is a crucial assumption in virtually all sample selection models. If this assumption is not satisfied, for instance due to heteroscedasticity, both mean and quantile regression estimators are inconsistent. If independence does hold, all quantile functions and the mean function are parallel, which naturally limits the usefulness of quantile estimators. However, quantile estimators can be used to build tests for the independence condition because they are consistent under the null hypothesis. Therefore, we propose powerful tests based on the whole conditional quantile regression process. If the independence assumption is violated, quantile functions are not point identified, but we show that it is still possible to bound the coefficients of interest. Our identified set shrinks to a single point either if independence holds or if some observations are selected and observed with probability one. Therefore, our model generalizes simultaneously the traditional sample selection models and the identification at infinity strategy.

Keywords: sample selection, quantile regression, heteroscedasticity, test, bootstrap, bounds

JEL classification: C12, C13, C14, C21

We have benefited from comments by Michael Lechner and seminar participants at the University of St. Gallen. Addresses for correspondence: Blaise Melly, MIT Department of Economics, 50 Memorial Drive, E52-251d, Cambridge, MA 02142, USA, [email protected], www.siaw.unisg.ch/lechner/melly; Martin Huber, SIAW, University of St. Gallen, Varnbüelstrasse 14, 9000 St. Gallen, Switzerland, [email protected].

1 Introduction

Selection bias arises when the outcome of interest is only observable for a subsample of individuals conditional on selection and when selection is not random. A prominent example in labor


economics consists of the determinants of wages and labor supply behavior of females. Individuals are assumed to offer a positive labor supply only if their potential wage exceeds their reservation wage, such that a selection bias arises if we try to estimate the wage offer function in the working subsample. The ability to consistently estimate econometric models in the presence of nonrandom sample selection is one of the most important innovations in microeconometrics, as illustrated by the Nobel Prize received by James Heckman.

Gronau (1974) and Heckman (1974 and 1979) have addressed the selectivity bias and proposed fully parametric estimators. Naturally, this approach leads to inconsistent results if the distribution of the error term is misspecified. Therefore, Cosslett (1991), Gallant & Nychka (1987), Powell (1987), Ahn & Powell (1993), and Newey (1991) have proposed semiparametric estimators for the sample selection model. More recently, Das, Newey & Vella (2003) have proposed a fully nonparametric estimator for this model. While these papers have weakened the parametric and distributional assumptions made originally, they all assume independence between the error term and the regressors. This assumption is crucial in parametric, semiparametric and nonparametric models, as conditioning on the selection probability is not sufficient to correct for the selection bias if it is violated.

However, dependence in general and heteroscedasticity in particular is a ubiquitous phenomenon in the fields where sample selection models have been used. As suggested by Mincer (1973) in his famous human capital earnings model, residual wage dispersion should increase with experience and education. In line with this finding, the large majority of the applications of quantile regression in the empirical literature finds different coefficients on different parts of the conditional distribution. Therefore, the independence assumption cannot be taken for granted in most economic applications. Donald (1995) has alleviated this assumption and proposed a two-step estimator that allows for conditional heteroscedasticity but requires the error terms to be bivariate normally distributed.¹ Since distributional assumptions are always difficult to motivate and it is not clear why the regressors should affect only the first two moments of the conditional distribution, we suppress the normality assumption in this paper.

In the absence of selection, Koenker & Bassett (1978) proposed and derived the statistical

¹ Chen & Khan (2003) have proposed a semiparametric estimator allowing for heteroscedasticity. However, it appears that the proper identification of their model requires a variable affecting the variance but not the mean of the dependent variable conditionally on the regressors. This additional exclusion restriction renders this model unattractive.


properties of a parametric (linear) estimator for conditional quantile models. Due to its ability to capture heterogeneous effects, its theoretical properties have been studied extensively and it has been used in many empirical studies; see, for example, Powell (1986), Gutenbrunner & Jurečková (1992), Buchinsky (1994), Koenker & Xiao (2002), and Angrist, Chernozhukov & Fernández-Val (2006). Chaudhuri (1991) analyzed nonparametric estimation of conditional quantiles. Buchinsky (1998b), Koenker & Hallock (2001), and Koenker (2005) provide a comprehensive discussion of the quantile regression model and its recent developments.

Buchinsky (1998 and 2001) was the first to consider the semiparametric sample selection model for conditional quantiles. He extends the series estimator of Newey (1991) for the mean to the estimation of quantiles. The problem with this approach is that the independence assumption is required to obtain the nice partially linear representation for the mean and the quantiles. Implicitly, Buchinsky (1998 and 2001) assumes independence between the error term and the regressors conditional on the selection probability. One implication of this assumption is that all quantile regression curves are parallel. Naturally, this restricts the usefulness of the estimator because it implies that all quantile regression coefficients are identical and equal to the mean slope coefficients. Thus, this approach does not allow one to estimate the effects of the regressors on the conditional distribution of the dependent variable.

However, the Buchinsky (1998a) estimator can be very useful. The first motivation for quantile regression was not to estimate the effects of covariates on the conditional distribution (actually, Koenker and Bassett assume independence in their seminal paper) but the robustness of the estimates in the presence of non-Gaussian errors. A similar result applies to the sample selection model, and we show that very significant efficiency gains can be achieved when the distribution of the error term has fat tails. The second motivation for quantile regression was to provide robust and efficient tests for the presence of heteroscedasticity, as suggested by Koenker & Bassett (1982). Testing the independence assumption is even more acute in the presence of sample selection. As explained above, both mean and quantile estimators are inconsistent when this assumption is violated. It is therefore surprising that such a test has not been proposed; we do so in section 3. Under the null hypothesis of independence, the quantile regression estimator proposed by Buchinsky (1998a) consistently estimates the coefficients, which are the same for all quantiles. When the independence assumption is violated, the estimates are not consistent but the slope coefficients differ from one quantile to another, which gives power to the test.


As suggested by Koenker & Bassett (1982), we could consider a finite number of quantiles and test whether the regression coefficients are the same at all of these quantiles. However, a more powerful test statistic can be built using the whole conditional quantile process. We therefore suggest a test procedure similar to that proposed by Chernozhukov & Fernández-Val (2005). The critical values for this test are obtained by subsampling the empirical quantile regression processes. Since the computation of the estimates is quite demanding, we also apply the suggestion of Chernozhukov & Hansen (2006) consisting in resampling the score instead of re-computing the whole process. Our Monte Carlo simulations show that both the size and power of our Kolmogorov-Smirnov and Cramer-Von-Mises-Smirnov statistics are very satisfactory.

This paper would be incomplete if we did not offer a solution when the independence assumption is rejected, which we expect to be the case in a wide range of applications. In this case, it appears that point identification of the mean and quantile coefficients is impossible. However, in the spirit of the work of Manski (1989 and 1994), we show that it is still possible to bound the quantile coefficients even in the absence of bounded support. Our bounds are more informative than the worst-case bounds of Manski because we maintain the linear functional form and we make an independence assumption for the ranks of the distribution function. This last assumption relaxes the traditional independence assumption since it allows for heteroscedasticity and all types of dependence between the covariates and the potential wage distribution.

A very appealing feature of our bounds is that the identified interval collapses to a single point in two special cases: when there is independence between the error terms and the covariates, and when there are observations whose probability of selection is close to one. The first case is obvious since we are back to the classical sample selection model, but it is important since it implies that the upper and lower bounds will be quite close when there is only a small amount of dependence. The second case is an example of Chamberlain's (1986) "identification at infinity". This approach has been used by Heckman (1990) and Andrews & Schafgans (1998) to identify the constant in a traditional sample selection model. In the case of dependence between the error term and the covariates, it can even be used to identify the slope coefficients. Our bounds also generalize this identification strategy to the case where some observations are observed with a high probability that remains below one. In this case, we may get a short identified interval for the coefficients of the quantile regression even when they are not point identified.

Two short applications illustrate our results. First we apply our tests to the small textbook


data set of Mroz (1987) and can reject the independence assumption at the 10% significance level. Second, using the data set of Mulligan & Rubinstein (2005), we first reject the null hypothesis of independence at the 0.1% level. We then bound the coefficients of the model under our weaker set of assumptions.

The remainder of this paper is organized as follows. In section 2 we describe the sample selection model and discuss the role of the independence assumption. In section 3 we outline the test procedure. Section 4 is devoted to the bounds on the coefficients when the independence assumption is rejected. In section 5, Monte Carlo simulations show the possible efficiency gains of quantile regression in the sample selection model as well as the power and size properties of our tests. Section 6 revisits two typical applications of sample selection models. Section 7 concludes.

2 The Classical Sample Selection Model

The parametric two-step estimator to control for sample selection bias in economic applications was first proposed by Heckman (1976 and 1979) and is known as the type II tobit or heckit estimator. Newey (1991) suggested a semiparametric two-step estimator based on a series expansion of the inverse Mill's ratio. Buchinsky (1998a) suggested an extension of this model to the estimation of conditional quantiles. As in these papers, we assume that the potential outcome is linear in X, a vector of covariates²:

    Y*_i = c(τ) + X_i'β(τ) + ε_i(τ).    (1)

The error term is assumed to satisfy the τth quantile restriction, Q_τ(ε(τ)|X) = 0, such that β(τ) could be estimated consistently by traditional quantile regression if there was no sample selection problem. However, Y* is latent and only observed conditional on D_i = 1. Thus, the observed outcome Y is defined as

    Y_i = c(τ) + X_i'β(τ) + ε_i(τ)  if D_i = 1, and Y_i is not observed otherwise.

D is an indicator function that depends on Z, a superset of X.³ Identification of β(τ) requires identification of Pr(D = 1|Z). The rest of the paper does not depend on how Pr(D = 1|Z) is identified, but for completeness we make the following assumption:

    D_i = 1{Z_i'γ + u_i ≥ 0}.    (2)

This is a parametric restriction for the sample selection equation. We implement our test statistic by using the estimator suggested by Klein & Spady (1993) to estimate γ. Therefore, we use their assumptions; in particular, we assume that u ⊥ Z | Z'γ. This independence assumption conditionally on the index can be relaxed if Pr(D = 1|Z) is estimated nonparametrically, as proposed in Ahn & Powell (1993).

² It is important to note that all the insights of this paper (inconsistency in the presence of dependence, the possibility of testing the independence assumption, the bounds) are valid for a nonparametric sample selection model. We consider the classical parametric model because it is more often applied and for simplicity.

³ For identification, Z has to include at least one continuous variable which is not in X and has a non-zero coefficient in the selection equation.

The conditional quantiles of the observed outcome can be formulated as

    Q_τ(Y|X, D = 1) = c(τ) + X_i'β(τ) + Q_τ(ε(τ)|X, D = 1).

If selection into observed outcomes was random, then Q_τ(ε(τ)|X, D = 1) = 0 and the outcome equation could be estimated consistently by quantile regression. However, in general Q_τ(ε(τ)|X, D = 1) ≠ 0. For identification, Buchinsky (1998a) assumes

Assumption 1: (u, ε) has a continuous density;

Assumption 2: f_{u,ε}(·|Z) = f_{u,ε}(·|Z'γ).

Assumption 2 implies that Q_τ(ε(τ)|X, D = 1) depends on X only through the linear index Z'γ⁴ and provides us with the following representation:

    Q_τ(Y|X, D = 1) = c(τ) + X_i'β(τ) + h_τ(Z_i'γ).

h_τ(Z_i'γ) = Q_τ(ε(τ)|Z_i'γ, D = 1) is an unknown nonlinear function, and the residual v(τ) satisfies the quantile restriction Q_τ(v(τ)|X, h_τ(Z'γ), D = 1) = 0 by construction. This representation shows that β(τ) can be estimated by a quantile regression of Y on X and on a series approximation of Z'γ, as long as there is an excluded element in Z with a corresponding nonzero coefficient in γ.

The problem is that Assumption 2 implies that the quantile slope coefficients β(τ) are constant across the distribution and equal to the mean slope coefficients β. This becomes obvious when remembering that X ⊆ Z: therefore, Assumption 2 implies that ε ⊥ X | Z'γ. Conditional on h_τ(Z'γ), the distribution of (u, ε) does not depend on the regressors' values X and is homoscedastic. Thus, quantile regression does not provide more information on the effect of X on Y than mean regression. All conditional quantile functions are parallel and only the constant changes from one conditional quantile to another.⁵

⁴ More generally, we can allow for f_{u,ε}(·|Z) = f_{u,ε}(·|Pr(D = 1|Z)). This weakens the first-step independence assumption but not the second-step one.

Let us now assume that ε ⊥ X | Z'γ does not hold, such that Assumption 2 is violated. In this case ε and v are generally heteroscedastic and dependent on X, and f_{u,ε} is not independent of Z even when conditioning on Z'γ. Coefficients differ across quantiles and the regression slopes X_i'β_τ are not parallel for various τ. Including h_τ(Z_i'γ) in the outcome equation will generally not allow us to estimate β(τ) and β consistently. This is due to the fact that the selection rule is likely to select either low or high values of latent outcomes into the subsample of observed outcomes, due to the covariance of (u, ε). This is innocuous when the errors are homoscedastic conditional on Z'γ, as selection shifts only the location of the (parallel) quantile and mean regression curves, whereas their gradient remains unchanged. However, in the presence of heteroscedastic errors, positive or negative sample selection generally causes β(τ) and β to be systematically over- or underestimated.

A graphical illustration shall elucidate the intuition. Figure 1 displays 500 simulated realizations of (X, Y*) under homoscedastic (1a) and heteroscedastic (1b) errors conditional on Z'γ. The true median regression curve (solid line) in 1a and 1b is horizontal, and thus the median slope coefficient β(0.5) on X given Z'γ is zero. Sample selection in 1a leaves the regression curve (dashed line) in the subsample of observed (X, Y), i.e. crosses with framing, unchanged. Merely its location is shifted upward, as realizations with small Y* are more likely not to be observed (crosses without framing). However, when errors are heteroscedastic as in 1b, the disproportionate non-observability of realizations with low Y* causes the slope coefficient among observed realizations to diverge from the true value. In the case considered, the upward slanted regression curve indicates a positive relationship between the regressor and the outcome, even though the true coefficient is zero.
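This mechanism can be mimicked in a few lines of code. The sketch below is purely illustrative: the data-generating process behind Figure 1 is not spelled out in the text, so the distributions, the selection rule and the use of statsmodels' QuantReg are our own assumptions.

```python
# Illustrative only: a DGP in the spirit of Figure 1 (flat true median regression,
# selection removing low Y*), not the one actually used to produce the figure.
import numpy as np
from statsmodels.regression.quantile_regression import QuantReg

rng = np.random.default_rng(0)
n = 500
X = rng.uniform(0, 1, n)
eps = rng.standard_normal(n)
for label, Y_star in [("homoscedastic", eps), ("heteroscedastic", (1 + 2 * X) * eps)]:
    D = Y_star + rng.standard_normal(n) > 0            # low Y* is less likely to be observed
    exog = np.column_stack([np.ones(D.sum()), X[D]])   # constant and regressor, selected sample
    slope = QuantReg(Y_star[D], exog).fit(q=0.5).params[1]
    print(label, round(slope, 2))                      # roughly 0 in panel 1a, clearly positive in 1b
```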

As one cannot observe whether the selection process more likely "cuts out" high or low Y*, nonconstant estimates of quantile slope coefficients bear no economic interpretation. They merely tell us that at least one basic condition necessary in sample selection models is violated and that neither quantile nor mean coefficient estimators are consistent.⁶

⁵ The constant is not identified without further assumptions. Only an identification at infinity argument can solve this identification problem.

⁶ Only if selection was random would the estimators be consistent, but then h_τ(Z'γ) = 0 and the basic intuition for using sample selection models breaks down.

However, this obvious shortcoming also bears the major attraction of the two-step quantile estimator. In fact, it can be used as a test for the independence assumption between errors and regressors by testing the null hypothesis H_0: β_τ = β* for all τ ∈ [0, 1], where β* is some unknown constant coefficient. If the null hypothesis holds, so does Assumption 2. If coefficients differ significantly across quantiles, neither H_0 nor Assumption 2 hold, and both the quantile and mean estimators yield inconsistent estimates.

Figure 1

Regression slopes under homoscedasticity (1a) and heteroscedasticity (1b)

3 Test Procedure

Our test procedure can be sketched as follows. We first estimate the selection equation using the Klein & Spady (1993) estimator. We then estimate the conditional quantile regression process by approximating the bias by a series expansion of the inverse Mill's ratio, as suggested by Newey (1991) for the mean and Buchinsky (1998a and 2001) for the quantiles. We test the independence assumption by testing whether the quantile regression slopes are the same over the whole distribution. The critical values of the test statistic are obtained by resampling, as presented in Chernozhukov & Fernández-Val (2005). When resampling is computationally too costly, we use score resampling as suggested by Chernozhukov & Hansen (2006).

In detail, the semiparametric discrete choice estimator suggested in Klein & Spady (1993) is used to estimate the selection equation (2). Formally⁷:

    γ̂ ≡ arg max_γ Σ_{i=1}^{n} { (1 − D_i) log[1 − Ê(D_i|Z_i, γ)] + D_i log[Ê(D_i|Z_i, γ)] },    (3)

where

    Ê(D_i|Z_i, γ) = Σ_{j≠i} D_j κ((Z_i'γ − Z_j'γ)/b_n) / Σ_{j≠i} κ((Z_i'γ − Z_j'γ)/b_n),    (4)

where b_n is a bandwidth dependent on the sample size n and κ(·) is a kernel function. The optimal bandwidth b_n^opt is determined by the generalized cross-validation criterion (GCV), as discussed in Craven & Wahba (1979), Golub, Heath & Wahba (1979) and Li (1985), among others. This estimator attains the semiparametric efficiency bound and is the most efficient among those semiparametric estimators that do not put any restrictions on the distribution of the error term. Furthermore, heteroscedasticity of unknown form is allowed as long as it depends on the regressors only via the index. Klein and Spady's Monte Carlo simulations indicate that efficiency losses are only modest compared to probit estimation when the error term is standard normal, while the estimator is considerably more efficient in finite samples when the errors are non-Gaussian.
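To fix ideas, the following sketch shows how the estimator defined by (3) and (4) could be implemented. It is a minimal illustration under our own simplifying choices (Gaussian kernel, scale normalization on the first coefficient, Nelder-Mead optimization), not the exact routine used in the paper.

```python
# Hypothetical sketch of the Klein & Spady (1993) first step: a leave-one-out kernel
# estimate of E(D|Z'gamma) plugged into the binary quasi-likelihood of equation (3).
import numpy as np
from scipy.optimize import minimize


def loo_index_regression(index, D, b_n):
    """Leave-one-out Nadaraya-Watson estimate of E(D|index) with a Gaussian kernel."""
    diff = (index[:, None] - index[None, :]) / b_n     # (n, n) matrix of index differences
    K = np.exp(-0.5 * diff ** 2)                        # Gaussian kernel weights
    np.fill_diagonal(K, 0.0)                            # leave observation i out
    return (K @ D) / np.clip(K.sum(axis=1), 1e-12, None)


def klein_spady(D, Z, b_n=0.3, trim=None):
    """Maximize the quasi-likelihood in (3); the trimming term follows the Gerfin (1996) variant."""
    n, k = Z.shape
    eps = 0.5 / n if trim is None else trim

    def neg_loglik(theta):
        gamma = np.r_[1.0, theta]                       # scale normalization: first coefficient set to 1
        p = loo_index_regression(Z @ gamma, D, b_n)
        return -np.sum((1 - D) * np.log(1 + eps - p) + D * np.log(p + eps))

    res = minimize(neg_loglik, x0=np.zeros(k - 1), method="Nelder-Mead")
    return np.r_[1.0, res.x]
```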

In a second step, the function h(Z'γ) is approximated by a power series expansion. The exact form of the approximation is asymptotically irrelevant. As suggested by Buchinsky (1998), we use a power series expansion of the inverse Mill's ratio of the normalized estimated index. Thus, the first-order approximation will be sufficient if the error term is normally distributed. In any case, the estimator is consistent since the order of the approximation increases with the sample size. The coefficient estimates β̂(τ) and δ̂(τ) are obtained by solving the following minimization problem:

    β̂(τ), δ̂(τ) = arg min_{β,δ} (1/n) Σ_i ρ_τ( Y_i − X_i'β − λ_J(Z_i'γ̂)'δ ),    (5)

where ρ_τ(a) = a(τ − 1(a ≤ 0)) is the check function suggested by Koenker & Bassett (1978) and λ_J(Z_i'γ̂) is a polynomial vector in the inverse Mill's ratio, λ_J(Z_i'γ̂) = (1, λ(Z_i'γ̂), λ(Z_i'γ̂)², ..., λ(Z_i'γ̂)^J). Again, GCV is used to determine the optimal maximum order J.

⁷ More precisely, we apply the estimator used in Gerfin (1996), where the trimming term 0.5/n is added to equation (3) such that γ̂ ≡ arg max_γ Σ_{i=1}^{n} { (1 − D_i) log[1 + 0.5/n − Ê(D_i|Z_i, γ)] + D_i log[Ê(D_i|Z_i, γ) + 0.5/n] }.
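As a sketch of how the minimization in (5) could be carried out in practice, the snippet below runs a quantile regression of Y on X and a polynomial in the inverse Mill's ratio of the estimated index. The helper names and the use of statsmodels' QuantReg are illustrative choices, not part of the paper.

```python
# Minimal sketch of the second step in equation (5): the constant enters through the
# zeroth power of the inverse Mill's ratio, as in the polynomial vector lambda_J.
import numpy as np
from scipy.stats import norm
from statsmodels.regression.quantile_regression import QuantReg


def mills_polynomial(index, J):
    """(1, lambda, lambda^2, ..., lambda^J) with lambda the inverse Mill's ratio."""
    lam = norm.pdf(index) / np.clip(norm.cdf(index), 1e-12, None)
    return np.column_stack([lam ** j for j in range(J + 1)])


def selection_quantile_regression(Y, X, index, tau, J=2):
    """Estimate beta(tau) on the selected sample (D = 1), correcting via lambda_J(Z'gamma_hat)."""
    regressors = np.column_stack([X, mills_polynomial(index, J)])
    fit = QuantReg(Y, regressors).fit(q=tau)
    return fit.params[: X.shape[1]]                     # slope coefficients on X
```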


As discussed in section 2, one would like to test H_0: β(τ) = β* for all τ ∈ (0, 1), where β* is some unknown constant coefficient. This can be done by defining a grid of q equidistant quantiles between zero and one, τ_{1:q} ≡ T ⊂ (0, 1), and considering the general null hypothesis

    H_0: β(τ) = β*, τ ∈ T.    (6)

We estimate β* by the median regression coefficient vector β̂(0.5). Alternatively, we could use the mean estimate, but this would require the existence of at least the first two moments of Y given X. We test the null hypothesis using the Kolmogorov-Smirnov (KS) and the Cramer-Von-Mises-Smirnov (CMS) statistics on the empirical inference process β̂(τ) − β̂(0.5):

    T_n^KS = sup_{τ∈T} √n ||β̂(τ) − β̂(0.5)||_{Λ̂_τ}   and   T_n^CMS = n ∫_T ||β̂(τ) − β̂(0.5)||²_{Λ̂_τ} dτ,    (7)

where ||a||_{Λ̂_τ} denotes √(a'Λ̂_τ a) and Λ̂_τ is a positive weighting matrix such that Λ̂_τ = Λ_τ + o_p(1), uniformly in τ. Λ_τ is positive definite, continuous and symmetric, again uniformly in τ.
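For concreteness, a minimal sketch of how T_n^KS and T_n^CMS in (7) might be computed on a finite grid of quantiles is given below, taking the weighting matrix to be the identity (as is done below in our simulations). The array names are assumptions of the sketch.

```python
# Minimal sketch: KS and CMS statistics of the inference process on a quantile grid.
# `beta_hat` is assumed to be an array of shape (len(taus), k) with the estimated slope
# coefficients at each quantile in `taus`, and `beta_med` the median-regression slopes.
import numpy as np


def ks_cms_statistics(beta_hat, beta_med, taus, n):
    diff = beta_hat - beta_med                          # empirical inference process
    norms_sq = np.sum(diff ** 2, axis=1)                # ||.||^2 under the identity weighting matrix
    t_ks = np.sqrt(n) * np.sqrt(norms_sq.max())
    t_cms = n * np.sum(norms_sq[:-1] * np.diff(taus))   # Riemann approximation of the integral over T
    return t_ks, t_cms
```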

We simply use the identity matrix in our simulations and applications. T_n^KS and T_n^CMS per se are obviously not very useful since we do not know their asymptotic distribution. However, Chernozhukov & Fernández-Val (2005) show that asymptotically valid critical values can be obtained by bootstrapping the recentered test statistic. To this end, B subsamples of block size m are drawn out of the original sample with replacement to compute the inference process

    β̂_{m,i}(τ) − β̂_{m,i}(0.5),    (8)

where 1 ≤ i ≤ B and β̂_{m,i}(τ) are the quantile slope coefficient estimates for draw i and block size m. The corresponding KS and CMS test statistics of the recentered bootstrapped process are

    T_{n,m,i}^KS = sup_{τ∈T} √m ||β̂_{m,i}(τ) − β̂_{m,i}(0.5) − (β̂(τ) − β̂(0.5))||_{Λ̂_τ}   and    (9)
    T_{n,m,i}^CMS = m ∫_T ||β̂_{m,i}(τ) − β̂_{m,i}(0.5) − (β̂(τ) − β̂(0.5))||²_{Λ̂_τ} dτ.

The distribution-free p-values for the test statistics are obtained by simply estimating the probability Pr[T(β̂(τ) − β̂(0.5) − (β(τ) − β*)) > T_n] by (1/B) Σ_{i=1}^{B} I{T_{n,m,i} > T_n}, where Pr[·] is a probability measure and I{·} is the indicator function. However, the repeated computation of coefficient estimates for each resampling step can get quite costly, especially in large samples. For this reason, we follow Chernozhukov & Fernández-Val (2005) and use score resampling based on


the linear approximations of the empirical inference processes instead, which is considerably less burdensome. The linear representation for the inference process is given by

    √n ( β̂(τ) − β̂(0.5) − (β(τ) − β*) ) = −(1/√n) Σ_{i=1}^{n} s_i(τ) + o_p(1).    (10)

Again, B subsamples of estimated scores are drawn. Let ω_i denote a specific subsample of block size m, with 1 ≤ i ≤ B. The estimated inference process is defined as (1/m) Σ_{j∈ω_i} ŝ_j(τ). The KS and CMS statistics are then

    T_{n,m,i}^KS ≡ sup_{τ∈T} √m ||(1/m) Σ_{j∈ω_i} ŝ_j(τ)||_{Λ̂_τ}   and   T_{n,m,i}^CMS ≡ m ∫_T ||(1/m) Σ_{j∈ω_i} ŝ_j(τ)||²_{Λ̂_τ} dτ.    (11)

The computation of the score function in the quantile sample selection model framework is presented in the appendix of Buchinsky (1998a), using the Ichimura (1993) estimator for the first-step estimation of the selection probability. In the appendix, we present a slightly different version, which is adapted to the estimator suggested by Klein & Spady (1993).
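The resampling step in (8) and (9) can be summarized schematically as follows. Here `estimate_process` is a hypothetical stand-in for a routine returning the quantile-regression slope process of this section for a given sample, and only the KS form is shown for brevity.

```python
# Schematic only: empirical p-value of the KS statistic via the recentered resampled
# process of equations (8)-(9). `data` is assumed to be a numpy array of observations
# that `estimate_process(data, taus)` maps to a (len(taus), k) array of slope coefficients.
import numpy as np


def subsampling_pvalue_ks(data, taus, estimate_process, B=250, m=None, seed=0):
    rng = np.random.default_rng(seed)
    n = len(data)
    m = n if m is None else m
    med = np.argmin(np.abs(np.asarray(taus) - 0.5))     # position of the median quantile in the grid
    beta = estimate_process(data, taus)
    process = beta - beta[med]                           # beta_hat(tau) - beta_hat(0.5)
    t_n = np.sqrt(n) * np.linalg.norm(process, axis=1).max()
    exceed = 0
    for _ in range(B):
        idx = rng.integers(0, n, size=m)                 # block of size m, drawn with replacement
        beta_b = estimate_process(data[idx], taus)
        recentered = (beta_b - beta_b[med]) - process    # recentered process, as in (9)
        exceed += np.sqrt(m) * np.linalg.norm(recentered, axis=1).max() > t_n
    return exceed / B                                    # share of resampled statistics above T_n
```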

4 Bounds

It has been argued in section 2 that the traditional mean and quantile sample selection estimators are inconsistent if the independence assumption is violated. Still, this does not necessarily mean that nothing can be said about the size of the coefficients. In the normal multiplicative heteroscedastic sample selection model, the coefficients are even point identified, as shown by Donald (1995). In a much more general setting, Manski (1989 and 1994) derives the worst-case bounds for the conditional mean and quantiles in the absence of any further assumption.

Here, an intermediate path between the worst-case bounds of Manski and the classical sample selection model is pursued. Apart from the independence assumption, all model assumptions made by Buchinsky (1998a) are maintained to derive bounds on the quantile regression coefficients. Thus, the distribution of Y is not restricted and the regressors are allowed to influence the whole conditional distribution of the dependent variable and not only the first moment(s). To this end, independence is replaced by the following, less restrictive, assumption:

    F_Y( F_{Y*}^{-1}(τ|X = x̄) | X = x̄, P = p, D = 1 ) = F_Y( F_{Y*}^{-1}(τ|X = x̃) | X = x̃, P = p, D = 1 ) = θ(p),    (12)

for all x̄ and x̃ in the support of X. P denotes the selection probability, defined as P = Pr(D = 1|Z). Equation (12) states that the rank θ(p) in the observed distribution that corresponds to rank


τ in the latent (true) distribution does not depend on X given P. This assumption is implied by the stronger independence assumption that has been made in the majority of studies using parametric and nonparametric selection models. In contrast to Manski (1989 and 1994), equation (12) excludes, for instance, the possibility of positive selection for low values of X and negative selection for high values of X. This is the main reason why tighter bounds are obtained. The second difference with Manski is that a linear parametric specification for the conditional quantile of the dependent variable is assumed:

    F_{Y*}^{-1}(τ|X = x̄) = c(τ) + x̄'β(τ).    (13)

Linearity is not essential for the basic idea of the bounds, but it will be maintained for simplicity. By combining equations (12) and (13), one obtains

    F_Y( c(τ) + x̄'β(τ) | X = x̄, P = p, D = 1 ) = F_Y( c(τ) + x̃'β(τ) | X = x̃, P = p, D = 1 ) = θ(p).

The value at the θ(p)th conditional quantile of the observed outcome corresponds to the value at the τth conditional quantile of the latent outcome. If θ(p) was known, β(τ) could be consistently estimated by regressing Y on X and on a nonparametric series expansion of P at the θ(p)th quantile, conditional on D = 1. However, θ(p) is unknown to the researcher, since the selection rule is unknown. If there was random or no sample selection, θ(p) would be equal to τ. Under nonrandom and highly positive selection, θ(p) is equal to (τ − (1 − p))/p, whereas under highly negative selection, θ(p) is equal to τ/p.⁸ Along with assumption (12), this information can be used to bound the unknown θ(p). By bounding θ(p), β(τ) is bounded, too. Let β(τ; p) denote the vector of the true slope coefficients at the τth quantile for observations with P = p and let β(θ; p) denote the vector of the slope coefficients at the θ(p)th quantile for the observed realizations with P = p. Let P_{D=1} denote the set of selection probabilities for the observed population, i.e. P_{D=1} contains all P conditional on D = 1.

⁸ An example should elucidate this point. Consider a distribution with support {1, 2, ..., 10}. The value at τ = 0.4 is then 4. Let p = 0.8. Under highly positive selection, the lower 1 − p (0.2) share of the distribution will not be observed after selection, i.e. {1, 2} are not selected and the observed set is {3, 4, ..., 10}. 4 is now at the 0.25th quantile of observed values, i.e. 0.25 = θ(p) = (τ − (1 − p))/p. Under highly negative selection, the upper 0.2 share of the distribution, i.e. {9, 10}, is not selected and the observed set is {1, 2, ..., 8}. 4 is now at the 0.5th quantile of observed values, i.e. 0.5 = θ(p) = τ/p. The same holds for any other values of τ and p.

The true quantile coefficient at P = p, β(τ; p), is located within these bounds:

    β(τ; p) ∈ [ L_τ(p) = min_{θ ∈ [(τ−(1−p))/p, τ/p]} β(θ; p) ,  U_τ(p) = max_{θ ∈ [(τ−(1−p))/p, τ/p]} β(θ; p) ],    (14)

    where p ∈ P_{D=1}.    (15)

Let us additionally assume that β(τ; p) is constant in p, i.e., β(τ; p) = β(τ). Then, β(τ) has to be an element of the intersection of possible values for β(τ; p) across all selection probabilities in the observed population:

    β(τ) ∈ [ max_p L_τ(p), min_p U_τ(p) ].

Thus, the final bounds for β(τ) are the best-case solution (in the sense that they minimize the range of possible values for β(τ) over various p) out of all bounds obtained in equation (14). However, the identification of bounds on θ(p) (and β(τ)) hinges on two conditions. Let p̄ denote the maximum of all P ∈ P_{D=1}, i.e. the maximum of the selection probabilities in the observed population. In order to obtain informative bounds, (i) p̄ > 0.5 and (ii) 1 − p̄ < τ < p̄ have to hold. For p̄ ≤ 0.5, bounds on θ(p), which is naturally bounded to the interval [0, 1], are not informative, as either (τ − (1 − p̄))/p̄ ≤ 0, or τ/p̄ ≥ 1, or both. In such cases β(τ) cannot be bounded either. Secondly, it is obvious that τ has to lie within the 1 − p̄ upper share and the p̄ lower share of the latent distribution, as 1 − p̄ and p̄ determine the observed distribution in the worst cases of highly positive and highly negative selection, respectively. Informative bounds cannot be obtained for any τ hitting the boundaries of, or falling outside, the interval [1 − p̄, p̄], which gives rise to 1 − p̄ < τ < p̄.
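To illustrate how the interval [max_p L_τ(p), min_p U_τ(p)] could be computed in practice, a rough sketch follows. The function `beta_observed(theta, p)`, returning the observed-sample slope coefficient at quantile θ for selection probability p, is a hypothetical placeholder for the series quantile regression of section 3, and a scalar slope is assumed for simplicity.

```python
# Rough sketch of the bounds in (14)-(15) for a scalar slope coefficient: for every
# selection probability p that admits informative bounds, beta(theta; p) is scanned over
# theta in [(tau-(1-p))/p, tau/p] and the resulting intervals are intersected over p.
import numpy as np


def coefficient_bounds(tau, p_values, beta_observed, grid_size=50):
    lower, upper = -np.inf, np.inf
    for p in p_values:
        lo, hi = (tau - (1 - p)) / p, tau / p        # admissible range of theta(p)
        if p <= 0.5 or lo <= 0 or hi >= 1:           # uninformative: need p > 0.5 and 1-p < tau < p
            continue
        thetas = np.linspace(lo, hi, grid_size)
        betas = np.array([beta_observed(th, p) for th in thetas])
        lower = max(lower, betas.min())              # L_tau(p), then the maximum over p
        upper = min(upper, betas.max())              # U_tau(p), then the minimum over p
    return lower, upper
```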

We can use our framework of interval identification based on bounds to investigate under which conditions point identification of β(τ) is obtained. Point identification can be considered as a special case in which the identified interval collapses to a single point. This is the case when either independence between the error terms and the covariates or identification at infinity (or both) is satisfied. If independence is satisfied, point identification of the slope coefficients is obtained because all slope coefficients are constant across quantiles and thus the minimum is equal to the maximum in (14). In this case, we are back in the framework of the classical sample selection model. If independence does not hold, point identification is still feasible if the data contain observations with selection probability P = 1. Note that P = 1 is not only sufficient, but also necessary for point identification under heteroscedasticity. This is obvious from the fact that only for P = 1, (τ − (1 − p))/p = τ/p = τ = θ(p), whereas for 0 < p < 1, the upper and lower

bounds of θ(p) differ by (1 − p)/p. Identification conditional on P = 1 is known as identification at infinity and has first been discussed by Chamberlain (1986). The strategy suggested in this section, however, does not require such observations to be at hand and still allows one to bound the coefficients for the share of the population for which P > 0.5. Furthermore, the constant can be bounded by the same strategy. In contrast to the slope coefficients, the constant is not point identified even when independence is satisfied but P < 1 for all observations. However, point identification of the constant is feasible by identification at infinity, as discussed in Heckman (1990) and Andrews & Schafgans (1998). As a final remark, it is worth noting that one need not necessarily assume β_τ(p) = β_τ (see the discussion in Heckman & Vytlacil (2005)). Without this assumption, it is not possible to integrate the results over different participation probabilities, but the bounds on β_τ(p) at each P = p remain valid.

5 Monte Carlo Simulations

In this section, we present the results of Monte Carlo simulations on the efficiency and robustness of quantile regression in sample selection models as well as on the power and size properties of our tests in finite samples.

In their seminal paper on quantile regression, Koenker & Bassett (1978) provide Monte Carlo results on the efficiency of various estimators for several distributions of the errors. One of their conclusions is that under Gaussian errors, the median estimator makes only small sacrifices of efficiency compared to the mean estimator. It is, however, considerably more efficient when errors follow a non-Gaussian distribution, such as the Laplace, Cauchy or contaminated Gaussian. Thus, even if the errors are independent of the covariates, quantile regression methods can be preferable to mean regression for the sake of efficiency gains. A second argument in favor of quantile regression is its increased robustness in the case of contaminated outcomes, resulting in smaller biases in coefficient estimates. To illustrate that such efficiency and robustness considerations also apply to sample selection models, we conducted Monte Carlo simulations for (i) t-distributed error terms with three degrees of freedom, (ii) Cauchy distributed error terms, (iii) contaminated normal errors, and (iv) contaminated outcomes. The data generating process in specifications (i) to


(iii) is defined as

    D_i = 1{X_i α + Z_i γ + u_i > 0},
    Y_i = X_i β + ε_i  if D_i = 1,
    X ~ N(0, 1),  Z ~ N(0, 1),
    α = 1,  γ = 1,  β = 1.

In specification (i) u, ε ~ t(df = 3) and in (ii) u, ε ~ Cauchy. The covariance between the errors is set to 0.8 in both cases, i.e. Cov(u, ε) = 0.8. In (iii), the errors are a Gaussian mixture which is constructed as

    u_i = 0.95·u_{i1} + 0.05·u_{i2},   u_1 ~ N(0, 1),  u_2 ~ N(0, 100),
    ε_i = 0.95·ε_{i1} + 0.05·ε_{i2},   ε_1 ~ N(0, 1),  ε_2 ~ N(0, 100),
    Cov(u_1, ε_1) = 0.8,  Cov(u_2, ε_2) = 8,

where Φ is the cumulative distribution function. In (iv) the errors are standard normal: u ~ N(0, 1), ε ~ N(0, 1), Cov(u, ε) = 0.8. However, the outcome is contaminated by the factor 10 with 5% probability:

    Y_i = D_i[ (1 − j)(X_i + ε) + j(10·X_i + ε) ],   j ∈ {0, 1},  Pr(j = 1) = 0.05.

For each of the model specifications, a simulation with 1000 replications is conducted for a sample of n = 400 observations. The bandwidth for the (first-step) selection estimator is set to b_n = 0.3.
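As an illustration of the simulation design, the sketch below generates one sample from specification (i). The draft only states that Cov(u, ε) = 0.8, so the Gaussian copula used here to induce dependence between the t(3) errors is an assumption of the sketch, as are all function names.

```python
# Hedged sketch of specification (i): t(3)-distributed errors with dependence induced
# through a Gaussian copula with correlation 0.8 -- one possible construction only.
# alpha = gamma = beta = 1, X, Z ~ N(0, 1), n = 400, as in the text.
import numpy as np
from scipy.stats import norm, t


def simulate_spec_i(n=400, rho=0.8, df=3, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
    u = t.ppf(norm.cdf(z[:, 0]), df)                 # transform correlated normals into t(3) draws
    eps = t.ppf(norm.cdf(z[:, 1]), df)
    X, Z = rng.standard_normal(n), rng.standard_normal(n)
    D = (X + Z + u > 0).astype(int)                  # selection equation
    Y = np.where(D == 1, X + eps, np.nan)            # outcome observed only when D = 1
    return Y, X, Z, D
```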

Table 1

Coefficient estimates and variances of mean and median estimators

400 observations, 1000 replications, bandwidth = 0.3

Median estimator Mean estimator

Distributions Estimate Variance Estimate Variance

(i) Student's t (df=3) 1.015 0.018 1.004 0.028

(ii) Cauchy 1.026 0.037 1.498 2756.653

(iii) Contaminated normal error 0.998 0.014 0.985 0.061

(iv) Contaminated outcome 1.105 0.016 1.914 0.178

The estimates and variances of the β coefficients for the median and mean estimators are reported in Table 1. In all specifications considered, the median estimator is more efficient than the


mean estimator. In the contaminated outcome specification, its variance is more than ten times smaller than that of the mean estimator. In the case of Cauchy distributed error terms, the variance of the mean coefficient is theoretically unbounded, whereas it stays quite moderate for the median estimator. The median estimator is also superior in robustness, which becomes particularly clear when looking at the coefficient estimates in the contaminated outcome specification. While the mean estimate is severely upward biased, the median estimate is still only moderately higher than the true coefficient β = 1.

In the second part of this section, we present results on the power and size properties of the Kolmogorov-Smirnov (KS) and Cramer-Von-Mises-Smirnov (CMS) resampling tests in finite samples. We do so for three specifications based on (i) Gaussian, (ii) t-distributed and (iii) Cauchy distributed error terms u and ε. In specification (i), the data generating process is defined as:

    D_i = 1{X_i α + Z_i γ + u_i > 0},
    Y_i = X_i β + (1 + X_i η) ε_i,
    u ~ N(0, 1),  ε ~ N(0, 1),  Cov(u, ε) = 0.8,  X ~ N(0, 1),  Z ~ N(0, 1),
    α = 1,  γ = 1,  β = 1,  η = 0, 0.2, 0.5,

where α and β are mean coefficients. One is interested in the rejection frequencies of the KS and CMS statistics testing the null hypothesis of constant coefficients across all quantiles by drawing repeated bootstrap samples. As outlined in section 2, H_0: β_τ = β* for all 0 ≤ τ ≤ 1, where β* is some unknown constant coefficient. In the location shift model (η = 0), the error term ε is independent of the regressor X and hence H_0 is true. In this case, the rejection rates in the Monte Carlo simulations yield the tests' size properties. Heteroscedasticity is introduced if η is set to values different from zero. In the location-scale shift models (η = 0.2, 0.5), the null hypothesis is false and the rejection frequencies indicate the tests' power to reject the incorrect H_0. In order to construct the test statistics, the coefficients β_τ are estimated at equidistant quantiles with step size 0.01 and compared to the median estimate β_{τ=0.5}. Results are presented for three different quantile regions over which the quantile coefficients are estimated:

    τ ∈ T_[0.05,0.95] = {0.05, 0.06, ..., 0.95},
    τ ∈ T_[0.1,0.9]  = {0.10, 0.11, ..., 0.90},  or
    τ ∈ T_[0.2,0.8]  = {0.20, 0.21, ..., 0.80}.

Therefore, the number of estimated quantile coefficients differs across regions. In particular,


the largest region [0.05, 0.95] will include quantiles that are relatively close to the boundaries 0 and 1, whereas all quantiles in the narrow region [0.2, 0.8] are situated well in the interior. As will be shown below, the choice of the region affects the tests' finite sample properties, and it is the distribution of the error term ε that determines whether a narrow or large quantile region is preferable. Similarly to Chernozhukov & Fernández-Val (2005), 1000 Monte Carlo replications per simulation and 250 bootstrap replications within each replication are conducted to compute the critical values of the test statistics. Samples of five different sizes, from n = 100 to n = 3200, are generated. In each sample, we draw two different subsamples with replacement for the bootstrap, with block sizes m = 20 + n^(1/4) and m = n, respectively.
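Schematically, the rejection frequencies reported in Tables 2 to 4 are obtained from a loop of the following kind. The routine `selection_test_pvalue` is a hypothetical stand-in for the full testing procedure of section 3, and the Gaussian specification (i) is used for concreteness.

```python
# Schematic simulation loop for the size/power experiments: generate a location-scale
# sample (heteroscedasticity parameter eta), run the (hypothetical) test routine, and
# record the share of replications in which the 5% test rejects.
import numpy as np


def rejection_frequency(selection_test_pvalue, n=400, eta=0.2,
                        replications=1000, level=0.05, seed=0):
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(replications):
        u, eps = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=n).T
        X, Z = rng.standard_normal(n), rng.standard_normal(n)
        D = (X + Z + u > 0).astype(int)
        Y = np.where(D == 1, X + (1 + eta * X) * eps, np.nan)   # H0 holds only if eta = 0
        rejections += selection_test_pvalue(Y, X, Z, D) < level
    return rejections / replications
```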


Table 2

Empirical rejection frequencies for 5% resampling tests

ε ~ N(0, 1), 250 bootstrap draws, 1000 replications

Kolmogorov-Smirnov statistics

m = 20 + n^(1/4)

η = 0                              η = 0.2                            η = 0.5

τ ∈   [0.05,0.95] [0.1,0.9] [0.2,0.8]   [0.05,0.95] [0.1,0.9] [0.2,0.8]   [0.05,0.95] [0.1,0.9] [0.2,0.8]

n = 100 0.006 0.000 0.000 0.007 0.003 0.001 0.034 0.011 0.002

n = 400 0.030 0.014 0.003 0.306 0.184 0.070 0.920 0.818 0.500

n = 800 0.024 0.026 0.012 0.644 0.517 0.283 1.000 0.998 0.957

n = 1600 0.040 0.032 0.020 0.936 0.906 0.717 1.000 1.000 1.000

n = 3200 0.030 0.032 0.031 0.998 1.000 0.973 1.000 1.000 1.000

m = n

η = 0                              η = 0.2                            η = 0.5

τ ∈   [0.05,0.95] [0.1,0.9] [0.2,0.8]   [0.05,0.95] [0.1,0.9] [0.2,0.8]   [0.05,0.95] [0.1,0.9] [0.2,0.8]

n = 100 0.003 0.000 0.000 0.005 0.001 0.001 0.021 0.004 0.001

n = 400 0.022 0.006 0.002 0.273 0.145 0.054 0.895 0.780 0.434

n = 800 0.024 0.019 0.009 0.636 0.482 0.253 0.996 0.994 0.952

n = 1600 0.038 0.033 0.017 0.934 0.885 0.705 1.000 1.000 1.000

n = 3200 0.042 0.029 0.027 0.999 1.000 0.975 1.000 1.000 1.000

Cramer-Von-Mises-Smirnov statistics

m = 20 + n^(1/4)

η = 0                              η = 0.2                            η = 0.5

τ ∈   [0.05,0.95] [0.1,0.9] [0.2,0.8]   [0.05,0.95] [0.1,0.9] [0.2,0.8]   [0.05,0.95] [0.1,0.9] [0.2,0.8]

n = 100 0.001 0.000 0.000 0.001 0.001 0.000 0.011 0.003 0.002

n = 400 0.011 0.006 0.002 0.191 0.103 0.047 0.933 0.835 0.468

n = 800 0.011 0.010 0.007 0.610 0.451 0.216 1.000 0.998 0.955

n = 1600 0.023 0.013 0.008 0.959 0.898 0.704 1.000 1.000 1.000

n = 3200 0.022 0.020 0.014 1.000 0.998 0.969 1.000 1.000 1.000

m = n

η = 0                              η = 0.2                            η = 0.5

τ ∈   [0.05,0.95] [0.1,0.9] [0.2,0.8]   [0.05,0.95] [0.1,0.9] [0.2,0.8]   [0.05,0.95] [0.1,0.9] [0.2,0.8]

n = 100 0.001 0.000 0.000 0.001 0.000 0.000 0.006 0.005 0.003

n = 400 0.011 0.006 0.002 0.192 0.112 0.045 0.924 0.838 0.498

n = 800 0.017 0.009 0.010 0.617 0.463 0.252 0.999 0.998 0.957

n = 1600 0.026 0.020 0.014 0.958 0.911 0.735 1.000 1.000 1.000

n = 3200 0.026 0.024 0.021 1.000 0.999 0.978 1.000 1.000 1.000


The empirical rejection frequencies reported in Table 2 suggest that the resampling tests work quite well under standard normally distributed errors. In the case of homoscedastic errors (η = 0), both the KS and CMS statistics are conservative, rejecting less often than the nominal 5% level, at least for the sample sizes considered. However, both statistics generally seem to converge to the nominal level as the sample size increases, although not monotonically. The KS test does so at a faster pace than the CMS test. For the latter, the larger resampling block size (m = n) works somewhat better than the smaller one (m = 20 + n^(1/4)). Under heteroscedastic errors, the rejection rates converge to 100% as the sample size increases. As expected, this happens at a faster pace for η = 0.5 than for η = 0.2. The power properties of the CMS and the KS statistics are rather similar and quite satisfactory, given that the sample size is not too small. The convergence of the KS statistic is faster at both levels of heteroscedasticity for the smaller block size, whereas the converse seems to be true for the CMS statistic. As one would expect, the empirical rejection frequencies converge faster as the quantile region increases, and this holds true for both tests and any value of η. Summing up, both tests seem to perform quite well in finite samples with Gaussian errors. For moderate sample sizes of several thousand observations, the power is sufficiently high and size distortions hardly affect the test statistics. In our simulations, the KS test seems to be somewhat superior due to its faster convergence when η = 0.

In specification (ii), almost the same model is used as before; only the errors are changed to be t-distributed with three degrees of freedom, u ~ t(df = 3), ε ~ t(df = 3), Cov(u, ε) = 0.8. Table 3 reports the rejection frequencies for t-distributed error terms. As one would expect, deviations from the true rejection rates in finite samples generally increase due to the fatter tails compared to Gaussian errors. For η = 0, rejection frequencies of the KS test "overshoot" in small samples and seem to converge to the true values as the sample size increases. At least in small samples, the CMS test seems to perform slightly better as it stays on the "safe side", i.e., it is more conservative than the true rejection rates. Furthermore, the CMS rejection frequencies converge faster to 100% under heteroscedastic errors. The larger block size (m = n) yields better results for both tests, given that the sample size is not too small. Contrary to the case of Gaussian errors, the largest quantile region is generally not the best choice. Under heteroscedastic errors, T_[0.1,0.9] is superior for both the KS and CMS statistics, which is again due to the fat tails. Even though power and size properties are quite appealing for both tests, the CMS-based procedure seems to be preferable in


the case of t-distributed errors.


Table 3

Empirical rejection frequencies for 5% resampling tests

ε ~ t(df = 3), 250 bootstrap draws, 1000 replications

Kolmogorov-Smirnov statistics

m = 20 + n^(1/4)

η = 0                              η = 0.2                            η = 0.5

τ ∈   [0.05,0.95] [0.1,0.9] [0.2,0.8]   [0.05,0.95] [0.1,0.9] [0.2,0.8]   [0.05,0.95] [0.1,0.9] [0.2,0.8]

n = 100 0.058 0.007 0.000 0.062 0.012 0.001 0.087 0.014 0.004

n = 400 0.084 0.044 0.011 0.284 0.189 0.070 0.810 0.811 0.482

n = 800 0.094 0.063 0.019 0.410 0.441 0.281 0.962 0.991 0.950

n = 1600 0.056 0.056 0.032 0.570 0.719 0.651 0.999 1.000 1.000

n = 3200 0.052 0.056 0.043 0.827 0.957 0.942 1.000 1.000 1.000

m = n

η = 0                              η = 0.2                            η = 0.5

τ ∈   [0.05,0.95] [0.1,0.9] [0.2,0.8]   [0.05,0.95] [0.1,0.9] [0.2,0.8]   [0.05,0.95] [0.1,0.9] [0.2,0.8]

n = 100 0.037 0.006 0.000 0.041 0.005 0.001 0.064 0.006 0.001

n = 400 0.070 0.032 0.009 0.252 0.153 0.043 0.801 0.758 0.415

n = 800 0.103 0.068 0.013 0.447 0.414 0.253 0.966 0.986 0.943

n = 1600 0.066 0.061 0.026 0.642 0.743 0.632 0.999 1.000 1.000

n = 3200 0.068 0.055 0.045 0.909 0.969 0.945 1.000 1.000 1.000

Cramer-Von-Mises-Smirnov statistics

m = 20 + n^(1/4)

η = 0                              η = 0.2                            η = 0.5

τ ∈   [0.05,0.95] [0.1,0.9] [0.2,0.8]   [0.05,0.95] [0.1,0.9] [0.2,0.8]   [0.05,0.95] [0.1,0.9] [0.2,0.8]

n = 100 0.001 0.000 0.000 0.003 0.002 0.000 0.026 0.005 0.000

n = 400 0.022 0.012 0.002 0.215 0.107 0.026 0.894 0.815 0.466

n = 800 0.050 0.020 0.004 0.540 0.441 0.209 0.999 0.997 0.958

n = 1600 0.036 0.023 0.012 0.876 0.852 0.632 1.000 1.000 1.000

n = 3200 0.035 0.030 0.024 0.993 0.996 0.967 1.000 1.000 1.000

m = n

η = 0                              η = 0.2                            η = 0.5

τ ∈   [0.05,0.95] [0.1,0.9] [0.2,0.8]   [0.05,0.95] [0.1,0.9] [0.2,0.8]   [0.05,0.95] [0.1,0.9] [0.2,0.8]

n = 100 0.000 0.000 0.000 0.001 0.000 0.000 0.014 0.003 0.001

n = 400 0.020 0.007 0.001 0.205 0.104 0.019 0.890 0.800 0.462

n = 800 0.053 0.017 0.004 0.581 0.462 0.230 1.000 0.998 0.960

n = 1600 0.050 0.034 0.020 0.896 0.862 0.665 1.000 1.000 1.000

n = 3200 0.049 0.036 0.036 0.995 0.998 0.976 1.000 1.000 1.000


Lastly, Table 4 displays the test rejection rates for Cauchy distributed errors (iii): u ~ Cauchy, ε ~ Cauchy, Cov(u, ε) = 0.8. Because the first and higher moments of the Cauchy distribution are undefined, the sample size needs to be sufficiently large (at least several thousand observations) to obtain satisfactory results. Under homoscedastic errors, the CMS test is again more conservative than the KS test for any chosen block size. Given that the sample size is not too small, the former outperforms the latter for η = 0.2, 0.5, as its rejection rates converge faster to 100%. In the majority of scenarios considered, the smaller block size yields better results for both tests and across different levels of η, due to the smaller probability of extreme values related to the heavy-tailed behavior of the Cauchy distribution. For the same reason, the small quantile region T_[0.2,0.8] is preferable to its larger alternatives. Thus, the power gains obtained by reducing the probability of disturbing outliers clearly outweigh the power losses due to considering a decreased range of the distribution in the inference process. Summing up, the CMS test seems to be somewhat superior to the KS test when errors are non-Gaussian, at least for the sample sizes considered.


Table 4

Empirical rejection frequencies for 5% resampling tests

ε ~ Cauchy, 250 bootstrap draws, 1000 replications

Kolmogorov-Smirnov statistics

m = 20 + n^(1/4)

η = 0                              η = 0.2                            η = 0.5

τ ∈   [0.05,0.95] [0.1,0.9] [0.2,0.8]   [0.05,0.95] [0.1,0.9] [0.2,0.8]   [0.05,0.95] [0.1,0.9] [0.2,0.8]

n = 100 0.127 0.032 0.008 0.117 0.037 0.003 0.149 0.051 0.006

n = 400 0.081 0.056 0.035 0.097 0.102 0.053 0.303 0.375 0.335

n = 800 0.085 0.064 0.041 0.104 0.095 0.097 0.437 0.607 0.739

n = 1600 0.062 0.036 0.025 0.150 0.176 0.262 0.698 0.900 0.981

n = 3200 0.059 0.034 0.038 0.261 0.350 0.508 0.913 0.988 1.000

m = n

η = 0                              η = 0.2                            η = 0.5

τ ∈   [0.05,0.95] [0.1,0.9] [0.2,0.8]   [0.05,0.95] [0.1,0.9] [0.2,0.8]   [0.05,0.95] [0.1,0.9] [0.2,0.8]

n = 100 0.088 0.018 0.004 0.088 0.025 0.003 0.113 0.030 0.006

n = 400 0.091 0.056 0.018 0.103 0.084 0.030 0.285 0.333 0.270

n = 800 0.086 0.063 0.036 0.097 0.076 0.072 0.433 0.553 0.661

n = 1600 0.069 0.031 0.023 0.132 0.144 0.220 0.704 0.881 0.963

n = 3200 0.079 0.041 0.035 0.279 0.325 0.472 0.926 0.989 0.999

Cramer-Von-Mises-Smirnov statistics

m = 20 + n^(1/4)

η = 0                              η = 0.2                            η = 0.5

τ ∈   [0.05,0.95] [0.1,0.9] [0.2,0.8]   [0.05,0.95] [0.1,0.9] [0.2,0.8]   [0.05,0.95] [0.1,0.9] [0.2,0.8]

n = 100 0.063 0.007 0.000 0.067 0.008 0.001 0.098 0.018 0.001

n = 400 0.073 0.041 0.004 0.104 0.075 0.017 0.420 0.489 0.279

n = 800 0.070 0.043 0.014 0.118 0.123 0.078 0.658 0.815 0.832

n = 1600 0.053 0.030 0.014 0.194 0.293 0.337 0.890 0.986 0.996

n = 3200 0.046 0.032 0.029 0.379 0.563 0.708 0.992 1.000 1.000

m = n

η = 0                              η = 0.2                            η = 0.5

τ ∈   [0.05,0.95] [0.1,0.9] [0.2,0.8]   [0.05,0.95] [0.1,0.9] [0.2,0.8]   [0.05,0.95] [0.1,0.9] [0.2,0.8]

n = 100 0.053 0.005 0.000 0.057 0.007 0.000 0.066 0.013 0.001

n = 400 0.072 0.036 0.004 0.090 0.066 0.013 0.395 0.441 0.254

n = 800 0.083 0.040 0.016 0.110 0.092 0.063 0.625 0.773 0.812

n = 1600 0.052 0.033 0.014 0.174 0.245 0.292 0.888 0.980 0.993

n = 3200 0.061 0.031 0.025 0.385 0.525 0.668 0.994 1.000 1.000


6 Labor Market Applications

In this section, two applications of the test procedure to labor market data are presented. The first application is a textbook example of heckit estimation given by Greene (2003), using a sample of 753 married women originally investigated by Mroz (1987). The data set contains information on the wages and hours worked of the 428 women with positive labor supply (D = 1), along with a set of regressors for the whole sample. Estimation is based on the conventional selection model with normally distributed errors and an additive and linear bias correction function:

Y_i = X_i'\beta + \varepsilon_i \text{ if } D_i = 1, \quad D_i = I\{X_i'\beta + \varepsilon_i \geq Z_i'\gamma + u_i\}, \quad E(\varepsilon \mid X) = 0, \quad E(\varepsilon \mid X, D = 1) \neq 0.    (16)

Z_i'\gamma + u_i now denotes the reservation wage, which depends on a set of characteristics Z and the error term u. An individual provides positive labor supply (D_i = 1) only if the offered wage X_i'\beta + \varepsilon_i is at least as high as the reservation wage. In the model presented in Greene (2003), Y is the hourly wage, X consists of experience, experience^2, education, and a dummy for living in a large urban area, and Z contains age, age^2, family income, a dummy for having kids, and education. See page 786 in chapter 22 of Greene (2003) for the results.
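For readers who want to reproduce the parametric benchmark, the following is a minimal sketch of the conventional two-step heckit estimator implied by (16): a probit for the selection equation followed by OLS with the inverse Mills ratio as the additive bias correction. The data frame `df` and the column names are hypothetical placeholders for the Mroz variables, not the exact coding used by Greene (2003).

```python
# Minimal two-step heckit sketch for a model like (16). The data frame `df`
# and the column names are hypothetical placeholders, not Greene's exact coding.
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm


def heckit_two_step(df, y_col, x_cols, z_cols, d_col):
    # Step 1: probit of the selection indicator D on Z, using the full sample
    Z = sm.add_constant(df[z_cols])
    probit = sm.Probit(df[d_col], Z).fit(disp=0)
    index = np.asarray(Z) @ np.asarray(probit.params)    # estimated selection index
    mills = norm.pdf(index) / norm.cdf(index)             # inverse Mills ratio

    # Step 2: OLS of Y on X plus the Mills ratio, using the selected subsample only
    sel = df[d_col] == 1
    X = sm.add_constant(df.loc[sel, x_cols]).copy()
    X["mills"] = mills[sel.to_numpy()]
    return sm.OLS(df.loc[sel, y_col], X).fit()


# Hypothetical usage with Mroz-style variable names:
# res = heckit_two_step(df, "wage", ["exper", "exper2", "educ", "city"],
#                       ["age", "age2", "faminc", "kids", "educ"], "inlf")
# print(res.params["mills"])   # estimated coefficient on the bias correction term
```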

Table 5

Labor market application I: p-values of the KS and CMS tests

Test    m = 20 + n^{1/4}    m = 20 + n^{1/2.01}    m = n    m = n/4
KS          0.037               0.037              0.039     0.045
CMS         0.094               0.097              0.086     0.094

Table 5 reports the p-values of the KS and CMS tests, where h(g) is approximated by a polynomial as outlined in section 2 rather than assumed to be linear. The number of bootstrap draws is B = 10,000 and, analogously to Chernozhukov & Fernández-Val (2005), four different block sizes m are used for the bootstrap subsamples. The coefficients \beta_\tau are estimated at 81 equidistant quantiles, \tau \in T_{81} = \{0.10, 0.11, ..., 0.89, 0.90\}. Again, the GCV is applied to determine the optimal bandwidth b_n^{opt} in (4) and the optimal maximum order J^{opt}.9

9 b_n^{opt} = 0.2, J^{opt} = 2.


For all chosen m, the KS and CMS tests reject the null hypothesis of constant quantile coefficients at the 5% and 10% level, respectively. So even in very small samples, the tests prove to be quite powerful.
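The corresponding quantile regression step can be sketched as follows: the selection probability is estimated in a first step, and a polynomial in this estimate serves as the bias-correction term h(g) in the quantile regression for the selected observations. As a simplification, the sketch uses a probit first step and a fixed polynomial order J instead of the semiparametric propensity estimate and the GCV-chosen bandwidth and order described above.

```python
# Sketch: quantile regression with a polynomial bias-correction term in the
# estimated selection probability, in the spirit of approximating h(g).
# Simplifications (ours): a probit first step and a fixed polynomial order J
# replace the semiparametric propensity estimate and the GCV-chosen tuning
# parameters used in the paper. y, X, Z, d are numpy arrays.
import numpy as np
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg


def corrected_quantile_slopes(y, X, Z, d, taus, J=2):
    # First step: estimated selection probabilities g_hat = P(D = 1 | Z)
    Zc = sm.add_constant(Z)
    g_hat = sm.Probit(d, Zc).fit(disp=0).predict(Zc)

    # Second step: quantile regression of Y on X and powers of g_hat (D = 1 only)
    sel = d == 1
    correction = np.column_stack([g_hat[sel] ** j for j in range(1, J + 1)])
    R = np.column_stack([np.ones(sel.sum()), X[sel], correction])
    k = 1 + X.shape[1]
    # keep only the coefficients on X; the intercept and correction terms are nuisance
    return np.array([QuantReg(y[sel], R).fit(q=t).params[1:k] for t in taus])
```

The resulting \tau-indexed slopes can then be fed into KS- and CMS-type resampling statistics such as those sketched in the previous section.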

The second application deals with a considerably larger data set. In their study on US women's relative wages, Mulligan & Rubinstein (2005) estimate the conditional mean wages of married white women using the heckit estimator. They investigate two repeated cross-sections covering the periods 1975-1979 and 1995-1999, stemming from the US Current Population Survey (CPS), and consider only observations of married white couples. In this application, Y represents the wife's log weekly wage, which is computed from total annual earnings deflated by the US Consumer Price Index (CPI). Only prime-age workers (25-54) who worked full time and at least 50 weeks in the respective year are considered. The vector of regressors X consists of wife's (working experience - 15), wife's (working experience - 15)^2/100, and wife's education, including a teacher dummy. Z contains X and additionally includes husband's education, husband's (working experience - 15), husband's (working experience - 15)^2/100, and the number of children aged 0-6 present in the household. In the period 1975-1979, 97,067 observations are available in total and D = 1 in 20,839 cases. For 1995-1999, the respective numbers are 87,004 and 35,206.

Table 6

Labor market application II: p-values of the KS and CMS tests

1975-1979
Test    m = 20 + n^{1/4}    m = 20 + n^{1/2.01}    m = n    m = n/4
KS          0.003               0.000              0.000     0.000
CMS         0.001               0.000              0.000     0.000

1995-1999
Test    m = 20 + n^{1/4}    m = 20 + n^{1/2.01}    m = n    m = n/4
KS          0.000               0.000              0.000     0.000
CMS         0.000               0.000              0.000     0.000

Mulligan & Rubinstein (2005) compare the coefficient estimates of the heckit regression to OLS estimates and find that the sample selection bias has changed over time from being negative (-0.075) in the first period to positive (0.161) in the second.


This would suggest that from 1975 to 1979, married women out of the labor force had higher average potential earnings than their working counterparts, whereas the converse was true between 1995 and 1999. Given that all other parametric assumptions are correct, this holds only if \varepsilon is homoscedastic. However, the hypothesis of independent errors is rejected by both the KS and CMS tests at the 0.1% level for B = 1000 and T_{81} = \{0.10, 0.11, ..., 0.89, 0.90\}10, as reported in Table 6. The highly significant p-values suggest that one can draw conclusions neither on the size nor on the sign of the selection bias. In fact, the bias might have been positive or negative throughout both periods. Therefore, the authors' conclusions about the development of potential earnings are not necessarily correct. As independence of \varepsilon and X does not hold, the coefficient estimates are inconsistent and bear no economic interpretation.

7 Conclusions

Independence between regressors and errors is a conditio sine qua non for identification in any parametric or semiparametric sample selection model. It is, however, a rather strong restriction that is likely to be violated in many fields of research where two-step estimators correcting for selection bias are heavily applied. In cases where homoscedasticity indeed holds, quantile regression methods do not seem particularly attractive in sample selection models at first glance, as all quantile regression curves are parallel and all conditional quantile coefficients are equal to the mean coefficients. Applications that have found (and were happy to find) significant differences between coefficients at different quantiles have merely proved that their model assumptions are wrong and that the estimator is inconsistent.

However, quantile regression methods have valuable properties that also apply to sample selection models. Firstly, and this was the first motivation for suggesting quantile regression in Buchinsky (1998a), quantile-based estimators are more robust and more efficient than mean estimators when distributions have fat tails. Secondly, as argued in Koenker & Bassett (1982), quantile coefficients can be used to detect heteroscedasticity and thus violations of the independence assumption between errors and regressors. Thirdly, if independence is not satisfied, quantile coefficients can be bounded quite easily, which is not the case for mean coefficients.

10 Instead of the Gaussian kernel, the Epanechnikov kernel is used as the kernel function in (4) to reduce the computational burden. b_n^{opt} is 0.13 and 0.06 for 1975-1979 and 1995-1999, respectively, but it is locally increased to include at least 10 observations in the kernel window. J^{opt} is 3 and 2, respectively.


In this paper, all three arguments in favor of quantile regression were discussed in the light of sample selection models. We suggested a quantile-based and distribution-free procedure to test for homoscedasticity, demonstrated its satisfactory size and power properties in Monte Carlo simulations, and applied it to labor market data. Given that independence between the errors and the covariates holds, we showed that quantile estimators are more efficient and more robust in sample selection models than mean estimators if errors are non-Gaussian. For the case that independence does not hold, we proposed conditions under which upper and lower bounds for the interval identification of the quantile coefficients are obtained.

The application of the test procedure to labor market data previously investigated by Mulligan & Rubinstein (2005) clearly rejects homoscedasticity and indicates that the coefficients obtained by heckit estimation are inconsistent. We strongly suspect that this is not an exception, but that a considerable share of the results presented in the sample selection literature is questionable due to violations of the standard homoscedasticity assumption.


References

Ahn, H. & Powell, J. (1993), "Semiparametric estimation of censored selection models with a nonparametric selection mechanism", Journal of Econometrics 58, 3-29.

Andrews, D. & Schafgans, M. (1998), "Semiparametric estimation of the intercept of a sample selection model", Review of Economic Studies 65, 497-517.

Angrist, J., Chernozhukov, V. & Fernández-Val, I. (2006), "Vouchers for private schooling in Colombia", Econometrica.

Buchinsky, M. (1994), "Changes in the U.S. wage structure 1963-1987: Application of quantile regression", Econometrica 62, 405-458.

Buchinsky, M. (1998a), "The dynamics of changes in the female wage distribution in the USA: A quantile regression approach", Journal of Applied Econometrics 13, 1-30.

Buchinsky, M. (1998b), "Recent advances in quantile regression models: A practical guideline for empirical research", The Journal of Human Resources 33(1), 88-126.

Buchinsky, M. (2001), "Quantile regression with sample selection: Estimating women's return to education in the U.S.", Empirical Economics 26, 87-113.

Chamberlain, G. (1986), "Asymptotic efficiency in semiparametric models with censoring", Journal of Econometrics 32, 189-218.

Chaudhuri, P. (1991), "Global nonparametric estimation of conditional quantile functions and their derivatives", Journal of Multivariate Analysis 39, 246-269.

Chen, S. & Khan, S. (2003), "Semiparametric estimation of a heteroskedastic sample selection model", Econometric Theory 19, 1040-1064.

Chernozhukov, V. & Fernández-Val, I. (2005), "Subsampling inference on quantile regression processes", Sankhya: The Indian Journal of Statistics 67, 253-276.

Chernozhukov, V. & Hansen, C. (2006), "Instrumental quantile regression inference for structural and treatment effect models", Journal of Econometrics 132, 491-525.

Cosslett, S. (1991), Distribution-free estimator of a regression model with sample selectivity, in W. Barnett, J. Powell & G. Tauchen, eds, "Nonparametric and semiparametric methods in econometrics and statistics", Cambridge University Press, Cambridge, UK, pp. 175-198.

Craven, P. & Wahba, G. (1979), "Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation", Numerische Mathematik 31, 377-403.

Das, M., Newey, W. & Vella, F. (2003), "Nonparametric estimation of sample selection models", Review of Economic Studies 70, 33-58.

Donald, S. G. (1995), "Two-step estimation of heteroskedastic sample selection models", Journal of Econometrics 65, 347-380.


Gallant, A. & Nychka, D. (1987), "Semi-nonparametric maximum likelihood estimation", Econometrica 55, 363-390.

Gerfin, M. (1996), "Parametric and semi-parametric estimation of the binary response model of labour market participation", Journal of Applied Econometrics 11(3), 321-339.

Golub, G. H., Heath, M. & Wahba, G. (1979), "Generalized cross validation as a method for choosing a good ridge parameter", Technometrics 21(2), 215-224.

Greene, W. H. (2003), Econometric Analysis, New Jersey: Pearson Education.

Gronau, R. (1974), "Wage comparisons - a selectivity bias", Journal of Political Economy 82(6), 1119-1143.

Gutenbrunner, C. & Jurečková, J. (1992), "Regression quantile and regression rank score process in the linear model and derived statistics", Annals of Statistics 20, 305-330.

Heckman, J. (1974), "Shadow prices, market wages and labor supply", Econometrica 42, 679-694.

Heckman, J. (1990), "Varieties of selection bias", American Economic Review, Papers and Proceedings 80, 313-318.

Heckman, J. J. (1979), "Sample selection bias as a specification error", Econometrica 47(1), 153-161.

Heckman, J. J. & Vytlacil, E. (2005), "Structural equations, treatment effects, and econometric policy evaluation", Econometrica 73(3), 669-738.

Ichimura, H. (1993), "Semiparametric least squares (SLS) and weighted SLS estimation of single-index models", Journal of Econometrics 58, 71-120.

Klein, R. W. & Spady, R. H. (1993), "An efficient semiparametric estimator for binary response models", Econometrica 61(2), 387-421.

Koenker, R. (2005), Quantile Regression, Cambridge University Press.

Koenker, R. & Bassett, G. (1978), "Regression quantiles", Econometrica 46(1), 33-50.

Koenker, R. & Bassett, G. (1982), "Robust tests for heteroskedasticity based on regression quantiles", Econometrica 50(1), 43-62.

Koenker, R. & Hallock, K. F. (2001), "Quantile regression", Journal of Economic Perspectives 15, 143-156.

Koenker, R. & Xiao, Z. (2002), "Inference on the quantile regression process", Econometrica 70, 1583-1612.

Li, K. C. (1985), "From Stein's unbiased risk estimates to the method of generalized cross validation", The Annals of Statistics 13(4), 1352-1377.

Manski, C. F. (1989), "Anatomy of the selection problem", The Journal of Human Resources 24(3), 343-360.


Manski, C. F. (1994), The selection problem, in C. Sims, ed., "Advances in Econometrics: Sixth World Congress", Cambridge University Press, pp. 143-170.

Mincer, J. (1973), Schooling, Experience, and Earnings, NBER, New York.

Mroz, T. A. (1987), "The sensitivity of an empirical model of married women's hours of work to economic and statistical assumptions", Econometrica 55(4), 765-799.

Mulligan, C. B. & Rubinstein, Y. (2005), "Selection, investment, and women's relative wages since 1975", NBER Working Paper.

Newey, W. K. (1991), "Two-step series estimation of sample selection models", unpublished manuscript, M.I.T.

Powell, J. (1986), "Censored regression quantiles", Journal of Econometrics 32, 143-155.

Powell, J. (1987), "Semiparametric estimation of bivariate latent variable models", unpublished manuscript, University of Wisconsin-Madison.

8 Appendix

If h(g) is known, \beta_\tau can be estimated in a GMM framework by the moment condition

\psi(z', y, d, \tau; \beta, \delta) = d\,[\tau - 1/2 + (1/2)\,\mathrm{sgn}(y - x'\beta - h(g(z, \delta)))]\, r,

where r = (x, h(g))' and \delta denotes the parameters of the selection equation. Let \theta' = (\beta', \delta'). The quantile regression estimator for \beta_\tau solves approximately

\frac{1}{n}\sum_{i=1}^{n} \psi(z_i', y_i, d_i, \tau; \hat\beta_\tau, \hat\delta) = 0.

Assuming \sqrt{n}-consistency of \hat\delta, \hat\beta_\tau \overset{p}{\to} \beta_\tau, and sufficient smoothness of \psi(z', y, d, \tau; \beta, \delta) with respect to \beta and \delta, the mean value theorem applied to the first-order conditions yields

0 = \frac{1}{n}\sum_{i=1}^{n}\Big[\psi(\beta_\tau, \delta_0) + \frac{\partial\psi(\beta_l, \delta_l)}{\partial\beta'}(\hat\beta_\tau - \beta_\tau) + \frac{\partial\psi(\beta_l, \delta_l)}{\partial\delta'}(\hat\delta - \delta_0)\Big],    (17)

where \beta_l and \delta_l lie on the line segments connecting \hat\beta_\tau, \hat\delta and \beta_\tau, \delta_0, respectively, and where \psi(\beta, \delta) is the short form of \psi(z', y, d, \tau; \beta, \delta). It follows from equation (17) that

\sqrt{n}(\hat\beta_\tau - \beta_\tau) = -\Big[\frac{1}{n}\sum_{i=1}^{n}\frac{\partial\psi(\beta_l, \delta_l)}{\partial\beta'}\Big]^{-1}\Big[\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\psi(\beta_\tau, \delta_0) - \Big(\frac{1}{n}\sum_{i=1}^{n}\frac{\partial\psi(\beta_l, \delta_l)}{\partial\delta'}\Big)\sqrt{n}(\hat\delta - \delta_0)\Big].    (18)

Buchinsky (1998a) shows that \sqrt{n}(\hat\beta_\tau - \beta_\tau) \overset{L}{\to} N(0, \Sigma_{\beta_\tau}) and

\Sigma_{\beta_\tau} = \Lambda_{fr}^{-1}\,[\tau(1-\tau)\Lambda_{rr} + \Lambda_{frx}\,\Sigma_\delta\,\Lambda_{frx}']\,\Lambda_{fr}^{-1},    (19)

where

\Lambda_{fr} = E[d\, f_{\varepsilon_\tau}(0 \mid r)\, r r'], \quad \Lambda_{frx} = E\Big[d\, f_{\varepsilon_\tau}(0 \mid r)\,\frac{\partial h(g)}{\partial g}\, r z'\Big], \quad \Lambda_{rr} = E[d\, r r']    (20)


and f_{\varepsilon_\tau}(\cdot \mid r) denotes the conditional density of the error term. The asymptotic covariance matrix of the coefficients on x is the top left k \times k block of \Sigma_{\beta_\tau}. By combining (18), (20) and (A4) of Buchinsky (1998a) one obtains

\sqrt{n}(\hat\beta_\tau - \beta_\tau) \overset{p}{\to} \Lambda_{fr}^{-1}\Big[\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\ell_{\tau,i} - \Lambda_{frx}\,\sqrt{n}(\hat\delta - \delta_0)\Big],    (21)

where \ell_{\tau,i} = d_i\,(\tau - I\{y_i < x_i'\beta_\tau + h(g(z_i, \delta_0))\})\, r_i. We now insert equation (49) of Klein & Spady (1993) into (21) to get

\sqrt{n}(\hat\beta_\tau - \beta_\tau) \overset{p}{\to} \Lambda_{fr}^{-1}\,\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\big(\ell_{\tau,i} - \Lambda_{frx}\,\Sigma_p^{-1} k_i\big),    (22)

where

k_i = \frac{\partial p(\delta_0)}{\partial \delta}\,\frac{d_i - p(\delta_0)}{p(\delta_0)(1 - p(\delta_0))} \quad \text{and} \quad \Sigma_p = E\Big[\frac{\partial p(\delta_0)}{\partial \delta}\,\frac{\partial p(\delta_0)'}{\partial \delta}\,\frac{1}{p(\delta_0)(1 - p(\delta_0))}\Big].    (23)
