14
Computational Statistics and Data Analysis 56 (2012) 2889–2902 Contents lists available at SciVerse ScienceDirect Computational Statistics and Data Analysis journal homepage: www.elsevier.com/locate/csda Non-parametric bootstrap mean squared error estimation for M -quantile estimators of small area averages, quantiles and poverty indicators Stefano Marchetti a,, Nikos Tzavidis b , Monica Pratesi a a Department of Statistics and Mathematics Applied to Economics, University of Pisa, Via Ridolfi, 10 - 56124 Pisa (PI), Italy b Social Statistics and Southampton Statistical Sciences Research Institute, University of Southampton, Highfield, SO17 1BJ, Southampton, UK article info Article history: Received 29 December 2010 Received in revised form 12 January 2012 Accepted 28 January 2012 Available online 9 February 2012 Keywords: Chambers–Dunstan estimator Income distribution Domain estimation Poverty mapping Resampling methods Robust estimation abstract Small area estimation is conventionally concerned with the estimation of small area averages and totals. More recently emphasis has been also placed on the estimation of poverty indicators and of key quantiles of the small area distribution function using robust models, for example, the M-quantile small area model. In parallel to point estimation, Mean Squared Error (MSE) estimation is an equally crucial and challenging task. However, while analytic MSE estimation for small area averages is possible, analytic MSE estimation for quantiles and poverty indicators is difficult. Moreover, one of the main criticisms of the analytic MSE estimator for M-quantile estimates of small area averages is that it can be unstable when the area-specific sample sizes are small. A non-parametric bootstrap framework for MSE estimation for small area averages, quantiles and poverty indicators estimated with the M-quantile small area model is proposed. Emphasis is placed on second order properties of MSE estimators with results suggesting that the bootstrap MSE estimator is more stable than corresponding analytic MSE estimators. The proposed bootstrap is evaluated in a series of simulation studies under different parametric assumptions for the model error terms and different scenarios for the area-specific sample and population sizes. Finally, results from the application of the proposed MSE estimator to real income data from the European Survey of Income and Living Conditions (EU-SILC) in Italy are presented and information on the availability of R functions that can be used for implementing the proposed estimation procedures in practice is provided. © 2012 Elsevier B.V. All rights reserved. 1. Introduction Sample surveys provide an effective way of obtaining estimates for population characteristics. Estimation, however, can become difficult when the focus is on domains (areas) with small sample sizes. The term ‘small areas’ is typically used to describe domains whose sample sizes are not large enough to allow sufficiently precise direct estimation, i.e. estimation based only on the sample data from the domain (Rao, 2003). When direct estimation is not possible, one has to rely upon alternative model-based methods for producing small area estimates. Small area estimation is conventionally concerned with the estimation of small area averages and totals. More recently emphasis has been also placed on the estimation of Corresponding author. Tel.: +39 050 2216320; fax: +39 050 2216375. E-mail address: [email protected] (S. Marchetti). 0167-9473/$ – see front matter © 2012 Elsevier B.V. All rights reserved. doi:10.1016/j.csda.2012.01.023

Non-parametric bootstrap mean squared error estimation for -quantile estimators of small area averages, quantiles and poverty indicators

Embed Size (px)

Citation preview

Page 1: Non-parametric bootstrap mean squared error estimation for -quantile estimators of small area averages, quantiles and poverty indicators

Computational Statistics and Data Analysis 56 (2012) 2889–2902

Contents lists available at SciVerse ScienceDirect

Computational Statistics and Data Analysis

journal homepage: www.elsevier.com/locate/csda

Non-parametric bootstrap mean squared error estimation forM-quantile estimators of small area averages, quantiles andpoverty indicators

Stefano Marchetti a,∗, Nikos Tzavidis b, Monica Pratesi aa Department of Statistics and Mathematics Applied to Economics, University of Pisa, Via Ridolfi, 10 - 56124 Pisa (PI), Italyb Social Statistics and Southampton Statistical Sciences Research Institute, University of Southampton, Highfield, SO17 1BJ, Southampton, UK

a r t i c l e i n f o

Article history:Received 29 December 2010Received in revised form 12 January 2012Accepted 28 January 2012Available online 9 February 2012

Keywords:Chambers–Dunstan estimatorIncome distributionDomain estimationPoverty mappingResampling methodsRobust estimation

a b s t r a c t

Small area estimation is conventionally concerned with the estimation of small areaaverages and totals. More recently emphasis has been also placed on the estimation ofpoverty indicators and of key quantiles of the small area distribution function using robustmodels, for example, the M-quantile small area model. In parallel to point estimation,Mean Squared Error (MSE) estimation is an equally crucial and challenging task. However,while analytic MSE estimation for small area averages is possible, analytic MSE estimationfor quantiles and poverty indicators is difficult. Moreover, one of the main criticismsof the analytic MSE estimator for M-quantile estimates of small area averages is thatit can be unstable when the area-specific sample sizes are small. A non-parametricbootstrap framework for MSE estimation for small area averages, quantiles and povertyindicators estimated with the M-quantile small area model is proposed. Emphasis isplaced on second order properties of MSE estimators with results suggesting that thebootstrap MSE estimator is more stable than corresponding analytic MSE estimators. Theproposed bootstrap is evaluated in a series of simulation studies under different parametricassumptions for the model error terms and different scenarios for the area-specificsample and population sizes. Finally, results from the application of the proposed MSEestimator to real income data from the European Survey of Income and Living Conditions(EU-SILC) in Italy are presented and information on the availability of R functionsthat can be used for implementing the proposed estimation procedures in practice isprovided.

© 2012 Elsevier B.V. All rights reserved.

1. Introduction

Sample surveys provide an effective way of obtaining estimates for population characteristics. Estimation, however, canbecome difficult when the focus is on domains (areas) with small sample sizes. The term ‘small areas’ is typically used todescribe domains whose sample sizes are not large enough to allow sufficiently precise direct estimation, i.e. estimationbased only on the sample data from the domain (Rao, 2003). When direct estimation is not possible, one has to rely uponalternative model-based methods for producing small area estimates. Small area estimation is conventionally concernedwith the estimation of small area averages and totals. More recently emphasis has been also placed on the estimation of

∗ Corresponding author. Tel.: +39 050 2216320; fax: +39 050 2216375.E-mail address: [email protected] (S. Marchetti).

0167-9473/$ – see front matter© 2012 Elsevier B.V. All rights reserved.doi:10.1016/j.csda.2012.01.023

Page 2: Non-parametric bootstrap mean squared error estimation for -quantile estimators of small area averages, quantiles and poverty indicators

2890 S. Marchetti et al. / Computational Statistics and Data Analysis 56 (2012) 2889–2902

poverty indicators and of key quantiles of the small area distribution function (Molina et al., 2010a) using the M-quantilesmall area model (Chambers and Tzavidis, 2006).

Estimating the precision of small area estimates is both an important and challenging task. Despite the fact that MSEestimation forM-quantile small area averages has been studied fairly extensively (Chambers and Tzavidis, 2006; Chamberset al., 2011), MSE estimation formore complex small area statistics e.g. for poverty indicators estimatedwith theM-quantilemodel is unexplored. What is more, analytic MSE estimation for complex statistics is difficult. For example quantiles can beconsistently estimated by using the Chambers–Dunstan estimator (Chambers and Dunstan, 1986). Although the asymptoticbehaviour of this estimator was studied by Chambers and Dunstan (1986) and asymptotic expressions for the bias and thevariancewere derived by Chambers et al. (1992), the use of these expressions is computationally difficultwhile the extensionof this theory to small area estimation is a challenging task. This motivates the work in this paper in which we propose aunified non-parametric bootstrap framework for MSE estimation for small area averages, quantiles and poverty indicators– in particular, for the Head Count Ratio (HCR) and for the Poverty Gap (PG) – estimated with the M-quantile small areamodel. The proposed bootstrap is based on resampling empirical or smoothed M-quantile model residuals and presentsan extension of the work by Lombardía et al. (2003) to small area estimation with the M-quantile model. The use of non-parametric bootstrap for MSE estimation in the small area estimation literature is discussed in great detail in the seminalpaper byHall andMaiti (2006a). For the purposes of this paper, the choice of a non-parametric bootstrap scheme, instead of aparametric one, is dictated by the fact that theM-quantile small area model does not make explicit parametric assumptionsabout the model error terms. This is in contrast to the conventional unit level area random effects model which assumesthat the unit level and area level error terms are Gaussian. MSE estimation using parametric, instead of non-parametric,bootstrap has been proposed by Hall and Maiti (2006b) and recently used by Sinha and Rao (2009) for estimating the MSEof the Robust Empirical Best Linear Unbiased Predictor (REBLUP) of the small area average and by Molina and Rao (2010)for estimating the MSE of small area poverty indicators estimated by using the Empirical Best Prediction (EBP) approach.

The complexity of the small area target parameters we consider in this paper is only one way of motivating the use ofbootstrap. There is one additional reason as to why one may consider using a bootstrap MSE estimator. As we mentionedabove, analytic MSE estimation for M-quantile estimates of small area averages has been already proposed. Although thisestimator is bias robust against misspecification of the model assumptions, one of its main criticisms is that it can beunstable when used with small area-specific sample sizes. Second order properties of MSE estimators are, however, alsovery important. For this reason, a further aim of this paper is to also study the stability of the non-parametric bootstrapMSEestimator and compare this to the stability of corresponding analytic MSE estimators.

The paper is organized as follows. In Section 2 we review theM-quantile small area model and present point estimationfor small area averages, quantiles and poverty indicators. Analytic MSE estimation for estimates of small area averages isreviewed. Although the emphasis here is onMSE estimation, rather than on point estimation, wemust stress that estimationof poverty indicators under the M-quantile model is presented for the first time in this paper. However, comparisons withalternative poverty estimation approaches – e.g. the EBP method of Molina and Rao (2010) – will be discussed in futurework. In Section 3 we present the non-parametric bootstrap scheme. In Section 4 the performance of the proposed MSEestimator is empirically evaluated under different parametric assumptions for the model error terms and for the small areasample and population sizes. For the case of small area averages the bootstrapMSE estimator is also compared to the analyticMSE estimator proposed by Chambers and Tzavidis (2006) and Chambers et al. (2011). Using real income data from the EU-SILC survey in Italy, in Section 5 we apply the bootstrap MSE estimator for computing the accuracy of estimates of incomeaverages, income quantiles and poverty indicators for Provinces in Tuscany. Finally, in Section 6we conclude the paper withsome final remarks and provide information about the availability of software, in the form of R functions, that implementsthe proposed estimation procedures.

2. Small area estimation by using theM-quantile model

In what follows we assume that a vector of p auxiliary variable xij is known for each population unit i in small areaj = 1, . . . ,m and that values of the variable of interest y are available from a random sample, s, that includes units from allthe small areas of interest. We denote the population size, sample size, sampled part of the population and non-sampledpart of the population in area j respectively by Nj, nj, sj and rj. We assume that the sum over the areas of Nj and nj is equalto N and n respectively. We further assume that conditional on covariate information for example, design variables, thesampling design is ignorable.

A recently proposed approach to small area estimation is based on the use of a quantile/M-quantile regression model(Chambers and Tzavidis, 2006). The classical regression model summarizes the behaviour of the mean of a random variabley at each point in a set of covariates x. Instead, quantile regression summarizes the behaviour of different parts (e.g. quantiles)of the conditioned distribution of y at each point in the set of the x’s.

Let us for the moment and for notational simplicity drop subscript j. Suppose that (xTi , yi), i = 1, . . . , n, denotes theobserved values for a random sample consisting of n units, where xTi are row p-vectors of a known design matrix X andyi is a scalar response variable corresponding to a realization of a continuous random variable with unknown continuouscumulative distribution function F . A linear regression model for the qth conditional quantile of yi given xi is

Qyi(q|xi) = xiTβ(q).

Page 3: Non-parametric bootstrap mean squared error estimation for -quantile estimators of small area averages, quantiles and poverty indicators

S. Marchetti et al. / Computational Statistics and Data Analysis 56 (2012) 2889–2902 2891

An estimate of the qth regression parameter β(q) is obtained by minimizingn

i=1

|yi − xTi β(q)|

(1 − q)I(yi − xTi β(q) ≤ 0)+ qI(yi − xTi β(q) > 0)

.

Quantile regression presents a generalization of median regression and expectile regression (Newey and Powell, 1987),a ‘quantile-like’ generalization of mean regression. M-quantile regression (Breckling and Chambers, 1988) integratesthese concepts within a framework defined by a ‘quantile-like’ generalization of regression based on influence functions(M-regression). The M-quantile of order q for the conditional density of y given the set of covariates x, f (y|x), is defined asthe solutionMQy(q|x;ψ) of the estimating equation

ψqy−MQy(q|x;ψ)f (y|x)dy = 0, whereψq denotes an asymmetric

influence function, which is the derivative of an asymmetric loss function ρq. A linearM-quantile regression model yi givenxi is one where we assume that

MQy(q|xi;ψ) = xiTβψ (q), (1)

and estimates of βψ (q) are obtained by minimizing

ni=1

ρqyi − xiTβψ (q)

. (2)

Different regression models can be defined as special cases of (2). In particular, by varying the specifications of theasymmetric loss function ρq we obtain the expectile, M-quantile and quantile regression models as special cases. Whenρq is the square loss function, we obtain the linear expectile regression model if q = 0.5 (Newey and Powell, 1987) and thestandard linear regressionmodel if q = 0.5.When ρq is the loss function described by Koenker and Bassett (1978) we obtainthe linear quantile regression. Throughout this paper we will take the linear M-quantile regression model to be defined bywhen ρq is the Huber loss function (Breckling and Chambers, 1988). Setting the first derivative of (2) equal to zero leads tothe following estimating equation

ni=1

ψq(riq)xi = 0,

where riq = yi − xTi βψ (q), ψq(riq) = 2ψ(s−1riq)qI(riq > 0) + (1 − q)I(riq ≤ 0) and s > 0 is a suitable estimate of scale.For example, in the case of robust regression, s = median|riq|/0.6745. Since the focus of our paper is onM-type estimation,we use the Huber Proposal 2 influence function, ψ(u) = uI(−c ≤ u ≤ c)+ c · sgn(u). Provided that the tuning constant cis strictly greater than zero, estimates of βψ (q) are obtained using iterative weighted least squares (IWLS).

2.1. Estimators of small area averages

Chambers and Tzavidis (2006) extended the use of M-quantile regression models to small area estimation. Followingtheir development (see also Kokic et al., 1997; Aragon et al., 2005), these authors characterize the conditional variabilityacross the population of interest by the M-quantile coefficients of the population units. For unit i with values yi and xi,this coefficient is the value θi such that MQy(θi|xi;ψ) = yi. The M-quantile coefficients are determined at the populationlevel. Consequently, if a hierarchical structure does explain part of the variability in the population data, then we expectunits within clusters (domains) defined by this hierarchy to have similar M-quantile coefficients. When the conditionalM-quantiles are assumed to follow the linear model (1), with βψ (q) a sufficiently smooth function of q, Chambers andTzavidis (2006) suggested a plug in (naïve) estimator of the average value of y in area j

mMQj = N−1

j

i∈sj

yi +i∈rj

xTi βψ (θj)

, j = 1, . . . ,m, (3)

where θj is an estimate of the average value of theM-quantile coefficients of the units in area j. The area-specificM-quantilecoefficients, θj, can be viewed as pseudo-random effects. Empirical work indeed indicates that the area-specificM-quantilecoefficients are positively and highly correlated with the estimated random area-specific effects obtained with the nestederror regression small area model. Chambers and Tzavidis (2006) also observed that the naïveM-quantile estimator (3) canbe biased, especially in the presence of heteroskedastic and/or asymmetric errors. This observation motivated the workin Tzavidis et al. (2010). In particular, these authors proposed a bias adjusted M-quantile estimator for the small areaaverage that is derived by using an estimator of the finite population distribution function such as the Chambers–Dunstanestimator (Chambers and Dunstan, 1986). The Chambers–Dunstan estimator of the small area distribution function is of theform

F CDj (t) = N−1

j

i∈sj

I(yi ≤ t)+ n−1j

k∈rj

i∈sj

I(xTk βψ (θj)+ ei ≤ t)

.

Page 4: Non-parametric bootstrap mean squared error estimation for -quantile estimators of small area averages, quantiles and poverty indicators

2892 S. Marchetti et al. / Computational Statistics and Data Analysis 56 (2012) 2889–2902

Estimates of θj and βψ (θj) are obtained following Chambers and Tzavidis (2006) and ei = yi − xTi βψ (θj) are model residuals.TheM-quantile bias-adjusted estimator of the average of y in small area j is then defined as

mCDj =

+∞

−∞

y dF CDj (y)

= N−1j

i∈sj

yi +i∈rj

yi + (1 − fj)i∈sj

ei

(4)

where fj = njN−1j is the sampling fraction in area j and yi = xTi βψ (θj), i ∈ rj (Tzavidis et al., 2010). The bias correction

in (4) means that this estimator has higher variability than (3). Nevertheless, because of its bias robust properties, (4) isusually preferred, over the naïve M-quantile estimator, in practice. Finally, as we will also see in the next section, by usingthe Chambers–Dunstan estimator one can define a general framework for small area estimation that extends beyond theestimation of small area averages.

Analytic MSE estimation for M-quantile estimators of small area averages is described in Chambers and Tzavidis (2006)and Chambers et al. (2011). In particular, Chambers et al. (2011) proposed an analytic mean squared error estimator thatis a first order approximation to the mean squared error of estimator (4). These authors noted that since an iterativelyreweighted least squares algorithm is used to calculate theM-quantile regression fit at θj,

βψ (θj) = (XTs WsjXs)

−1XTs Wsjys,

where Xs and ys denote the matrix of sample x values and the vector of sample y values respectively, and Wsj denotesthe diagonal weight matrix of order n that defines the estimator of the M-quantile regression coefficient with q = θj. Itimmediately follows that (4) can be written

mCDj = wT

sjys, (5)

where wsj = (wij) = n−1j dsj + (1 − N−1

j nj)WjXs(XTs WjXs)

−1(xrj − xsj) with dsj denoting the n-vector that ‘picks out’ thesample units from area j. Here xsj and xrj denote the sample and non-sample means of x in area j. Also, these weights are‘locally calibrated’ on x since

i∈s

wijxi = xsj + (1 − fj)(xrj − xsj) = xj.

A first order approximation to the mean squared error of (5) then treats the weights as fixed and applies standard methodsof robust mean squared error estimation for linear estimators of population quantities (Royall and Cumberland, 1978). Withthis approach, the MSE of mCD

j is estimated by

Var(mCDj ) =

mg=1

i∈sg

λijgyi − xiβψ (θg)

2, (6)

where λijg = [(wij − 1)2 + (nj − 1)−1(Nj − nj)]I(g = j) + w2ig I(g = j). Empirical studies show that the analytic MSE

estimator (6) is bias robust against model misspecification (Chambers et al., 2011). However, its main criticism is that it canbe unstable especially with small area-specific sample sizes.

2.2. Estimators of small area poverty indicators and quantiles

Although small area averages are widely used in small area applications, relying only on averages may not be veryinformative. This is the case for example in economic applications where estimates of average income may not providean accurate picture of the area wealth due to the high within area inequality.

An estimate of the φ ∈ [0, 1] quantile of the distribution function in small area j is the value q(j;φ) obtained by anumerical solution to the following estimating equation q(j;φ)

−∞

dF CDj (t) = φ. (7)

Estimating poverty indicators at disaggregated geographical levels is also important. In this paper we focus on theestimation of the incidence of poverty or Head Count Ratio (HCR) and of the Poverty Gap (PG) as defined by Foster et al.(1984). Denoting by t the poverty line, different poverty indicators are defined by the small area means of the derivedvariable

zi(α, t) =

t − yit

αI(yi ≤ t) i = 1, . . . ,N.

Setting α = 0 defines the HCR while setting α = 1 defines the PG.

Page 5: Non-parametric bootstrap mean squared error estimation for -quantile estimators of small area averages, quantiles and poverty indicators

S. Marchetti et al. / Computational Statistics and Data Analysis 56 (2012) 2889–2902 2893

The population poverty indicators in small area j, zj(α, t), can be decomposed as follows,

zj(α, t) = N−1j

i∈sj

zi(α, t)+

k∈rj

zk(α, t)

. (8)

The unknown part, zk(α, t), k ∈ rj, in (8) can be estimated as follows,

zk(α, t) = n−1j

i∈sj

t − yikt

αI(yik ≤ t), (9)

where yik = xTk βψ (θj) + ei. Recollect that ei here is the sample residual defined by the M-quantile fit to yi. By substituting(9) into (8) we obtain an estimator zj(α, t) of zj(α, t).

An alternative approach to calculating (8) is by Monte Carlo simulation as follows:

1 Fit theM-quantile small area model using the sample values ys and obtain estimates θj, βψ (θj), of θj and βψ (θj).2 Generate an out of sample vector of size Nj − nj using

y∗

k = xTk βψ (θj)+ e∗

k , k ∈ rj,

where e∗

k , k ∈ rj is drawn from the empirical distribution function of the M-quantile model residuals. Residuals can bedrawn either from the domain (area) j residuals or from all the residuals. In the following empirical analyses we usedresiduals from the sample.

3 Repeat the process H times. Each time combine the sample data yi, i ∈ sj and out of sample data y∗

k , k ∈ rj for computingzj(α, t).

4 Average the results over H simulations to obtain an estimate of zj(α, t).

We have implemented both approaches and we can confirm that they provide virtually the same point estimates while thecomputational time is almost identical. Perhaps, the Monte Carlo approach makes it easier to draw some parallels betweenthe EBP approach (Molina and Rao, 2010) and the proposedM-quantile approach to poverty estimation. Note, for example,that y∗

k , k ∈ rj is generated by using xTk βψ (θj) i.e. from the conditionalM-quantile model, where θj plays the role of the arearandom effect in theM-quantile modelling framework.

3. Non-parametric bootstrap MSE estimation

Unlike MSE estimation for small area averages, analytic MSE estimation for small area poverty indicators and quantilesis complex. In this section we present a nonparametric bootstrap framework for MSE estimation of small area parametersestimated by theM-quantile model.

Let us start with theM-quantile small area model

yij = xTijβψ (θj)+ εij,

where βψ (θj) is the unknown vector of M-quantile regression parameters for the unknown area-specific M-quantilecoefficient θj, and εij is the unit level random error term with distribution function G for which no explicit parametricassumptions are being made. Using the sample data we obtain estimates θj, βψ (θj), of θj and βψ (θj), and estimated modelresiduals eij = yij − xTij βψ (θj). The target is to estimate the small area parameter τj using,

∆j = N−1j

i∈sj

Γ (yij; t, α)I(yij ≤ t)+ n−1j

k∈rj

i∈sj

Γ (ykij; t, α)I(ykij ≤ t)

, (10)

where ykij = xTkjβψ (θj)+ eij and Γ (y; t, α) = ((t − y)/t)α . By setting α = 0 we obtain the CD estimator of the distributionfunction which can be used for deriving estimates of small area averages, quantiles and HCRs, while by setting α = 1 wedefine an estimator of the PG. Hence, by using (10), we obtain estimates of the small area target parameters we presentedin Section 2, which we collectively denote by τj.

Given an estimator Gest(u) of the distribution of the residuals G(u) = Pr(ε ≤ u), a bootstrap population, consistent withtheM-quantile small area model,Ω∗

= y∗

ij, xij, can be generated by sampling from Gest(u) to obtain ε∗

ij ,

y∗

ij = xTij βψ (θj)+ ε∗

ij, i = 1, . . . ,Nj, j = 1, . . . ,m. (11)

For defining Gest(u) we consider two approaches: (1) sampling from the empirical distribution function of the modelresiduals or (2) sampling from a smoothed distribution function of the model residuals. For each of the two above

Page 6: Non-parametric bootstrap mean squared error estimation for -quantile estimators of small area averages, quantiles and poverty indicators

2894 S. Marchetti et al. / Computational Statistics and Data Analysis 56 (2012) 2889–2902

mentioned approaches, sampling can be done in two ways namely, by sampling from the distribution of all residualswithout conditioning on the small area (unconditional approach) or by sampling from the distribution of the residualswithin small area j (conditional approach). These methods are described in detail in Tzavidis et al. (2010). Here we onlyrecall the expression for the smoothed unconditional estimators of the distribution of the residuals, which is the one weused to obtain results in Section 4,

Gest(t) = n−1mj=1

i∈sj

Kh−1(t − eij + es)

, (12)

where es is the sample mean of the residuals eij, h > 0 is a smoothing parameter and K is the distribution functioncorresponding to a bounded symmetric kernel density k. Although there are four possible approaches for defining ε∗

ij wesuggest using the unconditional, empirical or smoothed, approach. The reason for this is that in applications of small areaestimation sampling from the conditional distribution would rely on potentially a very small number of data points whichcan cause Gest(t) to be unstable.

Let us now define∆j for the bootstrap population,

∆∗

j = N−1j

i∈s∗j

Γ (y∗

ij; t, α)I(y∗

ij ≤ t)+

k∈r∗j

Γ (y∗

kj; t, α)I(y∗

kj ≤ t)

,

from which we obtain the bootstrap population parameter τ ∗

j . This parameter can be estimated by selecting a withoutreplacement sample from the bootstrap population and by using estimator (10),

∆∗

j = N−1j

i∈s∗j

Γ (y∗

ij; t, α)I(y∗

ij ≤ t)+ n−1j

k∈r∗j

i∈s∗j

Γ (y∗

kij; t, α)I(y∗

kij ≤ t)

, (13)

where y∗

kij = xTkjβ∗

ψ (θ∗

j )+ e∗

ij and β∗

ψ (θ∗

j ) are bootstrap sample estimates of theM-quantile model parameters. Therefore, byusing (13) we obtain bootstrap estimates, τ ∗

j , of the bootstrap population small area parameters τ ∗

j .The steps of the bootstrap procedure are as follows: starting from sample s, selected from a finite populationΩ without

replacement, we fit the M-quantile small area model and obtain estimates of θj and βψ (θj) which are used to compute themodel residuals. We then generate B bootstrap populations,Ω∗b, using (11) and (12). From each bootstrap population,Ω∗b,we select L bootstrap samples using simple random sampling within the small areas andwithout replacement in a way suchthat n∗

j = nj. Using the bootstrap samples we obtain estimates of τj. Bootstrap estimators of the bias and variance of theestimated target small area parameter, τj, are defined respectively by

Bias(τj) = B−1L−1B

b=1

Ll=1

τ ∗blj − τ ∗b

j

,

Var(τj) = B−1L−1B

b=1

Ll=1

τ ∗blj − ¯τ

∗blj

2,

where τ ∗bj is the small area parameter of the bth bootstrap population, τ ∗bl

j is the small area parameter estimated by using

(13) with the lth sample of the bth bootstrap population and ¯τ∗blj = L−1L

l=1 τ∗blj . The bootstrap MSE estimator of the

estimated small area target parameter is then defined as

MSEτj

= Var(τj)+ Bias(τj)2. (14)

The asymptotic properties of the smoothed bootstrap method, under a linear model, have been studied by Lombardíaet al. (2003). Here we only comment on the validity of the assumptions by Lombardía et al. (2003) under the M-quantilemodel. We note that the superpopulation model assumed by these authors is a special case of the linear M-quantile modelwhen a squared loss function is used in (2) and q = 0.5 (see Breckling and Chambers, 1988; Newey and Powell, 1987).Therefore, and for the purposes of this paper, the results of Lombardía et al. (2003) are only valid at q = 0.5 and forψ equalto the identity function. The prospective reader of this paper should therefore realize that similar consistency results are notyet available for the general case of ψ functions other than the identity function and for q’s other than 0.5.

4. Empirical evaluations

In this section we use model-based Monte-Carlo simulations to empirically evaluate the performance of the bootstrapMSE estimator (14) when used to estimate the MSE of theM-quantile estimators of (a) the small area average, (b) the smallarea quantile and (c) the head count ratio (HCR) and the poverty gap (PG). Moreover, since analytic MSE estimation forM-quantile estimates of small area averages is possible, the proposed bootstrap MSE estimator is also contrasted to the

Page 7: Non-parametric bootstrap mean squared error estimation for -quantile estimators of small area averages, quantiles and poverty indicators

S. Marchetti et al. / Computational Statistics and Data Analysis 56 (2012) 2889–2902 2895

corresponding analytic MSE estimator both in terms of bias and stability. The behaviour of the alternative MSE estimatorsis assessed under two different parametric assumptions for the model error terms namely, Normal and Chi-square errors,and two scenarios for the area-specific sample and population sizes. Finally, we also present results on howwell estimatorsof small area averages, quantiles and poverty indicators estimate the corresponding population parameters.

Inwhat follows subscript j identifies small areas, j = 1, . . . ,m and subscript i identifies units in a given area, i = 1, . . . , nj.Population dataΩ = (x, y) inm = 30 small areas are generated under two parametric scenarios for the model error terms.Population data under the first parametric scenario were generated by using a unit level area random effects model withnormally distributed random area effects and unit level errors as follows

yij = 3000 − 150 ∗ xij + γj + εij,

where the area random effects γj ∼ N(0, 2002), the unit level errors εij ∼ N(0, 8002), the auxiliary variable xij ∼ N(µj, 1)where µj ∼ U[4, 10] and µj was held fixed over simulations. Similarly, under the second parametric scenario populationdata were generated using

yij = 11 − xij + γj + εij,

where now the area random effects γj ∼ χ2(1), the unit levels errors εij ∼ χ2(6) and the auxiliary variable xij was generatedas in the first scenario but withµj ∼ U[8, 11]. Note that both area random effects as well as unit level errors were correctedto have zero expectations.

For each Monte Carlo simulation a within small areas random sample is selected from the corresponding generatedpopulation. Two scenarios for the population and sample sizes are investigated. Under the first scenario (denoted in thetables of results by λ = 1) the total population size is N = 8400 with small area-specific population sizes ranging between150 ≤ Nj ≤ 440. The total sample size is n = 840 and the area-specific sample sizes are ranging between 15 ≤ nj ≤ 44.Under the second scenario (denoted in the tables of results by λ = 2) the total population size is N = 2820 with area-specific population sizes ranging between 50 ≤ Nj ≤ 150 and the total sample size is n = 282 with area-specific samplesizes ranging between 5 ≤ nj ≤ 15. Using these two scenarios enables us to assess the effect of the domain sample sizesboth on the bias and the stability of the MSE estimators.

Using the sample data we obtain point estimates of small area averages with expression (4), of the 0.25, 0.50 and 0.75percentiles of the distribution of ywith (7) and of the HCR and PG with (8) and (9). For small area averages MSE estimationis performed using both the analytic MSE (6) and the bootstrap MSE estimator (14). For estimators of small area percentilesand poverty indicators MSE estimation is performed using the bootstrap MSE estimator (14). We run in total H = 500Monte-Carlo simulations.

For bootstrap MSE estimation we used one bootstrap population (B = 1) from which we drew 400 bootstrap samples(L = 400). The unconditional approachwith both the empirical and smoothed (12) versions of the error distribution is used.For the smoothed case, we use the Epanechnikov kernel density, k(u) = (3/4)(1 − u2)I(|u| < 1), where the smoothingparameter h in (12) was chosen so that it minimizes the cross-validation criterion suggested by Bowman et al. (1998). Thatis, hwas chosen in order to minimize

CV (h) = n−1mj=1

i∈sj

I(eij − es) ≤ t − G−i(t)

2dt,

where G−i(t) is the version of G(t) that omits sample unit i (Li and Racine, 2007, Section 1.5). For computing the smoothingparameter h in (12) we used the np package (Hayfield and Racine, 2008) in the R environment (R Development Core Team,2010). In particular, we used the function npudensbw for computing the smoothing parameter h and the function npudistfor deriving the smoothed distribution function of the residuals.

Because the evaluation of the bootstrap MSE estimator is taking place within a Monte-Carlo framework, the generationof a new Monte-Carlo population and of a new bootstrap population in each iteration is imitating the generation of manybootstrap populations. Having said this, we have also replicated H = 250 Monte-Carlo simulations both with B = 1 andwith B = 50 but for reducing the computational time we used 100 instead of 400 bootstrap samples. Although the resultsfrom the comparison of the bootstrap MSE estimator with B = 1 and with B = 50 are not reported in this paper, we doprovide comments on this comparison thatwill potentially assist the prospective reader of the paper in choosing the numberof bootstrap populations.

Denoting by νj the true, i.e. Monte Carlo, MSE of τj and by νj = MSE(τj) the corresponding estimate, the performance ofMSE estimators is evaluated using the relative bias (RB) and root MSE (RMSE) respectively defined by

RB(νj) = H−1H

h=1

νj,h − νj

νj

,

RMSE(νj) =

H−1

Hh=1

(νj,h − νj)2

1/2

,

where νj = E[(τj −τj)2]. In what followswe present results using the smoothed unconditional approach. Results from using

the empirical distribution of the residuals have been also produced and are consistent with the results presented here. With

Page 8: Non-parametric bootstrap mean squared error estimation for -quantile estimators of small area averages, quantiles and poverty indicators

2896 S. Marchetti et al. / Computational Statistics and Data Analysis 56 (2012) 2889–2902

Table 1Averages over areas and simulations of the empirical (Monte Carlo) RMSE, of the estimated RMSE (analytic and bootstrap),of the relative bias (%) of the RMSE estimators and of the RMSE of the RMSE estimators for M-quantile estimators of smallarea averages. Bootstrap results are produced using the unconditional smoothed approach.

Normal scenario χ2 scenarioλ = 1 λ = 2 λ = 1 λ = 2

True 147.9 255.6 0.644 1.098Estimated (Analytic) 146.1 237.2 0.634 1.019Estimated (Bootstrap) 147.5 245.3 0.639 1.069Rel. Bias (%) (Analytic) −0.988 −6.53 −1.505 −6.531Rel. Bias (%) (Bootstrap) −0.189 −3.78 −0.818 −2.478RMSE (Analytic) 22.420 67.160 0.135 0.369RMSE (Bootstrap) 11.040 28.170 0.060 0.148

regards to the computational time required, we did not see a marked difference when implementing the two approaches.For example, for a sample of about 1500 households (see application in Section 5) generating 50 bootstrap populations usingthe smoothed unconditional approach is less than 45 s slower than generating 50 bootstrap populations using the empiricalunconditional approach version. We do expect, however, that as the sample size increases the computational time neededfor implementing the smoothed approach will also increase.

4.1. Results for small area averages

Table 1 presents the results for MSE estimation of M-quantile small area averages obtained with (4), under the twoparametric scenarios and the two scenarios for the area-specific sample and population sizes, using the analytic MSEestimator (6) and the bootstrap MSE estimator (14). The table reports averages over areas and simulations of the empirical(Monte Carlo) RMSE, the estimated RMSE, the relative bias (%) of the estimated RMSE and the RMSE of the RMSE estimatorswhich is used for assessing the stability of the bootstrap and analytic MSE estimators.

These results suggest that for all scenarios we studied the average (over areas and simulations) of the analytic and thebootstrap MSE estimators is close to the corresponding average of the empirical MSE, with the bootstrap MSE estimatorappearing to be somewhat more precise. The distribution of the bootstrap and the analytic MSE estimators over areas andsimulations is also close to the corresponding distribution obtained with the empirical MSE but in the economy of space wedo not report these results. The bootstrapMSE estimator appears to be notablymore stable than the analytic MSE estimator.In particular, the bootstrap MSE estimator has RMSE that is approximately half the RMSE of the analytic estimator (scenariowith λ = 1). These differences become more pronounced for the smaller area sample sizes (scenario λ = 2). Therefore,there is evidence to suggest that the bootstrapMSE estimator is more stable than the analytic MSE estimator. Fig. 1 presentsaverages, over simulations, of true and estimated, using (4), small area means. In the economy of space we present resultsonly for the χ2 scenario and the Normal scenario with the smaller area sample sizes (λ = 2). These results show thatestimates of small area averages are close to population values. The results for other scenarios are consistent with the oneswe present here. Results for coverage rates of 95% confidence intervals for estimates of small area averages constructed byusing the bootstrap MSE estimator are on average close to 95% (ranging between 0.931% and 0.944%) for both parametricscenarios and for both scenarios of the area sample and population sizes.

4.2. Results for small area poverty indicators and percentiles

Table 2 presents results on the performance of the bootstrap MSE estimator (14) when used to estimate the MSE ofestimates of HCR and PG obtained with (8), and for percentiles of y estimated with (7) (specifically for q = 0.25, 0.50, 0.75).The table reports the average over areas and simulations of the empirical (Monte Carlo) RMSE, the estimated RMSE, therelative bias of the bootstrap RMSE and the RMSE of the RMSE estimator which is used for assessing the stability of thebootstrap MSE estimator.

The estimated RMSEs for HCR and PG are on average close to the empirical RMSE both for the Normal and χ2 scenarios(the distribution of the empirical RMSE over areas is also tracked well by the bootstrap MSE and results are available uponrequest). For the normal scenario, these results show evidence of substantial relative bias ranging on average between(−16.9%, −7.6%). For the chi-square scenario the relative bias is substantially lower ranging on average between (−6.9%,−0.19%). However, the results on the relative bias must be interpreted with care since the values of the RMSEs are smalland hence even small differences will result in substantial relative bias. This is the case even with values that agree up tothe second decimal place. As expected, the variability of the RMSE estimator is greater when the sample size is smaller andgiven the decrease in the area sample sizes (see scenario λ = 2) the stability of the MSE estimator remains satisfactory.These stability results will be fully evaluated only when compared to alternative MSE estimators of the HCR and PG. To thebest of our knowledge, no analytic MSE estimators are currently available for the HCR and the PG. The only alternative MSEestimator is the parametric bootstrap proposed by Molina and Rao (2010). Fig. 1 present averages, over simulations, of trueand estimated, using (8), small area HCRs and PGs. In the economy of space we present results only for the χ2 scenarioand the Normal scenario with the smaller area sample sizes (λ = 2). These results show that estimates of small area HCRs

Page 9: Non-parametric bootstrap mean squared error estimation for -quantile estimators of small area averages, quantiles and poverty indicators

S. Marchetti et al. / Computational Statistics and Data Analysis 56 (2012) 2889–2902 2897

Fig. 1. Point estimates and true values for small area averages, HCRs and PGs. The first row of plots refers to the Normal scenario with sample size λ = 2,the second row refers to χ2 scenario with sample size λ = 2.

Table 2Averages over areas and simulations of the empirical (Monte Carlo) RMSE, of the bootstrap RMSE and of the relative Bias (%) and RMSE ofthe bootstrap RMSE estimator for the HCR, PG and Quantiles. Results are produced using the unconditional smoothed approach.

HCR PG Q25 Q50 Q75

Normal scenario, λ = 1

True 0.041 0.018 201.8 184.2 204.7Estimated 0.035 0.015 198.6 179.1 197.3Rel. bias (%) −14.22 −16.99 −1.60 −2.74 −3.55RMSE 0.013 0.006 23.61 20.38 24.56

Normal scenario, λ = 2

True 0.063 0.028 344.0 313.9 348.1Estimated 0.059 0.025 330.7 290.5 326.6Rel. bias (%) −7.66 −12.06 −3.71 −7.27 −6.13RMSE 0.022 0.012 55.19 48.71 55.70

χ2 scenario, λ = 1

True 0.063 0.062 0.591 0.73 1.02Estimated 0.059 0.058 0.589 0.71 1.00Rel. bias (%) −6.94 −5.58 −0.076 −3.10 −2.24RMSE 0.016 0.015 0.063 0.08 0.13

χ2 scenario, λ = 2

True 0.096 0.094 1.037 1.259 1.68Estimated 0.096 0.095 1.007 1.155 1.57Rel. bias (%) 0.195 0.261 −2.714 −8.16 −6.31RMSE 0.019 0.019 0.148 0.201 0.271

and PGs are close to population values. The results for other scenarios are consistent with the ones we present here. Resultsabout the percentiles show that on average the bootstrapMSE estimator is close to the empiricalMSE of the three percentilesunder both parametric scenarios and both scenarios for the small area sample sizes. Some underestimation is present butin terms of percentage relative bias this underestimation is not excessive. Fig. 2 presents averages, over simulations, of trueand estimated, using (7), small area percentile estimates of q = 0.25, 0.5, 0.75. Again, in the economy of space we presentresults only for the χ2 and Normal scenario with the smaller area sample sizes (λ = 2). These results show that estimates ofsmall area percentiles are close to population values. The results for other scenarios are consistent with the ones we presenthere.

Page 10: Non-parametric bootstrap mean squared error estimation for -quantile estimators of small area averages, quantiles and poverty indicators

2898 S. Marchetti et al. / Computational Statistics and Data Analysis 56 (2012) 2889–2902

Fig. 2. Point estimates and true values for small area percentiles (q = 0.25, 0.5, 0.75). The first row of plots refers to the Normal scenario with samplesize λ = 2, the second row refers to χ2 scenario with sample size λ = 2.

Coverage rates of normal theory 95% confidence intervals for estimates of small area quantiles constructed by using thebootstrap MSE estimator range on average between 93% and 95% for both parametric scenarios and for both scenarios ofthe area sample and population sizes. The coverage rates of normal theory 95% confidence intervals for estimates of smallarea poverty indicators (HCR and PG) constructed by using the bootstrap MSE estimator range on average between 90% and94% for both parametric scenarios and for both scenarios of the area sample and population sizes. The use of estimatedMSEs to construct normal theory confidence intervals, though widespread, has been criticized. Hall and Maiti (2006b)and more recently Chatterjee et al. (2008) discuss the use of bootstrap methods for constructing confidence intervals forsmall area parameters since there is no guarantee that the asymptotic behaviour underpinning normal theory confidenceintervals applies in the context of the small samples that characterize small area estimation. An alternative to a normalapproximation to constructing confidence intervals is to use bootstrap-based confidence intervals. Coverage rates fromusing bootstrap-based confidence intervals have been computed and are comparable to those produced from using normalconfidence intervals and which are reported above. However, for the purposes of this paper we are mainly interested in thefirst and second order properties of the bootstrapMSE estimator. A full study of the properties of coverage rates will requirea larger number of Monte-Carlo simulation and potentially more bootstrap samples.

We conclude this section by offering some guidance to the prospective reader for deciding the number of bootstrappopulations to be used in MSE estimation. At the start of Section 4 we mentioned that we replicated H = 250 Monte-Carlosimulations both with B = 1 andwith B = 50 bootstrap populations selecting 100 bootstrap samples from each population.The results from this simulation study indicate that there are some gains in the stability of the RMSE by using B = 50instead of B = 1 bootstrap populations. These gains are more evident for the MSE of small area averages and quantiles butless obvious for the MSE of the HCR and the PG. At the same time, increasing the number of bootstrap populations does notappear to have a significant effect on the bias of the bootstrapMSE. Decidingwhether to usemore bootstrap populationswillalso depend on the sample and population sizes we are working with. Hence, if computationally feasible, we recommendthe use of additional bootstrap populations as this can improve the stability of the MSE estimates.

5. An application: estimating the income distribution and poverty indicators for provinces in Tuscany

The aim of this section is to provide a picture of the economic conditions in Tuscan provinces. This is achieved bycomputing province-specific estimates of average equivalised income, of key percentiles of the income distribution function(25th, 50th, 75th) and of two poverty indicators namely, the HCR and PG as well as corresponding MSE estimates. Smallarea estimation is performed by using data from the 2006 European Survey on Income and Living Conditions (EU-SILC) inItaly and the 2001 Census microdata for the region of Tuscany. Provinces within regions are unplanned domains and thesample sizes for provinces in Tuscany range from 59 households in the Grosseto province to 445 households in the FlorenceProvince with an average sample size of 149 households (median 123 households). The population of households in thedifferent provinces, using 2001 Census data, ranges from 80,810 households in the province of Massa–Carrara to 376,300 inthe province of Florence with the total number of households in Tuscany being 1,388,252.

Page 11: Non-parametric bootstrap mean squared error estimation for -quantile estimators of small area averages, quantiles and poverty indicators

S. Marchetti et al. / Computational Statistics and Data Analysis 56 (2012) 2889–2902 2899

Fig. 3. Estimates of the income percentiles for provinces in Tuscany (darker colours indicate higher wealth).

Table 3Point estimates and corresponding RMSEs of small area averages, HCRs and PGs for Provinces in Tuscany.

PROVINCE MEAN RMSE HCR RMSE PG RMSE

Massa 14128.26 664.84 0.280 0.039 0.117 0.022Lucca 15867.69 766.81 0.239 0.026 0.094 0.015Pistoia 18980.76 1119.33 0.195 0.019 0.073 0.011Firenze 19184.92 498.35 0.166 0.012 0.061 0.007Livorno 17875.01 919.41 0.193 0.020 0.075 0.012Pisa 18550.16 876.38 0.175 0.018 0.065 0.010Arezzo 18665.97 1014.42 0.182 0.018 0.068 0.010Siena 20228.98 1113.91 0.161 0.023 0.060 0.012Grosseto 16152.47 1151.83 0.231 0.029 0.093 0.019Prato 17702.87 632.74 0.172 0.021 0.062 0.011

Table 4Point estimates and corresponding RMSEs of small area quartiles for Provinces in Tuscany.

Province Q1 RMSE Q2 RMSE Q3 RMSE

Massa 8837.42 712.33 13498.69 831.75 18528.25 1164.96Lucca 9715.74 640.71 14733.29 690.70 20650.60 1087.89Pistoia 11412.26 669.47 16124.78 685.77 22243.96 1017.31Firenze 12628.02 335.54 17364.81 377.37 23328.47 547.68Livorno 11338.34 610.02 16662.83 701.85 22991.71 983.31Pisa 11571.98 618.04 17161.42 681.67 23867.96 989.89Arezzo 12205.01 578.72 16724.22 638.74 22100.72 949.08Siena 12639.00 662.31 18373.53 703.94 25471.22 1087.76Grosseto 9924.80 924.38 15456.41 1016.58 22069.22 1483.27Prato 12779.53 669.54 16968.74 708.72 21796.88 1101.34

We start by first building a small area working model that is estimated using the EU-SILC survey data. The responsevariable is the household equivalised income. The explanatory variables we considered are those that are common to thesurvey and Census datasets. This includes explanatory variables that relate to the head of the household namely, gender, age,occupational status and years in education, and explanatory variables that relate to the household, namely, the ownershipstatus of the house and the number of household members. Fitting a two-level (households within provinces) randomeffects model using the above explanatory variables and performing residual analysis reveals departures from the assumednormality of the level 1 and level 2 error terms. For this reason, we decided to use an outlier robust model, in this case theM-quantile small area model (see Section 2). Small area estimates of average household income are derived by using (4).Small area estimates of the three income percentiles are derived by using (7). Finally, estimates of HCR and PG are obtainedby using (8). For estimating the poverty indicators the poverty line is computed at regional level as 60% of the medianhousehold equivalised income. Corresponding estimates of the MSE are obtained by using the bootstrap MSE estimator(14), which is implemented by generating B = 50 bootstrap populations and selecting L = 100 bootstrap samples fromeach bootstrap population. The choice of the number of bootstrap populations is to some extent justified by the empiricalresults in Section 4.

The results are summarized in Tables 3 and 4 and in Figs. 3 and 4, which present point estimates and correspondingestimates of RMSE. A clear picture about the wealth of Tuscan provinces emerges, with the provinces of Siena and Firenzebeing the wealthiest and the provinces of Massa–Carrara and Lucca the least wealthy. In particular, the provinces ofMassa–Carrara and Lucca have clearly the lowest average equivalised household incomewhereas the provinces of Siena and

Page 12: Non-parametric bootstrap mean squared error estimation for -quantile estimators of small area averages, quantiles and poverty indicators

2900 S. Marchetti et al. / Computational Statistics and Data Analysis 56 (2012) 2889–2902

Fig. 4. Estimates of the average income, Head Count Ratio and Poverty Gap for provinces in Tuscany (For average income, darker colours indicate higherwealth for HCR and PG darker colours indicate higher poverty).

Firenze the highest. Massa–Carrara and Lucca also have the highest number of households below the poverty line whereasSiena and Firenze the lowest and this picture remains the samewhenwe look at the spatial distribution of PG. Examining thepercentiles of the province-specific incomedistributionswe note that estimates of average income are higher than estimatesof median income, which highlights the right asymmetry of the income distributions. Using the percentile estimates ofincome we can draw some further conclusions. Looking at the average income and the HCR we note that the province ofGrosseto is among the least wealthy Tuscan provinces. However, when examining the estimate of the third quartile forGrosseto we note that this is similar to the estimate of the third quartile of Arezzo, which is one of the wealthiest provinces.This indicates the presence of inequality in Grosseto. Some evidence of inequality also exists for the provinces of Livornoand Pisa.

6. Conclusions

In this paper we propose the use of non-parametric bootstrap for estimating the MSE for small area averages, quantilesand poverty indicators estimated with theM-quantile model. Given that analytic MSE estimation for quantiles and povertyindicators is difficult, the proposedMSE estimator provides one practical approach forMSE estimation of complex small areastatistics. As illustrated in the empirical section, the proposed bootstrap MSE estimator approximates well the ‘true’ MSEerror of the target parameters. In addition, these results show that bootstrap MSE estimation is notably more stable thancorresponding analytic estimation. Although the proposed non-parametric bootstrap scheme is based on resampling modelresiduals, there are alternative approaches one could use for implementing the bootstrap. For example, in an attempt tomimic the nested error regression model, the M-quantile small area model can be expressed relative to the median model.In this way one could define area pseudo-random effects, which can be bootstrapped by using their empirical distribution.The impact of using alternative bootstrap schemes on MSE estimation for small area parameters extends beyond the scopeof this paper and will be studied elsewhere.

The practical implementation of the estimation procedures we describe in this paper is assisted by the availability of Rfunctions that are available from the authors upon request (see Appendix for a description of these R functions). Themethodweproposed is certainly computationally intensive, however, it is general and therefore it can be adapted forMSE estimationof a wide range of small area parameters. The bootstrap MSE estimator has been used with real Census micro-data in Italy.In the application of Section 5 we obtained MSE bootstrap estimates for a region with a population of about 1.4 millionhouseholds using 50 bootstrap populations and 100 bootstrap samples in about 16 h. The bootstrap MSE estimator methodhas been also used with a population of more than 3.6 millions of households (Census micro-data for the Lombardia regionin Italy) and a computational time of less than two days. (Molina et al., 2010b). The time required for running the bootstrapis certainly feasible provided that results are produced separately at some sub-national level (region, county) and not at thenational level in one go.

Although the present paper focuses on MSE estimation of specific small area parameters – averages, quantiles, andtwo poverty indicators – the proposed bootstrap can be potentially used for estimating the MSE of a range of other smallarea parameters for example, more complex poverty indicators. Future studies should aim at investigating this. Finally,comparing the proposed methods for poverty point and MSE estimation to alternative widely used methods, such as themethod proposed by Molina and Rao (2010) and the method proposed by Elbers et al. (2003), under a range of scenarios isthe focus of work we are currently pursuing.

Acknowledgements

The authors would like to acknowledge the valuable comments and suggestions of the Editor, the Associate Editor andthree referees. These led to a considerable improvement in the paper. This work is financially supported by the European

Page 13: Non-parametric bootstrap mean squared error estimation for -quantile estimators of small area averages, quantiles and poverty indicators

S. Marchetti et al. / Computational Statistics and Data Analysis 56 (2012) 2889–2902 2901

Project SAMPLE ‘‘Small Area Methods for Poverty and Living Condition Estimates’’, funded by the European Commission’s7th Framework Programme (www.sample-project.eu).

Appendix. Specifications of R functions

A.1. Estimation of small area means with mq.sae

• Required Packages:MASS• mq.sae(y,x,regioncode.s,m,p,x.outs,regioncode.r,tol.value,maxit.value,k.value).

The function provides estimates of small area averages using (3) and (4) and corresponding analytic estimates of MSEusing (6). Key arguments required are the response variable y, the matrix of covariates x, covariate information for out ofsample areas x.outs, the number of small areasm and values relating to the convergence of the algorithm.

A.2. Estimation of small area quantiles with mq.sae.quant

• Required Packages:MASS and np• mq.sae.quant(q,y,x,x.outs,regioncode.s,regioncode.r,MSE,B,R,method,maxit).

The function provides estimates of small area quantiles using (7) and corresponding bootstrap estimates of MSE using(14). Key arguments required are the response variable y, the matrix of covariates x, covariate information for out of sampleareas x.outs and values relating to the convergence of the algorithm. IfMSE = TRUE bootstrap MSE estimates are produced.B denotes the number of bootstrap populations and R denotes the number of bootstrap samples from each bootstrappopulation. Finally method defines the type of residuals used for generating the bootstrap population: ‘su’ (smoothedunconditional), ‘eu’ (empirical unconditional), ‘sc’ (smoothed conditional), ‘ec’ (empirical unconditional). The default is setto ‘eu’, which is computationally faster.

A.3. Estimation of small area poverty indicators with mq.sae.poverty

• Required Packages:MASS and np• mq.sae.poverty(y,x,x.outs,regioncode.s,regioncode.r,L,MSE,B,R,method).

The function provides estimates of small area HCRs and PGs using estimator of (8) proposed in Section 2 andcorresponding bootstrap estimates of MSE using (14). Key arguments required are the response variable y, the matrix ofcovariates x, covariate information for out of sample areas x.outs and values relating to the convergence of the algorithm. Lspecifies the number of Monte Carlo runs for estimating HCR and PG using the estimation method in Section 2.2. If MSE =

TRUE bootstrap MSE estimates are produced. B denotes the number of bootstrap populations and R denotes the numberof bootstrap samples from each bootstrap population. Finally method defines the type of residuals used for generatingthe bootstrap population: ‘su’ (smoothed unconditional), ‘eu’ (empirical unconditional), ‘sc’ (smoothed conditional), ‘ec’(empirical unconditional). The default is set to ‘eu’, which is computationally faster.

References

Aragon, Y., Casanova, S., Chambers, R., Leoconte, E., 2005. Conditional ordering using nonparametric expectiles. Journal of Official Statistics 21 (4).Bowman, A., Hall, P., Prvan, T., 1998. Bandwidth selection for the smoothing of distribution functions. Biometrika 85, 799–808.Breckling, J., Chambers, R., 1988.M-quantiles. Biometrika 75 (4), 761–771.Chambers, R., Chandra, H., Tzavidis, N., 2011. On bias-robust mean squared error estimation for pseudo-linear small area estimators. Survey Methodology

37 (2), 153–170.Chambers, R., Dorfman, A., Peter, H., 1992. Properties of estimators of the finite population distribution function. Biometrika 79 (3), 577–582.Chambers, R., Dunstan, , 1986. Estimating distribution function from survey data. Biometrika 73, 597–604.Chambers, R., Tzavidis, N., 2006.M-quantile models for small area estimation. Biometrika 93 (2), 255–268.Chatterjee, S., Lahiri, P., Li, H., 2008. Parametric bootstrap approximation to the distribution of eblup and related prediction intervals in linearmixedmodels.

The Annals of Statistics 36 (3), 1221–1245.Elbers, C., Lanjouw, J.O., Lanjouw, P., 2003. Micro-level estimation of poverty and inequality. Econometrica 71 (1), 355–364. URL:

http://www.jstor.org/stable/3082050.Foster, J., Greer, J., Thorbecke, E., 1984. A class of decomposable poverty measures. Econometrica 52, 761–766.Hall, P., Maiti, T., 2006a. Nonparametric estimation of mean-squared prediction error in nested-error regression models. The Annals of Statistics 34 (4),

1733–1750.Hall, P., Maiti, T., 2006b. On parametric bootstrap methods for small area prediction. Journal of the Royal Statistical Society: Series B (Statistical

Methodology) 68 (2), 221–238. URL: http://dx.doi.org/10.1111/j.1467-9868.2006.00541.x.Hayfield, T., Racine, J.S., 2008. Nonparametric econometrics: the np package. Journal of Statistical Software 27 (5), URL: http://www.jstatsoft.org/v27/i05/.Koenker, R., Bassett, G., 1978. Regression quantiles. Econometrica 46, 33–50.Kokic, P., Chambers, R., Breckling, J., Beare, S., 1997. A measure of production performance. Journal of Business and Economic Statistics 15 (4), 445–451.Li, Q., Racine, J., 2007. Nonparametric Econometric: Theory and Practice. Princeton University Press.Lombardía, M., González-Manteiga, W., Prada-Sánchez, J., 2003. Bootstrapping the chambers-dunstan estimate of finite population distribution function.

Journal of Statistical Planning and Inference 116, 367–388.Molina I., Morales D., Pratesi M., Tzavidis N., (Eds.), 2010a. Final small area estimation development and simulation results. Vol. Deliverable 12 and

16—S.A.M.P.L.E. Project. European Union—7th Framework Programme.

Page 14: Non-parametric bootstrap mean squared error estimation for -quantile estimators of small area averages, quantiles and poverty indicators

2902 S. Marchetti et al. / Computational Statistics and Data Analysis 56 (2012) 2889–2902

Molina I., Morales D., Pratesi M., Tzavidis N., (Eds.), 2010b. Pilot applications. Vol. Deliverable 17—S.A.M.P.L.E. Project. European Union—7th FrameworkProgramme.

Molina, I., Rao, J., 2010. Small area estimation of poverty indicators. Canadian Journal of Statistics 38 (3), 369–385.Newey, W., Powell, J., 1987. Asymmetric least squares estimation and testing. Econometrica 55 (4), 819–847.R Development Core Team, 2010. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.Rao, J., 2003. Small Area Estimation. Wiley, New York.Royall, R., Cumberland, W., 1978. Variance estimation in finite population sampling. Journal of the American Statistical Association 73, 351–358.Sinha, S., Rao, J., 2009. Robust small area estimation. The Canadian Journal of Statistics 37 (3), 381–399.Tzavidis, N., Marchetti, S., Chambers, R., 2010. Robust estimation of small area means and quantiles. Australian & New Zealand Journal of Statistics 52 (2),

167–186.