Challenges in small area estimation of poverty indicators


Risto Lehtonen, Ari Veijanen, Maria Valaste (University of Helsinki), and Mikko Myrskylä (Max Planck Institute for Demographic Research, Rostock)

Ameli 2010 Conference, 25-26 February 2010, Vienna

Outline

Background

Material and methods

Results

Discussion

References


EU/FP7 Project AMELI

Advanced Methodology for European Laeken Indicators (2008-2011)

The project is supported by European Commission funding from the Seventh Framework Programme for Research

DoW: The study will include research on data quality including

- Measurement of quality
- Treatment of outliers and nonresponse
- Small area estimation
- The measurement of development over time


Material and methods

Investigation of statistical properties (bias and accuracy) of estimators of selected Laeken indicators for population subgroups or domains and small areas

Method: Design-based Monte Carlo simulation experiments based on real data

Data: Statistical register data based on merging of administrative register data at the unit level (Finland)


Laeken indicators based on binary variables

At-risk-of-poverty rate

Direct estimators
- Horvitz-Thompson estimators HT

Indirect estimators
- Model-assisted GREG and MC estimators
- Model-based EBLUP and EB estimators

Modelling framework
- Generalized linear mixed models GLMM
- Lehtonen and Veijanen (2009), Rao (2003), Jiang and Lahiri (2006)


Laeken indicators based on medians or quantiles

Indicators based on medians or quantiles of cumulative distribution function of the underlying continuous variable

- Relative median at-risk-of-poverty gap
- Quintile share ratio (S20/S80 ratio)
- Gini coefficient

Estimators
- Direct estimators DEFAULT
- Synthetic estimators SYN
- Expanded prediction SYN estimators EP-SYN
- Composite estimators COMP
- Simulation-based methods


Generalized linear mixed models

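Model formulation with domain-specific (area-specific) random terms:

\[ E_m(y_k \mid \mathbf{u}_d) = f(\mathbf{x}_k'(\boldsymbol{\beta} + \mathbf{u}_d)), \quad k \in U_d, \quad d = 1, \dots, D, \]

where

- $f(\cdot)$ refers to the chosen functional form
- $\mathbf{x}_k = (1, x_{1k}, \dots, x_{pk})'$
- $\boldsymbol{\beta} = (\beta_0, \beta_1, \dots, \beta_p)'$ are fixed effects
- $\mathbf{u}_d = (u_{0d}, \dots, u_{pd})'$ are random effects

Fitted values are $\hat{y}_k = f(\mathbf{x}_k'(\hat{\boldsymbol{\beta}} + \hat{\mathbf{u}}_d))$, $k \in U_d$.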

Design-based GREG type estimators for poverty rate

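GREG estimators MLGREG

\[ \hat{t}_{d\mathrm{MLGREG}} = \sum_{k \in U_d} \hat{y}_k + \sum_{k \in s_d} a_k \hat{e}_k, \quad d = 1, \dots, D, \]

where

- $a_k = 1/\pi_k$ are the design weights ($\pi_k$ is the inclusion probability of unit $k$)
- $\hat{e}_k = y_k - \hat{y}_k$ are the residuals
- $\hat{y}_k = f(\mathbf{x}_k'(\hat{\boldsymbol{\beta}} + \hat{\mathbf{u}}_d))$, $k \in U_d$
- $f$ refers to the logistic function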
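As a rough illustration of the MLGREG formula on this slide, the following minimal Python sketch computes the estimated domain total from predictions and design-weighted residuals. The function names (`logistic`, `predict`, `mlgreg_total`) and the toy numbers are assumptions made for illustration; they do not come from the study, and the fitted coefficients are taken as given rather than estimated.

```python
import math


def logistic(eta):
    """Logistic function f that maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def predict(x, beta_hat, u_hat_d):
    """Fitted value f(x'(beta_hat + u_hat_d)) for one unit; x starts with 1."""
    eta = sum(xj * (bj + uj) for xj, bj, uj in zip(x, beta_hat, u_hat_d))
    return logistic(eta)


def mlgreg_total(y_hat_domain, sample):
    """Sum of predictions over the domain population U_d plus the
    design-weighted residuals of the sampled units in s_d.

    `sample` holds one (y_k, y_hat_k, pi_k) triple per sampled unit.
    """
    synthetic_part = sum(y_hat_domain)
    correction = sum((y - y_hat) / pi for y, y_hat, pi in sample)
    return synthetic_part + correction


# Hypothetical toy domain: four units, two of them sampled with pi_k = 0.5.
y_hat_domain = [0.1, 0.2, 0.3, 0.4]
sample = [(0, 0.2, 0.5), (1, 0.4, 0.5)]

t_hat = mlgreg_total(y_hat_domain, sample)
poverty_rate = t_hat / len(y_hat_domain)  # estimated total divided by N_d
print(round(t_hat, 3), round(poverty_rate, 3))
```

The residual term is what makes the estimator design-based: it corrects the model predictions with the weighted sample information, at the price of a larger variance in small domains.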


Model-based estimators for poverty rate

EBLUP and EB type estimators

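\[ \hat{t}_{d\mathrm{EBLUP}} = \sum_{k \in s_d} y_k + \sum_{k \in U_d - s_d} \hat{y}_k, \quad d = 1, \dots, D, \]

where

- $\hat{y}_k = f(\mathbf{x}_k'(\hat{\boldsymbol{\beta}} + \hat{\mathbf{u}}_d))$, $k \in U_d$
- $f$ refers to the logistic function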
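The EBLUP/EB-type predictor on this slide can be sketched in the same spirit: observed values are used for sampled units and model predictions for the rest of the domain. The function name `eb_total` and the data layout (a `None` marker for unsampled units) are hypothetical choices for this sketch only.

```python
def eb_total(units):
    """Predicted domain total: observed y_k for sampled units (k in s_d)
    plus fitted values y_hat_k for the non-sampled units (k in U_d - s_d).

    `units` holds one (y_k, y_hat_k) pair per unit of the domain, with
    y_k set to None when the unit was not sampled.
    """
    return sum(y_hat if y is None else y for y, y_hat in units)


# Hypothetical toy domain: two sampled units, two non-sampled units.
units = [(0, 0.2), (1, 0.4), (None, 0.1), (None, 0.3)]

t_hat_eb = eb_total(units)
print(round(t_hat_eb, 3))
```

Unlike the GREG-type estimator, no design weights enter the computation, which is one way to see why the predictor is not protected against design bias.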


Poverty gap for domains

Relative median at-risk-of-poverty gap

Poverty gap in domain d describes the difference between the poor people's median income and the at-risk-of-poverty threshold t

\[ g_d = \frac{t - \mathrm{Md}\{y_k;\ y_k < t;\ k \in U_d\}}{t}, \quad d = 1, \dots, D \]

Estimators of poverty gap

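Default estimator

The default estimator for domain $d$ is calculated from the sample values $y_k$:

\[ \hat{g}_d = \frac{\hat{t}_{HT} - \mathrm{Md}\{y_k;\ y_k < \hat{t}_{HT};\ k \in s_d\}}{\hat{t}_{HT}}, \quad d = 1, \dots, D, \]

where $\hat{t}_{HT}$ is the HT estimator of the poverty threshold for the whole population.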
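A minimal sketch of the default (direct) poverty gap estimator might look as follows. Two points are assumptions not stated on the slide: the threshold is set to 60% of the estimated median income (the usual Laeken convention), and medians are computed as design-weighted medians. The helper names and the toy data are hypothetical.

```python
def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half of the total."""
    pairs = sorted(zip(values, weights))
    half = 0.5 * sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    raise ValueError("empty input")


def poverty_gap(incomes, weights, threshold):
    """(threshold - median income of the poor) / threshold for one domain."""
    poor = [(y, w) for y, w in zip(incomes, weights) if y < threshold]
    if not poor:
        return float("nan")  # no sampled person below the threshold
    median_poor = weighted_median([y for y, _ in poor], [w for _, w in poor])
    return (threshold - median_poor) / threshold


# Hypothetical sample: incomes, design weights, and domain labels.
incomes = [4, 5, 8, 10, 12, 20, 30]
weights = [1.0] * 7
domain = ["a", "a", "b", "a", "b", "b", "a"]

# Threshold estimated from the whole sample (assumed: 60% of the median).
t_hat = 0.6 * weighted_median(incomes, weights)


def domain_gap(label):
    rows = [(y, w) for y, w, d in zip(incomes, weights, domain) if d == label]
    return poverty_gap([y for y, _ in rows], [w for _, w in rows], t_hat)


gap_a = domain_gap("a")
gap_b = domain_gap("b")  # nan: domain b has no sampled poor
print(gap_a, gap_b)
```

The second domain shows why the direct estimator is fragile in small domains: with few or no sampled poor persons, the median of the poor is unstable or undefined.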


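Synthetic estimator

The synthetic estimator $\hat{g}_{d,\mathrm{SYN}}$ for domain $d$ is calculated from predicted values $\hat{y}_k$ so that people with a prediction smaller than the estimated threshold are classified as poor:

\[ \hat{g}_{d,\mathrm{SYN}} = \frac{\hat{t}_{HT} - \mathrm{Md}\{\hat{y}_k;\ \hat{y}_k < \hat{t}_{HT};\ k \in U_d\}}{\hat{t}_{HT}}, \quad d = 1, \dots, D, \]

where $\hat{y}_k = \mathbf{x}_k'(\hat{\boldsymbol{\beta}} + \hat{\mathbf{u}}_d)$, $k \in U_d$, $d = 1, \dots, D$.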



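The composite estimator incorporates the default estimator and the synthetic estimator:

\[ \hat{g}_{d,\mathrm{COMP}} = \hat{\lambda}_d\, \hat{g}_d + (1 - \hat{\lambda}_d)\, \hat{g}_{d,\mathrm{SYN}}, \]

where $\hat{\lambda}_d$ is an average of

\[ \frac{\widehat{MSE}(\hat{g}_{d,\mathrm{SYN}})}{\widehat{MSE}(\hat{g}_{d,\mathrm{SYN}}) + \widehat{MSE}(\hat{g}_d)} \]

over a domain size class.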
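The weighting scheme of the composite estimator can be illustrated with a short sketch: the raw MSE ratios are averaged within each domain size class, and the averaged weight is then applied to every domain in that class. Function names, size-class labels and all numbers below are hypothetical; the MSE estimates are taken as given.

```python
def composite_weights(mse_syn, mse_default, size_class):
    """Weight for each domain: the ratio MSE_syn / (MSE_syn + MSE_default),
    averaged over all domains that share the same size class."""
    raw = [ms / (ms + md) for ms, md in zip(mse_syn, mse_default)]
    class_mean = {}
    for label in set(size_class):
        ratios = [r for r, c in zip(raw, size_class) if c == label]
        class_mean[label] = sum(ratios) / len(ratios)
    return [class_mean[c] for c in size_class]


def composite(g_default, g_syn, weights):
    """Weighted combination of the direct and the synthetic estimates."""
    return [
        w * gd + (1.0 - w) * gs
        for gd, gs, w in zip(g_default, g_syn, weights)
    ]


# Hypothetical inputs for three domains in two size classes.
mse_syn = [0.02, 0.04, 0.01]
mse_default = [0.06, 0.04, 0.01]
size_class = ["minor", "minor", "major"]
g_default = [0.20, 0.30, 0.25]
g_syn = [0.10, 0.10, 0.15]

lam = composite_weights(mse_syn, mse_default, size_class)
g_comp = composite(g_default, g_syn, lam)
print(lam, g_comp)
```

A large estimated MSE of the synthetic estimator pushes the weight towards the direct estimator, and averaging over a size class stabilises the weights when the individual MSE estimates are noisy.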



Alternative SYN estimator EP-SYN: Expanded prediction SYN estimator $\hat{g}_{d,\mathrm{EP\text{-}SYN}}$

We transform the predictions $\hat{y}_k$ ($k \in U_d$) so that their histogram is similar to that of the observed values $y_k$ ($k \in s$):

- $\hat{q}_c$: percentage points of the distribution of $\hat{y}_k$
- $q_c$: percentage points of the sample values $y_k$
- Find a linear transformation $y_k^* = a + b\hat{y}_k$ ($k \in U_d$) so that the $q_c^* = a + b\hat{q}_c$ are close to the corresponding $q_c$

(Ref. triple-goal estimation, e.g. Judkins and Liu 2000, Rao 2003)
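One possible reading of the expansion step is sketched below. The slide does not say how "close" is measured or which percentage points are used, so this sketch assumes ordinary least squares over the 19 cut points at 5%, 10%, ..., 95%; the function names and the simulated data are likewise hypothetical.

```python
import random
import statistics


def fit_expansion(y_hat, y_obs, n=20):
    """Fit a and b so that a + b * q_hat_c is close, in the least-squares
    sense (an assumption of this sketch), to the sample quantiles q_c."""
    q_hat = statistics.quantiles(y_hat, n=n)  # percentage points of predictions
    q_obs = statistics.quantiles(y_obs, n=n)  # percentage points of sample values
    mean_hat = statistics.fmean(q_hat)
    mean_obs = statistics.fmean(q_obs)
    sxx = sum((qh - mean_hat) ** 2 for qh in q_hat)
    sxy = sum((qh - mean_hat) * (qo - mean_obs) for qh, qo in zip(q_hat, q_obs))
    b = sxy / sxx
    a = mean_obs - b * mean_hat
    return a, b


def expand(y_hat, a, b):
    """Expanded predictions y*_k = a + b * y_hat_k."""
    return [a + b * value for value in y_hat]


# Hypothetical data: predictions shrunk towards the mean by a factor 0.5.
rng = random.Random(1)
y_obs = [rng.gauss(10.0, 4.0) for _ in range(500)]
y_hat = [10.0 + 0.5 * (y - 10.0) for y in y_obs]

a, b = fit_expansion(y_hat, y_obs)
y_star = expand(y_hat, a, b)
print(round(a, 3), round(b, 3))
print(round(statistics.stdev(y_hat), 3), round(statistics.stdev(y_star), 3))
```

In this toy setting the predictions are too concentrated, so the fitted slope exceeds one and the expanded predictions regain the spread of the observed values. That spread matters for the poverty gap, which depends on the lower tail of the distribution.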

MSE estimation for direct estimator DEFAULT


Estimation of $\widehat{MSE}(\hat{g}_d)$ by bootstrap:

An artificial population is generated by cloning each unit with frequency equal to the design weight

Bootstrap samples are drawn with the original sampling design from the artificial population

The variance of the default estimator is then estimated by the sample variance of estimates in the bootstrap samples
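The three bootstrap steps above can be sketched as follows. For brevity this sketch assumes an SRSWOR design and rounds the design weights to integers when cloning; under the stratified design the resampling would be done within strata. The function name, the number of replicates and the toy data are hypothetical.

```python
import random
import statistics


def bootstrap_variance(y, weights, estimator, n_boot=200, seed=1):
    """Bootstrap variance of `estimator` via an artificial population.

    Each sampled value is cloned round(weight) times, bootstrap samples of
    the original size are drawn without replacement (SRSWOR is assumed
    here), and the sample variance of the replicates is returned.
    """
    rng = random.Random(seed)
    pseudo_population = []
    for value, weight in zip(y, weights):
        pseudo_population.extend([value] * round(weight))
    replicates = [
        estimator(rng.sample(pseudo_population, len(y)))
        for _ in range(n_boot)
    ]
    return statistics.variance(replicates)


# Hypothetical sample of 40 incomes, each with design weight 25.
data_rng = random.Random(7)
y = [data_rng.lognormvariate(9.5, 0.6) for _ in range(40)]
weights = [25.0] * len(y)

var_median = bootstrap_variance(y, weights, statistics.median)
print(var_median)
```

Any estimator that maps a sample to a number can be passed in, so the same routine could be applied to a poverty gap function for one domain, provided the domain labels are resampled together with the incomes.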

MSE estimation for SYN estimator


Estimation of $\widehat{MSE}(\hat{g}_{d,\mathrm{SYN}})$:

\[ \widehat{MSE}(\hat{g}_{d,\mathrm{SYN}}) = (\hat{g}_{d,\mathrm{SYN}} - \hat{g}_d)^2 - \widehat{MSE}(\hat{g}_d) \]

Rao (2003, p. 52) and Fabrizi et al. (2007)

Alternative estimation of $\widehat{MSE}(\hat{g}_{d,\mathrm{SYN}})$:

Parametric bootstrap similar to Molina and Rao (2009)
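The first MSE estimator for SYN is simple arithmetic, and a two-line example shows one practical feature of it: for a single domain the estimate can turn out negative. The function name and the numbers are hypothetical.

```python
def mse_syn_estimate(g_syn, g_default, mse_default):
    """Squared difference between the synthetic and the direct estimate,
    minus the estimated MSE of the direct estimate."""
    return (g_syn - g_default) ** 2 - mse_default


# Hypothetical domain where the two estimates differ clearly ...
mse_far = mse_syn_estimate(g_syn=0.30, g_default=0.20, mse_default=0.004)
# ... and one where they nearly agree, which gives a negative estimate.
mse_near = mse_syn_estimate(g_syn=0.22, g_default=0.20, mse_default=0.004)
print(round(mse_far, 4), round(mse_near, 4))
```

Negative values for individual domains are one reason why such estimates are typically averaged over groups of domains before use, as is done for the composite weights.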

Monte Carlo simulation

- Fixed finite population of 1,000,000 persons
- D = 70 domains of interest: cross-classification of NUTS 3 with sex and age group (7x2x5)

Y-variables
- Equivalized income (based on register data)
- Binary indicator for persons in poverty

X-variables (binary or continuous variables)
- house_owner (binary)
- education_level (7 classes) and educ_thh
- lfs_code (3 classes) and empmohh
- socstrat (6 classes)
- sex_class and age_class (5 age classes)
- NUTS3


Sampling designs

SRSWOR sampling
- Sample size n = 5,000 persons

Stratified SRSWOR
- Sample size n = 5,000 persons
- Stratification by education level of HH head
- H = 7 strata
- Unequal inclusion probabilities
- Design weights vary between strata (min: 185, max: 783)

K = 1000 independent samples
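To make the link between stratification and design weights concrete, here is a minimal sketch of stratified SRSWOR in which every unit sampled from stratum h receives the weight N_h / n_h. The stratum sizes and allocations are invented for the example and are unrelated to the allocation used in the study.

```python
import random


def stratified_srswor(strata, n_h, seed=1):
    """Draw an SRSWOR sample of size n_h[h] within each stratum h.

    Returns (unit index, design weight) pairs; the weight N_h / n_h is the
    inverse of the inclusion probability n_h / N_h.
    """
    rng = random.Random(seed)
    sample = []
    for h, n in n_h.items():
        members = [i for i, label in enumerate(strata) if label == h]
        weight = len(members) / n
        sample.extend((i, weight) for i in rng.sample(members, n))
    return sample


# Hypothetical population with three strata of very different sizes.
strata = [0] * 1000 + [1] * 500 + [2] * 100
n_h = {0: 10, 1: 10, 2: 10}  # equal allocation gives unequal weights

sample = stratified_srswor(strata, n_h)
total_weight = sum(w for _, w in sample)
print(len(sample), total_weight)
```

The design weights sum to the population size, and equal sample sizes in unequal strata are enough to produce widely varying weights between strata.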


Quality measures of estimators

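Design bias: absolute relative bias ARB (%)

\[ \mathrm{ARB}_d = \left| \frac{1}{K} \sum_{k=1}^{K} \hat{\theta}_d(s_k) - \theta_d \right| \Big/ \theta_d \]

Accuracy: relative root mean squared error RRMSE (%)

\[ \mathrm{RRMSE}_d = \sqrt{\frac{1}{K} \sum_{k=1}^{K} \left( \hat{\theta}_d(s_k) - \theta_d \right)^2} \Big/ \theta_d \]

where $\hat{\theta}_d(s_k)$ is the estimate for domain $d$ from the $k$-th simulated sample $s_k$ and $\theta_d$ is the true domain value.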
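Both quality measures are simple to compute once the K simulated estimates for a domain are available. The sketch below returns them as percentages; the function name and the four toy estimates are hypothetical.

```python
import math


def arb_rrmse(estimates, true_value):
    """Absolute relative bias and relative root mean squared error, in
    percent, of simulated estimates of one domain parameter."""
    k = len(estimates)
    mean_estimate = sum(estimates) / k
    arb = abs(mean_estimate - true_value) / true_value
    mse = sum((est - true_value) ** 2 for est in estimates) / k
    rrmse = math.sqrt(mse) / true_value
    return 100.0 * arb, 100.0 * rrmse


# Hypothetical estimates from K = 4 samples; the true domain value is 1.0.
arb, rrmse = arb_rrmse([0.9, 1.1, 1.2, 1.0], true_value=1.0)
print(round(arb, 2), round(rrmse, 2))
```

RRMSE combines bias and variance, so an estimator with a clearly nonzero ARB can still reach a lower RRMSE than a nearly unbiased estimator with a large variance.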


Table 1. Poverty rate estimators with logistic mixed model including NUTS3 level random intercepts
Unequal probability sampling: Stratified SRS (by education level)
Predictors: house_owner, age_class, sex_class, lfs_code, education
Domains: NUTS3 x age x sex (D = 70 domains)

                           Average ARB (%)            Average RRMSE (%)
                           by domain size class       by domain size class
Estimator                  Minor   Medium  Major      Minor   Medium  Major
                           20-49   50-99   100-       20-49   50-99   100-

Design-based estimators
  MLGREG                     2.2     2.3     1.3       48.8    31.9    21.8

Model-based estimators
  EBLUP (EB)                14.4    10.6     4.4       20.4    17.0    10.8

Table 2. Poverty gap estimators with linear mixed model fitted to log(income+1) including NUTS3 level random intercepts
SRSWOR sampling
Predictors: house_owner, educ_thh, empmohh, lfs_code, socstrat
Domains: NUTS3 x age x sex (D = 70 domains)

                           Average ARB (%)            Average RRMSE (%)
                           by domain size class       by domain size class
Estimator                  Minor   Medium  Major      Minor   Medium  Major
                           20-49   50-99   100-       20-49   50-99   100-

Direct estimator
  DEFAULT                   12.1     4.4     1.8       65.8    43.6    27.3

Model-based estimators
  SYN                       40.1    43.4    57.5       61.5    57.1    62.1
  EP-SYN                    17.0    19.6    16.6       23.8    25.4    22.9

Composite estimator
  COMP (with DEFAULT        10.9    14.4    11.9       25.6    22.4    18.6
  and EP-SYN)


Discussion: Poverty rate

Indirect design-based estimator MLGREG
- Nearly design unbiased
- Large variance in small domains
- Small variance in large domains

Indirect model-based estimator EB
- Design biased
- Small variance also in small domains
- Accuracy: EB outperformed MLGREG
- Might be the best choice, at least for small domains, unless it is important to avoid design bias


Discussion: Poverty gap

Direct estimator DEFAULT
- Small design bias but large variance

Indirect model-based SYN
- Very large bias but small variance

Indirect model-based EP-SYN based on expanded predictions
- Much smaller bias and variance than in SYN

Composite (DEFAULT with EP-SYN)
- Small domains: good compromise
- Large domains: bias can still dominate the MSE


References

Fabrizi, E., M. R. Ferrante and S. Pacei (2007). Comparing alternative distributional assumptions in mixed models used for small area estimation of income parameters. Statistics in Transition 8, 423-439.

Jiang, J. and P. Lahiri (2006). Mixed model prediction and small area estimation. Test (Sociedad de Estadística e Investigación Operativa) 15, 1-96.

Judkins, D. R. and J. Liu (2000). Correcting the bias in the range of a statistic across small areas. Journal of Official Statistics 16, 1-13.

Lehtonen, R. and A. Veijanen (2009). Design-based methods of estimation for domains and small areas. In: C. R. Rao and D. Pfeffermann (eds.), Handbook of Statistics 29B. Sample Surveys: Inference and Analysis. Elsevier.

Molina, I. and J. N. K. Rao (2009). Estimation of poverty measures in small areas. (Manuscript)

Rao, J. N. K. (2003). Small Area Estimation. John Wiley & Sons, New York.

Thank you for your attention!
