Challenges in small area estimation of poverty indicators
Risto Lehtonen, Ari Veijanen, Maria Valaste (University of Helsinki), and Mikko Myrskylä (Max Planck Institute for Demographic Research, Rostock)
Ameli 2010 Conference, 25-26 February 2010, Vienna
Outline
Background
Material and methods
Results
Discussion
References
EU/FP7 Project AMELI
Advanced Methodology for European Laeken Indicators (2008-2011)
The project is supported by European Commission funding from the Seventh Framework Programme for Research
DoW: The study will include research on data quality, including
- Measurement of quality
- Treatment of outliers and nonresponse
- Small area estimation
- The measurement of development over time
Material and methods
Investigation of statistical properties (bias and accuracy) of estimators of selected Laeken indicators for population subgroups or domains and small areas
Method: Design-based Monte Carlo simulation experiments based on real data
Data: Statistical register data based on merging of administrative register data at the unit level (Finland)
Laeken indicators based on binary variables
At-risk-of-poverty rate
Direct estimators
- Horvitz-Thompson estimators HT
Indirect estimators
- Model-assisted GREG and MC estimators
- Model-based EBLUP and EB estimators
Modelling framework
- Generalized linear mixed models GLMM
Lehtonen and Veijanen (2009), Rao (2003), Jiang and Lahiri (2006)
Laeken indicators based on medians or quantiles
Indicators based on medians or quantiles of cumulative distribution function of the underlying continuous variable
- Relative median at-risk-of-poverty gap
- Quintile share ratio (S20/S80 ratio)
- Gini coefficient

- Direct estimators DEFAULT
- Synthetic estimators SYN
- Expanded prediction SYN estimators EP-SYN
- Composite estimators COMP
- Simulation-based methods
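Generalized linear mixed models
Model formulation:
$$E_m(y_k \mid \mathbf{u}_d) = f(\mathbf{x}_k'(\boldsymbol{\beta} + \mathbf{u}_d)), \quad k \in U_d, \quad d = 1, \ldots, D,$$
where
$f(\cdot)$ refers to the chosen functional form
$\mathbf{x}_k = (1, x_{1k}, \ldots, x_{pk})'$
$\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_p)'$ are fixed effects
$\mathbf{u}_d = (u_{0d}, \ldots, u_{pd})'$ are domain-specific (area-specific) random effects
Fitted values are
$$\hat{y}_k = f(\mathbf{x}_k'(\hat{\boldsymbol{\beta}} + \hat{\mathbf{u}}_d)), \quad k \in U_d$$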
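Design-based GREG type estimators for poverty rate
GREG estimators MLGREG
$$\hat{t}_{d,MLGREG} = \sum_{k \in U_d} \hat{y}_k + \sum_{k \in s_d} a_k \hat{e}_k, \quad d = 1, \ldots, D,$$
where
$a_k = 1/\pi_k$
$\hat{e}_k = y_k - \hat{y}_k$
$\hat{y}_k = f(\mathbf{x}_k'(\hat{\boldsymbol{\beta}} + \hat{\mathbf{u}}_d)), \quad k \in U_d$
$f$ refers to logistic function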
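The MLGREG formula can be illustrated with a minimal Python sketch. It assumes the fitted values come from an already fitted logistic mixed model; the linear predictors, outcomes and inclusion probabilities below are invented toy numbers, and the function names are hypothetical.

```python
import math

def logistic(z):
    """Logistic function f(z) = exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + math.exp(-z))

def mlgreg_total(y_hat_pop, sample):
    """Sum of fitted values over U_d plus design-weighted sample residuals.

    y_hat_pop: fitted values y_hat_k for every unit k in the domain U_d
    sample:    (y_k, y_hat_k, pi_k) for every sampled unit k in s_d
    """
    synthetic_part = sum(y_hat_pop)
    # a_k * e_hat_k with a_k = 1 / pi_k and e_hat_k = y_k - y_hat_k
    correction = sum((y - y_hat) / pi for y, y_hat, pi in sample)
    return synthetic_part + correction

# Toy domain of four units; linear predictors are assumed, not estimated here.
linear_predictors = [0.0, 0.0, math.log(3.0), -math.log(3.0)]
y_hat_pop = [logistic(eta) for eta in linear_predictors]  # 0.5, 0.5, 0.75, 0.25

# Units 0 and 2 are sampled, each with inclusion probability 0.5.
sample = [(1, y_hat_pop[0], 0.5), (0, y_hat_pop[2], 0.5)]
t_hat = mlgreg_total(y_hat_pop, sample)
```

The residual correction is what protects the estimator against model misspecification: if the model predicts well, the correction is small, and if it predicts badly, the weighted residuals pull the total back toward the design-based value.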
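Model-based estimators for poverty rate
EBLUP and EB type estimators
$$\hat{t}_{d,EBLUP} = \sum_{k \in s_d} y_k + \sum_{k \in U_d - s_d} \hat{y}_k, \quad d = 1, \ldots, D,$$
where
$\hat{y}_k = f(\mathbf{x}_k'(\hat{\boldsymbol{\beta}} + \hat{\mathbf{u}}_d)), \quad k \in U_d$
$f$ refers to logistic function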
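A minimal sketch of the EBLUP-type total follows, again with invented toy values and hypothetical function names. Dividing the total by the domain size N_d to obtain a rate is an assumption of this sketch; the slide defines only the total.

```python
def eblup_total(y_sample, y_hat_nonsample):
    """Observed values for sampled units plus predictions for the rest of U_d."""
    return sum(y_sample) + sum(y_hat_nonsample)

def poverty_rate(total, domain_size):
    """Assumed conversion of an estimated total of poor persons into a rate."""
    return total / domain_size

# Two sampled units with observed poverty indicators,
# three non-sampled units with model predictions.
y_sample = [1, 0]
y_hat_nonsample = [0.25, 0.5, 0.25]

t_hat = eblup_total(y_sample, y_hat_nonsample)
rate = poverty_rate(t_hat, len(y_sample) + len(y_hat_nonsample))
```

Unlike MLGREG, no design weights enter the formula, which is why the estimator is design biased when the model does not hold in a domain.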
Poverty gap for domains
Relative median at-risk-of-poverty gap
Poverty gap in domain d describes the difference between the poor people's median income and the at-risk-of-poverty threshold t
$$g_d = \frac{t - Md\{y_k;\ y_k < t;\ k \in U_d\}}{t}, \quad d = 1, \ldots, D$$
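As a worked illustration of the definition, the sketch below computes the gap for one domain from known population incomes. The incomes and the threshold are invented, and returning NaN for a domain without poor persons is an assumption of this sketch.

```python
from statistics import median

def poverty_gap(incomes, threshold):
    """Relative distance between the threshold and the median income of the poor."""
    poor = [y for y in incomes if y < threshold]
    if not poor:
        return float("nan")  # assumption: gap left undefined without poor persons
    return (threshold - median(poor)) / threshold

domain_incomes = [400, 600, 800, 1200, 2000, 3000]
t = 1000
g = poverty_gap(domain_incomes, t)  # poor incomes: 400, 600, 800; median 600
```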
Estimators of poverty gap
Default estimator for domain d is calculated from the sample values $y_k$:
$$\hat{g}_d = \frac{\hat{t}_{HT} - Md\{y_k;\ y_k < \hat{t}_{HT};\ k \in s_d\}}{\hat{t}_{HT}}, \quad d = 1, \ldots, D,$$
where $\hat{t}_{HT}$ is HT estimator of poverty threshold for the whole population
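The default estimator can be sketched as follows. Two points are assumptions of this sketch rather than statements from the slides: the threshold is taken as 60% of the design-weighted median income of the whole sample (the usual Laeken definition), and the median of the poor in the domain is also design-weighted. The sample records are invented.

```python
def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    pairs = sorted(zip(values, weights))
    half = 0.5 * sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    raise ValueError("weighted_median needs at least one value")

def default_gap(domain_units, t_hat):
    """Default (direct) poverty gap from (income, weight) pairs of one domain."""
    poor = [(y, w) for y, w in domain_units if y < t_hat]
    if not poor:
        return float("nan")  # assumption: undefined without sampled poor persons
    incomes, weights = zip(*poor)
    return (t_hat - weighted_median(incomes, weights)) / t_hat

# Toy sample: (income, design weight, domain label)
sample = [(300, 100, "A"), (400, 200, "A"), (500, 100, "A"), (1000, 200, "B"),
          (2000, 100, "A"), (3000, 100, "B"), (4000, 100, "B")]

# Threshold estimated from the whole sample (assumed 60% of weighted median).
t_hat = 0.6 * weighted_median([y for y, _, _ in sample], [w for _, w, _ in sample])

domain_a = [(y, w) for y, w, d in sample if d == "A"]
g_hat_a = default_gap(domain_a, t_hat)
```

Because only the few poor persons sampled in the domain enter the median, the estimate is unstable in small domains, which matches the large RRMSE reported for DEFAULT.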
Estimators of poverty gap
Synthetic estimator for domain d is calculated from predicted values $\hat{y}_k$ so that people with prediction smaller than the estimated threshold are classified as poor:
$$\hat{g}_{d,SYN} = \frac{\hat{t}_{HT} - Md\{\hat{y}_k;\ \hat{y}_k < \hat{t}_{HT};\ k \in U_d\}}{\hat{t}_{HT}}, \quad d = 1, \ldots, D,$$
where
$\hat{y}_k = \mathbf{x}_k'(\hat{\boldsymbol{\beta}} + \hat{\mathbf{u}}_d), \quad k \in U_d, \quad d = 1, \ldots, D$
Estimators of poverty gap
Composite estimator incorporates the default estimator and the synthetic estimator:
$$\hat{g}_{d,COMP} = \hat{\lambda}_d \hat{g}_d + (1 - \hat{\lambda}_d)\, \hat{g}_{d,SYN}$$
where $\hat{\lambda}_d$ is an average of
$$\frac{\widehat{MSE}(\hat{g}_{d,SYN})}{\widehat{MSE}(\hat{g}_{d,SYN}) + \widehat{MSE}(\hat{g}_d)}$$
over a domain size class.
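A small numerical sketch of the composite weighting, with invented MSE estimates for two domains in the same size class (function names are hypothetical):

```python
def composite_weight(mse_syn, mse_direct):
    """Average of MSE_syn / (MSE_syn + MSE_direct) over domains in one size class."""
    ratios = [ms / (ms + md) for ms, md in zip(mse_syn, mse_direct)]
    return sum(ratios) / len(ratios)

def composite_gap(g_direct, g_syn, lam):
    """Convex combination of the default and the synthetic estimate."""
    return lam * g_direct + (1.0 - lam) * g_syn

# Estimated MSEs for two domains of the same size class (toy values).
mse_syn = [0.03, 0.01]
mse_direct = [0.01, 0.01]
lam = composite_weight(mse_syn, mse_direct)  # mean of 0.75 and 0.5

g_comp = composite_gap(g_direct=0.2, g_syn=0.4, lam=lam)
```

The weight on the default estimator grows when the synthetic estimator has the larger estimated MSE. Averaging the ratio over a size class, rather than using each domain's own ratio, smooths out the noise in the individual MSE estimates.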
Estimators of poverty gap
Alternative SYN estimator EP-SYN: Expanded prediction SYN estimator $\hat{g}_{d,EP\text{-}SYN}$
We transform predictions $\hat{y}_k$ ($k \in U_d$) so that they have a histogram similar to that of the observed values $y_k$ ($k \in s$)
$\hat{q}_c$: percentage points of the distribution of $\hat{y}_k$
$q_c$: percentage points of the sample values $y_k$
Find a linear transformation $\hat{y}_k^* = a + b\hat{y}_k$ ($k \in U_d$) so that $\hat{q}_c^* = a + b\hat{q}_c$ are close to the corresponding $q_c$
(Ref. triple-goal estimation, e.g. Judkins and Liu 2000, Rao 2003)
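One possible reading of "close to" is ordinary least squares over the percentage points; the slides do not say which criterion was used, so the fitting rule below is an assumption, as are the use of deciles and the toy data. The sketch uses `statistics.quantiles` from the Python standard library (Python 3.8 or later).

```python
from statistics import mean, quantiles

def fit_expansion(predictions, observed, n=10):
    """Least-squares a, b such that a + b * q_hat_c is close to q_c.

    q_hat_c and q_c are the n-quantile cut points of the predictions and of
    the observed sample values (deciles by default, an assumed choice).
    """
    q_hat = quantiles(predictions, n=n, method="inclusive")
    q_obs = quantiles(observed, n=n, method="inclusive")
    m_hat, m_obs = mean(q_hat), mean(q_obs)
    sxx = sum((x - m_hat) ** 2 for x in q_hat)
    sxy = sum((x - m_hat) * (y - m_obs) for x, y in zip(q_hat, q_obs))
    b = sxy / sxx
    a = m_obs - b * m_hat
    return a, b

# Toy data: predictions are far less dispersed than the observed incomes.
predictions = [100.0 + i for i in range(21)]      # range 100 .. 120
observed = [50.0 + 6.0 * i for i in range(21)]    # range 50 .. 170

a, b = fit_expansion(predictions, observed)
expanded = [a + b * y_hat for y_hat in predictions]
```

Model predictions are typically shrunk toward the mean, so too few predicted incomes fall below the threshold; a slope b greater than one stretches the predictions back to the spread seen in the sample.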
MSE estimation for direct estimator DEFAULT
Estimation of $\widehat{MSE}(\hat{g}_d)$ by bootstrap:
An artificial population is generated by cloning each unit with frequency equal to the design weight
Bootstrap samples are drawn with the original sampling design from the artificial population
The variance of the default estimator is then estimated by the sample variance of estimates in the bootstrap samples
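The three bootstrap steps can be sketched as below. The sketch assumes SRSWOR as the original design, rounds design weights to integers when cloning, and uses the sample median as a stand-in estimator; all names and data are illustrative.

```python
import random
from statistics import median, variance

def clone_population(sample_units, design_weights):
    """Artificial population: each sampled unit repeated round(weight) times."""
    population = []
    for unit, weight in zip(sample_units, design_weights):
        population.extend([unit] * round(weight))  # rounding is an assumption
    return population

def bootstrap_variance(sample_units, design_weights, estimator, n_boot=200, seed=1):
    """Sample variance of the estimator over SRSWOR bootstrap samples."""
    rng = random.Random(seed)
    population = clone_population(sample_units, design_weights)
    n = len(sample_units)
    estimates = [estimator(rng.sample(population, n)) for _ in range(n_boot)]
    return variance(estimates)

sample_incomes = [500, 700, 900, 1100, 1300, 1500, 1700, 1900, 2100, 2300]
weights = [5.0] * len(sample_incomes)

artificial_population = clone_population(sample_incomes, weights)
v_boot = bootstrap_variance(sample_incomes, weights, median)
```

For the poverty gap, the `estimator` argument would be a function that recomputes the threshold and the domain gap from each bootstrap sample, and a stratified design would require drawing within strata instead of one `rng.sample` call.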
MSE estimation for SYN estimator
Estimation of $\widehat{MSE}(\hat{g}_{d,SYN})$:
$$\widehat{MSE}(\hat{g}_{d,SYN}) = (\hat{g}_{d,SYN} - \hat{g}_d)^2 - \widehat{MSE}(\hat{g}_d)$$
Rao (2003 p. 52) and Fabrizi et al. (2007)
Alternative estimation of $\widehat{MSE}(\hat{g}_{d,SYN})$:
Parametric bootstrap similar to Molina and Rao (2009)
Monte Carlo simulation
- Fixed finite population of 1,000,000 persons
- D = 70 domains of interest: cross-classification of NUTS 3 with sex and age group (7 x 2 x 5)
Y-variables
- Equivalized income (based on register data)
- Binary indicator for persons in poverty
X-variables (binary or continuous variables)
- house_owner (binary)
- education_level (7 classes) and educ_thh
- lfs_code (3 classes) and empmohh
- socstrat (6 classes)
- sex_class and age_class (5 age classes)
- NUTS3
Sampling designs
SRSWOR sampling
- Sample size n = 5,000 persons
Stratified SRSWOR
- Sample size n = 5,000 persons
- Stratification by education level of HH head
- H = 7 strata
- Unequal inclusion probabilities
- Design weights vary between strata (min: 185, max: 783)
K = 1000 independent samples
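A minimal sketch of stratified SRSWOR with design weights N_h / n_h follows. The stratum labels, sizes and allocations are invented and much smaller than in the simulation described above.

```python
import random

def stratified_srswor(strata, n_h, seed=0):
    """Draw an SRSWOR sample within each stratum.

    strata: dict mapping stratum label to the list of population units
    n_h:    dict mapping stratum label to the stratum sample size
    Returns (unit, stratum, design_weight) triples with weight N_h / n_h.
    """
    rng = random.Random(seed)
    sample = []
    for label, units in strata.items():
        weight = len(units) / n_h[label]
        for unit in rng.sample(units, n_h[label]):
            sample.append((unit, label, weight))
    return sample

# Hypothetical two-stratum population of unit identifiers.
strata = {"low_education": list(range(1000)),
          "high_education": list(range(1000, 1200))}
n_h = {"low_education": 5, "high_education": 4}

sample = stratified_srswor(strata, n_h)
weights = sorted({w for _, _, w in sample})  # one distinct weight per stratum
```

Repeating the draw K times with different seeds gives the independent samples over which the quality measures are computed. The design weights sum to the population size, which is a quick sanity check on any allocation.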
Quality measures of estimators
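Design bias: absolute relative bias ARB (%)
$$ARB_d = \left| \frac{1}{K} \sum_{k=1}^{K} \hat{\theta}_d(s_k) - \theta_d \right| \Big/ \theta_d$$
Accuracy: relative root mean squared error RRMSE (%)
$$RRMSE_d = \sqrt{\frac{1}{K} \sum_{k=1}^{K} \left( \hat{\theta}_d(s_k) - \theta_d \right)^2} \Big/ \theta_d$$
where $\hat{\theta}_d(s_k)$ is the estimate for domain d computed from simulated sample $s_k$ and $\theta_d$ is the true population value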
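Both quality measures are simple to compute once the K simulated estimates for a domain are available. The sketch below uses four invented estimates for one domain and reports the measures in percent.

```python
import math

def arb_percent(estimates, true_value):
    """Absolute relative bias over simulated samples, in percent."""
    mean_estimate = sum(estimates) / len(estimates)
    return 100.0 * abs(mean_estimate - true_value) / true_value

def rrmse_percent(estimates, true_value):
    """Relative root mean squared error over simulated samples, in percent."""
    mse = sum((e - true_value) ** 2 for e in estimates) / len(estimates)
    return 100.0 * math.sqrt(mse) / true_value

# Estimates of one domain's poverty rate from K = 4 simulated samples.
estimates = [0.18, 0.22, 0.21, 0.23]
true_rate = 0.20

arb = arb_percent(estimates, true_rate)      # mean estimate 0.21, bias 0.01
rrmse = rrmse_percent(estimates, true_rate)
```

The tables that follow report these measures averaged over all domains in a size class.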
Table 1. Poverty rate estimators with logistic mixed model including NUTS3 level random intercepts
Unequal probability sampling: Stratified SRS (by education level)
Predictors: house_owner, age_class, sex_class, lfs_code, education
Domains: NUTS3 x age x sex (D = 70 domains)

                           Average ARB (%)             Average RRMSE (%)
                           by domain size class        by domain size class
Estimator                  Minor   Medium  Major       Minor   Medium  Major
                           20-49   50-99   100-        20-49   50-99   100-
Design-based estimators
MLGREG                       2.2     2.3     1.3        48.8    31.9    21.8
Model-based estimators
EBLUP (EB)                  14.4    10.6     4.4        20.4    17.0    10.8
Table 2. Poverty gap estimators with linear mixed model fitted to log(income+1) including NUTS3 level random intercepts
SRSWOR sampling
Predictors: house_owner, educ_thh, empmohh, lfs_code, socstrat
Domains: NUTS3 x age x sex (D = 70 domains)

                                  Average ARB (%)             Average RRMSE (%)
                                  by domain size class        by domain size class
Estimator                         Minor   Medium  Major       Minor   Medium  Major
                                  20-49   50-99   100-        20-49   50-99   100-
Direct estimator
DEFAULT                            12.1     4.4     1.8        65.8    43.6    27.3
Model-based estimators
SYN                                40.1    43.4    57.5        61.5    57.1    62.1
EP-SYN                             17.0    19.6    16.6        23.8    25.4    22.9
Composite estimator
COMP (with DEFAULT and EP-SYN)     10.9    14.4    11.9        25.6    22.4    18.6
Discussion: Poverty rate
Indirect design-based estimator MLGREG
- Design unbiased
- Large variance in small domains
- Small variance in large domains
Indirect model-based estimator EB
- Design biased
- Small variance also in small domains
- Accuracy: EB outperformed MLGREG in every domain size class (average RRMSE 20.4% vs 48.8% in the minor class, Table 1)
- Might be the best choice, at least for small domains, unless it is important to avoid design bias
Discussion: Poverty gap
Direct estimator DEFAULT
- Small design bias but large variance
Indirect model-based SYN
- Very large bias but small variance
Indirect model-based EP-SYN based on expanded predictions
- Much smaller bias and variance than in SYN
Composite (DEFAULT with EP-SYN)
- Small domains: good compromise
- Large domains: bias can still dominate the MSE
References
Fabrizi, E., M. R. Ferrante and S. Pacei (2007). Comparing alternative distributional assumptions in mixed models used for small area estimation of income parameters. Statistics in Transition 8, 423-439.
Jiang, J. and P. Lahiri (2006). Mixed model prediction and small area estimation. Sociedad de Estadística e Investigación Operativa, Test 15, 1-96.
Judkins, D. R. and J. Liu (2000). Correcting the bias in the range of a statistic across small areas. Journal of Official Statistics 16, 1-13.
Lehtonen, R. and A. Veijanen (2009). Design-based methods of estimation for domains and small areas. In: C. R. Rao and D. Pfeffermann (eds.), Handbook of Statistics 29B. Sample Surveys: Inference and Analysis. Elsevier.
Molina, I. and J. N. K. Rao (2009). Estimation of poverty measures in small areas. (Manuscript)
Rao, J. N. K. (2003). Small Area Estimation. John Wiley & Sons, New York.
Thank you for your attention!