Sampling strategy for the dual-system correction of the under-coverage in the Register Supported 2011 Italian Population Census

Loredana Di Consiglio, Marco Fortini, Stefano Falorsi

ISTAT

Q2010 European Conference on Quality in Official Statistics - Helsinki, 4 - 6 May 2010


Outline

Purpose: to plan a sampling strategy that accounts for municipal undercoverage in the next Italian Census round

Sketch of the 2011 Italian Census

Sources of data useful in planning the Post Enumeration Survey (PES)

Sampling strategies considered for comparison

Construction of a fictitious, but plausible, population for simulating the sampling universe

Results of the simulation study


Key innovations of the 2011 Italian census

From the traditional enumeration method…

Search for households and people in the field

… to a register-supported census

Municipal population registers used to mail out questionnaires to people

Data collection based on web, mail back and municipal data collection centres

Reduction of the number of enumerators

Data collection from late respondents

Coverage evaluation activities


Coverage evaluation program

Required for the Eurostat quality report, it is in any case crucial in this context of extensive process and method innovations

Over-coverage: people no longer living in the municipality who are still listed in the population registers

  Checked by interviewers when contacting late respondents

Under-coverage: people living in the municipality who are not yet listed in the population registers

  Supplemental lists of people

  Extensive search in the field

  Statistical estimation based on capture-recapture techniques


Overview of Italian census undercount

Gross undercoverage of population registers

  Estimated by Fortini and Gallo (2009) at about 400,000 people (up to 560,000) through administrative data and mixture model analysis to account for underreporting in the source

Gross undercoverage of the 2001 Census (enumeration based)

  The 2001 Post Enumeration Survey estimates that about 800,000 people were missed

Both estimates are based on strong assumptions

However, this evidence makes it reasonable to use municipal population registers as the main source for household enumeration


Capture-Recapture Approach

Main goal: municipality estimates of population counts

Correction for population register undercount through a second source based on independent field enumeration

  $x_{1+}$: people listed in the municipal register

  $\hat{x}_{+1}$: estimate of the municipal population based on a field enumeration survey in a sample of enumeration areas (EAs)

  $\hat{x}_{11}$: estimate of the people that would have been counted by both sources if field enumeration had been carried out on the whole municipal area

The Petersen estimator of the hidden population is (Wolter, 1986)

$$\tilde{N} = x_{1+} + \frac{\hat{x}_{+1} - \hat{x}_{11}}{\hat{x}_{11}}\, x_{1+} = \frac{x_{1+}\,\hat{x}_{+1}}{\hat{x}_{11}}$$
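As a minimal sketch of the Petersen (dual-system) estimator described on this slide, the function below multiplies the register count by the survey estimate and divides by the estimated number of people found in both sources. The function and argument names are illustrative, not taken from the presentation; the input values are the municipal totals of the worked example for municipality 1015 shown later in the deck.

```python
def petersen_estimate(x_register: float, x_survey_hat: float, x_both_hat: float) -> float:
    """Dual-system estimate: register count times survey estimate,
    divided by the estimated count of people present in both sources."""
    if x_both_hat <= 0:
        raise ValueError("the matched count must be positive")
    return x_register * x_survey_hat / x_both_hat


# Municipal totals of the example municipality 1015 (P. Reg, Survey, Both).
n_tilde = petersen_estimate(x_register=746, x_survey_hat=752, x_both_hat=743)
print(round(n_tilde, 1))  # prints 755.0
```

With these totals the estimate is close to the true count of 755 assigned to the same municipality in the pseudo-population.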


Sampling design for the 2011 Post-Enumeration Survey

About 1,300 municipalities and 1,200,000 people will be sampled

Two alternative two-stage sampling designs, with municipalities and enumeration areas as primary and secondary sampling units

  Design A - region by class of population size (less than 5,000; 5,000-20,000; 20,000-50,000; more than 50,000)

  Design B - aggregation of provinces within region by the 4 classes of population size (helps in reducing the bias of small area estimation, SAE)

Stratification and selection of municipalities according to their population size is considered for both designs

It is necessary to sample among municipalities in order to control costs


Estimators

Direct estimates of census counts are available only at planned domain level

  Small area estimation methods are needed at least for municipalities not included in the sample

Possible predictors for area-level modelling

  Population counts coming from the register

  Demographic indicators (e.g. dependency ratios)

  Socio-economic indicators

In what follows we consider

  Direct estimation at regional level (planned domains)

  Synthetic estimator at municipality level

    Assumption of invariance of municipal under-coverage rates within each planned domain


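Direct Estimators

Simple expansion estimators

$$\hat{N}_{m(E)} = X_{m,1+}\,\frac{\hat{X}_{m,+1(E)}}{\hat{X}_{m,11(E)}}$$

$$\hat{X}_{m,+1(E)} = \sum_{i \in m} w_i\, x_{i,+1} \qquad \hat{X}_{m,11(E)} = \sum_{i \in m} w_i\, x_{i,11}$$

$$w_i = C_m / c_m$$

  $w_i$: inverse of the selection probability

Calibrated expansion estimators

$$\hat{N}_{m(C)} = X_{m,1+}\,\frac{\hat{X}_{m,+1(C)}}{\hat{X}_{m,11(C)}}$$

$$\hat{X}_{m,+1(C)} = \sum_{i \in m} w_{iC}\, x_{i,+1} \qquad \hat{X}_{m,11(C)} = \sum_{i \in m} w_{iC}\, x_{i,11}$$

$$w_{iC} = X_{m,1+} \Big/ \sum_{i \in m} x_{i,1+}$$

  $w_{iC}$: final weight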
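The two weighting systems on this slide can be illustrated with a short sketch. It assumes, which the slide does not state explicitly, that C_m is the number of EAs in municipality m, that c_m is the number of sampled EAs, and that the calibrated weight is the municipal register count divided by the register count of the sampled EAs. The sampled EAs are EAs 2, 3 and 6 of the example municipality 1015 shown later in the deck; all function names are illustrative.

```python
def weighted_totals(sample, weight):
    """Weighted totals of the survey count and of the matched count.

    sample: list of (x_register, x_survey, x_both) tuples, one per sampled EA.
    weight: a single weight shared by all sampled EAs of the municipality.
    """
    x_survey_hat = sum(weight * x_s for _, x_s, _ in sample)
    x_both_hat = sum(weight * x_b for _, _, x_b in sample)
    return x_survey_hat, x_both_hat


def direct_estimate(register_total, sample, weight):
    """Dual-system estimate of the municipal population from sampled EAs."""
    x_survey_hat, x_both_hat = weighted_totals(sample, weight)
    return register_total * x_survey_hat / x_both_hat


# EAs 2, 3 and 6 of municipality 1015: (register, survey, both).
sample = [(37, 37, 37), (53, 58, 53), (64, 65, 64)]
register_total = 746  # register count of the whole municipality
n_eas_total = 7       # assumed meaning of C_m: number of EAs in the municipality

w_simple = n_eas_total / len(sample)                              # C_m / c_m
w_calibrated = register_total / sum(x_r for x_r, _, _ in sample)  # register total / sampled register count

n_simple = direct_estimate(register_total, sample, w_simple)
n_calibrated = direct_estimate(register_total, sample, w_calibrated)
print(round(n_simple, 2), round(n_calibrated, 2))
```

Under these assumptions the weight is constant within a municipality and cancels in the ratio, so the two versions coincide for a single municipality. They can differ only when EAs carrying different weights are pooled, for example at domain level.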


Synthetic Estimator

Based on the assumption that under-coverage rates are invariant across municipalities belonging to the same planned domain

For each system of weights, the coverage ratio is computed at domain level

  Simple: $$\hat{r}_{D(E)} = \frac{\hat{X}_{D,11(E)}}{\hat{X}_{D,+1(E)}}$$

  Calibrated: $$\hat{r}_{D(C)} = \frac{\hat{X}_{D,11(C)}}{\hat{X}_{D,+1(C)}}$$

From the ratios, simple and calibrated synthetic estimators are obtained for municipalities

  Simple: $$\hat{N}_{m(SE)} = X_{m,1+} / \hat{r}_{D(E)}$$

  Calibrated: $$\hat{N}_{m(SC)} = X_{m,1+} / \hat{r}_{D(C)}$$
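A minimal sketch of the synthetic estimator follows. It assumes that the domain coverage ratio is the estimated count of people found in both sources divided by the estimated survey count, which is the reading that makes the synthetic estimator agree with the Petersen form. All numbers except the register count 746 are hypothetical, and the function names are illustrative.

```python
def domain_coverage_ratio(sampled_municipalities):
    """Estimated share of surveyed people who are also in the register.

    sampled_municipalities: list of (x_survey_hat, x_both_hat) weighted totals,
    one pair per sampled municipality of the planned domain.
    """
    x_survey_hat = sum(x_s for x_s, _ in sampled_municipalities)
    x_both_hat = sum(x_b for _, x_b in sampled_municipalities)
    return x_both_hat / x_survey_hat


def synthetic_estimates(register_counts, ratio):
    """Apply the domain ratio to every municipality, sampled or not."""
    return {m: x / ratio for m, x in register_counts.items()}


# Hypothetical domain: weighted totals from two sampled municipalities.
ratio = domain_coverage_ratio([(752.0, 743.0), (1210.0, 1198.0)])
# Register counts of three municipalities; "C" was not sampled.
estimates = synthetic_estimates({"A": 746, "B": 1203, "C": 980}, ratio)
print(round(ratio, 4), {m: round(n, 1) for m, n in estimates.items()})
```

Because the same ratio is applied to every municipality of the domain, the estimate for an unsampled municipality depends entirely on the invariance assumption stated on the slide.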


Empirical study

It is based on a simulation study

Two pseudo-populations of 335,643 Italian EAs were considered

Sources of information

  2001 Italian Post Enumeration Census

  Administrative data on changes of residence that occurred after the 2001 census (from November 2002 to December 2005)

For every non-empty EA belonging to the 8,101 Italian municipalities, the following counts were generated

  Observed count from the population register (X1+)

  True population count (N)

  Field enumeration count (X+1)

  Count of people enumerated by both sources (X11)


Assemble the Pseudo-population

For each Municipality

Munic.Id  EAId  True N  P. Reg  Survey  Both
1015      1     -       535     -       -
1015      2     -       37      -       -
1015      3     -       53      -       -
1015      4     -       40      -       -
1015      5     -       4       -       -
1015      6     -       64      -       -
1015      7     -       13      -       -
Tot.            -       746     -       -

EA Population register counts come from 2001 Census counts


Assign True population counts to municipality

For each Municipality

Munic.  EA  True N  P. Reg  Survey  Both
1015    1   -       535     -       -
1015    2   -       37      -       -
1015    3   -       53      -       -
1015    4   -       40      -       -
1015    5   -       4       -       -
1015    6   -       64      -       -
1015    7   -       13      -       -
Tot.        755     746     -       -

EA Population register counts come from 2001 Census counts

True municipal population counts: P. Reg. is inflated by 1/r, where the coverage rate 'r' is estimated by the model in Fortini and Gallo (2009) (2 different populations)


Assign True population counts to EAs

For each Municipality

Munic.  EA  True N  P. Reg  Survey  Both
1015    1   538     535     -       -
1015    2   37      37      -       -
1015    3   58      53      -       -
1015    4   40      40      -       -
1015    5   4       4       -       -
1015    6   65      64      -       -
1015    7   13      13      -       -
Tot.        755     746     -       -

EA Population register counts come from 2001 Census counts

True municipal population counts: P. Reg. is inflated by 1/r, where the coverage rate 'r' is estimated by the model in Fortini and Gallo (2009) (2 different populations)

True N is allocated among EAs by a hierarchical Dirichlet/Multinomial model with parameter vector p given by the distribution of the P. Reg population among EAs
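The two steps described on this slide (inflating the register count by 1/r, then spreading the true count over EAs) can be sketched as follows. This is one possible reading, not the authors' implementation: it assumes that only the people missing from the register are allocated, so that the true count of an EA is never below its register count, as in the example table. The `concentration` parameter of the Dirichlet draw and the value r = 0.988 are hypothetical.

```python
import random


def allocate_true_counts(register_counts, coverage_rate, concentration=50.0, rng=None):
    """Inflate the municipal register total by 1/r and allocate the
    missing people to EAs with a Dirichlet/Multinomial draw."""
    rng = rng or random.Random()
    register_total = sum(register_counts)
    true_total = round(register_total / coverage_rate)
    missing = max(true_total - register_total, 0)

    # Dirichlet draw centred on the register shares (normalised gamma variates).
    shares = [c / register_total for c in register_counts]
    gammas = [rng.gammavariate(concentration * s, 1.0) if s > 0 else 0.0 for s in shares]
    total = sum(gammas)
    probs = [g / total for g in gammas] if total > 0 else shares

    # Multinomial draw: each missing person is assigned to one EA.
    true_counts = list(register_counts)
    for ea in rng.choices(range(len(true_counts)), weights=probs, k=missing):
        true_counts[ea] += 1
    return true_counts


register = [535, 37, 53, 40, 4, 64, 13]  # P. Reg column of municipality 1015
true_counts = allocate_true_counts(register, coverage_rate=0.988, rng=random.Random(2011))
print(true_counts, sum(true_counts))
```

With r = 0.988 the municipal total is 755, matching the example table; the split across EAs depends on the random draw.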


Assign survey counts to EAs

Each Municipality

Munic.  EA  True N  P. Reg  Survey  Both
1015    1   538     535     -       -
1015    2   37      37      -       -
1015    3   58      53      -       -
1015    4   40      40      -       -
1015    5   4       4       -       -
1015    6   65      64      -       -
1015    7   13      13      -       -
Tot.        755     746     -       -

EA Survey counts: True N multiplied by the coverage rate 'rs'

'rs' is drawn from a beta-binomial distribution

"alpha" and "beta" are chosen such that the mean and variance of the 2001 PES coverage rates are reproduced (5 macro regions by 4 classes of municipal population size)

Example for EA 1: True N = 538 multiplied by rs gives a survey count of 536
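A sketch of how the survey counts could be generated is given below. It assumes a method-of-moments choice of "alpha" and "beta" and a binomial draw of the survey count given the coverage rate 'rs'; the presentation does not give these details. The mean 0.985 and the standard deviation 0.0211 are illustrative inputs, and the function names are hypothetical.

```python
import random


def beta_parameters(mean, variance):
    """Method-of-moments alpha and beta of a Beta distribution
    with the requested mean and variance."""
    if not 0 < mean < 1 or not 0 < variance < mean * (1 - mean):
        raise ValueError("mean and variance are not compatible with a Beta distribution")
    common = mean * (1 - mean) / variance - 1
    return mean * common, (1 - mean) * common


def draw_survey_counts(true_counts, alpha, beta, rng):
    """Beta-binomial draw: one coverage rate per EA, then one
    Bernoulli trial per resident of that EA."""
    survey = []
    for n in true_counts:
        rs = rng.betavariate(alpha, beta)
        survey.append(sum(rng.random() < rs for _ in range(n)))
    return survey


alpha, beta = beta_parameters(mean=0.985, variance=0.0211 ** 2)
true_counts = [538, 37, 58, 40, 4, 65, 13]  # True N column of municipality 1015
survey_counts = draw_survey_counts(true_counts, alpha, beta, random.Random(2001))
print(round(alpha, 2), round(beta, 3), survey_counts)
```

In the presentation the parameters are set separately for each of the 20 cells (5 macro regions by 4 classes of municipal size); here a single pair is used for brevity.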


Assign survey counts to municipality

Each Municipality

Munic.  EA  True N  P. Reg  Survey  Both
1015    1   538     535     536     -
1015    2   37      37      37      -
1015    3   58      53      58      -
1015    4   40      40      39      -
1015    5   4       4       4       -
1015    6   65      64      65      -
1015    7   13      13      13      -
Tot.        755     746     752     -

The municipal count is obtained by summing up the values of the EAs


Assign number of people enumerated by both the lists

Each Municipality

Munic.  EA  True N  P. Reg  Survey  Both
1015    1   538     535     536     533
1015    2   37      37      37      -
1015    3   58      53      58      -
1015    4   40      40      39      -
1015    5   4       4       4       -
1015    6   65      64      65      -
1015    7   13      13      13      -
Tot.        755     746     752     -

People enumerated by both lists: Hypergeometric distribution at EA level with parameters True N, P. Reg, Survey
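The hypergeometric step can be sketched with the standard library alone: the survey is treated as a sample drawn without replacement from the true residents of the EA, and the matched count is the number of sampled residents who are also registered. This corresponds to independence between the two sources, the assumption behind the Petersen estimator. The function name is illustrative.

```python
import random


def draw_matched_count(true_n, register_n, survey_n, rng):
    """Hypergeometric draw: how many of the survey_n people picked without
    replacement from true_n residents are among the register_n registered ones."""
    people = rng.sample(range(true_n), survey_n)
    # Residents 0 .. register_n - 1 play the role of the registered ones.
    return sum(1 for person in people if person < register_n)


rng = random.Random(17)
# EA 1 of municipality 1015: True N = 538, P. Reg = 535, Survey = 536.
both = draw_matched_count(true_n=538, register_n=535, survey_n=536, rng=rng)
print(both)
```

For EA 1 the matched count can only fall between 533 and 535, because at most two residents are left out of the survey and at most three are missing from the register; the value 533 in the example table is the lower end of that range.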


Assign number of people enumerated by both the lists

Each Municipality

Munic.  EA  True N  P. Reg  Survey  Both
1015    1   538     535     536     533
1015    2   37      37      37      37
1015    3   58      53      58      53
1015    4   40      40      39      39
1015    5   4       4       4       4
1015    6   65      64      65      64
1015    7   13      13      13      13
Tot.        755     746     752     743

The municipal count is obtained by summing up the EAs


St. dev. of coverage rates among municipalities

About 400,000 and 900,000 missing people were generated for the pseudo-Register and the pseudo-Survey respectively

Population register variability is larger for POP2 than for POP1

Survey variability is larger than the respective Population register variability (because of its lower coverage rate)

Survey variability is not very close to PES variability, even though the order of magnitude is the same

         Register  Register  Survey  Survey  PES
         POP1      POP2      POP1    POP2
N-West   0.0051    0.0096    0.0164  0.0172  0.0211
N-East   0.0051    0.0100    0.0041  0.0044  0.0145
Centre   0.0041    0.0085    0.0108  0.0128  0.0059
South    0.0036    0.0068    0.0094  0.0099  0.0284
Isles    0.0040    0.0082    0.0134  0.0135  0.0211


Variability of coverage rates among EAs – Population registers

Pseudo-coverage of the register vs size of EAs (left) is compared with the distribution of EA coverage rates in the 2001 Italian PES (1,098 EAs)

Annotation on the plot: too many points here

Simulated EAs show too many large units with a very small coverage rate, which does not seem realistic in our context


Variability of coverage rates among EAs – Control survey

Pseudo-coverage of the survey vs size of EAs (left) is compared with the distribution of EA coverage rates in the 2001 Italian PES (1,098 EAs)

Annotation on the plot: too few points here

Simulated EAs show too few small units with a small coverage rate in this case


Simulation of the sampling space

Four tests: designs A and B for populations 1 and 2

Each simulation is based on 500 sample replications

Sampling of municipalities with probability proportional to their population size

Simple random sampling of EAs within municipalities

Simple and weighted direct estimation at domain level

Synthetic estimation at municipality level

Population counts coming from population registers are used here as a benchmark for comparisons

  They are biased downwards but available at zero cost
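The two-stage selection described on this slide can be sketched as follows. The presentation does not say which probability-proportional-to-size scheme is used, so systematic PPS is assumed here; the stratum, the municipality names and the sample sizes are hypothetical.

```python
import random


def systematic_pps(sizes, n, rng):
    """Systematic selection of n unit indices with probability
    proportional to size (a large unit may be hit more than once)."""
    step = sum(sizes) / n
    start = rng.random() * step
    selected, cumulative, idx = [], 0.0, 0
    for k in range(n):
        point = start + k * step
        while idx < len(sizes) - 1 and cumulative + sizes[idx] <= point:
            cumulative += sizes[idx]
            idx += 1
        selected.append(idx)
    return selected


def draw_sample(municipalities, n_municipalities, n_eas, rng):
    """First stage: PPS selection of municipalities by register size.
    Second stage: simple random sampling of EAs within each of them."""
    names = list(municipalities)
    sizes = [sum(municipalities[name]) for name in names]
    chosen = sorted(set(systematic_pps(sizes, n_municipalities, rng)))
    sample = {}
    for idx in chosen:
        eas = municipalities[names[idx]]
        sample[names[idx]] = rng.sample(range(len(eas)), min(n_eas, len(eas)))
    return sample


# Hypothetical stratum: municipality name -> register counts of its EAs.
stratum = {
    "A": [535, 37, 53, 40, 4, 64, 13],
    "B": [100, 200],
    "C": [1000, 1000, 1000, 1000, 1000],
    "D": [120],
}
rng = random.Random(2010)
replications = [draw_sample(stratum, 2, 2, rng) for _ in range(500)]
frequency = {name: sum(name in s for s in replications) for name in stratum}
print(frequency)
```

In this toy stratum municipality "C" is larger than the sampling step, so it enters every replication, which is the usual behaviour of a self-representing unit under systematic PPS.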


Results – Bias of registers vs. synthetic estimates

Main results

  Direct estimates perform well in terms of bias and MSE at domain level

  Calibrated estimates outperform the simple ones in terms of MSE, both for direct and synthetic estimators

  The less aggregated design B does not significantly improve the estimates, so only design A is shown here

In terms of bias, the synthetic estimator improves on the registers. Improvements decrease for larger municipalities. These results are more evident for population 1 than for population 2

In terms of maximum bias the improvement is less noticeable

Table 2 Average and (Maximum) Relative Bias % of the calibrated synthetic estimator and register count for Population 1 and 2 (design A by class of municipality size)

Class of municipality size  Estimator  P1            P2
Less than 5,000             Register   0.952 (8.75)  0.844 (14.12)
Less than 5,000             Synthetic  0.327 (7.94)  0.676 (13.29)
5,000 – 19,000              Register   1.063 (9.20)  0.940 (14.92)
5,000 – 19,000              Synthetic  0.261 (7.80)  0.541 (13.97)
20,000 – 50,000             Register   0.642 (3.74)  0.702 (6.30)
20,000 – 50,000             Synthetic  0.224 (2.40)  0.469 (5.27)
50,000 and more             Register   0.515 (1.42)  0.479 (2.43)
50,000 and more             Synthetic  0.189 (1.30)  0.326 (1.81)


Bias of synthetic estimator vs register counts

Population 1 - design A by class of municipality size

Panels: Less than 5,000; 5,000 – 19,000; 20,000 – 49,000; 50,000 and more

Bisectors delimit the zone where synthetic estimates are better than simple register counts in terms of bias

The synthetic estimator almost always improves on the registers in terms of bias

However, the improvement does not seem very prominent


Bias of synthetic estimator vs register count

Population 2 - design A by class of municipality size

Same conclusion for POP2, with worse results for larger municipalities


Results – MSE of synthetic and direct estimators

The direct estimator can be applied to self-representing municipalities

  It is reported in the table for the two classes of larger municipalities

On average, the synthetic estimator outperforms the direct one, which does not seem useful even in sampled municipalities

The MSE of synthetic estimates is much larger than the bias (in Table 2)

  Since this does not happen in real cases, it could be evidence that the variability of the pseudo-populations at EA level is too high

Table 3 Average Relative Root MSE % and (Maximum ARRMSE %) of the calibrated synthetic estimator for Population 1 and 2 - design A by class of municipality size (for classes 3 and 4, the calibrated direct estimate is also reported)

Class of municipality size  Estimator  P1              P2
Less than 5,000             Synthetic  0.658 (7.95)    1.070 (13.30)
5,000 – 19,000              Synthetic  1.198 (9.96)    1.297 (13.99)
20,000 – 50,000             Synthetic  0.966 (3.58)    1.350 (5.54)
20,000 – 50,000             Direct     2.793 (30.667)  3.113 (42.88)
50,000 and more             Synthetic  0.999 (6.86)    1.280 (3.43)
50,000 and more             Direct     2.609 (19.07)   2.372 (15.26)
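The performance measures reported in Tables 2 and 3 can be computed from the replicated estimates along the following lines. The sketch uses the usual definitions of relative bias and relative root MSE for one municipality, then takes the average and the maximum over municipalities; the use of absolute values in that summary is an assumption, and the replicated estimates are hypothetical.

```python
import math


def relative_bias_pct(estimates, true_value):
    """Relative bias (%) of replicated estimates for one municipality."""
    mean_estimate = sum(estimates) / len(estimates)
    return 100 * (mean_estimate - true_value) / true_value


def relative_root_mse_pct(estimates, true_value):
    """Relative root mean squared error (%) over the replications."""
    mse = sum((e - true_value) ** 2 for e in estimates) / len(estimates)
    return 100 * math.sqrt(mse) / true_value


def summarise(values):
    """Average and maximum of the absolute values across municipalities."""
    abs_values = [abs(v) for v in values]
    return sum(abs_values) / len(abs_values), max(abs_values)


true_n = 755
synthetic = [750.0, 760.0, 752.0, 758.0]  # hypothetical replicated estimates
register = [746.0] * 4                    # the register count does not vary

print(relative_bias_pct(synthetic, true_n), relative_root_mse_pct(synthetic, true_n))
print(relative_bias_pct(register, true_n), relative_root_mse_pct(register, true_n))
print(summarise([relative_bias_pct(synthetic, true_n), relative_bias_pct(register, true_n)]))
```

Since the MSE is the variance plus the squared bias, the relative root MSE can never be smaller than the absolute relative bias. For the register count, which has no sampling variance, the two coincide.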


Difference between synthetic and direct estimator in terms of MSE – municipalities larger than 50,000 inh.

Most municipalities larger than 50,000 inh. show a better synthetic MSE (negative values)

Direct and synthetic estimates are equivalent for the largest municipalities (>250,000 inh.), but only in POP1


Concluding Remarks

The sampling strategy of the next Italian Census PES is evaluated here through pseudo-populations and simulated experiments

Synthetic estimates yield a slight improvement over census counts from registers

Though the Census PES is required by EU regulation for evaluation purposes, our present results do not endorse the use of the PES to correct Census counts

Although not discussed here, direct estimation with calibration achieved suitable results at domain level in terms of both bias and variance

Further developments

Better definition of pseudo-populations with respect to coverage ratios between EAs

Use of model estimation (EBLUP) looked promising in our previous studies carried out in a simplified framework
