Combined use of data from registers and sample surveys

Eric Schulte Nordholt

Statistics Netherlands

Division Socio-economic and spatial statistics

e.schultenordholt@cbs.nl

Statistical Training Course on Use of Administrative Registers in Production of Statistics in Warsaw (October 2014)




Contents General

• Social Statistics

• System of social statistical datasets (SSD)

• Group work on registers and surveys

• The Dutch virtual census

• Time for questions and discussion


Contents Social Statistics

• Requirements for modern Social Statistics

• Driving forces

• Policy implications

• Life cycle model

• Relevant statistical information for policy and society

• Strategy for data collection

• Secondary data

• How to get consistency of different data sources?

• Prototype of a micro database

• Conclusions


Requirements for modern Social Statistics

Product quality (Eurostat Code of Practice):

1. Relevance

2. Accuracy

3. Timeliness and punctuality

4. Comparability and coherence

5. Accessibility and clarity


Driving Forces

More coherence, more thematic publications, more detail (small areas, population groups) and more flexibility in the statistical output (will lead to a better product)

ICT developments: more registers

High nonresponse rates in social surveys

To cut down processing costs: standardisation

To lower response burden: fewer questions, EDI (or EDC) and a diminished ‘irritation factor’


Policy implications

• From primary to secondary data collection

– Wherever possible use data available in existing registers and other administrative sources

– Primary data collection only if no (timely) data are available or the available data are of poor quality

– Statistics Netherlands Act

• From traditional to electronic data collection

• Standardisation of statistical processes; multi-data-source statistics; efficient sampling

• Challenges must be faced while the available budget is constantly being reduced


Life cycle model (1)

[Diagram: life cycle domains: Social capital, Labour market position, Income, Consumption, Housing, Time use, …, Well-being, Demography, Health, Education]

Labour market position

- Working/non working

- Occupation

- Economic activity

Demography

- Year of birth

- Nationality

- Household composition

- Etc.


Life cycle model (2)

[Diagram: life cycle domains: Social capital, Labour market position, Income, Consumption, Housing, Time use, …, Well-being, Demography, Health, Education]


Life cycle model (3)

[Diagram: data blocks of Cases by Variables, repeated over Time (T, T+1, T+2)]


Life cycle model (4)


Life cycle model (5)

Analysis possibilities:

• State

• Transitions between states

• Duration time in a certain state

[Diagram along a time axis]


Life cycle model (6)

[Diagram along a time axis]


Relevant statistical information for policy and society

• Domain specific

• Transitions and durations within a domain

• Relations between domains

• Relations between transitions and durations between domains

• Monitor information (long period)


Strategy for data collection (1)

• Start with registers (e.g. population register, housing register, business register)

• Add data from other administrative sources

• Add data from business and household surveys

• Match all these data at the micro level

• Create a ‘data clearing house’ within the statistical office


Strategy for data collection (2)

[Diagram: data matrix of all inhabitants of the Netherlands (rows 1, …, n) by Variables, with blocks labelled Registers and Surveys]


Strategy for data collection (3)

Matching method for individual data

[Diagram: administrative or survey data are matched to the longitudinal Population Register via the RIN]


Secondary data (1)

Quality

• Quality may be good for some basic registers, but not for all registers; monitoring quality is important

• No sampling errors

• No unit nonresponse

• Many sources of non-sampling errors remain:

– Item nonresponse

– Measurement errors

– Coverage errors


Secondary data (2)

Challenges

• Impact on the organisation, coordination, crossing departmental boundaries, change in culture

• Influence of a statistical office on contents of registers is limited

• Communication with register holders, e.g. about quality and changes

• Quality control system (control surveys?)

• Comprehensive, standardised metadata system

• Version control system for updates

• Changing from surveys to registers without causing a trend break


How to get consistency of different data sources?

• Harmonisation! (coverage, definitions, reference periods, etc.)

• Editing of all records at micro level by automated procedures

• Only edit what needs to be edited (clear instructions are necessary!)

• Make use of the technique of repeated weighting for survey data


Prototype of a micro database (1)

[Diagram: micro database with variable blocks X1…XK, Y1…YM, Z1…ZR and U1…US; survey blocks labelled LFS and HS]


Prototype of a micro database (2)

Output inspired harmonisation: the one figure for one phenomenon idea

StatLine: all statistical information on the web (via home page of Statistics Netherlands)

http://www.cbs.nl/en-GB/menu/home/default.htm


Conclusions

Social Statistics develop in the direction of a permanent virtual census to be able to produce:

– More crosstables over different domains

– More longitudinal information

– More flexible policy relevant output


Contents System of social statistical datasets (SSD)

• Introduction to Statistics Netherlands

• Examples of registers

• Definition and driving forces of the SSD

• The scope of the SSD

• Core and satellites

• The process

• Linking the sources

• Micro integration

• Estimation aspects

• Statistical confidentiality

• Conclusions


Introduction to Statistics Netherlands (1)

The Central Statistical Office (CBS)

• almost all official statistics in the Netherlands

• no regional offices

• two buildings: The Hague (in the West) and Heerlen (in the South); both have about 1000 employees

Introduction to Statistics Netherlands (2)

Mission

The mission of Statistics Netherlands is to publish reliable and coherent statistical information that meets the needs of society.

Position of the Statistical Office

Statistics Netherlands has been a semi-independent organisation since 2004 (still government-funded), with about 2000 employees


Examples of registers

Three kinds of registers

• Population Register (PR)

• Job register

• Self-employed register

• Education register

• Occupation register

• Income register

• Social security register

• Unemployment register

• Pension register

• Other registers on persons, families and households

• Housing register

• Other registers on properties, buildings and dwellings

• General business register

• Other registers on enterprises and establishments

Common identifier: (numerical) address


Definition and driving forces of the SSD

Definition: set of integrated microdata files with coherent and detailed demographic and socio-economic data on persons, households, jobs and benefits

No remaining internal conflicting information

Driving forces:

• Virtual Census of 2001

• Better products: more coherence and flexibility


The scope of the SSD

All relevant variables in the life cycle

• Demography

• Health

• Education

• Labour market position

• Income

• Consumption

• Housing

• Time use

• Etc.


Core and satellites (1)

[Diagram: the SSD-core in the centre, surrounded by satellites]


Core and satellites (2)

Core:

• contains only integral register information

• contains the most important demographic and socio-economic information

• contains only information that is used in at least two satellites


Core and satellites (3)

Satellites are produced in two steps:

• Copying and derivation of the relevant information from the core SSD

• Adding of the unique information on a specific theme from registers and surveys


Core and satellites (4)

Examples of current SSD satellites:

• Labour market

• Social security

• Income

• Education

• Health care

• Justice and security

• Ethnic minorities

• Social cohesion

The development of more SSD-satellites has been planned


The process

Already discussed:

– Specify the information needed

– Collection of registers

– Surveys only additional

Still to discuss:

– Linking the sources

– Micro integration

– Estimation aspects

– Statistical confidentiality


Linking the sources (1)

• The Population Register is the backbone of the system for persons

• All other files are matched exactly to the Population Register, such that the true matches are maximised (aim: no missed matches) and the false matches (mismatches) are minimised


Linking the sources (2)

Matching variables:

• Social security and fiscal (SOFI) number (effectiveness close to 100%), since 2007 Citizen Service Number

• Other personal identifiers: sex, date of birth, and address (effectiveness close to 100%)

• Number of mismatches very low (close to 0%)
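The two-stage matching described on this slide could be sketched roughly as follows. This is only an illustration: the field names (`csn`, `dob`, `postcode`, `house_no`), the toy register and the rule of accepting only unambiguous fallback hits are assumptions, not the actual Statistics Netherlands procedure.

```python
from typing import Optional

# Hypothetical Population Register extract (all values invented).
population_register = [
    {"rin": 1, "csn": "111", "sex": "F", "dob": "1980-02-01",
     "postcode": "2492JP", "house_no": 12},
    {"rin": 2, "csn": "222", "sex": "M", "dob": "1975-07-15",
     "postcode": "6401CZ", "house_no": 3},
]

# Index 1: exact match on the unique personal number.
by_csn = {p["csn"]: p["rin"] for p in population_register}

# Index 2: fallback key of sex, date of birth and address.
by_traits = {}
for p in population_register:
    key = (p["sex"], p["dob"], p["postcode"], p["house_no"])
    by_traits.setdefault(key, []).append(p["rin"])


def link_record(record: dict) -> Optional[int]:
    """Return the register RIN for a source record, or None if unmatched."""
    csn = record.get("csn")
    if csn in by_csn:
        return by_csn[csn]
    key = (record.get("sex"), record.get("dob"),
           record.get("postcode"), record.get("house_no"))
    candidates = by_traits.get(key, [])
    # Accept only unambiguous hits, to keep the number of mismatches low.
    return candidates[0] if len(candidates) == 1 else None
```

A survey record without a personal number, such as `{"sex": "F", "dob": "1980-02-01", "postcode": "2492JP", "house_no": 12}`, would be linked through the fallback key; a record that matches no register entry stays unlinked.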


Micro integration (1)

The aim of micro integration is:

– To check the linked data and modify incorrect records, in such a way that the results that are to be published are of higher quality than the original sources


Micro integration (2)

To fulfil this demand an integrated process of:

• data editing,

• derivation of statistical variables,

• and imputation

is executed


Micro integration (3)

Constraints and limitations:

- Only variables that are to be published are micro integrated

- Identity rules are necessary, e.g. the same variable in two sources or a relationship between two or more variables in one or more sources

- No mass imputation


Estimation aspects

Surveys are samples from the population

If surveys are enriched with register information, estimates of the register part of the enriched survey will generally be inconsistent with the counts from the entire register

Statistics Netherlands developed the method of repeated weighting to solve these inconsistencies (aim: numerically consistent estimates)
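The inconsistency can be illustrated with a toy example. The sketch below uses plain post-stratification, which is the simplest form of calibration and not the repeated weighting method itself; the weights, group labels and register counts are invented for illustration.

```python
from collections import defaultdict

# Hypothetical counts from the complete register.
register_counts = {"not-NL": 30, "NL": 70}

# Hypothetical enriched survey: (register variable, design weight).
survey = [("not-NL", 14.5), ("not-NL", 14.5),
          ("NL", 23.0), ("NL", 24.0), ("NL", 24.0)]


def weighted_totals(records):
    """Sum the weights per group."""
    totals = defaultdict(float)
    for group, weight in records:
        totals[group] += weight
    return dict(totals)


def poststratify(records, targets):
    """Rescale weights so that group totals equal the register counts."""
    totals = weighted_totals(records)
    return [(g, w * targets[g] / totals[g]) for g, w in records]


before = weighted_totals(survey)                                # 29 and 71
after = weighted_totals(poststratify(survey, register_counts))  # close to 30 and 70
```

With the original weights the survey estimates 29 and 71 persons, while the register counts 30 and 70; after rescaling, the survey totals agree with the register.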


Statistical confidentiality

[Diagram: the PERSONS BACKBONE (full range of all persons as from 1995) holds identifiers (PINs, sex, date of birth, address) and characteristics; administrative sources and household surveys each supply IDs and variables]

IDs in sources are replaced by random Record Identification Numbers (RINs)
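Replacing source IDs by random RINs might look like the sketch below. The field name `csn`, the RIN range and the in-memory mapping table are assumptions made for illustration; the slides do not describe how RINs are actually generated or stored.

```python
import secrets


def assign_rins(ids):
    """Map each distinct source ID to a random, unique RIN (hypothetical scheme)."""
    mapping = {}
    used = set()
    for source_id in ids:
        if source_id in mapping:
            continue
        rin = secrets.randbelow(10**9)
        while rin in used:          # redraw on the rare collision
            rin = secrets.randbelow(10**9)
        used.add(rin)
        mapping[source_id] = rin
    return mapping


def pseudonymise(records, mapping, id_field="csn"):
    """Drop the direct identifier and attach the RIN instead."""
    out = []
    for rec in records:
        new = {k: v for k, v in rec.items() if k != id_field}
        new["rin"] = mapping[rec[id_field]]
        out.append(new)
    return out


# Invented example data from one administrative source and one survey.
admin = [{"csn": "111", "income": 30000}, {"csn": "222", "income": 45000}]
survey = [{"csn": "111", "educ": "Hi"}]

mapping = assign_rins(r["csn"] for r in admin + survey)
admin_p = pseudonymise(admin, mapping)
survey_p = pseudonymise(survey, mapping)
```

Because both files are pseudonymised with the same mapping, they can still be linked on `rin` while the original identifier no longer appears in either file.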


Conclusions

The SSD diminishes the administrative burden and increases:

– The efficiency of statistics production

– The accuracy of statistical outputs

– The possibilities for social policy research

Safeguarding confidentiality is vital for the process of record linkage


Group work on registers and surveys (1)

Key question: which census variables are missing in all the registers? Consider the following thirteen census variables:

1. Sex

2. Age

3. Country of citizenship

4. Marital status

5. Household position

6. Religious denomination

7. Country of birth

8. Household size


Group work on registers and surveys (2)

9. Place of residence one year prior to the census

10. Economic status

11. Level of educational attainment

12. Occupation

13. Branch of current economic activity

A. Discuss the situation in the countries represented in your group or select some countries for further discussion


Group work on registers and surveys (3)

B. Are those missing variables available in any survey? Discuss whether those surveys may be used (legal aspect and agreement with survey organiser) for producing official statistics

C. Can the surveys and registers be linked? Is this exact matching or is statistical matching necessary?

Are there other important issues that affect the overall situation?


Group work on registers and surveys (4)

D. Possibilities and limitations for further development of combining registers and surveys. What is the policy in the NSIs for further development? What are the possibilities and limitations for such a development?

E. Prepare a short presentation (5 minutes per group)


Contents The Dutch virtual census (1)

• History of the Dutch Census

• The Dutch Census of 2011

• Data sources

• Combining sources: micro linkage

• Combining sources: micro integration

• Conditions facilitating use of administrative sources

• Miscellaneous aspects

• Census tables

• Micro macro method

• Result on 2011 economic activity


Contents The Dutch virtual census (2)

• Comparison with other countries

• Comparison with other years

• Harmonisation

• Microdata availability

• Data integration activities between the 2001 Census and the 2011 Census

• Preparing the 2011 Census

• Conclusions


History of the Dutch Census (1)

TRADITIONAL CENSUS

Ministry of Home Affairs:

1829, 1839, 1849, 1859, 1869, 1879 and 1889

Statistics Netherlands:

1899, 1909, 1920, 1930, 1947, 1960 and 1971

Unwillingness (nonresponse) and cost reduction → no more traditional censuses


History of the Dutch Census (2)

ALTERNATIVE: VIRTUAL CENSUS

1981 and 1991: limited virtual censuses based on Population Register and surveys

Development in the 1990s: more registers → integrated set of registers and surveys, SSD

2001 and 2011: complete virtual censuses based on the SSD with information at the municipality level


The Dutch Census of 2011

is based on the Social Statistical Database (SSD) which

• is a set of integrated microdata files with coherent and detailed demographic and socio-economic data on persons, households, jobs and benefits

• has no remaining internal conflicting information

is part of the European Census

• Eurostat: coordinator of EU, accession and EFTA countries in the European Census Rounds

• Census Table Programme, every 10 years

Social statistics in the Netherlands develop in the direction of a permanent Virtual Census to be able to produce:

• More crosstables over different domains

• More longitudinal information

• More flexible policy relevant output


Data sources

Registers:

• Population Register (PR) → illegal people excluded, homeless counted at last known address

• Jobs file, containing all employees

• Self-employed file, containing all self-employed

• Fiscal administration

• Social Security administrations

• Pensions and life insurance benefits

• Housing registers

Surveys:

• Survey on Employment and Earnings (SEE) stopped

• Labour Force Survey data around Census Day

• Housing surveys no longer necessary for the Census


Combining sources: micro linkage

• Linkage key:

– Registers: Citizen Service Number, unique

– Surveys: sex, date of birth, address (postal code and house number)

• Linkage key replaced by RIN-person

• Linkage strategy:

– Optimizing number of matches

– Minimizing number of mismatches and missed matches


Combining sources: micro integration

• Collecting data from several sources → more comprehensive and coherent information on aspects of a person’s life

• Compare sources: coverage, conflicting information (reliability of sources)

• Integration rules: checks, adjustments, imputations

• Optimal use of information → quality improves

• Example: job period vs. benefit period
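The job period versus benefit period example could be expressed as an integration rule along the following lines. The rule itself (trust the job register and cut the benefit period to end the day before the job starts) is purely an assumption for illustration; the slides do not state which source takes precedence.

```python
from datetime import date, timedelta


def overlap_days(a_start, a_end, b_start, b_end):
    """Number of days two closed date intervals have in common."""
    start = max(a_start, b_start)
    end = min(a_end, b_end)
    return max(0, (end - start).days + 1)


def integrate_benefit(job, benefit):
    """Check: do the job and benefit periods conflict?
    Adjustment (hypothetical rule): the job period is kept and the
    benefit period is cut to end the day before the job starts.
    Returns the adjusted benefit period, or None if the case needs review.
    """
    if overlap_days(job[0], job[1], benefit[0], benefit[1]) == 0:
        return benefit                      # no conflict, keep as is
    if benefit[0] < job[0]:
        return (benefit[0], job[0] - timedelta(days=1))
    return None                             # benefit starts during the job


job = (date(2011, 1, 1), date(2011, 12, 31))
benefit = (date(2010, 6, 1), date(2011, 1, 31))
conflict = overlap_days(job[0], job[1], benefit[0], benefit[1])
adjusted = integrate_benefit(job, benefit)
```

In this invented case the two sources overlap during January 2011, and the adjusted benefit period ends on 31 December 2010.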


Conditions facilitating use of administrative sources

• Legal base (Statistics Act)

• Public approval (‘Big Brother is watching you’)

• Cooperation among authorities (mainly government organisations)

• Comprehensive and reliable register system (administrative versus statistical quality)

• Unified identification system (preferably unique ID-numbers)


Miscellaneous aspects (1)

• Stable identifiers

• Stability of registers

• Only edit what needs to be edited (by automated procedures)

• Dates of real events versus dates of registration

• Derived variables (example: current activity status)

• Impact on the organisation (change of culture)

• Communication with register holders


Miscellaneous aspects (2)

Output inspired harmonisation (coverage, definitions, reference periods): the one figure for one phenomenon idea

StatLine: all statistical information on the web (via home page of Statistics Netherlands)

http://www.cbs.nl/en-GB/menu/home/default.htm


Census tables (1)

Preliminary work before tabulating

Census Programme definitions: not always clear and unambiguous, e.g. economic activity

Priority rules

• (characteristics of) main job (highest wage)

• employee or employer

• job or (partially) unemployed

• job or attending education

• job or retired

• engaged in family duties or retired

• age restrictions

Tabulating register variables: Simply straightforward counting from SSD register data
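Priority rules of this kind might be applied as in the sketch below. Only the main job rule (highest wage) is taken from the slide; the order of the status list and the minimum age of 15 are assumptions chosen for illustration, since the slide does not say which alternative wins in each pair.

```python
# Assumed priority order (hypothetical): earlier entries win.
PRIORITY = ["employed", "unemployed", "student", "retired", "homemaker"]
MIN_AGE = 15  # assumed age restriction


def main_job(jobs):
    """Main job = the job with the highest wage (rule from the slide)."""
    return max(jobs, key=lambda j: j["wage"]) if jobs else None


def activity_status(age, signals):
    """Derive one economic activity status from possibly conflicting signals."""
    if age < MIN_AGE:
        return "below minimum age"
    for status in PRIORITY:
        if status in signals:
            return status
    return "other"


# Invented person with two jobs who also attends education.
jobs = [{"employer": "A", "wage": 1200}, {"employer": "B", "wage": 2500}]
status = activity_status(22, {"student", "employed"})
```

Here the person is classified as employed, and the characteristics of job B would be used because it pays the highest wage.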


Census tables (2)

Tabulating survey (and register) variables

Mass imputation?

• Pros: reproducible results

• Cons: danger of oddities in estimates (e.g. highly educated baby)

Traditional weighting?

• Pros: simple, reproducible results (if same microdata and weights)

• Cons: no overall numerical consistency between survey and register estimates

Demand for overall numerical consistency

• one figure for one phenomenon idea

• all tables based on different sources (e.g. surveys) should be mutually consistent


Census tables (3)

Ethnicity: register

Education: survey 1 and survey 2

Employment status: survey 2

Estimate T1: educ x ethnic and T2: educ x employ

[Diagram: the Register holds ethnic1...k; Survey 1 holds educLo...Hi; Survey 2 holds educLo...Hi and employ1...m]

Register: ethnic

          not-NL    NL    Total
Total       30      70     100

Survey 1 and Survey 2: educ x ethnic

          not-NL    NL    Total
educLo      20      29      49
educHi       9      42      51
Total       29      71     100

Survey 2: employ x educ

          employed  non-employed  Total
educLo       32          20         52
educHi       28          20         48
Total        60          40        100


Census tables (4)

Repeated Weighting (RW): tool to achieve numerical consistency (VRD-software)

Basic principles of RW:

• estimate table on most reliable source (mostly source with most records, e.g. register)

• estimate tables by calibrating on common margins of the current table and tables already estimated (auxiliary information)

• repeated use of regression estimator:

– initial weights (e.g. survey weights) calibrated as little as possible

– lower variances

– no excessive increase of (non-response) bias (as long as cell size >> 0)

• each table has its own set of weights


Census tables (5)

[Diagram: the Register (ethnic1...k), Survey 1 (educLo...Hi) and Survey 2 (educLo...Hi, employ1...m) over the sampling units, with the estimation steps numbered 1 to 3]

Register: ethnic

          not-NL    NL    Total
Total       30      70     100

Survey 1 and Survey 2: educ x ethnic

          not-NL    NL    Total
educLo      20      30      50
educHi      10      40      50
Total       30      70     100

Survey 2: employ x educ

          employed  non-employed  Total
educLo       31          19         50
educHi       30          20         50
Total        61          39        100

Calibrate on ethnic, then on educ x ethnic
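The order of calibration can be mimicked at table level with the initial estimates from Census tables (3). The sketch below uses simple proportional rescaling of table cells rather than the regression estimator on microdata, and it calibrates T2 only on the educ margin of T1, so the results will not exactly reproduce the rounded figures on the slide; it is meant only to show how consistency is passed from one table to the next.

```python
# Register margin (complete count) and initial survey estimates from the slides.
register_ethnic = {"not-NL": 30, "NL": 70}
t1 = {("educLo", "not-NL"): 20, ("educLo", "NL"): 29,
      ("educHi", "not-NL"): 9, ("educHi", "NL"): 42}          # educ x ethnic
t2 = {("educLo", "employed"): 32, ("educLo", "non-employed"): 20,
      ("educHi", "employed"): 28, ("educHi", "non-employed"): 20}  # educ x employ


def margin(table, axis):
    """Sum the table over all but one dimension of the cell key."""
    out = {}
    for key, value in table.items():
        out[key[axis]] = out.get(key[axis], 0.0) + value
    return out


def rescale(table, axis, targets):
    """Scale cells proportionally so that one margin equals the targets."""
    current = margin(table, axis)
    return {k: v * targets[k[axis]] / current[k[axis]]
            for k, v in table.items()}


# First calibrate T1 on the register margin for ethnicity ...
t1_cal = rescale(t1, 1, register_ethnic)
# ... then calibrate T2 on the educ margin of the already estimated T1.
educ_margin = margin(t1_cal, 0)
t2_cal = rescale(t2, 0, educ_margin)
```

After both steps the ethnic totals in T1 equal the register counts, and T1 and T2 give the same totals for educLo and educHi.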


Micro macro method (1)

Repeated Weighting works nicely, but in the 2011 Census a new requirement was introduced: hypercubes (= high dimensional tables)

Problem: very detailed tables contain many sample zeros that RW cannot handle

Solution 1: estimate subhypercubes

Solution 2: micro macro method (an IPF method) was introduced to estimate the interior of subhypercubes containing LFS variables
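Iterative proportional fitting (IPF) in its generic two-dimensional form can be sketched as follows. This is textbook IPF under assumed inputs, not the micro macro method as implemented at Statistics Netherlands: the seed table stands in for survey-based cell estimates and the targets for margins that have already been estimated.

```python
def ipf(seed, row_targets, col_targets, iterations=50):
    """Fit the interior of a 2-D table to given row and column totals.

    Cells that are zero in the seed stay zero, so sample zeros do not
    cause a failure; they simply receive no mass.
    """
    table = [row[:] for row in seed]
    for _ in range(iterations):
        # Scale each row to its target total.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                table[i] = [v * target / total for v in table[i]]
        # Scale each column to its target total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                for row in table:
                    row[j] *= target / total
    return table


# Invented example: survey-based seed, margins assumed known.
fitted = ipf([[20.0, 29.0], [9.0, 42.0]], [50.0, 50.0], [30.0, 70.0])
with_zero = ipf([[0.0, 10.0], [5.0, 5.0]], [10.0, 10.0], [5.0, 15.0])
```

The fitted table keeps the association pattern of the seed while its row and column totals agree with the given margins.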


Micro macro method (2)

Results of the micro macro method are published if two conditions are fulfilled:

1. table margins estimated with RW are small enough

2. number of records in estimated cells is large enough

Criteria:

1. estimated relative inaccuracy of at most 20 percent (i.e. the estimated margins amount to 40 percent at most), which corresponds to a threshold of 25 persons

2. only table cells based on 5 or more persons are published


Result on 2011 economic activity


Comparison with other countries

Traditional Census (complete enumeration): Most countries in the world (including the UK and the US)

Traditional Census (partial enumeration) and Registers: Some countries (e.g. Germany, Poland and Switzerland)

Rolling Census: France

Fully or largely register-based (Virtual) Census: Five Nordic countries (Iceland, Norway, Sweden, Finland and Denmark), the Netherlands, Belgium, Austria and Slovenia


Comparison with other years

Harmonisation (1)

More information about the Dutch traditional Censuses (including those of 1960 and 1971): http://www.volkstellingen.nl/en/

For 1960 and 1971 the same variables as for 2001
• if not available: constructed based on existing variables in Census data

Variables not internationally harmonised (e.g. sex, age, marital status, household position, country of birth, economic status, household size and country of citizenship)
• same classification and priority rules as for 2001

Harmonisation (2)

Household size and country of citizenship:
• missing for 1960

Religious denomination (philosophy of life):
• only for 1960 and 1971

Place of residence one year prior to the census:
• only for 2001

International classifications
• Branch of current economic activity: ISIC / NACE
• Occupation: ISCO
• Level of educational attainment: ISCED

Harmonisation (3)

Variable                                         1960  1971  2001
Sex                                              X     X     X
Age                                              X     X     X
Country of citizenship                           -     X     X
Marital status                                   X     X     X
Household position                               X     X     X
Religious denomination                           X     X     -
Country of birth                                 X     X     X
Household size                                   -     X     X
Place of residence one year prior to the census  -     -     X
Economic status                                  X     X     X
Level of educational attainment                  X     X     X
Occupation                                       X     X     X
Branch of current economic activity              X     X     X

(X = available, - = not available)

Microdata availability

One percent samples for three years (1960, 1971 and 2001)

IPUMS (Integrated Public Use Microdata Series): http://www.ipums.org/international/index.html

Weighting to population totals

Protecting according to rules for public use files

Microdata sets for all three years available for research!

DANS (Data Archiving and Networked Services): http://www.dans.knaw.nl/en/

Data integration activities between the 2001 Census and the 2011 Census (1)

• Tables (http://www.cbs.nl/nl-NL/menu/themas/dossiers/historische-reeksen/publicaties/volkstelling-2001/2003-volkstelling-excel.htm)

• Book and extra chapter (http://www.cbs.nl/nl-NL/menu/themas/dossiers/historische-reeksen/publicaties/volkstelling-2001/2001-b57-pub.htm)

Data integration activities between the 2001 Census and the 2011 Census (2)

• Integrated Public Use Microdata Series (https://international.ipums.org/international)

• Lectures (Conferences, Universities, Research institutes, Statistical offices)

• ESTP-course Registers in Statistics (Oslo)

• International Statistical Seminar Eustat in Bilbao (http://www.eustat.es/prodserv/seminario_i.html)

• Digitalizing (http://www.volkstellingen.nl/en/)

• Recommendations and register-based statistics

• CENEX on ISAD (http://cenex-isad.istat.it)

• European census regulations

Preparing the 2011 Census

• Sources (the PR as backbone of the census, changes in contents and quality of registers, remaining information from LFS)

• Estimation method (repeated weighting, new version of the software, fall-back option of weighting to PR, zero cells problem)

• Statistical Disclosure Control of the hypercubes (Workshop on SDC of Census Data in April 2012)

• Tabular data in SDMX format and the Census Hub

Conclusions (1)

• A Dutch Virtual Census: yes, we can!
• Micro integration remains important
• Repeated weighting was a success

Advantages:
• Relatively cheap (small cost per inhabitant)
• Quick (short production time)

Disadvantages:
• Dependent on register holders (statistics is not their priority), timeliness of registers, concepts and population of registers may differ from what is needed (keep good relations with the register holders!)
• Publication of small subpopulations sometimes difficult or even impossible because of limited information

Conclusions (2)

Other aspects:
• Less attention for the results of a virtual census than for a traditional one
• Difficult to keep knowledge and software up-to-date (a Census is run only once every ten years)
• Enormous international interest in virtual censuses
• A lot of interesting census work in the coming years!

Time for questions and discussion