Oslo, 26.9.2012, UNECE Work Session on SDE

Topic (vii): Editing and Imputation of Census data

Discussion

Session organizer: Daniel Kilchmann / SFSO

Paper Summaries

WP.40 (Slovenia) – Editing of multiple source data in the case of Slovenian Agricultural Census 2010.

– Huge number of data tables and variables.

– Metadata driven applications for linkage and calculation of derived variables → independence from IT, decrease of IT workload, traceability and repeatability, documentation, but skilled subject staff needed, management and control of huge amount of metadata.

– Priority setting with overlapping sources, large differences.

– Combined data sources → decrease of reporting burden and higher data quality, but insignificant effect on costs and increase of E&I workload.
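
Priority setting among overlapping sources can be pictured as a ranked lookup with a discrepancy check. The following is only a minimal sketch: the source names, their order, and the helper names `pick_value` and `relative_gap` are hypothetical, since the summary does not state the rules actually used in the Slovenian census. Values are assumed to be non-negative quantities such as areas or head counts.

```python
from typing import Dict, Optional, Tuple

# Hypothetical priority order: earlier sources win when several report a value.
PRIORITY = ["admin_register", "subsidy_file", "questionnaire"]

def pick_value(reports: Dict[str, Optional[float]]) -> Tuple[Optional[float], Optional[str]]:
    """Return (value, source) from the highest-priority source that reports a value."""
    for source in PRIORITY:
        value = reports.get(source)
        if value is not None:
            return value, source  # the source name doubles as a traceability flag
    return None, None  # nothing reported: the item is left for imputation

def relative_gap(reports: Dict[str, Optional[float]]) -> float:
    """Largest relative difference between the reported (non-negative) values."""
    values = [v for v in reports.values() if v is not None]
    if len(values) < 2 or max(values) == 0:
        return 0.0
    return (max(values) - min(values)) / max(values)
```

A record whose `relative_gap` exceeds some chosen threshold could be routed to manual review instead of being resolved by priority alone.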

WP.41 (Austria) – The data imputation process of the Austrian register-based Census.

– Huge cost reduction!

– Census database built up by unique key → unique and multiple sources, derived variables.

– Splitting into census subjects → dependencies between subjects.

– Deterministic imputation, derivation from different sources.

– Random distributional imputation inside imputation classes (decks).

– Huge number of quality indicators.
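
Random distributional imputation inside imputation classes can be sketched as a random hot deck: a missing value is replaced by a value drawn at random from the observed values in the same class. The sketch below is an illustration under assumptions, not the Austrian implementation; the function name `hot_deck_impute`, the record layout, and the `imputed` flag are hypothetical.

```python
import random
from collections import defaultdict

def hot_deck_impute(records, class_vars, target, rng=None):
    """Fill missing `target` values with a random donor value from the same class."""
    if rng is None:
        rng = random.Random()
    # Build one deck of observed target values per imputation class.
    decks = defaultdict(list)
    for rec in records:
        if rec[target] is not None:
            decks[tuple(rec[v] for v in class_vars)].append(rec[target])
    result = []
    for rec in records:
        rec = dict(rec)          # work on a copy, leave the input untouched
        rec["imputed"] = False   # flag so imputed values stay traceable
        if rec[target] is None:
            deck = decks.get(tuple(rec[v] for v in class_vars))
            if deck:             # classes without donors stay missing
                rec[target] = rng.choice(deck)
                rec["imputed"] = True
        result.append(rec)
    return result
```

Because each donor value is drawn with equal probability, the imputed values follow the observed distribution within each class in expectation. Records in a class without any donor remain missing and would need a coarser class or another method.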

WP.42 and WP.43 (UK) – The Practical Implementation of the 2011 UK Census Imputation Methodology; Item Imputation of Census data in an automated production environment.

– Highly automated process.

– Shift from EDIS to CANCEIS → modules, households (HH) first, increased imputation quality.

– Deterministic imputation for relationship.

– Fall back imputation → tuning CANCEIS in automated production environment.

– Hard edit rules with variables in more than one module → e.g. missingness of addresses.

– Soft edit rules: increase of rare characteristics.

– Reordering of household members (available in newer version of CANCEIS)?

WP.44 (Abu Dhabi) – Edit and Imputation of the 2011 Abu Dhabi Census.

– Donor imputation: CANCEIS.

– Deterministic imputation: SAS → out of scope responses.

– Relax edit rules for large households: extended households, multiple wives, large expatriate population.

– Manual imputation → very large households.

– Predictive, Estimation and Distributional Accuracy for test data.

– Shift from manual to mainly donor imputation → decrease in workload, measurable changes, reproducible.
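
The three accuracy notions named above can be illustrated on test data where the true values are known. The measures below are one possible reading for a categorical variable, chosen for illustration; the summary does not say which measures the Abu Dhabi evaluation used, and all function names are hypothetical.

```python
from collections import Counter

def predictive_accuracy(true, imputed):
    """Share of records whose imputed value equals the true value."""
    return sum(t == i for t, i in zip(true, imputed)) / len(true)

def distributional_distance(true, imputed):
    """Total variation distance between category shares (0 means identical)."""
    n = len(true)
    ct, ci = Counter(true), Counter(imputed)
    return 0.5 * sum(abs(ct[c] - ci[c]) / n for c in set(ct) | set(ci))

def estimation_error(true, imputed, category):
    """Error in the estimated share of one category."""
    n = len(true)
    return list(imputed).count(category) / n - list(true).count(category) / n

true = ["a", "a", "b", "b"]
imputed = ["a", "b", "a", "b"]
# Half of the individual values are wrong, yet the distribution is preserved.
print(predictive_accuracy(true, imputed), distributional_distance(true, imputed))
```

The example shows why the notions are assessed separately: an imputation can predict individual values poorly while still reproducing the distribution and the derived estimates.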

WP.45 (Mexico) – Editing Census Data: Mexico's experience.

– Traditional Census with 6 kinds of questionnaires.

– Vector's methodology: generate all possible combinations of values that variables involved in an edit rule can have → specific treatment.

– Editing Criteria Simulator to assess quality of edits.

– Urban Environment form: standardization of street names, presentation on map → inconsistencies.
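
The idea behind the Vector's methodology, as summarized above, is to enumerate every combination of values for the variables in an edit rule and attach a treatment to each combination. A minimal sketch follows; the variable domains, the edit rule, and the treatment labels are invented for illustration and are not taken from the Mexican census.

```python
from itertools import product

# Hypothetical value domains for the variables involved in one edit rule.
DOMAINS = {
    "age_group": ["0-11", "12+"],
    "marital_status": ["single", "married", "missing"],
}

def edit_fails(vector):
    """Hypothetical edit: a child under 12 cannot be married."""
    return vector["age_group"] == "0-11" and vector["marital_status"] == "married"

def enumerate_vectors(domains):
    """Yield every combination of values as a dict (one 'vector' each)."""
    names = list(domains)
    for values in product(*(domains[n] for n in names)):
        yield dict(zip(names, values))

def treatment_table(domains):
    """Map each vector to a treatment label (labels are illustrative only)."""
    table = {}
    for vector in enumerate_vectors(domains):
        key = tuple(vector.values())
        if edit_fails(vector):
            table[key] = "impute marital_status"
        elif "missing" in key:
            table[key] = "impute missing item"
        else:
            table[key] = "accept"
    return table
```

With two and three values in the domains the table has six entries, so every case a record can present has an explicit, reviewable treatment. The number of vectors grows multiplicatively with the domains, which is why the enumeration is done per edit rule rather than over all variables at once.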

General discussion

• Is it worth shifting from classical Census to register-based Census?

• Are registers used for Censuses stable – can we guarantee stable figures?

• How to define the priority ordering of several sources and how to measure the efficiency of this decision?

• Is it better to have several sources or just one per variable?

• Reliability of registers vs reliability of questionnaires?

• What about surveys other than the Census: is a register-based strategy implemented or planned? Conclusions?

• Respondent burden vs. user burden (in case of flags indicating the source of data or other para-/metadata)?

• Is re-use of the process easier for a classical Census or for a register-based Census?

• Is a fully automated process realistic?

• Shift from good to better E&I tool: will we be able to solve all problems one day? Is that needed? Stability of the process?

• Rare sub-populations – 'important' for researchers and how about E&I?

• Always sequence in processing: can we solve problems due to that, e.g. how to solve the problem with edit rules involving variables from different modules?

• Are there new dimensions of editing planned, like some sort of spatial editing?