Data editing and validation
Eric Schulte Nordholt
Senior researcher and project leader of the Census
Statistics Netherlands
Division Social and Spatial Statistics
Department Support and Development
Section Research and Development
UNECE-Eurostat Meeting on Population and Housing Censuses in Geneva
13-15 May 2008
Contents
• Introduction
• The French paper (Working Paper 8)
• The Italian paper (Working Paper 9)
Introduction
Activities
• Regular international meetings (e.g. UNSD, UNECE and Eurostat)
• Regular contacts and visits between countries
• UN Recommendations and European regulation

Aims
• Better comparability over time
• Better comparability between countries
France and Italy both contribute to the aims in their own way
The French paper (Working Paper 8)
The validation of the census data in France
• Rolling census with advantages (e.g. no longer an enormous peak in the workload) and disadvantages (e.g. a more complicated structure)
• Less attention paid to the results?
• Reference year? Assumption of stability of Census variables over time (research, e.g. based on EU-SILC?)
• Many checks to improve quality, but how large are the final overcount and undercount?
• Hot deck (within classes?) seems a logical choice for the imputation method, but what do we know about the selectivity of the non-response?
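Hot deck imputation within classes, as mentioned above, can be illustrated with a minimal sketch. This is not the French procedure itself but a generic illustration; the field and class variable names are hypothetical:

```python
import random

def hot_deck_impute(records, key, class_vars, seed=0):
    """Impute missing values of `key` by drawing a donor at random
    from responding records in the same imputation class."""
    rng = random.Random(seed)
    # Build a donor pool per imputation class from records where `key` is observed.
    pools = {}
    for r in records:
        if r.get(key) is not None:
            cls = tuple(r[v] for v in class_vars)
            pools.setdefault(cls, []).append(r[key])
    # Replace each missing value with a randomly drawn donor from its own class.
    for r in records:
        if r.get(key) is None:
            cls = tuple(r[v] for v in class_vars)
            if pools.get(cls):
                r[key] = rng.choice(pools[cls])
    return records
```

Drawing donors only within classes preserves the within-class distribution, but — as the bullet asks — it implicitly assumes the non-response is not selective within each class.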
The Italian paper (Working Paper 9)
An overview of editing and imputation methods
for the next Italian censuses
• Strong link to methodology (department) at Istat
• Dilemma of timeliness versus quality: automate the editing procedure, but what is the effect of the order of the imputations (checks catch outliers but not inliers, cf. Winkler)?
• Interesting link with graph theory, but a picture is missing (Blaise?)
• Minimum change approach (Chernikova algorithm?)
• Long/short form design reduces the number of possible donors
• Progress in quality: unique identifier (fiscal code)
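The minimum change approach named above seeks the smallest set of fields whose values must be altered for a record to satisfy all edit rules. A brute-force sketch (not the Chernikova algorithm itself, only an illustration of the objective; the edit rule and domains below are invented examples):

```python
from itertools import combinations, product

def minimum_change_fields(record, edits, domains):
    """Return a smallest set of fields whose values can be changed so the
    record passes every edit rule (exhaustive search over small domains)."""
    fields = list(record)
    # Try changing 0 fields, then 1 field, then 2, ... and stop at the
    # first cardinality for which a passing combination exists.
    for k in range(len(fields) + 1):
        for subset in combinations(fields, k):
            for values in product(*(domains[f] for f in subset)):
                trial = dict(record, **dict(zip(subset, values)))
                if all(edit(trial) for edit in edits):
                    return set(subset)
    return None
```

For example, a record with age 5 and marital status "married" violates the classic edit "a child cannot be married"; the search changes a single field (either one) rather than both, which is the minimum change principle. Production systems replace this exhaustive search with implicit-edit generation or vertex-enumeration methods such as Chernikova's algorithm.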