Determination of Administrative Data Quality: Recent results and new developments

Piet J.H. Daas, Saskia J.L. Ossen, and Martijn Tennekes

Statistics Netherlands

May 6, 2010, Helsinki, Finland




Overview

Introduction
View on quality
Framework developed for admin. data sources
• Construction and composition
Application (first part)
• Checklist and results
New developments
• Ideas and future work
• BLUE-ETS


Introduction

Statistics Netherlands is increasing its use of data (sources) collected and maintained by others
• To decrease response burden and costs

As a result, Statistics Netherlands:
• Becomes more dependent on administrative data sources
• Must be able to monitor the quality of those data sources
– What is ‘quality’ in this context?


View on quality

Statistics Netherlands defines quality of administrative data sources as:

“Usability for the production of statistics”

Differs from ‘quality’ as used by the data source keeper
– The keeper often does not have statistical use in mind
– Can’t use the quality report of the data source keeper (if available)

And it is the quality of the input!


Framework developed

No standard framework is available for the input quality of administrative data sources

Quality of administrative data is only occasionally addressed in the literature
• Majority of studies on quality and statistics focus on:
– output quality
– quality of survey data

Framework for the determination of the quality of administrative data sources, based on:
• Statistics Netherlands' experiences and ideas
• Results published by others


Framework overview (1)

Many quality indicators were identified
• In total 57!

Many dimensions were identified
• In total 19

How to combine and structure these indicators?
• Distinguish different views on quality
• Alternative name is Hyperdimensions

3 Hyperdimensions were required to combine all quality indicators into a single framework!
• First step towards a structured approach


Framework overview (2)

Three high-level views on the input quality of administrative data sources
• 3 hyperdimensions


3 different high-level views on quality

SOURCE:
– Focus on the data source as a whole
– Delivery related aspects
– and some other things

METADATA:
– Focuses on the (availability of the) information required to understand and use the data in the data source

DATA:
– Technical checks
– Accuracy related issues


Determine Source and Metadata quality

With a checklist
• Used for both Source and Metadata

Tested on 8 administrative data sources
• Took on average about 2 hours per data source

Results expressed at the dimensional level
• 5 dimensions for Source, 4 for Metadata


Checklist results (1) - Source

Table 1. Evaluation results for the Source hyperdimension

Dimensions                Data sources
                          IPA   SFR   CWI   ERR   1FigHE  1FigSGE  NCP   MBA
1. Supplier               +     +     +     +     +       +        +     +
2. Relevance              +     +     +     o     +       +        +     +
3. Privacy and Security   +     +     +     +     +       +/o      +     +
4. Delivery               o     +     -     +     +       o        +     +
5. Procedures             +     +/o   +     +/o   +/o     +/o      o     +

+, good; o, reasonable; -, poor; ?, unclear

IPA: Insurance Policy records Administration
SFR: Student Finance Register
CWI: register of Centre for Work and Income
ERR: Exam Results Register
1FigHE: coordinated register for Higher Education
1FigSGE: coordinated register for Secondary General Education
NCP: National Car Pass register
MBA: Dutch Municipal Base Administration


Checklist results (2) - Metadata

Table 2. Evaluation results for the Metadata hyperdimension

Dimensions                Data sources
                          IPA   SFR   CWI   ERR   1FigHE  1FigSGE  NCP   MBA
1. Clarity                +     +     -     o     +       +        +     +
2. Comparability          +/o   +     -     +     +       +        +     +
3. Unique keys            +     +     +     +     +       +        +     +
4. Data treatment         +/o   ?(+)  ?     ?(o)  ?(+)    ?(+)     +     +

+, good; o, reasonable; -, poor; ?, unclear
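
The checklist scores in Tables 1 and 2 can also be held in a small data structure and scanned for poor or unclear entries. The Python sketch below is illustrative only: the dictionary layout and the function name `flag_sources` are assumptions and not part of the Statistics Netherlands checklist. The scores themselves are copied from the two tables.

```python
# Scores copied from Tables 1 and 2. The layout and all names are
# illustrative assumptions, not part of the original checklist.
SOURCES = ["IPA", "SFR", "CWI", "ERR", "1FigHE", "1FigSGE", "NCP", "MBA"]

SCORES = {
    "Source": {
        "Supplier":             ["+", "+", "+", "+", "+", "+", "+", "+"],
        "Relevance":            ["+", "+", "+", "o", "+", "+", "+", "+"],
        "Privacy and Security": ["+", "+", "+", "+", "+", "+/o", "+", "+"],
        "Delivery":             ["o", "+", "-", "+", "+", "o", "+", "+"],
        "Procedures":           ["+", "+/o", "+", "+/o", "+/o", "+/o", "o", "+"],
    },
    "Metadata": {
        "Clarity":        ["+", "+", "-", "o", "+", "+", "+", "+"],
        "Comparability":  ["+/o", "+", "-", "+", "+", "+", "+", "+"],
        "Unique keys":    ["+", "+", "+", "+", "+", "+", "+", "+"],
        "Data treatment": ["+/o", "?(+)", "?", "?(o)", "?(+)", "?(+)", "+", "+"],
    },
}


def flag_sources(scores, sources, symbol):
    """Return {source: [(hyperdimension, dimension), ...]} for scores containing `symbol`."""
    flagged = {}
    for hyperdimension, dimensions in scores.items():
        for dimension, row in dimensions.items():
            for source, score in zip(sources, row):
                if symbol in score:
                    flagged.setdefault(source, []).append((hyperdimension, dimension))
    return flagged


poor = flag_sources(SCORES, SOURCES, "-")      # sources with at least one poor score
unclear = flag_sources(SCORES, SOURCES, "?")   # sources with at least one unclear score
print("Poor:", poor)
print("Unclear:", unclear)
```

On these tables the scan flags only CWI as having poor scores (Delivery, Clarity, Comparability), and it lists Data treatment as unclear for five of the eight sources.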


Overall conclusions

Data sources
• CWI is the only negatively scoring data source
– Tempted to recommend not using it!
– Result of delivery issues and vague definitions
– However, it is the only administrative data source that contains educational data on the non-student part of the population!
– Solve the weaknesses!
• Other data sources
– Quite OK (there are always some things you can improve)
– Data processing by data source keeper needs attention

Checklist
– Good way to assist the user, quite fast
– Quality information on a basic but essential level
– Not all information is commonly known!


What about the Data hyperdimension?

How to study data quality?
• A draft list of indicators is available
– 10 dimensions and 26 indicators
• A structured approach needs to be developed!
1. Data inspection should be efficient
2. Assist user with scripts/software (where possible)
• A checklist?


Overview of data quality approach

[Diagram; recoverable labels only]
Phases: Exploratory phase (preproduction phase); Statistical process (Input, Throughput, Output)
Hyperdimensions: Source, Metadata, Data
Source and Metadata: Checklist
Technical checks: Data file size, Metadata compliance, Visualization methods, ….
Accuracy related indicators: Coverage, Selectivity, Linkage, Editing/Imputation, ….
Output related indicators: Precision, Sensitivity, Measurement, ….


Data: Technical checks

Very basic
• For RAW data
• Should be easy and quick
• No other info required!

Examples
• File size
• Number of (unique) units / records received
• Metadata compliance (standard for XML-files)
• Visual checks (Data fingerprinting)
– 2 examples
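
The first two example checks (file size and number of records and unique units received) could be scripted along the following lines. This is a minimal sketch under stated assumptions: the delivery is a CSV file with a header row, the unit identifier sits in a column called `unit_id`, and the function name `technical_checks` is hypothetical. None of these details come from the slides.

```python
import csv
import os
import tempfile


def technical_checks(path, key_column):
    """Basic checks on a raw delivery: file size, record count, unique unit count."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        keys = [row[key_column] for row in reader]
    return {
        "file_size_bytes": os.path.getsize(path),
        "records": len(keys),
        "unique_units": len(set(keys)),
    }


# Tiny made-up delivery, written to a temporary file for demonstration only.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False,
                                 newline="", encoding="utf-8") as tmp:
    tmp.write("unit_id,value\nA,1\nB,2\nB,3\n")
    demo_path = tmp.name

result = technical_checks(demo_path, "unit_id")
os.remove(demo_path)
print(result)
```

A difference between `records` and `unique_units` (3 versus 2 in the made-up file) points to duplicate units, which is the kind of quick signal these checks are meant to give without needing any other information.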


Technical checks: Visualization examples

Missing data

‘Data fingerprinting’
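
The original graphics are not recoverable from this transcript. As a text-only stand-in for a missing-data plot, the sketch below prints one line per record, with `#` for a present value and `.` for a missing one. The function name `missing_data_map`, the column names, and the records are all made up for illustration; this is not the visualization method used by the authors.

```python
def missing_data_map(records, columns):
    """Return one string per record: '#' where a value is present, '.' where it is missing."""
    lines = []
    for record in records:
        lines.append(
            "".join("." if record.get(col) in (None, "") else "#" for col in columns)
        )
    return lines


# Made-up records; empty strings and None both count as missing.
records = [
    {"id": "1", "age": "34", "income": ""},
    {"id": "2", "age": None, "income": "2100"},
    {"id": "3", "age": "51", "income": "1800"},
]

pattern = missing_data_map(records, ["id", "age", "income"])
for line in pattern:
    print(line)
```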


Data: Accuracy related indicators

First true indicators in the process
• Information from other data sources is required

Examples of indicators for units
• Over coverage indicator
– Units in source not belonging to NSI-population
• Under coverage indicators
– Missing units: NSI-population units not in source
– Selectivity: representativity of units in data source compared to NSI-population (RISQ-project)
• Linkability indicators
– Correctly linked units, incorrectly linked units, and selectivity of linked units
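
The over and under coverage definitions above translate directly into set operations on unit identifiers. The sketch below is one possible reading, not the authors' method: the slides give no formulas, so the choice of denominators (source size for over coverage, population size for under coverage), the function name `coverage_indicators`, and the identifiers are assumptions.

```python
def coverage_indicators(source_units, population_units):
    """Compare unit identifiers in a data source with those in the NSI population."""
    source = set(source_units)
    population = set(population_units)
    over = source - population    # units in source not belonging to the population
    under = population - source   # population units not in the source
    return {
        "over_coverage_rate": len(over) / len(source) if source else 0.0,
        "under_coverage_rate": len(under) / len(population) if population else 0.0,
        "matched_units": len(source & population),
    }


# Made-up identifiers for illustration.
indicators = coverage_indicators(
    source_units=["a", "b", "c", "x"],
    population_units=["a", "b", "c", "d", "e"],
)
print(indicators)
```

In the made-up example one of four source units falls outside the population (over coverage 0.25) and two of five population units are missing from the source (under coverage 0.4). Selectivity would need auxiliary variables on the units and is not covered by this sketch.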


Data: Output related indicators

Report data quality on an aggregated level
• Quality of the output!
• Need to link input quality to output quality

Examples of indicators:
• Precision of estimates of core variables
• Selectivity of core variable totals


How to report data quality?

‘Quality Report Card’
• Paper / computerized version
• Place where all results are combined and presented in an orderly way

Which indicators always?
• Is there a basic/minimum set?
• Hierarchy of quality indicators

Which indicators can be automatically determined?
• Create standardized scripts
• Create a software prototype
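
A computerized report card could be as simple as a container that collects indicator results per hyperdimension and prints them in a fixed order. The sketch below is a hypothetical illustration of that idea only: the class name `QualityReportCard`, its methods, and the example values are assumptions, and the slides do not describe an actual design.

```python
from dataclasses import dataclass, field


@dataclass
class QualityReportCard:
    """Collects indicator results per hyperdimension for one data source."""
    data_source: str
    results: dict = field(default_factory=dict)  # {hyperdimension: {indicator: value}}

    def add(self, hyperdimension, indicator, value):
        self.results.setdefault(hyperdimension, {})[indicator] = value

    def render(self):
        lines = [f"Quality Report Card: {self.data_source}"]
        for hyperdimension, indicators in self.results.items():
            lines.append(hyperdimension)
            for indicator, value in indicators.items():
                lines.append(f"  {indicator}: {value}")
        return "\n".join(lines)


# Made-up source name and values, for illustration only.
card = QualityReportCard("example register")
card.add("Source", "Delivery", "o")
card.add("Metadata", "Clarity", "+")
card.add("Data", "Records received", 3)
report = card.render()
print(report)
```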


Future plans

Fully focus on the Data hyperdimension
• Is a lot of work!

Study this in a European context
• BLUE-Enterprise and Trade Statistics project
– 7th Framework program
– From 1-4-2010 till 31-3-2013
– One of the topics is the study of admin. data quality
– This topic is studied jointly by the NSIs of: Netherlands, Italy, Norway, Slovakia, Sweden


Thank you for your attention!

More details in the Q2010-paper

Checklist can be obtained
• From the Statistics Netherlands website
• By mailing [email protected] and requesting a copy