Evaluating the feasibility of using administrative data in the context of census population statistics
Kimberley Brett, Office for National Statistics
Census Transformation Programme, Beyond 2021 Research and Design


Page 1: Evaluating the feasibility of using administrative data in the context of census population statistics

Evaluating the feasibility of using administrative data in the context of census population statistics

Kimberley Brett Office for National Statistics

Census Transformation Programme

Beyond 2021 Research and Design

Page 2

Overview

• Use of administrative data for population statistics requires careful consideration of quality issues:

- Coverage

- Lags

- Definitions

- Policy

- Administrative and operational process

• Outline how we are developing a framework for evaluating and improving the quality of administrative data used in Statistical Population Dataset (SPD) construction

• Examples of quality issues in our analysis

Page 3

Admin Source Reports Beyond 2011

• During phase 1, Beyond 2011 published a series of reports on the following data sources:

- Patient Register (PR)

- Customer Information System (CIS)

- Higher Education Statistics Agency (HESA) Student Records

- English School Census and Welsh School Census

- Electoral Register

• Evaluation framework that focused on each source separately:

- Coverage

- Plausibility

- Metadata

- Understanding administrative process

Page 4

Assessing Coverage – Comparison with Census Estimates

• Patient Register 2011

• Customer Information System 2011

Page 5

Assessing Coverage – Comparison with Census Estimates

• School Census

• Electoral Register

Page 6

Developing the quality framework

• Our approach is to combine administrative sources to estimate the population and its characteristics

• Need to evolve the quality framework to identify and improve on specific issues emerging in the SPD analysis:

- Stability of the admin data

- Source conflicts

- Accounting for missing groups

• Established a Data Suppliers Group with government departments supplying the data:

- Acquisition of additional data to supplement SPD

- Feedback on the quality issues identified in our analysis

- Try to influence how data is collected and maintained for statistical purposes

• UKSA quality toolkit provides us with a framework to work towards

Page 7

SPD 5 population counts compared to the 2011 Census

91% of local authority (LA) total population counts within 3.8% of Census estimate in 2011

[Figure legend: admin data method lower than 2011 Census; admin data method higher than 2011 Census]
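A headline figure of this kind ("X% of areas within Y% of the census estimate") can be computed as in the minimal sketch below. The function name, the area codes and all counts are made up for illustration; they are not ONS data, and the presentation does not describe how the 91% figure was actually derived.

```python
def share_within_tolerance(spd_counts, census_estimates, tolerance_pct):
    """Percentage of areas whose SPD count lies within +/- tolerance_pct
    of the census estimate. Both inputs map area code -> population."""
    areas = spd_counts.keys() & census_estimates.keys()
    if not areas:
        raise ValueError("no areas in common")
    within = 0
    for area in areas:
        census = census_estimates[area]
        # Signed difference: negative means the admin-based count is lower.
        diff_pct = 100.0 * (spd_counts[area] - census) / census
        if abs(diff_pct) <= tolerance_pct:
            within += 1
    return 100.0 * within / len(areas)


# Hypothetical counts for four made-up local authorities.
spd = {"LA_A": 101_000, "LA_B": 95_000, "LA_C": 250_500, "LA_D": 49_000}
census = {"LA_A": 100_000, "LA_B": 100_000, "LA_C": 248_000, "LA_D": 50_000}

share = share_within_tolerance(spd, census, tolerance_pct=3.8)
print(f"{share:.0f}% of areas within 3.8% of the census estimate")  # 75%
```

In this toy input only LA_B (5% below the census estimate) falls outside the tolerance.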

Page 8

SPD 5 population counts compared to the 2011 Census

• 20-24 year old males

Page 9

Using linked data to evaluate quality

• Linking data across multiple administrative sources and Census data provides valuable insight to explain differences between SPD counts and Official Estimates

• Can measure the accuracy

• How long lags persist for cohorts of the population

• Definitional and collection mode differences

• Volatility of operational processes

Page 10

Lags of a Statistical Population Dataset

• Counts on the SPD correlate well with Census estimates, but lags exist in particular locations

[Figure: percentage of the population by single year of age that are recorded in the same location when comparing the SPD to census estimates. X-axis: age by single year]
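The measure plotted here, the share of people recorded in the same location on both sources by single year of age, could be derived from linked records roughly as follows. This is a sketch under assumptions: the record layout (age, SPD location, census location) and the location codes are hypothetical, and the geographic level used in the original analysis is not stated in the slides.

```python
from collections import defaultdict


def pct_same_location_by_age(linked_records):
    """linked_records: iterable of (age, spd_location, census_location)
    for people linked between the SPD and the census.
    Returns {age: percentage recorded in the same location}."""
    totals = defaultdict(int)
    same = defaultdict(int)
    for age, spd_location, census_location in linked_records:
        totals[age] += 1
        if spd_location == census_location:
            same[age] += 1
    return {age: 100.0 * same[age] / totals[age] for age in sorted(totals)}


# Made-up linked records; location codes are placeholders.
records = [
    (19, "E1", "E1"), (19, "E1", "E2"),
    (20, "E3", "E4"), (20, "E3", "E3"), (20, "E5", "E6"), (20, "E7", "E7"),
    (45, "E8", "E8"), (45, "E9", "E9"),
]
same_location_pct = pct_same_location_by_age(records)
print(same_location_pct)
```

Ages with a low percentage are the cohorts where the administrative location is most likely to lag behind the census location.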

Page 11

Lags on the Patient Register (PR)

• How long does it take for PR records to update location?

[Figure legend]

- Same address information on 2011 PR as Census

- Same address information on 2012 PR as Census

- Same address information on 2013 PR as Census

- Different address information on 2011 PR, 2012 PR and 2013 PR to Census
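One way to sort linked records into categories like those in the legend is to find the earliest PR extract whose address matches the census address. The sketch below assumes that reading of the categories (earliest matching year), which the slide does not confirm; the address identifiers and data layout are hypothetical.

```python
from collections import Counter


def first_matching_year(census_address, pr_addresses):
    """pr_addresses maps PR extract year -> address identifier.
    Returns the earliest year whose PR address equals the census
    address, or None if no extract matches."""
    for year in sorted(pr_addresses):
        if pr_addresses[year] == census_address:
            return year
    return None


# Made-up linked people: (census address, PR address per extract year).
people = [
    ("A1", {2011: "A1", 2012: "A1", 2013: "A1"}),  # already up to date
    ("B2", {2011: "B1", 2012: "B2", 2013: "B2"}),  # updated by 2012
    ("C3", {2011: "C1", 2012: "C1", 2013: "C3"}),  # updated by 2013
    ("D4", {2011: "D1", 2012: "D1", 2013: "D1"}),  # never matches
]
lag_counts = Counter(first_matching_year(c, pr) for c, pr in people)
print(lag_counts)
```

The `None` bucket corresponds to records whose PR address differs from the census address in every extract considered.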

Page 12

Examples of changes in operational processes

• The National Duplicate Registration Initiative (NDRI) was the Audit Commission’s periodic exercise that used data matching techniques to review GPs’ patient lists

• Conducted list cleaning in 1999, 2004 and 2009

Removal of FP69s

• 2004 - 185,000 patient registration deductions

• 2009 - 95,000 patient registration deductions

• At present, this is done on an ad hoc basis by NHS areas

• Subsequently, in some LAs we observe significant reductions in patient numbers following list cleaning exercises, which result in decreases in population counts on SPDs
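A simple screen for this kind of operational effect is to flag periods where a register count falls sharply relative to the previous period. The sketch below is illustrative only: the threshold, the period labels and the counts are assumptions, not values from the analysis.

```python
def flag_sharp_drops(series, threshold_pct):
    """series: list of (period_label, count) in time order.
    Returns [(period_label, pct_change)] for periods where the count
    fell by at least threshold_pct relative to the previous period."""
    flagged = []
    for (_, previous), (label, current) in zip(series, series[1:]):
        change_pct = 100.0 * (current - previous) / previous
        if change_pct <= -threshold_pct:
            flagged.append((label, round(change_pct, 2)))
    return flagged


# Made-up monthly patient counts for one hypothetical local authority.
pr_counts = [
    ("2009-01", 200_000),
    ("2009-02", 200_400),
    ("2009-03", 194_000),  # sharp fall, e.g. after a list cleaning exercise
    ("2009-04", 194_300),
]
drops = flag_sharp_drops(pr_counts, threshold_pct=2.0)
print(drops)
```

A flagged period is only a prompt for investigation; a real fall in population and an administrative clean-up look the same in the counts alone.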

Page 13

Example of list cleaning and impact on SPD count

Page 14

Example of operational processing and impact on SPD count

[Figure: population (y-axis, 0 to 7,000,000) by month (x-axis, 1 to 12)]

Page 15

Example of operational processing and impact on SPD count

[Figure: population (y-axis, 0 to 3,500,000) by day of the month (x-axis, 0 to 35)]

Page 16

Quality of admin data for household statistics

• Census collects information at household level:

- Definition of “household” relates to ...

• Household definition not captured on admin data

• Limited to an address definition when producing household statistics

• Aims to produce household statistics in 2016:

- Number of households

- Household size

- Household composition

• Challenges in producing these statistics:

- Unable to geo-reference all addresses to an address frame

- Churn at the address results in inflated household sizes

- Relationships between family members not recorded on admin data
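The effect of churn on address-based household size can be illustrated with a small sketch: when households are approximated by grouping person records on an address identifier, a record that was not updated after a move still counts towards the old address. All identifiers below are hypothetical.

```python
from collections import Counter


def address_sizes(person_addresses):
    """person_addresses: iterable of (person_id, address_id).
    Returns Counter mapping address_id -> number of people recorded there."""
    return Counter(address for _, address in person_addresses)


# Made-up records: p3 has moved out of ADDR1 but the record was never
# updated, and p4 has since moved in.
admin_records = [
    ("p1", "ADDR1"),
    ("p2", "ADDR1"),
    ("p3", "ADDR1"),  # stale record
    ("p4", "ADDR1"),
    ("p5", "ADDR2"),
]
sizes = address_sizes(admin_records)
size_distribution = Counter(sizes.values())
print(sizes["ADDR1"])      # 4 recorded, although only 3 people live there
print(size_distribution)
```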

Page 17

Missing Address Identifiers (OSAPRs)

Page 18

Household Composition Estimates

Page 19

Modes of collection

• Later releases of Research Outputs will explore the potential for producing population characteristics.

• Will depend on data availability and quality.

• Potential topics include:

- Income

- Ethnicity

- Health

• Need to understand how the definitional differences and modes of collection impact on the statistics produced from admin data.

• Have undertaken analysis of linked records between the 2011 Census and School Census to compare ethnicity.

Page 20

Axis labels on the slide: 2011 Census ethnicity; English School Census ethnicity

| | White British | Irish | Irish Traveller/Gypsy/Romany | Indian | Bangladeshi | Pakistani | White and Asian | Other Asian | Chinese | African | White and Black African | Caribbean | White and Black Caribbean | Other White | Other Black | Other Mixed | Other Ethnicity | Missing | Total (denominator) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| White British | 95% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 2% | 0.50% | 0.50% | 0.50% | 2% | 5,048,672 |
| Irish | 41% | 47% | 1% | 0.50% | 0.50% | 0.50% | 1% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 5% | 0.50% | 3% | 0.50% | 2% | 22,609 |
| Irish Traveller/Gypsy/Romany | 35% | 2% | 54% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 6% | 0.50% | 0.50% | 1% | 2% | 9,150 |
| Indian | 0.50% | 0.50% | 0.50% | 89% | 0.50% | 1% | 1% | 5% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 1% | 0.50% | 2% | 169,609 |
| Bangladeshi | 0.50% | 0.50% | 0.50% | 0.50% | 92% | 1% | 0.50% | 2% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 4% | 99,905 |
| Pakistani | 0.50% | 0.50% | 0.50% | 1% | 0.50% | 86% | 1% | 4% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 1% | 3% | 4% | 252,189 |
| White and Asian | 11% | 0.50% | 0.50% | 1% | 0.50% | 2% | 54% | 3% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 3% | 0.50% | 15% | 3% | 4% | 82,152 |
| Other Asian | 1% | 0.50% | 0.50% | 12% | 0.50% | 2% | 2% | 58% | 1% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 4% | 17% | 2% | 84,028 |
| Chinese | 2% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 1% | 2% | 83% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 7% | 2% | 2% | 27,577 |
| African | 1% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 1% | 0.50% | 83% | 1% | 1% | 0.50% | 1% | 7% | 2% | 1% | 3% | 190,489 |
| White and Black African | 6% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 8% | 55% | 1% | 3% | 3% | 3% | 14% | 2% | 4% | 38,611 |
| Caribbean | 1% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 3% | 0.50% | 77% | 3% | 0.50% | 9% | 3% | 1% | 4% | 71,256 |
| White and Black Caribbean | 12% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 2% | 3% | 62% | 1% | 2% | 12% | 1% | 4% | 108,920 |
| Other White | 8% | 0.50% | 1% | 0.50% | 0.50% | 0.50% | 1% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 75% | 0.50% | 6% | 5% | 3% | 169,626 |
| Other Black | 1% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 34% | 1% | 30% | 2% | 0.50% | 20% | 6% | 1% | 4% | 27,625 |
| Other Mixed | 10% | 0.50% | 0.50% | 1% | 0.50% | 1% | 5% | 3% | 1% | 2% | 3% | 2% | 8% | 5% | 4% | 47% | 5% | 5% | 23,763 |
| Other Ethnicity | 5% | 0.50% | 0.50% | 0.50% | 0.50% | 0.50% | 2% | 12% | 0.50% | 2% | 1% | 0.50% | 0.50% | 10% | 2% | 10% | 50% | 4% | 66,760 |
| Missing | 69% | 0.50% | 0.50% | 2% | 1% | 3% | 1% | 2% | 0.50% | 4% | 0.50% | 2% | 1% | 5% | 1% | 2% | 2% | 3% | 222,193 |

Census and England School Census Ethnicity
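A cross-tabulation of this shape, with each row expressed as percentages of its own total, can be built from linked record pairs as in the sketch below. The function names and example pairs are hypothetical, and the code deliberately speaks of a "row source" and a "column source" rather than assuming which dataset sits on which axis.

```python
from collections import Counter, defaultdict


def row_percentages(pairs):
    """pairs: iterable of (row_source_value, column_source_value) for
    linked records. Returns {row_value: {"total": n, "pct": {col: %}}}."""
    counts = defaultdict(Counter)
    for row_value, col_value in pairs:
        counts[row_value][col_value] += 1
    table = {}
    for row_value, row_counts in counts.items():
        total = sum(row_counts.values())
        table[row_value] = {
            "total": total,
            "pct": {c: 100.0 * n / total for c, n in row_counts.items()},
        }
    return table


def agreement_rate(pairs):
    """Percentage of linked records with the same value on both sources."""
    pairs = list(pairs)
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


# Made-up linked pairs, not taken from the table above.
linked = (
    [("Indian", "Indian")] * 9
    + [("Indian", "Other Asian")]
    + [("Chinese", "Chinese")] * 4
    + [("Chinese", "Other Mixed")]
)
crosstab = row_percentages(linked)
overall = agreement_rate(linked)
print(crosstab["Indian"])
print(round(overall, 1))
```

The diagonal of such a table shows agreement between the two sources for each category, while off-diagonal cells show where definitional or collection-mode differences move people between categories.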

Page 21

Admin data quality and how it impacts SPDs

| Risk / Issue | Example | Mitigation |
|---|---|---|
| Definitional differences | Definition of residence; definition of household; lack of common address identifier, e.g. UPRN | Harmonisation, combined use with surveys; single address register and UPRN at source |
| Registration | People not de-registered when abroad or dead; not registering on arrival or moving; lags in registration and update | Using activity data and combined survey use; feedback to data suppliers |
| Policy or operational change | Patient register list cleaning; benefit change, e.g. Universal Credit | Consultation and engagement with ONS/GSS; through new legislation? |

Page 22

Summary – The potential future of administrative data

• Need to incorporate more evidence of activity from admin data to help improve quality of SPDs

• More research to understand whether there are different characteristics of those people who are included or are not included on the SPD

• Need to maintain and establish positive relationships with Data Suppliers

Page 23

References

• Source reports for administrative datasets can be found on the ONS website under:

http://www.ons.gov.uk/ons/about-ons/who-ons-are/programmes-and-projects/beyond-2011/reports-and-publications/index.html

Page 24

Census Transformation Programme

Annual Research Conference

Covering: 2021 Census design, Census topic consultation, linking admin data, research outputs

26-27th November 2015

Chichester College

To register an interest or for more information please email:

[email protected]

Page 25

[email protected]

Any Questions?