The Nature of the Bias When Studying Only Linkable Person Records: Evidence from the American Community Survey

Adela Luque (U.S. Census Bureau)

Brittany Bond (U.S. Department of Commerce)

J. David Brown & Amy O’Hara (U.S. Census Bureau)

FedCASIC March 2014



Disclaimer

Any opinions and conclusions expressed herein are those of the authors and do not necessarily reflect the views of the U.S. Census Bureau

All results have been reviewed to ensure that no confidential information on individual persons is disclosed


Overview

Motivation

Objectives

Data & Methodology

Background on Anonymous Identifier Assignment Process

Expected Effects

Results

Conclusions


Motivation

Record linkage can enrich data, improve its quality & lead to research not otherwise possible - while reducing respondent burden & operational costs

Linking data requires common identifiers unique to each record that protect confidentiality

Census Bureau assigns Protected Identification Keys (PIKs) via a probabilistic matching algorithm, the Person Identification Validation System (PVS)

Not possible to reliably assign a PIK to every record, which may introduce bias in data analysis


Objectives

What characteristics are associated with the probability of receiving a PIK? That is, what is the nature of the bias introduced by incomplete PIK assignment?

Help researchers understand nature of bias, interpret results more accurately, adjust/reweight linked analytical dataset

Examine bias using regression analysis - before & after changes in PVS. Do alterations to PVS improve PIK assignment rates as well as reduce bias?

NORC (2011) described some demographic and socio-economic characteristics of those records not getting a PIK


Data & Methodology

2009 & 2010 American Community Survey (ACS) – processed through PVS

Ongoing representative survey of the U.S. population

Socioeconomic, demographic & housing characteristics

50 states & DC - Annual sample approximately 4.5 million person records

Probit model for 2009 and 2010 separately

Dependent variable = 1 if person record received a PIK (0 otherwise)

Covariates:

Demographic characteristics: age, sex, race and Hispanic origin

Socio-economic characteristics: employment status, income, poverty status, marital status, level of education, public program participation, health insurance status, citizenship status, English proficiency, military status, mobility status, and household type

Housing and address-related characteristics: urban vs. rural, type of living quarter, age of living quarter

ACS replicate weights

Report marginal effects
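To illustrate what a reported marginal effect means for a probit with a binary covariate, the sketch below computes the discrete change in the predicted probability of receiving a PIK when a dummy variable switches from 0 to 1. The coefficients, covariate names and function names are hypothetical and chosen only for illustration; they are not estimates from the ACS, and survey weights are ignored here.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def probit_probability(intercept, coefs, x):
    """P(PIK = 1 | x) = Phi(intercept + sum_k coefs[k] * x[k])."""
    index = intercept + sum(coefs[name] * value for name, value in x.items())
    return _STD_NORMAL.cdf(index)


def dummy_marginal_effect(intercept, coefs, x, name):
    """Discrete change in P(PIK = 1) when dummy `name` goes from 0 to 1,
    holding the other covariates in `x` fixed."""
    x_on = dict(x, **{name: 1})
    x_off = dict(x, **{name: 0})
    return (probit_probability(intercept, coefs, x_on)
            - probit_probability(intercept, coefs, x_off))


# Hypothetical coefficients, for illustration only (not ACS estimates).
intercept = 1.4
coefs = {"renter": -0.20, "mover": -0.35}
baseline = {"renter": 0, "mover": 0}

effect = dummy_marginal_effect(intercept, coefs, baseline, "renter")
print(f"Marginal effect of renting: {effect:.4f}")
```

A negative value means the group is less likely to be validated than its base category, which is how the bars in the results charts read.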

2009 & 2010 results compared – before & after changes to PVS
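One common way to judge whether a marginal effect changed between the two years is a z-test on the difference, treating the 2009 and 2010 samples as independent. The slides do not say which test the authors used, so this is a sketch under that assumption, and the estimates and standard errors below are made up.

```python
from math import sqrt
from statistics import NormalDist


def change_in_effect_test(effect_a, se_a, effect_b, se_b):
    """Two-sided z-test for effect_b - effect_a, assuming independent samples."""
    diff = effect_b - effect_a
    se_diff = sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se_diff
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, z, p_value


# Hypothetical marginal effects and standard errors (not from the slides).
diff, z, p = change_in_effect_test(-0.030, 0.002, -0.020, 0.002)
print(f"change = {diff:+.3f}, z = {z:.2f}, p = {p:.4f}")
```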


Background on PVS

Probabilistic match of data from an incoming file (e.g., survey) to reference file containing data from the Social Security Administration enhanced with address data obtained from federal administrative records

If a match is found, person record receives a PIK or is “validated”


Background on PVS

Initial edit to clean & standardize linking fields (name, dob, sex & address)

Incoming data processed through cascading modules (or matching algorithms)

Only records failing a given module move on to the next

Impossible to compare all records in incoming file to all records in reference file → “blocking”

Data split into blocks/groups based on exact matches of certain fields or part of fields – probabilistic matching within block
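The blocking idea can be sketched as follows. This is a minimal illustration, not the PVS implementation: the blocking key (3-digit ZIP plus year of birth), the field names, the similarity score (`difflib.SequenceMatcher` on names, standing in for a real probabilistic match score) and the 0.85 threshold are all assumptions made for the example.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def block_key(record):
    """Hypothetical blocking key: 3-digit ZIP plus year of birth."""
    return (record["zip"][:3], record["dob"][:4])


def name_similarity(a, b):
    """Crude stand-in for a probabilistic match score, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_within_blocks(incoming, reference, threshold=0.85):
    """Compare each incoming record only to reference records in its block."""
    blocks = defaultdict(list)
    for ref in reference:
        blocks[block_key(ref)].append(ref)

    matches = {}
    for rec in incoming:
        candidates = blocks.get(block_key(rec), [])
        best = max(candidates,
                   key=lambda ref: name_similarity(rec["name"], ref["name"]),
                   default=None)
        if best and name_similarity(rec["name"], best["name"]) >= threshold:
            matches[rec["id"]] = best["pik"]
    return matches


# Made-up records; the PIK values are placeholders.
reference = [
    {"pik": "P1", "name": "Maria Lopez", "dob": "1980-04-02", "zip": "20746"},
    {"pik": "P2", "name": "John Smith", "dob": "1975-11-30", "zip": "20746"},
]
incoming = [
    {"id": "r1", "name": "Maria Lopes", "dob": "1980-04-02", "zip": "20747"},
    {"id": "r2", "name": "Jon Smyth", "dob": "1990-01-01", "zip": "10001"},
]
print(match_within_blocks(incoming, reference))
```

Record `r2` is never compared with anything because no reference record shares its block. That is the cost of blocking: fewer comparisons, but a true match that falls into a different block is missed unless another pass uses a different key.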


Background on PVS

2009 PVS Modules

Verification – Only for incoming files w/ SSNs

Geosearch looks for name/dob/gender matches after blocking on an address or address part (within 3-digit ZIP area)

Namesearch looks for name & dob matches within a block based on parts of name/dob

Each module has several ‘passes’ – different blocking & matching strategies
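The cascade can be pictured as a loop in which each module only sees the records that earlier modules failed to match. The sketch below is illustrative: the three functions are toy stand-ins named after the 2009 modules, and their lookup rules and data are invented for the example rather than taken from the actual PVS passes.

```python
def run_cascade(records, modules):
    """Send records through modules in order; only unmatched records move on."""
    assigned = {}  # record id -> (PIK, name of the module that matched)
    remaining = list(records)
    for module_name, module in modules:
        still_unmatched = []
        for rec in remaining:
            pik = module(rec)
            if pik is None:
                still_unmatched.append(rec)
            else:
                assigned[rec["id"]] = (pik, module_name)
        remaining = still_unmatched
    return assigned, remaining


# Toy reference lookups (hypothetical data, not real identifiers).
BY_SSN = {"000-00-0001": "P1"}
BY_ADDRESS_NAME = {("12 Oak St", "ana diaz"): "P2"}
BY_NAME_DOB = {("lee chen", "1990-05-17"): "P3"}


def verification(rec):
    return BY_SSN.get(rec.get("ssn"))


def geosearch(rec):
    return BY_ADDRESS_NAME.get((rec.get("address"), rec["name"].lower()))


def namesearch(rec):
    return BY_NAME_DOB.get((rec["name"].lower(), rec.get("dob")))


modules = [("Verification", verification),
           ("Geosearch", geosearch),
           ("Namesearch", namesearch)]
records = [
    {"id": 1, "name": "Sam Roe", "ssn": "000-00-0001"},
    {"id": 2, "name": "Ana Diaz", "address": "12 Oak St"},
    {"id": 3, "name": "Lee Chen", "dob": "1990-05-17"},
    {"id": 4, "name": "No Match"},
]
assigned, unmatched = run_cascade(records, modules)
print(assigned)
print([rec["id"] for rec in unmatched])
```

Records still in `unmatched` after the last module are the ones that end up without a PIK.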


Background on PVS

2010 PVS Enhancements

ZIP3 Adjacency Module looks for name/dob/gender/address matches after blocking on address field parts in areas adjacent to 3-digit ZIP area

DOB Search Module looks for name/gender/dob matches after blocking on month & day of birth

Household Composition Search Module looks for name/dob matches for unmatched records that were seen in the past at the same address with a PIKed record

Inclusion of ITINs in reference file


Expected Effects

Less likely to obtain a PIK:

Insufficient or inaccurate person identifying info in incoming record

Issues w/ data collection or withholding due to language barriers, trust in govt., privacy preferences

Identifying info in incoming file & reference file more likely to differ

Address info differs/not updated

Movers, rent vs. own, certain types of housing

Record not in government reference files

Newborns, recent immigrants, very poor/unemployed/no govt. program recipient


Results – Overall Validation Rates

Figure: PVS Validation Rate (weighted), in percent. 2009: 88.1; 2010: 92.6

Sources: 2009 & 2010 ACS
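A weighted validation rate is the weight-summed share of person records that received a PIK. The sketch below shows the calculation on a handful of made-up records; the field names and weights are hypothetical, and the last lines only restate the arithmetic behind the two rates on the chart.

```python
def weighted_validation_rate(records):
    """Percent of weighted person records with a PIK: 100 * sum(w * pik) / sum(w)."""
    total_weight = sum(rec["weight"] for rec in records)
    validated_weight = sum(rec["weight"] for rec in records if rec["has_pik"])
    return 100.0 * validated_weight / total_weight


# Made-up records: weight = number of people the record represents.
sample = [
    {"weight": 120, "has_pik": True},
    {"weight": 80, "has_pik": True},
    {"weight": 50, "has_pik": False},
]
print(f"{weighted_validation_rate(sample):.1f}")  # 200 of 250 weighted persons

# Percentage-point change between the two rates shown on the chart.
rate_2009, rate_2010 = 88.1, 92.6
print(f"{rate_2010 - rate_2009:.1f} percentage points")
```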


Probit Results

Figure: Marginal Effect of Hispanic Origin on PVS Validation, 2009 and 2010. Categories: Hispanic, Non-Hispanic (base). Vertical axis: -0.040 to 0.000.

Sources: 2009 & 2010 ACS

Probit Results

Figure: Marginal Effect of Race on PVS Validation, 2009 and 2010. Categories: White alone (base), Black alone, AIAN alone, Asian alone, NHPI alone, Some Other Race. Vertical axis: -0.035 to 0.010.

Note: Dotted bars indicate that change in marginal effect from 2009 to 2010 is not statistically significant.

Probit Results

Figure: Marginal Effect of Citizenship Status on PVS Validation, 2009 and 2010. Categories: Non-U.S. Citizen, Foreign-Born U.S. Citizen, U.S. Citizen (base). Vertical axis: -0.140 to 0.000.

Probit Results

Figure: Marginal Effect of Home Spoken English Quality on PVS Validation, 2009 and 2010. Categories: Poor Spoken English, Not Poorly Spoken English (base). Vertical axis: -0.050 to 0.000.

Note: Dotted bars indicate that change in marginal effect from 2009 to 2010 is not statistically significant.

Probit Results

Figure: Marginal Effect of Age on PVS Validation, 2009 and 2010. Categories: 0-2, 3-5, 6-9, 10-14, 15-18, 19-24, 25-34 (base), 35-44, 45-54, 55-64, 65-74, 75 and Older. Vertical axis: -0.050 to 0.050.

Note: Dotted bars indicate that change in marginal effect from 2009 to 2010 is not statistically significant.

Probit Results

Figure: Marginal Effect of Mobility Status on PVS Validation, 2009 and 2010. Categories: Non-Mover in 12 Months Before IM (base), Mover From Abroad in 12 Months before IM, Domestic Mover in 12 Months before IM, Moving status missing. Vertical axis: -0.180 to 0.020.

Probit Results

Figure: Marginal Effect of Rent vs. Own on PVS Validation, 2009 and 2010. Categories: Rented Housing Unit, Own Home (base). Vertical axis: -0.040 to 0.000.

Probit Results

Figure: Marginal Effect of Type of Living Quarter on PVS Validation, 2009 and 2010. Vertical axis: -0.040 to 0.050.

Note: Dotted bars indicate that change in marginal effect from 2009 to 2010 is not statistically significant.

Probit Results

Figure: Marginal Effect of Family Household Status on PVS Validation, 2009 and 2010. Categories: Non-Family Household (base), Living with Family. Vertical axis: 0.0000 to 0.0300.

Probit Results

Figure: Marginal Effect of Rural vs Urban Area on PVS Validation, 2009 and 2010. Categories: Rural (base), Urban Area. Vertical axis: -0.004 to 0.010.

Probit Results

Figure: Marginal Effect of Employment Status on PVS Validation, 2009 and 2010. Categories: Private Employment, Government Employment, Self-Employed, Family Employment, Not Employed - includes missing (base). Vertical axis: -0.005 to 0.035.

Note: Dotted bars indicate that change in marginal effect from 2009 to 2010 is not statistically significant.

Probit Results

Figure: Marginal Effect of Health Insurance Status on PVS Validation, 2009 and 2010. Categories: Social Security Recipient, Private Health Insurance, Public Health Insurance, Both Private and Public Health Insurance, Uninsured (base). Vertical axis: 0.000 to 0.060.

Probit Results

Figure: Marginal Effect of Receiving Public Assistance on PVS Validation, 2009 and 2010. Categories: Public Assistance Recipient, Not Public Assistance Recipient (base). Vertical axis: 0.0000 to 0.0140.

Note: Dotted bars indicate that change in marginal effect from 2009 to 2010 is not statistically significant.

Probit Results

Figure: Marginal Effect of Poverty Status on PVS Validation, 2009 and 2010. Categories: Not In Poverty (base), In Poverty. Vertical axis: -0.020 to 0.000.

Conclusions

Mobile persons, those with lower income, unemployed, in process of integrating in economy/society, non-participants in government programs are less likely to be validated

Renters, movers, mobile homes

Low income, non-employed, most minorities, non-U.S. citizens, poor English

Non-participants of govt. program, uninsured, non-military

Researchers may wish to reweight observations based on validation propensity
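Reweighting by validation propensity could look like the following: among validated records, divide each survey weight by the estimated probability that the record received a PIK, so that groups that are harder to validate count for more. This is a minimal sketch of inverse-propensity weighting, not a procedure described in the slides. The propensities and field names are hypothetical; in practice the propensities would come from a model such as the probit described under Data & Methodology.

```python
def reweight_validated(records):
    """Return validated records with weights inflated by 1 / propensity."""
    reweighted = []
    for rec in records:
        if not rec["has_pik"]:
            continue  # unvalidated records drop out of the linked dataset
        new_weight = rec["weight"] / rec["propensity"]
        reweighted.append({**rec, "adj_weight": new_weight})
    return reweighted


# Hypothetical records: propensity = estimated P(record receives a PIK).
records = [
    {"group": "owner", "weight": 100, "propensity": 0.95, "has_pik": True},
    {"group": "renter", "weight": 100, "propensity": 0.80, "has_pik": True},
    {"group": "renter", "weight": 100, "propensity": 0.80, "has_pik": False},
]
for rec in reweight_validated(records):
    print(rec["group"], round(rec["adj_weight"], 1))
```

The validated renter ends up with a larger adjusted weight than the validated owner, which compensates for renters being underrepresented among linkable records.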


Conclusions

Changes to PVS system

Increased overall validation rate by 4.5 percentage points

Reduced validation differences across most groups from 2009 to 2010

Record linkage research can lead to higher PIK assignment rates and less bias
