Statistical disclosure limitation: Balancing data confidentiality and data access


Page 1: Statistical disclosure limitation: Balancing data confidentiality and data access

Statistical disclosure limitation: Balancing data confidentiality and data access

Page 2: Statistical disclosure limitation: Balancing data confidentiality and data access

• Enables evidence-based policy-making

• Informs the general public on local and national concerns

• Advances scientific research

• Trains students in data analysis and decision making

Access to high-quality data is vital

Page 3: Statistical disclosure limitation: Balancing data confidentiality and data access

Breach of confidentiality:

• May violate laws (e.g., CIPSEA, HIPAA)

• May undermine broadly held and highly valued ethical principles

• May lead data providers to withhold important information or refuse to participate in research

Protecting confidentiality of data is essential

Page 4: Statistical disclosure limitation: Balancing data confidentiality and data access

• HIPAA Privacy Rule
  - “Safe harbor”
  - Statistical standard
  - Limited data sets

• 2008: Delaware Cancer Registry vs. press
  - Public’s desire to learn cancer sites
  - State requirement to protect privacy
  - New legislation

Example of tension between access and confidentiality

Page 5: Statistical disclosure limitation: Balancing data confidentiality and data access

• De-identification: Strip unique identifiers such as names, addresses, and tax IDs from shared files.

• Reducing potential for re-identification: Seemingly innocuous information may reveal individual identities and information.

Protecting confidentiality while providing access
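The de-identification step can be sketched in a few lines of Python. This is a minimal illustration, not a complete procedure: the field names (`name`, `address`, `tax_id`) and the sample record are invented for the example, and a real file would need its own list of direct identifiers.

```python
# Hypothetical field names; a real data set defines its own direct identifiers.
DIRECT_IDENTIFIERS = {"name", "address", "tax_id"}


def deidentify(records, identifiers=DIRECT_IDENTIFIERS):
    """Return copies of the records with the direct-identifier fields removed."""
    return [
        {field: value for field, value in record.items() if field not in identifiers}
        for record in records
    ]


# Invented example record, for illustration only.
records = [
    {
        "name": "A. Example",
        "address": "1 Main St",
        "tax_id": "000-00-0000",
        "birth_date": "1950-01-31",
        "gender": "F",
        "county": "Kent",
        "diagnosis": "X",
    },
]

shared = deidentify(records)
print(shared)
```

The shared record still carries birth date, gender, and county. These are the kind of seemingly innocuous fields that the second bullet warns about.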

Page 6: Statistical disclosure limitation: Balancing data confidentiality and data access

• “De-identification”
  Original data: name abcdefghijkl
  Name deleted: abcdefghijkl

• “Re-identification”
  Shared data: abcdefghijkl
  Other data: abcdefmnop name

Where:
a = Day, month, year of birth
b = Gender
c = State of residence
d = County
e = Occupation
f = Race

Example: Re-identification by matching
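The matching idea on this slide can be illustrated with a small Python sketch. It assumes two invented files: a shared file with names removed and an outside file that carries names, both holding the six fields a through f. The field names and all records are hypothetical, and real linkage methods are usually more tolerant of errors than this exact-match version.

```python
# Fields a-f from the slide, under hypothetical column names.
QUASI_IDENTIFIERS = ("birth_date", "gender", "state", "county", "occupation", "race")


def match_key(record):
    """Combination of quasi-identifier values used to link the two files."""
    return tuple(record[q] for q in QUASI_IDENTIFIERS)


def reidentify(shared, other):
    """Attach a name to each shared record that matches exactly one named record."""
    names_by_key = {}
    for rec in other:
        names_by_key.setdefault(match_key(rec), []).append(rec["name"])

    matches = []
    for rec in shared:
        names = names_by_key.get(match_key(rec), [])
        if len(names) == 1:  # an ambiguous match does not pin down one person
            matches.append({**rec, "name": names[0]})
    return matches


# Invented records, for illustration only.
base = {"gender": "F", "state": "DE", "county": "Kent", "occupation": "nurse", "race": "W"}
shared = [
    {**base, "birth_date": "1950-01-31", "diagnosis": "X"},
    {**base, "birth_date": "1962-07-04", "diagnosis": "Y"},
]
other = [
    {**base, "birth_date": "1950-01-31", "name": "Pat Example"},
    {**base, "birth_date": "1962-07-04", "name": "Lee Sample"},
    {**base, "birth_date": "1962-07-04", "name": "Kim Sample"},
]

matches = reidentify(shared, other)
print(matches)
```

Only the first shared record is re-identified here, because the second matches two named records. The more detailed the shared fields, the fewer such ties remain.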

Page 7: Statistical disclosure limitation: Balancing data confidentiality and data access

• Advances in statistical analysis and the collection of more detailed data enable researchers and policy makers to ask refined questions

• Enormous amounts of individual-level data are collected, processed, widely distributed … and linkable.

• Better matching technologies enable linkages

Better data – opportunities and problems

Page 8: Statistical disclosure limitation: Balancing data confidentiality and data access

• Personal information available on the Internet, from private sources, and government surveys

• Individuals with the right skills and resources could link this personal information to publicly available data:
  – MIT student re-identifies Massachusetts governor
  – NIH scientists express caution in making genetic information available

Problems – a closer look

Page 9: Statistical disclosure limitation: Balancing data confidentiality and data access

Statisticians:

• Develop ways to identify risk of confidentiality breaches

• Develop methods for providing safe access to confidential data

• Conduct research on providing safe access to emerging, complex data types

Statisticians can help find a satisfactory balance
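As one illustration of what identifying risk can mean in practice, the sketch below counts how many records share each combination of quasi-identifier values. Records that are unique on those values are the easiest to match to outside data. This is only one simple indicator, chosen here for illustration and not taken from the slides; the field names and records are invented.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count records sharing each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def risk_summary(records, quasi_identifiers):
    """Summarize how many records stand alone on the quasi-identifiers."""
    sizes = group_sizes(records, quasi_identifiers)
    unique = sum(1 for n in sizes.values() if n == 1)
    return {
        "smallest_group": min(sizes.values()),
        "unique_records": unique,
        "share_unique": unique / len(records),
    }


# Invented records, for illustration only.
records = [
    {"gender": "F", "county": "Kent"},
    {"gender": "F", "county": "Kent"},
    {"gender": "M", "county": "Kent"},
    {"gender": "M", "county": "Kent"},
    {"gender": "M", "county": "Sussex"},
]

summary = risk_summary(records, ("gender", "county"))
print(summary)
```

The smallest group size is the quantity often called k in discussions of k-anonymity. A value of 1 means at least one record is unique on the chosen fields.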

Page 10: Statistical disclosure limitation: Balancing data confidentiality and data access

General strategies for data protection:

• Modify data content: Remove or alter sensitive or identifying values, and provide unrestricted access to modified data (e.g., public use files)

• Control data access: Use technology and training to reduce chances of breaches, limit who can access the confidential data, the conditions under which the data can be accessed, and the purposes for which the data can be used

Useful data can be shared and protected

Page 11: Statistical disclosure limitation: Balancing data confidentiality and data access

• Eliminate variables (geography)

• Aggregate sensitive data (age, income)

• Add random variation to numerical data values

• Exchange some values between selected records

• Replace sensitive data with values simulated from statistical models estimated with the original data

Modified data: General techniques
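The five techniques listed above can each be sketched in a few lines of Python. Everything here is an assumption made for illustration: the field names, the bin width, the income cap, the noise level, and the records are invented, and the "model" used for simulation is a deliberately crude normal fit, not a method a statistical agency would use as-is.

```python
import random
import statistics


def drop_variables(records, names):
    """Eliminate variables: return copies without the named fields."""
    return [{k: v for k, v in r.items() if k not in names} for r in records]


def aggregate_age(age, width=10):
    """Aggregate: replace an exact age with a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def top_code(value, cap):
    """Aggregate the upper tail: report values above the cap as the cap."""
    return min(value, cap)


def add_noise(value, sd, rng):
    """Add random variation drawn from a normal distribution with mean 0."""
    return value + rng.gauss(0, sd)


def swap_values(records, field, n_pairs, rng):
    """Exchange one field's values between randomly selected pairs of records."""
    out = [dict(r) for r in records]
    for _ in range(n_pairs):
        i, j = rng.sample(range(len(out)), 2)
        out[i][field], out[j][field] = out[j][field], out[i][field]
    return out


def synthesize(values, rng):
    """Replace values with draws from a normal model fitted to the originals."""
    mu, sd = statistics.mean(values), statistics.stdev(values)
    return [rng.gauss(mu, sd) for _ in values]


# Invented records, for illustration only.
rng = random.Random(0)
records = [
    {"age": 37, "income": 52000, "county": "Kent"},
    {"age": 64, "income": 250000, "county": "Sussex"},
    {"age": 29, "income": 41000, "county": "New Castle"},
    {"age": 51, "income": 78000, "county": "Kent"},
]

public = drop_variables(records, {"county"})
for r in public:
    r["age"] = aggregate_age(r["age"])
    r["income"] = top_code(r["income"], cap=150000)

swapped = swap_values(public, "income", n_pairs=1, rng=rng)
noisy_incomes = [add_noise(r["income"], sd=1000, rng=rng) for r in records]
synthetic_incomes = synthesize([r["income"] for r in records], rng)
print(public)
```

Swapping keeps the overall distribution of the swapped field intact while breaking its link to individual records, which is why it appears alongside noise addition as an option for numerical data.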

Page 12: Statistical disclosure limitation: Balancing data confidentiality and data access

• Methods can be applied to all or some cases, to varying degrees

• Wider application of methods improves confidentiality protection, but…

• …degrades usefulness of data

• Statisticians measure the tradeoffs between disclosure risk and analytic/policy priorities

Key features of modified data
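The tradeoff can be made concrete with a toy calculation. The sketch below groups ages into wider and wider bands and reports, for each width, how many released values are still unique (a rough stand-in for disclosure risk) and how far the released values sit from the true ages on average (a rough stand-in for lost usefulness). Both measures and the list of ages are assumptions chosen for illustration.

```python
from collections import Counter


def coarsen(age, width):
    """Replace an integer age with the midpoint of its band of the given width."""
    low = (age // width) * width
    return low + (width - 1) / 2


def tradeoff(ages, widths):
    """For each band width, return (width, unique released values, mean abs error)."""
    rows = []
    for width in widths:
        released = [coarsen(a, width) for a in ages]
        counts = Counter(released)
        unique = sum(1 for v in released if counts[v] == 1)
        loss = sum(abs(a - r) for a, r in zip(ages, released)) / len(ages)
        rows.append((width, unique, loss))
    return rows


# Invented ages, for illustration only.
ages = [23, 24, 27, 31, 35, 38, 42, 47, 58, 71]
rows = tradeoff(ages, widths=[1, 5, 10, 20])
for width, unique, loss in rows:
    print(f"width={width:2d}  unique={unique:2d}  mean_abs_error={loss:.2f}")
```

In this toy data, wider bands leave fewer unique values and a larger average error, which is the pattern the bullets describe.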

Page 13: Statistical disclosure limitation: Balancing data confidentiality and data access

• Restricted data enclaves (Census, NCHS)

• Remote access systems (NCHS, NORC)

• Licensing (NCES, BLS)

• Online tabulations/analysis (Census, NCHS, NCES)

Restricted access increasingly provided - examples

Page 14: Statistical disclosure limitation: Balancing data confidentiality and data access

• Safe projects: Authorized projects, typically with data use agreement

• Safe people: Approved analysts from authorized institutions; trained in confidentiality issues

• Safe sites: Use actively monitored by data custodians

• Safe outputs: Data products subject to statistical and confidentiality review

=> Analysts have use of detailed data, which permits manipulations not possible with publicly available data, but do not “own” the data

Key features of restricted access
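One way a "safe outputs" review might work for a frequency table is to withhold cells based on very few records. The sketch below is an assumption made for illustration: the slides do not specify any review rule, the threshold of 5 is invented, and so are the field names and records.

```python
from collections import Counter

# Assumed threshold; actual review rules are set by the data custodian.
MIN_CELL_COUNT = 5


def reviewed_table(records, row_field, col_field, min_count=MIN_CELL_COUNT):
    """Cross-tabulate two fields and replace small cell counts with None."""
    counts = Counter((r[row_field], r[col_field]) for r in records)
    return {cell: (n if n >= min_count else None) for cell, n in counts.items()}


# Invented records, for illustration only.
records = (
    [{"county": "Kent", "gender": "F"}] * 6
    + [{"county": "Sussex", "gender": "M"}] * 2
)

table = reviewed_table(records, "county", "gender")
print(table)
```

This covers only the first pass. If row or column totals are also released, a suppressed cell can sometimes be recovered by subtraction, so a fuller review would need to withhold additional cells as well.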

Page 15: Statistical disclosure limitation: Balancing data confidentiality and data access

• Data access and data confidentiality are intimately connected

• Statisticians play a central role in improving data usefulness while protecting data confidentiality

• Statisticians in government, academia, and industry can provide guidance to policy-makers on key issues related to privacy and confidentiality

Summary

Page 16: Statistical disclosure limitation: Balancing data confidentiality and data access

• ASA Statement on Data Access and Personal Privacy http://www.amstat.org/news/statementondataaccess.cfm

• ASA’s Privacy and Confidentiality Committee http://www.amstat.org/committees/commdetails/cfm?txtComm=CCNPRO02

• ASA’s Privacy, Data Security and Confidentiality Website http://www.amstat.org/committee/pc/index.html

• OMB/FCSM Report on Statistical Disclosure Limitation Methodology http://www.fcsm.gov/working-papers/spwp22.html

• Expanding Access to Research Data: Reconciling Risks and Opportunities http://books.nap.edu/catalog.php?record_id=11434

Further information

Page 17: Statistical disclosure limitation: Balancing data confidentiality and data access

American Statistical Association
732 N. Washington Street
Alexandria, Virginia 22314

703.684.1221
http://www.amstat.org