Tragedy of the Deidentified Data Commons An Appeal for Transparency and Access Jane Bambauer James E. Rogers College of Law University of Arizona


Tragedy of the Deidentified Data Commons

An Appeal for Transparency and Access

Jane Bambauer

James E. Rogers College of Law

University of Arizona


The Data Commons

Information collected by the government: tax information, epidemiological data, census surveys, educational records, home mortgage data

Information collected by private companies

Anonymized and released*


The Anonymization Problem

• Research subjects can be reidentified in anonymized databases “with astonishing ease.”

AOL
Re-identification of Gov. Weld
Netflix re-identification

• Every privacy law must be rewritten to eliminate dependence on anonymization and to restrict access to all data (even deidentified data) without consent

Paul Ohm, Broken Promises of Privacy, 57 UCLA L. REV. 1701


Save the Data Commons

The Data Commons has been used to:

• Detect housing and employment discrimination
• Debunk the myth of the “welfare queen”
• Inform the healthcare and mortgage lending policy debates
• Correct longstanding misconceptions about crime and law enforcement
• Lots more…

Jane Yakowitz, Tragedy of the Data Commons


Hazards of Covert Noise-Adding


Exaggerated Risks of Reidentification: The Gov. Weld Example


Gov. Weld Reidentification

Latanya Sweeney collected Gov. Weld’s voter registration information and publicly available hospital data

Only one hospital patient matched Gov. Weld’s DOB, zip, and gender

Conclusion from analysis of US Census data: 87% can be uniquely identified from DOB, zip, and gender

Golle recalculations: 63% are unique using DOB, zip, and gender
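The uniqueness percentages above measure how many people are the only ones with their particular combination of DOB, zip, and gender. A minimal sketch of that count follows. The function name `unique_fraction`, the field names, and the toy records are all hypothetical and invented for illustration; this is not Sweeney's or Golle's actual method or data.

```python
from collections import Counter


def unique_fraction(records, keys=("dob", "zip", "gender")):
    """Share of records whose combination of `keys` appears exactly once."""
    if not records:
        return 0.0
    # Count how many records share each quasi-identifier combination.
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    # A record is "unique" when no other record shares its combination.
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)


# Hypothetical toy records, invented for illustration only.
toy = [
    {"dob": "1950-01-01", "zip": "00001", "gender": "M"},
    {"dob": "1950-01-01", "zip": "00001", "gender": "M"},
    {"dob": "1962-03-15", "zip": "00002", "gender": "F"},
    {"dob": "1971-11-30", "zip": "00003", "gender": "M"},
]

print(unique_fraction(toy))  # 0.5: two of the four toy records are unique
```

Being unique in a dataset is a precondition for reidentification, not reidentification itself: an intruder still needs a second, named source that contains the same fields and covers the same person.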


Daniel Barth-Jones, “Reidentification” of Governor William Weld


Sweeney et al. 2013 PGP Study

579 Personal Genome Project participants provided their DOB, zip code, and gender

Using voter registration records and other commercial data sources, Sweeney et al. were able to reidentify 28% (accuracy unclear)


2009 ONC Study

Out of 15,000 HIPAA-compliant records, 2 could be reidentified

.013% Chance of Reidentification

For comparison’s sake, chance of dying from an auto accident this year: .017%
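The percentage on this slide follows directly from the two counts given. A short sketch of the arithmetic, using only the figures quoted above (the variable names are illustrative assumptions):

```python
# Figures as quoted on the slide.
reidentified = 2
records = 15_000
auto_death_pct = 0.017  # quoted annual chance of dying in an auto accident, in percent

# Reidentification rate expressed as a percentage.
reid_rate_pct = reidentified / records * 100
print(f"{reid_rate_pct:.3f}%")  # 0.013%

# How the reidentification rate compares with the quoted auto-accident figure.
ratio = reid_rate_pct / auto_death_pct
print(f"{ratio:.2f}")  # 0.78
```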


Total Number of Known Malicious Reidentifications

0 or 1*


If I Were a Malicious Intruder…

3,101 reported data breaches in the U.S.

(about half a billion records)

700 reported breaches of health records


If I Were a Malicious Intruder…

Sift through Garbage
Make Inferences from Facebook Profiles
Swab a Coffee Cup


What We Have to Lose

• Fewer Opportunities for Replication
• Fewer Voluntary Research Databases
• Fewer Involuntary Public Databases
• Increased Regulatory Precautions

More Status Quo Bias


Vioxx “What If” Study

From Richard Platt’s FDA testimony in 2007

Vioxx approved May 1999

Removed from market September 2004 (64 months)

Data on 7 million patients: 34 months

Data on 100 million: 3 months

88,000-139,000 avoidable heart attacks

27,000-55,000 avoidable deaths