User-focused Threat Identification For Anonymised Microdata

Hans-Peter Hafner
HTW Saar – Saarland University of Applied Sciences
Hans-Peter.Hafner@htwsaar.de

Felix Ritchie
University of the West of England

Rainer Lenz
Technical University of Dortmund

Conference of European Statistics Stakeholders
Rome, 24 November 2014


Motivation


Producing anonymised data sources for the scientific community is a key task of National Statistical Institutes (NSIs)

Conservative, risk-averse approach (data protection):
Release data only if it can be shown that they are safe (defensive)

vs

Alternative, user-oriented approach:
Release data unless they present a disclosure risk (cooperative)


Overview

• Common approaches to anonymisation

• Critique of common perspective

  Focus on data protection
  Worst-case scenarios

• Evidence-based risk assignment (Case study: CIS 2010)

• Impact of new strategy

• Conclusion


Common approach to anonymisation

ESSNET Handbook on SDC (Statistical Disclosure Control)

• Microdata protection should be based on

  Knowledge of the use of the data
  Access requirements
  Potential to match external datasets
  Structure of the data itself

• Risk scenarios are based on

  Spontaneous recognition
  Active searching (record linkage)


Critique 1: Focus on data protection

Assumption:

Existence of intruders who want to identify companies / persons in the data.

But:

There are no known cases of malicious misuse of data. Only some mistakes, and some attempts to circumvent procedures to make life easier, are known.

The problem is not anonymisation but accreditation procedures!


Worst-case Scenarios

Typical scenario:

Anonymised data vs. original data (record matching)

Not realistic:

• Large differences between official statistics and commercial databases

Total protection is not required by law:

• De facto anonymity (Germany): a residual possibility of re-identification is tolerated as long as the effort / cost exceeds the benefit


Evidence-Based Risk Assignment: Case Study CIS 2010

CIS (Community Innovation Survey)

• Survey about the innovation activities of enterprises in countries of the European Union

• Conducted every 2 years

• A census for some countries, only a sample survey for others; but large companies are always included

• Many categorical variables, only 9 continuous attributes


Case Study CIS 2010 (continued)

Risk Scenario

Step 1: Identify user needs

Analysis of research papers + Google Scholar search

Linear and nonlinear regression are the most frequently used methods

Step 2: Identify user risks

Spontaneous recognition of outliers
  No risk, since there is no disclosure to an unauthorised person

Group disclosure from categorical variables
  No risk, since users do not focus on descriptive statistics


Case Study CIS 2010 (continued)


Risk Evaluation

Spontaneous recognition

Very unlikely because of large differences between data sources

Matching on categorical variables

Uncertain, since the statistical business register and commercial databases classify economic activity differently (main activity vs main turnover)
Moreover: matching is prohibited by licence agreements

Remaining risks
  Magnitude tables with 1 or 2 observations in a cell
  Dominance of one unit in a cell / dataset
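The two remaining risks can be checked mechanically. The following is a minimal sketch, not the procedure used in the case study: the function name `risky_cells`, the toy records and the dominance threshold of 85% are assumptions made for illustration; only the small-cell criterion (1 or 2 observations) comes from the slides.

```python
from collections import defaultdict

def risky_cells(records, key, value, min_count=3, dominance_share=0.85):
    """Return {cell: reason} for cells that are small or dominated by one unit.

    min_count=3 mirrors "1 or 2 observations in a cell";
    dominance_share is a hypothetical threshold, not taken from the source.
    """
    cells = defaultdict(list)
    for rec in records:
        cells[key(rec)].append(value(rec))

    flagged = {}
    for cell, values in cells.items():
        total = sum(values)
        if len(values) < min_count:
            flagged[cell] = "small cell"
        elif total > 0 and max(values) / total > dominance_share:
            flagged[cell] = "dominance"
    return flagged

# Toy data (invented): (sector, size class, turnover).
firms = [
    ("C", "large", 900.0), ("C", "large", 40.0), ("C", "large", 30.0),
    ("C", "small", 10.0), ("C", "small", 12.0), ("C", "small", 11.0),
    ("J", "large", 500.0), ("J", "large", 450.0),
]

flags = risky_cells(firms, key=lambda r: (r[0], r[1]), value=lambda r: r[2])
for cell, reason in sorted(flags.items()):
    print(cell, reason)
```

In this toy table the cell with two firms is flagged as a small cell, and the cell in which one firm contributes most of the turnover is flagged for dominance; the evenly spread cell is left alone.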


Impact of new strategy


Consequence of risk evaluation

Small cell count (< 3) or dominance problem in cell:

  Determination of records at risk in these cells
  Only records at risk are perturbed (individual microaggregation of metric variables)
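One possible reading of this step is sketched below. It assumes that "individual microaggregation" means individual-ranking microaggregation (each metric variable is sorted on its own, grouped into at least k neighbours, and replaced by the group mean) and that the aggregated value is written back only for records flagged as at risk. The function names, the group size k = 3 and the toy values are hypothetical; the slides do not specify these details.

```python
def individual_ranking(values, k=3):
    """Univariate microaggregation: sort, group k neighbours, return group means."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = list(values)
    n = len(order)
    start = 0
    while start < n:
        # The last group absorbs the remainder so no group has fewer than k records.
        end = n if n - start < 2 * k else start + k
        group = order[start:end]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            out[i] = mean
        start = end
    return out

def perturb_at_risk(values, at_risk, k=3):
    """Replace only the at-risk records by their microaggregated value."""
    aggregated = individual_ranking(values, k)
    return [aggregated[i] if i in at_risk else v for i, v in enumerate(values)]

# Toy data (invented): turnover of eight firms; indices 3, 6 and 7 are
# assumed to have been flagged as at risk beforehand.
turnover = [10.0, 12.0, 11.0, 900.0, 40.0, 30.0, 500.0, 450.0]
protected = perturb_at_risk(turnover, at_risk={3, 6, 7}, k=3)
print(protected)
```

Writing back only the flagged records leaves all other values exactly as collected, which is the point of the selective approach; the trade-off is that totals and means of the perturbed variable are no longer preserved exactly.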

Consequence for the quality of the anonymised datasets

  Microaggregation was performed for less than 1% of all records
  Small impact on regression coefficients
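Since regression is the main use of the data, a natural quality check is to compare coefficients estimated on the original and on the protected data. The sketch below illustrates the idea for a simple one-regressor model; the data are invented, the helper names are hypothetical, and the numbers it produces say nothing about the CIS 2010 results.

```python
def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx

def relative_change(before, after):
    """Absolute change in a coefficient, relative to its original value."""
    return abs(after - before) / abs(before)

# Toy data (invented): one regressor, one outcome.
x_original = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [2.0 * v + 1.0 for v in x_original]

# Assume records 3 and 4 were at risk and were replaced by their common mean.
x_protected = list(x_original)
x_protected[3] = x_protected[4] = 4.5

slope_before = ols_slope(x_original, y)
slope_after = ols_slope(x_protected, y)
change = relative_change(slope_before, slope_after)
print(f"relative change in slope: {change:.4f}")
```

A real evaluation would repeat this for the models users actually estimate (including nonlinear ones) and for all coefficients, not just one slope.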


Conclusion


Change of perspective

from total data protection to a realistic user-oriented approach

that takes into account user needs, quality of external databases, accreditation procedures and statistical legislation

leads to datasets with higher analytical potential for the scientific community!


THANK YOU FOR YOUR ATTENTION