October 2011
Linda Fardell
Cross Portfolio Data Integration Secretariat
The secret lives of us: data confidentiality
What is it & why should you care?
• It’s about obligations – legal/ethical
• Aim – protect identity and release useful data
• It’s more than removing name & address
• Trust of providers is essential to get good stats
Information is power
• Banker in Maryland obtained a list of patients with cancer
• compared with list of clients with outstanding loans
• called in the loans of clients with cancer.
Source: Data confidentiality: a review of methods for statistical disclosure limitation and methods for assessing privacy, Statist. Surv. Volume 5 (2011), 1-29.
Legislative obligations
• Privacy Act
• Specific legislation governing collection & use of information e.g.
  • Social Security (Administration) Act 1999
  • Taxation Administration Act 1953
Other obligations
• Principles-based obligations, e.g. High Level Principles for Data Integration Involving Commonwealth Data for Statistical and Research Purposes
How agencies meet these obligations
• Implement procedures to address all aspects of data protection
• To ensure that identifiable information:
  • is not released publicly;
  • is available on a ‘need to know’ basis;
  • can’t be derived from disseminated data; and
  • is maintained and accessed securely.
Understand your obligations
Establish policies and procedures
De-identify the data
Assess potential identification risks
Manage the risks of identification - confidentialise
Test and evaluate to mitigate risks
Provide safe access to data
Managing identification risk
Access to other information
• Keep track of all information released from the dataset.
When should a cell be confidentialised?
• Common confidentiality rules:
  • frequency (threshold) rule
  • cell dominance (cell concentration) rule
• Keep specific confidentiality procedures secret (e.g. the particular value chosen when applying the threshold rule)
Two general methods
• Data reduction
• Data modification (perturbation)
Example: frequency rule - 5
Age      Income
         Low    Med    High   Total
15–19    20     0      0      20
20–29    14     11     8      33
30–39    8      12     7      27
40–49    6      18     24     48
50–59    4      5      14     23
60+      12     9      7      28
Total    64     55     60     179
Before
Example: cont.
Age      Income
         Low    Med    High   Total
15–19    20     0      0      20
20–29    14     11     8      33
30–39    8      12     7      27
40–49    n.p.   n.p.   24     48
50–59    n.p.   n.p.   14     23
60+      12     9      7      28
Total    64     55     60     179
After
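The first step of the frequency rule in this example can be sketched in Python. This is a minimal illustration under stated assumptions, not any agency's actual procedure: the function name `primary_suppress`, the strict "less than the threshold" comparison, and the choice to leave zero cells published (as the slide does) are all assumptions made here. The sketch flags only the cell that fails the rule (the 4). The other n.p. cells in the After table are additional suppressions needed so that the 4 cannot be recovered by subtraction from the row and column totals; the sketch does not attempt that step.

```python
THRESHOLD = 5  # assumed value, read from the slide title "frequency rule - 5"

def primary_suppress(table, threshold=THRESHOLD):
    """Return a copy of `table` with small non-zero counts replaced by "n.p."."""
    return {
        row: ["n.p." if 0 < count < threshold else count for count in counts]
        for row, counts in table.items()
    }

# Interior cells (Low, Med, High) from the "Before" table.
before = {
    "15-19": [20, 0, 0],
    "20-29": [14, 11, 8],
    "30-39": [8, 12, 7],
    "40-49": [6, 18, 24],
    "50-59": [4, 5, 14],
    "60+": [12, 9, 7],
}

after = primary_suppress(before)
for age, cells in after.items():
    print(age, cells)
```

Only the 50–59 / Low cell is flagged by this step.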
Alternative: concealing totals
Age      Income
         Low    Medium   High   Total
15–19    20     0        0      20
20–29    14     11       8      33
30–39    8      12       7      27
40–49    6      18       24     48
50–59    n.p.   5        14     >19
60+      12     9        7      28
Total    >60    55       60     >175
E.g. 2 – the cell dominance (n,k) rule
Widget brand   Profit ($m)
A              150
B              93
C              21
D              13
E              8
F              8
G              6
H              1
Total          300
• Cell unsafe if combined contributions of the ‘n’ largest members of the cell represent more than ‘k’% of the total value of the cell
• n & k values are set by data custodian
• Example: (2, 75) rule
  • A & B contribute 81% of total profit, so profit needs protecting
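The (n, k) check described above can be sketched as a short Python function. The function name `is_dominated` and the handling of a zero or negative total are assumptions made for illustration; the figures are the widget profits from the slide.

```python
def is_dominated(contributions, n, k):
    """True if the n largest contributions exceed k% of the cell total."""
    total = sum(contributions)
    if total <= 0:
        # Assumption: empty or non-positive cells are dealt with elsewhere.
        return False
    top_n = sum(sorted(contributions, reverse=True)[:n])
    return 100 * top_n > k * total

# Profit ($m) by widget brand, from the slide.
profits = {"A": 150, "B": 93, "C": 21, "D": 13, "E": 8, "F": 8, "G": 6, "H": 1}

unsafe = is_dominated(list(profits.values()), n=2, k=75)
share = 100 * (profits["A"] + profits["B"]) / sum(profits.values())
print(unsafe, share)  # True 81.0
```

With a looser custodian setting such as (2, 85), the same cell would pass, which is why the chosen n and k values matter.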
Data modification methods
Age      Income
         Low    Med    High   Total
15–19    20     0      0      20
20–29    14     11     8      33
30–39    8      12     7      27
40–49    6      18     24     48
50–59    4      5      14     23
60+      12     9      7      28
Total    64     55     60     179
Before rounding (RR3)
Data modification methods
Age      Income
         Low    Med    High   Total
15–19    21     0      0      21
20–29    15     12     9      33
30–39    9      12     6      27
40–49    6      18     24     48
50–59    3      6      15     24
60+      12     9      6      27
Total    63     54     60     180
After rounding (RR3)
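A minimal Python sketch of this kind of rounding follows, reading RR3 as random rounding to base 3. The slide does not say how the direction of rounding is chosen, so the unbiased probabilities used here (round up with probability remainder/3) are an assumption, as are the function name `random_round` and the fixed seed.

```python
import random

def random_round(value, base=3, rng=random):
    """Randomly round `value` to an adjacent multiple of `base`."""
    remainder = value % base
    if remainder == 0:
        return value  # multiples of the base are left unchanged
    # Assumed scheme: round up with probability remainder/base, else down,
    # so the expected rounded value equals the original.
    if rng.random() < remainder / base:
        return value + (base - remainder)
    return value - remainder

rng = random.Random(2011)  # fixed seed only so the sketch is repeatable
row = [4, 5, 14, 23]       # the 50-59 row of the "Before" table
rounded = [random_round(v, rng=rng) for v in row]
print(rounded)
```

Every cell, totals included, is rounded independently, so rounded interior cells need not add up to the rounded total.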
Microdata
• Valuable resource
• 2 key types of disclosure risk:
1. spontaneous recognition
2. deliberate (malicious) attempt
Microdata – managing risks
• confidentialising
• deterrents
• restricting access
• educating data users about their obligations
• safe environment for access
Microdata – methods to assess risks
• cross-tabulation of variables;
• comparing sample data with pop’n data to see if the unique characteristics in the sample are unique in the population; and
• acquiring knowledge of other datasets & publicly available info. that could be used for list matching.
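The first two checks can be sketched in Python by counting how often each combination of variables occurs. Everything in this example is hypothetical: the records, the variable names (`age_group`, `sex`, `region`) and the helper names `rec` and `unique_combinations` are invented for illustration.

```python
from collections import Counter

def rec(age_group, sex, region):
    """Build one hypothetical record."""
    return {"age_group": age_group, "sex": sex, "region": region}

def unique_combinations(records, keys):
    """Return the combinations of `keys` that occur exactly once."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return {combo for combo, n in counts.items() if n == 1}

keys = ("age_group", "sex", "region")

sample = [
    rec("30-39", "F", "North"),
    rec("30-39", "F", "North"),
    rec("60+", "M", "South"),
    rec("15-19", "F", "West"),
]
population = [
    rec("30-39", "F", "North"),
    rec("30-39", "F", "North"),
    rec("30-39", "F", "North"),
    rec("60+", "M", "South"),
    rec("60+", "M", "South"),
    rec("15-19", "F", "West"),
]

sample_unique = unique_combinations(sample, keys)
population_unique = unique_combinations(population, keys)

# Records unique in the sample AND in the population carry the most risk.
at_risk = sample_unique & population_unique
print(at_risk)  # {('15-19', 'F', 'West')}
```

Here the 60+ / M / South record is unique in the sample but not in the population, so it is less exposed than the 15-19 / F / West record.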
Protecting microdata
• 1st level of protection: remove direct identifiers
• Common ways to protect microdata are:
1. confidentialising; and/or
2. restricting access to the file
Confidentialising microdata
• Same principles as protecting aggregate data:
• limit variables
• introduce small amounts of random error (e.g. data swapping)
• combine categories (e.g. age in 5 year ranges)
• top/bottom code
• suppress particular values/records that can’t otherwise be protected.
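Two of these techniques, combining categories and top coding, can be sketched together in Python. The function name `age_band` and the top-code cut-off of 85 are assumptions chosen for illustration; the 5-year width follows the example in the list above.

```python
def age_band(age, width=5, top=85):
    """Top-code ages at `top`, then group the rest into `width`-year ranges."""
    if age >= top:
        return f"{top}+"  # top coding: all high ages share one category
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

ages = [17, 42, 44, 91]
print([age_band(a) for a in ages])  # ['15-19', '40-44', '40-44', '85+']
```

Coarser bands and a lower top code give more protection but less detail, so the settings are a judgment for the data custodian.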
Restricting access to microdata
What affects the risk of identification?
• motivation
• level of detail
• presence of rare characteristics
• accuracy of the data
• age of the data
• coverage of the data (completeness)
• presence of other information
A note on terminology…
• Confusion between de-identification and confidentialisation
More information – www.nss.gov.au