Holistic Privacy: From Location Privacy to Genomic Privacy
Jean-Pierre Hubaux
With contributions from E. Ayday, M. Humbert, J.-Y. Le Boudec, J.-L. Raisaro, R. Shokri, G. Theodorakopoulos
Make It Faster!!
Benz Motorwagen, 1885
Ford-T, 1915
After Some Decades…
… the Concerns Have Changed
• Reduce casualties
  – Better brakes
  – Safety belts
  – Airbags
  – …
• Mitigate side effects
  – Road congestion
  – Depletion of fossil fuel
  – Climate change
  – …
Similar Phenomenon with IT
For each end user:
• 10s to 1000s Mb/s
• Terabytes of storage
• Processor in the GHz
Assault on privacy
Cyber-crime, cyberwar
Information overload, attention deficit disorder
Holistic Privacy: From Location Privacy to Genomic Privacy
1. On Privacy Protection
2. Location Privacy
3. Genomic Privacy
Another Observation Tool…
“The Right to Privacy”, Warren and Brandeis, Harvard Law Review, Vol. IV, No. 5, Dec. 15, 1890
Major concern: photography without consent
Some Modern Observation Tools
• Cellular phones
• Online Social Networks
• Genomic sequencing
Privacy: Definition
• Privacy control is the ability of individuals to determine when, how, and to what extent information about themselves is revealed to others.
• Goal: let personal data be used only in the context in which they were released
• Privacy is about the data of individuals
Main Risk: Manipulation of People’s Minds
Citizens (us)
Those observing us
Privacy Protection at Odds with…
Privacy Protection
Security (e.g., homeland security)
Business (e.g., targeted advertisement)
Usability
System performance
Medical progress
Holistic Privacy: From Location Privacy to Genomic Privacy
1. On Privacy Protection
2. Location Privacy
3. Genomic Privacy
Location-Based Services
Users upload their location episodically through WiFi or cellular networks
Message sent to the service: Query, Location, Time
Why Reveal Your Location?
• To use a service
  – Cellular connectivity
  – Location-based services
  – Local recommendations
  – Road toll payment
  – …
• For social benefits
  – Find friends
Can You Clean up Your Digital Trace?
[Figure: map of recorded location events. Color: user identity. Number: time-stamp. Position on the map: location-stamp.]
Threat
The contextual information attached to a trace tells much about our habits, interests, activities, beliefs and relationships
Quantification of Location Privacy
• Many privacy-preserving mechanisms proposed
• No unified formal framework in previous work
• Various metrics for location privacy
• How to compare different mechanisms?
• Which metric to use?
Time and Space
• Consider discrete time and space
• Attacker: service provider (“honest but curious”)
Quantifying Location Privacy
KC: Knowledge Constructor
LPPM: Location Privacy Protection Mechanism, for example:
– Deliberately imprecise coordinate reports (e.g., drop some of the least significant bits)
– Swap user identifiers
Correctness
The adversary’s estimation of x given the observed traces o
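As a minimal sketch of how such an estimate can be scored, the snippet below computes the adversary's expected estimation error, which the deck later calls incorrectness. The function name `expected_error`, the dictionary representation of the posterior, and the use of Euclidean distance are assumptions made for illustration; they are not taken from the slides.

```python
import math

def expected_error(posterior, true_location):
    """Expected distance between the adversary's estimate and the truth.

    posterior: dict mapping a candidate location (x, y) to the probability
    the adversary assigns to it after observing the traces (hypothetical
    representation). Euclidean distance is an assumed choice of metric.
    """
    return sum(prob * math.dist(loc, true_location)
               for loc, prob in posterior.items())

# The adversary hesitates between two locations; the user is really at (0, 0).
posterior = {(0, 0): 0.5, (3, 4): 0.5}
error = expected_error(posterior, (0, 0))
print(error)
```

A larger expected error means the adversary is further from the truth on average, that is, the user enjoys more location privacy.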
Location-Privacy Preserving Mechanisms
Implemented LPPMs:
Location-Privacy Meter
Open source software tool (C++) to quantify location privacy
Location-Privacy Meter (LPM)
Inputs:
– Some traces to learn the users’ mobility profiles (background knowledge)
– Observed traces
Output:
– Location privacy of users with respect to various attacks: Localization, Tracking, Meeting Disclosure, Aggregate Presence Disclosure, …
LPM: Example
• N = 20 users
• R = 40 regions
• T = 96 time instants
• Protection mechanism:
  – Hiding location
  – Precision reduction (dropping low-order bits from the x, y coordinates of the location)
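The two protection mechanisms listed above can be sketched as follows. This is an illustrative toy, not the LPM tool's code: the function names, the integer grid coordinates, and the `hide_prob` parameter are assumptions.

```python
import random

def reduce_precision(x, y, dropped_bits):
    """Zero the `dropped_bits` least significant bits of integer coordinates."""
    mask = ~((1 << dropped_bits) - 1)
    return x & mask, y & mask

def apply_lppm(trace, dropped_bits, hide_prob, rng):
    """Obfuscate a trace: each event is hidden with probability `hide_prob`
    (reported as None), otherwise its coordinates lose precision."""
    observed = []
    for x, y in trace:
        if rng.random() < hide_prob:
            observed.append(None)
        else:
            observed.append(reduce_precision(x, y, dropped_bits))
    return observed

trace = [(13, 7), (14, 9), (22, 30)]
observed = apply_lppm(trace, dropped_bits=2, hide_prob=0.0, rng=random.Random(0))
print(observed)
```

Dropping two bits maps every coordinate onto a grid of cell size 4, so the first two events of this trace become indistinguishable along the x axis.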
Attacks
• LO-ATT: Localization Attack: For a given user u and time t, what is the location of u at t?
• MD-ATT: Meeting Disclosure Attack: For a given pair of users u and v, what is the expected number of meetings between u and v?
• AP-ATT: Aggregated Presence Attack: For a given region r and time t, what is the expected number of users present in r at t?
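Given the adversary's per-user posterior over regions at each time instant, the three questions above reduce to simple sums. The sketch below assumes a hypothetical data layout (`posterior[user][t][region]`) and, for the meeting count, assumes that the two users' locations are independent given the observations; neither assumption comes from the slides.

```python
def localization(posterior, user, t):
    """LO-ATT: most likely region of `user` at time t, with its probability."""
    dist = posterior[user][t]
    best = max(range(len(dist)), key=dist.__getitem__)
    return best, dist[best]

def expected_meetings(posterior, u, v):
    """MD-ATT: expected number of time instants at which u and v share a
    region (independence between the two users is an assumption)."""
    return sum(pu * pv
               for dist_u, dist_v in zip(posterior[u], posterior[v])
               for pu, pv in zip(dist_u, dist_v))

def expected_presence(posterior, region, t):
    """AP-ATT: expected number of users present in `region` at time t."""
    return sum(dists[t][region] for dists in posterior.values())

# Two users, two time instants, two regions.
posterior = {
    "u": [[0.75, 0.25], [0.5, 0.5]],
    "v": [[1.0, 0.0], [0.0, 1.0]],
}
print(localization(posterior, "u", 0))
print(expected_meetings(posterior, "u", "v"))
print(expected_presence(posterior, 0, 0))
```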
Results
Protecting Location Privacy: Optimal Strategy against Localization Attacks
Adversary Knowledge: User’s “Location Access Profile”
Data source: Location traces collected by Nokia Lausanne (Lausanne Data Collection Campaign)
Location Obfuscation Mechanism
Consequence: “Service Quality Loss”
Location Inference Attack
Estimation Error: “Location Privacy”
Problem Statement
Zero-sum Bayesian Stackelberg Game
User (leader), Adversary (follower)
Game
LBS message
user gain / adversary loss
Optimal Strategy for the User
Proper probability distribution
Respect service quality constraint
Optimal Strategy for the Adversary
Note: This is the dual of the previous optimization problem
Proper probability distribution
Shadow price of the service quality constraint (exchange rate between service quality and privacy)
Minimizing the user’s maximum privacy under the service quality constraint
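The optimization formulas themselves did not survive in this transcript, so the following is only a sketch of the follower's side of the game under stated assumptions: the user (leader) commits to an obfuscation function, and the adversary (follower), knowing the user's location access profile, picks for each observed pseudo-location the estimate that minimizes the expected error. The names `best_response_privacy` and `quality_loss`, the matrix layout `obfuscation[r][o]`, and the one-dimensional distance are all hypothetical.

```python
def best_response_privacy(prior, obfuscation, distance):
    """User's privacy (adversary's expected error) when the adversary
    best-responds to a fixed obfuscation function.

    prior[r]: probability that the user is in region r (access profile).
    obfuscation[r][o]: probability of reporting o when the user is in r.
    """
    regions = range(len(prior))
    privacy = 0.0
    for o in range(len(obfuscation[0])):
        # Joint probability of (true region r, observation o).
        joint = [prior[r] * obfuscation[r][o] for r in regions]
        # Expected error of each possible guess; the adversary takes the minimum.
        costs = [sum(joint[r] * distance(r, guess) for r in regions)
                 for guess in regions]
        privacy += min(costs)
    return privacy

def quality_loss(prior, obfuscation, distance):
    """Expected distance between the true and the reported location."""
    return sum(prior[r] * obfuscation[r][o] * distance(r, o)
               for r in range(len(prior))
               for o in range(len(obfuscation[0])))

line_distance = lambda a, b: abs(a - b)        # three regions on a line (assumed)
prior = [0.5, 0.25, 0.25]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # no obfuscation
always_0 = [[1, 0, 0], [1, 0, 0], [1, 0, 0]]   # always report region 0

print(best_response_privacy(prior, identity, line_distance),
      quality_loss(prior, identity, line_distance))
print(best_response_privacy(prior, always_0, line_distance),
      quality_loss(prior, always_0, line_distance))
```

In this toy setting, reporting the true region yields zero privacy at zero quality loss, while always reporting region 0 raises both. An optimal strategy would search over all obfuscation matrices whose quality loss stays within the allowed bound.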
Evaluation: Obfuscation Function
Output Visualization of Obfuscation Mechanisms
[Figure panels: Optimal Obfuscation; Basic Obfuscation (k = 7)]
Conclusion on Location Privacy
• Protecting location privacy is a major challenge
• Quantification expressed as adversary’s expected estimation error (incorrectness)
• Techniques to protect location privacy: introduce imprecision in the reported location, reduce location report frequency, make use of pseudonyms, …
• Privacy (similarly to any security property) is adversary-dependent. Neglecting adversary’s strategy and knowledge limits the privacy protection
• More information and pointers: http://lca.epfl.ch/projects/quantifyingprivacy
Holistic Privacy: From Location Privacy to Genomic Privacy
1. On Privacy Protection
2. Location Privacy
3. Genomic Privacy
On Convergence…
“The last inch”
Digital medicine:
– Digital medical records
– Digital imaging
– Medical online social networks
– Genome sequencing
– Other ’omics data
– Wireless biosensors
– …
Telecom Computing
ICT
…0100110100011… …CGTTAATTCCGTA…
The Genomic Avalanche Is Coming…
Genetic Sequencing
GATTACA (1997 Movie)
Basics of Genomics – 1
• A full genome sequence:
  – uniquely identifies each one of us
  – contains information about our ethnic heritage, disease predispositions, and many other phenotypic traits.
• Human genome: 3 billion letters
Basics of Genomics – 2
• The cell’s nucleus holds the genetic program that determines most of our physical characteristics.
• This information is stored in chromosomes.
• Billions of identical copies of the genetic program, one for each cell nucleus.
Basics of Genomics – 3
• Chromosomes: molecules of a double-stranded chemical known as deoxyribonucleic acid (DNA)
• DNA consists of chemical units, known as nucleotides, that hook together
Basics of Genomics – 4
• DNA has two strands and four nucleotides (A T G C):
  – A = Adenine
  – T = Thymine
  – G = Guanine
  – C = Cytosine
• The genetic information is stored in the exact sequence of nucleotides.
• Pairs: A-T and G-C
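Because of the A-T and G-C pairing, one strand determines the other. A minimal sketch of this position-by-position pairing (the function name `complementary_strand` is illustrative, and strand direction is ignored here):

```python
# Base pairing rule: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complementary_strand(strand):
    """Return the base facing each position of `strand` on the other strand."""
    return "".join(COMPLEMENT[base] for base in strand)

print(complementary_strand("GATTACA"))
```

Applying the function twice returns the original sequence, which is why storing a single strand is enough to describe the molecule.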
Basics of Genomics – 5
Human genome: the complete and ordered sequence of all 23 chromosomes
Basics of Genomics – 6
• The human genome is identical in most places for all people.
• SNPs (Single Nucleotide Polymorphisms): positions where some people have one nucleotide pair while others have another.
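As a toy illustration of the idea, the sketch below lists the positions at which two already aligned sequences differ. Real SNPs are defined with respect to a whole population rather than a pair of individuals, and the function name and the sequences are made up for the example.

```python
def differing_positions(seq_a, seq_b):
    """Return (position, nucleotide in a, nucleotide in b) for every mismatch
    between two aligned sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    return [(i, a, b)
            for i, (a, b) in enumerate(zip(seq_a, seq_b))
            if a != b]

person_1 = "CGTTAATTCCGTA"
person_2 = "CGTTAGTTCCGTA"
print(differing_positions(person_1, person_2))
```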
Basics of Genomics – 7
• SNPs make up only 1.3% of the genome (40 million SNPs)
• The differences at these places make each of us unique
• Allele designates which nucleotide is present at a SNP.
Summary of Key Concepts
• Our genetic information is stored in the sequence of DNA in our chromosomes.
• There are 23 chromosomes in a human genome. Men and women have slightly different sets of chromosomes.
• SNPs are chromosome addresses. They are spots where some people have one nucleotide, while others have another.
• SNPs have four possible alleles: A, T, G, and C.
• Our collection of SNP alleles is what makes each of us unique.
• Modern techniques make it possible to determine the status of large numbers of SNPs very efficiently.
From the Sample to the Full Genome Sequence
Samples → Sequencing machine (Illumina, Roche, Life Technologies, Oxford Nanopore, Pacific Biosciences, …) → Raw data (FASTQ) → SAM file (aligned reads) → Full genome
Deep / ultra-deep sequencing
Full genome:
• Individual diagnosis, personalized medicine
• Statistics
Threat
• Leakage of genomic data
• Revelation of privacy-sensitive data about the patient
  – Predisposition to disease, ethnicity, paternity or filiation, etc.
  – Denial of access to health insurance, mortgage, education, and employment
• Cross-layer attacks
  – Using privacy-sensitive information belonging to a victim retrieved from different sources
Goals
• Allow specialists to access only the genomic data they need
• Protect data, including from insiders (e.g., curious sysadmins) → homomorphic encryption
• Access time to a single patient’s genomic data below a few seconds
• Access time to the data of a cohort of thousands of patients below a few minutes
Cryptographic Tools
Possible Solution
Parties: Patient (P), Certified Institution, Storage and Processing Unit (SPU), Medical Center (MC)
Adversaries: Curious Party @ SPU, Malicious 3rd party
1) Sample
2) Sequencing and encryption
3) Encrypted variants
4) Part of P’s secret key, x(1)
5) “Check my susceptibility to disease X” and part of P’s secret key, x(2)
6) Markers related to disease X and their contributions
7) Homomorphic operations and proxy encryption
8) End-result or related variants
Disease Susceptibility – Weighted Averaging
[Figure: P’s SNPs and the markers for disease X, with their probabilities and the contributions of the markers, are combined into P’s susceptibility for disease X.]
• All operations are conducted in ciphertext using homomorphic encryption.
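To make "weighted averaging in ciphertext" concrete, here is a toy sketch based on textbook Paillier encryption, which is additively homomorphic. It is an assumption that this matches the spirit of the slide: the scheme actually used in the prototype, its key splitting into x(1) and x(2), and the proxy encryption step are not reproduced. The primes are far too small to be secure, and the integer encoding of SNP states and marker contributions is invented for the example.

```python
import math
import random

def keygen(p, q):
    """Toy Paillier key pair from two small primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)

def encrypt(n, m, rng):
    """Enc(m) = (n + 1)^m * r^n mod n^2 for a random r coprime with n."""
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, key, c):
    lam, mu = key
    return ((pow(c, lam, n * n) - 1) // n) * mu % n

def weighted_sum(n, ciphertexts, weights):
    """Compute Enc(sum_i w_i * m_i) from the Enc(m_i) without decrypting."""
    n2 = n * n
    acc = 1  # 1 is an encryption of 0 (with randomness r = 1)
    for c, w in zip(ciphertexts, weights):
        acc = (acc * pow(c, w, n2)) % n2
    return acc

rng = random.Random(42)
n, secret_key = keygen(1009, 1013)

snp_states = [1, 0, 2]      # hypothetical encoding of P's SNPs
contributions = [3, 1, 2]   # hypothetical integer weights of the markers

encrypted_snps = [encrypt(n, s, rng) for s in snp_states]       # stored encrypted
encrypted_score = weighted_sum(n, encrypted_snps, contributions)  # no decryption

score = decrypt(n, secret_key, encrypted_score)
print(score, score / sum(contributions))
```

Multiplying ciphertexts adds the plaintexts, and raising a ciphertext to a public weight multiplies its plaintext by that weight, so the party doing the computation never sees the SNP values. The division by the sum of the weights is done after decryption.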
Prototype – Patient Interface
Prototype – SPU Interface
Prototype – Medical Center Interface
Holistic Privacy: Data about an Individual
Genome
Human Relationships
Mobility + Body Area Network
Conclusion on Genomic Privacy
• Digital medicine is coming
• It will forever change the landscape of privacy protection
• Genomics is particularly relevant and there is a huge ongoing research effort
• Highly sensitive data + huge amounts of data + complex correlations between data → complex field, Big Data
• Tools (cryptography, security protocols, database/differential privacy, anonymization techniques, …) already used for privacy protection in ICT can (and should) be applied here
• More information and pointers: http://lca.epfl.ch/projects/genomic-privacy/
Overall Conclusion
• Assault on privacy → huge research challenges
• Location privacy
  – quantifiable at the physical level ((x, y) coordinates)
  – ongoing work at the semantic level
• Online Social Networks: part of the background knowledge of the adversary
• Genomic privacy
  – still in its infancy
  – soon to be very hot
  – first results coming out