Task 1: Privacy Preserving Genomic Data Sharing
Presented by Noman Mohammed
School of Computer Science, McGill University
24 March 2014



Reference

N. Mohammed, R. Chen, B. C. M. Fung, and P. S. Yu. Differentially private data release for data mining. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD), pages 493-501, 2011.


Outline

Privacy Models
Algorithm for Relational Data
Algorithm for Genomic Data
Conclusion


Overview

Privacy model
Anonymization algorithm
Data utility


k-Anonymity [Samarati & Sweeney, PODS 1998]

Quasi-identifier (QID): The set of re-identification attributes.

k-anonymity: Each record is indistinguishable from at least k-1 other records in the table with respect to the QID.

3-anonymous patient table

Job           Sex     Age      Disease
Professional  Male    [36-40]  Cancer
Professional  Male    [36-40]  Cancer
Professional  Male    [36-40]  Cancer
Artist        Female  [30-35]  Flu
Artist        Female  [30-35]  Hepatitis
Artist        Female  [30-35]  Fever
Artist        Female  [30-35]  Hepatitis

Raw patient table

Job       Sex     Age  Disease
Engineer  Male    36   Cancer
Engineer  Male    38   Cancer
Lawyer    Male    38   Cancer
Musician  Female  30   Flu
Musician  Female  30   Hepatitis
Dancer    Female  30   Fever
Dancer    Female  30   Hepatitis
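The k-anonymity condition can be checked mechanically by grouping records on the QID and looking at the smallest group. The sketch below does this for the 3-anonymous patient table; the function name `is_k_anonymous` and the tuple layout are illustrative assumptions, not part of the talk.

```python
from collections import Counter

# Generalized records copied from the 3-anonymous patient table.
# Each tuple is (Job, Sex, Age, Disease); the first three form the QID.
records = [
    ("Professional", "Male", "[36-40]", "Cancer"),
    ("Professional", "Male", "[36-40]", "Cancer"),
    ("Professional", "Male", "[36-40]", "Cancer"),
    ("Artist", "Female", "[30-35]", "Flu"),
    ("Artist", "Female", "[30-35]", "Hepatitis"),
    ("Artist", "Female", "[30-35]", "Fever"),
    ("Artist", "Female", "[30-35]", "Hepatitis"),
]


def is_k_anonymous(rows, qid_indices, k):
    """Return True if every QID combination occurs at least k times."""
    groups = Counter(tuple(row[i] for i in qid_indices) for row in rows)
    return all(size >= k for size in groups.values())


print(is_k_anonymous(records, qid_indices=(0, 1, 2), k=3))  # True
print(is_k_anonymous(records, qid_indices=(0, 1, 2), k=4))  # False
```

The second call fails because the Professional group contains only three records.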


Differential Privacy [DMNS, TCC 06]


A non-interactive privacy mechanism A gives ε-differential privacy if, for all neighboring databases D and D′ and for any possible sanitized database D*,

$\Pr_A[A(D) = D^*] \le \exp(\varepsilon) \times \Pr_A[A(D') = D^*]$

where the probability is taken over the randomness of A.

D and D′ are neighbors if they differ on at most one record.


Laplace Mechanism

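For example, for a single counting query Q over a dataset D, returning Q(D) + Laplace(1/ε) maintains ε-differential privacy.

The noise is scaled to the sensitivity of the query f, taken over all neighboring D and D′:

$\Delta f = \max_{D, D'} \lVert f(D) - f(D') \rVert_1$

For a counting query f, Δf = 1, which gives the Laplace(1/ε) noise in the example.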
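A minimal sketch of the Laplace mechanism for one counting query follows. It uses only the standard library and draws Laplace noise as the difference of two exponential samples; the helper names `laplace_noise` and `noisy_count`, the fixed seed, and the small disease list are illustrative assumptions.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale) as a difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(rows, predicate, epsilon, rng):
    """Answer a counting query with Laplace(sensitivity / epsilon) noise."""
    true_count = sum(1 for row in rows if predicate(row))
    sensitivity = 1.0  # one record changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
diseases = ["Cancer", "Cancer", "Cancer", "Flu", "Hepatitis", "Fever", "Hepatitis"]
answer = noisy_count(diseases, lambda d: d == "Cancer", epsilon=1.0, rng=rng)
print(answer)  # the true count is 3; the released value is 3 plus random noise
```

A smaller ε means a larger noise scale 1/ε, so stronger privacy costs accuracy on each released count.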


Non-interactive Framework

[Figure: non-interactive framework, with a released count of 0 + Lap(1/ε)]

For high-dimensional data, the noise is too big.
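The remark that noise becomes too big for high-dimensional data can be illustrated numerically. The sketch below assumes that every cell of a contingency table receives independent Lap(1/ε) noise and that all attributes are binary, so the table has 2^d cells; the dataset size of 200 records and the function names are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale) as a difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def total_abs_noise(num_attributes, epsilon, rng):
    """Sum of |noise| over all cells of a table with 2**num_attributes cells."""
    num_cells = 2 ** num_attributes
    return sum(abs(laplace_noise(1.0 / epsilon, rng)) for _ in range(num_cells))


rng = random.Random(1)
num_records = 200  # hypothetical dataset size
for d in (4, 8, 12, 16):
    noise = total_abs_noise(d, epsilon=1.0, rng=rng)
    print(f"{d:2d} attributes: {2 ** d:6d} cells, "
          f"total |noise| ~ {noise:9.1f} vs {num_records} records")
```

The number of records is fixed while the number of cells doubles with each added attribute, so most cells are empty and the accumulated noise soon exceeds the total count.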


Anonymization Algorithm

Taxonomy trees:
Job: Any_Job → Professional, Artist; Professional → Engineer, Lawyer; Artist → Dancer, Writer
Age: [18-65) → [18-40), [40-65); [18-40) → [18-30), [30-40)

Job           Age      Class  Count
Any_Job       [18-65)  4Y4N   8

Artist        [18-65)  2Y2N   4
Professional  [18-65)  2Y2N   4

Artist        [18-40)  2Y2N   4
Artist        [40-65)  0Y0N   0
Professional  [18-40)  2Y1N   3
Professional  [40-65)  0Y1N   1
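The partition counts in the table above can be reproduced with a small top-down specialization sketch. The raw records below are hypothetical, chosen only so that they agree with the slide's counts, and the sketch covers the deterministic partitioning step alone: it adds no noise and does not choose which specialization to apply.

```python
from collections import Counter

# Taxonomy trees from the slide: each value maps to its child values.
JOB_TREE = {
    "Any_Job": ["Professional", "Artist"],
    "Professional": ["Engineer", "Lawyer"],
    "Artist": ["Dancer", "Writer"],
}
AGE_TREE = {(18, 65): [(18, 40), (40, 65)], (18, 40): [(18, 30), (30, 40)]}

# Hypothetical raw records (job, age, class) that match the slide's counts.
RECORDS = [
    ("Dancer", 25, "Y"), ("Dancer", 32, "N"), ("Writer", 28, "Y"), ("Writer", 35, "N"),
    ("Engineer", 30, "Y"), ("Engineer", 38, "N"), ("Lawyer", 34, "Y"), ("Lawyer", 50, "N"),
]


def job_matches(node, job):
    """True if the leaf value `job` lies under the taxonomy node `node`."""
    if node == job:
        return True
    return any(job_matches(child, job) for child in JOB_TREE.get(node, []))


def specialize(partitions, attr, tree):
    """Replace each partition's value on `attr` by its children and split the records."""
    result = {}
    for (job_node, age_node), rows in partitions.items():
        node = job_node if attr == "job" else age_node
        for child in tree.get(node, [node]):
            if attr == "job":
                key = (child, age_node)
                part = [r for r in rows if job_matches(child, r[0])]
            else:
                key = (job_node, child)
                part = [r for r in rows if child[0] <= r[1] < child[1]]
            result[key] = part
    return result


def class_summary(rows):
    """Format class counts in the slide's style, e.g. '2Y1N'."""
    counts = Counter(r[2] for r in rows)
    return f"{counts['Y']}Y{counts['N']}N"


partitions = {("Any_Job", (18, 65)): RECORDS}
partitions = specialize(partitions, "job", JOB_TREE)  # Any_Job -> Professional, Artist
partitions = specialize(partitions, "age", AGE_TREE)  # [18-65) -> [18-40), [40-65)
for (job, (lo, hi)), rows in partitions.items():
    print(job, f"[{lo}-{hi})", class_summary(rows), len(rows))
```

Empty partitions such as Artist [40-65) are kept on purpose, since dropping them would reveal that no record falls there.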


Candidate Selection

We favor the specialization with the maximum Score value.

First utility function:

∆u =

Second utility function:

∆u = 1
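One way to favor high-scoring specializations while remaining differentially private is the exponential mechanism, which the referenced KDD 2011 paper uses for this step. The sketch below assumes that mechanism; the candidate names and Score values are hypothetical, and the sensitivity is set to the ∆u = 1 of the second utility function.

```python
import math
import random


def exponential_mechanism(scores, epsilon, sensitivity, rng):
    """Pick a candidate with probability proportional to exp(eps * score / (2 * sensitivity))."""
    candidates = list(scores)
    best = max(scores.values())  # subtracting the max keeps exp() numerically stable
    weights = [
        math.exp(epsilon * (scores[c] - best) / (2.0 * sensitivity))
        for c in candidates
    ]
    return rng.choices(candidates, weights=weights, k=1)[0]


rng = random.Random(7)
scores = {"Any_Job": 4.0, "[18-65)": 5.0}  # hypothetical Score values
picks = [exponential_mechanism(scores, 1.0, 1.0, rng) for _ in range(10000)]
print(picks.count("[18-65)") / len(picks))  # the higher-scoring candidate wins more often
```

The candidate with the larger Score is chosen more often but not always, which is what keeps the selection itself from leaking too much about any single record.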


Anonymization Algorithm

[Figure: algorithm listing annotated with per-step costs]

O(A_pr × |D| log |D|)
O(|candidates|)
O(|D|)
O(|D| log |D|)
O(1)

Overall: O((A_pr + h) × |D| log |D|)


case_chr2_29504091_30044866

rs11686243AG AG AA AG GG AA AG AA GG AG AA AA AA AA AA AA AA GG GG AG AG AA AG GG AA AA GG AG AG AG GG AG AA AA AG AG AG AG AG AG AA AG GG AG AA GG GG GG GG AG AG AG AG AA GG GG GG AG AA AG GG AG AA GG GG AG AG AG AG AG AA AA AG AG AG AA AG AG AG AG GG AG AG AG GG GG AG AG GG AG AG AG AA AA GG AG AA GG AA AA AG GG AG AG AG AG AG AG AG AG GG GG AA AG AG AG AG AA AG GG AG GG AA AG GG AG AG AG AA AG AG AG GG AG GG AG GG AG AG AG GG AG AG GG GG AG AG GG AA GG AA AG AG AG AG GG AG AA AG GG GG AG AG AG AG AG GG AG AG AA AG AA AA AG GG AA AG AG GG AG GG AG AG GG GG AG AG AA AG AG AG GG AG GG GG AG AG GG AG GG

rs4426491CC CC CC CT CT CC CT CC CT CT CC CC CC CC CC CC CC CT CT CT CC CC CT CT CC CC CT CC CT CC CT CC CC CC CT ….

rs4305230CC CC CC CT CT CC CT CC TT CT CC CC CC CC CC CC CC TT TT CT CC CC CT CT CC CC CT CC CT CC TT CC CC CC CT ….
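The file lists genotypes SNP by SNP, whereas the later steps work with one row per case. The sketch below transposes the first five genotypes of the first two SNP lines; it assumes each line starts with an rs identifier followed by space-separated genotypes, one per case, and the function names are illustrative.

```python
import re

# First five genotypes of the first two SNP lines shown above (truncated for brevity).
raw_lines = [
    "rs11686243AG AG AA AG GG",
    "rs4426491CC CC CC CT CT",
]


def parse_snp_line(line):
    """Split a line into its rs identifier and its list of per-case genotypes."""
    match = re.match(r"(rs\d+)\s*(.*)", line)
    if match is None:
        raise ValueError(f"not an SNP line: {line!r}")
    return match.group(1), match.group(2).split()


def to_case_rows(lines):
    """Transpose SNP-major lines into one genotype row per case."""
    snp_ids, columns = [], []
    for line in lines:
        snp_id, genotypes = parse_snp_line(line)
        snp_ids.append(snp_id)
        columns.append(genotypes)
    rows = [list(row) for row in zip(*columns)]
    return snp_ids, rows


snp_ids, rows = to_case_rows(raw_lines)
print("Case", *snp_ids)
for case_number, genotypes in enumerate(rows, start=1):
    print(case_number, *genotypes)
```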


Raw Data

Case  rs11686243  rs4426491  rs4305230  rs4630725  …
1     AG          CC
2     AG          CC
3     AA          CC
4     AG          CT
5     GG          CT
…     …           …
198   GG          TT
199   AG          CT
200   GG          CT


Blocks/Attributes

Case  rs11686243  rs4426491
1     AG          CC
2     AG          CC
3     AA          CC
4     AG          CT
5     GG          CT

Unique Combinations: AG CC, AA CC, AG CT, GG CT

Taxonomy tree: Any → AG CC, AA CC, AG CT, GG CT
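A sketch of how the two-level tree for one block could be derived from the data: collect the distinct genotype combinations and hang them under a single root. The dictionary representation and the function name `two_level_taxonomy` are assumptions made for illustration.

```python
# The five cases shown on the slide, restricted to one two-SNP block.
block_values = [("AG", "CC"), ("AG", "CC"), ("AA", "CC"), ("AG", "CT"), ("GG", "CT")]


def two_level_taxonomy(values):
    """Map the root 'Any' to the distinct combinations, in order of first appearance."""
    unique = list(dict.fromkeys(" ".join(v) for v in values))
    return {"Any": unique}


taxonomy = two_level_taxonomy(block_values)
print(taxonomy)  # {'Any': ['AG CC', 'AA CC', 'AG CT', 'GG CT']}
```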


Taxonomy Trees for Attributes

SNP data was split evenly into N/6 blocks (attributes), where N is the number of SNPs.
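Splitting N SNPs into N/6 blocks means six SNPs per block. The sketch below assumes the blocks are formed from consecutive SNPs, which the slide does not state; the placeholder identifiers and the function name are hypothetical.

```python
def split_into_blocks(snp_ids, block_size=6):
    """Group consecutive SNP ids into blocks of `block_size`."""
    if len(snp_ids) % block_size != 0:
        raise ValueError("the number of SNPs must be a multiple of the block size")
    return [snp_ids[i:i + block_size] for i in range(0, len(snp_ids), block_size)]


snp_ids = [f"snp{i}" for i in range(1, 13)]  # hypothetical placeholder ids, N = 12
blocks = split_into_blocks(snp_ids)
print(len(blocks))  # 2, i.e. N/6
print(blocks[0])
```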


Hierarchy Tree for Chr2


Hierarchy Tree for Chr10


Genomic Data

Taxonomy trees:
Block 1: Any → AG CC, AA CC
Block 3: Any → CC GG, CT AG

Block 1  Block 2  Block 3  Count
Any      Any      Any      200

AA CC    Any      Any      130
AG CC    Any      Any      70

AA CC    Any      CC GG    60
AA CC    Any      CT AG    70
AG CC    Any      CC GG    30
AG CC    Any      CT AG    40


Anonymized Data

Case  rs11686243  rs4426491  rs4305230  rs4630725  …
1     AG          CC         Any        Any
2     AG          CC         Any        Any
3     …           …
4     …           …
5     AA          CC         Any        Any
…     AA          CC         Any        Any
…     …           …
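A hedged sketch of how an anonymized table could be produced from the leaf partitions of the genomic specialization example (counts 60, 70, 30, 40): add Laplace noise to each leaf count and emit that many copies of the generalized record. It assumes the whole budget ε goes to the leaf counts, whereas a full algorithm would also spend part of it on choosing specializations; rounding and clamping at zero are post-processing choices made here for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale) as a difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


# Leaf partitions (Block 1, Block 2, Block 3) and true counts from the example.
leaf_counts = {
    ("AA CC", "Any", "CC GG"): 60,
    ("AA CC", "Any", "CT AG"): 70,
    ("AG CC", "Any", "CC GG"): 30,
    ("AG CC", "Any", "CT AG"): 40,
}


def release(counts, epsilon, rng):
    """Emit each generalized record as often as its noisy, rounded, non-negative count."""
    released = []
    for record, count in counts.items():
        noisy = count + laplace_noise(1.0 / epsilon, rng)
        released.extend([record] * max(0, round(noisy)))
    return released


released = release(leaf_counts, epsilon=1.0, rng=random.Random(3))
print(len(released))  # close to 200, but not necessarily equal
print(released[0])
```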


Heterogeneous Healthcare Data

      Relational Data   Genomic Data
ID    Job       Age     rs4305230  rs4630725  …
1     Engineer  50      AG         CC         …
2     Doctor    45      AA         CT         …
3     …         …       …          …          …


Conclusions

Privacy-Preserving Genomic Data Release
Tree-based approach is promising

Future work
Partitioning the SNPs to generate blocks
Utility function for specialization
Two-level tree vs. multi-level hierarchy trees
Single-dimensional vs. multi-dimensional partitioning


Thank You!