Anonymizing Location-based data

Jarmanjit Singh, Jar_sing(at)encs.concordia.ca
Harpreet Sandhu, h_san(at)encs.concordia.ca
Qing Shi, q_shi(at)encs.concordia.ca
Benjamin Fung, fung(at)ciise.concordia.ca

Concordia Institute for Information Systems Engineering
Concordia University
Montreal, Quebec, Canada H3G 1M8

C3S2E-2009

The research is supported in part by the Discovery Grants (356065-2008) from Natural Sciences and Engineering Research Council of Canada (NSERC).




Overview

• RFID basics
• RFID data publishing
• Problem statement
• Proposed algorithms
• Evaluation
• Conclusion


RFID basics

Radio frequency identification: uses radio frequency (RF) to identify (ID) objects.

Wireless technology: allows a sensor (reader) to read, from a distance and without line of sight, a unique electronic product code (EPC) associated with a tag.

[Figure: a tag and a reader]


Data flow in RFID system

[Figure: data flow in an RFID system, with the anonymization step marked]

This is where we use anonymization algorithms to preserve the privacy of the data to be published.


Motivating example

For example, Alice has used her RFID-based credit card at a grocery store, dental clinic, shopping mall, beer bar, casino, AIDS clinic, etc.

Assume that Eve has seen Alice using her card at the grocery store and the shopping mall.

If the RFID company publishes its data and there is only one record containing both the grocery store and the shopping mall, then Eve can immediately infer that this record belongs to Alice and can also learn the other locations she visited.

How can the RFID company safeguard data privacy while keeping the released RFID data useful?


RFID database

<EPC#; loc; time>
<EPC1; a; t1>
<EPC2; b; t1>
<EPC3; c; t2>
<EPC2; d; t2>
<EPC1; e; t2>
<EPC3; e; t4>
<EPC1; c; t3>
<EPC2; f; t3>
<EPC1; g; t4>

Person-Specific Path Table

EPC     Path
EPC1    < a1 e2 c3 g4 >
EPC2    < b1 d2 f3 >
EPC3    < c2 e4 >

A path has the form <(loc1 t1) … (locn tn)>, where (loci ti) is a pair indicating the location and time (called a transaction), and <(loc1 t1) … (locn tn)> is a path (called a record).
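As a rough illustration of how the person-specific path table follows from the raw readings on this slide, the sketch below groups readings by EPC and sorts each group by time. The function name `build_paths` and the encoding of t1..t4 as the integers 1..4 are assumptions made for this example, not part of the original slides.

```python
from collections import defaultdict

def build_paths(readings):
    """Group raw <EPC; loc; time> readings into one time-ordered path per EPC."""
    visits_by_epc = defaultdict(list)
    for epc, loc, time in readings:
        visits_by_epc[epc].append((time, loc))
    # Sort each tag's visits by time and write each transaction as loc+time, e.g. "a1".
    return {
        epc: [f"{loc}{time}" for time, loc in sorted(visits)]
        for epc, visits in visits_by_epc.items()
    }

# The nine readings from the RFID database slide, with t1..t4 written as 1..4.
readings = [
    ("EPC1", "a", 1), ("EPC2", "b", 1), ("EPC3", "c", 2),
    ("EPC2", "d", 2), ("EPC1", "e", 2), ("EPC3", "e", 4),
    ("EPC1", "c", 3), ("EPC2", "f", 3), ("EPC1", "g", 4),
]
paths = build_paths(readings)
for epc in sorted(paths):
    print(epc, "<" + " ".join(paths[epc]) + ">")
```

The resulting paths match the three rows of the path table: EPC1 visits a, e, c, g at times 1 to 4, and so on.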


Attacker knowledge

Suppose the adversary knows that the target victim, Alice, has visited e and c at time 4 and 7, respectively.

If there is only one record containing e4 and c7, then the attacker can easily infer that this record belongs to Alice and can also learn the other locations visited by Alice.


Problem statement

We model attacker knowledge by I.

• The attacker can learn a maximum of I transactions within any record.
• Knowledge is constrained by the “effort” required to learn.

We transform the person-specific path database D into a (K,I)-anonymized database D’, such that no attacker with prior knowledge of m transactions of a record r ∈ D, where m ≤ I, can use that knowledge to narrow the candidates down to fewer than K records in D’.


Problem statement cont.

• A table T satisfies (K,I)-anonymity if and only if r ≥ K for any subsequence s with |s| ≤ I of any path in T, where r is the number of records containing s and K is an anonymity threshold.

Assume attacker knowledge I = 2 and K = 3:

s = <e4 c7>, r = 1 (violates the threshold, since r < K)
s = <d2 f6>, r = 3 (meets the threshold, since r ≥ K)
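The definition can be checked mechanically. The following is a minimal sketch, not the authors' implementation: it counts r for a given subsequence and lists every subsequence of length at most I whose support falls below K. The table used here is hypothetical (the slide's own table did not survive extraction); it was chosen only so that <e4 c7> and <d2 f6> get the same r values as in the example. All function names are illustrative.

```python
from itertools import combinations

def is_subsequence(s, path):
    """True if the transactions of s appear in path in the same order (gaps allowed)."""
    remaining = iter(path)
    return all(tx in remaining for tx in s)

def support(s, table):
    """r: the number of records (paths) in the table that contain s."""
    return sum(is_subsequence(s, path) for path in table)

def violating_subsequences(table, K, I):
    """All subsequences s with |s| <= I of any path whose support r is below K."""
    violations = set()
    for path in table:
        for size in range(1, I + 1):
            for s in combinations(path, size):
                if support(s, table) < K:
                    violations.add(s)
    return violations

# Hypothetical table, NOT the one shown on the original slide.
table = [
    ["a1", "d2", "b3", "e4", "f6", "c7"],
    ["b3", "e4", "f6", "e8"],
    ["d2", "f6", "c7", "e8"],
    ["d2", "c5", "f6", "c7"],
]
K, I = 3, 2
print(support(("e4", "c7"), table))   # 1
print(support(("d2", "f6"), table))   # 3
print(not violating_subsequences(table, K, I))  # False: the table is not (3,2)-anonymous
```

This brute-force check rescans the table for every candidate and is only meant to make the definition concrete.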


This is easily said, but how do we transform database D into a version D’ that is immunized against re-identification attacks?


Proposed method: Three steps

Pre-suppression

First, we scan D to find the items with support < K and delete them from D to get Dpre.

Generate subsets of size-I

We generate subsets of size-I from Dpre and make an additional scan to count their support.

Add dummy records

We make infrequent subsets frequent by using the IF-anonymity algorithm.
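The pre-suppression step can be sketched as one counting pass followed by one filtering pass. This is an illustrative sketch under stated assumptions: the name `pre_suppress` is hypothetical, the table is made up for the example, and dropping records that become empty is a choice made here, not something the slides specify.

```python
from collections import Counter

def pre_suppress(table, K):
    """Delete every item (transaction) whose support is below K, giving Dpre."""
    # Support of an item = number of records that contain it.
    item_support = Counter(tx for path in table for tx in set(path))
    suppressed = [[tx for tx in path if item_support[tx] >= K] for path in table]
    # Assumption for this sketch: records left with no items are dropped.
    return [path for path in suppressed if path]

# Hypothetical table D, used for illustration only.
D = [
    ["a1", "d2", "b3", "e4", "f6", "c7"],
    ["b3", "e4", "f6", "e8"],
    ["d2", "f6", "c7", "e8"],
    ["d2", "c5", "f6", "c7"],
]
D_pre = pre_suppress(D, K=2)
for path in D_pre:
    print(path)
```

With K = 2, the items a1 and c5 each occur in only one record, so they are the only ones removed.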


Generate subsets of size-I

Subset generation

• Subsets are generated in increasing lexicographical order, which means we do not consider the reverse combinations of transactions within a record.
• The size of the subsets generated should not exceed I.

Suppose I = 2.

First record (transactions a1, d2, b3, e4, f6, c7):
{a1, d2}, {a1, b3}, {a1, e4}, {a1, f6}, {a1, c7}, {d2, b3}, {d2, e4}, {d2, f6}, {d2, c7}, {b3, e4}, {b3, f6}, {b3, c7}, {e4, f6}, {e4, c7}, {f6, c7}

Second record (transactions b3, e4, f6, e8):
{b3, e4}, {b3, f6}, {b3, e8}, {e4, f6}, {e4, e8}, {f6, e8}

Do this for all records.
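Subset generation of this kind maps directly onto `itertools.combinations`, which emits combinations in the order of the input and never emits a reversed pair. The sketch below reproduces the two lists from this slide; the helper name `subsets_of_size` is an assumption, and it generates subsets of exactly the requested size, as the slide's example does for I = 2.

```python
from itertools import combinations

def subsets_of_size(record, size):
    """Order-preserving subsets of a record; reverse combinations are never produced."""
    return list(combinations(record, size))

record_1 = ["a1", "d2", "b3", "e4", "f6", "c7"]
record_2 = ["b3", "e4", "f6", "e8"]

pairs_1 = subsets_of_size(record_1, 2)
pairs_2 = subsets_of_size(record_2, 2)

print(len(pairs_1), len(pairs_2))  # 15 6
print(pairs_2)
```

A record with n transactions yields n(n-1)/2 pairs, which is why the first record gives 15 subsets and the second gives 6.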


Count support

• Count support for each subset.
• Identify frequent and infrequent subsets.

Frequent subsets: these subsets have support value ≥ K.

Infrequent subsets: these subsets have support value < K. We need to add dummy records to make them (K,I)-anonymous.
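A minimal sketch of the counting step follows: it tallies, for each size-2 subset, how many records generate it, then splits the subsets at the threshold K. The table is hypothetical and the function names `count_support` and `split_by_threshold` are illustrative, not taken from the original work.

```python
from collections import Counter
from itertools import combinations

def count_support(table, size):
    """Support of each subset = number of records that generate it."""
    support = Counter()
    for path in table:
        # set(...) ensures a subset is counted at most once per record.
        support.update(set(combinations(path, size)))
    return support

def split_by_threshold(support, K):
    """Partition subsets into frequent (support >= K) and infrequent (support < K)."""
    frequent = {s: r for s, r in support.items() if r >= K}
    infrequent = {s: r for s, r in support.items() if r < K}
    return frequent, infrequent

# Hypothetical table, used for illustration only.
table = [
    ["a1", "d2", "b3", "e4", "f6", "c7"],
    ["b3", "e4", "f6", "e8"],
    ["d2", "f6", "c7", "e8"],
    ["d2", "c5", "f6", "c7"],
]
frequent, infrequent = split_by_threshold(count_support(table, 2), K=3)
print(sorted(frequent))
print(len(infrequent))
```

In this made-up table only the pairs built from d2, f6 and c7 reach support 3; every other pair would need dummy records.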


Pre-suppression: Example

Suppose K = 3.

[Figure: infrequent subsets, including the subsets containing ‘a1’]


What is a dummy record?

Dummy records are fake records inserted into a database in order to make infrequent subsets meet the support threshold K.

Some properties of adding a dummy record:

Property 1: The length of a dummy record should not exceed the maximum length.

Property 2: The transactions within a dummy record should have a reasonable time difference.


Process to add dummy record

Construct a tree out of the infrequent subsets.

[Figure: tree of infrequent subsets. Root: Null. Nodes in the first row: e4: 3, b6: 2, d2: 1. Nodes in the second row: c7: 2, g9: 1, e4: 1, a5: 1, b6: 1.]

Two properties:

• Reasonable time difference.
• Length of dummy record.

We can get the minimum reasonable time difference between any two locations either by learning from D or by using geographical databases.
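One possible reading of "learning from D" is to record, for each ordered pair of locations visited consecutively in some record, the smallest time gap observed, and to reject a dummy record whose consecutive transactions are closer than that. The sketch below follows this reading; it is an assumption, not the authors' procedure. The table, the function names, and the choice to accept location pairs never observed in D are all hypothetical.

```python
import re

def parse(tx):
    """Split a transaction such as 'e4' into (location, time)."""
    match = re.fullmatch(r"([a-z]+)(\d+)", tx)
    return match.group(1), int(match.group(2))

def min_time_differences(table):
    """Smallest observed gap for each ordered pair of consecutively visited locations."""
    best = {}
    for path in table:
        visits = [parse(tx) for tx in path]
        for (loc_a, t_a), (loc_b, t_b) in zip(visits, visits[1:]):
            gap = t_b - t_a
            key = (loc_a, loc_b)
            if key not in best or gap < best[key]:
                best[key] = gap
    return best

def is_plausible(dummy, min_gap):
    """Check that times increase and respect the learned minimum gaps."""
    visits = [parse(tx) for tx in dummy]
    for (loc_a, t_a), (loc_b, t_b) in zip(visits, visits[1:]):
        if t_b <= t_a:
            return False
        needed = min_gap.get((loc_a, loc_b))  # None: pair never seen, accepted here
        if needed is not None and t_b - t_a < needed:
            return False
    return True

# Hypothetical database D, used for illustration only.
D = [
    ["a1", "d2", "b3", "e4", "f6", "c7"],
    ["b3", "e4", "f6", "e8"],
    ["d2", "f6", "c7", "e8"],
]
min_gap = min_time_differences(D)
print(is_plausible(["e4", "f6", "c7"], min_gap))  # True
print(is_plausible(["e4", "f5"], min_gap))        # False: e to f took at least 2 in D
```

A geographical database could replace `min_time_differences` by supplying travel-time lower bounds directly.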


Divide tree if time conflict

Rule 1: Let β be the set of nodes at level 1 of the tree, and let ‘n’ be the node at which the tree needs to be divided.

Let γ be the set of children nodes of ‘n’. If there exists an intersection α between β and γ, that is, β ∩ γ = α ≠ ∅, let δ be the set of children nodes of α.

If |δ ∩ γ| ≥ |δ| / 2, we separate ‘n’ and α, along with their children nodes (γ and δ respectively), from the original tree to construct a different tree.
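The condition in Rule 1 is a pair of set intersections, so it can be written down almost literally. The sketch below is one reading of the rule under assumptions: the tree is represented as a mapping from each level-1 node to the set of its children, ‘n’ is taken to be a level-1 node, and the example tree is invented. It only evaluates the condition; it does not perform the separation.

```python
def rule1_should_separate(children, n):
    """Evaluate the Rule 1 condition for node n.

    children maps each level-1 node to the set of its child nodes
    (a hypothetical representation chosen for this sketch).
    Returns (decision, alpha).
    """
    beta = set(children)            # nodes at level 1
    gamma = children[n]             # children of n
    alpha = beta & gamma            # level-1 nodes that are also children of n
    if not alpha:
        return False, alpha
    # delta: children of the nodes in alpha
    delta = set().union(*(children[a] for a in alpha))
    return len(delta & gamma) >= len(delta) / 2, alpha

# Invented tree, for illustration only.
children = {
    "e4": {"c7", "g9"},
    "b6": {"c7", "a8"},
    "d2": {"b6", "c7"},
}
print(rule1_should_separate(children, "d2"))  # (True, {'b6'})
print(rule1_should_separate(children, "e4"))  # (False, set())
```

For n = d2, α = {b6}, δ = {c7, a8} and δ ∩ γ = {c7}, so |δ ∩ γ| = 1 ≥ |δ| / 2 = 1 and the rule says to move d2 and b6, with their children, into a separate tree.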


Divide tree if large

Count the number of nodes in each tree, excluding the null root.

If any tree has more nodes than the threshold λ, divide the tree again using a ratio: let X be the number of nodes in the tree, with X > λ; the ratio is X / λ.


Divide tree cont.

Rule 2: Suppose the number of nodes at level 1 of the tree is |1x|.

• If the ratio X / λ ≥ |1x|, we divide the tree for each node at level 1 and compute the ratio again for each resulting tree.
• If the ratio X / λ < |1x|, we divide the tree at level 1 by combining into one tree the level-1 nodes that have more intersecting children.


Add dummy

Once each tree satisfies X ≈ λ, we can write the dummy record by following Rule 3.

Rule 3: Let X1 be the set of nodes at level 1, X2 the set of nodes at level 2, ..., and Xm the set of nodes at level m.

All sets have their values in ascending order by time. We get the dummy record by taking the union of (X1, X2, ..., Xm).
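Rule 3 amounts to a union of the per-level node sets followed by a sort on time. The sketch below assumes each node is written as a location letter plus a time (for example "e4") and that earlier tree division has already removed time conflicts; the function names and the example levels are hypothetical.

```python
import re

def time_of(tx):
    """Time component of a transaction such as 'c7'."""
    return int(re.fullmatch(r"[a-z]+(\d+)", tx).group(1))

def dummy_from_levels(levels):
    """Union of the node sets X1..Xm, written out in ascending order by time."""
    nodes = set().union(*levels)
    return sorted(nodes, key=lambda tx: (time_of(tx), tx))

# Invented example: one level-1 node and two level-2 nodes.
levels = [{"e4"}, {"c7", "g9"}]
dummy = dummy_from_levels(levels)
print(dummy)  # ['e4', 'c7', 'g9']
```

Because the result is a union, a node that appears at more than one level contributes a single transaction to the dummy record.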


Recount support

A dummy record will also generate some subsets for which we do not know the support. For example, {a, b} and {b, c} are infrequent subsets and we add the dummy <a b c> to make them frequent, but this also creates one new subset, {a, c}, for which we don’t know the support value.

So, we generate subsets of size-I from the dummies and count the support for each.

We repeat the IF-anonymity algorithm for the new infrequent subsets. The process stops when there is no infrequent subset.
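The side effect described here is easy to reproduce: generating all size-I subsets of a dummy and removing the ones it was built to cover leaves exactly the subsets whose support is still unknown. A small sketch, with the function name `new_subsets` chosen for illustration:

```python
from itertools import combinations

def new_subsets(dummy, targeted, size):
    """Size-`size` subsets of a dummy record that were not among the targeted ones."""
    return [s for s in combinations(dummy, size) if s not in targeted]

# The example from the slide: {a, b} and {b, c} are infrequent, the dummy is <a b c>.
targeted = {("a", "b"), ("b", "c")}
dummy = ["a", "b", "c"]
print(new_subsets(dummy, targeted, 2))  # [('a', 'c')]
```

Each subset returned this way has to be counted against the database again, which is what drives the repeat-until-no-infrequent-subset loop.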


Experimental evaluation: Distortion vs. Dimensions


Distortion vs. Attacker knowledge


Distortion vs. Number of records


Conclusion

Privacy in publishing high-dimensional data has become an important issue.

We illustrate the threat of re-identification attacks caused by publishing RFID data.

In this paper, we have proposed an efficient scheme to (K,I)-anonymize high-dimensional data.


Thank you
