PRESERVING PRIVACY IN SEMANTIC-RICH TRAJECTORIES OF HUMAN MOBILITY. Anna Monreale, Roberto Trasarti, Dino Pedreschi, Chiara Renso (KDDLab, Pisa); Vania Bogorny (Univ. Santa Catarina, Brazil). Knowledge Discovery and Delivery Lab (ISTI-CNR & Univ. Pisa), www-kdd.isti.cnr.it. ANONIMO MEETING, Pisa, September 20-21, 2010. SPRINGL 2010, San Jose, November 2, 2010.

Preserving Privacy in Semantic-Rich Trajectories of Human Mobility


DESCRIPTION

The increasing abundance of data about the trajectories of personal movement is opening up new opportunities for analyzing and mining human mobility, but new risks emerge since it opens new ways of intruding into personal privacy. Representing the personal movements as sequences of places visited by a person during her/his movements (a semantic trajectory) poses even greater privacy threats than raw geometric location data. In this paper we propose a privacy model defining the attack model of semantic trajectory linking, together with a privacy notion, called c-safety. This method provides an upper bound to the probability of inferring that a given person, observed in a sequence of non-sensitive places, has also stopped in any sensitive location. Coherently with the privacy model, we propose an algorithm for transforming any dataset of semantic trajectories into a c-safe one. We report a study on a real-life GPS trajectory dataset to show how our algorithm preserves interesting quality/utility measures of the original trajectories, such as sequential pattern mining results.


Page 1: Preserving Privacy in Semantic-Rich Trajectories of Human Mobility

PRESERVING PRIVACY IN SEMANTIC-RICH TRAJECTORIES OF

HUMAN MOBILITY

Anna Monreale, Roberto Trasarti, Dino Pedreschi, Chiara Renso

KDDLab, Pisa

Vania Bogorny

Univ. Santa Catarina, Brazil

Knowledge Discovery and Delivery Lab (ISTI-CNR & Univ. Pisa)

www-kdd.isti.cnr.it

ANONIMO MEETING, Pisa, September 20-21, 2010

SPRINGL 2010, San Jose, November 2, 2010

Page 2: Preserving Privacy in Semantic-Rich Trajectories of Human Mobility

How the story begins…

Semantic trajectories represent the important places visited by people.

This information can be privacy sensitive! We should find a good generalization of the visited places… preserving semantics! But how?

Can we use a taxonomy of places to generalize and find anonymous datasets? Let’s ask Anna, Dino and Roberto for help!

Page 3: Preserving Privacy in Semantic-Rich Trajectories of Human Mobility

Semantic Trajectories

Availability of trajectory data increases

From raw trajectories to new forms of trajectory data with richer semantic information: semantic trajectories

Semantic trajectories represent moving-object traces as sequences of stops and moves

A semantic trajectory can be represented as the sequence of stops, e.g.

<Home, Work, ShoppingCenter, Gym>
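The stop-sequence representation above can be sketched in a few lines. This is only an illustration: the `Episode` type, its field names, and the move labels ("car", "walk") are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    kind: str   # "stop" or "move"
    label: str  # place name for a stop; transport mode for a move (hypothetical)


def stop_sequence(episodes):
    """Keep only the stops, in visiting order."""
    return [e.label for e in episodes if e.kind == "stop"]


# A day made of alternating stops and moves (invented example data).
day = [
    Episode("stop", "Home"),
    Episode("move", "car"),
    Episode("stop", "Work"),
    Episode("move", "car"),
    Episode("stop", "ShoppingCenter"),
    Episode("move", "walk"),
    Episode("stop", "Gym"),
]

trajectory = stop_sequence(day)
print(trajectory)
```

Dropping the moves leaves exactly the sequence of stops shown on the slide.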

Page 4: Preserving Privacy in Semantic-Rich Trajectories of Human Mobility

Semantic Trajectory and Privacy

Data owner should not reveal personal sensitive information

Disclosure of personal sensitive information puts the citizen’s privacy at risk.

Hiding personal identifiers may not be sufficient

Need for new privacy-preserving DT techniques

Privacy by Design

Natural trade-off between privacy quantification and data utility

Analysis results should not be altered significantly

Privacy has to be maximized

Page 5: Preserving Privacy in Semantic-Rich Trajectories of Human Mobility

Semantic Trajectories Analysis and Privacy Issues

Analyzing datasets of semantic trajectories may cause privacy issues

A visited place can allow an attacker to infer sensitive personal information about an individual

Example: From the fact that a person has stopped in an oncology clinic, an attacker can derive private personal information about that person's health.

Page 6: Preserving Privacy in Semantic-Rich Trajectories of Human Mobility

Semantic Trajectories Analysis and Privacy Issues

k-anonymity is not enough for a robust protection

When individuals with similar trajectories stop in the same sensitive place, an attacker can easily infer their sensitive information.

Example:

#U1 <Park, Restaurant, Oncology Clinic>

#U2 <Park, Restaurant, Oncology Clinic>

This dataset is 2-anonymous but the attacker can infer that the user has been to the Oncology Clinic!!!
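A small sketch of why the example fails: the dataset is 2-anonymous, yet every trajectory matching the quasi-identifier prefix ends in the same sensitive place. The variable names and the choice of the first two stops as the attacker's known prefix are assumptions made for illustration.

```python
from collections import Counter

# The two trajectories from the slide.
dataset = {
    "U1": ("Park", "Restaurant", "Oncology Clinic"),
    "U2": ("Park", "Restaurant", "Oncology Clinic"),
}

# k-anonymity: every trajectory is shared by at least k users.
counts = Counter(dataset.values())
k = min(counts.values())

# Assumed attacker knowledge: the non-sensitive prefix of the trajectory.
known_prefix = ("Park", "Restaurant")
candidates = [t for t in dataset.values() if t[:2] == known_prefix]

# Share of candidates that visited the sensitive place.
share = sum("Oncology Clinic" in t for t in candidates) / len(candidates)

print(k, share)
```

With `k` equal to 2 the identity stays hidden among two users, but the share of candidates visiting the clinic is 1.0, so the sensitive stop is disclosed anyway.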


Page 7: Preserving Privacy in Semantic-Rich Trajectories of Human Mobility

The Privacy Framework

Anonymizes datasets of semantic trajectories

Based on semantic generalization and the notion of c-safety, similar to the notion of l-diversity in relational, tabular data

It is based on: a taxonomy of places, the notion of quasi-identifier places and sensitive places.

Preserves pattern mining results

Page 8: Preserving Privacy in Semantic-Rich Trajectories of Human Mobility

Quasi-identifier and Sensitive stops

The taxonomy of places

Represents important places and their semantic categories in a given domain

quasi-identifier places: can be used to infer the identity of the user

sensitive places: can disclose sensitive information about the user

In general we don’t have an a priori classification, since it depends on the application and the context
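As a rough sketch, a place taxonomy can be held as a child-to-parent map, with the sensitive subtrees declared per application. Everything below is hypothetical: the category names (`Health`, `Hospital`, `Restaurant`, ...), the tree shape, and the choice of `Health` as the sensitive subtree are invented for illustration and are not the taxonomy used in the slides.

```python
# Hypothetical taxonomy: child -> parent. "Place" is the root.
PARENT = {
    "C1": "Clinic", "C2": "Clinic", "H1": "Hospital",
    "Clinic": "Health", "Hospital": "Health",
    "R1": "Restaurant", "D1": "Disco",
    "Restaurant": "Entertainment", "Disco": "Entertainment",
    "S1": "Station", "S2": "Station",
    "Health": "Place", "Entertainment": "Place", "Station": "Place",
}

# Application-specific choice: which subtrees count as sensitive (assumed).
SENSITIVE_ROOTS = {"Health"}


def ancestors(place):
    """Return the chain of ancestors from the parent up to the root."""
    chain = []
    while place in PARENT:
        place = PARENT[place]
        chain.append(place)
    return chain


def is_sensitive(place):
    """A place is sensitive if it lies in a sensitive subtree."""
    return place in SENSITIVE_ROOTS or any(a in SENSITIVE_ROOTS for a in ancestors(place))


def split(trajectory):
    """Separate quasi-identifier stops from sensitive stops, keeping order."""
    quasi = [p for p in trajectory if not is_sensitive(p)]
    sensitive = [p for p in trajectory if is_sensitive(p)]
    return quasi, sensitive


quasi, sensitive = split(["S1", "R1", "H1", "C1", "S2"])
print(quasi, sensitive)
```

Keeping the sensitive subtrees in a separate set reflects the point above: the classification is a configuration decision, not a property of the taxonomy itself.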

Page 9: Preserving Privacy in Semantic-Rich Trajectories of Human Mobility

Privacy place taxonomy

Page 10: Preserving Privacy in Semantic-Rich Trajectories of Human Mobility

Privacy Model

Adversary Knowledge:

how we anonymize the data

the privacy place taxonomy describing the levels of abstraction

that the user U is in the dataset

a quasi-identifier place sequence SQ visited by the user U

Attack Model: Given SQ, the attacker builds a set of candidate semantic trajectories containing SQ and tries to infer the sensitive places visited by U.

We denote by Prob(SQ,S) the probability that, given a quasi-identifier place sequence SQ related to a user U, the attacker infers the sequence of sensitive places S visited by the user.

Page 11: Preserving Privacy in Semantic-Rich Trajectories of Human Mobility

C-Safe Dataset

We want to control the probability Prob(SQ, S).

A dataset ST is said to be c-safe with respect to the place set Q if, for every quasi-identifier place sequence SQ and for each set of sensitive places S, Prob(SQ, S) ≤ c, with c ∈ [0,1].

Given a sequence of sensitive places S = s1, . . . , sh and a quasi-identifier sequence SQ, the probability of inferring S is the conditional probability:

Prob(SQ, S) = P(S | SQ)
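One way to make this conditional probability concrete is a frequency estimate over the candidate trajectories that contain SQ. This is an assumed reading, chosen because it agrees with the values 1/3 and 2/3 in the worked example later in the deck; the symbol ⊑ (written `\sqsubseteq`) for "is contained as a subsequence" is notation introduced here, not taken from the slides.

```latex
\mathrm{Prob}(SQ, S) \;=\; P(S \mid SQ) \;=\;
\frac{\bigl|\{\, t \in ST \;:\; SQ \sqsubseteq t \ \wedge\ S \sqsubseteq t \,\}\bigr|}
     {\bigl|\{\, t \in ST \;:\; SQ \sqsubseteq t \,\}\bigr|},
\qquad
ST \text{ is } c\text{-safe} \iff \mathrm{Prob}(SQ, S) \le c \ \text{ for all } SQ,\ S .
```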


Page 12: Preserving Privacy in Semantic-Rich Trajectories of Human Mobility

How can we obtain a c-safe dataset?

The CAST (C-safe Anonymization of Semantic Trajectories) algorithm guarantees that P(S|SQ) ≤ c for each pair of sequences S and SQ

While (|S| > 0)
    SL = { s ∈ S | length(s) = MaxLength(S) }
    While (|SL| >= m)
        1. Compute the cost of every possible group Gi of m sequences in SL as: CostGi = CostQGi + CostSGi.
        2. Apply the generalization with the lowest cost, storing the results in R.
        3. Remove Gi from S and SL.
    If (|SL| > 0) Cut(SL);
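The control flow of the loop above can be sketched as follows. This is not the authors' implementation: `cost`, `generalize`, and `cut` are hypothetical callbacks supplied by the caller, the slides do not define `Cut`, and here it is assumed to return a strictly shorter sequence so that leftovers can join a later, shorter group. The toy callbacks at the bottom are invented only to make the sketch runnable.

```python
from itertools import combinations


def cast_grouping(sequences, m, cost, generalize, cut):
    """Group sequences m at a time, longest first (hypothetical sketch)."""
    remaining = [tuple(s) for s in sequences]   # S in the slide
    result = []                                 # R in the slide
    while remaining:
        max_len = max(len(s) for s in remaining)
        longest = [s for s in remaining if len(s) == max_len]  # SL
        while len(longest) >= m:
            # Steps 1-2: pick the cheapest group of m sequences and generalize it.
            group = min(combinations(longest, m), key=lambda g: cost(list(g)))
            result.extend(generalize(list(group)))
            # Step 3: remove the group from S and SL.
            for s in group:
                longest.remove(s)
                remaining.remove(s)
        if longest:
            # Cut(SL): assumed to shorten leftovers; empty results are dropped.
            for s in longest:
                remaining.remove(s)
            remaining.extend(t for t in map(cut, longest) if t)
    return result


# Toy callbacks (invented): cost = number of positions that disagree.
def toy_cost(group):
    return sum(1 for column in zip(*group) if len(set(column)) > 1)


def toy_generalize(group):
    merged = tuple(col[0] if len(set(col)) == 1 else "*" for col in zip(*group))
    return [merged] * len(group)


def toy_cut(seq):
    return seq[:-1]


anonymized = cast_grouping(
    [("a", "b"), ("a", "c"), ("x", "y"), ("a",)], 2, toy_cost, toy_generalize, toy_cut
)
print(anonymized)
```

Enumerating every group of m sequences is exponential in general, which is consistent with the later remark that better heuristics are future work.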

Page 13: Preserving Privacy in Semantic-Rich Trajectories of Human Mobility

Example (1): The process

Consider the following set of sequences, with m=3 and c=0.45:

S = {<S1, R2, H1, R1, C1, S2>

<S3, D1, R1, C1, S2>

<S1, P3, C2, D2, S2> …}

Page 14: Preserving Privacy in Semantic-Rich Trajectories of Human Mobility

Example (2): CostQ

CostQ is the number of hops on the tree needed to generalize the sequences of quasi-identifiers to a common one.

Consider the group:

<S1, R2, H1, R1, C1, S2>

<S3, D1, R1, C1, S2>

<S1, P3, C2, D2, S2>

CostQ = 6 + 6 + 6 = 18

<Station,Place,Entertainment,S2 (H1,C1)>

<Station,Place,Entertainment,S2 (C1)>

<Station,Place,Entertainment,S2 (C2)>
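A hop count of this kind can be sketched for a single aligned position: find the lowest common ancestor of the places and sum each place's distance to it. The taxonomy below is hypothetical (the slide's taxonomy figure is not reproduced in this transcript), so the sketch does not attempt to reproduce the value 18.

```python
# Hypothetical taxonomy: child -> parent. "Place" is the root.
PARENT = {
    "S1": "Station", "S3": "Station",
    "R1": "Restaurant", "D1": "Disco",
    "Restaurant": "Entertainment", "Disco": "Entertainment",
    "Station": "Place", "Entertainment": "Place",
}


def path_to_root(place):
    """Return [place, parent, ..., root]."""
    path = [place]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def common_generalization(places):
    """Lowest node in the tree that generalizes every place."""
    paths = [path_to_root(p) for p in places]
    for candidate in paths[0]:
        if all(candidate in path for path in paths):
            return candidate
    raise ValueError("places do not share a common ancestor")


def hop_cost(places):
    """Target node and total hops needed to lift every place to it."""
    target = common_generalization(places)
    return target, sum(path_to_root(p).index(target) for p in places)


print(hop_cost(["S1", "S3", "S1"]))
print(hop_cost(["R1", "D1"]))
print(hop_cost(["S1", "R1"]))
```

Summing such per-position costs over an aligned group gives a CostQ-style total; how the sequences are aligned is left out of this sketch.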

Page 15: Preserving Privacy in Semantic-Rich Trajectories of Human Mobility

Example (3): CostS

CostS is the number of hops on the tree needed to generalize the sequences of sensitive places in order to obtain c-safety.

From the generalized group:

<Station,Place,Entertainment,S2 (H1,C1)>

<Station,Place,Entertainment,S2 (C1)>

<Station,Place,Entertainment,S2 (C2)>

CostS = 3

<Station, Place, H1, Entertainment, Clinic, S2>

<Station, Place, Entertainment, Clinic, S2>

<Station, Place, Clinic, Entertainment, S2>

The total cost of this group is 21 hops (CostQ + CostS = 18 + 3), which is the lowest among the candidate combinations.

Page 16: Preserving Privacy in Semantic-Rich Trajectories of Human Mobility

Example (4): Why it is c-safe

<Station,Place,Entertainment,S2 (H1,C1)>

<Station,Place,Entertainment,S2 (C1)>

<Station,Place,Entertainment,S2 (C2)>

SQ = ⟨Station, Place, Entertainment, S2⟩.

Attack probabilities: Prob(SQ, H1) = 1/3 < c, Prob(SQ, C1) = 2/3 > c and Prob(SQ, C2) = 1/3 < c

We need to generalize C1 to the next higher level in the taxonomy: Clinic.

The probability of C1 becomes 2/5 < c!

C-safe dataset:

<Station, Place, H1, Entertainment, Clinic, S2>

<Station, Place, Entertainment, Clinic, S2>

<Station, Place, Clinic, Entertainment, S2>
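The check made before generalization can be reproduced with exact fractions. The sketch assumes the probability is the share of trajectories with the same generalized SQ that contain the sensitive place; the variable and function names are hypothetical. The post-generalization value 2/5 depends on the taxonomy and is not recomputed here.

```python
from fractions import Fraction

c = Fraction(45, 100)  # threshold from Example (1)

# Sensitive places attached to the three trajectories sharing the same SQ.
group = [{"H1", "C1"}, {"C1"}, {"C2"}]


def inference_probability(group, place):
    """Share of trajectories in the group that contain the sensitive place."""
    hits = sum(1 for sensitive in group if place in sensitive)
    return Fraction(hits, len(group))


probabilities = {p: inference_probability(group, p) for p in ("H1", "C1", "C2")}
violations = [p for p, prob in probabilities.items() if prob > c]

print(probabilities)
print(violations)
```

Only C1 exceeds the threshold, which is why the slide generalizes it.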


Page 17: Preserving Privacy in Semantic-Rich Trajectories of Human Mobility

Experiments

The dataset contains trajectories of 17000 moving cars in Milan, in one week, collected through GPS devices.

We found 6225 semantic trajectories with an average length equal to 5.2 stops.

We ran the sequential pattern mining algorithm and measured the quality of the results with two measures:

the coverage coefficient

the distance coefficient.

Page 18: Preserving Privacy in Semantic-Rich Trajectories of Human Mobility

Experiments: Quality of the analysis

The coverage coefficient measures how many patterns extracted from the original dataset are covered (have a superclass in the taxonomy) by the patterns extracted from the anonymized dataset
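A possible reading of this measure, sketched under stated assumptions: an original pattern counts as covered when some anonymized pattern of the same length has, at every position, the same place or one of its ancestors. The equal-length, position-wise matching rule and the toy taxonomy are assumptions, not details given in the slides.

```python
# Hypothetical taxonomy: child -> parent.
PARENT = {
    "S1": "Station", "S2": "Station",
    "C1": "Clinic", "R1": "Restaurant",
}


def generalizes(general, specific):
    """True if `general` equals `specific` or is one of its ancestors."""
    while True:
        if specific == general:
            return True
        if specific not in PARENT:
            return False
        specific = PARENT[specific]


def pattern_covered(original, anonymized):
    """Position-wise coverage of one pattern by another (assumed rule)."""
    return len(original) == len(anonymized) and all(
        generalizes(g, s) for g, s in zip(anonymized, original)
    )


def coverage_coefficient(original_patterns, anonymized_patterns):
    """Fraction of original patterns covered by some anonymized pattern."""
    if not original_patterns:
        return 0.0
    covered = sum(
        any(pattern_covered(o, a) for a in anonymized_patterns)
        for o in original_patterns
    )
    return covered / len(original_patterns)


score = coverage_coefficient(
    [("S1", "C1"), ("R1", "S2")],
    [("Station", "Clinic")],
)
print(score)
```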


Page 19: Preserving Privacy in Semantic-Rich Trajectories of Human Mobility

Experiments: Coverage Coefficient


Page 20: Preserving Privacy in Semantic-Rich Trajectories of Human Mobility

Experiments: Quality of the analysis

The distance coefficient represents the distance, in terms of steps in the taxonomy, needed to transform the patterns extracted from the original dataset into those extracted from the anonymized dataset.

Page 21: Preserving Privacy in Semantic-Rich Trajectories of Human Mobility

Experiments: Distance Coefficient

Page 22: Preserving Privacy in Semantic-Rich Trajectories of Human Mobility

Conclusions and Future work

Improve the algorithm with better heuristics, so that it does not consider only groups of a fixed size.

More experiments with other mining algorithms

More utility measures for the evaluation of results

Another future research direction is the exploitation of c-safe semantic trajectory datasets for semantic tagging of trajectories. How does the anonymization step affect the overall results of a trajectory semantic tagging inference?