Re-identification of Anonymized CDR datasets Using Social network Data Alket Cecaj, Marco Mamei, Nicola Bicocchi University of studies of Modena and Reggio Emilia PerCom 2014

Re-identification of Anonymized CDR datasets Using Social network Data

Alket Cecaj, Marco Mamei, Nicola Bicocchi

University of studies of Modena and Reggio Emilia

PerCom 2014

IEEE International Conference on Pervasive Computing and Communications. Budapest, Hungary

More data, bigger opportunities for study

Dataset join and privacy issues

• Matching different user accounts associated with the same real person.

• Privacy issues: any kind of information can be inferred.

• Joining different datasets is the key to advanced forms of context awareness.

Related work

Anonymization and re-identification

• Gender, ZIP code, and full date of birth: 63% re-identification

• Movie ratings from the Netflix Prize dataset

• Medical records of Massachusetts Hospital, re-identified using a voter list

• Re-identification of anonymous volunteers in a DNA study for the Personal Genome Project

In line with our domain

• Unique in the Crowd: The privacy bounds of human mobility

• Markov chain models for de-anonymization of geo-located data

Dataset join and privacy issues.

• Can we use data from social networks to re-identify users in an anonymized dataset such as a CDR dataset?

• Probabilistic approach to evaluate the re-identification potential.

CDR and Social Data sets

CDR and Social Dataset - Distribution of events

● CDR
  ● on average 28 events/period, max = 330, min = 3
  ● 2.019321 users for final analysis
● Social dataset
  ● on average 20 events/period, max = 424, min = 3
  ● 700 users for final analysis

Matching users among datasets

● Time and space parameters for matching: for example, a 10 min time interval between events, and the cell radius as the physical distance.

● Clone of the social dataset, used to check how many of the matches were made by chance, following Bonferroni's principle.

● Exclusion of CDR users whose events occur at the same time but at a distance much larger than the cell radius.
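The matching and exclusion rules on this slide can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the `Event` record, the function names, the 1 km cell radius, and the separate `exclude_km` threshold standing in for "much larger than the cell radius" are all assumptions made for the example. Only the 10 min window comes from the slide.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Event:
    """Hypothetical event record: who, when (seconds), and where."""
    user: str
    t: float
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def candidate_users(social_events, cdr_events,
                    max_dt_s=600.0,       # 10 min window, as on the slide
                    cell_radius_km=1.0,   # assumed cell radius
                    exclude_km=10.0):     # assumed "much larger" threshold
    """Return CDR users that match one social user's events and are never contradicted."""
    matched, excluded = set(), set()
    for s in social_events:
        for c in cdr_events:
            if abs(s.t - c.t) > max_dt_s:
                continue  # not close enough in time to compare
            d = haversine_km(s.lat, s.lon, c.lat, c.lon)
            if d <= cell_radius_km:
                matched.add(c.user)    # same time, same cell
            elif d > exclude_km:
                excluded.add(c.user)   # same time, but far away
            # distances in between count neither for nor against
    return matched - excluded


social = [Event("s", 0, 44.65, 10.93), Event("s", 3600, 44.65, 10.93)]
cdr = [
    Event("A", 100, 44.65, 10.93), Event("A", 3700, 44.65, 10.93),
    Event("B", 200, 44.65, 10.93), Event("B", 3650, 44.70, 10.63),
    Event("C", 5000, 44.65, 10.93),
]
print(candidate_users(social, cdr))  # {'A'}
```

Running the same function on a time-shuffled clone of the social events would give a rough count of the matches that arise purely by chance, which is the role the cloned dataset plays in the list above.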

Convergence to one?

Distributions and Percentages

Probabilistic modelling

Given $FT_a$, $U$ is a discrete random variable taking $N_U$ values $U_i$, $i = 1, \dots, N_U$.

Overall results

Conclusions

Potential and/or limits of re-identification of users across multiple mobility datasets.

Future research:

• The current model and overall approach need refinement.

• Privacy concerns: mechanisms for preserving both privacy and data utility for a single aspect.

• Correlation among datasets represents a big opportunity to enrich the information available to a pervasive application.

Thank you for your attention. Questions are welcome.
