Re-identification of Anonymized CDR Datasets Using Social Network Data
Alket Cecaj, Marco Mamei, Nicola Bicocchi
University of Modena and Reggio Emilia
PerCom 2014
IEEE International Conference on Pervasive Computing and Communications. Budapest, Hungary
More data, big opportunities for study
Dataset join and privacy issues
• Matching different users associated to the same real person.
• Privacy issues: any kind of information can be inferred
• Joining different datasets is key to advanced forms of context awareness
Related work: anonymization and re-identification
• Gender, ZIP code and full date of birth: 63% re-identification
• Movie ratings from the Netflix Prize dataset
• Massachusetts hospital medical records, re-identified using a voter list
• Re-identification of anonymous volunteers in a DNA study for the Personal Genome Project
In line with our domain
• Unique in the Crowd: The Privacy Bounds of Human Mobility
• Markov chain models for de-anonymization of geo-located data
Dataset join and privacy issues
• Can we use data from social networks to re-identify users in an anonymized dataset such as a CDR dataset?
• Probabilistic approach to evaluate the re-identification potential.
CDR and Social Data sets
CDR and Social Dataset - Distribution of events
● CDR
  ● on average 28 events/period, max = 330, min = 3
  ● 2.019321 users for final analysis
● Social dataset
  ● on average 20 events/period, max = 424, min = 3
  ● 700 users for final analysis
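The per-user figures above (average, max, min events per period) can be reproduced with a small summary routine. The following is only a sketch: the function name `event_count_summary` and the `min_events` parameter are hypothetical, and treating "min = 3" as a minimum-activity filter is an assumption the slide suggests but does not state.

```python
from collections import Counter
from statistics import mean

def event_count_summary(event_user_ids, min_events=3):
    """Summarize events per user for one observation period.

    event_user_ids: iterable with one user id per recorded event.
    min_events: assumed activity threshold; users below it are dropped.
    """
    counts = Counter(event_user_ids)
    kept = [n for n in counts.values() if n >= min_events]
    if not kept:
        return {"users": 0, "mean": 0.0, "max": 0, "min": 0}
    return {
        "users": len(kept),   # users retained for the final analysis
        "mean": mean(kept),   # average events per user in the period
        "max": max(kept),
        "min": min(kept),
    }

# Toy data: user "c" has only 2 events and is filtered out.
toy = ["a"] * 3 + ["b"] * 5 + ["c"] * 2
print(event_count_summary(toy))
```

Running the same routine on both the CDR and the social dataset would give directly comparable distributions.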
Matching users among datasets
● Time and space parameters for matching: for example, a 10-minute time interval between events, with the cell radius as the physical distance.
● A clone of the social dataset is used to check how many matches occur by chance, following Bonferroni's principle.
● CDR users who generate an event at the same time but at a distance much larger than the cell radius are excluded.
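The matching and exclusion rules above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the `Event` record, the function names, the 1 km default cell radius and the `far_factor` threshold (what counts as "much larger" than the cell radius) are all hypothetical. Only the 10-minute window comes from the slides.

```python
from dataclasses import dataclass
from math import radians, sin, cos, asin, sqrt

@dataclass(frozen=True)
class Event:
    user: str    # anonymized CDR id or social network id
    t: float     # timestamp in seconds
    lat: float
    lon: float

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def candidate_matches(social_events, cdr_events, max_dt_s=600.0,
                      cell_radius_km=1.0, far_factor=10.0):
    """Count, per CDR user, the events matching one social user's events.

    A pair matches when the events are within max_dt_s seconds and within
    cell_radius_km. A CDR user seen within the time window but farther than
    far_factor * cell_radius_km (assumed threshold) is excluded entirely.
    """
    counts, excluded = {}, set()
    for s in social_events:
        for c in cdr_events:
            if abs(s.t - c.t) > max_dt_s:
                continue
            d = haversine_km(s.lat, s.lon, c.lat, c.lon)
            if d <= cell_radius_km:
                counts[c.user] = counts.get(c.user, 0) + 1
            elif d > far_factor * cell_radius_km:
                excluded.add(c.user)
    return {u: n for u, n in counts.items() if u not in excluded}

social = [Event("tw1", 0.0, 44.65, 10.93), Event("tw1", 3600.0, 44.65, 10.93)]
cdr = [
    Event("A", 300.0, 44.6501, 10.9301),   # close in time and space
    Event("A", 3700.0, 44.6501, 10.9301),
    Event("B", 100.0, 44.65, 10.93),       # matches once...
    Event("B", 3600.0, 41.90, 12.50),      # ...but is far away at the same time
    Event("C", 5000.0, 44.65, 10.93),      # outside the time window
]
print(candidate_matches(social, cdr))  # -> {'A': 2}
```

Running the same function against a clone of the social dataset would give a baseline count of matches that arise by chance. How that clone is built is not specified on the slides.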
Convergence to one?
Distributions and Percentages
Probabilistic modelling
Given $FT_a$, $U$ is a discrete random variable having $N_U$ values $U_i$, $i = 1 \dots N_U$
Overall results
Conclusions
Potential and/or limits of re-identification of users across multiple mobility datasets.
Future research:
• the current model and overall approach need refinement
• privacy concerns, addressed through mechanisms that preserve privacy and data utility for a single aspect
• correlation among data sets represents a big opportunity to enrich the information available to a pervasive application
Thank you for your attention. Questions are welcome.