Privacy preserving similarity detection for data analysis
Iraklis Leontiadis¹, Melek Önen¹, Refik Molva¹, M.J. Chorley², G.B. Colombo²
CSAR 2013
¹Eurecom - France, ²Cardiff - UK
Privacy vs Utility
Data A1,A2,A3,…An
Data B1,B2,B3,… Bn
Clustering
Similarity
Personality test
Naïve solutions
• Encrypt data with standard crypto
– Renders operations infeasible.
• Data separation
– Vertical separation is not always applicable.
• Anonymizing techniques
– Do not protect individuals' data.
Our Approach
• Combine crypto with data processing
User    Data                    Data analysis
Alice   $A'_1, \dots, A'_n$     $F(A'_1, \dots, A'_n)$
Bob     $B'_1, \dots, B'_n$     $F(B'_1, \dots, B'_n)$
$F(A_1, \dots, A_n) = F(A'_1, \dots, A'_n)$
Data A’1,A’2,A’3,…A’n
Data B’1,B’2,B’3,… B’n
Outline
• Our solution
– Cosine similarity
– Privacy with Geometrical Transformations
• Security Analysis
• Performance Evaluation
– Hierarchical clustering
– Results
• Looking Ahead
Cosine similarity
[Figure: two documents, F1 and F2, are encoded as binary vectors A and B over a dictionary of words w1, w2, …, wn; θ° is the angle between A and B.]
Example documents:
“Next CSAR workshop will be held in Karlsruhe”
“Next CSAR workshop will be held in London”
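The encoding on this slide can be illustrated with a minimal sketch. It assumes a binary bag-of-words encoding over a dictionary built from the two example sentences only; the helper names `bag_of_words` and `cosine_similarity` are illustrative and do not come from the slides.

```python
import math

def bag_of_words(text, dictionary):
    """Encode a sentence as a binary vector: 1 if the dictionary word occurs, else 0."""
    words = set(text.lower().split())
    return [1 if w in words else 0 for w in dictionary]

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

f1 = "Next CSAR workshop will be held in Karlsruhe"
f2 = "Next CSAR workshop will be held in London"

# Assumption: the dictionary is just the union of the words in both sentences.
dictionary = sorted(set(f1.lower().split()) | set(f2.lower().split()))

A = bag_of_words(f1, dictionary)
B = bag_of_words(f2, dictionary)
similarity = cosine_similarity(A, B)
print(similarity)  # 7 shared words out of 8 per sentence gives 7/8 = 0.875
```

The two sentences differ in one word only, so the vectors are nearly parallel and the similarity is close to 1.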
Random Scaling
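• Data encoded as unique vectors in $\mathbb{R}^n$
• $\varphi_r : \mathbb{R}^n \to \mathbb{R}^n$ s.t.:
$\cos(a, b) = \cos(\varphi_{r_1}(a), \varphi_{r_2}(b))$
• Random scaling
– $r \leftarrow \mathbb{R}^n$
– $S(r, A) = r \cdot A = \begin{pmatrix} r & \cdots & 0 \\ \vdots & r & \vdots \\ 0 & \cdots & r \end{pmatrix} \cdot A$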
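A short numerical check of the scaling property may help here. The sketch below assumes each user scales a whole vector by one positive scalar factor (a negative factor would flip the sign of the cosine); the vectors, the factor range, and the helper names are illustrative assumptions, not values from the slides.

```python
import math
import random

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def scale(r, v):
    """S(r, v): multiply every coordinate of v by the same factor r."""
    return [r * x for x in v]

rng = random.Random(0)          # fixed seed so the run is repeatable
a = [1, 1, 0, 1, 1]             # hypothetical data vector of one user
b = [1, 0, 1, 1, 1]             # hypothetical data vector of another user

# Each user draws an independent, positive scaling factor (assumed range).
r1 = rng.uniform(0.5, 10.0)
r2 = rng.uniform(0.5, 10.0)

before = cosine_similarity(a, b)
after = cosine_similarity(scale(r1, a), scale(r2, b))
print(before, after)            # equal up to floating-point rounding
```

The factors cancel because the dot product picks up r1·r2 and so does the product of the norms.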
Vector Rotation
• Rotation by a common angle λ°
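– $R_{\lambda^\circ}(a) = a \cdot \begin{pmatrix} \cos(\lambda^\circ) & \cdots & \sin(\lambda^\circ) \\ \vdots & & \vdots \\ -\sin(\lambda^\circ) & \cdots & \cos(\lambda^\circ) \end{pmatrix}$
• $\varphi_r(a) = R_{\lambda^\circ}(S_r(a))$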
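[Figure: F1 and F2 rotated to F1’ and F2’; the angle θ° between the two vectors is the same before and after rotation.]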
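The combined transformation can be sketched as follows. This is an illustration under stated assumptions: the rotation acts in the plane of the first and last coordinates (one way to read the matrix on the slide), each user has their own positive scaling factor, and the angle λ° is shared. The names `rotate`, `scale`, and `phi`, and all numeric values, are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def scale(r, v):
    """S_r(v): multiply every coordinate by the user's factor r."""
    return [r * x for x in v]

def rotate(v, angle_deg, i=0, j=-1):
    """Row vector v times a rotation acting in the (i, j) coordinate plane."""
    lam = math.radians(angle_deg)
    c, s = math.cos(lam), math.sin(lam)
    out = list(v)
    out[i] = v[i] * c - v[j] * s   # column i of [[c, s], [-s, c]]
    out[j] = v[i] * s + v[j] * c   # column j of [[c, s], [-s, c]]
    return out

def phi(r, angle_deg, v):
    """Scale first, then rotate by the common angle."""
    return rotate(scale(r, v), angle_deg)

a = [1, 1, 0, 1, 1]      # hypothetical data vectors
b = [1, 0, 1, 1, 1]
common_angle = 37.0      # assumed shared rotation angle, in degrees

before = cosine_similarity(a, b)
after = cosine_similarity(phi(2.5, common_angle, a), phi(7.0, common_angle, b))
print(before, after)     # equal up to floating-point rounding
```

The check only holds because both users rotate by the same angle: a rotation preserves dot products and norms, so rotating both vectors identically leaves the angle between them unchanged.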
Our solution
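Dimension reduction
– $A \to A_1, A_2, A_3$
Random Scaling
– $S(r_1, A_1) = r_1 \cdot A_1$
– $S(r_2, A_2) = r_2 \cdot A_2$
– $S(r_3, A_3) = r_3 \cdot A_3$
Rotation
– $R_{\lambda^\circ}(r_1 \cdot A_1) = R_{\lambda^\circ} \cdot r_1 \cdot A_1$
– $R_{\lambda^\circ}(r_2 \cdot A_2) = R_{\lambda^\circ} \cdot r_2 \cdot A_2$
– $R_{\lambda^\circ}(r_3 \cdot A_3) = R_{\lambda^\circ} \cdot r_3 \cdot A_3$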
Security analysis
$V'_1 = R_{\lambda^\circ}\big(S(r_1, d_1, d_2),\ S(r_2, d_3, d_4),\ S(r_3, d_1, d_5)\big)$
• Internal:
– Rotation angle is known.
• External:
– Rotation angle remains unknown.
Security analysis cont’d
Per-user equivalent coefficients are exposed as auxiliary information.
[Figure: sub-vectors scaled by r1, r2, r3 and multiplied by the rotation matrix with entries cos(λ°), sin(λ°), −sin(λ°), cos(λ°); a “?” marks an unknown in the diagram.]
Evaluation
• 173 users willing to run the 4sqPersonality test
• 5-factor personality test
– Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism.
Clustering approach
• Hierarchical Agglomerative clustering (HAC)
– Input: n points and an n × n similarity matrix
– Output: single cluster containing all n points
C = MakeSingletonClusters();
for i = 0 to n:
    Find “closest” clusters c1, c2;
    Merge(c1, c2);
    RecomputeDistances(C);
    if #C = 1 exit();
Agglomerative: O(n^3)    Divisive: O(2^n)
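The pseudocode above can be turned into a small runnable sketch. The slides do not say which linkage criterion was used, so single linkage (the similarity of two clusters is that of their most similar members) is an assumption here, as are the function name `hac` and the toy similarity matrix. The sketch favours readability over speed.

```python
def hac(similarity):
    """Agglomerative clustering over an n x n symmetric similarity matrix.

    Returns the merge history as (cluster_1, cluster_2, linkage_similarity)
    tuples, ending when a single cluster contains all points.
    Assumption: single linkage, which the slides do not specify.
    """
    clusters = [[i] for i in range(len(similarity))]   # singleton clusters
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the "closest" pair, i.e. the pair with the highest linkage similarity.
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                link = max(similarity[p][q]
                           for p in clusters[x] for q in clusters[y])
                if best is None or link > best[0]:
                    best = (link, x, y)
        link, x, y = best
        merges.append((sorted(clusters[x]), sorted(clusters[y]), link))
        merged = clusters[x] + clusters[y]
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return merges

# Toy cosine-similarity matrix for four users (illustrative values only).
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
merges = hac(sim)
for step in merges:
    print(step)
```

Because the clustering only reads the similarity matrix, it gives the same merges on transformed vectors whenever the transformation preserves pairwise cosine similarity.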
Cosine Similarity
Results
Recap
1. Pairwise cosine similarity for multidimensional vectors.
2. Geometrical transformations compatible with cosine similarity.
Looking Ahead
• Other privacy preserving similarity detection algorithms.
• Privacy preserving data analysis algorithms:
– MAX, MIN
Thank you! Iraklis Leontiadis
leontiad@eurecom.fr