Fast, Private and Verifiable: Server-aided Approximate Similarity Computation over Large-Scale Datasets Shuo Qiu, Boyang Wang, Ming Li, Jesse Victors, Jiqiang Liu, Yangfeng Shi, Wei Wang 4th ACM International Workshop on Security in Cloud Computing SCC 2016 Xi’an - China - 2016 SWIM Seminar August 4, 2016 Mateus Cruz


Fast, Private and Verifiable: Server-aided Approximate Similarity Computation over Large-Scale Datasets

Shuo Qiu, Boyang Wang, Ming Li, Jesse Victors, Jiqiang Liu, Yangfeng Shi, Wei Wang

4th ACM International Workshop on Security in Cloud Computing (SCC 2016)

Xi’an, China, 2016

SWIM Seminar, August 4, 2016
Mateus Cruz

Introduction Preliminaries Proposal Experiments Conclusion

OUTLINE

1 Introduction

2 Preliminaries

3 Proposal

4 Experiments

5 Conclusion


OVERVIEW

Use of Jaccard similarity (JS)
Privacy concerns
  • Wants to disclose only the similarity
Previous approaches use MPC [1]
  • High performance overheads

[1] MPC: Multi-party Computation


CONTRIBUTIONS

Protocol 1
  • Assumes semi-honest server
  • Uses MinHash and deterministic encryption
  • Only leaks Jaccard similarity

Protocol 2
  • Uses Protocol 1
  • Verifies whether the server is malicious


JACCARD SIMILARITY (JS)

Measures the similarity between sets A and B

$JS(A,B) = \frac{|A \cap B|}{|A \cup B|}$

Example
A = {1,2,4}
B = {2,4,8,9}

$JS(A,B) = \frac{|A \cap B|}{|A \cup B|} = \frac{2}{5}$
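The definition on this slide can be checked directly on plaintext sets. The sketch below is only an illustration of the formula, not part of the protocols; the function name `jaccard` and the convention for two empty sets are assumptions made here.

```python
from fractions import Fraction


def jaccard(a: set, b: set) -> Fraction:
    """Exact Jaccard similarity |A ∩ B| / |A ∪ B|, kept as a fraction."""
    union = a | b
    if not union:
        # Assumed convention: two empty sets are treated as identical.
        return Fraction(1)
    return Fraction(len(a & b), len(union))


# The example from the slide: intersection {2, 4}, union {1, 2, 4, 8, 9}.
A = {1, 2, 4}
B = {2, 4, 8, 9}
similarity = jaccard(A, B)  # 2/5
```

Using `Fraction` keeps the result exact, which makes it easy to compare against the hand-computed value.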


MINHASH

Approximation of Jaccard similarity
Calculates k hash functions: $\{h_1, \dots, h_k\}$
Uses the minimum hash value: $\min\{h_i(A)\}$
Generates signatures from sets
  • Compact representations of sets
  • Signature of A: $h^{(k)}(A) = \{\min\{h_i(A)\}\}_{i=1}^{k}$
    – Length k

$JS(A,B) \approx \frac{|h^{(k)}(A) \cap h^{(k)}(B)|}{k}$
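A minimal sketch of MinHash under stated assumptions: the k hash functions are simulated by salting BLAKE2b with the index i (the slides do not say which hash family is used), sets are assumed non-empty, and signatures are compared slot by slot, which is one common way to read the intersection in the formula above. The names `h`, `signature` and `estimate_js` are illustrative.

```python
import hashlib


def h(i: int, x) -> int:
    """i-th hash function, simulated here by salting BLAKE2b with i."""
    digest = hashlib.blake2b(
        repr(x).encode(), digest_size=8, salt=i.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "big")


def signature(s: set, k: int) -> list:
    """MinHash signature: minimum of each of the k hash functions over s."""
    return [min(h(i, x) for x in s) for i in range(k)]


def estimate_js(sig_a: list, sig_b: list) -> float:
    """Fraction of signature slots on which the two signatures agree."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


A = set(range(0, 150))
B = set(range(50, 200))
k = 256
approx = estimate_js(signature(A, k), signature(B, k))
exact = len(A & B) / len(A | B)  # 100 / 200
```

The estimate fluctuates around the exact value; larger k narrows the spread at the cost of longer signatures.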


DETERMINISTIC ENCRYPTION

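Same ciphertext for the same message
  • $m_1 = m_2 \rightarrow \mathrm{Enc}(m_1) = \mathrm{Enc}(m_2)$
Allows equality checks
Algorithms
  • $sk \leftarrow \mathrm{KeyGen}(1^{\lambda})$
    – Security parameter $\lambda$, secret key $sk$
  • $c \leftarrow \mathrm{Enc}(sk, m)$
    – Message $m$, ciphertext $c$
  • $m \leftarrow \mathrm{Dec}(sk, c)$
  • $\mathrm{Dec}(sk, \mathrm{Enc}(sk, m)) = m$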
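The three algorithms can be illustrated with a toy construction. The sketch below is an assumption-laden stand-in, not the cipher used in the talk: it derives a synthetic IV from the message with HMAC-SHA256 and uses an HMAC-based keystream, so equal messages always give equal ciphertexts. The names `keygen`, `enc`, `dec` and the internal labels are hypothetical, and the code is for illustration only, not for production use.

```python
import hashlib
import hmac
import secrets


def keygen(security_bytes: int = 32) -> bytes:
    """KeyGen: sample a random secret key (length plays the role of lambda)."""
    return secrets.token_bytes(security_bytes)


def _prf(sk: bytes, label: bytes, data: bytes) -> bytes:
    return hmac.new(sk, label + data, hashlib.sha256).digest()


def _keystream(sk: bytes, iv: bytes, length: int) -> bytes:
    out, counter = b"", 0
    while len(out) < length:
        out += _prf(sk, b"stream", iv + counter.to_bytes(4, "big"))
        counter += 1
    return out[:length]


def enc(sk: bytes, m: bytes) -> bytes:
    """Enc: the IV depends only on (sk, m), so encryption is deterministic."""
    iv = _prf(sk, b"siv", m)[:16]
    body = bytes(a ^ b for a, b in zip(m, _keystream(sk, iv, len(m))))
    return iv + body


def dec(sk: bytes, c: bytes) -> bytes:
    """Dec: recover m and verify that the IV matches the recovered message."""
    iv, body = c[:16], c[16:]
    m = bytes(a ^ b for a, b in zip(body, _keystream(sk, iv, len(body))))
    if not hmac.compare_digest(iv, _prf(sk, b"siv", m)[:16]):
        raise ValueError("invalid ciphertext")
    return m


sk = keygen()
c1 = enc(sk, b"minhash-value")
c2 = enc(sk, b"minhash-value")  # identical to c1, so equality is testable
```

Determinism is exactly what lets a third party compare ciphertexts for equality without the key, and it is also why equal plaintexts are revealed as equal.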


ADVERSARY MODEL

Semi-honest adversary (Protocol 1)
  • Follows the protocol
  • Tries to learn from the data
Malicious adversary (Protocol 2)
  • May not execute the protocol correctly
    – Returns a random similarity (Case I)
    – Returns a partial result (Case II)
    – Returns a false approximation (Case III)
No collusion between parties


PROBLEM DEFINITION

Calculate similarity between sets
  • Using Jaccard similarity
  • Alice has set A
  • Bob has set B
  • Compute similarity on remote server
Security requirements
  • Alice, Bob and the server only learn JS(A,B)
  • Alice does not learn |B|
  • Bob does not learn |A|
  • The server does not learn |A|, |B|, |A∩B|


PROTOCOL 1 (SEMI-HONEST SERVER)

Each client...
1 Computes MinHash signatures
    – Using k shared hash functions
2 Encrypts signatures
    – Using deterministic encryption
    – Secret key shared between Alice and Bob
    – Allows equality checks between ciphertexts
3 Sends ciphertexts to the server

The server...
4 Calculates JS(A,B)
    – By comparing encrypted signatures
5 Returns JS(A,B) to clients
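The five steps can be simulated in a few lines. This is a sketch under several assumptions that the slides do not state: hash functions are simulated with salted BLAKE2b, the deterministic encryption is replaced by HMAC-SHA256 tags (deterministic, but not decryptable), and the slot index is bound into each tag so that a plain set intersection on the server counts matching slots. All function names are illustrative.

```python
import hashlib
import hmac
import secrets


def h(i: int, x) -> int:
    """i-th shared hash function (assumed: BLAKE2b salted with i)."""
    d = hashlib.blake2b(repr(x).encode(), digest_size=8,
                        salt=i.to_bytes(8, "little")).digest()
    return int.from_bytes(d, "big")


def signature(s: set, k: int) -> list:
    """Step 1: MinHash signature of a non-empty set."""
    return [min(h(i, x) for x in s) for i in range(k)]


def encrypt_signature(sk: bytes, sig: list) -> set:
    """Step 2: one deterministic tag per slot (stand-in for DE.Enc)."""
    return {hmac.new(sk, f"{i}:{v}".encode(), hashlib.sha256).digest()
            for i, v in enumerate(sig)}


def server_similarity(t_a: set, t_b: set, k: int) -> float:
    """Step 4: the server only sees tags and counts the equal ones."""
    return len(t_a & t_b) / k


k = 128
sk = secrets.token_bytes(32)          # shared by Alice and Bob only
A = set(range(0, 150))                # Alice's set
B = set(range(50, 200))               # Bob's set
T_A = encrypt_signature(sk, signature(A, k))   # step 3: sent to the server
T_B = encrypt_signature(sk, signature(B, k))
sigma = server_similarity(T_A, T_B, k)         # step 5: returned to clients
```

The server never holds `sk`, so it can test tags for equality but cannot map them back to hash values or set elements.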


PROTOCOL 2 (MALICIOUS SERVER)

Two-round consistency check
Round 1
  • Calculate JS(A,B)
Round 2
  • Calculate $JS(D_A, D_B)$
    – $D_A = A \cup S_0 \cup S_1$
    – $D_B = B \cup S_0 \cup S_2$
    – $S_0, S_1, S_2$ are disjoint dummy sets
  • Check JS(A,B) and $JS(D_A, D_B)$
    – Find out whether the server is really malicious
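The effect of the dummy sets on the exact similarity can be seen on a small plaintext example. The sketch below assumes concrete sizes (n = 100, t = 20, 20 shared elements) and draws dummies from negative integers so that they cannot collide with the real elements; these choices are illustrative and not taken from the talk.

```python
from fractions import Fraction


def jaccard(a: set, b: set) -> Fraction:
    """Exact Jaccard similarity of two non-empty sets."""
    return Fraction(len(a & b), len(a | b))


n, t, overlap = 100, 20, 20
A = set(range(0, n))                           # |A| = n
B = set(range(n - overlap, 2 * n - overlap))   # |B| = n, |A ∩ B| = overlap

# Pairwise disjoint dummy sets of size t, disjoint from A and B.
S0 = {-(i + 1) for i in range(0, t)}
S1 = {-(i + 1) for i in range(t, 2 * t)}
S2 = {-(i + 1) for i in range(2 * t, 3 * t)}

D_A = A | S0 | S1      # Alice's padded set
D_B = B | S0 | S2      # Bob's padded set

sigma = jaccard(A, B)          # 20 / 180 = 1/9
sigma_d = jaccard(D_A, D_B)    # 40 / 240 = 1/6
```

Only S0 is shared, so the intersection grows by t while the union grows by 3t, which is what ties the second-round result to the first.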


ADDITIONAL NOTATION

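$|A| = |B| = n$ and $|S_0| = |S_1| = |S_2| = t$
$\varepsilon$: Approximation bias
  • $\varepsilon = \frac{1}{\sqrt{k}}$, where k is the number of hash functions
$\sigma$: Real similarity between A and B
  • $\sigma = \frac{|A \cap B|}{2n - |A \cap B|}$
$\sigma_d$: Real similarity between $D_A$ and $D_B$
  • $\sigma_d = \frac{|A \cap B| + t}{2n - |A \cap B| + 3t}$
$\sigma_1$: Approx. similarity between A and B
  • $\sigma_1 \in [\sigma - \varepsilon, \sigma + \varepsilon]$
$\sigma_2$: Approx. similarity between $D_A$ and $D_B$
  • $\sigma_2 \in [\sigma_d - \varepsilon, \sigma_d + \varepsilon]$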
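The two real similarities are linked through $|A \cap B|$. As a worked step (written out here for clarity, with $I$ introduced as shorthand for $|A \cap B|$), eliminating $I$ expresses $\sigma_d$ purely in terms of $\sigma$, $n$ and $t$:

```latex
\begin{align*}
\sigma &= \frac{I}{2n - I}
  \;\Longrightarrow\; I = \frac{2n\sigma}{1 + \sigma} \\[4pt]
\sigma_d &= \frac{I + t}{2n - I + 3t}
  = \frac{2n\sigma + t(1 + \sigma)}{2n(1 + \sigma) - 2n\sigma + 3t(1 + \sigma)}
  = \frac{(2n + t)\sigma + t}{3t\sigma + 2n + 3t}
\end{align*}
```

For example, with $n = 100$, $t = 20$ and $I = 20$, the definitions give $\sigma = 20/180 = 1/9$ and $\sigma_d = 40/240 = 1/6$, and the closed form yields the same value: $(220/9 + 20)/(60/9 + 260) = 1/6$.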


CONSISTENCY CHECK

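Can detect malicious servers
Apply a map $f : \sigma \mapsto \sigma_d$
  • $\sigma_d = f(\sigma) = \frac{(2n + t)\sigma + t}{3t\sigma + 2n + 3t}$
Given $\sigma_1$ and $\sigma_2$, Alice...
  • Outputs 1 if $\sigma_2 \in [f(\sigma_1 - \varepsilon) - \varepsilon,\; f(\sigma_1 + \varepsilon) + \varepsilon]$
  • Outputs 0 otherwise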
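The check itself is a short interval test. The sketch below assumes ε = 1/√k as the approximation bias and uses illustrative parameters (n = 100, t = 20, k = 400, so ε = 0.05); the function names and the sample inputs are hypothetical.

```python
import math


def f(x: float, n: int, t: int) -> float:
    """Map from the similarity of (A, B) to the similarity of (D_A, D_B)."""
    return ((2 * n + t) * x + t) / (3 * t * x + 2 * n + 3 * t)


def consistency_check(sigma1: float, sigma2: float,
                      n: int, t: int, k: int) -> int:
    """Return 1 if sigma2 is consistent with sigma1, and 0 otherwise."""
    eps = 1 / math.sqrt(k)
    low = f(sigma1 - eps, n, t) - eps
    high = f(sigma1 + eps, n, t) + eps
    return 1 if low <= sigma2 <= high else 0


n, t, k = 100, 20, 400
# Consistent pair: the exact values for |A ∩ B| = 20 are 1/9 and 1/6.
honest = consistency_check(1 / 9, 1 / 6, n, t, k)
# A server answering the second round with an unrelated value (Case I style).
cheating = consistency_check(1 / 9, 0.9, n, t, k)
```

Because f is increasing in its argument for positive n and t, evaluating it at σ1 − ε and σ1 + ε gives the two ends of the accepted range before the extra ±ε slack is added.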


ACCURACY OF CONSISTENCY CHECK

Evaluates whether the check works
False positives
  • Honest server, but check says it is malicious
False negatives
  • Malicious server, but check says it is honest


SETUP

Hardware
  • Client
    – Windows Server 7 with 8 vCPUs
    – 14 GB RAM
  • Server
    – Windows Server 2012 with 8 vCPUs
    – 12 GB RAM
Software
  • C++
  • Crypto++ library
  • AES-ECB cryptosystem


EFFICIENCY

Pipeline mode
  • Single thread
Parallel mode
  • Multiple threads
  • Calculate signatures concurrently
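One possible reading of the parallel mode is that the k signature slots are independent and can be computed by separate workers. The sketch below shows only that decomposition, with a thread pool and a salted BLAKE2b hash as assumed stand-ins; it is not the C++ implementation from the experiments, and in CPython threads are not expected to reproduce the reported speedup for this CPU-bound work.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def h(i: int, x) -> int:
    """i-th hash function (assumed: BLAKE2b salted with i)."""
    d = hashlib.blake2b(repr(x).encode(), digest_size=8,
                        salt=i.to_bytes(8, "little")).digest()
    return int.from_bytes(d, "big")


def slot_min(i: int, s: set) -> int:
    """One signature slot: the minimum of h_i over the set."""
    return min(h(i, x) for x in s)


def signature_pipeline(s: set, k: int) -> list:
    """Single thread: slots are computed one after another."""
    return [slot_min(i, s) for i in range(k)]


def signature_parallel(s: set, k: int, workers: int = 4) -> list:
    """Multiple threads: each slot is an independent task."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the slot indices.
        return list(pool.map(lambda i: slot_min(i, s), range(k)))


A = set(range(1000))
sig_seq = signature_pipeline(A, 64)
sig_par = signature_parallel(A, 64)
```

Both modes produce the same signature; only the scheduling of the per-slot work differs.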


VERIFIABILITY

False Positive Rate (FPR)
False Negative Rate (FNR)


CONCLUSION

Secure and scalable similarity computation
  • Using MinHash and deterministic encryption
Benefits from parallel execution
  • Speedups of about 5 times
Detection of malicious server
  • Can have false positives and false negatives


Detailed Protocols

EXTRA SLIDES


PROTOCOL 1: SETUP

DE = {KeyGen, Enc, Dec}
  • Secret key $sk \leftarrow \mathrm{DE.KeyGen}(1^{\lambda})$
  • $sk$ shared between Alice and Bob
Alice has input A, and Bob has input B
  • $|A| = |B| = n$
k random hash functions $\{h_1, \dots, h_k\}$


PROTOCOL 1: STEPS

1 Alice (Bob) computes signatures of A (B)
  • $h^{(k)}(A) = \{\min\{h_i(A)\}\}_{i=1}^{k}$
  • $h^{(k)}(B) = \{\min\{h_i(B)\}\}_{i=1}^{k}$
2 Alice (Bob) calculates ciphertexts
  • $T_A \leftarrow \mathrm{DE.Enc}(sk, h^{(k)}(A))$
  • $T_B \leftarrow \mathrm{DE.Enc}(sk, h^{(k)}(B))$
3 Alice (Bob) sends $T_A$ ($T_B$) to the server
4 The server computes the similarity $\sigma$
  • $\sigma = \frac{|T_A \cap T_B|}{k}$
5 The server returns $\sigma$ to both clients


PROTOCOL 2: SETUP

DE = {KeyGen, Enc, Dec}
  • Secret key $sk \leftarrow \mathrm{DE.KeyGen}(1^{\lambda})$
  • $sk$ shared between Alice and Bob
Alice has input A, and Bob has input B
  • $|A| = |B| = n$
  • $A, B \subseteq D \subseteq E$
    – E is the whole data space
k random hash functions $\{h_1, \dots, h_k\}$


PROTOCOL 2: STEPS

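1 Alice chooses dummy sets $S_0, S_1, S_2$
  • $S_0, S_1, S_2 \subseteq D' \subseteq E$
  • $D \cap D' = \emptyset$
    – $A, B \subseteq D$
  • $S_0 \cap S_1 \cap S_2 = \emptyset$
  • $|S_0| = |S_1| = |S_2| = t$
    – $|A| = |B| = n$
2 Alice and Bob obtain $JS(A,B) = \frac{|T_A \cap T_B|}{k}$
  • Following Protocol 1
3 Alice (Bob) generates $D_A$ ($D_B$)
  • $D_A = A \cup S_0 \cup S_1$
  • $D_B = B \cup S_0 \cup S_2$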


PROTOCOL 2: STEPS

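4 Alice and Bob obtain $JS(D_A, D_B) = \frac{|T_{D_A} \cap T_{D_B}|}{k}$
  • Using Protocol 1
5 Given $\sigma_1 = JS(A,B)$ and $\sigma_2 = JS(D_A, D_B)$
  • Output 1 if $\sigma_2 \in [f(\sigma_1 - \varepsilon) - \varepsilon,\; f(\sigma_1 + \varepsilon) + \varepsilon]$
    – $\varepsilon = \frac{1}{\sqrt{k}}$
    – $f(x) = \frac{(2n + t)x + t}{3tx + 2n + 3t}$
  • Output 0 otherwise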