Private Keyword Search on Streaming Data
Rafail Ostrovsky, William Skeith (UCLA)
(patent pending)
http://www.cs.ucla.edu/~rafail/



Page 2: Motivating Example

- The intelligence community collects data from multiple sources that might potentially be “useful” for future analysis:
  - Network traffic
  - Chat rooms
  - Web sites, etc.
- However, what is “useful” is often classified.

Page 3: Current Practice

- Continuously transfer all data to a secure environment.
- After the data is transferred, filter it in the classified environment and keep only a small fraction of the documents.

Page 4: [Figure: current practice]

Three data streams flow into the Classified Environment, which contains the Filter and Storage:

  ... → D(1,3) → D(1,2) → D(1,1) →
  ... → D(2,3) → D(2,2) → D(2,1) →
  ... → D(3,3) → D(3,2) → D(3,1) →

Filter rules are written by an analyst and are classified!

Page 5: Current Practice

- Drawbacks:
  - Communication
  - Processing

Page 6: How to improve performance?

- Distribute work to many locations on a network
- Seemingly ideal solution, but...
- Major problem: it is not clear how to maintain privacy, which is the focus of this talk

Page 7: [Figure: distributed filtering]

Each stream has its own Filter and Storage outside the Classified Environment:

  ... → D(1,3) → D(1,2) → D(1,1) → Filter → Storage: E(D(1,2)), E(D(1,3))
  ... → D(2,3) → D(2,2) → D(2,1) → Filter → Storage: E(D(2,2))
  ... → D(3,3) → D(3,2) → D(3,1) → Filter → Storage

Classified Environment: Decrypt → Storage: D(1,2), D(1,3), D(2,2)

Page 8: Example Filter

- Look for all documents that contain special classified keywords, selected by an analyst
  - Perhaps an alias of a dangerous criminal
- Privacy
  - Must hide what words are used to create the filter
  - Output must be encrypted

Page 9: More generally:

- We define the notion of Public Key Program Obfuscation
  - Encrypted version of a program
  - Performs same functionality as un-obfuscated program, but:
    - Produces encrypted output
    - Impossible to reverse engineer
- A little more formally:

Page 10: Public Key Program Obfuscation

Page 11: Privacy

Page 12: Related Notions

- PIR (Private Information Retrieval) [CGKS], [KO], [CMS], ...
- Keyword PIR [KO], [CGN], [FIPR]
- Program Obfuscation [BGIRSVY], ...
  - Here output is identical to un-obfuscated program, but in our case it is encrypted.
- Public Key Program Obfuscation
  - A more general notion than PIR, with lots of applications

Page 13: What we want

  ... → D(1,3) → D(1,2) → D(1,1) → Filter → Storage

Page 14: [Figure: a mix of matching and non-matching documents]

This is matching document #2

This is a Non-matching document

This is matching document #1

This is matching document #3

This is a Non-matching document

This is a Non-matching document

Page 15: How to accomplish this?

Page 16: Several Solutions Based on Homomorphic Encryption

- For this talk: Paillier Encryption
- Properties:
  - Plaintext set = $\mathbb{Z}_n$
  - Ciphertext set = $\mathbb{Z}^*_{n^2}$
  - Homomorphic, i.e., $E(x)\,E(y) = E(x+y)$
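
The homomorphic property listed above can be checked numerically. What follows is a minimal sketch of textbook Paillier, not code from the talk: the toy primes 101 and 113, the common choice of generator n + 1, and the function names `keygen`, `encrypt`, and `decrypt` are all assumptions made for illustration, and keys this small offer no security.

```python
import math
import random

def keygen(p, q):
    """Toy Paillier key generation, assuming the generator g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse of lam mod n (Python 3.8+)
    return n, (lam, mu)

def encrypt(n, m, rng=random):
    """E(m) = (1 + n)^m * r^n mod n^2 for a random r coprime to n."""
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return pow(1 + n, m, n2) * pow(r, n, n2) % n2

def decrypt(n, priv, c):
    """Recover m from c using the private pair (lam, mu)."""
    lam, mu = priv
    u = pow(c, lam, n * n)        # equals 1 + (m * lam mod n) * n
    return (u - 1) // n * mu % n

n, priv = keygen(101, 113)        # toy primes, far too small for real use
x, y = 1234, 4321
cx, cy = encrypt(n, x), encrypt(n, y)

# Multiplying ciphertexts adds plaintexts: E(x) * E(y) decrypts to x + y.
total = decrypt(n, priv, cx * cy % (n * n))
# Raising a ciphertext to a power scales the plaintext: E(x)^5 decrypts to 5 * x.
scaled = decrypt(n, priv, pow(cx, 5, n * n))
print(total, scaled)
```

The second identity, E(x)^k = E(kx), follows from the first by repeated multiplication.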

Page 17: Simplifying Assumptions for this Talk

- All keywords come from some poly-size dictionary
- Truncate documents beyond a certain length

Page 18: [Figure: the filter applied to a document D]

Dictionary (each word is paired with an encrypted bit):

  w_1      E(0)
  w_2      E(1)
  w_3      E(0)
  w_4      E(0)
  w_5      E(1)
  ...
  w_{t-2}  E(1)
  w_{t-1}  E(0)
  w_t      E(0)

Document D yields the pair (g, g^D), which is multiplied ("*=") into three entries of the Output Buffer.

Output Buffer (ten entries, each shown as E(0)):

  E(0) E(0) E(0) E(0) E(0) E(0) E(0) E(0) E(0) E(0)
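
The figure's data flow can be sketched in a few lines. This is a hedged illustration under stated assumptions, not the authors' implementation: it assumes g is the product of the dictionary ciphertexts of the words in D (so g encrypts the number of keywords hit), that each buffer slot stores a pair of ciphertexts, and that the slots are chosen uniformly at random. The toy Paillier key, the integer payloads standing in for documents, and the names `make_filter`, `process`, and `recover` are hypothetical.

```python
import math
import random

P, Q = 2**31 - 1, 2**61 - 1        # two Mersenne primes; a toy key, not secure
N, N2 = P * Q, (P * Q) ** 2
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)               # private; held only by the analyst

def enc(m, rng):
    r = rng.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = rng.randrange(1, N)
    return pow(1 + N, m, N2) * pow(r, N, N2) % N2

def dec(c):
    return (pow(c, LAM, N2) - 1) // N * MU % N

def make_filter(dictionary, keywords, rng):
    """Public filter program: every dictionary word maps to E(1) or E(0)."""
    return {w: enc(1 if w in keywords else 0, rng) for w in dictionary}

def process(program, buffer, words, payload, copies, rng):
    """Fold one document into the buffer without learning whether it matched."""
    g = enc(0, rng)
    for w in set(words):
        if w in program:
            g = g * program[w] % N2            # g = E(number of keywords hit)
    g_d = pow(g, payload, N2)                  # g^D = E(count * D)
    for i in rng.sample(range(len(buffer)), copies):
        a, b = buffer[i]
        buffer[i] = (a * g % N2, b * g_d % N2) # the "*=" step

def recover(buffer):
    """Decrypt each slot as (count, count * D) and divide out the count."""
    found = set()
    for a, b in buffer:
        count = dec(a)
        if count:
            found.add(dec(b) * pow(count, -1, N) % N)
    return found

rng = random.Random(0)
program = make_filter(["alpha", "bravo", "charlie", "delta"], {"bravo"}, rng)
buffer = [(enc(0, rng), enc(0, rng)) for _ in range(10)]
stream = [(["alpha", "delta"], 111), (["bravo", "alpha"], 222), (["charlie"], 333)]
for words, payload in stream:
    process(program, buffer, words, payload, copies=3, rng=rng)
recovered = recover(buffer)
print(recovered)  # only payload 222 contains the keyword
```

Non-matching documents multiply encryptions of 0 into the buffer, so they leave the plaintexts unchanged. When two matching documents land in the same slot, the slot decrypts to a sum, and this sketch would return a meaningless value for it.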

Page 19: [Figure: output buffer holding matching documents, with another matching document arriving]

This is matching document #1

This is matching document #3

This is matching document #2

Here’s another matching document

Collisions cause two problems:

1. Good documents are destroyed

2. Non-existent documents could be fabricated

Page 20: We’ll make use of two combinatorial lemmas...

Page 21:

Page 22: How to detect collisions?

- Append a highly structured (yet random) k-bit string to the message
- The probability that the sum of two or more such strings is another such string is negligible in k
- Specifically, partition the k bits into triples of bits, and set exactly one bit from each triple to 1

Page 23: [Example: three valid strings and their bitwise sum]

  100|001|100|010|010|100|001|010|010
  010|001|010|001|100|001|100|001|010
  010|100|100|100|010|001|010|001|010
=
  100|100|010|111|100|100|111|010|010

The two 111 triples do not have exactly one bit set, which exposes the collision.
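
The collision check can be reproduced on the slide's own numbers. The sketch below assumes the strings are combined by bitwise addition mod 2 (XOR), which is what the example on the slide is consistent with; the helper names `random_tag`, `combine`, `is_valid`, `parse`, and `show` are hypothetical.

```python
import random

def random_tag(k, rng=random):
    """Return a k-bit tag (k divisible by 3) as a list of one-hot triples."""
    assert k % 3 == 0
    return [1 << rng.randrange(3) for _ in range(k // 3)]

def combine(tags):
    """Bitwise sum mod 2 of several tags, triple by triple (assumed operation)."""
    out = [0] * len(tags[0])
    for tag in tags:
        out = [a ^ b for a, b in zip(out, tag)]
    return out

def is_valid(tag):
    """A tag is well formed when every triple has exactly one bit set."""
    return all(t in (1, 2, 4) for t in tag)

def parse(text):
    return [int(triple, 2) for triple in text.split("|")]

def show(tag):
    return "|".join(format(t, "03b") for t in tag)

# The three strings from the slide.
slide = [
    parse("100|001|100|010|010|100|001|010|010"),
    parse("010|001|010|001|100|001|100|001|010"),
    parse("010|100|100|100|010|001|010|001|010"),
]
collided = combine(slide)
print(show(collided), is_valid(collided))
```

Under this assumption, combining exactly two valid tags always fails the check, because each triple then has either zero or two bits set.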

Page 24: Detecting Overflow > m

- Double buffer size from m to 2m
- If m < #documents < 2m, output “overflow”
- If #documents > 2m, then the expected number of collisions is large, thus output “overflow” in this case as well.

Not yet in eprint version, will appear soon, as well as some other extensions.

Page 25: More from the paper that we don’t have time to discuss...

- Reducing program size below dictionary size (using Φ-Hiding from [CMS])
- Queries containing AND (using [BGN] machinery)
- Eliminating negligible error (using perfect hashing)
- Scheme based on arbitrary homomorphic encryption

Page 26: Conclusions

- Private searching on streaming data
- Public key program obfuscation, more general than PIR
- Practical, efficient protocols
- Many open problems

Page 27: Thanks For Listening!