http://www.cs.ucla.edu/~rafail/
Private Keyword Search on Streaming Data
Rafail Ostrovsky, William Skeith (UCLA)
(patent pending)
Motivating Example
The intelligence community collects data from multiple sources that might potentially be “useful” for future analysis.
  Network traffic
  Chat rooms
  Web sites, etc…
However, what is “useful” is often classified.
Current Practice
Continuously transfer all data to a secure environment.
After the data is transferred, filter it in the classified environment and keep only a small fraction of the documents.
[Figure: three document streams (... → D(1,3) → D(1,2) → D(1,1); ... → D(2,3) → D(2,2) → D(2,1); ... → D(3,3) → D(3,2) → D(3,1)) are all transferred into the Classified Environment, which contains the Filter and Storage.]
Filter rules are written by an analyst and are classified!
Current Practice
Drawbacks:
  Communication
  Processing
How to improve performance?
  Distribute work to many locations on a network
  Seemingly ideal solution, but…
  Major problem: not clear how to maintain privacy, which is the focus of this talk
[Figure: each of the three streams (... → D(i,3) → D(i,2) → D(i,1)) passes through its own Filter and Storage outside the Classified Environment. The storage units hold the encrypted matches E(D(1,2)), E(D(1,3)) and E(D(2,2)). Inside the Classified Environment, a Decrypt step places D(1,2), D(1,3) and D(2,2) in Storage.]
Example Filter:
  Look for all documents that contain special classified keywords, selected by an analyst
  Perhaps an alias of a dangerous criminal
Privacy
  Must hide what words are used to create the filter
  Output must be encrypted
More generally:
We define the notion of Public Key Program Obfuscation
  Encrypted version of a program
  Performs the same functionality as the un-obfuscated program, but:
    Produces encrypted output
    Impossible to reverse engineer
A little more formally:A little more formally:
Public Key Program ObfuscationPublic Key Program Obfuscation
PrivacyPrivacy
Related Notions
PIR (Private Information Retrieval) [CGKS], [KO], [CMS]…
Keyword PIR [KO], [CGN], [FIPR]
Program Obfuscation [BGIRSVY]…
  Here output is identical to the un-obfuscated program, but in our case it is encrypted.
Public Key Program Obfuscation
  A more general notion than PIR, with lots of applications
What we want
[Figure: the stream ... → D(1,3) → D(1,2) → D(1,1) passes through the Filter into Storage. Sample documents shown: "This is matching document #1", "This is matching document #2", "This is matching document #3", and three copies of "This is a Non-matching document".]
How to accomplish this?
Several solutions based on homomorphic encryption
For this talk: Paillier encryption
Properties:
  Plaintext set = Z_n
  Ciphertext set = Z^*_{n^2}
  Homomorphic, i.e., E(x)E(y) = E(x+y)
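The slide states the homomorphic property without giving the encryption function. As a sketch, assuming the standard Paillier definition with modulus n, generator g = 1 + n (a common choice, not stated on the slide) and fresh randomness r, the property and its exponentiation corollary follow directly. The symbols g, r and k here are introduced for illustration only, and this g is unrelated to the g in the later filter figure.

```latex
\begin{align*}
E(x; r) &= g^{x} r^{n} \bmod n^{2},
  \qquad x \in \mathbb{Z}_n,\ r \in \mathbb{Z}_n^{*},\ g = 1 + n \\
E(x; r_1)\, E(y; r_2) &= g^{x+y} (r_1 r_2)^{n} \bmod n^{2}
  = E\bigl((x + y) \bmod n;\ r_1 r_2\bigr) \\
E(x; r)^{k} &= g^{kx} (r^{k})^{n} \bmod n^{2}
  = E\bigl(kx \bmod n;\ r^{k}\bigr)
\end{align*}
```

The last line is the useful one for filtering: raising an encryption of a match count to the power of a document value yields an encryption of their product, without decrypting anything.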
Simplifying Assumptions for this Talk
All keywords come from some poly-size dictionary
Truncate documents beyond a certain length
[Figure: the filter is a Dictionary table pairing each word with a ciphertext: w_1 E(0), w_2 E(1), w_3 E(0), w_4 E(0), w_5 E(1), ..., w_{t-2} E(1), w_{t-1} E(0), w_t E(0). A document D is reduced to the pair (g, g^D), which is multiplied (*=) into three positions of the Output Buffer, whose entries all start as E(0).]
[Figure: the Output Buffer after decryption holds "This is matching document #1", "This is matching document #3" and "This is matching document #2"; a further document, "Here’s another matching document", arrives.]
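The dictionary and output-buffer construction in the figure can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a toy Paillier instance with tiny primes (1009 and 1013, chosen here), the non-cryptographic `random` module, documents encoded as integers below n, and hypothetical function names (`make_filter`, `process`, `recover`). The caller supplies the buffer slots, and collision handling is left out entirely.

```python
import math
import random

# Toy Paillier parameters (assumption: tiny primes, for illustration only).
P, Q = 1009, 1013
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # valid because the generator is N + 1

def enc(m):
    """Paillier encryption of m with generator N + 1."""
    while True:
        r = random.randrange(1, N)  # not cryptographically secure
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2

def dec(c):
    """Paillier decryption using the private values LAM and MU."""
    return ((pow(c, LAM, N2) - 1) // N * MU) % N

def make_filter(dictionary, keywords):
    """Encrypted filter: E(1) for keywords, E(0) for every other word."""
    return {w: enc(1 if w in keywords else 0) for w in dictionary}

def new_buffer(size):
    """Output buffer whose entries all start as (E(0), E(0))."""
    return [(enc(0), enc(0)) for _ in range(size)]

def process(filt, buffer, words, doc_value, slots):
    """Fold one document into the buffer without learning whether it matched."""
    g = enc(0)
    for w in set(words):
        if w in filt:
            g = (g * filt[w]) % N2      # g = E(number of keywords present)
    g_d = pow(g, doc_value, N2)         # g^D = E(count * D)
    for s in slots:
        a, b = buffer[s]
        buffer[s] = ((a * g) % N2, (b * g_d) % N2)

def recover(buffer):
    """Decrypt the buffer; a slot with count c and value c*D yields D."""
    found = set()
    for a, b in buffer:
        c = dec(a)
        if c != 0 and math.gcd(c, N) == 1:
            found.add((dec(b) * pow(c, -1, N)) % N)
    return found
```

A non-matching document produces g = E(0), so multiplying it into a slot leaves the underlying plaintexts unchanged. If two matching documents land in the same slot, `recover` returns a meaningless value for that slot, which is the problem the next slides address.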
Collisions cause two problems:
1. Good documents are destroyed
2. Non-existent documents could be fabricated
We’ll make use of two combinatorial lemmas…
How to detect collisions?
Append a highly structured (yet random) k-bit string to the message
The sum of two or more such strings will be another such string with negligible probability in k
Specifically, partition the k bits into triples of bits, and set exactly one bit from each triple to 1
    100|001|100|010|010|100|001|010|010
    010|001|010|001|100|001|100|001|010
    010|100|100|100|010|001|010|001|010
  = 100|100|010|111|100|100|111|010|010
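The worked example on this slide can be checked with a short Python sketch. It assumes the "sum" is taken bitwise modulo 2 (XOR), which is what the slide's three rows and result are consistent with; the function names are hypothetical.

```python
import random

def random_tag(k, rng=random):
    """k-bit tag (k divisible by 3) with exactly one bit set in each triple."""
    assert k % 3 == 0
    tag = []
    for _ in range(k // 3):
        triple = [0, 0, 0]
        triple[rng.randrange(3)] = 1
        tag.extend(triple)
    return tag

def xor_sum(tags):
    """Bitwise sum modulo 2 of equal-length tags."""
    out = [0] * len(tags[0])
    for t in tags:
        out = [a ^ b for a, b in zip(out, t)]
    return out

def is_valid(tag):
    """True if every triple contains exactly one 1 bit."""
    return all(sum(tag[i:i + 3]) == 1 for i in range(0, len(tag), 3))

def parse(s):
    """Read a tag written as on the slide, e.g. '100|001|010'."""
    return [int(ch) for ch in s if ch in "01"]

rows = [
    "100|001|100|010|010|100|001|010|010",
    "010|001|010|001|100|001|100|001|010",
    "010|100|100|100|010|001|010|001|010",
]
combined = xor_sum([parse(r) for r in rows])
print(is_valid(combined))  # the 111 triples make the combined tag invalid
```

Each individual row passes `is_valid`, while the combined string contains the triple 111 twice and is rejected, so a buffer slot that received all three documents would be flagged as a collision.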
Detecting Overflow > m
Double buffer size from m to 2m
If m < #documents < 2m, output “overflow”
If #documents > 2m, then the expected number of collisions is large, thus output “overflow” in this case as well.
Not yet in eprint version, will appear soon, as well as some other extensions.
More from the paper that we don’t have time to discuss…
Reducing program size below dictionary size (using Φ-Hiding from [CMS])
Queries containing AND (using [BGN] machinery)
Eliminating negligible error (using perfect hashing)
Scheme based on arbitrary homomorphic encryption
Conclusions
Private searching on streaming data
Public key program obfuscation, more general than PIR
Practical, efficient protocols
Many open problems
Thanks For Listening!