Presented By Amarjit Datta



Privacy-Preserving Multi-Keyword Ranked Search over Encrypted Cloud Data

Authors and Publication Information
- Ning Cao: PhD in ECE from the Worcester Polytechnic Institute
- Cong Wang: PhD in ECE from Illinois Institute of Technology
- Ming Li: PhD in ECE from the Worcester Polytechnic Institute
- Kui Ren: PhD in ECE from the Worcester Polytechnic Institute
- Wenjing Lou: PhD in ECE from the University of Florida

Table of Contents
- Introduction
- Problem Domain
- Some Important Definitions
- MRSE framework and version
- MRSE schema analysis
- MRSE schema improvements

Introduction
Cloud computing is becoming more and more popular. Why is the cloud so popular?
- Minimum startup cost
- Pay-as-you-go
- Easily scalable
- No server administration overhead

Uploading content to the cloud raises many security issues, for example:
- man-in-the-middle attacks
- packet sniffing
- IP address spoofing
- and many more

In this research paper, the authors analyze the privacy issues that can arise after the content has been uploaded to the cloud.

The cloud server is treated as honest-but-curious:
- honest: it follows the designated protocol
- curious: it wants to infer and analyze the data in its storage

So we have to search encrypted data hosted in the cloud environment without sharing private information with the cloud.

So what can data owners do about it?
- A data owner can encrypt files before uploading them to the cloud.
- But how can encrypted files be searched in the cloud? Traditional plaintext keyword search won't work.

We also need search results in ranked order (example: most relevant first).
- Coordinate matching: rank documents by how many of the query keywords they match, with as many matches as possible.
- Privacy must be preserved.
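Coordinate matching can be illustrated on unencrypted data first. The sketch below is only an illustration, not code from the paper; the function names and the sample documents are hypothetical. It counts how many query keywords each document contains and returns the best-matching document ids.

```python
def coordinate_match_score(doc_keywords, query_keywords):
    """Number of query keywords that appear in the document."""
    return len(set(doc_keywords) & set(query_keywords))


def ranked_search(documents, query_keywords, top_k):
    """Return the ids of the top_k documents, highest score first."""
    scored = [(coordinate_match_score(keywords, query_keywords), doc_id)
              for doc_id, keywords in documents.items()]
    # Sort by descending score; break ties by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_k]]


# Hypothetical sample collection (plaintext keywords per document).
documents = {
    "F1": ["cloud", "privacy", "search"],
    "F2": ["cloud", "storage"],
    "F3": ["ranking", "search", "privacy", "cloud"],
}

print(ranked_search(documents, ["cloud", "privacy", "ranking"], top_k=2))
# prints ['F3', 'F1']
```

This is exactly the kind of search that stops working once the documents are encrypted, because the server can no longer see the keywords. The rest of the deck is about getting the same ranking without revealing them.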
Problem Definition
Single-keyword search over encrypted data is already widely researched. This paper explores two new use cases:
- Multi-keyword search over encrypted cloud data.
- Ranked search (results sorted by relevance). Here the paper uses coordinate matching for ranking.
Privacy must be preserved.

Problem Model

Problem Formulation
- The data owner has a collection of data documents and their encrypted forms.
- The data owner creates an encrypted searchable index.
- Both the encrypted files and the encrypted searchable index are copied to the cloud server.
- To search, data users need the corresponding trapdoor T.

Problem Formulation
Threat models, based on the amount of information the cloud server knows:
- Known ciphertext model: the cloud server knows only the encrypted dataset and the searchable index.
- Known background model: the cloud server knows the encrypted dataset, the searchable index, and additional information (example: correlation of search queries).

This is what we want!
- Data privacy: encrypt the data files and the searchable index.
- Keyword privacy: hide what users are searching for.
- Trapdoor unlinkability: the trapdoor generation function should be randomized instead of deterministic.

Let's Check MRSE Schema!
The main idea is to confuse the cloud server so that it cannot detect the search keywords and document type. We can do that by using randomization at different steps.

Let's Check MRSE Schema!
Notations

MRSE Basic Framework

MRSE Framework
[Figure: MRSE framework. Recoverable labels: Data owner, Data user, Key, Setup, Build Index, Trapdoor, Query, Upload encrypted files and indexes]

How to Do Ranking? - Similarity Calculation
Di is a binary data vector for document Fi, where each bit Di[j] is either 0 or 1 and represents the existence of the corresponding keyword Wj in that document.
Q is a binary query vector indicating the keywords of interest, where each bit Q[j] represents the existence of the corresponding keyword Wj in the query.
The similarity score of document Fi to the query is therefore expressed as the inner product of their binary column vectors, i.e., Di · Q.

MRSE_I Scheme
- Setup: The data owner randomly generates an (n+2)-bit vector S and two (n+2) x (n+2) invertible matrices M1 and M2. The secret key SK is the 3-tuple {S, M1, M2}. Here n is the number of fields for each record, and the 2 extra dimensions (n + 2 instead of n) make room for the dummy random keyword.
- Build-Index: The data owner generates a binary data vector Di for every document Fi, where each bit Di[j] represents whether the corresponding keyword Wj appears in the document Fi.
- Trapdoor: With t keywords of interest, one binary vector Q is generated, where each bit Q[j] indicates whether Wj belongs to the set of query keywords. The trapdoor is generated from this vector.
- Query: With the trapdoor, the cloud server computes the similarity score of each document Fi. After sorting all scores, the cloud server returns the top-k ranked id list.

MRSE_I - Analysis
- Functionality: The random dummy keyword can follow a normal distribution, where the standard deviation functions as a flexible tradeoff parameter between search accuracy and security.
- Data privacy: Preserved by the encryption of the data.
- Index privacy: The index stays private as long as the secret key is kept confidential.
- With the randomness introduced by the splitting process and the random numbers r and t, the basic scheme can generate two totally different trapdoors for the same query (trapdoor unlinkability).

Improvement of MRSE_I
MRSE_I is secure enough under the known ciphertext model, but it is not sufficient under the known background model.
For example, the cloud server can learn document frequency, which can be further combined with background information to identify the keyword in a query with high probability.

Improvement of MRSE_I - Scale Analysis Attack
Given two correlated trapdoors T1 and T2 for the query keywords {K1, K2} and {K1, K2, K3} respectively, and three documents, the cloud server can detect the special case in which either all three documents contain K3 or none of them does. From this, the cloud server can find out the document frequency.

MRSE_2 Scheme
U is the number of dummy keywords inserted.
In MRSE_I, only one dummy keyword was used per document. Both Build Index and Query take U into account.

More Improvement
So far, ranking has used only the number of query keywords present in the document. But other factors can matter too. For example, when a keyword appears in all documents, it is less important. So considering keyword weight while ranking documents can be an improvement.

More Improvement
- The MRSE_I_TF scheme is an improved version of MRSE that considers the weight of the keyword during similarity calculation.
- The MRSE_2_TF scheme incorporates the ideas of both MRSE_I_TF (weighted keywords) and MRSE_2 (a list of random dummy keywords).

Future Possible Work
For future work, the authors will explore checking the integrity of the rank order in the search results, assuming the cloud server is untrusted.
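The keyword-weighting idea from the More Improvement slides can be illustrated on plaintext data. The slides do not give the weighting formula, so this sketch assumes a common TF x IDF form: (1 + ln tf) for the term frequency and ln(1 + N / df) for the keyword weight. The function names and the sample documents are hypothetical.

```python
import math


def idf_weights(documents, vocabulary):
    """Weight of each keyword: rarer keywords get a larger weight."""
    total = len(documents)
    weights = {}
    for word in vocabulary:
        doc_freq = sum(1 for words in documents.values() if word in words)
        # ln(1 + N / df) is an assumed, commonly used IDF form.
        weights[word] = math.log(1 + total / doc_freq) if doc_freq else 0.0
    return weights


def weighted_score(doc_words, query, weights):
    """Sum of (1 + ln tf) * weight over the query keywords in the document."""
    total = 0.0
    for word in query:
        tf = doc_words.count(word)
        if tf:
            total += (1 + math.log(tf)) * weights[word]
    return total


# Hypothetical sample collection; "cloud" appears in every document.
documents = {
    "F1": ["cloud", "cloud", "privacy"],
    "F2": ["cloud", "ranking", "ranking"],
    "F3": ["cloud", "storage"],
}
vocabulary = ["cloud", "privacy", "ranking", "storage"]
weights = idf_weights(documents, vocabulary)

query = ["cloud", "ranking"]
ranking = sorted(documents,
                 key=lambda doc_id: weighted_score(documents[doc_id], query, weights),
                 reverse=True)
print(ranking)  # prints ['F2', 'F1', 'F3']
```

Because "cloud" occurs in every document, it receives the smallest weight, so a match on "ranking" counts for more than a match on "cloud". Plain coordinate matching would score F1 and F3 equally for this query, while the weighted score separates them.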