1

LH*RE: A Scalable Distributed Data Structure with Recoverable Encryption Keys

(Work in Progress, Jan 09) (Provisional Patent Appl.)

Sushil Jajodia (George Mason U.)
Witold Litwin (U. Paris Dauphine)
Thomas Schwarz (Santa Clara U.)

2

Overview

• A new data structure
• A Scalable Distributed Data Structure
  – LH* family
• Client-side encryption
  – Using one or many symmetric encryption keys
  – Protects the privacy of client data stored on unknown servers
    • Hence moderately trusted by the client

3

Overview

• Recoverable encryption keys
  – Safely backed up in the file
  – Recoverable on behalf of the client
  – Recoverable without the client, on behalf of some Authority
• Revocable keys
  – Idem
• Scalable file parameters
  – Preserving the assurance

4

Overview

• Applications on:
  – SDDS
  – P2P
  – Clouds
  – Grids
• Enterprise data
• Medical data
• Social networks

5

Overview

• Basic threat model
  – Client site is safe
  – LH* coordinator site is safe
  – Data hosting organization as a whole is safe (trusted)
  – Network is safe while a key is backed up or recovered
  – No malicious intruder
• To decrypt some records, an intruder then needs:
  – Either to break an encryption key
  – Or to break into at least k servers
    • k is a client-defined parameter

6

Overview

• The servers to break into for a specific record can be anywhere in the file
  – At locations unknown to the intruder
  – Changing with splits
  – The intruder may need to break into all the servers
• The effort of breaking into some specific k servers may
  – Still not suffice to decrypt any record
    • Most often
  – Suffice only for a few records
    • When the client uses many encryption keys

7

Overview

• LH*RE data record manipulation costs no more messages than in an LH* file
• Key recovery cost is about that of an LH* scan
  – Possibly 2M messages for M servers, in one or several rounds
• Storage overhead due to encryption is negligible
• In practice, an LH*RE file should be safe

8

Overview

• LH*RE could be useful for:
  – Organizations with multiple clients & servers
    • Typical case today
  – Clients of remote storage services
    • P2P, Grid, Cloud … computing
    • Amazon, Google, MS, IBM…
• Distributed systems need client-side encryption and key recoverability
  – Neither is yet well handled in practice

9

Generic LH*

• Scalable distributed hash data structure
• Data are stored in buckets on server sites numbered 0, 1, 2…
• Applications are at client sites
  – A peer site may be both client & server
• Data are in records with primary keys
• Records can be inserted, updated, deleted, searched or scanned
• The address m of the record with primary key C is m = LH (C)

10

Generic LH*

• Overflowing inserts generate splits moving data into new buckets (on new sites)
  – Splits are ordered: 0, 0, 1, 0, 1, 2, 3, 0, 1, …, 2^j - 1, 0…
• LH (C) dynamically changes
• The client may not know the actual file state
• It uses only its private file state image for addressing
• Addressing errors may result
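The addressing rule and the split order on this slide can be sketched as follows. This is a minimal sketch of classic linear hashing, assuming integer primary keys, hash functions h_i (C) = C mod 2^i, and a file grown from a single bucket 0; the names lh_address and split_order are illustrative and do not come from the talk.

```python
def lh_address(c, level, split_ptr):
    """Address of key c for a file state (level i, split pointer n).

    Assumes h_i(c) = c mod 2**i, the usual linear-hashing family.
    """
    a = c % (2 ** level)
    if a < split_ptr:
        # Bucket a has already split in this round: use the next-level hash.
        a = c % (2 ** (level + 1))
    return a


def split_order(count):
    """Return the first `count` bucket numbers in the order they split."""
    level, n = 0, 0
    order = []
    for _ in range(count):
        order.append(n)
        n += 1
        if n == 2 ** level:
            # The round is complete: the file doubled, restart at bucket 0.
            level += 1
            n = 0
    return order


print(split_order(9))  # [0, 0, 1, 0, 1, 2, 3, 0, 1]
```

For a file of 14 buckets (level 3, split pointer 6), lh_address(13, 3, 6) gives bucket 13, while lh_address(14, 3, 6) gives bucket 6, because bucket 6 has not split yet. A client holding an outdated (level, split_ptr) image computes a wrong address in the same way, which is what the forwarding corrects.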

11

Generic LH*

• Any addressing error is resolved by the servers in at most two forwarding messages
  – Only one for LH*RS P2P
• Every forwarding adjusts the client image
• Addressing errors do not repeat
• Altogether, LH* is the fastest SDDS (P2P, Grid, Cloud...) addressing scheme

12

LH*RE

• Coordinator may have additional capabilities
  – Certifying the address of every client
  – Maintaining PKI over the file
    • If the network is not safe
    • For client identity checking
  – …
• Records are LH* records with an additional client identity field I
• Key-based addressing is as for LH*

13

LH*RE

• File starts with at least K buckets
  – K is a file parameter
  – Basically, K is a power of 2
• Data in every record are encrypted by the client
  – Through some good symmetric-key encryption method
    • Much faster than known public-key schemes
• Primary keys and I are not encrypted

14

Encryption/Decryption

• Client uses a cached table T with N encryption keys; Et = T (t) is the t-th key
• Some hash h (C) chooses t for record R (C)
  – E.g., t = h (C) = C mod N
• Client encrypts/decrypts the non-key data field D in R (C) using Et, producing the field D’
  – Using strong encryption
    • AES
    • PGP
    • …

15

Encryption/Decryption

• Client forms the encrypted record R’ (C) = (C, I, t, D’)
  – I is a provable client identity
  – Or any information that a future requestor must provide to access R’
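The formation of R’ (C) = (C, I, t, D’) can be sketched as below. This is only an illustration under stated assumptions: the key table T is a Python list, t = C mod N as in the example on the previous slide, and toy_cipher is a hypothetical placeholder built from SHA-256 that stands in for a real cipher such as AES. It is not secure and is not the scheme used by the authors.

```python
import hashlib
from typing import NamedTuple


class EncryptedRecord(NamedTuple):
    C: int        # primary key, stored in clear
    I: str        # client identity, stored in clear
    t: int        # index of the encryption key in T
    D_enc: bytes  # encrypted non-key field D'


def toy_cipher(key, data):
    """Placeholder XOR stream cipher (NOT secure); stands in for AES.

    Applying it twice with the same key returns the original data.
    """
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


def make_record(C, I, D, T):
    """Client side: choose t = C mod N and encrypt D with T[t]."""
    t = C % len(T)
    return EncryptedRecord(C, I, t, toy_cipher(T[t], D))


def read_record(rec, T):
    """Client side: decrypt D' with the key that rec.t designates."""
    return toy_cipher(T[rec.t], rec.D_enc)


T = [b"key-0", b"key-1", b"key-2"]           # hypothetical key table, N = 3
rec = make_record(7, "alice", b"some data", T)
print(rec.t, read_record(rec, T))
```

Storing t in the record lets the client, or a requestor who recovered T, pick the right key without recomputing the hash.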

16

Encryption/Decryption

• The client manipulates the encrypted record R’ (C) basically as for LH*
  – Key-based search, insert, delete and update
• However, a scan over the non-key field no longer works
  – The content cannot be searched
  – That is the basic purpose of LH*RE

17

Encryption Key Encoding

• Client encodes each encryption key E
  – Using secret sharing with k ≤ K shares
    • k - 1 shares are different white noises N1 … Nk-1
  – There is a new set of shares for every encryption key
    • Higher assurance than if all keys used the same set of noises
    • Such an approach remains a possibility nevertheless
  – Not addressed in what follows, unless stated otherwise

18

Encryption Key Encoding

• The k-th share value is

  E’ = N1 ⊕ … ⊕ Nk-1 ⊕ E

  – ⊕ denotes XOR
• Each share becomes a share record

  Sj = (Cj, t, I, Nj) for j = 1, …, k - 1

  Sk = (Ck, t, I, E’)
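The XOR-based encoding of a key into k shares can be sketched as follows. It is a minimal sketch, assuming keys are byte strings and that the white noises come from Python's secrets module; the function names encode_key and decode_key are illustrative.

```python
import secrets
from functools import reduce


def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_key(E, k):
    """Split key E into k shares: k - 1 noises plus E' = N1 ^ ... ^ Nk-1 ^ E."""
    noises = [secrets.token_bytes(len(E)) for _ in range(k - 1)]
    e_prime = reduce(xor_bytes, noises, E)
    return noises + [e_prime]


def decode_key(shares):
    """XOR of all k shares gives back E, since every noise cancels out."""
    return reduce(xor_bytes, shares)


E = secrets.token_bytes(16)        # e.g. a 128-bit symmetric key
shares = encode_key(E, k=4)
assert decode_key(shares) == E
```

Any k - 1 of the shares are jointly uniform random bytes, which is why an intruder holding fewer than k shares learns nothing about E.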

19

Encryption Key Encoding

• Client chooses each key Cj by some hash LHK defined as follows:
  – LHK hashes Nj or E’ onto the initial buckets 0, 1 … K - 1
  – For any j > 1 and any l < j: LHK (Cj) ≠ LHK (Cl)
    • Here Cl is a previously generated key for the E being encoded
  – Every Cj is unique in the file
    • General constraint on an LH* file
  – Could be relaxed
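One way to satisfy both constraints on the share keys is rejection sampling, sketched below. The slide does not specify how LHK is computed, so this sketch assumes LHK (C) = C mod K and draws random candidate keys; choose_share_keys and used_keys are hypothetical names.

```python
import secrets


def choose_share_keys(k, K, used_keys, key_bits=64):
    """Pick k primary keys whose initial buckets (C mod K) are all different.

    used_keys is the set of primary keys already present in the file;
    it is updated so that every chosen key stays unique.
    """
    if k > K:
        raise ValueError("need k <= K to place shares in distinct buckets")
    chosen = []
    taken_buckets = set()
    while len(chosen) < k:
        c = secrets.randbits(key_bits)
        if c in used_keys or c % K in taken_buckets:
            continue  # reject: duplicate key, or bucket already holds a share
        chosen.append(c)
        taken_buckets.add(c % K)
        used_keys.add(c)
    return chosen


keys = choose_share_keys(k=4, K=8, used_keys=set())
print(sorted(c % 8 for c in keys))  # four different bucket numbers in 0..7
```

The loop always terminates because k ≤ K guarantees that a free initial bucket exists for each new share.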

20

Encryption Key Encoding

• Client sends each Sj for storage
  – As usual if the network is safe
  – Using any reasonable protocol for safe transmission otherwise
    • SSL…
• Otherwise, a snooper could collect all the shares and decode an encryption key
• Forwarding does not need this procedure
• Neither does data record manipulation

21

Encryption Key Encoding

• Main property
  – All share records of E that the client sends out for storage end up at different servers
    • Even if they are forwarded
• Regardless of future splits and merges, they always remain at different servers
  – Despite the migrations during the splits
• Proof: details omitted here
• Basis: in LH*, no split can migrate records from different buckets into the same bucket

22

Encryption Key Encoding

• Example
  – File extends over servers (buckets) 0, 1, 2, … 12, 13
  – Shares of some key end up in servers 0, 3, 6, 11
  – Coming splits may only move these shares to servers distant by 2^3, 2^4, 2^5…, respectively:
    • 6 → 14, 22…
    • 0 → 16, 32…
    • 3 → 19, 35…
    • 11 → 27…
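The destinations in this example follow from the bucket levels, as the sketch below illustrates. It assumes classic LH* splitting in a file grown from a single bucket, where a bucket m at level j sends the moved records to bucket m + 2^j; bucket_level and future_destinations are illustrative names.

```python
def bucket_level(m, num_buckets):
    """Level j of bucket m in a file of num_buckets buckets (grown from 1)."""
    i = num_buckets.bit_length() - 1   # file level: floor(log2(num_buckets))
    n = num_buckets - 2 ** i           # split pointer
    # Buckets that already split in this round, and the buckets created
    # in this round, are one level ahead of the rest.
    return i + 1 if (m < n or m >= 2 ** i) else i


def future_destinations(m, num_buckets, count=2):
    """Buckets that successive splits of bucket m will send records to."""
    j = bucket_level(m, num_buckets)
    return [m + 2 ** (j + s) for s in range(count)]


for m in (6, 0, 3, 11):
    print(m, "->", future_destinations(m, num_buckets=14))
# 6 -> [14, 22]
# 0 -> [16, 32]
# 3 -> [19, 35]
# 11 -> [27, 43]
```

The destination lists never overlap for different source buckets, which illustrates why shares that start at different servers stay at different servers.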

23

Encryption Key Recovery

• Concerns all the encryption keys of some client I’
• Requestor can be the client itself
  – Having lost T for any reason
• Requestor can be a trusted authority A
  – In case of disappearance of I’
    • Dismissal of an employee
    • Death or incapacity of a patient
    • …
• A then requests the recovery on behalf of a new client I”

24

Encryption Key Recovery

• Requestor basically does not know k and N
• It then requests an LH*-like scan with deterministic termination
  – Searching, for some N’, for any share record where I = I’ and t ≤ N’
• Choice of N’ is arbitrary
  – Basically, it should be large enough to be > N
  – Alternatively, the client may use it to prevent flooding by the incoming replies

25

Encryption Key Recovery

• If the requestor knows N and k, probabilistic termination suffices
  – Recovery may be cheaper
• In practice, probabilistic termination should suffice with high probability
  – Why?

26

Encryption Key Recovery

• The requestor could be fake
  – E.g., monkey in the middle
• Each server receiving the scan request S therefore verifies the identity of the requestor
  – E.g., by checking the IP address of the client with the coordinator
    • Unless it caches the legal addresses
    • Or they are an integral part of the I-fields
  – Or it verifies the signature through PKI
  – …

27

Encryption Key Recovery

• Direct requests from servers to the coordinator generate 2N messages
  – Heavy load for the coordinator
• An alternative is to aggregate the requests at the servers
  – Sending fewer of those to the coordinator
  – Even only a single one
  – As below

28

Encryption Key Recovery

• Every server having a child waits for the request from it
• Every child requests the confirmation from its father
• Thus every server except server 0 requests the confirmation from its father
  – By the structure of LH*, all these requests end up at server 0
  – Server 0 forwards the request to the coordinator

29

Encryption Key Recovery

• The coordinator gets a single message
  – Regardless of N
• Its reply propagates downward similarly
• Notice that the scheme works assuming no malicious action at any server
  – As we do unless we state otherwise
• Otherwise, e.g., server 0 could send a fake OK
  – Big trouble could follow
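The father/child aggregation can be sketched as a tree rooted at server 0. The sketch assumes that the father of server m > 0 is the bucket whose split created it, that is m minus the highest power of 2 not exceeding m, in a file grown from bucket 0; the talk does not spell this definition out, and the function names are illustrative.

```python
def father(m):
    """Bucket whose split created bucket m (assumed definition), for m > 0."""
    if m <= 0:
        raise ValueError("server 0 has no father")
    return m - (1 << (m.bit_length() - 1))


def path_to_root(m):
    """Servers visited when a confirmation request climbs from m to server 0."""
    path = [m]
    while m != 0:
        m = father(m)
        path.append(m)
    return path


def aggregation_messages(num_servers):
    """Upward messages: one per non-root server, plus one to the coordinator."""
    return (num_servers - 1) + 1


print(path_to_root(13))          # [13, 5, 1, 0]
print(aggregation_messages(14))  # 14
```

Each server sends exactly one request upward after hearing from its children, so the coordinator receives a single message however many servers take part.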

30

Encryption Key Recovery

• Once the server gets OK, it starts the actual bucket scan
• It sends all the records found to I’ or I”
  – If the network is not safe, it uses SSL or alike
    • A snooper could collect the shares otherwise
• If it finds no records, it sends an Ack of having received S

31

Encryption Key Recovery

• The client
  – Matches the records with the same t
  – Recovers the t-th key
    • By XORing all the shares sharing t
    • Deterministic termination guarantees that there are k such shares
  – Sets N = tmax where tmax is the maximal t received
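The client-side matching step can be sketched as follows. It assumes that the scan returns share records as tuples (C, t, I, payload), where payload is a noise Nj or E’, and that all k shares of every key have arrived; recover_keys is an illustrative name.

```python
from collections import defaultdict
from functools import reduce


def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def recover_keys(share_records):
    """Group share records by t and XOR each group to rebuild the t-th key."""
    by_t = defaultdict(list)
    for _C, t, _I, payload in share_records:
        by_t[t].append(payload)
    return {t: reduce(xor_bytes, shares) for t, shares in by_t.items()}


records = [
    (101, 0, "alice", b"\x01"), (202, 0, "alice", b"\x02"), (303, 0, "alice", b"\x07"),
    (404, 1, "alice", b"\xff"), (505, 1, "alice", b"\x0f"),
]
print(recover_keys(records))  # {0: b'\x04', 1: b'\xf0'}
```

The order in which the replies arrive does not matter, since XOR is commutative and associative.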

32

Encryption Key Revocation

• Revocation consists of changing the encryption key for every data record of a client
• May happen when
  – Client’s T fell into the wrong hands
  – Client’s right to use the data abruptly expired
    • Termination of employment
    • …

33

Encryption Scalability

• More encryption keys for a larger file
  – To offset assurance deterioration
    • Here: the number of keys that remain undisclosed if a key gets disclosed
• It suffices to append new keys to T and extend the hash function
• Existing encryption is not affected

34

Encoding Scalability

• More shares per key for a larger file
  – To offset assurance deterioration
• To set k = k + 1, it suffices, for every encryption key, to:
  – Create a new noise share Nk
  – Read any single share record Sj of the t-th key
  – Set Nj := Nj ⊕ Nk
  – Store the updated Sj
  – Create and store the new share record Sk = (Ck, t, I, Nk)
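The step from k to k + 1 shares can be sketched on the share payloads alone. The sketch assumes shares are equal-length byte strings whose XOR is the key E, and it picks the share to update at random; add_share is an illustrative name, and storing the records in the file is left out.

```python
import secrets
from functools import reduce


def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def add_share(shares):
    """Return k + 1 shares with the same XOR as the given k shares."""
    new_noise = secrets.token_bytes(len(shares[0]))
    j = secrets.randbelow(len(shares))        # any single existing share
    updated = list(shares)
    updated[j] = xor_bytes(updated[j], new_noise)
    updated.append(new_noise)                 # payload of the new share record
    return updated


old = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
new = add_share(old)
assert len(new) == len(old) + 1
assert reduce(xor_bytes, new) == reduce(xor_bytes, old)
```

The new noise appears twice in the XOR of all shares, once inside the updated share and once as the new share, so it cancels and E itself is never reassembled during re-encoding.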

35

Encoding Scalability

• The process may be carried out by scanning successive buckets 0, 1…
  – Requesting from each new bucket only share records whose t has not been dealt with yet
  – Until the entire T is re-encoded

36

Performance: Messaging Cost

• Same as for LH* for data record manipulation
• Plus kN + messages to back up T
• Basically, about 4N messages for the key recovery scan
  – In about log N rounds
• Can be (much) fewer messages with probabilistic termination or client address caching at the servers

37

Processing Cost

• Processing overhead concerns
  – Mainly, the (symmetric) encryption/decryption
    • Depends on the encryption scheme used
  – From time to time, especially initially
    • Key generation & encoding
  – Sporadically
    • Key recovery
    • Key revocation
• This analysis is an open issue at present

38

Storage Overhead

• Should be O (kN) on the servers
• Encryption keys, and thus share records, should usually be small compared to data records
• Same for the other LH*RE-specific fields within each data record
• Storage overhead on the servers should usually be negligible
• Client storage for T should be O (N)
  – Easily OK for even millions of encryption keys in a typical RAM

39

Encryption Strength

• Attack 1: Any single-server intrusion
  – By an intruder or the administrator
    • Accidentally or willingly
• Impossible to decode any encryption key
• One has to break the encryption keys of the data records of interest
  – About impossible in practice for good encryption
  – Difficulty compounds when the client uses multiple encryption keys
• LH*RE data on a server are safe in this sense

40

Encryption Strength

• Attack 2: Multiple-server intrusion to decrypt a specific data record
• To decode E of any data record of interest, the intruder has to break into at least k servers
  – Those holding the shares of E
• Otherwise, brute force is the only way out
• If M > k, breaking into k or more servers does not guarantee success with a specific record
  – See the example later on in this talk

41

Encryption Strength

• The shares searched for may be anywhere in the file
• No share has any information about the location of the other shares
• The intruder may need to break into every server
• If M = k, breaking into k servers suffices for success
• Hence it is safer to start the file with K > k

42

Encryption Strength

• Attack 3: Intrusion into any k or more servers to decrypt any data records
• The decoding of some encryption keys, hence the disclosure of some data, is possible
  – But not certain
• The likelihood and consequences depend on the file state and parameters
• Assurance analysis may be the tool to find out more

43

Encryption Assurance

• Assuming it is impossible to break the encryption keys by brute force,
• What if an intruder breaks into l servers?
• Assurance analysis measures
  – Confidence that no disclosure happens
  – Extent of disclosure otherwise

44

Encryption Assurance

• Basic measures
  – Probability a that no record gets disclosed
  – Expected fraction d of the file that gets disclosed
  – Expected fraction that remains undisclosed
  – Number of records that are disclosed
    • or remain undisclosed

45

Encryption Assurance

• If l < k, then a = 1
• If l ≥ k, then a depends on the number of servers M, on N and on the bucket size b at each server
  – Basically, the larger N or M are and the smaller b is, the higher the assurance
• In-depth analysis remains to be done

46

Example

• k = 4, 1 encryption key, 16 servers
• Assurance a against intrusion into k servers?
• Usual randomness
  – Servers are equally likely to be intruded

  a = 1 – (4/16 * 3/15 * 2/14 * 1/13) = 1 – 1/1820 ≈ 0.9995

• Expected disclosure: d = ¼ of the file
• Remains undisclosed: 1 – d = ¾ of the file
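The single-key figure above can be checked with a short computation. The sketch assumes, as the slide does, that all sets of l intruded servers are equally likely and that the k shares sit on k different servers out of M; the generalisation to l > k is an assumption of this sketch, not a result from the talk.

```python
from fractions import Fraction
from math import comb


def assurance(M, k, l):
    """Probability that l intruded servers (out of M) do not hold all k shares."""
    if l < k:
        return Fraction(1)
    # Intruded sets containing all k share servers: choose the other l - k
    # servers among the remaining M - k.
    return 1 - Fraction(comb(M - k, l - k), comb(M, l))


a = assurance(M=16, k=4, l=4)
print(a, float(a))  # 1819/1820, about 0.99945
```

Using Fraction keeps the result exact, so the value 1 – 1/1820 appears as such before it is rounded.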

47

Example

• Use of 2 encryption keys

  a (1) ≈ 1 – 2/1820 ≈ 0.999

  a (2) = 1 – (2/1820)^2 ≈ 0.999999

  a = 1 – 2/1820 – (2/1820)^2 ≈ 0.999

• Expected disclosure d ≈ 1/8 of the file
• Now what about using 10 keys?

  a ≈ 0.99   d ≈ 1/40

• And what about 100 keys?
• And what if the file becomes bigger?
  – e.g. M 128
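The questions about more keys and larger files can be explored numerically. This sketch assumes that the share placements of different keys are independent and that exactly k servers are intruded; it returns both the value under that independence assumption and the first-order estimate 1 – n/C(M, k) that the slide's arithmetic uses. The function name is illustrative.

```python
from math import comb


def assurance_n_keys(n_keys, M, k):
    """Assurance against a k-server intrusion when n_keys keys are in use.

    Returns (value assuming independent share placement,
             first-order estimate 1 - n_keys * p).
    """
    p = 1 / comb(M, k)                 # chance that one given key is decoded
    independent = (1 - p) ** n_keys
    first_order = 1 - n_keys * p
    return independent, first_order


for n_keys, M in [(2, 16), (10, 16), (100, 16), (100, 128)]:
    print(n_keys, M, assurance_n_keys(n_keys, M, k=4))
```

For 10 keys on 16 servers the first-order estimate is 1 – 10/1820, which rounds to the 0.99 quoted above. The two returned values stay close as long as n_keys is small compared to C(M, k).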

48

Conclusion

• New data structure
• Lets the file be scalable and distributed
• Lets data records be client-side encrypted
• Lets encryption keys be recoverable and revocable
• Negligible messaging, processing and storage overhead
• Future work should focus on experiments & assurance analysis

49

Future work

• Experiments
• Assurance analysis
• Applications
• Variants
  – Server caches client addresses
  – Probabilistic termination for key recovery
  – …
• Larger threat model
  – Malicious intruder
    • Destroying or corrupting the shares

50

Thank you for your attention