41
page 1 March 1, 2005 10th Estonian Winter School in Computer Science Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions Benny Pinkas HP Labs, Israel

Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

  • Upload
    kiral

  • View
    36

  • Download
    0

Embed Size (px)

DESCRIPTION

Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions. Benny Pinkas HP Labs, Israel. Secure two-party computation - definition. y. x. Input:. F(x,y) and nothing else. Output:. y. As if…. x. F(x,y). F(x,y). Secure Function Evaluation. - PowerPoint PPT Presentation

Citation preview

Page 1: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 1March 1, 2005 10th Estonian Winter School in Computer Science

Privacy Preserving Data Mining

Lecture 2

Cryptographic Solutions

Benny PinkasHP Labs, Israel

Page 2: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 2March 1, 2005 10th Estonian Winter School in Computer Science

Secure two-party computation - definition

x y

F(x,y) and nothing else

Input:Output:

x yAs if…

F(x,y) F(x,y)

Page 3: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 3March 1, 2005 10th Estonian Winter School in Computer Science

Secure Function Evaluation

• A major topic of cryptographic research• How to let n parties, P1,..,Pn compute a function

F(x1,..,xn) – Where input xi is known to party Pi

– Parties learn the final input and nothing else

• Caveat: cryptographic definitions of secure computation are both too strong and too weak:– Too strong: do not allow leakage of harmless

information; the price of this extra security is in efficiency.

– Too weak: do not address leakage or misuse caused by the function itself (e.g., information implied by the outputs, or misbehavior in choosing an input).

Page 4: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 5March 1, 2005 10th Estonian Winter School in Computer Science

Secure Function Evaluation

•Major Result [Yao]: “Any function that can be evaluated using polynomial resources can be securely evaluated using polynomial resources”(under some cryptographic assumption)

Page 5: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 6March 1, 2005 10th Estonian Winter School in Computer Science

SFE Building Block: 1-out-of 2 Oblivious Transfer

Learns nothing

Yj

Alicej{0,1}

Bob

Y0, Y1

• 1-out-of-2 OT can be based on most public key systems

• There are implementations with two communication rounds

Page 6: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 7March 1, 2005 10th Estonian Winter School in Computer Science

General Two party Computation

Two party protocol

• Input:– Sender: Function F (some representation)

• The sender’s input Y is already embedded in F

– Receiver: X 0,1n

•Output:– Receiver: F(x) and nothing else about F– Sender: nothing about x

Page 7: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 8March 1, 2005 10th Estonian Winter School in Computer Science

Representations of F

• Boolean circuits [Yao,GMW,…]• Algebraic circuits [BGW,…]• Low deg polynomials [BFKR]• Matrices product over a large field

[FKN,IK]• Randomizing polynomials [IK]• Communication Complexity Protocol

[NN]

Page 8: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 9March 1, 2005 10th Estonian Winter School in Computer Science

Secure two-party computation of general functions [Yao]

•First, represent the function F as a Boolean circuit C– It’s always possible– Sometimes it’s easy (additions,

comparisons)– Sometimes the result is inefficient (e.g. for

indirect addressing, e.g. A[x] )

•Then, “garble” the circuit

•Finally, evaluate the garbled circuit

Page 9: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 10

March 1, 2005 10th Estonian Winter School in Computer Science

Garbling the circuit

• Bob constructs the circuit, and then garbles it.

G

wi0,wi

1 wJ0,wJ

1

wk0,wk

1

W values will serve as cryptographic keys

Wk0 0 on wire

kWk

1 1 on wire k

(Alice will learn onestring per wire, butnot which bit it corresponds to.)

Page 10: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 11

March 1, 2005 10th Estonian Winter School in Computer Science

Gate tables

• For every gate, every combination of input values is used as a key for encrypting the corresponding output

• Assume G=AND. Bob constructs a table:– Encryption of wk

0 using keys wi0,wJ

0 (AND(0,0)=0)– Encryption of wk

0 using keys wi0,wJ

1 (AND(0,1)=0)– Encryption of wk

0 using keys wi1,wJ

0 (AND(1,0)=0)– Encryption of wk

1 using keys wi1,wJ

1 (AND(1,1)=1)

• Result: given wix,wJ

y, can compute wkG(x,y)

Page 11: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 12

March 1, 2005 10th Estonian Winter School in Computer Science

Secure computation

• Bob sends the table of gate G to Alice• Given, e.g., wi

0,wJ1, Alice computes wk

0 by decrypting the corresponding entry in the table, but she does not know the actual values of the wires.

G

wi0,wi

1 wJ0,wJ

1

wk0,wk

1

Encryption of wk0 using keys

wi0,wJ

0

Encryption of wk0 using keys

wi0,wJ

1

Encryption of wk1 using keys

wi1,wJ

1

Encryption of wk0 using keys

wi1,wJ

0

Permuted order

Page 12: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 13

March 1, 2005 10th Estonian Winter School in Computer Science

Secure computation

•Bob sends to Alice– Tables encoding each circuit gate.– Garbled values (w’s) of his input values.– Translation from garbled values of output

wires to actual 0/1 values.

• If Alice gets garbled values (w’s) of her input values, she can compute the output of the circuit, and nothing else.

Page 13: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 14

March 1, 2005 10th Estonian Winter School in Computer Science

Alice’s input

•For every wire i of Alice’s input:– The parties run an OT protocol– Alice’s input is her input bit (s).– Bob’s input is wi

0,wi1

– Alice learns wis

•The OTs for all input wires can be run in parallel.

•Afterwards Alice can compute the circuit by herself.

Page 14: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 15

March 1, 2005 10th Estonian Winter School in Computer Science

Secure computation – the big picture

•Represent the function as a circuit C•Bob sends to Alice 4|C| encryptions (e.g. 64|C| Bytes), 4 encryptions for every gate.

•Alice performs an OT for every input bit. (Can do, e.g. 100-1000 OTs per sec.)

•~One round of communication.•Efficient for medium size circuits!

Page 15: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 16

March 1, 2005 10th Estonian Winter School in Computer Science

Example

• The Millionaires problem: comparing two N bit numbers

• What’s the overhead?

Page 16: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 17

March 1, 2005 10th Estonian Winter School in Computer Science

Applications

• Two parties. Two large data sets.• Max?• Mean? • Median?• Intersection?• Decision Tree learning? ID3?

Page 17: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 18

March 1, 2005 10th Estonian Winter School in Computer Science

Fairplay – a secure two-party computation systemMalkhi, Nissan, P., Sella

• A a full fledged secure two-party computation system, implementing Yao’s “garbled circuit” protocol.

• Goals:– Investigate whether two-party SFE is practical– Actual measurements of overall computation– Breakdown of computation into parts– Computation versus communication?– Test-bed for various optimizations

Page 18: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 19

March 1, 2005 10th Estonian Winter School in Computer Science

Fairplay

• The Compilation paradigm– Programs written in SFDL, a high-level

programming language– Allows clear, formal, easily understandable

definition and requirements by humans

– SHDL: Low-level language describing Boolean circuits

– SFDL SHDL compiler and optimizer– SHDL Java programs implementing Yao’s

protocol

Page 19: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 20

March 1, 2005 10th Estonian Winter School in Computer Science

Fairplay – SFDL example

program Millionaires { type int = Int<20>; // 20-bit integer type AliceInput = int; type BobInput = int; type AliceOutput = Boolean; type BobOutput = Boolean; type Output = struct {AliceOutput alice, BobOutput bob}; type Input = struct {AliceInput alice, BobInput bob};

function Output output(Input input) { output.alice = input.alice > input.bob; output.bob = input.bob > input.alice; }

Page 20: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 21

March 1, 2005 10th Estonian Winter School in Computer Science

SFDL properties

• Conventional syntax (C/Pascal-like)• Type system – Boolean, integer, enumerated• Program structure

– Declarations: global constants, types– Sequence of functions (no nesting [C], no recursion)– Function name is its return value [Pascal]

• Conditional execution and loops– if-then, if-then-else statements, For-loop (loop boundaries

should be known at compile time)• Assignments and expressions

– constants, variables, array entries, structure items, function calls, operators (+, -, logical, comparison), parenthesis

Page 21: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 22

March 1, 2005 10th Estonian Winter School in Computer Science

SHDL example

0 input //output$input.bob$01 input //output$input.bob$12 input //output$input.bob$23 input //output$input.bob$34 input //output$input.alice$05 input //output$input.alice$16 input //output$input.alice$27 input //output$input.alice$38 gate arity 2 table [ 1 0 0 0 ] inputs [ 4 5 ]9 gate arity 2 table [ 0 1 1 0 ] inputs [ 4 5 ]

Page 22: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 23

March 1, 2005 10th Estonian Winter School in Computer Science

kth-ranked element (e.g. median)

• Inputs:– Alice: SA Bob: SB

– Large sets of unique items (D).• Output:

– x SA SB s.t. x has k-1 elements smaller than it.• The rank k

– Could depend on the size of input datasets. – Median: k = (|SA| + |SB|) / 2

• Motivation:– Basic statistical analysis of distributed data.– E.g. histogram of salaries in CS departments

• The Problem: Generic constructions using circuits [Yao …] yield an overhead which is at least linear in k.

Page 23: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 24

March 1, 2005 10th Estonian Winter School in Computer Science

An (insecure) two-party median protocol

RALASA

SB

mA

RBLB mB

LA lies below the median, RB lies above the median.

New median is same as original median.Recursion Need log n rounds

(assume each set contains n=2i items)

mA < mB

Page 24: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 25

March 1, 2005 10th Estonian Winter School in Computer Science

A Secure two-party median protocol

A finds its median mA

B finds itsmedian mB

mA < mB

A deletes elements ≤ mA.B deletes elements > mB.

A deletes elements > mA.B deletes elements ≤ mB.

YES

NO

Secure comparison(e.g. a small circuit)

Page 25: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 26

March 1, 2005 10th Estonian Winter School in Computer Science

An example

A B

mA>mB

mA<mB

mA<mB

mA>mB

mA<mB

Medianfound!!

8 9 16

1

16 161 1

Page 26: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 27

March 1, 2005 10th Estonian Winter School in Computer Science

Proof of security

A B

mA>mB

mA<mB

mA<mB

mA>mB

mA<mB

median

mA>mB

mA<mB

mA<mB

mA>mB

mA<mB

Page 27: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 28

March 1, 2005 10th Estonian Winter School in Computer Science

+

Arbitrary input size, arbitrary k

SA

SB

k

Now, compute the median of two sets of size k.

Size should be a power of 2.

median of new inputs = kth element of original inputs

2i

+

-

Page 28: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 29

March 1, 2005 10th Estonian Winter School in Computer Science

Hiding size of inputs

• Can search for kth element without revealing size of input sets.

• However, k=n/2 (median) reveals input size.• Solution: Let S=2i be a bound on input size.

|SA|S

-+

-+

|SB|

Median of new datasets is same

as median of original datasets.

Page 29: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 30

March 1, 2005 10th Estonian Winter School in Computer Science

Privacy preserving data mining

Confidential database D1

Wish to “mine” D1 D2 without revealing more info

Examples:• Medical databases protected by law• Competing businesses• Government agencies (privacy, “need to know”)

Confidential database D2

P1P2

Huge

Page 30: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 31

March 1, 2005 10th Estonian Winter School in Computer Science

The classification problem

Age > 30 Sex time insured Claim > $500

Did fraud occur?

C1 Yes M t [0,9] years No No

C2 No F t [10,19] years Yes Yes

… … … … … …

Cn Yes F t [20,29] years No No

Goal: based on available data design an algorithm to classify new data

Page 31: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 32

March 1, 2005 10th Estonian Winter School in Computer Science

Classification using Decision Trees

Time insured

No

[0,9] years > 20 years [10,19] years

Age > 30

No Yes

YesNo

Claim > $500

No Yes

YesNo

ID3: Choose attribute A thatminimizes the conditionalentropy of the attribute class

Page 32: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 33

March 1, 2005 10th Estonian Winter School in Computer Science

Privacy Preserving ID3

• Scenario: The inputs are private information of P1 and P2

• Main technical problem: Comparing entropies while preserving privacy. (entropy = x logx)

• Efficiency: – most computation done independently by parties.– The overhead of cryptographic operations depends only on

the size of the decision tree (not on the input size).

• Basic task: compute x log x.

x = x1+x2 = e.g., total number of customers with (age

> 30) and (fraud = yes)

Page 33: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 34

March 1, 2005 10th Estonian Winter School in Computer Science

Privacy Preserving ID3

•Computing x log x:– x = x1 + x2, known to P1 and P2

respectively (independently computed from databases).

– Might as well compute x lnx, or lnx.– First run a protocol to compute random

shares, y1 + y2 = ln x• ln x is Real. Crypto works over finite fields. Must do numerical analysis.

Page 34: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 35

March 1, 2005 10th Estonian Winter School in Computer Science

Cryptographic Tools

x

Implementation:Two passes, O(degree) (or O( log|F|) ) exponentiations.

A polynomial Q(·)

Q(x) and nothing else nothingInput:Output:

• Secure Function Evaluation (SFE) [Yao]• Oblivious Polynomial Evaluation [NP]

Page 35: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 36

March 1, 2005 10th Estonian Winter School in Computer Science

Computing random shares of lnx = ln(x1+x2)

Use Taylor approximation for lnx– x = x1 + x2 = 2 n (1+) -½ < < ½ – lnx = ln(2 n (1+)) = ln 2 n + ln(1+)

ln 2 n + i=1..k (-1) i-1 i / i

= ln 2 n + T()

• T() is a polynomial of degree k. Error is exponentially small in k.

• We only know how to work over finite fields• Compute c·lnx, where c compensates for fractions.• Work in F, where |F| sufficiently large.

Page 36: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 37

March 1, 2005 10th Estonian Winter School in Computer Science

ln(x1+x2) Protocol

• Step 1 of the protocol – Find n, – Apply Yao’s protocol to the following small circuit

• Input: x1 and x2• Output (random shares):

•random a1 and a2 s.t. a1 + a2 = x-2 n = ·2 n

•random b1 and b2 s.t. b1 + b2 = ln 2 n

• Operation: The protocol finds 2 n closest to x1+x2, computes 2 n = x1+x2- 2 n.

– x = x1 + x2 = 2 n + 2 n

– lnx = ln(2 n (1+)) = ln 2 n + ln(1+)

Page 37: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 38

March 1, 2005 10th Estonian Winter School in Computer Science

ln(x1+x2) Protocol (Cont.)

Step 2 of the protocol– Compute random shares of T() (Taylor approx.)

– P1 chooses a random w1 F and defines a polynomial Q(x), s.t. w1 +Q(a2) = T() (recall a1 + a2 = ·2 n)

– Namely, Q(x) = T( (a1+x)/2 n ) – w1 . – Run an oblivious poly evaluation in which P2

computes

w2 = Q(a2) = T() – w1 . – Now the parties have random w1 and w2 s.t.

– w1 + w2 = T() ln(1+)

– (b1 + w1) + (b2 + w2) ln 2 n + ln(1+) = ln x

Page 38: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 40

March 1, 2005 10th Estonian Winter School in Computer Science

The rest of the work..

• The parties compute shares of lnx• Then they compute shares of xlnx• Each party computes a share of the entropy by

summing shares of x lnx (H(X) = x lnx )• A small circuit finds the attribute giving the

minimal conditional entropy• The attribute is assigned to the node• The databases are divided according to the

value of this attribute

Page 39: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 41

March 1, 2005 10th Estonian Winter School in Computer Science

Efficiency

• lnx protocol: – secure computation of a small circuit– one oblivious polynomial evaluation

• ID3 for a database with: – 1,000,000 transactions– 15 attributes– 10 values per attribute– 4 class values

– Communication per node takes seconds (T1)

– Computation per node takes minutes (P3)

Page 40: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 42

March 1, 2005 10th Estonian Winter School in Computer Science

Contributions

• Cryptographic protocols where the bulk of the operations is done independently.

• Data mining– Rigorous model for secure data-mining.– Efficient, secure protocol for specific problems (median, ID3).

• Cryptography– Sub-linear complexity - secure computation for large data sets.– Efficient protocols for complex known algorithms.– Secure computation of logarithms (real function - numerical

analysis).• Drawbacks:

– Privacy preserving solutions are less efficient– It’s hard to find efficient private solutions for all interesting

functions – Security against malicious parties

Page 41: Privacy Preserving Data Mining Lecture 2 Cryptographic Solutions

page 43

March 1, 2005 10th Estonian Winter School in Computer Science

References

• Lecture notes and overview papers:– B. Pinkas, Cryptographic Techniques for Privacy-Preserving

Data Mining, SIGKDD Explorations, January 2003. http://www.pinkas.net/PAPERS/sigkdd.pdf

– R. Cramer: Introduction to Secure Computation, 2000. http://homepages.cwi.nl/~cramer/papers/CRAMER_revised.ps

– Ivan Damgård, Theory and practice of multiparty computation, 8th EWSCS, http://www.cs.ioc.ee/yik/schools/win2003/damgard.php

• Research papers:– G. Aggarwal, N. Mishra and B. Pinkas, Secure Computation of

the K'th-ranked Element, Eurocrypt '2004. http://www.pinkas.net/PAPERS/ANP04.pdf

– Y. Lindell and B. Pinkas, Privacy Preserving Data Mining, Journal of Cryptology, Vol. 15 – No. 3, 2002. http://www.pinkas.net/PAPERS/id3-final.pdf