Association Rule Mining with Privacy Preservation In Horizontally Distributed Databases Group 1 – Abhra Basak, Apoorva Kumar, Sachin K. Saini, Shiv Sankar, Suraj B. Malode

Privacy Preservation Issues in Association Rule Mining in Horizontally Partitioned Databases


Page 1: Privacy Preservation Issues in Association Rule Mining in Horizontally Partitioned Databases

Association Rule Mining with Privacy Preservation

In Horizontally Distributed Databases

Group 1 – Abhra Basak, Apoorva Kumar, Sachin K. Saini, Shiv Sankar, Suraj B. Malode

Page 2: Privacy Preservation Issues in Association Rule Mining in Horizontally Partitioned Databases

Introduction

Look before you leap

Page 3: Privacy Preservation Issues in Association Rule Mining in Horizontally Partitioned Databases

The Flow

Association Rule Mining

Privacy Preservation

Horizontally Distributed Datasets

Page 4: Privacy Preservation Issues in Association Rule Mining in Horizontally Partitioned Databases

Before we start mining!

trends or patterns in large datasets

extracting useful information

useful and unexpected

insights

analyzing and predicting system behavior

Data Mining

Scalability ?

Artificial Intelligence

Machine Learning

Statistics

Database Systems

Page 5: Privacy Preservation Issues in Association Rule Mining in Horizontally Partitioned Databases

Association Rule Learning

By Rakesh Agrawal, IBM Almaden Research Center

Page 6: Privacy Preservation Issues in Association Rule Mining in Horizontally Partitioned Databases

• 80% of people who buy bread + butter, buy milk

• {Bread, Butter} → {Milk}

What is an Association Rule?

Antecedent

Consequent

Antecedent

Consequent

Page 7: Privacy Preservation Issues in Association Rule Mining in Horizontally Partitioned Databases

Definitions

• 80% of people who buy bread + butter, buy milk

• {Bread, Butter} → {Milk}

Antecedent

• Prerequisites for the rule to be applied

Consequent

• The outcome

Support

• Percentage of transactions containing the itemset

Confidence

• Fraction of transactions containing the antecedent that also contain the consequent
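The two measures defined above can be computed directly from a list of transactions. The following is a minimal sketch; the `baskets` data and the helper names `support` and `confidence` are made up for illustration and chosen so that the bread and butter rule comes out at 80%.

```python
# Hypothetical market-basket data, invented for this illustration.
baskets = (
    [{"bread", "butter", "milk"}] * 4
    + [{"bread", "butter"}, {"bread"}, {"milk"},
       {"butter", "milk"}, {"jam"}, {"bread", "milk"}]
)

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support of the whole rule divided by support of the antecedent."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

rule_support = support({"bread", "butter", "milk"}, baskets)
rule_confidence = confidence({"bread", "butter"}, {"milk"}, baskets)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

Here {Bread, Butter} appears in 5 of the 10 baskets and 4 of those also contain Milk, which is where the 80% confidence comes from.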

Page 8: Privacy Preservation Issues in Association Rule Mining in Horizontally Partitioned Databases

• Two different forms of constraints are used to generate the required association rules

• Syntactic Constraints: Restricts the attributes that may be present in a rule.

• Support Constraints: Concern the number of transactions, out of the whole set of transactions, that support a rule.

Constraints

Page 9: Privacy Preservation Issues in Association Rule Mining in Horizontally Partitioned Databases

Association Rule Learning in Large Datasets

large datasets

• To find association rules

Generating Large Itemsets

• Combinations of items whose support is above a minimum support threshold

Generating Association Rules

• Mining all rules that hold within each large itemset
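The two steps on this slide can be sketched as a small level-wise search followed by rule generation. This is an illustrative, unoptimized sketch in the spirit of Apriori, not the implementation used by the authors; the function names, the `baskets` data and the thresholds are all assumptions.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Step 1: level-wise search for itemsets whose support >= min_support."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([i]) for i in items]
    frequent, size = {}, 1
    while candidates:
        level = {}
        for c in candidates:
            count = sum(1 for t in transactions if c <= t)
            if count / n >= min_support:
                level[c] = count
        frequent.update(level)
        size += 1
        # Join pairs of frequent itemsets into candidates one item larger.
        candidates = list({a | b for a in level for b in level
                           if len(a | b) == size})
    return frequent

def association_rules(frequent, min_confidence):
    """Step 2: split each large itemset into antecedent => consequent."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                conf = count / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((set(antecedent), set(itemset - antecedent), conf))
    return rules

# Hypothetical data and thresholds, for illustration only.
baskets = (
    [{"bread", "butter", "milk"}] * 4
    + [{"bread", "butter"}, {"bread"}, {"milk"},
       {"butter", "milk"}, {"jam"}, {"bread", "milk"}]
)
frequent = frequent_itemsets(baskets, min_support=0.4)
rules = association_rules(frequent, min_confidence=0.8)
```

The lookup `frequent[antecedent]` is safe because every subset of a large itemset is itself large, so it was already counted at an earlier level.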

Page 10: Privacy Preservation Issues in Association Rule Mining in Horizontally Partitioned Databases

Association Rule Learning in Distributed Datasets

And Privacy Preservation

Page 11: Privacy Preservation Issues in Association Rule Mining in Horizontally Partitioned Databases

• Most tools used for mining association rules assume that data to be analyzed can be collected at one central site.

• But issues like Privacy Preservation restrict the collection of data.

• Alternative mining methods have to be devised for distributed datasets to make the mining process feasible while ensuring privacy.

Preview

Page 12: Privacy Preservation Issues in Association Rule Mining in Horizontally Partitioned Databases

• Dataset: Combined data of Twitter and Facebook

• Rule: What percentage of people log in to a social networking site and post within the next 2 minutes?

Privacy Preservation

Page 13: Privacy Preservation Issues in Association Rule Mining in Horizontally Partitioned Databases

• Horizontally Partitioned (Example: Insurance Companies)

• Rule Being Mined: Does a procedure have an unusual rate of complication?

• Implications:

• A company may see a high number of cases where the procedure fails, and it may change its policies to help.

• At the same time, if this rule is exposed, it may be a huge problem for the company.

• The risks outweigh the gains.

Privacy Preservation

[Figure: Company A, Company B and Company C each hold their own table with the same columns: Patient ID, Disease, Prescription, Effect.]

Page 14: Privacy Preservation Issues in Association Rule Mining in Horizontally Partitioned Databases

• Vertically Partitioned

Privacy Preservation

Credit Card No.     Bought tablet
2365987545623526    1
3639871526589414    1
4365845698742563    1
5962845632561200    1
6621563289657412    1

Credit Card No.     Bought TCover
2365987545623526    0
7639871526589414    1
4365845698742563    1
9962845632561200    0
6621563289657412    1

Common Property

Not One We can exploit.

Page 15: Privacy Preservation Issues in Association Rule Mining in Horizontally Partitioned Databases

Mining of Association Rules

In Horizontally Partitioned Databases

Page 16: Privacy Preservation Issues in Association Rule Mining in Horizontally Partitioned Databases

What we want

• Compute association rules without revealing private information, obtaining:

• The global support

• The global confidence

What we have

• Only the following information is available:

• Local support

• Local confidence

• Size of the DB

Fundamental Steps

Even this information may not be shared freely between sites. But we’ll get to that.

Page 17: Privacy Preservation Issues in Association Rule Mining in Horizontally Partitioned Databases

Calculating Required Values

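\[
\mathrm{support}_{AB \Rightarrow C} = \frac{\sum_{i=1}^{\mathrm{sites}} \mathrm{support\_count}_{ABC}(i)}{\sum_{i=1}^{\mathrm{sites}} \mathrm{database\_size}(i)}
\]

\[
\mathrm{support}_{AB} = \frac{\sum_{i=1}^{\mathrm{sites}} \mathrm{support\_count}_{AB}(i)}{\sum_{i=1}^{\mathrm{sites}} \mathrm{database\_size}(i)}
\]

\[
\mathrm{confidence}_{AB \Rightarrow C} = \frac{\mathrm{support}_{AB \Rightarrow C}}{\mathrm{support}_{AB}}
\]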
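As a rough illustration of these formulas, the sketch below adds up per-site counts and database sizes in the clear, with no privacy protection at all. The ABC counts and sizes match the three-site example used later in the deck; the AB counts, the dictionary keys and the function name are assumptions added for illustration.

```python
def global_support_confidence(sites):
    """Combine per-site counts into global support and confidence."""
    total_size = sum(s["size"] for s in sites)
    count_abc = sum(s["count_abc"] for s in sites)
    count_ab = sum(s["count_ab"] for s in sites)
    support_rule = count_abc / total_size   # support of AB => C
    support_ab = count_ab / total_size      # support of AB
    return support_rule, support_ab, support_rule / support_ab

# count_ab values are hypothetical; the slides do not give them.
sites = [
    {"size": 100, "count_abc": 5, "count_ab": 10},
    {"size": 200, "count_abc": 6, "count_ab": 12},
    {"size": 300, "count_abc": 20, "count_ab": 28},
]
support_rule, support_ab, conf = global_support_confidence(sites)
print(f"support={support_rule:.4f} confidence={conf:.2f}")
```

Because both supports share the same denominator, the global confidence reduces to the ratio of the summed counts, here 31 / 50.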

Page 18: Privacy Preservation Issues in Association Rule Mining in Horizontally Partitioned Databases

• It protects individual privacy but each site has to disclose information.

• It reveals the local support and confidence in a rule at each site.

• This information, if revealed, can be harmful to an organization.

Problems with the approach

Page 19: Privacy Preservation Issues in Association Rule Mining in Horizontally Partitioned Databases

• We will explore two algorithms that have been used.

• The first algorithm combines encryption with data distortion when sharing data between sites.

• The second algorithm uses a particular secure sum (CK Secure Sum) as its method of protection.

Introducing the two Algorithms

Page 20: Privacy Preservation Issues in Association Rule Mining in Horizontally Partitioned Databases

Algorithm Uno

Some people are honest

Page 21: Privacy Preservation Issues in Association Rule Mining in Horizontally Partitioned Databases

• Phase 1: Uses encryption for mining of the large itemsets

• Phase 2: Uses a random number to preserve the privacy of each site (assuming a system of three or more parties)

Two-phase algorithm

Page 22: Privacy Preservation Issues in Association Rule Mining in Horizontally Partitioned Databases

Phase 1: Commutative Encryption
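The slide gives only the name of the technique, so the following is a hedged sketch of what "commutative" means in practice: encrypting with key A and then key B yields the same ciphertext as B and then A, so sites can compare encrypted itemsets without seeing each other's plaintext. Modular exponentiation over a prime (Pohlig-Hellman style) is one well-known cipher with this property; whether the algorithm in this deck uses that exact cipher is an assumption, and the prime, key generation and `encode` helper are toy choices for illustration.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # a Mersenne prime; toy-sized, for illustration only

def make_key():
    """Pick a secret exponent that is invertible modulo P - 1."""
    while True:
        k = secrets.randbelow(P - 3) + 2  # value in 2 .. P - 2
        if math.gcd(k, P - 1) == 1:
            return k

def encode(itemset):
    """Hash an itemset to an integer in [1, P - 1] (hypothetical encoding)."""
    digest = hashlib.sha256(",".join(sorted(itemset)).encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1

def encrypt(value, key):
    return pow(value, key, P)

def decrypt(value, key):
    # The inverse exponent undoes the encryption (Fermat's little theorem).
    return pow(value, pow(key, -1, P - 1), P)

key_a, key_b = make_key(), make_key()
x = encode({"A", "B", "C"})
ab = encrypt(encrypt(x, key_a), key_b)
ba = encrypt(encrypt(x, key_b), key_a)
print(ab == ba)
```

Since the hash is one-way, each site in this sketch would need to keep its own table from encoded values back to itemsets.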

Page 23: Privacy Preservation Issues in Association Rule Mining in Horizontally Partitioned Databases

Phase 2: Data Distortion

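Site A: ABC count = 5, Size = 100

Site B: ABC count = 6, Size = 200

Site C: ABC count = 20, Size = 300

Minimum support = 5%; Site A picks a random number R = 17

Site A sends to Site B: R + count - 5% * Size = 17 + 5 - 5% * 100 = 17

Site B sends to Site C: 17 + 6 - 5% * 200 = 13

Site C sends to Site A: 13 + 20 - 5% * 300 = 18

Site A checks: 18 >= R (17), so ABC is globally supported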
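The arithmetic on this slide can be replayed with a short simulation. This is a simplified sketch: the function name is hypothetical, the random number is passed in rather than generated so the slide's values can be reproduced, and the final comparison is done in the clear. A real deployment would likely need extra protection (for example modular arithmetic) that is left out here.

```python
from fractions import Fraction

def ring_support_check(sites, min_support, r):
    """sites: (count, size) pairs in ring order; the first site starts with r."""
    running = Fraction(r)
    messages = []
    for count, size in sites:
        # Each site adds its local excess support: count - min_support * size.
        running += count - min_support * size
        messages.append(running)
    # Back at the first site: the itemset is globally supported if the
    # accumulated excess is non-negative, i.e. the total did not fall below r.
    return running >= r, messages

sites = [(5, 100), (6, 200), (20, 300)]  # Sites A, B and C from the slide
supported, messages = ring_support_check(sites, Fraction(5, 100), r=17)
print(supported, [int(m) for m in messages])
```

Each site only ever sees a running total masked by R, never another site's count or size on its own.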

Page 24: Privacy Preservation Issues in Association Rule Mining in Horizontally Partitioned Databases

• Doesn’t work for a 2 party system

• Assumes honest parties

• Assumes Boolean (yes/no) support for a rule rather than a subjective or weighted approach.

• As the number of candidate itemsets increases, the encryption overhead increases.

• The encryption overhead is also directly proportional to the number of sites or partitions.

Problems with the Algorithm

I got ……

Page 25: Privacy Preservation Issues in Association Rule Mining in Horizontally Partitioned Databases

Algorithm Dua

Don’t trust anyone

Page 26: Privacy Preservation Issues in Association Rule Mining in Horizontally Partitioned Databases

• Primarily used to tackle semi-honest sites.

• Data of each site is broken down into segments.

• The two neighbors on either side of a node may collude to work out the value of the node between them.

• The neighbors are changed for each round. Hence, they can obtain only one such segment.

CK Secure Sum
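The bullets above can be sketched as follows: each site splits its value into k segments, one segment per site is summed in each round, and the ring order changes between rounds so that the same two neighbors never surround a site twice in a row. This is a loose illustration under stated assumptions, not the published protocol: the random shuffle stands in for whatever neighbor-changing rule the scheme actually uses, and all function names and values are hypothetical.

```python
import random

def split_into_segments(value, k, rng):
    """Break an integer into k random integer segments that sum to value."""
    parts = [rng.randint(-1000, 1000) for _ in range(k - 1)]
    parts.append(value - sum(parts))
    return parts

def segmented_secure_sum(values, k, rng=None):
    rng = rng or random.Random()
    segments = [split_into_segments(v, k, rng) for v in values]
    order = list(range(len(values)))
    total = 0
    for round_no in range(k):
        # One round: a running sum travels the ring and each site adds
        # only its segment for this round.
        running = 0
        for site in order:
            running += segments[site][round_no]
        total += running
        # Change the neighbors before the next round; the initiating
        # site stays first (assumption: a simple random reshuffle).
        rest = order[1:]
        rng.shuffle(rest)
        order = [order[0]] + rest
    return total

# Hypothetical local counts held by three sites.
total = segmented_secure_sum([5, 6, 20], k=3, rng=random.Random(0))
print(total)
```

A pair of colluding neighbors in one round learns at most one segment of the site between them, which on its own reveals nothing about that site's full value.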

Page 27: Privacy Preservation Issues in Association Rule Mining in Horizontally Partitioned Databases

Changing Neighbors

Round 1: P1, P2, P3, P4

Round 2: P1, P2, P4, P3

Round 3: P1, P4, P2, P3

Page 28: Privacy Preservation Issues in Association Rule Mining in Horizontally Partitioned Databases

Conclusion

The moral of the story...

Page 29: Privacy Preservation Issues in Association Rule Mining in Horizontally Partitioned Databases

Before you leave

• It is interesting that association rules play a vital role in data mining.

• Through this, what appears to be unrelated can have a logical explanation through careful analysis.

• This aspect of data mining can be very useful in predicting patterns and foreseeing trends in consumer behavior, choices and preferences.

• Association rules are indeed one of the best ways to succeed in business and enjoy the harvest from data mining.

Page 30: Privacy Preservation Issues in Association Rule Mining in Horizontally Partitioned Databases

There are no dumb questions

(No questions please shhhh…)