Sharing Sensitive Data Securely

Managing Security – Sharing Data
Ted Dunning
© 2014 MapR Technologies

Agenda
• Two kinds of security failure
  – Buried treasure
• But what could go wrong?
  – Horror stories
• Sharing into controlled environments
  – Views, masking and fine-grained control
• Sharing without sharing
  – When masking is not sufficient
• Summary

Locked Up Tight – The Cheapside Hoard
• Between 1640 and 1666, somebody hid a cache of jewels under the floor of 30-32 Cheapside, London
• They never came back for them …
• The hoard was found by workmen in 1912
• Did the owners forget where they were?
• Why didn’t their heirs or partners recover them?

The Other Kind of Security Failure
• Security can fail when there is a leak
  – Enigma decryption
  – Retail data compromise
  – Klaus Fuchs
• Security also fails when data is not shared
  – AKA siloing
  – The many threads of 9/11
  – The Cheapside hoard
  – Invisible technological opportunity cost

Netflix
• Shared anonymized data
• Huge boost in the state of the art for some kinds of recommendations
• Anonymization shown to be a weak barrier
• Lawsuit, security clamp-down everywhere

Reference Data Attack

The Moral

• If there is something to correlate, anonymization may fail

• When I say “may”, you should read “will”
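The figure for the reference data attack did not survive, but the mechanism can be sketched in Python: an "anonymized" release keeps quasi-identifiers (here, which movie was rated on which day), and a public source that shares a few of those events is enough to link an opaque ID back to a name. All data below is made up, and `link` and `day_slack` are hypothetical names; this is a toy illustration of record linkage, not a reconstruction of the published Netflix analysis.

```python
from collections import Counter

# Hypothetical "anonymized" release: opaque user id -> set of (movie, day) events.
ANONYMIZED = {
    "u1001": {("Movie A", 3), ("Movie B", 7), ("Movie C", 12), ("Movie D", 20)},
    "u1002": {("Movie A", 5), ("Movie E", 9), ("Movie F", 14)},
    "u1003": {("Movie B", 7), ("Movie G", 11), ("Movie H", 18)},
}

# Hypothetical public reference data: reviews posted under real names.
PUBLIC_REVIEWS = [
    ("Alice", "Movie B", 7),
    ("Alice", "Movie C", 12),
    ("Alice", "Movie D", 21),  # public dates may be off by a day or so
    ("Bob", "Movie E", 9),
]


def link(anonymized, public_reviews, day_slack=1):
    """For each public name, return the best-matching anonymized id and its match count."""
    by_name = {}
    for name, movie, day in public_reviews:
        by_name.setdefault(name, []).append((movie, day))
    matches = {}
    for name, events in by_name.items():
        scores = Counter()
        for uid, history in anonymized.items():
            for movie, day in events:
                # an event matches if the same movie appears within day_slack days
                if any(m == movie and abs(d - day) <= day_slack for m, d in history):
                    scores[uid] += 1
        matches[name] = scores.most_common(1)[0] if scores else None
    return matches


print(link(ANONYMIZED, PUBLIC_REVIEWS))
# {'Alice': ('u1001', 3), 'Bob': ('u1002', 1)}
```

Three loosely timed public reviews are enough to single out one opaque ID, which is the sense in which "may" should be read as "will".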

NY Cab
• Hack license and medallion numbers hashed using MD5
• No correlation data to work with
• But cab (medallion) numbers have only a few forms
• So we can generate hashes for all 20 million (or so) possible medallions
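The enumeration described above is short to write down. A Python sketch, assuming two illustrative medallion formats (a digit, a letter and two digits; or two letters and three digits); the slide does not list the real formats, and these two cover only 702,000 candidates rather than the full ~20 million.

```python
import hashlib
import itertools
import string

# Assumed medallion formats, for illustration only: "9X99" and "XX999",
# where "9" stands for any digit and "X" for any uppercase letter.
FORMATS = ["9X99", "XX999"]


def expand(fmt):
    """Yield every string that matches a format such as '9X99'."""
    pools = [string.digits if c == "9" else string.ascii_uppercase for c in fmt]
    for chars in itertools.product(*pools):
        yield "".join(chars)


def crack(target_hashes, formats=FORMATS):
    """Map each target MD5 hex digest to its plaintext if some candidate produces it."""
    targets = set(target_hashes)
    found = {}
    for fmt in formats:
        for candidate in expand(fmt):
            digest = hashlib.md5(candidate.encode("ascii")).hexdigest()
            if digest in targets:
                found[digest] = candidate
                if len(found) == len(targets):
                    return found  # every target recovered; stop early
    return found


released = hashlib.md5(b"7K42").hexdigest()  # what the "anonymized" file would contain
print(crack([released])[released])  # 7K42
```

The loop never needs a stored table: it streams candidates and checks them against the set of released hashes, so covering a larger candidate space only costs time. An unkeyed hash over a small, structured domain is therefore not a mask at all.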

So What?
• What correlations are there?
• NYC medallions are public information anyway
• Taxis operate in the public realm

Paparazzo + Timestamp + Taxi = Who and Where

See http://gawker.com/the-public-nyc-taxicab-database-that-accidentally-track-1646724546 and http://research.neustar.biz/2014/09/15/riding-with-the-stars-passenger-privacy-in-the-nyc-taxicab-dataset/
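The correlation that breaks the taxi data is a join on time and place. A minimal Python sketch, assuming trip records carry a pickup time, pickup and dropoff coordinates and a tip, and that a photo's timestamp and location are known: keep the trips that start within a few minutes and about 100 m of the photo. The `Trip` record, its field names, the window and the radius are assumptions for illustration, and every record is made up.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Trip:
    pickup_time: datetime
    pickup: tuple   # (lat, lon)
    dropoff: tuple  # (lat, lon)
    tip: float


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000 * math.asin(math.sqrt(h))


def candidate_trips(trips, photo_time, photo_place,
                    window=timedelta(minutes=5), radius_m=100):
    """Trips whose pickup is close to a photo in both time and space."""
    return [t for t in trips
            if abs(t.pickup_time - photo_time) <= window
            and haversine_m(t.pickup, photo_place) <= radius_m]


# Made-up trip records and a made-up photo (time and place it was taken).
trips = [
    Trip(datetime(2013, 7, 8, 19, 42), (40.7420, -73.9890), (40.7800, -73.9500), 0.0),
    Trip(datetime(2013, 7, 8, 19, 44), (40.7600, -73.9800), (40.7000, -74.0100), 2.5),
    Trip(datetime(2013, 7, 8, 21, 10), (40.7421, -73.9889), (40.7300, -73.9900), 1.0),
]
photo_time, photo_place = datetime(2013, 7, 8, 19, 40), (40.7421, -73.9891)

for trip in candidate_trips(trips, photo_time, photo_place):
    print("dropoff:", trip.dropoff, "tip:", trip.tip)  # dropoff: (40.78, -73.95) tip: 0.0
```

Hashing the medallion does nothing to stop this join, because the join never touches the medallion.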

Extended Moral
• Correlations are more common than we thought
• Masking PII is not sufficient for public datasets
• Theoretically, no solution is possible
• Pragmatically, never bet against cleverness
• Must change the game

Alternative Strategies

Public disclosure + Simple masking

Key Elements of Masking
• Opaque or format-preserving?
• Random or reversible or one-way?
• Simple omission?
• Right to be forgotten?
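The masking choices above can be made concrete with Python's standard library. This is a sketch under stated assumptions: `KEY`, the function names and the sample record are hypothetical, the keyed variants use HMAC-SHA256, and the format-preserving variant is a simple one-way digit substitution, not real format-preserving encryption.

```python
import hashlib
import hmac
import secrets

# Hypothetical masking key; in a real deployment it would live in a key store.
KEY = secrets.token_bytes(32)


def mask_random(value, vault):
    """Opaque and random; reversible only by whoever holds the vault (token -> value)."""
    token = secrets.token_hex(8)
    vault[token] = value
    return token


def mask_one_way(value, key=KEY):
    """Opaque and one-way: keyed HMAC-SHA256, so equal inputs still join on equal tokens."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def mask_format_preserving(value, key=KEY):
    """Keeps length and punctuation, replacing each digit with a keyed pseudo-random digit.
    This is one-way and may collide; it is NOT format-preserving encryption (e.g. NIST FF1)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    return "".join(str(digest[i % len(digest)] % 10) if ch.isdigit() else ch
                   for i, ch in enumerate(value))


def omit(record, fields):
    """Simple omission: drop the sensitive fields entirely."""
    return {k: v for k, v in record.items() if k not in fields}


vault = {}
record = {"name": "Jane Doe", "ssn": "555-12-3456", "zip": "02630"}
print(mask_random(record["name"], vault))     # random hex token, different every run
print(mask_one_way(record["name"]))           # stable for a given KEY
print(mask_format_preserving(record["ssn"]))  # same ddd-dd-dddd shape
print(omit(record, {"ssn"}))                  # {'name': 'Jane Doe', 'zip': '02630'}
```

For the random variant, deleting a person's vault entry severs their token from their identity, which is one way to honor a right to be forgotten. The keyed variants depend entirely on keeping `KEY` secret: anyone holding it can re-hash candidate values, exactly as an unkeyed hash can be enumerated.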

Releasing Public Data
• Why?
  – Required
  – For research
  – For support
• How?
  – New technology based on KPI-preserving random data
• Three use cases

Secure Development is Hard

Outside collaborators are outside the security perimeter

They can’t see the data and they can’t tune new algorithms to fit reality

How To Make Realistic Data

Parametric Simulation

Parametric matching of failure signatures allows emulation of complex data properties

Matching on KPIs and failure modes guarantees practical fidelity

The Method
• Pick realistic and important KPIs and failure measures
  – False positive rate
  – Scale-invariant score distribution
  – Internal performance metrics (# of candidates searched, and similar)
• Build an emulation roughly based on the real system
• Tune the data spec to match KPIs using real models
• Export the data spec to alternative models
• Re-tune the data spec to match on alternative models
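The tuning steps above amount to a search over the data spec. A minimal Python sketch, assuming a single knob (a Zipf-like `skew` on one key column) and a single KPI (the share of records that land on the hottest key); `make_data`, `top_key_share`, `tune_skew` and the 0.20 target are hypothetical. In the method described, the KPI would come from running the real model or system on the generated data, with several knobs and KPIs tuned together.

```python
import bisect
import random
from collections import Counter


def make_data(spec, n_records, seed=42):
    """Hypothetical generator: keys 1..n_keys drawn with Zipf-like weights k**-skew."""
    n_keys, skew = spec["n_keys"], spec["skew"]
    weights = [k ** -skew for k in range(1, n_keys + 1)]
    total = sum(weights)
    cdf, acc = [], 0.0
    for w in weights:
        acc += w / total
        cdf.append(acc)
    # A fixed seed reuses the same random draws for every candidate spec
    # ("common random numbers"), so the KPI changes only because the spec does.
    rng = random.Random(seed)
    return [min(bisect.bisect_left(cdf, rng.random()), n_keys - 1) + 1
            for _ in range(n_records)]


def top_key_share(records):
    """KPI: fraction of records that land on the single most frequent key."""
    return Counter(records).most_common(1)[0][1] / len(records)


def tune_skew(target, spec, n_records=20_000, lo=0.0, hi=3.0, iters=30):
    """Bisect on skew until the synthetic KPI matches the target KPI.
    Assumes the KPI grows with skew and the target lies between the KPI at lo and at hi."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if top_key_share(make_data({**spec, "skew": mid}, n_records)) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Suppose the real system reports that 20% of records hit the hottest key.
skew = tune_skew(0.20, {"n_keys": 1000})
print(round(skew, 3), top_key_share(make_data({"n_keys": 1000, "skew": skew}, 20_000)))
```

Bisection works here only because this KPI moves in one direction as `skew` grows; with several KPIs, a general optimizer over a combined mismatch score would take its place. Re-tuning for an alternative model is the same loop with that model's KPI plugged in.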

Example #1 – Query failure
• Performance index is query failure with a particular stack signature
• Tuning knobs include
  – Table sizes
  – Data distributions
  – (potentially) field value realism
  – (potentially) field cross-correlations

The Original Conversation

Them: Hive broke, fix it.
Us: Sure! Can I see the data?
Them: No.
Us: OK. Can I see the stack trace?
Them: No.
Us: Can I log in to the system?
Them: No.
Us: What do you want me to do?
Them: Fix it.

The Broken Query

A Simpler Example Schema

A Simpler Example

[ {"name":"customer_id", "class":"id"},
  {"name":"name", "class":"name", "type":"first_last"},
  {"name":"street", "class":"address"},
  {"class":"flatten", "value": {"class":"zip", "fields":"city,state,zip"}} ]

[ {"name":"sales_id", "class":"id"},
  {"name":"customer_id", "class":"foreign-key", "size":"$customers"},
  {"name":"time_id", "class":"foreign-key", "size":"$times"},
  {"name":"store_id", "class":"foreign-key", "size":"$stores"},
  {"name":"item_id", "class":"foreign-key", "size":"$items"},
  {"name":"quantity", "class":"int", "skew":0.5},
  {"name":"unit_price", "class":"gamma", "dof":1, "scale":10},
  {"name":"discount", "class":"uniform", "min":0, "max":20},
  {"name":"exact_time", "class":"event", "start":"2014-01-01", "format":"yyyy-MM-dd HH:mm:ss", "rate":"10/d"} ]
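A schema like this is interpreted field by field: each entry names a sampler class and its parameters, and `$customers`-style sizes tie a foreign key to the size of another table. The Python sketch below is a hypothetical interpreter for four of the classes (`id`, `foreign-key`, `gamma`, `uniform`). It is not the generator used in the talk, and the meaning assigned to each parameter (for example, treating `dof` as the gamma shape) is an assumption.

```python
import itertools
import json
import random

# A subset of the sales schema above; the other field classes are left out of this sketch.
SCHEMA = json.loads("""
[ {"name":"sales_id", "class":"id"},
  {"name":"customer_id", "class":"foreign-key", "size":"$customers"},
  {"name":"unit_price", "class":"gamma", "dof":1, "scale":10},
  {"name":"discount", "class":"uniform", "min":0, "max":20} ]
""")


def make_sampler(field, sizes, rng):
    """Return a zero-argument function that produces one value for `field`.
    The meaning given to each class and parameter is an assumption."""
    cls = field["class"]
    if cls == "id":
        counter = itertools.count()
        return lambda: next(counter)                      # 0, 1, 2, ...
    if cls == "foreign-key":
        size = field["size"]                              # "$customers" -> sizes["customers"]
        n = sizes[size[1:]] if str(size).startswith("$") else int(size)
        return lambda: rng.randrange(n)                   # uniform over referenced ids
    if cls == "gamma":
        # assumed: "dof" is the gamma shape and "scale" the scale parameter
        return lambda: rng.gammavariate(field["dof"], field["scale"])
    if cls == "uniform":
        return lambda: rng.uniform(field["min"], field["max"])
    raise ValueError(f"class {cls!r} is not supported in this sketch")


def generate(schema, n_rows, sizes, seed=0):
    """Generate n_rows records; `sizes` holds the table sizes that "$name" refers to."""
    rng = random.Random(seed)
    samplers = [(f["name"], make_sampler(f, sizes, rng)) for f in schema]
    return [{name: sample() for name, sample in samplers} for _ in range(n_rows)]


for row in generate(SCHEMA, 3, sizes={"customers": 100}):
    print(row)
```

Table sizes (`sizes`) and distribution parameters are the same tuning knobs listed for Example #1: changing them and regenerating is one step of the tuning loop.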

Data Flow

Sample Data

customer_id,name,street,zip,city,state
0,"Mark Long","8578 Pied River Flats","02630","BARNSTABLE","MA"
1,"Chris Lanier","90018 Lost Treasure Corner","06083","ENFIELD","CT"
2,"Bryant Brandon","30712 Bright Shadow Stroll","93922","CARMEL","CA"
3,"Norman Horn","66871 Dewy Bird Shoal","59727","DIVIDE","MT"
4,"Carmen Nowell","6053 Velvet Barn Glen","29329","CONVERSE","SC"

Results
• We had to match size, number of records and rough levels of skew
• Bug was in the query planner
  – For particular values of relative table size, the planner messed up
• Once we had the fault, we could slim down the tables
  – Final example had 3 tables, 1000 records in the largest
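Slimming the tables down once the fault is known can be automated when the failure is cheap to re-run. A minimal Python sketch, assuming a hypothetical `reproduces(scale)` check that regenerates every table at a common scale factor and reports whether the failure appears, and assuming failure is monotone in scale. The slides do not say how the reduction was actually done; the base sizes and the 7,000-row threshold are invented.

```python
# Hypothetical relative table sizes per unit of scale; scaling all tables together
# keeps their relative sizes, which is what the planner bug depended on.
BASE_ROWS = {"sales": 1000, "customers": 50, "items": 10}


def table_sizes(scale):
    return {name: rows * scale for name, rows in BASE_ROWS.items()}


def smallest_failing_scale(reproduces, lo, hi):
    """Binary search for the smallest integer scale in (lo, hi] where reproduces(scale) holds.
    Assumes reproduces(lo) is False, reproduces(hi) is True, and failure is monotone in scale."""
    if not reproduces(hi):
        raise ValueError("failure does not reproduce at the upper bound")
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if reproduces(mid):
            hi = mid
        else:
            lo = mid
    return hi


def fake_reproduces(scale):
    # Stand-in for: generate tables at this scale, run the query, look for the stack signature.
    return table_sizes(scale)["sales"] >= 7000


best = smallest_failing_scale(fake_reproduces, 0, 1000)
print(best, table_sizes(best))  # 7 {'sales': 7000, 'customers': 350, 'items': 70}
```

Each probe is a full regenerate-and-run cycle, so the logarithmic number of probes is what makes this practical on real query engines.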

Common Point of Compromise
• Scenario:
  – Merchant 0 is compromised and leaks account data during the compromise
  – Fraud is committed elsewhere during the exploit
  – High background level of fraud
  – Limited detection rate for exploits
• Goal:
  – Find merchant 0
• Meta-goal:
  – Screen algorithms for this task without leaking sensitive data
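The slides do not say which algorithm found merchant 0. One standard screen for a common point of compromise asks, for each merchant, whether accounts that shopped there before their first detected fraud are over-represented among defrauded accounts. The Python sketch below scores that with the log-likelihood ratio (G²) test on a 2×2 contingency table. The tuple layout `(time, account, merchant, detected_fraud)`, the function names and the toy data are assumptions.

```python
import math
from collections import defaultdict


def g2(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) statistic for the 2x2 table [[k11, k12], [k21, k22]]."""
    n = k11 + k12 + k21 + k22
    rows, cols = (k11 + k12, k21 + k22), (k11 + k21, k12 + k22)
    cells = ((k11, rows[0], cols[0]), (k12, rows[0], cols[1]),
             (k21, rows[1], cols[0]), (k22, rows[1], cols[1]))
    # sum of k * ln(observed / expected), skipping empty cells (0 * ln 0 = 0)
    return 2.0 * sum(k * math.log(k * n / (r * c)) for k, r, c in cells if k > 0)


def rank_merchants(transactions):
    """transactions: (time, account, merchant, detected_fraud) tuples.
    Returns (merchant, signed G^2) pairs, most suspicious first."""
    txs = sorted(transactions)
    first_fraud = {}
    for t, acct, _, fraud in txs:
        if fraud and acct not in first_fraud:
            first_fraud[acct] = t
    accounts = {acct for _, acct, _, _ in txs}
    visitors = defaultdict(set)          # merchant -> accounts seen there before any fraud
    for t, acct, merchant, _ in txs:
        if t < first_fraud.get(acct, math.inf):
            visitors[merchant].add(acct)
    n_fraud = len(first_fraud)
    n_clean = len(accounts) - n_fraud
    scores = {}
    for merchant, accts in visitors.items():
        k11 = sum(1 for a in accts if a in first_fraud)   # visited, later defrauded
        k12 = len(accts) - k11                             # visited, never defrauded
        k21 = n_fraud - k11                                # did not visit, defrauded
        k22 = n_clean - k12                                # did not visit, never defrauded
        score = g2(k11, k12, k21, k22)
        if k11 * (k21 + k22) < k21 * (k11 + k12):         # fraud rarer among visitors
            score = -score
        scores[merchant] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


toy = [
    (1, "a1", "M0", False), (1, "a2", "M0", False), (1, "a3", "M0", False),
    (2, "a1", "M1", False), (2, "a4", "M1", False), (2, "a5", "M1", False),
    (2, "a6", "M1", False), (3, "a2", "M2", False), (3, "a5", "M2", False),
    (5, "a1", "M9", True), (6, "a2", "M9", True), (7, "a3", "M8", True),
]
print([(m, round(s, 3)) for m, s in rank_merchants(toy)])
# [('M0', 8.318), ('M2', 0.0), ('M1', -3.819)]
```

The score is signed so that merchants whose visitors are defrauded less often than average sink to the bottom instead of ranking as "significant". Because it needs only the transaction log, a screen like this can be tuned on synthetic data and then run unchanged on live data.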

Simulation Setup

Simulation Strategy
• For each consumer
  – Pick consumer parameters such as transaction rate and preferences
  – Generate transactions until the end of sim-time
• If a transaction is at merchant 0 during the compromise window, possibly mark the account as compromised
• For all transactions, possibly mark as fraud; the probability depends on history
• Merchants are selected using a hierarchical Pitman-Yor process
• Restate data
  – Flatten transaction streams
  – Sort by time
• Tunables
  – Compromise probability, transaction rates, background fraud, detection probability
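A compact Python sketch of this strategy, with several loud assumptions. Merchant choice uses a two-level Pitman-Yor process written as a Chinese restaurant franchise: each consumer has a private restaurant whose new tables draw merchants from a shared one. A compromise at merchant 0 within a fixed window raises the account's later fraud probability, and fraud is detected with a fixed probability. Every parameter value and name is hypothetical and chosen only so the example runs quickly; the deck does not give the real settings.

```python
import itertools
import random


class PitmanYor:
    """Chinese-restaurant sampler for a Pitman-Yor process with discount d and
    concentration theta. New tables take their label from base()."""

    def __init__(self, d, theta, base, rng):
        self.d, self.theta, self.base, self.rng = d, theta, base, rng
        self.tables = []   # [label, customer count] pairs
        self.n = 0

    def draw(self):
        # existing table t has weight count_t - d; a new table has theta + d * len(tables)
        r = self.rng.uniform(0, self.theta + self.n)
        self.n += 1
        for table in self.tables:
            r -= table[1] - self.d
            if r < 0:
                table[1] += 1
                return table[0]
        label = self.base()
        self.tables.append([label, 1])
        return label


def simulate(n_consumers=200, end_time=100.0, window=(20.0, 40.0), p_compromise=0.5,
             p_fraud_compromised=0.05, p_fraud_background=0.002, p_detect=0.5, seed=1):
    """Return (time, consumer, merchant, detected_fraud) tuples sorted by time."""
    rng = random.Random(seed)
    merchant_ids = itertools.count(0)   # merchant 0 is the first merchant ever created
    merchants = PitmanYor(0.5, 5.0, lambda: next(merchant_ids), rng)
    txs = []
    for consumer in range(n_consumers):
        rate = rng.gammavariate(2.0, 0.5)                 # mean 1 transaction per time unit
        prefs = PitmanYor(0.2, 1.0, merchants.draw, rng)  # per-consumer level of the hierarchy
        t, compromised = 0.0, False
        while True:
            t += rng.expovariate(rate)
            if t > end_time:
                break
            merchant = prefs.draw()
            if merchant == 0 and window[0] <= t <= window[1] and rng.random() < p_compromise:
                compromised = True
            p_fraud = p_fraud_compromised if compromised else p_fraud_background
            fraud = rng.random() < p_fraud
            txs.append((t, consumer, merchant, fraud and rng.random() < p_detect))
    txs.sort()   # restate: flatten all consumer streams into one stream sorted by time
    return txs


txs = simulate()
print(len(txs), "transactions,", sum(1 for *_, d in txs if d), "detected frauds,",
      len({m for _, _, m, _ in txs}), "merchants")
```

Making merchant 0 the first merchant the shared restaurant creates is a deliberate shortcut: under Pitman-Yor's rich-get-richer dynamics it tends to end up popular, so the compromise touches enough accounts to be findable.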

Performance Indicators to Match
• User and merchant population
• Transaction count/consumer
• Merchant propensity skew
• Level of detected fraud
• Spectrum of meta-model scores
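Most of these indicators are cheap aggregates over the transaction log, so under this approach the same code could run on real data inside the security perimeter and on synthetic data outside it, with only the numbers crossing over. A Python sketch, assuming `(time, consumer, merchant, detected_fraud)` tuples; measuring propensity skew as the traffic share of the top 1% of merchants is an assumption, and the spectrum of meta-model scores is left out because it needs the model itself.

```python
from collections import Counter


def kpis(transactions, top_fraction=0.01):
    """Summary KPIs for (time, consumer, merchant, detected_fraud) tuples."""
    per_consumer = Counter(c for _, c, _, _ in transactions)
    per_merchant = Counter(m for _, _, m, _ in transactions)
    n = len(transactions)
    k_top = max(1, int(len(per_merchant) * top_fraction))
    return {
        "consumers": len(per_consumer),
        "merchants": len(per_merchant),
        "tx_per_consumer": n / len(per_consumer),
        # propensity skew, measured here as the share of traffic at the top 1% of merchants
        "top_merchant_share": sum(c for _, c in per_merchant.most_common(k_top)) / n,
        "detected_fraud_rate": sum(1 for *_, detected in transactions if detected) / n,
    }


def kpi_mismatch(synthetic, real):
    """Largest relative gap over the KPIs that are nonzero in `real` (0.0 = perfect match)."""
    return max((abs(synthetic[k] - real[k]) / abs(real[k]) for k in real if real[k]),
               default=0.0)


toy = [(1, "c1", "m1", False), (2, "c1", "m2", True),
       (3, "c2", "m1", False), (4, "c3", "m1", False)]
print(kpis(toy))
```

A single worst-case relative gap is a blunt but easy-to-read acceptance test; a weighted sum would work as well if some KPIs matter more than others.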

Real bad guys

Results
• We matched the general mechanism and rough transaction rates
• Model was tuned on synthetic data, tested on live data
• We found real bad guys on the first try

Summary
• Security can fail through too much and too little access
• Sharing widely can have significant benefits and substantial risks
• New levels of control are available for masking and filtering of big data via Drill views
• Synthetic data with KPI matching allows sharing of realistic data without risk

Questions

Thank You

@mapr maprtech

Ted Dunning, Chief Application Architect

MapR Technologies

maprtech

mapr-technologies