Privacy-preserving Release of Statistics: Differential Privacy · 10-10-2018


Privacy-preserving Release of Statistics: Differential Privacy

Piotr Mardziel for Anupam Datta, CMU

Fall 2018

18734: Foundations of Privacy

Privacy-Preserving Statistics: Non-Interactive Setting

Goals:
• Accurate statistics (low noise)
• Preserve individual privacy (what does that mean?)

Individuals x1 … xn contribute to a database D maintained by a trusted curator (census data, health data, network data, …). The curator releases a sanitized database D′ to the analyst, produced by adding noise, sampling, generalizing, and suppressing.

Privacy-Preserving Statistics: Interactive Setting

Goals:
• Accurate statistics (low noise)
• Preserve individual privacy (what does that mean?)

Individuals x1 … xn contribute to a database D maintained by a trusted curator (census data, health data, network data, …). The analyst submits a query f, e.g. "# individuals with salary > $30K", and receives f(D) + noise.

Some possible defenses

• Anonymize data – defeated by re-identification and information amplification
• Queries over large datasets only – defeated by differencing attacks
• Query auditing – refusal itself leaks information; auditing is computationally intractable
• Summary statistics – defeated via frequency lists
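The differencing attack against the second defense is easy to make concrete: two queries that each cover a large set can be subtracted to isolate one person's value. A minimal sketch (the names and salaries are illustrative, not from the lecture):

```python
# Differencing attack: two aggregate queries over large sets reveal one record.
# Names and salaries below are illustrative.
salaries = {"alice": 45_000, "bob": 28_000, "carol": 62_000, "dave": 31_000}

def total_salary(db):
    """An innocuous-looking aggregate query."""
    return sum(db.values())

q1 = total_salary(salaries)                                           # everyone
q2 = total_salary({k: v for k, v in salaries.items() if k != "bob"})  # everyone but Bob

# Subtracting the two large-set aggregates isolates Bob's exact salary.
print(q1 - q2)  # 28000
```

Both queries individually range over "large" sets, so a size threshold on queries does not stop the attack.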

Classical Intuition for Privacy

• "If the release of statistics S makes it possible to determine the value [of private information] more accurately than is possible without access to S, a disclosure has taken place." [Dalenius 1977]
  – Privacy means that anything that can be learned about a respondent from the statistical database can be learned without access to the database

• Similar to semantic security of encryption

Impossibility Result [Dwork, Naor 2006]

• Result: For any reasonable notion of "breach", if the sanitized database contains information about the database, then some adversary breaks this definition

• Example
  – Terry Gross is two inches shorter than the average Lithuanian woman
  – The DB allows computing the average height of a Lithuanian woman
  – This DB breaks Terry Gross's privacy according to this definition … even if her record is not in the database!

Very Informal Proof Sketch

• Suppose DB is uniformly random
• "Breach" is predicting a predicate g(DB)
  – Example: g(DB) = "Terry Gross's height = 6 feet"
• Adversary's background knowledge:
  – r, [H(r; San(DB)) ⊕ g(DB)], where H is a suitable hash function and r = H(DB)
  – Example: "Terry Gross is two inches shorter than the average Lithuanian woman"
• By itself, this background knowledge does not leak anything about DB
• Together with San(DB), it reveals g(DB)
  – Example: San(DB) = "average height of a Lithuanian woman"

Differential Privacy: Idea

The released statistic is about the same if any individual's record is removed from the database.

[Dwork, McSherry, Nissim, Smith 2006]

An Information Flow Idea

Changing the input database in a specific way changes the output statistic by only a small amount.

Not Absolute Confidentiality

Differential privacy does not guarantee that Terry Gross's height won't be learned by the adversary.

Differential Privacy: Definition

A randomized sanitization function κ has ε-differential privacy if for all datasets D1 and D2 differing in at most one element, and for all subsets S of the range of κ,

Pr[κ(D1) ∈ S] ≤ e^ε · Pr[κ(D2) ∈ S]

Example: the answer to the query "# individuals with salary > $30K" lies in the range [100, 110] with approximately the same probability under D1 and D2.
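The defining inequality can be checked empirically for the salary example. Below is a sketch using Laplace noise (introduced later in the lecture); the neighboring counts 103 and 104 and the choice ε = 0.5 are my own illustrative values:

```python
import math
import random

random.seed(0)

def noisy_count(true_count, eps):
    # Laplace noise of scale 1/eps (a count has sensitivity 1), sampled as a
    # scaled difference of two Exp(1) draws, which is Laplace-distributed.
    return true_count + (random.expovariate(1.0) - random.expovariate(1.0)) / eps

eps, n = 0.5, 200_000
# Neighboring databases: D2 holds one extra individual with salary > $30K.
p1 = sum(100 <= noisy_count(103, eps) <= 110 for _ in range(n)) / n
p2 = sum(100 <= noisy_count(104, eps) <= 110 for _ in range(n)) / n

# epsilon-DP bounds each probability by e^eps times the other.
print(p1, p2, math.exp(eps))
```

Both estimated probabilities land near 0.9 and are well within a factor e^0.5 ≈ 1.65 of each other, as the definition requires.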

Achieving Differential Privacy: Interactive Setting

How much and what type of noise should be added?

The user asks "Tell me f(D)"; the curator of database D (records x1 … xn) responds with f(D) + noise.

Example: Noise Addition

Slide: Adam Smith

Global Sensitivity

Slide: Adam Smith
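The body of this slide is an image and is not reproduced in this export; for reference, the standard definition of global sensitivity (from Dwork et al. 2006) is:

```latex
GS_f \;=\; \max_{D, D' \,:\, \mathrm{d}(D, D') = 1} \; \lVert f(D) - f(D') \rVert_1
```

where d(D, D′) = 1 means the databases differ in a single element.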

Exercise

• Function f: # individuals with salary > $30K
• Global sensitivity of f = ?

• Answer: 1 (adding or removing one individual changes the count by at most 1)

Background on Probability Theory (see Oct 11, 2013 recitation)

Continuous Probability Distributions

• Probability density function (PDF), fX
• Example distributions
  – Normal (Gaussian), exponential, Laplace

Laplace Distribution

Mean = μ
Variance = 2b²
PDF: f(x | μ, b) = (1 / 2b) · exp(−|x − μ| / b)

Source: Wikipedia

Laplace Distribution

Change of notation from the previous slide: x → y, μ → 0, b → λ
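As a quick sanity check of the mean and variance stated above, one can sample from the Laplace distribution and compare the empirical moments to μ and 2b². A sketch; the seed, b = 2, and sample size are arbitrary choices of mine:

```python
import random
import statistics

random.seed(1)

def sample_laplace(mu, b):
    # Laplace(mu, b): mu plus a scaled difference of two Exp(1) draws.
    return mu + b * (random.expovariate(1.0) - random.expovariate(1.0))

mu, b = 0.0, 2.0
xs = [sample_laplace(mu, b) for _ in range(100_000)]

# Empirical mean should be near mu; empirical variance near 2 * b**2 = 8.
print(statistics.fmean(xs), statistics.variance(xs))
```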

Achieving Differential Privacy

Laplace Mechanism

Slide: Adam Smith
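The mechanism on this slide (content not reproduced in this export) returns the true answer plus Laplace noise of scale GS_f/ε. A minimal sketch, using an illustrative salary database and the counting query from the exercise:

```python
import random

def laplace_mechanism(query, db, sensitivity, eps):
    """Return query(db) + Laplace noise of scale sensitivity / eps."""
    scale = sensitivity / eps
    # Difference of two Exp(1) draws, scaled: distributed as Laplace(0, scale).
    noise = scale * (random.expovariate(1.0) - random.expovariate(1.0))
    return query(db) + noise

def count_over_30k(rows):
    # Counting query: global sensitivity 1 (one record moves the count by <= 1).
    return sum(salary > 30_000 for salary in rows)

random.seed(0)
db = [45_000, 28_000, 62_000, 31_000]  # illustrative salaries; true count is 3
print(laplace_mechanism(count_over_30k, db, sensitivity=1, eps=0.5))
```

The noise is unbiased, so averaging many independent runs converges to the true count of 3, while any single answer is ε-differentially private.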

Laplace Mechanism: Proof Idea

Compare Pr[A(x) = t] with Pr[A(x′) = t].

Laplace Mechanism: More details

• Pr[A(x) ∈ S] = ∫_{t ∈ S} p(A(x) = t) dt

• p(A(x) = t) = p(L = t − f(x)) = h(t − f(x)), where h is the density of the Laplace distribution with scale λ

• h(t − f(x)) / h(t − f(x′))
  = exp(−|t − f(x)| / λ) / exp(−|t − f(x′)| / λ)
  ≤ exp(|f(x) − f(x′)| / λ)
  ≤ exp(GS_f / λ)

• Pr[A(x) ∈ S] / Pr[A(x′) ∈ S]
  = ∫_{t ∈ S} p(A(x) = t) dt / ∫_{t ∈ S} p(A(x′) = t) dt
  = ∫_{t ∈ S} h(t − f(x)) dt / ∫_{t ∈ S} h(t − f(x′)) dt
  ≤ exp(GS_f / λ)

• For λ = GS_f / ε, we have ε-differential privacy
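The central step, that the ratio of output densities never exceeds exp(GS_f/λ), can be checked numerically. A sketch with illustrative values (neighboring answers 103 and 104, so |f(x) − f(x′)| = GS_f = 1, and λ = 2):

```python
import math

def laplace_pdf(y, lam):
    # Density h(y) of the zero-centered Laplace distribution with scale lam.
    return math.exp(-abs(y) / lam) / (2 * lam)

fx, fx_prime = 103.0, 104.0   # f(x) and f(x') on neighboring databases
lam = 2.0                     # noise scale; here GS_f = 1
bound = math.exp(abs(fx - fx_prime) / lam)

# Scan a grid of outputs t: the density ratio stays within the bound everywhere.
ratios = [laplace_pdf(t / 10 - fx, lam) / laplace_pdf(t / 10 - fx_prime, lam)
          for t in range(900, 1200)]
print(max(ratios), bound)
```

The ratio attains the bound exactly for outputs t below both answers, which is why the inequality in the proof is tight.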

Example: Noise Addition

Slide: Adam Smith

Using Global Sensitivity

• Many natural functions have low global sensitivity
  – Histograms, covariance matrices, strongly convex optimization problems

Composition Theorem

• If A1 is ε1-differentially private and A2 is ε2-differentially private, and they use independent random coins, then ⟨A1, A2⟩ is (ε1 + ε2)-differentially private

• Repeated querying degrades privacy; the degradation is quantifiable
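The composition theorem is what justifies tracking a "privacy budget": each answered query spends part of a total ε. A sketch of a budget-tracking curator (the class and its API are my own illustration, not from the slides):

```python
import random

class PrivateCurator:
    """Answers queries via the Laplace mechanism while tracking spent epsilon."""

    def __init__(self, db, total_eps):
        self.db = db
        self.remaining_eps = total_eps

    def query(self, f, sensitivity, eps):
        # By the composition theorem, answers at eps1, eps2, ... compose to
        # (eps1 + eps2 + ...)-differential privacy, so refuse past the budget.
        if eps > self.remaining_eps:
            raise RuntimeError("privacy budget exhausted")
        self.remaining_eps -= eps
        scale = sensitivity / eps
        noise = scale * (random.expovariate(1.0) - random.expovariate(1.0))
        return f(self.db) + noise

random.seed(0)
curator = PrivateCurator([28_000, 45_000, 62_000, 31_000], total_eps=1.0)
count_query = lambda db: sum(s > 30_000 for s in db)
print(curator.query(count_query, sensitivity=1, eps=0.5))
print(curator.query(count_query, sensitivity=1, eps=0.5))
# A third eps = 0.5 query would raise: the first two already compose to eps = 1.0.
```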

Applications

• Netflix dataset [McSherry, Mironov 2009; MSR]
  – Accuracy of differentially private recommendations (with respect to one movie rating) comparable to the baseline set by Netflix

• Network trace datasets [McSherry, Mahajan 2010; MSR]

Challenge: High Sensitivity

• Approach: Add noise proportional to sensitivity to preserve ε-differential privacy

• Improvements:
  – Smooth sensitivity [Nissim, Raskhodnikova, Smith 2007; BGU-PSU]
  – Restricted sensitivity [Blocki, Blum, Datta, Sheffet 2013; CMU]

Challenge: Identifying an Individual's Information

• Information about an individual may not reside only in their own record
  – Example: In a social network, information about node A is also present in a node B that A influenced, for example because A may have caused a link between B and C

Differential Privacy: Summary

• An approach to releasing privacy-preserving statistics

• A rigorous privacy guarantee
  – Significant activity in the theoretical CS community

• Several applications to real datasets
  – Recommendation systems, network trace data, …

• Some challenges
  – High sensitivity, identifying an individual's information, repeated querying
