Mondrian Multidimensional K-Anonymity
Kristen LeFevre, David J. DeWitt, and Raghu Ramakrishnan
Table Linking
Overview
• Motivation & contributions
• Terminology
• Quality Metrics
• Multidimensional K-Anonymization
• Greedy Partitioning Algorithm
• Performance experiments
Motivation
• Protect the data owners' privacy using k-anonymous tables.
• Achieve higher-quality anonymized data.
• Provide an algorithm for anonymizing tables.
The primary goal of k-anonymization is to protect the privacy of the individuals to whom the data pertains. However, subject to this constraint, it is important that the released data remain as "useful" as possible.
Contributions
• Introducing multidimensional k-anonymization.
• Introducing a greedy algorithm for k-anonymization:
– More efficient than proposed optimal k-anonymization algorithms for single-dimensional models; complexity O(n log n), compared to exponential.
– The greedy multidimensional algorithm often produces higher-quality results than optimal single-dimensional algorithms.
• More targeted notion of quality measurement.
Terminology
• Quasi-Identifier: Minimal set of attributes X1, …, Xd in table T that can be joined with external information to re-identify individual records.
• Equivalence class: the set of all tuples in T containing identical values (x1, …, xd) for X1, …, Xd.
• K-Anonymity Property: Table T is k-anonymous with respect to attributes X1, …, Xd if every unique tuple (x1, …, xd) in the (multiset) projection of T on X1, …, Xd occurs at least k times.
• K-Anonymization: A view V of relation T is said to be a k-anonymization if the view modifies or generalizes the data of T according to some model such that V is k-anonymous with respect to the quasi-identifier.
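The k-anonymity property above can be checked mechanically. The following is a minimal sketch, assuming the table is a list of dictionaries; the function name is_k_anonymous, the column names, and the sample rows are all hypothetical and only illustrate the definition.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifier, k):
    """Return True if every quasi-identifier value combination occurs >= k times."""
    # Multiset projection of the table onto the quasi-identifier attributes.
    counts = Counter(tuple(row[attr] for attr in quasi_identifier) for row in rows)
    return all(count >= k for count in counts.values())


# Hypothetical, already-generalized patient table (illustration only).
patients = [
    {"age": "[25-28]", "sex": "Male", "zip": "5371*", "disease": "Flu"},
    {"age": "[25-28]", "sex": "Male", "zip": "5371*", "disease": "Hepatitis"},
    {"age": "[25-28]", "sex": "Female", "zip": "5370*", "disease": "Flu"},
    {"age": "[25-28]", "sex": "Female", "zip": "5370*", "disease": "Bronchitis"},
]

print(is_k_anonymous(patients, ["age", "sex", "zip"], k=2))  # True
print(is_k_anonymous(patients, ["age", "sex", "zip"], k=3))  # False
```

Each group of rows sharing the same quasi-identifier values is one equivalence class, so the check amounts to requiring that every equivalence class has at least k members.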
General Quality Metrics
• Discernability Metric:
• Normalized Average Equivalence:
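The formulas for the two metrics are not legible here. The sketch below assumes the definitions given in the Mondrian paper: the discernability metric is the sum of |E|^2 over all equivalence classes E, and the normalized average equivalence class size is (total records / number of equivalence classes) / k. The function names and the sample class sizes are hypothetical.

```python
def discernability(class_sizes):
    """Sum of squared equivalence class sizes (assumed definition of C_DM)."""
    return sum(size * size for size in class_sizes)


def normalized_avg_class_size(class_sizes, k):
    """(total records / number of classes) / k (assumed definition of C_AVG)."""
    total_records = sum(class_sizes)
    return (total_records / len(class_sizes)) / k


# Hypothetical anonymization with three equivalence classes, k = 2.
sizes = [2, 2, 3]
print(discernability(sizes))                  # 17
print(normalized_avg_class_size(sizes, k=2))  # 7/3 divided by 2
```

Under both assumed definitions, lower values indicate finer partitions; a normalized average class size of 1 would mean every class has exactly k records.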
K-anonymization
• Global recoding: achieves anonymity by mapping the domains of the quasi-identifier attributes to generalized or altered values.
Single vs. Multidimensional K-Anonymization
• Single-dimensional: A single-dimensional partitioning defines, for each Xi, a set of non-overlapping single-dimensional intervals that cover DXi. φi maps each x ∈ DXi to some summary statistic.
• Multi-dimensional:
A global recoding achieves anonymity by mapping the domains of the quasi-identifier attributes to generalized or altered values. φ: DX1 × … × DXd → D'
Single vs. Multidimensional K-Anonymization (Cont.)
Single-dimensional Partitioning
• A single-dimensional partitioning defines, for each Xi, a set of non-overlapping single-dimensional intervals that cover DXi. φi maps each x ∈ DXi to some summary statistic for the interval in which it is contained.
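As an illustration of a single-dimensional recoding function φi, the sketch below maps an integer value to the interval that contains it. It assumes an integer domain, uses the interval itself as the summary statistic, and describes the partitioning by the inclusive upper bounds of every interval except the last; the name recode_single_dimensional and the age domain are hypothetical.

```python
import bisect


def recode_single_dimensional(value, upper_bounds, domain_min, domain_max):
    """Map value to the (low, high) interval containing it.

    upper_bounds: sorted inclusive upper bounds of all intervals but the last.
    """
    idx = bisect.bisect_left(upper_bounds, value)
    low = domain_min if idx == 0 else upper_bounds[idx - 1] + 1
    high = domain_max if idx == len(upper_bounds) else upper_bounds[idx]
    return (low, high)


# Hypothetical age domain [25, 28] split into [25, 26] and [27, 28].
for age in (25, 26, 27, 28):
    print(age, recode_single_dimensional(age, [26], 25, 28))
```

Because the same intervals apply to every tuple regardless of its other attribute values, this recoding is single-dimensional: each attribute is generalized independently.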
Strict Multidimensional Partitioning
• A strict multidimensional partitioning defines a set of non-overlapping multidimensional regions that cover DX1 × … × DXd. φ maps each tuple (x1, …, xd) ∈ DX1 × … × DXd to a summary statistic for the region in which it is contained.
• Proposition 1: Every single-dimensional partitioning for quasi-identifier attributes X1, …, Xd can be expressed as a strict multidimensional partitioning.
Strict Multidimensional Partitioning (Cont.)
NP-Hard
Single-dimensional partitioning vs. multidimensional
• Proposition 1: Every single-dimensional partitioning for quasi-identifier attributes X1, …, Xd can be expressed as a strict multidimensional partitioning. However, when d ≥ 2 and for all i, |DXi| ≥ 2, there exists a strict multidimensional partitioning that cannot be expressed as a single-dimensional partitioning.
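Decisional K-Anonymous Multidimensional Partitioning
• Given a set P of unique (point, count) pairs, with points in d-dimensional space, is there a strict multidimensional partitioning for P such that, for every resulting multidimensional region Ri, the sum of count(p) over the points p in Ri is at least k – OR – equal to 0?
NP-Complete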
Allowable Cut
• Multidimensional: A cut perpendicular to axis Xi at xi is allowable if and only if Count(P.Xi > xi) ≥ k and Count(P.Xi ≤ xi) ≥ k.
• Single-Dimensional: A single-dimensional cut perpendicular to Xi at xi is allowable, given S, if
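The multidimensional condition can be sketched as a short predicate. This assumes points are tuples of numbers and that points equal to the cut value fall on the lower side; the name is_allowable_cut and the sample points are hypothetical.

```python
def is_allowable_cut(points, dim, cut_value, k):
    """True if cutting `points` at `cut_value` along `dim` leaves >= k points per side."""
    lower = sum(1 for p in points if p[dim] <= cut_value)
    upper = len(points) - lower  # points with p[dim] > cut_value
    return lower >= k and upper >= k


# Hypothetical (age, zip) points.
points = [(25, 53711), (25, 53712), (26, 53711),
          (27, 53710), (27, 53712), (28, 53711)]

print(is_allowable_cut(points, dim=0, cut_value=26, k=2))  # True: 3 and 3
print(is_allowable_cut(points, dim=0, cut_value=27, k=3))  # False: 5 and 1
```

The check only inspects the points inside the region being cut, which is what distinguishes it from the single-dimensional case, where a cut extends across the whole domain and so affects every region it crosses.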
Minimal Partitioning
• Minimal Strict Multidimensional Partitioning:
– Let R1, …, Rn denote a set of regions induced by a strict multidimensional partitioning, and let each region Ri contain multiset Pi of points. This multidimensional partitioning is minimal if, for all i, |Pi| ≥ k and there exists no allowable multidimensional cut for Pi.
• Minimal Single-Dimensional Partitioning:
– A set S of allowable single-dimensional cuts is a minimal single-dimensional partitioning for multiset P of points if there does not exist an allowable single-dimensional cut for P given S.
Bounds on Partition Size in Multidimensional K-Anonymization
Bounds on Partition Size in Single-Dimensional K-Anonymization
≤ 2k-1
Relaxed Multidimensional Partitioning
• A relaxed multidimensional partitioning for relation T defines a set of (potentially overlapping) distinct multidimensional regions that cover DX1 × … × DXd. Local recoding function φ' maps each tuple (x1, …, xd) ∈ T to a summary statistic for one of the regions in which it is contained.
• Proposition 2: Every strict multidimensional partitioning can be expressed as a relaxed multidimensional partitioning. However, if there are at least two tuples in table T having the same vector of quasi-identifier values, there exists a relaxed multidimensional partitioning that cannot be expressed as a strict multidimensional partitioning.
Greedy Partitioning Algorithm
Choose the dimension with the widest range of values
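A minimal sketch of a greedy strict partitioning in this spirit follows. It assumes numeric attributes, compares unnormalized ranges, and splits at the median, with values equal to the median going to the lower side. Falling back to narrower dimensions when the widest one has no allowable median cut is an assumption of this sketch, and since only median cuts are tried the result is not guaranteed to be minimal. The name mondrian_partition and the sample points are hypothetical.

```python
def mondrian_partition(points, k):
    """Recursively split `points` (tuples of numbers) into regions of >= k points."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if not points:
        return []
    d = len(points[0])

    def value_range(i):
        column = [p[i] for p in points]
        return max(column) - min(column)

    # Try dimensions from the widest range of values to the narrowest.
    for dim in sorted(range(d), key=value_range, reverse=True):
        values = sorted(p[dim] for p in points)
        median = values[(len(values) - 1) // 2]
        lower = [p for p in points if p[dim] <= median]
        upper = [p for p in points if p[dim] > median]
        # Recurse only if the median cut is allowable (both sides keep >= k points).
        if len(lower) >= k and len(upper) >= k:
            return mondrian_partition(lower, k) + mondrian_partition(upper, k)
    # No allowable median cut on any dimension: this region is final.
    return [points]


points = [(25, 53711), (25, 53712), (26, 53711),
          (27, 53710), (27, 53712), (28, 53711)]
for region in mondrian_partition(points, k=2):
    print(region)
```

Each returned region would then be replaced by a summary statistic, such as the range of each attribute within the region, to produce the anonymized view.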
Bounds on Quality
Scalability Problem
Table may be too large to fit in the available memory
Calculate the frequency set of attributes and load only the frequency set in memory.
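The frequency-set idea can be sketched as follows, assuming rows can be streamed one at a time (for example from disk) so that only the distinct quasi-identifier combinations and their counts are held in memory. The function names frequency_set and median_from_frequency_set, the column names, and the sample rows are hypothetical.

```python
from collections import Counter


def frequency_set(rows, quasi_identifier):
    """Count occurrences of each quasi-identifier value combination in one pass."""
    return Counter(tuple(row[attr] for attr in quasi_identifier) for row in rows)


def median_from_frequency_set(fs, dim):
    """Find the median of one attribute using counts only, not the raw rows."""
    per_value = Counter()
    for point, count in fs.items():
        per_value[point[dim]] += count
    total = sum(per_value.values())
    running = 0
    for value in sorted(per_value):
        running += per_value[value]
        if running * 2 >= total:
            return value


def stream_rows():
    # Stand-in for reading a large table row by row (hypothetical data).
    for age, sex in [(25, "Male"), (25, "Male"), (26, "Female"), (27, "Male")]:
        yield {"age": age, "sex": sex}


fs = frequency_set(stream_rows(), ["age", "sex"])
print(fs[(25, "Male")])                  # 2
print(median_from_frequency_set(fs, 0))  # 25
```

The saving depends on the data: the frequency set is small when many tuples share the same quasi-identifier values, and no smaller than the table when every tuple is distinct.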
Workload-Driven Quality
• Range Statistics:
– SELECT AVG(Age) FROM Patients WHERE Sex = 'male'
• Mean Statistics:
– SELECT COUNT(*) FROM Patients WHERE Sex = 'male' AND Age <= 26
It is impossible to answer the second query precisely using the single-dimensional recoding.
Experimental Evaluation
• Used a synthetic data generator to produce two discrete joint distributions: discrete uniform and discrete normal.
• Also tested on the Adults database.
Experimental Evaluation for Synthetic Data
Experimental Evaluation for Adults Database
Optimal single-dimensional vs. greedy strict multidimensional partitioning
Strengths vs. Weaknesses
• Defines k-anonymization within a broader and more precise framework.
• The multidimensional approach keeps the number of points in each partition minimal, so the output data is of higher quality.
Any Weaknesses?
Q&A
Thank you