Robustness Meets Algorithms
Ankur Moitra (MIT)
ICML 2017 Tutorial, August 6th


CLASSIC PARAMETER ESTIMATION

Given samples from an unknown distribution in some class, e.g. a 1-D Gaussian N(μ, σ²): can we accurately estimate its parameters?

Yes!

empirical mean: μ̂ = (1/N) Σᵢ Xᵢ        empirical variance: σ̂² = (1/N) Σᵢ (Xᵢ − μ̂)²

The maximum likelihood estimator is asymptotically efficient (1910-1920). [R. A. Fisher]

What about errors in the model itself? (1960) [J. W. Tukey]

ROBUST STATISTICS

What estimators behave well in a neighborhood around the model?

Let's study a simple one-dimensional example…

ROBUST PARAMETER ESTIMATION

Given corrupted samples from a 1-D Gaussian: can we accurately estimate its parameters?

observed model = ideal model + noise

How do we constrain the noise? Equivalently:

• L1-norm of noise at most O(ε)
• Arbitrarily corrupt an O(ε)-fraction of samples (in expectation)

This generalizes Huber's Contamination Model: an adversary can add an ε-fraction of samples.

Outliers: points the adversary has corrupted. Inliers: points he hasn't.
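As a concrete illustration (my own, not from the tutorial), here is a minimal Python sketch of Huber's contamination model: each sample is drawn from the ideal 1-D Gaussian with probability 1−ε and placed adversarially with probability ε. The outlier location is an arbitrary choice for the demo.

    import numpy as np

    rng = np.random.default_rng(0)

    def huber_samples(n, mu=0.0, sigma=1.0, eps=0.1, outlier=10.0):
        # Inliers ~ N(mu, sigma^2); each point is independently corrupted w.p. eps.
        x = rng.normal(mu, sigma, size=n)
        corrupted = rng.random(n) < eps
        x[corrupted] = outlier  # the adversary may place these anywhere
        return x, corrupted

    x, corrupted = huber_samples(10_000)
    print(f"fraction corrupted: {corrupted.mean():.3f}")  # ~ 0.1, as expected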

In what norm do we want the parameters to be close?

Definition: The total variation distance between two distributions with pdfs f(x) and g(x) is

    d_TV(f, g) = (1/2) ∫ |f(x) − g(x)| dx

From the bound on the L1-norm of the noise, we have: d_TV(observed, ideal) ≤ O(ε).

Goal: Find a 1-D Gaussian that satisfies d_TV(estimate, ideal) ≤ O(ε). Equivalently (by the triangle inequality), find a 1-D Gaussian that satisfies d_TV(estimate, observed) ≤ O(ε).

Do the empirical mean and empirical variance work? No!

observed model = ideal model + noise

A single corrupted sample can arbitrarily corrupt the estimates.

But the median and median absolute deviation do work.

Fact [Folklore]: Given samples from a distribution that is ε-close in total variation distance to a 1-D Gaussian N(μ, σ²), the median and MAD recover estimates μ̂, σ̂ that satisfy

    |μ − μ̂| ≤ O(ε)·σ    and    |σ − σ̂| ≤ O(ε)·σ

where μ̂ is the median of the samples and σ̂ is the median absolute deviation, rescaled to be consistent for a Gaussian.

Also called (properly) agnostically learning a 1-D Gaussian.
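A minimal sketch of this estimator in Python (the rescaling constant 0.6745 ≈ Φ⁻¹(3/4), which makes the MAD consistent for a Gaussian's standard deviation, is standard but not shown on the slides):

    import numpy as np

    def median_mad(x):
        # Robust location/scale: median and rescaled median absolute deviation.
        mu_hat = np.median(x)
        sigma_hat = np.median(np.abs(x - mu_hat)) / 0.6745
        return mu_hat, sigma_hat

    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(0, 1, 9_000), np.full(1_000, 10.0)])  # 10% outliers
    print("empirical mean, std:", x.mean(), x.std())   # dragged toward the outliers
    print("median, MAD:       ", median_mad(x))        # close to (0, 1)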

What about robust estimation in high dimensions? e.g. microarrays with 10k genes

OUTLINE

Part I: Introduction
• Robust Estimation in One Dimension
• Robustness vs. Hardness in High Dimensions
• Recent Results

Part II: Agnostically Learning a Gaussian
• Parameter Distance
• Detecting When an Estimator is Compromised
• A Win-Win Algorithm
• Unknown Covariance

Part III: Experiments

Part IV: Extensions

MAIN PROBLEM: Given samples from a distribution that is ε-close in total variation distance to a d-dimensional Gaussian N(μ, Σ), give an efficient algorithm to find parameters μ̂, Σ̂ that satisfy d_TV(N(μ̂, Σ̂), N(μ, Σ)) ≤ Õ(ε).

Special Cases:

(1) Unknown mean: N(μ, I)
(2) Unknown covariance: N(0, Σ)

A COMPENDIUM OF APPROACHES (Unknown Mean)

    Estimator           Error Guarantee    Running Time
    Tukey Median        O(ε)               NP-Hard
    Geometric Median    O(ε√d)             poly(d, N)
    Tournament          O(ε)               N^O(d)
    Pruning             O(ε√d)             O(dN)
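For reference, the geometric median row can be computed with Weiszfeld's iteration. A minimal sketch under assumptions of my own (a fixed iteration budget, and no handling of iterates that land exactly on a data point):

    import numpy as np

    def geometric_median(X, iters=100, tol=1e-8):
        # Weiszfeld's algorithm: minimize sum_i ||m - X_i||_2 over m.
        m = X.mean(axis=0)                    # start from the empirical mean
        for _ in range(iters):
            dist = np.maximum(np.linalg.norm(X - m, axis=1), tol)
            w = 1.0 / dist                    # far-away outliers get small weight
            m_new = (w[:, None] * X).sum(axis=0) / w.sum()
            if np.linalg.norm(m_new - m) < tol:
                break
            m = m_new
        return m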

THE PRICE OF ROBUSTNESS?

All known estimators are hard to compute or lose polynomial factors in the dimension.

Equivalently: computationally efficient estimators can only handle an ε = O(1/√d) fraction of errors and get non-trivial (TV < 1) guarantees.

Is robust estimation algorithmically possible in high dimensions?


RECENT RESULTS

Theorem [Diakonikolas, Kamath, Kane, Li, Moitra, Stewart '16]: There is an algorithm that, when given samples from a distribution that is ε-close in total variation distance to a d-dimensional Gaussian, finds parameters that satisfy d_TV(N(μ̂, Σ̂), N(μ, Σ)) ≤ Õ(ε). Moreover the algorithm runs in time poly(N, d).

Robust estimation in high dimensions is algorithmically possible!

Alternatively: can approximate the Tukey median, etc., in interesting semi-random models.

Independently and concurrently:

Theorem [Lai, Rao, Vempala '16]: There is an algorithm that, when given samples from a distribution that is ε-close in total variation distance to a d-dimensional Gaussian, finds parameters that satisfy a similar guarantee, losing logarithmic factors in the dimension d. Moreover the algorithm runs in time poly(N, d).

When the covariance is bounded, this translates to an ℓ2 guarantee on the mean that degrades only logarithmically with d.

A GENERAL RECIPE

Robust estimation in high dimensions:

• Step #1: Find an appropriate parameter distance
• Step #2: Detect when the naïve estimator has been compromised
• Step #3: Find good parameters, or make progress
  (Filtering: fast and practical. Convex programming: better sample complexity.)

Let's see how this works for unknown mean…


PARAMETER DISTANCE

Step #1: Find an appropriate parameter distance for Gaussians

A Basic Fact:

(1) d_TV(N(μ₁, I), N(μ₂, I)) ≤ O(‖μ₁ − μ₂‖₂)

This can be proven using Pinsker's Inequality and the well-known formula for the KL-divergence between Gaussians.
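In symbols (a standard derivation, not spelled out on the slides): for Gaussians with identity covariance,

    \[
    \mathrm{KL}\big(\mathcal{N}(\mu_1, I) \,\|\, \mathcal{N}(\mu_2, I)\big)
        = \tfrac{1}{2}\,\lVert \mu_1 - \mu_2 \rVert_2^2,
    \qquad
    d_{\mathrm{TV}} \le \sqrt{\tfrac{1}{2}\,\mathrm{KL}}
        = \tfrac{1}{2}\,\lVert \mu_1 - \mu_2 \rVert_2 .
    \]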

Corollary: If our estimate (in the unknown mean case) satisfies ‖μ − μ̂‖₂ ≤ O(ε), then d_TV(N(μ, I), N(μ̂, I)) ≤ O(ε).

Our new goal is to be close in Euclidean distance.


DETECTING CORRUPTIONS

Step #2: Detect when the naïve estimator has been compromised

[figure: uncorrupted vs. corrupted points; the corruptions create a direction of large (>1) variance]

Key Lemma: If X₁, X₂, …, X_N come from a distribution that is ε-close to N(μ, I) and N is sufficiently large (Ω̃(d/ε²) suffices), then with probability at least 1−δ: (1) the empirical mean is accurate in every direction in which the empirical covariance is close to the identity, and (2) any direction in which the mean has been corrupted by more than Õ(ε) produces an eigenvalue of the empirical covariance noticeably larger than 1.

Take-away: An adversary needs to mess up the second moment in order to corrupt the first moment.
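A small Python sketch of this take-away (my own illustration, not the tutorial's implementation): planting an ε-fraction of outliers far along one coordinate shifts the empirical mean, and the very same direction shows up as a large eigenvalue of the empirical covariance.

    import numpy as np

    rng = np.random.default_rng(0)
    d, n, eps = 100, 20_000, 0.1

    X = rng.normal(size=(n, d))          # inliers ~ N(0, I)
    k = int(eps * n)
    X[:k, 0] += 5.0                      # outliers shifted along e_1

    mu_hat = X.mean(axis=0)
    eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))
    print("shift of empirical mean:", np.linalg.norm(mu_hat))  # ~ eps * 5 = 0.5
    print("top eigenvalue:", eigvals[-1])                      # ~ 3, noticeably > 1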


A WIN-WIN ALGORITHM

Step #3: Either find good parameters, or remove many outliers

Filtering Approach: Suppose the empirical covariance has a direction of variance noticeably larger than 1. Then we can throw out more corrupted than uncorrupted points: discard every point whose projection onto v deviates from the empirical mean by more than T, where v is the direction of largest variance, and T has a formula (set so that the Gaussian tail beyond T is negligible).

If we continued for too long, we would have no corrupted points left! Eventually we find (certifiably) good parameters.

Running Time: poly(N, d). Sample Complexity: Õ(d/ε²), via concentration of LTFs (linear threshold functions).
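A minimal sketch of the filter in Python, under assumptions of my own (a fixed multiple-of-σ threshold in place of the slides' unspecified formula for T, and N large enough relative to d that clean data keeps the top empirical eigenvalue below the stopping threshold):

    import numpy as np

    def filter_step(X, threshold=4.0):
        # One win-win step: either certify, or remove mostly-corrupted tails.
        mu_hat = X.mean(axis=0)
        eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
        if eigvals[-1] < 1.2:                 # no direction of large variance:
            return X, True                    # the empirical mean is certified
        v = eigvecs[:, -1]                    # direction of largest variance
        scores = np.abs((X - mu_hat) @ v)
        return X[scores <= threshold], False  # tails hold more corrupted points

    def filtered_mean(X):
        done = False
        while not done:
            X, done = filter_step(X)
        return X.mean(axis=0)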


A GENERAL RECIPE

Recall the recipe: find an appropriate parameter distance, detect when the naïve estimator has been compromised, then find good parameters or make progress.

How about for unknown covariance?

PARAMETER DISTANCE

Step #1: Find an appropriate parameter distance for Gaussians

Another Basic Fact:

(2) d_TV(N(0, Σ₁), N(0, Σ₂)) ≤ O(‖Σ₁^(−1/2) Σ₂ Σ₁^(−1/2) − I‖_F)

Again, proven using Pinsker's Inequality.

Our new goal is to find an estimate Σ̂ that satisfies ‖Σ^(−1/2) Σ̂ Σ^(−1/2) − I‖_F ≤ O(ε).

The distance seems strange, but it's the right one to use to bound TV.

UNKNOWN COVARIANCE

What if we are given samples from N(0, Σ)? How do we detect if the naïve estimator is compromised?

Key Fact: Let Y = Σ^(−1/2)·X where X ∼ N(0, Σ), and let Z = Y ⊗ Y be the flattening of the outer product Y·Yᵀ. Then, restricted to flattenings of d×d symmetric matrices, the covariance of Z is (twice) the identity. The proof uses Isserlis's Theorem. (One technicality: we need to project out the flattening of the identity.)

Key Idea: Transform the data, look for restricted large eigenvalues.

If Σ̂ were the true covariance, we would have Σ̂^(−1/2)·Xᵢ ∼ N(0, I) for inliers, in which case the covariance of the flattened outer products, after subtracting its known value, would have small restricted eigenvalues.

Take-away: An adversary needs to mess up the (restricted) fourth moment in order to corrupt the second moment.
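A rough Python sketch of the detection idea (my own illustration; the actual algorithm restricts to flattenings of symmetric matrices and projects out the flattening of the identity, which I approximate here by centering):

    import numpy as np

    def fourth_moment_eigenvalue(X, Sigma_hat):
        # Whiten with the candidate covariance, flatten y y^T, and look for
        # a large eigenvalue of the covariance of the flattenings.
        n, d = X.shape
        L = np.linalg.cholesky(Sigma_hat)
        Y = np.linalg.solve(L, X.T).T        # if Sigma_hat is right: Y ~ N(0, I)
        Z = np.einsum('ni,nj->nij', Y, Y).reshape(n, d * d)
        Z = Z - Z.mean(axis=0)               # crude stand-in for projecting out vec(I)
        C = (Z.T @ Z) / n
        return np.linalg.eigvalsh(C)[-1]     # ~ 2 for clean N(0, I) data;
                                             # much larger => estimator compromised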

ASSEMBLING THE ALGORITHM

Given samples that are ε-close in total variation distance to a d-dimensional Gaussian N(μ, Σ):

Step #1: Doubling trick. Pair up the samples and take differences (X₂ᵢ − X₂ᵢ₋₁)/√2, which for inlier pairs are distributed as N(0, Σ). Now use the algorithm for unknown covariance to get Σ̂.

Step #2: (Agnostic) isotropic position. Transform the data by Σ̂^(−1/2); this makes Euclidean distance on the mean the right distance in the general case. Now use the algorithm for unknown mean.
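Putting the steps together as a Python sketch (robust_cov and robust_mean stand for the unknown-covariance and unknown-mean subroutines sketched above; the names are mine, not the tutorial's):

    import numpy as np

    def agnostically_learn_gaussian(X, robust_cov, robust_mean):
        # X: (2N, d) array of eps-corrupted samples from N(mu, Sigma).
        # Step 1: doubling trick -- differences of paired samples kill the mean.
        D = (X[0::2] - X[1::2]) / np.sqrt(2.0)   # inlier pairs ~ N(0, Sigma)
        Sigma_hat = robust_cov(D)

        # Step 2: (agnostic) isotropic position -- whiten, estimate the mean there.
        L = np.linalg.cholesky(Sigma_hat)
        Y = np.linalg.solve(L, X.T).T            # inliers ~ N(L^{-1} mu, I), roughly
        mu_hat = L @ robust_mean(Y)              # undo the whitening
        return mu_hat, Sigma_hat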


SYNTHETIC EXPERIMENTS

Error rates on synthetic data (unknown mean), +10% noise:

[figure: excess ℓ2 error vs. dimension (100 to 400) for Filtering, LRVMean, sample mean w/ noise, Pruning, RANSAC, and Geometric Median; a second panel zooms in on the best-performing methods]
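A small script in the spirit of this experiment (my own reconstruction: the noise distribution is a guess, and filtered_mean is the filter sketch from earlier, here with an adaptive stopping threshold to account for sampling noise in the covariance):

    import numpy as np

    rng = np.random.default_rng(1)

    def filtered_mean(X, threshold=4.0):
        while True:
            mu = X.mean(axis=0)
            eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
            # Stop at the random-matrix bulk edge instead of exactly 1.
            if eigvals[-1] < (1 + np.sqrt(X.shape[1] / len(X))) ** 2 + 0.05:
                return mu
            s = np.abs((X - mu) @ eigvecs[:, -1])
            X = X[s <= threshold]

    def trial(d, eps=0.10):
        n = 50 * d
        X = rng.normal(size=(n, d))
        X[: int(eps * n)] = rng.normal(2.0, 0.5, size=(int(eps * n), d))  # +10% noise
        naive = np.linalg.norm(X.mean(axis=0))
        keep = np.linalg.norm(X, axis=1) <= 2.5 * np.sqrt(d)   # pruning radius the
        pruning = np.linalg.norm(X[keep].mean(axis=0))         # adversary hides inside
        return naive, pruning, np.linalg.norm(filtered_mean(X))

    for d in (100, 200, 300, 400):
        print(d, trial(d))  # naive and pruning errors grow with d; the filter does much better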

SYNTHETIC EXPERIMENTS

Error rates on synthetic data (unknown covariance, isotropic, i.e. close to identity), +10% noise:

[figure: excess ℓ2 error vs. dimension (20 to 100) for Filtering, LRVCov, sample covariance w/ noise, Pruning, and RANSAC; a second panel zooms in on the best-performing methods]

SYNTHETIC EXPERIMENTS

Error rates on synthetic data (unknown covariance, anisotropic, i.e. far from identity), +10% noise:

[figure: excess ℓ2 error vs. dimension (20 to 100) for the same methods; the vertical scale is much larger in the anisotropic case]

REAL DATA EXPERIMENTS

Famous study of [Novembre et al. '08]: take the top two singular vectors of the people × SNP matrix (POPRES).

[figure: projection of the original data onto the top two singular vectors]

"Genes Mirror Geography in Europe"

REAL DATA EXPERIMENTS

Can we find such patterns in the presence of noise? (10% noise added)

[figure: Pruning projection. What PCA finds on the noisy data.]

[figure: RANSAC projection. What RANSAC finds.]

[figure: XCS projection. What robust PCA (via SDPs) finds.]

[figure: Filter projection. What our methods find, shown next to the original data with no noise.]

The power of provably robust estimation: the filtered projection recovers essentially the structure of the noiseless data.

LOOKING FORWARD

Can algorithms for agnostically learning a Gaussian help in exploratory data analysis in high dimensions?

Isn't this what we would have been doing with robust statistical estimators, if we had them all along?


LIMITATIONS TO ROBUST ESTIMATION

Theorem [Diakonikolas, Kane, Stewart '16]: Any statistical query learning* algorithm in the strong corruption model (insertions and deletions) that achieves error o(ε√(log 1/ε)) must make a super-polynomial number of queries.

* Instead of seeing samples directly, an algorithm queries a function and gets its expectation, up to sampling noise.

This is a powerful but restricted class of algorithms.

HANDLING MORE CORRUPTIONS

What if an adversary is allowed to corrupt more than half of the samples?

Theorem [Charikar, Steinhardt, Valiant '17]: Given samples from a distribution with mean μ and covariance Σ ⪯ σ²·I, where all but an α-fraction have been corrupted, there is an algorithm that outputs a list of O(1/α) candidates, one of which satisfies ‖μ̂ − μ‖₂ ≤ Õ(σ/√α).

This extends to mixtures straightforwardly.
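To make the output format concrete, here is a toy illustration (emphatically not the algorithm of [Charikar, Steinhardt, Valiant '17]): when only an α-fraction of points are inliers, no single estimate can work, but a short list of cluster centers can still contain a good one.

    import numpy as np

    rng = np.random.default_rng(0)
    d, n_per = 20, 500
    centers = rng.normal(scale=5.0, size=(4, d))    # cluster 0 is the true mean;
    X = np.vstack([c + rng.normal(size=(n_per, d))  # the rest are adversarial,
                   for c in centers])               # so alpha = 1/4 inliers

    def candidate_means(X, k=4, iters=50):
        # Tiny k-means: return a list of k candidate means.
        M = X[rng.choice(len(X), k, replace=False)]
        for _ in range(iters):
            labels = ((X[:, None] - M[None]) ** 2).sum(-1).argmin(1)
            M = np.stack([X[labels == j].mean(0) if (labels == j).any() else M[j]
                          for j in range(k)])
        return M

    best = min(np.linalg.norm(c - centers[0]) for c in candidate_means(X))
    print("distance from best candidate to the true mean:", best)  # small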

SPARSE ROBUST ESTIMATION

Can we improve the sample complexity with sparsity assumptions?

Theorem [Li '17], [Du, Balakrishnan, Singh '17]: There is an algorithm that, in the unknown k-sparse mean case, achieves error Õ(ε) with Õ(k² log d / ε²) samples.

[Li '17] also studied robust sparse PCA.

Is it possible to improve the sample complexity to Õ(k log d / ε²), or are there computational vs. statistical tradeoffs?
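As a toy illustration of why sparsity helps with sample complexity (my own sketch, not the algorithm from these papers): estimate each coordinate robustly, then keep only the k largest entries, so a union bound over coordinates costs just a log d factor in samples.

    import numpy as np

    def sparse_robust_mean(X, k):
        # Coordinatewise median, truncated to its k largest entries.
        m = np.median(X, axis=0)              # robust in each coordinate
        support = np.argsort(np.abs(m))[-k:]  # estimated support of mu
        out = np.zeros_like(m)
        out[support] = m[support]
        return out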

LOOKING FORWARD

Beyond the questions above: what other fundamental tasks in high-dimensional statistics can be solved provably and robustly?

Summary:
• Dimension-independent error bounds for robustly learning a Gaussian
• A general recipe using restricted eigenvalue problems
• SQ lower bounds, handling more corruptions, and sparse robust estimation
• Is practical, robust statistics within reach?

Thanks! Any Questions?