Interactive Clustering
Barna Saha
Clustering
Learning over Noisy Data
Learn a classifier or find clusters over noisy/uncertain data.
• Noise comes from using similarity functions: add an edge between two images if they represent the same monument; the resulting clusters could be erroneous.
• Noise comes from inherent data errors or missing attributes: clustering a collaboration network obtained from DBLP could be erroneous.
Further Applications
• Linking Census Records
• Public Health
• Web Search
• Comparison Shopping
• Spam Detection
• Machine Reading
• IP Aliasing
• ...
Query complexity of optimal strategy?
Davidson, Khanna, Milo, Roy, 2014
Faulty Oracle
Repeat the same question. Assuming $p = q$, repeat each question (say) $24 \log n / (1-2p)^2$ times.
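One way to see why a repetition count of this form is enough is a standard Hoeffding bound on the majority vote. This is a sketch under assumptions not stated on the slide: the repeated answers to a fixed query are independent, each is wrong with probability $p < 1/2$, and the logarithm is natural. The symbols $T$ and $X_i$ are introduced here for illustration only.

```latex
% X_i = 1 if the i-th repetition of a fixed query is answered wrongly, so E[X_i] = p < 1/2.
% The majority vote over T repetitions is wrong only if at least T/2 answers are wrong.
\Pr\Big[\sum_{i=1}^{T} X_i \ge \tfrac{T}{2}\Big]
  = \Pr\Big[\tfrac{1}{T}\sum_{i=1}^{T} X_i - p \ge \tfrac{1}{2} - p\Big]
  \le \exp\!\Big(-2T\big(\tfrac{1}{2} - p\big)^2\Big)
  = \exp\!\Big(-\tfrac{T(1-2p)^2}{2}\Big).

% Substituting T = 24 \ln n / (1-2p)^2:
\exp\!\Big(-\tfrac{T(1-2p)^2}{2}\Big) = e^{-12 \ln n} = n^{-12}.
```

A union bound over the fewer than $n^2$ node pairs then leaves a total failure probability of at most $n^{-10}$ under these assumptions.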
Faulty Oracle: No Resampling
• Find seed nodes for each cluster.
• If we can find $24 \log n / (1-2p)^2$ seed nodes from each cluster, then we are done! [Why?]
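A plausible reading of the "[Why?]" is that, once enough seeds are known, each remaining node can be queried once against every seed of a cluster and placed by majority vote: the queries involve distinct pairs, so their errors are independent even though no single query is repeated. The following is a minimal Python sketch of that idea, not the speaker's implementation. The names `make_faulty_oracle` and `assign_by_seeds` are hypothetical, and the oracle model (one fixed, possibly wrong answer per pair, symmetric error probability `p`) is an assumption.

```python
import random


def make_faulty_oracle(true_cluster, p, seed=0):
    """Return a pairwise same-cluster oracle that errs with probability p.

    Assumption: each distinct pair gets one fixed (possibly wrong) answer,
    so asking the same pair again never helps (the no-resampling setting).
    """
    rng = random.Random(seed)
    cache = {}

    def oracle(u, v):
        key = (min(u, v), max(u, v))
        if key not in cache:
            truth = true_cluster[u] == true_cluster[v]
            # Flip the true answer with probability p, then freeze it.
            cache[key] = truth if rng.random() >= p else not truth
        return cache[key]

    return oracle


def assign_by_seeds(node, seeds, oracle):
    """Assign node to the cluster whose seeds answer 'same cluster' most often.

    seeds maps a cluster label to a non-empty list of seed nodes known to
    share that label. Returns the label with the highest fraction of 'yes'.
    """
    best_label, best_frac = None, -1.0
    for label, members in seeds.items():
        yes = sum(oracle(node, s) for s in members)  # True counts as 1
        frac = yes / len(members)
        if frac > best_frac:
            best_label, best_frac = label, frac
    return best_label
```

With seed sets of size on the order of $\log n / (1-2p)^2$, the fraction of "yes" answers concentrates near $1-p$ for the correct cluster and near $p$ for every other cluster, which is what lets the majority vote separate them.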
Faulty Oracle: How to find seed nodes?
• Let $N = O(k^2 \log n / (1-2p)^4)$.
• Select $N$ nodes and ask all possible pairwise queries among these nodes.
• Run a correlation clustering algorithm on this small set of nodes.
• Each cluster returned by the correlation clustering that has size at least $24 \log n / (1-2p)^2$ acts as a seed.
Some intuition on the analysis: If we know all the query results, correlation clustering gives the maximum likelihood estimator.
Moreover, it is an instance of correlation clustering where errors are random, and we know how to solve it!
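The maximum likelihood claim can be checked with a short calculation. This is a sketch assuming symmetric errors ($p = q < 1/2$) that are independent across pairs. The symbols $\mathcal{C}$ (a candidate clustering of the sampled nodes), $M$ (the number of queried pairs), and $d(\mathcal{C})$ (the number of pairs whose answer disagrees with $\mathcal{C}$) are introduced here for illustration.

```latex
% Likelihood of the observed answers if \mathcal{C} is the true clustering:
% each of the d(\mathcal{C}) disagreeing pairs is an error (probability p),
% each of the remaining M - d(\mathcal{C}) pairs is correct (probability 1 - p).
L(\mathcal{C}) = p^{\,d(\mathcal{C})} (1-p)^{\,M - d(\mathcal{C})}
               = (1-p)^{M} \left(\frac{p}{1-p}\right)^{d(\mathcal{C})}.

% Since p < 1/2 implies p/(1-p) < 1, the likelihood is decreasing in d(\mathcal{C}):
\arg\max_{\mathcal{C}} L(\mathcal{C}) = \arg\min_{\mathcal{C}} d(\mathcal{C}).
```

Minimizing the number of disagreements $d(\mathcal{C})$ is the correlation clustering objective on the complete signed graph of answers, which is why its optimum coincides with the maximum likelihood estimate in this model.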