Clustering and Classification Methods: A Brief Introduction
EPSY 905: Multivariate Analysis, Spring 2016
Lecture #14 (and last!) – May 4, 2016
Today's Lecture
• Classification methods: useful when you already know which groups exist but you don't know which group to assign a new observation to
• Clustering methods: useful when you don't know how many groups exist
• As both methods rely upon distances (more or less), we will start with a definition of distances
Ø We'll also get a bonus slide or two about multidimensional scaling – another method that uses distances (but doesn't cluster or classify directly)
Today's Data
• We will make use of the classic Fisher's Iris Data to demonstrate clustering and classification methods
• 150 flowers; 50 from each of three species (Setosa, Versicolor, Virginica)
• Four measurements from each flower:
Ø Sepal width
Ø Sepal length
Ø Petal width
Ø Petal length
• These data are built into R already!
DISTANCE METRICS
Measures of Distance
• Care must be taken with choosing the metric by which similarity is quantified
• Important considerations include:
Ø The nature of the variables (e.g., discrete, continuous, binary)
Ø Scales of measurement (nominal, ordinal, interval, or ratio)
Ø The nature of the matter under study
Euclidean Distance
• Euclidean distance is a frequent choice of a distance metric:

d(x, y) = sqrt[(x_1 − y_1)² + (x_2 − y_2)² + … + (x_p − y_p)²] = sqrt[(x − y)′(x − y)]

• Distances can be between anything you wish to cluster:
Ø Observations (distance across all variables)
Ø Variables (distance across all observations)
• The Euclidean distance is a frequent choice because it represents an understandable metric
Ø It may not always be the best choice
Euclidean Distance?
• Imagine I wanted to know how many miles it was from my old house in Sacramento to Lombard Street in San Francisco…
• Knowing how far it was in a straight line would not do me much good, particularly with the number of one-way streets that exist in San Francisco
Other Distance Metrics
• Other popular distance metrics include the Minkowski metric:

d(x, y) = [ Σ_{i=1}^{p} |x_i − y_i|^m ]^{1/m}

• The key to this metric is the choice of m:
Ø If m = 2, this provides the Euclidean distance
Ø If m = 1, this provides the "city-block" distance
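As a concrete sketch of the Minkowski metric and its two special cases – in Python rather than the course's R, with an illustrative function name and example points:

```python
# A minimal sketch of the Minkowski metric; the function name and the
# example points are illustrative, not from the slides.

def minkowski(x, y, m):
    """d(x, y) = [ sum_i |x_i - y_i|^m ]^(1/m)"""
    return sum(abs(a - b) ** m for a, b in zip(x, y)) ** (1 / m)

x, y = (0, 0), (3, 4)
print(minkowski(x, y, 2))  # m = 2: Euclidean distance, 5.0
print(minkowski(x, y, 1))  # m = 1: city-block distance, 7.0
```

Note that m = 1 sums the legs of the 3-4-5 triangle (7 blocks of walking), while m = 2 gives the straight-line distance (5).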
Preferred Distance Properties
• It is often desirable to use a distance metric that meets the following properties:
Ø d(P,Q) = d(Q,P)
Ø d(P,Q) > 0 if P ≠ Q
Ø d(P,Q) = 0 if P = Q
Ø d(P,Q) ≤ d(P,R) + d(R,Q)
• The Euclidean and Minkowski metrics satisfy these properties
Triangle Inequality
• The fourth property is called the triangle inequality, which often gets violated by non-routine measures of distance
• This inequality can be shown by the following triangle (with lines representing Euclidean distances):

[Figure: triangle with vertices P, Q, and R, illustrating that d(P,Q) ≤ d(P,R) + d(R,Q)]
Binary Variables
• In the case of binary-valued variables (variables that have a 0/1 coding), many other distance metrics may be defined
• The (squared) Euclidean distance provides a count of the number of mismatched observations
• For two binary vectors i and k that mismatch in two positions, d(i,k) = 2
• This is sometimes called the Hamming distance
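The mismatch count is easy to sketch in Python (the example vectors are illustrative, not the slide's table):

```python
# Sketch of the Hamming distance: the count of mismatched positions
# between two binary vectors (equal to the squared Euclidean distance
# for 0/1 data). The example vectors are illustrative.

def hamming(i, k):
    return sum(a != b for a, b in zip(i, k))

# two vectors that mismatch in two positions, so d(i, k) = 2
print(hamming((1, 0, 1, 1, 0), (1, 1, 1, 0, 0)))  # 2
```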
Other Binary Distance Measures
• There are a number of other ways to define the distance between a set of binary variables
• Most of these measures reflect the varied importance placed on differing cells in a 2x2 table

General Distance Measure Properties
• Use of measures of distance that are monotonic in their ordering of object distances will provide identical results from clustering heuristics
• Many times this will only be an issue if the distance measure is for binary variables or the distance measure does not satisfy the triangle inequality
Distances in R
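In R, pairwise distances come from the dist() function, which by default returns the lower triangle of the Euclidean distance matrix between observations. A comparable Python sketch (the data values and function name are illustrative):

```python
# A sketch mimicking R's dist(): the lower triangle of pairwise
# Euclidean distances between observations. Data values illustrative.
from math import dist  # Euclidean distance between two points

def dist_matrix(rows):
    # row i holds the distances from observation i to observations 0..i-1
    return [[dist(rows[i], rows[j]) for j in range(i)] for i in range(len(rows))]

flowers = [(5.1, 3.5), (4.9, 3.0), (6.2, 2.8)]
for row in dist_matrix(flowers):
    print(row)
```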
CLASSICAL CLUSTERING METHODS (USING DISTANCES)
Clustering Definitions
• Clustering objects (or observations) can provide detail regarding the nature and structure of data
• Clustering is distinct from classification in terminology
Ø Classification pertains to a known number of groups, with the objective being to assign new observations to these groups
• Classical cluster analysis is a technique where no assumptions are made concerning the number of groups or the group structure
Ø Mixture models do make assumptions about the data
• Clustering algorithms make use of measures of similarity (or alternatively, dissimilarity) to define and group variables or observations
• Clustering presents a host of technical problems
• For a reasonably sized data set with n objects (either variables or individuals), the number of ways of grouping n objects into k groups is:
(1/k!) Σ_{j=0}^{k} (−1)^{k−j} C(k, j) j^n

• For example, there are over four trillion ways that 25 objects can be clustered into 4 groups – which solution is best?
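The counting formula above can be checked directly; a short sketch (the function name is illustrative):

```python
# Sketch of the counting formula: the number of ways to partition
# n objects into k non-empty groups,
#   (1/k!) * sum_{j=0}^{k} (-1)^(k-j) * C(k, j) * j^n
from math import comb, factorial

def n_groupings(n, k):
    total = sum((-1) ** (k - j) * comb(k, j) * j ** n for j in range(k + 1))
    return total // factorial(k)  # always divides evenly

print(n_groupings(25, 4))  # over four trillion, matching the slide
```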
Numerical Problems
• In theory, one way to find the best solution is to try each possible grouping of all of the objects – an optimization process called integer programming
• It is difficult, if not impossible, to do such a method given the state of today's computers (although computers are catching up with such problems)
• Rather than using such brute-force methods, a set of heuristics has been developed to allow for fast clustering of objects into groups
• Such methods are called heuristics because they do not guarantee that the solution will be optimal (best), only that the solution will be better than most
Clustering Heuristic Inputs
• The inputs into clustering heuristics are in the form of measures of similarities or dissimilarities
• The result of the heuristic depends in large part on the measure of similarity/dissimilarity used by the procedure

Clustering Variables vs. Clustering Observations
• When variables are to be clustered, oft-used measures of similarity include correlation coefficients (or similar measures for non-continuous variables)
• When observations are to be clustered, distance metrics are often used
Hierarchical Clustering Methods
• Because of the large number of possible clustering solutions, searching through all combinations is not feasible
• Hierarchical clustering techniques proceed by taking a set of objects and grouping a set at a time
• Two types of hierarchical clustering methods exist:
Ø Agglomerative hierarchical methods
Ø Divisive hierarchical methods

Agglomerative Clustering Methods
• Agglomerative clustering methods start first with the individual objects
• Initially, each object is its own cluster
• The most similar objects are then grouped together into a single cluster (with two objects)
We will find that what we mean by similar will change depending on the method
Agglomerative Clustering Methods
• The remaining steps involve merging the clusters according to the similarity or dissimilarity of the objects within the cluster to those outside of the cluster
• The method concludes when all objects are part of a single cluster

Divisive Clustering Methods
• Divisive hierarchical methods work in the opposite direction – beginning with a single, n-object-sized cluster
• The large cluster is then divided into two subgroups where the objects in opposing groups are relatively distant from each other
• The process continues similarly until there are as many clusters as there are objects
To Summarize
• So you can see that we have this idea of steps
Ø At each step two clusters combine to form one (Agglomerative)
OR…
Ø At each step a cluster is divided into two new clusters (Divisive)
Methods for Viewing Clusters
• As you could imagine, when we consider the methods for hierarchical clustering, there are a large number of clusters that are formed sequentially
• One of the most frequently used tools to view the clusters (and the level at which they were formed) is the dendrogram
• A dendrogram is a graph that describes the differing hierarchical clusters, and the distance at which each is formed
Example Data Set #1
• To demonstrate several of the hierarchical clustering methods, an example data set is used
• Data come from a 1991 study by the economic research department of the Union Bank of Switzerland representing economic conditions of 48 cities around the world
• Three variables were collected:
Ø Average working hours for 12 occupations
Ø Price of 112 goods and services excluding rent
Ø Index of net hourly earnings in 12 occupations

This is an example of agglomerative clustering, where cities start off in their own group and are then combined
For example, here Amsterdam and Brussels were combined to form a group
Notice that we do not specify groups, but if we know how many we want… we simply go to the step where there are that many groups

[Figure: dendrogram of the cities]

Where the lines connect is when those two previous groups were joined
Similarity?
• So, we mentioned that:
Ø The most similar objects are then grouped together into a single cluster (with two objects)
• So the next question is how do we measure similarity between clusters
Ø More specifically, how do we redefine it when a cluster contains a combination of old clusters
• We find that there are several ways to define similar, and each way defines a new method of clustering

Agglomerative Methods
• Next we discuss several different ways to complete agglomerative hierarchical clustering:
Ø Single Linkage
Ø Complete Linkage
Ø Average Linkage
Ø Centroid
Ø Median
Ø Ward's Method
Example Distance Matrix
• The example will be based on the distance matrix below:

      1   2   3   4   5
  1   0   9   3   6  11
  2   9   0   7   5  10
  3   3   7   0   9   2
  4   6   5   9   0   8
  5  11  10   2   8   0
Single Linkage
• The single linkage method of clustering involves combining clusters by finding the "nearest neighbor" – the cluster closest to any given observation within the current cluster

[Figure: two clusters – Cluster 1 (objects 1, 2) and Cluster 2 (objects 3, 4, 5) – joined by their shortest inter-cluster link; this link equals the minimum distance between any observation in Cluster 1 and any observation in Cluster 2, and any other distance is longer]

• So the distance between any two clusters is:

D(A, B) = min{ d(y_i, y_j) } for all y_i in A and y_j in B

So how would we do this using our distance matrix?
Single Linkage Example
• The first step in the process is to determine the two elements with the smallest distance, and combine them into a single cluster
• Here, the two objects that are most similar are objects 3 and 5… we will now combine these into a new cluster, and compute the distance from that cluster to the remaining clusters (objects) via the single linkage rule
Single Linkage Example
• In the original distance matrix, the rows/columns for objects 3 and 5 hold the distances from each remaining object to object 3 or object 5
• In equation form, the distance from the (35) cluster to the remaining objects is:

d(35),1 = min{d31, d51} = min{3, 11} = 3
d(35),2 = min{d32, d52} = min{7, 10} = 7
d(35),4 = min{d34, d54} = min{9, 8} = 8

The distances of 3 and 5 with 1 are 3 and 11; our rule says the distance of our new cluster with 1 is the minimum of these two values… 3
Likewise, the distances of 3 and 5 with 2 are 7 and 10, so the distance of our new cluster with 2 is 7
Single Linkage Example
• Using the distance values, we now consolidate our table so that (35) is now a single row/column
• The distance from the (35) cluster to the remaining objects is given below:

d(35),1 = min{d31, d51} = min{3, 11} = 3
d(35),2 = min{d32, d52} = min{7, 10} = 7
d(35),4 = min{d34, d54} = min{9, 8} = 8

       (35)   1   2   4
 (35)    0
   1     3    0
   2     7    9   0
   4     8    6   5   0
Single Linkage Example
• We now repeat the process, finding the smallest distance within the set of remaining clusters

       (35)   1   2   4
 (35)    0    3   7   8
   1     3    0   9   6
   2     7    9   0   5
   4     8    6   5   0

• The smallest distance is between object 1 and cluster (35)
• Therefore, object 1 joins cluster (35), creating cluster (135)
• The distance from cluster (135) to the other clusters is then computed:

d(135),2 = min{d(35),2, d12} = min{7, 9} = 7
d(135),4 = min{d(35),4, d14} = min{8, 6} = 6
Single Linkage Example
• Using the distance values, we now consolidate our table so that (135) is now a single row/column
• The distance from the (135) cluster to the remaining objects is given below:

       (135)   2   4
 (135)   0
   2     7     0
   4     6     5   0

d(135),2 = min{d(35),2, d12} = min{7, 9} = 7
d(135),4 = min{d(35),4, d14} = min{8, 6} = 6
Single Linkage Example
• We now repeat the process, finding the smallest distance within the set of remaining clusters
• The smallest distance is between object 2 and object 4
• These two objects will be joined to form cluster (24)
• The distance from (24) to (135) is then computed:

d(135)(24) = min{d(135),2, d(135),4} = min{7, 6} = 6

• The final cluster (12345) is formed at a distance of 6
The Dendrogram

[Figure: dendrogram of the single linkage solution – for example, here is where 3 and 5 joined]
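The whole worked example can be reproduced in a few lines. A pure-Python sketch (function and variable names are illustrative) of the single linkage rule applied to the slides' distance matrix:

```python
# Single linkage on the slides' 5x5 example distance matrix: at each
# step, merge the two clusters whose closest members are nearest.
from itertools import combinations

D = {(1, 2): 9, (1, 3): 3, (1, 4): 6, (1, 5): 11, (2, 3): 7,
     (2, 4): 5, (2, 5): 10, (3, 4): 9, (3, 5): 2, (4, 5): 8}

def d(i, j):
    return D[min(i, j), max(i, j)]

def single_linkage(objects):
    clusters = [frozenset([o]) for o in objects]
    merges = []
    while len(clusters) > 1:
        # cluster-to-cluster distance = minimum pairwise object distance
        dist, a, b = min(((min(d(i, j) for i in A for j in B), A, B)
                          for A, B in combinations(clusters, 2)),
                         key=lambda t: t[0])
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((sorted(a | b), dist))
    return merges

for members, height in single_linkage([1, 2, 3, 4, 5]):
    print(members, height)  # (35) at 2, (135) at 3, (24) at 5, (12345) at 6
```

The printed merge heights (2, 3, 5, 6) match the hand computation on the preceding slides.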
Complete Linkage
• The complete linkage method of clustering involves combining clusters by finding the "farthest neighbor" – the cluster farthest from any given observation within the current cluster
• This ensures that all objects in a cluster are within some maximum distance of each other

[Figure: two clusters – Cluster 1 (objects 1, 2) and Cluster 2 (objects 3, 4, 5) – joined by their longest inter-cluster link]
Clustering in R

Dendrogram of R Example with Flowers
Other Classical Clustering Methods
• K-means clustering will partition data into more than one group… but you have to specify how many groups
• The method uses the distance of each observation to each group mean (Euclidean, in the standard algorithm) and iteratively switches group membership until no observation switches
• R has a built-in K-means method (kmeans())
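A minimal sketch of the iterative scheme (plain Lloyd's algorithm with Euclidean distance; the data points and starting centroids are made-up illustrative values):

```python
# Sketch of K-means (Lloyd's algorithm): assign each point to its
# nearest centroid, move each centroid to its group's mean, and stop
# once no point switches groups. Uses Euclidean distance.
from math import dist

def k_means(points, centroids, steps=100):
    for _ in range(steps):
        groups = [[] for _ in centroids]
        for p in points:  # assignment step
            nearest = min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            groups[nearest].append(p)
        # update step: centroid = mean of its group (assumes no empty group)
        new = [tuple(sum(v) / len(g) for v in zip(*g)) for g in groups]
        if new == centroids:
            break  # converged: nobody switched
        centroids = new
    return centroids, groups

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids, groups = k_means(points, [(0, 0), (10, 10)])
print(centroids)  # [(0.0, 0.5), (10.0, 10.5)]
```

With two well-separated groups and one starting centroid near each, the algorithm converges after a single update.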
FINITE MIXTURE MODELS
Finite Mixture Models: Clustering with Likelihoods
• Finite mixture models are models that attempt to determine the number of clusters (called classes) in a set of data
Ø Finite: countably many classes
Ø Mixture: the distribution of the data comes from a mixture of observations from different classes
• FMMs use likelihoods to determine class membership
Ø A likelihood (pdf) is a similarity (inverse distance)
Ø Higher likelihood → observation is more like the class average → more likely the observation comes from the class
Ø Lower likelihood → observation is less like the class average → less likely the observation comes from the class
• FMMs have very specific names:
Ø Latent class analysis: FMM where data are binary and independent within class
Ø Factor mixture model: FMM where data are continuous and have a factor structure that yields a within-class covariance matrix
FMM Marginal Likelihood Function
• The general FMM marginal likelihood function is:

f(x_i) = Σ_{c=1}^{C} η_c f(x_i | c)

• f(x_i | c) is the within-class likelihood function (pdf)
Ø Each class has its own distribution
• η_c is the probability any observation comes from class c
Ø The "mixing" proportion
• The sum is where the mixture gets its name: the marginal data likelihood is a blend (mixture) of each class's likelihood
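As a sketch of the formula above for a univariate two-class normal mixture (all parameter values and function names are illustrative, not from the slides):

```python
# Sketch of the FMM marginal likelihood f(x_i) = sum_c eta_c * f(x_i | c)
# for a univariate normal mixture. All parameter values are illustrative.
from math import exp, pi, sqrt

def normal_pdf(x, mean, sd):
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2 * pi))

def marginal_likelihood(x, etas, class_params):
    # a blend (mixture) of each class's likelihood, weighted by eta_c
    return sum(eta * normal_pdf(x, m, s)
               for eta, (m, s) in zip(etas, class_params))

# halfway between two equally weighted classes, both terms contribute equally
print(marginal_likelihood(0.0, [0.5, 0.5], [(-1.0, 1.0), (1.0, 1.0)]))
```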
FMMs in R: Gaussian (MVN) Mixtures with mclust
• R has several packages for mixtures; I will only show you one – Mclust
• We will attempt to determine the number of classes in our flower data using only the four measurements
• First, we let mclust run a number of models to determine the "best fitting" one by BIC (note that mclust reports BIC on a scale where larger values indicate better fit)

Plot of BICs from mclust

Example Output from One Solution
Problems with FMMs
• Exploratory (finding the number of classes) uses of mixtures are often suspect, as nearly anything can bring about additional (spurious/false) classes
• For instance: if our data were not MVN within class, we would likely see more classes than we need
• Finding the right model is often difficult
WRAPPING UP

Wrapping Up
• This class only scratched the surface of what can be done to cluster or classify data
• Clustering: determining the number of clusters in data
Ø Agglomerative/hierarchical clustering
Ø K-means
Ø Finite mixture models
• Classification: putting observations into known classes
Ø Classical approach: discriminant analysis (not shown today; lda() R function)
Ø Modern approach: FMMs
• My class next semester (Diagnostic Testing) describes confirmatory uses of FMMs