Clustering and Classification Methods: A Brief Introduction
EPSY 905: Multivariate Analysis, Spring 2016
Lecture #14 (and last!) – May 4, 2016
Today's Lecture
• Classification methods: useful when you already know which groups exist but you don't know which group to assign a new observation to
• Clustering methods: useful when you don't know how many groups exist
• As both methods rely upon distances (more or less), we will start with a definition of distances
Ø We'll also get a bonus slide or two about multidimensional scaling – another method that uses distances (but doesn't cluster or classify directly)
Today's Data
• We will make use of the classic Fisher's Iris Data to demonstrate clustering and classification methods
• 150 flowers; 50 from each of three species (Setosa, Versicolor, Virginica)
• Four measurements from each flower:
Ø Sepal width
Ø Sepal length
Ø Petal width
Ø Petal length
• These data are built into R already!
DISTANCE METRICS
Measures of Distance
• Care must be taken with choosing the metric by which similarity is quantified
• Important considerations include:
Ø The nature of the variables (e.g., discrete, continuous, binary)
Ø Scales of measurement (nominal, ordinal, interval, or ratio)
Ø The nature of the matter under study
Euclidean Distance
• Euclidean distance is a frequent choice of a distance metric:

d(x, y) = sqrt[(x_1 − y_1)² + (x_2 − y_2)² + … + (x_p − y_p)²] = sqrt[(x − y)′(x − y)]

• Distances can be between anything you wish to cluster:
Ø Observations (distance across all variables)
Ø Variables (distance across all observations)
• The Euclidean distance is a frequent choice because it represents an understandable metric
Ø It may not always be the best choice
Euclidean Distance?
• Imagine I wanted to know how many miles it was from my old house in Sacramento to Lombard Street in San Francisco…
• Knowing how far it was in a straight line would not do me much good, particularly with the number of one-way streets that exist in San Francisco
Other Distance Metrics
• Other popular distance metrics include the Minkowski metric:

d(x, y) = [ Σ_{i=1}^{p} |x_i − y_i|^m ]^{1/m}

• The key to this metric is the choice of m:
Ø If m = 2, this provides the Euclidean distance
Ø If m = 1, this provides the "city-block" distance
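As a concrete sketch of the Minkowski metric and its two special cases – in Python rather than the course's R, with an illustrative function name and example points:

```python
# A minimal sketch of the Minkowski metric; the function name and the
# example points are illustrative, not from the slides.

def minkowski(x, y, m):
    """d(x, y) = [ sum_i |x_i - y_i|^m ]^(1/m)"""
    return sum(abs(a - b) ** m for a, b in zip(x, y)) ** (1 / m)

x, y = (0, 0), (3, 4)
print(minkowski(x, y, 2))  # m = 2: Euclidean distance, 5.0
print(minkowski(x, y, 1))  # m = 1: city-block distance, 7.0
```

Note that m = 1 sums the legs of the 3-4-5 triangle (7 blocks of walking), while m = 2 gives the straight-line distance (5).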
Preferred Distance Properties
• It is often desirable to use a distance metric that meets the following properties:
Ø d(P,Q) = d(Q,P)
Ø d(P,Q) > 0 if P ≠ Q
Ø d(P,Q) = 0 if P = Q
Ø d(P,Q) ≤ d(P,R) + d(R,Q)
• The Euclidean and Minkowski metrics satisfy these properties
Triangle Inequality
• The fourth property is called the triangle inequality, which often gets violated by non-routine measures of distance
• This inequality can be shown by the following triangle (with lines representing Euclidean distances):

[Figure: triangle with vertices P, Q, and R, illustrating that d(P,Q) ≤ d(P,R) + d(R,Q)]
Binary Variables
• In the case of binary-valued variables (variables that have a 0/1 coding), many other distance metrics may be defined
• The (squared) Euclidean distance provides a count of the number of mismatched observations
• For two binary vectors i and k that mismatch in two positions, d(i,k) = 2
• This is sometimes called the Hamming distance
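The mismatch count is easy to sketch in Python (the example vectors are illustrative, not the slide's table):

```python
# Sketch of the Hamming distance: the count of mismatched positions
# between two binary vectors (equal to the squared Euclidean distance
# for 0/1 data). The example vectors are illustrative.

def hamming(i, k):
    return sum(a != b for a, b in zip(i, k))

# two vectors that mismatch in two positions, so d(i, k) = 2
print(hamming((1, 0, 1, 1, 0), (1, 1, 1, 0, 0)))  # 2
```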
Other Binary Distance Measures
• There are a number of other ways to define the distance between a set of binary variables
• Most of these measures reflect the varied importance placed on differing cells in a 2x2 table

General Distance Measure Properties
• Use of measures of distance that are monotonic in their ordering of object distances will provide identical results from clustering heuristics
• Many times this will only be an issue if the distance measure is for binary variables or the distance measure does not satisfy the triangle inequality
Distances in R
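In R, pairwise distances come from the dist() function, which by default returns the lower triangle of the Euclidean distance matrix between observations. A comparable Python sketch (the data values and function name are illustrative):

```python
# A sketch mimicking R's dist(): the lower triangle of pairwise
# Euclidean distances between observations. Data values illustrative.
from math import dist  # Euclidean distance between two points

def dist_matrix(rows):
    # row i holds the distances from observation i to observations 0..i-1
    return [[dist(rows[i], rows[j]) for j in range(i)] for i in range(len(rows))]

flowers = [(5.1, 3.5), (4.9, 3.0), (6.2, 2.8)]
for row in dist_matrix(flowers):
    print(row)
```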
CLASSICAL CLUSTERING METHODS (USING DISTANCES)
Clustering Definitions
• Clustering objects (or observations) can provide detail regarding the nature and structure of data
• Clustering is distinct from classification in terminology
Ø Classification pertains to a known number of groups, with the objective being to assign new observations to these groups
• Classical cluster analysis is a technique where no assumptions are made concerning the number of groups or the group structure
Ø Mixture models do make assumptions about the data
• Clustering algorithms make use of measures of similarity (or alternatively, dissimilarity) to define and group variables or observations
• Clustering presents a host of technical problems
• For a reasonably sized data set with n objects (either variables or individuals), the number of ways of grouping n objects into k groups is:
(1/k!) Σ_{j=0}^{k} (−1)^{k−j} C(k, j) j^n

• For example, there are over four trillion ways that 25 objects can be clustered into 4 groups – which solution is best?
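The counting formula above can be checked directly; a short sketch (the function name is illustrative):

```python
# Sketch of the counting formula: the number of ways to partition
# n objects into k non-empty groups,
#   (1/k!) * sum_{j=0}^{k} (-1)^(k-j) * C(k, j) * j^n
from math import comb, factorial

def n_groupings(n, k):
    total = sum((-1) ** (k - j) * comb(k, j) * j ** n for j in range(k + 1))
    return total // factorial(k)  # always divides evenly

print(n_groupings(25, 4))  # over four trillion, matching the slide
```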
Numerical Problems
• In theory, one way to find the best solution is to try each possible grouping of all of the objects – an optimization process called integer programming
• It is difficult, if not impossible, to do such a method given the state of today's computers (although computers are catching up with such problems)
• Rather than using such brute-force methods, a set of heuristics has been developed to allow for fast clustering of objects into groups
• Such methods are called heuristics because they do not guarantee that the solution will be optimal (best), only that the solution will be better than most
Clustering Heuristic Inputs
• The inputs into clustering heuristics are in the form of measures of similarities or dissimilarities
• The result of the heuristic depends in large part on the measure of similarity/dissimilarity used by the procedure

Clustering Variables vs. Clustering Observations
• When variables are to be clustered, oft-used measures of similarity include correlation coefficients (or similar measures for non-continuous variables)
• When observations are to be clustered, distance metrics are often used
Hierarchical Clustering Methods
• Because of the large number of possible clustering solutions, searching through all combinations is not feasible
• Hierarchical clustering techniques proceed by taking a set of objects and grouping a set at a time
• Two types of hierarchical clustering methods exist:
Ø Agglomerative hierarchical methods
Ø Divisive hierarchical methods

Agglomerative Clustering Methods
• Agglomerative clustering methods start first with the individual objects
• Initially, each object is its own cluster
• The most similar objects are then grouped together into a single cluster (with two objects)
We will find that what we mean by similar will change depending on the method
Agglomerative Clustering Methods
• The remaining steps involve merging the clusters according to the similarity or dissimilarity of the objects within the cluster to those outside of the cluster
• The method concludes when all objects are part of a single cluster

Divisive Clustering Methods
• Divisive hierarchical methods work in the opposite direction – beginning with a single, n-object-sized cluster
• The large cluster is then divided into two subgroups where the objects in opposing groups are relatively distant from each other
• The process continues similarly until there are as many clusters as there are objects
To Summarize
• So you can see that we have this idea of steps
Ø At each step two clusters combine to form one (Agglomerative)
OR…
Ø At each step a cluster is divided into two new clusters (Divisive)
Methods for Viewing Clusters
• As you could imagine, when we consider the methods for hierarchical clustering, there are a large number of clusters that are formed sequentially
• One of the most frequently used tools to view the clusters (and the level at which they were formed) is the dendrogram
• A dendrogram is a graph that describes the differing hierarchical clusters, and the distance at which each is formed
Example Data Set #1
• To demonstrate several of the hierarchical clustering methods, an example data set is used
• Data come from a 1991 study by the economic research department of the Union Bank of Switzerland representing economic conditions of 48 cities around the world
• Three variables were collected:
Ø Average working hours for 12 occupations
Ø Price of 112 goods and services excluding rent
Ø Index of net hourly earnings in 12 occupations

This is an example of agglomerative clustering, where cities start off in their own group and are then combined
For example, here Amsterdam and Brussels were combined to form a group
Notice that we do not specify groups, but if we know how many we want… we simply go to the step where there are that many groups

[Figure: dendrogram of the cities]

Where the lines connect is when those two previous groups were joined
Similarity?
• So, we mentioned that:
Ø The most similar objects are then grouped together into a single cluster (with two objects)
• So the next question is how do we measure similarity between clusters
Ø More specifically, how do we redefine it when a cluster contains a combination of old clusters
• We find that there are several ways to define similar, and each way defines a new method of clustering

Agglomerative Methods
• Next we discuss several different ways to complete agglomerative hierarchical clustering:
Ø Single Linkage
Ø Complete Linkage
Ø Average Linkage
Ø Centroid
Ø Median
Ø Ward's Method
Example Distance Matrix
• The example will be based on the distance matrix below:

      1   2   3   4   5
  1   0   9   3   6  11
  2   9   0   7   5  10
  3   3   7   0   9   2
  4   6   5   9   0   8
  5  11  10   2   8   0
Single Linkage
• The single linkage method of clustering involves combining clusters by finding the "nearest neighbor" – the cluster closest to any given observation within the current cluster

[Figure: two clusters – Cluster 1 (objects 1, 2) and Cluster 2 (objects 3, 4, 5) – joined by their shortest inter-cluster link; this link equals the minimum distance between any observation in Cluster 1 and any observation in Cluster 2, and any other distance is longer]

• So the distance between any two clusters is:

D(A, B) = min{ d(y_i, y_j) } for all y_i in A and y_j in B

So how would we do this using our distance matrix?
Single Linkage Example
• The first step in the process is to determine the two elements with the smallest distance, and combine them into a single cluster
• Here, the two objects that are most similar are objects 3 and 5… we will now combine these into a new cluster, and compute the distance from that cluster to the remaining clusters (objects) via the single linkage rule
Single Linkage Example
• In the original distance matrix, the rows/columns for objects 3 and 5 hold the distances from each remaining object to object 3 or object 5
• In equation form, the distance from the (35) cluster to the remaining objects is:

d(35),1 = min{d31, d51} = min{3, 11} = 3
d(35),2 = min{d32, d52} = min{7, 10} = 7
d(35),4 = min{d34, d54} = min{9, 8} = 8

The distances of 3 and 5 with 1 are 3 and 11; our rule says the distance of our new cluster with 1 is the minimum of these two values… 3
Likewise, the distances of 3 and 5 with 2 are 7 and 10, so the distance of our new cluster with 2 is 7
Single Linkage Example
• Using the distance values, we now consolidate our table so that (35) is now a single row/column
• The distance from the (35) cluster to the remaining objects is given below:

d(35),1 = min{d31, d51} = min{3, 11} = 3
d(35),2 = min{d32, d52} = min{7, 10} = 7
d(35),4 = min{d34, d54} = min{9, 8} = 8

       (35)   1   2   4
 (35)    0
   1     3    0
   2     7    9   0
   4     8    6   5   0
Single Linkage Example
• We now repeat the process, finding the smallest distance within the set of remaining clusters

       (35)   1   2   4
 (35)    0    3   7   8
   1     3    0   9   6
   2     7    9   0   5
   4     8    6   5   0

• The smallest distance is between object 1 and cluster (35)
• Therefore, object 1 joins cluster (35), creating cluster (135)
• The distance from cluster (135) to the other clusters is then computed:

d(135),2 = min{d(35),2, d12} = min{7, 9} = 7
d(135),4 = min{d(35),4, d14} = min{8, 6} = 6
Single Linkage Example
• Using the distance values, we now consolidate our table so that (135) is now a single row/column
• The distance from the (135) cluster to the remaining objects is given below:

       (135)   2   4
 (135)   0
   2     7     0
   4     6     5   0

d(135),2 = min{d(35),2, d12} = min{7, 9} = 7
d(135),4 = min{d(35),4, d14} = min{8, 6} = 6
Single Linkage Example
• We now repeat the process, finding the smallest distance within the set of remaining clusters
• The smallest distance is between object 2 and object 4
• These two objects will be joined to form cluster (24)
• The distance from (24) to (135) is then computed:

d(135)(24) = min{d(135),2, d(135),4} = min{7, 6} = 6

• The final cluster (12345) is formed at a distance of 6
The Dendrogram

[Figure: dendrogram of the single linkage solution – for example, here is where 3 and 5 joined]
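The whole worked example can be reproduced in a few lines. A pure-Python sketch (function and variable names are illustrative) of the single linkage rule applied to the slides' distance matrix:

```python
# Single linkage on the slides' 5x5 example distance matrix: at each
# step, merge the two clusters whose closest members are nearest.
from itertools import combinations

D = {(1, 2): 9, (1, 3): 3, (1, 4): 6, (1, 5): 11, (2, 3): 7,
     (2, 4): 5, (2, 5): 10, (3, 4): 9, (3, 5): 2, (4, 5): 8}

def d(i, j):
    return D[min(i, j), max(i, j)]

def single_linkage(objects):
    clusters = [frozenset([o]) for o in objects]
    merges = []
    while len(clusters) > 1:
        # cluster-to-cluster distance = minimum pairwise object distance
        dist, a, b = min(((min(d(i, j) for i in A for j in B), A, B)
                          for A, B in combinations(clusters, 2)),
                         key=lambda t: t[0])
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((sorted(a | b), dist))
    return merges

for members, height in single_linkage([1, 2, 3, 4, 5]):
    print(members, height)  # (35) at 2, (135) at 3, (24) at 5, (12345) at 6
```

The printed merge heights (2, 3, 5, 6) match the hand computation on the preceding slides.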
Complete Linkage
• The complete linkage method of clustering involves combining clusters by finding the "farthest neighbor" – the cluster farthest from any given observation within the current cluster
• This ensures that all objects in a cluster are within some maximum distance of each other

[Figure: two clusters – Cluster 1 (objects 1, 2) and Cluster 2 (objects 3, 4, 5) – joined by their longest inter-cluster link]
Clustering in R

Dendrogram of R Example with Flowers
Other Classical Clustering Methods
• K-means clustering will partition data into more than one group… but you have to specify how many groups
• The method uses the distance of each observation to each group mean (Euclidean, in the standard algorithm) and iteratively switches group membership until no observation switches
• R has a built-in K-means method (kmeans())
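A minimal sketch of the iterative scheme (plain Lloyd's algorithm with Euclidean distance; the data points and starting centroids are made-up illustrative values):

```python
# Sketch of K-means (Lloyd's algorithm): assign each point to its
# nearest centroid, move each centroid to its group's mean, and stop
# once no point switches groups. Uses Euclidean distance.
from math import dist

def k_means(points, centroids, steps=100):
    for _ in range(steps):
        groups = [[] for _ in centroids]
        for p in points:  # assignment step
            nearest = min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            groups[nearest].append(p)
        # update step: centroid = mean of its group (assumes no empty group)
        new = [tuple(sum(v) / len(g) for v in zip(*g)) for g in groups]
        if new == centroids:
            break  # converged: nobody switched
        centroids = new
    return centroids, groups

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids, groups = k_means(points, [(0, 0), (10, 10)])
print(centroids)  # [(0.0, 0.5), (10.0, 10.5)]
```

With two well-separated groups and one starting centroid near each, the algorithm converges after a single update.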
FINITE MIXTURE MODELS
Finite Mixture Models: Clustering with Likelihoods
• Finite mixture models are models that attempt to determine the number of clusters (called classes) in a set of data
Ø Finite: countably many classes
Ø Mixture: the distribution of the data comes from a mixture of observations from different classes
• FMMs use likelihoods to determine class membership
Ø A likelihood (pdf) is a similarity (inverse distance)
Ø Higher likelihood → observation is more like the class average → more likely the observation comes from the class
Ø Lower likelihood → observation is less like the class average → less likely the observation comes from the class
• FMMs have very specific names:
Ø Latent class analysis: FMM where data are binary and independent within class
Ø Factor mixture model: FMM where data are continuous and have a factor structure that yields a within-class covariance matrix
FMM Marginal Likelihood Function
• The general FMM marginal likelihood function is:

f(x_i) = Σ_{c=1}^{C} η_c f(x_i | c)

• f(x_i | c) is the within-class likelihood function (pdf)
Ø Each class has its own distribution
• η_c is the probability any observation comes from class c
Ø The "mixing" proportion
• The sum is where the mixture gets its name: the marginal data likelihood is a blend (mixture) of each class's likelihood
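As a sketch of the formula above for a univariate two-class normal mixture (all parameter values and function names are illustrative, not from the slides):

```python
# Sketch of the FMM marginal likelihood f(x_i) = sum_c eta_c * f(x_i | c)
# for a univariate normal mixture. All parameter values are illustrative.
from math import exp, pi, sqrt

def normal_pdf(x, mean, sd):
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2 * pi))

def marginal_likelihood(x, etas, class_params):
    # a blend (mixture) of each class's likelihood, weighted by eta_c
    return sum(eta * normal_pdf(x, m, s)
               for eta, (m, s) in zip(etas, class_params))

# halfway between two equally weighted classes, both terms contribute equally
print(marginal_likelihood(0.0, [0.5, 0.5], [(-1.0, 1.0), (1.0, 1.0)]))
```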
FMMs in R: Gaussian (MVN) Mixtures with mclust
• R has several packages for mixtures; I will only show you one – Mclust
• We will attempt to determine the number of classes in our flower data using only the four measurements
• First, we let mclust run a number of models to determine the "best fitting" one by BIC (note that mclust reports BIC on a scale where larger values indicate better fit)

Plot of BICs from mclust

Example Output from One Solution
Problems with FMMs
• Exploratory (finding the number of classes) uses of mixtures are often suspect, as nearly anything can bring about additional (spurious/false) classes
• For instance: if our data were not MVN within class, we would likely see more classes than we need
• Finding the right model is often difficult
WRAPPING UP

Wrapping Up
• This class only scratched the surface of what can be done to cluster or classify data
• Clustering: determining the number of clusters in data
Ø Agglomerative/hierarchical clustering
Ø K-means
Ø Finite mixture models
• Classification: putting observations into known classes
Ø Classical approach: discriminant analysis (not shown today; lda() R function)
Ø Modern approach: FMMs
• My class next semester (Diagnostic Testing) describes confirmatory uses of FMMs