UVA CS 4501: Machine Learning
Lecture 21: Unsupervised Clustering (III)
Dr. Yanjun Qi
University of Virginia, Department of Computer Science
Where are we? → Major sections of this course
• Regression (supervised)
• Classification (supervised)
• Feature selection
• Unsupervised models: Dimension Reduction (PCA); Clustering (K-means, GMM/EM, Hierarchical)
• Learning theory
• Graphical models
An unlabeled Dataset X
• A data matrix of n observations on p variables $x_1, x_2, \ldots, x_p$
• Data / points / instances / examples / samples / records: [rows]
• Features / attributes / dimensions / independent variables / covariates / predictors / regressors: [columns]

Unsupervised learning = learning from raw (unlabeled, unannotated, etc.) data, as opposed to supervised learning, where a classification label for each example is given.
What is clustering?
• Find groups (clusters) of data points such that data points within a group are similar (or related) to one another and different from (or unrelated to) the data points in other groups.
• Inter-cluster distances are maximized; intra-cluster distances are minimized.
Roadmap: clustering
• Definition of "groupness"
• Definition of "similarity/distance"
• Representation for objects
• How many clusters?
• Clustering algorithms
  – Partitional algorithms
  – Hierarchical algorithms
• Formal foundation and convergence
Clustering Algorithms
• Partitional algorithms
  – Usually start with a random (partial) partitioning
  – Refine it iteratively
    • K-means clustering
    • Mixture-model based clustering
• Hierarchical algorithms
  – Bottom-up, agglomerative
  – Top-down, divisive
(2) Partitional Clustering
• Nonhierarchical
• Constructs a partition of n objects into a set of K clusters
• The user has to specify the desired number of clusters K.
Other Partitioning Methods
• Partitioning around medoids (PAM): instead of averages, use multidimensional medians as centroids (cluster "prototypes"). Dudoit and Fridlyand (2002).
• Self-organizing maps (SOM): add an underlying "topology" (a neighboring structure on a lattice) that relates the cluster centroids to one another. Kohonen (1997), Tamayo et al. (1999).
• Fuzzy k-means: allow for a "gradation" of points between clusters; soft partitions. Gasch and Eisen (2002).
• Mixture-based clustering: implemented through an EM (Expectation-Maximization) algorithm. This provides soft partitioning, and allows for modeling of cluster centroids and shapes. Yeung et al. (2001), McLachlan et al. (2002).
E.g., SOM used for visualization: Islands of Music (Pampalk et al., KDD '03)
Partitional: Gaussian Mixture Model
• 1. Review of the Gaussian distribution
• 2. GMM for clustering: basic algorithm
• 3. GMM connecting to K-means
• 4. Problems of GMM and K-means
A Gaussian Mixture Model for Clustering
• Assume that the data are generated from a mixture of Gaussian distributions.
• Each Gaussian distribution has
  – a center (mean): $\mu_j$
  – a covariance: $\Sigma_j$
• For each data point, determine its membership $z_{ij}$: whether $x_i$ belongs to the $j$-th cluster.
Gaussian Distribution

$$X \sim N(\mu, \sigma^2)$$

i.e., a distribution parameterized by its mean and its variance (in the multivariate case, a mean vector and a covariance matrix).

Courtesy: http://research.microsoft.com/~cmbishop/PRML/index.htm
Example: the Bivariate Normal distribution

$$p(\vec{x}) = f(x_1, x_2) = \frac{1}{2\pi\,|\Sigma|^{1/2}}\, e^{-\frac{1}{2}(\vec{x}-\vec{\mu})^T \Sigma^{-1} (\vec{x}-\vec{\mu})}$$

with

$$\vec{\mu} = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \qquad
\Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix}
       = \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}$$

and

$$|\Sigma| = \sigma_{11}\sigma_{22} - \sigma_{12}^2 = \sigma_1^2\sigma_2^2(1-\rho^2)$$
[Figure: scatter plots of data drawn from the bivariate Normal distribution]
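To make this density formula concrete, here is a minimal NumPy sketch (the function and variable names are illustrative, not from the lecture) that evaluates the bivariate Normal density directly and checks it against SciPy:

```python
import numpy as np
from scipy.stats import multivariate_normal

def bivariate_normal_pdf(x, mu, sigma):
    """Evaluate the bivariate Normal density at a point x of shape (2,)."""
    p = len(mu)
    diff = x - mu
    norm_const = 1.0 / ((2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(sigma)))
    return norm_const * np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff)

mu = np.array([0.0, 1.0])
rho, s1, s2 = 0.5, 1.0, 2.0
sigma = np.array([[s1 ** 2, rho * s1 * s2],
                  [rho * s1 * s2, s2 ** 2]])

x = np.array([0.5, 0.5])
print(bivariate_normal_pdf(x, mu, sigma))     # direct formula
print(multivariate_normal(mu, sigma).pdf(x))  # SciPy agrees
```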
How to Estimate a Gaussian: MLE

• In the 1-D Gaussian case, the maximum-likelihood estimates simply set the mean and the variance to the sample mean and the sample variance:

$$\mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2$$
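A quick numerical sanity check of these MLE formulas in NumPy (the data and seed below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=10_000)  # true mu = 2, sigma = 3

mu_hat = x.sum() / len(x)                     # sample mean
var_hat = ((x - mu_hat) ** 2).sum() / len(x)  # MLE variance: divides by n, not n-1

print(mu_hat, np.sqrt(var_hat))  # close to 2.0 and 3.0
```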
The p-multivariate Normal distribution:

$$\langle X_1, X_2, \ldots, X_p \rangle \sim N(\vec{\mu}, \Sigma)$$
Partitional: Gaussian Mixture Model
• 1. Review of the Gaussian distribution
• 2. GMM for clustering: basic algorithm
• 3. GMM connecting to K-means
• 4. Problems of GMM and K-means
Application: Three Speaker Recognition Tasks
• Identification: whose voice is this?
• Verification / Authentication / Detection: is this Bob's voice?
• Segmentation and Clustering (Diarization): which segments are from the same speaker? Where are the speaker changes?

(Slide from Douglas Reynolds, MIT Lincoln Laboratory)
Application: GMMs for Speaker Recognition
• A Gaussian mixture model (GMM) represents a density as the weighted sum of multiple Gaussian distributions, $p(x \mid j)$.
• Each Gaussian state $j$ has a
  – mean $\mu_j$
  – covariance $\Sigma_j$
  – weight $w_j \equiv p(\mu = \mu_j)$
Recognition Systems: Gaussian Mixture Models

[Figure: a GMM density $p(x)$ over two dimensions (Dim 1, Dim 2), built from model components $j = 1, \ldots, K$ with parameters $\mu_j$, $\Sigma_j$, $w_j$.]
Learning a Gaussian Mixture
• Probability model:

$$\begin{aligned}
p(\vec{x} = \vec{x}_i) &= \sum_j p(\vec{x} = \vec{x}_i,\ \vec{\mu} = \vec{\mu}_j) && \text{(law of total probability)} \\
&= \sum_j p(\vec{\mu} = \vec{\mu}_j)\, p(\vec{x} = \vec{x}_i \mid \vec{\mu} = \vec{\mu}_j) && \text{(chain rule)} \\
&= \sum_j p(\vec{\mu} = \vec{\mu}_j)\, \frac{1}{(2\pi)^{p/2} |\Sigma_j|^{1/2}}\, e^{-\frac{1}{2}(\vec{x}_i - \vec{\mu}_j)^T \Sigma_j^{-1} (\vec{x}_i - \vec{\mu}_j)}
\end{aligned}$$

A Gaussian mixture model (GMM) represents the density as the weighted sum of multiple Gaussian distributions.
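A hedged sketch of this probability model in NumPy/SciPy; the component weights, means, and covariances below are made-up toy values, and the function name is illustrative:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_density(x, weights, means, covs):
    """p(x) = sum_j p(mu_j) * N(x; mu_j, Sigma_j): the mixture density above."""
    return sum(w * multivariate_normal(m, c).pdf(x)
               for w, m, c in zip(weights, means, covs))

# A toy 2-component mixture in 2-D.
weights = [0.3, 0.7]
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs = [np.eye(2), 2.0 * np.eye(2)]

x = np.array([1.0, 1.0])
print(gmm_density(x, weights, means, covs))

# The log-likelihood of a dataset (next slide) is then sum_i log p(x_i):
X = np.array([[0.1, -0.2], [2.9, 3.1], [1.0, 1.0]])
print(np.sum([np.log(gmm_density(xi, weights, means, covs)) for xi in X]))
```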
Max Log-Likelihood of the Observed Data Samples
• Log-likelihood of the data; apply MLE to find the optimal Gaussian parameters $\{p(\vec{\mu} = \vec{\mu}_j),\ j = 1 \ldots K\}$ and $\{\vec{\mu}_j, \Sigma_j,\ j = 1 \ldots K\}$:

$$\log p(x_1, x_2, x_3, \ldots, x_n) = \log \prod_{i=1..n} \sum_{j=1..K} p(\vec{\mu} = \vec{\mu}_j)\, \frac{1}{(2\pi)^{p/2} |\Sigma_j|^{1/2}}\, e^{-\frac{1}{2}(\vec{x}_i - \vec{\mu}_j)^T \Sigma_j^{-1} (\vec{x}_i - \vec{\mu}_j)}$$
Expectation-Maximization for Training GMM
• Start:
  – "Guess" the centroid and covariance for each of the K clusters
  – "Guess" the proportions of the clusters, e.g., uniform probability 1/K
• Loop:
  – For each point, revise its proportions of belonging to each of the K clusters
  – For each cluster, revise both the mean (centroid position) and the covariance (shape)
Another Gaussian Mixture Example

[Figures: EM on a 2-D example, at the start and after the 1st, 2nd, 3rd, 4th, 5th, 6th, and 20th iterations. At each iteration: for each point, its proportions of belonging to each of the K clusters are revised; for each cluster, its mean (centroid position), covariance (shape), and proportion in the mixture are revised.]
Recap: Gaussian Mixture Models

[Figure: a GMM density $p(x)$ over two dimensions (Dim 1, Dim 2), built from model components $j = 1, \ldots, K$ with parameters $\mu_j$, $\Sigma_j$, $w_j$.]
The Simplest GMM Assumption
• Each component generates data from a Gaussian with
  – mean $\mu_i$
  – a shared diagonal covariance matrix $\sigma^2 I$
[Figure: three such components with means $\mu_1, \mu_2, \mu_3$]
Another Simple GMM Assumption
• Each component generates data from a Gaussian with
  – mean $\mu_i$
  – a shared covariance matrix that is diagonal
[Figure: three such components with means $\mu_1, \mu_2, \mu_3$]
The General GMM Assumption
• Each component generates data from a Gaussian with
  – mean $\mu_i$
  – its own covariance matrix $\Sigma_i$
[Figure: three such components with means $\mu_1, \mu_2, \mu_3$]
Another Simple GMM Assumption
• Each component generates data from a Gaussian with
  – mean $\mu_i$
  – a cluster-specific diagonal covariance matrix $\sigma_j^2 I$
[Figure: three such components with means $\mu_1, \mu_2, \mu_3$]
A Bit More General GMM Assumption
• Each component generates data from a Gaussian with
  – mean $\mu_i$
  – a shared covariance matrix that is a full matrix
[Figure: three such components with means $\mu_1, \mu_2, \mu_3$]
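These covariance assumptions roughly correspond to the covariance_type option of scikit-learn's GaussianMixture: "spherical" (cluster-specific $\sigma_j^2 I$), "diag" (cluster-specific diagonal), "tied" (shared full matrix), and "full" (cluster-specific full matrix); the shared-diagonal variants above have no exact scikit-learn equivalent. A small sketch for comparing them on illustrative data:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(100, 2)),
               rng.normal([4.0, 4.0], 0.5, size=(100, 2))])

for cov_type in ["spherical", "diag", "tied", "full"]:
    gmm = GaussianMixture(n_components=2, covariance_type=cov_type,
                          random_state=0).fit(X)
    print(cov_type, gmm.bic(X))  # lower BIC suggests a better trade-off
```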
Concrete Equations for Learning a Gaussian Mixture (assuming a known, shared covariance)
Assuming $\Sigma$ Known and Shared

$$p(\vec{x} = \vec{x}_i) = \sum_{\mu_j} p(\vec{x} = \vec{x}_i,\ \vec{\mu} = \vec{\mu}_j)
= \sum_j p(\vec{\mu} = \vec{\mu}_j)\, p(\vec{x} = \vec{x}_i \mid \vec{\mu} = \vec{\mu}_j)
= \sum_j p(\vec{\mu} = \vec{\mu}_j)\, \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}}\, e^{-\frac{1}{2}(\vec{x}_i - \vec{\mu}_j)^T \Sigma^{-1} (\vec{x}_i - \vec{\mu}_j)}$$
Learning a Gaussian Mixture (assuming a known, shared covariance)

E-step:

$$E[z_{ij}] = p(\vec{\mu} = \vec{\mu}_j \mid x = x_i)
= \frac{p(x = x_i \mid \mu = \mu_j)\, p(\mu = \mu_j)}{\sum_{s=1}^{k} p(x = x_i \mid \mu = \mu_s)\, p(\mu = \mu_s)}
= \frac{\frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}}\, e^{-\frac{1}{2}(\vec{x}_i - \vec{\mu}_j)^T \Sigma^{-1} (\vec{x}_i - \vec{\mu}_j)}\, p(\mu = \mu_j)}{\sum_{s=1}^{k} \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}}\, e^{-\frac{1}{2}(\vec{x}_i - \vec{\mu}_s)^T \Sigma^{-1} (\vec{x}_i - \vec{\mu}_s)}\, p(\mu = \mu_s)}$$
E-step (vs. the Assignment Step in K-means), assuming a known, shared covariance: the same responsibility computation as above, i.e., a soft assignment of each point to the K clusters in place of K-means' hard assignment.
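A minimal NumPy/SciPy sketch of this E-step, assuming the shared-covariance case above (function and variable names are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, weights, means, cov):
    """Responsibilities E[z_ij] = p(mu_j | x_i) under a shared covariance."""
    n, K = X.shape[0], len(means)
    resp = np.empty((n, K))
    for j in range(K):
        # Numerator: p(x_i | mu_j) * p(mu_j), computed for all i at once.
        resp[:, j] = weights[j] * multivariate_normal(means[j], cov).pdf(X)
    # Denominator: sum over clusters s = 1..K, normalizing each row.
    resp /= resp.sum(axis=1, keepdims=True)
    return resp
```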
Learning a Gaussian Mixture (assuming a known, shared covariance)

M-step:

$$\mu_j \leftarrow \frac{\sum_{i=1}^{n} E[z_{ij}]\, x_i}{\sum_{i=1}^{n} E[z_{ij}]}, \qquad
p(\mu = \mu_j) \leftarrow \frac{1}{n} \sum_{i=1}^{n} E[z_{ij}]$$

The covariances $\Sigma_j$ ($j$: 1 to K) can also be derived in the M-step under a full setting.
M-step (vs. the Centroid Step in K-means): the same updates as above; the covariances $\Sigma_j$ ($j$: 1 to K) will also be derived in the M-step under a full setting.
M-step for Estimating the Unknown Covariance Matrix (more general; details in the EM-Extra lecture):

$$\Sigma_j^{(t+1)} = \frac{\sum_{i=1}^{n} E[z_{ij}]^{(t)}\, (x_i - \mu_j^{(t+1)})(x_i - \mu_j^{(t+1)})^T}{\sum_{i=1}^{n} E[z_{ij}]^{(t)}}$$
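A matching M-step sketch that updates the proportions, means, and (full, per-cluster) covariances from the responsibilities (all names are illustrative):

```python
import numpy as np

def m_step(X, resp):
    """Update weights, means, and full per-cluster covariances from E[z_ij]."""
    n, p = X.shape
    nk = resp.sum(axis=0)               # sum_i E[z_ij], per cluster
    weights = nk / n                    # p(mu = mu_j)
    means = (resp.T @ X) / nk[:, None]  # responsibility-weighted means
    covs = []
    for j in range(len(nk)):
        diff = X - means[j]             # (x_i - mu_j) for all i
        covs.append((resp[:, j, None] * diff).T @ diff / nk[j])
    return weights, means, covs
```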
Recap: Expectation-Maximization for Training GMM
• Start:
  – "Guess" the centroid and covariance for each of the K clusters
  – "Guess" the proportions of the clusters, e.g., uniform probability 1/K
• Loop:
  – For each point, revise its proportions of belonging to each of the K clusters (E-step)
  – For each cluster, revise both the mean (centroid position) and the covariance (shape) (M-step)
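Putting these steps together, a minimal, self-contained EM loop for a GMM with full per-cluster covariances might look like the following sketch (all names illustrative; the small ridge term added to each covariance is a numerical-stability detail the slides do not discuss):

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_gmm(X, K, n_iter=50, seed=0):
    """Minimal EM for a GMM: random start, then alternate E- and M-steps."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Start: "guess" centroids (random points), covariances, proportions 1/K.
    means = X[rng.choice(n, size=K, replace=False)].copy()
    covs = [np.cov(X.T) + 1e-6 * np.eye(p) for _ in range(K)]
    weights = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: each point's proportions of belonging to the K clusters.
        resp = np.column_stack([w * multivariate_normal(m, c).pdf(X)
                                for w, m, c in zip(weights, means, covs)])
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: revise each cluster's proportion, mean, and covariance.
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        covs = [((resp[:, j, None] * (X - means[j])).T @ (X - means[j]))
                / nk[j] + 1e-6 * np.eye(p) for j in range(K)]
    return weights, means, covs, resp
```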
Partitional: Gaussian Mixture Model
• 1. Review of the Gaussian distribution
• 2. GMM for clustering: basic algorithm
• 3. GMM connecting to K-means
• 4. Problems of GMM and K-means
Recap: K-means Iterative Learning

$$\underset{\{\vec{C}_j,\ m_{i,j}\}}{\arg\min}\ \sum_{j=1}^{K} \sum_{i=1}^{n} m_{i,j}\, (\vec{x}_i - \vec{C}_j)^2$$

Memberships $\{m_{i,j}\}$ and centers $\{\vec{C}_j\}$ are correlated.

E-step: given centers $\{\vec{C}_j\}$,

$$m_{i,j} = \begin{cases} 1 & j = \arg\min_k\, (\vec{x}_i - \vec{C}_k)^2 \\ 0 & \text{otherwise} \end{cases}$$

M-step: given memberships $\{m_{i,j}\}$,

$$\vec{C}_j = \frac{\sum_{i=1}^{n} m_{i,j}\, \vec{x}_i}{\sum_{i=1}^{n} m_{i,j}}$$
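The corresponding K-means loop, written to mirror the E-step / M-step structure above (a hedged sketch; initializing centers by sampling K data points is one common choice):

```python
import numpy as np

def kmeans(X, K, n_iter=20, seed=0):
    """Hard-assignment counterpart of EM for GMM."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(n_iter):
        # E-step: hard-assign each point to its nearest center (m_ij in {0, 1}).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # M-step: move each center to the mean of its assigned points.
        for j in range(K):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels
```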
Compare: K-means
• The EM algorithm for mixtures of Gaussians is like a "soft version" of the K-means algorithm.
• In the K-means "E-step" we do a hard assignment; in the K-means "M-step" we update the means as the weighted sum of the data, but now the weights are 0 or 1:

$$\underset{\{\vec{C}_j,\ m_{i,j}\}}{\arg\min}\ \sum_{j=1}^{K} \sum_{i=1}^{n} m_{i,j}\, (\vec{x}_i - \vec{C}_j)^2$$

versus the GMM log-likelihood (shared covariance):

$$\log \prod_{i=1}^{n} p(x = x_i) = \sum_i \log \left[ \sum_{\mu_j} p(\mu = \mu_j)\, \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}}\, e^{-\frac{1}{2}(\vec{x}_i - \vec{\mu}_j)^T \Sigma^{-1} (\vec{x}_i - \vec{\mu}_j)} \right]$$

• K-means only detects spherical clusters.
• GMM can adjust itself to elliptic-shaped clusters.
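To see this difference concretely, one can compare scikit-learn's KMeans and GaussianMixture on deliberately elongated clusters (a hedged sketch; the dataset and seeds are illustrative, and the GMM typically recovers the elliptic clusters much better):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.metrics import adjusted_rand_score

# Two elongated (elliptic) clusters that a spherical bias tends to split badly.
rng = np.random.default_rng(0)
stretch = np.array([[3.0, 0.0], [0.0, 0.3]])
A = rng.normal(size=(200, 2)) @ stretch
B = rng.normal(size=(200, 2)) @ stretch + np.array([0.0, 2.0])
X = np.vstack([A, B])
y = np.repeat([0, 1], 200)  # true cluster labels

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
gm = GaussianMixture(n_components=2, covariance_type="full",
                     random_state=0).fit(X)

print("k-means ARI:", adjusted_rand_score(y, km.labels_))
print("GMM ARI:    ", adjusted_rand_score(y, gm.predict(X)))
```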
(3) GMM Clustering

Task:                 Clustering
Representation:       Mixture of Gaussians
Score Function:       Likelihood
Search/Optimization:  EM algorithm
Models, Parameters:   Each point's soft membership, plus a mean / covariance per cluster
With cluster-specific covariances, the score function becomes

$$\log \prod_{i=1}^{n} p(x = x_i) = \sum_i \log \left[ \sum_{\mu_j} p(\mu = \mu_j)\, \frac{1}{(2\pi)^{p/2} |\Sigma_j|^{1/2}}\, e^{-\frac{1}{2}(\vec{x}_i - \vec{\mu}_j)^T \Sigma_j^{-1} (\vec{x}_i - \vec{\mu}_j)} \right]$$
Partitional: Gaussian Mixture Model
• 1. Review of the Gaussian distribution
• 2. GMM for clustering: basic algorithm
• 3. GMM connecting to K-means
• 4. Problems of GMM and K-means
Unsupervised Learning: Not as Hard as It Looks
• Sometimes easy
• Sometimes impossible
• And sometimes in between
Problems (I)
• Both k-means and mixture models need to compute the centers of clusters and an explicit distance measurement.
  – Given a strange distance measurement, the centers of clusters can be hard to compute. E.g., the $L_\infty$ (Chebyshev) distance:

$$\|\vec{x} - \vec{x}'\|_\infty = \max\left( |x_1 - x_1'|,\ |x_2 - x_2'|,\ \ldots,\ |x_p - x_p'| \right)$$

[Figure: points $x$, $y$, $z$ with $\|x - y\|_\infty = \|x - z\|_\infty$]
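A tiny sketch of this distance (illustrative values), to make the definition concrete:

```python
import numpy as np

def chebyshev(x, y):
    """L-infinity distance: the max over coordinate-wise absolute differences."""
    return np.max(np.abs(x - y))

x = np.array([1.0, 5.0, 2.0])
y = np.array([4.0, 4.0, 0.0])
print(chebyshev(x, y))  # 3.0 = max(|1-4|, |5-4|, |2-0|)
```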
Problems (II)
• Both k-means and mixture models look for compact clustering structures.
  – In some cases, connected clustering structures are more desirable.
  – Graph-based clustering, e.g., MinCut, spectral clustering.
E.g., image segmentation through MinCut.
References
• Hastie, Trevor, et al. The Elements of Statistical Learning. Vol. 2, No. 1. New York: Springer, 2009.
• Big thanks to Prof. Eric Xing @ CMU for allowing me to reuse some of his slides.
• Big thanks to Prof. Ziv Bar-Joseph @ CMU for allowing me to reuse some of his slides.
• Clustering slides from Prof. Rong Jin @ MSU.