Data Mining
Cluster Analysis
Lecture Notes 10
Probabilistic Model-Based Clustering

Source: http://slidepdf.com/reader/full/cluster-anlysis (8/20/2019)



Probabilistic Model-Based Clustering 

Clustering methods discussed so far
 –  Every data object is assigned to exactly one cluster
Some applications may call for fuzzy or soft cluster assignment
 –  Ex.: an e-game could belong to both entertainment and software
Methods:
 –  fuzzy clusters
 –  probabilistic model-based clusters


Fuzzy Clustering

Given a set of objects X = {x1, …, xn}, a fuzzy set S is a subset of X that allows each object in X to have a membership degree between 0 and 1.
A fuzzy set S can be modeled as a function
 F_S : X → [0, 1]
Example: the popularity of cameras is defined as a fuzzy mapping
 pop(o) = 1,        if 1000 or more units of o are sold
 pop(o) = i/1000,   if i (i < 1000) units of o are sold

 Camera   Sales (units)
 A        50
 B        1320
 C        860
 D        270

 –  Then, A(0.05), B(1), C(0.86), D(0.27)
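As a quick illustration, the fuzzy membership function above can be written down directly; the function and sales figures are those from the slide:

```python
# The fuzzy "popularity" membership function from the slide:
# pop(o) = 1 if 1000 or more units are sold, i/1000 if i (< 1000) units are sold.
def pop(units_sold):
    return 1.0 if units_sold >= 1000 else units_sold / 1000

sales = {"A": 50, "B": 1320, "C": 860, "D": 270}
degrees = {camera: pop(units) for camera, units in sales.items()}
# degrees == {"A": 0.05, "B": 1.0, "C": 0.86, "D": 0.27}
```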


Fuzzy Clustering

k fuzzy clusters C1, …, Ck, represented as a partition matrix M = [w_ij] (w_ij is the membership degree of object o_i in fuzzy cluster C_j)
 –  for each object o_i and cluster C_j, 0 ≤ w_ij ≤ 1 (fuzzy set)
 –  for each object o_i, Σ_{j=1..k} w_ij = 1, i.e., equal participation in the clustering
 –  for each cluster C_j, 0 < Σ_{i=1..n} w_ij < n, which ensures there is no empty cluster
For an object o_i, the SSE is
 SSE(o_i) = Σ_{j=1..k} w_ij^p dist(o_i, c_j)^2,
 –  where the parameter p controls the influence of the degrees of membership
For a cluster C_j, the SSE is
 SSE(C_j) = Σ_{i=1..n} w_ij^p dist(o_i, c_j)^2.
To measure how well a clustering fits the data:
 SSE(C) = Σ_{i=1..n} Σ_{j=1..k} w_ij^p dist(o_i, c_j)^2.
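The overall SSE(C) above can be computed with a short helper; this is a minimal sketch for 1-d objects, not code from the slides:

```python
def fuzzy_sse(objects, centers, W, p=2):
    """SSE(C) = sum_i sum_j w_ij^p * dist(o_i, c_j)^2 (1-d Euclidean distance)."""
    return sum(
        (W[i][j] ** p) * (o - c) ** 2
        for i, o in enumerate(objects)
        for j, c in enumerate(centers)
    )

# With a crisp (0/1) partition matrix this reduces to the ordinary k-means SSE:
objects = [1.0, 5.0]
centers = [1.0, 4.0]
W = [[1.0, 0.0], [0.0, 1.0]]
# fuzzy_sse(objects, centers, W) == 1.0
```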


Fuzzy Clustering - Example

Example: suppose the store has six reviews
Let the cluster features be
 –  C1: {digital camera, lens}
 –  C2: {computer}
For review R_i and cluster C_j, w_ij is defined as
 w_ij = |R_i ∩ C_j| / |R_i ∩ (C1 ∪ C2)|
      = |R_i ∩ C_j| / |R_i ∩ {digital camera, lens, computer}|.

 Review ID   Keywords
 R1          digital camera, lens
 R2          digital camera
 R3          lens
 R4          digital camera, lens, computer
 R5          computer, CPU
 R6          computer, computer game

The partition matrix:
 M = [ 1    0
       1    0
       1    0
       2/3  1/3
       0    1
       0    1 ].

Here review R4 belongs to clusters C1 and C2 with membership degrees 2/3 and 1/3, respectively.
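The partition matrix can be reproduced from the keyword sets in the table; a small sketch:

```python
from fractions import Fraction

clusters = [{"digital camera", "lens"},   # C1
            {"computer"}]                 # C2
reviews = {
    "R1": {"digital camera", "lens"},
    "R2": {"digital camera"},
    "R3": {"lens"},
    "R4": {"digital camera", "lens", "computer"},
    "R5": {"computer", "CPU"},
    "R6": {"computer", "computer game"},
}
features = set().union(*clusters)         # C1 ∪ C2

def w(review, cluster):
    # w_ij = |R_i ∩ C_j| / |R_i ∩ (C1 ∪ C2)|
    return Fraction(len(review & cluster), len(review & features))

M = {rid: [w(kws, c) for c in clusters] for rid, kws in reviews.items()}
# M["R4"] == [Fraction(2, 3), Fraction(1, 3)]
```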


Model-Based Clustering

A set C of k probabilistic clusters C1, …, Ck with probability density functions f1, …, fk, respectively, and their probabilities ω1, …, ωk.
The probability of an object o being generated by cluster C_j is
 P(o|C_j) = ω_j f_j(o).
The probability of o being generated by the set of clusters C is
 P(o|C) = Σ_{j=1..k} ω_j f_j(o).
Since objects are assumed to be generated independently, for a data set D = {o1, …, on} we have
 P(D|C) = Π_{i=1..n} P(o_i|C) = Π_{i=1..n} Σ_{j=1..k} ω_j f_j(o_i).


Model-Based Clustering

Task: find a set C of k probabilistic clusters such that P(D|C) is maximized.
However, maximizing P(D|C) is often intractable, since the probability density function of a cluster can take an arbitrarily complicated form.
To make the computation feasible (as a compromise), assume that the probability density functions are parameterized distributions.
With O = {o1, …, on} (the n observed objects), Θ = {θ1, …, θk} (the parameters of the k distributions), and P_j(o_i|θ_j) the probability that o_i is generated from the j-th distribution with parameter θ_j, we have
 P(o_i|Θ) = Σ_{j=1..k} ω_j P_j(o_i|θ_j),
 P(O|Θ) = Π_{i=1..n} Σ_{j=1..k} ω_j P_j(o_i|θ_j).


Univariate Gaussian Mixture Model

Assume the probability density function of each cluster follows a 1-d Gaussian distribution, and suppose there are k clusters.
The probability density function of each cluster is centered at μ_j with standard deviation σ_j; with θ_j = (μ_j, σ_j), we have
 P(o_i|θ_j) = (1 / (√(2π) σ_j)) e^(−(o_i − μ_j)² / (2σ_j²)).
Assuming that each cluster has the same probability, that is, ω1 = ω2 = ··· = ωk = 1/k, we have
 P(o_i|Θ) = (1/k) Σ_{j=1..k} (1 / (√(2π) σ_j)) e^(−(o_i − μ_j)² / (2σ_j²)),
 P(O|Θ) = Π_{i=1..n} (1/k) Σ_{j=1..k} (1 / (√(2π) σ_j)) e^(−(o_i − μ_j)² / (2σ_j²)).
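These densities can be evaluated directly; a minimal sketch with two illustrative clusters (the parameter and data values below are assumptions, not from the slides):

```python
import math

def gaussian_pdf(x, mu, sigma):
    """P(o_i | theta_j) for a 1-d Gaussian."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def mixture_pdf(x, params):
    """P(o_i | Theta) with equal cluster probabilities omega_j = 1/k."""
    return sum(gaussian_pdf(x, mu, s) for mu, s in params) / len(params)

def log_likelihood(data, params):
    """log P(O | Theta) = sum_i log P(o_i | Theta)."""
    return sum(math.log(mixture_pdf(x, params)) for x in data)

params = [(0.0, 1.0), (5.0, 1.0)]   # k = 2 illustrative clusters
data = [-0.2, 0.1, 4.8, 5.3]
ll = log_likelihood(data, params)
```

Working in log space avoids the numeric underflow that the raw product Π P(o_i|Θ) would cause on larger data sets.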


The EM (Expectation Maximization) Algorithm

The k-means algorithm has two steps at each iteration:
 –  Expectation step (E-step): given the current cluster centers, each object is assigned to the cluster whose center is closest to it: an object is expected to belong to the closest cluster
 –  Maximization step (M-step): given the cluster assignment, for each cluster, the algorithm adjusts the center so that the sum of the distances from the objects assigned to this cluster to the new center is minimized
The EM algorithm: a framework to approach maximum likelihood or maximum a posteriori estimates of parameters in statistical models
 –  The E-step assigns objects to clusters according to the current fuzzy clustering or parameters of probabilistic clusters
 –  The M-step finds the new clustering or parameters that minimize the SSE or maximize the expected likelihood


Example: Fuzzy Clustering Using the EM Algorithm

Initially, let c1 = a and c2 = b.
1st E-step: assign each object o to c1 and c2 with membership weights computed from its distances to the current centers.
1st M-step: recalculate the centroids according to the partition matrix, minimizing the sum of squared error (SSE).
Iterate until the cluster centers converge or the change is small enough.


Computing Mixture Models with EM

Given n objects O = {o1, …, on}, we want to mine a set of parameters Θ = {θ1, …, θk} such that P(O|Θ) is maximized, where θ_j = (μ_j, σ_j) are the mean and standard deviation of the j-th univariate Gaussian distribution.
We initially assign random values to the parameters θ_j, then iteratively conduct the E- and M-steps until convergence or until the change is sufficiently small.
At the E-step, for each object o_i, calculate the probability that o_i belongs to each distribution:
 P(θ_j|o_i, Θ) = P(o_i|θ_j) / Σ_{l=1..k} P(o_i|θ_l).
At the M-step, adjust the parameters θ_j = (μ_j, σ_j) so that the expected likelihood P(O|Θ) is maximized:
 μ_j = Σ_{i=1..n} o_i P(θ_j|o_i, Θ) / Σ_{i=1..n} P(θ_j|o_i, Θ),
 σ_j = √( Σ_{i=1..n} P(θ_j|o_i, Θ) (o_i − μ_j)² / Σ_{i=1..n} P(θ_j|o_i, Θ) ).
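Putting the E- and M-steps together gives a compact EM loop. This is a sketch under the slide's equal-probability assumption (ω_j = 1/k); to keep it deterministic it spreads the initial means over the data range instead of choosing them at random:

```python
import math

def gaussian_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def em_gmm(data, k, iters=50):
    xs = sorted(data)
    # initialize the means spread over the data range, all sigmas at 1
    mus = [xs[(j * (len(xs) - 1)) // max(k - 1, 1)] for j in range(k)]
    sigmas = [1.0] * k
    for _ in range(iters):
        # E-step: P(theta_j | o_i, Theta) = P(o_i|theta_j) / sum_l P(o_i|theta_l)
        resp = []
        for x in data:
            p = [gaussian_pdf(x, mus[j], sigmas[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: weighted mean and weighted standard deviation per cluster
        for j in range(k):
            wsum = sum(r[j] for r in resp)
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / wsum
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, data)) / wsum
            sigmas[j] = max(math.sqrt(var), 1e-6)   # guard against collapse
    return mus, sigmas

mus, sigmas = em_gmm([0.9, 1.0, 1.1, 1.2, 9.8, 10.0, 10.1, 10.3], k=2)
# mus converge near the two group means (about 1.05 and 10.05)
```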


Advantages and Disadvantages of Mixture Models

Strengths
 –  Mixture models are more general than partitioning and fuzzy clustering
 –  Clusters can be characterized by a small number of parameters
 –  The results may satisfy the statistical assumptions of the generative models
Weaknesses
 –  Converge to a local optimum (overcome: run multiple times with random initialization)
 –  Computationally expensive if the number of distributions is large, or the data set contains very few observed data points
 –  Need large data sets
 –  Hard to estimate the number of clusters


Notes about slides

The following slides are adapted from the "Introduction to Data Mining" book. They describe EM in a simpler way.
There may be some notation differences from the previous slides.


Model-based clustering

In order to understand our data, we will assume that there is a generative process (a model) that creates/describes the data, and we will try to find the model that best fits the data.
 –  Models of different complexity can be defined, but we will assume that our model is a distribution from which data points are sampled
 –  Example: the data is the height of all people in Greece
In most cases, a single distribution is not good enough to describe all data points: different parts of the data follow different distributions
 –  Example: the data is the height of all people in Greece and China
 –  We need a mixture model
 –  Different distributions correspond to different clusters in the data


Gaussian Distribution

Example: the data is the height of all people in Greece
 –  Experience has shown that this data follows a Gaussian (Normal) distribution
 –  Reminder: the Normal distribution is
  P(x) = (1 / (√(2π) σ)) e^(−(x − μ)² / (2σ²))
 –  μ = mean, σ = standard deviation


Gaussian Model

What is a model?
 –  A Gaussian distribution is fully defined by the mean μ and the standard deviation σ
 –  We define our model as the pair of parameters θ = (μ, σ)
This is a general principle: a model is defined as a vector of parameters θ
Fitting the model: we want to find the normal distribution that best fits our data
 –  Find the best values for μ and σ
 –  But what does best fit mean?


Maximum Likelihood Estimation (MLE)

Suppose that we have a vector X = (x1, …, xn) of values
And we want to fit a Gaussian N(μ, σ) model to the data
Probability of observing a point x_i:
 P(x_i) = (1 / (√(2π) σ)) e^(−(x_i − μ)² / (2σ²))
Probability of observing all points (assuming independence):
 P(X) = Π_{i=1..n} P(x_i)
We want to find the parameters θ = (μ, σ) that maximize the probability P(X|θ)

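For a single Gaussian, the maximizing parameters have a closed form: the sample mean and the (1/n) sample standard deviation. A minimal sketch:

```python
import math

def gaussian_mle(xs):
    """MLE for a single Gaussian: sample mean and (1/n) standard deviation."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    return mu, sigma

mu, sigma = gaussian_mle([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
# mu == 5.0, sigma == 2.0
```

Note that the MLE divides by n, not n − 1; it is a biased (but consistent) estimate of the variance.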

Maximum Likelihood Estimation (MLE)

Note: these are also the most likely parameters given the data
 P(θ|X) = P(X|θ) P(θ) / P(X)
If we have no prior information about θ, or X, then maximizing P(X|θ) is the same as maximizing P(θ|X)


A Gaussian Distribution (figure slide)


Mixture of Gaussians

Suppose that you have the heights of people from Greece and China. In this case the data is the result of a mixture of two Gaussians
 –  One for Greek people, and one for Chinese people
 –  Identifying, for each value, which Gaussian is most likely to have generated it will give us a clustering


Mixture model

A value x_i is generated according to the following process:
 –  First select the nationality
   •  With probability π_G select Greece, with probability π_C select China (π_G + π_C = 1)
 –  Given the nationality, generate the point from the corresponding Gaussian
   •  P(x_i|θ_G) ~ N(μ_G, σ_G) if Greece
   •  P(x_i|θ_C) ~ N(μ_C, σ_C) if China

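The generative process above can be simulated directly; the parameter values below are illustrative assumptions, not values from the slides:

```python
import random

def sample_height(rng, pi_g=0.5, theta_g=(177.0, 7.0), theta_c=(170.0, 6.0)):
    """First pick the nationality, then draw from that Gaussian."""
    if rng.random() < pi_g:
        return "Greece", rng.gauss(*theta_g)
    return "China", rng.gauss(*theta_c)

rng = random.Random(42)
points = [sample_height(rng) for _ in range(1000)]
```

Data sampled this way is exactly what the EM algorithm later tries to "invert": recover π, μ, σ from the heights alone, without seeing the nationality labels.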

Mixture Model

Our model has the following parameters:
 θ = (π_G, π_C, μ_G, μ_C, σ_G, σ_C)
 –  mixture probabilities: π_G, π_C
 –  distribution parameters: μ_G, μ_C, σ_G, σ_C
For a value x_i, we have
 P(x_i|θ) = π_G P(x_i|θ_G) + π_C P(x_i|θ_C)
For all values X = (x1, …, xn),
 P(X|θ) = Π_{i=1..n} P(x_i|θ)
We want to estimate the parameters that maximize the likelihood of the data


Mixture Models

Once we have the parameters θ = (π_G, π_C, μ_G, μ_C, σ_G, σ_C), we can estimate the membership probabilities P(G|x_i) and P(C|x_i) for each point x_i:
 –  This is the probability that point x_i belongs to the Greek or the Chinese population (cluster)
 P(G|x_i) = P(x_i|G) P(G) / (P(x_i|G) P(G) + P(x_i|C) P(C))
          = π_G P(x_i|θ_G) / (π_G P(x_i|θ_G) + π_C P(x_i|θ_C))

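The membership probability above is Bayes' rule applied to the two Gaussians; a sketch with illustrative parameter values:

```python
import math

def gaussian_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def p_greek(x, pi_g, theta_g, theta_c):
    """P(G|x) = pi_G P(x|theta_G) / (pi_G P(x|theta_G) + pi_C P(x|theta_C))."""
    num = pi_g * gaussian_pdf(x, *theta_g)
    den = num + (1 - pi_g) * gaussian_pdf(x, *theta_c)
    return num / den

# Illustrative parameters: a point at mu_G gets a higher Greek membership
p = p_greek(177.0, 0.5, (177.0, 7.0), (170.0, 6.0))
```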

EM (Expectation Maximization) Algorithm

Initialize the values of the parameters in θ to some random values
Repeat until convergence
 –  E-step: given the parameters θ, estimate the membership probabilities P(G|x_i) and P(C|x_i)
 –  M-step: compute the parameter values that (in expectation) maximize the data likelihood

MLE estimates (with n the number of points):
 π_G = (1/n) Σ_{i=1..n} P(G|x_i)   (fraction of the population in G; similarly for C)
 μ_G = Σ_{i=1..n} P(G|x_i) x_i / Σ_{i=1..n} P(G|x_i)
 σ_G² = Σ_{i=1..n} P(G|x_i) (x_i − μ_G)² / Σ_{i=1..n} P(G|x_i)
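The whole loop, with the mixture weights also re-estimated, can be sketched as follows; the data and initial values are illustrative assumptions:

```python
import math

def gaussian_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def em_two_gaussians(xs, mu_g, mu_c, sigma=5.0, pi_g=0.5, iters=50):
    s_g = s_c = sigma
    n = len(xs)
    for _ in range(iters):
        # E-step: membership probabilities P(G|x_i)
        resp = []
        for x in xs:
            a = pi_g * gaussian_pdf(x, mu_g, s_g)
            b = (1 - pi_g) * gaussian_pdf(x, mu_c, s_c)
            resp.append(a / (a + b))
        # M-step: MLE updates for the weight, means, and standard deviations
        n_g = sum(resp)
        pi_g = n_g / n
        mu_g = sum(r * x for r, x in zip(resp, xs)) / n_g
        mu_c = sum((1 - r) * x for r, x in zip(resp, xs)) / (n - n_g)
        s_g = max(math.sqrt(sum(r * (x - mu_g) ** 2 for r, x in zip(resp, xs)) / n_g), 1e-3)
        s_c = max(math.sqrt(sum((1 - r) * (x - mu_c) ** 2 for r, x in zip(resp, xs)) / (n - n_g)), 1e-3)
    return pi_g, (mu_g, s_g), (mu_c, s_c)

heights = [160.0, 162.0, 165.0, 168.0, 178.0, 180.0, 182.0, 185.0]
pi_g, (mu_g, s_g), (mu_c, s_c) = em_two_gaussians(heights, mu_g=180.0, mu_c=165.0)
# mu_g and mu_c converge near the two group means (about 181.25 and 163.75)
```

Unlike the earlier equal-weight sketch, this version also updates π_G at each M-step, matching the MLE estimates on this slide.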


Relationship to K-means

E-step: assignment of points to clusters
 –  K-means: hard assignment; EM: soft assignment
M-step: computation of centroids
 –  K-means assumes a common fixed variance (spherical clusters)
 –  EM can change the variance for different clusters or different dimensions (ellipsoid clusters)
If the variance is fixed, then both minimize the same error function


EM Clustering (figure slides)



Mixture Model and K-means Clustering (figure slide)