Expectation-Maximization Algorithm
CAP5610: Machine Learning. Instructor: Guo-Jun QI
What is EM?
An algorithm that alternates between an assignment step and an estimation step.
A parameter estimation method: it falls into the general framework of maximum-likelihood estimation (MLE).
The general form was given by Dempster, Laird, and Rubin (1977), although the essence of the algorithm had appeared previously in various forms.
Outline
- Maximum Likelihood Estimation (MLE)
- EM: basic concepts
- EM for Gaussian Mixture Model (GMM)
What is MLE?
Given:
- A sample X = X_1, …, X_n
- A vector of parameters θ

We define:
- The likelihood of the data: P(X | θ)
- The log-likelihood of the data: L(θ) = log P(X | θ)
Given X, find

$$\theta_{ML} = \arg\max_{\theta \in \Omega} L(\theta)$$
MLE (cont)
Often we assume that the X_i's are independently and identically distributed (i.i.d.).
Depending on the form of p(x | θ), solving this optimization problem can be easy or hard.
$$
\begin{aligned}
\theta_{ML} &= \arg\max_{\theta \in \Omega} L(\theta) \\
&= \arg\max_{\theta \in \Omega} \log P(X_1, \ldots, X_n \mid \theta) \\
&= \arg\max_{\theta \in \Omega} \log \prod_{i=1}^{n} P(X_i \mid \theta) \\
&= \arg\max_{\theta \in \Omega} \sum_{i=1}^{n} \log P(X_i \mid \theta)
\end{aligned}
$$
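As a concrete illustration (a minimal sketch with synthetic data, not part of the original slides): because the i.i.d. assumption turns the log-likelihood into a sum, the MLE for a 1-D Gaussian has a simple closed form.

```python
# A minimal sketch: MLE for a 1-D Gaussian. The i.i.d. log-likelihood
# is a sum, and maximizing it in closed form gives the sample mean
# and the (biased) sample variance.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=2.0, scale=1.5, size=1000)  # observed sample

mu_ml = X.mean()                     # argmax over mu of sum_i log N(x_i | mu, var)
var_ml = ((X - mu_ml) ** 2).mean()   # MLE of the variance (divides by n)

print(f"mu_ML = {mu_ml:.3f}, var_ML = {var_ml:.3f}")
```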
Example 1: Fit a line to observed data
[Figure: scatter plot of observed (x, y) data points]
Maximum likelihood estimation for the slope of a single line

Data: $(X_n, Y_n),\ n = 1, \ldots, N$
Model: $Y = aX + w$, where $w \sim N(0, \sigma^2)$

Data likelihood for point n: $P(Y_n \mid X_n, a) = N(Y_n \mid aX_n, \sigma^2)$

Maximum likelihood estimate: $\hat{a} = \arg\max_a \sum_{n=1}^{N} \log P(Y_n \mid X_n, a)$, which gives the least-squares regression formula $\hat{a} = \frac{\sum_n X_n Y_n}{\sum_n X_n^2}$.
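To make this concrete, here is a minimal sketch (synthetic data, illustrative names; not from the slides) showing that the Gaussian-noise MLE for the slope is exactly the least-squares formula.

```python
# A minimal sketch: MLE for the slope of a no-intercept line
# Y = a*X + w with Gaussian noise w. Maximizing the Gaussian
# log-likelihood is equivalent to least squares.
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=200)
Y = 0.7 * X + rng.normal(scale=0.5, size=200)  # true slope a = 0.7

a_ml = (X * Y).sum() / (X ** 2).sum()  # closed-form least-squares slope
print(f"a_ML = {a_ml:.3f}")
```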
Example 2: Fitting two lines to observed data
[Figure: scatter plot of (x, y) data points drawn from two different lines]
Fitting two lines: on the one hand…
If we knew which points went with which lines, we'd be back at the single line-fitting problem, twice.
[Figure: the data with points assigned to Line 1 and Line 2]
Fitting two lines, on the other hand…
We could figure out the probability that any point came from either line if we just knew the equations of the two lines.
[Figure: the data with per-point membership probabilities for the two lines]
Expectation Maximization (EM): a solution to chicken-and-egg problems
EM example:
- Step 1: Randomly guess two lines as the initialization.
- Step 2: Assign points to the closest line.
- Step 3: Update the current fit of each line to its assigned points.
- Step 4: Re-assign points to the closest line.
- Step 5: Re-fit the lines to the assigned points. Converged!

[Figure sequence: the two fitted lines and point assignments after each step]
EM Algorithm

EM with hard assignment:
- Randomly guess the initial models, e.g., the cluster means in K-means or the lines in the multi-line fitting problem.
- Repeat until convergence:
  - Hard assignment: assign each data point to a cluster based on fitness. Fitness is the distance to the cluster mean in K-means, and the distance to the line in multi-line fitting.
  - Re-fit the model for each cluster: in K-means, update the cluster mean; in multi-line fitting, update the line. Both re-fitting rules can be formulated as maximizing the likelihood of generating the assigned data points.

A code sketch of this procedure for the two-line problem follows.
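Below is a minimal sketch (synthetic data, illustrative names; not the course's code) of hard-assignment EM for the two-line fitting example, alternating the assignment and re-fitting steps above.

```python
# A minimal sketch: hard-assignment EM for fitting two no-intercept
# lines y = a*x to 2-D points.
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-5, 5, size=300)
true_a = np.where(rng.random(300) < 0.5, 1.5, -0.5)  # hidden line membership
y = true_a * x + rng.normal(scale=0.3, size=300)

a = np.array([1.0, -1.0])          # Step 1: random initial slopes
for _ in range(20):
    # Hard assignment: each point goes to the line with smaller residual.
    resid = np.abs(y[:, None] - np.outer(x, a))      # shape (n_points, 2)
    z = resid.argmin(axis=1)
    # Re-fit: least-squares slope on each cluster (the MLE from before).
    for k in range(2):
        m = z == k
        if m.any():
            a[k] = (x[m] * y[m]).sum() / (x[m] ** 2).sum()

print("estimated slopes:", a)
```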
Outline
- Maximum Likelihood Estimation (MLE)
- EM: basic concepts (soft assignment)
- EM for Gaussian Mixture Model (GMM)
Basic setting in EM

- X is a set of data points: the observed data.
- Y is a cluster index denoting the assignment of a data point to a cluster. Y is not a deterministic variable as in hard assignment; it is now a random variable, for soft assignment.

EM is a method to find θ_ML, where

$$\theta_{ML} = \arg\max_{\theta} L(\theta) = \arg\max_{\theta} \log P(X \mid \theta)$$

Calculating P(X | θ) directly is hard, because we have to marginalize out Y. Calculating P(X, Y | θ) is much simpler, where Y is the "hidden" (or "missing") data. A small numeric illustration follows.
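Below is a minimal sketch (illustrative numbers, assumed notation) of this point: the complete-data log-probability log P(X, Y | θ) needs no sum, while the marginal log P(X | θ) requires summing over every value of Y inside the log, computed stably here with logsumexp.

```python
# Why P(X | theta) is harder than P(X, Y | theta): with Y hidden, the
# log-likelihood has a sum over all assignments inside the log.
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

x = np.array([0.2, 1.9, 2.1])                           # observed points
means, p = np.array([0.0, 2.0]), np.array([0.5, 0.5])   # two components

# log P(x_i, Y=k | theta) = log p_k + log N(x_i | mu_k, 1): easy, no sum.
log_joint = np.log(p) + norm.logpdf(x[:, None], loc=means)  # shape (3, 2)

# log P(x_i | theta) = log sum_k P(x_i, Y=k | theta): marginalize out Y.
log_marginal = logsumexp(log_joint, axis=1)
print(log_marginal.sum())            # total log-likelihood of the data
```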
The EM terminology
$$P(X, Y = k \mid \theta) = P(X \mid Y = k, \theta)\, P(Y = k \mid \theta) = P(X \mid \theta_k)\, P_k$$

Here θ_k denotes the parameters of the k-th cluster's distribution, and P_k the prior proportion of cluster k.
Gaussian Distribution

$$f(x \mid \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^d \det(\Sigma)}} \exp\left( -\frac{(x - \mu)^T \Sigma^{-1} (x - \mu)}{2} \right)$$

- d = feature dimension
- x = random data point
- μ = mean vector
- Σ = covariance matrix
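As a minimal sketch (not from the slides), the density above translates directly into NumPy:

```python
# The multivariate Gaussian density f(x | mu, Sigma) in NumPy.
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Multivariate normal density evaluated at a single point x."""
    d = mu.shape[0]
    diff = x - mu
    norm_const = np.sqrt((2 * np.pi) ** d * np.linalg.det(sigma))
    mahal = diff @ np.linalg.solve(sigma, diff)  # (x-mu)^T Sigma^-1 (x-mu)
    return np.exp(-0.5 * mahal) / norm_const

mu = np.array([0.0, 0.0])
sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
print(gaussian_pdf(np.array([0.5, -0.5]), mu, sigma))
```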
GMMs – Gaussian Mixture Models
Suppose we have 1000 data points in 2-D space (w, h).
[Figure: scatter plot of the 1000 points, with W and H axes]
Assume each data point is normally distributed. From the plot, there are clearly 5 sets of underlying Gaussians.
[Figure: the same scatter plot with the five Gaussian clusters indicated]
GMM
Model the data distribution by a combination of Gaussian functions
Given a set of sample points, how do we estimate their cluster assignments and the parameters of the Gaussian distribution for each cluster?
EM with GMM
In the context of GMM X: data points Y: which Gaussian creates which data points Distribution of complete data
Gaussian distribution for each cluster
Proportion of each cluster
where Pk’s must sum up to 1.
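Below is a minimal sketch (assumed parameters, illustrative names) of sampling from the complete-data distribution: first draw the hidden cluster Y with probabilities P_k, then draw X from that cluster's Gaussian.

```python
# Sampling from a GMM's complete-data distribution (Y, then X | Y).
import numpy as np

rng = np.random.default_rng(3)
P = np.array([0.3, 0.5, 0.2])                 # cluster proportions, sum to 1
means = np.array([[0, 0], [4, 4], [-3, 5]], dtype=float)
covs = np.stack([np.eye(2)] * 3)

Y = rng.choice(3, size=1000, p=P)             # hidden assignments
X = np.array([rng.multivariate_normal(means[k], covs[k]) for k in Y])
print(X.shape, np.bincount(Y) / 1000)         # empirical proportions ~ P
```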
Soft assignment

E-Step: assign each data point $x_i$ to cluster k with probability

$$p_{ik}^t = P(y_i = k \mid x_i, \theta^t) = \frac{p_k^t\, p(x_i \mid \theta_k^t)}{\sum_{k'} p_{k'}^t\, p(x_i \mid \theta_{k'}^t)}$$

M-Step: maximize the Q-function, which yields the updates

$$\mu_k^{t+1} = \frac{\sum_{i=1}^{n} p_{ik}^t\, x_i}{\sum_{i=1}^{n} p_{ik}^t}, \qquad \Sigma_k^{t+1} = \frac{\sum_{i=1}^{n} p_{ik}^t\, (x_i - \mu_k^{t+1})(x_i - \mu_k^{t+1})^T}{\sum_{i=1}^{n} p_{ik}^t}, \qquad p_k^{t+1} = \frac{1}{n} \sum_{i=1}^{n} p_{ik}^t$$
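Here is a minimal sketch (not the course code; function and variable names are illustrative) of one EM iteration implementing the soft E-step and the M-step updates above.

```python
# One EM iteration for a GMM: soft responsibilities, then weighted
# re-estimates of the means, covariances, and mixing proportions.
import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, p, means, covs):
    n, K = X.shape[0], p.shape[0]
    # E-step: responsibilities p_ik proportional to p_k * N(x_i | mu_k, Sigma_k)
    r = np.column_stack([p[k] * multivariate_normal.pdf(X, means[k], covs[k])
                         for k in range(K)])
    r /= r.sum(axis=1, keepdims=True)
    # M-step: weighted mean, covariance, and mixing proportion per cluster
    nk = r.sum(axis=0)                         # effective cluster sizes
    means = (r.T @ X) / nk[:, None]
    covs = np.stack([(r[:, k, None] * (X - means[k])).T @ (X - means[k]) / nk[k]
                     for k in range(K)])
    p = nk / n
    return p, means, covs
```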
EM for GMM (summary)

Objective: given N data points, find the maximum likelihood estimate of Θ:

$$\Theta_{ML} = \arg\max_{\Theta} f(x_1, \ldots, x_N \mid \Theta)$$

Algorithm:
1. Guess an initial Θ.
2. Perform the E step (expectation): based on Θ, softly assign each data point to the Gaussian components.
3. Perform the M step (maximization): based on the data-point clustering, maximize the Q function.
4. Repeat steps 2-3 until convergence (typically tens of iterations).
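A hypothetical driver loop for this algorithm might look as follows (a minimal sketch reusing the em_step function from the earlier sketch; the log-likelihood convergence test is an assumed choice, not specified in the slides).

```python
# Iterate E and M steps until the log-likelihood stops improving.
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def log_likelihood(X, p, means, covs):
    log_pk = np.column_stack(
        [np.log(p[k]) + multivariate_normal.logpdf(X, means[k], covs[k])
         for k in range(len(p))])
    return logsumexp(log_pk, axis=1).sum()

def fit_gmm(X, p, means, covs, tol=1e-6, max_iter=200):
    ll = -np.inf
    for _ in range(max_iter):
        p, means, covs = em_step(X, p, means, covs)  # E + M step
        new_ll = log_likelihood(X, p, means, covs)
        if new_ll - ll < tol:                        # converged
            break
        ll = new_ll
    return p, means, covs
```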
EM Example

[Figure sequence: successive EM iterations on example data; each panel shows Gaussian j overlaid on the data points t, with blue shading encoding P(Y_t = j | X_t)]
EM is more than clustering
Y need not be a variable of cluster membership; it may be any variable that is unknown to us, for example:
- Bad pixels in an image
- Lost labels for an example

There can be many possible choices of Y, and EM is still applicable.
Summary
EM is composed of two basic steps:
- E-Step: estimate the posterior distribution of the unobserved variables, based on the current estimate of the model parameters.
- M-Step: update the model parameters by maximizing the expected log-likelihood weighted by the posterior distribution obtained in the E-Step.

We also saw the application of EM to the Gaussian Mixture Model.
The steps of the EM algorithm

E-step: calculate

$$Q(\theta, \theta^t) = E_{p(y \mid x, \theta^t)}\big[\log p(x, y \mid \theta)\big] = \int \log p(x, y \mid \theta)\, p(y \mid x, \theta^t)\, dy$$

M-step: find

$$\theta^{(t+1)} = \arg\max_{\theta} Q(\theta, \theta^t)$$

See the online notes for more derivations and analysis!
How to write the Q function under the GMM setting

The likelihood of a data set is the product of the individual sample likelihoods, so

$$Q(\theta, \theta^t) = \sum_{i=1}^{n} \sum_{k=1}^{K} p(y_i = k \mid x_i, \theta^t) \log p(x_i, y_i = k \mid \theta)$$

where

$$p(y_i = k \mid x_i, \theta^t) = \frac{p_k^t\, p(x_i \mid \theta_k^t)}{\sum_{k'} p_{k'}^t\, p(x_i \mid \theta_{k'}^t)}$$

and

$$p(x_i, y_i = k \mid \theta) = p(x_i \mid y_i = k, \theta)\, p(y_i = k \mid \theta) = p_k\, p(x_i \mid \theta_k)$$
The Q function specific to the GMM is

$$Q(\theta, \theta^t) = \sum_{i=1}^{n} \sum_{k=1}^{K} \frac{p_k^t\, p(x_i \mid \theta_k^t)}{\sum_{k'} p_{k'}^t\, p(x_i \mid \theta_{k'}^t)} \log\big(p_k\, p(x_i \mid \theta_k)\big)$$

Plugging in the definition of $p(x \mid \Theta_k)$ and setting the derivative with respect to each parameter to zero, we obtain the iteration procedure.

E step:

$$p_{ik}^t = p(y_i = k \mid x_i, \theta^t) = \frac{p_k^t\, p(x_i \mid \theta_k^t)}{\sum_{k'} p_{k'}^t\, p(x_i \mid \theta_{k'}^t)}$$

M step:

$$p_k^{t+1} = \frac{1}{n} \sum_{i=1}^{n} p_{ik}^t, \qquad \mu_k^{t+1} = \frac{\sum_{i=1}^{n} p_{ik}^t\, x_i}{\sum_{i=1}^{n} p_{ik}^t}, \qquad \Sigma_k^{t+1} = \frac{\sum_{i=1}^{n} p_{ik}^t\, (x_i - \mu_k^{t+1})(x_i - \mu_k^{t+1})^T}{\sum_{i=1}^{n} p_{ik}^t}$$
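As a sanity check on these update equations, one can compare a hand-rolled implementation against a library. The following minimal sketch (assumed usage, synthetic data; not part of the slides) fits the same model with scikit-learn's GaussianMixture, whose fitted weights and means correspond to the $p_k$ and $\mu_k$ above.

```python
# Sanity-check the GMM updates against scikit-learn on synthetic data.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)
X = np.vstack([rng.normal([0, 0], 1.0, size=(500, 2)),
               rng.normal([5, 5], 0.5, size=(500, 2))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print("weights:", gmm.weights_)   # estimates of p_k
print("means:\n", gmm.means_)     # estimates of mu_k
```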