
Page 1: Incomplete Graphical Models

Incomplete Graphical Models

Nan Hu

Page 2: Incomplete Graphical Models

Outline

Motivation
K-means clustering
Coordinate descent algorithm
Density estimation
EM on unconditional mixture
Regression and classification
EM on conditional mixture
A general formulation of the EM algorithm

Page 3: Incomplete Graphical Models

K-means clustering

Problem: Given a set of observations $\{x_1, x_2, \dots, x_N\}$, how do we group them into $K$ clusters, supposing the value of $K$ is given?

First phase: with the cluster means fixed, assign each observation to the cluster whose mean is nearest.

Second phase: with the assignments fixed, recompute each cluster mean as the average of the observations assigned to it.

Page 4: Incomplete Graphical Models

K-means clustering

[Figure: K-means on an example data set; panels show the original set and the first, second, and third iterations.]

Page 5: Incomplete Graphical Models

K-means clustering

Coordinate descent algorithm: the algorithm minimizes the distortion measure $J$,

$$J = \sum_n \sum_i z_n^i \, \lVert x_n - \mu_i \rVert^2,$$

by descending in one block of coordinates at a time: the assignments $z_n^i$ are optimized with the means fixed, and the partial derivatives with respect to the means $\mu_i$ are set to zero with the assignments fixed.
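The two phases above amount to the following minimal NumPy sketch (names such as `kmeans` and `n_iters` are illustrative, not from the slides):

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Coordinate descent on J = sum_n sum_i z_n^i ||x_n - mu_i||^2."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)].copy()  # initial means
    for _ in range(n_iters):
        # First phase: with the means fixed, assign each point to its nearest mean.
        z = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1).argmin(axis=1)
        # Second phase: with assignments fixed, dJ/dmu_i = 0 gives the cluster average.
        for i in range(K):
            if (z == i).any():
                mu[i] = X[z == i].mean(axis=0)
    return z, mu
```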

Page 6: Incomplete Graphical Models

Unconditional Mixture

Problem: If the given sample data exhibit a multimodal density, how do we estimate the true density?

Fitting a single density to this bimodal case: although the algorithm converges, the result bears little relationship to the truth.

Page 7: Incomplete Graphical Models

Unconditional Mixture

A “divide-and-conquer” way to solve this problem: introduce a latent variable $Z$, a multinomial node taking on one of $K$ values, with graphical model $Z \to X$.

Assign a density model to each subpopulation; the overall density is

$$p(x \mid \theta) = \sum_{i=1}^{K} \pi_i \, f_i(x \mid \theta_i), \qquad \pi_i \triangleq p(z^i = 1).$$

Page 8: Incomplete Graphical Models

Unconditional Mixture

Gaussian mixture models: in this model, the mixture components are Gaussian distributions with parameters $(\mu_i, \Sigma_i)$.

Probability model for a Gaussian mixture:

$$p(x \mid \theta) = \sum_{i=1}^{K} \pi_i \, \mathcal{N}(x \mid \mu_i, \Sigma_i).$$

Page 9: Incomplete Graphical Models

Unconditional Mixture

Posterior probability of the latent variable $Z$:

$$\tau_n^i \triangleq p(z_n^i = 1 \mid x_n, \theta) = \frac{\pi_i \, \mathcal{N}(x_n \mid \mu_i, \Sigma_i)}{\sum_j \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}.$$

Log likelihood:

$$\ell(\theta; x) = \sum_n \log \sum_i \pi_i \, \mathcal{N}(x_n \mid \mu_i, \Sigma_i).$$

Page 10: Incomplete Graphical Models

Unconditional Mixture

Taking the partial derivative of $\ell$ with respect to $\pi_i$, with a Lagrange multiplier enforcing $\sum_i \pi_i = 1$, and solving, we have

$$\hat{\pi}_i = \frac{1}{N} \sum_n \tau_n^i.$$

Page 11: Incomplete Graphical Models

Unconditional Mixture

Taking the partial derivative of $\ell$ with respect to $\mu_i$ and setting it to zero, we have

$$\hat{\mu}_i = \frac{\sum_n \tau_n^i \, x_n}{\sum_n \tau_n^i}.$$

Page 12: Incomplete Graphical Models

Unconditional Mixture

Taking the partial derivative of $\ell$ with respect to $\Sigma_i$ and setting it to zero, we have

$$\hat{\Sigma}_i = \frac{\sum_n \tau_n^i \,(x_n - \hat{\mu}_i)(x_n - \hat{\mu}_i)^T}{\sum_n \tau_n^i}.$$

Page 13: Incomplete Graphical Models

Unconditional Mixture

The EM algorithm alternates two phases.

First phase (E step): compute the posteriors $\tau_n^{i(t)}$ from the current parameters $\theta^{(t)}$.

Second phase (M step): update $\pi_i$, $\mu_i$, and $\Sigma_i$ using the formulas above, evaluated with $\tau_n^{i(t)}$.
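Putting the two phases together, a hedged NumPy sketch of EM for a Gaussian mixture (the function name, the initialization, and the small covariance regularizer are my own choices, not the slides'):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iters=50, seed=0):
    """EM for a Gaussian mixture: alternate posteriors (E) and updates (M)."""
    N, d = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)
    mu = X[rng.choice(N, size=K, replace=False)].copy()
    Sigma = np.stack([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(K)])
    for _ in range(n_iters):
        # E step: tau[n, i] = p(z_n^i = 1 | x_n, theta).
        dens = np.stack([multivariate_normal.pdf(X, mu[i], Sigma[i])
                         for i in range(K)], axis=1)            # (N, K)
        tau = pi * dens
        tau /= tau.sum(axis=1, keepdims=True)
        # M step: the zero-derivative updates for pi, mu, Sigma.
        Nk = tau.sum(axis=0)                                     # effective counts
        pi = Nk / N
        mu = (tau.T @ X) / Nk[:, None]
        for i in range(K):
            D = X - mu[i]
            Sigma[i] = (tau[:, i, None] * D).T @ D / Nk[i] + 1e-6 * np.eye(d)
    return pi, mu, Sigma, tau
```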

Page 14: Incomplete Graphical Models

Unconditional Mixture

EM algorithm from the expected complete log likelihood point of view.

Suppose we observed the latent variables $Z_n$; the data set $\{(x_n, z_n)\}$ would then be completely observed, and the likelihood is defined as the complete log likelihood:

$$\ell_c(\theta; x, z) = \sum_n \log p(x_n, z_n \mid \theta) = \sum_n \log \prod_i \left[\pi_i \, \mathcal{N}(x_n \mid \mu_i, \Sigma_i)\right]^{z_n^i} = \sum_n \sum_i z_n^i \log\left[\pi_i \, \mathcal{N}(x_n \mid \mu_i, \Sigma_i)\right].$$

Page 15: Incomplete Graphical Models

Unconditional Mixture

We treat the $Z_n$ as random variables and take expectations conditioned on $X$ and $\theta^{(t)}$.

Note the $Z_n^i$ are binary r.v., so $\langle Z_n^i \rangle = p(Z_n^i = 1 \mid x_n, \theta^{(t)}) = \tau_n^{i(t)}$.

Using this as the “best guess” for $Z_n^i$, we have the expected complete log likelihood:

$$\left\langle \ell_c(\theta; x, z) \right\rangle = \sum_n \sum_i \langle Z_n^i \rangle \log\left[\pi_i \, \mathcal{N}(x_n \mid \mu_i, \Sigma_i)\right] = \sum_n \sum_i \tau_n^{i(t)} \log\left[\pi_i \, \mathcal{N}(x_n \mid \mu_i, \Sigma_i)\right].$$

Page 16: Incomplete Graphical Models

Unconditional Mixture

Maximizing the expected complete log likelihood by setting the derivatives to zero, we recover the same updates as before, with $\tau_n^{i(t)}$ in place of $\tau_n^i$:

$$\pi_i^{(t+1)} = \frac{1}{N} \sum_n \tau_n^{i(t)}, \qquad \mu_i^{(t+1)} = \frac{\sum_n \tau_n^{i(t)} x_n}{\sum_n \tau_n^{i(t)}}, \qquad \Sigma_i^{(t+1)} = \frac{\sum_n \tau_n^{i(t)} (x_n - \mu_i^{(t+1)})(x_n - \mu_i^{(t+1)})^T}{\sum_n \tau_n^{i(t)}}.$$
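As a self-contained sanity check, here is a 1-D run of these updates on a made-up bimodal sample; the incomplete log likelihood printed each iteration should never decrease (the general formulation below explains why):

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])  # bimodal sample
pi, mu, var = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])

def normal_pdf(x, m, v):
    return np.exp(-(x - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)

for step in range(10):
    dens = normal_pdf(X[:, None], mu, var)                # (N, 2) component densities
    print(step, np.log((pi * dens).sum(axis=1)).sum())    # incomplete log likelihood
    tau = pi * dens
    tau /= tau.sum(axis=1, keepdims=True)                 # E step: posteriors tau_n^i
    Nk = tau.sum(axis=0)                                  # M step: the updates above
    pi = Nk / len(X)
    mu = (tau * X[:, None]).sum(axis=0) / Nk
    var = (tau * (X[:, None] - mu) ** 2).sum(axis=0) / Nk
```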

Page 17: Incomplete Graphical Models

Conditional Mixture

Graphical model: $X \to Z$, and both $X$ and $Z$ are parents of $Y$. The latent variable $Z$ is a multinomial node taking on one of $K$ values.

This model is used for regression and classification.

The relationship between $X$ and $Z$ can be modeled in a discriminative classification way, e.g. with a softmax function:

$$\pi_i(x, \eta) \triangleq p(z^i = 1 \mid x, \eta) = \frac{e^{\eta_i^T x}}{\sum_j e^{\eta_j^T x}}.$$

Page 18: Incomplete Graphical Models

Conditional Mixture

By marginalizing over $Z$,

$$p(y \mid x, \theta) = \sum_i \pi_i(x, \eta) \, p(y \mid z^i = 1, x, \theta_i).$$

$X$ is taken to be always observed. The posterior probability is defined as

$$p(z^i = 1 \mid x, y, \theta) = \frac{\pi_i(x, \eta) \, p(y \mid z^i = 1, x, \theta_i)}{\sum_j \pi_j(x, \eta) \, p(y \mid z^j = 1, x, \theta_j)}.$$

Page 19: Incomplete Graphical Models

Conditional Mixture

Some specific choices of mixture components:

Gaussian components (regression):

$$p(y \mid z^i = 1, x, \theta_i) = \mathcal{N}(y \mid \beta_i^T x, \sigma_i^2)$$

Logistic components (binary classification):

$$p(y \mid z^i = 1, x, \theta_i) = \mu(\theta_i^T x)^y \left[1 - \mu(\theta_i^T x)\right]^{1 - y},$$

where $\mu(\cdot)$ is the logistic function: $\mu(a) = 1 / (1 + e^{-a})$.
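For concreteness, a sketch of the Gaussian choice with softmax gating; `eta` (gating weights), `beta` (expert regression weights), and `sigma2` (expert noise variances) are illustrative names:

```python
import numpy as np

def softmax(A):
    A = A - A.max(axis=1, keepdims=True)      # subtract max for numerical stability
    E = np.exp(A)
    return E / E.sum(axis=1, keepdims=True)

def moe_density(X, y, eta, beta, sigma2):
    """p(y | x) = sum_i pi_i(x, eta) N(y | beta_i^T x, sigma2_i)."""
    gate = softmax(X @ eta)                   # (N, K) mixing proportions pi_i(x)
    mean = X @ beta                           # (N, K) expert means beta_i^T x
    comp = np.exp(-(y[:, None] - mean) ** 2 / (2 * sigma2)) \
           / np.sqrt(2 * np.pi * sigma2)      # (N, K) Gaussian expert densities
    return (gate * comp).sum(axis=1)          # (N,) marginal density p(y_n | x_n)
```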

Page 20: Incomplete Graphical Models

Conditional Mixture

Parameter estimation via EM. The complete log likelihood is

$$\ell_c(\theta; x, y, z) = \sum_n \log p(y_n, z_n \mid x_n, \theta) = \sum_n \log \prod_i \left[\pi_i(x_n, \eta)\, p(y_n \mid Z_n^i = 1, x_n, \theta_i)\right]^{z_n^i} = \sum_n \sum_i z_n^i \log\left[\pi_i(x_n, \eta)\, p(y_n \mid Z_n^i = 1, x_n, \theta_i)\right].$$

Using the expectation as the “best guess” for $Z_n^i$, we have

$$\tau_n^{i(t)} \triangleq \langle Z_n^i \rangle = p(Z_n^i = 1 \mid x_n, y_n, \theta^{(t)}).$$

Page 21: Incomplete Graphical Models

Conditional Mixture

The expected complete log likelihood can then be written as

$$\left\langle \ell_c(\theta; x, y, z) \right\rangle = \sum_n \sum_i \tau_n^{i(t)} \log\left[\pi_i(x_n, \eta)\, p(y_n \mid Z_n^i = 1, x_n, \theta_i)\right].$$

Taking partial derivatives and setting them to zero yields the update formulas for EM.

Page 22: Incomplete Graphical Models

Conditional Mixture

Summary of the EM algorithm for the conditional mixture:

(E step): Calculate the posterior probabilities $\tau_n^{i(t)}$.

(M step): Use the IRLS algorithm to update the parameter $\eta$, based on the data pairs $(x_n, \tau_n^{i(t)})$.

(M step): Use the weighted IRLS algorithm to update the parameters $\theta_i$, based on the data points $(x_n, y_n)$, with weights $\tau_n^{i(t)}$.
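A sketch of one such EM iteration for the linear-Gaussian experts above, reusing the `softmax` helper from the previous sketch. The expert M step is exact weighted least squares; for the gating parameters, I substitute a single gradient step where the slides call for IRLS:

```python
import numpy as np

def moe_em_step(X, y, eta, beta, sigma2, lr=0.1):
    N, K = X.shape[0], beta.shape[1]
    # E step: tau[n, i] = p(z_n^i = 1 | x_n, y_n, theta).
    gate = softmax(X @ eta)
    mean = X @ beta
    comp = np.exp(-(y[:, None] - mean) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    tau = gate * comp
    tau /= tau.sum(axis=1, keepdims=True)
    # M step (experts): weighted least squares on (x_n, y_n) with weights tau_n^i.
    for i in range(K):
        w = tau[:, i]
        beta[:, i] = np.linalg.solve((X * w[:, None]).T @ X, X.T @ (w * y))
        sigma2[i] = (w * (y - X @ beta[:, i]) ** 2).sum() / w.sum()
    # M step (gating): one gradient step on the expected complete log likelihood,
    # standing in for IRLS; the gradient w.r.t. eta is X^T (tau - gate).
    eta = eta + lr * X.T @ (tau - gate) / N
    return eta, beta, sigma2, tau
```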

Page 23: Incomplete Graphical Models

General Formulation

$X$: all observable variables. $Z$: all latent variables. $\theta$: all parameters.

Suppose $Z$ were observed; the ML estimate would be

$$\hat{\theta}_{ML} = \arg\max_\theta \ell_c(x, z; \theta) = \arg\max_\theta \log p(x, z \mid \theta) \qquad \text{(complete log likelihood)}$$

However, $Z$ is in fact not observed:

$$\ell(\theta; x) = \log p(x \mid \theta) = \log \sum_z p(x, z \mid \theta) \qquad \text{(incomplete log likelihood)}$$

Page 24: Incomplete Graphical Models

General Formulation

Suppose $p(x, z \mid \theta)$ factors in some way; the complete log likelihood takes the form

$$\ell_c(x, z; \theta) = \sum_z f(z \mid x, \theta_z) \log p(x, z \mid \theta).$$

Since $f(z \mid x, \theta_z)$ is unknown, it is not clear how to solve this ML estimation. However, we can average over the r.v. $Z$ using a distribution

$$q(z \mid x) \approx f(z \mid x, \theta_z).$$

Page 25: Incomplete Graphical Models

General Formulation

Using $q(z \mid x)$ as an estimate of $f(z \mid x, \theta_z)$, the complete log likelihood becomes the expected complete log likelihood

$$\left\langle \ell_c(x, z; \theta) \right\rangle_q = \sum_z q(z \mid x) \log p(x, z \mid \theta).$$

This expected complete log likelihood is solvable and, hopefully, maximizing it will also improve the complete log likelihood in some way. (This is the basic idea behind EM.)

Page 26: Incomplete Graphical Models

General Formulation

EM maximizes the incomplete log likelihood:

$$\ell(\theta; x) = \log p(x \mid \theta) = \log \sum_z p(x, z \mid \theta) = \log \sum_z q(z \mid x) \frac{p(x, z \mid \theta)}{q(z \mid x)} \geq \sum_z q(z \mid x) \log \frac{p(x, z \mid \theta)}{q(z \mid x)} \triangleq L(q, \theta),$$

where the inequality is Jensen's inequality and $L(q, \theta)$ is the auxiliary function.
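A tiny numeric check of this bound on a toy discrete model (the numbers are arbitrary): $L(q, \theta) \le \ell(\theta; x)$ for any $q$, with equality when $q(z \mid x)$ is the posterior:

```python
import numpy as np

p_xz = np.array([0.10, 0.25, 0.15])    # toy p(x, z | theta) for z = 0, 1, 2
ell = np.log(p_xz.sum())               # incomplete log likelihood l(theta; x)

def L(q):
    """Auxiliary function L(q, theta) = sum_z q(z|x) log[p(x,z|theta)/q(z|x)]."""
    return (q * np.log(p_xz / q)).sum()

q_arbitrary = np.array([0.5, 0.3, 0.2])
q_posterior = p_xz / p_xz.sum()        # q(z|x) = p(z | x, theta)

print(L(q_arbitrary) <= ell)           # True: Jensen gives a lower bound
print(np.isclose(L(q_posterior), ell)) # True: the bound is tight at the posterior
```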

Page 27: Incomplete Graphical Models

General Formulation

Given $q(z \mid x)$, maximizing $L(q, \theta)$ over $\theta$ is equal to maximizing the expected complete log likelihood:

$$L(q, \theta) = \sum_z q(z \mid x) \log \frac{p(x, z \mid \theta)}{q(z \mid x)} = \sum_z q(z \mid x) \log p(x, z \mid \theta) - \sum_z q(z \mid x) \log q(z \mid x) = \left\langle \ell_c(x, z; \theta) \right\rangle_q - \sum_z q(z \mid x) \log q(z \mid x),$$

since the second term (the entropy of $q$) does not depend on $\theta$.

Page 28: Incomplete Graphical Models

General Formulation

Given $\theta^{(t)}$, the choice $q^{(t+1)}(z \mid x) = p(z \mid x, \theta^{(t)})$ yields the maximum of $L(q, \theta^{(t)})$:

$$L(q^{(t+1)}, \theta^{(t)}) = \sum_z p(z \mid x, \theta^{(t)}) \log \frac{p(x, z \mid \theta^{(t)})}{p(z \mid x, \theta^{(t)})} = \sum_z p(z \mid x, \theta^{(t)}) \log p(x \mid \theta^{(t)}) = \log p(x \mid \theta^{(t)}) = \ell(\theta^{(t)}; x).$$

Note: $\ell(\theta^{(t)}; x)$ is the upper bound of $L(q, \theta^{(t)})$.

Page 29: Incomplete Graphical Models

General Formulation

From the above, every step of EM maximizes $L(q, \theta)$. However, how do we know that maximizing $L(q, \theta)$ also maximizes the incomplete log likelihood $\ell(\theta; x)$?

Page 30: Incomplete Graphical Models

General Formulation

The difference between $\ell(\theta; x)$ and $L(q, \theta)$:

$$\ell(\theta; x) - L(q, \theta) = \log p(x \mid \theta) - \sum_z q(z \mid x) \log \frac{p(x, z \mid \theta)}{q(z \mid x)} = \sum_z q(z \mid x) \log p(x \mid \theta) - \sum_z q(z \mid x) \log \frac{p(z \mid x, \theta)\, p(x \mid \theta)}{q(z \mid x)} = \sum_z q(z \mid x) \log \frac{q(z \mid x)}{p(z \mid x, \theta)} = D\big(q(z \mid x) \,\|\, p(z \mid x, \theta)\big).$$

This is a KL divergence: non-negative, and uniquely minimized (at zero) when $q(z \mid x) = p(z \mid x, \theta)$.
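Continuing the toy check from above, the gap is exactly this KL divergence:

```python
import numpy as np

p_xz = np.array([0.10, 0.25, 0.15])          # toy p(x, z | theta)
post = p_xz / p_xz.sum()                     # p(z | x, theta)
q = np.array([0.5, 0.3, 0.2])                # an arbitrary q(z | x)

ell = np.log(p_xz.sum())                     # l(theta; x)
L = (q * np.log(p_xz / q)).sum()             # auxiliary function L(q, theta)
kl = (q * np.log(q / post)).sum()            # D(q(z|x) || p(z|x, theta))

print(np.isclose(ell - L, kl))               # True: l - L = KL >= 0
```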

Page 31: Incomplete Graphical Models

General Formulation

EM and alternating minimization: recall that maximizing the likelihood is exactly the same as minimizing the KL divergence between the empirical distribution and the model.

Including the latent variable $Z$, the KL divergence becomes a “complete KL divergence” between joint distributions on $(x, z)$.

Page 32: Incomplete Graphical Models

General Formulation

Reformulated EM algorithm:

(E step): $q^{(t+1)} = \arg\min_q D(q \,\|\, \theta^{(t)})$

(M step): $\theta^{(t+1)} = \arg\min_\theta D(q^{(t+1)} \,\|\, \theta)$

This is an alternating minimization algorithm, where $D$ is the complete KL divergence above.

Page 33: Incomplete Graphical Models

Summary

Unconditional mixture: graphical model; EM algorithm.

Conditional mixture: graphical model; EM algorithm.

A general formulation of the EM algorithm: maximizing the auxiliary function; minimizing the “complete KL divergence”.

Page 34: Incomplete Graphical Models

Incomplete Graphical Models

Thank You!