Expectation-Maximization

News o’ the day

First “3-d” picture of sun

Anybody got red/green sunglasses?


Administrivia

•No noose is good noose


Where we’re at

•Last time:

•E^3

•Finished up (our brief survey of) RL

•Today:

• Intro to unsupervised learning

•The expectation-maximization “algorithm”


What’s with this EM thing?

Nobody expects...

Unsupervised learning

•EM is (one form of) unsupervised learning:

•Given: data

•Find: “structure” of that data

•Clusters -- what points “group together”? (we’ll do this one today)

•Taxonomies -- what’s descended from/related to what?

•Parses -- grammatical structure of a sentence

•Hidden variables -- “behind the scenes”

Example task

We can see the clusters easily...

... but the computer can’t. How can we get the computer to identify the clusters?

Need: algorithm that takes data and returns a label (cluster ID) for each data point

Parsing example

What’s the grammatical structure of this sentence?

He never claimed to be a god.

[Figure: parse tree over the sentence, with node labels N, V, Det, Adv, NP, VP, and S]

Note: entirely hidden information! Need to infer (guess) it in an ~unsupervised way.

EM assumptions

•All learning algorithms require data assumptions

•EM: generative model

•Description of process that generates your data

•Assumes: hidden (latent) variables

•Probability model: assigns probability to data + hidden variables

•Often think: generate hidden var, then generate data based on that hidden var

Classic latent var model

•Data generator looks like this:

•Behind a curtain:

• I flip a weighted coin

•Heads: I roll a 6-sided die

•Tails: I roll a 4-sided die

• I show you:

•Outcome of die
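The generator behind the curtain can be sketched in a few lines of Python. This is only an illustration: the function name `sample_outcomes`, the `seed` argument, and the assumption that both dice are fair are not from the slides.

```python
import random

def sample_outcomes(n, p_heads, seed=0):
    """Draw n die outcomes from the coin-then-die process.

    Returns (outcomes, coins). In the real task only `outcomes` is visible.
    Assumption (not from the slides): both dice are fair.
    """
    rng = random.Random(seed)
    outcomes, coins = [], []
    for _ in range(n):
        heads = rng.random() < p_heads   # hidden variable: weighted coin flip
        sides = 6 if heads else 4        # heads -> 6-sided die, tails -> 4-sided die
        outcomes.append(rng.randint(1, sides))
        coins.append("H" if heads else "T")
    return outcomes, coins

outcomes, coins = sample_outcomes(10, p_heads=0.7)
print(outcomes)  # what you are shown
print(coins)     # what stays behind the curtain
```

Note that a 5 or a 6 can only come from heads, while 1 through 4 are ambiguous; that ambiguity is what the rest of the lecture is about.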


Your mission

•Data you get is sequence of die outcomes

•6, 3, 3, 1, 5, 4, 2, 1, 6, 3, 1, 5, 2, ...

•Your task: figure out what the coin flip was for each of these numbers

•Hidden variable: c≡outcome of coin flip

•What makes this hard?

A more “practical” example

•Robot navigating in physical world

•Locations in world can be occupied or unoccupied

•Robot wants occupancy map (so it doesn’t bump into things)

•Sensors are imperfect (noise, object variation, etc.)

•Given: sensor data

•Infer: occupied/unoccupied for each location

Classic latent var model

•This process describes (generates) prob distribution over numbers

•Hidden state: outcome of coin flip

•Observed state: outcome of die given (conditioned on) coin flip result

Probability of observations

•Final probability of outcome (x) is mixture of probability for each possible coin result:
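For the coin-and-die model, the mixture can be written out as follows. This is a reconstruction from the bullet text rather than a copy of the slide's formula; p denotes the probability of heads, and c the hidden coin outcome.

```latex
\begin{aligned}
\Pr[x] &= \Pr[x \mid c=\text{heads}]\,\Pr[c=\text{heads}]
        + \Pr[x \mid c=\text{tails}]\,\Pr[c=\text{tails}] \\
       &= p\,\Pr[x \mid c=\text{heads}] + (1-p)\,\Pr[x \mid c=\text{tails}]
\end{aligned}
```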


Your goal

•Given model, and data, x1, x2, ..., xn

•Find Pr[ci|xi]

•So we need the model

•Model given by parameters: Θ = 〈p, θheads, θtails〉

•Where θheads and θtails are die outcome probabilities; p is prob of heads


Where’s the problem?

•To get Pr[ci|xi], you need Pr[xi|ci]

•To get Pr[xi|ci], you need model parameters

•To get model parameters, you need Pr[ci|xi]

•Oh oh...
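The first link in that chain is just Bayes' rule: Pr[c|x] is Pr[x|c]·Pr[c] divided by Pr[x]. A minimal sketch, assuming the parameters are already known (which is exactly what we do not have yet); `posterior_heads` and the fair-dice tables are illustrative names and values, not from the slides.

```python
def posterior_heads(x, p, theta_heads, theta_tails):
    """Pr[c = heads | x] via Bayes' rule.

    theta_heads / theta_tails map a die outcome to Pr[x | c].
    """
    joint_heads = p * theta_heads.get(x, 0.0)          # Pr[x | heads] * Pr[heads]
    joint_tails = (1.0 - p) * theta_tails.get(x, 0.0)  # Pr[x | tails] * Pr[tails]
    z = joint_heads + joint_tails                      # Pr[x], the normalizer
    if z == 0.0:
        raise ValueError("outcome has zero probability under the model")
    return joint_heads / z

# Assumed parameters: fair dice and a fair coin.
theta_heads = {x: 1 / 6 for x in range(1, 7)}
theta_tails = {x: 1 / 4 for x in range(1, 5)}
print(posterior_heads(3, 0.5, theta_heads, theta_tails))  # roughly 0.4
print(posterior_heads(5, 0.5, theta_heads, theta_tails))  # 1.0: a 4-sided die cannot show 5
```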

EM to the rescue!

•Turns out that you can run this “chicken and egg process” in a loop and eventually get the right* answer

•Make an initial guess about coin assignments

•Repeat:

•Use guesses to get parameters (M step)

•Use parameters to update coin guesses (E step)

•Until converged

EM to the rescue!

    function [Prc,Theta]=EM(X)
      // initialization
      Prc=pick_random_values()
      // the EM loop
      repeat {
        // M step: pick maximum likelihood
        // parameters:
        //   argmax_theta(Pr[x,c|theta])
        Theta=get_params_from_c(X,Prc)

        // E step: use complete model to get
        // posterior over the hidden variables:
        //   Pr[c|x]=1/z*Pr[x|c,theta]*Pr[c|theta]
        Prc=get_labels_from_params(X,Theta)
      } until(converged)
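As a runnable illustration of that loop, here is a deliberately simplified Python version for the coin-and-die model. It assumes both dice are fair and known, so the only parameter left to learn is p = Pr[heads]; the slides' Θ also includes the die probabilities. The names `em_coin_weight`, `THETA_HEADS`, and `THETA_TAILS` are made up for this sketch.

```python
import random

THETA_HEADS = {x: 1 / 6 for x in range(1, 7)}  # assumed fair 6-sided die
THETA_TAILS = {x: 1 / 4 for x in range(1, 5)}  # assumed fair 4-sided die

def em_coin_weight(xs, seed=0, tol=1e-12, max_iter=1000):
    """Estimate p = Pr[heads] from die outcomes (values 1..6) alone.

    Returns (p, prc) where prc[i] is Pr[c_i = heads | x_i].
    """
    rng = random.Random(seed)
    # initialization: random soft guesses about the coin assignments
    prc = [rng.random() for _ in xs]
    p = None
    for _ in range(max_iter):
        # M step: maximum likelihood p given the current soft assignments
        p_new = sum(prc) / len(prc)
        # E step: Pr[c|x] = 1/z * Pr[x|c,theta] * Pr[c|theta]
        prc = []
        for x in xs:
            heads = p_new * THETA_HEADS.get(x, 0.0)
            tails = (1.0 - p_new) * THETA_TAILS.get(x, 0.0)
            prc.append(heads / (heads + tails))
        converged = p is not None and abs(p_new - p) < tol
        p = p_new
        if converged:
            break
    return p, prc

p_hat, prc = em_coin_weight([6, 3, 3, 1, 5, 4, 2, 1, 6, 3, 1, 5, 2])
print(round(p_hat, 3))
print([round(r, 2) for r in prc])
```

Every 5 or 6 ends up with Pr[heads] = 1, and the ambiguous outcomes share a single soft assignment that depends on the current p.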

Weird, but true

•This is counterintuitive, but it works

•Essentially, you’re improving guesses on each step

•M step “maximizes” parameters, Θ, given data

•E step finds “expectation” of hidden data, given Θ

•Both are driving toward max likelihood joint soln

•Guaranteed to converge

•Not guaranteed to find global optimum...


Very easy example

•Two Gaussian (“bell curve”) clusters

•Well separated in space

•Two dimensions


In more detail

•Gaussian mixture w/ k “components” (clusters/blobs)

•Mixture probability (for d-dimensional data x):

$$\Pr[x] = \sum_{i=1}^{k} \alpha_i \, \frac{1}{\sqrt{(2\pi)^d \, |\Sigma_i|}} \, \exp\left(-\frac{1}{2}(x-\mu_i)^{\mathsf T}\,\Sigma_i^{-1}\,(x-\mu_i)\right)$$

•Sum over i: one term for each component

•αi: weight (probability) of each component

•Everything after αi: Gaussian distribution for each component w/ mean vector μi and covariance matrix Σi

•1/√((2π)^d |Σi|): normalizing term for Gaussian

•(x−μi)ᵀ Σi⁻¹ (x−μi): squared distance of data point x from mean μi (with respect to Σi)


Hidden variables

• Introduce the “hidden variable”, ci(x) (or just ci for short)

•Denotes “amount by which data point x belongs to cluster i”

•Sometimes called “cluster ownership”, “salience”, “relevance”, etc.

M step

•Need: parameters (Θ) given hidden variables (ci) and N data points, x1, x2,...,xN

•Q: what are the parameters of the model? (What do we need to learn?)

•A: Θ = 〈αi, μi, Σi〉, i=1..k
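For reference, the usual maximum likelihood updates for a Gaussian mixture are given below. These are the standard textbook forms, offered as a likely match for the slide's formulas rather than a transcription of them; N_i, the effective number of points owned by cluster i, is a symbol introduced here.

```latex
\begin{aligned}
N_i      &= \sum_{j=1}^{N} c_i(x_j) \\
\alpha_i &= \frac{N_i}{N} \\
\mu_i    &= \frac{1}{N_i} \sum_{j=1}^{N} c_i(x_j)\, x_j \\
\Sigma_i &= \frac{1}{N_i} \sum_{j=1}^{N} c_i(x_j)\,(x_j-\mu_i)(x_j-\mu_i)^{\mathsf T}
\end{aligned}
```

Each update is the ordinary sample estimate with every data point weighted by how much it belongs to cluster i.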

E step

•Need: probability of hidden variable (ci) given fixed parameters (Θ) and observed data (x1,...,xN)
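Putting the two steps together, the E step sets ci(x) to component i's share of the mixture probability at x, and the M step re-estimates the parameters from those shares. The sketch below is a simplified, pure-Python version for one-dimensional data, so each Σi shrinks to a scalar variance. The function names, the quantile-based initialization, and the variance floor are choices made for this example, not part of the slides.

```python
import math
import random

def gauss_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def e_step(xs, alphas, mus, variances):
    """c_i(x_j) = alpha_i N(x_j; mu_i, var_i) / sum_l alpha_l N(x_j; mu_l, var_l)."""
    resp = []
    for x in xs:
        w = [a * gauss_pdf(x, m, v) for a, m, v in zip(alphas, mus, variances)]
        z = sum(w)  # assumes x is not so far from every cluster that z underflows to 0
        resp.append([wi / z for wi in w])
    return resp

def m_step(xs, resp, k, min_var=1e-6):
    """Weighted maximum likelihood estimates of alpha_i, mu_i, var_i."""
    n = len(xs)
    alphas, mus, variances = [], [], []
    for i in range(k):
        n_i = sum(r[i] for r in resp)  # effective number of points in cluster i
        mu = sum(r[i] * x for r, x in zip(resp, xs)) / n_i
        var = sum(r[i] * (x - mu) ** 2 for r, x in zip(resp, xs)) / n_i
        alphas.append(n_i / n)
        mus.append(mu)
        variances.append(max(var, min_var))  # floor guards against a collapsed cluster
    return alphas, mus, variances

def fit_gmm_1d(xs, k, n_iter=50):
    """Alternate E and M steps for a fixed number of iterations."""
    n = len(xs)
    srt = sorted(xs)
    mus = [srt[int((i + 0.5) * n / k)] for i in range(k)]  # spread means over quantiles
    mean = sum(xs) / n
    var0 = sum((x - mean) ** 2 for x in xs) / n or 1.0
    alphas, variances = [1.0 / k] * k, [var0] * k
    for _ in range(n_iter):
        resp = e_step(xs, alphas, mus, variances)
        alphas, mus, variances = m_step(xs, resp, k)
    return alphas, mus, variances

rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(10.0, 1.0) for _ in range(200)]
alphas, mus, variances = fit_gmm_1d(data, k=2)
print([round(a, 2) for a in alphas], [round(m, 2) for m in mus])
```

The quantile start makes this toy run deterministic; a random start, as in the slides, is what makes restarts necessary.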


Another example

•k=3 Gaussian clusters

•Different means, covariances

•Well separated


Restart

•Problem: EM has found a “minimum energy” solution

• It’s only “locally” optimal

•B/c of poor starting choice, it ended up in wrong local optimum -- not global optimum

•Default answer: pick a new random start and re-run
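The restart strategy is easy to wrap around any fitting routine. The sketch below is generic: `best_of_restarts`, `fit`, and `score` are hypothetical names, and a toy hill-climber stands in for EM so the example stays self-contained. With EM, `score` would typically be the data log-likelihood of the fitted model.

```python
import random

def best_of_restarts(fit, score, n_restarts=10, seed=0):
    """Run `fit` from several random starts and keep the best-scoring result.

    fit(rng)  -> a fitted model, using rng for its random initialization
    score(m)  -> a number to maximize
    """
    master = random.Random(seed)
    best_model, best_score = None, float("-inf")
    for _ in range(n_restarts):
        rng = random.Random(master.getrandbits(32))  # separate stream per restart
        model = fit(rng)
        s = score(model)
        if s > best_score:
            best_model, best_score = model, s
    return best_model, best_score

# Toy stand-in for EM: greedy hill-climbing on a function with two local maxima.
def bumpy(x):
    return -(x * x - 1.0) ** 2 + 0.3 * x  # the peak near x = +1 is the higher one

def toy_fit(rng):
    x = rng.uniform(-2.0, 2.0)  # random starting point
    step = 1e-3
    for _ in range(2000):
        x = max((x - step, x, x + step), key=bumpy)  # move only if it improves
    return x

best_x, best_val = best_of_restarts(toy_fit, bumpy, n_restarts=10)
print(round(best_x, 2), round(best_val, 3))
```

A single run can end on the lower peak if it starts on the wrong side; keeping the best of several runs makes that much less likely, though never impossible.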


Final example

•More Gaussians. How many clusters here?


Note...

•Doesn’t always work out this well in practice

•Sometimes the machine is smarter than humans

•Usually, if it’s hard for us, it’s hard for the machine too...

•First ~7-10 times I ran this one, it lost one cluster altogether (α3→0.0001)

Unresolved issues

•Notice: different cluster IDs (colors) end up on different blobs of points in each run

•Answer is “unique only up to permutation”

• I can swap around cluster IDs without changing solution

•Can’t tell what “right” cluster assignment is

Unresolved issues

•“Order” of model

• I.e., what k should you use?

•Hard to know, in general

•Can just try a bunch and find one that “works best”

•Problem: answer tends to get monotonically better w/ increasing k

•Best answer to date: Chinese restaurant process
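For readers who have not met it, the Chinese restaurant process is a prior over partitions in which the number of clusters is not fixed in advance. Customer n+1 joins an existing table with probability proportional to the number of people already sitting there, or opens a new table with probability proportional to a concentration parameter. The sketch below only samples from that prior; it is not a full clustering method, and the name `crp_table_sizes` and its arguments are made up for this example.

```python
import random

def crp_table_sizes(n_customers, concentration, seed=0):
    """Seat customers one at a time; return the list of table sizes."""
    rng = random.Random(seed)
    tables = []                        # tables[t] = number of customers at table t
    for n in range(n_customers):       # n customers are already seated
        u = rng.random() * (n + concentration)
        for t, size in enumerate(tables):
            if u < size:               # join table t with prob size / (n + concentration)
                tables[t] += 1
                break
            u -= size
        else:                          # new table with prob concentration / (n + concentration)
            tables.append(1)
    return tables

print(crp_table_sizes(100, concentration=1.0))
```

Larger concentration values tend to produce more tables, which plays the role that choosing a larger k plays in the fixed-size mixture.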