Graphical Models for Machine Learning and Computer Vision
Statistical Models
• Statistical models describe observed ‘DATA’ via an assumed likelihood:

L(DATA | Θ)

• with Θ denoting the ‘parameters’ needed to describe the data.
• Likelihoods measure how likely the observation was; they implicitly assume an error mechanism (in the translation between what was observed and what was ‘supposed’ to be observed).
• Parameters may describe model features or even specify different models.
An Example of a Statistical Model
• A burglar alarm is affected by both earthquakes and burglaries. It has a mechanism to communicate with the homeowner if activated. It went off at Judea Pearl's house one day. Should he:
• a) immediately call the police, under suspicion that a burglary took place, or
• b) go home and immediately transfer his valuables elsewhere?
A Statistical Analysis
• Observation: the burglar alarm went off (i.e., a = 1);
• Parameter 1: the presence or absence of an earthquake (i.e., e = 1 or 0);
• Parameter 2: the presence or absence of a burglary at Judea's house (i.e., b = 1 or 0).
LIKELIHOODS/PRIORS IN THIS CASE
• The likelihood associated with the observation is:

L(DATA | Θ) = P(a = 1 | b, e)

• with b, e = 0, 1 (depending on whether a burglary or an earthquake has taken place).
• The priors specify the probabilities of a burglary or an earthquake happening:

P(b = 1) = ?; P(e = 1) = ?
Example Probabilities
• Here are some probabilities indicating something about the likelihood and prior:

P(b = 0) = .9; P(b = 1) = .1;
P(a = 1 | e = b = 0) = .001; P(a = 1 | b = 1, e = 0) = .368;
P(a = 1 | e = 1, b = 0) = .135; P(a = 1 | b = e = 1) = .607;
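These tables can be written down directly and summed out. A minimal sketch (variable names are mine; the earthquake prior is taken to equal the burglary prior, as the interpretation slide states they are equally likely a priori):

```python
# The slides' probability tables as Python dictionaries.
P_b = {0: 0.9, 1: 0.1}
P_e = {0: 0.9, 1: 0.1}          # assumed equal to the burglary prior
P_a1 = {(0, 0): 0.001,          # alarm with neither burglary nor earthquake
        (1, 0): 0.368,          # burglary only
        (0, 1): 0.135,          # earthquake only
        (1, 1): 0.607}          # both -- keyed by (b, e)

# Marginal probability that the alarm goes off at all:
p_alarm = sum(P_b[b] * P_e[e] * P_a1[(b, e)]
              for b in (0, 1) for e in (0, 1))
print(round(p_alarm, 5))  # 0.05215
```

This marginal, P(a = 1), is the normalizing constant used later in the causal analysis.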
LIKELIHOOD/PRIOR INTERPRETATION
• Burglaries are as likely (a priori) as earthquakes.
• It is unlikely that the alarm just went off by itself.
• The alarm goes off more often when a burglary happens but an earthquake does not than in the reverse case, i.e., when an earthquake happens but a burglary does not.
• If both a burglary and an earthquake happen, it is (virtually) twice as likely that the alarm will go off.
Probability Propagation Graph
PROBABILITY PROPAGATION
• There are two kinds of probability propagation (see Frey 1998):
• a) marginalization, i.e., P(B → b), and
• b) multiplication, i.e., P(b → B).
• Marginalization sums over the terms leading into the node;
• multiplication multiplies over the terms leading into the node.
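As a sketch of these two operations on the alarm node (message names are illustrative, not the slides' notation; probabilities are the slides' values):

```python
# a) marginalization and b) multiplication at the alarm node a.
P_a1 = {(0, 0): 0.001, (1, 0): 0.368, (0, 1): 0.135, (1, 1): 0.607}
msg_e = {0: 0.9, 1: 0.1}        # message carrying the prior on e into the node
prior_b = {0: 0.9, 1: 0.1}

# a) marginalization: sum over the terms leading into the node
msg_a_to_b = {b: sum(P_a1[(b, e)] * msg_e[e] for e in (0, 1)) for b in (0, 1)}

# b) multiplication: multiply the terms arriving at b (here, its prior)
belief_b = {b: prior_b[b] * msg_a_to_b[b] for b in (0, 1)}

print(round(msg_a_to_b[1], 4))  # 0.3919, the factor that reappears in the
                                # causal-analysis slides
```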
CAUSAL ANALYSIS
• To analyze the causes of the alarm going off, we calculate the probability that it was a burglary (in this case) and compare it with the probability that it was an earthquake:

P(b = 1 | a = 1) ∝ P(B → b) P(A → b)
= (.1) × Σ_e P(a = 1 | e, b = 1) P(e → A)
= .1 × (.368 × .9 + .607 × .1) = .1 × .3919
CAUSAL ANALYSIS II
• So, after normalization:

P(b = 1 | a = 1) = .751

• Similarly,

P(e = 1 | a = 1) = .349

• So, if we had to choose between burglary and earthquake as a cause of making the alarm go off, we should choose burglary.
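These numbers can be checked by brute-force enumeration of the joint distribution (a sketch; variable names are mine, probabilities are the slides' values):

```python
# Exact posteriors for the alarm example by enumerating the joint.
P_b = {0: 0.9, 1: 0.1}
P_e = {0: 0.9, 1: 0.1}
P_a1 = {(0, 0): 0.001, (1, 0): 0.368, (0, 1): 0.135, (1, 1): 0.607}

joint = {(b, e): P_b[b] * P_e[e] * P_a1[(b, e)]
         for b in (0, 1) for e in (0, 1)}
p_a1 = sum(joint.values())                      # P(a = 1)

p_b1 = (joint[(1, 0)] + joint[(1, 1)]) / p_a1   # P(b = 1 | a = 1)
p_e1 = (joint[(0, 1)] + joint[(1, 1)]) / p_a1   # P(e = 1 | a = 1)
print(round(p_b1, 3), round(p_e1, 3))  # 0.751 0.349
```

Note that the two posteriors need not sum to one: burglary and earthquake are not mutually exclusive causes.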
Markov Chain Monte Carlo for the Burglar Problem
• For the current value e = e*, calculate

P(b = 1 | a = 1, e = e*), P(b = 0 | a = 1, e = e*)

• or, in message form,

P(A → b | e = e*) P(B → b | e = e*)

• Simulate b from this distribution. Call the result b*. Now calculate:

P(e = 1 | b = b*, a = 1), P(e = 0 | b = b*, a = 1)

• or

P(A → e | b = b*) P(E → e | b = b*)
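This alternating scheme is a Gibbs sampler. A minimal sketch (function names are mine; probabilities are the slides' values):

```python
import random

# Gibbs sampling for the burglar problem: alternate between the two
# conditionals and count how often b = 1.
P_b = {0: 0.9, 1: 0.1}
P_e = {0: 0.9, 1: 0.1}
P_a1 = {(0, 0): 0.001, (1, 0): 0.368, (0, 1): 0.135, (1, 1): 0.607}

def sample_b_given(e):
    """Draw b from P(b | a = 1, e = e*)."""
    w1 = P_b[1] * P_a1[(1, e)]
    w0 = P_b[0] * P_a1[(0, e)]
    return 1 if random.random() < w1 / (w0 + w1) else 0

def sample_e_given(b):
    """Draw e from P(e | a = 1, b = b*)."""
    w1 = P_e[1] * P_a1[(b, 1)]
    w0 = P_e[0] * P_a1[(b, 0)]
    return 1 if random.random() < w1 / (w0 + w1) else 0

random.seed(0)
b, e, hits = 0, 0, 0
n = 200_000
for _ in range(n):
    b = sample_b_given(e)
    e = sample_e_given(b)
    hits += b
est = hits / n
print(round(est, 2))  # ≈ 0.75, close to the exact posterior P(b = 1 | a = 1)
```

The empirical frequency of b = 1 along the chain approximates the exact posterior computed on the previous slides.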
Independent Hidden Variables: A Factorial Model
• In statistical modeling it is often advantageous to treat variables which are not observed as ‘hidden’. This means that they themselves have distributions. In our case suppose b and e are independent hidden variables:
P(b = 1) = β; P(b = 0) = 1 − β;
P(e = 1) = ε; P(e = 0) = 1 − ε;

• Then optimally:

P(b = 1 | a = 1) = .951
P(e = 1 | a = 1) = .186
Nonfactorial Hidden Variable Models
• Suppose b and e are dependent hidden variables:
P(b = 1, e = 1) = p₁₁; P(b = 1, e = 0) = p₁₀;
P(b = 0, e = 1) = p₀₁; P(b = 0, e = 0) = 1 − p₁₁ − p₁₀ − p₀₁

• Then a similar analysis yields a related result.
INFORMATION
• The difference in information available from the parameters after observing the alarm versus before the alarm was observed is:

I(β, ε) = Σ_{b,e} L(b, e | β, ε) log[ L(b, e | β, ε) / L(b, e, a = 1) ] = D(Q ‖ P)

• This is the Kullback–Leibler ‘distance’ between the prior and posterior distributions.
• Parameters are chosen to optimize this distance.
INFORMATION IN THIS EXAMPLE
• The information available in this example is calculated using:

L(b, e | β, ε) = β^b (1 − β)^(1−b) ε^e (1 − ε)^(1−e)
L(b, e, a = 1) = P(a = 1 | b, e) · .1^b · .9^(1−b) · .1^e · .9^(1−e)

• giving

I(β, ε) = −H(β) − H(ε) + E_Q[ −log P(a = 1 | b, e) − (b + e) log(.1) − (2 − b − e) log(.9) ]
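Reading I(β, ε) as a variational quantity (my interpretation: it differs from KL(Q ‖ posterior) only by the constant −log P(a = 1), so minimizing over β, ε pulls the factorized Q toward the posterior), it can be computed directly. The function name is mine:

```python
import math

# Numerical sketch of I(β, ε) for the alarm example.
P_a1 = {(0, 0): 0.001, (1, 0): 0.368, (0, 1): 0.135, (1, 1): 0.607}

def info(beta, eps):
    """I(β, ε) = sum over b, e of Q(b, e) log[ Q(b, e) / L(b, e, a = 1) ]."""
    total = 0.0
    for b in (0, 1):
        for e in (0, 1):
            q = beta**b * (1 - beta)**(1 - b) * eps**e * (1 - eps)**(1 - e)
            joint = (P_a1[(b, e)]
                     * 0.1**b * 0.9**(1 - b) * 0.1**e * 0.9**(1 - e))
            total += q * math.log(q / joint)
    return total

# I is bounded below by -log P(a = 1) = -log(.05215), attained only when the
# factorized Q matches the (non-factorized) posterior exactly.
print(round(info(0.75, 0.35), 3))
```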
Markov Random Fields
• Markov random fields are simply graphical models set in a 2- or higher-dimensional field. Their fundamental criterion is that the distribution of a point x conditional on all the points that remain (i.e., −x) is identical to its distribution given a neighborhood N(x):

L(x | −x) = L(x | N(x))
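A hypothetical illustration (an Ising-style binary field, not a model from the slides): the conditional at a site depends only on its four nearest neighbours, so changing a non-neighbour leaves it untouched.

```python
import math

# P(x_ij = +1 | everything else) for a binary (±1) lattice field with
# nearest-neighbour coupling J -- it only ever reads the neighbourhood N(x).
def conditional_p1(field, i, j, J=1.0):
    h, w = len(field), len(field[0])
    s = sum(field[i2][j2]
            for i2, j2 in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
            if 0 <= i2 < h and 0 <= j2 < w)   # only the neighbourhood enters
    return 1.0 / (1.0 + math.exp(-2.0 * J * s))

field = [[1, -1, 1, 1],
         [-1, 1, 1, -1],
         [1, 1, -1, 1],
         [-1, 1, 1, 1]]
p_before = conditional_p1(field, 1, 1)
field[3][3] = -field[3][3]            # flip a site outside N((1, 1))
p_after = conditional_p1(field, 1, 1)
print(p_before == p_after)  # True: the Markov property in action
```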
EXAMPLE OF A RANDOM FIELD
• Modeling a video frame is typically done via a random field. Parameters identify our expectations of what the frame looks like.
• We can ‘clean up’ video frames or related media using a methodology which distinguishes between what we expect and what was observed.
GENERALIZATION
• This can be generalized to non-discrete likelihoods with non-discrete parameters.
• More generally (sans data), assume that a movie (consisting of many frames, each of which consists of grey-level pixel values over a lattice) is observed. We would like to ‘detect’ ‘unnatural’ events.
GENERALIZATION II
• Assume a model for frame i (given frame i − 1) taking the form:

L(Frame[i] | Θ, Frame[i − 1])

• The parameters Θ typically denote invariant features for pictures of cars, houses, etc.
• The presence or absence of unnatural events can be described by hidden variables.
• The (frame) likelihood describes the natural evolution of the movie over time.
GENERALIZATION III
• Parameters are estimated by optimizing the information they provide. This is accomplished by ‘summing or integrating over’ the hidden variables.