Graphical Models for Machine Learning and Computer Vision
Statistical Models
• Statistical models describe observed ‘DATA’ via an assumed likelihood:

L(DATA | Θ)

• with Θ denoting the ‘parameters’ needed to describe the data.
• Likelihoods measure how likely the observation was; they implicitly assume an error mechanism (in the translation between what was observed and what was ‘supposed’ to be observed).
• Parameters may describe model features or even specify different models.
An Example of a Statistical Model
• A burglar alarm is affected by both earthquakes and burglaries. It has a mechanism to communicate with the homeowner if activated. It went off at Judea Pearl's house one day. Should he:
• a) immediately call the police, under suspicion that a burglary took place, or
• b) go home and immediately transfer his valuables elsewhere?
A Statistical Analysis
• Observation: the burglar alarm went off (i.e., a = 1);
• Parameter 1: the presence or absence of an earthquake (i.e., e = 1 or 0);
• Parameter 2: the presence or absence of a burglary at Judea's house (i.e., b = 1 or 0).
LIKELIHOODS/PRIORS IN THIS CASE
• The likelihood associated with the observation is:

L(DATA | Θ) = P(a = 1 | b, e)

• with b, e = 0, 1 (depending on whether a burglary or an earthquake has taken place).
• The priors specify the probabilities of a burglary or an earthquake happening:

P(b = 1) = ?; P(e = 1) = ?
Example Probabilities
• Here are some probabilities indicating something about the likelihood and prior:

P(b = 0) = .9; P(b = 1) = .1;
P(a = 1 | e = b = 0) = .001; P(a = 1 | b = 1, e = 0) = .368;
P(a = 1 | e = 1, b = 0) = .135; P(a = 1 | b = e = 1) = .607;
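These tables can be written down directly and summed out. A minimal sketch (variable names are mine; the earthquake prior is taken to equal the burglary prior, as the interpretation slide states they are equally likely a priori):

```python
# The slides' probability tables as Python dictionaries.
P_b = {0: 0.9, 1: 0.1}
P_e = {0: 0.9, 1: 0.1}          # assumed equal to the burglary prior
P_a1 = {(0, 0): 0.001,          # alarm with neither burglary nor earthquake
        (1, 0): 0.368,          # burglary only
        (0, 1): 0.135,          # earthquake only
        (1, 1): 0.607}          # both -- keyed by (b, e)

# Marginal probability that the alarm goes off at all:
p_alarm = sum(P_b[b] * P_e[e] * P_a1[(b, e)]
              for b in (0, 1) for e in (0, 1))
print(round(p_alarm, 5))  # 0.05215
```

This marginal, P(a = 1), is the normalizing constant used later in the causal analysis.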
LIKELIHOOD/PRIOR INTERPRETATION
• Burglaries are as likely (a priori) as earthquakes.
• It is unlikely that the alarm just went off by itself.
• The alarm goes off more often when a burglary happens but an earthquake does not than in the reverse case, i.e., when an earthquake happens but a burglary does not.
• If both a burglary and an earthquake happen, it is (virtually) twice as likely that the alarm will go off.
Probability Propagation Graph
PROBABILITY PROPAGATION
• There are two kinds of probability propagation (see Frey 1998):
• a) marginalization, i.e., P(B → b), and
• b) multiplication, i.e., P(b → B).
• Marginalization sums over the terms leading into the node;
• multiplication multiplies over the terms leading into the node.
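As a sketch of these two operations on the alarm node (message names are illustrative, not the slides' notation; probabilities are the slides' values):

```python
# a) marginalization and b) multiplication at the alarm node a.
P_a1 = {(0, 0): 0.001, (1, 0): 0.368, (0, 1): 0.135, (1, 1): 0.607}
msg_e = {0: 0.9, 1: 0.1}        # message carrying the prior on e into the node
prior_b = {0: 0.9, 1: 0.1}

# a) marginalization: sum over the terms leading into the node
msg_a_to_b = {b: sum(P_a1[(b, e)] * msg_e[e] for e in (0, 1)) for b in (0, 1)}

# b) multiplication: multiply the terms arriving at b (here, its prior)
belief_b = {b: prior_b[b] * msg_a_to_b[b] for b in (0, 1)}

print(round(msg_a_to_b[1], 4))  # 0.3919, the factor that reappears in the
                                # causal-analysis slides
```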
CAUSAL ANALYSIS
• To analyze the causes of the alarm going off, we calculate the probability that it was a burglary (in this case) and compare it with the probability that it was an earthquake:

P(b = 1 | a = 1) ∝ P(B → b) P(A → b)
= (.1) × Σ_e P(a = 1 | e, b = 1) P(e → A)
= .1 × (.368 × .9 + .607 × .1) = .1 × .3919
CAUSAL ANALYSIS II
• So, after normalization:

P(b = 1 | a = 1) = .751

• Similarly,

P(e = 1 | a = 1) = .349

• So, if we had to choose between burglary and earthquake as a cause of making the alarm go off, we should choose burglary.
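These numbers can be checked by brute-force enumeration of the joint distribution (a sketch; variable names are mine, probabilities are the slides' values):

```python
# Exact posteriors for the alarm example by enumerating the joint.
P_b = {0: 0.9, 1: 0.1}
P_e = {0: 0.9, 1: 0.1}
P_a1 = {(0, 0): 0.001, (1, 0): 0.368, (0, 1): 0.135, (1, 1): 0.607}

joint = {(b, e): P_b[b] * P_e[e] * P_a1[(b, e)]
         for b in (0, 1) for e in (0, 1)}
p_a1 = sum(joint.values())                      # P(a = 1)

p_b1 = (joint[(1, 0)] + joint[(1, 1)]) / p_a1   # P(b = 1 | a = 1)
p_e1 = (joint[(0, 1)] + joint[(1, 1)]) / p_a1   # P(e = 1 | a = 1)
print(round(p_b1, 3), round(p_e1, 3))  # 0.751 0.349
```

Note that the two posteriors need not sum to one: burglary and earthquake are not mutually exclusive causes.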
Markov Chain Monte Carlo for the Burglar Problem
• For the current value e = e*, calculate

P(b = 1 | a = 1, e = e*), P(b = 0 | a = 1, e = e*)

• or, in message form,

P(A → b | e = e*) P(B → b | e = e*)

• Simulate b from this distribution. Call the result b*. Now calculate:

P(e = 1 | b = b*, a = 1), P(e = 0 | b = b*, a = 1)

• or

P(A → e | b = b*) P(E → e | b = b*)
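This alternating scheme is a Gibbs sampler. A minimal sketch (function names are mine; probabilities are the slides' values):

```python
import random

# Gibbs sampling for the burglar problem: alternate between the two
# conditionals and count how often b = 1.
P_b = {0: 0.9, 1: 0.1}
P_e = {0: 0.9, 1: 0.1}
P_a1 = {(0, 0): 0.001, (1, 0): 0.368, (0, 1): 0.135, (1, 1): 0.607}

def sample_b_given(e):
    """Draw b from P(b | a = 1, e = e*)."""
    w1 = P_b[1] * P_a1[(1, e)]
    w0 = P_b[0] * P_a1[(0, e)]
    return 1 if random.random() < w1 / (w0 + w1) else 0

def sample_e_given(b):
    """Draw e from P(e | a = 1, b = b*)."""
    w1 = P_e[1] * P_a1[(b, 1)]
    w0 = P_e[0] * P_a1[(b, 0)]
    return 1 if random.random() < w1 / (w0 + w1) else 0

random.seed(0)
b, e, hits = 0, 0, 0
n = 200_000
for _ in range(n):
    b = sample_b_given(e)
    e = sample_e_given(b)
    hits += b
est = hits / n
print(round(est, 2))  # ≈ 0.75, close to the exact posterior P(b = 1 | a = 1)
```

The empirical frequency of b = 1 along the chain approximates the exact posterior computed on the previous slides.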
Independent Hidden Variables: A Factorial Model
• In statistical modeling it is often advantageous to treat variables which are not observed as ‘hidden’. This means that they themselves have distributions. In our case suppose b and e are independent hidden variables:
P(b = 1) = β; P(b = 0) = 1 − β;
P(e = 1) = ε; P(e = 0) = 1 − ε;

• Then optimally:

P(b = 1 | a = 1) = .951
P(e = 1 | a = 1) = .186
Nonfactorial Hidden Variable Models
• Suppose b and e are dependent hidden variables:
P(b = 1, e = 1) = p₁₁; P(b = 1, e = 0) = p₁₀;
P(b = 0, e = 1) = p₀₁; P(b = 0, e = 0) = 1 − p₁₁ − p₁₀ − p₀₁

• Then a similar analysis yields a related result.
INFORMATION
• The difference in information available from the parameters after observing the alarm versus before the alarm was observed is:

I(β, ε) = Σ_{b,e} L(b, e | β, ε) log[ L(b, e | β, ε) / L(b, e, a = 1) ] = D(Q ‖ P)

• This is the Kullback–Leibler ‘distance’ between the prior and posterior distributions.
• Parameters are chosen to optimize this distance.
INFORMATION IN THIS EXAMPLE
• The information available in this example is calculated using:

L(b, e | β, ε) = β^b (1 − β)^(1−b) ε^e (1 − ε)^(1−e)
L(b, e, a = 1) = P(a = 1 | b, e) · .1^b · .9^(1−b) · .1^e · .9^(1−e)

• giving

I(β, ε) = −H(β) − H(ε) + E_Q[ −log P(a = 1 | b, e) − (b + e) log(.1) − (2 − b − e) log(.9) ]
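Reading I(β, ε) as a variational quantity (my interpretation: it differs from KL(Q ‖ posterior) only by the constant −log P(a = 1), so minimizing over β, ε pulls the factorized Q toward the posterior), it can be computed directly. The function name is mine:

```python
import math

# Numerical sketch of I(β, ε) for the alarm example.
P_a1 = {(0, 0): 0.001, (1, 0): 0.368, (0, 1): 0.135, (1, 1): 0.607}

def info(beta, eps):
    """I(β, ε) = sum over b, e of Q(b, e) log[ Q(b, e) / L(b, e, a = 1) ]."""
    total = 0.0
    for b in (0, 1):
        for e in (0, 1):
            q = beta**b * (1 - beta)**(1 - b) * eps**e * (1 - eps)**(1 - e)
            joint = (P_a1[(b, e)]
                     * 0.1**b * 0.9**(1 - b) * 0.1**e * 0.9**(1 - e))
            total += q * math.log(q / joint)
    return total

# I is bounded below by -log P(a = 1) = -log(.05215), attained only when the
# factorized Q matches the (non-factorized) posterior exactly.
print(round(info(0.75, 0.35), 3))
```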
Markov Random Fields
• Markov random fields are simply graphical models set in a 2- or higher-dimensional field. Their fundamental criterion is that the distribution of a point x conditional on all the points that remain (i.e., −x) is identical to its distribution given a neighborhood N(x):

L(x | −x) = L(x | N(x))
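A hypothetical illustration (an Ising-style binary field, not a model from the slides): the conditional at a site depends only on its four nearest neighbours, so changing a non-neighbour leaves it untouched.

```python
import math

# P(x_ij = +1 | everything else) for a binary (±1) lattice field with
# nearest-neighbour coupling J -- it only ever reads the neighbourhood N(x).
def conditional_p1(field, i, j, J=1.0):
    h, w = len(field), len(field[0])
    s = sum(field[i2][j2]
            for i2, j2 in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
            if 0 <= i2 < h and 0 <= j2 < w)   # only the neighbourhood enters
    return 1.0 / (1.0 + math.exp(-2.0 * J * s))

field = [[1, -1, 1, 1],
         [-1, 1, 1, -1],
         [1, 1, -1, 1],
         [-1, 1, 1, 1]]
p_before = conditional_p1(field, 1, 1)
field[3][3] = -field[3][3]            # flip a site outside N((1, 1))
p_after = conditional_p1(field, 1, 1)
print(p_before == p_after)  # True: the Markov property in action
```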
EXAMPLE OF A RANDOM FIELD
• Modeling a video frame is typically done via a random field. Parameters identify our expectations of what the frame looks like.
• We can ‘clean up’ video frames or related media using a methodology which distinguishes between what we expect and what was observed.
GENERALIZATION
• This can be generalized to non-discrete likelihoods with non-discrete parameters.
• More generally (sans data), assume that a movie (consisting of many frames, each of which consists of grey-level pixel values over a lattice) is observed. We would like to ‘detect’ ‘unnatural’ events.
GENERALIZATION II
• Assume a model for frame i (given frame i − 1) taking the form:

L(Frame[i] | Θ, Frame[i − 1])

• The parameters Θ typically denote invariant features for pictures of cars, houses, etc.
• The presence or absence of unnatural events can be described by hidden variables.
• The (frame) likelihood describes the natural evolution of the movie over time.
GENERALIZATION III
• Parameters are estimated by optimizing the information they provide. This is accomplished by ‘summing or integrating over’ the hidden variables.