Some Neat Results From Assignment 1
Assignment 1: Negative Examples (Rohit)
Assignment 1: Noisy Observations (Nick)
Z: true feature vector
X: noisy observation
X ~ Normal(z, σ²)
We need to compute P(X|H)
Φ: cumulative distribution function of the Gaussian
Guidance on Assignment 3
Guidance: Assignment 3 Part 1
MATLAB functions in the Statistics Toolbox:
betacdf, betapdf, betarnd, betastat, betafit
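For readers working outside MATLAB, here is a minimal sketch of rough Python counterparts to some of these functions, using only the standard library. The function names (beta_pdf, beta_stat, beta_cdf, beta_rnd) are hypothetical and not part of any toolbox, the CDF is a simple numerical approximation, and fitting (betafit) is not sketched.

```python
import math
import random

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, computed in log space for stability."""
    if not 0.0 < x < 1.0:
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def beta_stat(a, b):
    """Mean and variance of Beta(a, b)."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

def beta_cdf(x, a, b, steps=10000):
    """Midpoint-rule approximation of the CDF (assumes a >= 1 and b >= 1)."""
    x = min(max(x, 0.0), 1.0)
    h = x / steps
    return sum(beta_pdf((i + 0.5) * h, a, b) for i in range(steps)) * h

def beta_rnd(a, b, n, seed=0):
    """Draw n samples from Beta(a, b) with a seeded generator."""
    rng = random.Random(seed)
    return [rng.betavariate(a, b) for _ in range(n)]

if __name__ == "__main__":
    print(beta_pdf(0.5, 2.0, 2.0))   # density at the mode of Beta(2, 2)
    print(beta_stat(2.0, 2.0))       # (mean, variance)
    print(beta_cdf(0.5, 2.0, 2.0))   # close to 0.5 by symmetry
```

A Beta prior with larger a + b is more concentrated, which is one way to vary the strength of a prior.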
Guidance: Assignment 3 Part 2
You will explore the role of the priors.
The Weiss model showed that priors play an important role when
• observations are noisy
• observations don’t provide strong constraints
• there aren’t many observations.
Guidance: Assignment 3 Part 3
Implement a model a bit like Weiss et al. (2002).
Goal: infer motion (velocity) of a rigid shape from observations at two instants in time.
Assume distinctive features that make it easy to identify the location of the feature at successive times.
Assignment 2 Guidance
Bx: the x displacement of the blue square (= delta x in one unit of time)
By: the y displacement of the blue square
Rx: the x displacement of the red square
Ry: the y displacement of the red square
These observations are corrupted by measurement noise.
Gaussian, mean zero, std deviation σ
D: direction of motion (up, down, left, right)
Assume the only possibilities are one unit of motion in any direction
Assignment 2: Generative Model
Rx conditioned on D=up is drawn from a Gaussian
Same assumptions for Bx, By.
Assignment 2 Math
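$$
\begin{aligned}
P(D \mid Rx, Ry, Bx, By) &= \frac{p(Rx, Ry, Bx, By \mid D)\, P(D)}{P(Rx, Ry, Bx, By)} \\
&= \frac{p(Rx, Ry, Bx, By \mid D)\, P(D)}{\sum_{e} p(Rx, Ry, Bx, By \mid e)\, P(e)} \\
&\propto p(Rx, Ry, Bx, By \mid D)\, P(D) \\
&\propto p(Rx \mid D)\, p(Ry \mid D)\, p(Bx \mid D)\, p(By \mid D)\, P(D) \quad \text{(conditional independence)}
\end{aligned}
$$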
Assignment 2 Implementation
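$$
\begin{aligned}
P(D \mid Rx, Ry, Bx, By) &\propto p(Rx \mid D)\, p(Ry \mid D)\, p(Bx \mid D)\, p(By \mid D)\, P(D) \\
P(D = d \mid Rx = rx, Ry = ry, Bx = bx, By = by) &\propto p(Rx = rx \mid D = d) \cdots \\
&= \mathrm{Gaussian}(rx;\, \mu_d, \sigma_d^2) \cdots \\
&= \frac{1}{\sqrt{2\pi}\, \sigma_d} \exp\!\left(-\frac{(rx - \mu_d)^2}{2\sigma_d^2}\right) \cdots
\end{aligned}
$$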
Quiz: do we need to worry about the normalization term of the Gaussian density function?
Introduction To Bayes Nets
(Stuff stolen from Kevin Murphy, UBC, and Nir Friedman, HUJI)
What Do You Need To Do Probabilistic Inference In A Given Domain?
Joint probability distribution over all variables in domain
Qualitative part
Directed acyclic graph (DAG)
• Nodes: random vars.
• Edges: direct influence
Quantitative part
Set of conditional probability distributions
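Family of Alarm: P(A | E, B)

E     B     P(a | E, B)   P(¬a | E, B)
e     b     0.9           0.1
e     ¬b    0.2           0.8
¬e    b     0.9           0.1
¬e    ¬b    0.01          0.99

[Figure: network with edges Earthquake → Radio, Earthquake → Alarm, Burglary → Alarm, Alarm → Call]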
Compact representation of joint probability distributions via conditional independence
Together: define a unique distribution in a factored form
$$P(B, E, A, C, R) = P(B)\, P(E)\, P(A \mid B, E)\, P(R \mid E)\, P(C \mid A)$$
Bayes Nets (a.k.a. Belief Nets)
Figure from N. Friedman
What Is A Bayes Net?
[Figure: the alarm network (Earthquake, Radio, Burglary, Alarm, Call)]
A node is conditionally independent of its ancestors given its parents.
E.g., C is conditionally independent of R, E, and B given A
Notation: C ⊥ R, B, E | A
Quiz: What sort of parameter reduction do we get?
From 2^5 − 1 = 31 parameters to 1 + 1 + 2 + 4 + 2 = 10
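The count in the quiz can be checked mechanically. The sketch below assumes every variable is binary, so a node with k parents needs 2^k free parameters (one per parent configuration). The dictionary layout and function names are illustrative choices, not part of the slides.

```python
# Parents of each node in the alarm network (all variables assumed binary).
PARENTS = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "Radio": ["Earthquake"],
    "Call": ["Alarm"],
}

def full_joint_params(n_vars):
    """Free parameters of an unrestricted joint over n binary variables."""
    return 2 ** n_vars - 1

def bayes_net_params(parents):
    """One free parameter per configuration of each node's parents."""
    return sum(2 ** len(ps) for ps in parents.values())

if __name__ == "__main__":
    print(full_joint_params(len(PARENTS)))  # 31
    print(bayes_net_params(PARENTS))        # 10
```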
Conditional Distributions Are Flexible
E.g., Earthquake and Burglary might have independent effects on Alarm
A.k.a. noisy-or
$$P(A = 1 \mid B, E) = 1 - (1 - p_B)^{B} (1 - p_E)^{E}$$
where pB and pE are the alarm probabilities given burglary alone and earthquake alone
This constraint reduces # free parameters to 8!
[Figure: Earthquake → Alarm ← Burglary]

B    E    P(A|B,E)
0    0    0
0    1    pE
1    0    pB
1    1    pE + pB − pE·pB
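The noisy-or table can be reproduced with a few lines of code. This is a minimal sketch: the function name noisy_or and the example values pB = 0.9 and pE = 0.2 are assumptions made for illustration only.

```python
def noisy_or(p_causes, active):
    """P(effect = 1) when each active cause independently triggers the effect.

    p_causes: probability that each cause alone produces the effect.
    active:   0/1 flags saying which causes are present.
    """
    p_all_fail = 1.0
    for p, on in zip(p_causes, active):
        if on:
            p_all_fail *= 1.0 - p
    return 1.0 - p_all_fail

if __name__ == "__main__":
    p_b, p_e = 0.9, 0.2  # hypothetical values for pB and pE
    for b in (0, 1):
        for e in (0, 1):
            print(b, e, noisy_or((p_b, p_e), (b, e)))
```

The last row agrees with pE + pB − pE·pB because 1 − (1 − pB)(1 − pE) expands to the same expression.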
Domain: Monitoring Intensive-Care Patients
• 37 variables
• 509 parameters … instead of 2^37
[Figure: the ALARM network for patient monitoring, with nodes such as HYPOVOLEMIA, LVFAILURE, CATECHOL, SAO2, CVP, and BP]
A Real Bayes Net: Alarm
Figure from N. Friedman
More Real-World Bayes Net Applications
“Microsoft’s competitive advantage lies in its expertise in Bayesian networks” -- Bill Gates, quoted in LA Times, 1996
• MS Answer Wizards, (printer) troubleshooters
• Medical diagnosis
• Speech recognition (HMMs)
• Gene sequence/expression analysis
• Turbocodes (channel coding)
Why Are Bayes Nets Useful?
• Factored representation may have exponentially fewer parameters than full joint
  – Easier inference (lower time complexity)
  – Less data required for learning (lower sample complexity)
• Graph structure supports
  – Modular representation of knowledge
  – Local, distributed algorithms for inference and learning
  – Intuitive (possibly causal) interpretation
• Strong theory about the nature of cognition or the generative process that produces observed data
  – Can’t represent arbitrary contingencies among variables, so theory can be rejected by data
Reformulating Naïve Bayes As Graphical Model
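[Figure: D → Rx, Ry, Bx, By]
$$
\begin{aligned}
p(D, Rx, Ry, Bx, By) &= P(D)\, p(Rx \mid D)\, p(Ry \mid D)\, p(Bx \mid D)\, p(By \mid D) \\
p(Rx, Ry, Bx, By) &= \sum_{D} p(D, Rx, Ry, Bx, By) && \text{(marginalizing over } D\text{)} \\
P(D \mid Rx, Ry, Bx, By) &= \frac{P(D, Rx, Ry, Bx, By)}{P(Rx, Ry, Bx, By)} && \text{(definition of conditional probability)}
\end{aligned}
$$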
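The three steps (factored joint, marginalizing over D, dividing) can be sketched in code. This is one possible reading of the model, not the reference solution. It assumes a uniform prior over D, that both squares move one unit in direction D, that "up" means positive y, and that all four observations share the same noise standard deviation sigma. The names MEANS, log_gauss, and direction_posterior are hypothetical.

```python
import math

# Assumed mean displacement (x, y) for each direction; "up" = +y is a convention.
MEANS = {"up": (0.0, 1.0), "down": (0.0, -1.0),
         "left": (-1.0, 0.0), "right": (1.0, 0.0)}

def log_gauss(x, mu, sigma):
    """Log density of Normal(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def direction_posterior(rx, ry, bx, by, sigma, prior=None):
    """P(D | Rx, Ry, Bx, By) under the naive Bayes factorization."""
    prior = prior or {d: 1.0 / len(MEANS) for d in MEANS}
    # Log of the factored joint p(D, Rx, Ry, Bx, By) for each direction.
    log_joint = {}
    for d, (mx, my) in MEANS.items():
        log_joint[d] = (math.log(prior[d])
                        + log_gauss(rx, mx, sigma) + log_gauss(ry, my, sigma)
                        + log_gauss(bx, mx, sigma) + log_gauss(by, my, sigma))
    # Marginalize over D (subtracting the max first avoids underflow), then divide.
    top = max(log_joint.values())
    unnorm = {d: math.exp(v - top) for d, v in log_joint.items()}
    total = sum(unnorm.values())
    return {d: u / total for d, u in unnorm.items()}

if __name__ == "__main__":
    print(direction_posterior(0.1, 0.9, -0.2, 1.1, sigma=0.5))
```

Working in log space keeps the product of four small densities from underflowing when sigma is small.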
[Figure: naïve Bayes network with survive → Age, Class, Gender]
Review: Bayes Net
Nodes = random variables
Links = expression of joint distribution
Compare to full joint distribution by chain rule
[Figure: the alarm network (Earthquake, Radio, Burglary, Alarm, Call)]
Bayesian Analysis
Make inferences from data using probability models about quantities we want to predict
• E.g., expected age of death for a 51-year-old
• E.g., latent topics in a document
• E.g., What direction is the motion?
1. Set up a full probability model that characterizes the distribution over all quantities (observed and unobserved)
   – incorporates prior beliefs
2. Condition the model on observed data to compute the posterior distribution
3. Evaluate the fit of the model to data
   – adjust model parameters to achieve better fits
Inference
• Computing posterior probabilities
  – Probability of hidden events given any evidence
• Most likely explanation
  – Scenario that explains evidence
• Rational decision making
  – Maximize expected utility
  – Value of Information
• Effect of intervention
  – Causal analysis
[Figure: the alarm network (Earthquake, Radio, Burglary, Alarm, Call). Figure from N. Friedman]
Explaining away effect
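Explaining away can be seen numerically by enumerating a small joint distribution. The sketch below uses only Burglary, Earthquake, and Alarm. Every probability in it is an illustrative assumption rather than a value taken from the lecture, and the function names are hypothetical.

```python
from itertools import product

# Illustrative numbers only (assumptions).
P_B = 0.1                                  # P(Burglary = 1)
P_E = 0.1                                  # P(Earthquake = 1)
P_A = {(1, 1): 0.9, (1, 0): 0.9,           # P(Alarm = 1 | b, e), keyed by (b, e)
       (0, 1): 0.2, (0, 0): 0.01}

def joint(b, e, a):
    """P(B=b, E=e, A=a) = P(b) P(e) P(a | b, e)."""
    pb = P_B if b else 1 - P_B
    pe = P_E if e else 1 - P_E
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    return pb * pe * pa

def posterior_burglary(evidence):
    """P(B = 1 | evidence), summing the joint over all consistent worlds."""
    num = den = 0.0
    for b, e, a in product((0, 1), repeat=3):
        world = {"B": b, "E": e, "A": a}
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(b, e, a)
        den += p
        if b:
            num += p
    return num / den

if __name__ == "__main__":
    print(posterior_burglary({"A": 1}))          # alarm only
    print(posterior_burglary({"A": 1, "E": 1}))  # alarm plus earthquake
```

With these numbers, learning that an earthquake occurred lowers the posterior probability of a burglary, even though Burglary and Earthquake are independent a priori.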
Conditional Independence
A node is conditionally independent of its ancestors given its parents.
Example?
What about conditional independence between variables that aren’t directly connected?
e.g., Earthquake and Burglary?
e.g., Burglary and Radio?
[Figure: the alarm network (Earthquake, Radio, Burglary, Alarm, Call)]
d-separation
Criterion for deciding if nodes are conditionally independent.
A path from node u to node v is d-separated by a node z if the path matches one of these templates:
u → z → v, with z observed
u ← z ← v, with z observed
u ← z → v, with z observed
u → z ← v, with z unobserved (and no descendant of z observed)
d-separation
Think about d-separation as breaking a chain.
If any link on a chain is broken, the whole chain is broken
[Figure: the four templates applied to links along a chain from x to y]
d-separation Along Paths
Are u and v d-separated?
[Figure: the four templates, followed by three example paths from u to v through intermediate nodes z; the first two paths are d separated, the third is not d separated]
Conditional Independence
Nodes u and v are conditionally independent given set Z if all (undirected) paths between u and v are d-separated by Z.
[Figure: example graph with nodes u and v and conditioning nodes z]
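A conditional independence query can be checked by program. The sketch below does not walk paths against the templates. It uses an equivalent test instead: keep only u, v, Z, and their ancestors, moralize that subgraph, delete Z, and ask whether u can still reach v. The graph encoding and function names are assumptions made for illustration, and u and v are assumed not to be in Z.

```python
PARENTS = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "Radio": ["Earthquake"],
    "Call": ["Alarm"],
}

def ancestral_set(nodes, parents):
    """The given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(parents[n])
    return seen

def d_separated(u, v, given, parents):
    """True if u and v are conditionally independent given the set `given`."""
    given = set(given)
    keep = ancestral_set({u, v} | given, parents)
    # Moralize the ancestral subgraph: undirected edges plus "married" co-parents.
    nbrs = {n: set() for n in keep}
    for child in keep:
        ps = parents[child]  # parents of a kept node are always kept
        for p in ps:
            nbrs[child].add(p)
            nbrs[p].add(child)
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                nbrs[p].add(q)
                nbrs[q].add(p)
    # Search from u without passing through observed nodes.
    seen, stack = {u}, [u]
    while stack:
        n = stack.pop()
        if n == v:
            return False
        for m in nbrs[n] - given - seen:
            seen.add(m)
            stack.append(m)
    return True

if __name__ == "__main__":
    print(d_separated("Earthquake", "Burglary", [], PARENTS))         # True
    print(d_separated("Earthquake", "Burglary", ["Alarm"], PARENTS))  # False
    print(d_separated("Burglary", "Radio", [], PARENTS))              # True
```

Observing Alarm (or its descendant Call) connects Earthquake and Burglary, which is the unobserved-collider template read in reverse.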
Sufficiency For Conditional Independence: Markov Blanket
The Markov blanket of node u consists of the parents, children, and children’s parents of u
P(u|MB(u),v) = P(u|MB(u))
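The definition translates directly into code. This is a minimal sketch on the alarm network; the parent-list encoding and the function name markov_blanket are illustrative assumptions.

```python
PARENTS = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "Radio": ["Earthquake"],
    "Call": ["Alarm"],
}

def markov_blanket(node, parents):
    """Parents, children, and children's other parents of `node`."""
    children = [c for c, ps in parents.items() if node in ps]
    blanket = set(parents[node]) | set(children)
    for c in children:
        blanket |= set(parents[c])
    blanket.discard(node)  # the node is not part of its own blanket
    return blanket

if __name__ == "__main__":
    print(sorted(markov_blanket("Earthquake", PARENTS)))
    print(sorted(markov_blanket("Alarm", PARENTS)))
```

Burglary lands in the blanket of Earthquake only because the two share the child Alarm.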
Probabilistic Models
Probabilistic models
Graphical models
• Directed (Bayesian belief nets)
  – Alarm network
  – State-space models
  – HMMs
  – Naïve Bayes classifier
  – PCA/ICA
• Undirected (Markov nets)
  – Markov Random Field
  – Boltzmann machine
  – Ising model
  – Max-ent model
  – Log-linear models
Turning A Directed Graphical Model Into An Undirected Model Via Moralization
Moralization: connect all parents of each node and remove arrows
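A minimal sketch of moralization, assuming the directed graph is stored as a mapping from each node to its list of parents. The alarm network is used as the example and the function name moralize is hypothetical.

```python
PARENTS = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "Radio": ["Earthquake"],
    "Call": ["Alarm"],
}

def moralize(parents):
    """Return the undirected edge set of the moral graph."""
    edges = set()
    for child, ps in parents.items():
        # Drop the arrows: each parent-child link becomes an undirected edge.
        for p in ps:
            edges.add(frozenset((p, child)))
        # Connect ("marry") every pair of parents of the same child.
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                edges.add(frozenset((p, q)))
    return edges

if __name__ == "__main__":
    for edge in sorted(tuple(sorted(e)) for e in moralize(PARENTS)):
        print(edge)
```

The only edge added beyond the original four links is Burglary and Earthquake, the two parents of Alarm.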
Toy Example Of A Markov Net
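[Figure: undirected graph over X1, X2, X3, X4, X5]
e.g., X1 ⊥ X4, X5 | X2, X3
Xi ⊥ Xrest | Xnbrs
$$P(x) = \frac{1}{Z} \prod_{c} \psi_c(x_c)$$
ψ_c: potential function of clique c
Z: partition function
Clique: subset of vertices such that each pair is connected by an edge
Maximal clique: a clique that cannot be enlarged by adding another vertex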
A Real Markov Net
• Estimate P(x1, …, xn | y1, …, yn)
• Ψ(xi, yi) = P(yi | xi): local evidence likelihood
• Ψ(xi, xj) = exp(−J(xi, xj)): compatibility matrix
[Figure: latent causes x, each linked to an observed pixel y]
Example Of Image Segmentation With MRFs
Sziranyi et al. (2000)
Graphical Models Are A Useful Formalism
E.g., feedforward neural net with noise, sigmoid belief net
Hidden layer
Input layer
Output layer
Graphical Models Are A Useful Formalism
E.g., Restricted Boltzmann machine (Hinton)
Also known as Harmony network (Smolensky)
Hidden units
Visible units
Graphical Models Are A Useful Formalism
E.g., Gaussian Mixture Model
Graphical Models Are A Useful Formalism
E.g., dynamical (time varying) models in which data arrives sequentially or output is produced as a sequence
Dynamic Bayes nets (DBNs) can be used to model such time-series (sequence) data
Special cases of DBNs include
• Hidden Markov Models (HMMs)
• State-space models
Hidden Markov Model (HMM)
[Figure: hidden chain X1 → X2 → X3 (phones/words) governed by a transition matrix; each Xt emits Yt (acoustic signal) through Gaussian observations]
State-Space Model (SSM)/Linear Dynamical System (LDS)
[Figure: hidden chain X1 → X2 → X3 (“true” state); each Xt emits Yt (noisy observations)]
Example: LDS For 2D Tracking
[Figure: sparse linear-Gaussian system for 2D tracking, with state components X1, X2, observations y1, y2, and noise terms Q and R]
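Kalman Filtering (Recursive State Estimation In An LDS)
[Figure: hidden chain X1 → X2 → X3 with observations Y1, Y2, Y3]
Estimate P(X_t | y_{1:t}) from P(X_{t-1} | y_{1:t-1}) and y_t
• Predict: $$P(X_t \mid y_{1:t-1}) = \sum_{X_{t-1}} P(X_t \mid X_{t-1})\, P(X_{t-1} \mid y_{1:t-1})$$
• Update: $$P(X_t \mid y_{1:t}) \propto P(y_t \mid X_t)\, P(X_{t-1} \mid y_{1:t-1})\big|_{\text{predicted}} = P(y_t \mid X_t)\, P(X_t \mid y_{1:t-1})$$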
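For a linear-Gaussian model the predict and update steps have closed forms. The sketch below covers the scalar case under assumed dynamics x_t = a·x_{t-1} + noise with variance q and observations y_t = c·x_t + noise with variance r. The parameter names and default values are illustrative, not taken from the slides.

```python
def kalman_step(mean, var, y, a=1.0, c=1.0, q=0.1, r=1.0):
    """One predict + update cycle of a scalar Kalman filter.

    (mean, var) summarize P(X_{t-1} | y_{1:t-1}); the return value
    summarizes P(X_t | y_{1:t}).
    """
    # Predict: push the previous belief through the dynamics.
    mean_pred = a * mean
    var_pred = a * a * var + q
    # Update: weight the new observation by the Kalman gain.
    gain = var_pred * c / (c * c * var_pred + r)
    mean_new = mean_pred + gain * (y - c * mean_pred)
    var_new = (1.0 - gain * c) * var_pred
    return mean_new, var_new

if __name__ == "__main__":
    mean, var = 0.0, 1.0
    for y in [1.2, 0.9, 1.1, 1.0]:
        mean, var = kalman_step(mean, var, y)
        print(round(mean, 3), round(var, 3))
```

Each call needs only the previous mean and variance, which is what makes the estimate recursive.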
Mike’s Project From Last Year
IRT model
[Figure: plate diagram with variables G, X, α, P, δ and plates for student, trial, and problem]
Mike’s Project From Last Year
BKT model
[Figure: plate diagram with variables X, L0, T, τ, G, S and plates for student and trial]
Mike’s Project From Last Year
IRT+BKT model
[Figure: plate diagram with variables X, γ, σ, L0, T, τ, α, P, δ, η, G, S and plates for student, trial, and problem]