Direct Message Passing for Hybrid Bayesian Networks
Wei Sun, PhD
Assistant Research Professor
SFL, C4I Center, SEOR Dept.
George Mason University, 2009George Mason University, 2009
2
Outline
Inference for hybrid Bayesian networks
Message passing algorithm
Direct message passing between discrete and continuous variables
Gaussian mixture reduction
Issues
3
Hybrid Bayesian Networks
Continuous features (e.g., speed, frequency) take real values, e.g., 0.36589…
Discrete features (e.g., location, category) take categorical values, e.g., Type 1, Class 2, …
Both DISCRETE and CONTINUOUS variables are involved in a hybrid model.
4
Hybrid Bayesian Networks – Cont.
The simplest hybrid BN model is the Conditional Linear Gaussian (CLG): no discrete child of a continuous parent, and linear Gaussian relationships between continuous variables. The Clique Tree algorithm provides an exact solution.
General hybrid BNs allow arbitrary continuous densities and arbitrary functional relationships between continuous variables. No exact algorithm exists in general; approximate methods include discretization, simulation, conditional loopy propagation, etc.
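As an illustration of a CLG CPD, the sketch below samples a continuous node X whose conditional distribution, given discrete parent state k and continuous parent values u, is N(w_k·u + b_k, σ_k²). This is a minimal sketch, not code from the paper; the function name `clg_sample` and all example parameters are hypothetical.

```python
import random

def clg_sample(d_state, u_values, weights, bias, sigma):
    """Sample X from a CLG CPD: X | u, d=k ~ N(weights[k].u + bias[k], sigma[k]^2)."""
    w, b, s = weights[d_state], bias[d_state], sigma[d_state]
    mean = sum(wi * ui for wi, ui in zip(w, u_values)) + b
    return random.gauss(mean, s)

# Hypothetical CPD: X has one continuous parent U and a binary discrete parent D.
weights = {0: [1.0], 1: [-2.0]}   # linear coefficients per discrete state
bias    = {0: 0.5,  1: 3.0}
sigma   = {0: 1.0,  1: 0.5}

random.seed(0)
x = clg_sample(1, [2.0], weights, bias, sigma)  # mean is -2*2 + 3 = -1
```

Each discrete state selects its own linear function and noise level, which is exactly why a discrete parent turns a Gaussian message into a Gaussian mixture downstream.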
5
Innovation
Message passing between purely discrete variables, or between purely continuous variables, is well defined. Exchanging messages between heterogeneous variables, however, remains an open issue.
In this paper, we unify the message passing framework to exchange information between arbitrary variables:
Provides exact solutions for polytree CLG, with full density estimations, vs. the Clique Tree algorithm, which provides only the first two moments. Both have the same complexity.
Integrates the unscented transformation to provide approximate solutions for nonlinear, non-Gaussian models.
Uses Gaussian mixtures (GM) to represent continuous messages; GM reduction techniques may be applied to make the algorithm scalable.
6
Why Message Passing
Local, distributed, less computations.
7
Message Passing in Polytree
In a polytree, any node d-separates the sub-network above it from the sub-network below it. (A multiply-connected network may not be partitioned into two separate sub-networks by a single node.) For a typical node X in a polytree, the evidence e can therefore be divided into two exclusive sets, e+ above X and e- below X, and processed separately.
Define the pi and lambda messages as:
pi(X) = P(X | e+),  lambda(X) = P(e- | X)
Then the belief of node X is:
BEL(X) = P(X | e) = α pi(X) lambda(X), where α is a normalizing constant.
8
Message Passing in Polytree – Cont
In the message passing algorithm, each node maintains a lambda value and a pi value for itself. It also sends lambda messages to its parents and pi messages to its children.
After a finite number of message passing iterations, every node obtains its correct belief.
For a polytree, MP returns exact beliefs. For networks with loops, MP is called loopy propagation, which often gives good approximations to the posterior distributions.
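For a single discrete node, the belief update above reduces to normalizing the elementwise product of the pi and lambda vectors. A minimal sketch (function names are my own, not from the paper):

```python
def normalize(v):
    """Scale a non-negative vector so it sums to one."""
    s = sum(v)
    return [x / s for x in v]

def belief(pi, lam):
    """Pearl's belief update for a discrete node: BEL(X) = alpha * pi(X) * lambda(X)."""
    return normalize([p * l for p, l in zip(pi, lam)])

# pi summarizes evidence above X, lambda summarizes evidence below X.
bel = belief([0.6, 0.4], [0.5, 0.25])  # approximately [0.75, 0.25]
```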
9
Message Passing in Hybrid Networks
For a continuous variable, messages are represented by a Gaussian mixture (GM).
Each state of a discrete parent introduces a Gaussian component into the continuous message.
The unscented transformation is used to compute continuous messages when the functional relationship defined in the CPD (Conditional Probability Distribution) is nonlinear.
As messages propagate, the size of the GM grows exponentially. An error-bounded GM reduction technique maintains the scalability of the algorithm.
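The exponential growth can be seen in the product of two mixtures: every pairwise product of Gaussian components is again a scaled Gaussian, so an m-component GM times an n-component GM has m·n components. A hedged 1-D sketch (function names are hypothetical, not from the paper):

```python
import math

def gauss_pdf(x, mu, var):
    """Density of N(mu, var) evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def gm_product(gm1, gm2):
    """Product of two 1-D Gaussian mixtures, each a list of (weight, mean, var).
    Component counts multiply: m components times n components gives m*n."""
    out = []
    for w1, m1, v1 in gm1:
        for w2, m2, v2 in gm2:
            v = 1.0 / (1.0 / v1 + 1.0 / v2)          # precision-weighted variance
            m = v * (m1 / v1 + m2 / v2)              # precision-weighted mean
            c = gauss_pdf(m1, m2, v1 + v2)           # scale factor of the product
            out.append((w1 * w2 * c, m, v))
    return out

f1 = [(0.5, -2.0, 1.0), (0.5, 2.0, 1.0)]                   # 2-component GM
f2 = [(0.3, -1.0, 0.5), (0.4, 0.0, 1.0), (0.3, 1.0, 2.0)]  # 3-component GM
prod = gm_product(f1, f2)   # 6 components: sizes multiply as messages combine
```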
10
Direct Passing between Disc. & Cont.
[Figure: a continuous node X with a discrete parent D and a continuous parent U.]
Lambda message to the discrete parent: a non-negative constant (one per discrete state).
Lambda message to the continuous parent: a Gaussian mixture with the discrete pi message as the mixing prior, obtained via the inverse of the function defined in the CPD of X.
Pi value of X: a Gaussian mixture with the discrete pi message as the mixing prior, obtained via the function specified in the CPD of X.
Messages are exchanged directly between discrete and continuous nodes. The size of the GM increases as messages propagate, so a GM reduction technique is needed to maintain scalability.
11
Complexity
[Figures: message densities at several nodes, with GM component counts growing as messages propagate.]
Exploding??
[Figure: example network with nodes Z, Y, X, W, T, U, A, B.]
12
Scalability - Gaussian Mixture Reduction
[Figures: f1, a 2-component GM; f2, a 3-component GM; and their product f1·f2, a 6-component GM (true density and true components shown).]
13
Gaussian Mixture Reduction – Cont.
[Figures: f1·f2, the 6-component GM (true density and components), and its approximate 3-component GM (approximate density and components).]
Normalized integrated square error = 0.45%
14
Example – 20-comp. GM to 4-comp. GM
[Figure: Gaussian mixture reduction with bounded error – a 20-component GM and its 4-component approximation.]
NISE < 1%
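One simple reduction strategy, sketched below, greedily merges the closest pair of components by moment matching, which preserves the mixture's overall mean and variance exactly. This is a crude stand-in for the error-bounded reduction used in the paper, with hypothetical function names:

```python
def merge(c1, c2):
    """Moment-preserving merge of two weighted components (weight, mean, var)."""
    w1, m1, v1 = c1
    w2, m2, v2 = c2
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    v = (w1 * (v1 + m1 ** 2) + w2 * (v2 + m2 ** 2)) / w - m ** 2
    return (w, m, v)

def reduce_gm(gm, target):
    """Greedily merge the pair of components with the closest means
    until only `target` components remain."""
    gm = list(gm)
    while len(gm) > target:
        i, j = min(
            ((a, b) for a in range(len(gm)) for b in range(a + 1, len(gm))),
            key=lambda p: abs(gm[p[0]][1] - gm[p[1]][1]),
        )
        merged = merge(gm[i], gm[j])
        gm = [c for k, c in enumerate(gm) if k not in (i, j)] + [merged]
    return gm

gm6 = [(1 / 6, m, 1.0) for m in (-4.0, -3.5, 0.0, 0.5, 3.0, 4.0)]
gm3 = reduce_gm(gm6, 3)   # overall mean and second moment are preserved
```

A production version would pick merge pairs by a density-based cost (e.g., integrated square error, as in the NISE figures above) rather than raw mean distance.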
15
Scalability - Error Propagation
Approximate messages propagate, and so do the approximation errors. Each individual approximation can be bounded; however, the total error after many propagations is very difficult to estimate.
Ongoing research: with each GM reduction bounded by a small error, we aim to show that the total approximation error remains bounded, at least empirically.
16
Numerical Experiments – Polytree CLG
Poly12CLG – a polytree BN model
DMP vs. Clique Tree: both have the same complexity, and both provide exact solutions for polytrees. DMP provides full density estimation, while CT provides only the first two moments for continuous variables.
[Figure: full density estimation by DMP, compared against Clique Tree and the individual GM components.]
17
Numerical Experiments – Polytree CLG, with GM Reduction
Poly12CLG – a polytree BN model. GM pi value -> single Gaussian approximation.
[Figure: average and maximum absolute probability errors after combining pi only (100 simulation runs), for hidden discrete nodes V, A, L, B, H, C; curves: average error, average diff, maximum error, maximum diff.]
18
Numerical Experiments – Polytree CLG, with GM Reduction
Poly12CLG – a polytree BN model. GM lambda message -> single Gaussian approximation.
[Figure: average and maximum absolute probability errors after combining lambda only (100 simulation runs), for hidden discrete nodes V, A, L, B, H, C; curves: average error, average diff, maximum error, maximum diff.]
19
Numerical Experiments – Polytree CLG, with GM Reduction
Poly12CLG – a polytree BN model. GM pi and lambda messages -> single Gaussian approximation.
20
Reduce GM under Bounded Error
[Figure: average and maximum absolute errors after approximating both pi and lambda with fewer-component Gaussian mixtures (100 simulation runs), for hidden discrete nodes V, A, L, B, H, C; curves: average error, average diff, maximum error, maximum diff.]
When each GM reduction has bounded error < 5%, the inference performance improves significantly.
21
Numerical Experiments – Network with loops
Loop13CLG – a BN model with loops
[Figure: average and maximum absolute probability errors in loopy propagation (100 simulation runs), for hidden discrete nodes V, A, L, B, H, C; curves: average error, average diff, maximum error, maximum diff.]
Errors range from 1% to 5% due to loopy propagation.
22
Empirical Insights
Combining pi does not affect network ‘above’;
Combining lambda does not affect network ‘below’;
Approximation errors due to GM reduction diminish for discrete nodes further away from the discrete parent nodes.
Loopy propagation usually provides accurate estimations.
23
Summary & Future Research
DMP provides an alternative algorithm for efficient inference in hybrid BNs:
Exact for polytree models
Full density estimations
Same complexity as Clique Tree
Scalable, trading off accuracy against computational complexity
Distributed algorithm, local computations only
24
25
[Figure: a network with nodes A1, A2, A3, …, An and Y1, Y2, Y3, …, Yn, T, E.]
26
Pi Value of A Cont. Node with both Disc. & Cont. Parents
[Figure: a continuous node X with a discrete parent D and a continuous parent U.]
The pi value of a continuous node is essentially a distribution transformed by the function defined in the CPD of this node, with input distributions given by all of the pi messages sent from its parents.
With both discrete and continuous parents, the pi value of the continuous node can be represented by a Gaussian mixture, with the discrete pi message as the mixing prior and components obtained via the function specified in the CPD of X.
27
Lambda Value of A Cont. Node
The lambda value of a continuous node is the product of all lambda messages sent from its children.
A lambda message sent to a continuous node is necessarily a continuous message in the form of a Gaussian mixture, because only continuous children are allowed for a continuous node.
The product of Gaussian mixtures is again a Gaussian mixture, with exponentially increased size.
28
Pi Message Sending to Cont. node from Disc. Parent
[Figure: a continuous node X with a discrete parent D and a continuous parent U.]
The pi message sent to a continuous node X from its discrete parent is the product of the pi value of the discrete parent and all of the lambda messages sent to this discrete parent from its children other than X.
A lambda message sent to a discrete node from its child is always a discrete vector, and the pi value of a discrete node is always a discrete distribution.
Hence the pi message sent to a continuous node from its discrete parent is a discrete vector, representing the discrete parent's state probabilities.
29
Pi Message Sending to Cont. node from Cont. Parent
[Figure: a continuous node X with a discrete parent D and a continuous parent U.]
The pi message sent to a continuous node X from its continuous parent is the product of the pi value of the continuous parent and all of the lambda messages sent to the continuous parent from its children other than X.
A lambda message sent to a continuous node from its child is always a continuous message, represented by a GM, and the pi value of a continuous node is always a continuous distribution, also represented by a GM.
Hence the pi message sent to a continuous node from its continuous parent is a continuous message, represented by a GM.
30
Lambda Message Sending to Disc. Parent from Cont. node
[Figure: a continuous node X with a discrete parent D and a continuous parent U.]
Given each state of the discrete parent, a function is defined between the continuous node and its continuous parent.
For each state of the discrete parent, the lambda message sent from a continuous node is an integration of two continuous distributions (both represented by GMs), resulting in a non-negative constant.
31
Lambda Message Sending to Cont. Parent from Cont. Node
[Figure: a continuous node X with a discrete parent D and a continuous parent U.]
The lambda message sent from a continuous node to its continuous parent is a Gaussian mixture that uses the pi message sent to the node from its discrete parent as the mixing prior.
That pi message from the discrete parent is a discrete vector, which serves as the mixing prior.
32
Unscented Transformation
Unscented transformation (UT) is a deterministic sampling method.
UT approximates the first two moments of a continuous random variable transformed via an arbitrary nonlinear function.
UT is based on the principle that it is easier to approximate a probability distribution than to approximate a nonlinear function.
2n+1 deterministic sample points (sigma points) are chosen and propagated via the original function:
X0 = μ, with weight W0 = κ/(n+κ);
Xi = μ + (√((n+κ)Σ))_i, with weight Wi = 1/(2(n+κ)), for i = 1, …, n;
Xi = μ − (√((n+κ)Σ))_(i−n), with weight Wi = 1/(2(n+κ)), for i = n+1, …, 2n,
where n is the dimension of X and κ is a scaling parameter.
UT keeps the original function unchanged, and the results are exact for linear functions.
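A 1-D sketch of the unscented transformation (the function name and the κ = 2 default are my own choices): three sigma points are propagated through the function and the transformed mean and variance are recovered from the weighted samples; for a linear function the result is exact, as the slide states.

```python
import math

def unscented_transform(mu, var, f, kappa=2.0):
    """1-D unscented transformation: propagate 2n+1 = 3 sigma points
    through f and recover the transformed mean and variance."""
    n = 1
    spread = math.sqrt((n + kappa) * var)
    points = [mu, mu + spread, mu - spread]
    weights = [kappa / (n + kappa), 1 / (2 * (n + kappa)), 1 / (2 * (n + kappa))]
    ys = [f(x) for x in points]
    mean = sum(w * y for w, y in zip(weights, ys))
    variance = sum(w * (y - mean) ** 2 for w, y in zip(weights, ys))
    return mean, variance

# Exact for a linear function: X ~ N(1, 4), Y = 2X + 1 gives Y ~ N(3, 16).
m, v = unscented_transform(1.0, 4.0, lambda x: 2 * x + 1)
```

For a nonlinear CPD function, the same call returns a Gaussian approximation of the transformed message, which is how DMP handles nonlinear relationships between continuous variables.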
33
Why Message Passing
Local
Distributed
Less computations