Probabilistic networks - IPA fall days
Overview
• Probabilistic networks
• The inference problem
• Tree decompositions and an algorithm for probabilistic inference
• The Maximum Probable Assignment problem
• Monotonicity
Decision support systems: reasoning with uncertainty
• Decision support systems (and/or expert systems)
• Reasoning with uncertainty
– Set of stochastic variables
• Observations
• Other variables
• Variable(s) of interest
• In the 1980s, the probabilistic network model was proposed.
– Also called: Bayesian networks, belief networks, graphical models
Probabilistic networks
• Directed acyclic graph
• Each node is a (discrete) stochastic variable, e.g. a Boolean variable
• Given for each variable is its conditional probability distribution, conditional on the values for the parents of the node

[Figure: example dag on x1, …, x5 with tables:
Pr(x1) = 0.7, Pr(¬x1) = 0.3
Pr(x5 | x3) = 0.6, Pr(¬x5 | x3) = 0.4, Pr(x5 | ¬x3) = 0.2, Pr(¬x5 | ¬x3) = 0.8
Pr(x4 | x2 and x3) = 0.12, etc.]
A probabilistic network consists of
• A directed acyclic graph G = (V, E)
• For each vertex (variable) v, a table with conditional probabilities (conditional on the values of the parents of v)

Example: a vertex v with parents w and x has the table
Pr(v | w and x)    Pr(¬v | w and x)
Pr(v | ¬w and x)   Pr(¬v | ¬w and x)
Pr(v | w and ¬x)   Pr(¬v | w and ¬x)
Pr(v | ¬w and ¬x)  Pr(¬v | ¬w and ¬x)
Configuration
• A configuration c is an assignment of a value to each variable (node).
• For set W of variables, or variable v, and configuration c, denote cW and cv for the restrictions (partial configurations).
• Probability of configuration c:
Pr(c) = Π_{v ∈ V} Pr(cv | cparents(v))
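This product can be evaluated directly. A minimal sketch, assuming binary variables and a dictionary-based representation; the network layout and all names are illustrative, reusing the CPT values from the example slide where available:

```python
# Illustrative network: cpt[v] maps a tuple of parent values to Pr(v = True | parents).
parents = {"x1": [], "x3": ["x1"], "x5": ["x3"]}
cpt = {
    "x1": {(): 0.7},                      # Pr(x1) = 0.7, as on the slide
    "x3": {(True,): 0.5, (False,): 0.5},  # assumed values for illustration
    "x5": {(True,): 0.6, (False,): 0.2},  # Pr(x5 | x3) = 0.6, Pr(x5 | ¬x3) = 0.2
}

def pr_config(c):
    """Pr(c) = product over all v of Pr(c_v | c_parents(v))."""
    p = 1.0
    for v, ps in parents.items():
        p_true = cpt[v][tuple(c[q] for q in ps)]
        p *= p_true if c[v] else 1.0 - p_true
    return p
```

Summing this quantity over all 2^n configurations gives 1, which is a useful sanity check for the representation.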
Topological sort of a directed acyclic graph
• An order of the vertices such that all edges go from left to right:
  – List the vertices v1, …, vn such that for each arc (vi, vj): i < j.
• Always exists for a dag, and can be found in O(|V| + |E|) time.
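One standard O(|V| + |E|) way to compute such an order is Kahn's algorithm; a sketch, with an illustrative edge-list representation:

```python
from collections import deque

def topological_sort(vertices, arcs):
    """Return v1, ..., vn such that every arc (vi, vj) has i < j (Kahn's algorithm)."""
    indeg = {v: 0 for v in vertices}
    succ = {v: [] for v in vertices}
    for u, w in arcs:
        succ[u].append(w)
        indeg[w] += 1
    # Start from the vertices with no incoming arcs.
    queue = deque(v for v in vertices if indeg[v] == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in succ[v]:          # removing v makes some successors sources
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    return order
```

Each vertex enters the queue once and each arc is examined once, giving the linear bound stated above.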
Generating a random configuration
• Make a topological sort of G
• For i = 1 to n do: generate a value for vi using the probabilities dictated by the values already generated for the parents of vi

[Figure: example dag with tables Pr(v1) = 0.7, Pr(¬v1) = 0.3; Pr(v2 | v1) = 0.3, Pr(¬v2 | v1) = 0.7, Pr(v2 | ¬v1) = 0.4, Pr(¬v2 | ¬v1) = 0.6; Pr(v5 | v3) = 0.6, Pr(¬v5 | v3) = 0.4, Pr(v5 | ¬v3) = 0.2, Pr(¬v5 | ¬v3) = 0.8]
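The two steps above amount to forward sampling: walk the variables in topological order and draw each value conditioned on the already-drawn parent values. A sketch over a small illustrative fragment (the v1 and v5 tables reuse the example values; the v3 table and the arc v1 → v3 are assumed for illustration):

```python
import random

parents = {"v1": [], "v3": ["v1"], "v5": ["v3"]}
cpt = {
    "v1": {(): 0.7},
    "v3": {(True,): 0.5, (False,): 0.5},  # assumed for illustration
    "v5": {(True,): 0.6, (False,): 0.2},
}
topo_order = ["v1", "v3", "v5"]  # parents precede children

def random_configuration(rng=random):
    """Draw one configuration by sampling each variable given its parents."""
    c = {}
    for v in topo_order:
        p_true = cpt[v][tuple(c[q] for q in parents[v])]
        c[v] = rng.random() < p_true
    return c
```

Because parents precede children in the order, the parent values needed for each CPT lookup are always available, and the sampled configuration has exactly the probability Pr(c) from the earlier slide.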
Inference problem
• Given: values for some variables (observations) cO
• Question: probability distribution on one variable conditional to observations, or:
Pr(cv = T | cO) = ( Σ_{c': c'v = T, c'O = cO} Pr(c') ) / ( Σ_{c': c'O = cO} Pr(c') )
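The quotient above fixes the semantics of the inference problem. A brute-force sketch that evaluates it by enumerating all configurations, exponential in n, which is exactly the cost the tree-decomposition algorithm later avoids; the network and names are illustrative:

```python
from itertools import product

parents = {"x1": [], "x3": ["x1"], "x5": ["x3"]}
cpt = {
    "x1": {(): 0.7},
    "x3": {(True,): 0.5, (False,): 0.5},  # assumed values
    "x5": {(True,): 0.6, (False,): 0.2},
}
variables = list(parents)

def pr_config(c):
    p = 1.0
    for v, ps in parents.items():
        pt = cpt[v][tuple(c[q] for q in ps)]
        p *= pt if c[v] else 1.0 - pt
    return p

def infer(v, observations):
    """Pr(v = True | observations) by summing over consistent configurations."""
    num = den = 0.0
    for vals in product([True, False], repeat=len(variables)):
        c = dict(zip(variables, vals))
        if any(c[o] != val for o, val in observations.items()):
            continue  # skip configurations inconsistent with the observations
        p = pr_config(c)
        den += p
        if c[v]:
            num += p
    return num / den
```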
Use of inference problem
• Network models information from application domain (medical, agricultural, weather forecasting, …)
• The user gives values for some variables (symptoms of a patient, observed values) and wants to know the distribution for other variables (likelihood of success of a treatment, a diagnosis)
• Used nowadays in many applications
Inference problem is #P-complete
• #P-completeness implies NP-hardness.
• Proof of #P-hardness:
  – Counting the satisfying truth assignments of a 3CNF formula is #P-complete
• E.g.: (x1 or x2 or ¬x4) and (x5 or ¬ x1 or ¬x3) and …
– Transform to probabilistic network
Example transformation: (x1 or x2 or ¬x4) and …
• Pr(xi) = 0.5, Pr(¬xi) = 0.5 for each variable xi
• T: probability 1 when the clause is satisfied; otherwise 0
• T: probability 1 when both parents are true
• The probability that the top node is T equals #sat / 2^n

[Figure: variables x1, …, x6 feeding clause nodes, combined by and-nodes]
Lauritzen-Spiegelhalter algorithm
• Uses tree decompositions to solve the problem
• Fast when the width of the tree decomposition is small
• Tree decomposition of the moralisation of G:
  – A tree in which each node has a bag: a set of variables
  – For all v:
    • The bags containing v form a connected subtree
    • There is a bag containing v and its parents (this bag covers v)

[Figure: example dag on x1, …, x5 and a tree decomposition of its moralisation]
LS algorithm
• Here: description without observations.
• Take a node containing the variable of interest as the root.
• Compute for each node i with bag X of T a table:
  – For each assignment cX of values to the variables in X, with Y the variables covered in the subtree with root i, compute vi(cX):

    vi(cX) = Σ_{c ∈ {T,F}^{X ∪ Y}, c extends cX} Π_{v covered in the subtree} Pr(cv | cparents(v))
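A brute-force sketch of computing one such table: for every assignment to the bag, sum over all extensions to the covered subtree variables. The bag, covered set, and CPT values below are illustrative, not taken from the slides:

```python
from itertools import product

parents = {"x1": [], "x3": ["x1"], "x5": ["x3"]}
cpt = {
    "x1": {(): 0.7},
    "x3": {(True,): 0.5, (False,): 0.5},  # assumed
    "x5": {(True,): 0.6, (False,): 0.2},
}

def pr_term(v, c):
    """The CPT factor Pr(c_v | c_parents(v)) for one variable."""
    pt = cpt[v][tuple(c[q] for q in parents[v])]
    return pt if c[v] else 1.0 - pt

def node_table(bag, covered, subtree_vars):
    """Table v_i over assignments to `bag`; `covered` are the variables whose
    CPT factor is charged to this subtree; `subtree_vars` is Y union X."""
    table = {}
    free = [v for v in subtree_vars if v not in bag]
    for bvals in product([True, False], repeat=len(bag)):
        cx = dict(zip(bag, bvals))
        total = 0.0
        for fvals in product([True, False], repeat=len(free)):
            c = dict(cx, **dict(zip(free, fvals)))  # one extension of cX
            t = 1.0
            for v in covered:
                t *= pr_term(v, c)
            total += t
        table[bvals] = total
    return table
```

In the real algorithm this sum is never computed by enumerating the whole subtree; the bottom-up combination of child tables achieves the same result with work per node exponential only in the bag size.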
Computing tables bottom up
• A table for a node can be computed when the tables of the children are known
• E.g., compute tables in postorder (bottom up)
Example: a node i with two children j1, j2, where all three bags equal X
• vi(cX) = vj1(cX) · vj2(cX)
• This follows since vi(cX) = Σ_{c ∈ {T,F}^{X ∪ Y1 ∪ Y2}, c extends cX} Π_{v covered} Pr(cv | cparents(v)), with Y1 and Y2 the variables covered in the two subtrees; as the two subtrees share only X, the sum factors into the two child tables.
LS algorithm
• For the other types of nodes, a table can be computed similarly.
• The time for one table is linear in the size of the table: bag size k gives time O(2^k) for binary variables.
• Linear time when the bag size is bounded by a constant (bounded treewidth). This happens often in practice!
• The table of the root allows one to compute the distribution for the variables in the root bag.
• A similar scheme works when observations are given, and when the variables are discrete but not all binary.
• A scheme that also moves downwards in the tree computes the distribution for all variables: also linear time for bounded treewidth.
MAP problem
• Given: a probabilistic network and some observations
• Question: the most likely configuration given the observations
• Applications: most likely explanation, verification of the design of probabilistic networks
MAP is NP-hard (Shimony, 1994)
(x1 or x2 or ¬x4) and …
• Pr(xi) = 0.5, Pr(¬xi) = 0.5
• T: probability 1 when satisfied or Y is T; otherwise 1/2

[Figure: variables x1, …, x6 and clause nodes, as in the #P-hardness construction]
MAP with tree decompositions
• A similar algorithm as for inference solves MAP in linear time when a tree decomposition of the moralisation with bounded bag size (treewidth) is given
• Compute for cX:

  max_{c ∈ {T,F}^{X ∪ Y}, c extends cX} Π_{v ∈ Y} Pr(cv | cparents(v))
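The only change from inference is that the sum over extensions becomes a max. A brute-force sketch over an illustrative network, taking the max over full configurations consistent with the observations:

```python
from itertools import product

parents = {"x1": [], "x3": ["x1"], "x5": ["x3"]}
cpt = {
    "x1": {(): 0.7},
    "x3": {(True,): 0.5, (False,): 0.5},  # assumed
    "x5": {(True,): 0.6, (False,): 0.2},
}
variables = list(parents)

def pr_config(c):
    p = 1.0
    for v, ps in parents.items():
        pt = cpt[v][tuple(c[q] for q in ps)]
        p *= pt if c[v] else 1.0 - pt
    return p

def map_config(observations):
    """Most likely configuration consistent with the observations, by enumeration."""
    best, best_p = None, -1.0
    for vals in product([True, False], repeat=len(variables)):
        c = dict(zip(variables, vals))
        if any(c[o] != v for o, v in observations.items()):
            continue
        p = pr_config(c)
        if p > best_p:       # max instead of the sum used for inference
            best, best_p = c, p
    return best, best_p
```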
Fixed parameter variant of MAP
• MAP(p):
• Given: probabilistic network
• Question: is there a configuration with probability at least p?
• Can be solved in O(f(p) n) time, i.e., linear for fixed p.
• Joint work with van der Gaag and van den Eijkhof.
• A similar result holds when there are observations (values for some variables), and we look for a configuration consistent with the observations
Algorithm uses branch and bound
• Look at variables in order of a topological sort
• Recursive process:
– Branch for assignment of value to next variable
– Plus … bounding mechanism
[Figure: branch-and-bound tree: root "start here", branching on v1 = T / v1 = F, then on v2, then on v3, …]
Bounding
• Recall: Pr(c) = Π_{v ∈ V} Pr(cv | cparents(v))
• Parents of v come before v in the topological sort
• Compute for a node z in the branch-and-bound tree with assigned values cv1, …, cvi:

  P(z) = Π_{j = 1..i} Pr(cvj | cv1, …, cv(j-1))

• P(z) can be computed from P(parent(z)) and the choice for the i-th variable
• Bound when P(z) < p: this branch can never lead to a solution
Recursive scheme
• E-MPA-p(values for the first i variables, p, pz)
• If i = n (we have assigned all variables), then return true (output the sequence); stop.
• Else: for each possible value x for vi+1:
  – Compute pznew = pz · Pr(vi+1 = x | values for the first i variables)
  – If pznew ≥ p, then E-MPA-p(values for the first i variables followed by x, p, pznew)
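The recursive scheme can be sketched directly, assuming binary variables listed in topological order; the network and its CPT values are illustrative:

```python
parents = {"v1": [], "v2": ["v1"], "v3": ["v2"]}
cpt = {
    "v1": {(): 0.7},
    "v2": {(True,): 0.3, (False,): 0.4},  # Pr(v2 | v1) = 0.3 as on an earlier slide
    "v3": {(True,): 0.6, (False,): 0.2},  # assumed
}
order = ["v1", "v2", "v3"]  # topological order: parents first

def e_mpa(p, assigned=None, pz=1.0):
    """Return a configuration with probability >= p, or None if there is none."""
    assigned = assigned or {}
    i = len(assigned)
    if i == len(order):                    # all variables assigned: success
        return dict(assigned)
    v = order[i]
    p_true = cpt[v][tuple(assigned[q] for q in parents[v])]
    for value, pv in ((True, p_true), (False, 1.0 - p_true)):
        pz_new = pz * pv
        if pz_new >= p:                    # bound: prune when pz_new < p
            assigned[v] = value
            result = e_mpa(p, assigned, pz_new)
            del assigned[v]
            if result is not None:
                return result
    return None
```

For this network the most probable configuration has probability 0.392, so e_mpa(0.3) finds a configuration while e_mpa(0.5) correctly reports failure after pruning every branch.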
Time analysis
• If a node has at least two children in the tree, then:
  – For each child, pznew ≥ p, hence Pr(vi+1 = x | values for the first i variables) ≥ p
  – Hence, for the other child: pznew ≤ pz · (1 - Pr(vi+1 = x | values for the first i variables)) ≤ pz · (1 - p)
• After a node in the tree with two children, the value of pz is a factor of at least (1 - p) smaller
• So any root-to-leaf path has at most log p / log(1 - p) branching nodes, which bounds the number of leaves by a function of p. (How often can you multiply 1 by 1 - p before dropping below p?)
• The time is O(f(p) · n).
Partial MAP
• Variant of MAP where we ask for values to a subset of the variables with maximum probability, given some observations
• Park: NP^PP-complete, and NP-complete when G is a polytree (the underlying undirected graph is a tree)
Monotonicity
• Joint work with Linda van der Gaag and Ad Feelders
• Monotonicity is often a requested property of a probabilistic network
– E.g.: if a patient has more severe symptoms, one expects a more severe diagnosis
• Ordering on the values of variables
• cX c’X if for all x in X: cX(x) c’X(x)
• Two observations that are ordered should imply ordering of probabilities of values for variable of interest (formal definition follows).
Monotonicity in mode
• Let z be the output variable.
• The mode of z given values cX for some other variables X, written T(z | cX), is the value for z such that Pr(z | cX) is maximal (+ a tie-breaking rule).
• Take an ordering on the values of each variable.
• The probabilistic network with observable nodes X and output variable z is isotone when for each pair of value assignments cX, c'X to X, one has:
  – cX ≤ c'X implies T(z | cX) ≤ T(z | c'X)
• Antitone: cX ≤ c'X implies T(z | cX) ≥ T(z | c'X)
• Monotone: isotone or antitone
• Monotone in distribution: similar, but looking at the cumulative distribution. Identical to monotonicity in mode when all variables are binary.
Results
• Testing whether a network is monotone (isotone, antitone) in mode (in distribution) is:
  – coNP^PP-complete
  – coNP-complete for polytrees
Hardness proof (sketch)
• Transformation from a variant of the Partial MAP problem:
  – Can we set values for M such that Pr(E = T | cM) > p?
• Pr(A = T | E = T) = 1
• Pr(A = T | E = F) = (1/2 - p)/(1 - p)
• Pr(C = T | A, B) = 1 if A and B F, otherwise 0
• The proof shows that the new network is monotone in mode, and monotone in distribution, if and only if there is a cM with Pr(E = T | cM) > p

[Figure: G, an instance of Partial MAP, with node E; A and B with child C; M ∪ B is the set of observable variables, C the variable of interest]
Conclusions
• Probabilistic (belief, Bayesian) networks form a mathematically precise model
• Used in several decision support systems
• The use and design of networks pose interesting challenges, many of them algorithmic
• Sometimes special structures (tree decompositions) help, also in practice