Bayesian Belief Network
• The decomposition of large probabilistic domains into weakly connected subsets via conditional independence is one of the most important developments in the recent history of AI
• This can work well, even when the independence assumption is not true!
P(a ∧ b) = P(a) P(b)

P(toothache, catch, cavity, Weather = cloudy)
  = P(Weather = cloudy) P(toothache, catch, cavity)
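As a quick illustration, a joint distribution over two binary variables factorizes as P(a)P(b) exactly when the variables are independent. A minimal sketch in Python, with made-up numbers chosen to be independent:

```python
# Hypothetical joint over two binary variables A, B; keys are (a, b).
joint = {(True, True): 0.06, (True, False): 0.14,
         (False, True): 0.24, (False, False): 0.56}

p_a = sum(v for (a, _), v in joint.items() if a)   # marginal P(a) = 0.2
p_b = sum(v for (_, b), v in joint.items() if b)   # marginal P(b) = 0.3
# The joint entry equals the product of the marginals: A and B are independent.
print(abs(joint[(True, True)] - p_a * p_b) < 1e-12)  # True
```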
Naive Bayes assumption:
P(a1, …, an | vj) = ∏i P(ai | vj)
which gives the Naive Bayes classifier:
vNB = argmax_{vj ∈ V} P(vj) ∏i P(ai | vj)
• Bayesian networks
• Conditional Independence
• Inference in Bayesian Networks
• Irrelevant variables
• Constructing Bayesian Networks
• Learning Bayesian Networks (Aprendizagem Redes Bayesianas)
• Examples - Exercises
Naive Bayes assumption of conditional independence is too restrictive
But it's intractable without some such assumptions...
Bayesian Belief networks describe conditional independence among subsets of variables
This allows combining prior knowledge about (in)dependencies among variables with observed training data
Bayesian networks
A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions

Syntax:
• a set of nodes, one per variable
• a directed, acyclic graph (link ≈ "directly influences")
• a conditional distribution for each node given its parents: P(Xi | Parents(Xi))

In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over Xi for each combination of parent values
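In code, such a network could be stored as a map from each node to its parents and its CPT. A sketch, with each CPT entry giving P(X = true | parent values); the numbers are consistent with the dentist example used later in these slides:

```python
# Each node lists its parents and a CPT mapping parent-value tuples
# to P(X = true | parents).
network = {
    "Cavity":    {"parents": (),          "cpt": {(): 0.2}},
    "Toothache": {"parents": ("Cavity",), "cpt": {(True,): 0.6, (False,): 0.1}},
    "Catch":     {"parents": ("Cavity",), "cpt": {(True,): 0.9, (False,): 0.2}},
}
```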
Bayesian Networks
• A Bayesian belief network allows a subset of the variables to be conditionally independent
• A graphical model of causal relationships
• Represents dependency among the variables
• Gives a specification of the joint probability distribution

[Figure: a small DAG with nodes X, Y, Z, P]
• Nodes: random variables
• Links: dependency
• X, Y are the parents of Z, and Y is the parent of P
• No dependency between Z and P
• Has no loops or cycles
Conditional Independence
Once we know that the patient has a cavity, we do not expect the probability of the probe catching to depend on the presence of toothache
P(catch | cavity, toothache) = P(catch | cavity)
P(toothache | cavity, catch) = P(toothache | cavity)

Independence between a and b:
P(a | b) = P(a)
P(b | a) = P(b)
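This can be checked numerically against the full joint of the dentist example (the four values quoted later in these slides, completed with the remaining entries from Russell & Norvig). A minimal sketch:

```python
# Joint P(Toothache, Catch, Cavity); keys are (toothache, catch, cavity).
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.016,
    (True,  False, True):  0.012, (True,  False, False): 0.064,
    (False, True,  True):  0.072, (False, True,  False): 0.144,
    (False, False, True):  0.008, (False, False, False): 0.576,
}

def p(event):
    """Probability of the set of worlds where `event` holds."""
    return sum(v for k, v in joint.items() if event(*k))

# P(catch | cavity, toothache) equals P(catch | cavity): both are 0.9.
lhs = p(lambda t, c, v: t and c and v) / p(lambda t, c, v: t and v)
rhs = p(lambda t, c, v: c and v) / p(lambda t, c, v: v)
print(lhs, rhs)
```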
Example
Topology of network encodes conditional independence assertions:
• Weather is independent of the other variables
• Toothache and Catch are conditionally independent given Cavity
Bayesian Belief Network: An Example

[Figure: network with nodes FamilyHistory, Smoker, LungCancer, Emphysema, PositiveXRay, Dyspnea]

The conditional probability table for the variable LungCancer shows the conditional probability for each possible combination of its parents:

        (FH, S)  (FH, ~S)  (~FH, S)  (~FH, ~S)
LC       0.8      0.5       0.7       0.1
~LC      0.2      0.5       0.3       0.9
Example
I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar?

Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls

Network topology reflects "causal" knowledge:
• A burglar can set the alarm off
• An earthquake can set the alarm off
• The alarm can cause Mary to call
• The alarm can cause John to call
Belief Networks
[Figure: Burglary and Earthquake are parents of Alarm; Alarm is the parent of JohnCalls and MaryCalls]

P(B) = 0.001    P(E) = 0.002

Alarm CPT:
B  E  P(A)
t  t  .95
t  f  .94
f  t  .29
f  f  .001

JohnCalls CPT:    MaryCalls CPT:
A  P(J)           A  P(M)
t  .90            t  .70
f  .05            f  .01
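These tables could be encoded directly, for instance as plain Python dicts (a sketch; each table stores P(X = true | parents)):

```python
# The burglary network's CPTs; the false case is just 1 - p.
P_B = 0.001                                        # P(Burglary)
P_E = 0.002                                        # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,    # P(Alarm | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}                    # P(JohnCalls | Alarm)
P_M = {True: 0.70, False: 0.01}                    # P(MaryCalls | Alarm)
```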
Full Joint Distribution
P(x1, …, xn) = ∏i=1..n P(xi | parents(Xi))

Example:
P(j ∧ m ∧ a ∧ ¬b ∧ ¬e)
  = P(j | a) P(m | a) P(a | ¬b, ¬e) P(¬b) P(¬e)
  = 0.9 × 0.7 × 0.001 × 0.999 × 0.998 ≈ 0.00062
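The same entry, computed directly from the CPT numbers (full precision is about 0.000628, which the slide rounds to 0.00062):

```python
# P(j|a) * P(m|a) * P(a|¬b,¬e) * P(¬b) * P(¬e)
p = 0.90 * 0.70 * 0.001 * 0.999 * 0.998
print(p)  # 0.000628...
```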
Compactness
• A CPT for Boolean Xi with k Boolean parents has 2^k rows for the combinations of parent values
• Each row requires one number p for Xi = true (the number for Xi = false is just 1 − p)
• If each variable has no more than k parents, the complete network requires O(n · 2^k) numbers
• I.e., grows linearly with n, vs. O(2^n) for the full joint distribution
• For the burglary net, 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 − 1 = 31)
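The count can be reproduced from the number of parents of each node (a small sketch):

```python
# One number per CPT row: 2**k rows for a node with k Boolean parents.
parents = {"B": 0, "E": 0, "A": 2, "J": 1, "M": 1}
print(sum(2 ** k for k in parents.values()))  # 10 numbers for the network
print(2 ** len(parents) - 1)                  # 31 for the full joint
```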
Inference in Bayesian Networks
• How can one infer the (probabilities of) values of one or more network variables, given observed values of others?
• The Bayes net contains all information needed for this inference
• If only one variable has an unknown value, it is easy to infer
• In the general case, the problem is NP-hard
Example
• In the burglary network, we might observe the event in which JohnCalls = true and MaryCalls = true
• We could ask for the probability that a burglary has occurred:
P(Burglary | JohnCalls = true, MaryCalls = true)
Remember - Joint distribution
P(cavity | toothache) = P(cavity ∧ toothache) / P(toothache)
  = (0.108 + 0.012) / (0.108 + 0.012 + 0.016 + 0.064) = 0.6
P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
  = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064) = 0.4
Normalization
α = 1 / (P(y | x) + P(¬y | x))
P(Y | X) = α P(X | Y) P(Y)
⟨P(y | x), P(¬y | x)⟩ = α ⟨0.12, 0.08⟩ = ⟨0.6, 0.4⟩
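The role of α is simply to rescale the unnormalized values so they sum to 1 (a minimal helper):

```python
def normalize(values):
    """Multiply by alpha = 1/sum so the distribution sums to 1."""
    alpha = 1.0 / sum(values)
    return [alpha * v for v in values]

print(normalize([0.12, 0.08]))  # [0.6, 0.4]
```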
Normalization
• X is the query variable
• E: the evidence variables
• Y: the remaining unobservable variables
• Summation over all possible y (all possible values of the unobservable variables Y)
P(X | e) = α P(X, e) = α Σy P(X, e, y)

P(Cavity | toothache) = α P(Cavity, toothache)
  = α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
  = α [⟨0.108, 0.016⟩ + ⟨0.012, 0.064⟩] = α ⟨0.12, 0.08⟩ = ⟨0.6, 0.4⟩
P(Burglary | JohnCalls = true, MaryCalls = true)
• The hidden variables of the query are Earthquake and Alarm
• For Burglary = true in the Bayesian network:

P(B | j, m) = α P(B, j, m) = α Σe Σa P(B, e, a, j, m)
P(b | j, m) = α Σe Σa P(b) P(e) P(a | b, e) P(j | a) P(m | a)
• To compute this we had to add four terms, each computed by multiplying five numbers
• In the worst case, where we have to sum out almost all variables, the complexity of a network with n Boolean variables is O(n · 2^n)
• P(b) is constant and can be moved out; the P(e) term can be moved outside the summation over a:

P(b | j, m) = α P(b) Σe P(e) Σa P(a | b, e) P(j | a) P(m | a)
P(B | j, m) = α ⟨0.00059224, 0.0014919⟩ ≈ ⟨0.284, 0.716⟩

• Given JohnCalls = true and MaryCalls = true, the probability that a burglary has occurred is about 28%
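The whole calculation fits in a few lines of Python. A sketch of inference by enumeration, hard-coding the burglary CPTs from earlier:

```python
P_B = {True: 0.001, False: 0.999}   # P(Burglary)
P_E = {True: 0.002, False: 0.998}   # P(Earthquake)
P_A_true = {(True, True): 0.95, (True, False): 0.94,
            (False, True): 0.29, (False, False): 0.001}

def p_a(a, b, e):                   # P(Alarm=a | B=b, E=e)
    return P_A_true[(b, e)] if a else 1 - P_A_true[(b, e)]

def p_j(a):                         # P(JohnCalls=true | Alarm=a)
    return 0.90 if a else 0.05

def p_m(a):                         # P(MaryCalls=true | Alarm=a)
    return 0.70 if a else 0.01

def unnormalized(b):                # P(B=b, j, m): sum out E and A
    return P_B[b] * sum(P_E[e] * p_a(a, b, e) * p_j(a) * p_m(a)
                        for e in (True, False) for a in (True, False))

vals = [unnormalized(True), unnormalized(False)]  # ~[0.00059224, 0.0014919]
alpha = 1.0 / sum(vals)
print([round(alpha * v, 3) for v in vals])        # [0.284, 0.716]
```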
Computation for Burglary = true
[Figure: evaluation tree for the expression above]

Variable elimination algorithm:
• Eliminates repeated calculation
• Dynamic programming
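For instance, in the burglary query the product P(j | a) P(m | a) does not depend on e, yet plain enumeration recomputes it inside the sum over e for every value of e. Variable elimination computes it once and stores it as a factor over A; a minimal sketch of just that idea, with the CPT numbers from the burglary network:

```python
p_j = {True: 0.90, False: 0.05}   # P(JohnCalls=true | Alarm)
p_m = {True: 0.70, False: 0.01}   # P(MaryCalls=true | Alarm)

# Computed once and reused for every value of e, instead of recomputed:
f_jm = {a: p_j[a] * p_m[a] for a in (True, False)}
print(f_jm)  # {True: 0.63, False: 0.0005}
```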
Irrelevant variables
• (X: query variable, E: evidence variables)
• Every variable that is not an ancestor of a query variable or an evidence variable is irrelevant to the query and can be removed
Complexity of exact inference
• The burglary network belongs to a family of networks in which there is at most one undirected path between any two nodes in the network
• These are called singly connected networks or polytrees
• The time and space complexity of exact inference in polytrees is linear in the size of the network
• Size is defined by the number of CPT entries
• If the number of parents of each node is bounded by a constant, then the complexity will also be linear in the number of nodes
• For multiply connected networks, variable elimination can have exponential time and space complexity
Constructing Bayesian Networks
• A Bayesian network is a correct representation of the domain only if each node is conditionally independent of its predecessors in the ordering, given its parents

P(MaryCalls | JohnCalls, Alarm, Earthquake, Burglary) = P(MaryCalls | Alarm)
Conditional Independence relations in Bayesian networks
• Local semantics
• The topological semantics is given by either of two specifications: DESCENDANTS (a node is conditionally independent of its non-descendants, given its parents) or MARKOV BLANKET (a node is conditionally independent of all other nodes, given its parents, children, and children's parents)
Example
JohnCalls is independent of Burglary and Earthquake given the value of Alarm
Example
Burglary is independent of JohnCalls and MaryCalls given Alarm and Earthquake
Constructing Bayesian networks
1. Choose an ordering of variables X1, …, Xn
2. For i = 1 to n:
   add Xi to the network
   select parents from X1, …, Xi−1 such that P(Xi | Parents(Xi)) = P(Xi | X1, …, Xi−1)

This choice of parents guarantees:
P(X1, …, Xn) = ∏i=1..n P(Xi | X1, …, Xi−1)   (chain rule)
             = ∏i=1..n P(Xi | Parents(Xi))    (by construction)
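For the burglary network, the causal ordering B, E, A, J, M makes each node's parents exactly those shown in the diagram, so the construction yields:

P(B, E, A, J, M) = P(B) P(E) P(A | B, E) P(J | A) P(M | A)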
• The compactness of Bayesian networks is an example of locally structured systems
• Each subcomponent interacts directly with only a bounded number of other components
• Constructing Bayesian networks is difficult
• Each variable should be directly influenced by only a few others
• The network topology reflects these direct influences
Example
Suppose we choose the ordering M, J, A, B, E
P(J | M) = P(J)? No
P(A | J, M) = P(A | J)? No
P(A | J, M) = P(A)? No
P(B | A, J, M) = P(B | A)? Yes
P(B | A, J, M) = P(B)? No
P(E | B, A, J, M) = P(E | A)? No
P(E | B, A, J, M) = P(E | A, B)? Yes
Example contd.
• Deciding conditional independence is hard in noncausal directions (Causal models and conditional independence seem hardwired for humans!)
• Network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed
• Some links represent tenuous relationships that require difficult and unnatural probability judgments, such as the probability of Earthquake given Burglary and Alarm
Learning Bayesian Networks (Aprendizagem Redes Bayesianas)
How to fill in the entries of a Conditional Probability Table:

Case 1: The structure of the Bayesian network is known, and all variables can be observed in the training set. Then: entry (i,j) = P(yi | Predecessors(Yi)), estimated using the values observed in the training set.

Case 2: The structure of the Bayesian network is known, and some of the variables cannot be observed in the training set. Then a gradient ascent method is used.
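Case 1 amounts to relative-frequency counting. A minimal sketch; the helper and its toy rows are hypothetical:

```python
def cpt_entry(rows, var, value, parents):
    """Estimate P(var=value | parents) by counting matching training rows."""
    match = [r for r in rows if all(r[p] == v for p, v in parents.items())]
    return sum(r[var] == value for r in match) / len(match) if match else None

rows = [{"FH": "Yes", "S": "Yes", "LC": "Yes"},   # toy training rows
        {"FH": "Yes", "S": "Yes", "LC": "No"}]
print(cpt_entry(rows, "LC", "Yes", {"FH": "Yes", "S": "Yes"}))  # 0.5
```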
Example, Case 1

Person  FH   S    E    LC   PXRay  D
P1      Yes  Yes  No   Yes  +      Yes
P2      Yes  No   No   Yes  -      Yes
P3      Yes  No   Yes  No   +      No
P4      No   Yes  Yes  Yes  -      Yes
P5      No   Yes  No   No   +      No
P6      Yes  Yes  ?    ?    ?      ?

[Figure: network with nodes FamilyHistory, Smoker, LungCancer, Emphysema]

CPT for LungCancer:
        (FH, S)  (FH, ~S)  (~FH, S)  (~FH, ~S)
LC       0.5      …         …         …
~LC      …        …         …         …

Each entry is P(yi | Predecessors(Yi)), estimated from the training set, e.g.:
P(LC = Yes | FH = Yes, S = Yes) = 0.5
Example, Case 2
• Suppose the structure is known, and variables are partially observable
• Similar to training a neural network with hidden units
• In fact, we can learn network conditional probability tables using gradient ascent

Person  FH   S    E    LC   PXRay  D
P1      ---  Yes  ---  Yes  +      Yes
P2      ---  No   ---  Yes  -      Yes
P3      ---  No   ---  No   +      No
P4      ---  Yes  ---  Yes  -      Yes
P5      ---  Yes  ---  No   +      No
P6      Yes  Yes  ?    ?    ?      ?
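The gradient-ascent rule alluded to here is, in Mitchell's notation, a sketch under the assumption that wijk = P(Yi = yij | Ui = uik) are the CPT entries: repeatedly update

wijk ← wijk + η Σ_{d∈D} P_h(Yi = yij, Ui = uik | d) / wijk

and then renormalize each conditional distribution so its entries sum to 1.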
Summary
• Bayesian networks provide a natural representation for (causally induced) conditional independence
• Topology + CPTs = compact representation of joint distribution
• Generally easy for domain experts to construct
Exercise:
P(b | a, c, d) = α P(a) P(b) P(c | a, b) P(d | a, c)
P(B | a, c, d) = α ⟨0.05, 0.075⟩ = ⟨0.4, 0.6⟩
→ P(¬b | a, c, d) = 0.6

P(d | a, b, c) = α P(a) P(b) P(c | a, b) P(d | a, c)
P(D | a, b, c) = α ⟨0.0825, 0.0425⟩ = ⟨0.66, 0.34⟩
→ P(d | a, b, c) = P(d | a, c) = 0.66
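The two normalizations can be checked directly:

```python
# alpha = 1 / (sum of the unnormalized values)
a1 = 1 / (0.05 + 0.075)
print(round(a1 * 0.05, 2), round(a1 * 0.075, 2))      # 0.4 0.6
a2 = 1 / (0.0825 + 0.0425)
print(round(a2 * 0.0825, 2), round(a2 * 0.0425, 2))   # 0.66 0.34
```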