Bayesian Belief Network
• The decomposition of large probabilistic domains into weakly connected subsets via conditional independence is one of the most important developments in the recent history of AI.
• This can work well, even if the assumption is not true!
P(toothache, catch, cavity, Weather = cloudy) = P(Weather = cloudy) P(toothache, catch, cavity)

P(a ∧ b) = P(a) P(b)
Naive Bayes assumption:
P(a1, …, an | vj) = ∏_i P(ai | vj)
which gives
vNB = argmax_{vj} P(vj) ∏_i P(ai | vj)
Bayesian networks
Conditional Independence
Inference in Bayesian Networks
Irrelevant variables
Constructing Bayesian Networks
Learning Bayesian Networks (Aprendizagem Redes Bayesianas)
Examples - Exercises
Naive Bayes assumption of conditional independence too restrictive
But it's intractable without some such assumptions...
Bayesian Belief networks describe conditional independence among subsets of variables
allows combining prior knowledge about (in)dependencies among variables with observed training data
Bayesian networks
A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions
Syntax:
• a set of nodes, one per variable
• a directed, acyclic graph (link ≈ "directly influences")
• a conditional distribution for each node given its parents:
P (Xi | Parents (Xi))
In the simplest case, conditional distribution represented as a conditional probability table (CPT) giving the distribution over Xi for each combination of parent values
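To make the CPT idea concrete, here is a minimal Python sketch (the dictionary encoding and the prob helper are illustrative choices, not from the slides); it stores P(X = true | parents) for a Boolean variable with two Boolean parents, using the Alarm numbers that appear later in the burglary example.

```python
# Minimal sketch: a CPT for a Boolean variable with two Boolean parents.
# Keys are (parent1, parent2) value combinations; values are P(X=true | parents).
# Numbers are the Alarm CPT from the burglary example later in these slides.
cpt_alarm = {
    (True, True): 0.95,    # P(A=true | B=true,  E=true)
    (True, False): 0.94,   # P(A=true | B=true,  E=false)
    (False, True): 0.29,   # P(A=true | B=false, E=true)
    (False, False): 0.001, # P(A=true | B=false, E=false)
}

def prob(x, parent_values):
    """Return P(X=x | parents); the false case is just 1 - p."""
    p_true = cpt_alarm[parent_values]
    return p_true if x else 1.0 - p_true

print(prob(True, (True, False)))   # 0.94
print(prob(False, (True, False)))  # ~0.06
```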
Bayesian Networks
A Bayesian belief network allows subsets of the variables to be conditionally independent
A graphical model of causal relationships
• Represents dependency among the variables
• Gives a specification of joint probability distribution
[Figure: a small DAG over nodes X, Y, Z, P]
• Nodes: random variables
• Links: dependency
• X, Y are the parents of Z, and Y is the parent of P
• No dependency between Z and P
• Has no loops or cycles
Conditional Independence
Once we know that the patient has a cavity, we do not expect the probability of the probe catching to depend on the presence of toothache
P(catch | cavity ∧ toothache) = P(catch | cavity)
P(toothache | cavity ∧ catch) = P(toothache | cavity)

Independence between a and b:
P(a | b) = P(a)
P(b | a) = P(b)
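These equalities can be checked numerically. In the sketch below, the four toothache entries of the joint table are the ones used later in these slides; the four no-toothache entries are assumed from the standard textbook version of this example and are marked as such.

```python
# Sketch: check P(catch | toothache, cavity) == P(catch | cavity) numerically.
joint = {
    # (toothache, catch, cavity): probability
    (True,  True,  True):  0.108,
    (True,  False, True):  0.012,
    (True,  True,  False): 0.016,
    (True,  False, False): 0.064,
    (False, True,  True):  0.072,  # assumed from the textbook table
    (False, False, True):  0.008,  # assumed
    (False, True,  False): 0.144,  # assumed
    (False, False, False): 0.576,  # assumed
}

def p(pred):
    """Sum the joint entries whose (toothache, catch, cavity) values satisfy pred."""
    return sum(v for k, v in joint.items() if pred(*k))

lhs = p(lambda t, c, cav: c and t and cav) / p(lambda t, c, cav: t and cav)
rhs = p(lambda t, c, cav: c and cav) / p(lambda t, c, cav: cav)
print(round(lhs, 3), round(rhs, 3))  # 0.9 0.9 -- equal, as asserted
```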
Example
Topology of network encodes conditional independence assertions:
• Weather is independent of the other variables
• Toothache and Catch are conditionally independent given Cavity
Bayesian Belief Network: An Example

[Figure: network over FamilyHistory, Smoker, LungCancer, Emphysema, PositiveXRay, Dyspnea]

CPT for LungCancer (LC):

       (FH, S)  (FH, ~S)  (~FH, S)  (~FH, ~S)
LC       0.8      0.5       0.7       0.1
~LC      0.2      0.5       0.3       0.9
Bayesian Belief Networks
The conditional probability table for the variable LungCancer shows the conditional probability for each possible combination of its parents
P(z1, …, zn) = ∏_{i=1}^{n} P(zi | Parents(Zi))
Example
I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar?
Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls
Network topology reflects "causal" knowledge:
• A burglar can set the alarm off
• An earthquake can set the alarm off
• The alarm can cause Mary to call
• The alarm can cause John to call
Belief Networks
Burglary: P(B) = 0.001
Earthquake: P(E) = 0.002

Alarm:
B  E  P(A)
t  t  .95
t  f  .94
f  t  .29
f  f  .001

JohnCalls:
A  P(J)
t  .90
f  .05

MaryCalls:
A  P(M)
t  .70
f  .01
Full Joint Distribution
P(x1, …, xn) = ∏_{i=1}^{n} P(xi | Parents(Xi))

P(j ∧ m ∧ a ∧ ¬b ∧ ¬e)
= P(j | a) P(m | a) P(a | ¬b ∧ ¬e) P(¬b) P(¬e)
= 0.9 × 0.7 × 0.001 × 0.999 × 0.998 ≈ 0.00062
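As a quick arithmetic check, the chain of CPT lookups above can be multiplied out directly (a throwaway sketch; all numbers are the CPT entries of the burglary network):

```python
# P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) as a product of CPT entries from the burglary net
p_j_given_a = 0.90
p_m_given_a = 0.70
p_a_given_not_b_not_e = 0.001
p_not_b = 1 - 0.001  # 0.999
p_not_e = 1 - 0.002  # 0.998

p = p_j_given_a * p_m_given_a * p_a_given_not_b_not_e * p_not_b * p_not_e
print(p)  # ~0.000628, which the slides round to 0.00062
```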
Compactness
A CPT for Boolean Xi with k Boolean parents has 2^k rows for the combinations of parent values
Each row requires one number p for Xi = true (the number for Xi = false is just 1 − p)
If each variable has no more than k parents, the complete network requires O(n · 2^k) numbers
I.e., grows linearly with n, vs. O(2^n) for the full joint distribution
For the burglary net, 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 − 1 = 31)
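The parameter count can be spelled out in a few lines (a small sketch; the parent counts are read off the burglary network):

```python
# Independent CPT entries: one number per row, 2^k rows for k Boolean parents
num_parents = {"Burglary": 0, "Earthquake": 0, "Alarm": 2,
               "JohnCalls": 1, "MaryCalls": 1}

network_numbers = sum(2 ** k for k in num_parents.values())
full_joint_numbers = 2 ** len(num_parents) - 1

print(network_numbers)     # 10  (1 + 1 + 4 + 2 + 2)
print(full_joint_numbers)  # 31  (2^5 - 1)
```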
Inference in Bayesian Networks
How can one infer the (probabilities of) values of one or more network variables, given observed values of others?
• The Bayes net contains all information needed for this inference
• If only one variable has an unknown value, it is easy to infer
• In the general case, the problem is NP-hard
Example
In the burglary network, we might observe the event in which JohnCalls = true and MaryCalls = true
We could ask for the probability that a burglary has occurred:
P(Burglary | JohnCalls = true, MaryCalls = true)
Remember - Joint distribution
[Figure: full joint distribution for Toothache, Cavity, Catch]

P(cavity | toothache) = P(cavity ∧ toothache) / P(toothache)
= (0.108 + 0.012) / (0.108 + 0.012 + 0.016 + 0.064) = 0.6

P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
= (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064) = 0.4
Normalization
P(Y | X) = α P(X | Y) P(Y)
1 = α [P(y | x) + P(¬y | x)]
⟨P(y | x), P(¬y | x)⟩ = α × ⟨0.12, 0.08⟩ = ⟨0.6, 0.4⟩
Normalization
• X is the query variable
• E: evidence variables
• Y: remaining unobservable variables
• Summation over all possible y (all possible values of the unobservable variables Y)
P(Cavity | toothache) = α P(Cavity, toothache)
= α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
= α [⟨0.108, 0.016⟩ + ⟨0.012, 0.064⟩] = α ⟨0.12, 0.08⟩ = ⟨0.6, 0.4⟩

P(X | e) = α P(X, e) = α Σ_y P(X, e, y)
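In code, the normalization step is just rescaling the summed-out values so they add up to 1 (a throwaway sketch using the four joint entries from the slide above):

```python
# Sketch: P(Cavity | toothache) by summing out catch and normalizing
p_cavity = 0.108 + 0.012      # sum over catch for Cavity = true
p_not_cavity = 0.016 + 0.064  # sum over catch for Cavity = false

alpha = 1.0 / (p_cavity + p_not_cavity)
print(round(alpha * p_cavity, 3), round(alpha * p_not_cavity, 3))  # 0.6 0.4
```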
P(Burglary | JohnCalls = true, MaryCalls = true)
• The hidden variables of the query are Earthquake and Alarm
• For Burglary = true in the Bayesian network:
P(B | j, m) = α P(B, j, m) = α Σ_e Σ_a P(B, e, a, j, m)

P(b | j, m) = α Σ_e Σ_a P(b) P(e) P(a | b, e) P(j | a) P(m | a)
To compute this we had to add four terms, each computed by multiplying five numbers
In the worst case, where we have to sum out almost all variables, the complexity of a network with n Boolean variables is O(n · 2^n)
P(b) is constant and can be moved outside the summations, and the P(e) term can be moved outside the summation over a
Given JohnCalls = true and MaryCalls = true, the probability that a burglary has occurred is about 28%
P(b | j, m) = α P(b) Σ_e P(e) Σ_a P(a | b, e) P(j | a) P(m | a)

P(B | j, m) = α ⟨0.00059224, 0.0014919⟩ ≈ ⟨0.284, 0.716⟩
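Putting the query together end to end, here is a minimal inference-by-enumeration sketch (the dictionary encoding and helper function are my own; the network structure and CPT numbers are the ones in these slides):

```python
# Sketch: P(Burglary | j, m) by enumerating the hidden variables E and A
P_B = {True: 0.001, False: 0.999}          # P(Burglary)
P_E = {True: 0.002, False: 0.998}          # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(A=true | B, E)
P_J = {True: 0.90, False: 0.05}            # P(JohnCalls=true | A)
P_M = {True: 0.70, False: 0.01}            # P(MaryCalls=true | A)

def unnormalized(b):
    """Alpha-free P(b, j, m): sum over the hidden variables E and A."""
    total = 0.0
    for e in (True, False):
        for a in (True, False):
            p_a = P_A[(b, e)] if a else 1.0 - P_A[(b, e)]
            total += P_B[b] * P_E[e] * p_a * P_J[a] * P_M[a]
    return total

scores = {b: unnormalized(b) for b in (True, False)}
alpha = 1.0 / sum(scores.values())
print({b: round(alpha * s, 3) for b, s in scores.items()})
# {True: 0.284, False: 0.716}
```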
Computation for Burglary=true
Variable elimination algorithm
• Eliminate repeated calculation
• Dynamic programming
Irrelevant variables
• (X: query variable, E: evidence variables)
• Every variable that is not an ancestor of a query variable or an evidence variable is irrelevant to the query
Complexity of exact inference
The burglary network belongs to a family of networks in which there is at most one undirected path between two nodes in the network
These are called singly connected networks or polytrees
The time and space complexity of exact inference in polytrees is linear in the size of the network
Size is defined by the number of CPT entries
If the number of parents of each node is bounded by a constant, then the complexity will also be linear in the number of nodes
For multiply connected networks, variable elimination can have exponential time and space complexity
Constructing Bayesian Networks
A Bayesian network is a correct representation of the domain only if each node is conditionally independent of its predecessors in the ordering, given its parents
P(MaryCalls | JohnCalls, Alarm, Earthquake, Burglary) = P(MaryCalls | Alarm)
Conditional Independence relations in Bayesian networks
The topological semantics is given by either of the specifications of DESCENDANTS or MARKOV BLANKET

Local semantics: each node is conditionally independent of its nondescendants given its parents
Example
JohnCalls is independent of Burglary and Earthquake given the value of Alarm
Example
Burglary is independent of JohnCalls and MaryCalls given Alarm and Earthquake
Constructing Bayesian networks
1. Choose an ordering of variables X1, …, Xn
2. For i = 1 to n:
   add Xi to the network
   select parents from X1, …, Xi−1 such that P(Xi | Parents(Xi)) = P(Xi | X1, …, Xi−1)

This choice of parents guarantees:
P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | X1, …, Xi−1)   (chain rule)
= ∏_{i=1}^{n} P(Xi | Parents(Xi))   (by construction)
The compactness of Bayesian networks is an example of locally structured systems
• Each subcomponent interacts directly with only a bounded number of other components
Constructing Bayesian networks is difficult
• Each variable should be directly influenced by only a few others
• The network topology reflects these direct influences
Example
Suppose we choose the ordering M, J, A, B, E
P(J | M) = P(J)? No
P(A | J, M) = P(A | J)? No
P(A | J, M) = P(A)? No
P(B | A, J, M) = P(B | A)? Yes
P(B | A, J, M) = P(B)? No
P(E | B, A, J, M) = P(E | A)? No
P(E | B, A, J, M) = P(E | A, B)? Yes
Example contd.
Deciding conditional independence is hard in noncausal directions
(Causal models and conditional independence seem hardwired for humans!)
Network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed
Some links represent tenuous relationships that require difficult and unnatural probability judgments, such as the probability of Earthquake given Burglary and Alarm
Learning Bayesian Networks (Aprendizagem Redes Bayesianas)
How to fill in the entries of a Conditional Probability Table
Case 1: If the structure of the Bayesian network is known, and all variables can be observed in the training set.
Then: entry (i, j) = P(yi | Predecessors(Yi)), estimated using the values observed in the training set
Case 2: If the structure of the Bayesian network is known, and some of the variables cannot be observed in the training set.
Then the gradient ascent method is used
Example: Case 1

Person  FH   S    E    LC   PXRay  D
P1      Yes  Yes  No   Yes  +      Yes
P2      Yes  No   No   Yes  -      Yes
P3      Yes  No   Yes  No   +      No
P4      No   Yes  Yes  Yes  -      Yes
P5      No   Yes  No   No   +      No
P6      Yes  Yes  ?    ?    ?      ?

CPT for LungCancer (LC):

       (FH, S)  (FH, ~S)  (~FH, S)  (~FH, ~S)
LC       0.5      …         …         …
~LC      …        …         …         …

P(LC = Yes | FH = Yes, S = Yes) = 0.5
Entry (i, j) = P(yi | Predecessors(Yi))

[Figure: network over FamilyHistory, Smoker, LungCancer, Emphysema]
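A minimal sketch of the Case 1 counting estimate (the row encoding and the estimate helper are illustrative, not from the slides). For the (FH = Yes, S = No) entry, for instance, persons P2 and P3 match and one of the two has LC = Yes, giving 0.5:

```python
# Case 1 sketch: estimate P(LC = Yes | FH, S) by counting training rows.
# Rows (FH, S, LC) taken from the fully observed persons P1-P5 above.
rows = [
    ("Yes", "Yes", "Yes"),  # P1
    ("Yes", "No",  "Yes"),  # P2
    ("Yes", "No",  "No"),   # P3
    ("No",  "Yes", "Yes"),  # P4
    ("No",  "Yes", "No"),   # P5
]

def estimate(fh, s):
    """Relative frequency of LC = Yes among rows matching (fh, s)."""
    matching = [lc for f, sm, lc in rows if f == fh and sm == s]
    return sum(lc == "Yes" for lc in matching) / len(matching)

print(estimate("Yes", "No"))  # 0.5 -- one Yes (P2) out of two matching rows
```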
Example: Case 2
Suppose structure known, variables partially observable
Similar to training a neural network with hidden units
In fact, can learn network conditional probability tables using gradient ascent
Person  FH   S    E    LC   PXRay  D
P1      ---  Yes  ---  Yes  +      Yes
P2      ---  No   ---  Yes  -      Yes
P3      ---  No   ---  No   +      No
P4      ---  Yes  ---  Yes  -      Yes
P5      ---  Yes  ---  No   +      No
P6      Yes  Yes  ?    ?    ?      ?
Summary
Bayesian networks provide a natural representation for (causally induced) conditional independence
Topology + CPTs = compact representation of joint distribution
Generally easy for domain experts to construct
Examples - Exercises

(Network: a and b are roots, c has parents a and b, d has parents a and c, as implied by the factorization below.)

P(b | a, c, d) = α P(a) P(b) P(c | a, b) P(d | a, c)
P(B | a, c, d) = α ⟨0.05, 0.075⟩ = ⟨0.4, 0.6⟩
→ P(b | a, c, d) = 0.6

P(d | a, b, c) = α P(a) P(b) P(c | a, b) P(d | a, c)
P(D | a, b, c) = α ⟨0.0825, 0.0425⟩ = ⟨0.66, 0.34⟩
→ P(d | a, b, c) = P(d | a, c) = 0.66
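As a quick check of the normalization in both answers (a throwaway sketch over the unnormalized values above):

```python
# Normalize the two unnormalized value pairs from the exercise answers
for unnorm in ([0.05, 0.075], [0.0825, 0.0425]):
    alpha = 1.0 / sum(unnorm)
    print([round(alpha * v, 2) for v in unnorm])
# [0.4, 0.6]
# [0.66, 0.34]
```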