CPSC 7373: Artificial Intelligence
Lecture 5: Probabilistic Inference
Jiang Bian, Fall 2012, University of Arkansas at Little Rock
Overview and Example
[Alarm network: B → A ← E; A → J, A → M]
The alarm (A) might go off because of either a burglary (B) and/or an earthquake (E). When the alarm (A) goes off, either John (J) and/or Mary (M) will call to report.
Possible questions:
• Given the evidence of either B or E, what is the probability that J or M will call?
Answer to this type of question:
• Posterior distribution: P(Q1, Q2, … | E1=e1, E2=e2)
• It is the probability distribution of one or more query variables given the values of the evidence variables.
Overview and Example
[Same alarm network as above]
Possible questions:
• Out of all the possible values for all the query variables, which combination of values has the highest probability?
Answer to these questions:
• argmax_q P(Q1=q1, Q2=q2, … | E1=e1, …)
• Which Q values are most probable given the evidence values?
Overview and Example
[Same alarm network]
Imagine the situation where Mary has called to report that the alarm is going off, and we want to know whether or not there has been a burglary. For each of the nodes, is it an evidence node, a hidden node, or a query node?
Evidence: M; Query: B; Hidden: E, A, J
Inference through enumeration
[Same alarm network]
P(+b | +j, +m) = ?
Imagine the situation where both John and Mary have called to report that the alarm is going off, and we want to know the probability of a burglary.
Definition (conditional probability):
P(Q|E) = P(Q, E) / P(E)
P(+b | +j, +m) = P(+b, +j, +m) / P(+j, +m)

First, compute the numerator P(+b, +j, +m).
Inference through enumeration
B    P(B)
+b   0.001
¬b   0.999
E    P(E)
+e   0.002
¬e   0.998
A J P(J|A)
+a +j 0.9
+a ¬j 0.1
¬a +j 0.05
¬a ¬j 0.95
A M P(M|A)
+a +m 0.7
+a ¬m 0.3
¬a +m 0.01
¬a ¬m 0.99
B   E   A   P(A|B,E)
+b  +e  +a  0.95
+b  +e  ¬a  0.05
+b  ¬e  +a  0.94
+b  ¬e  ¬a  0.06
¬b  +e  +a  0.29
¬b  +e  ¬a  0.71
¬b  ¬e  +a  0.001
¬b  ¬e  ¬a  0.999
Quiz: given +e and +a, what is the value of the term P(+b) P(+e) P(+a|+b,+e) P(+j|+a) P(+m|+a)?
Inference through enumeration
e, a     P(+b)  P(e)   P(a|+b,e)  P(+j|a)  P(+m|a)  product
+e, +a   0.001  0.002  0.95       0.9      0.7      0.000001197
+e, ¬a   0.001  0.002  0.05       0.05     0.01     5e-11
¬e, +a   0.001  0.998  0.94       0.9      0.7      0.0005910156
¬e, ¬a   0.001  0.998  0.06       0.05     0.01     2.994e-8
sum: 0.0005922426
P(+b, +j, +m) ≈ 0.00059224
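To make the table concrete, here is a minimal Python sketch of this enumeration, with the CPTs above encoded as plain dicts (the dict-based encoding is just one convenient choice, not part of the lecture):

```python
# CPT values taken from the tables above.
P_b = {True: 0.001, False: 0.999}
P_e = {True: 0.002, False: 0.998}
P_a = {  # P(a | b, e), keyed by (b, e, a)
    (True, True, True): 0.95, (True, True, False): 0.05,
    (True, False, True): 0.94, (True, False, False): 0.06,
    (False, True, True): 0.29, (False, True, False): 0.71,
    (False, False, True): 0.001, (False, False, False): 0.999,
}
P_j = {True: 0.9, False: 0.05}   # P(+j | a)
P_m = {True: 0.7, False: 0.01}   # P(+m | a)

# P(+b, +j, +m): sum over the hidden variables E and A.
total = 0.0
for e in (True, False):
    for a in (True, False):
        total += P_b[True] * P_e[e] * P_a[(True, e, a)] * P_j[a] * P_m[a]
print(total)  # ~0.00059224
```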
Inference through enumeration
e, a, b     P(b)   P(e)   P(a|b,e)  P(+j|a)  P(+m|a)  product
+e, +a, +b  0.001  0.002  0.95      0.9      0.7      0.000001197
+e, ¬a, +b  0.001  0.002  0.05      0.05     0.01     5e-11
¬e, +a, +b  0.001  0.998  0.94      0.9      0.7      0.0005910156
¬e, ¬a, +b  0.001  0.998  0.06      0.05     0.01     2.994e-8
+e, +a, ¬b  0.999  0.002  0.29      0.9      0.7      0.0003650346
+e, ¬a, ¬b  0.999  0.002  0.71      0.05     0.01     7.0929e-7
¬e, +a, ¬b  0.999  0.998  0.001     0.9      0.7      0.0006281113
¬e, ¬a, ¬b  0.999  0.998  0.999     0.05     0.01     0.0004980025
sum: 0.0020841
P(+j, +m) ≈ 0.0020841
Inference through enumeration
P(+b | +j, +m) = P(+b, +j, +m) / P(+j, +m)
= 0.00059224 / 0.0020841
≈ 0.284
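Continuing the sketch above (it reuses the P_b, P_e, P_a, P_j, P_m dicts), the full posterior also enumerates b for the denominator and then normalizes:

```python
def joint_jm(b):
    """P(b, +j, +m): sum over the hidden variables E and A."""
    return sum(
        P_b[b] * P_e[e] * P_a[(b, e, a)] * P_j[a] * P_m[a]
        for e in (True, False)
        for a in (True, False)
    )

p_bjm = joint_jm(True)              # ~0.00059224
p_jm = p_bjm + joint_jm(False)      # ~0.0020841
print(p_bjm / p_jm)                 # ~0.284
```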
Enumeration
• We assumed binary events/Boolean variables.
• With only 5 variables, the full joint table already has 2^5 = 32 rows.
• Practically, what if we have a large network?
Example: car diagnosis. Initial evidence: the engine won't start.
Testable variables (thin ovals), diagnosis variables (thick ovals).
Hidden variables (shaded) ensure sparse structure and reduce parameters.
Example: car insurance. Predict claim costs (medical, liability, property) given the data on the application form (the other unshaded nodes).
If all variables were Boolean: 2^27 rows in the joint table. NOT Boolean in reality.
Speed Up Enumeration

P(+b, +j, +m) = Σ_e Σ_a P(+b) P(e) P(a|+b,e) P(+j|a) P(+m|a)

Pulling out terms that do not depend on the summation variables:

P(+b, +j, +m) = P(+b) Σ_e P(e) Σ_a P(a|+b,e) P(+j|a) P(+m|a)
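In the dict-based sketch from earlier, pulling out terms corresponds to hoisting the constant factors out of the inner loops:

```python
# Same sum as before, but P(+b) is multiplied once and P(e) once per
# outer iteration, instead of inside every inner iteration.
total = P_b[True] * sum(
    P_e[e] * sum(P_a[(True, e, a)] * P_j[a] * P_m[a] for a in (True, False))
    for e in (True, False)
)
print(total)  # same value, ~0.00059224, with fewer multiplications
```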
Speed up enumeration
• Maximize independence
– The structure of the Bayes network determines how efficiently we can calculate the probability values.
A linear chain X1 → X2 → … → Xn needs only O(n) parameters, whereas an ordering that makes each node depend on all previous nodes needs O(2^n).
Bayesian networks: definition
• A simple, graphical notation for conditional independence assertions, and hence for compact specification of full joint distributions.
• Syntax:
– a set of nodes, one per variable
– a directed, acyclic graph (link = "directly influences")
– a conditional distribution for each node given its parents: P(Xi | Parents(Xi))
• In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over Xi for each combination of parent values.
Constructing Bayesian Networks
• Dependent or Independent?– P(J|M) = P(J)?
Suppose we choose the ordering M, J, A, B, E
[Diagram: M and J added, then A]
P(A|J,M) = P(A|J)?P(A|J,M) = P(A)?
[Diagram: B added]
P(B|A, J, M) = P(B|A)?P(B|A, J, M) = P(B)?
[Diagram: E added]
P(E|B, A, J, M) = P(E|A)?P(E|B, A, J, M) = P(E|A, B)?
[Resulting network: M → J; M, J → A; A → B; A, B → E]
• Deciding conditional independence is hard in non-causal directions.
• (Causal models and conditional independence seem hardwired for humans!)
• Assessing conditional probabilities is hard in non-causal directions.
• The network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed, versus 10 for the causal ordering.
Variable Elimination
• Variable elimination: carry out summations right-to-left, storing intermediate results (factors) to avoid re-computation
P(B | j, m)
= α P(B) Σ_e P(e) Σ_a P(a|B,e) P(j|a) P(m|a)
= α P(B) Σ_e P(e) Σ_a P(a|B,e) f_J(a) f_M(a)
= α P(B) Σ_e P(e) f_AJM(B, e)    (sum out A)
= α P(B) f_EAJM(B)    (sum out E)
Variable Elimination
• Variable elimination:
– Summing out a variable from a product of factors:
• move any constant factors outside the summation
• add up submatrices in the pointwise product of the remaining factors
– Exact inference remains NP-hard in general, but variable elimination is usually much faster than enumeration.
Pointwise product of factors f1 and f2
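A sketch of these two factor operations in Python, using a deliberately simple representation: a factor is a dict from assignment tuples to numbers, plus a list of variable names. Both the representation and the function names are illustrative choices, not a standard API:

```python
def pointwise_product(f1, vars1, f2, vars2):
    """Join two factors on their shared variables."""
    shared = [v for v in vars1 if v in vars2]
    out_vars = vars1 + [v for v in vars2 if v not in vars1]
    out = {}
    for a1, p1 in f1.items():
        for a2, p2 in f2.items():
            d1, d2 = dict(zip(vars1, a1)), dict(zip(vars2, a2))
            if all(d1[v] == d2[v] for v in shared):  # rows must agree on shared vars
                d1.update(d2)
                out[tuple(d1[v] for v in out_vars)] = p1 * p2
    return out, out_vars

def sum_out(var, f, vars):
    """Marginalize one variable out of a factor."""
    i = vars.index(var)
    out_vars = vars[:i] + vars[i + 1:]
    out = {}
    for a, p in f.items():
        key = a[:i] + a[i + 1:]
        out[key] = out.get(key, 0.0) + p  # add up rows that collapse together
    return out, out_vars
```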
Variable Elimination: R → T → L

P(R):
+r 0.1
¬r 0.9

P(T|R):
+r +t 0.8
+r ¬t 0.2
¬r +t 0.1
¬r ¬t 0.9

P(L|T):
+t +l 0.3
+t ¬l 0.7
¬t +l 0.1
¬t ¬l 0.9

1) Joining factors: P(R, T) = P(R) · P(T|R)
+r +t 0.08
+r ¬t 0.02
¬r +t 0.09
¬r ¬t 0.81
2) Marginalize over R, to give a table of just the variable T: P(R, T) → P(T)
+t 0.17
¬t 0.83
3) Join P(T) with P(L|T) to get the joint probability P(T, L):
+t +l 0.051
+t ¬l 0.119
¬t +l 0.083
¬t ¬l 0.747
4) Sum out T: P(T, L) → P(L)
+l 0.134
¬l 0.866
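Using the two factor operations from the sketch above, the whole R → T → L elimination is four calls (factor contents are the tables from this example):

```python
P_R = ({(True,): 0.1, (False,): 0.9}, ['R'])
P_T_R = ({(True, True): 0.8, (True, False): 0.2,
          (False, True): 0.1, (False, False): 0.9}, ['R', 'T'])
P_L_T = ({(True, True): 0.3, (True, False): 0.7,
          (False, True): 0.1, (False, False): 0.9}, ['T', 'L'])

f, v = pointwise_product(*P_R, *P_T_R)   # 1) join: P(R, T)
f, v = sum_out('R', f, v)                # 2) marginalize: P(T) = <0.17, 0.83>
f, v = pointwise_product(f, v, *P_L_T)   # 3) join: P(T, L)
f, v = sum_out('T', f, v)                # 4) marginalize: P(L)
print(f)  # {(True,): 0.134, (False,): 0.866}
```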
Choice of ordering is important!
Approximate Inference: Sampling
• Example: estimate the joint probability of heads/tails for a 1-cent coin and a 5-cent coin by sampling.
• Advantages:
– Computationally easier.
– Works even without the CPTs.
1 cent 5 cent
H H
H T
T H
T T
Sampling Example

[Network: C → S, C → R; S, R → W]

Cloudy: P(C)
+c 0.5
¬c 0.5

Sprinkler: P(S|C)
+c +s 0.1
+c ¬s 0.9
¬c +s 0.5
¬c ¬s 0.5

Rain: P(R|C)
+c +r 0.8
+c ¬r 0.2
¬c +r 0.2
¬c ¬r 0.8

WetGrass: P(W|S,R)
+s +r +w 0.99
+s +r ¬w 0.01
+s ¬r +w 0.90
+s ¬r ¬w 0.10
¬s +r +w 0.90
¬s +r ¬w 0.10
¬s ¬r +w 0.01
¬s ¬r ¬w 0.99
Samples: +c, ¬s, +r
• Sampling is consistent if we want to compute the full joint probability of the network, or the probabilities of individual variables.
• What about conditional probabilities, e.g. P(w | ¬c)?
• Rejection sampling: we need to reject the samples that do not match the evidence we are interested in.
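A sketch of prior sampling plus rejection for estimating P(+w | ¬c) in this network; the function names are illustrative, and the CPT numbers are the ones in the tables above:

```python
import random

def prior_sample():
    """Sample (c, s, r, w) in topological order from the CPTs above."""
    c = random.random() < 0.5
    s = random.random() < (0.1 if c else 0.5)
    r = random.random() < (0.8 if c else 0.2)
    w = random.random() < {(True, True): 0.99, (True, False): 0.90,
                           (False, True): 0.90, (False, False): 0.01}[(s, r)]
    return c, s, r, w

def rejection_estimate(n=100_000):
    """Estimate P(+w | ¬c): reject every sample where C = +c."""
    kept = hits = 0
    for _ in range(n):
        c, s, r, w = prior_sample()
        if c:            # sample does not match the evidence ¬c: reject it
            continue
        kept += 1
        hits += w
    return hits / kept

print(rejection_estimate())  # note: roughly half the samples are rejected here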
Rejection sampling
• Too many rejected samples make it inefficient.
– Likelihood weighting: fix the evidence variables instead of sampling them; the unweighted estimate is inconsistent, but weighting each sample restores consistency.
Likelihood weighting

[Same network and CPTs as the Sampling Example above]
P(R | +s, +w)
Weighting a sample: sample C = +c; fix the evidence S = +s (weight 0.1); sample R = +r; fix the evidence W = +w (weight 0.99).
Weight: 0.1 × 0.99 for the sample ⟨+c, +s, +r, +w⟩
P(C|+s, +r) ??
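A sketch of likelihood weighting for the query P(R | +s, +w) worked above; again the function names are illustrative:

```python
import random

def weighted_sample():
    """Sample the non-evidence variables; weight by the evidence CPT rows."""
    w = 1.0
    c = random.random() < 0.5
    w *= 0.1 if c else 0.5                 # evidence S = +s: weight by P(+s | c)
    r = random.random() < (0.8 if c else 0.2)
    w *= 0.99 if r else 0.90               # evidence W = +w: weight by P(+w | +s, r)
    return r, w

def lw_estimate(n=100_000):
    num = den = 0.0
    for _ in range(n):
        r, w = weighted_sample()
        den += w
        num += w * r
    return num / den   # estimate of P(+r | +s, +w)

print(lw_estimate())
```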
Gibbs Sampling
• Markov Chain Monte Carlo (MCMC)
– Sample one variable at a time, conditioning on the current values of all the others.
+s, +c, ¬r, ¬w → ¬s, +c, ¬r, ¬w → ¬s, +c, +r, ¬w
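A sketch of a single Gibbs step for this network: resampling R conditioned on the current values of the other variables. Only R's Markov blanket matters (its parent C, its child W, and W's other parent S), so P(r | c, s, w) ∝ P(r | c) · P(w | s, r):

```python
import random

def P_w(s, r):
    """P(+w | s, r) from the WetGrass CPT above."""
    return {(True, True): 0.99, (True, False): 0.90,
            (False, True): 0.90, (False, False): 0.01}[(s, r)]

def resample_r(c, s, w):
    """Draw R from P(r | c, s, w), proportional to P(r | c) * P(w | s, r)."""
    scores = {}
    for r in (True, False):
        p_r = (0.8 if c else 0.2) if r else (0.2 if c else 0.8)  # P(r | c)
        p_w = P_w(s, r) if w else 1 - P_w(s, r)                  # P(w | s, r)
        scores[r] = p_r * p_w
    z = scores[True] + scores[False]
    return random.random() < scores[True] / z
```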
Monty Hall Problem• Suppose you're on a game show, and you're given the choice of three doors:
Behind one door is a car; behind the others, goats. You pick a door, say No. 2 [but the door is not opened], and the host, who knows what's behind the doors, opens another door, say No. 1, which has a goat. He then says to you, "Do you want to pick door No. 3?" Is it to your advantage to switch your choice?
P(C=3 | S=2) = 1/3
P(C=3 | H=1, S=2) = 2/3
Why?
Monty Hall Problem
• P(C=3 | H=1, S=2)
= P(H=1 | C=3, S=2) P(C=3 | S=2) / Σ_i P(H=1 | C=i, S=2) P(C=i | S=2)
= (1 × 1/3) / (0 × 1/3 + 1/2 × 1/3 + 1 × 1/3) = 2/3
• P(C=1 | S=2) = P(C=2 | S=2) = P(C=3 | S=2) = 1/3
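As a quick sanity check, a small Python sketch of this Bayes-rule computation. The host model p_host encodes the standard Monty Hall assumptions (the host never opens the selected door or the car door, and chooses uniformly when both remaining doors hide goats):

```python
def p_host(h, c, s=2):
    """P(H = h | C = c, S = s): host opens a goat door, never ours."""
    if h == s or h == c:
        return 0.0
    return 0.5 if c == s else 1.0   # two goat doors to choose from if we picked the car

prior = {c: 1 / 3 for c in (1, 2, 3)}                   # P(C = c | S = 2)
post = {c: p_host(1, c) * prior[c] for c in (1, 2, 3)}  # unnormalized posterior
z = sum(post.values())
print({c: p / z for c, p in post.items()})  # {1: 0.0, 2: 1/3, 3: 2/3} -> switch!
```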