Upload
chace
View
32
Download
0
Tags:
Embed Size (px)
DESCRIPTION
Axioms and Algorithms for Inferences Involving Probabilistic Independence. Dan Geiger, Azaria Paz, and Judea Pearl, Information and Computation 91(1), March 1991, 128-141. Presentation by Guy Moses & Omer Weissbrod - PowerPoint PPT Presentation
Citation preview
Axioms and AlgorithmsAxioms and Algorithmsfor Inferencesfor Inferences InvolvingInvolving
Probabilistic IndependenceProbabilistic Independence
Dan Geiger, Azaria Paz, and Judea Pearl, Information and Computation 91(1), March 1991,
128-141.
Presentation by Guy Moses & Omer Weissbrod
for the course 236372 - Bayesian NetworksComputer Science Faculty, Technion – winter 2009
partially based on the presentation by Ilan Gronau
2
What’s ahead?
IntroductionIntroduction - some definitions, notations and reminders.
Proof of CompletenessProof of Completeness. - “if it’s true – it can be proved”.
Preparations for the Membership AlgorithmPreparations for the Membership Algorithm – more definitions, and some theoretical groundwork.
The Membership AlgorithmThe Membership Algorithm – description, proof of correctness, complexity analysis.
3
DefinitionsDefinitions• U (Universe) – set of random variables with probability
distribution P.
• X,Y – finite sets of random variables:
X = x1,…,xn, Y = y1,…,ym
• P(X,Y) = P(X)·P(Y) - a short-hand notation for the equality:
Pr{x1=a1,…, xn=an, y1=b1, …, ym=bm} =
Pr{x1=a1,…, xn=an} · Pr{y1=b1, …, ym=bm}
for every choice of a1, …, an, b1, …, bm
• (X,Y) – short-hand for P(X,Y) = P(X)·P(Y)
This is called an independence statement.
*note that X,Y are disjoint sets of variables (XY = ).
4
NotationsNotations - a specific independence statement of the form
(X,Y)
- a set of independence statements of the form
(X,Y): = 1, … , k
• XY - short-hand notation for the union X Y
• P satisfies = (X,Y) means: P(X,Y) = P(X)·P(Y) for that specific P.
5
Soundness and CompletenessSoundness and CompletenessDefinitions:
iff every distribution that satisfies also satisfies .
iff cl(),i.e. there exists a derivation chain 1,…,n=
s.t. for each j, either j or j is derived by an axiom from
the previous statements.
For a set of axioms A:
Soundness: A is sound iff for every and :
Completeness: A is complete iff for every and :
Completeness - Alternative definition:
A is complete iff for every and every cl() there exists a
distribution P that satisfies cl)( and does not satisfy .
6
Independence AxiomsIndependence Axioms
We saw (in 1st lecture) that axioms 1a-1d are sound (always infer correctly).
Today we’ll show they are complete (can derive every true statement).
1a Trivial Independence:
1b Symmetry:
1c Decomposition
( , )
( , ) ( , )
( , ) ( , )
( , ) (
:
1d M , ) ( ,i ing: )x
X
X Y Y X
X YZ X Y
X Y XY Z X YZ
7
What’s ahead?
IntroductionIntroduction - some definitions, notations and reminders.
Proof of CompletenessProof of Completeness. - “if it’s true – it can be proved”.
Preparations for the Membership AlgorithmPreparations for the Membership Algorithm – more definitions, and some theoretical groundwork.
The Membership AlgorithmThe Membership Algorithm – description, proof of correctness, complexity analysis.
8
Minimal StatementMinimal Statement• Definition: =(X,Y) cl() is minimal if for every non-empty X’,Y’
s.t. X’X, Y’ Y, X’Y’XY we have (X’,Y’) cl().
• For every =(X,Y) cl() we can find an appropriate minimal
’=(X’,Y’)cl() through iterative decomposition.
• Observation:
P satisfies P satisfies ’ (decomposition soundness),
Therefore:
P doesn’t satisfy ’ P doesn’t satisfy .
• Our plan: Given an arbitrary cl(), We will find a distribution P that
satisfies cl() but doesn’t satisfy ’. This will prove completeness (using the alternative completeness definition and the observation above).
• To simplify annotation, we will assume WLOG that =(X,Y) is already minimal.
9
Let =(X,Y) cl() be a minimal statement where:
X={x1,…,xn}, Y={y1,…,ym}, and Z={z1,z2,…,zk} stand for the rest
of the variables in U.
We will construct P as follows: All variables, except x1, are fair coins
(probability for each of their two values)
x1 is defined thus:
Part 1: P does not satisfy
We will inspect the following scenario: x1=1, all other variables are 0.
P(x1, … , xn, y1, … , ym) P(x1, … , xn)·P(y1, … , ym)
Therefore, P does not satisfy , as required.
Completeness ProofCompleteness Proof
12 1
mod2n m
i ji j
x x y
=0 =0.5n =0.5m
10
Completeness Proof – cont’dCompleteness Proof – cont’dPart 2: P satisfies cl()
Let (V,W) cl(). We will show that P(V,W)=P(V)·P(W). This is done by inspecting different scenarios:
Scenario 1: either V or W contains only elements of Z. We will assume WLOG that W contains only elements of Z.
all variables in Z are independent under P and therefore:
WWVV
X
ZY
Y
Y
YY
YX
X
X
Z
Z
Z
Z
Z
Z
Z
Z
ZZ
ZZ
Z
i
iz W
P VW P V P z P V P W
11
Completeness Proof – cont’dCompleteness Proof – cont’dPart 2: P satisfies cl()
Let (V,W) cl(). We will show that P(V,W)=P(V)·P(W). This is done by inspecting different scenarios:
Scenario 2: Both V and W contain elements of X Y,but V W doesn’t contain all elements of X Y.
Without full information about the assignments of the variables in X Y, x1 could turn out to be 0 or 1 with probability , and therefore:
0.5 0.5
0.5 0.5
V W V W
V W
P VW
P V P W
WW
VVX
ZY
Y
Y
YY
YX
X
X
Z
Z
Z
Z
Z
Z
Z
Z
ZZ
ZZ
Z
12
WWX
ZY
Y
Y
YY
YX
X
X
Z
Z
Z
Z
Z
Z
Z
Z
ZZ
ZZ
Z
VV
Completeness Proof – cont’dCompleteness Proof – cont’dPart 2: P satisfies cl() - continued
Scenario 3: Both V and W contain elements of X Y, and (X Y)(V W).
We will show a derivation chain for =(X,Y), contradicting our original assumption that cl():
Mark: (V,W)=(XVYVZV, XWYWZW)cl()
where: Y=YVYW, X=XVXW, ZVZWZ, V=XVYVZV, W=XWYWZW
Remove all z’s by decomposition: (XVYV, XWYW)cl()
Due to minimality of =(X,Y): (XV,YV)cl() and (XW,Y)cl()
(XV,YV)(XVYV, XWYW) (XV,YV XWYW) = (XV,XWY)
(XW,Y) (XWY, XV) (Y, XVXW) = (Y,X) =
mix
mix
13
Completeness Proof – SummaryCompleteness Proof – SummaryReminder: Completeness - Alternative definition:
A is complete iff for every and every cl() there exists a
distribution P that satisfies cl)( and does not satisfy .
We’ve shown: given a minimal cl(), there exists a
distribution P that obeys:
1. P does not satisfy .
2. P satisfies .
Given a non-minimal cl(), we will derive its minimal statement ’, and devise a distribution P’ that satisfies but does not satisfy ’. Due to soundness of decomposition, P’ cannot satisfy as well.
14
Scope of CompletenessScope of CompletenessThe proof uses P- a binary p.d. (probability distribution
function) therefore:all p.d.’s over
Udiscrete p.d.’s
binary p.d.’s
normalp.d.’s
however,
for normal p.d.’s, the axiom set a1-d1 is not complete.
a stronger axiom is required:
replace:
with:
P P P( , ) ( , ) ( ,1d Mixin : )g I X Y I X Y Z I X Y Z
P P P1d' Composition: ( , ) ( , ) ( , )I X Y I X Z I X Y Z
•P
15
What’s ahead?
IntroductionIntroduction - some definitions, notations and reminders.
Proof of CompletenessProof of Completeness. - “if it’s true – it can be proved”.
Preparations for the Membership AlgorithmPreparations for the Membership Algorithm – more definitions, and some theoretical groundwork.
The Membership AlgorithmThe Membership Algorithm – description, proof of correctness, complexity analysis.
16
Some more Definitions and ToolsSome more Definitions and Tools
Definition: Spanspan(): the set of elements represented in statement .
Example: span(x1x2,x3,x4) = {x1,x2,x3,x4}
span(): the set of elements represented in all statements of .
Example: span({(x1,x2),(x1,x3)}) = {x1,x2,x3}
17
Some more Definitions and ToolsSome more Definitions and Tools
Definition: ProjectionThe projection of on X, denoted (X), is the statement
derived from by removing all elements not in X from .
Example:
if =(x1x2x3, x4x5) and X={x2,x3,x4} then
(X)=(x2x3, x4).
The projection of on X, denoted (X), is {(X) | }.
18
Some more Definitions and ToolsSome more Definitions and Tools
Projection Lemma: iff ‘ , where ’ = (span())
) if ' then clearly because all the statements in ‘ can be derived from the statements in by decomposition.
19
Some more Definitions and ToolsSome more Definitions and Tools
Projection Lemma: iff ’ , where ’ = (span()), s = span()
) if then there is a derivation chain for : 1, 2, … , k.
For each j:
if k j, k<j, (by symmetry or decomposition)
then k(s) j(s) by symmetry or decomposition respectively.
Similarly,
if j is derived from k and l by mixing,
then j(s) is derived from k(s), l(s) by mixing.
20
Some more Definitions and ToolsSome more Definitions and Tools
Projection Lemma: iff ’ , where ’ = (span()), s = span()
Observations from projection lemma:• Variables not in are unnecessary for determining whether
.
• The problem of verifying whether can be simplified to the problem of verifying whether ' , where '= (span()).
• This problem can be solved with a possibly reduced time and space complexity.
21
Conditions for Conditions for Inference of IndependenceInference of Independence
Maim claim: for a given , we have ’ iff:
1. is trivial: =(X,) (up to symmetry)
OR
2. is in ’: ’ (up to symmetry)
OR
3. is derivable from ’:
there exists ’’ s.t. span() = span(’)
and for ’=(AP,BQ) =(AQ,BP) (A,B,Q,P may be empty)
’ (A,P), ’ (B,Q) (up to symmetry)
22
Proof of Main ClaimProof of Main ClaimMaim claim: for a given , we have ’ iff:
1. is trivial*: =(X,) *up to symmetry
2. is in* ’: ’
3. is derivable* from ’: ’’ s.t. span() = span(’)
and for ’=(AP,BQ) =(AQ,BP) : ’ (A,P), ’ (B,Q)
) if 1. is trivial*
OR 2. is in* ’. than the proof is immediate.
otherwise,
3. there exists ’’ s.t. span() = span(’)
and for ’=(AP,BQ) =(AQ,BP) : ’ (A,P), ’ (B,Q)
we will show a constructive proof under these conditions
23
Proof of Main ClaimProof of Main ClaimMaim claim: for a given , we have ’ iff:
1. is trivial*: =(X,) *up to symmetry
2. is in* ’: ’
3. is derivable* from ’: ’’ s.t. span() = span(’)
and for ’=(AP,BQ) =(AQ,BP) : ’ (A,P), ’ (B,Q)
) (contd.) given that ’ (AP,BQ), ’ (A,P), ’ (B,Q).
1. (A,P)(AP,BQ) (A,PBQ)
2. (B,Q)(AP,BQ) (APB,Q) (PB,Q)
3. (PB,Q)(A,PBQ) (AQ,PB) = (AQ, BP) =
We’ve proven this direction.
mix
mix
mix
dec.
24
Proof of Main ClaimProof of Main ClaimMaim claim: for a given , we have ’ iff:
1. is trivial*: =(X,) *up to symmetry
2. is in* ’: ’
3. is derivable* from ’: ’’ s.t. span() = span(’)
and for ’=(AP,BQ) =(AQ,BP) : ’ (A,P), ’ (B,Q)
) Given ’ , if 1. is trivial* OR 2. is in* ’,
than the proof is immediate.
Otherwise, since no axiom can add new variables to a statement, there must exist ’’ s.t. span() =
span(’) in the derivation chain of .
also: = (AQ,BP) (A,P)
= (AQ,BP) (Q,B)
dec.
dec.
25
Conclusions from ClaimConclusions from Claim• We’ve seen that, after discarding unneeded
variables,it is possible to tell whether ’ (when it’s not immediately obvious) by:
a. Finding another statement ’’ for whichspan() = span(’),
b. Verifying that ’ (A,P), ’ (B,Q)when ’=(AP,BQ) =(AQ,BP).
• This suggests using a recursive “divide and
conquer” approach.
26
What’s ahead?
IntroductionIntroduction - some definitions, notations and reminders.
Proof of CompletenessProof of Completeness. - “if it’s true – it can be proved”.
Preparations for the Membership AlgorithmPreparations for the Membership Algorithm – more definitions, and some theoretical groundwork.
The Membership AlgorithmThe Membership Algorithm – description, proof of correctness, complexity analysis.
27
The Membership Algorithm The Membership Algorithm Procedure Find(,):
1. set ’ := (span()).
2. if is trivial, or ’ (up to symmetry)then Find(,) := TRUE.
3. else if for all non-trivial ’’: span() span(’),
then Find(,) := FALSE.
4. else there exists ’’: span() = span(’),
and ’=(AP,BQ) =(AQ,BP),
set 1:= (A,P), 2:= (B,Q).
Find(,) := (Find(’,1) Find(’,2))
28
Algorithm Correctness ProofAlgorithm Correctness ProofWe will prove that Find(,) := TRUE cl() by
induction on k=.
Induction base: if k=1 then is trivial, therefore the algorithm will return TRUE in step 2 and cl().
29
Algorithm Correctness ProofAlgorithm Correctness ProofInduction assumption: Find(,) := TRUE cl()
for each ’<k.
Induction step: Find(,) := TRUE iff either:
1. Step 2 returns TRUE is trivial or ’ cl().
2. Step 4 returns TRUE
iff
Find(’,1) := TRUE Find(’,2) := TRUE
iff
1cl(’) 2cl(’)
iff
cl(’)
(according to algorithm’s definition)
(according to induction assumption)
(according to main claim)
(according to projection lemma)iff cl()
30
Complexity AnalysisComplexity AnalysisDefinitions:
n = the number of distinct variables in {}.
k = the number of distinct variables in {}.
• First projection cost: O(||·n) – happens only once.
• Recursive step: T)k) ||·k + T(k1) + T(k2)
where k1+k2=k, k1=|1|, k2=|2|
• Can be shown by induction: T)k) ||·k·(depth of recursion)
• Worst case analysis: T)k) ||·k·k= ||·k2
• Total run time is bounded by: O(||·n + ||·k2)which is also: O(||·n2) since k n.
31
Improvements and Variations Improvements and Variations • Instead of arbitrarily choosing ’, find one whose
sub-statements {A,B,P,Q} have balanced size (can improve run-time complexity).
• Using the derivation chain presented in the constructive proof, the algorithm can also return a derivation chain for with a length of O(k).
32
Variations (contd.) Variations (contd.) The algorithm can be expanded into a polynomial
algorithm for the following problems:
• Given two sets and , is cl() cl() ?is cl() = cl() ?
• Minimize the size of while preserving cl():
Start with a maximal-size statement and remove from all statements derivable from it.Repeat with the next largest statement etc.
Thank you!Thank you!