Full Bayesian Network Classifiers
by Jiang Su and Harry Zhang
Flemming Jensen
November 2008
Purpose
To introduce the full Bayesian network classifier (FBC).
Introduction
Bayesian networks are often used for the classification problem, where a learner attempts to construct a classifier from a given set of labeled training examples.
Since the number of possible network structures is extremely large, structure learning often has high computational complexity.
The idea behind the full Bayesian network classifier is to reduce the computational complexity of structure learning by using a full Bayesian network as the structure, and representing variable independence in the conditional probability tables instead of in the network structure.
We use decision trees to represent the conditional probability tables to keep the compact representation of the joint distribution.
Variable Independence
Definition - Conditional independence
Let X, Y, Z be subsets of the variable set W. The subsets X and Y are conditionally independent given Z if:
P(X | Y, Z) = P(X | Z)
Definition - Contextual independence
Let X, Y, Z, T be disjoint subsets of the variable set W. The subsets X and Y are contextually independent given Z and the context t if:
P(X | Y, Z, t) = P(X | Z, t)
Existence
Theorem - Existence
For any BN B, there exists an FBC FB such that B and FB encode the same variable independencies.
Proof:
Since B is an acyclic graph, the nodes of B can be sorted on the basis of a topological ordering.
Go through each node X in the topological ordering, and add arcs from X to all the nodes ranked after X.
The resulting network FB is a full BN.
Build a CPT-tree for each node X in FB, such that any variable that is not in the parent set ΠX of X in B does not occur in the CPT-tree of X in FB.
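The arc-adding construction in the proof can be sketched in code. This is a minimal sketch under an assumed representation (a BN as a dict mapping each node to its set of parents); it covers only the topological-ordering and arc-adding steps, not the CPT-trees.

```python
from graphlib import TopologicalSorter

def full_bn_order(parents):
    """Return a topological order of the BN given as {node: set_of_parents}."""
    return list(TopologicalSorter(parents).static_order())

def to_full_bn(parents):
    """Add arcs so that every node has arcs to all nodes ranked after it:
    in the resulting full BN, each node's parents are all earlier nodes."""
    order = full_bn_order(parents)
    return {x: set(order[:i]) for i, x in enumerate(order)}
```

For a chain B with arcs A → B → D, the full BN keeps the original arcs and adds the arc A → D.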
Example - FBC for Naive Bayes
Example of a naive Bayes: the class variable C is the parent of the feature variables X1, X2, X3, X4.
Example of an FBC for the naive Bayes: arcs are added among X1, X2, X3, X4 to form a full BN, while each Xi keeps a CPT-tree that splits only on C (with leaf probabilities pi1, ..., pi4), so the encoded independencies are unchanged.
[Figures: the naive Bayes structure and the corresponding FBC with its CPT-trees.]
Learning Full Bayesian Network Classifiers
Learning an FBC consists of two parts:
Construction of a full BN.
Learning of decision trees to represent the CPT of each variable.
The full BN is implemented using a Bayesian multinet.
Definition - Bayesian multinet
A Bayesian multinet is a set of Bayesian networks, each of which corresponds to a value c of the class variable C.
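Since each network in the multinet is learned from one class's data, the first step is to split the training set by class value. A minimal sketch (the tuple layout with the class label first is an assumption, not from the paper):

```python
from collections import defaultdict

def partition_by_class(instances, class_index=0):
    """Split labeled instances into {class value: [instances with that label]}."""
    subsets = defaultdict(list)
    for row in instances:
        subsets[row[class_index]].append(row)
    return dict(subsets)
```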
Structure Learning
Learning the structure of a full BN actually means learning an order of the variables and then adding arcs from a variable to all the variables ranked after it.
A variable is ranked based on its total influence on other variables.
The influence (dependency) between two variables can be measured by mutual information.
Definition - Mutual information
Let X and Y be two variables in a Bayesian network. The mutual information is defined as:
M(X; Y) = Σ_{x∈X, y∈Y} P(x, y) log( P(x, y) / (P(x) P(y)) )
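The definition can be evaluated directly from a joint distribution table. A sketch, assuming a dict-of-pairs representation and a base-10 logarithm (the base that reproduces the value 0.027 in the worked example later in the slides):

```python
import math

def mutual_information(joint):
    """M(X;Y) for a joint distribution given as {(x, y): P(x, y)}."""
    px, py = {}, {}
    for (x, y), p in joint.items():          # accumulate the two marginals
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log10(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)
```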
Structure Learning
It is possible that the dependency between two variables, measured by mutual information, is caused merely by noise.
Results by Friedman are used as a dependency threshold to filter out unreliable dependencies.
Definition - Dependency threshold
Let Xi and Xj be two variables in a Bayesian network, and let N be the number of training instances. The dependency threshold, denoted by φ, is defined as:
φ(Xi, Xj) = (log N / 2N) × Tij, where Tij = |Xi| × |Xj|.
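The threshold is a one-line computation; a sketch using a base-10 logarithm, which reproduces the value 0.013 in the worked example below:

```python
import math

def dependency_threshold(n_instances, card_i, card_j):
    """phi(Xi, Xj) = (log N / 2N) * |Xi| * |Xj|."""
    return math.log10(n_instances) / (2 * n_instances) * (card_i * card_j)
```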
Structure Learning
The total influence of a variable on other variables can now be defined:
Definition - Total influence
Let Xi be a variable in a Bayesian network. The total influence of Xi on other variables, denoted by W(Xi), is defined as:
W(Xi) = Σ_{j ≠ i, M(Xi; Xj) > φ(Xi, Xj)} M(Xi; Xj)
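A sketch of the thresholded sum, assuming the pairwise mutual information and thresholds are precomputed and stored in dicts keyed by the unordered pair (a frozenset — a representation choice, not from the paper):

```python
def total_influence(i, variables, mi, phi):
    """W(Xi): sum of M(Xi; Xj) over j != i where the MI exceeds the threshold.
    mi and phi map frozenset({Xi, Xj}) to M(Xi; Xj) and phi(Xi, Xj)."""
    return sum(mi[frozenset((i, j))] for j in variables
               if j != i and mi[frozenset((i, j))] > phi[frozenset((i, j))])
```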
Structure Learning Algorithm
Algorithm FBC-Structure(S, X)
1. B = empty.
2. Partition the training data S into |C| subsets Sc by the class value c.
3. For each training data set Sc:
   - Compute the mutual information M(Xi; Xj) and the dependency threshold φ(Xi, Xj) between each pair of variables Xi and Xj.
   - Compute W(Xi) for each variable Xi.
   - For all variables Xi in X:
     - Add all the variables Xj with W(Xj) > W(Xi) to the parent set ΠXi of Xi.
     - Add arcs from all the variables Xj in ΠXi to Xi.
   - Add the resulting network Bc to B.
4. Return B.
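The ranking-and-arc-adding core of step 3 can be sketched as follows, assuming the total influence values W are precomputed per class (the MI and threshold computations are omitted here):

```python
def fbc_structure(class_values, variables, W):
    """Sketch of the ordering step of FBC-Structure.
    W[c][x] is the (assumed precomputed) total influence of x in subset Sc.
    Variables are ranked by W; each one gets all higher-ranked ones as parents."""
    multinet = {}
    for c in class_values:
        order = sorted(variables, key=lambda x: W[c][x], reverse=True)
        multinet[c] = {x: set(order[:k]) for k, x in enumerate(order)}
    return multinet
```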
Example - Structure Learning Algorithm
Example using 1000 labeled instances, where C is the class variable and A, B, and D are feature variables.
C A B D #
c1 a1 b1 d1 11
c1 a1 b1 d2 5
c1 a1 b2 d1 7
c1 a1 b2 d2 17
c1 a2 b1 d1 227
c1 a2 b1 d2 97
c1 a2 b2 d1 11
c1 a2 b2 d2 25
C A B D #
c2 a1 b1 d1 36
c2 a1 b1 d2 36
c2 a1 b2 d1 259
c2 a1 b2 d2 29
c2 a2 b1 d1 96
c2 a2 b1 d2 96
c2 a2 b2 d1 43
c2 a2 b2 d2 5
Example - Structure Learning Algorithm
The 400 data instances where C = c1 give the joint distribution P(A, B), marginalizing out D:

P(A, B):
      b1                      b2
a1    (11+5)/400 = 0.04       (7+17)/400 = 0.06
a2    (227+97)/400 = 0.81     (11+25)/400 = 0.09
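This estimation step (summing counts over D and dividing by the subset size) can be sketched directly; the (a, b, d, count) tuple layout is an assumption for illustration:

```python
def estimate_joint(rows, n):
    """Estimate P(A, B) by marginalizing D out of per-(a, b, d) counts.
    rows: iterable of (a, b, d, count) tuples from one class subset of size n."""
    joint = {}
    for a, b, _d, cnt in rows:
        joint[(a, b)] = joint.get((a, b), 0.0) + cnt / n
    return joint
```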
Example - Structure Learning Algorithm

P(A, B):
      b1      b2
a1    0.04    0.06
a2    0.81    0.09

P(A)P(B):
      b1                                b2
a1    (0.04+0.06)·(0.04+0.81) = 0.085   (0.04+0.06)·(0.06+0.09) = 0.015
a2    (0.81+0.09)·(0.04+0.81) = 0.765   (0.81+0.09)·(0.06+0.09) = 0.135

M(X; Y) = Σ_{x∈X, y∈Y} P(x, y) log( P(x, y) / (P(x) P(y)) )

M(A; B) = 0.04·log(0.04/0.085) + 0.81·log(0.81/0.765)
        + 0.06·log(0.06/0.015) + 0.09·log(0.09/0.135) = 0.027
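The arithmetic above can be checked directly (using a base-10 logarithm, which matches the stated result):

```python
import math

# Joint P(A, B) from the example, and its row/column marginals.
joint = {("a1", "b1"): 0.04, ("a1", "b2"): 0.06,
         ("a2", "b1"): 0.81, ("a2", "b2"): 0.09}
pa = {"a1": 0.10, "a2": 0.90}   # row sums of the joint
pb = {"b1": 0.85, "b2": 0.15}   # column sums of the joint

m_ab = sum(p * math.log10(p / (pa[a] * pb[b])) for (a, b), p in joint.items())
print(round(m_ab, 3))  # → 0.027
```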
Example - Structure Learning Algorithm
Mutual informationM(A; B) = 0.027
M(A; D) = 0.004M(B; D) = 0.018
Dependency thresholdφ(Xi ,Xj) = logN
2N × Tij
φ(A,B) = φ(A,D) = φ(B,D) = 4log400800 = 0.013
Total influence
W (Xi ) =
M(Xi ;Xj )>φ(Xi ,Xj )∑j(j 6=i)
M(Xi ; Xj)
indent indent indentW (A) = M(A; B) = 0.027indent indent indentW (B) = M(A; B) +M(B; D) = 0.045indent indent indentW (D) = M(B; D) = 0.018
Example - Structure Learning Algorithm
Mutual informationM(A; B) = 0.027M(A; D) = 0.004
M(B; D) = 0.018
Dependency thresholdφ(Xi ,Xj) = logN
2N × Tij
φ(A,B) = φ(A,D) = φ(B,D) = 4log400800 = 0.013
Total influence
W (Xi ) =
M(Xi ;Xj )>φ(Xi ,Xj )∑j(j 6=i)
M(Xi ; Xj)
indent indent indentW (A) = M(A; B) = 0.027indent indent indentW (B) = M(A; B) +M(B; D) = 0.045indent indent indentW (D) = M(B; D) = 0.018
Example - Structure Learning Algorithm
Mutual informationM(A; B) = 0.027M(A; D) = 0.004M(B; D) = 0.018
Dependency thresholdφ(Xi ,Xj) = logN
2N × Tij
φ(A,B) = φ(A,D) = φ(B,D) = 4log400800 = 0.013
Total influence
W (Xi ) =
M(Xi ;Xj )>φ(Xi ,Xj )∑j(j 6=i)
M(Xi ; Xj)
indent indent indentW (A) = M(A; B) = 0.027indent indent indentW (B) = M(A; B) +M(B; D) = 0.045indent indent indentW (D) = M(B; D) = 0.018
Example - Structure Learning Algorithm
Mutual informationM(A; B) = 0.027M(A; D) = 0.004M(B; D) = 0.018
Dependency thresholdφ(Xi ,Xj) = logN
2N × Tij
φ(A,B) = φ(A,D) = φ(B,D) = 4log400800 = 0.013
Total influence
W (Xi ) =
M(Xi ;Xj )>φ(Xi ,Xj )∑j(j 6=i)
M(Xi ; Xj)
indent indent indentW (A) = M(A; B) = 0.027indent indent indentW (B) = M(A; B) +M(B; D) = 0.045indent indent indentW (D) = M(B; D) = 0.018
Example - Structure Learning Algorithm
Mutual informationM(A; B) = 0.027M(A; D) = 0.004M(B; D) = 0.018
Dependency thresholdφ(Xi ,Xj) = logN
2N × Tij
φ(A,B) = φ(A,D) = φ(B,D)
= 4log400800 = 0.013
Total influence
W (Xi ) =
M(Xi ;Xj )>φ(Xi ,Xj )∑j(j 6=i)
M(Xi ; Xj)
indent indent indentW (A) = M(A; B) = 0.027indent indent indentW (B) = M(A; B) +M(B; D) = 0.045indent indent indentW (D) = M(B; D) = 0.018
Example - Structure Learning Algorithm
Mutual informationM(A; B) = 0.027M(A; D) = 0.004M(B; D) = 0.018
Dependency thresholdφ(Xi ,Xj) = logN
2N × Tij
φ(A,B) = φ(A,D) = φ(B,D) = 4log400800
= 0.013
Total influence
W (Xi ) =
M(Xi ;Xj )>φ(Xi ,Xj )∑j(j 6=i)
M(Xi ; Xj)
indent indent indentW (A) = M(A; B) = 0.027indent indent indentW (B) = M(A; B) +M(B; D) = 0.045indent indent indentW (D) = M(B; D) = 0.018
Example - Structure Learning Algorithm
Mutual informationM(A; B) = 0.027M(A; D) = 0.004M(B; D) = 0.018
Dependency thresholdφ(Xi ,Xj) = logN
2N × Tij
φ(A,B) = φ(A,D) = φ(B,D) = 4log400800 = 0.013
Total influence
W (Xi ) =
M(Xi ;Xj )>φ(Xi ,Xj )∑j(j 6=i)
M(Xi ; Xj)
indent indent indentW (A) = M(A; B) = 0.027indent indent indentW (B) = M(A; B) +M(B; D) = 0.045indent indent indentW (D) = M(B; D) = 0.018
Example - Structure Learning Algorithm
Mutual informationM(A; B) = 0.027M(A; D) = 0.004M(B; D) = 0.018
Dependency thresholdφ(Xi ,Xj) = logN
2N × Tij
φ(A,B) = φ(A,D) = φ(B,D) = 4log400800 = 0.013
Total influence
W (Xi ) =
M(Xi ;Xj )>φ(Xi ,Xj )∑j(j 6=i)
M(Xi ; Xj)
indent indent indentW (A) = M(A; B) = 0.027indent indent indentW (B) = M(A; B) +M(B; D) = 0.045indent indent indentW (D) = M(B; D) = 0.018
Example - Structure Learning Algorithm
Mutual informationM(A; B) = 0.027M(A; D) = 0.004M(B; D) = 0.018
Dependency thresholdφ(Xi ,Xj) = logN
2N × Tij
φ(A,B) = φ(A,D) = φ(B,D) = 4log400800 = 0.013
Total influence
W (Xi ) =
M(Xi ;Xj )>φ(Xi ,Xj )∑j(j 6=i)
M(Xi ; Xj)
indent indent indentW (A)
= M(A; B) = 0.027indent indent indentW (B) = M(A; B) +M(B; D) = 0.045indent indent indentW (D) = M(B; D) = 0.018
Example - Structure Learning Algorithm
Mutual informationM(A; B) = 0.027M(A; D) = 0.004M(B; D) = 0.018
Dependency thresholdφ(Xi ,Xj) = logN
2N × Tij
φ(A,B) = φ(A,D) = φ(B,D) = 4log400800 = 0.013
Total influence
W (Xi ) =
M(Xi ;Xj )>φ(Xi ,Xj )∑j(j 6=i)
M(Xi ; Xj)
indent indent indentW (A) = M(A; B)
= 0.027indent indent indentW (B) = M(A; B) +M(B; D) = 0.045indent indent indentW (D) = M(B; D) = 0.018
Example - Structure Learning Algorithm
Mutual informationM(A; B) = 0.027M(A; D) = 0.004M(B; D) = 0.018
Dependency thresholdφ(Xi ,Xj) = logN
2N × Tij
φ(A,B) = φ(A,D) = φ(B,D) = 4log400800 = 0.013
Total influence
W (Xi ) =
M(Xi ;Xj )>φ(Xi ,Xj )∑j(j 6=i)
M(Xi ; Xj)
indent indent indentW (A) = M(A; B) = 0.027
Example - Structure Learning Algorithm

Mutual information:
M(A;B) = 0.027, M(A;D) = 0.004, M(B;D) = 0.018

Dependency threshold:
φ(Xi, Xj) = (log N / 2N) · Tij

φ(A,B) = φ(A,D) = φ(B,D) = (4 · log 400) / 800 = 0.013

Total influence:
W(Xi) = Σ M(Xi; Xj), summed over all j ≠ i with M(Xi;Xj) > φ(Xi,Xj)

W(A) = M(A;B) = 0.027
W(B) = M(A;B) + M(B;D) = 0.045
W(D) = M(B;D) = 0.018
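The threshold test and the total-influence sums above can be sketched as follows. This is a minimal illustration with invented function and variable names, not the authors' code:

```python
# Hypothetical sketch: W(Xi) sums M(Xi;Xj) over the pairs whose
# mutual information exceeds the dependency threshold phi(Xi,Xj).
from itertools import combinations

def total_influence(variables, mi, phi):
    w = {v: 0.0 for v in variables}
    for xi, xj in combinations(variables, 2):
        pair = frozenset((xi, xj))
        if mi[pair] > phi[pair]:      # only qualified dependencies count
            w[xi] += mi[pair]
            w[xj] += mi[pair]
    return w

# Values from the example: the pair (A, D) fails the 0.013 threshold.
mi = {frozenset("AB"): 0.027, frozenset("AD"): 0.004, frozenset("BD"): 0.018}
phi = {frozenset(p): 0.013 for p in ("AB", "AD", "BD")}
print({v: round(x, 3) for v, x in total_influence("ABD", mi, phi).items()})
# {'A': 0.027, 'B': 0.045, 'D': 0.018}
```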
Example - Structure Learning Algorithm

We now construct a full Bayesian network with the variable order according to the total influence values:

W(A) = 0.027, W(B) = 0.045, W(D) = 0.018

W(B) > W(A) > W(D)

Variable order: B, A, D

We now have the full Bayesian network Bc1, which is the part of the multinet that corresponds to C = c1. We should now repeat the process to construct Bc2 and thereby complete the FBC structure learning.
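Since the structure is a full Bayesian network, the ordering fully determines the parent sets: every variable takes all higher-ranked variables as parents. A small sketch (variable names as in the example; the code is my own):

```python
# Order variables by total influence (descending) and give each
# variable all of its predecessors as parents -- the defining
# property of a full Bayesian network over this ordering.
w = {"A": 0.027, "B": 0.045, "D": 0.018}
order = sorted(w, key=w.get, reverse=True)
parents = {v: order[:i] for i, v in enumerate(order)}
print(order)    # ['B', 'A', 'D']
print(parents)  # {'B': [], 'A': ['B'], 'D': ['B', 'A']}
```

Note that this agrees with the CPT-tree example below, where ΠD = {A, B}.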
CPT-tree Learning

We now need to learn a CPT-tree for each variable in the full BN.

A traditional decision tree learning algorithm, such as C4.5, could be used to learn CPT-trees. However, since its time complexity is typically O(n² · N), the resulting FBC learning algorithm would have a complexity of O(n³ · N).

Instead, a fast decision tree learning algorithm is proposed.

The algorithm uses mutual information to determine a fixed ordering of variables from root to leaves.

This predefined variable ordering makes the algorithm faster than traditional decision tree learning algorithms.
CPT-tree Learning Algorithm

Algorithm Fast-CPT-Tree(ΠXi, S)

1. Create an empty tree T.
2. If (S is pure or empty) or (ΠXi is empty), return T.
3. qualified = False.
4. While (qualified == False) and (ΠXi is not empty):
   - Choose the variable Xj with the highest M(Xj; Xi).
   - Remove Xj from ΠXi.
   - Compute the local mutual information MS(Xi; Xj) on S.
   - Compute the local dependency threshold φS(Xi, Xj) on S.
   - If MS(Xi; Xj) > φS(Xi, Xj), qualified = True.
5. If qualified == True:
   - Create a root Xj for T.
   - Partition S into disjoint subsets Sx, where x is a value of Xj.
   - For all values x of Xj:
     - Tx = Fast-CPT-Tree(ΠXi, Sx)
     - Add Tx as a child of Xj.
6. Return T.
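The pseudocode above translates almost line for line into Python. The following is a sketch, not the authors' implementation: the helpers for global mutual information, local mutual information, and the local threshold are passed in as functions, since their definitions appear elsewhere, and all names are my own.

```python
def fast_cpt_tree(pi_xi, S, xi, global_mi, local_mi, local_threshold):
    """Sketch of Fast-CPT-Tree. pi_xi: candidate parent variables;
    S: list of instances (dicts); xi: the child variable."""
    pi_xi = list(pi_xi)                          # working copy of Pi_Xi
    if not S or is_pure(S, xi) or not pi_xi:     # step 2: stopping cases
        return None                              # None plays the empty tree T
    qualified = False
    while not qualified and pi_xi:               # step 4
        xj = max(pi_xi, key=lambda v: global_mi(xi, v))
        pi_xi.remove(xj)
        if local_mi(S, xi, xj) > local_threshold(S, xi, xj):
            qualified = True
    if not qualified:
        return None
    # Step 5: split on xj and recurse on each value's subset.
    return {"split_on": xj,
            "children": {x: fast_cpt_tree(pi_xi, S_x, xi, global_mi,
                                          local_mi, local_threshold)
                         for x, S_x in partition(S, xj).items()}}

def is_pure(S, xi):
    return len({row[xi] for row in S}) <= 1

def partition(S, xj):
    subsets = {}
    for row in S:
        subsets.setdefault(row[xj], []).append(row)
    return subsets
```

A None return stands in for the empty tree; at such a leaf, the conditional probability of Xi would be estimated directly from the subset of instances that reached it.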
Example - CPT-tree Learning Algorithm

We construct the CPT-tree for the variable D first.

Fast-CPT-Tree(ΠD = {A, B}, S)

- M(D;B) = 0.018 > M(D;A) = 0.004, so Xj = B.
- MS(D;B) = M(D;B) = 0.018, φS(D,B) = φ(D,B) = 0.013.
- MS(D;B) > φS(D,B), so qualified = True.
- Since qualified == True, create a root for Xj = B and partition S into the subsets Sb1 and Sb2.
- Recursively call Fast-CPT-Tree(ΠD = {A}, Sb1) and Fast-CPT-Tree(ΠD = {A}, Sb2), and add the resulting trees as children of Xj = B.

The tree so far: a root B with branches b1 and b2.
Example - CPT-tree Learning Algorithm

Fast-CPT-Tree(ΠD = {A}, Sb1)

- Only one parent variable remains, so Xj = A.
- MSb1(D;A) = 7 · 10⁻⁶, φSb1(D,A) = 0.015.
- MSb1(D;A) ≯ φSb1(D,A), so qualified = False.
- Since qualified == False, return the empty tree.

Fast-CPT-Tree(ΠD = {A}, Sb2)

- Only one parent variable remains, so Xj = A.
- MSb2(D;A) = 4 · 10⁻⁵, φSb2(D,A) = 0.059.
- MSb2(D;A) ≯ φSb2(D,A), so qualified = False.
- Since qualified == False, return the empty tree.
Example - CPT-tree Learning Algorithm

We now only need to add Xi = D as children of B and specify the probabilities, which are trivial to calculate.

The resulting CPT-tree: a root B with a leaf over D (values d1, d2) on each branch:

- Branch b1: P(d1 | b1) = (11 + 227)/340 = 0.7, P(d2 | b1) = (5 + 97)/340 = 0.3
- Branch b2: P(d1 | b2) = (7 + 11)/60 = 0.3, P(d2 | b2) = (17 + 25)/60 = 0.7

We should repeat this process for each variable in each network.
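A quick arithmetic check of the four leaf probabilities above (the assignment of each fraction to a d-value and branch follows my reading of the slide's tree figure):

```python
# The four leaf probabilities of the CPT-tree for D, as fractions.
leaves = {
    "P(d1|b1)": (11 + 227) / 340,
    "P(d2|b1)": (5 + 97) / 340,
    "P(d1|b2)": (7 + 11) / 60,
    "P(d2|b2)": (17 + 25) / 60,
}
print(leaves)
# {'P(d1|b1)': 0.7, 'P(d2|b1)': 0.3, 'P(d1|b2)': 0.3, 'P(d2|b2)': 0.7}
```

Each branch's pair sums to 1, and the subset sizes 340 + 60 match N = 400.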
Complexity

Let n be the number of variables and N the number of data instances.

FBC-Structure has time complexity O(n² · N).

Fast-CPT-Tree has time complexity O(n · N). Fast-CPT-Tree is called once for each variable in each of the |C| multinet parts, and each part is trained on roughly N/|C| instances. Hence the time complexity is O(|C| · n² · N/|C|) = O(n² · N).

Thus, the FBC learning algorithm has time complexity O(n² · N).
Experiments - Results

33 UCI data sets, available in Weka, are used for the experiments.

The performance of each algorithm on each data set is observed via 10 runs of 10-fold cross validation.

A two-tailed t-test with a 95% confidence interval is conducted to compare each pair of algorithms on each data set.

Results on accuracy - classification (data sets won / draw / lost):

      AODE    HGC     TAN     NBT     C4.5     SMO
FBC   8/22/3  4/27/2  6/27/0  6/27/0  11/19/3  6/24/2

Results on AUC - ranking (data sets won / draw / lost):

      AODE    HGC     TAN     NBT     C4.5L   SMO
FBC   7/22/4  6/25/2  9/24/0  8/24/1  25/7/1  10/20/3
Experiments - Results
33 UCI data sets, available in Weka, are used for experiments.
Performance of an algorithm on each data set is observed via 10runs of 10-fold cross validation.
Two-tailed t-test with a 95% confidence interval is conducted tocompare each pair of algorithms on each data set.
Results on accuracy - classification (data sets won - draw - lost)AODE HGC TAN NBT C4.5 SMO
FBC 8/22/3 4/27/2 6/27/0 6/27/0 11/19/3 6/24/2
Results on AUC - ranking (data sets won - draw - lost)AODE HGC TAN NBT C4.5L SMO
FBC 7/22/4 6/25/2 9/24/0 8/24/1 25/7/1 10/20/3
Experiments - Results
33 UCI data sets, available in Weka, are used for experiments.
Performance of an algorithm on each data set is observed via 10runs of 10-fold cross validation.
Two-tailed t-test with a 95% confidence interval is conducted tocompare each pair of algorithms on each data set.
Results on accuracy - classification (data sets won - draw - lost)AODE HGC TAN NBT C4.5 SMO
FBC 8/22/3 4/27/2 6/27/0 6/27/0 11/19/3 6/24/2
Results on AUC - ranking (data sets won - draw - lost)AODE HGC TAN NBT C4.5L SMO
FBC 7/22/4 6/25/2 9/24/0 8/24/1 25/7/1 10/20/3
Experiments - Results
33 UCI data sets, available in Weka, are used for experiments.
Performance of an algorithm on each data set is observed via 10runs of 10-fold cross validation.
Two-tailed t-test with a 95% confidence interval is conducted tocompare each pair of algorithms on each data set.
Results on accuracy - classification (data sets won - draw - lost)AODE HGC TAN NBT C4.5 SMO
FBC 8/22/3 4/27/2 6/27/0 6/27/0 11/19/3 6/24/2
Results on AUC - ranking (data sets won - draw - lost)AODE HGC TAN NBT C4.5L SMO
FBC 7/22/4 6/25/2 9/24/0 8/24/1 25/7/1 10/20/3
Experiments - Results
33 UCI data sets, available in Weka, are used for experiments.
Performance of an algorithm on each data set is observed via 10runs of 10-fold cross validation.
Two-tailed t-test with a 95% confidence interval is conducted tocompare each pair of algorithms on each data set.
Results on accuracy - classification (data sets won - draw - lost)AODE HGC TAN NBT C4.5 SMO
FBC 8/22/3 4/27/2 6/27/0 6/27/0 11/19/3 6/24/2
Results on AUC - ranking (data sets won - draw - lost)AODE HGC TAN NBT C4.5L SMO
FBC 7/22/4 6/25/2 9/24/0 8/24/1 25/7/1 10/20/3
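One pairwise comparison in the protocol above can be sketched as follows. The accuracy arrays are fabricated stand-ins, not the paper's measurements, and a plain paired t-test is used here (published comparisons of this kind sometimes use a corrected resampled variant instead):

```python
# Sketch: collect the 100 per-fold accuracies (10 runs x 10 folds) of two
# algorithms on one data set, then apply a two-tailed paired t-test at the
# 95% level to decide won / draw / lost. Toy numbers, not the paper's.
import numpy as np
from scipy import stats

acc_fbc = np.linspace(0.80, 0.90, 100)                # 100 per-fold accuracies
acc_other = acc_fbc - np.linspace(0.005, 0.015, 100)  # consistently slightly worse

t_stat, p_value = stats.ttest_rel(acc_fbc, acc_other)  # paired, two-tailed
if p_value < 0.05:
    outcome = "won" if t_stat > 0 else "lost"  # significant difference
else:
    outcome = "draw"                           # no significant difference
print(outcome)  # -> won
```

Tallying this outcome over the 33 data sets yields entries like the 8/22/3 reported against AODE.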
Experiments - Complexity

Complexity of the tested algorithms:

        Training       Classification
FBC     O(n² · N)      O(n)
AODE    O(n² · N)      O(n²)
HGC     O(n⁴ · N)      O(n)
TAN     O(n² · N)      O(n)
NBT     O(n³ · N)      O(n)
C4.5    O(n² · N)      O(n)
SMO     O(n^2.3)       O(n)
Experiments - Conclusion

FBC demonstrates good performance in both classification and ranking.

FBC is among the most efficient of the tested algorithms in both training and classification time.

Overall, FBC performs best among the algorithms compared.