Rafi Bojmel
supervised by Dr. Boaz Lerner
Automatic Threshold Selection for conditional independence tests in learning a Bayesian network
Overview
Machine Learning (ML) investigates the mechanisms by which knowledge is acquired through experience.
Applications that rely heavily on ML include:
Web search engines, On-line help services
Document processing (text classification, OCR)
Biological data analysis, Military applications
The Bayesian network (BN) has become one of the most studied machine learning models for knowledge representation, probabilistic inference and, more recently, classification.
BN Example (1): Chest Clinic (Asia) Problem
Recent visit to Asia (A)
Tuberculosis (T)
Smoker (S)
Lung cancer (L)
Positive X-ray (X)
Either Tuberculosis or Lung cancer (E)
Bronchitis (B)
Dyspnea (shortness of breath) (D)

              A=yes   A=no
P(A)          50%     50%

              D=yes   D=no
P(D | B=yes)  90%     10%
P(D | B=no)   5%      95%
BN Example (2): Chest Clinic (Asia) Problem
Posterior of Lung cancer given all other variables:

$$
P(L \mid D,S,X,A,B,E,T) = \frac{P(L,D,S,X,A,B,E,T)}{P(D,S,X,A,B,E,T)} = \frac{P(L,D,S,X,A,B,E,T)}{\sum_{L'} P(L',D,S,X,A,B,E,T)}
$$

$$
= \frac{P(A)\,P(S)\,P(T \mid A)\,P(L \mid S)\,P(B \mid S)\,P(E \mid T,L)\,P(X \mid E)\,P(D \mid E,B)}{\sum_{L'} P(A)\,P(S)\,P(T \mid A)\,P(L' \mid S)\,P(B \mid S)\,P(E \mid T,L')\,P(X \mid E)\,P(D \mid E,B)}
$$

Every factor that does not involve L appears in both the numerator and the denominator and cancels:

$$
= \frac{P(L \mid S)\,P(E \mid T,L)}{\sum_{L'} P(L' \mid S)\,P(E \mid T,L')}
$$

Markov blanket of Lung cancer: Smoker (its parent), Either (its child) and Tuberculosis (the child's other parent). Only these variables affect the posterior of L.
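The cancellation can be checked numerically. The sketch below, a hedged illustration rather than part of the original work, builds the full joint distribution of the Asia network from the factorization, computes P(L | all other variables) by brute force, and compares it with the Markov-blanket formula. Apart from P(A) = 0.5, every CPT value is a hypothetical placeholder (the slides give only P(A) and a table for D conditioned on B alone, whereas the factorization conditions D on E and B), and E is assumed to be the deterministic OR of T and L.

```python
import itertools

# Hypothetical CPTs for the Asia network (binary variables, 1 = yes).
# Only P(A=1) = 0.5 is taken from the slides; every other number is a placeholder.
P_A = 0.5
P_S = 0.5
P_T_GIVEN_A = {1: 0.05, 0: 0.01}            # P(T=1 | A)
P_L_GIVEN_S = {1: 0.10, 0: 0.01}            # P(L=1 | S)
P_B_GIVEN_S = {1: 0.60, 0: 0.30}            # P(B=1 | S)
P_X_GIVEN_E = {1: 0.98, 0: 0.05}            # P(X=1 | E)
P_D_GIVEN_EB = {(1, 1): 0.90, (1, 0): 0.70,
                (0, 1): 0.80, (0, 0): 0.10}  # P(D=1 | E, B)


def bern(p_one, value):
    """P(V = value) for a binary V with P(V = 1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


def p_e_given_tl(e, t, l):
    """Assumption: E is modelled as the deterministic OR of T and L."""
    return 1.0 if e == max(t, l) else 0.0


def joint(a, s, t, l, b, e, x, d):
    """P(A,S,T,L,B,E,X,D) from the BN factorization."""
    return (bern(P_A, a) * bern(P_S, s) * bern(P_T_GIVEN_A[a], t)
            * bern(P_L_GIVEN_S[s], l) * bern(P_B_GIVEN_S[s], b)
            * p_e_given_tl(e, t, l) * bern(P_X_GIVEN_E[e], x)
            * bern(P_D_GIVEN_EB[(e, b)], d))


def posterior_full(l, a, s, t, b, e, x, d):
    """P(L=l | all other variables) by normalizing the full joint over L'."""
    den = sum(joint(a, s, t, lp, b, e, x, d) for lp in (0, 1))
    return joint(a, s, t, l, b, e, x, d) / den


def posterior_blanket(l, s, t, e):
    """P(L=l | S, T, E) = P(l|s) P(e|t,l) / sum over L' of P(L'|s) P(e|t,L')."""
    def term(lv):
        return bern(P_L_GIVEN_S[s], lv) * p_e_given_tl(e, t, lv)
    return term(l) / sum(term(lp) for lp in (0, 1))


def max_discrepancy():
    """Largest gap between the two posteriors over all possible evidence."""
    worst = 0.0
    for a, s, t, b, e, x, d in itertools.product((0, 1), repeat=7):
        if sum(joint(a, s, t, lp, b, e, x, d) for lp in (0, 1)) == 0.0:
            continue  # impossible evidence, e.g. E=0 while T=1
        for l in (0, 1):
            gap = abs(posterior_full(l, a, s, t, b, e, x, d)
                      - posterior_blanket(l, s, t, e))
            worst = max(worst, gap)
    return worst


print(f"max |full - blanket| = {max_discrepancy():.1e}")
```

Any gap that remains is floating-point rounding: the factors for A, B, X and D are identical in numerator and denominator, so only S, T and E matter.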
Bayesian Networks
Learning Bayesian networks:
Structure learning (the Bayesian network structure/graph): search-and-score or constraint-based
Parameter learning
Inference (e.g., classification)
BN Structure Learning
Database → training set → model construction
Test set → Bayesian inference (classification)
There are two main approaches to BN structure learning:
Search-and-score, which uses a heuristic search method to find a structure that scores well on the data.
Constraint-based (CB), which analyzes dependency relationships among nodes using conditional independence (CI) tests. The PC algorithm is a CB algorithm.
Database (excerpt, binary values):
Case  D  X  E  B  L  T  S  A
#1    1  0  0  1  0  0  1  0
#2    1  0  0  1  0  0  0  0
#3    1  0  1  1  1  0  1  1
#4    0  1  1  0  1  1  1  0
#5    0  1  1  0  0  1  0  1
#6    1  0  0  0  0  0  0  0
…
(D = Dyspnea, X = X-ray, E = Either, B = Bronchitis, L = Lung cancer, T = Tuberculosis, S = Smoker, A = Asia)
PC algorithm (1)
Inputs:
V: set of variables (and the corresponding database)
I*(Xi,Xj|{S}) < ε: a conditional independence test (Xi and Xj are considered independent given S when the measure falls below ε)
ε: threshold
Order{V}: an ordering of V
Output:
Directed acyclic graph (DAG)
$$
I^*(X_i, X_j \mid \mathbf{S}) = \sum_{\mathbf{s} \in \mathbf{S}} \sum_{x_i \in X_i} \sum_{x_j \in X_j} P(x_i, x_j, \mathbf{s}) \log \frac{P(x_i, x_j \mid \mathbf{s})}{P(x_i \mid \mathbf{s})\, P(x_j \mid \mathbf{s})}
$$

Xi, Xj = any two nodes in the graph
I*(Xi,Xj|{S}) = normalized conditional mutual information
{S} = subset of variables (other than Xi, Xj)
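The formula can be estimated from a database by replacing each probability with a relative frequency. The following is a minimal sketch under stated assumptions: the data are discrete rows stored as tuples, the function name `conditional_mutual_information` is hypothetical, and it returns the plain CMI in nats, because the slides do not say how I* is normalized.

```python
import math
from collections import Counter


def conditional_mutual_information(rows, i, j, cond=()):
    """Empirical I(X_i, X_j | S) in nats; S is a tuple of column indices.

    Each probability is replaced by a relative frequency. With an empty
    condition set this reduces to the mutual information I(X_i, X_j).
    """
    n = len(rows)
    n_ijs = Counter((r[i], r[j], tuple(r[k] for k in cond)) for r in rows)
    n_is = Counter((r[i], tuple(r[k] for k in cond)) for r in rows)
    n_js = Counter((r[j], tuple(r[k] for k in cond)) for r in rows)
    n_s = Counter(tuple(r[k] for k in cond) for r in rows)
    total = 0.0
    for (xi, xj, s), count in n_ijs.items():
        # P(xi,xj|s) / (P(xi|s) P(xj|s)) = n(xi,xj,s) n(s) / (n(xi,s) n(xj,s))
        ratio = (count * n_s[s]) / (n_is[(xi, s)] * n_js[(xj, s)])
        total += (count / n) * math.log(ratio)
    return total


# Hypothetical toy data for a chain X0 -> X1 -> X2, built from exact counts
# so that X0 and X2 are independent given X1 in the empirical distribution.
counts = {(1, 1, 1): 81, (1, 1, 0): 9, (1, 0, 1): 1, (1, 0, 0): 9,
          (0, 1, 1): 9, (0, 1, 0): 1, (0, 0, 1): 9, (0, 0, 0): 81}
rows = [key for key, c in counts.items() for _ in range(c)]

print(conditional_mutual_information(rows, 0, 2))        # order 0: clearly > 0
print(conditional_mutual_information(rows, 0, 2, (1,)))  # order 1: 0.0
```

On this toy chain the order-0 test sees a dependence between X0 and X2 that vanishes at order 1, which is the kind of difference the threshold ε has to separate.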
PC algorithm (2)
The algorithm has three stages:
Stage I: Start from the complete graph and remove edges using conditional independence tests, yielding an undirected graph.
Stage II: Find head-to-head links (v-structures): X – Y – Z becomes X → Y ← Z when X and Z are non-adjacent and Y is not in the set that separated them.
Stage III: Orient all remaining links that can be oriented.
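Stages I and II can be sketched as follows. This is an illustrative implementation of the textbook PC steps, not the authors' code: the CI test is passed in as a callable (in the slides it would be I*(Xi,Xj|{S}) < ε), variables are processed in index order as a stand-in for Order{V}, the function names are hypothetical, and Stage III (propagating orientations) is omitted.

```python
from itertools import combinations


def pc_skeleton(n_vars, independent):
    """Stage I: start from the complete graph and delete edges by CI tests.

    `independent(i, j, s)` returns True when X_i and X_j are judged
    independent given the variables in tuple `s`.
    Returns adjacency sets and the separating set of every removed edge.
    """
    adj = {v: set(range(n_vars)) - {v} for v in range(n_vars)}
    sepset = {}
    order = 0
    # Grow the condition-set size while some node still has enough neighbours.
    while any(len(adj[v]) - 1 >= order for v in adj):
        for i in range(n_vars):
            for j in sorted(adj[i]):
                candidates = sorted(adj[i] - {j})
                for s in combinations(candidates, order):
                    if independent(i, j, s):
                        adj[i].discard(j)
                        adj[j].discard(i)
                        sepset[frozenset((i, j))] = set(s)
                        break
        order += 1
    return adj, sepset


def find_v_structures(adj, sepset):
    """Stage II: orient X - Y - Z as X -> Y <- Z when X and Z are
    non-adjacent and Y is not in the set that separated them."""
    arrows = set()
    for y in adj:
        for x, z in combinations(sorted(adj[y]), 2):
            if z not in adj[x] and y not in sepset.get(frozenset((x, z)), set()):
                arrows.update({(x, y), (z, y)})
    return arrows


# Hypothetical CI oracle for T -> E <- L (0 = T, 1 = E, 2 = L): T and L are
# marginally independent but become dependent once E is known.
def collider_oracle(i, j, s):
    return {i, j} == {0, 2} and 1 not in s


adj, sepset = pc_skeleton(3, collider_oracle)
print(sorted((i, j) for i in adj for j in adj[i] if i < j))  # [(0, 1), (1, 2)]
print(sorted(find_v_structures(adj, sepset)))                # [(0, 1), (2, 1)]
```

Passing the test as a callable keeps the skeleton search independent of how the threshold ε is chosen, which is exactly the part the following slides try to automate.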
PC Algorithm Simulation (Asia network)
Stage I: CI tests such as I(A,S), I(S,D | B) and I(A,D | T,E) remove edges until no further test applies (END).
Stage II: two v-structures are found (T → E ← L and E → D ← B).
Stage III: the remaining links that can be oriented are oriented, recovering the precise structure.
Threshold Selection - existing methods
Arbitrary (trial-and-error) selection
Disadvantages: haphazardness, inaccuracy, time consumption
Likelihood- or classifier-accuracy-based selection
Disadvantage: exponential run time
The "risk" in selecting the wrong threshold:
Too small: too many edges, which harms causal interpretation and increases run time
Too large: important edges are lost, which causes inaccuracy
Threshold selection - Novel Technique (1)
Based on probability density functions (PDFs) of mutual information:
Calculate the MI values, I*(Xi,Xj | {S}), for condition sets S of different sizes (orders).
Create histograms (a PDF estimation technique).
Select the best threshold automatically using one of two techniques:
Zero-Crossing-Decision (ZCD)
Best-Candidate (BC)
[Figure: Ideal and bimodal PDF of mutual information. x-axis: mutual information I(Xi,Xj) = mi, from 0 to 1; y-axis: f_MI(mi); curves: ideal MI PDF and bimodal MI PDF.]
Threshold selection - Novel Technique (2)
[Figure: Histogram of CMI values (illustration). x-axis: CMI values, 0 to 0.3; y-axis: CMI counter, 0 to 300; series: order 0 (mutual information) and order 1 (conditional mutual information, |S| = 1), with the ZCD threshold marked for each order.]
Zero-Crossing-Decision (ZCD)
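The slides name ZCD but do not define it. One plausible reading, which is an assumption here rather than the authors' definition, is that the threshold sits in the valley between the two modes of the CMI histogram, where the discrete derivative of the histogram crosses zero from negative to positive. The sketch below implements that reading with the standard library; `cmi_histogram` and `zero_crossing_threshold` are hypothetical names, and the CMI values are synthetic.

```python
def cmi_histogram(values, lo, hi, bins):
    """Equal-width histogram (a simple PDF estimate) of CMI values in [lo, hi]."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        k = min(int((v - lo) / width), bins - 1)  # clamp v == hi into the last bin
        counts[k] += 1
    return counts, width


def zero_crossing_threshold(counts, lo, width):
    """Assumed ZCD reading: return the centre of the first local-minimum bin,
    where counts[k+1] - counts[k] changes sign from negative to positive.
    Returns None if the histogram never turns upward (no second mode)."""
    diffs = [b - a for a, b in zip(counts, counts[1:])]
    for k in range(1, len(diffs)):
        if diffs[k - 1] < 0 and diffs[k] > 0:
            return lo + (k + 0.5) * width
    return None


# Hypothetical bimodal CMI values: many near zero from (conditionally)
# independent pairs, a second mode from dependent pairs.
bin_counts = [60, 90, 40, 10, 3, 2, 8, 20, 30, 15, 5, 0, 0, 0, 0]
values = [(k + 0.5) * 0.02 for k, c in enumerate(bin_counts) for _ in range(c)]

counts, width = cmi_histogram(values, lo=0.0, hi=0.3, bins=15)
eps = zero_crossing_threshold(counts, lo=0.0, width=width)
print(round(eps, 3))  # 0.11
```

Under this reading, CI tests whose CMI falls below the returned value would be treated as independencies, and the procedure would be repeated per order as in the figure. The Best-Candidate (BC) technique is not described in the slides and is not sketched.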
Experiment and Results
Classification experiments were performed on 8 real-world databases from the UCI Repository.
Database sizes: 128 to 3,200 cases.
Graph sizes: 5 to 17 nodes.
Number of classes (states of the class variable): 2 to 10.
Results - Classification Performance
[Figure: Classification accuracy (%), 0 to 100, on the Australian, Car, Cmc, Corral, Crx, Flare, Iris and Mofn-3-7-10 databases for OTHER CB, PC (Manual), PC (ZCD), PC (BC), PC (AVG) and ZCD (AVG).]
Summary
The PC algorithm requires selecting a threshold for structure learning, which is time-consuming and undermines the automation of structure learning.
An initial examination of our novel techniques indicates that they can both keep the process automatic and improve performance.
Further research is being carried out to validate and improve the proposed techniques.