A Strategy for Making Predictions under Manipulation
Ioannis Tsamardinos, Assistant Professor, Computer Science Department, University of Crete; ICS, Foundation for Research and Technology - Hellas
Laura E. Brown, Ph.D. Candidate, Dept. Biomedical Inf., Vanderbilt Univ.
5/10/2007 I. Tsamardinos, CSD, University of Crete
Selecting a Formulation of Causality
Causal Bayesian Networks
  Cross-sectional data
  No explicit notion of time
  No feedback cycles allowed
  Edges express causal relations
  Distribution expressed as
    $P(V) = \prod_i P(V_i \mid Pa(V_i))$
[Figure: example causal Bayesian network over T and V1, ..., V6]
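A minimal sketch of the factorization, assuming a hypothetical three-variable binary network V1 -> T -> V2 with made-up conditional probability tables (neither the graph nor the numbers come from the slides). It only illustrates that the joint distribution is the product of each variable's probability given its parents.

```python
from itertools import product

# Hypothetical toy network V1 -> T -> V2, all variables binary (assumption).
parents = {"V1": (), "T": ("V1",), "V2": ("T",)}
# cpt[v][parent_values] = P(v = 1 | parents = parent_values); numbers are made up.
cpt = {
    "V1": {(): 0.3},
    "T": {(0,): 0.1, (1,): 0.8},
    "V2": {(0,): 0.2, (1,): 0.9},
}

def joint(assignment):
    """P(V) as the product over i of P(V_i | Pa(V_i))."""
    p = 1.0
    for v, pa in parents.items():
        p_one = cpt[v][tuple(assignment[u] for u in pa)]
        p *= p_one if assignment[v] == 1 else 1.0 - p_one
    return p

# Summing the joint over all assignments should give 1.
total = sum(
    joint(dict(zip(parents, values)))
    for values in product((0, 1), repeat=len(parents))
)
p_all_ones = joint({"V1": 1, "T": 1, "V2": 1})  # 0.3 * 0.8 * 0.9
```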
Effect of Manipulation
Manipulate V1, V5
[Figure: the example network over T and V1, ..., V6]
Effect of Manipulation
Manipulate V1, V5
[Figure: the example network, and the same network with an added node E]
E: External Manipulator
Effect of Manipulation
Manipulate V1, V5
[Figure: the example network, and the same network with an added node E]
Other parents of the manipulated variables are removed
Effect of Manipulation
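[Figure: the manipulated network over T, V1, ..., V6, and E]
$P(V) = \prod_i P(V_i \mid Pa(V_i))$
$P_M(V) = \prod_{V_i \notin M} P(V_i \mid Pa(V_i)) \prod_{V_i \in M} P_M(V_i \mid E)$
M: the set of manipulated variables
J. Pearl. Causality: Models, Reasoning, and Inference, 2000.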
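A minimal sketch of the manipulated factorization, again assuming a hypothetical binary network V1 -> T -> V2 with made-up numbers. The `policy` argument is an illustrative name: it maps each manipulated variable to P_M(V_i = 1 | E), and that factor replaces the variable's original conditional probability.

```python
from itertools import product

# Hypothetical toy network V1 -> T -> V2, all variables binary (assumption).
parents = {"V1": (), "T": ("V1",), "V2": ("T",)}
# cpt[v][parent_values] = P(v = 1 | parents = parent_values); numbers are made up.
cpt = {
    "V1": {(): 0.3},
    "T": {(0,): 0.1, (1,): 0.8},
    "V2": {(0,): 0.2, (1,): 0.9},
}

def bernoulli(p_one, value):
    return p_one if value == 1 else 1.0 - p_one

def joint_manipulated(assignment, policy):
    """P_M(V): manipulated variables use P_M(V_i | E), the rest keep P(V_i | Pa(V_i))."""
    p = 1.0
    for v, pa in parents.items():
        if v in policy:
            p *= bernoulli(policy[v], assignment[v])
        else:
            p *= bernoulli(cpt[v][tuple(assignment[u] for u in pa)], assignment[v])
    return p

def marginal(var, value, policy):
    """Marginal probability of var = value under the given manipulation policy."""
    total = 0.0
    for values in product((0, 1), repeat=len(parents)):
        assignment = dict(zip(parents, values))
        if assignment[var] == value:
            total += joint_manipulated(assignment, policy)
    return total

set_t = {"T": 1.0}                      # the manipulator always sets T = 1
p_v1 = marginal("V1", 1, set_t)         # a cause of T is unaffected
p_v2 = marginal("V2", 1, set_t)         # an effect of T changes
p_v2_observed = marginal("V2", 1, {})   # empty policy: the unmanipulated P(V)
```

With an empty policy the function reduces to the unmanipulated factorization, which is why one function can serve both cases in this sketch.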
Types of Predictive Tasks
A. No manipulations
B. Known set of manipulated variables M
  From data following P(V), predict data following P_M(V)
  The way manipulations are performed is unknown, i.e., the P_M(V_i | E) are unknown
C. Unknown M
The Markov Blanket of T
The set of direct causes, direct effects, and direct causes of direct effects
[Figure: the example network over T and V1, ..., V6, with the Markov blanket of T marked]
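The definition can be sketched as a short function over a DAG stored as a map from each node to its direct causes. The graph below is hypothetical (the edges of the slide's figure are not recoverable from the text), and `markov_blanket` is an illustrative name.

```python
# Hypothetical DAG (assumption): each node maps to the list of its direct causes.
parents = {
    "V1": [], "V2": [], "V3": [], "V6": [],
    "T": ["V1", "V2"],
    "V5": ["T", "V3"],
    "V4": ["V5"],
}

def markov_blanket(parents, t):
    """Direct causes, direct effects, and direct causes of direct effects of t."""
    causes = set(parents[t])
    effects = {child for child, ps in parents.items() if t in ps}
    spouses = {p for child in effects for p in parents[child]} - {t}
    return causes | effects | spouses

mb = markov_blanket(parents, "T")
```

In this toy graph V4 is a descendant of T but not in the blanket, and V6 is unrelated, so both are excluded.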
The Manipulated Markov Blanket of T
The set of direct causes, direct effects, and direct causes of direct effects in the manipulated distribution
E.g., V1 and V5
[Figure: the example network over T and V1, ..., V6]
Properties of MB(T)
The smallest-size, most-predictive subset of variables
All and only the variables we need for building optimal predictive models
I. Tsamardinos and C. F. Aliferis. Towards principled feature selection: Relevancy, Filters and Wrappers. AI & Statistics, 2003.
A. No Manipulations
Find the MB(T)
Fit a model from training data for P(T | MB(T)), using only the variables of the MB(T)
B. Known M
Find the MB_M(T)
Fit a model from training data, using only the variables of the MB_M(T)
Proposition: P_M(T | MB_M(T)) = P(T | MB_M(T)),
provided there is no manipulated spouse of T that is a descendant of T in the unmanipulated distribution
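A minimal sketch of checking the proposition's side condition, assuming a hypothetical graph (not the one drawn on the slides) in which V4 is both a child of T and, through the common child V6, a spouse of T. The helper names are illustrative. Manipulation is modelled as graph surgery: every manipulated node keeps only the external manipulator E as its parent.

```python
# Hypothetical DAG (assumption): each node maps to the list of its direct causes.
parents = {
    "V1": [], "V2": [], "V3": [],
    "T": ["V1", "V2"],
    "V4": ["T"],
    "V5": ["T"],
    "V6": ["T", "V4"],
}

def manipulate(parents, manipulated):
    """Graph surgery: each manipulated node keeps only the manipulator E as parent."""
    after = {v: list(ps) for v, ps in parents.items()}
    after["E"] = []
    for v in manipulated:
        after[v] = ["E"]
    return after

def markov_blanket(parents, t):
    effects = {c for c, ps in parents.items() if t in ps}
    spouses = {p for c in effects for p in parents[c]} - {t}
    return set(parents[t]) | effects | spouses

def descendants(parents, t):
    found, frontier = set(), {t}
    while frontier:
        frontier = {c for c, ps in parents.items() if frontier & set(ps)} - found
        found |= frontier
    return found

def fit_transfers(parents, t, manipulated):
    """True if no manipulated spouse of t is a descendant of t in the original graph."""
    after = manipulate(parents, manipulated)
    effects = {c for c, ps in after.items() if t in ps}
    spouses = {p for c in effects for p in after[c]} - {t}
    return not (spouses & set(manipulated) & descendants(parents, t))

mb_m = markov_blanket(manipulate(parents, {"V1", "V5"}), "T")
ok_case = fit_transfers(parents, "T", {"V1", "V5"})
bad_case = fit_transfers(parents, "T", {"V1", "V4"})
```

In this toy graph, manipulating V5 simply drops it from the blanket, while manipulating V4 leaves it in the blanket as a spouse but cuts the T -> V4 edge that the unmanipulated data reflect, so the condition fails.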
Can Be Fit From Unmanipulated Data
[Figure: the example network over T and V1, ..., V6]
M = {V1, V5}
P_M(T | MB_M(T)) = P(T | MB_M(T))
Cannot Be Fit From Unmanipulated Data
[Figure: the example network over T and V1, ..., V6]
M = {V1, V4}
P_M(T | MB_M(T)) ≠ P(T | MB_M(T))
C. Unknown Manipulations M
Find the direct causes of T
Fit a model from training data, using only the variables that are direct causes of T
Only the direct causes remain in MB_M(T) under any manipulation
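The last claim can be checked by brute force on a small example. The sketch below assumes a hypothetical graph and that T itself is never manipulated; it intersects MB_M(T) over every possible manipulated set M. Names such as `blanket_after` are illustrative.

```python
from itertools import combinations

# Hypothetical DAG (assumption): each node maps to the list of its direct causes.
parents = {
    "V1": [], "V2": [], "V3": [],
    "T": ["V1", "V2"],
    "V4": ["T"],
    "V5": ["T"],
    "V6": ["T", "V4"],
}

def blanket_after(parents, t, manipulated):
    """MB_M(t): the Markov blanket after cutting all edges into manipulated nodes.

    The manipulator node E is omitted here; it can never enter the blanket of an
    unmanipulated t, because a manipulated node is no longer a child of t.
    """
    after = {v: ([] if v in manipulated else list(ps)) for v, ps in parents.items()}
    effects = {c for c, ps in after.items() if t in ps}
    spouses = {p for c in effects for p in after[c]} - {t}
    return set(after[t]) | effects | spouses

others = sorted(set(parents) - {"T"})
stable = set(others)
for size in range(len(others) + 1):
    for m in combinations(others, size):
        stable &= blanket_after(parents, "T", set(m))
```

Here `stable` ends up equal to the direct causes of T, since any child or spouse can be cut out of the blanket by manipulating the appropriate child.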
Learning Bayesian Networks
Many algorithms exist that can learn the network
  Discrete data: MMHC [1]
  Mixed: Bach [2]
Find the graph, find the MB_M(T), fit a model and you are done
… or are you?
1. I. Tsamardinos, L. E. Brown, and C. F. Aliferis. Machine Learning, 65(1):31, 2006.
2. F. R. Bach and M. I. Jordan. NIPS-02.
Faithfulness and Parity Functions
All BN methods assume Faithfulness
  Causes and effects have detectable conditional pairwise associations with T
T = V1 XOR V3
  No pairwise association between T and V1
[Figure: graph over V1, V3, and T]
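A small exact check of the parity example, assuming V1 and V3 are independent and uniform over {0, 1} (an assumption; the slide gives no distribution). The covariance between T and V1 is zero overall, yet within a fixed value of V3 the association is plain.

```python
from itertools import product

def cov(xs, ys):
    """Population covariance of two equally weighted value lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

# All four equally likely (V1, V3) combinations with T = V1 XOR V3.
rows = [(v1, v3, v1 ^ v3) for v1, v3 in product((0, 1), repeat=2)]

cov_marginal = cov([t for _, _, t in rows], [v1 for v1, _, _ in rows])

# Restrict to the stratum V3 = 0, where T equals V1.
stratum = [(v1, t) for v1, v3, t in rows if v3 == 0]
cov_given_v3 = cov([t for _, t in stratum], [v1 for v1, _ in stratum])
```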
Parity Functions in Feature Space
T = V1 XOR V2
  No pairwise association between T and V1
Construct new feature V1V2
  Pairwise associations become apparent
[Figure: graph over V1, V2, and T, before and after adding the constructed feature V1V2]
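A minimal sketch of why a constructed feature helps, assuming the new feature is the product V1 * V2 with 0/1 coding and that V1 and V2 are independent and uniform (both assumptions; the slide does not define the construction). With the product available, T becomes a linear function of the features and a pairwise association with T appears.

```python
from itertools import product

def cov(xs, ys):
    """Population covariance of two equally weighted value lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

# All four equally likely (V1, V2) combinations with T = V1 XOR V2.
rows = [(v1, v2, v1 * v2, v1 ^ v2) for v1, v2 in product((0, 1), repeat=2)]

ts = [t for _, _, _, t in rows]
cov_t_v1 = cov(ts, [v1 for v1, _, _, _ in rows])        # original feature
cov_t_v1v2 = cov(ts, [f for _, _, f, _ in rows])        # constructed feature

# In the extended feature space, XOR is linear: T = V1 + V2 - 2 * V1V2.
is_linear = all(t == v1 + v2 - 2 * f for v1, v2, f, t in rows)
```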
Feature Space Markov Blanket
Map Data to Feature Space
Learn the Markov Blanket in Feature Space
Feature Space Markov Blanket
Map Data to Feature Space
  Brute force is inefficient
  Indirectly map to feature space using an SVM
  Assume: low SVM weight of a feature implies low association of the feature with T
  Produce only the top-weighted features! (recently developed heuristic method)
Learn the Markov Blanket in Feature Space
  Run HITON [1]
1. C. F. Aliferis, I. Tsamardinos, and A. Statnikov. AMIA 2003.
Inducing the MB(T)
Run MMMB [1], RFE [2], FSMB [3], and no feature selection
Build predictive models
If there is a large discrepancy in predictive performance, consult FSMB
If there are “parity”-like variables, add the corresponding constructed features to the data before learning the network
1. I. Tsamardinos, C. F. Aliferis, and A. Statnikov. KDD 2003.
2. I. Guyon et al. Machine Learning, 46(1-3):389-422, 2002.
3. Submitted for publication.
Hidden Variables and Confounding
[Figure: the example network over T and V1, ..., V6, with hidden variables H1 and H2]
H1, H2: hidden variables
Dashed edges appear in the marginal network
Marginal MB(T) shown in green
Hidden Variables and Confounding
[Figure: the example network over T and V1, ..., V6, with hidden variables H1 and H2]
H1, H2: hidden variables
Dashed edges appear in the marginal network
Reddish edges are “removed” by manipulations
Manipulations of V5, V3 lead to errors in estimating MB_M(T) (bluish nodes)
Finding Non-Confounded Edges
[Figure: graph over T, V1, V2, V3, V5, and V6]
Proposition: V = O ∪ H, where O are observable and H are not. P(V) is faithful to a Causal Bayesian Network. If
1. ∀ S ⊆ O, ¬I(V1 ; T | S)
2. ∀ S ⊆ O, ¬I(V3 ; T | S)
3. ∀ S ⊆ O, ¬I(V5 ; T | S)
4. ∃ Z1 ⊆ O, s.t. I(V1 ; V3 | Z1)
5. ∃ Z2 ⊆ O, s.t. ¬I(V1 ; V5 | Z2)
6. ¬I(V1 ; V3 | Z1 ∪ {T})
7. I(V1 ; V5 | Z2 ∪ {T})
Then there is a causal path from T to V5 (the edge T → V5 is causal)
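A hedged sketch of how such a test might be run. It assumes one reading of the seven conditions: T is never independent of V1, V3, or V5; V1 and V3 are independent given some Z1 but dependent once T is added (a collider at T); V1 and V5 are dependent given some Z2 but independent once T is added (T screens off V5). The quantifier symbols are lost in the slide text, so this reading is an assumption. Independence is answered here by a d-separation oracle on a known DAG instead of statistical tests on data, and all function names are illustrative.

```python
from itertools import combinations

def ancestors(parents, nodes):
    """The given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(parents.get(n, ()))
    return seen

def d_separated(parents, x, y, z):
    """d-separation via the moralised ancestral graph with z removed."""
    keep = ancestors(parents, {x, y} | set(z))
    adj = {n: set() for n in keep}
    for child in keep:
        pa = list(parents.get(child, ()))
        for p in pa:                      # keep parent-child edges, undirected
            adj[child].add(p)
            adj[p].add(child)
        for a, b in combinations(pa, 2):  # marry parents of a common child
            adj[a].add(b)
            adj[b].add(a)
    seen, stack = set(), [x]
    while stack:
        n = stack.pop()
        if n == y:
            return False
        if n in seen or n in z:
            continue
        seen.add(n)
        stack.extend(adj[n])
    return True

def subsets(items):
    items = sorted(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield set(combo)

def edge_is_causal(indep, observed, v1, v3, t, v5):
    """Check the seven conditions (as read above) with an independence oracle."""
    def always_dependent(a, b):
        return all(not indep(a, b, s) for s in subsets(observed - {a, b}))
    if not (always_dependent(v1, t) and always_dependent(v3, t)
            and always_dependent(v5, t)):
        return False
    collider = any(indep(v1, v3, z) and not indep(v1, v3, z | {t})
                   for z in subsets(observed - {v1, v3, t}))
    screened = any(not indep(v1, v5, z) and indep(v1, v5, z | {t})
                   for z in subsets(observed - {v1, v5, t}))
    return collider and screened

observed = {"V1", "V3", "T", "V5"}
causal_dag = {"T": ["V1", "V3"], "V5": ["T"]}             # T -> V5 is a real edge
confounded_dag = {"T": ["V1", "V3", "H"], "V5": ["H"]}    # hidden H causes both

causal = edge_is_causal(
    lambda a, b, s: d_separated(causal_dag, a, b, s), observed, "V1", "V3", "T", "V5")
confounded = edge_is_causal(
    lambda a, b, s: d_separated(confounded_dag, a, b, s), observed, "V1", "V3", "T", "V5")
```

In the confounded graph T and V5 are still always dependent, but V1 and V5 are independent without T in the conditioning set, so the screening-off pattern never appears and the test rejects the edge.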
Finding Non-Confounded Edges
[Figure: the graph from the previous slide with a hidden variable H added]
The proposition is the same as on the previous slide.
Finding Non-Confounded Edges
Use the test to
  Orient some edges
  Find truly causal (non-confounded) edges
Extension of the basic idea presented in [1]
1. S. Mani, P. Spirtes, and G. F. Cooper. UAI 2006.
Finding the MB_M(T)
Edge existence: BN learning algorithm
Edge orientation:
  Learn the network, convert to PDAG, obtain compelled edges
  Confounding test
Edge confounding:
  Confounding test
Weigh evidence and decide on orientation and absence of confounding
Finding the MB_M(T)
[Figure: learned graph over T and V1, ..., V7. Legend: non-confounded edges; oriented edges that could be confounded; undirected edges; manipulated nodes]
Are V7, V3 part of MB_M(T)?
Is V4 part of MB_M(T)?
Results
Limitations
Most time spent on REGED
Conditional independence tests were sometimes inappropriate
New methods not optimized or fully tested
Model averaging should be used
Formal methods for weighing the evidence are needed
Conclusions
General basis of theory and algorithms for predictions under manipulation
New algorithms for addressing lack of faithfulness and hidden confounding variables
The strategy can be implemented using the new and existing algorithms
Many open directions/problems
  Faithfulness
  Acyclicity
  Hidden variables
  Timed data