A Strategy for Making Predictions under Manipulation


Ioannis Tsamardinos, Assistant Professor, Computer Science Department, University of Crete; ICS, Foundation for Research and Technology - Hellas

Laura E. Brown, Ph.D. Candidate, Dept. of Biomedical Informatics, Vanderbilt University

5/10/2007 I. Tsamardinos, CSD, University of Crete

Selecting a Formulation of Causality

Causal Bayesian Networks
  Cross-sectional data
  No explicit notion of time
  No feedback cycles allowed
  Edges express causal relations
  Distribution expressed as $P(V) = \prod_i P(V_i \mid Pa(V_i))$

[Figure: example causal network over T and V1-V6]


Effect of Manipulation

Manipulate V1, V5

[Figure: the example network over T and V1-V6]


An external manipulator E becomes a parent of the manipulated variables; their other parents are removed.

[Figure: the example network over T and V1-V6 with the external manipulator node E]


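[Figure: the manipulated network over T and V1-V6 with the external manipulator node E]

Unmanipulated distribution: $P(V) = \prod_i P(V_i \mid Pa(V_i))$

Manipulated distribution: $P_M(V) = \prod_{V_i \notin M} P(V_i \mid Pa(V_i)) \prod_{V_i \in M} P_M(V_i \mid E)$

M: the set of manipulated variables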
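As a minimal sketch of the two factorizations above, the following Python snippet evaluates P(V) and P_M(V) on a hypothetical three-variable chain A -> B -> C. The network, its probabilities, and the function names are illustrative assumptions, not taken from the slides; the manipulation is assumed to set B by a fair coin flip that ignores its former parent.

```python
from itertools import product

# Hypothetical binary chain A -> B -> C (not the network drawn on the slides).
parents = {"A": (), "B": ("A",), "C": ("B",)}

# cpt[node][parent values] = P(node = 1 | parents); values chosen arbitrarily.
cpt = {
    "A": {(): 0.3},
    "B": {(0,): 0.2, (1,): 0.9},
    "C": {(0,): 0.1, (1,): 0.7},
}


def joint(assignment, manipulated=None):
    """Joint probability of a full assignment.

    `manipulated` maps each manipulated node to P_M(node = 1 | E); such a
    node ignores its original parents, as in the manipulated factorization.
    """
    manipulated = manipulated or {}
    prob = 1.0
    for node, pa in parents.items():
        if node in manipulated:
            p_one = manipulated[node]
        else:
            p_one = cpt[node][tuple(assignment[p] for p in pa)]
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob


def marginal(node, value, manipulated=None):
    """P(node = value), summing the joint over all full assignments."""
    names = list(parents)
    total = 0.0
    for values in product((0, 1), repeat=len(names)):
        assignment = dict(zip(names, values))
        if assignment[node] == value:
            total += joint(assignment, manipulated)
    return total


M = {"B": 0.5}                 # assumed manipulation: B is set by a fair coin
p_c = marginal("C", 1)         # under P(V)
p_c_m = marginal("C", 1, M)    # under P_M(V)
p_a_m = marginal("A", 1, M)    # the cause A is unaffected by manipulating B
```

With these assumed numbers, P(C = 1) moves from about 0.346 to 0.4 under the manipulation, while P(A = 1) stays at 0.3 because only the factor of the manipulated variable is replaced.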

J. Pearl. Causality: Models, Reasoning, and Inference, 2000.


Types of Predictive Tasks

A. No manipulations
B. Known set of manipulated variables M
  From data following P(V), predict data following P_M(V)
  The way manipulations are performed is unknown, i.e. the P_M(V_i | E) are unknown
C. Unknown M


The Markov Blanket of T

The set of direct causes, direct effects, and direct causes of direct effects

[Figure: the example network over T and V1-V6]


The Manipulated Markov Blanket of T

The set of direct causes, direct effects, and direct causes of direct effects in the manipulated distribution
E.g. V1 and V5

[Figure: the example network over T and V1-V6]
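The two definitions can be sketched in a few lines of Python, assuming the causal graph is known and stored as a mapping from each node to its set of parents. The DAG below is hypothetical (it is not the network in the figure), the function names are illustrative, and the external manipulator E is not modeled as a node; a manipulation is represented only by cutting the edges into the manipulated variables.

```python
def markov_blanket(parents, target):
    """Direct causes, direct effects, and direct causes of direct effects."""
    children = {v for v, pa in parents.items() if target in pa}
    spouses = {p for c in children for p in parents[c]} - {target}
    return set(parents[target]) | children | spouses


def manipulate(parents, manipulated):
    """Graph after manipulation: manipulated nodes lose their original parents."""
    return {v: (set() if v in manipulated else set(pa))
            for v, pa in parents.items()}


# Hypothetical DAG: A -> T, T -> C, S -> C, C -> D.
graph = {"A": set(), "S": set(), "T": {"A"}, "C": {"T", "S"}, "D": {"C"}}

mb = markov_blanket(graph, "T")                       # unmanipulated MB(T)
mb_m = markov_blanket(manipulate(graph, {"C"}), "T")  # MB_M(T) with M = {C}
```

In this toy graph, manipulating the effect C removes the edge T -> C, so C and its other cause S drop out of the blanket and only the direct cause A remains.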



Properties of MB(T)

The smallest-size, most-predictive subset of variables

All and only the variables we need for building optimal predictive models

I. Tsamardinos and C. F. Aliferis. Towards principled feature selection: Relevancy, Filters and Wrappers. AI & Statistics, 2003.


A. No Manipulations

Find the MB(T)
Fit a model from training data for P(T | MB(T)), using only the variables of the MB(T)


B. Known M

Find the MB_M(T)
Fit a model from training data, using only the variables of the MB_M(T)
Proposition: P_M(T | MB_M(T)) = P(T | MB_M(T)), provided that no manipulated spouse of T is a descendant of T in the unmanipulated distribution


Can Be Fit From Unmanipulated Data

[Figure: the example network over T and V1-V6]

M = {V1, V5}
P_M(T | MB_M(T)) = P(T | MB_M(T))


Cannot Be Fit From Unmanipulated Data

[Figure: the example network over T and V1-V6]

M = {V1, V4}
P_M(T | MB_M(T)) ≠ P(T | MB_M(T))


Unknown Manipulations M

Find the direct causes of T
Fit a model from training data, using only the variables that are direct causes of T
Only the direct causes remain in MB_M(T) under any manipulation
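The last claim can be checked by brute force on a small graph. The sketch below assumes a hypothetical DAG (not the slide's network), assumes T itself is never manipulated, and models a manipulation only as the removal of edges into the manipulated nodes; the function name is illustrative. It intersects MB_M(T) over every possible manipulation set M.

```python
from itertools import combinations


def manipulated_mb(parents, target, manipulated):
    """Markov blanket of `target` after cutting edges into manipulated nodes."""
    cut = {v: (set() if v in manipulated else set(pa))
           for v, pa in parents.items()}
    children = {v for v, pa in cut.items() if target in pa}
    spouses = {p for c in children for p in cut[c]} - {target}
    return cut[target] | children | spouses


# Hypothetical DAG: A -> T, T -> C, S -> C, C -> D.
graph = {"A": set(), "S": set(), "T": {"A"}, "C": {"T", "S"}, "D": {"C"}}

others = [v for v in graph if v != "T"]  # T itself is assumed unmanipulated
always_in = set(others)
for size in range(len(others) + 1):
    for m in combinations(others, size):
        # Keep only variables that survive in MB_M(T) for this choice of M.
        always_in &= manipulated_mb(graph, "T", set(m))
```

For this graph the intersection is exactly the set of direct causes of T, which is why a model built on the direct causes alone is the safe choice when M is unknown.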


Learning Bayesian Networks

Many algorithms exist that can learn the network
  Discrete data: MMHC [1]
  Mixed: Bach [2]
Find the graph, find the MB_M(T), fit a model and you are done
… or are you?

1. I. Tsamardinos, L. E. Brown, and C. F. Aliferis. Machine Learning, 65(1):31, 2006.
2. F. R. Bach and M. I. Jordan. NIPS-02.


Faithfulness and Parity Functions

All BN methods assume Faithfulness
  Causes and effects have detectable conditional pairwise associations with T
T = V1 XOR V3
  No pairwise association between T and V1

[Figure: network over T, V1, and V3]


Parity Functions in Feature Space

T = V1 XOR V2
  No pairwise association between T and V1
Construct new feature V1V2
  Pairwise associations become apparent

[Figure: the network over T, V1, and V2, before and after adding the constructed feature V1V2]
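A small numeric check of the parity argument, as a sketch: with T = V1 XOR V2 and the two causes uniformly distributed, the covariance of T with either cause alone is zero, while a constructed interaction feature is clearly associated with T. Coding the inputs as -1/+1 and taking their product is an assumption made here for illustration; the slides do not state how the feature V1V2 is built.

```python
from itertools import product

# All four equally likely configurations of two binary causes, with T = V1 XOR V2.
rows = [(v1, v2, v1 ^ v2) for v1, v2 in product((0, 1), repeat=2)]
v1 = [r[0] for r in rows]
v2 = [r[1] for r in rows]
t = [r[2] for r in rows]


def covariance(xs, ys):
    """Population covariance of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


# Assumed construction: recode 0/1 as -1/+1 and multiply the two inputs.
v1v2 = [(2 * a - 1) * (2 * b - 1) for a, b in zip(v1, v2)]

cov_v1 = covariance(v1, t)      # no pairwise association with V1
cov_v2 = covariance(v2, t)      # no pairwise association with V2
cov_feat = covariance(v1v2, t)  # the constructed feature is associated with T
```

Here the covariance with the constructed feature is -0.5, while both single-variable covariances are exactly 0, so a method relying on pairwise tests would miss V1 and V2 but detect V1V2.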


Feature Space Markov Blanket

Map data to feature space
  Brute force is inefficient
  Indirectly map to feature space using an SVM
  Assume: low SVM weight of a feature implies low association of the feature with T
  Produce only the top weighted features! (recently developed heuristic method)
Learn the Markov Blanket in feature space
  Run HITON [1]

1. C. F. Aliferis, I. Tsamardinos, and A. Statnikov. AMIA 2003.


Inducing the MB(T)

Run MMMB [1], RFE [2], FSMB [3], and no feature selection
Build predictive models
If there is a large discrepancy in predictive performance, consult FSMB
If there are “parity”-like variables, add the corresponding constructed features to the data before learning the network

1. I. Tsamardinos, C. F. Aliferis, and A. Statnikov. KDD 2003.
2. I. Guyon et al. Machine Learning, 46(1-3):389-422, 2002.
3. Submitted for publication.


Hidden Variables and Confounding

[Figure: the example network over T and V1-V6 with hidden variables H1 and H2]

H1, H2 hidden variables
Dashed edges appear in the marginal network
Marginal MB(T) shown in green


Hidden Variables and Confounding

[Figure: the example network over T and V1-V6 with hidden variables H1 and H2]

H1, H2 hidden variables
Dashed edges appear in the marginal network
Reddish edges are “removed” by manipulations
Manipulations of V5, V3 lead to errors in estimating MB_M(T) (bluish nodes)


Finding Non-Confounded Edges

[Figure: network over T, V1, V2, V3, V5, and V6]

Proposition: V = O ∪ H, where O are observable and H are not. P(V) is faithful to a Causal Bayesian Network. If
1. ∀S ⊆ O, ¬I(V1; T | S)
2. ∀S ⊆ O, ¬I(V3; T | S)
3. ∀S ⊆ O, ¬I(V5; T | S)
4. ∃Z1 ⊆ O, s.t. I(V1; V3 | Z1)
5. ∃Z2 ⊆ O, s.t. I(V1; V5 | Z2)
6. ¬I(V1; V3 | Z1 ∪ {T})
7. I(V1; V5 | Z2 ∪ {T})
then there is a causal path from T to V5 (the edge T → V5 is causal)


[Figure: the same network and proposition, with a hidden variable H added]


Finding Non-Confounded Edges

Use the test to
  Orient some edges
  Find truly causal (non-confounded) edges
Extension of the basic idea presented in [1]

1. S. Mani, P. Spirtes, and G. F. Cooper. UAI 2006.


Finding the MB_M(T)

Edge existence: BN learning algorithm
Edge orientation
  Learn the network, convert to PDAG, obtain compelled edges
  Confounding test
Edge confounding
  Confounding test
Weigh evidence and decide on orientation and absence of confounding


Finding the MB_M(T)

[Figure: network over T and V1-V7. Legend: non-confounded edges; oriented edges that could be confounded; undirected edges; manipulated nodes]

Are V7, V3 part of MB_M(T)?
Is V4 part of MB_M(T)?



Results


Limitations

Most time spent on REGED
Conditional independence tests were sometimes inappropriate
New methods not optimized or fully tested
Model averaging should be used
Formal methods for weighing the evidence are needed


Conclusions

General basis of theory and algorithms for predictions under manipulation
New algorithms for addressing lack of faithfulness and hidden confounding variables
The strategy can be implemented using the new and existing algorithms
Many open directions/problems
  Faithfulness
  Acyclicity
  Hidden variables
  Timed data
