94
Graphs for margins of Bayesian networks. Robin J. Evans www.stats.ox.ac.uk/evans Department of Statistics, University of Oxford ERCIM, Pisa December 2014 1 / 36

Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Graphs for margins of Bayesian networks.

Robin J. Evanswww.stats.ox.ac.uk/∼evans

Department of Statistics, University of Oxford

ERCIM, PisaDecember 2014

1 / 36

Page 2: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Outline

1 Introduction

2 DAGs

3 Margins of DAG Models

4 mDAGs

5 Markov Equivalence

6 Summary

2 / 36

Page 3: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Correlation does not imply causation

“Dr Matthew Hobbs, head of research for Diabetes UK, saidthere was no proof that napping actually caused diabetes.”

3 / 36

Page 4: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Correlation does not imply causation

“Dr Matthew Hobbs, head of research for Diabetes UK, saidthere was no proof that napping actually caused diabetes.”

3 / 36

Page 5: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Correlation does not imply causation

“Dr Matthew Hobbs, head of research for Diabetes UK, saidthere was no proof that napping actually caused diabetes.”

3 / 36

Page 6: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Correlation does not imply causation

“Dr Matthew Hobbs, head of research for Diabetes UK, saidthere was no proof that napping actually caused diabetes.”

3 / 36

Page 7: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Distinguishing Between Causal Models

napdiabetes

lifestyle

gene

napdiabetes

lifestyle

gene

In order to compare the models, we need to understand in whatways causal models will differ, both:

observationally;

under interventions.

4 / 36

Page 8: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Distinguishing Between Causal Models

napdiabetes

lifestyle

gene

napdiabetes

lifestyle

gene

In order to compare the models, we need to understand in whatways causal models will differ, both:

observationally;

under interventions.

4 / 36

Page 9: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Distinguishing Between Causal Models

napdiabetes

lifestyle

gene

napdiabetes

lifestyle

gene

In order to compare the models, we need to understand in whatways causal models will differ, both:

observationally;

under interventions.

4 / 36

Page 10: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Outline

1 Introduction

2 DAGs

3 Margins of DAG Models

4 mDAGs

5 Markov Equivalence

6 Summary

5 / 36

Page 11: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Outline

1 Introduction

2 DAGs

3 Margins of DAG Models

4 mDAGs

5 Markov Equivalence

6 Summary

6 / 36

Page 12: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Directed Acyclic Graphs

vertices

edges

no directed cycles

directed acyclic graph (DAG), G

4

21 3

5

If w → v then w is a parent of v : paG(4) = {1, 2}.

If w → · · · → v then w is a ancestor of v

7 / 36

Page 13: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Directed Acyclic Graphs

vertices

edges

no directed cycles

directed acyclic graph (DAG), G

4

21 3

5

If w → v then w is a parent of v : paG(4) = {1, 2}.

If w → · · · → v then w is a ancestor of v

7 / 36

Page 14: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Directed Acyclic Graphs

vertices

edges

no directed cycles

directed acyclic graph (DAG), G

4

21 3

5

If w → v then w is a parent of v : paG(4) = {1, 2}.

If w → · · · → v then w is a ancestor of v

7 / 36

Page 15: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Directed Acyclic Graphs

vertices

edges

no directed cycles

directed acyclic graph (DAG), G

4

21 3

5

If w → v then w is a parent of v : paG(4) = {1, 2}.

If w → · · · → v then w is a ancestor of v

7 / 36

Page 16: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

DAG Models

vertex random variable

a Xa

⇐⇒

4

2

graph G

1 3

5

⇐⇒ P : Xi ⊥⊥ Xpre(i) |Xpa(i)[P]

ordered local Markov property

model M(G)

So in example above:

X2 ⊥⊥ X1 X3 ⊥⊥ X1 |X2

X4 ⊥⊥ X3 |X1,X2 X5 ⊥⊥ X1,X2 |X3,X4

8 / 36

Page 17: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

DAG Models

vertex random variable

a Xa

⇐⇒

4

2

graph G

1 3

5

⇐⇒ P : Xi ⊥⊥ Xpre(i) |Xpa(i)[P]

ordered local Markov property

model M(G)

So in example above:

X2 ⊥⊥ X1 X3 ⊥⊥ X1 |X2

X4 ⊥⊥ X3 |X1,X2 X5 ⊥⊥ X1,X2 |X3,X4

8 / 36

Page 18: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

DAG Models

vertex random variable

a Xa

⇐⇒

4

2

graph G

1 3

5

⇐⇒ P : Xi ⊥⊥ Xpre(i) |Xpa(i)[P]

ordered local Markov property

model M(G)

So in example above:

X2 ⊥⊥ X1 X3 ⊥⊥ X1 |X2

X4 ⊥⊥ X3 |X1,X2 X5 ⊥⊥ X1,X2 |X3,X4

8 / 36

Page 19: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Global Markov Property

P satisfies the global Markov property if for all sets A,B,C ,

A d-separated from B by C =⇒ XA ⊥⊥ XB |XC [P].

Theorem (Lauritzen et al, 1990)

P satisfies the global Markov property if and only if it satisfies theordered local Markov property.

Point: the model can be defined in terms of ‘paths of information’.

9 / 36

Page 20: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Global Markov Property

P satisfies the global Markov property if for all sets A,B,C ,

A d-separated from B by C =⇒ XA ⊥⊥ XB |XC [P].

Theorem (Lauritzen et al, 1990)

P satisfies the global Markov property if and only if it satisfies theordered local Markov property.

Point: the model can be defined in terms of ‘paths of information’.

9 / 36

Page 21: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Global Markov Property

P satisfies the global Markov property if for all sets A,B,C ,

A d-separated from B by C =⇒ XA ⊥⊥ XB |XC [P].

Theorem (Lauritzen et al, 1990)

P satisfies the global Markov property if and only if it satisfies theordered local Markov property.

Point: the model can be defined in terms of ‘paths of information’.

9 / 36

Page 22: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Causal Interventions

If we interpret the DAG as representing structural assumptions,then if we intervene on a node, the graph of the resulting model isjust locally altered:

napdiabetes

lifestyle

gene

napdiabetes

lifestyle

gene

so if we force people to stop napping...

10 / 36

Page 23: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Causal Interventions

If we interpret the DAG as representing structural assumptions,then if we intervene on a node, the graph of the resulting model isjust locally altered:

napdiabetes

lifestyle

gene

napdiabetes

lifestyle

gene

so if we force people to stop napping...

10 / 36

Page 24: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Outline

1 Introduction

2 DAGs

3 Margins of DAG Models

4 mDAGs

5 Markov Equivalence

6 Summary

11 / 36

Page 25: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Latent ProjectionCan preserve conditional independences and causal coherence evenwith latents by considering paths. DAG G on vertices V = O∪U,define latent projection G(O) as follows: (Verma and Pearl, 1992)

Whenever there is a path of the form

x u1 · · · uk y

addx y

Whenever there is a path of the form

x u1 · · · uk y

addx y

Then remove the latent variables U from the graph.

12 / 36

Page 26: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Latent ProjectionCan preserve conditional independences and causal coherence evenwith latents by considering paths. DAG G on vertices V = O∪U,define latent projection G(O) as follows: (Verma and Pearl, 1992)

Whenever there is a path of the form

x u1 · · · uk y

addx y

Whenever there is a path of the form

x u1 · · · uk y

addx y

Then remove the latent variables U from the graph.

12 / 36

Page 27: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Latent ProjectionCan preserve conditional independences and causal coherence evenwith latents by considering paths. DAG G on vertices V = O∪U,define latent projection G(O) as follows: (Verma and Pearl, 1992)

Whenever there is a path of the form

x u1 · · · uk y

addx y

Whenever there is a path of the form

x u1 · · · uk y

addx y

Then remove the latent variables U from the graph.

12 / 36

Page 28: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Latent ProjectionCan preserve conditional independences and causal coherence evenwith latents by considering paths. DAG G on vertices V = O∪U,define latent projection G(O) as follows: (Verma and Pearl, 1992)

Whenever there is a path of the form

x u1 · · · uk y

addx y

Whenever there is a path of the form

x u1 · · · uk y

addx y

Then remove the latent variables U from the graph.12 / 36

Page 29: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

ADMGs

u

x

y

z

w t

−→project

x

y

z

t

Latent projection leads to an acyclic directed mixed graph(ADMG) (equivalent to summary graph without undirected edges).

Can read off independences with d/m-separation. Like an ancestralgraph, these are precisely observable independences from theoriginal DAG. See Richardson (2003) for more details.

In addition, can see that projection preserves the causal structure;Verma and Pearl (1992).

13 / 36

Page 30: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

ADMGs

u

x

y

z

w t

−→project

x

y

z

t

Latent projection leads to an acyclic directed mixed graph(ADMG) (equivalent to summary graph without undirected edges).

Can read off independences with d/m-separation. Like an ancestralgraph, these are precisely observable independences from theoriginal DAG. See Richardson (2003) for more details.

In addition, can see that projection preserves the causal structure;Verma and Pearl (1992).

13 / 36

Page 31: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

ADMGs

u

x

y

z

w t

−→project

x

y

z

t

Latent projection leads to an acyclic directed mixed graph(ADMG) (equivalent to summary graph without undirected edges).

Can read off independences with d/m-separation. Like an ancestralgraph, these are precisely observable independences from theoriginal DAG. See Richardson (2003) for more details.

In addition, can see that projection preserves the causal structure;Verma and Pearl (1992).

13 / 36

Page 32: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Causal Coherence

If we intervene on some observed variables, this ‘breaks’ theirdependence upon their parents.

1 2

3u

4

w

1

−→intervene

2

3u

4

w

1 2

3

4

↓ project

1

−→intervene

2

3

4

↓ project

14 / 36

Page 33: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Causal Coherence

If we intervene on some observed variables, this ‘breaks’ theirdependence upon their parents.

1 2

3u

4

w

1

−→intervene

2

3u

4

w

1 2

3

4

↓ project

1

−→intervene

2

3

4

↓ project

14 / 36

Page 34: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Causal Coherence

If we intervene on some observed variables, this ‘breaks’ theirdependence upon their parents.

1 2

3u

4

w

1

−→intervene

2

3u

4

w

1 2

3

4

↓ project

1

−→intervene

2

3

4

↓ project

14 / 36

Page 35: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Causal Coherence

If we intervene on some observed variables, this ‘breaks’ theirdependence upon their parents.

1 2

3u

4

w

1

−→intervene

2

3u

4

w

1 2

3

4

↓ project

1

−→intervene

2

3

4

↓ project

14 / 36

Page 36: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Causal Coherence

If we intervene on some observed variables, this ‘breaks’ theirdependence upon their parents.

1 2

3u

4

w

1

−→intervene

2

3u

4

w

1 2

3

4

↓ project

1

−→intervene

2

3

4

↓ project

14 / 36

Page 37: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

ADMGs are not sufficient

u

x

y

z

G1

x

yG2

z

u

v w

These have the same latent projection:

x

y

z

But the model over (x , y , z) in G2 is not saturated. Still true if wedichotomize.

15 / 36

Page 38: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

ADMGs are not sufficient

u

x

y

z

G1

x

yG2

z

u

v w

These have the same latent projection:

x

y

z

But the model over (x , y , z) in G2 is not saturated. Still true if wedichotomize.

15 / 36

Page 39: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

ADMGs are not sufficient

u

x

y

z

G1

x

yG2

z

u

v w

These have the same latent projection:

x

y

z

But the model over (x , y , z) in G2 is not saturated. Still true if wedichotomize.

15 / 36

Page 40: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

The Problem

Verma and Pearl’s latent projection only uses paths, which areinherently ‘pairwise’;

the paths are objects suited to conditional independence, butnot all constraints on margins are conditional independences;

ADMGs are not a sufficiently rich class of graphs to capturethe different models one can obtain.

16 / 36

Page 41: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Outline

1 Introduction

2 DAGs

3 Margins of DAG Models

4 mDAGs

5 Markov Equivalence

6 Summary

17 / 36

Page 42: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Structural Equation Model View

There is another way to think about DAG models (e.g. Lauritzenet al, 1990).

P ∈M(G) iff there exist functions fi and independent variables Ei

such that recursively setting

Xi = fi (Xpa(i),Ei )

gives XV the distribution P.

4

21 3

5

X1 = f1(E1)X2 = f2(E2)X3 = f3(X2,E3)X4 = f4(X1,X2,E4)X5 = f5(X3,X4,E5).

18 / 36

Page 43: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Structural Equation Model View

There is another way to think about DAG models (e.g. Lauritzenet al, 1990).

P ∈M(G) iff there exist functions fi and independent variables Ei

such that recursively setting

Xi = fi (Xpa(i),Ei )

gives XV the distribution P.

4

21 3

5

X1 = f1(E1)X2 = f2(E2)X3 = f3(X2,E3)X4 = f4(X1,X2,E4)X5 = f5(X3,X4,E5).

18 / 36

Page 44: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Simplifications

Simplification 1. WLOG latent vertices have no parents.

x1 x2

u

y2y1 y3

M=

x1 x2

y2y1 y3

u′

(Of course, this is not true if we assume a specific state-space: e.g.phylogenetic model)

19 / 36

Page 45: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Simplifications

Simplification 1. WLOG latent vertices have no parents.

x1 x2

u

y2y1 y3

M=

x1 x2

y2y1 y3

u′

(Of course, this is not true if we assume a specific state-space: e.g.phylogenetic model)

19 / 36

Page 46: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Simplifications

Simplification 1. WLOG latent vertices have no parents.

x1 x2

u

y2y1 y3

M=

x1 x2

y2y1 y3

u′

(Of course, this is not true if we assume a specific state-space: e.g.phylogenetic model)

19 / 36

Page 47: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Simplifications

Simplification 2. If u,w are latent with chG(w) ⊆ chG(u), thenwe don’t need w .

x1 x2 x3

u

w

M=

x1 x2 x3

u,w

20 / 36

Page 48: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Simplifications

Simplification 2. If u,w are latent with chG(w) ⊆ chG(u), thenwe don’t need w .

x1 x2 x3

u

w

M=

x1 x2 x3

u,w

20 / 36

Page 49: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

mDAGsSo we only need to consider models like this:

c

e

da

u v

w

fb

fb

...which we represent with a hyper-graph called an mDAG.

Formally, an mDAG on V is a DAG (in blue), together with someinclusion maximal collection subsets of size at least 2 (red).

Going backwards and replacing bidirected edges with latents givesus the canonical DAG G.

21 / 36

Page 50: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

mDAGsSo we only need to consider models like this:

c

e

da

u v

w

fb

fb

...which we represent with a hyper-graph called an mDAG.

Formally, an mDAG on V is a DAG (in blue), together with someinclusion maximal collection subsets of size at least 2 (red).

Going backwards and replacing bidirected edges with latents givesus the canonical DAG G.

21 / 36

Page 51: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

mDAGsSo we only need to consider models like this:

c

e

da

u v

w

fb

fb

...which we represent with a hyper-graph called an mDAG.

Formally, an mDAG on V is a DAG (in blue), together with someinclusion maximal collection subsets of size at least 2 (red).

Going backwards and replacing bidirected edges with latents givesus the canonical DAG G.

21 / 36

Page 52: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

mDAGsSo we only need to consider models like this:

c

e

da

u v

w

fb

fb

...which we represent with a hyper-graph called an mDAG.

Formally, an mDAG on V is a DAG (in blue), together with someinclusion maximal collection subsets of size at least 2 (red).

Going backwards and replacing bidirected edges with latents givesus the canonical DAG G.

21 / 36

Page 53: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Markov Properties

c

e

da

u v

w

fb

fb

Given an mDAG G and distribution P, say P is in the completemodel for G, or P ∈M(G) if it is the margin of some distributionin the model for the canonical DAG M(G).

There are other (weaker) properties Shpitser et al. (2014).

22 / 36

Page 54: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Markov Properties

c

e

da

u v

w

fb

fb

Given an mDAG G and distribution P, say P is in the completemodel for G, or P ∈M(G) if it is the margin of some distributionin the model for the canonical DAG M(G).

There are other (weaker) properties Shpitser et al. (2014).

22 / 36

Page 55: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Latent ProjectionFor mDAG G and subset of vertices O, form latent projectionp(G,O) by:Whenever there is a path of the form

x u1 · · · uk y

addx y

Whenever there is a maximal set B = {x1, x2, . . . , xk} such thatthese variables share a hidden common cause, add hyper-edge B.

Then remove the latent variables U from the graph.

u

x

y

z

w t

−→project

x

y

z

t

23 / 36

Page 56: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Latent ProjectionFor mDAG G and subset of vertices O, form latent projectionp(G,O) by:Whenever there is a path of the form

x u1 · · · uk y

addx y

Whenever there is a maximal set B = {x1, x2, . . . , xk} such thatthese variables share a hidden common cause, add hyper-edge B.

Then remove the latent variables U from the graph.

u

x

y

z

w t

−→project

x

y

z

t

23 / 36

Page 57: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Latent ProjectionFor mDAG G and subset of vertices O, form latent projectionp(G,O) by:Whenever there is a path of the form

x u1 · · · uk y

addx y

Whenever there is a maximal set B = {x1, x2, . . . , xk} such thatthese variables share a hidden common cause, add hyper-edge B.

Then remove the latent variables U from the graph.

u

x

y

z

w t

−→project

x

y

z

t

23 / 36

Page 58: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Outline

1 Introduction

2 DAGs

3 Margins of DAG Models

4 mDAGs

5 Markov Equivalence

6 Summary

24 / 36

Page 59: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Results

The mDAG latent projection preserves the distinction betweenmodels.

Theorem (Evans, 2014)

If p(G,O) = p(G′,O) then the models induced by M(G) andM(G′) on the margin O are the same.

So the problem which arises with ADMGs never occurs for mDAGs.

Theorem (Evans, 2014)

If C ⊆ O then p(GC ,O) = p(G,O)C ; i.e. the projection respectscausal interventions.

25 / 36

Page 60: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Results

The mDAG latent projection preserves the distinction betweenmodels.

Theorem (Evans, 2014)

If p(G,O) = p(G′,O) then the models induced by M(G) andM(G′) on the margin O are the same.

So the problem which arises with ADMGs never occurs for mDAGs.

Theorem (Evans, 2014)

If C ⊆ O then p(GC ,O) = p(G,O)C ; i.e. the projection respectscausal interventions.

25 / 36

Page 61: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Instrumental Variables

The Instrumental Variables model assumes causally exogenousvariable z affects the treatment x .

z x y

But it’s well known that this is observationally indistinguishablefrom a hidden common cause for x and z (e.g. Didelez andSheehan, 2007).

z x y z x y

26 / 36

Page 62: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Instrumental Variables

The Instrumental Variables model assumes causally exogenousvariable z affects the treatment x .

z x y

But it’s well known that this is observationally indistinguishablefrom a hidden common cause for x and z (e.g. Didelez andSheehan, 2007).

z x y z x y

26 / 36

Page 63: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Instrumental Variables

To see this, imagine z is an exact copy of z ′.

z

z ′

x y

Doesn’t really matter whether x gets information from z or z ′.

Very hard to see this equivalence with conditional independence.

27 / 36

Page 64: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Instrumental Variables

To see this, imagine z is an exact copy of z ′.

z

z ′

x y

Doesn’t really matter whether x gets information from z or z ′.

Very hard to see this equivalence with conditional independence.

27 / 36

Page 65: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Instrumental Variables

To see this, imagine z is an exact copy of z ′.

z

z ′

x y

Doesn’t really matter whether x gets information from z or z ′.

Very hard to see this equivalence with conditional independence.

27 / 36

Page 66: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Instrument Generalisation

Let G have bidirected edge B = C ∪D with:

(i) every c ∈ C contained in no other bidirected edge;

(ii) paG(d) ⊇ paG(C ) for each d ∈ D.

Can ‘split’ B into C and D and add edges c → d where necessary.

c1

d1 d2

c2

a b

28 / 36

Page 67: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Instrument Generalisation

Let G have bidirected edge B = C ∪D with:

(i) every c ∈ C contained in no other bidirected edge;

(ii) paG(d) ⊇ paG(C ) for each d ∈ D.

Can ‘split’ B into C and D and add edges c → d where necessary.

c1

d1 d2

c2

a b

28 / 36

Page 68: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Instrument Generalisation

Let G have bidirected edge B = C ∪D with:

(i) every c ∈ C contained in no other bidirected edge;

(ii) paG(d) ⊇ paG(C ) for each d ∈ D.

Can ‘split’ B into C and D and add edges c → d where necessary.

c1

d1 d2

c2

a b

28 / 36

Page 69: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Skeletons

Define the skeleton of two mDAGs as the undirected graph withv − w whenever v and w are contained in some edge together.

a

c

b

d

a

c

b

d

Theorem

mDAGs with different skeletons induce different models in general.

(Consequence of Theorem 4.2 of Evans, 2012)

29 / 36

Page 70: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Skeletons

Define the skeleton of two mDAGs as the undirected graph withv − w whenever v and w are contained in some edge together.

a

c

b

d

a

c

b

d

Theorem

mDAGs with different skeletons induce different models in general.

(Consequence of Theorem 4.2 of Evans, 2012)

29 / 36

Page 71: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Skeletons

Define the skeleton of two mDAGs as the undirected graph withv − w whenever v and w are contained in some edge together.

a

c

b

d

a

c

b

d

Theorem

mDAGs with different skeletons induce different models in general.

(Consequence of Theorem 4.2 of Evans, 2012)

29 / 36

Page 72: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Equivalence on Three Variables

Combining the previous results, there are 8 Markov equivalenceclasses on three variables.

30 / 36

Page 73: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

But Not on Four!

On four variables, it’s still not clear whether or not the followingmodels are saturated: (they are of full dimension in the discretecase)

1 2

3 41 2 4

3

1 2 3 4

31 / 36

Page 74: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Outline

1 Introduction

2 DAGs

3 Margins of DAG Models

4 mDAGs

5 Markov Equivalence

6 Summary

32 / 36

Page 75: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Summary

We have seen that:

graphs with ‘ordinary’ edges can give a causally coherentrepresentation of marginal models;

but: ordinary mixed graphs are not rich enough to representall models;

mDAGs provide the most general necessary framework forrepresenting causal DAGs under marginalization;

general Markov equivalence in this class is hard, but we’regetting there!

33 / 36

Page 76: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Thank you!

34 / 36

Page 77: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Main References

Didelez and Sheehan. Mendelian Randomisation: Why Epidemiologyneeds a Formal Language for Causality, SMMR, 2007.

Evans. Graphical methods for inequality constraints in marginalizedDAGs, MLSP, 2012.

Evans. Margins of Bayesian networks. Draft, 2014.

Evans. Graphs for margins of Bayesian networks. arXiv:1408.1809, 2014.

Pearl. Causality. Cambridge, 2000.

Pearl and Verma. A statistical semantics for causation, Stats andComputing, 1992.

Richardson. Markov properties for ADMGs, SJS, 2003.

Richardson and Spirtes. Ancestral graph Markov models, Ann. Stat.,2002.

Shpitser et al. Introduction to nested Markov models. Behaviormetrika,2014.

Verma. Invariant properties of causal models, Tech Report R-134, UCLACognitive Systems Laboratory, 1991.

35 / 36

Page 78: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

d-Separation

A path is a sequence of edges in the graph; vertices may not berepeated.

A path from a to b is blocked by C ⊆ V \ {a, b} if either

(i) any non-collider is in C :

c c

(ii) or any collider is not in C , nor has descendants in C :

d d

e

Two vertices a and b are d-separated given C ⊆ V \ {a, b} if allpaths are blocked.

36 / 36

Page 79: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

d-Separation

A path is a sequence of edges in the graph; vertices may not berepeated.

A path from a to b is blocked by C ⊆ V \ {a, b} if either

(i) any non-collider is in C :

c c

(ii) or any collider is not in C , nor has descendants in C :

d d

e

Two vertices a and b are d-separated given C ⊆ V \ {a, b} if allpaths are blocked.

36 / 36

Page 80: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

d-Separation

A path is a sequence of edges in the graph; vertices may not berepeated.

A path from a to b is blocked by C ⊆ V \ {a, b} if either

(i) any non-collider is in C :

c c

(ii) or any collider is not in C , nor has descendants in C :

d d

e

Two vertices a and b are d-separated given C ⊆ V \ {a, b} if allpaths are blocked.

36 / 36

Page 81: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

d-Separation

A path is a sequence of edges in the graph; vertices may not berepeated.

A path from a to b is blocked by C ⊆ V \ {a, b} if either

(i) any non-collider is in C :

c c

(ii) or any collider is not in C , nor has descendants in C :

d d

e

Two vertices a and b are d-separated given C ⊆ V \ {a, b} if allpaths are blocked.

36 / 36

Page 82: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Inequality Results

Z X

U

Y

p(x , y | z) =

∫p(u) p(x | z , u) · p(y | x , u) du

Let p∗(x , y | z) ≡∫

p(u) p(x | z , u) · p(y | x = 0, u) du

Can’t observe p∗ but:

Compatibility: p(0, y | z) = p∗(0, y | z) for each z , y ; and

Independence: Y ⊥⊥ Z under p∗.

This ‘compatibility’ requirement turns out to place an inequalityrestriction on p: max

x

∑y

maxz

p(x , y | z) ≤ 1.

37 / 36

Page 83: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Inequality Results

Z X

U

Y

p(x , y | z) =

∫p(u) p(x | z , u) · p(y | x , u) du

Let p∗(x , y | z) ≡∫

p(u) p(x | z , u) · p(y | x = 0, u) du

Can’t observe p∗ but:

Compatibility: p(0, y | z) = p∗(0, y | z) for each z , y ; and

Independence: Y ⊥⊥ Z under p∗.

This ‘compatibility’ requirement turns out to place an inequalityrestriction on p: max

x

∑y

maxz

p(x , y | z) ≤ 1.

37 / 36

Page 84: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Inequality Results

Z X

U

Y

p(x , y | z) =

∫p(u) p(x | z , u) · p(y | x , u) du

Let p∗(x , y | z) ≡∫

p(u) p(x | z , u) · p(y | x = 0, u) du

Can’t observe p∗ but:

Compatibility: p(0, y | z) = p∗(0, y | z) for each z , y ; and

Independence: Y ⊥⊥ Z under p∗.

This ‘compatibility’ requirement turns out to place an inequalityrestriction on p: max

x

∑y

maxz

p(x , y | z) ≤ 1.

37 / 36

Page 85: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Inequality Results

Z X

U

Y

p(x , y | z) =

∫p(u) p(x | z , u) · p(y | x , u) du

Let p∗(x , y | z) ≡∫

p(u) p(x | z , u) · p(y | x = 0, u) du

Can’t observe p∗ but:

Compatibility: p(0, y | z) = p∗(0, y | z) for each z , y ; and

Independence: Y ⊥⊥ Z under p∗.

This ‘compatibility’ requirement turns out to place an inequalityrestriction on p: max

x

∑y

maxz

p(x , y | z) ≤ 1.

37 / 36

Page 86: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Inequality Results

Z X

U

Y

p(x , y | z) =

∫p(u) p(x | z , u) · p(y | x , u) du

Let p∗(x , y | z) ≡∫

p(u) p(x | z , u) · p(y | x = 0, u) du

Can’t observe p∗ but:

Compatibility: p(0, y | z) = p∗(0, y | z) for each z , y ; and

Independence: Y ⊥⊥ Z under p∗.

This ‘compatibility’ requirement turns out to place an inequalityrestriction on p: max

x

∑y

maxz

p(x , y | z) ≤ 1.

37 / 36

Page 87: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Inequality Results

Generalizing this argument, we find a rich theory of results oninequalities (Evans, 2012).

However these results are not exhaustive!Finding all inequality constraints in marginal models is probably anNP hard problem.

Additionally:

fitting models with inequality constraints is not trivial;

the usual asymptotic results do not necessarily apply.

Maybe the nested model is a good compromise!

38 / 36

Page 88: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Inequality Results

Generalizing this argument, we find a rich theory of results oninequalities (Evans, 2012).

However these results are not exhaustive!Finding all inequality constraints in marginal models is probably anNP hard problem.

Additionally:

fitting models with inequality constraints is not trivial;

the usual asymptotic results do not necessarily apply.

Maybe the nested model is a good compromise!

38 / 36

Page 89: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Inequality Results

Generalizing this argument, we find a rich theory of results oninequalities (Evans, 2012).

However these results are not exhaustive!Finding all inequality constraints in marginal models is probably anNP hard problem.

Additionally:

fitting models with inequality constraints is not trivial;

the usual asymptotic results do not necessarily apply.

Maybe the nested model is a good compromise!

38 / 36

Page 90: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Inequality Results

Generalizing this argument, we find a rich theory of results oninequalities (Evans, 2012).

However these results are not exhaustive!Finding all inequality constraints in marginal models is probably anNP hard problem.

Additionally:

fitting models with inequality constraints is not trivial;

the usual asymptotic results do not necessarily apply.

Maybe the nested model is a good compromise!

38 / 36

Page 91: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

ADMGs are not sufficient

In general we need to distinguish between {1, 2, 3} and {1, 2},{1, 3}, {2, 3}.

X1

X2

X3 X1

X2

X3

The model on the right is not saturated. Still true if wedichotomize.

39 / 36

Page 92: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

ADMGs are not sufficient

In general we need to distinguish between {1, 2, 3} and {1, 2},{1, 3}, {2, 3}.

X1

X2

X3 X1

X2

X3

The model on the right is not saturated. Still true if wedichotomize.

39 / 36

Page 93: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

ADMGs are not sufficient

Lemma

Let F , G, H be mutually independent σ-algebrae (so thatF ⊥⊥ G ∨H and so on), and let X , Y and Z be random variablessuch that

(i) X is F ∨ G-measureable;

(ii) Y is G ∨ H-measureable;

(iii) Z is F ∨H-measureable.

Then P(X = Y = Z ) > 1− ε implies

VarX < 3ε.

40 / 36

Page 94: Graphs for margins of Bayesian networks.evans/ERCIM.pdf · lifestyle gene diabetes nap lifestyle gene In order to compare the models, we need to understand in what ways causal models

Causal Equivalence

The two mDAGs below are Markov equivalent, and lead to thesame graph under any ordinary causal intervention.

1

2

3 1

2

3

41 / 36