
Causal models and how to refute them.

Robin Evans
University of Oxford

www.stats.ox.ac.uk/∼evans

Statistics Seminar, University of York
26th November 2015

1 / 42

Acknowledgements

Thomas Richardson (U. of Washington)

James Robins (Harvard)

Ilya Shpitser (Johns Hopkins)

2 / 42

Correlation does not imply causation

“Dr Matthew Hobbs, head of research for Diabetes UK, said there was no proof that napping actually caused diabetes.”

3 / 42

Distinguishing Between Causal Models

Causality is best inferred from experiments. But doing experiments is hard (expensive, impractical, unethical...)

Collecting observational data is cheap. Can we still tell what causes what from observational data?

T → S ← C

p(t, s, c) = p(t)p(c)p(s | t, c)

T ⊥⊥ C

T → S → C

p(t, s, c) = p(t)p(s | t)p(c | s)

T ⊥⊥ C |S

Sometimes!

This is the basis of some causal search algorithms (e.g. PC, FCI). Note: other methods (e.g. integer programming) are also used.

In order to do this well, we need to understand in what ways causal models will be observationally different.

When everything is observed this is (mathematically) easy.

4 / 42
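
An illustrative sketch (not from the talk), assuming linear-Gaussian versions of the two graphs above: simulated data distinguishes T → S ← C from T → S → C by where the (conditional) independence between T and C lies.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Graph 1: T -> S <- C, i.e. p(t, s, c) = p(t) p(c) p(s | t, c)
t1, c1 = rng.normal(size=n), rng.normal(size=n)
s1 = t1 + c1 + rng.normal(size=n)

# Graph 2: T -> S -> C, i.e. p(t, s, c) = p(t) p(s | t) p(c | s)
t2 = rng.normal(size=n)
s2 = t2 + rng.normal(size=n)
c2 = s2 + rng.normal(size=n)

def partial_corr(x, y, z):
    """Correlation of x and y after regressing each on z (linear-Gaussian case)."""
    rx = x - np.polyval(np.polyfit(z, x, 1), z)
    ry = y - np.polyval(np.polyfit(z, y, 1), z)
    return np.corrcoef(rx, ry)[0, 1]

# Graph 1: T and C are independent marginally, but dependent given S (collider).
print(np.corrcoef(t1, c1)[0, 1], partial_corr(t1, c1, s1))
# Graph 2: T and C are dependent marginally, but independent given S.
print(np.corrcoef(t2, c2)[0, 1], partial_corr(t2, c2, s2))
```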

Hidden Variables

Instrumental variables:

[Figure: instrumental-variable graph on T, S, C with a latent confounder U.]

Dynamic treatment model / longitudinal exposure:

[Figure: longitudinal graph on treatments T1, T2 and covariates/outcomes X1, X2, with a latent confounder U.]

Principal aims:

be able to test causal models;

identify and bound causal effects;

use constraints for model search.

5 / 42

The Holy Grail: Structure Learning

Truth:

X Y Z

Given a distribution P (or rather data from P) and a set of possible causal models...

[Figure: the twelve candidate causal structures on X, Y, Z under consideration.]

...return list of models which are compatible with data.

6 / 42

Experimental Design

Truth:

X Y Z

We could then identify an experiment to distinguish remaining models:

[Figure: the four remaining candidate graphs on X, Y, Z; intervening on X distinguishes them.]

...return list of models which are compatible with data.

To do this we need to know what constraints the model places on the distribution (the focus of this talk).

7 / 42

Outline

1 DAG Models

2 An Example

3 mDAGs

4 Inequalities

5 Testing, Fitting and Searching

8 / 42

Directed Acyclic Graphs

vertices

edges

no directed cycles

directed acyclic graph (DAG), G

[Figure: DAG G on vertices 1, ..., 5 with edges 1 → 4, 2 → 4, 2 → 3, 3 → 5, 4 → 5.]

If w → v then w is a parent of v : paG(4) = {1, 2}.

If w → · · · → v then w is an ancestor of v. An ancestral set contains all its own ancestors.

9 / 42

DAG Models (aka Bayesian Networks)

vertex a ⇐⇒ random variable Xa

graph G [Figure: the DAG from the previous slide] ⇐⇒ model M(G):

p(x1, . . . , xk) = ∏i p(xi | xpa(i)) (factorization)

So in example above:

p(xV ) = p(x1) · p(x2) · p(x3 | x2) · p(x4 | x1, x2) · p(x5 | x3, x4)

10 / 42
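
An illustrative sketch (not from the talk), using hypothetical random binary conditional probability tables for the example DAG: multiplying the factors p(xi | xpa(i)) automatically yields a proper joint distribution.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

# Parent sets of the example DAG: 1, 2 -> 4; 2 -> 3; 3, 4 -> 5.
parents = {1: (), 2: (), 3: (2,), 4: (1, 2), 5: (3, 4)}

# Hypothetical random tables P(X_i = 1 | parent configuration), all variables binary.
cpt = {i: rng.uniform(size=2 ** len(pa)) for i, pa in parents.items()}

def joint(x):
    """p(x_V) = prod_i p(x_i | x_pa(i)) for a full binary assignment x = {i: 0/1}."""
    p = 1.0
    for i, pa in parents.items():
        idx = int("".join(str(x[j]) for j in pa), 2) if pa else 0
        p1 = cpt[i][idx]
        p *= p1 if x[i] == 1 else 1 - p1
    return p

# The factorization defines a proper distribution: the 2^5 probabilities sum to one.
total = sum(joint(dict(zip(parents, vals)))
            for vals in itertools.product([0, 1], repeat=5))
print(round(total, 10))  # 1.0
```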

DAG Models

Can also define model as a list of conditional independences:

[Figure: the same DAG G.] Pick a topological ordering of the graph: 1, 2, 3, 4, 5.

Can always factorize a joint distribution as:

p(xV) = p(x1) · p(x2 | x1) · p(x3 | x1, x2) · p(x4 | x1, x2, x3) · p(x5 | x1, x2, x3, x4).

The model is the same as setting

p(xi | x1, x2, . . . , xi−1) = p(xi | xpa(i)), for each i .

Thus M(G) is precisely the set of distributions such that:

Xi ⊥⊥ X[i−1]\pa(i) |Xpa(i), i ∈ V .

This is a constraint-based perspective.

11 / 42
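
An illustrative sketch (not from the talk): the defining independences Xi ⊥⊥ X[i−1]\pa(i) | Xpa(i) can be read off mechanically from the parent sets and a topological ordering; the encoding of the example DAG below is hypothetical.

```python
# Parent sets of the example DAG and a topological ordering of its vertices.
parents = {1: set(), 2: set(), 3: {2}, 4: {1, 2}, 5: {3, 4}}
order = [1, 2, 3, 4, 5]

for k, i in enumerate(order):
    predecessors = set(order[:k])
    indep = predecessors - parents[i]       # earlier variables not among the parents
    if indep:
        print(f"X{i} _||_ {sorted(indep)} | {sorted(parents[i])}")
# X3 _||_ [1] | [2]
# X4 _||_ [3] | [1, 2]
# X5 _||_ [1, 2] | [3, 4]
```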

Causal Models

A DAG can also encode causal information:

[Figure: the same DAG G.]

If we intervene to experiment on X4, just delete incoming edges.

In distribution, just delete factor corresponding to X4:

p(x1, x2, x3, x4, x5) = p(x1) · p(x2) · p(x3 | x2) · p(x4 | x1, x2) · p(x5 | x3, x4).

p(x1, x2, x3, x5 | do(x4)) = p(x1) · p(x2) · p(x3 | x2) · p(x5 | x3, x4).

All other terms preserved.

12 / 42
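
An illustrative sketch (not from the talk), using hypothetical random conditional tables for the example DAG: dropping the factor for X4 implements the intervention, and the result generally differs from simply conditioning on X4.

```python
import numpy as np

rng = np.random.default_rng(2)

def random_cpt(*shape):
    """Random conditional table; the child variable is the last axis."""
    t = rng.uniform(size=shape)
    return t / t.sum(axis=-1, keepdims=True)

# Factors of the example DAG, all variables binary (axes: parents..., child).
p1, p2 = random_cpt(2), random_cpt(2)    # p(x1), p(x2)
p3 = random_cpt(2, 2)                    # p(x3 | x2)
p4 = random_cpt(2, 2, 2)                 # p(x4 | x1, x2)
p5 = random_cpt(2, 2, 2)                 # p(x5 | x3, x4)

# Observational joint, and the truncated factorization that omits p(x4 | x1, x2).
joint = np.einsum('a,b,bc,abd,cde->abcde', p1, p2, p3, p4, p5)
do_x4 = np.einsum('a,b,bc,cde->abcde', p1, p2, p3, p5)

# For each value of do(x4) the truncated product is a proper distribution...
print(do_x4.sum(axis=(0, 1, 2, 4)))          # [1. 1.]
# ...and it differs from the ordinary conditional p(. | x4), since X4 has parents.
cond = joint / joint.sum(axis=(0, 1, 2, 4), keepdims=True)
print(np.allclose(do_x4, cond))              # False (in general)
```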

Outline

1 DAG Models

2 An Example

3 mDAGs

4 Inequalities

5 Testing, Fitting and Searching

13 / 42

Marginalization

Very often causal models include random quantities that we cannot observe.

Wisconsin Longitudinal Study:

over 10,000 Wisconsin high-school graduates from 1957;

data on primary respondents collected in 1957, 1975, 1992, 2004.

Suppose we want to know whether drafting has an impact on future earnings, controlling for education/family background.

X family income in 1957;

E education level;

M drafted into military;

Y respondent income 1992;

U unmeasured confounding.

[Figure: mDAG with edges X → E → M → Y, X → Y, and a latent U → E, U → Y.]

14 / 42

Marginalization

Note we don’t want to make assumptions about U; so this is not a latent variable model in the usual sense.

Model is defined (implicitly) by an integral:

p(x, e, m, y) = ∫ p(u) p(x) p(e | x, u) p(m | e) p(y | x, m, u) du

No state-space is assumed for the hidden variable (though uniform on (0, 1) is sufficient).

But how can we

characterize the model?

test membership of the model?

fit it to data?

We aim to study the set of distributions constructed in this way.

Strategy: study constraints satisfied by these models.

15 / 42

Latent Variable Models

Traditional latent variable models would assume that the hidden variables are (e.g.) Gaussian, or discrete with some fixed number of states.

Advantages: can fit fairly easily (e.g. EM algorithm, Monte Carlo).

[Figure: latent variable model with hidden H1, H2, H3 and observed X1, . . . , X5.]

But:

assumptions may be wrong!

latent variables lead to singularities and nasty statistical properties (see e.g. Drton, Sturmfels and Sullivant, 2009)

16 / 42

Getting the Picture

[Figure: nested regions L (example latent variable model) ⊆ M (marginal model) ⊆ N (nested model).]

17 / 42

Outline

1 DAG Models

2 An Example

3 mDAGs

4 Inequalities

5 Testing, Fitting and Searching

18 / 42

The Marginal Model

Can represent any causal model with hidden variables in the following compact format; we call this an mDAG (Evans, 2015).

[Figure: an mDAG on observed vertices A, B, C, D, E, F, with latent variables U, V, W drawn as red hyperedges.]

Only observed variables appear on the graph G; latent variables are represented by red hyperedges.

Can put the latents back: call this the canonical DAG Ḡ.

P satisfies the marginal Markov property for G if it is the margin of some distribution in M(Ḡ).

The marginal model is denoted M(G).

19 / 42

Model Description

We can write down a causal model, and collapse it to an mDAG, representing its margin.

But the definition of the marginal model is implicit:

p(x1, x2, x3, x4) = ∫ p(u) p(x1) p(x2 | x1, u) p(x3 | x2) p(x4 | x3, u) du

Actually determining whether or not a distribution satisfies the marginal Markov property is hard.

Our strategy:

derive some properties satisfied by the marginal model;

define a new (larger) model that satisfies these properties;

work with the larger model.

20 / 42

Ancestral Sets

[Figure: DAG 1 → 2 → 3 → 4 with a latent U → 2 and U → 4; the ancestral set {1, 2, 3} omits the childless vertex 4.]

p(x1, x2, x3)
= ∫x4 ∫u p(u) p(x1) p(x2 | x1, u) p(x3 | x2) p(x4 | x3, u) du dx4
= ∫u p(u) p(x1) p(x2 | x1, u) p(x3 | x2) [∫x4 p(x4 | x3, u) dx4] du
= ∫u p(u) p(x1) p(x2 | x1, u) p(x3 | x2) du
= p(x1) p(x3 | x2) ∫u p(u) p(x2 | x1, u) du
= p(x1) p(x3 | x2) p(x2 | x1).

Density has form corresponding to ancestral sub-graph.

21 / 42

Factorization into Districts

A district is a maximal set connected by latent variables / bidirected edges:

[Figure: mDAG with edges 1 → 3, 2 → 4, 3 → 5, a latent U → 1, U → 2 and a latent V → 3, V → 4; districts {1, 2}, {3, 4} and {5}.]

∫u,v p(u) p(x1 | u) p(x2 | u) p(v) p(x3 | x1, v) p(x4 | x2, v) p(x5 | x3) du dv
= [∫u p(u) p(x1 | u) p(x2 | u) du] · [∫v p(v) p(x3 | x1, v) p(x4 | x2, v) dv] · p(x5 | x3)
= q12(x1, x2) · q34(x3, x4 | x1, x2) · q5(x5 | x3)
= ∏i qDi(xDi | xpa(Di)\Di)

Each qD piece should come from the model based on district D and its parents (G[D]).

22 / 42

Nested Model

We use these two rules to define our model.

Say a (conditional) probability distribution p recursively factorizes according to the mDAG G, and write p ∈ N (G), if:

1. Ancestrality. ∫xv p(xV | xW) dxv ∈ N (G−v) for each childless v ∈ V.

2. Factorization into districts. p(xV | xW) = ∏D qD(xD | xpa(D)\D) for districts D, where qD ∈ N (G[D]).

Note that one can iterate between 1 and 2.

This defines the nested Markov model N (G). (Shpitser et al., 2014)

23 / 42

Example

[Figure: the mDAG X → E → M → Y, with X → Y and a latent common cause of E and Y.]

Y is childless, so if p ∈ N (G), then

p(x, e, m) = p(x) · p(e | x) · p(m | e),

and therefore X ⊥⊥ M | E.

24 / 42

Example

[Figure: the same mDAG, highlighting the district {E, Y} and its parents.]

Axiom 2:

p(x , e,m, y) = qX (x) · qM(m | e) · qEY (e, y | x ,m).

Can consider the district {E, Y} and factor qEY ... and then apply Axiom 1 to marginalize E.

We see that X ⊥⊥ M,Y [qEY ].

This places a non-trivial constraint on p.

25 / 42

Completeness

We know M(G) ⊆ N (G).

Could there be other constraints?

For discrete observed variables, we know not.

Theorem (Evans, 2015a)

For discrete observed variables, the constraints implied by the nested Markov model are algebraically equivalent to the causal model with latent variables (with suff. large latent state-space).

‘Algebraically equivalent’ = ‘up to inequalities’. Any ‘gap’ M(G) ⊂ N (G) is due to inequality constraints.

So in particular they have the same dimension.

26 / 42

Getting the Picture

[Figure: the same picture — L (example latent variable model) ⊆ M (marginal) ⊆ N (nested).]

27 / 42

Main Result

Nested model is a good approximation to the marginal model: in the discrete case it can be explicitly parameterized and fitted.

Theorem (Evans and Richardson, 2015)

Discrete nested models are curved exponential families.

This has very nice statistical implications, including for the marginal model.

All parameters are of the form p(X | do(Y )): easily interpretable.

28 / 42

Wisconsin Data Example

Take only male respondents who were either drafted or didn't enter the military at all (before 1975).

Continuous values dichotomised close to median.

Four binary indicators:

X family income >$5k in 1957;

E education post high school;

M drafted into military;

Y respondent income >$37k in 1992.

1,676 complete cases in a 2⁴ contingency table (minimum cell count 16).

29 / 42

Results

model                       deviance   d.f.
(a) X E M Y (saturated)                  15
(b) X E M Y                   31.3        2
(c) X E M Y                    5.6        6

No evidence that military service has any effect on income after controlling for education.

Removing any edges from (c) strongly rejected.

Also find strong residual income effect:

P(Y = 1 | do(X = 0)) = 0.36 P(Y = 1 | do(X = 1)) = 0.50.

30 / 42

Outline

1 DAG Models

2 An Example

3 mDAGs

4 Inequalities

5 Testing, Fitting and Searching

31 / 42

The IV Model

The instrumental variables model is represented by the mDAG below.

[Figure: mDAG Z → X → Y with a latent common cause of X and Y.]

Assume all observed variables are discrete.

The nested Markov property gives a saturated model, so the true model is of full dimension.

Pearl (1995) showed that if the observed variables are discrete,

maxx ∑y maxz P(X = x, Y = y | Z = z) ≤ 1. (∗)

e.g.

P(X = x ,Y = 0 |Z = 0) + P(X = x ,Y = 1 |Z = 1) ≤ 1.

This is the instrumental inequality, and can be empirically tested.

32 / 42
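
An illustrative sketch (not from the talk): evaluating (∗) on a hypothetical conditional table P(X = x, Y = y | Z = z) for binary variables; any value exceeding 1 would refute the IV model.

```python
import numpy as np

# Hypothetical conditional table p[z, x, y] = P(X = x, Y = y | Z = z):
# binary Z, X, Y, and each (x, y)-slice sums to 1 for fixed z.
p = np.array([[[0.45, 0.05],
               [0.10, 0.40]],
              [[0.05, 0.10],
               [0.05, 0.80]]])

# Pearl's instrumental inequality: max_x sum_y max_z P(X=x, Y=y | Z=z) <= 1.
iv_stat = (p.max(axis=0)      # max over z, leaving an (x, y) table
             .sum(axis=1)     # sum over y
             .max())          # max over x
print(iv_stat, "satisfies the inequality:", iv_stat <= 1.0)
```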

Missing Edges Give Constraints

Proposition (Evans, 2012)

If X and Y are not joined by an edge in G there is always a constraint induced on a discrete joint distribution.

33 / 42

Outline

1 DAG Models

2 An Example

3 mDAGs

4 Inequalities

5 Testing, Fitting and Searching

34 / 42

Equivalence on Three Variables

Markov equivalence (i.e. determining whether two models are observably the same) is hard.

Using Evans (2015) there are 8 unlabelled marginal models on three variables.

35 / 42

But Not on Four!

On four variables, it's still not clear whether or not the following models are saturated (they are of full dimension in the discrete case):

[Figure: three candidate mDAGs on four vertices.]

36 / 42

Fitting Marginal Models

The ‘implicit’ nature of marginal models makes them hard to describe and to test.

We can test constraints individually, but this is very inefficient.

On the other hand

the nested model N (G) can be parameterized and fitted;

latent variable models L(G) can be parameterized and fitted;

L(G) ⊆M(G) ⊆ N (G).

So if we accept the latent variable model, or reject the nested model, the same applies to the marginal model.

37 / 42

That Picture Again

[Diagram: the marginal model M sits inside the nested model N, and an example latent variable model L sits inside M.]

38 / 42

Some Extensions

We know nested models are curved exponential families, which justifies classical statistical theory:

likelihood ratio tests have asymptotic χ2-distribution;

BIC as Laplace approximation of marginal likelihood.

Since marginal models are of the same dimension, they share these properties (except on their boundary).

Also, latent variable models become regular if the state-space is large enough.

We can also include continuous covariates, with the outcome as a multivariate response, e.g.:

[Example graph on the variables X, E, M, Y with a covariate C.]

39 / 42
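Because the usual asymptotics apply, the standard tools can be used directly. As a minimal sketch (not from the slides; the fitted log-likelihoods and model dimensions are assumed to come from whatever fitting routine is used), the likelihood-ratio test and BIC in Python:

import math
from scipy.stats import chi2

def lr_test(loglik_small, loglik_large, df_diff):
    """Likelihood-ratio test: 2(l_large - l_small) is asymptotically chi-squared
    with df_diff degrees of freedom for nested curved exponential families."""
    stat = 2.0 * (loglik_large - loglik_small)
    return stat, chi2.sf(stat, df_diff)

def bic(loglik, dim, n):
    """BIC, i.e. the Laplace approximation to -2 times the log marginal likelihood."""
    return -2.0 * loglik + dim * math.log(n)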

Summary

(Causal) DAGs with latent variables induce non-parametric constraints;

can use these to define nested models;

avoids some problems and assumptions of latent variable models: non-regularity, unidentifiability;

discrete parameterization and fitting algorithms available;

solves some boundary issues (at expense of larger model class).

Some limitations:

Complete inequality constraints seem very complicated (though some hope exists);

a nice rule for model equivalence is not yet available for either nested or marginal models.

40 / 42

Thank you!

41 / 42

References

Evans – Graphical methods for inequality constraints in marginalized DAGs, MLSP, 2012.

Evans – Graphs for margins of Bayesian networks, arXiv:1408.1809, Scand. J. Statist., to appear, 2015.

Evans – Margins of discrete Bayesian networks, arXiv:1501.02103, 2015a.

Evans and Richardson – Smooth, identifiable supermodels of discrete DAG models with latent variables, arXiv:1511.06813, 2015.

Pearl – On the testability of causal models with latent and instrumental variables, UAI, 1995.

Richardson and Spirtes – Ancestral graph Markov models, Ann. Statist., 2002.

Shpitser, Evans, Richardson and Robins – Introduction to Nested Markov Models, Behaviormetrika, 2014.

Spirtes, Glymour and Scheines – Causation, Prediction and Search, 2nd Edition, MIT Press, 2000.

Verma and Pearl – Equivalence and synthesis of causal models, UAI, 1990.

42 / 42

Ancestral Sets

[Graph: 1 → 2 → 3 → 4 with a latent variable u pointing into 2 and 4.]

p(x1, x2, x3, x4) = ∑_u p(u) p(x1) p(x2 | x1, u) p(x3 | x2) p(x4 | x3, u).

Summing out the childless vertex 4,

p(x1, x2, x3) = ∑_{x4} ∑_u p(u) p(x1) p(x2 | x1, u) p(x3 | x2) p(x4 | x3, u)

= ∑_u p(u) p(x1) p(x2 | x1, u) p(x3 | x2) ∑_{x4} p(x4 | x3, u)

= ∑_u p(u) p(x1) p(x2 | x1, u) p(x3 | x2)

= p(x1) p(x3 | x2) ∑_u p(u) p(x2 | x1, u)

= p(x1) p(x3 | x2) p(x2 | x1).

The density has the form corresponding to the ancestral sub-graph.

43 / 42
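To make the algebra concrete, here is a small numerical check (not from the slides; the graph 1 → 2 → 3 → 4 with latent u → 2, u → 4 is read off the factorization, and all variables are taken binary for illustration) that marginalizing the childless vertex 4 gives exactly the ancestral factorization:

import numpy as np
rng = np.random.default_rng(0)

def random_cpt(*shape):
    """Random conditional probability table, normalized over the last axis."""
    t = rng.random(shape)
    return t / t.sum(axis=-1, keepdims=True)

pu = random_cpt(2)          # p(u)
p1 = random_cpt(2)          # p(x1)
p2 = random_cpt(2, 2, 2)    # p(x2 | x1, u), axes (x1, u, x2)
p3 = random_cpt(2, 2)       # p(x3 | x2),     axes (x2, x3)
p4 = random_cpt(2, 2, 2)    # p(x4 | x3, u), axes (x3, u, x4)

# joint p(x1, x2, x3, x4) = sum_u p(u) p(x1) p(x2|x1,u) p(x3|x2) p(x4|x3,u)
joint = np.einsum('u,a,aub,bc,cud->abcd', pu, p1, p2, p3, p4)

# marginalize x4 and compare with the ancestral factorization
lhs = joint.sum(axis=3)                      # p(x1, x2, x3)
p2_marg = np.einsum('u,aub->ab', pu, p2)     # p(x2 | x1)
rhs = np.einsum('a,ab,bc->abc', p1, p2_marg, p3)
assert np.allclose(lhs, rhs)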

Factorization into Districts

A district is a maximal set connected by latent variables / bidirected edges:

[Graph: vertices 1, ..., 5 with a latent u pointing into 1 and 2, a latent v pointing into 3 and 4, and directed edges 1 → 3, 2 → 4, 3 → 5. Districts: {1, 2}, {3, 4}, {5}.]

∑_{u,v} p(u) p(x1 | u) p(x2 | u) p(v) p(x3 | x1, v) p(x4 | x2, v) p(x5 | x3)

= ∑_u p(u) p(x1 | u) p(x2 | u) · ∑_v p(v) p(x3 | x1, v) p(x4 | x2, v) · p(x5 | x3)

= q12(x1, x2) · q34(x3, x4 | x1, x2) · q5(x5 | x3)

= ∏_i qDi(xDi | xpa(Di)\Di).

Each qD piece should come from the model based on the district subgraph and its parents (G[D]).

44 / 42
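Since districts are just the connected components of the bidirected (latent-connection) part of the graph, they are easy to compute. A minimal sketch in Python (not from the slides; the edge list is a hypothetical encoding of the example above, with a bidirected edge for each pair joined by a latent variable):

def districts(vertices, bidirected_edges):
    """Districts = connected components of the graph restricted to bidirected
    edges; vertices touched by no bidirected edge form singleton districts."""
    adj = {v: set() for v in vertices}
    for a, b in bidirected_edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, out = set(), []
    for v in vertices:
        if v in seen:
            continue
        comp, stack = set(), [v]
        while stack:
            w = stack.pop()
            if w in comp:
                continue
            comp.add(w)
            stack.extend(adj[w] - comp)
        seen |= comp
        out.append(comp)
    return out

# Example from the slide: u joins {1, 2}, v joins {3, 4}, and 5 is isolated.
print(districts([1, 2, 3, 4, 5], [(1, 2), (3, 4)]))   # [{1, 2}, {3, 4}, {5}]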

Nested Model

We use these two rules to define our model.

Say a (conditional) probability distribution p recursively factorizes according to a CADMG G, and write p ∈ N(G), if:

1. Ancestrality: ∑_{xv} p(xV | xW) ∈ N(G − v) for each childless v ∈ V.

2. Factorization into districts: p(xV | xW) = ∏_D qD(xD | xpa(D)\D) for districts D, where qD ∈ N(G[D]).

Note that one can iterate between 1 and 2.

This defines the nested Markov model N (G).

45 / 42

Causal Coherence of mDAGs

If P is represented by a DAG in a causally interpreted way, then intervening on some set of nodes C ⊆ V can be represented by deleting the incoming edges to C in G. Call that graph GC.

Theorem (Evans, 2015)

If C ⊆ O then p(GC, O) = p(G, O)C; i.e. the projection respects causal interventions.

46 / 42

Causal Coherence

If we intervene on some observed variables, this ‘breaks’ their dependence upon their parents.

[Diagram: a graph on vertices 1, 2, 3, 4 with latents u and w. Intervening on vertex 1 and then projecting out the latents gives the same mDAG as first projecting and then intervening: the square commutes.]

47 / 42
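As a minimal sketch of the graph operation (not from the slides; the edge lists are hypothetical), intervening simply deletes every edge with an arrowhead at an intervened vertex:

def intervene(directed, bidirected, targets):
    """Graph after intervening on 'targets': delete all edges pointing into them.
    directed: list of (parent, child) pairs; bidirected edges carry arrowheads
    at both ends, so any bidirected edge meeting a target is removed."""
    targets = set(targets)
    new_directed = [(a, b) for (a, b) in directed if b not in targets]
    new_bidirected = [(a, b) for (a, b) in bidirected
                      if a not in targets and b not in targets]
    return new_directed, new_bidirected

# Intervening on vertex 1 removes any latent/bidirected connection into 1
# but keeps 1's outgoing edges, matching the picture described above.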

Proof of Instrumental Inequality

[Graph: Z → X → Y with a latent U pointing into X and Y.]

Have: p(x, y | z) = ∫ p(u) p(x | z, u) p(y | x, u) du.

Construct a fictitious distribution p∗ξ:

p∗ξ(x, y | z) = ∫ p(u) p(x | z, u) p(y | x = ξ, u) du.

Now Y behaves as though X = ξ regardless of X’s actual value. Causally, we can think of this as an intervention severing X → Y.

We can’t observe p∗ξ, but:

Consistency: p(ξ, y | z) = p∗ξ(ξ, y | z) for each z, y; and

Independence: Y ⊥⊥ Z under p∗ξ.

48 / 42
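A small numerical check of these two properties (not from the slides; the conditional probability tables are random, the state-space sizes are arbitrary choices, and the structure Z → X → Y with latent U is as above):

import numpy as np
rng = np.random.default_rng(1)

def cpt(*shape):
    t = rng.random(shape)
    return t / t.sum(axis=-1, keepdims=True)

pu = cpt(4)            # latent U with 4 states
px = cpt(2, 4, 2)      # p(x | z, u), axes (z, u, x)
py = cpt(2, 4, 2)      # p(y | x, u), axes (x, u, y)

# observed conditional p(x, y | z)
p_obs = np.einsum('u,zux,xuy->zxy', pu, px, py)

xi = 0
# fictitious p*_xi(x, y | z): Y responds as if X = xi
p_star = np.einsum('u,zux,uy->zxy', pu, px, py[xi])

# consistency: the two distributions agree on the slice x = xi
assert np.allclose(p_obs[:, xi, :], p_star[:, xi, :])
# independence: Y is independent of Z under p*_xi
py_given_z = p_star.sum(axis=1)            # p*(y | z), axes (z, y)
assert np.allclose(py_given_z[0], py_given_z[1])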

Solution: A Different Proof

For each x = ξ we require a distribution p∗ξ with:

p(ξ, y | z) = p∗ξ(ξ, y | z) for each y, z,   and   Y ⊥⊥ Z [p∗ξ].

Does such a distribution exist?

p(ξ, y | z) = p∗ξ(ξ, y | z) ≤ p∗ξ(y | z) = p∗ξ(y).

So clearly max_z p(ξ, y | z) ≤ p∗ξ(y), and hence

∑_y max_z p(ξ, y | z) ≤ 1.

Maximizing over ξ, the instrumental inequality follows.

We say that the probabilities p(x, y | z) are compatible with Y ⊥⊥ Z.

49 / 42
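The resulting check against observed data is a one-liner. A minimal sketch (not from the slides), assuming p(x, y | z) is supplied as a NumPy array with axes (z, x, y):

import numpy as np

def instrumental_inequality_holds(p_xy_given_z, tol=1e-12):
    """Check the instrumental inequality  max_x sum_y max_z p(x, y | z) <= 1.
    p_xy_given_z: array with axes (z, x, y); each slice over z is a joint over (x, y)."""
    inner = p_xy_given_z.max(axis=0).sum(axis=1)   # sum_y max_z p(x, y | z), one value per x
    return bool((inner <= 1 + tol).all())

# Distributions generated from the IV model (such as p_obs above) always pass;
# a violation refutes the instrumental variable model.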

Generalizing

How does this help us with other graphs?

The argument works precisely because cutting edges led to an independence:

[Graph: the IV graph with the edges out of X cut; Z and Y are then separated.]

Z is independent of Y in the graph after cutting the edges emanating from X.

So by the same argument, for fixed ξ, p(ξ, y, z) must be compatible with a (fictitious) distribution p∗ξ in which Y ⊥⊥ Z.

[Note that for the IV model, the conditional distribution p(ξ, y | z) had to be compatible.]

50 / 42

d-Separation

A path is a sequence of edges in the graph; vertices may not be repeated.

A path from v to w is blocked by C ⊆ V \ {v, w} if either

(i) any non-collider on the path is in C:

[Diagram: a non-collider c ∈ C on the path, e.g. → c → or ← c →.]

(ii) or any collider on the path is not in C, nor has descendants in C:

[Diagram: a collider → d ← where neither d nor any descendant e of d lies in C.]

Two vertices v and w are d-separated given C ⊆ V \ {v, w} if all paths between them are blocked.

51 / 42
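A minimal sketch of a d-separation test (not from the slides), using the standard equivalent criterion via the moralized ancestral graph rather than enumerating paths:

def d_separated(dag, v, w, C):
    """True if v and w are d-separated given C in the DAG.
    dag: dict mapping each vertex to the set of its parents.
    Equivalent criterion: moralize the ancestral sub-graph of {v, w} ∪ C,
    then check that C separates v from w in that undirected graph."""
    C = set(C)
    # ancestors of {v, w} ∪ C, including those vertices themselves
    anc, stack = set(), [v, w, *C]
    while stack:
        a = stack.pop()
        if a not in anc:
            anc.add(a)
            stack.extend(dag[a])
    # moralize: connect each vertex to its parents, and "marry" the parents
    nbr = {a: set() for a in anc}
    for a in anc:
        for p in dag[a]:            # parents of an ancestor are ancestors too
            nbr[a].add(p)
            nbr[p].add(a)
        for p in dag[a]:
            for q in dag[a]:
                if p != q:
                    nbr[p].add(q)
    # look for a path from v to w that avoids C
    seen, stack = set(C), [v]
    while stack:
        a = stack.pop()
        if a == w:
            return False
        if a in seen:
            continue
        seen.add(a)
        stack.extend(nbr[a] - seen)
    return True

# Example: collider 1 -> 3 <- 2
dag = {1: set(), 2: set(), 3: {1, 2}}
print(d_separated(dag, 1, 2, set()))   # True: the path is blocked at the collider
print(d_separated(dag, 1, 2, {3}))     # False: conditioning on the collider opens it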
