Causal Cognition 1: learning David Lagnado University College London


Causal knowledge

Causality is ‘the cement of the universe’
Mirrored by our cognitive system
Causal knowledge binds our concepts and shapes our reasoning
Fundamental to prediction, control, explanation, attribution

How do people acquire this knowledge?
How does it influence their reasoning?

Internal models in learning

People construct internal models to represent the causal texture of the environment (Tolman & Brunswik, 1935)

Subsequent developments favoured probabilistic models (e.g., associative networks; regression models; connectionist models)

Recent emphasis on causal models
Inspired by formal work on causality in AI and statistics
Pearl, 2000: people encode stable aspects of their experience in terms of qualitative causal relations
Inverts common view that probabilistic relations are primary

Sources of causal knowledge

Instruction
Analogy
(both require prior causal knowledge)
Direct perception?
– Michotte’s launching effect
– Agency

Induction from patterns of experience

Causal learning

Infer causal relations from patterns of data

Often difficult because:
– Probabilistic and incomplete data
– Small samples
– Different models can generate same data

How do people do it?

Methods

Variety of experimental paradigms
– Physical set-ups
– Exposure to data in on-line tasks
– Presentation of summary statistics
– Verbal scenarios

Measured responses
– Explicit (e.g., judgments of structure or strength)
– Implicit (e.g., performance on control or prediction task)

Models

Aim: construct a descriptive model of causal learning
– Computational level (what & why)
– Algorithmic/Process level (how)
– Implementation level

Status of normative models
– As standard of appraisal
– As guide/framework for developing computational models
– Ideal learner

Structure before strength

Structure
– Does a causal link exist?
– E.g., Does MMR jab cause autism?

[Diagram: MMR → Autism]

Strength
– To what extent does MMR cause autism?

Conceptually, the question of structure is primary
– Need to posit a causal link before estimating its strength
– But most psychological research concerned with strength

Learning about causal strength

Covariation theories
– Associative (e.g., Shanks & Dickinson, 1989)
– Rule-based (e.g., power PC: Cheng, 1997)

People estimate the strength of causal relations on the basis of covariation between events

Associative theories

People form associations between event representations
Strength of association determined by contingency between events
Associations updated via Rescorla-Wagner learning rule
– Incremental, error-driven
– Equivalent to delta learning rule in neural networks
– Computes delta P at asymptote

Delta P as normative index of degree of contingency
delta P = P(E|C) – P(E|~C)
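
The claim that the Rescorla-Wagner rule computes delta P at asymptote can be illustrated with a minimal Python sketch. It rests on several assumptions that are not in the slides: a constant background (context) cue present on every trial, a single learning rate `alpha`, an outcome coded as 1 or 0, and iteration of the expected (average) update rather than sampling of individual trials. The function names and the cell labels a, b, c, d (C and E, C and not-E, not-C and E, not-C and not-E) are illustrative only.

```python
def delta_p(a, b, c, d):
    """Contingency from a 2x2 table: a=C&E, b=C&~E, c=~C&E, d=~C&~E."""
    return a / (a + b) - c / (c + d)


def rescorla_wagner_expected(a, b, c, d, alpha=0.1, steps=5000):
    """Iterate the average Rescorla-Wagner update for a context cue and cue C.

    Assumption: the context cue is present on every trial, C only on
    cause-present trials, and the outcome is coded 1 (E) or 0 (~E).
    Returns the associative strengths (v_context, v_cause).
    """
    n = a + b + c + d
    p_c = (a + b) / n            # proportion of cause-present trials
    p_e_c = a / (a + b)          # P(E|C)
    p_e_not_c = c / (c + d)      # P(E|~C)
    v_ctx = v_c = 0.0
    for _ in range(steps):
        err_c = p_e_c - (v_ctx + v_c)      # mean error when C is present
        err_not_c = p_e_not_c - v_ctx      # mean error when C is absent
        # Context is updated on every trial, C only when it is present.
        v_ctx += alpha * (p_c * err_c + (1 - p_c) * err_not_c)
        v_c += alpha * p_c * err_c
    return v_ctx, v_c


v_ctx, v_c = rescorla_wagner_expected(30, 10, 10, 30)
print(round(v_c, 3), delta_p(30, 10, 10, 30))
```

Under these assumptions the strength of the cause settles at delta P and the strength of the context settles at P(E|~C); sampling trials one by one would fluctuate around the same values.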

Power PC

People assume that objects have hidden causal powers (generative or preventative)

Strength of causal powers inferred from observed frequencies

Normatively derived given certain independence assumptions

Power p = delta P / (1 – P(E|~C))

Corresponds to noisy-OR gate in causal Bayes net
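
A short Python sketch of the generative power formula and its noisy-OR reading. The function names are illustrative, and only generative (not preventative) power is covered. The point it tries to show is that two data sets with the same delta P can imply different causal powers when the base rate P(E|~C) differs.

```python
def causal_power(p_e_given_c, p_e_given_not_c):
    """Generative causal power: delta P / (1 - P(E|~C))."""
    if p_e_given_not_c >= 1.0:
        raise ValueError("power is undefined when P(E|~C) = 1")
    delta_p = p_e_given_c - p_e_given_not_c
    return delta_p / (1.0 - p_e_given_not_c)


def noisy_or(power, base_rate):
    """P(E|C) when C and the background produce E independently."""
    return 1.0 - (1.0 - power) * (1.0 - base_rate)


# Same delta P (0.25), different base rates, different powers.
print(causal_power(0.25, 0.0))   # base rate 0
print(causal_power(0.75, 0.5))   # base rate 0.5
```

Feeding the recovered power back through `noisy_or` with the same base rate reproduces P(E|C), which is the sense in which the power formula corresponds to a noisy-OR parameterisation.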

Typical experimental paradigm

Subjects given a cover story that identifies a potential cause C and a potential effect E (e.g., drugs and recovery)

Learning phase
– Exposure to numerous trials in which C is present (or absent) and effect E present (or absent)
– E.g., C = Drug taken, E = Recovery

Test phase
– Subjects judge the effectiveness of the cause

Contingency table (cell frequencies):

        E    ~E
C       a    b
~C      c    d

Comparison of models

Associative
– Normative: computes delta P at asymptote, but delta P not always appropriate index of causation
– Process level: RW rule; dynamics
– Empirical data: ratings sensitive to delta P, but vary when delta P constant
– Psychological plausibility: continuity with animal learning and other kinds of human learning

Power p
– Normative: normatively derived
– Process level: none
– Empirical data: ratings sensitive to p, but vary when p constant
– Psychological plausibility: rational or computational level analysis

Summary

Neither associative nor power models give a complete account of the empirical data on people’s causal strength judgments

Perhaps no unitary model for strength estimates

People use various learning strategies according to context and probe questions

Incompleteness of covariation-based models

Focus on simple models where potential causes and effects pre-sorted (by time order, prior knowledge, instructions)

But people often confronted with more complex structures (many variables, different functional relations etc), and have to infer structure (what is a cause, what an effect)

Two approaches to causal learning

Data-driven
– Focus on how people make causal judgments from patterns of covariation (Cheng, Shanks & Dickinson)
– Events pre-sorted as potential causes and effects
– Estimate strength of causal links

Hypothesis-driven
– Learning guided by prior knowledge or assumptions about structure (Waldmann & Hagmayer)

But neither approach tells us how structure is learned

Causal Bayesian networks

Normative framework
– Spirtes, Glymour & Scheines, 1993; Pearl, 2000
– Clarifies relationship between probabilistic data and causal structure
– Formalizes notion of intervention
– Distinguishes between observation and intervention

Development of various structure learning algorithms
– Constraint-based
– Bayesian

Strong claim

Causal Bayes nets as model for representation, inference and learning

Adults and children use causal maps (Gopnik, Glymour et al., 2004)
– Represent causal structure in terms of causal Bayes nets
– Predictions via Bayesian updating (with special rules for interventions)
– Use formal learning procedures to discover causal structure

Problems?

People often make causal judgments on basis of a few trials but structure learning algorithms require large sample sizes

Structure learning models both over- and under-estimate human capabilities:
– Memory and processing limitations
– People are immersed in a spatiotemporal environment with various other cues to causal structure (time order, spatial contiguity, …)

Experimental evidence lacking
– Gopnik et al.’s data with children admit of alternative explanations (and child can’t tell you!)

Cues to causal structure

Multiple fallible cues (cf. Einhorn & Hogarth, 1986)
– Statistical covariation
– Temporal order
– Intervention
– Proximity (space & time)
– Similarity…

These can cohere or conflict
– In natural environment cues often correlated

Statistical covariation is focus of most research
But covariation alone is insufficient to infer unique causal structure

Two central cues

Temporal order (study 1)
– Previous work focuses on how time delays affect strength estimates
– Not on structure questions

Intervention (study 2)
– Previous work does not fully distinguish intervention from observation

(Both studies from Lagnado & Sloman, 2006 JEP:LMC)

Temporal order

Temporal order of events provides a basic cue to causal structure
Causes occur before their effects
Suggests simple heuristic: use temporal order as cue to causal order
– If MMR jabs are reliably followed by autism, infer that MMR causes autism

[Diagram: MMR → Autism]

But temporal order is a fallible cue

[Figure: timeline showing order of appearance of virus, first on computer A, then on computer B]

Structure suggested by temporal order: A → B (A infects B)

Temporal order can be misleading

Another possible structure: A ← C → B (A and B infected by common cause C)

Yet another possible structure: B → A (B infects A)

Study 1

Pits covariation against temporal order

2 main questions
– How does temporal order influence causal learning?
– Do people make spurious inferences when temporal order is misleading?

Email virus task

Task
– Participants send viruses to a small computer network
– Must infer which connections are working

Vary time order of receipt of information
– Participants told that there is variability in time delays
– Transmission between computers
– Between infection and appearance of virus

Learning phase

[Figure: network of computers A, B, C, D]

1. Send virus to A
2. Observe which other computers receive virus
3. Infer which connections work

Design of experiment

Participants complete four similar problems
All problems have same underlying network structure
Each problem displays viruses in a different temporal order
Response mode – binary choice for each link

[Figure: network of computers A, B, C, D]

Links only work 80% of the time
No spontaneous viruses
Initial intervention always to send virus to A
100 test trials

Frequencies of patterns

[Figure: network of computers A, B, C, D]

Pattern   Frequency
ABCD      51%
ABC       13%
ABD       13%
AB        3%
A         20%

Note: C and D are conditionally independent given B
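
The pattern frequencies can be reproduced with a small Python sketch. It assumes the underlying network is A → B, B → C, B → D (inferred here from the frequencies and the conditional-independence note, not stated on the slide), that each link works independently with probability 0.8, that there are no spontaneous viruses, and that the virus is always sent to A. The names `LINKS`, `P_WORK` and `pattern_frequencies` are illustrative.

```python
from itertools import product

LINKS = [("A", "B"), ("B", "C"), ("B", "D")]  # assumed network structure
P_WORK = 0.8                                   # each link works 80% of the time


def pattern_frequencies(links=LINKS, p_work=P_WORK, source="A"):
    """Probability of each infection pattern when the virus is sent to `source`."""
    freqs = {}
    # Enumerate every combination of working / broken links.
    for states in product([True, False], repeat=len(links)):
        prob = 1.0
        for works in states:
            prob *= p_work if works else 1.0 - p_work
        # Spread the virus along working links until nothing changes.
        infected = {source}
        changed = True
        while changed:
            changed = False
            for (src, dst), works in zip(links, states):
                if works and src in infected and dst not in infected:
                    infected.add(dst)
                    changed = True
        pattern = "".join(sorted(infected))
        freqs[pattern] = freqs.get(pattern, 0.0) + prob
    return freqs


for pattern, prob in sorted(pattern_frequencies().items(), key=lambda kv: -kv[1]):
    print(pattern, f"{prob:.1%}")
```

Rounded to whole percentages, the values under these assumptions match the table (for example 0.8 × 0.8 × 0.8 = 0.512 for ABCD).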

Four time conditions (within-subject)
– Simultaneous
– Time order ABDC
– Time order ADCB
– Time order AB[CD]

[Figure: network of computers A, B, C, D shown for each time condition]

Which connections are working?

[Figure: for each condition (Simultaneous, Time order ABDC, Time order AB[CD], Time order ADCB), the links endorsed by > 50% of subjects, with links significantly > 50% marked]

CHOICE: Use B or D to send message to C?

[Figure values: 75%, 62%, 21%, 75%]

Conclusions

Subjects use time order to hypothesize causal links
They confirm or revise these links through patterns of covariation data
Revision not optimal

[Figure: network A, B, C, D under time order ABDC]
– Confirming pattern of data: 51% of trials
– Disconfirming pattern of data: 13% of trials
– Add an extra link to account for data

Temporal order cue overrides covariation information
Can lead to spurious causal inferences & memory distortions

Hypothesis-driven learning
– Temporal order used to generate initial causal hypotheses
– Revised in light of covariational data

Sequential testing of individual models rather than full Bayesian updating

Intervention

Manipulating a variable in the system
– Conducting an experiment
– Often critical to establishing causal relations
– Demo

Causal models as ‘oracles for intervention’ (Pearl, 2000)
– Prediction of consequences of actions (even those you have never tried)

A key benefit of intervention is that it can discriminate between ‘Markov equivalent’ models

Does listening to country music cause suicide?

[Figure: two candidate causal models linking Country Music and Suicide]

Strong correlation between air-time dedicated to country music and suicide rates (across districts) (Stack & Gundlach, 1992)

Covariational data alone cannot distinguish between these models

Intervene by banning country music
Do suicide rates go down?
If so, model on left is correct

[Figure: each model with an intervention node I]

Graph surgery (Pearl, 2000; Spirtes, Glymour & Scheines, 1993)
Formal representation of intervention
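
Graph surgery can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the formalism of Pearl or Spirtes et al. in full: the two candidate models are taken to be Country Music → Suicide and Suicide → Country Music, each graph is stored as a mapping from a variable to its set of direct causes, and the helper names `do` and `descendants` are hypothetical. An intervention on a variable cuts every arrow into it, so only its remaining descendants can respond.

```python
def do(parents, target):
    """Graph surgery: cut every arrow into `target`, leave the rest intact."""
    return {node: (set() if node == target else set(ps))
            for node, ps in parents.items()}


def descendants(parents, node):
    """All variables reachable from `node` by following causal arrows."""
    found = set()
    frontier = [node]
    while frontier:
        current = frontier.pop()
        for child, ps in parents.items():
            if current in ps and child not in found:
                found.add(child)
                frontier.append(child)
    return found


# Assumed candidate models: each variable maps to its direct causes.
music_causes_suicide = {"music": set(), "suicide": {"music"}}
suicide_causes_music = {"music": {"suicide"}, "suicide": set()}

for name, model in [("music -> suicide", music_causes_suicide),
                    ("suicide -> music", suicide_causes_music)]:
    after_ban = do(model, "music")
    affected = "suicide" in descendants(after_ban, "music")
    print(name, "| banning music can change suicide rates:", affected)
```

The two models fit the same correlation, but after surgery only the first predicts that banning country music changes suicide rates, which is why intervention can separate them where covariation alone cannot.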

Benefits of intervention

Recent studies suggest that adults & children (& rats?) can distinguish models by making appropriate interventions
– Blaisdell et al., 2006; Gopnik et al., 2004; Lagnado & Sloman, 2002, 2004; Sobel, 2003; Steyvers et al., 2003

But intervention and temporal order are often confounded
– Interventions occur before their effects
– Learners might use temporal order rather than intervention per se

Temporal order based heuristic?

Infer that changes that occur after one’s intervention are effects of the intervened-on variable
Inference involves no explicit representation of how interventions can modify structure
Interveners benefit from ‘surgery’ without representing it

Study 2

Assess separate effects of intervention and temporal order
How do these cues combine, and what happens when they conflict?

Demo of slider study

Design

Learning status crossed with temporal order
Participants each completed 6 problems
Response mode – binary choice for each link

                     Time consistent   Time inconsistent (order reversed)
Intervention         x                 x
Yoked intervention   x                 x
Observation          x                 x

[Table labels: within, between]

Causal models

[Figure: the six causal models used, each over variables from A, B, C, D: simple, chain, common cause, common effect, chain + common cause, long chain]

Results

• Intervention (active or yoked) better than observation
• Inconsistent time reduces learning in yoked intervention & observation
• But no effect on active intervention

[Figure: % correct model choices (0–100) by time (consistent vs. inconsistent) for intervention, yoked intervention and observation]

Follow-up study

Why aren’t interveners affected by time reversal?
Do interveners overcome inconsistent temporal order by figuring out that variable information is reversed?
Use randomized rather than reverse temporal order

Results

Intervention2 shows similar decline in performance with inconsistent temporal order

[Figure: % correct model choices (0–100) by time (consistent vs. inconsistent) for intervention1, intervention2, yoked intervention and observation]


Intervention and temporal order provide separate cues to causal structure
– They work best when combined
– This explains efficacy of interventions in natural environments

Both studies support hypothesis-driven account
– Temporal order used to generate causal hypotheses
– Confirmed or revised in light of covariation data
– Choices of intervention are systematic and near optimal

Conclusions

Causal Bayes net formalism provides rich framework for exploring causal structure learning

But the cognitive mechanisms we use to infer causal structure seem to exploit spatiotemporal aspects of our environment not yet captured in these models

Causality, time perception & agency (Moore, Lagnado & Haggard, 2008)

Introduction

Sense of causal agency/experience of action

Key to notion of ourselves as agents
– For learning
– For effective control
– For assigning responsibility

How do we arrive at this experience of agency?

Two accounts of action awareness

Predictive
– Arises from predictive motor control processes (Blakemore et al., 2002)
– ‘Forward model’ generated by motor system
– Uses internal motoric information

Inferential
– ‘Sense-making’ process that occurs after action (Wegner, 2002)
– Principles of priority, consistency and exclusivity
– No special role of intrinsic information

Implications

Chronometric
– Predictive processes contribute to action awareness before action
– Inferential processes contribute after action, when external information is available

Sensitivity to external context
– Predictive account accentuates role of internal information
– Inferential account accentuates external information
– “we are not intrinsically informed of our own authorship” (Wegner, 2002)

Intentional binding (Haggard et al, 2002)

Implicit measure of sense of agency
Subjective perception of time distorted according to sense of agency

Intentional actions ‘bind’ with their effects

Binding paradigm

Watch clock
Press key at freely chosen time
Tone occurs shortly afterwards
Subject notes time of action/tone

[Figure: intentional binding]
(A) Physical events: action, then tone 250 ms later
(B) Judgments, baseline conditions: action only; tone only
(C) Intentional action + effect: action judged +15 ms, tone judged –46 ms; perceived interval 189 ms
(D) Involuntary movement + effect: action judged –27 ms, tone judged +31 ms; perceived interval 308 ms
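
The perceived intervals in the figure follow from the judgment shifts by simple arithmetic, sketched below in Python. It assumes that the first shift in each pair belongs to the action and the second to the tone, with positive values meaning "judged later"; the function name is illustrative.

```python
def perceived_interval(actual_ms, action_shift_ms, tone_shift_ms):
    """Interval between judged action time and judged tone time.

    Shifts are judgment errors relative to the true event time
    (positive = judged later than it actually occurred).
    """
    return actual_ms + tone_shift_ms - action_shift_ms


# Intentional action: action judged later, tone judged earlier.
print(perceived_interval(250, 15, -46))
# Involuntary movement: action judged earlier, tone judged later.
print(perceived_interval(250, -27, 31))
```

With the values on the slide, the intentional case compresses the 250 ms interval to 189 ms and the involuntary case stretches it to 308 ms.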

Two routes to binding (Moore & Haggard 2007)

Predictive
– When P(Tone|Action) > .50
– Action shifted towards tone, even if tone does not occur

Inferential
– When P(Tone|Action) = .50
– Action only shifted towards tone when tone occurs
– Postdiction
– Judged time of action retrospectively adjusted if tone occurs

Causal agency

Causal agency not explicitly manipulated
Notion of causal control depends on comparison between what happens when action is taken vs. when not taken

Thus far binding just observed in contexts where subjects always act
– They decide when to act, but not whether to act

Is binding sensitive to what happens on non-action trials?
– e.g., to background context

Causal agency

Depends on contingency between action and effect
Does A raise probability of E?
In typical causal learning, people are sensitive to dP = P(E|A) – P(E|~A)
– dP > 0: A causes E
– dP = 0: No causal relation
– dP < 0: A prevents E

Note: High base rate P(E|~A) potentially undermines exclusivity of A as cause
But does not affect prediction of E given A

Experimental question: Is binding modulated by contingency?

Does binding depend just on immediate predictive processing in motor system?
– E.g., mainly on P(E|A)

Or also dependent on background context of regularities in external world?
– E.g., on both P(E|A) and P(E|~A) (dP)

If latter, support for inferential view

Design

                          Tone-probability LOW            Tone-probability HIGH
Contingent (dP = 0.5)     P(T|A) = 0.5, P(T|~A) = 0       P(T|A) = 0.75, P(T|~A) = 0.25
Non-contingent (dP = 0)   P(T|A) = 0.5, P(T|~A) = 0.5     P(T|A) = 0.75, P(T|~A) = 0.75

Tone-probability: between subjects
Contingency: within subjects
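
As a quick check on the design, the dP value of each cell can be recomputed from its two conditional probabilities. The Python sketch below uses illustrative names (`DESIGN`, `delta_p`, `classify`) and the dP sign convention from the causal agency slide.

```python
# (contingency, tone probability) -> (P(T|A), P(T|~A)), as given in the design.
DESIGN = {
    ("contingent", "low"): (0.5, 0.0),
    ("contingent", "high"): (0.75, 0.25),
    ("non-contingent", "low"): (0.5, 0.5),
    ("non-contingent", "high"): (0.75, 0.75),
}


def delta_p(p_t_given_a, p_t_given_not_a):
    """dP = P(T|A) - P(T|~A)."""
    return p_t_given_a - p_t_given_not_a


def classify(dp):
    """Read dP as a causal relation: positive, negative or none."""
    if dp > 0:
        return "A causes E"
    if dp < 0:
        return "A prevents E"
    return "no causal relation"


for cell, (p_a, p_not_a) in DESIGN.items():
    dp = delta_p(p_a, p_not_a)
    print(cell, "dP =", dp, "->", classify(dp))
```

Both contingent cells give dP = 0.5 and both non-contingent cells give dP = 0, so P(T|A) varies across the tone-probability levels while contingency is held fixed within each row.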

Method

On each trial subjects chose whether or not to press key
Tone occurs (250ms)/does not occur
Enter clock time at which they pressed key

Eight blocks of 20 trials
– 2 baseline (press key, no tone)
– 3 contingent
– 3 non-contingent

Judged causal efficacy of key press after each block

Low tone-probability: (baseline subtracted timing estimates)

Action only – no binding for contingent or non-contingent blocks
Action + tone – binding for contingent blocks only
‘Inferential’ binding but no ‘predictive’ binding

High tone-probability: (baseline subtracted timing estimates)

Action only – binding for contingent blocks only (‘predictive’)
Action + tone – binding for both blocks (Outcome density effect?)

Summary

‘Predictive’ and ‘inferential’ binding
Both sensitive to contingency
Causal control (as indexed by dP) leads to binding of actions to effects

Conclusions

Sense of agency is not solely result of direct conscious access to signals within motor system

People form a causal model of statistical relation between action and tone … this model structures their experience of their own action

Motoric contribution to agency embedded in wider interpretative process that includes causal model & external context

Conclusions

Intimate relation between subjective time and causality

Time as cue to cause
Cause as cue to time
