Causal Cognition 1: learning
David Lagnado
University College London
Causal knowledge
Causality is ‘the cement of the universe’
Mirrored by our cognitive system
Causal knowledge binds our concepts and shapes our reasoning
Fundamental to prediction, control, explanation, attribution
How do people acquire this knowledge?
How does it influence their reasoning?
Internal models in learning
People construct internal models to represent the causal texture of the environment (Tolman & Brunswik, 1935)
Subsequent developments favoured probabilistic models (e.g., associative networks; regression models; connectionist models)
Internal models in learning
Recent emphasis on causal models
Inspired by formal work on causality in AI and statistics
Pearl, 2000: people encode stable aspects of their experience in terms of qualitative causal relations
Inverts common view that probabilistic relations are primary
Sources of causal knowledge
Instruction
Analogy
(both require prior causal knowledge)
Direct perception?
– Michotte’s launching effect
– Agency
Induction from patterns of experience
Causal learning
Infer causal relations from patterns of data
Often difficult because:
– Probabilistic and incomplete data
– Small samples
– Different models can generate same data
How do people do it?
Methods
Variety of experimental paradigms
– Physical set-ups
– Exposure to data in on-line tasks
– Presentation of summary statistics
– Verbal scenarios
Measured responses
– Explicit (e.g., judgments of structure or strength)
– Implicit (e.g., performance on control or prediction task)
Models
Aim: construct a descriptive model of causal learning
– Computational level (what & why)
– Algorithmic/Process level (how)
– Implementation level
Status of normative models
– As standard of appraisal
– As guide/framework for developing computational models
– Ideal learner
Structure before strength
Structure
– Does a causal link exist?
– E.g., Does MMR jab cause autism?
Strength
– To what extent does MMR cause autism?
Conceptually, the question of structure is primary
– Need to posit a causal link before estimating its strength
– But most psychological research is concerned with strength
[Diagram: MMR and Autism nodes]
Learning about causal strength
Covariation theories
– Associative (e.g., Shanks & Dickinson, 1989)
– Rule-based (e.g., power PC: Cheng, 1997)
People estimate the strength of causal relations on the basis of covariation between events
Associative theories
People form associations between event representations
Strength of association determined by contingency between events
Associations updated via Rescorla-Wagner learning rule
– Incremental, error-driven
– Equivalent to delta learning rule in neural networks
– Computes delta P at asymptote
Delta P as normative index of degree of contingency
delta P = P(E|C) - P(E|~C)
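The claim that the Rescorla-Wagner rule computes delta P at asymptote can be checked with a small simulation. This is only a sketch under stated assumptions that are not on the slide: a single target cue C plus an always-present background context cue, equal learning rates for both, C present on half of the trials, and strengths averaged over the second half of training to smooth out trial-by-trial noise. The function name and parameters are illustrative.

```python
import random


def rescorla_wagner(p_e_given_c, p_e_given_not_c, n_trials=20000, rate=0.01, seed=0):
    """Return time-averaged associative strengths (cue, context) after training."""
    rng = random.Random(seed)
    v_cue = 0.0  # strength of the target cue C
    v_ctx = 0.0  # strength of the always-present background context (assumption)
    sum_cue = sum_ctx = 0.0
    burn_in = n_trials // 2  # average only over the second half of training
    for trial in range(n_trials):
        c_present = rng.random() < 0.5  # assumption: C occurs on half of the trials
        p_effect = p_e_given_c if c_present else p_e_given_not_c
        outcome = 1.0 if rng.random() < p_effect else 0.0
        # Prediction is the summed strength of all cues present on this trial.
        prediction = v_ctx + (v_cue if c_present else 0.0)
        error = outcome - prediction
        # Error-driven update, applied only to cues that were present.
        v_ctx += rate * error
        if c_present:
            v_cue += rate * error
        if trial >= burn_in:
            sum_cue += v_cue
            sum_ctx += v_ctx
    n_avg = n_trials - burn_in
    return sum_cue / n_avg, sum_ctx / n_avg


v_cue, v_ctx = rescorla_wagner(p_e_given_c=0.75, p_e_given_not_c=0.25)
print(f"cue strength ~ {v_cue:.2f}, delta P = {0.75 - 0.25:.2f}")
print(f"context strength ~ {v_ctx:.2f}, P(E|~C) = 0.25")
```

Under these assumptions the cue strength settles near delta P and the context strength near P(E|~C), because those are the only values at which the expected prediction error is zero on both trial types.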
Power PC
People assume that objects have hidden causal powers (generative or preventative)
Strength of causal powers inferred from observed frequencies
Normatively derived given certain independence assumptions
Power p = delta P / (1 - P(E|~C))
Corresponds to Noisy-or gate in Causal Bayes Net
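The link between the power formula and the noisy-or gate can be made concrete with a short sketch. It assumes the generative case only, with the background cause and C acting independently; the function names and the numbers in the usage lines are illustrative and not taken from any study.

```python
def delta_p(p_e_given_c, p_e_given_not_c):
    """Contingency: delta P = P(E|C) - P(E|~C)."""
    return p_e_given_c - p_e_given_not_c


def causal_power(p_e_given_c, p_e_given_not_c):
    """Generative causal power: delta P / (1 - P(E|~C))."""
    if p_e_given_not_c >= 1.0:
        raise ValueError("power is undefined when P(E|~C) = 1")
    return delta_p(p_e_given_c, p_e_given_not_c) / (1.0 - p_e_given_not_c)


def noisy_or(base_rate, power):
    """P(E|C) when background and C independently produce E."""
    return 1.0 - (1.0 - base_rate) * (1.0 - power)


# Round trip: generate P(E|C) from a known power, then recover that power.
p_e_given_c = noisy_or(base_rate=0.2, power=0.5)
print(p_e_given_c, causal_power(p_e_given_c, 0.2))

# Same delta P, different base rates, different power.
print(delta_p(0.6, 0.2), causal_power(0.6, 0.2))
print(delta_p(0.9, 0.5), causal_power(0.9, 0.5))
```

The last two lines illustrate why the two indices can come apart: delta P is the same in both cases, but power is larger when the base rate P(E|~C) is higher, since there is less room left for C to show its effect.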
Typical experimental paradigm
Subjects given a cover story that identifies a potential cause C and a potential effect E (e.g., drugs and recovery)
Learning phase
– Exposure to numerous trials in which C is present (or absent) and effect E present (or absent)
– E.g., C = Drug taken, E = Recovery
Test phase
– Subjects judge the effectiveness of the cause
Contingency table (cell frequencies):
        E    ~E
C       a    b
~C      c    d
Comparison of models
Associative
– Normative: Computes dP at asymptote, but dP not always appropriate index of causation
– Process level: RW rule; dynamics
– Empirical data: Ratings sensitive to dP, but vary when dP constant
– Psychological plausibility: Continuity with animal learning and other kinds of human learning
Power p
– Normative: Normatively derived
– Process level: None
– Empirical data: Ratings sensitive to p, but vary when p constant
– Psychological plausibility: Rational or computational level analysis
Summary
Neither associative nor power models give a complete account of the empirical data on people’s causal strength judgments
Perhaps no unitary model for strength estimates
People use various learning strategies according to context and probe questions
Incompleteness of covariation-based models
Focus on simple models where potential causes and effects pre-sorted (by time order, prior knowledge, instructions)
But people often confronted with more complex structures (many variables, different functional relations etc), and have to infer structure (what is a cause, what an effect)
Two approaches to causal learning
Data-driven
– Focus on how people make causal judgments from patterns of covariation (Cheng, Shanks & Dickinson)
– Events pre-sorted as potential causes and effects
– Estimate strength of causal links
Hypothesis-driven
– Learning guided by prior knowledge or assumptions about structure (Waldmann & Hagmayer)
But neither approach tells us how structure is learned
Causal Bayesian networks
Normative framework
– Spirtes, Glymour & Scheines, 1993; Pearl, 2000
– Clarifies relationship between probabilistic data and causal structure
– Formalizes notion of intervention
– Distinguishes between observation and intervention
Development of various structure learning algorithms
– Constraint-based
– Bayesian
Strong claim
Causal Bayes nets as model for representation, inference and learning
Adults and children use causal maps (Gopnik, Glymour et al., 2004)
– Represent causal structure in terms of causal Bayes nets
– Predictions via Bayesian updating (with special rules for interventions)
– Use formal learning procedures to discover causal structure
Problems?
People often make causal judgments on basis of a few trials but structure learning algorithms require large sample sizes
Structure learning models both over- and under-estimate human capabilities:
– Memory and processing limitations
– People are immersed in a spatiotemporal environment with various other cues to causal structure (time order, spatial contiguity, …)
Experimental evidence lacking
– Gopnik et al.’s data with children admit of alternative explanations (and child can’t tell you!)
Cues to causal structure
Multiple fallible cues (cf. Einhorn & Hogarth, 1986)
– Statistical covariation
– Temporal order
– Intervention
– Proximity (space & time)
– Similarity…
These can cohere or conflict
– In natural environment cues often correlated
Statistical covariation is focus of most research
But covariation alone is insufficient to infer unique causal structure
Two central cues
Temporal order (study 1)
– Previous work focuses on how time delays affect strength estimates
– Not on structure questions
Intervention (study 2)
– Previous work does not fully distinguish intervention from observation
(Both studies from Lagnado & Sloman, 2006, JEP:LMC)
Temporal order
Temporal order of events provides a basic cue to causal structure
Causes occur before their effects
Suggests simple heuristic: use temporal order as cue to causal order
– If MMR jabs are reliably followed by autism, infer that MMR causes autism
[Diagram: MMR → Autism]
But temporal order is a fallible cue
Temporal order can be misleading
Order of appearance of virus: A, then B
[Diagram: timeline showing the virus appearing on A before B]
Structure suggested by temporal order (A infects B)
Another possible structure (A and B infected by common cause C)
Yet another possible structure (B infects A)
Study 1
Pits covariation against temporal order
2 main questions
– How does temporal order influence causal learning?
– Do people make spurious inferences when temporal order is misleading?
Email virus task
Task
– Participants send viruses to a small computer network
– Must infer which connections are working
Vary time order of receipt of information
– Participants told that there is variability in time delays
  – Transmission between computers
  – Between infection and appearance of virus
Learning phase
1. Send Virus to A
2. Observe which other computers receive virus
3. Infer which connections work
[Diagram: network of computers A, B, C and D]
Design of experiment
Participants complete four similar problems
All problems have same underlying network structure
Each problem displays viruses in a different temporal order
Response mode – binary choice for each link
Links only work 80% of the time
No spontaneous viruses
Initial intervention always to send virus to A
100 test trials
[Diagram: network of computers A, B, C and D]
Frequencies of patterns
Pattern   Frequency
ABCD      51%
ABC       13%
ABD       13%
AB        3%
A         20%
Note: C and D are conditionally independent given B
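The pattern frequencies can be reproduced by enumerating which links happen to work on a trial. The network diagram did not survive extraction, so the structure used here (A→B, B→C, B→D) is an assumption, chosen because it fits the note that C and D are conditionally independent given B. The 80% link reliability, the absence of spontaneous viruses and the virus always being sent to A come from the design slide; the function and constant names are illustrative.

```python
from itertools import product

LINK_RELIABILITY = 0.8  # from the design: links only work 80% of the time
ASSUMED_LINKS = [("A", "B"), ("B", "C"), ("B", "D")]  # assumed structure


def pattern_distribution(links=ASSUMED_LINKS, reliability=LINK_RELIABILITY, source="A"):
    """Probability of each set of infected computers when the virus is sent to `source`."""
    dist = {}
    # Enumerate every combination of working / failed links.
    for working in product([True, False], repeat=len(links)):
        prob = 1.0
        for works in working:
            prob *= reliability if works else 1.0 - reliability
        # Spread the virus along working links until nothing changes.
        infected = {source}
        changed = True
        while changed:
            changed = False
            for (src, dst), works in zip(links, working):
                if works and src in infected and dst not in infected:
                    infected.add(dst)
                    changed = True
        pattern = "".join(sorted(infected))
        dist[pattern] = dist.get(pattern, 0.0) + prob
    return dist


for pattern, prob in sorted(pattern_distribution().items(), key=lambda kv: -kv[1]):
    print(f"{pattern:5s} {prob:.3f}")
```

Under the assumed structure the exact probabilities are 0.8³ for ABCD, 0.8 × 0.8 × 0.2 for each of ABC and ABD, 0.8 × 0.2 × 0.2 for AB, and 0.2 for A alone, which round to the percentages in the table.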
Four time conditions (within-subject)
– Simultaneous
– Time order ABDC
– Time order ADCB
– Time order AB[CD]
[Diagrams: the A, B, C, D network shown under each time condition]
Which connections are working?
[Figure: links endorsed by > 50% of subjects, and links significantly > 50%, marked on the A, B, C, D network for each condition: Simultaneous, Time order ABDC, Time order AB[CD], Time order ADCB]
75% 62%
21% 75%
CHOICE: Use B or D to send message to C?
Conclusions
Subjects use time order to hypothesize causal links
They confirm or revise these links through patterns of covariation data
Revision not optimal
Example: Time order ABDC
– Confirming pattern of data: 51% of trials
– Disconfirming pattern of data: 13% of trials
– Add an extra link to account for data
[Diagram: A, B, C, D network]
Conclusions
Temporal order cue overrides covariation information
Can lead to spurious causal inferences & memory distortions
Hypothesis-driven learning
– Temporal order used to generate initial causal hypotheses
– Revised in light of covariational data
Sequential testing of individual models rather than full Bayesian updating
Intervention
Manipulating a variable in the system
– Conducting an experiment
– Often critical to establishing causal relations
– Demo
Causal models as ‘oracles for intervention’ (Pearl, 2000)
– Prediction of consequences of actions (even those you have never tried)
A key benefit of intervention is that it can discriminate between ‘Markov equivalent’ models
Does listening to country music cause suicide?
Strong correlation between air-time dedicated to country music and suicide rates (across districts)
Stack & Gundlach, 1992
[Diagram: two candidate models linking Country Music and Suicide]
Covariational data alone cannot distinguish between these models
Intervene by banning Country music
Do suicide rates go down?
If so, model on left is correct
Graph surgery (Pearl, 2000; Spirtes, Glymour & Scheines, 1993)
Formal representation of intervention
[Diagram: the same two models, each with an intervention node I]
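Graph surgery can be illustrated with two binary variables. The sketch below assumes two Markov-equivalent models, X→Y and Y→X, that share one joint distribution (so observation cannot separate them), and shows that they predict different outcomes under an intervention on X. Reading X as country-music air-time and Y as suicide rate is only an analogy; the joint probabilities are made up for illustration and are not from Stack & Gundlach.

```python
# Illustrative joint distribution over (X, Y); not data from any study.
JOINT = {(1, 1): 0.4, (1, 0): 0.1, (0, 1): 0.1, (0, 0): 0.4}


def marginal_y(joint):
    """P(Y = 1)."""
    return sum(p for (x, y), p in joint.items() if y == 1)


def conditional_y_given_x(joint, x_value):
    """P(Y = 1 | X = x_value), the observational quantity."""
    p_x = sum(p for (x, y), p in joint.items() if x == x_value)
    return joint[(x_value, 1)] / p_x


def p_y_after_do_x(joint, x_value, model):
    """P(Y = 1 | do(X = x_value)) after cutting all links into X."""
    if model == "X->Y":
        # X has no incoming links to cut, so Y still responds to X.
        return conditional_y_given_x(joint, x_value)
    if model == "Y->X":
        # Surgery removes Y->X, so setting X leaves Y at its marginal.
        return marginal_y(joint)
    raise ValueError(f"unknown model: {model}")


print("observed P(Y=1 | X=0):", conditional_y_given_x(JOINT, 0))
print("do(X=0) under X->Y:   ", p_y_after_do_x(JOINT, 0, "X->Y"))
print("do(X=0) under Y->X:   ", p_y_after_do_x(JOINT, 0, "Y->X"))
```

Both models fit the same observed correlation, but only under X→Y does forcing X to 0 lower the probability of Y, which is the logic of the banning thought experiment.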
Benefits of intervention
Recent studies suggest that adults & children (& rats?) can distinguish models by making appropriate interventions
– Blaisdell et al., 2006; Gopnik et al., 2004; Lagnado & Sloman, 2002, 2004; Sobel, 2003; Steyvers et al., 2003
But intervention and temporal order are often confounded
– Interventions occur before their effects
– Learners might use temporal order rather than intervention per se
Temporal order based heuristic?
Infer that changes that occur after one’s intervention are effects of the intervened-on variable
Inference involves no explicit representation of how interventions can modify structure
Interveners benefit from ‘surgery’ without representing it
Study 2
Assess separate effects of intervention and temporal order
How do these cues combine, and what happens when they conflict?
Demo of slider study
Design
Learning status crossed with temporal order
Participants each completed 6 problems
Response mode – binary choice for each link
                     Time consistent   Time inconsistent (order reversed)
Intervention         x                 x
Yoked intervention   x                 x
Observation          x                 x
One factor within subjects, the other between subjects
Causal models
– Chain
– Common cause
– Common effect
– Simple
– Chain + common cause
– Long chain
[Diagrams: each model drawn over variables from A, B, C and D]
Results
• Intervention (active or yoked) better than observation
• Inconsistent time reduces learning in yoked intervention & observation
• But no effect on active intervention
[Figure: % correct model choices (0 to 100) by Time (consistent vs. inconsistent) for intervention, yoked intervention and observation]
Follow-up study
Why aren’t interveners affected by time reversal?
Do interveners overcome inconsistent temporal order by figuring out that variable information is reversed?
Use randomized rather than reverse temporal order
Results
Intervention2 shows similar decline in performance with inconsistent temporal order
[Figure: % correct model choices (0 to 100) by Time (consistent vs. inconsistent) for intervention1, intervention2, yoked intervention and observation]
Conclusions
Intervention and temporal order provide separate cues to causal structure
– They work best when combined
– This explains efficacy of interventions in natural environments
Both studies support hypothesis-driven account
– Temporal order used to generate causal hypotheses
– Confirmed or revised in light of covariation data
– Choices of intervention are systematic and near optimal
Conclusions
Causal Bayes net formalism provides rich framework for exploring causal structure learning
But the cognitive mechanisms we use to infer causal structure seem to exploit spatiotemporal aspects of our environment not yet captured in these models
Causality, time perception & agency (Moore, Lagnado & Haggard, 2008)
Introduction
Sense of causal agency/experience of action
Key to notion of ourselves as agents
– For learning
– For effective control
– For assigning responsibility
How do we arrive at this experience of agency?
Two accounts of action awareness
Predictive
– Arises from predictive motor control processes (Blakemore et al., 2002)
– ‘Forward model’ generated by motor system
– Uses internal motoric information
Inferential
– ‘Sense-making’ process that occurs after action (Wegner, 2002)
– Principles of priority, consistency and exclusivity
– No special role of intrinsic information
Implications
Chronometric
– Predictive processes contribute to action awareness before action
– Inferential processes contribute after action, when external information is available
Sensitivity to external context
– Predictive account accentuates role of internal information
– Inferential account accentuates external information
– “we are not intrinsically informed of our own authorship” (Wegner, 2002)
Intentional binding (Haggard et al., 2002)
Implicit measure of sense of agency
Subjective perception of time distorted according to sense of agency
Intentional actions ‘bind’ with their effects
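Binding paradigm
Watch clock
Press key at freely chosen time
Tone occurs shortly afterwards
Subject notes time of action/tone
[Figure: Intentional binding]
(A) Physical events: Action followed by Tone after 250 ms
(B) Judgments, baseline conditions: Action only; Tone only
(C) Intentional action + effect: judged action shifted +15 ms, judged tone shifted -46 ms; perceived interval 189 ms
(D) Involuntary movement + effect: judged action shifted -27 ms, judged tone shifted +31 ms; perceived interval 308 ms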
Two routes to binding (Moore & Haggard, 2007)
Predictive
– When P(Tone|Action) > .50
– Action shifted towards tone, even if tone does not occur
Inferential
– When P(Tone|Action) = .50
– Action only shifted towards tone when tone occurs
– Postdiction
– Judged time of action retrospectively adjusted if tone occurs
Causal agency
Causal agency not explicitly manipulated
Notion of causal control depends on comparison between what happens when action is taken vs. when not taken
Thus far binding just observed in contexts where subjects always act
– They decide when to act, but not whether to act
Is binding sensitive to what happens on non-action trials?
– e.g., to background context
Causal agency
Depends on contingency between action and effect
Does A raise probability of E?
In typical causal learning, people are sensitive to dP = P(E|A) - P(E|~A)
– dP > 0: A causes E
– dP = 0: No causal relation
– dP < 0: A prevents E
Note: High base rate P(E|~A) potentially undermines exclusivity of A as cause
But does not affect prediction of E given A
Experimental question: Is binding modulated by contingency?
Does binding depend just on immediate predictive processing in motor system?
– E.g., mainly on P(E|A)
Or also dependent on background context of regularities in external world?
– E.g., on both P(E|A) and P(E|~A), i.e. dP
If latter, support for inferential view
Design
                          Tone-probability LOW          Tone-probability HIGH
Contingent (dP = 0.5)     P(T|A) = 0.5, P(T|~A) = 0     P(T|A) = 0.75, P(T|~A) = 0.25
Non-contingent (dP = 0)   P(T|A) = 0.5, P(T|~A) = 0.5   P(T|A) = 0.75, P(T|~A) = 0.75
Tone-probability: between subjects; contingency: within subjects
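The design can be checked cell by cell with the dP formula used earlier in the talk. The sketch below simply encodes the four cells as (P(T|A), P(T|~A)) pairs; the dictionary layout and names are illustrative.

```python
# (P(T|A), P(T|~A)) for each cell of the design.
DESIGN = {
    ("low", "contingent"): (0.5, 0.0),
    ("low", "non-contingent"): (0.5, 0.5),
    ("high", "contingent"): (0.75, 0.25),
    ("high", "non-contingent"): (0.75, 0.75),
}


def delta_p(p_t_given_a, p_t_given_not_a):
    """dP = P(T|A) - P(T|~A)."""
    return p_t_given_a - p_t_given_not_a


for (tone_probability, contingency), (p_a, p_not_a) in DESIGN.items():
    print(f"{tone_probability:4s} {contingency:15s} "
          f"P(T|A) = {p_a:.2f}  dP = {delta_p(p_a, p_not_a):.2f}")
```

Within each tone-probability level, P(T|A) is identical for contingent and non-contingent blocks, so any difference in binding between those blocks cannot come from P(T|A) alone.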
Method
On each trial subjects chose whether or not to press key
Tone occurs (250ms)/does not occur
Enter clock time at which they pressed key
Eight blocks of 20 trials
– 2 baseline (press key, no tone)
– 3 contingent
– 3 non-contingent
Judged causal efficacy of key press after each block
Low tone-probability: (baseline subtracted timing estimates)
Action only – no binding for contingent or non-contingent blocks
Action + tone – binding for contingent blocks only
‘Inferential’ binding but no ‘predictive’ binding
High tone-probability: (baseline subtracted timing estimates)
Action only – binding for contingent blocks only (‘predictive’)
Action + tone – binding for both blocks (Outcome density effect?)
Summary
‘Predictive’ and ‘inferential’ binding
Both sensitive to contingency
Causal control (as indexed by dP) leads to binding of actions to effects
Conclusions
Sense of agency is not solely result of direct conscious access to signals within motor system
People form a causal model of statistical relation between action and tone … this model structures their experience of their own action
Motoric contribution to agency embedded in wider interpretative process that includes causal model & external context
Conclusions
Intimate relation between subjective time and causality
Time as cue to cause
Cause as cue to time