Syntax, fMRI, and neural implementation

Last time

• If we’re not behaviorists, we believe in stored ‘knowledge’ independent of behavior

– E.g. knowledge of the rules of a given grammar, independent of producing or comprehending

• But in practice, we cannot probe this knowledge except through ‘behavior’

– Acceptability judgments, eye movements, or neural responses during comprehension and production

Last time

• We can use neuro techniques to address multiple kinds of questions, not just neural implementation

– About representation, about processing, about neural implementation of representations and processes

• If you think too hard about this, it becomes hard to say why, in principle, you couldn’t have a syntactician who only collects neuro data…

Last time

• Be clear about what question you’re investigating and why

• Use your common sense when approaching neuro stuff

• Bad methods = no conclusions

Today

• Broca’s aphasia – some history

• fMRI methods

• Matchin, Sprouse & Hickok

• Pallier, Devauchelle & Dehaene

Representation

• In practice, it’s currently really challenging to probe representational questions about syntax with neuro or ‘psycho’/‘expensive’ techniques

• But just for illustration before we do implementation…

Representation

• Debate in the literature about whether coordination is asymmetrically represented or symmetrically represented

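[Figure: two candidate structures for ‘cats and dogs’. Symmetric: ‘and’ dominates ‘cats’ and ‘dogs’ as equal daughters. Asymmetric: ‘cats’ combines with the constituent ‘and dogs’.]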

Representation

• Can agreement attraction (a processing phenomenon) give insight into the structure?

– The movie about the goats are…

– The movie about the goats and cow are…

– The movie about the goat and cows are…

Lago & Felser

Today

• Investigating ‘neural implementation of syntax’

– What does this mean?

• Area(s) that instantiate the process of parsing (assigning groupings to a string in comprehension)?

• Area(s) that instantiate the process of producing a structured sentence?

• Area(s) that store the ‘rules’—e.g. the information that a given NVN sequence must be structured in X fashion in language 1 and in Y fashion in language 2, or that a modifier precedes its head in language 1 and not in language 2?

• Area(s) that store the lexical items—which contain much of the syntax according to lexicalized grammars?

• Area(s) that ‘buffer’ (short-term store) the structured representation of the sentence (the groupings of the lexical terminals)?

• Area that distinguishes humans from non-linguistic animals?

Localization (neural implementation)

• Why do we care?

– Just an inherently interesting basic question about humans (e.g. that neurons coding visual information are located in the back of the brain, etc.)

– Might be a methodological pathway towards answering a representational question (e.g. if I know area X stores nouns, I can use it to figure out whether something is a noun or not)

– Or towards answering a process question

• Does context impact visual perception at an early step in the computation or at a late step?

Murray et al. 2006



• Might be a methodological pathway towards answering a representation or process question

– If I know something ELSE about the properties of the brain region, that could tell me something interesting about why this representation/process ended up here

Dehaene & Cohen, 2007

• ‘Neuronal Recycling’

– Innate constraints in neural organization bias learning

– Cultural acquisitions (e.g. reading) must find their “neuronal niche” in circuits that are innately set up to do something similar but are plastic enough to be modified for new use

– Prior organization of the circuit can’t be completely overwritten, so it also shapes cultural acquisition



• Might be a methodological pathway towards answering a representation or process question

– If I know something ELSE about the properties of the brain region, that could tell me something interesting about why this representation/process ended up here

– Another measure of ‘similarity’ (if X and Y both localize to region A, they must share something)


• Why do we care?

– Consistent localization across individuals tells me something about innateness?

– Clinical applications

• What does it mean to localize something like ‘syntax’?

The famous case of Broca’s area

• Classically, BA44 and BA45

Major Theories of Broca’s Area

• ~1870s-1970s – Speech production

[Figures: ‘TAN’ (Leborgne); Broca; Wernicke; arcuate fasciculus; Lichtheim]


• ~1870s-1970s – Speech production

• 1970s-1990s – Syntactic processing

Broca’s aphasic production deficits seem systematic

Agrammatism

• Disorder characterized by disfluent speech with few function words, not so many verbs, and lots of nouns

Broca’s aphasics have reliable comprehension deficits

Edgar Zurif

The apple that the boy is eating is red

The boy is being chased by the girl

Caramazza & Zurif, 1976

An abstract syntactic deficit?

• Caramazza & Zurif (1976) suggest that the same patients who have agrammatic production also have trouble in comprehension when they must use syntax to arrive at an interpretation

– The boy was hit by the girl


• ~1870s-1970s – Speech production

• 1970s-1990s – Syntactic processing

– Movement theory (Grodzinsky)

Movement Theory

Movement Theory (Trace-Deletion Hypothesis)

• Claim: patients have difficulty exactly with those sentences which require a long-distance relationship between elements of a sentence (a configuration usually referred to as movement in generative theories of grammar)

A critical problem

• Prior to the early 1990s, MRI and fMRI were not widespread

• Therefore, for over 100 years, much of this theorizing had to be done WITHOUT knowing for sure where the patients’ damage was

A critical problem

• Most stunning illustration of this:

a Broca’s aphasic is NOT defined by damage to Broca’s area

a Broca’s aphasic is defined by a pattern of symptoms, which may or may not be driven by damage to Broca’s area

A critical problem

• Today it is debatable whether the root problem in Broca’s original patients was actually in Broca’s area (IFG = BA 44/45)

Dronkers, 2007

Dronkers & Baldo, 2009

Fact 1

• There exists a group of patients that produce ‘telegraphic’ speech that is labored and has relatively few function words (100+ years of data)

• We call these patients agrammatic

Fact 2

• There exists a group of patients that do not do well at understanding sentences whose interpretation is not given by the lexical semantics + world knowledge (Caramazza & Zurif, 1976; Schwartz et al., 1980)

‘Nonreversible’ sentences

• Interpretation given by lexical semantics + world knowledge

– The boy is eating a red apple

– The apple that the boy is eating is red

– The apple was eaten by the boy

‘Reversible’ sentences

• Interpretation not given by lexical semantics + world knowledge

– The boy is being chased by the girl

– The boy that the girl is chasing is tall

– The girl is chasing the boy that is tall

Fact 2

• There exists a group of patients that do not do well at understanding ‘non-canonical’ sentences whose interpretation is not given by the lexical semantics + world knowledge

– Let’s call this the NCR deficit (Non-Canonical Reversible)

Claims (Caramazza & Zurif)

• (1) The NCR deficit is due to a problem with syntactic knowledge and/or syntactic parsing algorithms

• (2) Damage to Broca’s area causes the NCR deficit

• (3) The same underlying syntactic problem is responsible for the NCR deficit and agrammatic production deficits

Linebarger, Schwartz & Saffran, 1983

• Evaluating the claim that the NCR deficit is due to a problem with syntactic knowledge and/or syntactic parsing algorithms

• Prediction: if this claim is correct, then patients with a deficit in this interpretation task will also fail at other tasks that require syntactic knowledge/processing

Linebarger, Schwartz & Saffran, 1983

• Are the NCR patients also bad at assessing the grammaticality of sentences of English?

– If so, it would support the claim that the underlying cognitive deficit in these patients is syntactic

– If not, it would suggest that the cognitive deficit in these patients is not purely syntactic, but instead might have something to do with the mapping between syntax and interpretation

• Awareness of subcategorization requirements

– ‘testimony to their considerable knowledge of the subcategorization requirements of lexical items’

– Not explainable by semantic strategy

– Not explainable by prosodic strategy

• Able to compute constraints on long-distance dependencies

• Notice/compute functional elements

– Not consistent with semantic strategy

Conclusion

• The NCR deficit that some patients have is not due to the absence of normal syntactic knowledge/processing routines

Remaining Questions

• If NCR patients have intact syntactic knowledge and processing, what causes their deficit on the picture-matching task?

– H1: A specific deficit in syntax-semantic mapping

– H2: Limited resources (in the judgment task they can focus all these resources on the syntactic analysis, but in the picture-matching task they have to divide them between syntactic and semantic analysis)





Thothathiri, Kimberg & Schwartz 2012

• 79 patients with MRI data on lesions

• Correlate NCR deficit with lesion location
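A toy sketch of what ‘correlate NCR deficit with lesion location’ can mean computationally. The patient data are invented purely for illustration, the region names are hypothetical, and the per-region correlation is a simple stand-in, not the analysis Thothathiri, Kimberg & Schwartz actually ran.

```python
import numpy as np

# Hypothetical region labels, chosen only for this illustration.
REGIONS = ["region_A", "region_B"]

# Invented data: one row per patient, one column per region (1 = lesioned).
lesion = np.array([
    [1, 0],
    [1, 1],
    [1, 0],
    [0, 1],
    [0, 0],
    [0, 1],
])

# Invented deficit scores (higher = worse on non-canonical reversible sentences).
deficit = np.array([0.9, 0.8, 0.7, 0.2, 0.1, 0.3])


def lesion_deficit_correlations(lesion_matrix, scores):
    """Pearson correlation between lesion status and deficit score, per region."""
    return np.array([
        np.corrcoef(lesion_matrix[:, j], scores)[0, 1]
        for j in range(lesion_matrix.shape[1])
    ])


correlations = lesion_deficit_correlations(lesion, deficit)
for name, r in zip(REGIONS, correlations):
    print(f"{name}: r = {r:.2f}")
```

In this made-up sample, damage to the first region goes with the deficit and damage to the second does not; a real study would also need to control for lesion size and correct for testing many locations.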

Thothathiri, Kimberg & Schwartz 2012

• NCR deficit correlates with damage to posterior temporal/inferior parietal cortex

– This area has been suggested to be involved in integrating different kinds of information

– Consistent with H1, that this is a specific deficit in mapping syntax to thematic roles





Thothathiri, Kimberg & Schwartz, 2012

• They also observe that, in their sample, there is no clear relationship between agrammatic production and the NCR deficit (see also Berndt et al., 1996)





fMRI and syntax

• Despite the challenges to the aphasia data, a number of fMRI studies in healthy adults showed differential activity in Broca’s area depending on the syntax of the sentence

• Does this indicate that IFG is a ‘syntax’ area?

Ben-Shachar & Grodzinsky, 2004

Rogalsky & Hickok’s 6 theories of Broca’s Area

1. Syntactic movement (Grodzinsky)

2. Hierarchical processing (Friederici)

3. Ordering/Linearization (Bornkessel)

4. Working memory (many)

5. Cognitive control (Thompson-Schill, Novick)

6. ‘Unification’/thematic role checking (Hagoort, Caplan)

• Measurement

• Temporal properties of BOLD response

• Analysis principles

Functional MRI

• BOLD - Blood-Oxygenation-Level-Dependent

• Oxygenated blood returns a stronger signal (less decay) than deoxygenated blood

Functional MRI Contrast

• Oxygenated blood returns a stronger signal (less decay) than deoxygenated blood

• If we look at a visual cortex MR image after a stimulus is presented, do we expect it to be brighter (more signal) or darker (less signal)?

– A: Brighter! Due to mechanisms still not well understood, much more oxygen is delivered in the blood than can be consumed, leading to excess oxygen

– ‘Watering the garden for the sake of one thirsty flower’ (Malonek & Grinvald)

Functional MRI Contrast

• Still measuring one slice at a time

• Time required to collect a brain volume much more of an issue

• One full-brain measurement every 2-4 seconds is standard

• Slices aren’t collected at exactly the same time, so need to correct for this

Functional MRI Measurement

smoothing

• Measurement

• Temporal properties of BOLD response

• Analysis principles

Functional MRI

• What is the problem with using blood flow as a measure?

– Blood flow is way slower than neurons firing!

• BOLD

– Blood Oxygenation Level Dependent response

[Figure: BOLD response timecourses by region (legend: TP, AG, BA44, BA45, BA47, MTG, STS, STG, Plan_polar, S_occ) for the contrast Reading Sentence – Reading list of consonant strings; x-axis from -6 to 20, y-axis from -0.2 to 0.4]

• What is the problem with using blood flow as a measure?

– This means that you usually can’t do an event-related design and analysis in the same way as EEG and MEG, because all of the events would mash together

– Instead you do a regression over the timecourse of the entire experiment
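The ‘regression over the timecourse’ idea can be sketched as follows: each condition’s on/off timeline is convolved with a sluggish haemodynamic response and the resulting predictors are fit to a voxel’s signal by least squares. This is a minimal illustration, not a full analysis pipeline. The double-gamma shape parameters, the onsets, and the noise-free simulated voxel are all assumptions made up for the example.

```python
import math
import numpy as np


def canonical_hrf(tr, duration=32.0):
    """Double-gamma response sampled every `tr` seconds (illustrative parameters)."""
    t = np.arange(0.0, duration, tr)
    peak = t ** 5 * np.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * np.exp(-t) / math.gamma(16)
    hrf = peak - undershoot / 6.0
    return hrf / hrf.sum()


def make_regressor(onsets, durations, n_scans, tr):
    """Boxcar for one condition, convolved with the haemodynamic response."""
    boxcar = np.zeros(n_scans)
    for onset, dur in zip(onsets, durations):
        start = int(round(onset / tr))
        stop = max(int(round((onset + dur) / tr)), start + 1)
        boxcar[start:stop] = 1.0
    return np.convolve(boxcar, canonical_hrf(tr))[:n_scans]


def fit_glm(y, regressors):
    """Least-squares fit of condition regressors plus a constant term."""
    X = np.column_stack(list(regressors) + [np.ones(len(y))])
    betas, *_ = np.linalg.lstsq(X, y, rcond=None)
    return betas


# Made-up block design: 10 s sentence blocks alternating with 10 s word-list blocks.
tr, n_scans = 2.0, 100
sentences = make_regressor([10, 50, 90, 130], [10] * 4, n_scans, tr)
word_lists = make_regressor([30, 70, 110, 150], [10] * 4, n_scans, tr)

# Noise-free simulated voxel that responds more to sentences than to lists.
voxel = 100.0 + 2.0 * sentences + 0.5 * word_lists

betas = fit_glm(voxel, [sentences, word_lists])
contrast = betas[0] - betas[1]  # sentences minus word lists
print(betas, contrast)
```

The quantity of interest is usually a contrast between betas rather than a beta on its own, which is why the example ends with the sentences-minus-lists difference.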

• What design would be the easiest to analyze?

• MRI block designs

Doug Greve

• MRI event-related designs

Doug Greve

• Downside of fMRI: horrible temporal resolution

• Upside of fMRI: good spatial resolution

– In 2 s, you can usually get a whole-brain BOLD image that has ~3mm spatial resolution

– Doesn’t depend on an ill-formed ‘source localization’ model; in almost all cases there is only one pattern of blood oxygenation changes that could have driven the observed effects

• If you see a BOLD change in an area of your image, you pretty much know that blood oxygenation changed in that region in reality

• Practical Challenges in recording fMRI

– No metal in body

– No claustrophobia

– No movement

– No money left after you run an fMRI experiment ($500+ per hour)

• Great things about fMRI

– No existing technique gives you more definitive spatial information about neural activity in healthy humans

– Better than event-related EEG/MEG at measuring activity that’s not precisely time-locked

– Facilities everywhere

– Due to massive popularity, lots of resources available for data analysis

• Does activity in Broca’s area index a certain kind of syntactic representation, or does it index a more general sentence processing operation?

Localizing ‘syntax’

• What would it mean for a brain area to implement a syntactic ‘operation’ like movement?

– Movement is not meant by most syntacticians to be a process that happens in comprehension or production; it is meant as a formal condition on what strings of a language are acceptable

– Speakers of human languages must have knowledge of this condition on acceptability, but in exactly what sense do they appeal to this knowledge during comprehension?

Matchin et al.

• We know something independent about how wh-movement sentences are processed from psycholinguistics: they are processed predictively

• Which cat did the dog chase

Filled-gap effect

• Which cat did the dog chase the butterfly for?

Stowe, 1986

Plausibility mismatch

• Which city did the woman write

Plausibility mismatch

• Which city did the woman write a book about?

Traxler & Pickering, 1996

Matchin et al.

• Maybe left IFG is active in wh-dependency sentences because it supports the process of prediction, rather than being specific to a certain syntactic configuration

Matchin et al.

• Test this by looking at a case that doesn’t have the same representation as wh-dependencies but does involve prediction: backwards anaphora

Van Gompel & Liversedge, 2003; Kazanina et al. 2007

Matchin et al.

• Important methodological point: you can’t just compare backwards anaphora and wh-dependencies in fMRI, because they differ in a million ways

• So instead, you try to manipulate prediction effort with distance, and compare more similar sentences to each other

Procedure/Task

• Make sense? judgment

Functional Localizers

• Localize a simpler function in your set of subjects

• Use this to focus your analysis on areas that you know are doing this function (reducing false positives)

• Or use this to rule out a functional hypothesis
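A minimal sketch of the localizer logic: threshold an independent localizer map to define a region of interest, then average the effect from the main experiment inside that region only. The arrays, the threshold, and the function names are hypothetical and stand in for whole-brain maps.

```python
import numpy as np


def roi_from_localizer(localizer_stat, threshold):
    """Boolean mask of voxels whose localizer statistic exceeds the threshold."""
    return localizer_stat > threshold


def mean_effect_in_roi(effect_map, roi_mask):
    """Average the main-experiment effect over the localized voxels only."""
    if not roi_mask.any():
        raise ValueError("No voxels survived the localizer threshold")
    return float(effect_map[roi_mask].mean())


# Made-up values for five voxels.
localizer_stat = np.array([0.5, 4.2, 5.1, 1.0, 3.9])  # from the simpler localizer task
main_effect = np.array([0.9, 0.2, 0.4, 1.5, 0.0])     # from the main experiment

roi = roi_from_localizer(localizer_stat, threshold=3.1)
roi_effect = mean_effect_in_roi(main_effect, roi)
print(roi, roi_effect)
```

Because the mask comes from separate data, the main-experiment test is run on one pre-selected region rather than on every voxel, which is where the reduction in false positives comes from.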

Functional localizer

Matchin et al. take-homes

• Processing backwards anaphora does engage left IFG

• Processing wh-dependencies does too, but if anything, to a lesser extent

• More generally: a linking theory between the syntactic representation and how it is processed in comprehension or production is important for interpreting neural data

Localizing syntax: the ‘new wave’

• OK, but what if we just try to study the most basic aspects of phrase structure building, so that the link between the representation and process is as simple as possible?

Some intuitions…

• Smolensky superposition framework: tensor product of a role vector that codes for the syntactic role of the word and the filler vector that codes for the content (the word itself)
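A small sketch of filler-role binding, assuming toy vectors: each word (filler) is bound to its position (role) with an outer product, the bindings are summed into one structure, and a filler is recovered by multiplying the structure with its role vector. The vectors are invented, and the clean recovery relies on the assumption that role vectors are orthonormal.

```python
import numpy as np


def bind(filler, role):
    """Tensor (outer) product of a filler vector and a role vector."""
    return np.outer(filler, role)


def encode(bindings):
    """Superpose all filler-role bindings into a single matrix."""
    return sum(bind(filler, role) for filler, role in bindings)


def unbind(structure, role):
    """Recover the filler bound to `role` (exact when roles are orthonormal)."""
    return structure @ role


# Made-up filler vectors for two words.
the = np.array([1.0, 0.0, 2.0])
poet = np.array([0.5, -1.0, 3.0])

# Orthonormal role vectors for "first word" and "second word".
roles = np.eye(2)

structure = encode([(the, roles[0]), (poet, roles[1])])
recovered = unbind(structure, roles[1])
print(structure.shape, recovered)
```

Each added word contributes one more binding to the superposition, which is the sense in which longer constituents mean more to maintain.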

Lewis & Vasishth, 2005

Lots of past work: sentences vs words

• Contrast: B - A

– B = the poet will recite a verse

– A = verse poet a recite will the

Lots of past work: sentences vs words

This is your brain on structure!

• Hmmm…lots of differences between sentences and word lists?

• Breaking it down:

– Words vs. Jabberwocky

– Parametric variation between sentences and lists

Maintenance

• Not exactly clear in this paper what happens when on a given step an input word is added into many open constituents vs. few

• In the current experiment, this question is avoided by using right-branching structures

Right-branching structures

• Means that none of the constituents are complete until the last word of the sentence (although you don’t know that in advance)

Right-branching structures

• Only exception in materials is DPs

Predictions

• Maintaining more words in the current structure should lead to more neural activity

– Longer sentences = more activity

• In the 12-word constituent condition, when you get to word 12 you’re maintaining 11 filler-role tensor products

• In the 6-word constituent condition, when you get to word 12 you’re maintaining 5 filler-role tensor products

• Area under curves ≠

• fMRI is actually a perfect technique in this case because the key prediction has to do with the total activity smeared across the entire duration of the sentence
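The prediction can be made concrete by counting, at each word, how many earlier words of the current constituent are being held, and then summing over the 12-word stimulus. This follows the counts given above (11 at word 12 in the 12-word condition, 5 in the 6-word condition). Treating the summed count as a proxy for total activity is a simplifying assumption, and the function name is hypothetical.

```python
def maintained_bindings(n_words, constituent_size):
    """Number of earlier words held in the current constituent at each word."""
    return [position % constituent_size for position in range(n_words)]


def total_load(n_words, constituent_size):
    """Summed load across the stimulus, a stand-in for the area under the curve."""
    return sum(maintained_bindings(n_words, constituent_size))


load_12 = maintained_bindings(12, 12)
load_6 = maintained_bindings(12, 6)

print(load_12)  # rises steadily across all 12 words
print(load_6)   # resets when the second 6-word constituent starts
print(total_load(12, 12), total_load(12, 6))
```

Under these assumptions the summed load is 66 for one 12-word constituent and 30 for two 6-word constituents, so the two conditions differ in total even though they contain the same number of words.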

Procedure/Task

• Rapid serial visual presentation

• Randomized presentation

• Subjects have to detect rare ‘probe’ sentences and be prepared for a post-experiment memory test on the words they saw

Results

Timing

Claims

• In processing syntactic constituents, IFG and pSTS do something like the vector product structure maintenance from Smolensky’s model

– Isn’t pSTS Wernicke’s area…lexical semantics?

Claims

• In processing syntactic constituents, IFG and pSTS do something like the vector product structure maintenance from Smolensky’s model

• aSTS, TP, TPJ are involved in semantics

– aSTS for smaller constituents, TP and TPJ for discourse?

• The fact that any areas track constituency for Jabberwocky argues against a purely lexicalist framework for syntactic processing

Outstanding Questions

• Alternative explanations

– Are subjects just paying more attention to full sentences?

– Is this just a classic Broca’s-area-memory-load result in which comprehending sentences requires you to hold more items in memory than comprehending sub-sentences?

– If it is about maintaining constituents, do they have to be connected? What if the task probed memory for sub-sentential constituents?

Outstanding Questions

• What about sub-sentential constituents?

– E.g. if two 6-word clauses are coordinated, is it more like the 2 x 6-word condition or like the 12-word condition?

• What about morphology?

• What about Lewis/McElree claims that there is no active maintenance of working memory?

Breaking it down

• with William Matchin

Breaking it down…

flat caps shiny gowns

flat caps and shiny gowns

• See also Ding, Melloni & Poeppel (submitted) work on coding constituent structure with neural oscillations

Neural basis of ‘syntax’

• Simplistic framing is incoherent

• But if we break this down into component parts, with an accompanying processing theory, there are reasonable questions to pursue

Neural basis of syntax

• Past claims have suffered from many challenges

• But recent work suggests some very promising ways forward!
