The 30th Annual Conference of the Cognitive Science Society Computational Modeling of Spoken Language Processing: A hands-on tutorial

The 30th Annual Conference of the Cognitive Science Society · 2015-01-03


The 30th Annual Conference of the Cognitive Science Society

Computational Modeling of Spoken Language Processing:

A hands-on tutorial


Computational Modeling of Spoken Language Processing: A hands-on tutorial

Ted Strauss
New School University

Dan Mirman
Jim Magnuson
University of Connecticut Department of Psychology
and
Haskins Laboratories


Plan
• Module 1: Introduction, About TRACE
• Module 2: Tour of jTRACE
• Module 3: Classic simulations
• Module 4: Scripting
• Module 5: Linking hypotheses
• Module 6: Lab time, Q&A, one-on-one


An aside
• Why did we develop jTRACE?
  – To facilitate large-scale modeling
  – To promote active testing of TRACE predictions and wider use of modeling
  – To facilitate replication and sharing of simulations
• How did we develop jTRACE?
  – With a budget supplement to an NIDCD R01 and a couple Ted years
• Why are we doing this tutorial?


Module 5: Advanced topics
• How do you decide whether a model has succeeded or failed?
  – Connecting model to human behavior
• Pitfalls: simulations can fail at multiple levels
  – Theory -- most interesting/informative
  – Implementational details/parameters
  – Linking hypotheses -- not a model failure -- equivalent to flawed operational definitions in an experiment!
• Before assuming a failure has theoretical implications, other levels must be excluded


Linking hypotheses
• Informal: does model capture basic trends?
• Formal: linking hypothesis
• Link model to data by constructing task constraints for the model analogous to those faced by human subjects
  • Model: Activations over time
  • Data: Reaction times/accuracy for specific decisions or behaviors (lexical decision, eye tracking, ERP)


Simple example 1: Threshold
• Recognition = word unit activation exceeds threshold
• RT ≈ number of processing cycles from word onset
• Activation ~ internal state; what about choice behavior?
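The threshold rule above can be sketched in a few lines of Python. This is only an illustration: the function name, the toy activation values, and the threshold of 0.25 are assumptions made for the example, not jTRACE output or jTRACE API.

```python
from typing import Optional, Sequence


def cycles_to_threshold(activations: Sequence[float],
                        threshold: float,
                        onset_cycle: int = 0) -> Optional[int]:
    """Cycles from word onset until activation first exceeds the threshold.

    Returns None if the threshold is never exceeded (no recognition).
    """
    for cycle in range(onset_cycle, len(activations)):
        if activations[cycle] > threshold:
            return cycle - onset_cycle
    return None


# Hypothetical activation of one word unit, one value per processing cycle.
toy_activation = [0.00, 0.02, 0.05, 0.11, 0.19, 0.28, 0.36, 0.41]

# Predicted RT in cycles under an (assumed) threshold of 0.25.
predicted_rt = cycles_to_threshold(toy_activation, threshold=0.25)
```

Returning None rather than a large number keeps "never recognized" distinct from "recognized late", which matters when averaging predicted RTs over items.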


Simple example 2: Response probability
• RT ≈ number of processing cycles from word onset
• Additional competition analogous to human choice behavior in many domains
• Formalization of overt choice based on internal states
• When to use: many choice situations, but especially AFC
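A response-probability linking hypothesis of this kind might look like the sketch below. It assumes the exponential response strength used later in the tutorial (S_i = e^(k a_i)) and normalizes over all candidate words. The value k = 7.0, the criterion of 0.9, the word labels, and the activations are illustrative assumptions.

```python
import math
from typing import Dict, List, Optional


def response_probabilities(activations: Dict[str, float],
                           k: float = 7.0) -> Dict[str, float]:
    """Convert word activations at one cycle into response probabilities.

    Each activation a_i becomes a strength S_i = exp(k * a_i); each
    probability is that strength divided by the sum of all strengths.
    The default k is an illustrative value, not one from the slides.
    """
    strengths = {word: math.exp(k * a) for word, a in activations.items()}
    total = sum(strengths.values())
    return {word: s / total for word, s in strengths.items()}


def cycles_to_criterion(time_course: List[Dict[str, float]],
                        target: str,
                        criterion: float,
                        k: float = 7.0) -> Optional[int]:
    """First cycle at which the target's response probability exceeds criterion."""
    for cycle, activations in enumerate(time_course):
        if response_probabilities(activations, k)[target] > criterion:
            return cycle
    return None


# Toy activations for three hypothetical word units over four cycles.
toy_time_course = [
    {"net": 0.00, "neck": 0.00, "nest": 0.00},
    {"net": 0.10, "neck": 0.10, "nest": 0.05},
    {"net": 0.30, "neck": 0.15, "nest": 0.02},
    {"net": 0.55, "neck": 0.05, "nest": 0.00},
]
final_probs = response_probabilities(toy_time_course[-1])
rt_cycles = cycles_to_criterion(toy_time_course, "net", criterion=0.9)
```

Because every probability depends on the strengths of all candidates, a competitor that stays active delays the target's crossing even when the target's own activation is unchanged.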


Pitfalls in modeling
1. Material selection
2. Material manipulation
3. Linking hypotheses
4. Logic


Pitfall 1: Material selection
• Frauenfelder & Peeters (1998)
• Feedback doesn’t help
  – Half their items were recognized more quickly with feedback off
  – Feedback only allows TRACE to account for top-down effects?
• 21 items: 7 phones long, UP at phone 4
• Magnuson, Strauss, & Harris (2005)
• Tested 900 words
• With/without feedback
• Increasing levels of noise
• 73% of words recognized more quickly w/feedback
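An item-by-item comparison like the one above reduces to a simple tally once each word has a recognition time in both conditions. The sketch below assumes recognition times are stored as cycles per word; the function name, the words, and the values are made up for illustration and are not results from either study.

```python
from typing import Dict


def proportion_faster_with_feedback(with_feedback: Dict[str, int],
                                    without_feedback: Dict[str, int]) -> float:
    """Proportion of words recognized in fewer cycles with feedback on.

    Only words present in both conditions are compared; ties do not
    count as faster.
    """
    shared = [w for w in with_feedback if w in without_feedback]
    if not shared:
        raise ValueError("no words in common between the two conditions")
    faster = sum(1 for w in shared if with_feedback[w] < without_feedback[w])
    return faster / len(shared)


# Hypothetical recognition times (cycles) for four words.
toy_with = {"bell": 40, "bed": 45, "net": 50, "neck": 60}
toy_without = {"bell": 44, "bed": 45, "net": 48, "neck": 66}

proportion = proportion_faster_with_feedback(toy_with, toy_without)
```

With only a handful of items the proportion can swing widely from one item set to another, which is the reason to run the same tally over a large lexicon.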


Pitfall 2: Material manipulation
• Model materials must be held to same standard as behavioral materials
• Material manipulations should have analogous effects
• Marslen-Wilson & Warren, 1994 (subcategorical mismatch)
• Hypothesis:
  • If there is lateral inhibition between words,
  • Then, if we provide misleading coarticulation consistent with a
    • Word: substantial lexical competition
    • Nonword: less competition


nett: net spliced with net

nekt: neck spliced with net

nept: nep* spliced with net

RT = 487

RT = 609

RT = 610


Pitfall 2: Material manipulation

• Marslen-Wilson & Warren (1994): TRACE simulations

[Figure: response probability (W1) by cycle / 4 for W1W1, W2W1, and N3W1]

Words, lexical decision:
  nett   487
  nept   609
  neckt  610


Pitfall 2: Material manipulation

[Figure: response probability (W1) by cycle / 4 for W1W1, W2W1, and N3W1]

Words, lexical decision:
  nett   487
  nept   609
  neckt  610

[Figure: closest replication with appropriate cross-splicing; response probability by cycle (0 to 60) for W1 and W2 given w1w1, w2w1, and n3w1]

• Original splicing was so late that “neck” was recognized instead of “net”!
• We cross-spliced at the latest slice where “net” was still ultimately recognized


Pitfall 3: Linking hypotheses
• How can we reconcile TRACE activations and human LD RTs?
• Lexical decision need not be based on target
• What if we set threshold so that W2’s activation sometimes passes?

Lexical decision:
  nett   487
  nept   609
  neckt  610

[Figure: response probability by cycle (0 to 60) for W1 and W2 given w1w1, w2w1, and n3w1]

From TRACE activations

[Figure: predicted LDT (cycles, 40 to 70) by threshold (0.10 to 0.30) for w1w1, w2w1, and n3w1]

Predicted LDT (cycles):
  w1w1  48
  n3w1  54
  w2w1  53
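One way to read the question above is that a "yes" lexical decision fires as soon as any word unit passes threshold, not only the target. The sketch below contrasts that rule with a target-only rule. The trajectories, labels, and threshold values are invented for illustration and do not reproduce the simulations in the figures.

```python
from typing import Dict, Optional, Sequence


def lexical_decision_cycle(trajectories: Dict[str, Sequence[float]],
                           threshold: float) -> Optional[int]:
    """First cycle at which ANY word unit exceeds threshold (a 'yes' response).

    Returns None if no unit ever exceeds the threshold.
    """
    n_cycles = min(len(t) for t in trajectories.values())
    for cycle in range(n_cycles):
        if any(t[cycle] > threshold for t in trajectories.values()):
            return cycle
    return None


# Hypothetical w2w1-style item: the competitor (W2) rises early and then
# fades, while the target (W1) rises later.
toy_w2w1 = {
    "W1": [0.00, 0.05, 0.10, 0.15, 0.22, 0.30, 0.40],
    "W2": [0.00, 0.08, 0.16, 0.24, 0.20, 0.12, 0.05],
}

any_word_low = lexical_decision_cycle(toy_w2w1, threshold=0.20)
target_only_low = lexical_decision_cycle({"W1": toy_w2w1["W1"]}, threshold=0.20)
any_word_high = lexical_decision_cycle(toy_w2w1, threshold=0.25)
```

In this toy case the predicted decision time depends on whether the threshold sits below or above the competitor's peak, so sweeping the threshold is a natural way to explore the linking hypothesis.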


The “visual world” paradigm

Eye tracking

[Figure: example trials and averaged data; fixation proportion over time (200 ms, 400 ms, 700 ms marked) for target, competitor, and unrelated items]


Linking hypothesis
• Link fixation data to TRACE word activations
• Luce Choice rule: Activations and strengths based on entire lexicon; choice only includes the 4 onscreen items -- analogous to choice faced by subjects

$$L_{target} = \frac{S_{target}}{S_{target} + S_{comp} + S_{d1} + S_{d2}}$$

• Provides clear and testable predictions about number and nature of the items in the display
• See Dahan et al. (2001) for examples where this simple linking hypothesis accurately predicts changes in fixation proportions depending on display
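The display-restricted choice rule can be sketched as follows. Strengths are taken as given (computed over the entire lexicon), and only the four on-screen items enter the denominator. The item names and strength values are hypothetical.

```python
from typing import Dict, Sequence


def display_choice_probability(strengths: Dict[str, float],
                               display: Sequence[str],
                               item: str) -> float:
    """Luce choice probability of `item` among the on-screen items only."""
    if item not in display:
        raise ValueError(f"{item!r} is not in the display")
    denominator = sum(strengths[word] for word in display)
    return strengths[item] / denominator


# Hypothetical response strengths at one cycle, based on the whole lexicon.
toy_strengths = {"net": 6.0, "neck": 3.0, "bell": 0.5, "ship": 0.5, "cat": 0.5}

# Same target, two displays: with and without the cohort competitor.
cohort_present = ["net", "neck", "bell", "ship"]
cohort_absent = ["net", "cat", "bell", "ship"]

p_present = display_choice_probability(toy_strengths, cohort_present, "net")
p_absent = display_choice_probability(toy_strengths, cohort_absent, "net")
```

The target's strength is identical in both displays; only the choice set changes, which is how the rule yields display-dependent predictions.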


Fixation data

[Figure: fixation proportion by msecs since target onset (0 to 1200). Target (w1, net) fixations and cohort (w2, neck) fixations for w1w1 - ne(t)t, n3w1 - ne(p)t, and w2w1 - ne(ck)t]

Model predictions

[Figure: predicted fixation probability by cycle (0 to 60). Target (w1, net) and cohort (w2, neck) for w1w1 - ne(t)t, n3w1 - ne(p)t, and w2w1 - ne(ck)t]

Dahan, Magnuson, Hogan & Tanenhaus, 2001

[Figure: cohort absent and cohort present. Response probability by cycle (0 to 96) and fixation proportion by msecs since target onset (0 to 1600) for W1 and D given w1w1, w2w1, and n3w1; average vowel offset and average word offset marked]


Advantages of linking hypotheses

• Formalizes model of both internal state and overt behavior for specific tasks
• Also links cognitive systems -- e.g., visual attention and lexical activation
• Facilitates the interpretation of behavioral data from complex experimental paradigms
• Without careful, explicit modeling of task constraints via linking hypotheses, simple model activations can be misleading


Pitfall 4: Logic
• Sometimes, the most reasonable predictions turn out to be wrong when you test them by simulation
• Case in point: word frequency
• Assumptions about modeling: resting activation ≈ bottom-up connection strength
• Assumptions about empirical results:
  – Absence of frequency effects in early responses
  – Therefore, frequency is a late, top-down bias


Proposed loci of frequency
• HF words recognized more quickly than LF words
• Early sampling (e.g., fast reactions) sometimes fails to detect frequency effects (Connine, Titone, & Wang, 1993)
• Conclusion: frequency is a late/2nd stage bias?

[Diagrams: three proposed loci, each drawn over phoneme units and the word units Bed and Bell: bottom-up dependent (connection strengths); constant bias (resting levels); late bias (???), late, external]


Time course implications
• Is a late frequency effect evidence for 2nd stage?
• No; consistent with bottom-up dependency

[Figure panels: constant bias (resting levels); bottom-up dependent (connection strengths); late bias (???)]


Cohort frequency
Sample critical display
• LF target (bell)
• LF cohort (bench)
• HF cohort (bed)


Model predictions

[Figure: four panels of predicted fixation proportion by time since target onset (ms, 0 to 1000) for Target, HF Cohort, LF Cohort, and Unrelated]

• No frq.: $S_i = e^{k a_i}$
• Rest: $R' = R + s(\log[f + 1])$
• Post: $S_i' = S_i(\log[f + s])$
• Weight: $a'_{pi} = a_{pi} + a_{pi}[s(\log[f + 1])]$
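The three frequency formulas (Rest, Post, Weight) translate directly into code. The sketch below is a literal transcription with one loud assumption: the slides do not state the base of the logarithm, so base 10 is assumed here. Function names, parameter names, and the example values are also illustrative.

```python
import math


def rest_with_frequency(resting_level: float, frequency: float, s: float) -> float:
    """Rest: R' = R + s * log(f + 1). Frequency shifts the resting level."""
    return resting_level + s * math.log10(frequency + 1)  # base 10 is an assumption


def post_activation_strength(strength: float, frequency: float, s: float) -> float:
    """Post: S_i' = S_i * log(f + s). Frequency scales the response strength."""
    return strength * math.log10(frequency + s)  # requires f + s > 0


def weight_with_frequency(phoneme_to_word_input: float,
                          frequency: float,
                          s: float) -> float:
    """Weight: a'_pi = a_pi + a_pi * (s * log(f + 1)).

    Frequency scales the bottom-up (phoneme-to-word) input, so its effect
    grows with the amount of bottom-up support.
    """
    a = phoneme_to_word_input
    return a + a * (s * math.log10(frequency + 1))


# Hypothetical values chosen so the logarithms come out to whole numbers.
rest_example = rest_with_frequency(resting_level=0.0, frequency=99, s=0.1)
post_example = post_activation_strength(strength=2.0, frequency=9, s=1.0)
weight_example = weight_with_frequency(phoneme_to_word_input=1.0, frequency=9, s=0.5)
```

Note that only the weight version multiplies frequency by the bottom-up input: with no input, the adjustment is zero, whereas the resting-level version applies the same offset at every cycle.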


Model-data comparisons

[Figure: Data. Fixation proportion by msecs since target onset (0 to 1000) for Target, HF Cohort, LF Cohort, and Unrelated; average target offset marked]

[Figure: Model (weight version). Predicted fixation proportion by time since target onset (msecs, 0 to 1000) for Target, HF Cohort, LF Cohort, and Unrelated]

        T     H     L     D
r2     .97   .96   .94   .80
RMS    .09   .03   .03   .06


Gauging success and failure
• Given an apparent mismatch between model and data, you must avoid the pitfalls -- you must exclude:
  – Poor analogs to materials (representativeness, manipulations)
  – Insufficient linking hypothesis
  – Worst case: mismatch between expectations and data -- jTRACE was created to encourage more testing of logical expectations
• Then, if you still have a mismatch between model and data, you must determine the level of the failure
  – Parameter value?
  – Specific aspects of model mechanisms
  – Theoretical assumptions


Gauging success and failure
• Model flexibility, model fit
  – Fit measures: r2, RMS error
  – Debates over whether fit is sufficiently constraining
  – Parameter space partitioning (Pitt et al., 2005)
• Comparing two models
  – Occam’s razor
• More important?
  – Breadth
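The two fit measures named above can be computed as in this sketch. It assumes r2 means the squared Pearson correlation between model and data time courses sampled at matching time points; the slides do not define it, so treat that as an assumption. The sample values are made up.

```python
import math
from typing import Sequence


def rms_error(model: Sequence[float], data: Sequence[float]) -> float:
    """Root-mean-square difference between model and data values."""
    if len(model) != len(data) or not model:
        raise ValueError("model and data must be non-empty and equal in length")
    squared = [(m - d) ** 2 for m, d in zip(model, data)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(model: Sequence[float], data: Sequence[float]) -> float:
    """Squared Pearson correlation between model and data values."""
    if len(model) != len(data) or len(model) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_m = sum(model) / len(model)
    mean_d = sum(data) / len(data)
    cov = sum((m - mean_m) * (d - mean_d) for m, d in zip(model, data))
    var_m = sum((m - mean_m) ** 2 for m in model)
    var_d = sum((d - mean_d) ** 2 for d in data)
    if var_m == 0 or var_d == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov ** 2 / (var_m * var_d)


# Hypothetical model predictions and observed proportions at four time points.
toy_model = [0.1, 0.2, 0.3, 0.4]
toy_data = [0.2, 0.4, 0.6, 0.8]

toy_r2 = r_squared(toy_model, toy_data)
toy_rms = rms_error(toy_model, toy_data)
```

The toy series are perfectly correlated yet differ in magnitude, so r2 is high while RMS error is not zero; reporting both, as in the model-data comparison table, captures shape and scale separately.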


Using jTRACE
• Now what?
  – Work through the examples in the gallery (see documentation in ‘help’)
  – Make a plan for doing your own simulations
  – Email us if you need help (or find bugs, or have feature requests*)
  – Save your simulations! This is a great way to save yourself work and facilitate replication
• Right now: ‘lab time’