Back to the future: Where we’re going, we don’t need phonemes. Implications of a gradient lexicon


Bob McMurray, University of Iowa

Dept. of Psychology

Collaborators

Richard Aslin, Michael Tanenhaus, David Gow

Joe Toscano, Cheyenne Munson, Meghan Clayards, Dana Subik

The students of the MACLab

In language, information arrives sequentially.

• Partial syntactic and semantic representations are formed as words arrive.

The cowboy chased by the

• Words are identified over sequential phonemes.

Linguistics department yodeled

Spoken Word Recognition is an ideal arena in which to study these issues because:

• Speech production gives us a lot of rich temporal information to use in this way.

• We have a clear understanding of the input (from phonetics).

• The output is easy to measure online

Online Comprehension

• Listeners form hypotheses as the input unfolds.

• Need measurements of how listeners interpret speech, moment-by-moment.

• May reveal how information is integrated:

- Discreteness vs. Gradiency
- Combinatorial Units

Mechanisms of Temporal Integration

Stimuli do not change arbitrarily.

Perceptual cues reveal something about the change itself.

Active integration:
• Anticipating future events.
• Retain current partial representations.
• Resolve prior ambiguity.

Overview

1) Speech perception and Spoken Word Recognition.

[Figure: lexical candidates (beach, bump, putter, dog, butter) activated over time as the input "b... u… tt… e… r" unfolds.]

2) Lexical activation is sensitive to fine-grained detail in speech.

3) Where we’re going, we don’t need phonemes: evidence for continuous information integration.

4) Back in time: staying off the garden-path.

5) Forward to the future: coping with (and benefiting from) phonological modification.

[Figure: lexical candidates for the onset “ba…” (bakery, basic, barrier, barricade, bait, baby).]

Online Word Recognition

• Information arrives sequentially.

• At early points in time, signal is temporarily ambiguous.

• Later arriving information disambiguates the word.

Current models of spoken word recognition

• Immediacy: Hypotheses formed from the earliest moments of input.

• Activation Based: Lexical candidates (words) receive activation to the degree they match the input.

• Parallel Processing: Multiple items are active in parallel.

• Competition: Items compete with each other for recognition.

[Figure: candidates beach, bump, putter, dog and butter competing over time.]

These processes have been well defined for a phonemic representation of the input.

But there is considerably less ambiguity if we consider subphonemic information.

Example: subphonemic effects of motor processes.

Coarticulation

Sensitivity to these perceptual details might yield earlier disambiguation.

Example: Coarticulation

Articulation (lips, tongue…) reflects current, future and past events.

Subtle subphonemic variation in speech reflects temporal organization.

[Figure: coarticulation example (segment labels: n, e, t, c, k).]

Any action reflects future actions as it unfolds.

These processes have largely been ignored because of a history of evidence that perceptual variability gets discarded.

Example: Categorical Perception

Categorical Perception

Subphonemic variation in VOT is discarded in favor of a discrete symbol (phoneme).

• Sharp identification of tokens on a continuum.

• Discrimination poor within a phonetic category.

[Figure: identification (% /pa/) and discrimination along a B-P VOT continuum.]

Categorical Perception (CP)

Defined fundamental computational problems.

CP is output of
• Speech perception

Input to
• Phonology
• Word recognition.

[Diagram: processing levels Sound, Phonemes, Words and Sense, with Phonology.]


Enables a divide-and-conquer approach.

But, assumes that

1) Speech tasks tap phonemes (or something like them)

2) Phonemes (or something like them) are legitimate processing units.

Evidence against the strong form of Categorical Perception from psychophysical-type tasks:

Discrimination Tasks: Pisoni & Tash (1974); Pisoni & Lazarus (1974); Carney, Widin & Viemeister (1977)

Training: Samuel (1977); Pisoni, Aslin, Perey & Hennessy (1982)

Goodness Ratings: Miller (1997); Massaro & Cohen (1983)

Classic explanation:

Auditory tasks: non-categorical
Phonological tasks: categorical

Paradigmatic CP: within-category variation is noise. Not important to higher language.


“Phonological/auditory” task distinction

• validates standard paradigm?

or

• assume standard paradigm?

[Diagram: Sound, Phonemes, Words and Sense, with Phonology; continuous cues (non-CP) added.]

Minimal computational problem:

Computing meaning.

CP tasks don’t necessarily tap a stage of this problem.

Lexical representation: clearly a component.

Goal: Reassess continuous sensitivity (non-CP) w.r.t. words

Does within-category acoustic detail systematically affect higher level language?

Is there a gradient effect of continuous acoustic detail on lexical activation?

Experiment 1

A gradient relationship would yield systematic effects of subphonemic information on lexical activation.

If this gradiency is useful for temporal integration, it must be preserved over time.

Need a design sensitive to both acoustic detail and detailed temporal dynamics of lexical activation.

Experiment 1

McMurray, Aslin & Tanenhaus (2002)

Use a speech continuum: more steps yield a better picture of the acoustic mapping.

KlattWorks: generate synthetic continua from natural speech.

Acoustic Detail

9-step VOT continua (0-40 ms)

6 pairs of words.
beach/peach, bale/pale, bear/pear
bump/pump, bomb/palm, butter/putter

6 fillers.
lamp, leg, lock, ladder, lip, leaf
shark, shell, shoe, ship, sheep, shirt

How do we tap on-line recognition?
With an on-line task: Eye-movements

Subjects hear spoken language and manipulate objects in a visual world.

Visual world includes set of objects with interesting linguistic properties.

a beach, a peach and some unrelated items.

Eye-movements to each object are monitored throughout the task.

Temporal Dynamics

Tanenhaus, Spivey-Knowlton, Eberhard & Sedivy, 1995

• Relatively natural task.

• Eye-movements generated very fast (within 200ms of first bit of information).

• Eye movements time-locked to speech.

• Subjects aren’t aware of eye-movements.

• Fixation probability maps onto lexical activation.

Why use eye-movements and visual world paradigm?

A moment to view the items

Task

Task

Bear

Repeat 1080 times

Identification Results

[Figure: proportion of /p/ responses as a function of VOT (0-40 ms), from B to P.]

Category boundary:
By subject: 17.25 +/- 1.33 ms
By item: 17.24 +/- 1.24 ms

High agreement across subjects and items for category boundary.

Task

Target = Bear

Competitor = Pear

Unrelated = Lamp, Ship

[Schematic: fixations on trials 1-5 plotted over time and combined into % fixations; 200 ms marked.]

[Figure: fixation proportion over time (0-2000 ms) at VOT=0 and VOT=40.]

More looks to competitor than unrelated items.

Given that
• the subject heard bear
• clicked on “bear”…

How often was the subject looking at the “pear”?

[Schematic: predicted target and competitor fixation proportions over time under Categorical Results vs. a Gradient Effect.]

Results

[Figure: competitor fixations as a function of time since word onset (ms), one curve per VOT step; one panel per response, covering VOTs of 0-15 ms and 20-40 ms.]

Long-lasting gradient effect: seen throughout the timecourse of processing.

Area under the curve:

[Figure: competitor fixations (looks to the competitor) as a function of VOT (0-40 ms), plotted separately for each response, with the category boundary marked.]

Clear effects of VOT: B: p=.017*, P: p<.001***
Linear Trend: B: p=.023*, P: p=.002***

Unambiguous Stimuli Only

[Figure: the same plot restricted to unambiguous stimuli, with the category boundary marked.]

Clear effects of VOT: B: p=.014*, P: p=.001***
Linear Trend: B: p=.009**, P: p=.007**

Summary

Subphonemic acoustic differences in VOT have gradient effect on lexical activation.

• Gradient effect of VOT on looks to the competitor.

• Seems to be long-lasting.

• Effect holds even for unambiguous stimuli.

Consistent with growing body of work using priming (Andruski, Blumstein & Burton, 1994; Utman, Blumstein & Burton, 2000; Gow, 2001, 2002).

An alternative framework

1) Word recognition is systematically sensitive to subphonemic acoustic detail.

2) Continuous acoustic detail is represented as gradations in activation across the lexicon.

3) This can do the work of sublexical units like phonemes.

4) Gradient sensitivity coupled to normal word recognition processes enables the system to take advantage of subphonemic regularities for temporal integration.

Lexical Sensitivity


Voicing Laterality, Manner, Place Natural Speech Vowel Quality Infant voicing categories


Metalinguistic Tasks

Extensions

[Figure: competitor fixations (looks to B) as a function of VOT (0-40 ms) for Response=B and Response=P, with the category boundary marked.]

Lexical Sensitivity




? Non-minimal pairs
? Duration of effect (Exp 3-4)

[Figure: candidates bun, bumper, pump, dump, bump and bomb activated over time as the input "b... u… m… p…" unfolds.]

2) Continuous acoustic detail is represented as gradations in activation across the lexicon.

If lexical processes can represent speech detail, do we need sublexical processes?

Perhaps: How are multiple cues (to the same phoneme) integrated? (Exp 2)

3) This can do the work of sublexical units like phonemes.

Regressive ambiguity resolution (exp 3-5):
• Ambiguity retained until more information arrives.

Progressive expectation building (exp 5-6):
• Phonetic distinctions are spread over time.
• Anticipate upcoming material.

Temporal Integration

4) Gradient sensitivity coupled to normal word recognition processes enables the system to take advantage of subphonemic regularities for temporal integration.

Overview





5) Forward in time: coping with (and benefiting from) phonological modification.

Traditional speech chain: signal-> phonemes -> words

phonemes

words

Substitute your favorite sublexical unit here (syllables, diphones, etc)…

ə

Measuring phonemes

What do phonemes do?

phonemes

words

phonemes

words

1) Categorize continuous acoustic detail?

2) Integrate multiple cues?

3) Generalize phonological information during development?

4) Learning new words?

5) Speech production?

6) Reading?

Measuring phonemes

phonemes

words

phonemes

words

1) Categorize continuous acoustic detail? (categorizer)

2) Integrate multiple cues? (buffer)

What do you do with phonemes?

We have an extremely sensitive measure of:
- lexical activation
- temporal dynamics

Can we use it to assess sublexical processes?
- Categorization
- Integration

Occam’s razor: if this stuff doesn’t happen until the information meets the lexicon then phonemes are
- not adding anything computationally.
- not theoretically necessary.

Measuring phonemes

No.

Computational Necessity of the phoneme.

Logic:
Take the phoneme seriously.
What does it do (computationally) (in a specific task)?
Does that actually get done (during comprehension)?

The psychological reality of the phoneme?

May still “exist”…
May still have computational necessity for other tasks…

1) Categorizing continuous detail

1) Categorize continuous acoustic detail. (phoneme as “categorizer”)

Categorical perception: continuous detail discarded.
- only in metalinguistic tasks (what are these tapping?)

Gradient lexical activation: continuous detail systematically affects lexical activation.

No phonemes required

2) Integrating multiple cues

2) Integrating multiple cues. (phoneme as “buffer”)

Phonetic cues for a given phoneme are spread out over time.

• Combined at phoneme level prior to accessing lexicon?

or

• Direct access to lexicon. Lexical integration?

Integrating multiple cues

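[Diagram: Cue 1 and Cue 2 arrive at different times; they are either combined at a Phoneme level before reaching the Lexicon, or passed to the Lexicon directly.]

Phoneme Story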

• Integration before lexical access

• “Buffer”

The Alternative

• Integration at the Lexicon

The Logic

1) Assess Temporal Integration.

- Use two asynchronous cues to a single phoneme.
- Assess lexical activation over time.

Simultaneous effects: phonemic integration.
Asynchronous effects: lexical integration.

Which cues?

VOT, Vowel Length

Asynchronous cues

Asynchronous cues to voicing: VOT, Vowel Length

Both covary with speaking rate: rate normalization

Phonetic Context

Manner of Articulation: Formant Transition Slope (FTSlope). Temporal cue like VOT: covaries with vowel length.

9-step VOT continua (0-40 ms): beach/peach, beak/peak, bees/peas

9-step formant transition slope continua: bench/wench, belt/welt, bell/well

x 2 Vowel Lengths

The usual task

1080 Trials

Experiment 2

Results

Step 1: Assess gradiency

Step 2:Assess temporal integration

Manner Continua

9-step b/w continua. VOT varied.
bench/wench, belt/welt, bell/well

[Figure: % W responses as a function of continuum step (1-9) for long and short vowels. N=36]

Experiment 1

Looks to competitor

[Figure: looks to the competitor over time (0-2000 ms) by continuum step; one panel shows looks to B (steps 1-5), the other looks to W (steps -5 to -1).]

     All     Cons
B    .0001   .003
W    .0001   .01

Gradiency Results

Exp 2

[Table: gradiency effects (All, Cons) for Manner (B, W) and Voicing (B, P); cell values were not preserved.]

Replication

• Replication of gradiency with Manner + Voicing.

• Phoneme not required as “categorizer” (categorization isn’t happening)


Results: Temporal Dynamics

When do effects on lexical activation occur?

VOT / FTStep effects co-occur with vowel length effects. (Phonemic integration)

VOT / FTStep effects precede vowel length effects. (Lexical integration)

Compute 2 effect sizes at each 20 ms time slice.

• VOT / FTStep: Regression slope of competitor fixations as a function of VOT.

[Figure: looks to P over time (0-2000 ms) by continuum step (-5 to -1), with one time slice t marked; beside it, competitor fixations at that slice plotted against distance from boundary (VOT, -30 to 0) with a fitted line.]

Time = 320 ms: $Y = M_{320}\,x + B$, with $M_{320} = 0$

Time = 720 ms: $Y = M_{720}\,x + B$

• Vowel Length: Difference (D) between fixations after hearing long vs. short vowel (L - S = VL).

[Figure: competitor fixations over time (0-2500 ms) for long vs. short vowels at VOT = 30.]

• Repeat for each time slice, subject.

Resulting dataset…

Subject   Time   VOT (M)    Vowel (D)
1         20     -0.0023    0.0094
          40     -0.0016    0.0095
          60     -0.0008    0.0108
          …
          2000   0.06021    0.123
2         20     0.0014     0.0091
          40     0.0018     0.0088
          60     0.0029     0.0104
          …
          2000   0.0604     0.1223
…

Compute average effect size at each time slice.

When does it (statistically) depart from 0?
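The effect-size procedure above can be sketched roughly as follows. This is an illustrative sketch only, not the analysis code behind these slides: the function names, the one-sample t criterion, the critical value `t_crit`, and the requirement of `run` consecutive slices are all assumptions, since the slides do not say which test decided when an effect departs from 0.

```python
import numpy as np

def slope_effect(x, y):
    """Slope M of the least-squares line Y = M*x + B (the VOT / FTStep effect)."""
    slope, _intercept = np.polyfit(x, y, 1)
    return float(slope)

def vowel_difference(fix_long, fix_short):
    """Vowel-length effect D: mean fixations after long minus after short vowels."""
    return float(np.mean(fix_long) - np.mean(fix_short))

def t_statistic(effects):
    """One-sample t statistic of per-subject effects against 0 (assumed test)."""
    effects = np.asarray(effects, dtype=float)
    sd = effects.std(ddof=1)
    return float(effects.mean() / (sd / np.sqrt(effects.size)))

def first_departure(times, effects_by_time, t_crit=2.0, run=3):
    """First time slice at which t exceeds t_crit for `run` consecutive slices.

    effects_by_time has one row per time slice and one column per subject.
    Returns None if the effect never departs from 0 by this criterion.
    """
    count = 0
    for i, row in enumerate(effects_by_time):
        count = count + 1 if t_statistic(row) > t_crit else 0
        if count == run:
            return times[i - run + 1]
    return None
```

Running `first_departure` once on the slope values and once on the vowel-length differences gives the two onset times that the following slides compare.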

Voicing Continua: Temporal Dynamics

[Figures: effect size over time (0-2000 ms) for VOT vs. vowel length, shown for voiced sounds only, voiceless sounds only, and combined.]

Voiced Sounds Only: VOT 660 ms, Vowel 820 ms
Voiceless Sounds Only: VOT 640 ms, Vowel 780 ms
Combined: VOT 560 ms, Vowel 800 ms

Manner Continua: Temporal Dynamics

[Figures: effect size over time (0-2000 ms) for FTStep vs. vowel length, shown for stops only, approximants only, and combined.]

Stops only: FTStep 840 ms, Vowel 1340 ms


Integration Results

                  VOT/FT   Vowel
Voicing     B     660      820
            P     640      780
            All   560      800
Manner      B     840      1340
            W     280      860
            All   620      820

Replication Underway

                  VOT/FT   Vowel
Voicing II  All   560      800
            B     560      800
            P     660      860
Manner II   All   620      740
            B     440      680
            W     700      880
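As a small worked example of how the Integration Results can be read, the sketch below computes the lag between the onset of the first cue (VOT or FT) and the onset of the vowel-length effect. The onset values are copied from the first Integration Results table; the dictionary layout and the name `cue_lags` are illustrative choices, not part of the original analysis.

```python
# Onset of each effect in ms: (VOT/FT onset, vowel-length onset),
# copied from the Integration Results table.
onsets = {
    ("Voicing", "B"): (660, 820),
    ("Voicing", "P"): (640, 780),
    ("Voicing", "All"): (560, 800),
    ("Manner", "B"): (840, 1340),
    ("Manner", "W"): (280, 860),
    ("Manner", "All"): (620, 820),
}

def cue_lags(table):
    """Vowel-length onset minus VOT/FT onset for each condition, in ms."""
    return {cond: vowel - first_cue for cond, (first_cue, vowel) in table.items()}

lags = cue_lags(onsets)
for (continuum, subset), lag in lags.items():
    print(f"{continuum:8s}{subset:4s} lag = {lag} ms")
```

A positive lag in every row is the asynchronous pattern that the logic of Experiment 2 associates with lexical integration.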

Exp 2: Summary

Phonemes not computationally necessary for:
• Phonetic categorization.
• Asynchronous cue integration.

Still may play a role in
• Development
• Reading
• Production…

Between signal and words there doesn’t appear to be any
• Complex computation
• Nonlinearities

What role do intermediate units play?

What is doing the work?

Lexical activation processes.

Overview






Misperception

What if initial portion of a stimulus was misperceived?

Input: p/b eI r ə k i t… (candidates: parakeet, barricade)

Categorical Lexicon: barricade vs. parakeet
Competitor completely inactive: system will “garden-path”.

Gradient Sensitivity: / beIrəkeId / vs. / peIrəkit /
Competitor still active: easy to activate it the rest of the way.

P(misperception) depends on distance from boundary.

Gradient activation allows the system to hedge its bets.

Experiment 3

Can gradient maintenance of lexical alternatives prevent system from wandering down “garden-path”?

By avoiding commitment to a discrete phoneme, can the system elegantly recover from early misperception?
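The contrast between a categorical and a gradient lexicon can be illustrated with a toy sketch. Nothing here is a model from the talk: the logistic form, its `slope`, the recovery `threshold` and the per-step `gain` are hypothetical values chosen only to make the idea concrete. The boundary of 17.25 ms is the by-subject category boundary reported for Experiment 1.

```python
import math

BOUNDARY_MS = 17.25  # by-subject category boundary from Experiment 1

def categorical_activation(vot_ms):
    """The /p/-initial candidate is either fully active or fully inactive."""
    return 1.0 if vot_ms >= BOUNDARY_MS else 0.0

def gradient_activation(vot_ms, slope=0.3):
    """The /p/-initial candidate's activation is a logistic function of VOT.

    The logistic form and the slope are hypothetical.
    """
    return 1.0 / (1.0 + math.exp(-slope * (vot_ms - BOUNDARY_MS)))

def steps_to_recover(activation, threshold=0.9, gain=0.2):
    """Number of equal-sized boosts from later input needed to reach threshold."""
    return max(0, math.ceil((threshold - activation) / gain))

# A token with a 10 ms VOT sounds /b/-like, but the word turns out to be "parakeet".
vot = 10
print(steps_to_recover(categorical_activation(vot)))  # competitor starts from zero
print(steps_to_recover(gradient_activation(vot)))     # competitor kept partly active
```

Under these assumed settings the gradient system needs fewer boosts from later input to recover the /p/-initial word, which is the sense in which graded activation hedges its bets.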

Misperception

come on, gz.lt’s go!

əə

10 Pairs of b/p items.

Voiced         Voiceless      Overlap
Bumpercar      Pumpernickel   6
Barricade      Parakeet       5
Bassinet       Passenger      5
Blanket        Plankton       5
Beachball      Peachpit       4
Billboard      Pillbox        4
Drain Pipes    Train Tracks   4
Dreadlocks     Treadmill      4
Delaware       Telephone      4
Delicatessen   Television     4

Experiment 3 Methods

Experiment 3: Results

[Figure: fixations to target over time (300-900 ms and 300-1200 ms) by VOT (0-35 ms); panels Barricade -> Parricade and Parakeet -> Barakeet.]

Faster activation of target as VOTs near lexical endpoint.
Even within the non-word range.

Target: p=.001
VOT: p=.0001
T x V: p=.002

[Figure: fixations to target as a function of distance from prototype (VOT, 0-35 ms) for B and P targets.]

Voiceless: No sudden, categorical shift, just smooth gradiency.

Voiced: Categorical???

Experiment 3: Garden path

Garden-path analysis:

Identify trials in which

Time       Fixation
Pre POD    Competitor
Post POD   Target

Is the latency to switch to the target related to VOT?

• Accelerated latency = more (residual) target activation
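The trial selection described above might be sketched as follows. This is a hedged illustration: the data layout (a list of `(onset_ms, object)` fixations per trial), the function names and the labels `"competitor"` and `"target"` are assumptions, and POD is taken here to mean the point of disambiguation, which the slides do not spell out.

```python
def time_to_target(fixations, pod_ms):
    """Latency from the POD to the first target fixation on a garden-path trial.

    fixations: list of (onset_ms, object) tuples sorted by onset.
    Returns None unless the subject was fixating the competitor at the POD
    and later switched to the target.
    """
    before = [obj for onset, obj in fixations if onset <= pod_ms]
    if not before or before[-1] != "competitor":
        return None
    for onset, obj in fixations:
        if onset > pod_ms and obj == "target":
            return onset - pod_ms
    return None

def mean_latency_by_vot(trials):
    """Mean time-to-target per VOT step over the selected garden-path trials."""
    sums = {}
    for trial in trials:
        latency = time_to_target(trial["fixations"], trial["pod_ms"])
        if latency is None:
            continue
        total, n = sums.get(trial["vot"], (0.0, 0))
        sums[trial["vot"]] = (total + latency, n + 1)
    return {vot: total / n for vot, (total, n) in sums.items()}

# Made-up trials for illustration only.
trials = [
    {"vot": 0, "pod_ms": 240, "fixations": [(100, "competitor"), (420, "target")]},
    {"vot": 0, "pod_ms": 240, "fixations": [(100, "target")]},
    {"vot": 10, "pod_ms": 240,
     "fixations": [(150, "competitor"), (300, "unrelated"), (400, "target")]},
]
print(mean_latency_by_vot(trials))
```

Plotting the resulting means against VOT would give a time-to-target curve of the kind the analysis asks about.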

Sparse data but:

[Figure: time to target as a function of VOT (0-30 ms) for voiced and voiceless targets.]

B: p=.002
P: p=.007

More data…

[Figure: time to target as a function of VOT (0-35 ms) for voiced and voiceless targets.]

VOT: p=.0001
Targ: p=.001
V x T: p>.1

Experiment 3: Summary

Same gradiency seen in McMurray et al. (2002).

Facilitates ambiguity resolution (time-to-target)
• 240 ms after VOT

Gradiency lasts… and is useful.

Idiosyncrasies:

• Attenuated gradiency for B
• Interactions with target (voiced/voiceless).

Effect of reduced VOT range?

Experiment 4

Replication: longer continua (0-45 ms).

[Figure: Barricade -> Parricade. Fixations to target over time (300-1300 ms), and target fixations as a function of VOT (0-40 ms) for voiced and voiceless targets.]

Target fixations.

Gradient VOT:
- both targets
Reduced target effect
No interaction

p=.0001

Exp 4: Does the presence of the visual competitor (parakeet) artificially heighten competitor activation (and cause the effect)?

No Competitor

[Figure: fixations to target over time (300-1300 ms) by VOT (0-45 ms), two panels.]

p=.0001

Experiment 4: Garden Path

[Figure: time to target as a function of VOT (0-40 ms) for voiced and voiceless targets.]

Regressive Ambiguity Resolution

Gradiency:
Gradient effect of within-category variation without minimal-pairs.

Gradient effect long-lasting: mean POD = 240 ms.

Regressive ambiguity resolution:
• Subphonemic gradations maintained until more information arrives.
• Improves (or hinders) recovery from garden path.
• Gradient lexical sensitivity prevents over-committing to a garden-path interpretation.
• Likely lexical locus: echoic memory decays around 400 ms.

Overview






Progressive Expectation Formation

Can within-category detail be used to predict future acoustic/phonetic events?

Yes: Phonological regularities create systematic within-category variation.

• Predicts future events.

[Figure: candidates (maroon, goose, goat, duck) over time as the input "m… a… rr… oo… ng… g… oo… s…" unfolds.]

Word-final coronal consonants (n, t, d) assimilate the place of the following segment.

Place assimilation -> ambiguous segments —anticipate upcoming material.

Experiment 5: Anticipation

Maroong Goose Maroon Duck

Assimilation is subphonemic, continuous, not discrete.

Experiment 5: Anticipation

[Figure: F2 and F3 transitions in /æC/ contexts (frequency in Hz by pitch period) for coronal, assimilated and labial consonants.]

Subject hears
“select the maroon duck”
“select the maroon goose”
“select the maroong goose”
“select the maroong duck” *

We should see faster eye-movements to “goose” after assimilated consonants.

Results

Looks to “goose” as a function of time

[Figure: fixation proportion over time (0-600 ms), assimilated vs. non-assimilated; onset of “goose” + oculomotor delay marked.]

Anticipatory effect on looks to non-coronal.

Looks to “duck” as a function of time

[Figure: fixation proportion over time (0-600 ms); onset of “goose” + oculomotor delay marked.]

Inhibitory effect on looks to coronal (duck, p=.024)

a quick runm picks you up.

a quick runm takes you down ***

When /p/ is heard, the bilabial feature can be assumed to come from assimilation (not an underlying /m/).

When /t/ is heard, the bilabial feature is likely to be from an underlying /m/.

Subject hears
“select the mud drinker”
“select the mudg gear”
“select the mudg drinker”

Critical Pair

[Figure: fixation proportion over time (0-2000 ms) for initial coronal (Mud Gear) vs. initial non-coronal (Mug Gear); onset of “gear” and avg. offset of “gear” (402 ms) marked.]

Mudg Gear is initially ambiguous with a late bias towards “Mud”.

[Figure: fixation proportion over time (0-2000 ms) for initial coronal (Mud Drinker) vs. initial non-coronal (Mug Drinker); onset of “drinker” and avg. offset of “drinker” (408 ms) marked.]

Mudg Drinker is also ambiguous with a late bias towards “Mug” (the /g/ has to come from somewhere).

[Figure: looks to non-coronal (gear) following assimilated or non-assimilated consonant, over time (0-600 ms); onset of “gear” marked.]

In the same stimuli/experiment there is also a progressive effect!

[Diagram: the assimilated (ambiguous) segment predicts the subsequent consonant; the subsequent consonant resolves the ambiguity.]

Phonological modification has a benefit and a cost:

• Creates predictive regularities.

• Creates ambiguity.

Gradiency:
• Enables anticipation.
• Retain both items until ambiguity resolution.

What is the mechanism?

Lexical activation
• Activation/competition processes retain partial representations.
• Lexical processes are predictive by nature.

Prediction:
• Lexical competition ought to inhibit these processes.

Experiment 3: Extensions

Compare progressive effect as a function of competition

[Figure: looks to labial over time (200-800 ms), two panels (Green/m Boat; Eight/Ape Babies), comparing assimilated with labial or neutral contexts.]

Competition:
• 100 ms delay in effect.
• Reduction in magnitude.

Sensitivity to subphonemic detail allows the system to simultaneously cope with and harness graded phonological modification.

• Increase priors on likely upcoming events.
• Decrease priors on unlikely upcoming events.
• Retain ambiguity until resolution occurs.

Lexical processes may play a pivotal role.

Conclusions

Lexical activation is exquisitely sensitive to within-category detail.

This sensitivity enables integration over time at multiple levels and time-scales using normal word recognition mechanisms.

1) Phonetic (e.g. VOT / Rate): No need for “buffer”.

2) Lexical (e.g. barricade/parakeet): regressive ambiguity resolution.

3) Phonological (maroong goose): progressive facilitation, regressive ambiguity resolution.

Summary

The standard paradigm (levels: Sound, Phonemes, Words, Sense)
• Does not capture data.
• Limits our thinking in terms of how continuous detail might be used.

The Alternative (levels: Sound, Words, Sense)
• Continuous phonetic cues.
• Integrated directly by normal lexical processes.
• Rich temporal integration.
• Computationally simple.

Integration over time at multiple levels.
1) Phonetic (e.g. VOT / Rate)
2) Lexical (e.g. barricade/parakeet)
3) Phonological (maroong goose)

Conclusions

These are rich, general processes:

My lab
• Sentential garden-paths.
• V-to-V coarticulation and harmony.
• Anticipatory R-coloration
• Vowel Nasalization

Other folks
• Prosodic domain (Keating)
• Reduction (Manuel)
• Misproduction (Goldrick)
• Bilingual categories (Ju)
• Allophonic variation (Samuel, McLennan)
• “Parsing” (Gow, Fowler)
• V-to-V coarticulation (Beddor, Fowler, Cole)
• Assimilation (Byrd, Mitterer, Gow)
• Nasalization (Dahan, Fowler)
• R-Coloration (Hillenbrand)
• Vowel length/embedded words (Crosswhite, Salverda)

Take home message

Spoken language is defined by change.

But the information to cope with it is in the signal—if we look online.

Within-category acoustic variation is signal, not noise.


Head-Tracker Cam Monitor

IR Head-Tracker Emitters

Eyetracker Computer

Subject Computer

Computers connected via Ethernet

Head

2 Eye cameras

Misperception: Additional Results

10 Pairs of b/p items.
• 0-35 ms VOT continua.

20 Filler items (lemonade, restaurant, saxophone…)

Option to click “X” (Mispronounced).

26 Subjects

1240 Trials over two days.

Identification Results

[Figure: response rate (voiced, voiceless, non-word) as a function of VOT (0-35 ms); panels Barricade to Parricade and Barakeet to Parakeet.]

Significant target responses even at extreme.

Graded effects of VOT on correct response rate.

“Garden-path” effect: Difference between looks to each target (b vs. p) at same VOT.

Phonetic “Garden-Path”

[Figure: fixations to target over time for Barricade vs. Parakeet at VOT = 0 (/b/) and VOT = 35 (/p/).]

[Figure: garden-path effect (Barricade - Parakeet) as a function of VOT (0-35 ms), shown for target and competitor.]

GP Effect: Gradient effect of VOT.

Target: p<.0001
Competitor: p<.0001

Assimilation: Additional Results

Within-category detail used in recovering from assimilation: temporal integration.

• Anticipate upcoming material.
• Bias activations based on context.

- Like barricade/parakeet: within-category detail retained to resolve ambiguity.

Phonological variation is a source of information.

Exp 3 & 4: Conclusions

• Similar properties in terms of starting and sparseness.

VOT

Categories
• Competitive Hebbian Learning (Rumelhart & Zipser, 1986).
• Not constrained by a particular equation: can fill space better.

Non-parametric approach?

Voicing Continua

9-step b/p continua. VOT varied.
beach/peach, bees/peas, beak/peak

[Figure: % P responses as a function of VOT (0-40 ms) for long and short vowels. N=29]

Looks to competitor

[Figure: looks to the competitor over time (0-2000 ms) by continuum step; one panel shows looks to B (steps +1 to +4), the other looks to P (steps -5 to -1).]

     All     Proto
B    .0001   .0001
P    .008    >.1
