Back to the future: Where we’re going, we don’t need phonemes. Implications of a gradient lexicon. Bob McMurray University of Iowa Dept. of Psychology. Collaborators. Richard Aslin Michael Tanenhaus David Gow. Joe Toscano Cheyenne Munson Meghan Clayards Dana Subik. - PowerPoint PPT Presentation
Back to the future: Where we’re going, we don’t need phonemes.
Implications of a gradient lexicon.
Bob McMurray
University of Iowa
Dept. of Psychology
Collaborators
Richard Aslin, Michael Tanenhaus, David Gow
Joe Toscano, Cheyenne Munson, Meghan Clayards, Dana Subik
The students of the MACLab
In language, information arrives sequentially.
• Partial syntactic and semantic representations are formed as words arrive.
The cowboy chased by the
• Words are identified over sequential phonemes.
Linguistics department yodeled
Spoken Word Recognition is an ideal arena in which to study these issues because:
• Speech production gives us a lot of rich temporal information to use in this way.
• We have a clear understanding of the input (from phonetics).
• The output is easy to measure online
Online Comprehension
• Listeners form hypotheses as the input unfolds.
• Need measurements of how listeners interpret speech, moment-by-moment.
• May reveal how information is integrated:
- Discreteness vs. Gradiency
- Combinatorial Units
Mechanisms of Temporal Integration
Stimuli do not change arbitrarily.
Perceptual cues reveal something about the change itself.
Active integration:
• Anticipating future events.
• Retain current partial representations.
• Resolve prior ambiguity.
Overview
1) Speech perception and Spoken Word Recognition.
[Figure: candidate words (beach, bump, putter, dog, butter) over time as the input "b... u… tt… e… r" unfolds]
2) Lexical activation is sensitive to fine-grained detail in speech.
3) Where we’re going, we don’t need phonemes: evidence for continuous information integration.
4) Back in time: staying off the garden-path.
5) Forward to the future: coping with (and benefiting from) phonological modification.
[Figure: Cue 1 and Cue 2 arriving at the Lexicon over time]
[Figure: candidates for "ba…" (bakery, basic, barrier, barricade, bait, baby); "Xkery" vs. "bakery"]
Online Word Recognition
• Information arrives sequentially.
• At early points in time, signal is temporarily ambiguous.
• Later arriving information disambiguates the word.
Current models of spoken word recognition
• Immediacy: Hypotheses formed from the earliest moments of input.
• Activation Based: Lexical candidates (words) receive activation to the degree they match the input.
• Parallel Processing: Multiple items are active in parallel.
• Competition: Items compete with each other for recognition.
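The four properties above can be illustrated with a toy sketch. It is not any published model: letters stand in for speech segments, `match_score` and `activations` are hypothetical names, and the normalization step is just one simple way to make candidates compete.

```python
def match_score(word, heard):
    """Fraction of the heard prefix that the word matches, position by position."""
    hits = sum(1 for w, h in zip(word, heard) if w == h)
    return hits / len(heard)


def activations(lexicon, heard):
    """Score every word in parallel, then normalize so candidates compete for a total of 1.0."""
    scores = {word: match_score(word, heard) for word in lexicon}
    total = sum(scores.values())
    if total == 0:
        return {word: 0.0 for word in lexicon}
    return {word: score / total for word, score in scores.items()}


# Letters are a stand-in for segments (an assumption made only for illustration).
lexicon = ["beach", "bump", "putter", "dog", "butter"]
for prefix in ["b", "bu", "but", "butt"]:
    print(prefix, activations(lexicon, prefix))
```

Activation is available from the first segment, several words are active at once, and "butter" pulls ahead only as later input arrives.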
These processes have been well defined for a phonemic representation of the input.
But there is considerably less ambiguity if we consider subphonemic information.
Example: subphonemic effects of motor processes.
Coarticulation
Sensitivity to these perceptual details might yield earlier disambiguation.
Example: Coarticulation
Articulation (lips, tongue…) reflects current, future and past events.
Subtle subphonemic variation in speech reflects temporal organization.
n ne et c
k
Any action reflects future actions as it unfolds.
These processes have largely been ignored because of a history of evidence that perceptual variability gets discarded.
Example: Categorical Perception
Categorical Perception
Subphonemic variation in VOT is discarded in favor of a discrete symbol (phoneme).
• Sharp identification of tokens on a continuum.
• Discrimination poor within a phonetic category.
[Figure: identification (% /pa/, 0-100) and discrimination as a function of VOT, from B to P]
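As a rough illustration of the two signatures above, the sketch below pairs a logistic identification function with a label-only discrimination prediction of the classic form 0.5 + 0.5 * (p1 - p2)^2. The boundary and slope values are assumptions chosen for illustration, not values from these slides.

```python
import math


def p_voiceless(vot_ms, boundary_ms=20.0, slope=1.0):
    """Logistic identification function: probability of a /p/ response at a given VOT."""
    return 1.0 / (1.0 + math.exp(-slope * (vot_ms - boundary_ms)))


def predicted_discrimination(vot_a, vot_b):
    """Label-only prediction: chance (0.5) plus half the squared difference in /p/ probability."""
    diff = p_voiceless(vot_a) - p_voiceless(vot_b)
    return 0.5 + 0.5 * diff ** 2


within = predicted_discrimination(0, 10)    # both tokens labeled /b/
across = predicted_discrimination(15, 25)   # tokens straddle the assumed boundary
print(within, across)
```

If listeners only had access to labels, within-category pairs would sit near chance while pairs that straddle the boundary would be discriminated well.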
Categorical Perception (CP)
Defined fundamental computational problems.
CP is the output of
• Speech perception
and the input to
• Phonology
• Word recognition.
[Diagram: Sound, Phonemes, Words, Sense, with Phonology]
Categorical Perception (CP)
Enables a divide-and-conquer approach.
But, assumes that
1) Speech tasks tap phonemes (or something like them).
2) Phonemes (or something like them) are legitimate processing units.
[Diagram: Sound, Phonemes, Words, Sense, with Phonology]
Evidence against the strong form of Categorical Perception from psychophysical-type tasks:
Discrimination tasks: Pisoni and Tash (1974); Pisoni & Lazarus (1974); Carney, Widin & Viemeister (1977)
Training: Samuel (1977); Pisoni, Aslin, Perey & Hennessy (1982)
Goodness ratings: Miller (1997); Massaro & Cohen (1983)
Classic explanation:
Auditory tasks: non-categorical
Phonological tasks: categorical
Paradigmatic CP: within-category variation is noise. Not important to higher language.
Categorical Perception (CP)
“Phonological/auditory” task distinction
• validates standard paradigm?
or
• assumes standard paradigm?
[Diagram: Sound, Phonemes, Words, Sense, with Phonology; continuous cues (non-CP)]
Minimal computational problem:
Computing meaning.
CP tasks don’t necessarily tap a stage of this problem.
Lexical representation: clearly a component.
[Diagram: Sound, Phonemes, Words, with Phonology; where does CP fit?]
Goal: Reassess continuous sensitivity (non-CP) w.r.t. words.
Does within-category acoustic detail systematically affect higher level language?
Is there a gradient effect of continuous acoustic detail on lexical activation?
Experiment 1
A gradient relationship would yield systematic effects of subphonemic information on lexical activation.
If this gradiency is useful for temporal integration, it must be preserved over time.
Need a design sensitive to both acoustic detail and detailed temporal dynamics of lexical activation.
Experiment 1
McMurray, Aslin & Tanenhaus (2002)
Use a speech continuum: more steps yield a better picture of the acoustic mapping.
KlattWorks: generate synthetic continua from natural speech.
Acoustic Detail
9-step VOT continua (0-40 ms)
6 pairs of words: beach/peach, bale/pale, bear/pear, bump/pump, bomb/palm, butter/putter
6 fillers:
lamp, leg, lock, ladder, lip, leaf
shark, shell, shoe, ship, sheep, shirt
How do we tap on-line recognition?
With an on-line task: Eye-movements
Subjects hear spoken language and manipulate objects in a visual world.
Visual world includes set of objects with interesting linguistic properties.
a beach, a peach and some unrelated items.
Eye-movements to each object are monitored throughout the task.
Temporal Dynamics
Tanenhaus, Spivey-Knowlton, Eberhard & Sedivy, 1995
• Relatively natural task.
• Eye-movements generated very fast (within 200ms of first bit of information).
• Eye movements time-locked to speech.
• Subjects aren’t aware of eye-movements.
• Fixation probability maps onto lexical activation.
Why use eye-movements and visual world paradigm?
A moment to view the items
Task
Task
Bear
Repeat 1080 times
Identification Results
Category boundary:
By subject: 17.25 +/- 1.33 ms
By item: 17.24 +/- 1.24 ms
High agreement across subjects and items for category boundary.
[Figure: proportion of /p/ responses as a function of VOT (0-40 ms), from B to P]
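One simple way to estimate a category boundary from identification data is to find where the proportion of /p/ responses crosses 0.5. The slides do not say how the boundaries were computed, so the interpolation method and the response proportions below are assumptions for illustration only.

```python
def category_boundary(vots, prop_p):
    """Return the VOT at which the proportion of /p/ responses crosses 0.5 (linear interpolation)."""
    points = list(zip(vots, prop_p))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 < 0.5 <= y1:
            return x0 + (0.5 - y0) * (x1 - x0) / (y1 - y0)
    return None  # the curve never crosses 0.5


# Hypothetical identification data on a 9-step, 0-40 ms continuum.
vots = [0, 5, 10, 15, 20, 25, 30, 35, 40]
prop_p = [0.02, 0.03, 0.05, 0.30, 0.80, 0.95, 0.97, 0.98, 0.99]
boundary = category_boundary(vots, prop_p)
print(boundary)
```

Running this per subject or per item would give a distribution of boundaries whose mean and spread can be compared, as in the by-subject and by-item figures above.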
Task
Target = Bear
Competitor = Pear
Unrelated = Lamp, Ship
[Figure: fixations on individual trials (1-5) laid out over time (200 ms scale) and summed into % fixations over time]
Task
More looks to competitor than unrelated items.
[Figure: fixation proportion over time (0-2000 ms), one panel for VOT = 0 and one for VOT = 40]
Task
Given that
• the subject heard bear
• clicked on “bear”…
How often was the subject looking at the “pear”?
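The conditional question above amounts to filtering trials by stimulus and response before averaging competitor looks. A minimal sketch follows; the trial records and field names (`vot`, `clicked`, `competitor_prop`) are hypothetical.

```python
def competitor_looks(trials, vot, clicked):
    """Mean proportion of time on the competitor, for trials with this VOT and this response."""
    kept = [t["competitor_prop"] for t in trials
            if t["vot"] == vot and t["clicked"] == clicked]
    return sum(kept) / len(kept) if kept else None


# Hypothetical trial summaries.
trials = [
    {"vot": 0, "clicked": "bear", "competitor_prop": 0.04},
    {"vot": 0, "clicked": "bear", "competitor_prop": 0.06},
    {"vot": 15, "clicked": "bear", "competitor_prop": 0.10},
    {"vot": 15, "clicked": "pear", "competitor_prop": 0.30},
]
print(competitor_looks(trials, 0, "bear"))
print(competitor_looks(trials, 15, "bear"))
```

Conditioning on the response keeps any VOT effect on competitor looks from being explained by trials on which the listener simply chose the other word.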
[Figure: predicted target and competitor fixation proportions over time under categorical results vs. a gradient effect]
Results
[Figure: competitor fixations (0-0.16) over time since word onset (ms) for each VOT step, 0-15 ms in one panel and 20-40 ms in the other, split by response]
Long-lasting gradient effect: seen throughout the timecourse of processing.
Area under the curve:
[Figure: competitor fixations (0.02-0.08) as a function of VOT (0-40 ms), split by response, with the category boundary marked]
Clear effects of VOT: B: p=.017*, P: p<.001***
Linear trend: B: p=.023*, P: p=.002***
Unambiguous Stimuli Only
[Figure: competitor fixations (0.02-0.08) as a function of VOT (0-40 ms), split by response, with the category boundary marked]
Clear effects of VOT: B: p=.014*, P: p=.001***
Linear trend: B: p=.009**, P: p=.007**
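The area-under-the-curve measure summarizes a whole fixation curve as one number per VOT step. A minimal sketch using the trapezoid rule is shown below; the slides do not specify the integration method or time window, so both are assumptions here.

```python
def area_under_curve(times_ms, fixation_props):
    """Trapezoidal area under a fixation curve sampled at the given times."""
    area = 0.0
    for i in range(1, len(times_ms)):
        width = times_ms[i] - times_ms[i - 1]
        area += width * (fixation_props[i] + fixation_props[i - 1]) / 2.0
    return area


# Hypothetical competitor-fixation curve for one VOT step.
times = [0, 400, 800]
props = [0.0, 0.1, 0.0]
print(area_under_curve(times, props))
```

Computing this for each VOT step gives one value per step, which can then be tested for an overall effect of VOT and for a linear trend.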
Summary
Subphonemic acoustic differences in VOT have gradient effect on lexical activation.
• Gradient effect of VOT on looks to the competitor.
• Seems to be long-lasting.
• Effect holds even for unambiguous stimuli.
Consistent with growing body of work using priming (Andruski, Blumstein & Burton, 1994; Utman, Blumstein & Burton, 2000; Gow, 2001, 2002).
An alternative framework
1) Word recognition is systematically sensitive to subphonemic acoustic detail.
2) Continuous acoustic detail is represented as gradations in activation across the lexicon.
3) This can do the work of sublexical units like phonemes.
4) Gradient sensitivity coupled to normal word recognition processes enables the system to take advantage of subphonemic regularities for temporal integration.
Lexical Sensitivity
1) Word recognition is systematically sensitive to subphonemic acoustic detail.
Extensions
• Voicing
• Laterality, Manner, Place
• Natural Speech
• Vowel Quality
• Infant voicing categories
• Metalinguistic Tasks
[Figure: competitor fixations (looks to B, 0-0.1) as a function of VOT (0-40 ms), by response (B or P), with the category boundary marked]
? Non minimal pairs
? Duration of effect (Exp 3-4)
time
Input: b... u… m… p…
bun
bumper
pump
dump
bump
bomb
2) Continuous acoustic detail is represented as gradations in activation across the lexicon.
If lexical processes can represent speech detail, do we need sublexical processes?
Perhaps:
How are multiple cues (to the same phoneme) integrated? (Exp 2)
3) This can do the work of sublexical units like phonemes.
Temporal Integration
4) Gradient sensitivity coupled to normal word recognition processes enables the system to take advantage of subphonemic regularities for temporal integration.
Regressive ambiguity resolution (exp 3-5):
• Ambiguity retained until more information arrives.
Progressive expectation building (exp 5-6):
• Phonetic distinctions are spread over time.
• Anticipate upcoming material.
Overview
1) Speech perception and Spoken Word Recognition.
2) Lexical activation is sensitive to fine-grained detail in speech.
3) Where we’re going, we don’t need phonemes: evidence for continuous information integration.
4) Back in time: staying off the garden-path.
5) Forward in time: coping with (and benefiting from) phonological modification.
Traditional speech chain: signal-> phonemes -> words
phonemes
words
Substitute your favorite sublexical unit here (syllables, diphones, etc)…
ə
Measuring phonemes
What do phonemes do?
phonemes
words
phonemes
words
1) Categorize continuous acoustic detail?
2) Integrate multiple cues?
3) Generalize phonological information during development?
4) Learning new words?
5) Speech production?
6) Reading?
Measuring phonemes
phonemes
words
phonemes
words
1) Categorize continuous acoustic detail? (categorizer)
2) Integrate multiple cues? (buffer)
What do you do with phonemes?
We have an extremely sensitive measure of:
- lexical activation
- temporal dynamics
Can we use it to assess sublexical processes?
- Categorization
- Integration
Occam’s razor: if this stuff doesn’t happen until the information meets the lexicon, then phonemes are
- not adding anything computationally.
- not theoretically necessary.
Measuring phonemes
No.
Computational Necessity of the phoneme.
Logic:
Take the phoneme seriously.
What does it do (computationally) (in a specific task)?
Does that actually get done (during comprehension)?
The psychological reality of the phoneme?
May still “exist”…
May still have computational necessity for other tasks…
1) Categorizing continuous detail
1) Categorize continuous acoustic detail. (phoneme as “categorizer”)
Categorical perception: continuous detail discarded.
- only in metalinguistic tasks (what are these tapping?)
Gradient lexical activation: continuous detail systematically affects lexical activation.
Phonemesə
No phonemes required
2) Integrating multiple cues
2) Integrating multiple cues. (phoneme as “buffer”)
Phonetic cues for a given phoneme are spread out over time.
• Combined at phoneme level prior to accessing lexicon?
or
• Direct access to lexicon. Lexical integration?
Integrating multiple cues
Phoneme Story
• Integration before lexical access
• “Buffer”
[Diagram: Cue 1 and Cue 2 arrive over time at the Phoneme, which feeds the Lexicon]
The Alternative
• Integration at the Lexicon
[Diagram: Cue 1 and Cue 2 arrive over time at the Lexicon]
The Logic
1) Assess Temporal Integration.
- Use two asynchronous cues to a single phoneme.
- Assess lexical activation over time.
Simultaneous effects: phonemic integration.
Asynchronous effects: lexical integration.
Which cues?
Asynchronous cues
Asynchronous cues to voicing: VOT, Vowel Length
Both covary with speaking rate: rate normalization
Phonetic Context
Manner of Articulation: Formant Transition Slope (FTSlope)
Temporal cue like VOT: covaries with vowel length.
belt / welt
Experiment 2
9-step VOT continua (0-40 ms): beach/peach, beak/peak, bees/peas
9-step formant transition slope continua: bench/wench, belt/welt, bell/well
x 2 Vowel Lengths
The usual task
1080 Trials
Results
Step 1: Assess gradiency
Step 2: Assess temporal integration
Manner Continua
9-step b/w continua. VOT varied.
bench/wench, belt/welt, bell/well
[Figure: % W responses by continuum step (1-9) for long and short vowels; N=36]
Experiment 1
Looks to competitor
[Figure: looks to B and looks to W (0-0.18) over time (0-2000 ms) by distance from the boundary (steps 1 to 5 and -5 to -1), split by which item was clicked on]
     All     Cons
B    .0001   .003
W    .0001   .01
Gradiency Results (Exp 2)
[Table: All and Cons results for Manner (B, W) and Voicing (B, P)]
Replication
• Replication of gradiency with Manner + Voicing.
• Phoneme not required as “categorizer” (categorization isn’t happening)
Integrating multiple cues
Phoneme Story
• Integration before the Lexicon
• “Buffer”
The Alternative
• Integration at the Lexicon
Results: Temporal Dynamics
When do effects on lexical activation occur?
VOT / FTStep effect co-occurs with vowel length effect. (Phonemic integration)
VOT / FTStep effect precedes vowel length effect. (Lexical integration)
Compute 2 effect sizes at each 20 ms time slice.
• VOT / FTStep: Regression slope of competitor fixations as a function of VOT.
[Figure: looks to P over time (0-2000 ms) by distance from boundary (-5 to -1), with one time slice t marked]
[Figure: at Time = 320 ms, competitor fixations vs. distance from boundary (VOT) fit with Y = M320 x + B, where M320 = 0]
[Figure: at Time = 720 ms, the same fit gives Y = M720 x + B]
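The slope-per-time-slice effect size can be sketched as an ordinary least-squares fit repeated at each slice. The data layout (`curves` mapping a time in ms to one fixation value per distance from the boundary) and all numbers are hypothetical.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def slopes_by_time(curves, distances):
    """Map each time slice to the slope (M) of competitor fixations over distance from boundary."""
    return {t: slope(distances, ys) for t, ys in sorted(curves.items())}


# Hypothetical competitor fixations at two slices, one value per distance from the boundary.
distances = [-20, -15, -10, -5]
curves = {
    320: [0.05, 0.05, 0.05, 0.05],   # flat: no VOT effect yet
    720: [0.02, 0.04, 0.06, 0.08],   # more competitor looks nearer the boundary
}
m = slopes_by_time(curves, distances)
print(m)
```

A slope near zero means VOT has not yet affected competitor looks at that slice; the slice at which the slope first departs from zero marks the onset of the effect.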
Compute 2 effect sizes at each 20 ms time slice.
• Vowel Length: Difference (D) between fixations after hearing long vs. short vowel (L - S = VL).
• Repeat for each time slice, subject.
[Figure: competitor fixations over time (0-2500 ms) for long vs. short vowels at VOT = 30]
Resulting dataset…
Subject  Time  VOT (M)   Vowel (D)
1        20    -0.0023   0.0094
1        40    -0.0016   0.0095
1        60    -0.0008   0.0108
…
1        2000  0.06021   0.123
2        20    0.0014    0.0091
2        40    0.0018    0.0088
2        60    0.0029    0.0104
…
2        2000  0.0604    0.1223
…
Compute average effect size at each time slice.
When does it (statistically) depart from 0?
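One way to ask when an effect departs from zero is a one-sample t statistic across subjects at each time slice, taking the earliest slice that exceeds a criterion. The slides do not name the test or any correction for multiple slices, so this is only a sketch; the data and the critical value are assumptions.

```python
import math
import statistics


def t_statistic(values):
    """One-sample t statistic of the values against a mean of 0."""
    sd = statistics.stdev(values)
    if sd == 0:
        return 0.0  # no variability: treated as no evidence (arbitrary choice for this sketch)
    return statistics.mean(values) / (sd / math.sqrt(len(values)))


def first_departure(effects_by_time, t_critical):
    """Earliest time slice whose t statistic exceeds t_critical, or None if none does."""
    for time_ms in sorted(effects_by_time):
        if t_statistic(effects_by_time[time_ms]) > t_critical:
            return time_ms
    return None


# Hypothetical per-subject effect sizes at two slices (4 subjects).
effects = {
    20: [0.001, -0.002, 0.000, 0.001],
    40: [0.05, 0.06, 0.04, 0.05],
}
onset = first_departure(effects, t_critical=3.182)  # two-tailed .05 criterion for df = 3
print(onset)
```

Running the same procedure on the VOT slopes and on the vowel-length differences yields two onset times that can be compared directly.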
Voicing Continua: Temporal Dynamics
[Figures: effect size (-0.1 to 1) over time (0-2000 ms) for VOT and vowel length, one panel per condition]
Voiced sounds only: VOT: 660 ms, Vowel: 820 ms
Voiceless sounds only: VOT: 640 ms, Vowel: 780 ms
Combined: VOT: 560 ms, Vowel: 800 ms
Integration Results
              VOT   Vowel
Voicing  B    660   820
         P    640   780
         All  560   800
Manner   B
         W
         All
Manner Continua: Temporal Dynamics
[Figures: effect size over time (0-2000 ms) for FTStep and vowel length, one panel per condition]
Stops only: FTStep: 840 ms, Vowel: 1340 ms
Approximants only: FTStep: 280 ms, Vowel: 860 ms
Combined: FTStep: 620 ms, Vowel: 880 ms
Integration Results
              VOT/FT  Vowel
Voicing  B    660     820
         P    640     780
         All  560     800
Manner   B    840     1340
         W    280     860
         All  620     820
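The asynchrony in the table can be made explicit by subtracting the first-cue onset from the vowel-length onset in each row. The onset values below are copied from the table; the dictionary layout is just a convenient way to hold them.

```python
# (continuum, condition) -> (VOT/FT onset in ms, vowel-length onset in ms), from the table above.
onsets_ms = {
    ("Voicing", "B"): (660, 820),
    ("Voicing", "P"): (640, 780),
    ("Voicing", "All"): (560, 800),
    ("Manner", "B"): (840, 1340),
    ("Manner", "W"): (280, 860),
    ("Manner", "All"): (620, 820),
}

# Lag of the vowel-length effect behind the first cue, in ms.
lags_ms = {key: vowel - first for key, (first, vowel) in onsets_ms.items()}
for key, lag in lags_ms.items():
    print(key, lag)
```

Every lag is positive: the VOT or formant-transition effect appears before the vowel-length effect, which is the asynchronous pattern attributed to lexical integration.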
Replication Underway
                 VOT/FT  Vowel
Voicing     B    660     820
            P    640     780
            All  560     800
Voicing II  All  560     800
            B    560     800
            P    660     860
Manner      B    840     1340
            W    280     860
            All  620     820
Manner II   All  620     740
            B    440     680
            W    700     880
Exp 2: Summary
Phonemes not computationally necessary for:
• Phonetic categorization.
• Asynchronous cue integration.
Still may play a role in
• Development
• Reading
• Production…
Between signal and words there doesn’t appear to be any
• Complex computation
• Nonlinearities
What role do intermediate units play?
What is doing the work?
Lexical activation processes.
Overview
1) Speech perception and Spoken Word Recognition.
2) Lexical activation is sensitive to fine-grained detail in speech.
3) Where we’re going, we don’t need phonemes: evidence for continuous information integration.
4) Back in time: staying off the garden-path.
5) Forward in time: coping with (and benefiting from) phonological modification.
Misperception
What if initial portion of a stimulus was misperceived?
Input: p/b eI r ə k i t… (parakeet, barricade)
Categorical Lexicon: barricade vs. parakeet
Gradient Sensitivity: / beIrəkeId / vs. / peIrəkit /
Competitor still active: easy to activate it the rest of the way.
Competitor completely inactive: system will “garden-path”.
P ( misperception ) distance from boundary.
Gradient activation allows the system to hedge its bets.
Experiment 3
Can gradient maintenance of lexical alternatives prevent the system from wandering down the “garden-path”?
By avoiding commitment to a discrete phoneme, can the system elegantly recover from early misperception?
Misperception
come on, gz.lt’s go!
əə
Experiment 3 Methods
10 Pairs of b/p items.
Voiced        Voiceless     Overlap
Bumpercar     Pumpernickel  6
Barricade     Parakeet      5
Bassinet      Passenger     5
Blanket       Plankton      5
Beachball     Peachpit      4
Billboard     Pillbox       4
Drain Pipes   Train Tracks  4
Dreadlocks    Treadmill     4
Delaware      Telephone     4
Delicatessen  Television    4
Experiment 3: Results
[Figure: fixations to target (0-1) over time (300-900 ms and 300-1200 ms) by VOT (0-35 ms), for Barricade -> Parricade and Parakeet -> Barakeet]
Faster activation of target as VOTs near lexical endpoint.
- Even within the non-word range.
Target: p=.001; VOT: p=.0001; T x V: p=.002
Voiceless: No sudden, categorical shift, just smooth gradiency.
Voiced: Categorical???
[Figure: fixations to target (0.4-0.65) as a function of distance from prototype (VOT, 0-35 ms) for B and P, with an expanded view for B]
Experiment 3: Garden path
Garden-path analysis:
Identify trials in which
Time      Fixation
Pre POD   Competitor
Post POD  Target
Is the latency to switch to the target related to VOT?
• Accelerated latency = more (residual) target activation
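The trial-selection rule above (competitor fixated before the POD, target fixated after it) can be sketched as follows. POD is read here as the point of disambiguation; the fixation format, a time-ordered list of (onset in ms, object) pairs, and the example values are hypothetical.

```python
def switch_latency(fixations, pod_ms):
    """Delay from the POD to the first target fixation on garden-path trials, else None.

    A trial counts as a garden-path trial only if the competitor was fixated before the POD.
    """
    on_competitor_early = any(obj == "competitor" and onset < pod_ms
                              for onset, obj in fixations)
    if not on_competitor_early:
        return None
    for onset, obj in fixations:
        if obj == "target" and onset >= pod_ms:
            return onset - pod_ms
    return None


# Hypothetical trials with a POD at 240 ms.
garden_path_trial = [(100, "competitor"), (300, "competitor"), (420, "target")]
direct_trial = [(100, "target")]
print(switch_latency(garden_path_trial, 240))
print(switch_latency(direct_trial, 240))
```

Averaging these latencies within each VOT step gives the time-to-target measure that is then related to distance from the prototype.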
Sparse data but:
[Figure: time to target (100-260 ms) as a function of distance from prototype (VOT, 0-30 ms) for voiced and voiceless targets]
B: p=.002; P: p=.007
More data…
[Figure: time to target (120-280 ms) as a function of distance from prototype (VOT, 0-35 ms) for voiced and voiceless targets]
VOT: p=.0001; Targ: p=.001; V x T: p>.1
Experiment 3: Summary
Same gradiency seen in McMurray et al. (2002).
Gradiency lasts… and is useful:
Facilitates ambiguity resolution (time-to-target)
• 240 ms after VOT
Idiosyncrasies:
• Attenuated gradiency for B
• Interactions with target (voiced/voiceless).
Effect of reduced VOT range?
Experiment 4
Replication: longer continua (0-45 ms).
[Figure: Barricade -> Parricade, fixations to target (0-0.9) over time (300-1300 ms)]
[Figure: fixations to target (0.4-0.65) as a function of distance from prototype (VOT, 0-40 ms) for voiced and voiceless targets]
Target fixations.
Gradient VOT: both targets
Reduced target effect
No interaction
p=.0001
Exp 4
Does the presence of the visual competitor (parakeet) artificially heighten competitor activation (and cause the effect)?
[Figure: fixations to target (0-0.9) over time (300-1300 ms) for each VOT (0-45 ms), two panels]
Experiment 4: Garden Path
No Competitor
Is the latency to switch to the target related to VOT?
• Accelerated latency = more (residual) target activation
p=.0001
[Figure: time to target (100-250 ms) as a function of distance from prototype (VOT, 0-40 ms) for voiced and voiceless targets]
Regressive Ambiguity Resolution
Gradiency: Gradient effect of within-category variation without minimal-pairs.
Gradient effect long-lasting: mean POD = 240 ms.
Regressive ambiguity resolution:
• Subphonemic gradations maintained until more information arrives.
• Improves (or hinders) recovery from garden path.
• Gradient lexical sensitivity prevents over-committing to a garden-path interpretation.
• Likely lexical locus: echoic memory decays around 400 ms.
Overview
1) Speech perception and Spoken Word Recognition.
2) Lexical activation is sensitive to fine-grained detail in speech.
3) Where we’re going, we don’t need phonemes: evidence for continuous information integration.
4) Back in time: staying off the garden-path.
5) Forward in time: coping with (and benefiting from) phonological modification.
Progressive Expectation Formation
Can within-category detail be used to predict future acoustic/phonetic events?
Yes: Phonological regularities create systematic within-category variation.
• Predicts future events.
time
Input: m… a… rr… oo… ng… g… oo… s…
Candidates: maroon, goose, goat, duck
Experiment 5: Anticipation
Word-final coronal consonants (n, t, d) assimilate the place of the following segment.
Maroong Goose vs. Maroon Duck
Place assimilation -> ambiguous segments: anticipate upcoming material.
Assimilation is subphonemic, continuous, not discrete.
[Figure: F2 transitions (1550-1850 Hz) and F3 transitions (2550-2800 Hz) by pitch period in /æC/ contexts for coronal, assimilated, and labial consonants]
Subject hears
“select the maroon duck”
“select the maroon goose”
“select the maroong goose”
“select the maroong duck” *
We should see faster eye-movements to “goose” after assimilated consonants.
Results
[Figure: looks to “goose” as a function of time (0-600 ms), assimilated vs. non-assimilated; onset of “goose” + oculomotor delay marked]
Anticipatory effect on looks to non-coronal.
[Figure: looks to “duck” as a function of time (0-600 ms), assimilated vs. non-assimilated; onset of “goose” + oculomotor delay marked]
Inhibitory effect on looks to coronal (duck, p=.024)
a quick runm picks you up.
a quick runm takes you down ***
When /p/ is heard, the bilabial feature can be assumed to come from assimilation (not an underlying /m/).
When /t/ is heard, the bilabial feature is likely to be from an underlying /m/.
Subject hears
“select the mud drinker”
“select the mudg gear”
“select the mudg drinker”
Critical Pair
[Figure: fixation proportion over time (0-2000 ms) to the initial coronal (Mud Gear) and initial non-coronal (Mug Gear) items; onset of “gear” and average offset of “gear” (402 ms) marked]
Mudg Gear is initially ambiguous with a late bias towards “Mud”.
[Figure: fixation proportion over time (0-2000 ms) to the initial coronal (Mud Drinker) and initial non-coronal (Mug Drinker) items; onset of “drinker” and average offset of “drinker” (408 ms) marked]
Mudg Drinker is also ambiguous with a late bias towards “Mug” (the /g/ has to come from somewhere).
[Figure: looks to non-coronal (gear) following assimilated or non-assimilated consonant, fixation proportion over time (0-600 ms); onset of “gear” marked]
In the same stimuli/experiment there is also a progressive effect!
[Diagram: the assimilated (ambiguous) segment predicts the subsequent consonant; the subsequent consonant resolves the ambiguity]
Phonological modification has a benefit and a cost:
• Creates predictive regularities.
• Creates ambiguity.
Gradiency:
• Enables anticipation.
• Retain both items until ambiguity resolution.
What is the mechanism?
Lexical activation
• Activation/competition processes retain partial representations.
• Lexical processes are predictive by nature.
Prediction:
• Lexical competition ought to inhibit these processes.
Experiment 3: Extensions
Compare progressive effect as a function of competition
[Figure: looks to labial (0-0.8) over time (200-800 ms), assim-labials vs. labials and assimilated vs. neutral; example items Green/m Boat and Eight/Ape Babies]
Competition:
• 100 ms delay in effect.
• Reduction in magnitude.
Sensitivity to subphonemic detail allows the system to simultaneously cope with and harness graded phonological modification.
• Increase priors on likely upcoming events.
• Decrease priors on unlikely upcoming events.
• Retain ambiguity until resolution occurs.
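Read in Bayesian terms, which is an interpretation and not something the slides spell out, the bullets above describe reweighting candidates by how well they fit the assimilated segment. A minimal sketch with hypothetical probabilities:

```python
def update(priors, likelihoods):
    """Bayes' rule over a small candidate set: prior times likelihood, renormalized."""
    unnormalized = {w: priors[w] * likelihoods[w] for w in priors}
    total = sum(unnormalized.values())
    return {w: value / total for w, value in unnormalized.items()}


# Hypothetical numbers: before the noun, "goose" and "duck" are equally expected.
priors = {"goose": 0.5, "duck": 0.5}
# Assumed likelihood of hearing assimilated "maroong" before each noun.
likelihoods = {"goose": 0.8, "duck": 0.2}
posterior = update(priors, likelihoods)
print(posterior)
```

Neither candidate is ruled out: "duck" keeps some probability, so the ambiguity is retained until the following consonant resolves it.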
Lexical processes may play a pivotal role.
Conclusions
Lexical activation is exquisitely sensitive to within-category detail.
This sensitivity enables integration over time at multiple levels and time-scales using normal word recognition mechanisms.
1) Phonetic (e.g. VOT / Rate): No need for “buffer”.
2) Lexical (e.g. barricade/parakeet): regressive ambiguity resolution.
3) Phonological (maroong goose): progressive facilitation, regressive ambiguity resolution.
Summary
The standard paradigm (Sound, Phonemes, Words, Sense)
• Does not capture data.
• Limits our thinking in terms of how continuous detail might be used.
The Alternative (Sound, Words, Sense)
• Continuous phonetic cues.
• Integrated directly by normal lexical processes.
• Rich temporal integration.
• Computationally simple.
Integration over time at multiple levels.
1) Phonetic (e.g. VOT / Rate)
2) Lexical (e.g. barricade/parakeet)
3) Phonological (maroong goose)
Conclusions
These are rich, general processes:
My lab
• Sentential garden-paths.
• V-to-V coarticulation and harmony.
• Anticipatory R-coloration
• Vowel Nasalization
Other folks
• Prosodic domain (Keating)
• Reduction (Manuel)
• Misproduction (Goldrick)
• Bilingual categories (Ju)
• Allophonic variation (Samuel, McLennan)
• “Parsing” (Gow, Fowler)
• V-to-V coarticulation (Beddor, Fowler, Cole).
• Assimilation (Byrd, Mitterer, Gow)
• Nasalization (Dahan, Fowler)
• R-Coloration (Hillenbrand)
• Vowel length/embedded words (Crosswhite, Salverda)
Take home message
Spoken language is defined by change.
But the information to cope with it is in the signal—if we look online.
Within-category acoustic variation is signal, not noise.
[Diagram: eye-tracking setup with 2 eye cameras on the head, IR head-tracker emitters, head-tracker cam, monitor, and an eyetracker computer and subject computer connected via Ethernet]
Misperception: Additional Results
10 Pairs of b/p items.
• 0 – 35 ms VOT continua.
20 Filler items (lemonade, restaurant, saxophone…)
Option to click “X” (Mispronounced).
26 Subjects
1240 Trials over two days.
Identification Results
[Figure: response rate (voiced, voiceless, NW) as a function of VOT (0-35 ms) for the Barricade-Parricade and Barakeet-Parakeet continua]
Significant target responses even at extreme.
Graded effects of VOT on correct response rate.
“Garden-path” effect: Difference between looks to each target (b vs. p) at same VOT.
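The garden-path effect defined above is a per-VOT subtraction of target fixations for the two members of a pair. A minimal sketch with hypothetical fixation proportions keyed by VOT in ms:

```python
def garden_path_effect(looks_b_target, looks_p_target):
    """Per-VOT difference in target fixations: b-initial target minus p-initial target."""
    return {vot: looks_b_target[vot] - looks_p_target[vot] for vot in looks_b_target}


# Hypothetical mean fixations to target at the continuum endpoints.
looks_barricade = {0: 0.62, 35: 0.45}
looks_parakeet = {0: 0.50, 35: 0.60}
effect = garden_path_effect(looks_barricade, looks_parakeet)
print(effect)
```

A positive value means the b-initial target is favored at that VOT and a negative value means the p-initial target is favored; a gradient effect shows up as a gradual change across the continuum.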
Phonetic “Garden-Path”
[Figure: fixations to target (0-1) over time for Barricade and Parakeet at VOT = 0 (/b/) and VOT = 35 (/p/)]
[Figure: garden-path effect (Barricade - Parakeet) as a function of VOT (0-35 ms), for target and for competitor]
GP Effect: Gradient effect of VOT.
Target: p<.0001
Competitor: p<.0001
Assimilation: Additional Results
Within-category detail used in recovering from assimilation: temporal integration.
• Anticipate upcoming material
• Bias activations based on context
- Like barricade/parakeet: within-category detail retained to resolve ambiguity.
Phonological variation is a source of information.
Exp 3 & 4: Conclusions
Non-parametric approach?
• Competitive Hebbian Learning (Rumelhart & Zipser, 1986).
• Not constrained by a particular equation: can fill space better.
• Similar properties in terms of starting and sparseness.
[Figure: categories along the VOT dimension]
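To give a feel for the idea, here is a generic winner-take-all competitive learning sketch on a single VOT dimension. It is not Rumelhart & Zipser's exact formulation, and the sample VOTs, starting centers, learning rate, and epoch count are all assumptions.

```python
def competitive_learning(samples, centers, rate=0.1, epochs=20):
    """Winner-take-all updates: only the unit closest to each sample moves toward it."""
    centers = list(centers)
    for _ in range(epochs):
        for x in samples:
            winner = min(range(len(centers)), key=lambda i: abs(centers[i] - x))
            centers[winner] += rate * (x - centers[winner])
    return centers


# Hypothetical VOTs (ms) drawn from a short-lag cluster and a long-lag cluster.
samples = [0, 5, 10, 2, 8, 40, 45, 50, 42, 48]
learned = competitive_learning(samples, centers=[20.0, 30.0])
print(learned)
```

No distributional form is fitted here; each unit simply drifts toward the region of VOT space it wins, which is the sense in which the approach is not tied to a particular equation.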
Voicing Continua
9-step b/p continua, VOT varied: beach/peach, bees/peas, beak/peak
[Figure: % P responses as a function of VOT (0-40 ms) for long and short vowels; N=29]
Looks to competitor
[Figure: looks to B and looks to P (0-0.2) over time (0-2000 ms) by distance from the boundary (+1 to +4 and -5 to -1), split by which item was clicked on]
     All     Proto
B    .0001   .0001
P    .008    >.1