Efficient Computer Interfaces Using Continuous Gestures,
Language Models, and Speech
Keith Vertanen
Inference Group
August 4th, 2004
The problem
- Speech recognizers make mistakes
- Correcting mistakes is inefficient:
    140 WPM  uncorrected dictation
     14 WPM  corrected dictation, mouse/keyboard
     32 WPM  corrected typing, mouse/keyboard
- Voice-only correction is even slower and more frustrating
Research overview
Make correction of dictation:
- More efficient
- More fun
- More accessible
Approach:
- Build a word lattice from a recognizer's n-best list
- Expand the lattice to cover likely recognition errors
- Make a language model from the expanded lattice
- Use the model in a continuous gesture interface to perform confirmation and correction
Building lattice
Example n-best list:
1: jack studied very hard
2: jack studied hard
3: jill studied hard
4: jill studied very hard
5: jill studied little
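One simple way to picture this first step: merge the n-best hypotheses into a shared-prefix graph. The sketch below is only illustrative (a real lattice also merges common suffixes and carries recognizer scores); `build_lattice` and the nested-dict representation are inventions for this example.

```python
def build_lattice(nbest):
    """Merge an n-best list into a prefix-tree 'lattice' in which
    hypotheses share common prefixes."""
    root = {}
    for hyp in nbest:
        node = root
        for word in hyp.split():
            node = node.setdefault(word, {})
    return root

nbest = [
    "jack studied very hard",
    "jack studied hard",
    "jill studied hard",
    "jill studied very hard",
    "jill studied little",
]
lattice = build_lattice(nbest)
# Both "jack" hypotheses share the prefix "jack studied":
print(sorted(lattice["jack"]["studied"]))   # ['hard', 'very']
```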
Acoustic confusions
Given a word, find words that sound similar.
Look the pronunciation up in a dictionary:
  studied -> s t ah d iy d
Use observed phone confusions to generate alternative pronunciations:
  s t ah d iy d -> s ao d iy
                -> s t ah d iy
                -> ...
Map the pronunciations back to words:
  s t ah d iy d -> studied
  s ao d iy     -> saudi
  s t ah d iy   -> study
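A hypothetical sketch of this step: the mini-dictionary, reverse lookup, and phone-confusion table below are invented for illustration; a real system would use a full pronunciation dictionary and confusions observed on held-out data.

```python
# Invented pronunciation dictionary (ARPAbet-style phones).
DICT = {
    "studied": ("s", "t", "ah", "d", "iy", "d"),
    "saudi":   ("s", "ao", "d", "iy"),
    "study":   ("s", "t", "ah", "d", "iy"),
}
REVERSE = {phones: word for word, phones in DICT.items()}
CONFUSIONS = {"ah": ["ao"]}          # phones observed to be confused

def variants(phones):
    """All pronunciations one substitution or one deletion away."""
    out = set()
    for i in range(len(phones)):
        for alt in CONFUSIONS.get(phones[i], []):
            out.add(phones[:i] + (alt,) + phones[i + 1:])
        out.add(phones[:i] + phones[i + 1:])
    return out

def acoustic_confusions(word, edits=3):
    """Dictionary words reachable within `edits` phone edits."""
    frontier = {DICT[word]}
    seen = set(frontier)
    for _ in range(edits):
        frontier = {v for p in frontier for v in variants(p)} - seen
        seen |= frontier
    return sorted({REVERSE[p] for p in seen if p in REVERSE} - {word})

print(acoustic_confusions("studied"))   # ['saudi', 'study']
```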
Morphology confusions
Given a word, find words that share the same "root", using the Porter stemmer:
  jack, jacks, jacking, jacked      -> jack
  study, studies, studying, studied -> studi
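A sketch of this grouping step. The toy `stem()` below strips only a few suffixes, just enough to reproduce the two groups on this slide; the talk uses the real Porter stemmer (which likewise maps "study" to "studi").

```python
from collections import defaultdict

def stem(word):
    """Toy suffix-stripper standing in for the Porter stemmer."""
    for suffix in ("ying", "ies", "ied", "ing", "ed", "y", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            stripped = word[: -len(suffix)]
            # The "-y/-i" family collapses to a common "...i" stem.
            return stripped + ("i" if suffix in ("ying", "ies", "ied", "y") else "")
    return word

def morphology_confusions(vocab):
    """Group vocabulary words by shared stem."""
    groups = defaultdict(set)
    for word in vocab:
        groups[stem(word)].add(word)
    return groups

vocab = ["jack", "jacks", "jacking", "jacked",
         "study", "studies", "studying", "studied"]
groups = morphology_confusions(vocab)
print(sorted(groups["jack"]))    # ['jack', 'jacked', 'jacking', 'jacks']
print(sorted(groups["studi"]))   # ['studied', 'studies', 'study', 'studying']
```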
Language model confusions
Example: "Jack studied hard"
Look at the words before or after a node, and add likely alternate words based on an n-gram language model.
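A minimal sketch of this idea (the bigram counts are invented): propose extra words for a lattice node by keeping the words most likely to follow the node's left context, here the word before the "studied" node.

```python
# Invented bigram counts standing in for a trained n-gram model.
BIGRAM_COUNTS = {
    "jack": {"studied": 5, "worked": 3, "slept": 1},
}

def lm_confusions(prev_word, top_n=2):
    """Most likely successors of prev_word under the bigram counts."""
    counts = BIGRAM_COUNTS.get(prev_word, {})
    return sorted(counts, key=counts.get, reverse=True)[:top_n]

print(lm_confusions("jack"))   # ['studied', 'worked']
```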
Expansion results (on WSJ1)
[Chart: oracle word accuracy (84% to 98%) after each expansion step (Baseline, Insertion, Acoustic, Morphology, Bigram, Trigram, Backward bigram, Backward trigram), with three series: Observed, Fully additive, Upper bound]
Probability model
Our confirmation and correction interface requires the probability of a letter given the prior letters:
  P(letter_i | letter_1 ... letter_{i-1})
Probability model
- Keep track of possible paths in the lattice
- Predict based on the next letter on each path
- Interpolate with a default language model
Example: user has entered "the_cat":
[Diagram: lattice paths with arc probabilities of 1.00]
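The three bullets above can be sketched as follows. This is an illustrative stand-in, not the actual model: a flat list of complete sentence strings plays the role of the lattice paths, and a uniform letter distribution plays the role of the default language model.

```python
from collections import Counter
import string

ALPHABET = string.ascii_lowercase + "_"   # "_" stands for a space
LAMBDA = 0.9                              # weight given to lattice paths

def letter_probs(paths, prefix):
    """P(next letter | prefix), interpolating path predictions with a
    uniform default model."""
    nexts = Counter(p[len(prefix)] for p in paths
                    if p.startswith(prefix) and len(p) > len(prefix))
    total = sum(nexts.values())
    uniform = 1.0 / len(ALPHABET)
    return {c: LAMBDA * (nexts[c] / total if total else 0.0)
               + (1 - LAMBDA) * uniform
            for c in ALPHABET}

paths = ["the_cat_sat", "the_cap_fits", "the_car_moves"]
probs = letter_probs(paths, "the_ca")
# "t", "p" and "r" each continue one of the three paths, so each gets
# about 0.30; every other letter keeps only the small default share.
```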
Handling word errors
- Use the default language model during entry of an erroneous word
- Rebuild paths allowing for an additional deletion or substitution error
Example: user has entered "the_cattle_":
[Diagram: path probabilities 0.25, 0.25, 0.25, 0.0625, 0.0625]
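A word-level sketch of the "rebuild paths" idea (assumptions: both the paths and the user's entry are word lists, and the function name is invented). If the entered prefix matches no path exactly, matching is retried allowing one extra (deletable) or one substituted word in the entry.

```python
def matching_paths(paths, words):
    """Paths whose prefix matches the entered words, tolerating at most
    one deleted or one substituted word in the entry."""
    def matches(path, ws):
        return len(path) >= len(ws) and all(
            w is None or w == p for w, p in zip(ws, path))

    exact = [p for p in paths if matches(p, words)]
    if exact:                                     # no error: keep exact
        return exact
    relaxed = []                                  # allow one entry error
    for p in paths:
        for i in range(len(words)):
            if (matches(p, words[:i] + words[i + 1:])              # deletion
                    or matches(p, words[:i] + [None] + words[i + 1:])):  # subst
                relaxed.append(p)
                break
    return relaxed

paths = [["the", "cat", "sat"], ["the", "cap", "fits"]]
# "cattle" matches no path word, but dropping or substituting the
# erroneous word lets both paths continue:
print(matching_paths(paths, ["the", "cattle"]))
```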
Using expanded lattice
Paths using arcs added during lattice expansion are penalized.
Example: user has entered "jack_":
[Diagram: arc probabilities 0.04, 0.04, 1.00; expansion arcs carry the penalized 0.04]
Evaluating expansion
Assume a good model requires as little information from the user as possible:
  Cross entropy(T) = -(1/t) * sum_{i=0}^{t-1} log2 P(s_i | s_{i-1} ... s_1 s_0)
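The measure can be computed directly; a minimal sketch (the uniform toy model is an assumption, used only to give a checkable number):

```python
import math

def cross_entropy(text, prob):
    """Average bits per letter; prob(prefix, c) = P(next letter c | prefix)."""
    bits = sum(-math.log2(prob(text[:i], c)) for i, c in enumerate(text))
    return bits / len(text)

# A uniform model over a 27-letter alphabet needs log2(27) ~ 4.75 bits
# per letter; a good lattice-based model should need far fewer.
uniform = lambda prefix, c: 1.0 / 27
print(round(cross_entropy("the_cat", uniform), 2))   # 4.75
```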
[Chart: cross entropy (bits), 0.4 to 0.9, after each expansion step: Baseline, Insertion, Acoustic, Morphology, Bigram, Trigram, Backward bigram, Backward trigram]
Results on test set
Model evaluated on a held-out test set (Hub1):
- Default language model: 2.4 bits/letter (user decides between 5.3 letters)
- Best speech-based model: 0.61 bits/letter (user decides between 1.5 letters)
“The hibernating skunk curled up in his deep den uncurls himself and ventures forth to prowl the world”
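The "user decides between N letters" figures are the perplexity of the letter model, 2 raised to the bits-per-letter figure:

```python
# Perplexity: 2 ** (bits per letter) gives the effective number of
# equally likely next letters the user must choose among.
print(round(2 ** 2.4, 1))    # default LM:      5.3 letters
print(round(2 ** 0.61, 1))   # speech-based LM: 1.5 letters
```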
Conclusions
- One-third of recognition errors are covered by expanding the lattice.
- Only insertion-error expansion improves efficiency.
- The speech-based model significantly improves efficiency (2.4 bits -> 0.61 bits).
- A good correction interface is possible using Dasher and an off-the-shelf recognizer.
Future work
- Update Speech Dasher to use the lattice-based probability model.
- Incorporate hypothesis probabilities into the lattice (or, even better, get at the recognizer's lattice).
- Improve efficiency on sentences with few or no errors.
- User trials to validate the numeric results.