Talk given in MICAI 2013: Practical Speech Recognition for Contextualized Service Robots
Practical Speech Recognition for Contextualized Service Robots
Departamento de Ciencias de la Computación
Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas
Universidad Nacional Autónoma de México
http://golem.iimas.unam.mx/
Ivan Meza, Caleb Rascón and Luis Pineda
GrupoGolem
Service robots
● Our future butlers
● They are task oriented
○ Clean up a room
○ Play a game
● Interaction with spoken language
● They work in noisy environments
● Microphone is not close to the speaker
● Poor speech recognition
Proposal
● Improve the system in four aspects
● Contextualized recogniser
● Prompting strategies
● Recovery strategies
● Audio calibration
I. Contextualized recognition
● Use specific language models for the given expectations
■ YES: yes, okay, all right
■ NO: no, don’t, do not
■ NAVIGATE: go to the kitchen, go to the living room, go to the bedroom
ASR module
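One way to picture contextualized recognition is as a lookup from the dialogue's current expectations to the phrases the active language model should cover. The sketch below is only illustrative: the phrases come from the slide, while `EXPECTATION_PHRASES`, `active_vocabulary` and `restrict_hypotheses` are hypothetical names, and a real system would load a language model into the ASR module rather than filter strings.

```python
# Phrases per expectation, taken from the slide. The dictionary layout and
# the function names below are assumptions made for illustration.
EXPECTATION_PHRASES = {
    "YES": ["yes", "okay", "all right"],
    "NO": ["no", "don't", "do not"],
    "NAVIGATE": [
        "go to the kitchen",
        "go to the living room",
        "go to the bedroom",
    ],
}


def active_vocabulary(expectations):
    """Return the sorted set of words covered by the given expectations."""
    words = set()
    for name in expectations:
        for phrase in EXPECTATION_PHRASES[name]:
            words.update(phrase.split())
    return sorted(words)


def restrict_hypotheses(hypotheses, expectations):
    """Keep only the recognizer hypotheses that match an expected phrase."""
    allowed = {
        phrase
        for name in expectations
        for phrase in EXPECTATION_PHRASES[name]
    }
    return [h for h in hypotheses if h.lower().strip() in allowed]


if __name__ == "__main__":
    # While waiting for a yes/no answer, only a handful of words are active.
    print(active_vocabulary(["YES", "NO"]))
    print(restrict_hypotheses(["go to the kitchen", "go to the chicken"],
                              ["NAVIGATE"]))
```

The smaller the active vocabulary, the fewer confusable alternatives the recognizer has to choose from, which is the intuition behind using one model per expectation.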
II. Prompting strategies
● Let the user know when to speak
■ Beep sound
● Speaker volume monitor
■ Could you speak louder or softer
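A speaker volume monitor of this kind could be sketched as a level check on the captured audio. The slides do not say how the level is measured or which thresholds are used, so the RMS measure and the values `TOO_SOFT_DBFS` and `TOO_LOUD_DBFS` below are assumptions chosen only to make the example concrete.

```python
import math

# Hypothetical thresholds in dB relative to full scale (dBFS).
# The talk does not state the values used on the robot.
TOO_SOFT_DBFS = -45.0
TOO_LOUD_DBFS = -10.0


def rms_dbfs(samples):
    """RMS level of float samples in [-1.0, 1.0], expressed in dBFS."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    # 10*log10 of the mean square equals 20*log10 of the RMS amplitude.
    return 10.0 * math.log10(mean_square)


def volume_prompt(samples):
    """Return a prompt asking the user to adjust their volume, or None."""
    level = rms_dbfs(samples)
    if level < TOO_SOFT_DBFS:
        return "Could you speak louder?"
    if level > TOO_LOUD_DBFS:
        return "Could you speak softer?"
    return None
```

Returning `None` when the level is acceptable lets the dialogue manager skip the prompt entirely instead of interrupting the user.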
III. Recovery strategy
● Let the user know when something went wrong
■ Could you repeat?
■ I can’t hear you well, could you repeat?
■ Sorry, I’m a little deaf
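A recovery strategy like this might be sketched as follows. The prompts are the ones on the slide; the trigger condition (the hypothesis is missing or does not match any expected phrase) and the names `RECOVERY_PROMPTS` and `recovery_prompt` are assumptions, since the slides do not describe how the robot decides that something went wrong.

```python
import random

# Recovery prompts from the slide.
RECOVERY_PROMPTS = [
    "Could you repeat?",
    "I can't hear you well, could you repeat?",
    "Sorry, I'm a little deaf.",
]


def recovery_prompt(hypothesis, expected_phrases, rng=random):
    """Return None if the hypothesis is acceptable, else a recovery prompt.

    Assumed trigger: the recognizer produced nothing, or produced something
    outside the phrases expected at this point of the dialogue.
    """
    if hypothesis is not None and hypothesis.lower().strip() in expected_phrases:
        return None
    # Vary the wording so repeated failures do not sound mechanical.
    return rng.choice(RECOVERY_PROMPTS)
```

Passing a seeded `random.Random` instance as `rng` makes the choice reproducible in tests.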
IV. Calibration of audio setting
● Hardware
■ 1 directional microphone
■ 1 USB interface with 4 channels
■ 2 speakers
● Calibration of SNR in situ
■ Background noise: -58 dB
■ SNR set to 20 dB
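The two calibration figures can be related with simple decibel arithmetic. The sketch below assumes that "SNR set to 20 dB" means speech should arrive 20 dB above the measured background noise; that reading, and the function names, are not stated in the slides.

```python
def snr_db(signal_level_db, noise_level_db):
    """Signal-to-noise ratio in dB, given both levels in dB."""
    return signal_level_db - noise_level_db


def min_speech_level_db(noise_level_db, target_snr_db):
    """Lowest speech level (dB) that still reaches the target SNR."""
    return noise_level_db + target_snr_db


if __name__ == "__main__":
    noise = -58.0   # background noise from the slide
    target = 20.0   # target SNR from the slide
    print(min_speech_level_db(noise, target))  # -38.0
```

Under that assumption, speech quieter than -38 dB would fall below the target SNR in this room.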
Corpus evaluation
● Logs from the robot performing RoboCup tasks
■ 2 years of interactions in lab and competition
■ 1,439 utterances
■ 2,472 tokens
■ 120 types
■ 11 tasks
■ 9 of 11 tasks are contextualized
■ 14 language models
Contextualized recognition
We measure WER (the lower the better)
● With a unique LM for all tasks: 53.84%
● With task-based LM: 28.28%
● With contextualized: 23.42%
17.2% relative error reduction
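For reference, word error rate is the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below is a generic implementation, not the evaluation script used for the talk, and it also reproduces the relative reduction between the task-based and contextualized figures.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline, improved):
    """Fraction of the baseline error removed by the improved system."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    print(word_error_rate("go to the kitchen", "go to the chicken"))  # 0.25
    # Task-based LM (28.28%) versus contextualized LM (23.42%).
    print(round(100 * relative_reduction(28.28, 23.42), 1))  # 17.2
```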
Beep sound
● 79 utterances were recorded without the beep sound
■ Without beeps 55.86%
■ With beeps 39.75%
■ With beeps full 53.72%
Relative error reduction: roughly 30% to 4%
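The range quoted on this slide can be checked against the three WER figures. The sketch assumes both "with beeps" conditions are compared against the "without beeps" baseline, which the slide does not state explicitly.

```python
def relative_reduction(baseline, improved):
    """Fraction of the baseline error removed by the improved condition."""
    return (baseline - improved) / baseline


WITHOUT_BEEPS = 55.86    # WER (%) without the beep
WITH_BEEPS = 39.75       # WER (%) with the beep
WITH_BEEPS_FULL = 53.72  # WER (%) for the "with beeps full" condition

best = round(100 * relative_reduction(WITHOUT_BEEPS, WITH_BEEPS), 1)
worst = round(100 * relative_reduction(WITHOUT_BEEPS, WITH_BEEPS_FULL), 1)

if __name__ == "__main__":
    print(best, worst)  # 28.8 3.8
```

Under that assumption the exact values are about 28.8% and 3.8%, which the slide rounds to 30% and 4%.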
Usage of SoundLoc System
● We measure usage
■ 174 times could have been triggered
■ 21 soft speech
■ 4 louder
14.36% of the time
Recovery strategy
● We measure usage
■ 504 times could have been triggered
■ 85 times activated
16.87% of the time
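Both usage figures are the number of activations divided by the number of opportunities. The short check below uses the counts from the SoundLoc and recovery slides; treating the 21 soft-speech and 4 louder prompts as the SoundLoc activations is an assumption about how the percentage was derived.

```python
def usage_rate(activations, opportunities):
    """Percentage of opportunities in which the strategy was activated."""
    return 100.0 * activations / opportunities


# SoundLoc volume prompts: 21 soft-speech plus 4 louder, out of 174.
soundloc = usage_rate(21 + 4, 174)
# Recovery strategy: 85 activations out of 504.
recovery = usage_rate(85, 504)

if __name__ == "__main__":
    print(round(soundloc, 2), round(recovery, 2))  # 14.37 16.87
```

The SoundLoc rate comes out at 14.37% when rounded, so the slide's 14.36% appears to be truncated rather than rounded.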
Conclusions
● Each of these strategies improves performance by a small amount
● Together they allow practical speech recognition on a service robot
Thank you
● Questions?