Speech Processing 11-492/18-492
Speech Translation
Case study: Transtac
Details
Phraselator: One Way Translation
Commercial System
VoxTec
Rapid deployment
Modules of 500ish utts
Transtac: Two-Way S2S System
DARPA developed for
Check points, medical and civil defense
Requirements
Two way
Eyes-free (no screen)
Portable
Usable by real users
Transtac System
Laptop secured in Backpack
Optional speech control
Push-to-Talk Buttons
Close-talking Microphone
Small powerful Speakers
Transtac System Details
Two way system
2 ASR systems: English and Iraqi
2 way statistical translation
2 synthesizers
Push-to-talk system
(Users don’t like “translate everything mode”)
Echo back ASR result
And then translation
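The turn flow above (push-to-talk, echo back the ASR result, then the translation) can be sketched as a small pipeline. This is only an illustration, not the Transtac implementation: `recognize`, `translate`, and `speak` are hypothetical callables standing in for the ASR, MT, and TTS components.

```python
from typing import Callable, List, Tuple


def run_turn(
    audio: bytes,
    recognize: Callable[[bytes], str],   # hypothetical ASR for the speaker's language
    translate: Callable[[str], str],     # hypothetical MT into the listener's language
    speak: Callable[[str, str], None],   # hypothetical TTS: (text, language)
    src_lang: str,
    tgt_lang: str,
) -> str:
    """Handle one push-to-talk turn: recognize, echo back, translate, speak."""
    text = recognize(audio)
    speak(text, src_lang)          # echo back the ASR result so the speaker can check it
    translation = translate(text)
    speak(translation, tgt_lang)   # then play the translation to the listener
    return translation


# Toy stand-ins so the sketch runs without any speech components.
spoken: List[Tuple[str, str]] = []
result = run_turn(
    b"",
    recognize=lambda audio: "open the trunk",
    translate=lambda text: "<iraqi> " + text,
    speak=lambda text, lang: spoken.append((lang, text)),
    src_lang="en",
    tgt_lang="iraqi",
)
```

Running the same function with the languages swapped would cover the other direction, which is one way to read "2 ASR systems, 2 way translation, 2 synthesizers".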
Iraqi Language
Iraqi Arabic is a dialect
Most Iraqis write Modern Standard Arabic
Most Iraqis do not write their own dialect
No standardized spelling
Transtac project invented one
But Iraqis may not be used to it
Arabic (MSA and dialects)
Do not write short vowels in words
Data for Training
Collected human mediated dialogs
Human acts as a machine
Passed a microphone back and forth
Try to get people not to talk at the same time
Large number of collections (over 4 years)
650 thousand sentence pairs
Many different speakers
Hand transcribed by experts (in Iraqi spelling)
Hand translated (source sentences and interpreter’s)
Iraqi ASR
Acoustic model from Iraqi data
Based on MSA phoneset
Needs small, fast models
Discriminative Training
Speaker specific adaptation
Lexicon
Based on LDC provided lexicon
Multiple pronunciations/typos still a problem
Statistically trained LTS rules
Language Model
Trained on Iraqi input (and translated output)
English ASR
Acoustic model
Originally using other models
Then trained from collected data
(Mostly military personnel)
Lexicon
Existing lexicon but needed to add military speak:
MRAP, IED
Language model
Trained from data provided
Trained from “similar” data found on the web
Trained from hand-created “typical” examples
TTS
Standard English TTS
Appropriate “command” voice
Unit selection
Added lots of military vocabulary
Iraqi TTS
Recorded from Iraqi radio announcer
Based on example sentences in the domain
LDC lexicon and LTS rules (same as ASR)
Hand tuned
S2S Interface Issues
How do you teach people to use the system
“Transtac say instructions”
Not really sufficient
How can you tell it translated correctly
Give (speech) feedback.
Backtranslation
ASR echo back
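Backtranslation can be sketched as a round trip through both MT directions, with the result spoken back to the user. The sketch below is an assumption-laden illustration: `forward` and `backward` are hypothetical MT callables, and the word-overlap score is an illustrative extra, not a measure named in these slides.

```python
from typing import Callable, Tuple


def backtranslate(
    source: str,
    forward: Callable[[str], str],   # hypothetical MT: source -> target
    backward: Callable[[str], str],  # hypothetical MT: target -> source
) -> Tuple[str, str, float]:
    """Return (translation, backtranslation, word overlap with the source)."""
    translation = forward(source)
    back = backward(translation)
    src_words = set(source.lower().split())
    back_words = set(back.lower().split())
    # Jaccard overlap: shared words divided by all distinct words.
    union = src_words | back_words
    overlap = len(src_words & back_words) / len(union) if union else 1.0
    return translation, back, overlap


# Toy "MT" so the sketch runs on its own: tag each word, then untag it,
# except that "car" comes back as "vehicle".
forward = lambda s: " ".join("t:" + w for w in s.split())
backward = lambda s: " ".join(
    {"t:car": "vehicle"}.get(w, w[2:]) for w in s.split()
)

translation, back, overlap = backtranslate("where is the car", forward, backward)
```

A low overlap does not prove the translation is wrong (here "vehicle" is a fine paraphrase), which is one reason backtranslation is feedback for the user rather than a guarantee.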
S2S Interface Issues
How do you translate names
A correct translation/transliteration is hard to understand
Mark names in translations
“My name is … Abdullah”
“He lives on … al-Aqar … street”
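The name marking shown in the two examples above can be sketched as inserting a pause marker around each name before the text goes to TTS. This is a minimal sketch that assumes the names are already known (how they are detected is not covered here); `mark_names` and its `pause` parameter are hypothetical.

```python
from typing import Iterable


def mark_names(sentence: str, names: Iterable[str], pause: str = "…") -> str:
    """Surround each known name with a pause marker for the synthesizer."""
    out = sentence
    for name in names:
        out = out.replace(name, f"{pause} {name} {pause}")
    # A pause at the very end of the utterance adds nothing, so drop it.
    if out.endswith(" " + pause):
        out = out[: -len(pause) - 1]
    return out


first = mark_names("My name is Abdullah", ["Abdullah"])
second = mark_names("He lives on al-Aqar street", ["al-Aqar"])
```

In a real synthesizer the marker would be mapped to a prosodic break rather than printed text.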
S2S Evaluation (Transtac)
Offline tests
ASR->Text and Text->Text
Compare to translation references
WER and “BLEU” score
Online tests
Concept transfer (through defined scenarios)
Speed (number of concepts per minute)
(English speech masking)
Utility tests
Does it really work
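Two of the measures above are simple to compute. The sketch below shows word error rate as word-level edit distance divided by reference length, and speed as concepts transferred per minute. The example sentences and the 12-concepts-in-240-seconds figure are invented for illustration, and the function names are hypothetical.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,              # deletion
                curr[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),   # substitution (or match, cost 0)
            )
        prev = curr
    return prev[-1] / len(ref)


def concepts_per_minute(concepts_transferred: int, seconds: float) -> float:
    """Online speed measure: concepts successfully transferred per minute."""
    return 60.0 * concepts_transferred / seconds


# One substitution ("the" -> "a") and one deletion over six reference words.
wer = word_error_rate("stop the car at the checkpoint", "stop a car at checkpoint")
rate = concepts_per_minute(12, 240)
```

BLEU is omitted here because it needs n-gram counting over a whole test set; in practice an existing scoring tool would be used for both measures.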
Transtac Participants
Developer groups
IBM
SRI
BBN
CMU
USC
Evaluations
Twice a year in Iraqi (somewhere in DC)
One surprise language
Farsi, Bahasa Malay, Dari, Pashto
Other evaluations with military groups
Does it work??
Yes, mostly
27 concepts out of 30-ish turns
Systems are mostly similar
But some better than others
Other techniques
Belt/holster based PC with handheld speaker
Small PC in pouch
Chest mounted array microphone
S2S ASR Advanced issues
Tight coupling
ASR should output N-best
Translate all (lattice)
Choose best translation
(MT as a LM for ASR)
Remove disfluencies/hesitations
Add more relevant data
Automatically convert past tense/third person data to present tense/first+second person …
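The tight-coupling idea above (N-best out of ASR, translate all, choose the best translation) can be sketched as rescoring each hypothesis with a combined ASR and MT score. This is a minimal sketch under assumptions: `translate` is a hypothetical MT callable returning a log score, the scores are made up, and `mt_weight` is an illustrative tuning parameter.

```python
from typing import Callable, List, Tuple


def pick_best(
    nbest: List[Tuple[str, float]],                 # (ASR hypothesis, ASR log score)
    translate: Callable[[str], Tuple[str, float]],  # hypothetical MT: text -> (translation, MT log score)
    mt_weight: float = 1.0,                         # illustrative interpolation weight
) -> Tuple[str, str, float]:
    """Translate every hypothesis and keep the one with the best combined score."""
    best = None
    for hyp, asr_score in nbest:
        translation, mt_score = translate(hyp)
        total = asr_score + mt_weight * mt_score
        if best is None or total > best[2]:
            best = (hyp, translation, total)
    if best is None:
        raise ValueError("n-best list is empty")
    return best


# Toy MT table with invented scores: the MT model strongly prefers the
# second hypothesis, so it wins even though ASR ranked it lower.
toy_mt = {
    "wreck a nice beach": ("<t> wreck a nice beach", -9.0),
    "recognize speech": ("<t> recognize speech", -2.0),
}
nbest = [("wreck a nice beach", -4.0), ("recognize speech", -5.0)]
hyp, translation, score = pick_best(nbest, lambda h: toy_mt[h])
```

Here the MT score acts like an extra language model over the ASR output; a lattice would do the same thing over many more hypotheses than an explicit list can hold.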
S2S TTS Advanced Issues
MT output isn’t grammatical
TTS doesn’t care and just says it
TTS should try to say MT output with more breaks.
TTS (unit selection)
As a LM on MT output
Choose the best translation on what is said best
S2S MT Advanced issues
Train on ASR output
Do ASR on training data
Build SMT model ASR-TEXT to TEXT
Session adaptation
Improve coverage from daily usage
S2S In-line Translation
CMU-INESC (Portugal) project
Translation of TED videos
Align audio to give “dubbing” not “voiceover”
Align: timing, breaks, focus across language