Speech Processing 11-492/18-492tts.speech.cs.cmu.edu/courses/11492/slides/s2s_details.pdfTranstac System Details Two way system 2 ASR systems: English and Iraqi 2 way statistical translation

Speech Processing 11-492/18-492

Speech Translation

Case study: Transtac

Details

Phraselator: One Way Translation

Commercial System

VoxTec

Rapid deployment

Modules of 500ish utts

Transtac: Two-Way S2S System

Developed by DARPA for

Checkpoints, medical and civil defense

Requirements

Two way

Eyes-free (no screen)

Portable

Usable by real users

Transtac System

Laptop secured in a backpack

Optional speech control

Push-to-Talk Buttons

Close-talking Microphone

Small, powerful speakers

Transtac System Details

Two way system

2 ASR systems: English and Iraqi

2 way statistical translation

2 synthesizers

Push-to-talk system

(Users don’t like a “translate everything” mode)

Echo back the ASR result

Then play the translation
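The push-to-talk turn described above (recognize, echo the ASR result, then speak the translation) can be sketched as a small pipeline with one object per direction (English to Iraqi, Iraqi to English). This is a minimal sketch under assumptions: the `Direction` record and its `recognize`, `translate`, `speak_source` and `speak_target` callables are hypothetical stand-ins for the ASR, MT and TTS components, not the Transtac interfaces.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Direction:
    """One translation direction; every callable is a hypothetical stand-in."""
    recognize: Callable[[bytes], str]     # ASR: audio -> source-language text
    translate: Callable[[str], str]       # MT: source text -> target text
    speak_source: Callable[[str], None]   # TTS in the speaker's language (echo back)
    speak_target: Callable[[str], None]   # TTS in the listener's language


def handle_turn(direction: Direction, audio: bytes) -> str:
    """Process one push-to-talk turn: ASR, echo back, then translation."""
    hypothesis = direction.recognize(audio)
    # Echo the recognized text first so the speaker can catch ASR errors.
    direction.speak_source(hypothesis)
    translation = direction.translate(hypothesis)
    direction.speak_target(translation)
    return translation
```

Each push-to-talk button would select its own `Direction`, so the two ASR systems, the two MT directions and the two synthesizers stay independent.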

Iraqi Language

Iraqi Arabic is a dialect

Most Iraqis write Modern Standard Arabic (MSA)

Most Iraqis do not write their own dialect

No standardized spelling

Transtac project invented one

But Iraqis may not be used to it

Arabic (MSA and dialects)

Do not write short vowels in words
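The sketch below shows what unwritten short vowels mean in practice. It strips the optional Arabic diacritics (Unicode U+064B to U+0652: tanwin, the short-vowel marks, shadda and sukun) from fully vowelled text and leaves the bare consonant skeleton used in normal writing. The example word is illustrative only and does not come from the Transtac data or its spelling standard.

```python
import re

# Arabic tanwin, short-vowel marks, shadda and sukun occupy U+064B..U+0652.
_DIACRITICS = re.compile("[\u064b-\u0652]")


def strip_short_vowels(text: str) -> str:
    """Remove optional diacritics, approximating how Arabic is normally written."""
    return _DIACRITICS.sub("", text)


vowelled = "\u0643\u064e\u062a\u064e\u0628\u064e"  # kataba, "he wrote", fully vowelled
bare = strip_short_vowels(vowelled)  # k-t-b only; the same letters also spell kutub, "books"
```

Several words can collapse onto one written form. This is one reason the lexicon and letter-to-sound rules for Iraqi are harder to build than for a language whose spelling marks vowels.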

Data for Training

Collected human mediated dialogs

Human acts as a machine

Passed a microphone back and forth

Tried to get people not to talk at the same time

Large number of collections (over 4 years)

650,000 sentence pairs

Many different speakers

Hand transcribed by experts (in Iraqi spelling)

Hand translated (source sentences and the interpreter’s)

Iraqi ASR

Acoustic model from Iraqi data

Based on MSA phoneset

Needs small, fast models

Discriminative Training

Speaker specific adaptation

Lexicon

Based on LDC provided lexicon

Multiple pronunciations/typos still a problem

Statistically trained letter-to-sound (LTS) rules

Language Model

Trained on Iraqi input (and translated output)
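The lexicon setup above works roughly as follows. A word found in the lexicon uses its listed pronunciations, of which there may be several. An out-of-vocabulary word goes to the LTS rules. In this sketch the lexicon entries are hypothetical romanized placeholders, and the small grapheme table stands in for the statistically trained LTS rules, which it does not replicate.

```python
from typing import Dict, List

# Hypothetical lexicon: word -> list of alternative pronunciations (phone lists).
LEXICON: Dict[str, List[List[str]]] = {
    "salam": [["s", "a", "l", "aa", "m"]],
    "shlonak": [["sh", "l", "o", "n", "a", "k"], ["sh", "l", "oo", "n", "a", "k"]],
}

# Toy grapheme table standing in for trained LTS rules (digraph -> phone).
LTS_TABLE: Dict[str, str] = {"sh": "sh", "kh": "kh"}


def letter_to_sound(word: str) -> List[str]:
    """Greedy LTS: use a two-letter grapheme from the table if present, else one letter = one phone."""
    phones: List[str] = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in LTS_TABLE:
            phones.append(LTS_TABLE[pair])
            i += 2
        else:
            phones.append(word[i])
            i += 1
    return phones


def pronunciations(word: str) -> List[List[str]]:
    """Return all lexicon pronunciations, or a single LTS guess for unknown words."""
    return LEXICON.get(word, [letter_to_sound(word)])
```

Multiple pronunciations per entry are kept as alternatives for the recognizer. Typos in the lexicon would surface here as wrong or duplicate entries.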

English ASR

Acoustic model: originally used other models

Then trained from collected data

(Mostly military personnel)

Lexicon: existing lexicon, but needed to add military speak:

MRAP, IED

Language model: trained from the provided data

Trained from “similar” data found on the web

Trained from hand-created “typical” examples
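Linear interpolation is one common way to combine language models trained on different sources, such as collected dialogs, web text and hand-written examples: P(w) = λ1·P1(w) + λ2·P2(w) + λ3·P3(w), with the weights summing to 1. The slide does not say how the English sources were combined. This sketch assumes add-one-smoothed unigram models and hand-picked weights purely for illustration, whereas a real ASR LM would use higher-order n-grams and tuned weights.

```python
from collections import Counter
from typing import Dict, List


def unigram_model(sentences: List[str], vocab: set) -> Dict[str, float]:
    """Add-one smoothed unigram probabilities over a shared vocabulary."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values()) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def interpolate(models: List[Dict[str, float]], weights: List[float]) -> Dict[str, float]:
    """P(w) = sum_i weight_i * P_i(w); the weights must sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    return {w: sum(lam * m[w] for lam, m in zip(weights, models)) for w in models[0]}


# Hypothetical toy sources; the vocabulary is the union so every model covers every word.
collected = ["show me your id", "open the trunk"]
web = ["open the door please"]
handmade = ["show me the trunk"]
vocab = {w for s in collected + web + handmade for w in s.split()}
lm = interpolate([unigram_model(src, vocab) for src in (collected, web, handmade)],
                 [0.6, 0.3, 0.1])
```

Giving the in-domain collected data the largest weight reflects the usual intuition that web text is only “similar” data.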

TTS

Standard English TTS

Appropriate “command” voice

Unit selection

Added lots of military vocabulary

Iraqi TTS

Recorded from an Iraqi radio announcer

Based on example sentences in the domain

LDC lexicon and LTS rules (same as ASR)

Hand tuned

S2S Interface Issues

How do you teach people to use the system?

“Transtac say instructions”

Not really sufficient

How can you tell it translated correctly?

Give (speech) feedback.

Backtranslation

ASR echo back

S2S Interface Issues

How do you translate names?

A correct translation/transliteration is hard to understand

Mark names in translations

“My name is … Abdullah”

“He lives on … al-Aqar … street”
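The name-marking idea above can be sketched as post-processing on the MT output: a pause marker goes before each run of name tokens, and after it unless the name ends the sentence, matching the two examples. Name detection is left open here. The sketch assumes the name tokens are already known, for example because MT passed them through untranslated, and `…` is only a placeholder for whatever break the TTS actually inserts.

```python
from typing import List, Set

PAUSE = "…"  # placeholder for a TTS break


def mark_names(tokens: List[str], names: Set[str]) -> str:
    """Set each run of name tokens apart with pause markers."""
    out: List[str] = []
    in_name = False
    for tok in tokens:
        is_name = tok in names
        if is_name and not in_name:
            out.append(PAUSE)  # pause before the name starts
        elif not is_name and in_name:
            out.append(PAUSE)  # pause after the name ends
        out.append(tok)
        in_name = is_name
    return " ".join(out)
```

For example, `mark_names("He lives on al-Aqar street".split(), {"al-Aqar"})` gives `"He lives on … al-Aqar … street"`.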

S2S Evaluation (Transtac)

Offline tests: ASR → text and text → text

Compare to translation references

WER and “BLEU” score

Online tests: concept transfer (through defined scenarios)

Speed (number of concepts per minute)

(English speech masking)

Utility tests: does it really work?
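For the offline ASR test, WER is the word-level edit distance (substitutions + deletions + insertions) between the recognizer output and the reference transcript, divided by the number of reference words. The sketch below shows a minimal dynamic-programming version. BLEU involves clipped n-gram precisions and a brevity penalty, so in practice an established scorer would be used rather than hand-rolled code.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(S + D + I) / N via Levenshtein distance over words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

For example, the hypothesis "open a trunk" against the reference "open the trunk please" has one substitution and one deletion, giving 2/4 = 0.5.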

Transtac Participants

Developer groups: IBM

SRI

BBN

CMU

USC

Evaluations: twice a year in Iraqi (somewhere in DC)

One surprise language: Farsi, Bahasa Malay, Dari, Pashto

Other evaluations with military groups

Does it work??

Yes, mostly

27 concepts out of 30-ish turns

Systems are mostly similar

But some better than others

Other techniques

Belt/holster-based PC with handheld speaker

Small PC in pouch

Chest mounted array microphone

S2S ASR Advanced issues

Tight coupling

ASR should output N-best

Translate all of them (or a lattice)

Choose best translation

(MT as a LM for ASR)

Remove disfluencies/hesitations

Add more relevant data

Automatically convert past tense/third person data to present tense/first+second person …
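The tight-coupling idea on this slide can be sketched as rescoring. Every ASR N-best hypothesis is translated, and the pair with the best combined score wins, so the MT model effectively acts as an extra LM on the ASR output. The sketch uses an N-best list rather than a lattice. The log-linear combination, the weight `mt_weight` and the `translate` callable, which returns a translation and its log-score, are assumptions for illustration; the slide does not specify how the scores were combined.

```python
from typing import Callable, List, Tuple


def choose_translation(
    nbest: List[Tuple[str, float]],                 # (ASR hypothesis, ASR log-score)
    translate: Callable[[str], Tuple[str, float]],  # hypothetical MT: text -> (translation, MT log-score)
    mt_weight: float = 1.0,
) -> Tuple[str, str]:
    """Translate every ASR hypothesis; return the (source, translation) pair with the best combined score."""
    if not nbest:
        raise ValueError("empty N-best list")
    scored = []
    for source, asr_score in nbest:
        target, mt_score = translate(source)
        scored.append((asr_score + mt_weight * mt_score, source, target))
    _, source, target = max(scored, key=lambda item: item[0])
    return source, target
```

A slightly worse ASR hypothesis can win when it is much easier to translate, which is the intended effect of using MT as an LM for ASR.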

S2S TTS Advanced Issues

MT output isn’t grammatical

TTS doesn’t care and just says it

TTS should try to say MT output with more breaks.

TTS (unit selection)

As a LM on MT output

Choose the translation that can be said best
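Using unit-selection TTS as an LM on MT output means penalizing each candidate translation by how hard it is for the synthesizer to say well, then picking the best overall. This sketch assumes a hypothetical `synthesis_cost` callable (for example, the total target-plus-join cost from the unit-selection search, lower meaning it sounds better) and a weight `tts_weight`. Neither comes from the slides.

```python
from typing import Callable, List, Tuple


def pick_speakable(
    candidates: List[Tuple[str, float]],     # (translation, MT log-score)
    synthesis_cost: Callable[[str], float],  # hypothetical unit-selection cost; lower = better
    tts_weight: float = 0.5,
) -> str:
    """Return the candidate with the best MT score after penalizing hard-to-synthesize outputs."""
    if not candidates:
        raise ValueError("no candidate translations")
    best = max(candidates, key=lambda c: c[1] - tts_weight * synthesis_cost(c[0]))
    return best[0]
```

With `tts_weight = 0` this reduces to the plain MT choice, so the weight controls how much “said best” is allowed to override “translated best”.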

S2S MT Advanced issues

Train on ASR output

Do ASR on training data

Build an SMT model from ASR output text to target text

Session adaptation

Improve coverage from daily usage
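Training on ASR output means re-decoding the training audio and pairing the recognizer’s errorful output with the reference translation, so the SMT model learns to map what ASR actually produces to correct target text. The sketch only builds that parallel corpus. The `Utterance` record layout, the `recognize` decoder and the option to also keep the clean transcript pairs are assumptions, not details from the slides.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Utterance:
    audio: bytes       # recorded source speech
    transcript: str    # hand transcription of the source
    translation: str   # hand translation (reference target text)


def build_asr_parallel_corpus(
    data: Iterable[Utterance],
    recognize: Callable[[bytes], str],  # hypothetical ASR decoder
    keep_transcripts: bool = True,
) -> List[Tuple[str, str]]:
    """Return (source text, target text) pairs whose source side is ASR output."""
    pairs: List[Tuple[str, str]] = []
    for utt in data:
        pairs.append((recognize(utt.audio), utt.translation))
        if keep_transcripts:
            # Also keep the clean transcript pair so the model still sees correct input.
            pairs.append((utt.transcript, utt.translation))
    return pairs
```

Running the decoder on its own training data tends to give optimistic (too clean) output, so held-out or cross-decoded audio would likely be closer to real use.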

S2S In-line Translation

CMU-INESC (Portugal) project

Translation of TED videos

Align audio to give “dubbing” not “voiceover”

Align timing, breaks and focus across languages
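One small piece of timing alignment for dubbing is fitting each synthesized target segment into the time slot of the corresponding source segment. The sketch computes a per-segment speaking-rate factor and clamps it to an assumed range so the speech stays natural. The clamp limits are hypothetical, and the project’s actual alignment method is not described on the slide.

```python
from typing import List, Tuple


def rate_factors(
    segments: List[Tuple[float, float]],  # (source duration s, synthesized target duration s)
    min_rate: float = 0.8,                # hypothetical lower limit (slow down at most 20%)
    max_rate: float = 1.25,               # hypothetical upper limit (speed up at most 25%)
) -> List[float]:
    """Speaking-rate multiplier per segment so the target fits the source slot (>1 = speak faster)."""
    factors: List[float] = []
    for src_dur, tgt_dur in segments:
        if src_dur <= 0:
            raise ValueError("source segment duration must be positive")
        factor = tgt_dur / src_dur
        factors.append(min(max(factor, min_rate), max_rate))
    return factors
```

Segments that hit a clamp limit still overrun or underrun their slot. Those would need the other alignment cues listed above, such as moving breaks, rather than more time-stretching.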
