Speech Processing 11-492/18-492 Speech Translation Case study: Transtac Details

Source: tts.speech.cs.cmu.edu/courses/11492/slides/s2s_details.pdf

  • Speech Processing 11-492/18-492

    Speech Translation

    Case study: Transtac

    Details

  • Phraselator: One Way Translation

    Commercial System

    VoxTec

    Rapid deployment

    Modules of 500-ish utterances

  • Transtac: Two-Way S2S System

    Developed by DARPA for

    Checkpoints, medical, and civil defense

    Requirements

    Two way

    Eyes-free (no screen)

    Portable

    Usable by real users

  • Transtac System

    Laptop secured in Backpack

    Optional speech control

    Push-to-Talk Buttons

    Close-talking Microphone

    Small powerful Speakers

  • Transtac System Details

    Two way system

    2 ASR systems: English and Iraqi

    2 way statistical translation

    2 synthesizers

    Push-to-talk system

    (Users don’t like “translate everything mode”)

    Echo back ASR result

    And then translation
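
The turn structure described on this slide (push-to-talk, ASR, echo back, translation, synthesis) can be sketched as below. This is only an illustration under stated assumptions: `run_turn` and its callable parameters `recognize`, `translate`, and `speak` are hypothetical stand-ins, not the actual Transtac components or APIs.

```python
from typing import Callable, List, Tuple


def run_turn(
    audio: bytes,
    recognize: Callable[[bytes], str],
    translate: Callable[[str], str],
    speak: Callable[[str, str], None],
    src_lang: str,
    tgt_lang: str,
) -> str:
    """Handle one push-to-talk turn in one direction (hypothetical sketch)."""
    text = recognize(audio)        # ASR in the speaker's language
    speak(text, src_lang)          # echo the ASR result back to the speaker
    translation = translate(text)  # statistical MT into the other language
    speak(translation, tgt_lang)   # then speak the translation
    return translation


# Toy stand-ins so the sketch runs without any speech components.
spoken: List[Tuple[str, str]] = []
result = run_turn(
    audio=b"",
    recognize=lambda audio: "where do you live",
    translate=lambda text: "[Iraqi] " + text,
    speak=lambda text, lang: spoken.append((lang, text)),
    src_lang="en",
    tgt_lang="iq",
)
print(spoken)
```

A two-way system would call the same function with the English and Iraqi components swapped, depending on which push-to-talk button was pressed.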

  • Iraqi Language

    Iraqi Arabic is a dialect

    Most Iraqis write Modern Standard Arabic

    Most Iraqis do not write their own dialect

    No standardized spelling

    Transtac project invented one

    But Iraqis may not be used to it

    Arabic (MSA and dialects)

    Do not write short vowels in words
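
To make the short-vowel point concrete, the sketch below strips the vowel diacritics from a fully vowelled Arabic word, leaving the form that is normally written. The example word (kataba, "he wrote") and the helper name `strip_short_vowels` are illustrative choices, not taken from the Transtac lexicon. It relies on the fact that Arabic vowel marks are Unicode combining characters.

```python
import unicodedata


def strip_short_vowels(word: str) -> str:
    """Drop combining marks (including Arabic short-vowel diacritics)."""
    return "".join(ch for ch in word if not unicodedata.combining(ch))


# "kataba" written with fatha (U+064E) after each consonant.
vowelled = "\u0643\u064e\u062a\u064e\u0628\u064e"
written = strip_short_vowels(vowelled)

print(len(vowelled), len(written))
print(written)
```

The unvowelled three-letter form is ambiguous between several pronunciations, which is why both the ASR lexicon and the TTS need pronunciation information beyond the written text.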

  • Data for Training

    Collected human mediated dialogs

    Human acts as a machine

    Passed a microphone back and forth

    Try to get people not to talk at the same time

    Large number of collections (over 4 years)

    650 thousand sentence pairs

    Many different speakers

    Hand transcribed by experts (in Iraqi spelling)

    Hand translated (source sentences and interpreter’s)

  • Iraqi ASR

    Acoustic model from Iraqi data

    Based on MSA phoneset

    Models need to be small and fast

    Discriminative Training

    Speaker specific adaptation

    Lexicon

    Based on LDC provided lexicon

    Multiple pronunciations/typos still a problem

    Statistically trained LTS rules

    Language Model

    Trained on Iraqi input (and translated output)

  • English ASR

    Acoustic model

    Originally using other models

    Then trained from collected data

    (Mostly military personnel)

    Lexicon

    Existing lexicon, but needed to add military speak:

    MRAP, IED

    Language model

    Trained from data provided

    Trained from “similar” data found on the web

    Trained from hand-created “typical” examples

  • TTS

    Standard English TTS

    Appropriate “command” voice

    Unit selection

    Added lots of military vocabulary

    Iraqi TTS

    Recorded from Iraqi radio announcer

    Based on example sentences in the domain

    LDC lexicon and LTS rules (same as ASR)

    Hand tuned

  • S2S Interface Issues

    How do you teach people to use the system?

    “Transtac say instructions”

    Not really sufficient

    How can you tell it translated correctly?

    Give (speech) feedback.

    Backtranslation

    ASR echo back

  • S2S Interface Issues

    How do you translate names?

    A correct translation/transliteration is hard to understand

    Mark names in translations

    “My name is … Abdullah”

    “He lives on … al-Aqar … street”
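
The name-marking idea above can be sketched as inserting a pause marker around name tokens before synthesis. This assumes the names have already been located; the function `mark_names` and the `(start, end)` span format are hypothetical, and "…" simply stands for whatever pause the synthesizer supports.

```python
from typing import List, Tuple


def mark_names(tokens: List[str],
               name_spans: List[Tuple[int, int]],
               pause: str = "…") -> str:
    """Insert a pause before each name and after it (unless it ends the sentence).

    name_spans holds (start, end) token indices, end exclusive.
    """
    starts = {start for start, _ in name_spans}
    ends = {end for _, end in name_spans}
    out: List[str] = []
    for i, tok in enumerate(tokens):
        if i in starts or i in ends:
            out.append(pause)
        out.append(tok)
    return " ".join(out)


first = mark_names(["My", "name", "is", "Abdullah"], [(3, 4)])
second = mark_names(["He", "lives", "on", "al-Aqar", "street"], [(3, 4)])
print(first)
print(second)
```

Run on the two slide examples, this reproduces the pause placement shown there.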

  • S2S Evaluation (Transtac)

    Offline tests

    ASR->Text and Text->Text

    Compare to translation references

    WER and “BLEU” score

    Online tests

    Concept transfer (through defined scenarios)

    Speed (number of concepts per minute)

    (English speech masking)

    Utility tests

    Does it really work?
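
For the offline ASR tests, word error rate (WER) is the word-level edit distance between the reference transcript and the ASR hypothesis, divided by the number of reference words. The function below is a minimal sketch of that standard definition, not the scoring tool used in the Transtac evaluations. The example sentences are made up.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = cur
    return prev[-1] / len(ref)


print(wer("where do you live", "where do you leave"))
print(wer("open the trunk", "open trunk"))
```

BLEU plays the analogous role for the translation output, comparing n-gram overlap against the reference translations.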

  • Transtac Participants

    Developer groups

    IBM

    SRI

    BBN

    CMU

    USC

    Evaluations

    Twice a year in Iraqi (somewhere in DC)

    One surprise language: Farsi, Bahasa Malay, Dari, Pashto

    Other evaluations with military groups

  • Does it work??

    Yes, mostly

    27 concepts out of 30-ish turns

    Systems are mostly similar

    But some better than others

    Other techniques

    Belt/holster based PC with handheld speaker

    Small PC in pouch

    Chest mounted array microphone

  • S2S ASR Advanced issues

    Tight coupling

    ASR should output N-best

    Translate all (lattice)

    Choose best translation

    (MT as a LM for ASR)

    Remove disfluencies/hesitations

    Add more relevant data

    Automatically convert past tense/third person data to present tense/first+second person …
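
The tight-coupling idea above (translate every N-best ASR hypothesis, then choose the best translation) can be sketched as a simple rescoring loop. Everything here is assumed for illustration: `pick_translation`, the log-probability scores, the weights, and the toy `toy_mt` table are hypothetical, and a real system would work over a lattice rather than a short list.

```python
from typing import Callable, List, Tuple


def pick_translation(
    nbest: List[Tuple[str, float]],
    translate: Callable[[str], Tuple[str, float]],
    asr_weight: float = 1.0,
    mt_weight: float = 1.0,
) -> Tuple[float, str, str]:
    """Return (score, asr_hypothesis, translation) with the best combined score."""
    if not nbest:
        raise ValueError("nbest must not be empty")
    best = None
    for hyp, asr_logprob in nbest:
        translation, mt_logprob = translate(hyp)
        # The MT score acts like an extra language model over the ASR hypotheses.
        score = asr_weight * asr_logprob + mt_weight * mt_logprob
        if best is None or score > best[0]:
            best = (score, hyp, translation)
    return best


# Toy example: ASR slightly prefers the wrong word, MT strongly prefers the right one.
toy_mt = {
    "where do you leave": ("[translation A]", -5.0),
    "where do you live": ("[translation B]", -2.0),
}
nbest = [("where do you leave", -1.0), ("where do you live", -1.2)]
best = pick_translation(nbest, lambda hyp: toy_mt[hyp])
print(best[1], "->", best[2])
```

In the toy data the second-ranked ASR hypothesis wins once the translation score is included, which is the effect the slide describes as using MT as a language model for ASR.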

  • S2S TTS Advanced Issues

    MT output isn’t grammatical

    TTS doesn’t care and just says it

    TTS should try to say MT output with more breaks.

    TTS (unit selection)

    As a LM on MT output

    Choose the best translation on what is said best

  • S2S MT Advanced issues

    Train on ASR output

    Do ASR on training data

    Build SMT model ASR-TEXT to TEXT

    Session adaptation

    Improve coverage from daily usage
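
The "train on ASR output" step above amounts to replacing the clean source transcripts in the parallel data with whatever the recognizer produces for the same audio. The sketch below shows only that data-preparation step. The function `build_asr_parallel_corpus`, the `(audio, target_text)` record layout, and the toy recognizer are assumptions, and no SMT training is shown.

```python
from typing import Callable, Dict, List, Tuple


def build_asr_parallel_corpus(
    utterances: List[Tuple[bytes, str]],
    recognize: Callable[[bytes], str],
) -> List[Tuple[str, str]]:
    """Pair each ASR hypothesis with the human translation of the same utterance."""
    pairs: List[Tuple[str, str]] = []
    for audio, target_text in utterances:
        asr_text = recognize(audio)
        if asr_text:  # skip utterances where the recognizer returned nothing
            pairs.append((asr_text, target_text))
    return pairs


# Toy recognizer: a lookup table standing in for a real ASR system.
toy_asr: Dict[bytes, str] = {b"utt1": "where do you leave", b"utt2": ""}
utterances = [(b"utt1", "[reference translation 1]"),
              (b"utt2", "[reference translation 2]")]
pairs = build_asr_parallel_corpus(utterances, lambda audio: toy_asr.get(audio, ""))
print(pairs)
```

A translation model trained on such pairs sees the recognizer's typical errors on the source side, so it can learn to translate through them.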

  • S2S In-line Translation

    CMU-INESC (Portugal) project

    Translation of TED videos

    Align audio to give “dubbing” not “voiceover”

    Align: timing, breaks, focus across language