Speech-to-Speech MT JANUS C-STAR/Nespole! Lori Levin, Alon Lavie, Bob Frederking LTI Immigration Course September 11, 2000

Outline

• Problems in Speech-to-Speech MT

• The JANUS Approach

• The C-STAR/NESPOLE! Interlingua (IF)

• System Design and Engineering

• Evaluation and User Studies

• Open Problems, Current and Future Research

JANUS Speech Translation

• Translation via an interlingua representation

• Main translation engine is rule-based

• Semantic grammars

• Modular grammar design

• System engineered for multiple domains

• Incorporate alternative translation engines

The C-STAR Travel Planning Domain

General Scenario:

• Dialogue between one traveler and one or more travel agents

• Focus on making travel arrangements for a personal leisure trip (not business)

• Free spontaneous speech

The C-STAR Travel Planning Domain

Natural breakdown into several sub-domains:

• Hotel Information and Reservation

• Transportation Information and Reservation

• Information about Sights and Events

• General Travel Information

• Cross Domain

Semantic Grammars

• Describe structure of semantic concepts instead of syntactic constituency of phrases

• Well suited for task-oriented dialogue containing many fixed expressions

• Appropriate for spoken language - often disfluent and syntactically ill-formed

• Faster to develop reasonable coverage for limited domains

Semantic Grammars

Hotel Reservation Example:

Input: we have two hotels available

Parse Tree:

[give-information+availability+hotel]

(we have
   [hotel-type]
      ([quantity=] (two)
       [hotel] (hotels))
 available)
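To make the idea concrete, here is a minimal sketch of a top-down matcher over a toy semantic grammar that reproduces the tree above. The rule set is hypothetical and written only for this one sentence; it is not the actual JANUS hotel grammar, and the matcher does no skipping or backtracking across sibling alternatives.

```python
# Toy semantic grammar (hypothetical rules, for illustration only).
# Bracketed names are semantic concepts; everything else is a literal word.
GRAMMAR = {
    "[give-information+availability+hotel]": [
        ["we", "have", "[hotel-type]", "available"],
    ],
    "[hotel-type]": [["[quantity=]", "[hotel]"]],
    "[quantity=]": [["one"], ["two"], ["three"]],
    "[hotel]": [["hotel"], ["hotels"]],
}


def parse(symbol, words, pos):
    """Match `symbol` starting at words[pos]; return (tree, next_pos) or None."""
    if symbol not in GRAMMAR:
        # Literal word: it must equal the next input word.
        if pos < len(words) and words[pos] == symbol:
            return symbol, pos + 1
        return None
    for rhs in GRAMMAR[symbol]:
        children, cur = [], pos
        for part in rhs:
            result = parse(part, words, cur)
            if result is None:
                break  # this alternative fails, try the next one
            child, cur = result
            children.append(child)
        else:
            # Every part of this alternative matched.
            return (symbol, children), cur
    return None


words = "we have two hotels available".split()
tree, end = parse("[give-information+availability+hotel]", words, 0)
print(tree)
```

The nodes of the resulting tree are concepts such as [hotel-type] and [quantity=], not syntactic categories such as NP or VP.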

The JANUS-III Translation System

The SOUP Parser

• Specifically designed to parse spoken language using domain-specific semantic grammars

• Robust - can skip over disfluencies in input

• Stochastic - probabilistic CFG encoded as a collection of recursive transition networks (RTNs) with arc probabilities

• Top-Down - parses from top-level concepts of the grammar down to matching of terminals

• Chart-based - dynamic matrix of parse DAGs indexed by start and end positions and head category

The SOUP Parser

• Supports parsing with large multiple domain grammars

• Produces a lattice of parse analyses headed by top-level concepts

• Disambiguation heuristics rank the analyses in the parse lattice and select a single best path through the lattice

• Graphical grammar editor

SOUP Disambiguation Heuristics

• Maximize coverage (of input)

• Minimize number of parse trees (fragmentation)

• Minimize number of parse tree nodes

• Minimize the number of wild-card matches

• Maximize the probability of parse trees

• Find sequence of domain tags with maximal probability given the input words: P(T|W), where T = t1, t2, …, tn is a sequence of domain tags
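One way to picture how such heuristics could be combined is a lexicographic comparison over candidate analyses. The sketch below assumes the criteria are applied strictly in the order listed, which the slides do not state; the `Analysis` fields and the sample numbers are invented for illustration, and the domain-tag model P(T|W) is left out.

```python
from dataclasses import dataclass


@dataclass
class Analysis:
    """Summary statistics of one candidate analysis (hypothetical fields)."""
    name: str
    words_covered: int      # input words covered by the parse trees
    num_trees: int          # number of parse trees (fragmentation)
    num_nodes: int          # total parse tree nodes
    wildcard_matches: int   # number of wild-card matches
    probability: float      # probability of the parse trees


def rank_key(a):
    # Smaller tuples are better, so maximized quantities are negated.
    return (-a.words_covered, a.num_trees, a.num_nodes,
            a.wildcard_matches, -a.probability)


def best_analysis(candidates):
    """Pick the candidate that wins the lexicographic comparison."""
    return min(candidates, key=rank_key)


candidates = [
    Analysis("partial", 3, 1, 3, 0, 0.30),
    Analysis("fragmented", 5, 2, 7, 0, 0.05),
    Analysis("full", 5, 1, 6, 0, 0.02),
]
print(best_analysis(candidates).name)  # full
```

In this toy ordering, coverage dominates: the "full" analysis wins even though it has the lowest probability, because it covers all five words with a single tree.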

JANUS Generation Modules

Two alternative generation modules:

• Top-Down context-free based generator - fast, used for English and Japanese

• GenKit - unification-based generator augmented with Morphe morphology module - used for German

Modular Grammar Design

• Grammar development separated into modules corresponding to sub-domains (Hotel, Transportation, Sights, General Travel, Cross Domain)

• Shared core grammar for lower-level concepts that are common to the various sub-domains (e.g. times, prices)

• Grammars can be developed independently (using shared core grammar)

• Shared and Cross-Domain grammars significantly reduce effort in expanding to new domains

• Separate grammar modules facilitate associating parses with domain tags - useful for multi-domain integration within the parser

Translation with Multiple Domain Grammars

• Parser is loaded with all domain grammars

• Domain tag attached to grammar rules of each domain

• Previously developed grammars for other domains can also be incorporated

• Parser creates a parse lattice consisting of multiple analyses of the input into sequences of top-level domain concepts

• Parser disambiguation heuristics rank the analyses in the parse lattice and select a single best sequence of concepts
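Selecting a single best sequence of concepts from a lattice can be sketched as a dynamic program over word positions. This is only an illustration under simplifying assumptions: each lattice edge carries one additive score (for example a log probability), every edge spans at least one word, and the edge labels and scores below are made up. SOUP's actual ranking combines several heuristics.

```python
def best_path(edges, n):
    """Return (score, labels) of the best edge sequence covering words 0..n.

    `edges` is a list of (start, end, label, score) with start < end.
    Scores are added along a path; higher is better.
    """
    best = {0: (0.0, [])}  # best[i]: best way to cover words[0:i]
    for end in range(1, n + 1):
        for start, e, label, score in edges:
            if e != end or start not in best:
                continue
            cand = (best[start][0] + score, best[start][1] + [label])
            if end not in best or cand[0] > best[end][0]:
                best[end] = cand
    return best.get(n)  # None if no path covers the whole input


# Hypothetical lattice for a five-word input; labels are "domain:concept".
edges = [
    (0, 5, "hotel:give-information+availability+hotel", -1.0),
    (0, 2, "cross:fragment", -0.4),
    (2, 5, "hotel:fragment", -0.9),
]
score, labels = best_path(edges, 5)
print(labels)
```

Because every edge is tagged with its domain, the winning path also yields the sequence of domain tags for the utterance.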

Translation with Multiple Domain Grammars

A SOUP Parse Lattice

Alternative Approaches: SALT

SALT - Statistical Analyzer for Language Translation

• Combines ML-trainable and rule-based analysis methods for robustness and portability

• Rule-based parsing restricted to a well-defined set of argument-level phrases and fragments

• Trainable classifiers (NN, Decision Trees, etc.) used to derive the DA (speech-act and concepts) from the sequence of argument concepts

• Phrase-level grammars are more robust and portable to new domains

SALT Approach

• Example:

Input: we have two hotels available

Arg-SOUP: [exist] [hotel-type] [available]

SA-Predictor: give-information

Concept-Predictor: availability+hotel

• Predictors using SOUP argument concepts and input words

• Preliminary results are encouraging
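The two-stage idea can be sketched as follows: features are built from the argument concepts and the input words, and a trained predictor maps them to a speech act or a concept sequence. The predictor here is a trivial nearest-neighbor stand-in, not the NN or decision-tree classifiers mentioned earlier, and the training examples are invented for illustration.

```python
def features(arg_concepts, words):
    """Feature set from argument concepts plus input words (prefixed 'w:')."""
    return set(arg_concepts) | {"w:" + w for w in words}


def jaccard(a, b):
    """Set overlap between two feature sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class NearestNeighborPredictor:
    """Stand-in for a trainable classifier: returns the label of the
    most similar training example."""

    def __init__(self):
        self.examples = []

    def train(self, feats, label):
        self.examples.append((feats, label))

    def predict(self, feats):
        return max(self.examples, key=lambda ex: jaccard(ex[0], feats))[1]


# Hypothetical training data: (argument concepts, words, speech act, concepts).
data = [
    (["[exist]", "[hotel-type]", "[available]"],
     "we have two hotels available".split(),
     "give-information", "availability+hotel"),
    (["[hotel-type]", "[price]"],
     "how much is the hotel".split(),
     "request-information", "price+hotel"),
]

sa_predictor = NearestNeighborPredictor()
concept_predictor = NearestNeighborPredictor()
for args, words, speech_act, concepts in data:
    f = features(args, words)
    sa_predictor.train(f, speech_act)
    concept_predictor.train(f, concepts)

query = features(["[exist]", "[hotel-type]", "[available]"],
                 "we have three hotels available".split())
print(sa_predictor.predict(query) + "+" + concept_predictor.predict(query))
```

Only the argument-level grammar has to cover the input; the mapping to the full DA is learned from labeled examples.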

Alternative Approaches: MEMT

Glossary-based Translation

• Translates directly into target language (no IF)

• Based on Pangloss translation system developed at CMU

• Uses a combination of EBMT, phrase glossaries and a bilingual dictionary

• English/German system operational

• Good fall-back for uncovered utterances

User Studies

• We conducted three sets of user tests

• Travel agent played by experienced system user

• Traveler is played by a novice and given five minutes of instruction

• Traveler is given a general scenario - e.g., plan a trip to Heidelberg

• Communication only via ST system, multi-modal interface and muted video connection

• Data collected used for system evaluation, error analysis and then grammar development

System Evaluation Methodology

• End-to-end evaluations conducted at the SDU (sentence) level

• Multiple bilingual graders compare the input with translated output and assign a grade of: Perfect, OK or Bad

• OK = meaning of SDU comes across

• Perfect = OK + fluent output

• Bad = translation incomplete or incorrect
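The two reported figures, "OK+Perfect" and "Perfect", follow directly from the grade counts. A minimal sketch, assuming one final grade per SDU (how grades from multiple graders are combined is not specified here) and using made-up grades:

```python
from collections import Counter


def summarize(grades):
    """Percentage of SDUs graded acceptable (OK or Perfect) and Perfect."""
    counts = Counter(grades)
    total = len(grades)
    perfect = counts["Perfect"]
    acceptable = perfect + counts["OK"]  # Perfect implies OK
    return {
        "OK+Perfect": 100.0 * acceptable / total,
        "Perfect": 100.0 * perfect / total,
    }


# Toy grades for four SDUs, not real evaluation data.
print(summarize(["Perfect", "OK", "Bad", "Perfect"]))
# {'OK+Perfect': 75.0, 'Perfect': 50.0}
```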

August-99 Evaluation

• Data from latest user study - traveler planning a trip to Japan

• 132 utterances containing one or more SDUs, from six different users

• SR word error rate 14.7%

• 40.2% of utterances contain recognition error(s)
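For reference, word error rate is conventionally computed as the word-level edit distance (substitutions + deletions + insertions) between the recognizer output and the reference transcript, divided by the number of reference words. The sketch below shows that standard definition; it assumes a non-empty reference and whitespace tokenization, and the sample sentences are illustrative only.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


print(word_error_rate("we have two hotels available",
                      "we have to hotels available"))  # 0.2
```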

Evaluation Results

Method            Output Language  OK+Perfect  Perfect

SOUP-Transcribed  English          74%         54%

SOUP-Recognition  English          59%         42%

SOUP-Transcribed  Japanese         77%         59%

SOUP-Recognition  Japanese         62%         45%

SOUP-Transcribed  German           70%         39%

SOUP-Recognition  German           58%         34%

Evaluation - Progress Over Time

Method OK+Perfect Perfect

Jan-99 Transcribed 69% 46%

Apr-99 Transcribed 70% 49%

Aug-99 Transcribed 74% 54%

Jan-99 Recognition 55% 36%

Apr-99 Recognition 57% 38%

Aug-99 Recognition 59% 42%

NESPOLE!

• Speech-to-speech translation for eCommerce

– CMU, Karlsruhe, IRST, CLIPS, 2 commercial partners

• Improved limited-domain speech translation

• Experiment with multimodality and with MEMT

• EU-side has strict scheduling and deliverables

– First test domain: Italian travel agency

– Second "showcase": international help desk

• Tied in to C-STAR-III

C-STAR-III

• Partners: ATR, CMU, CLIPS, ETRI, IRST, UKA

• Main Research Goals:

– Expandability - towards unlimited domains

– Accessibility - Speech Translation over wireless phone

– Usability - real service for real users

LingWear for the Information Warrior

New Ideas

• The pre-development of appropriate interlingua representations for domains of interest facilitates generation into a new language within two weeks.

• The development of new MT engines (e.g. learnable transfer rules) and improved multi-engine integration supports rapid deployment of MT for a new language with scarce resources.

• Gisting and summarization in the source language followed by MT is better than vice versa.

Carnegie Mellon University School of Computer Science: A. Waibel, L. Levin, A. Lavie, R. Frederking

Impact

• Allow military and relief organizations to converse in limited domains of interest with the local population in an area of conflict and/or disaster

• Allow military and other operatives in the field to assimilate foreign language information they encounter on-the-move

• Rapidly port and deploy the technology into new languages with scarce resources

Schedule

Timeline: 9/00, 12/00, 9/01, 9/02

Milestones:

• Baseline MT systems ready

• Baseline summarizer ready

• Port to second language

• Port to third language

Current and Future Work

• Expanding the travel domain: covering descriptive as well as task-oriented sentences

• Development of the SALT statistical approach and expanding it to other domains

• Full integration of multiple MT approaches: SOUP, SALT, Pangloss

• Task-based evaluation

• Disambiguation: improved sentence-level disambiguation; applying discourse contextual information for disambiguation

Students Working on the Project

• Chad Langley: improved SALT approach

• Dorcas Wallace: DA disambiguation using decision trees, English grammars

• Taro Watanabe: DA correction and disambiguation using Transformation-based Learning, Japanese grammars

• Ariadna Font-Llitjos: Spanish Generation

The JANUS/C-STAR/Nespole! Team

• Project Leaders: Lori Levin, Alon Lavie, Alex Waibel, Bob Frederking

• Grammar and Component Developers: Donna Gates, Dorcas Wallace, Kay Peterson, Chad Langley, Taro Watanabe, Celine Morel, Susie Burger, Vicky Maclaren, Dan Schneider
