Speech-to-Speech MT JANUS C-STAR/Nespole! Lori Levin, Alon Lavie, Bob Frederking LTI Immigration Course September 11, 2000

Speech-to-Speech MT

JANUS C-STAR/Nespole!

Lori Levin, Alon Lavie, Bob Frederking

LTI Immigration Course

September 11, 2000

Outline

• Problems in Speech-to-Speech MT

• The JANUS Approach

• The C-STAR/NESPOLE! Interlingua (IF)

• System Design and Engineering

• Evaluation and User Studies

• Open Problems, Current and Future Research

JANUS Speech Translation

• Translation via an interlingua representation

• Main translation engine is rule-based

• Semantic grammars

• Modular grammar design

• System engineered for multiple domains

• Incorporate alternative translation engines

The C-STAR Travel Planning Domain

General Scenario:

• Dialogue between one traveler and one or more travel agents

• Focus on making travel arrangements for a personal leisure trip (not business)

• Free spontaneous speech

The C-STAR Travel Planning Domain

Natural breakdown into several sub-domains:

• Hotel Information and Reservation

• Transportation Information and Reservation

• Information about Sights and Events

• General Travel Information

• Cross Domain

Semantic Grammars

• Describe structure of semantic concepts instead of syntactic constituency of phrases

• Well suited for task-oriented dialogue containing many fixed expressions

• Appropriate for spoken language - often disfluent and syntactically ill-formed

• Faster to develop reasonable coverage for limited domains

Semantic Grammars

Hotel Reservation Example:

Input: we have two hotels available

Parse Tree:

[give-information+availability+hotel]
(we have
  [hotel-type]
    ([quantity=] (two)
     [hotel] (hotels))
  available)
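The parse tree above can be held in a small nested data structure. The following is a minimal sketch, not the JANUS representation: the `Node` class and its `words` and `concepts` methods are hypothetical names introduced here for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    """One concept node of a semantic-grammar parse tree (hypothetical class)."""
    label: str                                   # concept label, e.g. "[hotel-type]"
    children: List[Union["Node", str]] = field(default_factory=list)

    def words(self) -> List[str]:
        """Return the input words covered by this node, in order."""
        out: List[str] = []
        for child in self.children:
            out.extend(child.words() if isinstance(child, Node) else [child])
        return out

    def concepts(self) -> List[str]:
        """Return all concept labels in this subtree, depth-first."""
        out = [self.label]
        for child in self.children:
            if isinstance(child, Node):
                out.extend(child.concepts())
        return out


# The hotel reservation example from the slide.
tree = Node("[give-information+availability+hotel]", [
    "we", "have",
    Node("[hotel-type]", [
        Node("[quantity=]", ["two"]),
        Node("[hotel]", ["hotels"]),
    ]),
    "available",
])

print(" ".join(tree.words()))   # we have two hotels available
print(tree.concepts())
```

Note that the tree nodes are semantic concepts rather than syntactic categories: there is no NP or VP node, only the top-level concept and its arguments.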

The JANUS-III Translation System

The SOUP Parser

• Specifically designed to parse spoken language using domain-specific semantic grammars

• Robust - can skip over disfluencies in input

• Stochastic - probabilistic CFG encoded as a collection of RTNs with arc probabilities

• Top-Down - parses from top-level concepts of the grammar down to matching of terminals

• Chart-based - dynamic matrix of parse DAGs indexed by start and end positions and head category

The SOUP Parser

• Supports parsing with large multiple domain grammars

• Produces a lattice of parse analyses headed by top-level concepts

• Disambiguation heuristics rank the analyses in the parse lattice and select a single best path through the lattice

• Graphical grammar editor

SOUP Disambiguation Heuristics

• Maximize coverage (of input)

• Minimize number of parse trees (fragmentation)

• Minimize number of parse tree nodes

• Minimize the number of wild-card matches

• Maximize the probability of parse trees

• Find sequence of domain tags with maximal probability given the input words: P(T|W), where T = t1,t2,…,tn is a sequence of domain tags
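One way to picture how the first five heuristics could interact is a lexicographic ranking key. The slides do not say how SOUP combines the heuristics, so the priority order below is an assumption, and the `Analysis` class, its fields, and the candidate values are all hypothetical. The domain-tag model P(T|W) is left out of this sketch.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Analysis:
    """Summary statistics of one candidate analysis (hypothetical fields)."""
    name: str
    words_covered: int    # input words covered by parse trees
    num_trees: int        # number of parse trees (fragmentation)
    num_nodes: int        # total parse tree nodes
    wildcards: int        # wild-card matches used
    log_prob: float       # log probability of the parse trees


def rank_key(a: Analysis) -> Tuple[int, int, int, int, float]:
    # Smaller key is better. The lexicographic priority order is an
    # assumption; the slide lists the heuristics without weighting them.
    return (-a.words_covered, a.num_trees, a.num_nodes, a.wildcards, -a.log_prob)


def best_analysis(candidates: List[Analysis]) -> Analysis:
    return min(candidates, key=rank_key)


# Made-up candidates for illustration only.
candidates = [
    Analysis("fragmented", 5, 3, 9, 0, -4.0),
    Analysis("single-tree", 5, 1, 7, 0, -5.5),
    Analysis("partial", 4, 1, 5, 0, -2.0),
]

print(best_analysis(candidates).name)   # single-tree
```

Here "partial" loses on coverage even though it is the most probable, and "fragmented" loses to "single-tree" on the number of parse trees.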

JANUS Generation Modules

Two alternative generation modules:

• Top-Down context-free based generator - fast, used for English and Japanese

• GenKit - unification-based generator augmented with Morphe morphology module - used for German

Modular Grammar Design

• Grammar development separated into modules corresponding to sub-domains (Hotel, Transportation, Sights, General Travel, Cross Domain)

• Shared core grammar for lower-level concepts that are common to the various sub-domains (e.g. times, prices)

• Grammars can be developed independently (using shared core grammar)

• Shared and Cross-Domain grammars significantly reduce effort in expanding to new domains

• Separate grammar modules facilitate associating parses with domain tags - useful for multi-domain integration within the parser

Translation with Multiple Domain Grammars

• Parser is loaded with all domain grammars

• Domain tag attached to grammar rules of each domain

• Previously developed grammars for other domains can also be incorporated

• Parser creates a parse lattice consisting of multiple analyses of the input into sequences of top-level domain concepts

• Parser disambiguation heuristics rank the analyses in the parse lattice and select a single best sequence of concepts
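Selecting a single best sequence of concepts from a parse lattice can be sketched as a dynamic program over word positions. This is a simplified illustration, not the SOUP algorithm: it scores paths only by coverage and then by number of trees, the `Edge` type and `best_path` function are hypothetical, and every lattice entry except the slide's hotel example is a made-up toy label.

```python
from typing import List, NamedTuple, Tuple


class Edge(NamedTuple):
    """A top-level analysis spanning words[start:end] (hypothetical type)."""
    start: int
    end: int          # exclusive
    concept: str
    domain: str       # domain tag of the grammar that produced the analysis


State = Tuple[int, int, List[Edge]]   # (words covered, -number of trees, path)


def best_path(n_words: int, edges: List[Edge]) -> List[Edge]:
    """Pick non-overlapping edges maximizing coverage, then minimizing trees."""
    best: List[State] = [(0, 0, [])]
    for i in range(1, n_words + 1):
        cand = best[i - 1]                       # option: skip word i-1
        for e in edges:
            if e.end == i and 0 <= e.start < e.end:
                cov, neg_trees, path = best[e.start]
                alt = (cov + e.end - e.start, neg_trees - 1, path + [e])
                if alt[:2] > cand[:2]:
                    cand = alt
        best.append(cand)
    return best[n_words][2]


words = "we have two hotels available thank you".split()
lattice = [
    Edge(0, 2, "toy-fragment-1", "cross-domain"),                  # made up
    Edge(2, 5, "toy-fragment-2", "hotel"),                         # made up
    Edge(0, 5, "give-information+availability+hotel", "hotel"),    # from the slides
    Edge(5, 7, "toy-closing", "cross-domain"),                     # made up
]

for e in best_path(len(words), lattice):
    print(e.domain, e.concept, " ".join(words[e.start:e.end]))
```

Both candidate paths cover all seven words, so the two-tree path wins over the three-tree path on fragmentation.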

A SOUP Parse Lattice

Alternative Approaches: SALT

SALT - Statistical Analyzer for Language Translation

• Combines ML-trainable and rule-based analysis methods for robustness and portability

• Rule-based parsing restricted to a well-defined set of argument-level phrases and fragments

• Trainable classifiers (NN, Decision Trees, etc.) used to derive the DA (speech-act and concepts) from the sequence of argument concepts

• Phrase-level grammars are more robust and portable to new domains

SALT Approach

• Example:

Input: we have two hotels available

Arg-SOUP: [exist] [hotel-type] [available]

SA-Predictor: give-information

Concept-Predictor: availability+hotel

• Predictors using SOUP argument concepts and input words

• Preliminary results are encouraging
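The two-stage idea can be sketched as follows: argument concepts from the phrase-level parser feed a trainable predictor that outputs the speech act and the concepts. The predictor below is a toy stand-in, a one-nearest-neighbour lookup by Jaccard similarity, and is not one of the classifiers used in SALT. The names `TRAIN`, `jaccard`, and `predict_da` are hypothetical, and the second training example is made up; only the first comes from the slide.

```python
from typing import FrozenSet, List, Tuple

# (argument concepts, speech act, concept sequence)
Example = Tuple[FrozenSet[str], str, str]

TRAIN: List[Example] = [
    # From the slide's example.
    (frozenset({"[exist]", "[hotel-type]", "[available]"}),
     "give-information", "availability+hotel"),
    # Made-up example, for illustration only.
    (frozenset({"[hotel-type]", "[price]"}),
     "request-information", "price+hotel"),
]


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Set overlap between two bags of argument concepts."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_da(arg_concepts: List[str]) -> str:
    """Return speech-act+concepts of the most similar training example."""
    feats = frozenset(arg_concepts)
    _, speech_act, concepts = max(TRAIN, key=lambda ex: jaccard(feats, ex[0]))
    return speech_act + ("+" + concepts if concepts else "")


print(predict_da(["[exist]", "[hotel-type]", "[available]"]))
# give-information+availability+hotel
```

The design point the sketch tries to show is the division of labour: the grammar only has to recognize argument-level phrases, while the mapping to the full DA is learned from data.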

Alternative Approaches: MEMT

Glossary-based Translation

• Translates directly into target language (no IF)

• Based on Pangloss translation system developed at CMU

• Uses a combination of EBMT, phrase glossaries and a bilingual dictionary

• English/German system operational

• Good fall-back for uncovered utterances

User Studies

• We conducted three sets of user tests

• Travel agent played by experienced system user

• Traveler is played by a novice and given five minutes of instruction

• Traveler is given a general scenario - e.g., plan a trip to Heidelberg

• Communication only via ST system, multi-modal interface and muted video connection

• Data collected used for system evaluation, error analysis and then grammar development

System Evaluation Methodology

• End-to-end evaluations conducted at the SDU (sentence) level

• Multiple bilingual graders compare the input with translated output and assign a grade of: Perfect, OK or Bad

• OK = meaning of SDU comes across

• Perfect = OK + fluent output

• Bad = translation incomplete or incorrect
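Given per-SDU grades, the two figures reported in the evaluation tables (OK+Perfect and Perfect) can be tallied as in the sketch below. The `summarize` function and the sample grades are hypothetical, and the sketch handles a single grader only, since the slides do not say how grades from multiple graders are combined.

```python
from collections import Counter
from typing import Dict, List

VALID_GRADES = {"Perfect", "OK", "Bad"}


def summarize(grades: List[str]) -> Dict[str, float]:
    """Return the percentage of acceptable (OK+Perfect) and Perfect SDUs."""
    if not grades:
        raise ValueError("no grades to summarize")
    counts = Counter(grades)
    unknown = set(counts) - VALID_GRADES
    if unknown:
        raise ValueError(f"unknown grades: {sorted(unknown)}")
    total = len(grades)
    perfect = counts["Perfect"]
    acceptable = perfect + counts["OK"]   # Perfect implies OK
    return {
        "OK+Perfect": 100.0 * acceptable / total,
        "Perfect": 100.0 * perfect / total,
    }


# Made-up grades for four SDUs, for illustration only.
print(summarize(["Perfect", "OK", "Bad", "Perfect"]))
# {'OK+Perfect': 75.0, 'Perfect': 50.0}
```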

August-99 Evaluation

• Data from latest user study - traveler planning a trip to Japan

• 132 utterances containing one or more SDUs, from six different users

• SR word error rate 14.7%

• 40.2% of utterances contain recognition error(s)

Evaluation Results

Method              Output Language   OK+Perfect   Perfect
SOUP-Transcribed    English           74%          54%
SOUP-Recognition    English           59%          42%
SOUP-Transcribed    Japanese          77%          59%
SOUP-Recognition    Japanese          62%          45%
SOUP-Transcribed    German            70%          39%
SOUP-Recognition    German            58%          34%

Evaluation - Progress Over Time

Method OK+Perfect Perfect

Jan-99 Transcribed 69% 46%

Apr-99 Transcribed 70% 49%

Aug-99 Transcribed 74% 54%

Jan-99 Recognition 55% 36%

Apr-99 Recognition 57% 38%

Aug-99 Recognition 59% 42%
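The trend in the table can be made explicit with a little arithmetic. The sketch below uses only the percentages from the table; the variable names are illustrative.

```python
# (OK+Perfect %, Perfect %) per evaluation, copied from the table above.
results = {
    "Transcribed": {"Jan-99": (69, 46), "Apr-99": (70, 49), "Aug-99": (74, 54)},
    "Recognition": {"Jan-99": (55, 36), "Apr-99": (57, 38), "Aug-99": (59, 42)},
}

# Gain in percentage points from the first to the last evaluation.
gains = {}
for condition, by_date in results.items():
    first, last = by_date["Jan-99"], by_date["Aug-99"]
    gains[condition] = (last[0] - first[0], last[1] - first[1])

# Points lost to speech recognition errors in the latest evaluation.
aug_t, aug_r = results["Transcribed"]["Aug-99"], results["Recognition"]["Aug-99"]
recognition_gap = (aug_t[0] - aug_r[0], aug_t[1] - aug_r[1])

print(gains)             # {'Transcribed': (5, 8), 'Recognition': (4, 6)}
print(recognition_gap)   # (15, 12)
```

Over the three evaluations, the Perfect rate improved more than the OK+Perfect rate under both conditions, and recognition errors still cost about 15 points of acceptable translations in August 1999.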

• Speech-to-speech translation for eCommerce

– CMU, Karlsruhe, IRST, CLIPS, 2 commercial partners

• Improved limited-domain speech translation

• Experiment with multimodality and with MEMT

• EU-side has strict scheduling and deliverables

– First test domain: Italian travel agency

– Second “showcase”: international Help desk

• Tied in to C-STAR-III

C-STAR-III

• Partners: ATR, CMU, CLIPS, ETRI, IRST, UKA

• Main Research Goals:

– Expandability - towards unlimited domains

– Accessibility - Speech Translation over wireless phone

– Usability - real service for real users

LingWear for the Information Warrior

New Ideas

• The pre-development of appropriate interlingua representations for domains of interest facilitates generation into a new language within two weeks.

• The development of new MT engines (e.g. learnable transfer rules) and improved multi-engine integration supports rapid deployment of MT for a new language with scarce resources.

• Gisting and summarization in the source language followed by MT is better than vice versa.

Carnegie Mellon University School of Computer Science: A.Waibel, L. Levin, A. Lavie, R. Frederking

Impact

• Allow military and relief organizations to converse in limited domains of interest with the local population in an area of conflict and/or disaster

• Allow military and other operatives in the field to assimilate foreign-language information they encounter on the move

• Rapidly port and deploy the technology into new languages with scarce resources

Schedule

• Timeline dates: 9/00, 12/00, 9/01, 9/02

• Milestones on the timeline: baseline MT systems ready; baseline summarizer ready; port to second language; port to third language

Current and Future Work

• Expanding the travel domain: covering descriptive as well as task-oriented sentences

• Development of the SALT statistical approach and expanding it to other domains

• Full integration of multiple MT approaches: SOUP, SALT, Pangloss

• Task-based evaluation

• Disambiguation: improved sentence-level disambiguation; applying discourse contextual information for disambiguation

Students Working on the Project

• Chad Langley: improved SALT approach

• Dorcas Wallace: DA disambiguation using decision trees, English grammars

• Taro Watanabe: DA correction and disambiguation using Transformation-based Learning, Japanese grammars

• Ariadna Font-Llitjos: Spanish Generation

The JANUS/C-STAR/Nespole! Team

• Project Leaders: Lori Levin, Alon Lavie, Alex Waibel, Bob Frederking

• Grammar and Component Developers: Donna Gates, Dorcas Wallace, Kay Peterson, Chad Langley, Taro Watanabe, Celine Morel, Susie Burger, Vicky Maclaren, Dan Schneider