Speech-to-Speech Infrastructure Based on UIMA

Jan Kleindienst, Ph.D. (on behalf of TC_STAR partners)
Manager, Conversational Interactions and Architectures, IBM Prague

SpeechTek 2007 | August 20, 2007

© 2003 IBM Corporation

Overview

Challenges

Approach

The Resulting Infrastructure

Use Cases

Conclusion

What is a speech-to-speech system?

A speech-to-speech (S2S) system translates spoken input from a source language into a target language

Speech-to-speech systems typically consist of three main processing blocks:

– Transcription

– Translation

– Synthesis

Diagram: ASR → MT → TTS
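
As a rough illustration of the three processing blocks, the chain can be sketched as three functions composed in order. All function names and the tiny dictionary below are hypothetical stand-ins rather than real engines; the sketch only shows how data flows from ASR through MT to TTS.

```python
# Toy stand-ins for the three S2S blocks; real engines would replace each body.
# Every name here is an assumption made for illustration.

def transcribe(audio: bytes) -> str:
    """ASR stand-in: pretend the audio bytes are already UTF-8 text."""
    return audio.decode("utf-8")

# Hypothetical word-for-word dictionary (English to Spanish), illustration only.
TOY_DICTIONARY = {"good": "buenos", "morning": "días"}

def translate(source_text: str) -> str:
    """MT stand-in: word-by-word lookup; unknown words pass through unchanged."""
    words = source_text.lower().split()
    return " ".join(TOY_DICTIONARY.get(word, word) for word in words)

def synthesize(target_text: str) -> bytes:
    """TTS stand-in: return the text as bytes instead of real audio."""
    return target_text.encode("utf-8")

def speech_to_speech(audio: bytes) -> bytes:
    """Chain the blocks in the order ASR -> MT -> TTS."""
    return synthesize(translate(transcribe(audio)))

result = speech_to_speech(b"Good morning")
```

Keeping each block behind a plain function boundary is what allows one site's ASR to be swapped for another's without touching MT or TTS.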

Challenges

TC_STAR Project, 2004-2007, www.tc-star.org

Create an open technological infrastructure to support effective delivery of scientific results from the speech-to-speech research community

Online distributed speech-to-speech infrastructure for automatic performance evaluation of end-to-end systems as well as individual components

Open technological framework based on the open-source Unstructured Information Management Architecture (UIMA)

Key Challenge: Support Online System Combinations and Automatic Evaluations

Diagram: partner sites UPC, LIMSI, ELDA, ITC-Irst, UKA, RWTH, and IBM; how to connect them?

Approach: Pick an infrastructure that…

…specifies a common data format understood by all speech-to-speech components

…has well-defined APIs that let the engines pass data in and read it out

…transparently takes care of network and local connectivity options

…requires only minimal coding to plug proprietary engines into the infrastructure

UIMA Component Model (how it meets each requirement, in the same order):

– Common MUMA Type System

– initialize(), process(), destroy(), …

– Java/C++/… local calls or SOAP and Vinci

– Concept of UIMA Annotators
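
The initialize(), process(), destroy() lifecycle can be pictured with a small sketch. This is a hypothetical Python mirror of the idea, not the UIMA API itself (which is defined for Java and C++); the class names, the dict used as a shared data container, and the `input_field` parameter are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod

class Annotator(ABC):
    """Hypothetical mirror of the initialize/process/destroy lifecycle."""

    def initialize(self, config: dict) -> None:
        # Called once before any data arrives; the default just keeps the config.
        self.config = config

    @abstractmethod
    def process(self, cas: dict) -> None:
        """Read inputs from the shared container and write results back into it."""

    def destroy(self) -> None:
        # Called once at shutdown; the default has nothing to release.
        pass

class UppercaseAnnotator(Annotator):
    """Toy component: reads the field named in its config, writes 'upper'."""

    def process(self, cas: dict) -> None:
        cas["upper"] = cas[self.config["input_field"]].upper()

def run(annotator: Annotator, config: dict, items: list) -> list:
    """Drive one annotator through its whole lifecycle over a list of containers."""
    annotator.initialize(config)
    try:
        for cas in items:
            annotator.process(cas)
    finally:
        annotator.destroy()  # always runs, even if process() raises
    return items

out = run(UppercaseAnnotator(), {"input_field": "text"}, [{"text": "hola"}])
```

Because the driver only ever calls these three methods, it does not need to know whether the component behind them is local or remote.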

Unstructured Information Management Architecture (UIMA)

What is UIMA?

In Business Terms => the Analysis Bridge between unstructured and structured information

In Technical Terms => infrastructure for integrating, processing, and managing data across all kinds of data-driven engines, including support for monitoring

Key features

UIMA is an emerging standard for text and media processing

UIMA SDK is open source under the Apache license

UIMA infrastructure supports interoperability between platforms, with component interfacing via Java, C++, Python, Perl, and remote/networked services

Offers a simple XML-based integration with UIMA APIs

Distributed data exchange that supports complex data structures

Diagram: Unstructured Information (inefficient search) → Analysis Bridge → Structured Information (efficient search)

How to make components UIMA-pluggable?

Step 1: Implement the required Annotator interface => initialize() & process() methods

Step 2: Specify the Component Descriptor XML file for configuration and lifecycle

Step 3: Define the input and output data structures of the Type System

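Diagram: a proprietary engine enclosed in wrapper code that exposes it as a UIMA Annotator, configured by a component descriptor; a CAS (data plus meta-data) goes in and a CAS (data plus meta-data) comes out.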
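
The three steps can be sketched in miniature as follows. Everything here is an assumption made for illustration: `ProprietaryAsr` stands in for a vendor engine, the descriptor is a plain dict rather than the XML file UIMA actually uses, the model path is made up, and the CAS is reduced to a dict whose keys play the role of type-system entries.

```python
class ProprietaryAsr:
    """Hypothetical stand-in for a vendor engine with its own API."""

    def load_model(self, path: str) -> None:
        self.model = path

    def decode(self, pcm: bytes) -> str:
        return f"<{len(pcm)} bytes decoded with {self.model}>"

# Step 2 (sketch): configuration plus declared inputs and outputs.
# Real UIMA uses an XML component descriptor; this dict is only an analogy.
DESCRIPTOR = {
    "name": "ToyAsrAnnotator",
    "parameters": {"model_path": "models/en.bin"},  # made-up path
    "inputs": ["pcm"],
    "outputs": ["source_text"],
}

class AsrWrapper:
    """Step 1 (sketch): wrapper code adapting the engine to annotator methods."""

    def initialize(self, descriptor: dict) -> None:
        self.descriptor = descriptor
        self.engine = ProprietaryAsr()
        self.engine.load_model(descriptor["parameters"]["model_path"])

    def process(self, cas: dict) -> None:
        # Step 3 (sketch): check the declared inputs before calling the engine.
        missing = [key for key in self.descriptor["inputs"] if key not in cas]
        if missing:
            raise KeyError(f"CAS is missing declared inputs: {missing}")
        cas["source_text"] = self.engine.decode(cas["pcm"])

wrapper = AsrWrapper()
wrapper.initialize(DESCRIPTOR)
cas = {"pcm": b"\x00\x01\x02\x03"}
wrapper.process(cas)
```

The engine itself is untouched; only the thin wrapper knows about both the vendor API and the shared container.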

TC_STAR Speech-to-Speech Evaluation Infrastructure

Diagram: a Collection Processing Engine passes a CAS through a chain of components (Download, ASR, SLT, TTS, Upload, Evaluation), each consisting of wrapper code behind the Annotator API and registered with the Vinci Name Service. Evaluation input data arrives over HTTP; the chain produces evaluation results and evaluation reports.

CAS contents as the data moves along the chain:

– pcm

– pcm, source text

– pcm, source text, target text

– pcm, source text, target text, target audio

– pcm, source text, target text, target audio, evaluation
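
The way the CAS accumulates data along the chain (pcm, then source text, target text, target audio, and finally evaluation) can be sketched with a dict that each stage extends. The stage functions and their placeholder outputs below are hypothetical; only the field names come from the slide.

```python
# Each stage is a toy function that adds exactly one field to the CAS (a dict here).
# The outputs are placeholders, not real recognition or translation results.

def asr(cas: dict) -> None:
    cas["source text"] = f"transcript of {len(cas['pcm'])} samples"

def slt(cas: dict) -> None:
    cas["target text"] = f"translation of [{cas['source text']}]"

def tts(cas: dict) -> None:
    cas["target audio"] = cas["target text"].encode("utf-8")

def evaluate(cas: dict) -> None:
    # Placeholder result: a real evaluation would compare against references.
    cas["evaluation"] = {"stages_completed": 3}

def run_chain(cas: dict, stages: list) -> list:
    """Run the stages in order and record which fields the CAS holds after each."""
    history = [list(cas)]
    for stage in stages:
        stage(cas)
        history.append(list(cas))
    return history

history = run_chain({"pcm": [0, 1, 2]}, [asr, slt, tts, evaluate])
for fields in history:
    print(", ".join(fields))
```

Since no stage removes anything, the final CAS carries every intermediate result, which is what makes evaluating individual components as well as the whole chain possible.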

TC_STAR Speech-to-Speech pan-European deployment

Diagram: annotators hosted at partner sites (UPC, LIMSI, ELDA, ITC-Irst, UKA, RWTH, IBM) and tied together by a CPE, a Vinci nameserver, a Data Webserver, and a Control Web Server. Components shown: Download, ASR (several instances), ASR Rover, Punctuator, SLT, TTS, EvalSLT, Upload. Legend: Annotator (UIMA/other).

Profile 1: ASR->SLT->TTS->EVAL (with ASR ROVER)

Profile 2: ASR->SLT->TTS->EVAL in a different setup
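
Profile 1 combines the output of several ASR engines with ROVER. The sketch below is a drastically simplified stand-in for that idea: real ROVER first aligns the hypotheses into a word network before voting, whereas this version assumes all hypotheses already have the same number of words and simply takes a majority vote per position. The function name and example sentences are made up.

```python
from collections import Counter

def vote(hypotheses: list) -> str:
    """Position-wise majority vote over equally long word sequences.

    Simplification: no alignment step, so all hypotheses must have the
    same number of words. This is not the full ROVER algorithm.
    """
    token_lists = [hypothesis.split() for hypothesis in hypotheses]
    if len({len(tokens) for tokens in token_lists}) != 1:
        raise ValueError("this sketch requires hypotheses of equal length")
    # zip(*token_lists) yields one column of candidate words per position.
    # On a tie, most_common keeps first-seen order, so earlier systems win.
    return " ".join(
        Counter(column).most_common(1)[0][0] for column in zip(*token_lists)
    )

combined = vote([
    "the house is read",
    "the horse is red",
    "the house is red",
])
```

Each engine makes a different single-word error in this example, and the vote recovers a sentence that none of the erroneous hypotheses contained in full.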

UIMA Web Control Console

Distributed Logging and Monitoring

AJAX infrastructure

Current user and status

Annotator combination in use for the experiment

Experiment ID and the set of input data

Links to graphical speech-to-speech evaluation results

UIMA Web Control Console

Processing engine

Path of completed processing

Engine where the data are currently processed

Indication of active engine

Lessons learned…

Pain in placing machines on public IPs

– Firewall configuration for all participating machines, local IT people ;-)

Need to support a variety of Linux distributions to host UIMA …

– Partially eliminated by UIMA school development warm up

Variety of programming languages for writing Annotators

– Java, C++, Perl, Python, …

Broad requirements on the Common Type System

– Punctuation, Casing, Lattices

Support for individual secure data download/upload at the data server

– Authentication, HTTPS, Firewall rules

Web console for controlling the evaluation lifecycle

– Concept of profiles, experiment IDs, monitoring

Remote logging and debugging

– Distributed logging capabilities, logging to Web console

Reliability of components and networks

Speech-to-Speech Showcases

UIMA S2S Evaluation Web Portal

The video demonstrates how S2S portal users (e.g. S2S researchers) set up, test, and evaluate speech-to-speech chains consisting of individual text and media processing components such as ASR, machine translation, TTS, etc. These components, called Annotators in UIMA jargon, are exported as Web services on the public Internet and glued together by UIMA. More than 15 annotators are currently exported by IBM and EU institutes and universities.

http://www.tc-star.org/Demo/ibm/web_console_batch.swf

UIMA S2S Translation Video Console

The individual Web service components can be assembled online into remote services that provide direct value to citizens. We show a video console that translates from English to Spanish (EU parliamentary domain). Note that the three Web services involved (ASR, MT, TTS) are hosted at three different sites hundreds of kilometers apart and glued together by UIMA.

http://www.tc-star.org/Demo/ibm/video_console_near_real_time.swf

Conclusion

First-of-a-kind online multi-partner speech-to-speech system demonstrated on UIMA (Jun 06-May 07)

Remote speech-to-speech components dynamically combined via the UIMA infrastructure to support different combinations, e.g. ROVER

– Annotators hosted on public IPs of partners’ sites

– The framework controlled via UIMA Web AJAX infrastructure

The open infrastructure is used to automatically set up and evaluate individual components as well as end-to-end systems

Designed to support various use cases from research experiments to technology showcasing