Prof. Wolfgang Wahlster

German Research Center for Artificial Intelligence, DFKI GmbH

Stuhlsatzenhausweg 3, 66123 Saarbruecken, Germany

phone: (+49 681) 302-5252/4162
fax: (+49 681) 302-5341
e-mail: [email protected]

WWW: http://www.dfki.de/~wahlster

Brain and Communication, Mainz

Friday, 24 November 2000

Computers that read, hear and understand

© Wolfgang Wahlster, DFKI

Pervasive Speech and Language Technology

Speech-controlled coffee machine: "A cappuccino in 10 minutes, please!"

Dictation: "Send the following email to Mark Maybury: Hi Mark, please forward the following agenda to your project partners!"

Speech-based car navigation: "Let's go to Baker Street in Berkeley!"

Speech-enabled music selection: "I would like to hear Mozart's Piano Concerto No. 3!"

Pervasive Speech and Language Technology

Information on demand: "Show me all CNN news of the last 3 months that feature Bill Clinton discussing health care!"

Audio Mining: "What has Jim Hendler said about DAML during our recent Dagstuhl seminar?"

Speech-to-Speech Translation: "I would like to make an appointment with Dr. Kuremastu in Kyoto next week!"

Three Levels of Language Processing

Speech Input

Speech Recognition: What has the speaker said? (100 alternatives)
Draws on: Acoustic Language Models, Word Lists

Speech Analysis: What has the speaker meant? (10 alternatives)
Draws on: Grammar, Lexical Meaning

Speech Understanding: What does the speaker want? (Unambiguous understanding in the dialog context)
Draws on: Discourse Context, Knowledge about Domain of Discourse

Reduction of Uncertainty from level to level
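
The reduction of uncertainty across the three levels can be sketched as a filter-then-rank pipeline. This is only a toy illustration, not the Verbmobil implementation: the function `reduce_uncertainty`, the grammar check, the context score and the example hypotheses are all hypothetical stand-ins.

```python
from typing import Callable, List

Hypothesis = str


def reduce_uncertainty(
    hypotheses: List[Hypothesis],
    is_grammatical: Callable[[Hypothesis], bool],
    context_score: Callable[[Hypothesis], float],
) -> Hypothesis:
    """Narrow many recognizer alternatives down to a single reading."""
    # Level 1 (speech recognition) is assumed to have produced `hypotheses`.
    # Level 2 (speech analysis): keep only alternatives the grammar accepts.
    analysed = [h for h in hypotheses if is_grammatical(h)]
    if not analysed:
        # Stay robust: fall back to the raw alternatives if nothing parses.
        analysed = hypotheses
    # Level 3 (speech understanding): choose the reading that fits the dialog context best.
    return max(analysed, key=context_score)


# Hypothetical toy data standing in for real recognizer output.
TOY_HYPOTHESES = [
    "when does the next train to Hamburg depart",
    "when does the next rain to Hamburg depart",
    "when does next the train Hamburg to depart",
]
ACCEPTED_BY_TOY_GRAMMAR = {TOY_HYPOTHESES[0], TOY_HYPOTHESES[1]}
TRAVEL_WORDS = {"train", "depart", "Hamburg"}


def toy_grammar(hypothesis: Hypothesis) -> bool:
    return hypothesis in ACCEPTED_BY_TOY_GRAMMAR


def toy_context(hypothesis: Hypothesis) -> float:
    # Count words that fit a travel-planning dialog.
    return float(sum(word in TRAVEL_WORDS for word in hypothesis.split()))


best = reduce_uncertainty(TOY_HYPOTHESES, toy_grammar, toy_context)
print(best)
```

The grammar removes the scrambled word order, and the travel context then prefers "train" over "rain".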

Challenges for Language Engineering

Each dimension is listed in order of increasing complexity.

Input Conditions: Close-Speaking Microphone/Headset, Push-to-talk; Telephone, Pause-based Segmentation; Open Microphone, GSM Quality

Naturalness: Isolated Words; Read Continuous Speech; Spontaneous Speech

Adaptability: Speaker Dependent; Speaker Independent; Speaker Adaptive

Dialog Capabilities: Monolog Dictation; Information-seeking Dialog; Multiparty Negotiation

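Context-Sensitive Speech-to-Speech Translation

Verbmobil Server

German: Wann fährt der nächste Zug nach Hamburg ab?
English: When does the next train to Hamburg depart?

German: Wo befindet sich das nächste Hotel?
English: Where is the nearest hotel?

The same German word "nächste" is rendered as "next" or "nearest" depending on the context.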
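
A minimal sketch of context-sensitive lexical choice for the two example sentences, assuming a hand-written sense table keyed by the semantic class of the modified noun. The tables `SENSES` and `NOUN_CLASS` and the function `translate_adjective` are hypothetical and say nothing about how the Verbmobil server makes this decision internally.

```python
from typing import Dict, Optional, Tuple

# Hypothetical sense table: (German adjective, class of the noun it modifies) -> English word.
SENSES: Dict[Tuple[str, Optional[str]], str] = {
    ("nächste", "scheduled_event"): "next",
    ("nächste", "location"): "nearest",
}

# Hypothetical semantic classes for the nouns in the two example sentences.
NOUN_CLASS: Dict[str, str] = {
    "Zug": "scheduled_event",
    "Hotel": "location",
}


def translate_adjective(adjective: str, noun: str, default: str = "next") -> str:
    """Pick the English rendering of an adjective from the noun it modifies."""
    return SENSES.get((adjective, NOUN_CLASS.get(noun)), default)


print(translate_adjective("nächste", "Zug"))
print(translate_adjective("nächste", "Hotel"))
```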

Mobile Speech-to-Speech Translation of Spontaneous Dialogs

Verbmobil Speech Translation Server

Solution: a conference call. The Verbmobil Speech Translation Server is accessed by GSM mobile phones.

Speech-to-Speech Translation

The Control Panel of Verbmobil

General Speech Recognition Task

Audio Signal -> Recognizers (German, English, Japanese) -> Word Hypotheses Graph
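
A word hypotheses graph can be pictured as a directed graph whose edges carry competing words and scores. The sketch below is an assumption-laden illustration, not the Verbmobil data format: nodes are integers in time order, each edge holds a word and a log-probability, and `best_path` is a hypothetical helper that finds the highest-scoring word sequence by dynamic programming.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (start node, end node, word, log-probability); node numbers are assumed to grow with time.
Edge = Tuple[int, int, str, float]


def best_path(edges: List[Edge], start: int, end: int) -> Tuple[float, List[str]]:
    """Return the score and words of the best path from `start` to `end`."""
    outgoing: Dict[int, List[Edge]] = defaultdict(list)
    for edge in edges:
        outgoing[edge[0]].append(edge)

    # best[node] holds the best score and word sequence found so far for reaching node.
    best: Dict[int, Tuple[float, List[str]]] = {start: (0.0, [])}
    for node in range(start, end):  # time order guarantees predecessors are final
        if node not in best:
            continue
        score, words = best[node]
        for _, nxt, word, logp in outgoing[node]:
            candidate = (score + logp, words + [word])
            if nxt not in best or candidate[0] > best[nxt][0]:
                best[nxt] = candidate

    if end not in best:
        raise ValueError("no path from start to end")
    return best[end]


# Hypothetical toy graph with two acoustically similar alternatives.
TOY_EDGES: List[Edge] = [
    (0, 1, "I", -0.1),
    (1, 2, "need", -0.2),
    (1, 2, "knead", -1.5),
    (2, 3, "a", -0.1),
    (3, 4, "car", -0.3),
    (3, 4, "cart", -0.9),
]

score, words = best_path(TOY_EDGES, 0, 4)
print(" ".join(words))
```

Keeping the whole graph rather than only the single best string lets later stages revisit the alternatives the recognizer ranked lower.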

Extracting Statistical Properties from Large Corpora

Machine Learning for the Integration of Statistical Properties into Symbolic Models for Speech Recognition, Parsing, Dialog Processing, Translation

Transcribed Speech Data -> Hidden Markov Models

Segmented Speech with Prosodic Labels -> Neural Nets, Multilayered Perceptrons

Annotated Dialogs with Dialog Acts -> Probabilistic Automata

Treebanks & Predicate-Argument Structures -> Probabilistic Grammars

Aligned Bilingual Corpora -> Probabilistic Transfer Rules
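
As a small illustration of extracting statistical properties from transcribed data, the sketch below estimates bigram probabilities with add-one smoothing. It is a generic textbook technique chosen for brevity, not the estimation method used in Verbmobil, and the three-sentence corpus and the function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def train_bigrams(sentences: List[List[str]]) -> Tuple[Counter, Counter]:
    """Count how often each word serves as context and how often each word pair occurs."""
    contexts: Counter = Counter()
    bigrams: Counter = Counter()
    for words in sentences:
        padded = ["<s>"] + words  # sentence-start marker
        contexts.update(padded[:-1])
        bigrams.update(zip(padded, padded[1:]))
    return contexts, bigrams


def bigram_prob(prev: str, word: str, contexts: Counter, bigrams: Counter, vocab_size: int) -> float:
    """Add-one smoothed estimate of P(word | prev); unseen pairs keep a small probability."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)


# Hypothetical toy corpus standing in for transcribed speech data.
CORPUS = [
    ["the", "next", "train"],
    ["the", "next", "hotel"],
    ["the", "nearest", "hotel"],
]
VOCAB = sorted({word for sentence in CORPUS for word in sentence})
contexts, bigrams = train_bigrams(CORPUS)

p_seen = bigram_prob("next", "train", contexts, bigrams, len(VOCAB))
p_unseen = bigram_prob("next", "nearest", contexts, bigrams, len(VOCAB))
print(p_seen, p_unseen)
```

With so little data most word pairs are never observed, which is why the smoothed estimate reserves some probability for them.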

The Use of Prosodic Information at All Processing Stages

Speech Signal and Word Hypotheses Graph -> Multilingual Prosody Module

Prosodic features: duration, pitch, energy, pause

Information supplied by the prosody module: Boundary Information, Sentence Mood, Accented Words, Prosodic Feature Vector

Use at each processing stage:
Parsing: Search Space Restriction
Dialog Understanding: Dialog Act Segmentation and Recognition
Translation: Constraints for Transfer
Generation: Lexical Choice
Speech Synthesis: Speaker Adaptation

The Understanding of Spontaneous Speech Repairs

Spoken: I need a car next Tuesday oops Monday

Original Utterance, ending in the Reparandum: I need a car next Tuesday
Editing Phase, the Hesitation: oops
Repair Phase, the Reparans: Monday

Processing: Recognition of Substitutions, then Transformation of the Word Hypothesis Graph

Intended: I need a car next Monday

Verbmobil Technology: understands speech repairs and extracts the intended meaning.

Dictation systems such as ViaVoice, VoiceXpress, FreeSpeech and NaturallySpeaking cannot deal with spontaneous speech and transcribe the corrupted utterances.
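
The substitution step can be sketched on a flat word list. This is a deliberately simplified heuristic, not the Verbmobil method, which the slide describes as a transformation of the word hypothesis graph: it assumes a fixed list of editing terms, and it assumes the reparandum starts either at a word that the speaker repeats right after the hesitation or, failing that, at the single word before the hesitation. The names `EDITING_TERMS` and `repair` are hypothetical.

```python
from typing import List, Set

# Hypothetical list of hesitation words that mark the editing phase.
EDITING_TERMS: Set[str] = {"oops", "äh", "uh"}


def repair(tokens: List[str], window: int = 3) -> List[str]:
    """Remove the reparandum and the hesitation, keeping the reparans."""
    for e, token in enumerate(tokens):
        if token.lower() not in EDITING_TERMS:
            continue
        if e == 0 or e == len(tokens) - 1:
            # Nothing to replace: just drop the hesitation itself.
            return tokens[:e] + tokens[e + 1:]
        restart = tokens[e + 1]  # first word of the reparans
        start = e - 1            # default: replace the single word before the hesitation
        # If the speaker retraces ("in Mannheim, oops, in ..."), cut from the repeated word.
        for j in range(e - 1, max(e - 1 - window, -1), -1):
            if tokens[j] == restart:
                start = j
                break
        return tokens[:start] + tokens[e + 1:]
    return tokens


print(" ".join(repair("I need a car next Tuesday oops Monday".split())))
print(" ".join(repair("we are meeting in Mannheim oops in Saarbruecken".split())))
```

Only the first hesitation in an utterance is handled; a fuller treatment would also have to decide whether a hesitation word introduces a repair at all.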

Automatic Understanding and Correction of Speech Repairs in Spontaneous Telephone Dialogs

German: Wir treffen uns in Mannheim, äh, in Saarbrücken. (We are meeting in Mannheim, oops, in Saarbruecken.)

English: We are meeting in Saarbruecken.

Spoken Dialogs about Schedules

Fielded applications

Train schedules (German Railway System, DB)

TABA (Philips): +49 241 60 40 20

OSCAR (DaimlerChrysler): +49 1805 99 66 22

Flight Schedules (Lufthansa)

ALF (Philips): +49 1803 00 00 74

Technical Challenges: phone-based dialogs, many proper names, clarification subdialogs

Linguatronic: Spoken Dialogs with Mercedes-Benz

Microphone

Push-to-talk Switch

Please call Doris Wahlster.

Open the left window in the back.

I want to hear the weather channel.

When will I reach the next gas station?

Where is the next parking lot?

Speech control of: cellular phone, radio, windows / AC, route guidance system

Option for S-, C-, and E-Class of Mercedes and BMW

Speaker-independent, garbage models for non-speech (blinker, AC, wheels)

With Maier on 25 October, with Tetzlaff, and with Streit too.

Oops, not with Streit.

From 2 to 3.

Okay!

Speech-based Interaction with an Organizer on a WAP Phone (Voice In - WML out)

Augmented Reality: Combining Speech, Gestures and Graphics for Mobile Access to a Digital Library

Mobile Dialog with a Virtual Tourist Guide for the Heidelberg Castle

Location-adaptive Query Interpretation

Multimodal Route Description

Mobile Speech Translation and Multilingual Information Access

Speech-based Access to 3D Virtual Views

Multimodal Output from a Digital Library and Speech-based Access to Internet Content

International Research Trends in Multilingual Systems

Multilingual and Mobile Communication Assistants

Multimodal Interfaces: SmartKom

Speech-based Web Access to Multilingual Web Pages: WAP Phones, WebTV

Multilingual Audio Retrieval and Audio Mining: Discussions, Lecture Notes, Organizers

Multilingual Indexing and Annotation of Videos: Video Archives, News Archives

Dialog Translation (Verbmobil): Call Centers, ECommerce, Mobile Travel Assistance, Telephone Translations

Multilingual Language Technology: Speech Recognition, Language Understanding, Language Generation, and Speech Synthesis

Spontaneous Speech, Robust Processing and Translation, Semantic and Pragmatic Understanding

Open Problems for the Next Decade

Problems with current machine learning approaches

Expensive data collection

Cognitively unrealistic training data

Data sparseness

Problems with current hand-crafted knowledge sources

Brittleness

Domain dependence

Limited scalability

A Speculative Conclusion (+50 years)

-500 years: Oral Society. News and knowledge are passed orally. No mass storage, no automatic processing, no automatic retrieval.

TODAY: Textual Society. News and knowledge are passed textually. Mass storage of texts, text processing, text retrieval.

+50 years: Oral Society. News and knowledge are passed orally. Mass storage of speech, speech processing, audio retrieval.