Prof. Wolfgang Wahlster
German Research Center for Artificial Intelligence, DFKI GmbH
Stuhlsatzenhausweg 3, 66123 Saarbruecken, Germany
phone: (+49 681) 302-5252/4162
fax: (+49 681) 302-5341
e-mail: [email protected]
WWW: http://www.dfki.de/~wahlster
Brain and Communication, Mainz
Friday, 24 November 2000
Computers that read, hear and understand
© Wolfgang Wahlster, DFKI
Pervasive Speech and Language Technology
A cappuccino in 10 minutes, please!
Send the following email to Mark Maybury: Hi Mark,
please forward the following agenda to your project
partners!
Let's go to Baker Street in Berkeley!
I would like to hear Mozart's Piano Concerto No. 3!
Speech-controlled coffee machine
Speech-based car navigation
Speech-enabled music selection
Dictation
Show me all CNN news of the last 3 months that
feature Bill Clinton discussing health care!
I would like to make an appointment with
Dr. Kuremastu in Kyoto next week!
Pervasive Speech and Language Technology
What has Jim Hendler said about DAML during our
recent Dagstuhl seminar?
Information on demand
Audio Mining
Speech-to-Speech Translation
Speech Input
Speech Recognition: What has the speaker said? (100 alternatives)
Speech Analysis: What has the speaker meant? (10 alternatives)
Speech Understanding: What does the speaker want? (Unambiguous understanding in the dialog context)
Reduction of uncertainty from level to level
Knowledge sources: Acoustic Language Models, Word Lists, Grammar, Lexical Meaning, Discourse Context, Knowledge about Domain of Discourse
Three Levels of Language Processing
Increasing complexity along four dimensions (simplest first):
Input Conditions: Close-Speaking Microphone/Headset, Push-to-talk; Telephone, Pause-based Segmentation; Open Microphone, GSM Quality
Naturalness: Isolated Words; Read Continuous Speech; Spontaneous Speech
Adaptability: Speaker Dependent; Speaker Independent; Speaker Adaptive
Dialog Capabilities: Monolog Dictation; Information-seeking Dialog; Multiparty Negotiation
Challenges for Language Engineering
Wann fährt der nächste Zug nach Hamburg ab?
When does the next train to Hamburg depart?
Wo befindet sich das nächste Hotel?
Where is the nearest hotel?
Context-Sensitive Speech-to-Speech Translation
Verbmobil Server
Mobile Speech-to-Speech Translation of Spontaneous Dialogs
Verbmobil Speech Translation Server
Solution: conference call. The Verbmobil Speech Translation Server is accessed by GSM mobile phones.
General Speech Recognition Task
Audio Signal -> Recognizers (German, English, Japanese) -> Word Hypotheses Graph
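A word hypotheses graph can be pictured as a directed acyclic graph whose edges carry competing word hypotheses with scores. The following is a minimal sketch of that idea, not Verbmobil's actual data structure: the node numbering, the words, the log scores, and the function name `best_path` are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical word hypotheses graph: (start_node, end_node, word, log_score).
# Nodes stand for points in time; all values are invented for illustration.
EDGES = [
    (0, 1, "when", -0.2),
    (0, 1, "then", -1.6),
    (1, 2, "does", -0.3),
    (1, 2, "dust", -2.1),
    (2, 3, "the", -0.1),
    (3, 4, "next", -0.4),
    (3, 4, "text", -1.2),
    (4, 5, "train", -0.3),
    (4, 5, "rain", -1.5),
]


def best_path(edges, start, goal):
    """Return (score, words) of the highest-scoring path from start to goal.

    Assumes node ids are integers in topological order (every edge goes
    from a lower id to a higher id), so one pass over the sorted nodes
    is enough.
    """
    outgoing = defaultdict(list)
    for src, dst, word, score in edges:
        outgoing[src].append((dst, word, score))

    best = {start: (0.0, [])}
    for node in sorted(outgoing):
        if node not in best:
            continue  # node is not reachable from start
        base_score, base_words = best[node]
        for dst, word, score in outgoing[node]:
            candidate = (base_score + score, base_words + [word])
            if dst not in best or candidate[0] > best[dst][0]:
                best[dst] = candidate
    return best[goal]


score, words = best_path(EDGES, 0, 5)
print(" ".join(words), round(score, 2))
```

Later processing stages can keep the whole graph rather than only this best path, which is what lets them revise an early recognition choice.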
Machine Learning for the Integration of Statistical Properties into Symbolic Models for Speech Recognition, Parsing, Dialog Processing, Translation
Corpora:
Transcribed Speech Data
Segmented Speech with Prosodic Labels
Annotated Dialogs with Dialog Acts
Treebanks & Predicate-Argument Structures
Aligned Bilingual Corpora
Models:
Hidden Markov Models
Neural Nets, Multilayered Perceptrons
Probabilistic Automata
Probabilistic Grammars
Probabilistic Transfer Rules
Extracting Statistical Properties from Large Corpora
The Use of Prosodic Information at All Processing Stages
Speech Signal -> Word Hypotheses Graph -> Multilingual Prosody Module
Prosodic features: duration, pitch, energy, pause
Prosodic information passed to the processing stages: Boundary Information, Sentence Mood, Accented Words, Prosodic Feature Vector
Parsing: Search Space Restriction
Dialog Understanding: Dialog Act Segmentation and Recognition
Translation: Constraints for Transfer
Generation: Lexical Choice
Speech Synthesis: Speaker Adaptation
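Two of the prosodic features named on this slide, energy and pause, can be illustrated with a few lines of signal processing. This is a minimal sketch under stated assumptions, not the Verbmobil prosody module: the frame length, the threshold, the minimum pause length, the synthetic test signal, and the function names are all hypothetical choices made for the example.

```python
import math


def frame_energies(samples, frame_len):
    """Root-mean-square energy of consecutive, non-overlapping frames."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return energies


def find_pauses(energies, threshold, min_frames):
    """Return (first_frame, last_frame) index pairs of low-energy runs."""
    pauses = []
    run_start = None
    # The infinite sentinel closes a low-energy run at the end of the signal.
    for i, energy in enumerate(energies + [float("inf")]):
        if energy < threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_frames:
                pauses.append((run_start, i - 1))
            run_start = None
    return pauses


# Synthetic signal (assumed values): 0.2 s tone, 0.1 s silence, 0.2 s tone.
RATE = 8000   # samples per second
FRAME = 80    # 10 ms frames
tone = [0.5 * math.sin(2 * math.pi * 200 * n / RATE) for n in range(1600)]
signal = tone + [0.0] * 800 + tone

energies = frame_energies(signal, FRAME)
pauses = find_pauses(energies, threshold=0.01, min_frames=5)
print(pauses)
```

A real prosody module would combine such measurements with duration and pitch into a feature vector per word hypothesis; the sketch only shows how a pause can be located from frame energy.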
Original utterance: I need a car next Tuesday oops Monday
Reparandum: Tuesday
Hesitation (editing phase): oops
Reparans (repair phase): Monday
Recognition of Substitutions
Transformation of the Word Hypothesis Graph
Result: I need a car next Monday
Verbmobil technology understands speech repairs and extracts the intended meaning.
Dictation systems such as ViaVoice, VoiceXpress, FreeSpeech, and NaturallySpeaking cannot deal with spontaneous speech; they transcribe the corrupted utterances.
The Understanding of Spontaneous Speech Repairs
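The reparandum / hesitation / reparans structure can be illustrated with a toy, string-level sketch. According to the slide, Verbmobil performs the repair as a transformation of the word hypothesis graph; the code below does not do that. It only shows the substitution idea on plain token lists, and the editing-term list, the retrace window, and the function name `repair` are assumptions made for this example.

```python
# Assumed list of editing terms (hesitation markers); not taken from Verbmobil.
EDITING_TERMS = {"oops", "äh", "uh", "um"}


def repair(utterance, max_retrace=3):
    """Remove a reparandum and its editing term, keeping the reparans.

    Heuristic: when an editing term is found, look at the first word after
    it. If that word also occurs among the last `max_retrace` words before
    the editing term, the speaker has retraced to it, so everything from
    that earlier occurrence on is the reparandum. Otherwise only the single
    word before the editing term is treated as the reparandum.
    """
    tokens = utterance.replace(",", " ").split()
    result = []
    for i, token in enumerate(tokens):
        is_editing_term = token.lower() in EDITING_TERMS
        if is_editing_term and result and i + 1 < len(tokens):
            first_repair_word = tokens[i + 1].lower()
            window = result[-max_retrace:]
            cut = 1  # default: replace just the previous word
            for back in range(1, len(window) + 1):
                if window[-back].lower() == first_repair_word:
                    cut = back
                    break
            del result[-cut:]
        elif not is_editing_term:
            result.append(token)
    return " ".join(result)


print(repair("I need a car next Tuesday oops Monday"))
print(repair("Wir treffen uns in Mannheim, äh, in Saarbrücken."))
```

The heuristic drops commas and fails on many real repairs (for example, when the editing term is missing). That is one reason a graph-based, statistically trained approach is needed for spontaneous speech.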
German: Wir treffen uns in Mannheim, äh, in Saarbrücken.
(We are meeting in Mannheim, oops, in Saarbruecken.)
English: We are meeting in Saarbruecken.
Automatic Understanding and Correction of Speech Repairs in Spontaneous Telephone Dialogs
Fielded applications
Train schedules (German Railway System, DB)
TABA (Philips): +49 241 60 40 20
OSCAR (DaimlerChrysler): +49 1805 99 66 22
Flight Schedules (Lufthansa)
ALF (Philips): +49 1803 00 00 74
Technical Challenges: phone-based dialogs, many proper names, clarification subdialogs
Spoken Dialogs about Schedules
Microphone
Push-to-talk Switch
Please call Doris Wahlster.
Open the left window in the back.
I want to hear the weather channel.
When will I reach the next gas station?
Where is the next parking lot?
Speech control of: cellular phone, radio, windows / AC, route guidance system
Option for S-, C-, and E-Class of Mercedes and BMW
Speaker-independent
Garbage models for non-speech (blinker, AC, wheels)
Linguatronic: Spoken Dialogs with Mercedes-Benz
With Maier on 25 October, with Tetzlaff,
and with Streit too.
Oops, not with Streit.
From 2 to 3.
Okay!
Speech-based Interaction with an Organizer on a WAP Phone (Voice In - WML Out)
Augmented Reality: Combining Speech, Gestures and Graphics for Mobile Access to a Digital Library
Mobile Dialog with a Virtual Tourist Guide for the Heidelberg Castle
Location-adaptive Query Interpretation
Multimodal Route Description
Mobile Speech Translation and Multilingual Information Access
Augmented Reality: Combining Speech, Gestures and Graphics for Mobile Access to a Digital Library
Speech-based Access to 3D Virtual Views
Multimodal Output from a Digital Library and Speech-based Access to Internet Content
Augmented Reality: Combining Speech, Gestures and Graphics for Mobile Access to a Digital Library
Multilingual and Mobile Communication Assistants
Multimodal Interfaces: SmartKom
Speech-based Web Access to Multilingual Web Pages: WAP Phones, WebTV
Multilingual Audio Retrieval and Audio Mining: Discussions, Lecture Notes, Organizers
Multilingual Indexing and Annotation of Videos: Video Archives, News Archives
Dialog Translation: Verbmobil; Call Centers, E-Commerce, Mobile Travel Assistance, Telephone Translations
International Research Trends in Multilingual Systems
Multilingual Language Technology: Speech Recognition, Language Understanding, Language Generation, and Speech Synthesis
Spontaneous Speech, Robust Processing and Translation, Semantic and Pragmatic Understanding
Open Problems for the Next Decade
Problems with current machine learning approaches
Expensive data collection
Cognitively unrealistic training data
Data sparseness
Problems with current hand-crafted knowledge sources
Brittleness
Domain dependence
Limited scalability
A Speculative Conclusion (+50 years)
-500 years: Oral Society. News and knowledge are passed orally. No mass storage, no automatic processing, no automatic retrieval.
TODAY: Textual Society. News and knowledge are passed textually. Mass storage of texts, text processing, text retrieval.
+50 years: Oral Society. News and knowledge are passed orally. Mass storage of speech, speech processing, audio retrieval.