Slides: tts.speech.cs.cmu.edu/courses/11492/slides/sds_treedialogs.pdf

Speech Processing 11-492/18-492

Spoken Dialog Systems

Tree based dialogs

VoiceXML

State-based Dialogs

Simple state-based dialog systems

Get Name

Get Account number

Get PIN

Present balance

Go back to start or exit

State-based Dialogs

Get Name:

What is your name?

ASR Name

May be correct (in the database)

May be unknown (not in database)

May not be a name ("What do I say?", "Help", "Repeat")

Should you echo the recognized name?

Confirmation (or not)

State-based dialog

Get name

Check in database

Ask again if not

Deal with help

Get account number

Check in database (with name)

Confirm account number and name

For security
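
The states above can be sketched in VoiceXML, the notation introduced later in these slides. This is only an illustration under stated assumptions: names.grxml (a grammar of the names in the database) and check_account.php (a server-side lookup) are hypothetical, and the built-in digits and boolean field types are assumed to be supported by the browser.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="login">
    <!-- State 1: get the caller's name -->
    <field name="caller_name">
      <prompt>What is your name?</prompt>
      <!-- names.grxml is a hypothetical grammar of names in the database -->
      <grammar src="names.grxml" type="application/srgs+xml"/>
      <help>Please say your first and last name.</help>
      <!-- Anything outside the grammar (unknown name or not a name): ask again -->
      <nomatch>Sorry, I did not find that name. <reprompt/></nomatch>
    </field>

    <!-- State 2: get the account number as a digit string -->
    <field name="account" type="digits">
      <prompt>What is your account number?</prompt>
      <help>Please say the digits of your account number.</help>
    </field>

    <!-- State 3: confirm both values before going further -->
    <field name="confirmed" type="boolean">
      <prompt>
        I heard <value expr="caller_name"/>, account
        <value expr="account"/>. Is that right?
      </prompt>
      <filled>
        <if cond="!confirmed">
          <!-- Caller said no: forget everything and start over -->
          <clear namelist="caller_name account confirmed"/>
        </if>
      </filled>
    </field>

    <block>
      <!-- check_account.php is a hypothetical server-side database check -->
      <submit next="check_account.php" namelist="caller_name account"/>
    </block>
  </form>
</vxml>
```

Each field is one state of the tree. The form is walked top to bottom, so clearing the variables on a failed confirmation sends the caller back to the first unfilled state.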

State-based Interaction

Trees can get very large

User can get lost easily

You want to minimize the number of turns

Faster throughput means more calls

Faster throughput means happier customer

The level of help

First time users *need* a successful call

Otherwise, they won't call back

Having very helpful prompts is good

At the start, but it gets annoying quickly

Designing prompts is a craft

What is spoken should be understood

How much should you tailor it to the user?
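
One common way to balance helpful and terse prompts is to vary the prompt with the number of attempts. The sketch below uses the VoiceXML notation introduced later in these slides, with the count attribute on prompts. The wording of the prompts and the four-digit PIN are assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="get_pin">
    <field name="pin" type="digits?length=4">
      <!-- First attempt: keep it short for experienced callers -->
      <prompt count="1">What is your PIN?</prompt>
      <!-- Second and later attempts: give more guidance -->
      <prompt count="2">
        Please say the four digits of your PIN, one at a time.
      </prompt>
      <!-- Spoken only when the caller asks for help -->
      <help>
        Your PIN is the four digit number you chose when you
        opened the account.
      </help>
      <noinput><reprompt/></noinput>
      <nomatch>Sorry, I did not get that. <reprompt/></nomatch>
    </field>
  </form>
</vxml>
```

Whether to start terse and expand on errors, or start verbose and shorten for returning callers, is a design decision that depends on the user population.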

VoiceXML

A W3C standard for voice browsing

XML-based “programming” language for speech

Output synthesized (and recorded) speech

Recognition of speech

Recording of spoken input

Telephony features

VoiceXML

ASR

From grammars (JSGF: Java Speech Grammar Format)

From tri-grams

From “Domain Managers”

Credit card numbers

City, State

VoiceXML

TTS

SSML markup

Choice of voice

Choice of language

Choice of how to pronounce things

Specify breaks, timing, emphasis
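
As a rough illustration of these controls, the prompt below uses SSML elements inside VoiceXML. The sentence and the chosen settings are made up for the example, and how a given voice or rate is rendered depends on the synthesizer.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form>
    <block>
      <prompt>
        <!-- Choice of voice -->
        <voice gender="female">
          <!-- Choice of how to pronounce things: speak "61C" as words -->
          The next <sub alias="sixty one C">61C</sub> leaves in
          <!-- Emphasis -->
          <emphasis level="strong">two</emphasis> minutes.
          <!-- Breaks and timing -->
          <break time="500ms"/>
          <prosody rate="slow">Please have your fare ready.</prosody>
        </voice>
      </prompt>
    </block>
  </form>
</vxml>
```

The language is selected with xml:lang, here on the root element.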

Structure

<vxml version="1.0">
  <meta name="author" content="John Doe"/>
  <var name="hi" expr="'Hello World!'"/>
  <form>
    <block>
      <value expr="hi"/>
      <goto next="#say_goodbye"/>
    </block>
  </form>
  <form id="say_goodbye">
    <block>
      Goodbye!
    </block>
  </form>
</vxml>

Basic Tags

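<form id="xxxx"> defines a dialog that can be referred to by its id

<goto next="#xxx"/> jumps to the form with that id

<field> gathers information from the user through speech

<record> records audio from the user

<subdialog> runs another dialog, then returns to the calling form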
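
Of these tags, <record> is the least self-explanatory, so here is a minimal sketch of it. The prompt wording and the timing values are assumptions chosen for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="leave_message">
    <!-- Record up to 30 seconds; stop after 3 seconds of silence or a key press -->
    <record name="msg" beep="true" maxtime="30s"
            finalsilence="3s" dtmfterm="true">
      <prompt>Please leave a message after the beep.</prompt>
      <noinput>I did not hear anything. <reprompt/></noinput>
    </record>
    <block>
      <!-- The recording is held in the variable msg and can be played back -->
      <prompt>Your message was <audio expr="msg"/></prompt>
    </block>
  </form>
</vxml>
```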

<field> tag

<form id="getBusNumber">
  <field name="BusNumber">
    <prompt>Which bus line do you want?</prompt>
    <grammar src="grams/bus.gram"/>
    <help> Please say your desired bus number, e.g. 61C</help>
  </field>
</form>

Flow of Control

Goto
  <goto next="#getBusNumber"/>
  <goto next="Trains.vxml"/>

<if cond="BusNumber == '501'">
  <prompt> Sorry that bus no longer runs </prompt>
<elseif cond="BusNumber == '56U'"/>
  <prompt> Sorry it’ll be a long wait </prompt>
<else/>
  <prompt> One will be along shortly </prompt>
</if>
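
Putting the <field> and <if> fragments from the last two slides together gives a complete document along these lines. It is a sketch: it assumes the grammar in grams/bus.gram returns the bus number as a string such as '501', and it places the test in a <filled> block, which runs once the field has a value.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="getBusNumber">
    <field name="BusNumber">
      <prompt>Which bus line do you want?</prompt>
      <grammar src="grams/bus.gram"/>
      <help>Please say your desired bus number, e.g. 61C</help>
      <!-- Runs once the recognizer has filled BusNumber -->
      <filled>
        <if cond="BusNumber == '501'">
          <prompt>Sorry that bus no longer runs</prompt>
        <elseif cond="BusNumber == '56U'"/>
          <prompt>Sorry it'll be a long wait</prompt>
        <else/>
          <prompt>One will be along shortly</prompt>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```

Note that <elseif/> and <else/> are empty elements that divide the body of the <if>, not containers of their own.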

Variables

<var name="var1" expr="'hello'"/>

<prompt> I just wanted to say <value expr="var1"/> </prompt>

<assign name="var1" expr="'goodbye'"/>

(expr is an ECMAScript expression, so string values need inner quotes)

Recognition Grammars

Speech Recognition Grammar Specification (SRGS)

Augmented BNF

$order = I would like a $drink;

$drink = coke | pepsi | mountain_dew;
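
SRGS also defines an XML syntax alongside the augmented BNF one. As a sketch, the same two rules might be written as follows. The language tag and the choice of order as the root rule are assumptions, and "mountain dew" is written as two words here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" root="order" mode="voice">
  <!-- $order = I would like a $drink -->
  <rule id="order" scope="public">
    I would like a <ruleref uri="#drink"/>
  </rule>
  <!-- $drink = coke | pepsi | mountain_dew -->
  <rule id="drink">
    <one-of>
      <item>coke</item>
      <item>pepsi</item>
      <item>mountain dew</item>
    </one-of>
  </rule>
</grammar>
```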

VoiceXML Browsers

Compatibility

Not as compatible as one would like

<object> elements can differ between browsers (but are useful)

City, State recognizers

ECMAscript (Javascript)
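
As a small illustration of ECMAScript inside VoiceXML, the sketch below defines a function in a <script> element and calls it from a prompt. The helper spaced() is hypothetical, and the example assumes the browser supports the built-in digits field type, which returns the digits as a string.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <script>
    <![CDATA[
      // Hypothetical helper: put a space between the digits
      // so they are read out one at a time
      function spaced(digits) {
        return digits.split('').join(' ');
      }
    ]]>
  </script>
  <form>
    <field name="account" type="digits">
      <prompt>What is your account number?</prompt>
      <filled>
        <!-- expr is evaluated as ECMAScript, so it can call spaced() -->
        <prompt>I heard <value expr="spaced(account)"/></prompt>
      </filled>
    </field>
  </form>
</vxml>
```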

Beyond VoiceXML (in VoiceXML)

Mixing html/cgi scripts in VoiceXML

Use php to generate VoiceXML files

Use urls (with ?...) to calculate/get data

http://weather.com?zip=“15213”

Use urls to get waveforms

http://tts.com?text=“Hello World”
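
Both uses of URLs can be sketched in one document. The hosts weather.com and tts.com are the placeholder addresses from the slide, not real services with these interfaces, and the variable msg is introduced only for the example. The server answering the <submit> is assumed to reply with a new VoiceXML document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <var name="msg" expr="'Hello World'"/>
  <form id="weather">
    <field name="zip" type="digits?length=5">
      <prompt>
        <!-- Fetch a waveform by URL; the inline text is the fallback
             if the audio cannot be retrieved -->
        <audio expr="'http://tts.com?text=' + encodeURIComponent(msg)">
          Hello World
        </audio>
        What is your zip code?
      </prompt>
    </field>
    <block>
      <!-- Sends the zip field as a query string (?zip=...) with HTTP GET -->
      <submit next="http://weather.com" namelist="zip" method="get"/>
    </block>
  </form>
</vxml>
```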

VoiceXML future

N-gram grammar Markup Language

Many browsers have own extensions

Pronunciation Lexicon Markup Language

A way to add new items to the lexicon

Hard to find good standards

Call Control Markup Language

For management and logging of calls

Microsoft SALT

SALT tags

listen, dtmf, prompt, bind, grammar (plus SSML)

Designed for desktop not just phone

Designed for shared documents

Viewing (HTML) and Speech (SALT)

Available Systems

Nuance

Be-vocal

Tell Me

Tell-me studio

OpenVXI/publicvoicexml.org

HALEF

Many others

SDS Architecture