Creating User Interfaces Directed Speech. XML. VoiceXML Classwork/Homework: Sign up to be Voxeo developer. Do tutorials.

Speech recognition

• Encompasses variety and range of activities
• Totally open-ended to content and audience
  – May claim more than really exists
• Restricted to small[er] set of phrases
  – Phrases within longer sections of speech
• Restricted to require training
• OR system learns
  – Dictation systems learn your voice

Speech recognition

• User speaks. System 'understands', at least enough to perform some action.

• Related to (but not the same as)
  – Natural language understanding
  – Voice print identification
  – Record information to be re-played to human in compressed form for later interaction
  – Speech synthesis (other direction): words to speech
  – ?

Natural language understanding

• Skip speech altogether, but type in statements or phrases in normal language
  – What is normal? We tend not to speak that grammatically
  – Many 'natural language systems' actually use keywords
    • Histor[y]
    • Moon rocks example

• Combine speech to natural language …

Continuous versus discrete

• Speaker speaks 'naturally' versus

• Speaker separates words

Examples

• Dictation: no understanding as such, produce words/sentences in a program
• (Telephone) Help desk / Information: generally restricted or directed speech, choosing from alternatives (may or may not be given). Advances the process
• [Restricted] commands: actually carrying out operations
  – Factory example: start and stop
  – Car: radio, heat/AC
  – Phone: call specific number
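
A minimal sketch of restricted commands, assuming the recognizer has already turned the audio into a text transcript. The command phrases and action names below are hypothetical, chosen to mirror the factory, car and phone examples above. The point is that the system only looks for a small set of phrases inside longer speech and ignores everything else.

```python
# Hypothetical command table: phrase (as a tuple of words) -> action name.
COMMANDS = {
    ("start",): "factory.start",
    ("stop",): "factory.stop",
    ("radio", "on"): "car.radio_on",
    ("call", "home"): "phone.call_home",
}


def spot_command(transcript):
    """Return the action for the first known phrase found in the transcript, else None."""
    words = transcript.lower().split()
    for phrase, action in COMMANDS.items():
        n = len(phrase)
        # Slide a window of n words across the transcript and compare whole words.
        for i in range(len(words) - n + 1):
            if tuple(words[i:i + n]) == phrase:
                return action
    return None
```

For example, spot_command("please stop the line now") picks out the "stop" command even though the speaker said much more. A transcript with no known phrase returns None, which an application could treat as "ask again".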

Training

• Dictation application: user takes time to read specific text to train the system
  – Note: some systems also adapt with use. If & when user corrects the results, system may do better next time.

• Phone lookup: user records names. No 'understanding', just record for matching.

Audience & content

• Some systems may allow adapting to audiences, for example, male versus female

• Some systems have restrictions on types of content
  – Historical note: IBM system in 1980s & 1990s was restricted to male, American-born speakers (no speech impediments) and legal text.

Speech recognition concepts

• Air pressure → diaphragm in phone → electrical signal → (Fourier Transform) → wave pattern, matched against
  • sets of canonical patterns (native speaker of English, perhaps male/female & young/old alternatives)
  • generated for the specified grammar (using a segmentation = dividing up of the parts)

Note: interplay of grammar and statistics distinguishes different approaches

Fourier Transform (Fast Fourier Transform -- FFT)

• Takes data representing a signal

• And produces numbers representing the combination of sine and cosine waves that make up the signal
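
To make the two bullets above concrete, here is a small sketch in plain Python. It uses the naive discrete Fourier transform rather than a true FFT (an FFT computes the same numbers, only faster). The test signal, the sample rate and the function names are assumptions made up for illustration.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform, O(n^2). An FFT gives the same result faster."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def dominant_frequency(samples, sample_rate):
    """Return the frequency (in Hz) of the strongest sine/cosine component."""
    spectrum = dft(samples)
    half = len(samples) // 2
    # Skip bin 0 (the constant offset) and the mirrored upper half of the spectrum.
    peak_bin = max(range(1, half), key=lambda k: abs(spectrum[k]))
    return peak_bin * sample_rate / len(samples)


# Illustrative signal: one second sampled 64 times, a 5 Hz sine plus a weaker 12 Hz cosine.
SAMPLE_RATE = 64
signal = [
    math.sin(2 * math.pi * 5 * t / SAMPLE_RATE)
    + 0.5 * math.cos(2 * math.pi * 12 * t / SAMPLE_RATE)
    for t in range(SAMPLE_RATE)
]
```

Calling dominant_frequency(signal, SAMPLE_RATE) recovers the 5 Hz component, and the magnitude at bin 12 is half of the magnitude at bin 5, matching the amplitudes used to build the signal.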

Speech recognition

• Works on the product of the FFT

• Uses (in most cases)
  – Segmentation: attempt to break up into pieces, perhaps syllables or words
  – Grammar: definition of what is to be expected
  – Probabilities: if first part matched X, then greater probability that the next would match Y
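
The probabilities idea can be sketched as follows. This is not how any particular product works. All the numbers are invented, and the names (ACOUSTIC, BIGRAM, best_next_word) are hypothetical. The sketch only shows how knowing the previous word can change which candidate wins.

```python
# Invented acoustic match scores for the second word: the sound alone slightly favors "hum".
ACOUSTIC = {"home": 0.45, "hum": 0.55}

# Invented probabilities of the next word given the previous word.
BIGRAM = {
    "call": {"home": 0.70, "hum": 0.05},
}


def best_next_word(previous_word, acoustic_scores, bigram, floor=0.01):
    """Pick the candidate with the highest (acoustic score x context probability)."""
    priors = bigram.get(previous_word, {})

    def combined(word):
        # Unseen word pairs get a small floor probability instead of zero.
        return acoustic_scores[word] * priors.get(word, floor)

    return max(acoustic_scores, key=combined)
```

With these numbers, the acoustic score alone would choose "hum", but best_next_word("call", ACOUSTIC, BIGRAM) chooses "home" because "call home" is far more likely than "call hum".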

Current State of the Art

• General, no restrictions, speech reco[gnition], good enough to act on the speech? Always about to happen?
• Dictation / substitute for keyboard+ exists and satisfies many
  – Is this most important application for most users?
  – May not be killer app, but may be good for motivating research

Extra credit posting: prepare brief report on [a] current product or application. Can be one you use yourself.

Speech synthesis

• aka TTS (text to speech)

• Application determines that the computer needs to say certain words

• lexical units (syllables of words) → phonemes → pre-recorded (wav) files of phonemes
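
A rough sketch of the lookup chain in the last bullet: words, then phonemes, then a sequence of pre-recorded files to play. The two-word lexicon, the phoneme labels and the "phonemes" directory are assumptions for illustration only. A real system needs a full pronunciation dictionary and must smooth the joins between sounds.

```python
# Tiny illustrative lexicon: word -> list of phoneme labels (assumed, not from any product).
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}


def text_to_wav_sequence(text, lexicon=LEXICON, wav_dir="phonemes"):
    """Return the list of phoneme wav files to play, in order, for the given text."""
    files = []
    for word in text.lower().split():
        word = word.strip(".,!?")  # drop simple punctuation attached to the word
        for phoneme in lexicon[word]:  # raises KeyError for words not in the lexicon
            files.append(f"{wav_dir}/{phoneme}.wav")
    return files
```

text_to_wav_sequence("Hello, world.") yields eight file names, starting with phonemes/HH.wav. The same L.wav file is used in both words, which is exactly where simple concatenation starts to sound unnatural.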

Speech synthesis

• This is again a segmentation process: need to divide up the words and then put together so speech sounds 'natural'.
  – particular phoneme may [need to] sound different in different context.
  – also need to deal with abbreviations & local accents
  – Place names (important in travel & weather applications)
    • Special case: detect and use wav file for each name.
• Older methods were all synthesized
  – similar distinction between all synthesized and samples of music

Speech synthesis

is essentially ‘the computer’ reading ‘out loud’.
Easy to do most things
More and more difficult to do complete job

Different languages may be easier than English.
People who are not monolingual please comment!

Restricted / directed speech applications

• The language is VoiceXML

• We will use evolution.voxeo.com to create directed speech applications.
  – Free facility: put in URL pointing to a VoiceXML document. Supplies phone numbers to call in to test.
  – You need to register.
  – Note: previously used Tellme studios but they stopped offering service.

XML

• Generalization of HTML
• XML documents have markup.
  – Opening tag indicating the type of element, possibly with attributes; content; closing tag.
• Document must be well-formed.
  – Elements nested in other elements
  – Quotation marks around attribute values
• Developers decide on element types.
  – So, we need to obey rules of VoiceXML

• Each element type can only have certain child elements
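
The well-formedness rules can be tried out with Python's standard xml.etree.ElementTree parser. The first document below is an assumed minimal VoiceXML example that speaks one prompt; check the Voxeo tutorials for the exact form they expect. The second breaks two of the rules above: an attribute value without quotation marks and an element that is never closed. Note that the parser checks well-formedness only. It does not check VoiceXML's own rules about which child elements are allowed.

```python
import xml.etree.ElementTree as ET

# Assumed minimal VoiceXML document: a form with one block that speaks a prompt.
HELLO_VXML = """<?xml version="1.0"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <prompt>Hello, world.</prompt>
    </block>
  </form>
</vxml>"""

# Not well-formed: version value is unquoted and <block> has no closing tag.
BROKEN_VXML = """<vxml version=2.1>
  <form>
    <block>
      <prompt>Hello, world.</prompt>
  </form>
</vxml>"""


def is_well_formed(document):
    """Return True if the XML parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


def element_names(document):
    """List element names in document order, with the namespace prefix removed."""
    root = ET.fromstring(document)
    return [element.tag.split("}")[-1] for element in root.iter()]
```

is_well_formed(HELLO_VXML) is True and is_well_formed(BROKEN_VXML) is False. element_names(HELLO_VXML) shows the nesting from the outside in: vxml, form, block, prompt.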

Screen shot from Voxeo

Screen shot: phone numbers

Homework (over break)

• Sign up to be Voxeo developer.
  – Start VoiceXML tutorials.
  – Do your own hello, world application.

• Start planning your VoiceXML project.