VoiceXML continued

Speech reco/speech synthesis recap

rps example (<menu><choice>)

Homework: Do VoiceXML examples. Start planning Project 2.

Speech recognition / Speech synthesis

• Area of research

• Area of ‘marketing’ / product research
  – What is the killer app? Perhaps phones & PDAs.

Focus was on dictation

• You do not need to understand the science, since Tellme Studio does it for us.
  – Some background is always helpful.

Speech recognition concepts

• Air pressure → diaphragm in phone → electrical signal → (Fourier Transform) → wave pattern

• The wave pattern is matched against sets of canonical patterns (native speaker of English, perhaps with male/female & young/old alternatives) generated for the specified grammar (using a segmentation, i.e. dividing up of the parts)

• This produces a set of probabilities for each option in the grammar
  – Tellme/VoiceXML provides fieldname$.confidence
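To make the "set of probabilities for each option" idea concrete, here is a minimal JavaScript sketch. The function name `pickBestOption`, the score values, and the threshold are all hypothetical; they are not part of Tellme or VoiceXML, which does this work internally and only exposes a confidence value.

```javascript
// Hypothetical helper: choose the grammar option with the highest
// probability, but reject it if it falls below a confidence threshold
// (roughly what a platform does before raising a nomatch).
function pickBestOption(scores, threshold) {
  let best = null;
  for (const [option, probability] of Object.entries(scores)) {
    if (best === null || probability > best.probability) {
      best = { option: option, probability: probability };
    }
  }
  if (best === null || best.probability < threshold) {
    return null; // nothing matched well enough
  }
  return best;
}

// Illustrative (made-up) scores for a three-word grammar.
const clearResult = pickBestOption({ rock: 0.7, paper: 0.2, scissors: 0.1 }, 0.5);
const unclearResult = pickBestOption({ rock: 0.4, paper: 0.35, scissors: 0.25 }, 0.5);
```

In this sketch `clearResult` names 'rock', while `unclearResult` is `null` because no option reaches the assumed threshold.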

Fourier Transform (Discrete Fourier Transform, usually computed with the Fast Fourier Transform, FFT)

• Takes data representing a signal

• And produces numbers representing the combination of sine and cosine waves that make up the signal
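The transform can be sketched directly from its definition. The JavaScript below is a naive discrete Fourier transform written for illustration only; a real recognizer would use an FFT, which produces the same numbers much faster. The function name `dft` and the sample signal are assumptions made for this example.

```javascript
// Naive discrete Fourier transform, O(n^2). For each frequency bin k it
// measures how much cosine (re) and sine (im) of that frequency is in
// the signal.
function dft(samples) {
  const n = samples.length;
  const result = [];
  for (let k = 0; k < n; k++) {
    let re = 0;
    let im = 0;
    for (let t = 0; t < n; t++) {
      const angle = (2 * Math.PI * k * t) / n;
      re += samples[t] * Math.cos(angle);
      im -= samples[t] * Math.sin(angle);
    }
    result.push({ re: re, im: im, magnitude: Math.hypot(re, im) });
  }
  return result;
}

// Example signal: 8 samples of a cosine that completes 2 full cycles.
const signal = Array.from({ length: 8 }, (_, t) => Math.cos((2 * Math.PI * 2 * t) / 8));
const spectrum = dft(signal);
// The magnitude is concentrated in bin 2 (and its mirror image, bin 6);
// the other bins are zero up to rounding error.
```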

Speech recognition

• Works on the output of the FFT

• Uses (in most cases)
  – Segmentation: attempt to break up into pieces, perhaps syllables or words
  – Grammar: definition of what is to be expected
  – Probabilities: if the first part matched X, then there is a greater probability that the next would match Y

Speech synthesis

• aka TTS (text to speech)

• lexical units (syllables of words) → phonemes → pre-recorded (wav) files of phonemes

• This is again a segmentation process: need to divide up the words and then put them together so the speech sounds 'natural'.
  – a particular phoneme may [need to] sound different in different contexts.
  – also need to deal with abbreviations & local accents
  – Place names (important in travel & weather applications)
    • Special case: detect and use a wav file for each name.

• Older methods were all synthesized
  – similar to the distinction in music between all-synthesized sound and samples
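One small piece of the synthesis pipeline, dealing with abbreviations, can be sketched in JavaScript. The lookup table and the function name `expandAbbreviations` are hypothetical; a real TTS engine uses far larger tables and context rules (for example, "St." can mean Street or Saint).

```javascript
// Hypothetical abbreviation table, for illustration only.
const ABBREVIATIONS = {
  'Dr.': 'Doctor',
  'Ave.': 'Avenue',
  'Mt.': 'Mount'
};

// Replace each whitespace-separated token that appears in the table
// with its spoken form; leave every other token unchanged.
function expandAbbreviations(text) {
  return text
    .split(/\s+/)
    .filter(function (word) { return word.length > 0; })
    .map(function (word) {
      return Object.prototype.hasOwnProperty.call(ABBREVIATIONS, word)
        ? ABBREVIATIONS[word]
        : word;
    })
    .join(' ');
}

const spoken = expandAbbreviations('Dr. Smith lives on Fifth Ave.');
```

The expanded text would then be handed on to the phoneme stage.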

Speech synthesis

is essentially ‘the computer’ reading ‘out loud’.

Easy to do most things

More and more difficult to do complete job

VoiceXML elements

• <prompt count="1" timeout="2.0">
  – for handling less than ideal conditions

• <menu> and <choice>
  – for selection from list (alternative to grammar)

• <break ..>
  – pause

• <assign name="varname" expr="expression" />

Accepting caller input

• Application: getting caller name

• More complex than you would think

• There is the <record> element

Simple record/playback

<?xml version="1.0" ?>
<vxml version="2.0">
<form>
  <record name="answer" maxtime="3s">
    <prompt>
      <audio>Hello. Who is calling?</audio>
    </prompt>
    <filled>
      <audio> Hello. </audio>
      <audio expr="answer"/>
      <audio> Have a nice day. </audio>
    </filled>
  </record>
</form>
</vxml>

Problems

• Replays in caller’s voice

• Can’t use outside the form

• Could save to server: look up <record> under VoiceXML elements.
  – Sample PHP code

More involved dialogue

<?xml version="1.0" ?>
<vxml version="2.0">
<form>
  <record name="answer" maxtime="3s">
    <prompt>
      <audio>Hello. Who is calling?</audio>
    </prompt>
    <filled>
      <audio>Great to hear from you. </audio>
    </filled>
  </record>

Continued: ask for class

  <field name="ca">
    <prompt>
      <audio expr="answer"/>
      <audio> Which class are you in? </audio>
    </prompt>
    <grammar type="application/x-gsl" mode="voice">
      <![CDATA[
        [
          [databases] {<ca "db">}
          [interfaces both] {<ca "interface">}
          [no none not] {<ca "noclass">}
        ]
      ]]>
    </grammar>

Continued: respond for each class

    <filled>
      <if cond="ca=='db'">
        <audio>Good to hear from you </audio>
        <audio expr="answer"/>
      <elseif cond="ca=='interface'"/>
        <audio>check the next class notes for how to do this </audio>
        <audio expr="answer"/>
        <audio>This may not be quite what some of you wanted.</audio>
      <else/>
        <audio> I don't know why you are calling </audio>
        <audio expr="answer"/>
        <audio> but it is nice to chat </audio>
      </if>
    </filled>
  </field>
</form>
</vxml>

Suggestion

• Get the simple one working and then work on the second one

• Suggested use is for callers to record a greeting– So it makes sense to be in the caller’s voice

• Experiment with grammar

rock paper scissors

• Note: a different version is on the Tellme site.

• This is not a great choice (identify a better one) since it doesn't support the illusion that 'the computer' is making the choice at the same time that the player is.

• … but it does illustrate the features.

rps logic requirements

• create a random move for 'the computer'
  – use Math.random and Math.floor

• prompt and then distinguish caller saying 'rock', 'paper', 'scissors', 'score', 'quit'
  – use <menu> and <choice> elements

• keep score
  – use JavaScript variables

• modest amount of error handling
  – <nomatch> and <noinput> and count attribute in <prompt>
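The game logic in these requirements can be tried out in plain JavaScript before wiring it into VoiceXML. The sketch below is one possible arrangement, not the version used in the slides: the `beats` table and the names `outcome`, `playRound` and `score` are assumptions made for this example.

```javascript
const moves = ['rock', 'paper', 'scissors'];

// beats[x] is the move that x defeats.
const beats = { rock: 'scissors', paper: 'rock', scissors: 'paper' };

// Random move for 'the computer', using Math.random and Math.floor.
function randomMove() {
  return moves[Math.floor(Math.random() * moves.length)];
}

// Result from the caller's point of view: 'win', 'loss' or 'tie'.
function outcome(player, computer) {
  if (player === computer) {
    return 'tie';
  }
  return beats[player] === computer ? 'win' : 'loss';
}

// Score kept in ordinary JavaScript variables.
const score = { win: 0, loss: 0, tie: 0 };

// Play one round and update the score. The computer's move can be
// passed in explicitly, which makes the function easy to test.
function playRound(player, computer) {
  const computerMove = computer === undefined ? randomMove() : computer;
  const result = outcome(player, computerMove);
  score[result] += 1;
  return result;
}
```

Using a lookup table instead of a separate if/elseif chain per move means one function covers all three caller choices.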

VoiceXML features

• Menu element replaces form element (and its use of field and grammar elements)

• Each <prompt…> has a count. Used when nomatch or noinput occurs and you don’t want to re-prompt with the exact same words.

• Can distinguish nomatch from noinput

• Can control timing: amount of time waiting for caller input and time between system utterances.
  – My version could be improved!
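How the count attribute picks a prompt can be modelled in a few lines of JavaScript. This is a sketch of the selection rule as described in the VoiceXML 2.0 specification (use the prompts whose count is the highest value not exceeding the current prompt counter); the function name `selectPrompts` and the data layout are assumptions for illustration, and a missing count is treated as 1.

```javascript
// Return the prompts that would be played for the given prompt counter:
// those whose count is the highest count that is <= the counter.
function selectPrompts(prompts, promptCounter) {
  let correctCount = 0;
  for (const p of prompts) {
    const count = p.count === undefined ? 1 : p.count;
    if (count <= promptCounter && count > correctCount) {
      correctCount = count;
    }
  }
  return prompts.filter(function (p) {
    return (p.count === undefined ? 1 : p.count) === correctCount;
  });
}

// Prompt texts shortened from the rock paper scissors menu.
const menuPrompts = [
  { count: 1, text: 'Say rock, paper, scissors, score or quit' },
  { count: 2, text: 'Please make a choice' },
  { count: 3, text: "I guess you're done." }
];

const firstTry = selectPrompts(menuPrompts, 1);
const secondTry = selectPrompts(menuPrompts, 2);
const fifthTry = selectPrompts(menuPrompts, 5);
```

Here the first visit plays the count 1 prompt, the second visit the count 2 prompt, and any visit from the third onward the count 3 prompt.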

<?xml version="1.0" ?>
<vxml version="2.0">
<var name="cchoice" />
<var name="win" expr="0"/>
<var name="loss" expr="0"/>
<var name="tie" expr="0"/>
<script>
var moves=new Array('rock','paper','scissors');
function randommove() {
  var r = Math.floor(Math.random()*3);
  return moves[r];
}
</script>

<menu id="play">
  <prompt count="1" timeout="2.0">
    <audio>Say rock, paper, scissors, score or quit </audio>
  </prompt>
  <prompt count="2" timeout="2.0">
    <audio>Please make a choice, rock, paper or scissors, or say score to get the score or quit to quit </audio>
  </prompt>
  <!-- Note: the VoiceXML 2.0 schema does not list exit as a child of prompt.
       If a platform rejects this, move the exit into a nomatch/noinput handler. -->
  <prompt count="3" timeout="2.0">
    <audio> I guess you're done. </audio>
    <exit/>
  </prompt>
  <nomatch>
    <audio>I didn't understand. </audio>
    <reprompt/>
  </nomatch>
  <noinput>
    <audio> I didn't hear anything </audio>
    <reprompt/>
  </noinput>
  <choice next="#prock">rock </choice>
  <choice next="#pscissors">scissors </choice>
  <choice next="#ppaper">paper </choice>
  <choice next="#sscore">score</choice>
  <choice next="#squit">quit</choice>
</menu>

<form id="sscore">

<block>

<audio>Scores are wins

<break size="medium"/>

<value expr="win"/>.

Losses <break time="500ms"/>

<value expr="loss"/>

<break time="800ms"/>

Ties <break size="medium"/>

<value expr="tie"/> </audio>

<goto next="#play"/>

</block>

</form>

loops back

<form id="prock">
  <block>
    <assign name="cchoice" expr="randommove()" />
    <audio> Computer played <value expr="cchoice"/> </audio>
    <if cond="cchoice=='rock'">
      <assign name="tie" expr="tie+1"/>
      <audio> Tie </audio>
    <elseif cond="cchoice=='paper'"/>
      <assign name="loss" expr="loss+1"/>
      <audio>You lose. Paper covers rock. </audio>
    <else/>
      <assign name="win" expr="win+1"/>
      <audio> You win. Rock breaks scissors </audio>
    </if>
    <goto next="#play"/>
  </block>
</form>

Exercise: write the missing forms.

Exercise: improve prompts.

(Exercise when you can test it using the phone: improve use of breaks.)

<form id="squit">

<block>

<audio>Good bye </audio>

<exit />

</block>

</form>

</vxml>

testing

• Go to your Tellme account

• Point your account at the website.

• For my examples, find out URLs by going to my XML stuff site.

• Rock-paper-scissors is at http://newmedia.purchase.edu/~Jeanine/interfaces/rps.xml

Adding your voice

• Use the Tellme Studio procedure for recording using the phone.

• They mail you the file.

• You should rename the file.

• You upload that file to your website.

• You will need to keep track of files. They will use a name/number system for files sent.
  – Do one at a time

Other VoiceXML elements

• subdialog
  – to jump to encapsulated dialog OR server-side script (which generates a VoiceXML document)

• submit
  – to jump to a new document generated by server-side script

• data
  – to access XML content (generated by server-side programs)

• record
  – recording caller input for storage on server

• transfer
  – transfer to another phone number
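Several of these elements rely on a server-side script that generates a VoiceXML document. As a rough sketch of what such a script does, the JavaScript below builds a small document as a string. The function names `escapeXml` and `buildGreetingDocument` are hypothetical, and how the string is actually served (PHP, another server language, HTTP headers) is left out.

```javascript
// Escape characters that would otherwise break the generated XML.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Build a minimal VoiceXML document that greets the caller by name.
function buildGreetingDocument(callerName) {
  return [
    '<?xml version="1.0" ?>',
    '<vxml version="2.0">',
    '  <form>',
    '    <block>',
    '      <audio>Hello, ' + escapeXml(callerName) + '.</audio>',
    '    </block>',
    '  </form>',
    '</vxml>'
  ].join('\n');
}

const greetingDocument = buildGreetingDocument('Tom & Ann');
```

The escaping step matters because caller-supplied or database text may contain characters such as & that are not legal inside XML as-is.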

Phone as interface: recap

• Familiar, ubiquitous, friendly (?), potentially hands-free

• Design challenge same (similar, not graphic design)
  – focus on function (information exchange)
  – caller actions can be guided but still potentially quite variable

Homework

• VoiceXML exploration exercises

• Begin to plan Project 2. Look at my pages for ideas. Make posting to Forum (indicate team members).
  – another Web/XML project
  – VoiceXML
  – WML and/or XHTML-MP (using Nokia or OpenWave. Can also try with your phone, if your phone is WAP enabled.)