
Windows Phone 8 - 14 Using Speech


Page 1: Windows Phone 8 - 14 Using Speech

Oliver Scheer

Senior Technical Evangelist

Microsoft Deutschland

http://the-oliver.com

Using Speech

Page 2: Windows Phone 8 - 14 Using Speech

Topics

• Speech on Windows Phone 8

• Speech synthesis

• Controlling applications using speech
  • Voice command definition files
  • Building conversations
  • Selecting application entry points

• Simple speech input

• Speech input and grammars
  • Using Grammar Lists

Page 3: Windows Phone 8 - 14 Using Speech

Speech on Windows Phone 8


Page 4: Windows Phone 8 - 14 Using Speech

Windows Phone Speech Support

• Windows Phone 7.x had voice support built into the operating system
  • Programs and phone features could be started by voice commands, e.g. “Start MyApp”
  • Incoming SMS messages could be read to the user
  • The user could compose and send SMS messages

• Windows Phone 8 builds on this to allow applications to make use of speech

• Applications can speak messages using the Speech Synthesis feature

• Applications can be started and given commands

• Applications can accept commands using voice input

• Speech recognition requires an internet connection, but Speech Synthesis does not

Page 5: Windows Phone 8 - 14 Using Speech

Speech Synthesis


Page 6: Windows Phone 8 - 14 Using Speech

Enabling Speech Synthesis

• If an application wishes to use speech output, the ID_CAP_SPEECH_RECOGNITION capability must be enabled in WMAppManifest.xml

• The application can also reference the Synthesis namespace

    using Windows.Phone.Speech.Synthesis;

Page 7: Windows Phone 8 - 14 Using Speech

Simple Speech

• The SpeechSynthesizer class provides a simple way to produce speech

• The SpeakTextAsync method speaks the content of the string using the default voice

• Note that the method is asynchronous, so a calling method that awaits it must be marked with the async modifier

• Speech output does not require a network connection

    async void CheeseLiker()
    {
        SpeechSynthesizer synth = new SpeechSynthesizer();
        await synth.SpeakTextAsync("I like cheese.");
    }

Page 8: Windows Phone 8 - 14 Using Speech

Selecting a language

• The default speaking voice is selected automatically from the locale set for the phone

• The InstalledVoices class provides a list of all the voices available on the phone

• The code below selects a French voice

    // Query for a voice that speaks French (the query syntax needs "using System.Linq;").
    var frenchVoices = from voice in InstalledVoices.All
                       where voice.Language == "fr-FR"
                       select voice;

    // Set the voice as identified by the query.
    // ElementAt(0) throws if no French voice is installed.
    synth.SetVoice(frenchVoices.ElementAt(0));

Page 9: Windows Phone 8 - 14 Using Speech

Demo 1: Voice Selection

Page 10: Windows Phone 8 - 14 Using Speech

Speech Synthesis Markup Language

• You can use Speech Synthesis Markup Language (SSML) to control the spoken output
  • Change the voice, pitch, rate, volume, pronunciation and other characteristics
  • Also allows the inclusion of audio files into the spoken output

• You can also use the speech synthesizer to speak the contents of a file

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
      <p>
        Your <say-as interpret-as="ordinal">1st</say-as> request was for
        <say-as interpret-as="cardinal">1</say-as> room on
        <say-as interpret-as="date" format="mdy">10/19/2010</say-as>,
        arriving at <say-as interpret-as="time" format="hms12">12:35pm</say-as>.
      </p>
    </speak>
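SSML is ordinary XML, so an application can also assemble it in code instead of shipping a file. The following is a minimal sketch, assuming System.Xml.Linq is available to the project; the class name SsmlBuilder and the method OrdinalSentence are hypothetical names made up for illustration. On the phone, the resulting string would then be handed to the synthesizer's SSML speaking method rather than to SpeakTextAsync.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper (not part of the phone SDK): builds a small SSML
// document containing one sentence with an ordinal marked up by <say-as>.
public static class SsmlBuilder
{
    // The SSML namespace used by the <speak> element on the slide.
    private static readonly XNamespace Ssml = "http://www.w3.org/2001/10/synthesis";

    public static string OrdinalSentence(string before, string ordinal, string after)
    {
        var speak = new XElement(Ssml + "speak",
            new XAttribute("version", "1.0"),
            new XAttribute(XNamespace.Xml + "lang", "en-US"),
            new XElement(Ssml + "p",
                before,
                new XElement(Ssml + "say-as",
                    new XAttribute("interpret-as", "ordinal"),
                    ordinal),
                after));

        // DisableFormatting keeps the text spacing exactly as supplied.
        return speak.ToString(SaveOptions.DisableFormatting);
    }
}
```

For example, SsmlBuilder.OrdinalSentence("Your ", "1st", " request was received.") produces a speak element whose paragraph contains the say-as markup. Building the markup through XElement means any special characters in the text are escaped automatically.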

Page 11: Windows Phone 8 - 14 Using Speech

Controlling Applications using Voice Commands


Page 12: Windows Phone 8 - 14 Using Speech

Application Launching using Voice command

• The Voice Command feature of Windows Phone 7 allowed users to start applications

• In Windows Phone 8 the feature has been expanded to allow the user to request data from the application in the start command

• The data will allow a particular application page to be selected when the program starts and can also pass request information to that page

• To start using Voice Commands you must create a Voice Command Definition (VCD) file that defines all the spoken commands

• The application then calls a method to register the words and phrases the first time it is run

Page 13: Windows Phone 8 - 14 Using Speech

The Fortune Teller Program

• The Fortune Teller program will tell your future

• You can ask it questions and it will display replies
  • It could also speak them

• Some of the spoken commands activate different pages of the application and others are processed by the application when it starts running

Page 14: Windows Phone 8 - 14 Using Speech

The Voice Command Definition (VCD) file

• This is the “money” question: “Fortune Teller Will I find money”

    <CommandPrefix> Fortune Teller </CommandPrefix>
    <Example> Will I find money </Example>

    <Command Name="showMoney">
      <Example> Will I find money </Example>
      <ListenFor> [Will I find] {futureMoney} </ListenFor>
      <Feedback> Showing {futureMoney} </Feedback>
      <Navigate Target="/money.xaml"/>
    </Command>

    <PhraseList Label="futureMoney">
      <Item> money </Item>
      <Item> riches </Item>
      <Item> gold </Item>
    </PhraseList>

Page 15: Windows Phone 8 - 14 Using Speech

The Voice Command Definition (VCD) file: <CommandPrefix>

• <CommandPrefix> Fortune Teller </CommandPrefix> is the phrase the user says to trigger the command

• All of the Fortune Teller commands start with this phrase

Page 16: Windows Phone 8 - 14 Using Speech

The Voice Command Definition (VCD) file: top-level <Example>

• The <Example> Will I find money </Example> element that follows the command prefix is example text that will be displayed by the help for this app as an example of the commands the app supports

Page 17: Windows Phone 8 - 14 Using Speech

The Voice Command Definition (VCD) file: <Command Name>

• The Name attribute in <Command Name="showMoney"> is the command name

• This can be obtained from the URL by the application when it starts

Page 18: Windows Phone 8 - 14 Using Speech

The Voice Command Definition (VCD) file: <Example> inside the command

• The <Example> element inside <Command Name="showMoney"> is the example for this specific command

Page 19: Windows Phone 8 - 14 Using Speech

The Voice Command Definition (VCD) file: <ListenFor>

• <ListenFor> [Will I find] {futureMoney} </ListenFor> is the trigger phrase for this command

• It can be a sequence of words

• The user must prefix this sequence with the words “Fortune Teller”

Page 20: Windows Phone 8 - 14 Using Speech

The Voice Command Definition (VCD) file: {futureMoney}

• {futureMoney} in the <ListenFor> element refers to the phrase list for the command

• The user can say any of the words in the phrase list to match this command

• The application can determine the phrase used

• The phrase list can be changed by the application dynamically

Page 21: Windows Phone 8 - 14 Using Speech

The Voice Command Definition (VCD) file: <Feedback>

• <Feedback> Showing {futureMoney} </Feedback> is the spoken feedback from the command

• The feedback will insert the phrase item used to activate the command

Page 22: Windows Phone 8 - 14 Using Speech

The Voice Command Definition (VCD) file: <Navigate>

• The Target attribute in <Navigate Target="/money.xaml"/> is the URL for the page to be activated by the command

• Commands can go to different pages, or all go to MainPage.xaml if required

Page 23: Windows Phone 8 - 14 Using Speech

The Voice Command Definition (VCD) file: <PhraseList>

• The <Item> entries in <PhraseList Label="futureMoney"> (money, riches, gold) are the phrases that can be used at the end of the command

• The application can modify the phrase list of a command dynamically
  • It could give movie times for films by name

Page 24: Windows Phone 8 - 14 Using Speech

Installing a Voice Command Definition (VCD) file

• The VCD file can be loaded from the application or from any URI
  • In this case it is just a file that has been added to the project and marked as Content

• The VCD can also be changed by the application when it is running

• The voice commands for an application are loaded into the voice command service when the application runs
  • The application must run at least once to configure the voice commands

    async void setupVoiceCommands()
    {
        await VoiceCommandService.InstallCommandSetsFromFileAsync(
            new Uri("ms-appx:///VCDCommands.xml", UriKind.RelativeOrAbsolute));
    }

Page 25: Windows Phone 8 - 14 Using Speech

Launching Your App With a Voice Command

• If the user now presses and holds the Windows button, and says “Fortune Teller, Will I find gold?”, the phone displays “Showing gold”

• It then launches your app and navigates to the page associated with this command, which is /money.xaml

• The query string passed to the page looks like this:

    "/?voiceCommandName=showMoney&futureMoney=gold&reco=Fortune%20Teller%20Will%20I%20find%20gold"

  • voiceCommandName=showMoney is the command name
  • futureMoney is the phrase list name
  • gold is the recognized phrase
  • reco=... is the whole phrase as it was recognized
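Inside a page the framework already exposes these values through NavigationContext.QueryString, but it can help to see how the launch string breaks down. The following is a minimal sketch in plain C#, assuming nothing beyond the base class library; the class VoiceLaunchQuery and its Parse method are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: splits a voice-launch URI such as
// "/?voiceCommandName=showMoney&futureMoney=gold&reco=..." into its parts.
public static class VoiceLaunchQuery
{
    public static Dictionary<string, string> Parse(string uri)
    {
        var result = new Dictionary<string, string>();

        // Everything after the first '?' is the query string.
        int q = uri.IndexOf('?');
        string query = q >= 0 ? uri.Substring(q + 1) : uri;

        foreach (string pair in query.Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries))
        {
            int eq = pair.IndexOf('=');
            if (eq < 0)
            {
                // A key with no value is stored with an empty string.
                result[Uri.UnescapeDataString(pair)] = string.Empty;
                continue;
            }

            // Undo URL escaping, so "%20" becomes a space.
            string key = Uri.UnescapeDataString(pair.Substring(0, eq));
            string value = Uri.UnescapeDataString(pair.Substring(eq + 1));
            result[key] = value;
        }

        return result;
    }
}
```

Parsing the query string shown on this slide yields three entries: voiceCommandName (the command), futureMoney (the phrase list item that was spoken) and reco (the full recognized sentence with its spaces restored).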

Page 26: Windows Phone 8 - 14 Using Speech

Handling Voice Commands

• This code runs in the OnNavigatedTo method of a target page

• Can also check for the voice command phrase that was used

    // Only act on a fresh navigation, not when the user navigates back to the page.
    if (e.NavigationMode == System.Windows.Navigation.NavigationMode.New)
    {
        if (NavigationContext.QueryString.ContainsKey("voiceCommandName"))
        {
            string command = NavigationContext.QueryString["voiceCommandName"];
            switch (command)
            {
                case "tellJoke":
                    messageTextBlock.Text = "Insert really funny joke here";
                    break;
                // Add cases for other commands.
                default:
                    messageTextBlock.Text = "Sorry, what you said makes no sense.";
                    break;
            }
        }
    }

Page 27: Windows Phone 8 - 14 Using Speech

Identifying phrases

• The navigation context can be queried to determine the phrase used to trigger the navigation

• In this case the program finds out which phrase from the futureMoney list was spoken, for example “riches”

    <PhraseList Label="futureMoney">
      <Item> money </Item>
      <Item> riches </Item>
      <Item> gold </Item>
    </PhraseList>

    string moneyPhrase = NavigationContext.QueryString["futureMoney"];

Page 28: Windows Phone 8 - 14 Using Speech

Demo 2: Fortune Teller

Page 29: Windows Phone 8 - 14 Using Speech

Modifying the phrase list

• An application can modify a phrase list when it is running
  • It cannot add new commands however

• This would allow a program to implement behaviours such as: “Movie Planner tell me showings for Batman”

    // The key selects an installed command set; "en-US" is assumed here to match
    // the name given to the command set in the VCD file.
    VoiceCommandSet fortuneVcs = VoiceCommandService.InstalledCommandSets["en-US"];

    await fortuneVcs.UpdatePhraseListAsync("futureMoney",
        new string[] { "money", "cash", "wonga", "spondoolicks" });

Page 30: Windows Phone 8 - 14 Using Speech

Simple Speech Input


Page 31: Windows Phone 8 - 14 Using Speech

Recognizing Free Speech

• A Windows Phone application can recognise words and phrases and pass them to your program
  • From my experiments it seems quite reliable

• Note that a network connection is required for this feature

• Your application can just use the speech string directly

• The standard “Listening” interface is displayed over your application

Page 32: Windows Phone 8 - 14 Using Speech

Simple Speech Recognition

• The method below checks for a successful response

• By default the system uses the language settings on the Phone

    SpeechRecognizerUI recoWithUI;

    private async void ListenButton_Click(object sender, RoutedEventArgs e)
    {
        this.recoWithUI = new SpeechRecognizerUI();

        SpeechRecognitionUIResult recoResult = await recoWithUI.RecognizeWithUIAsync();

        // Only use the text if recognition completed successfully.
        if (recoResult.ResultStatus == SpeechRecognitionUIStatus.Succeeded)
        {
            MessageBox.Show(string.Format("You said {0}.", recoResult.RecognitionResult.Text));
        }
    }

Page 33: Windows Phone 8 - 14 Using Speech

Customizing Speech Recognition

• InitialSilenceTimeout
  • The time that the speech recognizer will wait until it hears speech
  • The default setting is 5 seconds

• BabbleTimeout
  • The time that the speech recognizer will listen while it hears background noise
  • The default setting is 0 seconds (the feature is not activated)

• EndSilenceTimeout
  • The time interval during which the speech recognizer will wait before finalizing the recognition operation
  • The default setting is 150 milliseconds

Page 34: Windows Phone 8 - 14 Using Speech

Customizing Speech Recognition

• A program can also select whether or not the speech recognition echoes back the user input and displays it in a message box

• The code below also sets timeout values

    recoWithUI.Settings.ReadoutEnabled = false;   // don't read the saying back
    recoWithUI.Settings.ShowConfirmation = false; // don't show the confirmation

    recoWithUI.Recognizer.Settings.InitialSilenceTimeout = TimeSpan.FromSeconds(6.0);
    recoWithUI.Recognizer.Settings.BabbleTimeout = TimeSpan.FromSeconds(4.0);
    recoWithUI.Recognizer.Settings.EndSilenceTimeout = TimeSpan.FromSeconds(1.2);

Page 35: Windows Phone 8 - 14 Using Speech

Handling Errors

• An application can bind to events which indicate problems with the audio input

• There is also an event fired when the state of the capture changes

    recoWithUI.Recognizer.AudioProblemOccurred += Recognizer_AudioProblemOccurred;
    recoWithUI.Recognizer.AudioCaptureStateChanged += Recognizer_AudioCaptureStateChanged;
    ...

    void Recognizer_AudioProblemOccurred(SpeechRecognizer sender,
        SpeechAudioProblemOccurredEventArgs args)
    {
        MessageBox.Show("Please speak more clearly");
    }

Page 36: Windows Phone 8 - 14 Using Speech

Using Grammars


Page 37: Windows Phone 8 - 14 Using Speech

Grammars and Speech input

• The simple speech recognition we have seen so far uses the “Short Dictation” grammar which just captures the text and returns it to the application

• You can add your own grammars that will structure the conversation between the user and the application

• Grammars can be created using the Speech Recognition Grammar Specification (SRGS) Version 1.0 and stored as XML files loaded when the application runs
  • This is a little complex, but worth the effort if you want to create applications with rich language interaction with the user

• If the application just needs to identify particular commands you can use a grammar list to achieve this

Page 38: Windows Phone 8 - 14 Using Speech

Using Grammar Lists

• To create a Grammar List an application defines an array of strings that form the words in the list

• The Grammar can then be added to the recognizer and given a name

• Multiple grammar lists can be added to a recognizer

• The recognizer will now resolve any of the words in the lists that have been supplied

    string[] strengthNames = { "weak", "mild", "medium", "strong", "english" };

    recoWithUI.Recognizer.Grammars.AddGrammarFromList("cheeseStrength", strengthNames);

Page 39: Windows Phone 8 - 14 Using Speech

Enabling and Disabling Grammar Lists

• An application can enable or disable particular grammars before a recognition action

• It is also possible to set relative weightings of grammar lists

• The text displayed as part of the listen operation can also be set, as shown below

    recoWithUI.Settings.ListenText = "How strong do you like your cheese?";

    recoWithUI.Recognizer.Grammars["cheeseStrength"].Enabled = true;

    SpeechRecognitionUIResult recoResult = await recoWithUI.RecognizeWithUIAsync();

Page 40: Windows Phone 8 - 14 Using Speech

Determining the confidence in the result

• An application can determine the confidence that the speech system has in the result that was obtained
  • Result values are High, Medium, Low, Rejected

    SpeechRecognitionUIResult recoResult = await recoWithUI.RecognizeWithUIAsync();

    if (recoResult.RecognitionResult.TextConfidence == SpeechRecognitionConfidence.High)
    {
        // select cheese based on strength value
    }

Page 41: Windows Phone 8 - 14 Using Speech

Matching Multiple Grammars

• If the spoken input matches multiple grammars a program can obtain a list of the alternative results using recoResult.RecognitionResult.GetAlternates

• The list is supplied in order of confidence

• The application can then determine the best fit from the context of the voice request

• This list is also provided if the request used a more complex grammar

    // Ask for up to three alternative interpretations.
    var alternatives = recoResult.RecognitionResult.GetAlternates(3);

Page 42: Windows Phone 8 - 14 Using Speech

Profanity

• Words that are recognised as profanities are not displayed in the response from a recognizer command

• The speech system will also not repeat them

• They are enclosed in <Profanity> </Profanity> when supplied to the program that receives the speech data
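Because the tagged words reach the program as part of the result text, an application that displays or stores that text may want to mask them first. The following is a minimal sketch in plain C#, assuming the tags arrive as the literal text shown above (matched case-insensitively); ProfanityFilter and Mask are hypothetical names used only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: replaces each word wrapped in profanity tags with asterisks.
public static class ProfanityFilter
{
    // Matches <Profanity>...</Profanity> in any letter case, capturing the inner text.
    private static readonly Regex Tagged = new Regex(
        @"<profanity>(.*?)</profanity>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static string Mask(string recognizedText)
    {
        // One asterisk per character of the trimmed inner text; the tags are dropped.
        return Tagged.Replace(
            recognizedText,
            m => new string('*', m.Groups[1].Value.Trim().Length));
    }
}
```

Text that contains no tags passes through unchanged, so the helper can be applied to every recognition result without checking first.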

Page 43: Windows Phone 8 - 14 Using Speech

Review

• Applications in Windows Phone 8 can use speech generation and recognition to interact with users

• Applications can produce speech output from text files which can be marked up with Speech Synthesis Markup Language (SSML) to include sound files

• Applications can be started and provided with initial commands by registering a Voice Command Definition File with the Windows Phone
  • The commands can be picked up when a page is loaded, or the commands specify a particular page to load
  • An application can modify the phrase part of a command to change the activation commands

• Applications can recognise speech using complex grammars or simple word lists

Page 44: Windows Phone 8 - 14 Using Speech

The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.

© 2012 Microsoft Corporation.

All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.

MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.