Oliver Scheer
Senior Technical Evangelist
Microsoft Deutschland
http://the-oliver.com
Using Speech
Topics
• Speech on Windows Phone 8
• Speech synthesis
• Controlling applications using speech
  • Voice command definition files
  • Building conversations
  • Selecting application entry points
• Simple speech input
• Speech input and grammars
  • Using Grammar Lists
Speech on Windows Phone 8
Windows Phone Speech Support
• Windows Phone 7.x had voice support built into the operating system
  • Programs and phone features could be started by voice commands, e.g. “Start MyApp”
  • Incoming SMS messages could be read to the user
  • The user could compose and send SMS messages
• Windows Phone 8 builds on this to allow applications to make use of speech
• Applications can speak messages using the Speech Synthesis feature
• Applications can be started and given commands
• Applications can accept commands using voice input
• Speech recognition with the built-in dictation grammar requires an internet connection, but speech synthesis does not
Speech Synthesis
04/10/2023
Enabling Speech Synthesis
• To produce speech output, an application must enable the ID_CAP_SPEECH_RECOGNITION capability in WMAppManifest.xml
• The application should also reference the Synthesis namespace:
using Windows.Phone.Speech.Synthesis;
Simple Speech
• The SpeechSynthesizer class provides a simple way to produce speech
• The SpeakTextAsync method speaks the content of the string using the default voice
• Note that SpeakTextAsync is asynchronous, so a calling method that awaits it must be marked with the async modifier
• Speech output does not require a network connection
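async void CheeseLiker()
{
    // Create a synthesizer that uses the phone's default voice
    SpeechSynthesizer synth = new SpeechSynthesizer();

    // Speak the text; the await completes once the text has been spoken
    await synth.SpeakTextAsync("I like cheese.");
}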
Selecting a language
• The default speaking voice is selected automatically from the locale set for the phone
• The InstalledVoices class provides a list of all the voices available on the phone
• The following code selects a French voice
// Query for a voice that speaks French.
var frenchVoices = from voice in InstalledVoices.All
                   where voice.Language == "fr-FR"
                   select voice;

// Set the voice as identified by the query.
synth.SetVoice(frenchVoices.ElementAt(0));
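ElementAt(0) throws when the query finds no matching voice, so a defensive lookup may be preferable. The sketch below shows that selection logic without depending on the platform. It assumes the installed voices can be modeled as hypothetical (Language, DisplayName) tuples instead of the phone's voice objects. PickVoice is a hypothetical helper that compares language tags case-insensitively and falls back to a supplied voice. On a device, the chosen voice would then be passed to SetVoice.

```csharp
using System;

// Hypothetical stand-in for InstalledVoices.All: (Language, DisplayName) pairs.
var installed = new[]
{
    (Language: "en-US", DisplayName: "English voice"),
    (Language: "fr-FR", DisplayName: "French voice"),
};

// Returns the first voice whose language tag matches (ignoring case),
// or the fallback voice when none matches, instead of throwing.
static (string Language, string DisplayName) PickVoice(
    (string Language, string DisplayName)[] voices,
    string language,
    (string Language, string DisplayName) fallback)
{
    foreach (var voice in voices)
    {
        if (string.Equals(voice.Language, language, StringComparison.OrdinalIgnoreCase))
            return voice;
    }
    return fallback;
}

var french = PickVoice(installed, "fr-FR", installed[0]);
var german = PickVoice(installed, "de-DE", installed[0]);
Console.WriteLine(french.DisplayName); // French voice
Console.WriteLine(german.DisplayName); // English voice
```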
Demo 1: Voice Selection
Speech Synthesis Markup Language
• You can use Speech Synthesis Markup Language (SSML) to control the spoken output
  • Change the voice, pitch, rate, volume, pronunciation and other characteristics
  • Also allows the inclusion of audio files into the spoken output
• You can also use the speech synthesizer to speak the contents of a file
<?xml version="1.0" encoding="ISO-8859-1"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <p>
    Your <say-as interpret-as="ordinal">1st</say-as> request was for
    <say-as interpret-as="cardinal">1</say-as> room on
    <say-as interpret-as="date" format="mdy">10/19/2010</say-as>,
    arriving at <say-as interpret-as="time" format="hms12">12:35pm</say-as>.
  </p>
</speak>
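The SSML can also be generated from program data instead of being written as a fixed string. The following sketch uses LINQ to XML (System.Xml.Linq) to build a document shaped like the booking example. BuildBookingSsml and its parameters are hypothetical. The resulting string is the kind of input that SpeechSynthesizer.SpeakSsmlAsync takes. That call is not made here because it needs the phone runtime.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: builds an SSML 1.0 <speak> document in which
// <say-as> tells the synthesizer how to read the number, date and time.
static string BuildBookingSsml(int rooms, string date, string time)
{
    XNamespace ssml = "http://www.w3.org/2001/10/synthesis";
    var doc = new XDocument(
        new XElement(ssml + "speak",
            new XAttribute("version", "1.0"),
            new XAttribute(XNamespace.Xml + "lang", "en-US"),
            new XElement(ssml + "p",
                "You asked for ",
                new XElement(ssml + "say-as",
                    new XAttribute("interpret-as", "cardinal"), rooms),
                " room on ",
                new XElement(ssml + "say-as",
                    new XAttribute("interpret-as", "date"),
                    new XAttribute("format", "mdy"), date),
                ", arriving at ",
                new XElement(ssml + "say-as",
                    new XAttribute("interpret-as", "time"),
                    new XAttribute("format", "hms12"), time),
                ".")));
    return doc.ToString();
}

string ssmlText = BuildBookingSsml(1, "10/19/2010", "12:35pm");
int sayAsCount = XDocument.Parse(ssmlText).Descendants().Count(e => e.Name.LocalName == "say-as");
Console.WriteLine(ssmlText);
Console.WriteLine($"say-as elements: {sayAsCount}");
```

Because LINQ to XML escapes text and writes the namespace declaration itself, the output is always well-formed XML.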
Controlling Applications using Voice Commands
Application Launching using Voice command
• The Voice Command feature of Windows Phone 7 allowed users to start applications
• In Windows Phone 8 the feature has been expanded to allow the user to request data from the application in the start command
• The data allows a particular application page to be selected when the program starts, and can also pass request information to that page
• To start using Voice Commands you must create a Voice Command Definition (VCD) file that defines all the spoken commands
• The application then calls a method to register the words and phrases the first time it is run
The Fortune Teller Program
• The Fortune Teller program will tell your future
• You can ask it questions and it will display replies
  • It could also speak them
• Some of the spoken commands activate different pages of the application and others are processed by the application when it starts running
The Voice Command Definition (VCD) file
• This is the “money” question:
“Fortune Teller Will I find money”
<CommandPrefix> Fortune Teller </CommandPrefix>
<Example> Will I find money </Example>
<Command Name="showMoney">
  <Example> Will I find money </Example>
  <ListenFor> [Will I find] {futureMoney} </ListenFor>
  <Feedback> Showing {futureMoney} </Feedback>
  <Navigate Target="/money.xaml"/>
</Command>
<PhraseList Label="futureMoney">
  <Item> money </Item>
  <Item> riches </Item>
  <Item> gold </Item>
</PhraseList>
• <CommandPrefix> is the phrase the user says to trigger the command. All of the Fortune Teller commands start with this phrase.
• The first <Example> is example text that the phone's speech help displays for this app, as an example of the commands the app supports.
• The Name attribute of <Command> is the command name. The application can obtain it from the URL when it starts.
• The <Example> inside <Command> is the example for this specific command.
• <ListenFor> is the trigger phrase for this command. It can be a sequence of words, and words in square brackets are optional. The user must prefix this sequence with the words “Fortune Teller”.
• {futureMoney} refers to the phrase list for the command. The user can say any of the words in the phrase list to match this command, the application can determine which phrase was used, and the application can change the phrase list dynamically.
• <Feedback> is the spoken feedback from the command. The feedback inserts the phrase-list item that was used to activate the command.
• The Target attribute of <Navigate> is the URL of the page activated by the command. Commands can go to different pages, or all go to MainPage.xaml if required.
• The <PhraseList> items are the phrases that can be used at the end of the command. The application can modify the phrase list of a command dynamically, for example to give movie times for films by name.
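To make the ListenFor notation concrete, the hypothetical helper below lists every utterance that a simple pattern accepts. Text in square brackets is optional, and {label} is replaced by each item of the named phrase list. This only illustrates the notation. It is not how the voice command service matches speech, and it handles only literal words, [optional] groups and {phraseList} references.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: expands a simple ListenFor pattern into the utterances
// it accepts. Supports literal words, [optional words] and {phraseList} labels.
static List<string> ExpandListenFor(string pattern, IDictionary<string, string[]> phraseLists)
{
    // The capture group makes Regex.Split keep the [..] and {..} tokens.
    var parts = Regex.Split(pattern, @"(\[[^\]]*\]|\{[^}]*\})")
                     .Select(p => p.Trim())
                     .Where(p => p.Length > 0);

    var results = new List<string> { "" };
    foreach (var part in parts)
    {
        string[] choices =
            part.StartsWith("[") ? new[] { "", part.Trim('[', ']').Trim() } : // optional
            part.StartsWith("{") ? phraseLists[part.Trim('{', '}').Trim()] :  // phrase list
            new[] { part };                                                   // literal

        // Extend every partial utterance with every choice for this part.
        results = results
            .SelectMany(prefix => choices.Select(choice => (prefix + " " + choice).Trim()))
            .ToList();
    }
    return results.Distinct().ToList();
}

var phraseLists = new Dictionary<string, string[]>
{
    ["futureMoney"] = new[] { "money", "riches", "gold" },
};

foreach (var utterance in ExpandListenFor("[Will I find] {futureMoney}", phraseLists))
    Console.WriteLine("Fortune Teller " + utterance);
```

For the showMoney command this yields six utterances: the three phrase-list items alone, and each item preceded by "Will I find".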
Installing a Voice Command Definition (VCD) file
• The VCD file can be loaded from the application or from any URI
  • In this case it is just a file that has been added to the project and marked as Content
• The phrase lists in the VCD can also be changed by the application while it is running
• The voice commands for an application are loaded into the voice command service when the application runs
  • The application must run at least once to configure the voice commands
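using Windows.Phone.Speech.VoiceCommands;

async void setupVoiceCommands()
{
    // Install the command sets defined in VCDCommands.xml (a project file marked as Content)
    await VoiceCommandService.InstallCommandSetsFromFileAsync(
        new Uri("ms-appx:///VCDCommands.xml", UriKind.RelativeOrAbsolute));
}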
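A ListenFor element that refers to an undefined phrase list is an easy typo, so it may be worth checking the file before it is installed. The sketch below embeds a complete Fortune Teller VCD file. It wraps the fragment used in this section in the VoiceCommands and CommandSet elements, with the Windows Phone 8 schema namespace. It then reports any {label} in a ListenFor element that has no matching PhraseList. FindMissingPhraseLists is a hypothetical helper. The Name="en-US" attribute is also an assumption, included on the understanding that InstalledCommandSets looks up command sets by that name.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Complete VCD file: the fragment from this section wrapped in the
// VoiceCommands root and a CommandSet (Name="en-US" is an assumption).
const string vcd = @"<?xml version=""1.0"" encoding=""utf-8""?>
<VoiceCommands xmlns=""http://schemas.microsoft.com/voicecommands/1.0"">
  <CommandSet xml:lang=""en-US"" Name=""en-US"">
    <CommandPrefix>Fortune Teller</CommandPrefix>
    <Example>Will I find money</Example>
    <Command Name=""showMoney"">
      <Example>Will I find money</Example>
      <ListenFor>[Will I find] {futureMoney}</ListenFor>
      <Feedback>Showing {futureMoney}</Feedback>
      <Navigate Target=""/money.xaml"" />
    </Command>
    <PhraseList Label=""futureMoney"">
      <Item>money</Item>
      <Item>riches</Item>
      <Item>gold</Item>
    </PhraseList>
  </CommandSet>
</VoiceCommands>";

// Hypothetical check: returns {labels} used in <ListenFor> that have no
// <PhraseList Label="..."> in the same document.
static string[] FindMissingPhraseLists(string vcdXml)
{
    var doc = XDocument.Parse(vcdXml);
    var labels = doc.Descendants()
                    .Where(e => e.Name.LocalName == "PhraseList")
                    .Select(e => (string)e.Attribute("Label"))
                    .ToHashSet();

    return doc.Descendants()
              .Where(e => e.Name.LocalName == "ListenFor")
              .SelectMany(e => Regex.Matches(e.Value, @"\{([^}]+)\}").Cast<Match>())
              .Select(m => m.Groups[1].Value.Trim())
              .Where(label => !labels.Contains(label))
              .Distinct()
              .ToArray();
}

string[] missingLists = FindMissingPhraseLists(vcd);
Console.WriteLine(missingLists.Length == 0
    ? "All phrase lists are defined."
    : "Missing: " + string.Join(", ", missingLists));
```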
Launching Your App With a Voice Command
• If the user now presses and holds the Windows button and says:
“Fortune Teller, Will I find gold?”
the Phone displays “Showing gold”
• It then launches your app and navigates to the page associated with this command, which is /money.xaml
• The query string passed to the page looks like this:
"/?voiceCommandName=showMoney&futureMoney=gold&reco=Fortune%20Teller%20Will%20I%20find%20gold"
  • voiceCommandName=showMoney gives the command name
  • futureMoney=gold gives the phrase list name and the recognized phrase
  • reco gives the whole phrase as it was recognized
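NavigationContext.QueryString already decodes these values on the phone. The platform-independent sketch below makes the encoding explicit. It splits a navigation URI of the form shown above into key/value pairs and un-escapes each one with Uri.UnescapeDataString. ParseQuery is a hypothetical helper. Note that UnescapeDataString decodes %20 but does not turn '+' into a space.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: splits the query part of a navigation URI into
// decoded key/value pairs (a later duplicate key overwrites an earlier one).
static Dictionary<string, string> ParseQuery(string uri)
{
    var result = new Dictionary<string, string>();
    int start = uri.IndexOf('?');
    if (start < 0)
        return result; // no query string at all

    foreach (var pair in uri.Substring(start + 1).Split('&', StringSplitOptions.RemoveEmptyEntries))
    {
        int eq = pair.IndexOf('=');
        string key = eq < 0 ? pair : pair.Substring(0, eq);
        string value = eq < 0 ? "" : pair.Substring(eq + 1);
        result[Uri.UnescapeDataString(key)] = Uri.UnescapeDataString(value);
    }
    return result;
}

var query = ParseQuery(
    "/?voiceCommandName=showMoney&futureMoney=gold&reco=Fortune%20Teller%20Will%20I%20find%20gold");

Console.WriteLine(query["voiceCommandName"]); // showMoney
Console.WriteLine(query["futureMoney"]);      // gold
Console.WriteLine(query["reco"]);             // Fortune Teller Will I find gold
```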
Handling Voice Commands
• This code runs in the OnNavigatedTo method of a target page
• The page can also check which voice command phrase was used
protected override void OnNavigatedTo(System.Windows.Navigation.NavigationEventArgs e)
{
    base.OnNavigatedTo(e);

    // Only handle voice commands on a fresh navigation, not when returning to the page
    if (e.NavigationMode == System.Windows.Navigation.NavigationMode.New)
    {
        // A voice launch adds voiceCommandName to the query string
        if (NavigationContext.QueryString.ContainsKey("voiceCommandName"))
        {
            string command = NavigationContext.QueryString["voiceCommandName"];
            switch (command)
            {
                case "tellJoke":
                    messageTextBlock.Text = "Insert really funny joke here";
                    break;
                // Add cases for other commands.
                default:
                    messageTextBlock.Text = "Sorry, what you said makes no sense.";
                    break;
            }
        }
    }
}
Identifying phrases
• The navigation context can be queried to determine the phrase used to trigger the navigation
• In this case the program reads which item of the futureMoney phrase list (money, riches or gold) the user said in the “money” question
<PhraseList Label="futureMoney">
  <Item> money </Item>
  <Item> riches </Item>
  <Item> gold </Item>
</PhraseList>
string moneyPhrase = NavigationContext.QueryString["futureMoney"];
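When the page is opened normally rather than by voice, the query string has no voice keys, so indexing it directly fails with a missing-key exception. Below is a minimal, platform-independent sketch of the reply logic that models the query as a plain dictionary. ReplyFor and its reply strings are hypothetical. On the phone, NavigationContext.QueryString could be passed in directly, since it is an IDictionary<string, string>.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical page logic: picks a reply from the command name and the
// phrase-list item, using TryGetValue because a normal launch has no such keys.
static string ReplyFor(IDictionary<string, string> query)
{
    if (!query.TryGetValue("voiceCommandName", out var command))
        return "Ask me about your future.";

    switch (command)
    {
        case "showMoney":
            query.TryGetValue("futureMoney", out var moneyPhrase);
            return moneyPhrase switch
            {
                "gold" => "Gold glitters in your future.",
                "riches" => "Riches await you.",
                _ => "Money is coming your way.",
            };
        default:
            return "Sorry, what you said makes no sense.";
    }
}

Console.WriteLine(ReplyFor(new Dictionary<string, string>
{
    ["voiceCommandName"] = "showMoney",
    ["futureMoney"] = "riches",
}));
```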
Demo 2: Fortune Teller
Modifying the phrase list
• An application can modify a phrase list when it is running
  • It cannot add new commands, however
• This would allow a program to implement behaviours such as:
“Movie Planner tell me showings for Batman”
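// Look up the installed command set (keyed by the Name attribute of the CommandSet in the VCD file)
VoiceCommandSet fortuneVcs = VoiceCommandService.InstalledCommandSets["en-US"];

// Replace the items in the futureMoney phrase list
await fortuneVcs.UpdatePhraseListAsync("futureMoney", new string[] { "money", "cash", "wonga", "spondoolicks" });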
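Phrase lists built from live data, such as the film titles in the Movie Planner example, may need tidying before they are installed. The hypothetical helper below trims each title, collapses runs of whitespace and drops duplicates that differ only in case. The resulting array is what would be passed as the second argument of UpdatePhraseListAsync. The "movieName" phrase list named in the comment is also hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: normalizes raw titles into phrase-list items.
static string[] ToPhraseList(IEnumerable<string> titles) =>
    titles.Where(t => !string.IsNullOrWhiteSpace(t))       // skip blank entries
          .Select(t => Regex.Replace(t.Trim(), @"\s+", " ")) // tidy the spacing
          .Distinct(StringComparer.OrdinalIgnoreCase)       // drop case-only duplicates
          .ToArray();

string[] movieTitles = ToPhraseList(new[] { " Batman ", "batman", "The  Dark Knight", "" });
Console.WriteLine(string.Join(" | ", movieTitles));

// On the phone (hypothetical "movieName" phrase list):
// await movieVcs.UpdatePhraseListAsync("movieName", movieTitles);
```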
Simple Speech Input
Recognizing Free Speech
• A Windows Phone application can recognise words and phrases and pass them to your program
• From my experiments it seems quite reliable
• Note that a network connection is required for this feature
• Your application can just use the speech string directly
• The standard “Listening” interface is displayed over your application
Simple Speech Recognition
• The following method checks for a successful response
• By default the system uses the language settings on the Phone
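using Windows.Phone.Speech.Recognition;

SpeechRecognizerUI recoWithUI;

async private void ListenButton_Click(object sender, RoutedEventArgs e)
{
    // Create a recognizer that displays the standard "Listening" UI
    this.recoWithUI = new SpeechRecognizerUI();

    // Listen, then wait for the result without blocking the UI thread
    SpeechRecognitionUIResult recoResult = await recoWithUI.RecognizeWithUIAsync();

    // Only use the text if recognition completed successfully
    if (recoResult.ResultStatus == SpeechRecognitionUIStatus.Succeeded)
        MessageBox.Show(string.Format("You said {0}.", recoResult.RecognitionResult.Text));
}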
Customizing Speech Recognition
• InitialSilenceTimeout
  • The time that the speech recognizer will wait until it hears speech
  • The default setting is 5 seconds
• BabbleTimeout
  • The time that the speech recognizer will listen while it hears background noise
  • The default setting is 0 seconds (the feature is not activated)
• EndSilenceTimeout
  • The time interval during which the speech recognizer will wait before finalizing the recognition operation
  • The default setting is 150 milliseconds
Customizing Speech Recognition
• A program can also select whether or not the speech recognizer reads the user input back and displays it on a confirmation screen
• The code below also sets timeout values
recoWithUI.Settings.ReadoutEnabled = false;   // don't read the recognized text back
recoWithUI.Settings.ShowConfirmation = false; // don't show the confirmation screen

recoWithUI.Recognizer.Settings.InitialSilenceTimeout = TimeSpan.FromSeconds(6.0);
recoWithUI.Recognizer.Settings.BabbleTimeout = TimeSpan.FromSeconds(4.0);
recoWithUI.Recognizer.Settings.EndSilenceTimeout = TimeSpan.FromSeconds(1.2);
Handling Errors
• An application can bind to events which indicate problems with the audio input
• There is also an event fired when the state of the capture changes
recoWithUI.Recognizer.AudioProblemOccurred += Recognizer_AudioProblemOccurred;
recoWithUI.Recognizer.AudioCaptureStateChanged += Recognizer_AudioCaptureStateChanged;
...

// Called when the recognizer detects a problem with the audio input
void Recognizer_AudioProblemOccurred(SpeechRecognizer sender, SpeechAudioProblemOccurredEventArgs args)
{
    MessageBox.Show("Please speak more clearly");
}
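A single "speak more clearly" message treats every audio problem the same way. A hypothetical refinement maps each reported problem to a more specific hint. The problem is passed as a string so that the sketch runs without the Windows Phone SDK. The names TooNoisy, TooQuiet, TooLoud, TooFast, TooSlow and NoSignal are an assumption about the members of the platform's audio-problem enumeration exposed through the event arguments. Check them against the SDK before switching on the real enum.

```csharp
using System;

// Hypothetical mapping from an audio problem (by name) to a user-facing hint.
// The member names are assumptions; verify them against the SDK enumeration.
static string HintFor(string problem) => problem switch
{
    "TooNoisy" => "It's too noisy here. Please try somewhere quieter.",
    "TooQuiet" => "Please speak a little louder.",
    "TooLoud" => "Please speak a little more quietly.",
    "TooFast" => "Please speak more slowly.",
    "TooSlow" => "Please speak a little faster.",
    "NoSignal" => "I can't hear anything. Please check the microphone.",
    _ => "Please speak more clearly.",
};

Console.WriteLine(HintFor("TooQuiet"));
```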
Using Grammars
Grammars and Speech input
• The simple speech recognition we have seen so far uses the “Short Dictation” grammar, which just captures the text and returns it to the application
• You can add your own grammars that will structure the conversation between the user and the application
• Grammars can be created using the Speech Recognition Grammar Specification (SRGS) Version 1.0 and stored as XML files that are loaded when the application runs
  • This is a little complex, but worth the effort if you want to create applications with rich language interaction with the user
• If the application just needs to identify particular commands, you can use a grammar list to achieve this
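To give a rough idea of what an SRGS 1.0 grammar file contains, the following sketch uses LINQ to XML to build the simplest useful one: a single root rule whose one-of element accepts exactly one word from a list. BuildOneOfGrammar is a hypothetical helper. The sketch does not show how the file reaches the recognizer. It assumes the file is saved into the project and loaded through the grammar set's AddGrammarFromUri method.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: an SRGS 1.0 grammar whose root rule matches
// exactly one of the supplied words.
static XDocument BuildOneOfGrammar(string ruleId, string language, string[] words)
{
    XNamespace srgs = "http://www.w3.org/2001/06/grammar";
    return new XDocument(
        new XDeclaration("1.0", "utf-8", null),
        new XElement(srgs + "grammar",
            new XAttribute("version", "1.0"),
            new XAttribute(XNamespace.Xml + "lang", language),
            new XAttribute("root", ruleId),
            new XElement(srgs + "rule",
                new XAttribute("id", ruleId),
                new XElement(srgs + "one-of",
                    words.Select(w => new XElement(srgs + "item", w))))));
}

var grammar = BuildOneOfGrammar("cheeseStrength", "en-US",
    new[] { "weak", "mild", "medium", "strong", "english" });
Console.WriteLine(grammar);
```

Real grammars add further rules, rule references and semantic tags. A single one-of rule is equivalent to a grammar list.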
Using Grammar Lists
• To create a Grammar List, an application defines an array of strings that form the words in the list
• The Grammar can then be added to the recognizer and given a name
• Multiple grammar lists can be added to a recognizer
• The recognizer will now resolve any of the words in the lists that have been supplied
string[] strengthNames = { "weak", "mild", "medium", "strong", "english" };
recoWithUI.Recognizer.Grammars.AddGrammarFromList("cheeseStrength", strengthNames);
Enabling and Disabling Grammar Lists
• An application can enable or disable particular grammars before a recognition action
• It is also possible to set relative weightings of grammar lists
• The text displayed as part of the listen operation can also be set, as shown below
recoWithUI.Settings.ListenText = "How strong do you like your cheese?";
recoWithUI.Recognizer.Grammars["cheeseStrength"].Enabled = true;
SpeechRecognitionUIResult recoResult = await recoWithUI.RecognizeWithUIAsync();
Determining the confidence in the result
• An application can determine the confidence that the speech system has in the result that was obtained
  • Result values are High, Medium, Low and Rejected
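SpeechRecognitionUIResult recoResult = await recoWithUI.RecognizeWithUIAsync();

// Check that recognition succeeded before reading the result's confidence
if (recoResult.ResultStatus == SpeechRecognitionUIStatus.Succeeded &&
    recoResult.RecognitionResult.TextConfidence == SpeechRecognitionConfidence.High)
{
    // select cheese based on strength value
}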
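Acting only on High confidence discards usable results. One reasonable policy is to act on High, ask for confirmation on Medium, and re-prompt on Low or Rejected. To run without the phone SDK, the sketch passes the confidence as the name of the SpeechRecognitionConfidence value. On a device you would switch on the enum itself. NextStep and its messages are hypothetical.

```csharp
using System;

// Hypothetical policy keyed on the confidence level's name
// (High, Medium, Low or Rejected): act, confirm, or ask again.
static (string Action, string Message) NextStep(string confidence, string text) => confidence switch
{
    "High" => ("accept", text),
    "Medium" => ("confirm", $"Did you say {text}?"),
    _ => ("reprompt", "Sorry, I didn't catch that. How strong do you like your cheese?"),
};

var step = NextStep("Medium", "strong");
Console.WriteLine($"{step.Action}: {step.Message}");
```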
Matching Multiple Grammars
• If the spoken input matches multiple grammars, a program can obtain a list of the alternative results using recoResult.RecognitionResult.GetAlternates
• The list is supplied in order of confidence
• The application can then determine the best fit from the context of the voice
request
• This list is also provided if the request used a more complex grammar
var alternatives = recoResult.RecognitionResult.GetAlternates(3);
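Because the alternates arrive in order of confidence, "best fit from the context" can be as simple as taking the first alternate that makes sense at this point in the conversation. In this sketch the alternates are plain strings. On the phone they might be produced from GetAlternates(3) by selecting each result's Text. PickInContext is a hypothetical helper.

```csharp
using System;
using System.Linq;

// Hypothetical helper: alternates arrive best-first, so the first one that
// fits the current context is the best contextual match (null if none fits).
static string PickInContext(string[] alternatesBestFirst, string[] expected) =>
    alternatesBestFirst.FirstOrDefault(
        a => expected.Contains(a, StringComparer.OrdinalIgnoreCase));

string[] strengths = { "weak", "mild", "medium", "strong", "english" };

// On the phone these strings might come from
// recoResult.RecognitionResult.GetAlternates(3).Select(r => r.Text).ToArray()
string choice = PickInContext(new[] { "strung", "strong", "wrong" }, strengths);
Console.WriteLine(choice ?? "No alternate fits; ask again.");
```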
Profanity
• Words that are recognised as profanities are not displayed in the response from a recognizer command
• The speech system will also not repeat them
• They are enclosed in <Profanity> </Profanity> when supplied to the program that receives the speech data
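An application that displays or stores the recognized text itself may want to mask the marked words. This is a minimal sketch that assumes the markers appear in the result text exactly as <Profanity>...</Profanity>, as described above. MaskProfanity is a hypothetical helper that replaces each marked word with asterisks of the same length.

```csharp
using System;
using System.Text.RegularExpressions;

// Minimal sketch: replaces each <Profanity>...</Profanity> span with asterisks
// of the same length, so the word is neither shown nor stored.
static string MaskProfanity(string recognized) =>
    Regex.Replace(recognized, @"<Profanity>(.*?)</Profanity>",
                  m => new string('*', m.Groups[1].Value.Length),
                  RegexOptions.IgnoreCase);

Console.WriteLine(MaskProfanity("what the <Profanity>heck</Profanity> is this"));
```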
Review
• Applications in Windows Phone 8 can use speech generation and recognition to interact with users
• Applications can produce speech output from text files, which can be marked up with Speech Synthesis Markup Language (SSML) to include sound files
• Applications can be started and provided with initial commands by registering a Voice Command Definition File with the Windows Phone
  • The commands can be picked up when a page is loaded, or the commands can specify a particular page to load
  • An application can modify the phrase part of a command to change the activation commands
• Applications can recognise speech using complex grammars or simple word lists
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.
© 2012 Microsoft Corporation.
All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.