Speech in .NET Sphinx CMU November 2002



2

Presenter

casey chesnut brains-N-brawn.com

– Web Services
– Mobile / Wireless
– Speech

3

Audience

Java / C++ / VB / C#?
VoiceXml?
SALT / Speech .NET?

4

Outline

MS Technologies
VoiceXml
– Demo
Speech .NET
– Demo
Future
Questions (throughout)

~25 slides

5

MS Technologies

Tools
Devices
– Phone
– Desktop PC
– Pocket PC
– Tablet PC

6

Tools

MS Agents
SAPI / Speech SDK 5.1 (.NET wrappable)
Office
AutoPC ???
ASP .NET (VoiceXml)
(beta) Speech .NET / IE Speech Add-In
…
SALT Telephony gateway (early 2003)
…
Pocket IE Speech Add-In (mid 2003)

7

Devices

Phone
– billions of devices; people are comfortable speaking to them

Desktop PC
– large market; speech input is slower and uncomfortable

Pocket PC
– small market; device limitations create opportunities for speech

Tablet PC
– new market; speech friendly (slate models don’t have keyboards)

8

Phone

ASP .NET w/ VoiceXml 2.0
– Production quality now
– Multiple vendor support

Speech .NET VoiceOnly
– Currently no way to deploy and test over a phone
– Speech .NET Beta 2 has telephony simulation
– MS target market for Speech .NET

9

Desktop PC

Web
– Speech .NET MultiModal (Beta 2 IE Speech Add-In)
– Embedded control w/ SAPI
– MS Agents

Fat client
– SAPI
– MS Agents

10

Pocket PC

Web
– SALT Pocket IE Speech Add-Ins (mid 2003)

Fat client
– 3rd parties only
– MS Reader does not support TTS

11

Tablet PC - TODAY!

Web
– … same as desktop PC
– Beta 2 has added support for Tablet PC
– Virtual keyboard has speech control

Fat client
– … same as desktop PC
– Virtual keyboard has speech control
– MS Reader should be able to support TTS
– Digital Ink is currently more compelling to MS

12

VoiceXml

XML-based language
– Declarative: XML tags, grammars
– Procedural: JavaScript (ECMAScript)

Telephony Gateway is the client
– Event driven: barge-in, goodbye
– Object oriented: properties
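To make the declarative / procedural / event-driven split concrete, here is a minimal sketch of a VoiceXML 2.0 dialog. The coffee-or-tea dialog and the summarize helper are hypothetical (not one of the talk's demos). The dialog is held in a Python string and inspected with the standard xml.etree.ElementTree module only to show which parts are plain tags, which part is the grammar, and which parts are event handlers or properties.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"

# Hypothetical dialog using VoiceXML 2.0 elements.
DOC = """<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="main">
    <field name="drink">
      <!-- property: bargein lets the caller interrupt the prompt -->
      <prompt bargein="true">Would you like coffee or tea?</prompt>
      <!-- declarative: inline grammar in SRGS XML form -->
      <grammar type="application/srgs+xml" version="1.0" root="drink">
        <rule id="drink" scope="public">
          <one-of>
            <item>coffee</item>
            <item>tea</item>
          </one-of>
        </rule>
      </grammar>
      <!-- event driven: silence and unrecognized speech -->
      <noinput>Sorry, I did not hear you. <reprompt/></noinput>
      <nomatch>Please say coffee or tea. <reprompt/></nomatch>
      <filled>
        <!-- procedural: ECMAScript condition evaluated by the gateway -->
        <if cond="drink == 'coffee'">
          <prompt>One coffee, coming up.</prompt>
        <else/>
          <prompt>One tea, coming up.</prompt>
        </if>
      </filled>
    </field>
  </form>
</vxml>"""


def summarize(doc):
    """Report the field name, grammar choices, event handlers and barge-in flag."""
    ns = {"v": VXML_NS}
    field = ET.fromstring(doc).find("v:form/v:field", ns)
    return {
        "field": field.get("name"),
        # declarative: phrases listed in the inline grammar
        "choices": [item.text for item in field.iterfind(".//v:item", ns)],
        # event driven: handlers attached directly to the field
        "handlers": [child.tag.split("}")[-1] for child in field
                     if child.tag.split("}")[-1] in ("noinput", "nomatch")],
        # property on the first prompt: may the caller interrupt it?
        "bargein": field.find("v:prompt", ns).get("bargein"),
    }


print(summarize(DOC))
```

On a real telephony gateway the document would be fetched over HTTP (for example from an ASP .NET page); nothing in the sketch is vendor specific.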

13

Usage

Input
– Speech Recognition (Command and Control)
– DTMF
– Voice recording and posting to a server

Output
– Text-To-Speech
– Prerecorded audio files

Telephony control
– Hang-up, Transfers, …
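The usage categories map fairly directly onto VoiceXML 2.0 elements. The voicemail-style document below is a hypothetical sketch (greeting.wav, save.aspx and the 555 phone number are placeholders): audio plays a prerecorded file and falls back to TTS for its inline text, record captures the caller's voice, submit posts the recording to a server, and transfer / disconnect handle telephony control. The categorize helper, also hypothetical, just lists those elements in document order.

```python
import xml.etree.ElementTree as ET

# Hypothetical VoiceXML 2.0 document; file names and number are placeholders.
DOC = """<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="voicemail">
    <block>
      <!-- output: prerecorded audio; the inline text is spoken via TTS
           if the audio file cannot be played -->
      <audio src="greeting.wav">Please leave a message after the beep.</audio>
    </block>
    <!-- input: record the caller's voice -->
    <record name="msg" beep="true" maxtime="30s" type="audio/x-wav">
      <filled>
        <!-- input: post the recording to a server -->
        <submit next="save.aspx" namelist="msg" method="post"
                enctype="multipart/form-data"/>
      </filled>
    </record>
  </form>
  <form id="operator">
    <!-- telephony control: transfer the call -->
    <transfer name="xfer" dest="tel:+15555550100" bridge="true"/>
    <block>
      <!-- telephony control: hang up -->
      <disconnect/>
    </block>
  </form>
</vxml>"""

# Which slide category each VoiceXML element illustrates.
CATEGORY = {
    "record": "input",
    "submit": "input",
    "audio": "output",
    "transfer": "telephony",
    "disconnect": "telephony",
}


def categorize(doc):
    """Return (element, category) pairs in document order."""
    pairs = []
    for el in ET.fromstring(doc).iter():
        local = el.tag.split("}")[-1]  # strip the {namespace} prefix
        if local in CATEGORY:
            pairs.append((local, CATEGORY[local]))
    return pairs


for element, category in categorize(DOC):
    print(f"{category:10} <{element}>")
```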

14

Architecture

15

VoiceXml

DEMO
– /vxml (VS.NET)
– Mobile ADK (menu1.aspx)
– BeVocal

16

VoiceXml - SALT

VoiceXml : ??? :: SALT : Speech .NET
– Nuance has some WYSIWYG tools

SALT is considered lightweight compared to VoiceXml
SALT was submitted to the W3C in August 2002
VoiceXml is at v2.0 in the W3C
– Mandatory W3C grammar spec
Beta 2 Speech .NET has moved to W3C SRGS

VoiceXml has complementary specs (ccXml)
VoiceXml is moving to MultiModal as well
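For reference, a minimal static grammar in the XML form of W3C SRGS 1.0, the grammar format VoiceXml 2.0 mandates and Beta 2 Speech .NET moved to. The yes/no grammar is a hypothetical example, and matches is a toy stand-in for a recognizer: it only illustrates that recognition is constrained to the phrases the grammar lists, and it only understands a root rule made of one flat one-of list.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"

# Hypothetical static grammar in SRGS 1.0 XML form.
YES_NO = """<grammar version="1.0" xml:lang="en-US" root="answer"
         xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="answer" scope="public">
    <one-of>
      <item>yes</item>
      <item>yeah</item>
      <item>no</item>
      <item>nope</item>
    </one-of>
  </rule>
</grammar>"""


def phrases(grammar_xml):
    """Return the phrases accepted by the grammar's root rule.

    Toy sketch: real SRGS also has <ruleref>, <repeat>, <tag>
    (semantic interpretation), weights, and more.
    """
    ns = {"g": SRGS_NS}
    root = ET.fromstring(grammar_xml)
    root_id = root.get("root")
    rule = root.find(f"g:rule[@id='{root_id}']", ns)
    return {item.text.strip().lower()
            for item in rule.iterfind("g:one-of/g:item", ns)}


def matches(grammar_xml, utterance):
    """True if the (already recognized) text is one of the grammar's phrases."""
    return utterance.strip().lower() in phrases(grammar_xml)


print(matches(YES_NO, "Yeah"))   # True
print(matches(YES_NO, "maybe"))  # False
```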

17

VoiceXml - SALT

VoiceXml = AT&T, Motorola, TellMe, (IBM)
SALT = MS, SpeechWorks, Intel, (BeVocal)
VoiceXml has multiple vendor support, backed by venture capital raised before the bubble burst
Most vendors will support both specs
VoiceXml has ~15,000 developers; SALT has potentially millions

18

SALT

I have not read the new spec
– I remember mentally mapping it to VoiceXml when reading an early spec

Why SALT
– Common spec for MultiModal operation
– Multiple modes of interaction with the same syntax
– Speech enabling existing sites

Why not VoiceXml
– A MultiModal retrofit is harder than a redo

19

Speech .NET

MS implementation of SALT (VoiceWebSolutions + DreamWeaver MX)
Some Beta 1 Speech .NET apps still work because SALT has not changed much, but the Speech .NET Beta 2 controls have changed

VoiceXml is not as portable between vendors as it should be; the Speech .NET controls could help mitigate this for SALT
– i.e., a layer of abstraction for the voice browser wars
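A sketch of that abstraction-layer idea, under loud assumptions: Question is a hypothetical class, not a Speech .NET control, and it emits only the speech-specific fragment for each browser. The same description renders either to a VoiceXML 2.0 field or to SALT 1.0 prompt / listen / grammar / bind elements, so a page author would not write either dialect by hand.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape, quoteattr


@dataclass
class Question:
    """Hypothetical, browser-neutral description of one spoken question."""
    name: str         # id of the target field / HTML input
    prompt: str       # text to speak
    grammar_src: str  # URL of an SRGS grammar file

    def to_vxml(self):
        # VoiceXML 2.0: one <field> owns its prompt and grammar.
        return (f"<field name={quoteattr(self.name)}>"
                f"<prompt>{escape(self.prompt)}</prompt>"
                f"<grammar src={quoteattr(self.grammar_src)}/>"
                f"</field>")

    def to_salt(self):
        # SALT 1.0: separate prompt and listen objects inside an HTML page.
        # <bind> copies a value from the recognition result into the HTML
        # input whose id is self.name; the XPath "//name" assumes the grammar
        # returns an element with that name (hypothetical).
        return (f"<salt:prompt id={quoteattr(self.name + 'Prompt')}>"
                f"{escape(self.prompt)}</salt:prompt>"
                f"<salt:listen id={quoteattr(self.name + 'Listen')}>"
                f"<salt:grammar src={quoteattr(self.grammar_src)}/>"
                f"<salt:bind targetelement={quoteattr(self.name)}"
                f" value={quoteattr('//' + self.name)}/>"
                f"</salt:listen>")


q = Question("city", "Which city?", "cities.grxml")
print(q.to_vxml())  # fragment for a VoiceXML gateway
print(q.to_salt())  # fragment for a SALT browser
```

In a SALT page, client-side script still has to start these objects (SALT prompt and listen expose a Start() method), and the host page must declare the salt: prefix.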

20

Architecture

21

Code

Creating static grammars and prompts

Very little server-side code
– Only for dynamic grammars / prompts
– Server-side code modifications to better support speech

Mainly setting properties on Speech controls and tying them to client-side JavaScript

Tie JavaScript to mouse-click events to avoid redundant code
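Dynamic grammars are where most of the server-side code goes. A minimal sketch, assuming a hypothetical contact list fetched per request: build_grammar (a hypothetical helper) turns the list into an SRGS XML grammar whose root rule accepts any one entry. Building it with xml.etree.ElementTree rather than string concatenation escapes characters such as &; blank and duplicate entries are dropped, and an empty list is rejected because an SRGS one-of must contain at least one item.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"


def build_grammar(rule_id, phrases):
    """Return an SRGS XML grammar whose root rule accepts any one of phrases."""
    # De-duplicate while keeping order; ignore blank entries.
    unique = list(dict.fromkeys(p.strip() for p in phrases if p.strip()))
    if not unique:
        raise ValueError("an SRGS <one-of> needs at least one <item>")

    grammar = ET.Element("grammar", {
        "xmlns": SRGS_NS,  # written out as a plain attribute
        "version": "1.0",
        "xml:lang": "en-US",
        "root": rule_id,
    })
    rule = ET.SubElement(grammar, "rule", {"id": rule_id, "scope": "public"})
    one_of = ET.SubElement(rule, "one-of")
    for phrase in unique:
        ET.SubElement(one_of, "item").text = phrase  # escaped on output
    return ET.tostring(grammar, encoding="unicode")


# Hypothetical data, e.g. loaded from a database for the current user.
contacts = ["Alice Smith", "Bob Jones", "Barnes & Noble", "Bob Jones", "  "]
print(build_grammar("contact", contacts))
```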

22

Impression

Separate app layers to reduce complexity
– Voice UI will be less functional; design is key

Learning low-level SALT might be easier than the high-level Speech .NET controls
– Application controls change this in Beta 2

Speech .NET has a great debugger (now server-side too), plus grammar and prompt tools
Speech Control Editor was needed for dev
IE Audio meter was needed for MultiModal
MultiModal has some time to grow

23

Speech .NET

DEMO
– Speech .NET Beta 2 (VS .NET)
– /noHands (VoiceOnly web app)

24

Industry

Wrote 1st VoiceXml article a year ago
– Received 1st proposal request last month
– 1 other proposal request since then

Wrote 1st Speech .NET article 5 months ago
– Request for an article from MSDN magazine

25

Voice Recognition

PSTN is less secure than the Internet!
– More accessible, and hacks are easier to automate

Traditionally a spoken password OR a DTMF PIN, also #
Clients always confuse it with speech recognition
Not a part of the VoiceXml or SALT specs
– Telephony gateways have proprietary implementations
Not useful for identifying somebody
Useful for confirming somebody is who they say they are
Prints have to change when the device changes
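The identify-versus-confirm distinction can be shown with a toy sketch. Everything in it is hypothetical: the three-number "voiceprints", the cosine-similarity score and the 0.95 threshold merely stand in for the proprietary models that telephony gateways use. Verification is a 1:1 yes/no decision against the claimed identity; identification is a 1:N search that names the closest enrolled speaker even for a stranger. A different handset shifts the sample enough to fail verification, which is why prints have to be re-enrolled.

```python
import math

# Toy, made-up "voiceprints"; real systems use far richer acoustic features.
ENROLLED = {
    "alice": [0.9, 0.1, 0.3],
    "bob": [0.2, 0.8, 0.5],
}
THRESHOLD = 0.95  # hypothetical acceptance threshold


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    norm = math.hypot(*a) * math.hypot(*b)
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def verify(claimed_id, sample):
    """1:1 decision: is the caller who they claim to be?"""
    voiceprint = ENROLLED.get(claimed_id)
    return voiceprint is not None and cosine(voiceprint, sample) >= THRESHOLD


def identify(sample):
    """1:N decision: closest enrolled speaker; always names somebody."""
    return max(ENROLLED, key=lambda uid: cosine(ENROLLED[uid], sample))


same_phone = [0.88, 0.12, 0.31]  # Alice on her enrolled handset
new_phone = [0.6, 0.4, 0.6]      # Alice on a different handset
stranger = [0.5, 0.5, 0.5]       # caller who never enrolled

print(verify("alice", same_phone))  # True
print(verify("alice", new_phone))   # False: print must be re-enrolled
print(identify(stranger))           # bob: identification guesses anyway
print(verify("bob", stranger))      # False: verification rejects
```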

26

Future (MS Speech)

SALT Telephony gateways
Speech .NET (VoiceOnly then MultiModal)
Pocket IE Speech Add-In
.NET Fat-client Speech APIs
– Desktop / Tablet / PPC

MS or 3rd party VS .NET VoiceXml controls
Possibility for Speech .NET controls to render both SALT and VoiceXml

27

Future

Lots of W3C Voice specs …
VoiceXml MultiModal browser
Auto (hands-free, navigation, radio)
3G (bridge voice and wireless web)
– offload Speech processing
– VOIP or PSTN
– Pocket PC Phone Edition / SmartPhones

IBM recently announced a chip for speech on mobile devices

28

Questions