State of the Art/Best Practices in Speech Technology
Dan Burnett, Director of Speech Technologies

Voxeo Summit 2010: Best Practices in Speech Technology

DESCRIPTION

At the Voxeo Customer Summit 2010, Dan Burnett, Voxeo's Director of Speech Technologies, explained the state of the art in speech technologies and the best practices for implementing speech-enabled applications. More information: http://www.voxeo.com/ and http://www.voxeo.com/summit2010

© Voxeo Corporation

Why speech?

Ma Ma

Vok Say Oh

Speech is the natural human interface

•  15% of the world's population has a personal computer
•  More than 60% of the world's population has a mobile phone

What is communication?

You (Your speech-enabled IVR)

Your Customer

Communication is natural?

But for IVRs . . .

You (Your untuned speech-enabled IVR)

Your Customer

So why do we tune?

For better communication, which leads to
•  More satisfied customers
•  Shorter call durations

What can we tune?

Your untuned speech-enabled IVR

What we say – prompts

•  Goal: naturally reduce variability in callers' responses
•  Because: predictability simplifies grammars and increases recognition accuracy

Prompt tuning

Vocabulary
•  Use the words your customers use
•  For sales, say “sales”; for billing, say “billing”; ...
•  “Are you calling to learn more about our products, to fix a problem with your bill, or …”

Keep in mind
•  Speech allows your customer to describe things THEIR way rather than to use your internal company description
•  Make it easier for them to do that!

Prompt tuning

Prompt specificity
•  General: “What would you like?”
•  More specific: “Which department would you like?”
•  Precise: “Would you like A, B, C, or something else?”

Keep in mind
•  The caller will often use the exact words YOU use

Ever heard this before?

•  For Sales, press 1
•  For Billing, press 2
•  For option I can't remember, press 3
•  For another option I can't remember, press 4
•  For yet another option I can't remember, press 5
•  For more of the same, press 6
•  Blah blah, press 7
•  For help with this menu, press 8
•  To hear these options again, press 9

Prompt tuning

Prompt length
•  Keep it short: no more than a few sentences total, only one of which asks for input
•  Or: provide pauses (at least one second long) for interruption

Keep in mind
•  Speech communication is only natural if it's not drawn out
•  Primacy and recency effects: callers best remember the first and last options they hear
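The prompt-length guideline can be checked mechanically. The sketch below is only an illustration: the function name `check_prompt` is hypothetical, the limit of three sentences is an assumed reading of "a few", and a sentence counts as asking for input only if it ends in a question mark.

```python
import re


def check_prompt(prompt, max_sentences=3):
    """Return a list of guideline violations for one prompt (empty if none).

    Assumptions (not from the talk): "a few" means at most 3 sentences,
    and only sentences ending in "?" ask the caller for input.
    """
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [s for s in re.split(r"(?<=[.?!])\s+", prompt.strip()) if s]
    questions = [s for s in sentences if s.endswith("?")]

    problems = []
    if len(sentences) > max_sentences:
        problems.append(f"too long: {len(sentences)} sentences")
    if len(questions) > 1:
        problems.append(f"asks for input {len(questions)} times")
    return problems


GOOD = "Welcome to Example Bank. Which department would you like?"
BAD = "Do you want sales? Or do you want billing? Please tell me now. Thank you."
print(check_prompt(GOOD))
print(check_prompt(BAD))
```

A check like this catches only the mechanical part of the guideline; whether a prompt sounds natural still needs a listening test.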

What we listen for – grammars

•  Goal: Cover everything they are likely to say, and nothing more
•  Because: Accuracy in grammar coverage directly affects recognition accuracy

Grammar tuning

Cover everything they say
•  Pre- and post-phrases such as “please”, “I would like”, and “thank you”
•  Synonyms such as (for yes/no) “yeah”, “sure”, “absolutely not”

Keep in mind
•  Recognizers can only hear it if it's in the grammar
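One way to picture coverage is to expand the core answers with their synonyms and optional pre- and post-phrases. This is a minimal sketch, not a real grammar format: the word lists and the names `build_phrase_table` and `interpret` are hypothetical, and a production grammar would be written in the recognizer's own grammar language.

```python
from itertools import product

# Hypothetical word lists for a yes/no question (illustration only).
SYNONYMS = {
    "yes": ["yes", "yeah", "sure", "absolutely"],
    "no": ["no", "nope", "absolutely not"],
}
PRE_PHRASES = ["", "um", "i said"]          # "" means "nothing spoken"
POST_PHRASES = ["", "please", "thank you"]


def build_phrase_table(synonyms, pre_phrases, post_phrases):
    """Map every covered phrase to the meaning it should return."""
    table = {}
    for meaning, words in synonyms.items():
        for pre, word, post in product(pre_phrases, words, post_phrases):
            phrase = " ".join(part for part in (pre, word, post) if part)
            table[phrase] = meaning
    return table


def interpret(utterance, table):
    """Return the meaning of an utterance, or None if it is out of grammar."""
    return table.get(utterance.lower().strip())


TABLE = build_phrase_table(SYNONYMS, PRE_PHRASES, POST_PHRASES)
print(interpret("Yeah please", TABLE))
print(interpret("maybe", TABLE))
```

Anything not in the table comes back as None, which mirrors the point above: a phrase the grammar does not list cannot be recognized.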

Grammar tuning

Include only what they say
•  Write grammars that don't overgenerate
•  If matching numbers/digits, only include valid strings if at all possible

Keep in mind
•  Every unnecessary grammar phrase is a potential misrecognition
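To see how much a digit grammar can shrink, compare "any four digits" with "only valid four-digit strings". The sketch assumes, purely for illustration, that account numbers end in a Luhn check digit; the talk does not name a validity rule, and `luhn_valid` and `valid_strings` are hypothetical helper names.

```python
def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def valid_strings(length: int) -> list:
    """Enumerate only the digit strings of this length that are valid."""
    candidates = (f"{n:0{length}d}" for n in range(10 ** length))
    return [s for s in candidates if luhn_valid(s)]


ALL_COUNT = 10 ** 4                     # an overgenerating "any 4 digits" grammar
VALID_COUNT = len(valid_strings(4))     # a grammar restricted to valid strings
print(ALL_COUNT, VALID_COUNT)
```

Under this assumed rule the restricted grammar holds a tenth of the strings, so nine out of ten potential misrecognitions are never candidates in the first place.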

How we listen – parameter optimization

•  Goal: Optimize recognizer parameter settings
•  Because: Better accuracy, of course!

Parameter optimization – which parameters?

•  Rejection threshold
•  Endpointer settings (sensitivity)
•  Large grammar parameters

Rejection threshold – what is it?

[Figure: misrecognitions and false rejections plotted against the rejection threshold, from 0 to 100]

Cutoff value for the recognizer confidence below which the speaker's utterance will be rejected
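In code, the definition amounts to a single comparison. This is a sketch under assumptions: `RecognitionResult` and `apply_rejection_threshold` are hypothetical names, and confidence is taken to be on the 0 to 100 scale shown on the slide (real engines may use a different scale).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecognitionResult:
    text: str          # what the recognizer thinks was said
    confidence: int    # assumed 0..100 scale, as on the slide


def apply_rejection_threshold(result: RecognitionResult,
                              threshold: int) -> Optional[str]:
    """Return the recognized text, or None if confidence is below the cutoff."""
    if result.confidence < threshold:
        return None    # rejected: the application would typically reprompt
    return result.text


print(apply_rejection_threshold(RecognitionResult("billing", 62), 45))
print(apply_rejection_threshold(RecognitionResult("billing", 30), 45))
```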

Rejection threshold – total error

[Figure: misrecognitions, false rejections, and total error plotted against the rejection threshold, from 0 to 100]

Rejection threshold – comparison

[Figure: error plotted against the rejection threshold, from 0 to 100, for ASR Engine A and ASR Engine B, with the optimal threshold marked for each]

Rejection threshold – another comparison

[Figure: error plotted against the rejection threshold, from 0 to 100, for ASR Engine A and ASR Engine B, with the optimal threshold marked for each]

Parameter optimization

Rejection threshold
•  Generally largest impact on accuracy
•  Optimum varies across recognition engines
•  Optimum varies by set of active grammars

Keep in mind
•  Optimizing the rejection threshold is the SINGLE MOST IMPORTANT parameter tuning you can do
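Finding the optimum can be sketched as a sweep over candidate thresholds that minimizes total error (misrecognitions plus false rejections). The data below is made up, the names `count_errors` and `best_threshold` are hypothetical, and the sketch assumes every utterance has been transcribed so each result can be labeled correct or incorrect.

```python
# Each entry: (recognizer confidence 0..100, was the result correct?)
# Made-up sample data for illustration only.
SAMPLE = [
    (92, True), (85, True), (70, True), (66, True), (20, True),
    (64, False), (41, False), (33, False),
]


def count_errors(results, threshold):
    """Return (misrecognitions, false_rejections) at one threshold."""
    # A wrong result that is accepted is a misrecognition.
    misrecognitions = sum(
        1 for conf, correct in results if not correct and conf >= threshold)
    # A correct result that is rejected is a false rejection.
    false_rejections = sum(
        1 for conf, correct in results if correct and conf < threshold)
    return misrecognitions, false_rejections


def best_threshold(results, thresholds=range(0, 101)):
    """Return the lowest threshold that minimizes total error."""
    return min(thresholds, key=lambda t: sum(count_errors(results, t)))


print(best_threshold(SAMPLE))
print(count_errors(SAMPLE, best_threshold(SAMPLE)))
```

Because the optimum varies by engine and by the set of active grammars, a sweep like this would be run separately on data collected for each combination.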

Endpointer sensitivity

You (Your hard-of-hearing speech-enabled IVR)

Your Customer

Parameter optimization

Endpointer sensitivity
•  Second-largest impact on accuracy
•  Unnecessarily high and low sensitivity are both bad
•  Optimum should be set once, checked annually

Keep in mind
•  If the recognizer can't hear you, it can't understand what you say
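A toy energy-based endpointer shows why both extremes hurt. This is not how any particular ASR engine works: the function `find_speech`, the frame energies, and the thresholds are all invented for illustration, with a lower energy threshold standing in for higher sensitivity.

```python
def find_speech(frame_energies, energy_threshold, min_frames=3):
    """Return (start, end) frame indices of the first run of at least
    min_frames consecutive frames at or above energy_threshold, else None."""
    start = None
    for i, energy in enumerate(frame_energies):
        if energy >= energy_threshold:
            if start is None:
                start = i                 # possible start of speech
        else:
            if start is not None and i - start >= min_frames:
                return (start, i)         # run was long enough: speech found
            start = None                  # too short: treat as a noise blip
    if start is not None and len(frame_energies) - start >= min_frames:
        return (start, len(frame_energies))
    return None


# Made-up energies: background noise around 2-3, quiet speech around 11-15.
FRAMES = [2, 3, 2, 12, 14, 13, 15, 11, 3, 2]

print(find_speech(FRAMES, energy_threshold=8))    # reasonable sensitivity
print(find_speech(FRAMES, energy_threshold=1))    # too sensitive
print(find_speech(FRAMES, energy_threshold=20))   # not sensitive enough
```

With the middle setting only the speech frames are passed on; the over-sensitive setting treats the background noise as speech, and the under-sensitive one never hears the caller at all.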

Parameter optimization

Large grammar parameters
•  Typically need to be adjusted if the grammar has more than 5000 entries
•  Raising them typically consumes more memory and/or CPU
•  Vary by ASR engine, so ask

Keep in mind
•  If your grammar has many options, your recognizer needs to “think” more than the default settings usually allow

Summary – Keep in mind

•  Speech allows your customer to describe things THEIR way rather than to use your internal company description – make it easy for them!
•  The caller will often use the exact words YOU use
•  Speech communication is only natural if it's not drawn out
•  Recognizers can only hear it if it's in the grammar
•  Every unnecessary grammar phrase is a potential misrecognition
•  Optimizing the rejection threshold is the SINGLE MOST IMPORTANT parameter tuning you can do
•  If the recognizer can't hear you, it can't understand what you say
•  If your grammar has many options, your recognizer needs to “think” more than the default settings usually allow

For help

Dan Burnett, Director of Speech Technologies