Holly5 VoiceXML Developer Guide Holly Voice Platform 5.1

Document number: hvp-vxml-0009

Version: 1-0

Issue date: December 22 2009

Confidential

Final

Holly5 VoiceXML Developer Guide, v1-0, 22 December 2009

Copyright

© Copyright 2013 West Corporation. These documents are confidential and contain proprietary information. No part of these documents may be reproduced, published or disclosed in whole or part, by any means: mechanical, electronic, photocopying, recording or otherwise without the prior written permission of West Corporation or Holly Australia Pty Ltd.

The information contained in this document is strictly commercial in confidence and can only be provided to persons who have signed a non-disclosure agreement. This document is not to be copied without prior written consent.

Control

Version    Date           Change Notes            Author
1-0        22 Dec 2009    Approved for release    A Hunt

Related Documents

Doc Number      Document Title

hvp-rpt-141     Holly Voice Platform Release Notes – HVP Release 5.1

hvp-hms-0013    Holly Management System User Guide – HVP Release 5.1

hvp-0028        Holly Voice Platform Operations Guide – HVP Release 5.1

VoiceXML 2.0    McGlashan et al., Voice Extensible Markup Language (VoiceXML) Version 2.0, W3C Recommendation 16 March 2004, http://www.w3.org/TR/2004/REC-voicexml20-20040316/

VoiceXML 2.1    Oshry et al., Voice Extensible Markup Language (VoiceXML) 2.1, W3C Recommendation 19 June 2007, http://www.w3.org/TR/2007/REC-voicexml21-20070619/

RFC2965         D. Kristol and L. Montulli, “HTTP State Management Mechanism”, RFC 2965, October 2000

RFC2396         T. Berners-Lee, R. Fielding and L. Masinter, “Uniform Resource Identifiers (URI): Generic Syntax”, RFC 2396, August 1998

SRGS 1.0        McGlashan & Hunt, Speech Recognition Grammar Specification Version 1.0, W3C Recommendation 16 March 2004, http://www.w3.org/TR/2004/REC-speech-grammar-20040316/

SSML 1.0        Burnett, Walker & Hunt, Speech Synthesis Markup Language (SSML) Version 1.0, W3C Recommendation 7 September 2004, http://www.w3.org/TR/2004/REC-speech-synthesis-20040907/

SISR 1.0        Tichelen & Burke, Semantic Interpretation for Speech Recognition (SISR) Version 1.0, W3C Working Draft 3 November 2006, http://www.w3.org/TR/2006/WD-semantic-interpretation-20061103/

Table of Contents

1. Introduction
2. VoiceXML Documents and Execution
    2.1 Standards Compliance
        2.1.1 VoiceXML 2.0 & 2.1 Compliance
        2.1.2 Compatibility of VoiceXML 2.0 & 2.1
        2.1.3 VoiceXML Extensions
        2.1.4 Holly Voice Platform Dependent Behaviors
    2.2 VoiceXML Document Handling
        2.2.1 VoiceXML Document Headers
        2.2.2 Header and Parsing
        2.2.3 Scope of VoiceXML Properties
        2.2.4 Character Sets and Supported Languages
    2.3 Events
    2.4 Errors
        2.4.1 Fetch Errors
        2.4.2 Other Errors
    2.5 Fetching Behaviors
        2.5.1 Initial Document Failover
        2.5.2 HTTP
        2.5.3 HTTPS
    2.6 HTTP User-Agent Header
        2.6.1 DNS Resolving Behavior
        2.6.2 Cookies
        2.6.3 Caching
        2.6.4 Disabling Caching
    2.7 Default Property Values
    2.8 Access Control on <data>
    2.9 Browser Protections
3. Input: Speech Recognition and DTMF
    3.1 Selecting a Speech Recognizer
        3.1.1 ASR Engine Switching
        3.1.2 Allowed ASR Engines
    3.2 ASR-Specific Behaviors
        3.2.1 Engine-Specific Properties
        3.2.2 vLingo
        3.2.3 Loquendo
    3.3 ASR Sessions
    3.4 Timeouts (speech)
    3.5 Confidence Scores
    3.6 N-Best
    3.7 Record
    3.8 Recording User Utterances during Recognition
    3.9 Re-Recognition from Recorded Utterances
    3.10 Grammars
        3.10.1 Standard Grammars
        3.10.2 Proprietary Grammars
        3.10.3 External Grammars
        3.10.4 Pre-built and Binary Grammars
        3.10.5 Universal Command Grammars
        3.10.6 Builtin Support
        3.10.7 Grammar Fetch Behavior
    3.11 DTMF
        3.11.1 interdigittimeout and termtimeout
        3.11.2 termchar
    3.12 DTMF Buffering
    3.13 Holly DTMF Recognizer v2
4. Output: Prompting and TTS
    4.1 Selecting a TTS Engine
        4.1.1 TTS Switching
        4.1.2 Using a Non-default TTS Voice
        4.1.3 SSML
    4.2 Audio Files
        4.2.1 Throwing Errors on Audio Fetch Failures
    4.3 <mark> Element
5. Telephony: Session & Transfers
    5.1 Session Variables
        5.1.1 Session Variables for Outbound
    5.2 Session.connection.aai Example
    5.3 Passing Data Between Sessions
    5.4 Transfers
        5.4.1 Transfer Types
        5.4.2 Destination URIs
        5.4.3 Transfer CLID
        5.4.4 Recognition During Transfer
        5.4.5 Whisper Transfer
6. Logging
    6.1 Events
        6.1.1 Configuring Event Logging
    6.2 <log> Element
        6.2.1 Label on <log>
        6.2.2 Changing the Event Type
        6.2.3 Objects and Arrays
        6.2.4 ECMAScript Log Function
    6.3 Call Record: LOG_CALLS
    6.4 Log Suppression
        6.4.1 Exceptions to Suppression
        6.4.2 Record of Suppression
        6.4.3 Logging Masked Data
    6.5 Raising Alarms
A. Appendix: Application Parameters
    A.1 VoiceXML
    A.2 Speech Recognition
    A.3 DTMF
    A.4 Text to Speech
    A.5 Logging
    A.6 Telephony
B. Appendix: Re-Recognition from Recorded Utterance
    B.1 Re-recognition in VoiceXML Applications
    B.2 Prompts and Barge-in
    B.3 ASR Configuration
C. Appendix: Holly DTMF Recognizer v2
    C.1 SRGS+XML
    C.2 Sample grammars

1. Introduction

This document is a guide for VoiceXML developers who are constructing VoiceXML applications to run on the Holly Voice Platform. It documents the characteristics of the Holly VoiceXML implementation and details the supported VoiceXML extensions to the standard.

This document does not provide a full introduction to programming in VoiceXML; some knowledge of the VoiceXML 2.0 and 2.1 standards is assumed. See:

VoiceXML 2.0: http://www.w3.org/TR/2004/REC-voicexml20-20040316/ (16 March 2004)

VoiceXML 2.1: http://www.w3.org/TR/2007/REC-voicexml21-20070619/ (19 June 2007)



2. VoiceXML Documents and Execution

2.1 Standards Compliance

2.1.1 VoiceXML 2.0 & 2.1 Compliance

Holly is a VoiceXML 2.0 and VoiceXML 2.1 conformant platform according to the following specifications:

VoiceXML 2.0: http://www.w3.org/TR/2004/REC-voicexml20-20040316/ (16 March 2004)

VoiceXML 2.1: http://www.w3.org/TR/2007/REC-voicexml21-20070619/ (19 June 2007)

The Holly Voice Platform supports all required capabilities defined in these standards and supports

most of the optional features as documented in this guide.

In terms of the VoiceXML architectural model, the Holly Voice Browser comprises a VoiceXML

Interpreter and a VoiceXML Interpreter Context integrated with the Holly Voice Platform.

2.1.2 Compatibility of VoiceXML 2.0 & 2.1

VoiceXML 2.1 is fully compatible with VoiceXML 2.0 so there is no requirement for application

migration. Further, Holly enables VoiceXML 2.0 and 2.1 applications to co-reside on the same platform

and even allows a single call or application to mix VoiceXML 2.0 and 2.1 content.

2.1.3 VoiceXML Extensions

Holly sees clear benefit to customers in providing a faithful and compliant implementation of open standards. In addition to having a Certified 100% Compliant implementation of the VoiceXML standard, Holly has implemented limited extensions where required to provide customers with functionality that cannot be directly implemented through the standard. These extensions are clearly labeled as such in this document.

2.1.4 Holly Voice Platform Dependent Behaviors

In implementing the VoiceXML specification, Holly delivers a range of value-add capabilities that enhance its utility for development and operations staff whilst maintaining full compatibility with the standard.

Note: Some properties depend on the configuration of the Holly Voice Platform as described

elsewhere in this document. Platform administrators may choose to switch off support for

some features and therefore properties may have no effect despite being set correctly.

2.2 VoiceXML Document Handling

2.2.1 VoiceXML Document Headers

When using VoiceXML 2.1 features, the following declaration must be present in the document header:

<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">

Following is a sample VoiceXML 2.1 header:

<vxml xmlns="http://www.w3.org/2001/vxml"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2001/vxml
                          http://www.w3.org/TR/voicexml20/vxml.xsd"
      version="2.1">

Following is a sample VoiceXML 2.0 header:

<vxml xmlns="http://www.w3.org/2001/vxml"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2001/vxml
                          http://www.w3.org/TR/voicexml20/vxml.xsd"
      version="2.0">

2.2.2 Header and Parsing

For conforming validation of input VoiceXML documents, the ‘xmlns’ attribute must be supplied on the

root element of the document with the value “http://www.w3.org/2001/vxml” as required by

VoiceXML 2.0.

The handling of the ‘xmlns’ attribute is documented in the following table.

‘xmlns’ attribute value
    Document Treatment

http://www.w3.org/2001/vxml
    Indicates that the document is a Conforming VoiceXML 2.0 document. The Holly Voice Platform performs strict document checking according to the standard.

<not defined>
    The Holly Voice Platform assumes a VoiceXML 2.0 document and performs loose validation of the document content.

<other namespaces>
    The Holly Voice Platform throws an ‘error.badfetch’ event.

The application parameter ‘nonstrictvxml’ can be set to “true” to achieve loose validation of document content even when the ‘xmlns’ attribute is present on the root element of documents. This parameter affects the parsing of all documents loaded during a call, and also causes the ECMAScript interpreter to ignore references to undefined properties.

The ‘nonstrictvxml’ parameter can only be set as an application parameter via the Holly Management

System; it cannot be set as a VoiceXML property.

With ‘nonstrictvxml’ set to “true”, the parser validates documents against the VoiceXML DTD rather than the VoiceXML schema.

2.2.3 Scope of VoiceXML Properties

The VoiceXML property behaviors are fully implemented following Section 6.3 of VoiceXML 2.0.

Properties may be declared with application scope (in the root document), with document scope

(within a <vxml> element), or for a particular <menu>, <form>, or form item.

Properties apply to their parent element and all the descendants of the parent. A property at a lower

level overrides a property at a higher level. When different values for a property are specified at the

same level, the last one in document order applies. Properties specified in the application root

document provide default values for properties in every document in the application; properties

specified in an individual document override property values specified in the application root document.

Additionally, the Holly Voice Platform provides the ability to set properties at the session level (interpreter context). Session properties are set as an Application Parameter on the Applications page in the Holly Management System (administrators should refer to the Holly Management System User Guide for details on modifying application parameters). The session scope is above the application scope as defined in VoiceXML 2.0, section 5.1.2.
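
The scoping rules described in this section can be illustrated with the standard ‘timeout’ property. The following is a minimal sketch rather than a recommended configuration: the form and field names and the timeout values are illustrative assumptions, and any session-level value set in the Holly Management System would sit above all of the scopes shown here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <!-- Document scope: applies to every dialog in this document -->
  <property name="timeout" value="5s"/>

  <form id="account">
    <!-- Form scope: overrides the document-level value inside this form -->
    <property name="timeout" value="8s"/>

    <field name="pin" type="digits">
      <!-- Form item scope: overrides the form-level value for this field only -->
      <property name="timeout" value="10s"/>
      <prompt>Please enter your PIN.</prompt>
    </field>

    <field name="confirm" type="boolean">
      <!-- No property declared here, so the form-level value (8s) applies -->
      <prompt>Is that correct?</prompt>
    </field>
  </form>
</vxml>
```

Declaring the same property in an application root document would supply the default for every document in the application that does not set its own value.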

2.2.4 Character Sets and Supported Languages

By default the Holly Voice Platform assumes a language identifier of ‘en-AU’. Other values may be

specified in VoiceXML documents using the ‘xml:lang’ attribute, but they will result in

‘error.unsupported.language’ events being thrown unless they are also listed in the ‘Supported

Languages’ configuration parameter. Note that ASR and TTS engines may require additional

configuration to support other languages.
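
As an illustration of the language handling above, the sketch below requests a prompt in a second language and handles the event that is thrown when that language is not listed in ‘Supported Languages’. The choice of ‘fr-FR’, the form name, and the prompt wording are assumptions made for the example only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-AU">
  <form id="greeting">
    <!-- Runs if fr-FR is not enabled on the platform -->
    <catch event="error.unsupported.language">
      <prompt xml:lang="en-AU">Sorry, that language is not available.</prompt>
      <exit/>
    </catch>

    <block>
      <!-- Overrides the document language for this prompt only -->
      <prompt xml:lang="fr-FR">Bonjour.</prompt>
    </block>
  </form>
</vxml>
```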

HVP 5.1 has support for the Latin-1 character set (ISO-8859-1) in grammar files, TTS strings, ECMAScript,

in line grammar content, and similar functionality, e.g. “é” as in “café”.

HVP 5.1 supports UTF-8 in recognizers, TTS, VoiceXML applications, grammars and reporting.

2.3 Events

Errors on Document Transition

The Holly Voice Platform handles document transition errors, typically ‘error.badfetch’, in the scope of

the calling document.
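
Because transition errors are handled in the scope of the calling document, a handler placed there can recover from a failed fetch. The following is a minimal sketch under that assumption; the target URL (appserver.example.com) and the dialog names are hypothetical placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="main">
    <!-- Handles a failed transition raised by the <goto> below -->
    <catch event="error.badfetch">
      <prompt>Sorry, that service is unavailable right now.</prompt>
      <goto next="#fallback"/>
    </catch>

    <block>
      <!-- Hypothetical target document; a fetch or parse failure throws error.badfetch -->
      <goto next="http://appserver.example.com/next.vxml"/>
    </block>
  </form>

  <form id="fallback">
    <block>
      <prompt>Please try again later.</prompt>
      <exit/>
    </block>
  </form>
</vxml>
```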

2.4 Errors

This section documents common errors generated by the Holly Voice Browser as viewable in the event

logs through the Holly Management System. For errors not described in this section, please contact

Holly Customer Support.

2.4.1 Fetch Errors

Every fetch error results in a fetch failure, and an ‘error.badfetch’ event is thrown within the application.

The web fetching architecture of the Holly Voice Browser is implemented in the four layers shown

below. Fetch errors propagate through these four layers resulting in multiple error reports for a single

anomalous event. Typically for any fetch anomaly there will be an error from each layer (in a few cases

there may be two from layer 3).

Layer                                Error numbers
1  High level I/O layer              203, 211, 576
2  High level network I/O layer      204, 206, 228
3  HTTP protocol layer               217, 219, 221, 236, 237, 244, 249
4  Low level socket and SSL layer    241, 500, 700

Each entry below gives the error number and distinguishing error text, followed by the cause, whether Holly terminates the call, and comments.

Error 203: Unable to open URI
    Cause: An attempt to fetch a VoiceXML document failed with one of the errors documented in this table.
    Does Holly terminate call: On the first document, triggers failover; on subsequent documents, handled by the application.
    Comments: The specific error condition will be reflected in other error messages.

Error 204: Open error + <URL>
    Cause: An error occurred while attempting to open a connection to the server.
    Does Holly terminate call: No
    Comments: Investigate a network or application server issue.

Error 206: Read error + rc=51
    Cause: The Holly Voice Browser timed out during a read operation on an HTTP(S) connection.
    Does Holly terminate call: No
    Comments: Investigate a network or application server issue.

Error 211: Audio fetch failed + <URL>
    Cause: The audio resource could not be retrieved.
    Does Holly terminate call: No
    Comments: This is a “top level” error message; other messages give a more specific explanation.

Error 217: open failed (internal error) + <URL>
    Cause: Failed to open a connection to the server denoted by the URL.
    Does Holly terminate call: No
    Comments: Investigate a network or application server issue.

Error 221: Failed to get HTTP status
    Cause: The status line in the HTTP response could not be parsed.
    Does Holly terminate call: No
    Comments: More specific errors will also be reported; investigate a network or application server issue.

Error 228: Fetch operation timed out
    Cause: An attempt to fetch a resource exceeded fetchtimeout while creating or reading from a connection.
    Does Holly terminate call: No
    Comments: Investigate a network or application server issue, or change the fetchtimeout property.

Error 236: Fetch timeout exceeded during Open
    Cause: An attempt to open a connection exceeded fetchtimeout.
    Does Holly terminate call: No
    Comments: Investigate a network or application server issue.

Error 237: Fetch timeout exceeded during read
    Cause: A connection was created successfully, but fetchtimeout was exceeded while attempting to read from it.
    Does Holly terminate call: No
    Comments: Investigate a network or application server issue.

Error 241: Socket error – timeout connecting to socket
    Cause: Failed to open a connection to a particular IP address.
    Does Holly terminate call: No
    Comments: Investigate a network or application server issue.

Error 244: Socket error – cannot connect + <URL>
    Cause: An error, other than a timeout, occurred while connecting to a server.
    Does Holly terminate call: No
    Comments: More specific errors will also be reported; investigate a network or application server issue.

Error 249: No data available on socket for reading
    Cause: An error occurred while trying to read the status line in the HTTP response.
    Does Holly terminate call: No
    Comments: Error 221 and more specific errors will also be reported; investigate a network or application server issue.

Error 500: Socket operations: System call failed + errno=131
    Cause: Socket read failed because the connection was closed by the server.
    Does Holly terminate call: No
    Comments: Typically caused by a network timeout on an idle persistent connection.

Error 576: Grammar fetch failed + <URL>
    Cause: An attempt to fetch an external grammar document failed with one of the errors documented above.
    Does Holly terminate call: No
    Comments: The specific error condition will be reflected in other error messages.

Error 700: SSL operation failed + SSL_get_error=5
    Cause: Premature close of an SSL connection.
    Does Holly terminate call: No
    Comments: There are two underlying causes of the error: the remote end is not strictly adhering to the SSL protocol, or an idle persistent connection has timed out. In the first case the error can be ignored.
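
Several of the fetch errors above (228, 236, 237) relate to the ‘fetchtimeout’ property. The sketch below shows one way an application might adjust it, using the standard VoiceXML property and the ‘fetchtimeout’ attribute on <goto>. The timeout values and the URL are illustrative assumptions, not recommended settings.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <!-- Document-wide limit for fetches that do not specify their own -->
  <property name="fetchtimeout" value="10s"/>

  <form id="start">
    <block>
      <!-- Attribute overrides the property for this one fetch (hypothetical URL) -->
      <goto next="http://appserver.example.com/slow-report.vxml" fetchtimeout="15s"/>
    </block>
  </form>
</vxml>
```

A longer timeout only hides a slow server from the caller for longer, so it is usually worth investigating the network or application server first, as the comments in the table suggest.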

2.4.2 Other Errors

Error 205: Unable to parse contents of URI.
    Cause: The VoiceXML document does not consist of valid VoiceXML.
    Does Holly terminate call: On the first document, triggers failover; on subsequent document fetches, a VoiceXML error is raised and can be handled by the application.
    Comments: If the document dump call event is enabled for the application, Holly Management System will show the errors in the document. Correction to the VoiceXML document is required.

Error 212: Licence Manager connection failed
    Cause: The Holly Voice Browser was unable to communicate with any of the configured Holly Licence Managers.
    Does Holly terminate call: Yes
    Comments: Check that the Holly Licence Managers configured for the Holly Voice Browser are correct. Check that the Holly Licence Managers are running.

Error 517: Error activating grammar
    Cause: The ASR engine returned an error when the Holly Voice Browser attempted to activate a grammar.
    Does Holly terminate call: No
    Comments: There may be a semantic error in the grammar (for example, incorrect tag format), or if the default grammar fetch style is used there may be character encoding problems with XML files. Check the ASR engine logs for details of the grammar error. If the default grammar fetch style is used, set ‘com.holly.grammarfetchstyle’ to “absolute”.

Error 550: OSBrec: Recognition failed + reason=grammar error
    Cause: Unknown or illegal grammar not caught as an activation error.
    Does Holly terminate call: The Holly Voice Browser throws the event ‘error.recognition’. If the application is unable to handle the event, the call terminates.
    Comments: Review the grammar or the ASR engine diagnostic logs for more information.

Error 550: OSBrec: Recognition failed + reason=recognition error
    Cause: Unknown or illegal grammar not caught as an activation error, or another error raised by the speech recognizer.
    Does Holly terminate call: The Holly Voice Browser throws the event ‘error.recognition’. If the application is unable to handle the event, the call terminates.
    Comments: Review the grammar or the ASR engine diagnostic logs for more information.

2.5 Fetching Behaviors

2.5.1 Initial Document Failover

The initial URL for an application is configured through the Application page in HMS. Holly allows multiple initial URLs to be specified. The platform starts by attempting to execute from the first URL. If it fails to execute the first URL, the platform fails over to the second defined URL, then to the third, and so on.

Failover of the initial URL is triggered by any of the following:

• Failure to fetch an application’s initial document or its root document

• Failure to fetch the application’s initial document or root document before a timeout defined on the Application page in HMS

• Failure to parse either the application’s initial document or root document.

If failover exhausts all declared initial URLs then the call is rejected. The default behavior is to play the following error message: "We are currently experiencing technical difficulties. Please try again later."

The default error message is configurable through the default error handler in the defaults document specified by ‘uriplatformdefaults.browser’ (URI Platform Defaults). Other rejection behaviors can be configured through HMS (see the Holly Operations Manual).

2.5.2 HTTP

The Holly Voice Platform supports HTTP/1.0 and HTTP/1.1.

HVP implements HTTP persistence for both standard HTTP and secure HTTPS.

HTTP Persistence

It is sometimes necessary to align the persistence of HTTP sessions of the VoiceXML browser and application server. The platform administrator may configure the platform for one of three modes of persistence with the ‘httppersistencelevel’ configuration property:

• 10 Request: connection is maintained for the duration of a single HTTP request

• 20 Session (default): connection is maintained for the duration of a VoiceXML session

• 30 Indefinite: connection is maintained indefinitely and may be used in subsequent calls

The application parameter ‘inet.sessionpersistence’ may be set to override the browser configuration. A value of true is equivalent to Session level. A value of false is equivalent to Indefinite.

HTTP Retries

The browser will always attempt a retry of HTTP or HTTPS GET or POST requests to a server if no data

has been sent to the server while attempting the connection. This is a safe retry as the server is

unaware of the browser's intention to send a request and thus the server doesn't change state of the

call (if any).

Application parameters control the retry policy in the situation when the browser was able to send a

partial or complete request to the server or received a partial or complete reply from the server but

subsequent communication failed on the TCP/IP level or there was a problem parsing the HTTP

response from the server. The browser will retry the request if a relevant parameter permits it.

Parameter              Description                                Values
com.holly.retryget     Control HTTP GET request retry policy.     true, false (default)
com.holly.retrypost    Control HTTP POST request retry policy.    true, false (default)

In the cases of both safe and unsafe retries the browser will attempt the retry only once.

In the case of an unsafe retry the browser won't attempt retrying if it fails when reading the body of a

response from the server.

2.5.3 HTTPS

The Holly Voice Platform supports the ‘https’ URI scheme.

The HVP fully supports accessing VoiceXML documents over HTTP as well as HTTPS where a

secure/encrypted connection is required.

The Holly Voice Browser supports certificate-based authentication of a server and certificate-based client authentication (the latter is at the server’s request). When configuring an HTTPS-based VoiceXML application server for use with the HVP, note that SSL certificate validation depends on the application parameter ‘ssl.authenticateserver’ (“false” = no certificate validation, “true” = perform certificate validation), which can be set on the HMS Applications page. The SSL certificate chosen for use can be self-signed or signed by a Certificate Authority.

2.6 HTTP User-Agent Header

The Holly Voice Browser sets the ‘User-Agent’ header in HTTP requests to “HVP/5.1”. It is configurable via the ‘HTTP User Agent’ configuration parameter. The User-Agent request-header field contains information about the user agent originating the request. Refer to RFC 2616, Section 14.43, http://www.w3.org/Protocols/rfc2616/rfc2616.html.

2.6.1 DNS Resolving Behavior

The default behavior is for all fetches to resolve DNS when performing the fetch.

Setting the application parameter “resolvehostnames=true” on the HMS Applications page resolves

hostnames on failover and all subsequent fetches will use the IP address(es) obtained from DNS

resolution.

2.6.2 Cookies

The Holly Voice Platform supports cookies as described in [RFC 2965]. Multiple cookies are sent as

separate Cookie request headers, not as a list in a single Cookie request header.

If the application parameter ‘singlecookieheader’ is set, and there is more than one cookie for an HTTP

request, the browser sends the cookies folded into a single HTTP Cookie header, as described in RFC

2965, section 3.3.4.

2.6.3 Caching

Document caching on the Holly Voice Platform is determined by the HTTP cache control headers

supplied by the application server. The VoiceXML cache control properties ‘documentmaxage’,

‘documentmaxstale’, ‘grammarmaxage’, ‘grammarmaxstale’, ‘datamaxage’, ‘objectmaxage’,

‘objectmaxstale’, ‘scriptmaxage’, and ‘scriptmaxstale’ can be used to control caching from within

VoiceXML applications. Refer to [VoiceXML 2.0] section 6.3.5.
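
The following sketch shows how the cache control properties might be used from within an application, together with the standard per-element ‘maxage’ attribute. The values (in seconds), the grammar URL, and the field name are illustrative assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <!-- Accept cached documents up to one hour old -->
  <property name="documentmaxage" value="3600"/>
  <!-- Accept cached grammars up to one day old -->
  <property name="grammarmaxage" value="86400"/>

  <form id="travel">
    <field name="city">
      <!-- maxage="0" asks for this grammar to be revalidated on every use (hypothetical URL) -->
      <grammar src="http://appserver.example.com/grammars/city.grxml"
               type="application/srgs+xml" maxage="0"/>
      <prompt>Which city?</prompt>
    </field>
  </form>
</vxml>
```

The per-element attribute takes precedence over the property for that fetch, which keeps frequently changing resources fresh without disabling caching for the whole document.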

HTTP headers give extensive control over caching. Voice applications, CGI scripts, or the web server may generate them in response to HVP browser requests (see the diagram above).

A typical HTTP response header might look like this:

HTTP/1.1 200 OK
Date: Thu, 22 Jun 2006 15:37:45 GMT
Server: Apache/1.3.3 (Unix)
Cache-Control: max-age=3600
Expires: Fri, 23 Jun 2006 21:30:45 GMT
Last-Modified: Mon, 19 Jun 2006 10:07:15 GMT
ETag: "3e95-520-33c6faaf"
Content-Length: 2040
Content-Type: audio/x-wav

Cache related HTTP headers

Header

Semantic

max-age=seconds

Specifies the maximum time, in seconds, for which a cached copy is considered fresh. This directive is relative to the time of the request. Setting this parameter to zero (max-age=0) suppresses caching.

Example:

Cache-Control: max-age=60

s-maxage=seconds

Same as “max-age”. “s-maxage” takes precedence over “max-age” if both directives are present.

Example:

Cache-Control: s-maxage=60

no-store

Instructs the cache not to keep a copy of the resource under any conditions.

Example:

Cache-Control: no-store

no-cache

The cache doesn’t use the response without revalidation with the origin server. This

prevents caching or forces revalidation of the resource on every request.

Example:

Cache-Control: no-cache

Pragma: no-cache

Same as no-cache. Used in HTTP/1.0 protocol.

Example:

Pragma: no-cache

Expires: date

The field gives the date/time after which the response is considered stale. The cache

does not return a stale cache entry without revalidation with the origin server.

This header may be useful in some situations, but it has certain limitations. First, it is easy to forget to update this header, which will suppress caching after the set date. Second, if the server clock and HVP clock are not synchronized then this header may cause an undesired effect. The “max-age” directive, if specified, overrides the “Expires” header.

Example:

Expires: Fri, 23 Jun 2006 21:30:45 GMT

Last-Modified: date

Indicates the date and time at which the origin server believes the resource was last

modified.

If none of “Expires”, “max-age”, or “s-maxage” appears in the response, and the

response does not include other restrictions on caching, the cache computes a freshness

lifetime using a heuristic. The cached copy expires after a period equal to 10% of the difference between the current time and the Last-Modified time. For example, if Last-Modified is 60 seconds before the current time, the entry becomes stale approximately 6 seconds in the future.

Servers should send the Last-Modified header whenever feasible to provide a means of validating resources.

Example:

Last-Modified: Mon, 19 Jun 2006 10:07:15 GMT

private

Indicates that the response is intended for a single user and must not be cached by a

shared cache. HVP cache doesn’t store the response.

Example:

Cache-Control: private

ETag: entity-tag

Provides the current value of the entity tag. If a cached copy of the resource is stale then the ETag value may be used for resource validation.

Entity tag must change whenever the associated resource changes in any way. Servers

should send an entity tag unless it is not feasible to generate one.

Example:

ETag: "3e95-520-33c6faaf"

If no cache-related headers (excluding the “ETag” header) are specified in the response then the cache

treats it as not cacheable and does not save a copy of the response.

2.6.4 Disabling Caching

A common requirement during development is to disable caching so that all content is always fetched

from the application server. To achieve this set the maxage properties to zero, i.e. ‘audiomaxage’,

‘documentmaxage’, ‘grammarmaxage’, ‘scriptmaxage’.

These can be set as application parameters via the Holly Management System.

Alternatively they can be set in the VoiceXML application using the <property> element (typically in the

application root document so it affects overall behavior).
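As a minimal sketch of the <property> approach, the application root document below sets the four maxage properties named above to zero so that cached copies are always treated as stale. The prompt file name is hypothetical, and this is intended for development use only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <!-- Development only: a maximum age of 0 seconds forces a refetch of every resource -->
  <property name="audiomaxage" value="0"/>
  <property name="documentmaxage" value="0"/>
  <property name="grammarmaxage" value="0"/>
  <property name="scriptmaxage" value="0"/>

  <form id="main">
    <block>
      <!-- prompts/welcome.wav is a hypothetical file name -->
      <audio src="prompts/welcome.wav">Welcome.</audio>
    </block>
  </form>
</vxml>
```

Because the properties are declared in the root document, they apply to every leaf document that names it in its ‘application’ attribute, unless a narrower scope overrides them.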

2.7 Default Property Values

The following are the factory-default settings for VoiceXML properties. The platform defaults may be changed as a browser configuration (via HMS Configuration by the Administrator). The defaults for individual applications may be changed by setting an Application Parameter (via HMS Application Configuration). Applications may also change the properties using the VoiceXML <property> element.

| Key | Default Value |
| --- | --- |
| audiofetchhint | prefetch |
| bargein | true |
| bargeintype | speech |
| confidencelevel | 0.5 |
| documentfetchhint | safe |
| fetchaudiodelay | 2s |
| fetchaudiominimum | 5s |
| fetchtimeout | 7s |
| grammarfetchhint | prefetch |
| inputmodes | dtmf voice |
| maxnbest | 1 |
| objectfetchhint | prefetch |
| scriptfetchhint | prefetch |
| sensitivity | 0.5 |
| speedvsaccuracy | 0.5 |
| termchar | # |
| termtimeout | 0s |
| universals | none |
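The sketch below illustrates how an application might override some of these defaults with <property> at different scopes. The chosen values and the dialog itself are illustrative assumptions, not recommendations.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <!-- Document scope: lengthen the fetch timeout from the 7s factory default -->
  <property name="fetchtimeout" value="10s"/>

  <form id="pin_entry">
    <!-- Form scope: callers must hear each prompt in full -->
    <property name="bargein" value="false"/>

    <field name="pin" type="digits?length=4">
      <!-- Field scope: accept keypad input only for this field -->
      <property name="inputmodes" value="dtmf"/>
      <prompt>Please enter your four digit PIN.</prompt>
    </field>
  </form>
</vxml>
```

A property set at a narrower scope takes precedence over the same property set at a wider scope, following the standard VoiceXML scoping rules.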

2.8 Access Control on <data>

Access control on <data> (see Section 5 of VoiceXML 2.1) allows an application server to indicate that

XML content is authorized for use by only selected applications. The Holly Voice Platform implements

this behavior to complement its multi-tenancy.

The access control element may specify virtual hosts as well as IP addresses and hostnames. A virtual host is specified using the Application, Affiliate and Service Provider names separated by dots. The attribute is case insensitive.

<?access-control allow="application.affiliate.service_provider.host"?>

There are two exceptions to the VoiceXML 2.1 specification:

• The '.com' part of the fully qualified name is omitted

• If there is no access control instruction then access is allowed by default

2.9 Browser Protections

The Holly Voice Browser imposes the following constraints on running applications to ensure a rogue

application does not reduce the platform’s capacity to service other applications. Each constraint is

configurable via the Holly Management System (administrators should refer to the Holly Management

System User Guide for details on modifying configuration parameters).

| Configuration Parameter | Description | Default Value |
| --- | --- | --- |
| JavaScript Max Branches | A count is kept of each time a script jumps backward or returns from a function; the count is not permitted to exceed this value. | 100000 |
| ECMAScript Max Object Depth | When serializing objects (for example, for logging or transferring between execution contexts on return from a subdialog), the browser generates an error if objects are nested to a depth greater than this value. | 10 |
| Maximum Documents | The browser is not permitted to fetch more document instances than this value. | 500 |
| Maximum Event Count | This value provides an upper limit to the number of events that can be thrown by a particular condition within a single form; this value is included to assist in the detection of infinite loops or bugs. | 12 |
| Maximum Event Rethrows | This value provides an upper limit to the number of times a particular event can be rethrown. | 6 |
| Maximum Execution Stack Depth | This value effectively limits the depth of subdialog calls within an application. | 5 |
| Maximum Loop Iterations | This value limits the number of iterations of the form interpretation algorithm on a single form. | 100 |
| Maximum Dialogs with no User Input | The number of transitions between forms without entering a wait state is limited to this value. | 10 |

3. Input: Speech Recognition and DTMF

3.1 Selecting a Speech Recognizer

The Holly Voice Platform supports a broad range of speech recognition products. Developers should

contact the platform administrator to confirm which ASR products are available. Administrators should

refer to the ASR Configuration information in the Holly Voice Platform Operations Guide.

A single installation may be configured to support many speech recognizers. Holly allows for each

Application to select its preferred speech recognizer or it may use the platform’s default.

The default speech recognizer for an Application is determined by the value of the ‘asrengine’ property

(a VoiceXML property extension). The Application default may be set as an Application Parameter

through HMS or can be set using a VoiceXML <property> element with appropriate scope.

For example:

<property name="asrengine" value="dtmf"/>

The table shows the list of speech recognition products and the corresponding ‘asrengine’ value.

| Vendor | ASR Engine | Value |
| --- | --- | --- |
| Holly | Holly DTMF (Direct API) | dtmf |
| IBM | IBM Websphere Voice Server 5.1.3 (MRCP v1) | wvs513-mrcp1 |
| Loquendo | Loquendo 7.8 ASR (MRCP v1) | loquendo-mrcp1 |
| LumenVox | LumenVox | lumenvox-mrcp1 |
| Nuance | Nuance 8.5 ASR (Direct API) | nuance |
| Nuance | Nuance 8.5 ASR (MRCP v1) | nuance85-mrcp1 |
| Nuance | Nuance 9.0 ASR (MRCP v1) | nuance90-mrcp1 |
| Nuance | OSR - Open Speech Recognizer (Direct API) | scansoft |
| Nuance | SpeechWorks Media Server 4.0 (MRCP v1) | swms40-mrcp1 |
| Siemens | Siemens (MRCP v1) | siemens |
| Telisma | Telisma 1.3 Patch 1 (MRCP v1) | telisma-mrcp1 |
| vLingo | vLingo Network ASR Service | vlingo |

Note: The Nuance 8.5 ASR (via direct API) recognition interface is not available on Linux HVP

deployments.

Note: The ASR ‘asrengine’ values may be customized at a platform installation. The table shows the

default values which may not apply if customized.

3.1.1 ASR Engine Switching

By default, the Holly Voice Platform performs all recognitions (voice or DTMF) using the default ASR engine. The platform also permits switching between ASR engines within a call and within a single VoiceXML document. To switch engines, set the ‘asrengine’ property in a <property> element with appropriate scope (e.g. field, form or document scope).
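A minimal sketch of engine switching within one document follows. The ‘asrengine’ values are taken from the default table in section 3.1 and may differ on a customized installation; the grammar file name is hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <!-- Document scope: engine used unless a narrower scope overrides it -->
  <property name="asrengine" value="nuance90-mrcp1"/>

  <form id="order">
    <field name="city">
      <!-- cities.grxml is a hypothetical grammar file -->
      <grammar src="cities.grxml" type="application/srgs+xml"/>
      <prompt>Which city are you calling from?</prompt>
    </field>

    <field name="account" type="digits">
      <!-- Field scope: switch to the Holly DTMF recognizer for keypad entry -->
      <property name="asrengine" value="dtmf"/>
      <property name="inputmodes" value="dtmf"/>
      <prompt>Please key in your account number.</prompt>
    </field>
  </form>
</vxml>
```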

3.1.2 Allowed ASR Engines

The Holly Voice Platform allows an Administrator to restrict an application to the use of specific named

speech recognizers. This is achieved by setting ‘gw.asrallowed’ as an Application Parameter. The value is

a comma-separated list of allowed engines.

For example, setting “gw.asrallowed=nuance90-mrcp1,dtmf” means the ASR engine can only be

switched between Nuance 9.0 ASR (MRCP v1) and the Holly DTMF Recognizer.

3.2 ASR-Specific Behaviors

This section documents configuration and behaviors that are specific to the ASR products supported by

the Holly Voice Platform.

3.2.1 Engine-Specific Properties

The Holly Voice Platform allows for any Nuance property defined in the respective reference manuals

to be set using the VoiceXML <property> element and passed to the engine.

| Recognizers | Property Prefixes | Example |
| --- | --- | --- |
| Nuance Recognizer 9, Nuance OSR 3.0.x, Nuance SWMS | swirec_, swiep_ | <property name="swirec_state_beam" value="-20"/> <property name="swiep_audio_environment" value="'channel=cellular'"/> A full property list is available in the Reference Guides for these products. |
| Nuance 8.5, Nuance 8.5 MRCP | nuance.core.rec, nuance.core.ep | <property name="nuance.core.rec.GenEpFeedback" value="'TRUE'"/> <property name="nuance.core.ep.EndSeconds" value="'1.50'"/> A full property list is available in Nuance 8.5 Documentation. |

The developer must ensure the value they set the property to is valid for the underlying speech

recognizer. Standard VoiceXML properties are submitted to the speech recognizer before ASR-specific

properties, so ASR-specific properties will override any parameter mapping made from a VoiceXML

property to an ASR property.

3.2.2 vLingo

vLingo is a network-hosted speech recognition service that facilitates “open grammar” recognition with

very large vocabularies. Unlike the other speech products supported by Holly it does not provide

declarative grammar support (using SRGS or any other grammar standard).

The platform administrator can contact Holly Connects Support to obtain information on licensing

vLingo and documentation on writing VoiceXML applications with vLingo.

3.2.3 Loquendo

To support DTMF-only input with Loquendo ASR (inputmodes=dtmf) the system administrator must set the parameter enableDiscontinuousInboundStream=true in the Loquendo Management Console.

To support hash input with termination the system administrator must set the parameter dtmfNoMatchIfOnlyTermCharPressed=enable in the Loquendo Management Console. This parameter can be found under Configuration | Advanced | MRCPv1Server.

3.3 ASR Sessions

ASR session IDs and the relevant recognizer name are logged as events in the Holly Management

System; this ID can be used to correlate ASR logs with HVP logs.

The event is logged on the first use of the speech recognizer (or DTMF) and each time the application

switches ASR engine.

The format of the log event is:

asrengine=<name>|sessionid=<id>|address=<host:port>|server=<server>|endpoint=<address>

3.4 Timeouts (speech)

As required by [VoiceXML 2.0], section 6.3.2, the default behavior of the Holly Voice Platform is to use the larger of the two properties ‘completetimeout’ and ‘incompletetimeout’ as the actual value of the end of utterance timeout during recognition.

For recognizers that enable the platform to distinguish ‘completetimeout’ and ‘incompletetimeout’, the ‘com.holly.distincttimeout’ property can be set to “true” to permit the timeouts to be treated differently.
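The fragment below is a sketch of how the three properties might be combined; the timeout values are illustrative assumptions. With ‘com.holly.distincttimeout’ left unset, the effective end of utterance timeout in this example would be the larger value, 1.5s.

```xml
<form id="booking">
  <!-- Holly extension: ask for the two timeouts to be applied separately.
       Only meaningful with a recognizer that can distinguish them. -->
  <property name="com.holly.distincttimeout" value="true"/>
  <property name="completetimeout" value="0.5s"/>
  <property name="incompletetimeout" value="1.5s"/>

  <field name="travel_date" type="date">
    <prompt>What date would you like to travel?</prompt>
  </field>
</form>
```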

3.5 Confidence Scores

The Holly Voice Platform sets ‘name$.confidence’ to the utterance confidence in a range of 0.0 to 1.0

as required by VoiceXML 2.0 (see Section 2.3.1). Although the Holly Voice Platform normalizes the

confidence value received from ASR engines to the VoiceXML 2.0 range, the specific interpretation of

values is relative to the ASR engine.

The default confidence level is 0.5. This value can be changed using the standard VoiceXML 2.0

<property> tag to set the ‘confidencelevel’ property (see VoiceXML 2.0, Section 6.3.2). The value can

also be changed for each application by setting ‘confidencelevel’ as an Application Parameter through

HMS.
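As an illustration, the fragment below lowers the rejection threshold for one field and then inspects the shadow variable to decide how to respond. The thresholds (0.4 and 0.7) and the grammar file name are hypothetical; suitable values depend on the ASR engine in use.

```xml
<form id="get_city">
  <field name="city">
    <!-- Results scoring below 0.4 are rejected and raise nomatch -->
    <property name="confidencelevel" value="0.4"/>
    <!-- cities.grxml is a hypothetical grammar file -->
    <grammar src="cities.grxml" type="application/srgs+xml"/>
    <prompt>Which city?</prompt>
    <filled>
      <if cond="city$.confidence &lt; 0.7">
        <!-- Accepted, but not strongly: echo the utterance back to the caller -->
        <prompt>I think you said <value expr="city$.utterance"/>.</prompt>
      <else/>
        <prompt>Thank you.</prompt>
      </if>
      <log>city confidence: <value expr="city$.confidence"/></log>
    </filled>
  </field>
</form>
```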

3.6 N-Best

Following the VoiceXML 2.0 Specification, the N-best list contains a list of recognition results matching

all active grammars ordered by their confidence score (highest confidence to lowest). Active

grammars include those explicitly specified by the VoiceXML application plus application requested

platform grammars such as links, universals, menus and field options.

For application-defined grammars the slot filling follows the explicit specification of the application-

supplied grammar. For platform-generated grammars there is no standard as to how the slots should

be filled for those grammars (if any slots at all) so it is recommended that applications do not rely on

slots for those grammars.
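The following sketch requests up to three results and walks the ‘application.lastresult$’ array defined by VoiceXML 2.0. It reads only the utterance and confidence of each entry, which avoids depending on slot filling for platform-generated grammars. The grammar file name is hypothetical.

```xml
<form id="nbest_demo">
  <!-- Ask the recognizer for up to three candidate results (default is 1) -->
  <property name="maxnbest" value="3"/>

  <field name="city">
    <!-- cities.grxml is a hypothetical grammar file -->
    <grammar src="cities.grxml" type="application/srgs+xml"/>
    <prompt>Which city?</prompt>
    <filled>
      <var name="summary" expr="''"/>
      <script><![CDATA[
        // Entries are ordered from highest to lowest confidence
        var results = application.lastresult$;
        for (var i = 0; i < results.length; i++) {
          summary += results[i].utterance + ' (' + results[i].confidence + ') ';
        }
      ]]></script>
      <log>N-best: <value expr="summary"/></log>
    </filled>
  </field>
</form>
```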

3.7 Record

Recordings are stored as WAV (RIFF header) 8kHz 8-bit mono mu-law single channel.

The ‘maxtime’ attribute on the ‘<record>’ element defaults to 300 seconds.

The Holly Voice Platform does not currently support speech recognition on <record>.

DTMF recognition during record is supported, but the behavior is determined by the currently selected

speech recognizer as follows:

• The MRCP recognizers handle arbitrary DTMF grammars

• The API recognizers handle a single digit DTMF grammar

• The Holly DTMF Recognizer handles arbitrary DTMF grammars.
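A minimal <record> sketch follows, using only standard VoiceXML 2.0 attributes and shadow variables. It shortens ‘maxtime’ from the 300 second default and lets any key press end the recording. The submit target is a hypothetical URL.

```xml
<form id="leave_message">
  <record name="msg" beep="true" maxtime="60s" finalsilence="3s"
          dtmfterm="true" type="audio/x-wav">
    <prompt>Please leave your message after the beep, then press any key.</prompt>
    <filled>
      <!-- msg$.duration is in milliseconds; msg$.termchar is the key that ended the recording, if any -->
      <log>Recorded <value expr="msg$.duration"/> ms, terminated by <value expr="msg$.termchar"/></log>
      <!-- save_message is a hypothetical application server URL -->
      <submit next="save_message" method="post"
              enctype="multipart/form-data" namelist="msg"/>
    </filled>
  </record>
</form>
```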

3.8 Recording User Utterances during Recognition

The Holly Voice Platform permits user utterances to be recorded during recognition as per VoiceXML

2.1 (see section 7).

Recordings are stored as WAV (RIFF header) 8kHz 8-bit mono mu-law single channel.

Note: Recording of utterances must be enabled by the administrators of the Holly Voice Platform.

Administrators of the platform should refer to the Holly Voice Platform Operations Guide for

more information.
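Assuming utterance recording has been enabled by the administrator, the sketch below turns it on for a form using the VoiceXML 2.1 ‘recordutterance’ property and posts the captured audio to the application server. The grammar file name and submit target are hypothetical.

```xml
<form id="capture">
  <!-- VoiceXML 2.1: record the caller's utterance while recognizing -->
  <property name="recordutterance" value="true"/>

  <field name="city">
    <!-- cities.grxml is a hypothetical grammar file -->
    <grammar src="cities.grxml" type="application/srgs+xml"/>
    <prompt>Which city?</prompt>
    <filled>
      <log>Utterance: <value expr="city$.recordingsize"/> bytes,
           <value expr="city$.recordingduration"/> ms</log>
      <!-- Copy the recording reference into a variable that can be submitted -->
      <var name="utterance_audio" expr="city$.recording"/>
      <!-- store_utterance is a hypothetical application server URL -->
      <submit next="store_utterance" method="post"
              enctype="multipart/form-data" namelist="city utterance_audio"/>
    </filled>
  </field>
</form>
```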

3.9 Re-Recognition from Recorded Utterances

HVP 5.1 introduces the ability for an application to perform “re-recognition” from a recorded

waveform. Audio recorded from caller input using <record> or an utterance recorded during

recognition may be stored by the application then declared as input by the application to a subsequent

recognition attempt – typically using a different set of grammars.

The full description of application development using re-recognition is provided in Appendix B.

3.10 Grammars

3.10.1 Standard Grammars

The Holly Voice Platform supports the standard XML form of SRGS as per the Speech Recognition

Grammar Specification Version 1.0, W3C Recommendation 16 March 2004.

The content type for SRGS grammars should be specified as “application/srgs+xml”.

| Recognizers | Grammar Format | Semantic Format |
| --- | --- | --- |
| Nuance Recognizer 9 | SRGS 1.0 XML | SISR 1.0 (others: refer to the product documentation) |
| Nuance OSR 3.0.x, Nuance SWMS XX | SRGS 1.0 XML | SISR 1.0 and SWI extensions |
| Nuance 8.5 | SRGS 1.0 XML Draft, GSL | GSL |
| IBM WVS 5.1.3 | SRGS 1.0 XML, SRGS 1.0 ABNF | SISR 1.0 |
| LumenVox 8 | SRGS 1.0 XML, SRGS 1.0 ABNF | SISR 1.0 |
| Loquendo | SRGS 1.0 XML | SISR 1.0 |
| vLingo | Custom | Custom |

3.10.2 Proprietary Grammars

Holly supports the use of proprietary grammar formats. Other supported formats are ASR engine-specific and developers should refer to the relevant product documentation.

Proprietary grammar types, such as Nuance 8.5 GSL grammars, must be put inside CDATA sections when

used as inline VoiceXML grammars.
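The fragment below sketches why the CDATA section matters: GSL slot commands contain angle brackets that would otherwise be parsed as XML markup. The media type "application/x-nuance-gsl" and the exact GSL expression are assumptions made for illustration; check the Nuance 8.5 documentation for the values your installation expects.

```xml
<field name="answer">
  <!-- The media type below is an assumption; confirm it against the Nuance 8.5 documentation -->
  <grammar type="application/x-nuance-gsl" mode="voice">
    <![CDATA[
      [
        [yes yeah] {<answer "yes">}
        [no nope]  {<answer "no">}
      ]
    ]]>
  </grammar>
  <prompt>Would you like a receipt?</prompt>
</field>
```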

3.10.3 External Grammars

Note that external grammars declared as ISO-8859-1 do not support non-ASCII (i.e. Latin-1) characters

in “default” mode. In order to use Latin-1 characters in external grammars the property

‘com.holly.grammarfetchstyle’ should be set to “absolute”.

3.10.4 Pre-built and Binary Grammars

Holly supports pre-compiled and binary grammar formats for most recognizers. For example, Holly

supports pre-compiled grammars for OSR and Nuance 9 (these are .gram files created with the “sgc”

Grammar Compiler tool).

Contact Holly for information on using Nuance 8.5 static grammar packages. Note that this is not

recommended in Virtual IVR systems.

3.10.5 Universal Command Grammars

The Holly Voice Platform supports the optional default grammars for ‘help’, ‘cancel’, and ‘exit’. These

are controlled by the VoiceXML 2.0 ‘universals’ property; see [VoiceXML 2.0], section 6.3.6. These are

available in English only.

The platform administrator may define new universal command grammars for an MRCP recognizer by

setting the following platform configurations: uriuniversalcancel, uriuniversalexit, and uriuniversalhelp.
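As a brief sketch, the document below activates the ‘help’ and ‘cancel’ universal grammars (the factory default for ‘universals’ is “none”) and handles the resulting events. The dialog content is illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <!-- Activate two of the three universal command grammars -->
  <property name="universals" value="help cancel"/>

  <form id="main">
    <field name="proceed" type="boolean">
      <prompt>Would you like to continue?</prompt>
      <help>Say yes to continue, or no to stop.</help>
      <catch event="cancel">
        <prompt>Cancelled. Goodbye.</prompt>
        <exit/>
      </catch>
    </field>
  </form>
</vxml>
```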

3.10.6 Builtin Support

The Holly Voice Platform supports the builtin grammar types as summarized in the following table.

| Recognizer | Language | Status |
| --- | --- | --- |
| Nuance 8.5 | en-AU | All builtin grammar types supported as listed in Appendix P of [VoiceXML 2.0]. |
| Nuance 8.5 | Other languages | Contact Holly Customer Support. |
| Nuance 9 | en-AU | All builtin grammar types supported as listed in Appendix P of [VoiceXML 2.0]. |
| Nuance 9 | Other languages | Implemented as Nuance 9 builtin grammars. See the relevant Nuance 9 Language Supplement. |
| Nuance OSR (Scansoft) | en-AU | All builtin grammar types supported as listed in Appendix P of [VoiceXML 2.0]. Additional non-standard types supported as per the ScanSoft OSR documentation. |
| Nuance OSR (Scansoft) | Other languages | Implemented as OSR builtin grammars. See the relevant OSR Language Supplement. |
| Holly DTMF Recognizer | Any language | All builtin grammar types as listed in Appendix P of [VoiceXML 2.0]. DTMF only. |

Holly supports the ‘type’ attribute on the <field> element with the builtin grammar types defined in the table

above. Developers may also reference a builtin grammar by specifying a grammar src attribute of

“builtin:<name>”. The syntax “builtin:grammar/<type>” and “builtin:dtmf/<type>” can be used to

specify the input mode for a particular builtin grammar type; see [VoiceXML 2.0], section 2.3.1.2.

Note: This method can also be used for non-standard builtin grammars provided by a particular ASR

engine.
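The two referencing styles can be sketched as follows. The first field uses the ‘type’ attribute; the second references a DTMF-only builtin through a grammar URI, using the parameter syntax from Appendix P of [VoiceXML 2.0]. The key assignments (7 and 9) are illustrative.

```xml
<form id="builtin_demo">
  <!-- Builtin selected through the field's type attribute -->
  <field name="amount" type="digits">
    <prompt>How many tickets would you like?</prompt>
  </field>

  <!-- Builtin selected through a grammar URI, restricted to DTMF input -->
  <field name="confirm">
    <grammar src="builtin:dtmf/boolean?y=7;n=9"/>
    <prompt>Press 7 to confirm, or 9 to cancel.</prompt>
  </field>
</form>
```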

3.10.7 Grammar Fetch Behavior

By default the Holly Voice Platform fetches external grammars and passes them to ASR engines. This

behavior can be changed by means of the ‘com.holly.grammarfetchstyle’ property.

The possible values of the property are “default”, “absolute”, and “relative”. There is no one method

of handling grammars that will work for all cases, for instance “absolute” or “relative” may be more

efficient if big grammars are frequently passed to ASR.

Note: Caching may not work with “absolute” or “relative” as VXML side cache control information is

not passed to the ASR engine. Also, ASR may not have its own caching facility, and even if ASR

is able to use HTTP cache control mechanism to store grammars in its own cache this may

cause problems in a virtual environment (multi-tenancy).

| Value | Description | Comments |
| --- | --- | --- |
| default | Holly Voice Platform fetches grammars | “default” requires that the browser fetch and manipulate the grammar. The browser converts these to multi-byte strings. Proper cookies (if any) are passed to an application server and cache control directives specified in the VXML document are honoured. “default” reverts to “absolute” if the grammar URI uses anything other than the root rule (i.e. the grammar URI contains a fragment) or if the grammar is a binary grammar (determined by the file name extension, either .gram or .ngo). “default” does not work if there are additional URIs specified in the grammar passed to ASR. Note: The HVP sends the contents of the grammar in a buffer to the ASR engine; this may affect performance. |
| absolute | Resolves relative URI references and passes the absolute URI to the ASR | “absolute” requires both platform and ASR engine to have access to the application server. External grammars in proprietary formats (such as Nuance binary grammars) are always processed as though the ‘com.holly.grammarfetchstyle’ property were set to “absolute”. The absolute URI is passed to the ASR engine to fetch. Support for HTTPS is vendor specific. Grammar URIs with query parameters or fragment identifiers are also passed to the recognizer as “absolute” URIs regardless of the property setting. If cookies or session identifiers are required the ASR may not be able to fetch the grammars. Check with your network owner if the platform’s load balancing configuration is able to pass cookies. |
| relative | Passes the value of the ‘src’ attribute to the ASR (it does not resolve relative URI references) | “relative” is typically used with OSR where large grammar files are stored in the OSR grammar root directory, which is controlled by the property SWIsvcRootGrammarDirectory. For example, the file street-names.gram can be copied to $SWISRSDK/config (the default value of the grammar root) and referenced in VoiceXML as “street-names.gram”. OSR fetches the file from disk rather than HTTP. If cookies or session identifiers are required the ASR may not be able to fetch the grammars. Check with your network owner if the platform’s load balancing configuration is able to pass cookies. |
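As a sketch, the fragment below selects the “absolute” fetch style for a single field so that the recognizer fetches a large grammar itself, while the rest of the application keeps the default behavior. The grammar path is hypothetical, and the ASR engine is assumed to have network access to the application server.

```xml
<form id="address">
  <field name="street">
    <!-- Field scope: pass the resolved URI to the ASR instead of the grammar content -->
    <property name="com.holly.grammarfetchstyle" value="absolute"/>
    <!-- grammars/street-names.grxml is a hypothetical path, resolved against the document URI -->
    <grammar src="grammars/street-names.grxml" type="application/srgs+xml"/>
    <prompt>Which street do you live on?</prompt>
  </field>
</form>
```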

3.11 DTMF

3.11.1 interdigittimeout and termtimeout

The Holly Voice Platform supports the properties ‘interdigittimeout’ and ‘termtimeout’. The HVP uses

the maximum of these parameters combined as the actual value of the end of DTMF timeout, the

default value of which is 2 seconds.

For the Nuance 8.5 ASR engine ‘termtimeout’ should be disabled (i.e. “termtimeout=0”) as Nuance

does not provide sufficient information in its DTMF parse results to determine whether a parse is

complete, incomplete, invalid or a valid prefix.

Use ‘interdigittimeout’ to control DTMF recognition timing. This should generally be set per recognition

state (usually a field) to an appropriate value for the input required. The ‘interdigittimeout’ property

can also be set globally for the application by setting an application parameter. Use 0s for a menu

application that accepts a single digit (e.g. “interdigittimeout=0”); use something larger, e.g. 2s, when

collecting information that callers will have to check such as a credit card number.

Note: The default value of ‘termtimeout’ is “0s” in HVP 5.0 to comply with the VoiceXML standard.

Note: The Holly DTMF recognizer and ScanSoft OSR support both interdigittimeout and termtimeout.
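The guidance above can be sketched per field as follows: a single-digit choice uses an ‘interdigittimeout’ of 0s, while a long number that callers read from a card uses 2s. The dialog content is illustrative.

```xml
<form id="payment">
  <field name="option" type="digits?length=1">
    <!-- Single digit expected: no need to wait for further keys -->
    <property name="inputmodes" value="dtmf"/>
    <property name="interdigittimeout" value="0s"/>
    <prompt>Press 1 to pay a bill, or 2 to check your balance.</prompt>
  </field>

  <field name="card" type="digits">
    <!-- Long number: give callers time between key presses -->
    <property name="inputmodes" value="dtmf"/>
    <property name="interdigittimeout" value="2s"/>
    <property name="termtimeout" value="0s"/>
    <prompt>Please key in your card number, followed by the hash key.</prompt>
  </field>
</form>
```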

3.11.2 termchar

The VoiceXML property termchar is set to “#” by default and can be changed by setting the <property>

element in the VoiceXML application. For example to set termchar to empty:

<property name="termchar" value=""/>

Note: VoiceXML <property> scoping rules apply. The allowable scopes are application, document,

form, menu, form item – see VoiceXML 2.0 S6.3.

The default termchar property can be modified by creating an Application Parameter through HMS.

3.12 DTMF Buffering

Interactive DTMF parsing is a DTMF handling mode available in HVP 5.0 and later. It can be disabled for compatibility with HVP 4.1 and earlier.

At a global level Interactive DTMF parsing can be enabled or disabled for an ASR engine by setting the

parameter ‘dtmfinteractive’ to “true” or “false” under the appropriate ASR Plug-in component on the

Holly Configuration page via the Holly Management System.

Interactive DTMF parsing can be enabled or disabled at the application level via the

‘sr.dtmfinteractive’ application parameter on the Applications page in the Holly Management System

(administrators should refer to the Holly Management System User Guide for details on modifying

application parameters).

The DTMF buffer is cleared whenever prompts that have been queued are played to completion before

continuing processing. One such scenario is when prompts are queued with bargein disabled; another

is when prompts are queued before a fetch that specifies fetchaudio. In the latter case there is an

ambiguity in the VoiceXML 2.0 specification about the handling of DTMF input during the playing of the

prompts and the fetch.

By default, the Holly Voice Browser clears the DTMF buffer after playing the fetchaudio, so any DTMF

collected during the prompt playback or during the fetch will be lost. If the proprietary VoiceXML

property ‘com.holly.fetchaudiodtmf’ is set, the Holly Voice Browser will not clear the buffer after

playing the fetchaudio, and any DTMF collected during the prompt playback or during the fetch is fed

to the next recognition (unless some other action such as a non-bargeinable prompt clears the buffer

first).
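A sketch of the fetchaudio case follows. The text above says only that the property is "set"; the value “true” used here is an assumption, and the file names are hypothetical.

```xml
<form id="lookup">
  <!-- Assumed value: keep DTMF typed during the prompt and the fetch for the next recognition -->
  <property name="com.holly.fetchaudiodtmf" value="true"/>

  <block>
    <prompt>Please hold while I look up your account.</prompt>
    <!-- hold-music.wav and account.vxml are hypothetical resources -->
    <goto next="account.vxml" fetchaudio="hold-music.wav"/>
  </block>
</form>
```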

The DTMF buffer is also cleared when the ASR engine switches. If the Holly Voice Browser parameter

‘asrengine’ is used to make the switch this will occur immediately before the first recognition in the

scope of the property. If this switch is performed using the VoiceXML ‘asrengine’ property, it won’t

take effect until the first recognition in the application and any DTMF digits collected to that point are

lost. This can be prevented by setting the VoiceXML application parameter ‘asrengine’ to the same

value as the Holly Voice Gateway component configuration parameter ‘srdefault’ so that the ASR

engine switch takes place at the very beginning of the call (administrators should refer to the Holly

Management System User Guide for details on modifying application and configuration parameters).

3.13 Holly DTMF Recognizer v2

The Holly DTMF Recognizer v2 is a conforming XML form grammar processor, as specified in SRGS

section 5.4, except that it does not support references to rules defined in external grammars.

Other notes on its implementation:

• Full schema validation of SRGS+XML documents is not performed

• Recursive grammars are not supported

• xml:lang attributes are ignored (following the SRGS specification)

• Grammars with mode attribute of “voice” are ignored. Only grammar documents that explicitly

set the mode to “dtmf” are processed

• The <record> element is supported.

Tokens

The tokens (terminal symbols) supported by the Holly DTMF Recognizer v2 are shown in the following

table; entries in the same row are synonymous. Uppercase or lowercase alphabetic characters may be

used.

1, one, dtmf-1

2, two, dtmf-2

3, three, dtmf-3

4, four, dtmf-4

5, five, dtmf-5

6, six, dtmf-6

7, seven, dtmf-7

8, eight, dtmf-8

9, nine, dtmf-9

*, dtmf-*, star, dtmf-star

#, dtmf-#, hash, pound, dtmf-hash, dtmf-pound

Further information on the Holly DTMF Recognizer v2 is available in Appendix B.
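The following is a small SRGS XML grammar sketch of the kind the recognizer processes. It sets the mode explicitly to “dtmf”, uses only tokens from the table above, and avoids external rule references and recursion. The menu layout itself is illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         mode="dtmf" root="menu">
  <rule id="menu" scope="public">
    <one-of>
      <item>1</item>
      <item>2</item>
      <!-- "star" is a synonym for "*" -->
      <item>star</item>
      <!-- A two-key sequence: 9 followed by the hash key -->
      <item>9 hash</item>
    </one-of>
  </rule>
</grammar>
```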

4. Output: Prompting and TTS

4.1 Selecting a TTS Engine

The Holly Voice Platform supports a broad range of text-to-speech products. Developers should

contact the platform administrator to confirm which TTS products are available. Administrators should

refer to the TTS Configuration information in the Holly Voice Platform Operations Guide.

A single installation may be configured to support many text-to-speech engines. Holly allows for each

Application to select its preferred text-to-speech engine or it may use the platform’s default.

The default text-to-speech engine for an Application is determined by the value of the ‘ttsengine’

property (a VoiceXML property extension). The Application default may be set as an Application

Parameter through HMS or can be set using a VoiceXML <property> element with appropriate scope.

For example:

<property name="ttsengine" value="realspeak45-mrcp1"/>

The table shows the list of text-to-speech products and the corresponding ‘ttsengine’ value.

| Vendor | TTS Engine | Value |
| --- | --- | --- |
| Acapela | Acapela (MRCP v1) | acapela-mrcp1 |
| IBM | IBM Websphere Voice Server 5.1.3 (MRCP v1) | wvs513-mrcp1 |
| Nuance | Speechify 3.0 (Direct API) | scansoft |
| Nuance | SpeechWorks Media Server 4.0 (RealSpeak TTS engine) (MRCP v1) | swms40-mrcp1 |
| Nuance | RealSpeak 4.5 (MRCP v1) | realspeak45-mrcp1 |
| Nuance | Recognizer to RealSpeak 4.0 (Direct API) | realspeak |
| Loquendo | Loquendo (MRCP v1) | loquendo-mrcp1 |

Note: The ‘ttsengine’ values may be customized at a platform installation. The platform

administrator can advise if any values are different from those shown in the table.

4.1.1 TTS Switching

By default, the Holly Voice Platform performs all text-to-speech using the default TTS engine. The platform also permits switching between TTS engines within a call and within a single VoiceXML document. To switch engines, set the ‘ttsengine’ property in a <property> element with appropriate scope (e.g. field, form or document scope). It is not possible, however, for prompts in the same queue to use different TTS settings. A switch only takes place when the queue is flushed (usually by performing a recognition).

Note: The ‘ttsengine’ property is a proprietary extension to VoiceXML.
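A minimal sketch of TTS switching follows. The ‘ttsengine’ values come from the default table in section 4.1 and may differ on a customized installation. Each field's prompt is queued separately and flushed by that field's recognition, so the two prompts never share a queue.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <!-- Document scope: engine used unless a narrower scope overrides it -->
  <property name="ttsengine" value="realspeak45-mrcp1"/>

  <form id="survey">
    <field name="existing_customer" type="boolean">
      <prompt>Are you an existing customer?</prompt>
    </field>

    <field name="hear_offers" type="boolean">
      <!-- Field scope: takes effect once the previous queue has been flushed -->
      <property name="ttsengine" value="loquendo-mrcp1"/>
      <prompt>Would you like to hear our current offers?</prompt>
    </field>
  </form>
</vxml>
```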

4.1.2 Using a Non-default TTS Voice

The ‘ttsvoice’ property is used to set a specific TTS voice for a VoiceXML document. The available

values for this parameter are dependent on the TTS voices installed on the platform.

Note: The ‘ttsvoice’ property is a proprietary extension to VoiceXML.

4.1.3 SSML

The Holly Voice Platform supports SSML for TTS and audio output.

Prompts consisting of unmarked-up text are wrapped in SSML tags before being sent to the TTS engine.

SSML tag support depends on the capabilities of the TTS engine. The IBM WVS proprietary alphabet and

other proprietary alphabets not starting with the “x-” prefix are supported.

The Holly Voice Platform does not support the VoiceXML 2.0 extension of the ‘<say-as>’ SSML element

described in Appendix P of [VoiceXML 2.0]. That is, the results from a recognition using one of the

builtin grammar types cannot be passed directly to a ‘<say-as>’ element to be read as a valid value of

that type for the current language.

4.2 Audio Files

The Holly Voice Platform supports the required audio formats specified in VoiceXML (see VoiceXML 2.0,

Appendix E).

All audio files must have an 8kHz sample rate and must contain single-channel (mono) recordings. Each audio file must have a file extension matching the file format as shown in the table.

Audio is fetched by the Holly Voice Platform and played by the Holly Voice Gateway. The TTS server is

not used to fetch audio data or generate an audio stream.

| Audio Format and Supported Content | Media Type | File Extension |
|---|---|---|
| .WAV (RIFF header): 8 kHz 8-bit mono mu-law; 8 kHz 8-bit mono A-law; 8 kHz 16-bit mono linear PCM | audio/x-wav | .wav |
| Raw (headerless) mu-law: 8 kHz 8-bit mono mu-law (G.711) | audio/basic [RFC1521] | .ulaw |
| Raw (headerless) A-law: 8 kHz 8-bit mono A-law (G.711) | audio/x-alaw-basic | .alaw |
| Raw (headerless) mu-law: 8 kHz 8-bit mono mu-law (G.711) | audio/basic [RFC1521] | .au |

Note: There is no support for audio/basic .au with au header format.

4.2.1 Throwing Errors on Audio Fetch Failures

VoiceXML 2.0 specifies that failure to fetch an audio file does not result in an ‘error.badfetch’ event

being thrown, even when there is no alternative content. The Holly Voice Platform recognizes a

property ‘com.holly.audiobadfetch’ which, when set to “true”, results in an ‘error.badfetch’ event

being thrown if an audio file cannot be fetched.

Note: Setting this property to “true” results in behavior that does not conform to VoiceXML 2.0, but

can be very useful for development.
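A minimal sketch of using the property during development follows. The audio path prompts/welcome.wav is a hypothetical file; the catch handler simply logs the event name and ends the session.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Non-conforming behavior, intended for development only -->
  <property name="com.holly.audiobadfetch" value="true"/>

  <catch event="error.badfetch">
    <log>Audio fetch failed: <value expr="_event"/></log>
    <exit/>
  </catch>

  <form>
    <block>
      <!-- Hypothetical file; if it cannot be fetched, error.badfetch is thrown -->
      <audio src="prompts/welcome.wav"/>
    </block>
  </form>
</vxml>
```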

Holly also recognizes the VoiceXML property ‘com.holly.audiofetchalarm’, which enables SNMP and email alarming for missing prompts. It can be turned on or off as required.

The VoiceXML property can be set either at the application level (in the Holly Management System) or

for individual dialogs within an application.

Setting the property to “off” suppresses the alarm (SNMP, email, etc.) and also suppresses the error events in the Holly Management System Call_Events for audio file fetch errors. The fetch failure is still logged as "outcome=error" within the fetch parameter details.

Note: Some lower level socket fetch errors may still be written to the Holly Management System log.

These are outside of the browser control, but have no effect on processing and do not raise

alarms.

Setting the property to “on” returns behavior to normal, raising both an alarm and writing the severe

errors to the Holly Management System log. By default the property is “off”.
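The per-dialog setting can be sketched as follows, assuming the property is set with a standard <property> element at form scope. The audio file names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="optional_promo">
    <!-- Dialog scope: no alarm if promo.wav is missing -->
    <property name="com.holly.audiofetchalarm" value="off"/>
    <block>
      <audio src="promo.wav">Ask us about our new services.</audio>
      <goto next="#main_menu"/>
    </block>
  </form>

  <form id="main_menu">
    <!-- Dialog scope: raise an alarm if a required prompt cannot be fetched -->
    <property name="com.holly.audiofetchalarm" value="on"/>
    <block>
      <audio src="menu.wav">Main menu.</audio>
    </block>
  </form>
</vxml>
```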

4.3 <mark> Element

The standard <mark> tag is supported as in the VoiceXML 2.0 standard.

From HVP 5.1, <mark> may be used without a TTS engine being invoked so long as the prompt content

comprises only <audio> and <mark> elements.
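As an illustration, the prompt below contains only <audio> and <mark> elements, so by the rule above no TTS engine should be needed. The file names and mark names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <field name="choice" type="digits">
      <prompt>
        <!-- No inline text or fallback text, so nothing is sent to TTS -->
        <audio src="menu_part1.wav"/>
        <mark name="after_part1"/>
        <audio src="menu_part2.wav"/>
        <mark name="after_part2"/>
      </prompt>
    </field>
  </form>
</vxml>
```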

5. Telephony: Session & Transfers

5.1 Session Variables

The following are the VoiceXML variables set by the Holly Voice Browser. Note that for outbound calls

there are some differences as documented in the following section.

| Variable | Description |
|---|---|
| session.connection.local.uri | Set to the URI in the SIP “To” header. |
| session.connection.remote.uri | Set to the URI in the SIP “From” header (prior to any CTI Manager lookup); this will be "anonymous" if CLI is restricted. |
| session.connection.aai | Variables received from CTI. In AIN calls, this is an object whose property/value pairs are the key/value pairs returned as call data by the CTI Manager. The CTI keys become the property names, and they resolve to the corresponding CTI values (interpreted as strings). In other calls it is defined, but has no properties. |
| session.connection.originator | Always points to the same object as session.connection.remote. |
| session.connection.protocol.name | “sip” for SIP calls; an empty string for any other signaling type. |
| session.connection.protocol.sip.callid | Set to the value of the Call-ID header field from the SIP INVITE that initiated the call (inbound calls only). |
| session.connection.protocol.version | “2.0” for SIP calls; an empty string for any other signaling type. |
| session.connection.redirect | Always an empty string. |
| session.telephone.ani | ANI if CLIR is not set or the passCLI application parameter is set; a masked ANI otherwise. |
| session.telephone.clearani | Only defined (as the ANI) if the passCLI application parameter is set. |
| session.telephone.moli | MOLI; only defined if the passMOLI application parameter is set. |
| session.telephone.dnis | DNIS after any CTI Manager lookup. |
| session.telephone.iidigits | |
| session.telephone.uui | |
| session.telephone.rdnis | |
| session.telephone.redirect_reason | |
| session.telephone.follow_on | Only defined if this is an AIN follow-on call; set to one of “busy”, “noanswer”, “invalid”, “congestion”, “hangup”. |
| session.com.holly.callid | The call ID assigned to this call. |
| session.com.holly.switchmessageid, session.com.holly.trunkgroupid, session.com.holly.trunkid | These three variables come from the CTI Manager. The message ID is used to determine the follow-on status. |
| session.com.holly.applicationid, session.com.holly.affiliateid, session.com.holly.servprovid, session.com.holly.initialuri | These four variables come from the Licence Manager. |
| session.com.holly.channelid | This variable comes from the HVG. |
| session.com.holly.browserid | The IP address of the host this browser is running on. |
| session.com.holly.correlationid, session.com.holly.scfid, session.com.holly.callerlocation | These variables are only set if the CTI Manager returns them to the browser in the call data. |
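The session variables listed above are ordinary ECMAScript values and can be read in any expression. The sketch below logs a few of them; the typeof test reflects the fact that session.telephone.follow_on is only defined for AIN follow-on calls. The form id is illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="show_session">
    <block>
      <log>ANI <value expr="session.telephone.ani"/>,
        DNIS <value expr="session.telephone.dnis"/>,
        call ID <value expr="session.com.holly.callid"/></log>

      <!-- follow_on is only defined for AIN follow-on calls, so test before use -->
      <if cond="typeof session.telephone.follow_on != 'undefined'">
        <log>Follow-on reason: <value expr="session.telephone.follow_on"/></log>
      </if>
    </block>
  </form>
</vxml>
```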

5.1.1 Session Variables for Outbound

After a call is answered by a remote party the Holly Voice Browser commences execution of the

VoiceXML session. The session is started with the outbound parameters of the original call request

placed into ECMAScript variables in the session scope of the VoiceXML context. The mapping of

parameters is shown in the table below.

For outbound calls the CLID/ANI corresponds to the Remote party and the DNIS to the platform service

calling the Remote party. The Holly License Manager uses the DNIS to retrieve the virtual IVR data for

the call handling.

| VoiceXML Variable | PLACECALL/PLACECALLRESULT Field |
|---|---|
| session.connection.local.uri | Local (number of the calling application) |
| session.connection.remote.uri | Remote (number of the called party) |
| session.connection.originator | Local (the party that initiated the call; this is a reference to either session.connection.local or session.connection.remote) |
| session.telephone.ani | Remote |
| session.telephone.dnis | Local |
| session.com.holly.callid | Holly Call ID (returned in PLACECALLRESULT) |
| session.com.holly.userdata | User Data |

5.2 Session.connection.aai Example

In certain telephony configurations the Holly Voice Platform populates the VoiceXML session with CTI or

other connection-related data. This section documents how VoiceXML applications can access this call

data.

For the following examples, suppose the call data returned by the CTI Manager for a call is

fruit = apple

beer = lager

If the name of a property is known in advance, it can be accessed directly:

<var name="fruit" expr="session.connection.aai.fruit"/>

With error-checking:

<var name="fruit"/>
<script>
<![CDATA[
  if (typeof(session.connection.aai.fruit) != "undefined")
    fruit = session.connection.aai.fruit;
  else
    fruit = 'unknown';
]]>
</script>

If the property names are not known in advance, the ECMAScript ‘for/in’ statement can be used to iterate through the properties defined on the session.connection.aai object. Here is an example that iterates through the session.connection.aai object and reads out the name/value pairs:

<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="example">
    <var name="text" expr="''"/>
    <script>
    <![CDATA[
      for (var prop in session.connection.aai)
        text += prop + " is " + session.connection.aai[prop] + ", ";
    ]]>
    </script>
    <block>
      <value expr="text"/>
      <disconnect/>
      <exit/>
    </block>
  </form>
</vxml>

With the call data above this would result in the prompt “fruit is apple, beer is lager”.

5.3 Passing Data Between Sessions

HVP 5.1 introduces a mechanism for an application to store data to the Holly CTI Manager so that it is

available for a subsequent VoiceXML session in a follow-on call (where the same call results in more

than one VoiceXML session). Contact Holly Support for details.

5.4 Transfers

5.4.1 Transfer Types

Support for transfers varies for each installation, depending upon the telephony environment within

which the Holly Voice Platform is deployed. The VoiceXML interpreter will attempt both blind and

bridge transfer types, according to the rules in VoiceXML 2.0; however, if a particular type is not

supported in a given environment, an ‘error.unsupported.transfer.<type>’ event is thrown. Note that

<transfer> does not set extra shadow variables as described in VoiceXML 2.1, section 7.


5.4.2 Destination URIs

The transfer destination URIs allowed in VoiceXML applications depend on the deployment environment. The VoiceXML interpreter recognizes ‘tel’ URIs, but the actual digits needed to establish a transfer depend on carrier business rules. The processing of ‘tel’ URIs is therefore installation-dependent.

The VoiceXML interpreter recognizes ‘sip’ URIs. Contact the system administrator to determine

whether they are supported in the deployment environment and for content information.

As a special case the Holly Voice Platform treats “relative” URIs, understood here as those with no

scheme component, as belonging to an unnamed proprietary scheme in which the string is passed

without modification to the telephony integration.

For example,

<transfer dest="CODE1">

...

results in the string “CODE1” being passed to the telephony interface as the destination for the transfer. Contact the system administrator to determine whether such destinations are supported in the deployment environment and for content information.

5.4.3 Transfer CLID

If the VoiceXML property ‘com.holly.transferclid’ is set in the scope of the transfer element (typically

within the element), its value is supplied as the CLID in the transfer-initiating SIP message.
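A minimal sketch of setting the CLID inside the transfer element follows. The destination number and the CLID value are hypothetical, and the CLID format accepted in practice depends on the deployment.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="xfer_form">
    <transfer name="xfer" dest="tel:+61255501234" type="bridge">
      <!-- Hypothetical CLID value, scoped to this transfer only -->
      <property name="com.holly.transferclid" value="0255501000"/>
      <prompt>Please hold while we transfer you.</prompt>
      <filled>
        <log>Transfer result: <value expr="xfer"/></log>
      </filled>
    </transfer>
  </form>
</vxml>
```

Placing the property inside <transfer> keeps it from affecting any other transfer in the same form or document.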

5.4.4 Recognition During Transfer

The Holly Voice Platform does not support speech or DTMF recognition during transfer.

5.4.5 Whisper Transfer

Holly supports whisper transfer as an extension to the set of standard VoiceXML transfer types. In a

whisper transfer the VoiceXML application is able to interact fully with the C-party (recipient of the

transfer) prior to either completing or rejecting the transfer. Examples of its use:

• Perform a transfer with the application passing information about the call to the C-party (transfer

recipient) so that they can continue the call smoothly;

• Provide the C-party with the option to accept or reject the transfer request (e.g. “I have Joe

Smith on the line. Would you like to take the call?”).

To achieve both a transfer and a separate dialog with the C-party, the VoiceXML code for whisper transfer combines the standard <transfer> element (used like a bridge transfer) with the <subdialog> element. For reference, the <subdialog> element is described in Section 2.3.4 of VoiceXML 2.0 and bridge transfer is described in Section 2.3.7.2 of VoiceXML 2.0.

Executing whisper transfer causes the following sequence of events:

• Any queued prompts are played to completion to the A-party (caller) before they are put on hold.

• A-party is put on hold and may be played transferaudio (as for standard bridge transfer).

• The platform initiates a new call to the C-party indicated by the dest or destexpr attribute of the <transfer> element (as for any transfer).

• Once a connection to the C-party is established, a subdialog is invoked to interact with the answerer. As with standard subdialogs, this is executed in its own execution context, and variables may be passed and returned through the <subdialog> mechanism only.

Note: recognition on transfer (or transfer barge-in) is not supported.

To combine <transfer> and <subdialog> capabilities, the <transfer> element is extended to allow the <param> element (which is standard for the <subdialog> element). Four <param> names have special interpretations within a <transfer> element. Their values may be provided by either the “value” or the “expr” attribute.

| Parameter Name | Description | Example |
|---|---|---|
| com_holly_uri | URI reference to the subdialog to execute with the C-party. The URI may be a fragment referencing the current document. | ./whisper_dialog.vxml#form or #whisper_dialog |
| com_holly_namelist | The list of ECMAScript variables to submit as URI query parameters. Equivalent of the namelist attribute on <subdialog>. | var1 var2 status |
| com_holly_method | Method of fetch. | get (default) or post |
| com_holly_enctype | Encoding type of the fetch. | application/x-www-form-urlencoded (default) or multipart/form-data |
| <any other name> | Parameter is passed as a <param> to the subdialog. There must be a corresponding <var> element in the subdialog. | |

Attributes

The <transfer> maxtime attribute can be used to limit the duration of the transfer. This limits the

combined duration of both the interaction within the subdialog and the subsequent bridge connection.

During the transfer operation and while the subdialog executes the current interpreter session is

suspended. If a transferaudio attribute is provided the audio resource will be played to the caller until

the connections are bridged or the subdialog ends.

Properties

Four standard VoiceXML properties apply to the fetch behavior:

• fetchtimeout

• fetchhint

• maxage

• maxstale

The fetchaudio property does not apply to the fetch since the transferaudio attribute of

the <transfer> extension applies.

Error Handling

The following events might be thrown during the attempt to establish the new connection. They will

all be thrown in the context of the calling dialog.

error.semantic: The com_holly_uri parameter is not supplied.

error.badfetch: The URI referenced by the “com_holly_uri” parameter is invalid.

error.connection.baddestination: The URI reference com_holly_uri is malformed.

error.unsupported.uri: The platform does not support the URI scheme in the URI reference.

error.connection.noroute: The platform is not able to make the connection.

error.connection.noresource: The platform cannot allocate resources to create the new connection.
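Because these events are thrown in the calling dialog, they can be caught on the whisper <transfer> element itself. The sketch below is one possible arrangement, not a required pattern; the destination number and the subdialog file name are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="whisper_form">
    <transfer name="guru" dest="tel:+61255501234" type="com.holly.whisper">
      <param name="com_holly_uri" value="whisper-call.vxml"/>

      <!-- Prefix match: covers baddestination, noroute and noresource -->
      <catch event="error.connection">
        <log>Could not connect to the C-party: <value expr="_event"/></log>
        <exit/>
      </catch>

      <!-- Problems with the com_holly_uri parameter or the subdialog fetch -->
      <catch event="error.semantic error.badfetch error.unsupported.uri">
        <log>Whisper subdialog could not be started: <value expr="_event"/></log>
        <exit/>
      </catch>
    </transfer>
  </form>
</vxml>
```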

Connecting A-Party to C-Party

A <transfer> element with the type of com.holly.join must be executed in the subdialog to connect the

A-party to the C-party (i.e. to join the whisper transfer). (Alternatively the subdialog can <return> or

<exit> as described in the following section.)

The dest, destexpr, connecttimeout, maxtime and transferaudio attributes of the <transfer> element

are ignored for this type of transfer.

If the application executes a <transfer> of type com.holly.join then the platform will join the A-party

and C-party legs as a bridge transfer which completes the whisper transfer. The A-party and C-party

remain connected until one of the following:

• A-party hangs up

• C-party hangs up

• maxtime is reached

There can be no failures before the bridge is established with this type of transfer (because connection

to C-party has already been established). The only conditions are thus post-connection conditions, as

listed in Table 2 (but excluding the subdialog disconnect condition). The results of a join transfer are

automatically copied to the corresponding whisper transfer element variable, and once the bridge is

complete control continues with any <filled> element in the whisper <transfer> element in the calling

dialog. If the A-party hangs up, a connection.disconnect.hangup is thrown in the calling dialog. If the

C-party hangs up, the calling dialog <transfer> item variable is populated accordingly.

Once the connections are bridged, no further interaction in the subdialog will take place. The

VoiceXML execution next returns to the VoiceXML context in which the whisper transfer was initiated

and the behavior follows the specification for returning from a bridge transfer.

If the join <transfer> element specifies a com.holly.join.namelist <param> element, the whisper

<transfer> element result shadow variable will be populated with the variable values returned, as if

the subdialog had terminated with a <return> with a namelist attribute.

Whisper Subdialog <return> or <exit>

If the application does not complete the transfer with a join, subdialog execution completes either when the application executes a <return> element or when there is an explicit or implicit <exit>. With a <return>, control and data are returned to the calling whisper transfer, with behavior following the specification for returning from a subdialog. This also results in the connection to the C-party being closed and the subdialog execution context being deleted. (Note that because the application has not executed a <transfer> of type com.holly.join, there will be no connection between the A-party and C-party.)

The <return> element may specify a list of variable values to return (namelist attribute), and these populate the shadow variable of the calling <transfer> element. Alternatively, the <return> element may specify an event to throw in the calling dialog (event or eventexpr attribute) with an associated message (message or messageexpr attribute). The standard <transfer> shadow variables are set as in VoiceXML 2.0.

For subdialogs that end with either a join or a <return> the duration shadow variable is set to the

duration of the whisper transfer which is the sum of the time the caller is on hold and the time the two

connections are bridged. Also, an extension shadow variable, holdduration, is set to the time the

caller spent on hold.

If the subdialog ends, explicitly or implicitly, with an <exit>, interpretation terminates and any remaining connections are released, following the VoiceXML <subdialog> behavior.

After Completion of Whisper Transfer

When execution returns to the initiating <transfer> element (of type whisper) the item variable

contains an indication of the final condition of the transfer. Following the specification for bridge

transfers in VoiceXML 2.0 the return behavior is different according to whether connection is

established to the C-party. For whisper transfer there is the additional distinction that the platform

can connect successfully to the C-party but the application does not join the A- and C-parties.

The possible values of the transfer item variable depend on the stage reached. Table 1 shows the

possible values assigned to the item variable before the connection to C-party is established.

| Condition | Value |
|---|---|
| target busy | 'busy' |
| no answer before connect timeout | 'noanswer' |
| other | 'unknown' |

Table 1: Values of the transfer item variable before the new connection is established (prior to execution of subdialog)

Table 2 shows the possible values after the connection is established. The possible values after the connections are joined are a subset of these, and are distinguished using a non-standard boolean shadow variable, name$.joined, that indicates whether the connections were joined. This shadow variable is not defined if the new connection fails to be established.

| Condition | Value |
|---|---|
| C-party disconnects | 'far_end_disconnect' |
| maxtime reached | 'maxtime_disconnect' |
| Subdialog disconnects | 'application_disconnect' |
| other | 'unknown' |

Table 2: Values of the transfer item variable after the new connection is established but before joining (i.e. during execution of subdialog)
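The two tables can be told apart at run time with the joined shadow variable. The sketch below assumes the variable is exposed as guru$.joined for a transfer named guru, by analogy with guru$.duration in the examples; the destination number and file name are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="whisper_outcome">
    <transfer name="guru" dest="tel:+61255501234" type="com.holly.whisper">
      <param name="com_holly_uri" value="whisper-call.vxml"/>
      <filled>
        <!-- Assumption: joined is undefined (falsy) when no connection was made -->
        <if cond="guru$.joined">
          <log>Parties were bridged; call ended with <value expr="guru"/></log>
        <else/>
          <log>Parties were not bridged; outcome was <value expr="guru"/></log>
        </if>
      </filled>
    </transfer>
  </form>
</vxml>
```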

At any stage, if the A-party disconnects, a connection.disconnect.hangup is thrown. If this occurs while the new connection is being established or after the connections are joined, it is thrown in the context of the calling dialog. If it occurs while the caller is on hold, it is thrown in the context of the subdialog as a connection.disconnect.hangup.a.

Example: Whisper Subdialog (whisper-call.vxml)

<?xml version="1.0"?>
<vxml xmlns="http://www.w3.org/2001/vxml" version="2.1">
  <catch event="connection.disconnect.hangup.c">
    <log>The C party hung up: caught in whisper-call</log>
    <script>result.condition = "disconnect"</script>
    <return namelist="result"/>
  </catch>
  <catch event="connection.disconnect.hangup">
    <log>The A party hung up: caught in whisper-call</log>
    <exit/>
  </catch>
  <form>
    <var name="com_holly_uri"/>
    <var name="com_holly_namelist"/>
    <var name="karma"/>
    <var name="result" expr="{}"/>
    <field name="proceed" type="boolean">
      <prompt>Are you willing to help improve the karma of
        <value expr="karma"/>?</prompt>
      <filled>
        <if cond="proceed">
          <script>result.condition = "accepted"</script>
          <goto nextitem="join"/>
        <else/>
          <script>result.condition = "rejected"</script>
          <return namelist="result"/>
        </if>
      </filled>
    </field>
    <transfer name="join" cond="false" dest="ignored" type="com.holly.join">
      <param name="com_holly_namelist" value="result"/>
    </transfer>
  </form>
</vxml>

Example: Initiating Whisper Transfer

<?xml version="1.0"?>
<vxml xmlns="http://www.w3.org/2001/vxml" version="2.1">
  <catch event="connection.disconnect.hangup">
    <log>The A party hung up: caught in whisper-main</log>
    <exit/>
  </catch>
  <catch event="error">
    <exit/>
  </catch>
  <form>
    <field name="karma" type="digits?minlength=3;maxlength=10">
      <prompt>What's your karma?</prompt>
    </field>
    <transfer name="guru" destexpr="karma" type="com.holly.whisper"
        connecttimeout="15s" maxtime="60s" transferaudio="chicken.wav">
      <param name="com_holly_uri" value="whisper-call.vxml"/>
      <param name="com_holly_namelist" value="karma"/>
      <param name="karma" expr="karma"/>
      <prompt>
        Please wait while your karma works for you or against you.
      </prompt>
      <filled>
        <log>whisper transfer returned: guru$=<value expr="guru$"/>,
          duration=<value expr="guru$.duration"/>,
          holdduration=<value expr="guru$.holdduration"/>,
          result=<value expr="guru$.result"/></log>
        <if cond="guru$.result">
          <goto nextitem="subdialogresult"/>
        <else/>
          <goto nextitem="transferresult"/>
        </if>
      </filled>
    </transfer>
    <block name="subdialogresult" cond="false">
      <if cond="guru == 'application_disconnect'">
        <prompt>application disconnect</prompt>
      <elseif cond="guru == 'unknown'"/>
        <prompt>outcome is unknown</prompt>
      <elseif cond="guru == 'far_end_disconnect'"/>
        <prompt>We hope your conversation with the guru improved
          your already excellent karma.</prompt>
      <elseif cond="guru == 'maxtime_disconnect'"/>
        <prompt>Sorry, the guru is a busy, important person, and
          he has other karmae to attend to. We hope the abrupt
          termination has not karmed your harma too
          much.</prompt>
      <elseif cond="guru$.result.condition == 'rejected'"/>
        <prompt>sorry, the guru does not wish to address your
          karma. Try again in your next life.</prompt>
      <else/>
        <prompt>Well, look at that. A guru meditation error.
          We hope your karma hasn't been bruised.</prompt>
      </if>
    </block>
    <block name="transferresult" cond="false">
      <if cond="guru == 'busy'">
        <prompt>Sorry, with your karma you're condemned to be a
          silent follower, never even observed. Try again in your
          next life.</prompt>
      <elseif cond="guru == 'noanswer'"/>
        <prompt>Sorry, with your karma you are not important enough
          to talk to the guru. Try again in your next
          life.</prompt>
      <elseif cond="guru == 'unknown'"/>
        <prompt>Sorry, it seems the guru's karma is worse than
          yours, so you probably don't want to talk to him
          anyway.</prompt>
      </if>
    </block>
    <block>
      all over, disconnect call
      <log>All over.</log>
      <exit/>
    </block>
  </form>
</vxml>

6. Logging

The Holly Voice Platform provides in-depth logging, reporting and analytics to facilitate development,

debugging and support for production applications. This section documents the logging as used by

application developers. The Holly Management System Guide provides information on using HMS for

reporting and analytics. The Holly Reference Manual and Holly Operations Guide provide platform

administrators with information on configuration of a platform to enable logging and reporting.

The Holly Voice Platform has two key forms of logging:

• Call Detail Record: stored as LOG_CALLS in Holly, this is a summary table containing around 90

properties. Core information in the CDR includes call start/end times, call duration, ANI, DNIS, use

of ASR and TTS, use of logging and much more.

• Call Event Record: stored as LOG_EVENTS in Holly, this is an event-by-event record that documents the progress of a call from initiation to completion.

6.1 Events

The Holly Voice Platform can be configured to record over 30 different types of event during execution

of each call. These events are recorded in real-time during the call by the Holly Voice Browser. The

events are submitted to the platform database via the Holly Log Manager. The events can be viewed

and analyzed through the Holly Management System.

The <log> element of VoiceXML is used by application developers to insert a “Log” type event. All the

other event types are generated by the Holly Voice Platform. (The set of events configured for

recording can be configured at the platform and application levels as described in the following

section.)

For optimal platform and database efficiency it is recommended to remove any logging options that are

not required.

EVENTID

PARAM

Description

Answer

result=%s

Indicates whether the call has been answered. The “result” parameter may be success or failure.

ASR Session

asrengine=%S| sessionid=%S|

address=<host:port>|

server=<server>|

endpoint=<address>

Logs the ASR session event. The endpoint address of

the recognizer is also included.

Call end

cpu=%.3f| normalcpu=%.3f|

callduration=%.3f| reccount=%d|

ttscount=%d| ttsduration=%.3f|

logcount=%d| logbytes=%d

Summary call statistics recorded irrespective of the

level of logging.

Call start

ANI=%s| DNIS=%s| VURL=%s| follow

on,reason=%s

At the start of a call logs the Caller Line Identifier,

DNIS, and the initial applications document identifier.

Disconnect

List of variables and corresponding

values specified with VXML

attribute ‘namelist’.

Records a hang-up event.

Document dump

<a reference to the VoiceXML

document>

Logs the whole VoiceXML document as it is fetched.

Document

transition

uri=%s| cpu=%.3f| normalcpu=%.3f

Written when document scope changes.

Error Critical

error=%d| msg=%s

Logged when a critical error occurs.

Error Severe

error=%d| msg=%s

Logged when a severe error occurs.

Error Warning

error=%d| msg=%s

Logged when a warning error occurs.

Exit

result=%s

Logs exit event

Fetch

uri=%s| fetchtype=(VXML| |audio|

|grammar| |other)|

incache=(true| |false)|

latency=%.3f| documentsize=%d|

outcome=(success| |no response|

|error| |timeout)| failover=(true|

|false)| localport=%s| hostname=%s

Logged for each document fetch. When external

scripts are fetched they are logged with a fetch type

of 'vxml'.

Grammar

activation

URI=%s

Fetch and/or activation of a new grammar.

Grammar

deactivation

URI=%s

Logged when a grammar is deactivated.

License

mode=(acquire| |resolve)|

key=nnn| outcome=(success|

|maximum licence number reached

or exceeded| |license key not

found| |socket error| |message

encode error| |message decode

error| |application not found|

|affiliate not found| |service

provider not found| |licences do

not reconcile| |licence manager

not seeded| |connect failure)|

service=<service provider

ID>:<affiliate ID>:<application ID>

The first event for a session, preceding the “call start”

event representing the license lookup. Values are:

mode=acquire or resolve.

The value 'acquire' means that this license is being

requested for the duration of the call; this is the

license that authorizes all sessions within the call. The

value 'resolve' means that this license is being

requested solely to obtain application data for a new

session -- the Holly License Manager will not increase

the license allocation for the application.

key=The key for the license lookup.

outcome= outcome of the request

If the key lookup is successful, a fourth field will be

present:

service=<service provider ID>:<affiliate

ID>:<application ID>

Note: by default this event is not included in the Holly

Voice Browser callevents parameter which determines

which events are logged.

Log Event

EVNT=<event id>| Label=<label in log tag>| expr=<expression in log tag>| content=<user defined parameter>

Logged as a result of a <log> tag included in the VoiceXML document. The VoiceXML log tag format includes the attributes label and expr in addition to the log tag content. The format and contents of the PARAM field are under the control of the application builder.

Note: The browser treats <log> content differently if it is of the form 'EVNT=<txt>|...'. This content is logged to the ASR engine with the event name <txt>, and its content is all the text after the '|'. If <txt> begins with 'SWI' or 'calllog:?' the event will NOT be logged to HMS; otherwise the event will also be logged to HMS with the event name <txt>.

This format is supported by the SpeechWorks/ScanSoft/Nuance OSDM (OpenSpeech DialogModule) products, which log a lot of useful dialog state information with events that start with EVNT=SWI. This information can then be used by the OpenSpeech Insight reporting tool to perform powerful ASR analysis.

Placecall start

remote=<SIP URI>| local=<app

number>

Logs the SIP URI and application number.

Placecall end

result=(no reason supplied| |user

disconnect| |silent| |maximum

duration| |special information

tones| |fax| |busy| |cti| |no

answer| |error| |bad destination|

|bad format| |answering machine)

Logs the result of the outbound call.

Prompt (external)

type=external| URI=%s

Logged when an external prompt is played. URI is the URI of the audio file.

Prompt (SSML)

type=SSML| content=%s

Logged when a TTS prompt is played. Content is the

TTS string.

Prompt

(disconnect )

status=disconnect

Logged when the Holly Voice Gateway attempts to

play a prompt but the call has already disconnected.

This is common in normal operation for most

applications.

Recognition start

inputmodes=(dtmf| |voice|

|dtmf,voice)| threshold=%d|

timeout=%.3f|

bargeintype=(speech| |hotword)

Start of recognition. This event contains information

such as inputmodes, timeout, threshold and bargein

type.

Recognition end

(fail)

result=(no input| |disconnect| |no

match| |error)| bargein=(true|

|false)| inputmode=unknown|

dtmfinput=%s

End of recognition when the recognition fails. The

PARAM value indicates the failure reason. Failed DTMF

input (if applicable) is also logged.

Recognition end

(success)

result=success| utterance=%s|

confidence=%d| bargein=(true|

|false)| inputmode=(dtmf|

|speech)| utterance=%s|

confidence=%d

End of a successful recognition.

Recording start

maxtime=%.3f| dtmfterm=(true|

|false)| type=%s

Start of voice recording.

Recording end

(fail)

result=(maxtime exceeded||no

input||disconnect||max speech

timeout||error)

End of voice recording when the recording has failed.

The PARAM value indicates the failure reason.

Recording end

(success)

result=success| duration=%d

End of a voice recording when successful.

SIP session

callid=<SIP callid>| remote-rtp=%s| local-rtp=%s

SIP Session Call ID.

System response

latency=%.3f

Logged whenever a recognition event occurs. The latency is given in seconds, to millisecond resolution.

Transfer start

mode=(network| |blind| |bridge|

|conditional)| {URI=%s|

|destination=%s}

Start of a call transfer. Given either the URI or the

destination.

Transfer end(fail)

result=(bad destination|

|disconnect| |error| |remote

busy| |timeout| |network busy|

|maxtime exceeded)

End of a call transfer. PARAM value indicates the

failure reason.

Transfer

end(success)

result=success| duration=%.3f

End of a call transfer when successful. Duration is only

present for a successful bridge-transfer, value in

seconds to millisecond resolution.

VXML Event

event=%s

or event=%s| message=%s

Logs events thrown by VXML application

6.1.1 Configuring Event Logging

Each event has a default configuration (on or off) at the platform level. This is determined by the

platform administrator and may vary from the factory default settings provided by Holly.

Through the Application Parameters in HMS it is possible to enable or disable individual events separately for each application. The parameter name is “client.log.<event name>” and the parameter value is “true” or “false”. The event name must be in lower case, with each space replaced by an underscore.

For example:

client.log.system_response = true

For optimal platform and database efficiency it is recommended to disable logging of any events that are not required.

6.2 <log> Element

VoiceXML provides the standard <log> element for use in applications. The standard states that the

behavior of <log> is platform-specific.

The Holly Voice Platform logs each <log> event as an event record. All events for a call - including

<log> and many other events - can be displayed and analyzed through the Holly Management System.

This section defines Holly's specific behaviors relating to the <log> element. These capabilities have

been implemented to simplify application development and debugging, as well as platform operation.

The format of the ‘log’ event is:

ID

Param ID

log

label=<label in log tag>|expr=<expr in log tag>|content=<user defined parameter>

6.2.1 Label on <log>

The VoiceXML <log> element allows a “label” attribute. If this attribute is provided by the application then the label is inserted into the text of the logged event.

For example, the following code:

<log label="myLabel">

dialog state info

</log>

would result in the following event being recorded in the LOG_EVENTS table.

ID

Param ID

log

label=myLabel|content=dialog state info

6.2.2 Changing the Event Type

Holly can log over 30 different event types. Holly also allows applications to create custom event

types. This is often used when generating custom reports from the platform database.

If the content of the <log> tag is of the form “EVNT={name}|{description}” then the event will be

logged as:

ID

Param ID

name

description

For example the code:

<log>EVNT=myevent|This is the text of myevent.</log>

would result in the following record in the LOG_EVENTS table:

ID

Param ID

myevent

This is the text of myevent

Any spaces in the event ID field are stripped, and any consecutive white space in the Param ID field is

converted to a single space.
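As a minimal sketch, a custom event can also carry a run-time value by placing a <value> element in the description part. The event name “paymentmade”, the form id and the variable are illustrative assumptions, and it is assumed here that <value> is expanded before the “EVNT=” prefix is interpreted.

```xml
<form id="customEvent">
  <!-- Hypothetical variable holding the amount to report. -->
  <var name="amount" expr="'42.50'"/>
  <block>
    <!-- Expected to be recorded with ID "paymentmade" and the
         description "amount=42.50" (assumption, see lead-in). -->
    <log>EVNT=paymentmade|amount=<value expr="amount"/></log>
  </block>
</form>
```

Keeping the event name free of spaces avoids relying on the stripping rule described above.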

6.2.3 Objects and Arrays

For convenience, objects and arrays can be inserted into <log> elements using <value> elements. HVP 5.0 displays objects and arrays as shown in the following examples:

The array:

values[0] = zero

values[1].name = one

values[1].value = 1

values[2] = two

values[3][0] = cero

values[3][1] = uno

values[3][2] = dos

will be logged as:

values=[ zero, [object], two, [array] ]

The object:

result.fruit = apple

result.pizza.base = thin

result.pizza.topping = hawaiian

result.drink = juice

result.lotto[0] = 21



will be logged as:

result={ fruit:apple; pizza:[object]; drink:juice; lotto:[array] }
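The following fragment is a sketch of how the array from the first example might be built and logged. The form id and label are illustrative assumptions; the logged text is expected to follow the format shown above.

```xml
<form id="logArrayExample">
  <!-- Declare the variable in form scope, then fill it from ECMAScript. -->
  <var name="values"/>
  <script><![CDATA[
    values = [];
    values[0] = 'zero';
    values[1] = { name: 'one', value: 1 };
    values[2] = 'two';
    values[3] = ['cero', 'uno', 'dos'];
  ]]></script>
  <block>
    <!-- Nested objects and arrays are summarised as [object] and
         [array] rather than being expanded. -->
    <log label="arrayDump"><value expr="values"/></log>
  </block>
</form>
```

To see the contents of a nested element, log it separately, for example with <value expr="values[3]"/>.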

6.2.4 ECMAScript Log Function

Holly has extended VoiceXML with the built-in function ‘session.logEvent()’ to enable logging from

ECMAScript scripts within VoiceXML applications. The effect of the function is as though the argument

had been included in a <log> tag in VoiceXML. The log message will appear with a “script=” prefix.
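A minimal sketch of calling the function from a script follows. It assumes that session.logEvent() accepts a single string argument; the form id and the variable are illustrative.

```xml
<form id="scriptLogging">
  <block>
    <script><![CDATA[
      // Assumption: one string argument, treated like the text
      // content of a <log> element.
      var attempts = 3;
      session.logEvent('retry count is ' + attempts);
    ]]></script>
  </block>
</form>
```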

6.3 Call Record: LOG_CALLS

The Call Detail Record (or LOG_CALLS in the Holly database) contains three attributes that can be set

by an application to indicate the characteristics or outcome of a call. These application-specific fields

can be used in HMS to enhance reporting and analysis (e.g. what is the average duration of calls from

“gold card members”).

The values of the fields are set by a VoiceXML application at any time during the call, from the start to the end of VoiceXML execution.

The fields are:

• outcome: If set, it must be one of SUCCESS, FAIL or UNKNOWN. No other values are allowed.

Lower case values will be converted automatically to uppercase. If outcome is not defined then

the record will be “null”.

• calldesc1 and calldesc2: Call Descriptions 1 & 2 are application-defined values and can be any

string up to 100 characters. Strings over 100 characters are truncated. If the values are not

defined then the value will be “null”.

Use of this capability is optional. If not set then “null” values are recorded.

The values can be logged through the ‘namelist’ attribute of the <disconnect> and <exit> tags. The VoiceXML/ECMAScript variable of the same name (i.e. outcome, calldesc1 or calldesc2) must be declared and assigned in advance. The variable (or variables) can then be referenced as below:

<var name="calldesc1" expr="'Payment Completed by Credit Card'"/>

<disconnect namelist="calldesc1"/>

or

<exit namelist="calldesc1"/>

More than one attribute can be set using the <disconnect> or <exit> tag by including the variable

names in the namelist expression separated by a space. For example:


<var name="calldesc2" expr="'Gold Status Account Holder'"/>

<var name="outcome" expr="'SUCCESS'"/>

<disconnect namelist="calldesc1 calldesc2 outcome"/>

The attributes can also be set using the <log> tag. This mechanism differs in that the label attribute of the <log> tag names the call record attribute (rather than a variable name) and the expr attribute supplies the value to be logged. For example:

<log label="calldesc1" expr="'Payment Completed by Credit Card'"/>

This mechanism does not require a variable to be declared (but a variable can be referenced in the expr attribute in the normal way if desired). Only one attribute can be set in a single <log> element, so multiple <log> elements must be used to set all the attributes:

<log label="outcome" expr="'SUCCESS'"/>

<log label="calldesc1" expr="'Payment Completed by Credit Card'"/>

<log label="calldesc2" expr="'Gold Status Account Holder'"/>

If any attribute is set more than once during the execution of a call then previous values are

overwritten.
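The sketch below combines these mechanisms in one document: the success path reports its result through <disconnect>, and a document-level handler reports a failure through <exit>. The form id, the description strings and the choice to catch every error event are illustrative assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <!-- Document-scope variables named after the call record attributes. -->
  <var name="outcome" expr="'UNKNOWN'"/>
  <var name="calldesc1" expr="''"/>

  <!-- Any otherwise unhandled error marks the call as failed. -->
  <catch event="error">
    <assign name="outcome" expr="'FAIL'"/>
    <assign name="calldesc1" expr="'Error: ' + _event"/>
    <exit namelist="outcome calldesc1"/>
  </catch>

  <form id="payment">
    <block>
      <!-- The dialog logic would go here. -->
      <assign name="outcome" expr="'SUCCESS'"/>
      <assign name="calldesc1" expr="'Payment Completed by Credit Card'"/>
      <disconnect namelist="outcome calldesc1"/>
    </block>
  </form>
</vxml>
```

Because later values overwrite earlier ones, the initial “UNKNOWN” only reaches the call record if neither path runs.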

6.4 Log Suppression

Some deployments of DTMF and speech applications involve the collection and/or presentation of

sensitive information. Examples of sensitive information might include security identifiers (e.g. PIN),

or private caller data (e.g. phone number, credit card number, name, address and bank account

balance).

To protect this information, log suppression can be enabled and disabled around sensitive information

within an application through the VoiceXML extension property ‘suppresslogs’.

When the property is set to “true” the Holly Voice Platform suppresses all logging of events. Additionally, Holly suppresses logging by the following speech recognizers if they are currently in use:

• Nuance 8.5 MRCP v1

• Nuance 9.0 MRCP v1

• Nuance ASR 8.5-20050930

• ScanSoft OSR 3.0.9

• Holly DTMF.

To enable log suppression the following should be included in the VoiceXML document. It affects only

the current scope (field, form, document etc).

<property name="com.holly.suppresslogs" value="true"/>

Note: VoiceXML <property> scoping rules apply. The allowable scopes are application, document, form, menu and form item (see VoiceXML 2.0, Section 6.3).

The suppression applies to trace logging by all Holly components. Suppression also blocks audio

recording of utterances and data logging by the recognizers stated above.
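As an illustration of scoping, the following sketch suppresses logging only while a PIN is collected. The form, field names and prompts are assumptions made for the example.

```xml
<form id="login">
  <!-- Logged normally: no suppression applies to this field. -->
  <field name="account" type="digits">
    <prompt>Please enter your account number.</prompt>
  </field>

  <!-- Suppression applies only within the scope of this field. -->
  <field name="pin" type="digits?length=4">
    <property name="com.holly.suppresslogs" value="true"/>
    <prompt>Please enter your four digit PIN.</prompt>
  </field>

  <block>
    <!-- Outside the field the property is no longer in effect. -->
    <log>login details collected</log>
  </block>
</form>
```

Placing the property in the narrowest scope that covers the sensitive input keeps the rest of the call available for diagnosis.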

6.4.1 Exceptions to Suppression

The suppression does not affect the following:

• Explicit requests for logging by an application using the <log> tag or the session.logEvent() function

• Logging of warnings or errors

• Full call recording

• "call start" event

• An initial ASR Session event

• "exit" and "call end" events

• Call detail record (LOG_CALLS record)

6.4.2 Record of Suppression

An event with the event ID “note” is logged when log suppression is applied and again when logging is re-enabled. Both events can be viewed in the Customer Usage report Call Event log and are controlled by the callevents.browser configuration parameter. The “note” event is enabled by default.

Figure 1 Notification of log suppression

6.4.3 Logging Masked Data

Standard programming techniques can be used by application developers to mask logging of sensitive data. The following VoiceXML snippet shows the collection of a 16-digit credit card number with the logging of only the last four digits; e.g. a credit card number might be logged as "XXXXXXXXXXXX1234":

<property name="com.holly.suppresslogs" value="true"/>
<form>
  <field name="credit_card" type="digits?length=16">
    <prompt>
      Please enter your credit card number followed by the pound sign.
    </prompt>
    <filled>
      <!-- substring(12) keeps only the last four of the 16 digits -->
      <log>auth_number: <value expr="'XXXXXXXXXXXX' + credit_card.substring(12)"/></log>
    </filled>
  </field>
</form>

6.5 Raising Alarms

The Holly Voice Platform generates alarms that may be monitored by the system administrator. These

alarms, which are described in detail in the Operations Manual and Reference Manual, can result in

SNMP traps, syslog events, file logging or email messages according to the configuration of the platform.

The platform can raise alarms for a range of failure scenarios during VoiceXML application execution

(HTTP issues, VoiceXML parsing errors, missing documents and many more as documented in the

Reference Manual).

It is sometimes required that an application explicitly raise an alarm. This is implemented by a

subdialog call through the Holly VoiceXML Subdialog Server (HVSS) which is a Holly Voice Platform

component that may be activated by the system administrator. The following example code shows how

to raise and then clear an alarm through HVSS.

<form id="raiseAndClearAlarm">

<var name="hollyAlarmType" expr="'appServerConnectErrorAlarm'"/>

<var name="hollyAlarmDescription" expr="'testing HVSS alarm interface'"/>

<var name="hollyAlarmID"/>



<subdialog name="raise" src="http://localhost:8030/holly/raise"

namelist="hollyAlarmType hollyAlarmDescription">

<filled>

<assign name="hollyAlarmID" expr="raise.alarmID"/>

</filled>

</subdialog>

<block>

alarm id is <value expr="hollyAlarmID"/>

</block>



<subdialog name="clear" src="http://localhost:8030/holly/clear"

namelist="hollyAlarmType hollyAlarmID">

<filled>

<assign name="hollyAlarmID" expr="''"/>

</filled>

</subdialog>

</form>

The alarm type must be one of the alarm types defined in the Reference Manual. The description string

can be customized to pass meaningful information to the system monitor.

The variable names passed in the namelist must be exactly as provided in the example above.

The alarm ID is created by the Holly Foreman. The ID returned from the ‘raise’ subdialog must be

passed to the ‘clear’ subdialog to clear the correct alarm.

Developers should coordinate with the platform administrator so that application-raised alarms are

appropriately monitored in platform operations.
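Because the HVSS is an optional component, an application may want to continue when the subdialog cannot be fetched. The sketch below relies only on the standard VoiceXML error.badfetch event; the form id, description string and log text are illustrative assumptions.

```xml
<form id="raiseAlarmSafely">
  <var name="hollyAlarmType" expr="'appServerConnectErrorAlarm'"/>
  <var name="hollyAlarmDescription" expr="'payment gateway unreachable'"/>
  <var name="hollyAlarmID"/>

  <subdialog name="raise" src="http://localhost:8030/holly/raise"
             namelist="hollyAlarmType hollyAlarmDescription">
    <filled>
      <assign name="hollyAlarmID" expr="raise.alarmID"/>
    </filled>
    <!-- A failed fetch of the subdialog document throws error.badfetch. -->
    <catch event="error.badfetch">
      <log label="alarm">HVSS not reachable, alarm not raised</log>
      <!-- Give the item a value so the form does not retry it. -->
      <assign name="raise" expr="false"/>
    </catch>
  </subdialog>
</form>
```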

A. Appendix: Application Parameters

This section covers the following categories of application parameters:

• VoiceXML

• Speech Recognition

• DTMF

• Text to Speech

• Logging

• Telephony.

A.1 VoiceXML

Key: com.holly.audiobadfetch
Permitted values: true, false
Default value: false
Description: When set to “true” the interpreter will throw an error.badfetch event if an audio file fetch fails. Refer to the section on Audio Fetch Failures. Note that the behavior of the VoiceXML interpreter with this parameter set to “true” violates the W3C VoiceXML 2.0 Recommendation. This parameter can also be set by an administrator on the HMS Applications page.

Key: com.holly.audiofetchalarm
Permitted values: true, false
Default value: false
Description: This property enables SNMP and email alarming for missing prompts; it can be turned on or off as required. Refer to the section on Audio Fetch Alarms. Note: In the case of TTS fallback, if an audio fetch fails and there is no TTS then an alarm of severity = WARNING is raised.

Key: com.holly.dtmfbufferclear
Permitted values: true
Description: This property allows a developer to manually clear buffered digits. It is processed on recognize and record start.

Key: com.holly.xmlspace
Permitted values: normalize, ignore
Default value: ignore
Description: This parameter affects how whitespace is handled in <prompt> and <log> elements. If the value is “normalize”, meaningful whitespace in prompts is preserved; if the value is “ignore”, some meaningful whitespace may be lost. Changing this value to “normalize” will impact platform performance. This parameter can also be set by an administrator on the HMS Applications page.

Key: audiomaxage, documentmaxage, grammarmaxage, scriptmaxage
Permitted values: [integer]
Description: Set these properties to 0 to disable caching. Refer to the section on Caching.

Key: singlecookieheader
Permitted values: true, false
Default value: false
Description: If set, and there is more than one cookie for an HTTP request, the browser sends the cookies folded into a single HTTP Cookie header, as described in RFC 2965, section 3.3.4. To disable this parameter, delete it from the list.
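The sketch below shows how some of these keys might be set from within a document. The maxage properties are standard VoiceXML properties; using com.holly.audiobadfetch as a <property> is an assumption based on how com.holly.suppresslogs is used elsewhere in this guide, and the audio file name is hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <!-- Document scope: disable caching of audio and grammars. -->
  <property name="audiomaxage" value="0"/>
  <property name="grammarmaxage" value="0"/>

  <form id="main">
    <!-- Assumption: accepted as a <property>; requests error.badfetch
         when an audio fetch fails. -->
    <property name="com.holly.audiobadfetch" value="true"/>
    <catch event="error.badfetch">
      <log label="audio">audio fetch failed</log>
    </catch>
    <block>
      <audio src="welcome.wav">Welcome.</audio>
    </block>
  </form>
</vxml>
```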



A.2 Speech Recognition


Key: asrengine
Permitted values: [string]
Description: Use this property to switch between ASR engines within a single VoiceXML document. Refer to the ASR section for a list of possible values.

Key: com.holly.collapsesingleslot
Permitted values: true, false
Default value: false
Description: Some older voice browsers collapse a structured recognition result to a simple string value if the structure contained a single element. Setting this property to “true” will cause the Holly Voice Browser to do this. Note that the behavior of the Holly VoiceXML interpreter with this property set to “true” violates the W3C VoiceXML 2.0 Recommendation.

Key: com.holly.distincttimeout
Permitted values: true, false
Default value: false
Description: For recognizers that enable the platform to distinguish the two timeouts completetimeout and incompletetimeout, the ‘com.holly.distincttimeout’ property can be set to “true” to permit the timeouts to be treated differently.

Key: com.holly.grammarfetchstyle
Permitted values: default, absolute, relative
Default value: default
Description: By default the Holly Voice Browser fetches all grammars referenced by URI in VoiceXML documents. This behavior is often not desirable and can be changed using this property.

Key: com.holly.grammarlabel
Permitted values: [string]
Description: This property is useful only if the Nuance 8.5 plug-in recognizer is being used. The supplied string is passed to the ASR, and included in the Nuance logs. This can be useful for grammar tuning.

HMS Applications page. It is expected that this property will

be set in VoiceXML documents rather than as an application

parameter in the Holly Management System.

A.3 DTMF


Key: com.holly.fetchaudiodtmf
Permitted values: true, false
Default value: false
Description: If set to “true”, the DTMF buffer will be cleared when the Holly Voice Browser plays fetch audio.

Key: interdigittimeout
Permitted values: [integer]
Description: Use interdigittimeout to control DTMF recognition timing.

Key: termtimeout
Permitted values: [integer]
Default value: 0
Description: For use with the Nuance 8.5 ASR engine an application parameter should be set to disable the termtimeout (i.e. “termtimeout=0”).



A.4 Text to Speech


Key: ttsengine
Description: Use this property to switch between TTS engines within a single VoiceXML document. Refer to the TTS and Prompting section for a list of possible values. Note: it is not possible to have prompts in the same queue using a different TTS setting. A switch will only take place when the queue is flushed (usually by performing recognition).

Key: ttsvoice
Description: This property is used to set a specific TTS voice for an application. The available values for this parameter are dependent on the TTS voices installed on the platform.



A.5 Logging


Key: com.holly.suppresslogs
Permitted values: true, false
Default value: false
Description: If this property is set to “true”, the diagnostic logging in the Holly Voice Browser for the channel running the application will be turned off; so will the diagnostic logging for the Holly Voice Gateway for the duration of any recognitions in the scope of the VoiceXML property.

Key: recordutterance
Permitted values: true, false
Default value: 0
Description: This property uses ‘sr.recordutterances’ to implement the recording of utterances in a scoped manner.

A.6 Telephony


Key: com.holly.transferclid
Permitted values: [string]
Description: The supplied string is used as the user part of the SIP From header in the transfer INVITE (for bridge transfer) or REFER (for blind transfer).
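A sketch of a bridge transfer that sets the calling line identity follows. It assumes that the key is accepted as a <property> in the scope of the <transfer>; both telephone numbers are made up for the example.

```xml
<form id="transferExample">
  <transfer name="call" dest="tel:+61299990000" bridge="true">
    <!-- Assumption: read from the enclosing scope when the
         transfer starts. The number is hypothetical. -->
    <property name="com.holly.transferclid" value="0299991111"/>
    <prompt>Please hold while we connect you.</prompt>
    <filled>
      <log label="transfer">result: <value expr="call"/></log>
    </filled>
  </transfer>
</form>
```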



B. Appendix: Re-Recognition from Recorded Utterance

In normal VoiceXML execution speech recognition and DTMF input processing are performed using live

input from the caller. Specifically, the application declares a set of prompts and grammars and then

reaches a “wait state” at which point the application suspends while the prompts are played and input

from the caller is matched against the grammars.

With re-recognition from an utterance this normal behaviour is modified so that a previously recorded

input from the caller, which has been stored as a wavefile, is used during the wait state as input to the

speech/DTMF recognition process (instead of live audio).

The following is an abstraction of the typical use case for re-recognition capability:

1. Prompt the caller to say some information (e.g. their name)

2. Keep a recording of the caller’s response to the prompt (e.g. a recording of the name)

3. Optionally, attempt to recognize this input using a broad grammar (e.g. a list of 100,000 common

names)

4. Interact further with the caller to gather information that would help in determining what the caller said at step 1 (e.g. zipcode/postcode, suburb name, street name)

5. Create a targeted grammar using the further information (e.g. list of names in a specified postcode

+ suburb + address based on a postal database)

6. Perform a re-recognition using the utterance recorded in step 2 and the grammar created in step 5.

The key features of Holly’s implementation of re-recognition are:

• Standard recognition: All the standard VoiceXML capabilities for speech recognition are available

for re-recognition including parallel grammars, configuration properties, form filling, N-best and

confidence scores. Except for setting the re-recognition property (using a standard VoiceXML tag),

re-recognition is like any recognition ensuring familiarity to developers and enabling the use of

standard development tools and frameworks.

• Re-recognition of speech input: Re-recognition supports speech playback only – DTMF input is not

currently supported.

• Supports any MRCP speech recognizer: Re-recognition supports all MRCP ASR integrations including

Nuance (multiple products), IBM, Loquendo, LumenVox, Siemens and Telisma. (The Holly DTMF

Recognizer and vLingo do not support re-recognition.)

• Real-time or optimised playback: All MRCP integrations will support real-time playback of recorded audio (i.e. re-recognition at normal speed). Where an MRCP ASR product supports faster-than-real-time playback, Holly will stream the audio faster so that the re-recognition delay is reduced for faster response to a caller.

• VoiceXML Extension: This capability is an extension because the VoiceXML standard defines no

native means of implementing the capability. Applications that use the capability are not

portable to other platforms.

B.1 Re-recognition in VoiceXML Applications

This section documents the way in which VoiceXML applications are written to use re-recognition. Since

re-recognition is nearly identical to normal speech recognition the model will be familiar to VoiceXML

developers.

B.1.1 Using Re-Recognition

The only required difference between a normal recognition and a re-recognition is the declaration of

the variable name for the recorded utterance to be used in re-recognition:

<property name="com.holly.rerecognition" value="nextUtterance"/>

This declaration must be placed in the scope of the re-recognition, typically the <field> to which re-recognition applies.

The value is the name of an ECMAScript variable that contains a recording. This variable must be

assigned the value of a previous recording that is either:

• a <record> item variable (see VoiceXML 2.0 Section 2.3.6), or

• application.lastresult$.recording (see VoiceXML 2.1 Section 7).

Table 3 presents a sample re-recognition application.

1. <?xml version="1.0" encoding="UTF-8"?>
2.
3. <vxml xmlns="http://www.w3.org/2001/vxml"
4.       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
5.       xsi:schemaLocation="http://www.w3.org/2001/vxml
6.       http://www.w3.org/TR/voicexml20/vxml.xsd"
7.       xml:lang="en-US"
8.       version="2.1">
9.
10.   <form id="form">
11.     <var name="nextUtterance"/>
12.     <field name="yesno" type="boolean">
13.
14.
15.       <property name="recordutterance" value="true"/>
16.       Say yes or no
17.       <filled>
18.         <assign name="nextUtterance"
19.                 expr="application.lastresult$.recording"/>
20.       </filled>
21.     </field>
22.
23.     <field name="yesno2" modal="true">
24.
25.
26.       <property name="com.holly.rerecognition"
27.                 value="nextUtterance"/>
28.       <grammar mode="voice" version="1.0" root="YN">
29.         <rule id="YN" scope="public">
30.           <one-of>
31.             <item> yes </item>
32.             <item> no </item>
33.           </one-of>
34.         </rule>
35.       </grammar>
36.       <prompt bargein="false">Let me check that</prompt>
37.     </field>
38.
39.     <block>
40.       first recognition was <value expr="yesno"/>
41.       re-recognition was <value expr="yesno2"/>
42.     </block>
43.   </form>
44. </vxml>

Table 3: Example of re-recognition

Commentary on the example:

• Line 8: declare version=“2.1” on the <vxml> tag to enable the recording during recognition feature

of VoiceXML 2.1.

• Lines 12-21: this is a standard VoiceXML recognition <field>. The recordutterance property is set to true in line 15 so that the application can take a copy of application.lastresult$.recording in lines 18-19.

• Lines 23-37: this <field> is nearly identical except for (a) the declaration of modal, which will often be used in re-recognition to disable global grammars, (b) the use of an inline grammar, which illustrates the use of a different grammar for re-recognition, and (c) the declaration of com.holly.rerecognition in lines 26-27, which requests that Holly use the recording stored in the “nextUtterance” variable rather than live input.

• Line 36: the prompt played prior to re-recognition is set with barge-in off to ensure that the complete prompt is played to the caller without interruption.

• Lines 39-42: the processing of normal recognition and re-recognition results is identical.

B.1.2 <nomatch>, <noinput>, disconnect and other re-recognition outcomes

The VoiceXML application should handle all possible outcomes from re-recognition.

• <nomatch>: the re-recognition did not successfully match any active grammar. The application

may wish to try re-recognition against a different grammar or request that the caller provide the

input again.

• <noinput>: the re-recognition did not detect spoken input in the recorded utterance. No-input

will not normally occur if the original collection of the recording checked for noinput and speech

mode. Nevertheless, applications should handle this case because, for example, the recording

may be quiet and not trigger speech detection on the re-recognition. The application developer

may find that adjusting the sensitivity setting or changing the timeout settings will affect the

detection of speech.

• Hang-up: the caller may hang up whilst a re-recognition is in progress. A connection.disconnect event will be thrown as in normal VoiceXML execution.

• maxspeechtimeout: the maxspeechtimeout event will be thrown if the recording exceeds the

duration configured in the current scope.

• <help>, <cancel>, <exit>: if the universal grammars are active then these events will be thrown

if the universal grammar is matched. Deactivating universals or using a modal field will prevent

this from occurring.

• Grammar exceptions: an exception will be thrown for any of the normal speech recognition errors

such as illegal grammars and unavailable languages.
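The outcomes listed above can be handled with ordinary VoiceXML handlers. The sketch below falls back to live input when re-recognition does not produce a result; the form and variable names, the prompts and the fallback strategy itself are illustrative assumptions rather than a prescribed pattern.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="confirm">
    <var name="nextUtterance"/>
    <var name="askLive" expr="false"/>

    <!-- Live recognition that keeps a copy of the utterance. -->
    <field name="yesno" type="boolean">
      <property name="recordutterance" value="true"/>
      <prompt>Say yes or no</prompt>
      <filled>
        <assign name="nextUtterance"
                expr="application.lastresult$.recording"/>
      </filled>
    </field>

    <!-- Re-recognition of the stored utterance. -->
    <field name="yesno2" type="boolean" modal="true">
      <property name="com.holly.rerecognition" value="nextUtterance"/>
      <prompt bargein="false">Let me check that</prompt>
      <!-- No usable result: skip this field and ask the caller again. -->
      <catch event="nomatch noinput maxspeechtimeout error">
        <log label="rerec">outcome: <value expr="_event"/></log>
        <assign name="askLive" expr="true"/>
        <assign name="yesno2" expr="'none'"/>
      </catch>
    </field>

    <!-- Visited only when the re-recognition failed. -->
    <field name="yesnoLive" type="boolean" cond="askLive">
      <prompt>Sorry, please say yes or no again.</prompt>
    </field>

    <catch event="connection.disconnect.hangup">
      <log label="rerec">caller hung up</log>
      <exit/>
    </catch>
  </form>
</vxml>
```

Assigning a value to yesno2 inside the handler stops the form from selecting the field again, which would otherwise repeat the same re-recognition with the same result.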

B.1.3 Re-recognition Variable Values & Scoping

The example above uses a form-scope variable to pass the utterance to re-recognition. The following

are the full requirements for the ECMAScript variable name passed as the value for the

“com.holly.rerecognition” property.

1. The value must be a legal ECMAScript variable identifier;

2. The variable (e.g. “nextUtterance”) must be accessible from the scope in which the VoiceXML

application enters the wait state to perform re-recognition;

3. The variable may be explicitly scoped (e.g. “application.nextUtterance”, “myform.varName”);

4. The variable must be a reference to a previous recording collected by either a <record> item

variable or from application.lastresult$.recording.

If the ECMAScript variable name does not meet these conditions then a VoiceXML error is thrown. The

following are error conditions that developers should avoid.

• The property value is not a legal ECMAScript variable name (e.g. “9”);

• The variable is undefined;

• The variable is a Number, String or any other non-waveform reference.
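As a sketch of requirement 4, the fragment below passes a <record> item variable to re-recognition and narrows the grammar with information collected in between. The grammar URL, field names and prompts are hypothetical, and the srcexpr attribute requires a VoiceXML 2.1 document.

```xml
<form id="nameCapture">
  <!-- The <record> item variable holds the caller's recording. -->
  <record name="spokenName" beep="true" maxtime="5s" finalsilence="2s">
    <prompt>After the tone, please say your name.</prompt>
  </record>

  <field name="postcode" type="digits?length=4">
    <prompt>Please enter your postcode.</prompt>
  </field>

  <field name="callerName" modal="true">
    <!-- The property value names a variable that refers to a recording. -->
    <property name="com.holly.rerecognition" value="spokenName"/>
    <!-- Hypothetical service returning a grammar of names for the postcode. -->
    <grammar srcexpr="'names.grxml?postcode=' + postcode"
             type="application/srgs+xml"/>
    <prompt bargein="false">Thank you, one moment.</prompt>
  </field>
</form>
```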

B.1.4 Form Filling

The normal form-filling behaviours of VoiceXML apply to re-recognition including the mapping of

semantic results to VoiceXML forms.

B.1.5 Grammar Scope

The normal VoiceXML grammar scoping rules apply to re-recognition.

Both external and inline grammars may be used.

It is expected that <field> grammars in modal fields will be the most common usage of re-recognition.

However, form grammars may be declared as well as mixed-initiative, <link> and <menu> grammars.

B.1.6 Grammar Modes

The re-recognition capability currently enables playback of audio input only and does not play back DTMF input. DTMF grammars, if declared, will be loaded and enabled as normal but will not be matched by the playback of the recorded waveform.

B.1.7 Utterance Recording

The application.lastresult$.recording variable is filled for a re-recognition as in a normal recognition

(see VoiceXML 2.1 Section 7). As is normal, the recordutterance property must be set to true for the recording to be provided.

The utterance recording will be similar or identical to the recording provided as input for re-

recognition. There may be slight variation due to different end-pointing of the speech input.

B.1.8 Result Processing

The “application.lastresult$” array is filled following the normal VoiceXML behaviour including

completion of the n-best results, confidence score, input mode and interpretation.

B.2 Prompts and Barge-in

Recommendation: bargein=false

An application may play one or many prompts prior to re-recognition. Applications will normally set

barge-in off (i.e. false) for prompts prior to re-recognition so that they are played in their entirety.

This is because (a) barge-in for re-recognition is typically much sooner than for normal spoken input

and (b) unlike a normal recognition the caller cannot choose whether to wait for the prompt to

complete before input is provided.

The sample presented in Table 3 in Section B.1.1 sets barge-in off to ensure playback of a prompt.

The period after the end-of-prompt until the availability of the re-recognition result is typically brief

because the original recorded audio will have the leading and trailing silence removed. Furthermore,

most MRCP recognizers (currently all but Nuance 9) allow faster-than-realtime presentation of

recorded audio for re-recognition.

Use bargeintype=speech (default)

It is recommended that applications leave the bargeintype as “speech”. This ensures return to the

application immediately following re-recognition irrespective of whether there is a match, nomatch or

noinput. The use of the alternate “hotword” mode is not recommended because, in the event of a nomatch, hotword recognition would continue to retry, which is not sensible with re-recognition.

B.3 ASR Configuration

The standard speech recognition properties defined by VoiceXML 2.0 apply to re-recognition (see VoiceXML 2.0 sections 6.3.2 and 6.3.6). This includes confidencelevel, sensitivity, speedvsaccuracy (not

supported by all ASR engines), completetimeout, incompletetimeout, maxspeechtimeout and maxnbest.

Holly’s “asrengine” property and engine-specific configurations (e.g. “swirec” and “swiep” for Nuance

9) are supported as usual.
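For example, a re-recognition field might tune the recognizer as sketched below. The property names are standard VoiceXML 2.0 properties; the values are arbitrary illustrations, and “nextUtterance” is assumed to have been assigned as in Table 3.

```xml
<field name="yesno2" type="boolean" modal="true">
  <property name="com.holly.rerecognition" value="nextUtterance"/>
  <!-- Standard recognizer properties, scoped to this field only. -->
  <property name="confidencelevel" value="0.4"/>
  <property name="sensitivity" value="0.7"/>
  <property name="maxnbest" value="3"/>
  <property name="maxspeechtimeout" value="10s"/>
  <prompt bargein="false">Let me check that</prompt>
</field>
```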

C. Appendix: Holly DTMF Recognizer v2

Holly’s DTMF recognizer recognizes DTMF input in a caller’s audio stream and enables users to specify

grammars using the standard XML form of the Speech Recognition Grammar Specification (SRGS), as

specified by W3C and VoiceXML 2.0. Many applications already have grammars for DTMF specified using SRGS+XML for Nuance or OSR; these are supported with minimal alteration.

The Holly DTMF Recognizer v2 is part of a shared-object plug-in to the Holly Voice Gateway (HVG). It

can be configured to be available or unavailable per HVG instance. Recognition takes place within the

plug-in on the same machine as the HVG; it does not act as a proxy for a remote recognizer as the

Nuance plug-ins do.

The Holly DTMF Recognizer v2 allows multiple grammars to be activated simultaneously. Grammars

may be activated by URI in order to support the preferred absolute mode of grammar fetching, or

may be supplied as an inline VoiceXML grammar.

The Holly DTMF Recognizer is enabled by setting the parameter ‘asrengine’ to “dtmf”. This property

can be set via the following methods:

• Through the Holly Management System’s Applications page on a per-application basis; or

• As an explicit <property> element within the code of the VoiceXML application.

The Holly DTMF Recognizer v2 supports the VXML <record> element.

The Holly DTMF Recognizer v2 implements the “literals” syntax of the SISR 1.0. Support for the

“semantics/1.0” syntax is planned for a future release.
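A minimal sketch of the second method follows: the recognizer is selected for a single field and supplied with an inline grammar. The form, field name and prompt are illustrative assumptions.

```xml
<form id="dtmfMenu">
  <field name="choice">
    <!-- Select the Holly DTMF Recognizer for this field only. -->
    <property name="asrengine" value="dtmf"/>
    <property name="inputmodes" value="dtmf"/>
    <!-- mode must be set explicitly because the default is "voice". -->
    <grammar mode="dtmf" version="1.0" root="menu">
      <rule id="menu" scope="public">
        <one-of>
          <item>1</item>
          <item>2</item>
        </one-of>
      </rule>
    </grammar>
    <prompt>For sales press 1. For support press 2.</prompt>
    <filled>
      <log label="menu">caller pressed <value expr="choice"/></log>
    </filled>
  </field>
</form>
```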

C.1 SRGS+XML

The Holly DTMF Recognizer v2 is a conforming XML form grammar processor, as specified in SRGS

section 5.4, except that it is not required to support references to rules defined in external grammars.

In particular, the recognizer:

• Parses and processes all XML and XML Namespaces constructs.

• Ignores xml:lang attributes in grammar documents because they are not relevant to DTMF.

• Logs DTMF results in the HMS as with any speech recognition result.

• Ignores grammars whose mode attribute is "voice". Note that the default value for the mode

attribute is "voice", so the Holly DTMF Recognizer v2 will only process grammar documents that

explicitly set the mode to "dtmf".

C.2 Sample grammars

• Basic Menu

• Boolean

• Digits

• Phone

C.2.1 Basic Menu

This sample grammar supports a simple menu that enables a caller to enter option 1, 2, 3, 4 or 0. The VoiceXML application assigns a behavior to each digit, for example “sales”, “directory” or “operator”. The value returned to the application is the DTMF key: “1”, “2”, “3”, “4” or “0”. If the caller enters any other DTMF key (e.g. 5 to 9, * or #), the result is a VoiceXML “nomatch” event.

<?xml version="1.0" encoding="iso-8859-1"?>
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         root="menu" mode="dtmf">
  <rule id="menu" scope="public">
    <one-of>
      <item>1</item>
      <item>2</item>
      <item>3</item>
      <item>4</item>
      <item>0</item>
    </one-of>
  </rule>
</grammar>
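A minimal sketch of how an application might apply behaviors to the returned digits is shown below. It assumes the grammar above is saved as menu.grxml alongside the VoiceXML document and that the DTMF recognizer has already been selected for the application; the mapping of keys to “sales”, “directory” and “operator” is illustrative only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="main_menu">
    <field name="option">
      <!-- Hypothetical file name for the Basic Menu grammar. -->
      <grammar src="menu.grxml" type="application/srgs+xml"/>
      <prompt>
        For sales, press 1. For the directory, press 2.
        For an operator, press 0.
      </prompt>
      <!-- Raised for any key the grammar does not accept, e.g. 5 or #. -->
      <nomatch>
        <prompt>Sorry, that is not a valid option.</prompt>
        <reprompt/>
      </nomatch>
      <filled>
        <!-- The field value is the DTMF key as a string. -->
        <if cond="option == '1'">
          <prompt>Transferring you to sales.</prompt>
        <elseif cond="option == '2'"/>
          <prompt>Opening the directory.</prompt>
        <elseif cond="option == '0'"/>
          <prompt>Connecting you to an operator.</prompt>
        <else/>
          <!-- Keys 3 and 4 are accepted by the grammar but unused here. -->
          <prompt>That option is not available yet.</prompt>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```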

C.2.2 Boolean

This grammar collects a Boolean input. It demonstrates the use of literal tags to return an application-

defined string rather than the DTMF sequence. If the caller enters “1” then the returned result is

“true”. Similarly, for entry of “2” the result is “false”.



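<?xml version="1.0" encoding="iso-8859-1"?>
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         root="boolean" mode="dtmf" tag-format="semantics/1.0-literals">
  <rule id="boolean" scope="public">
    <one-of>
      <item>1<tag>true</tag></item>
      <item>2<tag>false</tag></item>
    </one-of>
  </rule>
</grammar>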

Note: The sample Boolean grammar returns its values as strings (“true” or “false”), not as Boolean values.
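Because the result is a string, the application should compare it with a string literal rather than test it directly: in ECMAScript the non-empty string “false” evaluates as true. The sketch below illustrates this; the file name boolean.grxml, the form and field names, and the prompts are assumptions made for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="confirm">
    <field name="confirmed">
      <!-- Hypothetical file name for the Boolean grammar. -->
      <grammar src="boolean.grxml" type="application/srgs+xml"/>
      <prompt>Press 1 for yes or 2 for no.</prompt>
      <filled>
        <!-- Compare with the string 'true'; cond="confirmed" alone
             would also succeed for the string 'false'. -->
        <if cond="confirmed == 'true'">
          <prompt>Thank you, your request is confirmed.</prompt>
        <else/>
          <prompt>Your request has been cancelled.</prompt>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```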

C.2.3 Digits

This sample grammar supports a digit sequence of one or more digits, with no upper limit. For example, the following sequences are legal: “1234”, “123456789”. The return value is the entered DTMF sequence as shown (i.e. without any whitespace).

Since there is no limit on the number of digits, the recognizer keeps waiting for digits until either:

• A DTMF “termchar” is received. The termchar is a VoiceXML property that can be set by the application; its default value is “#”. The return value does not include the termchar.

• The DTMF interdigit timeout is reached (the default value is 3 seconds).



<?xml version="1.0" encoding="iso-8859-1"?>
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         root="digits" mode="dtmf">
  <rule id="digits" scope="public">
    <item repeat="1-">
      <one-of>
        <item>0</item>
        <item>1</item>
        <item>2</item>
        <item>3</item>
        <item>4</item>
        <item>5</item>
        <item>6</item>
        <item>7</item>
        <item>8</item>
        <item>9</item>
      </one-of>
    </item>
  </rule>
</grammar>
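The two conditions that end digit collection correspond to the standard VoiceXML properties termchar and interdigittimeout. The following sketch sets both explicitly on a field, using the default values quoted above; the file name digits.grxml and the form and field names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="collect_account">
    <field name="account">
      <!-- Key that ends input; it is not included in the result. -->
      <property name="termchar" value="#"/>
      <!-- Silence allowed between key presses before input ends. -->
      <property name="interdigittimeout" value="3s"/>
      <!-- Hypothetical file name for the Digits grammar. -->
      <grammar src="digits.grxml" type="application/srgs+xml"/>
      <prompt>Enter your account number, followed by the hash key.</prompt>
      <filled>
        <prompt>You entered <value expr="account"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

Setting the properties inside the field limits them to that field; an application that wants the same behavior everywhere could declare them at document level instead.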

C.2.4 Phone

This sample grammar re-uses the digits grammar above to recognize telephone numbers with an

optional extension. The “*” key is used to mark the extension.

There are no constraints on the length of the phone number. Modifying the repeat values allows the grammar to support specific national phone number patterns.



<?xml version="1.0" encoding="iso-8859-1"?>
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         root="phone" mode="dtmf" tag-format="semantics/1.0-literals">
  <rule id="phone" scope="public">
    <ruleref uri="#digits"/>
    <item repeat="0-1">
      dtmf-star
      <ruleref uri="#digits"/>
    </item>
  </rule>
  <rule id="digits">
    <item repeat="1-">
      <one-of>
        <item>0</item>
        <item>1</item>
        <item>2</item>
        <item>3</item>
        <item>4</item>
        <item>5</item>
        <item>6</item>
        <item>7</item>
        <item>8</item>
        <item>9</item>
      </one-of>
    </item>
  </rule>
</grammar>
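As a sketch of how the repeat values might be adjusted, the variant below accepts exactly ten digits, optionally followed by the extension marker and an extension of one to five digits. The lengths, the rule names phone10 and digit, and the assumption that the recognizer honors the fixed and ranged forms of the SRGS repeat attribute are illustrative; the actual values depend on the national numbering plan being supported.

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         root="phone10" mode="dtmf">
  <rule id="phone10" scope="public">
    <!-- Exactly ten digits for the main number (example length). -->
    <item repeat="10"><ruleref uri="#digit"/></item>
    <!-- Optional extension: marker key, then one to five digits. -->
    <item repeat="0-1">
      dtmf-star
      <item repeat="1-5"><ruleref uri="#digit"/></item>
    </item>
  </rule>
  <!-- A single digit; the repeat counts are applied by the caller. -->
  <rule id="digit">
    <one-of>
      <item>0</item>
      <item>1</item>
      <item>2</item>
      <item>3</item>
      <item>4</item>
      <item>5</item>
      <item>6</item>
      <item>7</item>
      <item>8</item>
      <item>9</item>
    </one-of>
  </rule>
</grammar>
```

Defining a single-digit rule and applying repeat at the point of reference lets the number and the extension have different length limits, which the shared digits rule in the sample above cannot express.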

*** End of Document ***