Global Information Systems Group, Department of Computer Science, ETH Zurich, Switzerland. Aural Interfaces to Databases based on VoiceXML. Beat Signer, Moira C. Norrie, Peter Geissbuehler and Daniel Heiniger.

Aural Interfaces to Databases based on VoiceXML

DESCRIPTION

Presentation given at VDB6, 6th IFIP Workshop on Visual Database Systems, Brisbane, Australia, May 2002.

ABSTRACT: As part of a general framework for the development of global information systems, we include support for the development of aural interfaces. The framework uses an object-oriented database for the management of application, document content and presentation data. The access layer is based around an XML server and XSLT for document generation from default and customised templates. Specifically, aural interfaces are supported through a VoiceXML server that provides the speech recognition and synthesis mechanisms, together with XSLT templates for the generation of VoiceXML. In this paper, we describe the implementation of a generic voice browser for application databases as well as the development of a customised aural interface for a community diary managing appointments and events.

Page 1: Aural Interfaces to Databases based on VoiceXML

Global Information Systems Group

Department of Computer Science

ETH Zurich, Switzerland

Aural Interfaces to Databases based on VoiceXML

Beat Signer, Moira C. Norrie, Peter Geissbuehler and Daniel Heiniger

Page 2: Aural Interfaces to Databases based on VoiceXML

Outline

Motivation

Architecture

Voice Interfaces

Application Development

Page 3: Aural Interfaces to Databases based on VoiceXML

Avalanche Forecasting System

Project to provide WAP and Voice Access

Page 4: Aural Interfaces to Databases based on VoiceXML

Avalanche Forecasting System ...

Information model (OM model) for SLF forecast data

Application user interfaces for WAP and voice access

national bulletin with maps and glossary

local bulletin based on a region's start letter, GPS or Swiss Coordinates

WAP responses for voice requests (mixed-mode) or triggered events

Page 5: Aural Interfaces to Databases based on VoiceXML

Requirements

Platform supporting universal client access to databases → eXtensible Information Management Architecture (XIMA)

Use of a technology which allows the separation of content and presentation → XML and XSL

Minimise effort to support new types of client devices, e.g. XML, HTML, WML, CHTML, VXML, ?

Page 6: Aural Interfaces to Databases based on VoiceXML

XIMA

Architecture diagram, from clients down to the database:

Clients: HTML Browser, WML Browser, VXML Browser

Main Entry Servlet: delegation to the HTML Servlet, WML Servlet and VXML Servlet

HTML Servlet, WML Servlet, VXML Servlet: XML + XSLT → Response

XML Server: builds XML based on JDOM

OMS Java API

OMS Java Workspace

OM Model: Collections, Associations, multiple inheritance and multiple instantiation
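
The delegation step in the diagram (one XML document, one stylesheet per client type) could be sketched roughly as follows. This is an illustration only, not the XIMA code: the class name `EntryDelegation`, the client-type keys and the three tiny inline stylesheets are hypothetical, the XML is reduced from the example on the next slide, and the JDK's built-in XSLT processor stands in for whatever processor the servlets actually use.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class EntryDelegation {
    // XML as an XML server might return it (reduced, hypothetical sample).
    static final String OMS_XML =
        "<oms><instance id=\"OM_4077\" type=\"person\">"
      + "<attribute name=\"name\"><string>Moira Norrie</string></attribute>"
      + "</instance></oms>";

    // Shared stylesheet frame; only the literal result elements differ per client.
    static final String HEAD =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"/oms/instance\">";
    static final String TAIL = "</xsl:template></xsl:stylesheet>";
    static final String NAME = "<xsl:value-of select=\"attribute[@name='name']/string\"/>";

    // One hypothetical stylesheet per client type; a new device means one more entry.
    static final Map<String, String> STYLESHEETS = Map.of(
        "html", HEAD + "<html><body><h1>" + NAME + "</h1></body></html>" + TAIL,
        "wml",  HEAD + "<wml><card><p>" + NAME + "</p></card></wml>" + TAIL,
        "vxml", HEAD + "<vxml version=\"2.0\"><form><block>" + NAME + "</block></form></vxml>" + TAIL);

    // Picks the stylesheet for the client type and applies it to the XML.
    static String respond(String clientType) {
        String xsl = STYLESHEETS.get(clientType);
        if (xsl == null) {
            throw new IllegalArgumentException("unsupported client: " + clientType);
        }
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(OMS_XML)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String client : new String[] {"html", "wml", "vxml"}) {
            System.out.println(client + ": " + respond(client));
        }
    }
}
```

In this sketch, supporting an additional client type only requires registering another stylesheet; the XML produced by the server stays unchanged.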

Page 7: Aural Interfaces to Databases based on VoiceXML

XML Response

valid?

XML Response

<?xml version="1.0" encoding="ISO-8859-1"?>
<oms>
  <instance id="OM_4077" last="true" pos="1" type="person">
    <dressedWith type="person"/>
    <attribute name="name">
      <string>Moira Norrie</string>
    </attribute>
    <attribute name="picture">
      <mime>/globis/staff/moira.jpg</mime>
    </attribute>
    <method name="age"/>
    <link idref="OM_2693" inv="false" name="Workplace"/>
  </instance>
</oms>

XML Schema

<xsd:element name="oms">
  <xsd:complexType>
    <xsd:choice minOccurs="0" maxOccurs="unbounded">
      <xsd:element name="workspace" type="workspaceType"/>
      <xsd:element name="instance" type="instanceType"/>
      <xsd:element name="collection" type="collectionType"/>
      <xsd:element name="association" type="associationType"/>
      <xsd:element name="result" type="resultType"/>
      <xsd:element ref="warning"/>
    </xsd:choice>
  </xsd:complexType>
</xsd:element>

<xsd:complexType name="instanceType">
  <xsd:sequence>
    <xsd:element name="dressedWith" type="dressedWithType" …>
    <xsd:element name="link" type="linkType" minOccurs="0" …>
  </xsd:sequence>
  <xsd:attribute name="id" type="xsd:string" use="required"/>
</xsd:complexType>
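
The "valid?" check between the XML response and the XML Schema could be run with the JDK's own validation API, roughly as sketched below. The schema in the sketch is a reduced, hypothetical stand-in (only `instance` with optional `dressedWith` children and a required `id`), because the slide elides most of the real type definitions; the class name `OmsValidation` is likewise an assumption.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class OmsValidation {
    // Reduced, hypothetical schema; the real one defines more element types.
    static final String XSD =
        "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xsd:element name=\"oms\"><xsd:complexType>"
      + "<xsd:choice minOccurs=\"0\" maxOccurs=\"unbounded\">"
      + "<xsd:element name=\"instance\" type=\"instanceType\"/>"
      + "</xsd:choice></xsd:complexType></xsd:element>"
      + "<xsd:complexType name=\"instanceType\"><xsd:sequence>"
      + "<xsd:element name=\"dressedWith\" minOccurs=\"0\" maxOccurs=\"unbounded\">"
      + "<xsd:complexType><xsd:attribute name=\"type\" type=\"xsd:string\"/></xsd:complexType>"
      + "</xsd:element></xsd:sequence>"
      + "<xsd:attribute name=\"id\" type=\"xsd:string\" use=\"required\"/>"
      + "<xsd:attribute name=\"type\" type=\"xsd:string\"/>"
      + "</xsd:complexType></xsd:schema>";

    // Compiles the schema; a failure here means the schema text itself is wrong.
    static Schema compile() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema could not be compiled", e);
        }
    }

    // Returns true if the document is well-formed and conforms to the schema.
    static boolean isValid(String xml) {
        try {
            compile().newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // malformed XML or a schema violation
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String ok = "<oms><instance id=\"OM_4077\" type=\"person\">"
                  + "<dressedWith type=\"person\"/></instance></oms>";
        String missingId = "<oms><instance type=\"person\"/></oms>";
        System.out.println("with id: " + isValid(ok));
        System.out.println("without id: " + isValid(missingId));
    }
}
```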

Page 8: Aural Interfaces to Databases based on VoiceXML

VoiceXML

Development: IBM WebSphere Voice Server SDK

Deployment: BeVocal Cafe Voice Portal

Processing pipeline: Voice Input → Speech → Speech Recogniser → Text → Language Analyser → Meaning → Application Server → Text → Speech Synthesiser → Speech → Voice Output

Speech Recogniser: converts voice input into text (uses the speech model)

Language Analyser: extracts meaning from text (uses the grammar)

Application Server: gets data (text) from the application database

Speech Synthesiser: generates speech output (uses the pronunciation rules)

Page 9: Aural Interfaces to Databases based on VoiceXML

VoiceXML ...

VoiceXML is an application of XML

Describes call flows and human-machine dialogues

Uses the advantages of web-based development and content delivery to build interactive voice response applications

Hello World Example

<?xml version="1.0" encoding="ISO-8859-1"?>
<vxml version="2.0">
  <form id="f1">
    <block>Hello World</block>
  </form>
</vxml>

Page 10: Aural Interfaces to Databases based on VoiceXML

XML to VXML Example

XML Response

<?xml version="1.0" encoding=… ?>
<oms>
  <instance id="OM_4077" type="person" …>
    <dressedWith type="person"/>
    <attribute name="name">
      <string>Moira Norrie</string>
    </attribute>
    <method name="age"/>
  </instance>
</oms>

XSLT Stylesheet

<xsl:template match="instance">
  <form id="instance_entry">
    <block>
      <xsl:choose>
        <xsl:when test="count(dressedWith)=1">
          Object
          <xsl:call-template name="removeUnderscore">
            <xsl:with-param name="label" select="@id"/>
          </xsl:call-template>
          is dressed with type
          <xsl:value-of select="./@type"/>
        </xsl:when>
        …
      </xsl:choose>
    </block>
  </form>
</xsl:template>

VXML Result

<?xml version="1.0" encoding="ISO-8859-1"?>
<vxml application="http://macbain/xima/omsmain_root.vxml" version="2.0">
  <form id="instance_entry"><block>
    Object 4077 is dressed with type person and is viewed as type person.
    <prompt>It contains 8 attributes, 5 links, and 1 method</prompt>
    <goto next="#instance_process"/></block></form>
  <form id="instance_process"><field name="Member_Choice">
    <prompt>Would you like to hear the attributes, the links or the methods or go back?</prompt>
    …
  </field></form>
</vxml>
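
The transformation from the XML response to VoiceXML can be tried out with the JDK's built-in XSLT processor. The following is a minimal sketch under several assumptions: the stylesheet is a cut-down version of the excerpt above, the body of `removeUnderscore` is a guess (it keeps the part of the id after the underscore, which matches "Object 4077" in the result), and the class name `InstanceToVxml` and the wrapping `/oms` template are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class InstanceToVxml {
    // Reduced XML response, following the example on the slide.
    static final String OMS_XML =
        "<oms><instance id=\"OM_4077\" type=\"person\">"
      + "<dressedWith type=\"person\"/>"
      + "<attribute name=\"name\"><string>Moira Norrie</string></attribute>"
      + "<method name=\"age\"/>"
      + "</instance></oms>";

    // Cut-down stylesheet; removeUnderscore is an assumed implementation.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"/oms\">"
      +   "<vxml version=\"2.0\"><xsl:apply-templates select=\"instance\"/></vxml>"
      + "</xsl:template>"
      + "<xsl:template match=\"instance\">"
      +   "<form id=\"instance_entry\"><block><xsl:choose>"
      +     "<xsl:when test=\"count(dressedWith)=1\">Object "
      +       "<xsl:call-template name=\"removeUnderscore\">"
      +         "<xsl:with-param name=\"label\" select=\"@id\"/>"
      +       "</xsl:call-template>"
      +       " is dressed with type <xsl:value-of select=\"./@type\"/>"
      +     "</xsl:when>"
      +   "</xsl:choose></block></form>"
      + "</xsl:template>"
      + "<xsl:template name=\"removeUnderscore\">"
      +   "<xsl:param name=\"label\"/>"
      +   "<xsl:value-of select=\"substring-after($label, '_')\"/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to an OMS XML document and returns the VoiceXML text.
    static String render(String omsXml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(omsXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render(OMS_XML));
    }
}
```

The remaining prompts of the VXML result (attribute, link and method counts, the follow-up form) would need further templates that the slide does not show, so they are left out here.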

Page 11: Aural Interfaces to Databases based on VoiceXML

Design Phase

Define the required functionality

User analysis

motivation, expertise

High level decisions

full-duplex (barge-in)

simple grammars (dynamic)

only synthesised speech (TTS)

Representation of base types

Information flow

Page 12: Aural Interfaces to Databases based on VoiceXML

Dialogue flow diagram with three branches: collections, associations, objects

Entry prompt: The database contains #Collections #Associations. Would you like to go to the collections, to the associations, directly to an object or back to the main menu?

Collections branch:

The database contains the following # collections. Choose a collection

Collection 'name' contains #M. Would you like to list the members or go back?

Collection 'name' contains the following # members. Choose one of the members

Associations branch:

The database contains the following # associations. Choose an association

Association 'name' contains #A. Would you like to list the members or go back?

Association 'name' contains the following # associations. Choose a 'domaintype' or a 'rangetype' or say back

Objects branch:

The database contains #Objects. Choose an object or say back

Object 'oID' is dressed with type 'type' and currently viewed as type 'type'. It contains #Attr, #Links, #Methods. Would you like to hear the attributes, the links or the methods, change the type or go back?

Attributes: The object contains the following # attributes

Links: You can choose among the following links. Choose a link or say back

Methods: You can choose among the following methods. Choose a method or say back. The result of the method is Result

Types: You can view the object as the following types. Choose one of the types or say back
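
The navigation structure of this dialogue can be modelled as a small state machine. The sketch below is a simplification under stated assumptions: state names and command words are hypothetical labels, choosing a specific collection, association or object by name is omitted, and an unrecognised utterance simply keeps the current state so the prompt would be repeated.

```java
import java.util.Locale;
import java.util.Map;

public class DialogueFlow {
    // state -> (spoken command -> next state); all labels are illustrative.
    static final Map<String, Map<String, String>> FLOW = Map.of(
        "database", Map.of(
            "collections", "collections",
            "associations", "associations",
            "object", "object"),
        "collections", Map.of("back", "database"),
        "associations", Map.of("back", "database"),
        "object", Map.of(
            "attributes", "attributes",
            "links", "links",
            "methods", "methods",
            "change type", "types",
            "back", "database"),
        "attributes", Map.of("back", "object"),
        "links", Map.of("back", "object"),
        "methods", Map.of("back", "object"),
        "types", Map.of("back", "object"));

    // Returns the next state, or the current one if the command is not recognised.
    static String step(String state, String utterance) {
        String command = utterance.trim().toLowerCase(Locale.ROOT);
        Map<String, String> transitions = FLOW.getOrDefault(state, Map.of());
        return transitions.getOrDefault(command, state);
    }

    public static void main(String[] args) {
        String state = "database";
        for (String said : new String[] {"object", "methods", "back", "change type"}) {
            state = step(state, said);
            System.out.println("heard '" + said + "' -> " + state);
        }
    }
}
```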

Page 13: Aural Interfaces to Databases based on VoiceXML

Test and Refinement Phase

Recognition problems

elimination of similar-sounding words from the grammar

addition of optional words to the grammar (e.g. "please")

Insufficient help functionality

introduction of prompt-specific help instead of an always-active command list

Immediate feedback after input has been processed ("OK" prompt)
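
A simple dynamic grammar with optional filler words could be generated from the current list of choices, for example as below. The slides do not say which grammar format was used, so the JSGF notation here is an assumption, as are the class name `ChoiceGrammar` and the rule name; in JSGF, square brackets mark an optional word and `|` separates alternatives.

```java
import java.util.List;

public class ChoiceGrammar {
    // Builds a JSGF grammar accepting one of the choices, with an optional "please"
    // before or after it. The choices would typically come from the database.
    static String build(String ruleName, List<String> choices) {
        if (choices.isEmpty()) {
            throw new IllegalArgumentException("at least one choice is required");
        }
        return "#JSGF V1.0;\n"
             + "grammar " + ruleName + ";\n"
             + "public <" + ruleName + "> = [please] ("
             + String.join(" | ", choices)
             + ") [please];";
    }

    public static void main(String[] args) {
        System.out.println(build("member_choice",
            List.of("attributes", "links", "methods", "back")));
    }
}
```

Because the alternatives are passed in as a list, similar-sounding entries can be filtered out before the grammar is built, which is where the recognition fixes listed above would apply.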

Page 14: Aural Interfaces to Databases based on VoiceXML

OMS Database Development Suite

OM: Semantic Object Data Model (Application Modelling)

OMS Pro: Rapid Prototyping System and Lightweight DBMS (Database and Application Design)

OMS Java: Data Management System and Application Framework (Implementation)

Page 15: Aural Interfaces to Databases based on VoiceXML

XIMA Application Development

Prototype the application's information model in the prototyping system OMS Pro

Export model (and data) to OMS Java

Installation of XML Server with default XSLT stylesheets and servlets

database immediately accessible by generic object browser

Customisation of stylesheets

Page 16: Aural Interfaces to Databases based on VoiceXML

Conclusions

Database-driven development of voice-enabled applications

Rapid prototyping supported by OMS Pro and XIMA's generic object browser

Multi-mode access provided by generic object browser (HTML, WAP, VXML)

Customised user interfaces (stepwise refinement of XSLT stylesheets)

New potential user communities

Page 17: Aural Interfaces to Databases based on VoiceXML

Questions?