Speech Recognition Grammars

  • 7/29/2019 Speech Recognition Grammars

    1/35

    VoiceXML:

    Speech Recognition Grammars

    Acknowledgements

    Prof. McTear, Natural Language Processing, http://www.infj.ulst.ac.uk/nlp/index.html, University of Ulster.

    BeVocal documentation

    Overview

    Types of grammar

    Grammar design and use

    Optional items in a grammar

    Semantic tags

    DTMF grammars

    Grammar rules

    Built-in grammars

    Grammar scope

    What is a grammar

    A grammar defines the words and patterns of words that a user can say at any particular point in a dialogue

    Uses:

    speech recognition: to constrain the speech recognition process by specifying permissible sequences of words

    language understanding: to determine the structure and/or meaning of a sequence of words, e.g. "Transfer one hundred dollars from my checking to my savings account"

    might be parsed and transformed into the structure:

    transfer

    savings

    checking

    100

    Types of grammar

    Finite-state and phrase structure grammars take the form of rules with a left-hand and a right-hand side

    e.g. noun_phrase -> determiner adjective noun

    flight ->

    used in language understanding and speech recognition

    N-gram (used in speech recognition)

    based on probabilities of word combinations, e.g. bigrams, trigrams

    Grammar in VoiceXML

    May be specified:

    Inline, i.e. embedded into a VoiceXML page

    External, i.e. stored as files on Web servers, etc.

    Grammar formats:

    XML, ABNF (Augmented BNF syntax), Java Speech Grammar Format (JSGF), GSL (Nuance's Grammar Specification Language)

    W3C specification embodies XML and ABNF

    IBM Voice Toolkit supports the XML and ABNF grammar formats

    BeVocal Café, Voxpilot and Tellme support the XML and GSL grammar formats

    For further details on the W3C Speech Recognition Grammar Specification, see http://www.w3.org/TR/speech-grammar/

    Inline and External Grammar Definitions

    An inline grammar is defined within the <grammar> element in a VoiceXML document.

    In an inline grammar, if the grammar consists of exactly 1 rule, that rule does not have to have a name.

    GSL grammars use special characters: wrap your inline grammar as a section of CDATA.

    An external grammar is defined in an external file and referenced in the VoiceXML document.

    In an external grammar document, all rules must be named.

    In an external GSL grammar file, the contents of that file should not be inside a CDATA section and should not contain a <grammar> element:

    ;GSL2.0
    ...grammar rule definitions...
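
    The two styles might be written as in the sketch below. The MIME type application/x-nuance-gsl, the file name courses.gsl, the rule name Course and the field names are illustrative assumptions rather than values from the slides; the exact type value depends on the platform.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="enquiry">
    <!-- Inline GSL: a single unnamed rule, wrapped in CDATA so that
         GSL's special characters are not parsed as XML markup. -->
    <field name="area">
      <prompt>Students, courses or reports?</prompt>
      <grammar type="application/x-nuance-gsl">
        <![CDATA[ [students courses reports] ]]>
      </grammar>
    </field>

    <!-- External GSL: the file contains named rules only. The fragment
         after # selects the rule to activate (hypothetical rule name). -->
    <field name="course">
      <prompt>Which course?</prompt>
      <grammar type="application/x-nuance-gsl" src="courses.gsl#Course"/>
    </field>
  </form>
</vxml>
```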

    <grammar> element

    Specifies a set of possible responses for a field

    If the number of possible responses is small, then a set of <option> elements can be used instead of a <grammar> element
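
    As a rough illustration of the <option> alternative, the field below lists three responses directly. The field name, prompt wording and values are assumptions chosen to match the choices used elsewhere in these slides.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="main">
    <field name="area">
      <prompt>Would you like students, courses or reports?</prompt>
      <!-- Each option contributes one phrase to the field's grammar;
           the value attribute is what the field variable receives. -->
      <option value="students">students</option>
      <option value="courses">courses</option>
      <option value="reports">reports</option>
      <filled>
        <prompt>You chose <value expr="area"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```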

    Grammar Design

    A grammar should cover all the ways that a user might say something

    1. Alternative choices within a category, e.g. studentname [john rosemary etc]

    2. Alternative words for the same concept, e.g. [comms communications]

    3. Alternative sentences that have the same meaning, e.g. [(student john scott taking databases) (databases john scott) (john scott taking the course databases)]

    Note: careful wording of prompts can constrain the user to saying what has been predicted by the grammar designer

    These examples use the GSL grammar format, which is more suitable than the XML format for the presentation of examples

    Grammars for words

    Simple words (or touch-tone strings): tokens

    GSL

    (student name)

    XML

    student name

    Alternative words

    GSL

    Choice [students courses reports]

    XML
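
    The XML counterpart of the GSL Choice rule could look like the sketch below, where <one-of> plays the role of the square brackets. The xml:lang value is an assumption.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="en-GB" root="Choice">
  <rule id="Choice">
    <!-- Exactly one of the alternatives must be spoken -->
    <one-of>
      <item>students</item>
      <item>courses</item>
      <item>reports</item>
    </one-of>
  </rule>
</grammar>
```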

    Making items optional

    GSL

    Name (?firstname lastname)

    XML
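
    In the XML format an item is made optional with repeat="0-1". The sketch below is one way to express the Name rule; the FirstName and LastName sub-rules and their word lists are hypothetical stand-ins for the firstname and lastname placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="en-GB" root="Name">
  <rule id="Name">
    <!-- Optional first name: zero or one occurrence -->
    <item repeat="0-1"><ruleref uri="#FirstName"/></item>
    <!-- Mandatory last name -->
    <ruleref uri="#LastName"/>
  </rule>

  <rule id="FirstName">
    <one-of>
      <item>john</item>
      <item>rosemary</item>
    </one-of>
  </rule>

  <rule id="LastName">
    <one-of>
      <item>scott</item>
    </one-of>
  </rule>
</grammar>
```

    With these word lists the grammar accepts both "scott" and "john scott".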

    Making items optional - 2

    ( [ news weather sports ] ?please )

    ( ?[ (i'd like) (tell me) ] ?the [ news weather sports ] ?please )

    Repeating items

    XML:

    repeat = "0-1" means the item is optional, i.e. zero or one time

    repeat = "n-" means the item is repeated n or more times, e.g. "0-" = zero or more times

    repeat = "m-n" means the item is repeated between m and n times (inclusive), e.g. "1-3" = between one and three times

    repeat = "n" means the item is repeated exactly n times

    GSL:

    +(item) - the item is repeated 1 or more times

    *(item) - the item is repeated 0 or more times

    ?(item) - the item is optional
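
    The repeat forms listed above can be combined in one XML grammar, as in this sketch. The rule names, the word "extension" and the digit counts are invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="en-GB" root="Extension">
  <rule id="Extension">
    <!-- "0-1": optional politeness word -->
    <item repeat="0-1">please</item>
    <item>extension</item>
    <!-- "1-3": between one and three digits, inclusive -->
    <item repeat="1-3"><ruleref uri="#Digit"/></item>
  </rule>

  <rule id="Pin">
    <!-- "4": exactly four digits -->
    <item repeat="4"><ruleref uri="#Digit"/></item>
  </rule>

  <rule id="AnyDigits">
    <!-- "1-": one or more digits, no upper limit -->
    <item repeat="1-"><ruleref uri="#Digit"/></item>
  </rule>

  <rule id="Digit">
    <one-of>
      <item>zero</item> <item>one</item> <item>two</item>
      <item>three</item> <item>four</item> <item>five</item>
      <item>six</item> <item>seven</item> <item>eight</item>
      <item>nine</item>
    </one-of>
  </rule>
</grammar>
```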

    Grammar Slots (Tags)

    Grammar slots are used in grammars to return a value representing the meaning of the word(s) recognised, e.g. "checking account" and "checking" should return the same value.

    GSL:

    Grammar rules: sentences

    Grammars often consist of sub-grammars, e.g.

    ;GSL 2.0;

    ColoredObject:public (Color Object)

    Color [
      [red pink] { }
      [yellow canary] { }
      [green khaki] { }
    ]

    Object [
      [truck car] { }
      [ball block] { }
      [shirt blouse] { }
    ]

    "yellow shirt" or "canary blouse" => { color: yellow; object: clothing; }

    [Diagram: ColoredObject expands to Color followed by Object]

    Grammar with sub-rules

    Sub-grammars and rules are referenced in XML form using a rule reference. A rule reference can point to a local grammar, or an external grammar rule contained in another file or even on another server on the Internet.

    Design of a grammar consisting of sub-grammars requires considerable planning to ensure that all possible utterances are covered and also to avoid redundancies as well as repetitions in the grammar.

    It is often useful to map out the grammar diagrammatically or using a simple format such as GSL or ABNF before attempting to code the rules in XML format.
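
    A sketch of local and external rule references in the XML format, loosely modelled on the ColoredObject example. The reduced word lists, the RemoteColoredObject rule and the example.com URL are assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="en-GB" root="ColoredObject">
  <!-- Local references: #Name points at a rule in this same grammar -->
  <rule id="ColoredObject" scope="public">
    <ruleref uri="#Color"/>
    <ruleref uri="#Object"/>
  </rule>

  <rule id="Color">
    <one-of>
      <item>red</item>
      <item>yellow</item>
      <item>green</item>
    </one-of>
  </rule>

  <rule id="Object">
    <one-of>
      <item>truck</item>
      <item>ball</item>
      <item>shirt</item>
    </one-of>
  </rule>

  <!-- External reference: the Color rule is fetched from another
       grammar file (hypothetical URL); it must be public there -->
  <rule id="RemoteColoredObject">
    <ruleref uri="http://www.example.com/grammars/colors.grxml#Color"/>
    <ruleref uri="#Object"/>
  </rule>
</grammar>
```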

    Rule Scope - GSL

    Each defined rule has a scope of either private or public.

    A rule with public scope:

    is visible outside its grammar and can be referenced by name from other grammars

    can be activated for recognition (can serve as a top-level rule)

    A rule with private scope:

    is visible only within its containing grammar

    may be referenced only by other rules within the same grammar

    To mark a rule as public, the format is: RuleName:public ruleExpansion

    If no rules in the grammar are explicitly marked with :public, then all rules in the grammar are public.

    If any rule in the grammar is marked with :public, then all public rules must be so marked.

    The root rule in a GSL grammar is always the first public rule.

    For example, the following set of definitions creates one public rule named Snapper and two private rules named SnapperType and FishColors:

    SnapperType [mutton FishColors]
    FishColors [black gray red]
    Snapper:public (SnapperType snapper)

    Rule scope - XML

    By default, VoiceXML 2.0 grammar rules are private. This means that the rules can only be referenced within the same grammar file.

    To allow a grammar rule to be referenced from an external source, such as a VoiceXML document or another grammar, the rule needs to be scoped as public using the scope attribute.
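
    For comparison, the Snapper rules from the GSL slide could be written in XML roughly as follows, with only the top-level rule scoped as public. This is a sketch, and the xml:lang value is an assumption.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="en-US" root="Snapper">
  <!-- Public: may be referenced from a VoiceXML document or another grammar -->
  <rule id="Snapper" scope="public">
    <ruleref uri="#SnapperType"/>
    snapper
  </rule>

  <!-- Explicitly private: usable only inside this grammar -->
  <rule id="SnapperType" scope="private">
    <one-of>
      <item>mutton</item>
      <item><ruleref uri="#FishColors"/></item>
    </one-of>
  </rule>

  <!-- No scope attribute: private by default -->
  <rule id="FishColors">
    <one-of>
      <item>black</item>
      <item>gray</item>
      <item>red</item>
    </one-of>
  </rule>
</grammar>
```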

    Grammar Headers - GSL

    Inline

    External:

    ;GSL2.0
    ...grammar rule definitions...

    No definition of top-level rule

    Referencing an external grammar or a top-level rule in a grammar:

    Grammar Headers - XML

    Inline

    Grammar Scope

    Grammar elements can be included within any VoiceXML element that receives user input:

    field

    link: for transitions to other documents, e.g. operator.vxml

    menu: grammar implicitly specified by the <choice> element

    form: for mixed-initiative dialogues

    by default the scope of a grammar is limited to the element in which it is defined

    scope can be set using the scope attribute:

    grammars defined within forms or menus can be given document scope

    grammars defined in the root document have scope over the entire application
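
    A sketch of how these scoping options might appear in a single document. The grammar file main.grxml, the rule name Operator and the form and field names are hypothetical; operator.vxml is the target mentioned above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-GB">
  <!-- A link placed directly under vxml: its grammar is active in
       every dialog of this document -->
  <link next="operator.vxml">
    <grammar type="application/srgs+xml" version="1.0" root="Operator">
      <rule id="Operator">operator</rule>
    </grammar>
  </link>

  <!-- scope="document" keeps this form's own grammars active while
       other dialogs in the same document are running -->
  <form id="main" scope="document">
    <grammar src="main.grxml" type="application/srgs+xml"/>
    <field name="area">
      <prompt>Students, courses or reports?</prompt>
    </field>
  </form>
</vxml>
```

    Placing the same link in the application root document would make the "operator" grammar active across the whole application.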

    Using Grammar Effectively

    A grammar should effectively cover the range of responses that a prompt can elicit

    this can include the essential input as well as extraneous words and phrases

    a grammar that is too large will hinder speech processing and can lead to more misrecognitions

    scope is important: grammars should not overlap

    excessive use of global grammars (defined in the root document) can increase the possibility of overlap

    Tutorial Exercise 1. Using tags

    Integrate the following rule and its grammar into an application that takes in the name of a student and the name of a course and outputs the student's name along with a course code.

    comms communications $="01"

    algorithms $="02"

    programming $="03"

    databases $="04"
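
    The rule above has lost its XML markup. One plausible reconstruction is sketched here; the rule id, the nesting of the alternatives and the xml:lang value are assumptions, and the $="..." tag syntax is copied from the slide and may need adapting to the tag format that a given platform expects.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="en-GB" root="course">
  <rule id="course" scope="public">
    <one-of>
      <!-- Two wordings that return the same course code -->
      <item>
        <one-of>
          <item>comms</item>
          <item>communications</item>
        </one-of>
        <tag>$="01"</tag>
      </item>
      <item>algorithms <tag>$="02"</tag></item>
      <item>programming <tag>$="03"</tag></item>
      <item>databases <tag>$="04"</tag></item>
    </one-of>
  </rule>
</grammar>
```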

    DTMF

    DTMF (touch-tone) can be used as an alternative to speech input, particularly when speech recognition is unreliable or problematic.

    In VoiceXML 2.0, dtmf is included as a value of the mode attribute in the <grammar> element

    1 $="students"

    2 $="courses"

    3 $="reports"
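
    A DTMF grammar carrying these three key mappings might be laid out as below. The rule id is an assumption, and the tag syntax is copied from the slide, so it may differ on other platforms.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         mode="dtmf" root="area">
  <rule id="area" scope="public">
    <one-of>
      <!-- Each key press returns a word rather than the digit itself -->
      <item>1 <tag>$="students"</tag></item>
      <item>2 <tag>$="courses"</tag></item>
      <item>3 <tag>$="reports"</tag></item>
    </one-of>
  </rule>
</grammar>
```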

    DTMF and / or speech in GSL

    ;GSL 2.0;

    Rating(

    ?[(i feel ?like) (it is ?a) (its ?a)][

    [one dtmf-1] { }

    [two dtmf-2] { }

    [three dtmf-3] { }

    .

    ]

    DTMF after counts

    Prompt counts can be used, e.g. to give the user an opportunity to choose using speech, then advise use of the keypad if speech is unsuccessful

    please use your keypad
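
    One way to realise this is the count attribute on event handlers, sketched below. The field name, prompts and option values are assumptions; the dtmf attribute on <option> supplies the keypad alternative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="main">
    <field name="area">
      <prompt>Say students, courses or reports.</prompt>
      <option dtmf="1" value="students">students</option>
      <option dtmf="2" value="courses">courses</option>
      <option dtmf="3" value="reports">reports</option>

      <!-- First failure: simply ask again -->
      <nomatch count="1">
        <prompt>Sorry, I did not understand.</prompt>
        <reprompt/>
      </nomatch>

      <!-- Second and later failures: steer the caller to the keypad -->
      <nomatch count="2">
        <prompt>
          Please use your keypad. Press 1 for students,
          2 for courses or 3 for reports.
        </prompt>
      </nomatch>
    </field>
  </form>
</vxml>
```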

    Tutorial Exercise 2: DTMF and speech

    Create a file with choices (student details | course details | reports) that allows speech as well as DTMF input

    Include a nomatch (or noinput) event that asks the user to use the keypad on the second time that speech input is unsuccessful.

    The system should confirm with words rather than DTMF

    1 $= "student details"

    student details $= "student details"

    Built-In Grammars

    Built-in grammars are provided in VoiceXML:

    boolean (true or false: in DTMF 1 is true, 2 is false)

    date

    digits (e.g. three four seven)

    currency

    number (e.g. three hundred and forty seven)

    phone

    time

    specified within the <field> element

    Built-In Grammar: Digits

    Digit recognition is performed in VoiceXML by using a built-in grammar for digits that is declared as a field type. For example:

    Digits grammar example
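
    A minimal sketch of a field that uses the built-in digits grammar. The form and field names and the prompt wording are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="lookup">
    <!-- type="digits" activates the platform's built-in digit grammar
         for both speech and DTMF input -->
    <field name="student_id" type="digits">
      <prompt>Please say or key in your student number.</prompt>
      <filled>
        <prompt>You entered <value expr="student_id"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

    VoiceXML 2.0 also allows the type to be parameterised, e.g. type="digits?length=6" to require exactly six digits.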

    Built-in grammar: boolean

    The boolean grammar contains ways of saying yes or no

    The particular words within the boolean grammar are dependent on the locale, i.e. the language type, e.g. US English, UK English, etc.

    The words may also vary from one platform to another

    IBM Voice Toolkit UK English:

    yes, true, positive, right, ok, sure, affirmative, check, yep, correct, no, false, negative, wrong, not, nope, incorrect

    The return value sent is a boolean true or false.

    If the field name is subsequently used in a value element within a prompt, the TTS engine will speak either yes or no.

    Users can also provide DTMF input: 1 is yes, and 2 is no.

    Boolean grammar example
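
    A minimal sketch of a yes/no question using the built-in boolean grammar. The form and field names and the prompts are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="confirm_news">
    <!-- type="boolean" accepts the platform's yes/no words,
         or DTMF 1 for yes and 2 for no -->
    <field name="wants_news" type="boolean">
      <prompt>Would you like to hear the news?</prompt>
      <filled>
        <!-- The field holds true or false, so it can be tested directly -->
        <if cond="wants_news">
          <prompt>Here is the news.</prompt>
        <else/>
          <prompt>OK, skipping the news.</prompt>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```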

    Built-in field type    Sample input

    currency               three twenty five
                           sixteen dollars and fifty seven cents
                           ten dollars
                           nine million two hundred thousand dollars

    date                   may fifth
                           march
                           the thirty first of december two thousand
                           yesterday
                           today
                           tomorrow

    phone                  seven three five eight four nine zero
                           two one two four nine six two seven oh six

    Sample input for built-in field types

    Built-in field type    Sample input

    number                 ten million five hundred thousand and fifty three
                           minus one point five
                           plus one point five
                           point seven

    digits                 zero, oh, one, two, three, four, five, six, seven, eight, nine

    time                   one o'clock
                           five past one
                           three fifteen
                           seven thirty
                           half past eight
                           oh four hundred hours
                           sixteen fifty
                           twelve noon
                           midnight

    Sample input for UK English built-in field types (continued)

    Tutorial Exercise 3. Built-in grammars

    Aim: to include built-in grammars

    Create an application in which the user has to speak their account number, which consists of 6 digits (use the built-in digits grammar).

    Extend the application with other built-in grammars, such as date.

    Experiment with the use of the DTMF simulator to enter the values for account number, date, etc.