
  • Page 1 of 36

    salsman-culnane-specification 6/2/17, 4:47 PM

    PRONUNCIATION ASSESSMENT FOR INTELLIGIBILITY REMEDIATION

    Utility Patent Specification

    by James Salsman, Fort Lupton; and Lance Culnane, Westminster, both of Colorado.

    April 22, 2017

    BACKGROUND CITATIONS

    U.S. Patent Documents:

    5,679,001: Russell, et al. (1997) “Children's speech training aid.”

    5,920,838: Mostow, et al. (1999) “Reading and Pronunciation Tutor.”

    6,634,887: Heffernan, III, et al. (2003) “Methods and Systems for Tutoring Using a

    Tutorial Model with Interactive Dialog.”

    6,963,841: Handal et al. (2005) “Speech Training Method with Alternative Proper

    Pronunciation Database.”

    7,752,045: Eskenazi, et al. (2010) “Systems and Methods for Comparing Speech

    Elements.”

    8,109,765: Beattie, et al. (2012) “Intelligent Tutoring Feedback.”


    8,271,281: Jayadeva, et al. (2012) “Method for Assessing Pronunciation Abilities.”

    8,744,856: Ravishankar (2014) “Computer implemented system and method and

    computer program product for evaluating pronunciation of phonemes in a language.”

    9,520,068: Beattie, et al. (2016) “Sentence Level Analysis in a Reading Tutor.”

    Other References:

    Chen and Li (2016) “Computer-assisted pronunciation training: From pronunciation

    scoring towards spoken language learning,” in Proceedings of the 2016 Asian-Pacific

    Signal and Information Processing Association (APSIPA) Annual Summit and

    Conference:

    http://www.apsipa.org/proceedings_2016/HTML/paper2016/227.pdf

    Cole, et al. (1999) “A platform for multilingual research in spoken dialogue systems,” in

    Proceedings of the Multilingual Interoperability in Speech Technology Conference

    (Leusden, Netherlands.)

    http://www.cslu.ogi.edu/people/hosom/pubs/cole_MIST-platform_1999.pdf

    Hawkins, J.A.; and Filipović, L. (2012) Criterial Features in L2 English: Specifying the

    Reference Levels of the Common European Framework (United Kingdom: Cambridge

    University Press.)

    https://drive.google.com/open?id=0B73LgocyHQnfcEVacmZRc2xEQ3VIZ0tkMHNmdjhNOXVsS1VR


    Huggins-Daines, et al. (2006) “Pocketsphinx: A free, real-time continuous speech

    recognition system for hand-held devices.” Proceedings of the IEEE International

    Conference on Acoustics, Speech and Signal Processing (ICASSP),

    https://www.cs.cmu.edu/~awb/papers/ICASSP2006/0100185.pdf

    Kibishi, et al. (2014) “A statistical method of evaluating the pronunciation proficiency/

    intelligibility of English presentations by Japanese speakers,” ReCALL (European

    Association for Computer Assisted Language Learning) doi:10.1017/

    S0958344014000251,

    http://www.slp.ics.tut.ac.jp/Material_for_Our_Studies/Papers/shiryou_last/e2014-Paper-01.pdf

    Loukina, et al. (2015) “Pronunciation accuracy and intelligibility of non-native speech,”

    in InterSpeech-2015, the Proceedings of the Sixteenth Annual Conference of the

    International Speech Communication Association (Dresden, Germany: Educational

    Testing Service)

    http://www.oeft.com/su/pdf/interspeech2015b.pdf

    Panayotov, V., et al. (2015) "LIBRISPEECH: an ASR Corpus Based on Public Domain

    Audio Books," Proceedings of the IEEE International Conference on Acoustics, Speech

    and Signal Processing (ICASSP 2015),

    http://www.danielpovey.com/files/2015_icassp_librispeech.pdf

    Proceedings of the International Symposium on Automatic Detection of Errors in

    Pronunciation Training, June 6–8, 2012, KTH, Stockholm, Sweden.

    http://www.speech.kth.se/isadept/ISADEPT-proceedings.pdf

    Proceedings of the Workshop on Speech and Language Technology in Education,


    September 4–5, 2015 (Satellite Workshop of Interspeech 2015 and the ISCA Special

    Interest Group SLaTE) Leipzig, Germany:

    https://www.slate2015.org/files/SLaTE2015-Proceedings.pdf

    Ronanki, S.; Salsman, J. and Bo, L. (December 2012) “Automatic Pronunciation

    Evaluation And Mispronunciation Detection Using CMUSphinx,” in the Proceedings of

    the 24th International Conference on Computational Linguistics (Mumbai, India:

    COLING 2012) pp. 61–67.

    http://www.aclweb.org/anthology/W12-5808

    Salsman, J. (July 2014) “Development challenges in automatic speech recognition for

    computer assisted pronunciation teaching and language learning” in Proceedings of the

    Research Challenges in Computer Aided Language Learning Conference (Antwerp,

    Belgium: CALL 2014.)

    http://talknicer.com/Salsman-CALL-2014.pdf

    Computer-Assisted Pronunciation Teaching (CAPT) Bibliography:

    http://liceu.uab.es/~joaquim/applied_linguistics/L2_phonetics/CALL_Pron_Bib.html

    FIELD OF THE INVENTION

    This invention relates to the field of computer-assisted pronunciation training (CAPT)

    using automatic speech recognition for language learning, speech language pathology,

    and reading tutoring, such as described by Russell, et al. (1997) “Children's speech

    training aid,” U.S. Patent 5,679,001. Assessing and remediating the authentic

    intelligibility of learners' spoken language, as measured by agreement with panels of

    non-expert word transcriptionists comprising both native and non-native listeners,

    provides substantial advantages over the current state of the art, which instead typically

    assesses formal pronunciation agreement with a panel of expert native-language

    listeners: formal mispronunciations are associated with only 16% of measured authentic

    word intelligibility, according to Loukina, et al. (2015).

    DISCUSSION OF PRIOR ART

    While Kibishi et al. (2014) have demonstrated the achievement of 75% agreement with

    authentic word transcription, even earlier work by Ronanki, Salsman, and Bo (2012)

    produced open source software implementing means of more precise discrimination

    between consequential and incidental errors by allowing accent and dialect adaptation

    using physiologically neighboring phones (phonemes and diphones) derived from the

    adjacency of vocal tract components, e.g., in the positions and configuration of the lips,

    teeth, tongue, jaw, vocal folds, nasal flap, and diaphragm.

    As stated in Salsman (2014), “To best support language instruction, we have been

    developing the use of physiologically neighboring phonemes, i.e., sounds produced with

    similar vocal tract articulations, to identify and discern between serious

    mispronunciations and incidental errors (Ronanki et al., 2012.) We are using diphones,

    i.e. the last half of one phoneme followed by the first half of the next, as alternatives and

    supplements to phonemes and triphones for both automatic speech recognition and

    pronunciation scoring (Cole, et al., 1999.) We plan to model learner fluency and select

    the sequence of self-study practice exercises using cumulative diphone scores. We are

    scoring segment durations to indicate syllables and words pronounced too quickly

    relative to exemplary pronunciations. We have measured substantial potential

    improvements from all of these techniques.


    “The language instructor’s experience of computer-assisted pronunciation assessment can

    be enhanced by offering comparisons of students’ utterances to exemplary pronunciations

    in ways that illustrate the measurements of physiologically neighboring phonemes,

    diphones, and speech segment durations. For example, mispronunciations might be

    annotated with International Phonetic Alphabet symbols for both the expected

    pronunciation and its physiologically neighboring phoneme which most closely matched

    the observed speech. Diphones can be used to highlight difficult phonetic transitions, for

    example when two adjacent phonemes are both mispronounced. Duration scoring can

    annotate not just words and sub-word segments given insufficient emphasis, e.g. such as

    might confuse ‘fourteen’ with ‘forty,’ and can highlight missing glottal stops essential to

    discern, for example, ‘harder’ from ‘hard or.’”

    Pronunciation assessment and CAPT responses should be based on at least 44 exemplary

    pronunciations for each response word or phrase, comprising both genders, two age

    groups such as the 20s and 50s, and, for English, at least eleven geographic regions, in

    order to provide for world-wide English accent and dialect adaptation coverage. For

    English, such exemplary pronunciations should be recorded from native language

    speakers selected from, for example, Australia, Canada, Ireland, New Zealand, South

    Africa, London (Standard Southern English), London (Cockney), London (Received

    Pronunciation), Birmingham, Cornwall, East Anglia, East Yorkshire, North Wales,

    Edinburgh, Ulster, Dublin, Boston, Midwestern US (i.e., in or west of Michigan,

    Pennsylvania, Missouri, or New Mexico), New England, New York City, and the

    Southern U.S. Gulf Coast region.
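    The minimum exemplar count above follows directly from the coverage dimensions; a brief illustrative calculation (the region count is whichever eleven of the listed accent regions are chosen):

```python
# The 44-exemplar minimum is the product of the coverage dimensions
# described above: 2 genders x 2 age groups x 11 geographic regions.
genders = ["female", "male"]
age_groups = ["20s", "50s"]
num_regions = 11  # any eleven of the accent regions listed above

exemplars_per_phrase = len(genders) * len(age_groups) * num_regions
print(exemplars_per_phrase)  # 44
```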

    Learner analytics (scoring pronunciation for CAPT and grading authentic intelligibility)

    may include log-normal means and variances of phoneme, diphone, word, and phrase

    acoustic scores and durations, along with cumulative phoneme and diphone scores;

    mispronunciations ranked by consequential interference with intelligibility for each word


    in an utterance and for the whole utterance; tonality scores for tonal languages; language

    grammar, morphology, and vocabulary criterial feature coverage scores; and subject

    matter topic correctness and coverage scores.
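    The log-normal means and variances mentioned above can be sketched as follows; this is an illustrative computation with hypothetical durations, not the invention's actual analytics code:

```python
import math

def log_normal_stats(durations):
    """Mean and variance of log(duration), per the log-normal duration
    analytics described above (durations in seconds)."""
    logs = [math.log(d) for d in durations]
    mu = sum(logs) / len(logs)
    var = sum((x - mu) ** 2 for x in logs) / len(logs)
    return mu, var

# Hypothetical exemplar durations for one phoneme:
mu, var = log_normal_stats([0.08, 0.10, 0.09, 0.12, 0.11])
# A learner's segment duration d can then be judged by its z-score,
# (log(d) - mu) / math.sqrt(var), against the exemplar distribution.
```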

    The intelligibility scoring system should agree with a panel of non-expert, authentic

    native and non-native language word transcriptionists. Beyond the logistic regression of

    word intelligibility by such transcriptionists, other machine learning techniques may

    include, but are not limited to those of Kibishi et al. (2014), such as symbolic regression,

    general and nonlinear regression, classification, artificial neural networks, support vector

    machines, learning vector quantization, or self-organizing maps. Quality assurance

    should be performed by measuring the extent to which the resulting intelligibility scores

    match those of an actual panel of such non-expert native and non-native word

    transcriptionists, preferably using blind or double-blind analysis. Transcriptionist data

    may be enhanced with automatic spelling correction. Intelligibility determination may be

    enhanced with word frequency-based phonological similarity measures of speech

    ambiguity.
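    As one concrete instance of the logistic regression of word intelligibility mentioned above, the probability that transcriptionists recover a word can be regressed on its acoustic score. A minimal single-feature sketch using plain gradient descent; the data, learning rate, and step count are hypothetical:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, steps=2000):
    """Tiny one-feature logistic regression by stochastic gradient
    descent.  xs: acoustic scores; ys: 1 if the transcriptionist panel
    heard the intended word, else 0.  Illustrative only."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            w -= lr * (p - y) * x
            b -= lr * (p - y)
    return w, b

# Hypothetical data: higher acoustic scores tend to be transcribed right.
w, b = fit_logistic([0.2, 0.3, 0.6, 0.8, 0.9], [0, 0, 1, 1, 1])
predict = lambda x: sigmoid(w * x + b)
```

    The fitted `predict` then maps a new acoustic score to an estimated intelligibility probability.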

    Learner remediation may include audio and visual feedback using expected and observed

    phones and their durations to show vocal tract sagittal sections and front-facing lip static

    graphic diagrams and animations along with spoken audio and text describing corrective

    vocal tract motions in the learner's preferred language with examples in that language.

    OBJECTIVES AND ADVANTAGES

    The invention eliminates pronunciation assessment feedback which does not involve a

    consequential mispronunciation interfering with the student's authentic intelligibility, and

    provides feedback as a pair of audio words in the learners' first language, the first


    containing the correct phoneme and the second containing the mistaken sound produced.

    To achieve those goals, we collect transcriptions of learner utterances. For example, while

    displaying, “Please listen to this phrase and type in the English words you hear,” play this

    audio for the phrase: “I'm here on behalf of the Excellence Hotel group.” For this

    example, let's say that in the audio, “behalf” was mispronounced as “beh-alf” and

    “Excellence” was mispronounced as “Excellent” but everything else was good. The

    learner types in the text: “I'm here on behalf of the excellent hotel group.” (I.e., the

    transcribing advanced learner gets “behalf” right, but doesn't transcribe Excellence

    correctly because it was mispronounced.) The system sees that “Excellence” was not

    transcribed correctly, while the SR system reports two mispronunciations. Therefore,

    update the database entry for this phrase to tally that the corresponding phonemes in

    “behalf” are inconsequential, but that the final phoneme /s/ in “excellence” is

    consequential if mispronounced.
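    The tally update in the worked example above might be sketched as follows. The keys and schema are illustrative assumptions, not the actual database design, and the tallies here are kept per word for brevity where the specification tallies per phoneme:

```python
from collections import defaultdict

# Consequentiality tallies per (phrase, word); illustrative schema only.
tallies = defaultdict(lambda: {"consequential": 0, "inconsequential": 0})

def update_tallies(phrase_id, mispronounced_words, transcribed_words):
    """For each word the SR system flagged as mispronounced, record
    whether the transcriptionist still recovered it (inconsequential)
    or failed to (consequential)."""
    heard = {w.lower() for w in transcribed_words}
    for word in mispronounced_words:
        key = (phrase_id, word.lower())
        if word.lower() in heard:
            tallies[key]["inconsequential"] += 1
        else:
            tallies[key]["consequential"] += 1

# The worked example: "behalf" and "Excellence" both flagged by the SR
# system, but only "behalf" was transcribed correctly.
update_tallies("excellence-hotel",
               ["behalf", "Excellence"],
               "I'm here on behalf of the excellent hotel group".split())
```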

    After sufficient data is collected, inconsequential mispronunciations can be ignored. The

    database of the prompting phrases will have a probability associated with each phoneme

    by which we can scale (or "weight" per Figure 2) each mispronunciation's acoustic score

    with that probability to establish the cut-off point for the scaled values which will not be

    scored as wrong, e.g. by displaying the word as green or yellow instead of orange or red.
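    The scaling-and-cutoff step above can be sketched as below; the cutoff values and the convention that a higher score means a worse acoustic error are assumptions for illustration:

```python
def feedback_color(acoustic_error, p_consequential,
                   yellow_cutoff=0.25, red_cutoff=0.5):
    """Weight a mispronunciation's acoustic-error score (0..1, higher
    assumed worse) by the phoneme's consequentiality probability, then
    bucket the weighted value into display colors.  Cutoffs are
    hypothetical, not values from the specification."""
    weighted = acoustic_error * p_consequential
    if weighted < yellow_cutoff / 2:
        return "green"
    if weighted < yellow_cutoff:
        return "yellow"
    if weighted < red_cutoff:
        return "orange"
    return "red"

# A large error on a rarely consequential phoneme is not scored as wrong:
print(feedback_color(0.9, 0.1))  # green
```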

    Using a recorded audio library for words in each learner's first language containing each

    phoneme near the front, instead of showing green/yellow/orange, the audio recording of

    an e.g. Spanish word which starts with a /s/ sound can be played. For example, a

    recording saying in Spanish audio, “When you said excellence [that target word in

    English] you needed the sound that [a Spanish word starting with /s/] starts with, but

    instead you pronounced the sound [a Spanish word starting with /t/] starts with. Listen to

    what you said. [Playing the audio of the learner's mispronounced word.] You were

    supposed to say excellence [the word in English again]. Click replay to hear this again,”


    can be played while displaying the word “Excellence” and, e.g., two buttons labeled

    Replay and Continue.

    The specific advantageous improvements of the invention include:

    Learner analytics: Learners are scored by any combination of the quality and

    intelligibility of their phoneme, diphone, syllable, and word production; their word and

    phrase comprehension; and their ability to both comprehend and use grammatical forms,

    word stem morphology, "can-do" criteria, including for both production and

    comprehension, and other criterial aspects of the instructional interactions (please see, for

    example, Hawkins and Filipović, 2012.) In addition to accuracy for each of those aspects,

    the learner's confidence, effort, and independence are measured too. For example,

    confidence can be self-reported, derived from vocal and timing features, or both. Effort

    corresponds to the number and duration of attempts to perform exercises. And

    independence can be measured by the number and frequency of learner requests for help.

    Integrated content development system: Both instructors and peer learners can add to

    and extend branching scenario instructional interactions, which are multiple choice

    response instructional content, such as is used in the Twine Twee formalism, or "Choose

    Your Own Adventure" role-play interactions. This branching scenario instructional

    content can be added and removed by editing the database of interactions in a manner

    similar to editing a wiki such as Wikipedia or Wiktionary.

    Phonetic disambiguation of homographs: Heterophonic homographs (words that are

    spelled identically but pronounced differently, such as the present and past tenses of the

    word, “read”) are automatically presented for disambiguation as an integrated part

    of the instructional content development subsystem. This allows instructors and peer

    learners to code their instructional content prompting response phrases, of which there


    are typically three per branching scenario node, although there can be any natural

    number: zero responses ends the instructional interaction module, one response requires

    the production of a particular prompted response and two or more choices allow for

    transitions to (usually other) nodes.
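    A branching-scenario node with its response transitions might be represented as below, per the description above: zero responses ends the module, one requires a particular prompted production, and two or more offer transitions to (usually other) nodes. All field names are illustrative, not a prescribed schema:

```python
# Illustrative branching-scenario node with three response transitions.
node = {
    "id": "hotel-lobby",
    "prompt": "I'm here on behalf of the Excellence Hotel group.",
    "responses": [
        {"phrase": "Do you have a reservation?", "next": "reservation"},
        {"phrase": "May I take your luggage?", "next": "luggage"},
        {"phrase": "The manager will be right with you.", "next": "manager"},
    ],
}

def node_kind(node):
    """Classify a node by its response count, as described above."""
    n = len(node["responses"])
    return "end" if n == 0 else "prompted" if n == 1 else "choice"
```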

    Part of speech labeling: The instructional interaction development support subsystem

    also assists in labeling the part of speech (e.g., noun, verb, article, adjective, conjunction,

    preposition, adverb, etc.) of each word of the prompt phrases in new instructional content

    to assist with pronunciation assessment for intelligibility remediation.

    Peer consensus-based validation of instructional content: Each node and each transition

    between nodes in the branching scenario instructional interactions are separately

    validated by instructor data entry and review or peer learner review or both.

    Caching stand-alone exercises for offline execution: The system network interface

    caches both instructional interactions during download and their results in nonvolatile

    storage so that the system will still be usable when disconnected from the network, when

    downloads or uploads or both are inhibited, so that the entire system can perform in a

    manner consistent with stand-alone operation compatible with free, freemium, or paid

    content accession models.

    Extensible vocabulary: Each of the prompting phrases is composed of one or more

    words, each of which is in turn composed of one or more syllables, diphones, and

    phonemes. The number and type of words may be increased by length, subject matter,

    vocational or other topic, geography, languages, morphological features, and other

    aspects.

    Extensible prompting phrases: The number and type of prompting phrases associated


    with each of the branching scenario transitions may be increased by length, subject

    matter, vocational topic, geographies, languages, grammatical features, "can-do" criteria,

    and other criteria and aspects. The branching scenario interaction modules in which the

    transitions are contained may similarly be increased by each of those aspects.

    Instructional interaction sequencing: A registration and sign-in system records the

    learners' proficiency with each phoneme, diphone, and word, along with other learner

    analytics, allowing the instructional content modules which the learner needs to practice

    the most, such as branching scenarios and prompting phrases, to be provided to them in

    sequence. While the sequence is often determined by the

    branching scenario interaction transitions, sequencing can also be performed with

    adaptive instruction, by selecting prompting phrases based on how much the learner

    analytics database indicates that the learner needs to practice words or criterial aspects

    contained in the selected phrases.
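    The adaptive selection described above can be sketched as choosing the prompting phrase with the greatest total practice need over its words; the greedy policy, names, and need scores are illustrative assumptions:

```python
def pick_next_phrase(phrases, weakness):
    """Choose the prompting phrase whose words the learner most needs
    to practice, per the analytics-driven sequencing described above.
    `weakness` maps a word to a 0..1 need score from the learner
    analytics database; a max-total-need policy is assumed here."""
    def need(phrase):
        return sum(weakness.get(w, 0.0) for w in phrase.lower().split())
    return max(phrases, key=need)

# Hypothetical example: a learner weak on /th/ words gets the phrase
# dense in them.
phrase = pick_next_phrase(
    ["please call the hotel", "thirty three thirsty thieves"],
    {"thirty": 0.9, "three": 0.8, "thirsty": 0.9, "thieves": 0.7},
)
```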

    Collecting exemplar and student pronunciation audio recordings: The instructional

    interaction development subsystem also includes support for collecting, evaluating the

    authentic intelligibility of, and storing audio recordings from students, instructors, and

    paid voice artists.

    Collecting transcriptions of recorded phrases from both first- and subsequent-language

    transcriptionists: Both the instructional interactions and the interaction development

    system collect transcriptions of the words that both native and foreign speakers can hear

    when they listen to recorded audio from instructors, voiceover artists, and learners. Such

    transcriptions are scored by the extent to which they match the words that the speaker

    was trying to say when recording the audio.

    Authentic intelligibility remediation: This groundbreaking technique was developed


    independently by researchers and software engineers in Japan and the U.S. Educational

    Testing Service. Please see Kibishi, et al. (2014) and Loukina, et al. (2015.) This

    advantage is a monumental improvement over the commercial state of the art, much if

    not most of which is two or three substantial generations behind (see Figure 2.) The

    invention's specific remediation process emphasizes audio feedback of spoken words in

    the learners' first language containing the sounds of the correct and mistaken

    pronunciations, as opposed to merely visual feedback alone.

    Multiple pass automatic speech recognition: The learner analytics assessment process

    includes the temporal endpoints (and thus the duration) and acoustic scores for the words,

    syllables, diphones, and phonemes of each such speech segment in prompting phrases,

    using anomalous durations of those segments to guide multiple passes of automatic

    speech recognition against audio input using different speech recognition grammars

    representing utterance expectations, and different overall endpoints.
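    Flagging anomalously long or short segments to drive a further recognition pass might be sketched as below, reusing log-normal duration statistics; the threshold and data are hypothetical:

```python
import math

def anomalous_segments(segments, stats, z_cutoff=2.0):
    """Flag (label, start, end) segments whose log-duration deviates
    from exemplar log-normal statistics; such segments would guide a
    further recognition pass with a different grammar and endpoints.
    Sketch only: `stats` maps a unit label to (mu, sigma) of its
    log-duration, and the 2-sigma cutoff is an assumed choice."""
    flagged = []
    for label, start, end in segments:
        mu, sigma = stats[label]
        z = (math.log(end - start) - mu) / sigma
        if abs(z) > z_cutoff:
            flagged.append((label, start, end, z))
    return flagged

# Hypothetical exemplar stats and two learner segments for phoneme "ih";
# only the 300 ms segment is anomalous versus the ~90 ms exemplar norm.
stats = {"ih": (math.log(0.09), 0.2)}
flagged = anomalous_segments([("ih", 1.00, 1.30), ("ih", 2.00, 2.09)], stats)
```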

    Speech-language pathology reporting: The reports, statistics, and alerts produced from

    the learner analytics are designed to provide data in the terms, manner, form, order, and

    with the information contained in reports familiar to practicing speech-language

    pathologists. However, the same reports are also annotated and provided with context

    available by, for example, clickable links to additional text, or similar explanatory

    information such that the learners themselves, their teachers, parents, school

    administrators, and peers can understand and interpret those reports, statistics, and alerts

    produced from the analytics database.

    BRIEF DESCRIPTION OF DRAWINGS

    Figure 1 depicts the databases and dataflow for the voice-response instructional


    application, comprising a client-server networked computer system composed of: (#1) an

    integrated instructional interaction development system; (#2) an instructional interaction

    database server process and database; (#3) an interaction and prompting phrase selection

    server process; (#4) a network connection from the server to the client; (#5) a client

    computer system which may include a web browser in which the client software is

    implemented; (#6) an instruction delivery application composed of: (#7) an interaction

    and prompting phrase section client process, (#8) a display for interaction multimedia and

    prompting phrases, (#9) a microphone for speech audio input and recording, and (#10) a

    client process to record speech, determine learner analytics; (#11) a network connection

    from the client to the server, (#12) a server process to update speech recognition results

    and learner analytics; (#13) a learner analytics database server process and database;

    (#14) a server process to calculate and update learner analytics results, reports, and

    statistics; and (#15) a server process to produce, display, and send reports, statistics, and

    alerts.

    Figure 2 depicts the motivation for collecting intelligibility transcriptions, as opposed to

    text-independent pronunciation assessment or pronunciation assessment based solely on

    exemplar pronunciations of students or voiceover talent.

    Figure 3 depicts an example use of logistic regression for intelligibility remediation.

    Figure 4 depicts the main database records in an asynchronous intelligibility remediation

    peer learning and data collection system.

    Figure 5 depicts learner analytics-based instructional prompting phrase sequencing and

    branching scenario transitions.


    DESCRIPTION OF THE PREFERRED EMBODIMENT

    In its preferred embodiment, the invention consists of software modules to extend

    software systems such as Moodle, a free open source instructional course management

    system, Wikipedia, a free open editable online encyclopedia, Wiktionary, a free open

    editable online dictionary, or Wikiversity, a free open editable online instructional course

    creation system. The user of such software, who typically intends to learn the meaning,

    pronunciation, grammar, morphology, and associated aspects of words and phrases, will

    be shown user interface elements to allow audio recording and subsequent evaluation of

    the audio phrase.

    For example, a Wiktionary user may be presented with buttons labeled "Record," "Stop,"

    "Play," "Evaluate," and, "Try in phrase." The Record button would begin storing audio

    data from the microphone, perhaps with a visual audio level meter indicator. The Stop

    button would terminate the recording, the Play button would allow the learner to listen to

    the recording, perhaps to ascertain the loudness of background noise in order to decide

    whether to evaluate the recording. The Evaluate button would perform the pronunciation

    assessment and determine the intelligibility of the phrase, and use that information to

    select, compose, and produce audio or visual feedback or both, for the learner to review

    in order to remediate whatever pronunciation intelligibility issues could be identified.

    Finally, the "Try in phrase" button should provide an opportunity for the learner to

    practice the word in a phrase, and may link the user to a registration and sign-in system

    which records their proficiency with each phoneme, diphone, word, and phrase in the

    system so that the exercises which the learner needs to practice the most can be provided

    to them in a sequence beginning with trying to pronounce the word in a phrase.


    OPERATION AND EXPLANATION

    One well-known automatic speech recognition system capable of providing the data on

    which the processes of the invention rely is the Carnegie Mellon Sphinx Speech

    Recognition Project’s PocketSphinx free open source software described in Huggins-

    Daines, et al. (2006.) The operation of the PocketSphinx system to provide pronunciation

    assessment data is described in this CMUSphinx wiki tutorial on the use of

    PocketSphinx for pronunciation evaluation:

    https://cmusphinx.github.io/wiki/pocketsphinx_pronunciation_evaluation

    One of the most important advances of the invention over essentially all of the prior art is

    the use of physiologically nearby neighboring phonemes, which are shown on that wiki

    page as the following file encoding a speech recognition grammar comprising

    the physiologically nearby neighboring phonemes of the word “with,” along with those of

    the other phonemes in alphabetical order:

    #JSGF V1.0;
    grammar neighbors;
    public <input> = sil <w> <ih> <dh> [sil];
    <aa> = aa | ah | er | ao;
    <ae> = ae | eh | er | ah;
    <ah> = ah | ae | er | aa;
    <ao> = ao | aa | er | uh;
    <aw> = aw | aa | uh | ow;
    <ay> = ay | aa | iy | oy | ey;
    <b> = b | p | d;
    <ch> = ch | sh | jh | t;
    <dh> = dh | th | z | v;
    <d> = d | t | jh | g | b;
    <eh> = eh | ih | er | ae;
    <er> = er | eh | ah | ao;
    <ey> = ey | eh | iy | ay;
    <f> = f | hh | th | v;
    <g> = g | k | d;
    <hh> = hh | th | f | p | t | k;
    <ih> = ih | iy | eh;
    <iy> = iy | ih;
    <jh> = jh | ch | zh | d;
    <k> = k | g | t | hh;
    <l> = l | r | w;
    <m> = m | n;
    <ng> = ng | n;
    <n> = n | m | ng;
    <ow> = ow | ao | uh | aw;
    <oy> = oy | ao | iy | ay;
    <p> = p | t | b | hh;
    <r> = r | y | l;
    <s> = s | sh | z | th;
    <sh> = sh | s | zh | ch;
    <t> = t | ch | k | d | p | hh;
    <th> = th | s | dh | f | hh;
    <uh> = uh | ao | uw | uw;
    <uw> = uw | uh | uw;
    <v> = v | f | dh;
    <w> = w | l | y;
    <y> = y | w | r;
    <z> = z | s | dh | z;
    <zh> = zh | sh | z | jh;

    The phonemes shown above are encoded in the CMUBET phonetic alphabet, which is

    described and explained on this wiki page:

    https://cmusphinx.github.io/wiki/cmubet
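    In software, the grammar above reduces to a neighbor lookup table, used to ask whether an observed phone is a benign physiological neighbor of the expected one. A sketch reproducing just a few of the CMUBET entries:

```python
# A few physiologically-neighboring-phone entries from the grammar
# above, as a lookup table (CMUBET symbols).
NEIGHBORS = {
    "dh": {"th", "z", "v"},
    "t":  {"ch", "k", "d", "p", "hh"},
    "s":  {"sh", "z", "th"},
}

def is_neighbor(expected, observed):
    """True when the observed phone is a physiological neighbor of the
    expected one, i.e. a likely incidental rather than serious error."""
    return observed in NEIGHBORS.get(expected, set())

# E.g. hearing "th" where "dh" was expected is an incidental,
# physiologically nearby substitution; "k" for "dh" is not.
```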

    Another important advance of the invention is the use of diphones. A diphone is the last

    part of one phoneme followed by the first part of another. There are over 1,000 diphones

    in spoken English, but only about 650 of those occur with substantial frequency. English

    diphones in the CMUBET phonetic alphabet are explained and listed with their

    frequencies on this wiki page:

    http://cmusphinx.github.io/wiki/diphones
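    Extracting the diphone inventory of a pronunciation is then a matter of pairing adjacent phonemes; a minimal sketch:

```python
def diphones(phonemes):
    """Adjacent phoneme pairs: each diphone spans the last half of one
    phoneme and the first half of the next, as described above."""
    return list(zip(phonemes, phonemes[1:]))

# "with" in CMUBET is w ih dh, yielding two diphones:
print(diphones(["w", "ih", "dh"]))  # [('w', 'ih'), ('ih', 'dh')]
```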

    The use of logistic regression for intelligibility remediation is explained by Figure 3. The

    primary database records for asynchronous intelligibility remediation using peer learning

    and data collection are depicted in Figure 4. The use of learner analytics for instructional

    prompt phrase sequencing and branching scenario transitions is explained by Figure 5.

    CONCLUSION

    The invention provides better speaking-skills instructional software than is presently

    available commercially. Language students can use thousands of

    free web and stand-alone software applications for learning reading, writing, and


    listening. But speaking skills instruction is limited to expensive, cumbersome, and often

    inaccurate commercial software for pronunciation assessment. The interactive language

    pronunciation assessment and remediation software of the invention may be able to

    improve students’ pronunciation of words perhaps six times faster than commercially

    available products. Millions of people worldwide currently wish to improve their

    pronunciation in order to gain access to better jobs and succeed at more opportunities to

    speak in public, on teleconferences, or to groups. Unfortunately, the state of the art often

    frustrates students by putting too much emphasis on inconsequential mistakes. The

    invention solves those problems by allowing adaptive instruction.

    While the description above contains many specifics, they should not be considered as

    limitations on the scope of the invention, but rather as exemplification of one preferred

    embodiment thereof. Many other variations are possible. For example, a children's toy

    to teach speaking skills may be provided as a device with a microphone and display, or

    the software system may run in internet web browsers as software executed by the

    browsers as, for example, program code in the JavaScript computer programming

language. Accordingly, the scope of the invention should be determined not by the

    embodiments as described and illustrated, but by the following claims.

    CLAIMS

    What is claimed is:

    (1) A networked client-server computer system composed of:

    (a) an instructional interaction database server process and database (Figure 1, #2);

    (b) an interaction and prompting phrase selection server process (#3);

    (c) a network connection from the server to the client (#4);

    (d) a client web browser (#5);

    (e) an instruction delivery application (#6), composed of:

(e)(1) an interaction and prompting phrase selection client process (#7),

    (e)(2) a display for interaction multimedia and prompting phrases (#8),

    (e)(3) a microphone for speech audio input and recording (#9),

    (e)(4) a client process to record speech, determine learner analytics, such as the quality

    and intelligibility of the learner’s phoneme, diphone, syllable, and word production; their

    word and phrase comprehension; their ability to both comprehend and use grammatical

    forms; word stem morphology production and comprehension; "can-do" criteria such as

    arbitrary instructional objectives and subject matter; the learner's measured confidence,

    effort, and independence; and use those analytics to assess resulting achievement and

    progress scores from the learner’s audio input (#10), and

    (f) a network connection from the client to the server (#11);

    (g) a server process to update speech recognition results and learner analytics, such as the

    quality and intelligibility of the learner’s phoneme, diphone, syllable, and word

    production; their word and phrase comprehension; their ability to both comprehend and

    use grammatical forms; word stem morphology production and comprehension; "can-do"

    criteria, including arbitrary instructional objectives and subject matter; the learner's

    measured confidence, effort, and independence; and use those analytics to assess

    resulting achievement and progress scores from the learner’s audio input (#12);

    (h) a learner analytics database server process and database (#13);

    (i) a server process to calculate and update learner analytics results, reports, and statistics

    (#14);

    (j) a server process to produce, display, and send reports, statistics, and alerts (#15).

    (2) The computer system of Claim 1 with an integrated instructional interaction

development system (#1) composed of a means to input, edit, and extend branching

    scenario instructional interactions composed of multiple choice response instructional

content, such as the Twine (twinery.org) Twee language and "Choose Your Own

    Adventure" role-play interactions, which can be added, changed, and removed by editing

    a database of interactions in a manner similar to editing a wiki such as Wikipedia or

    Wiktionary.
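For illustration only, a branching scenario of the kind described in Claim 2 can be represented as a graph of nodes, each carrying a prompt and multiple-choice transitions, editable record by record much like wiki pages. The node format below is a hypothetical sketch, not the claimed database schema.

```python
# Hypothetical in-memory form of a branching scenario: each node has a
# prompting phrase and named transitions to other nodes. In the claimed
# system these records would live in the interaction database and be
# edited wiki-style.

scenario = {
    "cafe-1": {
        "prompt": "Good morning! What can I get you?",
        "choices": {
            "I would like a coffee, please.": "cafe-coffee",
            "Just a glass of water, thanks.": "cafe-water",
        },
    },
    "cafe-coffee": {"prompt": "One coffee coming up!", "choices": {}},
    "cafe-water": {"prompt": "Here you are.", "choices": {}},
}

def transitions(node_id):
    """List the prompting phrases that move the learner out of a node."""
    return list(scenario[node_id]["choices"])

print(transitions("cafe-1"))
# → ['I would like a coffee, please.', 'Just a glass of water, thanks.']
```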

    (3) The computer system and instructional interaction development system of Claim 2,

    with a means of phonetic disambiguation of homographs (words that are spelled

    identically but pronounced differently) presented to the instructional interaction

    developer for disambiguation by selection of alternative pronunciations during input and

    editing.

    (4) The computer system and instructional interaction development system of Claim 2,

    with a means of part of speech (e.g., noun, verb, article, adjective, conjunction,

    preposition, adverb, etc.) labeling of each word of the instructional interaction prompting

    phrases presented for selection of each word’s part of speech during instructional

    interaction input and editing.

    (5) The computer system of Claim 1, with a means of peer consensus-based validation of

    instructional content composed of a way for learners, instructors, parents, and

administrators to verify that each node and each transition between nodes in the

branching scenario instructional interactions is separately validated by instructor data

entry and review, peer learner review, or both.

    (6) The computer system of Claim 1, with a means of caching stand-alone exercises for

offline execution comprised of a process reading instructional interactions and

associated data from the system network input interface (#4), which caches instructional

interactions during download, allowing them to be used when the network becomes

disconnected, and a process storing data intended for the system network output interface

in nonvolatile storage so that the system will still be usable when disconnected

    from the network, when downloads or uploads or both are inhibited, such that the system

    can perform in a manner consistent with stand-alone operation compatible with free,

    freemium, or paid content accession models.

    (7) The computer system of Claim 1, with a means of extensible vocabulary, composed of

    processes to assist in increasing the number and type of words contained in prompting

    phrases by length, subject matter, vocational topic, geography, languages, morphological

    features, and other topics and aspects.

    (8) The computer system of Claim 1, with a means of extensible prompting phrases and

    branching scenario interaction modules, allowing for increasing the number and type of

    prompting phrases and branching scenario interaction modules by length, subject matter,

    vocational topic, geographies, languages, grammatical features, "can-do" criteria, and

    other criteria and aspects.

    (9) The computer system of Claim 1, with a means of instructional interaction sequencing

    composed of processes for registration and sign-in, a process to allow recording learners'

    proficiency with each phoneme, diphone, word, and other learner analytics, and a process

to determine which instructional content modules, such as branching scenarios and

prompting phrases, the learner needs to practice the most, and a process to provide

learners those instructional content modules in sequence (Figure 5).
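One minimal way to realize the sequencing process of Claim 9 is to prioritize the prompting phrase whose diphones have the lowest recorded proficiency. The data structures and scoring rule below are illustrative assumptions, not the claimed implementation.

```python
# Sketch: pick the next prompting phrase as the one the learner is weakest
# on, using a per-diphone proficiency score in [0, 1]. Unseen diphones
# score 0, so unpracticed material is prioritized.

def phrase_priority(phrase_diphones, proficiency):
    """Mean proficiency over a phrase's diphones; lower => higher priority."""
    scores = [proficiency.get(d, 0.0) for d in phrase_diphones]
    return sum(scores) / len(scores)

def next_phrase(phrases, proficiency):
    """phrases: dict mapping phrase text -> list of its diphones."""
    return min(phrases, key=lambda p: phrase_priority(phrases[p], proficiency))

proficiency = {"HH-AH": 0.9, "AH-L": 0.4, "L-OW": 0.2}
phrases = {
    "hello": ["HH-AH", "AH-L", "L-OW"],  # mean 0.5
    "aha":   ["AH-HH", "HH-AH"],         # mean 0.45 (AH-HH unseen => 0.0)
}
print(next_phrase(phrases, proficiency))  # → aha
```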

    (10) The computer system of Claim 1, with a means of authentic intelligibility

    remediation composed of two processes:

(a) to obtain recorded audio prompting phrase utterances and their transcriptions from native

and foreign language transcriptionists, and to create a predictive model of the consequence of

    observed mispronunciations as follows:

    (a)(1) obtain learner attempts at pronouncing a number of phrases, each associated with a

    branching scenario instructional interaction transition in the form of recorded audio;

    (a)(2) using the recorded audio attempts, categorize each word as having been transcribed

    either correctly or incorrectly;

    (a)(3) using automatic speech recognition, evaluate the pronunciation of the recorded

    audio to determine the temporal endpoints and duration, along with the acoustic

    confidence probability, and alternative nearby physiologically neighboring speech

    segments such as phonemes, diphones, and syllables which may have matched the

    recorded audio more closely than the expected segments;

(a)(4) using the recorded audio of each word and the proportion of the time that it

was transcribed correctly, use logistic regression to model the consequence of each

    mispronunciation for prediction of the likelihood that the word was correctly transcribed,

    from the independent variables produced by the automatic speech recognition results

    (Figure 3); and

    (a)(5) store the results of the logistic regression predictive model as weight coefficients

    for each of the independent variables of each word of each prompting phrase in the

predictive model; and

    (b) to provide learner exercise interaction as follows:

    (b)(1) display one or more prompting phrases;

    (b)(2) record audio from the learner;

    (b)(3) using automatic speech recognition, evaluate the pronunciation of the recorded

    audio to determine the temporal endpoints and duration, along with the acoustic

    confidence probability, and alternative nearby physiologically neighboring speech

    segments such as phonemes, diphones, and syllables which may have matched the

    recorded audio more closely than the expected segments;

    (b)(4) scale the results of the automatic speech recognition according to the weights

    stored in step (a)(5) to determine the expected probability that each word is intelligible;

    (b)(5) rank each of the predicted unintelligible words by consequence according to part of

    speech and predictive model probability magnitude;

    (b)(6) provide audio or audio and visual feedback to the learner based on their most

    consequential pronunciation mistake as expected by the predictive model; and

    (b)(7) as part of the audio feedback, replay the learner's most consequential

    mispronunciation followed by another two prerecorded audio words, one of which

    includes the phoneme or diphone associated with the observed sound constituting the

    mispronunciation, followed by a word with the phoneme or diphone associated with the

    correct pronunciation.
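The predictive-model steps (a)(4), (a)(5), and (b)(4) above can be sketched as follows. This is a toy illustration, not the production model: the two features (an acoustic confidence and a duration-anomaly measure) and the training data are assumptions, and the regression is fit with plain stochastic gradient descent.

```python
# Sketch: fit a logistic regression from per-word ASR features to whether
# transcriptionists heard the word correctly (steps (a)(4)-(a)(5)), then
# reuse the stored weights to score new attempts (step (b)(4)).
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """SGD fit; returns weights [bias, w1, ..., wn] — the stored model."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = yi - p
            w[0] += lr * err
            for j, xj in enumerate(xi):
                w[j + 1] += lr * err * xj
    return w

def predict(w, x):
    """Expected probability that the word is intelligible."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

# Toy training set: [acoustic confidence, duration-anomaly measure] per word,
# labeled 1 if transcriptionists got the word right, else 0.
X = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.9], [0.2, 0.8]]
y = [1, 1, 0, 0]
w = fit_logistic(X, y)
print(predict(w, [0.85, 0.15]) > 0.5)  # → True
```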

    (11) The computer system of Claim 1, with a means of multiple pass automatic speech

    recognition composed of learner analytics assessment processes to determine temporal

    endpoints, and thereby the duration, and acoustic scores for speech segments such as

    phonemes, diphones, syllables, and words of prompting phrases, wherein anomalous

    durations of those segments guide multiple passes of automatic speech recognition of the

    same audio input using different speech recognition grammars representing utterance

    expectations.
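The multiple-pass strategy of Claim 11 can be sketched as follows, with the recognizer abstracted as a callable; the `recognize` interface, the segment format, and the anomaly threshold are assumptions for illustration.

```python
# Sketch: segments whose durations deviate too far from expectations trigger
# another recognition pass over the same audio with a different grammar.
# Assumes at least one grammar is supplied.

def anomalous(segments, expected_ms, tolerance=0.5):
    """Labels of segments whose duration deviates >tolerance (fraction) from expected."""
    return [s["label"] for s in segments
            if abs(s["ms"] - expected_ms[s["label"]])
               > tolerance * expected_ms[s["label"]]]

def multipass(audio, recognize, grammars, expected_ms):
    """Re-run recognition until no segment duration is anomalous, or grammars run out."""
    for grammar in grammars:
        segments = recognize(audio, grammar)
        if not anomalous(segments, expected_ms):
            return segments
    return segments  # best effort: results of the last pass

# Mock recognizer: the first grammar yields one implausibly long segment.
def mock_recognize(audio, grammar):
    if grammar == "strict":
        return [{"label": "HH", "ms": 300}, {"label": "AH", "ms": 80}]
    return [{"label": "HH", "ms": 90}, {"label": "AH", "ms": 85}]

expected = {"HH": 100, "AH": 90}
print(multipass(None, mock_recognize, ["strict", "relaxed"], expected))
# → segments from the second ("relaxed") pass
```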

    (12) The computer system of Claim 1, with a means of speech-language pathology

    reporting composed of surveying current terms used in, the manner of presentation of,

    printed forms composing, the order of presentation of, and the information contained in

    reports used by practicing speech-language pathologists, and then formatting reports,

    statistics, and alerting messages according to the surveyed descriptions of those reports.

    (13) A networked client-server computer system composed of:

    (a) an instructional interaction database server process and database (Figure 1, #2);

    (b) an interaction and prompting phrase selection server process (#3);

    (c) a network connection from the server to the client (#4);

    (d) a client web browser (#5);

    (e) an instruction delivery application (#6), composed of:

(e)(1) an interaction and prompting phrase selection client process (#7),

    (e)(2) a display for interaction multimedia and prompting phrases (#8),

    (e)(3) a microphone for speech audio input and recording (#9),

    (e)(4) a client process to record speech, determine learner analytics, including the quality

    and intelligibility of the learner’s phoneme, diphone, syllable, and word production; their

    word and phrase comprehension; their ability to both comprehend and use grammatical

    forms; word stem morphology production and comprehension; "can-do" criteria such as

    arbitrary instructional objectives and subject matter; the learner's measured confidence,

    effort, and independence; and use those analytics to assess resulting achievement and

    progress scores from the learner’s audio input (#10), and

    (f) a network connection from the client to the server (#11);

    (g) a server process to update speech recognition results and learner analytics, including

    the quality and intelligibility of the learner’s phoneme, diphone, syllable, and word

    production; their word and phrase comprehension; their ability to both comprehend and

    use grammatical forms; word stem morphology production and comprehension; "can-do"

    criteria, including arbitrary instructional objectives and subject matter; the learner's

    measured confidence, effort, and independence; and use those analytics to assess

    resulting achievement and progress scores from the learner’s audio input (#12);

    (h) a learner analytics database server process and database (#13);

    (i) a server process to calculate and update learner analytics results, reports, and statistics

    (#14);

    (j) a server process to produce, display, and send reports, statistics, and alerts (#15);

    (k) an integrated instructional interaction development system (#1) composed of a means

to input, edit, and extend branching scenario instructional interactions composed of

multiple choice response instructional content, such as the Twine (twinery.org) Twee

    language and "Choose Your Own Adventure" role-play interactions, which can be added,

    changed, and removed by editing a database of interactions in a manner similar to editing

    a wiki such as Wikipedia or Wiktionary;

    (l) a means of phonetic disambiguation of homographs (words that are spelled identically

    but pronounced differently) presented to the instructional interaction developer for

    disambiguation by selection of alternative pronunciations during input and editing;

    (m) a means of part of speech (e.g., noun, verb, article, adjective, conjunction,

    preposition, adverb, etc.) labeling of each word of the instructional interaction prompting

    phrases presented for selection of each word’s part of speech during instructional

    interaction input and editing;

    (n) a means of peer consensus-based validation of instructional content composed of a

way for learners, instructors, parents, and administrators to verify that each node

and each transition between nodes in the branching scenario instructional interactions is

separately validated by instructor data entry and review, peer learner review, or both;

(o) a means of caching stand-alone exercises for offline execution comprised of a

process reading instructional interactions and associated data from the system network

input interface (#4), which caches instructional interactions during download, allowing

them to be used when the network becomes disconnected, and a process storing data

intended for the system network output interface in nonvolatile storage so that the

system will still be usable when disconnected from the network, when downloads or

    uploads or both are inhibited, such that the system can perform in a manner consistent

    with stand-alone operation compatible with free, freemium, or paid content accession

models;

    (p) a means of extensible vocabulary, composed of processes to assist in increasing the

    number and type of words contained in prompting phrases by length, subject matter,

    vocational topic, geography, languages, morphological features, and other topics and

aspects;

    (q) a means of extensible prompting phrases and branching scenario interaction modules,

    allowing for increasing the number and type of prompting phrases and branching scenario

    interaction modules by length, subject matter, vocational topic, geographies, languages,

grammatical features, "can-do" criteria, and other criteria and aspects;

    (r) a means of instructional interaction sequencing composed of processes for registration

    and sign-in, a process to allow recording learners' proficiency with each phoneme,

    diphone, word, and other learner analytics, and a process to determine which instructional

content modules, such as branching scenarios and prompting phrases, the learner

needs to practice the most, and a process to provide learners those instructional content

modules in sequence (Figure 5);

    (s) a means of authentic intelligibility remediation composed of two processes:

(s)(1) to obtain recorded audio prompting phrase utterances and their transcriptions from

native and foreign language transcriptionists, and to create a predictive model of the

consequence of observed mispronunciations as follows:

    (s)(1)(a) obtain learner attempts at pronouncing a number of phrases, each associated

    with a branching scenario instructional interaction transition in the form of recorded

    audio;

    (s)(1)(b) using the recorded audio attempts, categorize each word as having been

    transcribed either correctly or incorrectly;

    (s)(1)(c) using automatic speech recognition, evaluate the pronunciation of the recorded

    audio to determine the temporal endpoints and duration, along with the acoustic

    confidence probability, and alternative nearby physiologically neighboring speech

    segments such as phonemes, diphones, and syllables which may have matched the

    recorded audio more closely than the expected segments;

(s)(1)(d) using the recorded audio of each word and the proportion of the time that it

was transcribed correctly, use logistic regression to model the consequence of each

    mispronunciation for prediction of the likelihood that the word was correctly transcribed,

    from the independent variables produced by the automatic speech recognition results

    (Figure 3); and

    (s)(1)(e) store the results of the logistic regression predictive model as weight coefficients

    for each of the independent variables of each word of each prompting phrase in the

predictive model; and

    (s)(2) to provide learner exercise interaction as follows:

    (s)(2)(a) display one or more prompting phrases;

    (s)(2)(b) record audio from the learner;

    (s)(2)(c) using automatic speech recognition, evaluate the pronunciation of the recorded

    audio to determine the temporal endpoints and duration, along with the acoustic

    confidence probability, and alternative nearby physiologically neighboring speech

    segments such as phonemes, diphones, and syllables which may have matched the

    recorded audio more closely than the expected segments;

    (s)(2)(d) scale the results of the automatic speech recognition according to the weights

    stored in step (s)(1)(e) to determine the expected probability that each word is

    intelligible;

    (s)(2)(e) rank each of the predicted unintelligible words by consequence according to part

    of speech and predictive model probability magnitude;

    (s)(2)(f) provide audio or audio and visual feedback to the learner based on their most

    consequential pronunciation mistake as expected by the predictive model; and

    (s)(2)(g) as part of the audio feedback, replay the learner's most consequential

    mispronunciation followed by another two prerecorded audio words, one of which

    includes the phoneme or diphone associated with the observed sound constituting the

    mispronunciation, followed by a word with the phoneme or diphone associated with the

    correct pronunciation;

    (t) a means of multiple pass automatic speech recognition composed of learner analytics

    assessment processes to determine temporal endpoints, and thereby the duration, and

    acoustic scores for speech segments such as phonemes, diphones, syllables, and words of

    prompting phrases, wherein anomalous durations of those segments guide multiple passes

    of automatic speech recognition of the same audio input using different speech

    recognition grammars representing utterance expectations; and

    (u) a means of speech-language pathology reporting composed of surveying current terms

    used in, the manner of presentation of, printed forms composing, the order of presentation

    of, and the information contained in reports used by practicing speech-language

    pathologists, and then formatting reports, statistics, and alerting messages according to

    the surveyed descriptions of those reports.

    (14) A networked client-server computer system composed of:

    (a) an instructional interaction database server process and database (Figure 1, #2);

    (b) an interaction and prompting phrase selection server process (#3);

    (c) a network connection from the server to the client (#4);

    (d) a client web browser (#5);

    (e) an instruction delivery application (#6), composed of:

(e)(1) an interaction and prompting phrase selection client process (#7),

    (e)(2) a display for interaction multimedia and prompting phrases (#8),

    (e)(3) a microphone for speech audio input and recording (#9),

    (e)(4) a client process to record speech, determine learner analytics, such as the quality

    and intelligibility of the learner’s phoneme, diphone, syllable, and word production; their

    word and phrase comprehension; their ability to both comprehend and use grammatical

    forms; word stem morphology production and comprehension; "can-do" criteria such as

    arbitrary instructional objectives and subject matter; the learner's measured confidence,

    effort, and independence; and use those analytics to assess resulting achievement and

    progress scores from the learner’s audio input (#10), and

    (f) a network connection from the client to the server (#11);

    (g) a server process to update speech recognition results and learner analytics, such as the

    quality and intelligibility of the learner’s phoneme, diphone, syllable, and word

    production; their word and phrase comprehension; their ability to both comprehend and

    use grammatical forms; word stem morphology production and comprehension; "can-do"

    criteria, including arbitrary instructional objectives and subject matter; the learner's

    measured confidence, effort, and independence; and use those analytics to assess

    resulting achievement and progress scores from the learner’s audio input (#12);

    (h) a learner analytics database server process and database (#13);

    (i) a server process to calculate and update learner analytics results, reports, and statistics

    (#14);

    (j) a server process to produce, display, and send reports, statistics, and alerts (#15).

    (15) The computer system of Claim 14 with an integrated instructional interaction

development system (#1) composed of a means to input, edit, and extend branching

    scenario instructional interactions composed of multiple choice response instructional

content, such as the Twine (twinery.org) Twee language and "Choose Your Own

    Adventure" role-play interactions, which can be added, changed, and removed by editing

    a database of interactions in a manner similar to editing a wiki such as Wikipedia or

    Wiktionary.

    (16) The computer system of Claim 14, with a means of caching stand-alone exercises for

offline execution comprised of a process reading instructional interactions and

associated data from the system network input interface (#4), which caches instructional

interactions during download, allowing them to be used when the network is, as usual,

disconnected, and a process storing data intended for the system network output interface

in nonvolatile storage so that the system will still be usable when disconnected

    from the network, when downloads or uploads or both are inhibited, such that the system

    can perform in a manner consistent with stand-alone operation compatible with free,

    freemium, or paid content accession models.

    (17) The computer system of Claim 14, with a means of instructional interaction

    sequencing composed of processes for registration and sign-in, a process to allow

    recording learners' proficiency with each phoneme, diphone, word, and other learner

analytics, and a process to determine which instructional content modules, such as

branching scenarios and prompting phrases, the learner needs to practice the most,

and a process to provide learners those instructional content modules in sequence (Figure

5).

    (18) The computer system of Claim 14, with a means of authentic intelligibility

    remediation composed of two processes:

(a) to obtain recorded audio prompting phrase utterances and their transcriptions from native

and foreign language transcriptionists, and to create a predictive model of the consequence of

    observed mispronunciations as follows:

    (a)(1) obtain learner attempts at pronouncing a number of phrases, each associated with a

    branching scenario instructional interaction transition in the form of recorded audio;

    (a)(2) using the recorded audio attempts, categorize each word as having been transcribed

    either correctly or incorrectly;

    (a)(3) using automatic speech recognition, evaluate the pronunciation of the recorded

    audio to determine the temporal endpoints and duration, along with the acoustic

    confidence probability, and alternative nearby physiologically neighboring speech

    segments such as phonemes, diphones, and syllables which may have matched the

    recorded audio more closely than the expected segments;

(a)(4) using the recorded audio of each word and the proportion of the time that it

was transcribed correctly, use logistic regression to model the consequence of each

    mispronunciation for prediction of the likelihood that the word was correctly transcribed,

    from the independent variables produced by the automatic speech recognition results

    (Figure 3); and

    (a)(5) store the results of the logistic regression predictive model as weight coefficients

    for each of the independent variables of each word of each prompting phrase in the

predictive model; and

    (b) to provide learner exercise interaction as follows:

    (b)(1) display one or more prompting phrases;

    (b)(2) record audio from the learner;

    (b)(3) using automatic speech recognition, evaluate the pronunciation of the recorded

    audio to determine the temporal endpoints and duration, along with the acoustic

    confidence probability, and alternative nearby physiologically neighboring speech

    segments such as phonemes, diphones, and syllables which may have matched the

    recorded audio more closely than the expected segments;

    (b)(4) scale the results of the automatic speech recognition according to the weights

    stored in step (a)(5) to determine the expected probability that each word is intelligible;

    (b)(5) rank each of the predicted unintelligible words by consequence according to part of

    speech and predictive model probability magnitude;

    (b)(6) provide audio or audio and visual feedback to the learner based on their most

    consequential pronunciation mistake as expected by the predictive model; and

    (b)(7) as part of the audio feedback, replay the learner's most consequential

    mispronunciation followed by another two prerecorded audio words, one of which

    includes the phoneme or diphone associated with the observed sound constituting the

    mispronunciation, followed by a word with the phoneme or diphone associated with the

    correct pronunciation.

    (19) The computer system of Claim 14, with a means of multiple pass automatic speech

    recognition composed of learner analytics assessment processes to determine temporal

    endpoints, and thereby the duration, and acoustic scores for speech segments such as

    phonemes, diphones, syllables, and words of prompting phrases, wherein anomalous

    durations of those segments guide multiple passes of automatic speech recognition of the

    same audio input using different speech recognition grammars representing utterance

    expectations.

    (20) The computer system of Claim 14, with a means of speech-language pathology

    reporting comprised of surveying current terms used in, the manner of presentation of,

    printed forms composing, the order of presentation of, and the information contained in

    reports used by practicing speech-language pathologists, and then formatting reports,

    statistics, and alerting messages according to the surveyed descriptions of those reports.

    ABSTRACT

    This invention is a method of interactive computer-aided instruction for general education

    including speaking skills. Learners are asked to read text prompting phrases into a

    microphone in response to multiple choice questions. Automatic speech recognition is

    used to assess the pronunciation and provide remediation, in the form of audio or visual

    responses or both, based on the authentic intelligibility of the learners' spoken responses

    determined from transcriptions of other learners' utterances of the same prompting

    phrases.

    PROVISIONAL PATENT APPLICATION AND DISCLOSURE DOCUMENT

    REFERENCES

The foregoing utility patent application specification claims the earlier date of James

    Salsman's U.S. provisional patent application of March 4, 2016, entitled, “Pronunciation

    Assessment for Intelligibility Remediation.” The delay in filing the present application

    beyond the one year statutory limit was unavoidable, but was less than the two month

    regulatory exemption for unavoidable delay. The present application also makes reference

    to U.S. Patent and Trademark Office Disclosure Document number S00867 filed by

    James Salsman on October 23, 1998, entitled, “Solar-powered Portable Reading

    Instruction System.”