Uses for Automatic Speech Recognition with Diverse English Speakers 2002 American Speech-Language-Hearing Association Annual Convention Atlanta, Georgia World Congress Center, Room: A314, Saturday, Nov 23 2002 4:30PM – 5:30PM Presenters/Authors: Kathleen Eilers Crandall, Ph.D., Paula M. Brown, Ph.D., Donna E. Gustina, and Stephen S. Campbell National Technical Institute for the Deaf Rochester Institute of Technology

Uses for Automatic Speech Recognition with Diverse English Speakers

2002 American Speech-Language-Hearing Association Annual Convention

Atlanta, Georgia World Congress Center, Room: A314, Saturday, Nov 23 2002 4:30PM – 5:30PM

Presenters/Authors: Kathleen Eilers Crandall, Ph.D., Paula M. Brown, Ph.D., Donna E. Gustina, and Stephen S. Campbell

National Technical Institute for the Deaf, Rochester Institute of Technology

Seminar – Presenters

Kathleen Eilers Crandall, Ph.D., Department of English, National Technical Institute for the Deaf, Rochester Institute of Technology

Paula M. Brown, Ph.D., CCC-SLP Department of Speech and Language, National Technical Institute for the Deaf, Rochester Institute of Technology

The Glossograph

• Fay wrote about an experimental mechanical device used to transcribe human speech, and said,

• “… it is not unreasonable to hope that some instrument will yet be contrived …”

Fay, E.A. (1883). The glossograph. American Annals of the Deaf, 28, 67-69.

Sci-Fi or Reality?

“The pen was an archaic instrument, seldom used even for signatures... Apart from very short notes, it was usual to dictate everything into the speak-write…” (Nineteen Eighty-Four. Orwell, 1949)

Two Projects

• Teacher use of ASR:
  – English Classroom/Lab Project

• Student use of ASR:
  – Speech Project

Funded by a grant from the Parsons Foundation of California

English Classroom/Lab Project

Purpose

Investigate direct use of ASR by classroom teacher to learn:

• Is acceptable recognition level attained?

• Under what conditions?
  – Style of speaking
  – Communication mode
  – Language complexity

Related Work

Use of ASR by an intermediary

• Intermediary, a ‘captionist,’ re-speaks professor’s words into a computer

• Intermediary summarizes professor’s words into a computer (‘interpreted speech’)

• Intermediary may use C-print (a shorthand typing system) in combination with ASR http://cprint.rit.edu/

Related Work

Use of ASR by the primary speaker

• iCommunicator™ http://www.myicommunicator.com/product_info.html

• Liberated Learning Environment http://www.liberatedlearning.com (St. Mary’s University, Halifax, Nova Scotia)

Speech Project

Speech Project: Intent

• Can ASR become better than a naïve listener?

• Can ASR serve as an effective and motivating feedback system?

Speech Project: How ASR Is Used Educationally

Visual displays provide feedback regarding speech production

• Natural way of learning

• Expect feedback to reflect accuracy
  – Assume that if you don’t get the right picture, you were wrong

English Classroom/Lab Project

Teacher -- Students

• Teacher -- Speaker
  – Native speaker of American English
  – User of ASL as a second language
  – Trained the ASR equipment

• Students -- Readers
  – Young adult college students who are deaf or hard-of-hearing
  – Reading and writing skills at the lowest quartile of entering students
  – Enrolled in basic level English language reading and writing courses

Evaluation Procedures

• ASR Software:
  – Dragon NaturallySpeaking
  – IBM ViaVoice
  – Microsoft Office

• Speaking styles:
  – Spontaneous conversation
  – Dictation-like speech

• Communication modes:
  – Speaking
  – Simultaneously speaking and signing
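
The slides report recognition accuracy as a percentage but do not say how it was scored. One common convention is word accuracy, taken as 1 minus the word error rate, where errors are counted by word-level edit distance between what was spoken and what the software produced. The sketch below assumes that convention; the function names and the sample sentences are illustrative and do not come from the presentation.

```python
def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(".,;:!?\"'()") for w in text.lower().split())
    return [w for w in words if w]


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two lists of words."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # word missing from the ASR output
                            curr[j - 1] + 1,      # extra word in the ASR output
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(reference, hypothesis):
    """Return 1 - WER (floored at 0) for spoken text vs. recognized text."""
    ref, hyp = tokenize(reference), tokenize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))


if __name__ == "__main__":
    spoken = "Please open your books to page ten"      # hypothetical utterance
    recognized = "Please open your box to page ten"    # hypothetical ASR output
    print(f"{word_accuracy(spoken, recognized):.1%}")  # one substitution in seven words
```

Under this scoring, a single misrecognized word in a seven-word sentence already drops accuracy below 90%, which is why short conversational utterances can look worse than long dictated passages.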

English Classroom/Lab

Teacher station

Control system

Smart Board & LCD Projector

Student Stations

Accuracy Needs

• Vary by population and message predictability
  – New vs. known information
  – Fluent readers vs. language learners
  – Reading for pleasure vs. reading to master new information

• CLOZE research and prediction of missing information
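
A cloze test measures how well a reader can restore words that have been deleted from a passage, which makes it a rough analogue of reading ASR text with misrecognized words. The sketch below assumes the fixed-ratio variant that blanks every nth word; the function names, the default of every fifth word, and the sample passage are illustrative and not taken from the presentation.

```python
def make_cloze(text, n=5, blank="_____"):
    """Blank every nth word; return the gapped text and the removed words."""
    if n < 1:
        raise ValueError("n must be at least 1")
    words = text.split()
    removed = []
    for i in range(n - 1, len(words), n):
        removed.append(words[i])
        words[i] = blank
    return " ".join(words), removed


def score_cloze(answers, removed):
    """Fraction of blanks restored exactly, ignoring case."""
    if not removed:
        return 0.0
    hits = sum(a.strip().lower() == r.strip().lower()
               for a, r in zip(answers, removed))
    return hits / len(removed)


if __name__ == "__main__":
    # Hypothetical passage, used only to show the mechanics.
    passage = "The teacher spoke slowly so the software could recognize each word"
    gapped, key = make_cloze(passage, n=5)
    print(gapped)
    print(key)
    print(score_cloze(["so", "every"], key))
```

Exact-match scoring is the stricter of the usual options; accepting synonyms would need a human rater or a word list, which this sketch leaves out.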

Results: ASR Software

[Bar chart: recognition accuracy (axis 75% to 100%) for Dragon, ViaVoice, and XP, with separate bars for Conversation and Dictation]

Results: Communication Mode

[Bar chart: recognition accuracy (axis 80% to 98%) for Simultaneous Communication and Speech Only, with separate bars for Conversation and Dictation]

Results: Language Complexity

[Bar chart: recognition accuracy (axis 82% to 98%) for language below 7th grade (< 7th Grade) and above 7th grade (> 7th Grade), with separate bars for Conversation and Dictation]

Correcting Text

• Error correction
  – What to correct
  – When to correct
  – How to correct

Multitasking Demands

• Normal tasks for speaker/teacher
  – Formulating ideas relevant to topic
  – Attending to learning needs of students
  – Meeting lipreading and sign language needs

• Added tasks for speaker/teacher
  – Speaking to produce readable ASR text
  – Monitoring text
  – Making corrections

Speech Project

Training Sequence

• Read a paragraph

• Correct and train recognition errors

• Reread paragraph

• Correct and train recognition errors

• Create transfer paragraph or spontaneous speech

• Correct and train recognition errors

Recognition Accuracy

[Bar chart: recognition accuracy (axis 0% to 100%) for three speakers labeled M Intel, F semi-intel, and F quasi-intel]

Improvement Across Sessions

[Chart: recognition accuracy (axis 0% to 80%) at five sessions, time 1 through time 5]

Improvement Within Session

[Chart: recognition accuracy (axis 65% to 95%) within a session for Reading 1, Reading 2, Reading 3, and Spon Sp (spontaneous speech)]

Improvement Evaluated

• Improvement across sessions

• Improvement within a session
  – Improvement with speaker training
  – Improvement with ASR training
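
Both kinds of improvement listed above come down to comparing accuracy at successive measurement points, whether those points are sessions or readings within one session. A minimal sketch of that bookkeeping follows; the function name is illustrative, and the accuracy values are placeholders chosen only to exercise the code, not results from the Speech Project.

```python
def point_changes(accuracies):
    """Percentage-point change between consecutive accuracy measurements.

    `accuracies` holds fractions between 0 and 1, in the order measured.
    """
    return [round((later - earlier) * 100, 1)
            for earlier, later in zip(accuracies, accuracies[1:])]


if __name__ == "__main__":
    # PLACEHOLDER values, not data from the study.
    labels = ["Reading 1", "Reading 2", "Spontaneous speech"]
    placeholder = [0.70, 0.80, 0.75]
    for (a, b), change in zip(zip(labels, labels[1:]), point_changes(placeholder)):
        print(f"{a} -> {b}: {change:+.1f} points")
```

Reporting changes in percentage points rather than as ratios keeps a gain from 70% to 80% comparable with a gain from 85% to 95%.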

Recommendations, Discussion, Questions

Grammatical Correctness

• Is ASR accuracy affected by the grammatical correctness of the user’s speech?

• Student written responses spoken as written: Accuracy – 93.8%

• Student written responses spoken after correction: Accuracy – 94.3%
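
The two figures above differ by half a percentage point. The short sketch below works through that arithmetic, expressing the gap both as an accuracy gain and as a relative reduction in error rate. Only the two percentages come from the slide; the helper name is illustrative.

```python
def compare(acc_before, acc_after):
    """Return (accuracy gain in points, relative reduction in error rate)."""
    err_before = 100 - acc_before
    err_after = 100 - acc_after
    gain = acc_after - acc_before
    relative_error_reduction = (err_before - err_after) / err_before
    return round(gain, 1), round(relative_error_reduction, 3)


if __name__ == "__main__":
    as_written, corrected = 93.8, 94.3  # accuracies reported on the slide
    gain, reduction = compare(as_written, corrected)
    print(f"gain: {gain} points, error reduction: {reduction:.1%}")
```

The slide gives no sample sizes, so this arithmetic cannot show whether the difference is statistically meaningful.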

Style of Speaking

1. Style of speaking that more closely resembles dictation approaches a usable accuracy rate.

2. Lowering the complexity does not improve accuracy.

Conditions of Use

Direct use of ASR by a language teacher -- Useful only under very controlled conditions.

• Illustrating the generation of written language

• Demonstrating the use of notes and outlines to produce written text

• Translating selected sign language utterances into English text during discussions

ASR: Classroom Use

Prepared Outline

Student’s Screen

Teacher’s Screen

Considerations

• Training
  – Critical to reach over 90% accuracy
  – Training with conversation

• Corrections
  – Familiarity with strategies
  – Dictate, Spell, Right click

• Equipment
  – Microphone headsets - design, comfort, and size
  – Demand on computer processor
  – Effect of optional settings

Language Processing

Teaching/Learning Issues:

• Does ASR promote the learning of reading and writing for Deaf and Hard-of-Hearing students?

• How do students process this information?

• Do students attend to multiple inputs?

• Can teachers attend to this many tasks effectively?

More Questions

• Who is at fault?
  – Speaker or ASR receiver?

• Acceptability of input
  – Various voices
  – Nontypical speakers

• User friendliness
  – Want immediate use

Presenters

Kathleen Eilers Crandall, Ph.D.

Department of English

National Technical Institute for the Deaf

Rochester Institute of Technology

Lyndon Baines Johnson Building - 2264

Phone: (585) 475-5111

Fax: (585) 475-6500

Email: [email protected]

Web: http://www.rit.edu/~kecncp

Paula M. Brown, Ph. D., CCC-SLP

Department of Speech and Language

National Technical Institute for the Deaf

Rochester Institute of Technology

Lyndon Baines Johnson Building - 3851

Phone: (585) 475-6593 V/TDD

Fax: (585) 475-6500

Email: [email protected]

Web: http://www.rit.edu/~462www/