Speech recognition in MUMIS

Mirjam Wester, Judith Kessens & Helmer Strik

• Objective: Automatic speech recognition of football commentaries

• SPEX transcribed two matches in two languages (Dutch and English):

– England - Germany (Eng-Dld)

– Yugoslavia - The Netherlands (Yug-Ned)

• Commentaries and stadium noise are mixed

Data Conversion

• SPEX transcription:

– text grid:

• orthographic transcription

• chunk alignment; chunk = a segment of speech of about 2 to 3 seconds

– CD with one large wav file

• The wav file is split according to the chunk alignments
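The conversion steps above can be sketched in Python. This is a minimal sketch, assuming the "text grid" is a long-format Praat TextGrid with a single interval tier holding the chunk alignment (the slides do not state the exact format). The function names `parse_textgrid_intervals` and `split_wav`, the output file naming and the option to skip empty chunks are hypothetical; only the standard-library `re`, `wave` and `pathlib` modules are used.

```python
import re
import wave
from pathlib import Path

# One interval of a long-format Praat TextGrid, e.g.
#     xmin = 0.53
#     xmax = 3.12
#     text = "goal for ""Holland"""   (Praat doubles quotes inside the text)
INTERVAL_RE = re.compile(
    r'xmin\s*=\s*([-+\d.eE]+)\s*xmax\s*=\s*([-+\d.eE]+)\s*'
    r'text\s*=\s*"((?:[^"]|"")*)"'
)


def parse_textgrid_intervals(textgrid_text):
    """Return (start_s, end_s, text) for every interval.
    Assumes a single interval tier containing the chunk alignment."""
    return [
        (float(xmin), float(xmax), text.replace('""', '"'))
        for xmin, xmax, text in INTERVAL_RE.findall(textgrid_text)
    ]


def split_wav(wav_path, intervals, out_dir, skip_empty=True):
    """Cut one long wav file into one file per chunk; return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(str(wav_path), "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        for index, (start, end, text) in enumerate(intervals):
            if skip_empty and not text.strip():
                continue  # empty chunk: no transcribed speech
            first = int(round(start * rate))
            last = min(int(round(end * rate)), src.getnframes())
            src.setpos(first)
            frames = src.readframes(last - first)
            out_path = out_dir / f"chunk_{index:05d}.wav"
            with wave.open(str(out_path), "wb") as dst:
                dst.setparams(params)  # frame count in the header is patched on write
                dst.writeframes(frames)
            written.append(out_path)
    return written
```

The chunk index is kept in each file name so that every audio file can be traced back to its interval, and therefore to its orthographic transcription, in the TextGrid.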

Examples of data

• Yug-Ned Dutch

• Yug-Ned English

• Eng-Dld Dutch

• Eng-Dld English

Statistics

                  Dutch   English
#chunks            5146      5613
#speech chunks     3006      3725
#empty chunks      2140      1843
#words (types)     1954      2923
#words (tokens)   12079     24022

In the English data each match has two commentators, in the Dutch data only one. Overlapping segments have been disregarded.
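Counts like those in the table can be derived from the per-chunk transcriptions. The sketch below is hypothetical: it assumes one transcription string per chunk (empty for an empty chunk), whitespace tokenization and lower-casing before counting types; the slides do not say how words were normalized. It also treats every chunk as either speech or empty, whereas for English the table's speech and empty counts (3725 + 1843 = 5568) do not add up to the 5613 chunks, so the original counting evidently distinguished further cases that this sketch does not model.

```python
def chunk_statistics(transcripts):
    """transcripts: one orthographic transcription per chunk, '' for an empty chunk.
    Words are lower-cased before counting types (an assumption)."""
    tokens = []
    speech_chunks = 0
    for text in transcripts:
        words = text.lower().split()
        if words:
            speech_chunks += 1
            tokens.extend(words)
    return {
        "chunks": len(transcripts),
        "speech_chunks": speech_chunks,
        "empty_chunks": len(transcripts) - speech_chunks,
        "word_types": len(set(tokens)),   # distinct words
        "word_tokens": len(tokens),       # running words
    }
```

For example, `chunk_statistics(["Goal for Holland", "", "goal again"])` yields 3 chunks, 2 speech chunks, 1 empty chunk, 4 word types and 5 word tokens.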

Training

Dutch:

• Yug-Ned ¾ of CD (19 min speech)

• France Telecom Noise Reduction (FTNR)

English:

• Yug-Ned ¾ of CD (28 min speech)

• FTNR

For more information on the France Telecom Noise Reduction tool, see: B. Noé, J. Sienel, D. Jouvet, L. Mauuary, L. Boves, J. de Veth & F. de Wet, “Noise Reduction for Noise Robust Feature Extraction for Distributed Speech Recognition”, in Proc. of Eurospeech ’01.
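The FTNR algorithm itself is described in the paper cited above and is not reproduced here. As a generic stand-in for spectral-domain noise reduction, the following sketch computes per-frequency-bin Wiener filter gains from a noisy power spectrum and a noise power estimate (for instance averaged over non-speech frames). The function names `wiener_gains` and `enhance_power` and the gain floor of 0.01 are hypothetical choices, not FTNR parameters.

```python
def wiener_gains(noisy_power, noise_power, floor=0.01):
    """Per-bin Wiener gain H = xi / (1 + xi), where the a priori SNR xi is
    estimated by power subtraction: xi = max(|X|^2 - N, 0) / N."""
    gains = []
    for p_x, p_n in zip(noisy_power, noise_power):
        if p_n <= 0.0:
            gains.append(1.0)  # no noise estimated in this bin: leave it untouched
            continue
        xi = max(p_x - p_n, 0.0) / p_n
        gains.append(max(xi / (1.0 + xi), floor))  # floor avoids zeroing bins entirely
    return gains


def enhance_power(noisy_power, noise_power, floor=0.01):
    """Apply the gain to the magnitude spectrum, i.e. scale each bin's power by H^2."""
    gains = wiener_gains(noisy_power, noise_power, floor)
    return [g * g * p for g, p in zip(gains, noisy_power)]
```

A bin whose noisy power is four times the noise estimate gets a gain of 0.75; a bin containing only the estimated noise is attenuated down to the floor.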

Test

Dutch:

• Yug-Ned ¼ of CD

– 626 chunks, 1577 words

– lexicon and language model based on the complete Yug-Ned match

English:

• Yug-Ned ¼ of CD

– 636 chunks, 2641 words

– lexicon and language model based on the complete Yug-Ned match
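The ¾ / ¼ split and its interaction with the lexicon can be made concrete with a short sketch. It assumes the split is made in time order over the chunks of the CD (the slides only say "¾ of CD" and "¼ of CD"); the function names are hypothetical. Because the lexicon is built from the complete match, the test set has no out-of-vocabulary words by construction, whereas a lexicon built from the training part alone generally would.

```python
def split_three_quarters(chunks):
    """First 3/4 of the chunks (in time order) for training, last 1/4 for testing."""
    cut = (3 * len(chunks)) // 4
    return chunks[:cut], chunks[cut:]


def build_lexicon(transcripts):
    """Sorted word list over all given transcriptions (lower-cased)."""
    return sorted({w for t in transcripts for w in t.lower().split()})


def oov_rate(test_transcripts, lexicon):
    """Fraction of test tokens that are not in the lexicon."""
    vocab = set(lexicon)
    tokens = [w for t in test_transcripts for w in t.lower().split()]
    if not tokens:
        return 0.0
    return sum(w not in vocab for w in tokens) / len(tokens)
```

With four toy chunks `["a b", "b c", "c d", "d e"]`, the test part is `["d e"]`; its OOV rate is 0.0 against the full-match lexicon but 0.5 against a training-only lexicon, since "e" occurs only in the test part.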

SNR before and after FTNR tool
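The slides do not define how SNR (or "SNR1" in the later plots) was measured. One simple estimate that fits this data is to treat the empty chunks as noise-only and the speech chunks as commentary plus noise. The sketch below does that under two stated assumptions: empty chunks contain only stadium noise, and speech and noise are uncorrelated so that their powers add. The function names are hypothetical.

```python
import math


def mean_power(samples):
    return sum(s * s for s in samples) / len(samples)


def estimate_snr_db(speech_chunks, empty_chunks):
    """speech_chunks: sample lists with commentary plus stadium noise;
    empty_chunks: sample lists from untranscribed chunks (assumed noise only).
    Assumes speech and noise are uncorrelated, so their powers add."""
    p_mix = mean_power([s for chunk in speech_chunks for s in chunk])
    p_noise = mean_power([s for chunk in empty_chunks for s in chunk])
    if p_noise == 0.0:
        return math.inf
    p_speech = max(p_mix - p_noise, 1e-12)  # guard against log of a non-positive value
    return 10.0 * math.log10(p_speech / p_noise)
```

Running the same estimate on the original and on the FTNR-processed audio would give a before/after comparison of the kind this slide reports.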

WER results for Yug-Ned before and after FTNR

[Figure: WER per condition: NL-original, NL-FTNR, Eng-original, Eng-FTNR]
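WER is conventionally computed from a minimum edit distance alignment between the reference and the recognized word strings: (substitutions + deletions + insertions) divided by the number of reference words. The slides do not say which scoring tool was used, so the following is a plain sketch of that standard definition; `word_errors` and `wer` are hypothetical names, and errors are pooled over all test chunks.

```python
def word_errors(reference, hypothesis):
    """Return (substitutions + deletions + insertions, number of reference words)
    using a Levenshtein alignment over words."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of r
                cur[j - 1] + 1,          # insertion of h
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            )
        prev = cur
    return prev[-1], len(ref)


def wer(pairs):
    """Pooled WER in % over (reference, hypothesis) pairs, one pair per chunk."""
    errors = total = 0
    for ref, hyp in pairs:
        e, n = word_errors(ref, hyp)
        errors += e
        total += n
    return 100.0 * errors / total
```

For instance, "the ball is in the net" recognized as "the ball in the net now" costs one deletion and one insertion: 2 errors on 6 reference words.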

Training material for the acoustic models

Dutch – Polyphone

• The data consist of phonetically rich sentences

• Phone models were trained on:

– Polyphone, all speakers

– Polyphone, male speakers

– Polyphone, male speakers + MUMIS noise

• Polyphone models used as bootstrap for the segmentation of the MUMIS material
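The slides do not say how MUMIS noise was added to the Polyphone speech. A common way to noisify clean speech is to add a noise signal scaled so that the mixture reaches a chosen SNR; the sketch below shows that under the assumptions that both signals share a sampling rate and that a noise-only MUMIS excerpt is available. `mix_at_snr` is a hypothetical name, and real 16-bit audio would additionally need clipping or rescaling.

```python
import math


def mix_at_snr(clean, noise, snr_db):
    """Return clean + g * noise, with g chosen so that
    10 * log10(P_clean / P_(g*noise)) == snr_db.
    The noise is looped to the length of the clean signal."""
    looped = [noise[i % len(noise)] for i in range(len(clean))]
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(x * x for x in looped) / len(looped)
    gain = math.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [c + gain * n for c, n in zip(clean, looped)]
```

Choosing `snr_db` to match the SNR measured on the MUMIS recordings would make the noisified training data resemble the test conditions.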

Polyphone models (Dutch): Yug-Ned test set

[Figure: WER for the models Poly-all, Poly-male, Poly-male+noise and Poly-seg.MUMIS]

Cross tests (Dutch & English)

Cross-tests:

• train on ¾ Yug-Ned, test on ¼ Eng-Dld

• train on ¾ Eng-Dld, test on ¼ Yug-Ned

MUMIS models (Dutch)

[Figure: WER on the Yug-Ned test set (models Yug-Ned and Eng-Dld-cross) and on the Eng-Dld test set (models Eng-Dld and Yug-Ned-cross)]

MUMIS models (English)

MUMIS models (Dutch+English)

Function words vs content words

[Figure: WER per word type (function, content, names, all) on the Yug-Ned and Eng-Dld test sets, for the English data and the Dutch data]
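A per-word-type breakdown like this one needs an alignment that says which reference word each error belongs to. The sketch below is one way to do it, not necessarily the authors' procedure: substitutions and deletions are charged to the category of the reference word, insertions to the category of the inserted word, so the "all" figure equals the overall WER. The word lists `FUNCTION_WORDS` and `NAMES` are tiny hypothetical placeholders for real category lexicons.

```python
FUNCTION_WORDS = {"the", "a", "and", "of", "to", "in", "is", "he", "it"}  # hypothetical
NAMES = {"kluivert", "davids"}  # hypothetical player-name list


def category(word):
    if word in NAMES:
        return "names"
    if word in FUNCTION_WORDS:
        return "function"
    return "content"


def align(ref, hyp):
    """Levenshtein alignment; returns (ref_word or None, hyp_word or None) pairs."""
    n, m = len(ref), len(hyp)
    d = [[i + j if i == 0 or j == 0 else 0 for j in range(m + 1)] for i in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]))
    pairs, i, j = [], n, m
    while i > 0 or j > 0:  # trace back from the end of both strings
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            pairs.append((ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            pairs.append((ref[i - 1], None))  # deletion
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))  # insertion
            j -= 1
    return pairs[::-1]


def wer_by_word_type(pairs):
    """WER in % per word category plus 'all', over (reference, hypothesis) pairs."""
    errors, counts = {}, {}
    for ref, hyp in pairs:
        for r, h in align(ref.lower().split(), hyp.lower().split()):
            if r is not None:
                c = category(r)
                counts[c] = counts.get(c, 0) + 1
                if h != r:  # substitution or deletion of a reference word
                    errors[c] = errors.get(c, 0) + 1
            else:  # insertion: charged to the category of the inserted word
                c = category(h)
                errors[c] = errors.get(c, 0) + 1
    result = {c: 100.0 * errors.get(c, 0) / n for c, n in counts.items()}
    result["all"] = 100.0 * sum(errors.values()) / sum(counts.values())
    return result
```

With the reference "kluivert passes to davids" recognized as "kluivert passes the davids", only the function word is wrong: function 100%, names and content 0%, all 25%.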

SNR vs. WER (1): Dutch data

[Figure: WER versus SNR1 (dB, axis from 0 to 30) for YugNed, YugNed_ftnr and EngDld]

SNR vs. WER (2): English data

[Figure: WER versus SNR1 (dB, axis from 0 to 40) for YugNed, YugNed_ftnr and EngDld]
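The slides do not say how these curves were computed. One plausible way to relate SNR to WER is to group test chunks into SNR bins and pool the word errors per bin. The sketch below assumes a per-chunk SNR value and per-chunk error and word counts are already available; `wer_per_snr_bin` and the 5 dB bin width are hypothetical.

```python
import math


def wer_per_snr_bin(chunks, bin_width=5.0):
    """chunks: iterable of (snr_db, n_errors, n_ref_words), one tuple per test chunk.
    Returns {lower bin edge in dB: pooled WER in %}, sorted by bin."""
    errs, words = {}, {}
    for snr, n_errors, n_words in chunks:
        b = math.floor(snr / bin_width) * bin_width  # lower edge of the SNR bin
        errs[b] = errs.get(b, 0) + n_errors
        words[b] = words.get(b, 0) + n_words
    return {b: 100.0 * errs[b] / words[b] for b in sorted(words) if words[b] > 0}
```

Pooling errors per bin, rather than averaging per-chunk WERs, keeps very short chunks from dominating a bin.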

Discussion

• WERs are high

• Noise?

– FTNR leads to higher SNR, but WERs do not improve substantially

• Not enough training data?

– Using Polyphone for training/bootstrapping does not lead to lower WERs than training on MUMIS data

– Noisifying Polyphone with MUMIS noise gives encouraging results

Discussion continued

• Function words make up about 50% of the data and cause a great deal of the errors

• Names are recognized very well

• Function words may not be necessary for information extraction (?)

Future work

• Steps toward noise-robust speech recognition:

– model/speaker adaptation

– combinations of noisified Polyphone models and FTNR

• Other issues:

– transcription of more data

• English, Dutch and German

• preference: specific games? radio? TV?

– generic football-specific language model

– confidence measures?

Future work continued

Questions:

• What type of output from ASR is needed?

– word-graph

– n-best list

– top of the list

– word spotting? only content words?

• For research purposes: is it possible to obtain data that has not been mixed (noise + commentary)?
