MediaEval 2016 - Zero-Cost Speech Recognition Task

Zero-Cost Speech Recognition task

Igor Szoke (BUT, CZ)
Xavier Anguera (ElsaNow, PT)

The Zero-Cost task goal...
● ...not to be a zero/low-resource task
  ○ Lots of work done with data preparation.
  ○ There is “lots” of data out there.
● To be more realistic
  ○ The limitation is in budget, not in data.
  ○ Very noisy data.
● To set up a knowledge base
  ○ How to put things together.
  ○ Not how to compete in data download.
● To be in touch with the community
  ○ IARPA BABEL, MGB Challenge, JHU workshops, Zero Resource Interspeech challenge

The Zero-Cost task in detail...
● Language: Vietnamese
● Data provided (~12 hours)
● 2 sub-tasks
  ○ Full-size ASR -> Word Error Rate
  ○ A tokenizer -> Normalized Mutual Information
● Participants can find and share more data (must be free).
● Participants score using an on-line Leader Board - www.Zero-Cost.org
● To help participants we:
  ○ Prepared a baseline system (Triphone-GMM system trained using Kaldi)
  ○ Provided a BUT BABEL baseline system (to frame ZC results)
  ○ Provided local scoring
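The ASR sub-task is scored by Word Error Rate. As a minimal sketch of that metric (not the task's official scoring script), WER can be computed as the word-level edit distance between reference and hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the real scoring may apply additional text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One deleted word out of four reference words.
print(word_error_rate("xin chào các bạn", "xin chào bạn"))  # 0.25
```

WER can exceed 1.0 when the hypothesis contains many insertions, which is why it is reported as an error rate and not as an accuracy.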

The task in deeper detail...
● Data
  ○ ELSA - read sentences on a cellphone (various public places)
  ○ Forvo.com - single word pronunciation (office)
  ○ Rhinospike.com - read sentences or paragraphs (office)
  ○ YouTube.com - news, presentations, talks (single speaker, reverberant)
● Other data provided by participants
  ○ (NNI) - URLs of Vietnamese web pages - (BUT) downloaded and cleaned - 18MB
  ○ (NNI) - Wiki text - cleaned - 750MB
  ○ (NNI) - Wordlist - 80k
  ○ (BUT) - Subtitles - 93MB
  ○ (BUT) - Some telenovelas (video + subtitles) - not used

Participants
● 12 interested
● 6 signed up
● 3 finished
  ○ ASR sub-task
    ■ NNI - fusion of 3 systems based on DNNs and data augmentation. Participant-provided data used.
    ■ ININ - fusion of several systems based on SGMMs, RNN and data augmentation.
    ■ BUT - single system based on BLSTMs.
  ○ Subword sub-task
    ■ BUT - single system based on automatically derived units using an infinite mixture of HMMs.

ASR sub-task results (zero-cost.org)

Subword sub-task results (zero-cost.org)
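The subword (tokenizer) sub-task is scored by Normalized Mutual Information. The sketch below shows one common form of NMI between two aligned label sequences, for example discovered units versus reference units per frame. The normalization by the geometric mean of the two entropies, the frame-level alignment, and the function name are assumptions for illustration; the task's scoring tool may normalize differently.

```python
import math
from collections import Counter


def normalized_mutual_information(labels_a, labels_b) -> float:
    """NMI = I(A;B) / sqrt(H(A) * H(B)) for two equally long label sequences."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and equally long")

    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    h_a = entropy(count_a)
    h_b = entropy(count_b)
    if h_a == 0.0 or h_b == 0.0:
        # A constant labelling carries no information about the other one.
        return 0.0

    mutual_information = sum(
        (c / n) * math.log((c / n) / ((count_a[a] / n) * (count_b[b] / n)))
        for (a, b), c in count_ab.items()
    )
    return mutual_information / math.sqrt(h_a * h_b)


# Discovered units that match the reference up to renaming score close to 1.
print(normalized_mutual_information(["a", "a", "b", "b"], [7, 7, 3, 3]))
```

Because NMI ignores the identity of the labels, a tokenizer is not penalized for using its own unit inventory, only for units that are uninformative about the reference.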

Conclusion
● Thanks to all!
  ○ Participants shared data.
  ○ Coped with the low amount of data - data augmentation.
  ○ Used state-of-the-art techniques.
● The future?
  ○ Leader board stays open. Anyone can join or continue.
  ○ We would like to continue!
    ■ A new language?
    ■ Tweak the task a bit?
    ■ More active participants...

That’s it…

Big thanks to all participants!

Zero-cost speech recognition task

Leader-Board

Zero-Cost Leader Board

Details...
● Zero-cost.org
● News, task description, scoring, leader board
● Small dev subset provided to participants to save them some time by not uploading too often.
● Participants upload dev + test but see only dev results. When evaluations are finished, they also see test results.
● Support of late submissions.
● Web interface, token based backend processes (working independently of the web interface).
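The dev/test visibility rule above can be sketched as follows. This is only an illustration of the rule, not the actual Zero-cost.org backend: the `Submission` class, its field names, and the `evaluation_finished` flag are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Submission:
    # All fields are hypothetical; the real leader board schema is not known.
    token: str        # identifies the participant's upload
    dev_wer: float    # score on the dev subset
    test_wer: float   # score on the test subset


def visible_scores(submission: Submission, evaluation_finished: bool) -> Dict[str, float]:
    """Return the scores a participant may see for one submission."""
    scores = {"dev": submission.dev_wer}
    if evaluation_finished:
        # Test results are revealed only after the evaluation period ends.
        scores["test"] = submission.test_wer
    return scores


submission = Submission(token="example-token", dev_wer=50.0, test_wer=55.0)
print(visible_scores(submission, evaluation_finished=False))
print(visible_scores(submission, evaluation_finished=True))
```

Hiding test scores during the evaluation keeps participants from tuning on the test set while still giving them feedback on dev.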
