Deep Speech: Recent Progress on Mandarin Speech Recognition

Deep Speech: Scaling up end-to-end speech recognition

Deep Speech: Recent Progress on Mandarin Speech Recognition

Tony Han

Deep Speech Key ingredients

Slides credit: Awni Hannun

Model

No alignment needed, using the objective from [Graves, Fernandez, Gomez and Schmidhuber, 2006]

Data

Computation (GPUs)

-TODO key problems with the model: score an existing example, and infer the best transcription for an unlabeled example

Deep Speech Recurrent Neural Network

Slides credit: Awni Hannun

t h _ (blank)

Output: alphabet, space, & blank

-how to handle variable-length input: a recurrent network
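A minimal sketch of why a recurrent network copes with variable-length input: the same weights are applied at every frame, so an utterance of any length yields one output per frame. The layer sizes, the `tanh` nonlinearity, and the names `rnn_forward` and `random_matrix` are assumptions for illustration, not the Deep Speech architecture.

```python
import math
import random

def rnn_forward(frames, W_xh, W_hh, b_h):
    """Run a simple (Elman-style) recurrent layer over a list of feature frames.

    Returns one hidden-state vector per input frame, whatever the length.
    """
    hidden_size = len(b_h)
    h = [0.0] * hidden_size              # initial hidden state
    outputs = []
    for x in frames:                     # the same weights are reused at every step
        h = [
            math.tanh(
                sum(W_xh[i][j] * x[j] for j in range(len(x)))
                + sum(W_hh[i][k] * h[k] for k in range(hidden_size))
                + b_h[i]
            )
            for i in range(hidden_size)
        ]
        outputs.append(h)
    return outputs

def random_matrix(rows, cols, rng):
    """Small random weights; a stand-in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
n_features, n_hidden = 3, 4              # illustrative sizes only
W_xh = random_matrix(n_hidden, n_features, rng)
W_hh = random_matrix(n_hidden, n_hidden, rng)
b_h = [0.0] * n_hidden

# Two "utterances" of different lengths go through the same network.
short_utt = [[rng.random() for _ in range(n_features)] for _ in range(5)]
long_utt = [[rng.random() for _ in range(n_features)] for _ in range(12)]

short_out = rnn_forward(short_utt, W_xh, W_hh, b_h)
long_out = rnn_forward(long_utt, W_xh, W_hh, b_h)
print(len(short_out), len(long_out))
```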

Deep Speech - CTC

Slides credit: Awni Hannun

No alignment needed!

P(THECAT) = P(_ _ T H _ _ _ _ E _ _ C _ _ A A A _ _ T T _ _ )
          + P(_ T _ _ H _ _ E E _ _ _ C _ _ A A _ _ T _ _ _ )
          + ...
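The CTC slide can be made concrete with a small sketch. An alignment is mapped to a transcription by merging repeated symbols and then dropping blanks, and the probability of a transcription is the sum over all alignments that map to it. The function names and the toy two-frame probabilities below are assumptions for illustration; the brute-force sum is exponential in the number of frames, and practical implementations use a dynamic-programming forward pass instead.

```python
from itertools import product

BLANK = "_"

def ctc_collapse(alignment):
    """Merge repeated symbols, then drop blanks."""
    out = []
    prev = None
    for sym in alignment:
        if sym != prev and sym != BLANK:
            out.append(sym)
        prev = sym
    return "".join(out)

def ctc_label_probability(frame_probs, label):
    """Brute-force P(label): sum over every alignment that collapses to label.

    frame_probs: one dict per frame, mapping symbol -> probability.
    """
    symbols = list(frame_probs[0].keys())
    total = 0.0
    for alignment in product(symbols, repeat=len(frame_probs)):
        if ctc_collapse(alignment) == label:
            p = 1.0
            for t, sym in enumerate(alignment):
                p *= frame_probs[t][sym]
            total += p
    return total

# The two alignments from the slide collapse to the same transcription.
a1 = "_ _ T H _ _ _ _ E _ _ C _ _ A A A _ _ T T _ _".split()
a2 = "_ T _ _ H _ _ E E _ _ _ C _ _ A A _ _ T _ _ _".split()
print(ctc_collapse(a1), ctc_collapse(a2))

# Toy example (made-up numbers): two frames, alphabet {A, blank}.
toy = [{"A": 0.6, BLANK: 0.4}, {"A": 0.5, BLANK: 0.5}]
p_a = ctc_label_probability(toy, "A")   # sums the alignments AA, A_, _A
```

Because the loss sums over every valid alignment, training needs only the audio and its transcription, not frame-level labels.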

Difficulties in Mandarin Speech Recognition

1. Mandarin is a tonal language

Slides Credit: Awni Hannun


Difficulties in Mandarin Speech Recognition

2. Thousands of characters! > 80K

-challenges?

Difficulties in Mandarin Speech Recognition

3. The homophone problem is ubiquitous.

-challenges?

Deep Speech - Hours of speech data

[Chart: hours of speech data, with values 80, 300, 2000, 10000]

-to make this feasible, we aggregated many existing datasets and collected several thousand hours of our own data

Deep Speech - Data

Slides credit: Awni Hannun

Speech

Noise

Noisy Speech

-talk about reverb, tempo, etc.
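The Speech + Noise = Noisy Speech idea can be sketched as additive mixing at a chosen signal-to-noise ratio. This is only an illustration under assumptions: the helper names `rms` and `mix_at_snr`, the synthetic sine "speech", and the 10 dB target are not from the slides, and reverb and tempo changes are not covered.

```python
import math
import random

def rms(signal):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))

def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the speech-to-noise ratio is snr_db."""
    # Loop or truncate the noise clip so it covers the whole utterance.
    noise = [noise[i % len(noise)] for i in range(len(speech))]
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20.0))
    scale = target_noise_rms / rms(noise)
    return [s + scale * n for s, n in zip(speech, noise)]

rng = random.Random(0)
# Synthetic stand-ins: a 440 Hz tone as "speech", Gaussian samples as "noise".
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
noise = [rng.gauss(0.0, 1.0) for _ in range(500)]

noisy = mix_at_snr(speech, noise, snr_db=10.0)

# Recover the SNR that was actually applied.
added = [n - s for n, s in zip(noisy, speech)]
measured_snr_db = 20 * math.log10(rms(speech) / rms(added))
```

Mixing each clean utterance with different noise clips and SNR values yields many noisy training examples from a single recording.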

Deep Speech Data Parallel GPU Scaling

Slides credit: Awni Hannun

InfiniBand

Model 1

Model 2

Model 3

Model 4

Share weight updates each iteration

-when 1 GPU isn't enough, use lots of GPUs
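A toy simulation of the data-parallel scheme on this slide: each model replica computes a gradient on its own shard of the batch, the gradients are averaged (standing in for the exchange over InfiniBand), and every replica applies the same update. The 1-D linear model, the learning rate, and the function names are assumptions for illustration; no real GPU or network communication is involved.

```python
def gradient(w, batch):
    """Gradient of the mean squared error for the 1-D model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def data_parallel_step(w, shards, lr):
    """One synchronous step: local gradients, averaged, then a shared update."""
    # Each replica holds the same weights and sees a different shard.
    local_grads = [gradient(w, shard) for shard in shards]
    # Stand-in for an all-reduce: average the gradients across replicas.
    avg_grad = sum(local_grads) / len(local_grads)
    # Every replica applies the same update, so the copies stay in sync.
    return w - lr * avg_grad

# Toy data generated from y = 3x, split evenly across 4 replicas.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[i::4] for i in range(4)]

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards, lr=0.01)
print(round(w, 3))
```

With equal-sized shards, the averaged gradient equals the gradient over the full batch, so the result matches single-device training on the combined batch.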

Deep Speech Mandarin

-progress

Deep Speech Demo 1


Deep Speech Results

Deep Speech Demo 2

Deep Speech Results

Demo

Awni Hannun

Our Deep Speech 2 paper will be published soon. Stay tuned.

Science
