Deep Speech: Scaling up end-to-end speech recognition
Deep Speech: Recent Progress on Mandarin Speech Recognition
Tony Han
Deep Speech - Key Ingredients
Slides credit: Awni Hannun
Model: no alignment needed, using the Connectionist Temporal Classification (CTC) objective from [Graves, Fernandez, Gomez and Schmidhuber, 2006]
Data
Computation (GPUs)
- Key problems with the model: score an existing labeled example, and infer the best transcription for an unlabeled example.
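For the second problem (inferring a transcription for an unlabeled example), the simplest approach is best-path (greedy) decoding: take the most probable output symbol in each frame, merge repeated symbols, and drop blanks. The sketch below assumes the network has already produced per-frame probabilities; the function names, the "_" blank symbol, and the toy probabilities are hypothetical. Deep Speech itself combines the network outputs with a language model in a beam search, so treat this only as a baseline.

```python
BLANK = "_"  # assumption: "_" stands for the CTC blank, as on the slides


def collapse(path):
    """CTC many-to-one map: merge repeated symbols, then remove blanks."""
    out, prev = [], None
    for sym in path:
        if sym != prev and sym != BLANK:
            out.append(sym)
        prev = sym
    return "".join(out)


def greedy_decode(frame_probs, alphabet):
    """Best-path decoding: most likely symbol per frame, then collapse.

    frame_probs[t][k] is the probability of alphabet[k] at frame t.
    This approximates the most probable transcription but is not guaranteed
    to find it, because many paths can add up to a more likely label.
    """
    best_path = [alphabet[max(range(len(p)), key=p.__getitem__)] for p in frame_probs]
    return collapse(best_path)


# Toy example with hypothetical probabilities over {blank, C, A, T}.
alphabet = ["_", "C", "A", "T"]
frame_probs = [
    [0.1, 0.7, 0.1, 0.1],  # C
    [0.6, 0.2, 0.1, 0.1],  # blank
    [0.1, 0.1, 0.7, 0.1],  # A
    [0.2, 0.1, 0.6, 0.1],  # A (repeat, merged)
    [0.1, 0.1, 0.1, 0.7],  # T
]
print(greedy_decode(frame_probs, alphabet))  # prints CAT
```

Because blanks separate genuine repeats, "A _ A" collapses to "AA" while "A A" collapses to "A".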
Deep Speech - Recurrent Neural Network
Slides credit: Awni Hannun
[Figure: per-frame network outputs "t", "h", "_" (blank); output alphabet: letters, space, and blank]
- How to handle variable-length input? Use a recurrent network.
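A recurrent network copes with variable-length input because the same weights are applied at every time step. A minimal sketch, assuming a unidirectional Elman RNN with tanh units, no biases, hypothetical sizes (13 input features per frame, 32 hidden units), and random untrained weights; Deep Speech's real network is larger and architecturally different. Each input frame yields one softmax distribution over the 26 letters, space, and blank.

```python
import math
import random

# Output symbols from the slide: letters, space, and the CTC blank "_".
ALPHABET = list("abcdefghijklmnopqrstuvwxyz") + [" ", "_"]


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(z):
    """Numerically stable softmax."""
    mx = max(z)
    e = [math.exp(v - mx) for v in z]
    total = sum(e)
    return [v / total for v in e]


class ToyRNN:
    """Hypothetical Elman RNN with random, untrained weights and no biases."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.W_xh = init(n_hidden, n_in)
        self.W_hh = init(n_hidden, n_hidden)
        self.W_hy = init(n_out, n_hidden)

    def forward(self, frames):
        """Return one probability distribution over the outputs per input frame."""
        h = [0.0] * len(self.W_hh)
        outputs = []
        for x in frames:
            # h_t = tanh(W_xh x_t + W_hh h_{t-1}); the same weights are reused every step
            pre = [a + b for a, b in zip(matvec(self.W_xh, x), matvec(self.W_hh, h))]
            h = [math.tanh(v) for v in pre]
            # y_t = softmax(W_hy h_t)
            outputs.append(softmax(matvec(self.W_hy, h)))
        return outputs


rnn = ToyRNN(n_in=13, n_hidden=32, n_out=len(ALPHABET))
rng = random.Random(1)
for num_frames in (5, 50):  # two "utterances" of different lengths
    frames = [[rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(num_frames)]
    probs = rnn.forward(frames)
    print(num_frames, "frames ->", len(probs), "distributions over", len(probs[0]), "symbols")
```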
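Deep Speech - CTC
Slides credit: Awni Hannun
No alignment needed!
P(THECAT) = P(_ _ T H _ _ _ _ E _ _ C _ _ A A A _ _ T T _ _)
          + P(_ T _ _ H _ _ E E _ _ _ C _ _ A A _ _ T _ _ _)
          + ...
(a sum over every frame-level alignment that collapses to THECAT once repeats are merged and blanks removed)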
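The slide's point is that CTC scores a transcription by summing the probabilities of every frame-level alignment that collapses to it, so no alignment has to be supplied. A small sketch, assuming the per-frame output probabilities are given and that frames are conditionally independent given the input, compares a brute-force sum over all alignments with the standard CTC forward recursion, which computes the same value in O(T * S) time (S = 2 * len(label) + 1). The function names and toy numbers are illustrative only.

```python
import itertools
import math

BLANK = "_"


def collapse(path):
    """CTC many-to-one map: merge repeated symbols, then remove blanks."""
    out, prev = [], None
    for sym in path:
        if sym != prev and sym != BLANK:
            out.append(sym)
        prev = sym
    return "".join(out)


def ctc_brute_force(probs, alphabet, label):
    """Sum P(path) over every frame-level path that collapses to `label` (exponential cost)."""
    total = 0.0
    for path in itertools.product(range(len(alphabet)), repeat=len(probs)):
        if collapse(alphabet[k] for k in path) == label:
            total += math.prod(probs[t][k] for t, k in enumerate(path))
    return total


def ctc_forward(probs, alphabet, label):
    """Same sum via the CTC forward recursion."""
    idx = {sym: i for i, sym in enumerate(alphabet)}
    ext = [BLANK]
    for ch in label:  # interleave blanks: "AT" -> _ A _ T _
        ext += [ch, BLANK]
    S = len(ext)
    # alpha[s]: total probability of path prefixes ending in state ext[s]
    alpha = [0.0] * S
    alpha[0] = probs[0][idx[ext[0]]]
    if S > 1:
        alpha[1] = probs[0][idx[ext[1]]]
    for t in range(1, len(probs)):
        new = [0.0] * S
        for s in range(S):
            a = alpha[s] + (alpha[s - 1] if s >= 1 else 0.0)
            # Skipping the blank between two labels is allowed only if they differ.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                a += alpha[s - 2]
            new[s] = a * probs[t][idx[ext[s]]]
        alpha = new
    # Valid paths end on the last label or on the trailing blank.
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)


alphabet = ["_", "A", "T"]
probs = [  # hypothetical per-frame output probabilities (rows sum to 1)
    [0.5, 0.3, 0.2],
    [0.4, 0.4, 0.2],
    [0.3, 0.3, 0.4],
    [0.6, 0.1, 0.3],
]
for label in ["AT", "AA", "TA", "A", ""]:
    print(repr(label), ctc_brute_force(probs, alphabet, label), ctc_forward(probs, alphabet, label))

# The two alignments on the slide both collapse to the same transcription.
print(collapse("_ _ T H _ _ _ _ E _ _ C _ _ A A A _ _ T T _ _".split()))
```

The forward recursion is what makes CTC training practical: it replaces the exponential enumeration with dynamic programming over the blank-extended label.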
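Difficulties in Mandarin Speech Recognition
1. Mandarin is a tonal language: the same syllable can change meaning with its pitch contour, e.g. mā (mother), má (hemp), mǎ (horse), mà (to scold).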
Slides Credit: Awni Hannun
Difficulties in Mandarin Speech Recognition
2. Thousands of characters! (> 80K)
- Challenges?
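Difficulties in Mandarin Speech Recognition
3. The homophone problem is ubiquitous: many characters share the same pronunciation, even the same tone (e.g. 是, 事, 市, and 世 are all read shì).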
- Challenges?
Deep Speech - Hours of speech data
[Chart: hours of speech data; bar values as extracted: 80, 300, 2000, 10000]
- To make this feasible, we aggregated many existing datasets and collected several thousand hours of our own data.
Deep Speech - Data
Slides credit: Awni Hannun
[Figure: Speech + Noise = Noisy Speech]
- Talk about reverb, tempo, etc.
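One common way to build noisy training examples is to superimpose a noise clip on clean speech at a chosen signal-to-noise ratio (SNR). A minimal sketch, assuming both clips are lists of samples at the same sampling rate; the helper names, the toy tone standing in for speech, and the SNR-based scaling rule are illustrative assumptions, not necessarily the exact Deep Speech recipe. Reverb and tempo changes are not shown.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a list of samples."""
    return sum(x * x for x in signal) / len(signal)


def add_noise(speech, noise, snr_db):
    """Return speech + gain * noise, with gain chosen so that
    10 * log10(P_speech / P_scaled_noise) == snr_db."""
    if len(noise) < len(speech):
        raise ValueError("noise clip must be at least as long as the speech clip")
    noise = noise[: len(speech)]  # trim noise to the utterance length
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


# Toy signals: a 440 Hz tone standing in for speech, Gaussian noise, 16 kHz sampling.
rate = 16000
speech = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate // 10)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(rate)]

noisy = add_noise(speech, noise, snr_db=10)
residual = [y - s for y, s in zip(noisy, speech)]
print(10 * math.log10(power(speech) / power(residual)))  # approximately 10.0
```

Sampling the SNR per example (rather than fixing it) is a simple way to expose the model to a range of noise levels.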
Deep Speech - Data-Parallel GPU Scaling
Slides credit: Awni Hannun
[Figure: four model replicas (Model 1, Model 2, Model 3, Model 4) connected over InfiniBand; they share weight updates each iteration.]
- When one GPU isn't enough, use lots of GPUs.
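Sharing weight updates each iteration can be sketched as synchronous data-parallel SGD: each replica computes a gradient on its own shard of the minibatch, the gradients are averaged (an all-reduce, carried over InfiniBand in the figure), and every replica applies the same update, so the weights stay identical. This toy version, assuming a hypothetical one-parameter linear model and plain Python lists instead of GPUs, only illustrates the bookkeeping.

```python
def grad(w, batch):
    """d/dw of the mean squared error of the toy model y = w * x on one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def data_parallel_step(replica_weights, shards, lr):
    """One synchronous step: local gradients, all-reduce average, identical update."""
    local = [grad(w, shard) for w, shard in zip(replica_weights, shards)]
    avg = sum(local) / len(local)  # stands in for the all-reduce across GPUs
    return [w - lr * avg for w in replica_weights]


# Hypothetical data from y = 3x, split into 4 equal shards (one per "GPU").
data = [(i / 10, 3 * i / 10) for i in range(1, 17)]
shards = [data[r::4] for r in range(4)]

weights = [0.0] * 4
for _ in range(50):
    weights = data_parallel_step(weights, shards, lr=0.1)
print(weights)  # four identical values close to 3
```

With equal-sized shards, the averaged gradient equals the gradient over the whole minibatch, so four replicas take the same step one worker would take on the full batch, while each only processes a quarter of the data.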
Deep Speech - Mandarin
- Progress
Deep Speech - Demo 1
Deep Speech - Results
Deep Speech - Demo 2
Deep Speech - Results
Demo
Awni Hannun
Our Deep Speech 2 paper will be published soon. Stay tuned!