Deep Convolution Neural Networks for Dialect Classification of Spectrogram Images

Nigel Cannings

Chase Information Technology Services Limited

www.intelligentvoice.com

Source: on-demand.gputechconf.com/gtc/2016/presentation/s6371... · 2016-03-31




Convolution Networks: Brief History

Inspired by receptive fields in the visual cortex

Notable Implementations:

• Fukushima’s NeoCognitron (1980)

• Explicit parallel implementations (1988)

• LeCun’s LeNet-5 (1998)

• Ciresan’s GPU Implementation (2011)

• GoogLeNet (2014)


Fukushima, Kunihiko, ‘Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position,’ Biological Cybernetics 36 (4): 193-202, 1980

LeNet-5 (1998), image source: http://yann.lecun.com/exdb/lenet/


Deep Learning


Sigmoidal activation functions have now been largely replaced with rectified linear units (ReLU)

The ‘vanishing error’ problem (Hochreiter, 1991) is largely avoided with ReLU, whose gradient is 1 for positive inputs

Now we can do ‘deep’ learning, i.e. networks with more than 2 hidden layers

This discovery and GPU computing have resulted in much recent activity in the neural network community
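The vanishing error effect can be illustrated with a minimal sketch. It assumes a chain of 10 layers that all see the same pre-activation value (0.5, chosen arbitrarily) and ignores the weights entirely, so it only shows how the activation derivatives multiply during backpropagation. The function names are illustrative, not from the talk.

```python
import math


def sigmoid_grad(x):
    """Derivative of the logistic sigmoid; never larger than 0.25."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def chained_gradient(grad_fn, pre_activations):
    """Multiply the activation derivatives along a chain of layers."""
    g = 1.0
    for x in pre_activations:
        g *= grad_fn(x)
    return g


# Assumption: 10 layers, each with pre-activation 0.5, weights ignored.
pre_acts = [0.5] * 10
sigmoid_chain = chained_gradient(sigmoid_grad, pre_acts)
relu_chain = chained_gradient(relu_grad, pre_acts)

print("sigmoid:", sigmoid_chain)
print("relu:   ", relu_chain)
```

With the sigmoid the product shrinks geometrically with depth, while with ReLU it stays at 1 as long as the units are active. ReLU units with negative input pass no gradient at all, which is why the problem is reduced rather than removed.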


GoogLeNet

State-of-the-art winner of the ImageNet 2014 competition: classifying 1.2M images into 1K classes

Convolution neural network inspired by LeCun’s LeNet-5

Has 9 ‘Inception’ modules, multiple convolution sizes, and pooling in each module

Stochastic Gradient Descent is used to train the network, with ‘dropout’, which helps prevent overfitting
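As a rough illustration of dropout, the sketch below uses the common ‘inverted dropout’ formulation: each activation is zeroed with probability p during training and the survivors are rescaled by 1/(1 - p), so nothing needs to change at test time. The drop probability of 0.4 and the function name are assumptions made for the example; the slides do not state the settings used.

```python
import random


def dropout(activations, p, rng, training=True):
    """Inverted dropout: zero each value with probability p, rescale the rest."""
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)              # fixed seed so the run is repeatable
layer_output = [1.0] * 1000         # stand-in for one layer's activations
dropped = dropout(layer_output, p=0.4, rng=rng)
unchanged = dropout(layer_output, p=0.4, rng=rng, training=False)

zero_fraction = sum(1 for v in dropped if v == 0.0) / len(dropped)
print("fraction zeroed:", zero_fraction)
```

Because a different random subset of units is silenced on each training step, the network cannot rely on any single unit, which is the intuition behind the regularising effect.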

Szegedy et al., ‘Going deeper with convolutions,’ arXiv, 2014


GoogLeNet Structure

Topology consists of ‘Inception’ modules consisting of:

• Convolutions – filters for extracting features; filter size tends to be small in the early layers, bigger in later layers

• Pooling – dimensionality reduction

• Softmax loss for predicting classes at 3 progressive stages of the network

• Other – concatenations for combining convolutions

‘Rinse and Repeat’ 9 times
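The concatenation step in the list above can be sketched without any deep learning library. The example below stacks the outputs of the parallel branches of one module along the channel axis. The branch widths (64, 128, 32 and 32 channels on a 28x28 grid) are the ones published for module ‘inception (3a)’ in Szegedy et al.; the helper names and the dummy feature maps are assumptions for illustration only.

```python
# Output channels per branch, as published for "inception (3a)".
INCEPTION_3A = {"1x1": 64, "3x3": 128, "5x5": 32, "pool_proj": 32}


def dummy_branch(channels, height, width, value):
    """Stand-in for a branch output: `channels` grids of height x width."""
    return [[[value] * width for _ in range(height)] for _ in range(channels)]


def concat_channels(feature_maps):
    """Concatenate feature maps along the channel axis.

    All branches must share the same spatial size, which is why the
    convolutions inside a module are padded to preserve height and width.
    """
    height = len(feature_maps[0][0])
    width = len(feature_maps[0][0][0])
    out = []
    for fmap in feature_maps:
        for channel in fmap:
            if len(channel) != height or len(channel[0]) != width:
                raise ValueError("branches must share the same spatial size")
            out.append(channel)
    return out


branches = [
    dummy_branch(n_channels, 28, 28, float(i))
    for i, n_channels in enumerate(INCEPTION_3A.values())
]
module_output = concat_channels(branches)
print(len(module_output), "channels of", len(module_output[0]), "x", len(module_output[0][0]))
```

The module output has as many channels as the branches combined, and the next module receives that stack as its input.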



NIST LRE Competition

6 language clusters, 20 dialects:

• Arabic (Egyptian, Iraqi, Levantine, Maghrebi, Modern Standard)

• Chinese (Cantonese, Mandarin, Min, Wu)

• English (British, General American, Indian)

• French (West African, Haitian Creole)

• Iberian (Caribbean Spanish, European Spanish, Latin American Spanish, Brazilian Portuguese)

• Slavic (Polish, Russian)

500+ hours of speech data

Data set very unbalanced

2015 NIST Language Recognition Evaluation, http://www.nist.gov/itl/iad/lre15.cfm


Spectrogram Convolution Network

Based on NVIDIA’s DIGITS implementation of GoogLeNet

Converted speech to 256x256 pixel spectrograms

Tried different spectral representations and coding…

[Figure: example spectrogram images; panel labels as extracted: RASTA 12 MATLAB RASTA; SOX PYTHON]
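For readers unfamiliar with spectrograms, here is a minimal sketch of how one can be computed from raw samples: split the signal into overlapping frames, apply a window, and take the log magnitude of the discrete Fourier transform of each frame. This is not the pipeline used in the talk (the slide only names SoX, Python and MATLAB RASTA). The frame length, hop size, sample rate and test tone are all assumptions, and the final scaling to a 256x256 pixel image is not shown.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def dft_magnitudes(frame):
    """Magnitudes of the non-negative frequency bins of a naive DFT."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def spectrogram(samples, frame_len=64, hop=32):
    """Return a list of time frames, each a list of log magnitudes in dB."""
    window = hann(frame_len)
    columns = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(samples[start:start + frame_len], window)]
        columns.append([20 * math.log10(m + 1e-10) for m in dft_magnitudes(frame)])
    return columns


# Assumed test signal: a 1 kHz tone sampled at 8 kHz.
rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(512)]
spec = spectrogram(tone)

# With 64-sample frames each bin is 125 Hz wide, so 1 kHz falls in bin 8.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), "frames,", len(spec[0]), "bins, peak at bin", peak_bin)
```

A production pipeline would use an FFT rather than this quadratic-time DFT; the naive version is kept here only so the example has no dependencies.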


GoogLeNet Processing

Database:

• 501248 spectrograms for training

• 24352 spectrograms for validation

• 51501 spectrograms for testing

Apply convolutions to extract primitives such as edges

Object parts extracted

Full Spectral Features, e.g. phones, words

Refinement of accuracy

Dialect Classification

Softmax loss outputs: Loss1, Loss2, Loss3
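The three loss outputs can be combined into a single training objective. The sketch below follows the weighting described in the GoogLeNet paper, where the two auxiliary classifiers (here Loss1 and Loss2) are weighted by 0.3 and the final classifier (Loss3) by 1. Whether the model in this talk kept those weights is not stated on the slides, so treat the value as an assumption; the logits are made-up numbers.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Softmax loss for one example: negative log probability of the true class."""
    return -math.log(softmax(logits)[label])


def combined_loss(loss1, loss2, loss3, aux_weight=0.3):
    """Final loss plus down-weighted auxiliary losses (weight is an assumption)."""
    return loss3 + aux_weight * (loss1 + loss2)


# Made-up logits for one spectrogram over 3 classes; true class is index 0.
loss1 = cross_entropy([1.0, 0.5, 0.2], 0)   # early auxiliary classifier
loss2 = cross_entropy([2.0, 0.3, 0.1], 0)   # middle auxiliary classifier
loss3 = cross_entropy([4.0, 0.2, 0.1], 0)   # final classifier
total = combined_loss(loss1, loss2, loss3)
print(loss1, loss2, loss3, total)
```

At test time only the final classifier is normally used; the auxiliary outputs exist to inject gradient into the earlier layers during training.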


Preliminary Results

[Figure: horizontal bar chart, one bar per dialect, axis from 0 to 100; bar values not recoverable. Dialect labels as listed: Arabic-Levantine, French-Haitian, Slavic-Polish, Chinese-Wu, French-West_African, English-American, Arabic-Iraqi, Chinese-Mandarin, Arabic-Maghrebi, Slavic-Russian, Spanish-Caribbean, English-British, Arabic-Egyptian, Chinese-Cantonese, Arabic-Modern_Standard, Chinese-Min_Dong, Spanish-European, Spanish-…, Portuguese-Brazilian, English-South_Asian_(Indian)]

Accuracy: 83.99% (Top-1), 98.89% (Top-5)
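Top-1 and Top-5 accuracy differ only in how many of the highest-scoring classes are allowed to contain the correct label. A minimal sketch of the metric follows; the scores and labels are toy values with 3 classes, not data from the evaluation.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of examples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy class scores for 3 examples over 3 classes (illustrative only).
scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.4, 0.5, 0.1],
]
labels = [0, 1, 0]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print(top1, top2)
```

With 20 dialects, a Top-5 hit means the correct dialect is among the five most probable outputs, so the Top-5 figure is always at least as high as the Top-1 figure.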


Still to be investigated…

Many of the image transformations (scaling, cropping, rotating) commonly used in image classification to balance data and improve generalisation are not appropriate for spectrograms

Dynamic frequency warping techniques to balance the data sets and improve generalisation

Taxonomy of languages: investigation of the similarity of classification results across dialects

• David Cameron – Arabic?
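To make the frequency warping idea above concrete, the sketch below applies a fixed linear warp to one spectrogram column, resampling the frequency axis by a factor alpha with linear interpolation. This is a deliberate simplification: the slide proposes dynamic warping as future work and gives no details, so the function name, the warp factor and the interpolation scheme are all assumptions.

```python
def warp_column(column, alpha):
    """Resample one spectrogram column along frequency by a factor alpha.

    alpha > 1 moves energy towards higher bins, alpha < 1 towards lower bins.
    Positions past the last bin are clamped to the last bin.
    """
    n = len(column)
    warped = []
    for k in range(n):
        src = min(k / alpha, n - 1)      # source position on the original axis
        lo = int(src)
        hi = min(lo + 1, n - 1)
        frac = src - lo
        warped.append((1 - frac) * column[lo] + frac * column[hi])
    return warped


# Toy column with a single spectral peak at bin 4.
column = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
warped = warp_column(column, alpha=1.5)
peak = max(range(len(warped)), key=lambda k: warped[k])
print("peak moved from bin 4 to bin", peak)
```

Applying a slightly different alpha to each copy of an under-represented dialect would yield extra training images whose time axis is untouched, unlike the cropping and rotation used for natural images.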



Questions

Thank you
