
Page 1:

Deep Temporal Models (Benchmarks and Applications Analysis)
Sek Chai, SRI International
Presented at: NICE 2016, March 7, 2016
© 2016 SRI International

Page 2:

Project Summary

SRI International: Sek Chai (PI), Mohamed Amer, David Zhang, Tim Shields
U. Montreal: Roland Memisevic, Yoshua Bengio
U. Guelph: Graham Taylor, Dhanesh Ramachandram


Goals: Analyze Deep Temporal Models (DTMs). Find approaches to reduce training time, lower memory size, and use low precision.

[Diagram: a high-dimensional data set (audio, video, gesture) feeds Deep Temporal Models (RNN, LSTM, CRBM); benchmarks and applications analysis of the models informs the processor architecture.]

Page 3:

Seeing Humans
•  ChaLearn 2014: This dataset consists of a single user recorded in front of a depth camera, performing natural communicative gestures and speaking in fluent Italian. The dataset focuses on user-independent automatic recognition of a vocabulary of 20 Italian cultural/anthropological signs in image sequences.
•  Challenges:
   •  Multimodal visual cues (RGB-D) and audio
   •  Multi-timescale, unreliable depth cues
   •  No information about the number of gestures within each sequence
   •  High intra-class variability of gesture samples
   •  Low inter-class variability for some gesture categories
   •  Several distractor gestures (out of the vocabulary) are present


S. Escalera, et al., "ChaLearn Looking at People Challenge 2014: Dataset and Results", ECCV-W 2014

Image: Neverova et al. (2015)

Page 4:

DeepGesture Architecture (for ChaLearn Dataset)


State-of-the-art: 88.1% recognition rate

N. Neverova, et al. (2015), "ModDrop: Adaptive Multimodal Gesture Recognition", IEEE PAMI (in press)

[Plot: validation error (%) vs. training stage, for different temporal strides.]

Key Insights
•  We adopted a strategy where "like modalities" are fused first; it resembles the brain's multi-modal fusion strategy.
•  Most previous work on multi-modal learning has fused data:
   –  at the input feature level (early fusion); or
   –  at the level of per-modality classifier outputs (late fusion).
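A minimal sketch of the "like modalities first" fusion placement, in PyTorch. The encoder names and feature sizes (e.g. `intensity_enc`, the 512/183/40-dimensional inputs) and the 21-class output are illustrative assumptions, not the ModDrop implementation.

```python
import torch
import torch.nn as nn

class LikeModalityFusion(nn.Module):
    """Fuse 'like' modalities (intensity + depth video) first, then join the
    remaining modalities in a shared hidden layer (illustrative sizes only)."""
    def __init__(self, dim=128, n_classes=21):  # 20 gestures + no-gesture (assumption)
        super().__init__()
        self.intensity_enc = nn.Linear(512, dim)      # per-modality encoders
        self.depth_enc     = nn.Linear(512, dim)
        self.mocap_enc     = nn.Linear(183, dim)
        self.audio_enc     = nn.Linear(40, dim)
        self.video_shared  = nn.Linear(2 * dim, dim)  # like modalities fused first
        self.shared_hidden = nn.Linear(3 * dim, dim)  # cross-modality fusion
        self.classifier    = nn.Linear(dim, n_classes)

    def forward(self, intensity, depth, mocap, audio):
        v = torch.relu(self.video_shared(torch.cat(
                [self.intensity_enc(intensity), self.depth_enc(depth)], dim=-1)))
        m = torch.relu(self.mocap_enc(mocap))
        a = torch.relu(self.audio_enc(audio))
        h = torch.relu(self.shared_hidden(torch.cat([v, m, a], dim=-1)))
        return self.classifier(h)
```

By contrast, early fusion would concatenate all raw per-modality features before the first layer, and late fusion would train one classifier per modality and combine only their output scores.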

Page 5:

Example Training Complexity (for ChaLearn Dataset)


Classifier | Description | Modality | Training Data Size | Training Time | Epochs | Time (sec)/epoch
Motion detect | Shallow MLP used to detect the start-frame and stop-frame of a given gesture in an action set | Motion capture | 1.663 GB | 20,174 sec (5.6 hrs) | 200,000 | 0.100
Skeleton | Convolutional network trained to extract features from motion capture data (Path M) | Motion capture | 1.663 GB | 69,344 sec (19.25 hrs) | 200,000 | 0.347
Video feature | 3D convolutional layer followed by a 2D convolutional layer, using depth and intensity video (ConvC1 -> ConvC2, ConvD1 -> ConvD2) | Intensity + depth video | 12.211 GB | 82,655 sec (22.9 hrs) | 819 | 100.922
Video | Shared hidden layers using inputs from the previous convolutional layers (HLV1 + HLV2) | Intensity + depth video | 12.211 GB | 170,223 sec (47.28 hrs) | 796 | 213.848
Multimodal | Fully connected shared hidden layer where multimodal inputs are fused (HLS) | All | 13.874 GB | 93,247 sec (25.9 hrs) | 449 | 207.677

Summary
•  Total of 5 days to process 42 GB of training data on the Sharcnet Copper cluster (64 GPUs, 128 CPU cores, 24 cores/node, 64 GB/node, x86, 80 TB RAID attached storage, InfiniBand, 4 Tesla K80s/node).
•  Total # parameters = 7,836,457

Page 6:

Low Precision Neural Networks

Needs: Memory is the main bottleneck, especially for embedded solutions. One estimate* puts a 1B-connection neural network at 12 W.

Current Approaches:
•  Network pruning (*image: Han, et al., NIPS 2015)
•  BinaryConnect (image: Courbariaux, et al., NIPS 2015)
•  Stochastic rounding (Gupta, et al., arXiv 2015)
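As a concrete illustration of the stochastic-rounding approach (in the spirit of Gupta, et al.), a minimal sketch of rounding weights onto a low-precision fixed-point grid follows; the 8-bit word with 4 fractional bits is an assumption, and this is not the project's training code.

```python
import torch

def stochastic_round(w: torch.Tensor, word_bits: int = 8, frac_bits: int = 4) -> torch.Tensor:
    """Quantize a weight tensor to fixed point with stochastic rounding.

    Each value is rounded up with probability equal to its fractional distance
    above the lower grid point, so the rounding is unbiased in expectation
    within the representable range.
    """
    scale = 2.0 ** frac_bits                     # grid spacing is 2**-frac_bits
    limit = 2.0 ** (word_bits - frac_bits - 1)   # saturation range for signed values
    x = torch.clamp(w, -limit, limit - 1.0 / scale) * scale
    floor = torch.floor(x)
    prob_up = x - floor                          # fractional part in [0, 1)
    return (floor + (torch.rand_like(x) < prob_up).float()) / scale
```

Whether this is applied after every epoch or only after the final epoch (the two settings compared on slide 11) is just a matter of where the call is placed in the training loop.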

Page 7:

Subband/Wavelet Decomposition


Subband decomposition enables data reduction by discarding information about certain frequencies where the human visual system is less sensitive.* Can we do the same for learnt representations?

[1] "The Laplacian Pyramid as a Compact Image Code", Burt et. al. [2] Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks, Denton et. al. [3] Image fusion: algorithms and applications, Stathaki.

*Good representation: used extensively in image compression [1], reconstruction [2], and fusion [3].
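A minimal sketch of the one-level Gaussian/Laplacian split used on the following slides, written with OpenCV's pyramid operations; the exact filters and the padding to 32x32 are assumptions for illustration.

```python
import cv2
import numpy as np

def subband_decompose(img: np.ndarray):
    """Split an image into a low-frequency Gaussian band G1 and a
    high-frequency Laplacian band L0 (one Laplacian-pyramid level)."""
    g1 = cv2.pyrDown(img)                                   # blur + 2x downsample: background cues
    up = cv2.pyrUp(g1, dstsize=(img.shape[1], img.shape[0]))
    l0 = img.astype(np.float32) - up.astype(np.float32)     # residual detail: edges
    return l0, g1

# Example: pad a 28x28 MNIST digit to 32x32 first to obtain the
# 32x32 L0 / 16x16 G1 sizes quoted on slide 9.
```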

Page 8:

The MNIST database contains black and white handwritten digits, normalized to 20x20 pixel size. There are 60,000 training images and 10,000 testing images.

Conventional Approach for MNIST data: a single CNN operates directly on the input image I0.

*Image: LeCun, et al. (1998)

Page 9:

Our Approach: Separate Networks for Subband Learning

[Diagram: the input image I0 (28x28) is decomposed into a Laplacian band L0 (32x32), which captures edges and spatial information, and a Gaussian band G1 (16x16), which carries background cues; a separate network learns on each subband and their outputs are combined by fusion.]

Basic Idea: We separate imagery into different frequency bands (e.g., with different information content) such that the neural net can better learn using fewer bits.
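A minimal sketch of one such subband network (PyTorch, LeNet-5-flavored); the 30/60 feature-map counts are taken from the CIFAR-10 slide later in the deck and are only an assumption here, as are the layer details.

```python
import torch
import torch.nn as nn

class SubbandCNN(nn.Module):
    """Small LeNet-5-style CNN trained on a single subband (L0 or G1)."""
    def __init__(self, in_channels=1, in_size=32, n_classes=10, maps=(30, 60)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, maps[0], kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(maps[0], maps[1], kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(maps[1] * (in_size // 4) ** 2, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# One independent network per frequency band (sizes from the diagram above):
net_l0 = SubbandCNN(in_size=32)   # Laplacian band, 32x32
net_g1 = SubbandCNN(in_size=16)   # Gaussian band, 16x16
```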

Page 10:

Validation Error vs. Epochs in MNIST

[Plot: validation error (log scale) vs. epochs for the original image I0 (28x28), the Laplacian band L0 (32x32), the Gaussian band G1 (16x16), and their fusion.]

Discussions
•  A CNN trained on the Laplacian band focuses more on edges (a good feature).
•  CNN(L0) beats CNN(I0) on this dataset.

Page 11:

Robustness to Low Bit Precision Weights

Stochastic rounding after every epoch (error, %):
Weight bits | 32-bit | 16-bit | 8-bit | 4-bit
Original    | 1.13   | 1.13   | 1.11  | 1.32
GBlur       | 1.38   | 1.41   | 1.36  | 1.38
Laplace     | 1.02   | 1.00   | 0.91  | 1.66
Fusion      | 0.95   | 0.89   | 0.89  | 1.13

Stochastic rounding after final epoch (error, %):
Weight bits | 32-bit | 16-bit | 8-bit | 4-bit
Original    | 1.13   | 1.22   | 1.14  | 4.45
GBlur       | 1.38   | 1.34   | 1.35  | 5.24
Laplace     | 1.02   | 0.96   | 1.03  | 6.03
Fusion      | 0.89   | 0.91   | 0.92  | 6.03

Discussions
•  Fusion results are comparable to the original, using half the number of bits.
•  Stochastic rounding after every epoch guides the learning, and is especially useful for low precision.
•  Simple Fusion: equi-weighted average of the softmax outputs.
   Score: s(x) = 0.5 * (s1(x) + s2(x)), with s1(x), s2(x) in [0,1]^10 and ||si(x)||_1 = 1.

Page 12:

CIFAR-10 data set

Discussions:
•  Cluttered color images, more challenging than MNIST.
•  Contains background cues and context that can help recognition (e.g., blue sky for an airplane or water for a ship).
•  The architecture is the same as before (LeNet-5) with 30 and 60 feature maps.
•  Our goal is to show comparative results for low precision. There is no data augmentation.

The CIFAR-10 dataset consists of 60,000 32x32 color images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images.
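Under the same assumptions, the configuration described above (LeNet-5-style with 30 and 60 feature maps, no data augmentation) would reuse the SubbandCNN sketch from slide 9 with 3-channel inputs:

```python
# Hypothetical instantiation for CIFAR-10 subbands (3-channel color input).
net_l0_cifar = SubbandCNN(in_channels=3, in_size=32, n_classes=10, maps=(30, 60))
net_g1_cifar = SubbandCNN(in_channels=3, in_size=16, n_classes=10, maps=(30, 60))
```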

Page 13:

CIFAR-10 Performance

[Plot: test error vs. epoch for Original, Laplacian, GBlur, and Fusion. Annotations: cluttered images combined with a high learning rate; Gaussian blur removes noise in clutter; the Laplacian enhances the foreground.]

Page 14:

Conclusion

•  Hybrid multimodal neural networks improve algorithmic performance.
•  Fusion of learnt representations is important.
•  Low precision networks show promise.
•  Stop by and visit the poster at NICE 2016.

Chai, et al., "Low Precision Neural Networks using Subband Decomposition", CogArch, April 2016