
Page 1:

Deep Temporal Models (Benchmarks and Applications Analysis)
Sek Chai, SRI International
Presented at: NICE 2016, March 7, 2016
© 2016 SRI International

Page 2:

Project Summary

SRI International: Sek Chai (PI), Mohamed Amer, David Zhang, Tim Shields
U. Montreal: Roland Memisevic, Yoshua Bengio
U. Guelph: Graham Taylor, Dhanesh Ramachandram


Goals: Analyze Deep Temporal Models (DTMs). Find approaches to reduce training time, lower memory size, and use low precision.

[Diagram: a high-dimensional data set (audio, video, gesture) feeds Deep Temporal Models (RNN, LSTM, CRBM); benchmarks and applications analysis of the models informs the processor architecture.]

Page 3:

Seeing Humans
•  ChaLearn 2014: This dataset consists of a single user recorded in front of a depth camera, performing natural communicative gestures and speaking in fluent Italian. The dataset focuses on user-independent automatic recognition of a vocabulary of 20 Italian cultural/anthropological signs in image sequences.
•  Challenges:
   •  Multimodal visual cues (RGB-D) and audio
   •  Multi-timescale, unreliable depth cues
   •  No information about the number of gestures within each sequence
   •  High intra-class variability of gesture samples
   •  Low inter-class variability for some gesture categories
   •  Several distractor gestures (out of the vocabulary) are present


S. Escalera, et al., "ChaLearn Looking at People Challenge 2014: Dataset and Results", ECCV-W 2014

Image: Neverova et al. (2015)

Page 4:

DeepGesture Architecture (for ChaLearn Dataset)


State-of-the-art: 88.1% recognition rate

N. Neverova, et al. (2015), "ModDrop: Adaptive Multimodal Gesture Recognition", IEEE PAMI (in press)

[Plot: validation error (%) vs. training stage, for different temporal strides.]

Key Insights
•  We adopted a strategy where "like modalities" are fused first; it resembles the brain's multi-modal fusion strategy.
•  Most previous work on multi-modal learning has fused data:
   –  at the input feature level (early fusion); or
   –  at the level of per-modality classifier outputs (late fusion).
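A minimal sketch of the "like modalities first" fusion placement, in PyTorch. The encoder names and feature sizes (e.g. `intensity_enc`, the 512/183/40-dimensional inputs) and the 21-class output are illustrative assumptions, not the ModDrop implementation.

```python
import torch
import torch.nn as nn

class LikeModalityFusion(nn.Module):
    """Fuse 'like' modalities (intensity + depth video) first, then join the
    remaining modalities in a shared hidden layer (illustrative sizes only)."""
    def __init__(self, dim=128, n_classes=21):  # 20 gestures + no-gesture (assumption)
        super().__init__()
        self.intensity_enc = nn.Linear(512, dim)      # per-modality encoders
        self.depth_enc     = nn.Linear(512, dim)
        self.mocap_enc     = nn.Linear(183, dim)
        self.audio_enc     = nn.Linear(40, dim)
        self.video_shared  = nn.Linear(2 * dim, dim)  # like modalities fused first
        self.shared_hidden = nn.Linear(3 * dim, dim)  # cross-modality fusion
        self.classifier    = nn.Linear(dim, n_classes)

    def forward(self, intensity, depth, mocap, audio):
        v = torch.relu(self.video_shared(torch.cat(
                [self.intensity_enc(intensity), self.depth_enc(depth)], dim=-1)))
        m = torch.relu(self.mocap_enc(mocap))
        a = torch.relu(self.audio_enc(audio))
        h = torch.relu(self.shared_hidden(torch.cat([v, m, a], dim=-1)))
        return self.classifier(h)
```

By contrast, early fusion would concatenate all raw per-modality features before the first layer, and late fusion would train one classifier per modality and combine only their output scores.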

Page 5:

Example Training Complexity (for ChaLearn Dataset)


Classifier | Description | Modality | Training Data Size | Training Time | Epochs | Time (sec)/epoch
Motion detect | Shallow MLP used to detect the start-frame and stop-frame of a given gesture in an action set | Motion capture | 1.663 GB | 20,174 sec (5.6 hrs) | 200,000 | 0.100
Skeleton | Convolutional network trained to extract features from motion capture data (Path M) | Motion capture | 1.663 GB | 69,344 sec (19.25 hrs) | 200,000 | 0.347
Video feature | 3D convolutional layer followed by a 2D convolutional layer, using depth and intensity video (ConvC1 -> ConvC2, ConvD1 -> ConvD2) | Intensity + depth video | 12.211 GB | 82,655 sec (22.9 hrs) | 819 | 100.922
Video | Shared hidden layers using inputs from the previous convolutional layers (HLV1 + HLV2) | Intensity + depth video | 12.211 GB | 170,223 sec (47.28 hrs) | 796 | 213.848
Multimodal | Fully connected shared hidden layer where multimodal inputs are fused (HLS) | All | 13.874 GB | 93,247 sec (25.9 hrs) | 449 | 207.677

Summary
•  Total of 5 days to process 42 GB of training data on the Sharcnet Copper cluster (64 GPUs, 128 CPU cores, 24 cores/node, 64 GB/node, x86, 80 TB RAID attached storage, InfiniBand, 4 Tesla K80s/node).
•  Total # parameters = 7,836,457

Page 6:

Low Precision Neural Networks

Needs: Memory is the main bottleneck, especially for embedded solutions. One estimate* puts a 1B-connection neural network at 12 W.

Current Approaches:
•  Network pruning (*image: Han, et al., NIPS 2015)
•  BinaryConnect (image: Courbariaux, et al., NIPS 2015)
•  Stochastic rounding (Gupta, et al., arXiv 2015)
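As a concrete illustration of the stochastic-rounding approach (in the spirit of Gupta, et al.), a minimal sketch of rounding weights onto a low-precision fixed-point grid follows; the 8-bit word with 4 fractional bits is an assumption, and this is not the project's training code.

```python
import torch

def stochastic_round(w: torch.Tensor, word_bits: int = 8, frac_bits: int = 4) -> torch.Tensor:
    """Quantize a weight tensor to fixed point with stochastic rounding.

    Each value is rounded up with probability equal to its fractional distance
    above the lower grid point, so the rounding is unbiased in expectation
    within the representable range.
    """
    scale = 2.0 ** frac_bits                     # grid spacing is 2**-frac_bits
    limit = 2.0 ** (word_bits - frac_bits - 1)   # saturation range for signed values
    x = torch.clamp(w, -limit, limit - 1.0 / scale) * scale
    floor = torch.floor(x)
    prob_up = x - floor                          # fractional part in [0, 1)
    return (floor + (torch.rand_like(x) < prob_up).float()) / scale
```

Whether this is applied after every epoch or only after the final epoch (the two settings compared on slide 11) is just a matter of where the call is placed in the training loop.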

Page 7:

Subband/Wavelet Decomposition


Subband decomposition enables data reduction by discarding information about certain frequencies where the human visual system is less sensitive.* Can we do the same for learnt representations?

[1] "The Laplacian Pyramid as a Compact Image Code", Burt et. al. [2] Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks, Denton et. al. [3] Image fusion: algorithms and applications, Stathaki.

*Good representation: used extensively in image compression [1], reconstruction [2], and fusion [3].
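A minimal sketch of the one-level Gaussian/Laplacian split used on the following slides, written with OpenCV's pyramid operations; the exact filters and the padding to 32x32 are assumptions for illustration.

```python
import cv2
import numpy as np

def subband_decompose(img: np.ndarray):
    """Split an image into a low-frequency Gaussian band G1 and a
    high-frequency Laplacian band L0 (one Laplacian-pyramid level)."""
    g1 = cv2.pyrDown(img)                                   # blur + 2x downsample: background cues
    up = cv2.pyrUp(g1, dstsize=(img.shape[1], img.shape[0]))
    l0 = img.astype(np.float32) - up.astype(np.float32)     # residual detail: edges
    return l0, g1

# Example: pad a 28x28 MNIST digit to 32x32 first to obtain the
# 32x32 L0 / 16x16 G1 sizes quoted on slide 9.
```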

Page 8:

The MNIST database contains black and white handwritten digits, normalized to 20x20 pixel size. There are 60,000 training images and 10,000 testing images.

Conventional Approach for MNIST data: a single CNN operates directly on the input image I0.

*Image: LeCun, et al. (1998)

Page 9:

Our Approach: Separate Networks for Subband Learning

[Diagram: the input image I0 (28x28) is decomposed into a Laplacian band L0 (32x32), which captures edges and spatial information, and a Gaussian band G1 (16x16), which carries background cues; a separate network learns on each subband and their outputs are combined by fusion.]

Basic Idea: We separate imagery into different frequency bands (e.g., with different information content) such that the neural net can better learn using fewer bits.
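A minimal sketch of one such subband network (PyTorch, LeNet-5-flavored); the 30/60 feature-map counts are taken from the CIFAR-10 slide later in the deck and are only an assumption here, as are the layer details.

```python
import torch
import torch.nn as nn

class SubbandCNN(nn.Module):
    """Small LeNet-5-style CNN trained on a single subband (L0 or G1)."""
    def __init__(self, in_channels=1, in_size=32, n_classes=10, maps=(30, 60)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, maps[0], kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(maps[0], maps[1], kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(maps[1] * (in_size // 4) ** 2, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# One independent network per frequency band (sizes from the diagram above):
net_l0 = SubbandCNN(in_size=32)   # Laplacian band, 32x32
net_g1 = SubbandCNN(in_size=16)   # Gaussian band, 16x16
```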

Page 10:

Validation Error vs. Epochs in MNIST

[Plot: validation error (log scale) vs. epochs for the original image I0 (28x28), the Laplacian band L0 (32x32), the Gaussian band G1 (16x16), and their fusion.]

Discussions
•  A CNN trained on the Laplacian band focuses more on edges (a good feature).
•  CNN(L0) beats CNN(I0) on this dataset.

Page 11:

Robustness to Low Bit Precision Weights

Stochastic rounding after every epoch (error, %):
Weight bits | 32-bit | 16-bit | 8-bit | 4-bit
Original    | 1.13   | 1.13   | 1.11  | 1.32
GBlur       | 1.38   | 1.41   | 1.36  | 1.38
Laplace     | 1.02   | 1.00   | 0.91  | 1.66
Fusion      | 0.95   | 0.89   | 0.89  | 1.13

Stochastic rounding after final epoch (error, %):
Weight bits | 32-bit | 16-bit | 8-bit | 4-bit
Original    | 1.13   | 1.22   | 1.14  | 4.45
GBlur       | 1.38   | 1.34   | 1.35  | 5.24
Laplace     | 1.02   | 0.96   | 1.03  | 6.03
Fusion      | 0.89   | 0.91   | 0.92  | 6.03

Discussions
•  Fusion results are comparable to the original, using half the number of bits.
•  Stochastic rounding after every epoch guides the learning, and is especially useful for low precision.
•  Simple Fusion: equi-weighted average of the softmax outputs.
   Score: s(x) = 0.5 * (s1(x) + s2(x)), with s1(x), s2(x) in [0,1]^10 and ||si(x)||_1 = 1.

Page 12:

CIFAR-10 data set

Discussions:
•  Cluttered color images, more challenging than MNIST.
•  Contains background cues and context that can help recognition (e.g., blue sky for an airplane or water for a ship).
•  The architecture is the same as before (LeNet-5) with 30 and 60 feature maps.
•  Our goal is to show comparative results for low precision. There is no data augmentation.

The CIFAR-10 dataset consists of 60,000 32x32 color images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images.
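Under the same assumptions, the configuration described above (LeNet-5-style with 30 and 60 feature maps, no data augmentation) would reuse the SubbandCNN sketch from slide 9 with 3-channel inputs:

```python
# Hypothetical instantiation for CIFAR-10 subbands (3-channel color input).
net_l0_cifar = SubbandCNN(in_channels=3, in_size=32, n_classes=10, maps=(30, 60))
net_g1_cifar = SubbandCNN(in_channels=3, in_size=16, n_classes=10, maps=(30, 60))
```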

Page 13:

CIFAR-10 Performance

[Plot: test error vs. epoch for Original, Laplacian, GBlur, and Fusion. Annotations: cluttered images combined with a high learning rate; Gaussian blur removes noise in clutter; the Laplacian enhances the foreground.]

Page 14:

Conclusion

•  Hybrid multimodal neural networks improve algorithmic performance.
•  Fusion of learnt representations is important.
•  Low precision networks show promise.
•  Stop by and visit the poster at NICE 2016.

Chai, et al., "Low Precision Neural Networks using Subband Decomposition", CogArch, April 2016