10-417/10-617 Intermediate Deep Learning: Fall 2019
Russ Salakhutdinov
Machine Learning Department
rsalakhu@cs.cmu.edu
https://deeplearning-cmu-10417.github.io/
Variational Autoencoders
Motivating Example
• Can we generate images from natural language descriptions?
  - A stop sign is flying in blue skies
  - A pale yellow school bus is flying in blue skies
  - A herd of elephants is flying in blue skies
  - A large commercial airplane is flying in blue skies
(Mansimov, Parisotto, Ba, Salakhutdinov, 2015)
Overall Model
[Figure: overall model, a variational autoencoder with a stochastic layer]
Motivation
• Hinton, G.E., Dayan, P., Frey, B.J. and Neal, R., Science 1995
[Figure: belief network over input data v with hidden layers h1, h2, h3 and weights W1, W2, W3; top-down generative process, bottom-up approximate inference]
• Kingma & Welling, 2014
• Rezende, Mohamed, Wierstra, 2014
• Mnih & Gregor, 2014
• Bornschein & Bengio, 2015
• Tang & Salakhutdinov, 2013
Variational Autoencoders (VAEs)
• The VAE defines a generative process in terms of ancestral sampling through a cascade of hidden stochastic layers:
$$p_\theta(\mathbf{v}, \mathbf{h}^1, \dots, \mathbf{h}^L) = p_\theta(\mathbf{h}^L)\, p_\theta(\mathbf{h}^{L-1} \mid \mathbf{h}^L) \cdots p_\theta(\mathbf{v} \mid \mathbf{h}^1)$$
• $\theta$ denotes the parameters of the VAE.
• $L$ is the number of stochastic layers.
• Each term may denote a complicated nonlinear relationship.
• Sampling and probability evaluation is tractable for each conditional.
[Figure: generative process over input data v through stochastic layers h1, h2, h3 with weights W1, W2, W3]
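As a concrete illustration of ancestral sampling, here is a minimal PyTorch sketch of a three-layer cascade, assuming (as in the example on the next slide) that each conditional is a diagonal Gaussian whose mean and log-variance come from a small one-layer net; all layer sizes and module names are illustrative, not the lecture's exact architecture.

```python
import torch
import torch.nn as nn

class StochasticLayer(nn.Module):
    """One conditional p(h_below | h_above): a one-layer net outputs the mean
    and log-variance of a diagonal Gaussian (illustrative parameterization)."""
    def __init__(self, dim_in, dim_out):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim_in, 200), nn.Tanh())
        self.mean = nn.Linear(200, dim_out)
        self.logvar = nn.Linear(200, dim_out)

    def forward(self, h_above):
        hidden = self.net(h_above)
        mu, logvar = self.mean(hidden), self.logvar(hidden)
        return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

# Ancestral sampling h3 -> h2 -> h1 -> v, mirroring the factorization above.
p_h2_given_h3 = StochasticLayer(25, 50)
p_h1_given_h2 = StochasticLayer(50, 100)
p_v_given_h1  = StochasticLayer(100, 784)

h3 = torch.randn(16, 25)   # top-level prior p(h3) = N(0, I)
v = p_v_given_h1(p_h1_given_h2(p_h2_given_h3(h3)))   # sampled "data"
```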
VAE: Example
• The VAE defines a generative process in terms of ancestral sampling through a cascade of hidden stochastic layers.
• Each conditional term here denotes a one-layer neural net.
[Figure: example architecture with two stochastic layers separated by a deterministic layer]
• $\theta$ denotes the parameters of the VAE.
• $L$ is the number of stochastic layers.
• Sampling and probability evaluation is tractable for each conditional.
Recognition Network
• The recognition model is defined in terms of an analogous factorization:
$$q_\phi(\mathbf{h}^1, \dots, \mathbf{h}^L \mid \mathbf{v}) = q_\phi(\mathbf{h}^1 \mid \mathbf{v})\, q_\phi(\mathbf{h}^2 \mid \mathbf{h}^1) \cdots q_\phi(\mathbf{h}^L \mid \mathbf{h}^{L-1})$$
• Each term may denote a complicated nonlinear relationship.
• We assume that the conditionals
$$q_\phi(\mathbf{h}^1 \mid \mathbf{v}),\; q_\phi(\mathbf{h}^2 \mid \mathbf{h}^1),\; \dots,\; q_\phi(\mathbf{h}^L \mid \mathbf{h}^{L-1})$$
are Gaussians with diagonal covariances.
[Figure: top-down generative process (weights W1–W3) and bottom-up approximate inference over input data v and hidden layers h1–h3]
Variational Bound
• The VAE is trained to maximize the variational lower bound:
$$\mathcal{L}(\mathbf{v}) = \mathbb{E}_{q_\phi(\mathbf{h} \mid \mathbf{v})}\left[\log \frac{p_\theta(\mathbf{v}, \mathbf{h})}{q_\phi(\mathbf{h} \mid \mathbf{v})}\right] = \log p_\theta(\mathbf{v}) - \mathrm{KL}\!\left(q_\phi(\mathbf{h} \mid \mathbf{v}) \,\|\, p_\theta(\mathbf{h} \mid \mathbf{v})\right) \le \log p_\theta(\mathbf{v})$$
• Trading off the data log-likelihood and the KL divergence from the true posterior.
• It is hard to optimize the variational bound with respect to the recognition network: naive gradient estimators have high variance.
• The key idea of Kingma and Welling is to use the reparameterization trick.
[Figure: generative and recognition networks over input data v and hidden layers h1–h3]
Reparameterization Trick
• Assume that the recognition distribution is Gaussian:
$$q_\phi(\mathbf{h} \mid \mathbf{v}) = \mathcal{N}\!\left(\mathbf{h} \mid \boldsymbol{\mu}(\mathbf{v}), \operatorname{diag}(\boldsymbol{\sigma}^2(\mathbf{v}))\right)$$
with mean and covariance computed from the state of the hidden units at the previous layer.
• Alternatively, we can express this in terms of an auxiliary variable $\boldsymbol{\epsilon}$:
$$\mathbf{h} = \boldsymbol{\mu}(\mathbf{v}) + \boldsymbol{\sigma}(\mathbf{v}) \odot \boldsymbol{\epsilon}, \qquad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, I)$$
Reparameterization Trick
• Assume that the recognition distribution is Gaussian:
$$q_\phi(\mathbf{h} \mid \mathbf{v}) = \mathcal{N}\!\left(\mathbf{h} \mid \boldsymbol{\mu}(\mathbf{v}), \operatorname{diag}(\boldsymbol{\sigma}^2(\mathbf{v}))\right)$$
• Or, equivalently, the recognition distribution can be expressed in terms of a deterministic mapping (a deterministic encoder):
$$\mathbf{h} = \boldsymbol{\mu}(\mathbf{v}) + \boldsymbol{\sigma}(\mathbf{v}) \odot \boldsymbol{\epsilon}, \qquad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, I)$$
• The distribution of $\boldsymbol{\epsilon}$ does not depend on $\phi$.
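A minimal PyTorch sketch of this deterministic mapping, assuming a hypothetical one-layer encoder; the point is that for a fixed ε, h is a deterministic, differentiable function of the encoder parameters φ.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 200), nn.Tanh())
to_mu, to_logvar = nn.Linear(200, 50), nn.Linear(200, 50)

def sample_h(v):
    """Reparameterized sample from q(h|v) = N(mu(v), diag(sigma(v)^2)).
    eps carries all the randomness, so h is a deterministic, differentiable
    function of v and the encoder parameters for a fixed eps."""
    e = encoder(v)
    mu, logvar = to_mu(e), to_logvar(e)
    eps = torch.randn_like(mu)   # eps ~ N(0, I), independent of phi
    return mu + torch.exp(0.5 * logvar) * eps

v = torch.rand(8, 784)
h = sample_h(v)
h.sum().backward()   # gradients flow through mu and sigma by backprop
```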
Computing the Gradients
• The gradient w.r.t. the parameters, both recognition ($\phi$) and generative ($\theta$):
$$\nabla_{\theta,\phi}\, \mathbb{E}_{q_\phi(\mathbf{h} \mid \mathbf{v})}\left[\log \frac{p_\theta(\mathbf{v}, \mathbf{h})}{q_\phi(\mathbf{h} \mid \mathbf{v})}\right] = \mathbb{E}_{\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, I)}\left[\nabla_{\theta,\phi} \log \frac{p_\theta(\mathbf{v}, \mathbf{h}(\mathbf{v}, \boldsymbol{\epsilon}))}{q_\phi(\mathbf{h}(\mathbf{v}, \boldsymbol{\epsilon}) \mid \mathbf{v})}\right]$$
• The mapping $\mathbf{h}(\mathbf{v}, \boldsymbol{\epsilon})$ is a deterministic neural net for fixed $\boldsymbol{\epsilon}$.
• Gradients can be computed by backprop.
Computing the Gradients
• The gradient w.r.t. the parameters, recognition and generative:
$$\nabla_{\theta,\phi}\, \mathbb{E}_{q_\phi(\mathbf{h} \mid \mathbf{v})}\left[\log \frac{p_\theta(\mathbf{v}, \mathbf{h})}{q_\phi(\mathbf{h} \mid \mathbf{v})}\right]$$
• Approximate the expectation by generating $k$ samples from $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, I)$:
$$\frac{1}{k} \sum_{i=1}^{k} \nabla_{\theta,\phi} \log w\!\left(\mathbf{v}, \mathbf{h}(\mathbf{v}, \boldsymbol{\epsilon}_i), \theta, \phi\right)$$
where we defined unnormalized importance weights:
$$w\!\left(\mathbf{v}, \mathbf{h}, \theta, \phi\right) = \frac{p_\theta(\mathbf{v}, \mathbf{h})}{q_\phi(\mathbf{h} \mid \mathbf{v})}$$
• VAE update: low variance, as it uses the log-likelihood gradients with respect to the latent variables.
VAE: Assumptions
• Remember the variational bound:
$$\mathcal{L}(\mathbf{v}) = \log p_\theta(\mathbf{v}) - \mathrm{KL}\!\left(q_\phi(\mathbf{h} \mid \mathbf{v}) \,\|\, p_\theta(\mathbf{h} \mid \mathbf{v})\right)$$
• For this bound to be tight, the variational assumptions must be approximately satisfied: the posterior distribution must be approximately factorial (common practice) and predictable with a feed-forward net.
• We show that we can relax these assumptions using a tighter lower bound on the marginal log-likelihood.
Importance Weighted Autoencoders
• Consider the following k-sample importance weighting of the log-likelihood:
$$\mathcal{L}_k(\mathbf{v}) = \mathbb{E}_{\mathbf{h}^{(1)}, \dots, \mathbf{h}^{(k)} \sim q_\phi(\mathbf{h} \mid \mathbf{v})}\left[\log \frac{1}{k} \sum_{i=1}^{k} \frac{p_\theta(\mathbf{v}, \mathbf{h}^{(i)})}{q_\phi(\mathbf{h}^{(i)} \mid \mathbf{v})}\right]$$
where $\mathbf{h}^{(1)}, \dots, \mathbf{h}^{(k)}$ are sampled from the recognition network, and the ratios $p_\theta(\mathbf{v}, \mathbf{h}^{(i)}) / q_\phi(\mathbf{h}^{(i)} \mid \mathbf{v})$ are the unnormalized importance weights.
[Figure: generative and recognition networks over input data v and hidden layers h1–h3]
Importance Weighted Autoencoders
• Consider the following k-sample importance weighting of the log-likelihood:
$$\mathcal{L}_k(\mathbf{v}) = \mathbb{E}_{\mathbf{h}^{(1)}, \dots, \mathbf{h}^{(k)} \sim q_\phi(\mathbf{h} \mid \mathbf{v})}\left[\log \frac{1}{k} \sum_{i=1}^{k} w\!\left(\mathbf{v}, \mathbf{h}^{(i)}\right)\right]$$
• This is a lower bound on the marginal log-likelihood: $\mathcal{L}_k(\mathbf{v}) \le \log p_\theta(\mathbf{v})$.
• Special case of k = 1: same as the standard VAE objective.
• Using more samples → improves the tightness of the bound.
Tighter Lower Bound
• For all k, the lower bounds satisfy:
$$\log p_\theta(\mathbf{v}) \ge \mathcal{L}_{k+1}(\mathbf{v}) \ge \mathcal{L}_k(\mathbf{v})$$
• Using more samples can only improve the tightness of the bound.
• Moreover, if $p_\theta(\mathbf{v}, \mathbf{h}) / q_\phi(\mathbf{h} \mid \mathbf{v})$ is bounded, then:
$$\mathcal{L}_k(\mathbf{v}) \to \log p_\theta(\mathbf{v}) \quad \text{as } k \to \infty$$
Computing the Gradients
• We can obtain an unbiased estimate of the gradient using the reparameterization trick:
$$\nabla_{\theta,\phi}\, \mathcal{L}_k(\mathbf{v}) = \mathbb{E}_{\boldsymbol{\epsilon}_1, \dots, \boldsymbol{\epsilon}_k}\left[\sum_{i=1}^{k} \widetilde{w}_i\, \nabla_{\theta,\phi} \log w\!\left(\mathbf{v}, \mathbf{h}(\mathbf{v}, \boldsymbol{\epsilon}_i), \theta, \phi\right)\right]$$
where we define normalized importance weights:
$$\widetilde{w}_i = \frac{w_i}{\sum_{j=1}^{k} w_j}$$
IWAEs vs. VAEs
• Draw k samples from the recognition network, i.e. k sets of auxiliary variables $\boldsymbol{\epsilon}_1, \dots, \boldsymbol{\epsilon}_k$.
• Obtain the following Monte Carlo estimate of the gradient:
$$\sum_{i=1}^{k} \widetilde{w}_i\, \nabla_{\theta,\phi} \log w\!\left(\mathbf{v}, \mathbf{h}(\mathbf{v}, \boldsymbol{\epsilon}_i), \theta, \phi\right)$$
• Compare this to the VAE's estimate of the gradient:
$$\frac{1}{k} \sum_{i=1}^{k} \nabla_{\theta,\phi} \log w\!\left(\mathbf{v}, \mathbf{h}(\mathbf{v}, \boldsymbol{\epsilon}_i), \theta, \phi\right)$$
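The two estimators differ only in how the k log-weights are combined. A short PyTorch sketch, assuming `log_w` is a [batch, k] tensor of log unnormalized importance weights log wᵢ = log p(v, hᵢ) − log q(hᵢ|v) computed from reparameterized samples; differentiating the IWAE objective through `logsumexp` automatically produces the normalized-weight estimator above.

```python
import math
import torch

def vae_objective(log_w):
    # (1/k) sum_i log w_i: uniform 1/k weights on the gradients of log w_i
    return log_w.mean(dim=1)

def iwae_objective(log_w):
    # log (1/k) sum_i w_i, computed stably; autograd weights each
    # grad of log w_i by the normalized importance weight w_i / sum_j w_j
    k = log_w.shape[1]
    return torch.logsumexp(log_w, dim=1) - math.log(k)

# Illustrative check on random log-weights: the gradient of the IWAE
# objective w.r.t. log_w is exactly the softmax (normalized weights).
log_w = torch.randn(4, 10, requires_grad=True)
iwae_objective(log_w).sum().backward()
w_tilde = torch.softmax(log_w, dim=1)
print(torch.allclose(log_w.grad, w_tilde, atol=1e-6))   # True
```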
IWAE: Intuition
• The gradient of the log weights $\nabla_{\theta,\phi} \log w_i$ decomposes into two terms.
• First term:
  - Decoder: encourages the generative model to assign high probability to each latent state $\mathbf{h}_i$.
  - Encoder: encourages the recognition net to adjust its latent states $\mathbf{h}_i$ so that the generative network makes better predictions.
[Figure: deterministic encoder and deterministic decoder over input data v and hidden layers h1–h3]
IWAE: Intuition
• The gradient of the log weights $\nabla_{\theta,\phi} \log w_i$ decomposes into two terms.
• Second term:
  - Encoder: encourages the recognition network to have a spread-out distribution over predictions.
[Figure: deterministic encoder and deterministic decoder over input data v and hidden layers h1–h3]
Two Architectures
• For the MNIST experiments, we considered two architectures:
  - 1 stochastic layer: 784 → 200 → 200 (deterministic layers) → 50 (stochastic layer)
  - 2 stochastic layers: 784 → 200 → 200 (deterministic layers) → 100 (stochastic layer) → 100 → 100 (deterministic layers) → 50 (stochastic layer)
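A sketch of the smaller architecture's recognition side, using the layer sizes from the slide (784 → 200 → 200 deterministic, 50-dimensional stochastic layer); the module structure and activation choice are illustrative, not the lecture's exact setup.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """784 -> 200 -> 200 deterministic units, then a 50-dim stochastic layer."""
    def __init__(self):
        super().__init__()
        self.det = nn.Sequential(
            nn.Linear(784, 200), nn.Tanh(),
            nn.Linear(200, 200), nn.Tanh(),
        )
        self.mu = nn.Linear(200, 50)
        self.logvar = nn.Linear(200, 50)

    def forward(self, x, k=1):
        h = self.det(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # draw k reparameterized samples per input, as the k-sample bounds need
        eps = torch.randn(k, *mu.shape)
        return mu + torch.exp(0.5 * logvar) * eps   # shape [k, batch, 50]

samples = Encoder()(torch.rand(32, 784), k=5)
```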
MNIST Results
[Results tables comparing VAEs and IWAEs on MNIST; not preserved in this extraction]
Latent Space Representation
• Both VAEs and IWAEs tend to learn latent representations with effective dimensions far below their capacity.
• Measure the activity of a latent dimension $u$ using the statistic:
$$A_u = \mathrm{Cov}_{\mathbf{x}}\!\left(\mathbb{E}_{u \sim q(u \mid \mathbf{x})}[u]\right)$$
• The distribution of $A_u$ consists of two separated modes.
• Inactive dimensions → units dying out.
• Optimization issue?
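A small sketch of computing this statistic, assuming `posterior_means` holds E[u|x] for each test example (rows) and latent dimension (columns); the 10⁻² activity threshold follows the convention in the IWAE paper (Burda et al.).

```python
import torch

def latent_activity(posterior_means, threshold=1e-2):
    """A_u = Cov_x( E_{u ~ q(u|x)}[u] ): variance across the dataset of each
    latent dimension's posterior mean. Dimensions with A_u below the
    threshold have effectively 'died out'."""
    a = posterior_means.var(dim=0)
    return a, (a > threshold).sum().item()

# Synthetic example: 20 active dimensions, 30 near-constant ones.
means = torch.randn(10000, 50) * torch.cat([torch.ones(20), 0.01 * torch.ones(30)])
activity, num_active = latent_activity(means)
print(num_active)   # ~20 active dimensions
```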
IWAEs vs. VAEs
[Comparison figures; not preserved in this extraction]
OMNIGLOT Experiments
[Results; not preserved in this extraction]
Modeling Image Patches: BSDS Dataset
• Model 8×8 patches: 64 → 500 (deterministic layer) → 40 (stochastic layer), i.e. one stochastic layer.
• Report test log-likelihoods on 10^6 8×8 patches extracted from the BSDS test dataset.
• Evaluation protocol established by Uria, Murray and Larochelle:
  - add uniform noise between 0 and 1, divide by 256,
  - subtract the mean and discard the last pixel.
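The preprocessing steps above are mechanical enough to spell out; a NumPy sketch, assuming `patches` is a uint8 [N, 64] array of raw 8×8 patches (function and variable names are illustrative).

```python
import numpy as np

def preprocess_patches(patches, rng):
    """Uria/Murray/Larochelle protocol: dequantize with uniform noise,
    rescale, center each patch, and drop the last pixel (after mean
    subtraction it is determined by the others, leaving 63 dimensions)."""
    x = patches.astype(np.float64)
    x += rng.uniform(0.0, 1.0, size=x.shape)   # add uniform [0, 1) noise
    x /= 256.0                                 # divide by 256
    x -= x.mean(axis=1, keepdims=True)         # subtract per-patch mean
    return x[:, :-1]                           # discard the last pixel

rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(1000, 64), dtype=np.uint8)
data = preprocess_patches(raw, rng)   # shape (1000, 63)
```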
Test Log-probabilities

Model                                                      nats         bits/pixel
RNADE, 6 hidden layers (Uria et al. 2013)                  155.2        3.55
MoG, 200 full-covariance mixtures (Zoran and Weiss 2012)   152.8        3.50
IWAE (k=500)                                               151.4        3.47
VAE (k=500)                                                148.0        3.39
GSM (Gaussian Scale Mixture)                               142          3.25
ICA                                                        111          2.54
PCA                                                        96           2.21

(Burda 2015)
Learned Filters
[Figure: filters learned on image patches; not preserved in this extraction]
(Burda 2015)
Motivating Example
• Can we generate images from natural language descriptions?
  - A stop sign is flying in blue skies
  - A pale yellow school bus is flying in blue skies
  - A herd of elephants is flying in blue skies
  - A large commercial airplane is flying in blue skies
(Mansimov, Parisotto, Ba, Salakhutdinov, 2015)
Overall Model
[Figure: overall model, a variational autoencoder with a stochastic layer]
Sequence-to-Sequence
• Sequence-to-sequence framework (Sutskever et al. 2014; Cho et al. 2014; Srivastava et al. 2015).
• Caption (y) is represented as a sequence of consecutive words.
• Image (x) is represented as a sequence of patches drawn on a canvas.
• Attention mechanism over:
  - Words: which words to focus on when generating a patch.
  - Image locations: where to place the generated patches on the canvas.
Representing Captions: Bidirectional RNN
• Forward RNN reads the sentence y from left to right:
$$\overrightarrow{\mathbf{h}}_j = \overrightarrow{\mathrm{RNN}}\!\left(\overrightarrow{\mathbf{h}}_{j-1}, y_j\right)$$
• Backward RNN reads the sentence y from right to left:
$$\overleftarrow{\mathbf{h}}_j = \overleftarrow{\mathrm{RNN}}\!\left(\overleftarrow{\mathbf{h}}_{j+1}, y_j\right)$$
• The hidden states are then concatenated:
$$\mathbf{h}_j = \left[\overrightarrow{\mathbf{h}}_j;\, \overleftarrow{\mathbf{h}}_j\right]$$
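A minimal PyTorch sketch of this bidirectional encoding using the built-in `nn.LSTM`; vocabulary size and dimensions are illustrative.

```python
import torch
import torch.nn as nn

vocab_size, emb_dim, hid_dim = 10000, 128, 256
embed = nn.Embedding(vocab_size, emb_dim)
birnn = nn.LSTM(emb_dim, hid_dim, batch_first=True, bidirectional=True)

y = torch.randint(0, vocab_size, (4, 12))   # batch of 12-word captions
states, _ = birnn(embed(y))                 # [4, 12, 2 * hid_dim]
# states[:, j] = [h_j_forward ; h_j_backward]: the concatenated hidden
# states serve as the per-word representation h_j.
```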
DRAW Model: Write Operator
• At each step the model generates a p×p patch.
• It gets transformed onto a w×h canvas using two arrays of Gaussian filter banks, whose filter locations and scales are computed from the state of the generative network.
(Gregor et al. 2015)
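A NumPy sketch of the DRAW-style write for a single image, following Gregor et al. (2015): an N×N patch is splattered onto the canvas as Fyᵀ · patch · Fx, where Fx and Fy hold evenly spaced 1-D Gaussian filters. In the full model the grid center (gx, gy), stride δ, variance σ², and intensity γ are emitted by a linear map of the generative state; here they are passed in directly for illustration.

```python
import numpy as np

def filterbank(grid_dim, img_dim, center, delta, sigma2):
    """grid_dim 1-D Gaussian filters over an axis of length img_dim.
    Filter i is centered at center + (i - grid_dim/2 - 0.5) * delta."""
    i = np.arange(grid_dim)
    mu = center + (i - grid_dim / 2.0 - 0.5) * delta            # [N]
    a = np.arange(img_dim)
    F = np.exp(-((a[None, :] - mu[:, None]) ** 2) / (2.0 * sigma2))
    return F / np.maximum(F.sum(axis=1, keepdims=True), 1e-8)   # row-normalize

def write(patch, canvas, gx, gy, delta, sigma2, gamma):
    """canvas += (1/gamma) * F_y^T @ patch @ F_x (the DRAW write operator)."""
    N = patch.shape[0]
    Fx = filterbank(N, canvas.shape[1], gx, delta, sigma2)  # [N, W]
    Fy = filterbank(N, canvas.shape[0], gy, delta, sigma2)  # [N, H]
    return canvas + (1.0 / gamma) * (Fy.T @ patch @ Fx)

canvas = np.zeros((32, 32))
canvas = write(np.ones((5, 5)), canvas, gx=16, gy=16,
               delta=2.0, sigma2=1.0, gamma=1.0)
```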
Overall Model
• Generative Model: Stochastic Recurrent Network, a chained sequence of Variational Autoencoders, with a single stochastic layer.
[Figure: bidirectional LSTM over the caption feeding a recurrent generative network with a stochastic layer]
(Gregor et al. 2015; Mansimov, Parisotto, Ba, Salakhutdinov, 2015)
Overall Model
• Generative Model: Stochastic Recurrent Network, a chained sequence of Variational Autoencoders, with a single stochastic layer.
• Recognition Model: Deterministic Recurrent Network.
[Figure: bidirectional LSTM over the caption feeding the generative and recognition recurrent networks]
(Gregor et al. 2015; Mansimov, Parisotto, Ba, Salakhutdinov, 2015)
Overall Model
• Attention (alignment): focus on different words at different time steps when generating patches and placing them on the canvas.
• Sentence representation: dynamically weighted average of the hidden states representing words.
(Bahdanau et al. 2015)
[Figure: attention over word states feeding the stochastic layer of the generative network]
Generating Images
• Image is represented as a sequence of patches (t = 1, …, T) drawn on a canvas:
$$\mathbf{c}_t = \mathbf{c}_{t-1} + \mathrm{write}\!\left(\mathbf{h}_t^{\mathrm{gen}}\right)$$
• In practice, we use the conditional mean of the output distribution rather than sampling the final image.
Alignment Model
• Dynamic sentence representation at time t: weighted average of the bidirectional hidden states:
$$\mathbf{s}_t = \sum_{j} \alpha_{tj}\, \mathbf{h}_j$$
where the alignment probabilities are computed as:
$$\alpha_{tj} = \frac{\exp(e_{tj})}{\sum_{j'} \exp(e_{tj'})}, \qquad e_{tj} = \mathbf{v}^\top \tanh\!\left(U \mathbf{h}_j + W \mathbf{h}_{t-1}^{\mathrm{gen}} + \mathbf{b}\right)$$
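A PyTorch sketch of this alignment step, assuming a Bahdanau-style score function (a small MLP over each word state and the previous generative state); the exact score parameterization and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

word_dim, gen_dim, align_dim = 512, 256, 128
U = nn.Linear(word_dim, align_dim, bias=False)   # projects word states h_j
W = nn.Linear(gen_dim, align_dim)                # projects generative state
v = nn.Linear(align_dim, 1, bias=False)          # scalar alignment score

def sentence_representation(h_lang, h_gen_prev):
    """s_t = sum_j alpha_tj * h_j, with alpha_t = softmax_j(e_tj)."""
    scores = v(torch.tanh(U(h_lang) + W(h_gen_prev).unsqueeze(1)))  # [B, J, 1]
    alpha = torch.softmax(scores, dim=1)         # attention over J words
    return (alpha * h_lang).sum(dim=1)           # [B, word_dim]

h_lang = torch.randn(4, 12, word_dim)            # bidirectional word states
s_t = sentence_representation(h_lang, torch.randn(4, gen_dim))
```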
Learning
• Maximize the variational lower bound on the marginal log-likelihood of the correct image x given the caption y:
$$\log p(\mathbf{x} \mid \mathbf{y}) \ge \mathbb{E}_{q(Z \mid \mathbf{x}, \mathbf{y})}\!\left[\log p(\mathbf{x} \mid \mathbf{y}, Z)\right] - \mathrm{KL}\!\left(q(Z \mid \mathbf{x}, \mathbf{y}) \,\|\, p(Z \mid \mathbf{y})\right)$$
Sharpening
• Additional post-processing step: use an adversarial network trained on residuals of a Laplacian pyramid to sharpen the generated images (Denton et al. 2015).
MSCOCO Dataset
• Contains 83K images.
• Each image contains 5 captions.
• Standard benchmark dataset for many of the recent image captioning systems.
(Lin et al. 2014)
Flipping Colors
• A yellow school bus parked in the parking lot
• A red school bus parked in the parking lot
• A green school bus parked in the parking lot
• A blue school bus parked in the parking lot
Flipping Backgrounds
• A very large commercial plane flying in clear skies.
• A very large commercial plane flying in rainy skies.
• A herd of elephants walking across a dry grass field.
• A herd of elephants walking across a green grass field.
Flipping Objects
• The decadent chocolate dessert is on the table.
• A bowl of bananas is on the table.
• A vintage photo of a cat.
• A vintage photo of a dog.
Qualitative Comparison
• A group of people walk on a beach with surfboards
[Figure: samples from Our Model vs. LAPGAN (Denton et al. 2015), Fully Connected VAE, and Conv-Deconv VAE]
Variational Lower-Bound
• We can estimate the variational lower bound on the average test log-probabilities:

Model              Training     Test
Our Model          -1792.15     -1791.53
Skipthought-Draw   -1794.29     -1791.37
noAlignDraw        -1792.14     -1791.15

• At least we can see that we do not overfit to the training data, unlike many other approaches.
Novel Scene Compositions
• A toilet seat sits open in the bathroom
• A toilet seat sits open in the grass field
• Ask Google?