Variational Autoencoder
Mark Chang
Original Paper
• Title: Auto-Encoding Variational Bayes
• Authors: Diederik P. Kingma, Max Welling
• Organization: Machine Learning Group, Universiteit van Amsterdam
Outline
• Variational Inference
• Variational Autoencoder
• Experiment
• Further Research
Variational Inference
• Problem Definition
– Observable data: x = \{x_1, x_2, ..., x_m\}
– Hidden variable: z = \{z_1, z_2, ..., z_n\}
– Posterior distribution of the hidden variable given the data:

p(z|x) = \frac{p(z,x)}{p(x)} = \frac{p(x|z)p(z)}{\int p(x|z)p(z)\,dz}

which is intractable to compute.
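To see why the normalizer p(x) is the hard part, here is a minimal sketch with a discrete hidden variable, where the integral collapses to a sum and Bayes' rule can be evaluated directly; the prior and likelihood numbers are invented for illustration. For a continuous, high-dimensional z there is no such finite sum, which is what forces the variational approximation.

```python
import numpy as np

# Toy discrete analogue of p(z|x) = p(x|z)p(z) / ∫ p(x|z)p(z) dz.
# The prior and likelihood values below are made up for illustration.
prior = np.array([0.5, 0.3, 0.2])       # p(z) over 3 hidden states
likelihood = np.array([0.9, 0.4, 0.1])  # p(x|z) for one observed x

# The evidence p(x) is tractable here only because z is discrete:
evidence = np.sum(likelihood * prior)          # p(x) = Σ_z p(x|z)p(z)
posterior = likelihood * prior / evidence      # p(z|x)

print(evidence)         # 0.59
print(posterior.sum())  # 1.0
```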
Variational Inference
• Solutions for the intractable posterior:
– Monte Carlo sampling
  • Metropolis-Hastings
  • Gibbs sampling
– Variational inference
Variational Inference
• Approximate p(z|x) by q(z)
• Minimize the KL divergence:

D_{KL}[q(z) \| p(z|x)] = \int q(z) \log \frac{q(z)}{p(z|x)}\,dz
Evidence (Variational) Lower Bound

D_{KL}[q(z) \| p(z|x)] = \int q(z) \log \frac{q(z)}{p(z|x)}\,dz
= \int q(z) \log \frac{q(z)\,p(x)}{p(z,x)}\,dz
= \int q(z) \log \frac{q(z)}{p(z,x)}\,dz + \int q(z) \log p(x)\,dz
= \int q(z) \left(\log q(z) - \log p(z,x)\right)dz + \log p(x)
= -\left(E_{q(z)}[\log p(z,x)] - E_{q(z)}[\log q(z)]\right) + \log p(x)

Evidence Lower Bound (ELBO):

L[q(z)] = E_{q(z)}[\log p(z,x)] - E_{q(z)}[\log q(z)]
Evidence Lower Bound

D_{KL}[q(z) \| p(z|x)] = -L[q(z)] + \log p(x)
\log p(x) = D_{KL}[q(z) \| p(z|x)] + L[q(z)]

Since \log p(x) is constant with respect to q, minimizing D_{KL}[q(z) \| p(z|x)] is equal to maximizing L[q(z)].
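The identity \log p(x) = D_{KL} + L[q] can be checked numerically. The sketch below uses a two-state discrete model with invented probabilities, so every expectation is an exact finite sum; the equality holds for any choice of q.

```python
import numpy as np

# Numeric check of log p(x) = D_KL[q(z)||p(z|x)] + L[q(z)]
# on a two-state toy model (all numbers invented for illustration).
p_z = np.array([0.5, 0.5])          # prior p(z)
p_x_given_z = np.array([0.8, 0.3])  # likelihood p(x|z) for one fixed x
q = np.array([0.6, 0.4])            # an arbitrary variational q(z)

p_joint = p_x_given_z * p_z          # p(z, x)
log_px = np.log(p_joint.sum())       # log p(x)
posterior = p_joint / p_joint.sum()  # p(z|x)

kl = np.sum(q * np.log(q / posterior))                      # D_KL[q || p(z|x)]
elbo = np.sum(q * np.log(p_joint)) - np.sum(q * np.log(q))  # E_q[log p(z,x)] - E_q[log q]
print(np.isclose(log_px, kl + elbo))  # True
```

Because the ELBO never exceeds \log p(x) (the KL term is nonnegative), it is a valid lower bound to maximize.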
Mean-Field Variational Inference
• q can be factorized:

q(z) = \prod_i q(z_i|\theta_i), \quad \forall i, \int q(z_i|\theta_i)\,dz_i = 1

• Minimize D_{KL}[q(z) \| p(z|x)]

http://cpmarkchang.logdown.com/posts/737247-pgm-variational-inference
Variational Autoencoder
• Encoder network: q_\phi(z|x)
• Decoder network: p_\theta(x|z)
• Minimize: D_{KL}[q_\phi(z|x) \| p_\theta(z|x)]
• Intractable:

p_\theta(z|x) = \frac{p_\theta(x|z)\,p_\theta(z)}{p_\theta(x)}
Variational Autoencoder
Marginal likelihood:

\log p_\theta(x) = D_{KL}[q_\phi(z|x) \| p_\theta(z|x)] + L(\theta, \phi; x)

Variational lower bound:

L(\theta, \phi; x) = E_{q_\phi(z|x)}[\log p_\theta(x,z) - \log q_\phi(z|x)]
= E_{q_\phi(z|x)}[\log p_\theta(z) + \log p_\theta(x|z) - \log q_\phi(z|x)]
= E_{q_\phi(z|x)}\left[\log \frac{p_\theta(z)}{q_\phi(z|x)} + \log p_\theta(x|z)\right]
= -D_{KL}[q_\phi(z|x) \| p_\theta(z)] + E_{q_\phi(z|x)}[\log p_\theta(x|z)]
Monte Carlo Gradient Estimator
• The gradient of L(\theta, \phi; x) contains \nabla_\phi E_{q_\phi(z|x)}[\log p_\theta(x|z)], which is intractable.
• Use the Monte Carlo gradient estimator:

\nabla_\phi E_{q_\phi(z)}[f(z)] = \nabla_\phi \int q_\phi(z) f(z)\,dz
= \int q_\phi(z) f(z) \frac{\nabla_\phi q_\phi(z)}{q_\phi(z)}\,dz
= \int q_\phi(z) f(z) \nabla_\phi \log q_\phi(z)\,dz
= E_{q_\phi(z)}[f(z) \nabla_\phi \log q_\phi(z)]
\approx \frac{1}{L} \sum_{l=1}^{L} f(z^{(l)}) \nabla_\phi \log q_\phi(z^{(l)}), \quad z^{(l)} \sim q_\phi(z)
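The estimator above (often called the score-function or log-derivative estimator) can be sketched on a case with a known answer. Assuming q_\phi = N(\phi, 1) and picking f(z) = z^2 (my choice, not from the paper), the exact gradient is \nabla_\phi (\phi^2 + 1) = 2\phi, and the sample average converges to it, though with the high variance that motivates the reparameterization trick.

```python
import numpy as np

# Score-function Monte Carlo estimate of ∇_φ E_{q_φ(z)}[f(z)]
# for q_φ = N(φ, 1) and the toy choice f(z) = z².
# Exact answer: d/dφ E[z²] = d/dφ (φ² + 1) = 2φ.
rng = np.random.default_rng(0)

def grad_estimate(phi, L=500_000):
    z = rng.normal(phi, 1.0, size=L)  # z^(l) ~ q_φ(z)
    f = z ** 2                        # f(z^(l))
    score = z - phi                   # ∇_φ log N(z; φ, 1) = (z − φ)
    return np.mean(f * score)         # (1/L) Σ f(z^(l)) ∇_φ log q_φ(z^(l))

print(grad_estimate(1.5))  # close to 2·1.5 = 3.0, but noisy
```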
Objective Function

L(\theta, \phi; x^{(i)}) = -D_{KL}[q_\phi(z|x^{(i)}) \| p_\theta(z)] + E_{q_\phi(z|x^{(i)})}[\log p_\theta(x^{(i)}|z)]

Monte Carlo gradient estimator \tilde{L}(\theta, \phi; x^{(i)}) \approx L(\theta, \phi; x^{(i)}):

\tilde{L}(\theta, \phi; x^{(i)}) = -D_{KL}[q_\phi(z|x^{(i)}) \| p_\theta(z)] + \frac{1}{L} \sum_{l=1}^{L} \log p_\theta(x^{(i)}|z^{(i,l)})

where z^{(i,l)} \sim q_\phi(z|x^{(i)})
Reparameterization Trick
• Rewrite the sample z \sim q_\phi(z|x) as a deterministic variable z = g_\phi(\epsilon, x), where \epsilon \sim p(\epsilon) is an auxiliary variable.
• Example: for z \sim p(z|x) = N(\mu, \sigma^2), take \epsilon \sim N(0, 1) and z = \mu + \sigma\epsilon.
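The Gaussian example on this slide is a one-liner in code: the randomness moves into the parameter-free noise \epsilon, so z becomes a differentiable function of \mu and \sigma. A minimal sketch (the values of \mu and \sigma are arbitrary):

```python
import numpy as np

# Reparameterization of z ~ N(μ, σ²) as z = μ + σ·ε with ε ~ N(0, 1).
# z is now a deterministic, differentiable function of (μ, σ);
# only the auxiliary noise ε carries the randomness.
rng = np.random.default_rng(0)
mu, sigma = 2.0, 0.5

eps = rng.normal(0.0, 1.0, size=100_000)  # auxiliary variable, parameter-free
z = mu + sigma * eps                      # deterministic transform g_φ(ε)

print(z.mean(), z.std())  # close to μ = 2.0 and σ = 0.5
```

Because gradients with respect to \mu and \sigma now flow through an ordinary arithmetic expression, the estimator has far lower variance than the score-function version.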
Reparameterization Trick
• Encoder network q_\phi(z|x): given x^{(i)}, it outputs \mu^{(i)} and \sigma^{(i)}, so that

\log q_\phi(z|x^{(i)}) = \log N(z; \mu^{(i)}, \sigma^{2(i)} I)

• Sample with \epsilon^{(l)} \sim N(0, I):

z^{(i,l)} = \mu^{(i)} + \sigma^{(i)} \odot \epsilon^{(l)}
Reparameterization Trick
• Full pipeline: the encoder network q_\phi(z|x) maps x^{(i)} to \mu^{(i)} and \sigma^{(i)}; with \epsilon^{(l)} \sim N(0, I), the sample z^{(i,l)} = \mu^{(i)} + \sigma^{(i)} \odot \epsilon^{(l)} is fed to the decoder network p_\theta(x|z), which reconstructs x^{(i)}.
Objective Function

\tilde{L}(\theta, \phi; x^{(i)}) = -D_{KL}[q_\phi(z|x^{(i)}) \| p_\theta(z)] + \frac{1}{L} \sum_{l=1}^{L} \log p_\theta(x^{(i)}|z^{(i,l)})

With p_\theta(z) = N(z; 0, I) and q_\phi(z|x^{(i)}) = N(z; \mu^{(i)}, \sigma^{2(i)} I), the KL term has a closed form:

\tilde{L}(\theta, \phi; x^{(i)}) = \frac{1}{2} \sum_{j=1}^{J} \left(1 + \log (\sigma_j^{(i)})^2 - (\mu_j^{(i)})^2 - (\sigma_j^{(i)})^2\right) + \frac{1}{L} \sum_{l=1}^{L} \log p_\theta(x^{(i)}|z^{(i,l)})

Regularization term + reconstruction error term.
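The closed-form KL term can be verified against a Monte Carlo estimate of E_q[\log p(z) - \log q(z)]. A sketch for a 2-dimensional latent space with arbitrary \mu and \sigma values:

```python
import numpy as np

# Check the analytic term -D_KL[q||p] = (1/2) Σ_j (1 + log σ_j² − μ_j² − σ_j²)
# for q = N(μ, σ²I), p(z) = N(0, I); the μ, σ values are arbitrary examples.
mu = np.array([0.5, -1.0])
sigma = np.array([1.0, 0.8])

neg_kl_analytic = 0.5 * np.sum(1 + np.log(sigma**2) - mu**2 - sigma**2)

# Monte Carlo cross-check: -D_KL = E_q[log p(z) - log q(z)].
rng = np.random.default_rng(0)
z = mu + sigma * rng.normal(size=(1_000_000, 2))  # reparameterized samples from q
log_q = np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
               - (z - mu)**2 / (2 * sigma**2), axis=1)
log_p = np.sum(-0.5 * np.log(2 * np.pi) - z**2 / 2, axis=1)

print(neg_kl_analytic, np.mean(log_p - log_q))  # the two values agree
```

Having the KL in closed form means only the reconstruction term needs sampling, which further reduces estimator variance.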
Training
Experiment
Figure: horizontal axis: size of training data; vertical axis: evidence lower bound; N_z: dimension of the hidden variables.
Experiment
Figure: visualization of the 2D latent space.
Further Research
• DRAW: A Recurrent Neural Network For Image Generation. Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, Daan Wierstra
Further Research
• Neural Variational Inference for Text Processing. Yishu Miao, Lei Yu, Phil Blunsom
– Neural Variational Document Model
– Neural Answer Selection Model
Further Research
• Deep Convolutional Inverse Graphics Network. Tejas D. Kulkarni, William F. Whitney, Pushmeet Kohli, Joshua B. Tenenbaum
Source Code
• https://jmetzen.github.io/2015-11-27/vae.html
Reference
• Charles Fox, Stephen Roberts. A Tutorial on Variational Bayesian Inference. http://www.orchid.ac.uk/eprints/40/1/fox_vbtut.pdf
• Diederik P. Kingma, Max Welling. Auto-Encoding Variational Bayes. https://arxiv.org/pdf/1312.6114v10.pdf