Bayesian Dark Knowledge and Matrix Factorization
Masatoshi UeharaMentor: Oono Kenta, Brian Vogel
October 27, 2016
Contents
1 Introduction
2 Bayesian Dark Knowledge with various SG-MCMC methods
3 Matrix Factorization
(JPN) Masatoshi October 27, 2016 2 / 18
Introduction
SG-MCMC is a family of sampling algorithms for large datasets.
We apply a variety of SG-MCMC methods to Bayesian Dark Knowledge.
We combine GANs with Bayesian Dark Knowledge.
We apply SG-MCMC and neural networks to matrix factorization.
SGLD
SGLD is a method that combines SGD with the Metropolis-adjusted Langevin algorithm (MALA), a sampling algorithm.
θ_{t+1} ← θ_t − ε_t D ∇U(θ_t) + N(0, 2ε_t D)
In the case of a Bayesian neural network, the update is as follows:
Δθ_t = (ε_t / 2) [ ∇ log p(θ_t) + (N / n) Σ_{i=1}^{n} ∇ log p(y_{ti} | x_{ti}, θ_t) ] + η_t,   η_t ∼ N(0, ε_t)
Note that removing the noise term recovers plain SGD.
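The SGLD update above can be sketched in a few lines. This is a minimal illustration, not the slides' code; the function and argument names (`sgld_step`, `grad_log_prior`, etc.) are hypothetical, and the preconditioner D is taken as the identity.

```python
import numpy as np

def sgld_step(theta, grad_log_prior, grad_log_lik_minibatch, N, n, eps, rng):
    """One SGLD step: the minibatch log-likelihood gradient is rescaled by N/n,
    and Gaussian noise with variance eps is injected (dropping the noise gives SGD)."""
    grad = grad_log_prior(theta) + (N / n) * grad_log_lik_minibatch(theta)
    noise = rng.normal(0.0, np.sqrt(eps), size=theta.shape)
    return theta + 0.5 * eps * grad + noise
```

Iterating this step (after burn-in, with thinning) yields approximate posterior samples of θ.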
Bayesian Dark Knowledge with various SG-MCMC methods
Overview
Bayesian Dark Knowledge is a method that combines SGLD with the concept of distillation.
SGLD is a useful method for learning Bayesian deep networks.
The problem is that SGLD needs to store many copies of the parameters.
The motivation is to replace a set of neural networks with a single deep network.
We can estimate predictive confidence even when the amount of data is small.
Method
The teacher's predictive distribution is denoted p(y|x, D_N). The student network is denoted S(y|x, ω).
In the distillation phase, the following objective is minimized.
Distillation loss
L(ω) = − ∫ p(x) Σ_y p(y|x, D_N) log S(y|x, ω) dx
     ≈ − (1/|Θ|) (1/|D′|) Σ_{θ∈Θ} Σ_{x′∈D′} Σ_y p(y|x′, θ) log S(y|x′, ω)
Algorithm
Note that the student network is trained online, so we do not have to store many copies of the parameters.
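One online distillation step can be sketched as follows, using a linear softmax student purely for illustration (the actual student is a deep network; `distill_step` and its arguments are hypothetical names):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_step(W_student, x_batch, teacher_probs, lr):
    """Gradient step on the cross-entropy between the teacher's predictive
    probabilities and a linear softmax student, on unlabeled inputs x'."""
    student_probs = softmax(x_batch @ W_student)
    # d(cross-entropy)/d(logits) for a softmax output, averaged over the batch.
    grad_logits = (student_probs - teacher_probs) / len(x_batch)
    return W_student - lr * (x_batch.T @ grad_logits)
```

In the full algorithm, `teacher_probs` would come from the current SGLD sample θ_t, so only one teacher copy is ever held in memory.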
How to improve?
We want to make a variety of teachers.
Use other SG-MCMC methods.
How do we make an unlabeled dataset?
Use GANs.
SG-HMC and SG-NHT
SG-HMC
θ_{t+1} ← θ_t + ε_t M^{-1} r_t
r_{t+1} ← r_t − ε_t ∇U(θ_t) − ε_t C M^{-1} r_t + N(0, ε_t (2C − ε_t B_t))
SG-NHT
θ_{t+1} ← θ_t + ε_t r_t
r_{t+1} ← r_t − ε_t ∇U(θ_t) − ε_t ζ_t r_t + N(0, ε_t (2C − ε_t B_t))
ζ_{t+1} ← ζ_t + ε_t ((1/d) r_t^T r_t − 1)
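A minimal SG-NHT step following the updates above; the injected-noise level A and the r-before-θ ordering follow one common presentation of the algorithm, and all names here are illustrative:

```python
import numpy as np

def sgnht_step(theta, r, zeta, grad_U, eps, A, rng):
    """One SG-NHT update: the thermostat zeta adapts the friction so that
    the average kinetic energy (1/d) r^T r is driven toward 1."""
    r = (r - eps * grad_U(theta) - eps * zeta * r
         + rng.normal(0.0, np.sqrt(2.0 * A * eps), size=r.shape))
    theta = theta + eps * r
    zeta = zeta + eps * ((r @ r) / r.size - 1.0)
    return theta, r, zeta
```

Swapping this step for the SGLD update gives another teacher sampler for the distillation loop.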
Bayesian Dark Knowledge with GANs
GANs can mimic the empirical distribution.
In the distillation phase, we use a GAN as a simulator.
How do we remove poor-quality generated images?
Anomaly detection with GANs
[Figure: panels labeled "uLSIF" and "GAN"]
Result: MNIST
Setting: 800 labeled MNIST samples; epochs: 2000; burn-in interval: 200; thinning interval: 5.
Network: 784-1200-1200-10; activation: ReLU.
Result
Matrix Factorization
The rating matrix is given.
u_i ... user feature vector, v_j ... item feature vector, R_{ij} ... rating matrix entry.
When learning, SGD is used (with step size η):
u_i ← u_i − η ∇_{u_i} [ (R_{ij} − u_i^T v_j)^2 + λ ‖u_i‖^2 ]
v_j ← v_j − η ∇_{v_j} [ (R_{ij} − u_i^T v_j)^2 + λ ‖v_j‖^2 ]
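The per-rating SGD updates above can be written out directly; the function name and learning-rate argument are illustrative:

```python
import numpy as np

def mf_sgd_update(u, v, r_ij, lr, lam):
    """One SGD step on a single observed rating: regularized squared error.
    Both feature vectors are updated from the same prediction error."""
    err = r_ij - u @ v
    u_new = u - lr * (-2.0 * err * v + 2.0 * lam * u)
    v_new = v - lr * (-2.0 * err * u + 2.0 * lam * v)
    return u_new, v_new
```

Sweeping this step over the observed entries of R fits the factorization.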
Matrix Factorization with SGLD
p(R | U, V, τ) = ∏_{i=1}^{L} ∏_{j=1}^{M} [ N(R_{ij} | U_i^T V_j, τ^{-1}) ]^{I_{ij}}
p(U | λ_U) = ∏_{i=1}^{L} N(U_i | 0, λ_U^{-1} I)
p(V | λ_V) = ∏_{j=1}^{M} N(V_j | 0, λ_V^{-1} I)
λ_U ∼ Gamma(α_0, β_0)
λ_V ∼ Gamma(α_0, β_0)
Use Gibbs sampling.
When updating U and V, SGLD is used.
λ is tuned automatically.
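With the Gamma prior above, the precision λ_U has a conjugate Gamma full conditional, so its Gibbs update is a single draw. The shape/rate algebra below is a sketch under the model as written (note numpy's sampler takes a scale, i.e. 1/rate):

```python
import numpy as np

def sample_lambda_U(U, alpha0, beta0, rng):
    """Draw lambda_U | U ~ Gamma(alpha0 + L*d/2, beta0 + sum_i ||U_i||^2 / 2)
    in the shape/rate parameterization."""
    L, d = U.shape
    shape = alpha0 + 0.5 * L * d
    rate = beta0 + 0.5 * np.sum(U ** 2)
    return rng.gamma(shape, 1.0 / rate)  # numpy's gamma uses scale = 1/rate
```

This is the sense in which λ is "tuned automatically": it is resampled from its conditional each sweep instead of being set by hand.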
Neural Network Matrix Factorization
Estimate X_{n,m} by the equation X̂_{n,m} = f_θ(U_n, V_m).
Cost function: Σ_{(n,m)} (X_{n,m} − X̂_{n,m})^2 + λ [ Σ_n ‖U_n‖_2^2 + Σ_m ‖V_m‖_2^2 ]
θ, U_n, and V_m are updated at the same time.
NNMF is reported to reach state-of-the-art accuracy.
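A toy version of f_θ and the cost above, using a single hidden ReLU layer purely as an illustration; the architecture and all names here are assumptions, not the paper's exact network:

```python
import numpy as np

def nnmf_predict(u, v, W1, b1, w2, b2):
    """X_hat_{n,m} = f_theta(U_n, V_m): feed the concatenated feature
    vectors through one hidden ReLU layer to a scalar output."""
    h = np.maximum(0.0, np.concatenate([u, v]) @ W1 + b1)
    return h @ w2 + b2

def nnmf_cost(X, predict, U, V, observed, lam):
    """Squared error over observed entries plus L2 penalties on U and V."""
    sq = sum((X[n, m] - predict(U[n], V[m])) ** 2 for n, m in observed)
    return sq + lam * (np.sum(U ** 2) + np.sum(V ** 2))
```

Training would backpropagate this cost through θ, U_n, and V_m jointly, matching the simultaneous update described above.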
Results
Use the ML-100K and ML-1M datasets.
Evaluate by root mean squared error (RMSE).
Unfortunately, the state-of-the-art accuracy was not reproduced.
Discussion
Does data generated by GANs help classifiers?
What is a good method for combining neural networks with matrix factorization?
References
Large-Scale Distributed Bayesian Matrix Factorization usingStochastic Gradient MCMC
Neural Network Matrix Factorization
A Complete Recipe for Stochastic Gradient MCMC
Bayesian Dark Knowledge
Probabilistic Matrix Factorization