Approximate Inference: Variational Inference
CMSC 678, UMBC
Outline
Recap of graphical models & belief propagation
Posterior inference (Bayesian perspective)
Math: exponential family distributions
Variational inference: basic technique; example: topic models
Recap from last time…
Graphical Models

Directed models (Bayesian networks):
p(x_1, x_2, …, x_N) = ∏_i p(x_i | π(x_i))

Undirected models (Markov random fields):
p(x_1, x_2, …, x_N) = (1/Z) ∏_C ψ_C(x_C)
Markov Blanket

The Markov blanket of a node x is its parents, children, and children's parents: the set of nodes needed to form the complete conditional for a variable x_i.

p(x_i | x_{j≠i}) = p(x_1, …, x_N) / ∫ p(x_1, …, x_N) dx_i

(factorization of the graph)
= ∏_k p(x_k | π(x_k)) / ∫ ∏_k p(x_k | π(x_k)) dx_i

(factor out terms not dependent on x_i)
= ∏_{k: k=i or i∈π(x_k)} p(x_k | π(x_k)) / ∫ ∏_{k: k=i or i∈π(x_k)} p(x_k | π(x_k)) dx_i
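The definition above is easy to compute from parent sets alone. A minimal sketch, assuming a toy DAG described only by each node's parents (the graph, node names, and the `markov_blanket` helper are all hypothetical):

```python
# Sketch: Markov blanket of a node in a Bayesian network, given only
# each node's parent set: parents + children + children's other parents.

def markov_blanket(node, parents):
    """parents: dict mapping each node to the set of its parents."""
    children = {c for c, ps in parents.items() if node in ps}
    coparents = {p for c in children for p in parents[c]} - {node}
    return parents[node] | children | coparents

# Toy DAG: a -> c, b -> c, c -> d
parents = {"a": set(), "b": set(), "c": {"a", "b"}, "d": {"c"}}
blanket_c = markov_blanket("c", parents)  # == {'a', 'b', 'd'}
blanket_a = markov_blanket("a", parents)  # == {'b', 'c'}  (b is a co-parent)
```

Note how a's blanket picks up b only because they share the child c.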
Markov Random Fields with Factor Graph Notation

x: original pixel/state
y: observed (noisy) pixel/state

Factor nodes are added according to maximal cliques: unary factors attach to single variables, binary factors to pairs of variables. Factor graphs are bipartite (variable nodes and factor nodes).
Two Problems for Undirected Models

Finding the normalizer:
Z = ∑_x ∏_c ψ_c(x_c)

Computing the marginals:
Z_n(v) = ∑_{x: x_n=v} ∏_c ψ_c(x_c)
(sum over all variable combinations, with the x_n coordinate fixed)

Example: 3 variables, fix the 2nd dimension:
Z_2(v) = ∑_{x_1} ∑_{x_3} ∏_c ψ_c(x = (x_1, v, x_3))

Q: Why are these difficult?
A: Many different combinations.

Belief propagation algorithms:
• sum-product (forward-backward in HMMs)
• max-product/max-sum (Viterbi)
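The difficulty is visible even in a tiny example. A minimal sketch on a hypothetical binary chain x_1 – x_2 – x_3 (the potential `psi` is made up): the naive loops enumerate all 2³ configurations, which grows exponentially with the number of variables.

```python
# Sketch: brute-force Z and Z_n(v) for a tiny pairwise MRF (binary chain
# x1 - x2 - x3). Naive enumeration visits every configuration: 2^N of them.
import itertools

def psi(a, b):
    # hypothetical pairwise potential favoring agreement
    return 2.0 if a == b else 1.0

def unnorm(x1, x2, x3):
    return psi(x1, x2) * psi(x2, x3)

# Z = sum over all configurations of the product of potentials
Z = sum(unnorm(*x) for x in itertools.product([0, 1], repeat=3))

def Z_n(n, v):
    # sum over all configurations with coordinate n fixed to v
    return sum(unnorm(*x) for x in itertools.product([0, 1], repeat=3)
               if x[n] == v)

# marginal of x2: Z_2(v) / Z
marginal_x2 = [Z_n(1, v) / Z for v in (0, 1)]
```

By symmetry of this potential, x_2's marginal comes out uniform; belief propagation would get the same answers without full enumeration.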
Sum-Product

From variables to factors:
q_{n→m}(x_n) = ∏_{m′ ∈ M(n)\m} r_{m′→n}(x_n)

From factors to variables:
r_{m→n}(x_n) = ∑_{w_m \ n} f_m(w_m) ∏_{n′ ∈ N(m)\n} q_{n′→m}(x_{n′})

Here M(n) is the set of factors in which variable n participates, and N(m) is the set of variables that the mth factor depends on. The sum in r_{m→n} runs over configurations of the variables of the mth factor, with variable n fixed. An empty product takes the default value of 1.
Outline
Recap of graphical models & belief propagation
Posterior inference (Bayesian perspective)
Math: exponential family distributions
Variational inference: basic technique; example: topic models
Goal: Posterior Inference

Hyperparameters α; unknown parameters Θ; data x.

Likelihood model: p(x | Θ)
Posterior: p_α(Θ | x)

We're going to be Bayesian (perform Bayesian inference).
Posterior Classification vs. Posterior Inference

"Frequentist" methods: prior over labels (maybe), not weights — p_{α,w}(y | x)
Bayesian methods: Θ includes weight parameters — p_α(Θ | x)
(Some) Learning Techniques

• MAP/MLE: point estimation, basic EM — what we've already covered
• Variational inference: functional optimization — today
• Sampling/Monte Carlo — next class
Outline
Recap of graphical models & belief propagation
Posterior inference (Bayesian perspective)
Math: exponential family distributions
Variational inference: basic technique; example: topic models
Exponential Family Form

p(x | η) = h(x) exp( ηᵀ T(x) − A(η) )

• h(x): support function — formally necessary, in practice irrelevant
• η: distribution parameters — natural parameters, feature weights
• T(x): feature function(s) — sufficient statistics
• A(η): log-normalizer
Why? Capture Common Distributions

• Discrete (finite distributions)
• Gaussian
• Dirichlet (distributions over (finite) distributions)
• Gamma, Exponential, Poisson, Negative-Binomial, Laplace, log-Normal, …
Why? “Easy” Gradients

∇_η log p(x | η) = T(x) − E_{p(·|η)}[T(x)]

Observed feature counts (w.r.t. the empirical distribution) minus expected feature counts (w.r.t. current model parameters) — we've already seen this with maxent models.

Why? “Easy” Expectations

∇_η A(η) = E[T(x)]

The expectation of the sufficient statistics is the gradient of the log-normalizer.
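The identity ∇_η A(η) = E[T(x)] can be checked numerically. A minimal sketch for a Bernoulli in natural form, where A(η) = log(1 + e^η), T(x) = x, and E[x] = sigmoid(η) (the value of η and the finite-difference step are arbitrary choices):

```python
# Sketch: check that dA/d(eta) equals E[T(x)] for a Bernoulli with natural
# parameter eta. A(eta) = log(1 + e^eta); E[x] = sigmoid(eta).
import math

def A(eta):
    return math.log(1.0 + math.exp(eta))

def sigmoid(eta):
    return 1.0 / (1.0 + math.exp(-eta))

eta = 0.7
h = 1e-6
# central finite-difference approximation of dA/d(eta)
grad_A = (A(eta + h) - A(eta - h)) / (2 * h)
mean_x = sigmoid(eta)  # E[T(x)] = E[x]
```

The two quantities agree to high precision, which is exactly the "easy expectations" property.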
Why? “Easy” Posterior Inference

p is the conjugate prior for q: the posterior p has the same form as the prior p. All exponential family models have a conjugate prior (in theory).

Posterior         | Likelihood          | Prior
Dirichlet (Beta)  | Discrete (Bernoulli)| Dirichlet (Beta)
Normal            | Normal (fixed var.) | Normal
Gamma             | Exponential         | Gamma
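The first row of the table is the classic worked case: a Beta(a, b) prior with a Bernoulli likelihood yields a Beta posterior whose parameters are just the prior's plus the observed counts. A minimal sketch (the function name and toy data are made up):

```python
# Sketch: conjugate update for the Beta-Bernoulli pair. Prior Beta(a, b),
# data with h ones and t zeros -> posterior Beta(a + h, b + t): the
# posterior has the same form as the prior, only the counts change.
def beta_bernoulli_posterior(a, b, data):
    h = sum(data)          # number of 1s ("heads")
    t = len(data) - h      # number of 0s ("tails")
    return a + h, b + t

post = beta_bernoulli_posterior(2.0, 2.0, [1, 1, 0, 1])  # -> (5.0, 3.0)
```

No integral is ever computed: conjugacy turns posterior inference into bookkeeping on sufficient statistics.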
Outline
Recap of graphical models & belief propagation
Posterior inference (Bayesian perspective)
Math: exponential family distributions
Variational inference: basic technique; example: topic models
Goal: Posterior Inference (recap)

Hyperparameters α; unknown parameters Θ; data x. Likelihood model p(x | Θ); posterior p_α(Θ | x).

(Some) learning techniques: MAP/MLE (already covered), variational inference (today), sampling/Monte Carlo (next class).
Variational Inference

The posterior is difficult to compute. Introduce an easy(ier)-to-compute distribution q(θ), controlled by variational parameters λ, and minimize the “difference” between q(θ) and the posterior by changing λ.
Variational Inference: A Gradient-Based Optimization Technique

Set t = 0; pick a starting value λ_t.
Until converged:
1. Get value y_t = F(q(·; λ_t))
2. Get gradient g_t = F′(q(·; λ_t))
3. Get scaling factor ρ_t
4. Set λ_{t+1} = λ_t + ρ_t · g_t
5. Set t += 1
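The loop above is ordinary gradient ascent. A minimal sketch on a hypothetical one-dimensional objective F(λ) = −(λ − 3)², standing in for the variational objective (the objective, step size, and convergence test are all made-up choices):

```python
# Sketch of the generic gradient loop: ascend F(lam) = -(lam - 3)^2,
# whose maximum is at lam = 3, using a fixed scaling factor rho.
def F(lam):
    return -(lam - 3.0) ** 2

def F_grad(lam):
    return -2.0 * (lam - 3.0)

lam, t, rho = 0.0, 0, 0.1
while abs(F_grad(lam)) > 1e-8:   # "until converged"
    g = F_grad(lam)              # step 2: get gradient g_t
    lam = lam + rho * g          # step 4: lam_{t+1} = lam_t + rho_t * g_t
    t += 1                       # step 5
```

In variational inference F is a functional of q(·; λ), but once λ parameterizes q, the update is this same finite-dimensional loop.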
Variational Inference: The Function to Optimize

Find the best easy-to-compute distribution q (calculus of variations) to stand in for the posterior of the desired model. The parameters of the desired model stay fixed; q(θ) carries its own variational parameters for θ.

KL-divergence (an expectation under q):

D_KL( q(θ) || p(θ | x) ) = E_{q(θ)}[ log ( q(θ) / p(θ | x) ) ]
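Because the KL divergence is an expectation under q, it can be estimated by sampling from q. A minimal sketch where q(θ) and p(θ | x) are stand-in 1-D Gaussians (the means, variances, and sample count are arbitrary), so the Monte Carlo estimate can be checked against the closed form:

```python
# Sketch: Monte Carlo estimate of KL(q || p) = E_q[log q(x) - log p(x)]
# for two 1-D Gaussians, compared with the known closed-form value.
import math, random

random.seed(0)
mu_q, s_q = 0.0, 1.0   # q(theta)
mu_p, s_p = 1.0, 2.0   # stand-in for p(theta | x)

def log_normal(x, mu, s):
    return -0.5 * math.log(2 * math.pi * s * s) - (x - mu) ** 2 / (2 * s * s)

n = 100_000
kl_mc = sum(log_normal(x, mu_q, s_q) - log_normal(x, mu_p, s_p)
            for x in (random.gauss(mu_q, s_q) for _ in range(n))) / n

# closed form for KL between two univariate Gaussians
kl_exact = (math.log(s_p / s_q)
            + (s_q ** 2 + (mu_q - mu_p) ** 2) / (2 * s_p ** 2) - 0.5)
```

In real variational inference p(θ | x) is exactly what we cannot evaluate; the next slides rewrite the objective so only the joint p(x, θ) is needed.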
Exponential Family Recap

“Easy” expectations: the expectation of the sufficient statistics is the gradient of the log-normalizer. “Easy” posterior inference: p is the conjugate prior for q.

When p and q have the same exponential family form, the variational update for q(θ) is (often) computable in closed form.
Variational Inference: A Gradient-Based Optimization Technique

Set t = 0; pick a starting value λ_t. Let F(q(·; λ_t)) = KL[q(·; λ_t) || p(·)].
Until converged:
1. Get value y_t = F(q(·; λ_t))
2. Get gradient g_t = F′(q(·; λ_t))
3. Get scaling factor ρ_t
4. Set λ_{t+1} = λ_t + ρ_t · g_t
5. Set t += 1
Variational Inference:Maximization or Minimization?
Evidence Lower Bound (ELBO)

log p(x) = log ∫ p(x, θ) dθ
         = log ∫ p(x, θ) · (q(θ)/q(θ)) dθ
         = log E_{q(θ)}[ p(x, θ) / q(θ) ]
         ≥ E_{q(θ)}[ log p(x, θ) ] − E_{q(θ)}[ log q(θ) ]  =  ℒ(q)

The inequality is Jensen's: log is concave, so the log of an expectation is at least the expectation of the log.
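The bound can be verified numerically. A minimal sketch on a hypothetical conjugate toy model — prior θ ~ N(0, 1), likelihood x | θ ~ N(θ, 1), one observation — where log p(x) = log N(x; 0, 2) is known, so we can see the ELBO is tight when q equals the exact posterior N(x/2, 1/2) and strictly smaller otherwise (all settings here are made-up illustration choices):

```python
# Sketch: Monte Carlo ELBO = E_q[log p(x, theta)] - E_q[log q(theta)]
# on a toy Gaussian model with known evidence log p(x) = log N(x; 0, 2).
import math, random

random.seed(0)
x = 1.0

def log_normal(t, mu, var):
    return -0.5 * math.log(2 * math.pi * var) - (t - mu) ** 2 / (2 * var)

def elbo(mu_q, var_q, n=50_000):
    s_q = math.sqrt(var_q)
    total = 0.0
    for _ in range(n):
        th = random.gauss(mu_q, s_q)                      # th ~ q
        log_joint = log_normal(th, 0.0, 1.0) + log_normal(x, th, 1.0)
        total += log_joint - log_normal(th, mu_q, var_q)  # log p(x,th) - log q(th)
    return total / n

log_evidence = log_normal(x, 0.0, 2.0)
tight = elbo(x / 2, 0.5)   # q = exact posterior N(x/2, 1/2): bound is tight
loose = elbo(0.0, 1.0)     # mismatched q: ELBO = log p(x) - KL(q || posterior)
```

With q equal to the posterior, log p(x, θ) − log q(θ) is constant in θ, so the estimate is exact with zero variance — a useful sanity check on any ELBO implementation.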
Outline
Recap of graphical models & belief propagation
Posterior inference (Bayesian perspective)
Math: exponential family distributions
Variational inference: basic technique; example: topic models
Bag-of-Items Models

“Three people have been fatally shot, and five people, including a mayor, were seriously wounded as a result of a Shining Path attack today against a community in Junin department, central Peruvian mountain region. …”

p_{φ,ω}(document) is modeled from the unigram counts (Three: 1, people: 2, attack: 2, …).

Global (corpus-level) parameters interact with local (document-level) parameters.
Latent Dirichlet Allocation (Blei et al., 2003)

• Per-document (unigram) word counts: entry (i, j) is the count of word j in document i — modeled as ~ Multinomial
• Per-document (latent) topic usage — ~ Dirichlet (regularize/place priors)
• Per-topic word usage, for K topics — ~ Dirichlet (regularize/place priors)
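The generative story can be sketched directly. A minimal sketch with made-up sizes (K topics, V word types, document lengths) and hypothetical helper names, using only the standard library (Dirichlet draws via normalized Gamma variates):

```python
# Sketch of LDA's generative process: phi_k ~ Dirichlet(beta) per topic,
# theta_d ~ Dirichlet(alpha) per document, then for each token a topic
# z ~ Discrete(theta_d) and a word w ~ Discrete(phi_z).
import random

random.seed(0)
K, V, n_docs, doc_len = 3, 8, 2, 10
alpha, beta = 0.5, 0.1          # symmetric Dirichlet hyperparameters

def dirichlet(conc, dim):
    # Dirichlet sample = normalized independent Gamma(conc, 1) draws
    g = [random.gammavariate(conc, 1.0) for _ in range(dim)]
    s = sum(g)
    return [x / s for x in g]

def discrete(probs):
    return random.choices(range(len(probs)), weights=probs)[0]

phi = [dirichlet(beta, V) for _ in range(K)]   # per-topic word usage
docs = []
for d in range(n_docs):
    theta = dirichlet(alpha, K)                # per-document topic usage
    doc = []
    for _ in range(doc_len):
        z = discrete(theta)                    # this token's topic
        doc.append(discrete(phi[z]))           # this token's word id
    docs.append(doc)
```

Inference runs this story in reverse: given only `docs`, recover (distributions over) theta, phi, and z.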
Variational Inference: LDirA

Topic usage θ, per-document (unigram) word counts w, topic words φ.

p: true model
θ^(d) ∼ Dirichlet(α);  z^(d,n) ∼ Discrete(θ^(d))
φ_k ∼ Dirichlet(β);  w^(d,n) ∼ Discrete(φ_{z^(d,n)})

q: mean-field approximation
θ^(d) ∼ Dirichlet(γ_d);  z^(d,n) ∼ Discrete(ψ^(d,n))
φ_k ∼ Dirichlet(λ_k)
Variational Inference: LDirA

p: true model — θ^(d) ∼ Dirichlet(α), z^(d,n) ∼ Discrete(θ^(d))
q: mean-field approximation — θ^(d) ∼ Dirichlet(γ_d), z^(d,n) ∼ Discrete(ψ^(d,n))

One term of the objective:

E_{q(θ^(d))}[ log p(θ^(d) | α) ] = E_{q(θ^(d))}[ (α − 1)ᵀ log θ^(d) ] + C

using the exponential family form of the Dirichlet:
p(θ) = ( Γ(∑_k α_k) / ∏_k Γ(α_k) ) ∏_k θ_k^{α_k − 1}
params = (α_k − 1)_k;  suff. stats. = (log θ_k)_k

Since q is Dirichlet(γ_d), with params (γ_k − 1)_k and the same sufficient statistics (log θ_k)_k, the expectation passes through the linear term:

= (α − 1)ᵀ E_{q(θ^(d))}[ log θ^(d) ] + C

The expectation of the sufficient statistics is the gradient of the log-normalizer:

= (α − 1)ᵀ ∇_{γ_d} A(γ_d − 1) + C

Collecting the remaining terms of the objective into M(γ_d) — there's more math to do! —

ℒ(γ_d) = (α − 1)ᵀ ∇_{γ_d} A(γ_d − 1) + M(γ_d)

and differentiating for the gradient step:

∇_{γ_d} ℒ(γ_d) = (α − 1)ᵀ ∇²_{γ_d} A(γ_d − 1) + ∇_{γ_d} M(γ_d)
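The key expectation in the derivation above has a concrete form: for q(θ) = Dirichlet(γ), the expected sufficient statistic is E_q[log θ_k] = ψ(γ_k) − ψ(∑_j γ_j), where ψ is the digamma function (the gradient of the Dirichlet's log-normalizer). A minimal sketch checking this by sampling; since the standard library lacks a digamma, it is approximated here as a numerical derivative of `math.lgamma` (the γ values and sample count are arbitrary):

```python
# Sketch: verify E_{Dirichlet(gamma)}[log theta_k] = psi(gamma_k) - psi(sum gamma),
# i.e., the expectation of the sufficient statistics equals the gradient of
# the log-normalizer, by comparing with a Monte Carlo estimate.
import math, random

random.seed(0)
gamma = [2.0, 3.0, 5.0]

def digamma(x, h=1e-6):
    # numerical derivative of log Gamma as a stand-in for the digamma psi
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2 * h)

analytic = [digamma(g) - digamma(sum(gamma)) for g in gamma]

n = 100_000
totals = [0.0] * len(gamma)
for _ in range(n):
    draws = [random.gammavariate(g, 1.0) for g in gamma]   # Dirichlet via Gammas
    s = sum(draws)
    for k, d in enumerate(draws):
        totals[k] += math.log(d / s)                       # log theta_k
mc = [t / n for t in totals]
```

These digamma differences are exactly what plugs into the closed-form coordinate updates for γ_d in LDA's variational inference.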