76
Approximate Inference: Variational Inference CMSC 678 UMBC

Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

  • Upload
    others

  • View
    13

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Approximate Inference:Variational Inference

CMSC 678UMBC

Page 2: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Outline

Recap of graphical models & belief propagation

Posterior inference (Bayesian perspective)

Math: exponential family distributions

Variational InferenceBasic TechniqueExample: Topic Models

Page 3: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Recap from last time…

Page 4: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Graphical Models

𝑝𝑝 𝑥𝑥1, 𝑥𝑥2, 𝑥𝑥3, … , 𝑥𝑥𝑁𝑁 = �𝑖𝑖

𝑝𝑝 𝑥𝑥𝑖𝑖 𝜋𝜋(𝑥𝑥𝑖𝑖))

Directed Models (Bayesian networks)

Undirected Models (Markov random fields)

𝑝𝑝 𝑥𝑥1, 𝑥𝑥2, 𝑥𝑥3, … , 𝑥𝑥𝑁𝑁 =1𝑍𝑍�𝐶𝐶

𝜓𝜓𝐶𝐶 𝑥𝑥𝑐𝑐

Page 5: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Markov Blanket

x

Markov blanket of a node x is its parents, children, and

children's parents

𝑝𝑝 𝑥𝑥𝑖𝑖 𝑥𝑥𝑗𝑗≠𝑖𝑖 =𝑝𝑝(𝑥𝑥1, … , 𝑥𝑥𝑁𝑁)

∫ 𝑝𝑝 𝑥𝑥1, … , 𝑥𝑥𝑁𝑁 𝑑𝑑𝑥𝑥𝑖𝑖

=∏𝑘𝑘 𝑝𝑝(𝑥𝑥𝑘𝑘|𝜋𝜋 𝑥𝑥𝑘𝑘 )

∫ ∏𝑘𝑘 𝑝𝑝 𝑥𝑥𝑘𝑘 𝜋𝜋 𝑥𝑥𝑘𝑘 )𝑑𝑑𝑥𝑥𝑖𝑖factor out terms not dependent on xi

factorization of graph

=∏𝑘𝑘:𝑘𝑘=𝑖𝑖 or 𝑖𝑖∈𝜋𝜋 𝑥𝑥𝑘𝑘 𝑝𝑝(𝑥𝑥𝑘𝑘|𝜋𝜋 𝑥𝑥𝑘𝑘 )

∫ ∏𝑘𝑘:𝑘𝑘=𝑖𝑖 or 𝑖𝑖∈𝜋𝜋 𝑥𝑥𝑘𝑘 𝑝𝑝 𝑥𝑥𝑘𝑘 𝜋𝜋 𝑥𝑥𝑘𝑘 )𝑑𝑑𝑥𝑥𝑖𝑖

the set of nodes needed to form the complete conditional for a variable xi

Page 6: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Markov Random Fields withFactor Graph Notation

x: original pixel/state

y: observed (noisy)

pixel/state

factor nodes are added

according to maximal cliques

unaryfactor

variable

factor graphs are bipartite

binaryfactor

Page 7: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Two Problems for Undirected Models

Finding the normalizer

𝑍𝑍 = �𝑥𝑥

�𝑐𝑐

𝜓𝜓𝑐𝑐(𝑥𝑥𝑐𝑐)

Computing the marginals

𝑍𝑍𝑛𝑛(𝑣𝑣) = �𝑥𝑥:𝑥𝑥𝑛𝑛=𝑣𝑣

�𝑐𝑐

𝜓𝜓𝑐𝑐(𝑥𝑥𝑐𝑐)Q: Why are these difficult?

A: Many different combinations

Sum over all variable combinations, with the xn

coordinate fixed

𝑍𝑍2(𝑣𝑣) = �𝑥𝑥1

�𝑥𝑥3

�𝑐𝑐

𝜓𝜓𝑐𝑐(𝑥𝑥 = 𝑥𝑥1, 𝑣𝑣, 𝑥𝑥3 )

Example: 3 variables, fix the

2nd dimensionBelief propagation algorithms

• sum-product (forward-backward in HMMs)

• max-product/max-sum (Viterbi)

Page 8: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Sum-ProductFrom variables to factors

𝑞𝑞𝑛𝑛→𝑚𝑚 𝑥𝑥𝑛𝑛 = �𝑚𝑚′∈𝑀𝑀(𝑛𝑛)\𝑚𝑚

𝑟𝑟𝑚𝑚′→𝑛𝑛 𝑥𝑥𝑛𝑛

From factors to variables

𝑟𝑟𝑚𝑚→𝑛𝑛 𝑥𝑥𝑛𝑛= �

𝒘𝒘𝑚𝑚\𝑛𝑛

𝑓𝑓𝑚𝑚 𝒘𝒘𝑚𝑚 �𝑛𝑛′∈𝑁𝑁(𝑚𝑚)\𝑛𝑛

𝑞𝑞𝑛𝑛′→𝑚𝑚(𝑥𝑥𝑛𝑛𝑛)

n

m

n

m

set of variables that the mth factor depends on

set of factors in which variable n participates

sum over configuration of variables for the mth factor,

with variable n fixed

default value of 1 if empty product

Page 9: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Outline

Recap of graphical models & belief propagation

Posterior inference (Bayesian perspective)

Math: exponential family distributions

Variational InferenceBasic TechniqueExample: Topic Models

Page 10: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Goal: Posterior Inference

Hyperparameters αUnknown parameters ΘData:

Likelihood model:

p( | Θ )

pα( Θ | )

we’re going to be Bayesian (perform Bayesian inference)

Page 11: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Posterior Classification vs.Posterior Inference

“Frequentist” methods

prior over labels (maybe), not weights

Bayesian methods

Θ includes weight parameters

pα( Θ | )pα,w ( y| )

Page 12: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

(Some) Learning Techniques

MAP/MLE: Point estimation, basic EM

Variational Inference: Functional Optimization

Sampling/Monte Carlo

today

next class

what we’ve already covered

Page 13: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Outline

Recap of graphical models & belief propagation

Posterior inference (Bayesian perspective)

Math: exponential family distributions

Variational InferenceBasic TechniqueExample: Topic Models

Page 14: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Exponential Family Form

Page 15: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Exponential Family Form

Support function• Formally necessary, in practice

irrelevant

Page 16: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Exponential Family Form

Distribution Parameters• Natural parameters• Feature weights

Page 17: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Exponential Family Form

Feature function(s)• Sufficient statistics

Page 18: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Exponential Family Form

Log-normalizer

Page 19: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Exponential Family Form

Log-normalizer

Page 20: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Why? Capture Common Distributions

Discrete (Finite distributions)

Page 21: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Why? Capture Common Distributions

• Gaussian

https://kanbanize.com/blog/wp-content/uploads/2014/07/Standard_deviation_diagram.png

Page 22: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Why? Capture Common Distributions

Dirichlet (Distributions over (finite) distributions)

Page 23: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Why? Capture Common Distributions

Discrete (Finite distributions)

Dirichlet (Distributions over (finite) distributions)

Gaussian

Gamma, Exponential, Poisson, Negative-Binomial, Laplace, log-Normal,…

Page 24: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Why? “Easy” Gradients

Observed feature countsCount w.r.t. empirical distribution

Expected feature countsCount w.r.t. current model parameters

(we’ve already seen this with maxent models)

Page 25: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Why? “Easy” Expectations

expectation of the sufficient

statistics

gradient of the log normalizer

Page 26: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Why? “Easy” Posterior Inference

Page 27: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Why? “Easy” Posterior Inference

p is the conjugate prior for q

Page 28: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Why? “Easy” Posterior Inference

p is the conjugate prior for q

Posterior p has same form as prior p

Page 29: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Why? “Easy” Posterior Inference

p is the conjugate prior for q

Posterior p has same form as prior p

All exponential family models have a conjugate prior (in theory)

Page 30: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Why? “Easy” Posterior Inference

p is the conjugate prior for q

Posterior p has same form as prior p

Posterior Likelihood Prior

Dirichlet (Beta) Discrete (Bernoulli) Dirichlet (Beta)

Normal Normal (fixed var.) Normal

Gamma Exponential Gamma

Page 31: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Outline

Recap of graphical models & belief propagation

Posterior inference (Bayesian perspective)

Math: exponential family distributions

Variational InferenceBasic TechniqueExample: Topic Models

Page 32: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Goal: Posterior Inference

Hyperparameters αUnknown parameters ΘData:

Likelihood model:

p( | Θ )

pα( Θ | )

Page 33: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

(Some) Learning Techniques

MAP/MLE: Point estimation, basic EM

Variational Inference: Functional Optimization

Sampling/Monte Carlo

today

next class

what we’ve already covered

Page 34: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Variational Inference

Difficult to compute

Page 35: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Variational Inference

Difficult to compute

Minimize the “difference”

by changing λ

Easy(ier) to compute

q(θ): controlled by parameters λ

Page 36: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Variational Inference

Difficult to compute

Easy(ier) to compute

Minimize the “difference”

by changing λ

Page 37: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Variational Inference: A Gradient-Based Optimization Technique

Set t = 0Pick a starting value λt

Until converged:1. Get value y t = F(q(•;λt))2. Get gradient g t = F’(q(•;λt))3. Get scaling factor ρ t4. Set λt+1 = λt + ρt*g t5. Set t += 1

Page 38: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Variational Inference: A Gradient-Based Optimization Technique

Set t = 0Pick a starting value λt

Until converged:1. Get value y t = F(q(•;λt))2. Get gradient g t = F’(q(•;λt))3. Get scaling factor ρ t4. Set λt+1 = λt + ρt*g t5. Set t += 1

Page 39: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Variational Inference:The Function to Optimize

Posterior of desired model

Any easy-to-compute distribution

Page 40: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Variational Inference:The Function to Optimize

Posterior of desired model

Any easy-to-compute distribution

Find the best distribution (calculus of variations)

Page 41: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Variational Inference:The Function to Optimize

Find the best distribution

Parameters for desired model

Page 42: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Variational Inference:The Function to Optimize

Find the best distribution

Variational parameters for θ

Parameters for desired model

Page 43: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Variational Inference:The Function to Optimize

Find the best distribution

Variationalparameters for θ

Parameters for desired model

KL-Divergence (expectation)

DKL 𝑞𝑞 𝜃𝜃 || 𝑝𝑝(𝜃𝜃|𝑥𝑥) =

𝔼𝔼𝑞𝑞 𝜃𝜃 log𝑞𝑞 𝜃𝜃𝑝𝑝(𝜃𝜃|𝑥𝑥)

Page 44: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Variational Inference

Find the best distribution

Variational parameters for θ

Parameters for desired model

Page 45: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Exponential Family Recap: “Easy” Expectations

Exponential Family Recap: “Easy” Posterior Inference

p is the conjugate prior for π

Page 46: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Variational Inference

Find the best distribution

When p and q are the same exponential family form, the variational update q(θ) is (often) computable (in closed form)

Page 47: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Variational Inference: A Gradient-Based Optimization Technique

Set t = 0Pick a starting value λtLetF(q(•;λt)) = KL[q(•;λt) || p(•)]

Until converged:1. Get value y t = F(q(•;λt))2. Get gradient g t = F’(q(•;λt))3. Get scaling factor ρ t4. Set λt+1 = λt + ρt*g t5. Set t += 1

Page 48: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Variational Inference:Maximization or Minimization?

Page 49: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Evidence Lower Bound (ELBO)

log𝑝𝑝 𝑥𝑥 = log∫ 𝑝𝑝 𝑥𝑥,𝜃𝜃 𝑑𝑑𝜃𝜃

Page 50: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Evidence Lower Bound (ELBO)

log𝑝𝑝 𝑥𝑥 = log∫ 𝑝𝑝 𝑥𝑥,𝜃𝜃 𝑑𝑑𝜃𝜃

= log∫ 𝑝𝑝 𝑥𝑥,𝜃𝜃𝑞𝑞 𝜃𝜃𝑞𝑞(𝜃𝜃)

𝑑𝑑𝜃𝜃

Page 51: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Evidence Lower Bound (ELBO)

log𝑝𝑝 𝑥𝑥 = log∫ 𝑝𝑝 𝑥𝑥,𝜃𝜃 𝑑𝑑𝜃𝜃

= log∫ 𝑝𝑝 𝑥𝑥,𝜃𝜃𝑞𝑞 𝜃𝜃𝑞𝑞(𝜃𝜃)

𝑑𝑑𝜃𝜃

= log𝔼𝔼𝑞𝑞 𝜃𝜃𝑝𝑝 𝑥𝑥,𝜃𝜃𝑞𝑞 𝜃𝜃

Page 52: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Evidence Lower Bound (ELBO)

log𝑝𝑝 𝑥𝑥 = log∫ 𝑝𝑝 𝑥𝑥,𝜃𝜃 𝑑𝑑𝜃𝜃

= log∫ 𝑝𝑝 𝑥𝑥,𝜃𝜃𝑞𝑞 𝜃𝜃𝑞𝑞(𝜃𝜃)

𝑑𝑑𝜃𝜃

= log𝔼𝔼𝑞𝑞 𝜃𝜃𝑝𝑝 𝑥𝑥,𝜃𝜃𝑞𝑞 𝜃𝜃

≥ 𝔼𝔼𝑞𝑞 𝜃𝜃 𝑝𝑝 𝑥𝑥,𝜃𝜃 − 𝔼𝔼𝑞𝑞 𝜃𝜃 𝑞𝑞 𝜃𝜃= ℒ(𝑞𝑞)

Page 53: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Outline

Recap of graphical models & belief propagation

Posterior inference (Bayesian perspective)

Math: exponential family distributions

Variational InferenceBasic TechniqueExample: Topic Models

Page 54: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Bag-of-Items Models

Three people have been fatally shot, and five people, including a mayor, were seriously wounded as a result of a Shining Path attack today against a community in Junindepartment, central Peruvian mountain region . …

p( ) Three: 1,people: 2,attack: 2,

…p( )=Unigram counts

Page 55: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Bag-of-Items Models

Three people have been fatally shot, and five people, including a mayor, were seriously wounded as a result of a Shining Path attack today against a community in Junindepartment, central Peruvian mountain region . …

p( ) Three: 1,people: 2,attack: 2,

…pφ,ω( )=

Unigram counts

Global (corpus-level) parameters interact with local (document-level) parameters

Page 56: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Latent Dirichlet Allocation(Blei et al., 2003)

Per-document (unigram) word counts

Page 57: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Latent Dirichlet Allocation(Blei et al., 2003)

Per-document (unigram) word counts

Count of word j in document i

j

i

Page 58: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Latent Dirichlet Allocation(Blei et al., 2003)

Per-document (latent) topic usage

Per-document (unigram) word counts

Per-topic word usage

Count of word j in document i

j

i

K topics

Page 59: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Latent Dirichlet Allocation(Blei et al., 2003)

Per-document (latent) topic usage

Per-document (unigram) word counts

Per-topic word usage

~ Multinomial ~ Dirichlet ~ Dirichlet

(regularize/place priors)

Count of word j in document i

j

i

K topics

Page 60: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Latent Dirichlet Allocation(Blei et al., 2003)

Per-document

(latent) topic usage

Per-document (unigram) word counts

Per-topic word usage

Page 61: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Latent Dirichlet Allocation(Blei et al., 2003)

Per-document

(latent) topic usage

Per-document (unigram) word counts

Per-topic word usage

d

Page 62: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Latent Dirichlet Allocation(Blei et al., 2003)

Per-document

(latent) topic usage

Per-document (unigram) word counts

Per-topic word usage

d

Page 63: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Latent Dirichlet Allocation(Blei et al., 2003)

Per-document

(latent) topic usage

Per-document (unigram) word counts

Per-topic word usage

d

Page 64: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Latent Dirichlet Allocation(Blei et al., 2003)

Per-document

(latent) topic usage

Per-document (unigram) word counts

Per-topic word usage

d

Page 65: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Latent Dirichlet Allocation(Blei et al., 2003)

Per-document

(latent) topic usage

Per-document (unigram) word counts

Per-topic word usage

d

Page 66: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Variational Inference: LDirA

Topic usage

Per-document (unigram) word counts

Topic words

p: True model

𝜙𝜙𝑘𝑘 ∼ Dirichlet(𝜷𝜷)𝑤𝑤(𝑑𝑑,𝑛𝑛) ∼ Discrete(𝜙𝜙𝑧𝑧 𝑑𝑑,𝑛𝑛 )

𝜃𝜃(𝑑𝑑) ∼ Dirichlet(𝜶𝜶)𝑧𝑧(𝑑𝑑,𝑛𝑛) ∼ Discrete(𝜃𝜃(𝑑𝑑))

Page 67: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Variational Inference: LDirA

Topic usage

Per-document (unigram) word counts

Topic words

p: True model q: Mean-field approximation

𝜙𝜙𝑘𝑘 ∼ Dirichlet(𝜷𝜷)𝑤𝑤(𝑑𝑑,𝑛𝑛) ∼ Discrete(𝜙𝜙𝑧𝑧 𝑑𝑑,𝑛𝑛 )

𝜃𝜃(𝑑𝑑) ∼ Dirichlet(𝜶𝜶)𝑧𝑧(𝑑𝑑,𝑛𝑛) ∼ Discrete(𝜃𝜃(𝑑𝑑))

𝜙𝜙𝑘𝑘 ∼ Dirichlet(𝝀𝝀𝒌𝒌)

𝜃𝜃(𝑑𝑑) ∼ Dirichlet(𝜸𝜸𝒅𝒅)𝑧𝑧(𝑑𝑑,𝑛𝑛) ∼ Discrete(𝜓𝜓(𝑑𝑑,𝑛𝑛))

Page 68: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Variational Inference: A Gradient-Based Optimization Technique

Set t = 0Pick a starting value λtLetF(q(•;λt)) = KL[q(•;λt) || p(•)]

Until converged:1. Get value y t = F(q(•;λt))2. Get gradient g t = F’(q(•;λt))3. Get scaling factor ρ t4. Set λt+1 = λt + ρt*g t5. Set t += 1

Page 69: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Variational Inference: LDirA

𝜃𝜃(𝑑𝑑) ∼ Dirichlet(𝜶𝜶)𝑧𝑧(𝑑𝑑,𝑛𝑛) ∼ Discrete(𝜃𝜃(𝑑𝑑))

𝜃𝜃(𝑑𝑑) ∼ Dirichlet(𝜸𝜸𝒅𝒅)𝑧𝑧(𝑑𝑑,𝑛𝑛) ∼ Discrete(𝜓𝜓(𝑑𝑑,𝑛𝑛))

p: True model q: Mean-field approximation

𝔼𝔼𝑞𝑞(𝜃𝜃(𝑑𝑑)) log𝑝𝑝 𝜃𝜃(𝑑𝑑) | 𝛼𝛼

Page 70: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Variational Inference: LDirA

𝜃𝜃(𝑑𝑑) ∼ Dirichlet(𝜶𝜶)𝑧𝑧(𝑑𝑑,𝑛𝑛) ∼ Discrete(𝜃𝜃(𝑑𝑑))

𝜃𝜃(𝑑𝑑) ∼ Dirichlet(𝜸𝜸𝒅𝒅)𝑧𝑧(𝑑𝑑,𝑛𝑛) ∼ Discrete(𝜓𝜓(𝑑𝑑,𝑛𝑛))

p: True model q: Mean-field approximation

𝔼𝔼𝑞𝑞(𝜃𝜃(𝑑𝑑)) log𝑝𝑝 𝜃𝜃(𝑑𝑑) | 𝛼𝛼 =

𝔼𝔼𝑞𝑞(𝜃𝜃(𝑑𝑑)) 𝛼𝛼 − 1 𝑇𝑇 log𝜃𝜃(𝑑𝑑) + 𝐶𝐶

exponential family form of Dirichlet

𝑝𝑝 𝜃𝜃 =Γ(∑𝑘𝑘 𝛼𝛼𝑘𝑘)∏𝑘𝑘 Γ 𝛼𝛼𝑘𝑘

�𝑘𝑘

𝜃𝜃𝑘𝑘𝛼𝛼𝑘𝑘−1

params = 𝛼𝛼𝑘𝑘 − 1 𝑘𝑘suff. stats.= log𝜃𝜃𝑘𝑘 𝑘𝑘

Page 71: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Variational Inference: LDirA

𝜃𝜃(𝑑𝑑) ∼ Dirichlet(𝜶𝜶)𝑧𝑧(𝑑𝑑,𝑛𝑛) ∼ Discrete(𝜃𝜃(𝑑𝑑))

𝜃𝜃(𝑑𝑑) ∼ Dirichlet(𝜸𝜸𝒅𝒅)𝑧𝑧(𝑑𝑑,𝑛𝑛) ∼ Discrete(𝜓𝜓(𝑑𝑑,𝑛𝑛))

p: True model q: Mean-field approximation

𝔼𝔼𝑞𝑞(𝜃𝜃(𝑑𝑑)) log𝑝𝑝 𝜃𝜃(𝑑𝑑) | 𝛼𝛼 =

𝔼𝔼𝑞𝑞(𝜃𝜃(𝑑𝑑)) 𝛼𝛼 − 1 𝑇𝑇 log𝜃𝜃(𝑑𝑑) + 𝐶𝐶

expectation of sufficient statistics of q distribution

params = 𝛾𝛾𝑘𝑘 − 1 𝑘𝑘

suff. stats. = log𝜃𝜃𝑘𝑘 𝑘𝑘

Page 72: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Variational Inference: LDirA

𝜃𝜃(𝑑𝑑) ∼ Dirichlet(𝜶𝜶)𝑧𝑧(𝑑𝑑,𝑛𝑛) ∼ Discrete(𝜃𝜃(𝑑𝑑))

𝜃𝜃(𝑑𝑑) ∼ Dirichlet(𝜸𝜸𝒅𝒅)𝑧𝑧(𝑑𝑑,𝑛𝑛) ∼ Discrete(𝜓𝜓(𝑑𝑑,𝑛𝑛))

p: True model q: Mean-field approximation

𝔼𝔼𝑞𝑞(𝜃𝜃(𝑑𝑑)) log𝑝𝑝 𝜃𝜃(𝑑𝑑) | 𝛼𝛼 =

𝔼𝔼𝑞𝑞(𝜃𝜃(𝑑𝑑)) 𝛼𝛼 − 1 𝑇𝑇 log𝜃𝜃(𝑑𝑑) + 𝐶𝐶 =expectation of the

sufficient statistics is the gradient of the

log normalizer

𝛼𝛼 − 1 𝑇𝑇𝔼𝔼𝑞𝑞(𝜃𝜃(𝑑𝑑)) log𝜃𝜃(𝑑𝑑) + 𝐶𝐶

Page 73: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Variational Inference: LDirA

𝜃𝜃(𝑑𝑑) ∼ Dirichlet(𝜶𝜶)𝑧𝑧(𝑑𝑑,𝑛𝑛) ∼ Discrete(𝜃𝜃(𝑑𝑑))

𝜃𝜃(𝑑𝑑) ∼ Dirichlet(𝜸𝜸𝒅𝒅)𝑧𝑧(𝑑𝑑,𝑛𝑛) ∼ Discrete(𝜓𝜓(𝑑𝑑,𝑛𝑛))

p: True model q: Mean-field approximation

𝔼𝔼𝑞𝑞(𝜃𝜃(𝑑𝑑)) log𝑝𝑝 𝜃𝜃(𝑑𝑑) | 𝛼𝛼 =

𝔼𝔼𝑞𝑞(𝜃𝜃(𝑑𝑑)) 𝛼𝛼 − 1 𝑇𝑇 log𝜃𝜃(𝑑𝑑) + 𝐶𝐶 =expectation of the

sufficient statistics is the gradient of the

log normalizer

𝛼𝛼 − 1 𝑇𝑇𝛻𝛻𝛾𝛾𝑑𝑑𝐴𝐴 𝛾𝛾𝑑𝑑 − 1 + 𝐶𝐶

Page 74: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Variational Inference: LDirA

𝜃𝜃(𝑑𝑑) ∼ Dirichlet(𝜶𝜶)𝑧𝑧(𝑑𝑑,𝑛𝑛) ∼ Discrete(𝜃𝜃(𝑑𝑑))

𝜃𝜃(𝑑𝑑) ∼ Dirichlet(𝜸𝜸𝒅𝒅)𝑧𝑧(𝑑𝑑,𝑛𝑛) ∼ Discrete(𝜓𝜓(𝑑𝑑,𝑛𝑛))

p: True model q: Mean-field approximation

𝔼𝔼𝑞𝑞(𝜃𝜃(𝑑𝑑)) log𝑝𝑝 𝜃𝜃(𝑑𝑑) | 𝛼𝛼 = 𝛼𝛼 − 1 𝑇𝑇𝛻𝛻𝛾𝛾𝑑𝑑𝐴𝐴 𝛾𝛾𝑑𝑑 − 1 + 𝐶𝐶

ℒ �𝛾𝛾𝑑𝑑

= 𝛼𝛼 − 1 𝑇𝑇𝛻𝛻𝛾𝛾𝑑𝑑𝐴𝐴 𝛾𝛾𝑑𝑑 − 1 + 𝑀𝑀 𝛾𝛾𝑑𝑑there’s more math

to do!

Page 75: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Variational Inference: A Gradient-Based Optimization Technique

Set t = 0Pick a starting value λtLetF(q(•;λt)) = KL[q(•;λt) || p(•)]

Until converged:1. Get value y t = F(q(•;λt))2. Get gradient g t = F’(q(•;λt))3. Get scaling factor ρ t4. Set λt+1 = λt + ρt*g t5. Set t += 1

Page 76: Approximate Inference: Variational InferenceVariational Inference CMSC 678 UMBC Outline Recap of graphical models & belief propagation Posterior inference (Bayesian perspective) Math:

Variational Inference: LDirA

𝜃𝜃(𝑑𝑑) ∼ Dirichlet(𝜶𝜶)𝑧𝑧(𝑑𝑑,𝑛𝑛) ∼ Discrete(𝜃𝜃(𝑑𝑑))

𝜃𝜃(𝑑𝑑) ∼ Dirichlet(𝜸𝜸𝒅𝒅)𝑧𝑧(𝑑𝑑,𝑛𝑛) ∼ Discrete(𝜓𝜓(𝑑𝑑,𝑛𝑛))

p: True model q: Mean-field approximation

ℒ �𝛾𝛾𝑑𝑑

= 𝛼𝛼 − 1 𝑇𝑇𝛻𝛻𝛾𝛾𝑑𝑑𝐴𝐴 𝛾𝛾𝑑𝑑 − 1 + 𝑀𝑀 𝛾𝛾𝑑𝑑

𝛻𝛻𝛾𝛾𝑑𝑑ℒ �𝛾𝛾𝑑𝑑= 𝛼𝛼 − 1 𝑇𝑇𝛻𝛻𝛾𝛾𝑑𝑑

2 𝐴𝐴 𝛾𝛾𝑑𝑑 − 1 + 𝛻𝛻𝛾𝛾𝑑𝑑𝑀𝑀 𝛾𝛾𝑑𝑑