Variational Algorithms for Marginal MAP

Qiang Liu, Alexander Ihler

Department of Computer Science, University of California, Irvine

Abstract

Marginal MAP tasks seek an optimal configuration of the marginal distribution over a subset of variables. Marginal MAP can be computationally much harder than more common inference tasks.

We show
• a general variational framework for marginal MAP problems
• analogues to Bethe, tree-reweighted, & mean-field approximations
• novel upper bounds via the tree-reweighted free energy
• “mixed” message passing and CCCP-based solvers
• conditions for global or local optimality of the solutions
• close connections to EM and variational EM approaches

Variational Form

Graphical Models

Graphical models:
• Factors & exponential family form
• Factors are associated with cliques of a graph G=(V,E)

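Tasks, in order of increasing difficulty: max-inference, sum-inference, and mixed inference (max over the variables B, sum over the variables A).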

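Mixed inference problems can be hard even in trees, since the max and sum operators do not commute, which restricts the order in which variables can be eliminated.

• A-B trees extend the notion of efficient structure to mixed inference
• Ensure the graph structure remains a tree during inference
• Two example sub-types: “Type 1” and “Type 2”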

[Figure: tree with sum and max nodes. Example from D. Koller and N. Friedman (2009).]

Mixed-Inference (marginal MAP, MAP)

Sum-Inference (partition function, probability of evidence)

Max-Inference (MAP, MPE)
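To make the three tasks concrete, the sketch below evaluates each of them by brute-force enumeration on a tiny chain. The model, its parameter values, and the choice A = {x0, x2}, B = {x1} are all hypothetical and chosen only for illustration; enumeration is feasible only at this toy size.

```python
import itertools
import math

# Hypothetical pairwise model on a binary chain x0 - x1 - x2.
# All parameter values are made up for illustration.
theta_single = [[0.0, 0.2], [0.1, 0.0], [-0.3, 0.3]]
theta_pair = {
    (0, 1): [[0.5, -0.2], [-0.2, 0.8]],
    (1, 2): [[0.3, 0.1], [0.1, -0.4]],
}

def theta(x):
    """Total log-potential: singleton terms plus pairwise terms."""
    total = sum(theta_single[i][xi] for i, xi in enumerate(x))
    total += sum(t[x[i]][x[j]] for (i, j), t in theta_pair.items())
    return total

def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

states = list(itertools.product([0, 1], repeat=3))

# Max-inference: maximize over every variable.
max_value = max(theta(x) for x in states)

# Sum-inference: log partition function.
log_z = logsumexp([theta(x) for x in states])

# Mixed inference: sum over A = {x0, x2}, then maximize over B = {x1}.
def log_marginal(b):
    return logsumexp([theta((a0, b, a2)) for a0 in (0, 1) for a2 in (0, 1)])

mixed_value = max(log_marginal(b) for b in (0, 1))

print(max_value, mixed_value, log_z)
```

On any model the three values satisfy max ≤ mixed ≤ sum, since replacing a sum of positive terms by a max can only lower the result.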

Variational Algorithms

Mixed-product message types (max nodes B, sum nodes A):
• A → A ∪ B: sum-product
• B → B: max-product
• B → A: match max and sum

Mixed-product message passing
• Start with “standard” weighted message passing
• Generalize zero-temperature limit results of Weiss et al. (2007)
• Apply limit directly to messages (for Bethe and for TRW)
• Match updates interpretable as a “local” marginal MAP problem
• Mixed marginals satisfy a reparameterization property
• Fixed points are locally optimal (similar to max-product results)
• Convergence can be an issue
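The zero-temperature limit mentioned above rests on a simple numerical fact: a "powered sum" of positive numbers tends to their maximum as the temperature goes to zero. The sketch below illustrates only that fact; the function name `power_sum` and the sample values are assumptions for illustration, and this is not the poster's message update itself.

```python
def power_sum(values, eps):
    """Compute (sum_i v_i**(1/eps))**eps for positive v_i.

    The maximum is factored out so the ratios stay in (0, 1]
    and small eps does not overflow.
    """
    m = max(values)
    return m * sum((v / m) ** (1.0 / eps) for v in values) ** eps

values = [0.2, 0.5, 0.9, 0.7]  # made-up positive numbers
for eps in (1.0, 0.5, 0.1, 0.01):
    # eps = 1 gives the ordinary sum; eps -> 0 approaches max(values).
    print(eps, power_sum(values, eps))
```

With `eps = 1` the operator is an ordinary sum, and as `eps` shrinks it behaves like a max, which is why one weighted update can interpolate between sum-product and max-product behavior.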

Double-loop algorithms
• Decompose H into two parts, H = H+ - H-, and iteratively linearize H-
• CCCP algorithm: take H+, H- to be convex
• Can also take H+ to be the Bethe approximation (non-convex)
• Iteratively solve sum-product and apply truncation correction
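The "decompose and iteratively linearize" pattern can be seen on a one-dimensional toy problem. The sketch below applies the generic concave-convex procedure to f(x) = x^4 - 2x^2, split as a convex part minus a convex part; the function and the split are assumptions chosen so the inner problem has a closed form, and they are unrelated to the free energies on the poster.

```python
import math

def f_plus(x):
    """Convex part that is kept as is."""
    return x ** 4

def f_minus(x):
    """Convex part that is subtracted (and linearized each round)."""
    return 2.0 * x ** 2

def grad_f_minus(x):
    return 4.0 * x

def cccp_step(x_t):
    """Minimize f_plus(x) - grad_f_minus(x_t) * x in closed form.

    Setting the derivative 4 x^3 - g to zero gives x = cbrt(g / 4).
    """
    g = grad_f_minus(x_t)
    return math.copysign((abs(g) / 4.0) ** (1.0 / 3.0), g)

def cccp(x0, iters=50):
    """Run the outer loop and return every iterate."""
    xs = [x0]
    for _ in range(iters):
        xs.append(cccp_step(xs[-1]))
    return xs

xs = cccp(0.1)
objective = [f_plus(x) - f_minus(x) for x in xs]
print(xs[-1], objective[-1])
```

Each outer step solves a convex surrogate, so the objective never increases. In the double-loop solvers described above, the inner solve would be a sum-product problem rather than a closed-form root.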

[Figure: the two example A-B tree sub-types, “Type 1” and “Type 2”]

Connections to EM
• Restrict to the mean-field-like product subspace
• Coordinate-wise updates = EM in the primal
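A minimal sketch of EM applied to marginal MAP, assuming a single sum variable a and a single max variable b with a hypothetical unnormalized joint table `f[a][b]`. The E-step sets q(a) = p(a | b), and the M-step picks the b that maximizes the expected log-potential under q. The table values and function names are made up for illustration.

```python
import math

# Hypothetical unnormalized joint f[a][b] = exp(theta(a, b)).
f = [
    [1.0, 2.0, 0.5],
    [0.5, 1.0, 3.0],
    [2.0, 4.0, 1.0],
]
n_a, n_b = 3, 3

def log_marginal(b):
    """Marginal MAP objective: log sum_a f(a, b)."""
    return math.log(sum(f[a][b] for a in range(n_a)))

def em_step(b):
    # E-step: posterior over the sum variable given the current b.
    z = sum(f[a][b] for a in range(n_a))
    q = [f[a][b] / z for a in range(n_a)]

    # M-step: maximize the expected log-potential over b.
    def expected_log_f(b_new):
        return sum(q[a] * math.log(f[a][b_new]) for a in range(n_a))

    return max(range(n_b), key=expected_log_f)

def run_em(b0, max_iters=20):
    """Iterate until b stops changing or max_iters is reached."""
    b = b0
    for _ in range(max_iters):
        b_next = em_step(b)
        if b_next == b:
            break
        b = b_next
    return b

for b0 in range(n_b):
    print(b0, "->", run_em(b0))
```

Each step cannot decrease the objective, but the iteration can stop at a local optimum: on this table, starting from b = 2 stays at b = 2 even though b = 1 has the larger marginal.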

• Reformulate inference as a distributional optimization problem

• Define and

Variational forms for Sum-Inference, Max-Inference, and Mixed-Inference (this work):

• Sum-inference: the variational bound holds with equality when q = p
• Mixed-inference: the variational bound holds with equality when q = p(A|B) 1(B=B*) or similar
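For reference, the three variational forms can be written as follows. This is a sketch in standard notation; the symbols are assumptions, since the poster's own equations did not survive extraction. Here θ(x) is the log-potential, E_q is the expectation under q, H(q) is the entropy, and H(A | B; q) is the conditional entropy of A given B under q.

```latex
% Notation is assumed; it is not recoverable from the extracted text.
\begin{align}
\text{Sum-inference:}   \quad & \log \sum_{x} \exp\theta(x)
    = \max_{q} \big\{ \mathbb{E}_q[\theta(x)] + H(q) \big\} \\
\text{Max-inference:}   \quad & \max_{x} \theta(x)
    = \max_{q} \, \mathbb{E}_q[\theta(x)] \\
\text{Mixed-inference:} \quad & \max_{x_B} \log \sum_{x_A} \exp\theta(x)
    = \max_{q} \big\{ \mathbb{E}_q[\theta(x)] + H(A \mid B;\, q) \big\}
\end{align}
```

The only difference between the three lines is the entropy term: the full entropy for sum, none for max, and the conditional entropy of the sum variables for mixed inference.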

Variational Approximations

Bethe approximation (exact on A-B tree)
• “Truncated” free energy

Tree-reweighted approximation (convex combination of A-B trees)
• Dual in terms of edge appearances

Experiments

Chain graphs
• G_A is a tree
• TRW1: type-1 only
• TRW2: ½ type-1, ½ type-2
• Bethe: most accurate
• EM: stuck quickly (2-3 iter.)

Grid graphs
• Attractive or mixed potentials
• G_A has cycles
• Similar trends

[Figure: results for attractive and mixed potentials; axes: % correct solutions and energy relative error]