


Weighted log-partition function

Covering graph:

• Weighted log-partition

has derivatives

where q(.) is defined by a chain rule:

• Tightening the bound:

• related to TRBP and reparameterization

• Important, but largely unexplored
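The derivatives mentioned above can be written out as follows. This is a reconstruction, not the poster's own equations: it assumes the weighted log-partition function is the sequential weighted sum over an elimination order x_1, ..., x_n, and the symbols Z_i and H are introduced here for illustration.

```latex
% Weighted (power) sum, for positive f and weight w \neq 0
\sum_{x}^{w} f(x) \;=\; \Big( \sum_{x} f(x)^{1/w} \Big)^{w}

% Assumed definition of the weighted log-partition function
\Phi_w(\theta) \;=\; \log \sum_{x_n}^{w_n} \cdots \sum_{x_1}^{w_1} \exp\big(\theta(x)\big)

% Partial weighted sums (notation assumed)
Z_0(x) = \exp\big(\theta(x)\big), \qquad
Z_i(x_{i+1:n}) = \sum_{x_i}^{w_i} Z_{i-1}(x_{i:n})

% q(.) built by a chain rule over the elimination order
q(x) = \prod_{i=1}^{n} q(x_i \mid x_{i+1:n}), \qquad
q(x_i \mid x_{i+1:n}) = \Big( \frac{Z_{i-1}(x_{i:n})}{Z_i(x_{i+1:n})} \Big)^{1/w_i}

% Derivatives with respect to parameters and weights
\frac{\partial \Phi_w}{\partial \theta(x)} = q(x), \qquad
\frac{\partial \Phi_w}{\partial w_i} = H(x_i \mid x_{i+1:n};\, q)
```

For a factor parameter θ_α(x_α), the first derivative becomes the marginal q(x_α). With all w_i = 1, q(x) is the usual normalized distribution and Φ_w is the ordinary log-partition function.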

Bounding the Partition Function Using Hölder’s Inequality

Qiang Liu, Alexander Ihler

Department of Computer Science, University of California, Irvine

Graphical models

Markov random fields

• Factorized form

• Factors are associated with cliques of a graph G=(V,E)

Task: calculate the partition function Z, or

• Important: probability of evidence, parameter estimation

• #P-complete in general graphs

• Approximations and bounds are needed
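To make the task concrete, here is a minimal brute-force sketch that computes Z by enumerating every joint state. The factor representation, the helper names `partition_function` and `ising_pair`, and the 3-variable model are all hypothetical choices for illustration. The loop visits num_states^n configurations, which is why approximations and bounds are needed for larger graphs.

```python
import itertools
import math

def partition_function(num_vars, factors, num_states=2):
    """Brute-force Z = sum over x of the product of all factor values.

    factors: list of (scope, table) pairs; scope is a tuple of variable
    indices and table maps a tuple of their states to a positive value.
    """
    z = 0.0
    for x in itertools.product(range(num_states), repeat=num_vars):
        weight = 1.0
        for scope, table in factors:
            weight *= table[tuple(x[i] for i in scope)]
        z += weight
    return z

def ising_pair(coupling):
    """Pairwise table: exp(coupling) if states agree, exp(-coupling) if not."""
    return {(a, b): math.exp(coupling if a == b else -coupling)
            for a in (0, 1) for b in (0, 1)}

# Hypothetical 3-variable cycle with mixed (positive and negative) couplings.
factors = [((0, 1), ising_pair(0.5)),
           ((1, 2), ising_pair(-0.3)),
           ((0, 2), ising_pair(0.8))]
Z = partition_function(3, factors)
print("Z =", Z, "log Z =", math.log(Z))
```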

Variational methods

Dual representation

• Loopy belief propagation

• Tree-reweighted belief propagation

• Generalizes to hypertrees, GTRBP

• Conditional entropy decomposition

• Generalizes to weighted combinations of orders

Comments

• Relatively good bounds at convergence

• Bound not guaranteed until convergence

• Hard to choose weights & cliques, especially for GTRBP and CED

Mini-bucket elimination

Bucket elimination (variable elimination)

• Directly sum over the variables in sequence

• Cost is exponential in the tree-width
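The two bullets above can be sketched in a few lines. This is an illustrative implementation under assumed conventions, not the authors' code: a factor is a hypothetical (scope, table) pair, and the model is a small 3-variable cycle. The intermediate product built for each bucket is what makes the cost exponential in the width of the elimination order.

```python
import itertools
import math

def multiply(f, g, num_states=2):
    """Product of two factors; a factor is (scope tuple, dict table)."""
    (fs, ft), (gs, gt) = f, g
    scope = tuple(sorted(set(fs) | set(gs)))
    table = {}
    for x in itertools.product(range(num_states), repeat=len(scope)):
        assign = dict(zip(scope, x))
        table[x] = (ft[tuple(assign[v] for v in fs)]
                    * gt[tuple(assign[v] for v in gs)])
    return scope, table

def sum_out(f, var):
    """Sum a factor over one variable in its scope."""
    scope, table = f
    keep = tuple(v for v in scope if v != var)
    out = {}
    for x, val in table.items():
        key = tuple(xi for v, xi in zip(scope, x) if v != var)
        out[key] = out.get(key, 0.0) + val
    return keep, out

def bucket_elimination(factors, order, num_states=2):
    """Eliminate variables in `order`; returns the partition function Z."""
    factors = list(factors)
    for var in order:
        bucket = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        if not bucket:
            # A variable in no factor contributes a constant multiplier.
            factors.append(((), {(): float(num_states)}))
            continue
        prod = bucket[0]
        for f in bucket[1:]:
            prod = multiply(prod, f, num_states)
        factors.append(sum_out(prod, var))
    z = 1.0
    for _, table in factors:  # all remaining scopes are empty
        z *= table[()]
    return z

def ising_pair(i, j, coupling):
    """Hypothetical pairwise factor on binary variables i and j."""
    table = {(a, b): math.exp(coupling if a == b else -coupling)
             for a in (0, 1) for b in (0, 1)}
    return (i, j), table

factors = [ising_pair(0, 1, 0.5), ising_pair(1, 2, -0.3), ising_pair(0, 2, 0.8)]
Z_be = bucket_elimination(factors, order=[0, 1, 2])
print("Z =", Z_be)
```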

Mini-bucket elimination (MBE) approximates BE

• Gives upper or lower bound

Comments

• Low accuracy for small clique sizes (ibound)

• Single pass, non-iterative

• Easy to implement with high ibound
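A minimal sketch of the bound behind a single mini-bucket split, assuming a bucket for variable x that holds two positive factors f(x, a) and g(x, b). The tables are random and hypothetical. Instead of forming the joint function of (a, b), the split sums over x in one mini-bucket and maximizes (or minimizes) over x in the other.

```python
import itertools
import random

random.seed(0)
K = 3  # states per variable (hypothetical)

# Two positive factors sharing x, indexed as f[x][a] and g[x][b].
f = [[random.uniform(0.1, 2.0) for _ in range(K)] for _ in range(K)]
g = [[random.uniform(0.1, 2.0) for _ in range(K)] for _ in range(K)]

def exact(a, b):
    """Exact bucket message: sum over x of f(x, a) * g(x, b)."""
    return sum(f[x][a] * g[x][b] for x in range(K))

def mini_bucket_upper(a, b):
    """Upper bound: sum in one mini-bucket, max in the other."""
    return sum(f[x][a] for x in range(K)) * max(g[x][b] for x in range(K))

def mini_bucket_lower(a, b):
    """Lower bound: sum in one mini-bucket, min in the other."""
    return sum(f[x][a] for x in range(K)) * min(g[x][b] for x in range(K))

bounds_hold = all(
    mini_bucket_lower(a, b) <= exact(a, b) <= mini_bucket_upper(a, b)
    for a, b in itertools.product(range(K), repeat=2)
)
print("bounds hold for every (a, b):", bounds_hold)
```

The bounding messages factor into a function of a times a function of b, so their scope never grows beyond the mini-bucket size. That is the saving that ibound controls.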

Splitting

ibound: controls clique size, & how much splitting is required

(Distributive law)

Hölder’s inequality

Define the weighted (or power) sum:

• has “zero-temperature” limits

• Hölder’s inequality:

• if some weights are negative, the bound reverses:
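The equations for this list did not survive extraction. The block below is a reconstruction from the standard form of these results, so the exact notation on the poster may differ. The reversed case is stated for the classical setting in which exactly one weight is positive.

```latex
% Weighted (power) sum of a positive function f with weight w \neq 0
\sum_{x}^{w} f(x) \;=\; \Big( \sum_{x} f(x)^{1/w} \Big)^{w}

% "Zero-temperature" limits
\lim_{w \to 0^{+}} \sum_{x}^{w} f(x) = \max_{x} f(x), \qquad
\lim_{w \to 0^{-}} \sum_{x}^{w} f(x) = \min_{x} f(x)

% Hoelder's inequality, for w_i > 0 and w = \sum_i w_i
\sum_{x}^{w} \prod_{i} f_i(x) \;\le\; \prod_{i} \sum_{x}^{w_i} f_i(x)

% Reversed form, for one positive weight, the rest negative, and w = \sum_i w_i > 0
\sum_{x}^{w} \prod_{i} f_i(x) \;\ge\; \prod_{i} \sum_{x}^{w_i} f_i(x)
```

With w = 1 and two factors, the first inequality is the familiar Hölder inequality with exponents p = 1/w_1 and q = 1/w_2.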

Weighted mini-bucket (WMBE):

• Same procedure as naïve MBE

• sum/max bounds replaced with weighted sums

• reduces to MBE if w → 0+ or 0−
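A small numerical check of the weighted-sum bound on one split. It assumes the power-sum definition (sum_x f(x)^(1/w))^w, and the function name `weighted_sum` and the random vectors are hypothetical. It compares a weighted bound with the naive sum/max bound and confirms the limiting behaviour as w approaches 0 from either side.

```python
import math
import random

def weighted_sum(values, w):
    """Power sum (sum_x f(x)^(1/w))^w for positive values and w != 0.

    Computed in log space so that very small |w| does not overflow.
    """
    logs = [math.log(v) / w for v in values]
    m = max(logs)
    return math.exp(w * (m + math.log(sum(math.exp(l - m) for l in logs))))

random.seed(1)
f = [random.uniform(0.1, 2.0) for _ in range(5)]
g = [random.uniform(0.1, 2.0) for _ in range(5)]

exact = sum(a * b for a, b in zip(f, g))

# Weighted upper bound with positive weights (0.5, 0.5) summing to 1.
wmbe_upper = weighted_sum(f, 0.5) * weighted_sum(g, 0.5)

# Naive mini-bucket upper bound: the limit of weights (1, 0+).
mbe_upper = sum(f) * max(g)

# Reversed bound with weights (1.5, -0.5), one positive and one negative.
wmbe_lower = weighted_sum(f, 1.5) * weighted_sum(g, -0.5)

print("lower", wmbe_lower, "exact", exact, "upper", wmbe_upper, "naive", mbe_upper)
```

Which weights give the tightest bound depends on the factors. Optimizing over them is the question the poster raises next.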

How to choose the weights and split the parameters?

[Figure: original graph]

Duality results

Dual form of the weighted log-partition function:

• µ-optimal bound is

Comments:

• µ-optimal bound is equivalent to TRBP (or more generally CED)

• More compact representation:

• Fewer parameters (others held at optimal values)

• Simple & efficient weight optimization
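The dual form itself is missing from the extracted text. The single-variable identity below is a standard fact and can be checked directly. The multi-variable line is a plausible reconstruction for positive weights, offered as an assumption rather than as the poster's exact statement. The symbols M (the marginal polytope) and H (entropy) are introduced here.

```latex
% Single variable, w > 0: the weighted log-sum is a scaled log-sum-exp
w \log \sum_{x} \exp\big(\theta(x)/w\big)
  \;=\; \max_{\mu \in \Delta} \Big\{ \sum_{x} \theta(x)\,\mu(x) + w\,H(\mu) \Big\},
\qquad \mu^{*}(x) \propto \exp\big(\theta(x)/w\big)

% Assumed multi-variable form for elimination order x_1, ..., x_n and w_i > 0
\Phi_w(\theta) \;=\; \max_{\mu \in \mathbb{M}}
  \Big\{ \langle \theta, \mu \rangle + \sum_{i} w_i\, H(x_i \mid x_{i+1:n};\, \mu) \Big\}
```

Read this way, the weights scale a sum of conditional entropies, which is consistent with the stated equivalence to TRBP and CED.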

[Figure: 3×3 grid, covering tree and spanning trees]

Experiments

10x10 Ising grids

• random

• mixed interactions

• A few iterations are usually good enough

• ibound is the most dominant factor

• Optimizing w can be better than optimizing θ

Linkage analysis

• from UAI 2008 competition

• 300-1000 nodes, treewidth 20-30

[Figure labels: ibound=5; pedigree13; ibound=15]

(A natural extension of the log-partition function)

[Legend: Mini-bucket; θ-optimized, one pass; w-optimized, one pass; Both fully optimized]

Timing comparisons

(Wainwright & Jordan 08)

(Yedidia et al. 04)

(Wainwright et al. 05)

(Globerson & Jaakkola 07)

(Dechter & Rish 03)