Should We Believe in Numbers Computed
by Loopy Belief Propagation?

Pascal O. Vontobel
Information Theory Research Group
Hewlett-Packard Laboratories, Palo Alto
CIOG Workshop, Princeton University, November 3, 2011
c© 2011 Hewlett-Packard Development Company, L.P.
The information contained herein is subject to change without notice
Chess Board

Question: in how many ways can we place 8 non-attacking rooks
on a chess board?

Row condition: exactly one rook per row.

Column condition: exactly one rook per column.

Question: in how many ways can we place 8 non-attacking rooks
on this modified chess board?
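Since a valid configuration has exactly one rook per row and exactly one per column, configurations correspond to permutations. A small brute-force sketch (function and variable names are illustrative, not from the talk):

```python
from itertools import permutations

def count_rook_placements(allowed):
    """Count placements of n non-attacking rooks on an n x n board.

    One rook per row and per column means each placement is a
    permutation sigma, with the rook of row i in column sigma(i);
    allowed[i][j] is True if square (i, j) may carry a rook.
    """
    n = len(allowed)
    return sum(
        all(allowed[i][sigma[i]] for i in range(n))
        for sigma in permutations(range(n))
    )

# Full 8x8 board: every permutation is valid, so the count is 8! = 40320.
full_board = [[True] * 8 for _ in range(8)]
print(count_rook_placements(full_board))  # 40320
```

Blocking squares of the modified board just sets the corresponding `allowed[i][j]` entries to `False`.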
Sudoku

[9 × 9 Sudoku grid figure]

Row condition: the numbers 1, . . . , 9 appear exactly once in every row.

Column condition: the numbers 1, . . . , 9 appear exactly once in every column.

Sub-block condition: the numbers 1, . . . , 9 appear exactly once in every 3 × 3 sub-block.

Question: how many Sudoku arrays are there?
(More technically: how many valid configurations are there?)
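The three conditions can be checked mechanically. A minimal validity-check sketch (the shifted-pattern grid used below is a standard construction chosen for illustration, not the grid from the slides):

```python
def is_valid_sudoku(grid):
    """Check the three Sudoku conditions on a completed 9x9 grid:
    every row, every column, and every 3x3 sub-block must contain
    each of the numbers 1..9 exactly once."""
    digits = set(range(1, 10))
    rows = [list(row) for row in grid]
    cols = [[grid[i][j] for i in range(9)] for j in range(9)]
    blocks = [
        [grid[3 * bi + i][3 * bj + j] for i in range(3) for j in range(3)]
        for bi in range(3) for bj in range(3)
    ]
    return all(set(group) == digits for group in rows + cols + blocks)

# A valid grid built from a standard shift pattern (illustrative).
pattern = [[(3 * (i % 3) + i // 3 + j) % 9 + 1 for j in range(9)]
           for i in range(9)]
print(is_valid_sudoku(pattern))  # True
```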
Other Sudoku Setups
1D constraints in communications

RLL Constraints

[figure: a binary sequence satisfying the constraint]

A (d, k) RLL constraint imposes:

At least d zero symbols between two ones.

At most k zero symbols between two ones.

Question: how many sequences of length T fulfill these constraints?

Answer: typically, the answer to such questions looks like

N(T) = exp(C · T + o(T)).

C: “capacity” or “entropy.”
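For small T the count N(T) can be obtained by brute force, and log2 N(T)/T then approaches the capacity C. A sketch, using the convention (an assumption here; conventions vary) that zero-runs strictly between two ones must have length in [d, k], while border zero-runs are only capped at k:

```python
from itertools import product
from math import log2

def satisfies_rll(seq, d, k):
    """True if seq (a 0/1 tuple) fulfills the (d, k) RLL constraint
    under the convention described above."""
    parts = "".join(map(str, seq)).split("1")  # zero-runs between ones
    if len(parts) == 1:                        # no ones at all
        return len(parts[0]) <= k
    if len(parts[0]) > k or len(parts[-1]) > k:
        return False
    return all(d <= len(p) <= k for p in parts[1:-1])

def count_rll(T, d, k):
    """N(T): brute-force count over all 2^T binary sequences."""
    return sum(satisfies_rll(seq, d, k)
               for seq in product((0, 1), repeat=T))

# The normalized logarithm log2(N(T))/T estimates the capacity C.
for T in (4, 8, 12):
    n = count_rll(T, d=1, k=3)
    print(T, n, log2(n) / T)
```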
2D constraints in communications

Two-Dimensional RLL Constraints

A (d1, k1; d2, k2) RLL constraint imposes:

. . .

. . .

Question: how many arrays of size m × n fulfill these constraints?

Answer: typically, the answer to such questions looks like

N(m, n) = exp(C · mn + o(mn)).

C: “capacity” or “entropy.”
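The 2D counting question can be made concrete on tiny boards. A brute-force sketch for the special case of at least one zero between ones in every row and every column with no upper bound (the “hard square” constraint), chosen here purely as an illustration:

```python
from itertools import product

def count_hard_square(m, n):
    """Count m x n binary arrays with no two 1s adjacent horizontally
    or vertically. Brute force over all 2^(m*n) arrays, so only
    suitable for tiny m and n."""
    count = 0
    for bits in product((0, 1), repeat=m * n):
        a = [bits[i * n:(i + 1) * n] for i in range(m)]
        ok = all(a[i][j] + a[i][j + 1] < 2
                 for i in range(m) for j in range(n - 1)) \
            and all(a[i][j] + a[i + 1][j] < 2
                    for i in range(m - 1) for j in range(n))
        count += ok
    return count

print(count_hard_square(2, 2))  # 7: empty, 4 singletons, 2 diagonal pairs
```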
Towards a graphical model

Towards a Graphical Model

Question: in how many ways can we place 8 non-attacking rooks
on a chess board?

Associate a variable with every square of the board:
A1,1, A1,2, A1,3, . . . , A8,8.

The constraints become local functions, e.g.

grow,8(a8,1, . . . , a8,8) := 1 if row 8 contains exactly one rook, 0 otherwise,

gcol,8(a1,8, . . . , a8,8) := 1 if column 8 contains exactly one rook, 0 otherwise.

[factor graph: the variable nodes A1,1, . . . , A8,8, each connected to its
row-constraint node grow,i and its column-constraint node gcol,j]

Global function:

g(a1,1, . . . , a8,8) = ∏j gcol,j(a1,j, . . . , a8,j) × ∏i grow,i(ai,1, . . . , ai,8)

Total sum (partition function):

Z = ∑a1,1,...,a8,8 g(a1,1, . . . , a8,8)

Use of loopy belief propagation for approximating Z?
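For a small board, the total sum Z can be evaluated directly from the global function. A brute-force sketch for a 3 × 3 board (names are illustrative):

```python
from itertools import product

n = 3  # a 3x3 board keeps the 2^(n*n) brute-force sum tiny

def g(a):
    """Global function: product of the row and column indicator
    functions; equals 1 exactly for the valid rook configurations."""
    value = 1
    for i in range(n):  # g_row,i: exactly one rook in row i
        value *= 1 if sum(a[i]) == 1 else 0
    for j in range(n):  # g_col,j: exactly one rook in column j
        value *= 1 if sum(a[i][j] for i in range(n)) == 1 else 0
    return value

# Partition function: sum of g over all 2^(n*n) configurations.
Z = 0
for bits in product((0, 1), repeat=n * n):
    a = [bits[i * n:(i + 1) * n] for i in range(n)]
    Z += g(a)
print(Z)  # 6 = 3!, the number of non-attacking rook placements
```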
Some considerations
on counting algorithms
Coloring the Surfaces of a Closed Strip

[slide sequence: the surfaces of a closed strip are colored step by step,
and the counts for the individual pieces are combined into the total
number of colorings]
The permanent of a matrix
Determinant vs. Permanent of a Matrix

Consider the matrix θ =

θ11 θ12 θ13
θ21 θ22 θ23
θ31 θ32 θ33

The determinant of θ:

det(θ) = + θ11θ22θ33 + θ12θ23θ31 + θ13θ21θ32
         − θ11θ23θ32 − θ12θ21θ33 − θ13θ22θ31.

The permanent of θ:

perm(θ) = + θ11θ22θ33 + θ12θ23θ31 + θ13θ21θ32
          + θ11θ23θ32 + θ12θ21θ33 + θ13θ22θ31.
Determinant vs. Permanent of a Matrix

The determinant of an n × n matrix θ:

det(θ) = ∑σ sgn(σ) ∏i∈[n] θi,σ(i),

where the sum is over all n! permutations σ of the set [n] := {1, . . . , n}.

The permanent of an n × n matrix θ:

perm(θ) = ∑σ ∏i∈[n] θi,σ(i).

The permanent turns up in a variety of contexts, especially in
combinatorial problems, statistical physics (partition function), . . .
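The two expansions differ only in the sign factor. A direct transcription (exponential time, for illustration only; function names are illustrative):

```python
from itertools import permutations
from math import prod

def sign(sigma):
    """Sign of a permutation, computed from its inversion count."""
    n = len(sigma)
    inv = sum(sigma[i] > sigma[j]
              for i in range(n) for j in range(i + 1, n))
    return -1 if inv % 2 else 1

def det(theta):
    """Determinant via the permutation expansion (signed terms)."""
    n = len(theta)
    return sum(sign(s) * prod(theta[i][s[i]] for i in range(n))
               for s in permutations(range(n)))

def perm(theta):
    """Permanent: the same expansion with every sign set to +1."""
    n = len(theta)
    return sum(prod(theta[i][s[i]] for i in range(n))
               for s in permutations(range(n)))

theta = [[1, 2], [3, 4]]
print(det(theta), perm(theta))  # -2 10
```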
Exactly Computing the Permanent

Brute-force computation:
O(n · n!) = O(n^(3/2) · (n/e)^n) arithmetic operations.

Ryser’s algorithm:
Θ(n · 2^n) arithmetic operations.

Complexity class [Valiant, 1979]:
#P (“sharp P” or “number P”),
where #P is the set of the counting problems associated with the
decision problems in the set NP. (Note that even the computation
of the permanent of zero-one matrices is #P-complete.)
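Ryser’s algorithm evaluates the permanent by inclusion-exclusion over column subsets. A compact sketch of this Θ(n · 2^n) formula:

```python
from itertools import combinations

def ryser_permanent(a):
    """Permanent of a square matrix via Ryser's inclusion-exclusion
    formula: perm(a) = (-1)^n * sum over non-empty column subsets S
    of (-1)^|S| * prod_i sum_{j in S} a[i][j]."""
    n = len(a)
    total = 0
    for r in range(1, n + 1):
        for cols in combinations(range(n), r):
            p = 1
            for row in a:
                p *= sum(row[j] for j in cols)
            total += (-1) ** r * p
    return (-1) ** n * total

print(ryser_permanent([[1] * 3 for _ in range(3)]))  # 6 = 3!
```

Compared with the naive n! permutation sum, the work per subset is only one pass over the matrix rows.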
Estimating the Permanent

More efficient algorithms are possible if one does not want to compute
the permanent of a matrix exactly.

For a matrix that contains positive and negative entries:
→ “constructive and destructive interference of terms in the summation.”

For a matrix that contains only non-negative entries:
→ “constructive interference of terms in the summation.”

FROM NOW ON: we focus on the case where all entries of the
matrix are non-negative, i.e., θij ≥ 0 ∀ i, j.

Markov chain Monte Carlo based methods: [Broder, 1986], . . .

Godsil-Gutman formula based methods: [Karmarkar et al., 1993],
[Barvinok, 1997ff.], [Chien, Rasmussen, Sinclair, 2004], . . .

Fully polynomial-time randomized approximation schemes
(FPRAS): [Jerrum, Sinclair, Vigoda, 2004], . . .

Bethe-approximation-based / sum-product-algorithm-based
methods: [Chertkov et al., 2008], [Huang and Jebara, 2009], . . .
Estimating the Permanent of a Matrix

[figure from [Huang/Jebara, 2009]]
Valid Rook Configurations and Permanents

Number of valid rook configurations

= perm(θ) = ∑σ ∏i∈[8] θi,σ(i),

where θ is the 8 × 8 zero-one matrix with θi,j = 1 if and only if
square (i, j) of the board is available. For the full board, θ is the
all-ones matrix; for the modified boards, the blocked squares give
zero entries, e.g.

1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 0 0 1 1 1
1 1 1 0 0 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1

or

1 0 1 1 0 1 1 1
1 1 1 0 0 0 1 0
1 1 0 1 1 0 1 1
0 0 1 1 0 1 0 1
0 1 0 1 1 1 0 1
1 1 1 0 1 0 0 1
1 1 1 1 1 0 1 1
1 1 0 1 1 1 1 1
Perfect Matchings and Permanents

Number of valid perfect matchings

= perm(θ) = ∑σ ∏i∈[8] θi,σ(i),

where θ is now the 8 × 8 zero-one bi-adjacency matrix of a bipartite
graph: θi,j = 1 if and only if the edge (i, j) is present. The matrices
are the same as for the rook configurations above.

More generally, with a non-negative weight θi,j attached to edge (i, j):

Total sum of weighted perfect matchings

= perm(θ) = ∑σ ∏i∈[8] θi,σ(i)

with θ =

θ11 θ12 θ13 θ14 θ15 θ16 θ17 θ18
θ21 θ22 θ23 θ24 θ25 θ26 θ27 θ28
θ31 θ32 θ33 θ34 θ35 θ36 θ37 θ38
θ41 θ42 θ43 θ44 θ45 θ46 θ47 θ48
θ51 θ52 θ53 θ54 θ55 θ56 θ57 θ58
θ61 θ62 θ63 θ64 θ65 θ66 θ67 θ68
θ71 θ72 θ73 θ74 θ75 θ76 θ77 θ78
θ81 θ82 θ83 θ84 θ85 θ86 θ87 θ88
Graphical Model for Permanent

[factor graph: variable nodes A1,1, . . . , A8,8 with row-constraint nodes
grow,1, . . . , grow,8 and column-constraint nodes gcol,1, . . . , gcol,8]

(function nodes are suitably defined based on θ)

Global function:

g(a1,1, . . . , a8,8) = ∏j gcol,j(a1,j, . . . , a8,j) × ∏i grow,i(ai,1, . . . , ai,8)

Permanent:

perm(θ) = Z = ∑a1,1,...,a8,8 g(a1,1, . . . , a8,8)

Graphical Model for Permanent

[the same factor graph, drawn with the variable nodes omitted]

(function nodes are suitably defined based on θ)
(variable nodes have been omitted)

Many short cycles.

The vertex degrees are high.

Both facts might suggest that the application of the sum-product
algorithm to this factor graph is rather problematic.

However, luckily, this is not the case.

For an SPA suitability assessment, the overall cycle structure and
the types of function nodes are at least as important.
Factor graphs and the
sum-product algorithm
Factor Graphs

A factor graph can be used to represent a multivariate function,
e.g. f(x1, x2, x3).

[factor graph: function node f connected to variable nodes X1, X2, X3]

Variable nodes: for each variable we draw a variable node (empty circle).

Function nodes: for each function we draw a function node (filled square).

Edges: there is an edge between a variable node and a function node
if the corresponding variable is an argument of the corresponding function.

Bipartite graph: the resulting graph is a bipartite graph, i.e., there are
only edges between vertices of different types.
Factor Graphs
General references for factor graphs are:
F. R. Kschischang, B. J. Frey and H.-A. Loeliger, “Factor graphs
and the sum-product algorithm,” IEEE Trans. on Inform. Theory,
IT–47, Feb. 2001.
H.-A. Loeliger, “An introduction to factor graphs,” IEEE Signal
Processing Magazine, Jan. 2004.
Factor Graphs

We assume that we know more about the internal structure of the
function f(x1, x2, x3), e.g.

f(x1, x2, x3) = fA(x1, x2) · fB(x2, x3).

Then we can take advantage of this fact, and the factor graph
represents this structure:

[factor graph: X1 — fA — X2 — fB — X3]

f(·, ·, ·) is called the global function.

fA(·, ·) and fB(·, ·) are called local functions.
Factor Graphs

f(x1, x2, x3, x4, x5) = fA(x1) · fB(x2) · fC(x1, x2, x3) · fD(x3, x4) · fE(x3, x5)

[two different drawings of the factor graph for this factorization]

Note: one and the same function can be represented by graphs with
different structures; some are more pleasing than others.
Factor Graph of an LDPC Code

In the context of channel coding, we usually take a factor graph to represent
the factorization of the joint pmf/pdf of all occurring variables, i.e., uncoded
symbols, coded symbols, and received symbols. Here it is shown when using
a quasi-cyclic repeat-accumulate LDPC code (a binary [44, 22, 8] linear code).

[figure legend: info bit, channel bit, info-/channel-bit, XOR function,
channel function]
The Sum-Product Algorithm

Let us consider again the following factor graph (which is a tree).

[tree factor graph with variable nodes X1, . . . , X5 and function nodes
fA, fB, fC, fD, fE]

The global function is

f(x1, x2, x3, x4, x5) = fA(x1) · fB(x2) · fC(x1, x2, x3) · fD(x3, x4) · fE(x3, x5).
The Sum-Product Algorithm

Very often one wants to calculate marginal functions, e.g.

ηX1(x1) = ∑x2,x3,x4,x5 f(x1, x2, x3, x4, x5)
        = ∑x2,x3,x4,x5 fA(x1) · fB(x2) · fC(x1, x2, x3) · fD(x3, x4) · fE(x3, x5),

ηX3(x3) = ∑x1,x2,x4,x5 f(x1, x2, x3, x4, x5)
        = ∑x1,x2,x4,x5 fA(x1) · fB(x2) · fC(x1, x2, x3) · fD(x3, x4) · fE(x3, x5),

etc.
The Sum-Product Algorithm

By the distributive law, each sum can be pushed past the factors that do not
depend on the corresponding summation variable:

ηX1(x1) = ∑x2 ∑x3 ∑x4 ∑x5 fA(x1) · fB(x2) · fC(x1, x2, x3) · fD(x3, x4) · fE(x3, x5)

        = fA(x1) · ∑x2 ∑x3 fC(x1, x2, x3) · fB(x2) · (∑x4 fD(x3, x4) · 1) · (∑x5 fE(x3, x5) · 1).

The underbraced partial results of this computation are the messages:

µX4→fD(x4) = 1,    µfD→X3(x3) = ∑x4 fD(x3, x4) · µX4→fD(x4),
µX5→fE(x5) = 1,    µfE→X3(x3) = ∑x5 fE(x3, x5) · µX5→fE(x5),
µfB→X2(x2) = fB(x2),    µX2→fC(x2) = µfB→X2(x2),
µX3→fC(x3) = µfD→X3(x3) · µfE→X3(x3),
µfC→X1(x1) = ∑x2 ∑x3 fC(x1, x2, x3) · µX2→fC(x2) · µX3→fC(x3),
µfA→X1(x1) = fA(x1),

so that ηX1(x1) = µfA→X1(x1) · µfC→X1(x1).

The objects µXi→fj(xi) and µfj→Xi(xi):

They are called messages.

They can be associated with the edge between the vertex Xi and the vertex fj.

They are functions of xi, i.e., their domain is the alphabet of Xi.

Note: similar manipulations can be performed for calculating ηX2(x2), ηX3(x3),
ηX4(x4), ηX5(x5).
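The message derivation above can be replayed numerically. A sketch with made-up non-negative local-function tables on binary alphabets (the tables are illustrative assumptions, not values from the talk):

```python
from itertools import product

# Made-up local functions; fC takes (x1, x2, x3), fD takes (x3, x4), etc.
fA = {0: 1.0, 1: 2.0}
fB = {0: 3.0, 1: 1.0}
fC = {(x1, x2, x3): 1.0 + x1 + 2 * x2 * x3
      for x1, x2, x3 in product((0, 1), repeat=3)}
fD = {(x3, x4): 1.0 + x3 * x4 for x3, x4 in product((0, 1), repeat=2)}
fE = {(x3, x5): 2.0 - x3 * x5 for x3, x5 in product((0, 1), repeat=2)}

# Messages toward X1, following the bracketed rearrangement above.
mu_fD_X3 = {x3: sum(fD[x3, x4] * 1 for x4 in (0, 1)) for x3 in (0, 1)}
mu_fE_X3 = {x3: sum(fE[x3, x5] * 1 for x5 in (0, 1)) for x3 in (0, 1)}
mu_X3_fC = {x3: mu_fD_X3[x3] * mu_fE_X3[x3] for x3 in (0, 1)}
mu_X2_fC = {x2: fB[x2] for x2 in (0, 1)}  # equals mu_fB_X2
mu_fC_X1 = {x1: sum(fC[x1, x2, x3] * mu_X2_fC[x2] * mu_X3_fC[x3]
                    for x2 in (0, 1) for x3 in (0, 1))
            for x1 in (0, 1)}
eta_X1 = {x1: fA[x1] * mu_fC_X1[x1] for x1 in (0, 1)}

# Cross-check against direct marginalization of the global function.
brute = {x1: sum(fA[x1] * fB[x2] * fC[x1, x2, x3] * fD[x3, x4] * fE[x3, x5]
                 for x2, x3, x4, x5 in product((0, 1), repeat=4))
         for x1 in (0, 1)}
assert all(abs(eta_X1[x] - brute[x]) < 1e-9 for x in (0, 1))
print(eta_X1)
```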
The Sum-Product Algorithm

Messages necessary for calculating ηX1(x1):

[tree factor graph with the corresponding message arrows]

Messages necessary for calculating ηX3(x3):

[tree factor graph with the corresponding message arrows]
The Sum-Product Algorithm

The figure shows the messages that are necessary for calculating ηX1(x1),
ηX2(x2), ηX3(x3), ηX4(x4), and ηX5(x5).

[tree factor graph with all message arrows]

Edges: messages are sent along edges.

Processing: taking products and doing summations is done in the vertices.

Reuse of messages: we see that messages can be “reused” in the sense that
many partial calculations are the same; so it suffices to perform them only
once.
The Sum-Product Algorithm

Message from a variable node X (with neighboring function nodes
f1, f2, f3, f4) to the function node f4:

µX→f4(x) = µf1→X(x) · µf2→X(x) · µf3→X(x)

Message from a function node f (with neighboring variable nodes
X1, X2, X3, X4) to the variable node X4:

µf→X4(x4) = ∑x1 ∑x2 ∑x3 f(x1, x2, x3, x4) · µX1→f(x1) · µX2→f(x2) · µX3→f(x3)
The Sum-Product Algorithm

Computation of marginal at variable node X (with neighbors f1, . . . , f4):

ηX(x) = µf1→X(x) · µf2→X(x) · µf3→X(x) · µf4→X(x)

Computation of marginal at function node f (with neighbors X1, . . . , X4):

ηf(x1, x2, x3, x4) = f(x1, x2, x3, x4) · µX1→f(x1) · µX2→f(x2) · µX3→f(x3) · µX4→f(x4)
The Sum-Product Algorithm

Factor graph without loops: in this case it is obvious which messages have to be calculated when.
⇒ Mode of operation 1

Factor graph with loops: one has to decide which update schedule to take.
⇒ Mode of operation 2

Comments on the Sum-Product Algorithm

If the factor graph has no loops, then it is obvious which messages have to be calculated when.

If the factor graph has loops, one has to decide which update schedule to take.

Depending on the underlying semiring, one gets different versions of the summary-product algorithm:

For ⟨R, +, ·⟩ one gets the sum-product algorithm. (This is the case discussed above.)
For ⟨R+, max, ·⟩ one gets the max-product algorithm.
For ⟨R, min, +⟩ one gets the min-sum algorithm.
etc.
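Concretely, the recursion is the same for every semiring; only the two operations change. An illustrative Python sketch (our own generic helper, binary variables as before). Note that for the min-sum version the factor values would normally be costs (negative log-domain); here we reuse the same toy factor just to show the mechanics:

```python
import itertools

def summary_product(f, arg_index, incoming, add, mul, zero):
    """Generic summary-product message: `add` summarizes out the other
    arguments, `mul` combines factor values with incoming messages."""
    arity = len(next(iter(f)))
    out = {0: zero, 1: zero}
    for assignment in itertools.product((0, 1), repeat=arity):
        w = f[assignment]
        j = 0
        for pos, val in enumerate(assignment):
            if pos != arg_index:
                w = mul(w, incoming[j][val])
                j += 1
        out[assignment[arg_index]] = add(out[assignment[arg_index]], w)
    return out

f = {(a, b): float(a == b) + 0.5 for a in (0, 1) for b in (0, 1)}
uniform = [{0: 1.0, 1: 1.0}]

# <R, +, *>: sum-product; <R+, max, *>: max-product; <R, min, +>: min-sum.
sum_prod = summary_product(f, 0, uniform, lambda a, b: a + b, lambda a, b: a * b, 0.0)
max_prod = summary_product(f, 0, uniform, max, lambda a, b: a * b, 0.0)
min_sum = summary_product(f, 0, uniform, min, lambda a, b: a + b, float("inf"))
print(sum_prod, max_prod, min_sum)
```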
Partition function (total sum)

Partition Function

Z = ∑_{x1,x2,x3,x4,x5} f(x1, x2, x3, x4, x5)

Recall:

ηX1(x1) = ∑_{x2,x3,x4,x5} f(x1, x2, x3, x4, x5)
ηX2(x2) = ∑_{x1,x3,x4,x5} f(x1, x2, x3, x4, x5)
...

Define:

ZX1 = ∑_{x1} ηX1(x1) = Z
ZX2 = ∑_{x2} ηX2(x2) = Z
...
Partition Function

Z = ∑_{x1,x2,x3,x4,x5} f(x1, x2, x3, x4, x5)

Recall:

...
ηfC(x1, x2, x3) = ∑_{x4,x5} f(x1, x2, x3, x4, x5)
...

Define:

...
ZfC = ∑_{x1,x2,x3} ηfC(x1, x2, x3) = Z
...
Partition Function

[Figure: factor graph with variable nodes X1, ..., X5 and function nodes fA, ..., fE.]

Z = ZX1 = ZX2 = ZX3 = ZX4 = ZX5 = ZfA = ZfB = ZfC = ZfD = ZfE

Claim:

Z = Z^#vertices / Z^#edges
  = (ZfA · ZfB · ZfC · ZfD · ZfE · ZX1 · ZX2 · ZX3 · ZX4 · ZX5) / (ZX1^2 · ZX2^2 · ZX3^3 · ZX4^1 · ZX5^1)

(Note: the exponents in the denominator equal the variable node degrees. Here we used the fact that for a graph with one component and no cycles it holds that #vertices = #edges + 1.)
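The chain of identities above can be checked numerically. Below is an illustrative Python sketch on a small chain-shaped tree factor graph (X1 – fA – X2 – fB – X3, our own toy example with arbitrary factor values), verifying that every ZX and Zf equals Z and that the claimed product formula returns Z.

```python
import itertools

# Toy chain factor graph X1 -- fA -- X2 -- fB -- X3 with binary variables
# (an illustrative stand-in for the tree on the slides; values arbitrary).
fA = {(a, b): 1.0 + a + 2 * b for a in (0, 1) for b in (0, 1)}
fB = {(b, c): 2.0 + 3 * b * c for b in (0, 1) for c in (0, 1)}

def f(x1, x2, x3):
    return fA[(x1, x2)] * fB[(x2, x3)]

B = (0, 1)
Z = sum(f(*x) for x in itertools.product(B, repeat=3))

# Vertex partition sums: sum each exact marginal (eta) over its arguments.
Z_X1 = sum(sum(f(x1, x2, x3) for x2 in B for x3 in B) for x1 in B)
Z_X2 = sum(sum(f(x1, x2, x3) for x1 in B for x3 in B) for x2 in B)
Z_X3 = sum(sum(f(x1, x2, x3) for x1 in B for x2 in B) for x3 in B)
Z_fA = sum(sum(f(x1, x2, x3) for x3 in B) for x1 in B for x2 in B)
Z_fB = sum(sum(f(x1, x2, x3) for x1 in B) for x2 in B for x3 in B)

# Tree identity with deg(X1) = deg(X3) = 1 and deg(X2) = 2.
Z_formula = (Z_fA * Z_fB * Z_X1 * Z_X2 * Z_X3) / (Z_X1**1 * Z_X2**2 * Z_X3**1)
print(Z, Z_formula)  # the two values agree
```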
Partition Function

[Figure: the same factor graph, with one of the function-node-to-variable-node messages rescaled by γ.]

If a function-node-to-variable-node message is rescaled by γ, some of the vertex partition sums pick up factors of γ, but these factors cancel:

Z = (ZfA · ZfB · ZfC · (γZfD) · (γZfE) · ZX1 · ZX2 · (γZX3) · (γZX4) · (γZX5)) / (ZX1^2 · ZX2^2 · (γZX3)^3 · (γZX4)^1 · (γZX5)^1)

  = (γ^5 / γ^5) · (ZfA · ZfB · ZfC · ZfD · ZfE · ZX1 · ZX2 · ZX3 · ZX4 · ZX5) / (ZX1^2 · ZX2^2 · ZX3^3 · ZX4^1 · ZX5^1)

Remarkable: this expression is invariant to rescaling of function-node-to-variable-node messages!
Partition Function

[Figure: factor graph with variable nodes X1, ..., X5 and function nodes fA, ..., fE.]

Z = (ZfA · ZfB · ZfC · ZfD · ZfE · ZX1 · ZX2 · ZX3 · ZX4 · ZX5) / (ZX1^2 · ZX2^2 · ZX3^3 · ZX4^1 · ZX5^1)

In general:

Z = (∏_f Zf · ∏_X ZX) / ∏_X ZX^deg(X)

Bethe approximation:
Use the above type of expression also when the factor graph has cycles. → Z′Bethe
Bethe Partition Function

Basically, we can evaluate the expression for Z′Bethe at any iteration of the SPA.

Factor graph without cycles: we have Z′Bethe = Z only at a fixed point of the SPA. Therefore, we call Z′Bethe a (local) Bethe partition function only if we are at a fixed point of the SPA.

Factor graph with cycles: the SPA can have multiple fixed points. We define the Bethe partition function to be

ZBethe ≜ max_{fixed points of SPA} Z′Bethe.
Graphical Model for Permanent

[Figure: normal factor graph with row function nodes grow,1, ..., grow,8 and column function nodes gcol,1, ..., gcol,8; the function nodes are suitably defined based on θ; variable nodes have been omitted.]

Global function:

g(a1,1, ..., a8,8) = ∏_j gcol,j(a1,j, ..., a8,j) × ∏_i grow,i(ai,1, ..., ai,8)

Permanent:

perm(θ) = Z = ∑_{a1,1,...,a8,8} g(a1,1, ..., a8,8)

Bethe Permanent:

permB(θ) ≜ ZBethe

However, the SPA is a locally operating algorithm and so has its limitations in the conclusions that it can reach.

This locality of the SPA turns out to be well captured by so-called finite graph covers, especially at fixed points of the SPA.
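The relation perm(θ) = Z can be checked by brute force for a small matrix (we use a 3×3 example instead of the slide's 8×8 to keep the enumeration tiny). Both functions below are our own illustrative sketches: one uses the permutation expansion of the permanent, the other sums g(a) over all 0/1 matrices a, with the row and column function nodes realized as exactly-one-1 indicator constraints weighted by the entries of θ — one natural way to make the slide's "suitably defined based on θ" concrete.

```python
import itertools
from math import prod

def perm(theta):
    """Permanent via the permutation expansion (fine for small n)."""
    n = len(theta)
    return sum(prod(theta[i][s[i]] for i in range(n))
               for s in itertools.permutations(range(n)))

def perm_as_partition_sum(theta):
    """Z = sum_a g(a), where g enforces exactly one 1 per row and per
    column and weighs each chosen entry by theta[i][j]."""
    n = len(theta)
    Z = 0.0
    for bits in itertools.product((0, 1), repeat=n * n):
        a = [bits[i * n:(i + 1) * n] for i in range(n)]
        if any(sum(row) != 1 for row in a):          # row constraints
            continue
        if any(sum(a[i][j] for i in range(n)) != 1 for j in range(n)):
            continue                                 # column constraints
        Z += prod(theta[i][j] for i in range(n) for j in range(n) if a[i][j])
    return Z

theta = [[1.0, 2.0, 0.5],
         [0.0, 1.0, 3.0],
         [2.0, 1.0, 1.0]]
print(perm(theta), perm_as_partition_sum(theta))  # the two values agree
```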
A combinatorial interpretation of the Bethe permanent

Combinatorial Characterization of the Permanent

Consider the matrix

θ = [ θ1,1  θ1,2 ; θ2,1  θ2,2 ]

with perm(θ) = θ1,1 θ2,2 + θ2,1 θ1,2.

In particular,

θ = [ 1  1 ; 1  1 ]

with perm(θ) = 1 · 1 + 1 · 1 = 2.

Recall that the permanent of a zero/one matrix like this one equals the number of perfect matchings in the corresponding bipartite graph.

[Figure: complete bipartite graph with left vertices 1, 2 and right vertices 1, 2, together with its two perfect matchings.]
A Combinatorial Interpretation of the Bethe Permanent

Consider the non-negative matrix θ of size n × n.

Let P_{M×M} be the set of all permutation matrices of size M × M.

For every positive integer M, we define ΨM to be the set

ΨM ≜ { P = {P(i,j)}_{(i,j)∈[n]^2} | P(i,j) ∈ P_{M×M} }.

For P ∈ ΨM we define the P-lifting of θ to be the following (nM) × (nM) matrix:

θ = [ θ1,1 ... θ1,n ; ... ; θn,1 ... θn,n ]   —P-lifting of θ→   θ↑P ≜ [ θ1,1 P(1,1) ... θ1,n P(1,n) ; ... ; θn,1 P(n,1) ... θn,n P(n,n) ].
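The P-lifting is easy to construct explicitly. A minimal Python sketch (function name ours), shown here for the all-ones 2×2 matrix with three identity blocks and one swap block:

```python
def p_lift(theta, P):
    """Build the (nM)x(nM) P-lifting: block (i, j) is theta[i][j] * P[(i, j)],
    where each P[(i, j)] is an MxM permutation matrix (list of lists)."""
    n = len(theta)
    M = len(P[(0, 0)])
    lifted = [[0.0] * (n * M) for _ in range(n * M)]
    for i in range(n):
        for j in range(n):
            for r in range(M):
                for c in range(M):
                    lifted[i * M + r][j * M + c] = theta[i][j] * P[(i, j)][r][c]
    return lifted

I2 = [[1, 0], [0, 1]]
swap = [[0, 1], [1, 0]]
theta = [[1.0, 1.0], [1.0, 1.0]]
P = {(0, 0): I2, (0, 1): I2, (1, 0): I2, (1, 1): swap}
for row in p_lift(theta, P):
    print(row)
```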
Degree-M Bethe Permanent

Definition: For any positive integer M, we define the degree-M Bethe permanent of θ to be

permB,M(θ) ≜ ( ⟨ perm(θ↑P) ⟩_{P∈ΨM} )^(1/M).

Theorem:

permB(θ) = limsup_{M→∞} permB,M(θ).
Special Case: Degree-M Bethe Permanent for n = 2

We want to obtain some appreciation of why the Bethe permanent of θ is close to the permanent of θ, and where the differences are.

As before, consider the matrix

θ = [ θ1,1  θ1,2 ; θ2,1  θ2,2 ]

with perm(θ) = θ1,1 θ2,2 + θ2,1 θ1,2. In particular, θ = [ 1  1 ; 1  1 ] with perm(θ) = 1 · 1 + 1 · 1 = 2.

For this θ, a P-lifting looks like

θ↑P = [ 1 · P(1,1)  1 · P(1,2) ; 1 · P(2,1)  1 · P(2,2) ] = [ P(1,1)  P(1,2) ; P(2,1)  P(2,2) ].

Applying some row and column permutations, we obtain

perm(θ↑P) = perm( [ I  I ; I  P(2,1)^(-1) P(2,2) P(1,2)^(-1) P(1,1) ] ).

Therefore,

permB,M(θ) ≜ ( ⟨ perm( [ I  I ; I  P′ ] ) ⟩_{P′∈P_{M×M}} )^(1/M),

where P′ ≜ P(2,1)^(-1) P(2,2) P(1,2)^(-1) P(1,1).
Special Case: Degree-2 Bethe Permanent for n = 2

For M = 2 we have

permB,2(θ) ≜ ( ⟨ perm( [ I  I ; I  P′ ] ) ⟩_{P′∈P_{2×2}} )^(1/2),

which corresponds to computing the average number of perfect matchings in the following 2-covers (and taking the 2nd root):

[Figure: the two 2-covers of the base graph, with vertex copies 1′, 1′′ and 2′, 2′′; they have 4 and 2 perfect matchings, respectively.]

Hence

permB,2(θ) = ( (1/2!) · (4 + 2) )^(1/2) = ( (1/2!) · 6 )^(1/2) = 3^(1/2) ≈ 1.732 < 4^(1/2) = 2 = perm(θ).
Special Case: Degree-2 Bethe Permanent for n = 2

Let us have a closer look at the perfect matchings in the first 2-cover.

[Figure: the 2-cover consisting of two independent copies of the base graph, together with its four perfect matchings.]

Because this double cover consists of two independent copies of the base graph, the number of perfect matchings is 2^2 = 4.
Special Case: Degree-2 Bethe Permanent for n = 2

Let us have a closer look at the perfect matchings in the second 2-cover.

[Figure: the 2-cover in which the two copies are coupled into one long cycle, together with its two perfect matchings.]

The coupling of the cycles causes this graph to have fewer than 2^2 perfect matchings!
Special Case: Degree-3 Bethe Permanent for n = 2

On the other hand, for M = 3 we have

permB,3(θ) ≜ ( ⟨ perm( [ I  I ; I  P′ ] ) ⟩_{P′∈P_{3×3}} )^(1/3),

which corresponds to computing the average number of perfect matchings in the following 3-covers (and taking the 3rd root):

[Figure: the six 3-covers of the base graph, with vertex copies 1′, 1′′, 1′′′ and 2′, 2′′, 2′′′; they have 8, 4, 4, 4, 2, and 2 perfect matchings, respectively.]

Hence

permB,3(θ) = ( (1/3!) · (8 + 4 + 4 + 4 + 2 + 2) )^(1/3) = ( (1/3!) · 24 )^(1/3) = 4^(1/3) ≈ 1.587 < 8^(1/3) = 2 = perm(θ).
Special Case: Degree-3 Bethe Permanent for n = 2

Let us have a closer look at the perfect matchings in one of the coupled 3-covers.

[Figure: a 3-cover in which the copies are coupled, together with its four perfect matchings.]

The coupling of the cycles causes this graph to have fewer than 2^3 perfect matchings!
Special Case: Degree-M Bethe Permanent for n = 2

For general M we obtain

permB,M(θ) = ( ζ_{S_M} )^(1/M) = (M + 1)^(1/M),

where ζ_{S_M} denotes the cycle index of the symmetric group over M elements.
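The values above can be reproduced by brute force: by the reduction from the previous slides, averaging perm(θ↑P) over ΨM amounts to averaging perm([ I I ; I P′ ]) over P′ ∈ P_{M×M}. A Python sketch (our own, feasible only for small M):

```python
import itertools
from math import prod

def perm(A):
    """Permanent via the permutation expansion."""
    n = len(A)
    return sum(prod(A[i][s[i]] for i in range(n))
               for s in itertools.permutations(range(n)))

def avg_cover_perm(M):
    """Average of perm([[I, I], [I, P']]) over all MxM permutation
    matrices P'; the degree-M Bethe permanent of the all-ones 2x2
    matrix is the M-th root of this average."""
    I = [[1 if r == c else 0 for c in range(M)] for r in range(M)]
    total, count = 0, 0
    for sigma in itertools.permutations(range(M)):
        Pp = [[1 if c == sigma[r] else 0 for c in range(M)] for r in range(M)]
        block = [I[r] + I[r] for r in range(M)] + [I[r] + Pp[r] for r in range(M)]
        total += perm(block)
        count += 1
    return total / count

for M in (2, 3):
    avg = avg_cover_perm(M)
    print(M, avg, avg ** (1 / M))  # the average equals M + 1, as on the slide
```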
The Gibbs free energy function

Gibbs Free Energy Function

[Figure: sketch of FGibbs(p) over the probability simplex, with minimizer p∗ and minimal value −log(ZGibbs); a second curve F′(p) with minimizer p′ and minimal value −log(Z′) suggests an approximating function.]

The Gibbs free energy function

FGibbs(p) ≜ −∑_a pa · log(g(a)) + ∑_a pa · log(pa)

is defined such that its minimal value is related to the partition function:

perm(θ) = Z = exp( −min_p FGibbs(p) ).

Nice, but it does not yield any computational savings by itself.

But it suggests other optimization schemes.
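The relation between the minimum of FGibbs and Z can be verified numerically: the minimizer is the Gibbs distribution p∗_a = g(a)/Z, at which FGibbs(p∗) = −log Z. A toy Python sketch (the values of g are arbitrary):

```python
from math import log, exp, isclose

g = {0: 1.0, 1: 3.0, 2: 6.0}   # toy global function values g(a)
Z = sum(g.values())            # partition function, here 10.0

def F_gibbs(p):
    """F_Gibbs(p) = -sum_a p_a log g(a) + sum_a p_a log p_a."""
    return sum(-p[a] * log(g[a]) + p[a] * log(p[a]) for a in g if p[a] > 0)

# The minimizer is the Gibbs distribution p*_a = g(a)/Z, and there
# F_Gibbs(p*) = -log Z, i.e. Z = exp(-min_p F_Gibbs(p)).
p_star = {a: g[a] / Z for a in g}
print(isclose(exp(-F_gibbs(p_star)), Z))                  # True
print(F_gibbs({a: 1 / 3 for a in g}) > F_gibbs(p_star))   # True: other p give larger F
```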
The Bethe approximation

Bethe Approximation

The Bethe approximation to the Gibbs free energy function yields such an alternative optimization scheme.

This approximation is interesting because of the following theorem:

Theorem (Yedidia/Freeman/Weiss, 2000): Fixed points of the sum-product algorithm (SPA) correspond to stationary points of the Bethe free energy function.

Definition: We define the Bethe permanent of θ to be

permB(θ) = ZBethe = exp( −min_β FBethe(β) ).
Bethe Approximation

However, in general, this approach of replacing the Gibbs free energy by the Bethe free energy comes with very few guarantees:

The Bethe free energy function might have multiple local minima.

It is unclear how close the (global) minimum of the Bethe free energy is to the minimum of the Gibbs free energy.

It is unclear if the sum-product algorithm converges (even to a local minimum of the Bethe free energy).
Bethe Approximation

Luckily, in the case of the permanent approximation problem, the above-mentioned normal factor graph N(θ) is such that the Bethe free energy function is very well behaved. In particular, one can show that:

The Bethe free energy function (for a suitable parametrization) is convex and therefore has no local minima [V., 2010, 2011].

The minimum of the Bethe free energy is quite close to the minimum of the Gibbs free energy. (More details later.)

The sum-product algorithm converges to the minimum of the Bethe free energy. (More details later.)
Relationship between Permanent and Bethe Permanent

permB(θ) ≤ perm(θ) ≤ √2^n · permB(θ),

where the first inequality is a theorem (Gurvits, 2011) and the second is a conjecture (Gurvits, 2011).

This can be rewritten as follows:

(1/n) · log permB(θ) ≤ (1/n) · log perm(θ) ≤ (1/n) · log permB(θ) + log(√2),

with the same theorem/conjecture status for the two inequalities.
Relationship between Permanent and Bethe Permanent

Problem: find large classes of random matrices such that w.h.p.

permB(θ) ≤ perm(θ) ≤ O(√n) · permB(θ),

where the first inequality is the above theorem (Gurvits, 2011).

This can be rewritten as follows:

(1/n) · log permB(θ) ≤ (1/n) · log perm(θ) ≤ (1/n) · log permB(θ) + O( (1/n) · log(n) ).
Sum-Product Algorithm Convergence

Theorem: Modulo some minor technical conditions on the initial messages, the sum-product algorithm converges to the (global) minimum of the Bethe free energy function [V., 2010, 2011].

Comment: the first part of the proof of the above theorem is very similar to the SPA convergence proof in Bayati and Nair, "A rigorous proof of the cavity method for counting matchings," Allerton 2006. Note that they consider matchings, not perfect matchings. (Although the perfect matching case can be seen as a limiting case of the matching setup, the convergence proof of the SPA is incomplete for that case.)
Conclusions

Loopy belief propagation is no silver bullet.

However, there are interesting setups where it works very well.

The complexity of permanent estimation based on the SPA is remarkably low. (Hard to beat by any standard convex optimization algorithm that minimizes the Bethe free energy.)

If the Bethe approximation does not work well, one can try better approximations, e.g., the Kikuchi approximation. Note: one can also give a combinatorial interpretation of the Kikuchi partition function.

Inspired by the approaches mentioned in this talk, Ryuhei Mori recently showed that many replica method computations can be simplified and made quite a bit more intuitive.
Thank you!
References

P. O. Vontobel, "Counting in graph covers: a combinatorial characterization of the Bethe entropy function," submitted to IEEE Trans. Inf. Theory, Nov. 2010, arXiv:1012.0065.

P. O. Vontobel, "The Bethe permanent of a non-negative matrix," Proc. 48th Allerton Conf. on Communication, Control, and Computing, Allerton House, Monticello, IL, USA, Sep. 29 – Oct. 1, 2010.

P. O. Vontobel, "A combinatorial characterization of the Bethe and the Kikuchi partition functions," Proc. Inf. Theory Appl. Workshop, UC San Diego, La Jolla, CA, USA, Feb. 6–11, 2011.

R. Mori, "Connection between annealed free energy and belief propagation on random factor graph ensembles," Proc. ISIT 2011, St. Petersburg, Russia, Jul. 31 – Aug. 5, 2011, arXiv:1102.3132.