30
The expressive power of mixture models and Restricted Boltzmann Machines Johannes Rauh joint work with Guido Mont ´ ufar and Nihat Ay Max Planck Institute Mathematics in the Sciences Workshop on Graphical Models April 2012 (Fields Institute, Toronto) Rauh, Mont ´ ufar, Ay ( MPI MIS) : Mixture Models and RBMs Graphical Models Workshop 1 / 25

The expressive power of mixture models and Restricted … · 2012. 4. 17. · Mixture Models and RBMs Relations between mixture models and RBMs RBM n;m is a submodel of M n;2m: p(v)

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: The expressive power of mixture models and Restricted … · 2012. 4. 17. · Mixture Models and RBMs Relations between mixture models and RBMs RBM n;m is a submodel of M n;2m: p(v)

The expressive power of mixture models andRestricted Boltzmann Machines

Johannes Rauhjoint work with Guido Montufar and Nihat Ay

Max Planck InstituteMathematics in the Sciences

Workshop on Graphical ModelsApril 2012 (Fields Institute, Toronto)

Rauh, Montufar, Ay (� MPI MIS) : Mixture Models and RBMs Graphical Models Workshop 1 / 25

Page 2: The expressive power of mixture models and Restricted … · 2012. 4. 17. · Mixture Models and RBMs Relations between mixture models and RBMs RBM n;m is a submodel of M n;2m: p(v)

Outline

Mixture Models and RBMs

Submodels of RBMs

The number of modes

Rauh, Montufar, Ay (� MPI MIS) : Mixture Models and RBMs Graphical Models Workshop 2 / 25

Page 3: The expressive power of mixture models and Restricted … · 2012. 4. 17. · Mixture Models and RBMs Relations between mixture models and RBMs RBM n;m is a submodel of M n;2m: p(v)

Mixture Models and RBMs

Outline

Mixture Models and RBMs

Submodels of RBMs

The number of modes

Rauh, Montufar, Ay (� MPI MIS) : Mixture Models and RBMs Graphical Models Workshop 3 / 25

Page 4: The expressive power of mixture models and Restricted … · 2012. 4. 17. · Mixture Models and RBMs Relations between mixture models and RBMs RBM n;m is a submodel of M n;2m: p(v)

Mixture Models and RBMs

Mixture modelsConsider n binary random variables. We want to study productdistributions and their mixtures.

k

n

DefinitionThe kth mixture model Mn,k consists of all convex combinations of kproduct distributions.

p(x1, . . . , xn) =k∑

i=1

λiqi,1(x1)qi,2(x2) . . . qi,n(xn).

Rauh, Montufar, Ay (� MPI MIS) : Mixture Models and RBMs Graphical Models Workshop 4 / 25

Page 5: The expressive power of mixture models and Restricted … · 2012. 4. 17. · Mixture Models and RBMs Relations between mixture models and RBMs RBM n;m is a submodel of M n;2m: p(v)

Mixture Models and RBMs

Mixture modelsConsider n binary random variables. We want to study productdistributions and their mixtures.

k

n

DefinitionThe kth mixture model Mn,k consists of all convex combinations of kproduct distributions.

Mn,k is (the closure of) a graphical model with one hidden node ofsize k.

With this definition, the first mixture model is the independencemodel.

Rauh, Montufar, Ay (� MPI MIS) : Mixture Models and RBMs Graphical Models Workshop 4 / 25

Page 6: The expressive power of mixture models and Restricted … · 2012. 4. 17. · Mixture Models and RBMs Relations between mixture models and RBMs RBM n;m is a submodel of M n;2m: p(v)

Mixture Models and RBMs

Restricted Boltzmann Machines (RBMs)

Consider n binary random variables.

m

n

DefinitionThe Restricted Boltzmann Machine RBMn,m is (the closure of) thegraphical model of the complete bipartite graph Kn,m, where thegroup of m nodes is hidden.

Rauh, Montufar, Ay (� MPI MIS) : Mixture Models and RBMs Graphical Models Workshop 5 / 25

Page 7: The expressive power of mixture models and Restricted … · 2012. 4. 17. · Mixture Models and RBMs Relations between mixture models and RBMs RBM n;m is a submodel of M n;2m: p(v)

Mixture Models and RBMs

Relations between mixture models and RBMs

RBMn,m is a submodel of Mn,2m :

p(v) =∑

h

p(v, h) =∑

h

p(v|h)p(h),

where the conditionals p(v|h) are product distributions.

The mixture components p(h) belong to RBMm,n.

For m = 1, equality holds: RBMn,1 = Mn,2.

RBMn,m equals the mth Hadamard power of Mn,2:Hadamard product of functions = point-wise productfor probability distributions: renormalize afterwards

(Observation due to Cueto, Morton and Sturmfels 2009)

Rauh, Montufar, Ay (� MPI MIS) : Mixture Models and RBMs Graphical Models Workshop 6 / 25

Page 8: The expressive power of mixture models and Restricted … · 2012. 4. 17. · Mixture Models and RBMs Relations between mixture models and RBMs RBM n;m is a submodel of M n;2m: p(v)

Mixture Models and RBMs

The expressive power

We want to describe, as precisely as possible, which probabilitydistributions a model does or does not contain.

Both RBMs and mixture models are semi-algebraic sets:They have an implicit description in terms of polynomialequations and inequalities.

QuestionHow does this semi-algebraic description look like?

This description would allow to easily check whether a givendistribution belongs to the model.

. . . but it appears to be too difficult to compute.

Rauh, Montufar, Ay (� MPI MIS) : Mixture Models and RBMs Graphical Models Workshop 7 / 25

Page 9: The expressive power of mixture models and Restricted … · 2012. 4. 17. · Mixture Models and RBMs Relations between mixture models and RBMs RBM n;m is a submodel of M n;2m: p(v)

Mixture Models and RBMs

The expressive power

Since a complete description seems out of reach, we can askother questions:

Easier problems

What is the dimension of the model?

Find large subsets of the model that are easy to describe.

Find large sets of probability distributions that are notcontained in the model.

Rauh, Montufar, Ay (� MPI MIS) : Mixture Models and RBMs Graphical Models Workshop 8 / 25

Page 10: The expressive power of mixture models and Restricted … · 2012. 4. 17. · Mixture Models and RBMs Relations between mixture models and RBMs RBM n;m is a submodel of M n;2m: p(v)

Mixture Models and RBMs

The dimension of a modelIn many cases, the dimension of a parametrically definedsemi-algebraic set is the expected dimension, i.e. the number ofparameters (or the dimension of the ambient space).

The dimension of binary mixture models was recentlycomputed:

Theorem (Catalisano, Geramita and Gimigliano 2011)The dimension of Mn,k equals the expected dimensionmin{nk + k − 1, 2n − 1}, unless n = 4 and m = 3.

For RBMs, the dimension is as expected in all known cases:

Theorem (Cueto, Morton and Sturmfels 2009)The dimension of RBMn,m equals the expected dimensionmin{nm + n + m, 2n − 1} for k ≤ 2n−dlog2(n+1)e and for k ≥ 2n−blog2(n+1)c.

Rauh, Montufar, Ay (� MPI MIS) : Mixture Models and RBMs Graphical Models Workshop 9 / 25

Page 11: The expressive power of mixture models and Restricted … · 2012. 4. 17. · Mixture Models and RBMs Relations between mixture models and RBMs RBM n;m is a submodel of M n;2m: p(v)

Mixture Models and RBMs

The role of dimension

It is wellknown that the dimension alone is not sufficient to decide,whether a model is full, i.e. contains all probability distributions:

Zwiernik and Smith computed all inequalities of M3,2.Montufar proved that Mn,k is full if and only if k ≥ 2n−1.

For n = 3:k 1 2 3 4dim 3 7 7 7full? no no no yes

For RBMs with n = 3:

m 0 1 2 3dim 3 7 7 7full? no no no yes

Rauh, Montufar, Ay (� MPI MIS) : Mixture Models and RBMs Graphical Models Workshop 10 / 25

Page 12: The expressive power of mixture models and Restricted … · 2012. 4. 17. · Mixture Models and RBMs Relations between mixture models and RBMs RBM n;m is a submodel of M n;2m: p(v)

Mixture Models and RBMs

The model and its complement

The rest of the talk is about two projects that attack the followingtwo problems:

Problem 1Find large subsets of the model that are easy to describe.

We find “large” subsets of RBMn,m. These subsets are related tomixtures of product distributions on disjoint supports and allow toestimate the maximal approximation error.

Problem 2Find large sets of probability distributions that are not contained inthe model.

We find “large” sets outside of Mn,k. Interestingly, these sets touchthe uniform distribution, showing that the uniform distribution neednot be an interior point of Mn,k, even if Mn,k has the full dimension.

Rauh, Montufar, Ay (� MPI MIS) : Mixture Models and RBMs Graphical Models Workshop 11 / 25

Page 13: The expressive power of mixture models and Restricted … · 2012. 4. 17. · Mixture Models and RBMs Relations between mixture models and RBMs RBM n;m is a submodel of M n;2m: p(v)

Submodels of RBMs

Outline

Mixture Models and RBMs

Submodels of RBMs

The number of modes

Rauh, Montufar, Ay (� MPI MIS) : Mixture Models and RBMs Graphical Models Workshop 12 / 25

Page 14: The expressive power of mixture models and Restricted … · 2012. 4. 17. · Mixture Models and RBMs Relations between mixture models and RBMs RBM n;m is a submodel of M n;2m: p(v)

Submodels of RBMs

Cubical setsIdea:

The RBM consists of mixtures of product distributions.Mixtures are difficult to describe. . .

. . . unless they have disjoint supports

DefinitionA set Y ⊆ {0, 1}n is cubical, if it corresponds to a face of the n-dimensional hypercube.

111

000100

010

011

x1 = 1

x1,3 = 00

Cubical sets are cylinder sets, i.e. they are characterized by “sub-configurations.”

Rauh, Montufar, Ay (� MPI MIS) : Mixture Models and RBMs Graphical Models Workshop 13 / 25

Page 15: The expressive power of mixture models and Restricted … · 2012. 4. 17. · Mixture Models and RBMs Relations between mixture models and RBMs RBM n;m is a submodel of M n;2m: p(v)

Submodels of RBMs

Cubical setsIdea:

The RBM consists of mixtures of product distributions.Mixtures are difficult to describe. . .

. . . unless they have disjoint supports

DefinitionA set Y ⊆ {0, 1}n is cubical, if it corresponds to a face of the n-dimensional hypercube.

111

000100

010

011

x1 = 1

x1,3 = 00

Cubical sets are the support sets of product distributions.

Rauh, Montufar, Ay (� MPI MIS) : Mixture Models and RBMs Graphical Models Workshop 13 / 25

Page 16: The expressive power of mixture models and Restricted … · 2012. 4. 17. · Mixture Models and RBMs Relations between mixture models and RBMs RBM n;m is a submodel of M n;2m: p(v)

Submodels of RBMs

Mixtures on disjoint supports

TheoremRBMn,m contains any mixture of one arbitrary product distributionand m product distributions with pairwise disjoint cubical supports.

CorollaryIf m ≥ 2n−1 − 1, then RBMn,m is full.

Proof of the Corollary.

Any distribution on an edge is a product distribution.

The n-dimensional hypercube is covered by 2n−1 disjointedges

Hence any distribution is a mixture of 2n−1 productdistributions supported on disjoint edges. �

Rauh, Montufar, Ay (� MPI MIS) : Mixture Models and RBMs Graphical Models Workshop 14 / 25

Page 17: The expressive power of mixture models and Restricted … · 2012. 4. 17. · Mixture Models and RBMs Relations between mixture models and RBMs RBM n;m is a submodel of M n;2m: p(v)

Submodels of RBMs

The approximation errorNow we can find upper bounds for the approximation error:

TheoremLet m ≤ 2n−1 − 1. Then the Kullback-Leibler divergence from anydistribution on {0, 1}n to RBMn,m is upper bounded by

maxp

D(p‖RBMn,m) ≤ n − blog(m + 1)c −m + 1

2blog(m+1)c

The bound gives an idea about the value of additional hiddennodes.

Idea of the proof:

Approximate a distribution p by a mixture of productdistributions on disjoint supports.

Where p puts more mass, the approximation must be better(choose smaller cubical sets).

Rauh, Montufar, Ay (� MPI MIS) : Mixture Models and RBMs Graphical Models Workshop 15 / 25

Page 18: The expressive power of mixture models and Restricted … · 2012. 4. 17. · Mixture Models and RBMs Relations between mixture models and RBMs RBM n;m is a submodel of M n;2m: p(v)

Submodels of RBMs

Examples

TheoremLet m ≤ 2n−1 − 1. Then the Kullback-Leibler divergence from anydistribution on {0, 1}n to RBMn,m is upper bounded by

maxp

D(p‖RBMn,m) ≤ n − blog(m + 1)c −m + 1

2blog(m+1)c

0 1 2 30

0.5

1

1.5

2

D(u+‖RBM), n = 3

m0 1 2 3 4 5 6 7

0

0.5

1

1.5

2

2.5

3

D(u+‖RBM), n = 4

m

Rauh, Montufar, Ay (� MPI MIS) : Mixture Models and RBMs Graphical Models Workshop 16 / 25

Page 19: The expressive power of mixture models and Restricted … · 2012. 4. 17. · Mixture Models and RBMs Relations between mixture models and RBMs RBM n;m is a submodel of M n;2m: p(v)

The number of modes

Outline

Mixture Models and RBMs

Submodels of RBMs

The number of modes

Rauh, Montufar, Ay (� MPI MIS) : Mixture Models and RBMs Graphical Models Workshop 17 / 25

Page 20: The expressive power of mixture models and Restricted … · 2012. 4. 17. · Mixture Models and RBMs Relations between mixture models and RBMs RBM n;m is a submodel of M n;2m: p(v)

The number of modes

The set Z+

QuestionHow to prove that Mn,k is not full if k < 2n−1?

Denote Z± the elements of {0, 1}n with even/odd parity.

000

010

011

100100

111

LemmaIf k < 2n−1, then Mn,k does not contain the uniform distribution u+on Z+.

Rauh, Montufar, Ay (� MPI MIS) : Mixture Models and RBMs Graphical Models Workshop 18 / 25

Page 21: The expressive power of mixture models and Restricted … · 2012. 4. 17. · Mixture Models and RBMs Relations between mixture models and RBMs RBM n;m is a submodel of M n;2m: p(v)

The number of modes

The set Z+

000

010

011

100100

111

LemmaIf k < 2n−1, then Mn,k does not contain the uniform distribution u+on Z+.

Proof.

If u+ =∑N

i=1 λi pi, with λi > 0, then Z+ = ∪i supp(pi).

If the pi are product measures, then supp(pi) is cubical.

Only the one-element subsets of Z+ are cubical. �

Rauh, Montufar, Ay (� MPI MIS) : Mixture Models and RBMs Graphical Models Workshop 19 / 25

Page 22: The expressive power of mixture models and Restricted … · 2012. 4. 17. · Mixture Models and RBMs Relations between mixture models and RBMs RBM n;m is a submodel of M n;2m: p(v)

The number of modes

The set Z+

000

010

011

100100

111

Note: Mn,k is full if and only if Mn,k contains u+.

ConjectureThe same is true for RBMs: RBMn,m is full if and only if RBMn,m

contains u+.

(true for n = 3)

Rauh, Montufar, Ay (� MPI MIS) : Mixture Models and RBMs Graphical Models Workshop 19 / 25

Page 23: The expressive power of mixture models and Restricted … · 2012. 4. 17. · Mixture Models and RBMs Relations between mixture models and RBMs RBM n;m is a submodel of M n;2m: p(v)

The number of modes

The number of modes

Idea: Find a neighbourhood of u+ which is not contained in Mn,k.

DefinitionA mode of a distribution p is a strict local maximum of p, where“local” refers to the neighbourhood structure on the cube.

A single product distribution has (at most) one mode.u+ has 2n−1 modes.If p has 2n−1 modes, then the set of modes equals Z+ or Z−.

000

010

011

100100

111

Rauh, Montufar, Ay (� MPI MIS) : Mixture Models and RBMs Graphical Models Workshop 20 / 25

Page 24: The expressive power of mixture models and Restricted … · 2012. 4. 17. · Mixture Models and RBMs Relations between mixture models and RBMs RBM n;m is a submodel of M n;2m: p(v)

The number of modes

The number of modes

Idea: Find a neighbourhood of u+ which is not contained in Mn,k.

DefinitionA mode of a distribution p is a strict local maximum of p, where“local” refers to the neighbourhood structure on the cube.

A single product distribution has (at most) one mode.u+ has 2n−1 modes.If p has 2n−1 modes, then the set of modes equals Z+ or Z−.

QuestionHow many modes can a mixture of k product distributions have?

Rauh, Montufar, Ay (� MPI MIS) : Mixture Models and RBMs Graphical Models Workshop 20 / 25

Page 25: The expressive power of mixture models and Restricted … · 2012. 4. 17. · Mixture Models and RBMs Relations between mixture models and RBMs RBM n;m is a submodel of M n;2m: p(v)

The number of modes

The number of modes

Let α(n, k) denote the maximum number of modes that p ∈ Mn,k

may have.

Properties:

2n−1 ≥ α(n, k) ≥ min{k, 2n−1}

α(n, 1) = 1

α(3, 2) = 2

α(3, 3) = 3

CorollaryM3,3 is not full.

Rauh, Montufar, Ay (� MPI MIS) : Mixture Models and RBMs Graphical Models Workshop 21 / 25

Page 26: The expressive power of mixture models and Restricted … · 2012. 4. 17. · Mixture Models and RBMs Relations between mixture models and RBMs RBM n;m is a submodel of M n;2m: p(v)

The number of modes

Distributions with four modes

Resultα(3, 3) = 3, and hence M3,3 is not full.

Note that there are distributions with four modes arbitrarily close tothe uniform distribution.

CorollaryThe uniform distribution is not an interior point of M3,3, even thoughM3,3 is full-dimensional.In particular, the uniform distribution is a singularity of M3,3.

The set of distributions with four modes is a union of twopolyhedral sets, containing distributions with four modes on Z±.Both polyhedral parts touch the uniform distributions.

Rauh, Montufar, Ay (� MPI MIS) : Mixture Models and RBMs Graphical Models Workshop 22 / 25

Page 27: The expressive power of mixture models and Restricted … · 2012. 4. 17. · Mixture Models and RBMs Relations between mixture models and RBMs RBM n;m is a submodel of M n;2m: p(v)

The number of modes

The number of modes

Let α(n, k) denote the maximum number of nodes that p ∈ Mn,k

may have.

Properties:

2n−1 ≥ α(n, k) ≥ min{k, 2n−1}

α(n, 1) = 1

α(3, 2) = 2

α(3, 3) = 3

α(4, 2) = 3

Rauh, Montufar, Ay (� MPI MIS) : Mixture Models and RBMs Graphical Models Workshop 23 / 25

Page 28: The expressive power of mixture models and Restricted … · 2012. 4. 17. · Mixture Models and RBMs Relations between mixture models and RBMs RBM n;m is a submodel of M n;2m: p(v)

The number of modes

The number of modes

Let α(n, k) denote the maximum number of nodes that p ∈ Mn,k

may have.

Properties:

2n−1 ≥ α(n, k) ≥ min{k, 2n−1}

α(n, 1) = 1

α(3, 2) = 2

α(3, 3) = 3

α(4, 2) = 3

This tells us:It is not sufficient to consider the number of modes:For example, α(4, 7) = 8, but M4,7 is not full.

Rauh, Montufar, Ay (� MPI MIS) : Mixture Models and RBMs Graphical Models Workshop 23 / 25

Page 29: The expressive power of mixture models and Restricted … · 2012. 4. 17. · Mixture Models and RBMs Relations between mixture models and RBMs RBM n;m is a submodel of M n;2m: p(v)

Conclusions

Summary

Mixture models and RBM are important statistical models withmany open problems.

Mixtures of product distributions with disjoint supports canhelp to understand RBMs.

Distributions with the maximal number of modes are difficult toapproximate with mixture models and RBMs.

Open Problems:

Compute α(n, k) for n ≥ 4, k ≥ 2.

Is the uniform distribution always a singularity of Mn,k (unlessthe model is full)?

What about RBMn,m?

Rauh, Montufar, Ay (� MPI MIS) : Mixture Models and RBMs Graphical Models Workshop 24 / 25

Page 30: The expressive power of mixture models and Restricted … · 2012. 4. 17. · Mixture Models and RBMs Relations between mixture models and RBMs RBM n;m is a submodel of M n;2m: p(v)

Conclusions

further reading

G. Montufar, J. Rauh, N. AyExpressive Power and Approximation Errors of RBMsNIPS 2011

G. MontufarMixture Decompositions using a Decomposition of the SampleSpacearXiv 1008.0204

A. Cueto, J. Morton, B. SturmfelsGeometry of the Restricted Boltzmann MachineAlgebraic Methods in Statistics and Probability II

P. Zwiernik, J. SmithImplicit inequality constraints in a binary tree modelElectronic Journal of Statistics

Rauh, Montufar, Ay (� MPI MIS) : Mixture Models and RBMs Graphical Models Workshop 25 / 25