Page 1: Bregman  Information Bottleneck

Bregman Information Bottleneck

NIPS’03, Whistler December 2003

Koby Crammer, Hebrew University of Jerusalem

Noam Slonim, Princeton University

Page 2: Bregman  Information Bottleneck

Motivation

• Extend the IB to a broad family of representations

• Relation to the exponential family

Hello, world

Multinomial distribution

Vectors

Page 3: Bregman  Information Bottleneck

Outline

• Rate-Distortion Formulation

• Bregman Divergences

• Bregman IB

• Statistical Interpretation

• Summary

Page 4: Bregman  Information Bottleneck

Information Bottleneck

X T Y

X

[ p(y=1|X) … p(y=n|X)]

[ p(y=1|T) … p(y=n|T)]

T

Page 5: Bregman  Information Bottleneck

Rate-Distortion Formulation

• Input

• Variables

• Distortion

Page 6: Bregman  Information Bottleneck

Self-Consistent Equations

• Boltzmann distribution:

• Markov + Bayes

• Marginal

Page 7: Bregman  Information Bottleneck

Bregman DivergencesBregman Divergences

f

(u,f(u))

(v,f(v))

(v, f(u)+f’(u)(v-u))

f: S → R

Bf(v||u) = f(v) - (f(u) + f’(u)(v-u))
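The definition above reads as "the gap at v between f and its tangent taken at u". A minimal Python sketch of that definition for vector inputs follows. The function names (`bregman`, `half_sq`, `half_sq_grad`) are illustrative and do not come from the slides, and the caller is assumed to supply a differentiable convex f together with its gradient.

```python
def bregman(f, grad_f, v, u):
    """B_f(v||u) = f(v) - (f(u) + <grad f(u), v - u>) for vectors given as lists."""
    tangent_at_v = f(u) + sum(g * (vi - ui) for g, vi, ui in zip(grad_f(u), v, u))
    return f(v) - tangent_at_v

# Illustrative convex function: f(x) = (1/2) * sum_i x_i^2, with gradient x.
def half_sq(x):
    return 0.5 * sum(xi * xi for xi in x)

def half_sq_grad(x):
    return list(x)

v, u = [3.0, 0.0], [0.0, 4.0]
d = bregman(half_sq, half_sq_grad, v, u)   # half the squared distance between v and u
d_self = bregman(half_sq, half_sq_grad, u, u)
```

For a convex f the result is non-negative and is zero when v equals u, but it is generally not symmetric in its two arguments.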

Page 8: Bregman  Information Bottleneck

Bregman IB: Rate-Distortion Formulation

• Functional

• Bregman function

• Input

• Variables

• Distortion

Page 9: Bregman  Information Bottleneck

Self-Consistent Equations

• Boltzmann distribution:

• Prototypes: convex combination of input vectors

• Marginal
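The three self-consistent relations on this slide (Boltzmann assignment, prototype as a convex combination of inputs, marginal) can be alternated until they stabilize. The equations themselves did not survive in this transcript, so the sketch below assumes the usual rate-distortion forms: p(t|x) proportional to p(t) exp(-beta * d(x,t)), p(t) = sum_x p(x) p(t|x), and prototype u_t = sum_x p(x|t) v_x. The name `bregman_ib`, the fixed iteration count, and the toy data are all hypothetical.

```python
import math

def bregman_ib(vectors, p_x, prototypes, beta, div, n_iter=50):
    """Alternate the three updates; returns (p_t_given_x, prototypes, p_t)."""
    n, k, dim = len(vectors), len(prototypes), len(vectors[0])
    p_t = [1.0 / k] * k
    for _ in range(n_iter):
        # Boltzmann step: p(t|x) proportional to p(t) * exp(-beta * d(x, t)).
        p_t_given_x = []
        for v in vectors:
            w = [p_t[t] * math.exp(-beta * div(v, prototypes[t])) for t in range(k)]
            z = sum(w)  # partition function for this x
            p_t_given_x.append([wi / z for wi in w])
        # Marginal step: p(t) = sum_x p(x) p(t|x).
        p_t = [sum(p_x[i] * p_t_given_x[i][t] for i in range(n)) for t in range(k)]
        # Prototype step: convex combination of inputs with weights p(x|t).
        new_protos = []
        for t in range(k):
            if p_t[t] == 0.0:          # empty cluster: keep the old prototype
                new_protos.append(prototypes[t])
                continue
            new_protos.append([
                sum(p_x[i] * p_t_given_x[i][t] * vectors[i][j] for i in range(n)) / p_t[t]
                for j in range(dim)
            ])
        prototypes = new_protos
    return p_t_given_x, prototypes, p_t

def half_sq_dist(v, u):
    """Bregman divergence of f(x) = (1/2)||x||^2 (the soft k-means case)."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(v, u))

data = [[0.0], [0.2], [5.0], [5.2]]
assign, protos, marg = bregman_ib(data, [0.25] * 4, [[0.0], [5.0]], beta=5.0, div=half_sq_dist)
```

On this toy input the two prototypes settle near the means of the two well-separated groups. The sketch makes no claim about convergence guarantees or about the authors' actual algorithm.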

Page 10: Bregman  Information Bottleneck

Special Cases

• Information Bottleneck

Bregman function: f(x) = x log(x) – x

Domain: Simplex

Divergence: Kullback-Leibler

• Soft K-means

Bregman function: f(x) = (1/2) x^2

Domain: Reals^n

Divergence: (squared) Euclidean distance [Still, Bialek, Bottou, NIPS 2003]
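Both special cases can be checked numerically by plugging each Bregman function into the general definition B_f(v||u) = f(v) - f(u) - <grad f(u), v - u>. The sketch below assumes the functions are applied coordinate-wise and summed, and that all entries are strictly positive in the entropy case. The helper names are illustrative only.

```python
import math

def bregman(f, grad_f, v, u):
    return f(v) - f(u) - sum(g * (a - b) for g, a, b in zip(grad_f(u), v, u))

def neg_entropy(x):            # f(x) = sum_i (x_i log x_i - x_i)
    return sum(xi * math.log(xi) - xi for xi in x)

def neg_entropy_grad(x):       # gradient: log x_i
    return [math.log(xi) for xi in x]

def half_sq(x):                # f(x) = (1/2) sum_i x_i^2
    return 0.5 * sum(xi * xi for xi in x)

def half_sq_grad(x):           # gradient: x_i
    return list(x)

p, q = [0.8, 0.2], [0.6, 0.4]  # two points on the simplex

kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
b_entropy = bregman(neg_entropy, neg_entropy_grad, p, q)

half_sq_euclid = 0.5 * sum((pi - qi) ** 2 for pi, qi in zip(p, q))
b_square = bregman(half_sq, half_sq_grad, p, q)
```

Here `b_entropy` agrees with `kl` because both vectors sum to one; off the simplex the same function gives the generalized KL divergence, which carries an extra `sum(q) - sum(p)` term. `b_square` is half the squared Euclidean distance.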

Page 11: Bregman  Information Bottleneck

Bregman IB

Information Bottleneck

Bregman Clustering

Rate-Distortion

Exponential Family

Page 12: Bregman  Information Bottleneck

Exponential Family

• Expectation parameters:

• Examples (single dimension): Normal

Poisson

Page 13: Bregman  Information Bottleneck

Exponential Family and Bregman Divergences

• Expectation parameters:

• Properties:

Page 14: Bregman  Information Bottleneck

Illustration

Page 15: Bregman  Information Bottleneck

Exponential Family and Bregman Divergences

• Expectation parameters:

• Properties:

Page 16: Bregman  Information Bottleneck

Back to Distributional Clustering

• Distortion:

• Data vectors and prototypes: expectation parameters

• Question: For which exponential-family distribution does this distortion arise?

Answer: Poisson
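The "Poisson" answer can be made concrete with a standard identity: for a Poisson distribution with mean mu, the log-likelihood lost by using mu instead of the best-fitting mean x equals the Bregman divergence of f(x) = x log(x) - x, that is x log(x/mu) - x + mu. The sketch below checks this numerically. The function names are illustrative, and the slide's own equations are not visible in this transcript.

```python
import math

def poisson_logpmf(x, mu):
    """log of mu^x * exp(-mu) / x!"""
    return x * math.log(mu) - mu - math.lgamma(x + 1)

def generalized_kl(x, mu):
    """Bregman divergence of f(x) = x log x - x in one dimension."""
    return x * math.log(x / mu) - x + mu

x, mu = 7, 3.5
# Log-likelihood at the best-fitting mean (mu = x) minus log-likelihood at mu.
gap = poisson_logpmf(x, float(x)) - poisson_logpmf(x, mu)
div = generalized_kl(x, mu)
```

For a product of independent Poisson coordinates the per-coordinate terms add up, which gives the generalized KL divergence between the two mean vectors.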

Page 17: Bregman  Information Bottleneck

Illustration

Product of Poisson Distributions

a a b a a a b a a a

.8 .2

a b

60 40

a b

Pr

Multinomial Distribution

Page 18: Bregman  Information Bottleneck

Back to Distributional Clustering

• Information Bottleneck: Distributional clustering of Poisson distributions

• (Soft) k-means: (Soft) Clustering of Normal distributions

Page 19: Bregman  Information Bottleneck

Maximum Likelihood Perspective

• Distortion

• Input: Observations

• Output: Parameters of distribution

• IB functional: EM [Elidan & Friedman, before]

Page 20: Bregman  Information Bottleneck

Back to Self-Consistent Equations

• Posterior:

• Partition function: weighted β-norm of the likelihood

• β → ∞: the most likely cluster governs

• β → 0: clusters collapse into a single prototype
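The two limits can be seen numerically. The sketch below assumes the posterior has the form p(t|x) proportional to p(t) * L_t(x)^beta, where L_t(x) is the likelihood of x under cluster t and beta is the trade-off parameter (the symbol is missing from the transcript, so this form is an assumption). The partition function is then a prior-weighted sum of L^beta, and its 1/beta power behaves like a weighted norm of the likelihoods. The numbers are made up for illustration.

```python
def posterior(likelihoods, prior, beta):
    """Return (p(t|x), Z) with p(t|x) proportional to prior[t] * likelihoods[t]**beta."""
    w = [p * (l ** beta) for p, l in zip(prior, likelihoods)]
    z = sum(w)                      # partition function
    return [wi / z for wi in w], z

likelihoods = [0.5, 0.3, 0.2]       # hypothetical L_t(x) for three clusters
prior = [0.2, 0.3, 0.5]             # hypothetical p(t)

# beta -> 0: the posterior ignores x and falls back to the prior.
post_low, _ = posterior(likelihoods, prior, 0.0)

# Large beta: the most likely cluster takes essentially all the mass,
# and Z**(1/beta) approaches the largest likelihood.
post_high, z_high = posterior(likelihoods, prior, 200.0)
beta_norm = z_high ** (1.0 / 200.0)
```

When the posterior no longer depends on x, every cluster receives the same convex combination of inputs, which is one way to read "clusters collapse into a single prototype".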

Page 21: Bregman  Information Bottleneck

Summary

• Bregman Information Bottleneck: clustering/compression for many representations and divergences

• Statistical interpretation: clustering of distributions from the exponential family; EM-like formulation

• Current work: algorithms; characterizing distortion measures which also yield Boltzmann distributions; general distortion measures

