MeanExponential Family
Application
Bhattacharyya clustering with applications to mixture simplifications
ICPR 2010, Istanbul, Turkey
Frank Nielsen (1,2), Sylvain Boltz (1), Olivier Schwander (1,3)
(1) Ecole Polytechnique, France
(2) Sony Computer Science Laboratories, Japan
(3) ENS Cachan, France
August 24, 2010
Nielsen, Boltz, Schwander Bhattacharyya clustering and mixture simplification
Mean: Definition, Burbea-Rao divergences, Burbea-Rao centroid
Exponential Family: Definition, Bhattacharyya distance, Closed-form formula
Application: Statistical mixtures, Mixture simplification
Introduction
Bhattacharyya distance
- Widely used to compare probability density functions
- Good statistical properties, related to Fisher information
- Measures the overlap between two distributions
Bhattacharyya coefficient
$$B_c(p, q) = \int \sqrt{p(x)\,q(x)}\,dx \le 1$$
Bhattacharyya distance
$$B(p, q) = -\log B_c(p, q) \ge 0$$
Contributions
Drawbacks
- Few closed-form formulas are known
- Centroid estimation only for univariate Gaussians, without guarantees
Results
- Bhattacharyya distance between exponential families, using Burbea-Rao divergences
- Efficient scheme for the centroid
- Application to the simplification of Gaussian mixtures
What is a mean?
Euclidean geometry
- Given a set of n points $\{p_i\}$,
- the center of mass (a.k.a. center of gravity) is
$$c = \frac{1}{n} \sum_i p_i$$
Unique minimizer of the average squared Euclidean distance
$$c = \arg\min_p \sum_i \|p - p_i\|^2$$
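The minimizer property of the center of mass can be checked numerically. The following is a minimal sketch on made-up 2D points; the function names and the data are illustrative assumptions, not part of the slides.

```python
def center_of_mass(points):
    """Coordinate-wise arithmetic mean of a list of equal-length tuples."""
    n = len(points)
    dim = len(points[0])
    return tuple(sum(p[k] for p in points) / n for k in range(dim))


def sum_squared_distances(c, points):
    """Sum over i of the squared Euclidean distance ||c - p_i||^2."""
    return sum(sum((ck - pk) ** 2 for ck, pk in zip(c, p)) for p in points)


points = [(0.0, 0.0), (4.0, 0.0), (2.0, 6.0)]  # illustrative data
c = center_of_mass(points)
best = sum_squared_distances(c, points)

# Shifting the candidate away from c in any of these directions
# should only increase the objective.
shifts = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
is_minimum = all(
    sum_squared_distances((c[0] + dx, c[1] + dy), points) > best
    for dx, dy in shifts
)
print(c, best, is_minimum)
```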
Definitions
- By axiomatization
- By optimization
Axiomatization
Axioms for a mean function $M(x_1, x_2)$
- Reflexivity: $M(x, x) = x$
- Symmetry: $M(x_1, x_2) = M(x_2, x_1)$
- Continuity: $M(\cdot, \cdot)$ is continuous
- Strict monotonicity: $M(x_1, x_2) < M(x'_1, x_2)$ for $x_1 < x'_1$
- Anonymity: $M(M(x_{11}, x_{12}), M(x_{21}, x_{22})) = M(M(x_{11}, x_{21}), M(x_{12}, x_{22}))$
Yields a unique family
$$M(x_1, x_2) = f^{-1}\left(\frac{f(x_1) + f(x_2)}{2}\right)$$
with $f$ a continuous, strictly monotonic and increasing function
Examples and f-representation
Some f-means
- Arithmetic mean: $\frac{x_1 + x_2}{2}$ with $f(x) = x$
- Geometric mean: $\sqrt{x_1 x_2}$ with $f(x) = \log x$
- Harmonic mean: $\frac{2}{\frac{1}{x_1} + \frac{1}{x_2}}$ with $f(x) = \frac{1}{x}$
Arithmetic mean on the f-representation
- $y = f(x)$
- $f(\bar{x}) = \frac{1}{n} \sum_i f(x_i)$
- $\bar{y} = \frac{1}{n} \sum_i y_i$
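The f-representation suggests a single generic routine: map the values through f, average, and map back through the inverse of f. A minimal sketch follows; the helper name `f_mean` and the sample values are assumptions made for illustration.

```python
import math


def f_mean(values, f, f_inv):
    """Quasi-arithmetic mean: f_inv applied to the arithmetic mean of f(x_i)."""
    return f_inv(sum(f(x) for x in values) / len(values))


def arithmetic_mean(values):
    return f_mean(values, lambda x: x, lambda y: y)


def geometric_mean(values):
    return f_mean(values, math.log, math.exp)


def harmonic_mean(values):
    return f_mean(values, lambda x: 1.0 / x, lambda y: 1.0 / y)


values = [1.0, 4.0]  # illustrative data
print(arithmetic_mean(values))  # (1 + 4) / 2
print(geometric_mean(values))   # sqrt(1 * 4), up to rounding
print(harmonic_mean(values))    # 2 / (1/1 + 1/4), up to rounding
```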
Optimization
Problem
$$\min_x \sum_i \omega_i\, d(x, p_i) = \min_x L(x; \{p_i\}, \{\omega_i\}, d)$$
Entropic mean (Ben-Tal et al., 1989)
- $d(p, q) = I_f(p, q) = p\, f\left(\frac{q}{p}\right)$ (Csiszár f-divergence)
- $f$ is a strictly convex differentiable function with $f(1) = 0$ and $f'(1) = 0$
Some entropic means
- Arithmetic mean: $f(x) = -\log x + x - 1$
- Geometric mean: $f(x) = x \log x - x + 1$
- Harmonic mean: $f(x) = (x - 1)^2$
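The three generators can be tried out numerically. The sketch below minimizes the weighted loss with a plain ternary search, which is enough for a one-dimensional convex problem. It assumes that each term is evaluated as p_i f(x / p_i); this argument order is an assumption of the sketch (under it, the three generators give back the arithmetic, geometric and harmonic means), and the search bounds and data are illustrative.

```python
import math


def entropic_mean(points, weights, f, lo=1e-6, hi=1e3, iters=200):
    """Minimise sum_i w_i * p_i * f(x / p_i) over x in [lo, hi].

    Assumption: the distance term is p_i * f(x / p_i). The loss is convex
    in x because f is convex, so ternary search finds the minimiser.
    """
    def loss(x):
        return sum(w * p * f(x / p) for p, w in zip(points, weights))

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if loss(m1) < loss(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2.0


points, weights = [1.0, 4.0], [0.5, 0.5]  # illustrative data
arith = entropic_mean(points, weights, lambda t: -math.log(t) + t - 1)
geom = entropic_mean(points, weights, lambda t: t * math.log(t) - t + 1)
harm = entropic_mean(points, weights, lambda t: (t - 1) ** 2)
print(arith, geom, harm)
```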
Bregman means
Bregman divergence
- $B_F(p, q) = F(p) - F(q) - \langle p - q \mid \nabla F(q) \rangle$
- $F$ is a strictly convex and differentiable function
Convex problem
- unique minimizer
- $c = \nabla F^{-1}\left(\sum_i \omega_i \nabla F(p_i)\right)$
Since $B_F$ is not symmetric, there are two centroids
- Left-sided one: $\min_x \sum_i \omega_i B_F(x, p_i)$
- Right-sided one: $\min_x \sum_i \omega_i B_F(p_i, x)$
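Both sided centroids have closed forms, which a scalar example makes concrete. The sketch below assumes the generator F(x) = x log x - x (chosen here only for illustration), for which the gradient is log and its inverse is exp. With the divergence defined as F(p) - F(q) - (p - q) F'(q), the left-sided centroid is the gradient-space average and the right-sided centroid is the center of mass.

```python
import math


def F(x):
    return x * math.log(x) - x  # illustrative strictly convex generator


def grad_F(x):
    return math.log(x)


def grad_F_inv(y):
    return math.exp(y)


def bregman(p, q):
    """B_F(p, q) = F(p) - F(q) - (p - q) * F'(q)."""
    return F(p) - F(q) - (p - q) * grad_F(q)


def left_centroid(points, weights):
    """argmin_x sum_i w_i B_F(x, p_i): average taken in gradient space."""
    return grad_F_inv(sum(w * grad_F(p) for p, w in zip(points, weights)))


def right_centroid(points, weights):
    """argmin_x sum_i w_i B_F(p_i, x): the center of mass."""
    return sum(w * p for p, w in zip(points, weights))


def left_loss(x, points, weights):
    return sum(w * bregman(x, p) for p, w in zip(points, weights))


def right_loss(x, points, weights):
    return sum(w * bregman(p, x) for p, w in zip(points, weights))


points, weights = [1.0, 4.0], [0.5, 0.5]  # illustrative data
c_left = left_centroid(points, weights)    # geometric mean for this F
c_right = right_centroid(points, weights)  # arithmetic mean
print(c_left, c_right)
```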
Burbea-Rao divergence
Based on Jensen's inequality for a convex function $F$
$$BR_F(p, q) = \frac{F(p) + F(q)}{2} - F\left(\frac{p + q}{2}\right) \ge 0$$
Special case: Jensen-Shannon divergence
- $JS(p, q) = \frac{1}{2} KL\left(p, \frac{p+q}{2}\right) + \frac{1}{2} KL\left(q, \frac{p+q}{2}\right)$
- $JS(p, q) = H\left(\frac{p+q}{2}\right) - \frac{H(p) + H(q)}{2} \ge 0$
- $H(x) = -F(x) = -x \log x$ (Shannon entropy)
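The Jensen-Shannon special case can be verified on a small discrete example. The sketch below takes F to be the negative Shannon entropy summed over the coordinates of a probability vector and uses the 1/2-weighted form of Jensen-Shannon, for which the KL expression and the entropy expression coincide. Function names and the two distributions are illustrative assumptions.

```python
import math


def neg_entropy(p):
    """F(p) = sum_k p_k log p_k, the negative Shannon entropy."""
    return sum(x * math.log(x) for x in p if x > 0)


def burbea_rao(p, q, F):
    """BR_F(p, q) = (F(p) + F(q)) / 2 - F((p + q) / 2)."""
    mid = [(a + b) / 2.0 for a, b in zip(p, q)]
    return (F(p) + F(q)) / 2.0 - F(mid)


def kl(p, q):
    """Kullback-Leibler divergence between discrete distributions."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def jensen_shannon(p, q):
    """Average KL divergence of p and q to their midpoint."""
    mid = [(a + b) / 2.0 for a, b in zip(p, q)]
    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)


p = [0.5, 0.5]  # illustrative distributions
q = [0.9, 0.1]
br = burbea_rao(p, q, neg_entropy)
js = jensen_shannon(p, q)
print(br, js)
```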
Symmetrizing Bregman divergences
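Jeffreys-Bregman divergence
$$S_F(p, q) = \frac{1}{2}\left(B_F(p, q) + B_F(q, p)\right) = \frac{1}{2} \langle p - q \mid \nabla F(p) - \nabla F(q) \rangle$$
Jensen-Bregman divergence
$$J_F(p, q) = \frac{1}{2}\left(B_F\left(p, \frac{p + q}{2}\right) + B_F\left(q, \frac{p + q}{2}\right)\right) = \frac{F(p) + F(q)}{2} - F\left(\frac{p + q}{2}\right) = BR_F(p, q)$$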
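Both symmetrization identities are easy to confirm numerically. The sketch below uses the scalar generator F(x) = exp(x), an arbitrary choice made for illustration, and the Bregman divergence F(p) - F(q) - (p - q) F'(q).

```python
import math

F = math.exp       # illustrative strictly convex generator
grad_F = math.exp  # its derivative


def bregman(p, q):
    return F(p) - F(q) - (p - q) * grad_F(q)


def jeffreys_bregman(p, q):
    """Average of the two orientations of the Bregman divergence."""
    return 0.5 * (bregman(p, q) + bregman(q, p))


def jensen_bregman(p, q):
    """Average Bregman divergence of p and q to their midpoint."""
    mid = (p + q) / 2.0
    return 0.5 * (bregman(p, mid) + bregman(q, mid))


def burbea_rao(p, q):
    return (F(p) + F(q)) / 2.0 - F((p + q) / 2.0)


p, q = 0.3, 1.7  # illustrative values
inner_product_form = 0.5 * (p - q) * (grad_F(p) - grad_F(q))
print(jeffreys_bregman(p, q), inner_product_form)
print(jensen_bregman(p, q), burbea_rao(p, q))
```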
Burbea-Rao centroid
Optimization problem
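- $c = \arg\min_x \sum_i \omega_i BR_F(x, p_i) = \arg\min_x L(x)$
- $L(x) \equiv \underbrace{\frac{1}{2} F(x)}_{\text{convex}} \underbrace{- \sum_i \omega_i F\left(\frac{x + p_i}{2}\right)}_{\text{concave}}$
Concave-Convex Procedure (CCCP, NIPS 2001)
- iterative scheme
- $\nabla L_{\text{convex}}(x^{(k+1)}) = -\nabla L_{\text{concave}}(x^{(k)})$
- converges to a local minimum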
Concave-Convex Procedure
Possible decomposition for functions with bounded Hessian
Iterative algorithm for Burbea-Rao centroids
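Initialization
$x^{(0)}$: center of mass (Bregman right-sided centroid), or the symmetrized KL divergence centroid
Iteration
$$\nabla F(x^{(t+1)}) = \sum_i \omega_i \nabla F\left(\frac{x^{(t)} + p_i}{2}\right)$$
Centroid
$$x^{(t+1)} = \nabla F^{-1}\left(\sum_i \omega_i \nabla F\left(\frac{x^{(t)} + p_i}{2}\right)\right)$$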
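The update rule translates directly into a short loop. The sketch below is a scalar version under stated assumptions: the generator is F(θ) = exp(θ), so the gradient is exp and its inverse is log, and the iteration starts from the center of mass. For this particular F the fixed point can be worked out by hand as θ* = 2 log Σ_i ω_i exp(θ_i / 2), which is used only as a sanity check; the function name and the data are illustrative.

```python
import math


def burbea_rao_centroid(points, weights, grad_F, grad_F_inv, iters=100):
    """Fixed-point iteration x <- grad_F_inv(sum_i w_i grad_F((x + p_i) / 2)).

    Scalar sketch; weights are assumed to sum to one.
    """
    x = sum(w * p for p, w in zip(points, weights))  # center of mass start
    for _ in range(iters):
        x = grad_F_inv(
            sum(w * grad_F((x + p) / 2.0) for p, w in zip(points, weights))
        )
    return x


# Illustrative data: natural parameters theta_i = log(lambda_i).
lambdas = [1.0, 4.0, 9.0]
thetas = [math.log(lam) for lam in lambdas]
weights = [1.0 / 3.0] * 3

theta_c = burbea_rao_centroid(thetas, weights, math.exp, math.log)

# Hand-derived fixed point for F = exp, used as a check only.
theta_star = 2.0 * math.log(
    sum(w * math.exp(t / 2.0) for t, w in zip(thetas, weights))
)
print(math.exp(theta_c), math.exp(theta_star))
```

For this choice of F the error halves at every step, so a few dozen iterations are already enough; other generators may converge at a different rate.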
Exponential family
Definition
$$p(x; \lambda) = p_F(x; \theta) = \exp\left(\langle t(x) \mid \theta \rangle - F(\theta) + k(x)\right)$$
- $\lambda$ source parameter
- $\theta$ natural parameter
- $F(\theta)$ log-normalizer
- $k(x)$ carrier measure
Example
Poisson distribution
$$p(x; \lambda) = \frac{\lambda^x}{x!} \exp(-\lambda)$$
- $t(x) = x$
- $\theta = \log \lambda$
- $F(\theta) = \exp(\theta)$
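The decomposition can be checked by evaluating the Poisson mass function both ways. The sketch below assumes the carrier term k(x) = -log(x!), which the slide does not list but which follows from the x! in the denominator; function names are illustrative.

```python
import math


def poisson_pmf(x, lam):
    """Poisson probability mass function in its usual form."""
    return lam ** x * math.exp(-lam) / math.factorial(x)


def poisson_pmf_exponential_family(x, lam):
    """Same mass function written as exp(t(x) * theta - F(theta) + k(x))."""
    theta = math.log(lam)             # natural parameter
    t = x                             # sufficient statistic t(x) = x
    log_normalizer = math.exp(theta)  # F(theta) = exp(theta)
    carrier = -math.lgamma(x + 1)     # assumed k(x) = -log(x!)
    return math.exp(t * theta - log_normalizer + carrier)


lam = 3.5  # illustrative source parameter
for x in range(5):
    print(x, poisson_pmf(x, lam), poisson_pmf_exponential_family(x, lam))
```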
Multivariate normal distribution
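Gaussian
$$p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} \sqrt{\det \Sigma}} \exp\left(-\frac{(x - \mu)^\top \Sigma^{-1} (x - \mu)}{2}\right)$$
Exponential family
- $\theta = (\theta_1, \theta_2) = \left(\Sigma^{-1}\mu, \frac{1}{2}\Sigma^{-1}\right)$
- $F(\theta) = \frac{1}{4} \mathrm{tr}\left(\theta_2^{-1} \theta_1 \theta_1^\top\right) - \frac{1}{2} \log\det\theta_2 + \frac{d}{2} \log\pi$
- $t(x) = (x, -x x^\top)$
- $k(x) = 0$
Composite vector-matrix inner product
$$\langle \theta, \theta' \rangle = \theta_1^\top \theta'_1 + \mathrm{tr}\left(\theta_2^\top \theta'_2\right)$$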
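In one dimension the parameterization is simple enough to check by hand. The sketch below is a univariate specialization (d = 1) and assumes θ1 = μ/σ² is the scalar part, θ2 = 1/(2σ²) the precision part, t(x) = (x, -x²), and log-normalizer F(θ) = θ1²/(4θ2) - (1/2) log θ2 + (1/2) log π. It compares the exponential-family form with the usual density; function names are illustrative.

```python
import math


def gaussian_pdf(x, mu, var):
    """Univariate normal density in its usual form."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def gaussian_pdf_exponential_family(x, mu, var):
    """Same density as exp(<t(x), theta> - F(theta)), with k(x) = 0."""
    theta1 = mu / var           # scalar counterpart of Sigma^{-1} mu
    theta2 = 1.0 / (2.0 * var)  # scalar counterpart of Sigma^{-1} / 2
    log_normalizer = (
        theta1 ** 2 / (4.0 * theta2)
        - 0.5 * math.log(theta2)
        + 0.5 * math.log(math.pi)
    )
    inner = x * theta1 + (-x * x) * theta2  # t(x) = (x, -x^2)
    return math.exp(inner - log_normalizer)


for x in (-1.0, 0.0, 2.5):
    print(x, gaussian_pdf(x, 1.0, 2.0), gaussian_pdf_exponential_family(x, 1.0, 2.0))
```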
Bhattacharyya distance
Bhattacharyya coefficient
- Amount of overlap between distributions
- $B_c(p, q) = \int \sqrt{p(x)\,q(x)}\,dx$
Bhattacharyya distance
- $B(p, q) = -\log B_c(p, q)$
Metrization
- Hellinger-Matusita metric
- $H(p, q) = \sqrt{1 - B_c(p, q)}$
- Gives the same Voronoi diagram
Closed-form formula
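$$B_c(p, q) = \int \sqrt{p(x)\,q(x)}\,dx$$
$$= \int \exp\left(\left\langle t(x), \frac{\theta_p + \theta_q}{2} \right\rangle - \frac{F(\theta_p) + F(\theta_q)}{2} + k(x)\right) dx$$
$$= \exp\left(F\left(\frac{\theta_p + \theta_q}{2}\right) - \frac{F(\theta_p) + F(\theta_q)}{2}\right) > 0$$
$$B(p, q) = -\log B_c(p, q) = BR_F(\theta_p, \theta_q) \ge 0$$
Equivalence
- Bhattacharyya distance between two members of the same exponential family
- Burbea-Rao divergence between their natural parameters, using the log-normalizer as $F$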
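The equivalence can be tested on the Poisson family, where F(θ) = exp(θ) and θ = log λ. The sketch below compares the Burbea-Rao expression on natural parameters with a direct evaluation of the Bhattacharyya coefficient as a truncated sum over the integers; the truncation point and the two rates are illustrative assumptions.

```python
import math


def bhattacharyya_poisson_closed_form(lam_p, lam_q):
    """Burbea-Rao divergence of F = exp between theta = log(lambda)."""
    theta_p, theta_q = math.log(lam_p), math.log(lam_q)
    return (
        (math.exp(theta_p) + math.exp(theta_q)) / 2.0
        - math.exp((theta_p + theta_q) / 2.0)
    )


def bhattacharyya_poisson_numeric(lam_p, lam_q, x_max=200):
    """-log of sum_x sqrt(p(x) q(x)), truncated at x_max."""
    def log_pmf(x, lam):
        return x * math.log(lam) - lam - math.lgamma(x + 1)

    coefficient = sum(
        math.exp(0.5 * (log_pmf(x, lam_p) + log_pmf(x, lam_q)))
        for x in range(x_max + 1)
    )
    return -math.log(coefficient)


closed = bhattacharyya_poisson_closed_form(2.0, 5.0)
numeric = bhattacharyya_poisson_numeric(2.0, 5.0)
print(closed, numeric)
```

For two Poisson rates the closed form reduces to (λp + λq)/2 - sqrt(λp λq), so no summation is needed once the natural parameters are known.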
Examples
Gaussian Mixture Models
Mixture
- $\Pr(X = x) = \sum_i \omega_i \Pr(X = x \mid \mu_i, \Sigma_i)$
- each $\Pr(X = x \mid \mu_i, \Sigma_i)$ is a multivariate normal distribution
Soft Clustering
Expectation-Maximization algorithm, equivalent to soft Bregman clustering
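As a concrete reading of the mixture formula, the sketch below evaluates the density of a univariate Gaussian mixture as the weighted sum of its component densities. It is a one-dimensional simplification of the multivariate case on the slide, and the component values are made up for illustration.

```python
import math


def normal_pdf(x, mu, var):
    """Univariate normal density."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, weights, mus, variances):
    """Weighted sum of component densities; weights should sum to one."""
    return sum(
        w * normal_pdf(x, mu, var)
        for w, mu, var in zip(weights, mus, variances)
    )


weights = [0.3, 0.7]  # illustrative mixture
mus = [0.0, 5.0]
variances = [1.0, 2.0]

# Riemann sum of the density over a wide interval, as a sanity check.
step = 0.01
total_mass = sum(
    mixture_pdf(-20.0 + k * step, weights, mus, variances) * step
    for k in range(5000)
)
print(mixture_pdf(1.0, weights, mus, variances), total_mass)
```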
Statistical images
http://www.informationgeometry.org/MEF/
RGBxy representation: 5D point set
Mixture simplification
Initialization
- Mixture of Gaussians, with Bregman soft clustering (≡ EM)
Simplification
- k-means using Bhattacharyya distance and centroids
Different k
- Hierarchical clustering
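The simplification step can be sketched for univariate Gaussian components. The code below is a toy version under several assumptions that are not taken from the slides: components are (weight, mean, variance) triples, the first k components seed the centers, assignment uses the Bhattacharyya distance computed as a Burbea-Rao divergence on natural parameters, and each center is refreshed with the fixed-point centroid iteration weighted by the mixture weights.

```python
import math


def to_natural(mu, var):
    return (mu / var, 1.0 / (2.0 * var))


def from_natural(theta):
    var = 1.0 / (2.0 * theta[1])
    return (theta[0] * var, var)


def log_normalizer(theta):
    t1, t2 = theta
    return t1 * t1 / (4.0 * t2) - 0.5 * math.log(t2) + 0.5 * math.log(math.pi)


def grad_F(theta):
    # Gradient of the log-normalizer: (E[x], -E[x^2]).
    mu, var = from_natural(theta)
    return (mu, -(mu * mu + var))


def grad_F_inv(eta):
    mu = eta[0]
    return to_natural(mu, -eta[1] - mu * mu)


def midpoint(a, b):
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)


def bhattacharyya(a, b):
    """Burbea-Rao divergence of the log-normalizer on natural parameters."""
    return (log_normalizer(a) + log_normalizer(b)) / 2.0 - log_normalizer(midpoint(a, b))


def centroid(thetas, weights, iters=50):
    total = sum(weights)
    w = [x / total for x in weights]  # normalise weights inside the cluster
    c = (sum(wi * t[0] for wi, t in zip(w, thetas)),
         sum(wi * t[1] for wi, t in zip(w, thetas)))
    for _ in range(iters):
        grads = [grad_F(midpoint(c, t)) for t in thetas]
        c = grad_F_inv((sum(wi * g[0] for wi, g in zip(w, grads)),
                        sum(wi * g[1] for wi, g in zip(w, grads))))
    return c


def simplify(components, k, rounds=10):
    """k-means over components; returns k (weight, mean, variance) triples."""
    thetas = [to_natural(mu, var) for _, mu, var in components]
    weights = [w for w, _, _ in components]
    centers = thetas[:k]  # naive seeding, an assumption of this sketch
    for _ in range(rounds):
        clusters = [[] for _ in range(k)]
        for idx, th in enumerate(thetas):
            j = min(range(k), key=lambda c: bhattacharyya(th, centers[c]))
            clusters[j].append(idx)
        centers = [
            centroid([thetas[i] for i in cl], [weights[i] for i in cl]) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
    return [(sum(weights[i] for i in cl),) + from_natural(c)
            for cl, c in zip(clusters, centers)]


components = [(0.25, 0.0, 1.0), (0.25, 10.0, 1.0),
              (0.25, 0.5, 1.2), (0.25, 10.5, 0.8)]  # illustrative mixture
simplified = simplify(components, 2)
print(simplified)
```

A real pipeline would start from a mixture fitted by EM and would likely use a better seeding than the first k components; both choices are left out here to keep the sketch short.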
Hierarchical clustering
Conclusion
Results
- Symmetrizing Bregman divergences yields Burbea-Rao divergences
- Bhattacharyya distance between exponential families yields a Burbea-Rao divergence
- Closed-form formula for the Bhattacharyya distance between members of an exponential family
- Efficient scheme for the Burbea-Rao centroid using CCCP
Applications
- Simplification of Gaussian Mixture Models
- Hierarchical Clustering
References
- Statistical exponential families: A digest with flash cards, F. Nielsen and V. Garcia, arXiv 2009.
- An optimal Bhattacharyya centroid algorithm for Gaussian clustering with applications in automatic speech recognition, ICASSP 2000.
- The concave-convex procedure, A. Yuille and A. Rangarajan, Neural Computation, vol. 15, no. 4, pp. 915-936, 2003.
- The Burbea-Rao and Bhattacharyya centroids, F. Nielsen and S. Boltz, arXiv 2010.
www.informationgeometry.org