WAGOS
Conformal Changes of Divergence and Information Geometry
Shun-ichi Amari, RIKEN Brain Science Institute
Information Geometry
• Riemannian manifold with dual affine connections
• Manifold of probability distributions
Related fields: systems theory, information theory, statistics, neural networks, combinatorics, physics, information sciences, mathematics, AI, vision and shape, optimization
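Information Geometry?
A family of probability distributions $S = \{p(x)\}$ forms a manifold. Example, the Gaussian family:
$$p(x;\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}$$
$S = \{p(x;\theta)\}$ with coordinates $\theta = (\mu,\sigma)$, equipped with a Riemannian metric and dual affine connections.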
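Manifold of Probability Distributions
Example: $x \in \{1,2,3\}$, $S = \{p(x)\}$ with $p = (p_1,p_2,p_3)$ and $p_1 + p_2 + p_3 = 1$. $S$ is the interior of a triangle (2-simplex) whose vertices are the points where one of $p_1, p_2, p_3$ equals 1.
In general,
$$S_n = \{p \mid p = (p_0,p_1,\dots,p_n);\ \textstyle\sum_i p_i = 1;\ p_i > 0\}$$
$$\tilde S_n = \{p \mid p = (p_0,p_1,\dots,p_n);\ p_i > 0\}$$
$$S_n \subset \tilde S_n$$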
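Riemannian Structure
$$ds^2 = \sum_{i,j} g_{ij}(\theta)\,d\theta^i d\theta^j = d\theta^T G(\theta)\,d\theta,\qquad G = (g_{ij})$$
Euclidean space: $G = E$ (the identity matrix).
On a manifold of probability distributions, $G$ is the Fisher information matrix.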
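As a minimal numerical sketch of the Fisher metric, the snippet below evaluates $g_{ij} = E[\partial_i l\,\partial_j l]$ (with $l = \log p$) for the Gaussian family in the coordinates $\theta = (\mu,\sigma)$ by a Riemann sum on a wide grid. The function name `gaussian_fisher` and the grid settings are illustrative assumptions; the known closed form is $G = \mathrm{diag}(1/\sigma^2,\,2/\sigma^2)$.

```python
import numpy as np

def gaussian_fisher(mu: float, sigma: float, n: int = 200_001) -> np.ndarray:
    """Fisher information of N(mu, sigma^2) in theta = (mu, sigma),
    computed as E[d_i l * d_j l] with l = log p, by a Riemann sum on a wide grid."""
    x = np.linspace(mu - 12 * sigma, mu + 12 * sigma, n)
    dx = x[1] - x[0]
    p = np.exp(-(x - mu) ** 2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
    # Score functions: partial derivatives of log p with respect to mu and sigma.
    d_mu = (x - mu) / sigma**2
    d_sigma = -1.0 / sigma + (x - mu) ** 2 / sigma**3
    scores = np.stack([d_mu, d_sigma])           # shape (2, n)
    # g_ij = sum over the grid of p(x) * score_i(x) * score_j(x) * dx
    return (scores * p) @ scores.T * dx

G = gaussian_fisher(1.0, 2.0)
print(np.round(G, 6))  # close to diag(1/4, 1/2) for sigma = 2
```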
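Affine Connection (covariant derivative)
$$\nabla_X Y$$
Geodesic $X = X(t)$ ("straight line"):
$$\nabla_{\dot X}\dot X = 0$$
Minimal distance (for the Levi-Civita connection, geodesics also minimize length):
$$s = \int\sqrt{g_{ij}\,d\theta^i d\theta^j}$$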
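Duality
Inner product: $\langle X, Y\rangle = g_{ij}X^iY^j$
Riemannian geometry: parallel transport $\Pi$ preserves the inner product, $\langle X, Y\rangle = \langle \Pi X, \Pi Y\rangle$.
Dual affine connections: $\langle X, Y\rangle = \langle \Pi X, \Pi^* Y\rangle$, or equivalently
$$X\langle Y, Z\rangle = \langle\nabla_X Y, Z\rangle + \langle Y, \nabla^*_X Z\rangle$$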
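Dual Affine Connections $(\nabla, \nabla^*)$
Between two distributions $p(x)$ and $q(x)$:
e-geodesic:
$$\log r(x,t) = t\log p(x) + (1-t)\log q(x) - c(t)$$
m-geodesic:
$$r(x,t) = t\,p(x) + (1-t)\,q(x)$$
where $c(t)$ is the normalization constant.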
Divergence $D[z:y]$ between points $z$ and $y$ of a manifold $M$:
• $D[z:y] \ge 0$
• $D[z:y] = 0$ iff $z = y$
• $D[z:z+dz] = \frac12 g_{ij}\,dz^i dz^j$, with $(g_{ij})$ positive-definite
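Metric and Connections Induced by Divergence (Eguchi)
With $\partial_i = \partial/\partial z^i$ and $\partial'_i = \partial/\partial y^i$:
Riemannian metric
$$g_{ij}(z) = -\partial_i\partial'_j D[z:y]\big|_{y=z}$$
Affine connections
$$\Gamma_{ijk}(z) = -\partial_i\partial_j\partial'_k D[z:y]\big|_{y=z},\qquad \Gamma^*_{ijk}(z) = -\partial'_i\partial'_j\partial_k D[z:y]\big|_{y=z}$$
Duality:
$$\partial_k g_{ij} = \Gamma_{kij} + \Gamma^*_{kji},\qquad T_{ijk} = \Gamma_{ijk} - \Gamma^*_{ijk}$$
This gives the structure $\{M, g, T\}$, with
$$X\langle Y,Z\rangle = \langle\nabla_X Y, Z\rangle + \langle Y, \nabla^*_X Z\rangle$$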
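To illustrate Eguchi's construction, here is a small sketch assuming the Bernoulli family and the KL-divergence as the example: the mixed derivative $-\partial\partial' D$ at $y = z$, approximated by central finite differences, should reproduce the Fisher information $1/(z(1-z))$. The helper names are hypothetical.

```python
import math

def kl_bernoulli(z: float, y: float) -> float:
    """KL-divergence D[z:y] between Bernoulli(z) and Bernoulli(y)."""
    return z * math.log(z / y) + (1 - z) * math.log((1 - z) / (1 - y))

def eguchi_metric(D, z: float, h: float = 1e-4) -> float:
    """g(z) = -d/dz d/dy D[z:y] at y = z, by a central finite difference."""
    mixed = (D(z + h, z + h) - D(z + h, z - h)
             - D(z - h, z + h) + D(z - h, z - h)) / (4 * h * h)
    return -mixed

z = 0.3
print(eguchi_metric(kl_bernoulli, z), 1 / (z * (1 - z)))  # both about 4.7619
```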
Two Types of Divergence
• Invariant divergence (Chentsov, Csiszár): f-divergence, giving the Fisher-α structure
• Flat divergence (Bregman)
• The KL-divergence belongs to both classes.
$$D_f[p:q] = \int p(x)\,f\!\left(\frac{q(x)}{p(x)}\right)dx$$
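Invariant divergence (manifold of probability distributions $S = \{p(x,\theta)\}$)
For a one-to-one transformation $y = k(x)$ (more generally, a sufficient statistic),
$$D_X[p_X(x):q_X(x)] = D_Y[p_Y(y):q_Y(y)]$$
Chentsov; Amari-Nagaoka
Csiszár f-divergence (Ali-Silvey, Morimoto):
$$D_f[p:q] = \sum_i p_i\,f\!\left(\frac{q_i}{p_i}\right),\qquad f \text{ convex},\ f(1) = 0$$
$$D_{cf}[p:q] = c\,D_f[p:q]$$
$\tilde f(u) = f(u) - c(u-1)$ gives the same divergence, so we may standardize: $f'(1) = 0$, $f''(1) = 1$.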
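A short sketch of the Csiszár f-divergence on finite distributions, assuming the standardized generator $f(u) = u - 1 - \log u$ (convex, $f(1) = f'(1) = 0$, $f''(1) = 1$), which yields the KL-divergence. It checks invariance under a relabelling of outcomes, invariance under merging outcomes on which $q/p$ is constant (a sufficient statistic), and the gauge freedom $f(u) - c(u-1)$. All numbers are arbitrary.

```python
import math
from typing import Callable, Sequence

def f_divergence(p: Sequence[float], q: Sequence[float],
                 f: Callable[[float], float]) -> float:
    """Csiszar f-divergence D_f[p:q] = sum_i p_i f(q_i / p_i) for strictly positive p, q."""
    return sum(pi * f(qi / pi) for pi, qi in zip(p, q))

def kl_f(u: float) -> float:
    """Standardized generator f(u) = u - 1 - log u; D_f is then KL[p:q]."""
    return u - 1 - math.log(u)

p = [0.2, 0.3, 0.5]
q = [0.4, 0.4, 0.2]
kl = f_divergence(p, q, kl_f)

# Invariance under a one-to-one relabelling y = k(x) (a permutation of outcomes).
perm = [2, 0, 1]
kl_perm = f_divergence([p[i] for i in perm], [q[i] for i in perm], kl_f)

# Invariance under a sufficient statistic: merge outcomes on which q/p is constant.
p2, q2 = [0.1, 0.1, 0.8], [0.2, 0.2, 0.6]    # q/p = 2 on the first two outcomes
merged = f_divergence([0.2, 0.8], [0.4, 0.6], kl_f)

# Gauge freedom: f(u) - c(u - 1) gives the same value on probability vectors.
kl_gauge = f_divergence(p, q, lambda u: kl_f(u) - 3.0 * (u - 1))
print(kl, kl_perm, kl_gauge)                 # equal
print(f_divergence(p2, q2, kl_f), merged)    # equal
```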
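Invariant geometrical structure: alpha-geometry (derived from invariant divergence)
For $S = \{p(x,\theta)\}$, with $l = \log p(x,\theta)$ and $\partial_i = \partial/\partial\theta^i$:
$$g_{ij} = E[\partial_i l\,\partial_j l]\quad\text{(Fisher information)},\qquad T_{ijk} = E[\partial_i l\,\partial_j l\,\partial_k l]$$
α-connection:
$$\Gamma^{(\alpha)}_{ijk} = \{i,j;k\} - \frac{\alpha}{2}\,T_{ijk}$$
where $\{i,j;k\}$ is the Levi-Civita connection ($\alpha = 0$).
$\nabla^{(\alpha)}$ and $\nabla^{(-\alpha)}$ are dually coupled:
$$X\langle Y,Z\rangle = \langle\nabla^{(\alpha)}_X Y, Z\rangle + \langle Y, \nabla^{(-\alpha)}_X Z\rangle$$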
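$\alpha = \pm 1$: Dually Flat Structure
• affine coordinates $\theta = (\theta^1,\theta^2,\dots,\theta^n)$
• dual affine coordinates $\eta = (\eta_1,\eta_2,\dots,\eta_n)$
• potentials $\psi(\theta)$, $\varphi(\eta)$
• canonical divergence $D[\theta:\theta']$
Example: $S_n$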
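Dually flat manifold: Manifold with Convex Function
$S$ with coordinates $\theta = (\theta^1,\theta^2,\dots,\theta^n)$ and a convex function $\psi(\theta)$. Examples:
• negative entropy $\varphi(p) = \int p(x)\log p(x)\,dx$
• energy
• Euclidean: $\psi(\theta) = \frac12\sum_i(\theta^i)^2$
Used in mathematical programming, control systems, physics, engineering, economics.
Riemannian metric and flatness from the Bregman divergence:
$$D[\theta:\theta'] = \psi(\theta) - \psi(\theta') - \operatorname{grad}\psi(\theta')\cdot(\theta - \theta')$$
$$D[\theta:\theta + d\theta] = \frac12 g_{ij}\,d\theta^i d\theta^j,\qquad g_{ij} = \partial_i\partial_j\psi(\theta)$$
$\theta$: geodesic coordinates (flat affine connection, not Levi-Civita). Structure: $\{S, \psi(\theta), \theta\}$.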
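A minimal sketch of the Bregman divergence for two convex potentials chosen for illustration: the Euclidean potential gives half the squared distance, and $\psi(p) = \sum_i(p_i\log p_i - p_i)$ on positive measures gives the generalized KL-divergence.

```python
import numpy as np

def bregman(psi, grad_psi, theta, theta_p):
    """D[theta : theta'] = psi(theta) - psi(theta') - grad psi(theta') . (theta - theta')."""
    return psi(theta) - psi(theta_p) - grad_psi(theta_p) @ (theta - theta_p)

# Euclidean potential psi = |theta|^2 / 2 (self-dual case).
half_sq, half_sq_grad = (lambda t: 0.5 * t @ t), (lambda t: t)

# psi(p) = sum_i (p_i log p_i - p_i) on positive measures; grad psi = log p.
neg_ent = lambda t: np.sum(t * np.log(t) - t)
neg_ent_grad = np.log

a = np.array([0.5, 1.0, 2.0])
b = np.array([1.0, 1.5, 0.5])
print(bregman(half_sq, half_sq_grad, a, b), 0.5 * np.sum((a - b) ** 2))
print(bregman(neg_ent, neg_ent_grad, a, b), np.sum(a * np.log(a / b) - a + b))
```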
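Legendre Transformation
$$\eta_i = \partial_i\psi(\theta)\quad\text{(one-to-one)},\qquad \theta^i = \partial^i\varphi(\eta),\quad \partial^i = \partial/\partial\eta_i$$
$$\psi(\theta) + \varphi(\eta) - \theta^i\eta_i = 0$$
$$\varphi(\eta) = \max_\theta\{\theta^i\eta_i - \psi(\theta)\}$$
$$D[\theta:\theta'] = \psi(\theta) + \varphi(\eta') - \theta^i\eta'_i$$
Two flat coordinate systems $\theta$, $\eta$:
• $\theta$: geodesic (e-geodesic)
• $\eta$: dual geodesic (m-geodesic)
• "dually orthogonal": $\langle\partial_i, \partial^j\rangle = \delta_i^j$
$$X\langle Y,Z\rangle = \langle\nabla_X Y, Z\rangle + \langle Y, \nabla^*_X Z\rangle$$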
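A numerical check of the Legendre transformation, assuming the Bernoulli potential $\psi(\theta) = \log(1 + e^\theta)$ as the example: maximizing $\theta\eta - \psi(\theta)$ by ternary search (the objective is concave) should give the negative entropy $\varphi(\eta) = \eta\log\eta + (1-\eta)\log(1-\eta)$, and $\psi(\theta) + \varphi(\eta) - \theta\eta = 0$ when $\eta = \psi'(\theta)$.

```python
import math

def psi(theta: float) -> float:
    """Bernoulli potential in natural coordinates: log(1 + e^theta)."""
    return math.log1p(math.exp(theta))

def legendre_dual(eta: float, lo: float = -30.0, hi: float = 30.0, iters: int = 200) -> float:
    """phi(eta) = max_theta {theta * eta - psi(theta)}, by ternary search on a concave objective."""
    obj = lambda t: t * eta - psi(t)
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if obj(m1) < obj(m2):
            lo = m1
        else:
            hi = m2
    return obj((lo + hi) / 2)

eta = 0.3
theta = math.log(eta / (1 - eta))      # inverts eta = d psi / d theta (the sigmoid)
phi_closed = eta * math.log(eta) + (1 - eta) * math.log(1 - eta)
print(legendre_dual(eta), phi_closed)            # should agree
print(psi(theta) + phi_closed - theta * eta)     # Legendre identity: about 0
```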
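Geometry
Riemannian metric $G = (g_{ij})$:
$$D[\theta:\theta + d\theta] = \frac12\,d\theta^T G\,d\theta$$
$$G = \nabla\nabla\psi(\theta),\qquad G^* = \nabla\nabla\varphi(\eta) = G^{-1}$$
Straightness (affine connection):
$$\theta(t) = t\,a + b:\ \nabla\text{-geodesic}$$
$$\eta(t) = t\,a + b:\ \nabla^*\text{-geodesic}$$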
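Pythagorean Theorem (dually flat manifold)
When the m-geodesic connecting $P$ and $Q$ is orthogonal to the e-geodesic connecting $Q$ and $R$,
$$D[P:Q] + D[Q:R] = D[P:R]$$
Euclidean space is self-dual: $\psi(\theta) = \frac12\sum_i(\theta^i)^2$.
Projection Theorem
• $\min_{Q\in M} D[P:Q]$: $Q$ is the m-geodesic projection of $P$ to $M$
• $\min_{Q\in M} D[Q:P]$: $Q'$ is the e-geodesic projection of $P$ to $M$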
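A sketch of the Pythagorean relation for the KL-divergence, using a concrete example that is not in the slides: $M$ is the e-flat family of independent distributions on a $2\times 2$ table, and the m-projection of $P$ onto $M$ is the product $Q$ of its marginals. For any $R \in M$ the identity $D[P:R] = D[P:Q] + D[Q:R]$ should hold exactly.

```python
import numpy as np

def kl(p: np.ndarray, q: np.ndarray) -> float:
    """KL-divergence D[p:q] = sum p log(p/q) for strictly positive arrays."""
    return float(np.sum(p * np.log(p / q)))

P = np.array([[0.30, 0.10],
              [0.15, 0.45]])          # a joint distribution of (x, y), not independent
# M = {p(x) p(y)}: independent distributions, an e-flat submanifold.
# The m-projection of P onto M is the product of its marginals.
Q = np.outer(P.sum(axis=1), P.sum(axis=0))
R = np.outer([0.7, 0.3], [0.2, 0.8])  # an arbitrary point of M

lhs = kl(P, R)
rhs = kl(P, Q) + kl(Q, R)
print(lhs, rhs)  # equal: D[P:R] = D[P:Q] + D[Q:R]
```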
dually flat space: convex functions, Bregman divergence (flat divergence)
invariance: invariant divergence (f-divergence), Fisher information metric, alpha connection
KL-divergence: belongs to both
$S = \{p\}$: space of probability distributions
$$D[p:q] = \int p(x)\log\frac{p(x)}{q(x)}\,dx$$
$\tilde S = \{p \mid p_i > 0\}$: the constraint $\sum_i p_i = 1$ does not hold
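Space of positive measures: vectors, matrices, arrays
On $\tilde S$, the α-divergence belongs to both the f-divergences and the Bregman divergences.
α-divergence:
$$D_\alpha[p:q] = \frac{4}{1-\alpha^2}\sum_i\left\{\frac{1-\alpha}{2}\,p_i + \frac{1+\alpha}{2}\,q_i - p_i^{\frac{1-\alpha}{2}}q_i^{\frac{1+\alpha}{2}}\right\}$$
KL-divergence (the limit $\alpha \to -1$):
$$D[p:q] = \sum_i\left\{p_i\log\frac{p_i}{q_i} - p_i + q_i\right\}$$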
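A small sketch of the α-divergence on positive measures (no normalization needed), checking two special cases numerically: $\alpha \to -1$ approaches the generalized KL-divergence, and $\alpha = 0$ gives $2\sum_i(\sqrt{p_i} - \sqrt{q_i})^2$. The test vectors are arbitrary.

```python
import numpy as np

def alpha_div(p, q, alpha: float) -> float:
    """alpha-divergence on positive measures (alpha != +-1)."""
    a, b = (1 - alpha) / 2, (1 + alpha) / 2
    return 4 / (1 - alpha**2) * np.sum(a * p + b * q - p**a * q**b)

def gen_kl(p, q) -> float:
    """Generalized KL-divergence on positive measures."""
    return np.sum(p * np.log(p / q) - p + q)

p = np.array([0.5, 1.2, 2.0])   # positive measures, not normalized
q = np.array([1.0, 0.8, 1.5])
print(alpha_div(p, q, -0.999999), gen_kl(p, q))                           # close
print(alpha_div(p, q, 0.0), 2 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))   # equal
```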
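α-representation (Amari-Nagaoka, Zhang)
$$r_i = r_\alpha(p_i) = \begin{cases}\frac{2}{1-\alpha}\,p_i^{\frac{1-\alpha}{2}}, & \alpha \ne 1\\ \log p_i, & \alpha = 1\end{cases}$$
Typical case (u-representation): the convex function
$$U_\alpha(z) = \begin{cases}\frac{2}{1+\alpha}\left(\frac{1-\alpha}{2}\,z\right)^{\frac{2}{1-\alpha}}, & \alpha \ne 1\\ e^z, & \alpha = 1\end{cases}$$
for which $U_\alpha(r_i) = \frac{2}{1+\alpha}\,p_i$.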
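Divergence over α-representation
$$D[p:q] = \sum_i\left\{U(r_i(p)) + V(r^*_i(q)) - r_i(p)\,r^*_i(q)\right\}$$
With $U = U_\alpha$, the Legendre dual is $V = U_{-\alpha}$ and $r^*_i(q) = U'_\alpha(r_i(q)) = r_{-\alpha}(q_i)$; this reproduces the α-divergence $D_\alpha[p:q]$.
α-geodesic:
$$r_\alpha(p_i(t)) = t\,r_\alpha(p_i) + (1-t)\,r_\alpha(q_i)$$
(−α)-geodesic:
$$r_{-\alpha}(p_i(t)) = t\,r_{-\alpha}(p_i) + (1-t)\,r_{-\alpha}(q_i)$$
For $\alpha = 1$ the α-geodesic is $\log p_i(t) = t\log p_i + (1-t)\log q_i$.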
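The statement that the divergence over the α-representation reproduces $D_\alpha$ can be checked numerically. This sketch assumes $U = U_\alpha$, $V = U_{-\alpha}$ and $r^* = r_{-\alpha}$ as above, restricted to $\alpha \ne \pm 1$; the helper names are hypothetical.

```python
import numpy as np

def r_alpha(p, alpha):
    """alpha-representation 2/(1-alpha) * p^((1-alpha)/2), for alpha != 1."""
    return 2 / (1 - alpha) * p ** ((1 - alpha) / 2)

def U_alpha(z, alpha):
    """U_alpha(z) = 2/(1+alpha) * ((1-alpha)/2 * z)^(2/(1-alpha)), for alpha != +-1."""
    return 2 / (1 + alpha) * ((1 - alpha) / 2 * z) ** (2 / (1 - alpha))

def div_over_rep(p, q, alpha):
    """sum_i U(r_i(p)) + V(r*_i(q)) - r_i(p) r*_i(q), with V = U_{-alpha}, r* = r_{-alpha}."""
    r, r_star = r_alpha(p, alpha), r_alpha(q, -alpha)
    return np.sum(U_alpha(r, alpha) + U_alpha(r_star, -alpha) - r * r_star)

def alpha_div(p, q, alpha):
    """alpha-divergence on positive measures, as written above."""
    a, b = (1 - alpha) / 2, (1 + alpha) / 2
    return 4 / (1 - alpha**2) * np.sum(a * p + b * q - p**a * q**b)

p = np.array([0.5, 1.2, 2.0])
q = np.array([1.0, 0.8, 1.5])
for alpha in (-0.5, 0.3, 0.5):
    print(alpha, div_over_rep(p, q, alpha), alpha_div(p, q, alpha))  # equal pairs
```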
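β-divergence (Eguchi)
$$U_\beta(z) = \frac{1}{\beta+1}(1 + \beta z)^{\frac{\beta+1}{\beta}},\quad \beta > 0;\qquad U_0(z) = \exp z$$
$$D_\beta[p:q] = \frac{1}{\beta(\beta+1)}\int\left\{p(x)^{\beta+1} - q(x)^{\beta+1} - (\beta+1)\,q(x)^\beta\,(p(x) - q(x))\right\}dx$$
$D_0[p:q] = KL[p:q]$ (the limit $\beta \to 0$).
The β-divergence is dually flat (the β-structure); the α-divergence and β-divergence families share the KL-divergence.
(α, β)-divergence:
$$D_{\alpha,\beta}[p:q] = -\frac{1}{\alpha\beta}\sum_i\left\{p_i^\alpha q_i^\beta - \frac{\alpha}{\alpha+\beta}\,p_i^{\alpha+\beta} - \frac{\beta}{\alpha+\beta}\,q_i^{\alpha+\beta}\right\}$$
• $\alpha = 1$: β-divergence
• $\alpha + \beta = 1$: α-divergence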
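A sketch of the β-divergence in its discrete form, checking that $\beta \to 0$ approaches the (generalized) KL-divergence and that the (α, β)-divergence with $\alpha = 1$ coincides with the β-divergence. Function names and test vectors are illustrative.

```python
import numpy as np

def beta_div(p, q, beta):
    """beta-divergence (discrete form) for beta > 0."""
    return np.sum(p ** (beta + 1) - q ** (beta + 1)
                  - (beta + 1) * q**beta * (p - q)) / (beta * (beta + 1))

def ab_div(p, q, a, b):
    """(alpha, beta)-divergence with alpha = a, beta = b (a, b, a + b nonzero)."""
    return -np.sum(p**a * q**b - a / (a + b) * p ** (a + b)
                   - b / (a + b) * q ** (a + b)) / (a * b)

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.4, 0.4, 0.2])
kl = np.sum(p * np.log(p / q) - p + q)
print(beta_div(p, q, 1e-6), kl)                      # beta -> 0 approaches KL[p:q]
print(beta_div(p, q, 0.5), ab_div(p, q, 1.0, 0.5))   # alpha = 1 gives the beta-divergence
```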
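Tsallis q-Entropy
$$\ln_q u = \begin{cases}\frac{1}{1-q}\left(u^{1-q} - 1\right), & q \ne 1\\ \log u, & q = 1\end{cases}\qquad \exp_q u = \{1 + (1-q)\,u\}^{\frac{1}{1-q}}$$
$$H_q^T[p] = E_p\left[\ln_q\frac{1}{p(x)}\right] = \frac{1}{1-q}\left(\int p(x)^q\,dx - 1\right)$$
Shannon entropy ($q = 1$):
$$H[p] = E\left[\log\frac{1}{p(x)}\right] = -\int p(x)\log p(x)\,dx$$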
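Generalized log: α-structure with $\alpha = 2q - 1$, i.e. $q = \frac{1+\alpha}{2}$
Tsallis and Rényi entropies, with $h_q(p) = \int p(x)^q\,dx$:
$$H_q^T = \frac{1}{1-q}\left(h_q - 1\right),\qquad H_q^R = \frac{1}{1-q}\log h_q$$
Both are of the form $f(h_q(p))$.
Normalized form, with the escort distribution $\hat p(x) = p(x)^q/h_q(p)$:
$$\frac{1}{h_q(p)}\,H_q^T = \frac{1}{1-q}\left(1 - \frac{1}{h_q(p)}\right) = -E_{\hat p}[\ln_q p(x)]$$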
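A minimal sketch of the deformed logarithm and the Tsallis and Rényi entropies for a discrete distribution. It checks $H_q^T = E_p[\ln_q(1/p)]$, the normalized form with the escort distribution, that $\exp_q$ inverts $\ln_q$, and the Shannon limit $q \to 1$. The value $q = 1.5$ and the distribution are arbitrary choices.

```python
import numpy as np

def ln_q(u, q):
    """Deformed logarithm: (u^(1-q) - 1)/(1-q) for q != 1, log u for q = 1."""
    return np.log(u) if q == 1 else (u ** (1 - q) - 1) / (1 - q)

def exp_q(u, q):
    """Deformed exponential (1 + (1-q) u)^(1/(1-q)); valid where 1 + (1-q) u > 0."""
    return np.exp(u) if q == 1 else (1 + (1 - q) * u) ** (1 / (1 - q))

def entropies(p, q):
    """Tsallis and Renyi entropies of a discrete p (q != 1), both functions of h_q."""
    h = np.sum(p**q)
    return (h - 1) / (1 - q), np.log(h) / (1 - q), h

p = np.array([0.1, 0.2, 0.3, 0.4])
q = 1.5
tsallis, renyi, h = entropies(p, q)
escort = p**q / h
check_def = np.isclose(tsallis, np.sum(p * ln_q(1 / p, q)))         # H^T = E_p[ln_q(1/p)]
check_norm = np.isclose(tsallis / h, -np.sum(escort * ln_q(p, q)))  # normalized form
check_inv = np.allclose(exp_q(ln_q(p, q), q), p)                    # exp_q inverts ln_q
shannon = -np.sum(p * np.log(p))
tsallis_near_1 = entropies(p, 1 + 1e-7)[0]                          # q -> 1: Shannon
print(check_def, check_norm, check_inv, tsallis_near_1, shannon)
```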
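q-divergence
$$D_q[p:r] = -\frac{1}{h_q(p)}\,E_p\left[\ln_q\frac{r(x)}{p(x)}\right] = \frac{1}{(1-q)\,h_q(p)}\left(1 - \int p(x)^q\,r(x)^{1-q}\,dx\right)$$
Without the factor $1/h_q(p)$, this is (up to a constant factor) an α-divergence, which gives the α-structure.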
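A discrete sketch comparing the two expressions of the q-divergence, and its $q \to 1$ limit (the KL-divergence). The distributions and $q = 0.7$ are arbitrary choices.

```python
import numpy as np

def ln_q(u, q):
    """Deformed logarithm (u^(1-q) - 1) / (1 - q), q != 1."""
    return (u ** (1 - q) - 1) / (1 - q)

def q_div(p, r, q):
    """D_q[p:r] = (1 - sum_i p_i^q r_i^(1-q)) / ((1 - q) h_q(p)), q != 1."""
    return (1 - np.sum(p**q * r ** (1 - q))) / ((1 - q) * np.sum(p**q))

p = np.array([0.1, 0.6, 0.3])
r = np.array([0.3, 0.3, 0.4])
q = 0.7
h = np.sum(p**q)
first_form = -np.sum(p * ln_q(r / p, q)) / h     # -(1/h_q) E_p[ln_q(r/p)]
print(first_form, q_div(p, r, q))                # the two expressions agree
kl = np.sum(p * np.log(p / r))
print(q_div(p, r, 1 + 1e-7), kl)                 # q -> 1 approaches KL[p:r]
```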
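q-exponential family (cf. Pistone: κ-exponential)
$$p(x,\theta) = \exp_q\{\theta\cdot x - \psi_q(\theta)\},\qquad \psi_q(\theta) \text{ convex}$$
Escort distribution and dual coordinates:
$$\hat p_q(x) = \frac{p(x)^q}{h_q(p)},\qquad \eta = E_{\hat p_q}[x]$$
Discrete distributions $S_n = \{p\}$, $p(x) = \sum_{i=0}^n p_i\,\delta_i(x)$:
$$\ln_q p(x) = \sum_{i=1}^n\theta^i\,\delta_i(x) - \psi_q(\theta),\qquad \theta^i = \ln_q p_i - \ln_q p_0,\qquad \psi_q(\theta) = -\ln_q p_0$$
$$\eta_i = \hat p_i = \frac{p_i^q}{h_q(p)}$$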
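A numerical sketch of the discrete q-exponential coordinates, assuming $q = 0.5$ and Tsallis' cut-off $\exp_q(u) = 0$ where $1 + (1-q)u \le 0$. It solves the normalization for $\psi_q(\theta)$ by bisection, checks $\psi_q(\theta) = -\ln_q p_0$, and checks that the gradient $\partial\psi_q/\partial\theta^i$ equals the escort probabilities $\hat p_i$, i.e. the dual coordinates $\eta_i$. The helper names are hypothetical.

```python
import numpy as np

Q = 0.5  # deformation parameter q (an arbitrary choice, q < 1)

def ln_q(u, q=Q):
    return (u ** (1 - q) - 1) / (1 - q)

def exp_q(u, q=Q):
    # Tsallis cut-off: exp_q(u) = 0 where 1 + (1 - q) u <= 0 (needed since q < 1).
    return np.maximum(1 + (1 - q) * u, 0.0) ** (1 / (1 - q))

def psi(theta, lo=-50.0, hi=50.0, iters=200):
    """psi(theta) solving sum_{i=0}^n exp_q(theta^i - psi) = 1 with theta^0 = 0, by bisection."""
    t = np.concatenate([[0.0], theta])
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.sum(exp_q(t - mid)) > 1.0:   # the sum decreases as psi grows
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

p = np.array([0.2, 0.5, 0.3])              # (p_0, p_1, p_2)
theta = ln_q(p[1:]) - ln_q(p[0])           # theta^i = ln_q p_i - ln_q p_0
print(psi(theta), -ln_q(p[0]))             # psi(theta) = -ln_q p_0

# eta_i = d psi / d theta^i should equal the escort probabilities p_i^q / h_q(p).
eps = 1e-6
grad = np.array([(psi(theta + eps * e) - psi(theta - eps * e)) / (2 * eps) for e in np.eye(2)])
escort = p**Q / np.sum(p**Q)
print(np.round(grad, 6), np.round(escort[1:], 6))
```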
q-Geometry derived from $\psi_q(\theta)$: dually flat
Potentials $\psi_q(\theta)$ and $\varphi_q(\eta)$; on $S_n$,
$$\ln_q p(x) = \sum_{i=1}^n\left(\ln_q p_i - \ln_q p_0\right)\delta_i(x) + \ln_q p_0$$
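Dually flat structure of q-escort
$$D_q[p:r] = \int\hat p(x)\left\{\ln_q p(x) - \ln_q r(x)\right\}dx = \frac{1}{(1-q)\,h_q(p)}\left(1 - \int p(x)^q\,r(x)^{1-q}\,dx\right)$$
$$g^{\mathrm{esc}}_{ij} = E_{\hat p}\left[\partial_i\log\hat p\;\partial_j\ln_q p\right]$$
geodesic (q-exponential family):
$$\ln_q p(x,t) = t\ln_q p_1(x) + (1-t)\ln_q p_2(x) - c(t)$$
dual geodesic (q-family):
$$\hat p(x,t) = t\,\hat p_1(x) + (1-t)\,\hat p_2(x)$$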
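A quick discrete check that the escort form $\int\hat p\,(\ln_q p - \ln_q r)$ equals the closed form of $D_q[p:r]$ and is nonnegative. The distributions and $q = 1.8$ are arbitrary choices.

```python
import numpy as np

def ln_q(u, q):
    """Deformed logarithm (u^(1-q) - 1) / (1 - q), q != 1."""
    return (u ** (1 - q) - 1) / (1 - q)

def escort(p, q):
    """q-escort distribution p^q / h_q(p)."""
    return p**q / np.sum(p**q)

def q_div(p, r, q):
    """Closed form (1 - sum p^q r^(1-q)) / ((1 - q) h_q(p))."""
    return (1 - np.sum(p**q * r ** (1 - q))) / ((1 - q) * np.sum(p**q))

p = np.array([0.25, 0.25, 0.5])
r = np.array([0.6, 0.1, 0.3])
q = 1.8
esc_form = np.sum(escort(p, q) * (ln_q(p, q) - ln_q(r, q)))  # E_phat[ln_q p - ln_q r]
print(esc_form, q_div(p, r, q))   # the two forms coincide
```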
q-escort probability distribution
$$\hat p_q(x) = \frac{1}{h_q(p)}\,p(x)^q$$
Escort geometry (q-escort geometry)
$$\hat g_{ij}(\theta) = E_{\hat p}\left[\partial_i\log\hat p(x,\theta)\;\partial_j\ln_q p(x,\theta)\right]$$
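Pythagorean theorem (q)
When the dual q-geodesic connecting $p_1$ and $p_2$ is orthogonal to the q-geodesic connecting $p_2$ and $p_3$,
$$D_q[p_1:p_2] + D_q[p_2:p_3] = D_q[p_1:p_3]$$
q-geodesic:
$$\ln_q p(x,t) = t\ln_q p_1(x) + (1-t)\ln_q p_2(x) - c(t)$$
dual q-geodesic:
$$\hat p(x,t) = t\,\hat p_1(x) + (1-t)\,\hat p_2(x)$$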
Projection theorem
$$\hat r = \arg\min_{r\in M} D_q[p:r]$$
Max-entropy theorem: maximize the q-entropy $H_q(p)$ under the constraints $E_{\hat p_q}[a_k(x)] = c_k$.
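q-Cramér-Rao theorem: for $p(x,\theta)$ and a $\hat p$-unbiased estimator $\hat\theta$, i.e. $E_{\hat p}[\hat\theta] = \theta$,
$$E_{\hat p}\left[(\hat\theta^i - \theta^i)(\hat\theta^j - \theta^j)\right] \ge \hat G^{ij}$$
where $(\hat G^{ij})$ is the inverse of the Fisher information of the escort family, $\hat G_{ij} = E_{\hat p}[\partial_i\log\hat p\;\partial_j\log\hat p]$.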
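q-maximum likelihood estimator
$$\hat\theta_q = \arg\max_\theta\hat p_q(x_1,\dots,x_N;\theta) = \arg\max_\theta\frac{1}{h_q(\theta)^{N/q}}\prod_{i=1}^N p(x_i;\theta)$$
This is Bayes MAP estimation with the prior $\pi(\theta) \propto h_q(\theta)^{-N/q}$.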
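A sketch of the q-maximum likelihood estimator for an assumed example: a synthetic Gaussian sample with known mean $0$ and unknown $\sigma$. For $N(0,\sigma^2)$, $h_q(\sigma) = (2\pi\sigma^2)^{(1-q)/2}q^{-1/2}$, and setting the derivative of the log objective to zero gives $\hat\sigma_q^2 = \frac{q}{N}\sum_i x_i^2$, i.e. $q$ times the ordinary MLE of $\sigma^2$. The grid search below should agree with that closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=2.0, size=500)   # synthetic sample; mean 0 treated as known
q = 0.8
n, s2 = x.size, np.sum(x**2)

def log_objective(sigma):
    """log[ prod_i p(x_i; 0, sigma) / h_q(sigma)^(N/q) ], vectorized in sigma."""
    loglik = -0.5 * n * np.log(2 * np.pi * sigma**2) - s2 / (2 * sigma**2)
    log_h = 0.5 * (1 - q) * np.log(2 * np.pi * sigma**2) - 0.5 * np.log(q)  # h_q = int p^q dx
    return loglik - n / q * log_h

sigmas = np.linspace(0.5, 5.0, 450_001)
sigma_grid = sigmas[np.argmax(log_objective(sigmas))]
sigma_closed = np.sqrt(q * s2 / n)   # maximizer from setting the derivative to zero
print(sigma_grid, sigma_closed)      # agree to the grid resolution (1e-5)
```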
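q-super-robust estimator (Eguchi)
$$\hat\theta = \arg\max_\theta\frac{1}{q-1}\,\frac{1}{h_q(\theta)^{(q-1)/q}}\sum_{i=1}^N p(x_i,\theta)^{q-1}$$
Bias-corrected q-estimating function:
$$s_q(x,\theta) = p(x,\theta)^{q-1}\left\{\partial_\theta\log p(x,\theta) - c_q(\theta)\right\},\qquad c_q(\theta) = \frac{1}{q}\,\partial_\theta\log h_q(\theta)$$
so that $E_p[s_q(x,\theta)] = 0$. The estimator solves
$$\sum_{i=1}^N s_q(x_i,\hat\theta) = 0$$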
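A sketch of the estimating equation in the simplest assumed setting: the Gaussian location model $N(\mu,\sigma^2)$ with $\sigma$ known. There $h_q$ does not depend on $\mu$, so $c_q = 0$, and $\sum_i s_q(x_i,\mu) = 0$ becomes a weighted mean with weights $p(x_i;\mu)^{q-1}$, solved here by fixed-point iteration. The data are synthetic, $q = 1.5$ is arbitrary, and `q_robust_location` is a hypothetical name.

```python
import numpy as np

def q_robust_location(x, sigma=1.0, q=1.5, iters=100):
    """Solve sum_i s_q(x_i, mu) = 0 for N(mu, sigma^2) with sigma known.
    Here c_q = 0 and s_q = p^(q-1) (x - mu) / sigma^2, so the root is a
    weighted mean with weights p(x_i; mu)^(q-1)."""
    mu = np.median(x)                                          # robust starting point
    for _ in range(iters):
        w = np.exp(-(q - 1) * (x - mu) ** 2 / (2 * sigma**2))  # p^(q-1) up to a constant
        mu = np.sum(w * x) / np.sum(w)
    return mu

rng = np.random.default_rng(1)
clean = rng.normal(0.0, 1.0, size=200)
data = np.concatenate([clean, np.full(20, 50.0)])  # 20 gross outliers at x = 50
print(np.mean(data), q_robust_location(data))      # mean pulled toward 50; estimate stays near 0
```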
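Conformal change of divergence
$$\tilde D[p:q] = \sigma(p)\,D[p:q]$$
$$\tilde g_{ij} = \sigma(p)\,g_{ij}$$
$$\tilde T_{ijk} = \sigma(p)\left(T_{ijk} + s_k g_{ij} + s_j g_{ik} + s_i g_{jk}\right),\qquad s_i = \partial_i\log\sigma(p)$$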
q-Fisher information: a conformal transformation of the Fisher information $g^F_{ij}$
$$g^{(q)}_{ij}(p) = \frac{q}{h_q(p)}\,g^F_{ij}(p)$$
q-divergence
$$D_q[p(x):r(x)] = \frac{1}{(1-q)\,h_q(p)}\left(1 - \int p(x)^q\,r(x)^{1-q}\,dx\right)$$
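A numerical check of the conformal relation on $S_2$, assuming the coordinates $(p_1, p_2)$ with $p_0 = 1 - p_1 - p_2$, in which the Fisher metric is $g^F_{ij} = \delta_{ij}/p_i + 1/p_0$. The metric induced by $D_q$ through $g_{ij} = -\partial_i\partial'_j D_q$ (finite differences) should equal $\frac{q}{h_q(p)}g^F_{ij}$. The value $q = 0.6$ and the point are arbitrary.

```python
import numpy as np

q = 0.6

def full(v):
    """Map coordinates (p_1, p_2) to the probability vector (p_0, p_1, p_2)."""
    return np.array([1.0 - v.sum(), *v])

def q_div(v, w):
    """D_q[p:r] with p = full(v), r = full(w)."""
    p, r = full(v), full(w)
    return (1 - np.sum(p**q * r ** (1 - q))) / ((1 - q) * np.sum(p**q))

def eguchi_metric(D, z, h=1e-4):
    """g_ij = -d_i d'_j D[z:y] at y = z, via central finite differences."""
    n = z.size
    g = np.empty((n, n))
    E = np.eye(n) * h
    for i in range(n):
        for j in range(n):
            g[i, j] = -(D(z + E[i], z + E[j]) - D(z + E[i], z - E[j])
                        - D(z - E[i], z + E[j]) + D(z - E[i], z - E[j])) / (4 * h * h)
    return g

z = np.array([0.3, 0.5])                    # p = (0.2, 0.3, 0.5)
p = full(z)
fisher = np.diag(1 / p[1:]) + 1 / p[0]      # Fisher metric of S_2 in (p_1, p_2)
print(np.round(eguchi_metric(q_div, z), 4))
print(np.round(q / np.sum(p**q) * fisher, 4))   # conformal: (q / h_q) g^F
```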
Total Bregman divergence (Vemuri)
$$\mathrm{TBD}(p:q) = \frac{\varphi(p) - \varphi(q) - \nabla\varphi(q)\cdot(p - q)}{\sqrt{1 + |\nabla\varphi(q)|^2}}$$
Total Bregman Divergence and its Applications to Shape Retrieval
• Baba C. Vemuri, Meizhu Liu, Shun-ichi Amari, Frank Nielsen
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010
Total Bregman Divergence
$$TD_f[x:y] = \frac{D_f[x:y]}{\sqrt{1 + |\nabla f(y)|^2}}$$
• rotational invariance
• conformal geometry
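One way to see the rotational invariance: $TD_f[x:y]$ equals the Euclidean distance from the point $(x, f(x))$ to the hyperplane tangent to the graph of $f$ at $y$. The sketch below checks this for an arbitrarily chosen convex generator $f(x) = \sum_k e^{x_k}$.

```python
import numpy as np

def f(x):
    """A convex generator (an illustrative choice)."""
    return np.sum(np.exp(x))

def grad_f(x):
    return np.exp(x)

def tbd(x, y):
    """Total Bregman divergence TD_f[x:y] = D_f[x:y] / sqrt(1 + |grad f(y)|^2)."""
    g = grad_f(y)
    return (f(x) - f(y) - g @ (x - y)) / np.sqrt(1 + g @ g)

def dist_to_tangent_plane(x, y):
    """Distance in R^(n+1) from (x, f(x)) to the tangent hyperplane of the graph of f at y."""
    normal = np.append(grad_f(y), -1.0)    # normal of {(u, z): z = f(y) + grad f(y).(u - y)}
    offset = np.append(x, f(x)) - np.append(y, f(y))
    return abs(offset @ normal) / np.linalg.norm(normal)

x, y = np.array([0.2, -1.0]), np.array([1.0, 0.5])
print(tbd(x, y), dist_to_tangent_plane(x, y))   # equal
```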
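TBD examples
Clustering: t-center
For a set $E = \{x_1,\dots,x_m\}$, the t-center of $E$ is
$$\bar x = \arg\min_x\sum_i TD_f[x:x_i]$$
t-center (closed form):
$$\bar x = (\nabla f)^{-1}\left(\frac{\sum_i w_i\,\nabla f(x_i)}{\sum_i w_i}\right),\qquad w_i = \frac{1}{\sqrt{1 + |\nabla f(x_i)|^2}}$$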
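A sketch of the closed-form t-center for the assumed generator $f(x) = \sum_k e^{x_k}$, so that $\nabla f = \exp$ and $(\nabla f)^{-1} = \log$ componentwise. Because the normalizer uses $\nabla f(x_i)$, the weights $w_i$ do not depend on $x$ and the objective is a weighted sum of Bregman divergences; the check confirms that small moves away from the closed form do not decrease it.

```python
import numpy as np

def f(x):
    """Generator f(x) = sum_k exp(x_k) (illustrative choice); row-wise on 2-D input."""
    return np.sum(np.exp(x), axis=-1)

grad_f, grad_f_inv = np.exp, np.log   # grad f = exp and its inverse, componentwise

def weights(pts):
    """w_i = 1 / sqrt(1 + |grad f(x_i)|^2)."""
    return 1 / np.sqrt(1 + np.sum(grad_f(pts) ** 2, axis=1))

def total_tbd(x, pts):
    """sum_i TD_f[x : x_i] = sum_i w_i D_f[x : x_i]."""
    g = grad_f(pts)
    breg = f(x) - f(pts) - np.sum(g * (x - pts), axis=1)
    return np.sum(weights(pts) * breg)

def t_center(pts):
    """Closed form: (grad f)^(-1)( sum_i w_i grad f(x_i) / sum_i w_i )."""
    w = weights(pts)
    return grad_f_inv(w @ grad_f(pts) / w.sum())

rng = np.random.default_rng(2)
pts = rng.normal(size=(10, 2))
c = t_center(pts)
moves = 1e-3 * rng.normal(size=(100, 2))
ok = all(total_tbd(c + d, pts) >= total_tbd(c, pts) for d in moves)
print(ok)  # True: small moves away from the closed form do not decrease the objective
```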
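t-center is robust
Add an outlier $y$ to $E = \{x_1,\dots,x_n\}$, giving $E' = \{x_1,\dots,x_n;\,y\}$. To first order, the t-center moves to
$$\bar x' = \bar x + \frac{1}{n}\,z(\bar x;y)$$
where $z(\bar x;y)$ is the influence function. The t-center is robust when $|z(\bar x;y)| < c$ as $|y| \to \infty$.
$$z(\bar x;y) = w(y)\,G^{-1}\left\{\nabla f(y) - \nabla f(\bar x)\right\},\qquad G = \frac{1}{n}\sum_i w_i\,\nabla\nabla f(\bar x),\qquad w(y) = \frac{1}{\sqrt{1 + |\nabla f(y)|^2}}$$
Robust: $z$ is bounded, because $w(y) \le 1$ and $w(y)\,|\nabla f(y)| < 1$.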
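Euclidean case: $f(x) = \frac12|x|^2$
$$z(\bar x;y) = \frac{1}{G}\,\frac{y - \bar x}{\sqrt{1 + |y|^2}},\qquad G = \frac{1}{n}\sum_i w_i$$
so $|z(\bar x;y)|$ stays bounded as $|y| \to \infty$.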
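A sketch of the bounded influence in the Euclidean case, where the t-center is the weighted mean with $w_i = 1/\sqrt{1 + |x_i|^2}$. An outlier $y$ is pushed toward infinity: the shift of the t-center stays bounded, while the shift of the ordinary mean grows with $|y|$. The sample is synthetic.

```python
import numpy as np

def t_center_euclid(pts: np.ndarray) -> np.ndarray:
    """t-center for f(x) = |x|^2 / 2: the weighted mean with w_i = 1 / sqrt(1 + |x_i|^2)."""
    w = 1 / np.sqrt(1 + np.sum(pts**2, axis=1))
    return w @ pts / w.sum()

rng = np.random.default_rng(3)
E = rng.normal(size=(50, 2))                   # synthetic inliers
base_t, base_mean = t_center_euclid(E), E.mean(axis=0)

t_shifts, mean_shifts = [], []
for dist in [1e1, 1e3, 1e5]:
    Ey = np.vstack([E, [[dist, 0.0]]])         # add an outlier y with |y| = dist
    t_shifts.append(np.linalg.norm(t_center_euclid(Ey) - base_t))
    mean_shifts.append(np.linalg.norm(Ey.mean(axis=0) - base_mean))
    print(f"|y| = {dist:.0e}: t-center shift {t_shifts[-1]:.4f}, mean shift {mean_shifts[-1]:.1f}")
```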
How good is the total Bregman divergence?
• vision
• signal processing
• geometry (conformal)
TBD application: shape retrieval
• Using the MPEG7 database
• 70 classes, with 20 shapes in each class (Meizhu Liu)
First clustering, then retrieval.
Advantages
• Accurate
• Easy to access (shape representation)
• Space and time efficient (only the closed-form t-centers need to be stored, clustering can be done offline, hierarchical tree storage)
Shape retrieval framework
• Shape --> extract boundary points and align them --> represent using a mixture of Gaussians --> cluster and use a k-tree to store the clustering results --> query on the tree
MPEG7 database
• Great intraclass variability and small interclass dissimilarity
Shape representation
Experimental results
Other TBD applications
Diffusion tensor imaging (DTI) analysis [Vemuri]
• Interpolation
• Segmentation
Baba C. Vemuri, Meizhu Liu, Shun-ichi Amari and Frank Nielsen, Total Bregman Divergence and its Applications to DTI Analysis, IEEE TMI, to appear