On the convergence of higher-order orthogonality iteration and its extension
Yangyang Xu
IMA, University of Minnesota
SIAM Conference on Applied Linear Algebra (LA15), Atlanta
October 27, 2015
Best low-multilinear-rank approximation of tensors
$$\min_{\mathcal{C},A}\ \|\mathcal{X} - \mathcal{C}\times_1 A_1 \cdots \times_N A_N\|_F^2,\quad \text{subject to } A_n \in \mathbb{O}^{I_n\times r_n} := \{A_n \in \mathbb{R}^{I_n\times r_n} : A_n^\top A_n = I\},\ \forall n.$$
- Any tensor has a multilinear SVD [De Lathauwer-De Moor-Vandewalle'00]
- The truncated multilinear SVD is generally not the best low-multilinear-rank approximation [L-M-V'00]
Face recognition by low-rank tensor approximation
- Matrix decomposition: $D \approx UC$
- Tensor decomposition: $\mathcal{D} \approx \mathcal{C}\times_1 U_1 \cdots \times_5 U_5$
- Tensor decomposition can improve recognition accuracy by over 50% [Vasilescu-Terzopoulos'02]
Higher-order Orthogonality Iteration (HOOI)
Given $A$, the optimal core is

$$\mathcal{C} = \mathcal{X}\times_1 A_1^\top \cdots \times_N A_N^\top.$$
With this C plugged in, the approximation problem becomes
$$\min_{A}\ \|\mathcal{X} - \mathcal{X}\times_1 A_1A_1^\top \cdots \times_N A_NA_N^\top\|_F^2,\quad \text{subject to } A_n\in\mathbb{O}^{I_n\times r_n},\ \forall n,$$
which is equivalent to
$$\max_{A}\ \|\mathcal{X}\times_1 A_1^\top \cdots \times_N A_N^\top\|_F^2 = \|A_n^\top \operatorname{unfold}_n(\mathcal{X}\times_{i\neq n} A_i^\top)\|_F^2,\quad \text{subject to } A_n\in\mathbb{O}^{I_n\times r_n},\ \forall n.$$
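The equivalence comes from a Pythagorean identity: each $A_nA_n^\top$ is an orthogonal projection, so (a standard step, stated here for completeness)

$$\|\mathcal{X} - \mathcal{X}\times_1 A_1A_1^\top \cdots \times_N A_NA_N^\top\|_F^2 = \|\mathcal{X}\|_F^2 - \|\mathcal{X}\times_1 A_1^\top \cdots \times_N A_N^\top\|_F^2,$$

and minimizing the approximation error over orthonormal $A_n$ is therefore the same as maximizing the energy of the projected tensor.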
Higher-order Orthogonality Iteration (HOOI)
Algorithm 1: Higher-order orthogonality iteration (HOOI)
Input: $\mathcal{X}$ and $(r_1,\ldots,r_N)$
Initialization: choose $(A_1^0,\ldots,A_N^0)$ with $A_n^0\in\mathbb{O}^{I_n\times r_n}$, $\forall n$, and set $k=0$
while not convergent do
    for $n = 1,\ldots,N$ do
        set $A_n^{k+1}$ to an orthonormal basis of the dominant $r_n$-dimensional
        left singular subspace of
            $G_n^k = \operatorname{unfold}_n(\mathcal{X}\times_{i<n}(A_i^{k+1})^\top \times_{i>n}(A_i^k)^\top)$
    increase $k \leftarrow k+1$
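To make the update concrete, here is a minimal NumPy sketch of Algorithm 1; the helper names (`unfold`, `mode_product`, `hooi`) and the truncated-HOSVD initialization are illustrative choices, not prescribed by the slides:

```python
import numpy as np

def unfold(X, n):
    """Mode-n unfolding: mode n becomes the rows; the rest are flattened."""
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def mode_product(X, A, n):
    """Mode-n product X x_n A: multiply every mode-n fiber of X by A."""
    Y = A @ unfold(X, n)
    shape = [A.shape[0]] + [X.shape[i] for i in range(X.ndim) if i != n]
    return np.moveaxis(Y.reshape(shape), 0, n)

def hooi(X, ranks, iters=100):
    N = X.ndim
    # One common choice of A^0: factors of the truncated HOSVD.
    A = [np.linalg.svd(unfold(X, n))[0][:, :r] for n, r in enumerate(ranks)]
    for _ in range(iters):
        for n in range(N):
            # G_n^k: contract X with the latest A_i^T in every mode i != n.
            G = X
            for i in range(N):
                if i != n:
                    G = mode_product(G, A[i].T, i)
            U = np.linalg.svd(unfold(G, n), full_matrices=False)[0]
            A[n] = U[:, :ranks[n]]  # dominant r_n-dim left singular subspace
    C = X
    for n in range(N):
        C = mode_product(C, A[n].T, n)  # optimal core for the final factors
    return C, A
```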
- Widely used and well suited to large-scale problems
- Better scalability than Newton-type methods such as Newton-Grassmann [Eldén-Savas'09] and Riemannian trust region [Ishteva et al.'11]
- Convergence is still an open question
This talk addresses the open question: subspace convergence of HOOI

- Why care about convergence
  - Without convergence, HOOI can return severely different multilinear subspaces when run for different numbers of iterations
- Challenges
  - Non-convexity
  - Non-uniqueness of the dominant subspace
  - Non-uniqueness of the orthonormal basis
- How to establish convergence
  - Greedy selection of the orthonormal basis
  - Kurdyka-Łojasiewicz property
Greedy higher-order orthogonality iteration
Algorithm 2: Greedy HOOI
Input: $\mathcal{X}$ and $(r_1,\ldots,r_N)$
Initialization: choose $(A_1^0,\ldots,A_N^0)$ with $A_n^0\in\mathbb{O}^{I_n\times r_n}$, $\forall n$, and set $k=0$
while not convergent do
    for $n = 1,\ldots,N$ do
        set $A_n^{k+1} \in \operatorname{argmin}_{A_n}\|A_n - A_n^k\|_F$ over all orthonormal bases of the
        dominant $r_n$-dimensional left singular subspace of
            $G_n^k = \operatorname{unfold}_n(\mathcal{X}\times_{i<n}(A_i^{k+1})^\top \times_{i>n}(A_i^k)^\top)$
    increase $k \leftarrow k+1$
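One concrete way to realize the argmin step (the slides do not spell out the construction; this is a standard orthogonal Procrustes solution that satisfies the stated criterion, with function names of my own):

```python
import numpy as np

def greedy_basis(G, r, A_prev):
    """Orthonormal basis of the dominant r-dim left singular subspace of G
    that is closest in Frobenius norm to the previous iterate A_prev."""
    # Any orthonormal basis U of the dominant subspace.
    U = np.linalg.svd(G, full_matrices=False)[0][:, :r]
    # Orthogonal Procrustes: Q = W Z^T minimizes ||U Q - A_prev||_F over
    # orthogonal Q, where U^T A_prev = W diag(s) Z^T.
    W, _, Zt = np.linalg.svd(U.T @ A_prev)
    return U @ W @ Zt
```

This choice pins down the otherwise arbitrary rotation of the basis.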
- The iterate mapping is continuous
- Will be used to establish convergence of the original HOOI
Block-nondegeneracy
$A$ is a block-nondegenerate point if $\sigma_{r_n}(G_n) > \sigma_{r_n+1}(G_n)$, $\forall n$, where

$$G_n = \operatorname{unfold}_n(\mathcal{X}\times_{i\neq n} A_i^\top)$$

- The dominant multilinear subspace is unique
- Equivalent to negative definiteness of the block Hessian on $\mathbb{O}^{I_n\times r_n}$ at a local maximizer
- Necessary condition for convergence
  - A degenerate first-order optimal point is not stationary
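A small numerical check of this condition, reusing `unfold` and `mode_product` from the HOOI sketch above (the tolerance and function name are my own):

```python
import numpy as np

def is_block_nondegenerate(X, A, ranks, gap_tol=1e-10):
    for n in range(X.ndim):
        # G_n = unfold_n( X x_{i != n} A_i^T )
        G = X
        for i in range(X.ndim):
            if i != n:
                G = mode_product(G, A[i].T, i)
        s = np.linalg.svd(unfold(G, n), compute_uv=False)
        # Require a strict gap sigma_{r_n} > sigma_{r_n + 1}.
        if s[ranks[n] - 1] - s[ranks[n]] <= gap_tol:
            return False
    return True
```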
Convergence analysis of greedy HOOI
Let $\{A^k\}_{k\ge1}$ be the sequence from greedy HOOI. Assume there is a block-nondegenerate limit point $\bar{A}$.

- $\bar{A}$ is a block-wise maximizer and a critical point
- There is $\alpha > 0$ such that if $A^k$ is sufficiently close to $\bar{A}$, then
$$\alpha\|A^{k+1} - A^k\|_F^2 \le F(A^{k+1}) - F(A^k),$$
where $F(A) = \|\mathcal{X}\times_1 A_1^\top \cdots \times_N A_N^\top\|_F^2 - \sum_{n=1}^N \iota_{\mathbb{O}^{I_n\times r_n}}(A_n)$
- If $A^k$ is sufficiently close to $\bar{A}$, then
$$\operatorname{dist}(0, \partial F(A^k)) \le C\|A^k - A^{k-1}\|_F.$$
- Further, with the Kurdyka-Łojasiewicz property $\Rightarrow$ $A^k \to \bar{A}$ as $k\to\infty$
- Local convergence to a globally optimal solution
Convergence analysis of greedy HOOI
Kurdyka-Łojasiewicz property: the following quantity is bounded near $\bar{A}$ for a certain $\theta\in[0,1)$:

$$\frac{|F(A) - F(\bar{A})|^\theta}{\operatorname{dist}(0, \partial F(A))}$$

Use $(1-\theta)(a-b) \le a^\theta(a^{1-\theta} - b^{1-\theta})$ for $a, b \ge 0$ and the previous inequalities to get

$$\begin{aligned}
\|A^{k+1} - A^k\|_F^2 &\lesssim F(A^{k+1}) - F(A^k)\\
&\lesssim (F(\bar{A}) - F(A^k))^\theta \big((F(\bar{A}) - F(A^k))^{1-\theta} - (F(\bar{A}) - F(A^{k+1}))^{1-\theta}\big)\\
&\lesssim \operatorname{dist}(0, \partial F(A^k)) \big((F(\bar{A}) - F(A^k))^{1-\theta} - (F(\bar{A}) - F(A^{k+1}))^{1-\theta}\big)\\
&\lesssim \|A^k - A^{k-1}\|_F \big((F(\bar{A}) - F(A^k))^{1-\theta} - (F(\bar{A}) - F(A^{k+1}))^{1-\theta}\big).
\end{aligned}$$

Together with Young's inequality, this gives $\sum_k \|A^{k+1} - A^k\|_F < \infty$.
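To spell out that last step (the standard telescoping argument; constants are absorbed into $C$, and $\Delta_k$ is my shorthand): with

$$\Delta_k := (F(\bar{A}) - F(A^k))^{1-\theta} - (F(\bar{A}) - F(A^{k+1}))^{1-\theta}, \qquad \|A^{k+1} - A^k\|_F^2 \le C\,\|A^k - A^{k-1}\|_F\,\Delta_k,$$

Young's inequality $\sqrt{xy} \le \tfrac{x+y}{2}$ gives

$$\|A^{k+1} - A^k\|_F \le \tfrac{1}{2}\|A^k - A^{k-1}\|_F + \tfrac{C}{2}\,\Delta_k.$$

Summing over $k$, half of the left-hand sum cancels against the first term on the right, while $\sum_k \Delta_k$ telescopes to a finite value, hence $\sum_k \|A^{k+1} - A^k\|_F < \infty$.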
Convergence analysis of original HOOI
Starting from $A^{k_0}$ sufficiently close to a block-nondegenerate limit point $\bar{A}$ of $\{A^k\}_{k\ge1}$, we have

$$A_n^k(A_n^k)^\top = \hat{A}_n^k(\hat{A}_n^k)^\top, \quad \forall n,\ \forall k \ge k_0,$$

where $\{A^k\}_{k\ge k_0}$ and $\{\hat{A}^k\}_{k\ge k_0}$ are the sequences from the original and greedy HOOI, respectively.

- $\bar{A}$ is a block-wise maximizer and a critical point
- $A_n^k(A_n^k)^\top \to \bar{A}_n\bar{A}_n^\top$ as $k\to\infty$
- Local convergence to the globally optimal multilinear subspace
Best low-rank tensor approximation with missing values
$$\min_{\mathcal{X},\mathcal{C},A}\ \|\mathcal{X} - \mathcal{C}\times_1 A_1 \cdots \times_N A_N\|_F^2,\quad \text{subject to } A_n\in\mathbb{O}^{I_n\times r_n},\ \forall n;\ \ P_\Omega(\mathcal{X}) = P_\Omega(\mathcal{M})$$

- Reduces to the best low-rank tensor approximation if $\Omega$ contains all entries
- Can estimate the original tensor $\mathcal{M}$ simultaneously
Best low-rank tensor approximation with missing values
[Figure: low-rank tensor approximation with missing values; from Geng-Miles-Zhou-Wang'11]
Incomplete higher-order orthogonality iteration
Algorithm 3: Incomplete HOOI (iHOOI)

Input: $P_\Omega(\mathcal{M})$ and $(r_1,\ldots,r_N)$
Initialization: choose $\mathcal{X}^0$ and $(A_1^0,\ldots,A_N^0)$ and set $k=0$
while not convergent do
    for $n = 1,\ldots,N$ do
        set $A_n^{k+1}$ to an orthonormal basis of the dominant $r_n$-dimensional
        left singular subspace of
            $G_n^k = \operatorname{unfold}_n(\mathcal{X}^k\times_{i<n}(A_i^{k+1})^\top \times_{i>n}(A_i^k)^\top)$
    $\mathcal{X}^{k+1} \leftarrow P_\Omega(\mathcal{M}) + P_{\Omega^c}\big(\mathcal{X}^k \times_{n=1}^N A_n^{k+1}(A_n^{k+1})^\top\big)$
    increase $k \leftarrow k+1$
- The $\mathcal{X}$-update is one step of alternating minimization for
$$\min_{\mathcal{C},\mathcal{X}}\ \|\mathcal{X} - \mathcal{C}\times_{i=1}^N A_i^{k+1}(A_i^{k+1})^\top\|_F^2,\quad \text{subject to } P_\Omega(\mathcal{X}) = P_\Omega(\mathcal{M})$$
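A minimal NumPy sketch of the $\mathcal{X}$-update, reusing `mode_product` from the HOOI sketch; the boolean mask `Omega` and the function name are my notation, not from the slides:

```python
import numpy as np

def ihooi_x_update(X, M, Omega, A):
    """X^{k+1} = P_Omega(M) + P_{Omega^c}( X x_n A_n A_n^T over all modes )."""
    Y = X
    for n, An in enumerate(A):
        Y = mode_product(Y, An @ An.T, n)  # project onto the current subspaces
    # Keep observed entries of M; fill the unobserved ones from the projection.
    return np.where(Omega, M, Y)
```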
Experimental results
Low-multilinear-rank tensor approximation

- Relation between the iterate sequences of the original and greedy HOOIs

Low-rank tensor completion

- Phase transition plots (how many samples guarantee exact reconstruction)
Comparing HOOIs on subspace learning
[Figure: subspace relative changes vs. number of iterations for greedy and original HOOI, and the $r_n$-th vs. $(r_n+1)$-th singular values for $n = 1, 2, 3$.]

- Gaussian random tensor; full size: $50\times50\times50$; core size: $5\times5\times5$
- Observed: $A_n^k(A_n^k)^\top = \hat{A}_n^k(\hat{A}_n^k)^\top$, $\forall n, k$
Comparing HOOIs on subspace learning
[Figure: subspace relative changes vs. number of iterations for greedy and original HOOI, and the $r_n$-th vs. $(r_n+1)$-th singular values for $n = 1, 2, 3$.]

- Yale Face Database; full size: $38\times64\times2958$; core size: $5\times5\times20$
- Observed: $A_n^k(A_n^k)^\top = \hat{A}_n^k(\hat{A}_n^k)^\top$, $\forall n, k$
Phase transition plots
[Figure: phase transition plots (sample ratio vs. rank $r$) for iHOOI (proposed), WTucker, and geomCG.]
- $50\times50\times50$ randomly generated tensor with rank $(r, r, r)$
- True rank provided (an adaptive rank estimate can be applied)
- WTucker [Filipovic-Jukic'13]: alternating minimization for $\min_{A,\mathcal{C}} \|P_\Omega(\mathcal{C}\times_{i=1}^N A_i) - P_\Omega(\mathcal{M})\|_F^2$
- geomCG [Kressner-Steinlechner-Vandereycken'13]: Riemannian optimization method
Proposed method achieves near-optimal reconstruction
Convergence to globally optimal solution
Observation: $\|P_\Omega(\mathcal{C}\times_{n=1}^N A_n) - P_\Omega(\mathcal{M})\|_F / \|P_\Omega(\mathcal{M})\|_F$ is always small

- Low sampling ratio $\Rightarrow$ possibly more than one feasible low-rank solution
- $A^0$ sufficiently close to the bases of $\mathcal{M}$ $\Rightarrow$ $\mathcal{C}^k\times_{n=1}^N A_n^k \to \mathcal{M}$
[Figure: number of successes among 50 runs vs. noise level $\delta$; tensor size: $(50,50,50)$; rank: $(20,20,20)$; sample ratio: 10%; $A^0 \leftarrow$ truncated HOSVD of $\mathcal{M} + \delta\|\mathcal{M}\|_F \frac{\mathcal{N}}{\|\mathcal{N}\|_F}$.]
Conclusions
- Gave subspace sequence convergence of the HOOI method
  - First result on the convergence of HOOI
  - Via the convergence of a greedy HOOI and the Kurdyka-Łojasiewicz inequality
  - Block-nondegeneracy is both sufficient and necessary
- Proposed an incomplete HOOI method for applications with missing values
  - Subspace learning and tensor completion simultaneously
  - Empirically, near-optimal recovery of low-rank tensors
References
- Y. Xu. On the convergence of higher-order orthogonality iteration. arXiv, 2015.
- Y. Xu. On higher-order singular value decomposition from incomplete data. arXiv, 2014.