Introduction Related Work Proposed Algorithm Experiments
Accelerated Inexact Soft-Impute for Fast Large-Scale Matrix Completion

Quanming Yao
Department of Computer Science and Engineering
Hong Kong University of Science and Technology, Hong Kong

Joint work with James Kwok
Quanming Yao AIS-Impute for Matrix Completion
Outline
1 Introduction
2 Related Work
3 Proposed Algorithm
4 Experiments
Motivating Applications
Recommender systems: predict rating by user i on item j
Motivating Applications
Similarity among users and items: low-rank assumption
Motivating Applications
Image inpainting: fill in missing pixels
A natural image can be well approximated by a low-rank matrix
Matrix Completion
min_X (1/2)‖P_Ω(X − O)‖²_F + λ‖X‖_*

X ∈ R^{m×n}: low-rank matrix to be recovered (m ≤ n)
O ∈ R^{m×n}: observed elements
[P_Ω(A)]_ij = A_ij if Ω_ij = 1, and 0 otherwise
‖X‖_*: nuclear norm (sum of X's singular values, non-smooth): ‖X‖_* = ∑_{i=1}^m σ_i(X)

Goal: find X which is low-rank and consistent with the observations
Proximal Gradient Descent
min_x f(x) + λg(x)

f(·): convex and smooth
g(·): convex, can be non-smooth

x_{t+1} = arg min_x f(x_t) + ⟨x − x_t, ∇f(x_t)⟩ + (1/2)‖x − x_t‖² + λg(x)
        = arg min_x (1/2)‖x − z_t‖² + λg(x)   (proximal step, where z_t = x_t − ∇f(x_t))

the proximal step often has a simple closed-form solution
convergence rate: O(1/T), where T is the number of iterations
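As a concrete illustration (not part of the original slides), the update above can be sketched in a few lines of NumPy, with g taken to be the ℓ1 norm, whose proximal step is soft-thresholding:

```python
import numpy as np

def soft_threshold(z, lam):
    """Proximal operator of lam*||x||_1 (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def proximal_gradient(grad_f, prox_g, x0, step=1.0, iters=50):
    """Generic proximal gradient: x <- prox_g(x - step * grad_f(x))."""
    x = x0
    for _ in range(iters):
        z = x - step * grad_f(x)   # gradient step on the smooth part f
        x = prox_g(z, step)        # proximal step on the non-smooth part g
    return x

# toy problem: min_x 0.5*||x - b||^2 + ||x||_1
b = np.array([3.0, -0.5, 0.2])
x = proximal_gradient(lambda v: v - b,
                      lambda z, s: soft_threshold(z, s),
                      np.zeros(3))
# converges to soft_threshold(b, 1.0) = [2., 0., 0.]
```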
Proximal Gradient Descent - Acceleration
min_x f(x) + λg(x)

can be accelerated to O(1/T²) [Nesterov, 2013]

y_t = (1 + θ_t)x_t − θ_t x_{t−1}
z_t = y_t − ∇f(y_t)
x_{t+1} = arg min_x (1/2)‖x − z_t‖² + λg(x)

e.g., θ_t = (t − 1)/(t + 2)
can be seen as a momentum method with a specified weight
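A minimal sketch of the accelerated update (illustrative, not from the slides), again using the ℓ1 proximal step:

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def accelerated_prox_grad(grad_f, prox_g, x0, step=1.0, iters=50):
    """Proximal gradient with Nesterov momentum, theta_t = (t-1)/(t+2)."""
    x_prev = x0.copy()
    x = x0.copy()
    for t in range(1, iters + 1):
        theta = (t - 1.0) / (t + 2.0)
        y = x + theta * (x - x_prev)    # y_t = (1+theta)x_t - theta*x_{t-1}
        z = y - step * grad_f(y)        # gradient step at the extrapolated point
        x_prev, x = x, prox_g(z, step)  # proximal step
    return x

# same toy problem: min_x 0.5*||x - b||^2 + ||x||_1
b = np.array([3.0, -0.5, 0.2])
x = accelerated_prox_grad(lambda v: v - b,
                          lambda z, s: soft_threshold(z, s),
                          np.zeros(3))
```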
Proximal Gradient Descent for Matrix Completion
min_X (1/2)‖P_Ω(X − O)‖²_F + λ‖X‖_*   (here f(X) is the first term, g(X) = ‖X‖_*)

Let the SVD of matrix Z be UΣV⊤.

Proximal Step for Matrix Completion
arg min_X (1/2)‖X − Z‖²_F + λ‖X‖_* = U(Σ − λI)₊V⊤ ≡ SVT_λ(Z)

[(A)₊]_ij = max(A_ij, 0)
singular value thresholding (SVT): singular values no bigger than λ are shrunk to 0
Acceleration can be used [Ji and Ye, 2009; Toh and Yun, 2010].
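The closed-form proximal step above can be sketched directly (a dense SVD, for illustration only; the later slides show how to avoid it):

```python
import numpy as np

def svt(Z, lam):
    """Singular value thresholding: U (Sigma - lam*I)_+ V^T, the
    proximal step of the nuclear norm at Z."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    s_thr = np.maximum(s - lam, 0.0)   # shrink each singular value by lam
    keep = s_thr > 0                   # drop directions thresholded to zero
    return U[:, keep] @ np.diag(s_thr[keep]) @ Vt[keep, :]
```

Note that the output is low-rank even when the input is not: for Z = diag(3, 1, 0.5) and λ = 1, the result is the rank-1 matrix diag(2, 0, 0).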
Soft-Impute [Mazumder et al., 2010]
Z_t = P_Ω(O) + P⊥_Ω(X_t),    X_{t+1} = SVT_λ(Z_t).

[P⊥_Ω(A)]_ij = A_ij if Ω_ij = 0, and 0 otherwise (complement of P_Ω(A))

To compute the SVD, the basic operations are matrix multiplications of the form Z_t u and Z_t⊤v

Key observation: Z_t is sparse + low-rank

Let X_t = U_tΣ_tV_t⊤. For any u ∈ R^n,

Z_t u = P_Ω(O − X_t)u  [sparse: O(‖Ω‖₁)]  + U_tΣ_t(V_t⊤u)  [low-rank: O((m+n)k)]

Rank-k SVD takes O(‖Ω‖₁k + (m + n)k²) time, instead of O(mnk) (similarly for Z_t⊤v)
k is much smaller than m and n; ‖Ω‖₁ is much smaller than mn
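A sketch of this "sparse plus low-rank" matrix-vector product (names are illustrative), storing the sparse part in coordinate (COO) form:

```python
import numpy as np

def zt_matvec(rows, cols, vals, U, S, Vt, u):
    """Compute Z_t @ u for Z_t = sparse part + U S Vt, without ever
    forming Z_t densely.  Cost: O(nnz) for the sparse part plus
    O((m+n)k) for the rank-k part."""
    out = np.zeros(U.shape[0])
    np.add.at(out, rows, vals * u[cols])  # sparse part: O(nnz)
    out += U @ (S @ (Vt @ u))             # low-rank part: three thin products
    return out
```

The parenthesization U @ (S @ (Vt @ u)) is what keeps the cost at O((m+n)k): the product is evaluated as matrix-vector operations, never as an m x n matrix.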
Soft-Impute is Proximal Gradient
Z_t = X_t − ∇f(X_t)   (proximal gradient)
    = X_t − P_Ω(X_t − O) = P⊥_Ω(X_t) + P_Ω(O)   (Soft-Impute)

Soft-Impute = proximal gradient

Possible to use acceleration and obtain the O(1/T²) rate
Previous work suggested that this is not useful:
the "sparse + low-rank" structure no longer exists
the increase in iteration complexity outweighs the gain in convergence rate
Main Contributions
Acceleration is useful!
1 The "sparse + low-rank" structure can still be used
  maintains low iteration complexity
  improves the convergence rate to O(1/T²)
2 Speed up SVT using the power method
  further reduces iteration complexity
  the use of approximation still yields the O(1/T²) convergence rate
“Sparse + Low-Rank” Structure
With acceleration,

Z_t = P_Ω(O − Y_t) + Y_t = P_Ω(O − Y_t)  [sparse]  + (1 + θ_t)X_t − θ_t X_{t−1}  [sum of two low-rank matrices]

For any u,

Z_t u = P_Ω(O − Y_t)u  [O(‖Ω‖₁)]  + (1 + θ_t)U_tΣ_tV_t⊤u  [O((m+n)k)]  − θ_t U_{t−1}Σ_{t−1}V_{t−1}⊤u  [O((m+n)k)]

rank-k SVD takes O(‖Ω‖₁k + (m + n)k²) time (same as Soft-Impute)
but the rate is improved to O(1/T²) (because of acceleration)
Approximate SVT - Motivation
The iterative procedure becomes

Y_t = (1 + θ_t)X_t − θ_t X_{t−1}
Z_t = P_Ω(O − Y_t) + Y_t
X_{t+1} = SVT_λ(Z_t)

Motivations
in SVT, only the singular vectors with singular values ≥ λ are needed, yet the partial SVD still has to be solved exactly
due to the iterative nature of proximal gradient descent, warm-starting can be helpful
→ approximate the subspace spanned by those singular vectors using the power method
Power Method
Let the rank-k SVD of Z̃ be U_kΣ_kV_k⊤. The power method is

simple but efficient for approximating the subspace spanned by U_k
an iterative algorithm that can be warm-started (using R)

PowerMethod(Z̃, R, ε̃) [Halko et al., 2011]
Require: Z̃ ∈ R^{m×n}, initial R ∈ R^{n×k} for warm-start, tolerance ε̃;
1: initialize Q_0 = QR(Z̃R);
2: for j = 0, 1, . . . do
3:   Q_{j+1} = QR(Z̃(Z̃⊤Q_j));  // QR decomposition of a matrix
4:   Δ_{j+1} = ‖Q_{j+1}Q_{j+1}⊤ − Q_jQ_j⊤‖_F;
5:   if Δ_{j+1} ≤ ε̃ then break;
6: end for
7: return Q_{j+1};
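The listing above corresponds to a few lines of NumPy (a hypothetical sketch; the stopping test follows the pseudocode):

```python
import numpy as np

def power_method(Z, R, tol=1e-6, max_iter=1000):
    """Approximate an orthonormal basis Q for the span of the top-k
    left singular vectors of Z, warm-started from R (n x k)."""
    Q = np.linalg.qr(Z @ R)[0]
    for _ in range(max_iter):
        Q_new = np.linalg.qr(Z @ (Z.T @ Q))[0]  # one power iteration + QR
        if np.linalg.norm(Q_new @ Q_new.T - Q @ Q.T) <= tol:
            return Q_new                         # subspace has stabilized
        Q = Q_new
    return Q
```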
Power Method - Case with k = 1
PowerMethod(Z̃, r)
1: initialize q_0 = Z̃r;
2: for j = 0, 1, . . . do
3:   q_j = q_j/‖q_j‖;  // QR becomes normalization of a vector
4:   q_{j+1} = Z̃(Z̃⊤q_j);
5: end for

Let Z̃ = UΣV⊤. The recursion can be seen as (up to scaling)

q_j = (Z̃Z̃⊤)^j Z̃r = U diag(1, (σ_2/σ_1)^{2j}, . . . , (σ_m/σ_1)^{2j}) U⊤Z̃r

For i = 2, · · · , m, lim_{j→∞} (σ_i/σ_1)^{2j} = 0, so the power method captures the
span of u_1 (first column of U)
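A tiny numerical check of this limit (toy matrix constructed so that the leading left singular vector is e_1):

```python
import numpy as np

rng = np.random.default_rng(0)
Z = np.diag([4.0, 1.0, 0.5])   # singular values 4 > 1 > 0.5, so u1 = e1
q = rng.standard_normal(3)     # random start vector (plays the role of r)
for _ in range(30):
    q = Z @ (Z.T @ q)          # multiply by Z Z^T
    q /= np.linalg.norm(q)     # normalization plays the role of QR
# q is now aligned with e1, since (sigma_i/sigma_1)^(2j) -> 0 for i >= 2
```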
Obtain SVT(Z̃t) from a much smaller SVT
With the obtained Q, an approximate SVT can be constructed as

X̂_t = Q SVT_λ(Q⊤Z̃_t).

Q⊤Z̃_t ∈ R^{k×n}, and is thus much smaller than Z̃_t ∈ R^{m×n}

Approx-SVT(Z̃_t, R, λ, ε̃)
Require: Z̃_t ∈ R^{m×n}, R ∈ R^{n×k}, thresholds λ and ε̃;
1: Q = PowerMethod(Z̃_t, R, ε̃);
2: [U, Σ, V] = SVD(Q⊤Z̃_t);
3: U = {u_i | σ_i > λ}, V = {v_i | σ_i > λ}, Σ = (Σ − λI)₊;
4: return QU, Σ and V.

still O(‖Ω‖₁k + (m + n)k²), but cheaper than an exact SVD
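Combining the two ideas gives the following illustrative sketch (a fixed number of power iterations replaces the tolerance test):

```python
import numpy as np

def approx_svt(Z, R, lam, n_power=20):
    """Approximate SVT_lam(Z): power iterations build a basis Q for the
    dominant left singular subspace, and the exact SVD is applied only
    to the small k x n matrix Q^T Z instead of Z itself."""
    Q = np.linalg.qr(Z @ R)[0]
    for _ in range(n_power):
        Q = np.linalg.qr(Z @ (Z.T @ Q))[0]
    U, s, Vt = np.linalg.svd(Q.T @ Z, full_matrices=False)
    keep = s > lam                 # keep singular values above the threshold
    return Q @ U[:, keep], s[keep] - lam, Vt[keep, :]
```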
Complete Algorithm
Accelerated Inexact Soft-Impute (AIS-Impute)
Require: partially observed matrix O, parameter λ, decay parameter ν ∈ (0, 1), threshold ε;
1: [U_0, λ_0, V_0] = rank-1 SVD(P_Ω(O));
2: initialize c = 1, ε̃_0 = ‖P_Ω(O)‖_F, X_0 = X_1 = λ_0U_0V_0⊤;
3: for t = 1, 2, . . . do
4:   λ_t = ν^t(λ_0 − λ) + λ;
5:   θ_t = (c − 1)/(c + 2);
6:   Y_t = X_t + θ_t(X_t − X_{t−1});
7:   Z̃_t = Y_t + P_Ω(O − Y_t);
8:   ε̃_t = ν^t ε̃_0;
9:   V_{t−1} = V_{t−1} − V_t(V_t⊤V_{t−1}), remove zero columns;
10:  R_t = QR([V_t, V_{t−1}]);
11:  [U_{t+1}, Σ_{t+1}, V_{t+1}] = Approx-SVT(Z̃_t, R_t, λ_t, ε̃_t);
12:  if F(U_{t+1}Σ_{t+1}V_{t+1}⊤) > F(U_tΣ_tV_t⊤) then c = 1 else c = c + 1;
13: end for
14: return X_{t+1} = U_{t+1}Σ_{t+1}V_{t+1}⊤.
core steps: 5–7 (acceleration)
core steps: 8–11 (approximate SVT)
the right singular vectors of the last two iterations (V_t and V_{t−1}) are used to warm-start the power method
the tolerance ε̃_t of the approximate SVT is decreased linearly (ε̃_t = ν^t ε̃_0)
step 12 (adaptive restart): the momentum is reset (c = 1) if F(X) starts to increase
step 4 (continuation strategy): λ_t is initialized to a large value and then gradually decreased to λ; this allows further speedup
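Putting the pieces together, the outer loop can be sketched as follows (an illustrative dense version with exact SVT; the actual algorithm uses Approx-SVT and the sparse-plus-low-rank product, and never forms Z_t densely):

```python
import numpy as np

def ais_impute_sketch(O, mask, lam, nu=0.5, iters=100):
    """Illustrative AIS-Impute outer loop. mask[i,j] = 1 where O is
    observed; exact SVT is used here purely for readability."""
    def svt(Z, l):                       # proximal step of l*||.||_*
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        return U @ np.diag(np.maximum(s - l, 0.0)) @ Vt

    def F(X):                            # objective being minimized
        return (0.5 * np.sum((mask * (X - O)) ** 2)
                + lam * np.linalg.svd(X, compute_uv=False).sum())

    lam0 = np.linalg.norm(mask * O)      # large initial threshold
    X_prev = X = np.zeros_like(O)
    c = 1
    for t in range(1, iters + 1):
        lam_t = nu ** t * (lam0 - lam) + lam   # step 4: continuation
        theta = (c - 1.0) / (c + 2.0)          # step 5
        Y = X + theta * (X - X_prev)           # step 6: acceleration
        Z = Y + mask * (O - Y)                 # step 7: sparse + low-rank
        X_prev, X = X, svt(Z, lam_t)
        c = 1 if F(X) > F(X_prev) else c + 1   # step 12: adaptive restart
    return X
```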
Error in Approximate SVT
Let h_{λg}(X; Z_t) ≡ (1/2)‖X − Z_t‖²_F + λg(X). If the power method exits after j iterations, and assuming k ≥ k̂, η_t < 1 and ε̃ ≥ α_t η_t^j √(1 + η_t²), then

h_{λ‖·‖_*}(X̂_t; Z̃_t) ≤ h_{λ‖·‖_*}(SVT_λ(Z̃_t); Z̃_t) + (η_t/(1 − η_t)) β_t γ_t ε̃,

where X̂_t is the approximate solution; the extra term is controlled by ε̃.

α_t, β_t, γ_t and η_t are constants that depend on Z̃_t
k̂ is the number of singular values > λ; k is the input rank for Approx-SVT
ε̃ is the tolerance for the power method

The approximation error in Approx-SVT can thus be controlled by ε̃_t
Convergence of AIS-Impute
Theorem
With a controlled approximation error on SVT, the proposed algorithm (AIS-Impute) converges to the optimal solution at a rate of O(1/T²).

Since the approximation error ε̃_t of the proximal step (Approx-SVT) decreases to 0 faster than O(1/T²), the convergence rate is the same as with exact SVT
Synthetic Data
m × m data matrix O = UV + G
U ∈ R^{m×5}, V ∈ R^{5×m}: entries sampled i.i.d. from N(0, 1)
G: noise sampled from N(0, 0.05)

‖Ω‖₁ = 15m log(m) random elements in O are observed
half for training, half for parameter tuning
testing is on the unobserved (missing) elements

Performance criteria:
NMSE = ‖P⊥_Ω(X − X̃)‖_F / ‖P⊥_Ω(X̃)‖_F
rank obtained
time
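The NMSE criterion can be computed directly from the observation mask (a hypothetical helper, not from the slides):

```python
import numpy as np

def nmse(X_hat, X_true, mask):
    """NMSE evaluated on the unobserved entries (where mask == 0)."""
    miss = mask == 0
    return np.linalg.norm((X_hat - X_true)[miss]) / np.linalg.norm(X_true[miss])
```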
Synthetic Data - Compared Methods
Compare the proposed AIS-Impute with
accelerated proximal gradient algorithm ("APG") [Ji and Ye, 2009; Toh and Yun, 2010];
Soft-Impute [Mazumder et al., 2010]
Algorithm     Iteration Complexity    Rate      SVT
APG           O(mnk)                  O(1/T²)   exact
Soft-Impute   O(k‖Ω‖₁ + k²(m + n))    O(1/T)    exact
AIS-Impute    O(k‖Ω‖₁ + k²(m + n))    O(1/T²)   approximate

Code can be downloaded from https://github.com/quanmingyao/AIS-impute
Results
m = 500 (sparsity=18.64%) m = 1000 (10.36%)
NMSE rank time (sec) NMSE rank time (sec)
APG 0.0183 5 5.1 0.0223 5 45.5
Soft-Impute 0.0183 5 1.3 0.0223 5 4.4
AIS-Impute 0.0183 5 0.3 0.0223 5 1.1
m = 1500 (7.31%) m = 2000 (5.70%)
NMSE rank time (sec) NMSE rank time (sec)
APG 0.0251 5 172.7 0.0273 5 483.9
Soft-Impute 0.0251 5 13.3 0.0273 5 18.7
AIS-Impute 0.0251 5 2.0 0.0273 5 2.9
All algorithms are equally good on recovery, while AIS-Impute isthe fastest
Convergence Speeds
(a) objective vs #iterations. (b) objective vs time.
W.r.t. #iterations
APG and AIS-Impute are much faster than Soft-Impute
AIS-Impute has a slightly higher objective than APG
W.r.t. time
APG is the slowest (does not use "sparse plus low-rank")
AIS-Impute is the fastest
Recommendation - MovieLens Data
Task: Recommend movies based on users’ historical ratings
#users #movies #ratings
MovieLens-100K 943 1,682 100,000
MovieLens-1M 6,040 3,449 999,714
MovieLens-10M 69,878 10,677 10,000,054
ratings (from 1 to 5) of different users on movies
50% of the observed ratings for training
25% for validation and the rest for testing
MovieLens Data - Compared Methods
Besides proximal algorithms, we also compare with
active subspace selection (“active”) [Hsieh and Olsen, 2014]
Frank-Wolfe algorithm (“boost”) [Zhang et al., 2012]
variant of Soft-Impute (“ALT-Impute”) [Hastie et al., 2014]
second-order trust-region algorithm (“TR”) [Mishra et al.,2013]
Objective w.r.t. Time
AIS-Impute is in black
(a) MovieLens-100K. (b) MovieLens-10M.
On MovieLens-10M, TR and APG are very slow, and thus not shown
Testing RMSE w.r.t. Time
AIS-Impute is in black
(a) MovieLens-100K. (b) MovieLens-10M.
Results
MovieLens-100K MovieLens-1M MovieLens-10M
RMSE rank time RMSE rank time RMSE rank time
active 1.037 70 59.5 0.925 180 1431.4 0.918 217 29681.4
boost 1.038 71 19.5 0.925 178 616.3 0.917 216 13873.9
ALT-Impute 1.037 70 29.1 0.925 179 797.1 0.919 215 17337.3
TR 1.037 71 1911.4 — — > 106 — — > 106
APG 1.037 70 83.4 0.925 180 2060.3 — — > 106
Soft-Impute 1.037 70 337.6 0.925 180 8821.0 — — > 106
AIS-Impute 1.037 70 5.8 0.925 179 129.7 0.916 215 2817.5
All algorithms are equally good at recovering the missing matrix elements
TR is the slowest
ALT-Impute has the same convergence rate as Soft-Impute, but is faster
AIS-Impute is the fastest
Conclusion
AIS-Impute
accelerates proximal gradient descent without losing the "sparse plus low-rank" structure
the power method efficiently produces a good approximation to SVT
fast convergence rate + low iteration complexity
empirically, much faster than the state of the art