Riemannian gossip algorithms for
decentralized matrix completion
Hiroyuki Kasai†, Bamdev Mishra‡, and Atul Saroop‡
†The University of Electro-Communications, Japan
‡Amazon Development Centre India, India
IEICE meeting 2016
Motivation
The matrix completion problem
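[Figure: a partially observed m × n ratings matrix $X^\star$ (m movies, n users; * marks a known entry, ? an unknown entry) approximated by the product $UW^T$ of an m × r and an r × n factor.]
Low-rank prior: $X^\star \approx UW^T$ has $(n + m - r)r$ degrees of freedom, with $r \ll (m, n)$.
U and W are the factor matrices. [Netflix Challenge, 2006]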
Kasai, Mishra, and Saroop Riemannian gossip 20 September 2016 2 / 25
Our interest is to look at the decentralized scenario
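[Figure: the partially observed matrix (m movies) is split column-wise into two blocks, $X^\star_1$ with $n_1$ users and $X^\star_2$ with $n_2$ users, so that $[X^\star_1 \; X^\star_2] \approx U [W_1^T \; W_2^T]$.]
An agent i has access to its own data matrix $X^\star_i$.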
The matrix U is common across all the agents.
Contributions
We develop a nonlinear gossip algorithm with minimal communication between agents.
The optimization formulation is based on a weighted combination of matrix completion and consensus terms.
We develop a parallel variant of the proposed gossip algorithm.
Paper and codes available online at www.bamdevmishra.com.
Outline
Problem formulation on the Riemannian Grassmann manifold.
Proposed gossip algorithms.
Numerical comparisons on synthetic and Netflix data.
Problem formulation
Batch problem formulation
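$$\min_{U \in \mathrm{St}(r,m)} \; \min_{W \in \mathbb{R}^{n \times r}} \; \|P_\Omega(UW^T) - P_\Omega(X^\star)\|_F^2.$$
$W \in \mathbb{R}^{n \times r}$ and $U \in \mathrm{St}(r,m)$, the set of m × r matrices with orthonormal columns.
$P_\Omega$ is the sampling operator, a convenient way to denote known entries.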
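The batch cost above can be sketched in NumPy. This is only an illustration under stated assumptions: the names `sampling_operator`, `completion_cost`, the boolean `mask` standing in for Ω, and the toy sizes are hypothetical and not taken from the authors' code.

```python
import numpy as np

def sampling_operator(X, mask):
    """P_Omega: keep the entries of X where mask is True, zero out the rest."""
    return np.where(mask, X, 0.0)

def completion_cost(U, W, X_star, mask):
    """Squared Frobenius misfit ||P_Omega(U W^T) - P_Omega(X_star)||_F^2."""
    residual = sampling_operator(U @ W.T, mask) - sampling_operator(X_star, mask)
    return float(np.sum(residual ** 2))

# Toy problem (sizes are assumptions chosen for illustration).
rng = np.random.default_rng(0)
m, n, r = 8, 12, 2
U_true, _ = np.linalg.qr(rng.standard_normal((m, r)))  # orthonormal columns
W_true = rng.standard_normal((n, r))
X_star = U_true @ W_true.T                             # exactly rank r
mask = rng.random((m, n)) < 0.6                        # True marks a known entry

cost_at_truth = completion_cost(U_true, W_true, X_star, mask)
cost_perturbed = completion_cost(U_true, W_true + 1.0, X_star, mask)
```

Only the entries selected by `mask` contribute to the cost, so the unknown entries of `X_star` can hold any placeholder value.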
Eliminate W
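$$\min_{U \in \mathrm{St}(r,m)} \; \min_{W \in \mathbb{R}^{n \times r}} \; \|P_\Omega(UW^T) - P_\Omega(X^\star)\|_F^2 \;\equiv\; \min_{U \in \mathrm{St}(r,m)} f(U, W_U),$$
a Grassmann optimization problem.
Solve the inner problem over W (highlighted in blue on the original slide) in closed form to obtain $W_U$.
The final optimization problem is on the Grassmann manifold, i.e., the variable is the 'column space' of U.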
[Boumal and Absil, LAA, 2015]
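One way to read "closed form" here: for a fixed U, the inner problem decouples over the columns of $X^\star$, and each row of $W_U$ is an ordinary least-squares fit on the known entries of one column. The sketch below assumes that reading; `solve_W` and the deterministic toy `mask` are hypothetical names and choices, not the authors' implementation.

```python
import numpy as np

def solve_W(U, X_star, mask):
    """For a fixed U, return W_U row by row via per-column least squares."""
    n, r = X_star.shape[1], U.shape[1]
    W = np.zeros((n, r))
    for j in range(n):
        rows = mask[:, j]                      # known entries of column j
        if rows.any():
            W[j] = np.linalg.lstsq(U[rows], X_star[rows, j], rcond=None)[0]
    return W

# Toy check on an exactly rank-r matrix (sizes are assumptions).
rng = np.random.default_rng(0)
m, n, r = 8, 12, 2
U, _ = np.linalg.qr(rng.standard_normal((m, r)))
W_true = rng.standard_normal((n, r))
X_star = U @ W_true.T
# Deterministic pattern: entry (i, j) is known unless (i + j) is a multiple of 3.
mask = (np.add.outer(np.arange(m), np.arange(n)) % 3) != 0

W_U = solve_W(U, X_star, mask)
```

When every column has at least r known entries in general position, `W_U` reproduces the factor that generated the data, which is why W can be eliminated from the outer problem.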
Decentralized problem formulation
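$X^\star = [X^\star_1, X^\star_2, \ldots, X^\star_N]$.
$$\min_{U \in \mathrm{St}(r,m),\; W_i \in \mathbb{R}^{n_i \times r}} \; \frac{1}{2} \sum_i \|P_{\Omega_i}(UW_i^T) - P_{\Omega_i}(X^\star_i)\|_F^2 \;=\; \min_{U \in \mathrm{St}(r,m)} \; \frac{1}{2} \sum_i \|P_{\Omega_i}(UW_{iU}^T) - P_{\Omega_i}(X^\star_i)\|_F^2,$$
where $W_{iU}$ is computed by agent i independently.
Although the problem is distributed, we still need to learn a common U.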
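A small sketch of why the cost separates across agents once U is shared: each agent fits its own $W_{iU}$ on its own columns, and the per-agent costs add up to the batch cost. The helper names `agent_cost`, `split_columns`, and `decentralized_cost`, and the contiguous column split, are assumptions made for illustration.

```python
import numpy as np

def agent_cost(U, X_i, mask_i):
    """0.5 * ||P_i(U W_iU^T) - P_i(X_i)||_F^2 with W_iU solved locally."""
    cost = 0.0
    for j in range(X_i.shape[1]):
        rows = mask_i[:, j]
        if rows.any():
            w = np.linalg.lstsq(U[rows], X_i[rows, j], rcond=None)[0]
            cost += 0.5 * np.sum((U[rows] @ w - X_i[rows, j]) ** 2)
    return cost

def split_columns(X, mask, N):
    """Partition the columns (users) into N contiguous blocks, one per agent."""
    return list(zip(np.array_split(X, N, axis=1), np.array_split(mask, N, axis=1)))

def decentralized_cost(U, blocks):
    """Sum of the local costs; no agent needs another agent's data."""
    return sum(agent_cost(U, X_i, mask_i) for X_i, mask_i in blocks)

rng = np.random.default_rng(0)
m, n, r, N = 6, 9, 2, 3
X = rng.standard_normal((m, n))                # noisy, not low rank
mask = (np.add.outer(np.arange(m), np.arange(n)) % 3) != 0
U, _ = np.linalg.qr(rng.standard_normal((m, r)))

blocks = split_columns(X, mask, N)
total = decentralized_cost(U, blocks)
batch = agent_cost(U, X, mask)                 # same cost computed centrally
```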
We add a consensus term to our optimization formulation
Key idea: introduce multiple copies of U among N agents, but allow them to reach consensus.
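$$\min_{U_1, \ldots, U_N \in \mathrm{St}(r,m)} \; \underbrace{\frac{1}{2} \sum_i \|P_{\Omega_i}(U_i W_{iU_i}^T) - P_{\Omega_i}(X^\star_i)\|_F^2}_{\text{completion task handled by agent } i} \;+\; \underbrace{\frac{\rho}{2} \left( d(U_1, U_2)^2 + d(U_2, U_3)^2 + \ldots + d(U_{N-1}, U_N)^2 \right)}_{\text{consensus among agents}}.$$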
d is the Riemannian distance on the Grassmann manifold.
The parameter ρ trades off completion against consensus: a larger ρ puts more weight on consensus.
Minimizing only consensus ⇒ $U_1 = U_2 = \ldots = U_{N-1} = U_N$.
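The consensus term can be evaluated with the standard principal-angle formula for the Grassmann distance, $d(U_1, U_2) = \|\theta\|_2$ with $\cos\theta_k$ the singular values of $U_1^T U_2$. The sketch below assumes orthonormal inputs; `grassmann_distance` and `consensus_term` are hypothetical helper names.

```python
import numpy as np

def grassmann_distance(U1, U2):
    """Riemannian distance between span(U1) and span(U2) via principal angles."""
    s = np.linalg.svd(U1.T @ U2, compute_uv=False)
    theta = np.arccos(np.clip(s, -1.0, 1.0))   # clip guards against rounding
    return float(np.linalg.norm(theta))

def consensus_term(Us, rho):
    """(rho / 2) * sum of squared distances between neighbouring agents."""
    return 0.5 * rho * sum(
        grassmann_distance(Us[i], Us[i + 1]) ** 2 for i in range(len(Us) - 1)
    )

# Two orthogonal lines in R^3 are at distance pi / 2.
e = np.eye(3)
d_orth = grassmann_distance(e[:, :1], e[:, 1:2])

# Rotating the basis of a subspace does not change the subspace.
c, s_ = np.cos(0.7), np.sin(0.7)
Q = np.array([[c, -s_], [s_, c]])
d_same = grassmann_distance(e[:, :2], e[:, :2] @ Q)
```

The second check illustrates why the variable is the column space of U rather than U itself: two different orthonormal bases of the same subspace are at (numerically) zero distance.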
Proposed Riemannian gossip algorithms
Riemannian online gossip on Grassmann
1 Agents i and i + 1 are neighbors for all i ≤ N − 1 (ordering of agents).
2 At each time slot, say t, we pick an agent i ≤ N − 1 uniformly at random (SGD updates).
Equivalently, we also pick agent i + 1 (the neighbor of agent i).
Agents i and i + 1 update $U_i$ and $U_{i+1}$, respectively, by taking a gradient descent step with stepsize $\gamma_t$ on the Grassmann manifold.
The stepsizes satisfy $\sum_t \gamma_t^2 < \infty$ and $\sum_t \gamma_t = +\infty$.
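The update rule above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a QR-based retraction, the standard Grassmann logarithm for the gradient of the squared distance, the stepsize $\gamma_t = \gamma_0/(1+t)$ (one choice that meets both summability conditions), and 0-based agent indices. All function names are hypothetical.

```python
import numpy as np

def solve_W(U, X, mask):
    """Closed-form W_U: least-squares fit of each column on its known entries."""
    W = np.zeros((X.shape[1], U.shape[1]))
    for j in range(X.shape[1]):
        rows = mask[:, j]
        if rows.any():
            W[j] = np.linalg.lstsq(U[rows], X[rows, j], rcond=None)[0]
    return W

def completion_grad(U, X, mask):
    """Riemannian gradient of 0.5 * ||P(U W_U^T) - P(X)||_F^2 with respect to U."""
    W = solve_W(U, X, mask)
    residual = np.where(mask, U @ W.T - X, 0.0)
    egrad = residual @ W                       # Euclidean gradient, W held at W_U
    return egrad - U @ (U.T @ egrad)           # remove the component along U

def grassmann_log(U, V):
    """Tangent vector at U pointing towards span(V); its norm equals d(U, V)."""
    VtU = V.T @ U
    B = np.linalg.solve(VtU, V.T - VtU @ U.T)
    P, s, Qt = np.linalg.svd(B.T, full_matrices=False)
    return (P * np.arctan(s)) @ Qt

def retract(U, xi):
    """Map U + xi back to orthonormal columns with a thin QR factorization."""
    Q, _ = np.linalg.qr(U + xi)
    return Q

def gossip_step(Us, data, i, rho, gamma):
    """Agents i and i + 1 each take one gradient step on completion + consensus."""
    Ui, Uj = Us[i], Us[i + 1]
    gi = completion_grad(Ui, *data[i]) - rho * grassmann_log(Ui, Uj)
    gj = completion_grad(Uj, *data[i + 1]) - rho * grassmann_log(Uj, Ui)
    Us[i], Us[i + 1] = retract(Ui, -gamma * gi), retract(Uj, -gamma * gj)

def run_gossip(Us, data, rho, gamma0, num_steps, seed=0):
    """Online gossip: one uniformly chosen neighbouring pair per time slot."""
    rng = np.random.default_rng(seed)
    for t in range(num_steps):
        i = int(rng.integers(len(Us) - 1))     # each pair has probability 1/(N-1)
        gossip_step(Us, data, i, rho, gamma0 / (1 + t))
    return Us
```

Here `data[i]` is the pair `(X_i, mask_i)` held by agent i. Only the two selected agents exchange their current subspaces in a step, which is the sense in which communication stays minimal.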
A graphical illustration
[Figure: agents 1, 2, 3, ..., N − 1, N arranged in a chain under a universal clock. Each pair of neighboring agents is chosen with probability 1/(N − 1).]
Convergence of Riemannian online gossip
Asymptotic convergence follows from the standard SGD analysis on manifolds.
The proposed algorithm is readily implementable, e.g., with the toolbox Manopt.
[Bonnabel, IEEE TAC, 2013; Absil, Mahony, and Sepulchre, Princeton University Press, 2008; Boumal et al., JMLR, 2014]
Parallelizing Riemannian gossip with particular sampling
[Figure: agents 1 to 5 arranged in a chain under a universal clock. "Solid" pairs are chosen at the same time; "dotted" pairs are chosen at the same time.]
The "dotted" and "solid" groups are each chosen with probability 1/2.
Convergence guarantees remain the same.
(N − 1)/2 times faster.
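The parallel sampling can be sketched as two groups of non-overlapping neighbor pairs. Which group the slide draws as "solid" is not recoverable from the text, so the labelling below is an assumption, as are the names `pair_groups` and `sample_group` and the 0-based agent indices.

```python
import random

def pair_groups(N):
    """Split the N - 1 neighbouring pairs into two groups of disjoint pairs."""
    solid = [(i, i + 1) for i in range(0, N - 1, 2)]    # (0,1), (2,3), ...
    dotted = [(i, i + 1) for i in range(1, N - 1, 2)]   # (1,2), (3,4), ...
    return solid, dotted

def sample_group(N, rng):
    """Pick one of the two groups with probability 1/2 each."""
    solid, dotted = pair_groups(N)
    return solid if rng.random() < 0.5 else dotted

solid, dotted = pair_groups(5)
chosen = sample_group(5, random.Random(0))
```

Pairs within a group share no agent, so all of their updates can run at the same time slot without conflicting writes to any $U_i$.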
Numerical comparisons
Effect of ρ
10 000×100 000 matrix with N = 6.
[Figure: mean square error on the training set (log scale, $10^{-10}$ to $10^{0}$) versus every 10th update (0 to 40). Curves: agent 1, agent 2, and the distance between agents 1 and 2, each for $\rho = 10^{3}$ and $\rho = 10^{10}$.]
A very large ρ achieves only consensus among agents.
A tuned ρ achieves both completion and consensus.
Performance of online and parallel variants
10 000 × 100 000 matrix with N = 6 and $\rho = 10^{3}$.
[Figure: mean square error on the training set (log scale, $10^{-10}$ to $10^{0}$) versus every 10th update (0 to 40). Curves: agent 1, agent 2, and the distance between agents 1 and 2, for the online (Onl.) and parallel (Para.) variants.]
There is no loss of performance in parallelizing the updates.
Comparison with D-LMaFit
500 × 12 000 matrix with N = 6.
[Figure: mean square error on the training set (log scale, $10^{-12}$ to $10^{2}$) versus every 10th update (0 to 40). Curves: agent 1, agent 2, and the distance between agents 1 and 2, for online gossip and for D-LMaFit.]
D-LMaFit code is not scalable to large data.
Netflix data: different number of agents
Rank = 10
N = 2, 5, 10, 15, 20 agents
10 random 80/20 train/test splits (80 million / 20 million ratings)
Online gossip:
N           2      5      10     15     20
Test RMSE   0.877  0.885  0.891  0.894  0.900
Batch gradient descent algorithm RTRMC benchmark: 0.873.
[Boumal and Absil, LAA, 2015]
Netflix data: consensus of agents
[Figure: consensus of agents (log scale, $10^{-6}$ to $10^{2}$) versus every 10th update (0 to 50). Curves: distance between agents 1 and 2, agents 2 and 3, and agents 3 and 4.]
Netflix data: test RMSE with updates
[Figure: test RMSE (0.85 to 1.1) versus every 10th update (0 to 80). Curves: agent 1, agent 2, agent 3, and RTRMC-1.]
Summary and future work
We proposed a Riemannian gossip approach to the decentralized matrix completion problem.
We minimize a weighted sum of completion and consensus terms on the Grassmann manifold.
Numerical comparisons show good performance of the proposed algorithms, e.g., on the Netflix dataset.
Currently, we intend to explore asynchronous updating of agents on the Grassmann manifold.