
Riemannian gossip algorithms for decentralized matrix completion

Hiroyuki Kasai†, Bamdev Mishra‡, and Atul Saroop‡

†The University of Electro-Communications, Japan

‡Amazon Development Centre India, India

IEICE meeting 2016

Motivation

The matrix completion problem

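A partially observed m × n ratings matrix $X^\star$ (m movies, n users), where * marks a known entry and ? an unknown one:

? ? * ?
* * ? *
? * * ?
* ? * ?

Low-rank prior: $X^\star \approx UW^T$, where U (m × r) and W (n × r) are factor matrices. A rank-r matrix has only (n + m − r)r degrees of freedom, far fewer than its mn entries when r ≪ min(m, n). [Netflix Challenge, 2006]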
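As a quick check of the saving, the sketch below counts unknowns for a 10 000 × 100 000 matrix (the synthetic size used later in these slides). The rank r = 10 is an illustrative assumption borrowed from the Netflix experiments, and the function names are hypothetical.

```python
def low_rank_dof(m: int, n: int, r: int) -> int:
    """Degrees of freedom of an m x n matrix of rank r."""
    return (n + m - r) * r


def num_entries(m: int, n: int) -> int:
    """Number of entries of a dense m x n matrix."""
    return m * n


m, n, r = 10_000, 100_000, 10   # r = 10 is an illustrative choice
print(low_rank_dof(m, n, r))    # 1099900
print(num_entries(m, n))        # 1000000000
```

That is about 1.1 million unknowns instead of 10^9, roughly 0.1% of the entries.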

Kasai, Mishra, and Saroop Riemannian gossip 20 September 2016 2 / 25


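We focus on the decentralized scenario, where the users (columns) are split among agents:

? ? * ?
* * ? *
? * * ?
* ? * ?

The m movies are shared; agent 1 holds n_1 users with data $X^\star_1$, agent 2 holds n_2 users with data $X^\star_2$, and $U[W_1^T \; W_2^T] \approx [X^\star_1 \; X^\star_2]$.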

Each agent i has access to its own data matrix $X^\star_i$.

The matrix U is common across all the agents.


Contributions

We develop a nonlinear gossip algorithm with minimal communication between agents.

The optimization formulation is based on a weighted combination of matrix completion and consensus terms.

We develop a parallel variant of the proposed gossip algorithm.


The paper and code are available online at www.bamdevmishra.com.


Outline

Problem formulation on the Riemannian Grassmann manifold.

Proposed gossip algorithms.

Numerical comparisons on synthetic and Netflix data.


Problem formulation


Batch problem formulation

$$\min_{U \in \mathrm{St}(r,m)} \; \min_{W \in \mathbb{R}^{n \times r}} \; \|P_\Omega(UW^T) - P_\Omega(X^\star)\|_F^2.$$

Here $W \in \mathbb{R}^{n \times r}$ and $U \in \mathrm{St}(r,m)$, the set of m × r matrices with orthonormal columns.

$P_\Omega$ is the sampling operator: it keeps the known entries (indexed by the set Ω) and sets all other entries to zero.
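A minimal NumPy sketch of the batch cost, assuming Ω is stored as a boolean mask of the same shape as $X^\star$; `sample` and `batch_cost` are hypothetical helper names and the data are random synthetic values. Because $P_\Omega$ is linear, $P_\Omega(UW^T) - P_\Omega(X^\star) = P_\Omega(UW^T - X^\star)$, which is what the code evaluates.

```python
import numpy as np


def sample(mask, Z):
    """P_Omega: keep the entries where mask is True, set the others to zero."""
    return np.where(mask, Z, 0.0)


def batch_cost(U, W, X, mask):
    """||P_Omega(U W^T) - P_Omega(X)||_F^2 for U (m x r) and W (n x r)."""
    residual = sample(mask, U @ W.T - X)   # P_Omega is linear
    return float(np.sum(residual ** 2))


rng = np.random.default_rng(0)
m, n, r = 6, 8, 2
U_true, _ = np.linalg.qr(rng.standard_normal((m, r)))  # point on St(r, m)
W_true = rng.standard_normal((n, r))
X = U_true @ W_true.T                                  # synthetic rank-2 "ratings"
mask = rng.random((m, n)) < 0.6                        # Omega: about 60% of entries known

print(batch_cost(U_true, W_true, X, mask))             # 0.0 for the true factors
```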


Eliminate W

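$$\min_{U \in \mathrm{St}(r,m)} \; \min_{W \in \mathbb{R}^{n \times r}} \; \|P_\Omega(UW^T) - P_\Omega(X^\star)\|_F^2 \;\equiv\; \min_{U \in \mathrm{St}(r,m)} f(U, W_U),$$

a Grassmann optimization problem.

For fixed U, the inner minimization over W is a least-squares problem; solve it in closed form to obtain $W_U$.

The final optimization problem is on the Grassmann manifold, i.e., the variable is the column space of U: replacing U by UO for any r × r orthogonal matrix O leaves the cost unchanged, because $W_U$ is simply replaced by $W_U O$.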
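The closed-form step can be sketched as follows, assuming plain (unregularized) least squares per column; `solve_W` and `f` are hypothetical names, and `np.linalg.lstsq` returns the minimum-norm solution if a column has fewer than r known entries. The last line checks numerically that f(U) = f(UO) for an orthogonal O, which is why the variable lives on the Grassmann manifold.

```python
import numpy as np


def solve_W(U, X, mask):
    """Closed-form W_U: for each column j, least squares on the observed rows."""
    n, r = X.shape[1], U.shape[1]
    W = np.zeros((n, r))
    for j in range(n):
        rows = mask[:, j]                          # known entries of column j
        W[j], *_ = np.linalg.lstsq(U[rows], X[rows, j], rcond=None)
    return W


def f(U, X, mask):
    """Cost after eliminating W: ||P_Omega(U W_U^T) - P_Omega(X)||_F^2."""
    W = solve_W(U, X, mask)
    return float(np.sum(np.where(mask, U @ W.T - X, 0.0) ** 2))


rng = np.random.default_rng(1)
m, n, r = 8, 10, 2
A = rng.standard_normal((m, r))
X = A @ rng.standard_normal((r, n))                # synthetic rank-2 data
mask = rng.random((m, n)) < 0.7
mask[:r, :] = True                                 # demo only: >= r known entries per column

U, _ = np.linalg.qr(rng.standard_normal((m, r)))   # arbitrary point on St(r, m)
O, _ = np.linalg.qr(rng.standard_normal((r, r)))   # r x r orthogonal matrix
print(np.isclose(f(U, X, mask), f(U @ O, X, mask)))  # True: f depends only on span(U)
```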

[Boumal and Absil, LAA, 2015]


Decentralized problem formulation

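$X^\star = [X^\star_1, X^\star_2, \ldots, X^\star_N]$.

$$\min_{U \in \mathrm{St}(r,m),\, W_i \in \mathbb{R}^{n_i \times r}} \sum_i \frac{1}{2}\|P_{\Omega_i}(UW_i^T) - P_{\Omega_i}(X^\star_i)\|_F^2 = \min_{U \in \mathrm{St}(r,m)} \frac{1}{2}\sum_i \|P_{\Omega_i}(UW_{iU}^T) - P_{\Omega_i}(X^\star_i)\|_F^2,$$

where $W_{iU}$ is computed by agent i independently.

Although the data are distributed, we still need to learn a common U.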
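Because $W_U$ decouples over columns, the decentralized cost with a common U equals the batch cost (with the factor 1/2). The sketch below, under the same unregularized least-squares assumption, splits 12 synthetic users among N = 3 hypothetical agents and checks that the sum of the locally computed costs matches the centralized one; all names are illustrative.

```python
import numpy as np


def solve_W(U, X, mask):
    """Least-squares W_U, one row per column of X (uses only observed entries)."""
    W = np.zeros((X.shape[1], U.shape[1]))
    for j in range(X.shape[1]):
        rows = mask[:, j]
        W[j], *_ = np.linalg.lstsq(U[rows], X[rows, j], rcond=None)
    return W


def agent_cost(U, X_i, mask_i):
    """0.5 * ||P_Omega_i(U W_iU^T) - P_Omega_i(X_i)||_F^2, computed by agent i alone."""
    W_i = solve_W(U, X_i, mask_i)
    return 0.5 * float(np.sum(np.where(mask_i, U @ W_i.T - X_i, 0.0) ** 2))


rng = np.random.default_rng(2)
m, n, r = 8, 12, 2
X = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
mask = rng.random((m, n)) < 0.7
mask[:r, :] = True                                  # demo only: >= r known entries per column
U, _ = np.linalg.qr(rng.standard_normal((m, r)))    # common U shared by all agents

# Hypothetical split of the 12 users (columns) among N = 3 agents: X = [X_1, X_2, X_3].
splits = np.split(np.arange(n), 3)
total = sum(agent_cost(U, X[:, c], mask[:, c]) for c in splits)
print(np.isclose(total, agent_cost(U, X, mask)))    # True: sum equals the centralized cost
```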


We add a consensus term to our optimization formulation

Key idea: introduce a local copy $U_i$ of U for each of the N agents, and encourage the copies to reach consensus.

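$$\min_{U_1,\ldots,U_N \in \mathrm{St}(r,m)} \underbrace{\frac{1}{2}\sum_i \|P_{\Omega_i}(U_i W_{iU_i}^T) - P_{\Omega_i}(X^\star_i)\|_F^2}_{\text{completion task handled by agent } i} + \underbrace{\frac{\rho}{2}\left(d(U_1,U_2)^2 + d(U_2,U_3)^2 + \cdots + d(U_{N-1},U_N)^2\right)}_{\text{consensus among agents}}.$$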

d is the Riemannian distance on the Grassmann manifold.

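The weight ρ > 0 trades off completion against consensus; a larger ρ puts more emphasis on consensus.

Minimizing only the consensus term gives $U_1 = U_2 = \cdots = U_{N-1} = U_N$ (as points on the Grassmann manifold).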
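One standard way to compute d is through principal angles: if $U_1$ and $U_2$ have orthonormal columns, the singular values of $U_1^T U_2$ are the cosines of the principal angles θ_k, and $d(U_1, U_2) = \|\theta\|_2$. A minimal sketch, assuming this geodesic distance; the function names are hypothetical.

```python
import numpy as np


def grassmann_dist(U1, U2):
    """Geodesic distance between span(U1) and span(U2) (orthonormal columns assumed).

    The singular values of U1^T U2 are the cosines of the principal angles;
    the distance is the 2-norm of the vector of angles. (arccos loses accuracy
    for very small angles, which is acceptable for a sketch.)
    """
    cosines = np.linalg.svd(U1.T @ U2, compute_uv=False)
    theta = np.arccos(np.clip(cosines, 0.0, 1.0))
    return float(np.linalg.norm(theta))


def consensus_term(Us, rho):
    """(rho / 2) * sum over neighbors of d(U_i, U_{i+1})^2 for agents on a line."""
    return 0.5 * rho * sum(grassmann_dist(a, b) ** 2 for a, b in zip(Us[:-1], Us[1:]))


e1 = np.array([[1.0], [0.0]])
e2 = np.array([[0.0], [1.0]])
print(grassmann_dist(e1, e2))              # about 1.5708 (pi / 2): orthogonal lines in R^2
print(consensus_term([e1, e2, e1], 2.0))   # about 4.9348 (2 * (pi / 2)**2)
```

Since d depends only on the column spaces, d(U, UO) = 0 for any orthogonal O, which matches the Grassmann viewpoint.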


Proposed Riemannian gossip algorithms


Riemannian online gossip on Grassmann

1 Agents i and i + 1 are neighbors for all i ≤ N − 1 (an ordering of the agents).

2 At each time slot t, we pick an agent i ≤ N − 1 uniformly at random (SGD updates).

This also picks agent i + 1, the neighbor of agent i.

Agents i and i + 1 update $U_i$ and $U_{i+1}$, respectively, by taking a gradient descent step with stepsize $\gamma_t$ on the Grassmann manifold, where $\sum_t \gamma_t^2 < \infty$ and $\sum_t \gamma_t = +\infty$.
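The update rule can be sketched in NumPy under several assumptions that do not come from the slides: agents are 0-indexed; each agent's completion gradient is the projected Euclidean gradient $(I - UU^T)P_\Omega(UW_U^T - X^\star)W_U$ (valid when $W_U$ is the unique least-squares solution); the consensus term contributes $-\rho\,\mathrm{Log}_{U_i}(U_{i+1})$, the gradient of $\frac{\rho}{2}d(U_i, U_{i+1})^2$; the retraction is a QR factorization; and $\gamma_t = \gamma_0/(1 + 0.01t)$, which satisfies both stepsize conditions. All function names are hypothetical, and the paper's exact weighting of the terms may differ.

```python
import numpy as np


def solve_W(U, X, mask):
    """Least-squares W_U from the observed entries of each column of X."""
    W = np.zeros((X.shape[1], U.shape[1]))
    for j in range(X.shape[1]):
        rows = mask[:, j]
        W[j], *_ = np.linalg.lstsq(U[rows], X[rows, j], rcond=None)
    return W


def completion_rgrad(U, X, mask):
    """Riemannian gradient of 0.5 * ||P_Omega(U W_U^T - X)||_F^2 (assumed form)."""
    W = solve_W(U, X, mask)
    R = np.where(mask, U @ W.T - X, 0.0)   # sampled residual
    G = R @ W                              # Euclidean gradient in U at W = W_U
    return G - U @ (U.T @ G)               # project onto the horizontal space


def grassmann_log(U, V):
    """Log map of span(V) at span(U); assumes U^T V is invertible."""
    M = (V - U @ (U.T @ V)) @ np.linalg.inv(U.T @ V)
    Q, s, Rt = np.linalg.svd(M, full_matrices=False)
    return Q @ np.diag(np.arctan(s)) @ Rt


def retract(U, xi):
    """QR-based retraction: orthonormal basis of span(U + xi)."""
    Q, _ = np.linalg.qr(U + xi)
    return Q


def gossip_step(Us, data, rho, gamma, rng):
    """Pick pair (i, i+1) uniformly at random and update both copies."""
    i = int(rng.integers(len(Us) - 1))     # each pair has probability 1/(N-1)
    j = i + 1
    g_i = completion_rgrad(Us[i], *data[i]) - rho * grassmann_log(Us[i], Us[j])
    g_j = completion_rgrad(Us[j], *data[j]) - rho * grassmann_log(Us[j], Us[i])
    Us[i], Us[j] = retract(Us[i], -gamma * g_i), retract(Us[j], -gamma * g_j)
    return i


rng = np.random.default_rng(3)
m, r, N = 10, 2, 4
A = rng.standard_normal((m, r))            # synthetic shared column space
data = []
for _ in range(N):                         # each agent owns 15 synthetic users
    X_i = A @ rng.standard_normal((r, 15))
    mask_i = rng.random((m, 15)) < 0.6
    mask_i[:r, :] = True                   # demo only: >= r known entries per column
    data.append((X_i, mask_i))

Us = [np.linalg.qr(rng.standard_normal((m, r)))[0] for _ in range(N)]
for t in range(300):
    gossip_step(Us, data, rho=1.0, gamma=0.1 / (1 + 0.01 * t), rng=rng)
```

Both gradients are evaluated before either copy moves, so the two agents update simultaneously from the same state. The sketch only illustrates the mechanics of one gossip step and makes no claim about convergence speed.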


A graphical illustration

[Figure: agents 1, 2, 3, ..., N − 1, N on a line, driven by a universal clock; at each tick, each neighboring pair (i, i + 1) is chosen with probability 1/(N − 1).]


Convergence of Riemannian online gossip

Asymptotic convergence follows from the standard analysis of SGD on manifolds.

The proposed algorithm is readily implementable, e.g., with the toolbox Manopt.

[Bonnabel, IEEE TAC, 2013; Absil, Mahony, and Sepulchre, Princeton University Press, 2008; Boumal et al., JMLR, 2014]


Parallelizing Riemannian gossip with particular sampling

[Figure: agents 1 to 5 on a line, driven by a universal clock; the neighboring pairs form a "solid" group and a "dotted" group.]

"Solid" pairs are chosen at the same time.

"Dotted" pairs are chosen at the same time.

The "dotted" and "solid" groups are each chosen with probability 1/2.

Convergence guarantees remain the same.

Since roughly (N − 1)/2 pairs are updated simultaneously, this is (N − 1)/2 times faster.
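The slides do not say which pairs are "solid" and which are "dotted"; one natural choice, assumed here, puts every other pair in each group so that no agent belongs to two pairs updated at the same time. A pure-Python sketch with hypothetical names:

```python
import random


def pair_groups(N):
    """Split the N - 1 neighboring pairs (1-indexed agents on a line) into two groups.

    Assumption (not stated on the slide): "solid" = (1, 2), (3, 4), ...
    and "dotted" = (2, 3), (4, 5), ..., so pairs within a group share no agent.
    """
    pairs = [(i, i + 1) for i in range(1, N)]
    return pairs[0::2], pairs[1::2]


def parallel_tick(N, rng):
    """At each clock tick, pick the solid or the dotted group with probability 1/2."""
    solid, dotted = pair_groups(N)
    return solid if rng.random() < 0.5 else dotted


solid, dotted = pair_groups(5)
print(solid)    # [(1, 2), (3, 4)]
print(dotted)   # [(2, 3), (4, 5)]
print(parallel_tick(5, random.Random(0)))
```

Within a group, the pair updates touch disjoint agents, so they can run in parallel; with about (N − 1)/2 pairs per group, one tick does roughly the work of (N − 1)/2 sequential gossip steps.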

Numerical comparisons


Effect of ρ

10 000 × 100 000 matrix with N = 6.

[Plot: mean square error on the training set (log scale, 10^-10 to 10^0) versus every 10th update, for agent 1, agent 2, and the distance between agents 1 and 2, with ρ = 10^3 and ρ = 10^10.]

A very large ρ achieves only consensus among agents. A tuned ρ achieves both completion and consensus.


Performance of online and parallel variants

10 000 × 100 000 matrix with N = 6 and ρ = 10^3.

[Plot: mean square error on the training set (log scale, 10^-10 to 10^0) versus every 10th update, for agents 1 and 2 and the distance between them, under the online (Onl.) and parallel (Para.) variants.]

There is no loss of performance in parallelizing the updates.


Comparison with D-LMaFit

500 × 12 000 matrix with N = 6.

[Plot: mean square error on the training set (log scale, 10^-12 to 10^2) versus every 10th update, for agents 1 and 2 and the distance between them, under online gossip and D-LMaFit.]

The D-LMaFit code does not scale to large data.


Netflix data: different number of agents

Rank = 10

N = 2, 5, 10, 15, 20 agents

10 random train/test splits of 80/20 million ratings

Online gossip, test RMSE:

N           2      5      10     15     20
Test RMSE   0.877  0.885  0.891  0.894  0.900

Benchmark: the batch gradient descent algorithm RTRMC achieves a test RMSE of 0.873.

[Boumal and Absil, LAA, 2015]


Netflix data: consensus of agents

[Plot: consensus of agents (distance between agents, log scale, 10^-6 to 10^2) versus every 10th update, for agent pairs 1-2, 2-3, and 3-4.]


Netflix data: test RMSE with updates

[Plot: test RMSE (0.85 to 1.1) versus every 10th update for agents 1, 2, and 3, compared with RTRMC-1.]


Summary and future work


We proposed a Riemannian gossip approach to the decentralized matrix completion problem.

We minimize a weighted sum of completion and consensus terms on the Grassmann manifold.

Numerical comparisons show good performance of the proposed algorithms, e.g., on the Netflix dataset.

As future work, we intend to explore asynchronous updates of agents on the Grassmann manifold.
