
NOMAD: Non-locking, stOchastic Multi-machine algorithm for Asynchronous and Decentralized matrix factorization

Scaling by Exploiting Structure

S.V.N. (vishy) Vishwanathan

Purdue University and Amazon
vishy@purdue.edu,amazon.com

July 1st, 2013

S.V. N. Vishwanathan (Purdue and Amazon) Decentralized Optimization for ML 1 / 25

Regularized risk minimization

Machine Learning
- We want to build a model which predicts well on data
- A model's performance is quantified by a loss function: a sophisticated discrepancy score
- Our model must generalize to unseen data
- Avoid over-fitting by penalizing complex models (Regularization)

More Formally
- Training data: x_1, ..., x_m
- Labels: y_1, ..., y_m
- Learn a vector: w

\[
\min_w \; J(w) := \underbrace{\lambda \sum_{j=1}^{d} \phi_j(w_j)}_{\text{Regularizer}} \;+\; \underbrace{\frac{1}{m} \sum_{i=1}^{m} \ell(\langle w, x_i\rangle, y_i)}_{\text{Risk } R_{\mathrm{emp}}}
\]
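The objective J(w) can be evaluated directly. The sketch below assumes an L2 regularizer phi_j(w_j) = w_j^2 and the logistic loss; both are illustrative choices, not fixed by the slide.

```python
import numpy as np

def regularized_risk(w, X, y, lam):
    """J(w) = lam * sum_j phi_j(w_j) + (1/m) * sum_i l(<w, x_i>, y_i),
    with phi_j(w_j) = w_j**2 and the logistic loss as illustrative choices."""
    margins = X @ w                           # <w, x_i> for all i at once
    risk = np.log1p(np.exp(-y * margins))     # logistic loss per example
    return lam * np.sum(w ** 2) + risk.mean()

# Tiny sanity check: with w = 0 every margin is 0, the regularizer vanishes,
# and the empirical risk equals log(2).
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, -1.0, 1.0])
J = regularized_risk(np.zeros(2), X, y, lam=0.1)
```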


NOMAD for Matrix Completion

Outline

1 NOMAD for Matrix Completion

2 NOMAD for Regularized Risk Minimization


Collaborative filtering

[Figure: a 6-user × 10-movie ratings matrix (U1–U6 vs. M1–M10); observed entries are marked with ✓ or ✗, and most entries are missing.]

Matrix completion

[Figure: A ≈ W H, the ratings matrix approximated by the product of two thin low-rank factors.]

Matrix completion

\[
\min_{W \in \mathbb{R}^{m \times k},\; H \in \mathbb{R}^{n \times k}} f(W, H), \qquad
f(W, H) = \frac{1}{2} \sum_{(i,j) \in \Omega} \Bigl[\, \underbrace{\bigl(A_{ij} - w_i^\top h_j\bigr)^2}_{\text{loss}} + \underbrace{\lambda \bigl(\lVert w_i \rVert^2 + \lVert h_j \rVert^2\bigr)}_{\text{Regularizer}} \,\Bigr].
\]
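As a concrete check, here is a direct evaluation of f(W, H); the regularizer is read as applied per observed entry, one common convention (the slide's grouping is ambiguous).

```python
import numpy as np

def completion_objective(A, W, H, omega, lam):
    """f(W,H) = 1/2 * sum over (i,j) in Omega of
       (A_ij - <w_i, h_j>)**2 + lam * (||w_i||**2 + ||h_j||**2),
    with the regularizer applied per observed entry (one common reading)."""
    total = 0.0
    for i, j in omega:
        resid = A[i, j] - W[i] @ H[j]
        total += resid ** 2 + lam * (W[i] @ W[i] + H[j] @ H[j])
    return 0.5 * total

# A rank-1 matrix with its exact factors gives zero unregularized loss:
A = np.outer([1.0, 2.0], [3.0, 4.0])          # A = W H^T exactly
W = np.array([[1.0], [2.0]])                   # the w_i are rows of W (k = 1)
H = np.array([[3.0], [4.0]])
omega = [(0, 0), (0, 1), (1, 0), (1, 1)]
f0 = completion_objective(A, W, H, omega, lam=0.0)
```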

Stochastic approximation

\[
f(W,H) \approx f_n(W,H) = \frac{1}{2}\bigl(A_{ij} - w_i^\top h_j\bigr)^2 + \lambda\bigl(\lVert w_i\rVert^2 + \lVert h_j\rVert^2\bigr)
\]

\[
\nabla_{w_{i'}} f_n(W,H) =
\begin{cases}
\lambda w_i - \bigl(A_{ij} - w_i^\top h_j\bigr)\, h_j, & \text{for } i' = i \\
0, & \text{otherwise}
\end{cases}
\]

\[
\nabla_{h_{j'}} f_n(W,H) =
\begin{cases}
\lambda h_j - \bigl(A_{ij} - w_i^\top h_j\bigr)\, w_i, & \text{for } j' = j \\
0, & \text{otherwise}
\end{cases}
\]

Stochastic updates

\[
w_i \leftarrow w_i - \eta\bigl(\lambda w_i - \bigl(A_{ij} - w_i^\top h_j\bigr) h_j\bigr)
\]
\[
h_j \leftarrow h_j - \eta\bigl(\lambda h_j - \bigl(A_{ij} - w_i^\top h_j\bigr) w_i\bigr)
\]
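A minimal NumPy sketch of one such stochastic update (step size and matrix are illustrative; the residual is computed once, before either factor moves):

```python
import numpy as np

def sgd_step(A, W, H, i, j, eta, lam):
    """One stochastic step on the observed entry (i, j):
       w_i <- w_i - eta * (lam * w_i - (A_ij - <w_i, h_j>) h_j)
       h_j <- h_j - eta * (lam * h_j - (A_ij - <w_i, h_j>) w_i)"""
    resid = A[i, j] - W[i] @ H[j]
    wi = W[i].copy()                 # h_j's update must see the pre-step w_i
    W[i] -= eta * (lam * W[i] - resid * H[j])
    H[j] -= eta * (lam * H[j] - resid * wi)

# On a 1x1 example a single step shrinks the residual:
A = np.array([[2.0]])
W = np.array([[0.5]])
H = np.array([[0.5]])
sgd_step(A, W, H, 0, 0, eta=0.1, lam=0.0)
new_resid = float(A[0, 0] - W[0] @ H[0])    # initial residual was 1.75
```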

Decoupling the updates [Gemulla et al., KDD 2011]

[Figure: the ratings matrix is split into a grid of blocks; each worker processes a block that shares no rows or columns with the blocks of the other workers, so their updates proceed in parallel.]
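The reason such blocks can be processed without locking is that any two blocks scheduled together share no rows and no columns, so their updates touch disjoint w_i and h_j. A small self-contained check (the 4-worker grid is a made-up illustration):

```python
# Workers own fixed row blocks; column blocks rotate. Any two blocks active
# in the same sub-epoch share no rows and no columns, so their SGD updates
# touch disjoint w_i and h_j -- no locks are needed.
p = 4                                          # number of workers / blocks
for shift in range(p):                         # one sub-epoch per "diagonal"
    active = [(b, (b + shift) % p) for b in range(p)]
    for a, (r1, c1) in enumerate(active):
        for r2, c2 in active[a + 1:]:
            assert r1 != r2 and c1 != c2       # disjoint rows and columns
```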

Decoupling the updates [Gemulla et al., KDD 2011]

After each set of blocks, the workers synchronize and communicate before moving on to the next set.


Some Observations (also see Gemulla et al., ICDM 2012)

The good
- Updates are decoupled and easy to parallelize
- Easy to implement using map-reduce

The bad
- Communication and computation are interleaved: when the network is active the CPU is idle, and when the CPU is active the network is idle


Question: Can we keep CPU and network simultaneously busy?


Our answer

Non-locking, stOchastic Multi-machine algorithm for Asynchronous and Decentralized matrix factorization (NOMAD)

Illustration of NOMAD communication

[Animation: blocks of the matrix migrate between workers step by step; at every instant each worker is computing on the blocks it currently owns while other blocks are in flight, so computation and communication overlap.]
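The communication pattern can be mimicked with a toy token-passing simulation (entirely our own sketch, not the paper's code): column variables circulate through per-worker queues with no global barrier, and with enough steps every column gets updated by every worker.

```python
import random
from collections import deque

random.seed(0)
workers, columns, steps = 4, 8, 4000
queues = [deque() for _ in range(workers)]
for j in range(columns):                   # deal the columns out round-robin
    queues[j % workers].append(j)

visited = {j: set() for j in range(columns)}
for step in range(steps):
    w = step % workers                     # every worker keeps making progress
    if queues[w]:
        j = queues[w].popleft()
        visited[j].add(w)                  # "update h_j against worker w's rows"
        queues[random.randrange(workers)].append(j)   # forward it, no barrier

# Almost surely, every column has by now visited every worker.
all_seen = all(len(v) == workers for v in visited.values())
```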

Eventually . . .

[Figure: the final frame of the communication animation; the circulating blocks have spread across all workers.]

Experiments: Netflix

Size: 2,649,429 × 17,770, nnz = 99 million

[Plot: speedup vs. number of machines (0 to 30) on Netflix.]

Experiments: Netflix

[Plots: test RMSE vs. elapsed secs × num processors; each panel compares a single processor against 2, 4, 8, 15, and 30 processors.]

Experiments: Yahoo! music

Size: 1,000,990 × 624,961, nnz = 252 million

[Plot: speedup vs. number of machines (0 to 30) on Yahoo! music.]

Experiments: Yahoo! music

[Plots: test RMSE vs. elapsed secs × num processors; each panel compares a single processor against 2, 4, 8, 15, and 30 processors.]

Experiments: Synthetic Data

Size: 5,000,000 × 200,000, nnz = 270 million

[Plot: speedup vs. number of machines (0 to 30) on synthetic data.]

Experiments: Synthetic data

[Plots: test RMSE vs. elapsed secs × num processors; each panel compares a single processor against 2, 4, 8, and 15 processors.]

NOMAD for Regularized Risk Minimization

Outline

1 NOMAD for Matrix Completion

2 NOMAD for Regularized Risk Minimization


Problem Formulation

\[
\min_w \; \lambda \sum_{j=1}^{d} \phi_j(w_j) + \frac{1}{m} \sum_{i=1}^{m} \ell(\langle w, x_i\rangle, y_i)
\]

Introduce auxiliary variables u_i:

\[
\min_{w,u} \; \lambda \sum_{j=1}^{d} \phi_j(w_j) + \frac{1}{m} \sum_{i=1}^{m} \ell(u_i)
\quad \text{subject to } u_i = \langle w, x_i\rangle, \; i = 1, \ldots, m
\]

Form the Lagrangian with multipliers α_i:

\[
\min_{w,u} \max_{\alpha} \; \lambda \sum_{j=1}^{d} \phi_j(w_j) + \frac{1}{m} \sum_{i=1}^{m} \ell(u_i) + \frac{1}{m} \sum_{i=1}^{m} \alpha_i\bigl(u_i - \langle w, x_i\rangle\bigr)
\]

Swap the min and the max:

\[
\max_{\alpha} \min_{w,u} \; \lambda \sum_{j=1}^{d} \phi_j(w_j) + \frac{1}{m} \sum_{i=1}^{m} \ell(u_i) + \frac{1}{m} \sum_{i=1}^{m} \alpha_i\bigl(u_i - \langle w, x_i\rangle\bigr)
\]

The minimization over u separates:

\[
\max_{\alpha} \min_{w} \; \lambda \sum_{j=1}^{d} \phi_j(w_j) - \frac{1}{m} \sum_{i=1}^{m} \alpha_i \langle w, x_i\rangle + \frac{1}{m} \min_u \sum_{i=1}^{m} \bigl(\ell(u_i) + \alpha_i u_i\bigr)
\]

In terms of the Fenchel conjugate \(\ell^*\):

\[
\max_{\alpha} \min_{w} \; \lambda \sum_{j=1}^{d} \phi_j(w_j) - \Bigl\langle w, \frac{1}{m} \sum_{i=1}^{m} \alpha_i x_i \Bigr\rangle + \frac{1}{m} \sum_{i=1}^{m} \ell^*(-\alpha_i)
\]

Finally, exploit sparsity by writing the objective as a sum over the nonzero entries (Ω_i are the nonzero coordinates of x_i, and Ω_j the examples with x_ij ≠ 0):

\[
\max_{\alpha} \min_{w} \; \sum_{i=1}^{m} \sum_{j \in \Omega_i} \Bigl( \frac{\lambda}{\lvert\Omega_j\rvert}\, \phi_j(w_j) - \frac{1}{m}\, \alpha_i w_j x_{ij} + \frac{1}{m\,\lvert\Omega_i\rvert}\, \ell^*(-\alpha_i) \Bigr)
\]
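The step that eliminates u relies on the Fenchel conjugate l*(s) = sup_u (s·u − l(u)). For the squared loss l(u) = ½(u − y)², the conjugate has the closed form l*(s) = s·y + ½s², which a brute-force grid maximization confirms (the loss choice and the grid are illustrative):

```python
import numpy as np

y = 1.5                                        # an arbitrary label
loss = lambda u: 0.5 * (u - y) ** 2            # squared loss
conj = lambda s: s * y + 0.5 * s ** 2          # closed-form conjugate l*(s)

u = np.linspace(-10.0, 10.0, 200001)           # fine grid around the optimum
for s in (-2.0, -0.5, 0.0, 1.0):
    numeric = np.max(s * u - loss(u))          # brute-force sup_u (s*u - l(u))
    assert abs(numeric - conj(s)) < 1e-6
```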

Stochastic Gradients

\[
\partial_{w_j} J(w, \alpha) = \lvert\Omega\rvert \Bigl( \frac{\lambda}{\lvert\Omega_j\rvert}\, \partial\phi_j(w_j) - \frac{1}{m}\, \alpha_i x_{ij} \Bigr)
\]
\[
\partial_{\alpha_i} J(w, \alpha) = \lvert\Omega\rvert \Bigl( -\frac{1}{m}\, w_j x_{ij} + \frac{1}{m\,\lvert\Omega_i\rvert}\, \partial_{\alpha_i} \ell^*(-\alpha_i) \Bigr).
\]
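A sketch of one stochastic primal-dual step on a single nonzero x_ij, following the gradients above: descend in the primal variable w_j, ascend in the dual variable α_i. The squared loss, for which d/dα l*(−α) = α − y, the quadratic regularizer, and all sizes below are our illustrative assumptions:

```python
import numpy as np

def primal_dual_step(w, alpha, X, y, i, j, eta, lam, n_omega, row_nnz, col_nnz):
    """One step on nonzero (i, j): descend in w_j, ascend in alpha_i, using
    the slide's stochastic gradients with phi_j(w) = 0.5*w**2 and the squared
    loss, for which d/dalpha l*(-alpha) = alpha - y (our example choices)."""
    m = len(y)
    g_w = n_omega * (lam / col_nnz[j] * w[j] - alpha[i] * X[i, j] / m)
    g_a = n_omega * (-w[j] * X[i, j] / m + (alpha[i] - y[i]) / (m * row_nnz[i]))
    w[j] -= eta * g_w          # primal descent
    alpha[i] += eta * g_a      # dual ascent

# Deterministic toy step (all numbers made up):
X = np.array([[1.0, 2.0], [0.0, 3.0]])
y = np.array([1.0, -1.0])
w = np.zeros(2)
alpha = np.zeros(2)
primal_dual_step(w, alpha, X, y, i=0, j=0, eta=0.1, lam=0.01,
                 n_omega=3, row_nnz=np.array([2, 1]), col_nnz=np.array([1, 2]))
```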

Decoupling the updates

[Figure: the data matrix is again partitioned into blocks; workers update disjoint sets of coordinates w_j and α_i in parallel.]

It Converges!

[Plot: dual gap (log scale, 10⁻² to 10⁻¹) vs. number of iterations (0 to 100) on the KDDA Test Dataset, comparing updates on 10 percent of coordinates per iteration against all coordinates.]

Joint work with

[Diagram: system architecture. A reading thread streams the dataset from disk (sequential access) into a cached working set in RAM (random-access loads), while a training thread reads the cached data (random access) and updates the weight vector in RAM.]

S.V. N. Vishwanathan (Purdue and Amazon) Decentralized Optimization for ML 25 / 25