NOMAD: Non-locking, stOchastic Multi-machine algorithm for Asynchronous and Decentralized matrix factorization
Scaling by Exploiting Structure
S.V.N. (vishy) Vishwanathan
Purdue University and Amazon
vishy@{purdue.edu,amazon.com}
July 1st, 2013
S.V.N. Vishwanathan (Purdue and Amazon), Decentralized Optimization for ML

Regularized risk minimization

Machine Learning
We want to build a model that predicts well on data.
A model's performance is quantified by a loss function
(a sophisticated discrepancy score).
Our model must generalize to unseen data.
Avoid over-fitting by penalizing complex models (regularization).

More Formally
Training data: x_1, ..., x_m
Labels: y_1, ..., y_m
Learn a vector: w

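$$\min_{w} \; J(w) := \lambda \underbrace{\sum_{j=1}^{d} \phi_j(w_j)}_{\text{Regularizer}} + \underbrace{\frac{1}{m} \sum_{i=1}^{m} l\big(\langle w, x_i \rangle, y_i\big)}_{\text{Risk } R_{\mathrm{emp}}}$$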

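As a concrete instance of J(w), the sketch below assumes the quadratic regularizer φ_j(w_j) = w_j²/2 and the hinge loss l(u, y) = max(0, 1 − y·u). Both choices and the function name `regularized_risk` are illustrative assumptions; the talk keeps φ_j and l general.

```python
from typing import Callable, Sequence


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wj * xj for wj, xj in zip(w, x))


def regularized_risk(
    w: Sequence[float],
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    lam: float,
    phi: Callable[[float], float] = lambda t: 0.5 * t * t,  # assumed regularizer
    loss: Callable[[float, float], float] = lambda u, yi: max(0.0, 1.0 - yi * u),  # assumed hinge loss
) -> float:
    """J(w) = lam * sum_j phi(w_j) + (1/m) * sum_i loss(<w, x_i>, y_i)."""
    m = len(X)
    regularizer = lam * sum(phi(wj) for wj in w)
    empirical_risk = sum(loss(dot(w, xi), yi) for xi, yi in zip(X, y)) / m
    return regularizer + empirical_risk


# The first example sits exactly on the margin, the second is misclassified.
print(regularized_risk([1.0, -1.0], [[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0], lam=0.1))
```

Swapping `phi` or `loss` (for example, `abs` for an L1 regularizer) changes the model without touching the structure of J.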
NOMAD for Matrix Completion

Outline

1 NOMAD for Matrix Completion

2 NOMAD for Regularized Risk Minimization

Collaborative filtering

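[Figure: a partially observed ratings matrix with users U1 to U6 as rows and movies M1 to M10 as columns; each observed entry is a like (✓) or a dislike (✗), and the remaining entries are unknown.]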

Matrix completion

$$A \approx W H^\top$$

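$$\min_{W \in \mathbb{R}^{m \times k},\ H \in \mathbb{R}^{n \times k}} f(W, H), \qquad f(W, H) = \frac{1}{2} \sum_{(i,j) \in \Omega} \Big( \underbrace{\big(A_{ij} - w_i^\top h_j\big)^2}_{\text{loss}} + \underbrace{\lambda \big( \|w_i\|^2 + \|h_j\|^2 \big)}_{\text{Regularizer}} \Big),$$

where w_i^T is the i-th row of W, h_j^T is the j-th row of H, and Ω is the set of observed entries of A.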

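A minimal sketch of evaluating f(W, H), assuming the observed set Ω is stored as a Python dict mapping (i, j) to A_ij and the factors as lists of rows; the names `Ratings` and `objective` are hypothetical. Only observed entries are visited, so one evaluation costs O(|Ω| k).

```python
from typing import Dict, List, Tuple

Ratings = Dict[Tuple[int, int], float]  # observed entries: (i, j) -> A_ij


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def objective(W: List[List[float]], H: List[List[float]], ratings: Ratings, lam: float) -> float:
    """f(W, H) = 1/2 * sum over (i, j) in Omega of (A_ij - w_i.h_j)^2 + lam * (|w_i|^2 + |h_j|^2)."""
    total = 0.0
    for (i, j), a_ij in ratings.items():
        w_i, h_j = W[i], H[j]
        residual = a_ij - dot(w_i, h_j)
        total += residual ** 2 + lam * (dot(w_i, w_i) + dot(h_j, h_j))
    return 0.5 * total


W = [[1.0, 0.0]]  # m = 1 user, k = 2
H = [[1.0, 1.0]]  # n = 1 item
print(objective(W, H, {(0, 0): 2.0}, lam=0.1))
```

As written, the regularizer is counted once per observed entry, so users and items with many ratings are penalized more heavily.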
Stochastic approximation

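For a single observed entry (i, j) drawn uniformly at random from Ω:

$$f(W, H) \approx f_n(W, H) = \frac{1}{2} \Big( \big(A_{ij} - w_i^\top h_j\big)^2 + \lambda \big( \|w_i\|^2 + \|h_j\|^2 \big) \Big)$$

$$\nabla_{w_{i'}} f_n(W, H) = \begin{cases} -\big(A_{ij} - w_i^\top h_j\big)\, h_j + \lambda w_i & \text{for } i = i' \\ 0 & \text{otherwise} \end{cases}$$

$$\nabla_{h_{j'}} f_n(W, H) = \begin{cases} -\big(A_{ij} - w_i^\top h_j\big)\, w_i + \lambda h_j & \text{for } j = j' \\ 0 & \text{otherwise} \end{cases}$$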

Stochastic updates

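$$w_i \leftarrow w_i - \eta \Big( -\big(A_{ij} - w_i^\top h_j\big)\, h_j + \lambda w_i \Big)$$

$$h_j \leftarrow h_j - \eta \Big( -\big(A_{ij} - w_i^\top h_j\big)\, w_i + \lambda h_j \Big)$$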

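The update pair above can be sketched as follows. Both new vectors are computed from the old w_i and h_j; this simultaneous form is an assumption (updating w_i first and reusing it for h_j is also common). `sgd_step` is a hypothetical name.

```python
from typing import List, Tuple


def sgd_step(w_i: List[float], h_j: List[float], a_ij: float,
             lam: float, eta: float) -> Tuple[List[float], List[float]]:
    """One stochastic step on the sampled entry (i, j).

    gradient wrt w_i: -(a_ij - w_i.h_j) * h_j + lam * w_i
    gradient wrt h_j: -(a_ij - w_i.h_j) * w_i + lam * h_j
    """
    err = a_ij - sum(w * h for w, h in zip(w_i, h_j))
    new_w = [w - eta * (-err * h + lam * w) for w, h in zip(w_i, h_j)]
    new_h = [h - eta * (-err * w + lam * h) for w, h in zip(w_i, h_j)]
    return new_w, new_h


w, h = sgd_step([1.0, 0.0], [1.0, 1.0], a_ij=2.0, lam=0.1, eta=0.1)
print(w, h)
```

Each step reads and writes only w_i and h_j; every other row of W and H is untouched, which is what makes the updates decouple.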
Decoupling the updates [Gemulla et al., KDD 2011]

[Figure: the observed entries (x) of the ratings matrix are processed block by block; the animation alternates between update steps and a "Synchronize and Communicate" step.]

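One way to read the decoupling in Gemulla et al. (DSGD): split the rows and columns into p blocks each, giving a p × p grid. A set of p blocks that share no row block and no column block (a stratum) touches disjoint parts of W and H, so p workers can run SGD on them concurrently, then synchronize and exchange blocks of H before the next stratum. The cyclic-shift choice of strata and the helper names below are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Block = Tuple[int, int]  # (row block, column block)


def block_of(i: int, j: int, m: int, n: int, p: int) -> Block:
    """Map entry (i, j) of an m x n matrix to its cell in a p x p block grid."""
    return i * p // m, j * p // n


def cyclic_strata(p: int) -> List[List[Block]]:
    """Sub-epoch r assigns worker q the block (q, (q + r) mod p)."""
    return [[(q, (q + r) % p) for q in range(p)] for r in range(p)]


def bucket(ratings: Dict[Tuple[int, int], float], m: int, n: int, p: int) -> Dict[Block, list]:
    """Group observed entries by block so each worker can scan its block locally."""
    blocks: Dict[Block, list] = {}
    for (i, j), a in ratings.items():
        blocks.setdefault(block_of(i, j, m, n, p), []).append((i, j, a))
    return blocks


for r, stratum in enumerate(cyclic_strata(3)):
    rows = {b[0] for b in stratum}
    cols = {b[1] for b in stratum}
    # No two workers in the same sub-epoch touch the same rows of W or columns of H.
    assert len(rows) == len(cols) == 3
    print(f"sub-epoch {r}: {stratum}")
```

Over p sub-epochs every block is visited exactly once; the barrier at the end of each sub-epoch is where workers wait on communication.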
Some Observations (also see Gemulla et al., ICDM 2012)

The good
Updates are decoupled and easy to parallelize.
Easy to implement using map-reduce.

The bad
Communication and computation are interleaved:
when the network is active, the CPU is idle;
when the CPU is active, the network is idle.

Question: Can we keep CPU and network simultaneously busy?

Our answer

Non-locking, stOchastic Multi-machine algorithm for Asynchronous and Decentralized matrix factorization (NOMAD)

Illustration of NOMAD communication

[Figure: animation of NOMAD communication over the observed entries (x) of the ratings matrix; the final frame is titled "Eventually . . ."]

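A minimal single-process sketch of NOMAD-style communication, assuming each worker owns a fixed contiguous block of rows of W while each column parameter h_j travels between workers as a token in a queue. Whoever currently holds h_j is its only writer, so no locks are needed, and a worker keeps computing on the tokens it holds while others are in transit. The round-robin loop stands in for p concurrent workers; the uniform random routing, the names, and the hyperparameters are assumptions, and this is not the authors' implementation.

```python
import random
from collections import deque
from typing import Dict, List, Tuple

Ratings = Dict[Tuple[int, int], float]


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def objective(W, H, ratings: Ratings, lam: float) -> float:
    return 0.5 * sum((a - dot(W[i], H[j])) ** 2 + lam * (dot(W[i], W[i]) + dot(H[j], H[j]))
                     for (i, j), a in ratings.items())


def sgd_step(w, h, a, lam, eta):
    err = a - dot(w, h)
    return ([wk + eta * (err * hk - lam * wk) for wk, hk in zip(w, h)],
            [hk + eta * (err * wk - lam * hk) for wk, hk in zip(w, h)])


def nomad_simulate(ratings: Ratings, m: int, n: int, k: int, p: int,
                   lam: float, eta: float, rounds: int, seed: int = 0):
    rng = random.Random(seed)
    W = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(m)]
    H = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n)]
    owner = [i * p // m for i in range(m)]  # worker q owns a contiguous block of rows
    local: List[Dict[int, list]] = [{} for _ in range(p)]
    for (i, j), a in ratings.items():  # each worker indexes its own ratings by column
        local[owner[i]].setdefault(j, []).append((i, a))
    queues = [deque() for _ in range(p)]
    for j in range(n):  # scatter the column tokens
        queues[rng.randrange(p)].append(j)
    for _ in range(rounds):
        for q in range(p):  # round-robin stands in for concurrency
            if not queues[q]:
                continue
            j = queues[q].popleft()  # worker q is now the only writer of H[j]
            for i, a in local[q].get(j, []):
                W[i], H[j] = sgd_step(W[i], H[j], a, lam, eta)
            queues[rng.randrange(p)].append(j)  # hand h_j to a randomly chosen worker
    return W, H


ratings = {(i, j): (i + 1) * (j + 1) / 4 for i in range(4) for j in range(3)}
W0, H0 = nomad_simulate(ratings, m=4, n=3, k=2, p=2, lam=0.01, eta=0.05, rounds=0)
W1, H1 = nomad_simulate(ratings, m=4, n=3, k=2, p=2, lam=0.01, eta=0.05, rounds=500)
print(objective(W0, H0, ratings, 0.01), objective(W1, H1, ratings, 0.01))
```

With `rounds=0` the function returns the initial factors, which gives a baseline for the objective. In a real deployment each queue would be fed by the network while the worker computes, which is how NOMAD keeps CPU and network busy at the same time.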
Experiments: Netflix

Size: 2649429 × 17770, nnz=99 million

[Figure: speedup vs. number of machines (0 to 30) on Netflix.]

[Figure: test RMSE vs. elapsed seconds × number of processors on Netflix. Left panel: 1 processor. Right panel, one slide build each: 2, 4, 8, 15, and 30 processors, with curves for 1 processor and multiple processors.]

Experiments: Yahoo! music

Size: 1000990 × 624961, nnz=252 million

[Figure: speedup vs. number of machines (0 to 30) on Yahoo! music.]

[Figure: test RMSE vs. elapsed seconds × number of processors on Yahoo! music. Left panel: 1 processor. Right panel, one slide build each: 2, 4, 8, 15, and 30 processors, with curves for 1 processor and multiple processors.]

Experiments: Synthetic Data

Size: 5 000 000 × 200 000, nnz=270 million

[Figure: speedup vs. number of machines (0 to 30) on the synthetic data.]

[Figure: test RMSE vs. elapsed seconds × number of processors on the synthetic data. Left panel: 1 processor. Right panel, one slide build each: 2, 4, 8, and 15 processors, with curves for 1 processor and multiple processors.]

NOMAD for Regularized Risk Minimization

Problem Formulation

$$\min_{w} \; \lambda \sum_{j=1}^{d} \phi_j(w_j) + \frac{1}{m} \sum_{i=1}^{m} l\big(\langle w, x_i \rangle, y_i\big)$$

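$$\min_{w, u} \; \lambda \sum_{j=1}^{d} \phi_j(w_j) + \frac{1}{m} \sum_{i=1}^{m} l(u_i) \quad \text{subject to } u_i = \langle w, x_i \rangle, \quad i = 1, \dots, m$$

Here l(u_i) is shorthand for l(u_i, y_i).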

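$$\min_{w, u} \max_{\alpha} \; \lambda \sum_{j=1}^{d} \phi_j(w_j) + \frac{1}{m} \sum_{i=1}^{m} l(u_i) + \frac{1}{m} \sum_{i=1}^{m} \alpha_i \big( u_i - \langle w, x_i \rangle \big)$$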

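$$\max_{\alpha} \min_{w, u} \; \lambda \sum_{j=1}^{d} \phi_j(w_j) + \frac{1}{m} \sum_{i=1}^{m} l(u_i) + \frac{1}{m} \sum_{i=1}^{m} \alpha_i \big( u_i - \langle w, x_i \rangle \big)$$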

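$$\max_{\alpha} \min_{w} \; \lambda \sum_{j=1}^{d} \phi_j(w_j) - \frac{1}{m} \sum_{i=1}^{m} \alpha_i \langle w, x_i \rangle + \frac{1}{m} \min_{u} \sum_{i=1}^{m} \big( l(u_i) + \alpha_i u_i \big)$$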

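$$\max_{\alpha} \min_{w} \; \lambda \sum_{j=1}^{d} \phi_j(w_j) - \Big\langle w, \frac{1}{m} \sum_{i=1}^{m} \alpha_i x_i \Big\rangle - \frac{1}{m} \sum_{i=1}^{m} l^*(-\alpha_i)$$

This uses the convex conjugate l^*(v) = \sup_u \big( u v - l(u) \big), so that \min_u \big( l(u) + \alpha u \big) = -l^*(-\alpha).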

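The conjugate step can be sanity-checked numerically. Assuming the square loss l(u) = (u − y)²/2 (an illustrative choice; the talk keeps l general), its conjugate is l*(v) = v²/2 + y·v, so −l*(−α) = y·α − α²/2 should match min_u (l(u) + α·u). The grid search below is a deliberately crude check, not an optimizer, and the function names are hypothetical.

```python
def square_loss(u: float, y: float) -> float:
    return 0.5 * (u - y) ** 2


def square_loss_conjugate(v: float, y: float) -> float:
    """l*(v) = sup_u (u*v - l(u)); the sup is attained at u = y + v."""
    return 0.5 * v * v + y * v


def inner_min(alpha: float, y: float, lo: float = -10.0, hi: float = 10.0, steps: int = 20001) -> float:
    """Brute-force minimum over a grid of u of l(u) + alpha * u."""
    grid = (lo + t * (hi - lo) / (steps - 1) for t in range(steps))
    return min(square_loss(u, y) + alpha * u for u in grid)


for alpha, y in [(0.7, 1.5), (-2.0, 0.3), (0.0, -1.0)]:
    numeric = inner_min(alpha, y)
    closed_form = -square_loss_conjugate(-alpha, y)
    print(alpha, y, numeric, closed_form)
```

The grid must contain the minimizer u* = y − α for the check to be meaningful.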
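Let Ω be the set of nonzero entries (i, j) of the data matrix, Ω_i = {j : x_ij ≠ 0}, and Ω̄_j = {i : x_ij ≠ 0}. Assuming every row and column has at least one nonzero, the objective splits into one term per nonzero:

$$\max_{\alpha} \min_{w} \; \sum_{i=1}^{m} \sum_{j \in \Omega_i} \Big( \frac{\lambda}{|\bar{\Omega}_j|} \phi_j(w_j) - \frac{1}{m} \alpha_i w_j x_{ij} - \frac{1}{m |\Omega_i|} l^*(-\alpha_i) \Big)$$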

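The per-nonzero decomposition can be checked on a tiny sparse matrix. This sketch assumes φ_j(w_j) = w_j²/2 and the square-loss conjugate l*(v) = v²/2 + y_i·v (so the label enters through l*), stores the data matrix as a dict of nonzeros, and requires every row and column to have at least one nonzero; all names are hypothetical.

```python
from typing import Dict, List, Tuple

Sparse = Dict[Tuple[int, int], float]  # nonzeros of the data matrix: (i, j) -> x_ij


def phi(t: float) -> float:
    return 0.5 * t * t


def conj(v: float, y: float) -> float:
    return 0.5 * v * v + y * v  # l*(v) for the square loss


def saddle_direct(w: List[float], alpha: List[float], X: Sparse, y: List[float], lam: float) -> float:
    m = len(alpha)
    coupling = sum(alpha[i] * w[j] * x for (i, j), x in X.items())  # sum_i alpha_i <w, x_i>
    return (lam * sum(phi(wj) for wj in w) - coupling / m
            - sum(conj(-a, yi) for a, yi in zip(alpha, y)) / m)


def saddle_per_nonzero(w: List[float], alpha: List[float], X: Sparse, y: List[float], lam: float) -> float:
    m = len(alpha)
    row_nnz: Dict[int, int] = {}  # |Omega_i|
    col_nnz: Dict[int, int] = {}  # |Omega-bar_j|
    for i, j in X:
        row_nnz[i] = row_nnz.get(i, 0) + 1
        col_nnz[j] = col_nnz.get(j, 0) + 1
    return sum(lam * phi(w[j]) / col_nnz[j] - alpha[i] * w[j] * x / m
               - conj(-alpha[i], y[i]) / (m * row_nnz[i])
               for (i, j), x in X.items())


X = {(0, 0): 1.0, (0, 2): -2.0, (1, 1): 0.5, (2, 0): 3.0, (2, 1): 1.0, (2, 2): 0.5}
w, alpha, y = [0.3, -1.2, 0.8], [0.5, -0.4, 1.1], [1.0, -1.0, 0.5]
print(saddle_direct(w, alpha, X, y, 0.1), saddle_per_nonzero(w, alpha, X, y, 0.1))
```

Each summand touches only w_j and α_i, which is what allows the same kind of decoupling as in matrix completion.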
Stochastic Gradients

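Denote the objective above by J(w, α). For a nonzero entry (i, j) drawn uniformly at random from Ω, the stochastic (sub)gradients are

$$\partial_{w_j} J(w, \alpha) = |\Omega| \Big( \frac{\lambda}{|\bar{\Omega}_j|} \partial \phi_j(w_j) - \frac{1}{m} \alpha_i x_{ij} \Big)$$

$$\partial_{\alpha_i} J(w, \alpha) = |\Omega| \Big( -\frac{1}{m} w_j x_{ij} - \frac{1}{m |\Omega_i|} \partial_{\alpha_i} l^*(-\alpha_i) \Big).$$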

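A sketch of one stochastic saddle-point step on a sampled nonzero (i, j): descend in w_j and ascend in α_i, both computed from the old values. It assumes φ_j(w_j) = w_j²/2 (so ∂φ_j(w_j) = w_j) and the square loss, for which ∂_{α_i} l*(−α_i) = α_i − y_i; the step size η and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Sparse = Dict[Tuple[int, int], float]


def nnz_counts(X: Sparse) -> Tuple[Dict[int, int], Dict[int, int]]:
    """Return |Omega_i| per row and |Omega-bar_j| per column."""
    row_nnz: Dict[int, int] = {}
    col_nnz: Dict[int, int] = {}
    for i, j in X:
        row_nnz[i] = row_nnz.get(i, 0) + 1
        col_nnz[j] = col_nnz.get(j, 0) + 1
    return row_nnz, col_nnz


def saddle_step(w: List[float], alpha: List[float], X: Sparse, y: List[float],
                i: int, j: int, lam: float, eta: float,
                row_nnz: Dict[int, int], col_nnz: Dict[int, int]) -> None:
    m, n_omega, x = len(alpha), len(X), X[(i, j)]
    g_w = n_omega * (lam * w[j] / col_nnz[j] - alpha[i] * x / m)
    g_alpha = n_omega * (-w[j] * x / m - (alpha[i] - y[i]) / (m * row_nnz[i]))
    w[j] -= eta * g_w  # gradient descent in the primal variable
    alpha[i] += eta * g_alpha  # gradient ascent in the dual variable


X = {(0, 0): 1.0, (1, 0): 2.0}
y = [1.0, 0.0]
w, alpha = [0.0], [0.0, 0.0]
rows, cols = nnz_counts(X)
saddle_step(w, alpha, X, y, 0, 0, lam=0.1, eta=0.1, row_nnz=rows, col_nnz=cols)
print(w, alpha)
```

The nonzero counts are computed once and reused, since they do not change during training.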
Decoupling the updates

[Figure: the nonzero entries (x) of the data matrix, decoupled in the same way as the ratings matrix for matrix completion.]

It Converges!

[Figure: dual gap (log scale, about 10^-2 to 10^-1) vs. number of iterations (0 to 100) on the KDDA test dataset, with two curves: 10 percent coordinates and all coordinates.]

Joint work with

[Figure: a Reading Thread and a Training Thread. Components: Dataset (Disk), Cached Data (Working Set) in RAM, and Weight Vector in RAM. Arrows: Read (Sequential Access), Load (Random Access), Read (Random Access), and Update.]

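The diagram suggests a two-thread design in which I/O and computation overlap. A minimal sketch, assuming the reading thread streams examples sequentially from a list standing in for the on-disk dataset and overwrites random slots of a fixed-size cache, while the training thread samples cached examples at random and updates the weight vector. Hinge-loss SGD is a stand-in update rule (the slide does not say which solver the training thread runs), and all names are hypothetical.

```python
import random
import threading
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], float]  # (features, label in {-1, +1})


def train_with_cache(dataset: List[Example], d: int, cache_size: int, n_updates: int,
                     lam: float = 0.01, eta: float = 0.05, seed: int = 0):
    if not dataset:
        raise ValueError("dataset must be non-empty")
    cache: List[Example] = []
    lock = threading.Lock()
    ready, stop = threading.Event(), threading.Event()
    w = [0.0] * d  # only the training thread writes w

    def reader() -> None:
        rng = random.Random(seed)
        while not stop.is_set():
            for example in dataset:  # sequential pass over the "disk"
                if stop.is_set():
                    return
                with lock:
                    if len(cache) < cache_size:
                        cache.append(example)
                    else:  # random-access load into the working set
                        cache[rng.randrange(cache_size)] = example
                ready.set()

    def trainer() -> None:
        rng = random.Random(seed + 1)
        try:
            ready.wait()
            for _ in range(n_updates):
                with lock:  # random-access read of the working set
                    x, y = cache[rng.randrange(len(cache))]
                margin = y * sum(wk * xk for wk, xk in zip(w, x))
                for k in range(d):  # hinge-loss SGD step (stand-in solver)
                    grad = lam * w[k] - (y * x[k] if margin < 1.0 else 0.0)
                    w[k] -= eta * grad
        finally:
            stop.set()  # tell the reader to finish

    threads = [threading.Thread(target=reader), threading.Thread(target=trainer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w, len(cache)


data = [([1.0, x], 1.0 if x > 0 else -1.0) for x in (-2.0, -1.0, 1.0, 2.0)]
w, cached = train_with_cache(data, d=2, cache_size=3, n_updates=2000)
print(w, cached)
```

Because the reader keeps refilling the cache while the trainer runs, disk reads and updates proceed in parallel; which examples are cached at a given moment depends on thread scheduling.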