
Presentation on experimental setup for verifying "Slow Learners are Fast"


Page 1

Machine Learning on Cell Processor

Supervisor: Dr. Eric McCreath
Student: Robin Srivastava


Page 2

Background and Motivation

Machine Learning

Sequential in Nature

Batch Learning

Online Learning

[Diagram: a stream of emails (Email-1, email-2, …, Email-N) arriving one at a time, each classified as HAM or SPAM]

Page 3

Objective

  Performance evaluation of a parallel online machine learning algorithm (Langford et al. [1])

Target Machines

  Cell Processor: one 3 GHz 64-bit IBM PowerPC core, six specialized co-processors (SPEs)

  Intel Dual Core machine: 2 GHz dual-core processor, 1.86 GB of main memory

Page 4

Stochastic Gradient Descent

  Step 1: Initialize the weight vector w0 with some arbitrary values

  Step 2: Update the weight vector as follows:

    w(t+1) = w(t) − η ∇E(w(t))

  where ∇E is the gradient of the error function and η is the learning rate

  Step 3: Repeat Step 2 for each unit of data
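
For concreteness, here is a minimal C sketch of one such update for logistic regression on a sparse example (the model and data layout used later in these slides). The struct and function names are illustrative, and the log-loss gradient (p − y)·x is an assumption; the slide only states the generic rule w(t+1) = w(t) − η∇E(w(t)).

    #include <math.h>

    /* One sparse training example: only non-zero features are stored,
     * matching the <index>:<count> format described later. */
    typedef struct {
        int    nnz;    /* number of non-zero features */
        int   *idx;    /* feature indices             */
        float *val;    /* feature counts              */
        int    label;  /* 1 = spam, 0 = ham           */
    } example_t;

    /* One SGD step: w <- w - eta * gradE(w), where for logistic
     * regression gradE(w) = (sigmoid(w.x) - y) * x. */
    static void sgd_step(float *w, const example_t *x, float eta)
    {
        float dot = 0.0f;
        for (int k = 0; k < x->nnz; k++)
            dot += w[x->idx[k]] * x->val[k];

        float p = 1.0f / (1.0f + expf(-dot));  /* predicted P(spam) */
        float g = p - (float)x->label;         /* scalar factor of the gradient */

        /* the gradient is non-zero only where x is, so the update
         * touches just the features present in this email */
        for (int k = 0; k < x->nnz; k++)
            w[x->idx[k]] -= eta * g * x->val[k];
    }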

Page 5

Delayed Stochastic Gradient Descent

  Step 1: Initialize the weight vector w0 with some arbitrary values

  Step 2: Update the weight vector as follows:

    w(t+1) = w(t) − η ∇E(w(t−τ))

  where ∇E is the gradient of the error function, η is the learning rate and τ is the delay in update steps

  Step 3: Repeat Step 2 for each unit of data
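
The only change from plain SGD is that the gradient was computed from weights that are τ steps stale. The single-threaded simulation below is purely an illustrative assumption: it buffers each fresh gradient for TAU steps before applying it. In Langford et al. [1] the delay instead arises naturally from several threads updating a shared w.

    #define TAU 4  /* illustrative delay */

    /* a sparse gradient, stored until it is TAU steps old */
    typedef struct { int nnz; int *idx; float *g; } grad_t;

    static grad_t ring[TAU];  /* gradients awaiting application */

    /* fresh must have been computed from the current w, i.e. it is
     * gradE(w(t)); it is applied at step t + TAU, which yields
     * exactly the update w(t+1) = w(t) - eta * gradE(w(t - TAU)). */
    static void delayed_sgd_step(float *w, grad_t fresh, float eta, long t)
    {
        int slot = (int)(t % TAU);
        if (t >= TAU) {
            grad_t old = ring[slot];             /* gradE(w(t - TAU)) */
            for (int k = 0; k < old.nnz; k++)
                w[old.idx[k]] -= eta * old.g[k];
        }
        ring[slot] = fresh;
    }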

Page 6

Implementation Model

[Diagram: the implementation model, operating over the complete dataset]

Page 7

Implementation

  Dataset: TREC 2007 Public Corpus

    Number of mails: 75,419
    Each mail classified as either 'ham' or 'spam'

  Pre-processing

    Total number of features extracted: 2,218,878
    Pre-processed email format:

      <number of features><space><index>:<count><space>…………..<index>:<count>
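
A minimal C sketch of a reader for this format, assuming the leading count is the number of <index>:<count> pairs on the line and that the ham/spam label is stored elsewhere; parse_mail and sparse_mail_t are illustrative names, not from the slides.

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct { int nnz; int *idx; float *val; } sparse_mail_t;

    /* Parse "<number of features> <index>:<count> ... <index>:<count>".
     * Returns 0 on success, -1 on malformed input. */
    static int parse_mail(const char *line, sparse_mail_t *m)
    {
        int used = 0;
        if (sscanf(line, "%d%n", &m->nnz, &used) != 1)
            return -1;

        m->idx = malloc((size_t)m->nnz * sizeof *m->idx);
        m->val = malloc((size_t)m->nnz * sizeof *m->val);
        if (!m->idx || !m->val)
            return -1;

        const char *p = line + used;
        for (int k = 0; k < m->nnz; k++) {
            if (sscanf(p, " %d:%f%n", &m->idx[k], &m->val[k], &used) != 2)
                return -1;
            p += used;
        }
        return 0;
    }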

Page 8

Memory Requirement

Algorithm implemented

  Online logistic regression with delayed update

Requirement per level of parallelization

  Two private copies of the weight vector
  Two shared copies of the weight vector
  Two error gradients
  Required dimension of each = number of features = 2,218,878
  Data type: float (4 bytes on Cell)
  Total = (6 x 2,218,878) x 4 = 53,253,072 bytes ≈ 50.78 MB, plus auxiliary variables

Alternatively

  Let only the shared copies use the full dimension
  Total = (2 x 2,218,878) x 4 = 17,751,024 bytes ≈ 16.9 MB, plus auxiliary variables

Page 9

Limitations on Cell

Memory limitation of the SPE

  Available: 256 KB of local store
  Required: approx. 51 MB
  Workaround: reduced the number of features with one more level of pre-processing (sketched below)

SIMD limitation

  The time spent re-packing the data for SIMD outweighed its benefit for this implementation
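
The slides do not say how the feature count was reduced; one plausible extra pre-processing pass, sketched below purely as an assumption, drops rare features and remaps the survivors to dense indices so the weight vector fits in the 256 KB local store. MIN_DF, build_remap and the document-frequency array are all illustrative.

    #include <stdlib.h>

    #define N_FEATURES 2218878
    #define MIN_DF     100     /* illustrative rarity cutoff */

    /* df[i] = number of mails containing feature i, counted in an
     * earlier pass over the corpus. Returns a table mapping old
     * feature indices to dense new ones (-1 = feature dropped). */
    static int *build_remap(const int *df, int *n_kept)
    {
        int *remap = malloc(N_FEATURES * sizeof *remap);
        if (!remap)
            return NULL;
        int next = 0;
        for (int i = 0; i < N_FEATURES; i++)
            remap[i] = (df[i] >= MIN_DF) ? next++ : -1;
        *n_kept = next;
        return remap;
    }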

Page 10

Results

  Serial implementation of logistic regression on the Intel dual core took 36.93 s and 36.45 s in two consecutive executions.

  Parallel implementation using stochastic gradient descent

Page 11

Results (contd.)

Performance on Cell

[Chart: performance on Cell; y-axis is time in microseconds]

Page 12

References

[1] John Langford, Alexander J. Smola and Martin Zinkevich. Slow Learners are Fast. Journal of Machine Learning Research 1 (2009).

[2] Michael Kistler, Michael Perrone and Fabrizio Petrini. Cell Multiprocessor Communication Network: Built for Speed.

[3] Thomas Chen, Ram Raghavan, Jason Dale and Eiji Iwata. Cell Broadband Engine Architecture and its First Implementation.

[4] Jonathan Bartlett. Programming High-Performance Applications on the Cell/B.E. Processor, Part 6: Smart Buffer Management with DMA Transfers.

[5] Introduction to Statistical Machine Learning, 2010, course assignment 1.

[6] Christopher Bishop. Pattern Recognition and Machine Learning.