Upload
olivier-grisel
View
6.410
Download
6
Embed Size (px)
Citation preview
Strategies & Tools for Parallel Machine Learning
in PythonPyConFR 2012 - Paris
Sunday, September 16, 2012
The Story
scikit-learn
The Cloud
stackoverflow
& kaggle
Sunday, September 16, 2012
...
Sunday, September 16, 2012
Parts of the Ecosystem
Multiple Machines with Multiple Cores ♥♥♥
Single Machine with Multiple Cores ♥♥♥
multiprocessing
Sunday, September 16, 2012
The Problem
Big CPU (Supercomputers - MPI)
Simulating stuff from models
Big Data (Google scale - MapReduce)
Counting stuff in logs / Indexing the Web
Machine Learning?
often somewhere in the middle
Sunday, September 16, 2012
Parallel ML Use Cases
• Model Evaluation with Cross Validation
• Model Selection with Grid Search
• Bagging Models: Random Forests
• Averaged Models
Sunday, September 16, 2012
Embarrassingly Parallel ML Use Cases
• Model Evaluation with Cross Validation
• Model Selection with Grid Search
• Bagging Models: Random Forests
• Averaged Models
Sunday, September 16, 2012
Inter-Process Comm. Use Cases
• Model Evaluation with Cross Validation
• Model Selection with Grid Search
• Bagging Models: Random Forests
• Averaged Models
Sunday, September 16, 2012
Cross Validation
Labels to Predict
Input Data
Sunday, September 16, 2012
Cross Validation
A B C
A B C
Sunday, September 16, 2012
Cross Validation
A B C
A B C
Subset of the data usedto train the model
Held-outtest set
for evaluation
Sunday, September 16, 2012
Cross Validation
A B C
A B C
A C B
A C B
B C A
B C A
Sunday, September 16, 2012
Model Selection
the Hyperparameters hell
p1 in [1, 10, 100]
p2 in [1e3, 1e4, 1e5]
Find the best combination of parameters that maximizes the Cross Validated Score
Sunday, September 16, 2012
Grid Search
(1, 1e3) (10, 1e3) (100, 1e3)
(1, 1e4) (10, 1e4) (100, 1e4)
(1, 1e5) (10, 1e5) (100, 1e5)
p1
p2
Sunday, September 16, 2012
(1, 1e3) (10, 1e3) (100, 1e3)
(1, 1e4) (10, 1e4) (100, 1e4)
(1, 1e5) (10, 1e5) (100, 1e5)
Sunday, September 16, 2012
Grid Search:Qualitative Results
Sunday, September 16, 2012
Grid Search:Cross Validated Scores
Sunday, September 16, 2012
Enough ML theory!
Lets go shopping^Wparallel computing!
Sunday, September 16, 2012
Single Machinewith
Multiple Cores
♥ ♥
♥ ♥
Sunday, September 16, 2012
multiprocessing>>> from multiprocessing import Pool>>> p = Pool(4)
>>> p.map(type, [1, 2., '3'])[int, float, str]
>>> r = p.map_async(type, [1, 2., '3'])>>> r.get()[int, float, str]
Sunday, September 16, 2012
multiprocessing
• Part of the standard lib
• Nice API
• Cross-Platform support (even Windows!)
• Some support for shared memory
• Support for synchronization (Lock)
Sunday, September 16, 2012
multiprocessing: limitations
• No docstrings in a stdlib module? WTF?
• Tricky / impossible to use the shared memory values with NumPy
• Bad support for KeyboardInterrupt
Sunday, September 16, 2012
Sunday, September 16, 2012
• transparent disk-caching of the output values and lazy re-evaluation (memoize pattern)
• easy simple parallel computing
• logging and tracing of the execution
Sunday, September 16, 2012
>>> from os.path.join>>> from joblib import Parallel, delayed
>>> Parallel(2)(... delayed(join)('/ect', s)... for s in 'abc')['/ect/a', '/ect/b', '/ect/c']
joblib.Parallel
Sunday, September 16, 2012
Usage in scikit-learn
• Cross Validationcross_val(model, X, y, n_jobs=4, cv=3)
• Grid SearchGridSearchCV(model, n_jobs=4, cv=3).fit(X, y)
• Random ForestsRandomForestClassifier(n_jobs=4).fit(X, y)
Sunday, September 16, 2012
>>> from joblib import Parallel, delayed>>> import numpy as np
>>> Parallel(2, max_nbytes=1e6)(... delayed(type)(np.zeros(int(i)))... for i in [1e4, 1e6])[<type 'numpy.ndarray'>, <class 'numpy.core.memmap.memmap'>]
joblib.Parallel:shared memory
Sunday, September 16, 2012
(1, 1e3) (10, 1e3) (100, 1e3)
(1, 1e4) (10, 1e4) (100, 1e4)
(1, 1e5) (10, 1e5) (100, 1e5)
Only 3 allocated datasets shared by all the concurrent workers performing
the grid search.
Sunday, September 16, 2012
Multiple Machineswith
Multiple Cores
♥ ♥
♥ ♥♥ ♥
♥ ♥♥ ♥
♥ ♥♥ ♥
♥ ♥
Sunday, September 16, 2012
MapReduce?
[ (k1, v1), (k2, v2), ...]
mapper
mapper
mapper
[ (k3, v3), (k4, v4), ...]
reducer
reducer
[ (k5, v6), (k6, v6), ...]
Sunday, September 16, 2012
Why MapReduce does not always work
Write a lot of stuff to disk for failover
Inefficient for small to medium problems
[(k, v)] mapper [(k, v)] reducer [(k, v)]
Data and model params as (k, v) pairs?
Complex to leverage for Iterative Algorithms
Sunday, September 16, 2012
• Parallel Processing Library
• Interactive Exploratory Shell
Multi Core & Distributed
IPython.parallel
Sunday, September 16, 2012
Demo!
Sunday, September 16, 2012
The AllReduce Pattern
• Compute an aggregate (average) of active node data
• Do not clog a single node with incoming data transfer
• Traditionally implemented in MPI systems
Sunday, September 16, 2012
AllReduce 0/3Initial State
Value: 2.0 Value: 0.5
Value: 1.1 Value: 3.2 Value: 0.9
Value: 1.0
Sunday, September 16, 2012
AllReduce 1/3Spanning Tree
Value: 2.0 Value: 0.5
Value: 1.1 Value: 3.2 Value: 0.9
Value: 1.0
Sunday, September 16, 2012
AllReduce 2/3Upward Averages
Value: 2.0 Value: 0.5
Value: 1.1(1.1, 1)
Value: 3.2(3.1, 1)
Value: 0.9(0.9, 1)
Value: 1.0
Sunday, September 16, 2012
AllReduce 2/3Upward Averages
Value: 2.0(2.1, 3)
Value: 0.5(0.7, 2)
Value: 1.1(1.1, 1)
Value: 3.2(3.1, 1)
Value: 0.9(0.9, 1)
Value: 1.0
Sunday, September 16, 2012
AllReduce 2/3Upward Averages
Value: 2.0(2.1, 3)
Value: 0.5(0.7, 2)
Value: 1.1(1.1, 1)
Value: 3.2(3.1, 1)
Value: 0.9(0.9, 1)
Value: 1.0(1.38, 6)
Sunday, September 16, 2012
AllReduce 3/3Downward Updates
Value: 2.0(2.1, 3)
Value: 0.5(0.7, 2)
Value: 1.1(1.1, 1)
Value: 3.2(3.1, 1)
Value: 0.9(0.9, 1)
Value: 1.38
Sunday, September 16, 2012
AllReduce 3/3Downward Updates
Value: 1.38 Value: 1.38
Value: 1.1(1.1, 1)
Value: 3.2(3.1, 1)
Value: 0.9(0.9, 1)
Value: 1.38
Sunday, September 16, 2012
AllReduce 3/3Downward Updates
Value: 1.38 Value: 1.38
Value: 1.38 Value: 1.38 Value: 1.38
Value: 1.38
Sunday, September 16, 2012
AllReduce Final State
Value: 1.38 Value: 1.38
Value: 1.38 Value: 1.38 Value: 1.38
Value: 1.38
Sunday, September 16, 2012
AllReduce Implementations
http://mpi4py.scipy.org
IPC directly w/ IPython.parallel
https://github.com/ipython/ipython/tree/master/docs/examples/parallel/interengine
Sunday, September 16, 2012
Working in the Cloud
• Launch a cluster of machines in one cmd: starcluster start mycluster -b 0.07
starcluster sshmaster mycluster
• Supports spotinstances!
• Ships blas, atlas, numpy, scipy!
• IPython plugin!
Sunday, September 16, 2012
Perspectives
Sunday, September 16, 2012
2012 results by Stanford / Google
Sunday, September 16, 2012
The YouTube Neuron
Sunday, September 16, 2012
Thanks
• http://j.mp/ogrisel-pyconfr-2012
• http://scikit-learn.org
• http://packages.python.org/joblib
• http://ipython.org
• http://star.mit.edu/cluster/
@ogrisel
Sunday, September 16, 2012
Goodies
Super Ctrl-C to killall nosetests
Sunday, September 16, 2012
psutil nosetests killer script in Automator
Sunday, September 16, 2012
Bind killnose to Shift-Ctrl-C
Sunday, September 16, 2012
ps aux | grep IPython.parallel \ | grep -v grep \ | cut -d ' ' -f 2 | xargs kill
Or simpler:
Sunday, September 16, 2012