
By Martin Heller

Review: The best frameworks for machine learning and deep learning

infoworld.com/article/3163525/analytics/review-the-best-frameworks-for-machine-learning-and-deep-learning.html

Over the past year I've reviewed half a dozen open source machine learning and/or deep learning frameworks: Caffe, Microsoft Cognitive Toolkit (aka CNTK 2), MXNet, Scikit-learn, Spark MLlib, and TensorFlow. If I had cast my net even wider, I might well have covered a few other popular frameworks, including Theano (a 10-year-old Python deep learning and machine learning framework), Keras (a deep learning front end for Theano and TensorFlow), and DeepLearning4j (deep learning software for Java and Scala on Hadoop and Spark). If you’re interested in working with machine learning and neural networks, you’ve never had a richer array of options.

There's a difference between a machine learning framework and a deep learning framework. Essentially, a machine learning framework covers a variety of learning methods for classification, regression, clustering, anomaly detection, and data preparation, and it may or may not include neural network methods. A deep learning or deep neural network (DNN) framework covers a variety of neural network topologies with many hidden layers. These layers comprise a multistep process of pattern recognition. The more layers in the network, the more complex the features that can be extracted for clustering and classification.

Caffe, CNTK, DeepLearning4j, Keras, MXNet, and TensorFlow are deep learning frameworks. Scikit-learn and Spark MLlib are machine learning frameworks. Theano straddles both categories.

In general, deep neural network computations run an order of magnitude faster on a GPU (specifically an Nvidia CUDA general-purpose GPU, for most frameworks) than on a CPU. Simpler machine learning methods generally don't need the speedup of a GPU.

While you can train DNNs on one or more CPUs, the training tends to be slow, and by slow I'm not talking about seconds or minutes. The more neurons and layers that need to be trained, and the more data available for training, the longer it takes. When the Google Brain team trained its language translation models for the new version of Google Translate in 2016, they ran their training sessions for a week at a time, on multiple GPUs. Without GPUs, each model training experiment would have taken months.
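To put "an order of magnitude" into rough numbers, here is a back-of-the-envelope sketch in Python that starts from the one-week Google Translate training sessions. It assumes a flat 10x GPU speedup, which is a simplification (real ratios vary with the model, the data, and the hardware); the function name is hypothetical.

```python
def cpu_only_estimate(gpu_days: float, speedup: float) -> float:
    """Days a CPU-only run would take if the GPU run is `speedup` times faster."""
    return gpu_days * speedup


DAYS_PER_MONTH = 365.25 / 12  # average calendar month, about 30.4 days

gpu_days = 7.0   # one week per training session
speedup = 10.0   # assumption: "an order of magnitude" taken as exactly 10x

cpu_days = cpu_only_estimate(gpu_days, speedup)
print(f"{cpu_days:.0f} days, or about {cpu_days / DAYS_PER_MONTH:.1f} months")
```

Under that assumption a one-week GPU session stretches to roughly ten weeks on CPUs, consistent with "months."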

Each of these packages has at least one distinguishing characteristic. Caffe's strength is convolutional DNNs for image recognition. Cognitive Toolkit has a separate evaluation library for deploying prediction models that works on ASP.NET websites. MXNet has excellent scalability for training on multi-GPU and multimachine configurations. Scikit-learn has a wide selection of robust machine learning methods and is easy to learn and use. Spark MLlib integrates with Hadoop and has excellent scalability for machine learning. TensorFlow has a unique diagnostic facility for its network graphs, TensorBoard.

On the other hand, the training speed of all the deep learning packages on GPUs is nearly identical. That's because the training inner loops spend most of their time in the Nvidia cuDNN package. Still, each package takes a somewhat different approach to describing neural networks, with two major camps: those that use a graph description file, and those that create their descriptions by executing code.
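To make the two camps concrete, here is a toy Python sketch (not any framework's actual API) that describes the same three-step network twice: once as declarative data that a small interpreter turns into a network, in the spirit of a graph description file, and once as ordinary code that computes the result directly. The layer types and field names in NET_SPEC are hypothetical.

```python
import math

# Camp 1: a declarative description, similar in spirit to a model description file.
# The layer types and field names are hypothetical, not any framework's schema.
NET_SPEC = [
    {"type": "scale", "factor": 2.0},
    {"type": "relu"},
    {"type": "sigmoid"},
]


def build_from_spec(spec):
    """Interpret a declarative description and return a callable network."""
    ops = {
        "scale": lambda layer: (lambda x: x * layer["factor"]),
        "relu": lambda layer: (lambda x: max(0.0, x)),
        "sigmoid": lambda layer: (lambda x: 1.0 / (1.0 + math.exp(-x))),
    }
    layers = [ops[layer["type"]](layer) for layer in spec]

    def net(x):
        # Apply each layer in order, feeding each output into the next layer.
        for f in layers:
            x = f(x)
        return x

    return net


# Camp 2: the same network described by executing ordinary code.
def net_in_code(x):
    x = x * 2.0
    x = max(0.0, x)
    return 1.0 / (1.0 + math.exp(-x))


declared = build_from_spec(NET_SPEC)
for value in (-1.5, 0.0, 0.75):
    print(value, declared(value), net_in_code(value))
```

The declarative form can be inspected, saved, or optimized before it runs; the code form is easier to debug and to mix with arbitrary program logic.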

With that in mind, let's dive into each one.

Caffe

The Caffe deep learning project, originally a strong framework for image classification, seems to be stalling, judging by its persistent bugs, as well as the fact that it has been stuck at version 1.0 RC3 for more than a year and the founders have left the project. It still has good convolutional networks for image recognition and good support for Nvidia CUDA GPUs, as well as a straightforward network description format. On the other hand, its models often need substantial amounts of GPU memory (more than 1GB) to run, its documentation is spotty and problematic, support is hard to obtain, and installation is iffy, especially for its Python notebook support.

Caffe has command-line, Python, and Matlab interfaces, and it relies on protocol buffer text files (.prototxt) to define its models and solvers. Caffe defines a network layer by layer in its own model schema. The network defines the entire model bottom to top, from input data to loss. As data and derivatives flow through the network in the forward and backward passes, Caffe stores, communicates, and manipulates the information as blobs (binary large objects) that internally are N-dimensional arrays stored in a C-contiguous fashion (meaning the rows of the array are stored in contiguous blocks of memory, as in the C language). Blobs are to Caffe as tensors are to TensorFlow.
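The C-contiguous layout determines where each element of a blob lives in memory. Caffe's documentation describes the value at index (n, k, h, w) of a 4-D blob with shape (N, K, H, W) as stored at offset ((n * K + k) * H + h) * W + w. The Python sketch below generalizes that row-major formula to any number of dimensions; `blob_offset` is a hypothetical helper, not part of Caffe's Python interface.

```python
def blob_offset(shape, index):
    """Flat offset of `index` in a C-contiguous (row-major) array of `shape`.

    For a 4-D blob (N, K, H, W) this reduces to ((n * K + k) * H + h) * W + w.
    """
    if len(shape) != len(index):
        raise ValueError("index must have one entry per dimension")
    offset = 0
    for dim, i in zip(shape, index):
        if not 0 <= i < dim:
            raise IndexError(f"index {i} out of range for dimension {dim}")
        # Each step multiplies by the size of the next dimension (Horner's rule),
        # so the last index varies fastest, as in C arrays.
        offset = offset * dim + i
    return offset


print(blob_offset((2, 3, 4, 5), (1, 2, 3, 4)))  # 119, the last of 120 elements
```

Because the last index varies fastest, neighboring values along the width dimension sit next to each other in memory, which keeps row-wise access cache-friendly.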

Layers perform operations on blobs and constitute the components of a Caffe model. Layers convolve filters, perform pooling, take inner products, apply nonlinearities (such as rectified-linear, sigmoid, and other element-wise transformations), normalize, load data, and compute losses such as softmax and hinge.
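The following pure-Python sketch chains a few of these operations on a single example, flowing bottom to top from input to loss: an inner product, a ReLU nonlinearity, and two losses. It illustrates the math only, not Caffe's layer classes. The weights are made up, there is no batching, and the hinge loss shown is the one-vs-all form (target +1 for the true class, -1 for the others), one common multiclass variant.

```python
import math


def inner_product(x, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    """Rectified-linear nonlinearity, applied element-wise."""
    return [max(0.0, a) for a in v]


def sigmoid(v):
    """Logistic sigmoid, applied element-wise."""
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def softmax_loss(scores, label):
    """Cross-entropy of the softmax of `scores` against the true class index."""
    m = max(scores)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum - scores[label]


def hinge_loss(scores, label):
    """One-vs-all hinge: +1 target for the true class, -1 for the others."""
    return sum(max(0.0, 1.0 - (1.0 if k == label else -1.0) * s)
               for k, s in enumerate(scores))


# Data flows bottom to top: input -> inner product -> ReLU -> loss.
x = [1.0, -2.0]
hidden = relu(inner_product(x, weights=[[0.5, -0.25], [-1.0, 0.0]], bias=[0.0, 0.5]))
print(hidden, softmax_loss(hidden, label=0), hinge_loss(hidden, label=0))
```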

Caffe has proven its effectiveness in image classification, but its moment seems to have passed. Unless an existing Caffe model fits your needs or could be fine-tuned to your purposes, I recommend using TensorFlow, MXNet, or CNTK instead.


InfoWorld

A precomputed Caffe Jupyter notebook displayed in NBViewer. This notebook explains how to do “surgery” on Caffe networks using a cute kitten.

Microsoft Cognitive Toolkit

Microsoft Cognitive Toolkit is a fast and easy-to-use deep learning package, but it is limited in scope compared to TensorFlow. It has a good variety of models and algorithms, excellent support for Python and Jupyter notebooks, an interesting declarative BrainScript neural network configuration language, and automated deployment for Windows and Ubuntu Linux.

On the downside, when I reviewed Beta 1, the documentation had not yet been fully updated to CNTK 2, and the package had no MacOS support. While there have been many improvements to CNTK 2 since Beta 1, including a new memory compression mode to reduce memory usage on GPUs and new NuGet installation packages, MacOS support is still absent.

The Python API added for Beta 1 helps bring the Cognitive Toolkit to mainstream deep learning researchers who write Python. The API contains abstractions for model definition and compute, learning algorithms, data reading, and distributed training. As a supplement to the Python API, CNTK 2 has new Python examples and tutorials, along with support for Google’s protocol buffers serialization. The tutorials are implemented as Jupyter notebooks.

CNTK 2 components can handle multidimensional dense or sparse data from Python, C++, or BrainScript. The Cognitive Toolkit includes a wide variety of neural network types: FFN (feedforward), CNN (convolutional), RNN/LSTM (recurrent/long short-term memory), batch normalization, and sequence to sequence with attention, for starters. It supports reinforcement learning, generative adversarial networks, supervised and unsupervised learning, automatic hyperparameter tuning, and the ability to add new, user-defined core components on the GPU from Python. It can train in parallel on multiple GPUs and machines while maintaining accuracy, and (Microsoft claims) it can fit even the largest models into GPU memory.

The CNTK 2 APIs support defining networks, learners, readers, training, and evaluation from Python, C++, and BrainScript. They also support evaluation with C#. The Python API interoperates with NumPy and includes a high-level layers library that enables concise definition of advanced neural networks, including recurrences. The toolkit supports representation of recurrent models in symbolic form as cycles in the neural network instead of requiring static unrolling of the recurrence steps.
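The difference between a recurrence kept as a cycle and a statically unrolled one can be illustrated in plain Python (an analogy only; this is not CNTK's API). `run_recurrence` applies a single step function in a loop and accepts a sequence of any length, while `run_unrolled_3` hard-codes exactly three copies of the step. All function names and the scalar weights are hypothetical, illustrative values.

```python
import math


def rnn_step(h_prev, x_t, w=0.5, u=1.0):
    """One recurrence step: h_t = tanh(w * h_{t-1} + u * x_t)."""
    return math.tanh(w * h_prev + u * x_t)


def run_recurrence(xs, h0=0.0):
    """Recurrence kept as a loop (a cycle): works for any sequence length."""
    h = h0
    for x_t in xs:
        h = rnn_step(h, x_t)
    return h


def run_unrolled_3(x1, x2, x3, h0=0.0):
    """The same recurrence statically unrolled: fixed at exactly three steps."""
    h1 = rnn_step(h0, x1)
    h2 = rnn_step(h1, x2)
    return rnn_step(h2, x3)


print(run_recurrence([0.1, -0.2, 0.3]), run_unrolled_3(0.1, -0.2, 0.3))
```

Both give the same answer for three steps, but only the looped form handles a four-step or forty-step sequence without being rewritten.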

You can train CNTK 2 models on Azure networks and GPUs. The GPU-equipped N-series family of Azure Virtual Machines, which was in limited rollout when I reviewed Beta 1, is now generally available and fully manageable from the Azure console.


InfoWorld

Several CNTK 2/Microsoft Cognitive Toolkit tutorials are supplied as Jupyter notebooks. The figure shows the visualizations plotted for the training of the Logistic Regression tutorial.

MXNet

MXNet, a portable, scalable deep learning library that is Amazon's DNN framework of choice, combines symbolic declaration of neural network geometries with imperative programming of tensor operations. MXNet scales to multiple GPUs across multiple hosts with a near-linear scaling efficiency of 85 percent and boasts excellent development speed, programmability, and portability. It supports Python, R, Scala, Julia, and C++ to various degrees, and it allows you to mix symbolic and imperative programming flavors.
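Scaling efficiency compares the throughput actually achieved on N devices with N times the single-device throughput. The sketch below shows the arithmetic; the device count and throughput figures in the usage lines are hypothetical, not Amazon's measurements.

```python
def scaling_efficiency(throughput_n: float, throughput_1: float, n: int) -> float:
    """Fraction of ideal linear speedup achieved on n devices."""
    return throughput_n / (n * throughput_1)


def expected_speedup(n: int, efficiency: float) -> float:
    """Speedup over one device implied by a given scaling efficiency."""
    return n * efficiency


# Hypothetical figures: 100 images/s on one GPU, 1,360 images/s on 16 GPUs.
print(scaling_efficiency(1360.0, 100.0, 16))  # 0.85
print(expected_speedup(16, 0.85))             # about 13.6
```

In other words, at 85 percent efficiency, 16 GPUs do the work of about 13.6 GPUs running with no communication overhead.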

At the time I reviewed MXNet, the documentation felt unfinished, and I found few examples for languages other than Python. Both situations have improved since my review.


The MXNet platform is built on a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations on the fly, although you have to tell MXNet which GPU and CPU cores to use. A graph optimization layer on top of the scheduler makes symbolic execution fast and memory efficient.
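The core idea behind a dependency scheduler can be sketched in a few lines of Python: an operation becomes runnable once every operation it reads from has finished, and operations that become runnable at the same time do not depend on each other, so they could execute in parallel. This is a greatly simplified, hypothetical illustration; MXNet's real engine tracks read and write dependencies on variables and dispatches work asynchronously rather than in lockstep waves.

```python
def schedule_waves(deps):
    """Group operations into waves; every op in a wave has all of its inputs ready.

    `deps` maps each operation name to the set of operations it reads from.
    Every referenced operation must also appear as a key.
    """
    remaining = {op: set(inputs) for op, inputs in deps.items()}
    done = set()
    waves = []
    while remaining:
        # Ops whose inputs have all finished; they are mutually independent.
        ready = sorted(op for op, inputs in remaining.items() if inputs <= done)
        if not ready:
            raise ValueError("dependency cycle detected")
        waves.append(ready)
        done.update(ready)
        for op in ready:
            del remaining[op]
    return waves


graph = {
    "load_a": set(),
    "load_b": set(),
    "matmul": {"load_a", "load_b"},
    "scale_b": {"load_b"},
    "add": {"matmul", "scale_b"},
}
print(schedule_waves(graph))
```

Here `matmul` and `scale_b` land in the same wave because neither reads the other's output.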

MXNet currently supports building and training models in Python, R, Scala, Julia, and C++; trained MXNet models can also be used for prediction in Matlab and JavaScript. No matter what language you choose for building your model, MXNet calls an optimized C++ back-end engine.

The MXNet authors consider their API a superset of what's offered in Torch, Theano, Chainer, and Caffe, albeit with more portability and support for GPU clusters. In many respects MXNet is similar to TensorFlow, but with the added ability to embed imperative tensor operations.

In addition to the practically obligatory MNIST digit classification, the MXNet tutorials for computer vision cover image classification and segmentation using convolutional neural networks (CNN), object detection using Faster R-CNN, neural art, and large-scale image classification using a deep CNN and the ImageNet data set. There are additional tutorials for natural language processing, speech recognition, adversarial networks, and both supervised and unsupervised machine learning.


Amazon


Amazon tested an Inception v3 algorithm implemented in MXNet on P2.16xlarge instances and found a scaling efficiency of 85 percent.

Scikit-learn

The Scikit-learn Python framework has a wide selection of robust machine learning algorithms, but no deep learning. If you’re a Python fan, Scikit-learn may well be your best option among the plain machine learning libraries.

Scikit-learn is a robust and well-proven machine learning library for Python with a wide assortment of well-established algorithms and integrated graphics. It is relatively easy to install, learn, and use, and it has good examples and tutorials.

On the con side, Scikit-learn does not cover deep learning or reinforcement learning, lacks graphical models and sequence prediction, and can't really be used from languages other than Python. It doesn't support PyPy, the Python just-in-time compiler, or GPUs. That said, except for its minor foray into neural networks, it doesn't really have speed problems. It uses Cython (the Python-to-C compiler) for functions that need to be fast, such as inner loops.

Scikit-learn has a good selection of algorithms for classification, regression, clustering, dimensionality reduction, model selection, and preprocessing. It has good documentation and examples for all of these, but lacks any kind of guided workflow for accomplishing these tasks.

Scikit-learn earns top marks for ease of development, mostly because the algorithms all work as advertised and documented, the APIs are consistent and well designed, and there are few "impedance mismatches" between data structures. It's a pleasure to work with a library where the features have been thoroughly fleshed out and the bugs thoroughly flushed out.
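One visible sign of that consistency is the estimator convention: models expose `fit` and `predict` (or `transform`), `fit` returns the estimator itself, and learned attributes end in a trailing underscore. The toy class below follows those conventions in pure Python to show the shape of the API; `CentroidClassifier` is hypothetical and is not part of scikit-learn.

```python
class CentroidClassifier:
    """Toy classifier following scikit-learn's fit/predict conventions.

    Hypothetical example class. Each sample is a list of floats; each class
    is represented by the mean (centroid) of its training samples.
    """

    def fit(self, X, y):
        sums, counts = {}, {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, value in enumerate(row):
                acc[j] += value
            counts[label] = counts.get(label, 0) + 1
        # Learned attributes get a trailing underscore, as in scikit-learn.
        self.centroids_ = {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}
        return self  # returning self allows clf.fit(X, y).predict(X)

    def predict(self, X):
        def sq_dist(a, b):
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

        # Assign each sample to the class with the nearest centroid.
        return [min(self.centroids_, key=lambda lab: sq_dist(row, self.centroids_[lab]))
                for row in X]


X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
y = ["a", "a", "b", "b"]
clf = CentroidClassifier().fit(X, y)
print(clf.predict([[0.2, 0.4], [5.5, 4.0]]))
```

Because every estimator shares this shape, swapping one algorithm for another usually means changing a single constructor call.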


InfoWorld


This example uses Scikit-learn’s small handwritten digit data set to demonstrate semi-supervised learning using a Label Spreading model. Only 30 of the 1,797 total samples were labeled.
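Label spreading pushes the few known labels through a similarity graph until every sample has a score for each class. The pure-Python toy below runs the standard iteration F ← αSF + (1 − α)Y on six 1-D points, only two of them labeled, where S is the symmetrically normalized RBF affinity matrix and Y holds the known labels. It illustrates the same family of method as scikit-learn's `LabelSpreading` class, but it is not that implementation; `gamma`, `alpha`, and the iteration count are illustrative choices.

```python
import math


def label_spreading(points, labels, gamma=1.0, alpha=0.2, n_iter=100):
    """Toy label spreading on 1-D points; a label of -1 marks an unlabeled sample.

    Assumes every point has at least one neighbor with nonzero affinity.
    """
    n = len(points)
    classes = sorted({lab for lab in labels if lab != -1})
    # RBF affinities with a zero diagonal (no self-loops).
    w = [[0.0 if i == j else math.exp(-gamma * (points[i] - points[j]) ** 2)
          for j in range(n)] for i in range(n)]
    # Symmetric normalization: S = D^(-1/2) W D^(-1/2), with D the row sums of W.
    deg = [sum(row) for row in w]
    s = [[w[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Y: one-hot rows for labeled samples, all zeros for unlabeled ones.
    y = [[1.0 if lab == c else 0.0 for c in classes] for lab in labels]
    f = [row[:] for row in y]
    for _ in range(n_iter):
        # F <- alpha * S F + (1 - alpha) * Y
        f = [[alpha * sum(s[i][k] * f[k][c] for k in range(n)) + (1 - alpha) * y[i][c]
              for c in range(len(classes))] for i in range(n)]
    # Each sample takes the class with the highest spread score.
    return [classes[max(range(len(classes)), key=lambda c: row[c])] for row in f]


points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
labels = [0, -1, -1, 1, -1, -1]  # only two of six points are labeled
print(label_spreading(points, labels))
```

Each unlabeled point inherits the label of the cluster it sits in, since affinities between the two distant clusters are vanishingly small.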

On the other hand, the library does not cover deep learning or reinforcement learning, which leaves out the current hard but important problems, such as accurate image classification and reliable real-time language parsing and translation. Clearly, if you’re interested in deep learning, you should look elsewhere.

Nevertheless, there are many problems, ranging from building a prediction function linking different observations to classifying observations to learning the structure of an unlabeled data set, that lend themselves to plain old machine learning without needing dozens of layers of neurons. For those areas Scikit-learn is very good indeed.

InfoWorld Scorecard

Framework                                Models and    Ease of       Documentation Performance   Ease of       Overall
                                         algorithms    development                               deployment    score
                                         (25%)         (25%)         (20%)         (20%)         (10%)         (100%)
Caffe 1.0 RC3                            8             8             7             9             8             8.0
Microsoft Cognitive Toolkit v2.0 Beta 1  8             9             8             10            9             8.8
MXNet v0.7                               8             8             7             10            8             8.2
Scikit-learn 0.18.1                      9             9             9             8             9             8.8
Spark MLlib 2.01                         9             8             8             9             8             8.5
TensorFlow r0.10                         9             8             9             10            8             8.9
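The Overall column is consistent with a weighted average of the five criteria using the listed weights, rounded half up to one decimal place. The Python sketch below reproduces every row under that rounding assumption. It works in integer hundredths because Python's built-in `round` uses round-half-to-even, and values such as 8.45 are not exactly representable as floats.

```python
# Criterion weights in percent, in scorecard column order: models and algorithms,
# ease of development, documentation, performance, ease of deployment.
WEIGHTS = (25, 25, 20, 20, 10)


def overall_score(scores):
    """Weighted overall score, rounded half up to one decimal, as a string."""
    if len(scores) != len(WEIGHTS):
        raise ValueError("expected one score per criterion")
    hundredths = sum(w * s for w, s in zip(WEIGHTS, scores))  # e.g. 845 means 8.45
    tenths = (hundredths + 5) // 10  # integer round-half-up avoids float surprises
    return f"{tenths // 10}.{tenths % 10}"


print(overall_score((9, 8, 8, 9, 8)))  # Spark MLlib row: 8.5
```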
