Machine Learning Frameworks
Brendan Herger
https://xkcd.com/530/
Load Data → Feature Engineering → ML
Why are we here?
There are many awesome, competing, overlapping machine learning libraries,
and people usually stick to one.
• Criteria
• Overview of libraries
• Case study
Criteria
• What each project does
• Scalability
• Usability
• Integration
Overview of Libraries
R / CRAN: Strengths
• Algorithm selection - CRAN Library
• Community
• Commercial backing - Revolution Analytics / RStudio
• Packages (plotting: ggplot2; time series: SARIMA / GARCH models)
R / CRAN: Weaknesses
• Algorithm implementation
• Documentation
• Computational limits
• Ongoing support
R / CRAN: Roadmap
• Go write it (the roadmap is whatever the community contributes)
• Commercial support: multi-threading, scalability, Hadoop MapReduce
Pandas / SKLearn: Strengths
• Unified APIs
• Algorithm Choice / quick iteration
• Documentation
• Other Python packages
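The "Unified APIs" point is what makes quick iteration cheap: scikit-learn estimators share the same `fit(X, y)` / `predict(X)` shape, so swapping algorithms is a one-line change. A minimal, stdlib-only sketch of that convention follows; `MeanRegressor`, `LinearRegressor1D` and `mean_squared_error` are hypothetical stand-ins written for illustration, not the scikit-learn classes themselves.

```python
# Toy, stdlib-only imitation of the scikit-learn fit/predict convention.
# All class and function names below are hypothetical illustrations.
from statistics import mean


class MeanRegressor:
    """Baseline: always predicts the mean of the training targets."""

    def fit(self, X, y):
        self.mean_ = mean(y)
        return self  # returning self allows model.fit(...).predict(...)

    def predict(self, X):
        return [self.mean_ for _ in X]


class LinearRegressor1D:
    """Ordinary least squares on one feature: y ~ slope * x + intercept."""

    def fit(self, X, y):
        xs = [row[0] for row in X]
        x_bar, y_bar = mean(xs), mean(y)
        sxy = sum((x - x_bar) * (t - y_bar) for x, t in zip(xs, y))
        sxx = sum((x - x_bar) ** 2 for x in xs)
        self.slope_ = sxy / sxx
        self.intercept_ = y_bar - self.slope_ * x_bar
        return self

    def predict(self, X):
        return [self.slope_ * row[0] + self.intercept_ for row in X]


def mean_squared_error(y_true, y_pred):
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


X = [[1.0], [2.0], [3.0], [4.0]]
y = [3.0, 5.0, 7.0, 9.0]  # exactly y = 2x + 1

# Both models share fit/predict, so the evaluation loop does not care which is which.
results = {}
for model in (MeanRegressor(), LinearRegressor1D()):
    preds = model.fit(X, y).predict(X)
    results[type(model).__name__] = mean_squared_error(y, preds)

print(results)  # {'MeanRegressor': 5.0, 'LinearRegressor1D': 0.0}
```

With scikit-learn installed, the same loop should work over real estimators such as `sklearn.dummy.DummyRegressor` and `sklearn.linear_model.LinearRegression`, which is the practical payoff of a shared interface.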
Pandas / SKLearn: Weaknesses
• Limited to one machine
• Two packages shimmed together
• Python boilerplate
Pandas / SKLearn: Roadmap
• Python: Multithreading (Project Ibis)
• Pandas: Rapid development, better time series, better SKLearn integration
• SKLearn: C Implementations of some algorithms, awaiting new algorithms
Spark MLlib: Strengths
• Commercial Backing
• Programming Languages
• Community
Spark MLlib: Weaknesses
• Algorithm Implementations
• Documentation
• Community
Spark MLlib: Roadmap
• ML - Pipelines for MLlib
• “Datasets”: RDDs + “DataFrames”
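The "ML - Pipelines" item refers to chaining stages: Estimators are fit to produce Transformers, and each stage's output feeds the next. The real entry point is `pyspark.ml.Pipeline(stages=[...])`, running on distributed DataFrames; what follows is only a single-machine, stdlib-only imitation of that pattern, and every `Toy*` class in it is hypothetical.

```python
# Single-machine imitation of the Estimator / Transformer / Pipeline pattern.
# Every Toy* class is hypothetical; this is not the pyspark.ml API.
from statistics import mean, pstdev


class ToyStandardScaler:
    """Estimator: learns mean and std at fit time, returns a fitted Transformer."""

    def fit(self, rows):
        return ToyScalerModel(mean(rows), pstdev(rows))


class ToyScalerModel:
    """Transformer: applies the scaling learned during fit."""

    def __init__(self, mu, sigma):
        self.mu, self.sigma = mu, sigma

    def transform(self, rows):
        return [(x - self.mu) / self.sigma for x in rows]


class ToySquare:
    """Stateless Transformer (a simple feature-engineering step)."""

    def transform(self, rows):
        return [x * x for x in rows]


class ToyPipeline:
    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            # Estimators are fit on the output of the earlier stages;
            # plain Transformers are used as-is.
            model = stage.fit(rows) if hasattr(stage, "fit") else stage
            # The fitted stage then transforms the data for the next stage.
            rows = model.transform(rows)
            fitted.append(model)
        return ToyPipelineModel(fitted)


class ToyPipelineModel:
    """Fitted pipeline: a chain of Transformers."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


train = [1.0, 3.0]  # mean 2.0, population std 1.0
model = ToyPipeline([ToyStandardScaler(), ToySquare()]).fit(train)
print(model.transform(train))  # [1.0, 1.0]
print(model.transform([4.0]))  # [4.0]
```

The design point is that the fitted `ToyPipelineModel` reuses the statistics learned at fit time, so new data such as `[4.0]` is scaled with the training mean and standard deviation rather than its own.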
H2O.ai: Strengths
• Speed & Data Scale
• Programming Languages
• Commercial Backing
H2O.ai: Weaknesses
• Algorithm Choice, Utility Functions
• Community Support
H2O.ai: Roadmap
• Go to the keynote talks
Which is best?
[ ] is best