Machine Learning Streams with Spark 1.0

  • Seattle Spark Meetup: Machine Learning Streams with Spark 1.0. Drew Minkin, Principal Program Manager, Ubix Labs
  • Agenda: Machine Learning and Business Analytics; Streams and Real Time Analytics; Deep Dive into MLlib
  • Machine Learning and Business Analytics
  • Machine Learning is Not A Spectator Sport
  • Machine Learning and Data Science (source: http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram)
  • The Analytics Spectrum (figure source: http://halobi.com/wp-content/uploads/Blog-1-1024x600.png). Axes: Reactive to Proactive, Research to Production. Stages: Descriptive, Predictive, Prescriptive. Figure labels: Graph, Data Management, Simulation, Process Improvement, Content Delivery, Knowledge Management, Data Modeling, Visualization, Data Quality, Monitoring, Analysis, Optimization, Algorithms, Trialing, Statistics, Domain Expertise, Integration, Big Data, Collaboration
  • Five Families of Algorithms (source: http://en.wikipedia.org/wiki/Wu_Xing): Association, Classification, Estimation, Forecasting, Clustering
  • Classification (source: http://akorra.com/2012/06/06/top-10-creatures-that-influenced-martial-arts/). Target a discrete answer (yes/no); find all columns driving its value; use the model to score new records; many different measures of accuracy; quick and improving iterations; most actionable types of models. Examples: Hospital Readmission, Equipment Failure, Likelihood to Purchase, Credit Scoring Banding
  • Association and Sequencing (source: http://38.media.tumblr.com/tumblr_m81wcfIO3V1qmzwx0o1_1280.jpg). Transactions and items in; rules, sequences, and itemsets out. Examples: Collaborative Filtering, Recommender Systems, identify cross-sell, identify sequential next-sale, make purchase recommendations, complex event associations
  • Forecasting and Time Series (source: http://akorra.com/2012/06/06/top-10-creatures-that-influenced-martial-arts/). Input of a measure over time and related series; predictions generated for short-term trends; based on cycles and events. Examples: Workforce Optimization, Timing Purchasing Decisions, Optimizing Maintenance Windows, Material Cost Planning, Equipment Usage Planning, Demand Sensing
  • Estimation and Regression (source: http://akorra.com/2012/06/06/top-10-creatures-that-influenced-martial-arts/). Predicting a continuous distribution; many different measures of accuracy; quick and improving iterations; most actionable types of models. Examples: Length of Stay Estimation, Customer Lifetime Value, Pricing Optimization
  • Clustering (source: http://akorra.com/2012/06/06/top-10-creatures-that-influenced-martial-arts/). Hard and soft groupings; profiles of subgroups; likenesses and differences. Examples: Marketing Campaigns, Reward Programs, Equipment Utilization, Process Improvement Analysis, Market Segmentation
  • Combining Algorithms in Harmony (source: http://en.wikipedia.org/wiki/Wu_Xing)
  • Streams and Real Time Analytics
  • Agenda, Streams and Real Time Analytics: The Challenges of Scaling Analytics; Classes of Analytics Complexity; Spark vs. Storm, etc.; Stream Paradigms and Spark
  • Will Business Run out of Modeling Opportunities?
  • The Approaching Crisis for Machine Learning
  • Hype vs. Reality in Scaling Data Science (source: http://www.kdnuggets.com/2013/04/poll-results-largest-dataset-analyzed-data-mined.html)
  • 2009 vs. 2014: Scaling Data Science (source: http://www.kdnuggets.com)
  • Spectrum of Stream Based Analytics (figure source: http://www.cs.ucr.edu/~mueen/ppt/StreamInsigh%205%20SLIDE%20DEMO.pptx). Axes: Latency (Months, Days, Hours, Minutes, Seconds, 100 ms, < 1 ms) against Events/Sec (0, 10, 10^2, 10^3, 10^4, 10^5, 10^6). Plotted: Big Data, NoSQL, RDBMS, Business Monitoring, Machine Monitoring, Real Time Monitoring, Web Analytics, EDW Analytics, Operational Analytics
  • Challenges of Stream Based Applications (source: http://www.cs.ucr.edu/~mueen/ppt/StreamInsigh%205%20SLIDE%20DEMO.pptx). Sources: devices, sensors, web servers, feeds. Complex analytics and mining
  • Challenges of Stream Based Applications (source: http://www.cs.ucr.edu/~mueen/ppt/StreamInsigh%205%20SLIDE%20DEMO.pptx). Time window management: hopping windows, tumbling windows. Event synchronization. Latency
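The difference between tumbling and hopping windows can be sketched in a few lines of plain Python. This is an illustration only, not Spark Streaming code: the function names, the integer timestamps, and the sample events are all assumptions made for the example. A tumbling window assigns each event to exactly one fixed-size bucket, while a hopping window of the same size that advances by a smaller hop places the event in every overlapping bucket.

```python
def tumbling_window(timestamp, size):
    """Return the (start, end) of the single tumbling window containing timestamp."""
    start = (timestamp // size) * size
    return (start, start + size)


def hopping_windows(timestamp, size, hop):
    """Return every (start, end) hopping window that contains timestamp.

    Windows start at non-negative multiples of `hop` and span `size` time
    units, so they overlap whenever hop < size.
    """
    windows = []
    # Latest window start at or before the timestamp.
    start = (timestamp // hop) * hop
    # A window [start, start + size) contains the timestamp while start > timestamp - size.
    while start > timestamp - size and start >= 0:
        windows.append((start, start + size))
        start -= hop
    return sorted(windows)


# Hypothetical event timestamps, in seconds.
events = [1, 4, 7, 12]

# Count events per 5-second tumbling window.
tumbling_counts = {}
for t in events:
    window = tumbling_window(t, size=5)
    tumbling_counts[window] = tumbling_counts.get(window, 0) + 1

# An event at t=7 falls into two 10-second windows that hop every 5 seconds.
overlapping = hopping_windows(7, size=10, hop=5)
```

When the hop equals the window size, the hopping case reduces to the tumbling case, which is why the two are often treated as one mechanism with two parameters.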
  • Deep Dive into MLlib
  • Agenda, Deep Dive into MLlib: Architecture; Descriptive Analytics; Predictive Analytics; Prescriptive Analytics
  • MLlib Descriptive Analytics (figure: The Analytics Spectrum, source: http://halobi.com/wp-content/uploads/Blog-1-1024x600.png)
  • MLlib Descriptive Analytics - Data Types: Dense Vectors (source: http://halobi.com/wp-content/uploads/Blog-1-1024x600.png)
  • MLlib Descriptive Analytics - Data Types: Sparse Vectors (source: http://halobi.com/wp-content/uploads/Blog-1-1024x600.png)
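To make the dense versus sparse distinction concrete, here is a minimal plain-Python sketch. It does not use MLlib; the `(size, indices, values)` tuple layout and the helper names are assumptions chosen for illustration. A dense vector stores every element, while a sparse vector stores only the positions and values of the nonzero elements.

```python
def to_sparse(dense):
    """Convert a dense list into (size, indices, values), keeping only nonzeros."""
    indices = [i for i, v in enumerate(dense) if v != 0.0]
    values = [dense[i] for i in indices]
    return (len(dense), indices, values)


def to_dense(sparse):
    """Expand a (size, indices, values) tuple back into a full list."""
    size, indices, values = sparse
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = v
    return dense


def sparse_dot(a, b):
    """Dot product of two sparse vectors, touching only stored entries."""
    size_a, idx_a, val_a = a
    size_b, idx_b, val_b = b
    if size_a != size_b:
        raise ValueError("vectors must have the same size")
    lookup = dict(zip(idx_b, val_b))
    return sum(v * lookup[i] for i, v in zip(idx_a, val_a) if i in lookup)


dense_vec = [1.0, 0.0, 3.0]
sparse_vec = to_sparse(dense_vec)      # stores two of the three elements
round_trip = to_dense(sparse_vec)
squared_norm = sparse_dot(sparse_vec, sparse_vec)
```

The sparse form pays off when most elements are zero, since storage and the dot product scale with the number of nonzeros rather than the vector length.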
  • MLlib Descriptive Analytics - Data Types: Linear Algebra (source: http://halobi.com/wp-content/uploads/Blog-1-1024x600.png). CoordinateMatrix, DistributedMatrix, IndexedRow, IndexedRowMatrix, MatrixEntry, RowMatrix
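The relationship between coordinate-style entries and indexed rows can be sketched in plain Python. The `MatrixEntry` named tuple below only borrows the name from the slide; it and the `to_indexed_rows` helper are hypothetical stand-ins for illustration, not the MLlib classes. Each entry is a `(row, column, value)` triple, and grouping entries by row index yields one `(index, row vector)` pair per non-empty row.

```python
from collections import namedtuple

# Hypothetical stand-in for a coordinate-format entry: row i, column j, value.
MatrixEntry = namedtuple("MatrixEntry", ["i", "j", "value"])


def to_indexed_rows(entries, num_cols):
    """Group coordinate entries into (row_index, dense_row) pairs, sorted by row."""
    rows = {}
    for entry in entries:
        row = rows.setdefault(entry.i, [0.0] * num_cols)
        row[entry.j] = entry.value
    return sorted(rows.items())


entries = [
    MatrixEntry(0, 1, 2.0),
    MatrixEntry(2, 0, 5.0),
    MatrixEntry(0, 2, 3.0),
]
indexed_rows = to_indexed_rows(entries, num_cols=3)
```

Row 1 has no entries and therefore does not appear in the result, which is the point of keeping an explicit index alongside each row.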
  • MLlib Descriptive Analytics - Summary Statistics (source: http://halobi.com/wp-content/uploads/Blog-1-1024x600.png). Sample size; maximum value of each column; minimum value of each column; sample mean vector; sample variance vector; number of nonzero elements
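The column-wise statistics listed on this slide can be computed as follows in plain Python. This is a sketch, not the MLlib implementation: the function name, the dictionary keys, and the sample rows are assumptions, and the variance here uses the unbiased estimator (division by n - 1).

```python
def column_summary(rows):
    """Compute per-column summary statistics for a list of equal-length rows."""
    n = len(rows)
    if n < 2:
        raise ValueError("need at least two rows for a sample variance")
    columns = list(zip(*rows))
    mean = [sum(col) / n for col in columns]
    # Unbiased sample variance: divide by n - 1.
    variance = [
        sum((x - m) ** 2 for x in col) / (n - 1)
        for col, m in zip(columns, mean)
    ]
    return {
        "count": n,
        "max": [max(col) for col in columns],
        "min": [min(col) for col in columns],
        "mean": mean,
        "variance": variance,
        "num_nonzeros": [sum(1 for x in col if x != 0) for col in columns],
    }


# Hypothetical data: three observations of two features.
rows = [[1.0, 0.0], [3.0, 4.0], [5.0, 0.0]]
summary = column_summary(rows)
```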
  • MLlib Descriptive Analytics - SVD (source: http://public.lanl.gov/mewall/kluwer2002.html). Singular Value Decomposition can collapse sparse matrices to denser forms
  • MLlib Descriptive Analytics - PCA (source: http://halobi.com/wp-content/uploads/Blog-1-1024x600.png). Principal Component Analysis reduces dimensionality by projecting the features onto a smaller set of components
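As a rough illustration of what the first principal component is, the sketch below builds a sample covariance matrix and finds its dominant eigenvector by power iteration. Power iteration is a technique swapped in here for brevity; it is not claimed to be how MLlib computes PCA, and the function names, starting vector, and data are assumptions.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix (dividing by n - 1) of a list of rows."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    centered = [[r[j] - means[j] for j in range(dims)] for r in rows]
    return [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(dims)]
        for a in range(dims)
    ]


def first_principal_component(rows, iterations=100):
    """Approximate the top eigenvector of the covariance matrix by power iteration."""
    cov = covariance_matrix(rows)
    dims = len(cov)
    # Assumed starting vector; it must not be orthogonal to the top component.
    v = [1.0] + [0.0] * (dims - 1)
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(dims)) for a in range(dims)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


# Hypothetical data lying exactly on the line y = x.
points = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
component = first_principal_component(points)
```

For these points all of the variance lies along the diagonal, so the component has equal weight on both features and a single projected coordinate describes the data without loss.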
  • MLlib Predictive Analytics (figure: The Analytics Spectrum, source: http://halobi.com/wp-content/uploads/Blog-1-1024x600.png)
  • MLlib Predictive Analytics - Bayesian Classifier (source: http://xkcd.com/1132/)
  • MLlib Predictive Analytics - Logistic Regression (source: http://halobi.com/wp-content/uploads/Blog-1-1024x600.png). Granddaddy of algorithms; coefficients from states or exact values; small scores can make big changes
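A minimal sketch of logistic regression trained by batch gradient descent is shown below in plain Python. It is meant to show where the coefficients come from, not to mirror MLlib's implementation; the step size, iteration count, function names, and the tiny one-feature dataset are all assumptions.

```python
import math


def sigmoid(z):
    """Map a raw score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(points, labels, step=0.5, iterations=500):
    """Fit weights and bias by batch gradient descent on the logistic loss."""
    weights = [0.0] * len(points[0])
    bias = 0.0
    n = len(points)
    for _ in range(iterations):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(points, labels):
            score = bias + sum(w * xi for w, xi in zip(weights, x))
            error = sigmoid(score) - y  # derivative of the loss w.r.t. the score
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        # Step against the average gradient.
        weights = [w - step * g / n for w, g in zip(weights, grad_w)]
        bias -= step * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Return 1 when the predicted probability is at least 0.5, else 0."""
    score = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if sigmoid(score) >= 0.5 else 0


# Hypothetical one-feature data: negative values are class 0, positive are class 1.
points = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]]
labels = [0, 0, 0, 1, 1, 1]
weights, bias = train_logistic(points, labels)
predictions = [predict(weights, bias, x) for x in points]
```

Because the sigmoid is steepest near a score of zero, a small change in a coefficient can flip predictions for points close to the decision boundary.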
  • MLlib Predictive Analytics - SVM (sources: http://www.youtube.com/watch?v=3liCbRZPrZA, http://www.projectrho.com/public_html/rocket/fasterlight.php). Linear Support Vector Machine for classifiers. Behold the kernel trick
  • MLlib Predictive Analytics - Regression (source: http://halobi.com/wp-content/uploads/Blog-1-1024x600.png). Linear, Ridge, and Lasso (Least Absolute Shrinkage and Selection Operator)
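The difference between the three regression variants is easiest to see with a single feature and no intercept, where each has a closed-form solution. The sketch below is plain Python under stated assumptions: the objective is 0.5 * sum((y - w*x)^2) plus either 0.5 * lam * w^2 (ridge) or lam * |w| (lasso), which may be scaled differently from what MLlib uses, and the function names and data are illustrative.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, clamping to zero inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_one_feature(xs, ys, penalty="none", lam=0.0):
    """Closed-form coefficient for one feature, no intercept.

    penalty="none":  ordinary least squares
    penalty="ridge": adds 0.5 * lam * w**2 to the loss
    penalty="lasso": adds lam * abs(w) to the loss
    """
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    if penalty == "ridge":
        return sxy / (sxx + lam)
    if penalty == "lasso":
        return soft_threshold(sxy, lam) / sxx
    return sxy / sxx


# Hypothetical data generated by y = 2 * x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

plain = fit_one_feature(xs, ys)
ridge = fit_one_feature(xs, ys, penalty="ridge", lam=14.0)
lasso = fit_one_feature(xs, ys, penalty="lasso", lam=14.0)
lasso_strong = fit_one_feature(xs, ys, penalty="lasso", lam=30.0)
```

Ridge shrinks the coefficient but never sets it exactly to zero, whereas a large enough lasso penalty removes the feature entirely, which is the "selection" in the name.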
  • MLlib Predictive Analytics - K-means (source: http://halobi.com/wp-content/uploads/Blog-1-1024x600.png)
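K-means alternates between assigning each point to its nearest center and moving each center to the mean of its assigned points. The plain-Python sketch below shows that loop (Lloyd's algorithm) under simplifying assumptions: the initial centers are supplied by hand rather than chosen by an initialization scheme, the iteration count is fixed, and the names and data are illustrative.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centers, iterations=10):
    """Run a fixed number of assign-and-update rounds and return the centers."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)),
                key=lambda i: squared_distance(p, centers[i]),
            )
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        # An empty cluster keeps its previous center.
        centers = [
            [sum(coord) / len(cluster) for coord in zip(*cluster)]
            if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers


# Hypothetical data: two well-separated groups of two points each.
points = [[1.0, 1.0], [1.0, 2.0], [9.0, 9.0], [10.0, 9.0]]
initial_centers = [[0.0, 0.0], [10.0, 10.0]]
final_centers = kmeans(points, initial_centers)
```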
  • MLlib Predictive Analytics - Matrix Factorization (source: http://halobi.com/wp-content/uploads/Blog-1-1024x600.png). Collaborative Filtering; Alternating Least Squares (ALS)
  • Prescriptive Analytics (figure: The Analytics Spectrum, source: http://halobi.com/wp-content/uploads/Blog-1-1024x600.png)
  • MLlib Prescriptive Analytics - Gradient Descent (sources: http://bleedingedgemachine.blogspot.com/2012/12/gradient-descent.html, http://kungfupanda.wikia.com/wiki/Monkey). Linear and nonlinear optimization; minimize smooth functions without constraints,