Machine Learning Curtis Huang [email protected]

Modern Machine Learning Infrastructure and Practices



About Me

•  Robotics
•  Pricing and Optimization
•  Big Data, Hadoop and Spark
•  Data Science, ML in Display Advertising
•  ML, Relevance in Sponsored Search
•  Content Ranking for FB Posts

Machine Learning: Why Now?

•  Advantages from mining/learning patterns in data
•  Cost of Storage and Compute
•  Distributed Systems

What People Think

Reality

ML Today

•  Specific Tasks
•  Quality Data
•  Feature Engineering
•  Iterations of Experiments

Domain Knowledge

Statistics

Engineering

ML Workflow

New Hypothesis
•  Data Analysis
•  Problem Formulation
•  Short/Long Term Objectives

Data Preparation
•  Acquire Data
•  Synthesize
•  Clean/Reformat

Feature Engineering
•  Domain Knowledge
•  Creativity
•  Extraction Pipeline

Online Evaluation
•  Bucket Test
•  Launch Criteria
•  Metrics – CTR, Time Spent
•  Performance Impact
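A bucket test can be sketched as deterministic user bucketing followed by a CTR comparison. The snippet below is a minimal illustration, not the speaker's system: the hash-based assignment, the two-proportion z-test, the 1.96 threshold, and all function names are assumptions chosen for the example.

```python
import hashlib
import math


def assign_bucket(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a user to 'control' or 'treatment' (hypothetical scheme)."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    # Interpret the first 8 hex digits as a number in [0, 1).
    position = int(digest[:8], 16) / 16**8
    return "treatment" if position < treatment_share else "control"


def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate; 0.0 when there are no impressions."""
    return clicks / impressions if impressions else 0.0


def two_proportion_z(clicks_a: int, imps_a: int, clicks_b: int, imps_b: int) -> float:
    """z-statistic for CTR(B) - CTR(A) using a pooled standard error."""
    pooled = (clicks_a + clicks_b) / (imps_a + imps_b)
    stderr = math.sqrt(pooled * (1 - pooled) * (1 / imps_a + 1 / imps_b))
    return (ctr(clicks_b, imps_b) - ctr(clicks_a, imps_a)) / stderr


def meets_launch_criteria(z: float, threshold: float = 1.96) -> bool:
    """Assumed launch rule: treatment must beat control with z above the threshold."""
    return z > threshold


# Made-up counts: control = 1000 clicks / 50000 impressions, treatment = 1150 / 50000.
z = two_proportion_z(1000, 50000, 1150, 50000)
launch = meets_launch_criteria(z)
```

Hashing on the experiment name plus the user ID keeps a user in the same bucket across requests while letting different experiments split traffic independently.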

Offline Evaluation
•  Evaluate on Test Set
•  Metrics – PR/AUC/NDCG
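For reference, the offline metrics named on this slide can be computed in a few lines. This is a plain-Python sketch using standard textbook definitions (pairwise AUC, precision/recall at a threshold, NDCG with a log2 discount); the function names and the toy labels and scores are illustrative assumptions.

```python
import math


def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison; ties count as 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def precision_recall(labels, scores, threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if p and y == 1)
    fp = sum(1 for y, p in zip(labels, predicted) if p and y == 0)
    fn = sum(1 for y, p in zip(labels, predicted) if not p and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def dcg(relevances):
    """Discounted cumulative gain; position 1 has discount log2(2) = 1."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(relevances_in_ranked_order, k=None):
    """DCG of the model's ranking divided by the DCG of the ideal ranking."""
    ranked = list(relevances_in_ranked_order)[:k]
    ideal = sorted(relevances_in_ranked_order, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg else 0.0


# Toy test set (made up for illustration).
labels = [1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2]
test_auc = auc(labels, scores)
test_pr = precision_recall(labels, scores, threshold=0.65)
```

The pairwise AUC is quadratic in the number of examples, which is fine for a small test set; a sort-based version is the usual choice at scale.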

Model Training
•  Training Algorithm
•  Hyper Parameter Tuning
•  Over-fitting
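Hyperparameter tuning and over-fitting can be illustrated together with a small grid search on held-out data. The sketch below assumes a k-nearest-neighbour regressor on synthetic data purely because it fits in a few lines; the model, the data, and the candidate values of `k` are not from the talk.

```python
import random


def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mse(train, data, k):
    """Mean squared error of the k-NN model fitted on `train`, evaluated on `data`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)


def tune_k(train, valid, candidates):
    """Return (best_k, report); report maps k -> (train_mse, valid_mse)."""
    report = {k: (mse(train, train, k), mse(train, valid, k)) for k in candidates}
    best_k = min(report, key=lambda k: report[k][1])  # select on validation error only
    return best_k, report


# Synthetic data: y = x^2 plus Gaussian noise (illustrative only).
rng = random.Random(0)
points = [(x / 10, (x / 10) ** 2 + rng.gauss(0, 1.0)) for x in range(60)]
rng.shuffle(points)
train, valid = points[:40], points[40:]

best_k, report = tune_k(train, valid, candidates=[1, 3, 5, 9, 15])
```

With `k = 1` the model memorises the training set, so its training error is zero while its validation error is not; that gap is the over-fitting signal, and it is why the hyperparameter is chosen on the validation split rather than on training error.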

ML Workflow

Data Algorithm Train Model Fault Tolerant Deployment

Example of a ML System

•  Datastore
•  ETL
•  Ad-hoc Analysis
•  ML Framework
•  Distributed KV-Store
•  Snapshot
•  Realtime Features
•  Algorithm Service
•  Logging Service
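The slide names the components but not how they are wired together. One plausible arrangement, sketched below strictly as an assumption, is that the algorithm service reads snapshot features from the KV-store, overlays realtime features, scores the item, and hands the exact features it used to the logging service. Every class, method, feature name, and weight here is hypothetical.

```python
import json
import math


class InMemoryKVStore:
    """Stand-in for the distributed KV-store; a real one would be a remote service."""

    def __init__(self):
        self._data = {}

    def put(self, key, features):
        self._data[key] = dict(features)

    def get(self, key):
        # Return a copy so callers cannot mutate stored features.
        return dict(self._data.get(key, {}))


class LoggingService:
    """Collects serialized records; a real service would ship them to the datastore."""

    def __init__(self):
        self.records = []

    def log(self, record):
        self.records.append(json.dumps(record, sort_keys=True))


class AlgorithmService:
    """Scores an item with a logistic model over snapshot plus realtime features."""

    def __init__(self, weights, snapshot_store, realtime_store, logger):
        self.weights = weights
        self.snapshot_store = snapshot_store
        self.realtime_store = realtime_store
        self.logger = logger

    def score(self, item_id):
        features = self.snapshot_store.get(item_id)
        features.update(self.realtime_store.get(item_id))  # realtime overrides snapshot
        z = sum(self.weights.get(name, 0.0) * value for name, value in features.items())
        score = 1.0 / (1.0 + math.exp(-z))
        # Log exactly what was served so offline training sees the same features.
        self.logger.log({"item_id": item_id, "features": features, "score": score})
        return score


snapshot, realtime, logger = InMemoryKVStore(), InMemoryKVStore(), LoggingService()
snapshot.put("ad-1", {"ctr_7d": 0.5})
realtime.put("ad-1", {"clicks_5m": 2.0})
service = AlgorithmService({"ctr_7d": 2.0, "clicks_5m": -0.5}, snapshot, realtime, logger)
served_score = service.score("ad-1")
```

Logging the served features, rather than recomputing them later in ETL, is one common way to keep online and offline feature values consistent.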

Challenges and Lessons Learned

•  Ad-hoc Analysis
•  Adding and Validating New Features
•  Gap between Online/Offline Metrics
•  System/Other Issues

4 V’s of Big Data

Deep Learning

Word2Vec[1] in Spark

[1] Mikolov et al.
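As background on what Word2Vec trains on: in its skip-gram form, the model learns from (center word, context word) pairs taken from a sliding window over the text. The snippet below sketches only that pair-extraction step in plain Python. It is not Spark's implementation, and the function name and window size are illustrative assumptions.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


sentence = "machine learning needs quality data".split()
pairs = skipgram_pairs(sentence, window=1)
```

Each pair becomes one training example in which the embedding of the center word is adjusted to make the context word more predictable.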

•  ConvNet for computer vision tasks
•  Network architecture

Krizhevsky et al. (2011)
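The basic building block of a ConvNet is a small filter slid across the image, followed by a nonlinearity. The sketch below shows a single "valid" 2-D convolution (strictly, cross-correlation, as is conventional in deep learning) plus ReLU in plain Python. The toy image and the edge-detecting kernel are made up for illustration and are not taken from the cited architecture.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with no padding and stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Element-wise max(0, x)."""
    return [[max(0, value) for value in row] for row in feature_map]


# A 3x4 image that is dark on the left and bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
vertical_edge = [[-1, 1]]  # responds where brightness increases left to right
feature_map = relu(conv2d_valid(image, vertical_edge))
```

The filter responds only at the column where the brightness changes. A trained network learns many such filters per layer instead of having them specified by hand.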

Reed et al. (2016)

Challenges

•  Expensive computation in training (clusters, GPUs)
•  Interpretability of model
•  Power consumption

Thank You

Curtis Huang [email protected]