Serverless Machine Learning Operations
by Stepan Pushkarev, CTO of Hydrosphere.io
Mission: Accelerate Machine Learning to Production
Open-source products:
- Mist: Spark Compute as a Service
- ML Lambda: ML Function as a Service
- Sonar: Data and ML Monitoring
Business Model: Subscription services and hands-on consulting
About
Ops folks here?
Machine Learning nerds here?
VP/Managers/Strategy?
Development Operations are well studied
Machine Learning operations are ad hoc
● Research phase -> productization phase
● Script-driven: ./bin/spark-submit, python train.py
● Raw SQL / HiveQL / SQL on Hadoop
● Automated with cron and/or workflow managers
● Hosted notebook culture
ML Project Time to Market
- Go-to-production strategy from day 1
- Training: Serverless Spark Compute
- Serving/inferencing: Serverless ML Lambdas
Agenda
Why do companies hire data scientists?
To make products smarter.
What is a data scientist's deliverable?
Academic paper
ML Model R/Python script
Jupyter Notebook
BI Dashboard
How to move this to prod?
Academic paper?
ML Model? R/Python script?
Jupyter Notebook?
BI Dashboard?
Tragedy 1: An engineer re-implements the R/Python script
Tragedy 2: Notebook/script deployments, i.e. running the notebook/script as is with cron
© Daniel Tunkelang - Where should you put your data scientists? - www.slideshare.net/dtunkelang/where-should-you-put-your-data-scientists
Step 1 (management): Integrate data scientists into cross-functional teams
Step 2: Build/Deploy functions, not notebooks
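To make Step 2 concrete, here is a minimal Python sketch, assuming the notebook's job is to turn raw per-user scores into normalized scores. The names `normalize_scores` and `score_users` are hypothetical; the point is that the logic becomes a pure function with explicit inputs and outputs that a scheduler, a service, or a test can call, instead of a sequence of notebook cells with hidden state.

```python
from typing import Dict, List


# Hypothetical example: logic that used to live in notebook cells,
# rewritten as pure functions with explicit inputs and outputs.
def normalize_scores(scores: List[float]) -> List[float]:
    """Min-max scale a list of raw scores into [0, 1]."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All values identical: avoid division by zero.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def score_users(raw: Dict[str, float]) -> Dict[str, float]:
    """Entry point a scheduler or service can call with plain data."""
    users = sorted(raw)
    normalized = normalize_scores([raw[u] for u in users])
    return dict(zip(users, normalized))


if __name__ == "__main__":
    print(score_users({"alice": 10.0, "bob": 20.0, "carol": 15.0}))
```

Because the function has no hidden state, the same code can be unit-tested, deployed behind an API, or run in a batch job without copying notebook cells.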
Step 3: Monitor ML in production with other ML
● Data pipeline statistics
● Anomaly detection
● Pattern recognition
● Keep data scientists in the loop
● Treat data errors as software bugs
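One simple way to turn "data pipeline statistics" and "anomaly detection" into something that fails loudly is a z-score check on a per-batch statistic. This sketch is an assumption, not Sonar's method: it compares the mean of a new batch with the means of earlier batches and raises an exception when the batch drifts too far, so a data error surfaces the same way a software bug would, and the error message can be routed to a data scientist. `DataQualityError`, `check_batch_mean`, and the default threshold of 3 are illustrative choices.

```python
import statistics
from typing import List


class DataQualityError(Exception):
    """Raised when a batch looks anomalous, so it fails like a software bug."""


def check_batch_mean(history: List[float], batch: List[float], z_threshold: float = 3.0) -> float:
    """Compare the mean of a new batch with the means of previous batches.

    Returns the z-score of the new batch mean; raises DataQualityError
    when it exceeds z_threshold.
    """
    if len(history) < 2:
        raise ValueError("need at least two historical batch means")
    mu = statistics.mean(history)
    sigma = statistics.stdev(history)  # sample standard deviation
    batch_mean = statistics.mean(batch)
    if sigma == 0:
        z = 0.0 if batch_mean == mu else float("inf")
    else:
        z = abs(batch_mean - mu) / sigma
    if z > z_threshold:
        raise DataQualityError(
            f"batch mean {batch_mean:.2f} is {z:.1f} sigma away from history"
        )
    return z
```

In a real pipeline the history would come from stored statistics of earlier runs, and a raised `DataQualityError` would stop the job and notify the owner instead of silently feeding bad data to a model.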
Data Pipeline Functions
Batch Prediction Functions
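A hedged sketch of the two kinds of functions named above, in plain Python rather than Spark so it stays self-contained: `clean_rows` stands in for a data pipeline function and `predict_batch` for a batch prediction function that scores every row with a logistic-regression model. Both names, the row format, and the choice of model are illustrative assumptions.

```python
import math
from typing import Dict, Iterable, List, Optional

Row = Dict[str, Optional[float]]


def clean_rows(rows: Iterable[Row], features: List[str]) -> List[Row]:
    """Data pipeline function: keep only rows where every feature is present."""
    return [dict(r) for r in rows if all(r.get(f) is not None for f in features)]


def predict_batch(rows: Iterable[Row], weights: Dict[str, float], bias: float) -> List[float]:
    """Batch prediction function: logistic-regression probability for each row."""
    predictions = []
    for row in rows:
        z = bias + sum(w * row[f] for f, w in weights.items())
        predictions.append(1.0 / (1.0 + math.exp(-z)))
    return predictions


if __name__ == "__main__":
    raw = [{"x": 0.0}, {"x": None}, {"x": 2.0}]
    clean = clean_rows(raw, ["x"])
    print(predict_batch(clean, {"x": 1.0}, bias=0.0))
```

Chaining the two keeps each step independently testable; on Spark the same split would map to a transformation over a DataFrame followed by a scoring step.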
From Vanilla Spark to serverless training and data processing
./bin/spark-submit
- Spark Sessions Pool
- Functions Registry
- Multi-tenancy
- REST API Framework
- Data API Framework
- Infrastructure integration (EMR, Hortonworks, etc.)
UX: Deploy Spark functions and trigger them from apps
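The "Functions Registry" and "deploy Spark functions and trigger them from apps" ideas can be pictured with a small in-process registry. This is a hypothetical sketch, not Mist's API: `FunctionRegistry`, `deploy`, and `trigger` are made-up names, and JSON strings stand in for the REST request and response an app would exchange with the service. In Mist the registered functions would run on a pooled Spark session; here `word_count` is plain Python.

```python
import json
from typing import Any, Callable, Dict, List


class FunctionRegistry:
    """Hypothetical registry: deploy a function under a name, then trigger it
    with JSON parameters, the way an app would over a REST API."""

    def __init__(self) -> None:
        self._functions: Dict[str, Callable[..., Any]] = {}

    def deploy(self, name: str, fn: Callable[..., Any]) -> None:
        self._functions[name] = fn

    def trigger(self, name: str, params_json: str) -> str:
        if name not in self._functions:
            raise KeyError(f"unknown function: {name}")
        params = json.loads(params_json)  # request body -> keyword arguments
        result = self._functions[name](**params)
        return json.dumps({"function": name, "result": result})


def word_count(lines: List[str]) -> Dict[str, int]:
    """Toy job standing in for a Spark function."""
    counts: Dict[str, int] = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


registry = FunctionRegistry()
registry.deploy("word_count", word_count)
```

An app would then call `registry.trigger("word_count", '{"lines": ["a b", "b"]}')` and get back a JSON document with the counts. Exposing functions by name with keyword parameters is what lets the same job be invoked from a web app, a scheduler, or a test.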
Mist - Serverless proxy for Spark
DEMO
Machine Learning: training + serving pipelines
Training (estimation) pipeline: preprocess -> preprocess -> train
Prediction pipeline: preprocess -> preprocess -> model
[Diagram: data scientist, data, cluster, model; deployment as a web app in a docker container with API, libs, and model]
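The key property of the two pipelines is that the prediction pipeline must reuse the preprocessing fitted during training. A minimal Python sketch under simplifying assumptions (one numeric feature, a standardizer as the only preprocessing step, and a toy threshold classifier; all names are hypothetical): `train` returns the fitted scaler and model together as the artifact to ship, and `predict` applies exactly that scaler before the model.

```python
import statistics
from typing import List, Tuple


class Standardizer:
    """Preprocessing step: learns mean/std at training time, reuses them at serving time."""

    def fit(self, xs: List[float]) -> "Standardizer":
        self.mean = statistics.mean(xs)
        self.std = statistics.pstdev(xs) or 1.0  # guard against constant input
        return self

    def transform(self, xs: List[float]) -> List[float]:
        return [(x - self.mean) / self.std for x in xs]


class ThresholdModel:
    """Toy classifier: predicts 1 when the standardized value exceeds a threshold."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def predict(self, zs: List[float]) -> List[int]:
        return [1 if z > self.threshold else 0 for z in zs]


Artifact = Tuple[Standardizer, ThresholdModel]


def train(xs: List[float], ys: List[int]) -> Artifact:
    """Training pipeline: fit preprocessing, then fit the model on its output."""
    scaler = Standardizer().fit(xs)
    zs = scaler.transform(xs)
    pos = [z for z, y in zip(zs, ys) if y == 1]
    neg = [z for z, y in zip(zs, ys) if y == 0]
    # Threshold halfway between the class means in standardized space.
    threshold = (statistics.mean(pos) + statistics.mean(neg)) / 2
    return scaler, ThresholdModel(threshold)


def predict(artifact: Artifact, xs: List[float]) -> List[int]:
    """Prediction pipeline: apply the *fitted* preprocessing, then the model."""
    scaler, model = artifact
    return model.predict(scaler.transform(xs))
```

Shipping the scaler and model as one artifact avoids re-implementing preprocessing in the serving code, which is the gap Tragedy 1 describes.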
Local Spark ML Serving Library: https://github.com/Hydrospheredata/spark-ml-serving
Model Artifact
Models - Runtimes - Formats Zoo
API & Logistics
- HTTP/1.1, HTTP/2, gRPC
- Kafka, Flink, Kinesis
- Protobuf, Avro
- Service Discovery
- Pipelining
- Tracing
- Monitoring
- Autoscaling
- Versioning
- A/B, Canary
- Testing
- CPU, GPU
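Several items in this list (versioning, A/B, canary) come down to deciding which model version handles a request. A hedged sketch of one common approach, hash-based weighted routing: hashing the request ID keeps each caller on the same version across requests, and the integer weights set the traffic split. The function name and the `v1`/`v2` version labels are assumptions for illustration.

```python
import hashlib
from typing import Dict


def route(request_id: str, weights: Dict[str, int]) -> str:
    """Deterministically assign a request to a model version.

    weights maps version name -> integer traffic share, e.g. {"v1": 90, "v2": 10}.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % total
    cumulative = 0
    # Sort for a stable order, so the same id always lands on the same version.
    for version, weight in sorted(weights.items()):
        cumulative += weight
        if bucket < cumulative:
            return version
    raise AssertionError("unreachable: bucket is always below total")
```

For a canary release one might start with `{"v1": 99, "v2": 1}` and raise the `v2` share while monitoring stays clean.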
Sidecar Architecture
UX: Train anywhere and deploy as a Function
UX: Models and Applications
Applications provide public endpoints for the models and compositions of the models.
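A model composition behind one public endpoint can be sketched as an application object that chains model calls. Everything here is hypothetical: `Application`, `serve`, and the two toy "models" (a tokenizer and a keyword-based sentiment scorer) only illustrate the shape of a pipeline application, with each stage taking and returning a JSON-like dict.

```python
from typing import Any, Callable, Dict, List

Payload = Dict[str, Any]
Model = Callable[[Payload], Payload]


class Application:
    """Hypothetical application: one public endpoint in front of a
    sequence of models, where each model's output feeds the next."""

    def __init__(self, name: str, stages: List[Model]) -> None:
        self.name = name
        self.stages = stages

    def serve(self, request: Payload) -> Payload:
        payload = request
        for stage in self.stages:
            payload = stage(payload)
        return payload


def tokenizer(req: Payload) -> Payload:
    return {"tokens": req["text"].lower().split()}


def sentiment(req: Payload) -> Payload:
    positive = {"good", "great"}
    score = sum(1 for t in req["tokens"] if t in positive)
    return {"label": "positive" if score > 0 else "neutral"}


app = Application("sentiment-app", [tokenizer, sentiment])
```

Callers only see `app.serve(...)`; the models behind it can be swapped or versioned without changing the public contract.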
UX: Streaming Applications + Batching
UX: Pipelines, Ensembles and BestSLA Applications
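Ensembles and BestSLA applications differ in how they combine models that receive the same request. The slide does not define BestSLA, so this sketch assumes it means "send the request to every candidate model and return the first successful answer within a deadline"; an ensemble, by contrast, waits for all models and averages their outputs. Function names and the `timeout_s` parameter are illustrative, and `cancel_futures` requires Python 3.9 or later.

```python
import concurrent.futures as cf
from typing import Callable, Sequence

Model = Callable[[float], float]


def ensemble_mean(models: Sequence[Model], x: float) -> float:
    """Ensemble: every model scores the input and the results are averaged."""
    return sum(m(x) for m in models) / len(models)


def best_sla(models: Sequence[Model], x: float, timeout_s: float) -> float:
    """Assumed BestSLA semantics: race all models, return the first success.

    Raises cf.TimeoutError if nothing succeeds before the deadline.
    """
    pool = cf.ThreadPoolExecutor(max_workers=len(models))
    try:
        futures = [pool.submit(m, x) for m in models]
        for fut in cf.as_completed(futures, timeout=timeout_s):
            if fut.exception() is None:  # skip models that failed
                return fut.result()
        raise RuntimeError("every model failed")
    finally:
        # Do not block on slower models; cancel any that have not started.
        pool.shutdown(wait=False, cancel_futures=True)
```

The trade-off: an ensemble's latency is that of its slowest member, while a BestSLA-style race bounds latency at the cost of running redundant requests.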
ML Function as a Service Demo!!!
Thank you
Looking for
- Feedback
- Advisors, mentors & partners
- Pilots and early adopters
Stay in touch
- @hydrospheredata
- https://github.com/Hydrospheredata
- http://hydrosphere.io/