Serverless Machine Learning Operations by Stepan Pushkarev CTO of Hydrosphere.io


Page 1: Serverless machine learning operations

Serverless Machine Learning Operations

by Stepan Pushkarev CTO of Hydrosphere.io

Page 2: Serverless machine learning operations

About

Mission: Accelerate Machine Learning to Production

Open-source products:

- Mist: Spark Compute as a Service

- ML Lambda: ML Function as a Service

- Sonar: Data and ML Monitoring

Business Model: Subscription services and hands-on consulting

Page 3: Serverless machine learning operations

Ops folks here?

Machine Learning nerds here?

VP/Managers/Strategy?

Page 4: Serverless machine learning operations

Development Operations are well studied

Page 5: Serverless machine learning operations

Machine Learning operations are ad hoc

● Research phase -> productization phase

● Script-driven: ./bin/spark-submit, python train.py

● Raw SQL / HiveQL / SQL on Hadoop

● Automated with Cron and/or Workflow Managers

● Hosted Notebooks culture

Page 6: Serverless machine learning operations

ML Project Time to Market

Page 7: Serverless machine learning operations

ML Project Time to Market

Page 8: Serverless machine learning operations

ML Project Time to Market

Page 9: Serverless machine learning operations

Agenda

- Go-to-production strategy from day 1

- Training: Serverless Spark Compute

- Serving/inferencing: Serverless ML Lambdas

Page 10: Serverless machine learning operations

Why does business hire data scientists?

Page 11: Serverless machine learning operations

Why do companies hire data scientists?

To make products smarter.

Page 12: Serverless machine learning operations

What is the deliverable of a data scientist?

Academic paper

ML Model

R/Python script

Jupyter Notebook

BI Dashboard

Page 13: Serverless machine learning operations

How to move this to prod?

Academic paper?

ML Model? R/Python script?

Jupyter Notebook?

BI Dashboard?

Page 14: Serverless machine learning operations

Tragedy 1: An engineer re-implements the R/Python script

Page 15: Serverless machine learning operations

Tragedy 2: Notebook/scripts deployments

Page 16: Serverless machine learning operations

Tragedy 2: Run the notebook/script as is using cron

Page 17: Serverless machine learning operations

© Daniel Tunkelang - Where should you put your data scientists? - www.slideshare.net/dtunkelang/where-should-you-put-your-data-scientists

Step 1 (management): Integrate data scientists into cross-functional teams

Page 18: Serverless machine learning operations

Step 2: Build/Deploy functions, not notebooks
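A minimal sketch of what "functions, not notebooks" could look like in Python. Everything here is hypothetical and for illustration only: the names `ScoreRequest` and `score`, the fields, and the weights are assumptions, not part of any Hydrosphere product. The point is that inputs and outputs are explicit, so the same unit can be tested, versioned, and deployed behind an API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreRequest:
    """Explicit input contract (hypothetical fields)."""
    age: float
    income: float


# Hypothetical parameters; in practice these would be loaded from a trained model artifact.
WEIGHTS = {"age": 0.5, "income": 0.25}
BIAS = 1.0


def score(request: ScoreRequest) -> float:
    """Pure function: no notebook globals, no hidden state, no side effects."""
    return BIAS + WEIGHTS["age"] * request.age + WEIGHTS["income"] * request.income
```

For example, `score(ScoreRequest(age=2.0, income=4.0))` returns `3.0`, and that call can live in a unit test rather than in a notebook cell.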

Page 19: Serverless machine learning operations

Step 3: Monitor ML in production with other ML

● Data pipeline statistics

● Anomaly detection

● Pattern recognition

● Keep Data Scientist in the loop

● Treat data errors as software bugs
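The first two bullets above can be sketched with the Python standard library. This is an illustrative assumption, not how Sonar works: `summarize` collects basic statistics for one numeric feature, and `is_anomalous` flags a production batch whose mean drifts more than `threshold` standard errors from a baseline recorded at training time. Function names and the default threshold are hypothetical.

```python
import math
import statistics


def summarize(values):
    """Data pipeline statistics for one numeric feature."""
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "stdev": statistics.pstdev(values),
    }


def is_anomalous(batch, baseline_mean, baseline_stdev, threshold=3.0):
    """Flag a batch whose mean is more than `threshold` standard errors from the baseline mean."""
    if not batch or baseline_stdev == 0:
        raise ValueError("need a non-empty batch and a non-zero baseline stdev")
    standard_error = baseline_stdev / math.sqrt(len(batch))
    z = abs(statistics.fmean(batch) - baseline_mean) / standard_error
    return z > threshold
```

A flagged batch would then be raised like a failing test: routed to the data scientist who owns the model rather than silently scored.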

Page 20: Serverless machine learning operations

Data Pipeline Functions

Page 21: Serverless machine learning operations

Batch Prediction Functions

Page 22: Serverless machine learning operations

From Vanilla Spark to serverless training and data processing

./bin/spark-submit

- Spark Sessions Pool

- Functions Registry

- Multi-tenancy

- REST API Framework

- Data API Framework

- Infrastructure Integration (EMR, Hortonworks, etc)
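To make the "Functions Registry" item concrete, here is a toy in-process registry in plain Python. It is a sketch under stated assumptions and does not reproduce Mist's actual API: `register`, `invoke`, and the `word-count` function are hypothetical names. A real service would expose `invoke` over REST and run the function on a pooled Spark session instead of calling it locally.

```python
from typing import Any, Callable, Dict

# Hypothetical in-process registry; a real service would sit behind REST and run on a cluster.
_REGISTRY: Dict[str, Callable[..., Any]] = {}


def register(name: str):
    """Decorator that publishes a function under a stable name."""
    def decorator(fn):
        if name in _REGISTRY:
            raise ValueError(f"function {name!r} is already registered")
        _REGISTRY[name] = fn
        return fn
    return decorator


def invoke(name: str, **params: Any) -> Any:
    """Look up a registered function and call it with the request parameters."""
    try:
        fn = _REGISTRY[name]
    except KeyError:
        raise KeyError(f"unknown function {name!r}") from None
    return fn(**params)


@register("word-count")
def word_count(text: str) -> int:
    """Example function: counts whitespace-separated words."""
    return len(text.split())
```

An application would then call `invoke("word-count", text="a b c")` by name, without knowing how or where the function runs.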

Page 23: Serverless machine learning operations
Page 24: Serverless machine learning operations

UX: Deploy Spark functions and trigger them from apps

Page 25: Serverless machine learning operations

Mist - Serverless proxy for Spark

DEMO

Page 26: Serverless machine learning operations

Machine Learning: training + serving

Page 27: Serverless machine learning operations

Training (Estimation) pipeline

[Diagram: pipeline of preprocess, preprocess, train]

Page 28: Serverless machine learning operations

Prediction Pipeline

[Diagram: pipeline of preprocess, preprocess]
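The relationship between the training and prediction pipelines can be sketched in plain Python. This is a toy illustration, not Spark ML: `Standardize` and `MeanThresholdModel` are hypothetical stand-ins, and a single preprocess stage stands in for the stages in the diagrams. What it shows is that the preprocess stage is fitted once during training and reused unchanged at prediction time.

```python
import statistics


class Standardize:
    """Preprocess stage: learns mean/stdev at training time, reuses them at prediction time."""

    def fit(self, xs):
        self.mean = statistics.fmean(xs)
        self.stdev = statistics.pstdev(xs) or 1.0  # fall back to 1.0 to avoid division by zero
        return self

    def transform(self, xs):
        return [(x - self.mean) / self.stdev for x in xs]


class MeanThresholdModel:
    """Toy 'train' stage: predicts 1 when the input is above the training mean."""

    def fit(self, xs):
        self.threshold = statistics.fmean(xs)
        return self

    def predict(self, xs):
        return [1 if x > self.threshold else 0 for x in xs]


def train_pipeline(raw):
    """Training (estimation) pipeline: fit preprocess, then train on the transformed data."""
    scaler = Standardize().fit(raw)
    model = MeanThresholdModel().fit(scaler.transform(raw))
    return scaler, model


def predict_pipeline(fitted, raw):
    """Prediction pipeline: the same fitted preprocess stage, then the trained model."""
    scaler, model = fitted
    return model.predict(scaler.transform(raw))
```

The pair returned by `train_pipeline` plays the role of the model artifact: serving needs both the fitted preprocessing and the trained model, not the model alone.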

Page 29: Serverless machine learning operations

[Diagram labels: cluster, data, model, data scientist, web app, docker, API, libs, model]

Local Spark ML Serving Library: https://github.com/Hydrospheredata/spark-ml-serving

Page 30: Serverless machine learning operations

Model Artifact

Page 31: Serverless machine learning operations

Models - Runtimes - Formats Zoo

Page 32: Serverless machine learning operations

API & Logistics

- HTTP/1.1, HTTP/2, gRPC

- Kafka, Flink, Kinesis

- Protobuf, Avro

- Service Discovery

- Pipelining

- Tracing

- Monitoring

- Autoscaling

- Versioning

- A/B, Canary

- Testing

- CPU, GPU
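As one example from the list above, canary routing between two model versions can be sketched as a deterministic traffic split. This is an assumption about one common approach, not a description of ML Lambda: the function name `pick_version` and the labels `"stable"` and `"canary"` are hypothetical. Hashing the request ID keeps the same request on the same version across retries.

```python
import hashlib


def pick_version(request_id: str, canary_percent: int) -> str:
    """Deterministically route a fixed share of requests to the canary model version."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in [0, 100)
    return "canary" if bucket < canary_percent else "stable"
```

Raising `canary_percent` step by step only moves additional buckets to the canary, so requests already routed there stay there during the rollout.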

Page 33: Serverless machine learning operations

Sidecar Architecture

Page 34: Serverless machine learning operations
Page 35: Serverless machine learning operations

UX: Train anywhere and deploy as a Function

Page 36: Serverless machine learning operations

UX: Models and Applications

Applications provide public endpoints for the models and compositions of the models.

Page 37: Serverless machine learning operations

UX: Streaming Applications + Batching
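A minimal sketch of batching in front of a streaming application, assuming a model that scores a list of records at once. The names `micro_batches` and `serve_stream` are hypothetical. Only size-based batching is shown; production systems usually also flush a partial batch after a timeout, which is omitted here.

```python
from typing import Callable, Iterable, Iterator, List


def micro_batches(stream: Iterable[float], batch_size: int) -> Iterator[List[float]]:
    """Group a stream of records into fixed-size batches; the last batch may be smaller."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[float] = []
    for record in stream:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def serve_stream(stream, model: Callable[[List[float]], List[float]], batch_size: int):
    """Apply a batch-oriented model to a stream, yielding one prediction per record."""
    for batch in micro_batches(stream, batch_size):
        yield from model(batch)
```

Batching amortizes per-call overhead (for example a GPU invocation) across several records while the caller still sees one prediction per input, in order.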

Page 38: Serverless machine learning operations

UX: Pipelines, Ensembles and BestSLA Applications

Page 39: Serverless machine learning operations

ML Function as a Service Demo!!!

Page 40: Serverless machine learning operations

Thank you

Looking for

- Feedback

- Advisors, mentors & partners

- Pilots and early adopters

Stay in touch

- @hydrospheredata

- https://github.com/Hydrospheredata

- http://hydrosphere.io/

- [email protected]