Serverless machine learning operations


by Stepan Pushkarev, CTO of Hydrosphere.io

Mission: Accelerate Machine Learning to Production

Open-source products:

- Mist: Spark Compute as a Service

- ML Lambda: ML Function as a Service

- Sonar: Data and ML Monitoring

Business Model: Subscription services and hands-on consulting

About

Ops folks here?

Machine Learning nerds here?

VP/Managers/Strategy?

Development Operations are well studied

Machine Learning operations are ad hoc

● Research phase -> productization phase

● Script-driven: ./bin/spark-submit, python train.py

● Raw SQL / HiveQL / SQL on Hadoop

● Automated with cron and/or workflow managers

● Hosted Notebooks culture

ML Project Time to Market



- Go-to-production strategy from day 1

- Training: Serverless Spark Compute

- Serving/inferencing: Serverless ML Lambdas

Agenda

Why do companies hire data scientists?

To make products smarter.

What is the deliverable of a data scientist?

Academic paper

ML model, R/Python script

Jupyter Notebook

BI Dashboard

How to move this to prod?

Academic paper?

ML Model? R/Python script?

Jupyter Notebook?

BI Dashboard?

Tragedy 1: An engineer re-implements the R/Python script

Tragedy 2: Notebook/scripts deployments

Tragedy 2: Run the notebook/script as is using cron

© Daniel Tunkelang - Where should you put your data scientists? - www.slideshare.net/dtunkelang/where-should-you-put-your-data-scientists

Step 1 (management): Integrate data scientists into cross-functional teams

Step 2: Build/Deploy functions, not notebooks
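As a minimal sketch of what "a function, not a notebook" could look like: the logic from an experiment is wrapped in plain functions with explicit inputs and outputs, so an application can call them and tests can cover them. The names `score_customer`, `predict`, and the weights are hypothetical illustrations, not part of any Hydrosphere product.

```python
from typing import Dict

# Hypothetical weights that a notebook experiment might have produced.
WEIGHTS: Dict[str, float] = {"visits": 0.5, "purchases": 2.0}
BIAS = -1.0


def score_customer(features: Dict[str, float]) -> float:
    """Return a linear score for one customer; missing features count as 0."""
    return BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())


def predict(features: Dict[str, float], threshold: float = 0.0) -> bool:
    """Turn the score into a yes/no decision that an application can consume."""
    return score_customer(features) > threshold
```

Because the unit of deployment is a function, the same code can be imported by a web app, wrapped in an HTTP endpoint, or run in a batch job without copying notebook cells.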

Step 3: Monitor ML in production with other ML

● Data pipeline statistics

● Anomaly detection

● Pattern recognition

● Keep the data scientist in the loop

● Treat data errors as software bugs
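The monitoring idea above can be sketched with plain statistics: record a profile of a feature at training time, then flag production values that deviate strongly from it. This is a simple z-score check standing in for the "other ML" the slide mentions, and the names `fit_profile` and `find_anomalies` are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def fit_profile(training_values: Sequence[float]) -> Tuple[float, float]:
    """Record the mean and sample standard deviation seen at training time."""
    return mean(training_values), stdev(training_values)


def find_anomalies(profile: Tuple[float, float],
                   production_values: Sequence[float],
                   z_threshold: float = 3.0) -> List[float]:
    """Return production values whose z-score exceeds the threshold."""
    mu, sigma = profile
    if sigma == 0:
        # A constant training feature: anything different is suspicious.
        return [v for v in production_values if v != mu]
    return [v for v in production_values if abs(v - mu) / sigma > z_threshold]
```

Flagged values could be logged and routed to the data scientist for review, in the same way a failing test is routed to the engineer who owns the code.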

Data Pipeline Functions

Batch Prediction Functions

From Vanilla Spark to serverless training and data processing

./bin/spark-submit

- Spark Sessions Pool

- Functions Registry

- Multi-tenancy

- REST API Framework

- Data API Framework

- Infrastructure integration (EMR, Hortonworks, etc.)

UX: Deploy Spark functions and trigger them from apps
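To illustrate the "functions registry" idea in isolation, here is a toy in-process registry: functions are deployed under a name and later invoked by name with parameters, which is the shape a REST layer would expose. This is a hypothetical sketch in plain Python and does not reproduce Mist's actual API or use Spark.

```python
from typing import Any, Callable, Dict


class FunctionRegistry:
    """Toy registry mapping a function name to a callable (not Mist's API)."""

    def __init__(self) -> None:
        self._functions: Dict[str, Callable[..., Any]] = {}

    def deploy(self, name: str, fn: Callable[..., Any]) -> None:
        """Register (or replace) a function under the given name."""
        self._functions[name] = fn

    def invoke(self, name: str, **params: Any) -> Any:
        """Look the function up by name and call it with keyword parameters."""
        if name not in self._functions:
            raise KeyError(f"function not deployed: {name}")
        return self._functions[name](**params)


def word_count(text: str) -> Dict[str, int]:
    """Stand-in for a data processing job: count words in a string."""
    counts: Dict[str, int] = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return counts


registry = FunctionRegistry()
registry.deploy("word-count", word_count)
result = registry.invoke("word-count", text="a b a")
```

In a real deployment the callable would run against a pooled Spark session and `invoke` would sit behind an HTTP endpoint; both are omitted here.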

Mist - Serverless proxy for Spark

DEMO

Machine Learning: training + serving

[Diagram: training (estimation) pipeline with stages preprocess, preprocess, train]

[Diagram: prediction pipeline with stages preprocess, preprocess]

[Diagram labels: cluster, data, model, data scientist, web app, docker, API, libs, model]
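The training and prediction pipelines both run the same preprocess stages, so one way to keep them consistent is to define those stages once and reuse them. The sketch below assumes stateless preprocessing and a trivial "model" (a threshold equal to the mean of the prepared training data); all names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Step = Callable[[List[float]], List[float]]


def clip_negatives(values: List[float]) -> List[float]:
    """Replace negative inputs with 0."""
    return [max(v, 0.0) for v in values]


def log_scale(values: List[float]) -> List[float]:
    """Compress large values with log(1 + v)."""
    return [math.log1p(v) for v in values]


# Shared by both pipelines so training and serving see identical features.
PREPROCESS: Sequence[Step] = (clip_negatives, log_scale)


def run_steps(values: List[float], steps: Sequence[Step]) -> List[float]:
    for step in steps:
        values = step(values)
    return values


def train(values: List[float]) -> float:
    """Training pipeline: preprocess, then estimate a threshold 'model'."""
    prepared = run_steps(values, PREPROCESS)
    return sum(prepared) / len(prepared)


def predict(model: float, values: List[float]) -> List[bool]:
    """Prediction pipeline: same preprocess, then apply the model."""
    prepared = run_steps(values, PREPROCESS)
    return [v > model for v in prepared]
```

Keeping `PREPROCESS` in one place avoids the common failure where the serving code re-implements feature preparation slightly differently from the training code.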

Local Spark ML Serving Library: https://github.com/Hydrospheredata/spark-ml-serving


Model Artifact

Models - Runtimes - Formats Zoo

API & Logistics

- HTTP/1.1, HTTP/2, gRPC

- Kafka, Flink, Kinesis

- Protobuf, Avro

- Service Discovery

- Pipelining

- Tracing

- Monitoring

- Autoscaling

- Versioning

- A/B, Canary

- Testing

- CPU, GPU
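Of the concerns listed above, versioning with canary releases is easy to show in a few lines: a router sends a configurable share of requests to the new model version and the rest to the stable one. This is a hypothetical sketch of the routing logic only; `make_router`, `model_v1`, and `model_v2` are illustrative names.

```python
import random
from typing import Callable, Tuple

Model = Callable[[float], float]


def make_router(stable: Model, canary: Model, canary_share: float,
                rng: random.Random) -> Callable[[float], Tuple[str, float]]:
    """Build a router that sends roughly `canary_share` of calls to the canary."""

    def route(x: float) -> Tuple[str, float]:
        # rng.random() is in [0, 1), so a share of 0.0 never picks the canary
        # and a share of 1.0 always does.
        if rng.random() < canary_share:
            return "canary", canary(x)
        return "stable", stable(x)

    return route


def model_v1(x: float) -> float:
    """Stable version of a toy model."""
    return 2.0 * x


def model_v2(x: float) -> float:
    """Candidate version under evaluation."""
    return 2.0 * x + 1.0


# Send about 10% of traffic to the candidate; the seed keeps runs repeatable.
router = make_router(model_v1, model_v2, canary_share=0.1, rng=random.Random(42))
```

Returning the version label with each prediction makes it possible to compare the two versions later in monitoring.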

Sidecar Architecture

UX: Train anywhere and deploy as a Function

UX: Models and Applications

Applications provide public endpoints for the models and compositions of the models.
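A composition of models behind one endpoint can be sketched as an application that chains stages, each taking and returning a payload. The `Application` class and the two toy stages below are hypothetical and only illustrate the idea; they are not the ML Lambda API.

```python
from typing import Callable, Dict, List

Model = Callable[[Dict], Dict]


def tokenizer(payload: Dict) -> Dict:
    """First stage: split the input text into lowercase tokens."""
    return {**payload, "tokens": payload["text"].lower().split()}


def length_classifier(payload: Dict) -> Dict:
    """Second stage: label the text by its token count."""
    label = "long" if len(payload["tokens"]) > 3 else "short"
    return {**payload, "label": label}


class Application:
    """One public entry point in front of a chain of models."""

    def __init__(self, name: str, stages: List[Model]) -> None:
        self.name = name
        self.stages = stages

    def serve(self, payload: Dict) -> Dict:
        """Pass the payload through every stage in order."""
        for stage in self.stages:
            payload = stage(payload)
        return payload


app = Application("text-length", [tokenizer, length_classifier])
response = app.serve({"text": "Serverless ML"})
```

Callers only see `app.serve`; stages can be swapped or re-versioned behind it without changing the endpoint.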

UX: Streaming Applications + Batching

UX: Pipelines, Assembles and BestSLA Applications

ML Function as a Service Demo!!!

Thank you

Looking for

- Feedback

- Advisors, mentors & partners

- Pilots and early adopters

Stay in touch

- @hydrospheredata

- https://github.com/Hydrospheredata

- http://hydrosphere.io/

- [email protected]

https://github.com/Hydrospheredata/mist
