Pipelines and RAPIDS Learning with the Kubeflow S91030 - Hybrid … · 2019-03-29 · DASK/SPARK DEEP LEARNING FRAMEWORKS CUDNN RAPIDS CUDF CUML CUGRAPH. BENCHMARKS cuML — XGBoost

S91030 - Hybrid Machine Learning with the Kubeflow Pipelines and RAPIDS

Sina Chavoshi

The right approach for the right problem

Building blocks Platform Solutions

Cloud AI Strategy:



Cloud AI Strategy:

Building Blocks: Sight, Language, Conversation



Cloud AI Strategy:

Solutions / Contact Center

Customer

Phone

Chat

Contact Center Provider

Contact Center Interface

Virtual Agent

Agent Assist

Knowledge Base (PDF/HTML)

Backend Fulfillment

Virtual Agent

Agent

Google Cloud Contact Center AI



Cloud AI Strategy:

Cloud AI Platform

Data pipeline

Cloud Dataprep

BigQuery

Cloud Dataflow

Cloud Dataproc

Model development

Cloud ML Engine

Model deployment and management

Cloud ML Engine

Cloud Kubernetes Engine

Tools

Jupyter Notebooks

Services

ASL

Community

Kubeflow

Building & deploying real-life ML applications is hard and costly because of a lack of tooling that covers end-to-end ML development & deployment.

In addition to the actual ML...

ML Code

You have to worry about so much more.

Configuration

Data Collection

Data Verification

Feature Extraction

Process Management Tools

Analysis Tools

Machine Resource Management

Serving Infrastructure

Monitoring

ML Code

Source: Sculley et al.: Hidden Technical Debt in Machine Learning Systems


AI problems today

Problems

Deployment: Brittle, opinionated infrastructure that is hard to productionize and breaks between cloud and on-prem

Talent: Machine learning expertise is scarce

Collaboration: Difficult to find and leverage existing solutions

Solutions

Reusable pipelines

01: Kubeflow

Scalable ML services on Kubernetes

Easy to get started
• Out-of-box support for top frameworks
  – PyTorch, Caffe, TensorFlow and XGBoost
• Kubernetes manages dependencies, resources

Swappable & scalable
• Library of ML services
• GPU support
• Massive scale

Meet customers where they are
• GCP
• On-prem with Cisco

Cloud

On-prem

Training

ML microservices

Predict

Training Predict

Product Overview

RAPIDS

THE BIG PROBLEM IN DATA SCIENCE

All Data

ETL

Manage Data

Structured Data Store

Data Preparation

Training

Model Training Visualization

Evaluate

Scoring

Deploy

Slow Training Times for Data Scientists

RAPIDS: OPEN GPU DATA SCIENCE

Software Stack (Python)

Data Preparation: cuDF

Graph Analytics: cuGRAPH

Model Training: cuML

CUDA

PYTHON

APACHE ARROW on GPU Memory

DASK/SPARK

DEEP LEARNING FRAMEWORKS

CUDNN

RAPIDS

CUDF CUML CUGRAPH

BENCHMARKS

cuML: XGBoost End-to-End

cuIO/cuDF — Load and Data Preparation

Benchmark

200GB CSV dataset; Data preparation includes joins, variable transformations.

CPU Cluster Configuration

CPU nodes (61 GiB of memory, 8 vCPUs, 64-bit platform), Apache Spark

DGX Cluster Configuration

5x DGX-1 on InfiniBand network

Time in seconds — Shorter is better

Chart legend: cuIO/cuDF (Load and Data Preparation), Data Conversion, XGBoost

AI Hub & Pipelines: Fast & simple adoption of AI

1. Search & Discover: Find best-of-breed solutions on the AI Hub which leverage Cloud AI solutions.

2. Deploy: Quick 1-click implementation of ML pipelines onto Google Cloud Platform.

3. Customize: Experiment with and adjust out-of-the-box pipelines for custom use cases.

4. Run in production: Deploy customized pipelines in production.

5. Publish: Upload & share the best-running pipelines within your org or publicly.

Network effect

The Flywheel of AI Adoption

02: Reusable Pipelines

Enable developers to build custom ML applications by easily “stitching” and connecting various components.

• Reuse instead of reimplement or reinvent
• Discover, learn and replicate successful pipelines

What constitutes a Kubeflow Pipeline

● Containerized implementations of ML tasks
  ○ Containers provide portability, repeatability and encapsulation
  ○ A task can be single node or *distributed*
  ○ A containerized task can invoke other services

● Specification of the sequence of steps
  ○ Specified via Python SDK

● Input parameters
  ○ A “Job” = Pipeline invoked with specific parameters
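The three ingredients listed above (containerized tasks, a specified sequence of steps, and input parameters) can be illustrated with a minimal, standard-library-only sketch. It is not the Kubeflow Pipelines SDK: the `Step` and `Pipeline` classes, the image names, and the parameter names are all hypothetical and exist only to show how a "job" is a pipeline resolved against specific parameter values.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class Step:
    """One containerized ML task (hypothetical model, not the Kubeflow SDK)."""
    name: str
    image: str          # container image that implements the task
    args: tuple = ()    # arguments; may contain "{param}" placeholders
    after: tuple = ()   # names of steps that must finish first


@dataclass
class Pipeline:
    name: str
    steps: list
    defaults: dict = field(default_factory=dict)

    def order(self):
        """Return step names in an order that respects every dependency."""
        graph = {s.name: set(s.after) for s in self.steps}
        return list(TopologicalSorter(graph).static_order())

    def job(self, **params):
        """A "job" = this pipeline invoked with specific parameters."""
        merged = {**self.defaults, **params}
        by_name = {s.name: s for s in self.steps}
        plan = []
        for name in self.order():
            step = by_name[name]
            resolved = [a.format(**merged) for a in step.args]
            plan.append((name, step.image, resolved))
        return plan


# Hypothetical three-step pipeline; images and paths are placeholders.
pipeline = Pipeline(
    name="train-and-deploy",
    steps=[
        Step("prepare", "example.com/prep:latest", ("--input", "{data_path}")),
        Step("train", "example.com/train:latest",
             ("--rounds", "{rounds}"), after=("prepare",)),
        Step("deploy", "example.com/deploy:latest",
             ("--model-name", "{model_name}"), after=("train",)),
    ],
    defaults={"rounds": 100, "model_name": "demo"},
)

job = pipeline.job(data_path="gs://example-bucket/data.csv", rounds=200)
for name, image, args in job:
    print(name, image, args)
```

Calling `pipeline.job(...)` again with different values yields a second job from the same pipeline definition, which is the reuse the slide describes. In a real deployment each step would run as a container on Kubernetes rather than being printed.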

03: AI Hub at a glance

1. All AI content in one place
Quick discovery of plug & play AI pipelines & other content built by teams across Google and by partners and customers.

2. Fast & simple implementation of AI on GCP
One-click deployment of AI pipelines via Kubeflow on GCP as the go-to platform for AI + hybrid & on-premises.

3. Enterprise-grade internal & external sharing
Foster reuse by sharing deployable AI pipelines & other content privately within organizations & publicly.

Mission

The one place for everything AI, from experimentation to production.

Public and private AI Hub

By Google

Unique AI assets by Google

By partners

Created, shared & monetized by anyone

By customers

Content shared securely within and with other organizations

Public content + Private content

AutoML, TPUs, Cloud AI Platform, etc.

Kubeflow Pipelines enable:

Workflow orchestration

Rapid, reliable experimentation

Share, re-use & compose

Demo

Visual depiction of pipeline topology

View all current and historical runs, grouped as “Experiments”

Rich visualizations of metrics

Clone an existing pipeline

Access to all config params, inputs and outputs for each run

Update parameters and submit

Easy comparison of Runs
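The demo workflow (runs grouped as experiments, cloning with updated parameters, and comparing runs) can be sketched as a small data model. This is only an illustration using the standard library: the `Run` class, the `clone_run` and `compare` helpers, the parameter names, and the metric values are all hypothetical placeholders, not the Kubeflow Pipelines UI or API and not measured results.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """One execution of a pipeline with its parameters and reported metrics."""
    name: str
    params: dict
    metrics: dict = field(default_factory=dict)


def clone_run(run, new_name, **overrides):
    """Copy a run's parameters, update some of them, and return a new run."""
    return Run(new_name, {**run.params, **overrides})


def compare(a, b):
    """Return the parameters that differ and the metric deltas (b minus a)."""
    changed = {
        key: (a.params.get(key), b.params.get(key))
        for key in sorted(set(a.params) | set(b.params))
        if a.params.get(key) != b.params.get(key)
    }
    deltas = {
        key: b.metrics[key] - a.metrics[key]
        for key in sorted(set(a.metrics) & set(b.metrics))
    }
    return changed, deltas


# Runs are grouped under a named experiment.
experiments = {}

# Placeholder values chosen for illustration only.
baseline = Run("baseline", {"rounds": 100, "max_depth": 6},
               {"validation_errors": 12})
variant = clone_run(baseline, "deeper-trees", max_depth=8)
variant.metrics["validation_errors"] = 9  # made-up number, not a real result

experiments.setdefault("xgboost-tuning", []).extend([baseline, variant])

changed, deltas = compare(baseline, variant)
print(changed)
print(deltas)
```

Keeping parameters and metrics together on each run is what makes the side-by-side comparison possible: the difference in outcome can be read against the exact configuration change that produced it.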

That’s a wrap.
