
Simplifying ML Workflows with Apache Beam & TensorFlow Extended

Tyler Akidau (@takidau)
Software Engineer at Google, Apache Beam PMC

Apache Beam: Portable data-processing pipelines

Example pipelines: Java and Python
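The shape of a Beam pipeline can be sketched in plain Python, without the actual `apache_beam` SDK (illustrative only): each step below mirrors a Beam transform, with the counting stage standing in for a GroupByKey-plus-Combine.

```python
from collections import Counter

# Illustrative sketch of the Beam model in plain Python. In the real SDK
# (apache_beam), each step below would be a transform chained with `|`.

def flat_map(fn, pcollection):
    # Beam's FlatMap: apply fn to each element, flattening the results.
    return [out for elem in pcollection for out in fn(elem)]

def count_per_element(pcollection):
    # Beam's combiners.Count.PerElement(): GroupByKey + Combine per key.
    return dict(Counter(pcollection))

lines = ["the cat sat", "the cat"]
words = flat_map(str.split, lines)    # FlatMap(str.split)
counts = count_per_element(words)     # Count.PerElement()
print(counts)  # {'the': 2, 'cat': 2, 'sat': 1}
```

The point of the model is that the same pipeline graph, once expressed in a Beam SDK, runs unchanged on any supported runner.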

Cross-language Portability Framework

[Diagram: Language A, B, and C SDKs all target the Beam Model, which runs on Runners 1, 2, and 3.]

Python-compatible runners

Direct runner (local machine): Now
Google Cloud Dataflow: Now
Apache Flink: Q2-Q3
Apache Spark: Q3-Q4

TensorFlow Extended: End-to-end machine learning in production

“Doing ML in production is hard.”

-Everyone who has ever tried

Because, in addition to the actual ML code...

...you have to worry about so much more:

Configuration, Data Collection, Data Verification, Feature Extraction, Process Management Tools, Analysis Tools, Machine Resource Management, Serving Infrastructure, Monitoring

Source: Sculley et al., "Hidden Technical Debt in Machine Learning Systems"

In this talk, we will...

Show you how to apply transformations consistently between Training and Serving, using TensorFlow Transform, TensorFlow Estimators, and TensorFlow Serving.

Introduce something new: TensorFlow Model Analysis.

TensorFlow Transform: Consistent In-Graph Transformations in Training and Serving

Typical ML Pipeline

During training: batch processing of the data.
During serving: "live" processing of each request.

TensorFlow Transform

During training: tf.Transform batch processing of the data.
During serving: the transform is applied to each request as a tf.Graph.

Defining a preprocessing function in TF Transform

def preprocessing_fn(inputs):
    x = inputs['X']
    ...
    return {
        "A": tft.bucketize(tft.normalize(x) * y),
        "B": tensorflow_fn(y, z),
        "C": tft.ngrams(z),
    }

Many operations are available for dealing with text and numeric features, and users can define their own.

[Diagram, built up one output at a time: inputs X, Y, Z. For "A", X feeds mean/stddev analyzers into normalize, the result is multiplied by Y, then quantiles feed bucketize. "B" and "C" are added analogously.]

Analyzers: Reduce (full pass); implemented as a distributed data pipeline.
Transforms: Instance-to-instance (don't change the batch dimension); pure TensorFlow.
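The analyzer/transform split can be sketched in plain Python (illustrative only, not the tf.Transform API: there the full-pass statistics come from analyzers running as a Beam pipeline, and the instance-wise ops run as TensorFlow graph nodes):

```python
import statistics

def analyze(data):
    # Analyzer phase: one full pass over the dataset, reducing it to
    # constants (in tf.Transform this runs as a distributed data pipeline).
    mean = statistics.mean(data)
    stddev = statistics.pstdev(data)
    # Quantile boundaries for a 3-way bucketization (illustrative choice).
    ordered = sorted(data)
    boundaries = [ordered[len(ordered) // 3], ordered[2 * len(ordered) // 3]]
    return mean, stddev, boundaries

def transform(x, mean, stddev, boundaries):
    # Transform phase: instance-to-instance, using only the constants
    # produced by analysis (in tf.Transform, pure TensorFlow ops).
    normalized = (x - mean) / stddev
    bucket = sum(1 for b in boundaries if x >= b)
    return normalized, bucket

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
mean, stddev, boundaries = analyze(data)   # full pass, run once
print(transform(5.0, mean, stddev, boundaries))
```

Because the transform phase depends only on constants, the exact same logic can be baked into the serving graph, which is what eliminates training/serving skew.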

[Diagram: the Analyze phase runs the analyzers (mean/stddev, quantiles) over the data; their outputs become constant tensors feeding the normalize, multiply, and bucketize transforms in the graph.]

What can be done with TF Transform?

Pretty much anything; anything that can be expressed as a TensorFlow Graph. tf.Transform batch processing produces a Serving Graph.

Some common use-cases...

Scale to ...
Bag of Words / N-Grams
Bucketization
Feature Crosses
Apply another TensorFlow Model

github.com/tensorflow/transform

Introducing…

TensorFlow Model Analysis: Scalable, sliced, and full-pass metrics

Let's Talk about Metrics...

● How accurate?
● Converged model?
● What about my TB-sized eval set?
● Slices / subsets?
● Across model versions?
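Sliced metrics, the core idea behind TF Model Analysis, can be illustrated in plain Python: compute the same metric over the whole eval set and over each slice of it. (TFMA does this as a full-pass distributed pipeline over arbitrarily large eval sets; the names below are illustrative.)

```python
from collections import defaultdict

def sliced_accuracy(examples):
    # examples: (slice_key, label, prediction) triples.
    # Accumulate (correct, total) per slice plus an "Overall" slice, the way
    # a full-pass metrics pipeline would combine per-example statistics.
    stats = defaultdict(lambda: [0, 0])
    for key, label, pred in examples:
        for k in ("Overall", key):
            stats[k][0] += int(label == pred)
            stats[k][1] += 1
    return {k: correct / total for k, (correct, total) in stats.items()}

eval_set = [
    ("slice=US", 1, 1), ("slice=US", 0, 0), ("slice=US", 1, 0),
    ("slice=DE", 1, 1), ("slice=DE", 0, 1),
]
print(sliced_accuracy(eval_set))
```

A model that looks fine in aggregate can still perform poorly on one slice, which is exactly what the per-slice breakdown surfaces.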

ML Fairness: analyzing model mistakes by subgroup

[Figure: ROC curves plotting Sensitivity (True Positive Rate) against Specificity (False Positive Rate), first for all groups combined, then broken out for Group A and Group B.]

Learn more at ml-fairness.com
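A per-subgroup true-positive / false-positive rate computation at one fixed threshold, sketched in plain Python (illustrative; the ROC curves on the slide come from sweeping this threshold):

```python
def rates(labels, predictions, threshold=0.5):
    # True-positive rate (sensitivity) and false-positive rate for one group.
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p >= threshold)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p >= threshold)
    pos = sum(labels)
    neg = len(labels) - pos
    return tp / pos, fp / neg

# Two hypothetical subgroups: same labels, different score distributions.
group_a = ([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.6])
group_b = ([1, 1, 0, 0], [0.9, 0.4, 0.1, 0.2])
print("A:", rates(*group_a))  # A: (1.0, 0.5)
print("B:", rates(*group_b))  # B: (0.5, 0.0)
```

Even with identical labels, the two groups land at very different operating points, which is the kind of gap the subgroup ROC comparison is meant to expose.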

ML Fairness: understand the failure modes of your models

ML Fairness: Learn More

ml-fairness.com

How does it work?

...
estimator = DNNLinearCombinedClassifier(...)
estimator.train(...)

# Standard export: produces the Inference Graph (a SavedModel with a
# SignatureDef).
estimator.export_savedmodel(
    serving_input_receiver_fn=serving_input_fn)

# TFMA export: additionally produces the Eval Graph (a SavedModel with a
# SignatureDef plus Eval Metadata).
tfma.export.export_eval_savedmodel(
    estimator=estimator,
    eval_input_receiver_fn=eval_input_fn)
...

github.com/tensorflow/model-analysis

Summary

Apache Beam: Data-processing framework that runs locally and scales to massive data, in the Cloud (now) and soon on-premise via Flink (Q2-Q3) and Spark (Q3-Q4). Powers large-scale data processing in the TF libraries below.

tf.Transform: Consistent in-graph transformations in training and serving.

tf.ModelAnalysis: Scalable, sliced, and full-pass metrics.
