Simplifying ML Workflows with Apache Beam & TensorFlow Extended
Tyler Akidau (@takidau)
Software Engineer at Google, Apache Beam PMC
Apache Beam: Portable data-processing pipelines

Example pipelines: Java, Python
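Both SDKs express the same pipeline model. As a rough illustration of that model in plain Python (this is not the Beam API; the real SDK spells it `p | beam.Create(...) | beam.FlatMap(...) | beam.combiners.Count.PerElement()`), a word-count pipeline reduces to a one-to-many transform followed by a full-pass combine:

```python
# Conceptual sketch of the Beam model in plain Python: a pipeline is a
# chain of transforms over an immutable collection of elements.
from collections import Counter

def run_wordcount(lines):
    # "FlatMap": split each line into words (a one-to-many transform).
    words = [w for line in lines for w in line.lower().split()]
    # "Count.PerElement": a grouping/combining transform over the whole
    # collection, which a runner may execute in a distributed fashion.
    return dict(Counter(words))

counts = run_wordcount(["the cat", "the dog"])
```

The point of the portability framework is that this same logical pipeline, written once per SDK, can execute on any runner that implements the Beam model.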
Cross-language Portability Framework

Language A SDK · Language B SDK · Language C SDK
↓ The Beam Model ↓
Runner 1 · Runner 2 · Runner 3 (each implementing the Beam Model)
Python-compatible runners

Direct runner (local machine): Now
Google Cloud Dataflow: Now
Apache Flink: Q2-Q3
Apache Spark: Q3-Q4
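The same pipeline code targets any of these runners through a flag; only the runner choice and its options change. An illustrative invocation (the script name, project, and bucket are placeholders):

```shell
# Run locally on the direct runner (script name is hypothetical):
python my_pipeline.py --runner=DirectRunner

# Run the identical pipeline on Google Cloud Dataflow
# (project and temp_location values are placeholders):
python my_pipeline.py --runner=DataflowRunner \
    --project=my-project \
    --temp_location=gs://my-bucket/tmp
```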
TensorFlow ExtendedEnd-to-end machine learning in production
+
“Doing ML in production is hard.”
-Everyone who has ever tried
Because, in addition to the actual ML...
ML Code

...you have to worry about so much more:
Configuration, Data Collection, Data Verification, Feature Extraction, Process Management Tools, Analysis Tools, Machine Resource Management, Serving Infrastructure, Monitoring

Source: Sculley et al.: Hidden Technical Debt in Machine Learning Systems
In this talk, we will...

Show you how to apply transformations consistently between Training and Serving:
TensorFlow Transform · TensorFlow Estimators · TensorFlow Serving

Introduce something new:
TensorFlow Model Analysis
TensorFlow Transform: Consistent In-Graph Transformations in Training and Serving
Typical ML Pipeline
During training: batch processing
During serving: “live” processing of each data request

TensorFlow Transform
During training: tf.Transform batch processing
During serving: the transform is exported as a tf.Graph and applied to each data request
Defining a preprocessing function in TF Transform

def preprocessing_fn(inputs):
  x = inputs['X']
  ...
  return {
      "A": tft.bucketize(tft.normalize(x) * y),
      "B": tensorflow_fn(y, z),
      "C": tft.ngrams(z)
  }

Many operations are available for dealing with text and numeric features, and users can define their own.

[Diagram build: inputs X, Y, Z flow through analyzers (mean, stddev, quantiles) and transforms (normalize, multiply, bucketize, tensorflow_fn, ngrams) to produce outputs A, B, C.]
Analyzers: Reduce (full pass over the data); implemented as a distributed data pipeline.
Transforms: Instance-to-instance (don’t change the batch dimension); pure TensorFlow.
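A minimal plain-Python sketch of that split, using z-score normalization as the example (this mimics the shape of tf.Transform, not its API): the analyzer makes a full pass over the dataset to produce constants, and the transform is then purely instance-to-instance, using only those constants.

```python
import math

def analyze(xs):
    # Analyzer: full-pass reduction over the dataset, producing constants
    # (in tf.Transform these become constant tensors baked into the graph).
    mean = sum(xs) / len(xs)
    stddev = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    return {"mean": mean, "stddev": stddev}

def transform(x, constants):
    # Transform: instance-to-instance; depends only on the precomputed
    # constants, so identical logic runs at training and serving time.
    return (x - constants["mean"]) / constants["stddev"]

constants = analyze([1.0, 2.0, 3.0, 4.0])
normalized = [transform(x, constants) for x in [1.0, 2.0, 3.0, 4.0]]
```

Because the serving path never re-runs the analyzer, training and serving cannot drift apart: both apply the same graph with the same constants.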
[Diagram: the Analyze phase makes a full pass over the data, turning analyzer outputs (mean, stddev, quantiles) into constant tensors inside the normalize / multiply / bucketize transform graph.]
What can be done with TF Transform?

Pretty much anything: anything that can be expressed as a TensorFlow Graph. The tf.Transform batch processing step emits the transform as a Serving Graph.
Some common use-cases: Scale to ..., Bag of Words / N-Grams, Bucketization, Feature Crosses, Apply another TensorFlow Model

github.com/tensorflow/transform
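Bucketization, for instance, follows the same analyzer/transform split: a full pass finds quantile boundaries, then each instance is mapped to a bucket index. A plain-Python sketch (the real call is tft.bucketize, which computes approximate quantiles in the distributed pipeline):

```python
def quantile_boundaries(xs, num_buckets):
    # "Analyzer": full pass over the data to find quantile cut points.
    s = sorted(xs)
    return [s[(len(s) * i) // num_buckets] for i in range(1, num_buckets)]

def bucketize(x, boundaries):
    # "Transform": per-instance lookup against the precomputed boundaries.
    for i, b in enumerate(boundaries):
        if x < b:
            return i
    return len(boundaries)

bounds = quantile_boundaries(list(range(100)), 4)
bucket = bucketize(60, bounds)
```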
Introducing…
TensorFlow Model Analysis: Scalable, sliced, and full-pass metrics
Let’s Talk about Metrics...

● How accurate?
● Converged model?
● What about my TB-sized eval set?
● Slices / subsets?
● Across model versions?
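A sliced metric is just the same metric computed per subset of the eval data rather than once overall. A small sketch of sliced accuracy (illustrative, not the tfma API; slice keys here are hypothetical country codes):

```python
from collections import defaultdict

def sliced_accuracy(examples):
    # Each example is (slice_key, label, prediction). Computing accuracy
    # per slice can expose failures that the single overall number hides.
    correct = defaultdict(int)
    total = defaultdict(int)
    for key, label, pred in examples:
        total[key] += 1
        correct[key] += int(label == pred)
    return {key: correct[key] / total[key] for key in total}

acc = sliced_accuracy([
    ("US", 1, 1), ("US", 0, 0),   # slice "US": 2/2 correct
    ("BR", 1, 0), ("BR", 1, 1),   # slice "BR": 1/2 correct
])
```

TensorFlow Model Analysis does this as a full pass over the (potentially TB-sized) eval set, distributed via a Beam pipeline.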
ML Fairness: analyzing model mistakes by subgroup

[ROC curve: Sensitivity (True Positive Rate) vs. Specificity (False Positive Rate), shown first for all groups combined, then broken out for Group A and Group B]

Learn more at ml-fairness.com
ML Fairness: understand the failure modes of your models
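Concretely, the per-group ROC comparison reduces to computing the true-positive and false-positive rates separately for each subgroup. A plain-Python sketch (illustrative group labels, binary labels and predictions):

```python
def group_rates(examples):
    # examples: (group, label, prediction) triples with binary values.
    # Returns per-group TPR and FPR, the two axes of the ROC plot.
    stats = {}
    for group, label, pred in examples:
        g = stats.setdefault(group, {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
        if label == 1:
            g["tp" if pred == 1 else "fn"] += 1
        else:
            g["fp" if pred == 1 else "tn"] += 1
    return {
        group: {
            "tpr": g["tp"] / (g["tp"] + g["fn"]),
            "fpr": g["fp"] / (g["fp"] + g["tn"]),
        }
        for group, g in stats.items()
    }

rates = group_rates([
    ("A", 1, 1), ("A", 1, 0), ("A", 0, 0),   # group A: TPR 0.5, FPR 0.0
    ("B", 1, 1), ("B", 0, 1), ("B", 0, 0),   # group B: TPR 1.0, FPR 0.5
])
```

If the per-group curves diverge from the aggregate, the model is making different kinds of mistakes for different subgroups, even when overall accuracy looks fine.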
How does it work?

...
estimator = DNNLinearCombinedClassifier(...)
estimator.train(...)

estimator.export_savedmodel(
    serving_input_receiver_fn=serving_input_fn)

tfma.export.export_eval_savedmodel(
    estimator=estimator,
    eval_input_receiver_fn=eval_input_fn)
...

export_savedmodel writes the Inference Graph (SavedModel) with its SignatureDef; tfma.export.export_eval_savedmodel writes the Eval Graph (SavedModel) with a SignatureDef plus Eval Metadata.
github.com/tensorflow/model-analysis
Summary

Apache Beam: Data-processing framework that runs locally and scales to massive data, in the Cloud (now) and soon on-premise via Flink (Q2-Q3) and Spark (Q3-Q4). Powers large-scale data processing in the TF libraries below.

tf.Transform: Consistent in-graph transformations in training and serving.

tf.ModelAnalysis: Scalable, sliced, and full-pass metrics.