Serverless Predictions at Scale - Amazon Web...

Thomas Reske

Global Solutions Architect, Amazon Web Services

Serverless Predictions at Scale

Serverless computing allows you to

build and run applications and services without thinking about servers

No server management Flexible scaling

Automated high availability No excess capacity

What are the benefits of serverless computing?

Machine Learning Process

Fetch data

Clean and

format data

Prepareand

transform data

Model TrainingEvaluation

and verificaton

Integrationand

deployment

Operations, monitoring, and

optimization

Machine Learning Process

Fetch data

Clean and

format data

Prepareand

transform data

and verificaton

Integrationand

deployment

optimizationHow can I use pre-trained

models for serving predictions?

How to scale effectively?

What exactly are we deploying?

A Pre-trained Model...

• ... is simply a model which has been trained previously

• Model artifacts (structure, helper files, parameters, weights, ...) are files, e.g. in JSON or binary format

• Can be serialized (saved) and de-serialized (loaded) by common deep learning frameworks

• Allows for fine-tuning of models („head-start“ or „freezing“), but also to apply them to similar problems and different data („transfer learning“)

• Models can be made publicly available, e.g. in the MXNet Model Zoo

What‘s missing?

Blueprints?

Model ServerModel ServerModel Server

Model Artifacts

Load Balancing

Generic Architecture

Amazon API Gateway and AWS Lambda Integration

Amazon API Gateway

(API Proxy)

Amazon S3(Model Archive)

AWS Lambda(Data Processing)

Trade-Offs?

Model Server for Apache MXNet

• Flexible and easy to use open-source project (Apache License 2.0) for serving deep learning models

• Supports MXNet and Open Neural Network Exchange (ONNX)

• CLI to package model artifacts and pre-configured Docker images to start a service that sets up HTTP endpoints to handle model inference requests

• More Details at https://github.com/awslabs/mxnet-model-server and https://onnx.ai/

Quick Start - CLI# create directory

mkdir -p ~/mxnet-model-server-quickstart && cd $_

# create and activate virtual environment

python3 -m venv ~/mxnet-model-server-quickstart/venv

source ~/mxnet-model-server-quickstart/venv/bin/activate

# install packages

pip install --upgrade pip

pip install mxnet-model-server

# run model server for Apache MXNet

mxnet-model-server --models \

squeezenet=https://s3.amazonaws.com/model-server/models/squeezenet_v1.1/squeezenet_v1.1.model

# classify an image

curl -s -O https://s3.amazonaws.com/model-server/inputs/kitten.jpg

curl -s http://127.0.0.1:8080/squeezenet/predict -F "data=@kitten.jpg" | jq -r

Quick Start - Docker# create directory

mkdir -p ~/mxnet-model-server-quickstart-docker && cd $_

# run model server for Apache MXNet

docker run -itd --name mms -p 8080:8080 awsdeeplearningteam/mms_cpu \

mxnet-model-server start --mms-config /mxnet_model_server/mms_app_cpu.conf

# classify an image

curl -s -O https://s3.amazonaws.com/model-server/inputs/kitten.jpg

curl -s http://127.0.0.1:8080/squeezenet/predict -F "data=@kitten.jpg" | jq -r

Why developers love containers

Software Packaging

Distribution Immutable Infrastructure

Server

Guest OS

Bins/Libs Bins/Libs

App2App1

Managing One Host Is Straightforward

Managing a Fleet Is Hard

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Availability Zone 1 Availability Zone 2

Availability Zone 3

Server

Guest OS

Server

Guest OS

Server

Guest OS

Server

Guest OS

Model ServerModel ServerModel Server

Model Artifacts

Load Balancing

Generic Architecture

AWS Fargate and Apache MXNet Model Server

AWS Fargate(Data Processing and Model Archive)

Elastic Load Balancing

Amazon ECR(Inference Code Image)

Quick Start – AWS Fargate CLI – Basic Task Definition {

"family": "mxnet-model-server-fargate",

"networkMode": "awsvpc",

"containerDefinitions": [

"name": "mxnet-model-server-fargate-app",

"image": "awsdeeplearningteam/mms_cpu",

"portMappings": [

"containerPort": 8080,"hostPort": 8080,"protocol": "tcp"

"essential": true,

"entryPoint": [

"mxnet-model-server","start","--mms-config","/mxnet_model_server/mms_app_cpu.conf"

"requiresCompatibilities": [

"FARGATE"

"cpu": "256",

"memory": "512"

Quick Start – AWS Fargate CLI # register task definition

aws ecs register-task-definition --cli-input-json file://basic_taskdefinition.json

# create AWS Fargate cluster

aws ecs create-cluster --cluster-name mxnet-model-server

# create service

aws ecs create-service \

--cluster mxnet-model-server \

--service-name mxnet-model-server-fargate-service \

--task-definition mxnet-model-server-fargate:1 \

--desired-count 2 \

--launch-type "FARGATE" \

--network-configuration "awsvpcConfiguration={subnets=[$MMS_SUBNETS],securityGroups=[$MMS_SG_ID]}" \

--load-balancers "targetGroupArn=$MMS_TG_ARN,containerName=$MMS_CONTAINER_NAME,containerPort=8080"

Trade-Offs?

Amazon SageMaker

Amazon SageMaker(ML Hosting Service)

Quick Start – Amazon SageMaker CLI # register model

aws sagemaker create-model \

--model-name image-classification-model \

--primary-container Image=$INFERENCE_IMG,ModelDataUrl=$MODEL_DATA_URL \

--execution-role-arn $EXECUTION_ROLE_ARN

# register next-generation model

aws sagemaker create-model \

--model-name image-classification-model-nextgen \

--primary-container Image=$INFERENCE_IMG_NG,ModelDataUrl=$MODEL_DATA_URL_NG \

--execution-role-arn $EXECUTION_ROLE_ARN

Quick Start – Amazon SageMaker CLI # create endpoint configuration

aws sagemaker create-endpoint-config \--endpoint-config-name "image-classification-config" \

--production-variants ‘[ { "VariantName" : "A", "ModelName" : "image-classification-model", "InitialInstanceCount" : 1, "InstanceType" : "ml.t2.medium", "InitialVariantWeight" : 8

}, { "VariantName" : "B", "ModelName" : "image-classification-model-nextgen", "InitialInstanceCount" : 1, "InstanceType" : "ml.t2.medium", "InitialVariantWeight" : 2

} ]’

# create endpoint

aws sagemaker create-endpoint \

--endpoint-name image-classification \

--endpoint-config-name image-classification-endpoint-config \&& aws sagemaker wait endpoint-in-service --endpoint-name image-classification

Other Options?

AWS Fargate(Data Processing)

Elastic Load Balancing

Amazon API Gateway

(API Proxy)

Amazon S3(Modell Archiv)

Amazon SageMaker(ML Hosting Service)

Amazon ECR(Inferenz Code Image)

Amazon API Gateway

(API Proxy)

AWS Lambda(Data Pre-Processing)

https://medium.com/@julsimon/using-chalice-to-serve-sagemaker-predictions-a2015c02b033

Summary

Fetch data

Clean and

format data

Prepareand

transform data

and verificaton

Integrationand

deployment

optimizationHow can I use pre-trained

models for serving predictions?

How to scale effectively?

Summary

Fetch data

Clean and

format data

Prepareand

transform data

and verificaton

Integrationand

deployment

optimization1

API Gateway + Lambda + S3

ELB + AWS Fargate

Amazon SageMaker

Resources

• Serverless Computing and Applicationshttps://aws.amazon.com/serverless/

• Seamlessly Scale Predictions with AWS Lambda and MXNethttps://aws.amazon.com/blogs/compute/seamlessly-scale-predictions-with-aws-lambda-and-mxnet/

• Model Server for Apache MXNethttps://github.com/awslabs/mxnet-model-server

• Serverless Predictions at Scalehttps://github.com/aws-samples/aws-ai-bootcamp-labs/blob/master/serverless_predictions.MD

• Serverless Inference with MMS on AWS Fargatehttps://github.com/awslabs/mxnet-model-server/blob/master/docs/mms_on_fargate.md

• Introducing Model Server for Apache MXNethttps://aws.amazon.com/blogs/machine-learning/introducing-model-server-for-apache-mxnet/

• Using Chalice to serve SageMaker predictionshttps://medium.com/@julsimon/using-chalice-to-serve-sagemaker-predictions-a2015c02b033

Before you go...

Sessions (AI & ML)

Wednesday, June 6 Thursday, June 713:00-15:00 AWS ML & AI Deep Dive

Julien Simon, AWS13:00-14:00 Machine Learning & AI at Amazon

Ian Massingham, AWS

15:00-16:00 Build Your Recommendation Engineson AWS TodayYotam Yarden & Cyrus Vahid, AWS

14:00-15:00 The Machine Learning Process: FromBusiness Model to Machine Learning in ProductionConstantin Gonzalez, AWS

16:00-17:00 Serverless Predictions at ScaleThomas Reske, AWS

15:00-16:00 Using Machine Learning Algorithmsto create Market TransparencyDr. Markus Schmidberger & Dr. Markus Ludwig, Scout24 AG

16:00-17:00 Needle in the Live Video Cloud haystack – AI/ML for mass live acquisitionMatija Tonejc & David Querq, Make.TV

Please complete the session survey in the summit mobile app.

Thank You!

Thank you!

Serverless Predictions at Scale - Amazon Web...

Documents

Pricing Strategy - aws-de-media.s3.amazonaws.comaws-de-media.s3.amazonaws.com/images/AWS_Summit_2018/June6/Apollo... · P ric e Objec ts P ric e P o in ts P ric e ... P roduc t P

Project On-boarding in an Enterprise Environmentaws-de-media.s3.amazonaws.com/images/AWS_Summit_2018...Over half of new project budgets goes to building technologies vs. running existing

Build a Recommendation Engine on AWS Todayaws-de-media.s3.amazonaws.com/images/AWS_Summit_2018/June6/Doppler/Build Your...was really useful and suited our business needs. We believe

AWS EC2 Virtualization: Introducing Nitroaws-de-media.s3.amazonaws.com/images/AWS_Summit_2018/June7/... · © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved

Connecting industrial PLC devices to AWSaws-de-media.s3.amazonaws.com/images/AWS_Summit_2018/June6... · TwinCAT IoT for connecting the PLC directly or via a gateway application TwinCAT

Edge and Hybrid Storage - Amazon Web Servicesaws-de-media.s3.amazonaws.com/images/AWS_Summit_2018/June… · •Low-cost: Simple licensing, predictable storage costs, and reduced

Container-based Architectures on AWS - Amazon …aws-de-media.s3.amazonaws.com/images/AWS_Summit_2018/...API for launching containers on the cluster EC2 INSTANCES ECS AGENT TASK TASK

An Introduction for Building End2End IoT Solutions finalaws-de-media.s3.amazonaws.com/images/AWS_Summit_2018/June6... · Keep your fleet secure AWS IoTDevice Defender is a fully managed

Build planetary scale applications with compartmentalizationaws-de-media.s3.amazonaws.com/images/AWS_Summit_2018/June… · Evolution Data Protection Amazon DynamoDB Amazon CloudFront

Real-World Predictive Applications with Amazon Machine ...aws-de-media.s3.amazonaws.com/images/AWS Summit Berlin 2015... · Services we will use: Amazon Machine Learning, Amazon Kinesis,

Additional Security Services on AWSaws-de-media.s3.amazonaws.com/images/AWS_Summit_2018/June7/Coral... · • CloudTrail history of AWS API calls used to access the Management Console,

Claranet from on-premise to serverless - Amazon Web Servicesaws-de-media.s3.amazonaws.com/images/AWS_Summit_2018/June7/BF/2 Berl… · About Claranet Group •Founded in 1996 •Private

Device Provisioning Options with AWS IoTaws-de-media.s3.amazonaws.com/images/AWS_Summit_2018/June6... · MQTT AWS IoT1-click Endpoints Gateway/PLC Cloud Enterprise Applications Device

Log Analytics with Amazon Elasticsearch Serviceaws-de-media.s3.amazonaws.com/images/_Munich_Loft_Slides/Elast… · Log Analytics with Amazon Elasticsearch Service Christoph Schmitter

tecRacer-Aesculap Academy goes Amazon Cloudaws-de-media.s3.amazonaws.com/images/Partner Web... · Amazon Web Services mit der “ManagedService Provider” Kompetenz ausgezeichnet

Data Challenges in an acquisition based world 07 June 2018aws-de-media.s3.amazonaws.com/images/AWS_Summit_2018/June7/Apollo/D… · the help of AWS. We are an Online Food Ordering

YOUR SOFTWARE LIFECYCLE SPEEDUP RECIPE AGILE …aws-de-media.s3.amazonaws.com/images/AWS_Summit_2018/... · 2018-06-19 · Digital Transformation New ways of developing, delivering,

Securely Connecting and Managing Industrial IoT …aws-de-media.s3.amazonaws.com/images/HMI/Sessions2018/S09... · 2018-05-12 · DATA SERVICES AMAZON S3 AMAZON DYNAMODB AMAZON RDS

Einführung in Amazon Machine Learningaws-de-media.s3.amazonaws.com/images/Webinar/2016-02-25-Machine...Amazon Redshift Amazon RDS Amazon S3 Amazon EMR. Three types of data-driven

Transforming Product Development - Amazon Web Servicesaws-de-media.s3.amazonaws.com/images/AWS_Summit_2018/June… · © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights