Elastic Compute for Batch Platform

Date : 13th Apr 2015

Name : Muralidhar Sortur

Content

•  Background on Batch
•  Infra need for Batch
•  Available technology
•  Mesos ecosystem
•  BAAS
•  Architecture and Design
•  Demo
•  What Next?
•  Questions

Background

•  Common use cases of batch processing: statement generation, bank postings, risk evaluation, credit score calculation, inventory management, portfolio optimization, data backup, data crunching, information extraction, etc.

•  JSR-352: Batch Applications for the Java Platform

•  Logging, checkpointing & parallelization

•  Start, Stop, Restart, Kill

•  Requires a large dedicated backend infrastructure

•  ROI

“Batch” As Is

•  Infrastructure: dedicated VMs, underutilized resources, long time to production

•  Operation: dedicated ops team to address production job failures

•  Development: no standard way to accomplish a task

Where does “Batch Infra” stand?

[Diagram labels: Batch Infra, Commercial Solutions, PaaS]

Infra need for batch

Exec environment: RHEL V?, Ubuntu, CentOS, Windows?

Resources: CPU, memory, network

Data: Hadoop, Swift store, SQL, NoSQL

Infra need for batch

[Diagram labels: PaaS, SaaS]

Exec environment: freedom to use different flavors and versions of containers

Resources: failure-proof, billed per usage

Data: abstraction, consistency, availability, performance

Static Partitioning & Resource Utilization

Infra

Mesos Ecosystem

Apache Mesos Architecture

[Diagram labels: Chronos scheduler, two Chronos executors]

Resource Offer
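
In Mesos, the master offers the free resources of its slaves to a framework (here, Chronos), and the framework decides which offers to use and which to decline. The sketch below simulates that decision in plain Python. It does not use the Mesos API: `Offer`, `Task`, and `match_offers` are hypothetical names, and the greedy first-fit policy is an assumption chosen for brevity.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """Free resources on one slave, as offered by the master (hypothetical type)."""
    slave: str
    cpus: float
    mem_mb: int


@dataclass
class Task:
    """Resources one batch task needs (hypothetical type)."""
    name: str
    cpus: float
    mem_mb: int


def match_offers(offers, tasks):
    """Greedy first-fit: place each pending task on the first offer that still fits it.

    Returns (placements, declined, pending):
      placements: task name -> slave name
      declined:   slaves whose offers were not used at all
      pending:    names of tasks that fit no offer in this round
    """
    placements = {}
    declined = []
    pending = list(tasks)
    for offer in offers:
        cpus, mem = offer.cpus, offer.mem_mb
        used = False
        for task in list(pending):
            if task.cpus <= cpus and task.mem_mb <= mem:
                placements[task.name] = offer.slave
                cpus -= task.cpus
                mem -= task.mem_mb
                pending.remove(task)
                used = True
        if not used:
            # An unused offer is declined so the master can offer it elsewhere.
            declined.append(offer.slave)
    return placements, declined, [t.name for t in pending]
```

Tasks left in `pending` simply wait for the next round of offers, which is what lets several frameworks share one pool of machines instead of statically partitioned VMs.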

What is BAAS?

1.  Batch as a hosted service
2.  Self-service
3.  Fault tolerant
4.  Better monitoring & notification
5.  Better ways to interact with the outside world (DBs, services, Hadoop, streams, messaging)
6.  Flexibility in writing batch jobs
7.  Open source!

Our Approach

•  Abstract the user experience from the specifics of the technology
•  Reduce opex and capex
•  Proactive monitoring and alerting
•  Self-service capability
•  Fault tolerance

BAAS Detailed View

[Diagram components: BAAS REST Service, Chronos Scheduler, Application Life Cycle Management, Mesos Master (resource manager), three Mesos slaves, Docker, ZooKeeper]

BAAS Deployment View

Interacting modules in Docker

[Diagram components: libraries, Manifest, cleanup, Execute, Deployer, Docker, LOGSTORE, MANIFEST STORE, MAVEN REPO, APP LIFE CYCLE MGMT MYSQL DB, GITHUB AUTHN, PASS]

Create Batch Flow

Schedule and Run Batch Job

Self Service

•  Creating and registering a batch app
•  Provisioning
•  Build and deployment
•  Scheduling
•  Monitoring batch jobs during execution
•  Email alerts for failed and long-running jobs
•  Killing a running batch job
•  Restarting a failed job
•  Report generation for batch job executions
•  Chaining the execution of batch jobs
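
Scheduling and chaining map naturally onto Chronos job definitions: a scheduled job carries an ISO 8601 repeating interval, and a dependent job names its parents instead of a schedule. The sketch below only builds the JSON payloads and does not send them. The field names (`name`, `command`, `schedule`, `owner`, `parents`) follow the Chronos job format as commonly documented and should be checked against the deployed Chronos version. The helper names, job names, commands, and owner address are hypothetical, and the deck does not show how the BAAS REST service itself represents jobs.

```python
import json
from datetime import datetime, timezone


def scheduled_job(name, command, start, period, owner, repeats=None):
    """Build a time-scheduled job payload.

    `start` must be a timezone-aware datetime. `period` is an ISO 8601
    duration such as "P1D" or "PT2H". `repeats=None` means repeat forever.
    """
    count = "" if repeats is None else str(repeats)
    start_utc = start.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "name": name,
        "command": command,
        "schedule": f"R{count}/{start_utc}/{period}",
        "owner": owner,
    }


def dependent_job(name, command, parents, owner):
    """Build a chained job payload: it runs after its parent jobs succeed."""
    return {
        "name": name,
        "command": command,
        "parents": list(parents),
        "owner": owner,
    }


def to_json(job):
    """Serialize a job payload for a REST call."""
    return json.dumps(job, sort_keys=True)
```

For example, `scheduled_job("statement-generation", "run-statements.sh", start, "P1D", "batch-team@example.com")` describes a daily job, and `dependent_job("statement-report", "run-report.sh", ["statement-generation"], "batch-team@example.com")` chains a report job after it.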

Operation

Development

•  Do I need to code all the steps in a batch?
•  Is there a standard way of writing a step / task?
•  How can I pass parameters from one step to another?
•  Why should everyone implement the same task in a different way?
•  Where can I contribute a well-defined task / step to be used in a batch job?

Storage

•  Is there an abstraction for batch storage needs?
•  Is it possible to isolate storage, and the choice of storage, from code?
•  Should I write a separate batch app for taking backups?

Batch Reusable Infra component

BRIC Tasklet

0 to N Inputs

output

Repeat Status

Inputs can be:

1.  Hardcoded parameters
2.  Output of a previous Tasklet
3.  Command-line parameters

Output can be passed to other Tasklets
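
A minimal sketch of the Tasklet contract described above: each tasklet merges its three kinds of input, returns an output plus a repeat status, and the output feeds the next tasklet. This is an illustration in Python, not the BRIC code, which the deck does not show. `Tasklet`, `run_chain`, and the `"previous"` input key are hypothetical, and the two status values are borrowed from the usual finished / continuable convention.

```python
from enum import Enum


class RepeatStatus(Enum):
    FINISHED = "finished"        # tasklet is done, move on
    CONTINUABLE = "continuable"  # call the same tasklet again


class Tasklet:
    """One reusable step with 0 to N inputs and a single output."""

    def __init__(self, name, func, hardcoded=None):
        self.name = name
        self.func = func                  # func(inputs) -> (output, RepeatStatus)
        self.hardcoded = dict(hardcoded or {})

    def execute(self, previous_output, cli_params):
        # Command-line parameters override hardcoded ones (an assumed policy).
        inputs = {**self.hardcoded, **cli_params}
        if previous_output is not None:
            inputs["previous"] = previous_output
        return self.func(inputs)


def run_chain(tasklets, cli_params):
    """Run tasklets in order, passing each output to the next tasklet."""
    previous = None
    for tasklet in tasklets:
        while True:
            result, status = tasklet.execute(previous, cli_params)
            if status is RepeatStatus.FINISHED:
                break
        previous = result
    return previous
```

For example, a `read` tasklet with hardcoded `values` followed by a `scale` tasklet that multiplies `inputs["previous"]` by a command-line `factor` yields the scaled list as the chain's final output.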

Storage as a Service

[Diagram components: Storage Manager, Batch Job Jcloud API, Compute, Event manager, BAAS, Meta Data Repo]
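
One way to keep the choice of storage out of batch code is to program against a small interface and pick the backend from configuration. The sketch below assumes this design; the deck does not show the Storage Manager's actual API. `BatchStorage`, `InMemoryStorage`, `LocalDirStorage`, `make_storage`, and `backup_job` are hypothetical names, and the two backends are stand-ins for real stores such as Swift or HDFS.

```python
import os
from abc import ABC, abstractmethod


class BatchStorage(ABC):
    """Storage interface that batch code depends on (hypothetical)."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class InMemoryStorage(BatchStorage):
    """Backend for tests: keeps objects in a dict."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = bytes(data)

    def get(self, key):
        return self._objects[key]


class LocalDirStorage(BatchStorage):
    """Backend that stores each key as a file under a root directory."""

    def __init__(self, root):
        self._root = root

    def _path(self, key):
        return os.path.join(self._root, *key.split("/"))

    def put(self, key, data):
        path = self._path(key)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as handle:
            handle.write(data)

    def get(self, key):
        with open(self._path(key), "rb") as handle:
            return handle.read()


def make_storage(config):
    """Pick a backend from configuration, so job code never names one."""
    if config["kind"] == "memory":
        return InMemoryStorage()
    if config["kind"] == "local":
        return LocalDirStorage(config["root"])
    raise ValueError(f"unknown storage kind: {config['kind']}")


def backup_job(storage, records):
    """Example batch step: it only sees the BatchStorage interface."""
    storage.put("backup/records.txt", "\n".join(records).encode("utf-8"))
```

With this shape, switching a job from local disk to another store is a configuration change, and a backup becomes a reusable step rather than a separate batch app.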

“Batch” with this

•  Infrastructure: no dedicated VMs, improved resource utilization, time to production in less than a day

•  Operation: complete self-service; the developer is responsible for acting on failures

•  Development: standard way to accomplish a task

Resources

•  http://stackoverflow.com/questions/18285212/how-to-scale-docker-containers-in-production

•  http://eurosys2013.tudos.org/wp-content/uploads/2013/paper/Schwarzkopf.pdf

•  http://www.wired.com/wiredenterprise/2013/03/google-borg-twitter-mesos/

•  http://www.slideshare.net/tomasbart/introduction-to-apache-mesos

•  http://www.slideshare.net/pacoid/strata-sc-2014-apache-mesos-as-an-sdk-for-building-distributed-frameworks

•  https://www.docker.com/tryit/
