Flint: Making Sparks (and Sharks and HDFSs too!)

Jim Donahue | Principal Scientist, Adobe Systems Technology Lab

© 2013 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.


Flint: Bring BDAS to the AWS Masses @ Adobe

• How to effectively evangelize BDAS @ Adobe?

  • Looking for intrepid, curious users who want to experiment

  • Curiosity is always tempered by the cost of startup

  • Most of the data for experimental applications is likely in AWS


Flint: Design Principles

• Shared nothing: get your own AWS account and go

• Simple configuration: write a little JSON, run a couple of scripts

• Efficient, flexible scaling: as simple or complex as you want/need

• Full access to tools: batch, Spark/Shark shells, Shark Server, web UIs, …

• Access to all the Spark/Shark tuning parameters: very simple hardwired “spark-env.sh”

• Tuned to Adobe environment: port choices determined by our firewall


Flint: Architecture


• Local Spark/Shark and slaves can use S3 storage for files

• Remote Access runs shells on SSH Server

• Components use S3, SimpleDB for state management

• Flint distributes shared AWS credentials among components

• Flint manages master, SSHServer startup

• Slave elasticity managed by master, can leverage spot pricing

[Diagram: the Local Flint Server (Local Spark/Shark, Remote Access, Cluster Setup) working with AWS-hosted S3, SimpleDB, the Spark Master, an SSHServer (shells), and Spark Slave(s).]
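The architecture slide says slave elasticity is managed by the master and can leverage spot pricing. As a minimal sketch, assuming a hypothetical scaling rule with a minimum/maximum slave count, a per-slave task capacity, and a maximum spot bid (none of these names come from Flint), a master-side scaling decision might look like this:

```python
from dataclasses import dataclass


@dataclass
class ScalingRule:
    """Hypothetical scaling rule; the field names are illustrative, not Flint's."""
    min_slaves: int
    max_slaves: int
    tasks_per_slave: int   # pending tasks one slave is expected to absorb
    max_spot_bid: float    # highest spot price (USD/hour) we are willing to pay


def plan_slaves(rule: ScalingRule, pending_tasks: int, spot_price: float) -> dict:
    """Decide how many slaves to run and which EC2 market to request them from."""
    # Ceiling division: enough slaves to cover all pending work.
    wanted = -(-pending_tasks // rule.tasks_per_slave)
    # Clamp to the limits set in the cluster's scaling rule.
    target = max(rule.min_slaves, min(rule.max_slaves, wanted))
    # Prefer spot instances whenever the current price is within our bid.
    market = "spot" if spot_price <= rule.max_spot_bid else "on-demand"
    return {"target": target, "market": market}


rule = ScalingRule(min_slaves=1, max_slaves=10, tasks_per_slave=8, max_spot_bid=0.10)
print(plan_slaves(rule, pending_tasks=30, spot_price=0.07))
# {'target': 4, 'market': 'spot'}
```

Clamping to a minimum of one slave keeps a cluster usable even when no work is queued; a real rule would also need to account for spot instances being reclaimed.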


Flint: Setup


• Flint instance manages encrypted AWS credentials

• Create S3 buckets to hold JAR files

• Create SimpleDB tables to hold state

• Create key pair, security group for instances

[Diagram: the Local Flint Server (Local Spark/Shark, Remote Access, Cluster Setup) with S3 and SimpleDB on AWS.]
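The setup steps are one-time per AWS account, so it helps if they are safe to re-run. A minimal sketch, assuming an in-memory `FakeAccount` in place of real AWS calls and a hypothetical naming scheme derived from a user prefix (SimpleDB calls its tables "domains"):

```python
from dataclasses import dataclass, field


@dataclass
class FakeAccount:
    """In-memory stand-in for an AWS account (hypothetical; not an AWS SDK API)."""
    buckets: set = field(default_factory=set)
    domains: set = field(default_factory=set)
    key_pairs: set = field(default_factory=set)
    security_groups: set = field(default_factory=set)


def setup_account(acct: FakeAccount, prefix: str) -> list:
    """Create the per-account resources if missing; return what was created."""
    wanted = [
        ("buckets", f"{prefix}-flint-jars"),        # S3 bucket holding JAR files
        ("domains", f"{prefix}_flint_state"),       # SimpleDB domain holding state
        ("key_pairs", f"{prefix}-flint-key"),       # EC2 key pair for instances
        ("security_groups", f"{prefix}-flint-sg"),  # security group for instances
    ]
    created = []
    for kind, name in wanted:
        existing = getattr(acct, kind)
        if name not in existing:  # idempotent: re-running setup changes nothing
            existing.add(name)
            created.append((kind, name))
    return created


acct = FakeAccount()
print(len(setup_account(acct, "jd")))  # 4 on the first run
print(setup_account(acct, "jd"))       # [] on a re-run
```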


Flint: Provisioning


• Define clusters through a JSON spec (“master instance configuration is x, slave instance configuration is y, scaling rule is …”)

• Define configurations through a JSON spec (“spark master uses AMI x, running service y, with properties a, b, …”), plus a JAR file containing the services code

• A “getting started” set of clusters and configurations is provided

• AMI provided with all the requisite Spark / Shark / Hadoop / Kafka bits

[Diagram: the Local Flint Server (Local Spark/Shark, Remote Access, Cluster Setup) with S3 and SimpleDB on AWS.]
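To make the two kinds of JSON spec concrete, here is a sketch of what a cluster spec and its configuration specs might contain, plus a loader that checks the cluster refers to configurations that exist. Every key name, the `m1` instance types, and the `ami-xxxxxxxx` placeholder are assumptions for illustration; Flint's actual schema is not shown in the slides. `spark.executor.memory` is a real Spark property.

```python
import json

# Hypothetical cluster spec: which configuration each role uses, plus a scaling rule.
CLUSTER_SPEC = """
{
  "name": "getting-started",
  "master": {"configuration": "spark-master", "instanceType": "m1.large"},
  "slave":  {"configuration": "spark-slave",  "instanceType": "m1.xlarge"},
  "scaling": {"min": 1, "max": 4}
}
"""

# Hypothetical configuration specs: AMI, services to run, properties, services JAR.
CONFIG_SPECS = """
[
  {"name": "spark-master", "ami": "ami-xxxxxxxx", "services": ["SparkMaster"],
   "properties": {"spark.executor.memory": "4g"}, "jar": "flint-services.jar"},
  {"name": "spark-slave", "ami": "ami-xxxxxxxx", "services": ["SparkWorker"],
   "properties": {}, "jar": "flint-services.jar"}
]
"""


def load_specs(cluster_text: str, configs_text: str):
    """Parse both specs and check that they are consistent with each other."""
    cluster = json.loads(cluster_text)
    configs = {c["name"]: c for c in json.loads(configs_text)}
    for role in ("master", "slave"):
        ref = cluster[role]["configuration"]
        if ref not in configs:
            raise ValueError(f"{role} refers to unknown configuration {ref!r}")
    scaling = cluster["scaling"]
    if not 0 <= scaling["min"] <= scaling["max"]:
        raise ValueError("scaling.min must be between 0 and scaling.max")
    return cluster, configs


cluster, configs = load_specs(CLUSTER_SPEC, CONFIG_SPECS)
print(configs[cluster["master"]["configuration"]]["services"])  # ['SparkMaster']
```

Keeping clusters and configurations in separate specs lets one configuration (say, a tuned Spark master) be reused across several cluster definitions.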


Flint: Cluster Start


• Local Flint Instance launches “master” instance (using cluster definition in SimpleDB)

• Master reads SimpleDB and S3 for configuration and code, installs master services

• Starting services launches Spark and/or HDFS masters through command line

• Master puts “connect URL” in SimpleDB

[Diagram: the Local Flint Server (Local Spark/Shark, Remote Access, Cluster Setup) with S3, SimpleDB, and the Spark Master on AWS.]
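The master's startup steps can be sketched as a dry run: look up the master configuration, decide which commands to launch, and publish the connect URL. This assumes a hypothetical dict-backed `StateStore` in place of SimpleDB (real SimpleDB attribute values are strings; a dict is stored here for brevity) and a hypothetical `start_master` helper. The `spark-class` invocation and default port 7077 follow Spark's standalone-mode documentation, though the script's path varies by Spark version; HDFS startup is left out.

```python
SPARK_MASTER_PORT = 7077  # Spark standalone's default master port


class StateStore:
    """Dict-backed stand-in for a SimpleDB domain (hypothetical)."""

    def __init__(self):
        self.items = {}

    def get(self, item, attr):
        return self.items.get(item, {}).get(attr)

    def put(self, item, attr, value):
        self.items.setdefault(item, {})[attr] = value


def start_master(store: StateStore, cluster_name: str, host: str) -> list:
    """Dry-run master startup: return the commands it would launch."""
    config = store.get(cluster_name, "master_config")
    if config is None:
        raise LookupError(f"no master configuration for cluster {cluster_name!r}")
    commands = []
    if "SparkMaster" in config["services"]:
        commands.append(["./spark-class", "org.apache.spark.deploy.master.Master"])
    # HDFS NameNode startup is omitted from this sketch.
    # Publish the connect URL so slaves and clients can find the master.
    store.put(cluster_name, "connect_url", f"spark://{host}:{SPARK_MASTER_PORT}")
    return commands


store = StateStore()
store.put("demo", "master_config", {"services": ["SparkMaster"]})
print(start_master(store, "demo", "10.0.0.5"))
print(store.get("demo", "connect_url"))  # spark://10.0.0.5:7077
```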


Flint: Slave(s) Start


• Master “scaling service” launches slave instance(s)

• Slave reads SimpleDB and S3 for configuration and code, installs worker services

• Slave gets master “connect URL” from SimpleDB

• Slave launches Spark and/or HDFS workers through command line

[Diagram: the Local Flint Server (Local Spark/Shark, Remote Access, Cluster Setup) with S3, SimpleDB, the Spark Master, and Spark Slave(s) on AWS.]
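Because a slave may boot before the master has published its connect URL, the slave needs to wait for it. A minimal sketch, assuming a hypothetical `lookup` callable that reads the URL from the shared state (SimpleDB in Flint) and returns `None` until it is present. The worker command line follows Spark's standalone documentation (`spark-class org.apache.spark.deploy.worker.Worker spark://IP:PORT`), with the script path depending on the Spark version.

```python
import time


def wait_for_connect_url(lookup, timeout_s: float = 60.0, poll_s: float = 2.0) -> str:
    """Poll lookup() until it returns a connect URL or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        url = lookup()
        if url:
            return url
        if time.monotonic() >= deadline:
            raise TimeoutError("master never published its connect URL")
        time.sleep(poll_s)


def worker_command(connect_url: str) -> list:
    """Command line that starts a Spark standalone worker pointed at the master."""
    return ["./spark-class", "org.apache.spark.deploy.worker.Worker", connect_url]


# Simulate a master that publishes its URL on the third poll.
answers = iter([None, None, "spark://10.0.0.5:7077"])
url = wait_for_connect_url(lambda: next(answers), timeout_s=5, poll_s=0.01)
print(worker_command(url))
```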


Flint: Client Start


• Flint instance launches “client” instance (using cluster definition in SimpleDB)

• Client reads SimpleDB and S3 for configuration and code, installs (SSHServer) services

• Client reads SimpleDB for authentication info, master connect URL

• Service startup starts SSHServer connected to the right “shell factory”

[Diagram: the Local Flint Server (Local Spark/Shark, Remote Access, Cluster Setup) with S3, SimpleDB, the Spark Master, an SSHServer (shells), and Spark Slave(s) on AWS.]
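One way to picture the client's startup is as reading its cluster entry (authorized keys, shell type, master connect URL) and choosing the shell factory its SSHServer will use. In this sketch the state layout, the `ShellFactory` type, and the shell commands are all hypothetical; pointing the shells at the cluster through a `MASTER` environment variable follows how Spark-era shells were commonly configured, but treat it as an assumption.

```python
from dataclasses import dataclass


@dataclass
class ShellFactory:
    """How the SSHServer should start a shell for each session (hypothetical)."""
    command: list
    env: dict


def make_shell_factory(kind: str, connect_url: str) -> ShellFactory:
    # Assumption: both shells find the cluster through the MASTER environment variable.
    env = {"MASTER": connect_url}
    if kind == "spark":
        return ShellFactory(["./spark-shell"], env)
    if kind == "shark":
        return ShellFactory(["./bin/shark"], env)
    raise ValueError(f"unknown shell kind {kind!r}")


def client_startup(state: dict, cluster: str):
    """Read authentication info and the master URL, then pick the shell factory."""
    entry = state[cluster]
    authorized = set(entry["authorized_keys"])
    factory = make_shell_factory(entry["shell"], entry["connect_url"])
    return authorized, factory


state = {"demo": {"authorized_keys": ["ssh-rsa AAAAexample jd@example"],
                  "shell": "shark", "connect_url": "spark://10.0.0.5:7077"}}
keys, factory = client_startup(state, "demo")
print(factory)
```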


Flint: Client Connect (Remote Shells)


• Flint server finds the “appropriate client”

• SSH client launched to connect

• SSHServer connects to master on client’s behalf

[Diagram: the Local Flint Server (Local Spark/Shark, Remote Access, Cluster Setup) with S3, SimpleDB, the Spark Master, an SSHServer (shells), and Spark Slave(s) on AWS.]
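The slides do not say how the Flint server picks the "appropriate client". As a hypothetical policy for illustration only, this sketch chooses a running client instance for the requested cluster and shell type with the fewest open sessions, then builds the SSH command line. The `ClientInstance` fields and port 2222 are assumptions; `ssh -p PORT user@host` is standard OpenSSH usage.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClientInstance:
    """A running client (SSHServer) instance as recorded in shared state (hypothetical)."""
    host: str
    port: int
    cluster: str
    shell: str
    sessions: int
    running: bool


def find_client(clients: list, cluster: str, shell: str) -> Optional[ClientInstance]:
    """Pick the least-loaded running client serving this cluster and shell type."""
    candidates = [c for c in clients
                  if c.running and c.cluster == cluster and c.shell == shell]
    if not candidates:
        return None
    return min(candidates, key=lambda c: c.sessions)


def ssh_command(client: ClientInstance, user: str) -> list:
    """OpenSSH command line to connect to the chosen client's SSHServer."""
    return ["ssh", "-p", str(client.port), f"{user}@{client.host}"]


clients = [
    ClientInstance("10.0.0.7", 2222, "demo", "spark", 3, True),
    ClientInstance("10.0.0.8", 2222, "demo", "spark", 1, True),
    ClientInstance("10.0.0.9", 2222, "demo", "spark", 0, False),  # not running
]
best = find_client(clients, "demo", "spark")
print(ssh_command(best, "jd"))  # ['ssh', '-p', '2222', 'jd@10.0.0.8']
```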


Flint: Client Asynchronous Requests

• Flint clients can also make asynchronous requests

  • Each Flint master runs a service that pulls requests from an SQS queue

  • Request progress/results are stored in SimpleDB

• Requests include:

  • Moving data between HDFS and S3

  • Mounting an EBS volume and caching it in HDFS (AWS public data sets)

  • Running a batch job

• Clients can make requests even if the cluster is not alive

  • Simplifies startup sequencing

  • Monitoring the “cluster queues” can be used to start a cluster “on demand”
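The request service can be sketched as a loop that drains a queue and records each request's progress and result. In this sketch `queue.Queue` stands in for SQS, a plain dict stands in for the SimpleDB status records, and the request format and handler names are hypothetical. Since requests wait in the queue until a master pulls them, a client can enqueue work before the cluster is up.

```python
import queue


def process_requests(requests: queue.Queue, status: dict, handlers: dict) -> None:
    """Drain the queue, recording progress and results in `status`."""
    while True:
        try:
            req = requests.get_nowait()
        except queue.Empty:
            return
        rid = req["id"]
        status[rid] = {"state": "running"}
        handler = handlers.get(req["type"])
        if handler is None:
            status[rid] = {"state": "failed",
                           "error": f"unknown request type {req['type']!r}"}
            continue
        try:
            status[rid] = {"state": "done", "result": handler(**req.get("args", {}))}
        except Exception as exc:  # record the failure instead of killing the service
            status[rid] = {"state": "failed", "error": str(exc)}


# Hypothetical handler; a real one would run the actual S3-to-HDFS copy.
handlers = {"copy_s3_to_hdfs": lambda src, dst: f"copied {src} -> {dst}"}

requests, status = queue.Queue(), {}
# The client enqueues work; the master processes it whenever it is running.
requests.put({"id": "r1", "type": "copy_s3_to_hdfs",
              "args": {"src": "s3n://bucket/in", "dst": "/data/in"}})
process_requests(requests, status, handlers)
print(status["r1"])
```

Real SQS delivers messages at least once and requires an explicit delete after processing, so handlers for a production version of this loop would need to tolerate being run twice.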


Flint: Where We Are Now

• Have some intrepid, curious users

• The big issue is always “Do I really want to use Spark/Shark?”

  • SQL is a big selling point

  • Scala is a mild put-off

  • Spark Streaming may help settle the issue

• Open sourcing is under discussion

  • If you’re interested, let me know!
