Practical Cloud & Workflow Orchestration

DESCRIPTION

A presentation given at the 2011 Amazon AWS Genomics meeting held in Seattle, WA. This is a 30-minute talk I gave focusing mainly on practical tools, tips and methods for bootstrapping and orchestration on the cloud. Covers examples of: Ubuntu Cloud Init, AWS CloudFormation, Opscode Chef and MIT StarCluster.

Practical Cloud & Workflow Orchestration

2011 Amazon Genomics Event
Chris Dagdigian

chris@bioteam.net  

I’m Chris. I’m an infrastructure geek. I work for the BioTeam.

Twitter: @chris_dag  

Disclaimer.

I’m not an Amazon shill.

Really.

The IaaS competition just can’t compete.

AWS lets me build useful stuff.

When stuff gets built, I get paid.

Installing VMware & excreting a press release does not turn a company into a cloud provider.

I need more than just virtual compute and block storage. AWS has tons of glue and many useful IaaS building blocks.

IaaS competitors lag far behind in features and service offerings.

Speaking of pretenders…

No APIs? Not a cloud.

No self-service? Not a cloud.

I have to email a human? Not a cloud.

50% failure rate on server launch? Lame cloud.

Virtual servers & block storage only? Barely a cloud.

I’m getting insufferable, huh? Moving on …

Three Topics Today.

Time, Laziness & Beauty.

image: shanelin via flickr

Tick … Tick Tick…

User expectations are changing.

Automated provisioning can shrink the time between “I want to do some science” & “I’m ready to do some science”.

However…

If servers, storage and systems can be deployed in minutes …

… why does it still take days, several helpdesk tickets & a team of humans to load software and configure my systems to actually do science?

It shouldn’t.

If provisioning gets faster, configuration management also needs to keep pace.

Laziness.

Larry Wall’s 1st Great Virtue

“… the quality that makes you go to great effort to reduce overall energy expenditure. It makes you write labor-saving programs that other people will find useful, and document what you wrote so you don't have to answer so many questions about it.”

It’s all scriptable.

•  Servers
•  Storage
•  Network
•  Bootstrapping
•  Provisioning
•  Configuration
•  Management
•  Monitoring
•  Scaling
•  Accounting & audit trails

Not hype. Real.

I can do it from my iPad.

No cubicle required.

Our research IT infrastructures can now be 100% virtual and 100% scriptable.

And it’s pretty easy to understand.

Anyone can drive this stuff.

Especially motivated researchers.

Stuff like this is a big deal.

5GB managed MySQL in the cloud. $.011 / hour

Database Administrator not required.

Automatic patching, backups & clustering

Anyone with a web browser can launch one.

Beauty.

Scriptable infrastructure is just the beginning.

The really cool stuff is what we build on top.

With good tools …

We can orchestrate complex systems, pipelines and workflows.

Orchestrated systems working in concert are a beautiful thing.

Let me show you a few of the tools we like.

Cloud Init

Cloud Init
•  https://help.ubuntu.com/community/UEC
•  Developed by Ubuntu
•  Baked into all Ubuntu UEC releases
•  Also baked into Amazon Linux AMIs
•  Works on Eucalyptus clouds as well

Cloud Init gives you a hook into freshly booted systems.

It’s a great and easy-to-comprehend way to bootstrap or customize generic server images.

When you launch a server, you can inject a YAML-formatted file into the environment.

Cloud Init files are parsed and executed right after the node boots for the first time.

You can run scripts, install software, load SSH keys, etc. to ‘bootstrap’ a generic node.

#cloud-config
packages:
 - httpd
runcmd:
 - /etc/init.d/httpd start
 - echo "<h1>Hello Amazon Genomics Event!</h1>" > /var/www/html/index.html

Previous real-world example does this:
1.  Download/install Apache web server
2.  Turn on the web server
3.  Create a cheezy index.html
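
If you would rather generate that user data than hand-edit it, here is a minimal Python sketch. The helper name `build_cloud_config` and its parameters are my own assumptions for illustration; only the `#cloud-config` header and the `packages` / `runcmd` keys come from the example above. It does no YAML escaping, so treat it as a starting point rather than a general-purpose generator.

```python
def build_cloud_config(packages, commands):
    """Render a tiny cloud-config document as text.

    Hypothetical helper: it only knows the two keys used in this talk
    (packages and runcmd) and does not quote or escape anything, so a
    command containing ': ' or ' #' would need manual YAML quoting.
    """
    lines = ["#cloud-config"]  # header must be the first line of the user data
    if packages:
        lines.append("packages:")
        lines.extend(f" - {name}" for name in packages)
    if commands:
        lines.append("runcmd:")
        lines.extend(f" - {cmd}" for cmd in commands)
    return "\n".join(lines) + "\n"


user_data = build_cloud_config(
    packages=["httpd"],
    commands=[
        "/etc/init.d/httpd start",
        'echo "<h1>Hello Amazon Genomics Event!</h1>" > /var/www/html/index.html',
    ],
)
print(user_data)
```

Writing `user_data` to a file gives you something you can hand to the instance at launch time.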

This is the script I ran moments before this talk …

#!/bin/sh
ec2-run-instances ami-8c1fece5 \
 -n 1 \
 -t m1.small \
 -g dagdemo-SG \
 -k dagdemo-sshkeypair \
 --user-data-file ./cloudInit-config.txt

Important to understand:
•  ami-8c1fece5 is Amazon Linux public AMI
•  No web server pre-installed
•  Never before been ‘touched’ by me
•  Cloud Init does it all via the script I injected at instance launch time
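
The same launch can be wrapped in a few lines of Python so the AMI, instance type and user-data file become parameters. This is a sketch under assumptions: the function name `run_instances_argv` is made up, it only assembles the argument list for the `ec2-run-instances` command used above, and it deliberately does not execute anything.

```python
import shlex


def run_instances_argv(ami, user_data_file, count=1, instance_type="m1.small",
                       group=None, keypair=None):
    """Build the argument list for ec2-run-instances (hypothetical wrapper).

    Only the flags that appear in the talk's launch script are supported.
    """
    argv = ["ec2-run-instances", ami, "-n", str(count), "-t", instance_type]
    if group:
        argv += ["-g", group]
    if keypair:
        argv += ["-k", keypair]
    argv += ["--user-data-file", user_data_file]
    return argv


argv = run_instances_argv(
    "ami-8c1fece5",
    "./cloudInit-config.txt",
    group="dagdemo-SG",
    keypair="dagdemo-sshkeypair",
)
# Print the command for review; to really launch, pass argv to
# subprocess.run(argv, check=True) on a machine with the EC2 tools installed.
print(shlex.join(argv))
```

Keeping the command as a list rather than a string avoids shell-quoting surprises when names contain odd characters.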

Let’s see if it worked …

Amazon CloudFormation

Amazon CloudFormation
•  http://aws.amazon.com/cloudformation/
•  AWS specific
•  Sweet way to turn on|off entire stacks of related and dependent AWS services

Treat complex infrastructure as single resource
•  Cliché example - In a single “stack” you can define and then start/stop:
•  Elastic database cluster +
•  Elastic webserver cluster +
•  Monitoring & auto-scaling triggers
•  Event & error notification
•  Elastic load balancer

My live demo of CloudFormation
•  Using the example WordPress Blog template
•  It does a ton of cool stuff:
•  RDS backend for MySQL database, elastic webserver cluster with auto-scaling, security group setup, automatic alarm notices
•  It all sits behind an elastic load balancer

My CloudFormation blog demo:
•  Actual stack file at http://biote.am/6d
•  Check it out …
•  JSON formatted but still quite readable
•  It lets me define and then control a ton of different related AWS services all at once.
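
To give a feel for what a stack file contains, here is a much smaller template than the WordPress one, built as a Python dict and dumped to JSON. Everything specific in it is an assumption for illustration (the logical names `WebSecurityGroup` and `WebServer`, the `KeyName` parameter, the choice of two resources); it is not the demo template. The small checker underneath just confirms that every `Ref` points at something the template defines.

```python
import json

# Hypothetical two-resource stack: one web server plus its security group.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Illustrative stack, not the WordPress demo template",
    "Parameters": {
        "KeyName": {"Type": "String", "Description": "Existing EC2 key pair"},
    },
    "Resources": {
        "WebSecurityGroup": {
            "Type": "AWS::EC2::SecurityGroup",
            "Properties": {
                "GroupDescription": "Allow HTTP from anywhere",
                "SecurityGroupIngress": [
                    {"IpProtocol": "tcp", "FromPort": "80",
                     "ToPort": "80", "CidrIp": "0.0.0.0/0"},
                ],
            },
        },
        "WebServer": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "ImageId": "ami-8c1fece5",
                "InstanceType": "m1.small",
                "KeyName": {"Ref": "KeyName"},
                "SecurityGroups": [{"Ref": "WebSecurityGroup"}],
            },
        },
    },
    "Outputs": {
        "PublicDNS": {"Value": {"Fn::GetAtt": ["WebServer", "PublicDnsName"]}},
    },
}


def referenced_names(node):
    """Collect every logical name used in a {"Ref": ...} anywhere in the tree."""
    if isinstance(node, dict):
        found = set()
        for key, value in node.items():
            if key == "Ref" and isinstance(value, str):
                found.add(value)
            else:
                found |= referenced_names(value)
        return found
    if isinstance(node, list):
        return set().union(*(referenced_names(item) for item in node))
    return set()


defined = set(template["Parameters"]) | set(template["Resources"])
unresolved = referenced_names(template) - defined
print(json.dumps(template, indent=2))
print("unresolved references:", sorted(unresolved))
```

The point is the shape: parameters in, resources that reference each other, outputs back out. The real templates are the same idea with more resources.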

#!/bin/sh
# Launch Stack
cfn-create-stack AWSGenomics-demoStack \
 --template-file cf-wordpress.json.txt

#!/bin/sh
# Check state & status
cfn-describe-stacks AWSGenomics-demoStack
echo ""
cfn-describe-stack-events \
 AWSGenomics-demoStack --headers
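
Checking status by hand gets old fast, so the usual next step is a polling loop. A minimal sketch, assuming you supply your own `get_status` callable that returns the stack's current status string (for example by parsing the describe output); the function name `wait_for_stack` and its defaults are made up for illustration. The demo at the bottom feeds it canned statuses so nothing touches AWS.

```python
import time

# Statuses that mean "keep waiting"; anything else is treated as settled.
IN_PROGRESS = {"CREATE_IN_PROGRESS", "ROLLBACK_IN_PROGRESS"}


def wait_for_stack(get_status, poll_seconds=15, max_polls=80, sleep=time.sleep):
    """Poll get_status() until the stack leaves an in-progress state.

    Returns the final status string, or raises TimeoutError after max_polls.
    """
    for _ in range(max_polls):
        status = get_status()
        if status not in IN_PROGRESS:
            return status
        sleep(poll_seconds)
    raise TimeoutError("stack did not settle in time")


# Demo with canned statuses and a no-op sleep, so it runs instantly.
canned = iter(["CREATE_IN_PROGRESS", "CREATE_IN_PROGRESS", "CREATE_COMPLETE"])
final = wait_for_stack(lambda: next(canned), sleep=lambda seconds: None)
print(final)
```

Passing the status lookup and the sleep function in as arguments keeps the loop testable without a live stack.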

10 AWS Services/Resources orchestrated as one.

CloudWatch.

Auto-scaling triggers.

SNS Endpoints for Alarms.

Alarm triggers.

RDS Database & Security Group.

Elastic Load Balancer.

EC2 Security Group.

Cool, huh?

{ in case the demo fails! }

Opscode Chef

Chef enables Infrastructure as Code

It’s freaking awesome.

Chef lets you:
Manage configuration as idempotent Resources.
Group resources as idempotent Recipes.
Group recipes into Roles.
Track it all like Source Code.
Search your infrastructure like a ninja.
Ohai!
Configure your systems, software & pipelines.
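
“Idempotent” is the word doing the heavy lifting there, so here is a tiny illustration of the idea in plain Python. This is not Chef code and not how Chef implements resources; `ensure_line` is a hypothetical helper that describes a desired state (“this line is in this file”) and only acts when the state is missing, so running it twice is harmless.

```python
import os
import tempfile


def ensure_line(path, line):
    """Make sure `line` is present in the file at `path`.

    Returns True only when the file had to be changed, False when the
    desired state was already in place.
    """
    content = ""
    if os.path.exists(path):
        with open(path) as handle:
            content = handle.read()
    if line in content.splitlines():
        return False  # already converged, do nothing
    # Avoid gluing the new line onto an unterminated last line.
    separator = "" if content == "" or content.endswith("\n") else "\n"
    with open(path, "a") as handle:
        handle.write(separator + line + "\n")
    return True


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "hosts")
    first = ensure_line(target, "10.0.0.5 master")   # changes the file
    second = ensure_line(target, "10.0.0.5 master")  # no-op on the second run
    with open(target) as handle:
        contents = handle.read()

print(first, second)
```

That “describe the end state, act only on the difference” pattern is what makes it safe to re-run configuration on every node, every time.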

http://www.opscode.com/chef/
•  Several flavors
•  Open source
•  Commercial / Managed
•  Commercial / ‘Behind your Firewall’
•  No time today for even a short description of how it works. You should check it out.

Chef demo via ‘knife’ command line …

knife ec2 server create \
 -N aws-genomicsDemo \
 -I ami-63be790a \
 -f t1.micro \
 -G default \
 -S bioteam-IAM-admins-v1 \
 -r 'recipe[getting-started]' \
 -i ./bioteam-IAM-admins-v1.pem \
 -x ubuntu

Fully automatic remote bootstrapping …

Done!

Search-driven, parallel remote SSH execution

knife ssh name:aws-genomicsDemo \
 -a cloud.public_hostname \
 -x ubuntu \
 -i bioteam-IAM-admins-v1.pem \
 'sudo chef-client; \
 cat /tmp/chef-getting-started.txt'
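
The pattern behind that command is “search for nodes, then run the same thing on all of them at once”. Here is a rough Python sketch of the pattern only. It assumes a made-up in-memory inventory (`NODES`), a simple name-prefix match standing in for a real search query, and a fake runner in place of SSH; none of these names come from Chef or knife.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical inventory standing in for a searchable node index.
NODES = [
    {"name": "aws-genomicsDemo", "public_hostname": "ec2-198-51-100-10.example.com"},
    {"name": "aws-genomicsDemo-2", "public_hostname": "ec2-198-51-100-11.example.com"},
    {"name": "db-master", "public_hostname": "ec2-198-51-100-12.example.com"},
]


def search(nodes, name_prefix):
    """Return the public hostnames of nodes whose name starts with the prefix."""
    return [n["public_hostname"] for n in nodes if n["name"].startswith(name_prefix)]


def run_everywhere(hosts, command, run_on_host, max_workers=8):
    """Run `command` on every host in parallel and map host -> output."""
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outputs = list(pool.map(lambda host: run_on_host(host, command), hosts))
    return dict(zip(hosts, outputs))


def fake_runner(host, command):
    """Stand-in for a real SSH call; just reports what would be executed."""
    return f"[{host}] would run: {command}"


targets = search(NODES, "aws-genomicsDemo")
results = run_everywhere(targets, "sudo chef-client", fake_runner)
for output in results.values():
    print(output)
```

Swap `fake_runner` for something that really opens an SSH session and the rest stays the same.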

Let’s install some genomics tools
•  Our Maq short read assembler cookbook:
•  Installs all dependencies (compilers, etc.)
•  Puts application source on node
•  Builds maq from source
•  Installs it

$ knife node \
 run_list add \
 aws-genomicsDemo \
 'recipe[maq]'

It really is that easy.

MIT StarCluster

MIT StarCluster
•  http://web.mit.edu/stardev/cluster
•  Ready to use Linux compute farm on AWS
•  Grid Engine, MPI, NFS filesystems
•  Libraries, tools, applications
•  Easy to use, easy to extend
•  Integrates well with Chef

If you have not built Linux clusters from scratch before …

It’s hard to really appreciate everything that StarCluster does behind the scenes.

MIT StarCluster – More Info
•  Live demo (time permitting)
•  StarCluster & Spot Instances Screencast
•  http://biote.am/6c
•  http://aws.amazon.com/ec2/spot-and-science/

Phew. That’s a lot of slides.

Time to explore the demos?

Questions?

Thanks! Related talk slides: http://biote.am/6a

“Mapping Informatics to the Cloud”
