DESCRIPTION
A presentation given at the 2011 Amazon AWS Genomics meeting held in Seattle, WA. This is a 30-minute talk I gave focusing mainly on practical tools, tips and methods for bootstrapping and orchestration on the cloud. Covers examples of: Ubuntu Cloud Init, AWS CloudFormation, Opscode Chef and MIT StarCluster.
Practical Cloud & Workflow Orchestration
2011 Amazon Genomics Event Chris Dagdigian
I’m Chris. I’m an infrastructure geek. I work for the BioTeam.
Twitter: @chris_dag
Disclaimer.
I’m not an Amazon shill.
Really.
The IaaS competition just can’t compete.
AWS lets me build useful stuff.
When stuff gets built, I get paid.
Installing VMware & excreting a press release does not turn a company into a cloud provider.
I need more than just virtual compute and block storage. AWS has tons of glue and many useful IaaS building blocks.
IaaS competitors lag far behind in features and service offerings.
Speaking of pretenders…
No APIs? Not a cloud.
No self-service? Not a cloud.
I have to email a human? Not a cloud.
50% failure rate on server launch? Lame cloud.
Virtual servers & block storage only? Barely a cloud.
I’m getting insufferable, huh? Moving on …
Three Topics Today.
Time, Laziness & Beauty.
image: shanelin via flickr
Tick … Tick Tick…
User expectations are changing.
Automated provisioning can shrink the time between “I want to do some science” & “I’m ready to do some science”.
However…
If servers, storage and systems can be deployed in minutes …
… why does it still take days, several helpdesk tickets & a team of humans to load software and configure my systems to actually do science?
It shouldn’t.
If provisioning gets faster, configuration management also needs to keep pace.
Laziness.
Larry Wall’s 1st Great Virtue
“… the quality that makes you go to great effort to reduce overall energy expenditure. It makes you write labor-saving programs that other people will find useful, and document what you wrote so you don't have to answer so many questions about it.”
It’s all scriptable.
• Servers
• Storage
• Network
• Bootstrapping
• Provisioning
• Configuration
• Management
• Monitoring
• Scaling
• Accounting & audit trails
Not hype. Real.
I can do it from my iPad.
No cubicle required.
Our research IT infrastructures can now be 100% virtual and 100% scriptable
And it’s pretty easy to understand.
Anyone can drive this stuff.
Especially motivated researchers.
Stuff like this is a big deal.
5GB managed MySQL in the cloud. $.011 / hour
Database Administrator not required.
Automatic patching, backups & clustering
Anyone with a web browser can launch one.
Beauty.
Scriptable infrastructure is just the beginning.
The really cool stuff is what we build on top.
With good tools …
We can orchestrate complex systems, pipelines and workflows.
Orchestrated systems working in concert are a beautiful thing.
Let me show you a few of the tools we like.
Cloud Init
Cloud Init
• https://help.ubuntu.com/community/UEC
• Developed by Ubuntu
• Baked into all Ubuntu UEC releases
• Also baked into Amazon Linux AMIs
• Works on Eucalyptus clouds as well
Cloud Init gives you a hook into freshly booted systems.
It’s a great and easy-to-comprehend way to bootstrap or customize generic server images.
When you launch a server, you can inject a YAML formatted file into the environment.
Cloud init files are parsed and executed right after the node boots for the first time.
You can run scripts, install software, load SSH keys, etc. to ‘bootstrap’ a generic node.
#cloud-config
packages:
 - httpd

runcmd:
 - /etc/init.d/httpd start
 - echo "<h1>Hello Amazon Genomics Event!</h1>" > /var/www/html/index.html
The previous real-world example does this:
1. Download/install Apache web server
2. Turn on the web server
3. Create a cheesy index.html
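The cloud-config above can also be written out and sanity-checked by a small script before launch. This is a minimal sketch, not the script used in the talk: the helper names `write_cloud_config` and `is_cloud_config` are hypothetical, and the only cloud-init behaviour it relies on is that a cloud-config file has to begin with the line `#cloud-config`.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of cloud-init.

# Write the cloud-config from the slide to the path given as $1.
write_cloud_config() {
    cat > "$1" <<'EOF'
#cloud-config
packages:
 - httpd

runcmd:
 - /etc/init.d/httpd start
 - echo "<h1>Hello Amazon Genomics Event!</h1>" > /var/www/html/index.html
EOF
}

# Succeed only if the first line of the file is exactly "#cloud-config".
is_cloud_config() {
    [ "$(head -n 1 "$1")" = "#cloud-config" ]
}

# Example:
#   write_cloud_config ./cloudInit-config.txt
#   is_cloud_config ./cloudInit-config.txt && echo "ready to inject"
```

Checking the header first is cheap insurance: a typo there is easy to miss until the node is already up and unconfigured.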
This is the script I ran moments before this talk …
#!/bin/sh

ec2-run-instances ami-8c1fece5 \
  -n 1 \
  -t m1.small \
  -g dagdemo-SG \
  -k dagdemo-sshkeypair \
  --user-data-file ./cloudInit-config.txt
Important to understand:
• ami-8c1fece5 is a public Amazon Linux AMI
• No web server pre-installed
• Never before been ‘touched’ by me
• Cloud Init does it all via the script I injected at instance launch time
Let’s see if it worked …
Amazon CloudFormation
Amazon CloudFormation
• http://aws.amazon.com/cloudformation/
• AWS specific
• Sweet way to turn on|off entire stacks of related and dependent AWS services
Treat complex infrastructure as a single resource
• Cliché example - In a single “stack” you can define and then start/stop:
  • Elastic database cluster +
  • Elastic webserver cluster +
  • Monitoring & auto-scaling triggers
  • Event & error notification
  • Elastic load balancer
My live demo of CloudFormation
• Using the example WordPress Blog template
• It does a ton of cool stuff:
  • RDS backend for MySQL database, elastic webserver cluster with auto-scaling, security group setup, automatic alarm notices
• It all sits behind an elastic load balancer
My CloudFormation blog demo:
• Actual stack file at http://biote.am/6d
• Check it out …
• JSON formatted but still quite readable
• It lets me define and then control a ton of different related AWS services all at once.
#!/bin/sh
# Launch Stack
cfn-create-stack AWSGenomics-demoStack \
  --template-file cf-wordpress.json.txt
#!/bin/sh
# Check state & status

cfn-describe-stacks AWSGenomics-demoStack
echo ""
cfn-describe-stack-events \
  AWSGenomics-demoStack --headers
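Stack creation is asynchronous, so a script that needs the finished stack usually has to poll for it. Here is a minimal sketch of that loop. `wait_for_output` is a hypothetical helper, not an AWS tool, and it assumes the status command prints a line containing `CREATE_COMPLETE` once the stack is up; the exact output of `cfn-describe-stacks` is not shown in this talk.

```shell
#!/bin/sh
# Sketch only: wait_for_output is a hypothetical helper, not an AWS tool.
# Usage: wait_for_output PATTERN MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until its output contains PATTERN or the tries run out.
wait_for_output() {
    pattern=$1
    tries=$2
    delay=$3
    shift 3
    while [ "$tries" -gt 0 ]; do
        # grep -q succeeds as soon as one output line contains the pattern.
        if "$@" 2>/dev/null | grep -q -e "$pattern"; then
            return 0
        fi
        tries=$((tries - 1))
        # Skip the final sleep once no tries remain.
        if [ "$tries" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Example (assumes the cfn-* tools are installed and configured):
#   wait_for_output CREATE_COMPLETE 60 10 \
#       cfn-describe-stacks AWSGenomics-demoStack
```

Capping the number of tries keeps a failed or rolled-back stack from hanging the script forever.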
10 AWS Services/Resources orchestrated as one.
CloudWatch.
Auto-scaling triggers.
SNS Endpoints for Alarms.
Alarm triggers.
RDS Database & Security Group.
Elastic Load Balancer.
EC2 Security Group.
Cool, huh?
{ in case the demo fails! }
Opscode Chef
Chef enables Infrastructure as Code
It’s freaking awesome.
Chef lets you:
Manage configuration as idempotent Resources.
Group resources as idempotent Recipes.
Group recipes into Roles.
Track it all like Source Code.
Search your infrastructure like a ninja. Ohai!
Configure your systems, software & pipelines
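“Idempotent” is the key word there: applying the same resource twice leaves the system in the same state as applying it once, so re-running is always safe. The sketch below shows the idea in plain shell rather than Chef’s Ruby DSL. `ensure_line` is a hypothetical helper for illustration only, not how Chef is implemented.

```shell
#!/bin/sh
# Sketch only: illustrates idempotence; this is not Chef code.

# Make sure FILE contains LINE as a whole line; do nothing if it already does.
# Usage: ensure_line FILE LINE
ensure_line() {
    file=$1
    line=$2
    # -x: match whole lines, -F: fixed string (no regex), -q: quiet.
    if [ -f "$file" ] && grep -qxF -e "$line" "$file"; then
        return 0    # already in the desired state, nothing to change
    fi
    printf '%s\n' "$line" >> "$file"
}

# Example: running this twice still leaves exactly one copy of the line.
#   ensure_line ./demo.conf "setting=on"
#   ensure_line ./demo.conf "setting=on"
```

The pattern is “check the current state, change it only if needed”, as opposed to a blind append that would pile up duplicates on every run.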
http://www.opscode.com/chef/
• Several flavors
  • Open source
  • Commercial / Managed
  • Commercial / ‘Behind your Firewall’
• No time today for even a short description of how it works. You should check it out.
Chef demo via ‘knife’ command line …
knife ec2 server create \
  -N aws-genomicsDemo \
  -I ami-63be790a \
  -f t1.micro \
  -G default \
  -S bioteam-IAM-admins-v1 \
  -r 'recipe[getting-started]' \
  -i ./bioteam-IAM-admins-v1.pem \
  -x ubuntu
Fully automatic remote bootstrapping …
Done!
Search-driven, parallel remote SSH execution
knife ssh name:aws-genomicsDemo \
  -a cloud.public_hostname \
  -x ubuntu \
  -i bioteam-IAM-admins-v1.pem \
  'sudo chef-client; \
   cat /tmp/chef-getting-started.txt'
Let’s install some genomics tools
• Our Maq short read assembler cookbook:
  • Installs all dependencies (compilers, etc.)
  • Puts application source on node
  • Builds maq from source
  • Installs it
$ knife node \
    run_list add \
    aws-genomicsDemo \
    'recipe[maq]'
It really is that easy.
MIT StarCluster
MIT StarCluster
• http://web.mit.edu/stardev/cluster
• Ready to use Linux compute farm on AWS
• Grid Engine, MPI, NFS filesystems
• Libraries, tools, applications
• Easy to use, easy to extend
• Integrates well with Chef
If you have not built Linux clusters from scratch before …
It’s hard to really appreciate everything that StarCluster does behind the scenes.
MIT StarCluster – More Info
• Live demo (time permitting)
• StarCluster & Spot Instances Screencast
• http://biote.am/6c
• http://aws.amazon.com/ec2/spot-and-science/
Phew. That’s a lot of slides.
Time to explore the demos?
Questions?
Thanks! Related talk slides: http://biote.am/6a
“Mapping Informatics to the Cloud”