What makes AWS invincible? from JAWS Days 2014

What makes AWS invincible?

Haruka Iwao, 2014/03/15

Before talking about AWS

About myself

Haruka Iwao (@Yuryu) DevOps Engineer

at FreakOut, Inc.

Lived in Osaka, Tsukuba, Yokohama. Now in Tokyo.

Playing FFXIV ARR

Me

Final Fantasy XIV ARR status: Cleared the Coil Turn 5. Got my Allagan Weapon.

Kindle Publishing

Publishing Kindle books about the Linux Kernel

Search “Yuryu Linux”

About FreakOut, Inc.

Not about freaking out :p

Advertising company

Established in 2010

“Real-time Bidding”

Real-time bidding

[Diagram: one SSP (Supply-side Platform) and three DSPs (Demand-side Platforms). Steps: request a page, read an ad tag, call for bids.]

DSP decides the best ad for the user and page

Real-time bidding (2)

[Diagram: one SSP (Supply-side Platform) and three DSPs (Demand-side Platforms). Steps: bid, auction, return the winning ad.]

Real-time bidding (3)

http://londoncreative.com/real-time-bidding-spending-to-significantly-increase/

Our motto

50ms or die.

Return a response within 50ms or lose the auction automatically.

Latency matters. Literally.
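The motto can be read as a hard deadline on every bid. The following is a minimal sketch of that idea, not FreakOut's actual bidder: `choose_ad` is a hypothetical callable standing in for the DSP's decision logic, and returning `None` is an assumed way to represent "no bid".

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

DEADLINE_S = 0.050  # the 50 ms budget from the slide

# Shared worker pool; the size is an arbitrary choice for this sketch.
_pool = ThreadPoolExecutor(max_workers=4)


def bid_within_deadline(choose_ad, request, deadline_s=DEADLINE_S):
    """Return choose_ad(request), or None if it misses the deadline.

    choose_ad is a hypothetical function supplied by the caller.
    """
    future = _pool.submit(choose_ad, request)
    try:
        return future.result(timeout=deadline_s)
    except FutureTimeout:
        # Too late: answering now would lose the auction anyway.
        future.cancel()  # has no effect if the call is already running
        return None
```

A late answer is treated the same as no answer, which mirrors "lose the auction automatically".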

How we use AWS

Our system at a glance

http://aws.amazon.com/jp/solutions/case-studies/freakout/

Mix of on-premise and AWS

On-premise in Japan

AWS in North America

Starting small

Scaling well

No need to visit a DC

Latency matters

Latency is important for our service

1ms = 1/50 of processing time

Latency between servers

Freedom to build an arbitrary network

... Gives you an arbitrary latency

Longer latency in AWS

On-premise

time=0.063 ms

time=0.083 ms

time=0.077 ms

time=0.070 ms

time=0.092 ms

time=0.069 ms

time=0.077 ms

AWS, extreme case

time=1.88 ms

time=1.96 ms

time=2.60 ms

time=3.72 ms

time=2.46 ms

time=1.05 ms

time=2.37 ms
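To put the two ping samples above in terms of the 50 ms budget, here is a small sketch that averages each sample and expresses it as a share of the budget. The `round_trips` parameter is an assumption added for illustration: it models a request that needs several sequential server-to-server round trips.

```python
from statistics import mean

# RTT samples copied from the two ping listings above, in milliseconds.
ON_PREMISE_MS = [0.063, 0.083, 0.077, 0.070, 0.092, 0.069, 0.077]
AWS_MS = [1.88, 1.96, 2.60, 3.72, 2.46, 1.05, 2.37]

BUDGET_MS = 50.0  # response deadline from the motto


def budget_share(rtts_ms, round_trips=1, budget_ms=BUDGET_MS):
    """Fraction of the budget spent on network round trips alone."""
    return mean(rtts_ms) * round_trips / budget_ms


for label, sample in (("On-premise", ON_PREMISE_MS), ("AWS", AWS_MS)):
    print(f"{label}: mean RTT {mean(sample):.3f} ms, "
          f"{budget_share(sample):.2%} of the budget per round trip")
```

With these numbers, ten sequential round trips on the AWS sample would take roughly 46% of the budget, while the on-premise sample would stay under 2%.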

Longer latency in AWS (2)

Hard to see? Let’s make a graph...

Longer latency, illustrated

[Bar chart: RTT (ms), on-premise vs. AWS; axis from 0 to 2.5 ms]

Longer latency in AWS (3)

This is not always true

Just an extreme case

This applies to intra-AZ

An “option” to group servers in nearby racks would be great

Placement groups

Placement groups are not enough

Only available to cluster compute instances

Guarantees bandwidth, not latency

Possible workarounds

Assume the latency

Design your app accordingly

Use persistent connections

Put hot data on local

Still, lower latency gives “extra” room
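One way to read "put hot data on local" is an in-process cache in front of a remote lookup, so repeated reads skip the network round trip. The sketch below is an interpretation under that assumption; `LocalCache`, `fetch`, and the TTL value are hypothetical names and settings, not something shown in the talk.

```python
import time


class LocalCache:
    """Keep recently fetched values in local memory for ttl_s seconds."""

    def __init__(self, fetch, ttl_s=60.0, clock=time.monotonic):
        self._fetch = fetch      # hypothetical remote lookup: key -> value
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries = {}       # key -> (expiry time, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]      # local hit: no network round trip
        value = self._fetch(key)  # miss or expired: pay the latency once
        self._entries[key] = (now + self._ttl_s, value)
        return value
```

The trade-off is staleness: a shorter TTL keeps data fresher but pays the round trip more often.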

Infrastructure as Code

The “Awesome” Console

... So awesome that it is easy to make mistakes...

AWS is Programmable.

Thou hast SDK.

Python

Thou hast CLI.

CLI

Thou hast CloudFormation.

AWS CloudFormation

SDK + CLI + CloudFormation

You can “code” your infrastructure

Infrastructure becomes “reproducible” and “reusable”
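As an illustration of "coding" infrastructure with the Python SDK, here is a minimal sketch. It assumes boto3, the current AWS SDK for Python, which postdates this talk; the slides do not show which library was used. The client is passed in by the caller so the parameter-building code stays reviewable and reusable, and all IDs are placeholders.

```python
def run_instance_params(image_id, instance_type, key_name,
                        security_group_ids, subnet_id):
    """Build keyword arguments for the EC2 RunInstances call."""
    return {
        "ImageId": image_id,
        "MinCount": 1,
        "MaxCount": 1,
        "InstanceType": instance_type,
        "KeyName": key_name,
        "SecurityGroupIds": list(security_group_ids),
        "SubnetId": subnet_id,
    }


def launch(ec2_client, params):
    """Launch instances and return their IDs.

    ec2_client is assumed to be a boto3 EC2 client, for example
    boto3.client("ec2"); it is injected rather than created here.
    """
    response = ec2_client.run_instances(**params)
    return [instance["InstanceId"] for instance in response["Instances"]]
```

Because the parameters are plain data, they can be reviewed, stored in version control, and replayed against another account or region.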

Always use CLI

Always use CLI to make changes

“Review” the commands

Less chance of “oops”

But...

CLI is hard to understand!

VS

aws ec2 run-instances --image-id ami-xxxxxxxx --count 1 --instance-type t1.micro --key-name MyKeyPair --security-group-ids sg-xxxxxxxx --subnet-id subnet-xxxxxxxx

Easy enough?

No way...

Record & Play

“Record” instructions on the Web Console

“Playback” them using CLI

In other words...

Converted to

aws ec2 run-instances --image-id ami-xxxxxxxx --count 1 --instance-type t1.micro --key-name MyKeyPair --security-group-ids sg-xxxxxxxx --subnet-id subnet-xxxxxxxx

With “playback”

You could review changes beforehand

You could record changes and reuse them

Easier than writing CLI commands by hand
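The "playback" feature is a wish, not an existing tool, so the following is only a sketch of what the conversion step might look like. The recorded-action format (a service name, an action name, and a dict of options) is hypothetical; the output is the same `aws ec2 run-instances` command shown above.

```python
import shlex


def to_cli(service, action, params):
    """Render a recorded action as an AWS CLI command line.

    params maps option names (without the leading dashes) to a value
    or a list of values; this record format is an assumption.
    """
    parts = ["aws", service, action]
    for name, value in params.items():
        parts.append("--" + name)
        values = value if isinstance(value, (list, tuple)) else [value]
        parts.extend(str(v) for v in values)
    # shlex.join quotes any argument that needs it for a POSIX shell.
    return shlex.join(parts)


recorded = {
    "image-id": "ami-xxxxxxxx",
    "count": 1,
    "instance-type": "t1.micro",
    "key-name": "MyKeyPair",
    "security-group-ids": ["sg-xxxxxxxx"],
    "subnet-id": "subnet-xxxxxxxx",
}
print(to_cli("ec2", "run-instances", recorded))
```

The generated text can then be reviewed and stored before anyone runs it.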

A very famous quote about “code”

All your code are belong to test

Testing is Important

Every program has bugs

“Infrastructure as Code” is no exception

How do you test?

Bugs can be fatal

A bug can destroy your whole system

What if you accidentally

Terminate an instance

Set a wrong route table

Delete RR from Route53

“Sandbox” for testing

VPC is (sometimes) not enough

Test 100% of the bootstrap in a safe environment

Register IAM accounts

Add Route53 zones

Set up S3 buckets, etc…

Framework for testing

Test-kitchen to test your Chef cookbooks

Serverspec to test your server setups

How do you verify your changes to AWS?

Possible workarounds

Use a separate account

Maybe we need more environments in the future?

Costs money

CloudFormer converts environments to configuration

Scenario #1

You add a new rule to your security group

aws ec2 authorize-security-

You want to make sure a port is open or closed between particular hosts

How?

Workaround #1

Create a new VPC

Apply the new rule

Launch two instances

Check connectivity
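The final "check connectivity" step could be automated with a plain TCP connection attempt run from one test instance against the other. This is a minimal sketch; the function name and timeout are illustrative choices.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections and timeouts. A port blocked by a
        # security group usually times out, because packets are dropped
        # silently rather than rejected.
        return False
```

A test would assert `is_port_open(...)` for ports the new rule should allow and `not is_port_open(...)` for ports it should keep closed.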

Scenario #2

You set up Route53 Health Checks

Now you want to test whether it actually fails over

How?

Workaround #2

Set up two ELBs / instances

Stop the instances registered to one ELB

Query R53 until it fails over
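The last step, querying until the record fails over, can be sketched as a polling loop. The version below assumes the primary's addresses are known in advance and treats failover as "the name no longer resolves to any of them". The resolver, sleep, and clock are injectable so the loop can be exercised without real DNS; all names here are illustrative.

```python
import socket
import time


def resolve_ipv4(name):
    """Resolve name to a set of IPv4 addresses via the OS resolver."""
    return set(socket.gethostbyname_ex(name)[2])


def wait_for_failover(name, primary_addrs, timeout_s=300.0, interval_s=5.0,
                      resolve=resolve_ipv4, sleep=time.sleep,
                      clock=time.monotonic):
    """Poll until name stops resolving to the primary addresses.

    Returns the new address set, or None if the timeout expires first.
    """
    primary = set(primary_addrs)
    deadline = clock() + timeout_s
    while True:
        addrs = resolve(name)
        if addrs and not (addrs & primary):
            return addrs
        if clock() >= deadline:
            return None
        sleep(interval_s)
```

Note that the OS resolver may serve cached answers, so the observed failover time includes the record's TTL as well as the health-check interval.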

Need a solution!

A “common language” to verify AWS configuration

Want to run tests cheaper, quicker and safer

Even the requirements are not yet clear…

At the end of the presentation…

What makes AWS invincible?

Lower latency

Giving options or hints to EC2

“Playback” feature

Generate CLI commands using a simple UI

Testing methodology

Thank you!