© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

Why Scale Matters & How the Cloud Really is Different

James Hamilton, AWS VP & Distinguished Engineer

SPOT205: November 23, 2013


Page 2

Agenda

Redefining Scale at AWS

AWS Designed Hardware & Infrastructure

Multi-AZ Design Point & Why it Works

Page 3

Perspective on Scaling

On average, AWS adds enough new server capacity every day to support Amazon’s global infrastructure when it was a $7B business (2004).

Page 4

AWS Global Infrastructure

9 regions

25 availability zones

42 edge locations

Page 5

Amazon S3 Growth

Total Number of S3 Objects

Q4 2006: 2.9 Billion
Q4 2007: 14 Billion
Q4 2008: 40 Billion
Q4 2009: 102 Billion
Q4 2010: 262 Billion
Q4 2011: 762 Billion
Q4 2012: >1.7 Trillion
Q4 2013: >3 Trillion

Peak requests (chart callouts): 1.5M/sec; 2,000,000+ per second
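As a rough illustration of the growth rate this chart implies, the sketch below computes a compound annual growth factor from the smallest and largest object counts on the slide. It assumes that 2.9 billion corresponds to Q4 2006 and that 3 trillion (a lower bound, since the slide says ">3 Trillion") corresponds to Q4 2013. The function name is made up for this example.

```python
def annual_growth_factor(start_count: float, end_count: float, years: int) -> float:
    """Compound annual growth factor between two counts that are `years` apart."""
    if start_count <= 0 or years <= 0:
        raise ValueError("start_count and years must be positive")
    return (end_count / start_count) ** (1.0 / years)


# Assumed endpoints read off the slide: Q4 2006 and Q4 2013 (7 years apart).
factor = annual_growth_factor(2.9e9, 3e12, 7)
print(f"objects grew roughly {factor:.1f}x per year")
```

Under these assumptions the object count grew by a factor of roughly 2.7 every year, which is a lower bound because the final figure is itself a lower bound.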

Page 6

DynamoDB Requests Served/Month

Page 7

DynamoDB: Consistent Performance at Scale

Page 8

“AWS is the overwhelming market share leader, with more than five times the compute capacity in use than the aggregate total of the other fourteen providers.”

Page 9

Agenda

Redefining Scale at AWS

AWS Designed Hardware & Infrastructure

Multi-AZ Design Point & Why it Works

Page 10

Pace of Innovation

Infrastructure pace of innovation increasing

– Driven by cloud service providers and high-scale internet applications

– Cost of datacenter and H/W infrastructure dominates

– Infrastructure more than just a cost center

High focus on innovation

– Driving down cost

– Increasing aggregate reliability

– Reducing resource consumption footprint

Page 11

AWS Custom Server Designs

OEM Server Ecosystem

– Optimized for 10s to 100s of thousands of customers

– Broadly applicable servers can run a variety of workloads

Cloud Server Ecosystem

– Optimized for single customer

– Highly specialized servers optimized for specific workload

– Large scale deployments allow hardware specialization

– Move hot s/w kernels to hardware implementations

– Datacenters, servers, networking, storage all designed to an integrated spec.

Page 12

AWS Custom Storage Designs

Commercial high-density storage:

• Quanta M4600H 4U Disk Enclosure

• Impressive best in class general purpose design

• We use custom design with still higher density

OEM storage & servers must target vast workload diversity

High scale supports AWS-specific optimizations

– More space, power, & cost efficient

Page 13

Networking Equipment

• Relative cost of networking increasing quickly

• Profit margins high

• Ecosystem vertically integrated

[Chart: Monthly Costs, with 3 year server & 10 year infrastructure amortization; callout: 8%]
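The amortization assumption behind the chart can be made concrete with straight-line amortization. This is a minimal sketch: the dollar figures are hypothetical placeholders chosen for easy arithmetic, not AWS numbers, and the model ignores cost of money.

```python
def monthly_amortized_cost(purchase_cost: float, lifetime_years: int) -> float:
    """Straight-line monthly cost of an asset written off over `lifetime_years`."""
    if lifetime_years <= 0:
        raise ValueError("lifetime_years must be positive")
    return purchase_cost / (lifetime_years * 12)


# Hypothetical capital outlays (not from the slide).
server_cost = 3_600_000
infrastructure_cost = 12_000_000

servers_per_month = monthly_amortized_cost(server_cost, 3)          # 3 year amortization
infra_per_month = monthly_amortized_cost(infrastructure_cost, 10)   # 10 year amortization
print(f"servers: ${servers_per_month:,.0f}/month")
print(f"infrastructure: ${infra_per_month:,.0f}/month")
```

Because servers are written off over 3 years and infrastructure over 10, each dollar spent on servers weighs about 3.3 times as heavily on the monthly bill as a dollar spent on infrastructure.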

Page 14

Get the Network Out of the Way

Current Networks Over-Subscribed

Mainframe Model Goes Commodity

• Forces workload placement restrictions

• Goal: Make all points in datacenter equidistant

• Amazon custom routers & protocol stacks

Page 15

Power Infrastructure

Negotiated power purchasing agreements

AWS custom high-voltage sub-stations in some regions

– Lower power cost

– Build faster

Page 16

Super Bowl Power Outage

34 minute outage that very nearly changed the 2013 game

“A piece of equipment that was designed to monitor electrical load sensed an abnormality in the system. The equipment operated as designed and opened a breaker that partially cut power to the Superdome in order to isolate the issue. Backup generators kicked in immediately as designed.”

Lights without immediate backup power

– Restarting gas discharge lights takes 15+ min

Highly likely backup power wouldn’t have helped

– Switchgear lockout

We design & deploy custom switch firmware

Page 17

Carbon Neutral Power Choice

Most companies rarely build new datacenters so there are few new power procurement options

The entire multi-datacenter US-WEST (Oregon) region is 100% carbon neutral

One of the largest AWS regions world-wide

– And, by far, the fastest growing

Page 18

Procurement & Supply Chain Optimization

Procurement

Global demand allows purchasing power at volume

Direct component purchasing

– Precise inventory control

– Better pricing

– Optimized designs

Supply Chain

Demand-driven supply chain

Shorter cycle time drives higher utilization

– Predicting next week easier than 4 to 6 months out

Less overbuy & less capacity risk yielding lower costs

Page 19

Utilization & Economics

Server Utilization Problem

On-premises, 30% utilization is VERY good & 10% to 20% is more common

Solution: Pool a number of heterogeneous services

Pay as You Go

Pay as You Grow

Don’t block the business

Don’t over-buy

Transfers capital expense to variable expense

Apply capital for business investments rather than infrastructure

Chargeback Models Drive Good Behavior

Cost encourages prioritization of work by application developers

High scale needed to make a spot market for low priority work
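The pooling argument on this slide can be sketched numerically. The two hourly demand profiles below are hypothetical and deliberately exaggerated (they peak at opposite times); real workloads only need to be non-correlated for the effect to appear, though it will be smaller.

```python
def fleet_utilization(demand_by_hour: list[int], capacity: int) -> float:
    """Average fraction of `capacity` in use over the sampled hours."""
    return sum(demand_by_hour) / (capacity * len(demand_by_hour))


# Hypothetical demand, in servers, sampled at six points in a day.
daytime_service = [10, 10, 80, 100, 80, 10]
nightly_batch = [90, 100, 20, 10, 20, 90]

# Separate fleets: each must be sized for its own peak.
separate_capacity = max(daytime_service) + max(nightly_batch)

# Pooled fleet: sized for the peak of the combined demand.
pooled_demand = [a + b for a, b in zip(daytime_service, nightly_batch)]
pooled_capacity = max(pooled_demand)

separate_util = fleet_utilization(pooled_demand, separate_capacity)
pooled_util = fleet_utilization(pooled_demand, pooled_capacity)
print(f"separate fleets: {separate_capacity} servers, {separate_util:.0%} utilized")
print(f"pooled fleet:    {pooled_capacity} servers, {pooled_util:.0%} utilized")
```

The same total work is served in both cases; the pooled fleet simply needs fewer servers because the peaks do not coincide.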

Page 20

Amazon Cycle of Innovation

15+ years of operational excellence

[Cycle diagram labels: Reduce Prices, Innovate, Listen to Customers, Lower Costs, Improve Processes, Re-invest in Features]

38 AWS price reductions since 2006

Page 21

AWS Pace of Innovation

New Service Announcements & Updates

[Chart callout: 235]

Page 22

Agenda

Redefining Scale at AWS

AWS Designed Hardware & Infrastructure

Multi-AZ Design Point & Why it Works

Page 23

Conventional Design: Cross-Region Replication

5th app availability “9” only via multi-datacenter replication

Conventional approach:

– Two datacenters in distant locations

– Replicate all data to both datacenters

The industry-wide dominant multi-DC availability approach

– Looks rock solid but performs remarkably poorly in practice

Acid Test: Are you willing to pull the plug on the primary server?

99.999%
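To put the fifth "9" in perspective, the sketch below converts an availability target into the downtime it allows per year. It is plain arithmetic; the function name and the choice of an average 365.25-day year are assumptions made for this example.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year, including leap days


def downtime_minutes_per_year(availability: float) -> float:
    """Minutes of downtime per year permitted by an availability fraction."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


for target in (0.999, 0.9999, 0.99999):
    minutes = downtime_minutes_per_year(target)
    print(f"{target:.3%} available -> {minutes:.1f} minutes of downtime per year")
```

At 99.999% the yearly budget is a little over five minutes, which leaves essentially no room for a slow, manual failover decision.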

Page 24

What is wrong with inter-regional replication?

Asynchronous replication between datacenters

– Committing to an SSD: on the order of 1 to 2 msec

– LA to New York: 74 msec round trip

On failure, a difficult & high skill decision:

– Fail-over & lose transactions, or

– Don’t fail-over & lose availability

I’ve been on these calls in the past

– No win situation

– Very hard to get right
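The two latency figures on this slide explain why cross-country replication ends up asynchronous. The sketch below uses a deliberately simple model in which a synchronous commit costs one round trip to the replica plus the replica's own local commit. That model and the function name are assumptions for illustration; only the 2 msec and 74 msec figures come from the slide.

```python
SSD_COMMIT_MS = 2.0          # slide: committing to an SSD takes 1 to 2 msec
CROSS_COUNTRY_RTT_MS = 74.0  # slide: LA to New York round trip


def sync_commit_latency_ms(local_commit_ms: float, replica_rtt_ms: float) -> float:
    """Latency of one commit that waits for a remote replica to acknowledge.

    Assumed model: the record travels to the replica, the replica commits it
    locally, and the acknowledgement travels back.
    """
    return replica_rtt_ms + local_commit_ms


local_only = SSD_COMMIT_MS
cross_country = sync_commit_latency_ms(SSD_COMMIT_MS, CROSS_COUNTRY_RTT_MS)
print(f"local commit: {local_only:.0f} ms")
print(f"synchronous cross-country commit: {cross_country:.0f} ms "
      f"({cross_country / local_only:.0f}x slower)")
```

A slowdown of this size on every commit is rarely acceptable, so the replica is updated asynchronously instead, and whatever has not yet reached it is what gets lost on failover.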

Page 25

What Else is Wrong with X-Country Replication?

Fragile: Active/Passive Doesn’t Work

– Failover to a system that hasn’t been taking operational load

– Passive secondary not recently tested

– Secondary config or S/W version different, incorrect load balancer config, incorrect network ACLs, latent hardware problem, router problem, resource shortage under load

– Can’t test without negative customer impact

– If you don’t test it, it won’t work

2-Way Redundancy Expensive:

– More than ½ capacity reserved to handle failure

– 3 datacenters much less expensive but impractical w/o high scale
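The cost comparison between two and three datacenters follows from simple arithmetic. The sketch assumes load is spread evenly and that the surviving sites must absorb the full peak load after losing any one site; the function names are made up for this example.

```python
from fractions import Fraction


def provisioning_factor(datacenters: int) -> Fraction:
    """Total capacity to buy, as a multiple of peak load, to survive one site loss."""
    if datacenters < 2:
        raise ValueError("need at least 2 datacenters to tolerate a failure")
    # The remaining (n - 1) sites must carry 100% of the load between them.
    return Fraction(datacenters, datacenters - 1)


def reserved_fraction(datacenters: int) -> Fraction:
    """Share of total purchased capacity that sits idle in normal operation."""
    return 1 - 1 / provisioning_factor(datacenters)


for n in (2, 3, 4):
    print(f"{n} datacenters: buy {float(provisioning_factor(n)):.2f}x peak load, "
          f"{float(reserved_fraction(n)):.0%} held in reserve")
```

In this model two sites reserve exactly half of all capacity, a lower bound that real deployments exceed once headroom is added, while three sites reserve a third.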

Page 26

AWS Multi-Availability Zone Model

Choose Region to be close to user, close to data, or meeting jurisdictional requirements

Synchronous replication to 2 (or better 3) Availability Zones

– Easy when less than 2 to 3 msec away

– Can failover w/o customer impact

ELB over EC2 instances in different AZs

Stateless EC2 apps easy

For persistent state use

– DynamoDB

– Simple Storage Service

– Multi-AZ RDS
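One way to picture why synchronous replication allows failover without customer impact is a toy in-memory model. This is only a sketch: it is not how DynamoDB, S3, or RDS are implemented, the class names are hypothetical, and it does not model a failed replica catching up after recovery.

```python
class Replica:
    """One copy of the data, standing in for an Availability Zone."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.log: list[str] = []
        self.alive = True


class SyncReplicatedLog:
    """Acknowledges a write only after every live replica has stored it."""

    def __init__(self, replicas: list[Replica]) -> None:
        self.replicas = replicas

    def write(self, record: str) -> bool:
        live = [r for r in self.replicas if r.alive]
        if not live:
            return False  # nothing can store the record, so do not acknowledge
        for replica in live:
            replica.log.append(record)
        return True

    def read_all(self) -> list[str]:
        for replica in self.replicas:
            if replica.alive:
                return list(replica.log)
        raise RuntimeError("no live replica")


zones = [Replica("az-a"), Replica("az-b"), Replica("az-c")]
store = SyncReplicatedLog(zones)
store.write("order-1")
store.write("order-2")
zones[0].alive = False      # lose an entire zone
print(store.read_all())     # both acknowledged writes are still readable
```

Because every acknowledged write already exists in the surviving zones, failing over involves no trade-off between availability and lost transactions.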

Page 27

New Research: Customers Improve Availability by Migrating Apps to AWS

32% reduction in total application downtime

2013 AWS Customer Survey

Research Note: Benchmarking availability and reliability in the cloud: Amazon Web Services. Nucleus Research, November 2013, Document N168

Page 28

Is Hosting On-premises Less Expensive?

Utilization fundamentally higher in cloud

– Aggregating non-correlated workloads, scale, spot market

Amazon specific H/W designs

– ODM acquisition of custom servers & net gear

– Direct purchasing of disk, memory, & CPU

– AWS controlled hypervisor & net protocol layers

Deep R&D: Many new data centers built each year

Immense scale

– Volume purchasing, highly automated, specialists in all areas

Amazon margins are tiny compared to enterprise margins

Page 29

Summary

AWS Economics driven by scale & singular focus

– Economies of scale

– Increased availability through multiple-datacenter deployment

– Steadily declining price

Mega-scale advantages available to all customers regardless of size

– Datacenter presence near all customers world-wide

– Multiple datacenters in each region for high availability

– Deeper R&D investment & operational focus in datacenter, server, storage, & networking than any IT organization in the world

– Buying power that rivals the biggest in the world

Cloud Model Fundamentally different from the last 30 years

– Even if rebranded as “cloud enabled”, “private cloud”, “cloud-like”
