Optimizing for Cost in the Cloud
Jinesh Varia
@jinman
Technology Evangelist
Multiple dimensions of optimizations
Cost, Performance, Response time, Time to market, High-availability, Scalability, Security, Manageability, ...
Optimizing for Cost
When you turn off your cloud resources, you actually stop paying for them
Continuous optimization in your architecture results in recurring savings in your next month’s bill
Elasticity is one of the fundamental properties of the cloud that drives many of its economic benefits
#1 Use only what you need (use Auto Scaling Service, modify–db)
Optimizing for Cost…
Turn off what you don’t need (automatically)
[Chart: Daily CPU Load; x-axis: Hour (1 to 24), y-axis: Load (0 to 14)]
25% Savings
Optimize by the time of day
[Diagram: web application architecture. Amazon Route 53 (DNS) resolves www.MyWebSite.com (dynamic data) and media.MyWebSite.com (static data). Components: Elastic Load Balancer; Auto Scaling group : Web Tier and Auto Scaling group : App Tier on Amazon EC2, spanning Availability Zone #1 and Availability Zone #2; Amazon RDS; Amazon S3; Amazon CloudFront]
[Chart: Web Servers per Week (weeks 1 to 49)]
Optimize during a year
50% Savings
Auto scaling : Types of Scaling
Scaling by Schedule
• Use Scheduled Actions in Auto Scaling Service
  • Date
  • Time
  • Min and Max of Auto Scaling Group Size
• You can create up to 125 actions, scheduled up to 31 days into the future, for each of your auto scaling groups. This gives you the ability to scale up to four times a day for a month.
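The limits quoted above (125 actions, 31 days ahead) work out to about four scheduled changes per day. A minimal sketch of planning such a schedule in plain Python, with no AWS calls; the daily profile (hours and group sizes) and the function name are hypothetical values chosen for illustration.

```python
from datetime import datetime, timedelta

MAX_ACTIONS = 125     # per-group limit quoted on the slide
MAX_DAYS_AHEAD = 31   # scheduling horizon quoted on the slide

# Hypothetical daily profile: (hour of day, min group size, max group size).
DAILY_PROFILE = [(6, 2, 4), (9, 4, 12), (18, 2, 6), (23, 1, 2)]

def plan_scheduled_actions(start, days, profile=DAILY_PROFILE):
    """Return one planned action (a plain dict) per profile entry per day."""
    if days > MAX_DAYS_AHEAD:
        raise ValueError("cannot schedule more than 31 days ahead")
    actions = []
    for day in range(days):
        for hour, min_size, max_size in profile:
            when = start + timedelta(days=day, hours=hour)
            actions.append({"time": when, "min_size": min_size, "max_size": max_size})
    if len(actions) > MAX_ACTIONS:
        raise ValueError("more than 125 scheduled actions for one group")
    return actions

plan = plan_scheduled_actions(datetime(2013, 1, 1), days=31)
print(len(plan))  # 4 actions per day * 31 days = 124, within the limit
```

A fifth daily change would produce 155 actions and trip the limit check, which is why the slide speaks of four changes a day.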
Scaling by Policy
• Scaling up Policy - Double the group size
• Scaling down Policy - Decrement by 1
Auto scaling Best Practices
Use Auto Scaling Tags
Use Auto scaling Alarms and Email Notifications
Scale up and down symmetrically
Scale up quickly and scale down slowly
Auto Scaling across Availability Zones
Leverage Suspend and Resume Processes
Example: Scale up by 10% if CPU utilization is greater than 60% for 5 minutes; scale down by 10% if CPU utilization is less than 30% for 20 minutes.
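The example rule can be expressed as a small decision function. This is a sketch of the logic only, not the Auto Scaling API: the function name, the per-minute CPU history format, and the group size bounds are assumptions made for illustration.

```python
def desired_capacity(current, cpu_history, min_size=1, max_size=100):
    """Apply the example rule to per-minute average CPU readings (most recent last).

    Scale up by 10% after 5 minutes above 60% CPU;
    scale down by 10% after 20 minutes below 30% CPU.
    """
    # 10% of the group, rounded up, and never less than one instance.
    step = max(1, (current * 10 + 99) // 100)

    last_5 = cpu_history[-5:]
    last_20 = cpu_history[-20:]

    if len(last_5) == 5 and all(cpu > 60 for cpu in last_5):
        return min(max_size, current + step)
    if len(last_20) == 20 and all(cpu < 30 for cpu in last_20):
        return max(min_size, current - step)
    return current

print(desired_capacity(20, [70] * 5))   # sustained high load: 22
print(desired_capacity(20, [25] * 20))  # sustained low load: 18
print(desired_capacity(20, [25] * 5))   # only 5 quiet minutes: stays at 20
```

The longer window for scaling down is what implements "scale up quickly and scale down slowly": a short lull does not shrink the group.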
[Chart: Agg. CPU and Instances]
[Chart: RDS DB Servers by Days of the Month (days 1 to 29)]
75% Savings
Optimize during a month
End of the month processing
Expand the cluster at the end of the month
• Expand/Shrink feature in Amazon Elastic MapReduce
Vertically scale up at the end of the month
• Modify-DB-Instance (in Amazon RDS) (or a new RDS DB Instance)
• CloudFormation Script (in Amazon EC2)
Tip: Use “Reminder scripts”
Disassociate your unused EIPs
Delete unassociated EBS volumes
Delete older EBS snapshots
Leverage S3 Object Expiration
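A reminder script along these lines could be sketched as follows. It works on plain inventory records rather than calling AWS; the record fields (`address`, `instance_id`, `attached_to`, `created`) and the 90-day snapshot age are hypothetical, and in practice the inventory would come from the EC2 API.

```python
from datetime import datetime, timedelta

def cleanup_reminders(eips, volumes, snapshots, now, max_snapshot_age_days=90):
    """Return human-readable reminders for resources that still cost money."""
    reminders = []
    for eip in eips:
        if eip.get("instance_id") is None:
            reminders.append(f"Unused EIP: {eip['address']}")
    for volume in volumes:
        if not volume.get("attached_to"):
            reminders.append(f"Unattached EBS volume: {volume['id']}")
    cutoff = now - timedelta(days=max_snapshot_age_days)
    for snapshot in snapshots:
        if snapshot["created"] < cutoff:
            reminders.append(f"Old EBS snapshot: {snapshot['id']}")
    return reminders

reminders = cleanup_reminders(
    eips=[{"address": "203.0.113.10", "instance_id": None},
          {"address": "203.0.113.11", "instance_id": "i-web1"}],
    volumes=[{"id": "vol-a", "attached_to": "i-web1"},
             {"id": "vol-b", "attached_to": None}],
    snapshots=[{"id": "snap-old", "created": datetime(2012, 1, 1)},
               {"id": "snap-new", "created": datetime(2013, 5, 20)}],
    now=datetime(2013, 6, 1),
)
for line in reminders:
    print(line)
```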
AWS Support – Trusted Advisor – Your personal cloud assistant
Tip – Instance Optimizer
[Diagram: an Instance PUTs Custom Metrics (Free Memory, Free CPU, Free HDD) at 1-min intervals for 2 weeks; Amazon CloudWatch Alarm]
“You could save a bunch of money by switching to a small instance, Click on CloudFormation Script to Save”
$$$ in Savings
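The check behind such a recommendation might look like the sketch below. It assumes each sample is a tuple of free CPU, free memory, and free disk percentages collected once a minute; the 50% headroom threshold and the function name are hypothetical and not part of any AWS service.

```python
TWO_WEEKS_OF_MINUTES = 14 * 24 * 60  # one sample per minute for 2 weeks

def could_downsize(samples, headroom_pct=50.0, min_samples=TWO_WEEKS_OF_MINUTES):
    """Return True if every sample shows at least headroom_pct free on all metrics.

    samples: list of (free_cpu_pct, free_mem_pct, free_disk_pct) tuples.
    """
    if len(samples) < min_samples:
        return False  # not enough history to make a recommendation
    return all(value >= headroom_pct for sample in samples for value in sample)

idle = [(80.0, 70.0, 90.0)] * TWO_WEEKS_OF_MINUTES
busy = idle[:-1] + [(10.0, 70.0, 90.0)]  # one CPU spike in the window

print(could_downsize(idle))  # True
print(could_downsize(busy))  # False
```

Requiring headroom on every sample is deliberately conservative; a percentile-based rule would tolerate brief spikes.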
#1 Use only what you need (use Auto Scaling Service, modify–db)
#2 Invest time in Reserved Pricing analysis (EC2, RDS)
Optimizing for Cost…
Save more when you reserve
On-demand Instances
• Pay as you go
• Starts from $0.02/Hour
Reserved Instances
• One time low upfront fee + Pay as you go
• $23 for 1 year term and $0.01/Hour
1-year and 3-year terms
Heavy Utilization RI
Medium Utilization RI
Light Utilization RI
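Using the two example prices above ($0.02/hour On-Demand versus $23 upfront plus $0.01/hour Reserved for a 1-year term), a short calculation shows where reserving starts to pay off. The prices are the slide's illustrative "starts from" figures, and the function names are made up for this sketch.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours in a 1-year term

def on_demand_cost(hours, hourly=0.02):
    return hours * hourly

def reserved_cost(hours, upfront=23.0, hourly=0.01):
    return upfront + hours * hourly

def break_even_hours(upfront=23.0, on_demand_hourly=0.02, reserved_hourly=0.01):
    """Hours of use at which the upfront fee is recovered by the lower hourly rate."""
    return upfront / (on_demand_hourly - reserved_hourly)

hours = break_even_hours()
utilization = hours / HOURS_PER_YEAR
full_year_savings = 1 - reserved_cost(HOURS_PER_YEAR) / on_demand_cost(HOURS_PER_YEAR)

print(f"Break-even after {hours:.0f} hours ({utilization:.0%} of the year)")
print(f"Savings when running all year: {full_year_savings:.0%}")
```

With these numbers the reservation breaks even at about 2,300 hours, roughly a quarter of the year; below that, On-Demand is cheaper.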
The Total Cost Of (Non) Ownership in the Cloud Whitepaper (New!)
Whitepaper: http://bit.ly/aws-tco-webapps
Web Application Usage Patterns
Steady State Usage Pattern
(Example: Corporate Website)
Spiky Predictable Usage Pattern
(Example: Marketing Promotions Website)
Uncertain and unpredictable Usage Pattern
(Example: Social game or Mobile Website)
Example: TCO of a 3-tier Web Application
Utilization | Sweet Spot            | Feature                                        | Savings over On-Demand
<10%        | On-Demand             | No Upfront Commitment                          |
10% - 40%   | Light Utilization RI  | Ideal for Disaster Recovery                    | Up to 56% (3-Year)
40% - 75%   | Medium Utilization RI | Standard Reserved Capacity                     | Up to 66% (3-Year)
>75%        | Heavy Utilization RI  | Lowest Total Cost, Ideal for Baseline Servers  | Up to 71% (3-Year)
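The utilization bands above can be turned into a simple lookup. This is only a sketch of the table; the slide leaves the 40% and 75% boundaries ambiguous, so assigning both to the Medium band is an assumption, as is the function name.

```python
def sweet_spot(utilization_pct):
    """Map expected utilization (0 to 100) to the pricing option from the table."""
    if not 0 <= utilization_pct <= 100:
        raise ValueError("utilization must be between 0 and 100")
    if utilization_pct < 10:
        return "On-Demand"
    if utilization_pct < 40:
        return "Light Utilization RI"
    if utilization_pct <= 75:  # assumption: boundary values fall in the Medium band
        return "Medium Utilization RI"
    return "Heavy Utilization RI"

for pct in (5, 25, 60, 90):
    print(pct, sweet_spot(pct))
```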
[Chart: Cost vs. Utilization for Heavy Utilization, Medium Utilization, Light Utilization, and On-Demand, with the break-even point marked; m2.xlarge running Linux in US-East Region over a 3-year period]
[Chart: Spiky Predictable Usage Pattern; x-axis: Months (0 to 35), y-axis: Traffic measured in Servers/Instances (0 to 12); series: Traffic Pattern, EC2 Reserved, Physical servers (on-premises), EC2 On-Demand]
TCO Web Application - Spiky Usage Pattern
Amortized monthly costs over 3 years

Cost item                       | On-Premises Option | AWS Option 1: All Reserved | AWS Option 2: Mix of On-Demand and Reserved | AWS Option 3: All On-Demand
Compute/Server Costs
Server Hardware                 | $510     | $0      | $0      | $0
Network Hardware                | $103     | $0      | $0      | $0
Hardware Maintenance            | $78      | $0      | $0      | $0
Power and Cooling               | $286     | $0      | $0      | $0
Data Center Space               | $240     | $0      | $0      | $0
Personnel                       | $2,000   | $0      | $0      | $0
AWS Instances                   | $0       | $992    | $881    | $1,940
Total - Per Month               | $3,220   | $992    | $881    | $1,940
Total - 3 Years                 | $115,920 | $35,717 | $31,731 | $69,854
Savings over On-premises Option |          | 69%     | 72%     | 40%
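The savings row can be reproduced from the 3-year totals. A short sketch, using only the figures in the table; the dictionary keys and function name are chosen for illustration. The computed values land within about a percentage point of the slide's figures, which appear to have been rounded from unrounded monthly costs.

```python
THREE_YEAR_TOTALS = {
    "On-Premises": 115_920,
    "AWS Option 1 (All Reserved)": 35_717,
    "AWS Option 2 (Mix of On-Demand and Reserved)": 31_731,
    "AWS Option 3 (All On-Demand)": 69_854,
}

def savings_over_baseline(totals, baseline="On-Premises"):
    """Percentage saved by each option relative to the baseline total."""
    base = totals[baseline]
    return {
        name: (1 - cost / base) * 100
        for name, cost in totals.items()
        if name != baseline
    }

savings = savings_over_baseline(THREE_YEAR_TOTALS)
for name, pct in savings.items():
    print(f"{name}: {pct:.1f}% savings")
```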
TCO of Spiky Predictable Web Application
Option 1: All Reserved
Option 2: Mix of On-Demand and Reserved (Recommended Option, Most Cost-effective)
Option 3: All On-Demand (Commitment-free and Risk-free Option)
Recommendations
Steady State Usage Pattern
• For 100% utilization
  • 3-Year Heavy RI (for maximum savings over on-demand)
Spiky Predictable Usage Pattern
• Baseline
  • 3-Year Heavy RI (for maximum savings over on-demand)
  • 1-Year Light RI (for lowest upfront commitment) + savings over on-demand
• Peak: On-Demand
Uncertain and unpredictable Usage Pattern
• Start out small with On-Demand Instances (risk-free and commitment-free)
• Switch to some combination of Reserved and On-Demand, if application is successful
• If not successful, you walk away having spent a fraction of what you would pay to buy your own technology infrastructure
#1 Use only what you need (use Auto Scaling Service, modify–db)
#2 Invest time in Reserved Pricing analysis (EC2, RDS)
#3 Architect for Spot Instances (bidding strategies)
Optimizing for Cost…
Optimize by using Spot Instances
Heavy Utilization RI
Medium Utilization RI
Light Utilization RI
1-year and 3-year terms
On-demand Instances
• Pay as you go
• Starts from $0.02/Hour
Reserved Instances
• One time low upfront fee + Pay as you go
• $23 for 1 year term and $0.01/Hour
Spot Instances
• Requested Bid Price and Pay as you go
• $0.005/Hour as of today at 9 AM
What are Spot Instances?
[Diagram: unused capacity in the Availability Zones of a Region, sold at 50%, 56%, 66%, 59%, 54%, and 63% discounts]
What is the tradeoff?
[Diagram: unused capacity in the Availability Zones of a Region; two of the six units are marked Reclaimed]
Spot Use cases
Use Case                             | Types of Applications
Batch Processing                     | Generic background processing (scale out computing)
Hadoop                               | Hadoop/MapReduce processing type jobs (e.g. Search, Big Data, etc.)
Scientific Computing                 | Scientific trials/simulations/analysis in chemistry, physics, and biology
Video and Image Processing/Rendering | Transform videos into specific formats
Testing                              | Provide testing of software, web sites, etc.
Web/Data Crawling                    | Analyzing data and processing it
Financial                            | Hedge fund analytics, energy trading, etc.
HPC                                  | Utilize HPC servers to do embarrassingly parallel jobs
Cheap Compute                        | Backend servers for Facebook games
Save more money by using Spot Instances
Reserved Hourly Price > Spot Price < On-Demand Price
Spot: Example Customers
[Figure: example customers, labeled 63%, 50%, 57%, 50%, 50%, 66%, 56%, and 50%]
Typical Spot Bidding Strategies
1. Bid near the Reserved Hourly Price
2. Bid above the Spot Price History
3. Bid near On-Demand Price
4. Bid above the On-Demand Price
[Chart: Bid Distribution (for last 3 months); x-axis: Bid Price as Percentage of the On-Demand Price, y-axis: Percentage of the Distribution (0% to 20%)]
1. Bid near the Reserved Hourly Price: 66% Savings over On-Demand
2. Bid above the Spot Price History: 50% Savings over On-Demand
3. Bid near the On-Demand Price: 50% Savings over On-Demand
4. Bid above the On-Demand Price: 57% Savings over On-Demand
Managing Interruption
[Diagram: Amazon Elastic MapReduce. A Data Source uploads large datasets or log files directly to Amazon S3, alongside Code/Scripts (HiveQL, Pig Latin, Cascading, Mapper, Reducer). The Amazon Elastic MapReduce service runs multiple JobFlow Steps on a Hadoop Cluster (Name Node, Core Nodes with HDFS, Task Nodes). Input Data and Output Data are in Amazon S3; Metadata is in Amazon DynamoDB. BI Apps query through JDBC/ODBC, HiveQL, and Pig Latin]
Amazon EMR (Hadoop): Run Task Nodes on Spot
Scenario #1: Job Flow, Duration: 14 Hours
Cost without Spot: 4 instances * 14 hrs * $0.45 = $25.20
Scenario #2: Job Flow, Duration: 7 Hours
Cost with Spot: 4 instances * 7 hrs * $0.45 = $12.60 + 5 instances * 7 hrs * $0.225 = $7.875; Total = $20.475
Time Savings: 50%
Cost Savings: ~19%
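The arithmetic of the two scenarios can be checked with a few lines of Python. The helper name and the (count, hourly rate) fleet format are illustrative; the instance counts, durations, and rates are the ones given above.

```python
def job_cost(fleets, hours):
    """Total cost of a job flow; fleets is a list of (instance_count, hourly_rate)."""
    return sum(count * hours * rate for count, rate in fleets)

# Scenario #1: 4 On-Demand instances for 14 hours.
without_spot = job_cost([(4, 0.45)], hours=14)

# Scenario #2: the same 4 instances plus 5 Spot task nodes at half the rate, for 7 hours.
with_spot = job_cost([(4, 0.45), (5, 0.225)], hours=7)

time_savings = 1 - 7 / 14
cost_savings = 1 - with_spot / without_spot

print(f"Without Spot: ${without_spot:.2f}")
print(f"With Spot:    ${with_spot:.3f}")
print(f"Time savings: {time_savings:.0%}, cost savings: {cost_savings:.2%}")
```

The exact cost saving is 18.75%, which the slide rounds to about 19%.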
Amazon EMR: Reducing Cost with Spot
Use Case: Web crawling/Search using Hadoop-type clusters. The customer uses Reserved Instances for DB workloads and Spot Instances for indexing clusters, launching hundreds of instances.
Bidding Strategy: Bid a little above the On-Demand price to prevent interruption.
Interruption Strategy: Restart the cluster if interrupted
Made for each other: MapReduce + Spot
66% Savings over On-Demand
[Diagram: On-demand + Spot. Components: Website (Job Manager) on the Intranet; Input Queue and Output Queue (Amazon SQS); Input Bucket and Output Bucket (Amazon S3); Amazon EC2 instances; Amazon CloudWatch; Job and Completed Job Reports (Amazon DynamoDB)]
Video Transcoding Application Example
Use of Amazon SQS in Spot Architectures
[Diagram: Amazon EC2 Spot Instance reading from an Amazon SQS queue with a VisibilityTimeOut]
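The reason a visibility timeout suits Spot workers is that a message a worker has received but not deleted reappears once the timeout expires, so another worker can pick it up after an interruption. The toy in-memory model below illustrates that behaviour; it is not the SQS API, and the class, its methods, and the 300-second timeout are assumptions made for this sketch.

```python
class VisibilityQueue:
    """Toy in-memory queue that hides received messages for a fixed timeout."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self._messages = {}  # message id -> (body, time at which it becomes visible)
        self._next_id = 0

    def send(self, body):
        self._messages[self._next_id] = (body, 0.0)
        self._next_id += 1

    def receive(self, now):
        """Return (id, body) of the first visible message and hide it, or None."""
        for msg_id, (body, visible_at) in self._messages.items():
            if visible_at <= now:
                self._messages[msg_id] = (body, now + self.visibility_timeout)
                return msg_id, body
        return None

    def delete(self, msg_id):
        """Called by a worker only after the job has finished successfully."""
        self._messages.pop(msg_id, None)

queue = VisibilityQueue(visibility_timeout=300)
queue.send("transcode video-1")

first = queue.receive(now=0)      # a Spot worker takes the job
hidden = queue.receive(now=100)   # other workers cannot see it: None
# The Spot instance is interrupted here and never calls delete().
retry = queue.receive(now=300)    # the job is visible again for another worker
queue.delete(retry[0])            # the second worker finishes and deletes it
empty = queue.receive(now=1000)   # nothing left: None

print(first, hidden, retry, empty)
```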
Optimizing Video Transcoding Workloads
Free Offering
• Optimize for reducing cost
• Acceptable Delay Limits
Implementation
• Set Persistent Requests
• Use On-Demand Instances if delayed
Maximum Bid Price < On-demand Rate
Get your set reduced price for your workload

Premium Offering
• Optimized for faster response times
• No Delays
Implementation
• Invest in RIs
• Use On-Demand for elasticity
Maximum Bid Price >= On-demand Rate
Get instant capacity for a higher price
Persistent Requests
Architecting for Spot Instances : Best Practices
Manage interruption
• Split up your work into small increments
• Checkpointing: Save your work frequently and periodically
Test Your Application
Track when Spot Instances Start and Stop
Spot Requests
• Use Persistent Requests for continuous tasks
• Choose maximum price for your requests
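Splitting work into small increments and checkpointing after each one might look like the sketch below. The checkpoint is a local JSON file for simplicity; on a real Spot Instance it would need to live somewhere that survives termination. The function name, the file format, and the `interrupt_after` parameter (used only to simulate an interruption) are all hypothetical.

```python
import json
import os
import tempfile

def process_with_checkpoints(items, checkpoint_path, work, interrupt_after=None):
    """Process items one at a time, resuming from the last saved checkpoint.

    Returns (results_from_this_run, finished).
    """
    done = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)["done"]

    results = []
    for index in range(done, len(items)):
        if interrupt_after is not None and index - done >= interrupt_after:
            return results, False  # simulated Spot interruption
        results.append(work(items[index]))
        # Write to a temporary file, then rename, so a crash never leaves a torn checkpoint.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"done": index + 1}, f)
        os.replace(tmp_path, checkpoint_path)
    return results, True

path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
jobs = list(range(10))

first_run, finished_first = process_with_checkpoints(
    jobs, path, lambda x: x * x, interrupt_after=4)
second_run, finished_second = process_with_checkpoints(jobs, path, lambda x: x * x)

print(first_run, finished_first)    # first four increments, then interrupted
print(second_run, finished_second)  # resumes at the fifth increment and completes
```

The smaller each increment, the less work is repeated when an instance is reclaimed.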
#1 Use only what you need (use Auto Scaling Service, modify–db)
#2 Invest time in Reserved Pricing analysis (EC2, RDS)
#3 Architect for Spot Instances (bidding strategies)
#4 Leverage Application Services (ELB, SNS, SQS, SWF, SES)
Optimizing for Cost…
Optimize by converting ancillary instances into services
Monitoring: CloudWatch
Notifications: SNS
Queuing: SQS
SendMail: SES
Load Balancing: ELB
Workflow: SWF
Search: CloudSearch
Elastic Load Balancing
Elastic Load Balancing
Pros
Elastic and Fault-tolerant
Auto scaling
Monitoring included
Cons
For Internet-facing traffic only
Software LB on EC2
Pros
Application-tier load balancer
Cons
SPOF
Elasticity has to be implemented manually
Not as cost-effective
[Diagram: DNS in front of an Elastic Load Balancer ($0.025 per hour) and Web Servers in an Availability Zone, compared with DNS in front of an EC2 instance + software LB ($0.08 per hour, small instance) and Web Servers in an Availability Zone]
Application Services
SNS, SQS, SES, SWF
Pros
Pay as you go
Scalability
Availability
High performance
Software on EC2
Pros
Custom features
Cons
Requires an instance
SPOF
Limited to one AZ
DIY administration
[Diagram: Producer, SQS queue, and Consumers at $0.01 per 10,000 Requests ($0.000001 per Request), compared with Producer, EC2 instance + software queue, and Consumers at $0.08 per hour (small instance)]
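The two prices above imply a request rate below which the SQS queue is cheaper than a dedicated small instance. A quick sketch of that comparison follows; it considers list price only and ignores the availability and administration points in the pros and cons, and the function names are illustrative.

```python
SQS_PRICE_PER_10K_REQUESTS = 0.01  # from the slide
SMALL_INSTANCE_PER_HOUR = 0.08     # from the slide

def sqs_hourly_cost(requests_per_hour):
    return requests_per_hour * SQS_PRICE_PER_10K_REQUESTS / 10_000

def break_even_requests_per_hour():
    """Request rate at which SQS costs the same as one small instance."""
    return SMALL_INSTANCE_PER_HOUR / (SQS_PRICE_PER_10K_REQUESTS / 10_000)

def cheaper_option(requests_per_hour):
    if sqs_hourly_cost(requests_per_hour) < SMALL_INSTANCE_PER_HOUR:
        return "SQS"
    return "EC2 instance + software queue"

print(f"Break-even: {break_even_requests_per_hour():,.0f} requests per hour")
print(cheaper_option(10_000))   # low volume
print(cheaper_option(500_000))  # high volume
```

At these prices the break-even is about 80,000 requests per hour, or roughly 22 requests per second sustained.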
#1 Use only what you need (use Auto Scaling Service, modify–db)
#2 Invest time in Reserved Pricing analysis (EC2, RDS)
#3 Architect for Spot Instances (bidding strategies)
#4 Leverage Application Services (ELB, SNS, SQS, SWF, SES)
#5 Implement Caching (ElastiCache, CloudFront)
Optimizing for Cost…
Optimize for performance and cost by page caching and edge-caching static content
When am I charged?
[Diagram: Clients in Paris, Singapore, and London each connect to an Edge Location; the Edge Locations connect to Amazon Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2)]
When content is popular…
[Diagram: Clients in Paris, Singapore, and London each connect to an Edge Location, in front of Amazon Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2)]
Architectural Recommendations
Use Amazon S3 + CloudFront for static data to reduce both cost and latency
• Depends on cache-hit ratio
For video streaming, use CloudFront; there is no need for a separate streaming server running Adobe FMS
Use a managed caching service (Amazon ElastiCache)
#1 Use only what you need (use Auto Scaling Service, modify–db)
#2 Invest time in Reserved Pricing analysis (EC2, RDS)
#3 Architect for Spot Instances (bidding strategies)
#4 Leverage Application Services (ELB, SNS, SQS, SWF, SES)
#5 Implement Caching (ElastiCache, CloudFront)
Number of ways to further save with AWS…
http://aws.amazon.com