(CMP401) Elastic Load Balancing Deep Dive and Best Practices

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Colm MacCárthaigh – AWS Principal Engineer

October 2015


Elastic Load Balancing

Security Scalability Availability

Security

Threat modeling

SSL/TLS SSL security policies

• Same-day mitigation for POODLE

• Same-day mitigation for LogJam

• Same-day mitigation for Heartbleed

• RC4 removed in advance of ratings and compliance changes

SSL/TLS management

SSL/TLS cipher suites

• Always prefer perfect forward secrecy

• Prefer AES over 3DES over RC4

• Prefer GCM over CBC + HMAC

• Compare against billions of connections from real-world clients

SSL/TLS cipher suites

• Legacy clients can cause compatibility issues

• Old firmware in embedded systems

• TVs, controllers, web scrapers…

• ELB defaults strike a balance

• Access log gap analysis

• We recommend ELBSecurityPolicy-2015-05


2015-05-13T23:39:43.945958Z my-loadbalancer 192.168.131.39:2817 10.0.0.1:80 0.000086 0.001048 0.001337 200 200 0 57 "GET https://www.example.com:443/ HTTP/1.1" "curl/7.38.0" DHE-RSA-AES128-SHA TLSv1.2
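A minimal sketch of what an access log gap analysis could look like, assuming log entries in the format shown above, where the last two space-separated fields are the cipher suite and the protocol version. The function names, the sample entries, and the `allowed` set are hypothetical and only illustrate the idea of counting which connections a stricter policy would affect.

```python
from collections import Counter

def cipher_usage(lines):
    """Count (cipher suite, protocol) pairs across access log lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or truncated lines
        counts[(fields[-2], fields[-1])] += 1
    return counts

def connections_outside(counts, allowed_ciphers):
    """Number of logged connections whose cipher is not in allowed_ciphers."""
    return sum(n for (cipher, _), n in counts.items()
               if cipher not in allowed_ciphers)

# Hypothetical sample entries in the access log format.
PREFIX = ('2015-05-13T23:39:43.945958Z my-loadbalancer 192.168.131.39:2817 '
          '10.0.0.1:80 0.000086 0.001048 0.001337 200 200 0 57 '
          '"GET https://www.example.com:443/ HTTP/1.1" "curl/7.38.0" ')
sample = [
    PREFIX + 'DHE-RSA-AES128-SHA TLSv1.2',
    PREFIX + 'ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2',
    PREFIX + 'DHE-RSA-AES128-SHA TLSv1.2',
]

usage = cipher_usage(sample)
# Assumed candidate policy that only keeps one GCM suite.
allowed = {'ECDHE-RSA-AES128-GCM-SHA256'}
affected = connections_outside(usage, allowed)
```

Running the same count over real logs before tightening a security policy gives a rough view of how many clients would be left behind.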

S3





ELB and security compartmentalization

[Diagram: public subnet and private subnet]

Threat modeling

Scalability

Scalability

L = λW

Scalability

W = L / λ

Scalability

Latency = Load / Throughput
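As a worked example of the relationship above (Little's law, L = λW, rearranged to W = L / λ), the sketch below computes average latency from load and throughput. The numbers are illustrative assumptions, not measurements from ELB.

```python
def mean_latency(load, throughput):
    """W = L / lambda: average time a request spends in the system.

    load       -- average number of requests in flight (L)
    throughput -- average completions per second (lambda)
    """
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return load / throughput

def mean_load(throughput, latency):
    """L = lambda * W: average number of requests in flight."""
    return throughput * latency

# Illustrative numbers: 200 requests in flight, 1,000 completing per second.
w = mean_latency(200, 1000)        # seconds per request
# If latency doubles at the same throughput, the load in flight doubles too.
l_doubled = mean_load(1000, 2 * w)
```

Read this way, anything that raises latency at constant throughput (a cache miss, a slow report) raises the number of requests the system has to hold at once.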

Scalability

Memory Latency

Scalability

Caching and cache misses

Scalability

[Chart: count vs. processing time]

Scalability

GET / HTTP/1.1

GET /monthly_report/ HTTP/1.1

Scalability

[Chart: count vs. processing time]

Scalability

[Chart: count vs. wait time]

Scalability

Scalability

[Chart: count vs. wait time for Weighted Round Robin and Single server]

Scalability

Scalability

[Chart: count vs. wait time for Weighted Round Robin, Single server, and Least Connections]

Scalability

[Chart: count vs. wait time for Weighted Round Robin, Single server, and Least Connections]

Beware of blackholing traffic
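One way to read the blackholing warning: a back end that fails fast holds almost no open connections, so a least-connections policy keeps choosing it. The toy simulation below illustrates that reading under stated assumptions (one request per tick, healthy servers take five ticks, one broken server returns an error immediately). It is a sketch of the general effect, not a description of how ELB routes requests.

```python
def simulate(policy, n_requests=1000, service_ticks=5, n_servers=3, broken=2):
    """Return how many requests land on the broken server.

    policy is "round_robin" or "least_connections". The server at index
    `broken` fails every request instantly, so it never holds a connection.
    """
    active = [[] for _ in range(n_servers)]  # finish times of open requests
    failed = 0
    for t in range(n_requests):
        # Retire requests that have finished by tick t.
        for s in range(n_servers):
            active[s] = [finish for finish in active[s] if finish > t]
        if policy == "round_robin":
            target = t % n_servers
        else:
            # Fewest open connections wins; ties go to the lowest index.
            target = min(range(n_servers), key=lambda i: len(active[i]))
        if target == broken:
            failed += 1  # error returned at once, connection closes
        else:
            active[target].append(t + service_ticks)
    return failed

rr_failures = simulate("round_robin")
lc_failures = simulate("least_connections")
```

In this model the broken server attracts a larger share of traffic under least connections than under round robin, because its empty connection count makes it look like the least loaded target.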

ELB’s own scaling is a mix of preemptive, based on the instance capacity you add, and reactive, based on the load you receive.

CloudWatch and Auto Scaling

All load balancer metrics can be used for Auto Scaling.

They allow you to scale dynamically based on the load balancer's view of the application.

Consider all metrics when using Auto Scaling; a policy driven by one metric may not be aware of resource contention on another metric.

You may be at peak multiple times a day.

13 CloudWatch metrics provided for each load balancer.

Provide detailed insight into the health of the load balancer and application stack.

CloudWatch alarms can be configured to notify or take action in case any metric goes outside of the acceptable range.

All metrics provided at 1-minute granularity.

Amazon CloudWatch metrics

HealthyHostCount

The number of healthy instances in each Availability Zone.

The most common cause of unhealthy hosts is a health check exceeding the allocated timeout.

Test by making repeated requests to the back-end instance from another Amazon EC2 instance.

View at the zonal dimension.

Latency

Measures the time elapsed in seconds after the request leaves the load balancer until the response is received.

Test by sending requests to the back-end instance from another instance.

Using minimum, average, and maximum, CloudWatch stats provide upper and lower bounds for latency.

Debug individual requests using access logs.

SurgeQueue and spillovers

Count of the number of requests that could not be sent to back-end instances.

Queue up to 1,024 requests per load balancer node, after which 503 errors will be returned.

Often caused by not being able to open connections to the back-end instance.

Normally a sign of an underscaled application.
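The surge queue and spillover behavior described above can be modeled as a bounded queue. This is a minimal sketch for intuition only: the 1,024 limit and the 503 response come from the slide, while the class and method names are hypothetical and do not reflect ELB internals.

```python
from collections import deque

SURGE_QUEUE_LIMIT = 1024  # per load balancer node, as stated on the slide

class SurgeQueue:
    """Toy model: hold requests that cannot yet be sent to a back end."""

    def __init__(self, limit=SURGE_QUEUE_LIMIT):
        self.limit = limit
        self.pending = deque()
        self.spillover_count = 0

    def offer(self, request):
        """Queue a request; return 503 if the queue is already full."""
        if len(self.pending) >= self.limit:
            self.spillover_count += 1
            return 503
        self.pending.append(request)
        return None

    def drain(self, n):
        """Hand up to n queued requests to back ends, oldest first."""
        return [self.pending.popleft()
                for _ in range(min(n, len(self.pending)))]

# 1,030 requests arrive while no back end accepts connections.
queue = SurgeQueue()
results = [queue.offer(i) for i in range(1030)]
```

In the model, the queue length corresponds to what the surge queue metric tracks and `spillover_count` to the requests rejected with a 503.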

• timestamp

• elb name

• client:port

• backend:port

• request_processing_time

• backend_processing_time

• response_processing_time

• elb_status_code

• backend_status_code

• received_bytes

• sent_bytes

• “request”

• “User-Agent”

• Ciphersuite

• SSL/TLS protocol version

Access logs

2015-05-13T23:39:43.945958Z my-loadbalancer 192.168.131.39:2817 10.0.0.1:80 0.000086 0.001048 0.001337 200 200 0 57 "GET https://www.example.com:443/ HTTP/1.1" "curl/7.38.0" DHE-RSA-AES128-SHA TLSv1.2
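To debug individual requests, an entry like the one above can be split into the listed fields. The sketch below assumes the 15-field layout from the list, with plain ASCII double quotes around the request and User-Agent. The dictionary key names are illustrative choices that follow the field list, and `parse_entry` is a hypothetical helper.

```python
import shlex

# Key names follow the field list; they are this sketch's own choices.
FIELDS = [
    "timestamp", "elb_name", "client_port", "backend_port",
    "request_processing_time", "backend_processing_time",
    "response_processing_time", "elb_status_code", "backend_status_code",
    "received_bytes", "sent_bytes", "request", "user_agent",
    "ciphersuite", "ssl_protocol",
]

def parse_entry(line):
    """Split one access log line into a dict keyed by FIELDS."""
    values = shlex.split(line)  # keeps quoted fields together
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))

line = ('2015-05-13T23:39:43.945958Z my-loadbalancer 192.168.131.39:2817 '
        '10.0.0.1:80 0.000086 0.001048 0.001337 200 200 0 57 '
        '"GET https://www.example.com:443/ HTTP/1.1" "curl/7.38.0" '
        'DHE-RSA-AES128-SHA TLSv1.2')

entry = parse_entry(line)
# Total time attributed to the request across the three timing fields.
total_time = sum(float(entry[name]) for name in (
    "request_processing_time",
    "backend_processing_time",
    "response_processing_time",
))
```

Comparing `backend_processing_time` with the other two timings shows whether a slow request spent its time in the back end or at the load balancer.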

Global scalability

ELB integrates with Amazon Route 53 latency-based routing and geo-based routing

Useful for applications where latency is critical

Online advertising bidding

Trading

Availability

Seamless and graceful replacement of instances with no downtime

Health checks

[Diagram: ELB and three EC2 instances]

Health checks

Support for TCP and HTTP health checks

Customize frequency and failure thresholds

Must return a 2xx response

Think hard about health check “depth”
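Health check "depth" can be pictured as the difference between confirming that the process answers and confirming that everything it depends on works. The sketch below contrasts the two; the probe names are hypothetical, and returning 503 on failure is an assumption (the slide only says a healthy response must be 2xx). A commonly cited trade-off is that a deep check can mark every instance unhealthy at once when a shared dependency fails.

```python
def shallow_check():
    """Shallow: the process is up and able to answer at all."""
    return 200

def deep_check(dependency_probes):
    """Deep: return 200 only if every dependency probe succeeds.

    dependency_probes maps a name to a zero-argument callable that
    returns True when the dependency is reachable.
    """
    for probe in dependency_probes.values():
        try:
            if not probe():
                return 503
        except Exception:
            return 503  # a probe that raises counts as a failure
    return 200

# Hypothetical probes: the database answers, the cache does not.
probes = {"database": lambda: True, "cache": lambda: False}
shallow_status = shallow_check()
deep_status = deep_check(probes)
```

Here the shallow check keeps the instance in service while the deep check would take it out, which is the decision the slide asks you to think through.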

Idle timeouts allow for connections to be closed by the load balancer when no longer in use.

Length of time that an idle connection should be kept open

For both client and back-end connections

Defaults to 60 seconds but can be set between 1 and 3,600 seconds

Timeouts should decrease as you go up the stack

Idle timeouts

[Diagram: idle timeouts between ELB, EC2 instances, Amazon S3, Amazon RDS, and Amazon SWF, labeled 15s, 15s, 3s, 3s, 3s, and 9s]

Idle timeouts

Multiple Availability Zones

[Diagram: Amazon Route 53 in front of a VPC spanning us-west-1a and us-west-1b, each zone with an ELB and an EC2 instance]

Protected by Amazon Route 53 health checks

All load balancers scaled to handle loss of a single Availability Zone.

Amazon Route 53 health checks shift traffic away from the failed Availability Zone.

Completed within 150 seconds.

No other external or control plane dependencies.

Health checkers and edge locations perform the same volume of activity, whether endpoints are healthy or unhealthy.

Constant work

[Chart: system activity over time, with time to react]

When nothing is failing, the volume of API calls is zero. When failure occurs, the volume of API calls spikes.

[Chart: system activity over time, with time to react]

Work on failure


[Diagram: Amazon Route 53 in front of a VPC spanning us-west-1a and us-west-1b, each zone with an ELB and an EC2 instance]

Always associate two or more subnets in different zones with the load balancer

Using multiple Availability Zones does bring a few challenges

[Chart: request count over time]

Traffic imbalances

DNS caching and spreading

DNS TTLs are generally honored

But sometimes there simply are not enough DNS servers to spread load around fairly

Mobile networks typically have a dozen or so top-level resolvers

Enterprise networks may have as few as one

DNS caching by clients and ISPs can often cause clients to target a specific IP address or stop resolving at all.

Register a wildcard CNAME or ALIAS within Amazon Route 53.

// Create a wildcard CNAME or ALIAS in Route 53.

*.example.com ALIAS … elb-12345.us-east-1.elb.amazonaws.com

*.example.com CNAME elb-12345.us-east-1.elb.amazonaws.com

// prepend random content for each lookup made by the application.

PROMPT> dig +short 25a8ade5-6557-4a54-a60e-8f51f3b195d1.example.com

192.0.2.1

192.0.2.2

DNS optimization

http://25a8ade5-6557-4a54-a60e-8f51f3b195d1.example.com
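The "prepend random content" step can be done in the application with a fresh UUID per lookup, which matches the shape of the hostname shown above. This is a minimal sketch that assumes a wildcard record for the base domain already exists; the function names are hypothetical, and `example.com` is the placeholder domain from the slide.

```python
import socket
import uuid

def randomized_hostname(base_domain):
    """Return a unique hostname under base_domain for a single lookup.

    A wildcard record (*.base_domain) makes every such name resolve,
    while the random label keeps resolvers from serving a cached answer.
    """
    return f"{uuid.uuid4()}.{base_domain}"

def resolve_fresh(base_domain, port=443):
    """Resolve a randomized name and return the sorted set of addresses.

    Requires network access and a real wildcard record, so it is not
    called in this sketch.
    """
    host = randomized_hostname(base_domain)
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})

first = randomized_hostname("example.com")
second = randomized_hostname("example.com")
```

Because each name is new to every resolver along the path, the lookup reaches the authoritative servers and picks up the current set of load balancer addresses.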


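[Diagram: Amazon Route 53 in front of a VPC spanning us-west-1a and us-west-1b, each zone with an ELB and an EC2 instance]

[Diagram: Amazon Route 53 in front of a VPC with an ELB in each of us-west-1a and us-west-1b and a single EC2 instance]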

[Chart: request count over time]

Traffic imbalances

Cross-zone enabled

Load balancer absorbs impact of DNS caching

Eliminates imbalances in back-end instance utilization

Requests distributed evenly across multiple Availability Zones

Check connection limits before enabling

No additional bandwidth charge for cross-zone traffic

Cross-zone load balancing
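A small worked example of why cross-zone load balancing removes back-end imbalances. The model is deliberately simplified: without cross-zone, each zone's traffic is split only among that zone's instances; with cross-zone, all traffic is split evenly among all instances. The 70/30 traffic split and instance counts are illustrative assumptions.

```python
def per_instance_share(zone_traffic, zone_instances, cross_zone):
    """Return each zone's per-instance fraction of total traffic.

    zone_traffic   -- requests arriving at the load balancer in each zone
    zone_instances -- number of back-end instances in each zone
    cross_zone     -- whether traffic is spread across all zones
    """
    total_traffic = sum(zone_traffic.values())
    total_instances = sum(zone_instances.values())
    shares = {}
    for zone, count in zone_instances.items():
        if cross_zone:
            shares[zone] = 1 / total_instances
        else:
            shares[zone] = zone_traffic[zone] / total_traffic / count
    return shares

# DNS caching skews arrivals 70/30 between two zones of two instances each.
traffic = {"us-west-1a": 70, "us-west-1b": 30}
instances = {"us-west-1a": 2, "us-west-1b": 2}

without_cross_zone = per_instance_share(traffic, instances, cross_zone=False)
with_cross_zone = per_instance_share(traffic, instances, cross_zone=True)
```

In this example each instance in the busier zone carries 35% of total traffic without cross-zone and 25% with it, so the skew introduced by DNS caching no longer reaches the back ends.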

Integrated with AWS CloudFormation, AWS OpsWorks, AWS Elastic Beanstalk, Amazon EC2 Container Service, Amazon API Gateway, Asgard

Load balancers are a common gateway for blue/green deployments

Load balancers can be managed programmatically for immutable deployments

ELB and DevOps

Remember to complete your evaluations!

Thank you!
