Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webinar Series

Preview:

Citation preview

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Adam Boeglin, HPC Solutions Architect

Wednesday, May 3, 2023

Choosing the Right EC2 Instance and Applicable Use Cases

Understanding the factors that going into choosing an EC2 instance

Defining system performance and how it is characterized for different workloads

How Amazon EC2 instances deliver performance while providing flexibility and agility

How to make the most of your EC2 instance experience through the lens of several instance types

What to Expect from the Session

InstancesAPI

Networking

EC2EC2

Purchase options

Amazon Elastic Compute Cloud is Big

Host ServerHypervisor

Guest 1 Guest 2 Guest n

Amazon EC2 Instances

In the past

First launched in August 2006 M1 Instance

“One size fits all” M1

2006 2008 2010 2012 2014 2016

m1.small

m1.largem1.xlarge

c1.mediumc1.xlarge

m2.xlarge

m2.4xlargem2.2xlarge

cc1.4xlarge

t1.micro

cg1.4xlarge

cc2.8xlarge

m1.medium

hi1.4xlarge

m3.xlargem3.2xlarge

hs1.8xlarge

cr1.8xlarge

c3.largec3.xlarge

c3.2xlargec3.4xlargec3.8xlargeg2.2xlarge

i2.xlargei2.2xlargei2.4xlargei2.4xlarge

m3.mediumm3.large

r3.larger3.xlarger3.2xlarge

r3.4xlarger3.8xlarge

t2.microt2.smallt2.med

c4.largec4.xlargec4.2xlargec4.4xlargec4.8xlarge

d2.xlarged2.2xlarged2.4xlarged2.8xlarge

g2.8xlarge

t2.largem4.large

m4.xlargem4.2xlargem4.4xlarge

m4.10xlarge

Amazon EC2 Instances History

x1.32xlarge

t2.nano

Instance generation

c4.largeInstance family

Instance size

Choices and Flexibility

Choice of Processor Memory Storage Options Accelerated Graphics Burstable Performance

Servers are hired to do jobs Performance is measured differently depending on the job

Hiring a Server

?

Performance Factors

Resource Performance factors Key indicatorsCPU Sockets, number of cores, clock

frequency, bursting capabilityCPU utilization, run queue length

Memory Memory capacity Free memory, anonymous paging, thread swapping

Network interface

Max bandwidth, packet rate Receive throughput, transmit throughput over max bandwidth

Disks Input / output operations per second, throughput

Wait queue length, device utilization, device errors

Resource Utilization

For given performance, how efficiently are resources being used

Something at 100% utilization can’t accept any more work

Low utilization can indicate more resource is being purchased than needed

Example: Web Application MediaWiki installed on Apache with 140 pages of content Load increased in intervals over time

Example: Web Application Memory stats

Example: Web Application Disk stats

Example: Web Application Network stats

Example: Web Application CPU stats

“Launching new instances and running tests in parallel is easy…[when choosing an instance] there is no substitute for measuring the performance of your full application.”

- EC2 Documentation

How not to choose an EC2 instance

Brute Force Testing Ignoring Metrics Favoring old generation instances Guessing based on what you already have

EC2 Instance Families

General purpose

Computeoptimized

C3

Storage and IOoptimized

I2 G2

GPUenabled

Memoryoptimized

R3C4M4D2

X1

Give back instances as easily as you can acquire new ones Find an ideal instance type and workload combination EC2 Instance Pages provide “Use Case” Guidance With EBS, storage and instance size don’t need to be coupled

Instance Selection = Performance Tuning

Instance sizing

c4.8xlarge 2 - c4.4xlarge

4 - c4.2xlarge

8 - c4.xlarge

Choosing the right size

Understand your unit of work Web request Database / Table Batch Process

What is that unit’s requirements? CPU threads Memory Constraints Disk & Network

What are it’s availability requirements?

CPU Instructions and Protection Levels

CPU has at least two protection levels. Privileged instructions can’t be executed in user mode to protect

system. Applications leverage system calls to the kernel.

Kernel

Application

VMM

Application

Kernel

PV

X86 CPU Virtualization: Prior to Intel VT-x

Binary translation for privileged instructions Para-virtualization (PV)

PV requires going through the VMM, adding latency Applications that are system call bound are most affected

KernelApplication

VMM

PV-HVM

X86 CPU Virtualization: After Intel VT-x

Hardware assisted virtualization (HVM) PV-HVM uses PV drivers opportunistically for operations that

are slow emulated: e.g., network and block I/O

Tip: Use HVM AMIs with EBS

Time Keeping Explained

Time keeping in an instance is deceptively hard gettimeofday(), clock_gettime(), QueryPerformanceCounter() The TSC

CPU counter, accessible from userspace Requires calibration, vDSO Invariant on Sandy Bridge+ processors

Xen pvclock; does not support vDSO On current generation instances, use TSC as clocksource

# cat /sys/devices/system/cl*/cl*/available_clocksourcexen tsc hpet acpi_pm # cat /sys/devices/system/cl*/cl*/current_clocksourcetsc

# vi /boot/grub/menu.list && reboot

# cat /proc/cmdlineroot=/dev/xvda1 ro clock source=tsc

Change with:

Tip: Use TSC as clocksource

Review: C4 Instances Custom Intel E5-2666 v3 at 2.9 GHz P-state and C-state controls

Model vCPU Memory (GiB) EBS (Mbps)c4.large 2 3.75 500c4.xlarge 4 7.5 750c4.2xlarge 8 15 1,000c4.4xlarge 16 30 2,000c4.8xlarge 36 60 4,000

Batch & HPC workloads, Game Servers, Ad Serving, & High Traffic Web Servers

What’s new in C4: P-state and C-state control

Intel Turbo Boost up to 3.5Ghz By entering deeper idle states, non-idle cores can achieve up to 300MHz

higher clock frequencies But… deeper idle states require more time to exit, may not be appropriate

for latency-sensitive workloads

Tip: P-state control for AVX2

If an application makes heavy use of AVX2 on all cores, the processor may attempt to draw more power than it should

Processor will transparently reduce frequency Frequent changes of CPU frequency can slow an application

Review: T2 Instances Lowest cost EC2 instance at $0.0065 per hour Burstable performance Fixed allocation enforced with CPU credits

Model vCPU Baseline CPU Credits / Hour

Memory (GiB)

Storage

t2.nano 1 5% 3 .5 EBS Onlyt2.micro 1 10% 6 1 EBS Onlyt2.small 1 20% 12 2 EBS Onlyt2.medium 2 40%** 24 4 EBS Onlyt2.large 2 60%** 36 8 EBS Only

General Purpose, Web Serving, Developer Environments, Small Databases

How Credits Work

A CPU credit provides the performance of a full CPU core for one minute

An instance earns CPU credits at a steady rate An instance consumes credits when active Credits expire (leak) after 24 hours

Baseline rate

Credit balance

Burstrate

Tip: Monitor CPU credit balance

Tip: How to Interpret Steal Time

Fixed CPU allocations of CPU can be offered through CPU caps Steal time happens when CPU cap is enforced Leverage CloudWatch metrics

Announced: X1 Instances

Largest memory instance with 2TB of DRAM Quad socket, Intel E7 processors with 128 vCPUs

Model vCPU Memory (GiB) Local Storage

x1.32xlarge 128 1952 2x 1920GB

In-Memory Databases, Big Data Processing, HPC Workloads

NUMA

Non-uniform memory access Each processor in a multi-CPU system has local memory that is

accessible through a fast interconnect Each processor can also access memory from other CPUs, but

local memory access is a lot faster than remote memory Performance is related to the number of CPU sockets and how they

are connected - Intel QuickPath Interconnect (QPI)

QPI

122GB 122GB

16 vCPU’s 16 vCPU’s

r3.8xlarge

QPI

QPI

QPIQPI

QPI

488GB

488GB

488GB

488GB

32 vCPU’s 32 vCPU’s

32 vCPU’s 32 vCPU’s

x1.32xlarge

Tip: Kernel Support for NUMA Balancing

An application will perform best when the threads of its processes are accessing memory on the same NUMA node.

NUMA balancing moves tasks closer to the memory they are accessing. This is all done automatically by the Linux kernel when automatic NUMA

balancing is active: version 3.8+ of the Linux kernel. Windows support for NUMA first appeared in the Enterprise and Data

Center SKUs of Windows Server 2003.

Review: I2 Instances 16 vCPU: 3.2 TB SSD; 32 vCPU: 6.4 TB SSD 365K random read IOPS for 32 vCPU instance

Model vCPU Memory (GiB)

Storage Read IOPS Write IOPS

i2.xlarge 4 30.5 1 x 800 SSD 35,000 35,000i2.2xlarge 8 61 2 x 800 SSD 75,000 75,000i2.4xlarge 16 122 4 x 800 SSD 175,000 155,000i2.8xlarge 32 244 8 x 800 SSD 365,000 315,000

NoSQL Databases, Clustered Databases, Online Transaction Processing (OLTP)

Hardware

Split Driver ModelDriver Domain Guest Domain Guest Domain

VMM

Frontend driver

Frontend driver

Backend driver

DeviceDriver

Physical CPU

Physical Memory

Network Device

Virtual CPU Virtual Memory

CPU Scheduling

Sockets

Application1

23

4

5

Granting in pre-3.8.0 Kernels

Requires “grant mapping” prior to 3.8.0 Grant mappings are expensive operations due to TLB flushes

SSD

Inter domain I/O:(1) Grant memory(2) Write to ring buffer(3) Signal event(4) Read ring buffer(5) Map grants(6) Read or write grants(7) Unmap grants

read(fd, buffer,…)

I/O domain Instance

Granting in 3.8.0+ Kernels, Persistent and Indirect

Grant mappings are setup in a pool once Data is copied in and out of the grant pool

SSD

read(fd, buffer…)

I/O domain Instance

Grant pool

Copy to and from grant pool

2009 – Longer ago than you think Avatar was the top movie in the theaters Facebook overtook MySpace in active users President Obama was sworn into office

The 2.6.32 Linux kernel was released

Tip: Use 3.8+ kernel

Amazon Linux 13.09 or later Ubuntu 14.04 or later RHEL/Centos 7 or later Etc.

Device Pass Through: Enhanced Networking

SR-IOV eliminates need for driver domain Physical network device exposes virtual function to instance Requires a specialized driver, which means:

Your instance OS needs to know about it EC2 needs to be told your instance can use it

Hardware

After Enhanced NetworkingDriver Domain Guest Domain Guest Domain

VMM

NIC Driver

Physical CPU

Physical Memory

SR-IOV Network Device

Virtual CPU Virtual Memory

CPU Scheduling

Sockets

Application1

2

3

NIC Driver

Elastic Network Adapter

Next Generation of Enhanced Networking Hardware Checksums Multi-Queue Support Receive Side Steering

20Gbps in a Placement Group New Open Source Amazon Network Driver

EBS Performance

Instance Size Matters Match your volume size and

type to your instance Use EBS Optimization if EBS

performance is important

Choose HVM AMI’s Time keeping: use TSC C state and P state controls Monitor T2 CPU credits Use a modern Linux kernel NUMA balancing Persistent grants for I/O performance Enhanced networking

Summary: Getting the Most Out of EC2 Instances

Bare metal performance goal, and in many scenarios already there History of eliminating hypervisor intermediation and driver domains

Hardware assisted virtualization Scheduling and granting efficiencies Device pass through

Virtualization Themes

Next steps

Visit the Amazon EC2 documentation Launch an instance and try your app!

Thank you!

Recommended