Deterministic capacity planning for OpenStack as elastic cloud infrastructure


DESCRIPTION

Capacity planning for elastic cloud infrastructure platforms like OpenStack is critical for successful deployments. Proper sizing of compute resources within OpenStack allows for easier scheduling, efficient hardware utilization, and consistent resource allocation. Google Compute Engine and Amazon Web Services offer deterministic compute resources designed to meet both cloud provider business requirements and cloud consumer service-level requirements. In this session, Keith Basil, Sean Cohen, and Tushar Katarki explore these public provider approaches, extend them to OpenStack, and provide sizing data and tools to help with your deployment. They discuss:

- Approaches for providing consistent compute service levels in OpenStack
- Building instance families for your workloads
- Sizing compute nodes for OpenStack
- Storage and network sizing for elastic clouds
- Capacity planning tools and benchmarks


Deterministic capacity planning for OpenStack

Keith Basil, Principal Product Manager, Red Hat

Sean Cohen, Principal Product Manager, Red Hat

Tushar Katarki, Principal Product Manager, Red Hat

http://sharpwriter.deviantart.com/art/Welcome-to-the-Internet-Please-Follow-me-322248378

http://creativecommons.org/licenses/by-nc-nd/3.0/

devOps headband, BOFH Slayer gun handle and OpenStack unicorn branding added for effect. Not for redistribution.

AGENDA

✦ OpenStack as an Elastic Cloud

✦ Determinism in Infrastructure

✦ Compute for Elastic Clouds

✦ Storage for Elastic Clouds

✦ Networking for Elastic Clouds

✦ Putting It All Together

Keith Basil

personal: Virginia hare scrambler, plays chess..

professional: Red Hat; Cloudscaling, Time Warner Cable, FederalCloud.com, Cisco and a couple of startups

blended: skype/twitter/github/irc, life: noslzzp

Sean Cohen

personal: Jazzman, oil painting & tennis...

professional: Red Hat; Dot Hill Systems, Cloverleaf Communications, VerticalNet

blended: skype: sean.redhat, irc: scohen

Tushar Katarki

personal: Two kids and the wife, squash, hike/bike

professional: Red Hat; 15 years in IT infrastructure development; Sun Microsystems, Oracle

Hello.. I’m Your Elastic Cloud.

HELLO, my name is OpenStack

OpenStack ...

✦Is open source software with a vibrant community

✦Provides a framework for an elastic cloud

✦Benefits from deterministic deployment approaches

Elastic Cloud != Enterprise Virtualization

Elastic Cloud Workloads

✦Applications expect failure

✦Smaller stateless VMs

✦Applications scale out horizontally with VMs of predetermined capacity

✦Lifecycle measured in hours to minutes

Enterprise Virt Workloads

✦Workloads NOT designed to tolerate failure

✦Larger stateful VMs

✦Workloads scale up within custom VMs (more vCPU, vRAM)

✦Lifecycle measured in years

Scale Up: Servers are like pets.

Scale Out: Servers are like cattle.

Difference in the resource requests?

I want 6 vCPUs, 4 GB and 120 GB disk please.

I want an m1.small please.

One is user determined. One is provider determined.

OpenStack in 2 Minutes!

Tenant: I would like an m1.medium VM please!

Keystone: Umm, do I know you? I need to see some papers!!

Nova: Ok, we need to find a place to build this VM.

Nova: Tag - you’re it!

Nova: Papers are good. Time to get to work!

Node: Neutron, I need a network with all the trimmings!

Neutron: Here’s your IP, default route and FW settings.

Node: Cinder, have that volume ready for me?

Cinder: Indeed I do. Don’t forget to mount it!

Node: Hey Glance, can I get the RHEL 6.4 image?

Tenant: Thank you OpenStack!!

It’s rendering time!

Your Mission, Should You Choose to Accept It..

“If you’re going to do operations reliably, you need to make it reproducible and programmatic.”

“Applications are what matter. Anything that gets apps deployed faster and helps companies manage the proliferation of apps is good. Hence, DevOps.”

- Mark Imbriaco VP of Ops, Digital Ocean

- Mike Loukides, What is DevOps?


The goal is to keep your devOps heroes in play!

Determinism in Infrastructure

Let's Break The Myth...

There is no such thing as “infinite scale” in cloud computing

All computing requests, even for virtualized resources, ultimately map to physical devices, and therefore to finite resources

✦ Every provider has limits, even if they’re massive.

✦ Adding the word Cloud simply squeezes the limit balloon

✦ It doesn’t eliminate the issue, even with “elasticity.”

✦ The service provider is responsible for risk mitigation of the capacity it rents.

Capacity Planning in the Cloud

Infrastructure as “building” code

Why History matters..

Capacity planning and performance monitoring in the context of public providers:

✦Can be done only by understanding the history of a specific cloud provider.

✦Requires a cloud performance application to understand both:

✦Current state of the provider

✦Performance history over a given period of time.

Cloud tenants have a service level expectation

Cloud Operators have business constraints

Implicit contract

[Figure: cartoon of cloud operators and tenants bound by an implicit contract]

Capacity Planning in the Cloud

✦Cloud users buy services based on capacity, protected by SLA

✦Cloud providers need deterministic capacity planning to support elastic growth


Deterministic Capacity Planning

✦Determinism is the best measure we have for predicting the effort and expense of making a process consistently performant

✦When your service becomes a critical part of a customer’s infrastructure, their fate becomes wedded to the SLAs you deliver.

✦In Cloud Computing, the service’s performance will not be measured by its average speed but by the consistency of its speed

Modeling Performance

✦Using this information, we’re able to more accurately determine the capacity of a public provider

✦Monitoring performance spikes and valleys over time.

✦This means we can more accurately model for performance, and thus capacity.

Benchmarks can provide useful insight for performance analysis and capacity planning

http://cloudharmony.com/benchmarks

Deterministic Concepts & Goals

AWS and GCE as models

You want 2048, not Tetris®

✦ Scheduling made easy

✦ Scaling made easy

✦ Optimal hardware use (no holes or hot spots)

✦ Performance consistency

How do we achieve determinism for these core OpenStack services?

Compute for Elastic Clouds

Compute Instance Family

Solving resource contention in Compute

CPU, Disk, Memory

Public Cloud VM Instances Exposed!

Proportion   Size     m1.class    n1-standard.class
1/1          xlarge   m1.xlarge   n1-standard-8
1/2          large    m1.large    n1-standard-4
1/4          medium   m1.medium   n1-standard-2
1/8          small    m1.small    n1-standard-1

We can take this approach with OpenStack

xlarge, large, medium, small

Solve for the biggest VM in the class

We can easily derive the entire instance family because smaller instances are fractional proportions of the largest.

This facilitates efficient hardware use and scheduling.
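The derivation described here can be sketched in a few lines of Python. This is only an illustration, not the speakers' calculator: the `Flavor` type, the `derive_family` helper, and the xlarge sizing values are hypothetical, chosen so the halving is easy to follow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flavor:
    name: str
    vcpus: float
    ram_gb: float
    disk_gb: float


def derive_family(largest, names=("xlarge", "large", "medium", "small")):
    """Derive each flavor as 1/1, 1/2, 1/4, 1/8 of the largest one."""
    family = []
    for step, name in enumerate(names):
        divisor = 2 ** step  # 1, 2, 4, 8
        family.append(Flavor(name,
                             largest.vcpus / divisor,
                             largest.ram_gb / divisor,
                             largest.disk_gb / divisor))
    return family


# Hypothetical xlarge sizing, used only for illustration.
family = derive_family(Flavor("xlarge", vcpus=8, ram_gb=16, disk_gb=160))
for flavor in family:
    print(flavor)
```

Because every size is a power-of-two fraction of the largest, only the xlarge needs to be sized against the hardware; the rest of the family follows mechanically.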

[Figure: an xlarge instance divided into 1/1, 1/2, 1/4 and 1/8 proportions]

Efficient Bin-Packing with Fractional Proportions

Compute Hardware Node (general compute instance family)

128GB memory, (16) 1TB disks, (2) E5-2670 CPU

Given this machine config, it would support:

(4) n1-standard-8-d, (8) n1-standard-4-d, (16) n1-standard-2-d, or (32) n1-standard-1-d

(8) m1.xlarge, (16) m1.large, (32) m1.medium, or (64) m1.small

[Figure: a compute node packed with a mix of xlarge, large, medium and small instances]
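A minimal sketch of why fractional proportions pack cleanly, assuming the m1 counts from this slide (one node holds 64 m1.small, and each larger size is twice the one below it). The `UNITS` table, the helper names, and the example mixes are illustrative assumptions rather than part of the original talk.

```python
# Size of each instance in units of the smallest one (small = 1/8 of xlarge).
UNITS = {"xlarge": 8, "large": 4, "medium": 2, "small": 1}

# From the slide: one general compute node holds (64) m1.small.
NODE_UNITS = 64


def max_instances(size):
    """How many instances of a single size fill one node."""
    return NODE_UNITS // UNITS[size]


def fits(mix):
    """Check a mix such as {"xlarge": 4, "small": 8} against one node.

    Returns (fits_on_node, units_left_over).
    """
    used = sum(UNITS[size] * count for size, count in mix.items())
    return used <= NODE_UNITS, NODE_UNITS - used


for size in UNITS:
    print(size, max_instances(size))

# Hypothetical mix that fills the node exactly, leaving no holes.
print(fits({"xlarge": 4, "large": 4, "medium": 4, "small": 8}))
```

Since every size divides the node evenly, any mix that fits leaves a remainder that can still be filled with smaller instances, which is the "2048, not Tetris" property.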

Efficient Scheduling with Fractional Proportions

GENERAL COMPUTE NODE

General Purpose Instance Families

✦ n1-standard

✦ m1

✦ A1 - A4

MEMORY OPTIMIZED NODE

Memory Optimized Instance Families

✦ n1-highmem

✦ m2, cr1

✦ A5 - A7

CPU OPTIMIZED NODE

CPU Optimized Instance Families

✦ n1-highcpu

✦ c1, cc2, c3

[Figure: scheduling places xlarge, large, medium and small instances onto the matching node type]

Compute Calculator Intro

Designed to help determine optimal compute hardware configurations

✦Visually shows resource constraints

✦Allows custom instance families

✦Walk through

Storage for Elastic Clouds

Block Storage Volume Types

Solving resource contention in Block Storage

Throughput, General Storage, Performance (IOPS/latency)

What Are the Public Clouds Doing with Storage?

Performance Optimized:

✦ guaranteed IOPS (SSDs)

✦ IOPS per GB with low latency

✦ for I/O intensive workloads

✦ Billed by size and IO usage

Capacity Optimized (standard):

✦ no IOPS guarantees

✦ workloads with moderate IO

✦ Billed by size and IO usage

Blended Approach (Performance Scaled with Capacity):

✦ Ephemeral disks deprecated!

✦ IOPS scale with volume size

✦ Attached volume limits

✦ Billed by size only

Block Storage Classes in OpenStack

PERFORMANCE OPTIMIZED STORAGE NODE

Performance Optimized Storage

✦ all SSDs

THROUGHPUT OPTIMIZED STORAGE NODE

Throughput Optimized Storage

✦ fast SAS drives with RAID 5/6

✦ throughput tuned network

✦ high bandwidth internal bus

GENERAL STORAGE NODE

Capacity (General) Optimized Storage

✦ larger SATA HDDs

[Figure: Cinder scheduling places volumes on SSD or HDD storage nodes according to storage class]

Storage Tiers with OpenStack Cinder

OPERATOR

1. Define storage back ends

2. Create Volume Types

✦ General

✦ Performance

✦ Throughput

TENANT

3. Create Volumes

# cinder create \
    --volume_type IOPS_OPTIMIZED_TYPE \
    --display_name volume-1 50

✦ Raw capacity of the storage

✦ Replication

✦ RAID type

Capacity (General) Optimized Storage

RAID TYPE   2-Way Replication   3-Way Replication
RAID5       2.2                 3.3
RAID6       2.4                 3.6
RAID10      4                   n/a

Example:

Twelve (12), 1TB disks, configured for RAID6 and 2-way replication would yield 5.0TB of usable capacity.

12TB / 2.4 = 5.0TB net usable capacity.
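The same arithmetic as a small Python sketch, using the divisors from the table on this slide. The `OVERHEAD` dictionary and the `usable_tb` function are hypothetical names introduced for illustration; only the factors and the 12TB example come from the slide.

```python
# Raw-to-usable divisors from the slide, keyed by (RAID type, replica count).
# RAID10 with 3-way replication is listed as n/a, so it is left out.
OVERHEAD = {
    ("RAID5", 2): 2.2,
    ("RAID5", 3): 3.3,
    ("RAID6", 2): 2.4,
    ("RAID6", 3): 3.6,
    ("RAID10", 2): 4.0,
}


def usable_tb(raw_tb, raid, replicas):
    """Net usable capacity in TB for a given raw capacity."""
    try:
        factor = OVERHEAD[(raid, replicas)]
    except KeyError:
        raise ValueError(f"no factor for {raid} with {replicas}-way replication")
    return raw_tb / factor


# Twelve 1TB disks, RAID6, 2-way replication (the example on the slide).
print(f"{usable_tb(12, 'RAID6', 2):.1f} TB usable")
```

Working backwards is just as simple: multiply the usable capacity a tier must offer by the same factor to get the raw capacity to procure.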

✦ IOPS scale linearly with VM count

✦ Limits should be seen as triggers for storage scale out

Performance Optimized Storage

[Charts: write latency and read latency]

Throughput Optimized Storage

✦ Throughput response matters

✦ The Read/Write mix matters

✦ Influenced by RAID type


Storage Planning

● Step 0: What is my Cloud Storage offering?

● Capacity Based

● Performance (IOPS) Based

● Throughput (Bandwidth) Based

● Step 1: What Storage Tiers do I need?

● Capacity Optimized, Performance Optimized, Throughput Optimized

● Step 2: Storage Capacity Planning

● Workload projections

● Performance Observations, Metrics to be optimized, and Calculators

● Step 3: Procure and Deploy

● Step 4: Manage and Steer

● Schedulers

Networking for Elastic Clouds

Core Network

Solving resource contention for the Network

Throughput, Resiliency, Latency

Enterprise vs Cloud Fabric

Traditional Enterprise Topology

Modern Cloud Friendly Topology

Network diagrams referenced from http://cto.vmware.com/is-your-cloud-ready-for-big-data/

Network Elasticity is Required..

Elastic Cloud Resource Map

[Figure: grid of NODE and BLOCKSTORE units]

Because your cloud will grow..

Each unit here could be a server, or a rack of servers.

Core Fabric Requirements

OpenStack friendly networking features:

✦Availability and Resiliency (multi-path, per-flow routing)

✦Resource Node (compute/storage) Data Throughput

✦Network Latency

✦Congestion Management

Spine and Leaf Topology

Ask your friendly network vendor for guidance

Cisco, ARISTA, Brocade, Juniper, Force10, etc.

http://bradhedlund.com/2012/01/25/construct-a-leaf-spine-design-with-40g-or-10g-an-observation-in-scaling-the-fabric/

Putting it All Together

Remember our Hero!

Plan for the Resource Service Level

✦ Compute/Storage

✦ Network Fabric

✦ Cloud Controller

High level architecture

✦ Deterministic Network

✦ OpenStack Core Services

✦ Deterministic Resources: General Purpose Compute, Performance Storage, General (Capacity) Storage

Scale Out (as needed)

Questions?

Resources

✦ https://github.com/noslzzp/cloud-resource-calculator

✦ What is DevOps? http://oreil.ly/1jBcsAu - free!

Open source tools include:

✦ Graphite

✦ Ganglia

Public Cloud Benchmarks

✦ Cloudharmony.com

✦ Cloudsleuth.com (Global Provider View)

Thank You!

Check out these sessions!

Red Hat Enterprise Linux OpenStack Platform High Availability
Arthur Berezin, Technical Product Manager, Red Hat
Wednesday, April 16, 2:30 pm - 3:30 pm

Deploying Red Hat Enterprise Linux OpenStack Platform in the enterprise with FlexPod
Arthur Enright, Field Product Manager, Red Hat; NetApp and Cisco
Wednesday, April 16, 3:40 pm - 4:40 pm

Deep dive: OpenStack Compute
Steve Gordon, Technical Product Manager, Red Hat
Thursday, April 17, 9:45 am - 10:45 am
