[at scale] OpenStack Benchmarking
Boris Pavlovic, Mirantis, 2013

Rally--OpenStack Benchmarking at Scale

DESCRIPTION

Rally, an OpenStack benchmarking tool, shows you how to detect OpenStack bottlenecks and design issues.

Agenda

● Benchmarking OpenStack at scale
  ○ What? Why? How?

● Rally
  ○ What is Rally?
  ○ Vision
  ○ Examples and results

Benchmarking OpenStack

● How to ensure that OpenStack works at scale?

● How to detect performance issues quickly and improve OpenStack scalability?

A straightforward way to benchmark OpenStack

● Generate load from concurrent users
● Capture key metrics (avg/max time, failure rate) for:
  ○ VM provisioning
  ○ Floating IP allocation
  ○ Snapshot creation
● Verify that the cloud works fine

...

● PROFIT!!!

… but what if it breaks apart?
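The straightforward approach on this slide can be sketched in Python. It generates load from concurrent users, then captures average and maximum time plus the failure rate. This is a minimal illustration under stated assumptions, not Rally's implementation. `run_load`, `LoadResult` and `fake_boot_vm` are hypothetical names. `fake_boot_vm` only sleeps and fails every fifth call, standing in for a real OpenStack request such as VM provisioning.

```python
import statistics
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass
class LoadResult:
    avg_time: float      # mean duration of successful calls, in seconds
    max_time: float      # slowest successful call, in seconds
    failure_rate: float  # fraction of calls that raised an exception


def _timed_call(action: Callable[[], None]) -> tuple[bool, float]:
    """Run one action and return (succeeded, elapsed_seconds)."""
    start = time.perf_counter()
    try:
        action()
        ok = True
    except Exception:
        ok = False
    return ok, time.perf_counter() - start


def run_load(action: Callable[[], None], times: int, concurrency: int) -> LoadResult:
    """Hypothetical helper: call `action` `times` times from `concurrency` parallel workers."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        outcomes = list(pool.map(lambda _: _timed_call(action), range(times)))
    durations = [elapsed for ok, elapsed in outcomes if ok]
    failures = sum(1 for ok, _ in outcomes if not ok)
    return LoadResult(
        avg_time=statistics.mean(durations) if durations else 0.0,
        max_time=max(durations, default=0.0),
        failure_rate=failures / times,
    )


# Hypothetical stand-in for a real request such as booting a VM:
# it sleeps for 10 ms, and every 5th call fails.
_calls = 0
_calls_lock = threading.Lock()


def fake_boot_vm() -> None:
    global _calls
    with _calls_lock:
        _calls += 1
        n = _calls
    time.sleep(0.01)
    if n % 5 == 0:
        raise RuntimeError("simulated boot failure")


if __name__ == "__main__":
    print(run_load(fake_boot_vm, times=20, concurrency=4))
```

This sketch excludes failed calls from the timing statistics, so fast failures cannot make the cloud look faster than it is. Failures are reported only through `failure_rate`.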

Incorrect deployment setup?

Non-optimal hardware?

Bug in the code?

Did you take enough time to educate yourself?

;)

RTFM

Really?

Read the docs… (after an hour)

There should be an

Improve OpenStack cloud performance and scalability

● 3 common approaches:
  ○ Use better hardware
  ○ Deploy better
  ○ Make the code better

● But we need data points:
  ○ Which part of the code is a bottleneck?
  ○ Which hardware limits are hit, if any?
  ○ How does the deployment topology influence performance?

RALLY

Shine a light in the darkness

What is Rally?

● Rally is a community-based project that allows OpenStack developers and operators to get relevant and repeatable benchmarking data on how their cloud operates at scale.

● Wiki: https://wiki.openstack.org/wiki/Rally

Relevant to both devs and operators

● Different types of user-defined workloads
  ○ For developers: synthetic tests, stress tests
  ○ For operators: real-life cloud usage patterns

● Flexible reporting
  ○ For developers: low-level profiling data, bottlenecks
  ○ For operators: high-level data about cloud performance, highlights of bottlenecks within their use case

How Rally works

● Deploy OpenStack cloud
  ○ Deploy engines: DevStack, Fuel, Dummy
  ○ Server providers: Virsh, OpenStack, LXC, Amazon

● Run specified scenarios
  ○ Parameters: number of users, number of tenants, concurrency, type of workload, duration

● Get results
  ○ Execution time breakdown
  ○ Failure rates
  ○ Graphics
  ○ Profiling data
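The three stages above can be modelled as a small pipeline: deploy a cloud, run the specified scenarios with the listed parameters, then get results. The sketch below assumes nothing about Rally's real classes or task format. `TaskParameters`, `DeployEngine`, `PreDeployedCloud`, `benchmark` and `echo_scenario` are hypothetical names, and the endpoint URL is a placeholder. The parameters simply mirror the slide: users, tenants, concurrency, workload type and duration.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass(frozen=True)
class TaskParameters:
    """Hypothetical container for the run parameters listed on the slide."""
    users: int
    tenants: int
    concurrency: int
    workload: str
    duration_s: float

    def __post_init__(self) -> None:
        # Reject obviously invalid settings before any load is generated.
        for name in ("users", "tenants", "concurrency"):
            if getattr(self, name) < 1:
                raise ValueError(f"{name} must be >= 1")
        if self.duration_s <= 0:
            raise ValueError("duration_s must be positive")


class DeployEngine(Protocol):
    """Anything that can provide the endpoint of a ready-to-use cloud."""

    def deploy(self) -> str: ...


class PreDeployedCloud:
    """Hypothetical engine that skips deployment and returns an existing endpoint."""

    def __init__(self, endpoint: str) -> None:
        self.endpoint = endpoint

    def deploy(self) -> str:
        return self.endpoint


# A scenario receives the cloud endpoint and the parameters and returns raw results.
Scenario = Callable[[str, TaskParameters], dict]


def benchmark(engine: DeployEngine, scenario: Scenario, params: TaskParameters) -> dict:
    """Deploy (or reuse) a cloud, run one scenario against it, and return its results."""
    endpoint = engine.deploy()
    return scenario(endpoint, params)


def echo_scenario(endpoint: str, params: TaskParameters) -> dict:
    # A real scenario would issue API calls against `endpoint`; this one only echoes its inputs.
    return {"endpoint": endpoint, "workload": params.workload, "concurrency": params.concurrency}


if __name__ == "__main__":
    params = TaskParameters(users=10, tenants=2, concurrency=5,
                            workload="boot_and_delete", duration_s=60)
    print(benchmark(PreDeployedCloud("http://cloud.example:5000"), echo_scenario, params))
```

Putting the deploy engine behind a small interface lets the same scenario run against clouds deployed in different ways. Only a trivial engine that reuses an existing endpoint is provided here.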

Benchmarking scenarios

● Real-life workloads and synthetic workloads (Workload 1, Workload 2, Workload 3) are run against the OpenStack cloud

● Results
  ○ Data for developers: low-level profiling, Tomograph results, graphs
  ○ Data for stakeholders: historical data, SLAs, bottlenecks

Synthetic tests for developers

● Put stress on various OpenStack components
  ○ Large number of provisioned VMs per second
  ○ Large number of provisioned volumes per second
  ○ Large number of uploaded images per second
  ○ Large number of active resources (VMs/images/volumes)

● Expose bottlenecks and uncover design issues in OpenStack

● Create a gold standard for everyone in the community to validate against
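The stress targets above are phrased as rates: VMs, volumes or images per second. The sketch below shows one way to generate such load, assuming an open-loop model in which requests start on a fixed schedule regardless of how long earlier ones take. `run_at_rate` and `fake_create_volume` are hypothetical names, and the fake action only sleeps.

```python
import threading
import time
from typing import Callable


def run_at_rate(action: Callable[[], None], rate_per_s: float, total: int) -> float:
    """Hypothetical helper: start `total` calls of `action` at a fixed rate (open-loop load).

    Each call runs in its own thread, so a slow call does not delay the next launch.
    Returns the wall-clock seconds from the first launch until every call has finished.
    """
    if rate_per_s <= 0:
        raise ValueError("rate_per_s must be positive")
    interval = 1.0 / rate_per_s
    threads = []
    start = time.monotonic()
    for i in range(total):
        # Wait until the scheduled start time of call i (start + i * interval).
        delay = start + i * interval - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        thread = threading.Thread(target=action)
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()
    return time.monotonic() - start


if __name__ == "__main__":
    launched = []
    lock = threading.Lock()

    def fake_create_volume() -> None:
        # Hypothetical stand-in for a volume-creation request.
        with lock:
            launched.append(time.monotonic())
        time.sleep(0.05)

    elapsed = run_at_rate(fake_create_volume, rate_per_s=20, total=10)
    print(f"{len(launched)} calls in {elapsed:.2f}s")
```

A fixed pool of concurrent workers slows its request rate when the cloud slows down. This sketch instead keeps the arrival rate constant, so a bottleneck shows up as a growing number of in-flight requests. Starting one thread per call is only reasonable for small totals, and a real load generator would bound it.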

How did we deploy OpenStack?

● Using Fuel
● On real hardware
● 3 physical controllers
● 500+ physical compute nodes
● In HA deployment mode with Galera, HAProxy, Corosync, Pacemaker

Large number of active VMs

Large numbers of active VMs shouldn't affect provisioning of new VMs
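One way to check the claim on this slide is to measure boot time at several levels of active VMs and fit a least-squares line. A slope near zero means new provisioning is unaffected. The sketch below is an assumption about how such a check could look, not Rally's method. `provisioning_slope`, `is_unaffected` and the 0.001 s-per-VM tolerance are hypothetical.

```python
def provisioning_slope(active_vms: list[int], boot_times_s: list[float]) -> float:
    """Least-squares slope: extra seconds of boot time per additional active VM."""
    n = len(active_vms)
    if n != len(boot_times_s) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_x = sum(active_vms) / n
    mean_y = sum(boot_times_s) / n
    sxx = sum((x - mean_x) ** 2 for x in active_vms)
    if sxx == 0:
        raise ValueError("active VM counts must vary")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(active_vms, boot_times_s))
    return sxy / sxx


def is_unaffected(active_vms: list[int], boot_times_s: list[float],
                  tolerance_s_per_vm: float = 0.001) -> bool:
    """True if boot time changes by at most `tolerance_s_per_vm` per extra active VM.

    The default tolerance is an arbitrary placeholder, not a recommended value.
    """
    return abs(provisioning_slope(active_vms, boot_times_s)) <= tolerance_s_per_vm
```

A fitted slope is more robust than comparing two single measurements, because one slow boot at a high VM count does not decide the result on its own.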

Large number of concurrent users

Average time of booting and deleting VMs with different numbers of concurrent users

Profiling with Tomograph and Zipkin

Highlights:

● Launch 3 VMs
  ○ 336 DB queries
  ○ 74 RPC calls

● Delete 3 VMs under high load
  ○ 1-minute global DB lock on the quotas table
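Counts such as the ones above come from aggregating trace spans by type. The sketch below uses a hypothetical, simplified `Span` record rather than the actual Tomograph or Zipkin data model. `count_calls` and `per_unit` are illustrative helpers.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical, simplified trace span (not the Tomograph or Zipkin data model)."""
    service: str       # e.g. "nova-api"
    kind: str          # "db" or "rpc" in this sketch
    duration_ms: float


def count_calls(spans: list[Span]) -> Counter:
    """Count spans per kind, e.g. how many DB queries and RPC calls a request produced."""
    return Counter(span.kind for span in spans)


def per_unit(counts: Counter, units: int) -> dict[str, float]:
    """Normalize totals by the number of resources handled, e.g. VMs launched."""
    return {kind: n / units for kind, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical three-span trace, just to show the counting.
    trace = [Span("nova-api", "db", 1.2), Span("nova-conductor", "rpc", 3.4),
             Span("nova-api", "db", 0.8)]
    print(count_calls(trace))
    # Totals reported on the slide for launching 3 VMs.
    print(per_unit(Counter(db=336, rpc=74), units=3))
```

Normalizing the slide's totals by the 3 launched VMs gives 336 / 3 = 112 DB queries and 74 / 3 ≈ 24.7 RPC calls per VM launch.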

Why real workloads in addition to synthetic?

● Rationale
  ○ In the real world, scenarios are more complicated than "boot, then destroy immediately"
  ○ Workloads rarely change, whereas OpenStack and its topology/configuration change often
  ○ Profiles are business-specific

● Expected outcome
  ○ Let companies specify their existing workload and benchmark their cloud according to it
  ○ Let companies share
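Letting companies describe their existing workload could mean expressing it as a weighted mix of operations. A reproducible sequence drawn from that mix can then be replayed. This is only one possible representation, assumed for illustration. `WorkloadProfile`, `plan` and the operation names are hypothetical, and the seed makes the sequence repeatable across runs.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadProfile:
    """Hypothetical description of a real workload as a weighted mix of operations."""
    name: str
    mix: dict[str, float]  # operation name -> relative weight

    def plan(self, n_ops: int, seed: int = 0) -> list[str]:
        """Draw a reproducible sequence of `n_ops` operations according to the weights."""
        rng = random.Random(seed)  # private generator: same seed, same sequence
        ops = list(self.mix)
        weights = [self.mix[op] for op in ops]
        return rng.choices(ops, weights=weights, k=n_ops)


if __name__ == "__main__":
    # Hypothetical profile; the operation names and weights are made up.
    web_shop = WorkloadProfile(
        name="web-shop",
        mix={"boot_vm": 0.2, "attach_volume": 0.1, "create_snapshot": 0.05, "keep_vm_busy": 0.65},
    )
    print(web_shop.plan(10, seed=42))
```

A fixed seed means the same profile produces the same operation sequence after an upgrade or topology change. Any difference in results can then be attributed to the cloud rather than to the workload.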

What to benchmark

1. Provision VMs
2. Use VMs
3. Destroy VMs

● How long (on average)?
● How long (maximum)?
● Success rate?
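The questions above reduce to three numbers per phase. The following is a minimal sketch, assuming each attempt is recorded as a hypothetical `Sample` holding the phase, a success flag and the duration. `phase_summary` is an illustrative helper, not part of Rally.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One recorded attempt of a benchmark phase (hypothetical record format)."""
    phase: str      # "provision", "use" or "destroy" in this sketch
    ok: bool
    seconds: float


def phase_summary(samples: list[Sample]) -> dict[str, dict[str, float]]:
    """Average time, maximum time and success rate for each phase.

    Times are taken over successful attempts only; the success rate over all attempts.
    """
    by_phase: dict[str, list[Sample]] = defaultdict(list)
    for sample in samples:
        by_phase[sample.phase].append(sample)
    summary = {}
    for phase, group in by_phase.items():
        times = [s.seconds for s in group if s.ok]
        summary[phase] = {
            "avg_s": sum(times) / len(times) if times else math.nan,
            "max_s": max(times, default=math.nan),
            "success_rate": sum(s.ok for s in group) / len(group),
        }
    return summary
```

A phase with no successful attempts reports NaN times rather than zero, so it cannot be mistaken for an instantaneous step.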

Detailed benchmark of each step

Steps: Provision VMs, Use VMs, Destroy VMs

Provision VMs: nova-api 1s, nova-db 2s, scheduler 9s, compute 4s, network 8s, glance 2m
Destroy VMs: nova-api 1s, nova-db 2s, compute 9s, network 4s, nova-dd 8s
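A breakdown like this is most useful when each component's share of the total is visible. The sketch below uses the provisioning times from the slide, taking 2m as 120 s. Matching each value to its component by position in the slide layout is an assumption. `breakdown` is a hypothetical helper.

```python
def breakdown(step_seconds: dict[str, float]) -> list[tuple[str, float, float]]:
    """Return (component, seconds, share of total) rows, slowest component first."""
    total = sum(step_seconds.values())
    if total <= 0:
        raise ValueError("total time must be positive")
    rows = [(name, sec, sec / total) for name, sec in step_seconds.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Provisioning times as read from the slide (2m = 120 s).
provision = {"nova-api": 1, "nova-db": 2, "scheduler": 9, "compute": 4, "network": 8, "glance": 120}

if __name__ == "__main__":
    for name, sec, share in breakdown(provision):
        print(f"{name:<10} {sec:>5}s {share:6.1%}")
```

With these values the total is 1 + 2 + 9 + 4 + 8 + 120 = 144 s. Glance accounts for 120 s of it, about 83%.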

Another workload representation

What it shows

● Areas of biggest concern

● A baseline for all future changes (OpenStack version, deployment topology, Neutron plugin)
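Using a run as a baseline for later changes amounts to comparing per-step times between runs. The sketch below flags steps that slowed down by more than a relative tolerance. `regressions` and the 10% default are hypothetical choices, and the example numbers are illustrative, not measurements.

```python
def regressions(baseline: dict[str, float], current: dict[str, float],
                tolerance: float = 0.10) -> dict[str, float]:
    """Return {step: relative slowdown} for steps that got slower than `tolerance` allows.

    Steps missing from either run, or with a non-positive baseline time, are skipped.
    """
    flagged = {}
    for step, base in baseline.items():
        if step not in current or base <= 0:
            continue
        change = (current[step] - base) / base
        if change > tolerance:
            flagged[step] = change
    return flagged


if __name__ == "__main__":
    # Illustrative numbers only, not measurements.
    before = {"scheduler": 9.0, "network": 8.0, "glance": 120.0}
    after = {"scheduler": 9.5, "network": 12.0, "glance": 100.0}
    print(regressions(before, after))  # prints {'network': 0.5}
```

A relative threshold treats a 0.5 s slowdown on a 9 s step differently from a 0.5 s slowdown on a 1 s step. That suits per-component breakdowns whose magnitudes differ widely.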

What we ultimately want to achieve

● Provide a mechanism to easily define workloads

● Let users benchmark their cloud within a specified workload

● Provide historical data on all applied optimizations to see whether they lead to better performance

Roadmap

● Greatly improve profiling capabilities to quickly pinpoint problem locations

● Extend workload definitions to support richer and more realistic tests, and allow combining workloads

● Support historical data and provide means of comparison/analytics

● Better correlation between business KPIs and reporting

Join the Rally community

● It’s up to you to make Rally better

● Join our team:
  ○ Wiki: https://wiki.openstack.org/wiki/Rally
  ○ Project space: https://launchpad.net/rally
  ○ IRC chat: #openstack-rally on irc.freenode.net