OpenStack benchmarking tool: Rally shows you how to detect OpenStack bottlenecks and design issues.
Benchmarking OpenStack [at scale]
Boris Pavlovic, Mirantis, 2013
Agenda
● Benchmarking OpenStack at scale
○ What? Why? How?
● Rally
○ What is Rally?
○ Vision
○ Examples and results
● How to ensure that OpenStack works at scale?
● How to detect performance issues quickly and improve OpenStack scalability?
Benchmarking OpenStack
● Generate load from concurrent users
● Capture key metrics: avg/max time, failure rate
○ VM provisioning
○ Floating IP allocation
○ Snapshot creation
● Verify that the cloud works fine
...
● PROFIT!!!
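The metrics above reduce to a small aggregation over per-request samples. A minimal sketch of that aggregation, with illustrative names (this is not Rally's API):

```python
# Hypothetical sketch: summarize benchmark samples into the metrics the
# slide names (average time, maximum time, failure rate).

def summarize(samples):
    """samples: list of (duration_seconds, succeeded) tuples."""
    durations = [d for d, ok in samples if ok]
    failures = sum(1 for _, ok in samples if not ok)
    return {
        "avg": sum(durations) / len(durations) if durations else None,
        "max": max(durations) if durations else None,
        "failure_rate": failures / len(samples),
    }

# e.g. three successful boots and one timeout
stats = summarize([(4.2, True), (5.8, True), (30.0, False), (5.0, True)])
```

Failed requests are excluded from the timing stats but counted in the failure rate, so one hung provision call cannot hide inside an inflated average.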
A straightforward way to benchmark OpenStack

… but what if it breaks apart?
Incorrect deployment setup?
Non-optimal hardware?
Bug in the code?
Did you take enough time to educate yourself? ;)
RTFM
Really?
Read the docs… (after an hour)
There should be an easier way…
Improve OS cloud performance and scalability
● 3 common approaches:
○ Use better hardware
○ Deploy better
○ Make the code better
● But we need data points:
○ Which part of the code is the bottleneck?
○ What hardware limits are hit, if any?
○ How does deployment topology influence performance?
RALLY
Shine a light in the darkness
What is Rally?
● Rally is a community-based project that lets OpenStack developers and operators get relevant, repeatable benchmarking data on how their cloud operates at scale.
● Wiki: https://wiki.openstack.org/wiki/Rally
Relevant to both devs and operators
● Different types of user-defined workloads
○ For developers: synthetic tests, stress tests
○ For operators: real-life cloud usage patterns
● Flexible reporting
○ For developers: low-level profiling data, bottlenecks
○ For operators: high-level data about cloud performance, highlights of bottlenecks within their use case
How Rally works
● Deploy the OpenStack cloud with a pluggable deploy engine: DevStack, Fuel, Dummy, …
● Server providers supply the machines: Virsh, OpenStack, LXC, Amazon, …
● Run the specified scenarios, parameterized by: number of users, number of tenants, concurrency, type of workload, duration
● Get results: execution time breakdown, failure rates, graphics, profiling data
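As an illustration, a Rally task definition of roughly this shape ties the pieces together: a scenario name, its arguments, a runner with the load parameters, and a context describing users and tenants. The field names here follow later Rally releases ("runner", "context"); this is a sketch, not the exact 2013 syntax.

```json
{
  "NovaServers.boot_and_delete_server": [
    {
      "args": {
        "flavor": {"name": "m1.tiny"},
        "image": {"name": "cirros"}
      },
      "runner": {"type": "constant", "times": 100, "concurrency": 10},
      "context": {"users": {"tenants": 5, "users_per_tenant": 2}}
    }
  ]
}
```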
Benchmarking scenarios
[Diagram: real-life and synthetic workloads (Workload 1, 2, 3) run against the OpenStack cloud and produce results]
● Data for developers: low-level profiling, Tomograph results, graphs
● Data for stakeholders: historical data, SLAs, bottlenecks
Synthetic tests for developers
● Put stress on various OpenStack components
○ Large number of provisioned VMs per second
○ Large number of provisioned volumes per second
○ Large number of uploaded images per second
○ Large number of active resources (VMs/images/volumes)
● Expose bottlenecks and uncover design issues in OpenStack
● Create a golden standard for everyone in the community to validate against
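A stress test of this kind boils down to firing many provisioning calls with bounded concurrency and counting failures. A minimal driver sketch, where `boot_vm` is a hypothetical stand-in for a real OpenStack client call (not Rally's API):

```python
# Minimal stress-driver sketch: issue `times` requests with at most
# `concurrency` in flight at once, then count failures.
from concurrent.futures import ThreadPoolExecutor

def boot_vm(i):
    """Stand-in for a real 'boot server' API call (hypothetical)."""
    return True  # pretend the provision succeeded

def stress(times, concurrency, action=boot_vm):
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(action, range(times)))
    return {"requests": times, "failures": results.count(False)}

report = stress(times=50, concurrency=8)
```

Sweeping `times` and `concurrency` upward until failure rates or latencies degrade is exactly how the "VMs per second" limits above get located.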
How did we deploy OpenStack?
● Using Fuel
● On real hardware
● 3 physical controllers
● 500+ physical compute nodes
● In HA deployment mode with Galera, HAProxy, Corosync, Pacemaker
Large number of active VMs
[Chart: a large number of active VMs shouldn't affect provisioning of new VMs]

Large number of concurrent users
[Chart: average time of booting and deleting VMs with different numbers of concurrent users]
Profiling with Tomograph and Zipkin
Highlights:
● Launch 3 VMs
○ 336 DB queries
○ 74 RPC calls
● Delete 3 VMs under high load
○ 1-minute global DB lock on the quotas table
Why real workloads in addition to synthetic?
● Rationale
○ In the real world, scenarios are more complicated than an immediate "boot-destroy"
○ Workloads rarely change; OpenStack and its topology/configuration change often
○ Profiles are specific to each business
● Expected outcome
○ Let companies specify their existing workload and benchmark the cloud against it
○ Let companies share their workloads
What to benchmark
1. Provision VMs
2. Use VMs
3. Destroy VMs
For each measured step: How long (on average)? How long (maximum)? Success rate?
Detailed benchmark of each step
Provision VMs → Use VMs → Destroy VMs
● Provision path: nova-api (1s), nova-db (2s), scheduler (9s), compute (4s), network (8s), glance (2m)
● Destroy path: nova-api (1s), nova-db (2s), compute (9s), network (4s), nova-db (8s)
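Assuming the in-order mapping of the slide's step times to components, the provisioning breakdown can be folded into an end-to-end total and a "biggest area of concern" in a few lines:

```python
# Per-step timings (seconds) from the slide's provisioning breakdown,
# assuming times map to components in the order shown.
steps = {
    "nova-api": 1, "nova-db": 2, "scheduler": 9,
    "compute": 4, "network": 8, "glance": 120,  # 2 minutes = 120 s
}
total_s = sum(steps.values())        # end-to-end provisioning time
slowest = max(steps, key=steps.get)  # the biggest area of concern
```

Under that mapping, glance dominates the total, which is the kind of per-component conclusion this breakdown is meant to surface.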
Another workload representation
What it shows
● Areas of biggest concern
● A baseline for all future changes (OpenStack version, deployment topology, Neutron plugin)
What we ultimately want to achieve
● Provide a mechanism to easily define workloads
● Let users benchmark their cloud within specified workload
● Provide historical data on all applied optimizations to see whether they lead to better performance
Roadmap
● Greatly improve profiling capabilities to quickly pinpoint problem locations
● Extend workload definitions to support richer and more realistic tests; combine workloads
● Support historical data and provide means of comparison/analytics
● Better correlation between business KPIs and reporting
Join Rally community
● It’s up to you to make Rally better
● Join our team:
○ Wiki: https://wiki.openstack.org/wiki/Rally
○ Project space: https://launchpad.net/rally
○ IRC chat: #openstack-rally on irc.freenode.net