CloudStack Performance Testing

CloudStack Performance Testing: Challenges, Learning, Results By Sowmya Krishnan

Apache CloudStack

• Turnkey platform for delivering IaaS clouds
• Highly scalable
• Flexible
  – Hypervisor agnostic
  – Network topologies
  – Multiple storage options
  – Scales to tens of thousands of hypervisors

Typical Deployment

[Figure: deployment diagram showing Internet, Admin, Router, Load Balancer, Management Server Cluster, Primary MySQL, Backup MySQL, L3 Core Switch, Top of Rack Switch, Servers, Pods 1 to N within Availability Zone 1, and Object Storage]

Challenge of creating a production-scale performance setup

What we want…
• 30,000 hosts
• 100s of storage nodes
• High network bandwidth
• High-end switches
• …

What we have in a QA lab…
• Handful of servers
• One/two storage pools
• Shared switch

Scalability Challenges

• Continuous monitoring of the infrastructure
  – Hosts up, VMs up, Routers, Capacity…
• Orchestration of a large number of resources
  – Clustered management servers
  – Outage of a hypervisor host
  – Outage of a management server
  – Outage of network

Solution?

SIMULATOR

Simulator

[Figure: simulator test setup showing User API, Admin API, Load Balancer, Management Servers, MySQL, and Zone Simulator]

Simulator

• Hypervisor type in itself
• Implements commonly used commands of a hypervisor host
  – Start/Stop
  – List VMs in the host
  – Program Security Group Rules
  – Add host to maintenance mode
• Deploy Simulator VMs
  – Using a Simulator Template
  – Start/Stop
  – Migrate
  – Volume attached
  – Take snapshot of the volume
• Storage pools
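To make the idea of a simulated hypervisor concrete, here is a minimal sketch in Python of an in-memory host that answers a few of the commands listed above. It is not the CloudStack simulator's code; the class name `SimulatedHost`, its methods, and the state strings are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SimulatedHost:
    """Hypothetical in-memory stand-in for a hypervisor host (illustration only)."""
    name: str
    vms: dict = field(default_factory=dict)  # VM name -> "Running" or "Stopped"
    in_maintenance: bool = False

    def start_vm(self, vm_name: str) -> str:
        # No real VM is booted; only the recorded state changes.
        if self.in_maintenance:
            raise RuntimeError(f"{self.name} is in maintenance mode")
        self.vms[vm_name] = "Running"
        return self.vms[vm_name]

    def stop_vm(self, vm_name: str) -> str:
        if vm_name not in self.vms:
            raise KeyError(vm_name)
        self.vms[vm_name] = "Stopped"
        return self.vms[vm_name]

    def list_vms(self) -> dict:
        # Return a copy so callers cannot mutate the host's state.
        return dict(self.vms)

    def enter_maintenance(self) -> None:
        self.in_maintenance = True


host = SimulatedHost("sim-host-1")
host.start_vm("vm-1")
host.start_vm("vm-2")
host.stop_vm("vm-2")
print(host.list_vms())
```

Because every command only updates a dictionary, thousands of such hosts can be registered on a handful of physical servers, which is the property the deck relies on.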

Test Setup Configurations

• Management server hosts
  – Dual-core Intel(R) Xeon(R) CPU, 2.27 GHz, HT enabled, 4 processors
  – 16 GB RAM
• Database
  – Quad-core AMD Opteron(tm), 2.1 GHz, HT enabled, 8 processors
  – 32 GB RAM
• Java heap size
  – 5 to 12 GB [depending on use case]

Demo

Router Check interval

Redundant Virtual Router setup

4000 Accounts => 8000 RVRs

RouterStatusMonitor task runs for all routers to monitor each router's status
Default check interval = 5 mins

After about an hour, the interval between successive checks shot up to 56 mins

Fix: Introduce a new global config to configure the pool size for router checks

Pool size = 100
8000 routers / 100 = 80 batches * 3 s (to perform each RVR check) = 240 s (4 min) to complete all RVR check tasks
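The arithmetic above can be reproduced with a short Python sketch. It assumes that all checks within one batch of the pool run fully in parallel and that each check takes the 3 s quoted on the slide; the function name `sweep_seconds` is made up for this example.

```python
import math


def sweep_seconds(routers: int, pool_size: int, seconds_per_check: float) -> float:
    """Estimate the time to check every router once.

    Assumes the pool processes `pool_size` routers at a time and that
    each batch takes `seconds_per_check` (an idealized model).
    """
    batches = math.ceil(routers / pool_size)
    return batches * seconds_per_check


routers = 4000 * 2  # 4000 accounts, one redundant router pair each
total = sweep_seconds(routers, pool_size=100, seconds_per_check=3)
print(total, "seconds =", total / 60, "minutes")  # 240 seconds = 4.0 minutes
```

Under this model a full sweep finishes inside the default 5-minute check interval, so successive checks no longer queue up behind one another.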

[Chart: router check interval plotted against router.check.poolsize]

Async Job Response Time

As the number of VMs increases, does the time taken to deploy instances increase?
• Not always
• If the network ID is not passed, CloudStack spends significant time searching for a network in which to place the VM

After fix:

[Chart: Deploy VM time]
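As an illustration of passing the network explicitly, the sketch below builds the query string for a `deployVirtualMachine` call with and without the `networkids` parameter. The IDs are placeholders, the helper name `deploy_vm_query` is invented for this example, and the result is unsigned: a real request would also need an API key and signature, which are omitted here.

```python
from urllib.parse import urlencode


def deploy_vm_query(zone_id, template_id, offering_id, network_id=None):
    """Build an unsigned deployVirtualMachine query string (illustration only)."""
    params = {
        "command": "deployVirtualMachine",
        "response": "json",
        "zoneid": zone_id,
        "templateid": template_id,
        "serviceofferingid": offering_id,
    }
    if network_id is not None:
        # Naming the network up front spares the server the network search.
        params["networkids"] = network_id
    return urlencode(sorted(params.items()))


# Placeholder IDs; substitute values from your own deployment.
print(deploy_vm_query("zone-id", "template-id", "offering-id"))
print(deploy_vm_query("zone-id", "template-id", "offering-id", "network-id"))
```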

Baselines for common use cases

• CPU utilization
• DB connections
• Steady-state measures (when no external API is being fired)
• Time taken for async job [Deploy VM] to complete
• MS failures and time to reconnect all hosts
• List API response time, with varied page size parameters
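A baseline such as "list API response time with varied page size" can be collected with a small timing harness. The sketch below is one possible shape, not the harness used for these results: `time_list_calls` and `fake_list` are hypothetical names, and `fake_list` stands in for a real client call that accepts `page` and `pagesize`.

```python
import time
from statistics import mean


def time_list_calls(list_fn, page_sizes, repeats=3):
    """Return the mean wall-clock seconds per call for each page size."""
    results = {}
    for size in page_sizes:
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            list_fn(page=1, pagesize=size)
            samples.append(time.perf_counter() - start)
        results[size] = mean(samples)
    return results


def fake_list(page, pagesize):
    # Stand-in for a real list API call; builds a response of `pagesize` items.
    return [{"id": i} for i in range(pagesize)]


timings = time_list_calls(fake_list, [50, 500, 5000])
for size, secs in timings.items():
    print(f"pagesize={size}: {secs * 1000:.3f} ms")
```

Repeating each measurement and reporting the mean smooths out one-off spikes; for a real baseline the same run would be repeated as the number of VMs grows.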

Future work

• Enhance Simulator features
• VR performance
• Hypervisor-specific agents
• Storage metrics
• …

mvn install -Dsimulator
cp agent-simulator/tomcatconf/components-simulator.xml.in client//target/cloud-client-ui-4.1.0-SNAPSHOT/WEB-INF/classes/components-simulator.xml
Edit client//target/cloud-client-ui-4.1.0-SNAPSHOT/WEB-INF/classes/environment.properties to point to the new xml
mvn -P developer -pl developer -Ddeploydb
mvn -P developer -pl developer -Ddeploydb-simulator
mvn -pl :cloud-client-ui jetty:run -Dsimulator

Q&A
