CloudStack Performance Testing: Challenges, Learning, Results By Sowmya Krishnan
CloudStack Performance Testing: Challenges, Learning, Results
Sowmya Krishnan
Apache CloudStack
• Turnkey platform for delivering IaaS clouds
• Highly scalable
• Flexible
– Hypervisor agnostic
– Network topologies
– Multiple storage options
– Scales to tens of thousands of hypervisors
Typical Deployment
[Diagram. Labels: Internet, Admin, Load Balancer, Management Server Cluster, Primary MySQL, Backup MySQL, Router, L3 Core Switch, Top of Rack Switch, Servers, Availability Zone 1, Pod 1, Pod 2, Pod 3 … Pod N, Object Storage]
Challenge of creating a production-scale performance setup
What we want…
• 30,000 hosts
• 100s of Storage nodes
• High Network bandwidth
• High end switches
• …
What we have in a QA Lab…
• Handful of servers
• One/Two Storage pools
• Shared switch
Scalability Challenges
• Continuous monitoring of the infrastructure
– Hosts up, VMs up, Routers, Capacity…
• Orchestration of large number of resources
– Clustered management servers
– Outage of a hypervisor host
– Outage of a management server
– Outage of network
Solution?
SIMULATOR
Simulator
[Diagram. Labels: User API, Admin API, Load Balancer, Mgmt. Server (x4), MySQL (x2), Zone Simulator]
Simulator
• Hypervisor type in itself
• Implements commonly used commands of hypervisor host
– Start/Stop
– List VMs in the host
– Program Security Group Rules
– Add host to maintenance mode
• Deploy Simulator VMs
– Using a Simulator Template
– Start/Stop
– Migrate
– Volume attached
– Take snapshot of the volume
• Storage pools
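To make the idea of a simulated hypervisor concrete, here is a minimal sketch of a host that keeps VM state in memory and answers a few of the commands listed above. It is purely illustrative: the class name `SimulatedHost` and its methods are hypothetical and are not taken from the CloudStack simulator code.

```python
from dataclasses import dataclass, field


@dataclass
class SimulatedHost:
    """Hypothetical stand-in for a hypervisor host; all state lives in memory."""
    name: str
    in_maintenance: bool = False
    vms: dict = field(default_factory=dict)  # VM name -> "Running" or "Stopped"

    def start_vm(self, vm_name: str) -> str:
        # A host in maintenance mode refuses to start new VMs.
        if self.in_maintenance:
            raise RuntimeError(f"{self.name} is in maintenance mode")
        self.vms[vm_name] = "Running"
        return self.vms[vm_name]

    def stop_vm(self, vm_name: str) -> str:
        if vm_name not in self.vms:
            raise KeyError(vm_name)
        self.vms[vm_name] = "Stopped"
        return self.vms[vm_name]

    def list_vms(self) -> dict:
        # Return a copy so callers cannot mutate the host's state.
        return dict(self.vms)

    def enter_maintenance(self) -> None:
        self.in_maintenance = True


host = SimulatedHost("sim-host-1")
host.start_vm("vm-1")
host.start_vm("vm-2")
host.stop_vm("vm-2")
print(host.list_vms())  # {'vm-1': 'Running', 'vm-2': 'Stopped'}
```

Because no real hypervisor is involved, many such hosts can be created on one machine, which is what allows a small lab to exercise the management server at a much larger scale.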
Test Setup Configurations
• Management server hosts
– Dual core Intel(R) Xeon(R) CPU, 2.27GHz, ht enabled, 4 processor
– 16 G RAM
• Database
– Quad-Core AMD Opteron(tm), 2.1GHz, ht enabled, 8 processor
– 32 G RAM
• Java heap size – (5 – 12) GB [Depending on use case]
Demo
Router Check interval
Redundant Virtual Router setup
4000 Accounts => 8000 RVRs
RouterStatusMonitor task runs for all routers to monitor each router's status
Default check interval = 5 mins
After about an hour, the interval between successive checks shot up to 56 mins
Fix: Introduce a new global config to configure the pool size for router checks
Pool size = 100
8000 routers / 100 = 80 batches
80 batches * 3 s (to perform one RVR check) = 240 s (4 min) to complete all RVR check tasks
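The arithmetic above can be reproduced with a short calculation. This is a sketch under the slide's own numbers (8000 routers, pool size 100, about 3 s per check); the function name `rvr_check_duration` is hypothetical, and the model assumes checks within a batch run fully in parallel.

```python
import math


def rvr_check_duration(num_routers: int, pool_size: int, seconds_per_check: float) -> float:
    """Estimated time for one full pass over all routers.

    Assumes pool_size checks run in parallel and each takes seconds_per_check.
    """
    batches = math.ceil(num_routers / pool_size)
    return batches * seconds_per_check


DEFAULT_INTERVAL_S = 5 * 60  # default check interval from the slide

duration = rvr_check_duration(num_routers=8000, pool_size=100, seconds_per_check=3)
print(duration)                        # 240
print(duration <= DEFAULT_INTERVAL_S)  # True: one pass fits inside the 5 min interval
```

With these inputs a full pass takes 240 s, which is less than the 300 s default interval, so successive checks no longer fall behind.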
Router Check interval
router.check.poolsize
Async Job Response Time
As the number of VMs increases, does the time taken to deploy instances increase?
Not always
If the network id is not passed, CS spends significant time searching for the network to place the VM
After Fix:
Deploy VM time
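As an illustration of passing the network explicitly, the sketch below builds the query string for a `deployVirtualMachine` call that includes the `networkids` parameter. The helper name `build_deploy_vm_query` and all id values are placeholders; request signing and the HTTP call are deliberately left out.

```python
from typing import Optional
from urllib.parse import urlencode


def build_deploy_vm_query(zone_id: str, template_id: str, offering_id: str,
                          network_id: Optional[str] = None) -> str:
    """Build an unsigned deployVirtualMachine query string (illustrative only)."""
    params = {
        "command": "deployVirtualMachine",
        "zoneid": zone_id,
        "templateid": template_id,
        "serviceofferingid": offering_id,
        "response": "json",
    }
    # Naming the network up front spares the server from searching for one.
    if network_id is not None:
        params["networkids"] = network_id
    return urlencode(sorted(params.items()))


# All ids below are placeholders, not real resources.
query = build_deploy_vm_query("zone-1", "tmpl-1", "offering-1", network_id="net-1")
print(query)
```

A performance test script that deploys many VMs would reuse one known network id in every request rather than leaving the choice to the management server.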
Baselines For common Use cases
• CPU Utilization
• DB connections
• Steady State Measures – when no external API is being fired
• Time taken for Async job [Deploy VM] to complete
• MS failures and time to reconnect all hosts
• List API response time, with varied page size parameters
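One way to collect the list API baseline is a small timing harness that calls the same list function with different page sizes. The sketch below is an assumption about how such a measurement could be structured: `time_list_call` is a hypothetical helper, and `fake_list_vms` is a stand-in that would be replaced by a real client call taking `page` and `pagesize`.

```python
import time


def time_list_call(list_fn, page_sizes, repeats=3):
    """Return the best observed wall-clock time per page size, in seconds."""
    results = {}
    for size in page_sizes:
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            list_fn(page=1, pagesize=size)
            samples.append(time.perf_counter() - start)
        # Keep the minimum to reduce noise from unrelated system activity.
        results[size] = min(samples)
    return results


def fake_list_vms(page, pagesize):
    # Stand-in for a real list API call; returns pagesize dummy records.
    return [{"id": i} for i in range(pagesize)]


timings = time_list_call(fake_list_vms, page_sizes=[50, 100, 500])
for size, seconds in timings.items():
    print(f"pagesize={size}: {seconds:.6f} s")
```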
Future work
• Enhance Simulator Features
• VR performance
• Hypervisor specific agents
• Storage Metrics
• …
mvn install -Dsimulator
cp agent-simulator/tomcatconf/components-simulator.xml.in client//target/cloud-client-ui-4.1.0-SNAPSHOT/WEB-INF/classes/components-simulator.xml
Edit client//target/cloud-client-ui-4.1.0-SNAPSHOT/WEB-INF/classes/environment.properties to point to the new xml
mvn -P developer -pl developer -Ddeploydb
mvn -P developer -pl developer -Ddeploydb-simulator
mvn -pl :cloud-client-ui jetty:run -Dsimulator
Q&A