
Page 1

Performance and Reliability 101

Brent Cromarty
Ping Identity

bcromarty@pingidentity.com

Page 2

A little about me

• Like
– Long walks on the beach
– Red wine

• Dislike
– Mean people
– Early mornings

• Questions are encouraged throughout the presentation
– Although I may hold off if I am going to address it with material in a later slide
– You may have to ask again

Page 3

OK, seriously…

• Spent the bulk of my career (14 years) at SAP
– By way of the Business Objects acquisition
• By way of the Crystal Decisions acquisition
– By way of Seagate Software IMG

• 5 years of experience in customer support
– Discovered impatience and a dislike for people

• 9 years of performance and reliability (P&R) testing for Crystal Reports in the Crystal/Business Objects Enterprise product

• Currently in my second year with Ping Identity

Page 4

Why are we here?

• Types of testing that make up P&R
– Design
• What is the goal of each test type?
• What does it prove or disprove?
– Execution
• How is the test run?
– Results Analysis
• How do you determine whether the test passed or failed?

• Best Practices (Tips/Tricks/Suggestions/Filler)
• Suggestions for root cause analysis

Page 5

So… What are these test types that I speak of?

• Types of P&R tests
– Load
– Scalability
– Endurance
– Stress
– Reliability

Page 6

Load

Page 7

Load Testing

• The performance equivalent of a functional “smoke” test
• A functional test/workflow executed under “load”
– Typically “load” is in the form of concurrent users
• Executed with a load generator tool
– LoadRunner, JMeter, QALoad, Grinder, etc.
• Does the component stand up?
– Does the test pass functionally? For all users?
– Does it crash? Does the system grind to a halt?
• Metrics to consider (see the sketch below)
– Response time (average, 90th percentile, min, max)
– Throughput
– CPU and memory utilization on the target system
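To make the metrics concrete, here is a minimal sketch of a load driver in plain Python, using only the standard library. The URL, user count, and request count are placeholder values; a real test would normally use one of the tools listed above. This just shows what "N concurrent users" and the per-request metrics boil down to.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, quantiles

TARGET_URL = "http://localhost:8080/health"  # placeholder endpoint
USERS = 25                                   # concurrent "users"
REQUESTS_PER_USER = 40

def one_user(_):
    """Issue a series of requests; return per-request latencies in seconds."""
    latencies = []
    for _ in range(REQUESTS_PER_USER):
        start = time.perf_counter()
        with urllib.request.urlopen(TARGET_URL, timeout=10) as resp:
            resp.read()  # drain the body so the timing covers the full response
        latencies.append(time.perf_counter() - start)
    return latencies

wall_start = time.perf_counter()
with ThreadPoolExecutor(max_workers=USERS) as pool:
    latencies = [t for user in pool.map(one_user, range(USERS)) for t in user]
elapsed = time.perf_counter() - wall_start

# The metrics called out above: response-time stats plus overall throughput.
print(f"requests:   {len(latencies)}")
print(f"average:    {mean(latencies):.4f} s")
print(f"p90:        {quantiles(latencies, n=10)[8]:.4f} s")
print(f"min / max:  {min(latencies):.4f} / {max(latencies):.4f} s")
print(f"throughput: {len(latencies) / elapsed:.1f} req/s")
```

Note that the client side only sees latency and throughput; CPU and memory utilization still have to be captured by monitoring on the target system itself.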

Page 8

Scalability

Page 9

Scalability Testing

• Executed as a series of load tests
– Workload scalability: vary the user load from test to test
– Resource scalability: vary the resources from test to test
• Functional success matters
– If the error rate is “too high”, the scalability results are meaningless
• How does performance change from test to test? (see the sketch below)
– Response time (average, 90th percentile, min, max)
– Throughput
– CPU and memory utilization on the target system
• Do not discount single-user performance
– A system can exhibit linear scalability but still perform poorly
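A sketch of the workload-scalability idea, building on the same kind of driver as before: rerun an identical load test while varying only the user count, then compare throughput and p90 latency across runs. The URL and the load levels are again placeholders.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from statistics import quantiles

TARGET_URL = "http://localhost:8080/health"  # placeholder endpoint

def run_load_test(users, requests_per_user=20):
    """One load test at a fixed concurrency; returns (throughput, p90 latency)."""
    def one_user(_):
        lats = []
        for _ in range(requests_per_user):
            start = time.perf_counter()
            with urllib.request.urlopen(TARGET_URL, timeout=10) as resp:
                resp.read()
            lats.append(time.perf_counter() - start)
        return lats

    wall = time.perf_counter()
    with ThreadPoolExecutor(max_workers=users) as pool:
        lats = [t for user in pool.map(one_user, range(users)) for t in user]
    elapsed = time.perf_counter() - wall
    return len(lats) / elapsed, quantiles(lats, n=10)[8]

# Vary only the user load from test to test and compare the results.
# Near-linear scaling shows throughput growing roughly with the user count
# until some resource saturates. The single-user row is the baseline:
# a flat, slow curve can still be "linear".
for users in (1, 5, 10, 25, 50):
    throughput, p90 = run_load_test(users)
    print(f"{users:>3} users: {throughput:7.1f} req/s, p90 {p90:.4f} s")
```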

Page 10

Endurance

Page 11

Endurance Testing

• Also known as “soak” testing
• A load test executed over an extended duration
– Typically overnight or over a weekend
• Proves the “reliability” of the system
– Consistency of functional results
• Is the very first result the same as the very last, and all those in between?
• Depending on requirements, an error rate > 0 can be acceptable
– Consistency of performance
• Does response time or throughput degrade over time?
– Consistency of resource utilization (see the sketch below)
• Are we leaking memory?
• How does CPU usage look over the duration?
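For the resource-utilization side, one low-effort approach is to sample the target process throughout the soak run and chart the output afterwards. This sketch assumes the third-party psutil package and a placeholder PID; a steadily climbing RSS column in the resulting CSV is the classic leak signature.

```python
import time
import psutil  # third-party: pip install psutil

TARGET_PID = 12345       # placeholder: PID of the process under test
SAMPLE_INTERVAL_S = 60   # one sample per minute over the soak run

proc = psutil.Process(TARGET_PID)
with open("soak_resources.csv", "w") as out:
    out.write("elapsed_s,rss_mb,cpu_pct\n")
    start = time.time()
    while True:  # run until the soak ends (stop with Ctrl-C)
        rss_mb = proc.memory_info().rss / (1024 * 1024)
        cpu = proc.cpu_percent(interval=1.0)  # % over a 1-second window
        out.write(f"{time.time() - start:.0f},{rss_mb:.1f},{cpu:.1f}\n")
        out.flush()  # keep the file current if the monitor itself dies
        time.sleep(SAMPLE_INTERVAL_S)
```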

Page 12

Stress

Page 13

Stress Testing

• Often mistakenly referred to as “load” testing
• Best thought of as “extreme” load testing
– Tests the resiliency of the system when pushed beyond its limits
• 150% to 200% of the “nominal” load for the system
• Half the system resources suggested for a given load
– CPUs, memory, network bandwidth, etc.
• Looking for “graceful failure” (see the sketch below)
– Best: the system returns “Too Busy”
– Acceptable: the system slows down; maybe some requests time out
• Better: effective error messaging so that users know the system is maxed out
– Bad: a crash
– Worst: unpredictable results, misleading error messages
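To show what the “Best” outcome can look like on the server side, here is a sketch of a capacity gate: a bounded semaphore that rejects overflow requests with an explicit 503 “Too Busy” instead of queueing them into a hang or a crash. The port and the limit are placeholder values.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

MAX_IN_FLIGHT = 50  # placeholder capacity limit
slots = threading.BoundedSemaphore(MAX_IN_FLIGHT)

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Reject immediately instead of queueing unboundedly: the caller
        # gets a clear "Too Busy" rather than a timeout or a crash.
        if not slots.acquire(blocking=False):
            self.send_response(503)  # Service Unavailable
            self.send_header("Retry-After", "5")
            self.end_headers()
            self.wfile.write(b"Too Busy: request rejected, try again later\n")
            return
        try:
            # Real work would go here.
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"OK\n")
        finally:
            slots.release()

ThreadingHTTPServer(("", 8080), Handler).serve_forever()
```

The explicit Retry-After header is the “effective error messaging” point from the list: the client is told not just that the system is maxed out, but what to do about it.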

Page 14

Reliability

Page 15

Reliability Testing

• Negative-condition load testing
• Tests resiliency under error conditions (see the sketch below)
– Error-condition code paths typically don’t get the same coverage as the “happy path”
– Is the system consistent under constant error conditions?
• Are results consistent and predictable over time?
• Consistency of resource utilization
– Error conditions are notorious for resource leaks
• Security tests
– e.g., denial of service
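A sketch of a negative-condition driver: hammer an endpoint with deliberately malformed input and tally the responses. The endpoint and payload here are hypothetical; the point is the consistency check at the end.

```python
import urllib.error
import urllib.request

TARGET_URL = "http://localhost:8080/api/login"  # placeholder endpoint
BAD_BODY = b'{"user": "\x00\x00 not-json'       # deliberately malformed

seen = {}
for _ in range(10_000):
    req = urllib.request.Request(
        TARGET_URL, data=BAD_BODY,
        headers={"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            outcome = resp.status
    except urllib.error.HTTPError as e:
        outcome = e.code             # 4xx/5xx responses land here
    except Exception as e:
        outcome = type(e).__name__   # timeouts, connection resets, etc.
    seen[outcome] = seen.get(outcome, 0) + 1

# Reliable behavior: one predictable outcome (e.g. all 400s), run after run.
# A drifting mix of 400s, 500s, and timeouts suggests something leaking or
# corrupting state on the error path.
print(seen)
```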

Page 16

Random Suggestions (Time Filler)

• Choose workflows that fit the “80/20 rule”
– Some workflows need P&R testing; others don’t. Choose wisely.
• Use sufficient hardware for your load generator application
– Size your client hardware the way you would your target system
• Don’t use “intrusive” validation in your test cases
– Heavy test validation will slow down your test and affect concurrency
• Avoid “intrusive” monitoring when possible
• Beware of logging (see the sketch after this list)
– Logging is useful, but it can kill performance
• Visualize your results
– A picture is worth a thousand words. Who doesn’t like charts?
– Include context (resource utilization of the systems under test)
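On the logging point, the classic trap (shown here with Python's logging module, though the same pattern exists in Log4j and friends) is paying for message formatting even when the level is disabled:

```python
import logging

logger = logging.getLogger("myapp")

# Slow: the f-string (and any expensive repr) is built on every call,
# even when DEBUG logging is disabled.
def handle_request_slow(request):
    logger.debug(f"handling request: {request!r}")

# Better: let the logging module defer formatting until it knows the
# record will actually be emitted.
def handle_request(request):
    logger.debug("handling request: %r", request)

# For genuinely expensive diagnostics, guard explicitly.
def handle_request_guarded(request, expensive_dump):
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("request state: %s", expensive_dump())
```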

Page 17

So what do I do if I think there is a problem?

• Too slow?
– Is your system tuned?
• Ensure you have not configured a bottleneck into your deployment
– Try a profiling tool
• It can show which areas of the code are taking the most time
– Add some lightweight logging to the code (see the sketch below)
• Add “timing code” to log the elapsed time in functions/paths
– Use a stack-dumping utility
• Repeated stack dumps can show where you are “stuck”
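A sketch of the “timing code” suggestion: a decorator that logs wall-clock time per call. The wrapped function is hypothetical; and before hand-instrumenting anything, a stock profiler such as Python's built-in cProfile (`python -m cProfile -s cumtime app.py`) is often enough.

```python
import functools
import logging
import time

logger = logging.getLogger("timing")

def timed(func):
    """Log wall-clock elapsed time for each call to func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.info("%s took %.2f ms", func.__qualname__, elapsed_ms)
    return wrapper

@timed
def render_report(report_id):  # hypothetical hot path
    time.sleep(0.05)           # stand-in for real work
```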

• Using too much memory, or leaking?
– Try a profiling tool
• It can show where memory is being allocated and retained
– Add “size” logging for container classes (see the sketch below)
• It can show you whether your containers are growing unbounded
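And a sketch of the container “size” logging suggestion, with a hypothetical cache class standing in for whatever container you suspect. (Python's built-in tracemalloc module is the stdlib option if you would rather snapshot allocations instead.)

```python
import logging

logger = logging.getLogger("sizes")

class SessionCache:
    """Hypothetical container suspected of unbounded growth."""
    def __init__(self):
        self._entries = {}

    def put(self, key, value):
        self._entries[key] = value
        # Log the size every 1000th insert; a count that only ever rises
        # across an endurance run points at a missing eviction path.
        if len(self._entries) % 1000 == 0:
            logger.warning("SessionCache size: %d entries", len(self._entries))
```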

Page 18

Questions?