
CONFIDENTIAL – Not for Distribution | ©2016 SOASTA, All rights reserved.

Continuous Performance Testing Feedback

Eric Proegler, SOASTA


Eric Proegler

● 13 Years in Performance Testing, 20 in software
● Product Manager, SOASTA
● @ericproegler, contextdrivenperformancetesting.com
● Board, Association for Software Testing
● Lead Organizer, Workshop on Performance and Reliability (WOPR)
● Not a DevOps Maven, Scrumbag, or Agilista!


WOPR22

WOPR22 was held May 21-23, 2014, in Malmö, Sweden, on the topic of “Early Performance Testing.” Participants in the workshop included: Fredrik Fristedt, Andy Hohenner, Paul Holland, Martin Hynie, Emil Johansson, Maria Kedemo, John Meza, Eric Proegler, Bob Sklar, Paul Stapleton, Andy Still, Neil Taitt, and Mais Tawfik Ashkar.


The Future Just Keeps Happening


What’s Changed?

Everything!

● Development
● Deployment
● Operations
● Testing - and Performance Testing


What We’re Really Talking About

Rapid feedback about risk from automatically executed and evaluated tests, to provide the right information to people and tools


What We’re Talking About

Rapid feedback about risk from automatically executed and evaluated tests, to provide the right information to people and tools


Testing Addresses Risks

Testing doesn’t create, assure, or ensure quality. It can gather information.

Functional Testing is pretty straightforward. Does the software do what we expect?

Performance Testing is less so. What exactly do we expect?

Slow Performance might be worse than an outage


Risk: Scalability

Expensive operations mean systems won’t scale well

Ops Problem? Can often be “solved” with hardware

Tall Stacks -> Wirth’s Law: “Software is getting slower more rapidly than hardware gets faster”

What Does a Problem Look Like?

Longer response times are a clue

“High” CPU/Memory/Storage/Network Utilization


Risk: Capacity

System can’t support the expected load structurally

Cannot be “solved” with hardware

What Does a Problem Look Like?

Response time very sensitive to load. Hard or Soft Resource Limitations:
● CPU/Network Limitation
● Increasing I/O Latency
● Database threads and other queues


Risk: Concurrency

Operations that contend and collide (race conditions, database locks, contention points)

Difficult to find and fix these problems

What Does a Problem Look Like?

Infrequent functional issues that seem to only occur under load

Process crashes and restarts

Not easily reproducible


Risk: Reliability

Degradation over time: the system becomes slower, less predictable, or eventually fails

What Does a Problem Look Like?

Process crashes and restarts

Memory or Object Leaks

More frequent Garbage Collection

Decaying response times


ME LAUNCH ALL THE TESTS!!!


What We’re Talking About

Rapid feedback about risk from automatically executed and evaluated tests, to provide the right information to people and tools


If Only There Was a Tool...

(...and a CLI for people using the wrong tool!)


Parts of an Automated Performance Test

1. Prepare the SUT installation
2. Start the test infrastructure (if needed)
3. Execute the performance test
4. Report results - in context, usefully
5. Clean up and shut down the environment
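As a concrete (hypothetical) illustration of these five steps as a CI job, here is a minimal Python sketch. The shell scripts and their flags (deploy.sh, run_perf_test.sh, and so on) are placeholders for whatever your pipeline actually calls, not part of the deck or any specific product.

```python
# Sketch of the five steps as one CI job; every script name and flag below is a
# placeholder assumption, not a real tool's interface.
import subprocess

def run(cmd):
    """Run one shell command, echo it, and fail the job if it fails."""
    print(f"+ {cmd}")
    subprocess.run(cmd, shell=True, check=True)

def main():
    try:
        # 1. Prepare the SUT installation
        run("./deploy.sh --build $BUILD_ID --env perf")
        # 2. Start the test infrastructure (if needed)
        run("./start_load_generators.sh --count 2")
        # 3. Execute the performance test
        run("./run_perf_test.sh --scenario checkout --duration 10m")
        # 4. Report results - in context, usefully
        run("./report_results.sh --baseline last_good --out report.html")
    finally:
        # 5. Clean up and shut down the environment, even if a step failed
        run("./teardown.sh --env perf")

if __name__ == "__main__":
    main()
```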


Philosophy Break


What We’re Talking About

Rapid feedback about risk from automatically executed and evaluated tests, to provide the right information to people and tools


Evaluating an Automated Test

To take action (or not take action) based on an automated test (Check, please), you need:

● A reliable measurement (or Oracle) for determining whether the test passes, fails, or needs further investigation
● An expected result for context - what was supposed to happen in this test?
● A way to validate this measurement when investigating problems
● A reproducible, consistent set of conditions (state) to take the measurement in
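One way to picture the first two points is a three-outcome oracle: compare the measured value against the expected result and answer pass, fail, or "investigate" when the result is ambiguous. The sketch below is illustrative only; the margins and the response-time numbers are assumptions.

```python
# Minimal sketch (not from the deck): a three-outcome oracle for one measurement.
# The expected value and tolerances are illustrative assumptions.
def evaluate(measured_ms: float, expected_ms: float,
             pass_margin: float = 0.10, fail_margin: float = 0.25) -> str:
    """Return 'pass', 'investigate', or 'fail' for a response-time measurement.

    pass_margin: relative deviation accepted as normal run-to-run noise.
    fail_margin: relative deviation treated as a clear regression.
    Anything in between needs a human to look at it.
    """
    deviation = (measured_ms - expected_ms) / expected_ms
    if deviation <= pass_margin:
        return "pass"
    if deviation >= fail_margin:
        return "fail"
    return "investigate"

# Example: expected 480 ms from the last calibrated run, measured 560 ms.
print(evaluate(560, 480))  # -> 'investigate' (about +17%, between the margins)
```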


Evaluating an Automated Performance Test

Additional measurement and aggregation is needed

“Reproducible” also refers to available resources

Criteria could be compound:
● Response Time Average
● 90th (92nd/95th/98th) Percentile
● Error Rate
● Transactions Per Second
● Size of Network Transfer
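A compound criterion can be expressed as a set of thresholds evaluated together, as in this sketch. The metric names and limit values are assumptions for illustration, not SOASTA tooling or recommended numbers.

```python
# Illustrative sketch: a compound pass/fail over several aggregated metrics
# from one test run. Metric names and threshold values are assumptions.
THRESHOLDS = {
    "avg_response_ms":      {"max": 500},
    "p95_response_ms":      {"max": 1200},
    "error_rate_pct":       {"max": 0.5},
    "transactions_per_sec": {"min": 40},
    "bytes_transferred_mb": {"max": 250},
}

def check_run(metrics: dict) -> list:
    """Return a list of threshold violations; an empty list means the run passes."""
    violations = []
    for name, limits in THRESHOLDS.items():
        value = metrics[name]
        if "max" in limits and value > limits["max"]:
            violations.append(f"{name}={value} exceeds max {limits['max']}")
        if "min" in limits and value < limits["min"]:
            violations.append(f"{name}={value} below min {limits['min']}")
    return violations

run = {"avg_response_ms": 430, "p95_response_ms": 1450, "error_rate_pct": 0.2,
       "transactions_per_sec": 52, "bytes_transferred_mb": 180}
problems = check_run(run)
print("FAIL:" if problems else "PASS", *problems)
```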


Calibration

Calibrate, and recalibrate as necessary, between environments, builds, and days/times… any time you want to be sure you are comparing two variables accurately.

Be ready to repeat and troubleshoot when you find anomalies.

Watch for trends, and be ready to drill down. Automation extends your senses, but doesn’t replace them.
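One way to put this into practice is to keep recent results per environment and flag runs that drift beyond a noise band, prompting a recalibration or a drill-down. This is a sketch under assumptions (the file layout, metric name, and 15% band are all illustrative), not a prescribed method.

```python
# Sketch only: trend-watching against a per-environment baseline.
# File layout, metric names, and the 15% noise band are assumptions.
import json, statistics

def load_history(path="perf_history.json"):
    """Recent runs, e.g. [{'build': '1234', 'env': 'perf', 'p95_ms': 980}, ...]."""
    with open(path) as f:
        return json.load(f)

def drift_check(history, current_p95_ms, env="perf", band=0.15):
    """Compare the current p95 to the median of recent runs in the same environment."""
    recent = [run["p95_ms"] for run in history if run["env"] == env][-10:]
    if len(recent) < 3:
        return "calibrating"  # not enough data yet; establish a baseline first
    baseline = statistics.median(recent)
    drift = (current_p95_ms - baseline) / baseline
    if abs(drift) <= band:
        return "stable"
    return f"drifted {drift:+.0%} vs baseline {baseline:.0f} ms - drill down or recalibrate"

# history = load_history(); print(drift_check(history, current_p95_ms=1310))
```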


Running Experiments


What We’re Really Talking About

Rapid feedback about risk from automatically executed and evaluated tests, to provide the right information to people and tools


Dinosaur Thinking - Simulation Testing

Will the completed, deployed system support:

(a, b…) users
performing (e, f…) activities
at (j, k…) rates
on (m, n…) configurations
under (r, s…) external conditions,
meeting (x, y…) response time goals?


“All Models are wrong. Some are useful.”

● Guesses at activities and frequencies
● Organic loads and arrival rates – limitations imposed by load testing tools
● Session abandonment, other human behaviors
● Simulating every activity in the system
● Data densities (row counts, cardinality)
● Warmed caching
● Loads evolving over time


“The Environment is Identical”

● Shared resources: Virtualization, SANs, Databases, Networks, Authentication, etc.
● Execution environment versions and patching
● Software and hardware component changes, versions and patching
● Variable network conditions, especially last-mile
● Background processing and other activities against overlapping systems and resources


Changing Test Design

● Horizontal Scalability assumptions – let’s use them
● Who says we have to have systems distributed the same way?
● Why can’t my test database be on this system for calibration purposes?
● Isolation is more important than “Real”


Changing Test Design

Now That We’re Not Pretending to be “Real”:

Scalability: Add Timers to Functional Tests, record response times and trend them (a minimal sketch follows below)

Capacity: Stair-Stepping until the curve is found

Concurrency: No think time, burst loads of small numbers of threads

Reliability: Soak Tests - run tests for days or weeks
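Here is a minimal sketch of the first idea: time an existing functional test and append the result to a per-build trend. The test function name and CSV trend file are hypothetical placeholders, not part of the deck or any specific tool.

```python
# Sketch: wrap a functional test with a timer and trend the result per build.
# 'run_checkout_workflow' and the CSV trend file are hypothetical placeholders.
import csv, os, time

def timed(test_fn, *args):
    """Run a functional test and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = test_fn(*args)
    return result, (time.perf_counter() - start) * 1000.0

def record_trend(build_id, name, elapsed_ms, path="perf_trend.csv"):
    """Append one timing to the trend file so it can be charted across builds."""
    new_file = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["build", "test", "elapsed_ms"])
        writer.writerow([build_id, name, f"{elapsed_ms:.1f}"])

# Usage (with a hypothetical test function):
#   result, ms = timed(run_checkout_workflow)
#   record_trend(os.environ.get("BUILD_ID", "dev"), "checkout", ms)
```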


Repeatable, Reliable, Rapid Tests

1. Test subsets: single servers, single components, cheaply and repeatedly
2. Login (measure session overhead/footprint)
3. Simple workflows, as soon as they are ready
4. Control variables
5. Focus on repeatability
6. Asserts that were too expensive in scaled load tests work here
7. Use more than one test!
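To make the repeatability and "asserts work here" points concrete, this sketch logs in a handful of times with functional asserts still enabled and flags the run if the timings are too noisy to compare against other builds. The endpoint, credentials, variance limit, and the availability of the requests library are all assumptions.

```python
# Sketch: a small, repeatable login check with functional asserts still enabled.
# The URL, credentials, and variance limit are illustrative assumptions.
import statistics, time
import requests  # assumed to be available in the test environment

def timed_login(base_url="https://test-env.example.com"):
    start = time.perf_counter()
    resp = requests.post(f"{base_url}/login",
                         data={"user": "perf_user", "password": "secret"},
                         timeout=10)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    # Cheap asserts that a scaled load test might skip:
    assert resp.status_code == 200, f"login failed: {resp.status_code}"
    assert "session" in resp.cookies, "no session cookie returned"
    return elapsed_ms

def login_check(repetitions=5, max_cv=0.20):
    """Run the login several times; fail if timings vary too much to be comparable."""
    timings = [timed_login() for _ in range(repetitions)]
    cv = statistics.stdev(timings) / statistics.mean(timings)  # coefficient of variation
    assert cv <= max_cv, f"timings too noisy to trend (CV={cv:.0%}): {timings}"
    return statistics.median(timings)
```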


What are the New Tricks?


What We’re Really Talking About

Rapid feedback about risk from automatically executed and evaluated tests, to provide the right information to people and tools


Reporting

Exposing Graphs/traffic lights is fine for verification. When something goes wrong, how do you ask for action? What additional information could you include?

Can you explain it in 3 sentences with one picture?
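As a hypothetical example of a "3 sentences and one picture" report, this sketch composes a short, action-oriented summary for a run. The field names and the dashboard URL are placeholders, not a real reporting API.

```python
# Sketch: compose a three-sentence, action-oriented summary plus a link to the
# picture (a dashboard chart). All fields and the URL are placeholders.
def summarize(run: dict) -> str:
    verdict = "regressed" if run["p95_ms"] > run["baseline_p95_ms"] else "held steady"
    delta = (run["p95_ms"] - run["baseline_p95_ms"]) / run["baseline_p95_ms"]
    return (
        f"Build {run['build']} {verdict}: p95 response time is {run['p95_ms']:.0f} ms, "
        f"{delta:+.0%} vs the calibrated baseline. "
        f"The largest contributor was '{run['slowest_transaction']}'. "
        f"Chart and drill-down: {run['dashboard_url']}"
    )

print(summarize({
    "build": "1842", "p95_ms": 1430, "baseline_p95_ms": 980,
    "slowest_transaction": "checkout/submit",
    "dashboard_url": "https://ci.example.com/perf/1842",
}))
```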


How Are We Doing?


Conclusions

1. Tests reduce risk. Think about which risks your Performance Tests address.
2. Pass/Fail is an easy check in Functional Tests. What are the criteria for Performance Tests?
3. Models and Techniques for Performance Tests change in CI.
4. Reporting should be succinct, and it should inspire action.
5. Embrace the present! Beats the alternative...


Thank You!