System Testing
DAIMI (c) Henrik Bærbak Christensen
What is System Testing?
System testing is testing the system as a whole and consists of a set of test types with different focus:
– Functional testing
• functional requirements, use cases, “doing the job”
– Performance testing
• coping with work load, latency, transaction counts
– Stress testing
• coping with above-limit workload, graceful degradation
– Configuration testing
• deployment and (re)configurations
– Security testing
• threat resistance
– Recovery testing
• loss of resources, recovery
Aim
System testing’s aim is to demonstrate that the system performs according to its requirements.
– thus it embodies the contradictory aims of
• proving that there are defects
• making us confident that there are none...
– but the focus is more on the latter than at lower levels of testing

Next testing level:
– Acceptance test: includes the customer
• Factory AT (FAT) and Site AT (SAT)
– Alpha/Beta test: includes mass-market user groups
• Alpha: at the shop; Beta: at the customer; Release Candidate: ...
War stories
– Major Danish supplier of hardware/software for military aircraft
• 1000 hours of system testing
– Small Danish company serving Danish airports
• the ‘factory accept test’ area was called the “wailing wall” (grædemuren)
Functional Testing
Functional Test
Driven by the requirements / specifications
– functional tests can thus be planned and test cases defined very early in a project

Often a large overlap with top-level integration test cases
– however, the changed focus often requires new test cases.
Coverage
The natural adequacy criterion / coverage metric of functional tests is of course defined by the requirements:
– requirements coverage
• all functional requirements must be achievable by the system
– use case coverage
• all use cases are covered

Technical adequacy may also be interesting:
– state coverage
• all system states and state transitions are covered
– function coverage
• all “system functions” are covered
The story is the same
The basic story at the system level resembles all other levels:
Use systematic techniques to define test cases such that
– coverage is high
– the perceived chance of finding defects is high
– a minimal number of test cases is defined

Remember:
– Reliability: the probability that a software system will not cause the failure of the system for a specified time under specified conditions.
Defining test cases
Functional tests are black-box in nature.
Thus BB techniques are applicable:
– equivalence classes (ECs) of user input / required output
• valid and invalid input handling
– boundary value analysis
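As a minimal sketch of these two techniques (the AgeValidator and its 0..120 range are invented for illustration, not from the slides):

    // A hypothetical validator whose valid equivalence class is the range 0..120.
    class AgeValidator {
        static boolean isValid(int age) { return age >= 0 && age <= 120; }
    }

    public class AgeValidatorTest {
        public static void main(String[] args) {
            // one representative per equivalence class
            check(AgeValidator.isValid(35), "valid EC member");
            check(!AgeValidator.isValid(-5), "invalid EC: below range");
            check(!AgeValidator.isValid(200), "invalid EC: above range");
            // boundary value analysis: on and just outside each boundary
            check(AgeValidator.isValid(0), "lower boundary");
            check(AgeValidator.isValid(120), "upper boundary");
            check(!AgeValidator.isValid(-1), "just below lower boundary");
            check(!AgeValidator.isValid(121), "just above upper boundary");
        }
        static void check(boolean ok, String label) {
            System.out.println((ok ? "PASS: " : "FAIL: ") + label);
        }
    }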
Strategies for selection at system level
John D. McGregor and David A. Sykes, Practical Guide to Testing Object-Oriented Software, Addison Wesley, 2001.
McGregor and Sykes describe two approaches at the system level:
Defect Hunting versus System Use
– illustrates the contradictory aims of testing
Defect Hunting
Defect hunting is based upon the observation that developers often pay less attention to error handling than to normal processing.

Thus the idea is to go hunting for defects by trying to trigger failures: provide invalid input/conditions.
– the classic is the monkey test: hammer away on the keyboard and see what happens

This has some overlap with performance, stress, and other types of testing mentioned later.
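A minimal monkey-test sketch in code (CommandParser is a hypothetical stand-in for the real system entry point):

    import java.util.Random;

    // Stub standing in for the system under test.
    class CommandParser {
        static void parse(String command) {
            if (command.isEmpty())
                throw new IllegalArgumentException("empty command");
            // ... real parsing would go here
        }
    }

    public class MonkeyTest {
        public static void main(String[] args) {
            Random rnd = new Random();
            for (int i = 0; i < 100_000; i++) {
                byte[] junk = new byte[rnd.nextInt(256)];
                rnd.nextBytes(junk);
                try {
                    CommandParser.parse(new String(junk)); // hammer away
                } catch (IllegalArgumentException expected) {
                    // rejecting invalid input gracefully is the desired behavior
                } catch (RuntimeException failure) {
                    System.err.println("defect candidate on input " + i + ": " + failure);
                }
            }
        }
    }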
System Use
Remember the definition of reliability:
Reliability: Probability that a software system will not cause the failure of the system for a specified time under specified conditions.
Thus: Reliability is determined by the way we use the system.
War Story: Recovery/Exceptions
War story:
– Ragnarok encountered ‘disk full’ during a save operation, resulting in lost data in a version control repository (and that is b-a-d!) for a compiler...
– ‘disk full’ had never been thought of...

[but]
– given the scope of the system (a research prototype), the effort to get the system running again was lower than the effort to do systematic testing...
• (but this estimate was never made; I was just plain lucky!)
Operational Profile/Usage Model
If we can make a model of how we use the system, we can focus our testing efforts (read: cost) on the functions that are used the most (read: benefit).

Thus: get as much reliability as cheaply as possible.
Operational Profiles
Operational profile:
– a quantitative characterization of how a software system will be used in its intended environment
– a specification of classes of inputs and the probability of their occurrence [Burnstein 12.3]

Burnstein defines a 5-step process to get it:
– customer profile, user profile, system mode profile, functional profile, operational profile...
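As a minimal sketch of how such a profile can drive testing (the operations and probabilities are assumed, not from the slides), test inputs can be drawn in proportion to expected field usage:

    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.Random;

    public class OperationalProfileDemo {
        public static void main(String[] args) {
            // classes of inputs and the probability of their occurrence
            Map<String, Double> profile = new LinkedHashMap<>();
            profile.put("search", 0.70);
            profile.put("update", 0.20);
            profile.put("configure", 0.10);

            Random rnd = new Random(42);
            for (int i = 0; i < 10; i++)
                System.out.println(draw(profile, rnd)); // ~70/20/10 mix
        }

        static String draw(Map<String, Double> profile, Random rnd) {
            double p = rnd.nextDouble(), sum = 0.0;
            String last = null;
            for (Map.Entry<String, Double> e : profile.entrySet()) {
                last = e.getKey();
                sum += e.getValue();
                if (p < sum) return last;
            }
            return last; // guard against floating-point rounding
        }
    }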
Another approach
Binder defines three system testing patterns:
– Extended Use Case Test
• focus on complete requirements coverage
– Covered in CRUD
• CRUD coverage for all problem domain abstractions
• akin to Burnstein’s “function coverage”
– Allocate Tests by Profile
• maximize requirements coverage under budget constraints
Extended Use Case Test
EUCT
Intent:
– develop an application system test suite by modeling essential capabilities as extended use cases

Based on extended use cases:
– a use case augmented with information on
• the domain of each variable that participates in the use case
• required input/output relationships among use case variables
• (the relative frequency of the use case)
• sequential dependencies among use cases
– that is, we need the exact input/output relation...
Process
Binder describes the process of defining test cases based upon the extended use cases:
– identify operational variables
– define variable domains
– develop operational relations
– develop test cases

This is actually another way of saying EC and boundary analysis testing, nothing new:
– identify input dimensions
– identify boundaries
– develop equivalence classes
– develop test cases
Entry/Exit
Entry criteria:
– that we have extended use cases
– that the system has passed integration testing

Exit criteria:
– use case coverage / requirements coverage
Real Example
Covered in CRUD
CRUD
Intent:
– Covered in CRUD verifies that all basic operations are exercised for each problem domain object in the system under test.

Context:
– the antidecomposition axiom once again: system level testing does not achieve full coverage of its units.
– CRUD focuses on exercising the domain objects through the central Create, Read, Update, Delete operations.
Strategy
1) Make a matrix of use cases versus domain object CRUD operations.
2) Develop test cases for the missing operations on any domain object, as sketched below.
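A minimal sketch of both steps (the use cases and domain objects are invented for illustration):

    import java.util.LinkedHashMap;
    import java.util.LinkedHashSet;
    import java.util.Map;
    import java.util.Set;

    public class CrudMatrix {
        public static void main(String[] args) {
            // step 1: record which CRUD ops each use case exercises per domain object
            Map<String, Set<Character>> covered = new LinkedHashMap<>();
            exercise(covered, "Order", 'C', 'R');  // e.g. use case "PlaceOrder"
            exercise(covered, "Order", 'U');       // e.g. use case "ChangeOrder"
            exercise(covered, "Customer", 'R');    // e.g. use case "ShowCustomer"

            // step 2: report the cells that still need dedicated test cases
            for (Map.Entry<String, Set<Character>> e : covered.entrySet())
                for (char op : "CRUD".toCharArray())
                    if (!e.getValue().contains(op))
                        System.out.println("missing test: " + op + " on " + e.getKey());
            // prints: missing D on Order; missing C, U, D on Customer
        }

        static void exercise(Map<String, Set<Character>> m, String obj, char... ops) {
            Set<Character> s = m.computeIfAbsent(obj, k -> new LinkedHashSet<>());
            for (char op : ops) s.add(op);
        }
    }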
Coverage
Exit criteria:
– CRUD adequacy
Allocate Tests by Profile
Profile testing
Intent:
– allocate the overall testing budget to each use case in proportion to its relative frequency.

Context:
– rank use cases according to relative frequency
– allocate testing resources accordingly
– develop test cases for each use case until the allocated resources are exhausted
– maximize reliability given the testing budget

Example from previously:
– Word’s (save) versus (configure button panel)
Strategy
Estimate use case frequency
– difficult, but reportedly the relative ordering is more important than the exact frequency:
• 90% search, 10% update
• 80% search, 20% update
• either way, the testing is heavily skewed towards search testing...

Testing effort is proportional to use case probability.
Example
Design, setup, and run a test: 1h
Test finds a defect: 5% chance
Correcting a defect: 4 hours
Testing budget: 1000h
Total tests: T
– mean time to make a test: 1h + (0.05 * 4h) = 1.2h
– number of test cases: 1000h / 1.2h ≈ 833

Allocate the 833 test cases according to the profile
– a 50% use case would get 833 * 50% ≈ 417 test cases...
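The same arithmetic in code (the 50/30/20 profile split is assumed for illustration):

    public class TestBudget {
        public static void main(String[] args) {
            double hoursPerTest = 1.0;   // design, setup, and run a test
            double defectChance = 0.05;  // a test finds a defect
            double fixHours = 4.0;       // correcting a defect
            double budget = 1000.0;      // total testing budget in hours

            double meanHours = hoursPerTest + defectChance * fixHours; // 1.2h
            int totalTests = (int) (budget / meanHours);               // 833

            double[] profile = { 0.50, 0.30, 0.20 }; // assumed use case frequencies
            for (int i = 0; i < profile.length; i++)
                System.out.printf("use case %d: %d test cases%n",
                                  i + 1, Math.round(profile[i] * totalTests));
        }
    }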
Notes
Take care of:
– critical use cases (central for operation, like “save”)
– high-frequency use cases with trivial implementations
– many low-frequency use cases
• merge them
• (or maybe they are plainly modelled wrong)

Exit criteria:
– use case coverage
Non-functional testing types
Performance Testing
Does it perform OK?
– coping with work load, latency, transaction counts per second, etc...

Test case generation is based upon the standard techniques.

Load generators:
– load = inputs that simulate a group of transactions
– applications that will generate load
• multiple users accessing a database/webservice
– Capture/Replay tactics for realistic load
– “devilish” loads
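A minimal load-generator sketch (WebServiceClient is a hypothetical stub standing in for the system under test):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    class WebServiceClient {
        static void search(String query) {
            // issue the request and record its latency here
        }
    }

    public class LoadGenerator {
        public static void main(String[] args) throws InterruptedException {
            int users = 50;                 // simulated concurrent users
            int transactionsPerUser = 100;  // transactions each user fires
            ExecutorService pool = Executors.newFixedThreadPool(users);
            for (int u = 0; u < users; u++)
                pool.submit(() -> {
                    for (int t = 0; t < transactionsPerUser; t++)
                        WebServiceClient.search("query-" + t);
                });
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.MINUTES);
        }
    }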
Performance Testing
Probes:
– software units that collect performance data in the production code
– the “built-in monitors” tactic

Perhaps tools to analyse the probe output to judge whether the expected behavior can be confirmed...
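A minimal built-in-monitor sketch: a probe that wraps an operation, measures its wall-clock latency, and logs it for later analysis (the service in the usage line is hypothetical):

    import java.util.function.Supplier;

    public class Probe {
        public static <T> T timed(String label, Supplier<T> operation) {
            long start = System.nanoTime();
            try {
                return operation.get();
            } finally {
                long micros = (System.nanoTime() - start) / 1_000;
                System.out.println(label + " took " + micros + " us"); // or write to a log
            }
        }
    }

    // usage: Probe.timed("search", () -> service.search("foo"));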
Stress Testing
Stress testing:
– allocate resources in maximum amounts
– flood with requests, flood with processes, fill RAM, fill disk, fill data structures
– [fill the event queue with mouse events; or hammer away on the keyboard (monkey test)]
– accumulate data over a period of years

Easy to find examples of defects:
– JHotDraw with 2000 figures...
– Java with a couple of hundred threads...
Configuration/Deployment
Configuration testing / deployment testing:
– verify product variants on the execution environment
– i.e. how does the installation program work? Is it complete? Are any files missing? Are all combinations of options working?
– what about dependencies on things that will change?

Burnstein:
– test all combinations/configurations
– test all device combinations
• poor Microsoft!!!
– test that the performance level is maintained
Deployment testing
The Microsoft platform has given a name to the problems:
– ‘DLL hell’

The Pentium bug:
– early releases of the Pentium x86 made wrong floating point calculations (!)
– the defect was traced to a wrong script for transferring the design models to the machines that generate the hardware masks

To verify that an install is correct:
– run the system regression tests for ‘normal’ operation
• pruned for tests using options not installed
Security testing
War story:
– the SAVOS weather system at Copenhagen Airport
• no escape to Windows from SAVOS was a requirement
• one day we saw a guy playing chess with SAVOS iconized!
• we did not know that double-clicking the title bar minimizes the window

Standard security aspects apply, but also:
– the ability of the application to access resources
– for instance:
• system DLLs that the user has no permission to access...
• a wrong Java policy file
Recovery Testing
Recovery testing:
– take away resources to verify that the system can properly recover
• losing the network connection, losing the server in mid-transaction, losing data input from the field
– many availability techniques
• cold standby, warm standby, hot standby, exception handling, etc.

Areas of interest:
– restart: pending transactions and system state are properly re-established
– switch-over: from master to slave system
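A minimal recovery-test sketch for the restart case (TransactionalStore is a trivial in-memory stand-in for the real system; all names are invented):

    import java.util.HashMap;
    import java.util.Map;

    class TransactionalStore {
        private final Map<String, String> committed = new HashMap<>();
        private Map<String, String> pending;
        void begin() { pending = new HashMap<>(); }
        void write(String k, String v) { pending.put(k, v); }
        void crash() { pending = null; }  // connection lost before commit
        void restart() { /* real recovery would replay a transaction log */ }
        String read(String k) { return committed.get(k); }
    }

    public class RecoveryTest {
        public static void main(String[] args) {
            TransactionalStore store = new TransactionalStore();
            store.begin();
            store.write("key", "value");
            store.crash();    // lose the server in mid-transaction
            store.restart();
            // after recovery, the half-done transaction must not be visible
            if (store.read("key") != null)
                System.err.println("FAIL: partial transaction leaked");
            else
                System.out.println("OK: state recovered cleanly");
        }
    }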
Regression Testing
Not a special testing technique, but retesting software after a change.

Very expensive for manual tests
– researchers try to find the minimal test set to run given some change in a module A

Automated tests fare better
– if they are not too heavy to execute...
Acceptance tests
System tests performed together with the customer (factory accept test)
– does the system meet the users’ expectations?
– rehearsal by testers and developers is important
– all defects must be documented

Installation test (site accept test)
– perform the same tests at the customer’s site
Alpha/Beta tests
The accept and installation tests for the mass market:
– α-test: users at the developers’ site
– β-test: users at their own site
Summary
System testing is concerned with
– end user behavior verification, and
– performance, stress, configurations, etc.

The BB techniques and WB coverage criteria apply at all levels
– consider equivalence classes and boundaries
– consider the coverage required