
Automatic System Testing of Programs without Test Oracles

Christian Murphy, Kuang Shen, Gail Kaiser

Columbia University

Problem Statement

Some applications (e.g. machine learning, simulation) do not have test oracles that indicate whether the output is correct for arbitrary input

Oracles may exist for a limited subset of the input domain, and gross errors (e.g. crashes) can be detected with certain inputs or techniques

However, it is difficult to detect subtle (computational) errors for arbitrary inputs in such “non-testable programs”

Observation

If there is no oracle in the general case, we cannot know the expected relationship between a particular input and its output

However, it may be possible to know relationships between sets of inputs and the corresponding set of outputs

“Metamorphic Testing” [Chen et al. ’98] is such an approach

Metamorphic Testing

An approach for creating follow-up test cases based on previous test cases

If input x produces output f(x), then the function’s “metamorphic properties” are used to guide a transformation function t, which is applied to produce a new test case input, t(x)

We can then predict the expected value of f(t(x)) based on the value of f(x) obtained from the actual execution
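To make the notation concrete, here is a minimal sketch of a single metamorphic test in Python; the function f and the transformation are illustrative stand-ins, not part of any real framework.

```python
# A minimal sketch of one metamorphic test expressed directly in code;
# f and transform are illustrative stand-ins, not framework APIs.

import random

def f(xs):
    """Application under test; sorting stands in for a real program."""
    return sorted(xs)

def transform(xs):
    """Transformation t derived from a metamorphic property:
    randomly permuting the input elements."""
    shuffled = list(xs)
    random.shuffle(shuffled)
    return shuffled

x = [5, 3, 9, 1]
fx = f(x)                # output of the original test case
ftx = f(transform(x))    # output of the follow-up test case t(x)

# For this property, f(t(x)) is predicted to equal f(x).
assert ftx == fx, "metamorphic property violated"
```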

Metamorphic Testing without an Oracle

When a test oracle exists, we can know whether f(t(x)) is correct
Because we have an oracle for f(x)
So if f(t(x)) is as expected, then it is correct

When there is no test oracle, f(x) acts as a “pseudo-oracle” for f(t(x))
If f(t(x)) is as expected, it is not necessarily correct
However, if f(t(x)) is not as expected, either f(x) or f(t(x)) (or both) is wrong

Metamorphic Testing Example

Consider a program that reads a text file of test scores for students in a class, and computes the averages and the standard deviation of the averages

If we permute the values in the text file, the results should stay the same

If we multiply each score by 10, the final results should all be multiplied by 10 as well

These metamorphic properties can be used to create a “pseudo-oracle” for the application
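A sketch of this example in Python, assuming a small grade-averaging routine as a stand-in for the real program (which reads the scores from a text file):

```python
# Sketch of the two properties above, applied to a stand-in routine that
# computes each student's average and the std. deviation of the averages.

import math
import random
import statistics

def grade_stats(scores_per_student):
    """Average per student, plus the standard deviation of those averages."""
    averages = [sum(s) / len(s) for s in scores_per_student]
    return averages, statistics.pstdev(averages)

scores = [[90, 85, 77], [60, 72, 68], [88, 94, 91]]
avgs, stdev = grade_stats(scores)

# Property 1: permuting the input should leave the results unchanged.
permuted = list(scores)
random.shuffle(permuted)
p_avgs, p_stdev = grade_stats(permuted)
assert sorted(p_avgs) == sorted(avgs) and math.isclose(p_stdev, stdev)

# Property 2: multiplying every score by 10 should multiply all results by 10.
scaled = [[x * 10 for x in s] for s in scores]
s_avgs, s_stdev = grade_stats(scaled)
assert all(math.isclose(a, b * 10) for a, b in zip(s_avgs, avgs))
assert math.isclose(s_stdev, stdev * 10)
```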

Limitations of Metamorphic Testing

Manual transformation of the input data or comparison of output can be laborious and error-prone

Comparison of outputs is not always possible with tools like diff when they are not expected to be “exactly” the same

Our Solution

Automated Metamorphic System Testing

Tester needs to:
Specify the application’s metamorphic properties
Configure the testing framework
Run the application with its test input

Framework takes care of automatically:
Transforming program input data
Executing multiple instances of the application with different transformed inputs in parallel
Comparing outputs of the executions


Model

Amsterdam: Automated Metamorphic System Testing Framework

Metamorphic properties are specified in XML:
Input transformation
Runtime options
Output comparison

Framework provides out-of-the-box support for numerous transformation and comparison functions but is extensible to support custom operations

Additional invocations are executed in parallel in separate sandboxes that have their own virtual execution environment [Osman et al. OSDI’02]
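An illustrative sketch of that workflow in Python (not the actual Amsterdam implementation; sandboxing and the XML specification are omitted, and the Unix sort command stands in for the application under test):

```python
# Workflow sketch: transform the input, run the program under test on each
# input in parallel, and compare the outputs. "sort" is a stand-in app.

import filecmp
import subprocess
from concurrent.futures import ThreadPoolExecutor

def permute_lines(src, dst):
    """One example input transformation: reverse the order of the input lines."""
    with open(src) as f:
        lines = f.readlines()
    with open(dst, "w") as f:
        f.writelines(reversed(lines))

def run_app(input_file, output_file):
    """Run one instance of the application under test."""
    subprocess.run(["sort", input_file, "-o", output_file], check=True)

with open("input_orig.txt", "w") as f:
    f.write("banana\napple\ncherry\n")
permute_lines("input_orig.txt", "input_permuted.txt")

# Execute the original and the follow-up invocation in parallel.
jobs = [("input_orig.txt", "output_orig.txt"),
        ("input_permuted.txt", "output_permuted.txt")]
with ThreadPoolExecutor() as pool:
    list(pool.map(lambda job: run_app(*job), jobs))

# For a permutation property the two outputs are predicted to be identical.
if not filecmp.cmp("output_orig.txt", "output_permuted.txt", shallow=False):
    print("Metamorphic property violated: possible defect detected")
```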

Empirical Studies

To measure the effectiveness of the approach, we selected three real-world applications from the domain of supervised machine learning:
Support Vector Machines (SVM): vector-based classifier
C4.5: decision tree classifier
MartiRank: ranking application

Methodology (1)

Mutation testing was used to seed defects into each application:
Comparison operators were reversed
Math operators were changed
Off-by-one errors were introduced

For each program, we created multiple variants, each with exactly one mutation

Weak mutants (that did not affect the final output) were discarded, as were those that caused outputs that were obviously wrong
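For illustration only (not code from any of the studied applications), one such mutant, created by reversing a comparison operator:

```python
# Illustration of a seeded defect: a reversed comparison operator.

def max_score(scores):
    """Original: return the largest score."""
    best = scores[0]
    for s in scores[1:]:
        if s > best:
            best = s
    return best

def max_score_mutant(scores):
    """Mutant: the comparison '>' has been reversed to '<'."""
    best = scores[0]
    for s in scores[1:]:
        if s < best:          # seeded defect
            best = s
    return best
```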

Methodology (2)

Each variant (containing one mutation) acted as a pseudo-oracle for itself:
Program was run to produce an output with the original input dataset
Metamorphic properties applied to create new input datasets
Program run on new inputs to create new outputs
If outputs not as expected, the mutant had been killed (i.e. the defect had been detected)

Metamorphic Properties

Each application had four metamorphic properties specified, based on:
Permuting the order of the elements in the input data set
Multiplying the elements by a positive constant
Adding a constant to the elements
Negating the values of the elements in the input data

Testing was conducted using our implementation of the Amsterdam framework

SVM Results

Permuting the input was very effective at killing off-by-one mutants

Many functions in SVM perform calculations on a set of numbers
Off-by-one mutants caused some element of the set to be omitted
By permuting, a different number would be omitted
The results of the calculations would be different, revealing the defect
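A small illustration (not actual SVM code) of why permutation reveals off-by-one defects:

```python
# The off-by-one defect drops one element, and which element is dropped
# depends on the order of the input, so a permuted input changes the result.

def total_mutant(values):
    """Off-by-one mutant: the loop stops one element early."""
    result = 0.0
    for i in range(len(values) - 1):   # seeded defect: should be len(values)
        result += values[i]
    return result

x = [2.0, 7.0, 1.0, 9.0]
fx = total_mutant(x)                   # omits 9.0 -> 10.0
ftx = total_mutant(list(reversed(x)))  # omits 2.0 -> 17.0

# The permutation property predicts fx == ftx, so the defect is revealed.
print("property violated" if fx != ftx else "property held")
```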

C4.5 Results

Negating the input was very effective

C4.5 creates a decision tree in which nodes contain clauses like “if attr_n > α then class = C”
If the data set is negated, those nodes should change to “if attr_n ≤ -α then class = C”, i.e. both the operator and the sign of α change
In most cases, only one of the changes occurred
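A sketch (not actual C4.5 code or output, and with a made-up threshold) of checking this property on a single decision node:

```python
# Checking the negation property on one decision-tree node.

def predicted_negated_node(op, alpha):
    """Per the property above, 'attr_n > alpha' should become 'attr_n <= -alpha'."""
    assert op == ">"
    return ("<=", -alpha)

node_from_original_data = (">", 3.5)    # learned from the original data set
node_from_negated_data = (">", -3.5)    # hypothetical node learned from negated data

expected = predicted_negated_node(*node_from_original_data)   # ("<=", -3.5)
if node_from_negated_data != expected:
    # Only one of the two expected changes occurred (the sign of alpha but
    # not the operator), so the property is violated and the defect revealed.
    print("Metamorphic property violated:", node_from_negated_data, "!=", expected)
```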

MartiRank Results

Permuting and negating were effective at killing comparison operator mutants

MartiRank depends heavily on sorting
Permuting and negating change which numbers get compared and what the result should be, thus inducing the differences in the final sorted list

Summary of Results

143 mutants killed out of 182 (78%)

Permuting or negating the inputs proved to be effective techniques for killing mutants because of the mathematical nature of the applications

Multiplying and adding were not effective, possibly because of the nature of the mutants we inserted

Benefits of Automation

For SVM, all of the metamorphic properties called for the outputs to be the same as the original

But in practice we knew they wouldn’t be exactly the same:
Partly due to floating point calculations
Partly due to approximations in the implementation

We could use Heuristic Metamorphic Testing to allow for outputs that were considered “close enough” (either semantically or to within some tolerance)

Effect on Testing Time

Without parallelism, metamorphic testing introduces at least 100% overhead since the application must be run at least twice

In our experiments on a multi-core machine, the only overhead came from creating the “sandbox” and comparing the results: less than one second for a 10MB input file

Limitations and Future Work

Framework Implementation:
The “sandbox” only includes in-process memory and the file system, but not anything external to the system
The framework does not yet address fault localization

Approach:
Approach requires some knowledge of the application to determine the metamorphic properties in the first place
Need to investigate applicability to other domains
Further applicability of Heuristic Metamorphic Testing to non-deterministic applications

Contributions

A testing technique called Automated Metamorphic System Testing that facilitates testing of non-testable programs

An implementation called Amsterdam

Empirical studies demonstrating the effectiveness of the approach

Automatic System Testing of Programs without Test Oracles

Chris Murphy

[email protected]

http://psl.cs.columbia.edu/metamorphic

Related Work

Pseudo-oracles [Davis & Weyuker ACM’81]

Testing non-testable programs [Weyuker TCJ’82]

Overview of approaches [Baresi and Young ’01]:
Embedded assertion languages
Extrinsic interface contracts
Pure specification languages
Trace checking & log file analysis

Using metamorphic testing [Chen et al. JIST’02; others]

Related Work

Applying Metamorphic Testing to “non-testable programs”:
Chen et al. ISSTA’02 (among others)

Automating metamorphic testing:
Gotlieb & Botella COMPSAC’03

Categories of Metamorphic Properties

Additive: Increase (or decrease) numerical values by a constant
Multiplicative: Multiply numerical values by a constant
Permutative: Randomly permute the order of elements in a set
Invertive: Reverse the order of elements in a set
Inclusive: Add a new element to a set
Exclusive: Remove an element from a set
Others…

ML apps such as ranking, classification, and anomaly detection exhibit these properties [Murphy SEKE’08]
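A sketch of these property categories expressed as input transformations (illustrative only; the constants and the added/removed elements are arbitrary):

```python
import random

transformations = {
    "additive":       lambda xs, c=5.0: [x + c for x in xs],     # shift by a constant
    "multiplicative": lambda xs, c=2.0: [x * c for x in xs],     # scale by a constant
    "permutative":    lambda xs: random.sample(xs, len(xs)),     # random permutation
    "invertive":      lambda xs: list(reversed(xs)),             # reverse the order
    "inclusive":      lambda xs, new=0.0: xs + [new],            # add an element
    "exclusive":      lambda xs: xs[:-1],                        # remove an element
}

data = [3.0, 1.0, 4.0, 1.5]
for name, t in transformations.items():
    print(name, t(data))
```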


Specifying Metamorphic Properties

Further Testing

For each app, additional data sets were used to see if more mutants could be killed

SVM: 18 of remaining 19 were killed
MartiRank: 6 of remaining 19 were killed
C4.5: one remaining mutant was killed

Heuristic Metamorphic Testing

Specify metamorphic properties in which the results may be “similar” but not necessarily exactly the same as predicted

Reducing false positives by checking against a difference threshold when comparing floating point numbers

Addressing non-determinism by specifying heuristics for what is considered “close”
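A sketch of such a “close enough” comparison for numeric outputs (illustrative; the tolerance values are arbitrary):

```python
# Compare two numeric output vectors within a difference threshold,
# rather than requiring exact equality.

import math

def outputs_similar(original, followup, rel_tol=1e-6, abs_tol=1e-9):
    return len(original) == len(followup) and all(
        math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, b in zip(original, followup)
    )

# Floating-point noise alone should not be reported as a violated property.
print(outputs_similar([0.3333333333, 2.0], [1.0 / 3.0, 2.0]))   # True
print(outputs_similar([0.3333333333, 2.0], [0.34, 2.0]))        # False
```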