Bioinformatics software testing and quality assurancebioinformatics.org.au/ws/wp-content/uploads/sites/10/... · 2016-07-21 · Mary is a bioinformatician in a research lab. Her project

Bioinformatics software testing and quality assurance

Joshua W. K. Ho
Head, Bioinformatics and Systems Medicine Laboratory
Victor Chang Cardiac Research Institute

Winter School in Mathematical and Computational Biology, UQ 2016-07-06

How do I know the output of my program is correct?

How do you know an R package is implemented correctly?

Is running the ‘example’ code alone sufficient? If not, how many test cases do we need? How do we generate additional test cases? How do we verify the correctness of the outputs?

Nature 2010 (News Feature)

Nature 2013 (News In Focus)

Science 2013 (Policy Forum)

Nature 2015 (World View)

Clinical application: how can we be sure that our variant calling pipeline is implemented correctly?

Challenge: it is very hard to check the correctness of the output, especially to detect false negatives

Genome Medicine, 2013

Compared five commonly employed variant-calling pipelines: SNV concordance was ~57.4% and indel concordance ~26.8%, so results need to be interpreted with caution in a genomic medicine setting
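The slide does not say how concordance was computed. As a hedged illustration only, one simple definition is the fraction of all variants called by any pipeline that are called by every pipeline. This may differ from the definition used in the Genome Medicine study, and the variant tuples below are hypothetical.

```python
def concordance(*call_sets):
    """Fraction of all called variants present in every call set: |A ∩ B ∩ ...| / |A ∪ B ∪ ...|."""
    sets = [set(calls) for calls in call_sets]
    if not sets:
        raise ValueError("need at least one call set")
    union = set().union(*sets)
    if not union:
        return 1.0  # no variants called anywhere: trivially concordant
    shared = set.intersection(*sets)
    return len(shared) / len(union)

# Hypothetical calls keyed by (chromosome, position, ref, alt)
pipeline_1 = {("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"), ("chr2", 500, "G", "A")}
pipeline_2 = {("chr1", 1000, "A", "G"), ("chr2", 500, "G", "A"), ("chr3", 42, "T", "C")}
print(concordance(pipeline_1, pipeline_2))  # 2 shared of 4 distinct variants -> 0.5
```

Keying variants on exact (chromosome, position, ref, alt) tuples is itself a choice: differently normalized representations of the same indel would count as discordant, which is one reason indel concordance tends to look worse.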

Genome Medicine, 2014

Compared results from ANNOVAR and VEP (using ENSEMBL transcripts): annotations matched for only 65% of loss-of-function variants and 87% of all exonic variants

Why is testing challenging in bioinformatics?

•  Lack of rigorous review (compared to journal peer review)
•  We do not spend enough time and energy on testing
•  Misusing external software / components
   •  Often caused by not checking the limitations and scope of the software function
•  Over-reliance on ‘validation testing’, i.e. merely checking whether the results “make sense”
   •  Nonetheless, it is often hard to check exactly whether the output of a complex algorithm is correct
•  Often relying on a very small number of (simple) test cases
   •  A software fault may only show up for some inputs, so a failure may not be observed unless we search the input space widely
   •  We need to use diverse and realistic test cases

Main objectives

•  Understand the importance and challenges of software testing in bioinformatics

•  Understand basic concepts and techniques in software testing

•  Understand how we can implement QA in bioinformatics, especially in translational genomic applications

Software testing concepts and techniques

Some definitions in software testing

Error: A defect in the human thought process. Example: misspecification of the range of a variable.

Fault: The concrete manifestation of an error within the software. Examples: a bug, use of wrong parameters, an incorrect software dependency.

Failure: A departure of the operational software system's behaviour from user expectation.

Test case: An input and execution condition developed to verify that the program complies with a specific requirement.

Oracle: A mechanism to check the correctness of the result of any given test case.

Case study: testing a kNN classifier

Mary is a bioinformatician in a research lab. Her project studies whether promoter DNA sequences in yeast can distinguish among three groups of genes: highly expressed, highly repressed, and dynamically expressed. She wants to build a supervised classifier of promoter DNA sequences for her research. She downloaded a new R package that was described in a recent publication. The algorithm behind the main function of this package, kNN, is described in the paper and on the R ‘help’ page. The ‘example’ code runs successfully and seems to produce reasonable-looking outputs, even though she is not 100% sure what the expected outputs are.

What does this R package do?

Training sequences, with training class labels:
Class1, TCGATCGATCGGGGATTAGC
Class1, ACGATCGGGGACGAGCTACCCATG
Class1, CATCGATGGGCTAGCT
Class2, ATCGTGGGCTAGCTAGCCCCCC
Class2, GATGCTAAACGGGGATCGATCA
Class3, ACGTGGATCGAAAAAGCTAGC
Class3, GTAGATCGAAATCGATGCATCGAGC
Class3, ATGCTAGGGCTAGCTAC

Test sequences:
TGATGCGACGATCGATCGCATAC
ACGAGGGGCTAGCTACA
GCTAGCCCATCGATCTAGATCGAGCGATCGA
ACGTTGGCTAGCTACG

k-mer frequencies of the training sequences:
Class    AA AT AC AG TA TT …
Class1    8  2  4  5  2  1 …
Class1    3  5  1  6  7  2 …
Class1    0  2  3  5  2  1 …
Class2    2  9  4  0  9  1 …
Class2    1  4  0  3  0  5 …
Class3    8  1  1  5  1  1 …
Class3    6  0  3  7  7  7 …
Class3    5  2  7  1  6  2 …

k-mer frequencies of the test sequences:
AA AT AC AG TA TT …
 3  5  1  6  7  2 …
 0  2  3  5  2  1 …
 2  9  4  0  9  1 …
 6  0  3  7  7  7 …

[Figure: training instances (Class1, Class2, Class3) and the test data plotted in k-mer frequency space]

The label of the test data is determined by the most frequently occurring class among the k nearest training instances. If k = 3, the test data will be classified as Class3 in this example. If there is a tie for the most frequently occurring class, our classifier returns ‘Uncertain’.

Calculate the Euclidean distance (using k-mer frequencies) between the test data and all training data

Calculation of k-mer frequency
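A minimal Python sketch of the classifier described above, useful for experimenting with test cases. It is our own re-implementation of the described behaviour, not the R package's code. The function names and the `kmer_len` parameter are hypothetical, and we do not map them onto the package's `k`/`kk` arguments because the slides do not say which is which.

```python
import math
from collections import Counter
from itertools import product

def kmer_frequencies(seq, kmer_len=2):
    """Count every possible k-mer over A, C, G, T in seq, in a fixed (alphabetical) order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=kmer_len)]
    counts = Counter(seq[i:i + kmer_len] for i in range(len(seq) - kmer_len + 1))
    return [counts[km] for km in kmers]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def knn_classify(train_seqs, train_cls, test_seq, k=3, kmer_len=2):
    """Label test_seq by majority vote among its k nearest training sequences.

    Returns 'Uncertain' when two or more classes tie for the most votes.
    """
    test_vec = kmer_frequencies(test_seq, kmer_len)
    dists = [euclidean(test_vec, kmer_frequencies(s, kmer_len)) for s in train_seqs]
    nearest = sorted(range(len(train_seqs)), key=lambda i: dists[i])[:k]
    votes = Counter(train_cls[i] for i in nearest).most_common()
    if len(votes) > 1 and votes[0][1] == votes[1][1]:
        return "Uncertain"
    return votes[0][0]

# Sequences from the slide
train_seqs = ["TCGATCGATCGGGGATTAGC", "ACGATCGGGGACGAGCTACCCATG", "CATCGATGGGCTAGCT",
              "ATCGTGGGCTAGCTAGCCCCCC", "GATGCTAAACGGGGATCGATCA",
              "ACGTGGATCGAAAAAGCTAGC", "GTAGATCGAAATCGATGCATCGAGC", "ATGCTAGGGCTAGCTAC"]
train_cls = ["Class1"] * 3 + ["Class2"] * 2 + ["Class3"] * 3
test_seqs = ["TGATGCGACGATCGATCGCATAC", "ACGAGGGGCTAGCTACA",
             "GCTAGCCCATCGATCTAGATCGAGCGATCGA", "ACGTTGGCTAGCTACG"]
for t in test_seqs:
    print(t, knn_classify(train_seqs, train_cls, t, k=3))
```

The column order of the k-mer vector differs from the slide's table, but that does not affect Euclidean distances as long as training and test vectors use the same order.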


Correct implementation

What are good test cases? What is a good oracle for this program?

This version has a fault.

Correct: sqrt(sum((thisTest - thisTrain)^2))

This version has a fault.

Correct: k==kk
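The slides do not show the faulty lines. Purely for illustration, assume a hypothetical fault that forgets to square the differences in the distance. A special test case with a known answer (a 3-4-5 right triangle) exposes it, whereas the special case "the distance from a vector to itself is 0" does not, which is why the choice of special values matters.

```python
import math

def distance_correct(test_vec, train_vec):
    # Mirrors the slide's correct R expression: sqrt(sum((thisTest - thisTrain)^2))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(test_vec, train_vec)))

def distance_faulty(test_vec, train_vec):
    # Hypothetical fault: the differences are not squared
    return math.sqrt(sum((a - b) for a, b in zip(test_vec, train_vec)))

def passes_special_value_test(distance_fn):
    """Oracle: the points (3, 4) and (0, 0) are exactly 5 apart."""
    return math.isclose(distance_fn([3, 4], [0, 0]), 5.0)

print(passes_special_value_test(distance_correct))  # True
print(passes_special_value_test(distance_faulty))   # False: returns sqrt(7)
print(distance_faulty([1, 2], [1, 2]))              # 0.0, so this special case misses the fault
```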

[Figure: test cases are selected from the input space, which contains a region of failure-causing inputs; each test case is executed by the Program Under Test (PUT) and its output is verified by an oracle, giving either a successful test case or a detected failure. Stages: test case selection, test execution, output verification.]

1. Test case selection problem: how do we increase the chance of selecting a test case from the failure-causing inputs?

2. Oracle problem: how do we decide whether the output of any given test case is correct?
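A standard back-of-the-envelope calculation (not from the slides) shows why test case selection matters. If failure-causing inputs make up an unknown fraction θ of the input space, and n test cases are drawn independently and uniformly at random, the probability that at least one of them reveals a failure is

$$P(\text{failure detected}) = 1 - (1 - \theta)^{n}$$

For example, with θ = 0.001, n = 100 random test cases detect a failure with probability 1 − 0.999^100 ≈ 0.095, and even n = 1000 gives only ≈ 0.63.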

Can we learn from the software testing field?

A good software testing strategy should actively reveal as many faults as possible using a selected set of test cases.

Selection of test cases (input): Special test cases? Random test cases? Tests based on program flow? Tests based on failure patterns?

Test execution and automation: Which test should be executed first, and which next? When do we stop? Can we automate this process?

Verification of test cases: How do we check the correctness of the output of large and complex software, such as simulation programs or machine learning software?

Test reporting and documentation: Reporting testing for validation and verification.

Weyuker (1982) Computer Journal

Example: how to test sin(x)?

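Known values of the sin function: sin(0°) = 0, sin(30°) = 0.5

Suppose the program returns sin(29.8°) = 0.51234. This is incorrect: sin is increasing on [0°, 90°], so sin(29.8°) must be less than sin(30°) = 0.5. But what if it returns sin(29.8°) = 0.49876? Is that correct? How do I design test cases without knowing the implementation of the program? E.g., the program might use the Taylor series (x in radians)

$$\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots$$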

Three standard techniques

Special test cases / special value testing: Selected cases where the correct output is known (from external experimental validation or simulation)

N-version programming: Check concordance between multiple implementations or variants of the same initial specification

Check that the output is within the ‘expected range’ of values: Even though we cannot determine precisely what the value should be, it is often possible to determine an expected range of values that the output should fall in.


Three solutions

Solution 1: special test cases, such as x = 0, π/6, π/2, … Problem: only a small subset of inputs can be tested

Solution 2: N-version programming, compare the results of multiple independent versions of sin(x) Problem: what happens when an inconsistency is detected?

Solution 3: check the expected range, such as visual inspection of the plot of sin(x) Problem: not quantitative
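A minimal Python sketch applying the three solutions to a degree-based sin. The two versions, the tolerances and the input grid are our assumptions: `sin_v1` wraps the standard library, and `sin_v2` stands in for an independently written version based on the Taylor series sin(x) = x − x³/3! + x⁵/5! − … after reducing the angle.

```python
import math

def sin_v1(deg):
    """Version 1: the standard library's sine, with degrees converted to radians."""
    return math.sin(math.radians(deg))

def sin_v2(deg, n_terms=20):
    """Version 2: an independent Taylor-series implementation.

    The angle is first reduced to [-180°, 180°) so the truncated series stays accurate.
    """
    x = math.radians((deg + 180) % 360 - 180)
    return sum((-1) ** n * x ** (2 * n + 1) / math.factorial(2 * n + 1) for n in range(n_terms))

def passes_special_values(f):
    """Solution 1: inputs whose correct outputs are known exactly."""
    known = {0: 0.0, 30: 0.5, 90: 1.0}
    return all(math.isclose(f(x), y, abs_tol=1e-9) for x, y in known.items())

def n_version_disagreements(f, g, inputs, tol=1e-9):
    """Solution 2: inputs on which two independent versions disagree."""
    return [x for x in inputs if not math.isclose(f(x), g(x), abs_tol=tol)]

def out_of_range(f, inputs, slack=1e-12):
    """Solution 3: inputs whose output falls outside [-1, 1] (small slack for rounding)."""
    return [x for x in inputs if not (-1 - slack <= f(x) <= 1 + slack)]

inputs = [i * 0.7 for i in range(-600, 600)]  # -420° up to about 419°
print(passes_special_values(sin_v1), passes_special_values(sin_v2))
print(n_version_disagreements(sin_v1, sin_v2, inputs))
print(out_of_range(sin_v2, inputs))
```

Each check is partial on its own: a version returning 0.9 × sin(x) would stay in range and only be caught by the special values or the N-version comparison.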

Software testing concept: Special test cases

What about systems biology?

Compile a test suite of models that have been solved analytically or using numerical methods. Run each stochastic simulator many times, and check that the results do not deviate substantially from the analytical solution.
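The slides do not name a particular simulator or model. As a hedged sketch of the idea, the code below uses a toy model with a known analytical solution: first-order decay A → ∅ with rate c, whose expected count at time t is N0·e^(−ct). A Gillespie (SSA) simulator is run many times and its sample mean is compared with the analytical value. A hypothetical mutant that forgets to scale the propensity by the molecule count shows that the check can detect a fault; the threshold of 5 standard errors is an arbitrary choice.

```python
import math
import random

def ssa_decay(n0, rate, t_end, rng):
    """Gillespie SSA for first-order decay A -> (nothing); returns the count of A at t_end."""
    n, t = n0, 0.0
    while n > 0:
        t += rng.expovariate(rate * n)  # total propensity is rate * n
        if t > t_end:
            break
        n -= 1
    return n

def ssa_decay_faulty(n0, rate, t_end, rng):
    """Hypothetical mutant: the propensity is not scaled by the number of molecules."""
    n, t = n0, 0.0
    while n > 0:
        t += rng.expovariate(rate)
        if t > t_end:
            break
        n -= 1
    return n

def agrees_with_analytical(simulator, n0=100, rate=0.5, t_end=2.0, runs=2000, seed=1):
    """Compare the sample mean of many runs with the analytical mean n0 * exp(-rate * t_end)."""
    rng = random.Random(seed)
    samples = [simulator(n0, rate, t_end, rng) for _ in range(runs)]
    p = math.exp(-rate * t_end)                    # survival probability of one molecule
    expected_mean = n0 * p
    std_err = math.sqrt(n0 * p * (1 - p) / runs)   # the count at t_end is Binomial(n0, p)
    observed_mean = sum(samples) / runs
    return abs(observed_mean - expected_mean) <= 5 * std_err

print(agrees_with_analytical(ssa_decay))
print(agrees_with_analytical(ssa_decay_faulty))
```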

Software testing concept: N-version programming

How to determine the correctness of the program output?

Indeed, it is very hard to verify the correctness of any given output of these programs (if we knew the expected output, we would not need these programs in the first place!). Common techniques for dealing with the oracle problem:

Special test cases: Selected cases where the correct output is known (from external experimental validation or simulation)

N-version programming: Check concordance between multiple implementations or variants of the same initial specification

Expected range: Check that the output value is within the expected range


Metamorphic Testing

The sin function has the following property, among others: sin(x) = sin(x + 360°)

Execute the program with inputs x = 29.8° and x = 389.8°, and check that sin(29.8°) and sin(389.8°) are equal.

Key idea: multiple executions of the same program.
•  Identify the expected output of a program from previously executed test cases
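The periodicity relation can be checked without knowing the exact value of sin(29.8°). In the hedged Python sketch below, `sin_truncated_taylor` is a hypothetical faulty implementation (a truncated Taylor series with no reduction of the angle): it is accurate near 29.8° but not near 389.8°, so the relation exposes the fault. The tolerance is our assumption.

```python
import math

def sin_library(deg):
    """Sine of an angle in degrees, via the standard library."""
    return math.sin(math.radians(deg))

def sin_truncated_taylor(deg, n_terms=8):
    """Hypothetical faulty version: 8 Taylor terms and no reduction of the angle."""
    x = math.radians(deg)
    return sum((-1) ** n * x ** (2 * n + 1) / math.factorial(2 * n + 1) for n in range(n_terms))

def satisfies_periodicity(sin_fn, x_deg, tol=1e-9):
    """MR: the source output sin(x) and the follow-up output sin(x + 360°) must be equal."""
    source_output = sin_fn(x_deg)
    follow_up_output = sin_fn(x_deg + 360)
    return math.isclose(source_output, follow_up_output, abs_tol=tol)

print(satisfies_periodicity(sin_library, 29.8))           # True
print(satisfies_periodicity(sin_truncated_taylor, 29.8))  # False
```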

Chen et al (2009) BMC Bioinformatics

Core idea: Execute the same program multiple times with slightly modified inputs, so that the outputs can be checked against some expected properties

Different metamorphic relations (MRs) and test cases have different effectiveness

We tested a network simulator with real and simulated data using 10 Metamorphic Relations on the original and mutant programs

Chen et al (2009) BMC Bioinformatics

Advantages of Metamorphic Testing

•  We can use real data (instead of simulated data) as test cases as there is now a mechanism to verify the output

•  Usually quite easy to implement if you know some properties of the algorithm

•  Can be used in conjunction with other testing techniques, such as special test cases and N-version programming


An example MR for a kNN classifier

Source test case: Source input: Train.seq, Train.cls, OneTest.seq, k, kk

Source output: cls

Follow-up test case If cls != ‘Uncertain’:

•  Follow-up input: (Train.seq,OneTest.seq), (Train.cls, cls), OneTest.seq, k, kk

•  Expected follow-up output: cls

Another example MR for a kNN classifier

Source test case: Source input: Train.seq, Train.cls, OneTest.seq, k, kk

Source output: cls

Follow-up test case If cls != ‘Uncertain’:

•  Follow-up input: (Train.seq + duplicate all sequences from class cls), (Train.cls + cls), OneTest.seq, k, kk

•  Expected follow-up output: cls
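A hedged Python sketch of both kNN metamorphic relations above, run on randomly generated sequences. The classifier is our own minimal re-implementation of the described behaviour (2-mer frequencies, Euclidean distance, majority vote, ‘Uncertain’ on ties), not the R package; the `kk` argument is omitted, and the random data generator and all names are hypothetical. For this implementation both relations should hold for every test case, so any violation would signal a fault.

```python
import math
import random
from collections import Counter
from itertools import product

KMERS = ["".join(p) for p in product("ACGT", repeat=2)]  # all 16 dinucleotides

def kmer_vec(seq):
    """2-mer frequency vector of seq."""
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    return [counts[km] for km in KMERS]

def knn_classify(train_seqs, train_cls, test_seq, k=3):
    """Majority vote among the k nearest training sequences; 'Uncertain' on a tie."""
    tv = kmer_vec(test_seq)
    dists = [math.dist(tv, kmer_vec(s)) for s in train_seqs]
    nearest = sorted(range(len(train_seqs)), key=lambda i: dists[i])[:k]  # stable sort
    votes = Counter(train_cls[i] for i in nearest).most_common()
    if len(votes) > 1 and votes[0][1] == votes[1][1]:
        return "Uncertain"
    return votes[0][0]

def mr_add_test_instance(train_seqs, train_cls, test_seq, k=3):
    """MR: add OneTest.seq, labelled with its predicted class cls, to the training set."""
    cls = knn_classify(train_seqs, train_cls, test_seq, k)
    if cls == "Uncertain":
        return True  # the relation only applies when the source output is a class
    return knn_classify(train_seqs + [test_seq], train_cls + [cls], test_seq, k) == cls

def mr_duplicate_class(train_seqs, train_cls, test_seq, k=3):
    """MR: duplicate every training sequence of the predicted class cls."""
    cls = knn_classify(train_seqs, train_cls, test_seq, k)
    if cls == "Uncertain":
        return True
    extra = [s for s, c in zip(train_seqs, train_cls) if c == cls]
    return knn_classify(train_seqs + extra, train_cls + [cls] * len(extra), test_seq, k) == cls

def random_seq(rng, length):
    return "".join(rng.choice("ACGT") for _ in range(length))

rng = random.Random(0)
violations = 0
for _ in range(200):
    train = [random_seq(rng, rng.randint(15, 30)) for _ in range(9)]
    labels = [rng.choice(["Class1", "Class2", "Class3"]) for _ in train]
    test = random_seq(rng, rng.randint(15, 30))
    if not (mr_add_test_instance(train, labels, test) and mr_duplicate_class(train, labels, test)):
        violations += 1
print("MR violations:", violations)
```

Because the relations supply their own expected output, the same loop works unchanged with real promoter sequences in place of the random ones.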

Other useful tips for choosing good test cases

•  Test boundary values
•  Use diverse test cases
•  The order of execution of the test cases matters, because failure-causing inputs are generally clustered together in the input space


Failure pattern and test case diversity

Failure-causing pattern: fixed but unknown

[Figure: test cases placed in an input space containing a failure-causing pattern]

•  Can reduce the number of test cases by up to 50%
•  The challenge is to define a good distance measure
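The slide does not name the technique, but the idea of spreading test cases apart using a distance measure matches adaptive random testing (ART); that identification is our assumption. Below is a minimal sketch of one ART variant, fixed-size-candidate-set ART (FSCS-ART), over the unit square with Euclidean distance. Real bioinformatics inputs such as sequences would need a domain-specific distance, which is exactly the challenge noted above; the parameter values are arbitrary, and the sketch does not reproduce the 50% figure.

```python
import math
import random

def fscs_art(n_tests, n_candidates=10, dim=2, seed=0):
    """Fixed-size-candidate-set adaptive random testing over the unit hypercube [0, 1)^dim.

    Each new test case is the random candidate whose nearest previously selected
    test case is farthest away, which spreads test cases across the input space.
    """
    rng = random.Random(seed)

    def random_point():
        return tuple(rng.random() for _ in range(dim))

    selected = [random_point()]  # the first test case is purely random
    while len(selected) < n_tests:
        candidates = [random_point() for _ in range(n_candidates)]
        best = max(candidates, key=lambda c: min(math.dist(c, s) for s in selected))
        selected.append(best)
    return selected

def min_pairwise_distance(points):
    return min(math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:])

art_tests = fscs_art(30)
rng = random.Random(0)
random_tests = [(rng.random(), rng.random()) for _ in range(30)]
print(min_pairwise_distance(art_tests), min_pairwise_distance(random_tests))
```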

Kamali (2015) Biophysical Reviews

Quality assurance in translational bioinformatics

Motivation: Quality assurance of clinically-oriented bioinformatics pipelines for human genetic mutation identification

[Figure: take blood for DNA sequencing; identify genetic variants from hundreds of millions of short reads; use the results for genetic counseling and to inform treatment options]

Motivation – Clinical Guidelines about quality assurance

•  RCPA – Massively Parallel Sequencing Implementation Guidelines
   •  Aimed at diagnostic laboratories implementing next-generation sequencing
   •  “The validation study must establish the analytical validity of the bioinformatics pipeline in terms of being able to correctly detect sequence variants”
   •  “The laboratory must validate the entire bioinformatics pipeline as a whole, under the given operational environment”
•  …but the guidelines did not specify how.

Motivation – Many variant calling pipelines exist, but their results have low concordance…

J. O’Rawe, et al., “Low concordance of multiple variant-calling pipelines: practical implications for exome and genome sequencing,” Genome Med., vol. 5, no. 3, p. 28, Mar. 2013.

A genomic variant calling pipeline

A sequence with length ~3×10^9

~200 million sequence reads, each with length 100

~2 million variants

Vision: Validation and Quality Control on the cloud

Framework Overview

Tests

Test   Description
MR0    Deterministic output
MR1    Random permutation of input
MR2    Duplication of reads
MR3    Unmapped reads
MR4    Mapped reads
SI0    Simulated reads – no mutations
SI1    Simulated reads – mutations
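A hedged Python sketch of how SI0, SI1, MR0 and MR1 could be checked, using a hypothetical toy variant caller and an idealized read simulator (error-free, gap-free reads tiling the genome). A real pipeline involves alignment, base qualities and much more; MR2 to MR4 are not sketched because their expected relations depend on the pipeline's details.

```python
import random
from collections import Counter, defaultdict

def toy_variant_caller(reads, reference, min_alt_fraction=0.3, min_depth=2):
    """Hypothetical SNV caller for gap-free reads given as (start_position, sequence).

    Calls (position, ref_base, alt_base) where the most common non-reference base
    makes up at least min_alt_fraction of the read depth at that position.
    """
    pileup = defaultdict(Counter)
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pileup[start + offset][base] += 1
    calls = set()
    for pos, bases in pileup.items():
        depth = sum(bases.values())
        alts = [(count, base) for base, count in bases.items() if base != reference[pos]]
        if depth >= min_depth and alts:
            count, base = max(alts)  # deterministic tie-break on (count, base)
            if count / depth >= min_alt_fraction:
                calls.add((pos, reference[pos], base))
    return calls

def simulate_reads(genome, read_len=20, step=5):
    """Idealized simulator: error-free reads tiling the genome every `step` bases."""
    return [(s, genome[s:s + read_len]) for s in range(0, len(genome) - read_len + 1, step)]

rng = random.Random(42)
reference = "".join(rng.choice("ACGT") for _ in range(200))
alt = "A" if reference[50] != "A" else "C"
mutated = reference[:50] + alt + reference[51:]  # plant one SNV at position 50
reads = simulate_reads(mutated)
calls = toy_variant_caller(reads, reference)

shuffled = reads[:]
rng.shuffle(shuffled)

si0 = toy_variant_caller(simulate_reads(reference), reference) == set()  # no mutations, no calls
si1 = calls == {(50, reference[50], alt)}                                # planted SNV recovered
mr0 = toy_variant_caller(reads, reference) == calls                      # deterministic output
mr1 = toy_variant_caller(shuffled, reference) == calls                   # read order is irrelevant
print(si0, si1, mr0, mr1)
```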

Results


Cost: on-demand vs spot
•  9 × c3.8xlarge instances for 6 hours ($1.68/hr/instance on-demand)
•  76% saving using spot instances

On-demand: $90.72
Spot: $21.60
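The cost figures are internally consistent:

$$9 \times 6\ \text{h} \times \$1.68/\text{h} = \$90.72, \qquad 1 - \frac{\$21.60}{\$90.72} \approx 0.762 \approx 76\%$$

The spot total corresponds to an average of $21.60 / 54 instance-hours = $0.40/hr/instance.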

Summary

•  Understand the importance and challenges of software testing in bioinformatics

•  Understand basic concepts and techniques in software testing

•  Understand how we can implement QA in bioinformatics, especially in translational genomic applications

http://bioinformatics.victorchang.edu.au

Joint work with

Eleni Giannoulatou (Lab Head)

Michael Troup (RA)

Andrian Yang (PhD student)

Amir Kamali (MPhil student)
