Automated Software Transplantation

Earl T. Barr, Mark Harman, Yue Jia, Justyna Petke, Alexandru Marginean
CREST, University College London

Source: issta2015.cs.uoregon.edu/slides/barr-transplantation.pdf


Page 1: Automated Software Transplantation

Automated Software Transplantation


Earl T. Barr

Mark Harman

Yue Jia

Justyna Petke

Alexandru Marginean

CREST, University College London

Page 2: Automated Software Transplantation

Why Autotransplantation?

Video Player:
• Start from scratch
• Check open source repositories: ~100 players
• Why not handle H.264?

Page 3: Automated Software Transplantation

Related Work

Manual Code Transplants · In-Situ Code Reuse · Automatic Error Fixing · Autotransplantation

• Clone Detection, Code Migration, Dependence Analysis
• Feature Location, Code Salvaging, Feature Extraction
• In-Situ Code Reuse, Synchronising, Manual Transplants
• Automatic Replay, Copy-Paste

Page 4: Automated Software Transplantation

Related Work — In-Situ Code Reuse

Running Program → Debugger → Binary Organ

Miles et al.: In situ reuse of logically extracted functional components

Page 5: Automated Software Transplantation

Related Work — Automatic Error Fixing

Donors → Host

Sidiroglou-Douskos et al.: Automatic Error Elimination by Multi-Application Code Transfer

Page 6: Automated Software Transplantation

Related Work — Autotransplantation

Page 7: Automated Software Transplantation

Human Organ Transplantation


Page 8: Automated Software Transplantation

Automated Software Transplantation

Donor: organ O, organ entry (ENTRY), vein V, and the organ's test suite. Host: implantation point.

Manual work:
• Organ Entry
• Organ's Test Suite
• Implantation Point

Page 9: Automated Software Transplantation

μTrans

Donor + Host + Organ's Test Suite
→ Stage 1: Static Analysis
→ Stage 2: Genetic Programming
→ Stage 3: Organ Implantation
→ Host Beneficiary

Page 10: Automated Software Transplantation

Stage 1 — Static Analysis

From the donor, μScalpel extracts the organ O and its vein (starting at the organ entry) and builds a dependency graph; against the host H's implantation point, it builds a matching table.

Example:
• Stm: x = 10; → Decl: int x;
• Donor: int X → Host: int A, B, C
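To make the matching concrete: the donor statement x = 10; induces the declaration int x;, and the donor variable int X may be bound to any type-compatible host variable in scope, here A, B, or C. Below is a minimal C sketch of such a type-keyed candidate table; all names are illustrative, not μScalpel's internals.

#include <stdio.h>

/* Illustrative matching-table entry: one donor variable and the
 * type-compatible host variables it may be bound to. */
struct match_entry {
    const char *donor_var;    /* e.g. the donor's "X"        */
    const char *type;         /* the shared type, e.g. "int" */
    const char *host_vars[4]; /* candidate host bindings     */
};

int main(void) {
    /* Donor "int X" may map to host "int A", "int B", or "int C". */
    struct match_entry e = { "X", "int", { "A", "B", "C", NULL } };
    for (int i = 0; e.host_vars[i]; i++)
        printf("Donor %s %s -> Host %s %s\n",
               e.type, e.donor_var, e.type, e.host_vars[i]);
    return 0;
}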

Page 11: Automated Software Transplantation

Stage 2 — GP

Statement vector: S1 S2 S3 S4 S5 … Sn

Matching Table (Donor Variable ID → Host Variable ID set), e.g.:
• V1D → {V1H, V2H}
• V2D → {V3H, V4H, V5H}

Individual = variable matching + statements:
• M1: V1D → V1H
• M2: V2D → V4H
• Statements: S1 S7 S73

Genetic Programming

Algorithm 1 Generate the initial population P; the function choose returns an element from a set uniformly at random.
Input: V, the organ vein; SD, the donor symbol table; OM, the host type-to-variable map; Sp, the population size; v, the statements in an individual; m, the mappings in an individual.

1: P := ∅
2: for i := 1 to Sp do
3:     m, v := ∅, ∅
4:     for all sd ∈ SD do
5:         sh := choose(OM[sd])
6:         m := m ∪ {sd → sh}
7:     v := {choose(V)}
8:     P := P ∪ {(m, v)}
9: return P
variables used in the donor at LO. For each individual, Alg. 1 first uniformly selects a type-compatible binding from the host's variables in scope at the implantation point to each of the organ's parameters. We then uniformly select one statement from the organ, including its vein, and add it to the individual. The GP system records which statements have been selected and favours statements that have not yet been selected.
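A minimal C sketch of Alg. 1's initialisation, assuming an array-based representation; the types, sizes, and helper names are illustrative, not µScalpel's actual data structures.

#include <stdlib.h>

/* Sketch of Alg. 1: each individual binds every donor symbol to a
 * uniformly chosen type-compatible host variable and starts with one
 * uniformly chosen vein statement. */
enum { SP = 100, MAX_SYMBOLS = 16 };

typedef struct {
    int map[MAX_SYMBOLS]; /* m: donor symbol -> index of chosen host var */
    int stmt;             /* v: the one statement chosen from the vein V */
} individual;

static int choose(int n) { return rand() % n; } /* uniform in [0, n) */

/* om_size[sd] = |OM[sd]|, the number of type-compatible host variables
 * for donor symbol sd; vein_len = |V|. */
void init_population(individual P[SP], int n_symbols,
                     const int om_size[], int vein_len) {
    for (int i = 0; i < SP; i++) {
        for (int sd = 0; sd < n_symbols; sd++)
            P[i].map[sd] = choose(om_size[sd]);
        P[i].stmt = choose(vein_len);
    }
}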

Search Operators — During GP, µTrans applies crossover with a probability of 0.5. We define two custom crossover operators: fixed-two-points and uniform crossover. The fixed-two-points crossover is the standard fixed-point operator separately applied to the organ's map from host variables to organ parameters and to the statement vector, restricted to each vector's centre point. The uniform crossover operator produces only one offspring, whose host-to-organ map is the crossover of its parents' and whose V and O statements are the union of its parents'. The use of union here is novel: initially, we used conventional fixed-point crossover on the organ and vein statement vectors, but convergence was too slow; adopting union sped convergence, as desired. Fig. 4 shows an example of applying the crossover operators to the two individuals on the left.
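A minimal sketch of the union-based uniform crossover on the statement vectors; the set representation and names are illustrative, not µScalpel's actual internals.

/* The single offspring's vein and organ statements are the UNION of
 * its parents'; its host-to-organ variable map would be crossed over
 * separately (not shown). Statement IDs are kept duplicate-free. */
enum { MAX_STMTS = 256 };

typedef struct { int ids[MAX_STMTS]; int n; } stmt_set;

static int contains(const stmt_set *s, int id) {
    for (int i = 0; i < s->n; i++)
        if (s->ids[i] == id) return 1;
    return 0;
}

stmt_set crossover_union(const stmt_set *a, const stmt_set *b) {
    stmt_set child = *a;                 /* start from one parent     */
    for (int i = 0; i < b->n && child.n < MAX_STMTS; i++)
        if (!contains(&child, b->ids[i]))
            child.ids[child.n++] = b->ids[i]; /* add the other's rest */
    return child;
}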

After crossover, one of the two mutation operators is applied with a probability of 0.5. The first operator uniformly replaces a binding in the organ's map from host variables to its parameters with a type-compatible alternative. In our running example, say an individual currently maps the host variable curs to its N_init parameter. Since curs is not a valid array length in idct, the individual fails the organ test suite. Say the remap operator chooses to remap N_init. Since its type is int, the remap operator selects a new variable from among the int variables in scope at the insertion point, which include hit_eof, curs, tos, length, …. The correct mapping is N_init to length; if the remap operator selects it, the resulting individual will be more fit.

The second operator mutates the statements of the organ. First, it uniformly picks t, an offset into the organ's statement list. When adding or replacing, it first uniformly selects an index into the over-organ's statement array. To add, it inserts the selected statement at t in the organ's statement list; to replace, it overwrites the statement at t with the selected statement. In essence, the over-organ defines a large set of addition and replacement operations, one for each unique statement, weighted by the frequency of that statement's appearance in the over-organ. Fig. 5 shows an example of applying µTrans's mutation operators.
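A minimal sketch of the two mutation operators under the same illustrative representation; note that drawing uniformly from the over-organ's statement array (with duplicates) naturally weights each unique statement by its frequency of appearance there.

#include <stdlib.h>

/* Remap: rebind one organ parameter to another type-compatible host
 * variable chosen uniformly at random. */
void mutate_remap(int binding[], int n_params,
                  const int compatible[], int n_compatible) {
    int p = rand() % n_params;                /* parameter to remap  */
    binding[p] = compatible[rand() % n_compatible];
}

/* Statement mutation: pick offset t into the organ's statement list,
 * draw a statement from the over-organ array, then either insert it
 * at t (add) or overwrite the statement at t (replace). */
void mutate_stmts(int stmts[], int *n, int cap,
                  const int over_organ[], int n_over) {
    if (*n == 0) return;
    int t = rand() % *n;                      /* offset into organ   */
    int s = over_organ[rand() % n_over];      /* uniform over array  */
    if (rand() % 2 && *n < cap) {             /* add: insert at t    */
        for (int i = *n; i > t; i--) stmts[i] = stmts[i - 1];
        stmts[t] = s;
        (*n)++;
    } else {                                  /* replace at t        */
        stmts[t] = s;
    }
}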

At each generation, we select the top 10% most fit individuals (i.e. elitism) and insert them into the new generation. We use tournament selection to select 60% of the population for reproduction. Parents must be compilable; if the proportion of possible parents is less than 60% of the population, Alg. 1 generates new individuals. At the end of evolution, an organ that passes all the tests is selected uniformly at random and inserted into the host at the implantation point.
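One possible reading of this generational policy, as a hedged C sketch: keep the top 10% verbatim, breed via tournament when at least 60% of the population is compilable, and otherwise fall back to fresh Alg. 1 individuals. The exact top-up policy is not spelled out here, so this is only an approximation; all names are illustrative.

#include <stdlib.h>

enum { POP = 100 };

typedef struct { double fit; int compiles; } indiv;  /* illustrative */

static indiv fresh_individual(void) {          /* stand-in for Alg. 1 */
    indiv x = { 0.0, rand() % 2 };
    return x;
}

static indiv tournament(const indiv pop[POP]) {  /* binary tournament */
    const indiv *a = &pop[rand() % POP], *b = &pop[rand() % POP];
    return a->fit >= b->fit ? *a : *b;
}

static int by_fitness_desc(const void *a, const void *b) {
    double d = ((const indiv *)b)->fit - ((const indiv *)a)->fit;
    return (d > 0) - (d < 0);
}

void next_generation(indiv pop[POP], indiv next[POP]) {
    qsort(pop, POP, sizeof pop[0], by_fitness_desc);
    int compilable = 0;
    for (int i = 0; i < POP; i++) compilable += pop[i].compiles;
    int k = 0;
    while (k < POP / 10) { next[k] = pop[k]; k++; }    /* elitism     */
    while (k < POP)
        next[k++] = (compilable >= POP * 6 / 10)
                        ? tournament(pop)              /* reproduce   */
                        : fresh_individual();          /* refill      */
}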

Fitness Function — Let IC be the set of individuals that can be compiled. Let T be the set of unit tests used in GP, and let TX_i and TP_i be the sets of non-crashed tests and passed tests for individual i, respectively. Our fitness function is:

fitness(i) = (1/3) · (1 + |TX_i|/|T| + |TP_i|/|T|)  if i ∈ IC,  and  fitness(i) = 0  if i ∉ IC.   (1)

For the autotransplantation goal, a viable candidate must, at a minimum, compile. At the other extreme, a successful candidate passes all of T_D^O, the developer-provided test suite that defines the functionality we seek to transplant. These poles form a continuum; in between fall those individuals that execute tests to termination, even if they fail. Our fitness function therefore contains three equally weighted components: the first checks whether the individual compiles properly; the second rewards an individual for executing test cases to termination without crashing; and the last rewards an individual for passing tests in T_D^O.
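As a direct C rendering of Eq. (1); the struct and field names are illustrative, not µScalpel's internals.

/* fitness(i) per Eq. (1): 0 if the individual does not compile,
 * otherwise one third each for compiling, for the fraction of tests
 * that run to termination, and for the fraction of tests passed. */
typedef struct {
    int compiles;        /* 1 iff i is in IC                 */
    int tests_total;     /* |T|: unit tests used during GP   */
    int tests_no_crash;  /* |TX_i|: tests that did not crash */
    int tests_passed;    /* |TP_i|: tests that passed        */
} candidate;

double fitness(const candidate *c) {
    if (!c->compiles)
        return 0.0;
    double t = (double)c->tests_total;
    return (1.0 / 3.0) * (1.0 + c->tests_no_crash / t
                              + c->tests_passed / t);
}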

Implementation — Implemented in TXL and C, µScalpel realizes µTrans and comprises 28k SLoC, of which 16k is TXL [17] and 12k is C code. µScalpel inherits the limitations of TXL, such as its stack limit, which precludes parsing large programs, and its default C grammar's inability to properly handle preprocessor directives. As an optimisation, we inline all the function calls in the organ; inlining eases slice reduction, eliminating unneeded parameters, returns, and aliasing. To construct call graphs, we use GNU cflow, and inherit its limitations related to function pointers.

5. EMPIRICAL STUDY

This section explains the subjects, test suites, and research questions we address in our empirical evaluation of automated code transplantation as realized in our tool, µScalpel.

Subjects — We transplant code from five donors into three hosts. We used the following criteria to choose these programs. First, they had to be written in C, because µScalpel currently operates only on C programs. Second, they had to be popular, real-world programs that people use. Third, they had to be diverse. Fourth, the host is the system we seek to augment, so it had to be large and complex enough to present a significant transplantation challenge, while, fifth, the organ we transplant could come from anywhere, so the donors had to reflect a wide range of sizes. To meet these constraints, we perused GitHub, SourceForge, and GNU Savannah in August 2014, restricting our attention to popular C projects in different application domains.

Presented in Tab. 1, our donors include the audio streaming client IDCT, the simple archive utility MYTAR, GNU Cflow (which extracts call graphs from C source code), Webserver³ (which handles HTTP requests), the command-line encryption utility TuxCrypt, and the H.264 codec x264. Our hosts include Pidgin, GNU Cflow (which we use as both a donor and a host), SoX (a cross-platform command-line audio processing utility), and VLC, a media player. We use x264 and VLC in our case study in Sec. 7; we use the rest in Sec. 6. These programs are diverse: their application domains span chat, static analysis, sound processing, audio streaming, archiving, encryption, and a web server. The donors vary in size from 0.4–63k SLoC and the hosts are large, all greater …

³ https://github.com/Hepia/webserver

Fitness proxies:
• Does it compile?
• Weak Proxies: Does it execute test cases without crashing?
• Strong Proxies: Does it produce the correct output?

Page 12: Automated Software Transplantation

Research Questions

Host + Organ (from Donor):
• RQ1: Do we break the initial functionality? (Regression Tests)
• RQ2: Have we really added new functionality? (Acceptance Tests)
• RQ3: How about the computational effort?
• RQ4: Is autotransplantation useful?

Page 13: Automated Software Transplantation

Empirical Study

• RQ1: Do we break the initial functionality?
• RQ2: Have we really added new functionality?
• RQ3: How about the computational effort?
• RQ4: Is autotransplantation useful?

15 Transplantations · 300 Runs · 5 Donors · 3 Hosts

Case Study: H.264 Encoding Transplantation

Page 14: Automated Software Transplantation

Validation

The host beneficiary is validated with:
• Regression Tests → RQ1.1
• Augmented Regression Tests → RQ1.2
• Donor Acceptance Tests / Acceptance Tests → RQ2
• Manual Validation

Page 15: Automated Software Transplantation

Subjects

Subjects    Type   Size (KLOC)  Reg. Tests  Organ Test Suite
Idct        Donor  2.3          -           3-5
Mytar       Donor  0.4          -           4
Cflow       Donor  25           -           6-20
Webserver   Donor  1.7          -           3
TuxCrypt    Donor  2.7          -           4-5
Pidgin      Host   363          88          -
Cflow       Host   25           21          -
SoX         Host   43           157         -

Case Study:
x264        Donor  63           -           5
VLC         Host   422          27          -

• Minimal size: 0.4k
• Max size: 422k
• Average donor: 16k
• Average host: 213k


Page 16: Automated Software Transplantation

Experimental Methodology and Setup

μSCALPEL takes the donor (with the organ entry and the organ's test suite) and the host (with its implantation point), and produces the host beneficiary containing the organ.

• 20 runs per transplantation
• LOC counted with CLOC
• Timing measured with GNU Time
• Validation test suites
• Coverage information: Gcov
• Ubuntu 14.10, 16 GB RAM, 8 threads

Page 17: Automated Software Transplantation

Empirical Study RQ1, RQ2 — All Test Suites

Donor   Host    Passed   Regression  Regression++  Acceptance
Idct    Pidgin  16       20          17            16
Mytar   Pidgin  16       20          18            20
Web     Pidgin  0        20          0             18
Cflow   Pidgin  15       20          15            16
Tux     Pidgin  15       20          17            16
Idct    Cflow   16       17          16            16
Mytar   Cflow   17       17          17            20
Web     Cflow   0        0           0             17
Cflow   Cflow   20       20          20            20
Tux     Cflow   14       15          14            16
Idct    SoX     15       18          17            16
Mytar   SoX     17       17          17            20
Web     SoX     0        0           0             17
Cflow   SoX     14       16          15            14
Tux     SoX     13       13          13            14
TOTAL           188/300  233/300     196/300       256/300
                         (RQ1.1)     (RQ1.2)       (RQ2)


Passed All: 188/300 · Regression: 233/300 (RQ1.1) · Regression++: 196/300 (RQ1.2) · Acceptance: 256/300 (RQ2)

Page 18: Automated Software Transplantation

Empirical Study RQ3 — Timing Information

Execution Time (minutes):

Donor   Host
Idct    Pidgin   5    7    97
Mytar   Pidgin   3    1    65
Web     Pidgin   8    5   160
Cflow   Pidgin  58   16  1151
Tux     Pidgin  29   10   574
Idct    Cflow    3    5    59
Mytar   Cflow    3    1    53
Web     Cflow    5    2   102
Cflow   Cflow   44    9   872
Tux     Cflow   31   11   623
Idct    SoX     12   17   233
Mytar   SoX      3    1    60
Web     SoX      7    3   132
Cflow   SoX     89   53    74
Tux     SoX     34   13    94


Average: 334 min · Std. Dev.: 10 (average) · Total: 72 hours

Page 19: Automated Software Transplantation

Case Study RQ4 — Transplant Time & Test Suites

         Time (hours)  Regression  Regression++  Acceptance
H.264    26            100%        100%          100%


x264:
• MSU Sixth MPEG-4 AVC/H.264 Video Codecs Comparison, with ~24% better encoding than second place
• Second Annual MSU MPEG-4 AVC/H.264 Codecs Comparison
• Doom9's 2005 Codec Shoot-Out

Page 20: Automated Software Transplantation

Summary — Automated Software Transplantation

Donor D: organ O, organ entry (ENTRY), vein V, and the organ's test suite. Host H: implantation point.

Manual work:
• Organ Entry
• Organ's Test Suite
• Implantation Point

μTrans: Donor + Host + Organ's Test Suite → Stage 1: Static Analysis → Stage 2: Genetic Programming → Stage 3: Organ Implantation → Host Beneficiary

Validation: Regression Tests (RQ1.1) · Augmented Regression Tests (RQ1.2) · Donor Acceptance Tests / Acceptance Tests (RQ2) · Manual Validation → Host Beneficiary

