Test-Driven Reuse: Improving the Selection of Semantically Relevant Source Code

Mehrdad Nurolahzade

Department of Computer Science, University of Calgary

2 April 2014


Interface-based Retrieval

Research Questions

Q1: Is interface-based retrieval effective in large source code libraries?

Q2: Does including additional test facts improve selection?

Q3: Does including additional test facts improve approximate retrieval?


An Assessment of Test-Driven Reuse*

• 10 realistic test-driven reuse tasks

• Solutions were verified to be in the repositories.

• Qualitatively analyzed top 10 results

• Each tool managed to retrieve only one good solution.

*Mehrdad Nurolahzade, Robert J. Walker, Frank Maurer, "An Assessment of Test-Driven Reuse: Promises and Pitfalls". In Proceedings of 13th International Conference on Software Reuse (ICSR 2013), Pisa, Italy, June 18-20, 2013.

An Assessment of Test-Driven Reuse*

• Potential bugs in tool prototypes

• Interface-based retrieval fails in large repositories when keywords are very common or unknown.


Q1: Is interface-based retrieval effective in large source code repositories?

Reviver

Approach: Reviver

[Figure: Reviver workflow. Steps: Write, Extract, Similarity Search, Retrieve, Transform, Compile, Test, Display. Elements: Developer, Test Case, Facts, Similar Test Cases, System Under Test, Relevant Source Code, Transformed Source Code, Compiled Binary Code, Results. Also shown: Interface-based Search over the Interface of the System Under Test.]

Reviver: Heterogeneous Data Model


[Figure: the Test Indexer processes Test Cases x, y, and z. Fact types: Lexical Facts, Structural Facts, Data Flow Facts, Other Facts. Models: Lexical Model, Relational Model, Graph Model, New Model.]

Reviver: Multiple Representations


Lexical Facts

AccountTest, Account, from, to, Bank, bank, getInstance, register, testValidTransfer, fromBalance, getBalance, toBalance, getLastTransaction, Transaction, t, transfer

Structural Facts

Data Flow Facts
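The lexical facts listed on this slide are the identifiers that appear in a test case. The slides do not say how Reviver extracts them, so what follows is only a minimal sketch under stated assumptions. The class name `LexicalFacts`, the regular expression, and the short reserved-word list are hypothetical, and a real indexer would more likely use a Java parser than a regex.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects identifier tokens from a fragment of test code.
class LexicalFacts {
    // Illustrative subset of Java reserved words and literals (not exhaustive).
    private static final Set<String> RESERVED = Set.of(
        "public", "class", "void", "new", "double", "int",
        "return", "true", "false", "null");

    // Naive identifier pattern; it does not understand strings or comments.
    private static final Pattern IDENTIFIER =
        Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    /** Returns the distinct identifiers in order of first appearance. */
    static Set<String> extract(String source) {
        Set<String> facts = new LinkedHashSet<>();
        Matcher matcher = IDENTIFIER.matcher(source);
        while (matcher.find()) {
            String token = matcher.group();
            if (!RESERVED.contains(token)) {
                facts.add(token);
            }
        }
        return facts;
    }

    public static void main(String[] args) {
        // Statement taken from the approximate-match slide of this deck.
        String statement = "Transaction t = from.transfer(100.0, to);";
        System.out.println(extract(statement));
        // prints [Transaction, t, from, transfer, to]
    }
}
```

Each extracted identifier corresponds to one entry in a lexical fact list like the one above.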

Reviver: Federated Search

Lexical similarity (simLexical)

Reference similarity (simType)

Call-set similarity (simCall)

Data flow similarity (simDataFlow)

[Figure: a Test Case is queried against the Lexical Model, the Relational Model, and the Graph Model; the resulting similarity scores are combined (Σ) to produce the Results.]
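The Σ on this slide suggests that the four similarity scores are aggregated into one ranking score. The slides do not give the weighting, so this sketch assumes a plain unweighted sum. The class `FederatedScore`, the nested `Candidate` type, and all score values are hypothetical and for illustration only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of combining per-model similarity scores.
class FederatedScore {
    /** Similarity scores of one candidate test case against the query. */
    static final class Candidate {
        final String name;
        final double simLexical, simType, simCall, simDataFlow;

        Candidate(String name, double simLexical, double simType,
                  double simCall, double simDataFlow) {
            this.name = name;
            this.simLexical = simLexical;
            this.simType = simType;
            this.simCall = simCall;
            this.simDataFlow = simDataFlow;
        }

        /** Assumption: unweighted sum of the four similarity measures. */
        double total() {
            return simLexical + simType + simCall + simDataFlow;
        }
    }

    /** Returns a copy of the candidates sorted by descending total score. */
    static List<Candidate> rank(List<Candidate> candidates) {
        List<Candidate> ranked = new ArrayList<>(candidates);
        ranked.sort(Comparator.comparingDouble(Candidate::total).reversed());
        return ranked;
    }

    public static void main(String[] args) {
        List<Candidate> candidates = List.of(
            new Candidate("Test Case x", 0.5, 0.25, 0.0, 0.0),
            new Candidate("Test Case y", 0.25, 0.5, 0.5, 0.25),
            new Candidate("Test Case z", 0.0, 0.0, 0.25, 0.0));
        for (Candidate c : rank(candidates)) {
            System.out.println(c.name + " " + c.total());
        }
    }
}
```

With a sum, a candidate that matches poorly on names can still rank well if its call-set or data flow similarity is high.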

Evaluation: Exact Match Retrieval

• Ad hoc interface-based retrieval prototype

• Repository: seeded with a subset of Merobase

• Tasks: 10 trial tasks from the original study


Task   Interface-based Retrieval   Reviver
#1     3                           1
#2     1                           1
#3     1                           1
#4     1                           1
#5     13                          1
#6     30                          1
#7     2                           1
#8     2                           1
#9     1                           1
#10    1                           1


Q2: Does including additional test facts improve selection?

Rank of the correct result for each task

Evaluation: Approximate Match Retrieval

• Transformations generate variations of a query.

• Transformations can be combined (2^4 - 1 = 15 combinations).

Original:  Transaction t = from.transfer(100.0, to);

Name:      Transaction a = c.m2(100.0, b);

Type:      C2 t = from.transfer("100", to);

Scenario:  Transaction t = from.transfer(1.0, to);

Protocol:  int id = from.transfer(to, 100.0, true);
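The count of 15 is the number of non-empty subsets of the four transformations (Name, Type, Scenario, Protocol). A small sketch that enumerates them with a bitmask is shown here. The class name `TransformationCombos` and the single-letter labels are illustrative assumptions, and the order in which combinations were evaluated is not stated in the slides.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: enumerate all non-empty combinations of transformations.
class TransformationCombos {
    // (N)ame, (T)ype, (S)cenario, (P)rotocol
    static final String[] TRANSFORMATIONS = {"N", "T", "S", "P"};

    /** Returns one label per non-empty subset, e.g. "N", "NT", "NTSP". */
    static List<String> combinations() {
        List<String> result = new ArrayList<>();
        int n = TRANSFORMATIONS.length;
        // Each mask from 1 to 2^n - 1 selects a distinct non-empty subset.
        for (int mask = 1; mask < (1 << n); mask++) {
            StringBuilder label = new StringBuilder();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    label.append(TRANSFORMATIONS[i]);
                }
            }
            result.add(label.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> combos = combinations();
        System.out.println(combos.size()); // prints 15
        System.out.println(combos);
    }
}
```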


Q3: Does including additional test facts improve approximate retrieval?

Number of correct results for each transformation: (N)ame, (T)ype, (S)cenario, (P)rotocol

Conclusion

• Contributions
  – An evaluation of interface-based retrieval
  – A new paradigm for test-driven reuse
  – A multi-representation reuse library
  – The Reviver prototype
  – A technique for evaluating test-driven reuse

• Implications
  – How to detect semantic similarity in source code in the absence of lexical and type similarity?
  – Multi-representation reuse libraries are promising.