Test-Driven Reuse Improving the Selection of Semantically Relevant Code Mehrdad Nurolahzade [email protected] Department of Computer Science University of Calgary 2 April 2014

Test-Driven Reuse: Improving the Selection of Semantically Relevant Source Code


Page 1: Test-Driven Reuse: Improving the Selection of Semantically Relevant Source Code

Test-Driven Reuse: Improving the Selection of Semantically Relevant Code

Mehrdad Nurolahzade

Department of Computer Science, University of Calgary

2 April 2014

Page 2: Test-Driven Reuse: Improving the Selection of Semantically Relevant Source Code


Page 3: Test-Driven Reuse: Improving the Selection of Semantically Relevant Source Code


Interface-based Retrieval

Page 4: Test-Driven Reuse: Improving the Selection of Semantically Relevant Source Code

Research Questions

Q1: Is interface-based retrieval effective in large source code libraries?

Q2: Does including additional test facts improve selection?

Q3: Does including additional test facts improve approximate retrieval?


Page 5: Test-Driven Reuse: Improving the Selection of Semantically Relevant Source Code

An Assessment of Test-Driven Reuse*

• 10 realistic test-driven reuse tasks

• Solutions were verified to be in the repositories.

• Qualitatively analyzed top 10 results

• Each tool managed to retrieve only one good solution.

*Mehrdad Nurolahzade, Robert J. Walker, Frank Maurer, "An Assessment of Test-Driven Reuse: Promises and Pitfalls". In Proceedings of the 13th International Conference on Software Reuse (ICSR 2013), Pisa, Italy, June 18-20, 2013.


Page 6: Test-Driven Reuse: Improving the Selection of Semantically Relevant Source Code

An Assessment of Test-Driven Reuse*

• Potential bugs in tool prototypes

• Interface-based retrieval fails in large repositories when keywords are very common or unknown.


Q1: Is interface-based retrieval effective in large source code repositories?

Page 7: Test-Driven Reuse: Improving the Selection of Semantically Relevant Source Code

Approach: Reviver

[Figure: Reviver workflow diagram. Actors and artifacts: Developer, Test Case, Facts, Similar Test Cases, System Under Test, Relevant Source Code, Transformed Source Code, Compiled Binary Code, Results. Steps: Write, Extract, Similarity Search, Retrieve, Transform, Compile, Test, Display. Also shown: Interface-based Search, driven by the Interface of the System Under Test.]

Page 8: Test-Driven Reuse: Improving the Selection of Semantically Relevant Source Code

Reviver: Heterogeneous Data Model

[Figure: the Test Indexer extracts facts from each test case (Test Case x, y, z) and stores each kind of fact in its own model.]

Facts              Model
Lexical Facts      Lexical Model
Structural Facts   Relational Model
Data Flow Facts    Graph Model
Other Facts        New Model

Page 9: Test-Driven Reuse: Improving the Selection of Semantically Relevant Source Code

Reviver: Multiple Representations

Lexical Facts: AccountTest, Account, from, to, Bank, bank, getInstance, register, testValidTransfer, fromBalance, getBalance, toBalance, getLastTransaction, Transaction, t, transfer

Structural Facts: [figure]

Data Flow Facts: [figure]
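As a rough illustration of what lexical facts are, the sketch below collects the distinct identifiers of a test case. It is not Reviver's indexer: the test source is a hypothetical reconstruction built only from the identifiers listed on the slide (the argument list of `register` is unknown and left empty), and the regex tokenizer and stop-word list are simplifying assumptions.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LexicalFacts {
    // Assumption: a partial list of Java keywords to ignore; a real indexer would use a parser.
    private static final Set<String> STOP_WORDS = Set.of(
        "public", "class", "void", "double", "int", "new",
        "return", "true", "false", "null", "static", "final");

    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    /** Returns the distinct identifiers in the source, in order of first appearance. */
    public static Set<String> extract(String testSource) {
        Set<String> facts = new LinkedHashSet<>();
        Matcher m = IDENTIFIER.matcher(testSource);
        while (m.find()) {
            String token = m.group();
            if (!STOP_WORDS.contains(token)) {
                facts.add(token);
            }
        }
        return facts;
    }

    public static void main(String[] args) {
        // Hypothetical test case, reconstructed from the slide's identifier list.
        String test = String.join("\n",
            "public class AccountTest {",
            "  public void testValidTransfer() {",
            "    Bank bank = Bank.getInstance();",
            "    Account from = bank.register();",
            "    Account to = bank.register();",
            "    double fromBalance = from.getBalance();",
            "    double toBalance = to.getBalance();",
            "    Transaction t = from.transfer(100.0, to);",
            "    from.getLastTransaction();",
            "  }",
            "}");
        System.out.println(extract(test));
    }
}
```

The tokenizer would also pick up words inside string literals and comments, which is one reason a parser-based extractor is the more realistic choice.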

Page 10: Test-Driven Reuse: Improving the Selection of Semantically Relevant Source Code

Reviver: Federated Search

• Lexical similarity (simLexical)

• Reference similarity (simType)

• Call-set similarity (simCall)

• Data flow similarity (simDataFlow)

[Figure: the Test Case is searched against the Lexical Model, the Relational Model (shown twice), and the Graph Model; the four similarity scores are combined (Σ) to produce the ranked Results.]
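The combination step (Σ) can be sketched as follows. The slide does not state how the four scores are weighted or normalized, so this sketch assumes an unweighted sum of scores in [0, 1]; the class and field names are hypothetical and only mirror the slide's labels.

```java
import java.util.ArrayList;
import java.util.List;

public class FederatedRanking {
    /** Per-model similarity scores for one candidate test case (assumed to lie in [0, 1]). */
    public static final class Scores {
        public final String candidate;
        public final double simLexical;
        public final double simType;
        public final double simCall;
        public final double simDataFlow;

        public Scores(String candidate, double simLexical, double simType,
                      double simCall, double simDataFlow) {
            this.candidate = candidate;
            this.simLexical = simLexical;
            this.simType = simType;
            this.simCall = simCall;
            this.simDataFlow = simDataFlow;
        }

        /** Assumption: an unweighted sum; the slide only shows a summation symbol. */
        public double total() {
            return simLexical + simType + simCall + simDataFlow;
        }
    }

    /** Returns a copy of the candidates sorted by combined score, best first. */
    public static List<Scores> rank(List<Scores> candidates) {
        List<Scores> ranked = new ArrayList<>(candidates);
        ranked.sort((a, b) -> Double.compare(b.total(), a.total()));
        return ranked;
    }

    public static void main(String[] args) {
        // Made-up scores, for illustration only.
        List<Scores> ranked = rank(List.of(
            new Scores("TestA", 0.9, 0.1, 0.0, 0.0),
            new Scores("TestB", 0.2, 0.8, 0.7, 0.6)));
        for (Scores s : ranked) {
            System.out.println(s.candidate + " " + s.total());
        }
    }
}
```

With these made-up numbers, a candidate with little lexical overlap (TestB) still outranks a lexically similar one (TestA) because its structural and data flow scores are higher.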

Page 11: Test-Driven Reuse: Improving the Selection of Semantically Relevant Source Code

Evaluation: Exact Match Retrieval

• Ad hoc interface-based retrieval prototype

• Repository: seeded with a subset of Merobase

• Tasks: 10 trial tasks from the original study


Page 12: Test-Driven Reuse: Improving the Selection of Semantically Relevant Source Code

Q2: Does including additional test facts improve selection?

Rank of the correct result for each task

Task   Interface-based Retrieval   Reviver
#1     3                           1
#2     1                           1
#3     1                           1
#4     1                           1
#5     13                          1
#6     30                          1
#7     2                           1
#8     2                           1
#9     1                           1
#10    1                           1
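The ranks above can be summarized with a short tally. The counts below are derived here from the table and are not reported on the slides; the cut-offs (rank 1 and top 10) are chosen for illustration.

```java
public class RankSummary {
    // Ranks of the correct result per task, copied from the table above.
    static final int[] INTERFACE_BASED = {3, 1, 1, 1, 13, 30, 2, 2, 1, 1};
    static final int[] REVIVER = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1};

    /** Counts the tasks whose correct result is ranked at position k or better. */
    static int countWithin(int[] ranks, int k) {
        int count = 0;
        for (int rank : ranks) {
            if (rank <= k) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("Interface-based: rank 1 in " + countWithin(INTERFACE_BASED, 1)
            + " tasks, top 10 in " + countWithin(INTERFACE_BASED, 10));  // 5 and 8
        System.out.println("Reviver: rank 1 in " + countWithin(REVIVER, 1)
            + " tasks, top 10 in " + countWithin(REVIVER, 10));          // 10 and 10
    }
}
```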

Page 13: Test-Driven Reuse: Improving the Selection of Semantically Relevant Source Code

Evaluation: Approximate Match Retrieval

• Transformations generate variations of a query.

• Transformations can be combined: the four transformations yield 2^4 - 1 = 15 non-empty combinations.

Transformation   Example
(original)       Transaction t = from.transfer(100.0, to);
Name             Transaction a = c.m2(100.0, b);
Type             C2 t = from.transfer("100", to);
Scenario         Transaction t = from.transfer(1.0, to);
Protocol         int id = from.transfer(to, 100.0, true);
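The count of 15 follows from enumerating every non-empty subset of the four transformations. A minimal sketch using bit masks (the class and method names are hypothetical; only the four transformation names come from the slide):

```java
import java.util.ArrayList;
import java.util.List;

public class TransformationCombinations {
    static final String[] TRANSFORMATIONS = {"Name", "Type", "Scenario", "Protocol"};

    /** Enumerates all non-empty subsets of items; each bit of mask selects one item. */
    static List<List<String>> nonEmptyCombinations(String[] items) {
        List<List<String>> result = new ArrayList<>();
        // mask starts at 1 to skip the empty subset, giving 2^n - 1 combinations.
        for (int mask = 1; mask < (1 << items.length); mask++) {
            List<String> combo = new ArrayList<>();
            for (int i = 0; i < items.length; i++) {
                if ((mask & (1 << i)) != 0) {
                    combo.add(items[i]);
                }
            }
            result.add(combo);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> combos = nonEmptyCombinations(TRANSFORMATIONS);
        System.out.println(combos.size());  // 15
        for (List<String> combo : combos) {
            System.out.println(combo);
        }
    }
}
```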

Page 14: Test-Driven Reuse: Improving the Selection of Semantically Relevant Source Code

Q3: Does including additional test facts improve approximate retrieval?

Number of correct results for each transformation: (N)ame, (T)ype, (S)cenario, (P)rotocol

Page 15: Test-Driven Reuse: Improving the Selection of Semantically Relevant Source Code

Conclusion

• Contributions
  – An evaluation of interface-based retrieval
  – A new paradigm for test-driven reuse
  – A multi-representation reuse library
  – The Reviver prototype
  – A technique for evaluating test-driven reuse

• Implications
  – How to detect semantic similarity in source code in the absence of lexical and type similarity?
  – Multi-representation reuse libraries are promising.
