
Scalable Clone Detection and Elimination for Erlang Programs

Huiqing Li, Simon Thompson

University of Kent, Canterbury, UK


Overview

Erlang

Wrangler

Clone detection

Clone elimination

Case studies

Conclusions and future work

Erlang

• Dynamically typed functional programming language.

• Built-in support for concurrency, distribution and fault-tolerance.

• Some eccentricities: multiple binding occurrences, bound variables in patterns, multiple usages of atoms, side-effects, ...

%% Factorial in Erlang.
-module(fac).
-export([fac/1]).

fac(0) -> 1;
fac(N) when N > 0 -> N * fac(N-1).

Wrangler

Basic refactorings: structural, macro, process and test-framework related

Clone detection + removal

Improve module structure


Clone Detection


• The Wrangler clone detector

– Reports clone classes whose members are identical or similar

– No false positives

– High recall rate

– Scalable.

What is ‘identical’ code?

X+4        Y+5

Both have the form: variable+number

Identical if values of literals and variables are ignored, but binding structure is respected.
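The notion of ‘identical’ above can be sketched in a few lines of Erlang. This is only an illustration under stated assumptions: the tuple representation of expressions and the names identical_sketch, normalise/1 and identical/2 are hypothetical, and this is not Wrangler's annotated AST or its actual algorithm. Literals are replaced by a placeholder, and each variable is replaced by the index of its first occurrence, so that X+X and X+Y are still told apart.

```erlang
-module(identical_sketch).
-export([normalise/1, identical/2]).

%% Toy expression representation (an assumption for this sketch):
%%   {var, Name} | {lit, Value} | {op, Operator, Left, Right}

%% Replace every literal by a placeholder and every variable by the
%% index of its first occurrence, which preserves binding structure.
normalise(Expr) ->
    {Norm, _Env} = norm(Expr, #{}),
    Norm.

norm({lit, _Value}, Env) ->
    {{lit, '_'}, Env};
norm({var, Name}, Env) ->
    case maps:find(Name, Env) of
        {ok, Index} ->
            {{var, Index}, Env};
        error ->
            Index = maps:size(Env) + 1,
            {{var, Index}, Env#{Name => Index}}
    end;
norm({op, Op, Left, Right}, Env0) ->
    {NormLeft, Env1} = norm(Left, Env0),
    {NormRight, Env2} = norm(Right, Env1),
    {{op, Op, NormLeft, NormRight}, Env2}.

%% Two expressions count as identical when their normal forms coincide.
identical(E1, E2) ->
    normalise(E1) =:= normalise(E2).
```

With this sketch, X+4 and Y+5 compare as identical, while X+X and X+Y do not, because the second pair differs in how variables are shared.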

What is ‘similar’ code?

(X+3)+4        4+(5-(3*X))

Anti-unifier: X+Y

The anti-unification gives the (most specific) common generalisation.

Similarity = min( ||X+Y|| / ||(X+3)+4||, ||X+Y|| / ||4+(5-(3*X))|| )

where ||E|| denotes the size of expression E.
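A minimal sketch of anti-unification and the similarity score follows. It assumes a toy tuple representation of expressions, takes the size of an expression to be its number of nodes, and reads the score as the size of the generalisation divided by the size of each original expression; all three are assumptions made for illustration, and the module and function names are hypothetical rather than Wrangler's.

```erlang
-module(anti_unify_sketch).
-export([anti_unify/2, node_count/1, similarity/2]).

%% Toy expression representation (an assumption for this sketch):
%%   {var, Name} | {lit, Value} | {op, Operator, Left, Right}
%% Generalised positions are written {gen, N}.

anti_unify(E1, E2) ->
    {Gen, _Subst} = au(E1, E2, #{}),
    Gen.

%% Identical subterms generalise to themselves.
au(E, E, Subst) ->
    {E, Subst};
%% Same operator on both sides: keep it and generalise the operands.
au({op, Op, L1, R1}, {op, Op, L2, R2}, Subst0) ->
    {L, Subst1} = au(L1, L2, Subst0),
    {R, Subst2} = au(R1, R2, Subst1),
    {{op, Op, L, R}, Subst2};
%% Otherwise abstract the differing pair into a placeholder. The same
%% pair of subterms always maps to the same placeholder, which keeps
%% the generalisation as specific as possible.
au(E1, E2, Subst) ->
    case maps:find({E1, E2}, Subst) of
        {ok, N} ->
            {{gen, N}, Subst};
        error ->
            N = maps:size(Subst) + 1,
            {{gen, N}, Subst#{ {E1, E2} => N }}
    end.

%% Size of an expression, assumed here to be its number of nodes.
node_count({op, _Op, L, R}) -> 1 + node_count(L) + node_count(R);
node_count(_Leaf) -> 1.

similarity(E1, E2) ->
    G = node_count(anti_unify(E1, E2)),
    min(G / node_count(E1), G / node_count(E2)).
```

For the two expressions on the slide, this sketch produces a generalisation of the shape X+Y (two placeholders under a +). Under the node-count assumption the sizes are 3, 5 and 7, so the score is min(3/5, 3/7) = 3/7.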


Clone Detection

• Reports all clones in a project that meet the threshold parameters.

• Thresholds:

– minimum number of expressions,

– minimum number of tokens,

– minimum number of duplications,

– maximum number of new parameters, and

– minimum similarity score.
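As a rough illustration of how the five thresholds might act as a filter, the sketch below checks a summary of a clone class against a threshold map. The key names, the shape of the summary and the treatment of the thresholds as a plain conjunction are all assumptions for this example. The values 1, 40, 2, 4 and 0.8 are those of one run reported in these slides, assuming they are listed in the same order as the thresholds above.

```erlang
-module(threshold_sketch).
-export([defaults/0, meets/2]).

%% Hypothetical threshold names; values taken from one reported run.
defaults() ->
    #{min_exprs => 1,
      min_tokens => 40,
      min_duplications => 2,
      max_new_params => 4,
      min_similarity => 0.8}.

%% A clone class summary is assumed to be a map with the keys
%% exprs, tokens, duplications, new_params and similarity.
meets(Class, Thresholds) ->
    #{exprs := Exprs, tokens := Tokens, duplications := Dups,
      new_params := Params, similarity := Sim} = Class,
    #{min_exprs := MinExprs, min_tokens := MinTokens,
      min_duplications := MinDups, max_new_params := MaxParams,
      min_similarity := MinSim} = Thresholds,
    Exprs >= MinExprs andalso Tokens >= MinTokens andalso
        Dups >= MinDups andalso Params =< MaxParams andalso
        Sim >= MinSim.
```

Lowering the token minimum or raising the parameter maximum admits more, and typically smaller or looser, clone classes.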

Clone result with threshold values 1, 40, 2, 4, 0.8 (screenshot not included).

Clone result with threshold values 3, 20, 2, 2, 0.8 (screenshot not included).


Implementation


• Clone detection in an incremental way.

– Initial clone detection.

– Incremental clone detection.

• AST-based two-phase clone detection.

The Initial Detection Algorithm

Source Erlang programs → Serialised AAST → Hashed expression sequences → Initial clone candidates → Final clones

• Parse program, annotate and serialise AST (source Erlang programs to serialised AAST):

– Bypasses the Erlang pre-processor;

– Location information included in AST;

– Static semantic information added to AST;

– AAST traversed, and expression sequences collected.

• Generalise and hash expressions (serialised AAST to hashed expression sequences):

– Capture structural similarity between expressions while keeping a structural skeleton of the original;

– Replace certain subtrees with a placeholder, but only if sensible to do so;

– Each expression statement is hashed and mapped to an integer; therefore each expression sequence is mapped to a sequence of integers.

• Clone detection using generalised suffix tree (hashed expression sequences to initial clone candidates).

• Examination of clone candidates using anti-unification (initial clone candidates to final clones):

– Checks a candidate clone class for anti-unification, returning none, one or more clone classes;

– Generation of anti_unifier function;

– Generation of application instances.
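The generalise-and-hash step can be sketched as follows, under stated assumptions. Expressions use a toy tuple representation, every variable and literal is replaced by a placeholder (a cruder rule than "only if sensible to do so"), and a naive search over fixed-length windows stands in for the generalised suffix tree. The module and function names are hypothetical; only erlang:phash2/1 and the standard lists and maps functions are real library calls.

```erlang
-module(hash_seq_sketch).
-export([generalise/1, hash_sequence/1, repeated_windows/2]).

%% Toy expression representation (an assumption for this sketch):
%%   {var, Name} | {lit, Value} | {op, Operator, Left, Right}

%% Keep the operator skeleton, replace variables and literals.
generalise({var, _Name}) -> '_';
generalise({lit, _Value}) -> '_';
generalise({op, Op, L, R}) -> {op, Op, generalise(L), generalise(R)}.

%% Map an expression sequence to a sequence of integers.
hash_sequence(Exprs) ->
    [erlang:phash2(generalise(E)) || E <- Exprs].

%% Naive stand-in for the suffix tree: return the start positions of
%% every window of Len consecutive hashes that occurs more than once.
repeated_windows(Hashes, Len) ->
    Grouped = lists:foldl(
                fun({Window, Pos}, Acc) ->
                        maps:update_with(Window,
                                         fun(Ps) -> Ps ++ [Pos] end,
                                         [Pos], Acc)
                end,
                #{},
                windows(Hashes, Len, 1)),
    lists:sort([Ps || Ps <- maps:values(Grouped), length(Ps) > 1]).

windows(Hashes, Len, Pos) when Len > 0, length(Hashes) >= Len ->
    [{lists:sublist(Hashes, Len), Pos} | windows(tl(Hashes), Len, Pos + 1)];
windows(_Hashes, _Len, _Pos) ->
    [].
```

Equal hashes only indicate candidates: distinct skeletons can collide, and generalisation discards binding information, which is why a later examination step such as anti-unification is still needed.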

The Initial Detection Algorithm

• Designed with incremental clone detection in mind.

– Use relative locations: every function starts from location {1, 1};

– Intermediate information cached: AAST, static semantic information, hash information, clone table.
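One way to picture relative locations is the conversion below, which rebases an absolute {Line, Column} pair so that the first token of the enclosing function sits at {1, 1}. The function name and the exact column rule are assumptions made for this sketch; how Wrangler computes relative locations is not stated in these slides.

```erlang
-module(rel_loc_sketch).
-export([relative/2]).

%% relative(FunStart, Loc): both arguments are absolute {Line, Column}
%% pairs, with FunStart the location of the function's first token.
%% On the function's first line, columns are shifted as well; on later
%% lines only the line number changes.
relative({Line, FunCol}, {Line, Col}) when Col >= FunCol ->
    {1, Col - FunCol + 1};
relative({FunLine, _FunCol}, {Line, Col}) when Line > FunLine ->
    {Line - FunLine + 1, Col}.
```

The point of the design is that information cached for a function stays valid when the function merely moves within its file: an expression at {12, 9} in a function starting at {10, 5} and the same expression at {32, 9} after the function moves to {30, 5} both map to {3, 9}.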

The Incremental Detection Algorithm

• Follows the same steps as the initial detection algorithm, but reuses and incrementally updates the information cached from the previous run of the clone detection.

• Takes a function, instead of a file, as the unit for tracking changes.

• Tracks the change of clones, marking each clone class as new, unchanged, change+, change-, or change+-.
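The status labels can be illustrated by comparing the members of a clone class across two runs. The sketch assumes that a class from the previous run has already been matched to its counterpart in the current run, and it interprets + as "members added" and - as "members removed". Both the matching and this reading of the labels are assumptions, and the module name is hypothetical.

```erlang
-module(clone_status_sketch).
-export([status/2]).

%% status(OldMembers, NewMembers)
%% OldMembers is `undefined` when the class did not exist in the
%% previous run; otherwise both arguments are lists of members, for
%% example {Module, Function, Arity} tuples.
status(undefined, _New) ->
    new;
status(Old, New) ->
    OldSet = ordsets:from_list(Old),
    NewSet = ordsets:from_list(New),
    Added = ordsets:subtract(NewSet, OldSet),
    Removed = ordsets:subtract(OldSet, NewSet),
    case {Added, Removed} of
        {[], []} -> unchanged;
        {_, []} -> 'change+';
        {[], _} -> 'change-';
        {_, _} -> 'change+-'
    end.
```

Tracking changes per function rather than per file means that only the classes touching edited functions need to be re-examined in this way.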

Clone Elimination

• Fully automatic clone elimination is not desirable in practice.

– The clones to remove need to be chosen.

– The functionality of the clone needs to be examined.

– The anti-unification function of a clone class, and its parameters, need to be renamed.

– A host module for the anti-unification function needs to be selected.

Clone Elimination with Wrangler

• Copy and paste the anti_unification function into a suitable Erlang module.

• Modify the anti_unification function if necessary.

• Rename the function.

• Rename variables.

• Re-order function parameters.

• Apply ‘fold expressions against a function definition’ to the new function.
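The effect of these steps can be shown on a small made-up example. The functions below are invented for illustration and do not come from any of the case studies. Two functions share a similar expression sequence; the shared part is extracted into one function, renamed from whatever generic name the tool would generate to sum_with_surcharge/2, and the original bodies are folded into calls to it.

```erlang
-module(fold_sketch).
-export([total_price_before/1, total_weight_before/1,
         total_price_after/1, total_weight_after/1]).

%% Before: two clones that differ only in one literal.
total_price_before(Items) ->
    Sum = lists:sum([P || {_Name, P} <- Items]),
    Sum + Sum * 20 div 100.

total_weight_before(Items) ->
    Sum = lists:sum([W || {_Name, W} <- Items]),
    Sum + Sum * 5 div 100.

%% The common generalisation, after renaming the function and its
%% parameters by hand. The differing literal became a parameter.
sum_with_surcharge(Items, Percent) ->
    Sum = lists:sum([V || {_Name, V} <- Items]),
    Sum + Sum * Percent div 100.

%% After: each clone instance is folded into a call.
total_price_after(Items) -> sum_with_surcharge(Items, 20).

total_weight_after(Items) -> sum_with_surcharge(Items, 5).
```

The before and after versions compute the same results, which is the property a fold is expected to preserve. Choosing the name sum_with_surcharge and the parameter order is exactly the kind of decision that needs a person.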


Case Study 1


Incremental vs. Standalone Clone Detection


Case Study 2


SIP case study

Session Initiation Protocol

SIP message processing allows rewriting rules to transform messages.

SIP message manipulation (SMM) is tested by smm_SUITE.erl, 2658 LOC.


Clone detection


Reducing the case study

Step   LOC     Step   LOC     Step   LOC
1      2658    6      2218    11     2131
2      2342    7      2203    12     2097
3      2231    8      2201    13     2042
4      2217    9      2183    …      …
5      2216    10     2149
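The figures above can be summarised with a little arithmetic. The sketch below only uses the thirteen sizes listed on the slide, read as lines of code per step; the steps elided by "…" are not filled in, and the module and function names are hypothetical.

```erlang
-module(reduction_sketch).
-export([sizes/0, deltas/1, total_reduction/1]).

%% Sizes for steps 1 to 13 as listed on the slide.
sizes() ->
    [2658, 2342, 2231, 2217, 2216, 2218, 2203,
     2201, 2183, 2149, 2131, 2097, 2042].

%% Lines removed between consecutive steps (negative means growth).
deltas([A, B | Rest]) -> [A - B | deltas([B | Rest])];
deltas(_) -> [].

%% Overall reduction as {Lines, Percent} between first and last size.
total_reduction(Sizes) ->
    First = hd(Sizes),
    Last = lists:last(Sizes),
    {First - Last, 100 * (First - Last) / First}.
```

Over the listed steps the size drops by 616 lines, roughly 23% of the starting 2658. The largest single drop is the first (316 lines), and one step (5 to 6) increases the size by 2 lines, so the reduction is not strictly monotone.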


Case Study 3


Conclusions

• Efficient clone detection on medium-sized projects.

• Possible to improve code using these techniques, but only with expert involvement.

• A mechanism for clone detection to contribute to the daily reports from incremental nightly builds; case study for this with LambdaStream.


Future Work

• To extend the tool to detect expression sequences which are similar up to insertion or deletion of some expressions.

• To check client code against libraries.


http://www.cs.kent.ac.uk/projects/wrangler/

Thank you!