Introducing Parameter Sensitivity to Dynamic Code-Clone Analysis Methods

Toshihiro Kamiya
Interdisciplinary Graduate School of Sci. & Eng., Shimane Univ.
10th Int'l Workshop on Software Clones, Osaka


March 15, 2016

Outline
● What is a dynamic code-clone analysis?
  – Detection
  – Visualization
  – Samples
● Parameter sensitivity
  – Possible alternative techniques

[Position Paper] Toshihiro Kamiya, Introducing Parameter Sensitivity to Dynamic Code-Clone Analysis Methods, Proc. 10th International Workshop on Software Clones (IWSC 2016), pp. 19-20, 2016.


Dynamic code-clone analysis
● Definition:
  – Use dynamic information
    ● to detect code clones
    ● to visualize such code clones
● Aims/applications:
  – Detect code clones between a code fragment and its restructured (refactored) version
    ● Observe the evolution of code clones in clone management
  – Find code clones with similarity in deep semantics (or behavior)


Detection method
● Detection steps
  1. Collect execution trace(s) by running the target program(s)
  2. Find sub-sequences of similar method invocations
  3. Map such sub-sequences to code fragments

The details are described in: Toshihiro Kamiya, "An Execution-Semantic and Content-and-Context-Based Code-Clone Detection and Analysis," IWSC 2015, pp. 1-7 (Mar. 6, 2015).
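A minimal sketch of step 1 for Python targets, assuming a tracer built on the standard `sys.setprofile` hook; the actual tool's trace format is not described in these slides. Each record keeps the call depth (enough to rebuild a call tree) and the function's source file and first line (which step 3 could use to map nodes back to code fragments). `main` and `helper` are hypothetical target functions.

```python
import sys
from typing import Callable, List, Tuple

# One trace record: (call depth, function name, source file, first line number).
TraceEvent = Tuple[int, str, str, int]


def collect_trace(entry: Callable[[], object]) -> List[TraceEvent]:
    """Run `entry` and record each Python-level function call it makes, in order."""
    events: List[TraceEvent] = []
    depth = 0

    def profiler(frame, event, arg):
        nonlocal depth
        if event == "call":
            code = frame.f_code
            events.append((depth, code.co_name, code.co_filename, code.co_firstlineno))
            depth += 1
        elif event == "return":
            depth -= 1
        # "c_call"/"c_return"/"c_exception" events (built-ins) are ignored in this sketch.

    sys.setprofile(profiler)
    try:
        entry()
    finally:
        sys.setprofile(None)
    return events


# Hypothetical target program.
def helper(x):
    return x * 2


def main():
    return helper(1) + helper(2)


trace = collect_trace(main)
for depth, name, _, _ in trace:
    print("  " * depth + name)
# main
#   helper
#   helper
```

Built-in calls are skipped here for brevity; a fuller tracer would also record the `c_call` events so that library functions implemented in C appear in the trace.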


Detection method (implementation)
An implementation of step "2. Find sub-sequences of similar method invocations":
● Just one possible implementation; other data structures/algorithms could be used.
  2-1. Generate a call tree from the execution trace.
  2-2. For each node of the call tree, generate a string balloon (SB) data structure.
    – A string balloon includes:
      ● a target node,
      ● context (location): the path from the root to the target node,
      ● contents: the set of nodes called by the target (both directly and indirectly).
  2-3. Find sets of SBs having similar contents.
    ● With a frequent item-set mining algorithm (hyper cubic decomposition [Uno03])

[Uno03] T. Uno, et al., An Efficient Algorithm for Enumerating Closed Patterns in Transaction Databases, Discovery Science, LNCS 3245, pp. 16-31, 2003.

(Revised from the IWSC 2015 version)
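Steps 2-1 to 2-3 can be sketched as follows. This is a simplified stand-in, not the author's implementation: the input is a hypothetical list of (depth, name) call events in call order; the frequent item-set mining is replaced by a naive closed-itemset enumeration (every closed itemset is an intersection of transactions) rather than Uno's algorithm; and the rule that keeps only the lowest matching balloons is an assumption of this sketch. The method names in the example are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)


@dataclass(frozen=True)
class StringBalloon:
    target: str                 # name of the target node
    context: Tuple[str, ...]    # path from the root down to the target node
    contents: FrozenSet[str]    # names called by the target, directly or indirectly


def build_call_tree(events: List[Tuple[int, str]]) -> Node:
    """2-1: rebuild a call tree from (depth, name) events listed in call order."""
    root = Node("<root>")
    stack = [root]  # stack[d] is the open call whose children sit at depth d
    for depth, name in events:
        del stack[depth + 1:]  # pop calls that have already returned
        node = Node(name)
        stack[-1].children.append(node)
        stack.append(node)
    return root


def string_balloons(root: Node) -> List[StringBalloon]:
    """2-2: one string balloon per call-tree node, collected in post-order."""
    balloons: List[StringBalloon] = []

    def visit(node: Node, context: Tuple[str, ...]) -> FrozenSet[str]:
        called = set()
        for child in node.children:
            called.add(child.name)
            called |= visit(child, context + (child.name,))
        if context:  # the artificial root gets no balloon
            balloons.append(StringBalloon(node.name, context, frozenset(called)))
        return frozenset(called)

    visit(root, ())
    return balloons


def closed_itemsets(transactions: List[FrozenSet[str]]) -> Dict[FrozenSet[str], List[int]]:
    """Naive stand-in for 2-3: every closed itemset is an intersection of
    transactions, so grow that family to a fixpoint, then record the support."""
    family = {t for t in transactions if t}
    while True:
        new = {a & b for a, b in combinations(family, 2)} - family - {frozenset()}
        if not new:
            break
        family |= new
    return {s: [i for i, t in enumerate(transactions) if s <= t] for s in family}


def clone_classes(balloons: List[StringBalloon], min_support: int = 2,
                  min_items: int = 2) -> List[Tuple[List[str], List[Tuple[str, ...]]]]:
    """Group balloons whose contents share a closed itemset of called methods."""
    classes = []
    for itemset, support in closed_itemsets([b.contents for b in balloons]).items():
        members = [balloons[i] for i in support]
        # Assumed pruning rule: drop a member when one of its descendants
        # (a longer context with the same prefix) also contains the itemset.
        lowest = [m for m in members
                  if not any(len(o.context) > len(m.context)
                             and o.context[:len(m.context)] == m.context
                             for o in members)]
        if len(itemset) >= min_items and len(lowest) >= min_support:
            classes.append((sorted(itemset), [b.context for b in lowest]))
    return sorted(classes, key=lambda c: (len(c[0]), c[0]))


# Hypothetical trace: two formatting functions that call the same library path.
events = [(0, "main"),
          (1, "fmt_a"), (2, "highlight"), (3, "lex"), (3, "format"),
          (1, "fmt_b"), (2, "highlight"), (3, "lex"), (3, "format")]
for itemset, contexts in clone_classes(string_balloons(build_call_tree(events))):
    print(itemset, ["/".join(c) for c in contexts])
# ['format', 'lex'] ['main/fmt_a/highlight', 'main/fmt_b/highlight']
# ['format', 'highlight', 'lex'] ['main/fmt_a', 'main/fmt_b']
```

The naive enumeration can grow exponentially with the number of balloons, which is why the implementation above relies on an efficient miner [Uno03].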


Visualization method
● Code fragments (of a clone class)
  → "root" nodes of sub-graphs in the call graph
● Similarity
  → methods called commonly in the sub-graphs
● Differences
  → methods called solely in one sub-graph

[Figure: sample call graph with nodes main(), print_extensions_w_for_stmt(), print_extensions_w_map_func(), get_extensions(), print, map(), lambda() at line 8, and os.path.splitext()]
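The visualization rule above (methods called commonly mark similarity, methods called solely in one sub-graph mark differences) can be sketched as a set comparison plus a Graphviz DOT rendering. The fragment names and callee sets are hypothetical, and the blue/pink fill colors are an assumed rendering choice, not the tool's actual output.

```python
from typing import Dict, Set, Tuple


def compare_fragments(callees: Dict[str, Set[str]]) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Similarity: methods called by every fragment.
    Differences: methods called solely by one fragment."""
    common = set.intersection(*callees.values())
    solely = {}
    for root, called in callees.items():
        others = set().union(*(c for r, c in callees.items() if r != root))
        solely[root] = called - others
    return common, solely


def to_dot(callees: Dict[str, Set[str]]) -> str:
    """Render fragment roots and their callees as a Graphviz DOT digraph."""
    common, solely = compare_fragments(callees)
    lines = ["digraph clone_class {"]
    for root in sorted(callees):
        lines.append(f'  "{root}" [shape=box];')
        lines.extend(f'  "{root}" -> "{m}";' for m in sorted(callees[root]))
    # Shared callees in blue, callees unique to one fragment in pink.
    lines.extend(f'  "{m}" [style=filled, fillcolor=lightblue];' for m in sorted(common))
    for root in sorted(solely):
        lines.extend(f'  "{m}" [style=filled, fillcolor=pink];' for m in sorted(solely[root]))
    lines.append("}")
    return "\n".join(lines)


# Hypothetical sub-graph roots and the methods each one calls.
callees = {
    "fragment_a": {"load", "parse", "render"},
    "fragment_b": {"load", "parse", "cache_get"},
}
common, solely = compare_fragments(callees)
print(sorted(common))  # ['load', 'parse']
print(to_dot(callees))
```

Only direct callees are listed here for brevity; the same comparison applies to callees gathered from whole sub-graphs.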


A sample code clone – code fragments
● Applied to two CLI HTTP-client tools:
  – prog 1: https://github.com/chrislongo/HttpShell
  – prog 2: https://pypi.python.org/pypi/httpie
● Both take a URL as input and output HTML text.




● Both fragments call the same function: pygments.highlight()



Similar? Yes.

But why?


A sample code clone – call graph

[Figure: call graph of the sample clone. Node labels include 1../AnsiLogger/print_data, 2../ColorFormatter/format_body, 1.pygments.lexers//guess_lexer, 2../ColorFormatter/get_lexer, pygments//highlight, pygments//lex, pygments//format, pygments.lexer//__call__, pygments.lexer//streamer, pygments.lexers//_load_lexers, 1.pygments.formatters.terminal/TerminalFormatter/__init__, pygments.util//get_bool_opt, StringIO/StringIO/write, and re//_compile]

Because these method calls of guess_lexer() and get_lexer() have common contents.


● But this example is the best one from an experiment.
● In general practice, we are not always so lucky ...


A bad example from detection result
● Code fragments calling utility functions are sometimes detected as a code clone ☹
  – Code fragments of a clone class:
    ● cli.py (an entry point) from prog 2
    ● _get_proxy_info() from prog 1
    ● should_bypass_proxy() from prog 2
  – All of them call regular-expression and associative-array functions, i.e., utility functions
  – Results in a false positive: cli.py vs. the others
    (true positive: _get_proxy_info() and should_bypass_proxy())


An idea: parameter sensitivity
● The execution trace also includes the argument values of each method invocation
● Add argument value(s) to node labels
  – e.g., re//_compile.'[^A-Za-z0-9.]+' or re//_compile.'[^-]+' in place of re//_compile,
    to distinguish these calls of utility functions
● Need to introduce value semantics (may be challenging)
  – '[0-9]' == '\d' (when interpreted as regular expressions, e.g., with ASCII-only matching)
  – 0xff == 255
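A minimal sketch of parameter-sensitive node labels, assuming each recorded invocation carries its argument values; labels follow the slide's `name.'argument'` form by appending Python's `repr` of each argument. The two fragments are hypothetical. With plain labels the two `re//_compile` calls collapse into one shared node; with argument values they no longer share anything.

```python
from typing import FrozenSet, Iterable, Tuple

# A recorded invocation: (method label as in the call graph, argument values).
Call = Tuple[str, Tuple[object, ...]]


def node_label(call: Call, sensitive: bool) -> str:
    """Plain label, or label with argument values appended (parameter sensitivity)."""
    name, args = call
    if not sensitive:
        return name
    return name + "." + ",".join(repr(a) for a in args)


def contents(calls: Iterable[Call], sensitive: bool) -> FrozenSet[str]:
    """Contents set of a fragment, built from its (possibly parameterized) labels."""
    return frozenset(node_label(c, sensitive) for c in calls)


# Hypothetical fragments: each compiles a different regular expression.
fragment_a = [("re//_compile", ("[^A-Za-z0-9.]+",))]
fragment_b = [("re//_compile", ("[^-]+",))]

print(node_label(fragment_a[0], True))  # re//_compile.'[^A-Za-z0-9.]+'
for sensitive in (False, True):
    shared = contents(fragment_a, sensitive) & contents(fragment_b, sensitive)
    print(sensitive, sorted(shared))
# False ['re//_compile']
# True []
```

Using `repr` compares run-time values syntactically: 0xff and 255 already get the same label because both are the int 255 at run time, but two regular expressions that match the same strings still get different labels, which is where the value semantics above would be needed. In a `sys.setprofile`-based tracer, argument values can be read from `frame.f_locals` at the `"call"` event.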


Alternative techniques
● Threshold on the ratio of shared nodes
  – Yet another parameter of clone detection ☹
  – Depends on stack depth?
● Pre-defined, manual classification of "utility" functions ☹
  – Does not cover target code that uses new (unknown) libraries
● Considering the order of method invocations
  – e.g., the Smith-Waterman algorithm (applied to static clone detection in [Murakami13])
  – Yet another parameter of the tool ☹
  – Depends on the length of code fragments?

[Murakami13] H. Murakami, K. Hotta, Y. Higo, H. Igaki, Gapped Code Clone Detection with Lightweight Source Code Analysis, ICPC 2013, pp. 93-102, 2013.
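The three alternatives can be sketched over contents sets and invocation sequences. The Jaccard formula used for the shared-node ratio, the utility prefixes, the scoring parameters (match 2, mismatch -1, gap -1), and all example data are assumptions of this sketch; the last function is only the basic Smith-Waterman local alignment, not the gapped clone detector of [Murakami13].

```python
from typing import FrozenSet, Sequence, Tuple

# Hypothetical list of "utility" label prefixes (manual classification).
UTILITY_PREFIXES: Tuple[str, ...] = ("re//", "dict//")


def shared_ratio(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Ratio of shared nodes, |a & b| / |a | b| (Jaccard); 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def drop_utilities(contents: FrozenSet[str], prefixes: Tuple[str, ...]) -> FrozenSet[str]:
    """Remove labels that the manual classification marks as utility functions."""
    return frozenset(m for m in contents if not m.startswith(prefixes))


def smith_waterman(a: Sequence[str], b: Sequence[str],
                   match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local-alignment score of two method-invocation sequences (linear gaps)."""
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            score = max(0,
                        prev[j - 1] + (match if x == y else mismatch),  # diagonal
                        prev[j] + gap,                                   # gap in b
                        cur[j - 1] + gap)                                # gap in a
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


# Hypothetical contents: two fragments that share only utility calls.
a = frozenset({"re//_compile", "dict//get", "load_config", "parse_args"})
b = frozenset({"re//_compile", "dict//get", "open_socket"})
print(shared_ratio(a, b))                                 # 0.4
print(shared_ratio(drop_utilities(a, UTILITY_PREFIXES),
                   drop_utilities(b, UTILITY_PREFIXES)))  # 0.0

# Hypothetical invocation sequences; one extra call is bridged by a gap.
seq_a = ["open", "read", "parse", "close"]
seq_b = ["open", "log", "read", "parse", "close"]
print(smith_waterman(seq_a, seq_b))                       # 7
```

Each technique brings its own knob (a ratio threshold, a utility list, alignment scores), which is the drawback the slide marks with ☹.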


Summary
● A dynamic code-clone detection method
  – Based on frequent item-set mining of method invocations
● Utility functions (methods) cause false positives.
● Possible solutions/open questions
  – parameter sensitivity
  – a threshold on the ratio of shared nodes
  – manual classification of "utility" functions
  – the order of method invocations


Another bad example


● format_headers() of prog 2
● print_data() of prog 1
