Introducing Parameter Sensitivity to Dynamic Code-Clone Analysis Methods
Toshihiro Kamiya
Interdisciplinary Graduate School of Sci. & Eng., Shimane Univ.
March 15, 2016, 10th Int'l Workshop on Software Clones, Osaka
Outline
● What is a dynamic code-clone analysis?
– Detection
– Visualization
– Samples
● Parameter sensitivity
– Possible alternative techniques
[Position Paper] Toshihiro Kamiya, Introducing Parameter Sensitivity to Dynamic Code-Clone Analysis Methods, Proc. 10th International Workshop on Software Clones (IWSC 2016), pp. 19-20, 2016.
Dynamic code-clone analysis
● Definition:
– Use dynamic information:
  ● To detect code clones
  ● To visualize such code clones
● Aims/applications:
– Detect code clones between a code fragment and its restructured (refactored) one
  ● Observe evolution of code clones in clone management
– Find code clones w/ similarity in deep semantics (or behavior)
Detection method
● Detection steps
1. Collect execution trace(s) by running target program(s)
2. Find sub-sequences of similar method invocations
3. Map such sub-sequences into code fragments
The details are described in:
Toshihiro Kamiya, "An Execution-Semantic and Content-and-Context-Based Code-Clone Detection and Analysis," IWSC 2015, pp. 1-7 (Mar. 6, 2015).
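Step 1 can be illustrated with a minimal Python sketch. It is not the tool used in the paper: it assumes the target is a Python callable and uses the standard `sys.setprofile` hook to record each Python-level call with its stack depth. The names `collect_trace`, `helper` and `fragment` are hypothetical.

```python
import sys

def collect_trace(func, *args, **kwargs):
    """Run func and record (depth, function name) for every Python-level call."""
    trace = []
    depth = 0

    def profiler(frame, event, arg):
        nonlocal depth
        if event == "call":          # a Python function is entered
            trace.append((depth, frame.f_code.co_name))
            depth += 1
        elif event == "return":      # a Python function is left
            depth -= 1
        # c_call / c_return events (built-ins) are ignored in this sketch

    previous = sys.getprofile()
    sys.setprofile(profiler)
    try:
        func(*args, **kwargs)
    finally:
        sys.setprofile(previous)     # always restore the previous hook
    return trace

# Hypothetical target program.
def helper(x):
    return x + 1

def fragment():
    return helper(1) + helper(2)

trace = collect_trace(fragment)
print(trace)  # [(0, 'fragment'), (1, 'helper'), (1, 'helper')]
```

A real trace would also need module and class names (and, for parameter sensitivity, argument values), which this sketch leaves out.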
Detection method (implementation)
An implementation of step "2. Find sub-sequences of similar method invocations"
● Just AN implementation. Could utilize other data structures/algorithms
2-1. Generate call tree from execution trace.
2-2. For each node of call tree, generate a SB data structure.
– String balloon incl.
  ● A target node
  ● Context (Location): path from root to the target node
  ● Contents: Set of nodes called by the target (both direct and indirect)
2-3. Find sets of SB having similar contents.
● With a frequent item-set mining algorithm (hyper cubic decomposition [Uno03])
[Uno03] T. Uno, et al., An Efficient Algorithm for Enumerating Closed Patterns in Transaction Databases, Discovery Science, LNCS 3245, pp. 16-31, 2003.
Revised from IWSC15's
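Steps 2-1 to 2-3 can be sketched as follows, under loud assumptions. The trace is taken to be a list of `(depth, label)` pairs in call order. The SB is modeled as a plain dict. Most importantly, step 2-3 is replaced by a naive pairwise set intersection with a minimum size, which is a stand-in for illustration and not the closed-pattern mining algorithm of [Uno03]. All function names, labels and the `min_shared` parameter are hypothetical.

```python
from itertools import combinations

def string_balloons(trace):
    """Build one SB per call-tree node from a list of (depth, label) pairs."""
    balloons = []
    stack = []  # indices of the balloons on the current call path
    for depth, label in trace:
        del stack[depth:]                       # return to the caller at this depth
        for i in stack:
            balloons[i]["contents"].add(label)  # direct or indirect callee
        balloons.append({
            "target": label,
            "context": tuple(balloons[i]["target"] for i in stack),
            "ancestors": tuple(stack),
            "contents": set(),
        })
        stack.append(len(balloons) - 1)
    return balloons

def similar_pairs(balloons, min_shared=2):
    """Naive stand-in for step 2-3: pairs of SBs sharing >= min_shared callees."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(balloons), 2):
        if i in b["ancestors"]:
            continue  # a caller trivially contains its callee's contents
        shared = a["contents"] & b["contents"]
        if len(shared) >= min_shared:
            pairs.append((a["target"], b["target"], frozenset(shared)))
    return pairs

# Hypothetical trace: f and g reach the same callees, g via a wrapper.
trace = [
    (0, "main"),
    (1, "f"), (2, "open"), (2, "parse"),
    (1, "g"), (2, "wrap"), (3, "open"), (3, "parse"),
]
for pair in similar_pairs(string_balloons(trace)):
    print(pair)
```

Here `f` is reported as similar to both `g` and `wrap`, because the contents are sets of direct and indirect callees and the extra wrapper level does not change them.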
Visualization method
● Code fragments (of a clone class)
→ "root" nodes of sub-graphs in call graph
● Similarity
→ Methods called commonly in the sub-graphs
● Differences
→ Methods called solely in a sub-graph
[Figure: call graph with nodes main(), print_extensions_w_for_stmt(), print_extensions_w_map_func(), get_extensions(), print, map(), lambda() at line 8, os.path.splitext()]
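The similarity/difference split used by the visualization can be sketched with plain set operations. The node names below come from the figure on this slide, but its edges are not recoverable here, so the contents assigned to each root are assumptions made only for illustration. `compare_fragments` is a hypothetical helper.

```python
def compare_fragments(contents_by_root):
    """Split callee sets into those common to all roots and those unique to one."""
    all_sets = list(contents_by_root.values())
    common = set.intersection(*all_sets)
    solely = {}
    for root, contents in contents_by_root.items():
        others = set().union(
            *(c for r, c in contents_by_root.items() if r != root)
        )
        solely[root] = contents - others   # called in this sub-graph only
    return common, solely

# Assumed contents (direct and indirect callees) of the two root nodes.
contents_by_root = {
    "print_extensions_w_for_stmt": {
        "get_extensions", "os.path.splitext", "print",
    },
    "print_extensions_w_map_func": {
        "get_extensions", "map", "lambda", "os.path.splitext", "print",
    },
}

common, solely = compare_fragments(contents_by_root)
print(sorted(common))
print({root: sorted(names) for root, names in solely.items()})
```

Under these assumed contents, the common part is `get_extensions`, `os.path.splitext` and `print`, and only the map-based fragment has calls of its own (`map` and `lambda`).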
A sample code clone – code fragments
Applied to two CLI HTTP-client tools
– prog 1: https://github.com/chrislongo/HttpShell
– prog 2: https://pypi.python.org/pypi/httpie
Inputs URL, outputs HTML text.
Both code fragments call the same function: pygments.highlight()
Similar? - Yes.
But why?
A sample code clone – call graph
[Figure: call graph. Labeled nodes: 1../AnsiLogger/print_data, 2../ColorFormatter/format_body, 2../ColorFormatter/get_lexer, 1.pygments.lexers//guess_lexer, 1.pygments.formatters.terminal/TerminalFormatter/__init__, pygments//highlight, pygments//lex, pygments//format, pygments.lexer//__call__, pygments.lexer//streamer, pygments.lexers//_load_lexers, pygments.util//get_bool_opt, StringIO/StringIO/write, re//_compile]
Because these method calls of guess_lexer() and get_lexer() have common contents.
● But this example is the best one in an experiment.
● Not always so lucky in general practice ...
A bad example from detection result
● Code fragments calling utility functions are sometimes detected as a code clone ☹
– Code fragments of a clone class
  ● cli.py (an entry point) from prog 2
  ● _get_proxy_info() from prog 1
  ● should_bypass_proxy() from prog 2
– Calling functions of regular exp. and assoc. array, i.e. utility functions
– Results in a false positive: cli.py and others
  (True positive: _get_proxy_info() and should_bypass_proxy())
An idea: Parameter sensitivity
● Execution trace also includes argument values of each method invocation
● Add argument value(s) to node labels
– re//_compile.'[^A-Za-z0-9.]+' or
– re//_compile.'[^-]+' in place of re//_compile
to distinguish these calls of utility functions.
● Need to introduce value semantics (may be challenging)
– '[0-9]' == '\d' (when interpreted as regular exp.)
– 0xff == 255
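A minimal sketch of parameter-sensitive labels, assuming the trace records each argument as source-like text. `canonical` and `sensitive_label` are hypothetical helpers, and the `name.argument` label format only mirrors the examples on this slide. The sketch handles the easy half of value semantics (literals such as `0xff` and `255`) with the standard `ast.literal_eval`. It makes no attempt at deciding whether two regular expressions are equivalent, which is the challenging part.

```python
import ast

def canonical(arg_text):
    """Return a canonical text form of an argument recorded as text."""
    try:
        # Literals that denote the same value get the same repr,
        # e.g. "0xff" and "255" both become "255".
        return repr(ast.literal_eval(arg_text))
    except (ValueError, SyntaxError, TypeError):
        return arg_text  # not a literal: keep the recorded text as-is

def sensitive_label(name, arg_texts):
    """Append canonical argument values to a node label."""
    return ".".join([name] + [canonical(a) for a in arg_texts])

# Two calls of the same utility function now get different labels.
print(sensitive_label("re//_compile", ["'[^A-Za-z0-9.]+'"]))
print(sensitive_label("re//_compile", ["'[^-]+'"]))

# Integer literals with the same value get the same label.
print(sensitive_label("f", ["0xff"]) == sensitive_label("f", ["255"]))  # True

# Equivalent regular expressions are NOT unified by this sketch.
print(sensitive_label("re//_compile", ["'[0-9]'"])
      == sensitive_label("re//_compile", ["r'\\d'"]))  # False
```

With such labels the contents of two SBs only overlap when a utility function is called with the same argument values, which is the effect the slide aims for.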
Alternative techniques
● Threshold on ratio of shared nodes
– Yet another parameter on clone detection ☹
  ● Depends on stack depth?
● Pre-defined, manual classification of "Utility" functions ☹
– Problematic when target code includes new (unknown) libraries
● Considering order of method invocations
– Such as Smith-Waterman algorithm (applied to static clone detection in [Murakami13])
– Yet another parameter of tool ☹
  ● Depends on length of code fragments?
[Murakami13] H. Murakami, K. Hotta, Y. Higo, H. Igaki, Gapped Code Clone Detection with Lightweight Source Code Analysis, ICPC 2013, pp. 93-102, 2013.
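Two of these alternatives can be sketched briefly. The slide does not define the "ratio of shared nodes", so the sketch assumes a Jaccard ratio over callee sets. The order-aware variant is a textbook Smith-Waterman local alignment score over sequences of method names with a linear gap penalty. The scoring parameters and the example sequences are arbitrary assumptions, and this is not the gapped detection method of [Murakami13].

```python
def shared_ratio(contents_a, contents_b):
    """Assumed definition: |A & B| / |A | B| (Jaccard) over callee sets."""
    union = contents_a | contents_b
    return len(contents_a & contents_b) / len(union) if union else 0.0

def local_alignment_score(seq_a, seq_b, match=2, mismatch=-1, gap=-1):
    """Best Smith-Waterman local alignment score of two invocation sequences."""
    prev = [0] * (len(seq_b) + 1)
    best = 0
    for a in seq_a:
        cur = [0]
        for j, b in enumerate(seq_b, start=1):
            diag = prev[j - 1] + (match if a == b else mismatch)
            # Scores never drop below 0, which makes the alignment local.
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best

# Hypothetical invocation sequences.
seq_a = ["open", "parse", "format", "write"]
seq_b = ["open", "parse", "log", "format", "write"]   # one extra call
seq_c = ["write", "format", "parse", "open"]          # same calls, reversed

print(shared_ratio(set(seq_a), set(seq_c)))    # 1.0
print(local_alignment_score(seq_a, seq_b))     # 7
print(local_alignment_score(seq_a, seq_c))     # 2
```

The reversed sequence is indistinguishable by the set-based ratio but scores low under alignment. Both approaches still need a cutoff value, which is the extra tool parameter the slide is concerned about.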
Summary
● A dynamic code-clone detection
– Based on frequent item-set mining of method invocations
● Utility functions (methods) cause false positives.
● Possible solutions/open questions
– parameter sensitivity,
– threshold on ratio of shared nodes,
– manual classification of "Utility" functions,
– order of method invocations
Another bad example
● format_headers() of prog2
● print_data() of prog1