2002/12/11 PROFES2002 1
On software maintenance process improvement based on code clone analysis
Yoshiki Higo*, Yasushi Ueda*, Toshihiro Kamiya**, Shinji Kusumoto*, Katsuro Inoue*
*Graduate School of Information Science and Technology, Osaka University
**PRESTO, Japan Science and Technology Corporation
2002/12/11 PROFES2002 2
Contents
Background: Code Clone
Objective
Code Clone Detection Tool: CCFinder
Proposed Clone Removal Technique
Case Studies
Summaries and Future Works
2002/12/11 PROFES2002 3
Background: Code Clone
A code clone is a code portion in source files that is identical or similar to another.
[Figure: illustration of a Clone Pair and a Clone Class.]
Code clones are one of the factors that make software maintenance more difficult.
If a fault is found in a code clone, the same fault must be corrected in all of its clones.
2002/12/11 PROFES2002 4
Code Clone Detection Tool: CCFinder
We have been developing a code clone detection tool, CCFinder.
We have delivered CCFinder to software companies and evaluated its usefulness through several case studies.
2002/12/11 PROFES2002 5
Case studies of CCFinder
Open source software
  JDK libraries (Java, 570 KLOC)
  Linux, FreeBSD (C, 1.6 + 1.3 MLOC)
  FreeBSD, OpenBSD, NetBSD (C)
  Qt (C++, 240 KLOC)
Commercial software (about 30 companies)
  NTT Data Corp., Hitachi Ltd., Hitachi GP, NEC soft Ltd., ASTEC Inc., SRA Inc., NASDA, Daiwa Computer, etc.
Student exercises at Osaka University
Filed in a court as evidence in a software copyright suit
2002/12/11 PROFES2002 6
Purpose of our research
As a practical application of CCFinder, we want to use code clone analysis in the refactoring process.
However, the code clones detected by CCFinder are sequences of tokens, so they are often not appropriate for direct replacement by a single module (subroutine, function, etc.).
2002/12/11 PROFES2002 7
Objective
We propose a method that extracts, from the code clones detected by CCFinder, those that are well suited to the refactoring process ([Extract Method], [Pull Up Method])*.
We apply the proposed method to some open source software and evaluate its usefulness.
*M. Fowler: Refactoring: Improving the Design of Existing Code, Addison-Wesley, 1999.
2002/12/11 PROFES2002 8
Extract Method
Before:
void methodA(int i) {
    methodZ();
    System.out.println("name:" + name);
    System.out.println("amount:" + i);
}
void methodB(int i) {
    methodY();
    System.out.println("name:" + name);
    System.out.println("amount:" + i);
}
After:
void methodA(int i) {
    methodZ();
    methodC(i);
}
void methodB(int i) {
    methodY();
    methodC(i);
}
void methodC(int i) {
    System.out.println("name:" + name);
    System.out.println("amount:" + i);
}
2002/12/11 PROFES2002 9
Pull Up Method
Before: class B and class C, both subclasses of class A, each define method A.
After: method A is defined once in class A.
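The Pull Up Method diagram can be sketched in Java as follows. This is only an illustration: the class names follow the diagram, while the "Before" suffix and the method body are hypothetical and not taken from any real code base.

```java
// Before: the same method is duplicated in both subclasses.
class ClassABefore {
}

class ClassBBefore extends ClassABefore {
    String methodA() {
        return "shared behaviour";
    }
}

class ClassCBefore extends ClassABefore {
    String methodA() {
        return "shared behaviour";
    }
}

// After: the method is defined once in the parent class and inherited.
class ClassA {
    String methodA() {
        return "shared behaviour";
    }
}

class ClassB extends ClassA {
}

class ClassC extends ClassA {
}

class PullUpDemo {
    public static void main(String[] args) {
        // Both subclasses still offer methodA(), now through inheritance.
        System.out.println(new ClassB().methodA());
        System.out.println(new ClassC().methodA());
    }
}
```

Callers of the subclasses are unaffected by the change, which is what makes the duplicated methods a candidate for this refactoring.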
2002/12/11 PROFES2002 10
Outline of CCFinder
The clone detection process consists of four steps.
Step 1: Lexical analysis (source files → token sequence)
Step 2: Transformation (token sequence → transformed token sequence)
Step 3: Match detection (transformed token sequence → clones on transformed sequence)
Step 4: Formatting (clones on transformed sequence → clone pairs)
Target program: C / C++, Java, FORTRAN, COBOL, LISP, Plain Text
2002/12/11 PROFES2002 11
Process of CCFinder (1/4)
Source files (input to lexical analysis):
static void foo() throws RESyntaxException {
    String a[] = new String [] { "123,400", "abc", "orange 100" };
    org.apache.regexp.RE pat = new org.apache.regexp.RE("[0-9,]+");
    int sum = 0;
    for (int i = 0; i < a.length; ++i)
        if (pat.match(a[i]))
            sum += Sample.parseNumber(pat.getParen(0));
    System.out.println("sum = " + sum);
}
static void goo(String [] a) throws RESyntaxException {
    RE exp = new RE("[0-9,]+");
    int sum = 0;
    for (int i = 0; i < a.length; ++i)
        if (exp.match(a[i]))
            sum += parseNumber(exp.getParen(0));
    System.out.println("sum = " + sum);
}
Token sequence (output of lexical analysis):
static void foo ( ) throws RESyntaxException {
String a [ ] = new String [ ] { "123,400" , "abc" , "orange 100" } ;
org . apache . regexp . RE pat = new org . apache . regexp . RE ( "[0-9,]+" ) ;
int sum = 0 ;
for ( int i = 0 ; i < a . length ; ++ i )
if ( pat . match ( a [ i ] ) )
sum += Sample . parseNumber ( pat . getParen ( 0 ) ) ;
System . out . println ( "sum = " + sum ) ;
}
static void goo ( String [ ] a ) throws RESyntaxException {
RE exp = new RE ( "[0-9,]+" ) ;
int sum = 0 ;
for ( int i = 0 ; i < a . length ; ++ i )
if ( exp . match ( a [ i ] ) )
sum += parseNumber ( exp . getParen ( 0 ) ) ;
System . out . println ( "sum = " + sum ) ;
}
Transformed token sequence (output of transformation): identifiers and literals are replaced by the parameter token $p, the package qualifier org . apache . regexp is removed, and the array initializer is replaced by { $u }. For example, the declaration of foo, the declaration of pat, and the declaration of sum become:
static $p $p ( ) throws $p {
$p $p = new $p ( $p ) ;
$p $p = $p ;
2002/12/11 PROFES2002 12
Process of CCFinder (3/4)
[Figure: match detection matrix. The transformed token sequence (static $p $p ( ) throws $p { ...) labels the axes, and an asterisk marks each pair of positions whose tokens match.]
2002/12/11 PROFES2002 13
Process of CCFinder (3/4)
[Figure: matrix of asterisks marking the matching token positions over the whole transformed token sequence.]
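The idea behind the match detection matrix can be sketched with a naive quadratic comparison of token positions. This is a simplified stand-in, not CCFinder's implementation (the appendix slide on suffix trees indicates that a suffix tree is used); the class name, method name, and the {startA, startB, length} result layout are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class MatchDetectionSketch {
    /**
     * Returns {startA, startB, length} for every run of at least minLength
     * equal tokens that starts at two different positions startA < startB.
     * Each run is extended to the right for as long as the tokens match.
     */
    static List<int[]> findClonePairs(String[] tokens, int minLength) {
        List<int[]> pairs = new ArrayList<>();
        int n = tokens.length;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                // Skip cells that merely continue a run started one step earlier.
                if (i > 0 && tokens[i - 1].equals(tokens[j - 1])) {
                    continue;
                }
                int length = 0;
                while (j + length < n && tokens[i + length].equals(tokens[j + length])) {
                    length++;
                }
                if (length >= minLength) {
                    pairs.add(new int[] {i, j, length});
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Two transformed declarations separated by an unrelated token.
        String[] tokens = "$p $p = new $p ( $p ) ; } $p $p = new $p ( $p ) ;".split(" ");
        for (int[] p : findClonePairs(tokens, 5)) {
            System.out.println("positions " + p[0] + " and " + p[1] + ", length " + p[2]);
        }
    }
}
```

Each reported run corresponds to a diagonal line of asterisks in the matrix. The sketch does not exclude runs that overlap themselves, which a real tool would have to handle.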
2002/12/11 PROFES2002 14
Process of CCFinder (4/4)
[Figure: formatting. The source code of foo() and goo() is shown again with the detected clone pairs mapped back onto the original source lines.]
2002/12/11 PROFES2002 15
Issues in refactoring process
Since the code clones detected by CCFinder are sequences of tokens, they are often not appropriate for direct replacement by a single module (subroutine, function, etc.).
Some of them are not suited to refactoring.
Example 1 (Code clones including needless statements for refactoring)
    righttokennumber = c.getEndNumber() - c.getStartNumber() + 1;
}
string getLeftClone() const {
    char temp[STRLENGTH];
    snprintf(temp, STRLENGTH, "%s\t%d,%d,%d\t%d,%d,%d\t", leftID.c_str(),
             leftstartline, leftstartcolumn, leftstartnumber,
             leftendline, leftendcolumn, leftendnumber);
    string clone(temp);
    return clone;
}
string getRightClone() const {
    char temp[STRLENGTH];
    snprintf(temp, STRLENGTH, "%s\t%d,%d,%d\t%d,%d,%d\t", rightID.c_str(),
             rightstartline, rightstartcolumn, rightstartnumber,
             rightendline, rightendcolumn, rightendnumber);
    string clone(temp);
    return clone;
}
int getLeftTokenNumber() const {
    return lefttokennumber;
}
2002/12/11 PROFES2002 17
Example 1 (Code clones including needless statements for refactoring)
[Figure: the code of Example 1 shown again with the detected code clones and the parts suited to refactoring highlighted.]
Only the highlighted parts should be detected.
Example 2 (Code clones not suited to refactoring)
First fragment:
    lp=makeexptree(&lp,NULL,SRPAREN,NULL,SNULL);
    rp=makeexptree(NULL,&lp,SIDENTIFIER,idtemp,SPROCEDURE);
    curVertex->kind=ST_call;
    curVertex->tree=rp;
    curVertex->refer=search_tree_postorder(rp);
    curVertex->refer=uniq_RDlist(curVertex->refer);
    cutSt();
}
break;
case 67:
#line 355 "subPascalParse.y"
{
    yyval.exptree = yyvsp[0].exptree;
}
break;
Second fragment:
    if(func->parameter!=NULL)
        return error("Error:Parameter Exist");
    curVertex->kind=ST_call;
    lp=makeexptree(NULL,NULL,SIDENTIFIER,idtemp,SPROCEDURE);
    curVertex->tree=lp;
    curVertex->refer=search_tree_postorder(lp);
    curVertex->refer=uniq_RDlist(curVertex->refer);
    cutSt();
}
break;
case 66:
#line 314 "subPascalParse.y"
{
    struct FUNCTION *func;
    struct RD *para;
CCFinder extracts the highlighted parts of these fragments as code clones.
However, they are not suited to refactoring.
2002/12/11 PROFES2002 19
Outline of proposed method
The proposed method extracts meaningful code clones from the output of CCFinder.
[Figure: data flow among Source files, CCFinder, Clone data, Filter, Meaningful clone data, and GUI Interface.]
2002/12/11 PROFES2002 20
Processes executed by the filter
Input: output of CCFinder and source files. The filter then runs three units in order and produces the output.
Syntax analysis unit: performs syntax analysis on the source code that includes code clones.
Clone extraction unit: extracts meaningful code clones from the result of the syntax analysis and the output of CCFinder.
Clone management unit: sorts and merges the code clones detected by the clone extraction unit.
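The clone extraction step can be illustrated with a small Java sketch. It assumes that "meaningful" means a complete syntactic block lying entirely inside a detected clone, and that the syntax analysis unit provides block boundaries as line ranges. The slides do not state the actual criteria, so the class name, the method name, and the {startLine, endLine} layout are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class CloneExtractionSketch {
    /** True when the range inner lies entirely within the range outer. */
    private static boolean contains(int[] outer, int[] inner) {
        return outer[0] <= inner[0] && inner[1] <= outer[1];
    }

    /**
     * Given a token-based clone as {startLine, endLine} and the blocks found
     * by syntax analysis (each also {startLine, endLine}, assumed distinct),
     * returns the outermost blocks that lie completely inside the clone.
     */
    static List<int[]> blocksInside(int[] clone, List<int[]> blocks) {
        List<int[]> result = new ArrayList<>();
        for (int[] block : blocks) {
            if (!contains(clone, block)) {
                continue; // the block sticks out of the clone
            }
            boolean nested = false;
            for (int[] other : blocks) {
                if (other != block && contains(clone, other) && contains(other, block)) {
                    nested = true; // already covered by a larger block
                    break;
                }
            }
            if (!nested) {
                result.add(block);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] clone = {10, 30};
        List<int[]> blocks = new ArrayList<>();
        blocks.add(new int[] {5, 12});   // starts before the clone
        blocks.add(new int[] {13, 20});  // complete block inside the clone
        blocks.add(new int[] {14, 18});  // nested in the previous block
        blocks.add(new int[] {22, 35});  // ends after the clone
        for (int[] b : blocksInside(clone, blocks)) {
            System.out.println("block lines " + b[0] + " to " + b[1]);
        }
    }
}
```

Under these assumptions, a clone such as the one in Example 1, which begins with the tail of one method and ends with the head of another, would be trimmed to the complete methods in between.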
2002/12/11 PROFES2002 21
Implementation of proposed method
CCShaper (Code Clone Shaper)
Target program: Java
CCShaper extracts meaningful blocks from the output of CCFinder.
Description language: C++
Source size: about 4000 LOC
Working environment: Windows 2000/XP
2002/12/11 PROFES2002 22
Outline of experiments
We conducted two experiments on two open source Java programs, ANTLR and Ant.
Experimental environment: Pentium 4 1.5 GHz, 512 MB SDRAM.
We deal with code clones that include more than 50 tokens.
[Figure: source code of ANTLR and Ant → CCFinder → code clones → CCShaper → meaningful code clones.]
2002/12/11 PROFES2002 23
Two criteria
A Clone Pair is a pair of code portions that are clones of each other.
A Clone Class is a collection of code portions that are all clones of one another.
In this experiment, we use two criteria: the number of clone pairs and the number of clone classes.
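The relation between the two criteria can be sketched in Java by grouping clone pairs into clone classes with a union-find structure. The sketch assumes that the clone relation is treated as transitive, and the fragment labels are hypothetical. Under that assumption a class of n code portions contributes n(n-1)/2 pairs, so the number of pairs can greatly exceed the number of classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class CloneClassSketch {
    // Maps each code portion to its parent in the union-find forest.
    private final Map<String, String> parent = new HashMap<>();

    private String find(String x) {
        parent.putIfAbsent(x, x);
        String p = parent.get(x);
        if (!p.equals(x)) {
            p = find(p);
            parent.put(x, p); // path compression
        }
        return p;
    }

    /** Records that code portions a and b form a clone pair. */
    void addClonePair(String a, String b) {
        parent.put(find(a), find(b));
    }

    /** Number of clone classes formed by the pairs added so far. */
    int countClasses() {
        Set<String> roots = new HashSet<>();
        for (String portion : new ArrayList<>(parent.keySet())) {
            roots.add(find(portion));
        }
        return roots.size();
    }

    public static void main(String[] args) {
        CloneClassSketch classes = new CloneClassSketch();
        classes.addClonePair("A", "B");
        classes.addClonePair("B", "C");
        classes.addClonePair("A", "C");
        classes.addClonePair("D", "E");
        System.out.println("clone classes: " + classes.countClasses());
    }
}
```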
2002/12/11 PROFES2002 24
Experiment 1 (ANTLR) (1/3)
ANTLR is implemented in Java and generates parsers in either Java or C++.
239 source files
Size: about 44000 LOC
Result
                          Without CCShaper   With CCShaper   Ratio
Number of clone pairs     388574             984             about 1/400
Number of clone classes   1072               148             about 1/7
Analysis time: about 2 minutes
2002/12/11 PROFES2002 25
Experiment 1 (ANTLR) (2/3)
Comparing on scatter plot
[Figure: scatter plots of the clone pairs without CCShaper and with CCShaper; the selected code clone is marked (a).]
2002/12/11 PROFES2002 26
Experiment 1 (ANTLR) (3/3)
Source code of the selected code clone (a)
public final void mOPEN_ELEMENT_OPTION(boolean _createToken)
        throws RecognitionException, CharStreamException, TokenStreamException {
    int _ttype;
    Token _token = null;
    int _begin = text.length();
    _ttype = OPEN_ELEMENT_OPTION;
    int _saveIndex;

    match('<');
    if (_createToken && _token == null && _ttype != Token.SKIP) {
        _token = makeToken(_ttype);
        _token.setText(new String(text.getBuffer(), _begin, text.length() - _begin));
    }
    _returnToken = _token;
}
Only the highlighted portions differ from the other clones.
This code clone appears in 20 places in ANTLR.
All of these code clones are methods of the same class.
These methods can be merged into one method by adding 2 arguments.
2002/12/11 PROFES2002 27
Experiment 2 (Ant) (1/4)
Ant is a Java-based build tool.
689 source files
Size: about 164000 LOC
Result
                          Without CCShaper   With CCShaper   Ratio
Number of clone pairs     12870              159             about 1/80
Number of clone classes   1161               87              about 1/13
Analysis time: about 5 seconds
2002/12/11 PROFES2002 28
Experiment 2 (Ant) (2/4)
Comparing on scatter plot
[Figure: scatter plots of the clone pairs without CCShaper and with CCShaper; the selected code clone is marked (b).]
2002/12/11 PROFES2002 29
Experiment 2 (Ant) (3/4)
Source code of the selected code clone (b)
public void getAutoresponse(Commandline cmd) {
    if (m_AutoResponse == null) {
        cmd.createArgument().setValue(FLAG_AUTORESPONSE_DEF);
    } else if (m_AutoResponse.equalsIgnoreCase("Y")) {
        cmd.createArgument().setValue(FLAG_AUTORESPONSE_YES);
    } else if (m_AutoResponse.equalsIgnoreCase("N")) {
        cmd.createArgument().setValue(FLAG_AUTORESPONSE_NO);
    } else {
        cmd.createArgument().setValue(FLAG_AUTORESPONSE_DEF);
    } // end of else
}
These clones are identical verbatim.
These clones appear in seven classes.
These seven classes inherit from the same class.
These methods can be merged into one method by pulling them up to the parent class.
2002/12/11 PROFES2002 30
Experiment 2 (Ant) (4/4)
Class diagram (before refactoring): the subclasses MSVSSADD, MSVSSCHECKIN, MSVSSCHECKOUT, MSVSSCP, MSVSSCREATE, MSVSSGET and MSVSSLABEL of MSVSS each define getAutoresponse(Commandline cmd).
Class diagram (after refactoring): getAutoresponse(Commandline cmd) is defined once in MSVSS.
2002/12/11 PROFES2002 31
Summaries
We have developed a filtering tool (CCShaper) that extracts code clones well suited to refactoring.
We have evaluated the usefulness of CCShaper by applying it to actual Java programs.
2002/12/11 PROFES2002 32
Future works
We are going to
apply CCShaper to commercial software,
extend it so that it can be applied to other programming languages,
develop a filtering method that can extract code clones better suited to refactoring.
2002/12/11 PROFES2002 33
2002/12/11 PROFES2002 34
Web page of CCFinder/Gemini is available at http://sel.ist.osaka-u.ac.jp/cdtools/index.html.en
2002/12/11 PROFES2002 35
The difference between ‘diff’ and clone detection tools
diff finds the longest common subsequence of lines between two files.
For a given code portion, diff does not report two or more identical code portions (clones).
A clone detection tool finds all identical or similar code portions.
2002/12/11 PROFES2002 36
Suffix tree
A suffix tree is a tree that satisfies the following conditions.
1. A leaf node represents the starting position of a substring.
2. A path from the root node to a leaf node represents a substring.
3. The first characters of the labels of all edges leaving one node differ from each other.
→ A common path means a clone.
[Figure: suffix tree for the string x x y x y z % (positions 1 to 7), with leaves numbered 1 to 7 and edge labels such as x, y, z%, xyz% and xyxyz%, shown next to the matrix of matching positions.]
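The statement that a common path means a clone can be illustrated without building the tree. The Java sketch below swaps in a simpler technique: it sorts all suffixes of the example string and takes the longest common prefix of neighbouring suffixes, which corresponds to the deepest path shared by two leaves. It is quadratic in memory and meant only as an illustration; the class and method names are hypothetical.

```java
import java.util.Arrays;

class RepeatedSubstringSketch {
    /** Length of the common prefix of a and b. */
    private static int commonPrefixLength(String a, String b) {
        int limit = Math.min(a.length(), b.length());
        int i = 0;
        while (i < limit && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    /** Longest substring that occurs at two or more positions in s. */
    static String longestRepeated(String s) {
        int n = s.length();
        String[] suffixes = new String[n];
        for (int i = 0; i < n; i++) {
            suffixes[i] = s.substring(i);
        }
        // After sorting, suffixes sharing a long prefix become neighbours.
        Arrays.sort(suffixes);
        String best = "";
        for (int i = 1; i < n; i++) {
            int length = commonPrefixLength(suffixes[i - 1], suffixes[i]);
            if (length > best.length()) {
                best = suffixes[i].substring(0, length);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // The string from the slide; '%' acts as the end marker.
        System.out.println(longestRepeated("xxyxyz%")); // prints "xy"
    }
}
```

For the slide's string the repeated portion xy starts at positions 2 and 4, which matches the common path x, y leading to leaves 2 and 4.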
2002/12/11 PROFES2002 37
Example of transformation rules in Java
All user-defined identifiers are transformed to the same token.
A unique identifier is inserted at each end of the top-level definitions and declarations.
  This prevents detecting clones that begin in the middle of one class definition and end in the middle of another.
"java.lang.Math.PI" is transformed to "Math.PI".
  With an import statement, a class can be referred to by either its full package name or a shorter name.
"new int[] {1, 2, 3}" is transformed to "new int[] {$}".
  This eliminates table initialization code.
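The first rule can be sketched in Java as follows. This is a simplified illustration, not CCFinder's transformation: it works on an already tokenized line, uses a deliberately small keyword list, and treats every other identifier-like token as a parameter token written $p. The class name, method name, and keyword list are assumptions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class TransformationSketch {
    // A deliberately small keyword subset, enough for the example below.
    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
            "static", "new", "for", "if", "throws", "return"));

    /** Replaces every identifier-like, non-keyword token with "$p". */
    static String[] transform(String[] tokens) {
        String[] out = new String[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            String token = tokens[i]; // tokens are assumed to be non-empty
            boolean identifier = Character.isJavaIdentifierStart(token.charAt(0))
                    && !KEYWORDS.contains(token);
            out[i] = identifier ? "$p" : token;
        }
        return out;
    }

    public static void main(String[] args) {
        String[] first = "RE pat = new RE ( ) ;".split(" ");
        String[] second = "RE exp = new RE ( ) ;".split(" ");
        // Both lines map to the same sequence although the names differ.
        System.out.println(String.join(" ", transform(first)));
        System.out.println(Arrays.equals(transform(first), transform(second)));
    }
}
```

Because differently named variables map to the same token, code portions that differ only in identifier names become identical sequences and can be matched as clones.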
2002/12/11 PROFES2002 38
The output of CCFinder
#version: ccfinder 3.1
#langspec: JAVA
#option: -b 30,1
#option: -k +
#option: -r abcdfikmnprsv
#option: -c wfg
#begin{file description}
0.0 52 C:\Gemini.java
0.1 94 C:\GeneralManager.java
:
:
#end{file description}
#begin{clone}
0.1 53,9 63,13 1.10 542,9 553,13 35
0.1 53,9 63,13 1.10 624,9 633,13 35
0.2 124,9 152,31 0.2 154,9 216,51 42
:
:
#end{clone}
Object file ID: for example, 0.0 is file 0 in group 0.
Location of a clone pair: for example, "0.1 53,9 63,13 1.10 542,9 553,13 35" means that lines 53 to 63 in file 0.1 and lines 542 to 553 in file 1.10 are identical or similar to each other.
It is difficult to analyze source code using only this text-based information about the location of clone pairs.
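A clone pair line of this output can be read with a few lines of Java. The sketch below extracts only what the slide explains, namely the two file IDs and the start and end lines. The meaning of the second number in each "53,9" pair and of the trailing 35 is not stated in the slides, so both are ignored here. The class and field names are hypothetical.

```java
class ClonePairLineSketch {
    final String leftFile;
    final int leftStartLine;
    final int leftEndLine;
    final String rightFile;
    final int rightStartLine;
    final int rightEndLine;

    private ClonePairLineSketch(String[] f) {
        leftFile = f[0];
        leftStartLine = firstNumber(f[1]);
        leftEndLine = firstNumber(f[2]);
        rightFile = f[3];
        rightStartLine = firstNumber(f[4]);
        rightEndLine = firstNumber(f[5]);
    }

    /** Returns the number before the comma, e.g. 53 for "53,9". */
    private static int firstNumber(String pair) {
        return Integer.parseInt(pair.split(",")[0]);
    }

    /** Parses one line from the section between #begin{clone} and #end{clone}. */
    static ClonePairLineSketch parse(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length < 6) {
            throw new IllegalArgumentException("not a clone pair line: " + line);
        }
        return new ClonePairLineSketch(fields);
    }

    public static void main(String[] args) {
        ClonePairLineSketch p = parse("0.1 53,9 63,13 1.10 542,9 553,13 35");
        System.out.println("file " + p.leftFile + " lines " + p.leftStartLine + "-" + p.leftEndLine
                + " <-> file " + p.rightFile + " lines " + p.rightStartLine + "-" + p.rightEndLine);
    }
}
```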
2002/12/11 PROFES2002 39
The analysis of comparison among students (non-gapped clones only)
[Figure: scatter plot of clones among the students' programs, with two regions marked A and B.]
The corresponding code
A (2 students): Similar code fragments came from the source code of the sample compiler described in the textbook.
B (4 students): Many code fragments were similar even in the names of variables and in comments.