1
Gemini: Maintenance Support Environment Based on Code Clone Analysis
Yasushi Ueda*, Toshihiro Kamiya**, Shinji Kusumoto*** and Katsuro Inoue***
*Graduate School of Engineering Science, Osaka Univ.
**PRESTO, Japan Science and Technology Corp.
***Graduate School of Information Science and Technology, Osaka Univ.
[email protected]
{kamiya, kusumoto, inoue}@ist.osaka-u.ac.jp
2
Contents
Background
Maintenance support environment, Gemini
  Overview
  System structure
  Scatter plot
Case study
Conclusions
3
Background (1/2)
A code clone is a pair or set of code portions in source files that are identical or similar to each other.
[Figure: code fragments labeled as clone pairs and as a clone class]
4
Background (2/2)
Code clones are one of the factors that make software maintenance more difficult.
  If a fault is found in a code fragment, it must also be corrected in every clone of that fragment.
We have developed a code clone detection tool, CCFinder [1].
  Token-based clone detector
  Its input is a set of source files, and its output is the locations of clone pairs.
[1] T. Kamiya, S. Kusumoto, and K. Inoue, "CCFinder: A multi-linguistic token-based code clone detection system for large scale source code", IEEE Transactions on Software Engineering, (to appear).
5
CCFinder (1/4)
The clone detection process consists of four steps.
  Input: source files
  Step 1: Lexical analysis → token sequence
  Step 2: Transformation → transformed token sequence
  Step 3: Match detection → clones on transformed sequence
  Step 4: Formatting → clone pairs
Target program: C / C++, Java, FORTRAN, COBOL, LISP
6
CCFinder (2/4)
Example of clone detection process

Source files (input):
    static void foo() throws RESyntaxException {
        String a[] = new String [] { "123,400", "abc", "orange 100" };
        org.apache.regexp.RE pat = new org.apache.regexp.RE("[0-9,]+");
        int sum = 0;
        for (int i = 0; i < a.length; ++i)
            if (pat.match(a[i]))
                sum += Sample.parseNumber(pat.getParen(0));
        System.out.println("sum = " + sum);
    }
    static void goo(String [] a) throws RESyntaxException {
        RE exp = new RE("[0-9,]+");
        int sum = 0;
        for (int i = 0; i < a.length; ++i)
            if (exp.match(a[i]))
                sum += parseNumber(exp.getParen(0));
        System.out.println("sum = " + sum);
    }

Step 1, lexical analysis, yields a token sequence (first lines of foo shown):
    static void foo ( ) throws RESyntaxException {
    String a [ ] = new String [ ] { "123,400" , "abc" , "orange 100" } ;
    org . apache . regexp . RE pat = new org . apache . regexp . RE ( "[0-9,]+" ) ;

Step 2, transformation, first shortens the table initializer and the package-qualified class name:
    String a [ ] = new String [ ] { $u } ;
    RE pat = new RE ( "[0-9,]+" ) ;
and then replaces the remaining names and values with $p:
    static $p $p ( ) throws $p {
    $p $p = new $p ( $p ) ;
    $p $p = $p ;

Steps 3 and 4, match detection and formatting, find the clones on the transformed sequence and report them as clone pairs.
7
Example of transformation rules in Java
All user-defined identifiers are transformed to the same token.
A unique identifier is inserted at each end of the top-level definitions and declarations.
  Prevents detecting clones that begin in the middle of one class definition and end in the middle of another.
"java.lang.Math.PI" is transformed to "Math.PI".
  With an import statement, a class can be referred to by either its full package name or a shorter name.
"new int[] {1, 2, 3}" is transformed to "new int[] {$}".
  Eliminates table initialization code.
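Two of the rules above, replacing identifiers with one token and collapsing table initializers, can be sketched in Python. This is a minimal sketch under stated assumptions, not CCFinder's implementation: the regular-expression tokenizer, the small keyword set, and the function names are hypothetical, and literals are left untouched. Only the `$p` and `{$}` notations come from the slides.

```python
import re

# Assumption: a deliberately tiny keyword set; a real Java lexer knows many more.
KEYWORDS = {"static", "new", "for", "if", "throws", "return"}

# Assumption: a simplified tokenizer (string literals, words, integers,
# two multi-character operators, then any single symbol).
TOKEN_RE = re.compile(r'"[^"]*"|[A-Za-z_]\w*|\d+|\+\+|\+=|[^\sA-Za-z_\d]')


def tokenize(source):
    """Split a source fragment into a flat list of tokens."""
    return TOKEN_RE.findall(source)


def collapse_initializers(tokens):
    """Rewrite '] { ... }' as '] { $ }' to drop table initialization code."""
    out, i = [], 0
    while i < len(tokens):
        if tokens[i] == "{" and out and out[-1] == "]":
            depth, j = 1, i + 1
            while j < len(tokens) and depth:
                depth += {"{": 1, "}": -1}.get(tokens[j], 0)
                j += 1
            out += ["{", "$", "}"]
            i = j  # continue after the matching closing brace
        else:
            out.append(tokens[i])
            i += 1
    return out


def transform(tokens):
    """Replace every non-keyword word with the token $p."""
    return [
        "$p" if (t[0].isalpha() or t[0] == "_") and t not in KEYWORDS else t
        for t in tokens
    ]


line_foo = transform(tokenize("if (pat.match(a[i]))"))
line_goo = transform(tokenize("if (exp.match(a[i]))"))
print(line_foo == line_goo)  # the two lines differ only in a variable name
print(collapse_initializers(tokenize("new int[] {1, 2, 3}")))
```

After the transformation, lines that differ only in variable names become identical token sequences, which is what lets the match detection step treat them as clones.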
8
CCFinder (2/4)
Example of clone detection process (animation steps on the same foo() / goo() example)
[Figure: match detection compares the transformed token sequence (static $p $p ( ) throws $p { ...) against itself in a token-by-token matrix, where * marks matching tokens; formatting then reports the detected clones as clone pairs in the source files]
9
CCFinder (3/4)
Application of CCFinder
Free software
  JDK libraries (Java, 570 KLOC)
  Linux, FreeBSD (C, 1.6 + 1.3 MLOC)
  FreeBSD, OpenBSD, NetBSD (C)
  Qt (C++, 240 KLOC)
Commercial software
  NTT data Corp., Hitachi Ltd., NEC soft Ltd., ASTEC Inc., SRA Inc.
  NASDA (Control program for rocket)
10
CCFinder (4/4)
Output of CCFinder
#version: ccfinder 3.1
#langspec: JAVA
#option: -b 30,1
#option: -k +
#option: -r abcdfikmnprsv
#option: -c wfg
#begin{file description}
0.0 52 C:\Gemini.java
0.1 94 C:\GeneralManager.java
:
:
#end{file description}
#begin{clone}
0.1 53,9 63,13 1.10 542,9 553,13 35
0.1 53,9 63,13 1.10 624,9 633,13 35
0.2 124,9 152,31 0.2 154,9 216,51 42
:
:
#end{clone}
File ID: "0.0" denotes file 0 in group 0.
Location of a clone pair: the first entry says that lines 53-63 in file 0.1 and lines 542-553 in file 1.10 are identical or similar to each other.
It is difficult to analyze source code using only this text-based information about the locations of clone pairs.
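Reading the clone section of such an output file could look like the following Python sketch. It relies only on what the slide explains (file IDs and line ranges). The second number in each "line,number" pair and the trailing number of each record are not explained on the slide, so the sketch ignores the former and keeps the latter under a neutral name. All class and function names are hypothetical.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    file_id: str     # "group.file", e.g. "0.1"
    first_line: int
    last_line: int


class ClonePair(NamedTuple):
    left: Fragment
    right: Fragment
    last_field: int  # trailing number; its meaning is not stated on the slide


def parse_clone_section(text):
    """Collect the records between #begin{clone} and #end{clone}."""
    pairs, inside = [], False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "#begin{clone}":
            inside = True
        elif line == "#end{clone}":
            inside = False
        elif inside and line and line != ":":  # ":" marks elided records
            f1, s1, e1, f2, s2, e2, last = line.split()
            left = Fragment(f1, int(s1.split(",")[0]), int(e1.split(",")[0]))
            right = Fragment(f2, int(s2.split(",")[0]), int(e2.split(",")[0]))
            pairs.append(ClonePair(left, right, int(last)))
    return pairs


SAMPLE = """\
#begin{clone}
0.1 53,9 63,13 1.10 542,9 553,13 35
0.1 53,9 63,13 1.10 624,9 633,13 35
0.2 124,9 152,31 0.2 154,9 216,51 42
:
#end{clone}
"""

for pair in parse_clone_section(SAMPLE):
    print(pair.left, pair.right)
```

Even in parsed form the result is only a list of locations, which is the limitation that motivates a graphical environment.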
11
Goals of this study
Proposal of an interactive code clone analysis environment, Gemini
Case study to evaluate the proposed environment
  Apply Gemini to a programming exercise at our university and analyze the results.
12
Gemini overview
A GUI-based code clone analysis environment
  Uses CCFinder as its code clone detector.
  Has several views for interactive analysis.
    Scatter plot view: select by mouse dragging, sorting function, zoom in/out
    Metric graph view: select by metric values
    Source code view
Implemented in Java
  About 10,000 lines of code
13
System structure of Gemini
[Figure: architecture diagram with the following components]
  Input: source files
  Code clone detector (CCD), which uses CCFinder
  Code clone database (CDB)
  Clone pair manager (CPM)
  Clone class manager (CCM)
  Source code manager (SCM)
  User interfaces: scatter plot view, clone pair list view, metrics graph, clone class list view, source code view
  Labeled data flow: clone selection information
  Actor: user
14
Scatter plot
Both the vertical and horizontal axes represent a token sequence of source code.
A dot means that the corresponding two tokens on the two axes are the same.
  The main diagonal is always drawn, since each dot on it pairs a position with itself.
A clone pair is shown as a diagonal line segment.
The distribution is symmetric about the main diagonal.
[Figure: scatter plot of the token sequence a b c a b c a d e c against itself; a, b, c, ... are tokens and each dot is a matched position]
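The scatter plot for the token sequence in the figure can be reproduced with a short Python sketch. It is an illustration of the idea only: the function names and the 0-based indexing are assumptions, and a detector for real code would not build the full quadratic matrix.

```python
def dot_matrix(tokens):
    """matrix[row][col] is True when the two tokens are the same."""
    return [[a == b for b in tokens] for a in tokens]


def diagonal_segments(matrix, min_len=2):
    """Find runs of dots parallel to the main diagonal, below it.

    Returns (col_start, row_start, length) triples. Only the lower
    triangle is scanned because the plot is symmetric.
    """
    n = len(matrix)
    segments = []
    for offset in range(1, n):             # offset 0 is the main diagonal
        run_start = None
        for col in range(n - offset + 1):  # the last step only flushes a run
            hit = col < n - offset and matrix[col + offset][col]
            if hit and run_start is None:
                run_start = col
            elif not hit and run_start is not None:
                length = col - run_start
                if length >= min_len:
                    segments.append((run_start, run_start + offset, length))
                run_start = None
    return segments


tokens = "a b c a b c a d e c".split()
matrix = dot_matrix(tokens)
segments = diagonal_segments(matrix)
print(segments)  # [(0, 3, 4)]: tokens 0-3 (a b c a) match tokens 3-6
```

The single segment found corresponds to the diagonal line that the slide describes as a clone pair; isolated dots, such as the repeated `c` near the end, are filtered out by the minimum length.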
15
Sorting function
When multiple files are compared in a scatter plot, the file boundaries are shown on the axes.
Depending on the file order, the dots may be spread widely across the plot.
We place similar files as near to each other as possible.
[Figure: scatter plot of files f1 ... f6 in their original order, and the same plot after sorting into the order f1, f6, f3, f4, f5, f2]
16
Snapshots of scatter plot
17
Clone class metrics
LEN(C): Length of the token sequence of each element in clone class C
POP(C): Number of elements in clone class C
RAD(C): Distribution in the file system of the elements in clone class C
DFL(C): Estimate of how many tokens would be removed from the source files if all code fragments of clone class C were replaced with caller statements of a new identical routine
[Figure: clone fragments replaced by caller statements of a new subroutine]
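How POP(C) could be derived from a list of clone pairs is sketched below in Python. The sketch assumes that a clone class is a connected group of fragments linked by clone pairs, which the slides do not state explicitly, and it uses a small union-find structure for the grouping. Fragments are hypothetical (file ID, first line, last line) tuples. LEN(C) is not computed because it needs token counts, which line ranges alone do not provide.

```python
def clone_classes(pairs):
    """Group fragments into classes: fragments linked by pairs share a class."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    classes = {}
    for fragment in list(parent):
        classes.setdefault(find(fragment), set()).add(fragment)
    return list(classes.values())


def pop(clone_class):
    """POP(C): number of elements in clone class C."""
    return len(clone_class)


# Fragments taken from the line ranges of the sample CCFinder output.
pairs = [
    (("0.1", 53, 63), ("1.10", 542, 553)),
    (("0.1", 53, 63), ("1.10", 624, 633)),
    (("0.2", 124, 152), ("0.2", 154, 216)),
]
classes = clone_classes(pairs)
print(sorted(pop(c) for c in classes))  # [2, 3]
```

In this example the first two pairs share a fragment, so they merge into one class with three elements.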
18
Aims of clone class metrics
We are interested in:
Clone classes whose elements are spread widely.
  A high value of POP means that there are many similar code fragments.
  A high value of RAD means that the clones are spread over many subsystems.
  Such clones are difficult to find all together during maintenance.
Clone classes that are appropriate for refactoring.
  A high value of DFL (high POP and high LEN) means that the clone class is worth evaluating to see whether its elements can be merged into one routine.
19
Snapshots of clone class metric graph
[Figure: metric graph with the axes RAD, LEN, POP, and DFL; filtering mode: ON]
20
Case study overview
Application target
  Programs developed in a programming exercise at Osaka Univ.
    Compilers written in C
    Programs of 69 students
    Total size is 360,000 lines of code
Issue of analysis
  Similarity among all programs
    In the programming exercise, plagiarism sometimes happens.
21
Analysis (1/2)
The compilers of the 69 students are arranged on the two axes.
  The distribution is spread widely.
Rearrangement of the scatter plot using the sorting function
  The grid represents the boundary lines between individuals.
22
Analysis (2/2)
[Figure: sorted scatter plot with two marked regions, A and B]
The corresponding code
A (2 students)
  The similar code fragments came from the source code of a sample compiler described in the textbook.
B (4 students)
  Many code fragments were similar even in variable names and comments.
23
Conclusions
We presented Gemini, a maintenance support environment based on code clone analysis.
We also applied it to a programming exercise to evaluate its usefulness.
As future work, we are going to evaluate the applicability of Gemini to large-scale software in actual software maintenance.
24
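Suffix tree
A suffix tree is a tree that satisfies the following conditions.
1. A leaf node represents the starting position of a substring.
2. A path from the root node to a leaf node represents a substring.
3. The first characters of the labels of all the edges from one node are different from each other.
→ A common path means a clone.

Suffix tree of the string "xxyxyz%" (positions 1-7), with each leaf numbered by starting position:
    root
      x
        xyxyz%   (leaf 1)
        y
          xyz%   (leaf 2)
          z%     (leaf 4)
      y
        xyz%     (leaf 3)
        z%       (leaf 5)
      z%         (leaf 6)
      %          (leaf 7)

Matching positions of the same string (lower triangle, * marks equal characters):
         1 2 3 4 5 6 7
         x x y x y z %
    1 x  *
    2 x  * *
    3 y      *
    4 x  * *   *
    5 y      *   *
    6 z            *
    7 %              *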
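The statement that a common path means a clone can be checked on the slide's string "xxyxyz%" with a small Python sketch. To stay short it builds an uncompressed suffix trie by inserting every suffix character by character, which takes quadratic time and is not how a real suffix tree is constructed. The function names and the nested-dictionary representation are assumptions.

```python
def build_suffix_trie(text):
    """Insert every suffix of text; each leaf stores its 1-based start."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
        node["leaf"] = start + 1  # "leaf" cannot clash with 1-character keys
    return root


def leaves(node):
    """Collect the starting positions of all suffixes below a node."""
    found = []
    for key, child in node.items():
        if key == "leaf":
            found.append(child)
        else:
            found.extend(leaves(child))
    return found


def repeated_substrings(node, prefix=""):
    """Map each path shared by two or more suffixes to its start positions."""
    result = {}
    for key, child in node.items():
        if key == "leaf":
            continue
        starts = sorted(leaves(child))
        if len(starts) >= 2:
            result[prefix + key] = starts
            result.update(repeated_substrings(child, prefix + key))
    return result


trie = build_suffix_trie("xxyxyz%")
repeats = repeated_substrings(trie)
print(repeats)  # {'x': [1, 2, 4], 'xy': [2, 4], 'y': [3, 5]}
```

The shared paths `x`, `xy`, and `y` are the internal nodes of the suffix tree; the longest one, `xy` at positions 2 and 4, is the clone in this string.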
25
Definition of DFL and RAD
DFL(C)
  DFL(C) = LEN(C) × POP(C) - (5 × POP(C) + LEN(C))
    LEN(C) × POP(C): the target code size for restructuring
    5 × POP(C): the code size of the new caller statements
    LEN(C): the code size of the new identical routine
RAD(C)
  Distribution in the file system of the elements in clone class C
    RAD(C) = 0: C is enclosed within a single file.
    RAD(C) = 1: C is enclosed within a single directory.
    RAD(C) = n: C is enclosed within a directory tree of n layers.
[Figure: clone fragments replaced by caller statements of a new subroutine]
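Both definitions can be written down as a short Python sketch. DFL follows the parenthesized form DFL(C) = LEN(C) × POP(C) - (5 × POP(C) + LEN(C)), which also appears on the metric graph slide. For RAD the slides give only the cases 0, 1, and n, so the sketch assumes that n is the number of directory layers between the lowest common directory and the deepest element, counting a file directly inside that directory as one layer. Function names and example paths are hypothetical.

```python
from pathlib import PurePosixPath


def dfl(length, population, call_cost=5):
    """Estimated number of tokens removed by merging a clone class.

    length is LEN(C), population is POP(C); call_cost is the 5 tokens
    per caller statement used in the slides.
    """
    return length * population - (call_cost * population + length)


def rad(paths):
    """RAD(C) for the files that contain the elements of a clone class."""
    parts = [PurePosixPath(p).parts for p in paths]
    if len(set(parts)) == 1:
        return 0  # all elements are in a single file
    directories = [p[:-1] for p in parts]
    common = 0    # length of the shared directory prefix
    for level in zip(*directories):
        if len(set(level)) != 1:
            break
        common += 1
    # Assumption: layers are counted from the common directory downward.
    return max(len(p) - common for p in parts)


print(dfl(35, 3))                                  # 35*3 - (5*3 + 35) = 55
print(rad(["src/lexer.c", "src/lexer.c"]))         # 0
print(rad(["src/lexer.c", "src/parser.c"]))        # 1
print(rad(["src/lexer.c", "src/gen/emit.c"]))      # 2
```

Note that DFL is zero or negative for short or rare clones, for example `dfl(10, 2)` is 0, which matches the intuition that merging them saves nothing.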
26
Sorting function
RSA(i): Ratio of the code range in file i that is covered by clones between file i and the other files
RST(i, j): Ratio of the code range in file i that is covered by clones between file i and file j
Step 1: Select a head file by the value of RSA. (Call the head file F.)
Step 2: From among the remaining files, select the file most similar to F by the value of RST and put it next to F.
Step 3: Repeat step 2 while any file remains, treating the file selected in the previous step 2 as the new F.
[Figure: files f1 ... f6 reordered step by step into the order f1, f6, f3, f4, f5, f2]
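The three steps amount to a greedy nearest-neighbour ordering, sketched here in Python. The slides do not say whether the head file is the one with the highest or the lowest RSA, so the sketch assumes the highest. The function name and all RSA and RST values are made up for illustration.

```python
def sort_files(files, rsa, rst):
    """Order files so that each one is followed by its most similar successor.

    rsa maps a file to RSA(file); rst maps a file i to a dict of RST(i, j).
    """
    remaining = list(files)
    # Step 1 (assumption: the head file is the one with the highest RSA).
    head = max(remaining, key=lambda f: rsa[f])
    order = [head]
    remaining.remove(head)
    # Steps 2 and 3: repeatedly append the file most similar to the last one.
    while remaining:
        current = order[-1]
        nxt = max(remaining, key=lambda f: rst.get(current, {}).get(f, 0.0))
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Hypothetical similarity values for four files.
rsa = {"f1": 0.9, "f2": 0.2, "f3": 0.5, "f4": 0.4}
rst = {
    "f1": {"f2": 0.1, "f3": 0.6, "f4": 0.2},
    "f3": {"f2": 0.1, "f4": 0.5},
    "f4": {"f2": 0.3},
}
order = sort_files(["f1", "f2", "f3", "f4"], rsa, rst)
print(order)  # ['f1', 'f3', 'f4', 'f2']
```

Because each step looks only at the most recently placed file, the ordering is cheap to compute but not guaranteed to be globally optimal.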
27
Analysis - reuse of programs (1/3)
RST(Parser, Checker) and RST(Checker, SPC) of each student were used as the ratio of reused code.

RST    Parser, Checker   Checker, SPC   ave
S1     0.117             0.086          0.102
S2     0.553             0.563          0.549
S3     0.674             0.729          0.701
:      :                 :              :
S69    0.112             0.598          0.390
ave    0.185             0.461          0.320
max    0.674             0.747          0.701
min    0.037             0.086          0.102
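A ratio such as RST(i, j) can be computed as the share of file i that is covered by the clone fragments it shares with file j. The Python sketch below assumes that ranges are inclusive line numbers and that overlapping fragments are counted once; the slides state neither the unit nor the overlap handling, and the function name and sample ranges are hypothetical.

```python
def covered_ratio(ranges, file_length):
    """Share of a file covered by the union of inclusive (first, last) ranges."""
    covered, current_end = 0, 0
    for first, last in sorted(ranges):
        first = max(first, current_end + 1)  # skip the part already counted
        if last >= first:
            covered += last - first + 1
            current_end = last
    return covered / file_length


# Hypothetical fragments of one 100-line file that also occur in another file.
ratio = covered_ratio([(1, 10), (5, 20), (31, 40)], 100)
print(ratio)  # 0.3: lines 1-20 and 31-40 are covered
```

Counting overlaps once keeps the ratio between 0 and 1 even when the same lines take part in several clone pairs.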
28
Analysis - reuse of programs (2/3)
[Figure: scatter plot of S1's Parser, Checker, and SPC on both axes, with regions C and D marked]
The average RST of S1 is the lowest.
  C: between Parser and Checker
  D: between Checker and Parser
The minimum length of clones to be detected was changed to 15 tokens.
29
Analysis - reuse of programs (3/3)
[Figure: scatter plots of S2 and S3, each with Parser, Checker, and SPC on both axes]
The highest average values of RST
  S2: 0.549, S3: 0.701
  Different appearances in the scatter plots
30
Analysis - Usefulness of metric graph
Verified the value of DFL from the metrics graph
  DFL(C) = (LEN(C) × POP(C)) - (LEN(C) + 5 × POP(C))
The highest values of DFL in each program

DFL    Parser   Checker   SPC
S1     0        99        113
:      :        :         :
S9     3538     163       189
S10    100      211       3439
:      :        :         :
S69    223      211       258
ave.   196      183       311

[Figure: scatter plots of S9 and S10, each with Parser, Checker, and SPC on both axes; regions labeled C, D, and E]
  S9: The value of DFL(Parser) was very high
  S10: The value of DFL(SPC) was very high