Page 1: Gemini: Maintenance Support Environment Based on Code Clone Analysis

Gemini: Maintenance Support Environment Based on Code Clone Analysis

Yasushi Ueda*, Toshihiro Kamiya**, Shinji Kusumoto*** and Katsuro Inoue***

*Graduate School of Engineering Science, Osaka Univ.
**PRESTO, Japan Science and Technology Corp.
***Graduate School of Information Science and Technology, Osaka Univ.

[email protected]
{kamiya, kusumoto, inoue}@ist.osaka-u.ac.jp

Page 2: Gemini: Maintenance Support Environment Based on Code Clone Analysis

Contents
  Background
  Maintenance support environment, Gemini
    Overview
    System structure
    Scatter plot
  Case study
  Conclusions

Page 3: Gemini: Maintenance Support Environment Based on Code Clone Analysis

Background (1/2)
A code clone is a pair or set of code portions in source files that are identical or similar to each other.
[Figure: code fragments linked as clone pairs and grouped into a clone class]

Page 4: Gemini: Maintenance Support Environment Based on Code Clone Analysis

Background (2/2)
Code clones are one of the factors that make software maintenance more difficult.
  If a fault is found in a code fragment, the same fault must be corrected in all of its clones.
We have developed a code clone detection tool, CCFinder [1].
  It is a token-based clone detector.
  Its input is a set of source files and its output is the locations of clone pairs.

[1] T. Kamiya, S. Kusumoto, and K. Inoue, "CCFinder: A multi-linguistic token-based code clone detection system for large scale source code", IEEE Transactions on Software Engineering, (to appear).

Page 5: Gemini: Maintenance Support Environment Based on Code Clone Analysis

CCFinder (1/4)
The clone detection process consists of four steps.
  Step 1: Lexical analysis (source files → token sequence)
  Step 2: Transformation (token sequence → transformed token sequence)
  Step 3: Match detection (transformed token sequence → clones on transformed sequence)
  Step 4: Formatting (clones on transformed sequence → clone pairs)
Target programs: C / C++, Java, FORTRAN, COBOL, LISP

Page 6: Gemini: Maintenance Support Environment Based on Code Clone Analysis

CCFinder (2/4) Example of clone detection process

Source files (input):

    static void foo() throws RESyntaxException {
        String a[] = new String [] { "123,400", "abc", "orange 100" };
        org.apache.regexp.RE pat = new org.apache.regexp.RE("[0-9,]+");
        int sum = 0;
        for (int i = 0; i < a.length; ++i)
            if (pat.match(a[i]))
                sum += Sample.parseNumber(pat.getParen(0));
        System.out.println("sum = " + sum);
    }
    static void goo(String [] a) throws RESyntaxException {
        RE exp = new RE("[0-9,]+");
        int sum = 0;
        for (int i = 0; i < a.length; ++i)
            if (exp.match(a[i]))
                sum += parseNumber(exp.getParen(0));
        System.out.println("sum = " + sum);
    }

[Figure: the sample passing through the steps. Lexical analysis splits each line into tokens. Transformation shortens org.apache.regexp.RE to RE, replaces the table initializer with $u and replaces identifiers with $p. Match detection then finds clones on the transformed token sequence.]

Page 7: Gemini: Maintenance Support Environment Based on Code Clone Analysis

Example of transformation rules in Java
All user-defined identifiers are transformed to the same token.
A unique identifier is inserted at each end of the top-level definitions and declarations.
  This prevents detecting clones that begin in the middle of one class definition and end in the middle of another.
"java.lang.Math.PI" is transformed to "Math.PI".
  With an import statement, a class can be referred to by either its full package name or a shorter name.
"new int[] {1, 2, 3}" is transformed to "new int[] {$}".
  This eliminates table initialization code.
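The identifier rule can be sketched in Python as follows. This is a minimal illustration, not CCFinder's implementation: the regular-expression tokenizer, the `KEPT_KEYWORDS` set, and the decision to replace type names and literals with `$p` as well are all assumptions made for this sketch, loosely following the transformed token sequence in the CCFinder (2/4) example.

```python
import re

# Keywords kept verbatim; a small, incomplete set assumed for this sketch.
KEPT_KEYWORDS = {"static", "new", "for", "if", "throws"}

# String literals, words, numbers, two-character operators, then any other symbol.
TOKEN_RE = re.compile(r'"(?:\\.|[^"\\])*"|[A-Za-z_]\w*|\d+|\+\+|\+=|[^\s\w]')


def tokenize(source):
    """Split Java-like source text into a flat list of tokens."""
    return TOKEN_RE.findall(source)


def transform(tokens):
    """Replace identifiers, type names and literals with the parameter token $p."""
    result = []
    for tok in tokens:
        is_word_or_literal = tok[0].isalnum() or tok[0] in '_"'
        if is_word_or_literal and tok not in KEPT_KEYWORDS:
            result.append("$p")
        else:
            result.append(tok)
    return result


line_foo = "if (pat.match(a[i]))"
line_goo = "if (exp.match(a[i]))"
# After transformation the two lines become the same token sequence,
# which is what lets match detection report them as a clone.
print(transform(tokenize(line_foo)))
print(transform(tokenize(line_foo)) == transform(tokenize(line_goo)))
```

Because `pat` and `exp` both become `$p`, the two `if` lines from `foo` and `goo` compare equal even though their variable names differ.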

Page 8: Gemini: Maintenance Support Environment Based on Code Clone Analysis

CCFinder (2/4) Example of clone detection process (match detection)
The transformed token sequence of the sample begins: static $p $p ( ) throws $p { ...
[Figure: match detection matrix. The transformed token sequence is laid out on both axes and each "*" marks a pair of matching tokens. The detected clones are then mapped back to the lines of the sample source code.]

Page 9: Gemini: Maintenance Support Environment Based on Code Clone Analysis

CCFinder (3/4)
Application of CCFinder
  Free software
    JDK libraries (Java, 570 KLOC)
    Linux, FreeBSD (C, 1.6 + 1.3 MLOC)
    FreeBSD, OpenBSD, NetBSD (C)
    Qt (C++, 240 KLOC)
  Commercial software
    NTT data Corp., Hitachi Ltd., NEC soft Ltd., ASTEC Inc., SRA Inc.
    NASDA (control program for rocket)

Page 10: Gemini: Maintenance Support Environment Based on Code Clone Analysis

CCFinder (4/4)
Output of CCFinder

#version: ccfinder 3.1
#langspec: JAVA
#option: -b 30,1
#option: -k +
#option: -r abcdfikmnprsv
#option: -c wfg
#begin{file description}
0.0 52 C:\Gemini.java
0.1 94 C:\GeneralManager.java
:
:
#end{file description}
#begin{clone}
0.1 53,9 63,13 1.10 542,9 553,13 35
0.1 53,9 63,13 1.10 624,9 633,13 35
0.2 124,9 152,31 0.2 154,9 216,51 42
:
:
#end{clone}

Object file ID: "0.0" means file 0 in group 0.
Location of a clone pair: in the first clone line, lines 53 - 63 in file 0.1 and lines 542 - 553 in file 1.10 are identical or similar to each other.

It is difficult to analyze source code using only this text-based information on the locations of clone pairs.
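To make the layout of the clone section concrete, here is a hedged Python sketch that reads the `#begin{clone}` block into records. Only the parts explained on the slide are interpreted (file IDs and the line numbers before each comma). The names `ClonePair`, `parse_clone_pairs` and `last_field` are hypothetical, and the meaning of the number after each comma and of the final column is not stated in the slides, so they are carried along without interpretation.

```python
from typing import NamedTuple, Tuple


class ClonePair(NamedTuple):
    file_a: str
    lines_a: Tuple[int, int]   # (first line, last line) in file_a
    file_b: str
    lines_b: Tuple[int, int]   # (first line, last line) in file_b
    last_field: int            # final column; its meaning is not stated on the slide


def _line_of(position):
    """Return the line number from a position such as '53,9' (part before the comma)."""
    return int(position.split(",")[0])


def parse_clone_pairs(text):
    """Collect the clone pairs listed between #begin{clone} and #end{clone}."""
    pairs = []
    in_clone_section = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "#begin{clone}":
            in_clone_section = True
        elif line == "#end{clone}":
            in_clone_section = False
        elif in_clone_section:
            fields = line.split()
            if len(fields) != 7:
                continue  # skip blank lines and elision markers such as ":"
            file_a, start_a, end_a, file_b, start_b, end_b, last = fields
            pairs.append(ClonePair(
                file_a, (_line_of(start_a), _line_of(end_a)),
                file_b, (_line_of(start_b), _line_of(end_b)),
                int(last),
            ))
    return pairs


SAMPLE = """\
#begin{clone}
0.1 53,9 63,13 1.10 542,9 553,13 35
0.1 53,9 63,13 1.10 624,9 633,13 35
0.2 124,9 152,31 0.2 154,9 216,51 42
:
#end{clone}
"""

for pair in parse_clone_pairs(SAMPLE):
    print(pair.file_a, pair.lines_a, "<->", pair.file_b, pair.lines_b)
```

Even in this structured form the result is only a list of locations, which is the motivation for the graphical views that follow.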

Page 11: Gemini: Maintenance Support Environment Based on Code Clone Analysis

Goals of this study
Proposal of an interactive code clone analysis environment, Gemini
Case study to evaluate the proposed environment
  Apply Gemini to a programming exercise in our university and analyze the results.

Page 12: Gemini: Maintenance Support Environment Based on Code Clone Analysis

Gemini overview
A GUI-based code clone analysis environment
  Uses CCFinder as a code clone detector.
  Has several views for interactive analysis.
    Scatter plot view
      Select by mouse dragging
      Sorting function
      Zoom in/out
    Metric graph view
      Select by metric values
    Source code view
  Implemented in Java
    About 10,000 lines of code

Page 13: Gemini: Maintenance Support Environment Based on Code Clone Analysis

System structure of Gemini
Components of Gemini
  Code clone detector (CCD): CCFinder
  Code clone database (CDB)
  Clone pair manager (CPM)
  Clone class manager (CCM)
  Source code manager (SCM)
User interfaces
  Scatter plot view
  Clone pair list view
  Metrics graph
  Clone class list view
  Source code view
[Figure: block diagram connecting the source files, the components, the user interfaces and the user; clone selection information is passed between them]

Page 14: Gemini: Maintenance Support Environment Based on Code Clone Analysis

Scatter plot
Both the vertical and horizontal axes represent a token sequence of source code.
A dot means that the two corresponding tokens on the two axes are the same.
  The main diagonal line is always drawn, since each dot on it refers to an identical position on the two axes.
A clone pair is shown as a diagonal line segment.
The distribution is symmetrical about the main diagonal line.
[Figure: scatter plot of the token sequence "a b c a b c a d e c" against itself; a, b, c, ... are tokens and each dot is a matched position]
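The scatter plot idea can be reproduced on the ten-token example with a few lines of Python. This is a simplified sketch rather than Gemini's drawing code: `dot_matrix`, `diagonal_runs` and the `min_len` threshold are hypothetical names, and the sketch neither normalizes tokens nor filters overlapping fragments.

```python
def dot_matrix(tokens):
    """Cell (i, j) is True when token i and token j are the same."""
    n = len(tokens)
    return [[tokens[i] == tokens[j] for j in range(n)] for i in range(n)]


def diagonal_runs(tokens, min_len):
    """Return (row_start, col_start, length) for each maximal diagonal run of
    dots above the main diagonal that is at least min_len tokens long."""
    n = len(tokens)
    runs = []
    for offset in range(1, n):              # offset 0 is the main diagonal
        length = 0
        for i in range(n - offset + 1):     # one extra step flushes the last run
            if i < n - offset and tokens[i] == tokens[i + offset]:
                length += 1
            else:
                if length >= min_len:
                    runs.append((i - length, i - length + offset, length))
                length = 0
    return runs


tokens = "a b c a b c a d e c".split()
for row in dot_matrix(tokens):
    print("".join("*" if cell else "." for cell in row))
print(diagonal_runs(tokens, min_len=3))
```

Only the upper triangle is scanned because the matrix is symmetrical; each run found there has a mirror image below the main diagonal.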

Page 15: Gemini: Maintenance Support Environment Based on Code Clone Analysis

Sorting function
When multiple files are compared in a scatter plot, the boundaries of the files are shown on the axes.
Depending on the file order, the distribution of dots is spread widely.
We put similar files as near to each other as possible.
[Figure: scatter plots of files f1 to f6 in their original order (f1, f2, f3, f4, f5, f6) and after sorting]

Page 16: Gemini: Maintenance Support Environment Based on Code Clone Analysis

Snapshots of scatter plot

Page 17: Gemini: Maintenance Support Environment Based on Code Clone Analysis

Clone class metrics
LEN(C): Length of the token sequence of each element in clone class C
POP(C): Number of elements in clone class C
RAD(C): Distribution in the file system of the elements in clone class C
DFL(C): Estimation of how many tokens would be removed from the source files when all code fragments of clone class C are replaced with caller statements of a new identical routine
[Figure: code fragments replaced by caller statements of a new subroutine]

Page 18: Gemini: Maintenance Support Environment Based on Code Clone Analysis

Aims of clone class metrics
We are interested in:
Clone classes whose elements are spread widely.
  A high value of POP means that there are many similar code fragments.
  A high value of RAD means that the clones are spread over many subsystems.
  Such clones are difficult to find all together during maintenance.
Clone classes that are appropriate for refactoring.
  A high value of DFL (high values of both POP and LEN) means that the clone class is worth evaluating to see whether its elements can be merged into one routine.

Page 19: Gemini: Maintenance Support Environment Based on Code Clone Analysis

Snapshots of clone class metric graph
[Figure: metric graph with the axes RAD, LEN, POP and DFL; filtering mode: ON]

Page 20: Gemini: Maintenance Support Environment Based on Code Clone Analysis

Case study overview
Application target
  Programs developed in a programming exercise at Osaka Univ.
    Compiler written in C
    Programs of 69 students
    Total size is 360,000 lines of code
Issue of analysis
  Similarity among all programs
    Plagiarism sometimes happens in the programming exercise.

Page 21: Gemini: Maintenance Support Environment Based on Code Clone Analysis

Analysis (1/2)
The compilers of the 69 students are arranged on the two axes.
  The distribution is spread widely.
Rearrangement of the scatter plot using the sorting function
The grid represents the boundary lines between individual students.

Page 22: Gemini: Maintenance Support Environment Based on Code Clone Analysis

Analysis (2/2)
The corresponding code
A (2 students)
  Similar code fragments came from the source code of the sample compiler described in the textbook.
B (4 students)
  Many code fragments were similar even with respect to variable names and comments.
[Figure: scatter plot with regions A and B marked]

Page 23: Gemini: Maintenance Support Environment Based on Code Clone Analysis

Conclusions
We presented Gemini, a maintenance support environment based on code clone analysis.
We also applied it to a programming exercise to evaluate its usefulness.
As future work, we are going to evaluate the applicability of Gemini to large-scale software in actual software maintenance.

Page 24: Gemini: Maintenance Support Environment Based on Code Clone Analysis

Suffix tree
A suffix tree is a tree that satisfies the following conditions.
1. A leaf node represents the starting position of a substring.
2. A path from the root node to a leaf node represents a substring.
3. The first characters of the labels of all the edges from one node are different from each other.
→ A common path means a clone.
[Figure: suffix tree for the string "xxyxyz%" (positions 1 to 7), shown next to the match matrix of the same string]
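The statement that a common path means a clone can be illustrated on the same string, "xxyxyz%". The sketch below swaps in a naive, uncompressed suffix trie (one node per character, quadratic size) for the suffix tree, so it shows the idea only and is not how CCFinder is implemented. It assumes the text ends with a terminator such as `%` that occurs nowhere else; the function names are hypothetical.

```python
def build_suffix_trie(text):
    """Insert every suffix of text into a nested-dict trie."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
    return root


def longest_repeated_substring(text):
    """Return the longest path shared by at least two suffixes.

    With a unique terminator, a node that has two or more children is
    reached by two or more suffixes, so its path occurs at least twice."""
    best = ""
    stack = [(build_suffix_trie(text), "")]
    while stack:
        node, path = stack.pop()
        if len(node) >= 2 and len(path) > len(best):
            best = path
        for ch, child in node.items():
            stack.append((child, path + ch))
    return best


print(longest_repeated_substring("xxyxyz%"))
```

For "xxyxyz%" the shared paths are "x", "y" and "xy"; the longest one, "xy", corresponds to the two-token diagonal in the match matrix.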

Page 25: Gemini: Maintenance Support Environment Based on Code Clone Analysis

Definition of DFL and RAD
DFL(C)
  DFL(C) = LEN(C) × POP(C) − (5 × POP(C) + LEN(C))
    LEN(C) × POP(C): the target code size for restructuring
    5 × POP(C): the code size of the new caller statements
    LEN(C): the code size of the new identical routine
RAD(C)
  Distribution in the file system of the elements in clone class C
    RAD(C) = 0: C is enclosed within a single file.
    RAD(C) = 1: C is enclosed within a single directory.
    RAD(C) = n: C is enclosed within a directory tree of n layers.
[Figure: code fragments replaced by caller statements of a new subroutine]
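Both metrics are easy to compute once a clone class is known. The Python sketch below uses the form DFL(C) = LEN(C) × POP(C) − (LEN(C) + 5 × POP(C)), which is also the form shown on the metric-graph slide. The `rad` function is only one possible reading of "a directory tree of n layers": it counts the levels from the deepest common directory down to the deepest file. That reading, the function names and the example paths are assumptions of this sketch.

```python
from pathlib import PurePosixPath


def dfl(length, population):
    """Tokens removed if every fragment becomes a call to one new routine."""
    removed = length * population          # the fragments that disappear
    added = 5 * population + length        # new caller statements + new routine
    return removed - added


def rad(paths):
    """0 for a single file, 1 for a single directory, otherwise the number of
    layers below the deepest directory shared by all files (assumed reading)."""
    files = {PurePosixPath(p) for p in paths}
    if len(files) == 1:
        return 0
    dir_parts = [f.parent.parts for f in files]
    common = 0
    for level in zip(*dir_parts):
        if len(set(level)) != 1:
            break
        common += 1
    return max(len(parts) - common for parts in dir_parts) + 1


print(dfl(30, 4))                                   # 30*4 - (5*4 + 30)
print(rad(["src/parser/a.c", "src/checker/b.c"]))   # hypothetical paths
```

A short or rarely repeated clone can give a negative DFL (for example `dfl(5, 2)`), meaning the new routine and its callers would cost more tokens than they save.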

Page 26: Gemini: Maintenance Support Environment Based on Code Clone Analysis

Sorting function
Step 1: Select a head file by the value of RSA. (Make F the head file.)
  RSA(i): Ratio of the code range in file i that is covered by clones between file i and the other files
Step 2: From among the remaining files, select the file most similar to F by the value of RST and put it next to F.
  RST(i, j): Ratio of the code range in file i that is covered by clones between file i and file j
Step 3: Repeat step 2 while any file remains, treating the file selected in the previous step 2 as the new F.
[Figure: the files f1 to f6 are placed one step at a time: f1; then f1, f6; then f1, f6, f3; then f1, f6, f3, f4; and so on until all six files are placed]
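The three steps amount to a greedy nearest-neighbour ordering, which can be sketched in Python as below. The slides do not say whether the head file is the one with the highest RSA, nor whether similarity to F is read as RST(F, j) or RST(j, F); this sketch assumes the highest RSA and RST(F, j). The function name and the sample values are hypothetical.

```python
def sort_files(rsa, rst):
    """Order files so that similar files end up next to each other.

    rsa: dict mapping file name -> RSA value
    rst: dict mapping (file_i, file_j) -> RST value (missing pairs count as 0)
    """
    remaining = sorted(rsa)                      # sorted names give stable tie-breaking
    head = max(remaining, key=lambda f: rsa[f])  # step 1 (assumed: highest RSA)
    order = [head]
    remaining.remove(head)
    current = head
    while remaining:                             # steps 2 and 3
        nxt = max(remaining, key=lambda f: rst.get((current, f), 0.0))
        order.append(nxt)
        remaining.remove(nxt)
        current = nxt                            # the selected file becomes the new F
    return order


rsa = {"f1": 0.9, "f2": 0.2, "f3": 0.5}
rst = {("f1", "f2"): 0.1, ("f1", "f3"): 0.6, ("f3", "f2"): 0.3}
print(sort_files(rsa, rst))
```

The ordering is greedy, so it does not guarantee the most compact scatter plot overall; it only keeps each file next to the file most similar to its predecessor.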

Page 27: Gemini: Maintenance Support Environment Based on Code Clone Analysis

Analysis - reuse of programs (1/3)
RST(Parser, Checker) and RST(Checker, SPC) of each student were used as the ratio of reused code.

RST    (Parser, Checker)   (Checker, SPC)   ave
S1     0.117               0.086            0.102
S2     0.553               0.563            0.549
S3     0.674               0.729            0.701
:      :                   :                :
S69    0.112               0.598            0.390
ave    0.185               0.461            0.320
max    0.674               0.747            0.701
min    0.037               0.086            0.102
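A ratio such as RST(i, j) can be obtained from the clone locations by measuring how much of file i is covered. The Python sketch below assumes that the code range is counted in lines and that each clone is given as an inclusive (first line, last line) interval in file i; the slides do not state the unit, and the function name and sample intervals are hypothetical.

```python
def covered_ratio(file_length, intervals):
    """Fraction of the lines of a file covered by at least one clone.

    file_length: number of lines in file i
    intervals:   inclusive (first_line, last_line) ranges in file i that
                 belong to clones shared with file j
    """
    covered = set()
    for first, last in intervals:
        covered.update(range(first, last + 1))   # overlapping ranges count once
    return len(covered) / file_length


# Two overlapping clones cover lines 1-20 of a 100-line file.
print(covered_ratio(100, [(1, 10), (5, 20)]))
```

Passing the clones between file i and every other file instead of a single file j would give the corresponding RSA(i) value under the same assumptions.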

Page 28: Gemini: Maintenance Support Environment Based on Code Clone Analysis

Analysis - reuse of programs (2/3)
The average of RST of S1 is the lowest.
  C: between Parser and Checker
  D: between Checker and Parser
The minimum length of clones to be detected was changed to 15 tokens.
[Figure: scatter plot of the Parser, Checker and SPC programs of S1, with regions C and D marked]

Page 29: Gemini: Maintenance Support Environment Based on Code Clone Analysis

Analysis - reuse of programs (3/3)
The highest average values of RST
  S2: 0.549, S3: 0.701
Different appearances in the scatter plots
[Figure: scatter plots of the Parser, Checker and SPC programs of S2 and S3]

Page 30: Gemini: Maintenance Support Environment Based on Code Clone Analysis

Analysis - usefulness of metric graph
Verified the value of DFL from the metrics graph
  DFL(C) = (LEN(C) × POP(C)) − (LEN(C) + 5 × POP(C))

The highest values of DFL in each program

DFL    Parser   Checker   SPC
S1     0        99        113
:      :        :         :
S9     3538     163       189
S10    100      211       3439
:      :        :         :
S69    223      211       258
ave.   196      183       311

S9: The value of DFL(Parser) was very high.
S10: The value of DFL(SPC) was very high.
[Figure: scatter plots of the Parser, Checker and SPC programs of S9 and S10, with regions C, D and E marked]