39
2002/12/11 PROFES2002 1 On software maintenance process improvement based on code clone analysis Yoshiki Higo* Yasushi Ueda* Toshihiro Kamiya** Shinji Kusumoto*, Katsuro Inoue* *Graduate School of Information and Science Technology, Osaka University **PRESTO, Japan Science and Technology Corporation

2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji

Embed Size (px)

Citation preview

Page 1: 2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji

2002/12/11 PROFES2002 1

On software maintenance process improvement based on code clone analysis

Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** ,Shinji Kusumoto*, Katsuro Inoue*

 

*Graduate School of Information and Science Technology, Osaka University

**PRESTO, Japan Science and Technology Corporation

Page 2: 2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji

2002/12/11 PROFES2002 2

Contents

Background: Code CloneObjectiveCode Clone Detection Tool: CCFinderProposed Clone Removal TechniqueCase StudiesSummaries and Future Works

Page 3: 2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji

2002/12/11 PROFES2002 3

Background: Code CloneCode clone is a code portion in source files

that is identical or similar to another.

Clone Pair

Clone Class

Code clone is one of factors that make software maintenance more difficult.

If some faults are found in a code clone, it is necessary to correct the faults in its all code clones.

Page 4: 2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji

2002/12/11 PROFES2002 4

Code Clone Detection Tool: CCFinder

We have been developing a code clone detection tool, CCFinder.

We have delivered CCFinder to software companies and evaluated the usefulness through some case studies.

Page 5: 2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji

2002/12/11 PROFES2002 5

Case studies of CCFinderOpen source software

Commercial Software (about 30 companies)

Students exercise of Osaka University Filed in a court as an evidence for

software copyright suit

JDK libraries (Java, 570 KLOC) Linux, FreeBSD (C, 1.6 + 1.3 MLOC) FreeBSD, OpenBSD , NetBSD(C) Qt(C++ , 240KLOC)

NTT Data Corp., Hitachi Ltd., Hitachi GP, NEC soft Ltd., ASTEC Inc., SRA Inc., NASDA , Daiwa Computer, etc…

Page 6: 2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji

2002/12/11 PROFES2002 6

Purpose of our research

As an actual application of CCFinder, we want to use code clone analysis in refactoring process.

But code clones detected by CCFinder are sequences of tokens, such code clones are not appropriate to be directly replaced by one module (subroutine, function et.).

Page 7: 2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji

2002/12/11 PROFES2002 7

Objective We propose a method to extract code clones fro

m ones detected by CCFinder, which are well-suited to refactoring process ([Extract Method], [Pull Up Method])*.

We apply the proposed method to some open source softwares, and evaluate the usefulness of it.

*M. Fowler: Refactoring: Improving the Design of Existing Code, Addison-Wesley, 1999.

Page 8: 2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji

2002/12/11 PROFES2002 8

Extract Method

Void methodA(int i){methodZ();

System.out.println(“name:” + name);System.out.println(“amount:” + i);

}

Void methodB(int i){methodY();

System.out.println(“name:” + name);System.out.println(“amount:” + i);

}

void methodA(int i){methodZ();methodC(i);

}void methodB(int i){

methodY();methodC(i);

}

Void methodC(int i){System.out.println(“name:” + name);System.out.println(“amount:” + i);

}

methodC(i);

methodC(i);

Page 9: 2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji

2002/12/11 PROFES2002 9

Pull Up Method

  method A

class A

class B class C

class A

class B class C  method A   method A

Page 10: 2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji

2002/12/11 PROFES2002 10

Outline of CCFinder Clone detection process consists of four steps.

Source files

Lexical analysis

Transformation

Token sequence

Match detection

Transformed token sequence

Clones on transformed sequence

Formatting

Clone pairs

CCFinder

Step 1

Step 2

Step 3

Step 4

Target program

C / C++

Java

FORTRAN

COBOL

LISP

Plain Text

Page 11: 2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji

2002/12/11 PROFES2002 11

Process of CCFinder(1/4) Source files

Lexical analysis

Transformation

Token sequence

Match detection

Transformed token sequence

Clones on transformed sequence

Formatting

1. static void foo() throws RESyntaxException { 2. String a[] = new String [] { "123,400", "abc", "orange 100" }; 3. org.apache.regexp.RE pat = new org.apache.regexp.RE("[0-9,]+"); 4. int sum = 0; 5. for (int i = 0; i < a.length; ++i) 6. if (pat.match(a[i])) 7. sum += Sample.parseNumber(pat.getParen(0)); 8. System.out.println("sum = " + sum); 9. }10. static void goo(String [] a) throws RESyntaxException {11. RE exp = new RE("[0-9,]+");12. int sum = 0;13. for (int i = 0; i < a.length; ++i)14. if (exp.match(a[i]))15. sum += parseNumber(exp.getParen(0));16. System.out.println("sum = " + sum);17. }

static void foo ( ) {

String a [ ] = new String [ ] { "123,400" , "abc" , "orange 100" } ;

int sum = 0 ;

for ( int i = 0 ; i < a . length ; ++ i )

sum += pat . getParen 0 ;

System . out . println ( "sum = " + sum ) ;

}

throws RESyntaxException

Sample . parseNumber ( ) )

if pat . match a [ i ]( ) )

org . apache . regexp . RE pat = new org . apache . regexp . RE ( "[0-9,]+" ) ;

static void goo ( ) {String a [ ]

int sum = 0 ;

for ( int i = 0 ; i < a . length ; ++ i )

System . out . println ( "sum = " + sum ) ;

}

throws RESyntaxException

if exp . match a [ i ]( ) )

exp = new RE ( "[0-9,]+" ) ;

(

RE

sum += exp . getParen 0 ;parseNumber ( ) )(

(

(

1.

2.

3.

4.

5.

6.

7.

8.

9.

10.

11.

12.

13.

14.

15.

16.

17.

Clone pairs

static $p ( ) {

[ ] = new [ ] { $u } ;

= ;

for ( = ; < . ; ++ )

+= . ;

. . ( + ) ;

}

throws

. ( ) )

if . [ ]( ) )

= new ( ) ;

static ( ) {[ ]

= ;

for ( = ; < . ; ++ )

. . ( + ) ;

}

throws

if . [ ]( ) )

= new ( ) ;

(

+= . ;( ) )(

(

(

1.

2.

3.

4.

5.

6.

7.

8.

9.

10.

11.

12.

13.

14.

15.

16.

17.

.

$p $p

$p $p

$p $p $p $p

$p $p $p

$p $p $p $p $p $p $p

$p $p $p $p

$p $p $p $p $p $p

$p $p $p $p $p

$p $p $p $p $p

$p $p $p $p

$p $p $p

$p $p $p $p $p $p $p

$p $p $p $p

$p $p $p $p $p $p

$p $p $p $p $p

$p

Transformation

Lexical analysis

Page 12: 2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji

2002/12/11 PROFES2002 12

stat

ic$p $p ( ) th

rows

$p { $p $p [ ] "="

$p $p [ ] { $u } ; $p $p "="

new

$p ( $p ) ; $p $p "="

$p

static * $p * * * * * * * * * * * * * *$p * * * * * * * * * * * * * *( * * ) * * throws * $p * * * * * * * * * * * * * *{ * * $p * * * * * * * * * * * * * *$p * * * * * * * * * * * * * *[ * * ] * * "=" * * * $p * * * * * * * * * * * * * *$p * * * * * * * * * * * * * *[ * * ] * * { * * $u * } * ; * * $p * * * * * * * * * * * * * *$p * * * * * * * * * * * * * *"=" * * * new * $p * * * * * * * * * * * * * *( * * $p * * * * * * * * * * * * * *) * * ; * * $p * * * * * * * * * * * * * *$p * * * * * * * * * * * * * *"=" * * * $p * * * * * * * * * * * * * *

Process of CCFinder(3/4) Source files

Lexical analysis

Transformation

Token sequence

Match detection

Transformed token sequence

Clones on transformed sequence

Formatting

Clone pairs

Page 13: 2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji

2002/12/11 PROFES2002 13

Process of CCFinder(3/4)* * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** ** * * * * ** ** * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * ** ** * * * * ** ** * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * ** ** * * * * ** ** * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** ** * * * * ** ** * * * * ** ** * * * * ** ** * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * ** ** * * * * ** ** * * * * ** ** * * * * ** ** * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * ** ** * * * * ** ** * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** ** * * * * ** ** * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * ** ** * * * * ** ** * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * ** ** * * * * ** ** * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** ** * * * * ** ** * * * * ** ** * * * * ** ** * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * ** ** * * * * ** ** * * * * ** ** * * * * ** ** * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * ** ** * ** ** ** * * ** * ** * * * * * * * * * * * * * * * * * * * * * * * ** ** * * * * ** ** * * * * * * * * * * * * * * * * * * * * *

Source files

Lexical analysis

Transformation

Token sequence

Match detection

Transformed token sequence

Clones on transformed sequence

Formatting

Clone pairs

Page 14: 2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji

2002/12/11 PROFES2002 14

Process of CCFinder(4/4) 1. static void foo() throws RESyntaxException { 2. String a[] = new String [] { "123,400", "abc", "orange 100" }; 3. org.apache.regexp.RE pat = new org.apache.regexp.RE("[0-9,]+"); 4. int sum = 0; 5. for (int i = 0; i < a.length; ++i) 6. if (pat.match(a[i])) 7. sum += Sample.parseNumber(pat.getParen(0)); 8. System.out.println("sum = " + sum); 9. }10. static void goo(String [] a) throws RESyntaxException {11. RE exp = new RE("[0-9,]+");12. int sum = 0;13. for (int i = 0; i < a.length; ++i)14. if (exp.match(a[i]))15. sum += parseNumber(exp.getParen(0));16. System.out.println("sum = " + sum);17. }

Source files

Lexical analysis

Transformation

Token sequence

Match detection

Transformed token sequence

Clones on transformed sequence

Formatting

Clone pairs

Page 15: 2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji

2002/12/11 PROFES2002 15

Issues in refactoring process

Since code clones detected by CCFinder are sequences of tokens, so they are not appropriate to be directly replaced by one module (subroutine, function et.).

Some of them do not suit refactoring.

Page 16: 2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji

Example 1 (Code clones including needless statements for refactoring)

righttokennumber = c.getEndNumber() - c.getStartNumber() + 1;}

string getLeftClone() const{

char temp[STRLENGTH];snprintf(temp,STRLENGTH,

"%s\t%d,%d,%d\t%d,%d,%d\t",leftID.c_str(), leftstartline,leftstartcolumn,leftstartnumber, leftendline,leftendcolumn,leftendnumber);

string clone(temp);return clone;

}

string getRightClone() const{

char temp[STRLENGTH];snprintf(temp,STRLENGTH,

"%s\t%d,%d,%d\t%d,%d,%d\t",rightID.c_str(), rightstartline,rightstartcolumn,rightstartnumber, rightendline,rightendcolumn,rightendnumber);

string clone(temp);return clone;

}

int getLeftTokenNumber() const{

return lefttokennumber;}

Page 17: 2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji

2002/12/11 PROFES2002 17

Example 1 (Code clones including needless statements for refactoring)

righttokennumber = c.getEndNumber() - c.getStartNumber() + 1;}

string getLeftClone() const{ char temp[STRLENGTH]; snprintf(temp,STRLENGTH, "%s\ t%d,%d,%d\ t%d,%d,%d\ t",leftID.c_str(), leftstartline,leftstartcolumn,leftstartnumber, leftendline,leftendcolumn,leftendnumber); string clone(temp); return clone;}

string getRightClone() const{ char temp[STRLENGTH]; snprintf(temp,STRLENGTH,

return clone;}

string getRightClone() const{ char temp[STRLENGTH]; snprintf(temp,STRLENGTH, "%s\ t%d,%d,%d\ t%d,%d,%d\ t",rightID.c_str(), rightstartline,rightstartcolumn,rightstartnumber, rightendline,rightendcolumn,rightendnumber); string clone(temp); return clone;}

int getLeftTokenNumber() const{ return lefttokennumber;}

parts should be detected. Only

Page 18: 2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji

Example 2 (Code clones not suited to refactoring)

lp=makeexptree(&lp,NULL,SRPAREN,NULL,SNULL); rp=makeexptree(NULL,&lp,SIDENTIFIER, idtemp,SPROCEDURE); curVertex->kind=ST_call; curVertex->tree=rp; curVertex->refer=search_tree_postorder(rp); curVertex->refer=uniq_RDlist(curVertex->refer); cutSt();}break;case 67:#line 355 "subPascalParse.y"{yyval.exptree = yyvsp[0].exptree;}break;

if(func->parameter!=NULL) return error("Error:Parameter Exist"); curVertex->kind=ST_call; lp=makeexptree(NULL,NULL,SIDENTIFIER,idtemp,SPROCEDURE); curVertex->tree=lp; curVertex->refer=search_tree_postorder(lp); curVertex->refer=uniq_RDlist(curVertex->refer); cutSt();}break;case 66:#line 314 "subPascalParse.y"{struct FUNCTION *func; struct RD *para;

CCFinder extracts parts as code clones.

But, these are not suited to refactoring.

Page 19: 2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji

2002/12/11 PROFES2002 19

Outline of proposed method

It extracts meaningful code clone from output of CCFinder.

CCFinder

Filter

Source files

Meaningful clone data

Clone data

Clone data

GUI Interface

Page 20: 2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji

2002/12/11 PROFES2002 20

Processes executed by the filter

Clone extraction unit : It extracts meaningful code clones from the result of syntax analysis and the output of CCFinder.

Output of CCFinder Source files

Syntax analysis unit

Clone extraction unit

Clone management unit

Output

Clone management unit : It sorts and merges code clones detected by clone extraction unit.

Syntax analysis unit : It performs syntax analysis to source code including code clones.

Page 21: 2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji

2002/12/11 PROFES2002 21

Implementation of proposed method

CCShaper ( Code Clone Shaper) Target program: Java CCShaper extracts meaningful block from th

e output of CCFinder

Description language: C++ Source size: about 4000 LOC Working environment: Windows2000/XP

Page 22: 2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji

2002/12/11 PROFES2002 22

Outline of experiments

We conducted two experiments using two kinds of Java source codes, ANTLR and Ant, which are open source software.

Experimental environment Pentium4 1.5GHz memory SDRAM512MB We deal with code clones

which include more than 50 tokens.

CCFinder

Code clonesMeaningful

code clones

CCShaper

Source code of ANTLR, Ant

Page 23: 2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji

2002/12/11 PROFES2002 23

Two criteria

Clone Pair

Clone Class

A Clone Pair is each pair of clone code portions. A Clone Class is a collection of code portions

that are code clones each other.

In this experiment, we use two criteria, the number of clone pairs and clone classes.

Page 24: 2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji

2002/12/11 PROFES2002 24

Experiment 1(ANTLR)(1/3)

Without CCShaper With CCShaper

Number of clone pairs

388574 984

Number of clone classes

1072 148Analysis time: about 2 minutes

ANTLR is implemented in Java,and generates parsers in either Java or C++. 239 Source filesSize: about 44000 LOCResult

1/400

1/7

Page 25: 2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji

2002/12/11 PROFES2002 25

Experiment 1(ANTLR)(2/3)Comparing on scatter plot

Without CCShaper With CCShaper

a

Page 26: 2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji

2002/12/11 PROFES2002 26

Experiment 1(ANTLR)(3/3)Source code of the selected code clonepublic final void mOPEN_ELEMENT_OPTION(boolean _createToken)throws RecognitionException, CharStreamException, TokenStreamException {    int _ttype;     Token _token=null;    int _begin=text.length();    ttype = OPEN_ELEMENT_OPTION;    int _saveIndex;

    match('<');    if ( _createToken && _token==null && _ttype!=Token.SKIP ) {       _token = makeToken(_ttype);       _token.setText(new String(text.getBuffer(), _begin, text.length()-_begin));    }    _returnToken = _token;}

(a)Only portions are different from other clones.

This code clone appears in 20 places of ANTLR.

All code clones are methods included in the same class.

These methods can be merged to one method by adding 2 arguments.

Page 27: 2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji

2002/12/11 PROFES2002 27

Experiment 2(Ant)(1/4)Ant is a Java-based build tool. 689 source files.Size: about 164000 LOC.

Result

Without CCShaper With CCShaper

Number of clone pairs

12870 159

Number of clone classes

1161 87Analysis time: about 5 seconds

1/80

1/13

Page 28: 2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji

2002/12/11 PROFES2002 28

Experiment 2(Ant)(2/4)Comparing on scatter plot

Without CCShaper With CCShaper

b

Page 29: 2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji

2002/12/11 PROFES2002 29

Experiment 2(Ant)(3/4)

public void getAutoresponse(Commandline cmd) { if (m_AutoResponse == null) { cmd.createArgument().setValue(FLAG_AUTORESPONSE_DEF); } else if (m_AutoResponse.equalsIgnoreCase("Y")) { cmd.createArgument().setValue(FLAG_AUTORESPONSE_YES); } else if (m_AutoResponse.equalsIgnoreCase("N")) { cmd.createArgument().setValue(FLAG_AUTORESPONSE_NO); } else { cmd.createArgument().setValue(FLAG_AUTORESPONSE_DEF); } // end of else}

Source code of the selected code clone (b)These clones are verbatimly the same ones

These clones appear in seven classes

These seven classes inherit a same class

These methods can be merged to one method by pulling up to the parent class

Page 30: 2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji

2002/12/11 PROFES2002 30

Experiment 2(Ant)(4/4) Class diagram (before refactoring)

MSVSSADD MSVSSCHECKIN MSVSSCHECKOUT

MSVSSCP MSVSSCREATE MSVSSGET MSVSSLABEL

MSVSS

getAutoresponse(Commandline cmd)

getAutoresponse(Commandline cmd)

getAutoresponse(Commandline cmd)

getAutoresponse(Commandline cmd)

getAutoresponse(Commandline cmd)

getAutoresponse(Commandline cmd)

getAutoresponse(Commandline cmd)

Class diagram (after refactoring)

MSVSSADD MSVSSCHECKIN MSVSSCHECKOUT

MSVSSCP MSVSSCREATE MSVSSGET MSVSSLABEL

MSVSS

getAutoresponse(Commandline cmd)

Page 31: 2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji

2002/12/11 PROFES2002 31

Summaries

We have developed a filtering tool (CCShaper) which extracts code clones that are well-suited to refactoring activity

We have evaluated the usefulness of CCShaper by applying it to actual Java programs

Page 32: 2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji

2002/12/11 PROFES2002 32

We are going toapply CCShaper to commercial software,extend it as to apply other programming languag

es,develop a filtering method which can extract cod

e clones more-suited to refactoring.

Future works

Page 33: 2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji

2002/12/11 PROFES2002 33

Page 34: 2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji

2002/12/11 PROFES2002 34

Web page of CCFinder/Gemini is available at http://sel.ist.osaka-u.ac.jp/cdtools/index.html.en

Page 35: 2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji

2002/12/11 PROFES2002 35

The difference between ‘diff’ and clone detection tools

Diff finds the longest common sub-string. Given a code portion, diff does not

report two or more same code portions (clones).

Clone detection tool finds all the same or similar code portions.

Page 36: 2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji

2002/12/11 PROFES2002 36

Suffix-treeSuffix tree is a tree that satisfies the following conditions.

1. A leaf node represents the starting position of sub-string.

2. A path from root node to a leaf node represents a sub-string.

3. First characters of labels of all the edges from one node are different from each other.

→ A common path means a clone

x

y

z%

%

xyxyz%

y

xyz%

z%

xyz%

z%

1

2

43

56

71 2 3 4 5 6 7x x y x y z %

1 2 3 4 5 6 7x x y x y z %

1 x *2 x * *3 y *4 x * * *5 y * *6 z *7 % *

Page 37: 2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji

2002/12/11 PROFES2002 37

Example of transformation rules in Java

All identifiers defined by user are transformed to same tokens.Unique identifier is inserted at each end of the top-level definitions and declarations.

Prevents detecting clones that begin at the middle of class definition and end at the middle of another one.

”java. lang. Math. PI” is transformed to ”Math. PI”. By using import sentence, a class is referred to with either full pac

kage name or a shorter name

” new int[] {1, 2, 3} ” is transformed to ” new int[] {$} ” Eliminates table initialization code.

Page 38: 2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji

2002/12/11 PROFES2002 38

The output ofCCFinder

Output of CCFinder

#version: ccfinder 3.1

#langspec: JAVA

#option: -b 30,1

#option: -k +

#option: -r abcdfikmnprsv

#option: -c wfg

#begin{file description}

0.0 52 C:\Gemini.java

0.1 94 C:\GeneralManager.java

:

:

#end{file description}

#begin{clone}

0.1 53,9 63,13 1.10 542,9 553,13 35

0.1 53,9 63,13 1.10 624,9 633,13 35

0.2 124,9 152,31 0.2 154,9 216,51 42

       :

:

#end{clone}

Object file ID( file 0 in Group 0 )

Location of a clone pair( Lines 53 - 63 in file 0.1 and Lines 542 - 553 in file 1.10 are identical or similar to each other)

It is difficult to analyze source code by only this text-based information of the location of clone pairs.

Page 39: 2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji

2002/12/11 PROFES2002 39

The analysis of comparison among students (non-gapped clones only)

A

B

The corresponding code A (2 students)

Similar code fragments were from source code of sample compiler described in textbook.

B (4 students) Many code fragments were

similar even with respect to name of variables or comments.