35
1 Comparing the Decompositions Produced by Software Clustering Algorithms using Similarity Measurements 2001 IEEE International Conference on Software Maintenance (ICSM'01). Brian S. Mitchell & Spiros Mancoridis Math & Computer Science, Drexel University

Comparing the Decompositions Produced by Software ...bmitchell/research/icsm01pres.pdf · Produced by Software Clustering Algorithms using Similarity Measurements ... M1 M5 M2 M6

  • Upload
    trantu

  • View
    230

  • Download
    5

Embed Size (px)

Citation preview

Page 1: Comparing the Decompositions Produced by Software ...bmitchell/research/icsm01pres.pdf · Produced by Software Clustering Algorithms using Similarity Measurements ... M1 M5 M2 M6

1

Comparing the DecompositionsProduced by Software Clustering

Algorithms using Similarity Measurements

2001 IEEE International Conference on Software Maintenance (ICSM'01).

Brian S. Mitchell & Spiros MancoridisMath & Computer Science, Drexel University

Page 2: Comparing the Decompositions Produced by Software ...bmitchell/research/icsm01pres.pdf · Produced by Software Clustering Algorithms using Similarity Measurements ... M1 M5 M2 M6

2Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Motivation

Using module dependencies when determining the similarity between two decompositionsis a good idea…

Page 3: Comparing the Decompositions Produced by Software ...bmitchell/research/icsm01pres.pdf · Produced by Software Clustering Algorithms using Similarity Measurements ... M1 M5 M2 M6

3Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Clustering the Structure of a System (1)

Given the structure of a system…

Page 4: Comparing the Decompositions Produced by Software ...bmitchell/research/icsm01pres.pdf · Produced by Software Clustering Algorithms using Similarity Measurements ... M1 M5 M2 M6

4Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Clustering the Structure of a System (2)

The goal is to partition the system structure graphinto clusters…

The clusters shouldrepresent the

subsystems

Page 5: Comparing the Decompositions Produced by Software ...bmitchell/research/icsm01pres.pdf · Produced by Software Clustering Algorithms using Similarity Measurements ... M1 M5 M2 M6

5Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Clustering the Structure of a System (3)

But how do we know that the clustering result is good?

Page 6: Comparing the Decompositions Produced by Software ...bmitchell/research/icsm01pres.pdf · Produced by Software Clustering Algorithms using Similarity Measurements ... M1 M5 M2 M6

6Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Ways to Evaluate Software Clustering Results…

Assess it against a mental modelAssess it against a benchmark standardTechniques:n Subjective Opinionsn Similarity Measurements

Given a software clustering result, we can:

Page 7: Comparing the Decompositions Produced by Software ...bmitchell/research/icsm01pres.pdf · Produced by Software Clustering Algorithms using Similarity Measurements ... M1 M5 M2 M6

7Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Example: How “Similar” are these Decompositions?

M1

M5

M2

M6

M3

M4

M7

M8

M1

M5

M2

M6

M3

M4

M7

M8

Green Edges:Similarity still the same…

Red Edges:Not as similar…

Conclusions:Once we add the red edges the similarity between PA and PB decreases

PA

PB

Blue Edges:Similarity still the same…

Page 8: Comparing the Decompositions Produced by Software ...bmitchell/research/icsm01pres.pdf · Produced by Software Clustering Algorithms using Similarity Measurements ... M1 M5 M2 M6

8Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Observations

Edges are important for determining the similarity between decompositionsExisting measurements don’t consider edges:n Precision / Recall (similarity)n MoJo (distance)Our idea: Use the edges to determine similarity

Page 9: Comparing the Decompositions Produced by Software ...bmitchell/research/icsm01pres.pdf · Produced by Software Clustering Algorithms using Similarity Measurements ... M1 M5 M2 M6

9Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Research ObjectivesCreate new similarity measurements that use dependencies (edges) n EdgeSim (similarity)n MeCl (distance)

Evaluate the new similarity measurements against MoJo & Precision/RecallUse similarity measurements to support evaluation of software clustering results (see our WCRE’01 paper)

Page 10: Comparing the Decompositions Produced by Software ...bmitchell/research/icsm01pres.pdf · Produced by Software Clustering Algorithms using Similarity Measurements ... M1 M5 M2 M6

10Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Example: How “Similar” are these Decompositions?

M1

M5

M2

M6

M3

M4

M7

M8

M1

M5

M2

M6

M3

M4

M7

M8

Add Green Edges:PR, MoJo, MeCl &EdgeSim unchanged.

Add Red Edges:PR, MoJo unchanged. EdgeSim, MeCl reduced.

PA

PB

Add Blue Edges:PR, MoJo, MeCl &EdgeSim unchanged.

Page 11: Comparing the Decompositions Produced by Software ...bmitchell/research/icsm01pres.pdf · Produced by Software Clustering Algorithms using Similarity Measurements ... M1 M5 M2 M6

11Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Definitions

M1

M2

M3

M4

Internal/Intra-Edge: Edge within a cluster

External/Inter-Edge: Edge between two clusters

Page 12: Comparing the Decompositions Produced by Software ...bmitchell/research/icsm01pres.pdf · Produced by Software Clustering Algorithms using Similarity Measurements ... M1 M5 M2 M6

12Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

EdgeSim Example

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

c f g

hd e

i j

k l

MDG

Page 13: Comparing the Decompositions Produced by Software ...bmitchell/research/icsm01pres.pdf · Produced by Software Clustering Algorithms using Similarity Measurements ... M1 M5 M2 M6

13Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

EdgeSim Example

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

c f g

hd e

i j

k l

MDG

Step 1:Find Common Inter-

and Intra-Edges

Page 14: Comparing the Decompositions Produced by Software ...bmitchell/research/icsm01pres.pdf · Produced by Software Clustering Algorithms using Similarity Measurements ... M1 M5 M2 M6

14Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

EdgeSim Example

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

c f g

hd e

i j

k l

MDG

CommonEdge Weight

Total EdgeWeight

=10

19= 53%

Page 15: Comparing the Decompositions Produced by Software ...bmitchell/research/icsm01pres.pdf · Produced by Software Clustering Algorithms using Similarity Measurements ... M1 M5 M2 M6

15Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

MeCl Example

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

c f g

hd e

i j

k l

MDG

Page 16: Comparing the Decompositions Produced by Software ...bmitchell/research/icsm01pres.pdf · Produced by Software Clustering Algorithms using Similarity Measurements ... M1 M5 M2 M6

16Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

MeCl Example(A→B)

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

A1 A3A2

B1 B2

Page 17: Comparing the Decompositions Produced by Software ...bmitchell/research/icsm01pres.pdf · Produced by Software Clustering Algorithms using Similarity Measurements ... M1 M5 M2 M6

17Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

MeCl Example(A1 B1)

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

c

A1,1A1 A3A2

B1 B2

U

Page 18: Comparing the Decompositions Produced by Software ...bmitchell/research/icsm01pres.pdf · Produced by Software Clustering Algorithms using Similarity Measurements ... M1 M5 M2 M6

18Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

MeCl Example(A2 B1)

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

cf g

A1,1 A2,1A1 A3A2

B1 B2

U

Page 19: Comparing the Decompositions Produced by Software ...bmitchell/research/icsm01pres.pdf · Produced by Software Clustering Algorithms using Similarity Measurements ... M1 M5 M2 M6

19Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

MeCl Example(A1 B2)

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

cf g

d e

A1,1 A2,1

A1,2

A1 A3A2

B1 B2

U

Page 20: Comparing the Decompositions Produced by Software ...bmitchell/research/icsm01pres.pdf · Produced by Software Clustering Algorithms using Similarity Measurements ... M1 M5 M2 M6

20Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

MeCl Example(A2 B2)

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

cf g

d e h

A1,1 A2,1

A1,2 A2,2

A1 A3A2

B1 B2

U

Page 21: Comparing the Decompositions Produced by Software ...bmitchell/research/icsm01pres.pdf · Produced by Software Clustering Algorithms using Similarity Measurements ... M1 M5 M2 M6

21Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

MeCl Example(A3 B2)

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

cf g

d e h

k l

i j

A1,1 A2,1

A1,2 A2,2

A3,2

A1 A3A2

B1 B2

U

Page 22: Comparing the Decompositions Produced by Software ...bmitchell/research/icsm01pres.pdf · Produced by Software Clustering Algorithms using Similarity Measurements ... M1 M5 M2 M6

22Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

B1

B2

MeCl Example(A→B)

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

cf g

d e h

k l

i j

A1,1 A2,1

A1,2 A2,2

A3,2

A1 A3A2

B1 B2

Page 23: Comparing the Decompositions Produced by Software ...bmitchell/research/icsm01pres.pdf · Produced by Software Clustering Algorithms using Similarity Measurements ... M1 M5 M2 M6

23Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

MeCl Example (A→B)

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

cf g

d e h

k l

i j

A1,1 A2,1

A1,2 A2,2

A3,2

A1 A3A2

B1 B2

B1

B2

Newly Introduced Inter-Edges

Page 24: Comparing the Decompositions Produced by Software ...bmitchell/research/icsm01pres.pdf · Produced by Software Clustering Algorithms using Similarity Measurements ... M1 M5 M2 M6

24Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

MeCl Example(B→A)

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

cf g

d e h

k l

i j

B1,1 B1,2

B2,1 B2,2

B2,3

A1 A3A2

B1 B2

Page 25: Comparing the Decompositions Produced by Software ...bmitchell/research/icsm01pres.pdf · Produced by Software Clustering Algorithms using Similarity Measurements ... M1 M5 M2 M6

25Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

A1 A2

A3

MeCl Example(B→A)

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

cf g

d e h

k l

i j

B1,1 B1,2

B2,1 B2,2

B2,3

A1 A3A2

B1 B2

Page 26: Comparing the Decompositions Produced by Software ...bmitchell/research/icsm01pres.pdf · Produced by Software Clustering Algorithms using Similarity Measurements ... M1 M5 M2 M6

26Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

MeCl Example (B→A)

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

cf g

d e h

k l

i j

B1,1 B1,2

B2,1 B2,2

B2,3

A1 A3A2

B1 B2

A1 A2

A3

Newly Introduced Inter-Edges

Page 27: Comparing the Decompositions Produced by Software ...bmitchell/research/icsm01pres.pdf · Produced by Software Clustering Algorithms using Similarity Measurements ... M1 M5 M2 M6

27Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

MeCl Calculation

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

A1 A3A2

B1 B2

Inter-Edges Introduced

MeCl(A→B):({b,e},{e,c},{g,h},{f,h})

MeCl(B→A):({e,i},{h,j},{b,f},{c,f},{h,e})

MeCl= 1 -maxW(MA→B, MB→A)

Total Edge Weight

MeCl = 1-5

19= 73.7%

Page 28: Comparing the Decompositions Produced by Software ...bmitchell/research/icsm01pres.pdf · Produced by Software Clustering Algorithms using Similarity Measurements ... M1 M5 M2 M6

28Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Similarity Measurement RecapM1

M5

M2

M6

M3

M4

M7

M8

M1

M5

M2

M6

M3

M4

M7

M8

A1:

B1:

M1

M5

M2

M6

M3

M4

M7

M8

M1

M5

M2

M6

M3

M4

M7

M8

A2:

B2:

P1 P2

MoJo(P1) = MoJo(P2) = 87.5%

PR(P1) = PR(P2) = P:84.6%, R:68.7%, AVGPR=76.7%

Conclusion… P1 is equally similar to P2

Page 29: Comparing the Decompositions Produced by Software ...bmitchell/research/icsm01pres.pdf · Produced by Software Clustering Algorithms using Similarity Measurements ... M1 M5 M2 M6

29Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Similarity Measurement RecapM1

M5

M2

M6

M3

M4

M7

M8

M1

M5

M2

M6

M3

M4

M7

M8

A1:

B1:

M1

M5

M2

M6

M3

M4

M7

M8

M1

M5

M2

M6

M3

M4

M7

M8

A2:

B2:

P1 P2

EdgeSim(P1)=77.8% EdgeSim(P2)=58.3%

MeCl(P1)=88.9% MeCl(P2)=66.7%

Conclusion… P1 is more similar than P2

Page 30: Comparing the Decompositions Produced by Software ...bmitchell/research/icsm01pres.pdf · Produced by Software Clustering Algorithms using Similarity Measurements ... M1 M5 M2 M6

30Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Summary:EdgeSim & MeCl

EdgeSim:n Rewards clustering algorithms for

preserving the edge typesn Penalizes clustering algorithms for

changing the edge types

MeCl:n Rewards the clustering algorithm for

creating cohesive “subclusters”

Page 31: Comparing the Decompositions Produced by Software ...bmitchell/research/icsm01pres.pdf · Produced by Software Clustering Algorithms using Similarity Measurements ... M1 M5 M2 M6

31Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Special Modules

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

A1 A3A2

B1 B2

Omnipresent Modules:“Strong” Connection to other Modules

Library Modules:Always used by other modules, never use other modules

Isomorphic Modules:Modules equally connected to other subsystems

Page 32: Comparing the Decompositions Produced by Software ...bmitchell/research/icsm01pres.pdf · Produced by Software Clustering Algorithms using Similarity Measurements ... M1 M5 M2 M6

32Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Special Modules

i

k la b

c f g

h

c

f

g

a

b

h

k

l

i

PA

PB

A1 A3A2

B1 B2

k

k

Omnipresent Modules:Removed

Library Modules:Removed

Isomorphic Modules:Replicated

Special Treatment of Special Treatment of Special Modules helps to Special Modules helps to determine the Similaritydetermine the Similarity

d

d

Page 33: Comparing the Decompositions Produced by Software ...bmitchell/research/icsm01pres.pdf · Produced by Software Clustering Algorithms using Similarity Measurements ... M1 M5 M2 M6

33Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Case Study Overview

Clustering Algorithms

Similarity Evaluation Tool

Similarity Analysis

Precision/Recall

Source Codevoid main(){printf(“hello”);

}

Clustered Result

M1

M2

M3

M5M4

M6

M7 M8

MoJo

EdgeSim

MeClAverage, Variance, etc. based on 100 clustering runs…(4950 Evaluations)

Page 34: Comparing the Decompositions Produced by Software ...bmitchell/research/icsm01pres.pdf · Produced by Software Clustering Algorithms using Similarity Measurements ... M1 M5 M2 M6

34Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Case Study ObservationsAll similarity measurements exhibit consistent behavior for the systems studied

Removal of “special” modules improved all similarity measurements Treating isomorphic modules specially only improved similarity slightlyEdgeSim and MeCl produced higher and less variable similarity values then Precision/Recall and MoJo

For all systems examined:If MeCl(SA) < MeCl(SB) then MoJo(SA) < MoJo(SB), PR(SA) < PR(SB), and EdgeSim(SA) < EdgeSim(SB)

Page 35: Comparing the Decompositions Produced by Software ...bmitchell/research/icsm01pres.pdf · Produced by Software Clustering Algorithms using Similarity Measurements ... M1 M5 M2 M6

35Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Questions

Special Thanks To:n AT&T Researchn Sun Microsystemsn DARPAn NSFn US Army