Upload
others
View
20
Download
0
Embed Size (px)
Citation preview
KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association
SOFTWARE DESIGN AND QUALITY GROUP INSTITUTE FOR PROGRAM STRUCTURES AND DATA ORGANIZATION, FACULTY OF INFORMATICS
www.kit.edu
Source: pixelio.de
sdq.ipd.kit.edu
Remodularizing Legacy Model Transformations with Automatic Clustering TechniquesAndreas Rentschler, Dominik Werle, Qais Noorshams, Lucia Happe, Ralf Reussner !3rd Workshop on the Analysis of Model Transformations Monday, September 29, 2014
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
Model-driven Software Quality Prediction
Performance modelof a component-basedsoftware architecture
Performance data:Execution timeThroughputResource utilisation
Motivation ◉ ⚪ Validation ⚪ Conclusion ⚪2
Approach ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
Model-driven Software Quality Prediction
Performance modelof a component-basedsoftware architecture
Performance data:Execution timeThroughputResource utilisation
Motivation ◉ ⚪ Validation ⚪
Palladio Component
ModelMeasurement
DataGenerated Simulation
Code
SimuCom Simulation Framework
Transformation Run
Conclusion ⚪2
Approach ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
Model-driven Software Quality Prediction
Performance modelof a component-basedsoftware architecture
Performance data:Execution timeThroughputResource utilisation
Motivation ◉ ⚪ Validation ⚪
Palladio Component
ModelMeasurement
DataGenerated Simulation
Code
SimuCom Simulation Framework
Transformation Run
5k lines of
codeConclusion ⚪
2Approach ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
Model-driven Software Quality Prediction
Performance modelof a component-basedsoftware architecture
Performance data:Execution timeThroughputResource utilisation
Motivation ◉ ⚪ Validation ⚪
Palladio Component
ModelMeasurement
DataQueueing Petri Net
ModelSimulateTransformation
5k lines of
codeQVT-O
Conclusion ⚪2
Approach ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
Problem and Overall Approach
3Motivation ⚫ ◉ Approach ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ Validation ⚪ Conclusion ⚪
Legacy Transformation
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
Problem and Overall Approach
3Motivation ⚫ ◉ Approach ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ Validation ⚪ Conclusion ⚪
Legacy Transformation Modularized
Transformation
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
Problem and Overall Approach
3Motivation ⚫ ◉ Approach ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ Validation ⚪ Conclusion ⚪
Legacy Transformation Modularized
Transformation
Manual Decomposition
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
Problem and Overall Approach
3Motivation ⚫ ◉ Approach ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ Validation ⚪ Conclusion ⚪
Legacy Transformation Modularized
Transformation
Manual Decomposition
ClustersCluster Analysis
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
Problem and Overall Approach
3Motivation ⚫ ◉ Approach ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ Validation ⚪ Conclusion ⚪
Legacy Transformation Modularized
Transformation
Manual Decomposition
Control and Data
Dependencies
Dependency Analysis
ClustersCluster Analysis
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
Problem and Overall Approach
3Motivation ⚫ ◉ Approach ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ Validation ⚪ Conclusion ⚪
How can we support typical transformation designs?What dependence information is required?
Legacy Transformation Modularized
Transformation
Manual Decomposition
Control and Data
Dependencies
Dependency Analysis
ClustersCluster Analysis
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
Problem and Overall Approach
3Motivation ⚫ ◉ Approach ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ Validation ⚪ Conclusion ⚪
How can we support typical transformation designs?What dependence information is required?
Legacy Transformation Modularized
Transformation
Manual Decomposition
Control and Data
Dependencies
Dependency Analysis
ClustersCluster Analysis
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
Design RulesWhat’s makes model transformations different from GPL programs?
Data-centric operationsData is hierarchically structuredData models extrinsically defined!
Common decompositional styles [Lawley04]:
4Motivation ⚫ ⚫ Validation ⚪ Conclusion ⚪
Legacy Transformation Modularized
Transformation
Manual Decomposition
Control and Data
Dependencies
Dependency Analysis
ClustersCluster Analysis
Source-driven Target-driven Aspect-driven
one-to-many mappings
many-to-one mappings,M2T templates
a mixture of both
Approach ◉ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
A Minimalistic Example Transformation
5Motivation ⚫ ⚫ Validation ⚪ Conclusion ⚪
Legacy Transformation Modularized
Transformation
Manual Decomposition
Control and Data
Dependencies
Dependency Analysis
ClustersCluster Analysis
Approach ⚫ ◉ ⚪ ⚪ ⚪ ⚪ ⚪
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
A Minimalistic Example Transformation
5Motivation ⚫ ⚫ Validation ⚪ Conclusion ⚪
Legacy Transformation Modularized
Transformation
Manual Decomposition
Control and Data
Dependencies
Dependency Analysis
ClustersCluster Analysis
StopAction
ActivityModel
Source Model
Activity
Action
StartAction
actions
successors
Approach ⚫ ◉ ⚪ ⚪ ⚪ ⚪ ⚪
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
A Minimalistic Example Transformation
5Motivation ⚫ ⚫ Validation ⚪ Conclusion ⚪
Legacy Transformation Modularized
Transformation
Manual Decomposition
Control and Data
Dependencies
Dependency Analysis
ClustersCluster Analysis
StopAction
ActivityModel
Source Model
Activity
Action
StartAction
actions
successors
ProcessModel
Target Model
Process
Stepsteps
next
Approach ⚫ ◉ ⚪ ⚪ ⚪ ⚪ ⚪
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
A Minimalistic Example Transformation
5Motivation ⚫ ⚫ Validation ⚪ Conclusion ⚪
Legacy Transformation Modularized
Transformation
Manual Decomposition
Control and Data
Dependencies
Dependency Analysis
ClustersCluster Analysis
StopAction
ActivityModel
Source Model
Activity
Action
StartAction
actions
successors
ProcessModel
Target Model
Process
Stepsteps
next
mapActivity2Process
Transformation
mapAction2Step
call
call
mapAction2Step
mapAction2Step
call
mapping
mapping
mapping
mappingin out
in
in
in
out
Approach ⚫ ◉ ⚪ ⚪ ⚪ ⚪ ⚪
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
A Minimalistic Example Transformation
5Motivation ⚫ ⚫ Validation ⚪ Conclusion ⚪
Legacy Transformation Modularized
Transformation
Manual Decomposition
Control and Data
Dependencies
Dependency Analysis
ClustersCluster Analysis
StopAction
ActivityModel
Source Model
Activity
Action
StartAction
actions
successors
CompositeActions
CompositeAction
actions
ProcessModel
Target Model
Process
Stepsteps
next
mapActivity2Process
Transformation
mapAction2Step
call
call
mapAction2Step
mapAction2Step
call
mapping
mapping
mapping
mappingin out
in
in
in
out
Approach ⚫ ◉ ⚪ ⚪ ⚪ ⚪ ⚪
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
A Minimalistic Example Transformation
5Motivation ⚫ ⚫ Validation ⚪ Conclusion ⚪
Legacy Transformation Modularized
Transformation
Manual Decomposition
Control and Data
Dependencies
Dependency Analysis
ClustersCluster Analysis
StopAction
ActivityModel
Source Model
Activity
Action
StartAction
actions
successors
CompositeActions
CompositeAction
actions
ProcessModel
Target Model
Process
Stepsteps
next
mapAction2Step
callmapping
call
in
outin
out
createProcesshelper
mapActivity2Process
Transformation
mapAction2Step
call
call
mapAction2Step
mapAction2Step
call
mapping
mapping
mapping
mappingin out
in
in
in
out
Approach ⚫ ◉ ⚪ ⚪ ⚪ ⚪ ⚪
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
A Minimalistic Example Transformation
5Motivation ⚫ ⚫ Validation ⚪ Conclusion ⚪
Legacy Transformation Modularized
Transformation
Manual Decomposition
Control and Data
Dependencies
Dependency Analysis
ClustersCluster Analysis
Activity2Process
Action2Step
CompositeAction2Step
StopAction
ActivityModel
Source Model
Activity
Action
StartAction
actions
successors
CompositeActions
CompositeAction
actions
ProcessModel
Target Model
Process
Stepsteps
next
mapAction2Step
callmapping
call
in
outin
out
createProcesshelper
mapActivity2Process
Transformation
mapAction2Step
call
call
mapAction2Step
mapAction2Step
call
mapping
mapping
mapping
mappingin out
in
in
in
out
Approach ⚫ ◉ ⚪ ⚪ ⚪ ⚪ ⚪
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
The Bunch Clustering Approach*
6Motivation ⚫ ⚫ Approach ⚫ ⚫ ◉ ⚪ ⚪ ⚪ ⚪ Validation ⚪ Conclusion ⚪* by [Mitchell06]; alternative approaches are ARCH, ACDC, LIMBO, … [Shtern12]
Legacy Transformation Modularized
Transformation
Manual Decomposition
Control and Data
Dependencies
Dependency Analysis
ClustersCluster Analysis
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
The Bunch Clustering Approach*
6Motivation ⚫ ⚫ Approach ⚫ ⚫ ◉ ⚪ ⚪ ⚪ ⚪ Validation ⚪ Conclusion ⚪* by [Mitchell06]; alternative approaches are ARCH, ACDC, LIMBO, … [Shtern12]
Legacy Transformation Modularized
Transformation
Manual Decomposition
Control and Data
Dependencies
Dependency Analysis
ClustersCluster Analysis
115
5
15
15
515
5
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
The Bunch Clustering Approach*
6Motivation ⚫ ⚫ Approach ⚫ ⚫ ◉ ⚪ ⚪ ⚪ ⚪ Validation ⚪ Conclusion ⚪* by [Mitchell06]; alternative approaches are ARCH, ACDC, LIMBO, … [Shtern12]
Legacy Transformation Modularized
Transformation
Manual Decomposition
Control and Data
Dependencies
Dependency Analysis
ClustersCluster Analysis
115
5
15
15
515
5
115
5
15
15
515
5
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
The Bunch Clustering Approach*
6Motivation ⚫ ⚫ Approach ⚫ ⚫ ◉ ⚪ ⚪ ⚪ ⚪ Validation ⚪ Conclusion ⚪* by [Mitchell06]; alternative approaches are ARCH, ACDC, LIMBO, … [Shtern12]
Legacy Transformation Modularized
Transformation
Manual Decomposition
Control and Data
Dependencies
Dependency Analysis
ClustersCluster Analysis
Clustering methods: hill climbing, genetic, exhaustive
115
5
15
15
515
5
115
5
15
15
515
5
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
The Bunch Clustering Approach*
6
composition. It traverses the tree-like structured activity model, and each nodeembodies an own high-level concept that is mapped to target concepts.Target-driven decomposition. When objects of a particular class in the targetdomain are constructed from information distributed over instances of multipleclasses in the source domain (many-to-one mappings), a target-driven decom-position is deemed more adequate. Transformations from low-level to high-levelconcepts (synthesizing transformations) use this style.Aspect-driven decomposition. In several cases, a mixture of the two ap-plies. Aspect-driven decompositions are required whenever a single concern isdistributed over multiple concepts in both domains (many-to-many mappings). In-place transformations (i.e., transformations within a single domain) that replaceconcepts with low-level concepts often follow this style, particularly if operationsare executed per concern and affect multiple elements in the domain.
Any of these styles – and preferably also mixtures – must be supported by anautomatic decomposition analysis in order to produce meaningful results.
3 Automatic Software ClusteringThe principal objective of software clustering methodologies is to help softwareengineers in understanding and maintaining large software systems with outdatedor missing documentation and inferior structure. They do so by partitioningsystem entities – including methods, classes, and modules – into manageablesub systems. A survey on algorithms that had been used to cluster generalsoftware systems has been carried out by Shtern et al. [3]. They describe variousclasses of algorithms that can be used for this purpose, including algorithmsfrom graph-theory, constructive, hierarchical agglomerative, and optimizationalgorithms.
In this paper, we employ the Bunch tool, a clustering system that uses oneof two optimization algorithms, hill climbing or a genetic algorithm, to findnear-optimal solutions [4]. Bunch operates on a graph with weighted edges,the so-called Module Dependency Graph (MDG). Nodes represent the low-levelconcepts to be grouped into modules, and may correspond to methods and classes.As a fitness function for the optimization algorithms, Modularization Quality(MQ) is used, a metric that integrates coupling and cohesion among the clustersinto a single value. Optimization starts with a randomly created partitioning,for which neighboring partitions – with respect to atomic move operations – areexplored.
According to Mitchell et al. [4], a dependency graph is a directed graphG = (V,E) that consists of a set of vertices and edges, E ⇢ V ⇥ V . A partition(or clustering) of G into n clusters (n-partition) is then formally defined as ⇧G =Sn
i=1 Gi with Gi = (Vi, Ei), and 8v 2 V 91k 2 [1, n], v 2 Vk. Edges Ei are edgesthat leave or remain inside the partition, Ei = {hv1, v2i 2 E : v1 2 Vi ^ v2 2 V }.
The MQ value is the sum of the cluster factors CFi over all i 2 {1, . . . , k}clusters. The cluster factor of the i-th cluster is defined as the normalized ratiobetween the weight of all the edges within the cluster, intraedges µi, and thesum of weights of all edges that connect with nodes in one of the other clusters,interedges ✏i,j or ✏j,i. Penalty of interedges is equally distributed to each of theaffected clusters i and j:
MQ =kX
i=1
CFi, CFi =
8<
:
0, µi = 0µi
µi+ 12
Pkj=1j 6=i
(✏i,j+✏j,i), otherwise
Motivation ⚫ ⚫ Approach ⚫ ⚫ ◉ ⚪ ⚪ ⚪ ⚪ Validation ⚪ Conclusion ⚪* by [Mitchell06]; alternative approaches are ARCH, ACDC, LIMBO, … [Shtern12]
Modularity quality index: favor low coupling & high cohesion
Legacy Transformation Modularized
Transformation
Manual Decomposition
Control and Data
Dependencies
Dependency Analysis
ClustersCluster Analysis
Clustering methods: hill climbing, genetic, exhaustive
115
5
15
15
515
5
115
5
15
15
515
5
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
Dependence Analysis
7Motivation ⚫ ⚫ Approach ⚫ ⚫ ⚫ ◉ ⚪ ⚪ ⚪ Validation ⚪ Conclusion ⚪
Legacy Transformation Modularized
Transformation
Manual Decomposition
Control and Data
Dependencies
Dependency Analysis
ClustersCluster Analysis
mapping_mapAction2Stepclass_core_Action
package_core
class_core_BasicAction
mapping_mapActivity2Process
entry_main
class_core_Activity
class_core_StartAction mapping_mapAction2Step2
mapping_mapAction2Step3class_core_StopAction
package_process
class_process_Step
class_process_Process
helper_createProcess
package_composite
mapping_mapAction2Stepclass_composite_CompositeAction
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
Dependence Analysis
7
«package»
«package»
«package»
«write»
«write»
«write»
«read»
«read»
«read»
«read»
«read»
«read»
«call»
«call»
«call»
«call»
«call»
«call»
«write»
Motivation ⚫ ⚫ Approach ⚫ ⚫ ⚫ ◉ ⚪ ⚪ ⚪ Validation ⚪ Conclusion ⚪
Legacy Transformation Modularized
Transformation
Manual Decomposition
Control and Data
Dependencies
Dependency Analysis
ClustersCluster Analysis
mapping_mapAction2Stepclass_core_Action
package_core
class_core_BasicAction
mapping_mapActivity2Process
entry_main
class_core_Activity
class_core_StartAction mapping_mapAction2Step2
mapping_mapAction2Step3class_core_StopAction
package_process
class_process_Step
class_process_Process
helper_createProcess
package_composite
mapping_mapAction2Stepclass_composite_CompositeAction
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
Weight ConfigurationDepending on the style…!
!
!
!
!
!
…initialize a configuration vector for weighting the various types of edges:
8Motivation ⚫ ⚫ Approach ⚫ ⚫ ⚫ ⚫ ◉ ⚪ ⚪ Validation ⚪ Conclusion ⚪
Wwrite for write-access dependencies to classes and packages, Wread for read-access dependencies to classes and packages from one of the method’s parameters,Wcall for method call dependencies, and Wpackage for containment of classes andpackages to their directly containing package.These weights constitute a particularweight configuration, vector WC := hWwrite ,Wread ,Wcall ,Wpackagei 2 N4
0.Choosing a weight of zero naturally results in the respective type of edge
being ignored by the clustering algorithm. Choosing values Wwrite � Wread
promotes a mainly target-driven decomposition, whereas values Wwrite ⌧ Wread
enforce a mainly source-driven decomposition.
4.2 Cluster Analysis
Once dependence information has been extracted from the source files in form ofa graph, and weights have been configured accordingly, cluster analysis can beperformed on the obtained graph structure in a follow-up step.Algorithm and parameters. Bunch supports three clustering algorithms, ex-haustive search, hill climbing, and a genetic algorithm. In this paper, we useBunch’s hill climbing algorithm which appeared to produce more stable results.We use a consistent configuration, with population size set to 100, the minimumsearch space set to 90%, leaving 10% of the neighbors selected randomly.
Fig. 3 depicts the graph that had been extracted from the Activity2Processexample. Colored nodes represent the transformation’s methods, and gray nodesmark the transformation’s model elements. Boxes mark a two-level partitioningcreated by Bunch – L0 stands for the lower and more detailed level, whereasL1 partitions subsume one or more L0 partitions. For this clustering, a weightconfiguration h1, 15, 5, 15i had been used. With a sufficiently higher weight forread than for write dependencies, 15 � 1, a source-driven decomposition hadbeen performed. Therefore, mapping methods have been grouped together withtheir respective source model elements (Activity, Action, etc.) on L0. Twoof the clusters solely contain model elements and can be ignored. In the L1partition, two clusters remain: One cluster aggregates Activity2Process andAction2Stepmethods, the other cluster aggregates CompositeAction2Stepmethods. The reference to class CompositeAction may have primarily inducedthe algorithm to correctly group the respective methods together. When compar-ing the Bunch-derived L0 partition with our handmade partitioning illustratedby colors blue, red and yellow (cf. Fig. 3), we can observe that both partitionsare highly similar. Bunch, however, decided to agglomerate the red and blue L0clusters to a single L1 cluster. Developers may think about adopting Bunch’srecommendation and merge clusters Activity2Process and Action2Step.
4.3 Structural Analysis
The main objective of the approach is to gain a better understanding of the code,but also to agree on a modular decomposition that fosters understandability andthat can be used to restructure the code. To achieve this goal, in this last step, theexisting modular structure and partitions computed by the algorithm on differentparameters are compared against each other regarding their modularizationquality and structural differences. Although this is a manual step that requiresto find a compromise on two or more partitions and to refine the solution basedon expert knowledge, developers can profit from a set of metrics.
To include the legacy modular structure of the code into the assessment, anautomatized structural analysis is used that extracts this kind of information.
Method creates/updates model elementMethod reads model element
Method invokes another methodModel element is contained in package
Legacy Transformation Modularized
Transformation
Manual Decomposition
Control and Data
Dependencies
Dependency Analysis
ClustersCluster Analysis
Source-driven Target-driven Aspect-driven
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
Assigning Weights
9
Wwrite for write-access dependencies to classes and packages, Wread for read-access dependencies to classes and packages from one of the method’s parameters,Wcall for method call dependencies, and Wpackage for containment of classes andpackages to their directly containing package.These weights constitute a particularweight configuration, vector WC := hWwrite ,Wread ,Wcall ,Wpackagei 2 N4
0.Choosing a weight of zero naturally results in the respective type of edge
being ignored by the clustering algorithm. Choosing values Wwrite � Wread
promotes a mainly target-driven decomposition, whereas values Wwrite ⌧ Wread
enforce a mainly source-driven decomposition.
4.2 Cluster Analysis
Once dependence information has been extracted from the source files in form ofa graph, and weights have been configured accordingly, cluster analysis can beperformed on the obtained graph structure in a follow-up step.Algorithm and parameters. Bunch supports three clustering algorithms, ex-haustive search, hill climbing, and a genetic algorithm. In this paper, we useBunch’s hill climbing algorithm which appeared to produce more stable results.We use a consistent configuration, with population size set to 100, the minimumsearch space set to 90%, leaving 10% of the neighbors selected randomly.
Fig. 3 depicts the graph that had been extracted from the Activity2Processexample. Colored nodes represent the transformation’s methods, and gray nodesmark the transformation’s model elements. Boxes mark a two-level partitioningcreated by Bunch – L0 stands for the lower and more detailed level, whereasL1 partitions subsume one or more L0 partitions. For this clustering, a weightconfiguration h1, 15, 5, 15i had been used. With a sufficiently higher weight forread than for write dependencies, 15 � 1, a source-driven decomposition hadbeen performed. Therefore, mapping methods have been grouped together withtheir respective source model elements (Activity, Action, etc.) on L0. Twoof the clusters solely contain model elements and can be ignored. In the L1partition, two clusters remain: One cluster aggregates Activity2Process andAction2Stepmethods, the other cluster aggregates CompositeAction2Stepmethods. The reference to class CompositeAction may have primarily inducedthe algorithm to correctly group the respective methods together. When compar-ing the Bunch-derived L0 partition with our handmade partitioning illustratedby colors blue, red and yellow (cf. Fig. 3), we can observe that both partitionsare highly similar. Bunch, however, decided to agglomerate the red and blue L0clusters to a single L1 cluster. Developers may think about adopting Bunch’srecommendation and merge clusters Activity2Process and Action2Step.
4.3 Structural Analysis
The main objective of the approach is to gain a better understanding of the code,but also to agree on a modular decomposition that fosters understandability andthat can be used to restructure the code. To achieve this goal, in this last step, theexisting modular structure and partitions computed by the algorithm on differentparameters are compared against each other regarding their modularizationquality and structural differences. Although this is a manual step that requiresto find a compromise on two or more partitions and to refine the solution basedon expert knowledge, developers can profit from a set of metrics.
To include the legacy modular structure of the code into the assessment, anautomatized structural analysis is used that extracts this kind of information.
Wwrite for write-access dependencies to classes and packages, Wread for read-access dependencies to classes and packages from one of the method’s parameters,Wcall for method call dependencies, and Wpackage for containment of classes andpackages to their directly containing package.These weights constitute a particularweight configuration, vector WC := hWwrite ,Wread ,Wcall ,Wpackagei 2 N4
0.Choosing a weight of zero naturally results in the respective type of edge
being ignored by the clustering algorithm. Choosing values Wwrite � Wread
promotes a mainly target-driven decomposition, whereas values Wwrite ⌧ Wread
enforce a mainly source-driven decomposition.
4.2 Cluster Analysis
Once dependence information has been extracted from the source files in form ofa graph, and weights have been configured accordingly, cluster analysis can beperformed on the obtained graph structure in a follow-up step.Algorithm and parameters. Bunch supports three clustering algorithms, ex-haustive search, hill climbing, and a genetic algorithm. In this paper, we useBunch’s hill climbing algorithm which appeared to produce more stable results.We use a consistent configuration, with population size set to 100, the minimumsearch space set to 90%, leaving 10% of the neighbors selected randomly.
Fig. 3 depicts the graph that had been extracted from the Activity2Processexample. Colored nodes represent the transformation’s methods, and gray nodesmark the transformation’s model elements. Boxes mark a two-level partitioningcreated by Bunch – L0 stands for the lower and more detailed level, whereasL1 partitions subsume one or more L0 partitions. For this clustering, a weightconfiguration h1, 15, 5, 15i had been used. With a sufficiently higher weight forread than for write dependencies, 15 � 1, a source-driven decomposition hadbeen performed. Therefore, mapping methods have been grouped together withtheir respective source model elements (Activity, Action, etc.) on L0. Twoof the clusters solely contain model elements and can be ignored. In the L1partition, two clusters remain: One cluster aggregates Activity2Process andAction2Stepmethods, the other cluster aggregates CompositeAction2Stepmethods. The reference to class CompositeAction may have primarily inducedthe algorithm to correctly group the respective methods together. When compar-ing the Bunch-derived L0 partition with our handmade partitioning illustratedby colors blue, red and yellow (cf. Fig. 3), we can observe that both partitionsare highly similar. Bunch, however, decided to agglomerate the red and blue L0clusters to a single L1 cluster. Developers may think about adopting Bunch’srecommendation and merge clusters Activity2Process and Action2Step.
4.3 Structural Analysis
The main objective of the approach is to gain a better understanding of the code,but also to agree on a modular decomposition that fosters understandability andthat can be used to restructure the code. To achieve this goal, in this last step, theexisting modular structure and partitions computed by the algorithm on differentparameters are compared against each other regarding their modularizationquality and structural differences. Although this is a manual step that requiresto find a compromise on two or more partitions and to refine the solution basedon expert knowledge, developers can profit from a set of metrics.
To include the legacy modular structure of the code into the assessment, anautomatized structural analysis is used that extracts this kind of information.
Wwrite for write-access dependencies to classes and packages, Wread for read-access dependencies to classes and packages from one of the method’s parameters,Wcall for method call dependencies, and Wpackage for containment of classes andpackages to their directly containing package.These weights constitute a particularweight configuration, vector WC := hWwrite ,Wread ,Wcall ,Wpackagei 2 N4
0.Choosing a weight of zero naturally results in the respective type of edge
being ignored by the clustering algorithm. Choosing values Wwrite � Wread
promotes a mainly target-driven decomposition, whereas values Wwrite ⌧ Wread
enforce a mainly source-driven decomposition.
4.2 Cluster Analysis
Once dependence information has been extracted from the source files in form ofa graph, and weights have been configured accordingly, cluster analysis can beperformed on the obtained graph structure in a follow-up step.Algorithm and parameters. Bunch supports three clustering algorithms, ex-haustive search, hill climbing, and a genetic algorithm. In this paper, we useBunch’s hill climbing algorithm which appeared to produce more stable results.We use a consistent configuration, with population size set to 100, the minimumsearch space set to 90%, leaving 10% of the neighbors selected randomly.
Fig. 3 depicts the graph that had been extracted from the Activity2Processexample. Colored nodes represent the transformation’s methods, and gray nodesmark the transformation’s model elements. Boxes mark a two-level partitioningcreated by Bunch – L0 stands for the lower and more detailed level, whereasL1 partitions subsume one or more L0 partitions. For this clustering, a weightconfiguration h1, 15, 5, 15i had been used. With a sufficiently higher weight forread than for write dependencies, 15 � 1, a source-driven decomposition hadbeen performed. Therefore, mapping methods have been grouped together withtheir respective source model elements (Activity, Action, etc.) on L0. Twoof the clusters solely contain model elements and can be ignored. In the L1partition, two clusters remain: One cluster aggregates Activity2Process andAction2Stepmethods, the other cluster aggregates CompositeAction2Stepmethods. The reference to class CompositeAction may have primarily inducedthe algorithm to correctly group the respective methods together. When compar-ing the Bunch-derived L0 partition with our handmade partitioning illustratedby colors blue, red and yellow (cf. Fig. 3), we can observe that both partitionsare highly similar. Bunch, however, decided to agglomerate the red and blue L0clusters to a single L1 cluster. Developers may think about adopting Bunch’srecommendation and merge clusters Activity2Process and Action2Step.
4.3 Structural Analysis
The main objective of the approach is to gain a better understanding of the code,but also to agree on a modular decomposition that fosters understandability andthat can be used to restructure the code. To achieve this goal, in this last step, theexisting modular structure and partitions computed by the algorithm on differentparameters are compared against each other regarding their modularizationquality and structural differences. Although this is a manual step that requiresto find a compromise on two or more partitions and to refine the solution basedon expert knowledge, developers can profit from a set of metrics.
To include the legacy modular structure of the code into the assessment, anautomatized structural analysis is used that extracts this kind of information.
«package»
«package»
«package»
«write»
«write»
«write»
«read»
«read»
«read»
«read»
«read»
«read»
«call»
«call»
«call»
«call»
«call»
«call»
«write»
Motivation ⚫ ⚫ Approach ⚫ ⚫ ⚫ ⚫ ⚫ ◉ ⚪ Validation ⚪ Conclusion ⚪
Legacy Transformation Modularized
Transformation
Manual Decomposition
Control and Data
Dependencies
Dependency Analysis
ClustersCluster Analysis
mapping_mapAction2Stepclass_core_Action
package_core
class_core_BasicAction
mapping_mapActivity2Process
entry_main
class_core_Activity
class_core_StartAction mapping_mapAction2Step2
mapping_mapAction2Step3class_core_StopAction
package_process
class_process_Step
class_process_Process
helper_createProcess
package_composite
mapping_mapAction2Stepclass_composite_CompositeAction
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
Assigning Weights
9
Wwrite for write-access dependencies to classes and packages, Wread for read-access dependencies to classes and packages from one of the method’s parameters,Wcall for method call dependencies, and Wpackage for containment of classes andpackages to their directly containing package.These weights constitute a particularweight configuration, vector WC := hWwrite ,Wread ,Wcall ,Wpackagei 2 N4
0.Choosing a weight of zero naturally results in the respective type of edge
being ignored by the clustering algorithm. Choosing values Wwrite � Wread
promotes a mainly target-driven decomposition, whereas values Wwrite ⌧ Wread
enforce a mainly source-driven decomposition.
4.2 Cluster Analysis
Once dependence information has been extracted from the source files in form ofa graph, and weights have been configured accordingly, cluster analysis can beperformed on the obtained graph structure in a follow-up step.Algorithm and parameters. Bunch supports three clustering algorithms, ex-haustive search, hill climbing, and a genetic algorithm. In this paper, we useBunch’s hill climbing algorithm which appeared to produce more stable results.We use a consistent configuration, with population size set to 100, the minimumsearch space set to 90%, leaving 10% of the neighbors selected randomly.
Fig. 3 depicts the graph that had been extracted from the Activity2Processexample. Colored nodes represent the transformation’s methods, and gray nodesmark the transformation’s model elements. Boxes mark a two-level partitioningcreated by Bunch – L0 stands for the lower and more detailed level, whereasL1 partitions subsume one or more L0 partitions. For this clustering, a weightconfiguration h1, 15, 5, 15i had been used. With a sufficiently higher weight forread than for write dependencies, 15 � 1, a source-driven decomposition hadbeen performed. Therefore, mapping methods have been grouped together withtheir respective source model elements (Activity, Action, etc.) on L0. Twoof the clusters solely contain model elements and can be ignored. In the L1partition, two clusters remain: One cluster aggregates Activity2Process andAction2Stepmethods, the other cluster aggregates CompositeAction2Stepmethods. The reference to class CompositeAction may have primarily inducedthe algorithm to correctly group the respective methods together. When compar-ing the Bunch-derived L0 partition with our handmade partitioning illustratedby colors blue, red and yellow (cf. Fig. 3), we can observe that both partitionsare highly similar. Bunch, however, decided to agglomerate the red and blue L0clusters to a single L1 cluster. Developers may think about adopting Bunch’srecommendation and merge clusters Activity2Process and Action2Step.
4.3 Structural Analysis
The main objective of the approach is to gain a better understanding of the code,but also to agree on a modular decomposition that fosters understandability andthat can be used to restructure the code. To achieve this goal, in this last step, theexisting modular structure and partitions computed by the algorithm on differentparameters are compared against each other regarding their modularizationquality and structural differences. Although this is a manual step that requiresto find a compromise on two or more partitions and to refine the solution basedon expert knowledge, developers can profit from a set of metrics.
To include the legacy modular structure of the code into the assessment, anautomatized structural analysis is used that extracts this kind of information.
Wwrite for write-access dependencies to classes and packages, Wread for read-access dependencies to classes and packages from one of the method’s parameters,Wcall for method call dependencies, and Wpackage for containment of classes andpackages to their directly containing package.These weights constitute a particularweight configuration, vector WC := hWwrite ,Wread ,Wcall ,Wpackagei 2 N4
0.Choosing a weight of zero naturally results in the respective type of edge
being ignored by the clustering algorithm. Choosing values Wwrite � Wread
promotes a mainly target-driven decomposition, whereas values Wwrite ⌧ Wread
enforce a mainly source-driven decomposition.
4.2 Cluster Analysis
Once dependence information has been extracted from the source files in form ofa graph, and weights have been configured accordingly, cluster analysis can beperformed on the obtained graph structure in a follow-up step.Algorithm and parameters. Bunch supports three clustering algorithms, ex-haustive search, hill climbing, and a genetic algorithm. In this paper, we useBunch’s hill climbing algorithm which appeared to produce more stable results.We use a consistent configuration, with population size set to 100, the minimumsearch space set to 90%, leaving 10% of the neighbors selected randomly.
Fig. 3 depicts the graph that had been extracted from the Activity2Processexample. Colored nodes represent the transformation’s methods, and gray nodesmark the transformation’s model elements. Boxes mark a two-level partitioningcreated by Bunch – L0 stands for the lower and more detailed level, whereasL1 partitions subsume one or more L0 partitions. For this clustering, a weightconfiguration h1, 15, 5, 15i had been used. With a sufficiently higher weight forread than for write dependencies, 15 � 1, a source-driven decomposition hadbeen performed. Therefore, mapping methods have been grouped together withtheir respective source model elements (Activity, Action, etc.) on L0. Twoof the clusters solely contain model elements and can be ignored. In the L1partition, two clusters remain: One cluster aggregates Activity2Process andAction2Stepmethods, the other cluster aggregates CompositeAction2Stepmethods. The reference to class CompositeAction may have primarily inducedthe algorithm to correctly group the respective methods together. When compar-ing the Bunch-derived L0 partition with our handmade partitioning illustratedby colors blue, red and yellow (cf. Fig. 3), we can observe that both partitionsare highly similar. Bunch, however, decided to agglomerate the red and blue L0clusters to a single L1 cluster. Developers may think about adopting Bunch’srecommendation and merge clusters Activity2Process and Action2Step.
4.3 Structural Analysis
The main objective of the approach is to gain a better understanding of the code,but also to agree on a modular decomposition that fosters understandability andthat can be used to restructure the code. To achieve this goal, in this last step, theexisting modular structure and partitions computed by the algorithm on differentparameters are compared against each other regarding their modularizationquality and structural differences. Although this is a manual step that requiresto find a compromise on two or more partitions and to refine the solution basedon expert knowledge, developers can profit from a set of metrics.
To include the legacy modular structure of the code into the assessment, anautomatized structural analysis is used that extracts this kind of information.
Wwrite for write-access dependencies to classes and packages, Wread for read-access dependencies to classes and packages from one of the method’s parameters,Wcall for method call dependencies, and Wpackage for containment of classes andpackages to their directly containing package.These weights constitute a particularweight configuration, vector WC := hWwrite ,Wread ,Wcall ,Wpackagei 2 N4
0.Choosing a weight of zero naturally results in the respective type of edge
being ignored by the clustering algorithm. Choosing values Wwrite � Wread
promotes a mainly target-driven decomposition, whereas values Wwrite ⌧ Wread
enforce a mainly source-driven decomposition.
4.2 Cluster Analysis
Once dependence information has been extracted from the source files in form ofa graph, and weights have been configured accordingly, cluster analysis can beperformed on the obtained graph structure in a follow-up step.Algorithm and parameters. Bunch supports three clustering algorithms, ex-haustive search, hill climbing, and a genetic algorithm. In this paper, we useBunch’s hill climbing algorithm which appeared to produce more stable results.We use a consistent configuration, with population size set to 100, the minimumsearch space set to 90%, leaving 10% of the neighbors selected randomly.
Fig. 3 depicts the graph that had been extracted from the Activity2Processexample. Colored nodes represent the transformation’s methods, and gray nodesmark the transformation’s model elements. Boxes mark a two-level partitioningcreated by Bunch – L0 stands for the lower and more detailed level, whereasL1 partitions subsume one or more L0 partitions. For this clustering, a weightconfiguration h1, 15, 5, 15i had been used. With a sufficiently higher weight forread than for write dependencies, 15 � 1, a source-driven decomposition hadbeen performed. Therefore, mapping methods have been grouped together withtheir respective source model elements (Activity, Action, etc.) on L0. Twoof the clusters solely contain model elements and can be ignored. In the L1partition, two clusters remain: One cluster aggregates Activity2Process andAction2Stepmethods, the other cluster aggregates CompositeAction2Stepmethods. The reference to class CompositeAction may have primarily inducedthe algorithm to correctly group the respective methods together. When compar-ing the Bunch-derived L0 partition with our handmade partitioning illustratedby colors blue, red and yellow (cf. Fig. 3), we can observe that both partitionsare highly similar. Bunch, however, decided to agglomerate the red and blue L0clusters to a single L1 cluster. Developers may think about adopting Bunch’srecommendation and merge clusters Activity2Process and Action2Step.
4.3 Structural Analysis
The main objective of the approach is to gain a better understanding of the code,but also to agree on a modular decomposition that fosters understandability andthat can be used to restructure the code. To achieve this goal, in this last step, theexisting modular structure and partitions computed by the algorithm on differentparameters are compared against each other regarding their modularizationquality and structural differences. Although this is a manual step that requiresto find a compromise on two or more partitions and to refine the solution basedon expert knowledge, developers can profit from a set of metrics.
To include the legacy modular structure of the code into the assessment, anautomatized structural analysis is used that extracts this kind of information.
«package»
«package»
«package»
«write»
«write»
«write»
«read»
«read»
«read»
«read»
«read»
«read»
«call»
«call»
«call»
«call»
«call»
«call»
«write»
Motivation ⚫ ⚫ Approach ⚫ ⚫ ⚫ ⚫ ⚫ ◉ ⚪ Validation ⚪ Conclusion ⚪
Legacy Transformation Modularized
Transformation
Manual Decomposition
Control and Data
Dependencies
Dependency Analysis
ClustersCluster Analysis
mapping_mapAction2Stepclass_core_Action
package_core
class_core_BasicAction
mapping_mapActivity2Process
entry_main
class_core_Activity
class_core_StartAction mapping_mapAction2Step2
mapping_mapAction2Step3class_core_StopAction
package_process
class_process_Step
class_process_Process
helper_createProcess
package_composite
mapping_mapAction2Stepclass_composite_CompositeAction
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
Assigning Weights
9
Wwrite for write-access dependencies to classes and packages, Wread for read-access dependencies to classes and packages from one of the method’s parameters,Wcall for method call dependencies, and Wpackage for containment of classes andpackages to their directly containing package.These weights constitute a particularweight configuration, vector WC := hWwrite ,Wread ,Wcall ,Wpackagei 2 N4
0.Choosing a weight of zero naturally results in the respective type of edge
being ignored by the clustering algorithm. Choosing values Wwrite � Wread
promotes a mainly target-driven decomposition, whereas values Wwrite ⌧ Wread
enforce a mainly source-driven decomposition.
4.2 Cluster Analysis
Once dependence information has been extracted from the source files in form ofa graph, and weights have been configured accordingly, cluster analysis can beperformed on the obtained graph structure in a follow-up step.Algorithm and parameters. Bunch supports three clustering algorithms, ex-haustive search, hill climbing, and a genetic algorithm. In this paper, we useBunch’s hill climbing algorithm which appeared to produce more stable results.We use a consistent configuration, with population size set to 100, the minimumsearch space set to 90%, leaving 10% of the neighbors selected randomly.
Fig. 3 depicts the graph that had been extracted from the Activity2Processexample. Colored nodes represent the transformation’s methods, and gray nodesmark the transformation’s model elements. Boxes mark a two-level partitioningcreated by Bunch – L0 stands for the lower and more detailed level, whereasL1 partitions subsume one or more L0 partitions. For this clustering, a weightconfiguration h1, 15, 5, 15i had been used. With a sufficiently higher weight forread than for write dependencies, 15 � 1, a source-driven decomposition hadbeen performed. Therefore, mapping methods have been grouped together withtheir respective source model elements (Activity, Action, etc.) on L0. Twoof the clusters solely contain model elements and can be ignored. In the L1partition, two clusters remain: One cluster aggregates Activity2Process andAction2Stepmethods, the other cluster aggregates CompositeAction2Stepmethods. The reference to class CompositeAction may have primarily inducedthe algorithm to correctly group the respective methods together. When compar-ing the Bunch-derived L0 partition with our handmade partitioning illustratedby colors blue, red and yellow (cf. Fig. 3), we can observe that both partitionsare highly similar. Bunch, however, decided to agglomerate the red and blue L0clusters to a single L1 cluster. Developers may think about adopting Bunch’srecommendation and merge clusters Activity2Process and Action2Step.
4.3 Structural Analysis
The main objective of the approach is to gain a better understanding of the code,but also to agree on a modular decomposition that fosters understandability andthat can be used to restructure the code. To achieve this goal, in this last step, theexisting modular structure and partitions computed by the algorithm on differentparameters are compared against each other regarding their modularizationquality and structural differences. Although this is a manual step that requiresto find a compromise on two or more partitions and to refine the solution basedon expert knowledge, developers can profit from a set of metrics.
To include the legacy modular structure of the code into the assessment, anautomatized structural analysis is used that extracts this kind of information.
Wwrite for write-access dependencies to classes and packages, Wread for read-access dependencies to classes and packages from one of the method’s parameters,Wcall for method call dependencies, and Wpackage for containment of classes andpackages to their directly containing package.These weights constitute a particularweight configuration, vector WC := hWwrite ,Wread ,Wcall ,Wpackagei 2 N4
0.Choosing a weight of zero naturally results in the respective type of edge
being ignored by the clustering algorithm. Choosing values Wwrite � Wread
promotes a mainly target-driven decomposition, whereas values Wwrite ⌧ Wread
enforce a mainly source-driven decomposition.
4.2 Cluster Analysis
Once dependence information has been extracted from the source files in form ofa graph, and weights have been configured accordingly, cluster analysis can beperformed on the obtained graph structure in a follow-up step.Algorithm and parameters. Bunch supports three clustering algorithms, ex-haustive search, hill climbing, and a genetic algorithm. In this paper, we useBunch’s hill climbing algorithm which appeared to produce more stable results.We use a consistent configuration, with population size set to 100, the minimumsearch space set to 90%, leaving 10% of the neighbors selected randomly.
Fig. 3 depicts the graph that had been extracted from the Activity2Processexample. Colored nodes represent the transformation’s methods, and gray nodesmark the transformation’s model elements. Boxes mark a two-level partitioningcreated by Bunch – L0 stands for the lower and more detailed level, whereasL1 partitions subsume one or more L0 partitions. For this clustering, a weightconfiguration h1, 15, 5, 15i had been used. With a sufficiently higher weight forread than for write dependencies, 15 � 1, a source-driven decomposition hadbeen performed. Therefore, mapping methods have been grouped together withtheir respective source model elements (Activity, Action, etc.) on L0. Twoof the clusters solely contain model elements and can be ignored. In the L1partition, two clusters remain: One cluster aggregates Activity2Process andAction2Stepmethods, the other cluster aggregates CompositeAction2Stepmethods. The reference to class CompositeAction may have primarily inducedthe algorithm to correctly group the respective methods together. When compar-ing the Bunch-derived L0 partition with our handmade partitioning illustratedby colors blue, red and yellow (cf. Fig. 3), we can observe that both partitionsare highly similar. Bunch, however, decided to agglomerate the red and blue L0clusters to a single L1 cluster. Developers may think about adopting Bunch’srecommendation and merge clusters Activity2Process and Action2Step.
4.3 Structural Analysis
The main objective of the approach is to gain a better understanding of the code,but also to agree on a modular decomposition that fosters understandability andthat can be used to restructure the code. To achieve this goal, in this last step, theexisting modular structure and partitions computed by the algorithm on differentparameters are compared against each other regarding their modularizationquality and structural differences. Although this is a manual step that requiresto find a compromise on two or more partitions and to refine the solution basedon expert knowledge, developers can profit from a set of metrics.
To include the legacy modular structure of the code into the assessment, anautomatized structural analysis is used that extracts this kind of information.
Wwrite for write-access dependencies to classes and packages, Wread for read-access dependencies to classes and packages from one of the method’s parameters,Wcall for method call dependencies, and Wpackage for containment of classes andpackages to their directly containing package.These weights constitute a particularweight configuration, vector WC := hWwrite ,Wread ,Wcall ,Wpackagei 2 N4
0.Choosing a weight of zero naturally results in the respective type of edge
being ignored by the clustering algorithm. Choosing values Wwrite � Wread
promotes a mainly target-driven decomposition, whereas values Wwrite ⌧ Wread
enforce a mainly source-driven decomposition.
4.2 Cluster Analysis
Once dependence information has been extracted from the source files in form ofa graph, and weights have been configured accordingly, cluster analysis can beperformed on the obtained graph structure in a follow-up step.Algorithm and parameters. Bunch supports three clustering algorithms, ex-haustive search, hill climbing, and a genetic algorithm. In this paper, we useBunch’s hill climbing algorithm which appeared to produce more stable results.We use a consistent configuration, with population size set to 100, the minimumsearch space set to 90%, leaving 10% of the neighbors selected randomly.
Fig. 3 depicts the graph that had been extracted from the Activity2Processexample. Colored nodes represent the transformation’s methods, and gray nodesmark the transformation’s model elements. Boxes mark a two-level partitioningcreated by Bunch – L0 stands for the lower and more detailed level, whereasL1 partitions subsume one or more L0 partitions. For this clustering, a weightconfiguration h1, 15, 5, 15i had been used. With a sufficiently higher weight forread than for write dependencies, 15 � 1, a source-driven decomposition hadbeen performed. Therefore, mapping methods have been grouped together withtheir respective source model elements (Activity, Action, etc.) on L0. Twoof the clusters solely contain model elements and can be ignored. In the L1partition, two clusters remain: One cluster aggregates Activity2Process andAction2Stepmethods, the other cluster aggregates CompositeAction2Stepmethods. The reference to class CompositeAction may have primarily inducedthe algorithm to correctly group the respective methods together. When compar-ing the Bunch-derived L0 partition with our handmade partitioning illustratedby colors blue, red and yellow (cf. Fig. 3), we can observe that both partitionsare highly similar. Bunch, however, decided to agglomerate the red and blue L0clusters to a single L1 cluster. Developers may think about adopting Bunch’srecommendation and merge clusters Activity2Process and Action2Step.
4.3 Structural Analysis
The main objective of the approach is to gain a better understanding of the code,but also to agree on a modular decomposition that fosters understandability andthat can be used to restructure the code. To achieve this goal, in this last step, theexisting modular structure and partitions computed by the algorithm on differentparameters are compared against each other regarding their modularizationquality and structural differences. Although this is a manual step that requiresto find a compromise on two or more partitions and to refine the solution basedon expert knowledge, developers can profit from a set of metrics.
To include the legacy modular structure of the code into the assessment, anautomatized structural analysis is used that extracts this kind of information.
15
15
15
1
1
1
15
15
15
15
15
15
5
5
5
5
5
5
1
Motivation ⚫ ⚫ Approach ⚫ ⚫ ⚫ ⚫ ⚫ ◉ ⚪ Validation ⚪ Conclusion ⚪
Legacy Transformation Modularized
Transformation
Manual Decomposition
Control and Data
Dependencies
Dependency Analysis
ClustersCluster Analysis
mapping_mapAction2Stepclass_core_Action
package_core
class_core_BasicAction
mapping_mapActivity2Process
entry_main
class_core_Activity
class_core_StartAction mapping_mapAction2Step2
mapping_mapAction2Step3class_core_StopAction
package_process
class_process_Step
class_process_Process
helper_createProcess
package_composite
mapping_mapAction2Stepclass_composite_CompositeAction
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
Example Clustering
10
(SS-L1):mapping_mapActivity2Process
(SS-L0):mapping_mapAction2Step
mapping_mapAction2Step
class_core_Action
(SS-L0):package_core
package_core
class_core_BasicAction
(SS-L0):mapping_mapActivity2Process
mapping_mapActivity2Processentry_main
class_core_Activity
(SS-L0):mapping_mapAction2Step2
class_core_StartAction
mapping_mapAction2Step2
(SS-L0):mapping_mapAction2Step3
mapping_mapAction2Step3
class_core_StopAction
(SS-L1):helper_createProcess
(SS-L0):class_process_Step
package_process
class_process_Step
(SS-L0):helper_createProcess
class_process_Process
helper_createProcess
(SS-L0):mapping_mapAction2Step
package_composite
mapping_mapAction2Step
class_composite_CompositeAction
Motivation ⚫ ⚫ Validation ⚪ Conclusion ⚪Approach ⚫ ⚫ ⚫ ⚫ ⚫ ⚫ ◉
Legacy Transformation Modularized
Transformation
Manual Decomposition
Control and Data
Dependencies
Dependency Analysis
ClustersCluster Analysis
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
Assessing ResultsCompare derived clustering with a manual expert clusteringUsing three similarity/distance metrics
11Motivation ⚫ ⚫ Approach ⚫ ⚫ ⚫ ⚫ ⚫ ⚫ ⚫ Validation ◉ Conclusion ⚪
Legacy Transformation Modularized
Transformation
Manual Decomposition
Control and Data
Dependencies
Dependency Analysis
ClustersCluster Analysis
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
Assessing ResultsCompare derived clustering with a manual expert clusteringUsing three similarity/distance metrics
11Motivation ⚫ ⚫ Approach ⚫ ⚫ ⚫ ⚫ ⚫ ⚫ ⚫ Validation ◉ Conclusion ⚪
Legacy Transformation Modularized
Transformation
Manual Decomposition
Control and Data
Dependencies
Dependency Analysis
ClustersCluster Analysis
Table 1: Activity2Process – Manual vs. derived clusteringConfiguration Statistics Similarity to expert clustering
#
C
l
u
s
t
e
r
s
M
Q
i
n
d
e
x
P
r
e
c
i
s
i
o
n
R
e
c
a
l
l
E
d
g
e
S
i
m
M
e
C
l
Expert clustering
Derived manually 3 1.067 100% 100% 100 100%
Method-call dependencies only
Hill Climbing, WC = h0, 0, 1, 0i 2 1.214 20.00% 100% 54.54 60%
Class-level dependencies
Hill Climbing, WC = h1, 15, 5, 15i 2 1.083 33.33% 100% 72.72 85%
Modularization Quality. Quality metrics can be used for a quick estimationof the quality of a particular partition. In context of the Bunch approach, itmakes sense to observe the MQ index that Bunch uses to assess partitions whensearching for a (quasi-)optimal partition. The MQ value can be computed forboth method and model dependencies (which it has been optimized for), but alsofor method dependencies alone.
We use three similarity measures to quantify the similarity of a sampleclustering with the expert clustering, Precision/Recall, EdgeSim, and MeCl. Thelatter two had been specifically built for the software domain by Mitchell et al.,all three are supported by the Bunch tool. Other measurements that are used inother contexts include MojoFM [9] and the Koschke-Eisenbarth metric [10].Precision/Recall. Precision is calculated as the percentage of node pairs ina single cluster of a sample clustering that are also contained within a singlecluster in the authoritative clustering. Recall, on the other hand, is defined as thepercentage of node pairs within a single cluster in the authoritative clusteringthat are also node pairs within a single cluster in the sample clustering [3]. Edgesare not considered, and the metric is sensitive to number and size of clusters [11].EdgeSim. The EdgeSim similarity measure [11] calculates the normalized ratioof intra and intercluster edges present in both partitions. Nodes are ignored.MeCl. The MergeClumps (MeCl) metric is a distance measure [11]. Starting withthe largest subsets of entities that had been placed in each of the partitions intothe same clusters, a series of merge operations, needed to convert one partitioninto the other, is calculated. Both directions are considered, and the largestnumber of merge operations (in a normalized form) is taken as the MeCl distance.
We used the above measurements to compare quality and similarity of manu-ally and two automatically derived partitions in the Activity2Process example.We computed a partition based on method-level dependencies alone, and anotherpartition based on method and class-level dependencies (Tab. 1). Due to thesmall number of nodes in the input graphs, the output partition per dependencegraph produced was identical for five independent runs.
The expert clustering – the one manually done – comprises three clusters,whereas both derived clusterings comprise two. The method-level clusteringproduced the best MQ value. Despite having a slightly worse modularizationquality, the partition derived from class-level dependencies still produces an(albeit marginally) better MQ value than that of the expert clustering.
Even more importantly, for this example, all three metrics agree that model-use dependencies result in a partition more similar to the expert clustering thana partition derived from method-call dependencies alone. The still relatively low
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
Assessing ResultsCompare derived clustering with a manual expert clusteringUsing three similarity/distance metrics
11Motivation ⚫ ⚫ Approach ⚫ ⚫ ⚫ ⚫ ⚫ ⚫ ⚫ Validation ◉ Conclusion ⚪
Legacy Transformation Modularized
Transformation
Manual Decomposition
Control and Data
Dependencies
Dependency Analysis
ClustersCluster Analysis
Table 1: Activity2Process – Manual vs. derived clusteringConfiguration Statistics Similarity to expert clustering
#
C
l
u
s
t
e
r
s
M
Q
i
n
d
e
x
P
r
e
c
i
s
i
o
n
R
e
c
a
l
l
E
d
g
e
S
i
m
M
e
C
l
Expert clustering
Derived manually 3 1.067 100% 100% 100 100%
Method-call dependencies only
Hill Climbing, WC = h0, 0, 1, 0i 2 1.214 20.00% 100% 54.54 60%
Class-level dependencies
Hill Climbing, WC = h1, 15, 5, 15i 2 1.083 33.33% 100% 72.72 85%
Modularization Quality. Quality metrics can be used for a quick estimationof the quality of a particular partition. In context of the Bunch approach, itmakes sense to observe the MQ index that Bunch uses to assess partitions whensearching for a (quasi-)optimal partition. The MQ value can be computed forboth method and model dependencies (which it has been optimized for), but alsofor method dependencies alone.
We use three similarity measures to quantify the similarity of a sampleclustering with the expert clustering, Precision/Recall, EdgeSim, and MeCl. Thelatter two had been specifically built for the software domain by Mitchell et al.,all three are supported by the Bunch tool. Other measurements that are used inother contexts include MojoFM [9] and the Koschke-Eisenbarth metric [10].Precision/Recall. Precision is calculated as the percentage of node pairs ina single cluster of a sample clustering that are also contained within a singlecluster in the authoritative clustering. Recall, on the other hand, is defined as thepercentage of node pairs within a single cluster in the authoritative clusteringthat are also node pairs within a single cluster in the sample clustering [3]. Edgesare not considered, and the metric is sensitive to number and size of clusters [11].EdgeSim. The EdgeSim similarity measure [11] calculates the normalized ratioof intra and intercluster edges present in both partitions. Nodes are ignored.MeCl. The MergeClumps (MeCl) metric is a distance measure [11]. Starting withthe largest subsets of entities that had been placed in each of the partitions intothe same clusters, a series of merge operations, needed to convert one partitioninto the other, is calculated. Both directions are considered, and the largestnumber of merge operations (in a normalized form) is taken as the MeCl distance.
We used the above measurements to compare quality and similarity of manu-ally and two automatically derived partitions in the Activity2Process example.We computed a partition based on method-level dependencies alone, and anotherpartition based on method and class-level dependencies (Tab. 1). Due to thesmall number of nodes in the input graphs, the output partition per dependencegraph produced was identical for five independent runs.
The expert clustering – the one manually done – comprises three clusters,whereas both derived clusterings comprise two. The method-level clusteringproduced the best MQ value. Despite having a slightly worse modularizationquality, the partition derived from class-level dependencies still produces an(albeit marginally) better MQ value than that of the expert clustering.
Even more importantly, for this example, all three metrics agree that model-use dependencies result in a partition more similar to the expert clustering thana partition derived from method-call dependencies alone. The still relatively low
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
Assessing ResultsCompare derived clustering with a manual expert clusteringUsing three similarity/distance metrics
11Motivation ⚫ ⚫ Approach ⚫ ⚫ ⚫ ⚫ ⚫ ⚫ ⚫ Validation ◉ Conclusion ⚪
Legacy Transformation Modularized
Transformation
Manual Decomposition
Control and Data
Dependencies
Dependency Analysis
ClustersCluster Analysis
Table 1: Activity2Process – Manual vs. derived clusteringConfiguration Statistics Similarity to expert clustering
#
C
l
u
s
t
e
r
s
M
Q
i
n
d
e
x
P
r
e
c
i
s
i
o
n
R
e
c
a
l
l
E
d
g
e
S
i
m
M
e
C
l
Expert clustering
Derived manually 3 1.067 100% 100% 100 100%
Method-call dependencies only
Hill Climbing, WC = h0, 0, 1, 0i 2 1.214 20.00% 100% 54.54 60%
Class-level dependencies
Hill Climbing, WC = h1, 15, 5, 15i 2 1.083 33.33% 100% 72.72 85%
Modularization Quality. Quality metrics can be used for a quick estimationof the quality of a particular partition. In context of the Bunch approach, itmakes sense to observe the MQ index that Bunch uses to assess partitions whensearching for a (quasi-)optimal partition. The MQ value can be computed forboth method and model dependencies (which it has been optimized for), but alsofor method dependencies alone.
We use three similarity measures to quantify the similarity of a sampleclustering with the expert clustering, Precision/Recall, EdgeSim, and MeCl. Thelatter two had been specifically built for the software domain by Mitchell et al.,all three are supported by the Bunch tool. Other measurements that are used inother contexts include MojoFM [9] and the Koschke-Eisenbarth metric [10].Precision/Recall. Precision is calculated as the percentage of node pairs ina single cluster of a sample clustering that are also contained within a singlecluster in the authoritative clustering. Recall, on the other hand, is defined as thepercentage of node pairs within a single cluster in the authoritative clusteringthat are also node pairs within a single cluster in the sample clustering [3]. Edgesare not considered, and the metric is sensitive to number and size of clusters [11].EdgeSim. The EdgeSim similarity measure [11] calculates the normalized ratioof intra and intercluster edges present in both partitions. Nodes are ignored.MeCl. The MergeClumps (MeCl) metric is a distance measure [11]. Starting withthe largest subsets of entities that had been placed in each of the partitions intothe same clusters, a series of merge operations, needed to convert one partitioninto the other, is calculated. Both directions are considered, and the largestnumber of merge operations (in a normalized form) is taken as the MeCl distance.
We used the above measurements to compare quality and similarity of manu-ally and two automatically derived partitions in the Activity2Process example.We computed a partition based on method-level dependencies alone, and anotherpartition based on method and class-level dependencies (Tab. 1). Due to thesmall number of nodes in the input graphs, the output partition per dependencegraph produced was identical for five independent runs.
The expert clustering – the one manually done – comprises three clusters,whereas both derived clusterings comprise two. The method-level clusteringproduced the best MQ value. Despite having a slightly worse modularizationquality, the partition derived from class-level dependencies still produces an(albeit marginally) better MQ value than that of the expert clustering.
Even more importantly, for this example, all three metrics agree that model-use dependencies result in a partition more similar to the expert clustering thana partition derived from method-call dependencies alone. The still relatively low
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
Assessing ResultsCompare derived clustering with a manual expert clusteringUsing three similarity/distance metrics
11Motivation ⚫ ⚫ Approach ⚫ ⚫ ⚫ ⚫ ⚫ ⚫ ⚫ Validation ◉ Conclusion ⚪
Legacy Transformation Modularized
Transformation
Manual Decomposition
Control and Data
Dependencies
Dependency Analysis
ClustersCluster Analysis
Table 1: Activity2Process – Manual vs. derived clusteringConfiguration Statistics Similarity to expert clustering
#
C
l
u
s
t
e
r
s
M
Q
i
n
d
e
x
P
r
e
c
i
s
i
o
n
R
e
c
a
l
l
E
d
g
e
S
i
m
M
e
C
l
Expert clustering
Derived manually 3 1.067 100% 100% 100 100%
Method-call dependencies only
Hill Climbing, WC = h0, 0, 1, 0i 2 1.214 20.00% 100% 54.54 60%
Class-level dependencies
Hill Climbing, WC = h1, 15, 5, 15i 2 1.083 33.33% 100% 72.72 85%
Modularization Quality. Quality metrics can be used for a quick estimationof the quality of a particular partition. In context of the Bunch approach, itmakes sense to observe the MQ index that Bunch uses to assess partitions whensearching for a (quasi-)optimal partition. The MQ value can be computed forboth method and model dependencies (which it has been optimized for), but alsofor method dependencies alone.
We use three similarity measures to quantify the similarity of a sampleclustering with the expert clustering, Precision/Recall, EdgeSim, and MeCl. Thelatter two had been specifically built for the software domain by Mitchell et al.,all three are supported by the Bunch tool. Other measurements that are used inother contexts include MojoFM [9] and the Koschke-Eisenbarth metric [10].Precision/Recall. Precision is calculated as the percentage of node pairs ina single cluster of a sample clustering that are also contained within a singlecluster in the authoritative clustering. Recall, on the other hand, is defined as thepercentage of node pairs within a single cluster in the authoritative clusteringthat are also node pairs within a single cluster in the sample clustering [3]. Edgesare not considered, and the metric is sensitive to number and size of clusters [11].EdgeSim. The EdgeSim similarity measure [11] calculates the normalized ratioof intra and intercluster edges present in both partitions. Nodes are ignored.MeCl. The MergeClumps (MeCl) metric is a distance measure [11]. Starting withthe largest subsets of entities that had been placed in each of the partitions intothe same clusters, a series of merge operations, needed to convert one partitioninto the other, is calculated. Both directions are considered, and the largestnumber of merge operations (in a normalized form) is taken as the MeCl distance.
We used the above measurements to compare quality and similarity of manu-ally and two automatically derived partitions in the Activity2Process example.We computed a partition based on method-level dependencies alone, and anotherpartition based on method and class-level dependencies (Tab. 1). Due to thesmall number of nodes in the input graphs, the output partition per dependencegraph produced was identical for five independent runs.
The expert clustering – the one manually done – comprises three clusters,whereas both derived clusterings comprise two. The method-level clusteringproduced the best MQ value. Despite having a slightly worse modularizationquality, the partition derived from class-level dependencies still produces an(albeit marginally) better MQ value than that of the expert clustering.
Even more importantly, for this example, all three metrics agree that model-use dependencies result in a partition more similar to the expert clustering thana partition derived from method-call dependencies alone. The still relatively low
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
Assessing ResultsCompare derived clustering with a manual expert clusteringUsing three similarity/distance metrics
11Motivation ⚫ ⚫ Approach ⚫ ⚫ ⚫ ⚫ ⚫ ⚫ ⚫ Validation ◉ Conclusion ⚪
Legacy Transformation Modularized
Transformation
Manual Decomposition
Control and Data
Dependencies
Dependency Analysis
ClustersCluster Analysis
➡ Significantly better results with model elements considered
Table 1: Activity2Process – Manual vs. derived clusteringConfiguration Statistics Similarity to expert clustering
#
C
l
u
s
t
e
r
s
M
Q
i
n
d
e
x
P
r
e
c
i
s
i
o
n
R
e
c
a
l
l
E
d
g
e
S
i
m
M
e
C
l
Expert clustering
Derived manually 3 1.067 100% 100% 100 100%
Method-call dependencies only
Hill Climbing, WC = h0, 0, 1, 0i 2 1.214 20.00% 100% 54.54 60%
Class-level dependencies
Hill Climbing, WC = h1, 15, 5, 15i 2 1.083 33.33% 100% 72.72 85%
Modularization Quality. Quality metrics can be used for a quick estimationof the quality of a particular partition. In context of the Bunch approach, itmakes sense to observe the MQ index that Bunch uses to assess partitions whensearching for a (quasi-)optimal partition. The MQ value can be computed forboth method and model dependencies (which it has been optimized for), but alsofor method dependencies alone.
We use three similarity measures to quantify the similarity of a sampleclustering with the expert clustering, Precision/Recall, EdgeSim, and MeCl. Thelatter two had been specifically built for the software domain by Mitchell et al.,all three are supported by the Bunch tool. Other measurements that are used inother contexts include MojoFM [9] and the Koschke-Eisenbarth metric [10].Precision/Recall. Precision is calculated as the percentage of node pairs ina single cluster of a sample clustering that are also contained within a singlecluster in the authoritative clustering. Recall, on the other hand, is defined as thepercentage of node pairs within a single cluster in the authoritative clusteringthat are also node pairs within a single cluster in the sample clustering [3]. Edgesare not considered, and the metric is sensitive to number and size of clusters [11].EdgeSim. The EdgeSim similarity measure [11] calculates the normalized ratioof intra and intercluster edges present in both partitions. Nodes are ignored.MeCl. The MergeClumps (MeCl) metric is a distance measure [11]. Starting withthe largest subsets of entities that had been placed in each of the partitions intothe same clusters, a series of merge operations, needed to convert one partitioninto the other, is calculated. Both directions are considered, and the largestnumber of merge operations (in a normalized form) is taken as the MeCl distance.
We used the above measurements to compare quality and similarity of manu-ally and two automatically derived partitions in the Activity2Process example.We computed a partition based on method-level dependencies alone, and anotherpartition based on method and class-level dependencies (Tab. 1). Due to thesmall number of nodes in the input graphs, the output partition per dependencegraph produced was identical for five independent runs.
The expert clustering – the one manually done – comprises three clusters,whereas both derived clusterings comprise two. The method-level clusteringproduced the best MQ value. Despite having a slightly worse modularizationquality, the partition derived from class-level dependencies still produces an(albeit marginally) better MQ value than that of the expert clustering.
Even more importantly, for this example, all three metrics agree that model-use dependencies result in a partition more similar to the expert clustering thana partition derived from method-call dependencies alone. The still relatively low
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
ConclusionProblem: Model transformation programs are often badly structured.
12Motivation ⚫ ⚫ Approach ⚫ ⚫ ⚫ ⚫ ⚫ ⚫ ⚫ Validation ⚫ Conclusion ◉
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
ConclusionProblem: Model transformation programs are often badly structured.
12Motivation ⚫ ⚫ Approach ⚫ ⚫ ⚫ ⚫ ⚫ ⚫ ⚫ Validation ⚫ Conclusion ◉
Legacy Transformation Modularized
Transformation
Manual Decomposition
Control and Data
Dependencies
Dependency Analysis
ClustersCluster Analysis
Approach: Apply automatic clustering algorithms to do the job.
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
ConclusionProblem: Model transformation programs are often badly structured.
12Motivation ⚫ ⚫ Approach ⚫ ⚫ ⚫ ⚫ ⚫ ⚫ ⚫ Validation ⚫ Conclusion ◉
Legacy Transformation Modularized
Transformation
Manual Decomposition
Control and Data
Dependencies
Dependency Analysis
ClustersCluster Analysis
By including the model structure, we can derive clusterings that follow source and target-driven decompositional styles.Selecting ‘good’ weights requires expertise and/or experimenting.
Findings:
Approach: Apply automatic clustering algorithms to do the job.
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
ConclusionProblem: Model transformation programs are often badly structured.
12
Can we support aspectual decompositions?How do the results compare in maintenance scenarios?
Motivation ⚫ ⚫ Approach ⚫ ⚫ ⚫ ⚫ ⚫ ⚫ ⚫ Validation ⚫ Conclusion ◉
Legacy Transformation Modularized
Transformation
Manual Decomposition
Control and Data
Dependencies
Dependency Analysis
ClustersCluster Analysis
By including the model structure, we can derive clusterings that follow source and target-driven decompositional styles.Selecting ‘good’ weights requires expertise and/or experimenting.
Open challenges:
Findings:
Approach: Apply automatic clustering algorithms to do the job.
2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.
ConclusionProblem: Model transformation programs are often badly structured.
Thanks!12
Can we support aspectual decompositions?How do the results compare in maintenance scenarios?
Motivation ⚫ ⚫ Approach ⚫ ⚫ ⚫ ⚫ ⚫ ⚫ ⚫ Validation ⚫ Conclusion ◉
Legacy Transformation Modularized
Transformation
Manual Decomposition
Control and Data
Dependencies
Dependency Analysis
ClustersCluster Analysis
By including the model structure, we can derive clusterings that follow source and target-driven decompositional styles.Selecting ‘good’ weights requires expertise and/or experimenting.
Open challenges:
Findings:
Approach: Apply automatic clustering algorithms to do the job.