Remodularizing Legacy Model Transformations …msdl.cs.mcgill.ca/conferences/AMT/2014/files/AMT14...Remodularizing Legacy Model Transformations with Automatic Clustering Techniques

KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association

SOFTWARE DESIGN AND QUALITY GROUP INSTITUTE FOR PROGRAM STRUCTURES AND DATA ORGANIZATION, FACULTY OF INFORMATICS

www.kit.edu

Source: pixelio.de

sdq.ipd.kit.edu

Remodularizing Legacy Model Transformations with Automatic Clustering TechniquesAndreas Rentschler, Dominik Werle, Qais Noorshams, Lucia Happe, Ralf Reussner !3rd Workshop on the Analysis of Model Transformations Monday, September 29, 2014

2014-04-25Remodularizing Legacy Model Transformations with Automatic Clustering Techniques – Andreas Rentschler et al.

Model-driven Software Quality Prediction

Performance modelof a component-basedsoftware architecture

Performance data:Execution timeThroughputResource utilisation

Motivation ◉ ⚪ Validation ⚪ Conclusion ⚪2

Approach ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪





Motivation ◉ ⚪ Validation ⚪

Palladio Component

ModelMeasurement

DataGenerated Simulation

Code

SimuCom Simulation Framework

Transformation Run

Conclusion ⚪2

Approach ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪






Palladio Component

ModelMeasurement

DataGenerated Simulation

Code

SimuCom Simulation Framework

Transformation Run

5k lines of

codeConclusion ⚪

2Approach ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪






Palladio Component

ModelMeasurement

DataQueueing Petri Net

ModelSimulateTransformation

5k lines of

codeQVT-O

Conclusion ⚪2

Approach ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪


Problem and Overall Approach

3Motivation ⚫ ◉ Approach ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪ Validation ⚪ Conclusion ⚪

Legacy Transformation




Legacy Transformation Modularized

Transformation





Transformation

Manual Decomposition





Transformation


ClustersCluster Analysis





Transformation


Control and Data

Dependencies

Dependency Analysis





How can we support typical transformation designs?What dependence information is required?


Transformation


Control and Data

Dependencies

Dependency Analysis





How can we support typical transformation designs?What dependence information is required?


Transformation


Control and Data

Dependencies

Dependency Analysis



Design RulesWhat’s makes model transformations different from GPL programs?

Data-centric operationsData is hierarchically structuredData models extrinsically defined!

Common decompositional styles [Lawley04]:

4Motivation ⚫ ⚫ Validation ⚪ Conclusion ⚪


Transformation


Control and Data

Dependencies

Dependency Analysis


Source-driven Target-driven Aspect-driven

one-to-many mappings

many-to-one mappings,M2T templates

a mixture of both

Approach ◉ ⚪ ⚪ ⚪ ⚪ ⚪ ⚪


A Minimalistic Example Transformation



Transformation


Control and Data

Dependencies

Dependency Analysis


Approach ⚫ ◉ ⚪ ⚪ ⚪ ⚪ ⚪





Transformation


Control and Data

Dependencies

Dependency Analysis


StopAction

ActivityModel

Source Model

Activity

Action

StartAction

actions

successors

Approach ⚫ ◉ ⚪ ⚪ ⚪ ⚪ ⚪





Transformation


Control and Data

Dependencies

Dependency Analysis


StopAction

ActivityModel

Source Model

Activity

Action

StartAction

actions

successors

ProcessModel

Target Model

Process

Stepsteps

next

Approach ⚫ ◉ ⚪ ⚪ ⚪ ⚪ ⚪





Transformation


Control and Data

Dependencies

Dependency Analysis


StopAction

ActivityModel

Source Model

Activity

Action

StartAction

actions

successors

ProcessModel

Target Model

Process

Stepsteps

next

mapActivity2Process

Transformation

mapAction2Step

call

call

mapAction2Step

mapAction2Step

call

mapping

mapping

mapping

mappingin out

in

in

in

out

Approach ⚫ ◉ ⚪ ⚪ ⚪ ⚪ ⚪





Transformation


Control and Data

Dependencies

Dependency Analysis


StopAction

ActivityModel

Source Model

Activity

Action

StartAction

actions

successors

CompositeActions

CompositeAction

actions

ProcessModel

Target Model

Process

Stepsteps

next

mapActivity2Process

Transformation

mapAction2Step

call

call

mapAction2Step

mapAction2Step

call

mapping

mapping

mapping

mappingin out

in

in

in

out

Approach ⚫ ◉ ⚪ ⚪ ⚪ ⚪ ⚪





Transformation


Control and Data

Dependencies

Dependency Analysis


StopAction

ActivityModel

Source Model

Activity

Action

StartAction

actions

successors

CompositeActions

CompositeAction

actions

ProcessModel

Target Model

Process

Stepsteps

next

mapAction2Step

callmapping

call

in

outin

out

createProcesshelper

mapActivity2Process

Transformation

mapAction2Step

call

call

mapAction2Step

mapAction2Step

call

mapping

mapping

mapping

mappingin out

in

in

in

out

Approach ⚫ ◉ ⚪ ⚪ ⚪ ⚪ ⚪





Transformation


Control and Data

Dependencies

Dependency Analysis


Activity2Process

Action2Step

CompositeAction2Step

StopAction

ActivityModel

Source Model

Activity

Action

StartAction

actions

successors

CompositeActions

CompositeAction

actions

ProcessModel

Target Model

Process

Stepsteps

next

mapAction2Step

callmapping

call

in

outin

out

createProcesshelper

mapActivity2Process

Transformation

mapAction2Step

call

call

mapAction2Step

mapAction2Step

call

mapping

mapping

mapping

mappingin out

in

in

in

out

Approach ⚫ ◉ ⚪ ⚪ ⚪ ⚪ ⚪


The Bunch Clustering Approach*

6Motivation ⚫ ⚫ Approach ⚫ ⚫ ◉ ⚪ ⚪ ⚪ ⚪ Validation ⚪ Conclusion ⚪* by [Mitchell06]; alternative approaches are ARCH, ACDC, LIMBO, … [Shtern12]


Transformation


Control and Data

Dependencies

Dependency Analysis






Transformation


Control and Data

Dependencies

Dependency Analysis


115

5

15

15

515

5





Transformation


Control and Data

Dependencies

Dependency Analysis


115

5

15

15

515

5

115

5

15

15

515

5





Transformation


Control and Data

Dependencies

Dependency Analysis


Clustering methods: hill climbing, genetic, exhaustive

115

5

15

15

515

5

115

5

15

15

515

5



6

composition. It traverses the tree-like structured activity model, and each nodeembodies an own high-level concept that is mapped to target concepts.Target-driven decomposition. When objects of a particular class in the targetdomain are constructed from information distributed over instances of multipleclasses in the source domain (many-to-one mappings), a target-driven decom-position is deemed more adequate. Transformations from low-level to high-levelconcepts (synthesizing transformations) use this style.Aspect-driven decomposition. In several cases, a mixture of the two ap-plies. Aspect-driven decompositions are required whenever a single concern isdistributed over multiple concepts in both domains (many-to-many mappings). In-place transformations (i.e., transformations within a single domain) that replaceconcepts with low-level concepts often follow this style, particularly if operationsare executed per concern and affect multiple elements in the domain.

Any of these styles – and preferably also mixtures – must be supported by anautomatic decomposition analysis in order to produce meaningful results.

3 Automatic Software ClusteringThe principal objective of software clustering methodologies is to help softwareengineers in understanding and maintaining large software systems with outdatedor missing documentation and inferior structure. They do so by partitioningsystem entities – including methods, classes, and modules – into manageablesub systems. A survey on algorithms that had been used to cluster generalsoftware systems has been carried out by Shtern et al. [3]. They describe variousclasses of algorithms that can be used for this purpose, including algorithmsfrom graph-theory, constructive, hierarchical agglomerative, and optimizationalgorithms.

In this paper, we employ the Bunch tool, a clustering system that uses oneof two optimization algorithms, hill climbing or a genetic algorithm, to findnear-optimal solutions [4]. Bunch operates on a graph with weighted edges,the so-called Module Dependency Graph (MDG). Nodes represent the low-levelconcepts to be grouped into modules, and may correspond to methods and classes.As a fitness function for the optimization algorithms, Modularization Quality(MQ) is used, a metric that integrates coupling and cohesion among the clustersinto a single value. Optimization starts with a randomly created partitioning,for which neighboring partitions – with respect to atomic move operations – areexplored.

According to Mitchell et al. [4], a dependency graph is a directed graphG = (V,E) that consists of a set of vertices and edges, E ⇢ V ⇥ V . A partition(or clustering) of G into n clusters (n-partition) is then formally defined as ⇧G =Sn

i=1 Gi with Gi = (Vi, Ei), and 8v 2 V 91k 2 [1, n], v 2 Vk. Edges Ei are edgesthat leave or remain inside the partition, Ei = {hv1, v2i 2 E : v1 2 Vi ^ v2 2 V }.

The MQ value is the sum of the cluster factors CFi over all i 2 {1, . . . , k}clusters. The cluster factor of the i-th cluster is defined as the normalized ratiobetween the weight of all the edges within the cluster, intraedges µi, and thesum of weights of all edges that connect with nodes in one of the other clusters,interedges ✏i,j or ✏j,i. Penalty of interedges is equally distributed to each of theaffected clusters i and j:

MQ =kX

i=1

CFi, CFi =

8<

:

0, µi = 0µi

µi+ 12

Pkj=1j 6=i

(✏i,j+✏j,i), otherwise

Motivation ⚫ ⚫ Approach ⚫ ⚫ ◉ ⚪ ⚪ ⚪ ⚪ Validation ⚪ Conclusion ⚪* by [Mitchell06]; alternative approaches are ARCH, ACDC, LIMBO, … [Shtern12]

Modularity quality index: favor low coupling & high cohesion


Transformation


Control and Data

Dependencies

Dependency Analysis


Clustering methods: hill climbing, genetic, exhaustive

115

5

15

15

515

5

115

5

15

15

515

5


Dependence Analysis

7Motivation ⚫ ⚫ Approach ⚫ ⚫ ⚫ ◉ ⚪ ⚪ ⚪ Validation ⚪ Conclusion ⚪


Transformation


Control and Data

Dependencies

Dependency Analysis


mapping_mapAction2Stepclass_core_Action

package_core

class_core_BasicAction

mapping_mapActivity2Process

entry_main

class_core_Activity

class_core_StartAction mapping_mapAction2Step2

mapping_mapAction2Step3class_core_StopAction

package_process

class_process_Step

class_process_Process

helper_createProcess

package_composite

mapping_mapAction2Stepclass_composite_CompositeAction


Dependence Analysis

7

«package»

«package»

«package»

«write»

«write»

«write»

«read»

«read»

«read»

«read»

«read»

«read»

«call»

«call»

«call»

«call»

«call»

«call»

«write»

Motivation ⚫ ⚫ Approach ⚫ ⚫ ⚫ ◉ ⚪ ⚪ ⚪ Validation ⚪ Conclusion ⚪


Transformation


Control and Data

Dependencies

Dependency Analysis



package_core



entry_main

class_core_Activity



package_process

class_process_Step



package_composite



Weight ConfigurationDepending on the style…!

!

!

!

!

!

…initialize a configuration vector for weighting the various types of edges:

8Motivation ⚫ ⚫ Approach ⚫ ⚫ ⚫ ⚫ ◉ ⚪ ⚪ Validation ⚪ Conclusion ⚪

Wwrite for write-access dependencies to classes and packages, Wread for read-access dependencies to classes and packages from one of the method’s parameters,Wcall for method call dependencies, and Wpackage for containment of classes andpackages to their directly containing package.These weights constitute a particularweight configuration, vector WC := hWwrite ,Wread ,Wcall ,Wpackagei 2 N4

0.Choosing a weight of zero naturally results in the respective type of edge

being ignored by the clustering algorithm. Choosing values Wwrite � Wread

promotes a mainly target-driven decomposition, whereas values Wwrite ⌧ Wread

enforce a mainly source-driven decomposition.

4.2 Cluster Analysis

Once dependence information has been extracted from the source files in form ofa graph, and weights have been configured accordingly, cluster analysis can beperformed on the obtained graph structure in a follow-up step.Algorithm and parameters. Bunch supports three clustering algorithms, ex-haustive search, hill climbing, and a genetic algorithm. In this paper, we useBunch’s hill climbing algorithm which appeared to produce more stable results.We use a consistent configuration, with population size set to 100, the minimumsearch space set to 90%, leaving 10% of the neighbors selected randomly.

Fig. 3 depicts the graph that had been extracted from the Activity2Processexample. Colored nodes represent the transformation’s methods, and gray nodesmark the transformation’s model elements. Boxes mark a two-level partitioningcreated by Bunch – L0 stands for the lower and more detailed level, whereasL1 partitions subsume one or more L0 partitions. For this clustering, a weightconfiguration h1, 15, 5, 15i had been used. With a sufficiently higher weight forread than for write dependencies, 15 � 1, a source-driven decomposition hadbeen performed. Therefore, mapping methods have been grouped together withtheir respective source model elements (Activity, Action, etc.) on L0. Twoof the clusters solely contain model elements and can be ignored. In the L1partition, two clusters remain: One cluster aggregates Activity2Process andAction2Stepmethods, the other cluster aggregates CompositeAction2Stepmethods. The reference to class CompositeAction may have primarily inducedthe algorithm to correctly group the respective methods together. When compar-ing the Bunch-derived L0 partition with our handmade partitioning illustratedby colors blue, red and yellow (cf. Fig. 3), we can observe that both partitionsare highly similar. Bunch, however, decided to agglomerate the red and blue L0clusters to a single L1 cluster. Developers may think about adopting Bunch’srecommendation and merge clusters Activity2Process and Action2Step.

4.3 Structural Analysis

The main objective of the approach is to gain a better understanding of the code,but also to agree on a modular decomposition that fosters understandability andthat can be used to restructure the code. To achieve this goal, in this last step, theexisting modular structure and partitions computed by the algorithm on differentparameters are compared against each other regarding their modularizationquality and structural differences. Although this is a manual step that requiresto find a compromise on two or more partitions and to refine the solution basedon expert knowledge, developers can profit from a set of metrics.

To include the legacy modular structure of the code into the assessment, anautomatized structural analysis is used that extracts this kind of information.

Method creates/updates model elementMethod reads model element

Method invokes another methodModel element is contained in package


Transformation


Control and Data

Dependencies

Dependency Analysis


Source-driven Target-driven Aspect-driven


Assigning Weights

9


































«package»

«package»

«package»

«write»

«write»

«write»

«read»

«read»

«read»

«read»

«read»

«read»

«call»

«call»

«call»

«call»

«call»

«call»

«write»

Motivation ⚫ ⚫ Approach ⚫ ⚫ ⚫ ⚫ ⚫ ◉ ⚪ Validation ⚪ Conclusion ⚪


Transformation


Control and Data

Dependencies

Dependency Analysis



package_core



entry_main

class_core_Activity



package_process

class_process_Step



package_composite



Assigning Weights

9


































«package»

«package»

«package»

«write»

«write»

«write»

«read»

«read»

«read»

«read»

«read»

«read»

«call»

«call»

«call»

«call»

«call»

«call»

«write»



Transformation


Control and Data

Dependencies

Dependency Analysis



package_core



entry_main

class_core_Activity



package_process

class_process_Step



package_composite



Assigning Weights

9


































15

15

15

1

1

1

15

15

15

15

15

15

5

5

5

5

5

5

1



Transformation


Control and Data

Dependencies

Dependency Analysis



package_core



entry_main

class_core_Activity



package_process

class_process_Step



package_composite



Example Clustering

10

(SS-L1):mapping_mapActivity2Process

(SS-L0):mapping_mapAction2Step

mapping_mapAction2Step

class_core_Action

(SS-L0):package_core

package_core


(SS-L0):mapping_mapActivity2Process

mapping_mapActivity2Processentry_main

class_core_Activity

(SS-L0):mapping_mapAction2Step2

class_core_StartAction

mapping_mapAction2Step2

(SS-L0):mapping_mapAction2Step3

mapping_mapAction2Step3

class_core_StopAction

(SS-L1):helper_createProcess

(SS-L0):class_process_Step

package_process

class_process_Step

(SS-L0):helper_createProcess



(SS-L0):mapping_mapAction2Step

package_composite

mapping_mapAction2Step

class_composite_CompositeAction

Motivation ⚫ ⚫ Validation ⚪ Conclusion ⚪Approach ⚫ ⚫ ⚫ ⚫ ⚫ ⚫ ◉


Transformation


Control and Data

Dependencies

Dependency Analysis



Assessing ResultsCompare derived clustering with a manual expert clusteringUsing three similarity/distance metrics

11Motivation ⚫ ⚫ Approach ⚫ ⚫ ⚫ ⚫ ⚫ ⚫ ⚫ Validation ◉ Conclusion ⚪


Transformation


Control and Data

Dependencies

Dependency Analysis






Transformation


Control and Data

Dependencies

Dependency Analysis


Table 1: Activity2Process – Manual vs. derived clusteringConfiguration Statistics Similarity to expert clustering

#

C

l

u

s

t

e

r

s

M

Q

i

n

d

e

x

P

r

e

c

i

s

i

o

n

R

e

c

a

l

l

E

d

g

e

S

i

m

M

e

C

l

Expert clustering

Derived manually 3 1.067 100% 100% 100 100%

Method-call dependencies only

Hill Climbing, WC = h0, 0, 1, 0i 2 1.214 20.00% 100% 54.54 60%

Class-level dependencies


Modularization Quality. Quality metrics can be used for a quick estimationof the quality of a particular partition. In context of the Bunch approach, itmakes sense to observe the MQ index that Bunch uses to assess partitions whensearching for a (quasi-)optimal partition. The MQ value can be computed forboth method and model dependencies (which it has been optimized for), but alsofor method dependencies alone.

We use three similarity measures to quantify the similarity of a sampleclustering with the expert clustering, Precision/Recall, EdgeSim, and MeCl. Thelatter two had been specifically built for the software domain by Mitchell et al.,all three are supported by the Bunch tool. Other measurements that are used inother contexts include MojoFM [9] and the Koschke-Eisenbarth metric [10].Precision/Recall. Precision is calculated as the percentage of node pairs ina single cluster of a sample clustering that are also contained within a singlecluster in the authoritative clustering. Recall, on the other hand, is defined as thepercentage of node pairs within a single cluster in the authoritative clusteringthat are also node pairs within a single cluster in the sample clustering [3]. Edgesare not considered, and the metric is sensitive to number and size of clusters [11].EdgeSim. The EdgeSim similarity measure [11] calculates the normalized ratioof intra and intercluster edges present in both partitions. Nodes are ignored.MeCl. The MergeClumps (MeCl) metric is a distance measure [11]. Starting withthe largest subsets of entities that had been placed in each of the partitions intothe same clusters, a series of merge operations, needed to convert one partitioninto the other, is calculated. Both directions are considered, and the largestnumber of merge operations (in a normalized form) is taken as the MeCl distance.

We used the above measurements to compare quality and similarity of manu-ally and two automatically derived partitions in the Activity2Process example.We computed a partition based on method-level dependencies alone, and anotherpartition based on method and class-level dependencies (Tab. 1). Due to thesmall number of nodes in the input graphs, the output partition per dependencegraph produced was identical for five independent runs.

The expert clustering – the one manually done – comprises three clusters,whereas both derived clusterings comprise two. The method-level clusteringproduced the best MQ value. Despite having a slightly worse modularizationquality, the partition derived from class-level dependencies still produces an(albeit marginally) better MQ value than that of the expert clustering.

Even more importantly, for this example, all three metrics agree that model-use dependencies result in a partition more similar to the expert clustering thana partition derived from method-call dependencies alone. The still relatively low





Transformation


Control and Data

Dependencies

Dependency Analysis



#

C

l

u

s

t

e

r

s

M

Q

i

n

d

e

x

P

r

e

c

i

s

i

o

n

R

e

c

a

l

l

E

d

g

e

S

i

m

M

e

C

l

Expert clustering















Transformation


Control and Data

Dependencies

Dependency Analysis



#

C

l

u

s

t

e

r

s

M

Q

i

n

d

e

x

P

r

e

c

i

s

i

o

n

R

e

c

a

l

l

E

d

g

e

S

i

m

M

e

C

l

Expert clustering















Transformation


Control and Data

Dependencies

Dependency Analysis



#

C

l

u

s

t

e

r

s

M

Q

i

n

d

e

x

P

r

e

c

i

s

i

o

n

R

e

c

a

l

l

E

d

g

e

S

i

m

M

e

C

l

Expert clustering















Transformation


Control and Data

Dependencies

Dependency Analysis


➡ Significantly better results with model elements considered


#

C

l

u

s

t

e

r

s

M

Q

i

n

d

e

x

P

r

e

c

i

s

i

o

n

R

e

c

a

l

l

E

d

g

e

S

i

m

M

e

C

l

Expert clustering












ConclusionProblem: Model transformation programs are often badly structured.

12Motivation ⚫ ⚫ Approach ⚫ ⚫ ⚫ ⚫ ⚫ ⚫ ⚫ Validation ⚫ Conclusion ◉





Transformation


Control and Data

Dependencies

Dependency Analysis


Approach: Apply automatic clustering algorithms to do the job.





Transformation


Control and Data

Dependencies

Dependency Analysis


By including the model structure, we can derive clusterings that follow source and target-driven decompositional styles.Selecting ‘good’ weights requires expertise and/or experimenting.

Findings:




12

Can we support aspectual decompositions?How do the results compare in maintenance scenarios?

Motivation ⚫ ⚫ Approach ⚫ ⚫ ⚫ ⚫ ⚫ ⚫ ⚫ Validation ⚫ Conclusion ◉


Transformation


Control and Data

Dependencies

Dependency Analysis



Open challenges:

Findings:




Thanks!12

Can we support aspectual decompositions?How do the results compare in maintenance scenarios?

Motivation ⚫ ⚫ Approach ⚫ ⚫ ⚫ ⚫ ⚫ ⚫ ⚫ Validation ⚫ Conclusion ◉


Transformation


Control and Data

Dependencies

Dependency Analysis



Open challenges:

Findings:


Documents

Remodularizing Legacy Model Transformations …msdl.cs.mcgill.ca/conferences/AMT/2014/files/AMT14...Remodularizing Legacy Model Transformations with Automatic Clustering Techniques