박사학위논문
Ph. D. Dissertation
객체지향 소프트웨어의 유지보수성 향상을 위한
리팩토링 식별 및 선택 방법에 대한 연구
Identification and Selection of Refactorings
for Improving Maintainability of Object-Oriented Software
한아림 (韓雅琳 Han, Ah-Rim)
전산학과
Department of Computer Science
KAIST
2013
DCS
20075196
한아림. Han, Ah-Rim. Identification and Selection of Refactorings for Improving Maintainability of Object-Oriented Software. 객체지향 소프트웨어의 유지보수성 향상을 위한 리팩토링 식별 및 선택 방법에 대한 연구. Department of Computer Science. 2013. 69p. Advisor Prof. Bae, Doo-Hwan. Text in English.
ABSTRACT
Object-oriented software undergoes continuous changes through various maintenance activities. Due to these
changes, the design quality of the software degrades over time. Refactoring can restructure the design of
object-oriented software without altering its external behavior, thereby improving maintainability, which in turn
reduces maintenance costs and shortens time-to-market.
In this thesis, we provide methods for supporting systematic refactoring identification: refactoring candidate
identification and refactoring selection. For identifying refactoring candidates, we use top-down and bottom-up
approaches. First, for the top-down approach, a traditional way of finding refactoring opportunities by using
heuristic rules to eliminate violations of design principles in object-oriented software systems, we establish
rules to identify refactoring candidates with the aim of reducing the dependencies of entities (methods and
classes). In establishing the rules, we are motivated by studies showing that dynamic information, i.e., how the
system is utilized, is an important factor for estimating changes. Therefore, to perform refactorings that
effectively improve maintainability, entities are found based on how users utilize the software (e.g., user
scenarios and operational profiles), and refactoring candidates are identified within these entities. Second,
for the bottom-up approach, which finds refactoring opportunities without pre-defined patterns or rules, we
develop a method for grouping entities (methods and attributes) by using the concept of the maximal independent
set from graph theory. When grouping entities, we take into account a new dependency between refactorings, the
refactoring effect dependency on maintainability, in addition to the syntactic dependency between refactorings.
The entities involved in each maximal independent set are mapped into a group of elementary refactorings, and
these refactorings can be applied at the same time. For selecting the refactorings to be applied, we provide a
method for selecting multiple refactorings that supports assessment and impact analysis of elementary
refactorings. We develop a refactoring effect evaluation framework, based on matrix computation, for assessing
the effect of each elementary refactoring on maintainability. Using this evaluation framework, we select the
group of refactorings, containing multiple elementary refactorings, that best improves maintainability.
We evaluate our proposed approach on three open-source projects: jEdit, Columba, and JGIT. From the
experimental results, we conclude that dynamic information helps to identify refactorings that efficiently
improve maintainability, because it helps to extract refactoring candidates in frequently changed classes.
Furthermore, refactoring identification using multiple refactorings selects refactorings that lead the software
design to higher fitness function values (i.e., better maintainability improvement) at smaller cost (i.e., less
search space exploration and shorter time). In addition, the refactoring effect dependency must be considered to
correctly select the group of refactorings that most improves maintainability.
Contents
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
Chapter 1. Introduction 1
Chapter 2. Related work 5
2.1 Refactoring Candidate Identification Using Static Metric Based Heuristic
Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Determining Refactoring Sequences to be Applied . . . . . . . . . . . . . . 6
2.3 Analysis of Dependencies/Conflicts between Refactoring Candidates . . . . 7
Chapter 3. Overview of Our Approach 8
Chapter 4. Identification of Refactoring Candidates 11
4.1 Top-Down Approach: Extracted with Heuristic Rules . . . . . . . . . . . . 11
4.1.1 Dynamic Information-based Identification of Refactoring Candidates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4.1.2 Refactoring Candidate Extraction Rules . . . . . . . . . . . . . . . 15
4.2 Bottom-Up Approach: Grouping Entities into Maximal Independent Sets . 21
4.2.1 Refactoring Effect Dependency (RED) on Maintainability . . . . . 21
4.2.2 Algorithm of RED-aware Grouping: Maximal Independent Set (MIS) Calculation . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Chapter 5. Selection of Refactorings to be Applied 27
5.1 Calculation for Each Elementary Refactoring’s Effect on Maintainability: Refactoring Effect Evaluation Framework . . . . . . . . . . . . . . . . 29
5.2 Refactoring Impact Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 31
5.3 Selection of Multiple Refactorings . . . . . . . . . . . . . . . . . . . . . . . 31
Chapter 6. Tool Implementation 35
Chapter 7. Evaluation 37
7.1 Research Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
7.2 Subjects and Data Processing . . . . . . . . . . . . . . . . . . . . . . . . . . 37
7.3 Evaluation Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
7.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
7.4.1 Dynamic Information-based Identification of Refactoring Candidates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
7.4.2 RED-aware Grouping of Multiple Elementary Refactorings . . . . 52
7.5 Threats to Validity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
Chapter 8. Discussion 59
Chapter 9. Conclusion and Future Work 61
9.1 Summary of Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
9.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
References 63
Appendix 69
Chapter A. Behavioral Dependency Measure as a Good Indicator for Change-Proneness Prediction [39] 70
A.1 Behavioral Dependency Measure . . . . . . . . . . . . . . . . . . . . . . . . 70
A.2 Procedure for Behavior Dependency Measure Measurement . . . . . . . . 74
A.3 Change-Proneness Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . 79
A.4 Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Summary (in Korean) 89
List of Tables
3.1 The three phases identified by referencing the refactoring process in [65]. . . . . . . . . . . . . . 8
4.1 Difference between the static method call (SMC) and the dynamic method call (DMC). . . . . . . 14
7.1 Characteristics and development history for each subject. . . . . . . . . . . . . . . . . . . . . . . 38
7.2 Measures of each subject. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
7.3 Examined range of revisions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
7.4 Fitness function values of the original design. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
7.5 Indicators of cost-effective refactorings: (1) Percentage of reduction for propagated changes, (2)
Rate of reduction for propagated changes for jEdit, Columba, and JGIT. . . . . . . . . . . . . . . 49
7.6 Commonly found classes between the real changed classes and the extracted classes as refactoring
candidates for each approach using static information and approach using dynamic information. . 51
7.7 Top k ranking distance measures (K: Kendall’s tau; F: Spearman’s footrule [27]) between the real
changed classes and the extracted classes as refactoring candidates for each approach using static
information and approach using dynamic information. . . . . . . . . . . . . . . . . . . . . . . . . 51
7.8 Results of the effect of multiple refactorings. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
7.9 Results of the effect of the RED. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
A.1 RPS(oi,oj), which is a set of all the reachable paths for each pair of objects (Row: oi, Column:
oj) in Fig. A.5(b). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
A.2 SumWRP (ci,cj), which is the sum of the weighted reachable paths for each pair of classes (Row:
ci, Column: cj) and the BDM (ci) value of each class in Fig. A.3(b). . . . . . . . . . . . . . . . . 79
A.3 The number of ground facts regarding 13 subsequent versions of JFlex (versions 1.3 to 1.4.3). . . 80
A.4 The analysis-of-variance (ANOVA) table that includes the results for each clustering variable . . . 83
A.5 Descriptive statistics with respect to the four attributes for Group 1 (294 instances). . . . . . . . . 83
A.6 Descriptive statistics with respect to the four attributes for Group 2 (84 instances). . . . . . . . . . 84
A.7 Prediction model using existing metrics in Group 1. . . . . . . . . . . . . . . . . . . . . . . . . . 85
A.8 Prediction model using existing metrics and BDM in Group 1. . . . . . . . . . . . . . . . . . . . 86
A.9 Group 2 prediction model (the result is the same whether or not the BDM is used, because the BDM
is not included in the model). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
List of Figures
1.1 Cost distribution in software lifecycle [91]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
3.1 A proposed framework for systematic refactoring identification. . . . . . . . . . . . . . . . . . . 9
3.2 Overall procedure of our approach. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
4.1 Overall approach of the top-down approach: extracted with the dynamic information based heuristic rules. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4.2 Procedure for dynamic profiling. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
4.3 A metamodel of the AOM used in this thesis. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4.4 Overall approach of the bottom-up approach: grouping entities into MISs. . . . . . . . . . . . . . 21
4.5 A motivating example of showing the need of the RED on maintainability. . . . . . . . . . . . . . 22
4.6 A sequential algorithm for calculating a MIS, referenced from the lecture notes of Costas
Busch [17]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.7 The algorithm for calculating MISs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
5.1 Overall approach of the selection of multiple refactorings. . . . . . . . . . . . . . . . . . . . . . . 27
5.2 The algorithm for deriving a delta table. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
5.3 Example of calculating the delta table (D) of Fig. 4.5(c) for the design of Fig. 4.5(a). . . . . . . . 33
5.4 Matrices required to calculate the delta table (Fig. 4.5(c)). . . . . . . . . . . . . . . . . . . . . . 34
6.1 Overall tool architecture. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
6.2 A snapshot of tool operation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
7.1 Change distribution graph (X-axis: # of changes occurring in each class; Y-axis: # of corresponding classes) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
7.2 Maintainability evaluation function for producing fitness value. . . . . . . . . . . . . . . . . . . 45
7.3 Change simulation for jEdit. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
7.4 Change simulation for Columba. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
7.5 Change simulation for JGIT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
7.6 The effect of multiple refactorings. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
7.7 The effect of multiple refactorings on the number of iterations. . . . . . . . . . . . . . . . . . . . 58
A.1 Examples of Sequence Diagrams (SDs) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
A.2 An example of using inheritance relationships and polymorphism: (a) A class diagram representing
classes and their relationships. (b) SDs representing the behaviors of objects that are
instantiated from the classes in A.2(a). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
A.3 (a) An example of the Interaction Overview Diagram (IOD). (b) An example of the class diagram
that has classes from which the objects, in the SDs in Fig. A.1 are instantiated. . . . . . . . . . . . 72
A.4 Overview of our approach of BDM measurement . . . . . . . . . . . . . . . . . . . . . . . . . . 74
A.5 (a) OBDMA and OBDMB correspond to SD sd A and SD sd B in Fig. A.1. (b) The obtained
OSBDM by synthesizing the two OBDMs in (a). . . . . . . . . . . . . . . . . . . . . . . . . . . 76
A.6 An SD that was reverse-engineered from source codes of JFlex version 1.3. . . . . . . . . . . . . 82
A.7 The validation procedure followed in this case study . . . . . . . . . . . . . . . . . . . . . . . . 82
Chapter 1. Introduction
Object-oriented software undergoes continuous changes with various maintenance activities such as addition
of new functionalities, correction of bugs, improvement of performance, and adaptation to new environments.
Several empirical studies [62, 67, 90] have shown that the cost of maintenance and rework is the biggest portion
of the total software life cycle cost. In [91], the authors conducted a case study analyzing software development
cost data collected from three major European companies. Figure 1.1 shows the cost distribution across
the software life cycle based on a projected total production time of five years. Only those application systems for
which complete cost data was collected were included. For a total production time of five years, non-recurring
costs amount on average to 21% of the total life cycle costs. Therefore, 79% of the total costs are
recurring costs (i.e., maintenance and rework costs). This implies that to reduce the cost of software development,
it is important to reduce the maintenance cost.
The main factor affecting the maintenance cost is the design quality of the software [63, 75, 31]. However, since
changes often take place without consideration of the software’s overall structure and design rationale due to
time constraints, the design quality of the software may degrade over time. This phenomenon is known as software
aging [72] or software decay [29]. Refactoring can restructure the design of object-oriented software
without altering its external behavior [29] to improve maintainability (or to make the software accommodate changes
more easily), which in turn reduces maintenance costs and shortens time-to-market.
Much of the existing research on automated refactoring focuses on refactoring application [78, 25, 80, 51,
9], that is, applying refactorings to actual source code. Several studies have attempted to support refactoring
identification. For instance, to support each activity of the refactoring process: (1) algorithms have been developed
to find refactoring candidates, with opportunities for applying design patterns [55, 21, 50], removing code clones [56,
45, 93, 46], and improving code qualities such as testability [42] as well as maintainability. (2) For evaluating the
design of the refactored code, design quality evaluation models such as QMOOD [56] and the Maintainability Index
(MI) [69], or special metrics such as historical volatility [89], are used. Distance measures [83, 88] and weighted
sums of metrics [82] have also been used as evaluation functions (i.e., fitness functions); Pareto optimality has
been used to compare different fitness functions and to combine results from different fitness functions [43]. (3)
Methods for scheduling refactoring candidates have also been studied [58, 64] to achieve the greatest
maintainability improvement. In an attempt to automate refactoring, the literature has proposed methods of
refactoring identification using several search techniques. O’Keeffe et al. [68] treat object-oriented design as
an optimization problem and employ search techniques such as multiple-ascent hill climbing, simulated annealing,
and genetic algorithms to automate the refactoring process. However, they do not indicate where to apply which
refactorings, because the extraction of refactoring candidates depends on random choices. In another aspect of
refactoring automation, Steimann et al. [85] propose the concept of a framework for
Figure 1.1: Cost distribution in software lifecycle [91].
specifying refactorings in an ad-hoc fashion. They argue that, in practice, concrete refactoring needs may deviate
from what has been distilled as a named refactoring, and that support for mapping these needs to a series of such
refactorings is worth developing. However, we still lack systematic approaches, clear guides, and automated tool
support for identifying where to apply which refactorings, and in what order.
Goal and Approach
In this thesis, we provide methods for supporting systematic refactoring identification: identification of
refactoring candidates and selection of refactorings to be applied. Each iteration of the refactoring
identification process produces the multiple elementary refactorings that most improve maintainability. The
process is iterated until no more groups of refactoring candidates that improve maintainability are found.
For the identification of refactoring candidates, we attempt top-down and bottom-up approaches. First, for the
top-down approach, a traditional way of finding refactoring opportunities by using heuristic rules to eliminate
violations of design principles (e.g., removing bad smells) in object-oriented software systems, we establish
rules to extract refactoring candidates with the aim of reducing the dependencies of entities, i.e., methods and
classes. In establishing the rules, we are motivated by our previous studies [39] (see Appendix A), which have
shown that data capturing how the system is utilized is an important factor for estimating changes: (1) program
usage data recorded from Integrated Development Environments (IDEs) significantly improves the accuracy of
change prediction approaches [77, 76], and (2) dynamic coupling measures [3] and behavioral dependency
measures [39], which are obtainable during run-time execution and pinpoint the parts of the system that are often
used, are good indicators for predicting change-prone classes. As a result, we argue that if changes are more
prone to occur in the pieces of code that users utilize more often, then investing effort in refactorings
involving such code may effectively improve maintainability. Therefore, entities are identified based on how
users utilize the software (e.g., user scenarios and operational profiles [66]), and refactoring candidates are
identified within these entities. Second, for the bottom-up approach, which identifies refactoring opportunities
without human insight [85], we develop a method for grouping entities (i.e., methods and attributes) into maximal
independent sets (MISs), a concept from graph theory. Different from previous approaches to refactoring candidate
identification, we group elementary refactorings (i.e., Move Method refactorings) without pre-defined patterns.
The methods involved in each MIS are transformed into a group of elementary refactorings (in this thesis, Move
Method refactorings); each group contains elementary refactorings that can be applied at the same time. We then
select the group of refactorings containing the multiple elementary refactorings that best improves
maintainability. When calculating MISs, we take into account a new dependency between refactorings, called the
refactoring effect dependency (RED). The RED is essential to consider when selecting refactorings even when the
refactorings are not syntactically dependent. (Syntactic dependency between refactorings means that the
application of one refactoring changes or deletes elements necessary for other refactorings, thus disabling
them.)
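The grouping step above can be sketched as follows. This is a minimal illustration, not the thesis's actual algorithm (Fig. 4.6): it assumes a conflict graph whose nodes are candidate refactorings and whose edges connect candidates that are syntactically dependent or RED-related; how that graph is built from the code is not shown, and the candidate names are hypothetical.

```python
def maximal_independent_set(nodes, edges):
    """Greedily compute one maximal independent set of a conflict graph.

    nodes: iterable of refactoring-candidate ids (hashable)
    edges: set of frozensets {u, v}, meaning u and v conflict
           (syntactic dependency or refactoring effect dependency, RED)
    """
    neighbors = {n: set() for n in nodes}
    for e in edges:
        u, v = tuple(e)
        neighbors[u].add(v)
        neighbors[v].add(u)

    mis = set()
    remaining = set(nodes)
    while remaining:
        n = remaining.pop()        # pick any remaining candidate
        mis.add(n)                 # keep it in the independent set
        remaining -= neighbors[n]  # its conflicting neighbors are excluded
    return mis

# Hypothetical candidates: r1 conflicts with r2; r3 is independent.
candidates = ["r1", "r2", "r3"]
conflicts = {frozenset({"r1", "r2"})}
group = maximal_independent_set(candidates, conflicts)
# Every pair in `group` is conflict-free, so the whole group can be
# applied at the same time.
```

Because no two members of the returned set share an edge, applying them together cannot trigger the syntactic or RED conflicts encoded in the graph, which is exactly why MIS membership is used as the grouping criterion.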
For selecting the refactorings to be applied, we provide a method of selecting refactorings that supports
assessment and impact analysis of elementary refactorings. We have developed a refactoring effect evaluation
framework (i.e., the delta table). Each cell of the delta table indicates the delta of maintainability after the
application of one elementary refactoring to the current design configuration. This delta table serves as the
refactoring selection criterion. Matrix computation is used to calculate the effect of each elementary
refactoring on maintainability; because matrix computation is fast, it provides an efficient way to derive the
delta table. Importantly, this elementary-level computation enables the selection of multiple refactorings. Note
that previous studies based on exhaustive and greedy refactoring selection suffer, respectively, from excessive
search space exploration cost, due to the many possible sequences to be evaluated, and from the inefficiency of
selecting just one best refactoring per iteration of the refactoring identification process. Furthermore, this
method can be extended to various other types of refactorings, because a big refactoring (e.g., Collapse
Hierarchy refactoring) is composed of elementary refactorings (e.g., Move Method refactorings). The refactoring
selection procedure consists of several activities: (1) calculating the effect of each elementary refactoring on
maintainability; (2) checking whether there are duplicated elementary refactorings (i.e., syntactic dependencies)
or REDs among the refactorings identified as candidates by the top-down and bottom-up approaches; (3) finding the
multiple (elementary) refactorings, applicable at the same time, that most improve maintainability; and (4)
identifying the refactorings impacted by applying the selected refactorings and recalculating the changed values
for those impacted refactorings.
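As a concrete sketch of the selection criterion, the delta table can be viewed as a map from each elementary refactoring to its estimated maintainability delta. The names, numbers, and scoring function below are hypothetical stand-ins for the matrix-based evaluation framework of Chapter 5, shown only to make the selection step tangible.

```python
# Hypothetical delta table: each Move Method refactoring (method, target
# class) mapped to its estimated maintainability delta, as would be
# produced by the matrix-based evaluation framework (not reproduced here).
delta_table = {
    ("m1", "ClassB"): 0.12,
    ("m2", "ClassC"): 0.05,
    ("m3", "ClassB"): -0.02,   # this move would hurt maintainability
}

def select_group(delta_table, mis_groups):
    """Pick the conflict-free group whose member refactorings together
    yield the largest total maintainability improvement."""
    def gain(group):
        return sum(delta_table[r] for r in group)
    best = max(mis_groups, key=gain)
    return best, gain(best)

# Two hypothetical conflict-free groups produced by the MIS grouping step.
groups = [
    [("m1", "ClassB"), ("m2", "ClassC")],   # total delta 0.17
    [("m3", "ClassB")],                     # total delta -0.02
]
best_group, best_gain = select_group(delta_table, groups)
```

After the chosen group is applied, only the deltas of the impacted refactorings would be recalculated (activity 4 above), rather than rebuilding the whole table.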
We evaluate our proposed approach on three open-source projects: jEdit [49], Columba [20], and JGIT [52].
From the experimental results, we conclude that dynamic information helps to identify refactorings that
efficiently improve maintainability, because it helps to extract refactoring candidates in frequently changed
classes. Considering dynamic information in addition to static information therefore provides more opportunities
for identifying refactorings that efficiently improve maintainability, owing to the refactoring candidates that
are uniquely identified by the approach using dynamic information. Furthermore, the experimental results show
that refactoring identification using multiple refactorings selects refactorings that lead the software design to
higher fitness function values (i.e., better maintainability improvement) at smaller cost (i.e., less search
space exploration and shorter time). In addition, the RED should be considered when selecting multiple
refactorings.
Organization
The thesis is organized as follows. Chapter 2 contains a discussion of related studies. Chapter 3 explains
the framework for systematic refactoring identification and the overview of our proposed approach for refactoring
identification. Chapter 4 explains the methods of identification of refactoring candidates in terms of the top-
down and bottom-up approaches. Then, Chapter 5 explains the detailed procedure and the method of selection
of refactorings to be applied. Chapter 6 covers the implemented tool for applying our proposed approach. In
Chapter 7 and Chapter 8, we present the experiments performed to evaluate the proposed approach and discuss
the obtained results, respectively. Finally, we conclude and discuss future research in Chapter 9.
Chapter 2. Related work
In the following sections, we present representative studies of refactoring identification; we divide the related
work into categories according to how each compares with our approach.
2.1 Refactoring Candidate Identification Using Static Metric Based Heuristic Rules
Several studies have attempted to support the identification of refactorings using metrics as a means of
detecting refactoring candidates or evaluating refactoring effects [24, 83, 54, 86, 92]. Tahvildari et al. [86]
propose a metric-based method for detecting design flaws and analyzing the impact of chosen meta-pattern
transformations on maintainability. They detect design flaws based on pre-defined quality design heuristics using
object-oriented complexity, coupling, and cohesion metrics. However, the authors do not provide a systematic
approach for applying the given meta-pattern transformations; they offer neither clear rules for detecting design
flaws nor a method for applying meta-pattern transformations. This process still requires much human
interpretation and judgment. Moreover, the effect of a given meta-pattern transformation on the object-oriented
metrics is evaluated only as positive or negative. Since a quantitative method for evaluating the effect of
meta-pattern transformations is not available, the approach cannot determine which of multiple potential
meta-pattern transformations should be applied first. Du Bois et al. [24] provide a table analyzing the impact on
cohesion and coupling metrics of certain refactorings that redistribute responsibilities either within a class or
between classes. As in Tahvildari’s work [86], the authors specify the impact of refactorings as ranges from best
to worst case, classified as positive (i.e., improvement), negative (i.e., deterioration), or zero (i.e.,
neutral); this work also lacks a means of quantitative refactoring-effect evaluation, which is essential for
deciding which refactorings should be applied first. Simon et al. [83] provide a software-visualization approach
using a distance-based cohesion metric to support developers in choosing appropriate refactorings; parts with
lower distances are cohesive, whereas parts with higher distances are less cohesive. However, decisions about
which refactorings should be performed and how to apply them still depend heavily on developers, as the authors
admit in presuming that the developer is the final authority in identifying and applying refactorings. In the
above-mentioned studies, the metrics are obtained from statically profiled information from source code, in other
words, without executing the program, which might suggest refactorings on parts of the software that are not
actually in use. Furthermore, as pointed out above, these studies provide neither exact algorithms guiding where
to apply which refactorings nor a quantitative evaluation method, both of which are essential for selecting
better refactorings.
Research has also looked at providing tool support and a systematic methodology to assist developers in deciding
where to apply which refactoring. Tsantalis et al. [88] propose a methodology and constructed
a tool for identifying Move Method refactoring opportunities that resolve Feature Envy bad smells. They extract a
list of behavior-preserving refactorings based on a distance-based measure that employs the notion of distance
between system entities (i.e., methods and attributes) and classes. This concept of distance for measuring lack
of cohesion is also used in [83]. The authors also define an Entity Placement metric, likewise based on the
concept of distance, as a means of quantitative refactoring-effect evaluation. However, their experiments show
the performance of refactoring opportunities only by measuring the effect of refactored designs on coupling and
cohesion metrics, together with some qualitative analysis.
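The distance notion used in [83, 88] can be illustrated with a simple Jaccard-style measure between the set of entities a method accesses and the entity set of a class. The entity sets below are invented for illustration, and the actual definitions in the cited work differ in detail.

```python
def distance(entity_set_a, entity_set_b):
    """Jaccard-style distance: 1 - |A ∩ B| / |A ∪ B|.
    A lower distance means the two entity sets are more cohesive."""
    union = entity_set_a | entity_set_b
    if not union:
        return 0.0
    return 1.0 - len(entity_set_a & entity_set_b) / len(union)

# Hypothetical data: entities (attributes) accessed by method m, and the
# entity sets of its own class A and of a candidate target class B.
m_accesses = {"a1", "a2", "b1", "b2", "b3"}
class_a = {"a1", "a2"}
class_b = {"b1", "b2", "b3", "b4"}

d_a = distance(m_accesses, class_a)  # distance to its own class
d_b = distance(m_accesses, class_b)  # distance to the candidate class
# d_b < d_a: m is "closer" to B than to its own class, a Feature Envy
# signal suggesting a Move Method refactoring of m into B.
```

A tool in the spirit of [88] would scan every method/class pair for such inversions and propose the corresponding Move Method refactorings, subject to behavior-preservation preconditions.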
2.2 Determining Refactoring Sequences to be Applied
Selection Strategy: Exhaustive or Greedy
It is difficult to schedule refactorings while considering refactoring dependencies and newly created refactoring
candidates. Theoretically, when the number of available refactoring candidates is m, the number of refactorings
to select is n, and there are no repetitions of refactoring candidates, the number of refactoring schedules that
need to be examined is the number of n-permutations of m, i.e., m!/(m-n)!. As the number of refactoring
candidates increases, the number of possible refactoring schedules grows exponentially. Therefore, scheduling
refactorings by investigating all possible orders quickly becomes computationally intractable.
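The growth of m!/(m-n)! can be checked directly; the candidate-pool sizes below are arbitrary examples.

```python
from math import factorial

def schedules(m, n):
    """Number of ordered refactoring schedules: n-permutations of m,
    i.e., m!/(m-n)!."""
    return factorial(m) // factorial(m - n)

# Even modest candidate pools explode combinatorially: an ordered
# sequence of 5 out of 20 candidates already yields
# 20!/15! = 1,860,480 schedules to examine.
count = schedules(20, 5)
```

This factorial blow-up is what motivates both the greedy stepwise strategies discussed next and our group-based selection.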
To select refactoring candidates, the studies in [88, 28, 38] select, in a stepwise (i.e., greedy) way, the
single refactoring that best improves the current design of the software after extracting and assessing the
refactoring candidates. The selected refactoring is then applied, and the refactoring identification process is
iterated until no more refactorings that can improve maintainability are found. Finally, the sequence of
refactorings is generated by logging the refactoring selected in each iteration. This selection method has the
advantage of taking changes to the system into account; thus, newly created refactoring candidates can be
considered. In addition, by extracting and assessing refactoring candidates again after applying the selected
refactoring, refactoring dependencies need not be considered. However, it is inefficient to select just the
single best refactoring in each iteration: the costs of extracting and assessing refactoring candidates (e.g.,
search space exploration and computation costs) are high.
To address this limitation, we provide an automated method for selecting multiple refactorings that can be
applied at the same time.
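The stepwise (greedy) process described above can be sketched as follows; the candidate names and deltas are hypothetical, and in the real process the candidates would be re-extracted and re-assessed in every iteration rather than drawn from a fixed table.

```python
def greedy_schedule(candidates, delta):
    """Stepwise selection in the style of [88, 28, 38]: in each
    iteration, apply only the single refactoring with the largest
    positive maintainability delta."""
    schedule = []
    remaining = list(candidates)
    while remaining:
        best = max(remaining, key=lambda r: delta[r])
        if delta[best] <= 0:   # no further improvement possible
            break
        schedule.append(best)
        remaining.remove(best)
    return schedule

deltas = {"r1": 0.10, "r2": 0.07, "r3": -0.01}
sequence = greedy_schedule(["r1", "r2", "r3"], deltas)
# Each element of `sequence` costs one full extract-and-assess
# iteration; selecting a whole conflict-free group per iteration,
# as we propose, amortizes that cost over several refactorings.
```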
Search Space Reduction
Methods for narrowing the refactoring sequences to those that are semantically sound, and for avoiding sequences
leading to the same results, have also been studied. Piveta et al. [73] propose an approach that narrows the
number of refactoring sequences by discarding those that do not make sense semantically and avoiding those that
lead to the same results. They also provide a detailed example of the approach for sequences of method
manipulations, showing how the number of sequences can be significantly reduced.
Refactoring Scheduling: Finding an Optimal Sequence of Refactoring Applications using Search Techniques
Studies of search-based refactoring try to find an optimal sequence of refactoring applications using search
techniques. Lee et al. [56] and Seng et al. [82] use genetic algorithms to produce a sequence of refactorings
whose application leads to an optimal system in terms of the employed fitness function. However, Seng et al. [82]
do not take into account that the application of a refactoring may create new refactoring candidates not
originally present in the initial system. Moreover, some of the refactoring sequences produced by the
search-based approach may not be feasible to apply because of dependencies among refactoring candidates: applying
one refactoring may conflict with the application of others. The scheduling approaches using the Genetic
Algorithm (GA) do not perform feasibility checking of the generated refactoring sequences and would therefore
require the additional cost of repairing them. Lee et al. [56] try to resolve the refactoring-conflict problem by
repairing infeasible sequences of refactorings, but it seems time-consuming to reorder a randomly generated
sequence of refactorings without considering refactoring conflicts. As a consequence, to account for newly
generated, deleted, and changed refactoring candidates, refactoring scheduling studies using search techniques
would require a large search exploration cost, due to the many possible refactoring sequences to be evaluated.
Therefore, in our approach, we attempt to select a group of refactorings that can be applied at the same time.
2.3 Analysis of Dependencies/Conflicts between Refactoring Candidates
When selecting refactorings, as mentioned briefly in the previous section, the syntactic dependencies (i.e.,
conflicts) of refactorings should be considered. A syntactic dependency of refactorings means that the application
of one refactoring changes or deletes elements necessary for other refactorings, thus disabling those refactorings.
Therefore, many research efforts have been invested in resolving conflicts among the extracted refactorings and
in applying as many refactorings as possible to improve the maintainability of software.
Mens et al. [64] represent refactorings as graph transformations, and they propose the techniques of
critical pair analysis and sequential dependency analysis to detect the dependencies between refactorings. The
results of this analysis can help the developer make an informed decision about which refactoring is most
suitable in a given context and why. In a very similar manner, Qayum et al. [74] represent the system as a
graph model and refactoring steps as graph transformation rules. Then, dependency information (which is derived
from the analysis of the graph transformations) is used for expressing the problem as an instance of an optimisation
problem. Zibran et al. [93] note the importance of considering the syntactic dependencies of refactorings. They
argue that the application of a subset of refactorings from a set of applicable refactoring activities may have a
distinguishable impact on the overall code quality; moreover, there may be sequential dependencies and conflicts
among the refactoring activities. Hence, they insist that, from all refactoring candidates, a subset of
non-conflicting refactoring activities must be selected and ordered (for application) such that the quality of the code
base is maximized while the required effort is minimized.
Chapter 3. Overview of Our Approach
Framework for Systematic Refactoring Identification
According to [65], the refactoring process consists of the following distinct activities.
1. Identify places where the software should be refactored.
2. Determine which refactoring(s) should be applied.
3. Guarantee that the applied refactoring preserves behavior.
4. Apply the refactoring.
5. Assess the effect of the refactoring on quality characteristics of the software.
6. Maintain the consistency between the refactored program code and other software artifacts such as docu-
mentation, design documents, and test cases, etc.
Based on this process, we categorize the activities of the refactoring process into three phases (Table 3.1).
Table 3.1: Identified three phases by referencing the refactoring process in [65].

Phase                        Description
Refactoring Identification   Determining where to apply which refactorings in what order
Refactoring Application      Actual modification of the source code
Refactoring Maintenance      Testing the refactored code, checking consistency with other
                             software artifacts, and change management
The "refactoring-identification phase" refers to planning that determines where to apply which refactorings, or how
to apply the refactorings to meet the goal of refactoring, such as improving maintainability, understandability,
and testability. The "refactoring-application phase" refers to the task of applying the planned refactorings
to the actual source code. The "refactoring-maintenance phase" refers to three activities: testing the refactored code,
checking consistency with other software artifacts such as requirement documents or Unified Modeling Language
(UML) models, and change management. Refactoring is one kind of code change; therefore, the change
management activity needs tasks for recording, for each applied refactoring, the change logs and the change
owners who are responsible for making those changes.
In the thesis, we focus on refactoring identification. To enable automated refactoring, we propose a framework
for systematic refactoring identification, comprising the activities of refactoring candidate identification,
refactoring candidate assessment, and refactoring selection (as in Fig. 3.1). In Chapter 4 and Chapter 5, we explain the
detailed methods of the proposed approach for refactoring identification.

Figure 3.1: A proposed framework for systematic refactoring identification. (Refactoring candidates are extracted
from the source code, assessed against fitness functions, and the selected refactorings are applied.)
Overall Procedure of Our Approach
Fig. 3.2 shows an overview of our proposed approach for refactoring identification. The following briefly
describes the procedure of the proposed refactoring identification method. The input of refactoring identification
is the source code of a program. Refactoring identification consists of two main activities: refactoring-candidate
identification and refactoring selection. In the refactoring-candidate identification activity, refactoring candidates
are (1) extracted using the dynamic information based rules, and (2) grouped into the MISs considering the RED.
For the refactoring selection activity, we have developed the refactoring effect evaluation framework (i.e., the delta
table) for assessing the effect of each elementary refactoring on maintainability based on matrix computation. Each
group of refactorings is sorted in order of the expected degree of improvement in maintainability by using the
delta table. The group of refactorings containing the multiple refactorings that best improve maintainability is
selected and applied; the delta table is then recalculated. The refactoring selection process (i.e., the procedure
of selecting multiple refactorings) is iterated until no more refactorings that can improve maintainability are
found. The output is the groups of elementary refactorings, which are the logged results obtained from each
refactoring selection process. The next section explains the detailed procedure and methods of the refactoring-
candidate identification and refactoring selection.
Figure 3.2: Overall procedure of our approach. (I. Refactoring candidate identification: extracting with dynamic
information based rules, and RED-aware grouping into maximal independent sets. II. Refactoring selection:
assessing the effect of refactorings with the refactoring effect evaluation framework, selecting multiple
refactorings, and applying them; the process stops when no improvement is found. RED: Refactoring Effect
Dependency.)
Chapter 4. Identification of Refactoring Candidates
4.1 Top-Down Approach: Extracted with Heuristic Rules
Figure 4.1: Overall approach of the top-down approach: extracted with the dynamic information based heuristic
rules. (An operational profile or user scenarios and the object-oriented source code provide dynamic and static
dependencies, which are saved in the profiled model; the refactoring candidate extraction rules then extract
Collapse Class Hierarchy and Move Method refactoring candidates.)
Fig. 4.1 illustrates the procedure of the top-down approach, in which refactoring candidates are extracted with the
dynamic information based heuristic rules.
4.1.1 Dynamic Information-based Identification of Refactoring Candidates
Refactoring candidates are extracted with the aim of reducing the dependencies of the entities of methods and classes,
since the goal of refactoring in our approach is to make software accommodate changes more easily. Motivated
by the studies showing that data capturing how the system is utilized is an important factor for estimating changes,
we use dynamic information about how the users utilize the software to identify the entities involved in given
user scenarios and operational profiles [66]; within these entities, refactoring candidates are extracted. To obtain
this dynamic information, we use a dynamic profiling technique that captures the dynamic dependencies of entities,
Dynamic Method Calls (DMCs), by executing programs according to user scenarios
or operational profiles; and we specifically designed the profiled model, the Abstract Object Model (AOM), for saving
the dynamically and statically profiled information.
Change-Preventing Design Problems and Their Resolving Refactorings
Dependency refers to a relationship in which the structure or behavior of an entity depends on another
entity [2]. UML defines dependency as a relationship in which a change to the influencing modeling element may
affect the dependent modeling element [70]. Object-oriented software involves structural and behavioral aspects
of dependencies [32]. Relationships between classes such as association, aggregation, composition, and inheritance
represent structural dependencies. A behavioral dependency occurs when a method calls another method (i.e., when
a method requires a service from another method to execute its own behavior); in this case, the methods, or
the owner classes of those methods, have behavioral dependencies. In our approach, we consider the behavioral
dependency that occurs due to a method call. Note that structural and behavioral dependencies are not mutually
exclusive; an entity can have both structural and behavioral dependencies on another entity [32]. For example, a
"use" type of association relationship (a structural dependency) entails a behavioral dependency. High dependency
between entities makes software change-sensitive in that many classes must be modified to make a single change
to the system (e.g., Shotgun Surgery [29]), or a single class is modified by many different types of changes (e.g.,
Divergent Change [29]). This makes software difficult to maintain and thereby lowers the overall maintainability
level. Such situations should be resolved. Therefore, refactorings should be applied in a
way that reduces the dependencies of entities (i.e., methods and classes), so that the software accommodates changes
more easily.
Fowler [29] suggests several refactorings for resolving the change-preventing bad smells of Divergent Change
and Shotgun Surgery: Inline Class (i.e., merging classes; in our approach, Collapse Class Hierarchy), Move
Method, Move Field, and Extract Class, among others. Among the mentioned refactorings, we currently support
two: Collapse Class Hierarchy and Move Method. In a Collapse Class Hierarchy refactoring, all methods and
fields contained in a class are moved into another class; subsequently, the emptied class is deleted. In a Move
Method refactoring, a method is moved into a target class. We do not consider the Move Field refactoring
(moving attributes, i.e., fields, from one class to another), because fields have a stronger conceptual binding to
the classes in which they are initially placed; they are less likely than methods to change once assigned to a
class [88]. For the Extract Class refactoring, our rule-based approach has difficulty determining in an automated
way the specific code blocks to be split; therefore, we leave this refactoring for future work.
Use of Dynamic Information to Find Refactoring Candidates in Change-Prone Parts
Previous studies have shown that data capturing how the system is utilized is an important factor for
estimating changes. Robbes et al. [77] show that using program usage data recorded from Integrated Development
Environments (IDEs) significantly improves the overall accuracy of change prediction approaches. The
experimental results of another study [3] and of our study [39] show that dynamic coupling measures and behavioral
dependency measures, which are obtainable during run-time execution and pinpoint the parts of a system that are
often used, are good indicators for predicting change-prone classes.
Being motivated by these works, we argue that if changes are more prone to occur in the pieces
of code that users utilize more often, then applying refactorings in these parts would quickly improve the
maintainability of the software. The underlying assumption is that the pieces of code that have been used more are
more likely to undergo changes in a future version; therefore, investing effort in the refactorings involving such
code may effectively improve maintainability. By using only static information (i.e., information that can be
obtained by analyzing source code statically, without running the program) such as the structural complexity of
the program, refactorings may be suggested on rarely used or, even worse, never-used entities. If changes never
occur in such entities, then the benefit of applying those refactorings (for example, reduced maintenance cost for
accommodating the changes) may be little to none. In this case, refactorings need to be applied to other entities.
Dependencies of Entities Based on Dynamic Method Calls (DMCs)
This section presents the procedure of the dynamic profiling technique used in our approach and provides the
definition and measurement of the DMC. Finally, the profiled model used in our approach is explained.
Figure 4.2: Procedure for dynamic profiling. (Java source code is compiled into byte code; at class loading, an
instrumentation driver inserts logging calls at the entry and exit of every method; executing the instrumented
byte code on the Java VM produces enter/exit logs from which the dynamic method calls are traced.)
Dynamic profiling. Dynamic profiling is a form of dynamic program analysis that measures, for example, the
use of memory, the use of particular instructions, or the frequency or duration of method calls; it is achieved by
instrumenting either the program source code or its binary executable form. Dynamic profiling has been used
by many researchers [47, 16, 84]. The most common use of dynamically profiled information is to aid program
optimization; for example, compiler writers use it to find out how well their instruction scheduling or branch
prediction algorithm performs. In our approach, the dynamic profiling technique is used to obtain the dependencies
of the entities of methods and classes by executing programs the same way as in live operation, based on user scenarios
or operational profiles, which are quantitative representations of how the software will be used. Note that in software
reliability engineering, user scenarios or operational profiles [66] are developed and maintained to describe how
users actually utilize the system, in order to make reliability estimations. These dependencies are obtained by logging the
frequency of method calls; those dynamic dependencies are defined as DMCs.
Fig. 4.2 depicts the procedure of dynamic profiling used in this thesis. The Java instrumentation technique
[23] is used for dynamic profiling. In the compiled byte code, logging calls are inserted
at the start and the end of every method declaration. This enables tracing the logs of executed methods while
executing a program, without modifying the original source code.
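Conceptually, the inserted logging behaves as sketched below; the hand-written `enter`/`exit` hooks and method bodies are illustrative, since real instrumentation rewrites the byte code at class loading rather than editing the source.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Sketch of the effect of the inserted enter/exit logging: each method call is
// recorded, and caller->callee frequencies (the DMCs) are derived from the log.
// Names are illustrative; real instrumentation rewrites byte code at class loading.
public class CallLogger {
    static final Deque<String> stack = new ArrayDeque<>();        // currently executing methods
    static final Map<String, Integer> dmcCount = new HashMap<>(); // "caller->callee" -> frequency

    static void enter(String method) {
        String caller = stack.peek();
        if (caller != null) dmcCount.merge(caller + "->" + method, 1, Integer::sum);
        stack.push(method);
    }

    static void exit(String method) { stack.pop(); }

    // The instrumented methods behave as if they were written like this:
    static void methodA() { enter("methodA"); methodB(); methodB(); exit("methodA"); }
    static void methodB() { enter("methodB"); exit("methodB"); }
}
```

Running `CallLogger.methodA()` yields the log-derived frequency `methodA->methodB = 2`, which is exactly the per-pair call count from which the DMCs below are built.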
Table 4.1: Difference between the static method call (SMC) and the dynamic method call (DMC).

                         SMC                          DMC
Source of measurement    Source codes or              Program execution according to
                         structural models            user scenarios or operational profiles
Subject of measurement   Class and method             Object and message
Degree of measurement    Number of distinct methods   Number of all messages
Dynamic Method Call (DMC). A DMC is an instantiated form of a static method call (SMC). The differences
between the DMC and the SMC are explained in Table 4.1. Definition 1 offers a precise definition of the DMC.
Definition 1 (Definition of DMC). When an object o1 sends a message n to an object o2, there exists a DMC. We
denote this relation as o1 →n o2. In the definition, a DMC is represented as dmc and consists of six attributes as
follows:
- id: a unique identifier.
- mcaller: the method from which the message n is initiated; the method that calls mcallee.
- mcallee: the method called from the method mcaller.
- ccallType: the calling type class; a structural callee class.
- ccaller: the owner class of method mcaller.
- ccallee: the owner class of method mcallee.
As listed, id refers to a unique identifier of the dmc. The dmc can be defined by its two end methods,
mcaller and mcallee, and its two end classes, ccaller and ccallee, which are the owner classes of those methods.
The ccallType is a calling type class that denotes a structural callee class. The DMCs existing in the system can
be retrieved with respect to two parameters: (1) the entity (ε), such as a method or class; and (2) the direction (δ),
import or export. The DMCs of a class or method in the import direction occur when the class or method
imports services from external classes; in other words, the class or method uses methods that are defined
in external classes. Conversely, the DMCs of a class or method in the export direction occur when the
class or method exports services to external classes; in other words, methods defined in external classes
use the class or method. We specify the import and export directions using the symbols ≺ and ≻,
respectively. We denote by DMC(ε, δ) the list of DMCs that are retrieved with respect to the entity ε and the
direction δ.
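A minimal sketch of the DMC structure and the DMC(ε, δ) query at the class level follows; the field and class names are illustrative simplifications of the six attributes above.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a DMC with its six attributes and of the DMC(entity, direction) query
// at the class level. Field and method names are illustrative.
public class DmcQuery {
    public enum Direction { IMPORT, EXPORT } // import: ≺, export: ≻

    public static class Dmc {
        final int id;
        final String mCaller, mCallee;  // the two end methods
        final String cCaller, cCallee;  // the owner classes of those methods
        final String cCallType;         // structural (declared) callee class
        Dmc(int id, String mCaller, String mCallee,
            String cCaller, String cCallee, String cCallType) {
            this.id = id; this.mCaller = mCaller; this.mCallee = mCallee;
            this.cCaller = cCaller; this.cCallee = cCallee; this.cCallType = cCallType;
        }
    }

    /** DMC(c, δ): import lists calls where c uses another class; export, where c is used. */
    public static List<Dmc> of(List<Dmc> all, String c, Direction d) {
        List<Dmc> out = new ArrayList<>();
        for (Dmc x : all) {
            boolean external = !x.cCaller.equals(x.cCallee); // ignore intra-class calls
            if (d == Direction.IMPORT && external && x.cCaller.equals(c)) out.add(x);
            if (d == Direction.EXPORT && external && x.cCallee.equals(c)) out.add(x);
        }
        return out;
    }
}
```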
Figure 4.3: A metamodel of the AOM used in this thesis. (An AOMClass owns AOMMethods and AOMFields;
an AOMMethod is associated with DynamicMethodCall and StaticMethodCall through caller and callee ends;
each DynamicMethodCall corresponds to exactly one StaticMethodCall, whereas a StaticMethodCall may have
0..* DynamicMethodCalls.)
Abstract Object Model (AOM). The AOM is the profiled model; it is specifically designed for saving the
dynamically and statically profiled information. Fig. 4.3 shows the metamodel of the AOM. The method meta-class
AOMMethod is associated with the DMC meta-class DynamicMethodCall and the SMC meta-class StaticMethod-
Call. This enables a DMC/SMC to be navigated through its two ends: a caller method and a callee method. In the
opposite direction, a method is able to navigate the DMCs/SMCs that the method calls and the DMCs/SMCs
by which the method is referred to. The DMC meta-class DynamicMethodCall is also associated with the SMC
meta-class StaticMethodCall. Even if multiple (or zero) method calls occur between two entities (such as methods
or classes) during run-time execution, the SMC counts this as one. In other words, the multiplicity from the SMC
to the DMC is 0..*, whereas the multiplicity from the DMC to the SMC is one. This enables the DMC to navigate
to the SMC from which the DMC is instantiated. In the opposite direction, the SMC is able to navigate to the DMCs
that are actually instantiated. By maintaining the metamodel of the AOM, information related to the DMCs can be
updated without re-doing dynamic profiling at every trial of refactoring. During dynamic profiling, the DMCs are
mapped to the corresponding SMCs from which those DMCs are instantiated. Therefore, for each application of
refactoring, by adjusting the information related to the SMCs, such as the classes and fields at the caller and
callee ends of the SMCs, the updated information related to the DMCs can be obtained by tracing the
information related to the SMCs.
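The association between the SMC and its instantiated DMCs can be sketched as follows; the classes are simplified stand-ins for the AOM meta-classes, keeping only the ends needed to show the update-by-tracing idea.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the SMC/DMC association in the AOM: each DMC is mapped to exactly one
// SMC, so after applying a refactoring only the SMC ends are adjusted, and the
// updated DMC information is obtained by tracing the SMC (no re-profiling needed).
public class AomSketch {
    public static class StaticMethodCall {
        String callerClass, calleeClass;                             // the two (simplified) ends
        final List<DynamicMethodCall> instances = new ArrayList<>(); // multiplicity 0..*
        StaticMethodCall(String caller, String callee) {
            this.callerClass = caller; this.calleeClass = callee;
        }
    }

    public static class DynamicMethodCall {
        final StaticMethodCall smc;                      // multiplicity exactly 1
        DynamicMethodCall(StaticMethodCall smc) {
            this.smc = smc;
            smc.instances.add(this);
        }
        String calleeClass() { return smc.calleeClass; } // traced through the SMC
    }
}
```

Updating the single SMC end once propagates to every DMC instantiated from it, which is precisely why re-profiling is unnecessary after each refactoring trial.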
4.1.2 Refactoring Candidate Extraction Rules
Based on the DMCs, rules are defined for extracting refactoring candidates. By trying every refactoring-
candidate extraction rule, pairs of entities (i.e., methods and classes) are extracted as refactoring candidates
according to a heuristic design strategy, which is defined with the aim of reducing the dependencies of those entities;
then, using the max function, the refactoring candidates that are highly ranked by the scoring function
are chosen to be assessed. The heuristic design strategies used in our approach are explained in subsection 4.1.2.
Elements of Refactoring-Candidate Extraction Rule
Refactoring candidate extraction rules specify where to refactor and which refactoring to use. Each rule
consists of three elements: (1) the scoring function, (2) the max function, and (3) the specific corresponding
refactoring to apply:
Scoring function. A scoring function is a kind of fitness function. It represents how well each pair of entities,
extracted as a refactoring candidate according to a heuristic design strategy, fits that heuristic design
strategy. Accordingly, a scoring function is designed to retrieve how many times a pair of entities is extracted as a
refactoring candidate under the heuristic design strategy.
Max function. It is infeasible to assess all the refactoring candidates extracted by all the defined rules, because
there are too many. Note that to assess a refactoring candidate, it has to be individually
applied to the current version of the program, and its effect on the refactored program must be evaluated, which requires
a large computation cost. Therefore, we assess only the refactoring candidates that are highly ranked (top k) by the
scoring functions. The role of the max function is to cut off the top k refactoring candidates to be assessed. In a
rule, the cutline number represents k.
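A minimal sketch of the max function follows, assuming the scoring function has been materialized as a score map; the generic signature is an illustrative choice.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Sketch of the max function: given a scoring function (here a score map built by
// an extraction rule), keep only the top-k (cutline) candidates for assessment.
public class MaxFunction {
    public static <T> List<T> max(Map<T, Integer> score, int cutline) {
        List<Map.Entry<T, Integer>> entries = new ArrayList<>(score.entrySet());
        // sort candidates by descending score
        entries.sort(Map.Entry.<T, Integer>comparingByValue(Comparator.reverseOrder()));
        List<T> top = new ArrayList<>();
        for (int i = 0; i < Math.min(cutline, entries.size()); i++)
            top.add(entries.get(i).getKey());
        return top;
    }
}
```

Only the returned top-k candidates are then individually applied and evaluated, which bounds the per-iteration assessment cost.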
Refactoring. As stated in subsection 4.1.1, we use two types of refactorings, Move Method and Collapse Class
Hierarchy; the operations of these refactorings are presented in Algorithm 1 and Algorithm 2, respectively.
We formulate pre- and post-conditions by referring to [88, 36, 71] and check them before and after refactoring
applications. We do not specify these conditions in this thesis.
Algorithm 1 Collapse Class Hierarchy
Require: Cmerging: the class that merges the other class
Require: Cmerged: the class that is to be merged
  for all Mmerged ∈ Cmerged do
    Move Method(Cmerging, Mmerged)
  Cmerging.ancestor ← Cmerging.ancestor ∪ Cmerged.ancestor
  Cmerging.descendant ← Cmerging.descendant ∪ Cmerged.descendant
  Cmerging.field ← Cmerging.field ∪ Cmerged.field
  remove Cmerged
Algorithm 2 Move Method
Require: C: the target class to which a method is moved
Require: M: the method to be moved
  M.overridingMethod ← findingOverriding(C, M)
  /* The findingOverriding function is specified in Algorithm 3. */
  C.method ← C.method ∪ {M}
Algorithm 3 findingOverriding
Require: c: the target class to which a method is moved
Require: m: the moving method
  /* findingOverriding returns the method by which the moving method m is overridden. */
  queue ← ancestor classes of c
  visitedClass ← ∅
  while queue ≠ ∅ do
    tmpClass ← remove one class from queue
    add tmpClass to visitedClass
    for all tmpMethod ∈ methods in tmpClass do
      if tmpMethod = m then
        return tmpMethod
    for all ancClass ∈ ancestor classes of tmpClass do
      if visitedClass does not contain ancClass then
        add ancClass to queue
  return null
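A runnable sketch of Algorithm 3 follows; the `Clazz` model is a simplification in which methods are compared by name rather than by full signature.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Runnable sketch of Algorithm 3: breadth-first search over the ancestor classes of
// the target class for a method matching the moving method. The Clazz model is a
// simplification (methods are compared by name only, not full signatures).
public class FindOverriding {
    public static class Clazz {
        final List<Clazz> ancestors = new ArrayList<>();
        final Set<String> methods = new HashSet<>();
    }

    public static String findOverriding(Clazz target, String movingMethod) {
        Deque<Clazz> queue = new ArrayDeque<>(target.ancestors);
        Set<Clazz> visited = new HashSet<>();
        while (!queue.isEmpty()) {
            Clazz tmp = queue.remove();
            visited.add(tmp);
            if (tmp.methods.contains(movingMethod))
                return movingMethod;                 // overridden by an ancestor's method
            for (Clazz anc : tmp.ancestors)
                if (!visited.contains(anc)) queue.add(anc); // avoid revisiting shared ancestors
        }
        return null;                                 // no overriding method found
    }
}
```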
Design of Refactoring-Candidate Extraction Rule
For each design strategy, pairs of methods (or classes) are extracted as the entities of the refactoring candidates
of Move Method (or Collapse Class Hierarchy), and the number of extractions for each pair of methods (or classes)
is recorded by the scoring function. The rules are defined in the following way: the refactoring candidates
that are highly ranked by the scoring function are chosen to be assessed using the max function. Note that for
each strategy, two types of scoring functions, method-level and class-level, are obtained, and three rules are defined.
In the following, for each type of heuristic design strategy, we present a brief explanation and the procedure
for obtaining the corresponding scoring functions. We then define the refactoring-candidate extraction rules using
the obtained scoring functions in a semi-formal way.
[Heuristic design strategy type 1.]
Explanation. It is better to gather methods that are called by one method but are spread over many different
classes into one class. Let a method m call methods that are implemented in different classes. N is the
threshold for determining the situation in which those called methods are implemented in many different
classes. Therefore, we define the following heuristic design strategies: when the called methods are
implemented in N (N = 2, 3, 4, 5, and 6) classes, those methods (or their owner classes) are extracted
as the entities of refactoring candidates of Move Method (or Collapse Class Hierarchy). In this thesis, we
set N from 2 to 6 because we have tested all methods in all three subjects, jEdit, Columba, and JGIT, and
the maximum number of different classes in each subject does not exceed 6. Note that N = 1 need not be
examined because it means all the called methods are in the same class. N is not fixed and can be varied
according to the characteristics of the project at hand.
Algorithm 4 get NDiff_M and NDiff_C (N = 2, 3, 4, 5, 6)
  for all c ∈ classes in the system, m ∈ methods in c do
    diffClass ← ∅ /* a set for saving different callee classes */
    for all dmc ∈ DMC(m, ≺) do
      if dmc.ccallee ≠ c then
        add dmc.ccallee to diffClass
    if diffClass.size ≥ N then
      for all dmc1, dmc2 ∈ DMC(m, ≺) do
        if dmc1 ≠ dmc2 then
          c1 ← dmc1.ccallee, c2 ← dmc2.ccallee
          if c1 ≠ c2 && c1 ≠ c && c2 ≠ c then
            NDiff_C({c1, c2}) ← NDiff_C({c1, c2}) + 1
          m1 ← dmc1.mcallee, m2 ← dmc2.mcallee
          if m1 ≠ m2 && m1 ≠ m && m2 ≠ m then
            NDiff_M({m1, m2}) ← NDiff_M({m1, m2}) + 1
Procedure. Algorithm 4 obtains the scoring functions as follows. For every class c in the system and every
method m in class c, let the method m call methods that are implemented in different classes. If the number
of different classes is greater than or equal to N (the threshold for determining the situation in which methods
are implemented in many different classes), then each pair of methods in the list of called methods is
extracted as the entities of a Move Method refactoring candidate, and the number of extractions for the pair of
methods is incremented in the scoring function NDiff_M (when the methods in the pair are neither identical
to each other nor identical to the method m). This also applies at the class level; each pair of classes in the
list of the owner classes of those called methods is extracted as the entities of a Collapse Class Hierarchy
refactoring candidate, and the number of extractions for the pair of classes is incremented in the scoring
function NDiff_C (again, when the classes in the pair are neither identical to each other nor identical to the
class c).
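The core of the NDiff_M scoring can be sketched as follows; for brevity, this sketch omits the exclusions involving the caller method and its owner class, and represents each call as a {callee method, callee class} pair.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified sketch of the NDiff_M scoring function: for each caller method whose
// callees are spread over at least N different classes, every pair of distinct
// called methods has its score incremented. (The exclusions involving the caller
// itself and its owner class are omitted here for brevity.)
public class NDiffScoring {
    /** calls: callerMethod -> list of {calleeMethod, calleeClass} pairs. */
    public static Map<Set<String>, Integer> nDiffM(Map<String, List<String[]>> calls, int n) {
        Map<Set<String>, Integer> score = new HashMap<>();
        for (List<String[]> callees : calls.values()) {
            Set<String> diffClasses = new HashSet<>();
            for (String[] c : callees) diffClasses.add(c[1]);
            if (diffClasses.size() < n) continue;    // callees not spread widely enough
            for (int i = 0; i < callees.size(); i++)
                for (int j = i + 1; j < callees.size(); j++) {
                    String m1 = callees.get(i)[0], m2 = callees.get(j)[0];
                    if (!m1.equals(m2))
                        score.merge(Set.of(m1, m2), 1, Integer::sum);
                }
        }
        return score;
    }
}
```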
Rules. Since N stands for 2, 3, 4, 5, and 6, five design strategies are defined for this type of heuristic design
strategy; thus, a total of 15 rules are defined.
• R1(N=2), R4(N=3), R7(N=4), R10(N=5), R13(N=6):
  ∀(ci, cj) ∈ max(NDiff_C, cutline) → Collapse Class Hierarchy(ci, cj)
• R2(N=2), R5(N=3), R8(N=4), R11(N=5), R14(N=6):
  ∀(mi, mj) ∈ max(NDiff_M, cutline) → Move Method(owner class of mi, mj)
• R3(N=2), R6(N=3), R9(N=4), R12(N=5), R15(N=6):
  ∀(mi, mj) ∈ max(NDiff_M, cutline) → Move Method(owner class of mj, mi)
Algorithm 5 get I_C and I_M
  for all c ∈ classes in the system, m ∈ methods in c do
    if m ≠ null then
      for all dmc1 ∈ DMC(m, ≺) do
        c1 ← dmc1.ccaller, c2 ← dmc1.ccallee
        if c1 ≠ c2 && c1 ≠ c && c2 ≠ c then
          I_C({c1, c2}) ← I_C({c1, c2}) + 1
        m1 ← dmc1.mcaller, m2 ← dmc1.mcallee
        if m1 ≠ m2 && m1 ≠ m && m2 ≠ m then
          I_M({m1, m2}) ← I_M({m1, m2}) + 1
[Heuristic design strategy type 2.]
Explanation. Again, it is better to gather methods that have many interactions into one class. Let a method m
call another method n, where the two methods are implemented in different classes. Then, we define the
following heuristic design strategy: when two such methods interact, those methods (or their owner classes)
are extracted as the entities of refactoring candidates of Move Method (or Collapse Class Hierarchy).
Procedure. Algorithm 5 obtains the scoring functions as follows. For every class c in the system and every
method m in class c, let the method m call another method n, where the two methods are implemented in
different classes. The pair of methods is extracted as the entities of a Move Method refactoring candidate,
and the number of extractions for the pair of methods is incremented in the scoring function I_M (when the
methods in the pair are not identical to each other). This also applies at the class level; the pair of the owner
classes of those methods is extracted as the entities of a Collapse Class Hierarchy refactoring candidate, and
the number of extractions for the pair of classes is incremented in the scoring function I_C (again, when the
classes in the pair are not identical to each other).
Rules. For this type, one heuristic design strategy is defined; thus, three rules are defined. Note that for each
rule, the refactoring candidates that are highly ranked (top cutline) by the scoring functions are chosen to be
assessed; that is, the pairs of entities that have many interactions are chosen to be assessed.
• R16: ∀(ci, cj) ∈ max(I_C, cutline) → Collapse Class Hierarchy(ci, cj)
• R17: ∀(mi, mj) ∈ max(I_M, cutline) → Move Method(owner class of mi, mj)
• R18: ∀(mi, mj) ∈ max(I_M, cutline) → Move Method(owner class of mj, mi)
4.2 Bottom-Up Approach: Grouping Entities into Maximal Independent Sets
Figure 4.4: Overall approach of the bottom-up approach: grouping entities into MISs. (From the object-oriented
source code, the RED-aware graph is constructed, and the entities are grouped into maximal independent sets
MIS1, ..., MIS4. RED: Refactoring Effect Dependency; MIS: Maximal Independent Set.)
To group elementary refactorings considering the RED, the RED-aware graph is constructed; based on this
graph, the entities of methods and attributes are grouped into MISs. By referring to the delta table, which is
obtained from the refactoring effect evaluation framework for evaluating the effect of each elementary refactoring
on maintainability, the methods involved in each MIS are mapped into a group of elementary Move Method
refactorings. Note that the Move Method refactorings in the same group can be applied at the same time. Fig. 4.4
illustrates the procedure of the bottom-up approach of grouping entities into MISs.
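The grouping idea can be sketched with a simple greedy construction of maximal independent sets; this illustrates the grouping step only and is not the thesis' exact algorithm.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Greedy sketch of grouping entities into maximal independent sets of a dependency
// graph: repeatedly build one MIS from the remaining entities until all are covered.
// This illustrates the grouping idea, not the thesis' exact algorithm.
public class MisGrouping {
    /** adj: entity -> entities it shares a RED edge with (symmetric). */
    public static List<Set<String>> group(Map<String, Set<String>> adj) {
        List<Set<String>> sets = new ArrayList<>();
        Set<String> remaining = new TreeSet<>(adj.keySet()); // deterministic order
        while (!remaining.isEmpty()) {
            Set<String> mis = new LinkedHashSet<>();
            for (String v : remaining) {
                boolean independent = true;
                for (String u : mis)
                    if (adj.get(v).contains(u)) { independent = false; break; }
                if (independent) mis.add(v); // v conflicts with no member chosen so far
            }
            remaining.removeAll(mis);
            sets.add(mis); // refactorings within one MIS can be applied at the same time
        }
        return sets;
    }
}
```

Within a returned set, no two entities share a RED edge, so the corresponding refactorings do not perturb each other's expected effect.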
Before explaining the algorithm of RED-aware grouping into MISs, we explain the importance of the
RED.
4.2.1 Refactoring Effect Dependency (RED) on Maintainability
When grouping elementary refactorings that are to be applied at the same time, refactoring dependencies
need to be considered. A syntactic dependency of refactorings means that the application of one refactoring
changes or deletes elements necessary for other refactorings, thus disabling those refactorings. Many
research efforts have been invested in resolving the syntactic dependencies of refactorings and applying as
many refactorings as possible to improve the maintainability of software.
However, even when refactorings do not conflict with each other, applying all the refactorings that are
expected to improve maintainability does not guarantee an improvement in maintainability. This is because refactorings'
(a) An example design.
(b) After moving method m3 to class A (from Fig. 4.5(a)).

Moving method (row) × target class (column):

D     A    B    C    D
m1    1    -    1    1
m2    -   -1    0   -1
m3   -2    -    0    1
m4    0   -1    -    0
m5   -1    0    0    -
m6    -   -1    0    0
m7    -   -1    0    0

(c) ∆ coupling for each moving method refactoring (Fig. 4.5(a)).

D     A    B    C    D
m1   -1    -    0    0
m2    -    1    1    0
m3    -    2    2    3
m4   -1    0    -    0
m5   -1    0    0    -
m6    -    1    1    1
m7    -    1    1    1

(d) ∆ coupling for each moving method refactoring (Fig. 4.5(b)).

Figure 4.5: A motivating example showing the need for the RED on maintainability.
effect on maintainability is mutually dependent. Once a refactoring is applied, the design configuration of the software changes, and this may influence the effect of other refactorings on maintainability. In other words, the effect of other refactorings on maintainability may change as the status of the software design (e.g., how the entities are associated and where they are placed) changes. Therefore, the pre-calculated (i.e., intended) effect on maintainability of other refactorings, even ones that have no syntactic dependencies, may change.
Fig. 4.5 is a motivating example showing the need for the RED. The system (in Fig. 4.5(a)) consists of four classes whose methods are A = {m2, m6, m7}, B = {m1, m3}, C = {m4}, and D = {m5}; the method calls are represented with directed edges. Let the number of edges across the classes be the coupling value. Each cell in the table of Fig. 4.5(c) (rows: moving methods, columns: target classes) represents the delta of the coupling value after the application of each Move Method refactoring on the design of Fig. 4.5(a). For example, the delta of the coupling value after moving the method m3 to the class A is -2. (Let Move Method(method m3, class A) denote this refactoring.) In other words, this refactoring reduces the coupling value by 2. The expected total reduction of the coupling value after applying the two refactorings Move Method(method m3, class A) and Move Method(method m2, class B) is therefore -3 (= -2 + -1). However, after applying Move Method(method m3, class A), the design configuration changes (as in Fig. 4.5(b)). The shaded cells in Fig. 4.5(d) represent the deltas of coupling values that change after the application of the refactoring Move Method(method m3, class A). As a result, the delta of the coupling value for the refactoring Move Method(method m2, class B) is now +1 (not -1), and the actual total change of the coupling value from applying those two refactorings is -1 (= -2 + 1). Without considering the RED on maintainability, unintended results may occur; even worse, the total effect of the applied refactorings may be none. To the best of our knowledge, this kind of refactoring dependency has not been noticed or discussed before.
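The arithmetic of this example can be checked with a few lines of Python. The edge list and class memberships below are transcribed from Fig. 4.5(a), and the coupling value is counted as the number of edges whose endpoints lie in different classes:

```python
# Method-association edges and class memberships of Fig. 4.5(a).
edges = [("m1", "m3"), ("m2", "m3"), ("m2", "m5"),
         ("m3", "m4"), ("m3", "m6"), ("m3", "m7")]
owner = {"m1": "B", "m2": "A", "m3": "B", "m4": "C",
         "m5": "D", "m6": "A", "m7": "A"}

def coupling(owner):
    # Number of edges whose two endpoints are in different classes.
    return sum(1 for u, v in edges if owner[u] != owner[v])

def delta(owner, method, target):
    # Change in coupling if `method` were moved to class `target`.
    moved = dict(owner, **{method: target})
    return coupling(moved) - coupling(owner)

print(delta(owner, "m3", "A"))  # -2, as in Fig. 4.5(c)
print(delta(owner, "m2", "B"))  # -1, as in Fig. 4.5(c)

owner["m3"] = "A"               # actually apply Move Method(m3, A)
print(delta(owner, "m2", "B"))  # +1, as in Fig. 4.5(d): the RED at work
```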
The RED must be considered to correctly identify the group of refactorings that most improves maintainability in each iteration of the refactoring identification process. As a result, when grouping elementary refactorings, we take this new dependency of refactorings, the RED, into account. We provide a clear definition of the RED on maintainability.
Definition 2 (Definition of the RED on Maintainability). The effects on maintainability of refactoring A and refactoring B are mutually dependent when the application of refactoring A changes the effect on maintainability of refactoring B, and vice versa, even though the two refactorings are not syntactically dependent.
4.2.2 Algorithm of RED-aware Grouping: Maximal Independent Set (MIS) Calculation
We develop the method for RED-aware grouping of elementary refactorings by using the concept of the MIS
in graph theory.
Maximal Independent Set in Graph Theory
We present the MIS [53, 59, 1] in graph theory; its concept is used for grouping elementary refactorings.
Given a graph G = (V, E), an Independent Set (IS) is a set of vertices S ⊆ V such that if u, v ∈ S, then (u, v) ∉ E. In short, an IS is a set of vertices in G such that no two vertices in the IS are adjacent (i.e., connected by an edge). A Maximal Independent Set (MIS) is an IS to which no more vertices can be added without violating the independence property; in short, a MIS is an IS that is not a subset of any other IS. A Maximum Independent Set (MaxIS) is an IS with maximum cardinality among all ISs of G.
Finding a MIS is trivial with a sequential algorithm; it simply scans the vertices in an arbitrary order. If a vertex u
(a) Initially, I = ∅. (b) Phase 1: pick a node v1 and add it to I. (c) Phase 1: remove v1 and its neighbors N(v1). (d) Phase 2: pick a node v2 and add it to I. (e) Phase 2: remove v2 and its neighbors N(v2). (f) Phases 3, 4, 5, . . . : repeat until all nodes are removed. (g) At the end, the set I is a MIS of G.
Figure 4.6: A sequential algorithm for calculating a MIS, adapted from the lecture notes of Costas Busch [17].
does not violate independence, then u is added to the MIS; otherwise, u is discarded. The figures in Fig. 4.6 illustrate the sequential algorithm for calculating a MIS.
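The scan can be sketched in a few lines of Python (an illustration, not the thesis implementation of Fig. 4.7). Run on the graph of Fig. 4.5(a) with the vertices scanned in the order m1, ..., m7, it returns the set {m1, m2, m4, m6, m7}, one of the MISs of that design:

```python
def sequential_mis(vertices, edges):
    # Greedy sequential MIS: scan vertices in the given (arbitrary) order
    # and keep each vertex not adjacent to any vertex already chosen.
    adjacent = {v: set() for v in vertices}
    for u, v in edges:
        adjacent[u].add(v)
        adjacent[v].add(u)
    mis = set()
    for u in vertices:
        if adjacent[u].isdisjoint(mis):  # u violates no independence
            mis.add(u)
    return mis

# Method-association edges of Fig. 4.5(a).
edges = [("m1", "m3"), ("m2", "m3"), ("m2", "m5"),
         ("m3", "m4"), ("m3", "m6"), ("m3", "m7")]
print(sorted(sequential_mis(["m1", "m2", "m3", "m4", "m5", "m6", "m7"], edges)))
# ['m1', 'm2', 'm4', 'm6', 'm7']
```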
On the contrary, computing a MaxIS is a notoriously difficult problem; it is equivalent to finding a maximum clique on the complementary graph. Both problems are NP-hard and, in fact, not approximable within n^(1-ε). Therefore, finding all the existing MISs is also NP-hard. For this reason, we use a heuristic for calculating MISs that is scalable to large programs. We try to find MISs, each of which has as many independent entities (which are later transformed into elementary refactorings) as possible, because the more elementary refactorings a MIS contains, the larger the maintainability improvement that can be expected. Thus,
we find intermediate groups of entities, each already containing as many entities as possible, by grouping the entities using transitive independent relations. This algorithm is also deterministic: the calculated MISs are always the same for a given system.
Algorithm of RED-aware Grouping: Maximal Independent Set Calculation
For RED-aware grouping of elementary refactorings, we calculate MISs of entities based on the RED-aware graph; the entities in a MIS are then mapped into the elementary refactorings of a group. Note that the elementary refactorings in a group can be applied at the same time.
The RED-aware graph GR = (VR, ER) of the corresponding object-oriented program is constructed as follows.
• VR = {methods, attributes}
• ER = {method calls(method m1, method m2),
attribute accesses1(method m1, attribute a1),
attribute accesses2(method m1, method m2)}.
The vertices (VR) indicate the entities of methods and attributes. The edges (ER) indicate the associations between entities: two entities are associated when they should preferably be located in the same class to improve maintainability (in terms of low coupling and high cohesion). To this end, we connect an edge between two entities when (1) a method calls another method (method calls), (2) a method accesses an attribute (attribute accesses1), and (3) two methods access the same attribute (attribute accesses2).
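As an illustration with hypothetical input relations (the relation names and entities below are invented for the example), the three kinds of edges can be collected as follows:

```python
# Sketch of collecting RED-aware graph edges from (hypothetical)
# call and attribute-access relations of a small program.
from itertools import combinations

calls = [("m1", "m3")]                    # (1) method m1 calls method m3
accesses = [("m2", "a1"), ("m3", "a1")]   # (2) methods accessing attribute a1

edges = set(calls) | set(accesses)
for attr in {a for _, a in accesses}:     # (3) two methods share an attribute
    readers = sorted(m for m, a in accesses if a == attr)
    edges.update(combinations(readers, 2))

print(sorted(edges))
# [('m1', 'm3'), ('m2', 'a1'), ('m2', 'm3'), ('m3', 'a1')]
```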
Fig. 4.7 shows the algorithm for calculating MISs. In the algorithm, we impose strict constraints to prevent producing a different set of MISs on every execution over the same input system. In the first step, based on the RED-aware graph GR = (VR, ER), the independent relations are extracted; an independent relation is a relation between two vertices u, v ∈ VR such that (u, v) ∉ ER. In the second step, we find intermediate groups of entities, each already containing as many entities as possible, by grouping the entities using transitive independent relations. The size of an intermediate group of entities should be greater than a threshold λ, which is determined by the average size of the intermediate groups of entities of the given program. We use the threshold to cut the candidates of intermediate groups of entities; in the experiment, we show that the calculation of MISs is scalable to large programs. After determining the intermediate groups of entities, we assign the remaining entities, those not yet added into the groups, until no more entities can be added to any group without violating the independence property. Finally, the MISs of entities are obtained. Then, we exclude the entities of attributes from the MISs. Please note that, in this thesis, we use the Move Method refactoring as the elementary refactoring. As mentioned in subsection 4.1.1, we do not consider the Move Field refactoring (moving attributes, i.e., fields, from one class to another), because fields have a stronger conceptual binding to the classes in which they are initially placed, since they are less
import sys
import random
from numpy import *
import scipy as sp
from MRModel import *
from MREntity import *
from MRMethod import *
from MRField import *
from MRClass import *

def generateIndependentSets(matrix, numMethod):
    (rows, cols) = matrix.nonzero()
    independent_vertex_set_set = set()
    for k in range(50):
        independent_vertex_set_len = len(independent_vertex_set_set)
        rowidxs = list(range(len(rows)))
        random.shuffle(rowidxs, random.random)
        independent_vertex_set = set(range(numMethod))
        while len(independent_vertex_set) > 0:
            next_independent_vertex_set = set()
            next_row_idx = []
            for i in rowidxs:
                if rows[i] in independent_vertex_set and cols[i] in independent_vertex_set:
                    independent_vertex_set.remove(cols[i])
                    next_independent_vertex_set.add(cols[i])
                    next_row_idx.append(i)
            rowidxs = next_row_idx
            random.shuffle(rowidxs)
            fivs = frozenset(independent_vertex_set)
            rm = set()
            for vs in independent_vertex_set_set:
                if vs < fivs:
                    rm.add(vs)
                if fivs < vs:
                    rm.add(fivs)
            independent_vertex_set_set.add(fivs)
            for vs in rm:
                independent_vertex_set_set.remove(vs)
            independent_vertex_set = next_independent_vertex_set
    return independent_vertex_set_set
Figure 4.7: The algorithm for calculating MISs.
likely than methods to change once assigned to a class [88]. Examples of MISs that can be obtained from Fig. 4.5(a) or Fig. 4.5(b) are MIS1 = {m1, m2, m4, m6, m7} and MIS2 = {m1, m4, m5, m6, m7}.
Chapter 5. Selection of Refactorings to be Applied
(MM: Move Method refactoring; CCH: Collapse Class Hierarchy refactoring; MIS: Maximal Independent Set of refactorings; GER: Group of Elementary Refactorings)
Figure 5.1: Overall approach of the selection of multiple refactorings.
We provide a method of selecting refactorings that supports the assessment and impact analysis of elementary refactorings based on matrix computation. Importantly, this elementary-level refactoring computation (our refactoring selection method) enables selecting multiple refactorings. Furthermore, the method can be extended to various other types of refactorings, because the action of a big refactoring (e.g., the Collapse Class Hierarchy refactoring) comprises elementary refactorings (e.g., Move Method refactorings).
The procedure of refactoring selection consists of several activities: (1) calculating each elementary refactoring's effect on maintainability (delta table derivation); (2) checking whether there are duplicated elementary refactorings (i.e., syntactic dependencies) or the RED among the refactorings identified as candidates in chapter 4 (refactoring impact analysis); (3) finding multiple (elementary) refactorings, applicable at the same time, containing the refactorings that most improve maintainability; and (4) identifying the impacted refactorings after applying the selected refactorings (refactoring impact analysis) and recalculating the changed values for those impacted refactorings. Fig. 5.1 illustrates this procedure of selecting multiple refactorings.
Before explaining the procedure of refactoring selection, we briefly present how refactoring candidates are assessed.
Method of Refactoring Candidate Assessment
Before making a decision on which refactorings to apply, refactoring candidates need to be assessed.
In previous studies, each refactoring candidate is assessed by using either a simulation model or a virtual application method. This is because (1) the actual application of a refactoring candidate on source code adds a significant overhead due to disk write operations (once for applying each refactoring and once for undoing it) [87], and (2) evaluating the design quality of maintainability does not necessarily require all of the information in the source code. With a simulation model, a refactoring candidate is applied and the effect of the applied refactoring is assessed by evaluating maintainability on the transformed simulation model (i.e., the refactored model); the application of the refactoring candidate is then rolled back. For instance, in Seng's work [82], the source code is transformed into a suitable model using a standard fact extraction technique, and this model is used for simulating the source code refactorings and calculating their impact on the fitness function. In our previous study of cost-effective refactoring identification [38], the AOM, the profiled model mentioned in section 4.1.1, is used for this purpose. With a virtual application method, the effect of a refactoring candidate is estimated (without application) by updating the elements that are needed for evaluating maintainability (i.e., calculating the fitness function). For instance, in Tsantalis' work [87], to assess the effect of a Move Method refactoring opportunity, the entity sets involved in the move of the corresponding method are updated.
To evaluate the maintainability of a refactored software design, a quantitative fitness function of maintainability (e.g., QMOOD [8] or the Maintainability Index (MI) [69]) is needed. Using the fitness function, refactoring candidates are evaluated and ranked in the order of their expected degree of maintainability improvement; therefore, the refactoring(s) that most improve the fitness value can be selected. For instance, a weighted sum of design metrics is used as the fitness function in Seng's work [82], and a maintainability evaluation function, designed as cohesion metrics over coupling metrics, is used in our previous work [38]. In Tsantalis' work, the Entity Placement metric is used [87].
In this thesis, we provide a method for assessing elementary refactorings by using matrix computation. We derive a delta table (see section 5.1), each cell of which indicates the delta of maintainability after the application of each Move Method refactoring on the current design configuration. Matrix computation is used for calculating each elementary refactoring's effect on maintainability. Maintainability is calculated based on the information of the links (e.g., entity a is associated with entity b) and the memberships (e.g., entity a is placed in class A) of the entities. This way of assessing the effect of a refactoring is similar to the virtual application method, since it estimates the effect of each elementary refactoring by using design status information (e.g., the links and memberships of entities) without actually applying the refactorings. In addition, the maintainability measure used in this thesis (see section 5.1) is quantitative; therefore, each elementary refactoring's effect on maintainability can serve as a refactoring selection criterion.
5.1 Calculation of Each Elementary Refactoring's Effect on Maintainability: Refactoring Effect Evaluation Framework
Maintainability
In object-oriented software, two objectives, high cohesion and low coupling, have been accepted as important factors of good software design quality in terms of maintenance [33], because adopting these design quality metrics leads to less propagation of changes, and fewer side effects, to other parts of the system [10, 6]. Cohesion corresponds to the degree to which the elements of a class belong together, and coupling refers to the strength of association established by a connection from one class to another. To this end, the associations among the entities belonging to a class (inner entities) should be as large as possible (high cohesion); at the same time, the associations between the entities of a class and the entities belonging to other classes (outer entities) should be as small as possible (low coupling). Based on this concept, we measure the maintainability of the overall object-oriented software design as the number of associations across classes. The associations are the edges of the constructed graph GR presented in subsection 4.2.2. Therefore, maintainability can be assessed by the number of edges (i.e., associations) whose two end vertices are located in different classes. This number naturally represents both the lack of the degree of association among the entities of a class (lack of cohesion) and the degree of association between the entities of different classes (coupling); as a result, by applying refactorings, we aim to reduce this number (to improve maintainability).
Definition of Delta Table
We derive a delta table, each cell of which indicates the delta of maintainability after the application of each elementary refactoring on the current design configuration. This delta table serves as the refactoring selection criterion. Matrix computation is used for calculating each elementary refactoring's effect on maintainability. In the delta table, the row elements indicate the moving methods and attributes, while the column elements indicate the target classes. Matrix computation is fast; thus, the delta table can be derived efficiently.
Establishment of Refactoring Effect Evaluation Framework: Derivation of Delta Table
Fig. 5.2 shows the algorithm for deriving a delta table.
The delta table is calculated as follows. We use the RED-aware graph GR to form the link matrix (L). L denotes the link information from one entity to another; each entity is a method or an attribute contained in the program. The value of a cell of L denotes the strength of the relation. When there is an association from an entity (row) to an entity (column), the cell of L is 1; otherwise, when there is no association between the two entities, the cell of L is 0. Note that we distinguish the direction of the edges; therefore, when two entities are associated with each other, the value of the cell of the
def getInternalExternalLinkMatrix(self):
    internal_link_mask = self.membershipMatrix * self.membershipMatrix.T
    internal_link_matrix = self.linkMatrix.multiply(internal_link_mask)
    external_link_matrix = self.linkMatrix - internal_link_matrix
    return (internal_link_matrix, external_link_matrix)

def invertedMembershipMatrix(self, M):
    new_matrix = zeros((len(self.entities), len(self.classes)), dtype='int32')
    (rows, cols) = M.nonzero()
    for i in range(len(rows)):
        v = M[rows[i], cols[i]]
        new_matrix[rows[i], :] = new_matrix[rows[i], :] + v
        new_matrix[rows[i], cols[i]] = 0
    ret = sp.coo_matrix(new_matrix)
    return ret

def getEvalMatrix(self):
    (internal_matrix, external_matrix) = self.getInternalExternalLinkMatrix()
    IP = internal_matrix * self.membershipMatrix
    EP = external_matrix * self.membershipMatrix
    IIP = self.invertedMembershipMatrix(IP)
    D = IIP - EP
    return D
Figure 5.2: The algorithm for deriving a delta table.
L becomes 2. Since there is no association between fields, such cells of L are 0. The membership matrix (M) denotes the membership information of an entity in a class. A cell of M is 1 when the entity (row) is placed in the class (column), and 0 when it is not. Note that even though we do not consider Move Field refactorings, we still need to include attributes as entities of the delta table, to account for the delta of maintainability affected by the locations of the attributes. When the two matrices L and M are multiplied, the projection matrix (P) is produced. P represents the link information from an entity (row) to a class (column). For deriving a delta table, we first compute two types of P: PInt (the internal projection matrix) and PExt (the external projection matrix), computed by multiplying M with LInt (the matrix of internal links) and LExt (the matrix of external links), respectively.
PInt = LInt × M,
PExt = LExt × M.
As noted above, maintainability is assessed by the number of external links, that is, edges (associations) whose two end vertices (entities) are located in different classes, and this number should
be reduced to improve maintainability. A cell of PInt is 1 when an internal link exists from the entity (row) to the class (column); this means that moving the entity to another class (other than its own) will potentially increase the external links in the system. To this end, we use the Inv() function for PInt. The Inv() function inverts the cells of PInt: PInt(entity, class itself) 1 → 0 and PInt(entity, other classes) 0 → 1. A cell having 1 in PExt means that an external link exists from the entity (row) to the class (column); this means that moving the entity to that class will decrease the external links in the system. Finally, by the formulation below, we obtain the delta table (D), each cell of which is the delta of the maintainability value after the application of the corresponding Move Method refactoring on the design.
D = Inv(PInt) − PExt.
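The derivation can be reproduced with NumPy on the example of Fig. 4.5(a); the link and membership matrices below are transcribed from Fig. 5.4. (Since PInt here has at most one nonzero per row, Inv() reduces to spreading each row sum over the other columns.)

```python
import numpy as np

# Entities m1..m7 and classes A..D of Fig. 4.5(a).
L = np.zeros((7, 7), dtype=int)  # link matrix
for u, v in [(0, 2), (1, 2), (1, 4), (2, 3), (2, 5), (2, 6)]:
    L[u, v] = L[v, u] = 1

M = np.zeros((7, 4), dtype=int)  # membership matrix
for m, c in [(0, 1), (1, 0), (2, 1), (3, 2), (4, 3), (5, 0), (6, 0)]:
    M[m, c] = 1

same_class = M @ M.T             # 1 iff two entities share a class
L_int = L * same_class           # internal links
L_ext = L - L_int                # external links
P_int = L_int @ M
P_ext = L_ext @ M

# Inv(): each row's internal-link count moves to all *other* classes.
inv_P_int = P_int.sum(axis=1, keepdims=True) - P_int
D = inv_P_int - P_ext

print(D[2])  # row of m3: -2, 0, 0, 1, as in Fig. 4.5(c)
print(D[1])  # row of m2:  0, -1, 0, -1
```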
5.2 Refactoring Impact Analysis
Refactoring impact analysis is used for (a) checking whether there are duplicated elementary refactorings or the RED among refactorings, and (b) identifying the impacted refactorings after applying refactorings, in order to update the changed maintainability values.
The impact analysis of (a) is performed when grouping refactorings that can be applied at the same time (to make a bigger refactoring). For this, the refactorings are projected into elementary refactorings (i.e., Move Method refactorings). Then, among the elementary refactorings, we check whether there are duplicated elementary refactorings (i.e., syntactic dependencies) or the RED.
The impact analysis of (b) is performed to identify the impacted refactorings after applying refactorings, so that the changed maintainability values can be updated. In the internal projection matrix (PInt) and the external projection matrix (PExt), the following cells change: those of the classes that changed in M (i.e., the owner class of the moving method and the target class) and those of the methods linked with the moving method. Therefore, only those changed cells need to be recalculated.
5.3 Selection of Multiple Refactorings
We select the group of refactorings, applicable at the same time, containing the multiple elementary refactorings that best improve maintainability. The refactorings are selected from the identified refactoring candidates: (1) Collapse Class Hierarchy refactorings and (2) Move Method refactorings (from section 4.1), and (3) MISs (groups of elementary, i.e., Move Method, refactorings) (from section 4.2). For selecting multiple refactorings, the accumulated deltas of maintainability for each group of refactoring candidates are prioritized in descending order, and the group of refactoring candidates with the largest value is selected.
Using the delta table, the methods involved in each MIS are transformed into a group of elementary refactorings. For instance, for each method m in a MIS, the Move Method refactoring with the largest delta of maintainability is mapped out of all available Move Method refactorings (method m, class c), where c ≠ owner class of method m and c ∈ classes in the system. In short, the Move Method refactoring is determined by comparing the deltas of maintainability in the row of method m in the delta table.
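For illustration, the following sketch (not the thesis tool) maps the methods of MIS1 = {m1, m2, m4, m6, m7} to their best target classes using the coupling deltas of Fig. 4.5(c); under that coupling measure a more negative delta means a larger reduction of cross-class links, so the minimum is taken:

```python
# Coupling deltas of Fig. 4.5(c) for the methods of MIS1; each method's
# own class is excluded from its candidate targets.
delta = {
    "m1": {"A": 1, "C": 1, "D": 1},
    "m2": {"B": -1, "C": 0, "D": -1},
    "m4": {"A": 0, "B": -1, "D": 0},
    "m6": {"B": -1, "C": 0, "D": 0},
    "m7": {"B": -1, "C": 0, "D": 0},
}

# For each method, pick the target class with the best (smallest) delta;
# ties are broken by insertion order, keeping the result deterministic.
group = {m: min(targets, key=targets.get) for m, targets in delta.items()}
print(group)
```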
After selection, we recalculate the changed values of the impacted elementary refactorings to update the delta table. In addition, the M and L matrices are updated. The procedure of refactoring identification is iterated until no more groups of refactoring candidates that improve maintainability are found.
It is important to note that before selecting multiple refactorings, we check the specialization ratio (S) [44], a measure used to prevent merging too many classes together and making the class hierarchy too wide. S is formulated as follows: (# of classes − # of root classes) / (# of classes − # of leaf classes), where the root classes are the roots of the distinct class hierarchies, and the leaf classes are the ones from which no other classes inherit. S measures the width of the inheritance tree; in other words, S is the average number of derived classes for each base class, so a higher value indicates a wider tree. If the S of the refactored model exceeds a specific threshold γ, an alternative refactoring (for example, the refactoring with the second-largest fitness value) is selected.
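A minimal sketch of this check (with a hypothetical input format: a map from each class to its base class, None for roots):

```python
def specialization_ratio(parent):
    # S = (# classes - # root classes) / (# classes - # leaf classes) [44].
    classes = set(parent)
    roots = {c for c, p in parent.items() if p is None}
    bases = {p for p in parent.values() if p is not None}
    leaves = classes - bases
    return (len(classes) - len(roots)) / (len(classes) - len(leaves))

# One base class with two derived classes: S = (3 - 1) / (3 - 2) = 2.0,
# i.e., on average two derived classes per base class.
print(specialization_ratio({"Base": None, "Left": "Base", "Right": "Base"}))
```

Note that the denominator is zero for a design with no inheritance at all (every class is both a root and a leaf), so the check applies only once class hierarchies exist.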
Example
Following the procedure explained above, Fig. 5.3 illustrates how the delta table (Fig. 4.5(c)) is obtained for the corresponding design (Fig. 4.5(a)). Fig. 5.4 presents the matrices required to calculate the delta table (Fig. 4.5(c)).
LInt m1 m2 m3 m4 m5 m6 m7
m1 0 0 1 0 0 0 0
m2 0 0 0 0 0 0 0
m3 1 0 0 0 0 0 0
m4 0 0 0 0 0 0 0
m5 0 0 0 0 0 0 0
m6 0 0 0 0 0 0 0
m7 0 0 0 0 0 0 0
M A B C D
m1 0 1 0 0
m2 1 0 0 0
m3 0 1 0 0
m4 0 0 1 0
m5 0 0 0 1
m6 1 0 0 0
m7 1 0 0 0
PInt A B C D
m1 0 1 0 0
m2 0 0 0 0
m3 0 1 0 0
m4 0 0 0 0
m5 0 0 0 0
m6 0 0 0 0
m7 0 0 0 0
(a) PInt = LInt ×M
LExt m1 m2 m3 m4 m5 m6 m7
m1 0 0 0 0 0 0 0
m2 0 0 1 0 1 0 0
m3 0 1 0 1 0 1 1
m4 0 0 1 0 0 0 0
m5 0 1 0 0 0 0 0
m6 0 0 1 0 0 0 0
m7 0 0 1 0 0 0 0
M A B C D
m1 0 1 0 0
m2 1 0 0 0
m3 0 1 0 0
m4 0 0 1 0
m5 0 0 0 1
m6 1 0 0 0
m7 1 0 0 0
PExt A B C D
m1 0 0 0 0
m2 0 1 0 1
m3 3 0 1 0
m4 0 1 0 0
m5 1 0 0 0
m6 0 1 0 0
m7 0 1 0 0
(b) PExt = LExt ×M
Inv(PInt) A B C D
m1 1 0 1 1
m2 0 0 0 0
m3 1 0 1 1
m4 0 0 0 0
m5 0 0 0 0
m6 0 0 0 0
m7 0 0 0 0
PExt A B C D
m1 0 0 0 0
m2 0 1 0 1
m3 3 0 1 0
m4 0 1 0 0
m5 1 0 0 0
m6 0 1 0 0
m7 0 1 0 0
D A B C D
m1 1 0 1 1
m2 0 -1 0 -1
m3 -2 0 0 1
m4 0 -1 0 0
m5 -1 0 0 0
m6 0 -1 0 0
m7 0 -1 0 0
(c) D = Inv(PInt)− PExt
Figure 5.3: Example of calculating the delta table (D) of Fig. 4.5(c) for the design of Fig. 4.5(a).
LInt m1 m2 m3 m4 m5 m6 m7
m1 0 0 1 0 0 0 0
m2 0 0 0 0 0 0 0
m3 1 0 0 0 0 0 0
m4 0 0 0 0 0 0 0
m5 0 0 0 0 0 0 0
m6 0 0 0 0 0 0 0
m7 0 0 0 0 0 0 0
(a) Internal link matrix (LInt).
LExt m1 m2 m3 m4 m5 m6 m7
m1 0 0 0 0 0 0 0
m2 0 0 1 0 1 0 0
m3 0 1 0 1 0 1 1
m4 0 0 1 0 0 0 0
m5 0 1 0 0 0 0 0
m6 0 0 1 0 0 0 0
m7 0 0 1 0 0 0 0
(b) External link matrix (LExt).
M A B C D
m1 0 1 0 0
m2 1 0 0 0
m3 0 1 0 0
m4 0 0 1 0
m5 0 0 0 1
m6 1 0 0 0
m7 1 0 0 0
(c) Membership matrix (M).
PInt A B C D
m1 0 1 0 0
m2 0 0 0 0
m3 0 1 0 0
m4 0 0 0 0
m5 0 0 0 0
m6 0 0 0 0
m7 0 0 0 0
(d) Internal projection matrix
(PInt).
Inv(PInt) A B C D
m1 1 0 1 1
m2 0 0 0 0
m3 1 0 1 1
m4 0 0 0 0
m5 0 0 0 0
m6 0 0 0 0
m7 0 0 0 0
(e) Inversion of internal projec-
tion matrix (Inv(PInt)).
PExt A B C D
m1 0 0 0 0
m2 0 1 0 1
m3 3 0 1 0
m4 0 1 0 0
m5 1 0 0 0
m6 0 1 0 0
m7 0 1 0 0
(f) External projection matrix
(PExt).
D A B C D
m1 1 - 1 1
m2 - -1 0 -1
m3 -2 - 0 1
m4 0 -1 - 0
m5 -1 0 0 -
m6 0 -1 0 0
m7 0 -1 0 0
(g) Delta Table.
Figure 5.4: The matrices required to calculate the delta table (Fig. 4.5(c)).
Chapter 6. Tool Implementation
Figure 6.1: Overall tool architecture.
The proposed method has been implemented [37] in Java within the Eclipse environment. Fig. 6.1 illustrates the overall tool architecture. The tool comprises the following main modules: the static profiler, dynamic profiler, dynamic-static composer, refactoring extractor, refactoring simulator, metric measurer, fitness evaluator, and refactoring selector. In the static profiler, given the Java source code, code structure information such as static method calls (SMCs) and class definitions is extracted. In the dynamic profiler, dynamic method calls (DMCs) are extracted by executing the Java byte code compiled from the Java source code, using user scenarios or operational profiles. In the dynamic-static composer, the DMCs are mapped onto the corresponding SMCs from which those DMCs are instantiated, and the base Abstract Object Model (AOM) is constructed; a more detailed explanation of the AOM is provided in subsection 4.1.1. In the refactoring extractor, refactoring candidates are extracted using the refactoring-candidate extraction rules. In the refactoring simulator, the refactoring candidates are applied by transforming the base AOM, producing tentatively refactored AOMs. In the metric measurer and the fitness evaluator, metrics are derived for all tentatively refactored AOMs and used to calculate the fitness value of the maintainability evaluation function. In the refactoring
selector, if no more refactoring candidates that improve the fitness value are found, the tool stops and generates the selected refactoring logs as output, indicating a sequence of recommended refactorings. Otherwise, the refactoring that yields the refactored AOM with the best fitness value is selected in a greedy way and applied, and the base AOM is updated to the refactored AOM, but only when the specialization ratio of the refactored AOM does not exceed the specific threshold γ. After that, the procedure of the refactoring extractor, refactoring simulator, metric measurer, fitness evaluator, and refactoring selector is iterated. In addition to this best-selection mode, the tool can be operated in a user-interactive mode, in which users can select their preferred refactorings. Fig. 6.2 shows a snapshot of the tool in operation.
Figure 6.2: A snapshot of tool operation.
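The greedy, threshold-guarded selection loop described above can be sketched as follows. This is a minimal illustration only; the class, record, and method names (GreedySelectionSketch, Candidate, select) are hypothetical and do not reflect the tool's actual API.

```java
import java.util.ArrayList;
import java.util.List;

// A minimal sketch of the greedy selection loop described above; all names
// are hypothetical illustrations, not the tool's actual implementation.
class GreedySelectionSketch {

    // A refactoring candidate paired with the fitness of the tentatively
    // refactored AOM and that AOM's specialization ratio.
    record Candidate(String refactoring, double fitness, double specializationRatio) {}

    /**
     * For each iteration, greedily picks the candidate with the best fitness,
     * accepts it only if its fitness improves on the current value and its
     * specialization ratio does not exceed gamma, and stops as soon as no
     * improving candidate remains.
     */
    static List<String> select(List<List<Candidate>> iterations,
                               double baseFitness, double gamma) {
        List<String> selectedLogs = new ArrayList<>();
        double current = baseFitness;
        for (List<Candidate> candidates : iterations) {
            Candidate best = null;
            for (Candidate c : candidates) {
                if (c.fitness() > current && c.specializationRatio() <= gamma
                        && (best == null || c.fitness() > best.fitness())) {
                    best = c;
                }
            }
            if (best == null) break;        // no improving candidate: stop
            current = best.fitness();       // base AOM updated to refactored AOM
            selectedLogs.add(best.refactoring());
        }
        return selectedLogs;
    }
}
```

The returned list plays the role of the selected refactoring logs: the sequence of recommended refactorings in the order they were accepted.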
Chapter 7. Evaluation
In the first subsection, research questions that we investigate in this experiment are presented. In the second
subsection, the experimental subjects and data processing method for those subjects are explained. In the third
and fourth subsections, the evaluation design is explained, and the results are presented, respectively. The final
subsection ends with threats to validity.
7.1 Research Questions
We evaluate our approach of refactoring identification in two aspects: dynamic information-based identifi-
cation of refactoring candidates and RED-aware selection of multiple refactorings. The research questions for
our experiments are as follows. The first research question is to test the usefulness of the overall approach by
using dynamic information in identifying refactoring candidates, while the second research question is to test the
capability of the approach in extracting refactoring candidates. The third and fourth research questions are to test
the effect of using Groups of Elementary Refactorings (MISs) and the effect of the RED, respectively.
RQ1. Effect of dynamic information for effective refactoring identification
Is the dynamic information helpful in identifying refactorings that effectively improve maintainability?
RQ2. Effect of dynamic information for extracting refactoring candidates in frequently changed classes
Is dynamic information helpful in extracting refactoring candidates in the classes where real changes had
frequently occurred?
RQ3. Effect of selection for multiple refactorings
Do the multiple refactorings help to improve maintainability and reduce the costs of search space exploration
and computation?
RQ4. Effect of the RED
Is the RED important when grouping entities into MISs for improving maintainability?
7.2 Subjects and Data Processing
Three projects are chosen for experimental subjects: jEdit [49], Columba [20], and JGIT [52]. A number of
reasons led us to select these subjects.
• The full source code of each version is available.
• They contain a relatively large number of classes.
• They are written in Java; our proposed method applies to object-oriented software.
• Their development histories are well-managed in version-control systems.

Table 7.1 summarizes the characteristics and development history of each subject. In addition, measures of each
subject (the numbers of classes, methods, attributes, and links [i.e., associations], and the fitness function values)
are presented in Table 7.2.

Table 7.1: Characteristics and development history for each subject.
Name                                     jEdit               Columba             JGIT
Type                                     Text editor         Email client        Distributed source version control system
Total # of revisions                     19501               458                 1616
Report period                            2001-09 ∼ 2011-09   2006-07 ∼ 2011-07   2009-09 ∼ 2011-09
Number of developers                     25                  9                   9
Version to apply the proposed approach   jEdit-4.3           Columba-1.4         v1.1.0.201109151100-r

Table 7.2: Measures of each subject.
Name                     jEdit (jEdit-4.3)   Columba (Columba-1.4)   JGIT (jGit-1.1.0)
# of classes             952                 1506                    689
# of methods             6487                8745                    5334
# of attributes          3523                3967                    2989
# of links               26626               23981                   18280
Fitness function value   0.023287            0.023117                0.021357
To apply the proposed approach to each subject, one version of the source code is selected as input data (the
last row in Table 7.1). It is important to mention that we select a version after which major changes occurred.
We also do not select an early version because, at that time, the software is unstable and meaningless changes
may occur frequently. In short, we take a mature version.
Performing the Dynamic Profiling
To obtain dynamic information of the dependencies of entities, dynamic profiling is performed by executing
the selected version of the program of each subject according to its user scenarios or operational profiles. In this
experiment, the user scenarios or operational profiles data are not available, since the subjects are chosen from
open source projects. Therefore, the dynamic information is obtained by executing the programs from various
users who exhibit normal behavior in using the programs. To obtain more reliable dynamic profiling results, we set
specific criteria for software use regarding the characteristics of users, the experimental environment, and the
experimental conditions. Note that we do not take into account abnormal or extraordinary scenarios, because they may result in
suggesting refactorings in parts not actually in use. For example, for jEdit, we do not log the bootstrapping part
of the editor; we profile only the editing part. Similarly, for Columba, we do not log the initializing part of the
e-mail client, but only functions such as retrieving messages from a mailbox, composing messages, and submitting
messages to a server. In the following, for each subject, we present the criteria and the rationale for using it.
jEdit [49] is a Java-based text editor developed so that the same editor can be used on different platforms and
operating systems. It provides common graphical user interfaces like other text editors. Since it provides only
one interface, the GUI, we set the criteria of the experimental condition as follows: characteristics of the
language (natural language and formal language) and length of the written text (long and short). To detect typos
while editing natural language, the text editor must search an entire dictionary, which is rather big, whereas a
typo in a formal language (e.g., a programming language) can be detected relatively easily. On the other hand,
for a long description, users tend to change the structure of the text while writing it; for a short description,
like a brief e-mail message, the text is written without revision. The dynamic profiling was conducted as
follows.
- Long description with formal language: C and Python server code, 2 days, 1 person.
- Short description with formal language: HTML and Python, 1 day, 2 people.
- Long description with natural language: LaTeX, 2 days, 1 person.
- Short description with natural language: e-mail messages, 1 day, 2 people.
Columba [20] is an e-mail client program implemented in Java. It supports the standard e-mail client protocols,
POP3 and IMAP4, and provides the usual GUI features, such as showing the lists of received and sent mails and
composing e-mail. According to the interfaces that Columba provides, we set the criteria of the experimental
condition: network protocol and GUI. For the network protocol, we take into account POP3 and IMAP4. As
mentioned above, the GUI covers three common mail client actions, and we distinguish read-intensive users from
write-intensive users. By observing usage patterns, we found that graduate students tended to be relatively
read-intensive users, while business people tended to be write-intensive users. Therefore, we chose these two
user groups to realize the experimental condition. The dynamic profiling was conducted for four days with six
graduate students and six business people working in a venture company. Since all of them used Gmail, they used
Columba with POP3 for the first two days and with IMAP4 for the next two days.
JGIT [52] is a Java implementation of Git, a well-known distributed version control system. Git provides
complex version control operations, such as three-way merging, cherry-picking, and rebasing. Moreover, Git uses
various protocols: HTTPS, SSH, or the Git protocol. JGIT does not provide all of these functionalities, but only
core ones: clone, push, commit, fetch, and branch. In fact, the usage pattern of JGIT may vary with the habits
of the user. For example, some developers prefer small commits and big pushes, other developers prefer big
commits and big pushes, and a few developers prefer small commits and small pushes. On the other hand, some
users do not use the commit or push functionalities at all, but only clone and fetch; of course, these developers
do not use branch. The dynamic profiling was conducted for three days by three developers and one manager of
the venture company as follows.
• Each of three developers has characteristics as follows:
- Domain: server side; language: Erlang and Python; commit interval: short; push interval: short; clone or
pull interval: rare
- Domain: client side (iOS); language: objective-C and C; commit interval: short; push interval: long; clone
or pull interval: rare
- Domain: client side (Android); language: Java and C; commit interval: long; push interval: long; clone or
pull interval: rare
• The manager’s characteristics are as follows:
- Domain: server, client, and library side; language: Java, objective-C and C; commit interval: long; push
interval: long; clone or pull interval: short.
Extracting Changes
Table 7.3: Examined range of revisions.
jEdit Columba JGIT
18000 ∼ 19000 300 ∼ 450 1 ∼ 1616
For each subject, real changes—that had occurred within the examined revisions of the development history—
are extracted. We examined the revisions (as in Table 7.3) that had been made after the selected version. The
changed methods include method body changes as well as method signature changes, such as changes in name,
parameter, visibility, and return type.
To test hypothesis 1, the methods changed across the revisions are added to the list of changed methods, which
is used as the input for change impact analysis. Note that the list of changed methods may contain a method
multiple times, to take into account the effect of its frequency of occurrence. To test hypothesis 2, the set of
changed classes (the owner classes of those changed methods) and the corresponding number (i.e., frequency) of
changes for those classes are compared with the classes extracted as refactoring candidates.
The procedure used for extracting the list of changed methods is as follows. First, we retrieve the source code in
which each revision occurred. Next, we analyze files in the source code for each revision to obtain the following
information.
abstract java: a function such that
(rev number, file name) → a set of file info entry
file info entry: a tuple such that
(start line number, end line number, class name, method name)
Then, we use diff to obtain the changed line numbers.
line change: a function such that
(former rev number, latter rev number, file name) → changed line numbers
Finally, using these changed line numbers, we obtain the methods that were changed across the revisions.
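The last step, mapping changed line numbers back to changed methods via the file info entries, can be sketched as follows. The record and method names are illustrative, not the actual implementation.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// A sketch of mapping the changed line numbers obtained from diff to changed
// methods via the file info entries described above; names are illustrative.
class ChangeExtractorSketch {

    // Corresponds to a file info entry:
    // (start line number, end line number, class name, method name).
    record FileInfoEntry(int startLine, int endLine, String className, String methodName) {}

    /** Returns "Class.method" identifiers whose line ranges contain a changed line. */
    static Set<String> changedMethods(List<FileInfoEntry> entries, List<Integer> changedLines) {
        Set<String> changed = new LinkedHashSet<>();
        for (int line : changedLines) {
            for (FileInfoEntry e : entries) {
                if (line >= e.startLine() && line <= e.endLine()) {
                    changed.add(e.className() + "." + e.methodName());
                }
            }
        }
        return changed;
    }
}
```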
7.3 Evaluation Design
RQ1: Effect of Dynamic Information for Cost-Effective Refactoring Identification
Before explaining the experimental design of RQ1, we define cost-effective refactorings. The cost-effectiveness
of applied refactorings can be described as maintainability improvement over invested refactoring effort (cost).
Therefore, refactoring X is more cost-effective than refactoring Y when the maintainability improvement relative
to the invested effort of applying refactoring X is larger than that of applying refactoring Y. With cost-effective
refactorings, less effort is required to accomplish the same maintainability improvement.
For RQ1, which aims to investigate whether dynamic information is helpful in identifying cost-effective
refactorings that quickly improve maintainability and lead to a high rate of maintainability improvement, we use
the method of change simulation. We compare the results of change simulation to observe how quickly the number
of propagated changes is reduced on the refactored models whose applied refactorings are identified using the
following three comparison groups.
1. The approach using dynamic information only (group 1)
2. The approach using static information only (group 2)
3. The combination of the two approaches (group 3)
For each subject, refactorings are identified from three comparison groups as follows. (1) In group 1, for
each iteration of refactoring identification process, 180 refactoring candidates (i.e., 18 rules [6 types of scoring
functions × 3 types of refactorings] × 10 top refactoring candidates) are assessed. In this experiment, the cutline
number—the threshold number for limiting the consideration set of refactoring candidates—is set to 10. The
best refactoring is selected and applied. We continue to perform the refactoring identification process until no
more refactoring candidates for improving maintainability are found. Finally, we obtain a sequence of
refactorings. (2) Group 2 follows the same approach as group 1 but substitutes SMCs for the DMCs (i.e., using
static measures instead of dynamic measures) in the refactoring-candidate extraction rules and the maintainability
evaluation function for extracting and assessing refactoring candidates, respectively. (3) In group 3, the consid-
ered refactoring candidates are the ones extracted from both approaches—refactoring candidates from group 1⋃
refactoring candidates from group 2—and the best refactoring is selected and applied in an iterative way (with the
same method of our study). The approach of group 2 is to test the effect of using dynamic information, and the
approach of group 3 is to test whether the dynamic information can be additional or complementary information
to the static information. Note that the aim of this test is not to compare the performance of the approach of using
dynamic information versus the approach of using static information but to investigate the effect of using dynamic
information for identifying cost-effective refactorings.
Change simulation. To assess the capability of refactorings for maintainability improvement, we use the change
simulation method. The basic idea is to inject changes—extracted as real changes that have occurred during
software maintenance—and then obtain propagated changes by performing change impact analysis. We call the
former change the original change and the latter change the propagated change. This evaluation method is based
on the belief that propagated changes indicate how the design can withstand original changes; in other words, the
more easily the design accommodates changes, the fewer propagated changes will occur. Therefore, the capabil-
ity of refactorings for maintainability improvement is measured by the reduced number of propagated changes.
For performing change simulation, the list of changed methods (explained in subsection 7.2) is used as original
changes (input for change impact analysis). Then, change impact analysis is performed on each of the refac-
tored model—produced by every application of a sequence of refactorings—to obtain propagated changes (output
for change impact analysis). Change impact analysis is a method to identify the potential consequences of a
change; therefore, the propagated changes are computed by taking the methods directly and indirectly affected
by the changed method. The change impact analysis used in the experiment is implemented as follows. For each
method in the list of changed methods, the propagated methods, which refer to this method but are defined in
other classes, are retrieved. Note that the change impact analysis is performed in batch processing mode.
Subsequently, the propagated methods or classes (the owners of the propagated methods) are accumulated for all
the methods in the list of changed methods. Indirectly propagated methods two steps away are weighted by 0.5,
while directly propagated methods are weighted by 1.
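The weighted propagation described above can be sketched as follows. The reverse-call-graph representation (callersOf) and all names are illustrative assumptions, not the experiment's actual implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// A sketch of the weighted change impact analysis described above: direct
// callers of a changed method count with weight 1.0, and callers of those
// callers (two steps) with weight 0.5. The reverse-call-graph representation
// is an illustrative assumption.
class ImpactAnalysisSketch {

    /**
     * callersOf maps a method to the methods (defined in other classes) that
     * refer to it; changedMethods may contain repeats, which accumulates the
     * effect of change frequency.
     */
    static double propagatedChanges(Map<String, Set<String>> callersOf,
                                    List<String> changedMethods) {
        double total = 0.0;
        for (String changed : changedMethods) {
            for (String direct : callersOf.getOrDefault(changed, Set.of())) {
                total += 1.0;                                         // directly propagated
                total += 0.5 * callersOf.getOrDefault(direct, Set.of()).size(); // two-step
            }
        }
        return total;
    }
}
```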
Two indicators for cost-effective refactorings. The cost-effectiveness of the identified refactorings can be
evaluated by observing how quickly the number of propagated changes is reduced. In our approach, we assume that
the invested refactoring effort (cost) is the number of applied refactorings. We note the limitation of estimating
refactoring cost as the number of applied refactorings; to estimate refactoring cost more accurately, we would
need to consider the effort needed to perform the activities of the entire refactoring process. We deal with this
issue in the discussion in Chapter 8. Two indicators are used: (1) the percentage of reduction for propagated
changes and (2) the rate of
reduction for propagated changes. The indicators can be calculated as follows. Let rn represent the nth applied
refactoring. Let the number of propagated changes for accommodating changes on the initial profiled model be
ic0, iclast on the design applying all the identified refactorings from the first to the last refactoring, and icn on
the design applying a sequence of identified refactorings r1, ..., rn. The percentage of reduction for propagated
changes (percntRPC) of rn is as below.
percntRPC(r_n) = \frac{ic_0 - ic_n}{ic_0 - \min\{ic_{last}\}} \times 100,
where min{x} returns the minimum number of propagated changes among all comparison groups, which is
needed for normalization. The rate of reduction for propagated changes (rateRPC) between rm and rn, when rm
precedes rn, is calculated by the differences of percentage of reduction for propagated changes over the number
of applied refactorings as below.
rateRPC(r_m, r_n) = \frac{percntRPC(r_n) - percntRPC(r_m)}{\text{\# of applied refactorings between } r_m \text{ and } r_n}.
In this experiment, the rate of reduction for propagated changes is considered for every applied refactoring;
therefore, it reduces to percntRPC(r_n) − percntRPC(r_{n−1}), since the number of applied refactorings between
consecutive refactorings is 1.
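The two indicators defined above can be expressed directly; the parameter names below mirror the symbols in the formulas and are otherwise hypothetical.

```java
// A sketch of the two cost-effectiveness indicators defined above; the
// parameter names mirror the symbols in the formulas.
class IndicatorsSketch {

    /**
     * Percentage of reduction for propagated changes after the n-th
     * refactoring, normalized by the minimum number of propagated changes
     * (minIcLast) among all comparison groups.
     */
    static double percntRPC(double ic0, double icn, double minIcLast) {
        return (ic0 - icn) / (ic0 - minIcLast) * 100.0;
    }

    /** Rate of reduction for propagated changes between refactorings r_m and r_n. */
    static double rateRPC(double percntAtM, double percntAtN, int refactoringsBetween) {
        return (percntAtN - percntAtM) / refactoringsBetween;
    }
}
```

For example, with ic0 = 100, icn = 40, and a minimum of 20 propagated changes among all groups, the percentage of reduction is 75%.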
(a) jEdit  (b) Columba  (c) JGIT
Figure 7.1: Change distribution graph (X-axis: # of occurred changes for each class, Y-axis: # of corresponding
classes)
RQ2: Effect of Dynamic Information for Extracting Refactoring Candidates in Frequently Changed Classes
The underlying assumption of our approach is that changes are more prone to occur in the pieces of code that
users utilize more often and that, hence, applying refactorings to these parts would quickly improve the
maintainability of the software. For this reason, we extract refactoring candidates in the entities involved in
given scenarios/functions of a system, in a way that reduces the dependencies of those entities.
To validate the assumption, we test RQ2, which aims to investigate whether dynamic information is helpful
in extracting refactoring candidates in the classes where real changes had frequently occurred. For each subject,
we compare a) the classes of the top 10%, 20%, 30% and 100% (i.e., all changes) most frequently changed during
the real development history (explained in subsection 7.2) with b) the classes of refactoring candidates extracted
from the approach using dynamic information and the approach using static information, respectively, to observe
how many classes extracted as refactoring candidates are found in real changed classes. In addition to all changes,
we consider top k% (k = 10, 20, 30) most frequently changed classes to examine the capability of each approach
for extracting refactoring candidates from highly-ranked frequently changed classes. We have considered up to
the top 30% most frequently changed classes, because, for three subjects—jEdit, Columba, and JGIT—most of the
changes occur in those changed classes (see Fig. 7.1). For instance, the ratio of the number of occurred changes
in the top 30% most frequently changed classes over the total number of occurred changes in changed classes is
143 out of 207 ≈ 70%, 788 out of 993 ≈ 80%, and 8754 out of 9773 ≈ 90%, for jEdit, Columba, and JGIT,
respectively. Note that, as with RQ1, the aim of this test is not to compare the performance of the approach of
using dynamic information versus the approach of using static information but to investigate the effect of using
dynamic information for extracting refactoring candidates in the frequently changed classes.
We use the classes of refactoring candidates that are obtained from group 1 (i.e., the approach using dynamic
information only) and group 2 (i.e., the approach using static information only) in RQ1. The lists of the extracted
classes as refactoring candidates for each approach and the list of real changed classes are ranked in a descend-
ing order according to the number of occurred changes for each class. Let the ranked list of the top k% most
frequently changed classes be rankKChanged, the ranked list of the classes extracted as refactoring candidates
for group 1 be rankDynamic, and the ranked list of group 2 be rankStatic. By comparing rankKChanged
with rankStatic and rankDynamic, we first obtain (1) commonly found classes (i.e., intersect set). We then
obtain (2) the two distance measures (K: Kendall’s tau, F: Spearman’s footrule [27]), which are the measures for
comparing similarity of two top k lists.
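A simplified version of one of these distances, Spearman's footrule, can be sketched as below. This variant sums rank differences only over items the two lists share; the full top-k variants of Fagin et al. add conventions for items missing from one list. The class and method names are hypothetical.

```java
import java.util.List;

// A simplified sketch of Spearman's footrule distance between two ranked
// lists, summed over the items they have in common; the full top-k variants
// handle items missing from one of the lists with extra conventions.
class FootruleSketch {

    /** Sum of |rank in a - rank in b| over items appearing in both lists. */
    static int footrule(List<String> a, List<String> b) {
        int distance = 0;
        for (int i = 0; i < a.size(); i++) {
            int j = b.indexOf(a.get(i));        // -1 when the item is absent from b
            if (j >= 0) distance += Math.abs(i - j);
        }
        return distance;
    }
}
```

A smaller distance means the extracted refactoring-candidate ranking agrees more closely with the ranking of frequently changed classes.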
RQ3: Effect of Multiple Refactorings
To investigate the effect of multiple refactorings (RQ3), we compare our approach—whose refactoring can-
didates are from both top-down and bottom-up approaches, (1) Collapse Class Hierarchy refactorings (i.e., big
refactorings) and Move Method refactorings identified from the rule-based identification and (2) MISs, which
should be supported on assessment and impact analysis of elementary refactorings—with the approach whose
refactorings candidates are only (1). To evaluate the capability of the selected refactorings for maintainability
fitness = \frac{\dfrac{MSC_{avg} - MSC_{min}}{MSC_{max} - MSC_{min}}}{w_1 \dfrac{DC_{avg} - DC_{min}}{DC_{max} - DC_{min}} + w_2 \dfrac{SC_{avg} - SC_{min}}{SC_{max} - SC_{min}} + w_3 \dfrac{DCS_{avg} - DCS_{min}}{DCS_{max} - DCS_{min}} + w_4 \dfrac{SCS_{avg} - SCS_{min}}{SCS_{max} - SCS_{min}}}
Figure 7.2: Maintainability evaluation function for producing fitness value.
improvement, we use the maintainability evaluation function [38] as a fitness function to measure maintainability
of the refactored design.
Maintainability evaluation function. In search-based approaches, to combine multiple objectives into a
single-objective function, either (1) the metrics for each objective are normalized, weighted, and added up
[81, 61], or (2) Pareto optimality [43] is used. We adopt the former approach to conflate two metric objectives,
high cohesion and low coupling, into a single fitness function. We design the maintainability evaluation function
as (cohesion / coupling), because a function of this design produces larger fitness values as the software becomes
more maintainable (with higher cohesion and lower coupling). In addition, the two objectives may conflict in
many cases, and a function of this design prevents the merging of unrelated units of code, which would reduce
coupling but lower cohesion.
Fig. 7.2 shows the formulation of the maintainability evaluation function, which produces the fitness value
of the refactored model. Each metric is normalized in the following way: the difference between the average
and minimum values is divided by the difference between the maximum and minimum values of the metric. The
average value of the metric is obtained by summing all the values of the classes and dividing this by the number
of classes. For composing all coupling metrics, weight values, whose total sum is one, are multiplied to each
normalized coupling metric, then all the normalized coupling metrics are added up. Note that by using the weight
values, a user can decide to focus on certain aspects of the maintainability evaluation function. In our approach,
we assign a weight value of 0.25 for each coupling metric.
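The normalization and weighting scheme described above can be sketched as follows; the Stats record and all names are illustrative helpers, not the tool's actual code.

```java
// A sketch of the maintainability evaluation function of Fig. 7.2: the
// normalized cohesion metric (MSC) divided by a weighted sum of the four
// normalized coupling metrics. The Stats record is an illustrative helper.
class FitnessSketch {

    /** Average, minimum, and maximum of a metric over all classes. */
    record Stats(double avg, double min, double max) {
        double normalized() { return (avg - min) / (max - min); }
    }

    /** The weights w1..w4 are expected to sum to one (0.25 each in the thesis). */
    static double fitness(Stats msc, Stats dc, Stats sc, Stats dcs, Stats scs,
                          double w1, double w2, double w3, double w4) {
        double coupling = w1 * dc.normalized() + w2 * sc.normalized()
                        + w3 * dcs.normalized() + w4 * scs.normalized();
        return msc.normalized() / coupling;
    }
}
```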
In the following, each metric constituting the maintainability evaluation function is explained. For cohesion,
the Method Similarity Cohesion (MSC) [11] metric is used. In this metric, the similarity of all pairs of methods
is integrated and normalized to measure how cohesive the class is. Its difference from other cohesion metrics is
that it considers the degree of similarity between a pair of methods in a class. For instance, Lack of Cohesion
in Methods (LCOM) [18] does not account for the degree of similarity between methods; instead, it categorizes
the sets into two groups, empty and non-empty, and produces the same result for a pair of methods whether it
shares one instance variable or all instance variables in common. Another cohesion metric, Cohesion Among
Methods in Class (CAMC) [7], is not considered, because it deals only with parameter types (not the usage of
instance variables or methods). MSC for a class C is calculated as follows:
MSC(C) = \frac{2}{n(n-1)} \sum_{i=1}^{n(n-1)/2} \left( \frac{|IV_c|}{|IV_t|} \right)_i,
where class C has n methods; for a pair of methods, |IV_c| and |IV_t| stand for the numbers of common (i.e.,
intersection) and total (i.e., union) instance variables used by the pair of methods, respectively. Since there
are n(n−1)/2 distinct pairs of methods in a class, i ranges from 1 (the first pair) to n(n−1)/2 (the last pair),
and (|IV_c|/|IV_t|)_i indicates the similarity of the i-th pair of methods.
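A direct reading of the MSC formula can be sketched as follows; representing each method by the set of instance variables it uses is an illustrative simplification.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// A sketch of the MSC metric: for every distinct pair of methods in a class,
// the similarity |IVc| / |IVt| (shared vs. total instance variables) is
// summed and normalized over the n(n-1)/2 pairs.
class MscSketch {

    /** Each element of methods is the set of instance variables that a method uses. */
    static double msc(List<Set<String>> methods) {
        int n = methods.size();
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                Set<String> common = new HashSet<>(methods.get(i));
                common.retainAll(methods.get(j));       // IVc: intersection
                Set<String> total = new HashSet<>(methods.get(i));
                total.addAll(methods.get(j));           // IVt: union
                sum += total.isEmpty() ? 0.0 : (double) common.size() / total.size();
            }
        }
        return 2.0 * sum / (n * (n - 1));
    }
}
```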
For coupling, four coupling metrics—defined based on the DMCs as well as the SMCs—are used. The size
of the DMCs in both directions for all methods in a class C is defined as Dynamic Coupling (DC). DC can be
specified as
DC(C) = \sum_{m_i} |DMC(m_i, \prec)| + |DMC(m_i, \succ)|,
where the sum is taken over every method m_i in class C. In the same way, the size of the SMCs, measured on a
method between caller and callee classes, in both directions for all methods in a class C, is defined as Static
Coupling (SC). Let SMC(ε, δ) denote the list of SMCs retrieved with respect to the entity ε and the direction δ,
analogous to DMC(ε, δ). Then, SC can be specified as
SC(C) = \sum_{m_i} |SMC(m_i, \prec)| + |SMC(m_i, \succ)|,
where the sum is taken over every method m_i in class C. Modified versions of DC and SC, named DCS and SCS,
are defined by converting the lists of DMCs and SMCs into sets. In other words, redundant elements are
eliminated from the lists of DMC and SMC to discount the effect of the strength of dependencies; therefore, only
distinct elements remain in the sets DMCS and SMCS. Each of the couplings DCS and SCS is specified as follows.
DCS(C) = \sum_{m_i} |DMCS(m_i, \prec)| + |DMCS(m_i, \succ)|,
SCS(C) = \sum_{m_i} |SMCS(m_i, \prec)| + |SMCS(m_i, \succ)|.
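The list-vs-set distinction between DC and DCS (and likewise SC and SCS) can be sketched as follows; representing the import- and export-direction calls of a class as per-method lists is an illustrative assumption.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// A sketch of DC and DCS for one class: DC sums the sizes of the import- and
// export-direction call lists of every method, while DCS counts only distinct
// elements per list (SC and SCS are analogous, using static method calls).
class CouplingSketch {

    /** imports/exports map each method of the class to its list of coupled methods. */
    static int dc(Map<String, List<String>> imports, Map<String, List<String>> exports) {
        int size = 0;
        for (List<String> calls : imports.values()) size += calls.size();
        for (List<String> calls : exports.values()) size += calls.size();
        return size;
    }

    /** Same as dc, but duplicates within each list are ignored (set semantics). */
    static int dcs(Map<String, List<String>> imports, Map<String, List<String>> exports) {
        int size = 0;
        for (List<String> calls : imports.values()) size += new HashSet<>(calls).size();
        for (List<String> calls : exports.values()) size += new HashSet<>(calls).size();
        return size;
    }
}
```

A method invoking the same callee twice contributes 2 to DC but only 1 to DCS, which is exactly how DCS discounts the strength of dependencies.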
It is worth mentioning that the four couplings we consider capture eight types of combinations: (dynamic
method call vs. static method call) × (import direction vs. export direction) × (distinct methods [set] vs. all
invoked methods [list]). They cover not only many well-known coupling metrics but also additional features
(i.e., dynamic aspects). For instance, Message Passing Coupling (MPC) [57] counts static method calls for all
invoked methods in the import direction, and Response For a Class (RFC) [18] counts static method calls for
distinct methods in the import direction, while Coupling Between Objects (CBO) [18] counts static method calls
for distinct methods in both directions. Coarse-grained metrics, such as Coupling Factor (CF) [13], are not
considered, because they are measured based on the number of coupled classes, not on methods. All the mentioned
coupling metrics capture only static aspects, which are based on static method calls that can be obtained by
analyzing source code without running the program.
Table 7.4: Fitness function values of the original design.
jEdit Columba jGit
Fitness fn. 0.023287 0.023117 0.021357
Table 7.4 shows the fitness function values of the original design (i.e., before applying selected refactorings).
For measuring the computation and search space exploration costs of the approach, we use the number of
iterations of the selection process and the elapsed time. The elapsed time is measured under the following
conditions: a 1.8 GHz Intel Core i5 processor, 8 GB of 1600 MHz DDR3 memory, Intel HD Graphics 4000 with
512 MB, and OS X 10.8.2. We compare the two approaches on fitness function values (maintainability
improvement), the number of iterations, and the elapsed time (computation cost and search space exploration cost).
RQ4: Effect of the RED
To investigate the effect of the RED (RQ4), we compare the approach considering the RED (our approach)
with the approach without considering the RED. We compare these two approaches on fitness function values
(maintainability improvement) as well. To analyze the different results of the two approaches, the accumulated
deviation is measured as follows.

\sum_{i=0}^{\#iterations} |Expected_i - Actual_i|,

where Expected_i and Actual_i are the expected and actual maintainability (i.e., external links assessed in the
refactoring effect evaluation framework) on the i-th iteration, respectively.
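The accumulated deviation above is a simple sum of absolute differences, which can be sketched as follows; names are illustrative.

```java
// A sketch of the accumulated deviation between the expected and actual
// maintainability values over all iterations, as defined above.
class DeviationSketch {

    /** expected[i] and actual[i] hold the values observed on the i-th iteration. */
    static double accumulatedDeviation(double[] expected, double[] actual) {
        double sum = 0.0;
        for (int i = 0; i < expected.length; i++) {
            sum += Math.abs(expected[i] - actual[i]);
        }
        return sum;
    }
}
```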
7.4 Results
7.4.1 Dynamic Information-based Identification of Refactoring Candidates
RQ1: Effect of Dynamic Information for Cost-Effective Refactoring Identification
The results are represented in Fig. 7.3, Fig. 7.4, and Fig. 7.5 for jEdit, Columba, and JGIT, respectively;
the x-axis shows each applied refactoring, and the y-axis shows the number of propagated changes of methods or
classes needed to accommodate the original changes. In addition, Table 7.5 summarizes the percentage of reduction
for propagated changes (i.e., methods) and the rate of reduction for propagated changes (i.e., methods) for each
applied refactoring.
For jEdit, as in Fig. 7.3(a) and Fig. 7.3(b), the same number of propagated changes is reduced for all
approaches in the first applied refactoring. However, from the next applied refactorings, the approaches using
dynamic information (group 1 and group 3) reduce the number of propagated changes faster than the approach
using static information only (group 2) does. For this reason, to reach the same number of reduced propagated
changes—for example, where the percentage of reduction for propagated changes is around 72% ∼ 75%—the
required numbers of refactoring application are 5, 6, and 7 for group 3, group 1, and group 2, respectively. As a
result, the average rate of reduction for propagated changes over all nine applied refactorings for the approaches
using dynamic information (group 1 and group 3) is higher than that of the approach using static information
only (group 2). For instance, the average rates of reduction for propagated changes of all nine applied refactorings
are 11.11%, 10.56%, and 9.44% for group 3, group 1, and group 2, respectively. Furthermore, at the final solution,
where propagated changes do not drop anymore, the number of reduced propagated changes of the approaches
using dynamic information (group 1 and group 3) is greater than that of the approach using static information only
(group 2). For instance, when positing the total percentage of reduction of propagated changes for the combination
of the two approaches (group 3) as 100%, then those for group 1 and group 2 are 95% and 85%, respectively.
For Columba, as in Fig. 7.4(a) and Fig. 7.4(b), the approaches using dynamic information (group 1 and
group 3) also reduce the number of propagated changes much faster and to a greater extent than the approach using static
information only (group 2) does. For this reason, as in jEdit, to reach the same number of reduced propagated
changes—for example, where the percentage of reduction for propagated changes is around 75% ∼ 76%—the
required numbers of refactoring application are 4, 6, and 10 for group 3, group 1, and group 2, respectively. As a
result, for instance, the average rates of reduction for propagated changes over all 11 applied refactorings are 9.09%,
7.67%, and 7.10% for group 3, group 1, and group 2, respectively. In addition, when positing the total percentage
of reduction for propagated changes for the combination of the two approaches (group 3) as 100%, then those for
group 1 and group 2 are 85% and 78%, respectively. It is also worth mentioning that in Columba, the absolute
scale of reduction for propagated changes is relatively small, because there are not many revisions to be retrieved.
Referring to the report period (in Table 7.1), we assume that the maturity level of the development for Columba is
relatively lower than jEdit, and Columba may still be in the development process. In fact, jEdit has been developed
and maintained for over ten years. Furthermore, the developers are much smaller, while the size of the program is
much bigger than jEdit’s; they may not have exerted as much effort for the revisions as jEdit does.
For JGIT, as shown in Fig. 7.5(a) and Fig. 7.5(b), group 1 reduces the number of propagated changes faster than
group 2 does, though only from the fourth to the eighth applied refactoring. Even at the final solution, the number
of reduced propagated changes of group 1 is smaller than that of group 2. The average rates of reduction for
propagated changes over the total of 12 applied refactorings are 8.33%, 5.13%, and 6.85% for group 3, group 1,
and group 2, respectively. In addition, taking the total percentage of reduction for propagated changes for the
combination of the two approaches (group 3) as 100%, those for group 1 and group 2 are 62% and 82%,
respectively.
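The two indicators used throughout this section (and tabulated in Table 7.5) can be derived from the raw series of propagated-change counts. The following is a minimal sketch, not the thesis's implementation; it assumes that both indicators are normalized by the total reduction reached at the final solution (which matches the 100% final values in the tables), and all names are illustrative.

```python
def reduction_indicators(propagated):
    """Derive the two indicators from a series of propagated-change counts.

    propagated[0] is the count before any refactoring; propagated[i] is the
    count after the i-th applied refactoring. Assumption: normalization is
    by the total reduction reached at the final solution."""
    initial, final = propagated[0], propagated[-1]
    total_reduction = initial - final
    pct, rate = [], []
    for i in range(1, len(propagated)):
        # cumulative percentage of reduction after i applied refactorings
        pct.append(100.0 * (initial - propagated[i]) / total_reduction)
        # per-step rate contributed by the i-th refactoring alone
        rate.append(100.0 * (propagated[i - 1] - propagated[i]) / total_reduction)
    return pct, rate
```

With this normalization, the percentages climb to 100% at the final solution and the per-step rates sum to 100%, as in Table 7.5.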
In contrast to the results for jEdit and Columba, for JGIT, group 1 does not outperform group 2. By
analyzing the revision history and source code of JGIT, we made the following observations that can explain
this result. JGIT is a distributed version control system and provides many special features for working
in a distributed environment at high speed. Since the most common usage scenarios for a version control
system are committing, pushing, cloning, or pulling files in a repository, we mostly captured these normal
scenarios when performing dynamic profiling. However, real changes—which had occurred in the examined
Table 7.5: Indicators of cost-effective refactorings: (1) percentage of reduction for propagated changes and (2) rate
of reduction for propagated changes, for jEdit, Columba, and JGIT.

jEdit           Percentage of reduction (%)      Rate of reduction (%)
# applied      Dyn+Stat   Static   Dynamic      Dyn+Stat   Static   Dynamic
    1             30        30       30            30        30       30
    2             42.5      37.5     40            12.5      7.5      10
    3             50        42.5     52.5          7.5       5        12.5
    4             60        47.5     60            10        5        7.5
    5             72.5      60       65            12.5      12.5     5
    6             82.5      60       77.5          10        0        12.5
    7             90        75       85            7.5       15       7.5
    8             95        85       90            5         10       5
    9             100       85       95            5         0        5
  Avg.            69.17     58.06    66.11         11.11     9.44     10.56

Columba         Percentage of reduction (%)      Rate of reduction (%)
# applied      Dyn+Stat   Static   Dynamic      Dyn+Stat   Static   Dynamic
    1             31.3      6.3      31.3          31.3      6.3      31.3
    2             46.9      23.4     43.8          15.6      17.2     12.5
    3             59.4      35.9     50.0          12.5      12.5     6.3
    4             75.0      45.3     65.6          15.6      9.4      15.6
    5             84.4      57.8     73.4          9.4       12.5     7.8
    6             87.5      65.6     76.6          3.1       7.8      3.1
    7             89.1      70.3     78.1          1.6       4.7      1.6
    8             92.2      73.4     79.7          3.1       3.1      1.6
    9             96.9      75.0     81.3          4.7       1.6      1.6
   10             98.4      76.6     82.8          1.6       1.6      1.6
   11             100.0     78.1     84.4          1.6       1.6      1.6
  Avg.            78.27     55.26    67.90         9.09      7.10     7.67

JGIT            Percentage of reduction (%)      Rate of reduction (%)
# applied      Dyn+Stat   Static   Dynamic      Dyn+Stat   Static   Dynamic
    1             9.46      9.46     2.23          9.46      9.46     2.23
    2             14.33     14.33    3.01          4.87      4.87     0.78
    3             18.88     18.74    12.48         4.55      4.41     9.47
    4             28.71     21.57    21.43         9.83      2.83     8.95
    5             48.70     23.79    43.74         19.99     2.22     22.31
    6             51.53     27.78    45.55         2.83      3.99     1.81
    7             57.98     30.98    51.30         6.45      3.20     5.75
    8             60.76     40.21    51.95         2.78      9.23     0.65
    9             81.26     63.96    52.50         20.50     23.75    0.55
   10             92.72     75.05    56.63         11.46     11.09    4.13
   11             93.60     77.50    61.54         0.88      2.45     4.91
   12             100.00    82.24    61.54         6.40      4.74     0.00
  Avg.            54.83     40.47    38.66         8.33      6.85     5.13

† # applied: number of applied refactorings. Dyn+Stat: Dynamic + Static.
revisions of the development history of JGIT—are related to developing and correcting errors in algorithms
that are not frequently used but contain critical functions, and the complexity of these algorithms is high. An
example of such an algorithm is packing; JGIT stores each newly created object as a separate file, which takes
a great deal of space and is inefficient. Thus, periodic packing of the repository is required to maintain space
efficiency, and this requires very complex computation. For the reasons stated above, in JGIT, group 1 may rarely
identify refactorings in those parts of the algorithms; thus, the percentage of reduction for propagated changes and
the rate of reduction for propagated changes are rather small. However, the combination of the two approaches
(group 3) still outperforms the approach using static information alone (group 2). Clearly, some of the
refactoring candidates—not identified by the approach using static information (group 2) and identified only by the
approach using dynamic information (group 1)—contribute to improving maintainability even faster. The two
approaches are thus mutually complementary, so it can be said that using dynamic information in addition
to static information helps to improve maintainability even faster.
From the results presented above, we can conclude that, in the three subjects—jEdit, Columba, and JGIT—
dynamic information is helpful in identifying cost-effective refactorings that quickly improve maintainability; moreover,
considering dynamic information in addition to static information provides even more opportunities to identify
cost-effective refactorings, owing to the refactoring candidates that are uniquely identified by the approach using
dynamic information only.
RQ2: Effect of Dynamic Information for Extracting Refactoring Candidates in Frequently Changed Classes
For each subject—jEdit, Columba, and JGIT—the classes commonly found (i.e., the intersection set) by the approach
using static information and the approach using dynamic information are presented in Table 7.6. The intersection set
is represented as (1) the number of classes (Class #) and (2) the number of changes that occurred in those classes
(Change #). An asterisk (*) marks the better solution (i.e., the one in which a greater number
of classes, or a greater number of changes in those classes, are commonly found). For two subjects,
jEdit and Columba, the approach using dynamic information (group 1) finds more of the classes extracted as
refactoring candidates among the classes where real changes had occurred. For JGIT, the approach using dynamic
information (group 1) finds more such classes only among the top
10% and 20% most frequently changed classes.
On the other hand, the two distance measures (K: Kendall's tau, F: Spearman's footrule [27]) between the approach
using static information and the approach using dynamic information are presented in Table 7.7. The distance
measures count the number of pairwise disagreements between two top-k ranked lists. Therefore, the larger the
distance, the more dissimilar the two top-k ranked lists are; conversely, the smaller the distance, the more similar
they are. An asterisk (*) again marks the better solution (i.e., the one
with the smaller distance measure). As with the results for the commonly found classes, for two subjects, jEdit
and Columba, the ranked lists of classes extracted as refactoring candidates by the approach using dynamic
information (group 1) are more similar to the ranked list of the real changed classes. For JGIT, in the approach
Table 7.6: Commonly found classes between the real changed classes and the classes extracted as refactoring
candidates, for the approach using static information and the approach using dynamic information.

jEdit            Changed             Static ∩ Changed     Dynamic ∩ Changed
Top %        Class #  Change #     Class #  Change #     Class #  Change #
 10.00%         7        67           6        60           7*       67*
 20.00%        16       110           7        64          10*       79*
 30.00%        27       143           9        70          12*       85*
100.00%        72       207          19        82          21*       99*

Columba          Changed             Static ∩ Changed     Dynamic ∩ Changed
Top %        Class #  Change #     Class #  Change #     Class #  Change #
 10.00%        27       458           9       220          14*      269*
 20.00%        55       624          13       241          15*      275*
 30.00%        79       788          16       251          17*      282*
100.00%       265       993          22       260          24*      292*

JGIT             Changed             Static ∩ Changed     Dynamic ∩ Changed
Top %        Class #  Change #     Class #  Change #     Class #  Change #
 10.00%        20      5039           9      2872          10*     2899*
 20.00%        50      7296          19      3709          22*     3758*
 30.00%        91      8754          27*     3992*         26      3938
100.00%       258      9773          44*     4109*         33      3998

† The asterisk (*) marks the better solution (i.e., the one in which a greater number of classes,
or a greater number of changes in those classes, are commonly found).
Table 7.7: Top-k ranking distance measures (K: Kendall's tau; F: Spearman's footrule [27]) between the real
changed classes and the classes extracted as refactoring candidates, for the approach using static information and
the approach using dynamic information.

                  jEdit                        Columba                           JGIT
           Static       Dynamic         Static        Dynamic           Static          Dynamic
Top %      K      F     K       F       K       F     K        F        K        F      K        F
 10.00%   68.5   69     53*    52*     397     207    258.5*  177*    1779.5    695    1284.5*  449.25*
 20.00%  158     69.25 129.5*  65.25* 1000.5   712    791*    659*    3494     1198.25 2702*   1026*
 30.00%  256.5  105    194*    96*    1785    1653   1617.5* 1520*    4411.5*  2099*   5489.5  2199.25
100.00% 1229   1329.25 1003.5* 1196.25* 15857 19824  15655.5* 19328* 26499*   22752*  29623   22974.25

† The asterisk (*) marks the better solution (i.e., the one with the smaller distance measure).
using dynamic information (group 1), the ranked lists of classes extracted as refactoring candidates are more
similar only to the ranked lists of the top 10% and 20% most frequently changed classes.
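The two distance measures can be sketched as follows for plain permutations of the same items; note that the top-k variants of Fagin et al. [27] add conventions for items appearing in only one of the two lists, which this sketch omits.

```python
from itertools import combinations

def kendall_tau_distance(a, b):
    """Count pairwise disagreements between two rankings of the same items."""
    pos_b = {item: i for i, item in enumerate(b)}
    return sum(
        1
        for x, y in combinations(a, 2)   # x precedes y in ranking a...
        if pos_b[x] > pos_b[y]           # ...but y precedes x in ranking b
    )

def spearman_footrule(a, b):
    """Sum each item's absolute displacement between the two rankings."""
    pos_b = {item: i for i, item in enumerate(b)}
    return sum(abs(i - pos_b[item]) for i, item in enumerate(a))
```

A distance of 0 means the two ranked lists agree completely; reversing a ranking maximizes both measures.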
The results presented above for the three subjects—jEdit, Columba, and JGIT—show that dynamic information
is helpful in extracting refactoring candidates from the classes where real changes had occurred. In addition, overall,
the approach using dynamic information even outperforms the approach using static information in finding
frequently changed classes. Even though the former approach is not always better than the latter, we find
that a correlation does exist between the frequently changed classes and the refactoring-candidate classes
extracted by the approach using dynamic information. The results offer promising support for using dynamic
information to extract refactoring candidates from highly ranked frequently changed classes and, further, suggest
that using dynamic information in addition to static information can be a great help for cost-effective refactoring
identification.
7.4.2 RED-aware Grouping of Multiple Elementary Refactorings
RQ3: Effect of Multiple Refactorings
Table 7.8 summarizes the results obtained when each approach has reached the final solution (i.e., no more refac-
torings that improve maintainability are found): fitness function values (maintainability improvement), the number
of iterations, and the elapsed time (computation cost and search space exploration cost) for jEdit, Columba, and
JGIT, respectively. The graphs in Fig. 7.6 present the visual results. In all three projects, the fitness
Table 7.8: Results of the effect of multiple refactorings.

Subject   Comparator           Iteration (#)   Fitness fn.   Elapsed Time (sec)
jEdit     Rulebased RCs only        154        0.030322          431.89
          Our approach               23        0.032312          241.12
Columba   Rulebased RCs only        220        0.036951          581.53
          Our approach               41        0.038132          205.74
jGit      Rulebased RCs only         43        0.022549          198.77
          Our approach               72        0.026701          232.68

† Rulebased RCs: refactoring candidates from rule-based identification.
† Rulebased RCs only: approach without MISs.
† Our approach: approach with Rulebased RCs + MISs.
† Iteration (#) and Elapsed Time (sec) reflect the computation cost and search space exploration cost.
function values of our approach (the approach of selecting multiple refactorings) are greater than those of the approach
of refactoring identification using the Rule-based RCs only (without MISs): jEdit (0.032312 > 0.030322), Columba
(0.038132 > 0.036951), and jGit (0.026701 > 0.022549). In addition, the total number of iterations and the time
taken to reach the solution are much smaller for our approach than for the approach of refactoring identification us-
ing the Rule-based RCs only (without MISs): # of iterations—jEdit (23 < 154) and Columba (41 < 220); time
Table 7.9: Results of the effect of the RED.

Subject   Comparator     Fitness fn.   Accumulated Deviation
jEdit     Not RED        0.032379           9246
          Our approach   0.033472              0
Columba   Not RED        0.030720          40758
          Our approach   0.037123              0
jGit      Not RED        0.023602          13058
          Our approach   0.028192              0

† Not RED: approach without considering the RED.
† Our approach: approach considering the RED.
(sec)—jEdit (241.12 < 431.89) and Columba (205.74 < 581.53).
We take a closer look at the graphs in terms of the number of iterations (Fig. 7.7). The graphs for jEdit (Fig.
7.7(a)), Columba (Fig. 7.7(b)), and JGIT (Fig. 7.7(c)) present the effect of multiple refactorings
on the number of iterations; the x-axis shows the number of iterations, and the y-axis shows the
fitness function values (cohesion / coupling). In both jEdit and Columba, the fitness function values grow quickly in our
approach; in contrast, they grow only gradually in the approach of refactoring identification using the Rule-based RCs only (without MISs).
Computing MISs in the first step of the selection of multiple
refactorings incurs a certain overhead (denoted as the processing time in the graph). However, the benefit of the reduced time outweighs this
overhead.
In jGit, the approach of refactoring identification using the Rule-based RCs only (without MISs) faces the
local optimum problem. By examining the logged results, we made the following observations. First, during
the iterative process of refactoring selection, the approach using the Rule-based RCs
only (without MISs) repeatedly selects refactorings in the same place; thus, the second-best refactoring is rarely selected.
In contrast, our approach selects refactorings globally, which helps to prevent this problem. Second, the ap-
proach using the Rule-based RCs only (without MISs) kills other Move Method
refactoring opportunities. Sometimes a group of smaller refactorings is more useful, as in our approach.
From the results, we can conclude that our approach selects refactorings that lead the software design to
higher fitness function values (better maintainability improvement) at smaller cost (i.e., smaller search space
exploration cost and shorter time). Even though computing MISs incurs a certain overhead at the very beginning
of our approach, the benefit of the reduced cost outweighs this overhead. Furthermore, in some projects, the approach
of refactoring identification using the Rule-based RCs only (without MISs) may face the local optimum problem.
Our approach tends to perform better at avoiding local optima by selecting refactorings globally.
RQ4: Effect of the RED
Table 7.9 summarizes the results obtained when each approach has reached the final solution (i.e., no more
refactorings that improve maintainability are found): fitness function values (maintainability improvement) for
jEdit, Columba, and JGIT, respectively. In all three projects, the fitness function values of our approach (the
approach considering the RED) are greater than those of the approach without considering the RED: jEdit (0.033472 >
0.032379), Columba (0.037123 > 0.030720), and jGit (0.028192 > 0.023602). To explain the different results of
the two approaches, the deviation between actual and expected maintainability (i.e., the external links assessed in the
refactoring effect evaluation framework) is measured for each iteration of the selection process and accumulated.
The accumulated deviations of the approach without considering the RED are 9246, 40758, and 13058 for jEdit,
Columba, and jGit, respectively. This can be interpreted as the amount by which the approach without considering the
RED miscalculates the maintainability of the suggested groups of refactorings; consequently, a group of refactorings that
does not best improve maintainability could be selected.
From the results, we can conclude that when selecting multiple refactorings, considering the RED is
important to correctly identify the group of refactorings that best improves maintainability. Even when the
refactorings are not syntactically dependent, the RED should be considered when selecting multiple refactorings.
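The accumulated-deviation indicator reported in Table 7.9 can be sketched as follows; the thesis measures the deviation in terms of external links assessed in the refactoring effect evaluation framework, while the pairs used here are purely illustrative.

```python
def accumulated_deviation(iterations):
    """Sum, over all selection iterations, the absolute gap between the
    maintainability expected by the evaluation framework and the value
    actually measured after applying the selected group of refactorings.

    `iterations` yields (expected, actual) pairs; names are illustrative."""
    return sum(abs(actual - expected) for expected, actual in iterations)
```

A RED-aware selection drives every per-iteration gap to zero, which is why our approach accumulates a deviation of 0 in Table 7.9.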
7.5 Threats to Validity
We assume that the cost of each refactoring is the same; therefore, the number of applied refactorings is
regarded as the refactoring cost (effort). However, the number of applied refactorings does not actually reflect
the effort required to apply them. For practical use of our approach, several factors need to be considered; a more
detailed discussion is provided in Chapter 8.
The capability of the identified refactorings to improve maintainability is assessed by using the change
simulation method. In the experiment, we obtained changes from the change history as the input of the change
impact analysis. For changes obtained from the change history, it would be preferable to extract intentional changes by
excluding the ripple effects—which the intentional changes necessitated—from the obtained changes, perform change
impact analysis for those intentional changes, and compare the results of the change impact analysis. However,
discerning intentional changes among the obtained changes is not feasible, because it is nontrivial to identify
whether a change is an intentional change or a ripple effect; therefore, we did not use intentional changes as
the input of the change impact analysis. Instead, we use the obtained changes (i.e., input as original changes) and then
perform change impact analysis to identify the potential consequences (i.e., output as propagated changes) of
those obtained changes.
For implementing the change impact analysis, two steps of direct and indirect propagation are considered,
using different weight values. Further steps of indirect propagation could also be considered.
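The two-step propagation can be sketched as a weighted traversal of a dependency graph; the graph encoding, weight values, and names below are illustrative assumptions, not the thesis's implementation.

```python
def propagated_changes(deps, changed, w_direct=1.0, w_indirect=0.5):
    """Two-step change impact sketch.

    `deps` maps a method to the methods that depend on it. Methods one
    step from a changed method receive weight w_direct; methods two steps
    away receive w_indirect (the weight values are illustrative)."""
    impact = {}
    for m in changed:
        for direct in deps.get(m, ()):
            # first propagation step: direct dependents always get the
            # (higher) direct weight, upgrading any earlier indirect hit
            impact[direct] = max(impact.get(direct, 0.0), w_direct)
            for indirect in deps.get(direct, ()):
                # second propagation step: counted only if not already
                # reached directly
                impact.setdefault(indirect, w_indirect)
    return impact
```

Adding a third propagation step with a still smaller weight would implement the "further step of indirect propagation" mentioned above.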
To evaluate the capability of the selected refactorings, we use the maintainability evaluation function, which
is based on coupling and cohesion metrics. This maintainability evaluation function fits our evaluation
criteria, because we regard improving the maintainability of software as achieving high cohesion and low coupling. A
maintainability evaluation function designed for other goals, or weighted toward specific criteria, may produce
different fitness function values.
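Under the high-cohesion, low-coupling criterion, such an evaluation function can be sketched as the ratio of intra-class to inter-class links; the representation and the unweighted counting below are simplifying assumptions, not the exact metrics of the thesis.

```python
def fitness(owner, links):
    """Maintainability fitness sketch: cohesion / coupling.

    `owner` maps each entity (method or attribute) to its class; `links`
    is a list of (entity, entity) dependencies. Cohesion counts links
    inside a class; coupling counts links crossing class boundaries."""
    internal = sum(1 for a, b in links if owner[a] == owner[b])
    external = sum(1 for a, b in links if owner[a] != owner[b])
    # a design with no cross-class links is trivially "maximally" cohesive
    return internal / external if external else float("inf")
```

A Move Method refactoring that turns an external link into an internal one raises this ratio, which is the direction of improvement sought during selection.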
When constructing the RED-aware graph, more types of dependencies, such as structural dependencies, could be
considered to group entities into sets more precisely.
(a) Number of propagated methods for accommodating changes on jEdit (y-axis: # of impacted methods;
x-axis: # of applied refactorings; series: Dynamic+Static, Static, Dynamic)
(b) Number of propagated classes for accommodating changes on jEdit (y-axis: # of impacted classes;
x-axis: # of applied refactorings; series: Dynamic+Static, Static, Dynamic)
Figure 7.3: Change simulation for jEdit.
(a) Number of propagated methods for accommodating changes on Columba (y-axis: # of impacted methods;
x-axis: # of applied refactorings; series: Dynamic+Static, Static, Dynamic)
(b) Number of propagated classes for accommodating changes on Columba (y-axis: # of impacted classes;
x-axis: # of applied refactorings; series: Dynamic+Static, Static, Dynamic)
Figure 7.4: Change simulation for Columba.
(a) Number of propagated methods for accommodating changes on JGIT (y-axis: # of impacted methods;
x-axis: # of applied refactorings; series: Dynamic+Static, Static, Dynamic)
(b) Number of propagated classes for accommodating changes on JGIT (y-axis: # of impacted classes;
x-axis: # of applied refactorings; series: Dynamic+Static, Static, Dynamic)
Figure 7.5: Change simulation for JGIT.
(a) Fitness fn. [38]; (b) # of iterations; (c) elapsed time (sec) — bar charts comparing Single refactoring vs.
Multiple refactorings (Our approach) for jEdit, Columba, and jGit.
Figure 7.6: The effect of multiple refactorings.
(a) jEdit; (b) Columba; (c) JGIT — x-axis: # of iterations; y-axis: fitness (cohesion/coupling); series: Our
approach (approach with Rulebased_RCs + MISs) vs. Rulebased_RCs only (approach without MISs). The JGIT
panel marks a local optimum for the Rulebased_RCs-only approach.
Figure 7.7: The effect of multiple refactorings on the number of iterations.
Chapter 8. Discussion
Several studies have addressed methods of estimating refactoring cost; for example, Zibran and Roy
[93] propose a refactoring effort model that takes into account several types of effort needed to remove software
code clones. To estimate refactoring cost more accurately, we need to consider the effort needed to perform the
activities of the entire refactoring process (explained in Section 3): refactoring identification, refactoring application,
and refactoring maintenance. For refactoring identification, refactoring complexity (e.g., large or small
code modification, or easy or difficult context to understand) needs to be considered. It is reasonable to expect
that big refactorings—which consist of a series of small refactorings—require more effort to apply
than small refactorings do, because they affect a larger portion of the source code; at the same time, the
impact of big refactorings on maintainability improvement tends to be larger. For instance, in the experiment—
performed without considering refactoring complexity—class-level refactorings (i.e., Collapse Class Hierarchy
refactorings) are selected in more cases than method-level refactorings (i.e., Move Method refactorings), because
the impact of class-level refactorings on maintainability improvement tends to be larger than that of method-level
refactorings. If the complexity of applying a refactoring were taken into account when estimating refactoring cost,
method-level refactorings might be selected more often. The complexity of application can be considered by
dividing each refactoring into fine-grained (e.g., atomic-level) transformations and giving each a different weight.
For refactoring application, if we can ensure that applying a refactoring to actual source code is fully
automated by a tool, then the refactoring cost can be regarded as zero. In practice, however, the application of
refactorings may involve additional costs, such as the effort of relocating code, especially when the refactorings
are complex. Refactoring inspection costs also need to be considered, because it is a human who decides whether
to refactor or not. For instance, the developer or maintainer needs time to decide whether identified
refactorings should be applied. Even when refactorings are beneficial to maintainability improvement,
they could be rejected because they conflict with other design practices and principles. Finally, for
refactoring maintenance, the effort involved in testing the refactored code and checking consistency with other
software artifacts needs to be considered.
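The suggested decomposition into weighted atomic-level transformations can be sketched as follows; the transformation names and weight values are hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical effort weights for atomic-level transformations.
ATOMIC_WEIGHTS = {
    "move_method": 1.0,
    "update_call_site": 0.2,
    "delete_class": 0.5,
    "pull_up_method": 0.8,
}

def refactoring_cost(atomic_steps):
    """Estimate the cost of one (possibly big) refactoring as the weighted
    sum of the fine-grained transformations it decomposes into."""
    return sum(ATOMIC_WEIGHTS[step] for step in atomic_steps)
```

A big refactoring that decomposes into many atomic steps then naturally costs more than a single Move Method, which is the asymmetry the paragraph argues should inform selection.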
In the experiment, the key to obtaining a better outcome is how strongly the frequently utilized parts are
correlated with the parts that have actually been changed, and how many more refactorings are identified and
applied in those parts. For instance, in jEdit and Columba, changes have occurred in the parts that are often
utilized, while in JGIT the changed parts are not strongly correlated with the frequently used parts. By
examining the changes made to JGIT, we notice that development of the system's main functionalities has almost
been finished, and the developers seem to focus on perfective maintenance. It is reasonable that, in this case, changes
are made to the places dealing with exceptional scenarios or containing functionalities utilized only by high-
end users. Even though the frequency of use is rather low, the importance or complexity of developing such
parts can be high. For this reason, for JGIT, other predictors, such as structural complexity (e.g., class size), may
need to be considered additionally to identify better cost-effective refactorings. Nevertheless, it is worth pointing
out that dynamic information is an important factor for identifying cost-effective refactorings, because the
experimental results show that the combination of the two approaches—the approach using dynamic information and
the approach using static information—still outperforms the approach using static information alone. We discussed
the interpretation of these experimental results with senior developers who have worked in the IT industry for over ten years. They
support our arguments with the following explanations: a system with intensive user interactions
tends to be developed gradually by actively accommodating users' requests; thus, changes are more likely to
occur where users utilize the system more. On the other hand, a system that is algorithm-based and has rather less
interaction with users tends to be developed by completing each decomposed function; thus, changes
are not likely to occur where development has been completed.
We defined a total of 18 refactoring extraction rules. Given the inherent limitation of the rule-based approach,
the rules cannot be complete; more rules need to be developed and refined to find better refactoring can-
didates. In addition, other methods of finding refactoring candidates are needed: using our rule-based approach,
for refactorings such as Extract Class and Extract Method, determining the specific code blocks to be split in an
automated way is difficult.
While the dynamic profiling-based approach of refactoring identification requires the effort of dynamic profiling in
addition to that of the approach using static information only, the benefit of using the dynamic profiling-based approach
may outweigh this effort. In addition, the effort of dynamic profiling is manageable,
because dynamic profiling is done just once at the beginning of the approach.
We assume that the application of elementary refactorings (i.e., Move Method refactorings) does not change—
delete or merge—entities in the MISs; only the membership information—which entity is placed in which
class—is changed. As a result, the vertices and edges of the constructed RED-graph GR remain the same after
the application of the selected refactorings. Therefore, MISs do not need to be recalculated for every iteration of the
refactoring selection process; the calculation of MISs is done once at the beginning of the refactoring selection
process. In future work, we plan to consider more types of refactorings. Accordingly, we will need to
develop a method of recalculating the MISs to accommodate situations in which the application of elementary
refactorings deletes or merges entities.
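The MIS computation referred to above can be sketched with the classic greedy construction; the vertex/edge encoding of the RED-graph is an illustrative assumption (randomized parallel alternatives, such as that of Alon, Babai, and Itai [1], also exist).

```python
def maximal_independent_set(vertices, edges):
    """Greedy maximal independent set: repeatedly take a vertex and
    exclude its neighbours. In the thesis's setting, entities in one MIS
    of the RED-graph share no RED edge, so the corresponding Move Method
    refactorings can be applied at the same time."""
    neighbours = {v: set() for v in vertices}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    mis, excluded = [], set()
    for v in vertices:  # deterministic visiting order
        if v in excluded:
            continue
        mis.append(v)
        excluded.add(v)              # a chosen vertex cannot be re-chosen
        excluded |= neighbours[v]    # its neighbours are ruled out
    return mis
```

Because the RED-graph is unchanged by Move Method applications, this computation indeed needs to run only once, at the start of the selection process.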
It is worth mentioning that the goal of our refactoring selection method is not to find an optimal sequence of
refactorings. We attempt to select multiple refactorings that can be applied at the same time; the sequence of
refactorings is generated by logging the results of the selected multiple refactorings for each refactoring selection
process. Selecting in this way has the advantages of (1) considering refactoring dependencies and the
creation of new refactoring candidates after the application of the refactoring suggestions, and (2) more efficient
computation and searching than the approach of selecting just the single best refactoring.
Chapter 9. Conclusion and Future Work
9.1 Summary of Contributions
In the thesis, we provide methods for supporting systematic refactoring identification: identification of
refactoring candidates and selection of refactorings to be applied. For identification of refactoring candidates,
we take top-down and bottom-up approaches. First, for the top-down approach—finding refactoring oppor-
tunities by using heuristic rules for eliminating violations of design principles (e.g., removing bad smells) in
object-oriented software systems—we establish rules to extract refactoring candidates with the aim of re-
ducing the dependencies of entities of methods and classes. When establishing the rules, entities are identified based
on how users utilize the software (e.g., user scenarios and operational profiles), and within these entities, refac-
toring candidates are identified. Second, for the bottom-up approach—identification of refactoring opportunities
without human insight—we develop a method for grouping elementary refactorings by using the concept of
the MIS from graph theory. For this grouping, we develop a method for forming entities
(i.e., methods and attributes) into MISs by taking the RED into account. The methods involved in each MIS are
transformed into a group of Move Method refactorings; each group contains elementary refactorings that can be
applied at the same time. For selecting the refactorings to be applied, we provide a selection method
(the refactoring effect evaluation framework) that supports assessment and impact analysis of elementary refactor-
ings. By referring to the refactoring effect evaluation framework (delta table), we select the group of refactorings
containing the multiple elementary refactorings that best improves maintainability. We evaluate our proposed
approach on three open-source projects—jEdit, Columba, and JGIT. From the experimental results, we conclude
that dynamic information is helpful in identifying refactorings that efficiently improve maintainability, because
dynamic information is helpful for extracting refactoring candidates in frequently changed classes. Furthermore,
the experimental results show that the selection method for multiple refactorings reduces search space exploration
cost, and that the RED should be considered when selecting multiple refactorings.
The contributions of the thesis can be summarized as follows.
• Establish the framework of systematic refactoring identification
• Develop the method for dynamic information-based identification of refactoring candidates
• Recognize a new dependency of refactorings (i.e., the RED) that must be considered to correctly
identify a group of refactorings that most improves maintainability
• Develop the method for RED-aware grouping of elementary refactorings (by using the concept of the MIS
in graph theory)
• Provide the method for selecting multiple (elementary) refactorings by supporting assessment and impact analysis of elementary refactorings (based on matrix computation, which enables fast computation)
• Perform empirical studies on large open-source programs
9.2 Future Work
The refactoring selection framework provides methods for the assessment and impact analysis of elementary refactorings (based on matrix computation), and these methods can easily be extended to various other types of refactorings, because a big refactoring (e.g., the Collapse Hierarchy refactoring) is composed of elementary refactorings (e.g., the Move Method refactoring). Therefore, to provide more complete support for systematic refactoring identification, we plan to consider more types of refactorings (such as those considered in [22]), for example, the Pull Up Method refactoring and the Form Template Method refactoring.
References
[1] Noga Alon, Laszlo Babai, and Alon Itai. A fast and simple randomized parallel algorithm for the maximal independent set problem. Journal of Algorithms, 7(4):567–583, 1986.
[2] G. Arevalo. Understanding behavioral dependencies in class hierarchies using concept analysis. Proceedings
of LMO, 3:47–59, 2003.
[3] E Arisholm, LC Briand, and A Føyen. Dynamic coupling measurement for object-oriented software. IEEE
Transactions on Software Engineering, pages 491–506, 2004.
[4] E. Arisholm, L.C. Briand, and A. Føyen. Dynamic coupling measurement for object-oriented software. IEEE Transactions on Software Engineering, 30:491–506, 2004.
[5] Erik Arisholm and Lionel C. Briand. Predicting fault-prone components in a java legacy system. In ISESE
’06: Proceedings of the 2006 ACM/IEEE international symposium on Empirical software engineering, pages
8–17, New York, NY, USA, 2006. ACM.
[6] Felix Bachmann, Len Bass, and Robert Nord. Modifiability tactics. Technical Report, 2007.
[7] J. Bansiya, L. Etzkorn, C. Davis, and W. Li. A class cohesion metric for object-oriented designs. Journal of Object-Oriented Programming, 11(8):47–52, 1999.
[8] Jagdish Bansiya and Carl G. Davis. A hierarchical model for object-oriented design quality assessment.
Software Engineering, IEEE Transactions on, 28(1):4–17, 2002.
[9] E. Biermann, K. Ehrig, C. Kohler, G. Kuhns, G. Taentzer, and E. Weiss. Emf model refactoring based on
graph transformation concepts. Electronic Communications of the EASST, 3(0), 2007.
[10] Bart Du Bois, Serge Demeyer, and Jan Verelst. Refactoring - improving coupling and cohesion of existing
code. Working Conference on Reverse Engineering, 0:144–151, 2004.
[11] C. Bonja and E. Kidanmariam. Metrics for class cohesion and similarity between methods. Proceedings of
the 44th annual Southeast regional conference, pages 91–95, 2006.
[12] Borland. Together 2006 Release 2 for Eclipse, 2006. http://www.borland.com/us/products/together/index.html.
[13] L.C. Briand, J.W. Daly, and J.K. Wust. A unified framework for coupling measurement in object-oriented
systems. Software Engineering, IEEE Transactions on, 25(1):91–121, 1999.
[14] LC Briand, Y. Labiche, and Y. Miao. Towards the reverse engineering of UML sequence diagrams. In
Proceedings of the 10th Working Conference on Reverse Engineering, pages 57–66, 2003.
[15] Lionel C. Briand and Jurgen Wust. Empirical studies of quality models in object-oriented systems. In
Advances in Computers, pages 97–166. Academic Press, 2002.
[16] M.G. Burke, J.D. Choi, S. Fink, D. Grove, M. Hind, V. Sarkar, M.J. Serrano, V.C. Sreedhar, H. Srinivasan,
and J. Whaley. The Jalapeño dynamic optimizing compiler for Java. In Proceedings of the ACM 1999
conference on Java Grande, pages 129–141. ACM, 1999.
[17] Costas Busch. Maximal Independent Set, 2000. http://www.slidefinder.net/m/maximal independent set/mis/15624838.
[18] S.R. Chidamber and C.F. Kemerer. A metrics suite for object oriented design. Software Engineering, IEEE
Transactions on, 20(6):476–493, 1994.
[19] KJ Cios, W. Pedrycz, and RM Swiniarsk. Data mining methods for knowledge discovery. IEEE Transactions
on Neural Networks, 9(6):1533–1534, 1998.
[20] Columba. Columba, 2011. http://sourceforge.net/projects/columba/.
[21] S. Demeyer, S. Ducasse, and O. Nierstrasz. Object-oriented reengineering patterns. Morgan Kaufmann,
2002.
[22] Eclipse Java Development User Guide: Refactoring Actions, 2013.
http://help.eclipse.org/juno/topic/org.eclipse.jdt.doc.user/reference/ref-menu-refactor.htm.
[23] M. Dmitriev. Selective profiling of java applications using dynamic bytecode instrumentation. In Perfor-
mance Analysis of Systems and Software, 2004 IEEE International Symposium on-ISPASS, pages 141–150.
IEEE, 2004.
[24] Bart Du Bois, Serge Demeyer, and Jan Verelst. Refactoring - improving coupling and cohesion of existing
code. In Proceedings of the 11th Working Conference on Reverse Engineering, pages 144–151, Washington,
DC, USA, 2004. IEEE Computer Society.
[25] S. Ducasse, M. Lanza, and S. Tichelaar. Moose: an extensible language-independent environment for reengi-
neering object-oriented systems. In Proceedings of the Second International Symposium on Constructing
Software Engineering Tools (CoSET 2000), pages 1–7. Citeseer, 2000.
[26] A.L. Edwards. An introduction to linear regression and correlation. W.H. Freeman, San Francisco, CA,
1976.
[27] R. Fagin, R. Kumar, and D. Sivakumar. Comparing top k lists. In Proceedings of the fourteenth annual ACM-
SIAM symposium on Discrete algorithms, pages 28–36. Society for Industrial and Applied Mathematics,
2003.
[28] Marios Fokaefs, Nikolaos Tsantalis, Eleni Stroulia, and Alexander Chatzigeorgiou. Identification and appli-
cation of extract class refactorings in object-oriented systems. Journal of Systems and Software, 2012.
[29] M. Fowler and K. Beck. Refactoring: Improving the design of existing code. Addison-Wesley Professional,
1999.
[30] R. France, J. Bieman, and B.H.C. Cheng. Repository for Model Driven Development (ReMoDD). Lecture
Notes in Computer Science, 4364:311, 2007.
[31] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design Patterns: Elements of Reusable
Object-Oriented Software. Addison Wesley, 1995.
[32] V. Garousi, L. Briand, and Y. Labiche. Analysis and visualization of behavioral dependencies among dis-
tributed objects based on uml models. Model Driven Engineering Languages and Systems, pages 365–379,
2006.
[33] C. Ghezzi, M. Jazayeri, and D. Mandrioli. Fundamentals of software engineering. Prentice Hall PTR, 2002.
[34] David Gilbert and Thomas Morgner. JFreeChart, 2010. http://www.jfree.org/jfreechart/.
[35] M.S. Gittens. The Extended Operational Profile Model for Usage-based Software Testing. Library and
Archives Canada Bibliotheque et Archives Canada, 2005.
[36] P Van Gorp, H Stenten, T Mens, and S Demeyer. Towards automating source-consistent uml refactorings.
Lecture Notes in Computer Science, pages 144–158, 2003.
[37] Ah-Rim Han. ARTool, 2011. https://github.com/igsong/ARTOOL.
[38] Ah-Rim Han and Doo-Hwan Bae. Dynamic profiling-based approach to identifying cost-effective refactor-
ings. Information and Software Technology, 55(6):966–985, 2013.
[39] Ah-Rim Han, Sang-Uk Jeon, Doo-Hwan Bae, and Jang-Eui Hong. Measuring behavioral dependency for
improving change-proneness prediction in uml-based design models. The Journal of Systems & Software,
83(2):222 – 234, Feb 2010.
[40] A.R. Han, S.U. Jeon, D.H. Bae, and J.E. Hong. Behavioral Dependency Measurement for Change-Proneness
Prediction in UML 2.0 Design Models. Proceedings of the 32nd Annual IEEE International Computer
Software and Applications, pages 76–83, 2008.
[41] Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann Series in
Data Management Systems, San Francisco, CA, second edition, 2006.
[42] M. Harman. Refactoring as testability transformation. In Software Testing, Verification and Validation
Workshops (ICSTW), 2011 IEEE Fourth International Conference on, pages 414–421. IEEE, 2011.
[43] M. Harman and L. Tratt. Pareto optimal search based refactoring at the design level. In Proceedings of the
9th annual conference on Genetic and evolutionary computation, pages 1106–1113. ACM, 2007.
[44] Brian Henderson-Sellers. Object-oriented metrics: measures of complexity. Prentice-Hall, Inc., Upper
Saddle River, NJ, USA, 1996.
[45] Y. Higo, S. Kusumoto, and K. Inoue. A metric-based approach to identifying refactoring opportunities for
merging code clones in a java software system. Journal of Software Maintenance and Evolution: Research
and Practice, 20(6):435–461, 2008.
[46] K. Hotta, Y. Higo, and S. Kusumoto. Identifying, tailoring, and suggesting form template method refactoring
opportunities with program dependence graph. In 16th European Conference on Software Maintenance and
Reengineering (CSMR’12), pages 53–62, 2012.
[47] W.W. Hwu and P.P. Chang. Achieving high instruction cache performance with an optimizing compiler. In
ACM SIGARCH Computer Architecture News, pages 242–251. ACM, 1989.
[48] IBM Rational. Rational Software Architect Standard Edition, 2008. http://www-
01.ibm.com/software/awdtools/swarchitect/standard/.
[49] jEdit. jEdit, 2011. http://www.jedit.org/.
[50] S.U. Jeon, J.S. Lee, and D.H. Bae. An automated refactoring approach to design pattern-based program
transformations in java programs. In Software Engineering Conference, 2002. Ninth Asia-Pacific, pages
337–345. IEEE, 2002.
[51] JetBrains. IntelliJ IDEA, 2012. http://www.jetbrains.com/idea/.
[52] JGIT. JGIT, 2011. http://eclipse.org/jgit/.
[53] David S Johnson, Mihalis Yannakakis, and Christos H Papadimitriou. On generating all maximal indepen-
dent sets. Information Processing Letters, 27(3):119–123, 1988.
[54] Y. Kataoka, T. Imai, H. Andou, and T. Fukaya. A quantitative evaluation of maintainability enhancement by
refactoring. In Proceedings. International Conference on Software Maintenance, 2002., pages 576 – 585,
2002.
[55] Joshua Kerievsky. Refactoring to patterns. Addison Wesley Pearson Education, 2005.
[56] Sukhee Lee, Gigon Bae, Heung Seok Chae, Doo-Hwan Bae, and Yong Rae Kwon. Automated scheduling
for clone-based refactoring using a competent GA. Softw., Pract. Exper., 41(5):521–550, 2011.
[57] W Li and S Henry. Object-oriented metrics that predict maintainability. Journal of systems and software,
23(2):111–122, 1993.
[58] H. Liu, Z. Ma, W. Shao, and Z. Niu. Schedule of bad smell detection and resolution: A new way to save
effort. Software Engineering, IEEE Transactions on, 38(1):220–235, 2012.
[59] Michael Luby. A simple parallel algorithm for the maximal independent set problem. SIAM Journal on Computing, 15(4):1036–1053, 1986.
[60] P.C. Mahalanobis. On the generalized distance in statistics. Proc. Nat. Inst. Sci. India, 2(1):49–55, 1936.
[61] S. Mancoridis, B.S. Mitchell, C. Rorres, Y. Chen, and E.R. Gansner. Using automatic clustering to produce
high-level system organizations of source code. In Program Comprehension, 1998. IWPC’98. Proceedings.,
6th International Workshop on, pages 45–52. IEEE, 1998.
[62] J. Martin and C. L. McClure. Software Maintenance: The Problems and its Solutions. Prentice Hall, 1983.
[63] R. C. Martin. Agile Software Development: Principles, Patterns and Practices. Prentice Hall, 2003.
[64] T Mens, G Taentzer, and O Runge. Analysing refactoring dependencies using graph transformation. Software
and Systems Modeling, Jan 2007.
[65] T Mens and T Tourwe. A survey of software refactoring. IEEE Transactions on Software Engineering,
30(2):126–139, 2004.
[66] J.D. Musa. Operational profiles in software-reliability engineering. Software, IEEE, 10(2):14–32, 1993.
[67] J. T. Nosek and P. Palvia. Software maintenance management: Changes in the last decade. Journal of
Software Maintenance: Research and Practice, 2(3):157–174, 1990.
[68] M O’Keeffe and M O Cinneide. Search-based refactoring for software maintenance. The Journal of Systems
& Software, 81(4):502–516, 2008.
[69] P. Oman and J. Hagemeister. Metrics for assessing a software system’s maintainability. In Proceedings of the Conference on Software Maintenance, pages 337–344, 1992.
[70] OMG. UML 2.4 Superstructure Specification, (formal/2010-05-05) edition, 2010.
http://www.omg.org/spec/UML/2.4/Superstructure/PDF/.
[71] W. F. Opdyke. Refactoring: A program restructuring aid in designing object-oriented application framework.
Ph.D. thesis, University of Illinois at Urbana-Champaign, 1992.
[72] D. Parnas. Software aging. In Proceedings of The 16th International Conference on Software Engineering
(ICSE94), pages 279–287. IEEE Computer Society Press, 1994.
[73] E. Piveta, J. Araujo, M. Pimenta, A. Moreira, P. Guerreiro, and R.T. Price. Searching for opportunities of
refactoring sequences: Reducing the search space. In Computer Software and Applications, 2008. COMP-
SAC’08. 32nd Annual IEEE International, pages 319–326. IEEE, 2008.
[74] F. Qayum and R. Heckel. Search-based refactoring using unfolding of graph transformation systems. Elec-
tronic Communications of the EASST, 38(0), 2011.
[75] A.J. Riel. Object-Oriented Design Heuristics. Addison-Wesley, 1999.
[76] R. Robbes, M. Lanza, and D. Pollet. A benchmark for change prediction. Technical Report 6, Faculty of Informatics, Università della Svizzera italiana, Lugano, Switzerland, 2008.
[77] R. Robbes, D. Pollet, and M. Lanza. Replaying ide interactions to evaluate and improve change prediction
approaches. In Mining Software Repositories (MSR), 2010 7th IEEE Working Conference on, pages 161–170.
IEEE, 2010.
[78] D. Roberts, J. Brant, and R. Johnson. A refactoring tool for Smalltalk. Theory and Practice of Object Systems, 3(4):253–263, 1997.
[79] Atanas Rountev, Olga Volgin, and Miriam Reddoch. Control flow analysis for reverse engineering of se-
quence diagrams. Rapport Technique, Ohio State University, 2004.
[80] Chris Seguin. JRefactory, 2003. http://jrefactory.sourceforge.net/csrefactory.html.
[81] O. Seng, M. Bauer, M. Biehl, and G. Pache. Search-based improvement of subsystem decompositions. In
Proceedings of the 2005 conference on Genetic and evolutionary computation, pages 1045–1051. ACM,
2005.
[82] O Seng, J Stammel, and D Burkhart. Search-based determination of refactorings for improving the class
structure of object-oriented systems. Proceedings of the 8th annual conference on Genetic and evolutionary
computation, page 1916, 2006.
[83] F. Simon, F. Steinbruckner, and C. Lewerentz. Metrics based refactoring. In Software Maintenance and
Reengineering, 2001. Fifth European Conference on, pages 30–38. IEEE, 2001.
[84] A. Srivastava and A. Eustace. Atom: A system for building customized program analysis tools. ACM
SIGPLAN Notices, 39(4):528–539, 2004.
[85] Friedrich Steimann and Jens von Pilgrim. Refactorings without names. In Proceedings of the 27th
IEEE/ACM International Conference on Automated Software Engineering, pages 290–293. ACM, 2012.
[86] L Tahvildari and K Kontogiannis. A metric-based approach to enhance design quality through meta-pattern
transformations. Proc. European Conf. Software Maintenance and Reeng, pages 183–192, 2003.
[87] N Tsantalis and A Chatzigeorgiou. Identification of extract method refactoring opportunities. Proceedings
of the 2009 European Conference on Software Maintenance and Reengineering-Volume 00, pages 119–128,
2009.
[88] N Tsantalis and A Chatzigeorgiou. Identification of move method refactoring opportunities. Software Engi-
neering, IEEE Transactions on, 35(3):347 – 367, 2009.
[89] N. Tsantalis and A. Chatzigeorgiou. Ranking refactoring suggestions based on historical volatility. In Soft-
ware Maintenance and Reengineering (CSMR), 2011 15th European Conference on, pages 25–34. IEEE,
2011.
[90] H. Van Vliet. Software Engineering: Principles and Practice. Wiley, second edition, 2000.
[91] Ruediger Zarnekow and Walter Brenner. Distribution of cost over the application lifecycle - a multi-case
study. In Proceedings of the Thirteenth European Conference on Information Systems, 2005.
[92] L. Zhao and J.H. Hayes. Predicting classes in need of refactoring: An application of static metrics. In
Proceedings of the workshop on predictive models of software engineering (PROMISE), associated with
ICSM2006, pages 1–5. Citeseer, 2006.
[93] Minhaz F Zibran and Chanchal K Roy. Conflict-aware optimal scheduling of code clone refactoring: a
constraint programming approach. In Program Comprehension (ICPC), 2011 IEEE 19th International Con-
ference on, pages 266–269. IEEE, 2011.
Chapter A. Behavioral Dependency Measure as a Good
Indicator for Change-Proneness Prediction [39]
A.1 Behavioral Dependency Measure
Behavioral Dependency
A change in a class can affect other classes, forcing them to be modified. In order to predict the classes affected when a change in a class occurs, we need to examine the dependencies between pairs of entities (i.e., classes or objects) in the system. In this appendix, we focus on behavioral dependency.
Essentially, we assume that the object sending a message has a behavioral dependency on the object receiving
the message. This is derived from the insight that modifying the class of the object receiving a message may affect
the class of the object sending the message. It is important to note that when an object sends a message to another
object, the class implementing the corresponding method of the message may be different from that of the object
receiving the message. This is due to the use of inheritance relationships and polymorphism, which may cause
dynamic binding of methods. In this case, the class of the object sending a message must be bound to (i.e., have
a behavioral dependency on) the class implementing the actual method of that message. Therefore, we need to
consider inheritance relationships and polymorphism according to the behavior of objects in order to correctly
identify dependencies between classes and ultimately predict change-proneness accurately. This issue will be
explained in detail in Subsection A.1. We also assume that a high intensity of behavioral dependency represents a high possibility that changes will occur. The rationale behind this assumption is that the more external services the class of an object depends upon, the more likely it is that the class will be changed.
[Figure: SD sd A shows objects o1:c1, o2:c2, o3:c3, o4:c4, and o5:c5 exchanging messages a–f; SD sd B shows objects o1:c1, o2:c2, and o3:c3 with an alt fragment (operands [cond] and [else]) containing messages a and b, and a message g.]
Figure A.1: Examples of Sequence Diagrams (SDs)
To quantify the behavioral dependency, we define two kinds of behavioral dependencies: direct and indirect.
Each is defined as follows. Let O denote a set of objects existing in a system.
Definition 3 (Direct behavioral dependency). For op1, op2 ∈ O, op1 has a direct behavioral dependency on op2 if op1 needs some services of op2 by sending a synchronous message to op2 and receiving a reply from op2. We denote direct behavioral dependency as a relation →.
Definition 4 (External service request relation). For op1, op2, op3 ∈ O, op1 → op2 and then op2 → op3 because op1 needs an external service which is provided from op3 via op2. We denote this as an external service request relation ↝. Therefore, in this case, (op1 → op2) ↝ (op2 → op3).
Definition 5 (Indirect behavioral dependency). We denote indirect behavioral dependency as a relation ⇝. For example, for op1, op2, op3, ..., opn ∈ O, if we have the relation (op1 → op2) ↝ (op2 → op3) ↝ ... ↝ (opn−1 → opn), then we can derive the indirect behavioral dependency op1 ⇝ opn, except when n = 2, which means a direct behavioral dependency op1 → op2.
A synchronous message entails a dependency between two objects since the sender object depends on the
receiver object. On the other hand, an asynchronous message does not entail such dependency since the sender
object does not wait for a reply but continues to proceed. This means the reply will not affect the sender object’s
behavior. Therefore, in our approach, we only consider synchronous messages with replies.
Fig. A.1 shows two examples of SDs. In SD sd A, object o1 has a direct behavioral dependency on object o2
because it sends a synchronous message a to object o2 and receives a reply from it. On the other hand, object o1
has an indirect behavioral dependency on object o3; before object o1 receives a reply for message a from object
o2, message b is sent from object o2 to object o3. By the same reasoning, object o1 and object o4, as well as
object o2 and object o4, have indirect dependencies. Asynchronous message e from object o1 to object o2 does
not entail a behavioral dependency since object o1 (the sender) does not wait for a reply from object o2. All
messages in SD sd B cause direct behavioral dependencies.
It is important to note that an indirect behavioral dependency is not a transitive relation. For example, in Fig.
A.1, object o1 and object o5 do not have a behavioral dependency, even though object o1 and object o2 have a
behavioral dependency because of message a and object o2 and object o5 have a behavioral dependency because
of message f. This is because message f is sent from object o2 to object o5 after object o1 receives the reply
for message a from object o2. For this reason, we need to save the information of the message that triggers the
current message to precisely identify the indirect behavioral dependency between the two objects. In this way,
when object oi has an indirect dependency on object oj , we can derive a reachable path (a sequence of exchanged
messages between two objects) by traversing stored messages from object oj to object oi in a backward direction.
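The backward traversal described above can be sketched in Java. This is a minimal illustration rather than the thesis tool's implementation; the Msg class and its fields are hypothetical stand-ins for the stored message information described in the text:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical minimal message record: name, sender, receiver, and the message
// that triggered it (the backward navigable message; null if none).
class Msg {
    final String name, sender, receiver;
    final Msg backward;
    Msg(String name, String sender, String receiver, Msg backward) {
        this.name = name; this.sender = sender; this.receiver = receiver; this.backward = backward;
    }
}

public class ReachablePath {
    // Follows backward-navigable links from the last message and returns the
    // message names in forward order, i.e., the reachable path between the
    // first sender and the last receiver.
    static Deque<String> reachablePath(Msg last) {
        Deque<String> path = new ArrayDeque<>();
        for (Msg m = last; m != null; m = m.backward) {
            path.addFirst(m.name); // prepend, since we traverse in a backward direction
        }
        return path;
    }

    public static void main(String[] args) {
        // Messages a and b from Fig. A.1: o1 sends a to o2, then o2 sends b to o3 (triggered by a).
        Msg a = new Msg("a", "o1", "o2", null);
        Msg b = new Msg("b", "o2", "o3", a);
        System.out.println(reachablePath(b)); // [a, b]
    }
}
```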
Features of the Behavioral Dependency Measure
The proposed BDM has a number of features that are different from existing metrics.
First, the most important feature of the BDM that makes it unique is that it considers inheritance relationships
and polymorphism. In general, polymorphism indicates method overriding and method overloading. We do not take method overloading into account because it refers to methods that have the same name but different numbers or types of parameters within one class; as a result, method overloading does not create dependencies among classes. Therefore, in this appendix, polymorphism means method overriding in classes that have inheritance relationships.
As the system contains more inheritance relationships and polymorphism, dependency among classes becomes
more complex because of dynamic binding of methods. Hence, inheritance relationships and polymorphism as
they relate to the behavior of objects need to be considered in order to correctly identify dependency among classes.
[Figure: (a) a class diagram in which Canvas is associated with Shape; Triangle, Circle, and Rectangle are subclasses of Shape that override draw(), and Square is a subclass of Rectangle that does not implement draw(). (b) four SDs, each showing the message draw() sent to t:Triangle, c:Circle, r:Rectangle, and s:Square, respectively.]
Figure A.2: An example of using inheritance relationships and polymorphism: (a) A class diagram representing classes and their relationships. (b) SDs representing the behaviors of objects that are instantiated from the classes in A.2(a).
[Figure: (a) an IOD with two SD nodes, ref A with EER = 80% and ref B with EER = 20% (EER: Expected Execution Rate). (b) a class diagram with classes c1–c6 declaring the methods a()–g() of the messages in Fig. A.1.]
Figure A.3: (a) An example of the Interaction Overview Diagram (IOD). (b) An example of the class diagram that has classes from which the objects in the SDs in Fig. A.1 are instantiated.
Indeed, this is critical for the accurate prediction of change-proneness. If we were not to consider inheritance
relationships and polymorphism, a class may be mistakenly predicted to be prone to change. The example in Fig.
A.2 shows the importance of considering inheritance relationships and polymorphism in relation to the behavior
of objects when measuring dependency between classes. In the class diagram in Fig. A.2(a), the Canvas class
has an association with the Shape class, which indicates that the Canvas class calls the method draw in the
Shape class. In other words, the Canvas class is dependent on the Shape class. This static dependency is the
information that we can derive from the class diagram. Most existing coupling metrics are measured based on
static dependencies. However, if the message draw is sent to the object that is an instance of the subclass of
the Shape class and that subclass overrides the method draw, the dependency is bound between the Canvas
class and the subclass, even though the association is specified between the Canvas class and the Shape class.
Furthermore, if the message draw is sent to the object that is an instance of the subclass of the Shape class but
does not have the method draw, the dependency is bound between the Canvas class and the one of the subclass’s parent classes that implements the method draw. The SDs in Fig. A.2(b) illustrate the behavior of the objects
that are instances of subclasses (i.e., Triangle class, Circle class, Rectangle class, and Square class) of the
Shape class in Fig. A.2(a). By considering the first three SDs, we can determine that the Canvas class is behaviorally dependent on the Triangle class, Circle class, and Rectangle class, all of which override the method draw. By considering the last SD, which tells us that the object of the Square class receives the message draw, we
can also determine that the Canvas class is behaviorally dependent on the Rectangle class, since the method of
the message draw is actually implemented in the Rectangle class. As a consequence, no matter where an actual
method is implemented, the proposed BDM enables a class of the object sending a message to be bound to the
class that implements the actual method of that message; this is the feature that makes the BDM more sensitive to
systems with high levels of inheritance relationships and polymorphism.
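The binding behavior in the Fig. A.2 example can be made concrete with a small Java sketch. The class and method names mirror the figure; the String return values are added here only to make the binding observable:

```java
// Shape hierarchy from Fig. A.2: Triangle and Rectangle override draw();
// Square inherits Rectangle's draw() without overriding it.
class Shape { String draw() { return "Shape"; } }
class Triangle extends Shape { @Override String draw() { return "Triangle"; } }
class Rectangle extends Shape { @Override String draw() { return "Rectangle"; } }
class Square extends Rectangle {} // no draw() of its own

public class Canvas {
    // Canvas is statically associated only with Shape, yet at run time the
    // call is bound to the class that actually implements draw().
    static String send(Shape s) { return s.draw(); }

    public static void main(String[] args) {
        System.out.println(send(new Triangle())); // Triangle
        System.out.println(send(new Square()));   // Rectangle (inherited implementation)
    }
}
```

This is why a purely static reading of the class diagram would record a Canvas–Shape dependency, while the behavior binds Canvas to Triangle or Rectangle.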
Second, we consider the extent and direction when measuring the behavioral dependency. No matter how many times a class calls the methods of another class, the established coupling metric (e.g., CBO) treats this as one, in either direction. This is because the established coupling metric is based on method call dependencies that only capture the static characteristics of couplings. Let us consider two cases: class cj implements one method that is called 100 times by class ci, while class ck implements two methods that are each called once by class ci. The established static couplings for the former and the latter case are one and two, respectively. However, class ci might be more behaviorally dependent on class cj than on class ck. Therefore, it is important to keep the information on the extent and direction of a class’s dependence.
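The two cases can be sketched with a toy call log; the class names ci, cj, and ck are the hypothetical ones from the example above, and the aggregation shown is an illustration of an extent-aware count, not the thesis's metric definition:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CouplingExtent {
    // Counts how many times each callee class occurs in a caller's call log.
    // A CBO-style measure would collapse each caller-callee pair to 1.
    static Map<String, Integer> extent(String[] callees) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String callee : callees) counts.merge(callee, 1, Integer::sum);
        return counts;
    }

    public static void main(String[] args) {
        String[] calleesOfCi = new String[102];
        for (int i = 0; i < 100; i++) calleesOfCi[i] = "cj"; // one method of cj, called 100 times
        calleesOfCi[100] = "ck"; // first method of ck, called once
        calleesOfCi[101] = "ck"; // second method of ck, called once
        // Static method-call coupling would report ci-cj = 1 and ci-ck = 2;
        // the extent-aware counts show ci depends far more heavily on cj.
        System.out.println(extent(calleesOfCi)); // {cj=100, ck=2}
    }
}
```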
Third, we consider the execution rate of the messages based on the control structure and the operational
profile. We use two kinds of diagrams, an SD and an IOD, to depict a system’s behavior. An SD in UML 2.0
provides combined fragments that allow us to express control structures such as branch and loop. An alt combined
fragment that corresponds to a branch control structure describes the behavior of two or more mutually-exclusive
alternatives. A message in an alt combined fragment can be executed depending on the condition. This may affect
the behavioral dependency of objects that are related by this message. Without running a program (i.e., dynamic
information), it is difficult to determine whether the message will be executed or not. Therefore, the probabilistic
execution rate of a message is considered when measuring a behavioral dependency. For example, in the SD sd B
of Fig. A.1, either message a or message b is executed, depending on whether the condition is satisfied or not (i.e., true or false).
[Figure: an activity diagram of the BDM measurement procedure. Inputs: the class diagram, the SDs, and the IOD. Actions and intermediate datastores: construct an OBDM for each SD (OBDMs); synthesize the OBDMs into the OSBDM (OSBDM); derive all reachable paths for each pair of objects (RPS between objects); sum the number of weighted reachable paths for each pair of classes (sumWRP between classes); and calculate the BDM for each class (BDM).]
Figure A.4: Overview of our approach of BDM measurement
Therefore, the probabilistic execution rate of each message can be 0.5. An IOD in UML 2.0 illustrates an overview
of a flow of control in which each activity node can be an SD. Some scenarios (i.e., SDs) might be executed more
frequently than others, as specified in the operational profile [35]. The operational profile provides the expected
execution rate of an SD. Therefore, the operational profile also needs to be considered for the better measurement
of the behavioral dependency. We suggest specifying the Expected Execution Rate (i.e., the operational profile)
of each SD in an IOD. For example, the IOD in Fig. A.3(a) shows that the Expected Execution Rates of SD A and
SD B are 80% and 20%, respectively.
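One natural way to combine the two rates for a single message is to multiply them; the sketch below assumes, as in the text, that each operand of a two-way alt fragment is equally likely, and that nested alt operands multiply:

```java
public class ExecutionRate {
    // Combined execution rate of a message: the probability that its SD is
    // reached (mmeH, from the IOD's Expected Execution Rates) times the
    // probability that the message executes inside the SD (mmeL, from the
    // selection probabilities of its enclosing alt operands).
    static double rate(double mmeH, double... altOperandProbs) {
        double mmeL = 1.0;
        for (double p : altOperandProbs) mmeL *= p; // nested alt fragments multiply
        return mmeH * mmeL;
    }

    public static void main(String[] args) {
        // Message b of SD B (Fig. A.1): one two-way alt (0.5) in an SD with EER 20%.
        System.out.println(rate(0.2, 0.5)); // 0.1
        // A message outside any combined fragment in an always-activated SD.
        System.out.println(rate(1.0)); // 1.0
    }
}
```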
A.2 Procedure for BDM Measurement
In this section, we explain a systematic way of calculating the BDM in UML design models using SDs, a
class diagram, and an IOD. An overview of our approach is shown in Fig. A.4. The BDM is computed through the
following procedures. First, an Object Behavioral Dependency Model (OBDM) is constructed for each SD based on
all direct and indirect behavioral dependencies between objects by referring to the class diagram and the IOD. After
that, we synthesize all OBDMs into the Object System Behavioral Dependency Model (OSBDM) for the entire
system. Next, we derive all the reachable paths for each pair of objects in the system from the OSBDM. We then
sum the weighted reachable paths for each pair of classes (a reachable path is weighted using the distance length
between objects and the execution rate of the messages of which the reachable path is composed). Finally, we
calculate the BDM for every class in the system. Detailed procedures are described in the following subsections.
Constructing OBDMs
A dependency model OBDM_A for SD A is a 2-tuple (O, M), where
• O is a set of nodes representing objects in the SD
• M is a set of edges representing messages that are exchanged between two nodes. Message m ∈ M represents a synchronous message with a reply, which entails a direct dependency from a sender object to a receiver object. Message m has the following six attributes.
– ms ∈ O is the sender of the message.
– mr ∈ O is the receiver of the message. mr ≠ ms.
– mn is the name of the message.
– mb ∈ M is the instance of a backward navigable message. mb ≠ m. “-” means none.
– mmeL is the probabilistic message execution rate in an SD. 0 ≤ mmeL ≤ 1. The default value is 1.
– mmeH is the expected message execution rate in an IOD. 0 ≤ mmeH ≤ 1. The default value is 1.
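The six attributes above map directly onto a small data type. The following Java record is an illustrative sketch, not the thesis tool's model: object names are plain strings here, and null stands for the “-” value of mb:

```java
// A sketch of message m with its six attributes from the OBDM definition.
record Message(
        String ms,      // sender object
        String mr,      // receiver object (mr != ms)
        String mn,      // message name
        Message mb,     // backward navigable message; null for "-"
        double mmeL,    // probabilistic execution rate in the SD, default 1
        double mmeH) {  // expected execution rate from the IOD, default 1
}

public class ObdmMessage {
    public static void main(String[] args) {
        // Messages a(-, 1, 0.8) and b(a, 1, 0.8) of OBDM_A in Fig. A.5(a).
        Message a = new Message("o1", "o2", "a", null, 1.0, 0.8);
        Message b = new Message("o2", "o3", "b", a, 1.0, 0.8);
        System.out.println(b.mn() + " triggered by " + b.mb().mn()); // b triggered by a
    }
}
```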
ms and mr represent the sender and receiver objects, respectively. Since we do not consider messages from
an object to itself, they should not represent the same node. As was pointed out in Subsection A.1, when an object
sends a message to another object, the class of the object receiving a message may be different from the class
implementing the corresponding method. In such a case, the class of the object sending the message may change
when the implemented method changes. Therefore, when binding a receiver node, it is important to note whether
the method is actually implemented in the class of the receiver object. If not, the receiver node of the message is
bound to an object of a parent class that actually implements the method.
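For Java systems, this binding rule can be checked mechanically with reflection, since Method.getDeclaringClass() yields the class that actually declares an inherited method. The sketch below uses the Shape hierarchy of Fig. A.2 and is an illustration of the lookup, not the thesis tool's code:

```java
class Shape { public void draw() {} }
class Rectangle extends Shape { @Override public void draw() {} }
class Square extends Rectangle {} // inherits draw() from Rectangle

public class ReceiverBinding {
    // Finds the class whose implementation of the named (public) method a
    // receiver of the given class would actually execute.
    static Class<?> bind(Class<?> receiver, String method) {
        try {
            return receiver.getMethod(method).getDeclaringClass();
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(method + " not found", e);
        }
    }

    public static void main(String[] args) {
        // A draw message to a Square object is bound to Rectangle,
        // the parent class that actually implements the method.
        System.out.println(bind(Square.class, "draw").getSimpleName()); // Rectangle
    }
}
```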
mb represents the message that triggers m and is called a backward navigable message. As was noted in
Subsection A.1, mb is essential for identifying indirect behavioral dependencies between objects. We can identify
the message that activates the current message by tracing the backward navigable message. When deriving a
reachable path from the OSBDM, identifying the message that triggers m prevents an infinite traversal loop.
As was described in Subsection A.1, mmeL and mmeH help to better predict the change-proneness of classes
by considering the probabilistic or expected execution rates of the messages. Later, these rates are synthesized
according to a reachable path and used to measure behavioral dependency. mmeL represents the probabilistic
message execution rate in an SD. We consider a branch control structure that might affect the probability of the
message execution. Note that a branch control structure is represented as an alt combined fragment in UML
2.0. When a message is in an alt combined fragment, it is executed only when a condition of the corresponding
interaction operand is met. Therefore, mmeL is the same as the probability that one of the interaction operands
that contain the message is selected. If an alt combined fragment is nested, the probability that a message will
be executed in the corresponding combined fragment is multiplied to mmeL recursively. When a message is not
contained in any combined fragments, its mmeL is 1. mmeH represents the expected message execution rate in an
IOD. We specify the Expected Execution Rate (i.e., the operational profile) of each SD in an IOD. A message in
an SD is executed only when the corresponding SD is activated. Therefore, mmeH is the same as the probability
Figure A.5: (a) OBDMA and OBDMB correspond to SD sd A and SD sd B in Fig. A.1. (b) The obtained OSBDM
by synthesizing the two OBDMs in (a).
that the control flow of the software reaches the SD to which the message belongs. The mmeH values can be
obtained by multiplying all the Expected Execution Rates on the way from the initial node to the corresponding
SD node in the IOD. If an SD is always activated, the mmeH values of all the messages in the SD are 1.
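The two multiplication rules above can be sketched as follows; the function names and list encodings are illustrative assumptions, not part of the thesis tooling.

```python
from math import prod

def message_rate_in_sd(enclosing_operand_probs):
    """mmeL: product of the selection probabilities of the (possibly nested)
    alt operands that enclose the message; 1 if it is in no combined fragment."""
    return prod(enclosing_operand_probs)

def message_rate_in_iod(path_execution_rates):
    """mmeH: product of the Expected Execution Rates on the IOD path from the
    initial node to the SD containing the message; 1 if the SD is always activated."""
    return prod(path_execution_rates)

# Message a' in SD sd B (Fig. A.5): one alt operand chosen with probability 0.5,
# and SD sd B is reached with Expected Execution Rate 0.2.
print(message_rate_in_sd([0.5]), message_rate_in_iod([0.2]))  # 0.5 0.2
```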
Fig. A.5(a) shows an example of two OBDMs constructed from SD sd A and SD sd B in Fig. A.1. Each
node of object oi, 1 ≤ i ≤ 6, corresponds to an instance of class ci in Fig. A.3(b). Each edge of message m
is represented as mn(mb,mmeL,mmeH). Due to inheritance relationships and polymorphism, which may bring
about dynamic binding of methods (Subsection A.1), we examine the behavior of objects in SDs and the structures
of classes in a class diagram in order to correctly identify dependency between classes. In other words, when an
object sends a message to another object, we first check whether the method of the message is implemented in the
class of the receiving object or in one of its parent classes; we then create an edge corresponding to the message
between the node of the object sending the message and the object of which the class actually implemented the
method of the message. For instance, when object o1 sends message a to object o2, as in SD sd A, we create
the edge of message a between node o1 and node o2 in OBDMA since class c2 overrides method a. On the other
hand, when object o1 sends message g to object o2, as in SD sd B, we create the edge of message g between
node o1 and node o6 in OBDMB since class c6 actually implements method g. To distinguish the messages in
OBDMB from those in OBDMA, we rename message a to a' and b to b'. Since either message a' or b' may be
activated depending on the condition of the alt combined fragment, both a'meL and b'meL are 0.5. The Expected
Execution Rates of SD A and SD B are represented in the IOD in Fig. A.3(a) and are reflected in the execution
rates of the messages as 0.8 and 0.2, respectively.
Algorithm 6 retrieveReachablePathSet(o1, o2 : O)
  input  OUT ← the set of messages outgoing from o1
  input  IN ← the set of messages incoming to o2
  input  RP ← ∅ /* an array storing one reachable path */
  input  RPS ← ∅ /* a vector saving the set of reachable paths */
  output RPS
  for all in ∈ IN do
    for all out ∈ OUT do
      if in == out then
        /* RPS by the Direct Behavioral Dependency */
        RPS ← RPS ∪ {in}
      else
        /* RPS by the Indirect Behavioral Dependency */
        RP ← RP + {in}
        while inb ≠ ∅ do
          if inb == out then
            RP ← RP + {inb}
            RPS ← RPS ∪ {RP}
            RP ← ∅
            break
          else
            RP ← RP + {inb}
            in ← inb
Synthesizing OBDMs into an OSBDM
To determine the behavioral dependencies between objects in the whole system, we synthesize OBDMs into
OSBDM = (Os,Ms). Os and Ms denote the set of objects and the union of messages that exist in the system,
respectively. The method for constructing the OSBDM will be explained using the example in Fig. A.5. Fig.
A.5(b) shows the obtained OSBDM by synthesizing the two OBDMs in Fig. A.5(a). This OSBDM is composed
of Os = {o1,o2, ..,o6} and Ms = {(m ∈ M of SD sd A) ∪ (m ∈ M of SD sd B)}. Note that object o1 in
SD sd A and object o1 in SD sd B are instantiated from the same class c1. Therefore only one o1 remains in the
OSBDM. The sending message a from o1 in SD sd A and another sending message a’ from o1 in SD sd B are
connected with the corresponding target object o2 in the OSBDM. If a message m is triggered by another message
in the context of the system (which is determined by examining the IOD), we set this other message as the
backward navigable message of message m. There is no such case in this example.
Deriving Reachable Paths
We derive all reachable paths for each pair of objects in the system from the OSBDM. Let RPS(oi, oj) = {s |
s is a reachable path between source object oi and target object oj} be the set of all reachable paths between
two objects. To retrieve RPS(oi, oj), we traverse the OSBDM in reverse, from a message incoming to object
oj toward a message outgoing from object oi. When object oi has a direct behavioral dependency on object
oj , one of the incoming messages to object oj and one of the outgoing messages from object oi are equal. This
Table A.1: RPS(oi,oj), which is a set of all the reachable paths for each pair of objects (Row: oi, Column: oj) in
Fig. A.5(b).
     o1   o2        o3         o4           o5    o6
o1   -    {a, a'}   {ab, b'}   {abc, ad}    -     {g}
o2   -    -         {b}        {bc, d}      {f}   -
o3   -    -         -          {c}          -     -
o4   -    -         -          -            -     -
o5   -    -         -          -            -     -
o6   -    -         -          -            -     -
message is then added into a set of reachable paths. On the other hand, when object oi has an indirect behavioral
dependency on object oj , we traverse the OSBDM from one of the messages incoming to object oj iteratively by
substituting it with a backward navigable message. In doing this, we finally reach one of the outgoing messages
from object oi. The stored sequence of messages encountered while traversing is the reachable path. The method
for retrieving a reachable path set from object o1 to object o2 is presented in Algorithm 6. The set of reachable
paths for each pair of objects in Fig. A.5(b) is presented in Table A.1.
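Under the assumption that backward navigable links are encoded as message-name references, the traversal of Algorithm 6 can be sketched in Python. The OSBDM of Fig. A.5(b) is hand-encoded below; this is an illustrative reimplementation, not the original tool.

```python
# Messages of the OSBDM in Fig. A.5(b): name -> (sender, receiver, backward name).
MESSAGES = {
    "a":  ("o1", "o2", None), "a'": ("o1", "o2", None),
    "b":  ("o2", "o3", "a"),  "b'": ("o1", "o3", None),
    "c":  ("o3", "o4", "b"),  "d":  ("o2", "o4", "a"),
    "f":  ("o2", "o5", None), "g":  ("o1", "o6", None),
}

def reachable_paths(src, dst):
    """Walk backward navigable links from each message incoming to dst until a
    message outgoing from src is reached; collected messages form one path."""
    paths = []
    incoming = [n for n, (s, r, b) in MESSAGES.items() if r == dst]
    for name in incoming:
        path, cur = [name], name
        while True:
            sender, _, back = MESSAGES[cur]
            if sender == src:            # reached an outgoing message of src
                paths.append("".join(reversed(path)))
                break
            if back is None:             # dead end: no backward navigable message
                break
            path.append(back)
            cur = back
    return sorted(paths)

print(reachable_paths("o2", "o4"))  # ['bc', 'd'], matching Table A.1
print(reachable_paths("o1", "o4"))  # ['abc', 'ad']
```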
Summing Weighted Reachable Paths
Prior to calculating the BDM for every class in the system, we sum the weighted reachable paths for each
pair of objects using the RPS obtained above. In this process, an object is projected onto the class from which the
object is instantiated. In this manner, the results of the summation of the weighted reachable paths are obtained
for each pair of classes.
We formalize the sum of the Weighted Reachable Paths (SumWRP) from class ci to class cj as follows:
SumWRP(ci, cj) = Σ_{∀s ∈ RPS(oi, oj)} DF(s) × fmeL × fmeH,   (A.1)
where oi and oj indicate the objects that correspond to projected instances of class ci and cj , respectively. We
use three factors for weighting reachable path s: distance factor, fmeL, and fmeH . We define a distance factor by
DF (s) = 1/d, where d is the distance length (i.e., the number of messages in the corresponding reachable path
s). The rationale for using the distance factor is that an indirect behavioral dependency might be weakened by the
successive calls. In other words, the farther an object is from the source of changes, the less the object is likely
to be changed. Therefore, we need to degrade the impact when the distance of indirect behavioral dependency
between two objects is great. We represent the first message in the reachable path s as f . Then, fmeL, the
probabilistic message execution rate, and fmeH , the expected message execution rate, are taken into account as
factors for weighting a reachable path.
We explain how to calculate SumWRP using Table A.1. To calculate SumWRP (c2,c4), for example, we
first obtain reachable paths, {bc,d}, between object o2 and object o4. We then identify weighting factors for
each reachable path. For reachable path bc, the distance factor is 1/2, because the number of messages in this
Table A.2: SumWRP (ci,cj), which is the sum of the weighted reachable paths for each pair of classes (Row: ci,
Column: cj) and the BDM (ci) value of each class in Fig. A.3(b).
c1 c2 c3 c4 c5 c6 BDM(ci)
c1 0 0.9 0.5 0.67 0 0.2 2.27
c2 0 0 0.8 1.2 0.8 0 2.8
c3 0 0 0 0.8 0 0 0.8
c4 0 0 0 0 0 0 0
c5 0 0 0 0 0 0 0
c6 0 0 0 0 0 0 0
reachable path is 2. The first message of this reachable path is b. Therefore, bmeL = 1 and bmeH = 0.8 are applied
for weighting the reachable path. For reachable path d, the distance factor is 1; dmeL and dmeH are 1 and 0.8,
respectively. Finally, we sum the weighted reachable paths and obtain SumWRP (c2,c4) as follows:
SumWRP(c2, c4) = (1/2 × 1 × 0.8) + (1 × 1 × 0.8) = 1.2
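The weighted summation of Eq. (A.1) can be sketched as follows; the message rates are hand-copied from Fig. A.5(b), and the function name is illustrative.

```python
# (mmeL, mmeH) per message name, taken from Fig. A.5(b).
RATES = {"a": (1.0, 0.8), "b": (1.0, 0.8), "c": (1.0, 0.8), "d": (1.0, 0.8),
         "f": (1.0, 0.8), "g": (1.0, 0.2), "a'": (0.5, 0.2), "b'": (0.5, 0.2)}

def sum_wrp(paths):
    """paths: reachable paths, each a list of message names with the first message first."""
    total = 0.0
    for path in paths:
        df = 1.0 / len(path)           # distance factor DF(s) = 1/d
        f_meL, f_meH = RATES[path[0]]  # rates of the first message f in the path
        total += df * f_meL * f_meH
    return total

# SumWRP(c2, c4) over RPS(o2, o4) = {bc, d}: (1/2)(1)(0.8) + (1)(1)(0.8) = 1.2
print(sum_wrp([["b", "c"], ["d"]]))
```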
Calculating the Behavioral Dependency Measure
Finally, the BDM for every class ci in the system is obtained as follows. Let C = {ci|1 ≤ i ≤ n} be all the
classes existing in the system.
BDM(ci) = Σ_{∀cj ∈ C, i ≠ j} SumWRP(ci, cj).   (A.2)
Table A.2 summarizes the sum of the weighted reachable paths obtained from the OSBDM in Fig. A.5(b)
and the BDM of each class in Fig. A.3(b). The BDM is used to predict change-proneness; the higher a class's
BDM, the more likely the class is to be changed.
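Eq. (A.2) reduces to a row sum over the SumWRP matrix. The sketch below hand-encodes the nonzero entries of Table A.2; the dictionary layout is an illustrative assumption.

```python
# Nonzero SumWRP(ci, cj) entries from Table A.2, as rounded in the table.
SUM_WRP = {
    "c1": {"c2": 0.9, "c3": 0.5, "c4": 0.67, "c6": 0.2},
    "c2": {"c3": 0.8, "c4": 1.2, "c5": 0.8},
    "c3": {"c4": 0.8},
}

def bdm(ci):
    """BDM(ci): sum of SumWRP(ci, cj) over all other classes cj."""
    return sum(SUM_WRP.get(ci, {}).values())

print(bdm("c1"))  # 2.27 (up to float rounding): the most change-prone class
print(bdm("c2"))  # 2.8
```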
A.3 Change-Proneness Modeling
In this section, we describe the method for building a change-proneness prediction model. In our study,
change-proneness is used for predicting change-prone classes in successive versions.
Model Construction Method
To build the change-proneness prediction model, there are a large number of modeling techniques from
which to choose, including standard statistical techniques (e.g., logistic regression) and data mining techniques
(e.g., decision trees [41]). Multiple linear regression provides a regression analysis of variance for a dependent
variable explained by one or more factor variables. Hence, we choose a stepwise multiple regression [26] to build
the change-proneness model in this study. While constructing the regression, we remove outliers that are clearly
over-influential on the regression results. Two kinds of techniques can be used for outlier analysis: Standard errors
of the predicted values (S.E. of mean predictions) and the Mahalanobis distance [60]. The former is an estimate
Table A.3: The number of ground facts regarding 13 subsequent versions of JFlex (versions 1.3 to 1.4.3).
1.3 1.3.1 1.3.2 1.3.3 1.3.4 1.4pre1 1.4pre3 1.4pre4 1.4pre5 1.4 1.4.1 1.4.2 1.4.3
Package 3 3 3 4 4 4 4 5 5 5 5 5 5
Class 44 44 44 48 48 47 47 61 59 59 59 62 62
Interface 4 4 4 4 4 4 4 3 3 3 3 4 4
Invoked Message 1768 1828 1828 2042 2048 2071 2135 2426 2418 2401 2376 2651 2651
Reachable Path 1394 1442 1442 2335 2339 2362 2282 3244 3244 3249 3242 3317 3319
of the standard deviation of the average value of the dependent variable for cases that have the same values of the
independent variables. The latter is a measure of how much a case's values on the independent variables differ
from the average of all cases; a case is a data instance used to construct the prediction model. Hence, we identify
and remove the instances that have extremely large S.E. of mean predictions and large Mahalanobis distance
values.
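The Mahalanobis-distance screening can be sketched as follows. The cutoff and data are invented for demonstration, and this is not the tooling used in the study.

```python
import numpy as np

def mahalanobis_outliers(X, cutoff):
    """X: (n_cases, n_vars) array of independent variables.
    Returns a boolean mask flagging cases far from the centroid."""
    mu = X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    diff = X - mu
    # Squared Mahalanobis distance per case: diff_i^T * cov_inv * diff_i
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(d2) > cutoff

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[0] = [10.0, 10.0, 10.0]            # an obviously over-influential case
mask = mahalanobis_outliers(X, cutoff=5.0)
print(bool(mask[0]), int(mask.sum()))  # the planted outlier is flagged
```

Flagged cases would then be dropped before fitting the stepwise regression, mirroring the removal of over-influential instances described above.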
Model Variables
We first collected several types of data from the object-oriented software.
The independent variables include the C&K metrics, Lorenz and Kidd metrics, MOOD metrics, and the
BDM. We collect the C&K metrics and Lorenz and Kidd metrics using [12]. These are the most widely used
metrics for evaluating object-oriented software. The set of metrics used in the case study are listed in the appendix.
To calculate the BDM, which is measured on UML models, we have developed a tool built on the EMF (Eclipse
Modeling Framework). It imports the UML 2.0 models in the format of XMI generated from [48], an Eclipse-
based UML 2.0 modeling tool made by the Rational Division of IBM.
Following a common analysis procedure [5], we first perform a Principal Component Analysis (PCA) to
identify the dimensions actually present in the data relating to the independent variables. We do not make use
of a PCA to select a subset of independent variables since, as discussed in [15], experience has shown that this
usually leads to suboptimal prediction models even though regression coefficients are easier to interpret. The re-
sulting principal components can be described in terms of categories such as size, complexity, cohesion, coupling,
inheritance and polymorphism (see the appendix).
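The dimension-identification step can be sketched with a plain eigen-decomposition PCA; the data below are synthetic, and the components are inspected rather than used to discard variables.

```python
import numpy as np

def principal_components(X):
    """PCA on standardized columns: returns eigenvalues (variance per
    component, largest first) and the corresponding eigenvectors."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each metric
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]          # largest variance first
    return eigvals[order], eigvecs[:, order]

rng = np.random.default_rng(1)
size = rng.normal(size=200)
# Two highly correlated "size-like" metrics plus one independent metric:
X = np.column_stack([size, size + 0.1 * rng.normal(size=200),
                     rng.normal(size=200)])
eigvals, _ = principal_components(X)
print(eigvals / eigvals.sum())  # the first component captures the two size metrics
```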
The dependent variable of the model is the change-proneness. To compute the change-proneness, the change
data, which are obtained using a class-level source code diff, are collected for each application class. Based on this
change data, the total amount of changes (i.e., source lines of code added and deleted) within consequent releases
are measured.
A.4 Case Study
This section presents the results of a case study, the objective of which is to validate the usefulness of the
BDM presented above. The first subsection explains the details of the system. In the next subsection, the goal of
the case study and the validation method are described. In the third section, results are presented and interpreted.
The last subsection ends with a discussion.
The Subjects
In order to investigate whether the BDM is statistically related to change-proneness, we need a target system
that has well-documented UML models with a class diagram, SDs, and an IOD, as well as subsequent releases for extracting
change-related information. For our experiment, we reverse-engineered the UML design models from the existing
system, JFlex, using a reverse engineering tool with manual supports. JFlex is a lexical analyzer generator for
Java, which is written in Java. JFlex takes a specially formatted specification file containing the details of a lexical
analyzer as input and creates a Java source file for the corresponding lexical analyzer. A number of reasons led us
to select JFlex for the case study:
• It has evolved through 14 generations (at the time that we conducted this case study) and has a recorded
history of changes.
• The full source code of each version is available because it is an open-source project.
• It contains a relatively large number of classes.
• It is mature. The release dates are February 20, 2001 for the initial version (version 1.3) and January 31,
2009 for the latest version (version 1.4.3).
• It was written in Java. Our BDM targets object-oriented software that uses inheritance relationships, in
which polymorphism and dynamic binding may occur.
Among the 14 releases of JFlex, version 1.3.5 is not considered in the case study because the changes made
between this version and version 1.3.4 are negligible. Table A.3 represents the number of ground facts regarding
the 13 successive versions of JFlex. The initial version of JFlex 1.3 consists of 44 Java classes and 1394 reachable
paths, while the latest version of JFlex 1.4.3 consists of 62 Java classes and 3319 reachable paths. The total
number of reachable paths can be less than the total number of invoked messages in each version of the system in
the following cases: (1) invoked messages that call methods of library classes are not considered (the scope
of the measurement is limited to the application classes of JFlex), and (2) invoked messages that call methods
within the same class are not considered, since these internal messages do not cause behavioral dependency.
We collected several types of data (i.e., existing object-oriented software metrics, the BDM, and change data)
for each class from nine versions of JFlex based on reverse-engineered models. It should be noted that we collected
metrics that are available on design models. In other words, we did not gather metrics that are obtainable only from
source codes, such as source lines of code (SLOC), number of fields (NOF), and number of parameters (NOP).
To select classes with a long history of changes, we included the classes from the initial version that remain in the
latest version (i.e., 42 classes for each version of JFlex). For each version on which the BDM and other metrics
are measured, the change data was measured by counting the total number of changes in the next four subsequent
Figure A.6: An SD that was reverse-engineered from source codes of JFlex version 1.3.
[Figure A.7 content: the validation procedure proceeds from collecting data (UML models, the BDM, existing OO metrics, and change data), through identifying dimensions of the data and clustering the data, to performing stepwise multiple regression and validating the usefulness of the BDM.]
Figure A.7: The validation procedure followed in this case study.
versions. This change data is used as change-proneness. We take 9 of the 13 releases into account because the
change data is not available in the last four versions; we finally obtained 378 instances of classes.
We easily reverse-engineered the class diagram from JFlex source code. On the other hand, reverse-engineering
SDs is difficult and sometimes even impossible [14], because an SD represents the partial behavior of the overall
system; SDs can exist in various forms according to the various users' views on the system. Thus, in this
case study, we construct the SD for each reachable path that consists of consecutive invoked messages, while
extracting the structural information from source codes and reflecting it in the SD as alt, opt, or loop combined
fragments. It should be noted that the SD in UML 2.0 uses the combined fragments to represent one or more
sequences (traces) rather than specifying all the possible scenarios [79]. Hence, we do not need to execute the
system and monitor its execution to retrieve meaningful information and reverse-engineer SDs from source codes.
Fig. A.6 shows an example of the reverse-engineered SD obtained from JFlex version 1.3. This SD corresponds
Table A.4: The analysis-of-variance (ANOVA) table that includes the results for each clustering variable
Cluster Error
Mean Square df Mean Square df F Sig.
MIF 320.333 1 .439 376 729.972 .000
PF 7500.000 1 .383 376 19583.333 .000
DIT .000 1 2.168 376 .000 1.000
NOC .593 1 .673 376 .881 .349
Table A.5: Descriptive statistics with respect to the four attributes for Group 1 (294 instances).
N Range Minimum Maximum Sum Mean Std. Variance Skewness
Statistic Statistic Statistic Statistic Statistic Statistic Std. Error Statistic Statistic Statistic Std. Error
PF 294 2 36 38 11088 37.71 .041 .701 .491 -2.052 .142
MIF 294 2 70 72 20958 71.29 .041 .701 .491 -.462 .142
NOC 294 5 0 5 84 .29 .051 .882 .778 4.033 .142
DIT 294 5 0 5 210 .71 .086 1.471 2.164 1.864 .142
to the reachable path from the object of the LexParse class to the object of the Out class with four messages:
CUP$LexParse$do_action, makeClass, sub, and dump. By analyzing source code from JFlex 1.3, for example,
we extracted 1394 reachable paths and, at the same time, constructed 1394 SDs with 1768 messages. The
IOD cannot be reversed from source codes; it is specified only from the early stages of software development to
help developers get an overview of the system. Thus, in this experiment, the Expected Execution Rate in the IOD
was not considered when calculating the BDM.
Goal and Validation Methodology
The goal of this case study is to confirm that the BDM is a significant additional explanatory variable over
and above that which has already been accounted for by other existing metrics when the system contains complex
inheritance relationships and polymorphism. It should be noted that the BDM considers dynamic features (see
Subsection A.1). As a result, we expect the BDM to more accurately predict behavioral dependency when the
system is involved in a complex dynamic binding occurrence environment. Indeed, accurate prediction of behav-
ioral dependency helps to construct a better change-proneness prediction model. In order to achieve this goal, the
validation procedure depicted in Fig. A.7 was performed.
To investigate whether the effect of the BDM differs according to the intensity of use of inheritance relationships
and polymorphism, we divide the data set into two groups and independently build change-proneness
prediction models for each group. We cluster the data into the following groups:
• (Group 1) contains comparatively more complex inheritance relationships and polymorphism.
• (Group 2) contains comparatively less complex inheritance relationships and polymorphism.
To classify the data set into these two groups, a clustering technique was applied. Clustering is also called data
Table A.6: Descriptive statistics with respect to the four attributes for Group 2 (84 instances).
N Range Minimum Maximum Sum Mean Std. Variance Skewness
Statistic Statistic Statistic Statistic Statistic Statistic Std. Error Statistic Statistic Statistic Std. Error
PF 84 0 26 27 2268 27.00 .000 .000 .000 .000 .263
MIF 84 1 73 74 6174 73.50 .055 .503 .253 .000 .263
NOC 84 2 0 2 16 .19 .060 .548 .301 2.782 .263
DIT 84 5 0 5 60 .71 .161 1.477 2.182 1.888 .263
segmentation and is used to partition large data sets into groups according to similarities. Clustering may serve
as a preprocessing step for classification, which would then operate on the detected clusters and the selected
attributes or features [41]. We used K-means clustering [19] with four inheritance- and polymorphism-related
metrics: PF (Polymorphism Factor), MIF (Method Inheritance Factor), NOC (Number of Children), and DIT
(Depth of Inheritance). Table A.4 represents the analysis-of-variance (ANOVA) table that includes the results for
each clustering variable. At the α = 0.05 significance level, MIF and PF are significant explanatory variables to
divide the groups since Sig. (p-value) ≈ 0.000 ≤ 0.05 = α. The descriptive statistics with respect to the four
attributes are provided for Groups 1 and 2 in Tables A.5 and A.6, respectively. The classification results show
that the classes in Group 1 have higher PF and NOC values than those in Group 2. Therefore, Group 1 can be
characterized as having more complex inheritance relationships and polymorphism than Group 2. As there is not
much difference in MIF values and no difference in DIT values between the two groups, we did not consider these
two attributes for identifying each group’s characteristics. In analyzing the cluster membership of the classes in
JFlex across the nine versions, we noticed that the classes developed earlier tend to belong to Group 1, while
the classes developed later tend to belong to Group 2. In other words, the JFlex system has evolved toward less
complex inheritance relationships and polymorphism.
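The grouping step can be sketched with a minimal K-means (k = 2) over the four metrics; the data rows are invented to resemble the PF/MIF/NOC/DIT profiles of Tables A.5 and A.6, and the study itself used a standard K-means implementation [19].

```python
import random

def kmeans2(points, iters=50, seed=0):
    """Minimal K-means with k = 2 over tuples of equal length."""
    rng = random.Random(seed)
    centers = rng.sample(points, 2)
    for _ in range(iters):
        clusters = ([], [])
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[d.index(min(d))].append(p)
        # Recompute each center as the mean of its cluster (keep it if empty).
        centers = [tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else c
                   for cl, c in zip(clusters, centers)]
    return centers, clusters

# Illustrative (PF, MIF, NOC, DIT) rows resembling Tables A.5 and A.6:
group1_like = [(38, 71, 1, 1), (37, 72, 0, 0), (38, 71, 0, 1)]
group2_like = [(27, 74, 0, 1), (27, 73, 0, 0), (27, 74, 1, 2)]
centers, clusters = kmeans2(group1_like + group2_like)
print(sorted(len(c) for c in clusters))  # the two PF levels separate cleanly
```

Because the PF gap between the groups dominates the Euclidean distance, the clustering recovers the intended high-PF/low-PF split, mirroring how PF discriminated the groups in Table A.4.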
For Groups 1 and 2, we construct two change-proneness prediction models: one between the change and
existing metrics and the other between the change and the BDM in addition to existing metrics. Consequently,
we analyzed four change-proneness prediction models in total. To validate the assertion that the BDM helps to
explain additional variation in change-proneness, we compare the goodness-of-fit of those two models. The
comparison shows that the BDM contributes to a better model fit. The results of the change-proneness
prediction models are presented and discussed in detail in the next subsection.
Results
[Results of the change-proneness prediction models.]
We evaluated the performance of the prediction models according to the goodness-of-the-fit (R-square) and
a sequence of the selection of independent variables. The sequence of the selection is important because the
independent variable that has the largest positive or negative correlation with the dependent variable is selected at
each step in a stepwise selection.
We first present the results of the two regression models obtained from the data set of Group 1. The results of
Table A.7: Prediction model using existing metrics in Group 1.
Unstandardized Standardized
coefficients coefficients
Selected variables B Std. Error Beta t Sig.
(Constant) .050 .054 .915 .361
COC .050 .011 .215 4.695 .000
PProtM -.010 .004 -.122 -2.809 .005
NOC .199 .042 .204 4.715 .000
WMC -.019 .004 -.399 -4.444 .000
MNOL .114 .030 .249 3.748 .000
the stepwise regression using C&K metrics, Lorenz and Kidd metrics, and MOOD metrics as candidate covariates
are presented in Table A.7. The prediction model included five variables. The model explains around 57 percent
of the variance of the data set and shows an adjusted R2 of 0.54. The sequence of variables entering into the model
is COC, PProtM, NOC, WMC, and MNOL. After including the BDM in addition to the existing metrics, we
obtain the prediction model shown in Table A.8. Around 64 percent of the variance in the
data set is explained and an adjusted R2 of 0.63 is obtained. In this model, the COC, BDM, PProtM, NOC, MNOL,
WMC, and NORM variables were included, in the order listed. Note that the BDM is the second
variable to be included, with a significance level of p < 0.00001. Even when accounting for the difference in
the number of covariates, the coefficient of determination (R2) is increased by 9 percent (from 0.54 to 0.63)
when using the BDM. Therefore, this experiment shows that the BDM helps to obtain a better change-proneness
prediction model. In other words, even though the existing metrics still do most of the lifting, the BDM captures
additional dimensions that enable the construction of a more accurate change-proneness prediction model.
From the data set of Group 2, we also constructed two regression models, one using only existing metrics
for the baseline of the comparison and the other using the BDM in addition to existing metrics, to investigate
whether the effect of the BDM performs differently according to the intensity of inheritance relationships and
polymorphism. In this group, the BDM was not included when constructing the regression model. Hence, the
results of the two models are the same. Table A.9 shows the results of the prediction model obtained from Group
2. This model explains around 75 percent of the change variance and shows an adjusted R2 of 0.74. The
goodness-of-fit of the prediction model in Group 2 is considerably higher than that of the prediction models in
Group 1 because fewer data instances were used to create the model.
[Interpretation of Results.]
In the experiment, we divided the data set into two groups, one with comparatively more complex and the
other with comparatively less complex inheritance relationships and polymorphism, in order to investigate the
effects of the BDM according to the intensity of inheritance relationships and polymorphism. Intensity of use
of inheritance relationships and polymorphism can be explained through the attributes (i.e., inheritance- and
polymorphism-related metrics) used for K-means clustering groups; PF and NOC were used for characterizing
each group, as mentioned in Subsection A.4. PF equals the number of actual method overrides divided by the
Table A.8: Prediction model using existing metrics and BDM in Group 1.
Unstandardized Standardized
coefficients coefficients
Selected variables B Std. Error Beta t Sig.
(Constant) .048 .053 .894 .372
COC .050 .011 .214 4.632 .000
BDM .019 .008 .108 2.431 .000
PProtM -.012 .004 -.144 -3.263 .001
NOC .202 .042 .206 4.811 .000
MNOL .120 .029 .264 4.114 .000
WMC -.017 .004 -.361 -3.931 .000
NORM .008 .003 .184 2.465 .014
Table A.9: Group 2 prediction model (The result is same whether the BDM is used or not because the BDM is not
included in the model).
Unstandardized Standardized
coefficients coefficients
Selected variables B Std. Error Beta t Sig.
(Constant) -1.211 .397 -3.052 .003
NOO -.013 .003 -.307 -4.280 .000
CL .021 .004 .571 5.393 .000
NOIS .816 .107 1.252 7.615 .000
RFC -.139 .024 -.798 -5.832 .000
maximum number of possible method overrides and is calculated as a fraction. The PF value increases as the
system uses method overriding; if the system overrides everything, the PF is 100%. If subclasses seldom override
their parent’s methods, PF will be low. NOC equals the number of immediate subclasses derived from a base
class and measures the breadth of a class hierarchy. Conversely, DIT measures the depth. Therefore, it can be
concluded that Group 1 contains more complex inheritance relationships and polymorphism than Group 2,
since the former has higher PF and NOC values than the latter.
It was determined that the BDM is a significant indicator as it helped to improve the accuracy of change-
proneness prediction in Group 1 only. This result was anticipated because in Group 1, the system redefines the
parent's methods more often (i.e., high PF) and inherits from parent classes more often (i.e., high NOC) than in
Group 2. In short, Group 1 contains a high degree of inheritance relationships and polymorphism, which may be
the reason for the high probability that dynamic binding will occur. In Group 2, the BDM was not selected
as a variable to explain the variance in change-proneness. This indicates that the BDM is no more useful than
existing metrics in systems that contain a low degree of inheritance relationships or polymorphism. By analyzing
the results of the experiment, we reached the conclusion that the BDM can help accurately predict changes when
the system contains a high degree of inheritance relationships and polymorphism. The accuracy of change-proneness
prediction improves because the BDM enables the receiver of a message to be bound to the class that actually
implements the method of the message, as mentioned in Subsection A.1. To put the point another way, the BDM
is a measure designed specifically to take the dynamic behavior of the system into account.
Discussion
In the case study, we showed that the BDM is a significant indicator for predicting change-proneness when
the system contains a high degree of inheritance relationships and polymorphism. However, it is not possible to
determine exact thresholds for a system's degree of inheritance relationships and polymorphism because
these thresholds are relative and empirical. Nevertheless, we do not need specific guidelines that tell us when to use
the BDM in change-proneness prediction. The BDM is an additional variable that may be used in conjunction
with existing metrics for explaining variance in change-proneness for systems where dynamic binding is likely to
occur. When constructing a change-proneness prediction model, the BDM is not selected if it cannot capture any
features over and above those captured by existing metrics. In other words, the BDM is selected as a significant
variable only if it helps to improve the accuracy of change-proneness prediction in addition to existing metrics.
In the case study, we used reverse-engineered UML models obtained from source code, even though model-
based change-proneness prediction was the goal of the study. This is because, in practice, most legacy systems
that have been developed and maintained for a long period of time do not have well-documented design models,
especially SDs and IODs. It is worth reiterating that an IOD is specified in the early stages of software
development and cannot be recovered from source code. If an IOD specifying the Expected Execution Rate of
each SD is available, a more accurate BDM may be obtained. In the future, we plan to use the UML models that
will soon be available from the Repository for Model-Driven Development (REMODD) project [30] in order to
explore the usefulness of the BDM for model-based change-proneness prediction.
The fit of the models for model-based change-proneness prediction is rather low compared to models for
code-based change-proneness prediction. This is because the information extracted from UML models is not as
sufficient for change-proneness prediction as the information extracted from source code. If other metrics derivable
only from source code were considered when building the change-proneness prediction model, the R² values would
be higher. For example, SLOC, which indicates the size of a class, is known to be a significant indicator of
change-proneness [4]. The goal of this study, however, is to determine whether the BDM helps to obtain a better
model fit. Therefore, we need only show that it provides an improved predictive model compared to models
considering only the existing metrics that are available on UML models. This provides the benefit of early change-
proneness prediction at the moment a design model becomes available, without the need to implement
source code.
The results of our earlier study [40] on JFreeChart [34] also showed that the BDM is a strong indicator,
complementary to the C&K metrics, for explaining the variance of changes. In this paper, we performed a new
experiment to compare the effect of the BDM under varying degrees of inheritance relationships and polymor-
phism. We used another system because complex dynamic bindings may not occur in the system examined in the
previous study, JFreeChart, which is a graphics library for generating various types of charts; in other words,
in JFreeChart, the dependencies caused by method calls would be simple. In our previous paper, we used only
the C&K metrics as the existing metrics, whereas in this case study we used more metrics to confirm that the BDM
is effective with regard to change-proneness prediction.
Summary

Identification and Selection of Refactorings
for Improving Maintainability of Object-Oriented Software

Object-oriented software undergoes continuous changes to support various maintenance activities, and these changes significantly degrade the quality of the software. Refactoring, one of the methods that can address this problem, restructures the design of object-oriented software without altering its external behavior; it improves maintainability, reduces maintenance costs, and shortens time-to-market.

In this thesis, we propose methods for systematic refactoring identification in object-oriented software: refactoring candidate identification and refactoring selection. To identify refactoring candidates, we propose top-down and bottom-up approaches. First, the top-down approach is a traditional one that identifies refactoring candidates based on heuristic rules for eliminating violations of the design principles of object-oriented software. In this thesis, we devise rules that identify refactoring candidates so as to reduce the dependencies among entities such as methods and classes. In establishing the rules, we are motivated by studies showing that dynamic information—how the system is used—is an important factor for predicting future changes to the software. Accordingly, to perform refactorings that effectively improve maintainability, entities are identified based on information about how users utilize the software, such as user scenarios and operational profiles, and refactoring candidates are identified around these entities. Second, the bottom-up approach finds refactoring candidates without relying on predefined patterns or rules; based on the concept of the maximal independent set from graph theory, we devise a method for grouping entities such as methods and classes. When grouping entities, we consider not only the conventional refactoring dependency—the phenomenon in which performing one refactoring conflicts with performing another—but also the refactoring effect dependency on maintainability, which is newly identified in this thesis. The entities in each maximal independent set are mapped to a group of elementary refactorings, and these refactorings can be performed at once. Finally, among the identified refactoring candidates, the refactorings to actually apply are selected. To this end, we devise a method for selecting multiple refactorings, in the form of a refactoring effect evaluation framework that supports the assessment and impact analysis of elementary refactorings. The framework evaluates the effect of each elementary refactoring using matrix computation and, based on this, selects the group of elementary refactorings that improves maintainability the most.

The proposed approach is evaluated on three open-source projects: jEdit, Columba, and JGIT. The experimental results show that dynamic information helps identify refactorings that effectively improve maintainability, because it identifies refactoring candidates in frequently changed classes. The results also show that the multiple-refactoring selection method raises the fitness function value faster and requires less cost in terms of search space and time. In addition, the experiments show that the refactoring effect dependency must be considered in order to properly select the group that improves maintainability the most.
Acknowledgments

Much time has passed since I entered Professor Doo-Hwan Bae's SE LAB at KAIST as a master's student in March 2005 until completing my doctorate. There were difficult times when I wondered whether the day would ever come when I would write the acknowledgments for my doctoral dissertation, but with the help of many people I was able to complete the Ph.D. program safely. Through this note I would like to thank everyone who helped and supported me.

First, I express my deep gratitude to my advisor, Professor Doo-Hwan Bae, who gave me so much encouragement and affection throughout my doctoral studies. I am truly proud that he is my advisor. I will pass on to others the love he gave me, and I will live admirably as his student.

I also deeply thank Professors 백종문, 맹승렬, 허재혁, and 홍장의, who took precious time out of their busy schedules to review this dissertation and offered careful suggestions and warm encouragement. I also express deep gratitude to Professors 차성덕, 권용래, and 이윤준, who showed me passion and exemplary conduct as researchers and supported me wholeheartedly.

The senior and junior labmates and classmates who stood by me during my doctoral studies were a great source of strength. First, I thank Dr. 지은경, who listened to my worries whenever times were hard and encouraged me sincerely and warmly. Her positive attitude in all matters, never losing the balance between work and family, was a great example to me. I also thank the two female students who were a great support during my lab life, 희진 and 현정. A cup of coffee with you after lunch was the tonic of my lab life. Having known each other from our dream-filled twenties all the way to marriage, we share so many memories. Thank you for being with me for eight years, and I hope our friendship will deepen and last a lifetime.

I also thank 영석, who quietly and steadily carried out his duties and became a pillar of the lab, and the Ph.D. students who brought energy to the lab and were a reliable support: 기곤, 지민, 지훈, and 동환. I hope you continue to do good research and succeed as you do now. I also thank 김영택 선배님, 한석, 종세, and 미선, who came from industry and worked hard at their studies; thanks to you I heard many good stories and experienced company life indirectly. I hope you finish your degree programs successfully and make many good memories while at school. I also thank 광의, (서)동원, and (임)유진, who took on classes, projects, and the lab's chores; I was truly touched by the refreshments you prepared for my defense. I believe you will handle anything well with your characteristic humor and positive energy. I also thank Nguyen Minh Chau, who came from Vietnam, speaks Korean well, and is kind-hearted; thank you for approaching me, a senior Ph.D. student, without hesitation and for sharing so many stories. Finally, I thank Professor Baik's students, to whom I grew attached while we spent time together: 박진희, 류덕산, 김다정, 김태현, 이광규, 이낙원, and 채민성. I hope your hard work in research brings you everything you wish for.

I also thank the senior and junior labmates who are no longer here but once shared life in the lab. I will never forget the warm hearts of 경아 언니, 상욱 오빠, (강)동원 오빠, 승훈 오빠, 경식 오빠, 형인, 선경, 찬희, 슬기, 현식, 수현 오빠, 성태 오빠, 태훈 오빠, 숙희 언니, 석중 오빠, 진웅, 범석 오빠, Marew, Khanh, Giang, 승재, 인규 선배, 신호승 선배, 형호 오빠, (송)지환 오빠, 진호 오빠, and Dr. 마유승, who advised me generously in my research and looked after me like family. Wherever you are, I wish you success and happiness.

I also thank the treasured friends I met during my degree program. In particular, I will never forget the time spent with my goddaughter 효실, who entered the Ph.D. program with me and shared its joys and sorrows, and 혁윤, my son's godfather. Congratulations once again on your marriage; I hope you build a happy family and achieve everything you want in your work as well. I also thank my master's classmates 영남, 은정, 영현 오빠, 형호, 태인, 지연 언니, 경수 오빠, and 준일 오빠. I hope you all keep striving, realize your goals, and reap good results.

I also send my gratitude and apologies to the friends who sincerely care for me even though, being far away, I have been poor at keeping in touch. In particular, I sincerely thank 형이, my childhood friend and super-mom of two sons; 현숙, who is enduring her final year of residency; ever-cheerful 혜진; (이)주희, who is pursuing a Ph.D. in the United States; 경민 언니, a smart mother and wife and a beauty of Sogang University's class of '99; and my wonderful friends who excel both at home and at work: 주연, 윤경, 선미, 선미, (강)유진, 보민, 준희 언니, and (하)주희 언니. I also thank the members of my husband's company, Weclay Inc.: 윤 대표, 우석, 동주, 주형, 대영, and 성진. It is wonderful to see you care for one another and work hard together, and I have no doubt you will bear good fruit. I will always be cheering for you.

I also thank my relatives, who prayed for me and never spared warm encouragement. I thank my beloved cousins, especially 지은 언니, 상도 오빠, 상훈 오빠, 슬기, 가은, 보원 언니, 창배 오빠, 솔 언니, 효정, and 정훈 in Brazil; Sarah, who got married today, Sungmin 언니, David 오빠, 소라 언니, and 정진 in Canada; and (이)현정 언니 and Hyun Hwa 언니 in Mexico.

Finally, I sincerely thank my beloved family. First, I am truly grateful to my parents, who raised 원준 with utmost devotion, healthy and lovely, from his birth and postnatal care until now, as he approaches his third birthday. Thanks to them I was able to graduate safely; we will do our part well so that their sacrifice is not in vain. I also sincerely thank my parents-in-law, who always pray for and care for me, my husband, and 원준. I live in gratitude every day for being cherished like a daughter and given overflowing love, and I will show you an even better life from now on. I also thank my older brother 건우 and my younger brother 정섭, who is working through demanding studies at UBC; I will pray that your hard work bears good fruit. And I thank the two men I love most in the world, my husband 송인권 and my son 송원준. My husband, you are my best research colleague and my best friend. Thank you for always being a great strength at my side, and for enjoying your demanding work without ever showing the strain. I hope everything you are doing goes well. Stay healthy.

Finally, although I could not mention everyone individually on these pages, I once again sincerely thank all those who have cared for and loved me.
Curriculum Vitae

Name: Ah-Rim Han (한아림)
Date of Birth: September 30, 1981
Place of Birth: Bucheon, Gyeonggi-do, Korea
Registered Domicile: Hawolgok-dong, Seongbuk-gu, Seoul
Address: Wooseong Villa Bldg. 109, Unit 301, 75 Gumi-dong, Bundang-gu, Seongnam-si, Gyeonggi-do
E-mail: [email protected]

Education
1997. 3. – 2000. 2. High School Affiliated with Seoul National University College of Education
2000. 3. – 2004. 2. Department of Computer Science, Sogang University (B.S.)
(Graduation Rank: No. 3 among all Computer Science students)
2005. 3. – 2007. 2. Department of Computer Science, KAIST (M.S.) (advisor: Doo-Hwan Bae)
Thesis: Behavioral Dependency Measurement in UML 2.0 Sequence Diagrams for Change-
proneness Prediction
2007. 3. – 2013. 8. Department of Computer Science, KAIST (Ph.D.) (advisor: Doo-Hwan Bae)
Thesis: Identification and Selection of Refactorings for Improving Maintainability of
Object-Oriented Software

Career
2004. 2. – 2004. 4. Intern, ZIO Interactive, Seoul, Korea
2004. 8. – 2004. 10. Intern, Peace Corps (Headquarters), Washington, D.C., USA
2007. 3. – 2007. 12. Chair of the Ph.D. students of the Computer Science Department, KAIST, Daejeon, Korea
2011. 3. – 2012. 8. SAMSUNG Scholarship Program, SAMSUNG Electronics (supported by Video Display
Division), Korea
Research Achievements
1. Ah-Rim Han, Sang-Uk Jeon, Doo-Hwan Bae, Jang-Eui Hong, Measuring behavioral dependency for im-
proving change-proneness prediction in UML-based design models, Journal of Systems and Software (JSS),
Vol. 83, No. 2, pp. 222-234, Feb. 2010. (http://dx.doi.org/10.1016/j.jss.2009.09.038)
2. In-Gwon Song, Sang-Uk Jeon, Ah-Rim Han, Doo-Hwan Bae, An approach to identifying causes of implied
scenarios using unenforceable orders, Information and Software Technology (IST), Vol. 53, No. 6, pp. 666-
681, Jun. 2011. (http://dx.doi.org/10.1016/j.infsof.2010.11.007)
3. Ah-Rim Han, Doo-Hwan Bae, Dynamic profiling-based approach to identifying cost-effective refactorings,
Information and Software Technology (IST), Vol. 55, No. 6, pp. 966-985, Jun. 2013.
(http://dx.doi.org/10.1016/j.infsof.2012.12.002)
Conference Activities
1. Ah-Rim Han, Sang-Uk Jeon, Doo-Hwan Bae, Jang-Eui Hong, Behavioral Dependency Measurement for
Change-proneness Prediction in UML 2.0 Design Models, Proceedings of the 32nd Annual IEEE International
Computer Software and Applications Conference (COMPSAC), pp. 76-83, Jul. 2008. (19.5% acceptance rate) (Invited to the
Special Issue of Journal of Systems and Software (JSS))
2. Ah-Rim Han, Doo-Hwan Bae, Automatic selection of multiple refactorings by considering refactoring effect
dependency, 28th IEEE/ACM International Conference on Automated Software Engineering (ASE) 2013,
submitted.
3. Ah-Rim Han, Sang-Uk Jeon, Jang-Eui Hong, Doo-Hwan Bae, Timing Consistency Checking in UML 2.0
Behavioral Models using OCL, Proceedings of the Korea Computer Congress (KCC), Vol. 33, No. 1, pp.
181-183, Jun. 2006. (in Korean)
4. Ah-Rim Han, Dong-Won Kang, Hyeon-Jeong Kim, Doo-Hwan Bae, An Approach to Extract Similar Process
for Knowledge-Based Software Process Tailoring, Proceedings of Workshop on Korea Software Engineering
Technology, Vol.5, No. 1, pp. 42-52, Aug. 2007. (in Korean)
5. Hyung-In Ihm, In-Gwon Song, Sang-Uk Jeon, Ah-Rim Han, Jang-Eui Hong, Doo-Hwan Bae, A Technique
of Power Consumption Estimation for Embedded Software Design Models, Proceedings of 2008 Korea Con-
ference on Software Engineering (KCSE), Vol. 10, No. 1, pp. 113-120, Feb. 2008. (in Korean)
6. Hyung-In Ihm, Ah-Rim Han, Sang-Uk Jeon, Doo-Hwan Bae, Jang-Eui Hong, Instruction Pattern-Based
Power Consumption Estimation for Embedded Software Design Models, Proceedings of 2009 Korea Confer-
ence on Software Engineering (KCSE), Vol. 11, No. 1, pp. 122-129, Feb. 2009. (in Korean)