Quality prediction model for object oriented software using UML metrics



Ana Erika Camargo, Koichiro Ochimizu

Japan Advanced Institute of Science and Technology

4th World Congress for Software Quality – Bethesda, Maryland, USA – September 2008


Outline

• Objective
• Scope
• Our Approach
• Related Work
• Design complexity metrics and UML
• Prediction Technique
• Case Study
• Conclusions and Future work


Objective

To create a model that is able:

• To predict fault-prone* code in early phases of the software life cycle

• To detect possible defects in the software

(*) Fault-prone code: code that is likely to contain bugs.


Scope

[Figure: Causal-loop diagram. Variables: Misunderstanding of Requirements, Complex Specifications, Complex Design, Designers' experience, Wrong Design, Complex Implementation, Developers' experience, Wrong Implementation, Fault-prone code. Legend: S = change in the same direction; O = change in the opposite direction; the highlighted region marks the scope of this study.]


Our Approach

[Figure: Overview of the approach. Related existing works predict fault-prone code from design complexity metrics measured FROM code. Our approach predicts before coding: the design complexity metrics are approximated FROM UML artifacts, to obtain good candidates for fault-proneness prediction.]


Related work: Fault prediction

Prediction models of fault-proneness:

Study | Input: design complexity metrics | Output | Prediction technique
Basili et al. [1996] | CK metrics, among others | Fault-prone classes | Multivariate logistic regression
Briand et al. [2000] | | Fault-prone classes | Multivariate logistic regression
Kanmani et al. [2004] | | Fault ratio | General regression neural network
Nachiappan et al. [2005] | | Fault ratio | Multiple linear regression
Olague et al. [2007] | CK, QMOOD | Fault-prone classes | Multivariate logistic regression

CK : Chidamber & Kemerer, QMOOD: Quality Metrics for Object Oriented Design


Related work: Fault prediction

From these studies, we identified the following metrics as useful for predicting the fault-proneness of code:

• Chidamber and Kemerer – CK
1. Depth of Inheritance Tree (DIT)
2. Number of Children (NOC)
3. Weighted Methods per Class (WMC)
4. Coupling Between Objects (CBO)
5. Response For a Class (RFC)
6. Lack of Cohesion of Methods (LCOM)

• Bansiya and Davis's quality metrics – QMOOD
7. Average of DIT for all classes in the system (ANA)
8. Class Interface Size (CIS)
9. Data Access Metric (DAM)
10. Direct Class Coupling (DCC)
11. Measure of Aggregation (MOA)
12. Measure of Functional Abstraction (MFA)
13. Number of Methods (NOM, same as WMC)


Related work: UML & Design Complexity Metrics

• Tang et al. [2002]: measures CK metrics from data structures created from Rational Rose class, collaboration, and activity diagrams.

Issue: to obtain accurate measures, assumptions are made about the level of detail in the diagrams. For example, one activity diagram is required per operation in the system.


Related work: UML & Design Complexity Metrics

• Baroni [2002]: formal definition of CK and QMOOD metrics, among others. This work uses UML class diagrams.

Issues:
• RFC and LCOM calculations are code dependent.
• The CBO calculation does not clearly specify whether the methods used, or the variables instantiated, from other classes within every method of a class are included.


UML & Design Complexity Metrics




Design complexity metrics that can be approximated using UML class diagrams:

• Chidamber and Kemerer – CK: Weighted Methods per Class (WMC), Depth of Inheritance Tree (DIT), Number of Children (NOC), Coupling Between Objects (CBO), Response For a Class (RFC), Lack of Cohesion of Methods (LCOM)

• Bansiya and Davis's quality metrics – QMOOD: Average of DIT for all classes in the system (ANA), Class Interface Size (CIS), Data Access Metric (DAM), Direct Class Coupling (DCC), Measure of Aggregation (MOA), Measure of Functional Abstraction (MFA), Number of Methods (NOM, same as WMC)

Legend: (a) can be obtained straightforwardly from CLASS diagrams; (b) cannot be calculated precisely from CLASS diagrams, because the implementation of the class bodies is needed.



CBO Approximation

• CBO-code: number of classes coupled to a given class (*)

• CBO-UML Approach 1 (UML collaboration diagram): a count of all messages sent to other objects

• CBO-UML Approach 2 (UML collaboration diagram): the same as Approach 1, but excluding messages that RETURN a value

(*) If a method of one class uses a method or an instance variable of another class, the two classes are said to be coupled.
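The two UML approximations can be sketched as simple counts over the messages of a collaboration diagram. This is a minimal illustration, not the authors' tool: the `Message` record and its fields are hypothetical, and "sent to other objects" is read here as any message whose receiver belongs to a different class than the sender.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """One message in a UML collaboration diagram (hypothetical representation)."""
    sender: str                  # class of the sending object
    receiver: str                # class of the receiving object
    name: str                    # operation name, e.g. "CommitFunds"
    returns_value: bool = False  # True for messages such as "x := op()"


def cbo_uml_1(cls: str, messages: list[Message]) -> int:
    """Approach 1: count messages sent by `cls` to objects of other classes."""
    return sum(1 for m in messages if m.sender == cls and m.receiver != cls)


def cbo_uml_2(cls: str, messages: list[Message]) -> int:
    """Approach 2: as Approach 1, but ignore messages that return a value."""
    return sum(
        1
        for m in messages
        if m.sender == cls and m.receiver != cls and not m.returns_value
    )


# Hypothetical diagram: class and message names are illustrative only.
msgs = [
    Message("Coordinator", "Customer", "CommitFunds", returns_value=True),
    Message("Coordinator", "Order", "Send"),
    Message("Coordinator", "Coordinator", "Log"),  # self-message: not coupling
]
print(cbo_uml_1("Coordinator", msgs))  # 2
print(cbo_uml_2("Coordinator", msgs))  # 1
```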



CBO Approximation

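[Figure: Excerpt of a UML collaboration diagram showing the object aCustomer and the message R7: fundsStatus := CommitFunds(), which returns a value.]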


UML & Design Complexity Metrics

CBO Evaluation using an e-commerce system (*)

(*) Described in: Hassan Gomaa, Designing Concurrent, Distributed, and Real-Time Applications with UML, Addison-Wesley Object Technology Series, July 2000.

[Figure: CBO per class (x-axis: class number; y-axis: CBO) for CBO-code, CBO-UML(1), and CBO-UML(2).]



CBO Evaluation

• For CBO-code and CBO-UML Approach 1

correlation coefficient = 0.81

• For CBO-code and CBO-UML Approach 2


CBO-UML Approach 2 has a slightly stronger linear relationship with CBO-code.
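The correlation coefficients reported on these slides measure how linearly the UML approximation tracks the code metric across classes. A minimal sketch of the Pearson correlation coefficient follows; the per-class values in the usage lines are hypothetical and do not come from the e-commerce system.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Covariance numerator and the two standard-deviation terms (unnormalized).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical per-class values of a code metric and its UML approximation.
cbo_code = [3, 5, 1, 0, 4]
cbo_uml = [2, 6, 1, 1, 3]
print(round(pearson(cbo_code, cbo_uml), 2))  # 0.88
```

On Python 3.10 or later, `statistics.correlation` computes the same quantity.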



RFC Approximation

• RFC-code: number of methods of a given class + number of methods of other classes directly called by any of the methods of the given class.

• RFC-UML Approach 1 (UML collaboration diagrams): messages received + messages sent

• RFC-UML Approach 2 (UML collaboration & class diagrams): (messages received + number of attributes × 2) + messages sent, where (messages received + number of attributes × 2) approximates the number of methods of the given class, assuming two public methods per attribute: one to get and one to set its value.



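RFC Approximation

```java
class C {
    A a;

    void m() {
        D d;         // how d is obtained is elided in the original example
        d.dosth();   // call to a method of another class (D)
        // …
    }

    void setA(A a) { this.a = a; }

    A getA() { return a; }
}
```

[Figure: Collaboration diagram for class C: object c receives message m() and sends message dosth() to object d.]

RFC(C) = 3 + 1 = 4 (the three methods of C, m(), setA(), and getA(), plus the one method of another class that C calls, dosth())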
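Applying both RFC-UML approximations to class C makes the role of the attribute term visible. In this sketch the counts are read off the example (C receives m(), sends dosth(), and has one attribute a); the function names are illustrative only.

```python
def rfc_uml_1(received: int, sent: int) -> int:
    """Approach 1: messages received + messages sent (collaboration diagrams)."""
    return received + sent


def rfc_uml_2(received: int, attributes: int, sent: int) -> int:
    """Approach 2: (messages received + 2 * attributes) + messages sent.

    received + 2 * attributes approximates the class's own methods,
    assuming one public getter and one public setter per attribute.
    """
    return (received + 2 * attributes) + sent


# Class C: receives m(), has one attribute (a), sends dosth() to d.
print(rfc_uml_1(received=1, sent=1))                # 2
print(rfc_uml_2(received=1, attributes=1, sent=1))  # 4, same as RFC(C) = 3 + 1
```

Approach 1 misses setA() and getA() because they never appear as messages in the collaboration diagram; the attribute term in Approach 2 stands in for such accessors.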


UML & Design Complexity Metrics

RFC Evaluation using the same e-commerce system

[Figure: RFC per class (x-axis: class number; y-axis: RFC) for RFC-code, RFC-UML(1), and RFC-UML(2).]



RFC Evaluation

• For RFC-code and RFC-UML Approach 1

correlation coefficient = -0.07

• For RFC-Code and RFC-UML Approach 2


RFC-UML Approach 2 has a stronger linear relationship with RFC-Code



Remark

Although the assumption behind our second approach may not always hold, it still performed acceptably.

This might be explained by the fact that, according to Olague et al. [2007], the number of private attributes in a class is moderately correlated with its number of methods.



Design complexity metrics that can be approximated using UML diagrams:

• Chidamber and Kemerer – CK: Weighted Methods per Class (WMC), Depth of Inheritance Tree (DIT), Number of Children (NOC), Coupling Between Objects (CBO), Response For a Class (RFC), Lack of Cohesion of Methods (LCOM)

• Bansiya and Davis's quality metrics – QMOOD: Average of DIT for all classes in the system (ANA), Class Interface Size (CIS), Data Access Metric (DAM), Direct Class Coupling (DCC), Measure of Aggregation (MOA), Measure of Functional Abstraction (MFA), Number of Methods (NOM, same as WMC)

Legend: (a) can be obtained straightforwardly from CLASS diagrams; (b) can be approximated using COLLABORATION diagrams; (c) cannot be approximated precisely using UML diagrams.


Prediction Technique

Predict: How?

[Figure: Overview of the approach. Design complexity metrics (12) approximated FROM UML artifacts, and design complexity metrics (13) measured FROM code as in related existing works, are used to predict fault-prone code.]



Logistic Regression

• Use. When there is one outcome variable (y) with two values (e.g. faulty/not faulty, 1/0) and one or more measurement variables (xs).

• Goal. To predict the probability that y takes a particular value, given the xs, through a logit model.

• Key points. No assumptions are made about the distribution of the measurement variables (xs).



Logistic Regression



Example. We want to estimate the probability that a class is highly FAULTY, in terms of a design complexity metric Mx.



FAULTY: Most Faulty (MF) = 1, Least Faulty (LF) = 2

Design complexity Metric: Mx

CLASS | FAULTY | Mx | CLASS | FAULTY | Mx
1 | 1 | 1 | 13 | 2 | 1
2 | 1 | 1 | 14 | 2 | 0
3 | 1 | 1 | 15 | 2 | 0
4 | 1 | 1 | 16 | 2 | 0
5 | 1 | 1 | 17 | 2 | 0
6 | 1 | 1 | 18 | 2 | 0
7 | 1 | 1 | 19 | 2 | 0
8 | 1 | 1 | 20 | 2 | 0
9 | 1 | 1 | 21 | 2 | 0
10 | 1 | 1 | 22 | 2 | 0
11 | 1 | 0 | 23 | 2 | 0
12 | 1 | 0 | 24 | 2 | 0

CLASS | Mx=1 | Mx=0 | Total
MF=1 | 10 | 2 | 12
LF=2 | 1 | 11 | 12
Total | 11 | 13 | 24
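The 2×2 table is a cross-tabulation of the 24 (CLASS, FAULTY, Mx) rows. This short sketch rebuilds it from the listed data; the row encoding (FAULTY 1 = MF, 2 = LF) follows the slide, and the helper name is illustrative.

```python
from collections import Counter

# (class number, FAULTY, Mx) rows from the example; FAULTY 1 = MF, 2 = LF.
ROWS = (
    [(c, 1, 1) for c in range(1, 11)]     # classes 1-10: MF, Mx = 1
    + [(c, 1, 0) for c in range(11, 13)]  # classes 11-12: MF, Mx = 0
    + [(13, 2, 1)]                        # class 13: LF, Mx = 1
    + [(c, 2, 0) for c in range(14, 25)]  # classes 14-24: LF, Mx = 0
)


def cross_tab(rows):
    """Count classes in each (FAULTY, Mx) cell."""
    return Counter((faulty, mx) for _, faulty, mx in rows)


table = cross_tab(ROWS)
for label, f in (("MF", 1), ("LF", 2)):
    print(label, table[(f, 1)], table[(f, 0)], table[(f, 1)] + table[(f, 0)])
# MF 10 2 12
# LF 1 11 12
```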



Probabilities

• The probability that any given CLASS is MF:

P(MF) = 12/24 = 0.50

• The probability that any given CLASS is MF, given that Mx=1: P(MF|Mx=1) = 10/11 = 0.909

• The probability that any given CLASS is MF, given that Mx=0: P(MF|Mx=0) = 2/13 = 0.154




Odds

• The odds of a CLASS being MF:

Odds(MF) = 12/12 = 1

• The odds of a CLASS being MF, given that Mx=1: Odds(MF|Mx=1) = 10/1 = 10 (1)

• The odds of a CLASS being MF, given that Mx=0: Odds(MF|Mx=0) = 2/11 = 0.182 (2)
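Probabilities divide MF counts by the column (or grand) total, while odds divide MF counts by LF counts. This sketch computes both from the 2×2 table with exact fractions; the `COUNTS` layout and function names are assumptions made for illustration.

```python
from fractions import Fraction

# Cell counts from the 2x2 table: Mx value -> (MF count, LF count).
COUNTS = {1: (10, 1), 0: (2, 11)}


def prob_mf(mx=None):
    """P(MF) overall, or P(MF | Mx = mx) when mx is given."""
    if mx is None:
        mf = sum(c[0] for c in COUNTS.values())
        total = sum(sum(c) for c in COUNTS.values())
    else:
        mf, total = COUNTS[mx][0], sum(COUNTS[mx])
    return Fraction(mf, total)


def odds_mf(mx=None):
    """Odds(MF) overall, or Odds(MF | Mx = mx): MF count over LF count."""
    if mx is None:
        mf = sum(c[0] for c in COUNTS.values())
        lf = sum(c[1] for c in COUNTS.values())
    else:
        mf, lf = COUNTS[mx]
    return Fraction(mf, lf)


print(prob_mf(), prob_mf(1), prob_mf(0))  # 1/2 10/11 2/13
print(odds_mf(), odds_mf(1), odds_mf(0))  # 1 10 2/11
```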




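• Odds and probabilities provide the same information in different ways.

• It is easy to convert odds to probabilities and vice versa, e.g.:

$$P(MF \mid \mathit{Mx}=1) = \frac{\mathrm{Odds}(MF \mid \mathit{Mx}=1)}{1 + \mathrm{Odds}(MF \mid \mathit{Mx}=1)} = \frac{10}{1+10} = 0.909$$

$$\mathrm{Odds}(MF \mid \mathit{Mx}=1) = \frac{P(MF \mid \mathit{Mx}=1)}{1 - P(MF \mid \mathit{Mx}=1)} = \frac{0.909}{1-0.909} \approx 10$$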
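The two conversion formulas can be written as a pair of inverse functions. A minimal sketch (function names are illustrative); the last line shows why the slide writes "≈ 10" when the probability is first rounded to 0.909.

```python
def odds_to_prob(odds: float) -> float:
    """p = odds / (1 + odds)."""
    return odds / (1 + odds)


def prob_to_odds(p: float) -> float:
    """odds = p / (1 - p); undefined for p = 1."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return p / (1 - p)


print(round(odds_to_prob(10), 3))       # 0.909
print(round(prob_to_odds(10 / 11), 3))  # 10.0 (exact probability)
print(round(prob_to_odds(0.909), 2))    # 9.99 (rounded probability)
```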



• Applying the natural log to (1) and (2):

$$\ln[\mathrm{Odds}(MF \mid \mathit{Mx}=1)] = \ln(10) = 2.303 \tag{3}$$

$$\ln[\mathrm{Odds}(MF \mid \mathit{Mx}=0)] = \ln(0.182) = -1.704 \tag{4}$$

• We can generalize (3) and (4) as:

$$\ln[\mathrm{Odds}(MF \mid \mathit{Mx})] = A + B \cdot \mathit{Mx} \tag{5}$$

• From (3) and (5), when Mx = 1:

$$\ln[\mathrm{Odds}(MF \mid \mathit{Mx}=1)] = A + B = 2.303 \tag{6}$$

• From (4) and (5), when Mx = 0:

$$\ln[\mathrm{Odds}(MF \mid \mathit{Mx}=0)] = A = -1.704 \tag{7}$$

• From (6) and (7): A = -1.704, B = 4.007

• Finally, we can rewrite (5) as:

$$\ln[\mathrm{Odds}(MF \mid \mathit{Mx})] = -1.704 + 4.007 \cdot \mathit{Mx}$$
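Equations (6) and (7) form a two-equation linear system whose solution is the intercept and slope of the log-odds line. This sketch solves it numerically; it uses the exact odds 2/11 instead of the rounded 0.182, which is why A comes out as -1.705 rather than the slide's -1.704.

```python
import math

odds_mx1 = 10 / 1  # Odds(MF | Mx = 1), equation (1)
odds_mx0 = 2 / 11  # Odds(MF | Mx = 0), equation (2)

A = math.log(odds_mx0)      # (7): log-odds when Mx = 0
B = math.log(odds_mx1) - A  # from (6): A + B = ln Odds(MF | Mx = 1)
print(round(A, 3), round(B, 3))  # -1.705 4.007
```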


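$$\ln[\mathrm{Odds}(MF \mid \mathit{Mx})] = -1.704 + 4.007 \cdot \mathit{Mx}$$

• If:

$$\mathrm{Odds}(MF \mid \mathit{Mx}) = \frac{p}{1-p}, \qquad p = P(MF \mid \mathit{Mx})$$

• We can rewrite our final equations as:

$$\ln\left[\frac{p}{1-p}\right] = -1.704 + 4.007 \cdot \mathit{Mx}$$

$$p = P(MF \mid \mathit{Mx}) = \frac{1}{1 + e^{-(-1.704 + 4.007 \cdot \mathit{Mx})}}$$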
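The hand derivation works because Mx is binary: the univariate logistic model then fits each cell's log-odds exactly, so maximum likelihood gives the same A and B. The sketch below evaluates the final model and, as a cross-check, fits a univariate logit by Newton-Raphson on the 24 example classes. It is a generic illustration of maximum-likelihood fitting under the assumption y = 1 for MF and y = 0 for LF, not the authors' statistical software.

```python
import math


def p_mf(mx: float, a: float = -1.704, b: float = 4.007) -> float:
    """P(MF | Mx) under the logit model ln[p / (1 - p)] = a + b * Mx."""
    return 1.0 / (1.0 + math.exp(-(a + b * mx)))


def fit_univariate_logit(xs, ys, iters=25):
    """Maximum-likelihood fit of ln[p / (1 - p)] = a + b * x by Newton-Raphson.

    ys must be 0/1. Assumes the data are not perfectly separable.
    """
    a = b = 0.0
    for _ in range(iters):
        # Gradient (ga, gb) and information matrix [[haa, hab], [hab, hbb]].
        ga = gb = haa = hab = hbb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            ga += y - p
            gb += (y - p) * x
            haa += w
            hab += w * x
            hbb += w * x * x
        det = haa * hbb - hab * hab
        # Newton step: solve the 2x2 system information * delta = gradient.
        a += (hbb * ga - hab * gb) / det
        b += (haa * gb - hab * ga) / det
    return a, b


# The 24 example classes: Mx values, and y = 1 for MF, 0 for LF.
xs = [1] * 10 + [0] * 2 + [1] + [0] * 11
ys = [1] * 12 + [0] * 12
a_hat, b_hat = fit_univariate_logit(xs, ys)
print(round(a_hat, 3), round(b_hat, 3))      # -1.705 4.007
print(round(p_mf(1), 3), round(p_mf(0), 3))  # 0.909 0.154
```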


Case study


[Figure: Overview of the approach. Fault-prone code is predicted using logistic regression, from design complexity metrics approximated FROM UML artifacts and from metrics measured FROM code as in related existing works.]

Are the candidate UML metrics good enough to predict fault-proneness?


Case study

Objective: Estimate, using logistic regression, the probability that a class is found faulty during the testing phase.


Case study

Description. Using the design and implementation of the e-commerce system described in Gomaa’s book, this case study was carried out as follows:

• Collection of UML and code metrics (Xs)

• Collection of data on the faults of the e-commerce system from the logs of the CVS repository used (Y)

• Evaluation of the relationship between each metric and fault-proneness, using univariate logistic models


Case study

Metrics to evaluate. Because the e-commerce system was designed and implemented without class inheritance, each candidate metric is classified by whether it depends on inheritance:

Suite | Code metric | Level | Inheritance metric
QMOOD | Average Number of Ancestors (ANA) | System | Yes
QMOOD | Measure of Aggregation (MOA) | Class | No
QMOOD | Class Interface Size (CIS)* | Class | No
QMOOD | Data Access Metric (DAM) | Class | No
QMOOD | Direct Class Coupling (DCC) | Class | No
QMOOD | Measure of Functional Abstraction (MFA) | Class | Yes
QMOOD / CK | Number of Methods (NOM) = Weighted Methods per Class (WMC)* | Class | No
CK | Depth of Inheritance Tree (DIT) | Class | Yes
CK | Number of Children (NOC) | Class | Yes
CK | Response For a Class (RFC)* | Class | No
CK | Coupling Between Objects (CBO) | Class | No

(*) Found to be good predictors of fault-prone code by Olague et al. [2007].


Case study

Estimation of the probability of a class being faulty, using CBO-code.

Class number | Actual (y) | Predicted probability using CBO-code (p) | Predicted (y)
1 | Non-faulty | 0.2 | Non-faulty
2 | Non-faulty | 0.2 | Non-faulty
3 | Faulty | 0.99903 | Faulty
4 | Faulty | 1 | Faulty
5 | Faulty | 1 | Faulty
6 | Faulty | 1 | Faulty
7 | Faulty | 0.2 | Non-faulty
8 | Faulty | 1 | Faulty
9 | Faulty | 1 | Faulty
10 | Faulty | 1 | Faulty
11 | Non-faulty | 0.2 | Non-faulty
12 | Faulty | 1 | Faulty
13 | Non-faulty | 0.2 | Non-faulty

$$p = \frac{1}{1 + e^{1.3863 - 8.3282 \cdot \mathit{CBO}}}$$

Correctness: 12/13 classes (92.3% of classes correctly classified)
Sensitivity: 8/9 faulty classes (88.9% of faulty classes correctly classified)
Specificity: 4/4 non-faulty classes (100% of non-faulty classes correctly classified)
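Correctness, sensitivity, and specificity are ratios over the confusion counts between actual and predicted labels. This sketch recomputes them from the table above. The 0.5 probability cutoff used to turn p into a predicted label is an assumption (it reproduces the table's predictions), and `p_faulty_cbo` is a hypothetical name for the fitted CBO-code model as read from the slide.

```python
import math


def p_faulty_cbo(cbo: float) -> float:
    """Fitted univariate model p = 1 / (1 + e^(1.3863 - 8.3282 * CBO))."""
    return 1.0 / (1.0 + math.exp(1.3863 - 8.3282 * cbo))


def classify(probs, cutoff=0.5):
    """Label a class faulty (1) when its estimated probability reaches the cutoff."""
    return [1 if p >= cutoff else 0 for p in probs]


def classification_rates(actual, predicted):
    """Return (correctness, sensitivity, specificity) for 0/1 labels, 1 = faulty."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # faulty, predicted faulty
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # non-faulty, predicted so
    positives = sum(1 for a, _ in pairs if a == 1)
    negatives = len(pairs) - positives
    return (tp + tn) / len(pairs), tp / positives, tn / negatives


# Actual labels and predicted probabilities for classes 1-13 from the table.
actual = [0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0]
probs = [0.2, 0.2, 0.99903, 1, 1, 1, 0.2, 1, 1, 1, 0.2, 1, 0.2]

correctness, sensitivity, specificity = classification_rates(actual, classify(probs))
print(f"{correctness:.1%} {sensitivity:.1%} {specificity:.1%}")  # 92.3% 88.9% 100.0%
print(round(p_faulty_cbo(0), 2), round(p_faulty_cbo(1), 5))     # 0.2 0.99903
```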


Case study

Results. From the univariate models, using each of the proposed metrics:

Metrics | Correctness [classes] | Sensitivity [faulty classes] | Specificity [non-faulty classes]
CBO-code | 92.3% | 88.88% | 100%
CBO-UML(1) | 69.2% | 66.66% | 75%
CBO-UML(2) | 69.2% | 55.55% | 100%
RFC-code | 84.61% | 88.88% | 75%
RFC-UML(1) | 76.92% | 77.77% | 75%
RFC-UML(2) | 84.61% | 88.88% | 75%
WMC-code | 90.9% | 85.7% | 100%
WMC-UML | 72.7% | 71.42% | 75%
CIS-code | 90.9% | 85.7% | 100%
CIS-UML | 90.9% | 100% | 75%
DAM-code | 36.3% | 57.14% | 0%
DAM-UML | 72.7% | 85.7% | 50%

DCC measures were not significant for this study.

CIS: number of public methods in a class

DAM: ratio of the number of private and protected attributes to the total number of attributes
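Both definitions can be computed from a class diagram's method and attribute visibilities. A minimal sketch, assuming a hypothetical `ClassSummary` record; returning 0 for a class without attributes is an assumption, since the slide does not define that case.

```python
from dataclasses import dataclass, field


@dataclass
class ClassSummary:
    """Hypothetical per-class summary, e.g. read off a UML class diagram."""
    public_methods: int
    attribute_visibilities: list[str] = field(default_factory=list)


def cis(c: ClassSummary) -> int:
    """CIS: number of public methods in the class."""
    return c.public_methods


def dam(c: ClassSummary) -> float:
    """DAM: private + protected attributes divided by all attributes."""
    total = len(c.attribute_visibilities)
    if total == 0:
        return 0.0  # assumption: a class with no attributes gets DAM = 0
    hidden = sum(1 for v in c.attribute_visibilities if v in ("private", "protected"))
    return hidden / total


order = ClassSummary(
    public_methods=5,
    attribute_visibilities=["private", "private", "protected", "public"],
)
print(cis(order), dam(order))  # 5 0.75
```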


Case study

Results

• Our second approach to approximating RFC with UML diagrams performed as well as the RFC metric measured from code

• The UML approximation of CIS performed similarly to the CIS metric measured from code

• The remaining UML metrics performed moderately well (correctness between 69.2% and 76.92%)


Case study

Can we apply the obtained models to other case studies?

System | Metrics | Correctness [classes] | Sensitivity [faulty classes] | Specificity [non-faulty classes]
E-commerce | CBO-UML(1) | 69.2% | 66.66% | 75%
Banking | CBO-UML(1) | 72.7% | 100% | 50%
E-commerce | RFC-UML(1) | 76.92% | 77.77% | 75%
Banking | RFC-UML(1) | 72.7% | 100% | 50%
E-commerce | CBO-UML(2) | 69.2% | 55.55% | 100%
Banking | CBO-UML(2) | 63.6% | 80% | 50%
E-commerce | RFC-UML(2) | 84.61% | 88.88% | 75%
Banking | RFC-UML(2) | 72.7% | 80% | 66.6%


Conclusions and Future work

• UML metrics can be acceptable predictors of fault-prone code

• The UML CIS and UML RFC metrics showed a strong relationship to the fault-proneness of code

• We might be able to create a more robust model to predict fault-prone code before its implementation.


Conclusions and Future work

• Further study and evaluation of other metrics using other UML artifacts (e.g. sequence diagrams, state diagrams, and use case descriptions) is needed.

• Construction of a more robust model using multivariate logistic regression

• Evaluation of the final model using different case studies

