84
Metrics and Problem Detection Jorge Ressia

05 Problem Detection

Embed Size (px)

Citation preview

Page 1: 05 Problem Detection

Metrics and Problem Detection

Jorge Ressia

Page 2: 05 Problem Detection

Software is complex.

The Standish Group, 2004

53% Challenged

18% Failed

29% Succeeded

Page 3: 05 Problem Detection

How large is your project?

1’000’000 lines of code

* 2 = 2’000’000 seconds

/ 3600 = 560 hours

/ 8 = 70 days

/ 20 = 3 months

Page 4: 05 Problem Detection

forward engineering

}

{

}

{

}

{

}

{

Page 5: 05 Problem Detection

forward engineering

actual development }

{

}

{

}

{

}

{}

{

}

{

}

{}

{

}

{

Page 6: 05 Problem Detection

forward engineering

actual development }

{

}

{

}

{

}

{}

{

}

{

}

{}

{

}

{

What is the current state?

What should we do?

Where to start?

How to proceed?

Page 7: 05 Problem Detection

reve

rse

engin

eerin

gforward engineering

}

{

}

{

}

{

}

{}

{

}

{

}

{}

{

}

{

actual development

Page 8: 05 Problem Detection

Reverse engineering is analyzing a subject system to:

identify components and their relationships, and

create more abstract representations.

Chikofky & Cross, 90

Page 9: 05 Problem Detection

}

{

}

{

}

{}

{

}

{

A large system contains lots of details.

How to judge its quality?

Page 10: 05 Problem Detection

http://moose.unibe.ch

http://loose.upt.ro/incode

Page 11: 05 Problem Detection

1Metrics

2Design

Problems

3Code Duplication

Page 12: 05 Problem Detection

1Metrics

Page 13: 05 Problem Detection

You cannot controlwhat you cannot measure.

Tom de Marco

Page 14: 05 Problem Detection

Metrics are functions that assign numbers to

products, processes and resources.

Page 15: 05 Problem Detection

Software metrics are measurements which

relate to software systems, processes or

related documents.

Page 16: 05 Problem Detection

Metrics compress system traits into numbers.

Page 17: 05 Problem Detection

Let’s see some examples...

Page 18: 05 Problem Detection

Examples of size metrics

NOM - number of methods

NOA - number of attributes

LOC - number of lines of code

NOS - number of statements

NOC - number of children

Lorenz, Kidd, 1994Chidamber, Kemerer, 1994

Page 19: 05 Problem Detection

McCabe, 1977

McCabe cyclomatic complexity (CYCLO) counts the number of independent paths through the code of a function.

interpretation can’t directly lead to improvement action

it reveals the minimum number of tests to write

Page 20: 05 Problem Detection

Chidamber, Kemerer, 1994

Weighted Method Count (WMC) sums up the complexity of class’ methods (measured by the metric of your choice; usually CYCLO).

interpretation can’t directly lead to improvement action

it is configurable, thus adaptable to our precise needs

Page 21: 05 Problem Detection

Chidamber, Kemerer, 1994

Depth of Inheritance Tree (DIT) is the (maximum) depth level of a class in a class hierarchy.

only the potential and not the real impact is quantified

inheritance is measured

Page 22: 05 Problem Detection

Coupling between objects (CBO) shows the number of classes from which methods or attributes are used.

Chidamber, Kemerer, 1994

no differentiation of types and/or intensity of coupling

it takes into account real dependencies not just declared ones

Page 23: 05 Problem Detection

Tight Class Cohesion (TCC) counts the relative number of method-pairs that access attributes of the class in common.

Bieman, Kang, 1995

TCC = 2 / 10 = 0.2

ratio values allow comparison between systems

interpretation can lead to improvement action

Page 24: 05 Problem Detection

Access To Foreign Data (ATFD) counts how many attributes from other classes are accessed directly from a measured class.

Marinescu 2006

Page 25: 05 Problem Detection

...

Page 26: 05 Problem Detection

2Design Problems

Page 27: 05 Problem Detection

McCall, 1977

Page 28: 05 Problem Detection

Metrics Assess and Improve Quality!

Really?

Page 29: 05 Problem Detection

McCall, 1977

Page 30: 05 Problem Detection

?Problem 2: implicit mapping

we don’t reason in terms of metrics, but in terms of design principles

Problem 1: metrics granularity

capture symptoms, not causes of problems

in isolation,they don’t lead to improvement solutions

Page 31: 05 Problem Detection

2 big obstacles in using metrics:

Thresholds make metrics hard to interpret

Granularity make metrics hard to use in isolation

Page 32: 05 Problem Detection

Can metrics help me in what I really care for? :)

Page 33: 05 Problem Detection

forward engineering

actual development }

{

}

{

}

{

}

{}

{

}

{

}

{}

{

}

{

How do I understand code?

How do I improve code?

I want nothing to do with metrics!

Page 34: 05 Problem Detection

How to get an initial understanding of a system?

Page 35: 05 Problem Detection

Metric ValueLOC 35175

NOM 3618

NOC 384

CYCLO 5579

NOP 19

CALLS 15128

FANOUT 8590

AHH 0.12

ANDC 0.31

Page 36: 05 Problem Detection

Metric ValueLOC 35175

NOM 3618

NOC 384

CYCLO 5579

NOP 19

CALLS 15128

FANOUT 8590

AHH 0.12

ANDC 0.31And now what?

Page 37: 05 Problem Detection

We need means to compare.

Page 38: 05 Problem Detection

hierarchies?

coupling?

Page 39: 05 Problem Detection

0.31ANDC

NOM

20.21 19

0.12

35175

NOP

NOC

418

0.15

8590

LOC

3618

9.42

5579

NOM

CALLS15128

384

FANOUT

9.72

0.56

AHH

CYCLO

The Overview Pyramid provides a metrics overview. Lanza, Marinescu 2006

Size Communication

Inheritance

Page 40: 05 Problem Detection

0.31ANDC

NOM

20.21 19

0.12

35175

NOP

NOC

418

0.15

8590

LOC

3618

9.42

5579

NOM

CALLS15128

384

FANOUT

9.72

0.56

AHH

CYCLO

Size

The Overview Pyramid provides a metrics overview. Lanza, Marinescu 2006

Page 41: 05 Problem Detection

0.31ANDC

NOM

20.21 19

0.12

35175

NOP

NOC

418

0.15

8590

LOC

3618

9.42

5579

NOM

CALLS15128

384

FANOUT

9.72

0.56

AHH

CYCLO

Communication

The Overview Pyramid provides a metrics overview. Lanza, Marinescu 2006

Page 42: 05 Problem Detection

0.31ANDC

NOM

20.21 19

0.12

35175

NOP

NOC

418

0.15

8590

LOC

3618

9.42

5579

NOM

CALLS15128

384

FANOUT

9.72

0.56

AHH

CYCLO

Inheritance

The Overview Pyramid provides a metrics overview. Lanza, Marinescu 2006

Page 43: 05 Problem Detection

0.31ANDC

NOM

20.21 19

0.12

35175

NOP

NOC

418

0.15

8590

LOC

3618

9.42

5579

NOM

CALLS15128

384

FANOUT

9.72

0.56

AHH

CYCLO

The Overview Pyramid provides a metrics overview. Lanza, Marinescu 2006

Page 44: 05 Problem Detection

...

HIGH

0.30

16

15

10

9

0.25

AVG

C++

4

5

0.20

LOW

Java

AVGLOW HIGH

0.24

10

13

7

0.20

10

0.16

7

4NOM/NOC

LOC/NOM

CYCLO/LOC

Page 45: 05 Problem Detection

0.31ANDC

NOM

20.21 19

0.12

35175

NOP

NOC

418

0.15

8590

LOC

3618

9.42

5579

NOM

CALLS15128

384

FANOUT

9.72

0.56

AHH

CYCLO

The Overview Pyramid provides a metrics overview. Lanza, Marinescu 2006

close to high close to average close to low

Page 46: 05 Problem Detection

The Overview Pyramid provides a metrics overview. Lanza, Marinescu 2006

close to high close to average close to low

Page 47: 05 Problem Detection

forward engineering

actual development }

{

}

{

}

{

}

{}

{

}

{

}

{}

{

}

{

How do I understand code?

How do I improve code?

I want nothing to do with metrics!

Page 48: 05 Problem Detection

How do I improve code?

Page 49: 05 Problem Detection

Breaking design principles, rules and best practices

deteriorates the code;

it leads to design problems.

Quality is more than 0 bugs.

Page 50: 05 Problem Detection

and 33%of all classes would require changes

Imagine changing just a small design fragment

Page 51: 05 Problem Detection

Design problems areexpensivefrequentunavoidable

How to detect and eliminate them?

Page 52: 05 Problem Detection

God Classes tend to centralize the intelligence of the system, to do everything and to use data from small data-classes.

Riel, 1996

Page 53: 05 Problem Detection

God Classes tendto centralize the intelligence of the system,to do everything andto use data from small data-classes.

Page 54: 05 Problem Detection

God Classescentralize the intelligence of the system,do everything anduse data from small data-classes.

Page 55: 05 Problem Detection

God Classesare complex,are not cohesive,access external data.

Page 56: 05 Problem Detection

God Classesare complex, WMC is highare not cohesive, TCC is lowaccess external data. ATFD more than few

Compose metrics into queries using

logical operato

rs

Page 57: 05 Problem Detection

Detection Strategies are metric-based queries to detect design flaws.

METRIC 1 > Threshold 1

Rule 1

METRIC 2 < Threshold 2

Rule 2

AND Quality problem

Lanza, Marinescu 2006

Page 58: 05 Problem Detection

A God Class centralizes too much intelligence in the system.

ATFD > FEW

Class uses directly more than a

few attributes of other classes

WMC ! VERY HIGH

Functional complexity of the

class is very high

TCC < ONE THIRD

Class cohesion is low

AND GodClass

Lanza, Marinescu 2006

Page 59: 05 Problem Detection

An Envious Method is more interested in data from a handful of classes.

ATFD > FEW

Method uses directly more than

a few attributes of other classes

LAA < ONE THIRD

Method uses far more attributes

of other classes than its own

FDP ! FEW

The used "foreign" attributes

belong to very few other classes

AND Feature Envy

Lanza, Marinescu 2006

Page 60: 05 Problem Detection

Data Classes are dumb data holders.

WOC < ONE THIRD

Interface of class reveals data

rather than offering services

AND Data Class

Class reveals many attributes and is

not complex

Lanza, Marinescu 2006

Page 61: 05 Problem Detection

Data Classes are dumb data holders.

AND

OR

Class reveals many

attributes and is not

complex

NOAP + NOAM > FEW

More than a few public

data

WMC < HIGH

Complexity of class is not

high

NOAP + NOAM > MANY

Class has many public

data

WMC < VERY HIGH

Complexity of class is not

very high

AND

Lanza, Marinescu 2006

Page 62: 05 Problem Detection

Shotgun Surgery depicts that a change in an operation triggers many (small) in a lot of different operation and classes.

Lanza, Marinescu 2006

Page 63: 05 Problem Detection

3Code Duplication

Page 64: 05 Problem Detection

What is Code Duplication?

What are the problems of it?

Page 65: 05 Problem Detection

Code Duplication Detection

Lexical Equivalence

Semantic Equivalence

Syntactical Equivalence

Page 66: 05 Problem Detection

Visualization of Copied Code Sequences

File A

File A

File B

File B

Page 67: 05 Problem Detection

Author Level Transformed Code Comparison Technique

Johnson 94 Lexical Substrings String-Matching

Ducasse 99 Lexical Normalized Strings String-Matching

Baker 95 Syntactical Parameterized Strings String-Matching

Mayrand 96 Syntactical Metrics Tuples Discrete comparison

Kontogiannis 97 Syntactical Metrics Tuples Euclidean distance

Baxter 98 Syntactical AST Tree-Matching

Source Code Transformed Code Duplication Data

Transformation Comparison

Page 68: 05 Problem Detection

Noise Elimination

…//assign same fastid as containerfastid = NULL;const char* fidptr = get_fastid();if(fidptr != NULL) { int l = strlen(fidptr); fastid = newchar[ l + 1 ];

fastid=NULL;constchar*fidptr=get_fastid();if(fidptr!=NULL)intl=strlen(fidptr)fastid = newchar[l+]

Page 69: 05 Problem Detection

Enhanced Simple Detection Approach

Page 70: 05 Problem Detection

a b c d a b c d

lines from source

lines

from

source

Page 71: 05 Problem Detection

a b c d a x y d

lines from source

lines

from

source

Page 72: 05 Problem Detection

a b c a b x y c

lines from source

lines

from

source

Page 73: 05 Problem Detection

a x b x c x d x

lines from source

lines

from

source

Page 74: 05 Problem Detection

lines from source 2

lines

from

source 1

Page 75: 05 Problem Detection

lines from source 2

lines

from

source 1

Page 76: 05 Problem Detection

lines from source 2

lines

from

source 1

exact

chunk

Page 77: 05 Problem Detection

lines from source 2

lines

from

source 1

exact

chunk

line

bias

Page 78: 05 Problem Detection

lines from source 2

lines

from

source 1

exact

chunk

exact

chunk

line

bias

Page 79: 05 Problem Detection

Significant Duplication: - It is the largest possible duplication chain uniting all exact clones that are close enough to each other.- The duplication is large enough.

Lanza, Marinescu 2006

Page 80: 05 Problem Detection

1Metrics

2Design

Problems

3Code Duplication

Page 81: 05 Problem Detection

God

Class

Brain

Class

Feature

Envy

Data

Class

Brain

Method

Significant

Duplication

Intensive

Coupling

Extensive

Coupling

Shotgun

Surgery

Tradition

Breaker

Refused

Parent

Bequest

uses

has

is

has

has

has (partial)

is partially

has

is

is

has

Futile

Hierarchy

uses

has

has

is

has (subclass)

Classification

Disharmonies

Identity

Disharmonies

Collaboration

Disharmonies

Lanza, Marinescu 2006

Page 82: 05 Problem Detection

Don’t reason about quality in terms of numbers!

Follow a clear and repeatable process

Page 83: 05 Problem Detection

QA is part of the the Development Process

http://loose.upt.ro/incode

Page 84: 05 Problem Detection

Tudor Gîrbawww.tudorgirba.com

creativecommons.org/licenses/by/3.0/

Jorge Ressia