104
Metrics and Problem Detection Tudor Gîrba www.tudorgirba.com

Problem Detection (EVO 2008)

Embed Size (px)

DESCRIPTION

This set of slides was used a lecture on Problem Detection.

Citation preview

Page 1: Problem Detection (EVO 2008)

Metrics and Problem Detection

Tudor Gîrbawww.tudorgirba.com

Page 2: Problem Detection (EVO 2008)

Software is complex.

The Standish Group, 2004

53% Challenged

18% Failed

29% Succeeded

Page 3: Problem Detection (EVO 2008)

How large is your project?

Page 4: Problem Detection (EVO 2008)

How large is your project?

1’000’000 lines of code

Page 5: Problem Detection (EVO 2008)

How large is your project?

1’000’000 lines of code

* 2 = 2’000’000 seconds

Page 6: Problem Detection (EVO 2008)

How large is your project?

1’000’000 lines of code

* 2 = 2’000’000 seconds

/ 3600 = 560 hours

Page 7: Problem Detection (EVO 2008)

How large is your project?

1’000’000 lines of code

* 2 = 2’000’000 seconds

/ 3600 = 560 hours

/ 8 = 70 days

Page 8: Problem Detection (EVO 2008)

How large is your project?

1’000’000 lines of code

* 2 = 2’000’000 seconds

/ 3600 = 560 hours

/ 8 = 70 days

/ 20 = 3 months

Page 9: Problem Detection (EVO 2008)

forward engineering

}

{

}

{

}

{

}

{

Page 10: Problem Detection (EVO 2008)

forward engineering

actual development }

{

}

{

}

{

}

{}

{

}

{

}

{}

{

}

{

Page 11: Problem Detection (EVO 2008)

forward engineering

actual development }

{

}

{

}

{

}

{}

{

}

{

}

{}

{

}

{

What is the current state?

What should we do?

Where to start?

How to proceed?

Page 12: Problem Detection (EVO 2008)

reve

rse

engin

eerin

gforward engineering

}

{

}

{

}

{

}

{}

{

}

{

}

{}

{

}

{

actual development

Page 13: Problem Detection (EVO 2008)

Reverse engineering is analyzing a subject system to:

identify components and their relationships, and

create more abstract representations.

Chikofky & Cross, 90

Page 14: Problem Detection (EVO 2008)

}

{

}

{

}

{}

{

}

{

A large system contains lots of details.

Page 15: Problem Detection (EVO 2008)

}

{

}

{

}

{}

{

}

{

A large system contains lots of details.

How to judge its quality?

Page 17: Problem Detection (EVO 2008)

1Metrics

2Design

Problems

3Code Duplication

Page 18: Problem Detection (EVO 2008)

1Metrics

Page 19: Problem Detection (EVO 2008)

You cannot controlwhat you cannot measure.

Tom de Marco

Page 20: Problem Detection (EVO 2008)

Metrics are functions that assign numbers to

products, processes and resources.

Page 21: Problem Detection (EVO 2008)

Software metrics are measurements which

relate to software systems, processes or

related documents.

Page 22: Problem Detection (EVO 2008)

Metrics compress system traits into numbers.

Page 23: Problem Detection (EVO 2008)

Let’s see some examples...

Page 24: Problem Detection (EVO 2008)

Examples of size metrics

NOM - number of methods

NOA - number of attributes

LOC - number of lines of code

NOS - number of statements

NOC - number of children

Lorenz, Kidd, 1994Chidamber, Kemerer, 1994

Page 25: Problem Detection (EVO 2008)

McCabe, 1977

McCabe cyclomatic complexity (CYCLO) counts the number of independent paths through the code of a function.

interpretation can’t directly lead to improvement action

it reveals the minimum number of tests to write

Page 26: Problem Detection (EVO 2008)

Chidamber, Kemerer, 1994

Weighted Method Count (WMC) sums up the complexity of class’ methods (measured by the metric of your choice; usually CYCLO).

interpretation can’t directly lead to improvement action

it is configurable, thus adaptable to our precise needs

Page 27: Problem Detection (EVO 2008)

Chidamber, Kemerer, 1994

Depth of Inheritance Tree (DIT) is the (maximum) depth level of a class in a class hierarchy.

only the potential and not the real impact is quantified

inheritance is measured

Page 28: Problem Detection (EVO 2008)

Coupling between objects (CBO) shows the number of classes from which methods or attributes are used.

Chidamber, Kemerer, 1994

no differentiation of types and/or intensity of coupling

it takes into account real dependencies not just declared ones

Page 29: Problem Detection (EVO 2008)

Tight Class Cohesion (TCC) counts the relative number of method-pairs that access attributes of the class in common.

Bieman, Kang, 1995

TCC = 2 / 10 = 0.2

ratio values allow comparison between systems

interpretation can lead to improvement action

Page 30: Problem Detection (EVO 2008)

Access To Foreign Data (ATFD) counts how many attributes from other classes are accessed directly from a measured class.

Marinescu 2006

Page 31: Problem Detection (EVO 2008)

...

Page 32: Problem Detection (EVO 2008)

2Design Problems

Page 33: Problem Detection (EVO 2008)

McCall, 1977

Page 34: Problem Detection (EVO 2008)

Metrics Assess and Improve Quality!

Page 35: Problem Detection (EVO 2008)

Metrics Assess and Improve Quality!

Really?

Page 36: Problem Detection (EVO 2008)

McCall, 1977

Page 37: Problem Detection (EVO 2008)

?Problem 1: metrics granularity

capture symptoms, not causes of problems

in isolation,they don’t lead to improvement solutions

Page 38: Problem Detection (EVO 2008)

?Problem 2: implicit mapping

we don’t reason in terms of metrics, but in terms of design principles

Problem 1: metrics granularity

capture symptoms, not causes of problems

in isolation,they don’t lead to improvement solutions

Page 39: Problem Detection (EVO 2008)

2 big obstacles in using metrics:

Thresholds make metrics hard to interpret

Granularity make metrics hard to use in isolation

Page 40: Problem Detection (EVO 2008)

Can metrics help me in what I really care for? :)

Page 41: Problem Detection (EVO 2008)

forward engineering

actual development }

{

}

{

}

{

}

{}

{

}

{

}

{}

{

}

{

Page 42: Problem Detection (EVO 2008)

forward engineering

actual development }

{

}

{

}

{

}

{}

{

}

{

}

{}

{

}

{

How do I understand code?

Page 43: Problem Detection (EVO 2008)

forward engineering

actual development }

{

}

{

}

{

}

{}

{

}

{

}

{}

{

}

{

How do I understand code?

How do I improve code?

Page 44: Problem Detection (EVO 2008)

forward engineering

actual development }

{

}

{

}

{

}

{}

{

}

{

}

{}

{

}

{

How do I understand code?

How do I improve code?

I do not want to deal with metrics!

Page 45: Problem Detection (EVO 2008)

How to get an initial understanding of a system?

Page 46: Problem Detection (EVO 2008)

Metric ValueLOC 35175

NOM 3618

NOC 384

CYCLO 5579

NOP 19

CALLS 15128

FANOUT 8590

AHH 0.12

ANDC 0.31

Page 47: Problem Detection (EVO 2008)

Metric ValueLOC 35175

NOM 3618

NOC 384

CYCLO 5579

NOP 19

CALLS 15128

FANOUT 8590

AHH 0.12

ANDC 0.31

Page 48: Problem Detection (EVO 2008)

Metric ValueLOC 35175

NOM 3618

NOC 384

CYCLO 5579

NOP 19

CALLS 15128

FANOUT 8590

AHH 0.12

ANDC 0.31And now what?

Page 49: Problem Detection (EVO 2008)

We need means to compare.

Page 50: Problem Detection (EVO 2008)

hierarchies?

coupling?

Page 51: Problem Detection (EVO 2008)

0.31ANDC

NOM

20.21 19

0.12

35175

NOP

NOC

418

0.15

8590

LOC

3618

9.42

5579

NOM

CALLS15128

384

FANOUT

9.72

0.56

AHH

CYCLO

The Overview Pyramid provides a metrics overview. Lanza, Marinescu 2006

Size Communication

Inheritance

Page 52: Problem Detection (EVO 2008)

0.31ANDC

NOM

20.21 19

0.12

35175

NOP

NOC

418

0.15

8590

LOC

3618

9.42

5579

NOM

CALLS15128

384

FANOUT

9.72

0.56

AHH

CYCLO

Size

The Overview Pyramid provides a metrics overview. Lanza, Marinescu 2006

Page 53: Problem Detection (EVO 2008)

0.31ANDC

NOM

20.21 19

0.12

35175

NOP

NOC

418

0.15

8590

LOC

3618

9.42

5579

NOM

CALLS15128

384

FANOUT

9.72

0.56

AHH

CYCLO

Communication

The Overview Pyramid provides a metrics overview. Lanza, Marinescu 2006

Page 54: Problem Detection (EVO 2008)

0.31ANDC

NOM

20.21 19

0.12

35175

NOP

NOC

418

0.15

8590

LOC

3618

9.42

5579

NOM

CALLS15128

384

FANOUT

9.72

0.56

AHH

CYCLO

Inheritance

The Overview Pyramid provides a metrics overview. Lanza, Marinescu 2006

Page 55: Problem Detection (EVO 2008)

0.31ANDC

NOM

20.21 19

0.12

35175

NOP

NOC

418

0.15

8590

LOC

3618

9.42

5579

NOM

CALLS15128

384

FANOUT

9.72

0.56

AHH

CYCLO

The Overview Pyramid provides a metrics overview. Lanza, Marinescu 2006

Page 56: Problem Detection (EVO 2008)

...

HIGH

0.30

16

15

10

9

0.25

AVG

C++

4

5

0.20

LOW

Java

AVGLOW HIGH

0.24

10

13

7

0.20

10

0.16

7

4NOM/NOC

LOC/NOM

CYCLO/LOC

Page 57: Problem Detection (EVO 2008)

0.31ANDC

NOM

20.21 19

0.12

35175

NOP

NOC

418

0.15

8590

LOC

3618

9.42

5579

NOM

CALLS15128

384

FANOUT

9.72

0.56

AHH

CYCLO

The Overview Pyramid provides a metrics overview. Lanza, Marinescu 2006

close to high close to average close to low

Page 58: Problem Detection (EVO 2008)

The Overview Pyramid provides a metrics overview. Lanza, Marinescu 2006

close to high close to average close to low

Page 59: Problem Detection (EVO 2008)

forward engineering

actual development }

{

}

{

}

{

}

{}

{

}

{

}

{}

{

}

{

How do I understand code?

How do I improve code?

I do not want to deal with metrics!

Page 60: Problem Detection (EVO 2008)

How do I improve code?

Page 61: Problem Detection (EVO 2008)

Breaking design principles, rules and best practices

deteriorates the code;

it leads to design problems.

Quality is more than 0 bugs.

Page 62: Problem Detection (EVO 2008)

Imagine changing just a small design fragment

Page 63: Problem Detection (EVO 2008)

Imagine changing just a small design fragment

Page 64: Problem Detection (EVO 2008)

and 33%of all classes would require changes

Imagine changing just a small design fragment

Page 65: Problem Detection (EVO 2008)

Design problems areexpensivefrequentunavoidable

Page 66: Problem Detection (EVO 2008)

Design problems areexpensivefrequentunavoidable

How to detect and eliminate them?

Page 67: Problem Detection (EVO 2008)

God Classes tend to centralize the intelligence of the system, to do everything and to use data from small data-classes.

Riel, 1996

Page 68: Problem Detection (EVO 2008)

God Classes tendto centralize the intelligence of the system,to do everything andto use data from small data-classes.

Page 69: Problem Detection (EVO 2008)

God Classescentralize the intelligence of the system,do everything anduse data from small data-classes.

Page 70: Problem Detection (EVO 2008)

God Classesare complex,are not cohesive,access external data.

Page 71: Problem Detection (EVO 2008)

God Classesare complex, WMC is highare not cohesive, TCC is lowaccess external data. ATFD more than few

Page 72: Problem Detection (EVO 2008)

God Classesare complex, WMC is highare not cohesive, TCC is lowaccess external data. ATFD more than few

Compose metrics into queries using

logical operato

rs

Page 73: Problem Detection (EVO 2008)

Detection Strategies are metric-based queries to detect design flaws.

METRIC 1 > Threshold 1

Rule 1

METRIC 2 < Threshold 2

Rule 2

AND Quality problem

Lanza, Marinescu 2006

Page 74: Problem Detection (EVO 2008)

A God Class centralizes too much intelligence in the system.

ATFD > FEW

Class uses directly more than a

few attributes of other classes

WMC ! VERY HIGH

Functional complexity of the

class is very high

TCC < ONE THIRD

Class cohesion is low

AND GodClass

Lanza, Marinescu 2006

Page 75: Problem Detection (EVO 2008)

An Envious Method is more interested in data from a handful of classes.

ATFD > FEW

Method uses directly more than

a few attributes of other classes

LAA < ONE THIRD

Method uses far more attributes

of other classes than its own

FDP ! FEW

The used "foreign" attributes

belong to very few other classes

AND Feature Envy

Lanza, Marinescu 2006

Page 76: Problem Detection (EVO 2008)

Data Classes are dumb data holders.

WOC < ONE THIRD

Interface of class reveals data

rather than offering services

AND Data Class

Class reveals many attributes and is

not complex

Lanza, Marinescu 2006

Page 77: Problem Detection (EVO 2008)

Data Classes are dumb data holders.

AND

OR

Class reveals many

attributes and is not

complex

NOAP + NOAM > FEW

More than a few public

data

WMC < HIGH

Complexity of class is not

high

NOAP + NOAM > MANY

Class has many public

data

WMC < VERY HIGH

Complexity of class is not

very high

AND

Lanza, Marinescu 2006

Page 78: Problem Detection (EVO 2008)

Shotgun Surgery depicts that a change in an operation triggers many (small) in a lot of different operation and classes.

Lanza, Marinescu 2006

Page 79: Problem Detection (EVO 2008)

3Code Duplication

Page 80: Problem Detection (EVO 2008)

What is Code Duplication?

Page 81: Problem Detection (EVO 2008)

What is Code Duplication?

And why is it a p

roblem?

Page 82: Problem Detection (EVO 2008)

Code Duplication Detection

Lexical Equivalence

Semantic Equivalence

Syntactical Equivalence

Page 83: Problem Detection (EVO 2008)

Visualization of Copied Code Sequences

File A

File A

File B

File B

Page 84: Problem Detection (EVO 2008)

Author Level Transformed Code Comparison Technique

Johnson 94 Lexical Substrings String-Matching

Ducasse 99 Lexical Normalized Strings String-Matching

Baker 95 Syntactical Parameterized Strings String-Matching

Mayrand 96 Syntactical Metrics Tuples Discrete comparison

Kontogiannis 97 Syntactical Metrics Tuples Euclidean distance

Baxter 98 Syntactical AST Tree-Matching

Source Code Transformed Code Duplication Data

Transformation Comparison

Page 85: Problem Detection (EVO 2008)

Noise Elimination

…//assign same fastid as containerfastid = NULL;const char* fidptr = get_fastid();if(fidptr != NULL) { int l = strlen(fidptr); fastid = newchar[ l + 1 ];

fastid=NULL;constchar*fidptr=get_fastid();if(fidptr!=NULL)intl=strlen(fidptr)fastid = newchar[l+]

Page 86: Problem Detection (EVO 2008)

Enhanced Simple Detection Approach

Page 87: Problem Detection (EVO 2008)

a b c d a b c d

lines from source

lines

from

source

Page 88: Problem Detection (EVO 2008)

a b c d a x y d

lines from source

lines

from

source

Page 89: Problem Detection (EVO 2008)

a b c a b x y c

lines from source

lines

from

source

Page 90: Problem Detection (EVO 2008)

a x b x c x d x

lines from source

lines

from

source

Page 91: Problem Detection (EVO 2008)

lines from source 2

lines

from

source 1

Page 92: Problem Detection (EVO 2008)

lines from source 2

lines

from

source 1

Page 93: Problem Detection (EVO 2008)

lines from source 2

lines

from

source 1

exact

chunk

Page 94: Problem Detection (EVO 2008)

lines from source 2

lines

from

source 1

exact

chunk

line

bias

Page 95: Problem Detection (EVO 2008)

lines from source 2

lines

from

source 1

exact

chunk

exact

chunk

line

bias

Page 96: Problem Detection (EVO 2008)

Significant Duplication is large and should be looked at Lanza, Marinescu 2006

Page 97: Problem Detection (EVO 2008)

1Metrics

2Design

Problems

3Code Duplication

Page 98: Problem Detection (EVO 2008)

God

Class

Brain

Class

Feature

Envy

Data

Class

Brain

Method

Significant

Duplication

Intensive

Coupling

Extensive

Coupling

Shotgun

Surgery

Tradition

Breaker

Refused

Parent

Bequest

uses

has

is

has

has

has (partial)

is partially

has

is

is

has

Futile

Hierarchy

uses

has

has

is

has (subclass)

Classification

Disharmonies

Identity

Disharmonies

Collaboration

Disharmonies

Lanza, Marinescu 2006

Page 99: Problem Detection (EVO 2008)

Follow a clear and repeatable process

Page 100: Problem Detection (EVO 2008)

Follow a clear and repeatable process

Page 101: Problem Detection (EVO 2008)

Follow a clear and repeatable process

Page 102: Problem Detection (EVO 2008)

Don’t reason about quality in terms of numbers!

Follow a clear and repeatable process