66
Stéphane Ducasse L S E Stéphane Ducasse [email protected] http://stephane.ducasse.free.fr/ A Portfolio of Software Evolution Expertise

Ducasse's Maintenance Expertise

  • Upload
    ducasse

  • View
    636

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Ducasse's Maintenance Expertise

Stéphane Ducasse

LSE

Stéphane [email protected]://stephane.ducasse.free.fr/

A Portfolio of Software Evolution Expertise

1

Page 2: Ducasse's Maintenance Expertise

S.Ducasse LSE

A word of presentationCo-author of Object-Oriented Reengineering PatternsCo-developer of Moose (reengineering platform)10 PhD Theses in reengineering50+ articlesGrounded in realityWas maintainer of Squeak 3.9

Worked with:Harman-Becker AG Bedag AGNokia, Daimler

2

Page 3: Ducasse's Maintenance Expertise

S.Ducasse LSE

Roadmap• Some facts• Our approach

• Supporting maintenance• Moose an open-platform

• Some visual examples

• Conclusion

3

Page 4: Ducasse's Maintenance Expertise

S.Ducasse LSE

Software is complex.

The Standish Group, 2004

53% Challenged

18% Failed

29% Succeeded

4

Page 5: Ducasse's Maintenance Expertise

S.Ducasse LSE

How large is your project?

5

Page 6: Ducasse's Maintenance Expertise

S.Ducasse LSE

How large is your project?

5

Page 7: Ducasse's Maintenance Expertise

S.Ducasse LSE

How large is your project?

5

Page 8: Ducasse's Maintenance Expertise

S.Ducasse LSE

How large is your project?

5

Page 9: Ducasse's Maintenance Expertise

S.Ducasse LSE

How large is your project?

1’000’000 lines of code

5

Page 10: Ducasse's Maintenance Expertise

S.Ducasse LSE

How large is your project?

1’000’000 lines of code* 2 = 2’000’000 seconds

5

Page 11: Ducasse's Maintenance Expertise

S.Ducasse LSE

How large is your project?

1’000’000 lines of code* 2 = 2’000’000 seconds

/ 3600 = 560 hours

5

Page 12: Ducasse's Maintenance Expertise

S.Ducasse LSE

How large is your project?

1’000’000 lines of code* 2 = 2’000’000 seconds

/ 3600 = 560 hours/ 8 = 70 days

5

Page 13: Ducasse's Maintenance Expertise

S.Ducasse LSE

How large is your project?

1’000’000 lines of code* 2 = 2’000’000 seconds

/ 3600 = 560 hours/ 8 = 70 days

/ 20 = 3 months

5

Page 14: Ducasse's Maintenance Expertise

S.Ducasse LSE

Maintenance is Continuous Development

6

Relative Maintenance EffortBetween 50% and 75% of global

effort is spent on “maintenance” !

17.4% Corrective(fixing reported errors)

18.2% Adaptive(new platforms or OS)

60.3% Perfective(new functionality)

4.1% Other

The bulk of the maintenance cost is due to new functionalityeven with better requirements, it is hard to predict new functions

Page 15: Ducasse's Maintenance Expertise

S.Ducasse LSE

Lehman’s Software Evolution LawsContinuous Change: “A program that is used in a real-world environment must change, or become progressively less useful in that environment.”

Software Entropy: “As a program evolves, it becomes more complex, and extra resources are needed to preserve and simplify its structure.”

7

Page 16: Ducasse's Maintenance Expertise

S.Ducasse LSE

Roadmap• Some facts• Our approach

• Supporting maintenance• Moose an open-platform

• Some visual examples• Conclusion

8

Page 17: Ducasse's Maintenance Expertise

S.Ducasse

Supporting the evolution of applications

A research goal and agenda grounded in reality

How to help companies maintaining their large software?What is the xray for software?

code, people, practicesWhich analyses?How can you monitor your system (dashboards....)How to present extracted information?

9

Page 18: Ducasse's Maintenance Expertise

S.Ducasse

Covered topics

TopicsMetamodeling, Software metrics,Program understanding,Visualization, Evolution analysis,Duplicated code detection,Code Analysis, Refactorings,Tests

ContributionsMoose: an open-source extensible reengineering environment: (Lugano, Bern, Annecy, Anvers, Louvain la neuve, ULB, UTSL)

ContactsHarman-Becker (3 Millions C++), Bedag (Cobol), Nokia, ABB, IMEC

10

Representation Transformations

Reverse

Engineering

Analyses

Evolution

Page 19: Ducasse's Maintenance Expertise

S.Ducasse LSE11

Representation Transformations

ReverseEngineering

Analyses

EvolutionLanguage Independent Meta Model (FAMIX)

[UML99]An Extensible Reengineering Environment (Moose)

[Models 06]

Reengineering PatternsVersion Analyses

[ICSM 05]HISMO metamodel

[JSME 05]

Understanding Large Systems [WCRE99, TSI00, TSE03]Static/Dynamic Information

[ICSM99]Feature Analysis

[JSME 06]Class Understanding

[OOPSLA01,TSE04]Package Blueprints

[ICSM 07]Distribution Maps

[ICSM 06]

Software Metrics [LMO99, OOPSLA00]

Duplicated Code Identification[ICSM99, ICSM02]

Group Identification [ASE03]

Test Generation [CSMR 06]Concept Identification

[WCRE 06]

Language Independent Refactorings

[IWPSE 00]

Page 20: Ducasse's Maintenance Expertise

S.Ducasse

One Example: who is responsible of what?

12

(1) Extraction

(2) Modèle

(4) Visualisation

(3) Analyses

Distribution Map of authors on JBoss

Page 21: Ducasse's Maintenance Expertise

S.Ducasse LSE

Moose is a reengineering tool which integrates multiple techniques

13

Visualization

Evolution Analysis

Moose

Queries and Navigation

Number of classes = 382Number of methods = 4268…

Semantic Analysis

Metrics

word1 word2

Page 22: Ducasse's Maintenance Expertise

S.Ducasse LSE

Moose is open and open-sourcemeta-describedmeta-model aware

14

Method Class

Inheritance

Page 23: Ducasse's Maintenance Expertise

S.Ducasse LSE

Designed to be extensible

15

Method Class

Inheritance

Author

File

Duplication

Event

Trace

ClassVersion

ClassHistory

Page 24: Ducasse's Maintenance Expertise

S.Ducasse LSE

Roadmap• Some facts• Our approach

• Supporting maintenance• Moose an open-platform

• Some visual examples• Conclusion

16

Page 25: Ducasse's Maintenance Expertise

S.Ducasse LSE

Understanding large systemsUnderstanding code is difficult!Systems are largeCode is abstractShould I really convinced you?

Some existing approachesMetrics: problems you often get meaningless results once combinedVisualization: often beautiful but without meaning

17

Page 26: Ducasse's Maintenance Expertise

S.Ducasse LSE

Polymetric views

18

W: # fields H: # methodsC: # lines of code

Page 27: Ducasse's Maintenance Expertise

S.Ducasse LSE

Polymetric views condense information

19

Classes+InheritanceW: # of Added Methods H: # of Overridden MethodsC: # of Method Extended

To get a feel of the inheritance semantics: adding vs. reusing

methods LOC # statements # parameters

Page 28: Ducasse's Maintenance Expertise

S.Ducasse LSE

Navigating Views...

20

Page 29: Ducasse's Maintenance Expertise

S.Ducasse LSE

Understanding classesUnderstanding even a class is difficult!

21

Page 30: Ducasse's Maintenance Expertise

S.Ducasse LSE

Class Blueprint

22

Invocation Sequence

Initialization External Interface Internal Implementation Accessor Attribute

Enriched call flow annotated with metrics to give semantics

Page 31: Ducasse's Maintenance Expertise

S.Ducasse LSE

Class Blueprint

23

Page 32: Ducasse's Maintenance Expertise

S.Ducasse LSE

Large delegating interface

24

Page 33: Ducasse's Maintenance Expertise

S.Ducasse LSE

Sharing Flows

25

Page 34: Ducasse's Maintenance Expertise

S.Ducasse LSE

Regular Subclasses

26

Page 35: Ducasse's Maintenance Expertise

S.Ducasse LSE

Patterns

27

Page 36: Ducasse's Maintenance Expertise

S.Ducasse LSE

How can we predict changes?Common wisdom stresses that what changes yesterday will change today, but it is true?

In the Sahara the weather is constant, tomorrow: 90% chance that it is the same as today

In Belgium, the weather is changing really fast (sea influence), 30% chance that it is the same as today

28

Page 37: Ducasse's Maintenance Expertise

S.Ducasse LSE

With history analysis we can get the climate of a software system

29

Past LateChangers

Future EarlyChangers

Presentversion

Pastversions

Futureversions

1, TopLENOM1..i (S, t1) ∩ TopEENOMi..n (S, t2) ≠ ∅

YWi(S) = 0, TopLENOM1..i (S, t1) ∩ TopEENOMi..n (S, t2) = ∅

hitYW(S, t1, t2) =

∑ YWi(S, t1, t2)

n - 2

Page 38: Ducasse's Maintenance Expertise

S.Ducasse LSE

How developers develop?• More efficient to put people working together in the

same office? • How can we optimize software development?

30

Page 39: Ducasse's Maintenance Expertise

S.Ducasse LSE

Who did that?

31

Files

Time

Page 40: Ducasse's Maintenance Expertise

S.Ducasse LSE

Line colors show which author owned which files in which period

32

File A

File B

Green authorlarge commit

Green authorownership

Blue authorsmall commit

Page 41: Ducasse's Maintenance Expertise

S.Ducasse LSE

Which author “possesses” which files?

33

Page 42: Ducasse's Maintenance Expertise

S.Ducasse LSE

Alphabetical order is no order!

34

Page 43: Ducasse's Maintenance Expertise

S.Ducasse LSE

Based on similar commit signature

35

DialogueMonologue

Edit Takeover

Familiarization

Page 44: Ducasse's Maintenance Expertise

S.Ducasse LSE

Understanding evolution of large systems

• How old are the hierarchies?• How did the classes change?• How did the inheritance change?

36

Page 45: Ducasse's Maintenance Expertise

S.Ducasse LSE

Evolution holds useful information

37

A

B

A

BC

A

BC

D

A

BC

D

A

D

time

B is stable

C was removed

E is newborn

A is persistent

D inherited from C and then from A …

Page 46: Ducasse's Maintenance Expertise

S.Ducasse LSE

Hierarchy Evolution Complexity View characterizes class hierarchy histories

38

B is stable

C was removed

E is newborn

A is persistent

D inherited from C and then from A …

A

B

E

C

D

ENOS

Removed

Age

Removed

Age InheritanceHistory

ClassHistory

ENOM

Page 47: Ducasse's Maintenance Expertise

S.Ducasse LSE

Class hierarchies over 40 versions of Jun - a 740 classes, 3D framework

39

Page 48: Ducasse's Maintenance Expertise

S.Ducasse LSE

Identifying Duplicated Code“Parsing the program suite of interest requires a parser for the language dialect of interest. While this is nominally an easy task, in practice one must acquire a tested grammar for the dialect of the language at hand. Often for legacy codes, the dialect is unique and the developing organization will need to build their own parser. Worse, legacy systems often have a number of languages and a parser is needed for each. Standard tools such as Lex and Yacc are rather a disappointment for this purpose, as they deal poorly with lexical hiccups and language ambiguities.” [Baxter 98]

Problems Unknown Duplicated Code ScalabilityUnderstanding

40

Page 49: Ducasse's Maintenance Expertise

S.Ducasse LSE

Language Independent

Language independent, Textual, [ICSM’99], M. Rieger’s PhD. Thesis

Duploc handledPascal, Java, Smalltalk, Python, Cobol, C++, PDP-11, C

Slower than other approaches but...Max 45 min to adapt our approach to a new languageBetween 3% and 10% less identification than parametrized match

41

Exact Copies

a b c d e f a b c d e f

Copies with

a b c d e fa b x y e f

Page 50: Ducasse's Maintenance Expertise

S.Ducasse LSE

A Conceptual MatrixFile A

File A

File B

File B

Exact Copies

a b c d e f a b c d e f

Copies with

a b c d e fa b x y e f

Variations42

Page 51: Ducasse's Maintenance Expertise

S.Ducasse LSE

Entities that change together can reveal hidden dependencies

43

2 3A 3 3 4 6

6 6B 6 5 6 7

3 3C 5 5 8 9

1 3D 3 4 4 6

4 5E 5 6 6 6

v1 v2 v3 v4 v5 v6

(A,D,E)(v2)

(A,D)(v2,v6)

(A,B,C,D)(v6)

(A,B,C,D,E)()

(D,E)(v2,v4)

(A,B,C)(v5,v6)

(A)(v2,v5,v6)

(D)(v2,v4,v6)

(C)(v3,v5,v6)

()(v1,v2,v3,v4,v5,v6)

Page 52: Ducasse's Maintenance Expertise

S.Ducasse LSE

How properties spread in large systems?Properties:

MetricsPeopleSymbol/Concepts

Spread = how many packages does it touch?Focus = do packages and properties match?

Distribution Map:a generic visualization

44

Page 53: Ducasse's Maintenance Expertise

S.Ducasse LSE

Distribution Map

45

Page 54: Ducasse's Maintenance Expertise

S.Ducasse LSE

Ownership• Authors in JBoss

46

Page 55: Ducasse's Maintenance Expertise

S.Ducasse LSE

Characterizing PackagesButterflies [Metrics05]

Kind of Radar

47

Page 56: Ducasse's Maintenance Expertise

S.Ducasse LSE

Relative version

48

Page 57: Ducasse's Maintenance Expertise

S.Ducasse LSE

How to understand PackagesPackages are key structuring elementsBut complex:

importclasses....

Package Blueprints[ICSM 2007]

49

Page 58: Ducasse's Maintenance Expertise

S.Ducasse LSE

Surfaces represent package communication

50

A1

B1

C1

D1

P2 P3 P4

P1: analyzed package E1

A2 A3 A4 B4

P1 blueprint

A2 A1

A3 B1

A4 C1

B4 D1 E1

classes in P1that do

references

referenced classes

P4 surface

P2 surface

P3 surface

Page 59: Ducasse's Maintenance Expertise

S.Ducasse LSE

Principle

51

D1

D1

D1

col

most—leastinternal referencing classes External

referenced classes

A4 G1F1

I1H1

B1C1

A2 A1

B2

E1

A1

G1

A3 E1

B3 F1 G1

D1 C1

E1

exte

rnal

re

fere

nces

inte

rnal

refe

renc

es headbody

Internalreferenced classes

C1

col

E1

col

F1

col

G1

col

A1

col

B1

col

H1

col

I1

col

P1

C1

A1

B1

P2

A2

B2

D1

P3

A3

E1

B3

F1

P4

I1

G1

H1

A4

Package under analysisP1

D1

D1

D1

col

most—leastinternal referencing classes External

referenced classes

A4 G1F1

I1H1

B1C1

A2 A1

B2

E1

A1

G1

A3 E1

B3 F1 G1

D1 C1

E1

exte

rnal

re

fere

nces

inte

rnal

refe

renc

es headbody

Internalreferenced classes

C1

col

E1

col

F1

col

G1

col

A1

col

B1

col

H1

col

I1

col

Page 60: Ducasse's Maintenance Expertise

S.Ducasse LSE

Example

52

Page 61: Ducasse's Maintenance Expertise

S.Ducasse LSE

Symbols contain domain information

• What are the concepts used in an application?• How can we use symbolic information?

53

Page 62: Ducasse's Maintenance Expertise

S.Ducasse LSE

Looking at the Symbols• Developers use meaningful names, which capture

the domain knowledge.

54

Page 63: Ducasse's Maintenance Expertise

S.Ducasse LSE

A cluster is a group of documents which use the same terms

55

Page 64: Ducasse's Maintenance Expertise

S.Ducasse LSE

Moose has been validated on real life systems

Several large, industrial case studies (NDA)Harman-BeckerNokiaDaimlerSiemens

Different implementation languages (C++, Java, Smalltalk, Cobol)

We use external C++ parsersDifferent sizesMoose is used in several research groups

56

Page 65: Ducasse's Maintenance Expertise

S.Ducasse LSE

Possible New Research Directions

• Remodularization• Clustering analysis• Open and Modular modules

• Service Identification in Service Oriented Architecture• Architecture Extraction/Validation• Software Quality• Cost/Bugs prediction• EJB evaluation• Business rules extraction• Model transformation• Test

57

Page 66: Ducasse's Maintenance Expertise

S.Ducasse LSE

Evolution/Maintenance is a challenge

Understanding and maintaining large and complex applications needs better tools/analyses

Moose is a platform for developing new analysesTransfer to tool vendors

58