Upload
ducasse
View
636
Download
0
Tags:
Embed Size (px)
Citation preview
Stéphane Ducasse
LSE
Stéphane [email protected]://stephane.ducasse.free.fr/
A Portfolio of Software Evolution Expertise
1
S.Ducasse LSE
A word of presentationCo-author of Object-Oriented Reengineering PatternsCo-developer of Moose (reengineering platform)10 PhD Theses in reengineering50+ articlesGrounded in realityWas maintainer of Squeak 3.9
Worked with:Harman-Becker AG Bedag AGNokia, Daimler
2
S.Ducasse LSE
Roadmap• Some facts• Our approach
• Supporting maintenance• Moose an open-platform
• Some visual examples
• Conclusion
3
S.Ducasse LSE
Software is complex.
The Standish Group, 2004
53% Challenged
18% Failed
29% Succeeded
4
S.Ducasse LSE
How large is your project?
5
S.Ducasse LSE
How large is your project?
5
S.Ducasse LSE
How large is your project?
5
S.Ducasse LSE
How large is your project?
5
S.Ducasse LSE
How large is your project?
1’000’000 lines of code
5
S.Ducasse LSE
How large is your project?
1’000’000 lines of code* 2 = 2’000’000 seconds
5
S.Ducasse LSE
How large is your project?
1’000’000 lines of code* 2 = 2’000’000 seconds
/ 3600 = 560 hours
5
S.Ducasse LSE
How large is your project?
1’000’000 lines of code* 2 = 2’000’000 seconds
/ 3600 = 560 hours/ 8 = 70 days
5
S.Ducasse LSE
How large is your project?
1’000’000 lines of code* 2 = 2’000’000 seconds
/ 3600 = 560 hours/ 8 = 70 days
/ 20 = 3 months
5
S.Ducasse LSE
Maintenance is Continuous Development
6
Relative Maintenance EffortBetween 50% and 75% of global
effort is spent on “maintenance” !
17.4% Corrective(fixing reported errors)
18.2% Adaptive(new platforms or OS)
60.3% Perfective(new functionality)
4.1% Other
The bulk of the maintenance cost is due to new functionalityeven with better requirements, it is hard to predict new functions
S.Ducasse LSE
Lehman’s Software Evolution LawsContinuous Change: “A program that is used in a real-world environment must change, or become progressively less useful in that environment.”
Software Entropy: “As a program evolves, it becomes more complex, and extra resources are needed to preserve and simplify its structure.”
7
S.Ducasse LSE
Roadmap• Some facts• Our approach
• Supporting maintenance• Moose an open-platform
• Some visual examples• Conclusion
8
S.Ducasse
Supporting the evolution of applications
A research goal and agenda grounded in reality
How to help companies maintaining their large software?What is the xray for software?
code, people, practicesWhich analyses?How can you monitor your system (dashboards....)How to present extracted information?
9
S.Ducasse
Covered topics
TopicsMetamodeling, Software metrics,Program understanding,Visualization, Evolution analysis,Duplicated code detection,Code Analysis, Refactorings,Tests
ContributionsMoose: an open-source extensible reengineering environment: (Lugano, Bern, Annecy, Anvers, Louvain la neuve, ULB, UTSL)
ContactsHarman-Becker (3 Millions C++), Bedag (Cobol), Nokia, ABB, IMEC
10
Representation Transformations
Reverse
Engineering
Analyses
Evolution
S.Ducasse LSE11
Representation Transformations
ReverseEngineering
Analyses
EvolutionLanguage Independent Meta Model (FAMIX)
[UML99]An Extensible Reengineering Environment (Moose)
[Models 06]
Reengineering PatternsVersion Analyses
[ICSM 05]HISMO metamodel
[JSME 05]
Understanding Large Systems [WCRE99, TSI00, TSE03]Static/Dynamic Information
[ICSM99]Feature Analysis
[JSME 06]Class Understanding
[OOPSLA01,TSE04]Package Blueprints
[ICSM 07]Distribution Maps
[ICSM 06]
Software Metrics [LMO99, OOPSLA00]
Duplicated Code Identification[ICSM99, ICSM02]
Group Identification [ASE03]
Test Generation [CSMR 06]Concept Identification
[WCRE 06]
Language Independent Refactorings
[IWPSE 00]
S.Ducasse
One Example: who is responsible of what?
12
(1) Extraction
(2) Modèle
(4) Visualisation
(3) Analyses
Distribution Map of authors on JBoss
S.Ducasse LSE
Moose is a reengineering tool which integrates multiple techniques
13
Visualization
Evolution Analysis
Moose
…
Queries and Navigation
Number of classes = 382Number of methods = 4268…
Semantic Analysis
Metrics
word1 word2
S.Ducasse LSE
Moose is open and open-sourcemeta-describedmeta-model aware
14
Method Class
Inheritance
S.Ducasse LSE
Designed to be extensible
15
Method Class
Inheritance
Author
File
Duplication
Event
Trace
ClassVersion
ClassHistory
S.Ducasse LSE
Roadmap• Some facts• Our approach
• Supporting maintenance• Moose an open-platform
• Some visual examples• Conclusion
16
S.Ducasse LSE
Understanding large systemsUnderstanding code is difficult!Systems are largeCode is abstractShould I really convinced you?
Some existing approachesMetrics: problems you often get meaningless results once combinedVisualization: often beautiful but without meaning
17
S.Ducasse LSE
Polymetric views
18
W: # fields H: # methodsC: # lines of code
S.Ducasse LSE
Polymetric views condense information
19
Classes+InheritanceW: # of Added Methods H: # of Overridden MethodsC: # of Method Extended
To get a feel of the inheritance semantics: adding vs. reusing
methods LOC # statements # parameters
S.Ducasse LSE
Navigating Views...
20
S.Ducasse LSE
Understanding classesUnderstanding even a class is difficult!
21
S.Ducasse LSE
Class Blueprint
22
Invocation Sequence
Initialization External Interface Internal Implementation Accessor Attribute
Enriched call flow annotated with metrics to give semantics
S.Ducasse LSE
Class Blueprint
23
S.Ducasse LSE
Large delegating interface
24
S.Ducasse LSE
Sharing Flows
25
S.Ducasse LSE
Regular Subclasses
26
S.Ducasse LSE
Patterns
27
S.Ducasse LSE
How can we predict changes?Common wisdom stresses that what changes yesterday will change today, but it is true?
In the Sahara the weather is constant, tomorrow: 90% chance that it is the same as today
In Belgium, the weather is changing really fast (sea influence), 30% chance that it is the same as today
28
S.Ducasse LSE
With history analysis we can get the climate of a software system
29
Past LateChangers
Future EarlyChangers
Presentversion
Pastversions
Futureversions
1, TopLENOM1..i (S, t1) ∩ TopEENOMi..n (S, t2) ≠ ∅
YWi(S) = 0, TopLENOM1..i (S, t1) ∩ TopEENOMi..n (S, t2) = ∅
hitYW(S, t1, t2) =
∑ YWi(S, t1, t2)
n - 2
S.Ducasse LSE
How developers develop?• More efficient to put people working together in the
same office? • How can we optimize software development?
30
S.Ducasse LSE
Who did that?
31
Files
Time
S.Ducasse LSE
Line colors show which author owned which files in which period
32
File A
File B
Green authorlarge commit
Green authorownership
Blue authorsmall commit
S.Ducasse LSE
Which author “possesses” which files?
33
S.Ducasse LSE
Alphabetical order is no order!
34
S.Ducasse LSE
Based on similar commit signature
35
DialogueMonologue
Edit Takeover
Familiarization
S.Ducasse LSE
Understanding evolution of large systems
• How old are the hierarchies?• How did the classes change?• How did the inheritance change?
36
S.Ducasse LSE
Evolution holds useful information
37
A
B
A
BC
A
BC
D
A
BC
D
A
D
time
B is stable
C was removed
E is newborn
A is persistent
D inherited from C and then from A …
S.Ducasse LSE
Hierarchy Evolution Complexity View characterizes class hierarchy histories
38
B is stable
C was removed
E is newborn
A is persistent
D inherited from C and then from A …
A
B
E
C
D
ENOS
Removed
Age
Removed
Age InheritanceHistory
ClassHistory
ENOM
S.Ducasse LSE
Class hierarchies over 40 versions of Jun - a 740 classes, 3D framework
39
S.Ducasse LSE
Identifying Duplicated Code“Parsing the program suite of interest requires a parser for the language dialect of interest. While this is nominally an easy task, in practice one must acquire a tested grammar for the dialect of the language at hand. Often for legacy codes, the dialect is unique and the developing organization will need to build their own parser. Worse, legacy systems often have a number of languages and a parser is needed for each. Standard tools such as Lex and Yacc are rather a disappointment for this purpose, as they deal poorly with lexical hiccups and language ambiguities.” [Baxter 98]
Problems Unknown Duplicated Code ScalabilityUnderstanding
40
S.Ducasse LSE
Language Independent
Language independent, Textual, [ICSM’99], M. Rieger’s PhD. Thesis
Duploc handledPascal, Java, Smalltalk, Python, Cobol, C++, PDP-11, C
Slower than other approaches but...Max 45 min to adapt our approach to a new languageBetween 3% and 10% less identification than parametrized match
41
Exact Copies
a b c d e f a b c d e f
Copies with
a b c d e fa b x y e f
S.Ducasse LSE
A Conceptual MatrixFile A
File A
File B
File B
Exact Copies
a b c d e f a b c d e f
Copies with
a b c d e fa b x y e f
Variations42
S.Ducasse LSE
Entities that change together can reveal hidden dependencies
43
2 3A 3 3 4 6
6 6B 6 5 6 7
3 3C 5 5 8 9
1 3D 3 4 4 6
4 5E 5 6 6 6
v1 v2 v3 v4 v5 v6
(A,D,E)(v2)
(A,D)(v2,v6)
(A,B,C,D)(v6)
(A,B,C,D,E)()
(D,E)(v2,v4)
(A,B,C)(v5,v6)
(A)(v2,v5,v6)
(D)(v2,v4,v6)
(C)(v3,v5,v6)
()(v1,v2,v3,v4,v5,v6)
S.Ducasse LSE
How properties spread in large systems?Properties:
MetricsPeopleSymbol/Concepts
Spread = how many packages does it touch?Focus = do packages and properties match?
Distribution Map:a generic visualization
44
S.Ducasse LSE
Distribution Map
45
S.Ducasse LSE
Ownership• Authors in JBoss
46
S.Ducasse LSE
Characterizing PackagesButterflies [Metrics05]
Kind of Radar
47
S.Ducasse LSE
Relative version
48
S.Ducasse LSE
How to understand PackagesPackages are key structuring elementsBut complex:
importclasses....
Package Blueprints[ICSM 2007]
49
S.Ducasse LSE
Surfaces represent package communication
50
A1
B1
C1
D1
P2 P3 P4
P1: analyzed package E1
A2 A3 A4 B4
P1 blueprint
A2 A1
A3 B1
A4 C1
B4 D1 E1
classes in P1that do
references
referenced classes
P4 surface
P2 surface
P3 surface
S.Ducasse LSE
Principle
51
D1
D1
D1
col
most—leastinternal referencing classes External
referenced classes
A4 G1F1
I1H1
B1C1
A2 A1
B2
E1
A1
G1
A3 E1
B3 F1 G1
D1 C1
E1
exte
rnal
re
fere
nces
inte
rnal
refe
renc
es headbody
Internalreferenced classes
C1
col
E1
col
F1
col
G1
col
A1
col
B1
col
H1
col
I1
col
P1
C1
A1
B1
P2
A2
B2
D1
P3
A3
E1
B3
F1
P4
I1
G1
H1
A4
Package under analysisP1
D1
D1
D1
col
most—leastinternal referencing classes External
referenced classes
A4 G1F1
I1H1
B1C1
A2 A1
B2
E1
A1
G1
A3 E1
B3 F1 G1
D1 C1
E1
exte
rnal
re
fere
nces
inte
rnal
refe
renc
es headbody
Internalreferenced classes
C1
col
E1
col
F1
col
G1
col
A1
col
B1
col
H1
col
I1
col
S.Ducasse LSE
Example
52
S.Ducasse LSE
Symbols contain domain information
• What are the concepts used in an application?• How can we use symbolic information?
53
S.Ducasse LSE
Looking at the Symbols• Developers use meaningful names, which capture
the domain knowledge.
54
S.Ducasse LSE
A cluster is a group of documents which use the same terms
55
S.Ducasse LSE
Moose has been validated on real life systems
Several large, industrial case studies (NDA)Harman-BeckerNokiaDaimlerSiemens
Different implementation languages (C++, Java, Smalltalk, Cobol)
We use external C++ parsersDifferent sizesMoose is used in several research groups
56
S.Ducasse LSE
Possible New Research Directions
• Remodularization• Clustering analysis• Open and Modular modules
• Service Identification in Service Oriented Architecture• Architecture Extraction/Validation• Software Quality• Cost/Bugs prediction• EJB evaluation• Business rules extraction• Model transformation• Test
57
S.Ducasse LSE
Evolution/Maintenance is a challenge
Understanding and maintaining large and complex applications needs better tools/analyses
Moose is a platform for developing new analysesTransfer to tool vendors
58