Identification of design motifs with pattern matching algorithms
Information and Software Technology (2010)
Olivier Kaczor, Yann-Gaël Guéhéneuc, and Sylvie Hamel
2010. 01. 20.
Presented by Seul-Ki Lee
Contents
Introduction
Related work
Pre-processing step
Identification step
• Automata simulation
• Bit-vector processing algorithm
Case study
Conclusion
Discussion
ⓒKAIST SE LAB 2010 2/22
Introduction (1/3)
Software maintenance and evolution are time- and resource-consuming activities
• More than 50% of the total cost of a system
• Documentation is often useless or lost
Main tasks of the maintainer
• Design recovery
– Build higher-level abstractions from code to understand the system
– Benefit from the design patterns used by the developers
[Figure: cost ratio (maintenance & evolution cost / total cost, 0% to 100%) by studied year: Zelkowitz et al. [1979], Lientz & Swanson [1981], McKee [1984], Port [1988], Huff [1990], Moad [1990], Eastwood [1993], Erlikh [2000]]
Introduction (2/3)
Design patterns and design motifs
Solutions to recurring design problems
• 23 patterns by the GoF (Gang of Four) and other patterns
Micro-architectures
Concrete indications of design motifs in the implemented system
Synonymous with the set of motifs
Design patterns: not observable (intent, motivation, applicability, consequences)
Design motifs: observable (structure, participants, collaborations)
Introduction (3/3)
Objectives
Identify the micro-architectures
• Adapt two classical approximate string matching algorithms
– Automata simulation
– Bit-vector processing
• Define four types of approximation to make identification efficient
Related work (1/2)
Yann-Gaël Guéhéneuc et al. [TSE, 2008]
Combination of machine learning algorithms and explanation-based constraint programming
• Reduce the search space by quantifying the micro-architectures
• Apply explanation-based constraint programming to identify micro-architectures on the remaining entities
Limitation
• Performance of explanation-based constraint programming
Related work (2/2)
Tsantalis et al. [TSE, 2006]
Combination of graphs and matrices
• Convert the graphs representing the structure of a motif into matrices
• Use the matrices to represent entities and relationships
• Compute similarity scores between matrices
Limitation
• Hard to describe the possible approximations
• Some errors in the reported results
Overall processing step
Inputs: systems and design motifs, represented by UML-like models
Pre-processing
1. Adjust constraints to reduce false positives
2. Transform models to Eulerian graphs
3. Transform graphs to strings
Identification processing with approximation
• Automata simulation
• Bit-vector processing
Outputs: identified occurrences, compared with true occurrences (TO)
Pre-processing (1/3)
Step 1. Adjust ignorance relationships to reduce false positive occurrences, and describe the models with the PADL meta-model
Relationship types
as : association
ag : aggregation
co : composition
dm : dummy
in : inheritance
ig : ignorance
Pre-processing (2/3)
Step 2. Transform to Eulerian graphs by adding dummy relationships
[Figure: UML-like models of the system and of the design motif, and the corresponding Eulerian models]
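Step 2 can be illustrated with a minimal Python sketch. It assumes a connected directed graph given as (source, relation, target) triples and balances every vertex by adding `dm` edges from vertices with surplus incoming edges to vertices with surplus outgoing edges. The function name `add_dummy_edges` and the greedy pairing are illustrative assumptions, not the procedure used in the paper. The sketch balances every vertex, so it yields a graph with a closed tour and may add more dummy edges than the example strings on the slides contain.

```python
from collections import Counter

def add_dummy_edges(edges):
    """Return edges plus 'dm' edges so that in-degree == out-degree everywhere."""
    balance = Counter()  # out-degree minus in-degree per vertex
    for src, _rel, dst in edges:
        balance[src] += 1
        balance[dst] -= 1
    ordered = sorted(balance.items())
    # Vertices with surplus incoming edges need extra outgoing edges.
    need_out = [v for v, b in ordered for _ in range(max(0, -b))]
    # Vertices with surplus outgoing edges need extra incoming edges.
    need_in = [v for v, b in ordered for _ in range(max(0, b))]
    dummies = [(src, "dm", dst) for src, dst in zip(need_out, need_in)]
    return list(edges) + dummies

# Hypothetical edge list for the Composite motif (directions are an assumption).
motif = [
    ("Component", "in", "Leaf"),
    ("Component", "in", "Composite"),
    ("Composite", "co", "Leaf"),
]
for edge in add_dummy_edges(motif):
    print(edge)
```

The balances always sum to zero, so the two lists have the same length and every surplus is paired off.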
Pre-processing (3/3)
Step 3. Transform to strings by solving the directed Chinese Postman problem (shortest tour of a graph)
[Figure: Eulerian models of the system and of the design motif]
Strings of Eulerian models
Design motif: Component in Leaf dm Component in Composite co Leaf
System: A in B in D dm B in E co B in C dm G cr C dm G cr D dm G cr E dm G as F ag A
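The graph-to-string step can be sketched as follows. This is only the final tour extraction on a graph that is already Eulerian, using Hierholzer's algorithm; the directed Chinese Postman problem in general also decides which edges to duplicate, which is not covered here. The function name `eulerian_string` and the example edge list (a balanced variant of the Composite motif with two dummy edges) are assumptions for illustration.

```python
from collections import defaultdict

def eulerian_string(edges, start):
    """Walk every edge exactly once (Hierholzer's algorithm) and emit tokens.

    Assumes a connected graph where each vertex has equal in- and out-degree.
    """
    out = defaultdict(list)
    for src, rel, dst in edges:
        out[src].append((rel, dst))
    stack = [(start, None)]  # (vertex, relation used to reach it)
    tour = []                # filled in reverse order
    while stack:
        vertex, _rel = stack[-1]
        if out[vertex]:
            next_rel, next_vertex = out[vertex].pop()
            stack.append((next_vertex, next_rel))
        else:
            tour.append(stack.pop())
    tour.reverse()
    tokens = [tour[0][0]]
    for vertex, rel in tour[1:]:
        tokens += [rel, vertex]
    return " ".join(tokens)

# Hypothetical balanced edge list for the Composite motif.
balanced_motif = [
    ("Component", "in", "Leaf"),
    ("Component", "in", "Composite"),
    ("Composite", "co", "Leaf"),
    ("Leaf", "dm", "Component"),
    ("Leaf", "dm", "Component"),
]
print(eulerian_string(balanced_motif, "Component"))
```

The resulting string alternates entity and relationship tokens, which is the form the identification step consumes.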
Identification process (1/3)
Automata simulation
Definition of occurrences of a motif
• Each path to the final state
Limitation
• Hard to manage a very large number of paths
• Hard to handle ignorance relationships
A in B in D dm B in E co B in C dm G cr C dm G cr D dm G cr E dm G as F ag A
[Figure: automaton over the string above, with labels B D, D B, B E, E B]
Identification process (2/3)
Bit-vector processing algorithm
Characteristics
• Characteristic vector of a token with respect to a string: one bit per position, set to 1 where the token occurs
A in B in D dm B in E co B in C dm G cr C dm G cr D dm G cr E dm G as F ag A
G = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0
• Apply standard bit operations
– Bit-wise logical “and”, “or” operators
– Left and right circular shifts
• Use negation operator
– Easy to handle ignorance relationships
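The bit vector shown for G can be reproduced with a few lines of Python. This is a minimal sketch that stores the vector as a list of 0/1 values for readability; the function name `characteristic_vector` is an assumption, and a real implementation would more likely pack the bits into machine words.

```python
def characteristic_vector(token, tokens):
    """Return one bit per position: 1 where tokens[i] == token, else 0."""
    return [1 if t == token else 0 for t in tokens]

system = ("A in B in D dm B in E co B in C dm G cr C dm G cr "
          "D dm G cr E dm G as F ag A").split()

print(" ".join(str(bit) for bit in characteristic_vector("G", system)))
# prints: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0
```

The string has 31 tokens, and G occurs at positions 15, 19, 23 and 27 (counting from 1), which matches the vector on the slide.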
Identification process (3/3)
Bit-vector processing algorithm (cont’d)
Compare micro-architecture with design motif
System string: A in B in D dm B in E co B in C dm G cr C dm G cr D dm G cr E dm G as F ag A
Design motif string: Component in Leaf dm Component in Composite co Leaf

Triplets

First (in)
Component  Leaf
A          B
B          C
B          D
B          E

Third (in)
Leaf  Component  Composite
A     B          B
B     C          C
B     C          D
B     C          E
B     D          C
B     D          D
B     D          E
B     E          C
B     E          D
B     E          E

Fourth (co)
Leaf  Component  Composite
B     C          E
B     D          E
B     E          E
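How shifts and bit-wise "and" can locate a triplet is sketched below. It packs each characteristic vector into a Python integer and aligns the three tokens of a triplet on the same bit before combining them. The helper names `bitmask` and `triplet_bindings` are assumptions, and the sketch uses plain left shifts on unbounded integers rather than the circular shifts mentioned on the slides, so it should be read as a simplification rather than the authors' algorithm.

```python
def bitmask(token, tokens):
    """Pack the characteristic vector of token into an int (bit i = position i)."""
    mask = 0
    for i, t in enumerate(tokens):
        if t == token:
            mask |= 1 << i
    return mask

def triplet_bindings(relation, tokens):
    """Return all (x, y) such that 'x relation y' occurs in the token list."""
    entities = sorted(set(tokens[0::2]))  # entities sit at even positions
    rel_mask = bitmask(relation, tokens)
    found = []
    for x in entities:
        for y in entities:
            # Align x (position i), relation (i + 1) and y (i + 2) on bit i + 2.
            hits = (bitmask(x, tokens) << 2) & (rel_mask << 1) & bitmask(y, tokens)
            if hits:
                found.append((x, y))
    return found

system = ("A in B in D dm B in E co B in C dm G cr C dm G cr "
          "D dm G cr E dm G as F ag A").split()

print(triplet_bindings("in", system))
# prints: [('A', 'B'), ('B', 'C'), ('B', 'D'), ('B', 'E')]
```

The four pairs found for `in` are the same four rows listed under "First (in)".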
Case study (1/5)
Environment
Selected design motifs
• Abstract Factory, Composite, Decorator, Observer, State/Strategy
Systems
• Seven open-source applications
• Wide range of domains
• Small to medium size (57 to 742 classes)
Adjusted approximations
• Replace a relationship with a stronger or weaker one
• Allow entities to be added to or removed from the hierarchy
• Swap the roles of abstract and concrete entities
• Accept occurrences in which not all roles of the design motif are played
Case study (2/5)
Computation time
Abstract Factory design motif
Composite design motif
Case study (3/5)
Definition of metrics
• Precision
• Recall
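A short Python sketch of the two metrics follows, assuming the standard information-retrieval definitions: precision is the share of identified occurrences that are true occurrences, and recall is the share of true occurrences that were identified. The function name, the occurrence labels and the choice to return 0.0 for an empty denominator are assumptions made for this example. The counts mirror one row reported in the case study (8 true occurrences, 11 identified).

```python
def precision_recall(identified, true_occurrences):
    """Return (precision, recall) as fractions in [0, 1]."""
    identified = set(identified)
    true_occurrences = set(true_occurrences)
    correct = len(identified & true_occurrences)
    # Convention assumed here: an empty denominator yields 0.0.
    precision = correct / len(identified) if identified else 0.0
    recall = correct / len(true_occurrences) if true_occurrences else 0.0
    return precision, recall

# Hypothetical data: 8 true occurrences, 11 identified, all 8 true ones found.
true_occurrences = {f"occ{i}" for i in range(8)}
identified = true_occurrences | {"fp1", "fp2", "fp3"}
precision, recall = precision_recall(identified, true_occurrences)
print(f"precision = {precision:.2%}, recall = {recall:.2%}")
# prints: precision = 72.73%, recall = 100.00%
```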
Case study (4/5)
Comparison between identified occurrences

Abstract Factory design motif, JHotDraw
             TO    CP+M       BV
Occurrences  0     5235       250
Precision          0.00%      0.00%
Recall             0.00%      0.00%
Time               37,922 s   32 s

Abstract Factory design motif, QuickUML
             TO    CP+M       BV
Occurrences  8     1159       11
Precision          100.00%    100.00%
Recall             100.00%    100.00%
Time               20,320 s   97 s

Composite design motif, JHotDraw
             TO    CP+M       BV
Occurrences  52    115        103
Precision          100.00%    100.00%
Recall             100.00%    100.00%
Time               907 s      3 s

Composite design motif, QuickUML
             TO    CP+M       BV
Occurrences  8     1159       11
Precision          100.00%    100.00%
Recall             100.00%    100.00%
Time               1164 s     28 s

TO (true occurrences)
CP+M (constraint programming with metrics)
BV (bit-vector processing algorithm)
Case study (5/5)
Comparison between identified occurrences (same definition of an occurrence as Tsantalis et al.)

Abstract Factory design motif, JHotDraw
             TO    SC         BV
Occurrences  0     -          250
Precision          -          0.00%
Recall             -          100.00%

Abstract Factory design motif, QuickUML
             TO    SC         BV
Occurrences  8     -          11
Precision          -          72.73%
Recall             -          100.00%

Composite design motif, JHotDraw
             TO    SC         BV
Occurrences  2     1          8
Precision          100.00%    25.00%
Recall             50.00%     100.00%

Composite design motif, QuickUML
             TO    SC         BV
Occurrences  2     1          2
Precision          100.00%    100.00%
Recall             50.00%     100.00%

TO (true occurrences)
SC (scoring algorithm)
BV (bit-vector processing algorithm)
Conclusion
Summary
Present two pattern matching algorithms to identify design motifs
• Automata simulation – no improvement in performance
• Bit-vector processing – better performance than previous work
Allow approximations to be adjusted as needed
Future work
Use metric-based analysis to reduce the search space
Integrate data on method and field declarations
Run more experiments on larger systems and other patterns (anti-patterns)
Discussion
Contribution
Suggest a new approach to identify design patterns
• Provide interesting performance, precision and recall with respect to previous approaches
• Show that automata simulation seems too slow to be useful
• Suggest four approximations to identify design motifs flexibly
Limitation
Limited precision when considering the developers’ intents
Difficulty in identifying the Singleton motif
Lack of approximations based on method type or field access
Thank you