DESCRIPTION
Software clone, detection, empirical studies, validation
School of Computing
Kingston, Canada
Contextualized Analysis of Web Services
James R. Cordy
David B. Skillicorn
Douglas Martin
Scott Grant
When is a Clone not a Clone? (and vice-versa)
Motivation
• The Personal Web
  • Rapidly growing number of web services makes it increasingly difficult to find and choose the right ones
  • Need a quick and convenient way to find alternatives
  • Hand tagging impractical – automation is needed!
• Automation
  • Similarity detection techniques offer solutions!
  • Code clone detection from software engineering research can find similar code fragments – why not similar services?
  • Topic models from data mining research can find text documents with similar semantics – why not similar services?
Web Service Similarity
• Web services are stored in service registries, containing WSDL service description files
• Could apply clone detection to entire service descriptions
• But what we really want are similar service operations
<operation name="GetStock" > <input message="tns:GetStockRequest" /> <output message="tns:GetStockResponse" /> </operation>
<operation name="GetStock" > <input message="tns:GetStockRequest" /> <output message="tns:GetStockResponse" /> </operation>
<complexType name="Stock"> <sequence> <element name="Supplier" type="xsd:string"/> <element name="Warehouse" type="xsd:string"/> <element name="OnHand" type="xsd:string"/> <element name="OnOrder" type="xsd:string"/> <element name="Demand" type="xsd:string"/> </sequence> </complexType>
Let’s try it!
<complexType name="Stock"> <sequence> <element name="date" type="xsd:string"/> <element name="open" type="xsd:float"/> <element name="high" type="xsd:float"/> <element name="low" type="xsd:float"/> <element name="close" type="xsd:float"/> <element name="volume" type="xsd:float"/> </sequence> </complexType>
<operation name="DrawRateChartCustom"> <input message="DrawRateChartCustomIn"/> <output message="DrawRateChartCustomOut"/> </operation>
<operation name="GetTopicBinaryChartCustom"> <input message="GetTopicBinaryChartCustomSoapIn"/> <output message="GetTopicBinaryChartCustomSoapOut"/> </operation>
How about these?
So what went wrong?
• At this point we thought maybe our idea wasn’t going to work
• Maybe clone detection can’t help with web service discovery?
• But why? What’s so special about WSDL?
Web Service Description Language (WSDL)
• A WSDL service description has 3 main parts:
  • a <portType> element where the operations are declared;
  • <message> elements corresponding to inputs, outputs and faults of the operations;
  • and a <types> element containing an XML Schema that defines the data and structure types used in the messages
Web Service Description Language (WSDL)
• This simple example service has two operations:
  • ReserveRoom
  • GetAvailableRooms
Web Service Description Language (WSDL)
• WSDL service description files contain descriptions of the operations that a web service has to offer
• But the pieces of each operation’s own description are scattered over different parts of the WSDL file
• Difficult to identify complete units to analyze and compare
The Problem
• This poses a problem for analysis techniques:
  • Operations cannot easily be compared for similarity using clone detectors, because there are no contiguous fragments to compare
  • And they cannot be analyzed using data mining topic models, because there are no separate complete documents to generate a model from
Our Solution
• Our solution is to contextualize the original <operation> elements, to create self-contained operation descriptions
  • We use source transformation to inline remote information from the context into the elements that reference or depend on it
• We call these contextualized WSDL operations Web Service Cells, or WSCells
  • The first example of a new kind of clone detection: contextual clones
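The inlining step described above can be illustrated with a minimal Python sketch. This is not the source transformation used in the talk; it is a simplified stand-in built on the standard-library `xml.etree.ElementTree`. The function name `contextualize`, the helper `local_name`, and the `SAMPLE` service are hypothetical, and the sketch assumes WSDL 1.1 namespaces, named `complexType` definitions, and one level of type references only (nested type references are not followed).

```python
import copy
import xml.etree.ElementTree as ET

WSDL = "http://schemas.xmlsoap.org/wsdl/"
XSD = "http://www.w3.org/2001/XMLSchema"

def local_name(ref):
    """Drop a namespace prefix such as 'tns:' from a reference."""
    return ref.split(":")[-1]

def contextualize(wsdl_text):
    """Return {operation name: self-contained copy of the <operation>}.

    Assumption: this only inlines messages and the named complexTypes
    their parts refer to; it does not follow nested type references.
    """
    root = ET.fromstring(wsdl_text)
    messages = {m.get("name"): m for m in root.iter(f"{{{WSDL}}}message")}
    types = {t.get("name"): t
             for t in root.iter(f"{{{XSD}}}complexType") if t.get("name")}
    cells = {}
    for port_type in root.iter(f"{{{WSDL}}}portType"):
        for op in port_type.findall(f"{{{WSDL}}}operation"):
            cell = copy.deepcopy(op)  # leave the original document untouched
            for io in cell:  # <input>, <output> and <fault> children
                message = messages.get(local_name(io.get("message", "")))
                if message is None:
                    continue
                inlined = copy.deepcopy(message)
                for part in inlined.findall(f"{{{WSDL}}}part"):
                    ref = part.get("type") or part.get("element") or ""
                    type_def = types.get(local_name(ref))
                    if type_def is not None:
                        part.append(copy.deepcopy(type_def))
                io.append(inlined)
            cells[op.get("name")] = cell
    return cells

# Hypothetical example service, loosely modelled on the GetStock slides.
SAMPLE = """
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema"
             xmlns:tns="urn:example">
  <types>
    <xsd:schema>
      <xsd:complexType name="Stock">
        <xsd:sequence>
          <xsd:element name="Supplier" type="xsd:string"/>
          <xsd:element name="OnHand" type="xsd:string"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:schema>
  </types>
  <message name="GetStockRequest"><part name="id" type="xsd:string"/></message>
  <message name="GetStockResponse"><part name="stock" type="tns:Stock"/></message>
  <portType name="StockPort">
    <operation name="GetStock">
      <input message="tns:GetStockRequest"/>
      <output message="tns:GetStockResponse"/>
    </operation>
  </portType>
</definitions>
"""

cells = contextualize(SAMPLE)
cell_text = ET.tostring(cells["GetStock"], encoding="unicode")
```

After the call, `cell_text` holds a single `<operation>` element that carries its messages and the `Stock` type definition inside it, which is the kind of contiguous fragment a clone detector or topic model can work with.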
Contextualizing WSDL Operations
Contextual Clone Detection
An Experiment
• We have run an experiment to investigate the difference between clone detection on WSCells and original raw operations
• Two sets of WSDL service description files: 1,100 operations and 7,500 operations
• Compared NICAD clone detector results for each set at various near-miss difference thresholds
  • 0% = exact clone, 10% = 1 line in 10 different, and so on
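To make the threshold concrete, here is a rough Python sketch of a line-based difference measure. It is only an approximation for illustration, not the clone detector's actual comparison: it uses the standard-library `difflib.SequenceMatcher` to find matching lines, and the function names `difference` and `is_clone_pair` are hypothetical.

```python
from difflib import SequenceMatcher

def difference(a_lines, b_lines):
    """Fraction of lines in the longer fragment with no match in the other."""
    matcher = SequenceMatcher(None, a_lines, b_lines, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    longest = max(len(a_lines), len(b_lines))
    return 0.0 if longest == 0 else 1.0 - matched / longest

def is_clone_pair(a_lines, b_lines, threshold):
    """True when the fragments differ by no more than the threshold."""
    return difference(a_lines, b_lines) <= threshold

# Two ten-line fragments that differ in exactly one line.
fragment_a = [f"line {i}" for i in range(10)]
fragment_b = list(fragment_a)
fragment_b[4] = "changed"
```

With these fragments, `is_clone_pair(fragment_a, fragment_b, 0.1)` accepts the pair as a near-miss clone, while a threshold of `0.0` rejects it because only exact matches qualify.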
An Experiment
• Number of clones decreases with WSCells

Difference    Clone Pairs in Set 1      Clone Pairs in Set 2
Threshold     Originals    WSCells      Originals    WSCells
0.0           852          705          1434         1066
0.1           852          734          1434         1228
0.2           879          775          1438         1637
0.3           884          813          1469         1637
<operation name="GetStock" > <input message="tns:GetStockRequest" /> <output message="tns:GetStockResponse" /> </operation>
<operation name="GetStock" > <input message="tns:GetStockRequest" /> <output message="tns:GetStockResponse" /> </operation>
<complexType name="Stock"> <sequence> <element name="Supplier" type="xsd:string"/> <element name="Warehouse" type="xsd:string"/> <element name="OnHand" type="xsd:string"/> <element name="OnOrder" type="xsd:string"/> <element name="Demand" type="xsd:string"/> </sequence> </complexType>
<complexType name="Stock"> <sequence> <element name="date" type="xsd:string"/> <element name="open" type="xsd:float"/> <element name="high" type="xsd:float"/> <element name="low" type="xsd:float"/> <element name="close" type="xsd:float"/> <element name="volume" type="xsd:float"/> </sequence> </complexType>
• Reduction in false positives
• Number of clone classes can increase with WSCells

Difference    Clone Classes in Set 1    Clone Classes in Set 2
Threshold     Originals    WSCells      Originals    WSCells
0.0           169          187          587          433
0.1           169          139          587          499
0.2           172          142          589          631
0.3           171          136          591          631
An Experiment
<operation name="GetStock" > <input message="tns:GetStockRequest" /> <output message="tns:GetStockResponse" /> </operation>
<operation name="GetStock" > <input message="tns:GetStockRequest" /> <output message="tns:GetStockResponse" /> </operation>
<complexType name="Stock"> <sequence> <element name="Supplier" type="xsd:string"/> <element name="Warehouse" type="xsd:string"/> <element name="OnHand" type="xsd:string"/> <element name="OnOrder" type="xsd:string"/> <element name="Demand" type="xsd:string"/> </sequence> </complexType>
<complexType name="Stock"> <sequence> <element name="date" type="xsd:string"/> <element name="open" type="xsd:float"/> <element name="high" type="xsd:float"/> <element name="low" type="xsd:float"/> <element name="close" type="xsd:float"/> <element name="volume" type="xsd:float"/> </sequence> </complexType>
• Splits by deeper differences – more precision
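One way to see how the number of clone pairs can fall while the number of clone classes rises is to group pairs into classes. The Python sketch below assumes a clone class is a connected component of the clone-pair relation, which is an assumption made here for illustration and not necessarily how the detector in the experiment clusters its results. The function `clone_classes` and the operation labels A to D are hypothetical.

```python
def clone_classes(pairs):
    """Group clone pairs into classes using a small union-find structure.

    Assumption: a class is a connected component of the pair relation.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    classes = {}
    for item in parent:
        classes.setdefault(find(item), set()).add(item)
    return sorted(sorted(members) for members in classes.values())

# Raw operations: four look-alike operations all pair with each other.
original_pairs = [("A", "B"), ("A", "C"), ("A", "D"),
                  ("B", "C"), ("B", "D"), ("C", "D")]
# Contextualized: inlined types reveal two distinct groups.
wscell_pairs = [("A", "B"), ("C", "D")]

original_classes = clone_classes(original_pairs)
wscell_classes = clone_classes(wscell_pairs)
```

In this invented example the raw operations give six pairs in one class, while the contextualized versions give two pairs in two classes: fewer pairs, more classes, because one coarse class has been split by deeper differences.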
Clone Detection for Web Services
• Contextual clone detection with WSCells works!
• Not only finds similar web service operations, but uncovers similar operations we could not find in any other way
<operation name="DrawRateChartCustom"> <input message="DrawRateChartCustomIn"/> <output message="DrawRateChartCustomOut"/> </operation>
<operation name="GetRealChartCustom"> <input message="GetRealChartCustomSoapIn"/> <output message="GetRealChartCustomSoapOut"/> </operation>
<operation name="GetLastSaleChartCustom"> <input message="GetLastSaleChartCustomSoapIn"/> <output message="GetLastSaleChartCustomSoapOut"/> </operation>
<operation name="DrawYieldCurveCustom"> <input message="DrawYieldCurveCustomIn"/> <output message="DrawYieldCurveCustomOut"/> </operation>
<operation name="GetTopicChartCustom"> <input message="GetTopicChartCustomSoapIn"/> <output message="GetTopicChartCustomSoapOut"/> </operation>
<operation name="GetTopicBinaryChartCustom"> <input message="GetTopicBinaryChartCustomSoapIn"/> <output message="GetTopicBinaryChartCustomSoapOut"/> </operation>
Semantic Analysis of Web Services
• Contextualized WSCells also make it possible to use data mining topic models to do semantic analysis of web services
  • Because they provide self-contained documents of significant size
• Might topic models provide a different view of web service similarity?
Latent Dirichlet Allocation
• Latent Dirichlet Allocation (LDA):
  • A statistical model to uncover latent topics
  • Identifies the correlation between documents in terms of shared latent topics (sets of tokens)
• Accepts a set of documents (e.g., source files) as input, returns probability distributions over inferred topics (a topic model) as output
  • Each document has some probability of being related to topic 1, another probability for topic 2, and so on
  • Similar documents should be related to similar topics
Latent Dirichlet Allocation
• Documents are represented in the model in terms of probability distributions over topics
• Similarity between documents is found using the Hellinger Distance
  • A measure of how much agreement there is between the shared topics of two documents
  • Almost identical documents have a small Hellinger Distance since they will be related to the same topics
  • In terms of web services, small Hellinger Distances indicate highly related operations
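For reference, the Hellinger Distance between two discrete distributions P and Q over the same topics is the square root of the sum of (sqrt(p_i) - sqrt(q_i))^2, divided by sqrt(2), so it ranges from 0 (identical) to 1 (no shared topics). A minimal Python sketch follows; the function name `hellinger` and the example distributions are illustrative, not taken from the talk.

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same topics")
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(total) / math.sqrt(2)

# Illustrative topic distributions for three documents.
doc_a = [0.7, 0.2, 0.1]
doc_b = [0.6, 0.3, 0.1]   # close to doc_a
doc_c = [0.0, 0.1, 0.9]   # mostly about a different topic
```

Here `hellinger(doc_a, doc_b)` is much smaller than `hellinger(doc_a, doc_c)`, matching the intuition that documents related to the same topics are close.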
Evaluating WSCells
• To evaluate the use of WSCells with LDA, we:
  • Generate an LDA model for the original <operation> elements, and another for the contextualized WSCells
  • Explore the Global and Local Similarity between each pair of operations in the models
• Global Similarity: an overall view of the most closely related web service operations in the service set
• Local Similarity: a per-operation view of the other most related web service operations for each operation
Global Similarity
• We look at Global Similarity using a visualization called Bluevis
• Bluevis shows the global conceptual structure of a system by highlighting similar operations using an illuminated line from left-to-right
  • Plot some top fraction of similar operations (top 25,000 in our examples)
  • Use a consistently ordered list of web service operations for the LDA model to view the differences
  • If a display is noisy, it is often an indication that the model is not identifying meaningful data
Global Similarity
Global Similarity
• For original raw operations:
  • Bluevis highlights the operations that LDA rates as most similar
  • Some clear structure
  • However, most of this is due to shared keywords, like get and SOAP
  • This uncontextualized model has very little value
Global Similarity
Global Similarity
• For contextualized WSCells:
  • A clearer semantic structure, less noise overall
  • Operation similarity becomes meaningful
• Services with semantic similarity discovered
  • E.g., operations with similar parameters or faults, such as those that manipulate holiday dates or financial rates
Local Similarity
• We can also examine the local similarity for each individual operation
  • Identify the complete ordered list of similarity scores for an operation in the data set
• Using the top similarity scores, evaluate how meaningful the data is from a user's perspective
  • For example, how can I find the most similar web service operations to the one I am using now?
• We use a tool called POCO (Pairwise Observation of Concepts) to examine the most similar operations
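The per-operation ranking described above can be sketched in a few lines of Python. This is not how POCO works internally; it is only an illustration of ordering operations by Hellinger Distance to a chosen one. The function `most_similar` is hypothetical, and the topic probabilities below are invented for the example rather than taken from any model in the talk.

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(total) / math.sqrt(2)

def most_similar(target, topics, top_n=3):
    """Names of the operations closest to `target`, nearest first."""
    scores = [(hellinger(topics[target], dist), name)
              for name, dist in topics.items() if name != target]
    return [name for _, name in sorted(scores)[:top_n]]

# Invented topic distributions (three topics) for illustration only.
topics = {
    "GetWeatherReport": [0.80, 0.15, 0.05],
    "GetWeather":       [0.75, 0.20, 0.05],
    "GetIndices":       [0.10, 0.10, 0.80],
    "GetCarriers":      [0.05, 0.85, 0.10],
}

nearest = most_similar("GetWeatherReport", topics, top_n=1)
```

With these made-up numbers, `nearest` contains only `GetWeather`, since its distribution is closest to that of `GetWeatherReport`.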
Local Similarity
Local Similarity

Operation                   Most similar WSCell              Most similar original raw WSDL operation
ListFinancials              GetFinancialServicesFromList     LanguagesList
ExportShipsAndCategories    ExportIteneraryAndSteps          Search
GetIssueData                GetFlightData                    word_cloud
GetWeatherReport            GetWeather                       GetIndices
GetAIDIBOR                  GetTRLIBOR                       GetCarriers
searchByIdentifier          searchByNameAndAddress           GetLastSecurityHeadlines
ToolsAndHardwareBox         KitchenAndHousewareBox           ListRenditions
GetReservations             GetRoomAvailabilityForDay        GetSOFIBOR
GetOtherProductInfo         NextOtherProductPortion          GetParkingInfo
GetAllSplitsByExchange      GetAllCashDividendsByExchange    GetTeamLoyalties2
Summary
• Very-high-level domain-specific languages such as WSDL make poor targets for similarity analysis using clone detection and topic models
  • Lack of local context prevents meaningful results
• Contextualizing using WSCells exposes both cloning and semantic relationships between web operations
  • Clone detection of WSCells identifies similar web service operations
  • Topic models of WSCells expose both global system-wide semantic relationships and local individual relationships between operations
Current & Future
• Continue analysis of web services for the Personal Web using our results
• Apply contextualization to similarity analysis of other modeling and specification languages (currently Simulink, Stateflow and UML sequence diagrams)
• Experiment with effect of contextualization on clone and topic model analysis of traditional languages such as Java and C (“contextual clones”)
James R. Cordy
David B. Skillicorn
Douglas Martin
Scott Grant
Questions?
Contextualized Analysis of Web Services
When is a Clone not a Clone? (and vice-versa)