[IEEE 2010 Fifth Chinagrid Annual Conference (ChinaGrid) - Guangzhou, TBD, China (2010.07.16-2010.07.18)] 2010 Fifth Annual ChinaGrid Conference - A Semantic Web Service Discovery

A Semantic Web Service Discovery Model Based On Pastry System

Ting Wang1 Dept. of Computer Science & Technology Univ. of Beijing University of Technology

Beijing, China 100124 [email protected]

Rui-Hua Di2 Dept. of Computer Science & Technology Univ. of Beijing University of Technology

Beijing, China 100124 [email protected]

ABSTRACT: Existing structured P2P systems route requests to resources strictly by keyword matching, so their routing mechanisms cannot reflect the semantic information of massive, distributed resources; this reduces both the precision and the recall of service discovery. The Semantic Web, an embryonic form of the next-generation World Wide Web, supports semantic queries and enables more efficient, intelligent search. Based on Pastry, a typical structured P2P system, this paper proposes a new ontology-based semantic Web Service discovery model and discusses in depth how to integrate semantics with peer-to-peer networks.

KEYWORDS: Semantic Web; Ontology; Web Service; P2P; Pastry

I. INTRODUCTION

As networks grow, applications based on the client/server (C/S) model place ever-increasing demands on the processing capacity of servers. Because all data exchange passes through the server, traffic is distributed unevenly across the network. The need to relieve the central server's load and bandwidth bottlenecks has drawn attention to a new kind of large-scale distributed system, the peer-to-peer network (hereinafter P2P network), which has developed rapidly and soon replaced the Web as the largest consumer of bandwidth on the Internet.

Because P2P networks require no changes to the network infrastructure and offer low-cost service, flexible deployment, and scalability, they have become a focus of research. In such networks, quickly and efficiently locating the node that stores a given document is particularly important.

Scalability and search efficiency are core issues for P2P networks and have always been key constraints on the routing system. Researchers have proposed a number of routing solutions, such as flooding (Gnutella), a central index (Napster), distributed hash tables (DHT; Chord, CAN), and prefix/suffix addressing (Pastry, Tapestry). These techniques each have advantages and disadvantages in search efficiency, support for fuzzy queries, maintenance overhead, and scalability, but all of them route by strict keyword matching and therefore cannot support semantic queries.

The Semantic Web, which supports semantic queries and intelligent search to improve search efficiency, is an embryonic form of the future World Wide Web. Peer-to-peer technology, which provides low-cost file sharing, distributed computing, and distributed search, is widely used on the Internet. Both have been active research fields in recent years, and how to integrate semantics with peer-to-peer networks has become an emerging issue in distributed systems research.

A typical web service uses a service-oriented architecture with three important roles: the service requestor, the service provider, and the registry. Traditional web service discovery mechanisms store and retrieve service descriptions through a centralized service registry.

This paper uses the OWL-S language to describe web services and uses the Service Profile to describe the semantic characteristics of a service (inputs, outputs, etc.), thereby integrating the Semantic Web with P2P networks. Building on the structured P2P routing mechanism of Pastry and on ontology concepts, we hash a complex ontology concept to a simple keyword value, so that services matching a user's request can be found quickly and accurately; that is, we increase the precision of service queries and introduce semantic queries to web service discovery. In addition, we classify the ontology concepts that characterize services: based on the semantic similarity between concepts, we divide them into different concept groups, which helps improve the rate of semantic service discovery.

II. RELATED WORK

P2P technology has been studied in depth both domestically and internationally, and a number of application prototypes have been built. The main representatives are the centralized sharing model of Napster, the pure P2P sharing model of Gnutella (usually called an unstructured P2P system), and the distributed hash systems (structured P2P systems) that have emerged from research in recent years. The Napster model concentrates information in a central directory server under unified management; its structure is simple, but when the central directory server fails, the entire sharing system is paralyzed. The Gnutella model has no fixed structure and queries by broadcasting messages; it is highly fault tolerant and has no single point of failure, but it consumes a great deal of bandwidth.

The Fifth Annual ChinaGrid Conference

978-0-7695-4106-8/10 $26.00 © 2010 IEEE

DOI 10.1109/ChinaGrid.2010.24


In a structured P2P system, each node and each resource is mapped by a hash function to a global ID, and a distributed hash table is then used for global lookup. Searches are therefore strongly directed, which not only reduces bandwidth consumption but also improves search efficiency.

Typical structured P2P systems include CAN, Chord, Pastry, and Tapestry. Such structures provide stable scalability and reliability; the one selected in this paper is Pastry.

This paper introduces ontology concepts and service eigenvectors into the structured P2P network model and uses hash-based routing to register and look up services, improving the efficiency of service discovery. At the same time, the registry nodes are peers that are self-adaptive and self-organizing, with no strict dependence on one another, so the model scales well.

III. INTRODUCTION TO THE PASTRY NETWORK MODEL

Pastry is a DHT routing mechanism: a scalable distributed object location and routing protocol proposed by Microsoft Research. Pastry nodes form a structured, self-organizing overlay network. The Pastry routing algorithm retrieves results effectively while keeping the number of routing steps within O(log N), where N is the total number of nodes, which makes search scalable.

In a Pastry network, each node is both a client and a server. A node joining the network is randomly assigned a 128-bit NodeID, which determines its position in the node ID space. The NodeID can be computed as a hash of the node's public key or IP address. Each node maintains three data structures: (1) the routing table, (2) the leaf set, and (3) the neighborhood set, as shown in Figure 1:

Figure 1. Pastry node data structure

The routing table, denoted R, usually contains $\lceil \log_{2^b} N \rceil$ rows (b is an adjustable Pastry configuration parameter, typically 4, and N is the number of nodes in the network), with $2^b - 1$ entries per row. Each entry in row n points to a node whose ID shares its first n digits (in base $2^b$) with the current node's ID but whose (n+1)-th digit takes one of the other possible values, different from the (n+1)-th digit of the current node's ID.
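As a worked instance of the routing table dimensions, take the usual b = 4 and a hypothetical network of N = 10^6 nodes (this value of N is an illustrative assumption, not a figure from the paper):

```latex
\[
\left\lceil \log_{2^b} N \right\rceil
  = \left\lceil \log_{16} 10^{6} \right\rceil
  = \left\lceil \frac{6}{\log_{10} 16} \right\rceil
  = \lceil 4.98\ldots \rceil
  = 5 \text{ rows},
\qquad
2^b - 1 = 15 \text{ entries per row},
\]
\[
\text{so the routing table holds at most } 5 \times 15 = 75 \text{ entries.}
\]
```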

The neighborhood set, denoted M, stores information (node ID and IP address) about the |M| nodes that are closest to the current node in terms of communication cost. The neighborhood set is not normally needed when routing messages; it exists to maintain communication efficiency.

The leaf set, denoted L, contains the |L|/2 nodes with the closest IDs smaller than the current node's ID and the |L|/2 nodes with the closest IDs larger than it; that is, the nodes whose IDs are numerically closest to the current node. The leaf set is used during message routing. |L| is normally set to $2^b$, and |M| is normally set to $2 \times 2^b$.
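The three per-node structures and their usual sizes can be sketched as a small Python data type. This is a minimal illustration under stated assumptions: the names `PastryNodeState` and `table_dimensions` are hypothetical and do not come from the paper or from any Pastry implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PastryNodeState:
    """Hypothetical container for the three tables a Pastry node maintains."""
    node_id: str                       # NodeID written as base-2^b digits
    b: int = 4                         # Pastry configuration parameter
    routing_table: List[List[Optional[str]]] = field(default_factory=list)  # R
    leaf_set: List[str] = field(default_factory=list)                       # L
    neighborhood_set: Dict[str, str] = field(default_factory=dict)          # M: NodeID -> IP


def table_dimensions(n_nodes: int, b: int = 4) -> Dict[str, int]:
    """Return the table sizes quoted in the text for a network of n_nodes."""
    base = 2 ** b
    rows, capacity = 0, 1
    # Smallest `rows` with base**rows >= n_nodes, i.e. ceil(log_base(n_nodes)).
    # An integer loop avoids floating-point rounding in math.log.
    while capacity < n_nodes:
        capacity *= base
        rows += 1
    return {
        "rows": rows,                          # ceil(log_{2^b} N)
        "entries_per_row": base - 1,           # 2^b - 1
        "leaf_set_size": base,                 # |L| = 2^b
        "neighborhood_set_size": 2 * base,     # |M| = 2 * 2^b
    }


print(table_dimensions(10 ** 6))
```

The state of each node therefore grows only logarithmically with the network size, which is what keeps the per-node maintenance cost manageable.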

IV. PASTRY ROUTING MECHANISMS

When a node receives a message, it first checks whether the key D falls within the range of node IDs covered by its leaf set. If so, the message is forwarded directly to the leaf-set node whose ID is numerically closest to D.

If the key is not covered by the leaf set, the node consults its routing table and forwards the message to a node whose ID shares a prefix with the key that is at least one digit longer than the prefix shared by the current node's ID.

If the corresponding routing table entry is empty or points to a failed node, the node searches the union of its state tables (leaf set, routing table, and neighborhood set) for a node whose ID shares a prefix with the key at least as long as the current node's does and is numerically closer to the key than the current node's ID.

Therefore, routing can continue as long as no more than half of the nodes in the leaf set fail at the same time. As the above shows, every routing step brings the message closer to the target node than the previous step, so the process converges. The routing process is shown in Figure 2:

Figure 2. Pastry message routing process
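The three routing cases described in this section can be sketched as a single next-hop function. This is a simplified illustration, not the authors' implementation: it assumes short hexadecimal IDs (b = 4), ignores wrap-around of the ID ring, does not model failure detection, and uses hypothetical names (`next_hop`, `shared_prefix_len`, `distance`).

```python
from typing import Dict, Iterable, List, Tuple


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits two equal-length ID strings have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def distance(a: str, b: str) -> int:
    """Numerical distance between two hexadecimal IDs (ring wrap-around ignored)."""
    return abs(int(a, 16) - int(b, 16))


def next_hop(node_id: str, key: str, leaf_set: List[str],
             routing_table: Dict[Tuple[int, str], str],
             neighborhood_set: Iterable[str]) -> str:
    """Choose the node to forward to; returning node_id means 'deliver here'."""
    # Case 1: the key lies within the leaf set range, so pick the closest ID.
    members = list(leaf_set) + [node_id]
    values = [int(m, 16) for m in members]
    if min(values) <= int(key, 16) <= max(values):
        return min(members, key=lambda m: (distance(m, key), m))

    # Case 2: routing table entry in row p whose next digit matches the key,
    # i.e. a node sharing a prefix at least one digit longer with the key.
    p = shared_prefix_len(node_id, key)
    entry = routing_table.get((p, key[p]))
    if entry is not None:
        return entry

    # Case 3 (rare): any known node with an equally long shared prefix
    # that is numerically closer to the key than the current node.
    known = set(leaf_set) | set(routing_table.values()) | set(neighborhood_set)
    current = distance(node_id, key)
    candidates = [n for n in known
                  if shared_prefix_len(n, key) >= p and distance(n, key) < current]
    if candidates:
        return min(candidates, key=lambda n: (distance(n, key), n))
    return node_id


leaf = ["6598", "659f", "65a4", "65b0"]
table = {(0, "d"): "d13d"}
print(next_hop("65a1", "65a3", leaf, table, ["6c20"]))  # leaf set case
print(next_hop("65a1", "d46a", leaf, table, ["6c20"]))  # routing table case
print(next_hop("65a1", "6a00", leaf, table, ["6c20"]))  # fallback case
```

Every branch returns a node that is either numerically closer to the key or shares a longer prefix with it, which is the convergence property the text relies on.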

V. SEMANTIC WEB SERVICE DISCOVERY MECHANISM BASED ON PASTRY

A. WEB SERVICE DESCRIPTION AND SEMANTIC SERVICE EIGENVECTOR


With WSDL and OWL-S, web services can be described with rich semantic information. The web services stored on each Pastry node are described using WSDL and OWL-S. To publish and query web services efficiently together with their semantic information, we introduce the service eigenvector: the semantic description of a service is mapped to a one-dimensional vector, and both service publishing and service querying are expressed in terms of this vector. When a service is published, an eigenvector describes that specific service. Similarly, during service discovery, the request (query) is converted into the corresponding service eigenvector and matched against the eigenvectors published in the network (based on the number of overlapping prefix digits) to locate a suitable service.

B. THE CALCULATION OF CONCEPT GROUPS AND SERVICE EIGENVECTORS

First, we classify the ontology concepts that describe the functional information of services, such as inputs and outputs, in order to build ontology concept groups (Concept Groups), as shown in Figure 3. We define an ontology concept group as a collection of semantically similar ontology concepts, where semantic similarity is determined by the distance between concepts in the concept chart and by the number of attributes they have in common. Concept groups are also ordered by size, according to the following comparison rules:

Concept group CGx is said to be bigger than concept group CGy if either of the following two conditions holds:

1. CGx has a higher position than CGy in the chart.

2. CGx and CGy are at the same level in the chart, and CGx is to the left of CGy.
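The two comparison rules amount to a lexicographic order on (level, left-to-right position). A minimal sketch follows, assuming that level 0 is the top of the chart and position 0 is the leftmost group on a level; the `ConceptGroup` type and the example groups are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ConceptGroup:
    name: str
    level: int      # depth in the concept chart; 0 is the top (assumed convention)
    position: int   # left-to-right index within the level; 0 is leftmost


def is_bigger(x: ConceptGroup, y: ConceptGroup) -> bool:
    """True if x is 'bigger' than y under the two rules."""
    if x.level != y.level:
        return x.level < y.level      # rule 1: higher in the chart
    return x.position < y.position    # rule 2: same level, further left


def sort_descending(groups: List[ConceptGroup]) -> List[ConceptGroup]:
    """Biggest group first."""
    return sorted(groups, key=lambda g: (g.level, g.position))


vehicle = ConceptGroup("Vehicle", 0, 0)
car = ConceptGroup("Car", 1, 0)
boat = ConceptGroup("Boat", 1, 1)
print([g.name for g in sort_descending([boat, vehicle, car])])
```

Because the order is total, any permutation of the same groups sorts to the same sequence, which is what makes the later eigenvector independent of parameter order.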

Figure 3. Ontology concept and concept group

On this basis, we map the functional information in the semantic description of a service (for both publishing and querying), such as inputs and outputs, to the corresponding concept groups. Because mapping an ontology concept to a hash value can effectively improve semantic reasoning, in this paper we use a numerical keyword (key) to represent the ontology concept of each functional attribute.

We convert the WSDL and OWL-S description of a web service into an eigenvector in the following four steps.

(1) Use the Bloom key hash algorithm to determine whether a group of concepts is semantically similar.

(2) For each input Ii or output Oi, find the concept group CGi to which it belongs.

(3) Because the order of the input and output parameters in a service description does not affect its functional attributes, sort the input and output parameters in descending order according to the size relationship defined on concept groups.

(4) For each sorted concept group CGi, use the Bloom key hash algorithm to generate the corresponding keyword ki; the combination of all the ki of a service constitutes its eigenvector.
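The four steps above can be sketched as follows. The paper does not specify the Bloom key hash, so this sketch substitutes two stand-ins that are assumptions: a hand-written concept-to-group table replaces the similarity test of step (1), and the first four hexadecimal digits of SHA-1 replace the Bloom key of step (4). All concept and group names are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

# Hypothetical classification: concept -> (group name, level, position).
# In the paper this grouping comes from semantic similarity (step 1).
CONCEPT_GROUPS: Dict[str, Tuple[str, int, int]] = {
    "Author": ("Person", 1, 0),
    "Writer": ("Person", 1, 0),
    "Book": ("Publication", 1, 1),
    "Novel": ("Publication", 1, 1),
    "Price": ("Money", 2, 0),
}

SEPARATOR = "d"  # separates input keys from output keys


def group_key(group_name: str) -> str:
    """Stand-in for the Bloom key hash: first 4 hex digits of SHA-1."""
    return hashlib.sha1(group_name.encode("utf-8")).hexdigest()[:4]


def keys_for(concepts: List[str]) -> List[str]:
    groups = [CONCEPT_GROUPS[c] for c in concepts]       # step 2: find CGi
    groups.sort(key=lambda g: (g[1], g[2]))              # step 3: biggest first
    return [group_key(name) for name, _, _ in groups]    # step 4: ki per group


def service_eigenvector(inputs: List[str], outputs: List[str]) -> Tuple[str, ...]:
    """Build V = (k1, ..., kp, d, k1, ..., kq)."""
    return tuple(keys_for(inputs) + [SEPARATOR] + keys_for(outputs))


print(service_eigenvector(["Price", "Author"], ["Book"]))
print(service_eigenvector(["Writer", "Price"], ["Novel"]))
```

In this sketch the two calls yield the same vector: parameter order is normalized by the sort, and concepts from the same group map to the same key, so a semantically similar request hashes to the same place as the published service.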

C. SERVICE RELEASE

In the service discovery model proposed in this paper, each node in the P2P network both accepts service registrations and accepts service queries. The model handles a service release request in the following steps:

(1) For service S, use Bloom to obtain its eigenvector V = (k1, k2, ..., kp, d, k1, ..., kq), where each ki is the hash key obtained with Bloom for an input or output of the service, and d is the separator between inputs and outputs.

(2) Use a hash algorithm to map vector V to a value K; a suitable algorithm can be chosen so that K reflects the characteristics of every element of V.

(3) Following the Pastry routing algorithm described above, look up the node Pi responsible for key K.

(4) Release service S on node Pi.
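Steps (2) to (4) can be sketched on a toy network. The assumptions here are loud ones: the ID space is only 16 bits wide (Pastry uses 128-bit NodeIDs), SHA-1 over the joined vector stands in for the unspecified hash algorithm, and hop-by-hop routing is replaced by directly picking the numerically closest NodeID, which is the node Pastry routing converges to. The function names are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

ID_BITS = 16  # toy identifier space; real Pastry NodeIDs are 128 bits


def vector_key(vector: Tuple[str, ...]) -> int:
    """Step 2: hash the whole eigenvector V to a single key K (assumed: SHA-1)."""
    digest = hashlib.sha1("|".join(vector).encode("utf-8")).digest()
    return int.from_bytes(digest[:ID_BITS // 8], "big")


def responsible_node(node_ids: List[int], key: int) -> int:
    """Step 3: the node whose NodeID is numerically closest to K."""
    return min(node_ids, key=lambda n: (abs(n - key), n))


def publish(stores: Dict[int, Dict[int, List[str]]],
            service: str, vector: Tuple[str, ...]) -> int:
    """Step 4: register service S on node P_i and return P_i's NodeID."""
    k = vector_key(vector)
    node = responsible_node(list(stores), k)
    stores[node].setdefault(k, []).append(service)
    return node


# Four hypothetical nodes, each with an empty local registry.
stores: Dict[int, Dict[int, List[str]]] = {0x1000: {}, 0x5000: {}, 0x9000: {}, 0xD000: {}}
v = ("a1b2", "d", "c3d4")
node = publish(stores, "BookPriceService", v)
print(hex(node), stores[node])
```

Because every element of V feeds the digest, K changes whenever any element changes, which is one simple way to satisfy the requirement in step (2).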

D. SEMANTIC SERVICE DISCOVERY MECHANISM

The model handles a service discovery request in the following steps:

(1) Convert the request into a service eigenvector V = (k1, k2, ..., kp, d, k1, ..., kq).

(2) Through the hash algorithm H, map vector V to the value K.

(3) Using the Pastry routing algorithm, find the node Pi responsible for key K.

(4) The services whose eigenvector is V are stored on node Pi.

In this way, a user's query can be routed quickly to the node where matching services are registered. Using Bloom, determining which ontology concept group an ontology concept belongs to is fast and efficient, so computing a service eigenvector takes little time. However, because the eigenvectors of different services may differ both in the number of elements and in the elements themselves, how to match two service eigenvectors is the problem this paper has to solve.

To solve this problem, we considered two methods. In the first, all elements of the vector V = (k1, k2, ..., d, ..., kn) are treated together as a single search keyword on the Pastry network; the P2P routing algorithm (which forwards the message according to the longest matching prefix of the keyword) locates the node responsible for that keyword, and the service is found on that node. In the second, each element of V is treated as a separate keyword; the Pastry routing algorithm is queried once per element, yielding a series of nodes, and the intersection of the results gives the best-matching node. To improve efficiency, we use the first method. Semantically described services may have different interfaces and a large number of parameters, and each parameter may match many nodes, so the number of nodes found across all parameters can be huge; computing the intersection would then be a heavy workload and would greatly reduce the efficiency of service discovery.
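The cost difference between the two methods can be illustrated with a toy in-memory index that counts lookups. This is one possible reading of the second method (intersecting the service sets returned per element), not the authors' implementation; the `ToyNetwork` class, the role tags `in:`/`out:`, and the 16-bit SHA-1 placement are all assumptions made for the sketch.

```python
import hashlib
from typing import Dict, List, Set, Tuple

SEPARATOR = "d"


def node_for(node_ids: List[int], keyword: str) -> int:
    """Place a keyword on the node whose ID is closest to its 16-bit hash."""
    key = int.from_bytes(hashlib.sha1(keyword.encode("utf-8")).digest()[:2], "big")
    return min(node_ids, key=lambda n: (abs(n - key), n))


def tagged_elements(vector: Tuple[str, ...]) -> List[str]:
    """Role-tag each element ('in:...' or 'out:...') and drop the separator."""
    role, tagged = "in", []
    for item in vector:
        if item == SEPARATOR:
            role = "out"
            continue
        tagged.append(f"{role}:{item}")
    return tagged


class ToyNetwork:
    def __init__(self, node_ids: List[int]) -> None:
        self.node_ids = list(node_ids)
        self.index: Dict[int, Dict[str, Set[str]]] = {n: {} for n in self.node_ids}
        self.lookups = 0  # number of routed queries issued

    def _put(self, keyword: str, service: str) -> None:
        self.index[node_for(self.node_ids, keyword)].setdefault(keyword, set()).add(service)

    def _get(self, keyword: str) -> Set[str]:
        self.lookups += 1
        return set(self.index[node_for(self.node_ids, keyword)].get(keyword, set()))

    def publish(self, service: str, vector: Tuple[str, ...]) -> None:
        self._put("|".join(vector), service)       # index used by method 1
        for element in tagged_elements(vector):    # index used by method 2
            self._put(element, service)

    def find_whole(self, vector: Tuple[str, ...]) -> Set[str]:
        """Method 1: the whole vector is one keyword, so one lookup."""
        return self._get("|".join(vector))

    def find_by_intersection(self, vector: Tuple[str, ...]) -> Set[str]:
        """Method 2: one lookup per element, then intersect the results."""
        results = [self._get(e) for e in tagged_elements(vector)]
        return set.intersection(*results) if results else set()


net = ToyNetwork([0x1000, 0x5000, 0x9000, 0xD000])
net.publish("S1", ("a1b2", "c3d4", "d", "e5f6"))
net.publish("S2", ("a1b2", "d", "e5f6"))
query = ("a1b2", "c3d4", "d", "e5f6")
print(net.find_whole(query), net.find_by_intersection(query), net.lookups)
```

Method 1 needs a single routed lookup regardless of vector length, whereas method 2 needs one per element plus a set intersection whose inputs grow with the number of services sharing any one parameter, which matches the efficiency argument given in the text.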

VI. SUMMARY AND FUTURE WORK

DHT-based P2P systems locate and find objects precisely by keyword using a hash function. A hash function always tries to ensure that the generated hash values are uniformly and randomly distributed; as a result, two objects that are highly similar but not identical produce completely different hash values and are stored on two essentially random nodes. Thus, a DHT can provide exact-match queries but has great difficulty supporting semantics.
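The exact-match limitation is easy to observe with any general-purpose hash. The sketch below uses SHA-1 from the Python standard library as an assumed example (the paper does not name the DHT hash function) and two hypothetical keywords that differ by a single character.

```python
import hashlib


def node_key(text: str) -> str:
    """Hex digest used as a DHT key (SHA-1 is an assumed, illustrative choice)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def hamming_bits(a: str, b: str) -> int:
    """Number of differing bits between two equal-length hex strings."""
    return bin(int(a, 16) ^ int(b, 16)).count("1")


k1 = node_key("BookPrice")
k2 = node_key("BookPrices")
print(k1)
print(k2)
print("differing bits out of 160:", hamming_bits(k1, k2))
```

The two digests share no useful prefix relationship, so prefix-based routing sends the two nearly identical keywords to unrelated nodes; this is the gap the concept-group keys are meant to close.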

In this paper, we proposed a new ontology-based semantic Web Service discovery model built on the Pastry system. The model's major innovations are as follows:

(1) We use the OWL-S language to describe web services; the Service Profile describes the semantic characteristics of a service.

(2) We classify the ontology concepts that characterize services and map each service to a specific service eigenvector.

(3) Using the Pastry structure, a service is published to the corresponding node. A service request is routed and forwarded by the Pastry routing algorithm according to the longest prefix match on the service eigenvector keyword, and ultimately reaches the node that matches the request.

With this model, we realize a semantics-based web service discovery method for a distributed P2P network environment and greatly improve the precision and recall of service discovery.

In future work, we will:

(1) Study how a single ontology concept can be mapped to more than one ontology concept group without lowering the efficiency of service discovery.

(2) Study optimal strategies and algorithms for handling nodes that join and leave concurrently.

(3) Take physical location into account. Existing structured P2P systems do not consider the physical location of nodes, so nodes that are adjacent in the logical ID space may be physically very distant. We will therefore study how to optimize Pastry's handling of query requests when the physical location of nodes is introduced as a factor.

REFERENCES

[1] D. Martin, M. Burstein, et al. OWL-S: Semantic Markup for Web Services [EB/OL]. http://www.daml.org/services/owl-s/1.0/, 2004.

[2] C. Tang, Z. Xu, S. Dwarkadas. Peer-to-peer information retrieval using self-organizing semantic overlay networks [C]. In: Proc. of SIGCOMM 2003. New York: ACM Press, 2003. 175-186.

[3] A. Rowstron, P. Druschel. Pastry: scalable, decentralized object location and routing for large-scale peer-to-peer systems [R]. IFIP/ACM International Conference on Distributed Systems Platforms, 2001.

[4] B. H. Bloom. Space/time trade-offs in hash coding with allowable errors [J]. Communications of the ACM, 1970, 13(7): 422-426.

[5] A. Crespo, H. García-Molina. Routing indices for peer-to-peer systems. In: ICDCS'02, July 2002.

[6] M. Berry, Z. Drmac, E. Jessup. Matrices, vector spaces, and information retrieval. SIAM Review, 1999, 41(2): 335-362.

[7] I. Constantinescu, B. Faltings. Efficient matching and directory services [A]. Ning Zhong. WI'03 Proceedings of IEEE/WIC International Conference on Web Intelligence [C]. Washington, DC: IEEE Computer Society, 2004. 75-81.
