FP7-SME-2008-2 243768
OPEN-SME
Open-Source Software Reuse Service for SMEs
Deliverable D3.1b
Open Source Search Engine (v.2)
Deliverable Type: PU*
Nature of the Deliverable: P**
Date: March 5, 2012
Distribution: WP3
Code: OPEN-SME/AUTH/WP3/D3.1b
Editor: AUTH
Contributors: AUTH, TTEL
*Deliverable Type: PU= Public, RE= Restricted to a group specified by the Consortium, PP= Restricted to other program
participants (including the Commission services), CO= Confidential, only for members of the Consortium (including the Commission services)
**Nature of the Deliverable: P= Prototype, R= Report, S= Specification, T= Tool, O= Other
Abstract: This is a brief report accompanying the OCEAN tool prototype (version 2), already
available to the consortium, with respect to the specifications and functionality already achieved.
Copyright by the OPEN-SME Consortium.
The OPEN-SME Consortium consists of:
Enosi Mihanikon Pliroforikis & Epikinonion Ellados Project Coordinator Greece
Drustvo za informacione sisteme I racunarske mreze-Informaciono drustvo Srbije Partner Serbia
Epistimoniko Techniko Epimelitirio Kyprou (Technical Chamber of Cyprus) Partner Cyprus
Teknikbyn Science Park Vasteras AB Partner Sweden
SOLINET GmbH Telecommunications Partner Germany
GNOMON Informatics SA Partner Greece
Maelardalens Hoegskola Partner Sweden
Teletel S.A. - Telecommunications and Information Technology Partner Greece
Aristotelio Panepistimio Thessalonikis Partner Greece
Universiteit Maastricht Partner Netherlands
Deliverable D3.1b: Open Source Search Engine (version 1) Page 2 of 19
OPEN-SME/AUTH/WP3/D3.1b OPEN-SME Consortium August 2011
Table of Contents
ABBREVIATIONS
1. INTRODUCTION
1.1 DELIVERABLE SCOPE
2. TECHNOLOGY PLATFORM
3. API DEVELOPMENT
3.1 EXTERNAL QUERY SERVICES
3.2 DATABASE SERVICE API
4. USER INTERFACE AND USE CASE DEVELOPMENT
5. SCREENSHOTS
ABBREVIATIONS
CBD Component Based Development
CBSE Component Based Software Engineering
CMMI Capability Maturity Model Integration
COMPARE Component Repository and Search Engine
COTS Commercial Off The Shelf
CPU Central Processing Unit
EFP Extra Functional Property
ISO International Organization for Standardization
JSP Java Server Pages
PI Provided Interface
ProCom Progress Component Model
QoS Quality of Service
RCP Rich Client Platform
RI Required Interface
RTOS Real-time Operating System
RUP Rational Unified Process
SME Small and Medium scale Enterprise
SWEET Swedish Worst Case Execution Time Tool
V&V Verification and Validation
WCET Worst Case Execution Time
1. INTRODUCTION
1.1 DELIVERABLE SCOPE
This is a brief report accompanying the OCEAN tool prototype (version 2), already available to the consortium, with respect to the specifications and functionality already achieved.
2. TECHNOLOGY PLATFORM
An instance of Liferay, the open source content management system and enterprise portal, has
been deployed and is accessible at http://ocean.gnomon.com.gr. To log in as an administrator,
use the account (username, password) = (root, test). Basic user, role and security management
has been implemented, and access to the tools is restricted to registered users only.
The registration functionality and process are in place.
After the development of OCEAN v1.0, and during its testing phase, it became evident that the
fundamental design assumption no longer held: the internal search API could not rely on calling a
number of external APIs to fetch results from OS search engines. Already during the early test phase
(Q3-Q4 2011) the Merobase API service stopped working, leaving the Google Code search API as the
single working source for OCEAN. This already problematic situation (a metasearch engine with only
one source) quickly became a dead end when Google Code ceased service as of January 15th, 2012.
Thus there was an urgent need to redefine the functionality and basic design of OCEAN. The OCEAN team
returned to the drawing board and quickly came back with an alternative system design: instead of
calling external APIs, the meta-search engine calls a new HTTP-based web service running on a Debian
Linux server at the Aristotle University of Thessaloniki. This service queries standard HTML-based
Open Source search sites and scrapes the first N results returned by their native web interface. To
achieve the desired functionality quickly, the team used the free web data extraction tool DEiXTo
[http://deixto.com] and custom Perl CGI scripts capable of searching Koders and Krugle in real time.
As far as Merobase is concerned, after communication with its creators, access to a brand new API was
provided through a JAR search client. A Perl web service (running on the same server) was therefore
also written; it uses this API and returns the results for a user-specified query in a suitable XML format.
The revised OCEAN architecture is shown in Figure 1.
Figure 1: Revised OCEAN Architecture
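The fan-out step of a meta-search engine of this kind can be sketched in Java as follows. This is a minimal illustration, not the OCEAN implementation: it assumes each external service can be wrapped as a function from a query string to a list of hits, and the class name MetaSearchSketch, the method searchAll and the plain-string result type are all hypothetical. It requires Java 9 or later. An engine that fails or does not answer within the timeout contributes no results instead of failing the whole search, which reflects the experience with disappearing sources described in this section.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical fan-out helper; not part of the OCEAN code base. */
final class MetaSearchSketch {

    /**
     * Queries every engine in parallel and concatenates whatever comes back
     * within the timeout. An engine that throws or times out contributes an
     * empty list, so one broken source does not break the whole search.
     */
    static List<String> searchAll(Map<String, Function<String, List<String>>> engines,
                                  String query, long timeoutMillis) {
        // Start all requests first so that the engines run concurrently.
        Map<String, CompletableFuture<List<String>>> pending = new LinkedHashMap<>();
        for (Map.Entry<String, Function<String, List<String>>> entry : engines.entrySet()) {
            Function<String, List<String>> engine = entry.getValue();
            pending.put(entry.getKey(),
                CompletableFuture.supplyAsync(() -> engine.apply(query))
                    .completeOnTimeout(List.<String>of(), timeoutMillis, TimeUnit.MILLISECONDS)
                    .exceptionally(error -> List.<String>of()));
        }
        // Collect the results in the order in which the engines were registered.
        List<String> merged = new ArrayList<>();
        for (Map.Entry<String, CompletableFuture<List<String>>> entry : pending.entrySet()) {
            for (String hit : entry.getValue().join()) {
                merged.add(entry.getKey() + ": " + hit);
            }
        }
        return merged;
    }
}
```

In a real deployment each function would issue an HTTP request to one of the web services and parse its XML response; those details are deliberately left out of the sketch.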
Finally, in addition to the new architecture, the OCEAN user interface has been improved: basic search
parameters such as language, license and return type have been added to the main page, and user
preferences management has been improved (see Screenshots).
3. API DEVELOPMENT
Internal Query API: The following API methods have been developed and tested according to the
specifications:
# Method
1 searchOSS( textToSearch:Text,
searchBase:Text,
engines:List,
licences:List,
metrics:List,
userDefinedOptions:List,
resultGranularity:List,
async:Boolean,
timeout:function,
complete:function):
searchResults:List
2 getEngines(void) : engines:List
3 getEngine(engine:String) : result:Engine
External Query and Fetch APIs: The API has been implemented for the Google Code [1] and
Merobase [2] search engines.
Database and DB access API: The specified DB schema has been defined and implemented in
the MySQL database that is part of the deployed Liferay instance. The method
storeResults(userID:Number, searchResults:List):void has been
implemented and tested.
3.1 EXTERNAL QUERY SERVICES
Typically there are two main mechanisms for searching and retrieving data from a website: through an
Application Programming Interface (API), if one is available, or via screen scraping. The first is
generally preferable, being faster and more reliable. However, a search API is not always available. In
such cases web robots, also called agents, are usually used to simulate a person searching the target
website through a web browser and to capture the bits of interest using scraping techniques. So, for
the open source code search engines Koders and Krugle, which do not offer an API, we deployed
DEiXTo-based wrappers that scrape in real time the results returned from their native web interface.
Custom Perl scripts were written and installed on a Debian Linux server at the premises of the
Aristotle University of Thessaloniki, which enabled OCEAN to search the two websites through an
external web service. It should be noted that for Krugle an open source web browser automation tool
called Selenium [3] was also used.
[1] http://www.google.com/codesearch
[2] http://merobase.com/#main
Moreover, for Merobase, a JAR search client was utilized. After communicating with the Merobase core
developer, we got access to their brand new API. Thus, a third Perl service was created that returns
Merobase results in a suitable XML format.
More specifically:
Koders
The koders script is based on DEiXToBot (a Mechanize agent object capable of executing extraction
rules previously built with the DEiXTo GUI tool). The service supports 4 URL parameters: s (for the
search keyword), li (for license), la (for language) and n (for the number of results requested).
Example: http://swserv2.csd.auth.gr/cgi-bin/koders.pl?li=*&la=*&s=perl&n=20
This HTTP request results in the following native HTTP request:
http://www.koders.com/default.aspx?s=perl&la=*&li=*&p=0
The XML response returned by our service is depicted in Figure 2:
Figure 2: Example XML response from Koders
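As an illustration of how a client might assemble a request for the koders service from the four documented parameters, consider the following Java sketch. The class KodersRequestSketch and the method buildUrl are hypothetical names introduced here; only the endpoint and the parameter names s, li, la and n come from the example above. The parameter values are URL-encoded so that keywords containing spaces or special characters do not break the query string (Java 10 or later is assumed for the URLEncoder overload used).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper; class and method names are illustrative only. */
final class KodersRequestSketch {

    // Endpoint taken from the example request in this section.
    static final String ENDPOINT = "http://swserv2.csd.auth.gr/cgi-bin/koders.pl";

    /** Builds the request URL from the four documented parameters. */
    static String buildUrl(String keyword, String license, String language, int maxResults) {
        return ENDPOINT
            + "?li=" + encode(license)
            + "&la=" + encode(language)
            + "&s=" + encode(keyword)
            + "&n=" + maxResults;
    }

    // URLEncoder leaves '*' untouched and encodes a space as '+'.
    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

Called as buildUrl("perl", "*", "*", 20), the sketch reproduces the example request URL shown above.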
Krugle
The krugle script/service has two pillars: a) the Selenium Server (version 2.20.0) and b) DEiXToBot.
Selenium allows us to launch a Firefox instance in order to programmatically simulate the process of
searching on Krugle's website, while DEiXToBot parses the result data in the HTML result pages and
transforms them into XML. The script supports the following URL parameters: s (for the search
keyword), project, language, license and n (for the number of results requested).
Example:
http://swserv2.csd.auth.gr/cgi-bin/krugle.pl?s=java&n=10&project=&language=&license=
An example of the XML response is depicted in Figure 3:
[3] http://seleniumhq.org/
Figure 3: Example XML response from Krugle
The HTTP request above results in this native request:
http://opensearch.krugle.org/document/?query=java&project=&language=&license=&search_type=advanced_search
Merobase
The Merobase Perl script (harnessing a Java search client) can submit queries in real time to Merobase
through its API. It supports 2 parameters: s (for the search keyword) and n (for the number of results
requested).
Example:
http://swserv2.csd.auth.gr/cgi-bin/merobase.pl?s=java&n=25
The resulting XML response is depicted in Figure 4:
Figure 4: Example XML response from Merobase
3.2 DATABASE SERVICE API
public void storeResults(long userId, String title, String description, String[] tags, List oceanResults)
Description
This call stores a list of OceanSearchResults in the database, characterized by a title, a
description and a number of tags. The userId of the user who requests the operation is also
required.
Argument Description
userId : Id of the user requesting the operation.
title : The title of the query.
description : The description of the query.
tags : The tags associated with the query.
oceanResults : List of result records.
public Queries getQuery (String queryId)
Description
Returns the Queries object stored in the database under the given queryId. A Query is
specified by a Title, a Description, an array of Tags and a list of QueryResults; that is, it is
a group of QueryResults with common search criteria.
Argument Description
queryId : the Id of the query that is stored in the database.
public void deleteQuery(String queryId)
Description
Deletes the Queries object with the given queryId from the database. All children
associated with it (Metrics, Metadata, etc.) are deleted as well.
Argument Description
queryId : the Id of the query that is stored in the database.
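The relationship between a stored query and its children can be illustrated with a small in-memory stand-in written in Java. This is only a sketch under stated assumptions: the real service persists to the MySQL database of the Liferay instance, whereas the class QueryStoreSketch and its nested SavedQuery type are hypothetical, results are reduced to plain strings, and storeResults returns the generated id here (the documented method returns void) so that the example can look the query up again.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory stand-in for the database service; all names here are hypothetical. */
final class QueryStoreSketch {

    /** A saved query: title, description, tags and the results grouped under it. */
    static final class SavedQuery {
        final long userId;
        final String title;
        final String description;
        final List<String> tags;
        final List<String> results;

        SavedQuery(long userId, String title, String description,
                   List<String> tags, List<String> results) {
            this.userId = userId;
            this.title = title;
            this.description = description;
            this.tags = new ArrayList<>(tags);
            this.results = new ArrayList<>(results);
        }
    }

    private final Map<String, SavedQuery> queries = new HashMap<>();
    private int nextId = 1;

    /** Stores a query together with its tags and results; returns the generated id. */
    String storeResults(long userId, String title, String description,
                        List<String> tags, List<String> results) {
        String queryId = "q" + nextId++;
        queries.put(queryId, new SavedQuery(userId, title, description, tags, results));
        return queryId;
    }

    /** Returns the stored query, or null when the id is unknown. */
    SavedQuery getQuery(String queryId) {
        return queries.get(queryId);
    }

    /** Removes the query; its tags and results go with it because they live inside it. */
    void deleteQuery(String queryId) {
        queries.remove(queryId);
    }
}
```

In a relational schema the same cascade would typically be expressed through foreign keys from the child tables to the query table; the sketch only mirrors the observable behaviour of store, get and delete.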
public void storeQuery(long userId, Queries query, String title, String description)
Description
A Queries object is created in the database with a title and a description. The userId of
the user who requests the operation is required.
Argument Description
userId : Id of the user requesting the operation.
title : The title of the query.
description : The description of the query.
public void storeEntries(List oceanResults, Queries query)
Description
A list of OceanSearchResults objects is created in the database and attached to a Query
object that already exists in the database.
Argument Description
oceanResults : List of result records.
query : The query with which the results are associated.
public void storeMetrics(QueryResults queryResults, OceanSearchResult searchResult)
Description
A QueryMetrics object is created in the database and attached to an existing
queryResults object; the appropriate metrics values are retrieved from the
searchResult object.
Argument Description
queryResults : a QueryResults object that belongs to a Queries object. Both already exist in the database.
searchResult : a result that has been returned by the search interface.
public void storeMetadata(QueryResults queryResults, OceanSearchResult searchResult)
Description
A QueryMetadata object is created in the database and attached to an existing
queryResults object; the appropriate metadata values are retrieved from the
searchResult object.
Argument Description
queryResults : a QueryResults object that belongs to a Queries object. Both already exist in the database.
searchResult : a result that has been returned by the search interface.
public void storeTags(String[] tags, Queries query)
Description
An array of tags is stored in the database and associated with the query object that
already exists in the database.
Argument Description
tags : a list of tags that are associated with the query object.
query : the query object with which the tags are associated.
public List listQueryResults(String queryId)
Description
A list of QueryResults is returned based on the queryId of the parent Queries object.
Argument Description
queryId : the Id of the query that is stored in the database.
public List listQueryMetrics(String queryId)
Description
A list of QueryMetrics is returned based on the queryId of the parent Queries object.
Argument Description
queryId : the Id of the query that is stored in the database.
public List listResultMetadata(String queryId);
Description
A list of ResultMetadata is returned based on the queryId of the parent Queries object.
Argument Description
queryId : the Id of the query that is stored in the database.
Figure 5: Database Schema
4. USER INTERFACE AND USE CASE DEVELOPMENT
The following table summarizes the development status of the use cases defined for the tool.
Use Case                                                  Status
Use Case 1: Create user account                           Completed & tested
Use Case 2: Approve user account request                  Completed & tested
Use Case 3: User Login                                    Completed & tested
Use Case 4: User Log Out                                  Completed & tested
Use Case 5: Request forgotten password                    Completed & tested
Use Case 6: Create/Edit User profile                      Completed & tested
Use Case 8: Perform search (Freetext)                     Completed & tested
Use Case 7: Perform search (Navigational)                 Delayed until the 1st week of April due to the other priorities described in Section 2, since additional resources had to be allocated to the integration of the DEiXTo tool in the OCEAN architecture. Currently under development and testing.
Use Case 9: Perform search (Advanced)                     Completed & tested
Use Case 10: Store search results                         Completed & tested
Use Case 11: View saved queries                           Completed & tested
Use Case 12: Subscribe to search notification service     Delayed until the 1st week of April due to the other priorities described in Section 2, since additional resources had to be allocated to the integration of the DEiXTo tool in the OCEAN architecture. Currently under development and testing.
5. SCREENSHOTS
Figure 6: Sign-in welcome page
Figure 7: Basic search functionality using the Merobase and Krugle search engines
Figure 8: Account Management page
Figure 9: Create/Edit User Profile.
Figure 10: Selection of results to be saved.
Figure 11: Title, Description and Tags of saved query.
Figure 12: List of saved queries.
Figure 13: Details of the saved query.