OPEN-SME_AUTH_WP3_D31b

  • 7/31/2019 OPEN-SME_AUTH_WP3_D31b

    FP7-SME-2008-2 243768

    OPEN-SME

    Open-Source Software Reuse Service for SMEs

    Deliverable D3.1b

    Open Source Search Engine (v.2)

    Deliverable Type: PU*

    Nature of the Deliverable: P**

    Date: March 5, 2012

    Distribution: WP3

    Code: OPEN-SME/AUTH/WP3/D3.1b

    Editor: AUTH

    Contributors: AUTH, TTEL

    *Deliverable Type: PU= Public, RE= Restricted to a group specified by the Consortium, PP= Restricted to other program participants (including the Commission services), CO= Confidential, only for members of the Consortium (including the Commission services)

    **Nature of the Deliverable: P= Prototype, R= Report, S= Specification, T= Tool, O= Other

    Abstract: This is a brief report accompanying the OCEAN tool prototype (version 2), already

    available to the consortium, with respect to the specifications and functionality already achieved.

    Copyright by the OPEN-SME Consortium.

    The OPEN-SME Consortium consists of:

    Enosi Mihanikon Pliroforikis & Epikinonion Ellados Project Coordinator Greece

    Drustvo za informacione sisteme I racunarske mreze-Informaciono drustvo Srbije Partner Serbia

    Epistimoniko Techniko Epimelitirio Kyprou (Technical Chamber of Cyprus) Partner Cyprus

    Teknikbyn Science Park Vasteras AB Partner Sweden

    SOLINET GmbH Telecommunications Partner Germany

    GNOMON Informatics SA Partner Greece

    Maelardalens Hoegskola Partner Sweden

    Teletel S.A. - Telecommunications and Information Technology Partner Greece

    Aristotelio Panepistimio Thessalonikis Partner Greece

    Universiteit Maastricht Partner Netherlands

    Deliverable D3.1b: Open Source Search Engine (version 1) Page 2 of 19

    OPEN-SME/AUTH/WP3/D3.1b OPEN-SME Consortium August 2011

    Table of Contents

    ABBREVIATIONS

    1. INTRODUCTION

    1.1 DELIVERABLE SCOPE

    2. TECHNOLOGY PLATFORM

    3. API DEVELOPMENT

    3.1 EXTERNAL QUERY SERVICES

    3.2 DATABASE SERVICE API

    4. USER INTERFACE AND USE CASE DEVELOPMENT

    5. SCREENSHOTS

    ABBREVIATIONS

    CBD Component Based Development

    CBSE Component Based Software Engineering

    CMMI Capability Maturity Model Integration

    COMPARE Component Repository and Search Engine

    COTS Commercial Off The Shelf

    CPU Central Processing Unit

    EFP Extra Functional Property

    ISO International Organization for Standardization

    JSP Java Server Pages

    PI Provided Interface

    ProCom Progress Component Model

    QoS Quality of Service

    RCP Rich Client Platform

    RI Required Interface

    RTOS Real-time Operating System

    RUP Rational Unified Process

    SME Small and Medium scale Enterprise

    SWEET Swedish Worst Case Execution Time Tool

    V&V Verification and Validation

    WCET Worst Case Execution Time

    1. INTRODUCTION

    1.1 DELIVERABLE SCOPE

    This is a brief report accompanying the OCEAN tool prototype (version 2), already available to the consortium, with respect to the specifications and functionality already achieved.

    2. TECHNOLOGY PLATFORM

    An instance of Liferay, the open source content management system and enterprise portal, has been deployed and is accessible at http://ocean.gnomon.com.gr. For administrator access, use the account (username, password) = (root, test). Basic user, role and security management has been implemented. Access to the tools is restricted to registered users only.

    Registration functionality and process are in place.

    After the development and during the testing phase of OCEAN v1.0, it became evident that the fundamental assumption, namely that a number of external APIs could be called from the internal search API in order to fetch results from OS search engines, was no longer valid. Already during the early test phase (Q3-Q4 2011) the Merobase API service stopped working, leaving the Google Code search API as the single working source for OCEAN. This already problematic situation (a metasearch engine with only one source) quickly became a stalemate when Google Code ceased service as of January 15th, 2012.

    Thus there was an urgent need to redefine OCEAN's functionality and basic design. The OCEAN team returned to the drawing board and came back in a very short time with an alternative system design: instead of calling external APIs, the meta-search engine would call a new HTTP-based web service running on a Debian Linux server at the Aristotle University of Thessaloniki. This service queries standard HTML-based Open Source search sites and scrapes the first N results returned from their native web interface. To quickly achieve the desired functionality, the team used the free web data extraction tool DEiXTo [http://deixto.com] and custom Perl CGI scripts capable of searching Koders and Krugle in real time.

    As far as Merobase is concerned, after communicating with its creators, access to a brand new API was provided through a JAR search client. A Perl web service (running on the same server) was therefore also written, utilizing the API and returning the results for a user-specified query in a suitable XML format.

    The revised OCEAN architecture is shown in Figure 1.

    Figure 1: Revised OCEAN Architecture

    Finally, in addition to the new architecture, the OCEAN user interface has been improved: basic search parameters such as language, license and return type have been added to the main page, and user preferences management has been improved (see Screenshots).

    3. API DEVELOPMENT

    Internal Query API: The following API methods have been developed and tested according to the

    specifications:

    #  Method

    1  searchOSS(textToSearch:Text, searchBase:Text, engines:List, licences:List, metrics:List, userDefinedOptions:List, resultGranularity:List, async:Boolean, timeout:function, complete:function) : searchResults:List

    2  getEngines(void) : engines:List

    3  getEngine(engine:String) : result:Engine
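To make the shape of the internal query API more concrete, the following is a minimal Java sketch of how the searchOSS and getEngines signatures could be rendered. Everything beyond the method and parameter names is an assumption: the table only names abstract types (Text, List, function), so the Java types, the class names QueryApiSketch and Stub, and the stub behaviour are hypothetical. getEngine is left out because the Engine type is not described here. The stub performs no real search and ignores the async flag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch, not the OCEAN implementation.
final class QueryApiSketch {

    // Assumed Java rendering of the API table; all Java types are assumptions.
    interface InternalQueryApi {
        List<String> searchOSS(String textToSearch, String searchBase,
                               List<String> engines, List<String> licences,
                               List<String> metrics, List<String> userDefinedOptions,
                               List<String> resultGranularity, boolean async,
                               Runnable timeout, Consumer<List<String>> complete);

        List<String> getEngines();
    }

    // In-memory stand-in that only exercises the signature.
    static final class Stub implements InternalQueryApi {
        private final List<String> known = Arrays.asList("koders", "krugle", "merobase");

        @Override
        public List<String> getEngines() {
            return new ArrayList<String>(known);
        }

        @Override
        public List<String> searchOSS(String textToSearch, String searchBase,
                                      List<String> engines, List<String> licences,
                                      List<String> metrics, List<String> userDefinedOptions,
                                      List<String> resultGranularity, boolean async,
                                      Runnable timeout, Consumer<List<String>> complete) {
            List<String> results = new ArrayList<String>();
            for (String engine : engines) {
                if (known.contains(engine)) {
                    // One placeholder result per requested engine that is known.
                    results.add(engine + ":" + textToSearch);
                }
            }
            if (results.isEmpty()) {
                // The stub treats "no engine answered" as the timeout case.
                if (timeout != null) {
                    timeout.run();
                }
            } else if (complete != null) {
                complete.accept(results);
            }
            return results;
        }
    }

    public static void main(String[] args) {
        InternalQueryApi api = new Stub();
        List<String> none = new ArrayList<String>();
        api.searchOSS("parser", "code", Arrays.asList("koders", "krugle"),
                none, none, none, none, false,
                () -> System.out.println("timeout"),
                results -> System.out.println(results));
    }
}
```

The two callbacks mirror the timeout:function and complete:function parameters; a real implementation would presumably invoke them from the asynchronous search path rather than inline as the stub does.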

    External Query and Fetch APIs: The API has been implemented for the Google Code [1] and Merobase [2] search engines.

    Database and DB access API: The specified DB schema has been defined and implemented in the MySQL database that is part of the deployed Liferay instance. The method storeResults(userID:Number, searchResults:List):void has been implemented and tested.

    [1] http://www.google.com/codesearch

    [2] http://merobase.com/#main

    3.1 EXTERNAL QUERY SERVICES

    Typically there are two main mechanisms to search and retrieve data from a website: either through an Application Programming Interface (API), if one is available, or via screen scraping. The first is generally faster and more reliable. However, a search API is not always available. In such cases, web robots (also called agents) are usually used to simulate a person searching the target website through a web browser and to capture the bits of interest using scraping techniques. So, for the open source code search engines Koders and Krugle, which do not offer an API, we deployed DEiXTo-based wrappers to scrape in real time the results returned from their native web interface. Custom Perl scripts were written and installed on a Debian Linux server at the premises of the Aristotle University of Thessaloniki, enabling OCEAN to search the two websites through an external web service. It should be noted that for Krugle the open source web browser automation tool Selenium [3] was also used.

    Moreover, for Merobase, a JAR search client was utilized. After communicating with the Merobase core developer, we got access to their brand new API. Thus, a third Perl service was created, returning Merobase results in a suitable XML format.

    More specifically:

    Koders

    The koders script is based upon DEiXToBot (a Mechanize agent object capable of executing extraction

    rules previously built with the DEiXTo GUI tool). The service supports 4 URL parameters: s (for the

    search keyword), li (for license), la (for language) and n (for the number of results requested).

    Example: http://swserv2.csd.auth.gr/cgi-bin/koders.pl?li=*&la=*&s=perl&n=20

    This HTTP request would result in the following native HTTP request:

    http://www.koders.com/default.aspx?s=perl&la=*&li=*&p=0

    The XML response file returned by our service is depicted in Figure 2:

    Figure 2: Example XML response from Koders
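As a minimal sketch of how a client could assemble the Koders service request shown above, the following Java helper builds the URL from the four documented parameters (s, li, la, n). The class and method names are hypothetical; only the endpoint and the parameter names come from the example request. The helper performs no network call.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper (not part of OCEAN): builds the query URL for the
// koders.pl service from the four parameters documented above.
final class KodersServiceUrl {
    // Endpoint taken from the example request in the text.
    static final String ENDPOINT = "http://swserv2.csd.auth.gr/cgi-bin/koders.pl";

    // Form-encodes a parameter value; URLEncoder leaves "*" unchanged.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // li = license, la = language, s = search keyword, n = number of results.
    static String build(String keyword, String license, String language, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        return ENDPOINT
                + "?li=" + encode(license)
                + "&la=" + encode(language)
                + "&s=" + encode(keyword)
                + "&n=" + n;
    }

    public static void main(String[] args) {
        // Reproduces the example request from the text.
        System.out.println(build("perl", "*", "*", 20));
    }
}
```

Encoding the values keeps keywords containing spaces or reserved characters from corrupting the query string; the wildcard "*" used for license and language passes through unchanged.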

    Krugle

    The krugle script/service has two pillars: a) the Selenium Server (version 2.20.0) and b) DEiXToBot. Selenium allows us to launch a Firefox instance in order to programmatically simulate the process of searching on Krugle's website. DEiXToBot, in turn, facilitates the parsing of the result data in the HTML result pages and their transformation into XML. The script supports 4 URL parameters: s (for the search keyword), project, license and n (for the number of results requested). The example below additionally passes a language parameter.

    Example:

    http://swserv2.csd.auth.gr/cgi-bin/krugle.pl?s=java&n=10&project=&language=&license=

    An example of the XML response is depicted in Figure 3:

    [3] http://seleniumhq.org/

    Figure 3: Example XML response from Krugle

    The HTTP request above would result in this native request:

    http://opensearch.krugle.org/document/?query=java&project=&language=&license=&search_type=advanced_search
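The translation from the service request to the native Krugle request can be sketched as follows. This is an illustration only, written in Java rather than the Perl actually used by the service; the class and method names are hypothetical. The mapping itself (s becomes query, the project, language and license values are passed through, search_type is fixed to advanced_search, and n is not forwarded) is read off the two example URLs above. Values are copied as-is without decoding or re-encoding, which is a simplification.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the parameter mapping seen in the two
// example URLs; the real service is a Perl CGI script.
final class KrugleRequestMapper {
    // Native search endpoint taken from the example in the text.
    static final String NATIVE_BASE = "http://opensearch.krugle.org/document/";

    // Splits "a=1&b=&c=3" into an ordered map; values may be empty.
    static Map<String, String> parseQuery(String query) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(key, value);
        }
        return params;
    }

    // Maps the service query string to the native request URL.
    // The n parameter is deliberately not forwarded, as in the example.
    static String toNativeUrl(String serviceQuery) {
        Map<String, String> p = parseQuery(serviceQuery);
        return NATIVE_BASE
                + "?query=" + p.getOrDefault("s", "")
                + "&project=" + p.getOrDefault("project", "")
                + "&language=" + p.getOrDefault("language", "")
                + "&license=" + p.getOrDefault("license", "")
                + "&search_type=advanced_search";
    }

    public static void main(String[] args) {
        System.out.println(toNativeUrl("s=java&n=10&project=&language=&license="));
    }
}
```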

    Merobase

    The Merobase Perl script (harnessing a Java search client) can submit queries in real time to Merobase through its API. It supports 2 parameters: s (for the search keyword) and n (for the number of results requested).

    Example:

    http://swserv2.csd.auth.gr/cgi-bin/merobase.pl?s=java&n=25

    This would yield the XML response depicted in Figure 4:

    Figure 4: Example XML response from Merobase

    3.2 DATABASE SERVICE API

    public void storeResults(long userId, String title, String description, String[] tags, List oceanResults)

    Description

    This call stores a list of OceanSearchResults in the database, characterized by a title, a description and a number of tags. The userId of the user who requests the operation is also required.

    Argument Description

    userId : Id of the user requesting the operation.

    title : The title of the query.

    description : The description of the query.

    tags : The tags associated with the query.

    oceanResults : List of result records.

    public Queries getQuery (String queryId)

    Description

    Returns the Queries object stored in the database with the specified queryId. A Query is specified by a Title, a Description, an array of Tags and a list of QueryResults, i.e. it is a group of QueryResults sharing common search criteria.

    Argument Description

    queryId : the Id of the query that is stored in the database.

    public void deleteQuery(String queryId)

    Description

    Deletes the Queries object with the given queryId from the database. All children associated with it (Metrics, Metadata, etc.) are also deleted.

    Argument Description

    queryId : the Id of the query that is stored in the database.

    public void storeQuery(long userId, Queries query, String title, String description)

    Description

    A Queries object is created in the database with a title and a description. The userId of the user who requests the operation is required.

    Argument Description

    userId : Id of the user requesting the operation.

    title : The title of the query.

    description : The description of the query.

    public void storeEntries(List oceanResults, Queries query)

    Description

    A list of OceanSearchResults objects is created in the database and attached to a Query object that already exists in the database.

    Argument Description

    oceanResults : List of result records.

    query : The query with which the results are associated.

    public void storeMetrics(QueryResults queryResults, OceanSearchResult searchResult)

    Description

    A QueryMetrics object is created in the database and attached to an existing QueryResults object by retrieving the appropriate metrics values from the searchResult object.

    Argument Description

    queryResults : a QueryResults object that belongs to a Queries object. Both already exist in the database.

    searchResult : a result returned by the search interface.

    public void storeMetadata(QueryResults queryResults, OceanSearchResult searchResult)

    Description

    A QueryMetadata object is created in the database and attached to an existing QueryResults object by retrieving the appropriate metadata values from the searchResult object.

    Argument Description

    queryResults : a QueryResults object that belongs to a Queries object. Both already exist in the database.

    searchResult : a result returned by the search interface.

    public void storeTags(String[] tags, Queries query)

    Description

    An array of tags is stored in the database and associated with the query object that already exists in the database.

    Argument Description

    tags : a list of tags that are associated with the query object.

    query : the query object which is associated with the tags.

    public List listQueryResults(String queryId)

    Description

    A list of QueryResults is returned based on the queryId of the parent Queries object.

    Argument Description

    queryId : the Id of the query that is stored in the database.

    public List listQueryMetrics(String queryId)

    Description

    A list of QueryMetrics is returned based on the queryId of the parent Queries object.

    Argument Description

    queryId : the Id of the query that is stored in the database.

    public List listResultMetadata(String queryId)

    Description

    A list of ResultMetadata is returned based on the queryId of the parent Queries object.

    Argument Description

    queryId : the Id of the query that is stored in the database.
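To illustrate how the store, read and delete calls described in this section fit together, here is a minimal in-memory sketch in Java. It is not the actual implementation, which persists to the MySQL database of the Liferay instance. The class InMemoryQueryStore, the simplified Queries type, the use of plain strings for results and the generated ids are all assumptions made for illustration; unlike the documented storeResults, this version returns the new queryId so that the example can look the query up again.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the database service API described
// above; every type and field here is an assumption for illustration.
final class InMemoryQueryStore {

    // Simplified counterpart of the Queries object: title, description, tags.
    static final class Queries {
        final String queryId;
        final long userId;
        final String title;
        final String description;
        final List<String> tags;

        Queries(String queryId, long userId, String title, String description, List<String> tags) {
            this.queryId = queryId;
            this.userId = userId;
            this.title = title;
            this.description = description;
            this.tags = tags;
        }
    }

    private final Map<String, Queries> queries = new HashMap<String, Queries>();
    // Children are keyed by the parent queryId so they can be removed together.
    private final Map<String, List<String>> resultsByQuery = new HashMap<String, List<String>>();
    private int nextId = 1;

    // Mirrors storeResults(...), but returns the generated queryId.
    String storeResults(long userId, String title, String description,
                        String[] tags, List<String> oceanResults) {
        String queryId = "q" + nextId++;
        queries.put(queryId, new Queries(queryId, userId, title, description,
                Arrays.asList(tags.clone())));
        resultsByQuery.put(queryId, new ArrayList<String>(oceanResults));
        return queryId;
    }

    // Returns the stored query, or null if the id is unknown.
    Queries getQuery(String queryId) {
        return queries.get(queryId);
    }

    // Returns a copy of the results attached to the query (empty if none).
    List<String> listQueryResults(String queryId) {
        List<String> results = resultsByQuery.get(queryId);
        return results == null ? new ArrayList<String>() : new ArrayList<String>(results);
    }

    // Deletes the query together with the children associated with it.
    void deleteQuery(String queryId) {
        queries.remove(queryId);
        resultsByQuery.remove(queryId);
    }

    public static void main(String[] args) {
        InMemoryQueryStore store = new InMemoryQueryStore();
        String id = store.storeResults(7L, "XML parsers", "Java XML parsing code",
                new String[] {"xml", "java"}, Arrays.asList("result-1", "result-2"));
        System.out.println(store.getQuery(id).title + " has "
                + store.listQueryResults(id).size() + " results");
        store.deleteQuery(id);
        System.out.println("after delete: " + store.listQueryResults(id).size() + " results");
    }
}
```

Keying the children by the parent queryId is what makes the cascading delete a single step here; in a relational schema the same effect would typically come from foreign keys on the child tables.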

    Figure 5: Database Schema

    4. USER INTERFACE AND USE CASE DEVELOPMENT

    The following table summarizes the status of the development of the different use cases defined for the

    tool.

    Use Case | Status

    Use Case 1: Create user account | Completed & tested

    Use Case 2: Approve user account request | Completed & tested

    Use Case 3: User Login | Completed & tested

    Use Case 4: User Log Out | Completed & tested

    Use Case 5: Request forgotten password | Completed & tested

    Use Case 6: Create/Edit User profile | Completed & tested

    Use Case 7: Perform search (Navigational) | Currently under development and testing. This had to be delayed until the 1st week of April due to other priorities described earlier in Section 2, since additional resources had to be allocated for the integration of the DEiXTo tool in the OCEAN architecture.

    Use Case 8: Perform search (Freetext) | Completed & tested

    Use Case 9: Perform search (Advanced) | Completed & tested

    Use Case 10: Store search results | Completed & tested

    Use Case 11: View saved queries | Completed & tested

    Use Case 12: Subscribe to search notification service | Currently under development and testing. This had to be delayed until the 1st week of April due to other priorities described earlier in Section 2, since additional resources had to be allocated for the integration of the DEiXTo tool in the OCEAN architecture.

    5. SCREENSHOTS

    Figure 6: Sign-in welcome page

    Figure 7: Basic search functionality using the Merobase and Krugle search engines

    Figure 8: Account Management page

    Figure 9: Create/Edit User Profile.

    Figure 10: Selection of results to be saved.

    Figure 11: Title, Description and Tags of saved query.

    Figure 12: List of saved queries.

    Figure 13: Details of the saved query.
