
Publications Office

Technical Analysis of the common modules of the project EUR-Lex, Portal 2012 and EU BookShop

Subject Common Modules Analysis

Version 0.01

Release Date 16/02/2012

Filename document.doc

Document Reference ELX-CP-EUB4-Technial Analysis

Development and Maintenance of the new EUR-Lex

Ref: ELX-CP-EUB4-Technial Analysis CP EUB4 Technical Analysis Version: 0.01

TABLE OF CONTENTS

1 Introduction
1.1 Purpose of the Document
1.2 Scope of the Document
1.3 Intended Audience
1.4 Structure of the Document
2 Common Use Cases for the EUR-Lex, EU-BookShop and Portal 2012 applications
3 Common Modules
3.1 ECAS
3.2 User management module
3.3 Index & Search
3.4 CELLAR
4 Usage of the common modules
4.1 ECAS
4.2 User management module
4.2.1 Configuration
4.2.2 Register and Login
4.2.3 Manage Roles and Privileges
4.2.4 Manage Saved Documents
4.2.5 Manage Saved Queries
4.2.6 Manage User’s Profiles
4.3 Index & Search
4.3.1 Configuration
4.3.2 Search for Documents
4.4 CELLAR
4.4.1 Configuration
4.4.2 Retrieve content from the CELLAR
4.4.3 The CELLAR response
4.4.4 The CELLAR exception
5 Architecture Overview
5.1 Global Overview
5.2 Software requirements
5.3 Hardware requirements
5.3.1 Application Servers
5.3.2 Database Servers
5.3.3 Publications Office/DIGIT Connection


LIST OF TABLES

Table 1: Applicable Documents
Table 2: Applicable Documents
Table 3: Abbreviations and Acronyms
Table 4: Definitions
Table 5: Software cartography

LIST OF FIGURES

Figure 1: WebLogic: Modifying Default Authenticator Control Flag
Figure 2: WebLogic: Creating new Authentication Provider
Figure 3: WebLogic: Reordering Authentication Providers
Figure 4: System, UserSystem and UserAccount
Figure 5: Privileges and Roles
Figure 6: Saved Documents
Figure 7: Saved Queries
Figure 8: User's profiles
Figure 9: SearchQuery class
Figure 10: Example of facet
Figure 11: GetFacetValuesQuery class
Figure 12: Example of TagCloud
Figure 13: WeightedManifestationTypes
Figure 14: Architecture Overview


REFERENCE AND APPLICABLE DOCUMENTS

This section contains the lists of all reference and applicable documents. When referring to any of the documents below, the bracketed reference will be used in the text, such as [R01].

REFERENCE DOCUMENTS

Ref.  Title                                       Reference  Version  Date
R01   EU Bookshop Integration / Vision Document   None       1.0      19/10/2010
R02   Portal 2012 Vision Document                 None       1.1      30/08/2011

Table 1: Applicable Documents

APPLICABLE DOCUMENTS

Ref.  Title                                                                                               Reference  Version  Date
A01   General invitation to Tender No 10233, Design and Development of the new EUR-Lex - Specifications   N/A        N/A      N/A
A02   Project Quality Plan                                                                                ELX-PQP    1.01     10/02/2011

Table 2: Applicable Documents


ABBREVIATIONS AND ACRONYMS

ABBREVIATIONS AND ACRONYMS

Abbreviation  Meaning
ECAS          European Commission's Authentication Service
HTML          HyperText Markup Language
IDOL          Intelligent Data Operating Layer (Autonomy Search software)
IT            Information Technology
NAL           Named Authority List
OJ            Official Journal
OP            Office des Publications
PDF           Portable Document Format
RSS           Really Simple Syndication
SSL           Secure Sockets Layer
URI           Uniform Resource Identifier
URL           Uniform Resource Locator
WCM           Web Content Management
WSDL          Web Services Description Language
XML           eXtensible Markup Language
EUB4          EU Bookshop 4 project

Table 3: Abbreviations and Acronyms

DEFINITIONS

Term     Meaning
EuroVoc  A multilingual, multidisciplinary thesaurus covering the activities of the EU, the European Parliament in particular

Table 4: Definitions


1 INTRODUCTION

1.1 PURPOSE OF THE DOCUMENT

The purpose of the document is to analyse the vision documents of the Portal 2012 and EU-Bookshop 4 projects [R01, R02] in order to determine which of the modules developed in the scope of EUR-Lex can be reused.

This document will first identify the use cases already implemented in EUR-Lex that are also significant in the scope of the Portal 2012 and EU Bookshop projects. Then the third party systems that have to be reused are listed and described. For each module a specific library has been implemented in order to allow EUR-Lex and any other portals that will have similar constraints and needs to use the common modules. These reusable libraries are described and their usage is documented. Finally an overall architecture overview is given, providing additional information on how the different applications will be integrated and on the software and hardware requirements.

1.2 SCOPE OF THE DOCUMENT

The scope of this document is limited to the modules implemented in the new EUR-Lex. The scope does not include an exhaustive list of the features that might be developed to fulfil any other common needs across the Portal 2012 and EU-Bookshop projects.

1.3 INTENDED AUDIENCE

The present document is intended to be read by the following people:

OP IT Project Manager;

OP EU-Bookshop Working Group;

OP Portal 2012 Working Group.

1.4 STRUCTURE OF THE DOCUMENT

The document is organised as follows:

Chapter 1 - Introduction specifies the purpose and scope of the Technical Analysis document and its intended audience;

Chapter 2 - Common Use Cases for the EUR-Lex, EU-BookShop and Portal 2012 applications lists the use cases that are common to the three applications;

Chapter 3 - Common Modules describes the different modules implemented in the scope of EUR-Lex;

Chapter 4 - Usage of the common modules documents how to use the different modules implemented in the scope of EUR-Lex;

Chapter 5 - Architecture Overview describes the global target architecture applicable to the three applications (EUR-Lex, EU-BookShop and Portal 2012). This chapter also includes information about software and hardware requirements.


2 COMMON USE CASES FOR THE EUR-LEX, EU-BOOKSHOP AND PORTAL 2012 APPLICATIONS

This section lists the identified use cases that are common to the three applications. They are grouped by category, as follows:

User management module

o User registration;

o User authentication (login);

o Management of user’s roles and privileges;

o Management of user’s saved documents;

o Management of user’s saved queries;

o Management of user’s profiles;

Index & Search

o Search for documents.

Content Layer

o Consultation of a document.


3 COMMON MODULES

Having identified the common use cases, we can identify the common modules for these applications. In the scope of EUR-Lex, these modules have been developed under the constraint that they must be easy to reuse by the forthcoming new portals. The following sub-sections list the identified common modules and describe their goals. Chapter 4 gives additional information on the integration of these modules using the associated client libraries.

3.1 ECAS

In the scope of EUR-Lex, the single sign-on module used is ECAS. This choice was imposed by the OP, so no other solution has been investigated. ECAS is a well-known authentication service that is already used by the majority of the European Commission websites and applications.

3.2 USER MANAGEMENT MODULE

User management was quickly identified as a clear candidate for the implementation of a third party module that can be reused by different applications. This module was first developed in the scope of EUR-Lex to fulfil all the needs related to the storage and management of user information. User information encompasses general data such as user profiles, including login information, user roles and privileges, but also more portal-specific information such as saved queries or saved documents.

Although the application was developed during the development of the new EUR-Lex, all stored information can be independent of EUR-Lex, because it is stored in a generic way that allows other applications to use it.

So that it can be accessed from other applications such as EU-BookShop or Portal 2012, the user management application offers web services for retrieving and storing information.

3.3 INDEX & SEARCH

The index and search module is based on the Autonomy IDOL product, which was imposed during the development of EUR-Lex. This module is intended to index all the data that is disseminated by EUR-Lex and other forthcoming new portals.

As suggested by its name, the module offers index and search capabilities on different types of content and digital formats. In the scope of EUR-Lex, the module is used to index and search all the legal content, the different website pages and editorial content.

The Index & Search module is directly interfaced with the CELLAR in order to retrieve and index the different documents and metadata it contains. This module comes with a Java library that performs different types of queries on the IDOL servers. The library is detailed in section 4.3.

3.4 CELLAR

The CELLAR has been developed in order to centralise all documents in one application. It can be seen as the content repository module. This application aims to store all content and metadata needed by the Publications Office and its applications. It offers an HTTP REST interface to retrieve metadata and document content in different languages and formats.

One of the main goals of EUR-Lex was to use the CELLAR application, which has been developed as a completely distinct project. For this purpose a Java client library has been developed to ease the usage of the module. The library is detailed in section 4.4.


4 USAGE OF THE COMMON MODULES

4.1 ECAS

In the scope of EUR-Lex, the single sign-on module is ECAS. This choice was imposed by the OP and no other solutions have been investigated. To integrate this module in EUR-Lex, the WebLogic Security Realm and Authentication Provider features have been used.

The process to configure WebLogic to use the ECAS Authentication Provider is described hereafter.

First of all, please raise a request at DIGIT to obtain the latest version of the ECAS Authentication Provider (for development environments, a mock is provided as a JAR file that should be named mock-client-ecas-weblogic-X-X-full.jar).

Then:

1. Copy the mock-client-ecas-weblogic-X-X-full.jar to $MIDDLEWARE_HOME/$DOMAIN_HOME/lib

2. Identity Asserter Configuration

To activate ECAS Identity Asserter V2 from BEA WebLogic Server Administration Console perform the following steps.

a. Set Default Authenticator control flag to optional:

b. Select <WL_DOMAIN>/Security Realms;

c. Click on “myrealm” entry;

d. Select “Providers” tab;

e. Click on “DefaultAuthenticator” entry;

f. Change the value of the “Control Flag” to OPTIONAL.


Figure 1: WebLogic: Modifying Default Authenticator Control Flag

g. Click “Save”;

h. Set up Mock ECAS Identity Asserter Authentication Provider Version 2;

i. Select <WL_DOMAIN>/Security Realms;

j. Click on “myrealm” entry;

k. Select “Providers” tab;

l. Click “New”;

m. Specify values

Name: “MockECAS”

Type: “MockECASIdentityAsserterV2”

 

Figure 2: WebLogic: Creating new Authentication Provider

n. Click “OK”;


o. Change the order of MockECAS

p. Click “Reorder”;

q. Select “MockECAS” from the authentication providers’ list and use up arrow button to move Mock ECAS to the top of the list;

Figure 3: WebLogic: Reordering Authentication Providers

r. When done click “OK”.

s. Verify Mock ECAS Identity Asserter control flag

t. Click on “MockECAS” and make sure that Control Flag has value “OPTIONAL”. If this is not the case change it to “OPTIONAL” the same way you have done it for “DefaultAuthenticator”.

u. Enter ECAS provider specific values

v. Click “Provider Specific” in the “Common” section of Mock ECAS Identity Asserter “Configuration” tab;

w. Modify values for

o Proxy Url: /cas/proxy

o Server Name: <WL_HOST>

x. Accept Strength: BASIC for ECAS Mock-up or STRONG for the real ECAS service.

y. Requesting User Details: true

z. ECAS Base Url: https://<ECAS_HOST>:<SSL_PORT>

aa. In Excluded Context Paths, add the following paths

/console

The WebLogic administrator console must be accessible without ECAS authentication.

bb. Validate Url: /cas/strictValidate

cc. Click “Save” button.

dd. Restart WebLogic Server.


During the installation of the production environment, the infrastructure team installs the definitive ECAS client in order to connect the system to the production ECAS service.

4.2 USER MANAGEMENT MODULE

To centralise all user’s information and configuration data, a specific system called ‘Common Portal’ has been designed and implemented in the scope of EUR-Lex. Please note that the so-called ‘Common Portal’ is a different system than the Portal 2012 project.

The Common Portal is an independent system designed to store all the user information gathered by the different portals. For the time being, EUR-Lex is the single client of the Common Portal but it can easily be used by other portals. For this purpose a specific web service exposes all the operations required by the portals.

In the scope of the EUR-Lex project, we have also designed and implemented a WebService client API to exchange information with a central user management application. This client shall be reused by any other Portal that is intended to use the Common Portal.

Here is the Maven configuration of the artefact “eurlex-commonportal-api”:

<dependency>
    <groupId>op.eurlex</groupId>
    <artifactId>eurlex-commonportal-api</artifactId>
    <version>${project.version}</version>
</dependency>

This section will describe the major use cases with a data structure approach, then explain the methods that shall be used in the client API.

Please note that any web service call to the user management application is handled through the service “eu.europa.ec.op.commonportal.api.service.ICommonportalWebServiceClient”. So, if a service of a client application needs to interact with the user management application, the recommended approach is to inject this client into the service, like this (the field is named client here to match the this.client calls used in the samples of this chapter):

@Resource(name = CommonportalWebServiceClient.ID)
private ICommonportalWebServiceClient client;

4.2.1 CONFIGURATION

The username, password and WSDL location for the web service authentication need to be set up, for instance in a Spring configuration file:

<bean id="app.commons.commonportal.api.service.commonportalWebServiceClient"
      class="eu.europa.ec.op.commonportal.api.service.impl.CommonportalWebServiceClient">
    <property name="password" value="frontoffice1"/>
    <property name="username" value="EURLexFrontOffice"/>
    <property name="wsdlUrl" value="http://localhost:7001/op-commonportal/ws?wsdl"/>
</bean>

In order to make Spring aware of the core services, please add the following line to your Spring configuration file:

<context:component-scan base-package="eu.europa.ec.op.commonportal.api.service"/>

4.2.2 REGISTER AND LOGIN

4.2.2.1 Data Model


TB_USER_ACCOUNT
    URA_ID                  NUMBER(11)          <pk>
    URA_USER_ID_CD          VARCHAR2(80 CHAR)
    URA_FIRSTNAME_NM        VARCHAR2(40 CHAR)
    URA_LASTNAME_NM         VARCHAR2(40 CHAR)
    URA_EMAIL_NM            VARCHAR2(80 CHAR)
    URA_DELETABLE_FL        VARCHAR2(1 CHAR)
    URA_MODIFIED_BY         VARCHAR2(50)
    URA_MODIFIED_BY_ID      NUMBER(11)
    URA_MODIFIED_ON         TIMESTAMP
    URA_CREATED_BY          VARCHAR2(50)
    URA_CREATED_BY_ID       NUMBER(11)
    URA_CREATED_ON          TIMESTAMP

TB_SYSTEM
    STM_ID                  NUMBER(11)          <pk>
    STM_NAME_NM             VARCHAR2(40 CHAR)
    STM_MODIFIED_BY         VARCHAR2(50)
    STM_MODIFIED_BY_ID      NUMBER(11)
    STM_MODIFIED_ON         TIMESTAMP
    STM_CREATED_BY          VARCHAR2(50)
    STM_CREATED_BY_ID       NUMBER(11)
    STM_CREATED_ON          TIMESTAMP

TB_USER_SYSTEM
    URS_ID                  NUMBER(11)          <pk>
    URS_URA_ID              NUMBER(11)          <fk1>
    URS_STM_ID              NUMBER(11)          <fk2>
    URS_LAST_LOGIN_DATE_DT  DATE
    URS_MODIFIED_BY         VARCHAR2(50)
    URS_MODIFIED_BY_ID      NUMBER(11)
    URS_MODIFIED_ON         TIMESTAMP
    URS_CREATED_BY          VARCHAR2(50)
    URS_CREATED_BY_ID       NUMBER(11)
    URS_CREATED_ON          TIMESTAMP

Foreign keys: FK_URS_TB_URA, FK_URS_TB_STM

Figure 4: System, UserSystem and UserAccount

These tables store the information related to a user and the list of applications (called “System”) the user is registered to.

In the scope of the EUR-Lex project, we identified two systems:

The Frontoffice application;

The Backoffice application.

The entity UserSystem makes the link between a user account and a system. It also contains the last login date of a user, for a given system.

Please note that the system only stores information about authenticated users; it cannot be used for authentication itself. Please refer to section 4.1 for sign-on handling.
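The relationship between the three tables can be sketched in plain Java. The classes below (UserSystemLink, UserSystemRegistry) are hypothetical stand-ins written for illustration only; they are not the classes shipped in eurlex-commonportal-api, and an in-memory list takes the place of the database.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

/** Hypothetical stand-in for a row of TB_USER_SYSTEM: links one account to one system. */
final class UserSystemLink {
    final String userId;      // plays the role of URS_URA_ID
    final String systemName;  // plays the role of URS_STM_ID
    Date lastLoginDate;       // URS_LAST_LOGIN_DATE_DT, null until the first login

    UserSystemLink(String userId, String systemName) {
        this.userId = userId;
        this.systemName = systemName;
    }
}

/** Hypothetical in-memory registry illustrating the Figure 4 data model. */
final class UserSystemRegistry {
    private final List<UserSystemLink> links = new ArrayList<UserSystemLink>();

    /** Registers a user to a system; does nothing if the link already exists. */
    void register(String userId, String systemName) {
        if (find(userId, systemName) == null) {
            links.add(new UserSystemLink(userId, systemName));
        }
    }

    boolean isRegistered(String userId, String systemName) {
        return find(userId, systemName) != null;
    }

    /** Records a login; the date is kept per (user, system) pair, not per user. */
    boolean recordLogin(String userId, String systemName, Date when) {
        UserSystemLink link = find(userId, systemName);
        if (link == null) {
            return false; // the user is not registered to this system
        }
        link.lastLoginDate = when;
        return true;
    }

    Date lastLogin(String userId, String systemName) {
        UserSystemLink link = find(userId, systemName);
        return link == null ? null : link.lastLoginDate;
    }

    private UserSystemLink find(String userId, String systemName) {
        for (UserSystemLink link : links) {
            if (link.userId.equals(userId) && link.systemName.equals(systemName)) {
                return link;
            }
        }
        return null;
    }
}
```

The point of the sketch is that a single account can be registered to the Frontoffice without being registered to the Backoffice, and that each registration carries its own last login date.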

4.2.2.2 Register

If your application must allow a user to register in the user management application, you should use this API:

UserAccount userAccount = new UserAccount();
userAccount.setUserId("genglefr");
userAccount.setEmail("[email protected]");
// The AuditUser identifies who performs the change (here, the calling application)
AuditUser auditUser = new AuditUser();
auditUser.setUsername("My application name");
auditUser.setUserId(0L);
userAccount = this.client.saveUserAccount(userAccount, auditUser);

Please note that an AuditUser is an object that contains the information related to the audit columns “Modified by” and “Created by”.
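How the audit information could end up in the audit columns can be sketched as follows. AuditInfo and AuditedRow are hypothetical names introduced for this illustration, and the rule applied (the “created” columns are written once, the “modified” columns on every save) is an assumption; the document does not describe how the user management application actually fills these columns.

```java
import java.util.Date;

/** Hypothetical stand-in for the AuditUser object described above. */
final class AuditInfo {
    final String username; // would be written to *_CREATED_BY / *_MODIFIED_BY
    final long userId;     // would be written to *_CREATED_BY_ID / *_MODIFIED_BY_ID

    AuditInfo(String username, long userId) {
        this.username = username;
        this.userId = userId;
    }
}

/** Hypothetical row carrying the six audit columns present on every table. */
final class AuditedRow {
    String createdBy;
    Long createdById;
    Date createdOn;
    String modifiedBy;
    Long modifiedById;
    Date modifiedOn;

    /** Assumed rule: stamp "created" only on the first save, "modified" on every save. */
    void stamp(AuditInfo audit, Date now) {
        if (createdOn == null) {
            createdBy = audit.username;
            createdById = audit.userId;
            createdOn = now;
        }
        modifiedBy = audit.username;
        modifiedById = audit.userId;
        modifiedOn = now;
    }
}
```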

4.2.2.3 Login

When a user logs into the application, you should retrieve their data from the user management application. To achieve this, use this simple API:


UserAccount userAccount = this.client.getUserAccount("genglefr");

The roles and privileges of the user will also be fetched.

4.2.3 MANAGE ROLES AND PRIVILEGES

4.2.3.1 Data Model

TB_ROLE
    RLE_ID                  NUMBER(11)          <pk>
    RLE_STM_ID              NUMBER(11)          <fk>
    RLE_NAME_NM             VARCHAR2(50)
    RLE_DESCRIPTION_DESC    VARCHAR2(240 CHAR)
    RLE_DEFAULT_FL          VARCHAR2(1 CHAR)
    RLE_DELETABLE_FL        VARCHAR2(1 CHAR)
    RLE_MODIFIED_BY         VARCHAR2(50)
    RLE_MODIFIED_BY_ID      NUMBER(11)
    RLE_MODIFIED_ON         TIMESTAMP
    RLE_CREATED_BY          VARCHAR2(50)
    RLE_CREATED_BY_ID       NUMBER(11)
    RLE_CREATED_ON          TIMESTAMP

TB_ROLE_TO_USER_ACCOUNT
    RUA_ID                  NUMBER(11)          <pk>
    RUA_RLE_ID              NUMBER(11)          <fk1>
    RUA_URA_ID              NUMBER(11)          <fk2>
    RUA_DELETABLE_FL        VARCHAR2(1 CHAR)
    RUA_MODIFIED_BY         VARCHAR2(50)
    RUA_MODIFIED_BY_ID      NUMBER(11)
    RUA_MODIFIED_ON         TIMESTAMP
    RUA_CREATED_BY          VARCHAR2(50)
    RUA_CREATED_BY_ID       NUMBER(11)
    RUA_CREATED_ON          TIMESTAMP
    Column_11               <Undefined>

TB_PRIVILEGE
    PRG_ID                  NUMBER(11)          <pk>
    PRG_STM_ID              NUMBER(11)          <fk>
    PRG_DESCRIPTION_DESC    VARCHAR2(50 CHAR)
    PRG_PRIVILEGE_CD        VARCHAR2(40 CHAR)
    PRG_MODIFIED_BY         VARCHAR2(50)
    PRG_MODIFIED_BY_ID      NUMBER(11)
    PRG_MODIFIED_ON         TIMESTAMP
    PRG_CREATED_BY          VARCHAR2(50)
    PRG_CREATED_BY_ID       NUMBER(11)
    PRG_CREATED_ON          TIMESTAMP
    ...

TB_ROLE_TO_PRIV
    RTP_ID                  NUMBER(11)          <pk>
    RTP_RLE_ID              NUMBER(11)          <fk1>
    RTP_PRG_ID              NUMBER(11)          <fk2>
    RTP_CREATE_FL           VARCHAR2(1 CHAR)
    RTP_READ_FL             VARCHAR2(1 CHAR)
    RTP_UPDATE_FL           VARCHAR2(1 CHAR)
    RTP_DELETE_FL           VARCHAR2(1 CHAR)
    RTP_MODIFIED_BY         VARCHAR2(50)
    RTP_MODIFIED_BY_ID      NUMBER(11)
    RTP_MODIFIED_ON         TIMESTAMP
    RTP_CREATED_BY          VARCHAR2(50)
    RTP_CREATED_BY_ID       NUMBER(11)
    RTP_CREATED_ON          TIMESTAMP

TB_USER_ACCOUNT, TB_SYSTEM and TB_USER_SYSTEM: same columns as in Figure 4.

Foreign keys: FK_RTP_TB_PRS, FK_RTP_TB_RLE, FK_RUA_TB_RUA, FK_RUA_TB_RLE, FK_RLE_TB_STM, FK_URS_TB_URA, FK_URS_TB_STM, FK_PRG_TB_STM

Figure 5: Privileges and Roles

These tables store the information related to a user and its roles and privileges, for a given system.


As you can see, a user may have multiple roles. A role is composed of one or more privileges.
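The role and privilege structure, including the four flags of TB_ROLE_TO_PRIV, can be sketched as follows. PrivilegeGrant, RoleSketch and AccessCheck are hypothetical stand-ins, and the privilege code used in the usage example is invented; the way the real application evaluates the flags is not described in this document.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a row of TB_ROLE_TO_PRIV: one privilege attached to a role. */
final class PrivilegeGrant {
    final String privilegeCode; // PRG_PRIVILEGE_CD
    final boolean create;       // RTP_CREATE_FL
    final boolean read;         // RTP_READ_FL
    final boolean update;       // RTP_UPDATE_FL
    final boolean delete;       // RTP_DELETE_FL

    PrivilegeGrant(String privilegeCode, boolean create, boolean read, boolean update, boolean delete) {
        this.privilegeCode = privilegeCode;
        this.create = create;
        this.read = read;
        this.update = update;
        this.delete = delete;
    }
}

/** Hypothetical stand-in for TB_ROLE: a named set of privilege grants. */
final class RoleSketch {
    final String name; // RLE_NAME_NM
    final List<PrivilegeGrant> grants = new ArrayList<PrivilegeGrant>();

    RoleSketch(String name) {
        this.name = name;
    }

    RoleSketch grant(String code, boolean create, boolean read, boolean update, boolean delete) {
        grants.add(new PrivilegeGrant(code, create, read, update, delete));
        return this;
    }
}

final class AccessCheck {
    /** True if any of the user's roles grants read access on the privilege. */
    static boolean canRead(List<RoleSketch> userRoles, String privilegeCode) {
        for (RoleSketch role : userRoles) {
            for (PrivilegeGrant grant : role.grants) {
                if (grant.privilegeCode.equals(privilegeCode) && grant.read) {
                    return true;
                }
            }
        }
        return false;
    }

    /** True if any of the user's roles grants delete access on the privilege. */
    static boolean canDelete(List<RoleSketch> userRoles, String privilegeCode) {
        for (RoleSketch role : userRoles) {
            for (PrivilegeGrant grant : role.grants) {
                if (grant.privilegeCode.equals(privilegeCode) && grant.delete) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

For instance, a role built with new RoleSketch("ROLE_SAVED_DOCUMENTS_READ").grant("SAVED_DOCUMENTS", false, true, false, false) allows reading but not deleting under the invented privilege code SAVED_DOCUMENTS.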

4.2.3.2 Add/Remove Role and Privilege

The roles and privileges stored in the user management application are fetched automatically by the this.client.getUserAccount(username) API.

In order to grant a role to a user, simply use the following API:

UserAccount userAccount = this.client.getUserAccount("genglefr");
RoleToUserAccount roleToUserAccount = new RoleToUserAccount();
roleToUserAccount.setRole(this.client.getRole(Roles.ROLE_EXPERT_SEARCH_CREATE, "My application name"));
userAccount.getRoleToUserAccounts().add(roleToUserAccount);
this.client.saveUserAccount(userAccount, (AuditUser) userDetails.getAuditUser());

To remove a role:

UserAccount userAccount = this.client.getUserAccount("genglefr");
// Removes the first role assignment of the user
userAccount.getRoleToUserAccounts().remove(0);
this.client.saveUserAccount(userAccount, (AuditUser) userDetails.getAuditUser());
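The removal sample above drops whichever role assignment happens to be first in the collection. Where a specific role must be revoked, selecting it by name may be safer. The sketch below shows the idea on a hypothetical RoleAssignment stand-in, because the accessors of the API's RoleToUserAccount class are not shown in this document.

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a role assignment; only the role name is modelled. */
final class RoleAssignment {
    final String roleName;

    RoleAssignment(String roleName) {
        this.roleName = roleName;
    }
}

final class RoleRemoval {
    /** Removes every assignment of the named role and returns how many were removed. */
    static int removeRole(List<RoleAssignment> assignments, String roleName) {
        int removed = 0;
        // An explicit iterator allows removal while traversing the list
        for (Iterator<RoleAssignment> it = assignments.iterator(); it.hasNext();) {
            if (it.next().roleName.equals(roleName)) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }
}
```

With the real API, the resulting collection would then be persisted with saveUserAccount, as in the samples above.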

4.2.3.3 Spring Security Integration

In order to integrate spring-security, we have created an object that extends the org.springframework.security.core.userdetails.User class and therefore implements the org.springframework.security.core.userdetails.UserDetails interface:

eu.europa.ec.op.commonportal.api.security.EcasUserDetails.

Hereafter is a code sample that explains how to build the EcasUserDetails object from the UserAccount we have fetched from the user management application (please note that this method overrides the org.springframework.security.core.userdetails.UserDetailsService#loadUserByUsername(String username) method):

@Override
public UserDetails loadUserByUsername(String username) {
    Set<GrantedAuthority> roles = new HashSet<GrantedAuthority>();
    UserAccount userAccount = this.client.getUserAccount(username);
    if (userAccount == null) {
        userAccount = new UserAccount();
        userAccount.setUserId(username);
        // Logged in ECAS but no account, give Anonymous Role
        roles = fetchAnonymousAuthorities();
    } else {
        // Logged in ECAS with account, but not registered in commonportal give Anonymous Role
        if (!isRegistered(userAccount, "My application name")) {
            roles = fetchAnonymousAuthorities();
        } else {
            roles = createRolesSet(userAccount.getRoleToUserAccounts(), EurlexConfig.getCommonportalSystemName(), false);
        }
    }
    EcasUserDetails userDetails = new EcasUserDetails(userAccount, roles);
    // Registered users get their last login date refreshed and persisted
    if (isRegistered(userAccount, "My application name")) {
        updateLastLoginDate(userAccount, "My application name");
        this.client.saveUserAccount(userAccount, (AuditUser) userDetails.getAuditUser());
    }
    return userDetails;
}
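The createRolesSet helper called in the sample is not shown in this document, and the meaning of its third argument is unknown. As a rough idea of what such a mapping might do, the sketch below keeps only the roles that belong to one system and returns their names. SystemRole and AuthorityMapping are hypothetical names, and plain strings stand in for Spring's GrantedAuthority objects.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a role assignment together with the system owning the role. */
final class SystemRole {
    final String systemName;
    final String roleName;

    SystemRole(String systemName, String roleName) {
        this.systemName = systemName;
        this.roleName = roleName;
    }
}

final class AuthorityMapping {
    /** Returns the names of the roles that belong to the given system, without duplicates. */
    static Set<String> toAuthorityNames(List<SystemRole> roles, String systemName) {
        Set<String> names = new LinkedHashSet<String>(); // keeps insertion order
        for (SystemRole role : roles) {
            if (role.systemName.equals(systemName)) {
                names.add(role.roleName);
            }
        }
        return names;
    }
}
```

Filtering by system matters because the same account may hold roles in several systems, and only those of the current application should become authorities in its security context.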

Once the user is authenticated, and the security object populated, we can use the spring-security standard APIs to check if a user has a specific role or privilege.

For instance, if a specific functionality of your application requires a role to be executed, simply annotate your method like this:

@Secured({Roles.ROLE_SOCIAL_ADMIN_CREATE})
public void deleteAnswer(Long idAnswer, Locale locale) {…}

The application will check if the current user has the role ROLE_SOCIAL_ADMIN_CREATE before executing the method.

Spring also provides a convenient way to show/hide buttons in the front end depending on an existing role in the security profile of the current user:

<sec:authorize ifAnyGranted="<%= Roles.ROLE_SAVED_DOCUMENTS_READ %>">
    <%-- HTML code --%>
</sec:authorize>

In this sample, the HTML code will only be evaluated if the role ROLE_SAVED_DOCUMENTS_READ is granted to the current user.

4.2.4 MANAGE SAVED DOCUMENTS

4.2.4.1 Data Model

TB_SAVED_DOCUMENT
    SDD_ID                  NUMBER(11)          <pk>
    SDD_SDF_ID              NUMBER(11)          <fk>
    SDD_LEGAL_CONTENT_ID_CD VARCHAR2(80 CHAR)
    SDD_MODIFIED_BY         VARCHAR2(50)
    SDD_MODIFIED_BY_ID      NUMBER(11)
    SDD_MODIFIED_ON         TIMESTAMP
    SDD_CREATED_BY          VARCHAR2(50)
    SDD_CREATED_BY_ID       NUMBER(11)
    SDD_CREATED_ON          TIMESTAMP

TB_SAVED_DOCUMENT_FOLDER
    SDF_ID                  NUMBER(11)          <pk>
    SDF_URS_ID              NUMBER(11)          <fk>
    SDF_FOLDER_NAME_NM      VARCHAR2(40)
    SDF_FOLDER_COMMENT_REM  VARCHAR2(240)
    SDF_NB_DOCUMENTS_NO     NUMBER(11)
    SDF_MODIFIED_BY         VARCHAR2(50)
    SDF_MODIFIED_BY_ID      NUMBER(11)
    SDF_MODIFIED_ON         TIMESTAMP
    SDF_CREATED_BY          VARCHAR2(50)
    SDF_CREATED_BY_ID       NUMBER(11)
    SDF_CREATED_ON          TIMESTAMP

TB_USER_SYSTEM
    URS_ID                  NUMBER(11)          <pk>
    URS_URA_ID              NUMBER(11)          <fk1>
    URS_STM_ID              NUMBER(11)          <fk2>
    URS_LAST_LOGIN_DATE_DT  DATE
    URS_MODIFIED_BY         VARCHAR2(50)
    URS_MODIFIED_BY_ID      NUMBER(11)
    URS_MODIFIED_ON         TIMESTAMP
    URS_CREATED_BY          VARCHAR2(50)
    URS_CREATED_BY_ID       NUMBER(11)
    URS_CREATED_ON          TIMESTAMP

Foreign keys: FK_SDF_TB_URS, FK_SDD_TB_SDF

Figure 6: Saved Documents

These tables store the information related to saved documents.

The column SDD_LEGAL_CONTENT_ID_CD of the table TB_SAVED_DOCUMENT identifies the document in the Cellar (Content Layer).


The information that can be persisted in the table TB_SAVED_DOCUMENT_FOLDER is:

SDF_FOLDER_NAME_NM: The name of the folder;

SDF_FOLDER_COMMENT_REM: A comment linked to the folder;

SDF_NB_DOCUMENTS_NO: The number of documents the folder contains.

4.2.4.2 Add/Remove a Document

The following code sample shows how to add a document to the folder named "My folder name":

// Fetch the documents currently saved in the folder
List<SavedDocument> savedDocuments = this.client.getDocuments("genglefr", "My application name", "My folder name");
SavedDocument savedDocument = new SavedDocument();
savedDocument.setLegalContentId("cellar:137de3e7-f68d-4251-a15c-710a5b12a72e");
savedDocuments.add(savedDocument);
// Persist the updated list for the same user, system and folder
this.client.saveDocuments("genglefr", savedDocuments, "My application name", "My folder name", (AuditUser) userDetails.getAuditUser());
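The document does not show how a saved document is removed. One plausible approach, assuming that saveDocuments persists the list exactly as it is passed, is to remove the entry from the fetched list before saving it again. The sketch below models only the list handling, with a hypothetical SavedDocumentFolderSketch class that stores Cellar identifiers as plain strings; it also refuses duplicates, which is a design choice of the sketch and not a documented behaviour of the API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory model of one folder of saved documents. */
final class SavedDocumentFolderSketch {
    final String name; // SDF_FOLDER_NAME_NM
    private final List<String> legalContentIds = new ArrayList<String>();

    SavedDocumentFolderSketch(String name) {
        this.name = name;
    }

    /** Adds a document unless it is already in the folder; returns true if it was added. */
    boolean add(String legalContentId) {
        if (legalContentIds.contains(legalContentId)) {
            return false;
        }
        legalContentIds.add(legalContentId);
        return true;
    }

    /** Removes a document; returns true if it was present. */
    boolean remove(String legalContentId) {
        return legalContentIds.remove(legalContentId);
    }

    /** The value that would be stored in SDF_NB_DOCUMENTS_NO. */
    int documentCount() {
        return legalContentIds.size();
    }
}
```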

4.2.5 MANAGE SAVED QUERIES

4.2.5.1 Data Model

TB_SAVED_QUERY
    SDQ_ID                  NUMBER(11)          <pk>
    SDQ_URS_ID              NUMBER(11)          <fk>
    SDQ_NAME_NM             VARCHAR2(40 CHAR)
    SDQ_COMMENT_REM         VARCHAR2(240 CHAR)
    SDQ_QUERY_LOB           CLOB
    SDQ_RESULT_SIZE_NO      NUMBER(11)
    SDQ_LAST_RUN_DATE_DT    DATE
    SDQ_RSS_FL              VARCHAR2(1 CHAR)
    SDQ_MODIFIED_BY         VARCHAR2(50)
    SDQ_MODIFIED_BY_ID      NUMBER(11)
    SDQ_MODIFIED_ON         TIMESTAMP
    SDQ_CREATED_BY          VARCHAR2(50)
    SDQ_CREATED_BY_ID       NUMBER(11)
    SDQ_CREATED_ON          TIMESTAMP

TB_USER_SYSTEM
    URS_ID                  NUMBER(11)          <pk>
    URS_URA_ID              NUMBER(11)          <fk1>
    URS_STM_ID              NUMBER(11)          <fk2>
    URS_LAST_LOGIN_DATE_DT  DATE
    URS_MODIFIED_BY         VARCHAR2(50)
    URS_MODIFIED_BY_ID      NUMBER(11)
    URS_MODIFIED_ON         TIMESTAMP
    URS_CREATED_BY          VARCHAR2(50)
    URS_CREATED_BY_ID       NUMBER(11)
    URS_CREATED_ON          TIMESTAMP

Foreign keys: FK_SDQ_TB_URS

Figure 7: Saved Queries

These tables store the information related to saved queries.

The information that can be persisted in the table is:

SDQ_NAME_NM: The name of the query;

SDQ_COMMENT_REM: A comment linked to the query;

SDQ_QUERY_LOB: The textual representation of the query. In the EUR-Lex context, we store here an XML representation of the query. For instance, for an expert query in EUR-Lex we have used the following XML serialisation:

<query>
  <internal><![CDATA[]]></internal>
  <searchType>EXPERT_SEARCH</searchType>
  <expertQuery><![CDATA[EUROVOC_descriptor = "politics and public safety" OR TI = azores OR TE = azores]]></expertQuery>
  <searchLanguage>en</searchLanguage>
  <searchScope>LEGAL_CONTENT</searchScope>
  <params></params>
</query>

The format of the query in this column must be defined by each application;

SDQ_RESULT_SIZE_NO: The number of results returned by this query, as of the last execution date;

SDQ_LAST_RUN_DATE_DT: The date on which the query was last executed;

SDQ_RSS_FL: A flag that determines if the query is used as an RSS feed.

4.2.5.2 Add/Remove a Query

In order to save a query, please refer to the following code sample:

SavedQuery savedQuery = new SavedQuery();
savedQuery.setComment("My comment");
savedQuery.setName("My test query name");
savedQuery.setLastRunDate(TimeHelper.getCurrentTime());
// The query text is stored in the CLOB column SDQ_QUERY_LOB
savedQuery.setQuery("transport");
savedQuery.setRss(false);
this.client.saveQuery("genglefr", savedQuery, "My application name", getAuditUser());

In our example, the query is very simple ("transport"). You should consider writing a serialization/deserialization mechanism (to XML, for instance) to populate this field. This way, you’ll be able to store everything you need in this CLOB column.
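A serialization mechanism of this kind could look like the following minimal sketch. It reuses the element names of the EUR-Lex sample shown in section 4.2.5.1, but the class SavedQueryXml and its methods are hypothetical helpers written for illustration; they are not part of the user management client.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

/** Hypothetical helper (assumption): not part of the user management client. */
public class SavedQueryXml {

    /** Wraps text in a CDATA section, splitting any "]]>" so the section stays valid. */
    public static String cdata(String text) {
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    /**
     * Serialises an expert query using element names taken from the EUR-Lex sample.
     * The language and scope are assumed to be simple codes that need no escaping.
     */
    public static String toXml(String expertQuery, String language, String scope) {
        return "<query>"
                + "<searchType>EXPERT_SEARCH</searchType>"
                + "<expertQuery>" + cdata(expertQuery) + "</expertQuery>"
                + "<searchLanguage>" + language + "</searchLanguage>"
                + "<searchScope>" + scope + "</searchScope>"
                + "</query>";
    }

    /** Reads the expert query back from the stored XML. */
    public static String expertQueryOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName("expertQuery").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Stored query is not valid XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = toXml("TI = azores OR TE = azores", "en", "LEGAL_CONTENT");
        System.out.println(xml);
        System.out.println(expertQueryOf(xml));
    }
}
```

The string returned by toXml is what would be passed to savedQuery.setQuery(...); expertQueryOf is used when the saved query is loaded again.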

4.2.6 MANAGE USER’S PROFILES

4.2.6.1 Data model

TB_PREFERENCE

PRC_ID                   NUMBER(11)          <pk>
PRC_URS_ID               NUMBER(11)          <fk>
PRC_NAME_NM              VARCHAR2(40 CHAR)
PRC_VALUE_LOB            CLOB
PRC_MODIFIED_BY          VARCHAR2(50)
PRC_MODIFIED_BY_ID       NUMBER(11)
PRC_MODIFIED_ON          TIMESTAMP
PRC_CREATED_BY           VARCHAR2(50)
PRC_CREATED_BY_ID        NUMBER(11)
PRC_CREATED_ON           TIMESTAMP

TB_PRINT_PROFILE

PTP_ID                   NUMBER(11)          <pk>
PTP_URS_ID               NUMBER(11)          <fk>
PTP_FORMAT_CD            VARCHAR2(20 CHAR)
PTP_METADATAS_LOB        CLOB
PTP_PROFILE_NAME_NM      VARCHAR2(40 CHAR)
PTP_DEFAULT_PROFILE_FL   VARCHAR2(1 CHAR)
PTP_MODIFIED_BY          VARCHAR2(50)
PTP_MODIFIED_BY_ID       NUMBER(11)
PTP_MODIFIED_ON          TIMESTAMP
PTP_CREATED_BY           VARCHAR2(50)
PTP_CREATED_BY_ID        NUMBER(11)
...

TB_EXPORT_PROFILE

ETP_ID                   NUMBER(11)          <pk>
ETP_URS_ID               NUMBER(11)          <fk>
ETP_EXPORT_FORMAT_CD     VARCHAR2(20 CHAR)
ETP_PROFILE_NAME_NM      VARCHAR2(40 CHAR)
ETP_DEFAULT_PROFILE_FL   VARCHAR2(1 CHAR)
ETP_METADATAS_LOB        CLOB
ETP_MODIFIED_BY          VARCHAR2(50)
ETP_MODIFIED_BY_ID       NUMBER(11)
ETP_MODIFIED_ON          TIMESTAMP
ETP_CREATED_BY           VARCHAR2(50)
ETP_CREATED_BY_ID        NUMBER(11)
ETP_CREATED_ON           TIMESTAMP

TB_DISPLAY_PROFILE

DLP_ID                           NUMBER(11)          <pk>
DLP_URS_ID                       NUMBER(11)          <fk>
DLP_HIGHLIGHT_RESULT_FL          VARCHAR2(1 CHAR)
DLP_NB_RESULT_PER_PAGE_NO        NUMBER(11)
DLP_FIRST_SORT_CRITERIA_CD       VARCHAR2(20 CHAR)
DLP_SECOND_SORT_CRITERIA_CD      VARCHAR2(20 CHAR)
DLP_IS_FIRST_SORT_CRIT_ASC_FL    VARCHAR2(1 CHAR)
DLP_METADATAS_LOB                CLOB
DLP_IS_SECOND_SORT_CRIT_ASC_FL   VARCHAR2(1 CHAR)
DLP_PROFILE_NAME_NM              VARCHAR2(40 CHAR)
DLP_DEFAULT_PROFILE_FL           VARCHAR2(1 CHAR)
DLP_MODIFIED_BY                  VARCHAR2(50)
DLP_MODIFIED_BY_ID               NUMBER(11)
DLP_MODIFIED_ON                  TIMESTAMP
DLP_CREATED_BY                   VARCHAR2(50)
DLP_CREATED_BY_ID                NUMBER(11)
DLP_CREATED_ON                   TIMESTAMP

TB_USER_SYSTEM

URS_ID                   NUMBER(11)    <pk>
URS_URA_ID               NUMBER(11)
URS_STM_ID               NUMBER(11)
URS_LAST_LOGIN_DATE_DT   DATE
URS_MODIFIED_BY          VARCHAR2(50)
URS_MODIFIED_BY_ID       NUMBER(11)
URS_MODIFIED_ON          TIMESTAMP
URS_CREATED_BY           VARCHAR2(50)
URS_CREATED_BY_ID        NUMBER(11)
URS_CREATED_ON           TIMESTAMP

Figure 8: User's profiles

The different tables depicted in the figure above are related to the user’s profiles and user’s preferences.

Regarding the user’s profiles, the system is able to handle 3 different types of profiles:

The profile related to the print of search results;

The profile related to the display of search results;

The profile related to the export of search results.

These profiles can easily be reused by other applications such as EU-BookShop and Portal 2012 if these systems handle user’s preferences or profiles.

As these tables are not really generic in their structure, we have designed another table to manage all other types of preferences, stored in a generic way. The definition of the information stored in each table is given below.

4.2.6.1.1 Print profile

The print profile should be used when the user wants to print some search results. He is able to define the following information:

PTP_FORMAT_CD: represents the format of the printed results. It may be, for instance, HTML or XML;

PTP_METADATAS_LOB: represents the list of metadata the user wants to print. He may want to print only the title and identifier of a document, for instance. In the scope of EUR-Lex, we store the list of metadata with an XML serialization:

<metadata>
  <metadataCode>TI_DISPLAY</metadataCode>
  <metadataCode>SO</metadataCode>
  <metadataCode>DN</metadataCode>
  <metadataCode>AU</metadataCode>
  <metadataCode>DD_DISPLAY</metadataCode>
  <metadataCode>FM</metadataCode>
</metadata>

PTP_PROFILE_NAME_NM: the name of the profile

PTP_DEFAULT_PROFILE_FL: possible values are ‘Y’ or ‘N’. As a user can save multiple profiles, he has to define the profile to be used by default. The value ‘Y’ means that the profile is the default one, otherwise it is ‘N’ for no.
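As an illustration of the XML stored in PTP_METADATAS_LOB, the sketch below converts a list of metadata codes to the `<metadata>` format shown above and back. The class MetadataCodesXml is a hypothetical helper introduced for this example; how the EUR-Lex client performs this conversion internally is not described in this document.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NodeList;

/** Hypothetical helper (assumption): converts metadata codes to and from the LOB format. */
public class MetadataCodesXml {

    /** Builds the XML; codes are assumed to be plain identifiers such as "DN" or "TI_DISPLAY". */
    public static String toXml(List<String> codes) {
        StringBuilder xml = new StringBuilder("<metadata>");
        for (String code : codes) {
            xml.append("<metadataCode>").append(code).append("</metadataCode>");
        }
        return xml.append("</metadata>").toString();
    }

    /** Reads the codes back, in document order. */
    public static List<String> fromXml(String xml) {
        try {
            NodeList nodes = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getElementsByTagName("metadataCode");
            List<String> codes = new ArrayList<String>();
            for (int i = 0; i < nodes.getLength(); i++) {
                codes.add(nodes.item(i).getTextContent().trim());
            }
            return codes;
        } catch (Exception e) {
            throw new IllegalStateException("Invalid metadata XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = toXml(Arrays.asList("TI_DISPLAY", "SO", "DN"));
        System.out.println(xml);
        System.out.println(fromXml(xml));
    }
}
```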

4.2.6.1.2 Export profile

The export profile should be used when the user wants to export some search results. He is able to define the same information as for the print profile.

4.2.6.1.3 Display profile

The display profile should be used when the user displays some search results in the application. He is able to define the following information:

DLP_HIGHLIGHT_RESULT_FL: flag ‘Y’ or ‘N’. In the scope of EUR-Lex, it is used to tell the search layer to return results in which the words matching the search request are highlighted.

DLP_NB_RESULT_PER_PAGE_NO: an integer representing the number of results displayed in a page of search results.

DLP_FIRST_SORT_CRITERIA_CD: represents the first sort criterion used by the search layer to retrieve sorted results.

DLP_SECOND_SORT_CRITERIA_CD: represents the second sort criterion used by the search layer to retrieve sorted results.

DLP_IS_FIRST_SORT_CRIT_ASC_FL: a flag used with the first sort criterion to indicate the sort order (ascending/descending).

DLP_IS_SECOND_SORT_CRIT_ASC_FL: a flag used with the second sort criterion to indicate the sort order (ascending/descending).

DLP_METADATAS_LOB: represents the list of metadata to be displayed in the search results. In the scope of EUR-Lex, we store this information using an XML serialization, as for the print profile.

DLP_PROFILE_NAME_NM: the name of the profile.

DLP_DEFAULT_PROFILE_FL: a flag indicating whether the profile is the default one.

4.2.6.1.4 User’s preferences

The user’s preferences are used to store any other information related to the user. It can be used to store a generic list of preferences:

PRC_NAME_NM: the name of the preference

PRC_VALUE_LOB: the value of the preference.

4.2.6.2 Save a profile

In order to save a profile (for instance a print profile), please use the following method:

/**
 * Saves a given print profile for a given user identifier, system name, and profile name.
 * @param userId The user identifier.
 * @param systemName The system name
 * @param profile the profile to save
 * @param maximumAllowedProfiles The maximum number of allowed profiles
 * @param auditUser the user who should appear in the audit columns
 * @return the id-enriched PrintProfile object
 */
PrintProfile savePrintProfile(String userId, String systemName, PrintProfile profile, int maximumAllowedProfiles, AuditUser auditUser);

You need to provide the identifier of the user, the name of the system you are using, the profile and the maximum number of allowed profiles. To build a PrintProfile you can use the following lines:

PrintProfile profile = new PrintProfile();
profile.setFormat("HTML");
profile.setDefaultProfile(true);
profile.setProfileName("My print profile");
profile.getMetadataCodes().add("DN");
profile.getMetadataCodes().add("TI");

Finally, you can use the webservice as follows:

this.client.savePrintProfile("genglefr", "My application name", profile, 10, getAuditUser());

If the number of saved profiles is greater than the given maximumAllowedProfiles, a BusinessException will be thrown.

4.3 INDEX & SEARCH

In the scope of EUR-Lex, we have designed and implemented a useful API to perform IDOL queries and encapsulate the result into Java objects.

Here is the Maven configuration of the artefact “search-layer-api”:

<dependency>
  <groupId>op.eurlex</groupId>
  <artifactId>eurlex-search-layer-api</artifactId>
  <version>${project.version}</version>
</dependency>

This section will describe how to use the search API we have implemented.

Please note that every IDOL query has to be performed using the service “eu.europa.ec.op.eurlex.searchlayer.service.IIdolSearchService”. So, if a service of a client application needs to interact with IDOL, the recommended approach is to inject this bean into the service:

@Autowired
private IIdolSearchService idolSearchService;

4.3.1 CONFIGURATION

There is one service that must be implemented by the application and instantiated by Spring:

public interface IIdolConfigurationService

This service is used to manage the configuration of the IDOL search service. It allows retrieving the URLs related to the IDOL DISH or DAH depending on the environment (dev or prod) (1). All values can, for instance, be retrieved from a configuration file that contains the needed information.

(1) The environment is represented by the interface IIDOLEnvironment. A default implementation is provided in the artefact but you can easily implement it yourself.

In order to make Spring aware of the core services, please add the following lines to your Spring configuration file:

<context:component-scan base-package="eu.europa.ec.op.eurlex.searchlayer.service" />
<context:component-scan base-package="eu.europa.ec.op.eurlex.searchlayer.domain" />

Note that you can also add the following to your web.xml file:

<context-param>
  <param-name>contextConfigLocation</param-name>
  <param-value>classpath:eu/europa/ec/op/eurlex/searchlayer/config/spring-search-layer-service.xml</param-value>
</context-param>

Be aware that this file can only be used if your application is deployed on a WebLogic server. This is because the file declares a special bean, the ParallelSearchService. This bean executes multiple SearchQuery objects in parallel and uses the task executor org.springframework.scheduling.commonj.WorkManagerTaskExecutor, some components of which are only available in WebLogic.

4.3.2 SEARCH FOR DOCUMENTS

The IIdolSearchService allows different operations that will be detailed below:

performSearch based on a SearchQuery

performSearch based on a ITagCloudQuery

performGetFacetValues based on a GetFacetValuesQuery

4.3.2.1 Perform search for document retrieval

This operation covers the basic need to retrieve the notice information after a search. The first step is to build a SearchQuery object containing all the information of the search.

Once this object is built, the following API lets you retrieve the notice information:

/**
 * Perform a search query.
 *
 * @param searchQuery the search query to perform
 * @param env the idol environment
 * @param portalEnv the environment of the portal (for instance : DEV, PROD...)
 * @return the SearchResults corresponding to the provided {@link SearchQuery}
 */
ISearchResults performSearch(SearchQuery searchQuery, IIDOLEnvironment env, Environment portalEnv);

To build a SearchQuery, you need to understand all concepts used.

The first one is the concept of IMetadata which represents a metadata in the document. This contains different information such as:

Code represents the code of the metadata (a unique ID);

Xpath contains the XPath of the metadata in the document. For indexing, XPaths that use ‘|’ as an ‘OR’ operator are supported. For print metadata, the ‘[]’ selector operator is also supported;

isSearchIndexMetadata is true if this metadata is used to select the IDOL database to search;

isIndexFieldMetadata is true if the metadata is an index field;

isIdolReference is true if the metadata is the IDOL reference.

The SearchQuery defines different behaviours of the IDOL search, for instance:

the searchLanguage, used to filter the documents for a given language;

the sortMetadata, defining the sorting to be applied. It can be seen as an ORDER BY clause in SQL;

the pagination, by defining a page and a pageSize;

the possibility to retrieve the exact number of resulting documents. Be aware that this option is very slow and that the number is only an approximation when the number of resulting documents is too high. The workaround we have implemented in EUR-Lex for this issue is explained in section 4.3.2.2 on facets.

You need to know that an IDOL search is composed of two kinds of metadata:

the print metadata: define which metadata will be retrieved from IDOL (the whole document is not retrieved!). They can be seen as a SELECT clause in SQL;

the search metadata: define the query used to retrieve the result from IDOL. They can be seen as a WHERE clause in SQL.

The print metadata are simply a list of metadata to be set or added in the search query.

The search metadata compose the query to be executed and are represented as a tree in the search query. It is composed of two types:

the IMetadataSearchCriterion: it is used for the general querying of metadata, in combination with an IFieldSpecifier, which specifies what kind of operation must be performed. For instance, you can specify the following kinds of operation:

FieldSpecifierExactPhrase: will retrieve the documents that contain the exact given value

FieldSpecifierOneOfString: will retrieve the documents that contain one of the given strings.

FieldSpecifierWild: will retrieve the documents that contain the given value. This given value may contain wild cards.

the IFreeTextSearchCriterion: it is used for querying text metadata. The rule applied to retrieve documents is much more complex: it supports wild cards, applies an ‘OR’ operation between the given words, supports the ‘”’ character to search for an exact value, etc. For more information related to the IFreeTextSearchCriterion, please refer to the IDOL documentation.

The following figure shows the structure used to represent the tree of the IMetadataSearchCriterion and IFreeTextSearchCriterion.

Note that each criterion is implemented by a simple criterion and a complex one. The complex one contains two references to other criteria. The simple one represents a leaf in the tree and defines the value, the FieldSpecifier and the list of metadata to which it applies.
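The tree structure described above can be sketched as follows. The classes below are simplified, hypothetical stand-ins written for this illustration: they are not the classes of the search-layer API, they omit the field specifier, and render() produces a readable text form rather than IDOL query syntax.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-ins (assumption) for the criterion classes of the search layer. */
public class CriterionTreeSketch {

    public enum LogicalOperator { AND, OR }

    /** A node of the search tree. */
    public interface Criterion {
        String render();
    }

    /** Leaf of the tree: one value applied to a list of metadata codes. */
    public static class SimpleCriterion implements Criterion {
        private final String value;
        private final List<String> metadataCodes;

        public SimpleCriterion(String value, String... metadataCodes) {
            this.value = value;
            this.metadataCodes = Arrays.asList(metadataCodes);
        }

        public String render() {
            return "\"" + value + "\" in " + metadataCodes;
        }
    }

    /** Inner node: two references to other criteria joined by a logical operator. */
    public static class ComplexCriterion implements Criterion {
        private final Criterion left;
        private final LogicalOperator operator;
        private final Criterion right;

        public ComplexCriterion(Criterion left, LogicalOperator operator, Criterion right) {
            this.left = left;
            this.operator = operator;
            this.right = right;
        }

        public String render() {
            return "(" + left.render() + " " + operator + " " + right.render() + ")";
        }
    }

    public static void main(String[] args) {
        Criterion text = new SimpleCriterion("azores", "TI", "TE");
        Criterion number = new SimpleCriterion("32011R0699", "DN");
        Criterion tree = new ComplexCriterion(text, LogicalOperator.OR, number);
        System.out.println(tree.render());
    }
}
```

Because a complex criterion references other criteria, which may themselves be complex, arbitrarily deep queries can be expressed with only these two node types.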

[Class diagram "SearchQuery"]

SearchQuery

IFreeTextSearchCriterion

IMetadataSearchCriterion

ComplexMetaDataSearchCriterion
- logicalOperator: LogicalOperator

SimpleMetaDataSearchCriterion
- fieldSpecifier: IFieldSpecifier
- metadata: List<IMetadata>

ComplexFreeTextSearchCriterion

SimpleFreeTextSearchCriterion

Figure 9: SearchQuery class

The result of the execution is encapsulated in an ISearchResults object which offers different information such as (2):

the number of results;

the SearchQuery itself;

the status of the execution (SUCCESS or ERROR);

the InputStream of the content.

4.3.2.2 Perform search for facet retrieval

A facet represents the distribution of the possible values of a metadata according to their number of occurrences in the indexed documents that match the initial query. Facets are handled specially by IDOL, and performing a facet search is quick.

The following figure shows an example of a facet. For the metadata ‘type of act’, you can see that 13136 documents have the type of act value related (3) to ‘National execution measures’, 5306 documents have the type of act value related to ‘Written question’, …

Figure 10: Example of facet

(2) This is not an exhaustive list.

(3) ‘Related’ because ‘National execution measures’ is just a label; the real value of the metadata is coded.

A facet search must be based on a SearchQuery, so that the facet is computed on the documents returned by this search.

The sum of the facet results should be equal to the number of results of the SearchQuery. This makes it possible to retrieve, with very good performance, the exact number of resulting documents. For each search, you can add a facet search on the base query. This facet would be based on a metadata that is always present for each document contained in IDOL, for instance a technical metadata. Performing a facet search on this metadata and adding up all the results then yields the exact number of documents matching the base query.
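The counting workaround can be sketched as follows. The example works on a plain map of facet values to occurrence counts, which is an assumption made for illustration; in the real API these values would come from the IGetFacetValuesResult. Note that the sum only equals the number of documents if the chosen metadata has exactly one value per document, otherwise documents would be counted several times.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (assumption): derives the exact result count from one facet. */
public class FacetCount {

    /** Sums the occurrences of every value of the facet. */
    public static long totalFromFacet(Map<String, Integer> facetCounts) {
        long total = 0;
        for (int count : facetCounts.values()) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        // Sample counts; the labels are only illustrative
        Map<String, Integer> facet = new LinkedHashMap<String, Integer>();
        facet.put("National execution measures", 13136);
        facet.put("Written question", 5306);
        System.out.println(totalFromFacet(facet)); // prints 18442
    }
}
```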

You can use the following API to perform a facet search.

/**
 * Perform a get facet values search query.
 *
 * @param getFacetValuesQuery the get facet value query to perform
 * @param env the idol environment
 * @param portalEnv the environment of the portal (for instance : DEV, PROD...)
 * @return the GetFacetValuesResult corresponding to the provided {@link GetFacetValuesQuery}.
 */
IGetFacetValuesResult performGetFacetValues(GetFacetValuesQuery getFacetValuesQuery, IIDOLEnvironment env, Environment portalEnv);

The following figure explains the structure of the GetFacetValuesQuery.

Note that it contains a SearchQuery object and a GetFacetValueSortMetadata. The latter contains the list of metadata for the facet search (4) and the sort criterion, which can be:

Alphabetical;

Based on a date;

Based on the count of the facet (the number of results having the same value for the given metadata).

You can easily create this object by using the API on the SearchQuery:

/**
 * Returns a new {@link GetFacetValuesQuery} with the searchQuery of the builder.
 * @param facetMetadata the facet metadata
 * @param facetSortCriterion the facet metadata sort criterion
 * @param searchIndexHolder the search index holder
 * @return a new {@link GetFacetValuesQuery} with the searchQuery of the builder
 */
public IGetFacetValuesQuery createFacetValuesQuery(IMetadata facetMetadata, GetFacetValueSortCriterion facetSortCrit, ISearchIndexHolder srchIdxHolder)

(4) You can make a facet search with different metadata that will be present in the IDOL result.

[Class diagram "IGetFacetValuesQuery"]

GetFacetValuesQuery
- searchIndexHolder: ISearchIndexHolder
- searchLanguage: SearchLanguage

SearchQuery

GetFacetValueSortMetaData
- metadata: List<IMetadata>
- sortCriterion: GetFacetValueSortCriterion

Figure 11: GetFacetValuesQuery class

You can retrieve the result through the IGetFacetValuesResult. It contains a mapping between each metadata and the facet values related to it. The facet values are held in the object FacetValueResult, a simple object containing a list of facet results. Each facet result contains the value of the facet and the number of occurrences found for this value.

4.3.2.3 Perform search for tag cloud retrieval

The tag cloud is composed of the words most used in the resulting documents. It is, just like the facet, based on the SearchQuery. The more often a word is present in the resulting documents, the more important it is. In the display, this importance is generally represented by a larger font size for the element. The following figure shows an example.

Figure 12: Example of TagCloud

‘Joint Committee’ and ‘EEA Agreement’ are the most important elements of the tag cloud.
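The relation between the importance of an element and its font size can be sketched as a simple linear scaling. This is only an illustration of the display side: the class TagCloudFontSize, the pixel range and the occurrence counts are assumptions, and this document does not state that the tag cloud result exposes such counts.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical display helper (assumption): scales font sizes for tag cloud elements. */
public class TagCloudFontSize {

    /** Linearly maps a count in [min, max] to a font size in [minPx, maxPx]. */
    public static int fontSize(int count, int min, int max, int minPx, int maxPx) {
        if (max == min) {
            return minPx; // all elements are equally important
        }
        return minPx + (count - min) * (maxPx - minPx) / (max - min);
    }

    public static void main(String[] args) {
        // Illustrative counts only
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        counts.put("Joint Committee", 50);
        counts.put("EEA Agreement", 45);
        counts.put("transport", 10);

        int min = Collections.min(counts.values());
        int max = Collections.max(counts.values());
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + ": " + fontSize(e.getValue(), min, max, 12, 32) + "px");
        }
    }
}
```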

You can use the following API to perform a tag cloud search:

/**
 * Perform a tag cloud only search query.
 *
 * @param tagCloudQuery the tag cloud search query to perform
 * @param env the idol environment
 * @param portalEnv the environment of the portal (for instance : DEV, PROD...)
 * @return the TagCloudResult corresponding to the provided {@link ITagCloudQuery}
 */
ITagCloudResult performSearch(ITagCloudQuery tagCloudQuery, IIDOLEnvironment env, Environment portalEnv);

The implementation of ITagCloudQuery just contains the base SearchQuery and the number of elements on which the tag cloud search will be based. The larger this number, the longer the tag cloud search takes.

You can use the API of the SearchQuery to create this object:

/**
 * Returns a new {@link TagCloudQuery} with the searchQuery of the builder.
 * @param elementNumber the number of elements taken into account to compute the tag cloud
 * @return a new {@link TagCloudQuery} with the searchQuery of the builder
 */
ITagCloudQuery createTagCloudQuery(int elementNumber)

The result is wrapped in the ITagCloudResult. It contains a list of TagCloudElement which simply represents the value of each element of the tag cloud.

4.4 CELLAR

The CELLAR application stores a large volume of content using Fedora and offers ways to retrieve this content in different languages.

The content layer is represented by the CELLAR system which aims at storing the whole content used by the different portals. The CELLAR exposes an HTTP REST interface for dissemination of the content. To ease the usage of this interface, we have designed and implemented an API to perform queries to the CELLAR and encapsulate the result into Java objects.

Here is the maven configuration of the artefact “content-layer-api”:

<dependency>
  <groupId>op.eurlex</groupId>
  <artifactId>eurlex-cellar-api</artifactId>
  <version>${project.version}</version>
</dependency>

This section will describe how to use the CELLAR API we have implemented.

Please note that every query has to be performed through the service “eu.europa.ec.op.cellar.api.service.ICellarService”. So, if a service of a client application needs to interact with the CELLAR, the recommended approach is to inject this bean into the service:

@Autowired
private ICellarService cellarService;

We will describe what kinds of objects are used to perform a query against the CELLAR and what kind of response is returned. Please note that if an error occurs during the CELLAR query, an exception is thrown and can be caught to be correctly handled.

4.4.1 CONFIGURATION

In order to make Spring aware of the core services, please add the following lines to your Spring configuration file:

<context:component-scan base-package="eu.europa.ec.op.cellar.api.service"/>
<context:component-scan base-package="eu.europa.ec.op.cellar.api.service.impl"/>

Note that you can also add the following to your web.xml file:

<context-param>
  <param-name>contextConfigLocation</param-name>
  <param-value>classpath:eu/europa/ec/op/eurlex/cellar/api/config/spring-cellar-api-service.xml</param-value>
</context-param>

To use the cellar service, you need to add different properties loaded into the application:

cellar-api.baseURI: the base URI of the CELLAR application. The URI is common to each request made to the CELLAR application. Its value is “http://cellar-dev.publications.europa.eu/”.

cellar-api.host_header: the host of the CELLAR system.

cellar-api.read_timeout: the read timeout interval (in ms). For instance 100000; 0 means no timeout.

cellar-api.connection_timeout: the connection timeout interval (in ms). For instance 100000; 0 means no timeout.
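As an illustration, the four properties listed above could be loaded and checked as in the following sketch. The class CellarSettings is a hypothetical helper, and the host_header value used in the sample is a placeholder chosen for this example; the way the CELLAR API itself reads these properties is not described in this document.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Hypothetical helper (assumption): loads and validates the cellar-api properties. */
public class CellarSettings {

    /** Parses properties from their textual form. */
    public static Properties load(String text) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException("Cannot read properties", e);
        }
        return properties;
    }

    /** Reads a timeout in milliseconds; 0 (also the default used here) means no timeout. */
    public static int timeoutMs(Properties properties, String key) {
        int value = Integer.parseInt(properties.getProperty(key, "0").trim());
        if (value < 0) {
            throw new IllegalArgumentException(key + " must be >= 0");
        }
        return value;
    }

    public static void main(String[] args) {
        Properties properties = load(
                "cellar-api.baseURI=http://cellar-dev.publications.europa.eu/\n"
                + "cellar-api.host_header=cellar-dev.publications.europa.eu\n" // placeholder value
                + "cellar-api.read_timeout=100000\n"
                + "cellar-api.connection_timeout=100000\n");
        System.out.println(properties.getProperty("cellar-api.baseURI"));
        System.out.println(timeoutMs(properties, "cellar-api.read_timeout"));
    }
}
```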

4.4.2 RETRIEVE CONTENT FROM THE CELLAR

The CELLAR API allows performing different operations on the CELLAR:

Retrieve manifestation (content of a document in a specific format and language);

Retrieve Object, Branch and Tree notices containing the metadata of a document;

Retrieve all the identifiers of a specific document (synonyms);

Retrieve information related to NALs.

The API encapsulates all the HTTP calls to the cellar and exposes convenience methods allowing you to easily retrieve the requested content.

The next sections explain how to retrieve the different contents stored in the CELLAR system.

4.4.2.1 Retrieve manifestation

The manifestation of a document represents some content related to the document, in a particular language and format. For instance, you may need to retrieve the content of a document in English, in PDF or HTML format.

You just need to call the method below of the ICellarService:

/**
 * Retrieves the manifestation of a content layer object. The stream must be closed after use.
 * @param cellarCurie the cellarCurie of the content layer object. This URI must contain the prefix cellar/, celex/,...
 * @param weightedManifestationTypes the weightedManifestationTypes
 * @param weightedLanguagePairsForAcceptLanguage the weighted language pairs for the accept language
 * @return the input stream of the manifestation
 * @throws CellarException if an error is returned from the cellar.
 */
ICellarResponse getManifestation(String cellarCurie, WeightedManifestationTypes weightedManifestationTypes, WeightedLanguagePairs weightedLanguagePairsForAcceptLanguage) throws CellarException;

As you can see, you need to provide different objects. The first represents the identifier of the resource you want to retrieve. This resource may have the following form:

{PS identifier}:{identifier}: This represents the identifier of the document prefixed by its production system.

As the URI of the request to the CELLAR application has always the form ${cellar-api.baseURI}/resource/{PS identifier}/{identifier}, if the given cellarCurie has the form previously described, it will automatically be reworked to {PS identifier}/{identifier} to be appended to ${cellar-api.baseURI}/resource/. So, basically, the cellarCurie is used to generate the URI of the request. If it doesn’t have the given form, it will be appended to the ${cellar-api.baseURI}/resource/ without any modifications.
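The rewriting rule described above can be sketched as follows. The class CellarUriSketch is hypothetical, and two details are assumptions made for this example: the pattern used to recognise the {PS identifier}:{identifier} form, and the removal of a trailing slash from the base URI to avoid a double slash.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (assumption): builds the request URI from a cellarCurie. */
public class CellarUriSketch {

    // Assumed shape of "{PS identifier}:{identifier}": a short prefix, a colon,
    // then an identifier that does not start with a slash (so "http://..." is not matched).
    private static final Pattern CURIE = Pattern.compile("^([A-Za-z][A-Za-z0-9_-]*):([^/].*)$");

    public static String toRequestUri(String baseUri, String cellarCurie) {
        // Drop a trailing slash so that "/resource/" is appended only once
        String root = baseUri.endsWith("/") ? baseUri.substring(0, baseUri.length() - 1) : baseUri;
        Matcher matcher = CURIE.matcher(cellarCurie);
        // "celex:32008D0438" becomes "celex/32008D0438"; anything else is appended unchanged
        String path = matcher.matches() ? matcher.group(1) + "/" + matcher.group(2) : cellarCurie;
        return root + "/resource/" + path;
    }

    public static void main(String[] args) {
        String base = "http://cellar-dev.publications.europa.eu/";
        System.out.println(toRequestUri(base, "celex:32008D0438"));
        System.out.println(toRequestUri(base, "celex/32008D0438"));
    }
}
```

Both calls in main produce the same request URI, which shows that either form of the identifier can be passed.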

The second object, WeightedManifestationTypes, is used to build the Accept header. It provides a weight for each accepted type of response. The weight is related to the importance of the response type you want to retrieve. This allows the CELLAR application to return results with fall-back, in case the requested resource doesn’t exist in the preferred format you requested. The following figure explains how the object is built.

[Class diagram "class Domain Mo..."]

WeightedManifestationTypes

WeightedManifestationType
- manifestationType: ManifestationType
- weight: Double

Figure 13: WeightedManifestationTypes

As you can see, the object is simply composed of WeightedManifestationType elements. Each of these combines a ManifestationType with a Double representing its weight. ManifestationType is an enumeration of the different possible manifestation types.

Finally, the object WeightedLanguagePairs is used to build the Accept-Language header. It follows the same principle as WeightedManifestationTypes. You need to give a weight to each language, allowing the CELLAR application to return results with fall-back in case the requested resource doesn’t exist in your preferred language. It is built in the same way as WeightedManifestationTypes; however, instead of a ManifestationType, you assign the weight to a Locale.
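To make the role of the weights concrete, the sketch below builds header values using the standard HTTP quality-value notation (;q=). It is an illustration under assumptions: the class WeightedHeaderSketch is hypothetical, the media types are examples, and whether the CELLAR API produces exactly this textual form is not stated in this document.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper (assumption): renders weighted values as an HTTP header value. */
public class WeightedHeaderSketch {

    /** Joins the entries as "value;q=weight", in insertion order. */
    public static String header(Map<String, Double> weightedValues) {
        StringBuilder header = new StringBuilder();
        for (Map.Entry<String, Double> entry : weightedValues.entrySet()) {
            if (header.length() > 0) {
                header.append(", ");
            }
            header.append(entry.getKey())
                  .append(String.format(Locale.ROOT, ";q=%.1f", entry.getValue()));
        }
        return header.toString();
    }

    public static void main(String[] args) {
        // Preferred language first, fall-back language with a lower weight
        Map<String, Double> languages = new LinkedHashMap<String, Double>();
        languages.put(Locale.ENGLISH.toLanguageTag(), 1.0);
        languages.put(Locale.FRENCH.toLanguageTag(), 0.8);
        System.out.println("Accept-Language: " + header(languages));

        // Example media types only; the real values come from ManifestationType
        Map<String, Double> types = new LinkedHashMap<String, Double>();
        types.put("application/pdf", 1.0);
        types.put("text/html", 0.5);
        System.out.println("Accept: " + header(types));
    }
}
```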

4.4.2.2 Retrieve notices (branch and tree)

The branch and tree notices are stored in the CELLAR application and can be retrieved, as for the manifestation, through the content layer. These notices contain the metadata related to a resource stored in the CELLAR. The difference between a branch and a tree notice is that a tree notice contains all expressions and manifestations, whereas the branch notice only contains the expression and manifestations in the requested language.

You can find below a way to retrieve a branch and a tree notice.

/**
 * Retrieves the branch notice of a content layer object. The stream must be closed after use.
 * @param cellarCurie the cellarCurie of the content layer object. This URI must contain the prefix cellar/, celex/,... It may not be an expression URI.
 * @param weightedLanguagePairsForAcceptLanguage the weighted language pairs for the accept language
 * @return the input stream of the branch notice
 * @throws CellarException if an error is returned from the cellar.
 */
ICellarResponse getBranchNotice(String cellarCurie, WeightedLanguagePairs weightedLanguagePairsForAcceptLanguage) throws CellarException;

This method can be called by building the same objects as for the manifestation retrieval, see 4.4.2.1 Retrieve manifestation. The tree notice can be retrieved as follows:

/**
 * Retrieves the tree notice of a content layer object. The stream must be closed after use.
 * @param cellarCurie the cellarCurie of the content layer object. This URI must contain the prefix cellar/, celex/,... Only work URI type.
 * @param decodeLanguage the decoding language
 * @return the input stream of the tree notice
 * @throws CellarException if an error is returned from the cellar.
 */
ICellarResponse getTreeNotice(String cellarCurie, WeightedLanguagePair decodeLanguage) throws CellarException;

4.4.2.3 Retrieve synonyms

A resource in the CELLAR can be identified with different identifiers. There are different types of identifier, listed below, in a non-exhaustive way:

Celex

OJ

Pegase

Uriserv

Cellar

The Cellar identifier is the most technical identifier. In some cases, you may want to retrieve a particular type of identifier of the notice from another one. For instance, you may want to retrieve the celex number of a document from its cellar id. All those identifiers are synonym identifiers.

This is possible by calling the following API:

/** * Retrieve the identifier mapping for the provided identifiers. * @param identifiers the identifiers to resolve the mapping for * @return the input stream containing the mapping * @throws CellarException if an error is return from the cellar. */ ICellarResponse getIdentifierMapping(List<String> identifiers) throws CellarException;

Provide the list of identifiers whose synonyms you want to retrieve.

The response contains an XML document of the following form:

<NOTICE type="identifier">
  <OBJECT in="http://cellar.publications.europa.eu/resource/celex/32008D0438">
    <URI>
      <VALUE>http://cellar-dev.publications.europa.eu/resource/cellar/e22e7190-16a0-432b-b3ef-a51f47f5444f.0005</VALUE>
      <TYPE>cellar</TYPE>
      <IDENTIFIER>e22e7190-16a0-432b-b3ef-a51f47f5444f.0005</IDENTIFIER>
    </URI>
    <SAMEAS>
      <URI>
        <VALUE>http://cellar-dev.publications.europa.eu/resource/celex/32011R0699.ENG</VALUE>
        <TYPE>celex</TYPE>
        <IDENTIFIER>32011R0699.ENG</IDENTIFIER>
      </URI>
    </SAMEAS>
    <SAMEAS>
      <URI>
        <VALUE>http://cellar-dev.publications.europa.eu/resource/oj/JOL_2011_190_R_0001_01.ENG</VALUE>
        <TYPE>oj</TYPE>
        <IDENTIFIER>JOL_2011_190_R_0001_01.ENG</IDENTIFIER>
      </URI>
    </SAMEAS>
    <SAMEAS>
      <URI>
        <VALUE>http://cellar-dev.publications.europa.eu/resource/eurostat/OJ.L_.2011.190.0001.01.ENG</VALUE>
        <TYPE>eurostat</TYPE>
        <IDENTIFIER>OJ.L_.2011.190.0001.01.ENG</IDENTIFIER>
      </URI>
    </SAMEAS>
  </OBJECT>
</NOTICE>

You need to parse the content of the response to retrieve the synonym identifier with the type you want.
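The parsing step can be sketched with the JDK DOM parser. This is a minimal sketch, assuming the response stream carries XML with the URI, TYPE and IDENTIFIER elements of the sample notice; the class IdentifierMappingParser and the method findIdentifier are illustrative names and are not part of the CELLAR client library.

```java
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: not part of the CELLAR client library.
final class IdentifierMappingParser {

    /**
     * Returns the first IDENTIFIER whose sibling TYPE equals wantedType,
     * or null when the notice holds no identifier of that type.
     */
    static String findIdentifier(InputStream xml, String wantedType) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(xml);
            // Both the main URI and the URIs nested in SAMEAS are inspected.
            NodeList uris = doc.getElementsByTagName("URI");
            for (int i = 0; i < uris.getLength(); i++) {
                Element uri = (Element) uris.item(i);
                if (wantedType.equals(childText(uri, "TYPE"))) {
                    return childText(uri, "IDENTIFIER");
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("Cannot parse identifier mapping", e);
        }
    }

    // Text of the first descendant element with the given tag, or null.
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }
}
```

Given the content stream of an ICellarResponse, findIdentifier(stream, "celex") would return the celex synonym when the notice lists one.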

4.4.2.4 Retrieve NALs

The CELLAR REST services make it possible to retrieve dumps of NALs in different formats, individual NAL concepts, and so on. The available services are listed below. The main argument of these methods is the conceptSchemeURI, which is the URI of the NAL you want to retrieve. All these services have an equivalent for the Eurovoc concepts; only the general services are described here.

Retrieve the languages supported by a concept scheme: the client can retrieve the list of languages in which the NAL is currently available. Use the following API:

/**
 * Get the languages supported by the NAL for the provided concept scheme.
 * @param conceptSchemeURI the concept scheme URI
 * @return the supported languages
 * @throws CellarException if an error is returned from the cellar.
 */
ICellarResponse getNALSupportedLanguages(String conceptSchemeURI) throws CellarException;

For example, the conceptSchemeURI may be “corporate-body”.

The response is in JSON format.

Retrieve the concept scheme: returns information about the concept scheme, such as its translations, last modification date, version, etc. The concept scheme URI should have the form http://publications.europa.eu/authority/corporate-body. Use the following API:

/**
 * Get the NAL concept scheme.
 * @param conceptSchemeURI the concept scheme URI
 * @return the concept scheme
 * @throws CellarException if an error is returned from the cellar.
 */
ICellarResponse getNALConceptScheme(String conceptSchemeURI) throws CellarException;

Retrieve the concept schemes modified since a particular date: by providing a date, you can retrieve the list of concept schemes modified since that date. The result is a JSON stream. Use the following API:

/**
 * Get the NAL concept schemes modified after the provided date.
 * @param ifModifiedSince the modification date
 * @return the concept schemes
 * @throws CellarException if an error is returned from the cellar.
 */
ICellarResponse getNALConceptSchemes(Date ifModifiedSince) throws CellarException;

Retrieve the top concepts of a concept scheme: the concept scheme URI should have the form http://publications.europa.eu/authority/corporate-body. Use the following API:

/**
 * Get the NAL top concept scheme in the provided language.
 * @param conceptSchemeURI the concept scheme URI
 * @param locale the locale defining the language
 * @return the concept scheme
 * @throws CellarException if an error is returned from the cellar.
 */
ICellarResponseConcepts getNALTopConcepts(String conceptSchemeURI, Locale locale) throws CellarException;

Retrieve the related concepts of a concept: retrieves the list of concepts that have a specific semantic relation with the given concept. The concept URI should have the form http://publications.europa.eu/authority/corporate-body/COR. The relation is a value of an enum (ConceptRelationURI in the signature below). Use the following API:

/**
 * Get the NAL concept scheme having a specific relation in the provided language.
 * @param conceptURI the concept
 * @param relationURI the relation
 * @param locale the locale defining the language
 * @return the concept scheme
 * @throws CellarException if an error is returned from the cellar.
 */
ICellarResponseConcepts getNALConceptRelatives(String conceptURI, ConceptRelationURI relationURI, Locale locale) throws CellarException;

Retrieve a concept in a language: the concept URI should have the form http://publications.europa.eu/authority/corporate-body/COR. Use the following API:

/**
 * Get the NAL concept in the provided language.
 * @param conceptURI the concept
 * @param locale the locale defining the language
 * @return the concept scheme
 * @throws CellarException if an error is returned from the cellar.
 */
ICellarResponseConcepts getNALConcept(String conceptURI, Locale locale) throws CellarException;

Retrieve the dump of a concept scheme: retrieves the NAL stored in the CELLAR in all languages, in XML or SKOS format. When SKOS is requested, the resulting stream is a single XML document; when XML is requested, the resulting stream is a zip file containing all the XML files. Use the following API:

/**
 * Return the zipped dump of the NAL with provided concept scheme uri suffix.
 * @param conceptSchemeUriSuffix the concept scheme uri suffix (concept-scheme uri suffix after: "http://publications.europa.eu/resource/authority/" typical: "fd_010")
 * @param format the authority table dump content format
 * @return the zipped NAL dump
 * @throws CellarException if an error is returned from the cellar.
 */
ICellarResponse getNALDump(String conceptSchemeUriSuffix, AuthorityTableFormat format) throws CellarException;
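When the dump is returned as a zip file, its entries can be read directly from the response stream with the JDK zip classes. The following is a minimal sketch under that assumption; the class NalDumpReader and the method listEntryNames are illustrative names, and the layout of the archive (one XML file per entry) is assumed rather than documented here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: not part of the CELLAR client library.
final class NalDumpReader {

    /** Lists the names of the files contained in a zipped dump, in archive order. */
    static List<String> listEntryNames(InputStream zipped) {
        List<String> names = new ArrayList<String>();
        ZipInputStream zip = new ZipInputStream(zipped);
        try {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    names.add(entry.getName());
                }
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new IllegalStateException("Cannot read NAL dump", e);
        } finally {
            // Closing the zip stream also closes the underlying response stream.
            try {
                zip.close();
            } catch (IOException ignored) {
                // Nothing useful can be done if close fails.
            }
        }
        return names;
    }
}
```

To process the content of each file rather than its name, read from the ZipInputStream between getNextEntry() and closeEntry().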

Retrieve all domain elements of Eurovoc: retrieves the Eurovoc domains with their codes and labels in all languages. The response is a JSON stream. Use the following API:

/**
 * Get the domains facets of the EUROVOC thesaurus.
 * @return the domain facets
 * @throws CellarException if an error is returned from the cellar.
 */
ICellarResponse getDomain() throws CellarException;

4.4.3 THE CELLAR RESPONSE

The response of the CELLAR application is encapsulated in the ICellarResponse object. This object is simple to use; it contains the following information:

An InputStream representing the content.

The status code of the response. For instance, it is 200 if the request was handled successfully. If the CELLAR holds multiple streams related to your request, the response code is 300. In that case, the content of the response is an HTML page containing anchors to the different resources that can be retrieved. It is up to the client to parse this HTML, extract the URI of each resource and download each of them.

A MultivaluedMap containing the headers of the response.
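For a response with code 300, the anchors of the HTML content have to be extracted by the client. The sketch below shows one simple way to do so with a regular expression. The exact markup returned by the CELLAR is not specified in this document, so the pattern assumes plain a elements with a quoted href attribute; the class MultipleChoicesParser is an illustrative name. A real HTML parser would be more robust if the markup turns out to be less regular.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: not part of the CELLAR client library.
final class MultipleChoicesParser {

    // Matches <a ... href="..."> or <a ... href='...'> and captures the URI.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*[\"']([^\"']*)[\"']",
            Pattern.CASE_INSENSITIVE);

    /** Returns the href values of all anchors, in document order. */
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<String>();
        Matcher matcher = HREF.matcher(html);
        while (matcher.find()) {
            links.add(matcher.group(1));
        }
        return links;
    }
}
```

Each extracted URI can then be requested separately to download the corresponding resource.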

Note that the InputStream implementation cannot be reset once it has been read. To read the content several times, you need to create another InputStream (a ByteArrayInputStream, for instance) based on the content of the response.
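One way to work around this limitation is to copy the content into a byte array once and to create a fresh ByteArrayInputStream for every read. The following is a minimal sketch; the class ResponseBuffer and the method readFully are illustrative names, and the whole content is assumed to fit in memory.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: not part of the CELLAR client library.
final class ResponseBuffer {

    /** Reads the stream to its end, closes it and returns the bytes read. */
    static byte[] readFully(InputStream in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        try {
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new IllegalStateException("Cannot read response content", e);
        } finally {
            // The response stream must be closed after use.
            try {
                in.close();
            } catch (IOException ignored) {
                // Nothing useful can be done if close fails.
            }
        }
        return out.toByteArray();
    }
}
```

After byte[] content = ResponseBuffer.readFully(stream), calling new ByteArrayInputStream(content) yields an independent, re-readable stream as often as needed.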

4.4.3.1 The ICellarResponseConcepts response

This is an extension of ICellarResponse. In addition to the content of ICellarResponse, it contains a list of the Concept objects of the NAL. A concept is mainly characterized by a code and a translation in a given language.

4.4.4 THE CELLAR EXCEPTION

When a request to the CELLAR does not complete correctly, an ICellarException is thrown. This is the case when the response code is less than 200 or greater than 300. The exception object is built with the request URI, the request headers and the response, so that the failure can be debugged.
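Taken literally, the rule on response codes can be written as a simple predicate. This is a sketch of the stated bounds only, not the library's actual implementation; the class CellarStatus and its methods are illustrative names.

```java
// Hypothetical helper: mirrors the bounds stated in the text, not the library code.
final class CellarStatus {

    /** True when the code lies outside the range 200 to 300, both bounds included. */
    static boolean isError(int statusCode) {
        return statusCode < 200 || statusCode > 300;
    }

    /** True for the "multiple streams" case, which is returned as a normal response. */
    static boolean isMultipleChoices(int statusCode) {
        return statusCode == 300;
    }
}
```

With these bounds, a code 300 does not raise an exception, which is consistent with the client being expected to parse the HTML content of such a response.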

5 ARCHITECTURE OVERVIEW

This section describes how the systems will be integrated with the common modules and which protocols they use to interact with each other. It closes with an overview of the software and hardware requirements.

5.1 GLOBAL OVERVIEW

The following figure gives a global overview of the relations between the 3 application servers (EUR-Lex, Portal 2012 and EU BookShop) with the user management application and the other components related to the search and content layers.

Figure 14: Architecture Overview

The application servers (EUR-Lex, Portal 2012 and EU BookShop) communicate with the user management application via Web Services (green lines in the figure). The Single Sign On server is an ECAS server. The HTTPS protocol is used between the applications and the Single Sign On server (purple lines).

The applications retrieve information from the CELLAR (grey lines) and IDOL using the HTTP protocol (red lines).

5.2 SOFTWARE REQUIREMENTS

The following table gives an overview of the application servers and other software requirements applicable for the EUR-Lex application. Such requirements should also be applicable for the EU BookShop and Portal 2012 applications.

To be able to use the common modules developed, it is highly advised to use the different Java libraries described below.

Server/Purpose      Supporting Software
Web Application     Oracle WebLogic 11gR2 (10.3.2)
Web Service
Data Repository     Oracle Database 11gR2
Operating System    Solaris 10 (64 bits) on SPARC
Java                JDK 1.6 (update 14 or higher)
                    Spring framework (3.0.3 or higher)
                    Maven

Table 5: Software cartography

5.3 HARDWARE REQUIREMENTS

The EU BookShop and Portal 2012 applications have their own constraints in terms of concurrent users, performance and availability, which cannot be assumed to be the same as for EUR-Lex.

This section reflects the hardware requirements applicable for the EUR-Lex application, and is provided for information.

5.3.1 APPLICATION SERVERS

The system is expected to support a load equivalent to that of the existing EUR-Lex sites. Since the system shall support a peak load of 10 000 simultaneous users, it is estimated that the application servers shall be able to process around 200 requests per second in peak periods. To support this load, the hardware envisioned is the equivalent of 2 Sun SPARC M5000 servers with 8 cores at 2.53 GHz, 64 GB RAM and 146 GB HDD (Solaris 10 - 64 bits). This hardware is expected to run about 16 application servers. However, this will be confirmed by DIGIT load testing.

5.3.2 DATABASE SERVERS

High availability is expected from the Oracle Database servers. If deemed appropriate by DIGIT, these database servers may thus be installed as an Oracle Real Application Cluster (RAC). This option will ensure the scalability and fault tolerance of the content managed.

Based on the above estimate of 200 web requests per second in peak periods, it is estimated that the database shall support a peak of 1,000 transactions per second, most of these transactions being read operations. The hardware envisioned is thus the equivalent of one Sun SPARC M5000 server with 8 cores at 2.53 GHz, 64 GB RAM and 146 GB HDD (Solaris 10 - 64 bits).
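The sizing figures of sections 5.3.1 and 5.3.2 imply a few simple ratios, worked out in the sketch below. Only the four input figures come from this document; the class LoadEstimate and the derived quantities are illustrative, and the even spread of load across application servers and users is an assumption.

```java
// Illustrative arithmetic on the sizing figures; not a capacity model.
final class LoadEstimate {

    static final int PEAK_USERS = 10000;       // simultaneous users at peak
    static final double PEAK_WEB_RPS = 200.0;  // web requests per second at peak
    static final int APP_SERVERS = 16;         // expected application server count
    static final double PEAK_DB_TPS = 1000.0;  // database transactions per second at peak

    /** Requests per second per application server, assuming an even spread. */
    static double requestsPerAppServer() {
        return PEAK_WEB_RPS / APP_SERVERS;
    }

    /** Average number of seconds between two requests of the same user. */
    static double secondsBetweenRequestsPerUser() {
        return PEAK_USERS / PEAK_WEB_RPS;
    }

    /** Average number of database transactions triggered by one web request. */
    static double transactionsPerWebRequest() {
        return PEAK_DB_TPS / PEAK_WEB_RPS;
    }

    public static void main(String[] args) {
        System.out.println("Requests/s per application server: " + requestsPerAppServer());
        System.out.println("Seconds between requests per user: " + secondsBetweenRequestsPerUser());
        System.out.println("DB transactions per web request: " + transactionsPerWebRequest());
    }
}
```

Under these assumptions, each application server handles 12.5 requests per second at peak, each user issues one request every 50 seconds on average, and each web request triggers 5 database transactions on average.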

5.3.3 PUBLICATIONS OFFICE/DIGIT CONNECTION

The network between the Publications Office and DIGIT supports a throughput of 1 GB/s. The estimated usage is up to about 80 Mb/s (25 Mb/s on average).

End of Document
