Final Review Meeting
16th March 2010
Brussels (Belgium)

gCube Technology - Part 2: Excellence

Contract n°: RI-212488

George Kakaletris - NKUA
D4Science JRA Manager

www.d4science.eu


www.d4science.eu, D4Science Final Review Meeting, Brussels, 16th March 2010

gCube - Part 2: Excellence
Outline

• Excellence in gCube:
» gCube as a whole
» Advanced concepts
» Service highlights
• Recent related publications
• Looking into the future

gCube Technology - Part 2: Excellence

• Excellence in gCube:
» gCube as a whole: Completeness, Openness
» Advanced concepts
» Service highlights
• Recent related publications
• Looking into the future

gCube Completeness

gCube is a complete platform encompassing:

• Multilayered development framework

• Infrastructure enabling and aggregation layer

• Data infrastructure logic

• Multi-domain application-level logic

• User interface and end-user application enabler

• User interface and end-user application

Most competing systems address a single sub-domain:

• Distributed processing

• Federated/Distributed/Centralized Information Management/Retrieval

• Specific application

gCube Completeness: A full view of gCube

Enabling Elements
• Runtime Environment provision (gCore/gHN)
• Infrastructure Management, Monitoring and Self-reorganisation
• VRE Management
• VO and Security Support Services
• Process Execution

Information Organisation Services
• Storage Management
• Collection Management
• Content Management
• Metadata Management
• Archive Import
• Metadata Brokerage
• Annotation Management
• Content Transformation
• Ontology Management

Information Retrieval Services
• Metadata Indexing
• Content Indexing
• Personalisation
• Content Source Description & Selection
• Data Fusion
• Search

Presentation Services
• Application Support Layer
• Portals
• User Portlets
• Administrative Portlets
• Desktop clients

gCube Openness: Specifications & Technologies

YEAR 2: EXTENDED

Specifications: WS-*, WSRF, X-* (including several metadata formats: DC, TEI, ISO etc.), WS-BPEL, JSR (several), JDL, Glue Schema (part), GSI-Security, OpenSearch

Technologies: Java, Globus Toolkit, gLite

Distributed under Open Source License: EUPL
https://quality.wiki.d4science.research-infrastructures.eu/quality/index.php/Standards

Can comply with: OAI-PMH, OAI-ORE, JSDL

Considered / Upcoming: WS-DAI, OpenGIS-related

gCube Technology - Part 2: Excellence

• Excellence in gCube:
» gCube as a whole
» Advanced concepts: Resource Model, Scope, Information Model, Schema Independence
» Service highlights
• Recent related publications
• Looking into the future

A powerful Resource Model

Captures the full breadth of diverse resources found in gCube infrastructures (Data, Software, Services, Hardware, Configurations …)

Is open
• allows new resource types to be defined / registered
• allows extending resource descriptors with arbitrary information
• is implemented and exploited over plain, common WS-world standards (XML/XQuery)

Software as a resource, for:
Handling of dependencies
• An extended model for software packaging dependencies
• Extension of dependencies into the Service domain
» Permits complex scenarios of collocation for performance
• Beyond even typical coexistence of component and service versions: message routing
Handling of state and logic semantics
• e.g. the Search Operator Profile
Monitoring & Management

The current resource model is
• partially derived from Glue Schema 1.3
• gradually adopting concepts of Glue Schema 2.0
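To make the idea of an open, XML-based resource descriptor more concrete, the following is a minimal sketch. It assumes a made-up descriptor layout: the element and attribute names (`Resource`, `Dependencies`, `Extension`, `CollocateWith`) are illustrative only and do not reproduce the actual gCube profile schema. Python's standard-library XPath subset stands in for the XQuery access mentioned on the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical resource descriptor. All element and attribute names are
# assumptions made for illustration, not the real gCube profile schema.
PROFILE = """
<Resource type="RunningInstance" id="ri-001">
  <Name>IndexLookup</Name>
  <Version>1.2.0</Version>
  <Dependencies>
    <Dependency kind="package" name="gRS" />
    <Dependency kind="service" name="StorageManagement" />
  </Dependencies>
  <Extension>
    <CollocateWith>StorageManagement</CollocateWith>
  </Extension>
</Resource>
"""


def service_dependencies(profile_xml: str) -> list:
    """Return the names of dependencies that point to other services."""
    root = ET.fromstring(profile_xml)
    matches = root.findall("./Dependencies/Dependency[@kind='service']")
    return [dep.get("name") for dep in matches]


def extension_value(profile_xml: str, tag: str):
    """Read one arbitrary extension field, or None if it is absent."""
    root = ET.fromstring(profile_xml)
    return root.findtext(f"./Extension/{tag}")


service_deps = service_dependencies(PROFILE)
collocation_hint = extension_value(PROFILE, "CollocateWith")
```

Because the descriptor is plain XML, an extension such as the collocation hint can be added without changing any code that only reads the standard fields.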

Resource Scoping

An innovative concept for creating virtual applications
Logically grouping resources into applications

Key features:
Handles resource visibility orthogonally to security
• Can be applied even in unsecured (parametrically secured) infrastructures
• Exploits the same pool of principals (VO groups)
Is transparent to services
• Handled entirely by the gCube container and gCore
Is implemented through standards (formal SOAP header extensions)
• External entities can achieve interoperability

More benefits:
Multi-scoping of resources
Hierarchical propagation model (Infrastructure, VO, VRE)

Common implementation methods for similar features:
API-bound awareness of scope
• Hard/costly to apply
• Inconsistency prone
Security-bound access
• Complex to achieve similar results
• Hard to implement in unsecured setups
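A rough sketch of scope-based visibility, kept separate from any security check, could look as follows. The path-like scope names and the propagation rule (a resource bound to a scope is visible from that scope and from scopes nested under it) are assumptions made for illustration; the slide does not specify the exact rule used by gCube.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A resource that may be bound to several scopes (multi-scoping)."""
    name: str
    scopes: set = field(default_factory=set)


def is_visible(resource: Resource, caller_scope: str) -> bool:
    """Assumed rule: visible when the caller's scope equals, or is nested
    under, one of the scopes the resource is bound to."""
    return any(
        caller_scope == scope or caller_scope.startswith(scope + "/")
        for scope in resource.scopes
    )


# Hypothetical three-level hierarchy: /infrastructure/VO/VRE
index = Resource("index", {"/infra/voA"})
shared = Resource("storage", {"/infra/voA/vre1", "/infra/voB"})

visible_in_vre1 = is_visible(index, "/infra/voA/vre1")
visible_in_voB = is_visible(index, "/infra/voB")
```

Note that no credentials are involved: the check only answers whether a resource belongs to the caller's virtual application, which is why it can also work in unsecured setups.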

Information Model

Powerful Information Model
Capable of implementing, yet chronologically preceding, OAI-ORE
Schema-agnostic data/information hosting
Supports efficient storage & retrieval
• Compliant with latest developments in Cloud infrastructures

The key concepts:
Information Object
• Payload and properties
• Open property model
Typed relationships
• Specialized semantics for VRE-specific needs
• Two-level type hierarchy (type and subtype)

Related concepts examples
OAI-ORE: Can be satisfied with minimal specialization
Enhanced publications (DRIVER): Can be satisfied with reasonable specialization
MPEG7: Multiple redirections required to capture multimedia object relationships

[Diagram labels: Information Object, Properties, Collection (Resource), Properties, Type / Subtype]

A “Complex” Information Object

[Figure: an example compound information object. Node labels: Some PDF; DESC 1 (DC); DESC 2 (DC); DESC 3 (ISO); DESC 4 (ISO); AN1; My ePrints DC–EN; Descriptors of My ePrints; ISO GeoTagging of My ePrints; My Annotations; MyPDF; MyPDF JPEG Thumbnail; MyPDF In mp TIFF Format; Item 2 Data Set 1; Item 2 Data Set 2; Item 2 Data Set 3 (ext). Property labels: UpdatedURL, MimeType, UpdateDescription, Size. Edges are labelled ipo, idb, iab and ar.]

Edge legend: ipo: is-part-of; idb: is-descr-by; iab: is-annot-by; ar: altern-repres
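The information model (objects with a payload and open properties, linked by typed relationships with a type and a subtype) can be sketched in a few lines. This is only an illustration: the class names, the object identifiers and the use of "DC" as a subtype are assumptions, not the gCube data structures.

```python
from dataclasses import dataclass, field


@dataclass
class InformationObject:
    """An object with a payload and an open (free-form) set of properties."""
    oid: str
    payload: bytes = b""
    properties: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Relationship:
    """A typed link between two objects; subtype is the second type level."""
    source: str
    target: str
    rel_type: str
    subtype: str = ""


class ObjectStore:
    """Hypothetical in-memory store for objects and their relationships."""

    def __init__(self):
        self.objects = {}
        self.relationships = []

    def add(self, obj):
        self.objects[obj.oid] = obj
        return obj

    def relate(self, source, target, rel_type, subtype=""):
        self.relationships.append(Relationship(source, target, rel_type, subtype))

    def targets(self, source, rel_type):
        """Objects reachable from `source` over relationships of one type."""
        return [
            self.objects[r.target]
            for r in self.relationships
            if r.source == source and r.rel_type == rel_type
        ]


store = ObjectStore()
store.add(InformationObject("my-pdf", b"%PDF-", {"MimeType": "application/pdf"}))
store.add(InformationObject("desc-1", b"<dc/>", {"schema": "DC"}))
store.add(InformationObject("thumb", b"", {"MimeType": "image/jpeg"}))
store.relate("my-pdf", "desc-1", "is-described-by", subtype="DC")
store.relate("thumb", "my-pdf", "is-part-of")

descriptions = [obj.oid for obj in store.targets("my-pdf", "is-described-by")]
```

Metadata records are ordinary information objects here, so a new schema needs no new storage type, only a new relationship subtype.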

Schema Agnostic Data Management and Access

No assumptions are made on schemas in any system layer / operation stage:
Import, hosting, retrieval, presentation
Computational-intelligence-based and traditional tools are exploited for assisting interoperability

Enabling features:
Flexible Information Model
Schema-unbound importing capacities
Schema-agnostic hosting services
Schema-unbound processing services
Schema-adapting presentation components

Sibling technologies:
Federated Information Retrieval systems usually cope with single (or limited) (meta)data manifestations
Widely known DLMSes diverge
• Some are bound to a single metadata manifestation (usually a form of DC)
• Some cope with a single schema for the majority of functionality, while alternative schemas can be second-class citizens
• Some can efficiently handle multiple schemas but lack a full set of services for exploitation
Data processing services
• Mostly schema-agnostic engines, but do not handle data all the way

gCube Technology - Part 2: Excellence

• Excellence in gCube:
» gCube as a whole
» Advanced concepts
» Service highlights: Information System, Archive Import, Data Transformation, Indexing, Data Processing, Information Retrieval, User Applications
• Recent related publications
• Looking into the future

The Information System

YEAR 2: ARCHITECTURE REFINEMENT & IMPLEMENTATION

Allows and facilitates resource creation, publication and monitoring
Integrates with gLite infrastructures
Supports the advanced, open resource model of gCube
Supports and enforces access scoping rules
Offers a WS-DAIX-like interface for document access
• XQuery 1.0 support for structured documents
• XCollection access for semi-structured documents
Encompasses a robust, effective and efficient architecture
• Registry, Information Collector, Notifier Services
• Publisher & Client Libraries
• Distributed & replicated
• Single point of reference
• Capable of handling 100Ks of updates per day

Related technologies
MDS4: IS builds on MDS4
UDDI:
• Limited subset of IS
• Targets only a subset of the resources handled by gCube and IS (i.e. Web Services)
LDAP Servers
• Limited subset of IS
• Would require excessive changes to support the introduced concepts, which would overrule their benefits and assumptions

The Archive Import Service

YEAR 2: ARCHITECTURE REFINEMENT & IMPLEMENTATION

A highly customizable, modular data importing & linking service:
Is operated via a fully fledged scripting language
Is based on plug-ins
Is bundled with a mini development environment for the support of complex imports

Offers unrivalled archive importing capacities
Goes beyond de-facto standards
• Capable of importing OAI-ORE entities and supporting the gCube information model
• Compliant with, yet exceeding, OAI-PMH capacities
Several access protocols provided out of the box (premade plug-ins)
• FTP, HTTP, Local FileSystem, …
• XML, HTML, binaries, …

Related technologies are protocol bound:
• access and transportation
• manifestations, e.g. XML/CSV, OAI-PMH

The gCube Data Transformation Service

YEAR 2: ARCHITECTURE EXTENSION & IMPLEMENTATION

gDTS: A Computational Intelligence-based Metadata and Content Transformation Engine
Automatic transformation path identification
» Minimal path length is favorable
Fine-grained subtyping of formats (e.g. resolution, fps etc.)
Pluggable algorithms for content transformation

Operation
Inputs
» the targeted format (optionally with a detailed specification)
» the source object (optionally with a source format specification)
Outputs
» Suitable transformation path(s)
» Target object
Tested both on grid and cloud infrastructures
Master-worker model
Dynamic adaptation of worker node population

Common Use Cases
Adopted: Thumbnailing, Text extraction, Transcoding etc.
Future: Watermarking, Feature extraction etc.

Related technologies usually:
Are specialized on a limited set of formats
• Database data (tabular)
• Textual data (XML, CSV etc.)
• A single group of formats (e.g. video, or image, binary documents)
Are centralized
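Automatic transformation path identification with a preference for short paths can be illustrated with a breadth-first search over a graph of single-step conversions. This is a sketch under stated assumptions, not the gDTS algorithm: the registry of conversions below is invented, and the real engine also weighs fine-grained format subtypes.

```python
from collections import deque

# Hypothetical registry of single-step transformations: source -> targets.
TRANSFORMATIONS = {
    "application/pdf": ["image/tiff", "text/plain"],
    "image/tiff": ["image/png", "image/jpeg"],
    "image/png": ["image/jpeg"],
    "text/plain": [],
}


def shortest_path(graph, source, target):
    """Return the shortest chain of formats from source to target, or None.

    Breadth-first search visits formats in order of distance from the
    source, so the first path that reaches the target has minimal length.
    """
    if source == target:
        return [source]
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        for nxt in graph.get(path[-1], []):
            if nxt in visited:
                continue
            if nxt == target:
                return path + [nxt]
            visited.add(nxt)
            queue.append(path + [nxt])
    return None


thumbnail_path = shortest_path(TRANSFORMATIONS, "application/pdf", "image/jpeg")
no_path = shortest_path(TRANSFORMATIONS, "text/plain", "image/jpeg")
```

In this registry the two-step route through TIFF is chosen over the three-step route through TIFF and PNG.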

Metadata & Content Indexing Services

YEAR 2: ARCHITECTURE REFINEMENT, NEW CONCEPTS & IMPLEMENTATION

Offers multiple types of indexing
• Forward indices, full-text, geospatial/temporal, XML, feature
• Arbitrary (typed) field indexing
Distributed architecture
• Multiple lookup services per index
• Node index cache technology
• Notification-based replication
High performance achievements
• 10 to 300 ms index access time (over SOAP)
High level of failure endurance
• gCF-assisted service state recovery
• SMS storage backend: integrity assurance
Bundled with a large set of instruments:
• gDTS integration
• AIS integration
• IR Bootstrapper component

Goes beyond related technologies such as Digital Libraries and even web search:
Supersedes most IR systems, which have single or dual type indices
• Favored are: full-text indices & FWD indices
Beyond typical indexing:
• Feature indexing (implemented / not employed)
» Combined with feature extraction and distance calculators
• Support for ranked geo-temporal queries
• Special compacting algorithm for FWD indices
• Combined metadata + content ranking algorithms

The data processing pipeline

YEAR 2: RE-DESIGN & IMPLEMENTATION

Layered architecture:
• Execution engine (complete)
• Workflow engine (partial)
• Workflow presentation systems (not integrated)
Multi-mode operation
• In-process, Intra-process, Intra-node
Multi-protocol logic execution
• Executables (native, scripts…)
• POJOs & Native Javas (engine context)
• Web Services (WS, WSRF) & HTTP APIs
Multi-infrastructure (gLite, Hadoop, gCube, …)
Supports Elastic Cloud management & application
Large data set exchanges: the gRS
Workflow engine:
• Favors optimisation over matching, the typical grid approach

Related Technologies
Condor + Pegasus
• No native handling of heterogeneous techs
• Single infrastructure
• Match-based plans
• Limited control artefacts
OGSA-DAI, DPQ
• Mostly data oriented
• Minimal optimisation capacities
• Single infrastructure
• Single protocol
• Minimal control artefacts
Map-Reduce infrastructures
• A single model of processing embedded in processors

Information Retrieval Services: beyond typical “lookups”

YEAR 2: IMPLEMENTATION REFINEMENTS & EXTENSION

• All in one:
» Built-in capacities for federation (DIR components + operators)
» Internally employed for collection selection
» Unlimited capacities for custom processing
• Fully featured:
» Sorting
» Filtering
» Fusion / Merging
» Projection
» Custom source access
» Custom processing
• Related Technologies (IR systems)
» Directly invoke indices (no further processing capacities)
» Even re-sorting can be hard
» Exploit a single predefined manifestation or are customizable upon a single custom manifestation
• The cons of our approach:
» Impact on performance
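The operator-based approach (fusion, filtering, sorting and projection applied after index lookups) can be sketched as small composable functions over record streams. The operator names, record fields and fusion rule (keep the best score per identifier) are assumptions for illustration and are not the gCube search operator API.

```python
def fuse_op(*sources):
    """Merge several result streams, keeping the best-scored hit per id."""
    best = {}
    for source in sources:
        for record in source:
            current = best.get(record["id"])
            if current is None or record["score"] > current["score"]:
                best[record["id"]] = record
    return iter(best.values())


def filter_op(records, predicate):
    """Keep only the records that satisfy the predicate."""
    return (r for r in records if predicate(r))


def sort_op(records, key, reverse=True):
    """Sort records on one field (descending by default)."""
    return iter(sorted(records, key=lambda r: r[key], reverse=reverse))


def project_op(records, fields):
    """Keep only the named fields of each record."""
    return ({f: r[f] for f in fields} for r in records)


# Two hypothetical index lookups returning overlapping hits.
lookup_1 = [
    {"id": "a", "score": 0.9, "lang": "en", "title": "A"},
    {"id": "b", "score": 0.4, "lang": "fr", "title": "B"},
]
lookup_2 = [
    {"id": "b", "score": 0.7, "lang": "fr", "title": "B"},
    {"id": "c", "score": 0.5, "lang": "en", "title": "C"},
]

plan = project_op(
    sort_op(
        filter_op(fuse_op(lookup_1, lookup_2), lambda r: r["lang"] == "en"),
        "score",
    ),
    ["id", "score"],
)
results = list(plan)
```

Each extra operator adds a pass over the data, which is the performance cost the slide acknowledges in exchange for this flexibility.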

An example of Information Retrieval LifeCycle in gCube

[Diagram labels: Search Operators; Preprocessors; Search Engine; PP#3; NLP; CSS; Parse; IS; IDX Lookup 1; IDX Lookup 2; CSDS; Fuse; Project; Fetch Metadata; Sort; AIS; gDTS; PES; Preprocess; W/F Planner]

gCube Technology - Part 2: Excellence

• Excellence in gCube:
» gCube as a whole
» Advanced concepts
» Service highlights
• Recent related publications: The Process Execution Engine, The plug-ins concept, On-demand VREs, The gCF
• Looking into the future

Papers

The Process Execution Engine

“Dataflow Processing and Optimization on Grid and Cloud Infrastructures”, M. Tsangaris et al., Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, Vol. 32 No. 1, March 2009

“Nefeli: Hint-based Execution of Workloads in Clouds”, K. Tsakalozos et al., ICDCS 2010: The 30th International Conference on Distributed Computing Systems

The gCF

“Taming development complexity in service-oriented e-Infrastructures: the gCore application framework and distribution for gCube”, Pagano, P. et al., Zero-In e-Infrastructure News Magazine

Papers

The plug-ins concept

“Functional adaptivity for Digital Library Services in e-Infrastructures: the gCube Approach”, Simeoni, F. et al., 13th European Conference on Research and Advanced Technology for Digital Libraries, ECDL 2009, 2009

“Matchmaking for Covariant Hierarchies”, Simeoni, F.; Lievens, D., ACP4IS '09: Proceedings of the 8th workshop on Aspects, components, and patterns for infrastructure software, 2009

On-demand VREs

“On-demand Virtual Research Environments and the Changing Roles of Librarians”, Candela, L. et al., Library Hi Tech, 2009, 27, 239-251

“An Extensible Virtual Digital Libraries Generator”, Assante, M. et al., 12th European Conference on Research and Advanced Technology for Digital Libraries, ECDL 2008, Aarhus, Denmark, September 14-19, Springer, 2008, 5173, 122-134

gCube Technology - Part 2: Excellence

• Excellence in gCube:
» gCube as a whole
» Advanced concepts
» Service highlights
• Recent related publications
• Looking into the future: Future work, D4Science II, Beyond

Future work

Interoperability (D4Science-II)
• Extend standards adoption
• Promote specs into standardisation bodies
• Focus on interoperating with other, widely adopted systems

Improve information retrieval features
• New roles for ontologies (D4Science-II)
• NLP features extension
• Further performance improvements

Extend security concepts
• On interoperability (D4Science-II)
• Support uniformly fine-grained security policies for all resources and Information Objects

Future work

Process Execution
• Elastic cloud management integration
• Objectives-based process execution
» Without dismissal of previous assumptions of generality
• Multi e-infrastructure aggregation (D4Science II)
• Unification with Data Transformation Services and Information Retrieval

Extensive multimedia handling
• Decomposition/Composition to/from gCube Information model
• Feature extraction & indexing for Retrieval

SUPPLEMENTARY

gRS

Large data set exchange: the gRS
Formalizes the exchange of large data sets in web services (paging, store & forward, throttling / flow control…)
Adds the “by-ref” notion to data exchanged via services
Confronts several performance issues of WS interactions
Faster than observed similar OGSA-DAI data transfers
Already ported to other implementations disjoint from D4Science
• gRS2: a full in-process to across-machine communication and data exchange mechanism
• Boosts performance
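The three ideas named above (passing a result set by reference, paging, and flow control) can be sketched with a bounded in-process buffer. This is an illustration only and does not reflect the gRS implementation or wire protocol: the registry, the locator format and the function names are assumptions.

```python
import queue
import threading
import uuid

_REGISTRY = {}   # hypothetical locator -> buffer table
_END = object()  # sentinel marking the end of a result set


def publish(records, capacity=4):
    """Start producing records in the background and return a locator.

    The caller hands the locator to the consumer instead of the data
    itself ("by-ref"). The buffer is bounded, so the producer blocks
    when the consumer falls behind, which gives simple flow control.
    """
    buf = queue.Queue(maxsize=capacity)
    locator = str(uuid.uuid4())
    _REGISTRY[locator] = buf

    def produce():
        for record in records:
            buf.put(record)
        buf.put(_END)

    threading.Thread(target=produce, daemon=True).start()
    return locator


def read_pages(locator, page_size=3):
    """Resolve a locator and yield its records in pages."""
    buf = _REGISTRY[locator]
    page = []
    while True:
        item = buf.get()
        if item is _END:
            break
        page.append(item)
        if len(page) == page_size:
            yield page
            page = []
    if page:
        yield page


pages = list(read_pages(publish(range(8)), page_size=3))
```

The consumer never holds more than one page plus the bounded buffer in memory, whatever the size of the full result set.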

Process Execution on Grid & Cloud

Dataflow Processing and Optimization on Grid and Cloud Infrastructures

Authors: M. Tsangaris, G. Kakaletris, H. Kllapi, G. Papanikos, F. Pentaris, P. Polydoras, E. Sitaridi, V. Stoumpos, Y. Ioannidis

Published: Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, Vol. 32 No. 1, March 2009

Abstract: Complex on-demand data retrieval and processing is a characteristic of several applications and combines the notions of querying & search, information filtering & retrieval, data transformation & analysis, and other data manipulations. Such rich tasks are typically represented by data processing graphs, having arbitrary data operators as nodes and their producer-consumer interactions as edges. Optimizing and executing such graphs on top of distributed architectures is critical for the success of the corresponding applications and presents several algorithmic and systemic challenges. This paper describes a system under development that offers such functionality on top of Ad-hoc Clusters, Grids, or Clouds. Operators may be user defined, so their algebraic and other properties as well as those of the data they produce are specified in associated profiles. Optimization is based on these profiles, must satisfy a variety of objectives and constraints, and takes into account the particular characteristics of the underlying architecture, mapping high-level dataflow semantics to flexible runtime structures. The paper highlights the key components of the system and outlines the major directions of its development.

Process Execution & The Cloud

Nefeli: Hint-based Execution of Workloads in Clouds

Authors: Konstantinos Tsakalozos, Mema Roussopoulos, Vangelis Floros and Alex Delis

Published: ICDCS 2010: The 30th International Conference on Distributed Computing Systems http://icdcs2010.cnit.it/

Abstract: Virtualization of computer systems has made feasible the provision of entire distributed infrastructures in the form of services. Such services do not expose the internal operational and physical characteristics of the underlying machinery to either users or applications. In this way, infrastructures including computers in data-centers, clusters of workstations, and networks of machines are shrouded in “clouds”. Mainly through the deployment of virtual machines, such networks of computing nodes become cloud-computing environments. In this paper, we propose Nefeli, a virtual infrastructure gateway that is capable of effectively handling diverse workloads of jobs in cloud environments. By and large, users and their workloads remain agnostic to the internal features of clouds at all times. Exploiting execution patterns as well as logistical constraints, users provide Nefeli with hints for the handling of their jobs. Hints provide no hard requirements for application deployment in terms of pairing virtual-machines to specific physical cloud elements. Nefeli helps avoid bottlenecks within the cloud through the realization of viable virtual machine deployment mappings. As the types of jobs change over time, deployment mappings must follow suit. To this end, Nefeli offers mechanisms to migrate virtual machines as needed to adapt to changing performance needs. Using our prototype system, we show significant improvements in overall time needed and energy consumed for the execution of workloads in both simulated and real cloud computing environments.

On the idea of plug-ins

Functional adaptivity for Digital Library Services in e-Infrastructures: the gCube Approach

Authors: Simeoni, F.; Candela, L.; Lievens, D.; Pagano, P. & Simi, M.

Editors: Agosti, M.; Borbinha, J.; Kapidakis, S.; Papatheodorou, C. & Tsakonas, G.

Published: 13th European Conference on Research and Advanced Technology for Digital Libraries, ECDL 2009, 2009

Abstract: We consider the problem of e-Infrastructures that wish to reconcile the generality of their services with the bespoke requirements of diverse user communities. We motivate the requirement of functional adaptivity in the context of gCube, a service-based system that integrates Grid and Digital Library technologies to deploy, operate, and monitor Virtual Research Environments defined over infrastructural resources. We argue that adaptivity requires mapping service interfaces onto multiple implementations, truly alternative interpretations of the same functionality. We then analyse two design solutions in which the alternative implementations are, respectively, full-fledged services and local components of a single service. We associate the latter with lower development costs and increased binding flexibility, and outline a strategy to deploy them dynamically as the payload of service plugins. The result is an infrastructure in which services exhibit multiple behaviours, know how to select the most appropriate behaviour, and can seamlessly learn new behaviours.

On the idea of plug-ins

Matchmaking for Covariant Hierarchies

Authors: Simeoni, F.; Lievens, D.

Published: ACP4IS '09: Proceedings of the 8th workshop on Aspects, components, and patterns for infrastructure software. 2009

Abstract: We describe a model of matchmaking suitable for the implementation of services, rather than their composition. In the model, processing requirements are modelled by client requests and computational resources are software processors that compete for request processing as the covariant implementations of an open service interface. Matchmaking then relies on type analysis to rank processors against requests in support of a wide range of dispatch strategies. We relate the model to the autonomicity of service provision and briefly report on its deployment within a production-level infrastructure for scientific computing.

On VREs (Librarians)

On-demand Virtual Research Environments and the Changing Roles of Librarians

Authors: Candela, L.; Castelli, D. & Pagano, P.

Published: Library Hi Tech, 2009, 27, 239-251

Abstract: The aim of this paper is to discuss how new technologies for supporting scientific research will possibly influence the librarians’ work. The discussion is conducted in a context that takes into account the emergence of e-infrastructures as means to realise a new model of producing, using and sharing information resources and even to change the concept of information resource itself. At the core of this innovation there are virtual research environments, i.e. evolved versions of the current “research libraries”. The environments provide scientists with collaborative and customised environments supporting results production and exchange around the globe in a cost-efficient manner. The experiences made with these innovative research environments within the D4Science project is reported. On the basis of this experience, possible professional profiles are suggested for librarians working in these new evolved “research libraries”.

On VREs (Instantiation)

An Extensible Virtual Digital Libraries Generator

Authors: Assante, M.; Candela, L.; Castelli, D.; Frosini, L.; Lelii, L.; Manghi, P.; Manzi, A.; Pagano, P. & Simi, M.

Editors: Christensen-Dalsgaard, B.; Castelli, D.; Jurik, B. A. & Lippincott, J.

Published: 12th European Conference on Research and Advanced Technology for Digital Libraries, ECDL 2008, Aarhus, Denmark, September 14-19, Springer, 2008, 5173, 122-134

Abstract: In this paper we describe the design and implementation of the VDL Generator, a tool to simplify and automatise the Digital Library development process. In particular, we discuss how our approach to the realisation of this tool simplifies the task of implementing, extending and modifying such a fundamental component. This tool models its issue as a generic search problem that can easily be adapted to different application scenarios. In particular, to guarantee its extensibility we carefully identify, isolate and organise the VDL Generator constituents, i.e. (i) the set of logical components that can be used when designing a Digital Library, (ii) the set of physical components that by implementing the logical components contribute to implement the Digital Library and (iii) the search strategy exploited to accomplish the generation task. Furthermore, we report on the experiences matured in implementing and exploiting such an innovative service in the context of the Diligent EU funded project and discuss future plans for its consolidation.

On gCore

Taming development complexity in service-oriented e-Infrastructures: the gCore application framework and distribution for gCube

Authors: Pagano, P.; Simeoni, F.; Simi, M. & Candela, L.

Published: Zero-In e-Infrastructure News Magazine, EU FP7 Funded Project BELIEF-II, 2009, 1, 19 – 21 http://www.beliefproject.org/zero-in/zero-in-first-edition-emagazine/taming-development-complexity-in-service-oriented-e-infrastructures

Introduction: e-Infrastructure is the term coined for innovative research environments that provide modern scientists with seamless access to shared, distributed and heterogeneous resources. Within this domain, service-orientation is a common assumption where it provides a common abstraction to hardware, data and even application services as shareable resources. This approach, however, complicates resource management, since deployment, configuration, staging, scoping, monitoring and secure operation of services become fully dynamic and a responsibility of the infrastructure. To fulfill this responsibility, infrastructures must be clear as to the description and run-time behaviour of services. This adds to the complexity typically associated with service development, whether generically related to distributed programming (e.g. concurrency, performance-awareness, and tolerance to partial failure) or specifically introduced by open technologies (e.g. reliance upon multiple standards, limited integration and documentation of development tools). This complexity challenges the operation, maintenance, evolution and third-party extension of the infrastructure, ultimately threatening its adoption.

IS Operation

[Architecture diagram labels: gHN; ICService; ExistCommon Lib; Exist 1.2; Aggregator Sink; Aggregator Source; RegistryService; RegistrationPT; NotifierService; WS-Topics; WS-NotificationPT; gLiteBridgeService; gLite Infrastr.; gCubeService; Stateful WS-Resource; RPD; Profile; ISPublisherLib; ISNotificationLib; ISClientLib; XQueryAccess PT; GCUBEResourceParsers Lib; WS-ServiceGroup; Notification]

gCube Resource Model

D4Science Mid-Term Review Meeting, Brussels, 3rd April 2009

Versioning in SOA

Service-oriented versioning: seen only in advanced desktop systems
• Issues not solved yet in mainstream platforms like Java
Is based on formalization and exploitation of the version semantics of WS
Is supported by
• the resource model
• the s/w production cycle
Allows smooth evolution of a production establishment
• Impacts positively the long-term stability of the system
Goes beyond typical coexistence of services
• Side-by-side deployment of several versions of the same service
• Transparent routing of client messages to appropriate producers
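Side-by-side deployment with transparent routing can be sketched as a lookup table keyed by service and version. The compatibility rule used here (same major version, at least the requested version, newest match wins) is an assumption chosen for illustration; the slide does not state the version semantics gCube formalizes. Endpoint URLs and the class name are likewise hypothetical.

```python
def parse_version(text):
    """Turn '1.2.0' into a comparable tuple (1, 2, 0)."""
    return tuple(int(part) for part in text.split("."))


class VersionRouter:
    """Hypothetical registry of service versions deployed side by side."""

    def __init__(self):
        self._endpoints = {}  # service name -> {version tuple: endpoint}

    def deploy(self, service, version, endpoint):
        versions = self._endpoints.setdefault(service, {})
        versions[parse_version(version)] = endpoint

    def route(self, service, requested):
        """Pick the newest deployed version compatible with the request.

        Assumed rule: same major version, and not older than requested.
        """
        wanted = parse_version(requested)
        candidates = [
            version
            for version in self._endpoints.get(service, {})
            if version[0] == wanted[0] and version >= wanted
        ]
        if not candidates:
            raise LookupError(f"no compatible version of {service} for {requested}")
        return self._endpoints[service][max(candidates)]


router = VersionRouter()
router.deploy("search", "1.0.0", "http://node-a.example/search")
router.deploy("search", "1.2.0", "http://node-b.example/search")
router.deploy("search", "2.0.0", "http://node-c.example/search")

old_client_target = router.route("search", "1.1.0")
new_client_target = router.route("search", "2.0.0")
```

Clients built against version 1 keep working after version 2 is deployed, because the routing decision is made by the infrastructure rather than by the client.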

Highlights of technological excellence in end-user applications

YEAR 2: ARCHITECTURE & IMPLEMENTATION

TimeSeries Management
The objective: on-line curation of time series
Go beyond the common practice of desktop applications for efficient curation and face the challenge of large data size management over a web interface

Scientific Reporting
Go beyond the practice of structured information stores and fixed data types
Achieve integration of diverse platform services with templated report definition and production
• Dynamic document creation
» Complex information object
• Storage
• Metadata generation
• Indexing

Workspace
Achieve integration of a collaborative workspace with the content repository and information retrieval capacities of the platform, offering access to diverse virtual (e.g. queries) and compound objects