
D4Science Scientific Data Infrastructure: promoting interoperability by embracing the value of the differences

Pasquale Pagano

Networking session, September 2010

Brussels (Belgium)

www.d4science.eu


www.d4science.eu | D4Science Scientific Data Infrastructure | Brussels, 29 September 2010

Assumptions

Consolidated facts:

Very rich applications and data collections are currently maintained by a multitude of authoritative providers

Different problems require different execution paradigms: batch, map-reduce, synchronous call, message-queue, …

Key distributed computation technologies exist: grid (gLite and Globus), distributed resource management (Condor), clusters (Hadoop), …

Several standards are adopted in the same domain

Societal observations

• A rich variety of protocols, models, and formats
  – creates barriers to the use of resources
  – dramatically delays new exploitation patterns

Technical observations

Heterogeneity of protocols, models, and formats increases load; load increases failures


D4Science Vision

D4Science objectives:

hide heterogeneity, i.e. abstract over differences in location, protocol, and model;

embrace heterogeneity, i.e. allow for multiple locations, protocols, and models;

Technical goals

no bottlenecks: scale no less than the interfaced resources

no outages: keep failures partial and temporary

autonomicity: system reacts and recovers


Hiding Heterogeneity [1/2]

D4Science is an ecosystem of e-infrastructures where:

• various communities cohabit while maintaining their peculiarities and policies;

• resource sharing and reuse of services from other domains are feasible and affordable.


Hiding Heterogeneity [2/2]

D4Science approach:

• Heterogeneous resources are virtually accessible in a common ecosystem of resources, regardless of their locations, technologies, and protocols

• Different communities have access to different views, according to the conditions under which the sharing can occur

• Each community can define its own virtual research environment to satisfy specific needs, for a limited timeframe and at no cost to the resource providers

• Several virtual research environments can coexist without interfering with each other, even when competing for the same resources
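The approach above can be pictured as one shared registry plus per-community views. The sketch below is a minimal illustration under stated assumptions, not the gCube API: `Resource`, `Registry`, `VirtualResearchEnvironment`, the resource names, and the sharing rules are all hypothetical. It shows a community seeing only what providers share with it, a VRE selecting from that view for a limited timeframe, and two VREs referencing the same shared dataset without copying it.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Resource:
    """A resource in the common ecosystem (hypothetical model, not the gCube API)."""
    name: str
    kind: str               # e.g. "dataset" or "compute"
    protocol: str           # native protocol, recorded but hidden from consumers
    shared_with: frozenset  # communities allowed to see it under the sharing conditions


@dataclass
class Registry:
    """Single entry point where heterogeneous resources are virtually accessible."""
    resources: list = field(default_factory=list)

    def register(self, resource: Resource) -> None:
        self.resources.append(resource)

    def view(self, community: str) -> list:
        # Each community sees only what providers agreed to share with it.
        return [r for r in self.resources if community in r.shared_with]


@dataclass
class VirtualResearchEnvironment:
    """A community-defined, time-limited selection over the shared registry."""
    name: str
    community: str
    needs: set         # resource kinds this VRE requires
    expires_on: date   # VREs exist for a limited timeframe

    def resources(self, registry: Registry, today: date) -> list:
        if today > self.expires_on:
            return []
        # Resources are referenced, not copied, so several VREs can use the same one.
        return [r for r in registry.view(self.community) if r.kind in self.needs]


# Hypothetical resources and communities, for illustration only.
registry = Registry()
registry.register(Resource("catch-statistics", "dataset", "OAI-PMH", frozenset({"fisheries"})))
registry.register(Resource("species-maps", "dataset", "WMS", frozenset({"fisheries", "biodiversity"})))
registry.register(Resource("hadoop-cluster", "compute", "Hadoop", frozenset({"fisheries", "biodiversity"})))

stock = VirtualResearchEnvironment("stock-assessment", "fisheries", {"dataset", "compute"}, date(2011, 3, 31))
species = VirtualResearchEnvironment("species-distribution", "biodiversity", {"dataset"}, date(2011, 6, 30))

today = date(2010, 9, 29)
print([r.name for r in stock.resources(registry, today)])
print([r.name for r in species.resources(registry, today)])
```

Here the fisheries VRE sees all three resources, the biodiversity VRE sees only `species-maps`, and both point at that same registry entry.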


Embracing Heterogeneity: Interoperability Approaches

Approaches and solutions to achieve interoperability:

Blackboard-based

asynchronous communication between the components of a system, using one protocol to read/write and one language to specify messages

Wrapper/Mediator-based

translates the interface of a component into a compatible interface

Proxy-based

exposes the same interface but allows additional operations on the received calls

Adaptor-based

provides a unified interface to the interfaces of a set of other components and encapsulates how this set of objects interacts

Broker-based

specialises an Adaptor by coordinating communication
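Three of the approaches listed above (wrapper/mediator, proxy, and broker) can be illustrated in a few lines. This is a hedged sketch: the provider classes and their native methods (`find_records`, `query`) are hypothetical stand-ins for heterogeneous sources, and the broker here simply fans a query out to every adapted service and concatenates the results, which is only one possible coordination strategy.

```python
from typing import Protocol


class SearchService(Protocol):
    """Common interface every consumer codes against (hypothetical)."""
    def search(self, term: str) -> list: ...


# Two providers with incompatible native interfaces (hypothetical stand-ins).
class LegacyCatalogue:
    _records = {"R1": "Tuna catches 2009", "R2": "Cod stocks"}

    def find_records(self, keyword):
        return [(rid, title) for rid, title in self._records.items()
                if keyword.lower() in title.lower()]


class RestRepository:
    _items = [{"identifier": "X9", "label": "Tuna distribution map"}]

    def query(self, params):
        q = params["q"].lower()
        return {"hits": [i for i in self._items if q in i["label"].lower()]}


class LegacyCatalogueWrapper:
    """Wrapper/mediator: translates the catalogue's interface into SearchService."""
    def __init__(self, source: LegacyCatalogue):
        self.source = source

    def search(self, term):
        return [{"id": rid, "title": title} for rid, title in self.source.find_records(term)]


class RestRepositoryWrapper:
    """Wrapper/mediator for the REST-style provider."""
    def __init__(self, source: RestRepository):
        self.source = source

    def search(self, term):
        hits = self.source.query({"q": term})["hits"]
        return [{"id": h["identifier"], "title": h["label"]} for h in hits]


class CountingProxy:
    """Proxy: same interface as its target, plus an extra operation (call counting)."""
    def __init__(self, target: SearchService):
        self.target = target
        self.calls = 0

    def search(self, term):
        self.calls += 1
        return self.target.search(term)


class SearchBroker:
    """Broker: one entry point coordinating calls to several adapted services."""
    def __init__(self, services: list):
        self.services = services

    def search(self, term):
        results = []
        for service in self.services:
            results.extend(service.search(term))
        return results


proxy = CountingProxy(LegacyCatalogueWrapper(LegacyCatalogue()))
broker = SearchBroker([proxy, RestRepositoryWrapper(RestRepository())])
print(broker.search("tuna"))
```

The consumer only ever calls `search`; adding a new provider means writing one more wrapper, with no change to the broker or its callers.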


Embracing Heterogeneity: Data Representation, Discovery, and Access

D4Science offers:

• An open transformation service framework
  – extensible with specific source-target mediators
  – used for metadata and data crosswalk transformations
  – tailored for statistical, geospatial, temporal, and textual data

• A rich set of reference data
  – extensible with domain-specific reference data
  – reusable in services for data curation and harmonization

• Support for geospatial services
  – to capture, manage, analyze, and display all forms of data that can be geographically referenced

• An integrated resource registry
  – format-agnostic
  – supporting discovery and access
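A transformation framework with pluggable source-target mediators, combined with reference data for harmonization, might look like the sketch below. `TransformationFramework`, the `local-catalogue` schema, and the mediator function are hypothetical; only the Dublin Core element names (`title`, `creator`, `coverage`) and the ISO 3166-1 alpha-2 codes are real. The three-entry code table stands in for the much richer reference data mentioned above.

```python
from typing import Callable, Dict, Tuple

Mediator = Callable[[dict], dict]

# Excerpt of ISO 3166-1 alpha-2 codes, standing in for richer reference data.
COUNTRY_CODES = {"italy": "IT", "belgium": "BE", "france": "FR"}


class TransformationFramework:
    """Registry of source->target mediators, open to new plug-ins (hypothetical API)."""

    def __init__(self) -> None:
        self._mediators: Dict[Tuple[str, str], Mediator] = {}

    def register(self, source: str, target: str, mediator: Mediator) -> None:
        self._mediators[(source, target)] = mediator

    def supports(self, source: str, target: str) -> bool:
        return (source, target) in self._mediators

    def transform(self, record: dict, source: str, target: str) -> dict:
        if not self.supports(source, target):
            raise LookupError(f"no mediator registered for {source} -> {target}")
        return self._mediators[(source, target)](record)


def local_to_dublin_core(record: dict) -> dict:
    """Crosswalk from a hypothetical local schema to Dublin Core elements."""
    country = record["country"].strip()
    return {
        "dc:title": record["name"],
        "dc:creator": record["author"],
        # Harmonize free-text country names via reference data; keep unknown values as-is.
        "dc:coverage": COUNTRY_CODES.get(country.lower(), country),
    }


framework = TransformationFramework()
framework.register("local-catalogue", "dublin-core", local_to_dublin_core)
print(framework.transform({"name": "Tuna catches", "author": "FAO", "country": " Italy "},
                          "local-catalogue", "dublin-core"))
```

Supporting a new format pair only requires registering another mediator; callers keep using `transform`.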


Embracing Heterogeneity: Process Execution [1/2]

D4Science offers solutions to:

Decouple the business-domain and infrastructure-specific logic from the core “execution” functionality

Invoke a wide range of logic components: SOAP and REST Web Services, shell scripts, executable binaries, POJOs, …

Support most execution paradigms: batch, map-reduce, synchronous call

Bridge key distributed computation technologies: grid (gLite and Globus), Condor, Hadoop

Control and monitor the execution of a processing flow

Stage data among different storage providers

Stream data among computation elements
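The decoupling of the core execution loop from the components it calls can be sketched with pluggable invokers keyed by component type. Assumptions: `ExecutionEngine`, `Step`, and the invoker names are hypothetical; only an in-process Python callable and an executable-binary invoker are shown (SOAP, REST, or grid back-ends would be further invokers), and monitoring is reduced to a list of `(step, status)` events.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Step:
    name: str
    kind: str    # selects the invoker, e.g. "python" or "binary"
    target: Any  # a callable for "python", an argv list for "binary"


class ExecutionEngine:
    """Core execution loop; knows nothing about how each kind of step is invoked (hypothetical)."""

    def __init__(self) -> None:
        self.invokers: Dict[str, Callable[[Any, Any], Any]] = {}
        self.events: List[Tuple[str, str]] = []  # monitoring trail: (step name, status)

    def register_invoker(self, kind: str, invoker: Callable[[Any, Any], Any]) -> None:
        self.invokers[kind] = invoker

    def run(self, steps: List[Step], data: Any = None) -> Any:
        for step in steps:
            self.events.append((step.name, "started"))
            try:
                data = self.invokers[step.kind](step.target, data)
            except Exception as exc:
                self.events.append((step.name, f"failed: {exc!r}"))
                raise
            self.events.append((step.name, "completed"))
        return data


def invoke_python(fn: Callable[[Any], Any], data: Any) -> Any:
    return fn(data)


def invoke_binary(argv: List[str], data: Any) -> str:
    # Feed the previous output on stdin; stdout becomes the next step's input.
    done = subprocess.run(argv, input=str(data), capture_output=True, text=True, check=True)
    return done.stdout.strip()


engine = ExecutionEngine()
engine.register_invoker("python", invoke_python)
engine.register_invoker("binary", invoke_binary)

flow = [
    Step("clean", "python", str.strip),
    Step("uppercase", "binary", [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"]),
    Step("count", "python", len),
]
result = engine.run(flow, "  tuna catches  ")
print(result, engine.events)
```

Because `run` only looks up an invoker by `kind`, new component types can be supported without touching the execution loop.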


Embracing Heterogeneity: Process Execution [2/2]

By using adaptors that operate on a specific third-party language and translate its constructs into native ones, D4Science allows for the creation of complex workflows that exploit several diverse technologies deployed on different infrastructures.
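An adaptor of this kind parses a third-party description and emits native constructs that the engine can schedule. The third-party syntax here (`task B after A`), the `NativeNode` class, and `LineWorkflowAdaptor` are hypothetical; ordering uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+), which raises `graphlib.CycleError` on circular dependencies.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Dict, List, Optional


@dataclass
class NativeNode:
    """Native construct of a hypothetical execution engine."""
    name: str
    depends_on: set = field(default_factory=set)


class LineWorkflowAdaptor:
    """Adaptor for a hypothetical third-party language: 'task NAME [after OTHER]'."""

    def translate(self, text: str) -> Dict[str, NativeNode]:
        plan: Dict[str, NativeNode] = {}
        for lineno, raw in enumerate(text.splitlines(), start=1):
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            words = line.split()
            dep: Optional[str] = None
            if len(words) == 2 and words[0] == "task":
                name = words[1]
            elif len(words) == 4 and words[0] == "task" and words[2] == "after":
                name, dep = words[1], words[3]
            else:
                raise ValueError(f"line {lineno}: cannot translate {line!r}")
            node = plan.setdefault(name, NativeNode(name))
            if dep is not None:
                node.depends_on.add(dep)
                plan.setdefault(dep, NativeNode(dep))
        return plan


def execution_order(plan: Dict[str, NativeNode]) -> List[str]:
    """Order native nodes so that each runs after its dependencies."""
    graph = {name: node.depends_on for name, node in plan.items()}
    return list(TopologicalSorter(graph).static_order())


source = """
# a workflow written in the hypothetical third-party syntax
task fetch
task clean after fetch
task map after clean
task stats after clean
"""
plan = LineWorkflowAdaptor().translate(source)
order = execution_order(plan)
print(order)
```

Supporting another third-party language would mean writing another adaptor that produces the same `NativeNode` plan, leaving the engine unchanged.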


Conclusions

Facts

Very rich services and data collections are currently maintained by a multitude of authoritative providers

Several standards are adopted in the same domain

Interoperability approaches are key to exploiting such richness

D4Science offers a variety of patterns, tools, and solutions:

• to deliver interoperability solutions and interconnect
  – heterogeneous digital content
  – heterogeneous repository systems
  – heterogeneous computation platforms

• to decrease the cost of adoption

• to reduce the time to market of new ideas

• to deal with a plethora of standards


Supported Standards

WS-*, WSRF, WS-BPEL

JDL, JSDL, Glue Schema (part)

X-*, DC, TEI, ISO, etc.

JSR (several)

GSI-Security, XACML, SAML

OpenSearch

OGC-related

https://quality.wiki.d4science.research-infrastructures.eu/quality/index.php/Standards

Comply with: OAI-PMH, OAI-ORE


Supported Standards

WSRF Specifications

• WS-ResourceProperties (WSRF-RP)
• WS-ResourceLifetime (WSRF-RL)
• WS-ServiceGroup (WSRF-SG)
• WS-BaseFaults (WSRF-BF)

JSR

• 168: Simple Portlets
• 286: 168 update
• 160: JMX

WSN Specifications:

• WS-BaseNotification
• WS-Topics
• (WS-BrokeredNotification)

WS-* Standards

• SOAP
• WSDL
• WS-Addressing

ISO:

• ISO3166 countries
• ISO4217 currencies
• ISO1915 geo-location

X-*

• XML
• XSD
• XSL
• XSLT
• XPath
• XQuery

OGC

• Web Coverage Processing Service
• Web Coverage Service
• Web Feature Service
• Web Map Context
• Web Map Service
• Web Map Tile Service
• Web Processing Service
• Web Service Common

OGF Standard:

• Glue Schema (2)

……….

Comply with: OAI-PMH, OAI-ORE


Thanks

www.gcube-system.org

www.d4science.eu

Pasquale Pagano, D4Science-II Technical [email protected]

Donatella Castelli, D4Science-II Project [email protected]

Jessica Michel Assoumou, D4Science-II Administrative and Financial [email protected]

D4Science is powered by the open-source gCube framework