OGSA-DAI
Data Access and Integration for the Grid

Neil Chue Hong

N.ChueHong@epcc.ed.ac.uk

http://www.ogsadai.org.uk


Overview

Motivation
Goals
Partners
Features
Projects
Further information
Overview and demo of FirstDIG/INWA


OGSA-DAI Motivation

Entering an age of data
  – Data explosion
      • CERN: LHC will generate 1 GB/s = 10 PB/y
      • VLBA (NRAO) generates 1 GB/s today
      • Pixar generates 100 TB/movie
  – Storage getting cheaper
Data stored in many different ways
  – Data resources
      • Relational databases
      • XML databases
      • Flat files
Need ways to facilitate
  – Data discovery
  – Data access
  – Data integration
Empower e-Business and e-Science
  – The Grid is a vehicle for achieving this


Goals for OGSA-DAI

Aim to deliver application mechanisms that:
  – Meet the data requirements of Grid applications
      • Functionality, performance and reliability
      • Reduce development cost of data-centric Grid applications
      • Provide consistent interfaces to data resources
  – Are acceptable to and supportable by database providers
      • Trustable, imposed demand is acceptable, etc.
      • Provide a standard framework that satisfies standard requirements
A base for developing higher-level services
  – Data federation
  – Distributed query processing
  – Data mining
  – Data visualisation


Integration Scenario

A patient moves hospital

[Figure: three data sources (DB2, Oracle, CSV file) holding Data A, Data B and Data C are combined into an amalgamated patient record]

A: (PID, name, address, DOB)
B: (PID, first_contact)
C: (PID, first_name, last_name, address, first_contact, DOB)


Why OGSA-DAI?

Why use OGSA-DAI over JDBC?
  – Language independence at the client end
      • Do not need to use Java
  – Platform independence
      • Do not have to worry about connection technology and drivers
  – Can handle XML and file resources
  – Can embed additional functionality at the service end
      • Transformations, compression, third-party delivery
      • Avoiding unnecessary data movement
  – Provision of metadata is powerful
  – Usefulness of the Registry for service discovery
      • Dynamic service binding process
  – The quickest way to make data accessible on the Grid
      • Installation and configuration of OGSA-DAI is fast and straightforward


Project Partners

Powered by ….

Funded by the Grid Core Programme

OGSA-DAI
  – £3 million, 18 months, from Feb 2002
  – Three major releases, three interim releases
DAIT (DAI-Two)
  – Keep the OGSA-DAI brand name
  – £1.5 million, 24 months, from Oct 2003
  – Four major releases
GGF DAIS WG
  – Strong involvement
  – Standardise the interfaces
  – OGSA-DAI to be a reference implementation


Core features

An extensible framework for building applications
  – Supports relational, XML and some files
      • MySQL, Oracle, DB2, SQL Server, Postgres, XIndice, CSV, EMBL
  – Supports various delivery options
      • SOAP, FTP, GridFTP, HTTP, files, email, inter-service
  – Supports various transforms
      • XSLT, ZIP, GZip
  – Supports message level security using X509 certificates
  – Client Toolkit library for application developers
  – Comprehensive documentation and tutorials
Third production release is coming in November
  – OGSI/GT3 based
  – Also previews of WS-I and WS-RF/GT4 releases
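
To give a feel for what a transform such as GZip does to a result before it is delivered, here is a minimal sketch using only the Java standard library. It is not OGSA-DAI code: the class name GzipTransformSketch and both method names are hypothetical, and the sample XML string is made up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: plain java.util.zip, not the OGSA-DAI transform activity.
final class GzipTransformSketch {

    /** Compresses a textual result (for example an XML rowset) with GZip. */
    static byte[] compress(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reverses compress(): what a client would do after delivery. */
    static String decompress(byte[] compressed) {
        try (GZIPInputStream gzip =
                     new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up, repetitive result text so that compression has something to do.
        StringBuilder rows = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            rows.append("<row><pid>").append(i).append("</pid></row>");
        }
        byte[] packed = compress(rows.toString());
        System.out.println("original bytes:   " + rows.length());
        System.out.println("compressed bytes: " + packed.length);
        System.out.println("round trip ok:    " + rows.toString().equals(decompress(packed)));
    }
}
```

Running the transform at the service end means only the compressed bytes travel to the client, which is the point of embedding it there.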


Activities are the drivers

Express a task to be performed by a GDS (Grid Data Service)
Three broad classes of activities:
  – Statement
  – Transformations
  – Delivery
Extensible:
  – Easy to add new functionality
  – Does not require modification to the service interface
  – Extensions operate within the OGSA-DAI framework
Functionality:
  – Implemented at the service
  – Work where the data is (no need to move data back)
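
The idea of chaining statement, transformation and delivery activities behind one unchanged service operation can be sketched as follows. This is an illustration under stated assumptions, not the OGSA-DAI API: the Activity interface, the factory methods and perform() are all hypothetical names, and the "statement" simply returns canned text instead of querying a database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch only: these types are NOT the OGSA-DAI API.
final class ActivityPipelineSketch {

    /** One unit of work executed at the service, next to the data. */
    interface Activity extends UnaryOperator<String> { }

    /** Statement activity: stands in for running a query; returns canned data here. */
    static Activity statement(String cannedResult) {
        return ignoredInput -> cannedResult;
    }

    /** Transformation activity: reshapes the output of the previous activity. */
    static Activity upperCase() {
        return String::toUpperCase;
    }

    /** Delivery activity: hands the data to a sink and passes it on unchanged. */
    static Activity deliverTo(List<String> sink) {
        return data -> {
            sink.add(data);
            return data;
        };
    }

    /** Runs the activities in order; a new kind of activity needs no new operation. */
    static String perform(List<Activity> request) {
        String data = "";
        for (Activity activity : request) {
            data = activity.apply(data);
        }
        return data;
    }

    public static void main(String[] args) {
        List<String> delivered = new ArrayList<>();
        List<Activity> request = Arrays.asList(
                statement("alice,bob"), upperCase(), deliverTo(delivered));
        System.out.println(perform(request));
        System.out.println(delivered);
    }
}
```

Because every activity has the same shape, adding a new one (say, a compression step) only means writing another Activity; the single perform operation stays as it is, which mirrors the extensibility claim on this slide.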


Client Toolkit

Why? Nobody wants to write XML!
A programming API which makes writing applications easier
  – Now: Java
  – Next: Perl, C, C#?, ML!?

    // Create a query
    SQLQuery query = new SQLQuery(SQLQueryString);
    ActivityRequest request = new ActivityRequest();
    request.addActivity(query);

    // Perform the query
    Response response = gds.perform(request);

    // Display the result
    ResultSet rs = query.getResultSet();
    displayResultSet(rs, 1);


e-Digital MammOgraphy National Database

Built a prototype of a national database of mammographic images in support of the UK Breast screening programme
Employ Grid technologies to facilitate this process


[Figure: architecture diagram. Labels: four sites (UCL, KCL, UED, CHU); DB2 Content Manager; DB2 Federation; OGSA-DAI; Database; Files; Core Services; Data Load; Training App; Training Services; Core API; Training API; Training Application; Core & Training API]


GeneGrid

Grid Based Framework for Bioinformatics
  – Virtual Bioinformatics Laboratory
  – Integration of Existing Technologies & Data Sets
  – Gene Study in Silico
  – Develop Specialist Data Sets
  – Grid Services for Commercial or 3rd Party Use
Data resources as XML collections (XIndice), flat files and relational databases (MySQL)
  – OGSA-DAI plus custom extensions
  – Beta testers for file based activities

http://www.qub.ac.uk/escience/projects/genegrid/


Distributed Query Processing

Queries mapped to algebraic expressions for evaluation
Parallelism represented by partitioning queries
  – Use exchange operators
Prototype available from:
  – http://www.ogsadai.org.uk

[Figure: example query plan. Operators: table_scan(protein); table_scan(proteinTerm) with predicate termID=S92; reduce (three times); hash_join(proteinId); op_call(Blast); exchange (twice). Partitions labelled 1, 2 and 3,4]
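
The two scans and the hash_join(proteinId) operator from the plan can be sketched in a few lines. This is a single-process illustration, not the distributed query processor itself: the class and method names are hypothetical, the sequence column and the sample rows are invented, proteinId is assumed to be unique in the protein table, and the reduce, op_call(Blast) and exchange operators are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; not the OGSA-DAI distributed query processor.
final class HashJoinSketch {

    /** Row of the protein table; the sequence column is an assumption. */
    static final class Protein {
        final String proteinId;
        final String sequence;

        Protein(String proteinId, String sequence) {
            this.proteinId = proteinId;
            this.sequence = sequence;
        }
    }

    /** Row of the proteinTerm table. */
    static final class ProteinTerm {
        final String proteinId;
        final String termId;

        ProteinTerm(String proteinId, String termId) {
            this.proteinId = proteinId;
            this.termId = termId;
        }
    }

    /** Scans proteinTerm with termID = wantedTerm, then hash-joins on proteinId. */
    static List<Protein> join(List<Protein> proteins,
                              List<ProteinTerm> terms,
                              String wantedTerm) {
        // Build phase: hash the protein rows by the join key.
        Map<String, Protein> byId = new HashMap<>();
        for (Protein p : proteins) {
            byId.put(p.proteinId, p);
        }
        // Probe phase: look up each selected proteinTerm row in the hash table.
        List<Protein> result = new ArrayList<>();
        for (ProteinTerm t : terms) {
            if (!t.termId.equals(wantedTerm)) {
                continue; // the scan predicate filters this row out
            }
            Protein match = byId.get(t.proteinId);
            if (match != null) {
                result.add(match);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Protein> proteins = Arrays.asList(
                new Protein("P1", "MKT"), new Protein("P2", "GAV"), new Protein("P3", "LLS"));
        List<ProteinTerm> terms = Arrays.asList(
                new ProteinTerm("P1", "S92"), new ProteinTerm("P2", "S10"),
                new ProteinTerm("P3", "S92"));
        for (Protein p : join(proteins, terms, "S92")) {
            System.out.println(p.proteinId + " " + p.sequence);
        }
    }
}
```

In a partitioned plan, an exchange operator would route rows between partitions so that rows with the same proteinId meet on the same node before a join like this one runs locally.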


GridMiner

Test application area: medical
  – traumatic brain injury treatment
  – Predicting the outcome of seriously ill patients
  – analytical part focuses on data mining and On-Line Analytical Processing (OLAP)
Target:
  – provide tools to discover and access relevant knowledge and information from different distributed and heterogeneous data sources
  – building on and extending OGSA-DAI

http://www.gridminer.org/


GridMiner Scenario

Heterogeneities:
  – Name in A is "First Last" (as the target format)
  – Name in C has to be combined
Distribution:
  – 3 data sources
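
The name heterogeneity and the merge across sources can be sketched as below. It is one plausible reading of the scenario, not GridMiner or OGSA-DAI code: the class and method names are hypothetical, rows are modelled as plain maps keyed by the column names from the integration scenario (PID, name, first_name, last_name, address, first_contact), the sample values are invented, and the rule that earlier sources win on a clash is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; column names follow the integration scenario slide.
final class PatientRecordSketch {

    /** Source C splits the name; the target format is "First Last" as in source A. */
    static String combineName(String firstName, String lastName) {
        return firstName.trim() + " " + lastName.trim();
    }

    /** Rewrites a row from source C so that it carries a single "name" column. */
    static Map<String, String> normaliseC(Map<String, String> rowC) {
        Map<String, String> out = new LinkedHashMap<>(rowC);
        String first = out.remove("first_name");
        String last = out.remove("last_name");
        out.put("name", combineName(first, last));
        return out;
    }

    /** Merges rows that share a PID; earlier sources win when a column repeats. */
    @SafeVarargs
    static Map<String, String> amalgamate(Map<String, String>... rows) {
        Map<String, String> record = new LinkedHashMap<>();
        for (Map<String, String> row : rows) {
            String seenPid = record.get("PID");
            if (seenPid != null && !seenPid.equals(row.get("PID"))) {
                throw new IllegalArgumentException("rows belong to different patients");
            }
            for (Map.Entry<String, String> e : row.entrySet()) {
                record.putIfAbsent(e.getKey(), e.getValue());
            }
        }
        return record;
    }

    /** Small helper to build a row from alternating column names and values. */
    static Map<String, String> row(String... keyValues) {
        Map<String, String> r = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            r.put(keyValues[i], keyValues[i + 1]);
        }
        return r;
    }

    public static void main(String[] args) {
        Map<String, String> a = row("PID", "42", "name", "Ada Lovelace", "address", "1 High St");
        Map<String, String> b = row("PID", "42", "first_contact", "2001-05-01");
        Map<String, String> c = row("PID", "42", "first_name", "Ada", "last_name", "Lovelace",
                "address", "1 High St");
        System.out.println(amalgamate(a, b, normaliseC(c)));
    }
}
```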


Future work

Architecture review
  – better concurrency model
  – better AAA framework
  – better definition of extensibility points
      • security, activities, dynamic configuration, mobile code, …
Improved support for
  – WS Security profiles
  – Stored procedures
  – Data transport
  – XQuery
  – Database specific datatypes and SQL
Additionally
  – JDBC and ODBC driver for OGSA-DAI
  – Contribution process


Further information

The OGSA-DAI Project Site:
  – http://www.ogsadai.org.uk
The DAIS-WG site:
  – http://forge.gridforum.org/projects/dais-wg/
OGSA-DAI Users Mailing list
  – [email protected]
  – General discussion on grid DAI matters
Formal support for OGSA-DAI releases
  – http://www.ogsadai.org.uk/support
  – [email protected]

OGSA-DAI training courses


Project Membership

[Figure: project organisation chart. Roles: Principal Investigators; Project Manager; Programme Management Board Chair; Technical Review Board Chair; Research Team; EPCC Team; IBM Development Team; IBM Dissemination Team. First names shown: Charaka, Tom, Mike, Ally, Amy, Mario, Malcolm, Kostas, Norman, Paul, Neil, Andy, Simon, Dave, Patrick, Neil]


The End

Questions?