Grid-Enabling Data: Sticking Plaster, Sellotape, & Chewing Gum?

Colin C. Venters
National Centre for e-Social Science, University of Manchester



Terms of Reference

Data: numbers, characters, and images that can be processed and transmitted by humans and machines.
– Unstructured.
– Semi-structured.
– Structured.

Database Management System (DBMS): a suite of programs that manages the storage and retrieval of large, structured sets of persistent data.

Database: one or more large, structured sets of persistent data; a database is one component of a database management system.

Federated databases: data integration using middleware.
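To make the unstructured / semi-structured / structured distinction concrete, the Python sketch below holds one hypothetical fact (the population of an invented "Ward X") in each of the three forms and extracts it from each. The values, the table name, and the function names are illustrative assumptions, and SQLite simply stands in for a DBMS.

```python
import json
import re
import sqlite3

# Hypothetical example values: the same fact held in three different forms.
UNSTRUCTURED = "Ward X recorded a population of 1,200 in the 1991 census."
SEMI_STRUCTURED = '{"area": "Ward X", "year": 1991, "population": 1200}'


def make_structured_db() -> sqlite3.Connection:
    """Structured: rows conform to a schema that the DBMS (SQLite here) enforces."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE census (area TEXT, year INTEGER, population INTEGER)")
    conn.execute("INSERT INTO census VALUES (?, ?, ?)", ("Ward X", 1991, 1200))
    return conn


def population_from_text(text: str) -> int:
    # Unstructured: the figure has to be dug out with an ad hoc pattern.
    match = re.search(r"population of ([\d,]+)", text)
    if match is None:
        raise ValueError("no population figure found")
    return int(match.group(1).replace(",", ""))


def population_from_json(doc: str) -> int:
    # Semi-structured: keys describe the values, but no schema guarantees they exist.
    return int(json.loads(doc)["population"])


def population_from_table(conn: sqlite3.Connection, area: str, year: int) -> int:
    # Structured: a declarative query against a known schema.
    row = conn.execute(
        "SELECT population FROM census WHERE area = ? AND year = ?", (area, year)
    ).fetchone()
    if row is None:
        raise KeyError((area, year))
    return row[0]
```

The further the data sits toward the structured end, the less interpretation code each consumer has to write, which is one reason DBMS-managed data is the usual starting point for integration.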


What’s in a Grid?

Computational Grids - high-performance computing resources.

Data Grids - access to heterogeneous datasets.

Access Grid - an advanced, video-conferencing-based collaborative environment.

The Grid makes it possible to share heterogeneous, distributed resources over a network.


The Grid Metaphor (figure): Grid middleware linking visualization, workstations, mobile access, supercomputers and PC clusters, and DBMSs, sensors, and experiments over networks.


Data Integration

Unimpeded use of distributed, heterogeneous, autonomous data resources.
– An integrated view of the data resources that allows users to interact with them as if they constituted a single, global, integrated data resource.

Data integration fosters collaboration - one of the fundamental goals of e-research.

DBMSs currently offer limited support for Grid integration.
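One common way to give users an integrated view is a mediator: middleware that maps each autonomous source's local schema onto a single global schema and answers queries against that global view. The sketch below assumes two hypothetical in-memory SQLite databases with different schemas and a hypothetical global relation `area(code, population)`; it illustrates the idea only and is not how any particular Grid middleware implements integration.

```python
import sqlite3
from typing import List, Tuple


def make_source_a() -> sqlite3.Connection:
    # Hypothetical autonomous source A with its own schema.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE wards (ward_code TEXT, persons INTEGER)")
    conn.executemany("INSERT INTO wards VALUES (?, ?)", [("W1", 120), ("W2", 80)])
    return conn


def make_source_b() -> sqlite3.Connection:
    # Hypothetical autonomous source B: same kind of data, different schema.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE areas (id TEXT, pop INTEGER)")
    conn.execute("INSERT INTO areas VALUES (?, ?)", ("W3", 50))
    return conn


class Mediator:
    """Presents several sources as one global relation area(code, population)."""

    def __init__(self) -> None:
        # Each entry pairs a connection with SQL that maps its local schema
        # onto the global (code, population) view.
        self._sources: List[Tuple[sqlite3.Connection, str]] = []

    def register(self, conn: sqlite3.Connection, mapping_sql: str) -> None:
        self._sources.append((conn, mapping_sql))

    def areas(self) -> List[Tuple[str, int]]:
        # Query every source through its mapping and union the results.
        rows: List[Tuple[str, int]] = []
        for conn, sql in self._sources:
            rows.extend(conn.execute(sql).fetchall())
        return sorted(rows)

    def total_population(self) -> int:
        return sum(pop for _, pop in self.areas())


mediator = Mediator()
mediator.register(make_source_a(), "SELECT ward_code, persons FROM wards")
mediator.register(make_source_b(), "SELECT id, pop FROM areas")
```

The user sees only `areas()` and `total_population()`; the sources keep their own schemas and remain autonomous, and only the per-source mapping has to change if a source changes.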


Grid-Enabling: Grid Middleware

GridFTP
– A high-performance data transfer protocol.

Storage Resource Broker (SRB)
– A uniform interface to a virtual, distributed data storage resource.

Open Grid Services Architecture Data Access and Integration (OGSA-DAI)
– Grid Data Service (GDS).
  • Standard interface for database access.
– Grid Data Service Factory (GDSF).
  • Establishes a database service instance.
– Database Access and Integration Service Group Registry (DAISGR).
  • Identifies available database services.
– OGSA-DQP.
  • Distributed Query Processing, i.e. querying across multiple databases.
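The division of labour among the OGSA-DAI components listed above (a registry to discover services, a factory to create a service instance, a service with a standard query interface, and distributed query processing across several resources) can be sketched in Python. The class and method names here (`Registry`, `DataServiceFactory`, `DataService.perform`, `distributed_query`) are hypothetical and do not reproduce the OGSA-DAI API, and SQLite stands in for the underlying databases. The query step is a naive fan-out-and-union; a real distributed query processor such as OGSA-DQP plans and partitions queries rather than sending the same SQL everywhere.

```python
import sqlite3
from typing import Callable, Dict, List, Tuple


class DataService:
    """Hypothetical stand-in for a Grid Data Service: one standard query interface."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def perform(self, sql: str, params: tuple = ()) -> List[Tuple]:
        return self._conn.execute(sql, params).fetchall()


class DataServiceFactory:
    """Hypothetical stand-in for a service factory: creates a service instance on demand."""

    def __init__(self, connect: Callable[[], sqlite3.Connection]) -> None:
        self._connect = connect

    def create(self) -> DataService:
        return DataService(self._connect())


class Registry:
    """Hypothetical stand-in for a service registry: clients discover factories by name."""

    def __init__(self) -> None:
        self._factories: Dict[str, DataServiceFactory] = {}

    def register(self, name: str, factory: DataServiceFactory) -> None:
        self._factories[name] = factory

    def lookup(self, name: str) -> DataServiceFactory:
        return self._factories[name]

    def names(self) -> List[str]:
        return sorted(self._factories)


def distributed_query(registry: Registry, sql: str, params: tuple = ()) -> List[Tuple]:
    # Naive fan-out: run the same query on every registered resource, union the rows.
    rows: List[Tuple] = []
    for name in registry.names():
        service = registry.lookup(name).create()
        rows.extend(service.perform(sql, params))
    return rows


def survey_db(rows: List[Tuple[str, int]]) -> Callable[[], sqlite3.Connection]:
    # Hypothetical resource: each call opens a fresh in-memory database holding `rows`.
    def connect() -> sqlite3.Connection:
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE survey (respondent TEXT, income INTEGER)")
        conn.executemany("INSERT INTO survey VALUES (?, ?)", rows)
        return conn
    return connect


registry = Registry()
registry.register("siteA", DataServiceFactory(survey_db([("r1", 100), ("r2", 300)])))
registry.register("siteB", DataServiceFactory(survey_db([("r3", 200)])))
result = distributed_query(
    registry, "SELECT respondent, income FROM survey WHERE income >= ?", (200,)
)
```

Separating discovery (registry), instantiation (factory), and access (service) means a client needs to know only a resource's name and the standard query interface, not how or where the database is hosted.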


ConvertGrid

An ESRC pilot demonstrator project (PDP) in the e-Social Science Programme.

Research problem: investigating complex research questions that require the combination of datasets from multiple sources.

Data management:
– Access to multiple datasets.

Data fusion:
– Multiple geo-referenced datasets, i.e. different target geographies, e.g. 1991 Wards and 1991 Postcode Sectors.

ConvertGrid converts data sources with different native geographies to a common target geography.
– Output in CSV or XML format.
– Results returned as a string or as streams (FTP/HTTP/GridFTP).
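The slides do not say how ConvertGrid converts between geographies. One standard approach, assumed here, is apportionment through a lookup table: each source zone's count is split among target zones by a weight, and the shares are summed per target zone. The zone codes, weights, and counts below are hypothetical, and the function names are illustrative; the result is written as CSV, one of the output formats listed above.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical lookup table: (source zone, target zone, share of the source count).
LOOKUP: List[Tuple[str, str, float]] = [
    ("WardA", "Sector1", 0.75),
    ("WardA", "Sector2", 0.25),
    ("WardB", "Sector2", 1.0),
]


def check_weights(lookup: List[Tuple[str, str, float]]) -> None:
    # Each source zone's shares must sum to 1 so no counts are lost or invented.
    totals: Dict[str, float] = defaultdict(float)
    for src, _, weight in lookup:
        totals[src] += weight
    for src, total in totals.items():
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"weights for {src} sum to {total}, not 1")


def convert(counts: Dict[str, float], lookup: List[Tuple[str, str, float]]) -> Dict[str, float]:
    # Apportion each source-zone count to target zones, then sum per target zone.
    check_weights(lookup)
    target: Dict[str, float] = defaultdict(float)
    for src, tgt, weight in lookup:
        if src in counts:
            target[tgt] += counts[src] * weight
    return dict(target)


def to_csv(target: Dict[str, float]) -> str:
    # Serialise the converted counts as CSV, one row per target zone.
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["zone", "count"])
    for zone in sorted(target):
        writer.writerow([zone, target[zone]])
    return buf.getvalue()


ward_counts = {"WardA": 400, "WardB": 100}  # hypothetical counts on 1991 Wards
sector_counts = convert(ward_counts, LOOKUP)
```

Because each source zone's weights sum to 1, the total is preserved across the conversion (500 in both geographies here), so datasets converted to the same target geography can be compared or combined zone by zone.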


Different Target Geographies


ConvertGrid Architecture


Challenges

Scalability:
– Performance and capacity requirements.

Security:
– Using the Grid Security Infrastructure (GSI) at the Grid service client level is a non-trivial problem.

Heterogeneity:
– Infrastructural.
– Syntactic.
– Semantic.

Metadata:
– Adds context to data, aiding identification, location, and interpretation.
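Syntactic heterogeneity (different formats) and semantic heterogeneity (different names and units for the same concept) can both be reduced with per-source metadata that records how to interpret each source. In the sketch below, two hypothetical sources, one CSV and one XML, report population under different field names and units; a small metadata dictionary, itself an assumption, maps both onto a common record. Infrastructural heterogeneity and security are not addressed.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical sources: different syntax (CSV vs XML) and different semantics
# (a head count vs a figure expressed in thousands).
CSV_SOURCE = "zone,persons\nSector1,300\n"
XML_SOURCE = '<areas><area code="Sector2" pop_thousands="1.5"/></areas>'

# Hypothetical metadata: where each source keeps the zone and count,
# and the factor that converts its count into persons.
METADATA: Dict[str, Dict] = {
    "csv": {"zone_field": "zone", "count_field": "persons", "multiplier": 1},
    "xml": {"zone_field": "code", "count_field": "pop_thousands", "multiplier": 1000},
}


def read_csv(text: str) -> List[Dict[str, str]]:
    # Resolve syntax: CSV rows become dictionaries keyed by column name.
    return list(csv.DictReader(io.StringIO(text)))


def read_xml(text: str) -> List[Dict[str, str]]:
    # Resolve syntax: each <area> element's attributes become a dictionary.
    return [dict(el.attrib) for el in ET.fromstring(text).iter("area")]


def normalise(records: List[Dict[str, str]], meta: Dict) -> List[Dict]:
    # Resolve semantics: rename fields and convert units using the metadata.
    return [
        {
            "zone": r[meta["zone_field"]],
            "population": round(float(r[meta["count_field"]]) * meta["multiplier"]),
        }
        for r in records
    ]


combined = normalise(read_csv(CSV_SOURCE), METADATA["csv"]) + normalise(
    read_xml(XML_SOURCE), METADATA["xml"]
)
```

Keeping the interpretation rules in metadata rather than in code means a new source with yet another field name or unit can be added by describing it, without writing a new converter.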



Acknowledgements

ConvertGrid Team, University of Manchester:
– Keith Cole, Jon McLaren, Pascal Ekin, Linda Mason, Stephen Pickles, and Justin Hayes.

Paul Watson, University of Newcastle.

Alvaro Fernandes, University of Manchester.

Mike Mineter, National e-Science Centre, University of Edinburgh.