Brian Matthews, CRIS 2002, 31/08/02 1
Accessing the Outputs of Scientific Projects
Brian Matthews, Michael Wilson, Business & Information Technology Dept, CLRC
Kerstin Kleese-van Dam, E-Science Centre, CLRC
Overview
• Science produces two outputs:
– Conventional publications
– Science data sets
• In traditional science, the first is used as a measure of success.
– The second is locked away.
• In this talk I shall discuss:
– A general-purpose science data portal for allowing access to data sets
– Potential links to publications
• To make all the outputs of science available.
Who we are (CLRC)
• Central Laboratory of the Research Councils
• 1700 staff, supporting 12000 scientists and engineers from universities and industry
• Based at 3 sites:
– Daresbury Laboratory
– Rutherford Appleton Laboratory
– Chilbolton Observatory
• A multidisciplinary laboratory
A Multidisciplinary Laboratory
• Spallation Neutron and Muon Source (ISIS)
• Synchrotron Radiation Source (SRS)
• Lasers
• Microstructures
• Space Science and Technology
• Molecular Spectroscopy
• Earth Observation
• Atmospheric Science
• Computational Science
• Energy Research
• Information Technology
• Particle Physics
• Radio Communications
• Surfaces Transforms and Interfaces
The Problem
• Scientific institutions generate vast quantities of data
– CLRC: ISIS, SRS, Space Science, Particle Physics, Computational Science, ...
• More data coming on stream all the time:
– CERN-LHC, Diamond, CASIM, HGP, ...
• Very good at handling large amounts of data
• Diverse approaches to organising and distributing it
Need a usable way of gaining access to the data
User Scenarios
• Lecturer:
– This published study would be a good example for teaching; is the raw data publicly available?
• Researcher:
– This is an interesting paper; can I check the data?
• Experiment Proposer:
– Have there been any neutron or X-ray studies of this molecule at 100 K? What reports and papers have been published on them?
• Instrument Scientist:
– The instrument seems a bit unstable recently; fetch me the results of all calibration runs from the last 3 months. Is there a report on this instrument?
Need a usable way of gaining access to publications with data
The Data Portal Concept
• Single point of access to the CLRC data resources
• Encompasses a wide range of data holdings
– Describes what data is available from the facilities
– Links to the data held at the facility
– Different archiving methods
• Caters for a wide range of users
– From the general community to data curators
• Supports a wide range of queries
– Employing data mining, thesauri, ...
Combine Diverse Users & Searches ...
[Figure: searches range from discovery to excavation; users range from the general community and the wider science community to experimenters, specialist users and data curators.]
… with Distributed Data Silos …
[Figure: data held in separate silos at Facility 1, Facility 2, Facility 3 and Facility 4.]
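… using a central common metadata index ...
[Figure: a client connects over HTTP to the CLRC Data Access Server, which consults a common metadata catalogue database behind an XML wrapper; each facility (e.g. Facility 1) keeps its local data and local metadata and exposes them through its own XML wrapper.]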
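The central-index idea can be illustrated with a minimal Python sketch. It assumes that each facility's XML wrapper is a function turning local metadata records into a common XML form, and that the central catalogue merges those documents into one searchable index while the data stays at the facility. This is not the CLRC implementation; the record fields (`id`, `subject`) and the sample records are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical local metadata, held at each facility in its own format.
LOCAL_METADATA = {
    "Facility 1": [{"id": "F1-001", "subject": "Copper"}],
    "Facility 2": [{"id": "F2-001", "subject": "Palladium"},
                   {"id": "F2-002", "subject": "Copper"}],
}

def xml_wrapper(facility, records):
    """Expose one facility's local metadata as a common XML document."""
    root = ET.Element("Facility", name=facility)
    for rec in records:
        entry = ET.SubElement(root, "MetadataRecord", metadataID=rec["id"])
        ET.SubElement(entry, "Subject").text = rec["subject"]
    return ET.tostring(root, encoding="unicode")

def build_catalogue(wrapped_docs):
    """Merge the wrapped facility documents into a central metadata index."""
    catalogue = []
    for doc in wrapped_docs:
        root = ET.fromstring(doc)
        for entry in root.findall("MetadataRecord"):
            catalogue.append({
                "facility": root.get("name"),
                "id": entry.get("metadataID"),
                "subject": entry.findtext("Subject"),
            })
    return catalogue

def search(catalogue, subject):
    """Answer a query from the index alone; the data itself stays at the facility."""
    return [(e["facility"], e["id"]) for e in catalogue if e["subject"] == subject]

docs = [xml_wrapper(name, recs) for name, recs in LOCAL_METADATA.items()]
catalogue = build_catalogue(docs)
print(search(catalogue, "Copper"))  # [('Facility 1', 'F1-001'), ('Facility 2', 'F2-002')]
```

Because only metadata is held centrally, the index can be rebuilt at any time by re-harvesting the wrappers, and each facility keeps its own archiving method.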
… and a Web-based interface
• Exploit the existing Web infrastructure:
– Use new technologies (XML/RDF)
– Rapidly disseminated
– Widely accessible
– Database and user platform independent
– Can be developed now, but with the Grid in mind
Every user who needs to can get to the information.
Metadata
[Figure: a generic Science Metadata Model, specialised for ISIS, SRS, HEP, Space Science, Social Science and Env. Science.]
A generic metadata model for all scientific applications, with a specialisation for each domain
Can answer questions across domains
Can answer questions about specific domains
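A generic model with domain specialisations supports both kinds of question. The Python sketch below uses class inheritance to show this. It assumes a hypothetical subset of generic fields (`study_name`, `discipline`, `keywords`) and a hypothetical neutron-scattering specialisation that adds `instrument` and `temperature_k`. A cross-domain query touches only the generic fields. A domain-specific query, such as the experiment proposer's "studies at 100 K", needs the specialised fields.

```python
from dataclasses import dataclass

@dataclass
class StudyMetadata:
    """Generic fields shared by every domain (a hypothetical subset)."""
    study_name: str
    discipline: str
    keywords: tuple

@dataclass
class NeutronStudyMetadata(StudyMetadata):
    """Hypothetical neutron-scattering specialisation with extra domain fields."""
    instrument: str = ""
    temperature_k: float = 0.0

def cross_domain(records, keyword):
    """Uses only generic fields, so it works on every record."""
    return [r.study_name for r in records if keyword in r.keywords]

def neutron_studies_at(records, temperature_k):
    """Needs the specialisation: only neutron records carry a sample temperature."""
    return [r.study_name for r in records
            if isinstance(r, NeutronStudyMetadata) and r.temperature_k == temperature_k]

records = [
    StudyMetadata("Ozone profile", "Atmospheric Science", ("ozone",)),
    NeutronStudyMetadata("Copper complex", "Chemistry", ("copper",),
                         instrument="instrument-A", temperature_k=100.0),
    StudyMetadata("Copper surface", "Surface Science", ("copper", "surface")),
]

print(cross_domain(records, "copper"))     # ['Copper complex', 'Copper surface']
print(neutron_studies_at(records, 100.0))  # ['Copper complex']
```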
Metadata Model
A metadata object has six components:
• Topic: keywords providing an index on what the study is about
• Study Description: provenance about what the study is, who did it and when
• Access Conditions: conditions of use, giving information on who can access the data and how
• Data Description: a detailed description of the organisation of the data into data sets and files
• Data Location: locations providing navigation to where the data on the study can be found
• Related Material: references into the literature and community providing context about the study
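The six components can be expressed as one record type, as in the minimal Python sketch below. The Python field types (lists, dicts, strings) are assumptions chosen for illustration, and so are the sample values. The helper `missing_components` is hypothetical. It shows one way a portal could flag incomplete records for a curator.

```python
from dataclasses import dataclass, field

@dataclass
class MetadataObject:
    """One record with the six components of the metadata model (types assumed)."""
    topic: list               # keywords indexing what the study is about
    study_description: dict   # provenance: what the study is, who did it, when
    access_conditions: str    # who may access the data, and how
    data_description: dict    # organisation of the data into data sets and files
    data_location: list       # where the data can be found
    related_material: list = field(default_factory=list)  # literature and context

    def missing_components(self):
        """Hypothetical helper: names of components that are still empty."""
        return [name for name, value in vars(self).items() if not value]

record = MetadataObject(
    topic=["Crystal Structure", "Copper"],
    study_description={"name": "Hypothetical copper study", "investigator": "Porter"},
    access_conditions="Named investigators only",
    data_description={"Data-Set 1 (Raw)": ["file1"]},
    data_location=[],
)
print(record.missing_components())  # ['data_location', 'related_material']
```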
Study Description
• The Study is the basic unit for a scientific activity.
• Can be further divided into:
– Programmes: for connected studies
– Investigations: for a single measurement, experiment or simulation
[Figure: UML diagram. A STUDY has a Name, Id, Investigator and STUDY Info; a Programme groups studies; an Investigation, with a Data Manager, is an Experiment, Measurement or Simulation; the classes are linked by "contains" and "associated" relationships with multiplicities 0..1 and 0..*.]
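The Study, Programme and Investigation hierarchy can be sketched as nested Python dataclasses. This is a simplification under stated assumptions. The exact UML multiplicities are not reproduced, and the class and attribute names, the validation of investigation kinds, and the sample programme are all hypothetical. The query shows how the hierarchy can be walked, here to find every experiment in a programme together with its data manager.

```python
from dataclasses import dataclass, field
from typing import List

INVESTIGATION_KINDS = ("experiment", "measurement", "simulation")

@dataclass
class Investigation:
    """A single experiment, measurement or simulation."""
    kind: str
    data_manager: str

    def __post_init__(self):
        # Reject kinds outside the three named on the slide.
        if self.kind not in INVESTIGATION_KINDS:
            raise ValueError(f"unknown investigation kind: {self.kind}")

@dataclass
class Study:
    """The basic unit of scientific activity."""
    study_id: str
    name: str
    investigator: str
    investigations: List[Investigation] = field(default_factory=list)

@dataclass
class Programme:
    """A group of connected studies."""
    name: str
    studies: List[Study] = field(default_factory=list)

    def investigations_of_kind(self, kind):
        """(study id, data manager) for every investigation of one kind."""
        return [(s.study_id, i.data_manager)
                for s in self.studies for i in s.investigations if i.kind == kind]

programme = Programme("Hypothetical coordination chemistry programme", [
    Study("S1", "Copper complexes", "Porter",
          [Investigation("experiment", "Teat"), Investigation("simulation", "Jones")]),
    Study("S2", "Palladium complexes", "Porter", [Investigation("experiment", "Teat")]),
])
print(programme.investigations_of_kind("experiment"))  # [('S1', 'Teat'), ('S2', 'Teat')]
```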
Hierarchy of Data Holdings
• Investigations have associated data holdings.
• These are themselves arranged in a hierarchy of data sets and files, with links between them.
• Logical organisation: identity is separated from location.
[Figure: an Investigation has Data Holdings containing Data-Set 1 (Raw), Data-Set 2 (Inter) and Data-Set 3 (Final), each holding files described by name and date.]
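Separating identity from location means the metadata names a file logically, and a separate table resolves that name to wherever the file currently lives. The Python sketch below shows the idea. The class names, the logical file name `run42.raw` and both URLs are hypothetical. When the data migrates, only the location table changes, and the data-set hierarchy (including the links between raw, intermediate and final data sets) is untouched.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class DataFile:
    logical_name: str   # identity: the stable name recorded in the metadata
    date: str

@dataclass
class DataSet:
    name: str
    stage: str          # e.g. "Raw", "Inter", "Final"
    files: List[DataFile] = field(default_factory=list)
    derived_from: List[str] = field(default_factory=list)  # links between data sets

# Location table kept apart from the hierarchy (hypothetical URLs).
locations: Dict[str, str] = {
    "run42.raw": "ftp://archive.example.org/tape7/run42.raw",
}

def resolve(data_file, table):
    """Map a logical file name to its current physical location."""
    return table[data_file.logical_name]

raw = DataSet("Data-Set 1", "Raw", [DataFile("run42.raw", "21/04/1999")])
inter = DataSet("Data-Set 2", "Inter", derived_from=["Data-Set 1"])
final = DataSet("Data-Set 3", "Final", derived_from=["Data-Set 2"])

print(resolve(raw.files[0], locations))
locations["run42.raw"] = "https://disk.example.org/run42.raw"  # the data migrates
print(resolve(raw.files[0], locations))  # same identity, new location
```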
Metadata example
<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE CLRCMetadata SYSTEM "clrcmetadata.dtd"><CLRCMetadata><MetadataRecord metadataID="N000001">
<Topic><Discipline>Chemistry</Discipline><Subject>Crystal Structure</Subject><Subject>Copper</Subject>...
<Experiment><StudyName>Crystal Structure: Copper : Palladium: :complex: 150K ...<Investigator><Name><Surname>Porter...<Institution>University of Peebles ...<Funding>EPSRC ...<TimePeriod><StartDate><Date>21/04/1999….<Purpose><Abstract>
To study the structure of Copper and Palladium co-ordination complexes at a 150K. <DataManager><Name><Surname>Teat...<Instrument>SRS Station 9.8, BRUKER AXS SMART 1K...<Condition>...Wavelength...<Units>Angstrom...<ParamValue>0.6890...<Condition>…Crystal-to-detector distance<Units>cm...<ParamValue>5.00...
<AccessConditions>The user has to be one of: Prof. F. Porter….
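The example record above is elided, so it cannot be parsed as shown. The Python sketch below parses a cut-down, well-formed record that reuses element names from the example (`MetadataRecord`, `Topic`, `Discipline`, `Subject`, `Investigator`, `Surname`, `Condition`, `Units`, `ParamValue`) and the values visible there. The nesting is an assumption. The `ParamName` element is hypothetical, standing in for the elided part of `<Condition>`. The DOCTYPE is omitted because the DTD is not available.

```python
import xml.etree.ElementTree as ET

# Cut-down record; the nesting and the ParamName element are assumptions.
RECORD = """<CLRCMetadata>
  <MetadataRecord metadataID="N000001">
    <Topic>
      <Discipline>Chemistry</Discipline>
      <Subject>Crystal Structure</Subject>
      <Subject>Copper</Subject>
    </Topic>
    <Experiment>
      <Investigator><Name><Surname>Porter</Surname></Name></Investigator>
      <Condition>
        <ParamName>Wavelength</ParamName>
        <Units>Angstrom</Units>
        <ParamValue>0.6890</ParamValue>
      </Condition>
    </Experiment>
  </MetadataRecord>
</CLRCMetadata>"""

def summarise(xml_text):
    """Pull the topic, investigator and experimental conditions out of a record."""
    rec = ET.fromstring(xml_text).find("MetadataRecord")
    conditions = {
        c.findtext("ParamName"): (float(c.findtext("ParamValue")), c.findtext("Units"))
        for c in rec.iter("Condition")
    }
    return {
        "id": rec.get("metadataID"),
        "discipline": rec.findtext("Topic/Discipline"),
        "subjects": [s.text for s in rec.findall("Topic/Subject")],
        "investigator": rec.findtext(".//Investigator/Name/Surname"),
        "conditions": conditions,
    }

print(summarise(RECORD))
```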
Metadata collection
• Metadata collection and maintenance is a big problem.
• But doing science is a process.
[Figure: Submit proposal → Prepare experiment → Generate results → Analyse results → Write report, with metadata accumulating along the way: provenance metadata and access conditions, then data description, data location and related material.]
Collecting the metadata can then become part of the experimental support environment.
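The point of the slide is that each stage of the experimental process can capture part of the metadata record. The minimal Python sketch below shows metadata accumulating as the stages complete. The assignment of components to stages is an assumption read loosely from the figure, and the component names are hypothetical identifiers.

```python
# Hypothetical assignment of metadata components to process stages;
# the figure shows metadata accumulating, but this exact split is assumed.
STAGES = [
    ("Submit proposal", {"provenance", "access_conditions"}),
    ("Prepare experiment", {"data_description"}),
    ("Generate results", {"data_location"}),
    ("Analyse results", set()),
    ("Write report", {"related_material"}),
]

def captured_after(stop_after):
    """Metadata components available once the named stage has completed."""
    captured = set()
    for stage, components in STAGES:
        captured |= components
        if stage == stop_after:
            return captured
    raise ValueError(f"unknown stage: {stop_after}")

print(sorted(captured_after("Generate results")))
# ['access_conditions', 'data_description', 'data_location', 'provenance']
```

Capturing each component at the stage that produces it avoids asking scientists to reconstruct provenance or locations after the fact.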
Architecture
[Figure: users and other data portals connect to the CLRC Data Portal, which is backed by a common metadata catalogue database behind an XML wrapper; a CLRC broker reaches Facilities 1 to 4 over Grid middleware, each facility exposing its local data and local metadata through its own XML wrapper.]
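Server Architecture
[Figure: the USER's HTTP request passes through a user input interpreter to a query generator; an XML parser evaluates queries against the central metadata repository and local metadata repositories (XML files); a response generator and a user output generator, driven by pre-set XSL scripts, return the results. Key: internal module, external agent, http, ASCII file.]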
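The server pipeline (input interpreter, query generator, XML parser, response generator) can be sketched in Python as a chain of small functions. This sketch makes several assumptions. The repository content and the `facility` attribute are hypothetical. Queries use ElementTree's limited XPath support rather than whatever query mechanism the portal used. The response generator formats plain text as a stand-in for the pre-set XSL scripts, because the Python standard library has no XSLT processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical central repository content (an XML file in the real system).
CENTRAL = """<Catalogue>
  <MetadataRecord metadataID="N000001" facility="SRS"><Subject>Copper</Subject></MetadataRecord>
  <MetadataRecord metadataID="N000002" facility="ISIS"><Subject>Copper</Subject></MetadataRecord>
  <MetadataRecord metadataID="N000003" facility="ISIS"><Subject>Nickel</Subject></MetadataRecord>
</Catalogue>"""

def interpret_input(form):
    """User input interpreter: validate the fields of an HTTP form."""
    subject = form.get("subject", "").strip()
    # Reject quotes so the term cannot break out of the generated query.
    if not subject or "'" in subject:
        raise ValueError("a plain subject term is required")
    return subject

def generate_query(subject):
    """Query generator: build an ElementTree path expression."""
    return f".//MetadataRecord[Subject='{subject}']"

def run_query(xml_text, query):
    """XML parser step: evaluate the query against a metadata repository."""
    return ET.fromstring(xml_text).findall(query)

def generate_response(records):
    """Response generator: plain-text stand-in for the pre-set XSL scripts."""
    lines = [f"{r.get('facility')}: {r.get('metadataID')}" for r in records]
    return "\n".join(lines) if lines else "No matching studies."

query = generate_query(interpret_input({"subject": "Copper"}))
print(generate_response(run_query(CENTRAL, query)))
```

Keeping each stage separate mirrors the slide's module boundaries, so the output formatting could be swapped for real XSL transformations without touching query generation.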
Example
Result of searching across facilities: the portal returns XML to the session and displays a summary.
Select data
Pick the required data files and download them from a convenient location.
Current developments
• Pilot completed
• Consolidate and broaden the existing system
– Move towards a development system
– Handle a greater diversity of data sources, e.g. Max Planck Institute for Meteorology
• Enhance the technology
– Web services (SOAP, WSDL, OGSA, XML Query)
• Provide links to other information sources:
– Library systems
– Thesauri
Interface with existing archives
• CLRC maintains existing data archives
– Atmospheric, earth observation, STP, astronomy
– Existing access mechanisms (Web, Z39.50)
– Existing metadata catalogues and formats
• Can we use the Data Portal to access them?
– Use the metadata format as a framework, specialised to express each archive's existing metadata
– XML Query as a query layer on the archive
Re-architect system
• Break up the portal middleware into components.
[Figure: the Data Portal (DP) split into components: results collation, data source location, query generation, ontology service, security service, replication service and user service; related technologies shown are Globus GIS (MDS), Globus GSI, Web Services, RDF+DAML+OIL and XML Query, with the aim to Grid-enable the components.]
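Breaking the middleware into components means the portal only orchestrates calls to separate services. The Python sketch below shows three of the named components (ontology service, data source location, results collation) as plain functions that a portal function composes. In the proposed design each would be a separate Web or Grid service. The registry, thesaurus, and `fake_query_site` are hypothetical stand-ins, and the security, replication and user services are omitted.

```python
from typing import Callable, Dict, List

def expand_terms(topic: str, thesaurus: Dict[str, List[str]]) -> List[str]:
    """Ontology service: widen a query term with related terms."""
    return [topic] + thesaurus.get(topic, [])

def locate_sources(term: str, registry: Dict[str, List[str]]) -> List[str]:
    """Data source location: which sites hold metadata on a term."""
    return [site for site, terms in registry.items() if term in terms]

def collate(results: Dict[str, List[str]]) -> List[str]:
    """Results collation: merge per-site answers in site order, dropping duplicates."""
    merged: List[str] = []
    for site in sorted(results):
        for item in results[site]:
            if item not in merged:
                merged.append(item)
    return merged

def portal_search(topic, registry, thesaurus,
                  query_site: Callable[[str, str], List[str]]) -> List[str]:
    """The portal itself only orchestrates the component services."""
    terms = expand_terms(topic, thesaurus)
    sites = {s for t in terms for s in locate_sources(t, registry)}
    return collate({s: [r for t in terms for r in query_site(s, t)] for s in sites})

# Hypothetical registry and thesaurus for illustration.
registry = {"ISIS": ["neutron diffraction"], "SRS": ["X-ray diffraction"]}
thesaurus = {"diffraction": ["neutron diffraction", "X-ray diffraction"]}

def fake_query_site(site: str, term: str) -> List[str]:
    """Stand-in for a remote query to one facility."""
    return [f"{site}/{term}"] if term in registry[site] else []

print(portal_search("diffraction", registry, thesaurus, fake_query_site))
# ['ISIS/neutron diffraction', 'SRS/X-ray diffraction']
```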
Access to Data and Publications
• The Data Portal offers the potential to integrate the outputs of scientific research: data and publications.
• Need a common search mechanism over library and data portals.
– Can abstract the science metadata to Dublin Core.
– Links to CERIF would further deepen the connection.
– Access to common thesauri for classification.
• Common web service interface
– The Data Portal provides this.
– XML Query as a communication mechanism
Mapping between Dublin Core and Science Metadata
• Title – Study: Name
• Creator – Study: Investigator: Name (role is principal investigator)
• Subject – Topic: Keyword
• Description – Study: Study Information: Purpose
• Publisher – Investigation: Data Manager
• Contributor – Study: Investigator: Name; Investigation: Data Manager
• Date – Study: Study Information: Time
• Resource Type – Collection; or Dataset
• Format – Data Description: File Format
• Resource Identifier – Study: Study Id (whole study); Data Description: File: URI (individual data files)
• Source – Data Description: Data Sets: Related Data Sets; Related Material: Related Work
• Language – not covered in the current metadata format, but a simple extension
• Relation – Related Material: Related Work
• Coverage – Data Description: Logical Description: Coverage
• Rights Management – Access Conditions
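A mapping like this is usually applied as a crosswalk table. The Python sketch below implements a subset of the mapping above. It assumes the science record has been flattened into a dictionary, and the dotted keys (`study.name`, `investigation.data_manager`, and so on) are hypothetical stand-ins for the real XML elements. The sample values are also hypothetical. Dublin Core elements with no source value are simply left out of the result.

```python
# Subset of the mapping above; the dotted source keys are hypothetical.
CROSSWALK = {
    "title": ["study.name"],
    "creator": ["study.investigator"],
    "subject": ["topic.keyword"],
    "description": ["study.info.purpose"],
    "publisher": ["investigation.data_manager"],
    "contributor": ["study.investigator", "investigation.data_manager"],
    "date": ["study.info.time"],
    "format": ["data.file_format"],
    "identifier": ["study.id"],
    "rights": ["access_conditions"],
}

def to_dublin_core(record):
    """Project a flattened science record onto Dublin Core elements."""
    dc = {}
    for element, paths in CROSSWALK.items():
        values = []
        for path in paths:
            value = record.get(path)
            if value is None:
                continue  # absent in this record: nothing to map
            values.extend(value if isinstance(value, list) else [value])
        if values:
            dc[element] = values
    return dc

record = {
    "study.name": "Hypothetical copper-palladium study",
    "study.investigator": "Porter",
    "topic.keyword": ["Crystal Structure", "Copper"],
    "investigation.data_manager": "Teat",
    "access_conditions": "Named investigators only",
}
print(to_dublin_core(record)["contributor"])  # ['Porter', 'Teat']
```

Because the crosswalk is data rather than code, the missing Language element or the Resource Type choice between Collection and Dataset could be added as further entries without changing the function.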
Where are we?
• Data Portal up and running
– Being developed in the E-Science Centre in CLRC
• http://esc.dl.ac.uk:9000/index.html
– Science metadata proving very robust
– Trying to extend its use into other areas of science: materials science, environmental science
• Beginning to approach the problem of integrating with electronic library resources.