EPrints Workshop, January 2005 1
eBank UK: Dissemination of research data using EPrints
Simon Coles, School of Chemistry, University of Southampton
Overview
• Scholarly communications in chemistry
  – Data, information, workflows and provenance
• The data publication bottleneck
  – e-Science and chemistry
• eBank UK
  – Information architecture, data flow and interoperability
• Challenges for the future
  – Expansion into other disciplines and data formats
Research & e-Science workflows
[Figure: The scholarly knowledge cycle. Liz Lyon, eBank UK article, Ariadne, July 2003.]
Stages in the cycle:
• Data creation / capture / gathering: laboratory experiments, Grids, fieldwork, surveys, media
• Data analysis, transformation, mining, modelling
• Data curation: databases & databanks
• Repositories: institutional, e-prints, subject, data, learning objects
• Peer-reviewed publications: journals, conference proceedings
• Aggregator services: national, commercial
• Presentation services: subject, media-specific, data, commercial portals
Links between stages: deposit / self-archiving, validation, publication, harvesting metadata, searching, harvesting, embedding, resource discovery, linking
Learning & Teaching workflows
[Figure: The scholarly knowledge cycle extended with learning and teaching workflows, with eBank UK as the aggregator service.]
Additional elements alongside the research & e-Science workflows:
• Aggregator services: eBank UK
• Learning object creation, re-use
• Institutional presentation services: portals, Learning Management Systems, u/g, p/g courses, modules
• Quality assurance bodies
Links between stages: deposit / self-archiving, validation, publication, harvesting metadata, searching, harvesting, embedding, resource discovery, linking
Current chemistry publishing protocols
• Ideas and interpretations
• Hooks into the literature
• Results & derived data
• Raw data!
The data deluge
• EPSRC National Crystallography Service
• Data overload!
• How do we disseminate?
CombeChem: e-Science testbed
[Figure: CombeChem e-Science testbed. Components: X-ray e-Lab (diffractometer), properties e-Lab, simulation, video, analysis, properties and a structures database, connected through Grid middleware.]
Establishing common ground…
• Understand the data creation process
• Terminology and definitions
  – Data
  – Metadata
  – Datafile
  – Dataset
  – Data holding
• Different views
  – Digital library researchers, computer scientists, chemists
  – Generic vs specific
  – Modeller vs practitioner
• Aim for a common ontology
• Modelling the domain
• Creating a metadata schema
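The terminology above nests naturally: datafiles group into datasets, and the datasets from one experiment form a data holding described by a single metadata record. A minimal Python sketch of that containment, assuming this nesting; the class and field names are hypothetical and not taken from the eBank UK schema:

```python
from dataclasses import dataclass, field


@dataclass
class Datafile:
    """A single file produced by an experiment (hypothetical fields)."""
    name: str
    mime_type: str


@dataclass
class Dataset:
    """A group of datafiles produced together, e.g. by one workflow stage."""
    label: str
    files: list[Datafile] = field(default_factory=list)


@dataclass
class DataHolding:
    """All datasets from one experiment, described by one metadata record."""
    identifier: str
    metadata: dict[str, str] = field(default_factory=dict)
    datasets: list[Dataset] = field(default_factory=list)

    def all_files(self) -> list[str]:
        """Flatten the holding into the names of every datafile it contains."""
        return [f.name for ds in self.datasets for f in ds.files]
```

Keeping metadata on the holding rather than on each file mirrors the "one record per experiment" view; a schema could equally attach records to datasets.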
Crystallography workflow
• Initialisation: mount new sample on diffractometer & set up data collection
• Collection: collect data
• Processing: process and correct images
• Solution: solve structures
• Refinement: refine structure
• CIF: produce CIF (Crystallographic Information File format)
• Report: generate Crystal Structure Report
Data produced along the workflow: RAW DATA → DERIVED DATA → RESULTS DATA
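The workflow can be captured as an ordered list of stages, each tagged with the kind of data it yields. The sketch below is a minimal Python illustration; the slide shows the raw/derived/results progression but not which stage produces which category, so the mapping here is an assumption:

```python
from enum import Enum


class DataCategory(Enum):
    RAW = "raw data"
    DERIVED = "derived data"
    RESULTS = "results data"


# Stages in the order given on the slide. The category attached to each
# stage is an illustrative assumption, not the eBank UK definition.
WORKFLOW = [
    ("Initialisation", None),  # sets up the experiment; no data file assumed
    ("Collection", DataCategory.RAW),
    ("Processing", DataCategory.DERIVED),
    ("Solution", DataCategory.DERIVED),
    ("Refinement", DataCategory.DERIVED),
    ("CIF", DataCategory.RESULTS),
    ("Report", DataCategory.RESULTS),
]


def datasets_by_category(workflow):
    """Group stage names by the kind of data each stage is assumed to produce."""
    groups = {category: [] for category in DataCategory}
    for stage, category in workflow:
        if category is not None:
            groups[category].append(stage)
    return groups


if __name__ == "__main__":
    for category, stages in datasets_by_category(WORKFLOW).items():
        print(f"{category.value}: {', '.join(stages)}")
```

Keeping the stages in an ordered list preserves the sequence, so an archive entry could present its datasets in the order the experiment produced them.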
Deposition into the archive
An Archive entry
ecrystals.chem.soton.ac.uk
Access to the underlying data
Some metadata issues
• Using simple and qualified Dublin Core
• Additional chemical information in schema for harvesting, e.g. empirical formula
• Schema contains International Chemical Identifier (InChI)
• Links to all datasets associated with an experiment
• Links to individual datasets within an experiment
• Links to EPrints (and other published literature) derived from the data
• Using vocabularies specific to crystallography
• Engaging the broader scientific community to ensure different schemas are compliant and standards can emerge
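A qualified Dublin Core record carrying these extra chemistry fields might be assembled as below. This is a minimal sketch with Python's `xml.etree.ElementTree`; the `dc` and `dcterms` namespaces are the standard DCMI ones, but the `chem` namespace, its element names, and the use of `dcterms:hasPart` for dataset links are assumptions, not the actual ebank_dc schema:

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
# Hypothetical namespace for chemistry-specific elements; the real ebank_dc
# schema location is not given in this presentation.
CHEM = "http://example.org/ebank-chem/"

ET.register_namespace("dc", DC)
ET.register_namespace("dcterms", DCTERMS)
ET.register_namespace("chem", CHEM)


def build_record(title, formula, inchi, dataset_urls, eprint_url=None):
    """Build a qualified-DC-style record for one crystal structure."""
    root = ET.Element("record")
    ET.SubElement(root, f"{{{DC}}}title").text = title
    ET.SubElement(root, f"{{{DC}}}type").text = "CrystalStructure"
    # Chemistry fields exposed for harvesting (hypothetical element names).
    ET.SubElement(root, f"{{{CHEM}}}empiricalFormula").text = formula
    ET.SubElement(root, f"{{{CHEM}}}inchi").text = inchi
    for url in dataset_urls:
        # One link per dataset within the experiment (assumed term).
        ET.SubElement(root, f"{{{DCTERMS}}}hasPart").text = url
    if eprint_url:
        # Link to the published eprint derived from the data.
        ET.SubElement(root, f"{{{DCTERMS}}}isReferencedBy").text = eprint_url
    return root


if __name__ == "__main__":
    record = build_record(
        "Benzene",  # illustrative values only
        "C6 H6",
        "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H",
        ["http://example.org/data/1", "http://example.org/data/2"],
        "http://example.org/eprint/1",
    )
    print(ET.tostring(record, encoding="unicode"))
```

Putting the formula and InChI in their own elements, rather than folding them into `dc:description`, is what lets a harvester search on them directly.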
Data flow in eBank
[Figure: Data flow in eBank. Model input: Andy Powell, UKOLN.]
• Deposit into the institutional repository, which holds:
  – a crystal structure (data holding), described by an ebank_dc record (XML) with dc:type="CrystalStructure" and/or "Collection", linked via dc:identifier to a crystal structure report (HTML) and its datasets
  – an eprint, described by an oai_dc record (XML) with dc:type="Eprint" and/or "Text", linked via dc:identifier to an eprint "jump-off" page (HTML) and an eprint manifestation (e.g. PDF)
• Linking between the data and eprint records: dcterms:references and dcterms:isReferencedBy
• Harvesting over OAI-PMH, in ebank_dc and oai_dc formats, by the eBank UK aggregator service, the ePrint UK aggregator service and a subject service
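Because the data record and the eprint record point at each other, an aggregator can check that every link has its reverse. A minimal Python sketch, assuming the DCMI reading of the two terms (an eprint `dcterms:references` a data holding; the data holding is `dcterms:isReferencedBy` the eprint) and records flattened to dictionaries keyed by `dc:identifier`; the identifiers in the test are hypothetical:

```python
def missing_backlinks(records):
    """Return (source, target) pairs whose reverse link is absent.

    `records` maps each dc:identifier to a dict of metadata terms, where
    link-valued terms hold lists of identifiers.
    """
    missing = []
    for rid, rec in records.items():
        for target in rec.get("dcterms:references", []):
            # The referenced record should point back with isReferencedBy;
            # a target that was never harvested counts as missing too.
            back = records.get(target, {}).get("dcterms:isReferencedBy", [])
            if rid not in back:
                missing.append((rid, target))
    return missing
```

An aggregator could run this after each harvest to flag eprints whose underlying data record has not been updated with the reverse link.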
Harvesting: OAIster
Linking and aggregating
Embedded in a science portal
Current situation
• Version 2.0 eBank metadata schema
• Pilot institutional e-data repository for harvesting (raw, derived, results data) using EPrints.org software
• Exports records as ebank_dc and oai_dc
• Validation of schema & discussion with International Union of Crystallography for final developments and wider deployment
• Pilot eBank UK aggregator service
• Developing search interface Version 1.0
• Testing with PSIgate physical sciences portal – embedding eBank UK
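Since the repository exports records in both formats, a harvester selects one with the OAI-PMH `metadataPrefix` argument and pages through results with `resumptionToken`. A minimal Python sketch of one ListRecords round trip; the base URL is hypothetical, and the parser reads only the header identifier and `dc:title` from each record:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "http://www.openarchives.org/OAI/2.0/"
DC = "http://purl.org/dc/elements/1.1/"
NS = {"oai": OAI, "dc": DC}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request, e.g. for oai_dc or ebank_dc."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return ([(identifier, title), ...], token) from one response page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        title = rec.findtext(".//dc:title", namespaces=NS)
        records.append((ident, title))
    # An empty or absent resumptionToken means this is the last page.
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, token or None
```

A harvester would fetch `list_records_url(base, "ebank_dc")`, parse the page, and repeat with the returned token until it is `None`.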
What’s next?
• Progress towards generic metadata schemas
• Validation against other schemas (CCLRC Model)
• EPrints.org software: allow for more generic scientific data and schemas?
• Metadata enhancement: keywords based on knowledge of keywords in related publications?
• Investigate identifiers: International Chemical Identifier
• Explore context-sensitive linking
• Full embedding into chemical and crystallographic research and publishing
• e-Learning embedding and pedagogic evaluation
• Feasibility study in related domains
Breakout session?
• Describing non-‘Dublin Core’ terms
  – Qualified Dublin Core
  – Complex object formats: METS vs MPEG-21 DIDL
  – Set & Friends containers
• Compliance between schemas
  – One generic schema
  – Develop multiple schemas
• Rights
  – Use / reuse
  – Publisher
• Linking & aggregating
  – DOI
  – Keyword ontologies
  – Identifiers
  – Context-sensitive linking