CRISP WP 17 1 / 2 Proposed Metadata Catalogue Architecture Document

Page 1: CRISP WP 17 1 / 2

CRISP WP 17 1 / 2

Proposed Metadata Catalogue Architecture Document

Page 2: CRISP WP 17 1 / 2

Bessone Nicola - ESRF 2

Work package 17 - IT & DM: Metadata Management and Data Continuum

• Objectives: choose and implement data management and metadata mining services, and establish an environment permitting a data continuum from raw data to publications across the participating Research Institutes (RIs): ILL, ESRF, SLHC and EuroFEL.

• Task plan:
  1) Evaluate and adapt metadata catalogues according to the RIs' requirements.
  2) Deploy and integrate the metadata catalogue.
  3) Prototype data mining on metadata services.

Page 3: CRISP WP 17 1 / 2

Evaluate metadata catalogues: Use cases

• Identify a list of requirements based on ILL, ESRF and DESY use cases.

• Select the most suitable metadata catalogue systems on the market.

• Match the requirements with the features offered by the metadata catalogues.
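
The matching step can be pictured as a coverage check of each candidate's feature set against the requirement list. The sketch below is illustrative only: the catalogue names, feature sets and requirement identifiers are hypothetical placeholders, not evaluation results from this work package.

```python
# Requirement identifiers loosely follow the headings used in this presentation.
REQUIREMENTS = {"aaa", "metadata_model", "search", "cross_platform", "service_api"}

# Hypothetical candidates and feature sets, for illustration only.
CANDIDATES = {
    "catalogue_a": {"aaa", "metadata_model", "search", "cross_platform", "service_api"},
    "catalogue_b": {"aaa", "search", "service_api"},
}


def coverage(features, requirements):
    """Return (fraction of requirements met, sorted list of missing ones)."""
    missing = sorted(requirements - features)
    met = len(requirements) - len(missing)
    return met / len(requirements), missing


def rank(candidates, requirements):
    """Order candidate names by decreasing requirement coverage."""
    return sorted(
        candidates,
        key=lambda name: coverage(candidates[name], requirements)[0],
        reverse=True,
    )
```

The list of missing requirements is returned alongside the score so that a single blocking gap (for example the metadata model) stays visible rather than being averaged away.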

Page 4: CRISP WP 17 1 / 2

Evaluate metadata catalogues: Requirements

1) AAA
  1. Authentication: modular integration of different authentication systems.
  2. Authorization: customizable access control system.
  3. Accounting: granular logging information levels.
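
One way to read the three AAA requirements together is as a small service with pluggable authenticators, a per-user rule table and a logger with several levels. The following is a minimal sketch under that assumption; the class, method and mechanism names are hypothetical and do not describe any particular catalogue's API.

```python
import logging
from typing import Callable, Dict, Optional, Set

# An authenticator takes credentials and returns a user name, or None on failure.
Authenticator = Callable[[Dict[str, str]], Optional[str]]


class AAAService:
    """Hypothetical service combining authentication, authorization and accounting."""

    def __init__(self) -> None:
        self._authenticators: Dict[str, Authenticator] = {}
        self._rules: Dict[str, Set[str]] = {}  # user -> allowed actions
        self.log = logging.getLogger("accounting")

    def register(self, mechanism: str, authenticator: Authenticator) -> None:
        # Authentication: new mechanisms are plugged in without changing the service.
        self._authenticators[mechanism] = authenticator

    def grant(self, user: str, action: str) -> None:
        # Authorization: the access rules are plain data and can be customized.
        self._rules.setdefault(user, set()).add(action)

    def login(self, mechanism: str, credentials: Dict[str, str]) -> Optional[str]:
        user = self._authenticators[mechanism](credentials)
        # Accounting: logins are recorded at INFO level.
        self.log.info("login mechanism=%s success=%s", mechanism, user is not None)
        return user

    def is_allowed(self, user: str, action: str) -> bool:
        allowed = action in self._rules.get(user, set())
        # Accounting: individual checks are recorded at the finer DEBUG level.
        self.log.debug("check user=%s action=%s allowed=%s", user, action, allowed)
        return allowed


def simple_db_auth(credentials: Dict[str, str]) -> Optional[str]:
    """Hypothetical authenticator backed by a hard-coded table (illustration only)."""
    users = {"alice": "secret"}
    name = credentials.get("username")
    if name is not None and users.get(name) == credentials.get("password"):
        return name
    return None
```

A second mechanism (for instance a directory lookup) would be added with another `register` call, leaving the authorization and accounting code unchanged.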

Page 5: CRISP WP 17 1 / 2

Evaluate metadata catalogues: Requirements

2) Metadata Model
Core Scientific Metadata Model (CSMD), already developed at STFC.

[Diagram: CSMD entities: Study, Investigation, Sample, Dataset, Datafile, Parameter.]
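
The entity names listed for the CSMD can be sketched as nested record types. The containment shown here (a Study groups Investigations, an Investigation has Samples and Datasets, a Dataset has Datafiles, and Parameters attach to Samples, Datasets and Datafiles) is an assumption made for illustration, and all field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Parameter:
    name: str
    value: Union[str, float]
    units: str = ""


@dataclass
class Datafile:
    name: str
    location: str
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Dataset:
    name: str
    datafiles: List[Datafile] = field(default_factory=list)
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Sample:
    name: str
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    samples: List[Sample] = field(default_factory=list)
    datasets: List[Dataset] = field(default_factory=list)


@dataclass
class Study:
    name: str
    investigations: List[Investigation] = field(default_factory=list)


def count_datafiles(study: Study) -> int:
    """Walk Study -> Investigation -> Dataset and count the Datafiles."""
    return sum(
        len(dataset.datafiles)
        for investigation in study.investigations
        for dataset in investigation.datasets
    )
```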

Page 6: CRISP WP 17 1 / 2

Evaluate metadata catalogues: Requirements

3) Searching method
Fulfil users' search needs while being easy to use and to access (web).
Provide facility and scientific management with data mining on data use, access, search and modification.

4) Cross platform

5) Service API
Stable set of APIs, preferably programming-language agnostic.

Page 7: CRISP WP 17 1 / 2

Evaluate metadata catalogues: Requirements

6) Sustainability
  1. Open source
  2. Project organization: actively maintained; release plan (documentation, update mechanism, backward compatibility); patch release process (security, bug fixes)
  3. Cutting-edge technology

7) License
Free of charge

Page 8: CRISP WP 17 1 / 2

Evaluate metadata catalogues: Requirements

8) Data policy
Dynamic authorization system.

9) Scalability & Performance
ILL hosts ~2,000 experiments per year, producing ~10,000 datasets. Other facilities possibly more…

10) Data ingestion
Manual and automatic, plus possible harvesting (OAI-PMH).

11) Security
Protect intellectual property.
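
For the harvesting option named under data ingestion, OAI-PMH requests are plain HTTP queries built from a `verb` and a few arguments. The sketch below only constructs `ListRecords` request URLs and performs no network access; the base URL in the usage note is a hypothetical placeholder, and `oai_dc` is used because it is the metadata format every OAI-PMH repository has to support.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build the first ListRecords request for a selective harvest."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # UTC date, e.g. "2012-11-01"
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def resume_url(base_url, token):
    """Build a follow-up request for the next chunk of an incomplete list.

    resumptionToken is an exclusive argument: it is sent with the verb only.
    """
    return base_url + "?" + urlencode({"verb": "ListRecords", "resumptionToken": token})
```

For example, `list_records_url("https://repository.example.org/oai", from_date="2012-11-01")` yields a request limited to records changed since that date, which is what an incremental, scheduled harvest would typically issue.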

Page 9: CRISP WP 17 1 / 2

Evaluate metadata catalogues: Metadata catalogue systems

1. ICAT
2. DSpace
3. Fedora
4. CKAN
5. Invenio
6. Tardis
7. ISPyB
8. iRODS
9. SRB-MCAT
10. MS Zentity

Page 10: CRISP WP 17 1 / 2

Evaluate metadata catalogues: Selection result

• Different solutions have been explored; among them, ICAT appears to be the only one that currently fits the Data Model requirements. This is the key element for a successful implementation in a reasonable time frame.

Page 11: CRISP WP 17 1 / 2

Evaluate metadata catalogues: ICAT

• Authentication plug-in
• Rule-based authorization mechanism
• Flexible metadata model
• Search method: full-text, numerical and string search, and SQL-like query syntax
• Set of APIs (Java and Python)
• Configurable database (Oracle, PostgreSQL and MySQL)
• Federated search via TopCAT
• Core Scientific Meta-Data Model (CSMD)
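
To illustrate what numerical and string search over parameters means in practice, the sketch below uses an in-memory SQLite database as a stand-in. It is not ICAT's schema, API or query language: the table layout, parameter names and file names are all hypothetical.

```python
import sqlite3


def build_demo_catalogue():
    """Create a tiny in-memory catalogue with hypothetical datafiles and parameters."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE datafile (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE parameter (
            datafile_id INTEGER REFERENCES datafile(id),
            name TEXT,
            string_value TEXT,
            numeric_value REAL
        );
        """
    )
    conn.executemany(
        "INSERT INTO datafile VALUES (?, ?)",
        [(1, "scan_001.edf"), (2, "scan_002.edf")],
    )
    conn.executemany(
        "INSERT INTO parameter VALUES (?, ?, ?, ?)",
        [
            (1, "energy_kev", None, 17.5),
            (1, "technique", "tomography", None),
            (2, "energy_kev", None, 35.0),
            (2, "technique", "diffraction", None),
        ],
    )
    return conn


def find_by_number(conn, name, low, high):
    """Numerical search: datafiles whose parameter lies in [low, high]."""
    rows = conn.execute(
        "SELECT d.name FROM datafile d JOIN parameter p ON p.datafile_id = d.id "
        "WHERE p.name = ? AND p.numeric_value BETWEEN ? AND ? ORDER BY d.name",
        (name, low, high),
    )
    return [row[0] for row in rows]


def find_by_string(conn, name, pattern):
    """String search: datafiles whose parameter matches a LIKE pattern."""
    rows = conn.execute(
        "SELECT d.name FROM datafile d JOIN parameter p ON p.datafile_id = d.id "
        "WHERE p.name = ? AND p.string_value LIKE ? ORDER BY d.name",
        (name, pattern),
    )
    return [row[0] for row in rows]
```

Keeping string and numeric values in separate columns is one common way to let range queries use numeric comparison instead of text comparison.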

Page 12: CRISP WP 17 1 / 2

Evaluate metadata catalogues: ICAT

• Plug-in for DAWN/Mantid
• Licence: FreeBSD
• Web interface: TopCAT
• In use at 11+ RIs

Page 13: CRISP WP 17 1 / 2

Evaluate metadata catalogues: ICAT

• Work in progress:
  – Improve the web interface (TopCAT)
  – Possibility to harvest (OAI-PMH)
  – Installation process
  – Synonym mechanism
  – Integration with Umbrella authentication

Page 14: CRISP WP 17 1 / 2

Deploy and integrate ICAT: ESRF - Pilot

[Diagram: ESRF pilot architecture. Labels: Spec sessions; Tomo XML; TomoDB database (the current TomoDB metadata collection structure); Tomo-to-ICAT XML converter; ICAT XML; ICAT XML ingest; SMIS; SMIS-to-ICAT ingester; ICAT (ICAT API, Web Service API, RDBMS). Steps are numbered 1 to 3.]
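
The Tomo-to-ICAT XML converter in the pilot can be pictured as a small document transformation. The sketch below is an assumption-laden illustration: the input layout (`<tomo>` with a `<scanName>` child) and the output layout (`<icatdata>`, `<dataset>`, `<parameter>`) are hypothetical element names, not the real TomoDB or ICAT ingest formats.

```python
import xml.etree.ElementTree as ET


def tomo_to_icat_xml(tomo_xml: str) -> str:
    """Map a hypothetical Tomo XML scan description to a hypothetical ingest document.

    The scan name becomes the dataset name; every other child element
    becomes a name/value parameter of that dataset.
    """
    scan = ET.fromstring(tomo_xml)
    root = ET.Element("icatdata")
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "name").text = scan.findtext("scanName", default="")
    for child in scan:
        if child.tag == "scanName":
            continue
        parameter = ET.SubElement(dataset, "parameter")
        ET.SubElement(parameter, "name").text = child.tag
        ET.SubElement(parameter, "value").text = (child.text or "").strip()
    return ET.tostring(root, encoding="unicode")
```

Keeping the converter as a pure string-to-string function makes it easy to test in isolation from both the acquisition side and the ingest step.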

Page 15: CRISP WP 17 1 / 2

Deploy and integrate ICAT: ESRF - future

[Diagram: planned ESRF architecture. Labels: new beamline control system; Spec sessions; new sequencer; experiment metadata management; scientist controlling the experiment; data manager; web interface; ICAT (ICAT API, Web Service API, RDBMS); SMIS (SMIS API, Web Service API, RDBMS).]

Page 16: CRISP WP 17 1 / 2

Deploy and integrate ICAT: ILL

• Data policy published in Dec 2011
• Implementation Oct 2012
• ICAT deployment Dec 2012
• Data ingestion in progress since Nov 2012

Page 17: CRISP WP 17 1 / 2

Future work

• Complete the deployment (ingestion) at the participating facilities.

• Data mining
  – Collect use cases from the different facilities
  – Currently all use cases are technically simple (no request for correlation, for instance)
• Work on the search engine (Lucene)
• Reporting
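
Technically simple data mining use cases of this kind, with no correlation involved, can often be served by plain counting over access records. The sketch below assumes a hypothetical event format (dictionaries with `instrument` and `action` keys); it does not describe an existing log format at any facility.

```python
from collections import Counter


def usage_report(events):
    """Count events per (instrument, action) pair."""
    return Counter((event["instrument"], event["action"]) for event in events)


def most_accessed(events, action="download", top=3):
    """Return the instruments with the most events of the given action."""
    counts = Counter(
        event["instrument"] for event in events if event["action"] == action
    )
    return counts.most_common(top)
```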
