Mendeley Data FAIR hackathon

MENDELEY FAIR HACKATHON

Luiz Olavo Bonino - luiz.bonino@dtls.nl

WHAT IS FAIR DATA?

I need data. What should I do?

Findable:

F1. (meta)data are assigned a globally unique and persistent identifier;

F2. data are described with rich metadata;

F3. metadata clearly and explicitly include the identifier of the data it describes;

F4. (meta)data are registered or indexed in a searchable resource;

http://www.nature.com/articles/sdata201618

Accessible:

A1. (meta)data are retrievable by their identifier using a standardized communications protocol;

A1.1 the protocol is open, free, and universally implementable;

A1.2. the protocol allows for an authentication and authorization procedure, where necessary;

A2. metadata are accessible, even when the data are no longer available;

http://www.nature.com/articles/sdata201618

Interoperable:

I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.

I2. (meta)data use vocabularies that follow FAIR principles;

I3. (meta)data include qualified references to other (meta)data;

http://www.nature.com/articles/sdata201618

Reusable:

R1. (meta)data are richly described with a plurality of accurate and relevant attributes;

R1.1. (meta)data are released with a clear and accessible data usage license;

R1.2. (meta)data are associated with detailed provenance;

R1.3. (meta)data meet domain-relevant community standards;

http://www.nature.com/articles/sdata201618

[Figure: FAIR transformation and Analysis transformation]

FAIRNESS LEVELS

Each level is drawn as the same data object with four layers: PID, Metadata (intrinsic), 'provenance' (user defined), Data (elements).

■ Non-FAIR
■ Findable, Usable for Humans
■ FAIR metadata
■ FAIR data - restricted access
■ FAIR data - Open Access
■ FAIR data - Open Access/Functionally Linked

FAIR DATA TOOLS

FAIR DATA POINT

A particular class of FAIR Data System that provides access to datasets in a FAIR manner. The datasets can be external or internal to the FAIR Data Point. The source data can be a non-FAIR dataset or a FAIR Data Resource. If the source data is non-FAIR, the FAIR Data Point needs to make the necessary FAIR transformations on the fly.

FAIR Data Resource

non-FAIR Data Resource

FAIR DATA POINT

Q: Who are you? Can I trust you?

FDP: Here is information about myself.
FDP Metadata: Who is responsible? FDP license? Description?

Q (reads FDP Metadata): Ok, now that I know you, tell me what you have to offer.

FDP: Here is information about my catalog of datasets.
Catalog Metadata

Q (reads Catalog Metadata): Tell me more about your genomic dataset.

FDP: This is the information about the genomic dataset.
Dataset Metadata: License? Publisher? Last modified date? Theme?

Q (reads Dataset Metadata): In which forms is the dataset available?

FDP: This is the information about the dataset distributions.
Distribution Metadata: Access or download URL? Format? Size? Media type?

Q (reads Dataset Metadata): Tell me more about the dataset content.

FDP: This is the information about the data record of the dataset.
Data record Metadata: Types? Domain? Range?

Q (reads Dataset, distribution, data record metadata): Ok, now that I know what you have, give me the data.

FDP: Here is my data.

FAIR Data
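The exchange above amounts to a top-down walk through the metadata layers, from the FAIR Data Point to the distributions. The following is a minimal sketch of that walk in Python. It assumes an in-memory dictionary in place of real HTTP requests; `METADATA`, `fetch_metadata` and the `children` key are hypothetical names used only for illustration, and only the URIs and titles come from the slides.

```python
# Hypothetical in-memory stand-in for a FAIR Data Point (no network access).
BASE = "http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp"
DATASET = BASE + "/biobank/77350-collection1"

METADATA = {
    BASE: {"type": "Repository", "title": "DTL FAIR Data Point",
           "children": [BASE + "/biobank"]},
    BASE + "/biobank": {"type": "Catalog",
                        "title": "Rd connect's biobank catalog",
                        "children": [DATASET]},
    DATASET: {"type": "Dataset", "title": "Galliera Genetic Bank",
              "children": [DATASET + "/distributionTurtle"]},
    DATASET + "/distributionTurtle": {
        "type": "Distribution",
        "title": "Ring14 biobank turtle distribution",
        "children": []},
}


def fetch_metadata(uri):
    """Stand-in for retrieving the metadata document behind a URI (assumption)."""
    return METADATA[uri]


def walk(uri, depth=0):
    """Yield (depth, type, title) for a resource and everything below it."""
    record = fetch_metadata(uri)
    yield depth, record["type"], record["title"]
    for child in record["children"]:
        yield from walk(child, depth + 1)


layers = list(walk(BASE))
for depth, kind, title in layers:
    print("  " * depth + f"{kind}: {title}")
```

Each question in the dialogue corresponds to one step down in `walk`: the client only learns the next URI by reading the metadata of the current layer.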

FAIR DATA POINT - ARCHITECTURE

■ FAIR API / GUI
■ Metadata Provider
■ FAIR Accessor
■ Metrics Gatherer
■ Security Enforcer
■ FAIR Metadata
■ FAIR Data

FAIR Data Point metadata

Title, Responsible institution(s), Contact, FAIR API version, License, …

FDP METADATA

<http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp>
    dct:alternative "DTL FDP"@en ;
    dct:description "The DTL FAIR Data Point hosts the FAIR Data versions of datasets that have been made FAIR during BYODs as well as other relevant life sciences datasets"@en ;
    dct:subject "FAIR Data" , "Life Sciences" ;
    dct:title "DTL FAIR Data Point"@en ;
    <http://www.re3data.org/schema/3-0#api> <http://dtls.nl/fdp#api=1> ;
    <http://www.re3data.org/schema/3-0#catalog>
        <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/biobank> ,
        <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/comparativeGenomics> ,
        <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/patient-registry> ,
        <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/textmining> ,
        <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/transcriptomics> ;
    <http://www.re3data.org/schema/3-0#institution> <http://dtls.nl> ;
    <http://www.re3data.org/schema/3-0#institutionCountry> <http://lexvo.org/id/iso3166/NL> ;
    <http://www.re3data.org/schema/3-0#lastUpdate> "2016-10-27"^^xsd:date ;
    <http://www.re3data.org/schema/3-0#software> "FAIR Data Point" ;
    <http://www.re3data.org/schema/3-0#startDate> "2016-10-27"^^xsd:date ;
    a <http://www.re3data.org/schema/3-0#Repository> ;
    rdfs:label "DTL FAIR Data Point"@en ;
    <http://xmlns.com/foaf/0.1/landingpage> <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/swagger-ui.html> .

FAIR Data Point metadata

Catalog metadata

Title, Theme taxonomy, Issued date, …

CATALOG METADATA

<http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/biobank>
    dct:hasVersion "1.0" ;
    dct:identifier "biobank" ;
    dct:issued "2016-02-01"^^xsd:date ;
    dct:language lang:en ;
    dct:modified "2016-08-01"^^xsd:date ;
    dct:title "Rd connect's biobank catalog"@en ;
    a dcat:Catalog ;
    rdfs:label "Rd connect's biobank catalog"@en ;
    dcat:dataset <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/biobank/77350-collection1> ;
    dcat:themeTaxonomy <http://dbpedia.org/resource/Biobank> , <http://edamontology.org/topic_3337> .

FAIR Data Point metadata

Catalog 1 metadata

Dataset metadata

Title, Publisher, License, Theme(s), Version, …

DATASET METADATA

<http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/biobank/77350-collection1>
    dct:creator <http://orcid.org/0000-0002-1215-167X> ;
    dct:hasVersion "1.0" ;
    dct:identifier "77350-collection1" ;
    dct:issued "2016-02-01"^^xsd:date ;
    dct:language lang:en ;
    dct:modified "2016-08-01"^^xsd:date ;
    dct:publisher <http://orcid.org/0000-0002-1215-167X> ;
    dct:title "Galliera Genetic Bank"@en ;
    <http://rdf.biosemantics.org/ontologies/fdp-o#dataRecord> <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/datarecord/77350-collection1-datarecord-1> ;
    a dcat:Dataset ;
    rdfs:label "Galliera Genetic Bank"@en ;
    rdfs:seeAlso <http://catalogue.rd-connect.eu/web/galliera-genetic-bank/bb_home> ;
    dcat:distribution
        <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/biobank/77350-collection1/csv> ,
        <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/biobank/77350-collection1/distributionTurtle> ,
        <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/biobank/77350-collection1/ldf> ;
    dcat:keyword "Galliera Genetic Bank" , "biobank" ;
    dcat:landingPage <http://ggb.galliera.it> ;
    dcat:theme
        <http://dbpedia.org/resource/Biobank> ,
        <http://edamontology.org/topic_3337> ,
        <http://www.orpha.net/ORDO/Orphanet_1023> …

FAIR Data Point metadata

Catalog 1 metadata

Dataset 1 metadata

Distribution metadata

Title, Media type, Download/access URL, License, …

DISTRIBUTION METADATA

<http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/biobank/77350-collection1/distributionTurtle>
    dct:description "Ring14 biobank turtle distribution"@en ;
    dct:hasVersion "1.0" ;
    dct:identifier "distributionTurtle" ;
    dct:issued "2016-02-01"^^xsd:date ;
    dct:license <http://rdflicense.appspot.com/rdflicense/cc-by-nc-nd3.0> ;
    dct:modified "2016-07-07"^^xsd:date ;
    dct:title "Ring14 biobank turtle distribution"@en ;
    a dcat:Distribution ;
    rdfs:label "Ring14 biobank turtle distribution"@en ;
    dcat:downloadURL <http://semlab1.liacs.nl:8080/rdc-demo-dataset/RING_14_dummy-Biobank.ttl> ;
    dcat:mediaType "text/turtle" .

<http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/biobank/77350-collection1/ldf>
    dct:description "Ring14 biobank linked data fragment distribution"@en ;
    dct:hasVersion "1.0" ;
    dct:identifier "ldf" ;
    dct:license <http://rdflicense.appspot.com/rdflicense/cc-by-nc-nd3.0> ;
    dct:title "Ring14 biobank linked data fragment distribution"@en ;
    a dcat:Distribution ;
    rdfs:label "Ring14 biobank linked data fragment distribution"@en ;
    dcat:accessURL <http://dev-vm.fair-dtls.surf-hosted.nl:5050/ring14-biosample> .
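The two distributions above differ in how a client reaches the data: one offers a direct download with a media type, the other only an access URL. A client could use that metadata to choose between them. The sketch below assumes the distribution metadata has already been parsed into plain dictionaries; the dictionary layout and the function names `pick_download` and `access_points` are hypothetical, while the URLs and media type are copied from the slides.

```python
# Distribution metadata from the slides, hand-copied into dictionaries
# (the dictionary layout itself is an assumption made for this sketch).
DISTRIBUTIONS = [
    {"identifier": "distributionTurtle",
     "mediaType": "text/turtle",
     "downloadURL": "http://semlab1.liacs.nl:8080/rdc-demo-dataset/RING_14_dummy-Biobank.ttl"},
    {"identifier": "ldf",
     "accessURL": "http://dev-vm.fair-dtls.surf-hosted.nl:5050/ring14-biosample"},
]


def pick_download(distributions, preferred_media_types):
    """Return the download URL of the first distribution matching a preferred media type."""
    for media_type in preferred_media_types:
        for dist in distributions:
            if dist.get("mediaType") == media_type and "downloadURL" in dist:
                return dist["downloadURL"]
    return None  # nothing downloadable in an acceptable format


def access_points(distributions):
    """Return the access URLs of distributions that are services rather than files."""
    return [d["accessURL"] for d in distributions if "accessURL" in d]


print(pick_download(DISTRIBUTIONS, ["text/csv", "text/turtle"]))
print(access_points(DISTRIBUTIONS))
```

Returning `None` instead of raising keeps the choice with the caller, who may fall back to an access URL when no file in a suitable format exists.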

FAIR Data Point metadata

Catalog metadata

Dataset metadata

Distribution metadata

Data record metadata

Type, Domain, Range, …

DATA RECORD METADATA

<http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/datarecord/77350-collection1-datarecord-1>
    dct:hasVersion "1.0" ;
    dct:identifier "77350-collection1-datarecord-1" ;
    dct:issued "2016-02-01"^^xsd:date ;
    dct:language lang:en ;
    dct:modified "2016-08-01"^^xsd:date ;
    dct:publisher <http://orcid.org/0000-0002-1215-167X> ;
    dct:title "Galliera Genetic Bank datarecord metadata" ;
    <http://rdf.biosemantics.org/ontologies/fdp-o#refersTo> <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/biobank/77350-collection1/csv> ;
    <http://rdf.biosemantics.org/ontologies/fdp-o#rmlMapping> <https://git.lumc.nl/biosemantics/ring14-fdp-metadata/raw/bd01b84fb792ae3860fdda646e9cb96a1a11205c/rml/biobank/RING_14_biobank_mapping.ttl> ;
    a <http://rdf.biosemantics.org/ontologies/fdp-o#DataRecord> ;
    rdfs:label "Galliera Genetic Bank datarecord metadata" .

<#ring14-biobank-id-resource>
    rml:logicalSource <#inputFile> ;
    rr:subjectMap [
        rr:template "http://rdf.biosemantics.org/dataset/ring14/resource/identifier/{Sample ID}" ;
        rr:class <http://rdf.biosemantics.org/ontologies/rd-connect/21f6df30_1f72_45fb_bfc1_2b3d1af1410a>
    ] ;
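The `rr:template` in the mapping above turns each row of the source file into a subject IRI by substituting the `{Sample ID}` column. A minimal sketch of that substitution follows. It is not an RML processor: it handles only `{column}` placeholders, percent-encodes values with Python's `urllib.parse.quote` as an approximation of the IRI-safe encoding, and uses made-up sample rows.

```python
import re
from urllib.parse import quote

# Template copied from the mapping on the slide.
TEMPLATE = "http://rdf.biosemantics.org/dataset/ring14/resource/identifier/{Sample ID}"


def expand_template(template, row):
    """Replace each {column} placeholder with the percent-encoded row value."""
    def substitute(match):
        column = match.group(1)
        # safe="" also encodes "/", so a value cannot add path segments.
        return quote(str(row[column]), safe="")
    return re.sub(r"\{([^{}]+)\}", substitute, template)


# Hypothetical rows; the real CSV content is not shown in the slides.
rows = [{"Sample ID": "S-001"}, {"Sample ID": "S 002/a"}]
subjects = [expand_template(TEMPLATE, row) for row in rows]
for subject in subjects:
    print(subject)
```

A row that lacks the column raises `KeyError`, which seems preferable to silently emitting an IRI with an empty identifier.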

FAIR Data Point metadata
    Catalog 1 metadata
        Dataset 1 metadata
            Distribution 1.a metadata
            Distribution 1.b metadata
            Data record metadata
        Dataset 2 metadata
            Distribution 2.a metadata
            Distribution 2.b metadata
            Data record metadata
        Dataset 3 metadata
            Distribution 3.a metadata
            Data record metadata
    Catalog 2 metadata

FAIR DATA POINT

[Figure: FAIR Data Points for a Biobank database, a Patient Registry, UNIPROT and HPA]

METADATA LAYERS

Layer | Description | Example | Standard
FDP (Data repository) | Information about the FDP as a data repository | PID, title, description, license, owner, API version, etc. | RE3Data
Catalog | Information about the catalog of datasets offered | PID, title, description, publisher, etc. | W3C DCAT #Catalog
Dataset | Information about each of the offered datasets | Publisher, issue date, theme, etc. | W3C DCAT #Dataset
Distribution | Information about how the dataset is distributed | AccessURL, downloadURL, format, mediaType, etc. | W3C DCAT #Distribution
Data record | Information about the actual data, types, identifiers, etc. | Data items types, identifiers, domain, range, etc. | RML

OAI-PMH

DEMO FAIR DATA POINT

API: http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/swagger-ui.html

GUI: http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/

FAIR DATA POINT

Components: Metadata Provider, FAIR Accessor, Metrics Gatherer, Security Enforcer

EXISTING DATA REPOSITORIES

Components: Metadata Provider, FAIR Accessor, Metrics Gatherer, Security Enforcer

EXTENDING EXISTING DATA REPOSITORIES

FAIR Data Point components (Metadata Provider, FAIR Accessor, Metrics Gatherer, Security Enforcer) + Current Solution Components

EUDAT: Metadata Provider, FAIR Data Accessor, Metrics Gatherer, Access Controller + EUDAT Current Components

FAIR DATA ECOSYSTEM

Create, Publish, Annotate, Find

Open RDF Knowledge Annotator (ORKA)

DataFAIRport

Development started in October 2016

FAIR Metadata

Metadata index: retrieves metadata; search interfaces (GUI and API)

Development started in October 2016. Based on OpenRefine.
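The metadata index with search interfaces mentioned above can be pictured as an inverted index over harvested metadata. The sketch below is only an illustration of that idea, not the design of the actual search engine: it indexes titles alone, skips the crawling step, and the names `RECORDS`, `build_index` and `search` are hypothetical. The two titles and URIs come from the catalog and dataset metadata shown earlier.

```python
import re
from collections import defaultdict

# Titles as they appear in the catalog and dataset metadata slides.
RECORDS = {
    "http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/biobank":
        "Rd connect's biobank catalog",
    "http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/biobank/77350-collection1":
        "Galliera Genetic Bank",
}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return [token for token in re.split(r"[^a-z0-9]+", text.lower()) if token]


def build_index(records):
    """Map each token to the set of metadata URIs whose title contains it."""
    index = defaultdict(set)
    for uri, title in records.items():
        for token in tokenize(title):
            index[token].add(uri)
    return index


def search(index, query):
    """Return the URIs whose title contains every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


index = build_index(RECORDS)
print(search(index, "genetic bank"))
```

A GUI and an API could both sit on top of the same `search` function, which is one reason to keep the index separate from its interfaces.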

FAIRIFICATION PROCESS

■ Retrieve original data

■ Dataset identification and analysis

■ Definition of the semantic model

■ Data transformation

■ License assignment

■ Metadata definition

■ FAIR Data resource (data, metadata, license) deployment

FAIRIFICATION

[Figure: an Original dataset is submitted to a Generic semantic model, which generates a FAIR Data Resource (FAIR Format, Metadata, License)]

[Figure: Input (Non-FAIR Dataset, Metadata, License); Resolution Services (ARTA Service, Identifier Service, Vocabulary Service); FAIR Data Model Registry; Output: FAIR Data Resource (FAIR Format, Metadata, License)]

FAIRIFIER

■ Transform non-FAIR datasets into FAIR Data Resources (dataset in FAIR format, license and metadata)

■ Data munging

■ Semantic modeling

■ License definition

■ Metadata definition and extraction

■ Data publication

FAIR DATA MODEL REGISTRY

[Figure: the FAIR Data Model Registry holds a Data Model for each Dataset]

FAIRIFICATION - NEW DATASET TYPE

[Figure: the Original dataset is submitted and a FAIR Data Resource (FAIR Format, Metadata, License) is generated; the Non-FAIR - FAIR mapping is stored in the FAIR Data Model Registry]

FAIRIFICATION - RECURRING DATASET TYPE

[Figure: the Original dataset is submitted; the FAIR Data Model Registry is queried and the Non-FAIR - FAIR mapping is retrieved; a FAIR Data Resource (FAIR Format, Metadata, License) is generated]
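The difference between a new and a recurring dataset type comes down to whether the registry already holds a mapping. The sketch below illustrates that store-or-retrieve logic under strong simplifications: a mapping is reduced to a dictionary from column names to property IRIs, and `DataModelRegistry`, `fairify` and `define_mapping` are hypothetical names, not the registry's actual interface.

```python
class DataModelRegistry:
    """Toy registry keeping one Non-FAIR to FAIR mapping per dataset type."""

    def __init__(self):
        self._mappings = {}

    def store(self, dataset_type, mapping):
        self._mappings[dataset_type] = mapping

    def query(self, dataset_type):
        """Return the stored mapping, or None for an unknown dataset type."""
        return self._mappings.get(dataset_type)


def fairify(registry, dataset_type, rows, define_mapping):
    """Apply a mapping to the rows, defining and storing it only when needed."""
    mapping = registry.query(dataset_type)
    reused = mapping is not None
    if not reused:
        mapping = define_mapping(rows)          # new dataset type: model it
        registry.store(dataset_type, mapping)   # keep it for the next dataset
    fair_rows = [{mapping[col]: val for col, val in row.items() if col in mapping}
                 for row in rows]
    return fair_rows, reused


calls = []

def define_mapping(rows):
    """Hypothetical modelling step; in practice this is the manual semantic modelling."""
    calls.append(1)
    return {"Sample ID": "http://purl.org/dc/terms/identifier"}


registry = DataModelRegistry()
first, first_reused = fairify(registry, "biobank-samples", [{"Sample ID": "S-001"}], define_mapping)
second, second_reused = fairify(registry, "biobank-samples", [{"Sample ID": "S-002"}], define_mapping)
print(first_reused, second_reused, len(calls))
```

The modelling step runs once per dataset type, and every later dataset of that type pays only for the transformation.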

DataFAIRport

■ A particular class of FAIR Data System to provide support for data interoperability;
■ Supports publication and access to FAIR data;
■ Fosters an ecosystem of applications and services;
■ Federated architecture: different FAIRports (and other FAIR Data Systems) are interconnectable;
■ Supports citations of datasets and data items;
■ Provides metrics for data usage and citation.

Components: FAIR Data Search Engine; FAIRifier + (Meta)Data Publication; Metadata storage; Data storage (optional); Transformation Services Registry (optional); FAIR Data Point

[Figure: the DTL DataFAIRport and FAIR Data Points]

FAIRPORT

DataFAIRport: Find, Access, Interoperate & Re-use Data

■ Stewardship API
■ FAIR Data API
■ (Meta)Data Storage component: Metadata storage, Data storage
■ DataVerse, EUDAT, Data Repository
■ Semantic resolver
■ Ontology storage
■ Data storage API / FAIR Data API
■ Data usage policy
■ Management component
■ GUI (Data publishing, search, mgmt)
■ FAIR Data System
■ Metrics storage
■ Data Consumer, Data Producer
■ Data Consumer Apps (e.g. APInatomy, BRAIN, etc.)
■ Data Mgmt Apps
■ Data Stewardship Apps

■ Allow third-party annotation on existing knowledge bases

■ Capture the provenance of the annotator and the original statement

Open RDF Knowledge Annotator (ORKA)
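The two points above, a third-party annotation plus provenance for both the annotator and the original statement, suggest a record that keeps the new assertion and the statement it annotates side by side. The sketch below is one possible shape for such a record, not ORKA's data model: the `Statement` and `Annotation` classes, the annotator IRI and the proposed keyword are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Statement:
    subject: str
    predicate: str
    obj: str


@dataclass(frozen=True)
class Annotation:
    assertion: Statement   # what the annotator proposes
    original: Statement    # the statement being annotated, kept unchanged
    annotator: str         # who made the annotation
    created: str           # when it was made (ISO 8601)


def annotate(original, proposed_obj, annotator, now=None):
    """Build an annotation that proposes a new object for an existing statement."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    assertion = Statement(original.subject, original.predicate, proposed_obj)
    return Annotation(assertion, original, annotator, timestamp)


original = Statement(
    "http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/biobank/77350-collection1",
    "http://www.w3.org/ns/dcat#keyword",
    "biobank",
)
annotation = annotate(
    original,
    "genetic biobank",                    # hypothetical proposed value
    "http://example.org/annotator/1",     # hypothetical annotator IRI
    now=datetime(2016, 11, 18, tzinfo=timezone.utc),
)
print(annotation.assertion.obj, annotation.created)
```

The original statement is never modified; the annotation only points at it, so the knowledge base owner can decide later whether to accept the proposal.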

DEMO: http://dev-vm.fair-dtls.surf-hosted.nl:8080/#/

ANNOTATIONS GO TO NANOPUB STORE

TOOLS ROADMAP (Dec 16, Jan 17, Feb 17, Mar 17)

FAIR Data Point
■ Version 1: Metadata editor, release metadata, POST, FAIR accessor
■ Version 1.1: Reintroduce OAI-PMH compliance
■ Version 1.2: Update notification

FAIR Data Search Engine
■ Beta 1: Crawler, metadata index and search GUI
■ Beta 2: Improved search GUI, search API

FAIRifier
■ Beta 1: OpenRefine + RDF plugin, publication to FAIR Data Point
■ Beta 2: Metadata definition and extraction (RML), license picker

FAIR Data Model Registry
■ Alpha 1: Start of the integration work

ORKA
■ Beta 1: Definition of 2-3 use cases
■ Beta 2: Extended with features required by the use cases

Data FAIRport
■ Alpha 1: Start of the integration work

TECHNOLOGY TRANSFER EVENTS

FAIR HACKATHON - GOALS

■ Align solutions with FAIR Data Point specifications.

■ Metadata content

■ API

■ Data

FAIR HACKATHON OUTCOME

■ FAIR data model for solutions content;

■ Architecture of the required adjustments/extensions;

■ Technical specification of the adjustments/extensions;

■ Proof-of-concept of the adjusted solution;

FAIR HACKATHONS

RDRF

MOLGENIS FAIR HACKATHON

DTL’S FAIR HACKATHONS ROADMAP

■ EUDAT (pilot project ongoing)

■ EGA (July 6-8 2016)

■ Molgenis (Oct 19-20 2016)

■ Patient registry solution providers (Oct 25-27 2016)

■ Mendeley (Nov 18 2016)

■ Quaero Systems (Nov 24 2016)

■ tranSMART (TBD)

■ phenotypeDB (TBD)

■ Euretos Knowledge Platform (TBD)

■ NIH, Australian National Data Services, Brazilian open government data, …

BRING YOUR OWN DATA - BYOD

■ Goals:
    ■ Learn how to make data linkable “hands-on” with experts
    ■ Create a “telling story” to demonstrate its use
    ■ Make FAIR Data at the source

■ Composition:
    ■ Data owners – specialists on given datasets
    ■ Data interoperability experts
    ■ Domain experts

Source: Marco Roos

[Figure: a BYOD brings together a Domain Expert, a Data Owner and a FAIR Data Expert]

[Figure: BYOD: Non-FAIR Datasets and Metadata go through FAIR Data Transformation, using Ontologies, to produce FAIR datasets]

BYOD Planning: Preparation, Execution, Follow Up

BYOD Planning: Preparation

Identify, Plan

Datasets; Attendees' profile; Output data access; Tentative dates; Tentative venue; Costs; Funds

Coordination; Set date; Invite attendees; Set venue; Catering; Lodging; Financial planning

Publicity; Working document; Preparatory calls; Data hosting; Software hosting; Documentation hosting

BYOD Planning: Execution

Day One: Introduction; SW, LD, Ontology intro; Use case intro; Workgroups division; Working sessions; WWW/TTTALA

Day Two: Progress report; Working sessions; Groups reports; WWW/TTTALA

Day Three: Data integration; Answer driving question; Explore data; Demo improvement; Final report; WWW/TTTALA

BYOD Planning: Follow-Up

D+15: Report difficulties; Clarifications; Next steps

D+45: Report difficulties; Clarifications; Next steps

Implementation: Expand FAIRification; Implement solution; Scale-up solution; Deploy

[Figure: BYOD and FAIR HACKATHON around Core FAIR Technology and FAIR Data E.T. (DTL), with the projects BBMRI 2.0, FAIR dICT, RD Connect, ODEX4ALL, myFAIR and Elixir Excellerate]

RELATED PROJECTS

ODEX4all

FAIR-dICT

myFAIR

QUESTIONS?