Metadata in the iPlant Collaborative Cyberinfrastructure Birds of a Feather meeting at PAG XXII, Jan. 14, 2014


Metadata in the iPlant Collaborative Cyberinfrastructure

Birds of a Feather meeting at PAG XXII, Jan. 14, 2014

From the iPlant Data Strategy:

“The vision for iPlant CI data capabilities is to provide flexible, adaptive and scalable data infrastructure that enables users and communities to implement best practices for data management.”

How to enable best practices for data management in iPlant:

1. A way to add and edit metadata
2. Metadata templates for common file types
3. Search and browse iDS based on metadata and file content
4. Support for unstructured and structured (relational) data within the iDS
5. Interoperability with key external data sources
6. Benefits/features that are aligned with the use of popular file types
7. An iPlant Data Commons for public data

KEY ELEMENTS OF THE iPLANT DATA STRATEGY

1. CI to enable users to add and edit metadata using simple and flexible interfaces, including customizable metadata components.
– a web-based user interface accessible via the DE
– upload metadata as csv file
– access to all metadata entities via iPlant APIs
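As a rough illustration of the "upload metadata as csv file" option, the sketch below parses a two-column CSV of attribute/value pairs into a dictionary. The column names `attribute` and `value` and the function name `parse_metadata_csv` are assumptions made for this example; the slides do not specify the CSV layout that iPlant accepts.

```python
import csv
import io


def parse_metadata_csv(text):
    """Parse CSV text with 'attribute' and 'value' columns into a dict.

    The column names are hypothetical; the real upload format is not
    described in the slides.
    """
    reader = csv.DictReader(io.StringIO(text))
    metadata = {}
    for row in reader:
        attribute = (row.get("attribute") or "").strip()
        if not attribute:
            continue  # skip rows that have no attribute name
        metadata[attribute] = (row.get("value") or "").strip()
    return metadata


# Sample rows mirror the placeholder names used in the DE mockup.
sample = "attribute,value\nattribute_1,value_a\nattribute_2,value_b\n"
print(parse_metadata_csv(sample))
```

A later row with the same attribute name overwrites the earlier one here; whether duplicate attributes should be allowed is a design decision the slides leave open.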

Current DE metadata interface

[Mockup: metadata dialog for /iplant/home/user/file.txt. A two-column Attribute/Value table lists attribute_1 = value_a, attribute_2 = value_b, attribute_3 = value_c and attribute_4 = value_d. Buttons: Add, Delete, Templates (with a Browse Templates menu entry), OK, Cancel.]

2. Project data management templates and best practices for organizing, handling and managing data for diverse use cases, including:
– groups or consortia working on large-scale genome and transcriptome sequencing projects or species range maps
– single PI laboratories focused on specific analyses, such as RNA-Seq experiments or phenotype data sets


[Mockup: the metadata dialog for /iplant/home/user/file.txt with a Browse Templates panel open. The panel prompts "Select a template" and lists Metagenomic Sequence (MIMS), Eukaryotic Genome Sequence (MIGS), Genome Sequence in iDS and Item 1, next to an Attributes Preview pane. Buttons: Insert, Cancel.]

[Mockup: the Browse Templates panel with a template selected. The list shows Metagenomic Sequence (MIMS), Eukaryotic Genome Sequence (MIGS), Genome Sequence in iDS, Item 1, Item 3 and Item 5. The Attributes Preview pane lists: project, specimen identifier, collection date, geographic location nam…, geographic location longi…, geographic location latit…, genus, species, infraspecific name. Buttons: Insert, Cancel.]


[Mockup: metadata dialog for /iplant/home/user/file.txt after a template has been inserted. Under "Template: Metagenomic Sequence", the Attribute/Value table shows project* = jackson, specimen identifier = 54769, collection date* = 2008-01-23T19:23 and sequencing method* (empty). Attributes marked with an asterisk are required. Buttons: Add, Delete, Browse Templates, OK, Cancel.]

[Mockup: the metadata dialog with the Metagenomic Sequence template applied and an information tooltip open: "All of these are ISO8601 compliant time stamps: 2008-01-23T19:23:10+00:00…"]
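The tooltip implies that date attributes are checked against ISO 8601. A minimal sketch of such a check, using only the Python standard library, is shown below. The helper name `is_iso8601` is hypothetical, and `datetime.fromisoformat` accepts only a subset of ISO 8601, so this is an approximation rather than the validation the DE actually performs.

```python
from datetime import datetime


def is_iso8601(value):
    """Return True if value parses as an ISO 8601 date or date-time.

    datetime.fromisoformat covers common forms such as
    'YYYY-MM-DDTHH:MM' and 'YYYY-MM-DDTHH:MM:SS+HH:MM', not the
    full standard.
    """
    try:
        datetime.fromisoformat(value)
        return True
    except ValueError:
        return False


# The first two values come from the mockup; the third is a made-up bad input.
for candidate in ("2008-01-23T19:23", "2008-01-23T19:23:10+00:00", "23/01/2008"):
    print(candidate, is_iso8601(candidate))
```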


[Mockup: the metadata dialog with the Metagenomic Sequence template applied and the required sequencing method* attribute left empty. Validation message: "This field is required."]
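The required-field behaviour in the mockup can be sketched as a check of the entered values against the set of attributes a template marks with an asterisk. The names `REQUIRED` and `missing_required` are assumptions for this example, and the required set is read off the mockup rather than taken from any published template definition.

```python
# Attributes marked with * in the Metagenomic Sequence mockup.
REQUIRED = {"project", "collection date", "sequencing method"}


def missing_required(values, required=REQUIRED):
    """Return the required attribute names that are absent or blank."""
    return sorted(
        name for name in required
        if not str(values.get(name, "")).strip()
    )


# Values as shown in the mockup, with sequencing method still empty.
form = {
    "project": "jackson",
    "specimen identifier": "54769",
    "collection date": "2008-01-23T19:23",
    "sequencing method": "",
}

for name in missing_required(form):
    print(f"{name}: This field is required.")
```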

[Mockup: the metadata dialog with the Metagenomic Sequence template applied and sequencing method* set to "DOI#", so that every required attribute has a value.]

3. CI to support searching and browsing based on metadata attributes and suitable file content.
– provenance/system metadata and scientific metadata
– across both private data and public data
– ontology enhanced searches

Search capabilities

• Search API: users will be able to search by
– file or folder name
– any metadata attribute or value
– date created
– date last modified
– creator
– file size
– file type
– tool that created the file
– analysis that created a file or folder
– constraints (and, or, xor)
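To make the "constraints (and, or, xor)" item concrete, the sketch below combines two search criteria with a boolean operator. It is not the iPlant Search API: the `FileRecord` type, the helper names and the sample records are all invented for illustration.

```python
from dataclasses import dataclass, field
from operator import and_, or_, xor


@dataclass
class FileRecord:
    """Hypothetical stand-in for a searchable file entry."""
    name: str
    creator: str
    size: int
    metadata: dict = field(default_factory=dict)


def by_creator(creator):
    return lambda f: f.creator == creator


def by_metadata(attribute, value):
    return lambda f: f.metadata.get(attribute) == value


def combine(op, left, right):
    """Join two criteria with and_, or_ or xor from the operator module."""
    return lambda f: op(left(f), right(f))


# Made-up records, used only to exercise the predicates.
files = [
    FileRecord("a.fastq", "jackson", 100, {"genus": "Zea"}),
    FileRecord("b.fastq", "jackson", 200, {"genus": "Oryza"}),
    FileRecord("c.fastq", "smith", 300, {"genus": "Zea"}),
]

query = combine(xor, by_creator("jackson"), by_metadata("genus", "Zea"))
matches = [f.name for f in files if query(f)]
print(matches)
```

Representing each criterion as a predicate keeps the constraint operators independent of the fields being searched, so the remaining criteria in the list (dates, file size, file type) could be added as further predicate factories.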

Search capabilities

• Users will be able to make "smart folders", that is, folders for all the files that match a set of search criteria.
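One way to picture a smart folder is as a saved set of search criteria that is re-evaluated whenever the folder is opened. The sketch below assumes that model; the `SmartFolder` class, the in-memory `catalog` dictionary and every path other than /iplant/home/user/file.txt are hypothetical.

```python
class SmartFolder:
    """A named, saved set of attribute/value criteria (hypothetical model)."""

    def __init__(self, name, criteria):
        self.name = name
        self.criteria = dict(criteria)

    def contents(self, catalog):
        """Return the paths whose metadata matches every criterion."""
        return sorted(
            path for path, metadata in catalog.items()
            if all(metadata.get(k) == v for k, v in self.criteria.items())
        )


# catalog maps a file path to its metadata; a plain dict stands in for the iDS.
catalog = {
    "/iplant/home/user/file.txt": {"project": "jackson"},
    "/iplant/home/user/other.txt": {"project": "other"},
}

folder = SmartFolder("jackson files", {"project": "jackson"})
print(folder.contents(catalog))

# A file added later shows up without changing the folder definition.
catalog["/iplant/home/user/new.txt"] = {"project": "jackson"}
print(folder.contents(catalog))
```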

4. Support for unstructured, semi-structured, and structured (relational) data within the iDS.
– Document-based and NoSQL approaches to support unstructured and semi-structured data
– Support for large matrix based data sets (e.g., in GBS, GWAS, etc.)
– A way for users to search and access data in iPlant-hosted projects that include MySQL and PostgreSQL databases

5. Interoperability with key external data sources, including, but not limited to:
– Ability to use external data in analyses run through iPlant, e.g., import from BioMart
– Access to databases like CoGe, PO, MaizeGDB
– Ability to push/publish/link data housed in iDS to canonical public repositories like NCBI, Data Dryad
– Ability to engage semantic services and semantic pipelines based on metadata and ontological reasoning systems.

6. Benefits/features that are aligned with the use of popular file types.
– provide the suitable utilities, tools, integration, and documentation on best data management practices for projects utilizing these formats

7. An iPlant Data Commons that provides stable access to objects in the iDS that includes:
– The option to make data public and permanent (un-editable).
– Issuing multiple permanent identifiers (unique IDs) as needed (e.g., DOI, NOID, ARK) while packaging the content in standards-compliant formats.