Introducing The Whole Tale: Merging Science and Cyberinfrastructure Pathways

WT-Exec Team
Illinois: B. Ludaescher, M. Turk, V. Stodden, K. Turner (Proj Mgr), TBD (sw architect)
U Chicago: K. Chard
TACC: N. Gaffney
UCSB: M. Jones
Notre Dame: J. Nabrzyski
& WT-dev team, WT collaborators, WT Working Groups, Summer Interns

20160608 whole tale for connecting journals to data repositories




Problems Facing Data Researchers

Workflow for data research is fragmented

● Ingestion: Data comes from many sources and is integrated the “old-fashioned way,” via chains of email

● Storage/Sharing: A collection of cloud services copies data from Dropbox and Box to local storage, with distributed directory structures used to organize the data and make it discoverable

● Use: Actions taken on data are not recorded (custom scripts, some version of a community-developed and supported codebase)

● Output: Final data is published as prescribed by a Data Management Plan (hopefully with a DOI) and linked from publications, but this alone does not make the work reproducible


Enter the Whole Tale (WT)

WT will leverage and contribute to existing CI and tools to support the whole science story (= run-to-pub-cycle), also providing access to big CI and HPC for long-tail researchers.

➡ Integrated tools to simplify usage and promote best practices.


The Whole Tale’s Approach

WT will integrate well-established CI components, creating a simple and unified environment to use, share, and publish data and workflows

1. Unified Authentication via Globus Auth
2. Abstracted Storage Layer with a unified namespace
3. Python and R APIs integrated with Jupyter Notebook Environments
4. Ingest and publication service linking data, computations, and scholarly articles
5. OwnCloud desktop integration for a “Dropbox-like interface”
6. Event System to react to changes (e.g. new data published)
7. Data Dashboard to ease data management and service interactions

Capture full workflow via notebooks, scripts, and applications to be published along with data and research publications
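To make item 6 concrete, here is a minimal sketch of how an event system could react to a "new data published" change. The slides do not describe the real design, so everything here is an assumption for illustration: the `EventBus` class, the `"data.published"` event name, and the payload fields are hypothetical and are not part of any Whole Tale API.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical names throughout: EventBus, "data.published", and the payload
# fields are illustrative only, not a Whole Tale interface.
Handler = Callable[[dict], None]


class EventBus:
    """Minimal in-process publish/subscribe registry."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        """Register a handler to be called for every event of this type."""
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> int:
        """Call every handler registered for event_type; return how many ran."""
        handlers = list(self._handlers.get(event_type, []))
        for handler in handlers:
            handler(payload)
        return len(handlers)


notifications: List[str] = []


def notify_subscriber(payload: dict) -> None:
    # A real service might send email or update a dashboard; here we only
    # record a message so the behavior is easy to inspect.
    notifications.append(f"New dataset published: {payload['identifier']}")


bus = EventBus()
bus.subscribe("data.published", notify_subscriber)
delivered = bus.publish("data.published", {"identifier": "example-dataset-1"})
```

A production event system would more likely sit on a message queue shared between services; the in-process registry only shows the subscribe/publish contract.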


Astronomy with Whole Tale

Allen uses university credentials to access large cosmological simulation outputs from Blue Waters published into WT, does analysis using Whole Tale services in a Jupyter Notebook, and creates a new result, published back in WT, obtaining a DOI for the linked data and source code, tied to the original input data. Allen references this DOI in a research paper.

Beth finds the DOI and is able to access the data and analysis, then compares model output with new observations from the Hobby-Eberly Telescope Dark Energy Experiment on TACC systems. A new DOI is created for these results, and Allen is notified about them.
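The scenario above depends on each published result recording what it was derived from. The following is a rough sketch of that bookkeeping, not the project's actual data model: `PublishedObject`, `lineage`, `authors_to_notify`, the placeholder identifiers, and the "Simulation team" author are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PublishedObject:
    identifier: str  # placeholder string standing in for a DOI
    author: str
    derived_from: List[str] = field(default_factory=list)


def lineage(identifier: str, registry: Dict[str, PublishedObject]) -> List[str]:
    """Return all upstream identifiers, nearest first, without duplicates."""
    seen: List[str] = []
    queue = list(registry[identifier].derived_from)
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue
        seen.append(current)
        if current in registry:
            queue.extend(registry[current].derived_from)
    return seen


def authors_to_notify(identifier: str, registry: Dict[str, PublishedObject]) -> List[str]:
    """Authors of upstream objects, excluding the author of the new object."""
    new_author = registry[identifier].author
    result: List[str] = []
    for upstream in lineage(identifier, registry):
        obj = registry.get(upstream)
        if obj and obj.author != new_author and obj.author not in result:
            result.append(obj.author)
    return result


# Hypothetical records mirroring the Allen/Beth story.
registry = {
    "id:simulation-output": PublishedObject("id:simulation-output", "Simulation team"),
    "id:allen-analysis": PublishedObject(
        "id:allen-analysis", "Allen", ["id:simulation-output"]
    ),
    "id:beth-comparison": PublishedObject(
        "id:beth-comparison", "Beth", ["id:allen-analysis"]
    ),
}

upstream_ids = lineage("id:beth-comparison", registry)
to_notify = authors_to_notify("id:beth-comparison", registry)
```

With these records, Beth's result traces back through Allen's analysis to the simulation output, which is the chain that lets Allen be notified when Beth publishes.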


Further Augmented Publications

• Imagine a paper with a figure (or table) that represents a view of a dataset, or a simulation

• In the paper, imagine you can click on the figure – what should happen?

• Today, the figure link might open a larger version of the figure, perhaps published with a DOI and cited in the paper

• Or a dataset (particularly likely for a table) that you can download, maybe also published with a DOI and cited in the paper

• Resources hosted by the article publisher, or a third party service (e.g., figshare, datadryad)

• Whole Tale will enable data collections (including data, software, workflow, etc.) to be automatically created and published as part of a research workflow

• Whole Tale further vision: When you click on the image, you connect to a service that loads the data and provides some ways to represent it, or to a workflow that is set at the point at which the figure/table was generated

• Publishers don’t need to do anything new to make this work


Base Share Integrate Reproduce Operation

• Ingest data from HTTP, Globus, and DataONE

• Store data in a private cloud based home directory

• Move and manage data in iRODS

• Interact with data using Jupyter

• Manage data across ownCloud & iRODS

• Authenticate using ORCID

• Interact with data through a suite of frontends

• Automatically extract key metadata

• Search and manage distributed data from within frontends

• Operate on remote data as if it were local (including using OAI-ORE)

• Utilize a single identity across services

• Discover and share frontends through global repository

• Integrate data and workflows with publications

• Issue, resolve, and track identifiers for distributed data

• Discover data using federated and distributed queries

• Track provenance across services

• Organize data collections via user-defined namespaces
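Two of the capabilities above, operating on remote data as if it were local and organizing collections via user-defined namespaces, can be pictured as a mapping from logical paths to storage backends. The sketch below is one possible reading, not the Whole Tale storage layer: the `Namespace` class, the longest-prefix rule, and the example URIs are all assumptions made for illustration.

```python
from typing import Dict


class Namespace:
    """Maps user-defined logical path prefixes to backend locations."""

    def __init__(self) -> None:
        self._mounts: Dict[str, str] = {}

    def mount(self, prefix: str, backend_uri: str) -> None:
        """Attach a backend location under a logical path prefix."""
        self._mounts[prefix.rstrip("/")] = backend_uri.rstrip("/")

    def resolve(self, logical_path: str) -> str:
        """Translate a logical path to a backend URI; the longest prefix wins."""
        candidates = [
            prefix
            for prefix in self._mounts
            if logical_path == prefix or logical_path.startswith(prefix + "/")
        ]
        if not candidates:
            raise KeyError(f"no backend mounted for {logical_path}")
        prefix = max(candidates, key=len)
        return self._mounts[prefix] + logical_path[len(prefix):]


# Placeholder backends; the hosts and zone names are made up.
ns = Namespace()
ns.mount("/home/allen", "irods://example-zone/home/allen")
ns.mount("/home/allen/shared", "https://data.example.org/shared")

shared_uri = ns.resolve("/home/allen/shared/run1.h5")
notes_uri = ns.resolve("/home/allen/notes.txt")
```

The user only ever sees paths under `/home/allen`; which backend actually holds a file is decided by the mount table.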