DataONE: A Distributed Earth Science Data Network Supporting the Full Data Life Cycle

Robert Cook, Oak Ridge National Laboratory

W. Michener, D. Vieglais, A. Budden, and R. Koskela, University of New Mexico


Agenda

• Introduction

• Approach

• Cyberinfrastructure

• Community Engagement

• DataONE and Data Life Cycle


Objectives: Enable Science


Figure axes: Decreasing Spatial Coverage; Increasing Process Knowledge

Adapted from CENR-OSTP

Remote sensing

Intensive science sites and experiments

Extensive science sites

Volunteer & education networks

“Building the Knowledge Pyramid” 80:20 20:80

Objectives: Solve Data Challenges


DataONE Vision and Approach

Enable new science and knowledge creation through universal access to data about life on earth and the environment that sustains it.

1. Build on existing cyberinfrastructure

2. Create new cyberinfrastructure

3. Support communities of practice


Three major components for a flexible, scalable, sustainable network

Member Nodes
• diverse institutions
• serve local community
• provide resources for managing their data
• retain copies of data

Coordinating Nodes
• retain complete metadata catalog
• indexing for search
• network-wide services
• ensure content availability (preservation)
• replication services

Investigator Toolkit
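
The division of labor between Member Nodes and Coordinating Nodes can be illustrated with a toy model. This is only a sketch under stated assumptions, not the DataONE API: the class names, method names, node names, and identifier below are all hypothetical, and real replication policy is certainly more involved. It shows a Member Node holding data, a Coordinating Node holding the metadata catalog used for search, and a replication step that copies an object to further Member Nodes until a target number of copies exists.

```python
from dataclasses import dataclass, field


@dataclass
class MemberNode:
    """Hypothetical stand-in for a repository that holds data objects."""
    name: str
    objects: dict = field(default_factory=dict)  # identifier -> content bytes

    def store(self, identifier, content):
        self.objects[identifier] = content


@dataclass
class CoordinatingNode:
    """Hypothetical catalog: keeps metadata and tracks where copies live."""
    catalog: dict = field(default_factory=dict)    # identifier -> metadata dict
    locations: dict = field(default_factory=dict)  # identifier -> set of node names

    def register(self, node, identifier, metadata):
        # Record the metadata and note which Member Node holds the object.
        self.catalog[identifier] = metadata
        self.locations.setdefault(identifier, set()).add(node.name)

    def search(self, keyword):
        # Case-insensitive match on the (assumed) "title" metadata field.
        kw = keyword.lower()
        return sorted(
            ident for ident, meta in self.catalog.items()
            if kw in meta.get("title", "").lower()
        )

    def replicate(self, identifier, nodes, copies=2):
        # Copy the object to additional nodes until `copies` holders exist.
        holders = self.locations[identifier]
        source = next(
            n for n in nodes if n.name in holders and identifier in n.objects
        )
        for node in nodes:
            if len(holders) >= copies:
                break
            if node.name not in holders:
                node.store(identifier, source.objects[identifier])
                holders.add(node.name)
        return sorted(holders)
```

In this sketch the Coordinating Node never stores the data itself; it only records metadata and the locations of copies, which mirrors the split described in the bullets above.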

Cyberinfrastructure


Operational core infrastructure
• Three coordinating nodes: ORC, UCSB, UNM
• Seven member nodes: KNB, SANParks, Dryad, ORNL DAAC, Merritt, USGS, Avian Knowledge Network
• Essential investigator toolkit components:
  • Search interface (ONE Mercury)
  • ONE-R Plugin
  • Developer tools in Python and Java
• Design and component documentation

July 2012 Cyberinfrastructure Release


Data Life Cycle

Plan → Collect → Assure → Describe → Preserve → Discover → Integrate → Analyze


Community Involvement through Working Groups


Engagement Research
• Community Engagement and Education
• Sociocultural Barriers to Data Sharing / Preservation
• Public Participation in Science and Research
• Sustainability and Governance

Infrastructure Research
• Federated Security
• Data Integration and Semantics
• Data Preservation and Metadata
• Distributed Storage
• Scientific Workflows and Provenance

• Exploration, Visualization, and Analysis
• Usability and Assessment

Working Groups Explore the Entire Life Cycle


Working Groups and the Data Life Cycle

Plan → Collect → Assure → Describe → Preserve → Discover → Integrate → Analyze


Public Participation in Science and Research

Data Integration and Semantics

Scientific Workflows and Provenance

Distributed Storage

Exploration, Visualization, and Analysis

Sociocultural Barriers to Data Sharing / Preservation

Data Preservation and Metadata

Spatio-Temporal Exploratory Model identifies factors affecting patterns of migration

Diverse bird observations and environmental data from 300,000 locations in the US, integrated and analyzed using high-performance computing resources

Land Cover

Meteorology

MODIS – Remote sensing data

• Analyze patterns of migration

• Predict future bird distributions

Model results: Occurrence of Indigo Bunting (2008), with panels for Jan, Apr, Jun, Sep, and Dec

Exploration, Visualization, and Analysis Working Group


Plan → Collect → Assure → Describe → Preserve → Discover → Integrate → Analyze

DataONE Investigator Toolkit – 2012


Data Management Planning Tool http://dmp.cdlib.org


Plan → Collect → Assure → Describe → Preserve → Discover → Integrate → Analyze

DataONE Investigator Toolkit – future

Morpho


DataONE and the Data Life Cycle

• Education and training: Providing essential skills (e.g., data management training, best practices) for scientific enquiry

• Discovery and access: Enabling discovery and universal access to data about life on earth from around the world

• Data integration, visualization, and synthesis: Providing transformational tools that enable cross-cutting research

• Building community: Combining expertise and resources across diverse communities to collectively educate, advocate, and support the scientific data life cycle


DataONE Team and Sponsors

• Bertram Ludaescher

• Peter Honeyman

• Jeff Horsburgh

• Robert Sandusky

• Peter Buneman

• Carole Goble

• Cliff Duke

• Donald Hobern

• Ewa Deelman

• Amber Budden, Roger Dahl, Rebecca Koskela, Bill Michener, Robert Nahf, Mark Servilla

• Patricia Cruse, John Kunze

• Dave Vieglais

• Paul Allen, Rick Bonney, Steve Kelling

• Chad Berkley, Stephanie Hampton, Matt Jones

• Suzie Allard, Carol Tenopir, Maribeth Manoff, Robert Waltz, Bruce Wilson

• John Cobb, Bob Cook, Giri Palanisamy, Line Pouchard, Suresh Santhana Vannan

• Mike Frame, Viv Hutchison, Jeff Morisette, Jake Weltzin, Lisa Zolly

• David DeRoure

• Ryan Scherle, Todd Vision

LEON LEVY FOUNDATION

• Randy Butler