The SEAD Prototype: Data Curation and Preservation for Sustainability Science

Beth Plale, Robert H. McDonald, Kavitha Chandrasekar, Inna Kouper, Indiana University, {plale, rhmcdona, kavchand, inkouper}@indiana.edu
Margaret Hedstrom, James Myers, University of Michigan, {hedstrom, myersjd}@umich.edu
Praveen Kumar, Rob Kooper, Luigi Marini, University of Illinois at Urbana-Champaign, {kumar1, kooper, lmarini}@illinois.edu

SEAD Vision and Rationale

  • Serve interdisciplinary and data-driven research in sustainability science
  • Enable access to publications, data and people
  • Support new types of analyses with heterogeneous data
  • Reduce overall cost of data curation and preservation
  • Capture metadata to provide immediate value for users, producers and repositories
  • Increase capabilities for research data re-use

SEAD Use Cases (focusing on curation)

  • Ingestion of heterogeneous data types (e.g., images, geo-spatial data, and sensor data) and mapping of semantic relationships among the research data collections as well as semantic annotation and tagging.
  • Support of data discovery through interoperable standards and algorithms, social networking and data publishing.
  • Enhancements of existing data through automated scientific metadata extraction and data visualization plugins.
  • Ingestion of new data sets directly via workbench tools.
  • Curation of data via federated deposit into institutional and disciplinary repositories.
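
The ingestion and metadata-extraction use cases above can be illustrated with a small sketch. It is not SEAD's extractor code: the function names (`extract_metadata`, `extract_csv`), the record fields and the suffix-based dispatch are all assumptions made for illustration. It shows one way an ingest step might attach generic metadata (size, checksum, guessed MIME type, user tags) to any file and add type-specific fields when a matching extractor exists.

```python
import csv
import hashlib
import mimetypes
from pathlib import Path


def extract_csv(path: Path) -> dict:
    """Hypothetical extractor for tabular sensor data: header and row count."""
    with path.open(newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle))
    return {"columns": rows[0] if rows else [], "row_count": max(len(rows) - 1, 0)}


# Registry of type-specific extractors, keyed by file suffix (an assumption;
# a real system could dispatch on MIME type or content sniffing instead).
EXTRACTORS = {".csv": extract_csv}


def extract_metadata(path: Path, tags=()) -> dict:
    """Build a metadata record for one ingested file."""
    mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    record = {
        "name": path.name,
        "size_bytes": path.stat().st_size,
        "mime_type": mime,
        "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
        "tags": list(tags),  # user-supplied annotations
    }
    extractor = EXTRACTORS.get(path.suffix.lower())
    if extractor is not None:
        record.update(extractor(path))  # add type-specific fields
    return record
```

Calling `extract_metadata(Path("gauge.csv"), tags=["sensor"])` on a hypothetical sensor file would return the generic fields plus `columns` and `row_count`; files without a registered extractor get the generic fields only.
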

SEAD Prototype

Branded Public Access – Active Project Spaces – Individual Data Pages

Active Content Repository

  • Data pages; Collection pages; Tag – Search – Map; Project Summary; Geo-Web App; Branded Repository; Android – Desktop Apps
  • APIs – Web Services
  • Role-based Access Control; Data/Metadata Management; Extractors and Indexing; User Management
  • RDF – Tupelo 2 – Medici – Lucene – Geoserver
  • MySQL – Local File System

VIVO

  • People; Projects; Publications; Organizations; Data Citations; Visualizations
  • APIs – Joseki – Web Services
  • Jena – RDF
  • MySQL – Local File System

Virtual Archive

  • Curator's Workbench; Ingest Processing; Matchmaking; Faceted Search; Geo-spatial Search
  • APIs – Web Services
  • Metadata Extraction; Persistent IDs; Indexing; Archiving
  • Solr Query (XML); Geospatial Query
  • MySQL – Local File System – Solr – PostGIS
  • BagIt Conversion; Matchmaker; DataONE Member Node

Acknowledgements

SEAD is funded by the National Science Foundation under Cooperative Agreement #OCI0940824.

SEAD gratefully acknowledges all of our partner participants who have been involved in developing our services framework. This includes the research teams from the following organizations: School of Information, University of Michigan; Department of Civil and Environmental Engineering, the National Center for Supercomputing Applications (NCSA) and UIUC Libraries, University of Illinois at Urbana-Champaign; Data to Insight Center, IU Libraries and School of Informatics and Computing, Indiana University; the Inter-university Consortium for Political and Social Research (ICPSR); the National Center for Earth-surface Dynamics (NCED); and the Data Conservancy Project, Johns Hopkins University.

SEAD currently implements core functionality for uploading, annotating, and viewing data; for linking data to researcher profiles; and for packaging this information and transferring it to institutional repositories or archival cloud storage. The curation pipeline to institutional repositories supports both long-term preservation and search and discovery workflows. The SEAD prototype is being tested by ingesting, annotating, and preserving datasets from the National Center for Earth-surface Dynamics (1.6 terabytes of data in over 450,000 files), which involves transferring data and metadata between the SEAD ACR, VIVO and VA components.

Active Curation, Actionable Data (ACR)

Community Exploration, Research Analytics (VIVO)

  • People / Projects / Publications
  • Data Citations
  • Organizations
  • Visualized Networks and Community Dynamics

Data Publication, Preservation and Discovery (VA)

  • Policy-Driven Curation
  • Institutional / Cloud / Grid Storage
  • Faceted Search

Links between components: SPARQL / HTTP (two links) and BagIt

User / Entity Management – Analytics
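
The packaging step described above, where data and metadata are bundled for transfer over the BagIt link, can be sketched as follows. This is a minimal illustration of the general BagIt layout (a `bagit.txt` declaration, a `data/` payload directory and a checksum manifest), not SEAD's BagIt conversion code. The function names `make_bag` and `verify_bag`, the choice of SHA-256 and the version string are assumptions.

```python
import hashlib
import shutil
from pathlib import Path


def make_bag(source_dir: Path, bag_dir: Path) -> Path:
    """Copy source_dir into a minimal BagIt-style bag at bag_dir."""
    payload = bag_dir / "data"
    shutil.copytree(source_dir, payload)  # also creates bag_dir

    # Bag declaration; the version string here is illustrative.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # Payload manifest: one "<checksum>  <path relative to bag>" line per file.
    lines = []
    for path in sorted(p for p in payload.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {path.relative_to(bag_dir).as_posix()}")
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(lines) + "\n", encoding="utf-8"
    )
    return bag_dir


def verify_bag(bag_dir: Path) -> bool:
    """Recompute payload checksums and compare them with the manifest."""
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue  # tolerate an empty manifest
        digest, rel_path = line.split(maxsplit=1)
        actual = hashlib.sha256((bag_dir / rel_path).read_bytes()).hexdigest()
        if actual != digest:
            return False
    return True
```

A receiving repository could call `verify_bag` after transfer to confirm that no payload file changed in transit. Reading whole files into memory keeps the sketch short; collections on the scale mentioned above would need streamed hashing.
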

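The SPARQL / HTTP links between components can be illustrated with a sketch of how a client might form a query request. Everything specific here is an assumption: the endpoint URL is a placeholder, the query uses the FOAF vocabulary rather than any SEAD or VIVO schema, and only the request URL is built, so no server is contacted. It follows the standard convention of passing the query text in a `query` parameter of a GET request.

```python
from urllib.parse import urlencode

# Illustrative query: list up to ten people and their names (FOAF vocabulary).
PEOPLE_QUERY = """\
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name
WHERE {
  ?person a foaf:Person ;
          foaf:name ?name .
}
LIMIT 10
"""


def build_sparql_url(endpoint: str, query: str) -> str:
    """Return a GET URL carrying the URL-encoded query text."""
    return endpoint + "?" + urlencode({"query": query})


# Placeholder endpoint, not a real SEAD service.
url = build_sparql_url("http://example.org/sparql", PEOPLE_QUERY)
```

The resulting URL could be fetched with any HTTP client, with an `Accept` header selecting the result format the endpoint supports.
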
A poster presented at ESIP July 2013
