Big Data Initiatives for Agroecosystems

Cynthia Parr, Knowledge Services Division, National Agricultural Library

Ecological Society of America, 2015

Outline

• Data management at the National Agricultural Library

• Four examples
1. Insects 5K – i5K Workspace
2. Life Cycle Assessment
3. Long-Term Agroecosystem Research
4. Ag Data Commons

• General principles

8.1 million items, Agricola, PubAg

http://blog.thingarage.com/

raw data

citable publication

raw data collection

cleaning, enrichment, analysis

registration, preservation

temporary data

referable data

citable data

citable publication

Modified from Peter Wittenberg, Research Data Alliance, https://rd-alliance.org/group/data-fabric-ig.html

i5k.nal.usda.gov

Genome project hosting at the i5k Workspace

• 27 pilot genomes hosted; 45 total
– Storage and dissemination of a genome assembly and anything mapped to it
– BLAST, JBrowse Genome Browser
• Manual Curation: Web Apollo
• Post-curation maintenance
– Quality Control
– Official Gene Set generation

Genome Project Trajectory

• Research plan
• Generate material
• Sequencing
• Assembly
• Automated annotation
• Manual Curation
• Official gene set generation
• Genome project maintenance
• Biological insights/Publication

Life Cycle Assessment Commons

www.lcacommons.gov

Unformatted, non-standard

LCA Commons Concept

LCA Community

Open LCA Framework

Common computing environment, application, data standards, and development

NAL LCADC

NREL USLCI

XYZ LCI DB

ABC LCI DB

Distributed computing environment & application

Common data standards

Distributed computing environment

DEF LCI DB

Common application & data standards

Interoperability Tools

Ag Data Commons

Catalog and Repository

Long Term Agro-ecosystem Research (LTAR)

LTAR Data

Common Observatory
– Meteorology
– Hydrology
– Eddy flux CO2
– Non-CO2 gases
– Soil
– Biological

Common Experiment Approach
– Business as usual
– Aspirational

Will include data about
– Management practices
– Results

LTAR Data Loss

N=194 of ~500 citations in 2011 LTAR site proposals

Bad links to data

No data available

80% of papers provide no way to obtain data

Data are accessible

Refers to general data source

LTAR information management

• Support for download of files, web services
• Metadata in FGDC CSDGM, ISO 19115, EML, Project Open Data
• Catalog of instrument specs using SensorML 2
• Data dictionaries in ISO 19110
• Weather data to be converted to other formats
• Field names could be converted to match different conventions (AgMIP, etc.)
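The last bullet, converting field names to match another convention, could be sketched roughly as below. This is only an illustration: the column names on both sides of `FIELD_MAP` and the helper `rename_fields` are hypothetical and are not taken from the slides, from LTAR, or from the AgMIP specification.

```python
# Hypothetical mapping: local column name -> name in some target convention.
# These names are invented for illustration only.
FIELD_MAP = {
    "obs_date": "date",
    "air_temp_c": "tavg",
    "precip_mm": "rain",
}


def rename_fields(record, field_map):
    """Return a copy of `record` with keys renamed according to `field_map`.

    Keys without an entry in `field_map` are kept unchanged.
    """
    return {field_map.get(key, key): value for key, value in record.items()}


# One made-up weather observation using the local names.
record = {"obs_date": "2015-08-10", "air_temp_c": 24.1, "site": "A"}
converted = rename_fields(record, FIELD_MAP)
```

Keeping the mapping as data rather than code means one table per target convention could be maintained alongside the data dictionary.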

Ag Data Commons

data.nal.usda.gov

Enhanced DKAN

Distributed repositories

AG DATA COMMONS

Search & Knowledge Discovery

Thesaurus & Indexing

Ag Data Commons Repository

Organization & Curation

Grant management systems

INGESTION DISSEMINATION

PubAg

Dataset Submission

Analytics & Tools

Data.gov

Forest Service

NCBI

Ag Data Commons

Catalog

Color Legend: Building, Adapt/Re-use, Existing

LCA Commons

Guiding principle 1: a distributed network …

Geospatial Catalog

Geospatial Repository

STEWARDS

Ag Data Commons (catalog)

Ag Data Commons (repository)

USDA Enterprise Inventory

National Weather Service

Data.gov

Ecosystems.data.gov

of Networks…

Public access to open, machine-readable data enables larger-scale, integrative, and innovative data science

The long tail

Guiding principle 2: big data AND long tail

Guiding principle 3: curation adds value

• Data dictionaries
• Standards & templates
• Linkages
• Semantics
• Preservation
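As one possible illustration of how a data dictionary adds value, the sketch below checks a record against a small dictionary and reports missing, mistyped, or undocumented fields. The dictionary entries, field names, and the helper `check_record` are all hypothetical; they do not describe the Ag Data Commons implementation or the ISO 19110 model.

```python
# Hypothetical data dictionary: field name -> expected type, unit, description.
# All entries are invented for illustration.
DATA_DICTIONARY = {
    "site_id": {"type": str, "unit": None, "description": "Site identifier"},
    "soil_temp": {"type": float, "unit": "degC", "description": "Soil temperature"},
}


def check_record(record, dictionary):
    """Return a list of human-readable problems found in `record`."""
    problems = []
    # Every documented field should be present with the documented type.
    for name, spec in dictionary.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], spec["type"]):
            problems.append(f"wrong type for {name}")
    # Fields that appear in the data but not in the dictionary are flagged.
    for name in record:
        if name not in dictionary:
            problems.append(f"undocumented field: {name}")
    return problems


good = check_record({"site_id": "A1", "soil_temp": 12.5}, DATA_DICTIONARY)
bad = check_record({"site_id": "A1", "depth": 10}, DATA_DICTIONARY)
```

A curator could run a check like this at submission time, so that gaps in documentation are caught before a dataset is published.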

Thanks!

National Agricultural Library, Knowledge Services Division: Susan McCarthy

LTAR: Jeffrey Campbell, Charles Lockwood

i5K: Monica Poelchau, Chris Childers

LCA Commons: Peter Arbuckle, Ezra Kahn

Ag Data Commons: Ursula Pieper, Jocelyn McNamara, Qing Qu, Erin Antognoli, Melissa Lowrey, Jaylen Nathwani, NuCivic

… and collaborators and testers
