P. Bryan Heidorn, University of Arizona and JRS Biodiversity Foundation. 8 August 2011. Impacto de la informática en el conocimiento de la biodiversidad: actualidad y futuro, Universidad Nacional de Colombia and Instituto de Ciencias Naturales, Bogotá. Biodiversity Informatics: An Interdisciplinary Challenge. Adapted in part from 2010 KENYA’S INTERNATIONAL CONFERENCE ON BIODIVERSITY, LAND USE AND CLIMATE CHANGE, NAIROBI, 15th to 17th September 2010.

Biodiversity Informatics: An Interdisciplinary Challenge


DESCRIPTION

“Impacto de la Informática en el Conocimiento de la Biodiversidad: Actualidad y Futuro” at Universidad Nacional de Colombia on August 12, 2011. https://sites.google.com/site/simposioinformaticaicn/home


Page 1: Biodiversity Informatics: An Interdisciplinary Challenge

P. Bryan Heidorn
University of Arizona and JRS Biodiversity Foundation

8 August 2011
Impacto de la informática en el conocimiento de la biodiversidad: actualidad y futuro
Universidad Nacional de Colombia and Instituto de Ciencias Naturales, Bogotá

Biodiversity Informatics: An Interdisciplinary Challenge

Adapted in part from 2010 KENYA’S INTERNATIONAL CONFERENCE ON BIODIVERSITY, LAND USE AND CLIMATE CHANGE, NAIROBI, 15th to 17th September 2010

Page 2: Biodiversity Informatics: An Interdisciplinary Challenge

University of Arizona

Page 3: Biodiversity Informatics: An Interdisciplinary Challenge

Biodiversity Informatics

The development and use of information technology-based sociotechnical systems to document, understand and protect biological diversity particularly at the organismal level.

Page 4: Biodiversity Informatics: An Interdisciplinary Challenge

Main Themes

• Cyberinfrastructure enabled science
• Greater reuse of data
• Mobilization of analog data
• Data integration
• Distributed collaborative research
• Citizen science
• High volume and high computation

Page 5: Biodiversity Informatics: An Interdisciplinary Challenge

Cyberinfrastructure Vision

“The anticipated growth in both the production and repurposing of digital data raises complex issues not only of scale and heterogeneity, but also of stewardship, curation and long-term access.”

NSF Cyberinfrastructure Vision for 21st Century Discovery, Chapter 3

Page 6: Biodiversity Informatics: An Interdisciplinary Challenge

Recognition of need for data curation

“Recommendation 6: The NSF, working in partnership with collection managers and the community at large, should act to develop and mature the career path for data scientists and to ensure that the research enterprise includes a sufficient number of high-quality data scientists.”

Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century, Recommendations

Page 7: Biodiversity Informatics: An Interdisciplinary Challenge

• Recognition of the importance of Information

• Recognition of the need for education

• New work roles within traditional institutions

Interagency Working Group on Digital Data

Page 8: Biodiversity Informatics: An Interdisciplinary Challenge

Dark data is the data that we know is/was there but we can’t see it.

Hubble Space Telescope composite image "ring" of dark matter in the galaxy cluster Cl 0024+17

Page 9: Biodiversity Informatics: An Interdisciplinary Challenge

Does NSF’s Data Follow the Power Law?

I do not know, but if $1 = X bytes…

[Chart: "Awarded Amount 2007". Vertical axis: award amount, $0 to $7,000,000. Horizontal axis: awards numbered 1 to 8776.]

Heidorn, P. Bryan (2008). Shedding Light on the Dark Data in the Long Tail of Science. Library Trends 57(2), Fall 2008. Institutional Repositories: Current State and Future. Edited by Sarah Shreeves and Melissa Cragin. (http://hdl.handle.net/2142/9127).
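The "$1 = X bytes" thought experiment can be made concrete with a few lines of code. This is only a sketch under that assumption (data volume proportional to award size). The helper `tail_share` and the award amounts are hypothetical placeholders, not NSF figures.

```python
def tail_share(amounts, head_fraction=0.2):
    """Fraction of the total held by awards outside the largest `head_fraction`."""
    if not amounts:
        raise ValueError("amounts must not be empty")
    ranked = sorted(amounts, reverse=True)
    # Number of awards counted as the "head" of the distribution (at least one).
    head_count = max(1, round(len(ranked) * head_fraction))
    return sum(ranked[head_count:]) / sum(ranked)

# Made-up award amounts in dollars (placeholders, not NSF data).
example_awards = [5_000_000, 1_000_000, 500_000, 250_000, 250_000,
                  200_000, 200_000, 200_000, 200_000, 200_000]

# If $1 = X bytes, the same fraction applies to the data volume in the tail.
share_in_tail = tail_share(example_awards)
```

Run against real award records, the same function would show how much of the total sits in the many small awards rather than the few large ones.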

Page 10: Biodiversity Informatics: An Interdisciplinary Challenge

The Future is all about Data

• How do we get it?
• How do we analyze it?
• How do we disseminate it (maps, charts, tables...)?
• How do we keep it?
– Provenance, Storage, Weeding
• How do we make it sustainable?

Page 11: Biodiversity Informatics: An Interdisciplinary Challenge

Data Repurposing

From: To stand the test of time: Long-term stewardship of digital data sets in science and engineering. Sept 26-27, 2006, Arlington, VA

Page 12: Biodiversity Informatics: An Interdisciplinary Challenge

Where is your data now?

Is it doing good or is it sleeping or dead?

Page 13: Biodiversity Informatics: An Interdisciplinary Challenge

Cyberinfrastructure Needs

• Storage
• Access
• Processing
• Communication
• Training
• Institutions

Page 14: Biodiversity Informatics: An Interdisciplinary Challenge

The iPlant Collaborative

Cyberinfrastructure to Support the Challenges of Modern Biology

Society for Experimental Biology, Glasgow, UK

July 3rd, 2011

Dan Stanzione

Co-PI and Cyberinfrastructure Lead, iPlant Collaborative

Deputy Director, Texas Advanced Computing Center


Page 15: Biodiversity Informatics: An Interdisciplinary Challenge

What is iPlant?

• iPlant’s mission is to build the CI to support plant biology’s Grand Challenge solutions

• Grand Challenges were not defined in advance, but identified through engagement with the community

• A virtual organization with Grand Challenge teams relying on national cyberinfrastructure

• Long term focus on sustainable food supply, climate change, biofuels, ecological stability, etc

• Hundreds of participants globally… Working group members at >50 US institutions, USDA, DOE, etc.

Page 16: Biodiversity Informatics: An Interdisciplinary Challenge

Brief History

• Funding by NSF – February 1st, 2008
• iPlant Kickoff Conference at CSHL – April 2008
o ~200 participants
• Grand Challenge Workshops – Sept-Dec 2008
• CI workshop – Jan 2009
• Grand Challenge White Paper Review – March 2009
• Project Recommendations – March 2009
• Project Kickoffs – May 2009 & August 2009
• Start of software development – September 2009
• First prototypes to public – April 2010
• First release with user-driven tool integration – July 2011

Page 17: Biodiversity Informatics: An Interdisciplinary Challenge

iPlant’s Central Challenge

• To define what it means to build a lasting, community driven Cyberinfrastructure for the Grand Challenges of Plant Science, to get community buy-in of this vision, and to execute this vision.

Page 18: Biodiversity Informatics: An Interdisciplinary Challenge

Steve Goff, PI
U of Arizona

Dan Stanzione, coPI
Texas Advanced Computing Center

National Science Board
Update on Award Progress: DBI-0735191

Directorate for Biological Sciences
July 2011

Page 19: Biodiversity Informatics: An Interdisciplinary Challenge

What iPlant Offers

Page 20: Biodiversity Informatics: An Interdisciplinary Challenge

Grand Challenges in Plant Science

• Genotype-to-Phenotype
– To understand how DNA blueprints produce a plant’s characteristic traits and functions and to predict how traits change in response to complex environments
– Requires ability to collect, query, interpret, and model high-throughput, genome-scale data sets

• Tree of Life
– To understand evolutionary relationships among green plants
– Requires ability to create, display, and query information in very large phylogenetic trees

Page 21: Biodiversity Informatics: An Interdisciplinary Challenge

iPlant Progress

• Science Planning (Year 1)
– Community engagement
– Grand Challenge selection

• Cyberinfrastructure Design (Year 2)
– Requirements generation
– Technology evaluations
– Prototyping

Page 22: Biodiversity Informatics: An Interdisciplinary Challenge

iPlant Progress

• Release of CI deliverables (Year 3)
– iPlant Discovery Environments and Tools

• iPlant Genotype to Phenotype Tools
– Processing and integration of high throughput data
– Modeling and visualization of phenotypic expression

• iPlant Tree of Life Tools
– Assembly, Reconciliation and Viewing
– Taxonomic Name Resolution Service
– My-Plant social networking site

• DNA Subway Tool for genome annotation / analysis

Page 23: Biodiversity Informatics: An Interdisciplinary Challenge

Taxonomic Name Resolution Service

Page 24: Biodiversity Informatics: An Interdisciplinary Challenge

Biodiversity: Development of new knowledge and tools to use knowledge

• Progress on digitization of the world’s billion+ museum specimens

• Distribution of digitized products through global networks (e.g. the Global Biodiversity Information Facility).

• Digitization of hundreds of millions of pages of natural history text (begun with the Biodiversity Heritage Library)

• Large online stores of information on species such as the Encyclopedia of Life

Page 25: Biodiversity Informatics: An Interdisciplinary Challenge

The Biodiversity Heritage Library has 34 million pages now

Palaeontology, or, A systematic summary of extinct animals and their geological relations / by Richard Owen. Publication info: Edinburgh: A. and C. Black, 1860.

Long Citation Half-life
Critical use for Taxonomy
Ecology and Environmental History
Naming for genomics and metagenomics

Page 26: Biodiversity Informatics: An Interdisciplinary Challenge

The Rubiaceae of Colombia, by Paul C. Standley. Chicago: Field Museum of Natural History, 1930.

Page 27: Biodiversity Informatics: An Interdisciplinary Challenge

Mobilizing Data Locked on Paper

• Fine-Grained Semantic Markup of Descriptive Data for Knowledge Applications in Biodiversity Domains. Hong Cui (Principal Investigator)

• The University of Arizona is awarded a grant to develop and evaluate a set of algorithms/software to help computers read and “understand” taxonomic descriptions of plants, animals, and other living or fossil organisms. The major functions of the algorithms/software include 1) annotating large sets of text descriptions in a machine-readable way to support various knowledge applications, including producing character matrices and identification keys for various taxon groups.
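To make the idea of machine-readable annotation concrete, here is a deliberately tiny sketch. It is not the project's algorithm: it is a simple dictionary lookup, and the vocabularies, function names (`annotate`, `character_matrix`) and example description are all invented for illustration.

```python
import re

# Tiny hand-made vocabularies (assumptions); a real system would use far larger ones.
STRUCTURES = {"leaves", "petals", "stem", "flowers"}
CHARACTERS = {
    "shape": {"ovate", "lanceolate", "linear"},
    "color": {"white", "yellow", "red", "green"},
}

def annotate(description):
    """Return (structure, character, value) triples found in a description."""
    triples = []
    for clause in re.split(r"[;.]", description.lower()):
        words = re.findall(r"[a-z]+", clause)
        # The first known structure term in the clause is taken as its subject.
        structure = next((w for w in words if w in STRUCTURES), None)
        if structure is None:
            continue
        for word in words:
            for character, values in CHARACTERS.items():
                if word in values:
                    triples.append((structure, character, word))
    return triples

def character_matrix(taxon_descriptions):
    """Build {taxon: {"structure character": value}} from plain-text descriptions."""
    return {
        taxon: {f"{s} {c}": v for s, c, v in annotate(text)}
        for taxon, text in taxon_descriptions.items()
    }

# Hypothetical taxon and description.
matrix = character_matrix({"Taxon A": "Leaves ovate; petals white."})
```

Each row of the resulting dictionary corresponds to one taxon and each key to one character, which is the shape of data a character matrix or identification key needs.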

Page 28: Biodiversity Informatics: An Interdisciplinary Challenge

Semantic Markup System

Training Thursday for students

Page 29: Biodiversity Informatics: An Interdisciplinary Challenge

The Problem

• It is difficult to find what is already known
• Clone specimens may be stored in different museums around the world
• DNA analysis may be conducted on one but not the other
• Micrographs may be in a database
• Taxonomic treatments or revisions may exist
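The problem listed above is essentially one of following links between objects held in different places. A minimal sketch, assuming each relationship is stored as a pair of identifiers: the identifiers and the `related_objects` helper are hypothetical and do not represent any particular system's design.

```python
from collections import deque

def related_objects(links, start):
    """Breadth-first search over undirected links; returns every reachable identifier."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbors.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)  # report only the other objects, not the starting one
    return seen

# Hypothetical identifiers, invented for illustration only.
links = [
    ("specimen:S1", "specimen:S2"),   # clones held in different museums
    ("specimen:S2", "sequence:X"),    # DNA analysis done on one clone only
    ("specimen:S1", "micrograph:M"),
    ("specimen:S3", "treatment:T"),   # not connected to S1 in this toy data
]

known_about_s1 = related_objects(links, "specimen:S1")
```

Once the links are recorded, a sequence derived from one clone becomes discoverable from the other, which is the gap the slide describes.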

Page 30: Biodiversity Informatics: An Interdisciplinary Challenge

Biological Science Collections (BiSciCol) Tracker

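[Diagram: three specimens, S1: KNM (Nairobi National Museum), S2: MNHN (Muséum national d'histoire naturelle) and S3: MBG (Living Collection: Missouri Botanical Garden), shown with the labels "Determination", "Gene Sequence" (GENBANK), "Parasitism" and "Agave sisalana". Most connections between them are marked "?".]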

Page 31: Biodiversity Informatics: An Interdisciplinary Challenge

BiSciCol Tracker

Page 32: Biodiversity Informatics: An Interdisciplinary Challenge

BiSciCol Design

• Insert new design

Page 33: Biodiversity Informatics: An Interdisciplinary Challenge

NSF: Advanced Digitization of Biological Collections

• iDigBio: The National Resource for Advancing Digitization of Biological Collections

Page 34: Biodiversity Informatics: An Interdisciplinary Challenge

Organization

• National Hub (~$7.5M)
– Title: A Collections Digitization Framework for the 21st Century
– PI: Lawrence Page, University of Florida

• Thematic Hub (~$2M each)
– Title: InvertNet–An Integrative Platform for Research on Environmental Change, Species Discovery and Identification
• PI: Christopher Dietrich, University of Illinois, Urbana-Champaign
– Title: Plants, Herbivores and Parasitoids: A Model System for the Study of Tri-Trophic Associations
• PI: Randall T. Schuh, American Museum of Natural History
– Title: North American Lichens and Bryophytes: Sensitive Indicators of Environmental Quality and Change
• PI: Corinna Gries, University of Wisconsin, Madison

Page 35: Biodiversity Informatics: An Interdisciplinary Challenge

Virtual Organization and Collaboration

• VOSS: Next Steps in Articulating Success Factors for Distributed Collaborations. Gary Olson (Principal Investigator), Judith Olson (Co-Principal Investigator)

• Theory of Remote Collaboration. Evaluation. A prototype online Collaboration Success Wizard will be developed so that those engaged in collaboration, or planning to collaborate, can assess their strengths and weaknesses.

Page 36: Biodiversity Informatics: An Interdisciplinary Challenge

Example of Virtual Community in NanoTechnology

Page 37: Biodiversity Informatics: An Interdisciplinary Challenge

Three of the pioneers behind novel light-scattering techniques to detect certain early stage cancers joined an outside expert on biophotonics in a call-in program to discuss new research results that were presented in the Aug. 1, 2007, edition of Clinical Cancer Research. Richard McCourt (right), of NSF's Directorate for Biological Sciences, was the moderator. Credit: National Science Foundation

Page 38: Biodiversity Informatics: An Interdisciplinary Challenge

Features of Virtual Organization

• Common goals
• Geographic dispersal
• Distributed strengths and capabilities
• Need for multimedia collaboration
• Non-residents to be treated as insiders
• Document sharing, video and voice, workflow integration

Page 39: Biodiversity Informatics: An Interdisciplinary Challenge

Interdisciplinary and high volume data

• Cyberinfrastructure and the Dimensions in Biodiversity - Planning for Success - Madison, WI - Oct 13-15, 2010. Corinna Gries (Principal Investigator), Matthew Jones (Co-Principal Investigator), David Vieglais (Co-Principal Investigator)

• Need to make order-of-magnitude improvements in the rate of biodiversity study with zero increase in cash.

• Development of cyberinfrastructure (CI) supporting integrative research in biodiversity sciences.

Page 40: Biodiversity Informatics: An Interdisciplinary Challenge

Cloud Computing

• Data-Intensive Science Workshops, to be held Sept. 19 to 20, 2010, Seattle, WA; and Mar 20 to 21, 2011, Washington DC

• Needed for most modeling with large data sets including climate models

• Needed for phylogenetic analysis

Page 41: Biodiversity Informatics: An Interdisciplinary Challenge

Occurrence Data Sharing

• SilverLining: A highly scalable cloud-based platform for data distribution and user collaboration. David Vieglais (Principal Investigator), Eileen Lacey (Co-Principal Investigator)

• Potential for leveraging a cloud-based Platform as a Service (PaaS) for data publication to address myriad challenges currently faced by existing distributed data service architectures such as Distributed Generic Information Retrieval (DiGIR) and TDWG Access Protocol for Information Retrieval (TAPIR). Specific goals are to 1) simplify and reduce the ongoing cost of publishing data, 2) improve data quality at the source, 3) provide scalable, effective access to published data, 4) stimulate innovation by creating a simple, highly scalable platform for new applications for data interaction, and 5) develop a suite of reference applications demonstrating capacities of the new architecture.

Page 42: Biodiversity Informatics: An Interdisciplinary Challenge

Agile Science

• Disaster: RAPID: Gulf Coast Oil Spill Biodiversity Tracker. A Volunteer-based Observation Network. Steven Kelling (Principal Investigator)

• RAPID: Enhancement of Fishnet2 for Disaster Impact Assessment. Henry Bart (Principal Investigator)

Page 43: Biodiversity Informatics: An Interdisciplinary Challenge

http://ebird.org/tools/oilspill/

Page 44: Biodiversity Informatics: An Interdisciplinary Challenge
Page 45: Biodiversity Informatics: An Interdisciplinary Challenge

New Validation Models

• Filtered Push: Continuous Quality Control for Distributed Collections and Other Species-Occurrence Data. James Macklin (Principal Investigator), Bertram Ludaescher (Co-Principal Investigator)

• A networked solution to enable annotation of distributed biological collection data and to share assertions about their quality or usability.
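One way to picture an annotation that carries a quality assertion is as a small record pointing at a field of a remote occurrence record. The sketch below is an assumption-laden illustration, not the Filtered Push data model: the `Annotation` fields, the `pending_by_record` helper and the sample values are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    record_id: str        # identifier of the occurrence record being annotated
    field: str            # which field the assertion is about
    proposed_value: str   # value the annotator believes is correct
    annotator: str
    note: str = ""

def pending_by_record(annotations, current_values):
    """Group annotations that disagree with the stored value, keyed by record."""
    pending = defaultdict(list)
    for ann in annotations:
        if current_values.get((ann.record_id, ann.field)) != ann.proposed_value:
            pending[ann.record_id].append(ann)
    return dict(pending)

# Hypothetical stored values and incoming annotations.
current = {("rec1", "country"): "Kenya", ("rec2", "genus"): "Agava"}
incoming = [
    Annotation("rec1", "country", "Kenya", "alice"),            # agrees, nothing to review
    Annotation("rec1", "country", "Colombia", "bob", "label says Bogotá"),
    Annotation("rec2", "genus", "Agave", "carol", "spelling"),
]
to_review = pending_by_record(incoming, current)
```

The collection holder keeps authority over the record; the annotations only queue up assertions for review.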

Page 46: Biodiversity Informatics: An Interdisciplinary Challenge

Improved collection management

• Collaborative Biodiversity Collections Computing. James Beach (Principal Investigator)

http://digbiocol.wordpress.com/

Page 47: Biodiversity Informatics: An Interdisciplinary Challenge

Map of Life

Co-PIs: Walter Jetz (Yale), Rob Guralnick (CU Boulder)

An infrastructure for integrating and advancing global species distribution knowledge

Page 48: Biodiversity Informatics: An Interdisciplinary Challenge

Advancing species distribution knowledge

[Figure: data sets plotted by scale (grain), from World through 200km, 50km, 1km and 100m to 1m. Columns: Topography; Landcover, current; Landcover, future; Species distributions (Vertebrates). Labeled data sets: 1996: GTOPO 30; 2009: SRTM V4; 2003: GLC 2000; 2009: GlobCover; 1992: BIOME; 2001: Image 2.2; regional models; 2006: WWF; 2005-9: expert maps; atlas data, surveys. The finer scales under Species distributions are marked "?" and labeled "Knowledge Gap".]

Hurlbert and Jetz (PNAS 2007); Jetz et al. (Conservation Biology 2008)

Page 49: Biodiversity Informatics: An Interdisciplinary Challenge

The “Wallacean shortfall”, i.e. the geographic bias and coarseness of our species distribution knowledge, is a (the?) major impediment for biodiversity science and for our understanding of global change impacts on biodiversity.

Narrowing the knowledge gap:

1) Data mobilization (Museums, NGOs, GBIF)

2) Focused sampling

3) Model-based data integration

4) ‘Crowd-sourcing’

Overcoming the “Wallacean shortfall”

Page 50: Biodiversity Informatics: An Interdisciplinary Challenge

‘Map of Life’ aims to build on and complement the spatial biodiversity aspects of these and other efforts. By addressing key storage, query, visualization and modeling challenges common to all, and by providing mapping and data integration services, the platform is expected to empower region- and taxon-specific efforts, freeing their resources for investment in core competencies, including quality control or specific user-community needs.

Map of Life

Page 51: Biodiversity Informatics: An Interdisciplinary Challenge

An online workbench and knowledgebase to dynamically document, annotate, integrate, validate, advance, and analyze the disparate sources of global biodiversity distribution knowledge.

Map of Life

Page 52: Biodiversity Informatics: An Interdisciplinary Challenge
Page 53: Biodiversity Informatics: An Interdisciplinary Challenge

Display, spatially explicit WIKI

Jetz, McPherson & Guralnick. in review

Page 54: Biodiversity Informatics: An Interdisciplinary Challenge

Cougar

Page 55: Biodiversity Informatics: An Interdisciplinary Challenge

Modeling Software Support

• Development of a Data Assimilation Capability Towards Ecological Forecasting in a Data-Rich Era. Yiqi Luo (Principal Investigator), S Lakshmivarahan (Co-Principal Investigator)

• A powerful eco-informatics tool that assimilates data from measurement sensor networks and generates data products useful for policy making on resource management and climate change mitigation: the Ecological Platform for Assimilation of Data (EcoPAD), for data assimilation and forecasting in ecology. EcoPAD will include components of (1) core computational algorithms (e.g., ecological models) that are specifically designed to solve ecological issues, (2) a variety of optimization techniques for data assimilation, (3) various databases that will feed into EcoPAD, and (4) diverse functions of EcoPAD
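For readers unfamiliar with data assimilation, the core step is blending a model forecast with an observation according to how uncertain each is. The sketch below uses a scalar Kalman update, a standard textbook technique chosen here for illustration. The slide does not say which techniques EcoPAD uses, and the function name and numbers are assumptions.

```python
def kalman_update(forecast, forecast_var, observation, observation_var):
    """Blend a model forecast with one observation, weighting by their variances."""
    if forecast_var < 0 or observation_var < 0 or forecast_var + observation_var == 0:
        raise ValueError("variances must be non-negative and not both zero")
    # Gain near 1 trusts the observation; gain near 0 trusts the forecast.
    gain = forecast_var / (forecast_var + observation_var)
    analysis = forecast + gain * (observation - forecast)
    analysis_var = (1.0 - gain) * forecast_var
    return analysis, analysis_var

# Made-up numbers: a model forecast of 10.0 and a sensor reading of 12.0,
# each with variance 4.0, so the two are trusted equally.
analysis, analysis_var = kalman_update(10.0, 4.0, 12.0, 4.0)
```

With equal variances the updated estimate lands halfway between forecast and observation, and its variance is lower than either input.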

Page 56: Biodiversity Informatics: An Interdisciplinary Challenge

Formalizing Location Data

• Improving GEOLocate to Better Serve Biodiversity Informatics. Henry Bart (Principal Investigator), Nelson Rios (Co-Principal Investigator)

• A software tool for assigning latitude and longitude coordinates to text descriptions of locations where scientific collections were made (georeferencing).
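The georeferencing task can be illustrated with a toy parser for localities of the form "10 km N of some place". This is a rough sketch, not GEOLocate's method: the gazetteer entry is invented, only the four cardinal directions are handled, and the conversion uses a flat approximation of about 111.32 km per degree of latitude.

```python
import math
import re

KM_PER_DEGREE_LAT = 111.32  # approximate length of one degree of latitude

# Hypothetical gazetteer entry (name -> latitude, longitude), invented for illustration.
GAZETTEER = {"example town": (4.0, -74.0)}

PATTERN = re.compile(
    r"(?P<dist>\d+(?:\.\d+)?)\s*km\s+(?P<dir>[NSEW])\s+of\s+(?P<place>.+)",
    re.IGNORECASE,
)

def georeference(locality):
    """Return (lat, lon) for '<distance> km <N|S|E|W> of <place>', or None if unparseable."""
    match = PATTERN.fullmatch(locality.strip())
    if match is None:
        return None
    place = GAZETTEER.get(match["place"].strip().lower())
    if place is None:
        return None
    lat, lon = place
    dist = float(match["dist"])
    direction = match["dir"].upper()
    if direction in ("N", "S"):
        lat += (1 if direction == "N" else -1) * dist / KM_PER_DEGREE_LAT
    else:
        # A degree of longitude shrinks with the cosine of the latitude.
        km_per_degree_lon = KM_PER_DEGREE_LAT * math.cos(math.radians(lat))
        lon += (1 if direction == "E" else -1) * dist / km_per_degree_lon
    return lat, lon

point = georeference("111.32 km N of Example Town")
```

A real tool would also report an uncertainty radius, since a text description rarely pins down a single point.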

Page 57: Biodiversity Informatics: An Interdisciplinary Challenge

Collaborative Georeferencing

Page 58: Biodiversity Informatics: An Interdisciplinary Challenge

• Grant Making: about $2M/yr
– Animal Tracking in South Africa
– Specimen Digitization in Ghana
– Social Value of Conservation in Peru
– Species Pages and BD Education in Costa Rica
– Niche Modeling in Brazil
– Travel Grants
– Lake Victoria Data Library Project in Tanzania, Uganda and Kenya
– Flora de Colombia en Línea

JRS Biodiversity Foundation

Page 59: Biodiversity Informatics: An Interdisciplinary Challenge

The Future is Collaboration and Data Sharing

• Libraries
• Museums
• Government
• Universities

To bring the best data to the major problems and opportunities of our time and the future

• NGO
• Private Land Holders
• Ranches
• Farms