Partnerships Drive Informatics Solutions for Biological Imaging at Ocean Observatories Heidi M. Sosik Joe Futrelle Andrew Maffei

Partnerships Drive Informatics Solutions for Biological ...dusk.geo.orst.edu/Pickup/IN54A/Sosik_IN54A.pdf · Technology solutions comprising high-performance computing scientific




Demand for Informatics Solutions in Ocean Science

Observatories combined with new sensor technologies
• Unprecedented observing capabilities
• Unprecedented “big data” challenges

Automated biological imaging in the ocean
• A demanding case study
• Well-developed science objectives

Approach
• Scientist–informaticist partnerships
• Iterative design and evaluation


Imaging FlowCytobot

An automated, submersible imaging flow cytometer (a “robotic underwater microscope”)
• Repeated deployments of more than 6 months, with continuous sampling
• Taxon-specific goals at the science–society interface

Critical Data Challenges
• ~1 billion images, and counting
• Non-standard data formats
• Distributed storage: locally accessible, non-fixed locations
• Complex, multi-stage analyses
• High compute demand, including near-real-time processing and full reprocessing
• Very large numbers of annotations
• Provenance tracking for analyses and products


Species-Specific Blooms on the New England Shelf

Automated image analysis and classification

Sosik et al. 2007; 27 diatom taxa at MVCO

[Figure: abundance of Ditylum brightwellii and Guinardia delicatula (cells mL⁻¹), 2006–2013, manual vs. automated counts]
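Both curves are concentrations, so automated classification has to be converted from image counts to cells per volume. As a minimal sketch, assuming each sample records the volume of seawater analyzed and one class label per image, the concentration of taxon i is C_i = N_i / V, where N_i is the number of images assigned to it and V is the analyzed volume in mL. The function name and sample values are hypothetical, and treating one image as one cell is a simplification:

```python
from collections import Counter

def concentrations(labels, volume_ml):
    """Cells per mL for each class label in one sample.

    Simplification (assumed): each classified image counts as one cell,
    which undercounts chain-forming taxa imaged as a single ROI.
    """
    if volume_ml <= 0:
        raise ValueError("analyzed volume must be positive")
    return {taxon: n / volume_ml for taxon, n in Counter(labels).items()}

# Hypothetical sample: 5 mL analyzed, three classified images
sample = ["Guinardia delicatula", "Ditylum brightwellii", "Guinardia delicatula"]
print(concentrations(sample, 5.0))
# {'Guinardia delicatula': 0.4, 'Ditylum brightwellii': 0.2}
```

Comparing the output of such a calculation against manual counts for the same samples is what the manual vs. automated curves show.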


Who are the Partners?

Scientist
• Ecologist and instrument developer
• Familiar with data challenges
• Willingness to try new solutions
• Complex use cases involving ecosystem characterization, change detection, early warning of harmful algal blooms, etc.

Informaticist
• Computer / library science expertise
• Familiar with scientific data systems
• Willingness to engage scientists
• Technology solutions comprising high-performance computing, scientific data formats, large-scale databases, semantics and standards, and ubiquitous, mobile systems

[Figure: IFCB data flow (Campbell et al. 2010). Labeled stages and products: video frames; fluorescence + scattering; timing + volume; frame grab and raw images; thresholding; ROIs; overlap detection with partial and complete ROIs; stitching and stitched ROIs; binning and bins of ROIs; segmentation; feature extraction and ROIs + features]
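The figure names an overlap-detection step that separates partial from complete ROIs, followed by stitching. The IFCB's own stitching method is not given here; as a simplified sketch, assuming each ROI is an axis-aligned bounding box and that overlapping boxes captured from the same trigger belong to one object, overlaps can be detected and merged into a single stitched box. `Box`, `stitch_trigger`, and the coordinates are hypothetical:

```python
from __future__ import annotations

from dataclasses import dataclass

@dataclass(frozen=True)
class Box:
    """Axis-aligned ROI bounding box in camera pixel coordinates (assumed layout)."""
    x: int
    y: int
    w: int
    h: int

def overlaps(a: Box, b: Box) -> bool:
    """True if the boxes share at least one pixel (touching edges do not count)."""
    return (a.x < b.x + b.w and b.x < a.x + a.w
            and a.y < b.y + b.h and b.y < a.y + a.h)

def stitch(a: Box, b: Box) -> Box:
    """Smallest box covering both inputs."""
    x0, y0 = min(a.x, b.x), min(a.y, b.y)
    x1, y1 = max(a.x + a.w, b.x + b.w), max(a.y + a.h, b.y + b.h)
    return Box(x0, y0, x1 - x0, y1 - y0)

def _first_overlap(rois: list[Box]) -> tuple[int, int] | None:
    """Indices of the first overlapping pair, or None if all ROIs are disjoint."""
    for i in range(len(rois)):
        for j in range(i + 1, len(rois)):
            if overlaps(rois[i], rois[j]):
                return i, j
    return None

def stitch_trigger(rois: list[Box]) -> list[Box]:
    """Merge overlapping ROIs from one trigger until every remaining ROI is disjoint."""
    result = list(rois)
    pair = _first_overlap(result)
    while pair is not None:
        merged = stitch(result[pair[0]], result[pair[1]])
        # Each merge shrinks the list by one, so the loop always terminates.
        result = [r for k, r in enumerate(result) if k not in pair] + [merged]
        pair = _first_overlap(result)
    return result

partial = [Box(0, 0, 10, 10), Box(8, 5, 10, 10), Box(40, 40, 5, 5)]
print(stitch_trigger(partial))
# [Box(x=40, y=40, w=5, h=5), Box(x=0, y=0, w=18, h=15)]
```

Merging to the union rectangle is the simplest choice; a real stitcher also has to reassemble pixel data and fill any gap between the partial images.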


Technology Development Process

Small, interdisciplinary teams
• Scientists, instrument developers
• Facilitator
• Information modelers
• Technology implementers

Process
• Develop a formal use case via template
• Design concept model & activity diagrams
• Evaluate technology approaches
• Develop prototypes
• Formally evaluate prototypes
• Iterate

Adapted from Tetherless World Constellation, Rensselaer Polytechnic Institute (Fox et al.)


Developing science informatics in partnership

Development of science is prior, primary, and ongoing

• Use case driven by science needs allows for prototyping
• Scientists evaluate and adopt prototype technologies when ready
• Informaticist gains understanding of science
• Scientist gains informatics expertise

[Figure: science development and informatics development proceeding in parallel, linked by repeated cycles of use case, prototype, and evaluation (Iteration…)]


New system components interoperate with existing

[Figure: existing and new system components: IFCB, local storage, filesystem, camera, acquisition code, PMT, polling and data transfer, analysis, local file access, HTTP access, post-processing, job scheduling, dashboard, database, name resolution, metadata, web services, accession]

• Mission-critical data acquisition continues uninterrupted
• Analytics development continues using the existing codebase
• New data access, provenance, and parallel processing components augment and interoperate with existing systems

Minimal disruption and risk to existing data and workflow systems
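Accession, a database, and name resolution all appear among the system components, which is one way to cope with storage in non-fixed locations. How they were built is not specified; a minimal sketch, assuming each bin's raw data file is named after its bin identifier and may sit under any of several storage roots, registers new files in a catalog and later resolves an identifier to its current path. The `.adc` suffix (one of the files IFCB writes per bin), the table schema, the function names, and the sample bin ID are assumptions:

```python
import sqlite3
import tempfile
from pathlib import Path

def open_catalog(db_path=":memory:"):
    """Create (if needed) a catalog table mapping bin IDs to file paths."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bins (bin_id TEXT PRIMARY KEY, path TEXT NOT NULL)"
    )
    return conn

def accession(conn, roots, suffix=".adc"):
    """Register data files under any storage root that are not yet cataloged.

    Assumes one file per bin, named <bin_id><suffix>. Returns the newly
    accessioned bin IDs.
    """
    new_ids = []
    for root in roots:
        for path in sorted(Path(root).rglob(f"*{suffix}")):
            cur = conn.execute(
                "INSERT OR IGNORE INTO bins (bin_id, path) VALUES (?, ?)",
                (path.stem, str(path)),
            )
            if cur.rowcount == 1:  # 0 means the bin ID was already cataloged
                new_ids.append(path.stem)
    conn.commit()
    return new_ids

def resolve(conn, bin_id):
    """Map a bin ID to its cataloged path; None if unknown or no longer on disk."""
    row = conn.execute("SELECT path FROM bins WHERE bin_id = ?", (bin_id,)).fetchone()
    if row is None or not Path(row[0]).exists():
        return None
    return row[0]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        Path(root, "D20130101T000000_IFCB101.adc").touch()  # hypothetical bin name
        catalog = open_catalog()
        print(accession(catalog, [root]))  # ['D20130101T000000_IFCB101']
        print(accession(catalog, [root]))  # [] on the second pass
        print(resolve(catalog, "D20130101T000000_IFCB101") is not None)  # True
```

Because acquisition only ever writes files, a scanner like this can run beside the existing instrument code without modifying it, in keeping with the minimal-disruption goal.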


Early Outcome: A Web-based Data Dashboard

Shareable URL for each data item

Navigation in time series

Visual summary of selected data, with clickable links to images, metadata, and raw data

Updated in near-real time as new data is collected

http://ifcb-data.whoi.edu
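Two dashboard features, a shareable URL per data item and navigation along the time series, can be sketched with a sorted index of sample timestamps. The base URL is the one given above; the path scheme, class and function names, and bin IDs are hypothetical and not the dashboard's actual layout:

```python
from __future__ import annotations

from bisect import bisect_left
from datetime import datetime

BASE_URL = "http://ifcb-data.whoi.edu"  # from the slide; the path scheme below is hypothetical

def bin_url(bin_id: str) -> str:
    """Stable, shareable URL for one data item (hypothetical path scheme)."""
    return f"{BASE_URL}/{bin_id}"

class TimeSeriesIndex:
    """Sorted sample timestamps supporting nearest, previous, and next lookups."""

    def __init__(self, bins: dict[datetime, str]):
        self.times = sorted(bins)
        self.ids = [bins[t] for t in self.times]

    def nearest(self, when: datetime) -> str:
        """ID of the bin whose timestamp is closest to `when` (index must be non-empty)."""
        i = bisect_left(self.times, when)
        # Only the neighbors on either side of the insertion point can be closest.
        candidates = [k for k in (i - 1, i) if 0 <= k < len(self.times)]
        best = min(candidates, key=lambda k: abs(self.times[k] - when))
        return self.ids[best]

    def step(self, bin_id: str, offset: int) -> str | None:
        """ID `offset` positions away (-1 = previous, +1 = next), or None past either end."""
        k = self.ids.index(bin_id) + offset
        return self.ids[k] if 0 <= k < len(self.ids) else None

index = TimeSeriesIndex({
    datetime(2013, 1, 1, 0, 0): "binA",
    datetime(2013, 1, 1, 0, 20): "binB",
    datetime(2013, 1, 1, 0, 40): "binC",
})
current = index.nearest(datetime(2013, 1, 1, 0, 25))
print(bin_url(current), bin_url(index.step(current, 1)))
# http://ifcb-data.whoi.edu/binB http://ifcb-data.whoi.edu/binC
```

Keying URLs on a stable identifier rather than a storage path is what lets a link stay valid when files move; near-real-time updates would amount to rebuilding or appending to the index as new bins arrive.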


Next Steps – A Product Pipeline

Ultimate goal: time series of taxon-specific abundance and biomass, plus community characterization

• Image processing
• Feature extraction
• Classification

• Web services for data access
• Deposit service for products
• Interoperability with existing algorithms & code base
• Automated provenance generation and tracking
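Automated provenance generation can take many forms, and the slide does not say which was planned. As one hedged sketch, a decorator can record, for every product a processing step generates, the step name, a version string, its parameters, and content hashes of input and output, so any product can be traced back to exactly what produced it. The decorator, record fields, in-memory log, and toy `binarize` step are all hypothetical:

```python
import functools
import hashlib
import json
from datetime import datetime, timezone

PROVENANCE_LOG = []  # stand-in for a provenance database or deposit service

def _sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def tracked(version):
    """Decorator that records provenance for a bytes -> bytes processing step.

    Parameters must be passed by keyword so they can be recorded.
    """
    def decorate(func):
        @functools.wraps(func)
        def wrapper(data, **params):
            result = func(data, **params)
            PROVENANCE_LOG.append({
                "step": func.__name__,
                "version": version,
                "params": json.dumps(params, sort_keys=True),
                "input_sha256": _sha256(data),
                "output_sha256": _sha256(result),
                "generated_at": datetime.now(timezone.utc).isoformat(),
            })
            return result
        return wrapper
    return decorate

@tracked(version="0.1")
def binarize(data, threshold=128):
    """Toy image step: map each 8-bit pixel to 0 or 255 around a threshold."""
    return bytes(255 if b >= threshold else 0 for b in data)

out = binarize(b"\x00\x80\xff", threshold=100)
print(out, PROVENANCE_LOG[-1]["step"], PROVENANCE_LOG[-1]["params"])
# b'\x00\xff\xff' binarize {"threshold": 100}
```

Wrapping existing functions this way is one route to interoperating with an existing code base: the step's own logic is untouched, and provenance is captured around it.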

Initial use case: “blob mask” generation
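The blob-mask algorithm itself is not described here. As a deliberately simplified stand-in (not the IFCB method), a blob mask can be produced by thresholding a grayscale image and keeping only the largest 4-connected region of foreground pixels. This sketch assumes objects are darker than the background; the function name, threshold, and test image are hypothetical:

```python
from collections import deque

def blob_mask(image, threshold):
    """0/1 mask of the largest 4-connected region of pixels darker than `threshold`.

    `image` is a list of equal-length rows of grayscale values.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    foreground = [[image[r][c] < threshold for c in range(cols)] for r in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    largest = []
    for r in range(rows):
        for c in range(cols):
            if not foreground[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill gathers one connected component.
            component, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and foreground[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) > len(largest):
                largest = component
    mask = [[0] * cols for _ in range(rows)]
    for y, x in largest:
        mask[y][x] = 1
    return mask

image = [
    [200, 200, 200, 200, 200],
    [200,  50,  60, 200,  40],
    [200,  55, 200, 200, 200],
    [200, 200, 200, 200, 200],
]
for row in blob_mask(image, threshold=100):
    print(row)
```

Here the three-pixel region at the upper left is kept and the isolated dark pixel on the right is discarded as noise. Production segmentation would typically add edge detection and morphological cleanup, which is why a well-scoped use case like this is a natural first prototype.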


Looking Forward

Completed a round of review, evaluation, and revision; now in use by additional science groups, who are providing input for next steps:
• Freshwater lake in Québec (Y. Huot)
• Gulf of Mexico, Port Aransas, TX (L. Campbell)
• Salt Pond, Nauset Marsh, MA (M. Brosnahan)

One partnership nested within a larger network of related interactions
• Exploring shared solutions
• Leveraging technology and approaches

Thank You!