Collaborating with computer scientists, informaticists*, and software developers for Integrated Ecosystem Assessments

Peter Fox 1; Heidi Sosik, Stace Beaulieu (stace@whoi.edu), and Joseph Futrelle 2; David Mark Welch 3; Jon Hare and Michael Fogarty 4

1 Tetherless World Constellation (TWC), Rensselaer Polytechnic Institute; 2 Woods Hole Oceanographic Institution; 3 Marine Biological Laboratory; 4 Northeast Fisheries Science Center

ECO-OP Use Case: Linked Data Provenance
Outcomes
• Extended the PROV Ontology for capturing provenance in the IPython Notebook, a software platform that enables transparent workflows: https://github.com/tetherless-world/ecoop/tree/master/prov
• Applied the prov-ecoop ontology to case studies that included the Climate Forcing Chapter, a regional map of primary production, and a fisheries indicator in the Ecosystem Status Report.
• Book chapter in press for Oceanographic and Marine Cross-Domain Data Management for Sustainable Development: Documenting provenance for reproducible marine ecosystem assessment in open science

Marine Biodiversity Virtual Laboratory (MBVL)
Work in Progress
• Project website: https://tw.rpi.edu//web/project/MBVL/
• Focusing on Objectives 1, 2, and 4 in this first year: 1) developing data access and computational infrastructure for the MBVL; 2) generating derived data products; 4) producing traceable product workflows.
• We will work this summer with Matthew Ball, undergraduate student in computer sciences from Bowie State University, in the PEP program.

More team members: X. Ma, L. Fu 1
More team members: B. Lee, S. Zednik 1; A. Shipunova, A. Voorhis 3
More team members: M. Di Stefano, P. West 1; A. Maffei 2; G. DePiper, K. Friedland, S. Gaichas, K. Hyde, R. Gamble, M. Jones, S. Lucey 4

The ECO-OP and MBVL projects were funded by the U.S. National Science Foundation, grant numbers 0955649 and 1539256, respectively.

ECO-OP Use Case: Ecosystem Status Report
Outcomes
• A pilot toward end-to-end transparency from scientists’ desks to a report provided to policy makers and the public, important for science-based decision making.
• Prototype enabled an executable workflow for the production of a collaborative, multidisciplinary report with very heterogeneous data types: https://github.com/tetherless-world/ecoop/tree/master/pyecoop
• A small team of computer scientists and IT specialists working directly with fisheries scientists led to rapid results; the limiting factor was sufficient training for the larger group of domain scientists to adopt the technologies.
• Manuscript under review in Earth Science Informatics: Toward cyberinfrastructure to facilitate collaboration and reproducibility for marine Integrated Ecosystem Assessments

Our solution for sharing workflows and delivering reproducible documents:
ECO-OP: An abbreviation of ECOsystem and interOPerability
Goal: to develop and deploy a software environment to generate a portion of the Ecosystem Status Report for the Northeast U.S. Continental Shelf Large Marine Ecosystem, retaining traceability of derived datasets including indicators of physical pressures and ecosystem states.

* What is an informaticist? One can think of informatics as the steps and skills involved in making sense of data. Some of this is domain-specific (scientists are informaticists in their domains), and some of it is general to information processing or to the engineering of information systems.

How can we generalize our workflows for biodiversity?
1) Take sample from environment
2) Extract subset of organisms from sample
3) Measure attributes for the sampled organisms
4) Classify the organisms into categories
5) Determine the number of classified organisms in each category

VAMPS: Visualization and Analysis of Microbial Population Structures https://vamps.mbl.edu/
IFCB: Imaging Flow CytoBot http://ifcb-data.whoi.edu/

Goal: This research effort brings together computational and information scientists, oceanographers, and microbiologists to develop a Marine Biodiversity Virtual Laboratory (MBVL). The MBVL addresses multi-scale, heterogeneous data challenges with informatics solutions that enable the cyber-generation and documentation of biodiversity indicators, providing traceability between data and information to be used as a basis for sustainable ecosystem-based management and needed policy decisions.
Current emphasis on lower trophic levels.

Diagram for TWC Methodology. The use case defines the interactions between people, hardware, software, and desired products and can be adjusted or refined after each iteration of the cycle.

Goal: to provide standardized provenance as metadata for data products, so that a human (and, ultimately, a machine) could trace back to the source observational data and models used to compile an indicator. For the community standard, we chose the PROV Ontology for representing and exchanging provenance information as Linked Data in the Semantic Web.

Diagram for the three top classes in PROV-O and the properties that relate them.

Proposed implementation of a workflow using IPython Notebook to generate a fisheries indicator.
Entities: 1 IPython Notebook; 2 Cell; 3 Datasets; 4 script written in another programming language (R) that was split into five Cells; 5 other software environments.
Agents: four different agents are identified as contributing source datasets.
Activities: 1 CellRun; 2 other activities performed in other software environments.
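The entity, activity, and agent roles described for the notebook workflow can be illustrated with a small sketch. It uses only core PROV-O terms (prov:Entity, prov:Activity, prov:Agent, prov:used, prov:wasGeneratedBy, prov:wasDerivedFrom, prov:wasAttributedTo, prov:wasAssociatedWith) and writes them out as Turtle text. The `ex:` namespace, the function names, and the dataset and agent names are assumptions made for illustration; the project's own prov-ecoop extension is not reproduced here.

```python
"""Minimal sketch: provenance of one notebook cell run, using core PROV-O terms."""

PROV = "http://www.w3.org/ns/prov#"
EX = "http://example.org/ecoop-sketch#"  # hypothetical namespace, not the project's


def triple(subject, predicate, obj):
    """Format one Turtle statement."""
    return f"{subject} {predicate} {obj} ."


def cell_run_provenance(cell, inputs, output, runner):
    """Describe one cell run.

    inputs maps each source dataset name to the agent that contributed it.
    """
    run = f"ex:run_{cell}"
    lines = [
        f"@prefix prov: <{PROV}> .",
        f"@prefix ex: <{EX}> .",
        "",
        triple(f"ex:{runner}", "a", "prov:Agent"),
        triple(f"ex:{cell}", "a", "prov:Entity"),
        triple(run, "a", "prov:Activity"),
        triple(run, "prov:used", f"ex:{cell}"),
        triple(run, "prov:wasAssociatedWith", f"ex:{runner}"),
    ]
    for dataset, contributor in inputs.items():
        lines.append(triple(f"ex:{contributor}", "a", "prov:Agent"))
        lines.append(triple(f"ex:{dataset}", "a", "prov:Entity"))
        lines.append(triple(f"ex:{dataset}", "prov:wasAttributedTo", f"ex:{contributor}"))
        lines.append(triple(run, "prov:used", f"ex:{dataset}"))
    lines.append(triple(f"ex:{output}", "a", "prov:Entity"))
    lines.append(triple(f"ex:{output}", "prov:wasGeneratedBy", run))
    for dataset in inputs:
        lines.append(triple(f"ex:{output}", "prov:wasDerivedFrom", f"ex:{dataset}"))
    return "\n".join(lines)


def sources_of(turtle_text, product):
    """Trace a product back to the datasets it was derived from."""
    prefix = f"ex:{product} prov:wasDerivedFrom "
    return [
        line[len(prefix):-2]  # drop the trailing " ."
        for line in turtle_text.splitlines()
        if line.startswith(prefix)
    ]


if __name__ == "__main__":
    doc = cell_run_provenance(
        cell="cell_3",
        inputs={"survey_catch": "agency_a", "landings": "agency_b"},
        output="fisheries_indicator",
        runner="analyst",
    )
    print(doc)
    print(sources_of(doc, "fisheries_indicator"))
```

The `sources_of` helper is a stand-in for the trace-back a human or machine would perform; with a real triple store the same question would normally be asked as a query over the `prov:wasDerivedFrom` links.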
PDF of Climate Forcing Chapter
IPython (now Jupyter) Notebook
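The five generalized biodiversity steps listed earlier (take a sample, extract a subset, measure attributes, classify, count per category) can be sketched as a small pipeline. Everything specific here is an assumption for illustration: the `length_um` attribute, the size-range rules, and the function names are hypothetical, and a simple size rule stands in for whatever classification a real system such as VAMPS or IFCB applies.

```python
from collections import Counter


def classify(length_um, rules):
    """Return the first category whose [low, high) range contains the measurement."""
    for category, (low, high) in rules.items():
        if low <= length_um < high:
            return category
    return "unclassified"


def count_by_category(sample, rules, keep=lambda organism: True):
    """Run the extract, measure, classify, and count steps on one sample."""
    subset = [o for o in sample if keep(o)]           # extract subset of organisms
    measured = [o["length_um"] for o in subset]       # measure an attribute
    labels = [classify(m, rules) for m in measured]   # classify into categories
    return Counter(labels)                            # number per category


if __name__ == "__main__":
    # Illustrative size classes in micrometres; not taken from the poster.
    rules = {"pico": (0.2, 2.0), "nano": (2.0, 20.0), "micro": (20.0, 200.0)}
    sample = [
        {"length_um": 1.1},
        {"length_um": 5.0},
        {"length_um": 35.0},
        {"length_um": 12.0},
        {"length_um": 450.0},
    ]
    print(count_by_category(sample, rules))
```

Keeping each step as its own line makes it straightforward to record which inputs and rules produced a given count, which is the traceability the MBVL goal calls for.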

