Ryan Fraser, Nicholas Carr Connecting RDSI, NeCTAR and ANDS via Provenance - VHIRL

Page 2

Outline

● What are VLs?

● What is provenance?

● How do we represent VLs using standardised provenance?

Page 3

What are VLs?

From https://nectar.org.au/virtual-laboratories-1, Virtual Laboratories (VLs) are:

● data repositories and computational tools, brought together to streamline research workflows

Page 4

Connecting the commons with VHIRL and Provenance

Page 5

What is provenance?

From http://en.wikipedia.org/wiki/Provenance#Computer_Science:

“Computer science uses the term provenance to mean the lineage of data or processes, as per data provenance. However there is a field of informatics research within computer science called provenance that studies how provenance of data and processes should be characterised, stored and used. Semantic web standards bodies, such as the World Wide Web Consortium, ratified a standard for provenance representation in 2014, known as PROV.”

Page 6

Do you make decisions? Yes. Should someone remember how you made those decisions? Yes = PROV

Page 7

Data Services

Data layers discovered

Layers consist of numerous remote data services

Subset selected for processing

PROV:

a) Captures data service information (service hosted on RDS)

b) Captures subset details of the data selected
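The two captures on this slide can be sketched as a small provenance record. The following is a minimal illustration only, loosely shaped like PROV-JSON; the `ex:` prefix, the `ex:DataService` and `ex:DataSubset` types, the attribute names and the function `describe_subset` are all assumptions made for this sketch, not part of VHIRL.

```python
import json


def describe_subset(service_id, query, time_queried, subset_id="ex:subset-1"):
    """Build a PROV-JSON-style dict: one entity for the remote data service,
    one entity for the subset selected from it, and a derivation between them.
    All ex: names are hypothetical placeholders."""
    return {
        "prefix": {
            "ex": "http://example.org/",            # hypothetical namespace
            "svc": "http://pid.csiro.au/service/",  # taken from the citation slide
        },
        "entity": {
            # (a) the data service itself
            service_id: {"prov:type": "ex:DataService"},
            # (b) the subset of data selected for processing
            subset_id: {
                "prov:type": "ex:DataSubset",
                "ex:query": query,
                "ex:timeQueried": time_queried,
            },
        },
        "wasDerivedFrom": {
            "_:d1": {
                "prov:generatedEntity": subset_id,
                "prov:usedEntity": service_id,
            }
        },
    }


# Shortened, illustrative query string; not a complete service request.
record = describe_subset(
    "svc:anuga-thredds", "bussleton.nc?var=elevation", "2014-12-15T13:15:11"
)
print(json.dumps(record, indent=2))
```

Keeping the service and the subset as separate entities means the same service entry can be reused by every subset drawn from it.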

Page 8

Compute/Storage Services

Flexibility in which compute provider to use

PROV: Captures job details, login info, and where/what/when/how the job was computed, etc.

Includes all relevant NeCTAR details for cloud processing

Page 9

Available Toolboxes

TCRM – estimate wind speed from cyclone and severe wind

ANUGA – estimate inundation from riverine floods, tsunami, dam break and storm surge

PROV: Captures code utilised along with “how” it is used (template/input files)

Page 10

Example for tsunami inundation

PROV: Captures location (PID) of where input files/scripts are persisted

Page 11

Processing Services

The steps so far have been building an environment to run a processing script

Either write your own script...

...or build from existing templates

...when you’re done, it will be submitted for processing on the Cloud!

PROV: Captures location (PID) of where input files/scripts are persisted

Page 12

PROV: Finalised outputs are persisted with PIDs on RDS and captured in the PROV information

Page 13

PROV: After the job is completed, the finalised PROV record is published to the provenance store

PROV record endpoints could be registered in ANDS RDA alongside the output data!

Page 15

Components of the Virtual Hazard Impact & Risk Laboratory (VHIRL)

[Diagram. Component groups: Data Services; Processing Services; Compute Services; Enablers; Virtual Laboratories/Apps]

[Diagram labels: Data Analytics; Magnetics; Gravity; DEM; eScript; ANUGA; NCI Petascale; NCI Cloud; NeCTAR Cloud; Amazon Cloud; Desktop; Service Orchestration; Provenance Metadata; Auth.; Coastal Inundation; Tsunami Inundation; Scenario; Cyclone Wind Path Calculation; Landsat; Bathymetry; Cyclone Wind Model; Surface Wave Propagation (earthquake); TCRM]

Page 16

Basic scientific data processing model - 1

Input Data → Process → Output Data

Page 17

Background: How do we represent VLs using standardised provenance?

Page 18

Basic scientific data processing model - 2

Code, Config, Input Data → Process → Output Data

Input item roles: Code, Config, Input Data

Page 19

Basic scientific data processing model - 3, PROV

Code, Config, Input Data → Process → Output Data

PROV classes: Entity, Activity, Agent

Agents: Who/which system; Who

Relations shown: used, wasGeneratedBy, wasAssociatedWith, wasAttributedTo
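In PROV, an Activity `used` an Entity, an Entity `wasGeneratedBy` an Activity, an Activity `wasAssociatedWith` an Agent, and an Entity `wasAttributedTo` an Agent. The sketch below applies those four relations to the Code/Config/Input Data → Process → Output Data model, again as a plain dict loosely following the PROV-JSON layout. The function `build_prov` and every `ex:` identifier are hypothetical, chosen only for illustration.

```python
def build_prov(process, inputs, output, agent):
    """Describe one processing run.

    `inputs` maps a role name (e.g. "code", "config", "input") to an entity id.
    Returns a PROV-JSON-style dict; identifiers are caller-supplied placeholders.
    """
    doc = {
        "entity": {eid: {} for eid in list(inputs.values()) + [output]},
        "activity": {process: {}},
        "agent": {agent: {}},
        "used": {},
        # the output entity was generated by the process
        "wasGeneratedBy": {
            "_:g1": {"prov:entity": output, "prov:activity": process}
        },
        # the process ran on behalf of, or under the control of, the agent
        "wasAssociatedWith": {
            "_:a1": {"prov:activity": process, "prov:agent": agent}
        },
        # the output is attributed to the agent
        "wasAttributedTo": {
            "_:t1": {"prov:entity": output, "prov:agent": agent}
        },
    }
    # one `used` relation per input item, tagged with its role
    for i, (role, eid) in enumerate(sorted(inputs.items()), start=1):
        doc["used"][f"_:u{i}"] = {
            "prov:activity": process,
            "prov:entity": eid,
            "prov:role": role,
        }
    return doc


doc = build_prov(
    process="ex:process",
    inputs={"code": "ex:code", "config": "ex:config", "input": "ex:input-data"},
    output="ex:output-data",
    agent="ex:some-user",
)
for rel_id, rel in doc["used"].items():
    print(rel_id, rel["prov:role"], rel["prov:entity"])
```

Recording the role on each `used` relation is what distinguishes code from configuration from input data, which the earlier "input item roles" slide calls out.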

Page 20

Basic scientific data processing model - 4, PROMS

PROV classes: Entity, Activity, Agent

PROMS classes: R.S. (Reporting System), Report

Shown: Report N, produced by Reporting System X

Relations shown: hadStartingActivity / hadEndingActivity, reportingSystem
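As a rough illustration of how a Report might wrap a provenance record, the sketch below bundles a PROV-style dict with the three properties named on the slide (`reportingSystem`, `hadStartingActivity`, `hadEndingActivity`). The dict layout, the function `make_report` and the `ex:` identifiers are assumptions for this example and are not the PROMS serialisation.

```python
def make_report(report_id, reporting_system, prov_doc,
                starting_activity, ending_activity):
    """Wrap a PROV-style dict in a report envelope.

    The starting and ending activities must already be described in
    prov_doc["activity"]; otherwise the report would point at nothing.
    """
    known = prov_doc.get("activity", {})
    for activity in (starting_activity, ending_activity):
        if activity not in known:
            raise ValueError(f"activity {activity!r} is not in the provenance record")
    return {
        "id": report_id,
        "type": "Report",
        "reportingSystem": reporting_system,
        "hadStartingActivity": starting_activity,
        "hadEndingActivity": ending_activity,
        "provenance": prov_doc,
    }


# Hypothetical two-step workflow: fetch a subset, then run a model on it.
prov_doc = {"activity": {"ex:fetch-subset": {}, "ex:run-model": {}}}
report = make_report(
    "ex:report-n", "ex:reporting-system-x", prov_doc,
    starting_activity="ex:fetch-subset",
    ending_activity="ex:run-model",
)
print(report["reportingSystem"], report["hadStartingActivity"], report["hadEndingActivity"])
```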

Page 21

Basic scientific data processing model - 5, Storage

PROV classes: Entity, Activity, Agent

PROMS classes: R.S. (Reporting System), Report

[Diagram: Reporting System X and Reporting System Y produce multiple Reports (Report N, Report M), which are reported and stored in the Organisational Provenance Store]
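The storage picture, with several reporting systems sending reports to one organisational store, can be mimicked with a toy in-memory store. This is a sketch under stated assumptions: the class `ProvenanceStore`, its methods and the report fields are hypothetical, and a real provenance store would persist reports rather than hold them in memory.

```python
from collections import defaultdict


class ProvenanceStore:
    """Toy organisational provenance store, keyed by reporting system."""

    def __init__(self):
        self._reports = defaultdict(list)

    def submit(self, report):
        """Accept a report dict; it must name the system that produced it."""
        system = report.get("reportingSystem")
        if not system:
            raise ValueError("report has no reportingSystem")
        self._reports[system].append(report)

    def reports_from(self, system):
        """Return a copy of the reports received from one reporting system."""
        return list(self._reports.get(system, []))

    def count(self):
        """Total number of reports held, across all reporting systems."""
        return sum(len(reports) for reports in self._reports.values())


store = ProvenanceStore()
store.submit({"id": "ex:report-n1", "reportingSystem": "ex:system-x"})
store.submit({"id": "ex:report-n2", "reportingSystem": "ex:system-x"})
store.submit({"id": "ex:report-m1", "reportingSystem": "ex:system-y"})
print(store.count(), len(store.reports_from("ex:system-x")))
```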

Page 22

Data Management

Item types: managed data; web service data; user supplied data; managed code; user supplied code; output data

Annotations:

● VL ID’d and persisted

● cited using PROMS-O format

● soon to be VL ID’d and persisted, with minimal metadata recorded too

● SSSC ID’d and persisted

● perhaps SSSC ID’d and persisted, perhaps VL managed

● soon to be VL ID’d and persisted, if required, perhaps with time limits

Page 23

Virtual Labs Service Citation Example

[{ref}] {service title} {service endpoint URI} {query} {time queried} {cached copy ID}

[1] “Subset of elevation”

http://pid.csiro.au/service/anuga-thredds

“bussleton.nc?var=elevation&spatial=bb&north=-33.06495205829679&south=-33.551573283840156&west=114.84967874597227&east=115.70661233971667&temporal=all&time_start=&time_end=&horizStride”

“2014-12-15T13:15:11”

http://pid.csiro.au/dataset/abcd1234
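The citation template can be treated as a small record with six fields. The sketch below renders a citation in the template's field order and, as an extra, pulls the bounding box back out of the query string. The class `ServiceCitation` and its method names are hypothetical; the values are the ones from the example above.

```python
from dataclasses import dataclass
from urllib.parse import parse_qs, urlsplit


@dataclass
class ServiceCitation:
    ref: int
    title: str
    endpoint: str
    query: str
    time_queried: str
    cached_copy_id: str

    def render(self) -> str:
        """Render the fields in the order given by the citation template."""
        return (
            f'[{self.ref}] "{self.title}" {self.endpoint} "{self.query}" '
            f'"{self.time_queried}" {self.cached_copy_id}'
        )

    def bounding_box(self) -> dict:
        """Extract north/south/west/east from the query as floats."""
        params = parse_qs(urlsplit(self.query).query)
        return {k: float(params[k][0]) for k in ("north", "south", "west", "east")}


citation = ServiceCitation(
    ref=1,
    title="Subset of elevation",
    endpoint="http://pid.csiro.au/service/anuga-thredds",
    query=(
        "bussleton.nc?var=elevation&spatial=bb"
        "&north=-33.06495205829679&south=-33.551573283840156"
        "&west=114.84967874597227&east=115.70661233971667"
        "&temporal=all&time_start=&time_end=&horizStride"
    ),
    time_queried="2014-12-15T13:15:11",
    cached_copy_id="http://pid.csiro.au/dataset/abcd1234",
)
print(citation.render())
print(citation.bounding_box())
```

Storing the query verbatim, rather than only the parsed bounds, keeps the citation replayable against the service endpoint.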