Bertram Ludäscher, Director, Center for Informatics Research in Science & Scholarship (CIRSS), Graduate School of Library and Information Science (GSLIS) & National Center for Supercomputing Applications (NCSA) & Department of Computer Science. Scientific Workflows & Provenance. Intl. Conf. on Software & System Processes (ICSSP), co-located with ICSE’16, Austin, TX, May 14-22, 2016

ICSSP-Panel Austin, May 15, 2016


Page 1: ICSSP-Panel Austin, May 15, 2016

Bertram Ludäscher

Director, Center for Informatics Research in Science & Scholarship (CIRSS)

Graduate School of Library and Information Science (GSLIS) & National Center for Supercomputing Applications (NCSA) & Department of Computer Science

Scientific Workflows & Provenance

Intl. Conf. on Software & System Processes (ICSSP), co-located with ICSE’16, Austin, TX, May 14-22, 2016

Page 2: ICSSP-Panel Austin, May 15, 2016

Outline

•  Scientific Workflows
   –  Examples, Features

•  Provenance & Reproducible Science
   –  “Prospective Provenance” (a.k.a. workflows)
   –  Retrospective Provenance

•  YesWorkflow
   –  Yes, Scripts can be Workflows, too!

•  The Quest continues!

ICSSP'16 Panel, Austin TX

Page 3: ICSSP-Panel Austin, May 15, 2016

Scientific Workflows: ASAP

•  Automation
   –  wfs to automate computational aspects of science

•  Scaling (exploit and optimize machine cycles)
   –  wfs should make use of parallel compute resources
   –  wfs should be able to handle large data

•  Abstraction, Evolution, Reuse (human cycles)
   –  wfs should be easy to (re-)use, evolve, share

•  Provenance
   –  wfs should capture processing history, data lineage
   =>  traceable data- and wf-evolution
   =>  Reproducible Science

[Screenshots: Trident Workbench; VisTrails]

Once upon a time …

Page 4: ICSSP-Panel Austin, May 15, 2016

10 Essential functions of a scientific workflow system

1.  Automate programs and services scientists already use.
2.  Schedule invocations of programs and services correctly and efficiently – in parallel where possible.
3.  Manage data flow to, from, and between programs and services.
4.  Enable scientists (not just developers) to author or modify workflows easily.
5.  Predict what a workflow will do when executed: prospective provenance.
6.  Record what happened during workflow execution: retrospective provenance.
7.  Reveal retrospective provenance – how workflow products were derived from inputs via programs and services.
8.  Organize intermediate and final data products as desired by users.
9.  Enable scientists to version, share and publish their workflows.
10.  Empower scientists who wish to automate additional programs and services themselves.

These functions (not just dataflow & actors) distinguish scientific workflow automation from general scientific software development.

Src: Timothy McPhillips
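Functions 2 and 3 (scheduling invocations in parallel where possible, and managing data flow between steps) can be illustrated with a small sketch. This is not how Kepler or any other workflow system schedules its actors; the `run_workflow` helper, the step names, and the wave-by-wave strategy are assumptions made purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(steps, deps, max_workers=4):
    """Run every step once all of its upstream steps have finished.

    steps: step name -> callable taking a dict {upstream name: upstream result}
    deps:  step name -> list of upstream step names (hypothetical layout)
    Returns (results, waves); each wave lists the steps submitted together.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if deps contain a cycle
    results, waves = {}, []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            ready = sorted(sorter.get_ready())  # steps whose inputs exist
            waves.append(ready)
            futures = {
                name: pool.submit(
                    steps[name], {d: results[d] for d in deps.get(name, ())}
                )
                for name in ready
            }
            for name, future in futures.items():
                results[name] = future.result()  # wait for the whole wave
                sorter.done(name)
    return results, waves


# Hypothetical four-step workflow: two independent analyses share one input.
steps = {
    "load": lambda ins: [3, 1, 2],
    "sort": lambda ins: sorted(ins["load"]),
    "total": lambda ins: sum(ins["load"]),
    "report": lambda ins: f"{ins['sort']} sums to {ins['total']}",
}
deps = {"load": [], "sort": ["load"], "total": ["load"], "report": ["sort", "total"]}

results, waves = run_workflow(steps, deps)
print(waves)
print(results["report"])
```

Here `sort` and `total` land in the same wave because neither depends on the other. Waiting for a whole wave before releasing the next one is a simplification; a real scheduler would typically release each step as soon as its own inputs are ready.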

Page 5: ICSSP-Panel Austin, May 15, 2016

WATERS: Workflow for Alignment, Taxonomy, Ecology of Ribosomal Sequences (Amber Hartman; Eisen Lab; UC Davis)

[Figure: WATERS pipeline diagram. Box labels:]
•  Find OTUs (OTUHunter)
•  Assign Taxonomy (STAP)
•  Profile alignment (STAP or Infernal)
•  Build phylogenetic tree (RAxML or Quicktree)
•  View tree: Dendroscope
•  UniFrac: tree & environment file
•  Assembled contigs
•  Chimera check (Mallard)
•  Diversity statistics: Text: OTU list, Chao1, Shannon; Graphs: rarefaction curves, rank-abundance curves
•  Visualization tools: Cytoscape networks & Heatmap
•  Markers on individual steps: +/- cipres (once), +/- cluster (three times)

Page 6: ICSSP-Panel Austin, May 15, 2016

Executable WATERS Workflow in Kepler


Page 7: ICSSP-Panel Austin, May 15, 2016

Example Bioinformatics Workflow: Motif-Catcher

Marc Facciotti et al., UC Davis Genome Center

Page 8: ICSSP-Panel Austin, May 15, 2016

Motif-Catcher workflow, implemented in Kepler

S Köhler et al. Improved Motif Detection in Large Sequence Sets with Random Sampling in a Kepler workflow, ICCS-WS, 2012

Page 9: ICSSP-Panel Austin, May 15, 2016

A Data-Streaming Workflow over Sensor Data


Page 10: ICSSP-Panel Austin, May 15, 2016

“Plumbing” workflow

•  Monitor and control supercomputer simulations
   –  50+ composite actors (subworkflows)
   –  4 levels of hierarchy
   –  1000+ atomic (Java) actors

[Figure: nested subworkflows, labeled by size: 43 actors, 3 levels; 196 actors, 4 levels; 30 actors; 206 actors, 4 levels; 137 actors; 33 actors; 150; 123 actors; 66 actors; 12 actors; 243 actors, 4 levels]

Norbert Podhorszki, ORNL (then: UC Davis)

Page 11: ICSSP-Panel Austin, May 15, 2016

More “Plumbing” (beware the Boolean Select)

Cabellos et al. Computer Physics Communications 182, 2011

Page 12: ICSSP-Panel Austin, May 15, 2016

Scientific Workflow Design: Some Challenges

“And the graphical UI makes our scientific workflows so much easier to develop, understand and maintain!”

Page 13: ICSSP-Panel Austin, May 15, 2016

Modeling & Design: “The limits of my language mean the limits of my world” (Die Grenzen meiner Sprache bedeuten die Grenzen meiner Welt)

•  Vanilla Process Network
•  Functional Programming Dataflow Network
•  XML Transformation Network
•  Collection-oriented Modeling & Design framework (COMAD): “Look Ma: No Shims!”

Page 14: ICSSP-Panel Austin, May 15, 2016

Problems with [too many] Shims and Wires

•  Shims need to be placed and connected
   –  Tedious, error-prone
•  Distract from scientifically meaningful actors
   –  Non-descriptive workflows – worth sharing?
•  Data organization is encoded in workflow structure
   –  Not robust to data changes
•  Shims often lead to complex designs
   –  Imagine all previous “design-patterns” intertwined
   –  GOTO-programming

COMAD/VDAL: Raising the level of abstraction

•  Localized control-flow
•  Data management not done via wires
•  Actors are coupled not by wire but by data!

Page 15: ICSSP-Panel Austin, May 15, 2016

Pipelined Collection-Oriented Workflows

Collection-Oriented Modeling & Design (COMAD)
   –  fully embrace the assembly line metaphor
   –  data = tagged nested collections
   –  e.g. represented as flattened, pipelined (XML) token streams

Actors (like assembly line workers) pass on what they don’t work on

T McPhillips, S Bowers, D Zinn, B Ludäscher
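The assembly-line idea can be sketched with Python generators. This is an illustrative toy, not the COMAD implementation: the token layout (`open`/`data`/`close` tuples), the `tokens` and `actor` helpers, and the sample collection are all assumptions made for this example.

```python
def tokens(collection):
    """Flatten a nested (tag, items) collection into an open/data/close stream."""
    tag, items = collection
    yield ("open", tag)
    for item in items:
        if isinstance(item, tuple):
            yield from tokens(item)  # nested sub-collection
        else:
            yield ("data", item)
    yield ("close", tag)


def actor(scope_tag, work):
    """Build an actor that transforms data inside collections tagged scope_tag.

    Every other token is passed on unchanged, like a worker on an assembly
    line who ignores parts that are not theirs.
    """
    def run(stream):
        depth = 0  # > 0 while inside a collection tagged scope_tag
        for kind, value in stream:
            if kind == "open" and value == scope_tag:
                depth += 1
            elif kind == "close" and value == scope_tag:
                depth -= 1
            if kind == "data" and depth > 0:
                yield ("data", work(value))
            else:
                yield (kind, value)
    return run


# Hypothetical nested collection: only the "reads" sub-collection is processed.
project = ("project", [("reads", ["acgt", "ttga"]), ("notes", ["draft"])])
upper = actor("reads", str.upper)
reverse = actor("reads", lambda s: s[::-1])

out = list(reverse(upper(tokens(project))))
for token in out:
    print(token)
```

The two actors are composed without any shim between them, and the `notes` collection flows through both untouched. Changing the nesting of the input does not require rewiring the pipeline, which is the change-resilience argument the slides make.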

Page 16: ICSSP-Panel Austin, May 15, 2016

Two different workflow designs

•  Hardwiring vs. configurable data/collection management
•  brittle vs. change-resilient designs
•  scientist can recognize napkin drawing / conceptual model
•  Human cycles are expensive

Page 17: ICSSP-Panel Austin, May 15, 2016

ADIOS in Kepler


Page 18: ICSSP-Panel Austin, May 15, 2016

ADIOS in COMAD


Page 19: ICSSP-Panel Austin, May 15, 2016

From “Climate Gate” to Reproducible Science

Capturing provenance is crucial for transparency, interpretation, debugging, …
=> repeatable experiments,
=> reproducible science
=> need workflow-system agnostic model

Page 20: ICSSP-Panel Austin, May 15, 2016

Natural History: Understanding what happened …

Zrzavý, Jan, David Storch, and Stanislav Mihulka. Evolution: Ein Lese-Lehrbuch. Springer-Verlag, 2009.

Author: Jkwchui (Based on drawing by Truth-seeker2004)

Page 21: ICSSP-Panel Austin, May 15, 2016

Computational Provenance

•  Origin and processing history of an artifact
   –  usually: data (products), figures, ...
   –  sometimes: workflow (and script) evolution …

•  Different sub-communities:
   –  Provenance in databases
   –  Provenance in (scientific) workflows
   –  ... programming languages, systems/security, …

Page 22: ICSSP-Panel Austin, May 15, 2016

Different Kinds of Data Provenance in Workflows

•  Runtime Provenance (a.k.a. traces, logs, retrospective provenance, “Trace-land”)
•  Workflow Modeling & Design (a.k.a. prospective provenance, “Workflow-land”)

Page 23: ICSSP-Panel Austin, May 15, 2016

SKOPE: Synthesized Knowledge Of Past Environments

Bocinsky, Kohler et al. study rain-fed maize of Anasazi
   –  Four Corners; AD 600–1500. Climate change influenced Mesa Verde migrations; late 13th century AD. Uses network of tree-ring chronologies to reconstruct a spatio-temporal climate field at a fairly high resolution (~800 m) from AD 1–2000. Algorithm estimates joint information in tree-rings and a climate signal to identify “best” tree-ring chronologies for climate reconstruction.

K. Bocinsky, T. Kohler, A 2000-year reconstruction of the rain-fed maize agricultural niche in the US Southwest. Nature Communications. doi:10.1038/ncomms6618

… implemented as an R Script …

Page 24: ICSSP-Panel Austin, May 15, 2016

YesWorkflow: Yes, scripts are workflows, too!

•  Script vs Workflows / ASAP:
   –  Automation: *****
   –  Scaling: **
   –  Abstraction: *
   –  Provenance: **

[Figure: workflow graph of the paleoclimate reconstruction R script. Node labels: GetModernClimate; PRISM_annual_growing_season_precipitation; SubsetAllData; dendro_series_for_calibration; dendro_series_for_reconstruction; CAR_Analysis_unique; cellwise_unique_selected_linear_models; CAR_Analysis_union; cellwise_union_selected_linear_models; CAR_Reconstruction_union; raster_brick_spatial_reconstruction; raster_brick_spatial_reconstruction_errors; CAR_Reconstruction_union_output; ZuniCibola_PRISM_grow_prcp_ols_loocv_union_recons.tif; ZuniCibola_PRISM_grow_prcp_ols_loocv_union_errors.tif; master_data_directory; prism_directory; tree_ring_data; calibration_years; retrodiction_years]

Page 25: ICSSP-Panel Austin, May 15, 2016

YesWorkflow.org

•  YesWorkflow (YW)
   –  Started as a grass-roots effort (Kurator, SKOPE, ..)
   –  … meeting the scientists/users where they R!
      •  R, Matlab, (i)Python, Jupyter, …
   –  Scripts + simple user annotations

•  => Reveal the workflow model/abstraction … that underlies the (script) implementation

•  => YW can give us more of ASAP!
   –  First YW: ASAP (Abstraction) ...
   –  Then YW-recon: ASAP (reconstructing runtime Provenance)

Page 26: ICSSP-Panel Austin, May 15, 2016

YW annotations: Model your Workflow!
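To make the annotation idea concrete, here is a small sketch of a script marked up with YW-style comments (`@begin`, `@end`, `@in`, `@out`), followed by a toy extractor that recovers the declared blocks. The block and data names are borrowed from one of the example workflows shown in this deck, but the script body is a simplified stand-in, and `extract_blocks` is a hypothetical helper written for this example, not the YesWorkflow tool itself.

```python
import re

# A miniature annotated script, kept as a string so it can be inspected below.
ANNOTATED_SCRIPT = '''
# @begin main
# @in input_data_file
# @out result_NEE_pdf

# @begin load_data
# @in input_data_file
# @out NEE_data
nee = [1.0, 2.0, 3.0]   # stand-in for reading the input file
# @end load_data

# @begin simple_diagnose
# @in NEE_data
# @out result_NEE_pdf
print(sum(nee) / len(nee))
# @end simple_diagnose

# @end main
'''

TAG = re.compile(r"#\s*@(begin|end|in|out)\s+(\S+)")


def extract_blocks(source):
    """Toy extractor: map each @begin block to its declared @in/@out names."""
    blocks, stack = {}, []
    for line in source.splitlines():
        match = TAG.search(line)
        if not match:
            continue  # ordinary code or an unannotated comment
        keyword, name = match.groups()
        if keyword == "begin":
            blocks[name] = {"in": [], "out": []}
            stack.append(name)
        elif keyword == "end":
            stack.pop()
        else:
            blocks[stack[-1]][keyword].append(name)  # attach to innermost block
    return blocks


blocks = extract_blocks(ANNOTATED_SCRIPT)
for name, ports in blocks.items():
    print(name, ports)
```

Because the annotations live in comments, the script still runs unchanged in its own interpreter, and the same markup works in any language with a comment syntax.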

Page 27: ICSSP-Panel Austin, May 15, 2016

YesWorkflow: Prospective & Retrospective Provenance … (almost) for free!

•  YW annotations in the script (R, Python, Matlab) are used to recreate the workflow view from the script …

[Figure: YW workflow view of a data collection script. Node labels:]
   cassette_id
   sample_score_cutoff
   sample_spreadsheet (file:cassette_{cassette_id}_spreadsheet.csv)
   calibration_image (file:calibration.img)
   initialize_run
   run_log (file:run/run_log.txt)
   load_screening_results
   sample_name, sample_quality
   calculate_strategy
   rejected_sample, accepted_sample, num_images, energies
   log_rejected_sample
   rejection_log (file:/run/rejected_samples.txt)
   collect_data_set
   sample_id, energy, frame_number
   raw_image (file:run/raw/{cassette_id}/{sample_id}/e{energy}/image_{frame_number}.raw)
   transform_images
   corrected_image (file:data/{sample_id}/{sample_id}_{energy}eV_{frame_number}.img)
   total_intensity, pixel_count, corrected_image_path
   log_average_image_intensity
   collection_log (file:run/collected_images.csv)

Page 28: ICSSP-Panel Austin, May 15, 2016

Paleoclimate Reconstruction (EnviRecon.org)

•  … explained using YesWorkflow!

[Figure: the YesWorkflow graph of the paleoclimate reconstruction script (GetModernClimate, SubsetAllData, CAR_Analysis_unique, CAR_Analysis_union, CAR_Reconstruction_union, CAR_Reconstruction_union_output), repeated from the earlier slide]

Kyle B., (computational) archaeologist: "It took me about 20 minutes to comment. Less than an hour to learn and YW-annotate, all-told."

Page 29: ICSSP-Panel Austin, May 15, 2016

Get 3 views for the price of 1!

[Figure: three YesWorkflow renderings of the same annotated script (top-level block: main)]
•  Process view: steps as nodes, data items as edge labels
•  Data view: data items as nodes, steps as edge labels
•  Combined view: steps and data items both shown as nodes

Steps: fetch_mask, load_data, standardize_with_mask, simple_diagnose
Data: input_mask_file, input_data_file, land_water_mask, NEE_data, standardized_NEE_data, result_NEE_pdf
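How one set of block declarations yields three views can be sketched as follows. The wiring in `BLOCKS` is inferred from the labels on this slide and should be read as an assumption; the three functions are illustrative helpers, not YesWorkflow's own graph generator.

```python
# step -> (inputs, outputs); wiring inferred from the slide's node labels
BLOCKS = {
    "fetch_mask": (["input_mask_file"], ["land_water_mask"]),
    "load_data": (["input_data_file"], ["NEE_data"]),
    "standardize_with_mask": (
        ["land_water_mask", "NEE_data"],
        ["standardized_NEE_data"],
    ),
    "simple_diagnose": (["standardized_NEE_data"], ["result_NEE_pdf"]),
}


def combined_view(blocks):
    """Bipartite edges: data -> step for inputs, step -> data for outputs."""
    edges = []
    for step, (ins, outs) in blocks.items():
        edges += [(d, step) for d in ins]
        edges += [(step, d) for d in outs]
    return edges


def process_view(blocks):
    """(producer step, data label, consumer step) for data passed between steps."""
    producer = {d: step for step, (_, outs) in blocks.items() for d in outs}
    return [
        (producer[d], d, step)
        for step, (ins, _) in blocks.items()
        for d in ins
        if d in producer  # workflow inputs have no producing step
    ]


def data_view(blocks):
    """(source data, step label, derived data) for every input/output pair."""
    return [
        (src, step, dst)
        for step, (ins, outs) in blocks.items()
        for src in ins
        for dst in outs
    ]


for name, view in [("process", process_view), ("data", data_view),
                   ("combined", combined_view)]:
    print(name, len(view(BLOCKS)), "edges")
```

In the data view, `standardize_with_mask` labels two edges because it has two inputs, which matches its double appearance among the figure labels.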

Page 30: ICSSP-Panel Austin, May 15, 2016

Multi-Scale Synthesis and Terrestrial Model Intercomparison Project (MsTMIP)

[Figure: YW workflow view. Node labels: fetch_drought_variable; drought_variable_1; fetch_effect_variable; effect_variable_1; convert_effect_variable_units; effect_variable_2; create_land_water_mask; land_water_mask; init_data_variables; predrought_effect_variable_1; drought_value_variable_1; recovery_time_variable_1; drought_number_variable_1; define_droughts; sigma_dv_event; month_dv_length; detrend_deseasonalize_effect_variable; effect_variable_3; calculate_data_variables; recovery_time_variable_2; drought_value_variable_2; predrought_effect_variable_2; drought_number_variable_2; export_recovery_time_figure; output_recovery_time_figure; export_drought_value_variable_figure; output_drought_value_variable_figure; export_predrought_effect_variable_figure; output_predrought_effect_variable_figure; export_drought_number_variable_figure; output_drought_number_figure; input_drough_variable; input_effect_variable]

Christopher Schwalm, Yaxing Wei

Page 31: ICSSP-Panel Austin, May 15, 2016

Figure 4: Process workflow view of an Affymetrix analysis script (in R).

4 YesWorkflow Examples

In the following we show YesWorkflow views extracted from real-world scientific use cases. The scripts were annotated with YW tags by scientists and script authors, with very modest training and mark-up effort.1 Due to lack of space, the actual MATLAB and R scripts with their YW markup are not included here. However, they are all available from the yw-idcc-15 repository on the YW GitHub site [Yes15].

4.1 Analysis of Gene Expression Microarray Data

Bioinformatics workflows commonly possess a pattern of large numbers of incoming parameters and outputs at each stage of computation. In addition, analysis of even a single bioinformatics dataset tends to yield a large number of different output files. Hence, bioinformatics pipelines are attractive candidates for workflow systems, which can capture this complexity [Bie12]. Figure 4 shows a YesWorkflow representation of an R script performing a classic, complex bioinformatics task: analysis of Affymetrix gene expression microarray data. This R script was modeled on our previous workflows developed in the Kepler environment [SMLB12]. The script analyzes experiment designs consisting of two conditions (e.g., microarrays from control-treated cells vs microarrays from drug-treated cells) with multiple replicates in each condition. The R script employs a set of standard BioConductor [GCB+04] packages mixed with custom programming. The workflow consists of four fundamental tasks: normalization of data across microarray datasets (Normalize), selection of differentially expressed genes (DEGs) between conditions (SelectDEGs), determination of gene ontology (GO) statistics for the resulting datasets (GO Analysis), and creation of a heatmap of the differentially expressed genes (MakeHeatmap). Each module produces outputs, and each module (aside from MakeHeatmap) requires external parameter inputs. Importantly, this graphical representation clearly indicates the dependence of each module on datasets and parameter inputs. This example demonstrates that YesWorkflow can provide informative visualizations of bioinformatics workflows, especially workflows involving large numbers of inputs and outputs.

1 For all of these scripts, learning the YW model and annotating the scripts was done in a few hours.

Gene Expression Microarray Data Analysis

•  [Normalize]
   –  Normalization of data across microarray datasets
•  [SelectDEGs]
   –  Selection of differentially expressed genes between conditions
•  [GOAnalysis]
   –  determination of gene ontology statistics for the resulting datasets
•  [MakeHeatmap]
   –  creation of a heatmap of the differentially expressed genes.

Tyler Kolisnik, Mark Bieda

Page 32: ICSSP-Panel Austin, May 15, 2016

Data collection workflow (X-ray diffraction)

[Figure: the YW workflow view of the data collection script (initialize_run, load_screening_results, calculate_strategy, log_rejected_sample, collect_data_set, transform_images, log_average_image_intensity), repeated from the earlier slide]

Page 33: ICSSP-Panel Austin, May 15, 2016

YW-RECON: Prospective & Retrospective Provenance … (almost) for free!

run/
├── raw
│   └── q55
│       ├── DRT240
│       │   ├── e10000
│       │   │   ├── image_001.raw
│       │   │   ...
│       │   │   └── image_037.raw
│       │   └── e11000
│       │       ├── image_001.raw
│       │       ...
│       │       └── image_037.raw
│       └── DRT322
│           ├── e10000
│           │   ├── image_001.raw
│           │   ...
│           │   └── image_030.raw
│           └── e11000
│               ├── image_001.raw
│               ...
│               └── image_030.raw
├── data
│   ├── DRT240
│   │   ├── DRT240_10000eV_001.img
│   │   ...
│   │   └── DRT240_11000eV_037.img
│   └── DRT322
│       ├── DRT322_10000eV_001.img
│       ...
│       └── DRT322_11000eV_030.img
│
├── collected_images.csv
├── rejected_samples.txt
└── run_log.txt

[Figure: the data collection workflow view, repeated from the earlier slide, shown next to the directory listing]

•  URI-templates link conceptual entities to runtime provenance “left behind” by the script author …
•  … facilitating provenance reconstruction
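The link between URI templates and files on disk can be sketched as a template-to-regex translation. This is a minimal illustration of the idea, not the YW-recon implementation: `template_to_regex` is a hypothetical helper, the rule that a variable never contains `/`, `_` or `.` is an assumption that happens to fit these file names, and the file list is a hand-picked subset of the directory listing on this slide.

```python
import re


def template_to_regex(template):
    """Compile a URI template such as 'a/{x}/b_{y}.raw' into a regex.

    Each {variable} becomes a named group; a variable that occurs twice
    must match the same text both times.
    """
    pieces = re.split(r"\{(\w+)\}", template)  # literals sit at even indexes
    pattern, seen = "", set()
    for index, piece in enumerate(pieces):
        if index % 2 == 0:
            pattern += re.escape(piece)
        elif piece in seen:
            pattern += f"(?P={piece})"  # back-reference to the first match
        else:
            seen.add(piece)
            pattern += f"(?P<{piece}>[^/_.]+)"  # assumed variable alphabet
    return re.compile(pattern)


RAW_IMAGE = "run/raw/{cassette_id}/{sample_id}/e{energy}/image_{frame_number}.raw"
files = [
    "run/raw/q55/DRT240/e10000/image_001.raw",
    "run/raw/q55/DRT240/e11000/image_037.raw",
    "run/raw/q55/DRT322/e10000/image_001.raw",
    "run/raw/q55/DRT322/e11000/image_030.raw",
    "run/run_log.txt",  # does not match the raw_image template
]

raw = template_to_regex(RAW_IMAGE)
matches = [raw.fullmatch(path) for path in files]
bindings = [m.groupdict() for m in matches if m]

samples = sorted({b["sample_id"] for b in bindings})
drt322_energies = sorted(
    {b["energy"] for b in bindings if b["sample_id"] == "DRT322"}
)
print(samples)
print(drt322_energies)
```

Each matching path yields a binding of template variables to values, so a file listing alone is enough to ask which samples were imaged and at which energies, without a runtime provenance recorder.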

Page 34: ICSSP-Panel Austin, May 15, 2016

Q1: What samples did the script run collect images from?

[Figure: the data collection workflow view and the run/ directory listing, repeated from the preceding slides]

Page 35: ICSSP-Panel Austin, May 15, 2016

Q2: What energies were used for image collection from sample DRT322?

[Figure: the data collection workflow view and the run/ directory listing, repeated from the preceding slides]

Page 36: ICSSP-Panel Austin, May 15, 2016

Q3: Where is the raw image of the corrected image DRT322_11000eV_030.img?

[Figure: the data collection workflow view and the run/ directory listing, repeated from the preceding slides]

Page 37: ICSSP-Panel Austin, May 15, 2016

Q5: What cassette-id had the sample leading to DRT240_10000eV_001.img?

[Figure: the data collection workflow view and the run/ directory listing, repeated from the preceding slides]
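Lineage questions such as Q3 and Q5 amount to joining two templates on the variables they share. The sketch below assumes hand-written regular expressions as stand-ins for the raw_image and corrected_image templates, and paths written exactly as in those templates; `upstream_raw` is a hypothetical helper, not part of YesWorkflow.

```python
import re

# Hand-written regex equivalents of the two URI templates (an assumption).
RAW = re.compile(
    r"run/raw/(?P<cassette_id>[^/]+)/(?P<sample_id>[^/]+)"
    r"/e(?P<energy>\d+)/image_(?P<frame_number>\d+)\.raw"
)
CORRECTED = re.compile(
    r"data/(?P<sample_id>[^/]+)/(?P=sample_id)"
    r"_(?P<energy>\d+)eV_(?P<frame_number>\d+)\.img"
)

RAW_PATHS = [  # subset of the raw files in the directory listing
    "run/raw/q55/DRT240/e10000/image_001.raw",
    "run/raw/q55/DRT240/e11000/image_037.raw",
    "run/raw/q55/DRT322/e10000/image_001.raw",
    "run/raw/q55/DRT322/e11000/image_030.raw",
]


def upstream_raw(corrected_path, raw_paths):
    """Return (raw path, cassette_id) pairs that agree with the corrected
    image on every shared variable: sample_id, energy, frame_number."""
    target = CORRECTED.fullmatch(corrected_path).groupdict()
    hits = []
    for path in raw_paths:
        m = RAW.fullmatch(path)
        if m and all(m[key] == value for key, value in target.items()):
            hits.append((path, m["cassette_id"]))
    return hits


q3 = upstream_raw("data/DRT322/DRT322_11000eV_030.img", RAW_PATHS)
q5 = upstream_raw("data/DRT240/DRT240_10000eV_001.img", RAW_PATHS)
print(q3)
print(q5)
```

The cassette id never appears in the corrected image's file name; it is recovered only because the raw image template carries it and the workflow model says corrected images derive from raw images.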

Page 38: ICSSP-Panel Austin, May 15, 2016

Meme-Pad & Conclusions

•  Scientific workflows (ASAP)
   –  Data-oriented vs process-oriented modeling
      •  dataflow models vs control-flow oriented models
      •  conceptual models vs execution models
      =>  What models and tools for which kinds of workflows?
   –  Data provenance in workflows
      •  Prospective provenance
      •  Retrospective provenance

•  YesWorkflow:
   –  Scripts are (can be) workflows, too!

•  The Quest: so many models, so little time (contrast w/ databases)
   –  Empowering scientists …
   –  … with useful tools based on useful models
   –  … meeting them where they already are
   –  … provenance is key
   –  One size might not fit all: Beware of the Turing tar-pit!

Page 39: ICSSP-Panel Austin, May 15, 2016

YesWorkflow References

•  http://yesworkflow.org

•  T. McPhillips, S. Bowers, K. Belhajjame, B. Ludäscher (2015). Retrospective Provenance Without a Runtime Provenance Recorder. 7th USENIX Workshop on the Theory and Practice of Provenance (TaPP'15).

•  T. McPhillips, T. Song, T. Kolisnik, S. Aulenbach, K. Belhajjame, R. K. Bocinsky, Y. Cao, J. Cheney, F. Chirigati, S. Dey, J. Freire, C. Jones, J. Hanken, K. W. Kintigh, T. A. Kohler, D. Koop, J. A. Macklin, P. Missier, M. Schildhauer, C. Schwalm, Y. Wei, M. Bieda, B. Ludäscher (2015). YesWorkflow: A User-Oriented, Language-Independent Tool for Recovering Workflow Information from Scripts. International Journal of Digital Curation 10, 298-313.

•  João Felipe Pimentel, Saumen Dey, Timothy McPhillips, Khalid Belhajjame, David Koop, Leonardo Murta, Vanessa Braganholo, Bertram Ludäscher. Yin & Yang: Demonstrating Complementary Provenance from noWorkflow & YesWorkflow (demo paper). Provenance and Annotation of Data and Processes: 6th Intl. Provenance and Annotation Workshop (IPAW), June 7-8, 2016, McLean, VA.