Bertram Ludäscher
Director, Center for Informatics Research in Science & Scholarship (CIRSS)
Graduate School of Library and Information Science (GSLIS) & National Center for Supercomputing Applications (NCSA)
& Department of Computer Science
Scientific Workflows & Provenance
Intl. Conf. on Software & System Processes (ICSSP), co-located with ICSE'16, Austin, TX, May 14-22, 2016
Outline
• Scientific Workflows: Examples, Features
• Provenance & Reproducible Science
  – "Prospective Provenance" (a.k.a. workflows)
  – Retrospective Provenance
• YesWorkflow: Yes, Scripts can be Workflows, too!
• The Quest continues!
ICSSP'16 Panel, Austin TX
Scientific Workflows: ASAP
• Automation
  – wfs to automate computational aspects of science
• Scaling (exploit and optimize machine cycles)
  – wfs should make use of parallel compute resources
  – wfs should be able to handle large data
• Abstraction, Evolution, Reuse (human cycles)
  – wfs should be easy to (re-)use, evolve, share
• Provenance
  – wfs should capture processing history, data lineage => traceable data- and wf-evolution => Reproducible Science
Once upon a time…
[Figure: Trident Workbench; VisTrails]
10 Essential functions of a scientific workflow system
1. Automate programs and services scientists already use.
2. Schedule invocations of programs and services correctly and efficiently, in parallel where possible.
3. Manage data flow to, from, and between programs and services.
4. Enable scientists (not just developers) to author or modify workflows easily.
5. Predict what a workflow will do when executed: prospective provenance.
6. Record what happened during workflow execution: retrospective provenance.
7. Reveal retrospective provenance: how workflow products were derived from inputs via programs and services.
8. Organize intermediate and final data products as desired by users.
9. Enable scientists to version, share and publish their workflows.
10. Empower scientists who wish to automate additional programs and services themselves.
These functions (not just dataflow & actors) distinguish scientific workflow automation from general scientific software development.
Source: Timothy McPhillips
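Functions 2 and 3 (scheduling invocations correctly, in parallel where possible, and managing the data flow between them) can be illustrated with a minimal Python 3.9+ sketch. It assumes a workflow given as a hypothetical mapping from step names to Python callables, plus a mapping from each step to the upstream steps whose results it consumes. The standard library's `graphlib.TopologicalSorter` releases every step whose inputs are ready, and each such batch runs on a thread pool. This is an illustration of the idea only, not how Kepler or any other workflow system schedules actors.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter
from typing import Callable, Dict, List


def run_workflow(steps: Dict[str, Callable[..., object]],
                 inputs: Dict[str, List[str]],
                 max_workers: int = 4) -> Dict[str, object]:
    """Run `steps` in dependency order, in parallel where possible.

    steps:  step name -> callable
    inputs: step name -> upstream step names whose results are passed,
            in this order, as positional arguments (missing = no inputs)
    """
    graph = {name: set(inputs.get(name, [])) for name in steps}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the workflow has a cycle
    results: Dict[str, object] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            ready = sorter.get_ready()  # steps whose inputs all exist
            futures = {
                name: pool.submit(steps[name],
                                  *[results[dep] for dep in inputs.get(name, [])])
                for name in ready
            }
            for name, future in futures.items():
                results[name] = future.result()  # keep result for downstream steps
                sorter.done(name)
    return results


# Hypothetical four-step workflow: "sort" and "total" can run in parallel.
steps = {
    "load": lambda: [3, 1, 2],
    "sort": lambda xs: sorted(xs),
    "total": lambda xs: sum(xs),
    "report": lambda s, t: f"{s} sum={t}",
}
inputs = {"sort": ["load"], "total": ["load"], "report": ["sort", "total"]}
print(run_workflow(steps, inputs)["report"])  # [1, 2, 3] sum=6
```

Each batch waits for all of its steps before the next batch is released; a production scheduler would start a step as soon as its own inputs arrive.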
WATERS: Workflow for Alignment, Taxonomy, Ecology of Ribosomal Sequences (Amber Hartman; Eisen Lab; UC Davis)
[Figure: WATERS pipeline components: assembled contigs; chimera check (Mallard); find OTUs (OTUHunter); assign taxonomy (STAP); profile alignment (STAP or Infernal); build phylogenetic tree (RAxML or Quicktree); view tree (Dendroscope); UniFrac (tree & environment file); diversity statistics (text: OTU list, Chao1, Shannon; graphs: rarefaction curves, rank-abundance curves); visualization tools (Cytoscape networks & heatmap). Some steps are marked "+/- cipres" or "+/- cluster".]
Executable WATERS Workflow in Kepler
Example Bioinformatics Workflow: Motif-Catcher
Marc Facciotti et al., UC Davis Genome Center
Motif-Catcher workflow, implemented in Kepler
S. Köhler et al. Improved Motif Detection in Large Sequence Sets with Random Sampling in a Kepler workflow, ICCS-WS, 2012
A Data-Streaming Workflow over Sensor Data
• Monitor and control supercomputer simulations
  – 50+ composite actors (subworkflows)
  – 4 levels of hierarchy
  – 1000+ atomic (Java) actors
[Figure: composite actors of the monitoring workflow, each annotated with its size (from 12 to 243 actors, up to 4 levels of hierarchy)]
Norbert Podhorszki, ORNL (then: UC Davis)
"Plumbing" workflow
More “Plumbing” (beware the Boolean Select)
Cabellos et al., Computer Physics Communications 182, 2011
Scientific Workflow Design: Some Challenges
"And the graphical UI makes our scientific workflows so much easier to develop, understand and maintain!"
Modeling & Design: "The limits of my language mean the limits of my world"
[Figure panels: Vanilla Process Network; Functional Programming Dataflow Network; XML Transformation Network; Collection-oriented Modeling & Design framework (COMAD), "Look Ma: No Shims!"]
Problems with [too many] Shims and Wires
• Shims need to be placed and connected
  – Tedious, error-prone
• Distract from scientifically meaningful actors
  – Non-descriptive workflows: worth sharing?
• Data organization is encoded in workflow structure
  – Not robust to data changes
• Shims often lead to complex designs
  – Imagine all previous 'design patterns' intertwined
  – GOTO-programming
COMAD/VDAL: Raising the level of abstraction
• Localized control-flow
• Data management not done via wires
• Actors are coupled not by wire but by data!
Collection-Oriented Modeling & Design (COMAD)
– fully embrace the assembly line metaphor
– data = tagged nested collections
– e.g. represented as flattened, pipelined (XML) token streams
Pipelined Collection-Oriented Workflows
Actors (like assembly line workers) pass on what they don't work on
T. McPhillips, S. Bowers, D. Zinn, B. Ludäscher
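The assembly-line idea can be sketched in a few lines of Python. This is a minimal illustration under assumed conventions: a nested collection is flattened into hypothetical "open"/"close" delimiter tokens around "data" tokens, and each actor reads only the data tokens of one type inside its scope, appends its result as a new data token, and passes every token downstream. `Token`, `actor` and `pipeline` are illustrative names, not COMAD's API; chained Python generators stand in for pipelined token streams.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Token:
    kind: str             # "open", "close" or "data"
    label: str            # collection label (open/close) or data type (data)
    value: object = None  # payload of a data token


def actor(scope: str, accepts: str,
          fn: Callable[[object], Tuple[str, object]]):
    """Build a pipeline stage: for each `accepts` data token inside a `scope`
    collection, append fn's (label, value) result as a new data token.
    Every incoming token is passed on unchanged."""
    def stage(stream: Iterable[Token]) -> Iterator[Token]:
        depth = 0  # number of enclosing `scope` collections
        for tok in stream:
            if tok.kind == "open" and tok.label == scope:
                depth += 1
            elif tok.kind == "close" and tok.label == scope:
                depth -= 1
            yield tok  # pass everything on, like an assembly-line worker
            if tok.kind == "data" and tok.label == accepts and depth > 0:
                out_label, out_value = fn(tok.value)
                yield Token("data", out_label, out_value)
    return stage


def pipeline(stream: Iterable[Token], *stages) -> Iterator[Token]:
    # Chained generators stream tokens lazily through all stages.
    for stage in stages:
        stream = stage(stream)
    return stream


tokens = [
    Token("open", "Project"),
    Token("open", "Sample"), Token("data", "Seq", "ACGT"), Token("close", "Sample"),
    Token("open", "Sample"), Token("data", "Seq", "GGC"), Token("close", "Sample"),
    Token("data", "Note", "outside any Sample"),
    Token("close", "Project"),
]

measure = actor("Sample", "Seq", lambda s: ("Len", len(s)))
classify = actor("Sample", "Len", lambda n: ("Class", "long" if n > 3 else "short"))

for tok in pipeline(tokens, measure, classify):
    print(tok.kind, tok.label, tok.value)
```

Neither actor is wired to the other: `classify` simply finds the `Len` tokens that `measure` added to each `Sample` collection, and the `Note` token outside any `Sample` passes through untouched.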
Two different workflow designs
• Hardwiring vs. configurable data/collection management
• Brittle vs. change-resilient designs
• Scientist can recognize napkin drawing/conceptual model
• Human cycles are expensive
ADIOS in Kepler
ADIOS in COMAD
From “Climate Gate” to Reproducible Science
Capturing provenance is crucial for transparency, interpretation, debugging, … => repeatable experiments => reproducible science => need a workflow-system-agnostic model
Natural History: Understanding what happened…
Zrzavý, Jan, David Storch, and Stanislav Mihulka. Evolution: Ein Lese-Lehrbuch. Springer-Verlag, 2009.
Author: Jkwchui (based on drawing by Truth-seeker2004)
Computational Provenance
• Origin and processing history of an artifact
  – usually: data (products), figures, ...
  – sometimes: workflow (and script) evolution…
• Different sub-communities:
  – Provenance in databases
  – Provenance in (scientific) workflows
  – ... programming languages, systems/security, …
Different Kinds of Data Provenance in Workflows
• Workflow Modeling & Design (a.k.a. prospective provenance, "Workflow-land")
• Runtime Provenance (a.k.a. traces, logs, retrospective provenance, "Trace-land")
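The difference between the two kinds of provenance can be made concrete with a small Python sketch. It assumes a hypothetical three-step workflow: `PLAN` is prospective provenance (what each step is declared to read and write, using abstract names), while `TRACE` is retrospective provenance from one run (the concrete, hypothetical files each step actually used and generated). `conforms` and `lineage` are illustrative helpers, not part of any provenance standard or tool.

```python
from typing import Dict, List, Set, Tuple

# Prospective provenance: the workflow plan (abstract data names, no files).
PLAN: Dict[str, Tuple[List[str], List[str]]] = {
    "clean": (["raw_data"], ["clean_data"]),
    "fit":   (["clean_data", "params"], ["model"]),
    "plot":  (["model"], ["figure"]),
}

# Retrospective provenance: what one run actually did (hypothetical files).
TRACE: List[dict] = [
    {"step": "clean", "used": ["in/day1.csv"], "generated": ["out/clean.csv"]},
    {"step": "fit", "used": ["out/clean.csv", "params.json"], "generated": ["out/model.pkl"]},
    {"step": "plot", "used": ["out/model.pkl"], "generated": ["out/fig.png"]},
]


def conforms(trace: List[dict], plan: Dict[str, Tuple[List[str], List[str]]]) -> bool:
    """Every traced event must be a planned step with the planned number of inputs/outputs."""
    return all(
        ev["step"] in plan
        and len(ev["used"]) == len(plan[ev["step"]][0])
        and len(ev["generated"]) == len(plan[ev["step"]][1])
        for ev in trace
    )


def lineage(artifact: str, trace: List[dict]) -> Set[str]:
    """All artifacts that `artifact` was (transitively) derived from, per the trace."""
    upstream: Set[str] = set()
    frontier = [artifact]
    while frontier:
        current = frontier.pop()
        for ev in trace:
            if current in ev["generated"]:
                for src in ev["used"]:
                    if src not in upstream:
                        upstream.add(src)
                        frontier.append(src)
    return upstream


print(conforms(TRACE, PLAN))
print(sorted(lineage("out/fig.png", TRACE)))
```

The plan alone can only say that a figure depends on some model; the lineage question "which input files produced this figure?" needs the trace.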
SKOPE: Synthesized Knowledge Of Past Environments
Bocinsky, Kohler et al. study rain-fed maize of the Anasazi
– Four Corners; AD 600–1500. Climate change influenced Mesa Verde migrations; late 13th century AD.
Uses a network of tree-ring chronologies to reconstruct a spatio-temporal climate field at a fairly high resolution (~800 m) from AD 1–2000. The algorithm estimates joint information in tree rings and a climate signal to identify the "best" tree-ring chronologies for climate reconstruction.
K. Bocinsky, T. Kohler. A 2000-year reconstruction of the rain-fed maize agricultural niche in the US Southwest. Nature Communications. doi:10.1038/ncomms6618
… implemented as an R script …
[Figure: YesWorkflow graph of the paleoclimate reconstruction R script. Steps: GetModernClimate, SubsetAllData, CAR_Analysis_unique, CAR_Analysis_union, CAR_Reconstruction_union, CAR_Reconstruction_union_output. Inputs and parameters: master_data_directory, prism_directory, tree_ring_data, calibration_years, retrodiction_years. Intermediate data: PRISM_annual_growing_season_precipitation, dendro_series_for_calibration, dendro_series_for_reconstruction, cellwise_unique_selected_linear_models, cellwise_union_selected_linear_models, raster_brick_spatial_reconstruction, raster_brick_spatial_reconstruction_errors. Outputs: ZuniCibola_PRISM_grow_prcp_ols_loocv_union_recons.tif, ZuniCibola_PRISM_grow_prcp_ols_loocv_union_errors.tif.]
?
YesWorkflow: Yes, scripts are workflows, too!
• Script vs. Workflows/ASAP:
  – Automation: *****
  – Scaling: **
  – Abstraction: *
  – Provenance: **
YesWorkflow.org
• YesWorkflow (YW)
  – Started as a grass-roots effort (Kurator, SKOPE, ..)
  – … meeting the scientists/users where they R!
    • R, Matlab, (i)Python, Jupyter, …
  – Scripts + simple user annotations
• => Reveal the workflow model/abstraction … that underlies the (script) implementation
• => YW can give us more of ASAP!
  – First YW: ASAP (Abstraction) ...
  – Then YW-recon: ASAP (reconstructing runtime Provenance)
YW annotations: Model your Workflow!
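YW annotations are ordinary script comments: `@begin`/`@end` delimit a block (a workflow step), and `@in`, `@out` and `@param` declare the data it reads and writes. The toy parser below is a hypothetical illustration of how such comments can be turned into a workflow model. It handles only these five tags (ignoring others such as aliases and URI templates) and is not the YesWorkflow tool. The annotated script mirrors the block and data names in the NEE example; its code lines call hypothetical functions and are never executed.

```python
import re
from typing import Dict, List

# Only a subset of YW-style tags is recognised by this toy parser.
TAG = re.compile(r"@(begin|end|in|out|param)\s+(\S+)")

SCRIPT = """
# @begin main
# @in input_mask_file
# @in input_data_file
# @out result_NEE_pdf

# @begin fetch_mask
# @in input_mask_file
# @out land_water_mask
land_water_mask = read_mask(input_mask_file)
# @end fetch_mask

# @begin load_data
# @in input_data_file
# @out NEE_data
NEE_data = read_data(input_data_file)
# @end load_data

# @begin standardize_with_mask
# @in NEE_data
# @in land_water_mask
# @out standardized_NEE_data
standardized_NEE_data = apply_mask(NEE_data, land_water_mask)
# @end standardize_with_mask

# @begin simple_diagnose
# @in standardized_NEE_data
# @out result_NEE_pdf
plot_pdf(standardized_NEE_data)
# @end simple_diagnose

# @end main
"""


def parse_yw(script: str) -> Dict[str, Dict[str, List[str]]]:
    """Return {block: {"in": [...], "out": [...]}} from comment annotations."""
    blocks: Dict[str, Dict[str, List[str]]] = {}
    stack: List[str] = []  # currently open blocks, innermost last
    for line in script.splitlines():
        stripped = line.strip()
        if not stripped.startswith("#"):
            continue  # annotations live in comments; code lines are ignored
        for tag, name in TAG.findall(stripped):
            if tag == "begin":
                blocks[name] = {"in": [], "out": []}
                stack.append(name)
            elif tag == "end":
                if not stack or stack[-1] != name:
                    raise ValueError(f"@end {name} does not match an open @begin")
                stack.pop()
            elif not stack:
                raise ValueError(f"@{tag} {name} appears outside any block")
            elif tag in ("in", "param"):
                blocks[stack[-1]]["in"].append(name)
            else:  # tag == "out"
                blocks[stack[-1]]["out"].append(name)
    if stack:
        raise ValueError(f"unclosed block(s): {stack}")
    return blocks


for name, block in parse_yw(SCRIPT).items():
    print(f"{name}: in={block['in']} out={block['out']}")
```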
YesWorkflow: Prospective & Retrospective Provenance … (almost) for free!
• YW annotations in the script (R, Python, Matlab) are used to recreate the workflow view from the script…
[Figure ("YW!"): YesWorkflow graph recovered from the annotated X-ray diffraction data collection script. Parameters: cassette_id, sample_score_cutoff. Steps: initialize_run, load_screening_results, calculate_strategy, log_rejected_sample, collect_data_set, transform_images, log_average_image_intensity. Data, with URI templates where given: sample_spreadsheet (file:cassette_{cassette_id}_spreadsheet.csv), calibration_image (file:calibration.img), run_log (file:run/run_log.txt), sample_name, sample_quality, rejected_sample, accepted_sample, num_images, energies, rejection_log (file:/run/rejected_samples.txt), sample_id, energy, frame_number, raw_image (file:run/raw/{cassette_id}/{sample_id}/e{energy}/image_{frame_number}.raw), corrected_image (file:data/{sample_id}/{sample_id}_{energy}eV_{frame_number}.img), total_intensity, pixel_count, corrected_image_path, collection_log (file:run/collected_images.csv).]
Paleoclimate Reconstruction (EnviRecon.org)
• … explained using YesWorkflow!
Kyle B., (computational) archaeologist: "It took me about 20 minutes to comment. Less than an hour to learn and YW-annotate, all-told."
Get 3 views for the price of 1!
[Figure: process view, data view, and combined view of the same YW-annotated script (block main). Dataflow: input_mask_file → fetch_mask → land_water_mask; input_data_file → load_data → NEE_data; NEE_data and land_water_mask → standardize_with_mask → standardized_NEE_data → simple_diagnose → result_NEE_pdf.]
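All three views can be derived from the same per-step in/out declarations. A minimal Python sketch, assuming those declarations for the four steps of the NEE example have already been extracted into a plain dictionary (`BLOCKS` is hand-written here; the view functions and the Graphviz DOT output are illustrative, not YesWorkflow's implementation):

```python
from typing import Dict, List, Set, Tuple

Blocks = Dict[str, Dict[str, List[str]]]
Edge = Tuple[str, str]

BLOCKS: Blocks = {
    "fetch_mask": {"in": ["input_mask_file"], "out": ["land_water_mask"]},
    "load_data": {"in": ["input_data_file"], "out": ["NEE_data"]},
    "standardize_with_mask": {"in": ["NEE_data", "land_water_mask"],
                              "out": ["standardized_NEE_data"]},
    "simple_diagnose": {"in": ["standardized_NEE_data"], "out": ["result_NEE_pdf"]},
}


def process_view(blocks: Blocks) -> Set[Edge]:
    """Step -> step whenever one step writes data that another reads."""
    return {(a, b) for a, ab in blocks.items() for b, bb in blocks.items()
            if a != b and set(ab["out"]) & set(bb["in"])}


def data_view(blocks: Blocks) -> Set[Edge]:
    """Data -> data whenever some step reads the first and writes the second."""
    return {(x, y) for b in blocks.values() for x in b["in"] for y in b["out"]}


def combined_view(blocks: Blocks) -> Set[Edge]:
    """Bipartite graph: data -> step for inputs, step -> data for outputs."""
    edges: Set[Edge] = set()
    for step, b in blocks.items():
        edges |= {(x, step) for x in b["in"]}
        edges |= {(step, y) for y in b["out"]}
    return edges


def to_dot(edges: Set[Edge], name: str) -> str:
    """Render an edge set as Graphviz DOT text."""
    lines = [f"digraph {name} {{"]
    lines += [f'  "{a}" -> "{b}";' for a, b in sorted(edges)]
    lines.append("}")
    return "\n".join(lines)


print(to_dot(process_view(BLOCKS), "process"))
```

Only `process_view` and `data_view` lose information: the combined view keeps both steps and data, which is why the other two can be computed from it.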
Multi-Scale Synthesis and Terrestrial Model Intercomparison Project (MsTMIP)
[Figure: YesWorkflow graph of a drought-analysis script. Inputs: input_drough_variable, input_effect_variable. Steps: fetch_drought_variable, fetch_effect_variable, convert_effect_variable_units, create_land_water_mask, init_data_variables, define_droughts, detrend_deseasonalize_effect_variable, calculate_data_variables, export_recovery_time_figure, export_drought_value_variable_figure, export_predrought_effect_variable_figure, export_drought_number_variable_figure. Other data items include drought_variable_1, effect_variable_1 to effect_variable_3, land_water_mask, sigma_dv_event and month_dv_length. Outputs: output_recovery_time_figure, output_drought_value_variable_figure, output_predrought_effect_variable_figure, output_drought_number_figure.]
Christopher Schwalm, Yaxing Wei
Figure 4: Process workflow view of an Affymetrix analysis script (in R).
4 YesWorkflow Examples
In the following we show YesWorkflow views extracted from real-world scientific use cases. The scripts were annotated with YW tags by scientists and script authors, using a very modest training and mark-up effort.¹ Due to lack of space, the actual MATLAB and R scripts with their YW markup are not included here. However, they are all available from the yw-idcc-15 repository on the YW GitHub site [Yes15].
4.1 Analysis of Gene Expression Microarray Data
Bioinformatics workflows commonly possess a pattern of large numbers of incoming parameters and outputs at each stage of computation. In addition, analysis of even a single bioinformatics dataset tends to yield a large number of different output files. Hence, bioinformatics pipelines are attractive candidates for workflow systems, which can capture this complexity [Bie12]. Figure 4 shows a YesWorkflow representation of an R script performing a classic, complex bioinformatics task: analysis of Affymetrix gene expression microarray data. This R script was modeled on our previous workflows developed in the Kepler environment [SMLB12]. The script analyzes experiment designs consisting of two conditions (e.g., microarrays from control-treated cells vs microarrays from drug-treated cells) with multiple replicates in each condition. The R script employs a set of standard BioConductor [GCB+04] packages mixed with custom programming. The workflow consists of four fundamental tasks: normalization of data across microarray datasets (Normalize), selection of differentially expressed genes (DEGs) between conditions (SelectDEGs), determination of gene ontology (GO) statistics for the resulting datasets (GO Analysis), and creation of a heatmap of the differentially expressed genes (MakeHeatmap). Each module produces outputs, and each module (aside from MakeHeatmap) requires external parameter inputs. Importantly, this graphical representation clearly indicates the dependence of each module on datasets and parameter inputs. This example demonstrates that YesWorkflow can provide informative visualizations of bioinformatics workflows, especially workflows involving large numbers of inputs and outputs.
¹ For all of these scripts, learning the YW model and annotating the scripts was done in a few hours.
Gene Expression Microarray Data Analysis
• [Normalize] – Normalization of data across microarray datasets
• [SelectDEGs] – Selection of differentially expressed genes between conditions
• [GO Analysis] – Determination of gene ontology statistics for the resulting datasets
• [MakeHeatmap] – Creation of a heatmap of the differentially expressed genes
Tyler Kolisnik, Mark Bieda
Data collection workflow (X-ray diffraction)
run/
├── raw
│ └── q55
│ ├── DRT240
│ │ ├── e10000
│ │ │ ├── image_001.raw
... ... ... ...
│ │ │ └── image_037.raw
│ │ └── e11000
│ │ ├── image_001.raw
... ... ...
│ │ └── image_037.raw
│ └── DRT322
│ ├── e10000
│ │ ├── image_001.raw
... ... ...
│ │ └── image_030.raw
│ └── e11000
│ ├── image_001.raw
... ...
│ └── image_030.raw
├── data
│ ├── DRT240
│ │ ├── DRT240_10000eV_001.img
... ... ...
│ │ └── DRT240_11000eV_037.img
│ └── DRT322
│ ├── DRT322_10000eV_001.img
... ...
│ └── DRT322_11000eV_030.img
│
├── collected_images.csv
├── rejected_samples.txt
└── run_log.txt
YW-RECON: Prospective & Retrospective Provenance … (almost) for free!
• URI templates link conceptual entities to runtime provenance "left behind" by the script author…
• … facilitating provenance reconstruction
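How a URI template can recover its variables from the files a run leaves on disk can be shown with a minimal Python sketch. `template_to_regex` and `match_uri` are hypothetical helpers, not YW-recon's implementation. Each `{variable}` is assumed to match text without a slash, a variable used twice must match the same text both times, and paths are matched exactly as the templates are written.

```python
import re
from typing import Dict, Optional


def template_to_regex(template: str) -> "re.Pattern[str]":
    """Compile a URI template such as 'run/raw/{cassette_id}/...' into a regex.

    Each {variable} becomes a named group matching text without '/'; a variable
    that occurs twice must match the same text both times (back-reference).
    """
    pattern, seen = "", set()
    for i, part in enumerate(re.split(r"\{(\w+)\}", template)):
        if i % 2 == 0:
            pattern += re.escape(part)         # literal text between variables
        elif part in seen:
            pattern += f"(?P={part})"          # repeated variable: same value
        else:
            seen.add(part)
            pattern += f"(?P<{part}>[^/]+)"    # first occurrence: capture it
    return re.compile(pattern)


def match_uri(template: str, path: str) -> Optional[Dict[str, str]]:
    """Return the template variables bound by `path`, or None if it does not match."""
    m = template_to_regex(template).fullmatch(path)
    return m.groupdict() if m else None


RAW = "run/raw/{cassette_id}/{sample_id}/e{energy}/image_{frame_number}.raw"
CORRECTED = "data/{sample_id}/{sample_id}_{energy}eV_{frame_number}.img"

print(match_uri(RAW, "run/raw/q55/DRT240/e10000/image_001.raw"))
print(match_uri(CORRECTED, "data/DRT240/DRT240_10000eV_001.img"))
```

Because both templates share `sample_id`, `energy` and `frame_number`, the bound values are what link a corrected image back to the raw image it was derived from.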
Q1: What samples did the script run collect images from?
Q2: What energies were used for image collection from sample DRT322?
Q3: Where is the raw image of the corrected image DRT322_11000eV_030.img?
Q5: What cassette_id had the sample leading to DRT240_10000eV_001.img?
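Questions Q1, Q2, Q3 and Q5 can be answered from file names alone once those names are matched against the raw_image and corrected_image templates. The Python sketch below illustrates the idea and is not YW-recon's query mechanism. It uses hand-written regular expressions equivalent to the two templates (prefixed with `run/` as in the directory listing), only the file names actually visible in that listing, and a join on the shared variables `sample_id`, `energy` and `frame_number`; `records` and `raw_for` are hypothetical helpers.

```python
import re
from typing import Dict, List, Optional, Set

# Hand-written equivalents of the raw_image and corrected_image URI templates.
RAW_RX = re.compile(r"run/raw/(?P<cassette_id>[^/]+)/(?P<sample_id>[^/]+)"
                    r"/e(?P<energy>\d+)/image_(?P<frame_number>\d+)\.raw")
COR_RX = re.compile(r"run/data/(?P<sample_id>[^/]+)/(?P=sample_id)"
                    r"_(?P<energy>\d+)eV_(?P<frame_number>\d+)\.img")

# Only the file names visible in the listing (elided entries are omitted).
FILES = [
    "run/raw/q55/DRT240/e10000/image_001.raw", "run/raw/q55/DRT240/e10000/image_037.raw",
    "run/raw/q55/DRT240/e11000/image_001.raw", "run/raw/q55/DRT240/e11000/image_037.raw",
    "run/raw/q55/DRT322/e10000/image_001.raw", "run/raw/q55/DRT322/e10000/image_030.raw",
    "run/raw/q55/DRT322/e11000/image_001.raw", "run/raw/q55/DRT322/e11000/image_030.raw",
    "run/data/DRT240/DRT240_10000eV_001.img", "run/data/DRT240/DRT240_11000eV_037.img",
    "run/data/DRT322/DRT322_10000eV_001.img", "run/data/DRT322/DRT322_11000eV_030.img",
]


def records(rx: "re.Pattern[str]", files: List[str]) -> List[Dict[str, str]]:
    """One record per matching file: the template variables plus the path."""
    out = []
    for path in files:
        m = rx.fullmatch(path)
        if m:
            out.append({**m.groupdict(), "path": path})
    return out


raw, corrected = records(RAW_RX, FILES), records(COR_RX, FILES)
KEY = ("sample_id", "energy", "frame_number")  # variables shared by both templates


def raw_for(corrected_name: str) -> Optional[Dict[str, str]]:
    """Join a corrected image to its raw image on the shared template variables."""
    for c in corrected:
        if c["path"].endswith("/" + corrected_name):
            for r in raw:
                if all(r[k] == c[k] for k in KEY):
                    return r
    return None


q1: Set[str] = {r["sample_id"] for r in raw}
q2: Set[str] = {r["energy"] for r in raw if r["sample_id"] == "DRT322"}
q3 = raw_for("DRT322_11000eV_030.img")
q5 = raw_for("DRT240_10000eV_001.img")

print("Q1:", sorted(q1))
print("Q2:", sorted(q2))
print("Q3:", q3["path"] if q3 else None)
print("Q5:", q5["cassette_id"] if q5 else None)
```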
Meme-Pad & Conclusions
• Scientific workflows (ASAP)
  – Data-oriented vs process-oriented modeling
    • dataflow models vs control-flow oriented models
    • conceptual models vs execution models
    => What models and tools for which kinds of workflows?
  – Data provenance in workflows
    • Prospective provenance
    • Retrospective provenance
• YesWorkflow:
  – Scripts are (can be) workflows, too!
• The Quest: so many models, so little time (contrast w/ databases)
  – Empowering scientists…
  – …with useful tools based on useful models
  – …meeting them where they already are
  – …provenance is key
  – One size might not fit all: Beware of the Turing tar-pit!
YesWorkflow References
• http://yesworkflow.org
• T. McPhillips, S. Bowers, K. Belhajjame, B. Ludäscher (2015). Retrospective Provenance Without a Runtime Provenance Recorder. 7th USENIX Workshop on the Theory and Practice of Provenance (TaPP'15).
• T. McPhillips, T. Song, T. Kolisnik, S. Aulenbach, K. Belhajjame, R. K. Bocinsky, Y. Cao, J. Cheney, F. Chirigati, S. Dey, J. Freire, C. Jones, J. Hanken, K. W. Kintigh, T. A. Kohler, D. Koop, J. A. Macklin, P. Missier, M. Schildhauer, C. Schwalm, Y. Wei, M. Bieda, B. Ludäscher (2015). YesWorkflow: A User-Oriented, Language-Independent Tool for Recovering Workflow Information from Scripts. International Journal of Digital Curation 10, 298-313.
• J. F. Pimentel, S. Dey, T. McPhillips, K. Belhajjame, D. Koop, L. Murta, V. Braganholo, B. Ludäscher (2016). Yin & Yang: Demonstrating Complementary Provenance from noWorkflow & YesWorkflow (demo paper). Provenance and Annotation of Data and Processes: 6th Intl. Provenance and Annotation Workshop (IPAW), June 7-8, 2016, McLean, VA.