Applications and Requirements for Scientific Workflow May 1 2006 NSF Geoffrey Fox Indiana University

Applications and Requirements for Scientific Workflow

May 1 2006, NSF

Geoffrey Fox, Indiana University

Team Members

• Geoffrey Fox, Indiana University (lead)
• Mark Ellisman, UCSD
• Constantinos Evangelinos, Massachusetts Institute of Technology
• Alexander Gray, Georgia Tech
• Walt Scacchi, University of California, Irvine
• Ashish Sharma, Ohio State University
• Alex Szalay, Johns Hopkins University

The Application Drivers

• Phrase as transformative research: does term “scientific workflow” conjure up the innovative future or perhaps a bureaucratic past?
– Distributed interdisciplinary data deluged scientific methodology as an end (instrument, conjecture) to end (paper, Nobel prize) process is a Transformative approach
– Provide CS support for this scientific revolution
• Ground these in scenarios or in application descriptions that lead to these requirements
– Spans all NSF directorates
– Astronomy (multi-wavelength VO), Biology (Genomics/Proteomics), Chemistry (Drug Discovery), Environmental Science (multi-sensor monitors as in NEON), Engineering (NEES, multi-disciplinary design), Geoscience (Ocean/Weather/Earth(quake) data assimilation), Medicine (multi-modal/instrument imaging), Physics (LHC, Material design), Social science (Critical Infrastructure simulations for DHS) etc.

Complex scientific methodology produces modern scientific results

http://antwrp.gsfc.nasa.gov/apod/ap060118.html

CHANDRA X-ray observatory data processing workflow streams

What has changed?

• Exponential growth in Compute (18), Sensors (18?), Data storage (12), Network (8) (doubling time in months); performance variable in practice (last mile for networks)
• Data deluge (ignored largely in grand challenges, HPCC 1990-2000)
• Algorithms (simulation, data analysis) comparable additional improvements
• Growth of Science not same exponential? (remains linear? More interdisciplinary)
• Distributed scientists and distributed shared data (not uniform in all fields)
• Establishes distributed data deluged scientific methodology
• Improve community practices that have not kept pace with changed methodology (e.g. DICOM standard in medical imaging only supports partial provenance, standalone applications don’t support distribution)
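To make the doubling times above concrete, the short sketch below converts each one into a multiplicative growth factor over a fixed period. The doubling times are the ones quoted on the slide (the sensor figure is marked uncertain there); the five-year horizon and the names `DOUBLING_MONTHS` and `growth_factor` are assumptions chosen only for illustration.

```python
# Doubling times in months, as quoted on the slide.
# The sensor value carries a question mark in the original.
DOUBLING_MONTHS = {
    "compute": 18,
    "sensors": 18,
    "data storage": 12,
    "network": 8,
}


def growth_factor(doubling_months: float, elapsed_months: float) -> float:
    """Return the multiplicative growth after elapsed_months of steady doubling."""
    return 2.0 ** (elapsed_months / doubling_months)


# Hypothetical five-year horizon, used only to compare the rates.
HORIZON_MONTHS = 60

for name, months in DOUBLING_MONTHS.items():
    factor = growth_factor(months, HORIZON_MONTHS)
    print(f"{name}: x{factor:.1f} over {HORIZON_MONTHS} months")
```

Under these assumptions, five years gives roughly a 10-fold gain in compute, a 32-fold gain in storage and about a 181-fold gain in nominal network capacity. The slide's caveat still applies: delivered performance varies in practice.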

Application Requirements I

• Reproducibility core to Scientific method and requires rich provenance, interoperable persistent repositories with linkage of open data and publication as well as distributed simulations, data analysis and new algorithms.
• Distributed Science Methodology captures and publishes all steps (a rich cloud of resources including emails, Wikis as new electronic log books as well as databases, compiler options …) in scientific process (data analysis) in a fashion that allows process to be reproducible; need to be able to electronically reference steps in process.
– Traditional workflow like BPEL only describes a small part of this
• Multiple collaborative heterogeneous interdisciplinary approaches to all aspects of the distributed science methodology inevitable; need research on integration of this diversity
• Multiple “ibilities” (security, reliability, usability, scalability)
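As one possible reading of the requirement to "electronically reference steps in process", the sketch below records each analysis step as a small structured record and derives a stable identifier from its content, so that later steps can cite earlier ones. This is a hypothetical illustration, not a design proposed in the slides: the `ProvenanceStep` class, its fields, the example tool names and the use of a truncated SHA-256 digest are all assumptions.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class ProvenanceStep:
    """One step of an analysis: what ran, with which settings, on which inputs."""

    tool: str
    parameters: Dict[str, str]
    inputs: List[str] = field(default_factory=list)  # IDs of datasets or earlier steps
    notes: str = ""  # free-text log-book entry, e.g. a wiki or email pointer

    def step_id(self) -> str:
        """Derive a stable identifier from the step's content."""
        canonical = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical two-step process: the second step references the first by ID.
calibrate = ProvenanceStep(
    tool="calibrate",
    parameters={"compiler_flags": "-O2"},
    inputs=["raw-dataset-001"],
)
fit = ProvenanceStep(
    tool="fit_model",
    parameters={"tolerance": "1e-6"},
    inputs=[calibrate.step_id()],
)

print(calibrate.step_id(), "->", fit.step_id())
```

Because the identifier is computed from the record itself, repeating a step with identical settings yields the same reference, while changing any parameter (for example a compiler option) yields a different one. A real system would also need the persistent repositories and metadata standards the slides call for.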

Application Requirements II

• Cope with inevitable inconsistencies and inadequacies of metadata standards; support for curation, data validation and “scrubbing” in algorithms and provenance; reputation and trust systems for data providers
• As we scale size and richness of data and algorithms, need a scalable methodology that hides complexity (compatible with number of scientists increasing slowly); must be simple and validatable
• Automate efficient provisioning, deployment and provenance generation of complex simulations and data analysis; support deployment and interoperable specification of user’s abstract workflow; support interactive user
• Support automated and innovative individual contributions to core “black boxes” (produced by “marine corps” for “common case”) and for general user’s actions such as choice and annotation.