A talk given at a workshop in Atlanta on "Building an Integrated MGI Accelerator Network": see http://acceleratornetwork.org/event/building-an-integrated-mgi-accelerator-network/. The US Materials Genome Initiative seeks to develop an infrastructure that will accelerate advanced materials development and deployment. The term Materials Genome suggests a science that is fundamentally driven by the systematic capture of large quantities of elemental data. In practice, as we know, things are more complex, in materials as in biology. Nevertheless, the ability to locate and reuse data is often essential to research progress. I discuss here three aspects of networking materials data: data publication and discovery; linking instruments, computations, and people to enable new research modalities based on near-real-time processing; and organizing data generation, transformation, and analysis software to facilitate understanding and reuse. I use these three problems to motivate a discussion of recent results in cloud computing, data publication management, high-performance computing, and related topics.
computationinstitute.org Networking materials data Ian Foster
ianfoster.org
Materials Innovation Infrastructure: a data sharing system to facilitate use of a broader set of data to render more accurate models; multi-disciplinary communication among scientists and engineers working on different stages of materials development; searches for advanced materials with specific, desired properties; and curating and sharing of reliable computational code for modeling and simulation. Credit: Meredith Drosback, OSTP.
Diagram: Computation, Data, Experiment.
Network materials data, or chaotic deluge? It's both. We must manage the data deluge, both to enhance user productivity and to increase data capture. (Video: Wellington bucket fountain, https://www.youtube.com/watch?v=_p_FNNDu16w)
Linking simulation and experiment to study disordered structures
Diffuse scattering images from Ray Osborn et al., Argonne
Diagram: sample; experimental scattering; material composition (La 60%, Sr 40%); simulated structure; simulated scattering; detect errors (secs to mins); select experiments (mins to hours); simulations driven by experiments (mins to days); evolutionary optimization; knowledge base (past experiments, simulations, literature, expert knowledge); contribute to knowledge base; knowledge-driven decision making.
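Evolutionary optimization is one of the ways the diagram closes the loop between simulated and experimental scattering. As an illustration only, the sketch below runs a (mu + lambda) evolution strategy that adjusts the parameters of a toy model until its simulated pattern matches a synthetic "experimental" one. The function simulate_scattering is a hypothetical stand-in (a sum of Gaussian peaks), not a real diffuse-scattering code, and the population sizes, mutation width, and peak positions are arbitrary assumptions.

```python
import math
import random

def simulate_scattering(params, qs):
    # Hypothetical stand-in for a real diffuse-scattering simulation:
    # each parameter is the centre of one Gaussian peak.
    return [sum(math.exp(-(q - c) ** 2) for c in params) for q in qs]

def misfit(params, qs, observed):
    # Sum of squared differences between simulated and observed intensities.
    simulated = simulate_scattering(params, qs)
    return sum((s - o) ** 2 for s, o in zip(simulated, observed))

def evolve(qs, observed, n_params=2, mu=5, lam=30, sigma=0.5,
           generations=100, seed=0):
    """(mu + lambda) evolution strategy with Gaussian mutation.

    Parents and offspring compete together, so the best misfit can never
    get worse from one generation to the next (elitism)."""
    rng = random.Random(seed)

    def score(p):
        return misfit(p, qs, observed)

    population = sorted(
        ([rng.uniform(0.0, 10.0) for _ in range(n_params)] for _ in range(mu)),
        key=score)
    history = [score(population[0])]
    for _ in range(generations):
        # Each child is a randomly chosen parent plus Gaussian noise.
        offspring = [[x + rng.gauss(0.0, sigma) for x in rng.choice(population)]
                     for _ in range(lam)]
        population = sorted(population + offspring, key=score)[:mu]
        history.append(score(population[0]))
    return population[0], history

if __name__ == "__main__":
    qs = [i * 0.1 for i in range(100)]
    observed = simulate_scattering([3.0, 6.5], qs)  # synthetic "experiment"
    best, history = evolve(qs, observed)
    print(sorted(round(c, 2) for c in best), history[0], history[-1])
```

In a real campaign each misfit evaluation would be an expensive simulation, which is why the slide pairs this step with timescales of minutes to days.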
An expensive business: network engineer, parallel programmer, software engineer, database architect, database manager, software engineer, data engineer, parallel programmer, postdoc, postdoc.
A small business, 20 years ago: secretary, HR manager, marketing, database manager, accountant, IT department, personal assistant, shipping department, intern, payroll.
A small business, today: business cloud. Reduce costs; speed innovation; reliable, scalable, simple.
Can we do the same for research? Discovery cloud: reduce costs; speed discovery; reliable, scalable, simple?
Discovery cloud: Globus research data management services (www.globus.org): file transfer & sharing.
Linking simulation and experiment to study disordered structures: Globus transfer service. Cloud hosted: reliable, secure, fast. 20K users, 3B files, 50 PB transferred. Available at www.globus.org.
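The transfer service is described as reliable, secure, and fast. The sketch below does not use the Globus API; it only illustrates the general pattern that "reliable" usually implies: verify every copy with a checksum and retry on failure. The names reliable_copy and sha256_of, and the retry and back-off parameters, are hypothetical choices for this example.

```python
import hashlib
import shutil
import time

def sha256_of(path, chunk_size=1 << 20):
    # Stream the file in chunks so large files need not fit in memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def reliable_copy(src, dst, max_attempts=3, backoff_s=0.0,
                  copier=shutil.copyfile):
    """Copy src to dst, verify the copy by checksum, and retry on failure.

    Returns the number of attempts used; raises RuntimeError if all fail.
    `copier` is injectable so a flaky link can be simulated in tests."""
    expected = sha256_of(src)
    for attempt in range(1, max_attempts + 1):
        try:
            copier(src, dst)
            if sha256_of(dst) == expected:
                return attempt
        except OSError:
            pass  # treat I/O errors like a checksum mismatch and retry
        if attempt < max_attempts:
            time.sleep(backoff_s * attempt)  # linear back-off between tries
    raise RuntimeError(f"transfer of {src} failed after {max_attempts} attempts")
```

A production service would also checkpoint partial progress and move many files concurrently; this sketch covers only the verify-and-retry step for a single file.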
Discovery cloud: Globus research data management services (www.globus.org): file transfer & sharing; identity & group management.
Linking simulation and experiment to study disordered structures: Globus sharing; identities, groups, profiles. Cloud hosted.
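Sharing combined with identities and groups means that access can be granted to a whole collaboration rather than to each person individually. The sketch below is a hypothetical access-control model, not the Globus sharing API: a share holds an access-control list whose entries name either an identity or a group, and a user's effective permissions are the union of direct and group grants. The principal naming scheme ("identity:..." and "group:...") and the permission letters are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Share:
    """Hypothetical shared folder with an access-control list.

    Each ACL key is a principal, "identity:<id>" or "group:<name>",
    mapped to a set of permissions such as {"r"} or {"r", "w"}."""
    path: str
    acl: dict[str, set[str]] = field(default_factory=dict)

def effective_permissions(share, identity, groups_of):
    # Start from permissions granted to the identity directly ...
    perms = set(share.acl.get(f"identity:{identity}", set()))
    # ... then add everything granted to any group it belongs to.
    for group in groups_of.get(identity, ()):
        perms |= share.acl.get(f"group:{group}", set())
    return perms

def can(share, identity, groups_of, permission):
    return permission in effective_permissions(share, identity, groups_of)
```

Granting to a group keeps the ACL short and lets group membership change without touching each share.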
Discovery cloud: Globus research data management services (www.globus.org): file transfer & sharing; data publication & discovery; identity & group management.
Linking simulation and experiment to study disordered structures: Globus data publication and discovery, cloud hosted. Related diagram elements: knowledge base (past experiments, simulations, literature, expert knowledge); contribute to knowledge base; knowledge-driven decision making.
Data publication and discovery. We are looking for pilot users!
Diagram: a community; a collection governed by policies (metadata, access control, license, storage, curation workflow); datasets within the collection, each comprising metadata and data.
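The diagram describes a containment hierarchy: communities hold collections, each collection carries its own policies, and each dataset pairs metadata with data. A minimal sketch of that hierarchy, assuming the collection's metadata policy is simply a list of required fields (the field names, default values, and enforcement rule are all illustrative, not the service's actual schema):

```python
from dataclasses import dataclass, field

@dataclass
class Policies:
    # Per-collection policies named in the diagram; values are illustrative.
    required_metadata: tuple[str, ...]
    access_control: str = "public"
    license: str = "unspecified"
    storage: str = "default-endpoint"
    curation_workflow: tuple[str, ...] = ("curator-review",)

@dataclass
class Dataset:
    metadata: dict
    files: list  # references to the data itself

@dataclass
class Collection:
    name: str
    policies: Policies
    datasets: list = field(default_factory=list)

    def add(self, dataset):
        # Enforce the collection's metadata policy before accepting a dataset.
        missing = [k for k in self.policies.required_metadata
                   if k not in dataset.metadata]
        if missing:
            raise ValueError(f"missing required metadata: {missing}")
        self.datasets.append(dataset)

@dataclass
class Community:
    name: str
    collections: dict = field(default_factory=dict)
```

Attaching policies to the collection rather than to each dataset lets a community set rules once and have every submission inherit them.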
Publish dashboard
Start a new submission
Describe submission: 1) Dublin Core
Describe submission: 2) Science metadata
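Submissions are described in two parts: generic Dublin Core elements and domain-specific science metadata. The sketch below assembles such a record and checks that the generic part uses only the fifteen elements of the Dublin Core Metadata Element Set 1.1. Requiring title and creator, and the hypothetical "composition" science field, are assumptions for illustration, not the service's actual rules.

```python
import json

# The fifteen elements of the Dublin Core Metadata Element Set 1.1.
DUBLIN_CORE = {
    "contributor", "coverage", "creator", "date", "description", "format",
    "identifier", "language", "publisher", "relation", "rights", "source",
    "subject", "title", "type",
}

def build_submission(dc, science, required_science=("composition",)):
    """Combine Dublin Core and science metadata into one record.

    The title/creator requirement and `required_science` are
    hypothetical per-collection rules."""
    unknown = set(dc) - DUBLIN_CORE
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    if "title" not in dc or "creator" not in dc:
        raise ValueError("title and creator are required")
    missing = [k for k in required_science if k not in science]
    if missing:
        raise ValueError(f"missing science metadata: {missing}")
    return {"dublin_core": dict(dc), "science": dict(science)}

if __name__ == "__main__":
    record = build_submission(
        {"title": "Diffuse scattering, La-Sr sample", "creator": "Example, A."},
        {"composition": {"La": 0.6, "Sr": 0.4}},
    )
    print(json.dumps(record, indent=2, sort_keys=True))
```

Keeping the two parts separate lets generic discovery tools read the Dublin Core block while domain tools read the science block.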
Assemble the dataset
Transfer files to submission endpoint
Check dataset is assembled correctly
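One common way to check that a dataset was assembled correctly after transfer is to compare it against a checksum manifest. The sketch below assumes that approach (the slides do not say how the check is done): it builds a manifest of relative path to SHA-256 and reports missing, unexpected, and corrupted files. The function names are hypothetical.

```python
import hashlib
from pathlib import Path

def build_manifest(root):
    """Map each file's path (relative to root, POSIX style) to its SHA-256."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # read_bytes() loads the whole file; fine for a sketch.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest

def check_assembly(root, expected):
    """Compare the files under root with an expected manifest."""
    actual = build_manifest(root)
    return {
        "missing": sorted(set(expected) - set(actual)),
        "unexpected": sorted(set(actual) - set(expected)),
        "corrupted": sorted(k for k in set(expected) & set(actual)
                            if expected[k] != actual[k]),
    }
```

The submitter would build the manifest at the source, and the check would run at the submission endpoint once the transfer completes.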
Submission now in curation workflow
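The walkthrough moves a submission through a fixed sequence of steps before it enters curation. A minimal state-machine sketch of that lifecycle follows. The state names are taken from the slides, but the allowed transitions, including a curator sending a submission back for revision, are assumptions for illustration.

```python
# Hypothetical submission states, named after the walkthrough steps.
# "assembled" covers assembling the dataset and transferring its files.
TRANSITIONS = {
    "started": {"described"},
    "described": {"assembled"},
    "assembled": {"checked"},
    "checked": {"in_curation"},
    "in_curation": {"published", "described"},  # assumed: curator may send it back
    "published": set(),
}

class Submission:
    def __init__(self):
        self.state = "started"
        self.history = ["started"]

    def advance(self, new_state):
        # Reject any step the workflow does not allow from the current state.
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state} to {new_state}")
        self.state = new_state
        self.history.append(new_state)
```

Recording the history gives curators an audit trail of how each submission reached its current state.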
Search published datasets
Search across collections
Discover a published dataset
Select a published dataset
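Searching across collections amounts to querying one index that spans every collection's metadata, optionally filtered back down to a few collections. The sketch below is a hypothetical in-memory inverted index with AND semantics over lower-cased terms. A hosted service would use a full search engine; the tokenization rule and class name here are assumptions.

```python
import re
from collections import defaultdict

def tokens(text):
    # Lower-case and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())

class MetadataIndex:
    """Tiny inverted index over dataset metadata, spanning many collections."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> {(collection, dataset_id)}

    def add(self, collection, dataset_id, metadata):
        for value in metadata.values():
            for term in tokens(str(value)):
                self.postings[term].add((collection, dataset_id))

    def search(self, query, collections=None):
        terms = tokens(query)
        if not terms:
            return []
        # A dataset matches only if it contains every query term.
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        if collections is not None:
            hits = {h for h in hits if h[0] in collections}
        return sorted(hits)
```

Passing collections=None searches everywhere, which corresponds to the "search across collections" view.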
Discovery cloud: Globus research data management services (www.globus.org): file transfer & sharing; data publication & discovery; simulation & data analysis; identity & group management.
Linking simulation and experiment to study disordered structures
Justin Wozniak et al.
Simulation and data analysis: point-and-click parallelism. Tool shed: simulation models & analysis tools. Data space: local and remote datasets. Workflows: link data and tools in reusable form. Capture domain knowledge, data and code: reusable workflows encode commonly used modeling and analysis pipelines. Builds on the widely used Galaxy, Globus, and Swift systems (galaxyproject.org, globus.org, swift-lang.org). Large simulation campaigns. Hosted on Amazon cloud for reliable, on-demand access and scalability.
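"Point-and-click parallelism" for large simulation campaigns usually comes down to a parallel map over many parameter sets followed by an analysis step. The sketch below shows that shape with Python's standard concurrent.futures. It is not how Galaxy or Swift are implemented. make_workflow is a hypothetical helper, and the toy simulate and analyze functions stand in for real tools.

```python
from concurrent.futures import ThreadPoolExecutor

def make_workflow(simulate, analyze, max_workers=4):
    """Build a reusable two-stage pipeline (hypothetical helper).

    Stage 1 runs `simulate` on every parameter set concurrently, like a
    parallel map over a simulation campaign; stage 2 reduces the results
    with `analyze`. Results keep the order of the inputs."""
    def run(param_sets):
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            results = list(pool.map(simulate, param_sets))
        return analyze(results)
    return run

if __name__ == "__main__":
    # Toy "simulation" and "analysis", for illustration only.
    sweep = make_workflow(simulate=lambda x: x * x, analyze=max)
    print(sweep(range(10)))  # 81
```

Threads suit steps that launch external simulation codes or wait on I/O; for CPU-bound Python functions, ProcessPoolExecutor from the same module would be the natural swap. Because the workflow is a plain function, it can be reused across campaigns, which is the point of encoding common pipelines once.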
Discovery cloud: three common themes
1) Accelerate discovery via automation
2) Slash the costs of trying new methods: no local software installation; no need to read the manual; on-demand, elastic scalability; low operational costs, proactive support
3) Make data preservation trivial
Take-away messages: Data has a dual nature: rare treasure and chaotic deluge. MGI must embrace this duality. Treasure: store, curate, index, preserve. Deluge: slash management costs, both to accelerate use and to facilitate data preservation. Cloud services can help in both areas.
Thanks to great colleagues and collaborators: Rachana Ananthakrishnan, Ben Blaiszik, Kyle Chard, Raj Kettimuthu, Ravi Madduri, Tanu Malik, Steve Tuecke, Justin Wozniak, and other CS colleagues; Ray Osborn, Francesco de Carlo, Chris Jacobsen, Nicola Ferrier, and other Argonne scientists; Juan de Pablo, Peter Voorhees, and other NIST CHiMaD participants.
Thank you to our sponsors!