Data- and Compute-Driven Transformation of Modern Science How e-Infrastructure & Policy Support...
If you can't read please download the document
Data- and Compute-Driven Transformation of Modern Science How e-Infrastructure & Policy Support Paradigm Shifts in Research Edward Seidel Senior Vice President,
Data- and Compute-Driven Transformation of Modern Science How
e-Infrastructure & Policy Support Paradigm Shifts in Research
Edward Seidel Senior Vice President, Research and Innovation
Skolkovo Institute of Science and Technology 1
Slide 2
2 Part 1: We are in a period of unprecedented change in Science
and Societythe crises & opportunities this creates
Slide 3
3 3 Profound Transformation of Science Collision of Two Black
Holes 1972: Hawking. 1 person, no computer 50 KB 1994: 10 people,
NCSA Cray Y-MP, 50MB 1998: 15 people, NCSA Origin, 50GB
Slide 4
Community Einstein Toolkit Einstein Toolkit : open software for
astrophysics to enable new science, facilitate interdisciplinary
research and use emerging petascale computers and advanced CI.
Consortium: 92 members, 46 sites, 15 countries Whole consortium
engaged in directions, support, development Simulation: Luciano
Rezzolla, Max Planck Institut fr Gravitationsphysik (AEI) 4
Community + software + algorithms + hardware + Many groups can do
this: field explodes! Major triumph of Computational
Science---solve EEs!
Slide 5
New Frontiers: Relativistic Matter Nuclear equations of state
Collab. with astrophysicists General relativistic
magnetohydrodynamics Some groups have ideal MHD Radiation Transport
(neutrinos/photons) Expensive and complicated! Requires
opacities/emissivities Chemical reactions (thermonuclear, chemical)
SN community! Computation: Multiphysics!! GRMHD: petascale problem
Radiation transport beyond this Schnetter et al, PetaScale
Computing: Algorithms and Applications, 2007 Zoom in to just this
part: post BH formation and evolution of jet Zoom in to just this
part: post BH formation and evolution of jet
Slide 6
Schnetter, et al Post BH Formation/Evolution of Jet
Multiphysics GR, Neutrinos, MHD, Nuclear EOS Computer Science 10
level AMR, optimized for Blue Waters Science 200s evolution,
analyze it all Blue Waters 6M Pflop@1PF sustained = 70days! 6
Multiphyiscs framework needed for fluids in astrophysics and porous
media Rezzolla, et al
Slide 7
7 Theory and Observation of Universe Gravitational Waves!
Complex problems in relativistic astrophysics Relativity,
hydrodynamics, nuclear physics, radiation, neutrinos, magnetic
fields: globally distributed collab! Observe (PB), compute (PF)
signals Gravity and general relativity are transformed 4 centuries
of small science, small data culture 2-3 decades of radical change
in both data (factors of 1000 per~5 years) and collaboration
LIGO/VIRGO/GEO New era of science after a century! Data- and
compute- dominated gravitational wave astronomy!
Slide 8
US Council on Competitiveness Ping Golf Moved from workstation
to Cray, now make prototype only at last stage of design Too
effective: had to simulate less effective design! Proctor &
Gamble Pringles flying off manufacturing line causing significant
lost product and revenue. Using CFD codes that Boeing uses, airflow
over the Pringle modeled, design so Pringles did not lift off
8
Slide 9
9 Part 1a: The Growth of Data Data Tsunami Im still here But Im
your new baby big brother With millions of processors
Slide 10
Going Beyond a Community Transient & Multi-Messenger
Astronomy 10 New era: seeing events as they occur Here now ALMA,
EVLA in radio Ice Cube neutrinos On horizon 24-42m optical? LSST =
SDSS (40TB) every night! SKA = exabytes Simulations integrate all
physics Data-intensive = compute-intensive Astronomy 1500-2010 was
passive. No longer! Communities need to share data, software,
knowledge, in real time Will require integration across
disciplines, end-to- end
Slide 11
Big Data vs The Long Tail of Science Many Big Data projects are
special Tend to be highly organized, have singular sources of data,
professionally curated, a lot attention paid to them What about the
Long Tail (the other 99%)? Thousands of biologists sequencing
communities of organisms Thousands of chemist and materials
scientists developing a materials genome Millions of people
Tweeting Characteristics: Heterogeneous, perhaps hand generated Not
curated, reused, served, etc 11 How do we harness the power of this
long tail? News Flash! NYT 6/3/13: Drug side effects discovered by
mining web logs: paroxetine + pravastatin = high blood sugar!
Slide 12
12 Grand Challenge Communities Combine it All... Where is it
going to go? 12 Same CI useful for black holes, hurricanes
Slide 13
13 Grand Challenge Communities for Complex Problems Require
many disciplines, all scales of collaborations Individuals, groups,
teams, communities Multiscale Collaborations: Beyond teams Are
dynamic and highly multidisciplinary Time domain astronomy,
emergency forecasting, metagenomincs, materials genome Drive
sharing technologies and methodologies Researchers collaborate,
work by sharing data. Places requirements on eInfrastructre:
Software, networks, collaborative environments, data, sharing,
computing, etc Scientific culture, reproducibility, access,
university structures Publications. What is a modern publication?
13 Social, behavioral and economic sciences will be critical in
helping us understand these issues
Slide 14
Scenarios like this in all fields 14 NEON+GIS
Slide 15
Framing the Challenge: Science and Society Transformed by Data
Modern science Data- and compute- intensive Integrative, multiscale
4 centuries of constancy, 4 decades 10 9-12 change!
Multi-disciplinary Collaborations Individuals (Galileo!) Groups,
teams, Grand Challenge Communities Big Data + Long Tail Sea of Data
Age of Observation 15 We still think like this But such radical
change cannot be adequately addressed with (current) incremental
approach! Students take note!
Slide 16
Part 2: Crises, Challenges, Opportunities Computing Data
Software End-to-end Networks Organizational structures Education
No, we are not Cybe r Instruments & Facilities
Slide 17
17 Five Crises CDSE Community needs to address Computing
Technology Multicore: processor is new transistor Programming
model, fault tolerance, etc New models: clouds, grids, GPUs, Data,
provenance, and visualization How do we create data scientists?
What is an international data infrastructure? Software treated as
e-Infrastructure Complex applications on coupled compute-
data-networked environments, tools needed Modern apps: 10 6 +
lines, many groups contribute, take decades
Slide 18
18 Five Crises Organization for Multidisciplinary &
Computational Science Universities must significantly change
organizational structures: multidisciplinary & collaborative
research are needed [for US] to remain competitive in global
science Itself a discipline, computational science advances all
scienceinadequate/outmoded structures within Federal government and
the academy do not effectively support this critical
multidisciplinary field Education The CI environment is running
away from us! How do we develop a workforce to work effectively in
this world? How do universities transition?
Slide 19
Scientific Computing and Imaging Institute, University of Utah
Data Crisis: Information Big Bang PCAST Digital Data NSF Experts
Study Wired, Nature Storage Networking Industry Association (SNIA)
100 Year Archive Requirements Survey Report there is a pending
crisis in archiving we have to create long-term methods for
preserving information, for making it available for analysis in the
future. 80% respondents: >50 yrs; 68% > 100 yrs Industry
Slide 20
The Shift Towards a Sea of Data Implications Science &
society are now data-dominated Experiment, computation, theory US
mobile phone traffic exceeded 1 exabyte! Classes of data
Collections, observations, experiments, simulations Software
Publications Totally new methodologies Algorithms, mathematics,
culture Data become the medium for Multidisciplinarity,
communication, publication, science, economic development 20 How do
we attribute credit for this new publication form? How are data
peer reviewed? What is a publication in the modern data-rich world?
What is a business model for OA? Fundamental questions become
focused around data: What to curate, how to remove boundaries? How
to incentivize sharing? IP?
Slide 21
21 Part 2a: Recommendations
Slide 22
Software ACCI Task Force Reports Final recommendations
presented to the NSF Advisory Committee on Cyberinfrastructure Dec
2010 More than 25 workshops and Birds of a Feather sessions, 1300
people involved Final reports on-line Permanent programmatic
activities in Computational and Data-Enabled Science &
Engineering (CDS&E) should be established within NSF. Grand
Challenges Task Force NSF should establish processes to collect
community requirements and plan long-term software roadmaps.
Software Task Force Higher education should adopt criteria for
tenure and promotion that rewardthe production of digital artifacts
of scholarship. Such artifacts include widely used data sets,
scholarly services delivered online, and software. Campus Bridging
Task Force 22 Campus Bridging Data & Viz Grand Challenge HPC
Learning
Slide 23
Recommendation of NSF Advisory Committee on Cyberinfrastructure
ACCI "The National Science Foundation should create a program in
Computational and Data-Enabled Science and Engineering (CDS&E),
based in and coordinated by the NSF Office of Cyberinfrastructure.
The new program should be collaborative with relevant disciplinary
programs in other NSF directorates and offices." 23
Slide 24
24 Part 3: Universities attempt to respond We have to do all
this and revolutionize the state/national economy?
Slide 25
S koltech: Example of a 21 st Century University in the
Making
Slide 26
Skoltech at a Glance A unique Russian institution in
international context This decade: a community of 200 faculty, 300
post- docs, 1200 graduate students Focused on science, engineering
and technology Addressing problems and issues in IT, Energy,
Biomedicine, Space and Nuclear Interdisciplinary by design; no
departments 15 centers organized around complex problems With
strong programs in support of innovation and entrepreneurship
Creating a culture of innovation in every student, professor, staff
member Important part of the Skolkovo innovation ecosystem
Integrated data, compute, instrumentation infrastructure and policy
under development for Interdisciplinary research Accelerating
discovery Economic development Integrated data, compute,
instrumentation infrastructure and policy under development for
Interdisciplinary research Accelerating discovery Economic
development
Slide 27
27 Part 3a: You can help lead this revolution Kathryn Gray
Slide 28
Modern Research & Education Ecosystem 28 SoftwareSoftware
SoftwareSoftware Track 2 CampusCampusCampusCampusCampusCampus
CampusCampus CampusCampusCampusCampusCampusCampusCampusCampus
DataData DataData DataData DataData XSEDE Education Crisis: I need
all of this to start to solve my problem! Blue Waters
Slide 29
The Opportunity (US picture)! Now have emerging national
Integrated, High Performance Research Architecture Blue Waters and
beyond towards exascale: high end Extraordinary science continues
lead at cutting edge Traditional and novel large data applications
Few places can house, field, or drive such a facility XSEDE
architecture can connect Campus Bridging: campus to national CI
Campus Assets: MRI, Instruments, DNA sequencers Facilities:
Supercomputers, telescopes, accelerators, light sources, NEON More
silicon than Steel Networks: end-to-end connectivity Where are
those optical network apps? 29
Slide 30
Much to do to build CDSE on this Background: address the 5
Crises Education Many new opportunities and challenges CSE already
has its struggles Now data: what is a data scientist? CDSE emerges
Data opportunities for education and citizen science Faculty
development, curriculum development Needed on every campus Talk to
NSF, DOE, EC, your national agencies Recommendations of ACCI,
MPSAC, etc New programs needed: See NSF CDS&E, CI TraCS,
CAREER, LWD, etc You can help make this happen 30
Slide 31
Key Messages Astounding rate of change of the Triple Helix of
Research, Education, and Innovation Computing and Data radically
change methods Culture of collaboration around complex problems
These create many crises and opportunities From technology to
methodology to culture Deep integration required for science
Emergence of Computational and Data- enabled Science and
Engineering as a discipline and your role! A key part of the
paradigm shift 31