US HEP Computing Outlook
with an emphasis on ATLAS & CMS computing
Eric Lancon and Rob Roser
Presented to the CERN Scientific Computing Forum, October 27, 2017
[Title images: LSST, DUNE, CERN Data Center, LHC]
HEP Computing Frontier
− Computing is an essential enabling and empowering component of almost all aspects of HEP science
− HEP computing has a long history (~60 years) including notable contributions to High Performance Computing (HPC), High Throughput Computing (HTC), and large-scale Data Science
− Substantial resources are devoted to computation and data science as an essential aspect of HEP's scientific enterprise; ~20% of the HEP budget is computing related
− US provides approximately 20% of LHC CPU and Disk
− Winds of Change: New challenges posed by hardware evolution and rapid increase in data rates/volumes in an era of flat/declining budgets
US Funding Landscape
Computing in the US is funded by essentially three separate agencies: ASCR, DOE HEP, and NSF.
ASCR (Office of Advanced Scientific Computing Research)
• Funds the three major High Performance Computing Centers, which house the high-performance computers; this is the agency driving toward exascale by 2021
• Also funds ESnet, the 100 Gbps (current) scientific network that links all of the research laboratories
DOE High Energy Physics
• Funds mostly "high throughput" computing, primarily at the National Labs, specifically for particle physics
• Funds LHC Tier 1 centers
• Co-funds the transatlantic network with ASCR
National Science Foundation (NSF)
• Funds both mid-scale computing centers housed on some university campuses and small university clusters
• Funds LHC Tier 2 sites, OSG, and S2I2-HEP
Office of High Energy Physics: Functional Organization Chart, September 2017
[Org chart: James Siegrist, Director. Research & Technology Division (Mike Procario, Director): Energy Frontier (Abid Patwa), Intensity Frontier (Glen Crawford, Acting), Cosmic Frontier (Kathy Turner), Theoretical Physics (William Kilgore, Simona Rolli), Computational HEP (Lali Chatterjee), General Accelerator R&D (L.K. Len), Detector R&D (Helmut Marsiske), SBIR/STTR (Ken Marken), Accelerator Stewardship (Eric Colby). Facilities Division: Fermilab Complex (John Kogut), LHC Operations (Abid Patwa, Simona Rolli), other operations (SLAC/other labs), and projects including HL-LHC ATLAS/CMS/AUP, LBNF-DUNE, LSSTcam, DESI, FACET II, LZ, Mu2e, Muon g-2, PIP-II, and SuperCDMS-SNOLAB.]
The Office of Advanced Scientific Computing Research: Functional Organization Chart, September 2017
[Org chart: Barbara Helland, Associate Director. Computational Science Research and Partnerships (SciDAC) Division (Ceren Susut, Acting Director): Computer Science (Lucy Nowell), Data & Visualization (Laura Biven), Applied Math (Steven Lee), SciDAC Centers & Institutes (Ceren Susut), Network Research (Thomas Ndousse-Fetter), Collaboratories/Middleware (Rich Carlson), REP (Claire Cramer, Robinson Pino). Facilities Division (Christine Chalk, Acting Director): Oak Ridge Leadership Computing Facility (Christine Chalk), Argonne Leadership Computing Facility (Sonia Sachs), NERSC (Dave Goodwin), ESnet (Ben Brown), ALCC (Carolyn Lauzon), CSGF (Christine Chalk), ASCAC (Christine Chalk, DFO).]
P5 Science and Corresponding Timelines
P5 is the US analog of the European Strategy document.
The P5 report recommends a limited, prioritized, and time-ordered list of experiments to optimally address the science drivers:
• Covers the small, medium, and large investment scales
• Will produce results continuously throughout the 20-year time period
HEP is implementing the discovery-driven strategic plan, set within a global vision for particle physics, as presented in the P5 report.
Realizing this vision will require a shift in approaching the networking and computing challenges the data from these experiments will present!
Vision for Computing from P5 Report
The P5 report recognized the importance of computing:
• "Rapidly evolving computer architectures and increasing data volumes require effective crosscutting solutions"
• "[Need] investments to exploit next-generation hardware and computing models"
• "Close collaboration of laboratories and universities across the research areas will be needed"
P5 Recommendation 29:
• Strengthen the global collaboration among laboratories and universities to address computing and scientific software needs, and provide efficient training in next-generation hardware and data-science software relevant to particle physics. Investigate models for the development and maintenance of major software within and across research areas, including long-term data and software preservation.
HEP response to Recommendation 29:
• Initiated the HEP Center for Computational Excellence (http://hepcce.org)
• The CCE is leading the coordination of the required transition
HEP Center for Computational Excellence Summary
• Primary Mission
‣ Bring next-generation computational resources to bear on pressing HEP science problems
‣ Develop cross-cutting solutions leveraging ASCR expertise and resources
• Technical Challenges
‣ Hardware and software evolution
‣ New algorithms for fine-grained data analysis and I/O on HPC systems
‣ HEP workflow management for the exascale ecosystem
• Engagement Examples
‣ Software management for HPC systems (containers)
‣ Edge services for HPC systems
‣ Petascale data transfer project with ESnet
‣ Distributed large-scale data analytics
Joint ASCR/HEP Exascale Requirements Review, co-organized by HEP-CCE
The Realities of HEP Computing Moving Forward
− Due to funding constraints, we will have to optimize computing resources for “average” demand and not peak
− Need to find creative solutions for those instances where we need more than we have and need it fast (“pledged” vs. “non-pledged” resources)
− Importance of leveraging HPC facilities: already demonstrated at the level of 11% of ATLAS computing, equivalent to the contribution from some countries
− Computing in the cloud may also be an important player in the future
− Finding ways to evolve and maintain software base is critical
Computing environment of the future likely to be more heterogeneous than that of today
Projected Shortfalls in HEP Computing Resources
Expectation of ~10× shortfalls in compute by 2025 (a back-of-envelope sketch follows below), including:
• Storage and data movement: need smart networks and optimization of compute, data movement, and storage
• Hardware for simulation, data analysis, and storage
• Workforce, highlighting the need for expertise and training
The entire computing ecosystem is critical to workflows and results; rapid hardware and software evolution is a critical concern.
Can these challenges be handled entirely within HEP resources and programs?
Need a shift in strategy to best prepare for the future while managing current operations and using resources external to HEP
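To make the headline figure concrete, here is a minimal back-of-envelope sketch. The annual growth and price/performance factors are illustrative assumptions, not numbers from this talk; only the ~10× shortfall itself is quoted above.

```python
# Back-of-envelope sketch of the projected resource gap.
# The two annual factors below are illustrative assumptions.
YEARS = 8                      # ~2017 -> ~2025
demand_growth_per_year = 1.35  # assumed compound growth in HEP computing needs
tech_gain_per_year = 1.15      # assumed price/performance gain on a flat budget

demand = demand_growth_per_year ** YEARS   # relative need in 2025
capacity = tech_gain_per_year ** YEARS     # relative affordable capacity
print(f"demand: {demand:.1f}x, capacity: {capacity:.1f}x, "
      f"shortfall: {demand / capacity:.1f}x")
# With these assumptions the gap is ~3.6x; steeper HL-LHC-era demand
# growth pushes it toward the ~10x quoted on this slide.
```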
Needs for Run 3 & 4
White papers produced by US-ATLAS & US-CMS over summer 2017, with broadly common assumptions on LHC parameters
So, What’s the Plan?
Planning on a Series of Strategic Investments
ESnet: DOE’s international science network
ESnet is an Office of Science facility connecting the DOE labs, facilities, experiment sites, and supercomputers.
ESnet provides world-class support for scientific discovery for US Office of Science researchers and their collaborators.
Exponential growth, a 27 year trend
[Chart: ESnet traffic carried, showing 10× growth every 47 months; 56 PB in Jan 2017, projected to reach 1 EB by Jan 2021*]
Other sciences are getting into "big data," which will increase this slope in the future. (The quoted trend is cross-checked in the sketch below.)
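A quick cross-check of the quoted figures (a minimal sketch; the two data points are the ones on the chart, and the 2017-to-2021 projection turns out somewhat steeper than the long-term 10×-per-47-months average):

```python
import math

# Two data points from the ESnet traffic chart.
traffic_2017_pb = 56.0    # petabytes, Jan 2017
traffic_2021_pb = 1000.0  # 1 exabyte, projected for Jan 2021
months = 48               # Jan 2017 -> Jan 2021

monthly_growth = (traffic_2021_pb / traffic_2017_pb) ** (1.0 / months)
months_per_10x = math.log(10.0) / math.log(monthly_growth)
print(f"implied 10x period: {months_per_10x:.0f} months")
# ~38 months for this projection, i.e. slightly steeper than the
# 47-month average of the 27-year historical trend.
```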
ESnet6 – a terabit network (Jan 2021)
Harnessing the Data Revolution Initiative
NSF-led
Touching on:
§ Predictive analytics
§ Data mining
§ Machine learning
§ Benchmark data sets
§ Integrity and accessibility
§ Privacy and protection
§ Human-data interface
Cyber Infrastructure
• Robust, open, science-driven, integrated research CI ecosystem, with data as a "first-class object"
Options for Future Computing
The LHC will bring US HEP into the Exabyte Era with Run 4 (~2025). What is our plan?
Buy facilities
• Own it! Full control
• Must buy for peak utilization if this is the only option
Use services from other providers
• Let others make the capital investments for us
• Will usage be available and affordable when needed?
• Commercial clouds and HPC centers are examples
• The evolution of HEP networking provides another promising example of pursuing computing as infrastructure not owned by HEP
Hybrid model (see the cost sketch below)
• Own baseline resources that will be used at full capacity
• Use service providers for peak cycles when needed (conference season, reprocessing, etc.)
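A toy cost comparison of the three options (a sketch only; every price and utilization figure below is invented for illustration):

```python
# Toy cost model for owning vs. renting vs. hybrid provisioning.
# All numbers are invented for illustration.
peak_cores = 100_000
baseline_cores = 60_000          # steady-state ("average") demand
peak_fraction_of_year = 0.15     # fraction of the year spent at peak load

own_cost_per_core_year = 100.0   # amortized hardware + power + operations
rent_cost_per_core_year = 250.0  # commercial-provider premium

buy_for_peak = peak_cores * own_cost_per_core_year
rent_everything = (baseline_cores + (peak_cores - baseline_cores)
                   * peak_fraction_of_year) * rent_cost_per_core_year
hybrid = (baseline_cores * own_cost_per_core_year
          + (peak_cores - baseline_cores) * peak_fraction_of_year
          * rent_cost_per_core_year)

for name, cost in [("buy for peak", buy_for_peak),
                   ("rent everything", rent_everything),
                   ("hybrid", hybrid)]:
    print(f"{name:>16}: {cost / 1e6:.1f} M$/year")
# Hybrid wins when owned capacity runs near full utilization and peaks
# are short-lived; renting wins when demand is very bursty.
```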
How to Live in a Heterogeneous World
HEPCloud and PanDA are pursuing R&D efforts to facilitate this new paradigm.
Science goal: satisfy the computing resource needs of the HEP community
• Hide the complexity of computing from the scientists to allow them to focus on their analysis activities
Technical challenge: a single portal to access a wide range of computing resources (university grids, commercial clouds, ASCR- and NSF-funded supercomputers, DOE laboratory resources, etc.)
• Expert knowledge not required
• A decision engine optimizes where jobs are executed without human intervention (HEPCloud); a toy sketch of such an engine follows below
Benefits:
• Ease of use and improved operational efficiencies
• Computing resources can expand and shrink on short notice to meet demands (conference deadlines, for instance)
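To illustrate the decision-engine idea, here is a minimal sketch. The resource names, numbers, and cheapest-fit scoring rule are invented for illustration; they are not HEPCloud's actual logic.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    free_cores: int
    cost_per_core_hour: float  # 0.0 for owned or allocated resources

def place(job_cores: int, resources: list[Resource]) -> Resource:
    """Pick the cheapest resource that can fit the job right now."""
    candidates = [r for r in resources if r.free_cores >= job_cores]
    if not candidates:
        raise RuntimeError("no resource fits this job; leave it queued")
    return min(candidates, key=lambda r: r.cost_per_core_hour)

pool = [
    Resource("FermiGrid", free_cores=2_000, cost_per_core_hour=0.0),
    Resource("HPC allocation", free_cores=50_000, cost_per_core_hour=0.0),
    Resource("Commercial cloud", free_cores=10**6, cost_per_core_hour=0.03),
]
print(place(job_cores=30_000, resources=pool).name)  # -> HPC allocation
```

A production engine would also fold in queue depths, data locality, and allocation budgets, but the shape is the same: score the candidate resources and dispatch without human intervention.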
Open Science Grid (OSG)
OSG administers a worldwide grid of computational resources which facilitates distributed computing for scientific research
• These US-based resources reside at DOE national labs, NSF computing centers, and university computing sites.
All US LHC grid sites use OSG middleware (and some non-US sites use it as well).
US-ATLAS & US-CMS estimate the OSG-equivalent effort provided to the experimental programs to be ~15 FTE.
OSG in its present incarnation (until June 2018) is funded by NSF
US Will Soon be Entering the Exascale Era
National Strategic Computing Initiative (NSCI): a White House sponsored initiative
• Goal is to have one machine ”on the floor” in 2021 and two additional machines by 2023
• US conducted an Exascale requirements review in 2016 for each of the major sciences to gather requirements for this machine.
• HEP Report Available at https://arxiv.org/abs/1603.09303
An exascale computer provides a lot of compute: 1% of such a machine (10 petaflops) is more than is available to HEP today world-wide.
Both DOE and NSF are participating in this White House Initiative
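The arithmetic behind the 1% claim (a one-line check; the exaflop is the NSCI target and the 1% share is the slide's own estimate):

```python
exaflops = 1e18   # FLOP/s of an NSCI-era exascale machine
hep_share = 0.01  # 1% of the machine, per the estimate above
print(f"{exaflops * hep_share / 1e15:.0f} petaflops")  # -> 10 petaflops
```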
The Impact of HPC on HEP -- Reality
− Not all problems can be solved using HPC systems, but many can (accelerators, cosmology, event generation/simulation, QCD… already using HPC technology)
− The next generation of ASCR HPC machines (staging begins in 2018, ends in 2019) will sum to ~200 petaflops of compute capability, with architectures not yet fully specified
− If HEP experiments use just 10% of that, i.e. 20 petaflops, it is substantial!
− Learning how to leverage these resources to seamlessly supplement/enhance current capability is important but hard
− Currently, dealing with HPC systems is painful and not for the faint of heart; the machines were not designed for HEP compute
− Most of the HEP code base does not work on these machines today
− New possibilities opened up by HPC platforms will offer unique computational opportunities, but we need to invest to harness them; they don't come automatically
− A major task is to use HPC resources to act on incoming data, not just on data created in situ; this will require investment in edge services
Future of US HEP Computing
HEP computing in the US will no longer be monolithic. Researchers will need to leverage a variety of platforms:
− HPC resources at ASCR computing centers
− HEP-owned "grid" style resources
− Commercial clouds
− Elasticity is critical: spin up machines when they are needed
Even "commodity" machines of the future will be much more complicated than those of today: more parallel, with a variety of accelerator processors in them.
− The HEP code base will have to adapt (see the sketch below)
New paradigms are on the horizon in computing that HEP will need to pay close attention to:
− Deep learning
− Smart networks
− Exascale compute, neuromorphic, and quantum computing
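As a toy illustration of the adaptation involved (a minimal sketch, not HEP production code): the same independent per-event work written first as a serial loop, then as a process-parallel map, the pattern needed to keep many-core machines busy.

```python
from multiprocessing import Pool

def reconstruct(event: int) -> float:
    """Stand-in for per-event reconstruction work (hypothetical)."""
    return sum(i * i for i in range(event % 1000)) * 1e-6

if __name__ == "__main__":
    events = list(range(10_000))

    # Serial style: one core at a time, the historical HEP pattern.
    serial = [reconstruct(e) for e in events]

    # Parallel style: fan the same independent events across all cores.
    with Pool() as pool:
        parallel = pool.map(reconstruct, events)

    assert serial == parallel  # same result, better hardware utilization
```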
Moving Toward a Service Model
We would like the CERN LHC program to move toward "goal oriented" provisioning, e.g. 1 billion MC events produced, etc. This requires better planning than what is done today!
− Moves away from the archaic boxes/year pledge
− Tighter coupling to the work the experiments need to get done (a conversion sketch follows below)
− Gives countries flexibility in how to meet their goals
− Requires a change in culture/mindset; now is the time to consider it
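A sketch of how a goal-oriented pledge might translate into resources (the per-event CPU cost and efficiency below are invented placeholders, not experiment figures):

```python
# Convert a physics goal into a resource request (illustrative numbers).
mc_events_goal = 1_000_000_000  # "1 billion MC events produced"
cpu_sec_per_event = 300.0       # assumed full-simulation cost per event
cpu_efficiency = 0.80           # assumed scheduling/IO efficiency

core_hours = mc_events_goal * cpu_sec_per_event / 3600.0 / cpu_efficiency
cores_for_a_year = core_hours / (365 * 24)
print(f"{core_hours:.2e} core-hours, i.e. ~{cores_for_a_year:,.0f} "
      f"cores running for a year")
```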
We Must NOT Forget Software
Software investment is critical:
• The LHC software stack is significant; it will require substantial resources to modernize it to run on future platforms
• Baseline services like ROOT, GEANT, etc. need not only to be supported but to evolve as appropriate to the next generation of compute
• It is not too early to start on this
• NSF is taking the early lead here in the US (http://s2i2-hep.org); the goal is to prepare a strategic plan for a potential NSF Scientific Software Innovation Institute (S2I2) to develop software for experiments taking data in the High-Luminosity Large Hadron Collider (HL-LHC) era in the 2020s
Summary
− As we advance into the next decade, HEP cannot take a "business as usual" approach to computing
− The combination of upgraded facilities and finer-grained detectors will push the envelope of "big data"
− Is HPC the answer?
• Not clear yet, but it will certainly be part of the equation
− Our compute and data model will have to evolve
• Faster and smarter networks are a given
• The relationship of compute to data is not obvious
• Clouds will likely play a role in the future
− HEP compute will look more like HPC centers in the future though perhaps architected more for data processing
Back-Up Slides