Glenn K. Lockwood, Ph.D.
Advanced Technologies Group, NERSC

Simplifying data management through storage system design

May 1, 2019
U.S. Department of Energy, Office of Science
Outline / Charge
What are our issues and challenges around data storage? (deletion/retention policy and practice, cost, duplication, hot/cold storage, types of storage)

I.   Storage architecture has gotten very complicated
II.  NERSC wants to deploy systems that accelerate scientific discovery by simplifying data management through innovations in system hardware and software
III. We need to manage complexity to simplify user interfaces
NERSC: the mission HPC facility for the U.S. Department of Energy Office of Science

7,000 users · 800 projects · 700 applications · 2,000 publications per year
Photo credit: CAMERA
NERSC's dual mission
Advance science and advance the state-of-the-art in supercomputing

Advancing Science
• Provide a productive computing environment for science applications
  – software/programming environment
  – advanced app/system performance expertise
• Tightly couple with workflows of DOE's experimental and observational facilities

Advancing HPC
• Collaborate with HPC vendors years before system delivery to deploy advanced systems/new capabilities at scale
Storage architecture is getting complicated

[Figure: today's four-tier hierarchy, ordered from performance to capacity: Burst Buffer → Scratch → Project → Archive]

More info: G. K. Lockwood et al., "Storage 2020: A Vision for the Future of HPC Storage," Berkeley, CA, 2017.
Addressing complexity of data management

I.   Change user applications
II.  Better middleware and tools
III. Smarter system software
IV.  Simpler storage system hardware architecture
Simplifying storage system architecture in 2020

[Figure: simplified three-tier hierarchy, ordered from performance to capacity: Scratch → Community → Archive]
2025 requires smarter storage system software

[Figure: the 2020 hierarchy (Scratch → Community → Archive) evolves into a 2025 hierarchy of just a Hot Tier and a Cold Tier; a software abstraction maps the hot tier onto nonvolatile RAM and NAND (flash), and the cold tier onto hard disk drives and tape]
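The 2025 picture hides four physical media behind two logical tiers. A minimal sketch of what such a software abstraction's placement policy could look like — every name and threshold below is an illustrative assumption, not NERSC software:

```python
# Hypothetical sketch of the 2025 "software abstraction": users see only
# a hot and a cold tier, while a policy picks the physical medium.
# Size and age thresholds here are made-up illustrative values.

def place(file_size_bytes, seconds_since_access, hot_cutoff_s=7 * 86400):
    """Map a file onto one of the four 2025 media by size and recency."""
    if seconds_since_access < hot_cutoff_s:
        # Hot tier: nonvolatile RAM for small hot data, NAND flash otherwise.
        return "nvram" if file_size_bytes < (1 << 30) else "flash"
    # Cold tier: hard disk, spilling very large cold data to tape.
    return "disk" if file_size_bytes < (1 << 40) else "tape"

print(place(10 << 20, 3600))        # small file touched an hour ago
print(place(2 << 40, 90 * 86400))   # 2 TiB file idle for 90 days
```

The point of the sketch is that the tiering decision lives entirely below the user-visible interface, which is how the hierarchy can stay simple while the hardware does not.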
Towards intelligent data management systems through quantitative analysis

[Figure: the Burst Buffer, Scratch, Project, and Archive tiers, instrumented and feeding Elastic]

G. K. Lockwood, N. J. Wright, S. Snyder, P. Carns, G. Brown, and K. Harms, "TOKIO on ClusterStor: Connecting Standard Tools to Enable Holistic I/O Performance Analysis," in Proceedings of the 2018 Cray User Group, 2018.
State of the art in understanding I/O today

[Figure: the Burst Buffer, Scratch, Project, and Archive tiers (plus an ENOSPC annotation) are each instrumented by a disjoint patchwork of tools: Darshan, LDMS, LMT, mmperfmon, collectd, SeaStream, Elastic, and assorted Perl and Python scripts]
State of the art in understanding I/O today

[Same figure as the previous slide, with the takeaway: expert knowledge is required to turn this monitoring data into answers]
TOKIO to give insight into data movement

[Figure: the same patchwork of monitoring tools, now unified under the Total Knowledge of I/O (TOKIO) framework]
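The unifying idea fits in a few lines. This is an illustrative toy, not the real pytokio API: TOKIO's value is aligning records from independent monitors (Darshan, LMT, collectd, and the rest) onto a single job's time window so system-wide activity can be read alongside that job's I/O.

```python
from datetime import datetime, timedelta

# Toy illustration of the TOKIO idea (not the real pytokio API): align
# samples from independent monitoring tools on one job's run window.

def overlapping(records, start, end):
    """Keep only the monitor samples taken while the job was running."""
    return [r for r in records if start <= r["time"] <= end]

job_start = datetime(2018, 6, 1, 12, 0)
job_end = job_start + timedelta(hours=1)

# Hypothetical per-component feed; in practice each tool stores its
# samples in its own database and format, which is what TOKIO papers over.
oss_cpu = [
    {"time": datetime(2018, 6, 1, 12, 30), "cpu_pct": 85.0},  # during job
    {"time": datetime(2018, 6, 1, 15, 0), "cpu_pct": 10.0},   # after job
]

during = overlapping(oss_cpu, job_start, job_end)
print(during)  # only the 12:30 sample can explain this job's I/O
```

Once every feed is reduced to the same (timestamp, value) shape, the per-tool differences stop mattering to the analysis.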
What TOKIO has told us so far: Why is I/O slow?

[Figure: UMAMI-style dashboard relating application I/O performance (GB/s) to bandwidth contention, data server load (CPU %), metadata server load (CPU %), metadata ops (MOps), and file system fullness (%)]

Top: G. K. Lockwood, S. Snyder, T. Wang, S. Byna, P. Carns, and N. J. Wright, "A Year in the Life of a Parallel File System," in Proceedings of SC'18, 2018, pp. 74:1–74:13.
Left: G. K. Lockwood et al., "UMAMI: A Recipe for Generating Meaningful Metrics through Holistic I/O Performance Analysis," in Proceedings of PDSW-DISCS'17, 2017, pp. 55–60.
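One way to phrase the slide's question quantitatively — a hedged sketch with made-up numbers, not the published UMAMI code — is to correlate a job's bandwidth history against each candidate system metric and see which one moves with it:

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient, stdlib only."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sqrt(sum((x - mx) ** 2 for x in xs)) *
                  sqrt(sum((y - my) ** 2 for y in ys)))

# Made-up measurements for five runs of the same application.
app_gbps        = [12.0, 11.5, 6.0, 12.2, 5.5]    # application I/O perf
contention_gbps = [1.0, 1.2, 9.0, 0.8, 10.0]      # other jobs' traffic
fullness_pct    = [60.0, 61.0, 62.0, 63.0, 64.0]  # file system fullness

# A strong negative correlation with contention, and a far weaker one
# with fullness, points at bandwidth contention as why the slow runs
# were slow.
print(pearson(app_gbps, contention_gbps))
print(pearson(app_gbps, fullness_pct))
```

The holistic data collection is what makes this possible at all: without the contention and fullness series sitting next to the application's own numbers, there is nothing to correlate against.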
What TOKIO can tell us going forward: How does data move during workflows?

[Figure: data movement among NERSC, ALCF, and ESnet on January 28, 2019, with daily volumes binned as ≤ 1 TiB, ≤ 10 TiB, ≤ 50 TiB, ≤ 100 TiB, and ≤ 500 TiB]
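As a small worked example of the figure's legend (the bucket edges come from the slide; the helper itself is hypothetical):

```python
TIB = 1 << 40                        # one tebibyte in bytes
BUCKETS_TIB = [1, 10, 50, 100, 500]  # legend edges from the figure

def legend_bucket(volume_bytes):
    """Return the smallest legend bucket that holds a day's volume."""
    for cap in BUCKETS_TIB:
        if volume_bytes <= cap * TIB:
            return f"<= {cap} TiB"
    return "> 500 TiB"

print(legend_bucket(3 * TIB))  # a 3 TiB day falls in the "<= 10 TiB" bucket
```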
Outlook
Managing complexity in data management requires increasingly intelligent storage systems to offset increasing architectural complexity.
• A simpler, all-flash storage hierarchy is a viable short-term solution
• A holistic understanding of the entire storage system is becoming essential
• TOKIO is a software foundation for holistic understanding and for working towards adaptive storage
The Total Knowledge of I/O (TOKIO) Project, a collaboration between Lawrence Berkeley National Laboratory and Argonne National Laboratory, is supported by the U.S. Department of Energy, Office of Science, under contracts DE-AC02-05CH11231 and DE-AC02-06CH11357.
https://www.nersc.gov/research-and-development/tokio/
"Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Networking and Information Technology Research and Development Program."