Page 1

Simplifying data management through storage system design

Glenn K. Lockwood, Ph.D.
Advanced Technologies Group

May 1, 2019

NERSC · U.S. Department of Energy, Office of Science

Page 2

Outline / Charge

What are our issues and challenges around data storage? (deletion/retention policy and practice, cost, duplication, hot/cold storage, types of storage)

I. Storage architecture has gotten very complicated

II. NERSC wants to deploy systems that accelerate scientific discovery by simplifying data management through innovations in system hardware and software

III. We need to manage complexity to simplify user interfaces

Page 3

NERSC: the mission HPC facility for the U.S. Department of Energy Office of Science

• 7,000 users
• 800 projects
• 700 applications
• 2,000 publications per year

Photo credit: CAMERA

Page 4

NERSC's dual mission: advance science and advance the state of the art in supercomputing

Advancing Science
• Provide a productive computing environment for science applications
  – software/programming environment
  – advanced app/system performance expertise
• Tightly couple with workflows of DOE’s experimental and observational facilities

Advancing HPC
• Collaborate with HPC vendors years before system delivery to deploy advanced systems/new capabilities at scale

Page 5

Storage architecture is getting complicated

[Figure: storage tiers arranged between performance and capacity: Burst Buffer, Scratch, Project, Archive]

More info: G. K. Lockwood et al., “Storage 2020: A Vision for the Future of HPC Storage,” Berkeley, CA, 2017.

Page 6

Addressing complexity of data management

I. Change user applications

II. Better middleware and tools

III. Smarter system software

IV. Simpler storage system hardware architecture

Page 7

Simplifying storage system architecture in 2020

[Figure: 2020 storage tiers arranged between performance and capacity: Scratch, Community, Archive]

More info: G. K. Lockwood et al., “Storage 2020: A Vision for the Future of HPC Storage,” Berkeley, CA, 2017.

Page 8

2025 requires smarter storage system software

[Figure: the 2020 hierarchy (Scratch, Community, Archive) shown beside the 2025 hierarchy (Hot Tier, Cold Tier). Media shown: nonvolatile RAM, NAND (flash), hard disk drives, tapes. Two layers are labeled "Software abstraction".]

More info: G. K. Lockwood et al., “Storage 2020: A Vision for the Future of HPC Storage,” Berkeley, CA, 2017.

Page 9

Towards intelligent data management systems through quantitative analysis

[Figure labels: Burst Buffer, Scratch, Project, Archive, Elastic]

G. K. Lockwood, N. J. Wright, S. Snyder, P. Carns, G. Brown, and K. Harms, “TOKIO on ClusterStor: Connecting Standard Tools to Enable Holistic I/O Performance Analysis,” in Proceedings of the 2018 Cray User Group, 2018.

Page 10

State of the art in understanding I/O today

[Figure: storage tiers (Burst Buffer, Scratch, Project, Archive) annotated with monitoring tools and data sources: ENOSPC, Elastic, SeaStream, "Perl thingy", LDMS, Darshan, "Python thingy" (several), mmperfmon, LMT, collectd]

G. K. Lockwood, N. J. Wright, S. Snyder, P. Carns, G. Brown, and K. Harms, “TOKIO on ClusterStor: Connecting Standard Tools to Enable Holistic I/O Performance Analysis,” in Proceedings of the 2018 Cray User Group, 2018.

Page 11

State of the art in understanding I/O today

[Figure: storage tiers (Burst Buffer, Scratch, Project, Archive) annotated with monitoring tools and data sources: ENOSPC, Elastic, SeaStream, "Perl thingy", LDMS, Darshan, "Python thingy" (several), mmperfmon, LMT, collectd]

Expert knowledge required to turn data into answers

G. K. Lockwood, N. J. Wright, S. Snyder, P. Carns, G. Brown, and K. Harms, “TOKIO on ClusterStor: Connecting Standard Tools to Enable Holistic I/O Performance Analysis,” in Proceedings of the 2018 Cray User Group, 2018.

Page 12

TOKIO to give insight into data movement

[Figure: storage tiers (Burst Buffer, Scratch, Project, Archive) annotated with monitoring tools and data sources: ENOSPC, Elastic, SeaStream, "Perl thingy", LDMS, Darshan, "Python thingy" (several), mmperfmon, LMT, collectd]

Total Knowledge of I/O (TOKIO) Framework

G. K. Lockwood, N. J. Wright, S. Snyder, P. Carns, G. Brown, and K. Harms, “TOKIO on ClusterStor: Connecting Standard Tools to Enable Holistic I/O Performance Analysis,” in Proceedings of the 2018 Cray User Group, 2018.

Page 13

What TOKIO has told us so far: Why is I/O slow?

[Figure panels: Application I/O Perf (GB/s); Bandwidth Contention; Data server load (cpu%); Metadata server load (cpu%); Metadata Ops (MOps); File System Fullness (%)]

Top: G. K. Lockwood, S. Snyder, T. Wang, S. Byna, P. Carns, and N. J. Wright, “A Year in the Life of a Parallel File System,” in Proceedings of SC'18, 2018, pp. 74:1–74:13.

Left: G. K. Lockwood et al., “UMAMI: A Recipe for Generating Meaningful Metrics through Holistic I/O Performance Analysis,” in Proceedings of PDSW-DISCS'17, 2017, pp. 55–60.

Page 14

What TOKIO can tell us going forward: How does data move during workflows?

[Figure: Data movement on January 28, 2019, with ESnet, NERSC, and ALCF labeled. Legend: ≤ 1 TiB, ≤ 10 TiB, ≤ 50 TiB, ≤ 100 TiB, ≤ 500 TiB]
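
As a rough illustration of how the legend on this slide could be applied, the sketch below maps a daily transfer volume to the smallest legend bin that contains it. Only the five bin edges come from the slide. The function name `legend_bin`, the byte-count input, and the overflow label for volumes above 500 TiB are assumptions made for this example and are not part of TOKIO.

```python
from bisect import bisect_left

TIB = 2**40  # bytes per tebibyte

# Upper bounds of the legend bins shown on the slide, in TiB.
BIN_EDGES_TIB = [1, 10, 50, 100, 500]


def legend_bin(nbytes):
    """Return the label of the smallest legend bin that holds nbytes.

    Hypothetical helper: the slide shows only the legend, not how
    volumes were binned.
    """
    tib = nbytes / TIB
    # bisect_left finds the first edge that is >= tib, so a volume
    # exactly on an edge falls into that edge's bin.
    idx = bisect_left(BIN_EDGES_TIB, tib)
    if idx == len(BIN_EDGES_TIB):
        # Assumption: the slide shows no bin above 500 TiB.
        return "> 500 TiB"
    return f"<= {BIN_EDGES_TIB[idx]} TiB"
```

For example, `legend_bin(3 * TIB)` falls in the 10 TiB bin. Summing per-transfer byte counts for each pair of endpoints before calling the function would give one label per arrow in a diagram of this kind.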

Page 15

Outlook

Managing complexity in data management requires increasingly intelligent storage systems to offset increasing architectural complexity
• Simpler, all-flash storage hierarchy is a viable short-term solution
• Holistic understanding of the entire storage system is becoming essential
• TOKIO is a software foundation for holistic understanding and working towards adaptive storage

The Total Knowledge of I/O (TOKIO) Project, a collaboration between Lawrence Berkeley National Laboratory and Argonne National Laboratory, is supported by the U.S. Department of Energy, Office of Science, under contracts DE-AC02-05CH11231 and DE-AC02-06CH11357.

https://www.nersc.gov/research-and-development/tokio/

Page 16

"Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Networking and Information Technology Research and Development Program."

The Networking and Information Technology Research and Development (NITRD) Program

Mailing Address: NCO/NITRD, 2415 Eisenhower Avenue, Alexandria, VA 22314

Physical Address: 490 L'Enfant Plaza SW, Suite 8001, Washington, DC 20024, USA

Tel: 202-459-9674, Fax: 202-459-9673, Email: [email protected], Website: https://www.nitrd.gov