
Page 1

ORNL is managed by UT-Battelle for the US Department of Energy

Quantifying Resiliency in the Extreme Scale HPC Co-Design Space

Jeffrey S. Vetter and Jeremy Meredith
http://ft.ornl.gov  [email protected]

Dagstuhl Seminar #15281: Algorithms and Scheduling Techniques to Manage Resilience and Power Consumption in Distributed Systems
9 Jul 2015

Page 2

Overview

• Our community has major challenges in HPC as we move to extreme scale
  – Power, Performance, Resilience, Productivity
  – Major shifts in architectures, software, and applications
• Not just HPC: the most uncertainty in two decades
• New technologies are emerging to address some of these challenges
  – Heterogeneous computing
  – Nonvolatile memory
• Consequently, we now have critical needs in
  – Portable programming models
  – Better design and analysis tools for procurement, optimization, etc.
• Aspen is a tool for structured design and analysis
  – Co-design applications and architectures for performance, power, and resiliency

Page 3

Surveying the HPC Landscape: Today and Tomorrow

Page 4

Notional Exascale Architecture Targets (from the Exascale Architecture Report, 2009)

System attributes          | 2001     | 2010      | "2015"                 | "2018"
System peak                | 10 Tera  | 2 Peta    | 200 Petaflop/sec       | 1 Exaflop/sec
Power                      | ~0.8 MW  | 6 MW      | 15 MW                  | 20 MW
System memory              | 0.006 PB | 0.3 PB    | 5 PB                   | 32-64 PB
Node performance           | 0.024 TF | 0.125 TF  | 0.5 TF or 7 TF         | 1 TF or 10 TF
Node memory BW             | -        | 25 GB/s   | 0.1 TB/sec or 1 TB/sec | 0.4 TB/sec or 4 TB/sec
Node concurrency           | 16       | 12        | O(100) or O(1,000)     | O(1,000) or O(10,000)
System size (nodes)        | 416      | 18,700    | 50,000 or 5,000        | 1,000,000 or 100,000
Total node interconnect BW | -        | 1.5 GB/s  | 150 GB/sec or 1 TB/sec | 250 GB/sec or 2 TB/sec
MTTI                       | -        | day       | O(1 day)               | O(1 day)

Parallel I/O: ??

http://science.energy.gov/ascr/news-and-resources/workshops-and-conferences/grand-challenges/

Page 5

Today’s Status

Page 6

(Un-)Balanced Systems ??

System attributes                   | 2001 (Seaborg3) | 2010 (Jaguar) | 2014 (Titan) | 2018 est. (Summit)  | Summit/Titan
System peak                         | 10 TF           | 2 PF          | 27 PF        | 136 PF              | 5.0
Power (MW)                          | 0.8             | 6             | 9            | 10                  | 1.1
Node main memory (GB)               | -               | 16            | 38           | 512                 | 13.5
System memory (PB)                  | 0.006           | 0.3           | 0.7106       | 1.7408              | 2.4
Node persistent memory (GB)         | -               | -             | -            | 800                 | inf
System persistent memory (PB)       | -               | -             | -            | 2.72                | inf
Node performance (TF)               | 0.024           | 0.125         | 1.4          | 40                  | 28.6
Node concurrency                    | 16              | 12            | -            | *POWER9s + *VOLTAs  | -
System size (nodes)                 | 416             | 18,700        | 18,700       | 3,400               | 0.2
Injection bandwidth per node (GB/s) | -               | 7.6           | 20           | 23                  | 1.2
File system capacity (PB)           | -               | 6             | 32           | 120                 | 3.8
File system bandwidth (TB/s)        | -               | 0.3           | 1            | 1                   | 1.0

(The slide also repeats the notional "2015"/"2018" targets from the previous table for comparison: 200 PF / 1 EF system peak, 15 / 20 MW power, 5 / 32-64 PB system memory, 0.5-7 / 1-10 TF node performance, 0.1-1 / 0.4-4 TB/s node memory BW, O(100)-O(1,000) / O(1,000)-O(10,000) node concurrency, 50,000-5,000 / 1,000,000-100,000 nodes, 150-1,000 / 250-2,000 GB/s total node interconnect BW, and MTTI of O(1 day).)

• Power is constant
• 1/5 of the node count
• Heterogeneous
• I/O and NIC bandwidth have plateaued
• NVM is new!

Page 7

Notional Future Architecture

[Figure: notional node architecture connected by an interconnection network]

Page 8

Investigating Emerging Technologies

• Heterogeneous Computing

• NV Memory

• Optical interconnect, Silicon photonics

  • Storage systems (key-value)

Page 9

Earlier Experimental Computing Systems (past decade)

• The past decade has started the trend away from traditional ‘simple’ architectures

• Examples

– Cell, GPUs, FPGAs, SoCs, etc

• Mainly driven by facilities costs and successful (sometimes heroic) application examples

[Figure: popular architectures since ~2004]

Page 10

We’ve seen this for decades; why is it different this time?

Bob Colwell, Hot Chips 25

IEEE Spectrum: Avago acquires Broadcom; Intel acquires Altera; next ??

Page 11

Emerging Computing Architectures – Now

• Heterogeneous processing
  – Latency-tolerant cores
  – Throughput cores
  – Special-purpose hardware (e.g., AES, MPEG, RND)
• Memory
  – Fused, configurable memory
  – 2.5D and 3D stacking
  – HMC, HBM, WIDEIO2, LPDDR4, etc.
  – New devices (PCRAM, ReRAM)
• Interconnects
  – Collective offload
  – Scalable topologies
• Storage
  – Active storage
  – Non-traditional storage architectures (key-value stores)
• Improving performance and programmability in the face of increasing complexity
  – Power, resilience

HPC (and mobile, enterprise, and embedded) computer design is more fluid now than it has been in the past two decades.

Page 12

Recent announcements

Page 14

Comparison of emerging memory technologies

Jeffrey Vetter, ORNL

Robert Schreiber, HP Labs

Trevor Mudge, University of Michigan

Yuan Xie, Penn State University

                            | SRAM    | DRAM    | eDRAM   | 2D NAND Flash | 3D NAND Flash | PCRAM      | STTRAM | 2D ReRAM   | 3D ReRAM
Data retention              | N       | N       | N       | Y             | Y             | Y          | Y      | Y          | Y
Cell size (F^2)             | 50-200  | 4-6     | 19-26   | 2-5           | <1            | 4-10       | 8-40   | 4          | <1
Minimum F demonstrated (nm) | 14      | 25      | 22      | 16            | 64            | 20         | 28     | 27         | 24
Read time (ns)              | <1      | 30      | 5       | 10^4          | 10^4          | 10-50      | 3-10   | 10-50      | 10-50
Write time (ns)             | <1      | 50      | 5       | 10^5          | 10^5          | 100-300    | 3-10   | 10-50      | 10-50
Number of rewrites          | 10^16   | 10^16   | 10^16   | 10^4-10^5     | 10^4-10^5     | 10^8-10^10 | 10^15  | 10^8-10^12 | 10^8-10^12
Read power                  | Low     | Low     | Low     | High          | High          | Low        | Medium | Medium     | Medium
Write power                 | Low     | Low     | Low     | High          | High          | High       | Medium | Medium     | Medium
Power (other than R/W)      | Leakage | Refresh | Refresh | None          | None          | None       | None   | Sneak      | Sneak
Maturity                    |         |         |         |               |               |            |        |            |

http://ft.ornl.gov/trac/blackcomb

Page 15

Opportunities for NVM in Emerging Systems

• Burst Buffers

• In-mem tables

• In situ visualization

J.S. Vetter and S. Mittal, “Opportunities for Nonvolatile Memory Systems in Extreme-Scale High-Performance Computing,” Computing in Science & Engineering, 17(2):73-82, 2015, doi:10.1109/MCSE.2015.4.

http://ft.ornl.gov/eavl

Page 16

With so many complex architectural choices, we need new methods and tools

• Performance Portable Programming Models

• Design tools for architectures and applications

– Performance

– Resiliency

– Power

Page 17

Workflow within the Exascale Ecosystem

[Figure: co-design workflow linking Application Co-Design (domain/algorithm analysis, proxy apps, SW solutions, application design), Computer Science Co-Design (programming models, compilers, runtime, OS and I/O, tools, analysis, system software), and Hardware Co-Design (vendor and open analysis, models, simulators, emulators, prototype hardware, HW constraints, HW design stack, system design)]

“(Application driven) co-design is the process where scientific problem requirements influence computer architecture design, and technology constraints inform formulation and design of algorithms and software.” – Bill Harrod (DOE)

Slide courtesy of the ExMatEx co-design team.

Page 18

Prediction Techniques Ranked

Page 19

Prediction Techniques Ranked

Page 20

Aspen: Abstract Scalable Performance Engineering Notation

Creation
• Static analysis via compilers
• Empirical, historical
• Manual, for future applications

Use
• Interactive tools for graphs and queries
• Design-space optimization
• Drive simulators
• Feedback to runtime systems

Representation in Aspen
• Modular
• Sharable
• Composable
• Reflects program structure

Existing models for MD, UHPC CP 1, LULESH, 3D FFT, CoMD, VPFFT, …

[Figure: source code is translated into Aspen code]

K. Spafford and J.S. Vetter, “Aspen: A Domain Specific Language for Performance Modeling,” in SC12: ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis, 2012.

Researchers are using Aspen for parallel applications, scientific workflows, capacity planning, quantum computing, etc.

Page 21

Manual Example of LULESH

Page 22

Creating Aspen Models

S. Lee, J.S. Meredith, and J.S. Vetter, “COMPASS: A Framework for Automated Performance Modeling and Prediction,” in ACM International Conference on Supercomputing (ICS). Newport Beach, California: ACM, 2015, doi:10.1145/2751205.2751220.

Page 23

Simple matrix-multiply (MM) example generated from COMPASS

Page 24

Example Queries

Automated Big O Notation

Idealized Concurrency
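As a rough illustration of what an automated big-O query computes (a generic sketch using SymPy, not Aspen's actual implementation; the flop-count expression is hypothetical), the leading-order growth of a model expression can be extracted symbolically:

  import sympy as sp

  n = sp.symbols('n', positive=True)
  # Hypothetical flop count taken from an analytical model of a kernel.
  flops = 2*n**3 + 10*n**2 + 5*n
  # Reduce to leading-order growth as n -> infinity, i.e., the kernel's big O.
  print(sp.O(flops, (n, sp.oo)))   # prints O(n**3, (n, oo))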

Page 25

Example Queries

Computational Intensity and Memory Usage per Subroutine

Page 26

Fast and Accurate Enough for Online Decisions

Page 27

PANORAMA Overview

[Figure: PANORAMA architecture spanning infrastructure design, model validation, workflow execution, simulation, anomaly detection and diagnosis, and resource mapping and adaptation; it builds on the Pegasus workflow framework and the Aspen modeling language, with raw and correlated monitoring data collected from resources such as ExoGENI, OLCF, NERSC, APS, SNS, HPSS, VDF, visualization, and the ESnet testbed]

E. Deelman, C. Carothers et al., “PANORAMA: An Approach to Performance Modeling and Diagnosis of Extreme Scale Workflows,” International Journal of High Performance Computing Applications, (to appear), 2015.

Page 28

End-to-end Resiliency Design using Aspen

Page 29

Data Vulnerability Factor: Why a new metric and methodology?

• Analytical model of resiliency that includes important features of architecture and application

– Fast

– Flexible

• Balance multiple design dimensions

– Application requirements

– Architecture (memory capacity and type)

• Focus on main memory initially

• Prioritize vulnerabilities of application data

L. Yu, D. Li et al., “Quantitatively modeling application resilience with the data vulnerability factor” (Best Student Paper Finalist), in SC14: International Conference for High Performance Computing, Networking, Storage and Analysis. New Orleans, Louisiana: IEEE Press, 2014, pp. 695-706, doi:10.1109/sc.2014.62.

Page 30

DVF Defined

Hardware effects determine the expected number of errors: $N_{error} = FIT \times T \times S_d$, where $FIT$ is the hardware failure rate, $T$ is the execution time, and $S_d$ is the footprint size of data structure $d$.

Application effects, captured by the hardware access pattern, determine the number of hardware accesses $N_{ha}$.

Data structure vulnerability: $DVF_d = N_{error} \times N_{ha}$

Application vulnerability: $DVF_a = \sum_{i=1}^{n} DVF_{d_i}$

We focus on a specific hardware component, the main memory, in this work. A larger DVF indicates higher vulnerability, and vice versa.
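To make the quantities concrete, here is a small worked instance of these formulas; the failure rate, runtime, footprint, and access count are illustrative assumptions, not values from the SC14 study:

\[ N_{error} = FIT \times T \times S_d = \left(5\times10^{-8}\ \tfrac{\text{errors}}{\text{GB}\cdot\text{s}}\right) \times \left(2\times10^{3}\ \text{s}\right) \times \left(4\ \text{GB}\right) = 4\times10^{-4} \]
\[ DVF_d = N_{error} \times N_{ha} = 4\times10^{-4} \times 10^{6} = 400, \qquad DVF_a = \sum_{i=1}^{n} DVF_{d_i} \]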

Page 31

Implementing DVF

• Extend Aspen performance modeling language

• Specify memory access patterns

• Combine error rates with memory regions and performance

• Assign a DVF to each application memory region, and sum over regions for the application-level DVF (a minimal sketch follows this list)
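The bookkeeping these steps describe is small: compute N_error for each memory region from its footprint, multiply by that region's access count, and sum. A minimal Python sketch under those assumptions is below; the region description, FIT rate, and numbers are hypothetical, and this is not the actual Aspen extension or its API.

  from dataclasses import dataclass

  @dataclass
  class MemoryRegion:
      name: str
      footprint_gb: float   # S_d: size of the data structure in main memory (GB)
      accesses: int         # N_ha: hardware accesses implied by the access pattern

  def n_error(fit_per_gb_s: float, runtime_s: float, footprint_gb: float) -> float:
      # Expected errors striking this region: N_error = FIT * T * S_d
      return fit_per_gb_s * runtime_s * footprint_gb

  def dvf_region(region: MemoryRegion, fit_per_gb_s: float, runtime_s: float) -> float:
      # Data-structure vulnerability: DVF_d = N_error * N_ha
      return n_error(fit_per_gb_s, runtime_s, region.footprint_gb) * region.accesses

  def dvf_application(regions, fit_per_gb_s: float, runtime_s: float) -> float:
      # Application vulnerability: sum of per-region DVFs
      return sum(dvf_region(r, fit_per_gb_s, runtime_s) for r in regions)

  # Illustrative numbers only: two arrays, a made-up main-memory FIT rate, and a runtime.
  regions = [MemoryRegion("matA", 0.016, 1000), MemoryRegion("matC", 0.032, 8000)]
  print(dvf_application(regions, fit_per_gb_s=5e-8, runtime_s=2e3))

Because the failure rate enters per region, swapping in different rates for ECC, non-ECC, or NVM regions is a one-line change, which is the kind of architectural comparison raised later under "DVF: next steps".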

Page 32

Workflow to calculate Data Vulnerability Factor

Page 33

An Example of an Aspen Program for DVF

Pseudocode:

  procedure VM(A, B, C)
    for i = 1, 1000 do
      C[i] = C[i] + A[i*4] * B[i*8]
    end for
  end procedure

Extended Aspen statements:

  kernel vecmul {
    execute mainblock2 [1] {
      flops [2*(n^3)] as sp, fmad, simd
      access {1000} from {matA} as stream(4,16)
      access {4000} from {matB} as stream(4,32)
      access {8000} from {matC} as stream(4,4)
    }
  }

Syntax tree (resilience statements, shown for matA):

  Resilience Statements:
    Footprint Sizes:
      Int: 16,000
    Data Structures:
      Ident: matA
      Access Pattern: Stream
        Int: 4
        Int: 16

Resilience modeling results:

  Data structure A:
    Number of errors: 30,400
    Number of memory accesses: 51
    DVF: 105504e+06
  …

[Toolchain: extended Aspen parser and extended compiler]

Page 34

DVF Results
Provide insight for balancing interacting factors

Page 35

DVF: Next Steps

• Evaluate different architectures
  – How much no-ECC, ECC, NVM?
• Evaluate software and applications
  – ABFT
  – C/R
  – TMR
  – Containment domains
  – Fault-tolerant MPI
• End-to-end analysis
  – Where should we bear the cost of resiliency?
  – Not everywhere!

Page 36

Summary

• Our community has major challenges in HPC as we move to extreme scale
  – Power, Performance, Resilience, Productivity
  – Major shifts in architectures, software, and applications
• Not just HPC: the most uncertainty in two decades
• New technologies are emerging to address some of these challenges
  – Heterogeneous computing
  – Nonvolatile memory
• Consequently, we now have critical needs in
  – Portable programming models
  – Performance prediction for procurement, optimization, etc.
• Aspen is a tool we have developed for performance prediction
  – Co-design applications and architectures for performance, power, and resiliency

Page 37

Acknowledgements

• Contributors and Sponsors

– Future Technologies Group: http://ft.ornl.gov

– US Department of Energy Office of Science

• DOE Vancouver Project: https://ft.ornl.gov/trac/vancouver

• DOE Blackcomb Project: https://ft.ornl.gov/trac/blackcomb

• DOE ExMatEx Codesign Center: http://codesign.lanl.gov

• DOE Cesar Codesign Center: http://cesar.mcs.anl.gov/

• DOE Exascale Efforts: http://science.energy.gov/ascr/research/computer-science/

– Scalable Heterogeneous Computing Benchmark team: http://bit.ly/shocmarx

– US National Science Foundation Keeneland Project: http://keeneland.gatech.edu

– US DARPA

– NVIDIA CUDA Center of Excellence