Supercomputing 2016
NCAR Globally Accessible User Environment
DDN Booth
Pamela Hill, Manager, Data Analysis Services
National Center for Atmospheric Research - NCAR

• Federally Funded Research and Development Center (FFRDC) sponsored by the National Science Foundation (NSF) and established in 1959
• Operated by the University Corporation for Atmospheric Research (UCAR), a non-profit consortium of > 100 member universities, academic affiliates, and international affiliates

"I have a very strong feeling that science exists to serve human welfare. It's wonderful to have the opportunity given us by society to do basic research, but in return, we have a very important moral responsibility to apply that research to benefiting humanity."
Walter Orr Roberts, NCAR Founding Director
Computational & Information Systems Laboratory – CISL Mission
• To support, enhance, and extend the capabilities for transformative science to the university community and the broader scientific community, nationally and internationally
• Provide capacity and capability supercomputing
• Develop and support robust, accessible, innovative and advanced services and tools
• Create an Earth System Knowledge Environment
Data Analysis Services Group NCAR / CISL / HSS / DASG
• Data Transfer and Storage Services • Pamela Hill • Joey Mendoza
• High-Performance File Systems
• Data Transfer Protocols • Science Gateway
Support • Innovative I/O
Solutions
• Visualization Services • John Clyne • Scott Pearse • Samuel Li
• VAPOR development and support
• 3D visualization • Visualization User
Support
GLADE Mission
GLobally Accessible Data Environment

• Unified and consistent data environment for NCAR HPC
  • Supercomputers, Data Analysis and Visualization Clusters
• Support for project work spaces
• Support for shared data transfer interfaces
• Support for Science Gateways and access to Research Data Archive (RDA) & Earth Systems Grid (ESG) data sets
• Data is available at high bandwidth to any server or supercomputer within the GLADE environment
• Resources outside the environment can manipulate data using common interfaces
• Choice of interfaces supports current projects; platform is flexible to support future projects
GLADE Environment
[Diagram: the GLADE environment at the center, holding Project Spaces, Data Collections, HOME, WORK, and SCRATCH spaces, connected to Computation (yellowstone, cheyenne), Analysis & Visualization (geyser, caldera, pronghorn), Remote Visualization (VirtualGL), Science Gateways (RDA, ESG, CDP), Data Transfer Gateways (Globus Online, Globus+ Data Share, GridFTP, HSI / HTAR, scp, sftp, bbcp), and the HPSS archive.]
GLADE Today
• 90 GB/s bandwidth
• 16 PB usable capacity
• 76 IBM DCS3700
• 6840 3 TB drives
  • shared data + metadata
• 20 NSD servers
• 6 management nodes
• File services
• FDR
• 10 GbE

Production Jan 2017
• 220 GB/s bandwidth
• 20 PB usable capacity
• 8 DDN SFA14KXE
• 3360 8 TB drives
  • data only
• 48 800 GB SSD
  • metadata
• File services
• EDR
• 40 GbE

Expansion Spring 2017
• 19 PB usable capacity
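A rough sanity check of the raw-versus-usable figures above, as a minimal Python sketch (illustrative arithmetic only; the parity scheme, spare policy, and file system overhead behind the usable numbers are not specified here):

    # Raw vs. usable capacity for the two GLADE storage generations.
    # Drive sizes are decimal TB; the overhead covers parity, spares, and
    # file system metadata, whose exact split the slides do not give.
    systems = {
        "DCS3700 (today)":     {"drives": 6840, "tb_per_drive": 3, "usable_pb": 16},
        "SFA14KXE (Jan 2017)": {"drives": 3360, "tb_per_drive": 8, "usable_pb": 20},
    }
    for name, s in systems.items():
        raw_pb = s["drives"] * s["tb_per_drive"] / 1000          # TB -> PB
        print(f'{name}: raw {raw_pb:.1f} PB, usable {s["usable_pb"]} PB '
              f'({s["usable_pb"] / raw_pb:.0%} of raw)')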
DDN SFA14KXE Scalable System Unit
8 x SSUs, each with:
• (4) NSD VMs
• (4) dual-port ConnectX-4 VPI HBAs
• (4) EDR IB connections
• (4) 40 GbE connections
• (420) 8 TB NL-SAS drives
• (6) 800 GB SSDs

[Diagram: one SSU shown as ten 42-drive enclosures, four holding (42) 8 TB drives each and six holding (42) 8 TB drives plus (1) SSD each, served by two pairs of NSD VMs with EDR IB and 40 GbE uplinks.]
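A quick check of how eight of these SSUs aggregate to the system-level numbers quoted on the previous slide (simple multiplication and division for illustration; the per-SSU bandwidth is just the aggregate target divided by eight, not a measured figure):

    # Scaling from one SFA14KXE SSU to the full 8-SSU system.
    ssus = 8
    drives_per_ssu = 420     # 8 TB NL-SAS
    ssds_per_ssu = 6         # 800 GB SSD (metadata)
    nsd_vms_per_ssu = 4
    aggregate_bw_gbs = 220   # system bandwidth target

    print("NL-SAS drives:", ssus * drives_per_ssu)                 # 3360
    print("metadata SSDs:", ssus * ssds_per_ssu)                   # 48
    print("NSD VMs:      ", ssus * nsd_vms_per_ssu)                # 32
    print("per-SSU share: %.1f GB/s" % (aggregate_bw_gbs / ssus))  # 27.5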
GLADE I/O Network
• Network architecture providing global access to data storage from multiple HPC resources
• Flexibility provided by support of multiple connectivity options and multiple compute network topologies
  • 10 GbE, 40 GbE, FDR, EDR
  • Full fat tree, quasi fat tree, hypercube
• Scalability allows for addition of new HPC or storage resources
  • Agnostic with respect to vendor and file system
  • Can support multiple solutions simultaneously
[Diagram: GLADE I/O network. GLADE (16 PB at 90 GB/s and 20 PB at 220 GB/s, GPFS) attaches to the NWSC core Ethernet switch (10/40/100 GbE) alongside the HPSS archive (160 PB, 12.5 GB/s, 60 PB of holdings), a disaster-recovery HPSS in Boulder, (4) DTNs, and data services (20 GB/s) that reach science gateways and the FRGP network over 100 GbE. Cheyenne (4,032 nodes) connects over EDR InfiniBand; Yellowstone (4,536 nodes) and the Geyser, Caldera, and Pronghorn HPC / DAV systems (46 nodes) connect over FDR InfiniBand; data network links to the core use 10 GbE and 40 GbE.]
GLADE I/O Network Connections
• All NSD servers are connected to the 40 GbE network
• IBM NSD servers are connected to the FDR IB network
  • Yellowstone is a full fat tree network
  • Geyser/Caldera/Pronghorn are a quasi fat tree with up/down routing
  • NSD servers use up/down routing
• DDN SFAs are connected to the EDR IB network
  • Cheyenne is an enhanced hypercube
  • NSD VMs are nodes in the hypercube; each NSD VM is connected to two points in the hypercube
• Data transfer gateways and the RDA, ESG, and CDP science gateways are connected to the 40 GbE and 10 GbE networks
• NSD servers will route traffic over the 40 GbE network to serve data to both IB networks
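A toy model of the last point, showing how a client's IB fabric decides whether it reaches a storage tier's NSD servers directly or via the routed 40 GbE path (the names and the simple rule below are illustrative only, not the actual GPFS subnet configuration):

    # Which path serves data from each storage tier to a client, given the
    # fabric each sits on. The rule is a simplification of the slide's point
    # that NSD servers route over 40 GbE to reach the "other" IB network.
    NSD_FABRIC = {"IBM DCS3700": "FDR", "DDN SFA14KXE": "EDR"}

    def path(client_fabric, storage):
        server_fabric = NSD_FABRIC[storage]
        if client_fabric == server_fabric:
            return f"direct over {server_fabric} IB"
        return f"routed {client_fabric} -> 40 GbE -> {server_fabric}"

    print(path("EDR", "IBM DCS3700"))    # e.g. Cheyenne reading the older file systems
    print(path("FDR", "DDN SFA14KXE"))   # e.g. Yellowstone reading the new file systems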
GLADE Manager Nodes
• glademgt1: secondary GPFS cluster manager, quorum node
• glademgt2: primary GPFS cluster manager, quorum node
• glademgt3: token manager, quorum node, file system manager
• glademgt4: token manager, quorum node, file system manager
• glademgt5: primary GPFS configuration manager; token manager, quorum node, file system manager, multi-cluster contact node
• glademgt6: secondary GPFS configuration manager; token manager, multi-cluster contact node
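One consequence of the role assignments above: GPFS node quorum needs a majority of the designated quorum nodes reachable, so with the five quorum nodes listed the cluster can ride out two manager-node failures (a minimal arithmetic sketch; the role list is taken from this slide, and the majority rule is standard GPFS node quorum):

    # Node-quorum arithmetic for the GLADE manager nodes.
    quorum_nodes = ["glademgt1", "glademgt2", "glademgt3", "glademgt4", "glademgt5"]
    majority = len(quorum_nodes) // 2 + 1       # 3 of 5 must stay up
    tolerated = len(quorum_nodes) - majority    # 2 failures tolerated
    print(f"{len(quorum_nodes)} quorum nodes: need {majority} up, tolerate {tolerated} down")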
GLADE File System Structure
GLADE Spectrum Scale Cluster

• scratch: SFA14KXE, 15 PB, SSD metadata, 8M block size, 4K inodes
• p: DCS3700, 10 PB, 4M block size, 512-byte inodes
• p2: SFA14KXE (5 PB) + DCS3700 (5 PB), SSD metadata, 8M block size, 4K inodes
• home / apps: SFA7700, 100 TB, SSD metadata, SSD application pool, 1M block size, 4K inodes

• ILM rules to migrate between pools
• Data collections pinned to DCS3700
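The two policy bullets restated as a minimal placement sketch (the pool names and data-collection fileset names below are assumptions for illustration; the real rules are expressed in the GPFS policy language, not shown here):

    # Toy placement logic mirroring the ILM bullets: data-collection filesets
    # stay pinned to the DCS3700 pool, everything else lives on the SFA14KXE
    # pool (with ILM migration between pools). Names are illustrative only.
    PINNED_FILESETS = {"rda", "esg", "cdp"}   # hypothetical data-collection filesets

    def choose_pool(fileset):
        if fileset in PINNED_FILESETS:
            return "dcs3700"     # pinned: not migrated off the older arrays
        return "sfa14kxe"        # default pool on the new storage

    for fs in ("rda", "user_project"):
        print(fs, "->", choose_pool(fs))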
Spectrum Scale Multi-Cluster
• Clusters can serve file systems or be diskless
• Each cluster is managed separately
  • can have different administrative domains
• Each storage cluster controls who has access to which file systems
• Any cluster mounting a file system must have TCP/IP access from all nodes to the storage cluster
  • Diskless clusters only need TCP/IP access to the storage cluster hosting the file system they wish to mount
  • No need for access to other diskless clusters
• Each cluster is configured with a GPFS subnet, which allows NSD servers in the storage cluster to act as routers
  • Allows blocking of a subnet when necessary
• User names and groups need to be kept in sync across the multi-cluster domain
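A small sketch of what keeping user names and groups in sync means in practice: compare the name-to-UID mapping each cluster resolves and flag anything that differs from the storage cluster (the cluster names and mappings below are invented for illustration):

    # Flag user names that resolve to different UIDs on different clusters
    # in the multi-cluster domain. The maps are invented; real data would
    # come from each cluster's identity source (LDAP, /etc/passwd, ...).
    passwd_maps = {
        "glade":    {"alice": 1001, "bob": 1002, "carol": 1003},
        "cheyenne": {"alice": 1001, "bob": 1002, "carol": 1003},
        "dav":      {"alice": 1001, "bob": 2002, "carol": 1003},  # bob out of sync
    }
    reference = passwd_maps["glade"]          # storage cluster as the reference
    for cluster, users in passwd_maps.items():
        for name, uid in users.items():
            if name in reference and reference[name] != uid:
                print(f"mismatch: {name} is UID {uid} on {cluster}, "
                      f"{reference[name]} on glade")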
Spectrum Scale Multi-Cluster
[Diagram: the GLADE storage cluster (NSD servers nsd1–nsd20 and nsd21–nsd54, spanning the FDR, EDR, and 40 GbE networks) exporting file systems to separate Spectrum Scale clusters: the YS cluster (Yellowstone, nodes n1–n4536, FDR network), the CH cluster (Cheyenne, nodes n1–n4032, EDR network) plus the CH PBS cluster (Cheyenne PBS and login nodes, n1–n10), the DAV cluster (Geyser, Caldera, Pronghorn, nodes n1–n46, FDR network), the DTN cluster (data transfer services, n1–n6), the RDA science gateway cluster (web plus n1–n8), and the SAGE cluster (ESG / CDP science gateways, web plus n1–n8), all reachable over the 40 GbE network.]
HPC Futures Lab - I/O Innovation Lab
• Provide infrastructure necessary for I/O hardware and software testing
• Focused on future I/O architectures and how they might relate to NCAR HPC
• Proof of concept for future HPC procurements
• Development environment for upcoming I/O environment changes
• Domains
  • High-bandwidth networks (40/100 GbE, IB, OPA)
  • Parallel file systems, early-release & pre-production testing (Spectrum Scale, Lustre, ??)
  • Storage hardware (block storage, object storage, SSD, NVMe, ??)
  • Hadoop cluster (testing of local temporary storage)
  • Cloud storage clusters (targeted for Data Service Gateways)
  • Compute cluster (to drive I/O tests)
HPC Futures Lab - I/O Innovation Lab
• Boulder, Colorado lab
  • Network infrastructure to support 40/100 GbE, EDR IB, OPA
  • Ability to add additional network technology as necessary
  • Management infrastructure
    • Cluster management tools
    • Monitoring tools
    • Persistent storage for home directories and application code
  • Storage and server hardware
• Cheyenne, Wyoming lab
  • Leverages current test infrastructure
    • Production network support for 40/100 GbE, IB
    • Integration with current pre-production compute clusters
  • Focused on 'burst buffer' and NVMe environments
  • Easily coupled to test compute clusters
I/O Innovation Lab – Boulder Lab
[Diagram: the Boulder lab in three areas. Lab infrastructure: management nodes iolab mgt1 and iolab mgt2 with SFA 7700 storage and SSD for persistent home/application space, on 40 GbE. Parallel file system research: two GPFS NSD servers with GPFS management, and two Lustre OSS plus Lustre MDS and management servers, connected over 40 GbE, FDR, EDR, and OPA. Cloud data services research and compute / Hadoop cluster: four cloud storage nodes and eight Hadoop nodes to drive I/O tests, on 40 GbE, EDR, and OPA.]
I/O Innovation Lab – Cheyenne Lab
[Diagram: Cheyenne lab resources: IBM GPFS on IBM DCS3700 storage, DDN GPFS on DDN SFA14KXE storage, the IBM Jellystone and IBM yogi/booboo test systems, SGI Laramie, two DDN IME nodes, an SGI UV300, and SGI NVMe storage, connected over 40 GbE, FDR, and EDR networks. Notes: GPFS I/O routing from FDR to EDR networks; connected to EDR at FDR rate; leverages picnic infrastructure; leverages current test systems for I/O testing.]
QUESTIONS? [email protected]