
Introduction of Oakforest-PACS

Hiroshi Nakamura
Director, Information Technology Center, The University of Tokyo
(Director of JCAHPC)

Outline

• Supercomputer deployment plan in Japan
• What is JCAHPC?
• Oakforest-PACS system
• Application
• Summary


Computational Resource Providers of HPCI

• Tier-1: K Computer at RIKEN
• Tier-2: supercomputers of 9 universities and 2 research institutes, including Oakforest-PACS

Deployment plan of Tier-2 Supercomputers (as of May 2017), available from the HPCI Consortium (www.hpci-c.jp)

Note: power consumption indicates the maximum of the power supply (including the cooling facility)

http://www.hpci-c.jp/news/HPCI-infra-summary.pdf

Towards Exascale Computing


[Roadmap chart: peak performance (1 to 1000 PF) vs. year (2008-2020), showing T2K (U. of Tsukuba, U. of Tokyo, Kyoto U.), Tokyo Tech. TSUBAME2.0, the K Computer (RIKEN AICS), Oakforest-PACS (JCAHPC: The Univ. of Tokyo and Univ. of Tsukuba), the Post-K Computer, and future exascale systems]

Tier-1 and Tier-2 supercomputers move forward to Exascale computing like two wheels

JCAHPC

• Joint Center for Advanced High Performance Computing (http://jcahpc.jp)
  • Director: Hiroshi Nakamura @ ITC, U-Tokyo
• Established in 2013 under an agreement between
  • Information Technology Center (ITC) at The University of Tokyo
  • Center for Computational Sciences (CCS) at University of Tsukuba
• Design, operate and manage a next-generation supercomputer system for researchers
  → community of advanced HPC research


Procurement Policy of JCAHPC

• Joint procurement by the two universities
  • Uniform specification, single shared system
  • Each university is financially responsible for introducing and operating its share of the machine
  • First attempt of this kind in Japan
• The largest class of budget among national universities' supercomputers in Japan
  → Oakforest-PACS: the largest-scale system in Japan
• Investment ratio: U. Tokyo : U. Tsukuba = 2 : 1


Oakforest-PACS

• Full operation started in Dec. 2016
• Official program started in April 2017
• 25 PFLOPS peak
• 8,208 KNL CPUs
• Fat-tree (full bisection bandwidth) by Omni-Path
• HPL 13.55 PFLOPS: #1 in Japan (2017/6); worldwide #6 (2016/11) → #7 (2017/6)
• HPCG: worldwide #3 (2016/11) → #5 (2017/6)

[Chart: HPCG results as of Nov. 2016]

Location of Oakforest-PACS: Kashiwa Campus of U. Tokyo

[Map showing the Hongo Campus of U. Tokyo, the Kashiwa Campus of U. Tokyo, and Univ. of Tsukuba]

Oakforest-PACS in the Room


2nd floor of Kashiwa Research Complex

http://news.mynavi.jp/news/2016/12/02/035/

Specification of Oakforest-PACS

• Total peak performance: 25 PFLOPS
• Total number of compute nodes: 8,208
• Compute node
  • Product: Fujitsu PRIMERGY CX600 M1 (2U) + CX1640 M1 x 8 nodes
  • Processor: Intel® Xeon Phi™ 7250 (code name: Knights Landing), 68 cores, 1.4 GHz
  • Memory: high BW 16 GB, 490 GB/sec (MCDRAM, effective rate); low BW 96 GB, 115.2 GB/sec (peak rate); see the C sketch after this list
• Interconnect
  • Product: Intel® Omni-Path Architecture
  • Link speed: 100 Gbps
  • Topology: Fat-tree with (completely) full bisection bandwidth
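
The two memory levels above behave quite differently: the 16 GB MCDRAM delivers roughly four times the bandwidth of the 96 GB DDR4. As a minimal sketch (not from the slides) of how an application could keep a bandwidth-critical array in MCDRAM when a node is configured in flat memory mode, the following C code uses the memkind/hbwmalloc API; the availability of libmemkind on Oakforest-PACS is an assumption, since it is not listed in the software table later in the talk.

/* Minimal sketch (assumptions: flat memory mode, libmemkind installed):
 * place a bandwidth-critical array into KNL's 16 GB MCDRAM via hbwmalloc.
 * Possible compile line: gcc hbw_demo.c -lmemkind -o hbw_demo */
#include <stdio.h>
#include <stdlib.h>
#include <hbwmalloc.h>

int main(void)
{
    const size_t n = 1u << 24;               /* 16M doubles = 128 MB */

    /* hbw_check_available() returns 0 when MCDRAM is visible as
     * high-bandwidth memory (i.e., the node runs in flat mode). */
    if (hbw_check_available() != 0)
        fprintf(stderr, "warning: MCDRAM not visible; allocation may fall back to DDR4\n");

    double *a = hbw_malloc(n * sizeof(double));   /* allocate from MCDRAM */
    if (a == NULL) {
        perror("hbw_malloc");
        return EXIT_FAILURE;
    }

    for (size_t i = 0; i < n; i++)           /* stream through the array */
        a[i] = 2.0 * (double)i;

    printf("last element = %.1f\n", a[n - 1]);
    hbw_free(a);                              /* release the MCDRAM block */
    return EXIT_SUCCESS;
}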

Computation node & chassis

• Computation node (Fujitsu next-generation PRIMERGY) with a single-chip Intel Xeon Phi (Knights Landing, 3+ TFLOPS) and an Intel Omni-Path Architecture card (100 Gbps)
• Chassis with 8 nodes, 2U size
• Water cooling wheel & pipe

Rack

• 15 chassis with 120 nodes per rack
• Rear-panel radiator and water cooling pipe

Full bisection bandwidth Fat-tree by Intel® Omni-Path Architecture


[Network diagram (source: Intel)]
• 12 Director Switches (768 ports each)
• 362 Edge Switches (48 ports each), with 24 uplinks and 24 downlinks per edge switch
• Connected nodes: Compute 8,208; Login 20; Parallel FS 64; IME 300; Mgmt. etc. 8; Total 8,600

• All nodes are connected with the FBB fat-tree
  • Globally full bisection bandwidth is preferable for flexible job management
• 2/3 of the system: University of Tokyo
• 1/3 of the system: University of Tsukuba
  • But job assignment is flexible (no boundary)

Specification of Oakforest-PACS (I/O)


• Parallel file system
  • Type: Lustre File System
  • Total capacity: 26.2 PB
  • Product: DataDirect Networks SFA14KE
  • Aggregate BW: 500 GB/sec
• File cache system
  • Type: burst buffer, Infinite Memory Engine (by DDN)
  • Total capacity: 940 TB (NVMe SSD, including parity data for erasure coding)
  • Product: DataDirect Networks IME14K
  • Aggregate BW: 1,560 GB/sec
• Power consumption: 4.2 MW (including cooling); ~3.0 MW in practice
• Number of racks: 102

Software of Oakforest-PACS


• OS: CentOS 7 and McKernel (compute nodes); Red Hat Enterprise Linux 7 (login nodes)
• Compiler: gcc, Intel compiler (C, C++, Fortran), XcalableMP
• MPI: Intel MPI, MVAPICH2 (see the sketch after this list)
• Library: Intel MKL, LAPACK, FFTW, SuperLU, PETSc, METIS, Scotch, ScaLAPACK, GNU Scientific Library, NetCDF, Parallel netCDF, Xabclib, ppOpen-HPC, ppOpen-AT, MassiveThreads
• Application: mpijava, XcalableMP, OpenFOAM, ABINIT-MP, PHASE system, FrontFlow/blue, FrontISTR, REVOCAP, OpenMX, xTAPP, AkaiKKR, MODYLAS, ALPS, feram, GROMACS, BLAST, R packages, Bioconductor, BioPerl, BioRuby
• Distributed FS: Globus Toolkit, Gfarm
• Job scheduler: Fujitsu Technical Computing Suite
• Debugger: Allinea DDT
• Profiler: Intel VTune Amplifier, Trace Analyzer & Collector
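
As a hedged illustration of how applications typically combine the compilers and MPI libraries above on the 68-core KNL nodes, here is a minimal hybrid MPI + OpenMP sketch (not from the slides); the compile line is an assumption about the Intel toolchain, and submission through the Fujitsu Technical Computing Suite scheduler is not shown.

/* Hybrid MPI + OpenMP hello (illustrative sketch).
 * On a 68-core KNL node one typically runs a few MPI ranks per node,
 * each with many OpenMP threads.
 * Possible compile line with the Intel toolchain (assumption):
 *   mpiicc -qopenmp -xMIC-AVX512 hello_hybrid.c -o hello_hybrid */
#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv)
{
    int provided, rank, size;

    /* Request FUNNELED support: only the master thread calls MPI. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    #pragma omp parallel
    {
        #pragma omp master
        printf("rank %d of %d runs %d OpenMP threads\n",
               rank, size, omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}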

Post-K Computer and Oakforest-PACS as the two wheels of HPCI in Japan

• Oakforest-PACS fills the blank period between the K Computer and the Post-K Computer
  • Installation of the Post-K Computer is planned for 2020-2021
  • Shutdown of the K Computer is planned for 2018-2019 (?)
• System software on Oakforest-PACS is developed for Post-K
  • McKernel
    • OS for the many-core era: a large number of thin cores, without OS jitter, with core binding
    • Primary OS (based on Linux) on Post-K; application development goes ahead
  • XcalableMP (XMP)
    • Parallel programming language for directive-based, easy coding on distributed-memory systems
    • Unlike explicit message passing with MPI (see the sketch after this list)
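
For flavor, a minimal global-view sketch in XMP's C binding might look like the following; this is an illustrative example, not taken from the talk, the node count and array size are arbitrary, and the directive syntax should be checked against the XcalableMP specification.

/* Illustrative XcalableMP sketch (C binding) of the global-view,
 * directive-based style; built with an XMP compiler such as Omni's xmpcc.
 * The syntax follows the XMP specification as I understand it. */
#include <stdio.h>

#define N 1024

#pragma xmp nodes p[4]                  /* execute on 4 nodes              */
#pragma xmp template t[N]               /* virtual global index space      */
#pragma xmp distribute t[block] onto p  /* block-distribute t over p       */

double a[N];
#pragma xmp align a[i] with t[i]        /* distribute a[] along template t */

int main(void)
{
    int i;
    double sum = 0.0;

    /* Each node executes only its own block of iterations; the reduction
     * clause combines the partial sums. No explicit MPI calls appear. */
    #pragma xmp loop on t[i] reduction(+:sum)
    for (i = 0; i < N; i++) {
        a[i] = 0.5 * i;
        sum += a[i];
    }

    /* Let the first node print the reduced result. */
    #pragma xmp task on p[0]
    {
        printf("sum = %f\n", sum);
    }

    return 0;
}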


Oakforest-PACS resource sharing programs (nation-wide)

• As JCAHPC (20%)
  • HPCI: the HPC Infrastructure program in Japan to share all supercomputers (free!)
  • Big challenge special use (full system size)
• As U. Tokyo (56.7%)
  • Interdisciplinary Joint Research Program
  • General use
  • Industrial trial use
  • Educational use
  • Young & Female special use
• As U. Tsukuba (23.3%)
  • Interdisciplinary Academic Program
  • Large scale general use


Applications on Oakforest-PACS

• ARTED (SALMON) – Electron Dynamics

• Lattice QCD – Quantum Chromodynamics

• NICAM & COCO – Atmosphere & Ocean Coupling

• GHYDRA – Earthquake Simulations

• Seism3D – Seismic Wave Propagation


Summary

• JCAHPC: a joint resource center for advanced HPC, run by Univ. of Tokyo and Univ. of Tsukuba
  • for the community of advanced HPC research
• Oakforest-PACS is currently the #1 supercomputer in Japan and is available through nation-wide resource sharing programs
• Oakforest-PACS and Post-K: the two wheels of HPCI
  • Oakforest-PACS is also a testbed for McKernel and XcalableMP, system software that supports Post-K development
• Full-system-scale applications are under development at extreme scale and are producing new results
  • fundamental physics, global science, disaster simulation, material science, etc.
