CEA and RIKEN AICS Collaboration. Yutaka Ishikawa, RIKEN AICS. First French‐Japanese‐German Workshop on Programming and Computing for Exascale and beyond, 5th April 2017, Tokyo, 16:25 ‐ 16:55

Page 1

CEA and RIKEN AICS Collaboration

Yutaka Ishikawa, RIKEN AICS

First French‐Japanese‐German Workshop on Programming and Computing for Exascale and beyond, 5th April 2017, Tokyo

16:25 ‐ 16:55

Page 2

Outline of Talk

• An Overview of FLAGSHIP 2020 and development of post-K system
• CEA Collaboration
• Concluding Remarks

Page 3

FLAGSHIP2020 Project

[Figure: system diagram showing login servers, maintenance servers, portal servers, the I/O network, and the hierarchical storage system]

Missions
• Building the Japanese national flagship supercomputer, post K, and
• Developing a wide range of HPC applications, running on post K, in order to solve social and scientific issues in Japan

Hardware and System Software
• Post K Computer
• RIKEN AICS is in charge of development
• Fujitsu is the vendor partner

Applications
• 9 high-priority issues from a social and national viewpoint
• Promising creation of world‐leading achievement
• Promising strategic use of post K computer

Page 4

9 Social and scientific priority issues

Category: Priority issues

Life science
① Innovative drug discovery infrastructure through functional control of biomolecular systems
② Integrated computational life science to support personalized and preventive medicine

Disaster prevention and global climate problem
③ Development of integrated simulation systems for hazard and disaster induced by earthquake and tsunami
④ Advancement of meteorological and global environmental predictions utilizing observational “Big Data”

Energy problem
⑤ Development of new fundamental technologies for high-efficiency energy creation, conversion/storage and use
⑥ Accelerated Development of Innovative Clean Energy Systems

Industrial applications
⑦ Creation of new functional devices and high-performance materials to support next-generation industries
⑧ Development of Innovative Design and Production Processes that Lead the Way for the Manufacturing Industry in the Near Future

Basic science
⑨ Elucidation of the fundamental laws and evolution of the universe

Selected from the following point of view:
• High priority issues from a social and national viewpoint
• Promising creation of world‐leading achievement
• Promising strategic use of post K computer

Page 5

An Overview of Co-design in the Post K development

Target Application (program: brief description)
① GENESIS: MD for proteins
② Genomon: Genome processing (Genome alignment)
③ GAMERA: Earthquake simulator (FEM in unstructured & structured grid)
④ NICAM+LETK: Weather prediction system using Big data (structured grid stencil & ensemble Kalman filter)
⑤ NTChem: molecular electronic (structure calculation)
⑥ FFB: Large Eddy Simulation (unstructured grid)
⑦ RSDFT: an ab-initio program (density functional theory)
⑧ Adventure: Computational Mechanics System for Large Scale Analysis and Design (unstructured grid)
⑨ CCS-QCD: Lattice QCD simulation (structured grid Monte Carlo)

Node and Storage Architecture
• #SIMD, SIMD length, #core, #NUMA node
• cache (size and bandwidth)
• network (topologies, latency and bandwidth)
• memory technologies
• specialized hardware
• Node interconnect, I/O network

System Software
• Operating system for many core architecture
• Communication libraries (low level layer, MPI, PGAS)
• File I/O (Asynchronous I/O, buffering/caching)

Programming Environment
• Programming model and languages
• Math libraries, domain‐specific libraries

9 social & scientific priority issues and their R&D organizations have been selected from the following point of view:
• High priority issues from a social and national viewpoint
• Promising creation of world‐leading achievement
• Promising strategic use of post K computer

Page 6

An Overview of post-K

Hardware
• Manycore architecture based on ARM+SVE+Fujitsu's extensions
• 6D mesh/torus Interconnect
• 3-level hierarchical storage system
  – Silicon Disk
  – Magnetic Disk
  – Storage for archive

[Figure: system diagram showing login servers, maintenance servers, portal servers, the I/O network, and the hierarchical storage system]

System Software
• Multi-Kernel: Linux with Light-weight Kernel
• File I/O middleware for 3-level hierarchical storage system and application
• Application-oriented file I/O middleware
• MPI+OpenMP programming environment
• Highly productive programming language and libraries

Page 7

CPU Architecture

• ARMv8-A + SVE (Scalable Vector Extension)
  – FP64/FP32/FP16
• Fujitsu's extensions
  – Inter core barrier
  – Sector cache
  – Hardware prefetch assist

https://developer.arm.com/products/architecture/a-profile/docs

Page 8

Partition resources (CPU cores, memory)
• Full Linux kernel on some cores
  – System daemons and in-situ non-HPC applications
  – Device drivers
• Light-weight kernel (LWK), McKernel, on other cores
  – HPC applications

McKernel developed at RIKEN
• McKernel is a loadable module of Linux
• McKernel supports the Linux API
• McKernel runs on
  – Intel Xeon and Xeon Phi
  – Fujitsu FX10 and FX100 (Experiments)

[Figure: six cores and the memory split into two partitions. The Linux partition runs system daemons and an in-situ non-HPC application on the full kernel (TCP stack, device drivers, VFS, file system drivers, complex memory management, general scheduler). The McKernel partition runs HPC applications on a thin LWK (process/thread management, very simple memory management). Both partitions expose the Linux API (glibc, /sys/, /proc/).]

McKernel is (will be) deployed to the Oakforest‐PACS supercomputer, 25 PF in peak, at JCAHPC organized by U. of Tsukuba and U. of Tokyo. Batch job queues for McKernel have not been turned on.
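The resource partitioning described on this slide can be pictured with a small data-structure sketch. This is only an illustration in plain Python, not IHK/McKernel code: the names `Partition` and `partition_node`, the contiguous core numbering, and the choice to give Linux the lowest-numbered cores are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """Resources owned by one kernel on a node (hypothetical model)."""
    kernel: str       # "Linux" or "McKernel"
    cores: tuple      # core IDs owned exclusively by this kernel
    memory_mib: int   # memory reserved for this kernel


def partition_node(total_cores, linux_cores, total_memory_mib, linux_memory_mib):
    """Split a node into a Linux partition and an LWK partition.

    Assumption: Linux keeps the first `linux_cores` cores and
    `linux_memory_mib` of memory; the LWK receives everything else.
    """
    if not 0 < linux_cores < total_cores:
        raise ValueError("both kernels need at least one core")
    if not 0 < linux_memory_mib < total_memory_mib:
        raise ValueError("both kernels need some memory")
    linux = Partition("Linux", tuple(range(linux_cores)), linux_memory_mib)
    lwk = Partition(
        "McKernel",
        tuple(range(linux_cores, total_cores)),
        total_memory_mib - linux_memory_mib,
    )
    return linux, lwk
```

For instance, `partition_node(6, 2, 16384, 4096)` models a six-core node where two cores serve system daemons under Linux and four cores are left to HPC applications; the 2/4 split and the memory sizes are made-up values.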

Page 9

How to deploy McKernel

• Linux Kernel + Loadable LWK, McKernel
  – Linux Kernel is resident, and daemons for the job scheduler etc. run on Linux
  – McKernel is dynamically reloaded (rebooted) for each application
• No hardware reboot

[Figure: timeline of three jobs. App A, requiring LWK-without-scheduler, is invoked, then finishes. App B, requiring LWK-with-scheduler, is invoked, then finishes. App C, using full Linux capability, is invoked, then finishes.]
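The per-application reload sequence can be sketched as a toy simulation. Nothing here calls real IHK/McKernel tooling: `KernelMode`, `run_jobs`, and the log messages are hypothetical, and unloading the LWK when a job finishes is an assumption about what "reloaded for each application" implies.

```python
from enum import Enum


class KernelMode(Enum):
    """Kernel flavours named on the slide (modelled as plain labels)."""
    LWK_WITHOUT_SCHEDULER = "LWK-without-scheduler"
    LWK_WITH_SCHEDULER = "LWK-with-scheduler"
    FULL_LINUX = "full Linux"


def run_jobs(jobs):
    """Return an event log for running (name, mode) jobs one after another.

    Linux stays resident the whole time; only the LWK is loaded and
    unloaded around each job, so no hardware reboot ever appears.
    """
    log = []
    for name, mode in jobs:
        if mode is KernelMode.FULL_LINUX:
            # Jobs that need full Linux capability use the resident kernel.
            log.append(f"{name}: run on resident Linux")
        else:
            log.append(f"{name}: load {mode.value}")
            log.append(f"{name}: run on McKernel")
            log.append(f"{name}: unload {mode.value}")
        log.append(f"{name}: finish")
    return log
```

Calling `run_jobs` with App A, App B, and App C in the three modes reproduces the order of events in the figure.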

Page 10

FWQ Benchmark

FWQ: Fixed Work Quanta
https://asc.llnl.gov/sequoia/benchmarks

[Figure: FWQ results in two panels, "Linux with isolcpus" and "McKernel"]
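The idea behind a fixed-work-quanta measurement can be sketched in a few lines: run the same amount of work many times and record how long each repetition takes, so that any spread in the timings reflects interference from the system rather than the work itself. The sketch below is a simplified Python illustration, not the LLNL FWQ code; the function names, the work loop, and the slowdown metric are assumptions for the example.

```python
import time


def fixed_work(iterations):
    """One work quantum: a fixed number of integer operations."""
    acc = 0
    for i in range(iterations):
        acc += i & 0xFF
    return acc


def fwq(samples, iterations):
    """Run the same quantum `samples` times; return elapsed ns per run."""
    durations = []
    for _ in range(samples):
        start = time.perf_counter_ns()
        fixed_work(iterations)
        durations.append(time.perf_counter_ns() - start)
    return durations


def noise_summary(durations):
    """Compare every run with the fastest one.

    The fastest run approximates an undisturbed quantum, so the ratio
    max/min indicates how much the slowest quantum was delayed.
    """
    best = min(durations)
    if best <= 0:
        raise ValueError("quantum too short for the timer resolution")
    worst = max(durations)
    return {"min_ns": best, "max_ns": worst, "max_slowdown": worst / best}
```

A call such as `noise_summary(fwq(samples=1000, iterations=100000))` gives a rough noise figure for the current machine. Interpreter overhead dominates in Python, so the numbers are only indicative.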

Page 11

GeoFEM (University of Tokyo)

• ICCG with Additive Schwarz Domain Decomposition, weak scaling
• Up to 18% improvement
• Results using the same binary

[Figure: figure of merit (solved problem size normalized to execution time, axis from 0 to 16) versus number of physical cores (1024 to 128k), comparing Linux and IHK/McKernel]

Acknowledgement: Kengo Nakajima, University of Tokyo, for providing GeoFEM. This result is on the Oakforest‐PACS supercomputer, 25 PF in peak, at JCAHPC organized by U. of Tsukuba and U. of Tokyo
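The figure of merit on this slide is the solved problem size normalized to execution time, and an improvement figure is the relative gain of one configuration over another. A minimal sketch of that arithmetic follows; the function names are hypothetical, and any numbers passed to them in examples are made up, not taken from the measurements.

```python
def figure_of_merit(problem_size, seconds):
    """Solved problem size normalized to execution time."""
    if seconds <= 0:
        raise ValueError("execution time must be positive")
    return problem_size / seconds


def improvement_percent(baseline_fom, candidate_fom):
    """Relative gain of the candidate over the baseline, in percent."""
    if baseline_fom <= 0:
        raise ValueError("baseline figure of merit must be positive")
    return (candidate_fom / baseline_fom - 1.0) * 100.0
```

With made-up timings of 10.0 s and 8.0 s for the same problem size, `improvement_percent` reports a 25% gain for the faster run.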

Page 12

CCS-QCD (University of Tsukuba)

• Lattice quantum chromodynamics code, weak scaling
• Up to 38% improvement
• Results using the same binary

[Figure: MFlop/sec/node (axis from 0 to 10000) versus number of physical cores (1024 to 128k), comparing Linux and IHK/McKernel]

Acknowledgement: Ken’ichi Ishikawa, Hiroshima University, for providing CCS‐QCD. This result is on the Oakforest‐PACS supercomputer, 25 PF in peak, at JCAHPC organized by U. of Tsukuba and U. of Tokyo

Page 13

miniFE (CORAL benchmark suite)

• Conjugate gradient, strong scaling
• Up to 3.5X improvement (Linux falls over)
• Results using the same binary

[Figure: total CG MFlops (axis from 0 to 12000000) versus number of physical cores (1024 to 64k), comparing Linux and IHK/McKernel, with a 3.5X annotation]

This result is on the Oakforest‐PACS supercomputer, 25 PF in peak, at JCAHPC organized by U. of Tsukuba and U. of Tokyo

Page 14

CEA Collaboration

• Programming Language
  – Christophe Calvin, Marc Pérache, Patrick Carribault, Julien Jaeger, Julien Bigot
  – Mitsuhisa Sato, Hitoshi Murai, Jinpil Lee, Atsushi Hori
• Runtime Environment
  – Jacques-Charles Lafoucrière, Gilles Wiber, Yutaka Ishikawa, Masamichi Takagi, Balazs Gerofi, Takahiro Ogura
• Energy-aware batch job scheduler
  – Matthieu Hautreux, Francis Belot, Atsuya Uno
• Large DFT calculations and QM/MM
  – Thierry Deutsch, Luigi Genovese, Takahito Nakajima
• Application of High Performance Computing to Earthquake Related Issues of Nuclear Power Plant Facilities
  – Evelyne Foerster, Gauthier Folzan, Alberto Frau, Muneo Hori, Hiroki Motoyama, Kohei Fujita
• KPIs (Key Performance Indicators)
  – Jean-Philippe Bourgoin, Jean-Philippe Nominé, Didier Juvin, Shigeo Okaya, Miwako Tsuji, Mitsuhisa Sato, Kenji Morishita

Page 15

CEA Collaboration: Programming Language

Collaborators
• Christophe Calvin, Marc Pérache, Patrick Carribault, Julien Jaeger, Julien Bigot
• Mitsuhisa Sato, Hitoshi Murai, Jinpil Lee, Atsushi Hori

Objective and Collaboration Topics
• Supporting productivity for a wide range of applications
• The PGAS (Partitioned Global Address Space) model for next-generation manycore parallel systems provides light-weight one-sided communication and low-overhead synchronization semantics.

Background
• CEA: MPC (MultiProcessor Communications)
• RIKEN: XcalableMP (XMP), PVAS (Partitioned Virtual Address Space), and PIP (Processes in a Process)
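The one-sided access style of the PGAS model can be illustrated with a toy, single-process sketch: one global index space is block-distributed over ranks, and a `put` or `get` locates the owning rank and touches its memory directly, with no matching receive on the owner's side. This is not the XcalableMP or MPC API; the class `PartitionedArray`, its methods, and the block distribution are assumptions chosen for the illustration, and real PGAS runtimes perform these accesses across nodes.

```python
class PartitionedArray:
    """Toy PGAS array: one global index space, block-distributed over ranks."""

    def __init__(self, length, ranks):
        if length <= 0 or ranks <= 0:
            raise ValueError("length and ranks must be positive")
        self.length = length
        self.block = -(-length // ranks)  # ceiling division
        # Each rank owns one contiguous block; trailing ranks may own less.
        self.local = [
            [0] * max(0, min(self.block, length - r * self.block))
            for r in range(ranks)
        ]

    def owner(self, index):
        """Map a global index to (owning rank, offset in its local block)."""
        if not 0 <= index < self.length:
            raise IndexError(index)
        return divmod(index, self.block)

    def put(self, index, value):
        """One-sided write: the caller stores into the owner's memory."""
        rank, offset = self.owner(index)
        self.local[rank][offset] = value

    def get(self, index):
        """One-sided read: the caller loads from the owner's memory."""
        rank, offset = self.owner(index)
        return self.local[rank][offset]
```

For a 10-element array over 4 ranks, global index 9 lives on rank 3 at offset 0, so `put(9, 42)` followed by `get(9)` returns 42 without rank 3 taking part.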

Page 16

CEA Collaboration: Programming Language

[Timeline figure: 2017, 2018, 2019, 2020, with the milestones below]

• XMP available on ATOS/Bull supercomputer
• MPC available on ARM architecture
• MPC as MPI implementation for XMP prototype
• Study on a unified API for inter XMP nodes communication
• List of benchmarks and mini-apps to be evaluated
• Benchmarks implemented with XMP on target architectures
• Benchmarks implemented with XMP-MPC on target architectures
• Benchmarks implemented with integrated environment on target architectures

Page 17

CEA Collaboration: Runtime Environment

Collaborators
• Christophe Calvin, Marc Pérache, Patrick Carribault, Julien Jaeger, Julien Bigot
• Mitsuhisa Sato, Hitoshi Murai, Jinpil Lee, Atsushi Hori

Objective and Collaboration Topics
• Improving (performance) portability of applications
• Defining a standard of the runtime environment settings (including libraries, OS parameters and OS kernels)
• Finding optimal settings in terms of application performance
• Contributing to the OpenHPC community

Background
• CEA: SELFIE (profiling tool) and PCOCC (virtualization tool)
  – EasyBuild, a software build and installation framework, is used to manage open-source packages
• RIKEN: Linux with IHK/McKernel (Light-weight OS Kernel)

OS: Linux
CPU: Intel Xeon, Intel Xeon Phi, ARM
Network: InfiniBand, Omni-Path, Fujitsu Tofu, Bull BXI

Page 18

CEA Collaboration: Runtime Environment

Collaborators
• Christophe Calvin, Marc Pérache, Patrick Carribault, Julien Jaeger, Julien Bigot
• Mitsuhisa Sato, Hitoshi Murai, Jinpil Lee, Atsushi Hori

Objective and Collaboration Topics
• Improving portability of applications with performance
• Defining a standard of the runtime environment settings (including libraries, kernel parameters and kernels)
• Finding optimal settings in terms of application performance

Background
• CEA: SELFIE (profiling tool) and PCOCC (virtualization tool)
• RIKEN: McKernel (Light-weight OS Kernel)

[Timeline figure: 2017, 2018, 2019, 2020, with the milestones below]

• 1st version of configuration standard, libraries, kernel parameters and kernels
• 2nd version of configuration standard, libraries, kernel parameters and kernels
• 3rd version of configuration standard, libraries, kernel parameters and kernels
• 4th version of configuration standard, libraries, kernel parameters and kernels
• CEA tests McKernel on CEA’s machines
• RIKEN investigates EasyBuild
• CEA and RIKEN provide the current user demands

Page 19

Concluding Remarks

• The system software stack for post-K is being designed and implemented by leveraging international collaborations with CEA, DOE Labs, and JLESC (NCSA, INRIA, ANL, BSC, JSC, RIKEN)
• The software stack developed at RIKEN is open source
  – It also runs on Intel Xeon and Xeon Phi
• RIKEN would like to contribute to OpenHPC