20
FLAGSHIP 2020 Project: the Post-K and ARM SVE Mitsuhisa Sato Team Leader of Architecture Development Team FLAGSHIP 2020 project RIKEN Advance Institute of Computational Science (AICS) International Symposium: New Horizons of Computational Science with Heterogeneous ManyCore Processors February 2728, 2018

FLAGSHIP 2020 Project: the Post-K and ARM SVE...FLAGSHIP 2020 Project: the Post-K and ARM SVE Mitsuhisa Sato Team Leader of Architecture Development TeamFLAGSHIP 2020 project RIKEN

  • Upload
    others

  • View
    7

  • Download
    0

Embed Size (px)

Citation preview

Page 1: FLAGSHIP 2020 Project: the Post-K and ARM SVE...FLAGSHIP 2020 Project: the Post-K and ARM SVE Mitsuhisa Sato Team Leader of Architecture Development TeamFLAGSHIP 2020 project RIKEN

FLAGSHIP 2020 Project: the Post-K and ARM SVE

Mitsuhisa Sato Team Leader of Architecture Development Team

FLAGSHIP 2020 projectRIKEN Advance Institute of Computational Science (AICS)

International Symposium: New Horizons of Computational Science with Heterogeneous Many‐Core ProcessorsFebruary 27‐28, 2018

Page 2: FLAGSHIP 2020 Project: the Post-K and ARM SVE...FLAGSHIP 2020 Project: the Post-K and ARM SVE Mitsuhisa Sato Team Leader of Architecture Development TeamFLAGSHIP 2020 project RIKEN

FLAGSHIP2020 Project

20018/02/27

LoginServersLoginServers

MaitenanceServers

MaitenanceServers

I/O NetworkI/O Network

………

……………………… Hierarchical

Storage SystemHierarchical

Storage System

PortalServersPortalServers

Missions• Building the Japanese national flagship supercomputer, 

post K, and• Developing wide range of HPC applications, running on 

post K, in order to solve social and science issues in Japan

Project organization• Post K Computer development

• RIKEN AICS is in charge of development• Fujitsu is vendor partner.• International collaborations: DOE, JLESC, ..

• Applications• The government selected 9 social & 

scientific priority issues and their R&D organizations.

Status and Update• “Basic Design” was finalized and now in 

“Design and Implementation” phase.• We have decided to choose ARM v8 with 

SVE as ISA for post‐K manycore processor.• We are working on detail evaluation by 

simulators and compilers

2

Page 3: FLAGSHIP 2020 Project: the Post-K and ARM SVE...FLAGSHIP 2020 Project: the Post-K and ARM SVE Mitsuhisa Sato Team Leader of Architecture Development TeamFLAGSHIP 2020 project RIKEN

Target science: 9 Priority Issues重点課題① ⽣体分⼦システムの機能制御による ⾰新的創薬基盤の構築①Innovative Drug Discovery

RIKEN Quant. Biology Center

重点課題② 個別化・予防医療を⽀援する 統合計算⽣命科学②Personalized and Preventive Medicine

Inst. Medical Science, U. Tokyo

重点課題③ 地震・津波による複合災害の統合的予測システムの構築③Hazard and Disaster induced by

Earthquake and Tsunami

Earthquake Res. Inst., U. Tokyo

重点課題④ 観測ビッグデータを活⽤した気象と地球環境の予測の⾼度化④Environmental Predictions

with Observational Big Data

Center for Earth Info., JAMSTEC

重点課題⑥ ⾰新的クリーンエネルギー システムの実⽤化⑥Innovative Clean Energy Systems

Grad. Sch. Engineering, U. Tokyo

重点課題⑦ 次世代の産業を⽀える新機能デバイス・⾼性能材料の創成⑦New Functional Devices and High-Performance

Inst. For Solid State Phys., U. Tokyo

重点課題⑧ 近未来型ものづくりを先導する ⾰新的設計・製造プロセスの開発⑧ Innovative Design and Production Processes for the Manufacturing Industry in the

Near Future

Cent. for Earth Info., JAMSTEC重点課題⑤ エネルギーの⾼効率な創出、変換・貯蔵、利⽤の新規基盤技術の開発⑤High-Efficiency Energy Creation,

Conversion/Storage and Use

Inst. Molecular Science, NINS

重点課題⑨ 宇宙の基本法則と進化の解明⑨Fundamental Laws and Evolution of the Universe

Cent. for Comp. Science, U. Tsukuba

Society with health and longevity

Disaster prevention and global climate

Energy issues

Industrial competitiveness

Basic science

20018/02/27 3

Page 4: FLAGSHIP 2020 Project: the Post-K and ARM SVE...FLAGSHIP 2020 Project: the Post-K and ARM SVE Mitsuhisa Sato Team Leader of Architecture Development TeamFLAGSHIP 2020 project RIKEN

Target science: Exploratory Issues萌芽的課題⑪ 複数の社会経済現象の相互作⽤の

モデル構築とその応⽤研究

萌芽的課題⑩ 基礎科学のフロンティア- 極限への挑戦Frontiers of Basic Science -

challenge to extremes -

Interactive Models of Socio-Economic Phenomena and their

Applications

萌芽的課題⑫ 太陽系外惑星(第⼆の地球)の誕⽣と太陽系内惑星環境変動の解明Formation of exo-planets (second

Earth) and Environmental Changes of Solar Planets

萌芽的課題⑬ 思考を実現する神経回路機構の解明と⼈⼯知能への応⽤Mechanisms of Neural Circuits

for Human Thoughts and Artificial Intelligence

Projects (more than 10 teams) were selected in Jun 2016

20018/02/27 4

Page 5: FLAGSHIP 2020 Project: the Post-K and ARM SVE...FLAGSHIP 2020 Project: the Post-K and ARM SVE Mitsuhisa Sato Team Leader of Architecture Development TeamFLAGSHIP 2020 project RIKEN

Co-design in basic design phase

Target ApplicationsArchitectural Parameters• #SIMD, SIMD length, #core,

#NUMA node• cache (size and bandwidth)• memory technologies• specialized hardware• Interconnect• I/O network

5

Page 6: FLAGSHIP 2020 Project: the Post-K and ARM SVE...FLAGSHIP 2020 Project: the Post-K and ARM SVE Mitsuhisa Sato Team Leader of Architecture Development TeamFLAGSHIP 2020 project RIKEN

Target Applicationsʼ Characteristics

Target Application

Program Brief description Co-design① GENESIS MD for proteins Collective comm. (all-to-all), Floating point perf (FPP)

② Genomon Genome processing (Genome alignment) File I/O, Integer Perf.

③ GAMERA Earthquake simulator (FEM in unstructured & structured grid) Comm., Memory bandwidth

④ NICAM+LETKWeather prediction system using Big data (structured grid stencil & ensemble Kalmanfilter)

Comm., Memory bandwidth, File I/O, SIMD

⑤ NTChem molecular electronic (structure calculation) Collective comm. (all-to-all, allreduce), FPP, SIMD,

⑥ FFB Large Eddy Simulation (unstructured grid) Comm., Memory bandwidth,

⑦ RSDFT an ab-initio program (density functional theory) Collective comm. (bcast), FPP

⑧ AdventureComputational Mechanics System for Large Scale Analysis and Design (unstructured grid)

Comm., Memory bandwidth, SIMD

⑨ CCS-QCD Lattice QCD simulation (structured grid Monte Carlo) Comm., Memory bandwidth, Collective comm. (allreduce)

6

Page 7: FLAGSHIP 2020 Project: the Post-K and ARM SVE...FLAGSHIP 2020 Project: the Post-K and ARM SVE Mitsuhisa Sato Team Leader of Architecture Development TeamFLAGSHIP 2020 project RIKEN

Co-design in basic design phase

Architectural Parameters• #SIMD, SIMD length, #core, #NUMA node• cache (size and bandwidth)• memory technologies• specialized hardware• Interconnect• I/O network

Target Applications

Mutual understanding bothcomputer architecture/system software and applications

Looking at performance predictions Finding out the best solution with constraints, e.g., power

consumption, budget, and space

Prediction of node-level performance

Profiling applications, e.g., cache misses and execution unit usages

Prediction Tool

Prediction of scalability(Communication cost)

7

Page 8: FLAGSHIP 2020 Project: the Post-K and ARM SVE...FLAGSHIP 2020 Project: the Post-K and ARM SVE Mitsuhisa Sato Team Leader of Architecture Development TeamFLAGSHIP 2020 project RIKEN

R&D Organization

8

• DOE‐MEXT• JLESC• CEA• …

International Collaboration

• HPCI Consortium• PC Cluster Consortium• OpenHPC• …

Communities

• Univ. of Tsukuba• Univ. of Tokyo• Univ. of Kyoto

Domestic Collaboration

Page 9: FLAGSHIP 2020 Project: the Post-K and ARM SVE...FLAGSHIP 2020 Project: the Post-K and ARM SVE Mitsuhisa Sato Team Leader of Architecture Development TeamFLAGSHIP 2020 project RIKEN

International Collaborations

DOE-MEXT Optimized Memory Management, Efficient MPI for exascale, Dynamic Execution

Runtime, Storage Architectures, Metadata and active storage, Storage as a Service, Parallel I/O Libraries, MiniApps for Exascale CoDesign, Performance Models for Proxy Apps, OpenMP/XMP Runtime, Programming Models for Heterogeneity, LLVM for vectorization, Power Monitoring and Control, Power Steering, Resilience API, Shared Fault Data, etc.

JLESC: Joint Laboratory on Extreme Scale Computing (NCSA, ANL, UTK, JSC, BSC, INRIA, RIKEN) Simplified Sustained System performance benchmark , Developer tools for porting

and tuning parallel applications on extreme-scale parallel systems,. HPC libraries for solving dense symmetric eigenvalue problems,.DTF+SZ: Using lossy data compression for direct data transfer in multi-component applications, Process-in-Process: Techniques for Practical Address-Space Sharing

CEA Programming Language, Runtime Environment, Energy-aware batch job

scheduler, Large DFT calculations and QM/MM, Application of High Performance Computing to Earthquake Related Issues of Nuclear Power Plant Facilities, KPIs (Key Performance Indicators)

20018/02/27 9

Page 10: FLAGSHIP 2020 Project: the Post-K and ARM SVE...FLAGSHIP 2020 Project: the Post-K and ARM SVE Mitsuhisa Sato Team Leader of Architecture Development TeamFLAGSHIP 2020 project RIKEN

An Overview of post K

Hardware Manycore architecture 6D mesh/torus Interconnect 3-level hierarchical storage system

Silicon Disk Magnetic Disk Storage for archive

LoginServersLoginServers

MaintenanceServers

MaintenanceServers

I/O NetworkI/O Network

………

……………………… Hierarchical

Storage SystemHierarchical

Storage System

PortalServersPortalServers

System Software Multi-Kernel: Linux with Light-weight Kernel File I/O middleware for 3-level hierarchical storage

system and application Application-oriented file I/O middleware MPI+OpenMP programming environment Highly productive programming language and libraries

20018/02/27

MC‐kernel: a lightweightKernel for manycore

XcalableMP PGAS language FPDS DSL

10

Page 11: FLAGSHIP 2020 Project: the Post-K and ARM SVE...FLAGSHIP 2020 Project: the Post-K and ARM SVE Mitsuhisa Sato Team Leader of Architecture Development TeamFLAGSHIP 2020 project RIKEN

SVE registers

20018/02/27 11

Page 12: FLAGSHIP 2020 Project: the Post-K and ARM SVE...FLAGSHIP 2020 Project: the Post-K and ARM SVE Mitsuhisa Sato Team Leader of Architecture Development TeamFLAGSHIP 2020 project RIKEN

SVE example

Compact code for SVE as scalar loop OpenMP SIMD directive is expected to help the SVE programming

20018/02/27 12

Page 13: FLAGSHIP 2020 Project: the Post-K and ARM SVE...FLAGSHIP 2020 Project: the Post-K and ARM SVE Mitsuhisa Sato Team Leader of Architecture Development TeamFLAGSHIP 2020 project RIKEN

Post-K: Powered by Fujitsu-designed CPU & Tofu Fujitsu CPU cores support the ARM SVE instruction set architecture

Post-K Fujitsu CPU cores & Tofu maintain the programming models and provide high application performance

ARM’s standard frameworks (SBSA, etc.) assure compatibility among platforms FP16

Functions & architecturePost-K FX100 FX10 K

CPU Core

Instruction set architecture ARMv8-A SPARC V9

SIMD width 512bit 256bit 128bit 128bit

Double precision (64bit) ✔ ✔ ✔ ✔

Single precision (32bit) ✔ ✔ ✔ ✔

Half precision (16bit) ✔ - - -

Interconnect Tofu interconnect Enhanced Tofu2 Tofu Tofu13

Page 14: FLAGSHIP 2020 Project: the Post-K and ARM SVE...FLAGSHIP 2020 Project: the Post-K and ARM SVE Mitsuhisa Sato Team Leader of Architecture Development TeamFLAGSHIP 2020 project RIKEN

Post-K Supports a New Data Type: FP16

FP16 is one of the key features for further optimization opportunities regarding hardware, system software, and applications

•Power consumption

•Bandwidth

•Cost

Reducing data bits & their movement• Smaller data types resolve, directlyReducing data bits & their movement• Smaller data types resolve, directly

Packed vectors of:• Half precision FP16 elements

• 8 & 16 bit fixed-point elements

Packed vectors of:• Half precision FP16 elements

• 8 & 16 bit fixed-point elements

•Existing numerical apps

•Brand-new apps,

Incl. deep learning

Mixed precision: Linear Algebra

Relaxed precision: Molecular Dynamics

Half precision: Deep Learning

Mixed precision: Linear Algebra

Relaxed precision: Molecular Dynamics

Half precision: Deep Learning

Post-K CPU will supportGPUs, Xeon Phi and Fujitsu

processors will support: new

data types

Challenges

Applications

Modern Processors Post-KPost-K

14

Page 15: FLAGSHIP 2020 Project: the Post-K and ARM SVE...FLAGSHIP 2020 Project: the Post-K and ARM SVE Mitsuhisa Sato Team Leader of Architecture Development TeamFLAGSHIP 2020 project RIKEN

Programming Lang. & Compiler (Fujitsu) Fortran2008&Fortran2018 subset C11&GNU extension・Clang extension

spec C++14&C++17subset&GNU extension

spec ・Clang extension spec. OpenMP 4.5 &OpenMP 5.0 subset Java

Parallel Programming Lang.& Domain Specific Library (RIKEN) XcalableMP PGAS lang. incl CAF FDPS (Framework for Developing Particle

Simulator)

Process・Thread Libs (RIKEN)

PiP (Process in Process)

Script Lang. Several kinds of Script languages

packaged in Linux distribution will be available. Ex.︓Python+NumPy, SciPy

Communication Libs. MPI 3.1&MPI4.0 subset

Open MPI –based (Fujitsu)、MPICH (RIKEN)

Low-level comm. Libs uTofu(Fujitsu)、LLC(RIKEN)

File I/O Libs (RIKEN) pnetCDF, DTF, FTAR

Math Libs BLAS, LAPACK, ScaLAPACK, SSL II

(Fujitsu) EigenEXA, Batched BLAS (RIKEN)

Prog. Development tools (Fujitsu) Profiler, debugger, GUI tools

Summary of Programming Environment & Tools on “PostK”

GCC and LLVM (by Arm) will be available.

1520018/02/27

Page 16: FLAGSHIP 2020 Project: the Post-K and ARM SVE...FLAGSHIP 2020 Project: the Post-K and ARM SVE Mitsuhisa Sato Team Leader of Architecture Development TeamFLAGSHIP 2020 project RIKEN

Summary of Programming Environment & Tools on “PostK”

Batch job scheduler (Fujitsu) Technical Computing Suite

Enhanced version of “K computer” batch job scheduler Open-source Libs, Tools, and Apps.

• OSS will be available delivered by Red Hat Distribution (CentOS or RHEL for Arm)

• Currently, we are asking to the users of the K computer what kinds of OSS will be requested.

OS kernels for compute-node.• Linux (RHEL distribution)• McKernel Light-weight Kernel (RIKEN)

1620018/02/27

Page 17: FLAGSHIP 2020 Project: the Post-K and ARM SVE...FLAGSHIP 2020 Project: the Post-K and ARM SVE Mitsuhisa Sato Team Leader of Architecture Development TeamFLAGSHIP 2020 project RIKEN

“PostK” performance evaluation environment

RIKEN is constructing “PostK” performance evaluation environment for application programmers to evaluate and estimate the performance of their applications on “PostK” and for performance turning for “postK”.

The “PostK” performance evaluation environment is available on the servers installed in RIKEN. The environment includes the following tools and servers: A small-scale FX100 system and “postK” performance estimation tool:

The estimation tool gives the performance estimation of multithreaded programs on “postK” from the profile data taken on FX100.

“PostK” processor simulator based on GEM-5: “PostK” processor simulator will give a detail performance results including estimated executing time, cache-miss, the number of instruction executed in O3. The user can understand how the compiled code for SVE is executed on “postK” processor for optimization. (Arm released GEM-5 beta0 of SVE)FP16 SVE will be available soon.

Compilers for “PostK” processor Fujitsu Compilers︓Fortran, C, C++. Fully-tuning for “postK” architecture. Arm Compiler︓LLVM-based compiler to generate code forArmv8-A + SV. C,C++ by Clang, Fortran

by Flang

SVE emulator on Arm server, developed by Arm for fast SVE code execution. Arm Severs (Planned 2Q/2018)

1720018/02/27

Page 18: FLAGSHIP 2020 Project: the Post-K and ARM SVE...FLAGSHIP 2020 Project: the Post-K and ARM SVE Mitsuhisa Sato Team Leader of Architecture Development TeamFLAGSHIP 2020 project RIKEN

Proposal for Explicit SIMD Programming OpenMP SIMD Directive SIMD code generation depends on the compiler generated SIMD code may not be optimal

ALIAS SIMD directive programmer gives the SIMD implementation of the target function the directive matches the target scalar function with the SIMD version the compiler uses the SIMD version when vectorizing loops

#pragma omp declare simduchar add_filter(uchar a2, uchar in1, uchar in2) {if (a2 > 0) {ushort temp = (ushort)in1 + (ushort)in2;if (temp > 255) return 255;else return (uchar)temp;}else return in1;}

#pragma omp alias simd to(add_filter)svuint8_t add_filter_acle(svbool_t p, svuint8_t a2,svuint8_t in1, svuint8_t in2) {svuint8_t zero = svdup_n_u8_x(p, 0);svbool_t alpha_mask = svcmpgt_u8(p, a2, zero);svuint8_t temp = svand_u8_z(alpha_mask, in2, in2);return svqadd_u8(in1, temp);}

20018/02/2718

Page 19: FLAGSHIP 2020 Project: the Post-K and ARM SVE...FLAGSHIP 2020 Project: the Post-K and ARM SVE Mitsuhisa Sato Team Leader of Architecture Development TeamFLAGSHIP 2020 project RIKEN

Comparison with ACLEint i = 0;svbool_t p = svwhilelt_b8_s32(i, N);svbool_t tp = svptrue_b32();while (svptest_first(tp, p)) {svuint8_t vin1_r = svld1_u8(p, in1_r+i);svuint8_t vout_r = add_filter_acle(p,vin2_a,vin1_r,vin2_r);svuint8_t vout_g = add_filter_acle(p,vin2_a,vin1_g,vin2_g);svuint8_t vout_b = add_filter_acle(p,vin2_a,vin1_b,vin2_b);svst1_u8(p, out_r+i, vout_r);i += svcntb();p = svwhilelt_b8_s32(i, N);}

Vectorizing Loop inACLE

#pragma omp simdfor (int i = 0; i < N; i++) {out_r[i] = add_filter(in2_a[i], in1_r[i], in2_r[i]);out_g[i] = add_filter(in2_a[i], in1_g[i], in2_g[i]);out_b[i] = add_filter(in2_a[i], in1_b[i], in2_b[i]);}

OpenMP +Explicit SIMD implementation

#pragma omp alias simd to(add_filter)svuint8_t add_filter_acle(svbool_t p, svuint8_t a2,svuint8_t in1, svuint8_t in2) {svuint8_t zero = svdup_n_u8_x(p, 0); . . .

svuint8_t add_filter_acle(svbool_t p, svuint8_t a2,svuint8_t in1, svuint8_t in2) {svuint8_t zero = svdup_n_u8_x(p, 0); . . .

20018/02/2719

Page 20: FLAGSHIP 2020 Project: the Post-K and ARM SVE...FLAGSHIP 2020 Project: the Post-K and ARM SVE Mitsuhisa Sato Team Leader of Architecture Development TeamFLAGSHIP 2020 project RIKEN

Concluding Remarks

We believe that ARM SVE will deliver high-performance and flexible SIMD-vectorization to our “post-K” manycore processor.

We think establishment of “eco-system” for ARM SVE in high-end HPC area very important, and are willing to do collaborations with partners who are interested in ARM SVE, as will as ARMv8. By sharing the experience of compilers and simulators

20018/02/27 20