Copyright 2019 RIST
Gilles Gouaillardet, gilles@rist.or.jp
Research Organization for Information Science & Technology
From K towards Fugaku, application provisioning on ARM+SVE
November 18th, Arm HPC User Group, SC’19

HPCI and Supercomputer Fugaku
■ HPCI (High Performance Computing Infrastructure), established in 2012 as the national HPC infrastructure, connects the Tier 0 flagship system with part of the Tier 1 systems at major universities and national laboratories over a high-speed academic network (SINET-5).
■ World-class HPCI computing resources, covering a variety of system types, are provided through open calls.

[Map: HPCI sites connected by SINET-5 (100 Gbps). Legend: node, domestic line, international line (Singapore, Los Angeles, Amsterdam).]
   1. RIKEN R-CCS: K computer (©RIKEN)
   2. Hokkaido Univ.: Grand Chariot
   3. Tohoku Univ.: SX-ACE
   4. Univ. of Tsukuba: Cygnus
   5. JCAHPC: Oakforest-PACS
   6. The Univ. of Tokyo: Reedbush-U/H/L
   7. Tokyo Inst. of Tech.: TSUBAME3.0
   8. Nagoya Univ.: FX-100
   9. Kyoto Univ.: Cray XC40
  10. Osaka Univ.: OCTOPUS
  11. Kyushu Univ.: ITO Subsystem B
  12. JAMSTEC: EARTH SIMULATOR
  13. AIST: ABCI
K computer: service ended on Aug. 16, 2019
Fugaku: scheduled to be operational around 2021
As of June 2019

Computing resources allocated at the Public Call in FY 2019
  ● K computer: 1.5 PFlops x Year
  ● Other HPCI machines in total: 14.2 PFlops x Year
  ● Shared Storage: 45 PB

User Support --- as a Part of RIST’s Activities in HPCI
[Diagram: User Selection and Resource Allocation / User Support / Dissemination of Achievements]
Booth #1269
■ RIST’s Roles
  ◆ Registered Institution for Facilities Use Promotion of the “Specific High-speed Computer Facilities” (K computer)
  ◆ Representative for HPCI Operation
■ RIST’s Activities
  First Level Support
  Advanced Level Support
    ◆ User application porting
    ◆ Serial and scalability tuning
    ◆ Application software provisioning
  User training
    ◆ Programming training
    ◆ Seminars and Workshops
We are undertaking user support activities toward Fugaku.

Widely-used Applications in HPCI projects
[Bar chart: number of user reports per application, in chart order]
  Application        User reports
  GROMACS            59
  LAMMPS             44
  MODYLAS            42
  OpenFOAM           31
  FrontFlow/blue     29
  FrontFlow/red      26
  SCALE              25
  OpenMX             25
  NICAM              24
  VASP               24
  GENESIS            24
  LANS3D             24
  FrontISTR          18
  Quantum ESPRESSO   18
  AMBER              18
■ Strong color: OSS/Commercial
➢ Based on user reports publicly available on Jan. 23, 2019
➢ Limited to applications in the HPCI database.

What does ready-to-use mean?
■ Ready-to-use means
  ◆ Pre-installation of the software
  ◆ Documents for utilization
    ● Examples for utilization
    ● Tutorials and seminars
    ● Benchmark information
[Screenshot: HPCI Portal site]

OSS preparation plan towards Fugaku
[Timeline: 2019, 2020, 2021. Fugaku early access precedes the service start (▼). Status: in progress.]
Platforms
  K computer
  Arm server ThunderX2 at RIST
  Post-K Computer Performance Evaluation Environment
    ✓ ARM Instruction Emulator
    ✓ gem5 simulator
  Fugaku (early access)
Activities
  ✓ Examine the version and environment
  ✓ Build and test runs
  ✓ Performance estimation
  ✓ Pre-installation
  ✓ Preparing documents for utilization
  ✓ Porting
  ✓ Test runs
  ✓ Examination for future tuning support
  ✓ Preparation for user support

Call for project proposals: Post-K Computer Performance Evaluation Environment
Estimate your code performance on the future Supercomputer Fugaku!
■ Who are the targets?
  ◆ Potential users of Fugaku
■ What can you do with the Environment?
  ◆ You can obtain the approximate performance of your programs
  ◆ The Environment mainly consists of:
    ● “Processor simulators”
    ● “Performance Estimation Tools” (on FX100)
    ● “Compilers (Fortran, C/C++)” for Fugaku
■ Call for project proposals now open
  ◆ The project period is up to 6 months
  ◆ The call is open throughout the year
  ◆ RIST provides technical support
See HPCI Portal site for more details...
* Post-K = Supercomputer Fugaku

Supercomputer Fugaku vs K
                                      K         Fugaku
  ISA                                 Sparcv9   ARMv8.2+SVE
  NUMA domains                        1         4
  Cores                               8         48
  SIMD Width (bits)                   128       512
  GFlops (double precision)           128       2700
  GFlops (single precision)           128       5400
  GFlops (half precision)             -         10800
  Memory bandwidth per node (GB/s)    64        1000
  Memory bandwidth per core (GB/s)    8         20.8
  Flops/Bytes (double precision)      2         2.7
■ CPU-only architecture with massive memory bandwidth
■ Very well balanced system designed to maximize application performance
■ Familiar software environment to ease transition from K

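The last two rows of the K versus Fugaku comparison are derived from the per-node figures. As a minimal sketch (the struct and function names are illustrative and do not come from the slides), they can be recomputed as follows:

```cpp
#include <cassert>

// Per-node figures copied from the K vs Fugaku comparison table.
struct NodeSpec {
    int cores;
    double gflopsDouble;   // peak double-precision GFlops per node
    double memBandwidth;   // memory bandwidth per node, in GB/s
};

const NodeSpec kComputer{8, 128.0, 64.0};
const NodeSpec fugaku{48, 2700.0, 1000.0};

// Memory bandwidth available to each core, in GB/s.
double bandwidthPerCore(const NodeSpec& s) {
    assert(s.cores > 0);
    return s.memBandwidth / s.cores;
}

// Peak double-precision flops per byte of memory traffic.
double flopsPerByte(const NodeSpec& s) {
    assert(s.memBandwidth > 0.0);
    return s.gflopsDouble / s.memBandwidth;
}
```

For Fugaku this gives 1000 / 48 ≈ 20.8 GB/s per core and 2700 / 1000 = 2.7 Flops/Byte; for K, 64 / 8 = 8 GB/s per core and 128 / 64 = 2 Flops/Byte.
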
Toward Supercomputer Fugaku
■ Challenges and opportunities
  ● New ISA (ARMv8+SVE) and rich software ecosystems
  ● GNU and LLVM compilers (OSS)
  ● Fujitsu and ARM compilers (Commercial)
  ● Fujitsu MPI (Open MPI based) and RIKEN MPI (MPICH based)
■ Multiple collaborations
  ● Engage developers and communities
  ● Exchange information with ARM and Fujitsu application teams
  ● Collaborate with ARM and Fujitsu compiler teams
■ Mainly a multi-variable optimization effort
  ● Compiler
  ● Libraries
  ● Optimization options
■ Validation is an important effort

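The variables listed for the optimization effort multiply quickly. As a rough sketch (the type and function names are hypothetical, and this is not RIST's actual build matrix), every combination of compiler, library and option set is one build that then has to be validated against a reference result:

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// One candidate build of an application.
struct BuildConfig {
    std::string compiler;
    std::string library;
    std::string options;
};

// Enumerate the full cross product of the three variables.
std::vector<BuildConfig> enumerateConfigs(const std::vector<std::string>& compilers,
                                          const std::vector<std::string>& libraries,
                                          const std::vector<std::string>& options) {
    std::vector<BuildConfig> configs;
    for (const auto& c : compilers)
        for (const auto& l : libraries)
            for (const auto& o : options)
                configs.push_back({c, l, o});
    return configs;
}

// Validation: accept a result only if it matches the reference within a
// relative tolerance, since aggressive options may change rounding.
bool matchesReference(double result, double reference, double relTol) {
    assert(relTol >= 0.0);
    return std::fabs(result - reference) <= relTol * std::fabs(reference);
}
```

With the four compiler families named on the slide, two placeholder libraries and three option sets, `enumerateConfigs` returns 4 x 2 x 3 = 24 builds to test, which is why automated validation matters.
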
Porting GROMACS to SVE
■ Highly tuned kernels on several architectures
■ “Explicit” vectorization
■ Heavy use of SIMD intrinsics
■ Only NEON (128-bit vectors) is supported

    #include <arm_neon.h>

    class SimdFloat
    {
    public:
        SimdFloat() {}
        // Converting constructor, needed by the braced return below
        SimdFloat(float32x4_t simd) : simdInternal_(simd) {}

        float32x4_t simdInternal_;   // four floats in one 128-bit NEON register
    };

    // Element-wise addition of two NEON vectors
    static inline SimdFloat operator+(SimdFloat a, SimdFloat b)
    {
        return { vaddq_f32(a.simdInternal_, b.simdInternal_) };
    }

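The point of a wrapper class like `SimdFloat` is that kernels are written once against the wrapper, and each architecture supplies only the few lines that touch intrinsics. The sketch below illustrates that idea and is not GROMACS code: the names `Simd4f`, `load4`, `store4` and `addArrays` are made up for illustration, and a plain-array fallback stands in for architectures without NEON.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__ARM_NEON)
#include <arm_neon.h>

// NEON backend: four floats in one 128-bit register.
struct Simd4f {
    float32x4_t v;
};
static inline Simd4f load4(const float* p) { return {vld1q_f32(p)}; }
static inline void store4(float* p, Simd4f a) { vst1q_f32(p, a.v); }
static inline Simd4f operator+(Simd4f a, Simd4f b) { return {vaddq_f32(a.v, b.v)}; }

#else

// Portable fallback backend exposing the same interface.
struct Simd4f {
    float v[4];
};
static inline Simd4f load4(const float* p) { return {{p[0], p[1], p[2], p[3]}}; }
static inline void store4(float* p, Simd4f a) {
    for (int i = 0; i < 4; ++i) p[i] = a.v[i];
}
static inline Simd4f operator+(Simd4f a, Simd4f b) {
    Simd4f r;
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
    return r;
}

#endif

constexpr std::size_t kSimdWidth = 4;

// Kernel written only against the wrapper: out[i] = a[i] + b[i].
// The fixed width forces an explicit scalar loop for the remainder.
void addArrays(const float* a, const float* b, float* out, std::size_t n) {
    assert(a && b && out);
    std::size_t i = 0;
    for (; i + kSimdWidth <= n; i += kSimdWidth) {
        store4(out + i, load4(a + i) + load4(b + i));
    }
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];   // loop tail
    }
}
```

Because the width is hard-coded to four lanes, supporting SVE means adding a new backend behind the same interface rather than rewriting the kernels.
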
Lessons learned
■ A rich and mature software ecosystem is already available on the ARM architecture
■ ARM Instruction Emulator can be used to check the correctness of SVE binaries on non-SVE architectures
■ gem5 simulator
  ● Can be used to co-design an ARM processor
  ● Helps with preliminary tuning before the real CPU is available
  ● Is orders of magnitude slower than the physical hardware
■ Ideally, applications would feature minimal tests that are
  ● Representative of real application behavior
  ● Fast enough to run on a simulator within a reasonable time
■ ACLE support will soon land in OSS compilers

SVE & VLA
■ Scalable Vector Extension (SVE)
  ● Vendors can implement several vector lengths without changing the ISA
■ Vector Length Agnostic (VLA)
  ● A VLA binary can run on any SVE architecture, regardless of its vector length
  ● Performance should increase with vector length
  ● Ideal when shipping binaries (such as a Linux distro)
■ Virtually all instructions are predicated (i.e. masked operations)
  ● Effective way to build and execute VLA binaries
  ● No more loop tails
  ● Can be a great fit for HPC (e.g. for () { if() } loops)

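A predicated VLA loop can be sketched as follows. When built with an SVE-enabled compiler, the code uses ACLE intrinsics from `arm_sve.h`; elsewhere, a scalar model with an assumed vector length (`kModelLanes`, an arbitrary value chosen for illustration) mimics the same structure. The function name `addArraysVla` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>

// out[i] = a[i] + b[i], for any hardware vector length.
void addArraysVla(const float* a, const float* b, float* out, int64_t n) {
    assert(n >= 0);
    // svcntw() is the number of 32-bit lanes, known only at run time.
    for (int64_t i = 0; i < n; i += static_cast<int64_t>(svcntw())) {
        // Lanes at index >= n are switched off, so the final partial
        // iteration needs no separate tail loop.
        svbool_t pg = svwhilelt_b32(i, n);
        svfloat32_t va = svld1(pg, a + i);
        svfloat32_t vb = svld1(pg, b + i);
        svst1(pg, out + i, svadd_x(pg, va, vb));
    }
}

#else

// Scalar model of the same loop for machines without SVE.
// kModelLanes is an arbitrary stand-in for the hardware vector length.
constexpr int64_t kModelLanes = 16;

void addArraysVla(const float* a, const float* b, float* out, int64_t n) {
    assert(n >= 0);
    for (int64_t i = 0; i < n; i += kModelLanes) {
        for (int64_t lane = 0; lane < kModelLanes; ++lane) {
            const bool active = (i + lane) < n;   // models the whilelt predicate
            if (active) out[i + lane] = a[i + lane] + b[i + lane];
        }
    }
}

#endif
```

In both variants the loop body is written once and the predicate handles the final partial vector, which is what "no more loop tails" refers to.
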
VLA & HPC
■ Generating a VLA binary prevents some compiler optimizations
  ● Loop unrolling is less efficient
  ● Some decisions can only be made at runtime (vs compile time)
  ● Overhead vs a fixed-length binary was estimated at 10% on average
■ Does HPC really require VLA?
  ● Most HPC software does not come in binary format
  ● HPC stack can easily be built from sources (spack)
  ● Download and run containers (singularity)

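One way to picture the trade-off: when the vector length is a compile-time constant, the compiler knows the inner trip count and can fully unroll; when it is a run-time value, as in a VLA binary, that decision is deferred. The sketch below is a scalar illustration only and involves no SVE instructions; `sumFixed` and `sumVla` are hypothetical names. With GCC and Clang, a fixed SVE length is typically requested through the `-msve-vector-bits` option.

```cpp
#include <cassert>
#include <cstddef>

// Fixed-length variant: Lanes is a compile-time constant, so the inner
// loop has a known trip count and can be fully unrolled by the compiler.
template <std::size_t Lanes>
float sumFixed(const float* x, std::size_t n) {
    assert(n % Lanes == 0);   // caller pads the input to a multiple of Lanes
    float acc[Lanes] = {};
    for (std::size_t i = 0; i < n; i += Lanes)
        for (std::size_t l = 0; l < Lanes; ++l)
            acc[l] += x[i + l];
    float total = 0.0f;
    for (std::size_t l = 0; l < Lanes; ++l) total += acc[l];
    return total;
}

// Length-agnostic variant: lanes is only known at run time, so the
// inner trip count and the tail handling are decided during execution.
float sumVla(const float* x, std::size_t n, std::size_t lanes) {
    assert(lanes > 0);
    float total = 0.0f;
    for (std::size_t i = 0; i < n; i += lanes) {
        const std::size_t end = (i + lanes < n) ? i + lanes : n;
        for (std::size_t j = i; j < end; ++j) total += x[j];
    }
    return total;
}
```

Since HPC sites usually rebuild from source for a known machine, the fixed-length variant is often an acceptable choice there, which is the question the slide raises.
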
Summary
■ Supercomputer Fugaku, the next flagship system of HPCI and the successor to the K computer, is scheduled to be operational around 2021.
■ Towards Fugaku, RIST is working to make OSS ready-to-use
■ The software ecosystem is already rich and mature on ARM
■ Useful tools are also available to co-design, port and/or tune software on ARM+SVE processors
■ The call for project proposals for the “Post-K Computer Performance Evaluation Environment” is now open, and RIST provides technical support for these projects.
■ Visit us at our booth #1269