The Conference on Advancing Analysis & Simulation in Engineering | CAASE20nafems.org/caase20 June 16th – 18th | Virtual Conference
The Effect of HDR InfiniBand and In-Network
Computing on CAE Simulations
HPC-AI Advisory Council
The HPC-AI Advisory Council
• World-wide HPC non-profit organization
• More than 400 member companies / universities / organizations
• Bridges the gap between HPC-AI usage and its potential
• Provides best practices and a support/development center
• Explores future technologies and developments
• Demonstrates leading-edge solutions and technologies
HPC-AI Advisory Council Members
HPC-AI Advisory Council Cluster Center
• The Council operates and manages a cluster center
• Provides free-of-charge access to a variety of compute, network and storage technologies
• Intel, AMD, IBM Power, ARM, NVIDIA and more
• For more information: http://hpcadvisorycouncil.com/cluster_center.php
Multiple Applications Best Practices Published
Data as a Resource
From CPU-Centric to Data-Centric Data Centers
• 20th century (CPU-centric): the CPU handles everything, and the network only transports data
• 21st century (data-centric): network functions and parts of the communication framework (MPI) move from in-CPU computing to in-network computing
SHARP – Scalable Hierarchical Aggregation and Reduction Protocol
• Reliable, scalable, general-purpose primitive
– In-network tree-based aggregation mechanism
– Large number of groups
– Multiple simultaneous outstanding operations
• Applicable to multiple use cases
– HPC applications using MPI / SHMEM
– Distributed machine learning applications
• Scalable high-performance collective offload
– Barrier, Reduce, All-Reduce, Broadcast and more
Topology (Physical Tree)
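The tree-based aggregation idea can be illustrated with a toy sketch (pure Python, not SHARP code): partial results are combined pairwise at each level of the tree, so an allreduce-style sum over N contributors completes in roughly log2(N) aggregation steps instead of N-1 sequential end-host operations.

```python
# Conceptual sketch of in-network tree aggregation (not the SHARP implementation).

def tree_reduce(values):
    """Pairwise tree reduction; returns (result, number of aggregation steps)."""
    steps = 0
    level = list(values)
    while len(level) > 1:
        # Each switch level combines pairs of partial results in parallel.
        level = [level[i] + level[i + 1] if i + 1 < len(level) else level[i]
                 for i in range(0, len(level), 2)]
        steps += 1
    return level[0], steps

result, steps = tree_reduce([1] * 16)   # 16 "hosts", each contributing 1
print(result, steps)                    # 16 in 4 steps (log2 of 16)
```

The offload benefit comes from the switches doing these per-level combines, so host CPUs see a single result rather than participating in every step.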
Micro Benchmark – MPI Allreduce Latency
• Oak Ridge National Laboratory – Coral Summit Supercomputer
SHARP Performance Comparison (Lower is Better)
• SHARP enables up to 4X higher performance (1.5X and 4X gains annotated in the chart)
LSTC LS-DYNA
• LS-DYNA
– A general-purpose structural and fluid analysis simulation software package capable of simulating complex real-world problems
– Developed by the Livermore Software Technology Corporation (LSTC)
• LS-DYNA is used in
– Automotive
– Aerospace
– Construction
– Military
– Manufacturing
– Bioengineering
Benchmark Setup
• OS: CentOS 7.7
• Driver: MLNX_OFED 4.7
• CPU: Intel E5-2697 v4 @ 2.6GHz, dual socket, 16 cores per socket
• Network: InfiniBand HDR100
• LS-DYNA Version: ls-dyna_mpp_s_R11_1_0_x64_centos65_ifort160_avx2_intelmpi-2018
• Input: 3cars_rev02
• IO: RAMFS
• MPI: HPC-X 2.6.0
LS-DYNA 3cars Benchmark Profiling - % of MPI Time
LS-DYNA 3cars Benchmark Profiling – MPI Communications
LS-DYNA 3cars Benchmark Profiling – Message Size
LS-DYNA 3cars Benchmark Profiling – Memory Usage
LS-DYNA 3cars Benchmark – InfiniBand Transport
• The DC (Dynamically Connected) InfiniBand transport draws on a dynamic pool of network resources, which reduces the memory footprint
• DC transport was designed for large-scale supercomputers
• DC transport is shown to provide higher performance
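The memory-footprint argument can be sketched with toy bookkeeping (illustrative only; `ranks_per_node` and `pool_size` are assumed numbers, not measured values): a Reliably Connected (RC) transport keeps one connection per remote peer, so per-node state grows with job size, while DC attaches a small fixed pool of connection objects to peers on demand.

```python
# Toy comparison of per-node connection state: RC grows with the job,
# DC stays roughly constant. Numbers are illustrative assumptions.

def rc_connections_per_node(num_nodes, ranks_per_node=32):
    # RC: every local rank keeps a connection to every remote rank.
    return ranks_per_node * (num_nodes - 1) * ranks_per_node

def dc_connections_per_node(num_nodes, ranks_per_node=32, pool_size=8):
    # DC: each local rank keeps only a small pool, regardless of job size.
    return ranks_per_node * pool_size

for nodes in (4, 64, 1024):
    print(nodes, rc_connections_per_node(nodes), dc_connections_per_node(nodes))
```

With HPC-X, transport selection goes through UCX; the DC transport can typically be requested via the `UCX_TLS` environment variable (e.g. `UCX_TLS=dc_x`), though the exact setting depends on the stack version.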
LS-DYNA Neon_Refined_Revised Benchmark – InfiniBand
Transport
• The DC (Dynamically Connected) InfiniBand transport draws on a dynamic pool of network resources, which reduces the memory footprint
• DC transport was designed for large-scale supercomputers
• DC transport is shown to provide higher performance
LS-DYNA 3cars Benchmark – MPI Libraries
LS-DYNA Neon_Refined_Revised Benchmark – MPI Libraries
OpenFOAM
• An open-source CFD toolbox that can simulate
– Complex fluid flows involving chemical reactions, turbulence and heat transfer
– Solid dynamics
– Electromagnetics
– The pricing of financial options
Benchmark Setup
• OS: CentOS 7.7
• Driver: MLNX_OFED 4.7
• CPU: Intel Xeon Gold 6138 @ 2.00GHz, dual socket, 20 cores per socket
• Network: InfiniBand HDR100 over Single HDR Switch
• OpenFOAM Version: v1912
• Input: MotorBike_160
• IO: Lustre/Local Disk
• MPI: HPC-X 2.6.0/Intel MPI 2019 u7
OpenFOAM Profiling – MPI Time
• The MPI profile shows the types of underlying MPI network communications
– The majority of communications are non-blocking
• The majority of MPI time is spent on non-blocking communications at 32 nodes
– MPI_Waitall (11% of wall time), 8-byte MPI_Recv (1.4%), 1-byte MPI_Recv (0.7%)
– Only 14% of the overall runtime is spent in MPI communications at 32 nodes
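The 14% MPI fraction bounds what a faster interconnect can buy, by Amdahl's argument: even an infinitely fast network could remove at most that share of the runtime. A back-of-the-envelope sketch (only the 14% figure comes from the profile; the speedup factors are hypothetical):

```python
# Amdahl-style bound: overall speedup when only the MPI portion gets faster.

def max_speedup(mpi_fraction, mpi_reduction):
    """Overall speedup if the MPI portion alone is sped up by `mpi_reduction`x."""
    return 1.0 / ((1.0 - mpi_fraction) + mpi_fraction / mpi_reduction)

print(round(max_speedup(0.14, 2.0), 3))           # halving MPI time -> ~1.075x
print(round(max_speedup(0.14, float("inf")), 3))  # eliminating MPI -> ~1.163x
```

This is why interconnect improvements matter most at larger node counts, where the MPI fraction of the runtime grows.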
OpenFOAM Profiling – MPI Communication Topology
• The communication topology shows communication patterns among MPI ranks
• MPI processes communicate mainly with their neighbors, but some other patterns appear as well
[Topology heatmaps at 4, 8, 16 and 32 nodes]
OpenFOAM – MPI Comparison
[Chart annotation: 100% scalability]
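Scalability figures like this are typically computed as parallel efficiency: measured speedup relative to the ideal speedup from adding nodes. A sketch with hypothetical run times (not the measured data):

```python
# Parallel efficiency: actual speedup divided by ideal speedup.
# The timing values below are illustrative, not measured results.

def efficiency(base_nodes, base_time, nodes, time):
    speedup = base_time / time
    ideal = nodes / base_nodes
    return speedup / ideal

# Run time halves with each doubling of nodes -> 100% scalability:
print(efficiency(4, 800.0, 32, 100.0))  # 1.0
```

An efficiency of 1.0 means the job scales perfectly over that node range; values below 1.0 indicate communication or I/O overhead eating into the gains.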
OpenFOAM – IO Comparison
[Chart annotations: 100% scalability; local disk 8% faster than Lustre]
OpenFOAM – AVX Comparison
[Chart annotation: AVX2 3% faster than SSE4.2 (no AVX)]
Summary
• HPC applications such as LS-DYNA and OpenFOAM impose high demands on the cluster interconnect
• Low latency, high data throughput and MPI offload engines deliver higher performance
• A comparison between the different InfiniBand transport services demonstrates the performance advantages of the Dynamically Connected (DC) transport. The DC transport was designed for scalable HPC infrastructures, to enable the usage of a dynamic pool of network resources
• Intel MPI 2019 u7 and HPC-X 2.6 use the same UCX library from the UCF (Unified Communication Framework) consortium, and therefore demonstrate similar performance on both LS-DYNA and OpenFOAM
• Enabling AVX2 for OpenFOAM provided a 3% advantage over SSE4.2 (no AVX)
• Running OpenFOAM against local disk provided an 8% advantage over Lustre
2020 HPC-AI Advisory Council Activities
• Cluster Center and Advanced Technology Center
– Multiple clusters, variety of leading-edge technologies
• 2020 conferences
– USA (Stanford University) – April (online)
– Australia (National Computational Infrastructure – NCI) – September
– HPC China – September
– UK (University of Leicester, DiRAC) – October
– SC China Conference – November
• 2020 competitions
– APAC Annual HPC-AI Competition – May–October (online)
– ISC Annual Student Cluster Competition – June (online)
• For more information: www.hpcadvisorycouncil.com
Thank You!