
The Conference on Advancing Analysis & Simulation in Engineering | CAASE20 | nafems.org/caase20 | June 16th–18th | Virtual Conference

The Effect of HDR InfiniBand and In-Network Computing on CAE Simulations

HPC-AI Advisory Council



The HPC-AI Advisory Council

• World-wide HPC non-profit organization

• More than 400 member companies / universities / organizations

• Bridges the gap between HPC-AI usage and its potential

• Provides best practices and a support/development center

• Explores future technologies and future developments

• Leading edge solutions and technology demonstrations


HPC-AI Advisory Council Members


HPC-AI Advisory Council Cluster Center

• The Council operates and manages a cluster center

• Provides free-of-charge access to a variety of compute, network, and storage technologies

• Intel, AMD, IBM Power, ARM, NVIDIA and more

• For more information: http://hpcadvisorycouncil.com/cluster_center.php

Multiple Applications Best Practices Published



Data as a Resource

• 20th century: CPU-centric data centers – everything revolves around the CPU

• 21st century: data-centric data centers

From CPU-Centric to Data-Centric Data Centers

• In-CPU computing: the workload, network functions, and the communication framework (MPI) all run on the CPU

• In-network computing: network functions and the communication framework are offloaded to the network, leaving the CPU to the workload

SHARP – Scalable Hierarchical Aggregation and Reduction Protocol

• Reliable, scalable, general-purpose primitive

– In-network tree-based aggregation mechanism

– Large number of groups

– Multiple simultaneous outstanding operations

• Applicable to multiple use cases

– HPC applications using MPI / SHMEM

– Distributed machine learning applications

• Scalable high-performance collective offload

– Barrier, Reduce, All-Reduce, Broadcast, and more

Topology (physical tree)
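The tree-based aggregation idea can be sketched in plain Python. This is a conceptual model of in-network reduction (aggregation step counts, not wire protocol), not the actual SHARP implementation:

```python
def flat_reduce(values):
    """CPU-centric model: one root receives and adds every value, O(N) steps."""
    total = 0
    steps = 0
    for v in values:        # one sequential receive+add at the root per rank
        total += v
        steps += 1
    return total, steps

def tree_reduce(values):
    """In-network style model: switches aggregate pairs level by level, O(log N) steps."""
    level, steps = list(values), 0
    while len(level) > 1:
        # each "switch" combines the partial results of its two children
        level = [sum(level[i:i + 2]) for i in range(0, len(level), 2)]
        steps += 1
    return level[0], steps

vals = list(range(1024))    # one contribution per rank
flat_sum, flat_steps = flat_reduce(vals)
tree_sum, tree_steps = tree_reduce(vals)
assert flat_sum == tree_sum
print(flat_steps, tree_steps)   # 1024 vs 10 aggregation levels
```

The same pairwise-combining logic applies to Barrier, Reduce, and All-Reduce; the logarithmic depth is what keeps latency flat as the rank count grows.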


Micro Benchmark – MPI Allreduce Latency

• Oak Ridge National Laboratory – Coral Summit Supercomputer


SHARP Performance Comparison (Lower is Better)

• SHARP Enables 4X Higher Performance

(Chart callouts: 1.5X and 4X)


LSTC LS-DYNA

• LS-DYNA

– A general-purpose structural and fluid analysis simulation software package capable of simulating complex real-world problems

– Developed by the Livermore Software Technology Corporation (LSTC)

• LS-DYNA is used in

– Automotive

– Aerospace

– Construction

– Military

– Manufacturing

– Bioengineering


Benchmark Setup

• OS: CentOS 7.7

• Driver: MLNX_OFED 4.7

• CPU: Intel Xeon E5-2697 v4 @ 2.6 GHz, dual socket, 16 cores per socket

• Network: InfiniBand HDR100

• LS-DYNA Version: ls-dyna_mpp_s_R11_1_0_x64_centos65_ifort160_avx2_intelmpi-2018

• Input: 3cars_rev02

• IO: RAMFS

• MPI: HPC-X 2.6.0


LS-DYNA 3cars Benchmark Profiling - % of MPI Time


LS-DYNA 3cars Benchmark Profiling – MPI Communications


LS-DYNA 3cars Benchmark Profiling – Message Size


LS-DYNA 3cars Benchmark Profiling – Memory Usage


LS-DYNA 3cars Benchmark – InfiniBand Transport

• The DC (Dynamically Connected) InfiniBand transport uses a dynamic pool of network resources, reducing the memory footprint

• The DC transport was designed for large-scale supercomputers

• The DC transport has been shown to provide higher performance
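The memory-footprint argument can be illustrated with back-of-the-envelope queue-pair counting. This is a simplified model (the pool size of 8 per rank is an arbitrary illustrative number, not DC's actual resource accounting):

```python
def rc_queue_pairs(nodes, ppn):
    """Reliable Connected model: every rank keeps a QP to every other rank."""
    ranks = nodes * ppn
    return ranks * (ranks - 1)

def dc_queue_pairs(nodes, ppn, pool_per_rank=8):
    """Dynamically Connected model (simplified): each rank draws from a small
    dynamic pool of contexts whose size is independent of the cluster size."""
    return nodes * ppn * pool_per_rank

# at 32 nodes x 32 ranks, the fully connected model needs ~1M QPs cluster-wide,
# while the dynamic-pool model stays in the thousands
print(rc_queue_pairs(32, 32))   # 1047552
print(dc_queue_pairs(32, 32))   # 8192
```

The quadratic growth of the fully connected model is why a dynamic pool matters at supercomputer scale, while at a handful of nodes the difference is negligible.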


LS-DYNA Neon_Refined_Revised Benchmark – InfiniBand Transport

• The DC (Dynamically Connected) InfiniBand transport uses a dynamic pool of network resources, reducing the memory footprint

• The DC transport was designed for large-scale supercomputers

• The DC transport has been shown to provide higher performance


LS-DYNA 3cars Benchmark – MPI Libraries


LS-DYNA Neon_Refined_Revised Benchmark – MPI Libraries


OpenFOAM

• An open-source CFD toolbox that can simulate

– Complex fluid flows

– Chemical reactions

– Turbulence

– Heat transfer

– Solid dynamics

– Electromagnetics

– The pricing of financial options


Benchmark Setup

• OS: CentOS 7.7

• Driver: MLNX_OFED 4.7

• CPU: Intel Xeon Gold 6138 @ 2.00 GHz, dual socket, 20 cores per socket

• Network: InfiniBand HDR100 over Single HDR Switch

• OpenFOAM Version: v1912

• Input: MotorBike_160

• IO: Lustre/Local Disk

• MPI: HPC-X 2.6.0/Intel MPI 2019 u7


OpenFOAM Profiling – MPI Time

• The MPI profiler shows the types of underlying MPI network communications

– The majority of communications are non-blocking

• The majority of MPI time is spent on non-blocking communication at 32 nodes

– MPI_Waitall (11% of wall time), 8-byte MPI_Recv (1.4%), 1-byte MPI_Recv (0.7%)

– Only 14% of the overall runtime is spent in MPI communication at 32 nodes


OpenFOAM Profiling – MPI Communication Topology

• Communication topology shows communication patterns among MPI ranks

• MPI processes mainly communicate with their neighbors, but some other patterns also appear

(Topology maps shown for 4, 8, 16, and 32 nodes)
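The banded structure in those topology maps comes from domain decomposition: each rank exchanges halo data with its spatial neighbors. A toy 1-D model (the real OpenFOAM decomposition is 3-D and irregular) reproduces the near-diagonal band:

```python
def neighbor_matrix(ranks):
    """1-D halo exchange: each rank talks to rank-1 and rank+1 only."""
    m = [[0] * ranks for _ in range(ranks)]
    for r in range(ranks):
        for nb in (r - 1, r + 1):
            if 0 <= nb < ranks:
                m[r][nb] = 1   # a message is exchanged with this neighbor
    return m

m = neighbor_matrix(8)
assert all(m[r][r] == 0 for r in range(8))   # no self-communication
# nonzeros hug the diagonal, as in the profiled communication topology
print(sum(map(sum, m)))   # 14 directed neighbor links for 8 ranks
```

The off-band entries seen in the real profile would correspond to global operations and non-contiguous partition boundaries, which this 1-D sketch omits.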


OpenFOAM – MPI Comparison


(Chart callout: 100% scalability)


OpenFOAM – IO Comparison


(Chart callouts: 100% scalability; 8% advantage for local disk over Lustre)
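The scalability callout follows the standard strong-scaling arithmetic. A minimal sketch with hypothetical runtimes (the measured times are not reproduced here):

```python
def speedup(t_base, t_n):
    """Speedup of a larger run relative to the base configuration."""
    return t_base / t_n

def efficiency(t_base, t_n, scale):
    """Parallel efficiency: speedup divided by the resource scaling factor.
    A value of 1.0 corresponds to the 100% scalability shown on the chart."""
    return speedup(t_base, t_n) / scale

# hypothetical: 4 nodes take 100 s; 32 nodes (8x the resources) take 12.5 s
print(efficiency(100.0, 12.5, 8))   # 1.0, i.e. perfect (100%) scaling
```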


OpenFOAM – AVX Comparison


(Chart callout: 3% advantage for AVX2)


Summary

• HPC applications such as LS-DYNA and OpenFOAM impose high demands on the cluster interconnect

• Low latency, high data throughput and MPI offload engines deliver higher performance

• A comparison between the different InfiniBand transport services demonstrates the performance advantages of the Dynamically Connected (DC) transport. The DC transport was designed for scalable HPC infrastructures, to enable the usage of a dynamic pool of network resources

• Intel MPI 2019 u7 and HPC-X 2.6 use the same UCX library from the UCF (Unified Communication Framework) consortium, and therefore demonstrate similar performance on both LS-DYNA and OpenFOAM

• Enabling AVX2 for OpenFOAM provided a 3% advantage over SSE4.2 (no AVX)

• Running OpenFOAM from local disk provided an 8% advantage over Lustre


2020 HPC-AI Advisory Council Activities

• Cluster Center and Advanced Technology Center

– Multiple clusters, a variety of leading-edge technologies

• 2020 Conferences

– USA (Stanford University) – April (Online)

– Australia (National Computational Infrastructure – NCI) – September

– HPC China - September

– UK (University of Leicester, DiRAC) – October

– SC China Conference – November

• 2020 Competitions

– APAC Annual HPC-AI Competition – May–October (Online)

– ISC Annual Student Cluster Competition – June (Online)

• For more information

– www.hpcadvisorycouncil.com

– info@hpcadvisorycouncil.com


Thank You!
