The Effect of InfiniBand In-Network Computing on CAE Simulations · 2020-02-14 ·...

Simulation in the Automotive Industry: Creating the Next Generation Vehicle | nafems.org/americas | November 14th, 2019 | Troy, MI

The Effect of InfiniBand In-Network Computing on CAE Simulations

HPC-AI Advisory Council

October 1st, 2019 | Columbus, OH

The HPC-AI Advisory Council

• World-wide HPC non-profit organization

• More than 400 member companies / universities / organizations

• Bridges the gap between HPC-AI usage and its potential

• Provides best practices and a support/development center

• Explores future technologies and future developments

• Leading edge solutions and technology demonstrations

HPC Advisory Council Members

HPC-AI Advisory Council Cluster Center (Examples)

• Supermicro / Foxconn 32-node cluster

– Dual Socket Intel® Xeon® Gold 6138 CPU @ 2.00GHz

• Dell™ PowerEdge™ R730/R630 36-node cluster

– Dual Socket Intel® Xeon® 16-core CPUs E5-2697A V4 @ 2.60GHz

• AMD Daytona_X

– Dual Socket AMD Rome 128 core 8-node cluster @ 2.25GHz

Multiple Applications Best Practices Published

• Abaqus

• ABySS

• AcuSolve

• Amber

• AMG

• AMR

• ANSYS CFX

• ANSYS FLUENT

• ANSYS Mechanical

• BQCD

• BSMBench

• CAM-SE

• CCSM

• CESM

• COSMO

• CP2K

• CPMD

• Dacapo

• Desmond

• DL-POLY

• Eclipse

• FLOW-3D

• GADGET-2

• Graph500

• GROMACS

• Himeno

• HIT3D

• HOOMD-blue

• HPCC

• HPCG

• HYCOM

• ICON

• Lattice QCD

• LAMMPS

• LS-DYNA

• miniFE

• MILC

• MSC Nastran

• MR Bayes

• MM5

• MPQC

• NAMD

• Nekbone

• NEMO

• NWChem

• Octopus

• OpenAtom

• OpenFOAM

• OpenMX

• OptiStruct

• PARATEC

• PFA

• PFLOTRAN

• Quantum ESPRESSO

• RADIOSS

• SNAP

• SPECFEM3D

• STAR-CCM+

• STAR-CD

• VASP

• VSP

• WRF

HPC-AI Advisory Council Activities

• HPC-AI Advisory Council

– More than 400 members, http://www.hpcadvisorycouncil.com/

– Application best practices, case studies

– Development and benchmarking center with remote access for users

– World-wide conferences

• Conferences

– USA (Stanford University) – February

– Switzerland (CSCS) – April

– Student Cluster Competition (ISC) – July

– China (HPC China) – August

– Australia – August

– UK – September

– China – November

• Competitions

– APAC HPC-AI Competition – March

– China – 6th Annual RDMA Competition – May

– ISC Germany – Annual Student Cluster Competition – June

• For more information

– www.hpcadvisorycouncil.com

– info@hpcadvisorycouncil.com

HPC|Works Community

Computing Evolution – Compute Centric to Data Centric

[Figure: Compute-Centric (Von Neumann Machine) versus Data-Centric (Neural Networks)]

The Need for Intelligent Data Center

CPU-Centric (Onload)

• Move Data to the Compute

• Must Wait for the Data

• Creates Performance Bottlenecks

Data-Centric (Offload)

• Move Compute to the Data

• Analyze Data Everywhere

• Higher Performance and Scale

[Figure: CPU and GPU nodes connected through an Onload Network]

Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)

• Reliable Scalable General Purpose Primitive

– In-network tree-based aggregation mechanism

– Large number of groups

– Multiple simultaneous outstanding operations

• Applicable to Multiple Use-cases

– HPC applications using MPI / SHMEM

– Distributed Machine Learning applications

• Scalable High Performance Collective Offload

– Barrier, Reduce, All-Reduce, Broadcast and more

– Sum, Min, Max, Min-loc, Max-loc, OR, XOR, AND

– Integer and Floating-Point, 16/32/64 bits
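The tree-based aggregation idea listed above can be sketched in a few lines of plain Python. This is a toy model under stated assumptions, not the SHARP protocol or any library API: the function names `aggregate_level` and `allreduce`, the fan-in parameter `radix`, and the idea of modelling each switch as a list reduction are all hypothetical and chosen only for illustration.

```python
from functools import reduce
from typing import Callable, List, Sequence, Tuple

Op = Callable[[float, float], float]

def aggregate_level(values: Sequence[float], radix: int, op: Op) -> List[float]:
    """One tree level: each (hypothetical) switch reduces up to `radix` child values."""
    return [reduce(op, values[i:i + radix]) for i in range(0, len(values), radix)]

def allreduce(host_values: Sequence[float], radix: int = 4,
              op: Op = lambda a, b: a + b) -> Tuple[List[float], int]:
    """Reduce host contributions up a switch tree, then return the result to every host.

    Returns (per-host results, number of switch levels traversed on the way up).
    """
    if not host_values:
        raise ValueError("need at least one host")
    if radix < 2:
        raise ValueError("radix must be at least 2")
    level = list(host_values)
    levels = 0
    while len(level) > 1:
        level = aggregate_level(level, radix, op)  # aggregation happens "in the network"
        levels += 1
    return [level[0]] * len(host_values), levels

# 128 hosts, each contributing its own index; radix-4 switches (assumed fan-in).
hosts = [float(rank) for rank in range(128)]
results, levels = allreduce(hosts, radix=4)
print(results[0], levels)
```

In this model the number of aggregation levels grows with the logarithm of the host count, which is the intuition behind offloading collectives to a switch tree rather than exchanging partial results between hosts.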

[Figure: hosts send data to switches; the switches aggregate the data and return the aggregated result to the hosts]

SHARP AllReduce Performance Advantages (128 Nodes)

SHARP AllReduce Performance Advantages 1500 Nodes, 60K MPI Ranks, Dragonfly+ Topology

The Niagara Supercomputer – University of Toronto

OpenFOAM

• OpenFOAM is an open source CFD toolbox that can simulate

– Complex fluid flows involving chemical reactions, turbulence, and heat transfer

– Solid dynamics

– Electromagnetics

– The pricing of financial options

OpenFOAM Profiling – MPI/User Time Ratio

• The OpenFOAM simpleFoam solver uses mainly non-blocking communications

• 23% of overall runtime is spent on MPI communication at 16 nodes / 640 MPI cores

• Intel MPI and HPC-X spend the same amount of overall runtime on MPI communications

• Most of the MPI time is spent in non-blocking communications (MPI_Waitall 47%, MPI_Isend 47%)

• Most of the MPI calls made by OpenFOAM are MPI_Waitall

OpenFOAM Profiling – MPI Time

• The MPI profiler shows the type of underlying MPI network communications

– The majority of communications are non-blocking

• The majority of the MPI time at 32 nodes is spent on non-blocking communications

– MPI_Waitall (11% wall), 8-byte MPI_Recv (1.4% wall), 1-byte MPI_Recv (0.7% wall)

– Only 14% of the overall runtime is spent on MPI communications at 32 nodes (when EDR is used)
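The wall-time shares above also bound how much a faster network alone could help at this scale. The following is a back-of-the-envelope sketch, assuming (a simplification not stated in the slides) that compute time stays fixed and only the MPI share shrinks; the function name `speedup_if_mpi_reduced` is hypothetical.

```python
def speedup_if_mpi_reduced(mpi_share: float, reduction: float) -> float:
    """Amdahl-style estimate: runtime speedup when the MPI share of wall time
    is cut by `reduction` (0.0 = unchanged, 1.0 = communication is free)."""
    if not 0.0 <= mpi_share < 1.0:
        raise ValueError("mpi_share must be in [0, 1)")
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be in [0, 1]")
    remaining = (1.0 - mpi_share) + mpi_share * (1.0 - reduction)
    return 1.0 / remaining

# Wall-time shares quoted for 32 nodes, in percent of wall time.
listed_calls = {"MPI_Waitall": 11.0, "MPI_Recv (8 bytes)": 1.4, "MPI_Recv (1 byte)": 0.7}
listed_total = sum(listed_calls.values())   # share covered by the three listed calls
other_mpi = 14.0 - listed_total             # remainder of the 14% MPI share

upper_bound = speedup_if_mpi_reduced(0.14, 1.0)  # communication made free
halved = speedup_if_mpi_reduced(0.14, 0.5)       # MPI time cut in half

print(f"listed calls: {listed_total:.1f}% of wall, other MPI: {other_mpi:.1f}%")
print(f"upper bound: {upper_bound:.3f}x, MPI halved: {halved:.3f}x")
```

Under this assumption, even eliminating all communication at 32 nodes would give roughly a 1.16x speedup, so larger gains at other node counts or on other workloads imply a larger communication share there.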

OpenFOAM Profiling – MPI Communication Topology

• Communication topology shows communication patterns among MPI ranks

• MPI processes mainly communicate with neighbors, but some other patterns also appear

[Figure panels: 32 Nodes, 16 Nodes, 8 Nodes, 4 Nodes]
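A communication-topology plot of this kind is essentially a rank-by-rank matrix of traffic. The sketch below builds such a matrix from a list of message records and measures how much traffic stays between adjacent ranks. The record format, the helper names, and the sample traffic are all hypothetical; they are not taken from the profiler or data used in the study, and treating "neighbor" as "adjacent rank number" is a simplifying assumption.

```python
from typing import Iterable, List, Tuple

Message = Tuple[int, int, int]  # (source rank, destination rank, bytes); assumed format

def comm_matrix(messages: Iterable[Message], n_ranks: int) -> List[List[int]]:
    """Accumulate bytes sent from each source rank (row) to each destination rank (column)."""
    matrix = [[0] * n_ranks for _ in range(n_ranks)]
    for src, dst, nbytes in messages:
        matrix[src][dst] += nbytes
    return matrix

def neighbor_fraction(matrix: List[List[int]], distance: int = 1) -> float:
    """Fraction of all bytes exchanged between ranks at most `distance` apart in rank order."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        return 0.0
    near = sum(value
               for src, row in enumerate(matrix)
               for dst, value in enumerate(row)
               if abs(src - dst) <= distance)
    return near / total

# Made-up traffic for 4 ranks: nearest-neighbor exchanges plus one long-range message.
sample = [(0, 1, 800), (1, 0, 800), (1, 2, 800), (2, 1, 800),
          (2, 3, 800), (3, 2, 800), (0, 3, 200)]
matrix = comm_matrix(sample, n_ranks=4)
fraction = neighbor_fraction(matrix)
print(f"{fraction:.2f} of bytes stay between adjacent ranks")
```

A matrix dominated by entries near the diagonal corresponds to the mostly-neighbor pattern described above; off-diagonal entries correspond to the "other patterns".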

OpenFOAM Performance: E5-2697A v4 @ 2.60GHz, HDR100

[Chart: callout 23%]

OpenFOAM Performance: E5-2697A v4 @ 2.60GHz, HDR100

[Chart: callout 50%]

OpenFOAM Performance Using HPC-X 2.5 MPI

[Chart: callout 35%]

LS-DYNA

• LS-DYNA is a general purpose structural and fluid analysis simulation software package capable of simulating complex real world problems

– Developed by the Livermore Software Technology Corporation (LSTC)

• LS-DYNA is used by the following industries

– Automobile

– Aerospace

– Construction

– Military

– Manufacturing

– Bioengineering

2019, 28 - 29 October 35th INTERNATIONAL CAE CONFERENCE AND EXHIBITION

LS-DYNA Performance: Intel Xeon Gold 6138 CPU @ 2.00GHz, HDR100

LS-DYNA Performance: Intel Xeon Gold 6138 CPU @ 2.00GHz, HDR100

[Chart: callout 39%]

ANSYS Fluent

• Computational Fluid Dynamics (CFD)

– Enables the study of the dynamics of things that flow

– Enables a better understanding of qualitative and quantitative physical phenomena in the flow, which is used to improve engineering design

• CFD brings together a number of different disciplines

– Fluid dynamics, mathematical theory of partial differential systems, computational geometry, numerical analysis, and computer science

• ANSYS Fluent is a leading CFD application from ANSYS

– Widely used in almost every industry sector and manufactured product

ANSYS Fluent: E5-2697A v4 @ 2.60GHz, HDR100

[Chart: callout 26%]

ANSYS Fluent: E5-2697A v4 @ 2.60GHz, HDR100

[Chart: callout 15%]

InfiniBand QoS

• IBTA Standard

• Application SL -> VL mapping

• WRR (weighted round robin) / Strict Priority setting
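The two mechanisms in this list, mapping an application's service level (SL) to a virtual lane (VL) and then arbitrating between lanes by weighted round robin or strict priority, can be illustrated with a toy scheduler. This is a simplified model, not the IBTA arbitration algorithm or any vendor configuration: the `SL_TO_VL` table, the weights, the packet labels, and all function names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Sequence

# Hypothetical mapping: service level (SL) chosen by the application -> virtual lane (VL).
SL_TO_VL: Dict[int, int] = {0: 0, 1: 1}

Queues = Dict[int, Deque[str]]

def enqueue(queues: Queues, sl: int, packet: str) -> None:
    """Place a packet on the VL queue that its SL maps to."""
    queues[SL_TO_VL[sl]].append(packet)

def wrr_schedule(queues: Queues, weights: Dict[int, int], n_slots: int) -> List[str]:
    """Weighted round robin: in each round, VL v may send up to weights[v] packets."""
    if any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be positive")
    sent: List[str] = []
    while len(sent) < n_slots and any(queues[vl] for vl in weights):
        for vl, weight in weights.items():
            for _ in range(weight):
                if queues[vl] and len(sent) < n_slots:
                    sent.append(queues[vl].popleft())
    return sent

def strict_priority_schedule(queues: Queues, priority_order: Sequence[int],
                             n_slots: int) -> List[str]:
    """Strict priority: always serve the highest-priority non-empty VL first."""
    sent: List[str] = []
    while len(sent) < n_slots:
        vl = next((v for v in priority_order if queues[v]), None)
        if vl is None:
            break
        sent.append(queues[vl].popleft())
    return sent

def make_queues() -> Queues:
    """Six background packets on SL 0 and three application packets on SL 1."""
    queues: Queues = {0: deque(), 1: deque()}
    for i in range(6):
        enqueue(queues, sl=0, packet=f"bg{i}")
    for i in range(3):
        enqueue(queues, sl=1, packet=f"app{i}")
    return queues

wrr_order = wrr_schedule(make_queues(), weights={1: 2, 0: 1}, n_slots=6)
strict_order = strict_priority_schedule(make_queues(), priority_order=[1, 0], n_slots=6)
print(wrr_order)
print(strict_order)
```

In this model, weighted round robin still gives the background lane a share of every round, whereas strict priority holds background packets back until the application queue is empty.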

QoS LS-DYNA test

• Run the test with no background traffic, on 4 nodes

• Add massive background traffic without enabling QoS; the application and the background traffic use the same SL and the same network resources

• Add massive background traffic with QoS enabled, giving the LS-DYNA application priority over the background traffic

LS-DYNA QoS: E5-2697A v4 @ 2.60GHz, HDR100

Summary

• HPC cluster environments demand high connectivity throughput and low latency, with low CPU overhead, network flexibility, and high efficiency

• Fulfilling these demands maintains a balanced system that can achieve high application performance and high scaling

• With the increase in the number of CPU cores and application threads, there is a need for a new, data-focused HPC cluster architecture

• Co-Design collaboration enables the development of In-Network Computing technology that breaks the performance and scalability barriers

• The OpenFOAM, LS-DYNA, and ANSYS Fluent applications were benchmarked on AMD Rome and Intel CPUs in this study to demonstrate the advantages of In-Network Computing technology

• We have witnessed a 50% performance advantage and linear scalability with InfiniBand In-Network Computing technology

• InfiniBand QoS can be considered in network design. With QoS enabled, LS-DYNA achieves similar performance with and without background traffic.

All trademarks are property of their respective owners. All information is provided “As-Is” without any kind of warranty. The HPC Advisory Council makes no representation to the accuracy and completeness of the information contained herein. HPC Advisory Council undertakes no duty and assumes no obligation to update or correct any information presented herein.

Thank You
