HPC Fracture Seminar


    2011 CAE Associates

    ANSYS High Performance Computing (HPC)


    Why HPC

    As we have seen, calculation of crack extension often requires multiple solutions of the model.

    If the full crack path is known, then each of these solutions can be run on separate machines simultaneously.

    However, if the path of the crack is determined by the previous solution, then each run must be made sequentially.

    Use of HPC can greatly reduce the overall time it takes for these calculations.
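    When the crack path is known in advance, the independent solutions can simply be launched side by side. A minimal driver sketch (not part of the seminar), assuming a hypothetical MAPDL executable name and input deck names:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

ANSYS_EXE = "ansys130"  # assumption: adjust to the installed version/launcher
DECKS = ["crack_a0.inp", "crack_a1.inp", "crack_a2.inp"]  # hypothetical input decks

def run_deck(deck):
    out = deck.replace(".inp", ".out")
    # -b batch mode, -i input file, -o output file, -np cores for this job
    return deck, subprocess.call([ANSYS_EXE, "-b", "-i", deck, "-o", out, "-np", "2"])

with ThreadPoolExecutor(max_workers=len(DECKS)) as pool:
    for deck, rc in pool.map(run_deck, DECKS):
        print(deck, "finished, return code", rc)
```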


    Why HPC

    We have also seen that there is significant scatter and variation in the fatigue properties, as well as variation in the geometry, material properties, and loading.

    To account for these variations a statistical assessment is needed, which will require many simulations.

    Use of HPC can greatly reduce the overall time it takes for these calculations.
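    A minimal sketch of generating the sampled input sets for such a statistical study; the parameter names and distributions here are illustrative assumptions only:

```python
import csv
import random

random.seed(0)
N_SAMPLES = 200  # number of simulations in the statistical study

with open("doe_samples.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["sample", "yield_strength_MPa", "initial_flaw_mm", "load_factor"])
    for i in range(N_SAMPLES):
        writer.writerow([
            i,
            random.gauss(350.0, 15.0),         # assumed scatter in a material property
            random.lognormvariate(-1.0, 0.3),  # assumed scatter in initial flaw size
            random.uniform(0.9, 1.1),          # assumed variation in applied load
        ])
# Each row would then drive one crack-growth run, e.g. via the batch launcher above.
```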


    ANSYS HPC

    The default ANSYS licensing allows the use of 2 cores.

    Use of additional cores requires HPC licenses.


    Memory Options

    Shared Memory ANSYS (SMP)

    Multiple processors on one machine, all accessing the same RAM

    Limited by memory bandwidth

    Most, but not all, of the solution phase runs in parallel

    Tops out between 4-8 cores

    Distributed ANSYS (MPP)

    Can run over a cluster of machines OR use multiple processors on one machine.

    In the case of clusters, limited by interconnect speed

    Entire solution phase runs in parallel (including stiffness matrix generation, linear equation solving, and results calculation).

    When Distributed ANSYS is run on a single machine, the non-solution phases still run in SMP mode

    Does not support all analysis types, elements, etc.

    Requires MPI software

    Extends performance to a larger number of cores
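    For illustration, a minimal sketch of how the two modes above are typically requested at launch (-np for the core count, -dis for Distributed ANSYS); the executable name is an assumption for a v13-era installation:

```python
ANSYS_EXE = "ansys130"  # assumption: adjust to the installed version/launcher

def smp_args(inp, out, cores=8):
    # Shared-memory (SMP): one machine, all cores share the same RAM.
    return [ANSYS_EXE, "-b", "-np", str(cores), "-i", inp, "-o", out]

def dmp_args(inp, out, cores=8):
    # Distributed ANSYS (MPP): adds -dis; requires MPI software to be installed.
    return [ANSYS_EXE, "-b", "-dis", "-np", str(cores), "-i", inp, "-o", out]

print(" ".join(smp_args("flange.inp", "flange_smp.out")))
print(" ".join(dmp_args("flange.inp", "flange_dmp.out")))
```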


    Solver Options

    SMP Solvers

    Sparse

    JCG

    ICCG

    PCG

    QMR

    AMG

    MPP Solvers

    Sparse

    JCG

    PCG
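    For reference, a minimal sketch (assuming the standard MAPDL EQSLV command) of selecting one of the solvers listed above in a generated input fragment; the file name and solver choice are placeholders:

```python
SOLVER = "PCG"  # SMP: SPARSE, JCG, ICCG, PCG, QMR, AMG; MPP: SPARSE, JCG, PCG

with open("solve_fragment.inp", "w") as f:
    f.write("/SOLU\n")
    f.write(f"EQSLV,{SOLVER}\n")  # select the equation solver
    f.write("SOLVE\n")
    f.write("FINISH\n")
```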


    Scalability

    Amdahl's law: scalability is limited by serial computations (see the worked example at the end of this slide)

    Scalability limiters:

    Large contact pairs

    Constraint equations across domain partitions

    MPCs

    Hardware

    For distributed ANSYS, hardware must be balanced

    Fast processors require fast interconnects

    Cannot have lots of I/O to a single disk
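    A short worked example of the Amdahl's law limit referenced above: with a parallel fraction p, the best possible speedup on n cores is 1 / ((1 - p) + p / n).

```python
def amdahl_speedup(p, n):
    # p = fraction of the run that parallelizes, n = number of cores
    return 1.0 / ((1.0 - p) + p / n)

for p in (0.80, 0.95, 0.99):
    row = ", ".join(f"{n} cores: {amdahl_speedup(p, n):.1f}x" for n in (4, 12, 48))
    print(f"parallel fraction {p:.2f} -> {row}")
# Even at 95% parallel work, 48 cores yield only about 14x, which is why the
# serial items above (large contact pairs, constraint equations, MPCs) limit scalability.
```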


    FEA Benchmark Problem

    Bolted Flange with O-Ring

    Nonlinear material properties (Hyperelastic O-Ring)

    Large Deformation

    Nonlinear Contact

    1 Million Degrees of Freedom


    DHCAD5650 High End Workstation

    (2) Intel Hex Core 2.66 GHz Processors (12 Cores total)

    24 GB RAM

    (4) 300 GB Toshiba 15,000 RPM SAS drives

    RAID 5 Configuration


    FEA Benchmark Performance

    [Figure: solver speedup vs. number of cores on a single machine (v13), comparing the SPARSE, DSPARSE, and AMG solvers.]


    ANSYS Inc. Benchmark

    Distributed 4M DOF Sparse Solver - Linear Elastic, SOLID186s

    These runs were done on an Intel cluster containing 1000+ nodes, where each node contained two 6-core Westmere processors, 24 GB of RAM, fast I/O, and an InfiniBand DDR2 (~2500 MB/s) interconnect.

    [Figure: solver speedup vs. number of cores for the distributed 4M DOF sparse solver benchmark.]


    Disk Drive Speed

    The bolted flange analysis was run on the two different drives of our high-end workstation to compare the influence of disk speed on solution time.

    The RAID array completed the solution almost twice as fast as the SATA drive:

    Run #1: PCG Solver, 12 CPU, In-Core, RAID Array. Wall time = 8754 sec.

    Run #2: PCG Solver, 12 CPU, In-Core, SATA Drive. Wall time = 16822 sec.


    Hyperthreading

    Hyperthreading allows one physical processor to appear as two logical processors to the operating system. This allows the operating system to perform two different processes simultaneously.

    It does not, however, allow the processor to perform two operations of the same type (e.g., floating-point operations) simultaneously.

    This form of parallel processing is only effective when a system has many lightweight tasks.


    Hyperthreading and ANSYS

    The bolted flange analysis was run with Hyperthreading on and then again with it off to determine its influence.

    Run #1: PCG Solver, 12 CPUs, Hyperthreading Off. Wall time = 8754 sec.

    Run #2: PCG Solver, 24 CPUs, Hyperthreading On. Wall time = 8766 sec.

    An LS-Dyna analysis was also run in the same manner as above with the following results.

    Run #1: 12 CPUs, Hyperthreading Off. Wall time = 19560 sec.

    Run #2: 24 CPUs, Hyperthreading On. Wall time = 32918 sec.


    GPU Accelerator

    Available at v13 for Mechanical only

    The ANSYS job launches on the CPU, sends floating-point operations to the GPU for the heavy lifting, and returns data to the CPU to finish the job.

    Works in SMP mode only (in v13), using the GPU card's memory

    Solvers: Sparse, PCG, JCG

    Model limits for sparse solver depend on largest front sizes:

    ~3M DOF for 3GB Tesla C2050 and ~6M DOF for 6GB Tesla C2070

    Model limits for PCG/JCG solvers: ~1.5M DOF for 3GB Tesla C2050 and ~3M DOF for 6GB Tesla C2070
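    As an illustration, a minimal sketch of requesting the GPU accelerator at launch, assuming the -acc nvidia command-line option; the executable and file names are placeholders, so confirm the exact option against the documentation for your installation:

```python
import subprocess

ANSYS_EXE = "ansys130"  # assumption: adjust to the installed version/launcher
cmd = [ANSYS_EXE, "-b", "-np", "4", "-acc", "nvidia",
       "-i", "flange.inp", "-o", "flange_gpu.out"]
subprocess.call(cmd)  # SMP run with the solver's floating-point work offloaded to the GPU
```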


    GPU Accelerator

    Shows an additional 1.65x speedup in a static test case:

    Run #1: Sparse Solver, 4 CPUs, GPU Off. Wall time = 240 sec.

    Run #2: Sparse Solver, 4 CPUs, GPU On. Wall time = 146 sec.

    GPU test cases run on small models don't show great improvement.



    GPU Accelerator Performance

    ANSYS Inc. Benchmark