ANSYS High Performance Computing (HPC)
© 2011 CAE Associates
Why HPC
As we have seen, calculation of crack extension often requires multiple solutions of the model.
If the full crack path is known, then each of these solutions can be run on separate machines simultaneously.
However, if the path of the crack is determined by the previous solution, then each run must be made sequentially.
Use of HPC can greatly reduce the overall time it takes for these calculations.
Why HPC
We have also seen that there is significant scatter and variation in the fatigue properties, as well as variation in the geometry, material properties, and loading.
To account for these variations, a statistical assessment is needed, which will require many simulations.
Use of HPC can greatly reduce the overall time it takes for these calculations.
ANSYS HPC
The default ANSYS licensing allows the use of 2 cores.
Use of additional cores requires HPC licenses.
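As a rough illustration, the core count is requested when the job is launched. A minimal sketch, assuming a v13 Linux install whose launcher is named ansys130 and a hypothetical input file model.inp:

    # 2 cores: covered by the default license
    ansys130 -b -np 2 -i model.inp -o model.out

    # 8 cores: the cores beyond the default draw on HPC licenses
    ansys130 -b -np 8 -i model.inp -o model.out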
Memory Options
Shared Memory ANSYS (SMP)
Multiple processors on one machine, all accessing the same RAM
Limited by memory bandwidth
Most, but not all, of the solution phase runs in parallel
Tops out between 4 and 8 cores
Distributed ANSYS (MPP)
Can run over a cluster of machines or use multiple processors on one machine.
In the case of clusters, limited by interconnect speed
Entire solution phase runs in parallel (including stiffness matrix generation, linear equation solving, and results calculation).
When run distributed on one machine, the non-solution phases run in SMP mode
Does not support all analysis types, elements, etc.
Requires MPI software
Extends performance to a larger number of cores (typical launch lines are sketched below)
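A minimal sketch of how the two modes are typically selected at launch, again assuming a launcher named ansys130 and a hypothetical input file flange.inp; the -dis flag requests Distributed ANSYS, and the exact -machines syntax for clusters can vary by version and MPI setup:

    # SMP: 4 cores on one machine, sharing memory
    ansys130 -b -np 4 -i flange.inp -o flange.out

    # Distributed ANSYS on one machine: add -dis
    ansys130 -b -dis -np 4 -i flange.inp -o flange.out

    # Distributed ANSYS across a cluster: list hosts and cores per host
    ansys130 -b -dis -machines node1:4:node2:4 -i flange.inp -o flange.out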
Solver Options
SMP Solvers
Sparse
JCG
ICCG
PCG
QMR
AMG
MPP Solvers
Sparse
JCG
PCG
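The solver can also be requested explicitly in the input deck with the EQSLV command during the solution phase. A minimal APDL sketch (the PCG tolerance shown is an illustrative value):

    /SOLU
    EQSLV,SPARSE        ! direct sparse solver
    ! EQSLV,PCG,1E-8    ! or: PCG iterative solver with a convergence tolerance
    ! EQSLV,JCG         ! or: Jacobi conjugate gradient solver
    SOLVE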
Scalability
Amdahl's law: scalability is limited by serial computations
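If a fraction p of the solution parallelizes, Amdahl's law bounds the speedup on N cores:

    S(N) = 1 / ((1 - p) + p/N)

With an illustrative p = 0.95, 12 cores give S ≈ 7.7, and no core count can push S past 1/(1 - p) = 20.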
Scalability limiters:
Large contact pairs
Constraint equations across domain partitions
MPCs
Hardware
For distributed ANSYS, hardware must be balanced
Fast processors require fast interconnects
Cannot have lots of I/O to a single disk
FEA Benchmark Problem
Bolted Flange with O-Ring
Nonlinear material properties (Hyperelastic O-Ring)
Large Deformation
Nonlinear Contact
1 Million Degrees of Freedom
DHCAD5650 High End Workstation
(2) Intel Hex Core 2.66 GHz Processors (12 Cores total)
24 GB RAM
(4) 300 GB Toshiba SAS 15,000 RPM drives
RAID 5 Configuration
FEA Benchmark Performance
[Chart: solver speed-up vs. number of cores (0-14) on a single machine at v13, with curves for the SPARSE, DSPARSE, and AMG solvers]
ANSYS Inc. Benchmark
Distributed 4M-DOF Sparse Solver - Linear Elastic, SOLID186 Elements
These runs were done on an Intel cluster containing 1000+ nodes, where each node contained two 6-core Westmere processors, 24 GB of RAM, fast I/O, and an InfiniBand DDR2 (~2500 MB/s) interconnect.
[Chart: solver speed-up vs. number of cores (0-60)]
Disk Drive Speed
The bolted flange analysis was run on the two different drives of our high-end workstation to compare disk speed influence on solution time.
The RAID array completed the solution almost twice as fast as the SATA drive:
Run #1: PCG Solver, 12 CPUs, In-Core, RAID Array. Wall time = 8754 sec.
Run #2: PCG Solver, 12 CPUs, In-Core, SATA Drive. Wall time = 16822 sec.
Hyperthreading
Hyperthreading allows one physical processor to appear as two logical processors to the operating system, which can then perform two different processes simultaneously.
It does not, however, allow the processor to do two of the same type of operation (e.g., floating-point operations) simultaneously.
This form of parallel processing is only effective when a system has many lightweight tasks.
Hyperthreading and ANSYS
The bolted flange analysis was run with Hyperthreading on and then again with it off to determine its influence.
Run #1: PCG Solver, 12 CPUs, Hyperthreading Off. Wall time = 8754 sec.
Run #2: PCG Solver, 24 CPUs, Hyperthreading On. Wall time = 8766 sec.
An LS-DYNA analysis was also run in the same manner as above, with the following results:
Run #1: 12 CPUs, Hyperthreading Off. Wall time = 19560 sec.
Run #2: 24 CPUs, Hyperthreading On. Wall time = 32918 sec.
GPU Accelerator
Available at v13 for Mechanical only
An ANSYS job launches on a CPU, sends floating-point operations to the GPU for the heavy lifting, and returns data to the CPU to finish the job (launch sketch below).
Works in SMP mode only (in v13), using the GPU card's memory
Solvers: Sparse, PCG, JCG
Model limits for the sparse solver depend on the largest front size:
~3M DOF for the 3 GB Tesla C2050 and ~6M DOF for the 6 GB Tesla C2070
Model limits for the PCG/JCG solvers: ~1.5M DOF for the 3 GB Tesla C2050 and ~3M DOF for the 6 GB Tesla C2070
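A minimal launch sketch, assuming the v13 GPU flag is -acc nvidia (check the documentation for your version) and the same hypothetical flange.inp input file:

    # 4 CPU cores plus the GPU accelerator
    ansys130 -b -np 4 -acc nvidia -i flange.inp -o flange.out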
GPU Accelerator
Shows an additional 1.65x speedup in a static test case:
Run #1: Sparse Solver, 4 CPUs, GPU Off. Wall time = 240 sec.
Run #2: Sparse Solver, 4 CPUs, GPU On. Wall time = 146 sec.
GPU test cases run on small models don't show great improvement.
GPU Accelerator Performance
ANSYS Inc. Benchmark