HPC Fracture Seminar


    2011 CAE Associates

    ANSYS High Performance Computing (HPC)


    Why HPC

    As we have seen, calculation of crack extension often requires multiple solutions of the model.

    If the full crack path is known, then each of these solutions can be run on separate machines simultaneously.

    However, if the path of the crack is determined by the previous solution, then each run must be made sequentially.

    Use of HPC can greatly reduce the overall time it takes for these calculations.
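    When the crack path is known in advance, the independent solutions can simply be launched side by side. A minimal driver sketch (not part of the seminar), assuming a hypothetical MAPDL executable name and input deck names:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

ANSYS_EXE = "ansys130"  # assumption: adjust to the installed version/launcher
DECKS = ["crack_a0.inp", "crack_a1.inp", "crack_a2.inp"]  # hypothetical input decks

def run_deck(deck):
    out = deck.replace(".inp", ".out")
    # -b batch mode, -i input file, -o output file, -np cores for this job
    return deck, subprocess.call([ANSYS_EXE, "-b", "-i", deck, "-o", out, "-np", "2"])

with ThreadPoolExecutor(max_workers=len(DECKS)) as pool:
    for deck, rc in pool.map(run_deck, DECKS):
        print(deck, "finished, return code", rc)
```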


    Why HPC

    We have also seen that there is significant scatter and variation in the fatigue properties, as well as variation in the geometry, material properties, and loading.

    To account for these variations a statistical assessment is needed, which will require many simulations.

    Use of HPC can greatly reduce the overall time it takes for these calculations.
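    A minimal sketch of generating the sampled input sets for such a statistical study; the parameter names and distributions here are illustrative assumptions only:

```python
import csv
import random

random.seed(0)
N_SAMPLES = 200  # number of simulations in the statistical study

with open("doe_samples.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["sample", "yield_strength_MPa", "initial_flaw_mm", "load_factor"])
    for i in range(N_SAMPLES):
        writer.writerow([
            i,
            random.gauss(350.0, 15.0),         # assumed scatter in a material property
            random.lognormvariate(-1.0, 0.3),  # assumed scatter in initial flaw size
            random.uniform(0.9, 1.1),          # assumed variation in applied load
        ])
# Each row would then drive one crack-growth run, e.g. via the batch launcher above.
```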


    ANSYS HPC

    The default ANSYS licensing allows the use of 2 cores.

    Use of additional cores requires HPC licenses.


    Memory Options

    Shared Memory ANSYS (SMP)

    Multiple processors on one machine, all accessing the same RAM

    Limited by memory bandwidth

    Most, but not all, of the solution phase runs in parallel

    Tops out between 4-8 cores

    Distributed ANSYS (MPP)

    Can run over a cluster of machines OR use multiple processors on one machine.

    In the case of clusters, limited by interconnect speed

    Entire solution phase runs in parallel (including stiffness matrix generation, linear equation solving, and results calculation).

    When Distributed ANSYS is run on a single machine, the non-solution phases still run in SMP mode

    Does not support all analysis types, elements, etc.

    Requires MPI software

    Extends performance to a larger number of cores
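    For illustration, a minimal sketch of how the two modes above are typically requested at launch (-np for the core count, -dis for Distributed ANSYS); the executable name is an assumption for a v13-era installation:

```python
ANSYS_EXE = "ansys130"  # assumption: adjust to the installed version/launcher

def smp_args(inp, out, cores=8):
    # Shared-memory (SMP): one machine, all cores share the same RAM.
    return [ANSYS_EXE, "-b", "-np", str(cores), "-i", inp, "-o", out]

def dmp_args(inp, out, cores=8):
    # Distributed ANSYS (MPP): adds -dis; requires MPI software to be installed.
    return [ANSYS_EXE, "-b", "-dis", "-np", str(cores), "-i", inp, "-o", out]

print(" ".join(smp_args("flange.inp", "flange_smp.out")))
print(" ".join(dmp_args("flange.inp", "flange_dmp.out")))
```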


    Solver Options

    SMP Solvers

    Sparse

    JCG

    ICCG

    PCG

    QMR

    AMG

    MPP Solvers

    Sparse

    JCG

    PCG
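    For reference, a minimal sketch (assuming the standard MAPDL EQSLV command) of selecting one of the solvers listed above in a generated input fragment; the file name and solver choice are placeholders:

```python
SOLVER = "PCG"  # SMP: SPARSE, JCG, ICCG, PCG, QMR, AMG; MPP: SPARSE, JCG, PCG

with open("solve_fragment.inp", "w") as f:
    f.write("/SOLU\n")
    f.write(f"EQSLV,{SOLVER}\n")  # select the equation solver
    f.write("SOLVE\n")
    f.write("FINISH\n")
```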


    Scalability

    Amdahl's law: scalability is limited by serial computations (see the worked example at the end of this slide)

    Scalability limiters:

    Large contact pairs

    Constraint equations across domain partitions

    MPCs

    Hardware

    For distributed ANSYS, hardware must be balanced

    Fast processors require fast interconnects

    Cannot have lots of I/O to a single disk
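    A short worked example of the Amdahl's law limit referenced above: with a parallel fraction p, the best possible speedup on n cores is 1 / ((1 - p) + p / n).

```python
def amdahl_speedup(p, n):
    # p = fraction of the run that parallelizes, n = number of cores
    return 1.0 / ((1.0 - p) + p / n)

for p in (0.80, 0.95, 0.99):
    row = ", ".join(f"{n} cores: {amdahl_speedup(p, n):.1f}x" for n in (4, 12, 48))
    print(f"parallel fraction {p:.2f} -> {row}")
# Even at 95% parallel work, 48 cores yield only about 14x, which is why the
# serial items above (large contact pairs, constraint equations, MPCs) limit scalability.
```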


    FEA Benchmark Problem

    Bolted Flange with O-Ring

    Nonlinear material properties (Hyperelastic O-Ring)

    Large Deformation

    Nonlinear Contact

    1 Million Degrees of Freedom


    DHCAD5650 High End Workstation

    (2) Intel Hex Core 2.66 GHz Processors (12 Cores total)

    24 GB RAM

    (4) 300 GB Toshiba 15,000 RPM SAS drives

    RAID 5 Configuration


    FEA Benchmark Performance

    [Figure: solver speedup vs. number of cores on a single machine (v13), comparing the SPARSE, DSPARSE, and AMG solvers.]


    ANSYS Inc. Benchmark

    Distributed 4M DOF Sparse Solver - Linear Elastic, SOLID186s

    These runs were done on an Intel cluster containing 1000+ nodes, where each node contained two 6-core Westmere processors, 24 GB of RAM, fast I/O, and an InfiniBand DDR2 (~2500 MB/s) interconnect.

    [Figure: solver speedup vs. number of cores for the distributed 4M DOF sparse solver benchmark.]


    Disk Drive Speed

    The bolted flange analysis was run on the two different drives of our high-end workstation to compare the influence of disk speed on solution time.

    The RAID array completed the solution almost twice as fast as the SATA drive:

    Run #1: PCG Solver, 12 CPU, In-Core, RAID Array. Wall time = 8754 sec.

    Run #2: PCG Solver, 12 CPU, In-Core, SATA Drive. Wall time = 16822 sec.


    Hyperthreading

    Hyperthreading allows one physical processor to appear as two logical processors to the operating system. This allows the operating system to perform two different processes simultaneously.

    It does not, however, allow the processor to perform two operations of the same type (e.g., floating-point operations) simultaneously.

    This form of parallel processing is only effective when a system has many lightweight tasks.


    Hyperthreading and ANSYS

    The bolted flange analysis was run with Hyperthreading on and then again with it off to determine its influence.

    Run #1: PCG Solver, 12 CPUs, Hyperthreading Off. Wall time = 8754 sec.

    Run #2: PCG Solver, 24 CPUs, Hyperthreading On. Wall time = 8766 sec.

    An LS-Dyna analysis was also run in the same manner as above with the following results.

    Run #1: 12 CPUs, Hyperthreading Off. Wall time = 19560 sec.

    Run #2: 24 CPUs, Hyperthreading On. Wall time = 32918 sec.


    GPU Accelerator

    Available at v13 for Mechanical only

    The ANSYS job launches on the CPU, sends floating-point operations to the GPU for the heavy lifting, and returns data to the CPU to finish the job.

    Works in SMP mode only (in v13), using the GPU card's memory

    Solvers: Sparse, PCG, JCG

    Model limits for sparse solver depend on largest front sizes:

    ~3M DOF for 3GB Tesla C2050 and ~6M DOF for 6GB Tesla C2070

    Model limits for PCG/JCG solvers: ~1.5M DOF for 3GB Tesla C2050 and ~3M DOF for 6GB Tesla C2070
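    As an illustration, a minimal sketch of requesting the GPU accelerator at launch, assuming the -acc nvidia command-line option; the executable and file names are placeholders, so confirm the exact option against the documentation for your installation:

```python
import subprocess

ANSYS_EXE = "ansys130"  # assumption: adjust to the installed version/launcher
cmd = [ANSYS_EXE, "-b", "-np", "4", "-acc", "nvidia",
       "-i", "flange.inp", "-o", "flange_gpu.out"]
subprocess.call(cmd)  # SMP run with the solver's floating-point work offloaded to the GPU
```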


    GPU Accelerator

    Shows an additional 1.65x speedup in a static test case:

    Run #1: Sparse Solver, 4 CPUs, GPU Off. Wall time = 240 sec.

    Run #2: Sparse Solver, 4 CPUs, GPU On. Wall time = 146 sec.

    GPU test cases run on small models don't show great improvement.



    GPU Accelerator Performance

    ANSYS Inc. Benchmark