Training Manual 00141915 Aug 2000
NEW FEATURES 5.7
2.4 Parallel Performance Enhancements
• In this section, we will discuss the following topics:
A. New add-on product Parallel Performance for ANSYS
B. Distributed Domain Solver (DDS)
C. Algebraic Multigrid Solver (AMG)
Overview
• Driven by user requirements for higher accuracy and fidelity in the solution
– e.g. mesh refinement and adaptive meshing
• Desire to solve whole assemblies instead of analyzing individual components
– e.g. assembly contact problems
A. Parallel Performance for ANSYS
• A new, add-on product for shared memory and distributed memory environments
• Offers powerful new solvers enabling quick, accurate solutions to large models using multiple processors
– Algebraic MultiGrid (AMG) solver
  • Solves static / transient nonlinear analyses using multiple processors (up to 8) on a single system (shared memory parallel)
– Distributed Domain Solver (DDS)
  • Solves large static / transient nonlinear analyses over multiple systems (distributed memory parallel) as well as multiple processors on a single machine (shared memory parallel) or any combination
B. DDS
What is DDS?
• Breaks large problems (up to 10 million DOFs) into smaller domains (1000 to 10000 DOFs) automatically
• Compatibility among domains obtained by solving for interface variables (Lagrange multipliers)
...What is DDS?
• Transfers the subdomains to the slave machines and factorizes them there using a direct solver
• Master machine retrieves and assembles the subdomain solutions, solves for the interface variables using an iterative solver, and computes results for the entire model
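As a minimal sketch of how interface variables enforce compatibility, consider two single-DOF subdomains, each a spring anchored to a wall, that must share one interface displacement. The stiffnesses, loads, and the function name `solve_two_subdomains` are illustrative assumptions, not part of ANSYS. Each subdomain is solved on its own, and one Lagrange multiplier (the interface force) is chosen so that the two displacements agree. DDS solves its interface problem with an iterative solver; this scalar case can be solved in closed form.

```python
def solve_two_subdomains(k_a, k_b, f_a, f_b):
    """Couple two one-DOF spring subdomains with a Lagrange multiplier.

    Subdomain equations:  k_a * u_a = f_a - lam
                          k_b * u_b = f_b + lam
    Compatibility:        u_a - u_b = 0
    """
    # Interface problem obtained by eliminating u_a and u_b:
    #   (1/k_a + 1/k_b) * lam = f_a/k_a - f_b/k_b
    flexibility = 1.0 / k_a + 1.0 / k_b
    gap_without_coupling = f_a / k_a - f_b / k_b
    lam = gap_without_coupling / flexibility

    # Independent subdomain solves, each using only its own stiffness
    u_a = (f_a - lam) / k_a
    u_b = (f_b + lam) / k_b
    return u_a, u_b, lam


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration
    u_a, u_b, lam = solve_two_subdomains(k_a=2.0, k_b=3.0, f_a=10.0, f_b=0.0)
    # Reference: undecomposed model, where both springs act on one node
    u_ref = (10.0 + 0.0) / (2.0 + 3.0)
    print(u_a, u_b, lam, u_ref)
```

In this toy case the decomposed displacements match the undecomposed reference, which is the compatibility condition the interface variables are meant to enforce.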
[Figure: speedup ratio (0 to 14) versus number of CPUs (0 to 24)]
Why DDS?
• Highly scalable
– More processors / less elapsed time
– Example below shows a 3.5 million-DOF SOLID92 model
  • 2020 subdomains on an SGI Origin 2000, 12 GB memory
[Figure: Carrier problem, 3.5 million DOF. Solver wall-time (sec., 0 to 20000) versus number of processors (0 to 35). Speed-up = 21.0]
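The figures quoted for this example can be cross-checked against the subdomain size range given earlier (1000 to 10000 DOFs). The sketch below is plain arithmetic on the numbers from the slides; the function names are illustrative, and the calculation ignores interface DOFs shared between neighboring subdomains, so it is only an approximation.

```python
def average_subdomain_dofs(total_dofs, n_subdomains):
    """Average number of DOFs per subdomain (shared interface DOFs ignored)."""
    return total_dofs / n_subdomains


def subdomain_count_range(total_dofs, min_size=1_000, max_size=10_000):
    """Fewest and most subdomains consistent with the stated size range."""
    fewest = -(-total_dofs // max_size)  # ceiling division
    most = total_dofs // min_size
    return fewest, most


if __name__ == "__main__":
    # 3.5 million DOFs split into 2020 subdomains, as quoted on the slide
    print(average_subdomain_dofs(3_500_000, 2020))
    print(subdomain_count_range(3_500_000))
```

The average works out to roughly 1700 DOFs per subdomain, near the lower end of the stated range.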
Memory / Disk requirements
• Requires 2 to 4 times more memory than the PCG solver; however, this is not a problem on a distributed memory architecture.
– Memory required is the sum over the master machine and all individual slave machines
– In general, the master machine will need a large amount of memory
DDS - Under the Hood
• DDS has 2 components:
– Domain decomposer
  • Embedded in ANSYS
  • Divides domain into n subdomains
  • Creates scratch.dds, file.dds, and file.erot
  • Issues ‘mpirun’ command and launches appropriate ansdds.e57 executable
– ANSDDS.E57
  • A stand-alone, MPI-enabled executable
  • Computes solution for subdomain on the slave processor
  • Writes out a file called scratch.u, which is later retrieved by the Master to calculate element results
• System requirements
– Network must be homogeneous (same operating system)
  • Message Passing Interface (MPI) used to communicate
– Master (where the job is submitted)
  • “Parallel Performance for ANSYS” add-on required
  • ANSYS 5.7 must be installed (including ansdds.e57)
  • Installation of MPI
  • 256 MB RAM / 10 GB disk required
– Slave
  • Installation of MPI on all slave machines
  • ansdds.e57 executable must be installed
How to use DDS
• Specify “Parallel Performance for ANSYS” add-on when starting ANSYS
– ansys57 -pp
• Choose DDS Solver
– EQSLV,DOMAIN
• Specify information about slave processors
– DDSOPT command*
*DDSOPT command covered in Systems Training
How to use DDS (cont'd)
• Solve
• Postprocessing
– You get a results file as usual
– /PNUM,DOMAIN,ON will display domains by colors / numbers
Are there any modeling restrictions for using DDS?
• Structural static/transient only (linear or nonlinear)
• Symmetric matrices
• “h” elements only
• No coupling / constraint equations
• No inertia relief
[Figure: Benchmark, 2 million DOF test case. CPU time (in seconds, 0 to 12000) versus number of processors (0 to 26)]
C. AMG Solver
What is AMG solver?
• A preconditioned conjugate gradient solver, similar to the PCG solver
• The preconditioner used in the AMG solver is derived using the Algebraic MultiGrid technique
– MultiGrid techniques derive a preconditioner that is very close to [K]^-1 by working on a coarser mesh of the FE model supplied
– Algebraic MultiGrid methods work on a coarsened version of the full [K] matrix instead of the mesh (that is, they are mesh independent)
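To show where a preconditioner enters a conjugate gradient iteration, here is a minimal pure-Python sketch of preconditioned CG. It is not the ANSYS implementation. A simple diagonal (Jacobi) preconditioner stands in for the AMG preconditioner, which is far more elaborate; in the sketch, an AMG method would replace only the `apply_precond` callable. The relative-residual stopping test and all function names are assumptions made for illustration; the slides state only that the tolerance defaults to 1e-8.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(a, v):
    return [dot(row, v) for row in a]


def jacobi_preconditioner(a):
    """Return a function applying M^-1 = diag(A)^-1 (stand-in for AMG)."""
    inv_diag = [1.0 / a[i][i] for i in range(len(a))]
    return lambda r: [d * ri for d, ri in zip(inv_diag, r)]


def pcg(a, b, apply_precond, tol=1e-8, max_iter=1000):
    """Preconditioned conjugate gradient for a symmetric positive definite A."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for the start x = 0
    b_norm = math.sqrt(dot(b, b)) or 1.0
    if math.sqrt(dot(r, r)) <= tol * b_norm:
        return x, 0
    z = apply_precond(r)             # preconditioned residual
    p = list(z)                      # first search direction
    rz = dot(r, z)
    for iteration in range(1, max_iter + 1):
        ap = matvec(a, p)
        alpha = rz / dot(p, ap)      # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, iteration
        z = apply_precond(r)
        rz_new = dot(r, z)
        beta = rz_new / rz           # keeps search directions A-conjugate
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    raise RuntimeError("PCG did not converge within max_iter iterations")


if __name__ == "__main__":
    # Small hypothetical stiffness-like matrix; the exact solution is [1, 1, 1]
    k = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
    f = [3.0, 2.0, 3.0]
    u, iterations = pcg(k, f, jacobi_preconditioner(k))
    print(u, iterations)
```

The closer the preconditioner is to [K]^-1, the fewer iterations the loop needs, which is the motivation for deriving it with a multigrid technique.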
Why do we need AMG solver?
• Sensitivity to ill-conditioning
– Much less sensitive to ill-conditioned problems than PCG
– Will get solutions in fewer iterations than PCG for ill-conditioned problems
– Expected to perform as well as PCG for well conditioned problems
• Scalability
– Speed-up of up to 5 times on 8 processors
– Scales much better than PCG
– Used in shared memory parallel (single machine with multiple processors) only
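The scalability figure quoted above (up to 5 times on 8 processors) can be turned into a parallel efficiency. Under the additional assumption that Amdahl's law describes the run, it also gives an implied serial fraction. Amdahl's law is a modeling assumption introduced here, not something the slides state, and the function names are illustrative.

```python
def parallel_efficiency(speedup, n_processors):
    """Fraction of ideal (linear) speed-up actually achieved."""
    return speedup / n_processors


def implied_serial_fraction(speedup, n_processors):
    """Serial fraction s from Amdahl's law, S = 1 / (s + (1 - s) / p)."""
    p = n_processors
    return (p / speedup - 1.0) / (p - 1.0)


if __name__ == "__main__":
    print(parallel_efficiency(5.0, 8))      # speed-up 5 on 8 processors
    print(implied_serial_fraction(5.0, 8))
```

A speed-up of 5 on 8 processors corresponds to an efficiency of 62.5 percent.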
Scalability
[Figure: AMG benchmark, 500,000 DOF model. CP time (AMG solver, 0 to 600) versus number of CPUs (0 to 8)]
How to use AMG solver
– Specify “Parallel Performance for ANSYS” add-on when starting ANSYS
  • ansys57 -pp
– Specify number of processors:
  • /CONFIG,NPROC,N
  • or config57.ans
  • or use the macro SETNPROC
– Choose AMG Solver
  • EQSLV,AMG,Toler
  • Tolerance defaults to 1e-8, similar to PCG
– Solve
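The steps above can be collected into a short input fragment. The helper below is a hypothetical convenience function, not part of ANSYS. It only formats the commands named on this slide, plus the standard SOLVE command for the final step, and it enforces the 8-processor limit stated earlier for the AMG solver. It assumes ANSYS was already started with `ansys57 -pp` and that the commands are issued in the solution processor.

```python
def amg_solution_commands(n_processors, tolerance=1e-8):
    """Build the command lines for an AMG solve (illustrative helper only)."""
    if not 1 <= n_processors <= 8:
        raise ValueError("AMG is described as using up to 8 processors")
    return [
        f"/CONFIG,NPROC,{n_processors}",  # number of processors
        f"EQSLV,AMG,{tolerance:g}",       # select AMG solver and tolerance
        "SOLVE",                          # start the solution
    ]


if __name__ == "__main__":
    print("\n".join(amg_solution_commands(4)))
```

Generating the lines from one function keeps the processor count and tolerance in a single place when several input files are prepared.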
• When to use AMG solver
– Structural Static & Transient analyses
– Nonlinear analyses
– Large aspect ratio elements, reduced integration elements
– Models with combination of shells/ solids/ beams
– Shared memory parallel machines
• When not to use AMG solver
– Non-structural problems (it works but is less efficient)
– Models made of only SHELL63 elements (AMG does not seem to be as CPU efficient as PCG for these)
Memory / Disk requirements
– 1.3 to 2 times more memory than PCG solver
  • Rule of thumb is 130 MB per 100,000 DOF for SOLID92 elements
  • Memory required is also a function of number of processors used (overhead)
– Files created during AMG solution are very similar to PCG and about the same size
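The rule of thumb above lends itself to a quick estimate. The sketch below applies 130 MB per 100,000 DOF to a model size and, reading "1.3 to 2 times more memory than PCG" as a ratio of AMG to PCG memory (an interpretation, not a statement from the slides), derives the implied PCG range. It ignores the per-processor overhead mentioned above, and the function names are illustrative.

```python
AMG_MB_PER_100K_DOF = 130.0  # rule of thumb quoted for SOLID92 models


def amg_memory_mb(n_dofs):
    """Estimated AMG solver memory in MB (processor overhead ignored)."""
    return AMG_MB_PER_100K_DOF * n_dofs / 100_000


def implied_pcg_memory_mb(n_dofs, ratio_low=1.3, ratio_high=2.0):
    """PCG memory range implied if AMG needs ratio_low to ratio_high times PCG."""
    amg = amg_memory_mb(n_dofs)
    return amg / ratio_high, amg / ratio_low


if __name__ == "__main__":
    # 500,000 DOF, the size of the AMG benchmark model
    print(amg_memory_mb(500_000))
    print(implied_pcg_memory_mb(500_000))
```

For the 500,000 DOF benchmark size this gives about 650 MB for AMG before any processor overhead.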