Stan Posey NVIDIA, Santa Clara, CA, USA; [email protected]

Source: on-demand.gputechconf.com/gtc/2013/jp/sessions/2006.pdf


Agenda: GPU Acceleration for Applied CFD

- Overview of GPU Progress for CFD
- GPU Acceleration of ANSYS Fluent
- GPU Acceleration of OpenFOAM

GPU Progress Summary for GPU-Parallel CFD

- GPU progress in CFD research continues to expand
  - Growth from particle-based CFD and high-order methods
  - Explicit schemes have generally seen more progress than implicit schemes
- Strong GPU investments by commercial CFD vendors (ISVs)
  - Breakthroughs in GPU-parallel linear solvers and preconditioners
  - GPUs used for second-level parallelism, preserving the costly investment in MPI
  - ISV focus on hybrid parallel CFD that uses all CPU cores plus the GPU
- GPU progress for end-user-developed CFD with OpenACC
  - Most benefit to aerospace companies with legacy Fortran codes
- GPUs behind fast growth in particle-based commercial CFD
  - New ISV developments in lattice Boltzmann (LBM) and SPH

CFD Software Character and GPU Suitability

Method classes: structured-grid finite volume (FV), unstructured FV, and unstructured finite element (FE).

- Explicit schemes (usually compressible): numerical operations on an I,J,K stencil, with no linear "solver". Profiles are typically flat, so the GPU strategy is directives (OpenACC).
- Implicit schemes (usually incompressible): sparse-matrix linear algebra with iterative solvers. The solver is a hot spot of ~50% of runtime in a small % of the lines of code (LoC), so the GPU strategy is CUDA and libraries.
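To make the explicit case concrete, here is a minimal sketch, in Python for readability only, of an explicit update on an I,J,K stencil. The diffusion model and the function name `explicit_diffusion_step` are illustrative assumptions, not code from any solver named in this deck.

```python
def explicit_diffusion_step(u, nu, dt, h):
    """One forward-Euler diffusion update on a 3D I,J,K grid (7-point stencil).

    u is a nested list indexed u[i][j][k]; boundary values are held fixed.
    Each interior update reads only its own old value and its 6 neighbours,
    so all points are independent; this is the kind of loop nest that maps
    directly onto GPU threads or an OpenACC parallel loop.
    Stability of this explicit scheme requires nu*dt/h**2 <= 1/6.
    """
    ni, nj, nk = len(u), len(u[0]), len(u[0][0])
    c = nu * dt / (h * h)
    new = [[row[:] for row in plane] for plane in u]  # copy keeps boundaries
    for i in range(1, ni - 1):
        for j in range(1, nj - 1):
            for k in range(1, nk - 1):
                lap = (u[i + 1][j][k] + u[i - 1][j][k]
                       + u[i][j + 1][k] + u[i][j - 1][k]
                       + u[i][j][k + 1] + u[i][j][k - 1]
                       - 6.0 * u[i][j][k])
                new[i][j][k] = u[i][j][k] + c * lap
    return new


if __name__ == "__main__":
    n = 5
    u = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    u[2][2][2] = 1.0  # unit spike in the centre
    v = explicit_diffusion_step(u, nu=1.0, dt=0.1, h=1.0)
    print(v[2][2][2], v[3][2][2])  # the spike spreads to its neighbours
```

An implicit scheme would instead assemble the same neighbour coefficients into a sparse matrix and solve Ax = b at each step, which moves the hot spot into an iterative linear solver.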

CFD Speedups for GPU Relative to 8-Core CPU

Method classes: structured-grid FV, unstructured FV, and unstructured FE; explicit schemes (usually compressible) and implicit schemes (usually incompressible).

Codes shown: Turbostream, SJTU RANS, SD++ (Stanford, Jameson), FEFLO (Lohner), Veloxi

Speedups shown: ~10x and ~5x

Structured-grid explicit schemes are generally the best GPU fit.

Turbostream: CFD for Turbomachinery

Sample Turbostream GPU simulations:
- Typical routine simulation
- Large-scale simulation: ~19x speedup

Source: http://www.turbostream-cfd.com/

Commercial Aircraft Wing Design on GPUs

COMAC (Commercial Aircraft Corporation of China) and SJTU

GPU application:
- SJTU-developed explicit RANS CFD for aerodynamic evaluation of wing shapes

GPU benefit:
- Tesla C2070: 37x vs. a single core of an Intel Core i7 CPU
- Faster simulations allow more wing design candidates to be evaluated, reducing reliance on costly wind tunnel tests
- Expanding to multi-GPU and full aircraft

Figures: ONERA M6 wing CFD simulation; COMAC wing candidate

CFD Speedups for GPU Relative to 8-Core CPU

Method classes: structured-grid FV, unstructured FV, and unstructured FE; explicit schemes (usually compressible) and implicit schemes (usually incompressible).

Codes shown: Turbostream, SJTU RANS, SD++ (Stanford, Jameson), FEFLO (Lohner), Veloxi, Moldflow, AcuSolve, Moldex3D, ANSYS Fluent, Culises for OpenFOAM, SpeedIT for OpenFOAM, CFD-ACE+, FIRE

Speedups shown: ~15x, ~5x, and ~2x

Commercial CFD is mostly unstructured implicit.

NVIDIA Strategy for GPU-Accelerated CFD

Strategic alliances:
- Business and technical alliances with key ISVs (ANSYS, CD-adapco, etc.)
- Long-term technical collaboration on ANSYS Fluent acceleration
- Key technical collaborations with the CFD research community: TiTech (Aoki), Stanford (Jameson), Oxford (Giles), Wyoming (Mavriplis), and others

Software development:
- NVIDIA linear solver toolkit, with emphasis on AMG for industry CFD
- Investment in relevant high-order methods (DGM, flux reconstruction, etc.)

Applications support:
- Direct developer support for a range of ISV and customer requests
- Implicit schemes: integration support for libraries and the solver toolkit
- Explicit schemes: stencil libraries and OpenACC support for Fortran

Primary Commercial CAE and GPU Progress

ISV | Primary applications (green color indicates CUDA-ready during 2013)
ANSYS | ANSYS Mechanical; ANSYS Fluent; ANSYS HFSS
DS SIMULIA | Abaqus/Standard; Abaqus/Explicit; Abaqus/CFD
MSC Software | MSC Nastran; Marc; Adams
Altair | RADIOSS; AcuSolve
CD-adapco | STAR-CD; STAR-CCM+
Autodesk | AS Mechanical; Moldflow; AS CFD
ESI Group | PAM-CRASH imp; CFD-ACE+
Siemens | NX Nastran
LSTC | LS-DYNA; LS-DYNA CFD
Mentor | FloEFD; FloTherm
Metacomp | CFD++

Additional Commercial GPU Developments

ISV | Domain | Location | Primary applications
FluiDyna | CFD | Germany | Culises for OpenFOAM; LBultra
Vratis | CFD | Poland | Speed-IT for OpenFOAM; ARAEL
Prometech | CFD | Japan | Particleworks
Turbostream | CFD | England, UK | Turbostream
IMPETUS | Explicit FEA | Sweden | AFEA
AVL | CFD | Austria | FIRE
CoreTech | CFD (molding) | Taiwan | Moldex3D
Intes | Implicit FEA | Germany | PERMAS
Next Limit | CFD | Spain | XFlow
CPFD | CFD | USA | BARRACUDA
Flow Science | CFD | USA | FLOW-3D
SCSK | Implicit FEA | Japan | ADVENTURECluster
CDH | Implicit FEA | Germany | AMLS; FastFRS
FunctionBay | MB Dynamics | S. Korea | RecurDyn
Cradle Software | CFD | Japan | SC/Tetra; scSTREAM

Status Summary of ISVs and GPU Acceleration

- Every primary ISV has products available on GPUs or under ongoing evaluation
- The 4 largest ISVs (ANSYS, SIMULIA, MSC Software, Altair) all have GPU-based products, some in their 3rd generation
- 4 of the top 5 ISV applications are available on GPUs today: ANSYS Fluent, ANSYS Mechanical, Abaqus/Standard, MSC Nastran, . . . LS-DYNA implicit only
- Several new ISVs were founded with GPUs as a primary competitive strategy: Prometech, FluiDyna, Vratis, IMPETUS, Turbostream
- The open-source CFD code OpenFOAM is available on GPUs today with many options: commercial options from FluiDyna and Vratis; open-source options Cufflink, Symscape ofgpu, RAS, etc.

Basics of GPU Computing for ISV Software

- ISV software use of GPU acceleration is user-transparent
  - Jobs launch and complete without additional user steps
  - The user informs the ISV application (GUI, command) that a GPU exists
- The CPU begins and ends the job; the GPU handles the heavy computations

Schematic of an x86 CPU with an attached GPU accelerator: the CPU (with cache and DDR memory) connects through the I/O hub over PCI-Express to the GPU (with GDDR memory).

1. ISV job launched on CPU
2. Solver operations sent to GPU
3. GPU sends results back to CPU
4. ISV job completes on CPU

Commercial CFD Focus on Sparse Solvers

Schematic of a CFD application split between CPU and GPU:
- CPU: read input, matrix set-up
- GPU: implicit sparse matrix operations, 40% to 65% of profile time but a small % of lines of code; implemented with hand-written CUDA, GPU libraries (CUBLAS), or OpenACC directives (OpenACC is being investigated for moving more tasks to the GPU)
- CPU: global solution, write output

NVIDIA Offers an Accelerated Solver Toolkit

- Toolkit of linear solvers, preconditioners, and other components for large sparse Ax=b
- Available schemes include:
  - AMG, a multi-level scheme popular with several commercial CFD codes
  - Jacobi, BiCGStab, FGMRES, MC-DILU, and others
- Use of the NVIDIA linear solver toolkit for industry-ready CFD:
  - ANSYS 14.5 collaboration introduced their AMG-GPU solver in Nov 2012
  - FluiDyna collaboration on the Culises 2.0 AMG solver library for OpenFOAM
  - Other ISVs and customer CFD software undergoing evaluation . . .

Goals:
- Accelerate state-of-the-art multi-level linear solvers in targeted application domains
  - Primary targets: CFD and reservoir simulation; other domains will follow
- Focus on difficult-to-parallelize algorithms
  - Parallelize both the setup and solve phases
  - Difficult problems: parallel graph algorithms, sparse matrix manipulation, parallel smoothers
  - No groups have successfully mapped production-quality algorithms to fine-grained parallel architectures
- Ensure the NVIDIA architecture team understands these applications and is influenced by them
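The simplest scheme on the list, Jacobi iteration, shows why these solvers parallelize well: each row of the new iterate depends only on the previous iterate. The sketch below assumes compressed sparse row (CSR) storage, and the name `jacobi_csr` is hypothetical; it is not the toolkit's API, which this deck does not describe. In production CFD, Jacobi mostly serves as a smoother or preconditioner inside AMG or a Krylov method such as BiCGStab or FGMRES rather than as a standalone solver.

```python
def jacobi_csr(indptr, indices, data, b, tol=1e-10, max_iter=1000):
    """Solve Ax = b by plain Jacobi iteration, with A stored in CSR form.

    The nonzeros of row i are data[indptr[i]:indptr[i+1]], located in
    columns indices[indptr[i]:indptr[i+1]]. Converges for, e.g., strictly
    diagonally dominant A. Returns (x, iterations_used).
    """
    n = len(b)
    diag = [0.0] * n
    for i in range(n):
        for p in range(indptr[i], indptr[i + 1]):
            if indices[p] == i:
                diag[i] = data[p]
    if any(d == 0.0 for d in diag):
        raise ValueError("Jacobi needs a nonzero diagonal")

    x = [0.0] * n
    for it in range(1, max_iter + 1):
        # Each row update reads only the previous iterate, so rows are
        # independent: one GPU thread (or warp) per row is a natural mapping.
        x_new = []
        for i in range(n):
            s = b[i]
            for p in range(indptr[i], indptr[i + 1]):
                if indices[p] != i:
                    s -= data[p] * x[indices[p]]
            x_new.append(s / diag[i])
        change = max(abs(a - c) for a, c in zip(x_new, x))
        x = x_new
        if change < tol:
            return x, it
    return x, max_iter


if __name__ == "__main__":
    # 4x4 tridiagonal system [-1, 4, -1] whose exact solution is all ones
    indptr = [0, 2, 5, 8, 10]
    indices = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3]
    data = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
    x, iters = jacobi_csr(indptr, indices, data, [3.0, 2.0, 2.0, 3.0])
    print(x, iters)
```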

GPU Developments for Aircraft CFD

Developer | Location | Software (green color indicates GPU-ready during 2013)

External aero:
NASA | USA | OVERFLOW
NASA | USA | FUN3D
AFRL | USA | AVUS
ONERA | France | elsA
Stanford/Jameson | USA | SD++
JAXA | Japan | UPACS
ANSYS | USA | ANSYS Fluent 15.0
CD-adapco | USA/UK | STAR-CCM+
Metacomp | USA | CFD++

Internal flows:
ANSYS | USA | ANSYS Fluent 15.0
FluiDyna | Germany | Culises for OpenFOAM 2.2.0
Vratis | Poland | Speed-IT for OpenFOAM 2.2.0
CD-adapco | USA/UK | STAR-CCM+

GPU Developments for Turbine Engine CFD

Application areas: turbomachinery, combustor, nozzle / noise

Developer | Location | Software (green color indicates CUDA-ready during 2013)
Turbostream | England, UK | Turbostream 3.0
Oxford / Rolls Royce | England, UK | OP2 / Hydra
ANSYS | USA | ANSYS CFD 15.0 (Fluent + CFX)
ANSYS | USA | ANSYS Fluent 15.0
FluiDyna | Germany | Culises for OpenFOAM 2.2.0
Vratis | Poland | Speed-IT for OpenFOAM 2.2.0
Cascade Technologies | USA | CHARLES
Convergent Science | USA | Converge CFD
Sandia NL / Oak Ridge NL | USA | S3D
Naval Research Lab | USA | JENRE
Aviadvigatel OJSC | Russia | GHOST CFD

GPU Status of Select Automotive CAE Software

Automotive CAE application | ISV software | GPU status
CSM: durability (stress) and fatigue | MSC Nastran | Available today
Road handling and VPG | Adams (for MBD) | Evaluation
Powertrain stress analysis | Abaqus/Standard | Available today
Body NVH | MSC Nastran | Available today
Crashworthiness and safety | LS-DYNA | Implicit only, beta
CFD: aerodynamics / thermal UH (underhood) | ANSYS Fluent | Available today, beta
IC engine combustion | STAR-CCM+ | Evaluation
Aerodynamics / HVAC | OpenFOAM | Available today
Plastic mold injection | Moldflow | Available today


Particle-Based Commercial CFD Software Growing

ISV software | Application | Method | GPU status
PowerFLOW | Aerodynamics | LBM | Evaluation
LBultra | Aerodynamics | LBM | Available v2.0
XFlow | Aerodynamics | LBM | Evaluation
Project Falcon | Aerodynamics | LBM | Evaluation
Particleworks | Multiphase/FS | MPS (~SPH) | Available v3.5
BARRACUDA | Multiphase/FS | MP-PIC | In development
EDEM | Discrete phase | DEM | In development
ANSYS Fluent–DDPM | Multiphase/FS | DEM | In development
STAR-CCM+ | Multiphase/FS | DEM | Evaluation
AFEA | High impact | SPH | Available v2.0
ESI | High impact | SPH, ALE | In development
LSTC | High impact | SPH, ALE | Evaluation
Altair | High impact | SPH, ALE | Evaluation

TiTech Aoki Lab LBM Solution of External Flows

- Aoki CFD solver using the lattice Boltzmann method (LBM) with large-eddy simulation (LES)
- "A Peta-scale LES (Large-Eddy Simulation) for Turbulent Flows Based on Lattice Boltzmann Method", Prof. Dr. Takayuki Aoki, http://registration.gputechconf.com/quicklink/8Is4ClC
- www.sim.gsic.titech.ac.jp
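LBM suits GPUs because the collision step is purely local to each lattice node and the streaming step only moves data to nearest neighbours. The sketch below is a textbook two-dimensional D2Q9 lattice with a single-relaxation-time (BGK) collision and periodic boundaries, written in Python as an illustration under those assumptions; the Aoki solver is three-dimensional and adds an LES turbulence model, and none of its code is reproduced here.

```python
# D2Q9 lattice: discrete velocities e_q and weights w_q (lattice units, c_s^2 = 1/3)
E = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    """Second-order equilibrium populations f_q^eq(rho, u)."""
    usq = ux * ux + uy * uy
    out = []
    for (ex, ey), w in zip(E, W):
        eu = ex * ux + ey * uy
        out.append(w * rho * (1.0 + 3.0 * eu + 4.5 * eu * eu - 1.5 * usq))
    return out


def moments(f):
    """Density and velocity recovered from the 9 populations at one node."""
    rho = sum(f)
    ux = sum(fq * ex for fq, (ex, _) in zip(f, E)) / rho
    uy = sum(fq * ey for fq, (_, ey) in zip(f, E)) / rho
    return rho, ux, uy


def collide(f, tau):
    """BGK collision: relax each population toward equilibrium (node-local)."""
    rho, ux, uy = moments(f)
    feq = equilibrium(rho, ux, uy)
    return [fq - (fq - fe) / tau for fq, fe in zip(f, feq)]


def stream(grid):
    """Move population q one lattice step along e_q (periodic boundaries).

    grid[x][y] holds the 9 populations of node (x, y).
    """
    nx, ny = len(grid), len(grid[0])
    out = [[[0.0] * 9 for _ in range(ny)] for _ in range(nx)]
    for x in range(nx):
        for y in range(ny):
            for q, (ex, ey) in enumerate(E):
                out[(x + ex) % nx][(y + ey) % ny][q] = grid[x][y][q]
    return out


def lbm_step(grid, tau):
    """One collide-then-stream time step over the whole lattice."""
    return stream([[collide(node, tau) for node in col] for col in grid])
```

On a GPU, the usual mapping is one thread per lattice node for the collision, with streaming handled as neighbour reads or writes; collision conserves mass and momentum at every node, and streaming conserves the total mass of the lattice.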

FluiDyna Lattice Boltzmann Solver LBultra

- FluiDyna: spin-off from TU Munich in 2006
- CFD solver using the lattice Boltzmann method (LBM)
- Demonstrated 25x speedup on a single GPU
- Multi-GPU ready
- Contact FluiDyna for license details

www.fluidyna.de
http://www.fluidyna.com/content/lbultra

Prometech and Particleworks for Particle CFD

- MPS-based method developed at the University of Tokyo [Prof. Koshizuka]
- Example: oil flow in an HB gearbox, Particleworks 3.0 on GPU vs. a 4-core Core i7 (courtesy of Prometech Software and Particleworks CFD Software)

http://www.prometech.co.jp
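MPS (moving particle semi-implicit) methods evaluate sums over nearby particles. One basic quantity is the particle number density, n_i = sum over j ≠ i of w(|r_j - r_i|), with the kernel w(r) = r_e/r - 1 inside an effective radius r_e, as in Koshizuka and Oka's original MPS formulation. The Python sketch below assumes that kernel and a brute-force neighbour loop; the function names are hypothetical, and Particleworks' own kernels and data structures are not described in this deck and may differ.

```python
import math


def mps_weight(r, re):
    """MPS kernel: w(r) = re/r - 1 for 0 < r < re, else 0."""
    if 0.0 < r < re:
        return re / r - 1.0
    return 0.0


def particle_number_density(positions, re):
    """n_i = sum over j != i of w(|r_j - r_i|), computed by brute force, O(N^2).

    positions is a list of (x, y) tuples. GPU codes would instead bin
    particles into cells of size re and search only neighbouring cells,
    giving one independent neighbour sum per particle (per thread).
    """
    n = []
    for i, (xi, yi) in enumerate(positions):
        total = 0.0
        for j, (xj, yj) in enumerate(positions):
            if i != j:
                total += mps_weight(math.hypot(xj - xi, yj - yi), re)
        n.append(total)
    return n


if __name__ == "__main__":
    # Two close particles see each other; the third lies outside re.
    print(particle_number_density([(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)], re=2.1))
```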

Agenda: GPU Acceleration for Applied CFD

- Overview of GPU Progress for CFD
- GPU Acceleration of ANSYS Fluent
- GPU Acceleration of OpenFOAM

ANSYS and NVIDIA Technical Collaboration

GPU features by release (ANSYS Mechanical / ANSYS Fluent / ANSYS EM):
- 13.0 (Dec 2010): Mechanical: SMP, single GPU, sparse and PCG/JCG solvers. EM: ANSYS Nexxim
- 14.0 (Dec 2011): Mechanical: + Distributed ANSYS; + multi-node support. Fluent: radiation heat transfer (beta). EM: ANSYS Nexxim
- 14.5 (Nov 2012): Mechanical: + multi-GPU support; + hybrid PCG; + Kepler GPU support. Fluent: + radiation HT; + GPU AMG solver (beta), single GPU. EM: ANSYS Nexxim
- 15.0 (Q4 2013): Mechanical: + CUDA 5 Kepler tuning. Fluent: + multi-GPU AMG solver; + CUDA 5 Kepler tuning. EM: ANSYS Nexxim; ANSYS HFSS (Transient)

ANSYS Fluent 14.5 and Radiation HT on GPU

Radiation HT applications:
- Underhood cooling
- Cabin comfort HVAC
- Furnace simulations
- Solar loads on buildings
- Combustor in turbine
- Electronics passive cooling

VIEWFAC utility: use on CPUs, GPUs, or both; ~2x speedup

RAY TRACING utility: uses the OptiX library from NVIDIA, with up to ~15x speedup (GPU only)

ANSYS Fluent Use of NVIDIA Solver Toolkit

- ANSYS Fluent 15.0 will offer a GPU-based AMG solver (Nov/Dec 2013), developed with support for MPI across multiple nodes and multiple GPUs
- Solver collaboration on the pressure-based coupled Navier-Stokes solver, with others to follow
- Early results published at Parallel CFD 2013, 20-24 May, Changsha, CN: "GPU-Accelerated Algebraic Multigrid for Applied CFD"

ANSYS Fluent CPU Profile for Coupled Solver

Flowchart of the non-linear iterations:
1. Assemble the linear system of equations (~35% of runtime)
2. Solve the linear system of equations Ax = b (~65% of runtime; accelerate this first)
3. Converged? No: return to step 1. Yes: stop.
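The ~35/65 split caps what accelerating the linear solve alone can deliver. A minimal Amdahl's-law sketch, assuming assembly keeps its CPU-only cost and ignoring CPU-GPU transfer overhead (both simplifications):

```python
def amdahl_speedup(p, s):
    """Overall job speedup when a fraction p of runtime is sped up by factor s.

    Assumes the remaining (1 - p) fraction, here linear-system assembly,
    keeps its original cost, and ignores host-device transfer overhead.
    """
    if not 0.0 <= p <= 1.0 or s <= 0.0:
        raise ValueError("need 0 <= p <= 1 and s > 0")
    return 1.0 / ((1.0 - p) + p / s)


if __name__ == "__main__":
    p = 0.65  # ~65% of runtime in the linear solve, per the profile above
    for s in (2.0, 5.0, 10.0, float("inf")):
        print(f"solver {s:>4}x -> job {amdahl_speedup(p, s):.2f}x")
```

With p = 0.65 the job speedup approaches 1/0.35 ≈ 2.86x as the solver speedup grows without bound, which is why moving more of the remaining work to the GPU matters.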

ANSYS Fluent 14.5 GPU Solver Convergence

nvAMG preview of ANSYS Fluent convergence behavior

Model FL5S1:
- Incompressible
- Flow in a bend
- 32K hex cells
- Coupled solver

[Chart: error/residuals on a log scale (1.0E+00 to 1.0E-08) vs. iteration number (1 to 141) for continuity and X-, Y-, Z-momentum, comparing NVAMG and FLUENT curves]

Numerical results (Mar 2012): the test for convergence at each iteration matches precise Fluent behavior.

ANSYS Fluent 14.5 GPU Acceleration

Preview of ANSYS Fluent 14.5 performance, by ANSYS, Aug 2012

Helix model:
- Helix geometry
- 1.2M tet cells
- Unsteady, laminar
- Coupled PBNS, DP
- AMG F-cycle on CPU
- AMG V-cycle on GPU

ANSYS Fluent AMG solver time (s), lower is better (all jobs: solver time only):
- 2 x Xeon X5650, only 1 core used: 2832 s on the dual-socket CPU vs. 517 s with a Tesla C2075 (5.5x)
- 2 x Xeon X5650, all 12 cores used: 933 s on the dual-socket CPU vs. 517 s with a Tesla C2075 (1.8x)

ANSYS Fluent with GPU-Based AMG Solver

ANSYS Fluent 14.5 performance, results by NVIDIA, Nov 2012: airfoil and aircraft models with hexahedral cells

[Chart: ANSYS Fluent AMG solver time per iteration (s, 0 to 9; lower is better; solver time only) for Airfoil (hex 784K) and Aircraft (hex 1798K), Tesla K20X vs. 3930K (6 cores); 2.4x speedup on both models]

CPU: 2 x Core-i7 3930K, only 6 cores used

Solver settings:
- CPU Fluent solver: F-cycle, agg8, DILU, 0 pre- and 3 post-smoothing sweeps
- GPU nvAMG solver: V-cycle, agg8, MC-DILU, 0 pre- and 3 post-smoothing sweeps
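The solver settings name a cycle type (V or F), a coarsening (agg8), a smoother (DILU or MC-DILU), and pre-/post-smoothing sweep counts (0 and 3). The sketch below illustrates only the V-cycle recursion with those sweep counts. It swaps in geometric multigrid on a 1D Poisson problem with a weighted-Jacobi smoother, both simplifying assumptions: algebraic multigrid builds its coarse levels from the matrix (for example by aggregation) rather than from a grid, and an F-cycle revisits the coarse levels more often than a V-cycle. Grid sizes must be 2^k - 1.

```python
def residual(u, f, h):
    """r = f - A u for A u = (2u_i - u_{i-1} - u_{i+1}) / h^2, zero Dirichlet ends."""
    n = len(u)
    r = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        r.append(f[i] - (2.0 * u[i] - left - right) / (h * h))
    return r


def smooth(u, f, h, sweeps, omega=2.0 / 3.0):
    """Weighted-Jacobi sweeps (a simple stand-in for DILU / MC-DILU)."""
    n = len(u)
    for _ in range(sweeps):
        new = []
        for i in range(n):
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < n - 1 else 0.0
            jac = 0.5 * (left + right + h * h * f[i])
            new.append((1.0 - omega) * u[i] + omega * jac)
        u = new
    return u


def restrict(r):
    """Full weighting from n = 2m + 1 fine points to m coarse points."""
    m = (len(r) - 1) // 2
    return [0.25 * r[2 * i] + 0.5 * r[2 * i + 1] + 0.25 * r[2 * i + 2] for i in range(m)]


def prolong(e, n):
    """Linear interpolation of a coarse correction back to n fine points."""
    u = [0.0] * n
    for i, v in enumerate(e):
        u[2 * i + 1] += v
        u[2 * i] += 0.5 * v
        u[2 * i + 2] += 0.5 * v
    return u


def v_cycle(u, f, h, pre=0, post=3):
    """One V-cycle; the defaults mirror the '0pre, 3post' smoothing above."""
    n = len(u)
    if n == 1:  # coarsest level: solve 2u/h^2 = f exactly
        return [0.5 * h * h * f[0]]
    u = smooth(u, f, h, pre)
    rc = restrict(residual(u, f, h))
    ec = v_cycle([0.0] * len(rc), rc, 2.0 * h, pre, post)
    u = [a + b for a, b in zip(u, prolong(ec, n))]
    return smooth(u, f, h, post)


if __name__ == "__main__":
    n = 63
    h = 1.0 / (n + 1)
    f = [1.0] * n  # -u'' = 1 on (0, 1), u(0) = u(1) = 0
    u = [0.0] * n
    for cycle in range(1, 11):
        u = v_cycle(u, f, h)
        print(cycle, max(abs(x) for x in residual(u, f, h)))
```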


GPUs and Distributed Cluster Computing

- The geometry is decomposed: partitions (1 to 4) are put on independent cluster nodes (N1 to N4) for CPU distributed parallel processing
- The nodes run distributed parallel using MPI to produce the global solution
- Execution on CPU + GPU: the GPUs (G1 to G4) run shared-memory parallel using OpenMP under the distributed parallel layer
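The two levels of parallelism can be sketched as a two-stage split of the cell list: first across cluster nodes (the MPI level), then across the GPUs within each node (the shared-memory level). Contiguous index blocks are an assumption made for brevity, and the function names are hypothetical; production codes partition unstructured meshes with graph partitioners to minimize communication between partitions.

```python
def block_ranges(n, parts):
    """Split range(n) into `parts` contiguous [start, end) blocks.

    Block sizes differ by at most 1.
    """
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def two_level_decomposition(n_cells, n_nodes, gpus_per_node):
    """Map (node, gpu) -> [start, end) cell range.

    Level 1 (distributed, e.g. MPI): cells are split across cluster nodes.
    Level 2 (within a node, e.g. OpenMP threads each driving one GPU):
    each node's partition is split again across that node's GPUs.
    """
    layout = {}
    for node, (lo, hi) in enumerate(block_ranges(n_cells, n_nodes)):
        for gpu, (a, b) in enumerate(block_ranges(hi - lo, gpus_per_node)):
            layout[(node, gpu)] = (lo + a, lo + b)
    return layout


if __name__ == "__main__":
    for key, (start, end) in sorted(two_level_decomposition(10, 2, 2).items()):
        print(key, start, end)
```

For example, a 14M-cell mesh on 4 nodes with 4 GPUs each gives every GPU a block of 875,000 cells under this even split.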

ANSYS Fluent for 3.6M Cell Aerodynamic Case

Multi-GPU acceleration of 16-core ANSYS Fluent 15.0 (preview), external aero

- CPU configuration: 16-core server node (2 x 8-core Xeon E5-2667)
- CPU + GPU configuration: the same node plus 4 x Tesla K20X GPUs (G1 to G4)
- Result: 2.9x solver speedup

ANSYS Fluent for 14M Cell Aerodynamic Case

ANSYS Fluent 15.0 preview performance, results by NVIDIA, Jun 2013

Truck body model:
- 14M mixed cells
- DES turbulence
- Coupled PBNS, SP
- Times for 1 iteration (all jobs: solver time only)
- AMG F-cycle on CPU
- GPU: preconditioned FGMRES with AMG

Configurations (CPU: Intel Xeon E5-2667, 2.90GHz; GPU: Tesla K20X):
- 1 node, 2 CPUs (12 cores total)
- 2 nodes, 4 CPUs (24 cores total); 8 GPUs (4 per node)
- 4 nodes, 8 CPUs (48 cores total); 16 GPUs (4 per node)

[Chart: ANSYS Fluent AMG solver time per iteration (s, 0 to 75; lower is better), CPU-only vs. CPU + Tesla K20X; bar values 69, 41, 28, 12, and 9 s; GPU speedups of 3.5x and 3.3x]

Agenda: GPU Acceleration for Applied CFD

- Overview of GPU Progress for CFD
- GPU Acceleration of ANSYS Fluent
- GPU Acceleration of OpenFOAM

2013: Further Expansion of the OpenFOAM Community

- ESI acquisition of OpenCFD from SGI during Sep 2012
- IDAJ acquired a majority stake of ICON during May 2013
- This year, 3 (up from 2) OpenFOAM global user events:
  - Apr 24-26, Frankfurt, DE: ESI OpenFOAM Users Conference (first ever), http://www.esi-group.com/corporate/events/2013/OpenFOAM2013; concentration on OpenFOAM from OpenCFD
  - Jun 11-14, Jeju, KR: 8th International OpenFOAM Workshop (first in Asia), http://www.openfoamworkshop2013.org/; concentration on OpenFOAM-extend and Wikki
  - Oct 24-25, Hamburg, DE: 7th Open Source CFD International Conference (ICON), http://www.opensourcecfd.com/conference2013/; concentration on both OpenFOAM and OpenFOAM-extend

NVIDIA Market Strategy for OpenFOAM

- Provide technical support for commercial GPU solver developments
  - FluiDyna Culises AMG solver library using the NVIDIA toolkit
  - Vratis Speed-IT library, development of CUSP-based AMG
- Alliances (but no development) with key OpenFOAM organizations
  - ESI and OpenCFD Foundation (H. Weller, M. Salari)
  - Wikki and the OpenFOAM-extend community (H. Jasak)
  - IDAJ in Japan and ICON in the UK, supporting both OpenFOAM and OpenFOAM-extend
- Conduct performance studies and customer benchmark evaluations
  - Collaborations: developers, customers, OEMs (Dell, SGI, HP, etc.)

Culises: CFD Solver Library for OpenFOAM

- FluiDyna: TU Munich spin-off from 2006
- Culises provides a linear solver library
- Culises requires only two edits to the OpenFOAM control file
- Multi-GPU ready
- Contact FluiDyna for license details

Culises easy-to-use AMG-PCG solver:
1. Download and license from http://www.FluiDyna.de
2. Automatic installation with a FluiDyna-provided script
3. Activate Culises and GPUs with 2 edits to the config file

Figure: config file for CPU-only vs. config file for CPU+GPU

www.fluidyna.de

Culises Coupling to OpenFOAM

Culises coupling is user-transparent.

www.fluidyna.de

OpenFOAM Speedups Based on CFD Application

GPU speedups for different industry cases (chart legend: job speedup, solver speedup, OpenFOAM CPU-only efficiency):
- Automotive: 1.6x
- Multiphase: 1.9x
- Thermal: 3.0x
- Pharma CFD: 2.2x
- Process CFD: 4.7x

The cases cover a range of model sizes and different solver schemes (Krylov, AMG-PCG, etc.).

www.fluidyna.de

FluiDyna Culises: CFD Solver for OpenFOAM

DrivAer: joint car body shape by BMW and Audi (http://www.aer.mw.tum.de/en/research-groups/automotive/drivaer)
- 36M cells (mixed type)
- GAMG on CPU
- AMGPCG on GPU
- Solver speedup of 7x for 2 CPUs + 4 GPUs

Mesh size (CPUs) | GPUs | Solver speedup | Job speedup
9M (2 CPUs) | +1 GPU | 2.5x | 1.36x
18M (2 CPUs) | +2 GPUs | 4.2x | 1.52x
36M (2 CPUs) | +4 GPUs | 6.9x | 1.67x

Source: "Culises: A Library for Accelerated CFD on Hybrid GPU-CPU Systems", Dr. Bjoern Landmann, FluiDyna, developer.download.nvidia.com/GTC/PDF/GTC2012/PresentationPDF/S0293-GTC2012-Culises-Hybrid-GPU.pdf; www.fluidyna.de
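The gap between solver and job speedup in the DrivAer table can be read through Amdahl's law. Assuming the non-solver work keeps its CPU-only cost and ignoring transfer overhead, a measured job speedup J and solver speedup S imply a CPU-only solver fraction p = (1 - 1/J)/(1 - 1/S). Under that model the three cases give p of roughly 0.44 to 0.47; this is a derived estimate, not a reported measurement, and `implied_solver_fraction` is a hypothetical helper.

```python
def implied_solver_fraction(job_speedup, solver_speedup):
    """Invert Amdahl's law for the CPU-only solver fraction p.

    From J = 1 / ((1 - p) + p / S) it follows that
    p = (1 - 1/J) / (1 - 1/S).
    """
    if solver_speedup <= 1.0:
        raise ValueError("solver speedup must exceed 1")
    return (1.0 - 1.0 / job_speedup) / (1.0 - 1.0 / solver_speedup)


if __name__ == "__main__":
    # (mesh, solver speedup, job speedup) from the DrivAer table
    cases = [("9M", 2.5, 1.36), ("18M", 4.2, 1.52), ("36M", 6.9, 1.67)]
    for mesh, s, j in cases:
        p = implied_solver_fraction(j, s)
        print(f"{mesh}: implied solver share of CPU-only runtime ~ {p:.2f}")
```

A solver share below one half explains why even a 6.9x solver speedup yields a job speedup of under 2x in these cases.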

Conclusions for Applied CFD on GPUs

- GPUs provide significant speedups for solver-intensive jobs
  - Improved product quality with higher-fidelity modeling
  - Shorter product engineering cycles with faster simulation turnaround
- Simulations recently considered impractical are now possible
  - Unsteady RANS and large-eddy simulation (LES) are practical in cost and time
  - Effective parameter optimization from a large increase in the number of jobs
