Accelerating the ANSYS Fluent R18.0 Radiation Solver with OpenACC

Sunil Sathe, Lead Software Developer

© 2016 ANSYS, Inc. November, 2016

Outline

• Fluent heterogeneous computing (HTC) infrastructure

• PGI/OpenACC for the HTC infrastructure

• Build and execution model

• Discrete ordinate (DO) radiation solver options

• CPU/GPU co-computing

• Sample OpenACC pragma in HTC

• Performance

• Summary

Fluent HTC infrastructure

[Figure: domain decomposition across MPI Rank 0 to MPI Rank 3, each rank paired with a GPU; the legend distinguishes CPU cells from GPU cells.]

• Case: CPU Domain (CPU Cells, CPU Faces) and GPU Domain (GPU Cells, GPU Faces)

• Abstract Domain, Abstract Cells, Abstract Faces

• OpenACC Code, Algorithms, Data Structure

PGI/OpenACC for the HTC infrastructure

● Hardware portability

○ NVIDIA GPUs

○ Intel x86

○ Multi-core

○ OpenPower

○ ARM

● OS portability

○ Windows

○ Linux

● Performance portability

○ Competitive performance with best-in-class compilers and programming models on individual platforms

● Ease of programming model

○ Simple pragma directives

Build and execution model

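[Diagram: build and execution model]

• Fluent Native Source Code is compiled into fluent_mpi.x.y.z

• Fluent HTC Source Code is compiled with pgc++ with OpenACC support into libhtc.so

• libhtc.so is dynamically loaded by fluent_mpi.x.y.z at execution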

Discrete ordinate radiation solver options

[Figure: four configurations, each showing the Fluent Native Solver, the Fluent-HTC DO Solver, and the CPU and GPU it runs on.]

• CPU computation in Fluent native DO solver

• CPU computation in Fluent-HTC DO solver

• GPU computation in Fluent-HTC DO solver

• CPU/GPU hybrid computation in Fluent-HTC DO solver

CPU/GPU co-computing

[Timeline: launch GPU kernels asynchronously → compute on CPU simultaneously → wait for GPU to finish]

● Divide work between CPU and GPU

● Run the same code on both CPU and GPU

● Use OpenACC “loop” pragma to accelerate the GPU work
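
The co-computing pattern above can be sketched in C as follows. This is a minimal illustration, not the Fluent-HTC source: the names `clamped_ratio`, `update_doi_cocompute` and `n_gpu` are hypothetical, and the split point between GPU and CPU cells is assumed to be chosen by the caller. The per-cell routine is shared, so the same code runs on both devices.

```c
#include <assert.h>

/* Shared per-cell work, compiled for both host and device (hypothetical name). */
#pragma acc routine seq
static double clamped_ratio(double s, double ap)
{
    double v = s / ap;
    return v < 0.0 ? 0.0 : v;
}

/* Cells [0, n_gpu) go to the GPU, cells [n_gpu, nc) stay on the CPU.
 * The split is an assumption of this sketch, supplied by the caller. */
void update_doi_cocompute(double *doi, const double *ap, const double *s,
                          int nc, int n_gpu)
{
    int c;
    assert(nc >= 0 && n_gpu >= 0 && n_gpu <= nc);

    /* Only the GPU share is moved; copyout returns that share on exit. */
    #pragma acc data copyin(ap[0:n_gpu], s[0:n_gpu]) copyout(doi[0:n_gpu])
    {
        /* 1. Launch the GPU kernel asynchronously; the host continues. */
        #pragma acc parallel loop async(0) present(doi, ap, s)
        for (c = 0; c < n_gpu; c++)
            doi[c] = clamped_ratio(s[c], ap[c]);

        /* 2. Compute the remaining cells on the CPU in the meantime. */
        for (c = n_gpu; c < nc; c++)
            doi[c] = clamped_ratio(s[c], ap[c]);

        /* 3. Wait for the GPU queue to drain before leaving the region. */
        #pragma acc wait(0)
    }
}
```

A compiler without OpenACC support ignores the directives, in which case both loops simply run on the CPU and give the same result.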

Sample OpenACC pragma in HTC

#pragma acc parallel loop async(0) present(doi,ap,s)
for(c=0;c<nc;c++){
    doi[c] = s[c]/ap[c];
    if(doi[c]<0.0) doi[c] = 0.0;
}

● Use asynchronous kernel calls to allow co-computing on CPU

● Use explicit memory upload/download to always enable usage of “present” clause
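
One way to arrange the explicit upload/download around the sample kernel is sketched below, using unstructured OpenACC data directives. The helper names (`upload_fields`, `compute_doi_async`, `download_result`) are hypothetical and the slides do not show how Fluent-HTC organizes its transfers; the point is only that once the arrays are placed on the device explicitly, every kernel can rely on `present`.

```c
#include <assert.h>

/* Hypothetical helper: upload the inputs once and allocate the output
 * on the device, so later kernels can use present(). */
void upload_fields(const double *ap, const double *s, double *doi, int nc)
{
    assert(nc > 0);
    #pragma acc enter data copyin(ap[0:nc], s[0:nc]) create(doi[0:nc])
}

/* The sample kernel from the slide: asynchronous on queue 0,
 * with no implicit data movement. */
void compute_doi_async(double *doi, const double *ap, const double *s, int nc)
{
    int c;
    #pragma acc parallel loop async(0) present(doi, ap, s)
    for (c = 0; c < nc; c++) {
        doi[c] = s[c] / ap[c];
        if (doi[c] < 0.0) doi[c] = 0.0;
    }
}

/* Hypothetical helper: wait for queue 0, download the result,
 * then release the device copies. */
void download_result(double *doi, const double *ap, const double *s, int nc)
{
    assert(nc > 0);
    #pragma acc wait(0)
    #pragma acc update self(doi[0:nc])
    #pragma acc exit data delete(ap[0:nc], s[0:nc], doi[0:nc])
}
```

The intended call order is `upload_fields`, then any number of `compute_doi_async` calls (with CPU work in between if desired), then `download_result`.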

OpenACC performance in HTC

Problem setup:

− Head lamp simulation of 1.4M and 11.5M cases

− Volume monitor on incident radiation

− Convergence criterion of 1e-3 on volume monitor

CPU Hardware: (Haswell EP) Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz, 2 sockets x 14 = 28 cores

GPGPU Hardware: Tesla K80 12+12 GB, Driver 346.46

OpenACC performance in HTC (cont)

[Chart: speedup labels 7.8x, 6.1x, 3.6x, 3.1x]

OpenACC performance in HTC (cont)

Problem setup:

− 0.58M case

− 6x6 DO resolution with 3 bands

− Flow + Energy + DO

− Single precision

− 200 iterations

CPU Hardware: (Haswell EP) Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz, 2 sockets x 14 = 28 cores

GPGPU Hardware: Tesla K80 12+12 GB, Driver 346.46
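
To give a feel for the size of this DO problem, the count of transported intensities can be estimated. This is a rough sketch under assumptions the slides do not state: that "6x6" means 6 theta by 6 phi divisions per octant, that the case is 3D (8 octants), and that every band solves the full set of directions.

```latex
N_{\text{dir}} = 8\,N_\theta N_\phi = 8 \cdot 6 \cdot 6 = 288
\qquad
N_{\text{eq}} = N_{\text{bands}} \cdot N_{\text{dir}} = 3 \cdot 288 = 864
```

Under those assumptions, each of the 0.58M cells carries 864 intensity unknowns per iteration.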

OpenACC performance in HTC (cont)

[Chart: speedup labels 6.7x, 6.3x, 4.3x, 2.4x]

Summary and future work

• Achievements

− Effectively using OpenACC for heterogeneous computing in Fluent

− Impressive performance achieved in Fluent with the OpenACC programming model

• Future work

− Extend the heterogeneous computing framework to more models

− Investigate more platforms like OpenPower and multi-core