28
ORNL is managed by UT-Battelle for the US Department of Energy Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 Benjamin Hernandez, PhD [email protected] Advanced Data and Workflows Group Oak Ridge Leadership Computing Facility Oak Ridge National Laboratory

Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 · 2017-05-19 · 2 Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 Oak Ridge Leadership

  • Upload
    others

  • View
    8

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 · 2017-05-19 · 2 Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 Oak Ridge Leadership

ORNL is managed by UT-Battelle

for the US Department of Energy

Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1

Benjamin Hernandez, [email protected] Data and Workflows Group

Oak Ridge Leadership Computing Facility

Oak Ridge National Laboratory

Page 2: Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 · 2017-05-19 · 2 Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 Oak Ridge Leadership

2Exploratory Visualization of Petascale Particle

Data in Nvidia DGX-1

Oak Ridge Leadership Computing Facility (OLCF)

• Mission: Provide the computational and data resources required to solve the most challenging problems.

• Highly competitive user allocation programs (INCITE, ALCC).

• Projects receive 10x to 100x more resource than at other generally available centers.

• We partner with users to enable science & engineering breakthroughs.

Page 3: Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 · 2017-05-19 · 2 Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 Oak Ridge Leadership

3Exploratory Visualization of Petascale Particle

Data in Nvidia DGX-1

Sight:Exploratory Visualization of Scientific Data

• Client/Server architecture to provide high end visualization in laptops, desktops, and powerwalls.

• Heterogeneous scientific visualization

– Take advantage of both CPU & GPU resources within a node: DGX-1 use case.

– Advanced shading to enable new insights into data exploration.

• Parallel I/O & Data Staging

– “Pluggable” for in-situ visualization

• Lightweight tool

– Load your data

– Perform exploratory analysis

– Visualize/Save results

Page 4: Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 · 2017-05-19 · 2 Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 Oak Ridge Leadership

4Exploratory Visualization of Petascale Particle

Data in Nvidia DGX-1

Server (DGX-1)

Server (DGX-1)

Sight System Architecture (in progress)

Server (DGX-1 or multiGPU node)

CPU cores

Multi-GPU

OSPrayNvidiaOptix

VT

K-m

Lo

ca

l/P

ara

llel F

ile S

yste

m

HP

C C

luste

r

AD

IOS

I/O

Syste

m

VT

K-m

VisualizationFrames

Co

mp

ressio

n

Websockets

HTML Client

*OSPray and Nvidia Optix are finely tuned libraries for Ray Tracing in multicore and manycore architectures

Page 5: Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 · 2017-05-19 · 2 Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 Oak Ridge Leadership

5Exploratory Visualization of Petascale Particle

Data in Nvidia DGX-1

ADIOS

• ADIOS is an I/O framework

– Provides multiple methods to stage data to a staging area (on node, off node, off machine)

– Data output can be anything one wants

• Different methods allow for different types of data movement, aggregation, and arrangement to the storage system or to stream over the local-nodes, LAN, WAN

– It contains our own file format if you choose to use it (ADIOS-BP)

– Compress/decompress data in parallel

– Contains mechanisms to index and query data

Page 6: Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 · 2017-05-19 · 2 Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 Oak Ridge Leadership

6Exploratory Visualization of Petascale Particle

Data in Nvidia DGX-1

First Approach: OpenGL

VBO(points)

V.S.Apply transfer

function

G.S.Quads w/tex.

coords

F.S.Sphere gen. and Shading

Page 7: Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 · 2017-05-19 · 2 Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 Oak Ridge Leadership

7Exploratory Visualization of Petascale Particle

Data in Nvidia DGX-1

OpenGL Bindless Graphics

Initialization

Display

Address pointer

Vertices start from vboAddress to vboAddress + sizeof (float)*size

Page 8: Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 · 2017-05-19 · 2 Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 Oak Ridge Leadership

8Exploratory Visualization of Petascale Particle

Data in Nvidia DGX-1

(-1,-1)

(1,1)(-1,1)

(1,-1)

Fragment Shader sphere Generation

• Sphere equation:

𝑟2 = 𝑥 − 𝑥02 + 𝑦 − 𝑦0

2 + 𝑧 − 𝑧02

r = 1.0, z = 1.0

x = texCoord.x

y = texCoord.y

zz = 1.0 – x*x – y*y

if (zz <= 0.0) // removes fragments outside

discard;

// scale to the desired radius

// calculate diffuse illumination

Page 9: Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 · 2017-05-19 · 2 Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 Oak Ridge Leadership

9Exploratory Visualization of Petascale Particle

Data in Nvidia DGX-1

Results

Page 10: Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 · 2017-05-19 · 2 Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 Oak Ridge Leadership

10Exploratory Visualization of Petascale Particle

Data in Nvidia DGX-1

OpenGL Multi-GPU Rendering

• One MPI task for each device

– Easy to implement

– Each device initialize its GLX/EGL context

• Multi-threading. One thread per device

– In EGL is possible:

• Create the main context in the main thread:

mainCtx = eglCreateContext(display, config, 0, contextAttrs)

• Each additional thread create a shared context:

lclThrdCtx = eglCreateContext(display, config, mainCtx, contextAttrs);

• Implement some mutex/semaphores to sync any updates

– Vulkanize your viz !

• Devices are aware of other devices and can coordinate between each other

• That’s precisely NVIDIA Optix can do

Page 11: Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 · 2017-05-19 · 2 Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 Oak Ridge Leadership

11Exploratory Visualization of Petascale Particle

Data in Nvidia DGX-1

Second ApproachNvidia Optix Ray Tracing Engine

• “The OptiX API is an application framework for achieving optimal ray tracing performance on the GPU. It provides a simple, recursive, and flexible pipeline for accelerating ray tracing algorithms.”

• “Similar to OpenGL in doing the “heavy lifting” of ray tracing and leaving capability and technique to the developers”

– Plus it can use all GPUs available in your system

– Naturally fits material appearance and scene illumination

Page 12: Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 · 2017-05-19 · 2 Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 Oak Ridge Leadership

12Exploratory Visualization of Petascale Particle

Data in Nvidia DGX-1

• Optix provides eight programmable components, some of them are:

1. Ray generation

2. Intersection

3. Shading (closest hit)

Shadows (any hit)

Selector

• Shaders are CUDA like syntax

Nvidia Optix Programming Model

1

2

3

Page 13: Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 · 2017-05-19 · 2 Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 Oak Ridge Leadership

13Exploratory Visualization of Petascale Particle

Data in Nvidia DGX-1

Nvidia Optix Graph Nodes

• Geometry is defined by Graph nodes.

• A tree-like hierarchy where:

– Nodes at the bottom describes geometric objects.

– Nodes at the top describes collections of geometric objects.

Group

Transform Selector

GeometryGroup

GeometryInstance

GeometryInstance

GeometryGroup

GeometryGroup

Geometry Geometry

GeometryInstance

GeometryInstance

Geometry Geometry

Acceleration

AccelerationAcceleration

Page 14: Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 · 2017-05-19 · 2 Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 Oak Ridge Leadership

14Exploratory Visualization of Petascale Particle

Data in Nvidia DGX-1

Nvidia Optix Graph Nodes

• “Keep the hierarchy as flat as possible…”

GeometryGroup

GeometryInstance

Particles

Acceleration

But not too flat!

Group

GeometryGroup

GeometryInstance

Particles

Acceleration

AccelerationGeometry

Group

GeometryInstance

Particles

AccelerationGeometry

Group

GeometryInstance

Particles

Acceleration

Page 15: Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 · 2017-05-19 · 2 Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 Oak Ridge Leadership

15Exploratory Visualization of Petascale Particle

Data in Nvidia DGX-1

ResultsTest Systems

• Workstation

– CPU Intel Xeon 20 cores… 512 GB

– GPU Titan Z (2 Geforce Kepler GPU, 2x6 GB VRAM),

– Ubuntu 16, Nvidia Driver

• Rhea Node

– CPU Intel Xeon …

– GPU 2x Tesla K80 (4 Tesla Kepler GPU, 2x24 GB VRAM)

– Redhat 7. Nvidia Driver

• DGX-1

– CPU Intel…

– GPU 8x Tesla Pascal SMX, 8x16 GB VRAM

– Ubuntu 16, Nvidia Driver …

All systems:

CUDA 8.0Optix 4.0.2Acceleration Structure: TrbvhImage resolution: 1080pShading: Phong Illumination &

Ambient Occlusion

Page 16: Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 · 2017-05-19 · 2 Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 Oak Ridge Leadership

16Exploratory Visualization of Petascale Particle

Data in Nvidia DGX-1

Results. How fast is built the acceleration structure ? lower is better

1

10

100

1000

Workstation Rhea Node DGX-1

1 Million 10 Million 20 Million

808

Tim

e (

ms

) 108 85 80

Particles

1616

736

14741958

980

Page 17: Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 · 2017-05-19 · 2 Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 Oak Ridge Leadership

17Exploratory Visualization of Petascale Particle

Data in Nvidia DGX-1

ResultsPerformance, lower is better

0

25

50

75

100

125

150

175

200

225

250

275

Workstation Rhea node DGX-1

Frame rate (worst case)

30 59 127 600

ms

pe

r fr

am

e

Million particles

0

5

10

15

20

25

30

Workstation Rhea node DGX-1

Frame rate (best case)

30 59 127 600

Million particles

ms

pe

r fr

am

e

60 fps

30 fps

60 fps

30 fps

Page 18: Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 · 2017-05-19 · 2 Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 Oak Ridge Leadership

18Exploratory Visualization of Petascale Particle

Data in Nvidia DGX-1

Results

Page 19: Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 · 2017-05-19 · 2 Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 Oak Ridge Leadership

19Exploratory Visualization of Petascale Particle

Data in Nvidia DGX-1

Discussion

• DGX-1 can handle particle systems up to 10x larger in our test environment.

• For particle systems of the same size DGX-1 is 10x faster than the workstation system and 4.6x faster than Rhea node

– We expect for larger image resolutions DGX-1 speed up will increase.

• Our preliminary tests showed DGX-1 has enough compute power to drive a powerwall

– 3840 x 1080 @ 60 – 120 fps

• Test larger resolution

– Researchers usually are happy when

they can explore datasets even at 1 fps

Page 20: Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 · 2017-05-19 · 2 Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 Oak Ridge Leadership

20Exploratory Visualization of Petascale Particle

Data in Nvidia DGX-1

Discussion

• Nvidia Optix provides multi-GPU support with no hassle

– Test if Nvidia Optix leaves free resources for analysis tasks.

• Paging was removed in Optix 4.x

– DGX-1 includes 40 CPU cores and 512 RAM

– Using Nvidia Optix & OSPRay library will allow full system allocation to handle larger systems.

• Summit is likely to support EGL through the Nvidia GPU Drivers (do not take it as a fact or alternative fact neither!).

– Best if (pre)exascale visualization tools are 100 % CUDA compliant.

Page 21: Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 · 2017-05-19 · 2 Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 Oak Ridge Leadership

21Exploratory Visualization of Petascale Particle

Data in Nvidia DGX-1

Questions ?

Benjamin Hernandez, [email protected]

Advanced Data and Workflows Group

Oak Ridge Leadership Computing Facility

Oak Ridge National Laboratory

Acknowledgements:

Dylan Lacewell and the Nvidia Optix Team for their technical support.

Datasets provided by Cheng-Yu Shi and Leonid Zhigilei from the Computational Materials Group at University of Virginia.

This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

Page 22: Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 · 2017-05-19 · 2 Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 Oak Ridge Leadership

22Exploratory Visualization of Petascale Particle

Data in Nvidia DGX-1

Further Reading

• Tom True, Alina Alt (2013) “Configuring, Programming and Debugging Applications for Multiple GPUs” GTC 2013

• Wil Braithwaite “Multi-GPU Programming for Visual Computing” SIGGRAPH 2013

Available in GTC on Demand:

– http://on-demand-gtc.gputechconf.com/gtcnew/on-demand-gtc.php

• Adios Manual

– http://users.nccs.gov/~pnorbert/ADIOS-UsersManual-1.11.0.pdf

• Optix Tutorial Talks– http://on-demand-gtc.gputechconf.com/gtcnew/on-demand-

gtc.php?searchByKeyword=optix&searchItems=&sessionTopic=&sessionEvent=&sessionFormat=&submit=&select=+

Page 23: Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 · 2017-05-19 · 2 Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 Oak Ridge Leadership

23Exploratory Visualization of Petascale Particle

Data in Nvidia DGX-1

2018 INCITE Call for Proposals• The 2018 INCITE Call for Proposals opened April 17, 2017 and closes June 23, 2017.

• Features large allocations of computer time and supporting resources at the Argonne and Oak Ridge Leadership Computing Facility (LCF) centers, operated by the US Department of Energy (DOE) Office of Science.

• Soliciting research proposals for awards of time on the 27-petaflop Cray XK7 Titan, and the 10-petaflop IBM Blue Gene/Q, Mira. In addition, certain 2018 INCITE awards will receive time on Argonne’s new Intel/Cray system, a 9.65-petaflops system called Theta.

• The INCITE program seeks research proposals for capability computing

– Production simulations, including ensembles, that use a large fraction of the LCF systems, or

– Proposals that require the unique LCF architectural infrastructure for high-performance computing projects that cannot be performed elsewhere

• The INCITE program is open to US and non-US based researchers.

• The INCITE program invites you to participate in an INCITE Proposal Writing Webinar, offered on April 19, May 18, and June 6.

• For more information visit http://www.doeleadershipcomputing.org/

Page 24: Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 · 2017-05-19 · 2 Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 Oak Ridge Leadership

24Exploratory Visualization of Petascale Particle

Data in Nvidia DGX-1

ResultsHow fast is built the acceleration structure ?

Time(%) Time Calls Avg Min Max Name

49.63% 53.663ms 111 483.45us 1.3430us 2.5468ms [CUDA memcpy HtoD]

22.64% 24.478ms 10 2.4478ms 2.4320ms 2.4768ms Megakernel_CUDA_1

20.43% 22.090ms 22 1.0041ms 3.7440us 2.2334ms [CUDA memcpy DtoH]

6.74% 7.2833ms 1 7.2833ms 7.2833ms 7.2833ms Megakernel_CUDA_0

0.30% 324.41us 1 324.41us 324.41us 324.41us [CUDA memcpy HtoA]

0.21% 231.71us 23 10.074us 1.2480us 150.49us [CUDA memset]

0.05% 55.903us 44 1.2700us 1.2150us 1.6320us [CUDA memcpy DtoD]

Time(%) Time Calls Avg Min Max Name

50.29% 43.010ms 133 323.38us 1.2800us 1.0984ms [CUDA memcpy HtoD]

34.61% 29.596ms 44 672.63us 3.3280us 1.0593ms [CUDA memcpy DtoH]

9.48% 8.1082ms 10 810.82us 623.39us 1.2392ms Megakernel_CUDA_1

5.03% 4.2986ms 1 4.2986ms 4.2986ms 4.2986ms Megakernel_CUDA_0

0.36% 303.77us 23 13.207us 1.1520us 211.29us [CUDA memset]

0.17% 148.67us 1 148.67us 148.67us 148.67us [CUDA memcpy HtoA]

0.06% 51.008us 44 1.1590us 1.1200us 1.1840us [CUDA memcpy DtoD]

Time(%) Time Calls Avg Min Max Name

43.50% 35.509ms 133 266.99us 1.1200us 1.1818ms [CUDA memcpy HtoD]

30.77% 25.122ms 44 570.95us 1.2800us 1.1629ms [CUDA memcpy DtoH]

18.13% 14.800ms 44 336.36us 1.7920us 371.84us [CUDA memcpy PtoP]

6.16% 5.0304ms 10 503.04us 332.39us 889.03us Megakernel_CUDA_1

1.11% 904.00us 1 904.00us 904.00us 904.00us Megakernel_CUDA_0

0.14% 112.03us 1 112.03us 112.03us 112.03us [CUDA memcpy HtoA]

0.13% 108.03us 23 4.6970us 1.0560us 60.032us [CUDA memset]

0.06% 47.264us 44 1.0740us 1.0560us 1.3440us [CUDA memcpy DtoD]

Workstation

Rhea node

DGX-1

Page 25: Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 · 2017-05-19 · 2 Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 Oak Ridge Leadership

25Exploratory Visualization of Petascale Particle

Data in Nvidia DGX-1

ResultsHow fast is built the acceleration structure ?

Time(%) Time Calls Avg Min Max Name

49.63% 53.663ms 111 483.45us 1.3430us 2.5468ms [CUDA memcpy HtoD]

22.64% 24.478ms 10 2.4478ms 2.4320ms 2.4768ms Megakernel_CUDA_1

20.43% 22.090ms 22 1.0041ms 3.7440us 2.2334ms [CUDA memcpy DtoH]

6.74% 7.2833ms 1 7.2833ms 7.2833ms 7.2833ms Megakernel_CUDA_0

0.30% 324.41us 1 324.41us 324.41us 324.41us [CUDA memcpy HtoA]

0.21% 231.71us 23 10.074us 1.2480us 150.49us [CUDA memset]

0.05% 55.903us 44 1.2700us 1.2150us 1.6320us [CUDA memcpy DtoD]

Time(%) Time Calls Avg Min Max Name

50.29% 43.010ms 133 323.38us 1.2800us 1.0984ms [CUDA memcpy HtoD]

34.61% 29.596ms 44 672.63us 3.3280us 1.0593ms [CUDA memcpy DtoH]

9.48% 8.1082ms 10 810.82us 623.39us 1.2392ms Megakernel_CUDA_1

5.03% 4.2986ms 1 4.2986ms 4.2986ms 4.2986ms Megakernel_CUDA_0

0.36% 303.77us 23 13.207us 1.1520us 211.29us [CUDA memset]

0.17% 148.67us 1 148.67us 148.67us 148.67us [CUDA memcpy HtoA]

0.06% 51.008us 44 1.1590us 1.1200us 1.1840us [CUDA memcpy DtoD]

Time(%) Time Calls Avg Min Max Name

43.50% 35.509ms 133 266.99us 1.1200us 1.1818ms [CUDA memcpy HtoD]

30.77% 25.122ms 44 570.95us 1.2800us 1.1629ms [CUDA memcpy DtoH]

18.13% 14.800ms 44 336.36us 1.7920us 371.84us [CUDA memcpy PtoP]

6.16% 5.0304ms 10 503.04us 332.39us 889.03us Megakernel_CUDA_1

1.11% 904.00us 1 904.00us 904.00us 904.00us Megakernel_CUDA_0

0.14% 112.03us 1 112.03us 112.03us 112.03us [CUDA memcpy HtoA]

0.13% 108.03us 23 4.6970us 1.0560us 60.032us [CUDA memset]

0.06% 47.264us 44 1.0740us 1.0560us 1.3440us [CUDA memcpy DtoD]

Workstation

Rhea node

DGX-1

Page 26: Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 · 2017-05-19 · 2 Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 Oak Ridge Leadership

26Exploratory Visualization of Petascale Particle

Data in Nvidia DGX-1

ADIOS I/O

• Abstracting metadata, data types, and dimensions from the source code into an XML file C

Fortran

POSIXMPIMPI_LUSTREPHDF5…DATASPACESDIMESFLEXPATHICEE

zlib, bzip2, szip, zfp, isobar

Alacrity

all data in adios_write() calls are buffered before writing to

the file system.

Page 27: Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 · 2017-05-19 · 2 Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 Oak Ridge Leadership

27Exploratory Visualization of Petascale Particle

Data in Nvidia DGX-1

ADIOS I/O

• Generate the c-code from the XML file

gwrite_Atoms.ch

gread_Atoms.ch

• Both files contains code to write and read ADIOS files

– You only need to modify your XML file an generate new *.ch files

– Main code remains the same

gpp.py atoms.xml

Page 28: Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 · 2017-05-19 · 2 Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1 Oak Ridge Leadership

28Exploratory Visualization of Petascale Particle

Data in Nvidia DGX-1

ADIOS – Write/Read example

Write Read