ORNL is managed by UT-Battelle
for the US Department of Energy
Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1
Benjamin Hernandez, [email protected]
Advanced Data and Workflows Group
Oak Ridge Leadership Computing Facility
Oak Ridge National Laboratory
Oak Ridge Leadership Computing Facility (OLCF)
• Mission: Provide the computational and data resources required to solve the most challenging problems.
• Highly competitive user allocation programs (INCITE, ALCC).
• Projects receive 10x to 100x more resources than at other generally available centers.
• We partner with users to enable science & engineering breakthroughs.
Sight: Exploratory Visualization of Scientific Data
• Client/server architecture to provide high-end visualization on laptops, desktops, and powerwalls.
• Heterogeneous scientific visualization
– Take advantage of both CPU & GPU resources within a node: DGX-1 use case.
– Advanced shading to enable new insights into data exploration.
• Parallel I/O & Data Staging
– “Pluggable” for in-situ visualization
• Lightweight tool
– Load your data
– Perform exploratory analysis
– Visualize/Save results
Sight System Architecture (in progress)
[Architecture diagram] Server (DGX-1 or multi-GPU node): CPU cores run OSPRay and VTK-m; the GPUs run NVIDIA OptiX and VTK-m. Data is staged from the local/parallel file system or an HPC cluster through the ADIOS I/O system; rendered visualization frames are compressed and streamed over WebSockets to an HTML client.
*OSPRay and NVIDIA OptiX are finely tuned ray tracing libraries for multicore and manycore architectures, respectively.
ADIOS
• ADIOS is an I/O framework
– Provides multiple methods to stage data to a staging area (on node, off node, off machine)
– Data output can be anything one wants
• Different methods allow for different types of data movement, aggregation, and arrangement to the storage system, or for streaming over local nodes, LAN, or WAN
– Provides its own file format (ADIOS-BP) if you choose to use it
– Compresses/decompresses data in parallel
– Contains mechanisms to index and query data
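As an illustration of the XML-driven setup, a minimal ADIOS 1.x configuration might look like the sketch below; the group name "atoms" and its variables are hypothetical, not taken from the talk:

```xml
<adios-config host-language="C">
  <!-- variables the application will write -->
  <adios-group name="atoms">
    <var name="natoms"    type="integer"/>
    <var name="positions" type="real" dimensions="natoms,3"/>
  </adios-group>
  <!-- the transport method can be swapped (POSIX, MPI, DATASPACES, ...)
       without changing application code -->
  <method group="atoms" method="MPI"/>
  <buffer size-MB="100" allocate-time="now"/>
</adios-config>
```

Switching the `method` attribute is how the same application code targets a file system, a staging area, or a remote stream.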
First Approach: OpenGL
• VBO (points)
• Vertex shader: apply transfer function
• Geometry shader: expand each point into a quad with texture coordinates
• Fragment shader: sphere generation and shading
OpenGL Bindless Graphics
• Initialization: query the GPU address pointer of the VBO
• Display: vertices span from vboAddress to vboAddress + sizeof(float) * size
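The two stages can be sketched with the NV bindless extension entry points (NV_shader_buffer_load / NV_vertex_buffer_unified_memory); this is untested pseudocode, and vboAddress, size, and numParticles are assumed variables:

```cpp
// Initialization: make the VBO resident and query its GPU address
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glMakeBufferResidentNV(GL_ARRAY_BUFFER, GL_READ_ONLY);
glGetBufferParameterui64vNV(GL_ARRAY_BUFFER, GL_BUFFER_GPU_ADDRESS_NV, &vboAddress);

// Display: bind the raw address range instead of a buffer object;
// vertices span vboAddress .. vboAddress + sizeof(float) * size
glEnableClientState(GL_VERTEX_ATTRIB_ARRAY_UNIFIED_NV);
glVertexAttribFormatNV(0, 3, GL_FLOAT, GL_FALSE, 0);
glBufferAddressRangeNV(GL_VERTEX_ATTRIB_ARRAY_ADDRESS_NV, 0, vboAddress, sizeof(float) * size);
glDrawArrays(GL_POINTS, 0, numParticles);
```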
Fragment Shader Sphere Generation
[Figure: screen-space quad with texture coordinates at corners (-1,-1), (1,-1), (-1,1), (1,1)]
• Sphere equation:
r² = (x − x₀)² + (y − y₀)² + (z − z₀)²
// unit sphere: r = 1.0, centered at the origin
x = texCoord.x
y = texCoord.y
zz = 1.0 - x*x - y*y   // zz = z²
if (zz <= 0.0)         // removes fragments outside the circle
    discard;
z = sqrt(zz)
// scale to the desired radius
// calculate diffuse illumination
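The discard test and depth reconstruction above are plain arithmetic, so they can be checked on the CPU; a minimal C++ sketch (the function name sphereDepth is hypothetical):

```cpp
#include <cmath>

// Reconstruct the impostor sphere depth from quad texture coordinates
// in [-1, 1]; returns false when the fragment lies outside the unit
// circle and would be discarded by the fragment shader.
bool sphereDepth(double x, double y, double& z) {
    double zz = 1.0 - x * x - y * y;  // z² from r² = x² + y² + z² with r = 1
    if (zz <= 0.0)
        return false;                 // outside the silhouette: discard
    z = std::sqrt(zz);                // front-facing depth
    return true;
}
```

At the quad center the depth is 1 (the sphere pole); exactly on the silhouette (x² + y² = 1) the fragment is discarded.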
Results
OpenGL Multi-GPU Rendering
• One MPI task for each device
– Easy to implement
– Each device initializes its own GLX/EGL context
• Multi-threading: one thread per device
– EGL makes this possible:
• Create the main context in the main thread:
mainCtx = eglCreateContext(display, config, 0, contextAttrs);
• Each additional thread creates a shared context:
lclThrdCtx = eglCreateContext(display, config, mainCtx, contextAttrs);
• Use mutexes/semaphores to synchronize any updates
– Vulkanize your viz!
• Devices are aware of other devices and can coordinate with each other
– That's precisely what NVIDIA OptiX can do
Second Approach: Nvidia Optix Ray Tracing Engine
• “The OptiX API is an application framework for achieving optimal ray tracing performance on the GPU. It provides a simple, recursive, and flexible pipeline for accelerating ray tracing algorithms.”
• “Similar to OpenGL in doing the “heavy lifting” of ray tracing and leaving capability and technique to the developers”
– Plus it can use all GPUs available in your system
– Naturally fits material appearance and scene illumination
Nvidia Optix Programming Model
• Optix provides eight programmable components, some of which are:
1. Ray generation
2. Intersection
3. Shading (closest hit)
4. Shadows (any hit)
5. Selector
• Shaders use CUDA-like syntax
Nvidia Optix Graph Nodes
• Geometry is defined by graph nodes.
• A tree-like hierarchy where:
– Nodes at the bottom describe geometric objects.
– Nodes at the top describe collections of geometric objects.
[Figure: example graph — a root Group with Transform and Selector children leading to GeometryGroups; each GeometryGroup holds GeometryInstances that reference Geometry objects, with Acceleration structures attached to the groups]
Nvidia Optix Graph Nodes
• “Keep the hierarchy as flat as possible…”
[Figure: a single GeometryGroup with one GeometryInstance holding the particles and one Acceleration structure]
• But not too flat!
[Figure: a root Group with its own Acceleration, containing several GeometryGroups; each holds a GeometryInstance with a subset of the particles and its own Acceleration structure]
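Using the OptiX C++ wrapper API, the flat layout could be assembled roughly as follows (an untested sketch; the variable names, the single-material setup, and the "top_object" variable from the OptiX samples are assumptions):

```cpp
optix::Context ctx = optix::Context::create();

optix::Geometry particles = ctx->createGeometry();   // holds particle buffers,
particles->setPrimitiveCount(numParticles);          // bounds + intersection programs

optix::GeometryInstance gi =
    ctx->createGeometryInstance(particles, &material, &material + 1);

optix::GeometryGroup gg = ctx->createGeometryGroup();
gg->setChildCount(1);
gg->setChild(0, gi);
gg->setAcceleration(ctx->createAcceleration("Trbvh", "Bvh"));  // builder used in the tests

ctx["top_object"]->set(gg);
```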
Results: Test Systems
• Workstation
– CPU: Intel Xeon, 20 cores… 512 GB RAM
– GPU: Titan Z (2 GeForce Kepler GPUs, 2x6 GB VRAM)
– Ubuntu 16, Nvidia driver
• Rhea node
– CPU: Intel Xeon …
– GPU: 2x Tesla K80 (4 Tesla Kepler GPUs, 2x24 GB VRAM)
– RedHat 7, Nvidia driver
• DGX-1
– CPU: Intel …
– GPU: 8x Tesla Pascal SMX, 8x16 GB VRAM
– Ubuntu 16, Nvidia driver …
All systems:
– CUDA 8.0
– OptiX 4.0.2
– Acceleration structure: Trbvh
– Image resolution: 1080p
– Shading: Phong illumination & ambient occlusion
Results: how fast is the acceleration structure built? (lower is better)
[Bar chart: build time in ms, log scale (1–1000 ms), for 1 million, 10 million, and 20 million particles on the Workstation, Rhea node, and DGX-1]
Results: Performance (lower is better)
[Two bar charts: ms per frame for 30, 59, 127, and 600 million particles on the Workstation, Rhea node, and DGX-1; left, worst-case frame rate (0–275 ms scale); right, best-case frame rate (0–30 ms scale); 30 fps and 60 fps reference lines shown]
Results
Discussion
• DGX-1 can handle particle systems up to 10x larger in our test environment.
• For particle systems of the same size, DGX-1 is 10x faster than the workstation and 4.6x faster than a Rhea node.
– We expect the DGX-1 speedup to increase at larger image resolutions.
• Our preliminary tests showed DGX-1 has enough compute power to drive a powerwall
– 3840 x 1080 @ 60–120 fps
• Next: test larger resolutions
– Researchers are usually happy when they can explore datasets even at 1 fps
Discussion
• Nvidia Optix provides multi-GPU support with no hassle
– Test whether Nvidia Optix leaves free resources for analysis tasks.
• Paging was removed in Optix 4.x
– DGX-1 includes 40 CPU cores and 512 GB RAM
– Using Nvidia Optix & the OSPRay library together will allow allocating the full system to handle larger particle systems.
• Summit is likely to support EGL through the Nvidia GPU drivers (do not take this as a fact, nor as an alternative fact!)
– Best if (pre)exascale visualization tools are 100% CUDA compliant.
Questions?
Benjamin Hernandez, [email protected]
Advanced Data and Workflows Group
Oak Ridge Leadership Computing Facility
Oak Ridge National Laboratory
Acknowledgements:
Dylan Lacewell and the Nvidia Optix Team for their technical support.
Datasets provided by Cheng-Yu Shi and Leonid Zhigilei from the Computational Materials Group at University of Virginia.
This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
Further Reading
• Tom True, Alina Alt (2013) “Configuring, Programming and Debugging Applications for Multiple GPUs” GTC 2013
• Wil Braithwaite “Multi-GPU Programming for Visual Computing” SIGGRAPH 2013
Available in GTC on Demand:
– http://on-demand-gtc.gputechconf.com/gtcnew/on-demand-gtc.php
• Adios Manual
– http://users.nccs.gov/~pnorbert/ADIOS-UsersManual-1.11.0.pdf
• Optix Tutorial Talks
– http://on-demand-gtc.gputechconf.com/gtcnew/on-demand-gtc.php?searchByKeyword=optix&searchItems=&sessionTopic=&sessionEvent=&sessionFormat=&submit=&select=+
2018 INCITE Call for Proposals
• The 2018 INCITE Call for Proposals opened April 17, 2017 and closes June 23, 2017.
• Features large allocations of computer time and supporting resources at the Argonne and Oak Ridge Leadership Computing Facility (LCF) centers, operated by the US Department of Energy (DOE) Office of Science.
• Soliciting research proposals for awards of time on the 27-petaflop Cray XK7, Titan, and the 10-petaflop IBM Blue Gene/Q, Mira. In addition, certain 2018 INCITE awards will receive time on Argonne’s new Intel/Cray system, a 9.65-petaflop machine called Theta.
• The INCITE program seeks research proposals for capability computing
– Production simulations, including ensembles, that use a large fraction of the LCF systems, or
– Proposals that require the unique LCF architectural infrastructure for high-performance computing projects that cannot be performed elsewhere
• The INCITE program is open to US and non-US based researchers.
• The INCITE program invites you to participate in an INCITE Proposal Writing Webinar, offered on April 19, May 18, and June 6.
• For more information visit http://www.doeleadershipcomputing.org/
Results: how fast is the acceleration structure built? (nvprof output)

Workstation:
Time(%) Time Calls Avg Min Max Name
49.63% 53.663ms 111 483.45us 1.3430us 2.5468ms [CUDA memcpy HtoD]
22.64% 24.478ms 10 2.4478ms 2.4320ms 2.4768ms Megakernel_CUDA_1
20.43% 22.090ms 22 1.0041ms 3.7440us 2.2334ms [CUDA memcpy DtoH]
6.74% 7.2833ms 1 7.2833ms 7.2833ms 7.2833ms Megakernel_CUDA_0
0.30% 324.41us 1 324.41us 324.41us 324.41us [CUDA memcpy HtoA]
0.21% 231.71us 23 10.074us 1.2480us 150.49us [CUDA memset]
0.05% 55.903us 44 1.2700us 1.2150us 1.6320us [CUDA memcpy DtoD]

Rhea node:
Time(%) Time Calls Avg Min Max Name
50.29% 43.010ms 133 323.38us 1.2800us 1.0984ms [CUDA memcpy HtoD]
34.61% 29.596ms 44 672.63us 3.3280us 1.0593ms [CUDA memcpy DtoH]
9.48% 8.1082ms 10 810.82us 623.39us 1.2392ms Megakernel_CUDA_1
5.03% 4.2986ms 1 4.2986ms 4.2986ms 4.2986ms Megakernel_CUDA_0
0.36% 303.77us 23 13.207us 1.1520us 211.29us [CUDA memset]
0.17% 148.67us 1 148.67us 148.67us 148.67us [CUDA memcpy HtoA]
0.06% 51.008us 44 1.1590us 1.1200us 1.1840us [CUDA memcpy DtoD]

DGX-1:
Time(%) Time Calls Avg Min Max Name
43.50% 35.509ms 133 266.99us 1.1200us 1.1818ms [CUDA memcpy HtoD]
30.77% 25.122ms 44 570.95us 1.2800us 1.1629ms [CUDA memcpy DtoH]
18.13% 14.800ms 44 336.36us 1.7920us 371.84us [CUDA memcpy PtoP]
6.16% 5.0304ms 10 503.04us 332.39us 889.03us Megakernel_CUDA_1
1.11% 904.00us 1 904.00us 904.00us 904.00us Megakernel_CUDA_0
0.14% 112.03us 1 112.03us 112.03us 112.03us [CUDA memcpy HtoA]
0.13% 108.03us 23 4.6970us 1.0560us 60.032us [CUDA memset]
0.06% 47.264us 44 1.0740us 1.0560us 1.3440us [CUDA memcpy DtoD]
ADIOS I/O
• Abstracts metadata, data types, and dimensions from the source code into an XML file (C and Fortran APIs)
• Transport methods: POSIX, MPI, MPI_LUSTRE, PHDF5, …, DATASPACES, DIMES, FLEXPATH, ICEE
• Compression: zlib, bzip2, szip, zfp, isobar
• Indexing/query: Alacrity
• All data in adios_write() calls are buffered before writing to the file system
ADIOS I/O
• Generate the C code from the XML file:
gpp.py atoms.xml
produces gwrite_Atoms.ch and gread_Atoms.ch
• Both files contain code to write and read ADIOS files
– You only need to modify your XML file and generate new *.ch files
– The main code remains the same
ADIOS – Write/Read Example
[Code screenshots: write example (left), read example (right)]
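In the spirit of the ADIOS 1.x C API, the write and read sides might look like the untested sketch below; the group "atoms", the variable names, and groupsize are illustrative only:

```cpp
// Write
adios_init("atoms.xml", comm);
int64_t fd;
adios_open(&fd, "atoms", "atoms.bp", "w", comm);
uint64_t total;
adios_group_size(fd, groupsize, &total);   // bytes this rank will write
adios_write(fd, "natoms", &natoms);
adios_write(fd, "positions", positions);   // buffered until close
adios_close(fd);
adios_finalize(rank);

// Read
ADIOS_FILE* f = adios_read_open_file("atoms.bp", ADIOS_READ_METHOD_BP, comm);
adios_schedule_read(f, NULL, "positions", 0, 1, positions);  // NULL selection: whole array
adios_perform_reads(f, 1);                                   // 1 = blocking
adios_read_close(f);
```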