SCIENTIFIC VISUALIZATION IN HPC

Peter Messmer, 4/4/2016

April 4-7, 2016 | Silicon Valley

on-demand.gputechconf.com/gtc/2016/presentation/s6645-peter-messmer-scivis.pdf




"Yes," said Deep Thought, "I can do it."

[Seven and a half million years later...]

“The Answer to the Great Question... Of Life, the Universe and Everything...

Is... Forty-two,” said Deep Thought, with infinite majesty and calm.

- Douglas Adams, The Hitchhiker’s Guide to the Galaxy

HIGH PERFORMANCE COMPUTING TODAY*

*mostly

[Figure: Accuracy vs. Latency, built up over several slides. Latency axis: Month, Week, Day, Hour, 5 min, 100 ms, 30 ms, 10 ms. Accuracy axis: Sit in it, Has Engine, Moves, Flies. Labeled regions: HPC application; Action Game; Flight Simulator; CG Movie; Parameter Space Exploration, Approximate models; Explorative Science, Real-time systems; plus the callout "Opportunity!".]


BONSAI WITH IN-SITU VIZ ON PIZ DAINT

Presented at SC14, streaming from CSCS/Switzerland to New Orleans

Compute & Vis on 1024 GPU nodes; live streaming

J. Bedorf, E. Gaburov, P. Messmer, S. Portegies Zwart



VISUALIZATION ≠ RENDERING*

*but it’s a part of it

- Coordinate transformations
- Feature extraction
- Thresholding
- Isosurfaces, Isovolumes
- Streamlines
- Field Operators (Gradient, Curl, ...)
- Clip, Slice
- Binning, Resample
- Surface Rendering
- Volume Rendering
- Line Rendering
- Compositing
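Two of the non-rendering operators listed above, thresholding and a gradient field operator, can be sketched in a few lines of Python. This is a minimal illustration that assumes a 2D scalar field stored as nested lists with unit grid spacing. The function names are hypothetical, and tools such as VTK implement these operators on much richer data models.

```python
# Hypothetical helpers on a 2D scalar field (nested lists), unit grid spacing assumed.

def threshold(field, lo, hi):
    """Return (i, j) indices of cells whose value lies in [lo, hi]."""
    return [(i, j)
            for i, row in enumerate(field)
            for j, v in enumerate(row)
            if lo <= v <= hi]


def gradient(field, h=1.0):
    """Central-difference gradient (d/di, d/dj) at interior points.

    Boundary points are left as (0.0, 0.0) to keep the sketch short.
    """
    n, m = len(field), len(field[0])
    grad = [[(0.0, 0.0)] * m for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            di = (field[i + 1][j] - field[i - 1][j]) / (2 * h)
            dj = (field[i][j + 1] - field[i][j - 1]) / (2 * h)
            grad[i][j] = (di, dj)
    return grad


# f(i, j) = 2*i + 3*j has the constant gradient (2, 3)
f = [[2 * i + 3 * j for j in range(4)] for i in range(4)]
print(threshold(f, 10, 12))   # [(1, 3), (2, 2), (3, 2)]
print(gradient(f)[1][1])      # (2.0, 3.0)
```

Both operators produce derived data, not pixels. A renderer would consume their output later in the pipeline.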


VISUALIZATION PIPELINE

- Analysis: Data processing to extract meaningful quantities of interest

- Filtering: Conversion of simulation data into data ready for rendering

- Rendering: Conversion of shapes to pixels (Fragment processing)

- Compositing: Combination of independently generated pixels into final frame

Your typical scientific visualization system:

Simulation → Visualization (Analysis & Filtering → Rendering → Compositing → Delivery)
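The stages of this pipeline can be chained as plain functions. The sketch below is hypothetical: the stage functions, the 1D "framebuffer", and the OR-style merge are illustrative stand-ins, not any tool's API. It only shows how data flows from simulation output through analysis & filtering, rendering, compositing of per-rank partial frames, and delivery.

```python
# Hypothetical pipeline stages; names and data layout are illustrative only.

def analysis_filter(samples, iso):
    # Keep samples at or above an iso-value (stand-in for e.g. isosurface extraction)
    return [(x, v) for x, v in samples if v >= iso]

def render(geometry, width):
    # "Rasterize" each kept sample into a 1D framebuffer of the given width
    frame = [0] * width
    for x, _ in geometry:
        if 0 <= x < width:
            frame[x] = 1
    return frame

def composite(partial_frames):
    # Merge independently rendered partial frames (here: logical OR per pixel)
    return [int(any(px)) for px in zip(*partial_frames)]

def deliver(frame):
    # Stand-in for shipping pixels to the user: encode the frame as a text line
    return "".join("#" if p else "." for p in frame)

# Two "ranks", each owning part of the simulation data as (position, value) pairs
rank_data = [[(0, 0.2), (1, 0.9), (2, 0.7)], [(5, 0.8), (6, 0.1), (7, 0.95)]]
partials = [render(analysis_filter(d, iso=0.5), width=8) for d in rank_data]
print(deliver(composite(partials)))  # .##..#.#
```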


VISUALIZATION TOOLKIT (VTK)

Focus on visualization, not (only) rendering

Provides more complex operations on data (“filtering”)

Visualization pipeline

At the core of many high-level viz tools

ParaView, VisIt, ...

Developed by Kitware, open source

http://www.vtk.org

Venerable backbone of scientific visualization

Tue - 15:00 : S6193 - Visualization Toolkit: Improving Rendering and Compute on GPU's


VTK-M

Ongoing development (Sandia, Kitware, ORNL, ...)

VTK type filters and more fine-grained “worklets”

Platform portable (GPU, multicore CPU)

Thrust and TBB backends

http://m.vtk.org

Visualization algorithms on modern architectures

Wed - 15:00 : S6352 - Adapting the Visualization Toolkit for Many-Core Processors with the VTK-m Library


TYPICAL HPC ENVIRONMENT

Workstation on scientist’s desk

Remote HPC center

Compute nodes not directly accessible

Output from compute nodes:

- File transfer

- X forwarding

- Remote rendering

[Diagram: Workstation (GPU), Login Node, GPU Compute Nodes]


X-FORWARDING

ssh -Y loginnode.edu

ssh -Y computenode

No extra process on compute node

Rendering by workstation GPU

X server on workstation needed

Often prohibitively slow

Be prepared for latencies

[Diagram: Workstation (GPU), Login Node, GPU Compute Nodes]


REMOTE RENDERING

Know your latencies

[Diagram: Workstation (GPU), Login Node, GPU Compute Nodes]

Use compute node’s GPU for rendering

Capture renderings and ship pixel data to user

Compression is key

Requires a running X server on the compute node*

* Requirements will change with EGL
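A back-of-the-envelope bandwidth estimate shows why compression is key. The sketch assumes uncompressed 24-bit RGB frames and an illustrative 100:1 compression ratio. Neither number comes from the slides.

```python
# Illustrative numbers only: 24-bit RGB and a 100:1 ratio are assumptions.

def stream_mbit_per_s(width, height, fps, bytes_per_pixel=3, compression_ratio=1.0):
    """Bandwidth in Mbit/s needed to stream rendered frames to the user."""
    bits_per_frame = width * height * bytes_per_pixel * 8
    return bits_per_frame * fps / compression_ratio / 1e6

raw = stream_mbit_per_s(1920, 1080, 30)
compressed = stream_mbit_per_s(1920, 1080, 30, compression_ratio=100)
print(f"raw: {raw:.0f} Mbit/s, compressed: {compressed:.1f} Mbit/s")
```

At 1920x1080 and 30 fps, this comes to roughly 1.5 Gbit/s uncompressed versus about 15 Mbit/s at the assumed ratio. Each frame also has only about 33 ms (1000 ms / 30) to be captured, encoded, and sent.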


Remote visualization on Blue Waters

Stellar combustion visualized on Blue Waters (26 TB dataset)

Paul Woodward, U. Minnesota: HVR w/ OpenGL on Blue Waters

Improvement in time to solution (data transfer + rendering):

- 6 GPUs in local viz cluster: 48 days

- 128 GPUs in HPC center: 1 day

• Limited resources in the local viz cluster

• Long data transfer times

48x speedup by using the Tesla GPUs in the HPC center

Data courtesy of John Stone, UIUC


REMOTE RENDERING: VIDEO ENCODING

Open source approaches: TurboVNC + VirtualGL

Currently not leveraging HW H.264 encoder

Commercial tools (e.g. NICE DCV)

Leverages HW encoder

No application modification needed

https://www.nice-software.com/products/dcv

NVENC library to access HW H.264 encoder (lossless on Maxwell)

https://developer.nvidia.com/nvidia-video-codec-sdk

Interactivity over large distances

Tue - 13:00: S6253 - VMD: Petascale Molecular Visualization and Analysis with Remote Video Streaming


OPENGL: GPU ACCELERATED RENDERING

• Primitives: points, lines, polygons

• Properties: colors, lighting, textures, ...

• View: camera position and perspective

• Shaders: rendering to screen/framebuffer

• C-style functions, enums

See e.g. “What Every CUDA Programmer Should Know About OpenGL”

(http://www.nvidia.com/content/GTC/documents/1055_GTC09.pdf)

Mon - 10:00 : S6817 - High-Performance, Low-Overhead Rendering with OpenGL and Vulkan

Mon - 14:00 : H6139 - Hangout: Maximizing Performance of CUDA and OpenGL Applications


VIS TOOLS EMBRACE OPENGL ON EGL

Prior to EGL: X server required for GPU accelerated OpenGL

Full OpenGL on EGL announced at SC15

With EGL: OpenGL without X

Major enabler for GPU rendering in HPC, incl. Cray systems*

Quick adoption by vis tool developers

https://devblogs.nvidia.com/parallelforall/egl-eye-opengl-visualization-without-x-server/

* Driver version 358.7 or newer required

Streamlined GPU accelerated off-screen rendering

4/20/2016


USE CASE: CEI ENSIGHT 10.1.6D

“Customers post-process large collections of simulations offline.

Even though rendering takes place offscreen, EnSight requires that an X server is running, and is open enough to allow users to access it. At some sites, it is unacceptable to configure an X server to have wide open access.

By using EGL to make our GL context and pbuffer, we can remove the need to start an X server, and solve all of these system management problems.”

Dave Bremer, CEI

EGL for batch renderer


Image by Astec Inc


PARAVIEW: DESIGNED FOR HPC

Supports all major scientific data formats

Extensive collection of “operators”: isosurface, streamlines, volume renderer, reductions, custom operators, ...

Approachable user interface

Supports remote, distributed-memory visualization

Open source, free

Extensible via plugins

Satisfying scientific computing needs



MODERN OPENGL FOR HPC VIZ

VTK now supports OpenGL 3.2

Enables advanced shaders (AO, VXGI, ...)

Some algorithms well suited for distributed-memory rendering

GPU hardware support

Mandatory to access advanced rendering features

Data courtesy Florida Intl University & TACC


OPENGL RENDERING POWERHOUSE

[Chart: OpenGL vs OpenSWR; bigger is better]


OPENGL NOT LIMITED TO RENDERING TASKS

CUDA->OpenGL typically one-way only

EGL enables lighter-weight access to OpenGL

No X server needed

Potential use of OpenGL for rasterization-like problems?

Determine covered “pixels”

3D ordering/occlusion via Z-buffer

Interop goes both ways, especially with EGL
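The rasterization-like use of OpenGL, which means determining covered "pixels" and resolving 3D ordering with the Z-buffer, can be illustrated on the CPU. The following pure-Python sketch is not OpenGL code. It mimics what the GPU's rasterizer and depth test do, using hypothetical axis-aligned rectangles at fixed depths, and returns, for each pixel, the ID of the nearest covering object.

```python
import math

# CPU stand-in for rasterization + depth test; the rectangle "primitives" are hypothetical.

def rasterize(rects, width, height):
    """rects: list of (obj_id, x0, y0, x1, y1, depth); smaller depth = closer.

    Returns an ID buffer (-1 = pixel not covered), resolved by a Z-buffer test.
    """
    zbuf = [[math.inf] * width for _ in range(height)]
    ids = [[-1] * width for _ in range(height)]
    for obj_id, x0, y0, x1, y1, depth in rects:
        for y in range(max(y0, 0), min(y1, height)):
            for x in range(max(x0, 0), min(x1, width)):
                if depth < zbuf[y][x]:        # depth test, analogous to GL_LESS
                    zbuf[y][x] = depth
                    ids[y][x] = obj_id
    return ids

scene = [(0, 0, 0, 4, 2, 5.0), (1, 2, 0, 6, 2, 1.0)]   # object 1 is in front
ids = rasterize(scene, width=6, height=2)
for row in ids:
    print(row)   # [0, 0, 1, 1, 1, 1] on both rows
```

The depth test keeps the nearest fragment regardless of submission order, so reversing the input list yields the same ID buffer.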


SCALABLE RENDERING AND COMPOSITING

Large-scale (volume) data visualization

Interactive visualization of terabytes of data

Stand-alone or coupling into simulation

HW-accelerated remote rendering

Plugin for ParaView

http://www.nvidia-arc.com/products/nvidia-index.html

NVIDIA INDEX

Mon - 10:00: S6590 - HPC Visualization Using NVIDIA IndeX™


RC coming out soon. Email for enquiries.

SCALABLE VOLUME RENDERING IN PARAVIEW

IndeX plugin addresses shortcomings in ParaView's built-in volume renderer

Beta version supports

- 3D structured, scalar grids

- 32-bit float, 16-bit/8-bit uint

- Overlay of opaque ParaView geometries (e.g. streamlines)

Free plugin, requires commercial IndeX license

Plugin enables GPU accelerated volume rendering

Tue - 16:00 : S6670 - Toward Bridging the Gap Between High Quality and High Performance for HPC Visualization
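At its core, GPU volume rendering accumulates color and opacity along each viewing ray. Below is a minimal sketch of standard front-to-back emission-absorption compositing for one ray. It assumes the samples have already been mapped to (intensity, alpha) pairs by a transfer function. It illustrates the general technique, not IndeX's implementation.

```python
# Generic front-to-back compositing; samples are assumed pre-classified (intensity, alpha).

def composite_ray(samples, cutoff=0.99):
    """Front-to-back compositing of (intensity, alpha) samples along one ray.

    Intensity is a scalar to keep the sketch short (real renderers use RGB).
    Returns (accumulated_intensity, accumulated_alpha).
    """
    acc_c, acc_a = 0.0, 0.0
    for c, a in samples:               # samples ordered from the eye outward
        acc_c += (1.0 - acc_a) * a * c
        acc_a += (1.0 - acc_a) * a
        if acc_a >= cutoff:            # early ray termination: ray is (nearly) opaque
            break
    return acc_c, acc_a

print(composite_ray([(1.0, 0.5), (0.0, 0.5), (1.0, 1.0)]))  # (0.75, 1.0)
```

Early ray termination is one reason front-to-back order is popular on GPUs. Samples behind an already opaque region are never touched.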


Advanced Rendering in scientific visualization

Image panels:

- Two lights, no shadows

- Two lights, hard shadows, 1 shadow ray per light

- Ambient occlusion + two lights, 144 AO rays/hit

• Ray tracing offers ambient occlusion lighting, shadows, high-quality transparent surfaces

Better insight with visual cues

Courtesy of John Stone, UIUC
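Ambient occlusion, as in the third panel above, estimates how much of the hemisphere above a surface point is blocked by nearby geometry. It does this by casting many rays per hit point. Below is a minimal Monte Carlo sketch with these assumptions:

- a hypothetical scene of a single sphere above a ground point with normal +z;
- uniform hemisphere sampling (production renderers commonly use cosine-weighted sampling instead);
- 144 rays, as in the panel caption;
- a fixed seed for reproducibility.

```python
import math
import random

# Hypothetical scene: one sphere occluder above a ground point whose normal is +z.

def ray_hits_sphere(origin, direction, center, radius):
    """True if origin + t*direction hits the sphere for some t > 0 (unit direction)."""
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - c                   # quadratic t^2 + 2bt + c = 0 (a == 1)
    if disc < 0:
        return False
    return -b + math.sqrt(disc) > 1e-6  # farther root lies in front of the origin

def sample_hemisphere(rng):
    """Uniformly distributed unit vector in the +z hemisphere (rejection sampling)."""
    while True:
        v = [rng.uniform(-1.0, 1.0) for _ in range(3)]
        n2 = sum(x * x for x in v)
        if 1e-12 < n2 <= 1.0:
            n = math.sqrt(n2)
            return [v[0] / n, v[1] / n, abs(v[2]) / n]

def ambient_occlusion(point, center, radius, n_rays=144, seed=0):
    """Fraction of sampled hemisphere directions NOT blocked by the sphere."""
    rng = random.Random(seed)
    blocked = sum(ray_hits_sphere(point, sample_hemisphere(rng), center, radius)
                  for _ in range(n_rays))
    return 1.0 - blocked / n_rays

under = ambient_occlusion([0.0, 0.0, 0.0], [0.0, 0.0, 1.5], 1.0)
far = ambient_occlusion([50.0, 0.0, 0.0], [0.0, 0.0, 1.5], 1.0)
print(f"AO under the sphere: {under:.2f}, far from it: {far:.2f}")
```

For the point directly under the sphere, the sphere subtends a cone of half-angle asin(1/1.5) ≈ 41.8°. That covers 1 - cos(41.8°) ≈ 25% of uniformly sampled directions, so the estimate should land near 0.75 up to Monte Carlo noise. The distant point stays close to 1.0.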


OPTIX RAY TRACING FRAMEWORK

• GPU-accelerated ray-tracing framework

• Build your own RT application

• Generic ray-geometry interaction

• Rays with arbitrary payloads

• Multi-GPU support

Tue - 14:00 : H6148 - Hangout: CUDA for HPC Simulation and Visualization

Tue - 14:00 : H6150 - Hangout: OptiX Ray Tracing Library: Best practices and Use-Case Consultation

Wed - 10:30 : S6320 - Opticks: Optical Photon Simulation for High Energy Physics with NVIDIA OptiX™
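OptiX separates ray traversal from user-supplied programs, such as intersection, closest-hit, and miss programs, and carries results in a per-ray payload. The following Python sketch mimics that structure on the CPU. All names (`Payload`, `make_z_plane`, `closest_hit`, `miss`, `trace`) are hypothetical and do not reflect the actual OptiX API.

```python
import math
from dataclasses import dataclass

# Hypothetical CPU mimic of the program/payload structure; NOT the OptiX API.

@dataclass
class Payload:
    # Arbitrary per-ray data carried through traversal (fields chosen for illustration)
    color: tuple = (0.0, 0.0, 0.0)
    t: float = math.inf

def make_z_plane(z0, color):
    """Geometry = user-supplied intersection routine + material data."""
    def intersect(origin, direction):
        if abs(direction[2]) < 1e-12:
            return None                  # ray parallel to the plane
        t = (z0 - origin[2]) / direction[2]
        return t if t > 1e-6 else None
    return {"intersect": intersect, "color": color}

def closest_hit(payload, geom, t):
    # Runs once, for the nearest intersection only
    payload.color, payload.t = geom["color"], t

def miss(payload):
    payload.color = (0.1, 0.1, 0.3)      # background color

def trace(origin, direction, scene, payload):
    best = None
    for geom in scene:
        t = geom["intersect"](origin, direction)
        if t is not None and (best is None or t < best[1]):
            best = (geom, t)
    if best is None:
        miss(payload)
    else:
        closest_hit(payload, *best)
    return payload

scene = [make_z_plane(10.0, (0, 1, 0)), make_z_plane(4.0, (1, 0, 0))]
hit = trace((0, 0, 0), (0, 0, 1), scene, Payload())
print(hit.color, hit.t)              # (1, 0, 0) 4.0
missed = trace((0, 0, 0), (1, 0, 0), scene, Payload())
print(missed.color)                  # (0.1, 0.1, 0.3)
```

Because each geometry brings its own `intersect` routine, new primitive types can be added without changing `trace`. This is the sense in which the ray-geometry interaction is "generic".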


TELLING A BETTER STORY, VISUALLY

Advanced rendering can help convey the visual message, e.g. by guiding the eye via depth of field

Particularly useful for complex visualizations

Interactive ray-tracing via NVIDIA Iray

Post-processing of ParaView files

Advanced rendering improves messaging

Mon - 15:00 : H6142A - Hangout: Iray® Rendering for Developers


PARALLEL COMPOSITING WITH ICET

• Each node renders its part of the data into a partial image

• Sort-last compositing

• Widely used (ParaView, VisIt, ...)

• Critical element for real-time viz

• Up to 30 fps for 4k frames on 1024 nodes

• Cray XC30, Piz Daint @ CSCS

http://icet.sandia.gov

Tue - 14:30 : S6808 - Image Compositing on GPU-Accelerated Supercomputers

Modern networks remove compositing bottleneck
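In sort-last compositing, each node renders its share of the data into a full-size image with color and depth. The partial images are then merged per pixel by keeping the nearest fragment. Below is a minimal sketch of that merge for opaque geometry. A pairwise reduction stands in for the distributed schemes (such as binary swap or radix-k) that libraries like IceT use to spread the work across nodes. The image layout here is an assumption.

```python
import math

# Assumed layout: each partial image is a list of (color, depth) pixels;
# background pixels carry depth = inf.

def z_composite(images):
    """Depth-composite opaque partial images: nearest fragment wins per pixel."""
    return [min(pixels, key=lambda p: p[1]) for pixels in zip(*images)]

def tree_composite(images):
    """Pairwise reduction, standing in for a distributed log2(P)-step scheme."""
    while len(images) > 1:
        images = [z_composite(images[i:i + 2]) for i in range(0, len(images), 2)]
    return images[0]

INF = math.inf
node_images = [
    [("red", 2.0), ("red", 2.0), ("bg", INF), ("bg", INF)],
    [("blue", 1.0), ("bg", INF), ("blue", 3.0), ("bg", INF)],
    [("green", 5.0), ("green", 5.0), ("green", 5.0), ("bg", INF)],
]
print([c for c, _ in tree_composite(node_images)])  # ['blue', 'red', 'blue', 'bg']
```

At 30 fps, the whole render-and-composite step has about 33 ms per frame. Assuming "4k" means 3840x2160, that is roughly 8.3 million pixels per frame to merge.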


HIGH FRAMERATE = MINIMAL IMPACT ON SIMULATION

Real-time visualization is only one use case

Batch processing will not go away

Acceptable time budget for visualization/analysis: up to the I/O time, ~2%

More diagnostics in the same time

E.g. ParaView Cinema

FPS matter, even in HPC
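Simple arithmetic makes the budget argument concrete. The sketch assumes a hypothetical 24-hour run and the ~2% visualization budget above. It counts how many images fit at different per-frame costs; the frame times are illustrative, not measurements. Integer milliseconds avoid floating-point rounding in the floor division.

```python
# Hypothetical 24-hour run; per-frame costs are illustrative, not measured.

def frames_in_budget(run_hours, budget_percent, ms_per_frame):
    """Number of frames that fit into budget_percent of the total run time."""
    budget_ms = run_hours * 3600 * 1000 * budget_percent // 100
    return budget_ms // ms_per_frame

for ms in (10_000, 1_000, 33):   # 10 s, 1 s, and ~30 fps per frame
    print(f"{ms:>6} ms/frame -> {frames_in_budget(24, 2, ms)} frames")
```

A 2% share of 24 hours is 1728 s. Cutting the per-frame cost from 10 s to 33 ms raises the count from 172 to 52,363 images in the same budget.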


VISUALIZATION-ENABLED SUPERCOMPUTERS

- CSCS Piz Daint: galaxy formation (http://blogs.nvidia.com/blog/2014/11/19/gpu-in-situ-milky-way/)

- NCSA Blue Waters: molecular dynamics (http://devblogs.nvidia.com/parallelforall/hpc-visualization-nvidia-tesla-gpus/)

- ORNL Titan: cosmology (http://www.sdav-scidac.org/29-highlights/visualization/66-accelerated-cosmology-data-anal.html)


COMPUTE+VIS SUPPORTS MULTIPLE WORKFLOWS

- Legacy workflow: separate compute & vis system; communication via file system

- Co-processing: compute and visualization on same GPU; communication via host-device transfers or memcpy

- Partitioned system: different nodes for different roles; communication via high-speed network


IN-SITU VIS: ADVANTAGES AND OPPORTUNITIES

- Minimize IO traffic

- Exploit data locality

- Less pressure on file system

- Less time wasted in I/O

- Reduce latency to first result

- Monitoring, early termination

- Enable real-time/interactive visualization

- Novel workflows, new applications

[Diagram: GPU-accelerated Supercomputer, File System, Workstation]

Tue - 9:00 : S6633 - Navigating the In-Situ Visualization Landscape
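Monitoring and early termination can be sketched as a simulation loop that passes its in-memory state to an analysis callback every few steps, instead of writing it to the file system. Everything here is hypothetical. A decaying toy field stands in for the solver, and the callback interface is illustrative rather than that of an in-situ library such as ParaView Catalyst.

```python
# Hypothetical toy solver and callback interface, for illustration only.

def simulate(n_steps, analyze, every=10):
    """Toy simulation loop with an in-situ analysis hook.

    `analyze(step, state)` sees the live state (no file I/O) and may return
    True to request early termination. Returns the last step executed.
    """
    state = [1.0] * 8
    for step in range(1, n_steps + 1):
        state = [0.9 * v for v in state]          # stand-in for a real solver step
        if step % every == 0 and analyze(step, state):
            return step                            # stopped early
    return n_steps

def monitor(step, state):
    peak = max(state)
    print(f"step {step}: peak = {peak:.3f}")
    return peak < 0.1                              # result good enough: stop

print("finished at step", simulate(1000, monitor))
```

With the field decaying by 10% per step, the monitor sees peaks of about 0.349, 0.122, and 0.042 at steps 10, 20, and 30. It stops the run at step 30 instead of 1000.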


COSMO WITH IN-SITU VISUALIZATION

COSMO-1 model, operational weather model in Switzerland

6-node Cray CS-Storm system

8 K80 GPUs/node

96 GPU sockets total

~ 20s per 0.7s

NVIDIA IndeX for visualization

Tue - 13:30: S6628 - Co-Designing GPU-Based Systems and Tools for Numerical Weather Predictions

Live, Interactive Weather Simulation


IN-SITU VISUALIZATION ON TITAN

“When running PyFR at scale, it generates very large data sets that need analyzing for acoustics. The traditional post hoc method is simply not fit for purpose – in situ visualization and processing are critical. We see a potential for 50x speed ups with in situ, which significantly accelerates our scientific discovery”

- Dr. Peter Vincent, Imperial College

First prototype of ParaView in-situ visualization capabilities in PyFR (CFD) simulations, predicting jet engine acoustics

Both compute and visualization running on Titan GPUs and streaming to a remote location

Thu - 10:30 : S6329 - Petascale Computational Fluid Dynamics with Python on GPUs

Tue - 15:00 : S6193 - Visualization Toolkit: Improving Rendering and Compute on GPU's


VISUALIZATION IN HPC

Leverage graphics capabilities on heterogeneous nodes

GPUs offer features for visualization, rendering, remote viz

Modern networks enable parallel compositing at massive scale

Graphics capabilities may help with graphics-like algorithms

Supported by popular visualization tools

Fast rendering relevant even for batch processing

In-situ visualization for monitoring, steering, and other novel workflows


THANK YOU

JOIN THE NVIDIA DEVELOPER PROGRAM AT developer.nvidia.com/join