Ray Tracing in CUDA
Andrei Monteiro
Marcelo Gattass
Assignment 3
June 2010
Topics
Motivation
Related Work
Grid Construction
Ray Tracing in CUDA
Results
Conclusion
References
Motivation
Ray tracing is a technique for generating an image by launching a ray for each pixel and calculating its intersections with the scene objects.
Simulates several effects naturally, such as reflection and refraction, producing a very high degree of visual realism.
Computationally expensive.
Can use different acceleration structures (Kd-trees, uniform grid, BVH).
Motivation
Why CUDA?
Designed for general-purpose computing.
Construction of a uniform grid is faster than construction of other structures (e.g. Kd-trees, BVH).
Provides natural compactness, avoiding the memory waste of the stencil routing algorithm in GLSL.
Use of shared memory speeds up construction.
Fast data transfers.
Atomic operations.
Related Work
Uniform Grid
A Parallel Algorithm for Construction of Uniform Grids, Kalojanov, J.
GPU-Accelerated Uniform Grid Construction for Ray Tracing Dynamic Scenes, Ivson, P., Duarte, L., Celes, W.
Ray Tracing
Understanding the Efficiency of Ray Traversal on GPUs, Aila, T. NVIDIA.
Ray Tracing on Programmable Graphics Hardware, Purcell, T.
Ray Tracing Animated Scenes using Coherent Grid Traversal, Wald et al.
Grid Construction
Uniform Grid
Speeds up the simulation by avoiding an intersection test against every scene object.
Supports dynamic scenes.
Each voxel contains a list of primitives.
The ray traverses the grid.
Grid Resolution
Grid Construction
Algorithm in CUDA:
1. Insert triangles in voxels
2. Calculate grid hash table
3. Sort the pairs
4. Write cell start and end
5. Reorder particles
Grid Construction
Insert triangles in voxels
Bounding box (corners)
Check triangle plane intersection
Avoids storing more than one reference to the same triangle inside a voxel.
Contained in more than one voxel
Contained in same voxel 4 times
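The bounding-box part of this step can be sketched as follows. This is plain C++ on the CPU rather than the presentation's CUDA kernel, and it covers only the box test: the triangle-plane check is left out. The names `triangleCellRange`, `toCell` and `countCells`, and the assumption of cubic cells with the grid origin at `gridMin`, are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct CellRange { int lo[3]; int hi[3]; };  // inclusive voxel index range per axis

// Map one coordinate to a voxel index, clamped to [0, res - 1].
inline int toCell(float p, float gridMin, float cellSize, int res) {
    const int c = static_cast<int>(std::floor((p - gridMin) / cellSize));
    return std::max(0, std::min(c, res - 1));
}

// Range of voxels overlapped by the triangle's axis-aligned bounding box.
CellRange triangleCellRange(Vec3 a, Vec3 b, Vec3 c, Vec3 gridMin,
                            float cellSize, const int res[3]) {
    assert(cellSize > 0.0f);
    const float mn[3] = { std::min({a.x, b.x, c.x}),
                          std::min({a.y, b.y, c.y}),
                          std::min({a.z, b.z, c.z}) };
    const float mx[3] = { std::max({a.x, b.x, c.x}),
                          std::max({a.y, b.y, c.y}),
                          std::max({a.z, b.z, c.z}) };
    const float g[3] = { gridMin.x, gridMin.y, gridMin.z };
    CellRange r;
    for (int i = 0; i < 3; ++i) {
        r.lo[i] = toCell(mn[i], g[i], cellSize, res[i]);
        r.hi[i] = toCell(mx[i], g[i], cellSize, res[i]);
    }
    return r;
}

// Number of candidate voxels, i.e. of (cell, triangle) pairs before the plane test.
int countCells(const CellRange& r) {
    return (r.hi[0] - r.lo[0] + 1) * (r.hi[1] - r.lo[1] + 1) * (r.hi[2] - r.lo[2] + 1);
}
```

Because each voxel in the range is visited exactly once per triangle, a triangle cannot be referenced twice in the same voxel.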
Grid Construction
1. Grid Hash Table
Pair Cell Index – Particle Index. E.g. Cell Dimension = 3, Grid resolution = 3x3
[Figure: 3x3 grid with axes marked 0, 3, 6, 9 and cells numbered 0 to 8, containing particles 0 to 8]
PARTICLES (x y z per particle):
0: 4 4 0
1: 1 2 0
2: 8 4 0
3: 5 7 0
4: 0 3 0
5: 8 8 0
6: 3 1 0
7: 8 1 0
8: 7 1 0
HASH:
Cell Index:     4 0 5 7 3 8 1 2 2
Particle Index: 0 1 2 3 4 5 6 7 8
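A minimal sketch of the hash computation for the 3x3 example on this slide, written as CPU-side C++ rather than a CUDA kernel. The row-major cell numbering (`x / cellDim + (y / cellDim) * resX`) is an assumption; `Point`, `cellIndex` and `buildHash` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Point { int x, y, z; };

// Linear cell index in a 2D grid with resX cells per row (z is ignored, as in the example).
inline int cellIndex(Point p, int cellDim, int resX) {
    assert(cellDim > 0 && resX > 0);
    return (p.x / cellDim) + (p.y / cellDim) * resX;
}

// One (cell index, particle index) pair per particle; in CUDA this would be one thread each.
std::vector<std::pair<int, int>> buildHash(const std::vector<Point>& pts, int cellDim, int resX) {
    std::vector<std::pair<int, int>> hash;
    hash.reserve(pts.size());
    for (std::size_t i = 0; i < pts.size(); ++i) {
        hash.emplace_back(cellIndex(pts[i], cellDim, resX), static_cast<int>(i));
    }
    return hash;
}
```

With the nine particle positions of the example, Cell Dimension = 3 and a 3x3 grid, the resulting cell indices are 4 0 5 7 3 8 1 2 2.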
Grid Construction
Sorting the Pairs
1. To calculate each cell's start and end, the particles must be ordered by the index of the cell they belong to.
2. In practice, the application sorts the previous hash table by its keys, the cell indices.
3. Uses the Radix Sort from the CUDA SDK.
Grid Construction
1. Sorting the Pairs
Sort Hash table by key values (cell indices).
HASH:
Cell Index:     4 0 5 7 3 8 1 2 2
Particle Index: 0 1 2 3 4 5 6 7 8
Sorted HASH:
Cell Index:     0 1 2 2 3 4 5 7 8
Particle Index: 1 6 7 8 4 0 2 3 5
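The sort can be reproduced on the CPU with a stable sort by key. This sketch substitutes `std::stable_sort` for the CUDA SDK radix sort used in the presentation; `HashEntry` and `sortByCell` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct HashEntry { int cell; int particle; };

// Sort (cell, particle) pairs by cell index. A stable sort keeps particles that share
// a cell in their original relative order, as a radix sort would.
void sortByCell(std::vector<HashEntry>& hash) {
    auto byCell = [](const HashEntry& a, const HashEntry& b) { return a.cell < b.cell; };
    std::stable_sort(hash.begin(), hash.end(), byCell);
    assert(std::is_sorted(hash.begin(), hash.end(), byCell));  // postcondition
}
```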
Grid Construction
1. Finding Cell Start/End and Reordering Particles.
Sorted HASH:
Thread Index: 0 1 2 3 4 5 6 7 8 ...
Cell Index:   0 0 1 1 1 1 2 2 2 ...
Current Thread: 2
Cell Index [thread_id] ≠ Cell Index [thread_id - 1]
CellStart [Cell Index [thread_id]] = 2
CellEnd [Cell Index [thread_id - 1]] = 2
Cell 0: end = 2
Cell 1: start = 2
Cell Start/End:
Cell Index:       0   1   2 ...
Cell Start / End: 0/2 2/6 6/...
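A sequential sketch of the start/end detection, where each loop iteration plays the role of one CUDA thread comparing its cell index with that of the previous entry. The use of -1 for empty cells, the exclusive `end` convention and the names `CellRanges` and `findCellRanges` are assumptions.

```cpp
#include <cassert>
#include <vector>

struct CellRanges { std::vector<int> start, end; };  // end is exclusive; -1 marks an empty cell

CellRanges findCellRanges(const std::vector<int>& sortedCells, int numCells) {
    CellRanges r{ std::vector<int>(numCells, -1), std::vector<int>(numCells, -1) };
    const int n = static_cast<int>(sortedCells.size());
    for (int i = 0; i < n; ++i) {           // i corresponds to thread_id
        const int cell = sortedCells[i];
        assert(cell >= 0 && cell < numCells);
        if (i == 0 || cell != sortedCells[i - 1]) {
            r.start[cell] = i;                          // first entry of this cell
            if (i > 0) r.end[sortedCells[i - 1]] = i;   // previous cell ends here
        }
        if (i == n - 1) r.end[cell] = n;    // the last entry closes its cell
    }
    return r;
}
```

For the sorted cell indices 0 0 1 1 1 1 2 2 2, iteration 2 writes `start[1] = 2` and `end[0] = 2`, matching the example.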
Ray Tracing in CUDA
Can be easily parallelized.
Each thread is responsible for one pixel / ray intersection.
Problems that slow performance:
Internal loops: cause threads to diverge.
Random memory access: causes bank conflicts and non-coalesced reads.
Ray Tracing in CUDA
Kernels
1. Build grid (if scene changes)
2. Setup rays (if camera moved)
3. Launch Rays
4. Get Hits
5. Get Shadow Hits
6. Get Reflection Hits (repeat)
7. Shade
Ray Tracing in CUDA
Setup Rays
Calculate the ray equation for each pixel.
One thread per pixel.
$p(t) = o + t\,d$
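One way to set up the ray for a pixel is sketched below in CPU-side C++. The camera model is an assumption, not taken from the slides: a pinhole camera at the origin looking down the negative z axis, with a vertical field of view `fovYRadians`. `setupRay` and `pointAt` are hypothetical names; `pointAt` evaluates the ray at parameter t.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Ray { Vec3 o, d; };  // origin and unit direction

// Point on the ray at parameter t: origin plus t times direction.
inline Vec3 pointAt(const Ray& r, float t) {
    return { r.o.x + t * r.d.x, r.o.y + t * r.d.y, r.o.z + t * r.d.z };
}

// Primary ray through the center of pixel (px, py). In CUDA, one thread would run this per pixel.
Ray setupRay(int px, int py, int width, int height, float fovYRadians) {
    assert(width > 0 && height > 0);
    const float aspect = static_cast<float>(width) / static_cast<float>(height);
    const float halfH = std::tan(0.5f * fovYRadians);
    // Map the pixel center to [-1, 1] on the image plane at z = -1.
    const float u = (2.0f * (px + 0.5f) / width - 1.0f) * halfH * aspect;
    const float v = (1.0f - 2.0f * (py + 0.5f) / height) * halfH;
    const float len = std::sqrt(u * u + v * v + 1.0f);
    return { {0.0f, 0.0f, 0.0f}, { u / len, v / len, -1.0f / len } };
}
```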
Ray Tracing in CUDA
Launch Rays
Calculates the ray-grid intersection, if any.
Rays that do not intersect the grid are discarded for the next steps.
Returns the first cell intersected and the parameters for traversing the grid.
p(t)
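The ray-grid test can be sketched with the standard slab method against the grid's bounding box. This is an assumed technique, since the slides do not say how the intersection is computed; `intersectGridBounds` is a hypothetical name. The entry parameter `tEnter` gives the point from which the first cell can be derived.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <utility>

struct Vec3 { float x, y, z; };

// Returns true if the ray o + t d (t >= 0) crosses the box [bmin, bmax].
// On success, tEnter and tExit bound the part of the ray inside the box.
bool intersectGridBounds(Vec3 o, Vec3 d, Vec3 bmin, Vec3 bmax, float& tEnter, float& tExit) {
    assert(bmin.x <= bmax.x && bmin.y <= bmax.y && bmin.z <= bmax.z);
    const float oo[3] = { o.x, o.y, o.z };
    const float dd[3] = { d.x, d.y, d.z };
    const float lo[3] = { bmin.x, bmin.y, bmin.z };
    const float hi[3] = { bmax.x, bmax.y, bmax.z };
    tEnter = 0.0f;
    tExit = std::numeric_limits<float>::infinity();
    for (int a = 0; a < 3; ++a) {
        if (dd[a] == 0.0f) {
            // Ray parallel to this slab: it must start between the two planes.
            if (oo[a] < lo[a] || oo[a] > hi[a]) return false;
            continue;
        }
        float t0 = (lo[a] - oo[a]) / dd[a];
        float t1 = (hi[a] - oo[a]) / dd[a];
        if (t0 > t1) std::swap(t0, t1);
        tEnter = std::max(tEnter, t0);
        tExit = std::min(tExit, t1);
        if (tEnter > tExit) return false;  // slabs do not overlap: ray misses the grid
    }
    return true;
}
```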
Ray Tracing in CUDA
Get Hits
The most expensive step of the simulation.
Typical algorithm:
while (not hit and ray inside grid) {
    traverse cell;
    if (!cell empty) {
        for each triangle in cell {
            get hit();
        }
    }
}
Problem: causes too much thread divergence.
Solution: use the while-while algorithm, which causes less divergence.
Ray Tracing in CUDA
while-while algorithm
while-while trace():
    while ray not terminated
        while node does not contain primitives
            traverse to the next node
        while node contains untested primitives
            perform a ray-primitive intersection test
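The loop structure can be illustrated on the CPU as below. This sketch only shows the control flow: on a CPU there is no warp divergence to reduce. The ray's cell order is passed in as a precomputed `path` instead of being produced by a grid traversal, and `hitTest` stands in for the ray-primitive test; all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// cells[c] lists the primitive ids stored in cell c; path is the ordered list of
// cells the ray visits. Returns the first primitive accepted by hitTest, or -1.
int traceWhileWhile(const std::vector<std::vector<int>>& cells,
                    const std::vector<int>& path,
                    const std::function<bool(int)>& hitTest) {
    for (int c : path) assert(c >= 0 && static_cast<std::size_t>(c) < cells.size());
    std::size_t step = 0;  // position along the ray's cell path
    std::size_t next = 0;  // next untested primitive in the current cell
    while (step < path.size()) {  // ray not terminated
        // First inner loop: advance until a cell with untested primitives is found.
        while (step < path.size() && next >= cells[path[step]].size()) {
            ++step;
            next = 0;
        }
        // Second inner loop: test the primitives of that cell.
        while (step < path.size() && next < cells[path[step]].size()) {
            const int prim = cells[path[step]][next++];
            if (hitTest(prim)) return prim;
        }
    }
    return -1;
}
```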
Ray Tracing in CUDA
Get Hits
Traversal Algorithm: 3D DDA
if (nextx < nexty) {
    nextx += deltax;
    X += 1;
} else {
    nexty += deltay;
    Y += 1;
}
process_grid(X, Y);
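The snippet on this slide shows the 2D case for a ray moving in the positive x and y directions. A sketch of the full 3D version, handling negative and zero direction components, could look like the following. It assumes cubic cells, a grid whose minimum corner is at the origin, and a ray origin already inside the grid; `traverseGrid` and `Cell` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

struct Cell { int x, y, z; };

// Returns the cells visited, in order, by a ray starting at o with direction d.
std::vector<Cell> traverseGrid(const float o[3], const float d[3],
                               float cellSize, const int res[3]) {
    assert(cellSize > 0.0f);
    assert(d[0] != 0.0f || d[1] != 0.0f || d[2] != 0.0f);
    const float inf = std::numeric_limits<float>::infinity();
    int cell[3], step[3];
    float next[3], delta[3];  // t of the next boundary crossing, and t per cell, for each axis
    for (int a = 0; a < 3; ++a) {
        cell[a] = static_cast<int>(std::floor(o[a] / cellSize));
        if (d[a] > 0.0f) {
            step[a] = 1;
            delta[a] = cellSize / d[a];
            next[a] = ((cell[a] + 1) * cellSize - o[a]) / d[a];
        } else if (d[a] < 0.0f) {
            step[a] = -1;
            delta[a] = -cellSize / d[a];
            next[a] = (cell[a] * cellSize - o[a]) / d[a];
        } else {
            step[a] = 0;
            delta[a] = inf;
            next[a] = inf;  // this axis is never crossed
        }
    }
    std::vector<Cell> visited;
    while (cell[0] >= 0 && cell[0] < res[0] &&
           cell[1] >= 0 && cell[1] < res[1] &&
           cell[2] >= 0 && cell[2] < res[2]) {
        visited.push_back({ cell[0], cell[1], cell[2] });
        // Step along the axis whose boundary is crossed first.
        const int a = (next[0] < next[1]) ? (next[0] < next[2] ? 0 : 2)
                                          : (next[1] < next[2] ? 1 : 2);
        cell[a] += step[a];
        next[a] += delta[a];
    }
    return visited;
}
```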
Ray Tracing in CUDA
Get Hits
Triangle Intersection
Möller
Enables face culling.
Greatly increased performance.
Careful: triangles can be in more than one voxel, so it's necessary to check if the intersection point is in the current voxel.
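A sketch of the Möller-Trumbore ray-triangle test with back-face culling, assuming this is the Möller algorithm the slide refers to, together with the voxel containment check mentioned in the note. Counter-clockwise vertex order is taken as front-facing, and the epsilon value, `intersectTriangle` and `insideVoxel` are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}

// Ray-triangle test with back-face culling. On a hit, writes the ray parameter t
// and the barycentric coordinates (u, v) of the hit point.
bool intersectTriangle(Vec3 o, Vec3 d, Vec3 v0, Vec3 v1, Vec3 v2,
                       float& t, float& u, float& v) {
    const float eps = 1e-6f;
    const Vec3 e1 = sub(v1, v0), e2 = sub(v2, v0);
    const Vec3 p = cross(d, e2);
    const float det = dot(e1, p);
    if (det < eps) return false;  // back-facing or parallel triangle is culled
    const Vec3 s = sub(o, v0);
    u = dot(s, p);
    if (u < 0.0f || u > det) return false;
    const Vec3 q = cross(s, e1);
    v = dot(d, q);
    if (v < 0.0f || u + v > det) return false;
    const float inv = 1.0f / det;
    t = dot(e2, q) * inv;
    u *= inv;
    v *= inv;
    return t > eps;
}

// Accept a hit only if it lies in the voxel currently being traversed.
bool insideVoxel(Vec3 p, Vec3 vmin, Vec3 vmax) {
    assert(vmin.x <= vmax.x && vmin.y <= vmax.y && vmin.z <= vmax.z);
    return p.x >= vmin.x && p.x <= vmax.x &&
           p.y >= vmin.y && p.y <= vmax.y &&
           p.z >= vmin.z && p.z <= vmax.z;
}
```

Culling rejects a triangle after a single determinant test, which is one reason it can speed up the hit kernel.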
Ray Tracing in CUDA
Increase efficiency
The internal loops make threads diverge and thus lower performance.
To work around this problem, NVIDIA researcher T. Aila proposed a method for CUDA called persistent threads.
The idea is to keep threads busy while at least one of them is not done.
The performance gain depends on the GPU:
9800 GX2: 2.2x increase
GTX 480: 3.0x increase
Ray Tracing in CUDA
Persistent threads implementation code
Ray Tracing in CUDA
Shade
Linear interpolation using barycentric coordinates:
Normal
Texture
Texture
Used a CUDA 3D texture to support a variable number of scene textures.
Phong Shading
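$$
\begin{pmatrix} I_r \\ I_g \\ I_b \end{pmatrix}
=
\begin{pmatrix} I_{ar} k_{dr} \\ I_{ag} k_{dg} \\ I_{ab} k_{db} \end{pmatrix}
+ \sum_{\text{lights}} f_s \left[
\begin{pmatrix} l_r k_{dr} \\ l_g k_{dg} \\ l_b k_{db} \end{pmatrix} (\hat{n} \cdot \hat{L})
+
\begin{pmatrix} l_r k_{sr} \\ l_g k_{sg} \\ l_b k_{sb} \end{pmatrix} (\hat{r} \cdot \hat{L})^{n}
\right]
+ k_r \begin{pmatrix} I_{rr} \\ I_{rg} \\ I_{rb} \end{pmatrix}
+ (1 - o) \begin{pmatrix} I_{tr} \\ I_{tg} \\ I_{tb} \end{pmatrix}
$$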
Results
Real-Time Ray Tracing
Performance depends on:
Grid resolution
Number of primitives
Camera in/outside grid
Shadow pass
Reflection passes (1 or more times)
Scenes with reflections and many primitives run at about 20 to 30 fps.
Conclusion
The user was able to replicate physical effects.
CUDA is slower than other languages (e.g. GLSL) unless the code is optimized to make full use of its optimization resources.
There are still several optimizations pending in this work:
Math
CUDA threads and kernels
Too much memory used
References
Kalojanov, J. A Parallel Algorithm for Construction of Uniform Grids. High Performance Graphics, 2009. Retrieved Apr 21, 2010.
Ivson, P., Duarte, L., Celes, W. GPU-Accelerated Uniform Grid Construction for Ray Tracing Dynamic Scenes.
Aila, T. Understanding the Efficiency of Ray Traversal on GPUs. NVIDIA Research. Retrieved May 23, 2010.
Purcell, T. Ray Tracing on Programmable Graphics Hardware. Stanford University. Retrieved May 28, 2010.
Wald et al. Ray Tracing Animated Scenes using Coherent Grid Traversal. SCI Institute, University of Utah. Retrieved May 25, 2010.
NVIDIA CUDA Programming Guide. V. 2.0, 2008. Retrieved Mar 29, 2010.