On a Few Ray Tracing like Algorithms and Structures.
Ravi Prakash Kammaje, Swansea University
Ray Tracing
• Naïve method
  – Intersect every ray against every triangle
  – O(rays × triangles)
• Need better methods
Data Structures
• BSP Trees
• Uniform Grid
• Octree
• Bounding Volume (Box) Hierarchy
Kd-trees
• A specialised BSP tree
• Axes restricted to the X, Y and Z axes
• Among the most widely used for ray tracing
  – SAH (Surface Area Heuristic)
    • Heuristic to build trees suitable for ray tracing
  – Cheap traversal
RBSP Trees
• Form of BSP tree
  – Space partitioning
  – Binary: 2 children at each node
• Predetermined axes
  – Number of axes, m
  – Axes
• Construction and traversal
  – Similar to kd-trees
  – Heuristics borrowed from kd-trees
RBSP Trees - Example
[Figure panels: kd-tree; RBSP tree, 24 axes]
RBSP Trees - Construction
• Predetermine axes
• Methods to predetermine m axes
• Evenly spaced points on sphere
  – Find evenly spaced points on the unit sphere
  – Use the vectors from the centre to the points as axes
  – Advantage
    • Has an even distribution of axes
  – Disadvantage
    • Axes are not customised to the scene
Construction
• Recursive process
• Find bounding volume
• At each node
  – Find a split plane
    • Use a heuristic
  – Classify triangles
  – Continue until
    • Very few triangles are in the node
    • A maximum depth is reached
• Split plane selection
  – Use SAH over all axes
  – Select the plane with minimum cost
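Split plane selection can be sketched with the standard SAH cost formula: traversal cost plus the intersection cost of each child, weighted by the child's surface area relative to the parent. This is a sketch under assumptions: `SplitCandidate`, `sah_cost` and `best_split` are hypothetical names, the cost constants are free parameters, and the child surface areas are passed in precomputed, since computing them for the non-axis-aligned volumes of an RBSP tree is not covered here.

```c
typedef struct {
    double area_left, area_right;  /* surface areas of the two child volumes */
    int    n_left, n_right;        /* triangles overlapping each child */
} SplitCandidate;

/* Standard SAH estimate for one candidate split plane. */
double sah_cost(const SplitCandidate *c, double area_parent,
                double cost_traverse, double cost_intersect)
{
    return cost_traverse
         + cost_intersect * (c->area_left  / area_parent) * c->n_left
         + cost_intersect * (c->area_right / area_parent) * c->n_right;
}

/* Index of the cheapest candidate (gathered over all m axes),
   or -1 if there are no candidates. */
int best_split(const SplitCandidate *cands, int count, double area_parent,
               double cost_traverse, double cost_intersect)
{
    int best = -1;
    double best_cost = 0.0;
    for (int i = 0; i < count; ++i) {
        double c = sah_cost(&cands[i], area_parent, cost_traverse, cost_intersect);
        if (best < 0 || c < best_cost) {
            best = i;
            best_cost = c;
        }
    }
    return best;
}
```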
RBSP Trees - Traversal
• Standard slabs method
  – Over m planes
  – Find the intersection of the ray and each plane
  – Precompute the divides
• Number of divide operations = m
• If m is large, the divide operations cause slowdowns
• Use SSE to perform 4 divides at once
  – Accelerates ray tracing
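The slabs method over m axes can be sketched in scalar C as follows. It assumes the volume is described by an interval `lo[i] <= dot(p, axes[i]) <= hi[i]` on each axis. `AxisTerm`, `precompute_ray` and `ray_hits_slabs` are hypothetical names, not the author's code. The single divide per axis in `precompute_ray` is the operation the slides suggest batching four at a time with SSE.

```c
#include <float.h>

typedef struct { double x, y, z; } Vec3;

/* Per-ray, per-axis terms computed once before traversal. */
typedef struct { double o_dot, d_dot, inv_d_dot; } AxisTerm;

static double dot3(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

/* One divide per axis, so m divides per ray in total. */
void precompute_ray(const Vec3 *axes, int m, Vec3 origin, Vec3 dir, AxisTerm *t)
{
    for (int i = 0; i < m; ++i) {
        t[i].o_dot = dot3(origin, axes[i]);
        t[i].d_dot = dot3(dir, axes[i]);
        /* Rays parallel to a slab are handled separately: no divide by zero. */
        t[i].inv_d_dot = (t[i].d_dot != 0.0) ? 1.0 / t[i].d_dot : 0.0;
    }
}

/* Slabs test; returns 1 and the entry/exit distances on a hit, 0 on a miss. */
int ray_hits_slabs(const AxisTerm *t, int m, const double *lo, const double *hi,
                   double *t_near, double *t_far)
{
    double tn = 0.0, tf = DBL_MAX;
    for (int i = 0; i < m; ++i) {
        if (t[i].d_dot == 0.0) {
            /* Parallel ray: it must already lie inside this slab. */
            if (t[i].o_dot < lo[i] || t[i].o_dot > hi[i]) return 0;
            continue;
        }
        double t0 = (lo[i] - t[i].o_dot) * t[i].inv_d_dot;
        double t1 = (hi[i] - t[i].o_dot) * t[i].inv_d_dot;
        if (t0 > t1) { double tmp = t0; t0 = t1; t1 = tmp; }
        if (t0 > tn) tn = t0;   /* latest entry so far */
        if (t1 < tf) tf = t1;   /* earliest exit so far */
        if (tn > tf) return 0;  /* slab intervals no longer overlap */
    }
    *t_near = tn;
    *t_far  = tf;
    return 1;
}
```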
RBSP Trees - Results
• Makes RBSP trees faster than kd-trees
• A structure that shows ray tracing potential
• Better than kd-trees for scenes with non-axis-aligned geometry
• Needs better heuristics to predetermine axes
[Chart: Traversal 3 - Rendering times using RBSP trees of 3, 4, 8, 12, 16, 20 and 24 axes. Scenes: Armadillo, Bunny, Happy Buddha, Sphere, Sponza, Dragon. Vertical axis from 0 to 3000.]
Row Tracing
• Combines rasterization and ray tracing concepts
• A form of packet ray tracing
  – Packets of rays spanning an entire row
• Row can be
  – A 2D plane
    • Simpler traversal
    • Easy row / triangle intersection
      – Per-pixel cost less than ray / triangle intersections
  – A 1D line
    • Simplifies clipping, occlusion testing and frustum testing
Row Tracing - Algorithm
• High level algorithm
  – Traverse row-plane through kd-tree or octree
  – Rasterize leaf node triangles with scanline algorithm
• Very similar to ray tracing
• Early ray termination not possible
  – Use 1D Hierarchical Occlusion Maps to achieve the same effect
Row Tracing – Hierarchical Occlusion Maps
• Important optimization
  – Indicates already occluded parts of a row
• 1D version of HOM by Zhang et al. (1997)
• Lowest level: 1 bit per pixel
• Each upper-level bit covers 2 bits of the level below
• For a row with 1024 pixels, the lowest level takes 128 chars
• Entire HOM: 256 chars
Row Tracing – Hierarchical Occlusion Maps
• Initialize prior to traversal
  – Set all bits to zero
  – The entire row is unoccluded
• Updating the HOM
  – Triangle rasterization
  – Corresponding lowest-level bits are set to 1
  – Upper levels updated if necessary
• Testing for occlusion
  – Skip occluded nodes
  – Optimize rasterization
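A 1D HOM with these three operations could be sketched as a bit array laid out like a binary heap: bit 1 is the root, bits 1024 to 2047 are the pixels, and an upper-level bit is set only when both of its children are set. The layout and the names `Hom`, `hom_init`, `hom_set_pixel` and `hom_range_occluded` are assumptions for illustration, not the author's implementation. For 1024 pixels the array is 256 chars, which matches the size quoted in the slides.

```c
#include <string.h>

#define ROW_PIXELS 1024   /* must be a power of two */

typedef struct { unsigned char bits[2 * ROW_PIXELS / 8]; } Hom;

static int get_bit(const Hom *h, int i)
{
    return (h->bits[i >> 3] >> (i & 7)) & 1;
}

static void set_bit(Hom *h, int i)
{
    h->bits[i >> 3] |= (unsigned char)(1u << (i & 7));
}

/* All bits zero: the entire row is unoccluded. */
void hom_init(Hom *h) { memset(h->bits, 0, sizeof h->bits); }

/* Mark pixel x occluded, then update upper levels while both siblings are set. */
void hom_set_pixel(Hom *h, int x)
{
    int i = ROW_PIXELS + x;
    set_bit(h, i);
    while (i > 1 && get_bit(h, i ^ 1)) {
        i >>= 1;
        set_bit(h, i);
    }
}

static int range_occluded(const Hom *h, int node, int node_lo, int node_hi,
                          int lo, int hi)
{
    if (hi < node_lo || node_hi < lo) return 1;    /* outside the query range */
    if (get_bit(h, node)) return 1;                /* whole span is occluded */
    if (lo <= node_lo && node_hi <= hi) return 0;  /* fully queried, has a gap */
    int mid = (node_lo + node_hi) / 2;
    return range_occluded(h, 2 * node, node_lo, mid, lo, hi)
        && range_occluded(h, 2 * node + 1, mid + 1, node_hi, lo, hi);
}

/* Returns 1 if every pixel in [lo, hi] is already occluded. */
int hom_range_occluded(const Hom *h, int lo, int hi)
{
    return range_occluded(h, 1, 0, ROW_PIXELS - 1, lo, hi);
}
```

During traversal, a node whose projected pixel span is reported as occluded can be skipped, and the same query lets the rasterizer skip covered spans.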
Packet Row Tracing
• Row-packet / node intersection
  – Case 1: all rows in the packet hit the node
  – Case 2: the row packet misses the node
  – Case 3: divergence nodes; trace individual rows from these nodes
• Occlusion testing
  – Test each row individually
• Leaf node
  – All rows are rasterized with the leaf node's triangles
• Easily multi-threadable
Row Tracing – vs Packet Ray Tracing
Row Tracing – vs OpenGL
GPUs
• Very powerful
• Highly parallel
• Example
  – Nvidia GeForce GTX 285
    • 240 cores
    • 648 MHz graphics clock
    • 1476 MHz processor clock
    • 1 GB GDDR3 SDRAM
• General-purpose computing on graphics hardware is getting popular
GPU based Algorithms
• GPUs are much faster at doing parallel tasks
• However, even simple tasks require special algorithms to utilise this effectively
• Example
  – Scan of an array: for each element, find the sum of itself and all previous elements
  – Input : {3,7,1,5,8,2,8,1,8,6,2,8}
  – Output : {3,10,11,16,24,26,34,35,43,49,51,59}
GPU based Algorithms
• On CPU
    for (i = 1; i < num; ++i)
        arr[i] = arr[i] + arr[i-1];
• On GPU
  – Use parallel scanning algorithm
  – Make use of several threads
  – Each element finds the sum of itself and the element at an offset
GPU Algorithm – Parallel sum
• Same number of threads as number of elements in the array
• Offset = 1 => each thread finds the sum of its own element and its neighbouring element
• Double the offset
• Iterate while offset < number of elements
• Can be optimised further by using blocks of threads and intermediate results
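The offset-doubling steps above match the scan usually attributed to Hillis and Steele. The sketch below simulates it sequentially in plain C rather than on a GPU. The inner loop stands in for the threads of one pass, and the copy between passes stands in for the synchronisation a real kernel would need. The function name `offset_doubling_scan` and the scratch buffer `tmp` are assumptions for illustration.

```c
#include <string.h>

/* Inclusive scan of arr[0..n-1] in place; tmp must hold at least n ints. */
void offset_doubling_scan(int *arr, int *tmp, int n)
{
    for (int offset = 1; offset < n; offset *= 2) {
        /* Each iteration of i plays the role of one GPU thread. */
        for (int i = 0; i < n; ++i)
            tmp[i] = (i >= offset) ? arr[i] + arr[i - offset] : arr[i];
        /* All "threads" finish the pass before the next offset starts. */
        memcpy(arr, tmp, (size_t)n * sizeof *arr);
    }
}
```

For the 12-element input from the earlier slide, the passes use offsets 1, 2, 4 and 8. That is four passes, where the sequential CPU loop takes 11 dependent steps.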
Fast ray sorting and breadth-first Packet Traversal for GPU ray tracing
- Garanzha and Loop
• Sort rays on the GPU
  – Generate a hash code for each ray based on
    • Direction of ray
    • Origin of ray
  – If rays have the same hash code
    • Considered coherent
  – Sorted into bins
    • Each bin has < maxSize rays
  – Compression, sorting, decompression scheme
    • Utilises the GPU efficiently
• Create a frustum for each bin
• Breadth-first traverse a BVH of triangles
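One way to picture the hashing step is to quantise the ray origin into a coarse grid over the scene bounds, quantise the direction into a coarse grid over [-1, 1], and pack the cell indices into one integer. Rays that land in the same cells then get the same hash. The bit widths, grid resolutions and the names `quantize` and `ray_hash` below are illustrative assumptions, not the layout used by Garanzha and Loop.

```c
#include <stdint.h>

/* Map v in [lo, hi] to a cell index in [0, cells - 1], clamping outliers. */
static uint32_t quantize(float v, float lo, float hi, uint32_t cells)
{
    float t = (v - lo) / (hi - lo);
    if (t < 0.0f) t = 0.0f;
    if (t > 1.0f) t = 1.0f;
    uint32_t c = (uint32_t)(t * (float)cells);
    return (c < cells) ? c : cells - 1;
}

/* 27-bit hash: 3 x 5 bits of origin cell, then 3 x 4 bits of direction cell.
   The scene is assumed to fit in the cube [scene_lo, scene_hi]^3 and the
   direction is assumed to be normalised. */
uint32_t ray_hash(const float origin[3], const float dir[3],
                  float scene_lo, float scene_hi)
{
    uint32_t h = 0;
    for (int k = 0; k < 3; ++k)
        h = (h << 5) | quantize(origin[k], scene_lo, scene_hi, 32);
    for (int k = 0; k < 3; ++k)
        h = (h << 4) | quantize(dir[k], -1.0f, 1.0f, 16);
    return h;
}
```

Sorting rays by such a key places rays with equal hashes next to each other, so they can be grouped into bins.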
OpenCL
• Based on C
• Framework for developing heterogeneous applications
  – In theory
    • Some parts can be run on the GPU
    • Some on the CPU
• Initially developed by Apple
OpenCL
OpenCL – early impressions
• Still very early
  – Complex code
  – Runs on both CPUs and GPUs
    • Potentially easier to debug on CPUs prior to porting to GPUs
    • Can allocate work based on suitability
    • Runs on NVIDIA and AMD / ATI cards
• CUDA
  – Much easier to program
  – Much cleaner code
  – Not cross-platform
  – Only on NVIDIA GPUs
Conclusion
• A few ray tracing like structures and algorithms
  – RBSP Trees
  – Row Tracing
• Brief summary of GPU algorithms
  – Parallel scan
  – Ray tracing by ray sorting (Garanzha and Loop)
  – OpenCL