83
Fast Parallel Construction of High - Quality Bounding Volume Hierarchies Tero Karras Timo Aila

Karras BVH

Embed Size (px)

Citation preview

Page 1: Karras BVH

Fast Parallel Construction of High-Quality Bounding Volume Hierarchies

Tero Karras

Timo Aila

Page 2: Karras BVH

Ray tracing comes in many flavors

2

Interactive apps

1M–100Mrays/frame

Architecture & design

100M–10Grays/frame

Movie production

10G–1Trays/frame

© Activision 2009, Game trailer by Blur Studio

Courtesy of Delta Tracing Lucasfilm Ltd.™, Digital work by ILM

Courtesy of Columbia Pictures

NVIDIA

Courtesy of Dassault Systemes

Page 3: Karras BVH

Effective performance

3

𝑒𝑓𝑓𝑒𝑐𝑡𝑖𝑣𝑒 𝑟𝑎𝑦 𝑡𝑟𝑎𝑐𝑖𝑛𝑔 𝑝𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒 =𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠

𝑟𝑒𝑛𝑑𝑒𝑟𝑖𝑛𝑔 𝑡𝑖𝑚𝑒

Page 4: Karras BVH

Effective performance

4

𝑟𝑒𝑛𝑑𝑒𝑟𝑖𝑛𝑔 𝑡𝑖𝑚𝑒 = 𝑡𝑖𝑚𝑒 𝑡𝑜 𝑏𝑢𝑖𝑙𝑑 𝐵𝑉𝐻 +𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠

𝑟𝑎𝑦 𝑡ℎ𝑟𝑜𝑢𝑔ℎ𝑝𝑢𝑡

𝑒𝑓𝑓𝑒𝑐𝑡𝑖𝑣𝑒 𝑟𝑎𝑦 𝑡𝑟𝑎𝑐𝑖𝑛𝑔 𝑝𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒 =𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠

𝑟𝑒𝑛𝑑𝑒𝑟𝑖𝑛𝑔 𝑡𝑖𝑚𝑒

“speed” “quality”

Page 5: Karras BVH

Effective performance

5

0

50

100

150

200

250

300

350

400

450

1M 10M 100M 1G 10G 100G 1T

𝑒𝑓𝑓𝑒𝑐𝑡𝑖𝑣𝑒𝑝𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒

𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒

𝑟𝑒𝑛𝑑𝑒𝑟𝑖𝑛𝑔 𝑡𝑖𝑚𝑒 = 𝑡𝑖𝑚𝑒 𝑡𝑜 𝑏𝑢𝑖𝑙𝑑 𝐵𝑉𝐻 +𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠

𝑟𝑎𝑦 𝑡ℎ𝑟𝑜𝑢𝑔ℎ𝑝𝑢𝑡

Interactiveapps

Architecture& design

Movieproduction

Both matter

𝑒𝑓𝑓𝑒𝑐𝑡𝑖𝑣𝑒 𝑟𝑎𝑦 𝑡𝑟𝑎𝑐𝑖𝑛𝑔 𝑝𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒 =𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠

𝑟𝑒𝑛𝑑𝑒𝑟𝑖𝑛𝑔 𝑡𝑖𝑚𝑒

Mrays/s

Page 6: Karras BVH

Effective performance

6

0

50

100

150

200

250

300

350

400

450

1M 10M 100M 1G 10G 100G 1T

𝑒𝑓𝑓𝑒𝑐𝑡𝑖𝑣𝑒𝑝𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒

𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒

Mrays/s

SODA (2.2M tris)

NVIDIA GTX Titan

Diffuse rays

Page 7: Karras BVH

Effective performance

7

0

50

100

150

200

250

300

350

400

450

1M 10M 100M 1G 10G 100G 1T

𝑒𝑓𝑓𝑒𝑐𝑡𝑖𝑣𝑒𝑝𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒

SBVH[Stich et al. 2009]

(CPU, 4 cores)

𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒

Build timedominates

Mrays/s

Page 8: Karras BVH

Effective performance

8

0

50

100

150

200

250

300

350

400

450

1M 10M 100M 1G 10G 100G 1T

𝑒𝑓𝑓𝑒𝑐𝑡𝑖𝑣𝑒𝑝𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒

HLBVH[Garanzha et al. 2011]

(GPU)

???

𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒

Mrays/s

SBVH[Stich et al. 2009]

(CPU, 4 cores)

Page 9: Karras BVH

Effective performance

9

0

50

100

150

200

250

300

350

400

450

1M 10M 100M 1G 10G 100G 1T

𝑒𝑓𝑓𝑒𝑐𝑡𝑖𝑣𝑒𝑝𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒

Our method(GPU)

𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒

Mrays/s

HLBVH[Garanzha et al. 2011]

(GPU)

SBVH[Stich et al. 2009]

(CPU, 4 cores)

Page 10: Karras BVH

0

50

100

150

200

250

300

350

400

450

1M 10M 100M 1G 10G 100G 1T

Effective performance

10

𝑒𝑓𝑓𝑒𝑐𝑡𝑖𝑣𝑒𝑝𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒

30M–500Grays/frame

97% ofSBVH

𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒

Mrays/s

Best quality–speed tradeoff for wide range of applications

Page 11: Karras BVH

Treelet restructuring

11

IdeaBuild a low-quality BVH

Optimize its node topology

Look at multiple nodes at once

Page 12: Karras BVH

Treelet restructuring

12

IdeaBuild a low-quality BVH

Optimize its node topology

Look at multiple nodes at once

TreeletSubset of a node’s descendants

Page 13: Karras BVH

R

Treelet restructuring

13

Treelet root

Treelet leaf

IdeaBuild a low-quality BVH

Optimize its node topology

Look at multiple nodes at once

TreeletSubset of a node’s descendants

Page 14: Karras BVH

R

Treelet restructuring

14

Treelet root

Treelet leaf

IdeaBuild a low-quality BVH

Optimize its node topology

Look at multiple nodes at once

TreeletSubset of a node’s descendants

Grow by turning leaves into internal nodes

Grow

Page 15: Karras BVH

R

Treelet restructuring

15

Treeletinternal

node

Grow

IdeaBuild a low-quality BVH

Optimize its node topology

Look at multiple nodes at once

TreeletSubset of a node’s descendants

Grow by turning leaves into internal nodes

Page 16: Karras BVH

R

Treelet restructuring

16

IdeaBuild a low-quality BVH

Optimize its node topology

Look at multiple nodes at once

TreeletSubset of a node’s descendants

Grow by turning leaves into internal nodes

Page 17: Karras BVH

R

Treelet restructuring

17

IdeaBuild a low-quality BVH

Optimize its node topology

Look at multiple nodes at once

TreeletSubset of a node’s descendants

Grow by turning leaves into internal nodes

Page 18: Karras BVH

R

Treelet restructuring

18

IdeaBuild a low-quality BVH

Optimize its node topology

Look at multiple nodes at once

TreeletSubset of a node’s descendants

Grow by turning leaves into internal nodes

Page 19: Karras BVH

Treelet restructuring

19

R

IdeaBuild a low-quality BVH

Optimize its node topology

Look at multiple nodes at once

TreeletSubset of a node’s descendants

Grow by turning leaves into internal nodes

Largest leaves → best results

Page 20: Karras BVH

Treelet restructuring

20

R

C

A B

F

D

G

E

𝑛 = 7treelet leaves

𝑛 − 1 = 6treelet internal

nodesIdeaBuild a low-quality BVH

Optimize its node topology

Look at multiple nodes at once

TreeletSubset of a node’s descendants

Grow by turning leaves into internal nodes

Largest leaves → best results

Valid binary tree in itself

Page 21: Karras BVH

Treelet restructuring

21

R

C

A B

F

D

G

E

ActualBVH leaf

Arbitrarysubtree

IdeaBuild a low-quality BVH

Optimize its node topology

Look at multiple nodes at once

TreeletSubset of a node’s descendants

Grow by turning leaves into internal nodes

Largest leaves → best results

Valid binary tree in itselfLeaves can represent arbitrary subtrees of the BVH

Page 22: Karras BVH

Treelet restructuring

22

R

C

A B

F

D

G

E

RestructuringConstruct optimal binary tree for the same set of leaves

Replace old treelet

Page 23: Karras BVH

Treelet restructuring

23

C

A B

F

D

G

E

R

RestructuringConstruct optimal binary tree for the same set of leaves

Replace old treelet

Reuse the same nodesUpdate connectivity and AABBs

New AABBs should be smaller

Page 24: Karras BVH

D

E

G

Treelet restructuring

24

A

R

C

F

B

RestructuringConstruct optimal binary tree for the same set of leaves

Replace old treelet

Reuse the same nodesUpdate connectivity and AABBs

New AABBs should be smaller

Page 25: Karras BVH

Treelet restructuring

25

CA BF

D

G E

R

RestructuringConstruct optimal binary tree for the same set of leaves

Replace old treelet

Reuse the same nodesUpdate connectivity and AABBs

New AABBs should be smaller

Page 26: Karras BVH

Treelet restructuring

26

R

CA BF

D

G E

RestructuringConstruct optimal binary tree for the same set of leaves

Replace old treelet

Reuse the same nodesUpdate connectivity and AABBs

New AABBs should be smaller

Perfectly localized operationLeaves and their subtrees are kept intact

No need to look at subtree contents

Page 27: Karras BVH

Processing stages

27

Initial BVH construction

Post-processing

Optimization

Input triangles

One triangleper leaf

Page 28: Karras BVH

Processing stages

28

Initial BVH construction

Post-processing

Optimization

Parallel LBVH[Karras 2012]

60-bit Morton codesfor accurate spatial

partitioning

Page 29: Karras BVH

Processing stages

29

Initial BVH construction

Post-processing

Optimization

Parallel bottom-up traversal[Karras 2012]

Restructure multipletreelets in parallel

Page 30: Karras BVH

Processing stages

30

Initial BVH construction

Post-processing

Optimization

Parallel bottom-up traversal[Karras 2012]

Page 31: Karras BVH

Processing stages

31

Initial BVH construction

Post-processing

Optimization

Parallel bottom-up traversal[Karras 2012]

Page 32: Karras BVH

Processing stages

32

Initial BVH construction

Post-processing

Optimization

Parallel bottom-up traversal[Karras 2012]

Page 33: Karras BVH

Processing stages

33

Initial BVH construction

Post-processing

Optimization

Parallel bottom-up traversal[Karras 2012]

Page 34: Karras BVH

Processing stages

34

Initial BVH construction

Post-processing

Optimization

Parallel bottom-up traversal[Karras 2012]

Page 35: Karras BVH

Processing stages

35

Initial BVH construction

Post-processing

Optimization

Parallel bottom-up traversal[Karras 2012]

Strict bottom-up order→ no overlap between treelets

Page 36: Karras BVH

Processing stages

36

Initial BVH construction

Post-processing

Rinse and repeat(3 times is plenty)

Optimization

Page 37: Karras BVH

Processing stages

37

Initial BVH construction

Post-processing

Optimization

Collapse subtreesinto leaf nodes

Page 38: Karras BVH

Processing stages

38

Initial BVH construction

Post-processing

Optimization

Collect trianglesinto linear lists Prepare them for Woop’s

intersection test[Woop 2004]

Page 39: Karras BVH

Processing stages

39

Initial BVH construction

Post-processing

Optimization

Fast GPU ray traversal[Aila et al. 2012]

Page 40: Karras BVH

Processing stages

40

Initial BVH construction

Post-processing

Optimization

Triangle splitting

Fast GPU ray traversal[Aila et al. 2012]

Page 41: Karras BVH

Processing stages

41

Initial BVH construction

Post-processing

Optimization

Triangle splitting

Fast GPU ray traversal[Aila et al. 2012]

0.4 ms

5.4 ms

6.6 ms

17.0 ms

21.4 ms

1.2 ms

1.6 ms

DRAGON (870K tris)

NVIDIA GTX Titan23.6 ms / 30.0 ms

No splits

30% splits

Page 42: Karras BVH

Cost model

Surface area cost model[Goldsmith and Salmon 1987], [MacDonald and Booth 1990]

𝑆𝐴𝐻 ≔ 𝐶𝑖

𝑛∈𝐼

𝐴(𝑛)

𝐴(root)+ 𝐶𝑡

𝑙∈𝐿

𝐴 𝑙

𝐴 root𝑁(𝑙)

Track cost and triangle count of each subtree

Minimize SAH cost of the final BVH

Make collapsing decisions already during optimization

→ Unified processing of leaves and internal nodes

42

Page 43: Karras BVH

Optimal restructuring

43

Finding the optimal node topology is NP-hard

Naive algorithm → ࣩ 𝑛!

Our approach → ࣩ 3𝑛

But it becomes very powerful as 𝑛 grows

𝑛 = 7 treelet leaves is enough for high-quality results

Use fixed-size treelets

Constant cost per treelet

→ Linear with respect to scene size

Page 44: Karras BVH

Optimal restructuring

44

Treelet size Layouts Quality vs. SBVH *

4 15 78%

5 105 85%

6 945 88%

7 10,395 97%

8 135,135 98%

* SODA (2.2M tris)

Number of unique ways forrestructuring a given treelet

Ray tracing performanceafter 3 rounds of optimization

Page 45: Karras BVH

Optimal restructuring

45

Treelet size Layouts Quality vs. SBVH *

4 15 78%

5 105 85%

6 945 88%

7 10,395 97%

8 135,135 98%

* SODA (2.2M tris)

Almost thesame thing astree rotations

[Kensler 2008]

Limited options during optimization→ easy to get stuck in a local optimum

Varies a lotbetweenscenes

Page 46: Karras BVH

Optimal restructuring

46

Treelet size Layouts Quality vs. SBVH *

4 15 78%

5 105 85%

6 945 88%

7 10,395 97%

8 135,135 98%

* SODA (2.2M tris)

Can still beimplemented

efficiently

Surely one of thesewill take us forward

Consistentacross scenes

Further improvementis marginal

Page 47: Karras BVH

Algorithm

Dynamic programming

Solve small subproblems first

Tabulate their solutions

Build on them to solve larger subproblems

Subproblem:

What’s the best node topology for a subset of the leaves?

47

Page 48: Karras BVH

Algorithm

48

input: set of 𝑛 treelet leavesfor 𝑘 = 2 to 𝑛 do

for each subset of size 𝑘 dofor each way of partitioning the leaves do

look up subtree costscalculate SAH cost

end forrecord the best solution

end forend forreconstruct optimal topology

Process subsets fromsmallest to largest

Record the optimalSAH cost for each

Page 49: Karras BVH

Algorithm

49

input: set of 𝑛 treelet leavesfor 𝑘 = 2 to 𝑛 do

for each subset of size 𝑘 dofor each way of partitioning the leaves do

look up subtree costscalculate SAH cost

end forrecord the best solution

end forend forreconstruct optimal topology

Exhaustive search:assign each leaf toleft/right subtree

We already knowhow much thesubtrees will cost

Backtrack thepartitioning choices

Page 50: Karras BVH

Scalar vs. SIMD

50

Scalar processing

Each thread processes one treelet

Need many treelets in flight

SIMD processing

32 threads collaborate on the same treelet

Need few treelets in flight

✗Spills to off-chip memory

✗Doesn’t scale to small scenes

✓Trivial to implement

✓Data fits in on-chip memory

✓Easy to fill the entire GPU

✗Need to keep all threads busy

Parallelize over subproblems usinga pre-optimized processing schedule

(details in the paper)

Page 51: Karras BVH

Scalar vs. SIMD

51

Scalar processing

Each thread processes one treelet

Need many treelets in flight

SIMD processing

32 threads collaborate on the same treelet

Need few treelets in flight

✓Data fits in on-chip memory

✓Easy to fill the entire GPU

✓Possible to keep threads busy

✗Spills to off-chip memory

✗Doesn’t scale to small scenes

✓Trivial to implement

Page 52: Karras BVH

Quality vs. speed

Spend less effort on bottom-most nodes

Low contribution to SAH cost

Quick convergence

Additional parameter 𝛾

Only process subtrees that are large enough

Trade quality for speed

Double 𝛾 after each round

Significant speedup

Negligible effect on quality

52

Page 53: Karras BVH

Triangle splitting

Early Split Clipping [Ernst and Greiner 2007]Split triangle bounding boxes as a pre-process

53

Bounding box is nota good approximation

Split it!

Large triangle

Page 54: Karras BVH

Triangle splitting

54

Resulting boxesprovide atighter bound

Keep going until theyare small enough

Early Split Clipping [Ernst and Greiner 2007]Split triangle bounding boxes as a pre-process

Page 55: Karras BVH

Triangle splitting

55

Early Split Clipping [Ernst and Greiner 2007]Split triangle bounding boxes as a pre-process

Keep going until theyare small enough

Page 56: Karras BVH

Triangle splitting

56

Early Split Clipping [Ernst and Greiner 2007]Split triangle bounding boxes as a pre-process

Treat each box as aseparate primitive

Triangle itselfremains the same

Page 57: Karras BVH

Triangle splitting

Shortcomings of pre-process splitting

Can hurt ray tracing performance

Unpredictable memory usage

Requires manual tuning

Improve with better heuristics

Select good split planes

Concentrate splits where they matter

Use a fixed split budget

57

Page 58: Karras BVH

Split plane selection

58

Root node partitions thescene at its spatial median

Reduce node overlap in the initial BVH

Page 59: Karras BVH

Split plane selection

59

Left child

Right child

Reduce node overlap in the initial BVH

Page 60: Karras BVH

If a trianglecrosses the plane...

Split plane selection

60

...the bounding boxes will overlap

Use the samespatial medianas a split plane

Reduce node overlap in the initial BVH

Page 61: Karras BVH

Split plane selection

61

No overlap

Use the samespatial medianas a split plane

Reduce node overlap in the initial BVH

Page 62: Karras BVH

Split plane selection

62

Splitting one triangledoes not help much

Need to split them allto get the benefits

Reduce node overlap in the initial BVH

Page 63: Karras BVH

Split plane selection

63

Reduce node overlap in the initial BVH

Page 64: Karras BVH

Split plane selection

64

Same reasoning holds on multiple levels

Reduce node overlap in the initial BVH

Level 0

Level 1

Level 2 Level 2

Level 3

Level 3

Page 65: Karras BVH

Split plane selection

65

Look at all spatialmedian planes thatintersect a triangle

Split it with thedominant one

Reduce node overlap in the initial BVH

Level 1

Level 2 Level 2Level 0

Level 3

Level 3

Page 66: Karras BVH

Algorithm

1. Allocate memory for a fixed split budget

66

Page 67: Karras BVH

Algorithm

1. Allocate memory for a fixed split budget

2. Calculate a priority value for each triangle

67

Page 68: Karras BVH

Algorithm

1. Allocate memory for a fixed split budget

2. Calculate a priority value for each triangle

3. Distribute the split budget among triangles

Proportional to their priority values

68

Page 69: Karras BVH

Algorithm

1. Allocate memory for a fixed split budget

2. Calculate a priority value for each triangle

3. Distribute the split budget among triangles

Proportional to their priority values

4. Split each triangle recursively

Distribute remaining splits according to the size of the resulting AABBs

69

Page 70: Karras BVH

Split priority

70

𝑝𝑟𝑖𝑜𝑟𝑖𝑡𝑦 = 2(−𝑙𝑒𝑣𝑒𝑙) ∙ 𝐴𝑎𝑎𝑏𝑏 − 𝐴𝑖𝑑𝑒𝑎𝑙 1 3

Crosses an importantspatial median plane?

Has large potential forreducing surface area?

Concentrate on triangleswhere both apply

…but leavesomething forthe rest, too

Page 71: Karras BVH

ResultsCompare against 4 CPU and 3 GPU builders

4-core i7 930, NVIDIA GTX Titan

Average of 20 test scenes, multiple viewpoints

71

Page 72: Karras BVH

Ray tracing performance

72

SweepSAH[MacDonald]

SBVH[Stich]

Treerotations[Kensler]

Iterativereinsertion

[Bittner]

0%

20%

40%

60%

80%

100%

120%

140%

SweepSAH = 100%

High-quality CPU builders

SBVH = 131%

Page 73: Karras BVH

0%

20%

40%

60%

80%

100%

120%

140%

Ray tracing performance

73

SweepSAH[MacDonald]

SBVH[Stich]

Treerotations[Kensler]

Iterativereinsertion

[Bittner]

LBVH[Karras]

HLBVH[Garanzha]

GridSAH[Garanzha]

Fast GPU builders

67% – 69%

SBVH = 131%

Almost 2×

Page 74: Karras BVH

Ray tracing performance

74

SweepSAH[MacDonald]

SBVH[Stich]

Treerotations[Kensler]

Iterativereinsertion

[Bittner]

TRBVH TRBVH+30% split

LBVH[Karras]

HLBVH[Garanzha]

GridSAH[Garanzha]

0%

20%

40%

60%

80%

100%

120%

140%

No splits

30% splits

Our method

Page 75: Karras BVH

Ray tracing performance

75

SweepSAH[MacDonald]

SBVH[Stich]

Treerotations[Kensler]

Iterativereinsertion

[Bittner]

TRBVH TRBVH+30% split

LBVH[Karras]

HLBVH[Garanzha]

GridSAH[Garanzha]

0%

20%

40%

60%

80%

100%

120%

140%

96% of SweepSAH

91% of SBVH

Page 76: Karras BVH

Effective performance

76

0%

20%

40%

60%

80%

100%

120%

140%

1M 10M 100M 1G 10G 100G 1T

𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒

Tree rotations[Kensler]

Iterative reinsertion[Bittner]

Not Pareto-optimal

SBVH[Stich]

SweepSAH[MacDonald]

Page 77: Karras BVH

Effective performance

77

0%

20%

40%

60%

80%

100%

120%

140%

1M 10M 100M 1G 10G 100G 1T

𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒

SBVH[Stich]

SweepSAH[MacDonald]

Page 78: Karras BVH

Effective performance

78

0%

20%

40%

60%

80%

100%

120%

140%

1M 10M 100M 1G 10G 100G 1T

𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒

HLBVH[Garanzha] GridSAH

[Garanzha]

SBVH[Stich]

SweepSAH[MacDonald]

LBVH[Karras]

Not Pareto-optimal

Page 79: Karras BVH

Effective performance

79

0%

20%

40%

60%

80%

100%

120%

140%

1M 10M 100M 1G 10G 100G 1T

𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒

SBVH[Stich]

SweepSAH[MacDonald]

LBVH[Karras]

Page 80: Karras BVH

Effective performance

80

0%

20%

40%

60%

80%

100%

120%

140%

1M 10M 100M 1G 10G 100G 1T

𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒

SBVH[Stich]

Our method(no splits)

SweepSAH[MacDonald]

LBVH[Karras]

Our method(30% splits)

Page 81: Karras BVH

Effective performance

81

0%

20%

40%

60%

80%

100%

120%

140%

1M 10M 100M 1G 10G 100G 1T

𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒

7M–60G rays/frame→ our method is the best choice

Below 7M→ LBVH

Above 60G→ SBVH

Page 82: Karras BVH

Conclusion

General framework for optimizing trees

Inherently parallel

Approximate restructuring → larger treelets?

Practical GPU-based BVH builder

Best choice in a large class of applications

Adjustable quality–speed tradeoff

Will be integrated into NVIDIA OptiX

82

Page 83: Karras BVH

83

Thank you

Acknowledgements

Samuli Laine

Jaakko Lehtinen

Sami Liedes

David McAllister

Anonymous reviewers

Anat Grynberg and Greg Ward for CONFERENCE

University of Utah for FAIRY

Marko Dabrovic for SIBENIK

Ryan Vance for BUBS

Samuli Laine for HAIRBALL and VEGETATION

Guillermo Leal Laguno for SANMIGUEL

Jonathan Good for ARABIC, BABYLONIAN and ITALIAN

Stanford Computer Graphics Laboratory for ARMADILLO, BUDDHA and DRAGON

Cornell University for BAR

Georgia Institute of Technology for BLADE