Upload
cmaestrofdez
View
251
Download
2
Embed Size (px)
Citation preview
Fast Parallel Construction of High-Quality Bounding Volume Hierarchies
Tero Karras
Timo Aila
Ray tracing comes in many flavors
2
Interactive apps
1M–100Mrays/frame
Architecture & design
100M–10Grays/frame
Movie production
10G–1Trays/frame
© Activision 2009, Game trailer by Blur Studio
Courtesy of Delta Tracing Lucasfilm Ltd.™, Digital work by ILM
Courtesy of Columbia Pictures
NVIDIA
Courtesy of Dassault Systemes
Effective performance
3
𝑒𝑓𝑓𝑒𝑐𝑡𝑖𝑣𝑒 𝑟𝑎𝑦 𝑡𝑟𝑎𝑐𝑖𝑛𝑔 𝑝𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒 =𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠
𝑟𝑒𝑛𝑑𝑒𝑟𝑖𝑛𝑔 𝑡𝑖𝑚𝑒
Effective performance
4
𝑟𝑒𝑛𝑑𝑒𝑟𝑖𝑛𝑔 𝑡𝑖𝑚𝑒 = 𝑡𝑖𝑚𝑒 𝑡𝑜 𝑏𝑢𝑖𝑙𝑑 𝐵𝑉𝐻 +𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠
𝑟𝑎𝑦 𝑡ℎ𝑟𝑜𝑢𝑔ℎ𝑝𝑢𝑡
𝑒𝑓𝑓𝑒𝑐𝑡𝑖𝑣𝑒 𝑟𝑎𝑦 𝑡𝑟𝑎𝑐𝑖𝑛𝑔 𝑝𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒 =𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠
𝑟𝑒𝑛𝑑𝑒𝑟𝑖𝑛𝑔 𝑡𝑖𝑚𝑒
“speed” “quality”
Effective performance
5
0
50
100
150
200
250
300
350
400
450
1M 10M 100M 1G 10G 100G 1T
𝑒𝑓𝑓𝑒𝑐𝑡𝑖𝑣𝑒𝑝𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒
𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒
𝑟𝑒𝑛𝑑𝑒𝑟𝑖𝑛𝑔 𝑡𝑖𝑚𝑒 = 𝑡𝑖𝑚𝑒 𝑡𝑜 𝑏𝑢𝑖𝑙𝑑 𝐵𝑉𝐻 +𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠
𝑟𝑎𝑦 𝑡ℎ𝑟𝑜𝑢𝑔ℎ𝑝𝑢𝑡
Interactiveapps
Architecture& design
Movieproduction
Both matter
𝑒𝑓𝑓𝑒𝑐𝑡𝑖𝑣𝑒 𝑟𝑎𝑦 𝑡𝑟𝑎𝑐𝑖𝑛𝑔 𝑝𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒 =𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠
𝑟𝑒𝑛𝑑𝑒𝑟𝑖𝑛𝑔 𝑡𝑖𝑚𝑒
Mrays/s
Effective performance
6
0
50
100
150
200
250
300
350
400
450
1M 10M 100M 1G 10G 100G 1T
𝑒𝑓𝑓𝑒𝑐𝑡𝑖𝑣𝑒𝑝𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒
𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒
Mrays/s
SODA (2.2M tris)
NVIDIA GTX Titan
Diffuse rays
Effective performance
7
0
50
100
150
200
250
300
350
400
450
1M 10M 100M 1G 10G 100G 1T
𝑒𝑓𝑓𝑒𝑐𝑡𝑖𝑣𝑒𝑝𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒
SBVH[Stich et al. 2009]
(CPU, 4 cores)
𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒
Build timedominates
Mrays/s
Effective performance
8
0
50
100
150
200
250
300
350
400
450
1M 10M 100M 1G 10G 100G 1T
𝑒𝑓𝑓𝑒𝑐𝑡𝑖𝑣𝑒𝑝𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒
HLBVH[Garanzha et al. 2011]
(GPU)
???
𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒
Mrays/s
SBVH[Stich et al. 2009]
(CPU, 4 cores)
Effective performance
9
0
50
100
150
200
250
300
350
400
450
1M 10M 100M 1G 10G 100G 1T
𝑒𝑓𝑓𝑒𝑐𝑡𝑖𝑣𝑒𝑝𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒
Our method(GPU)
𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒
Mrays/s
HLBVH[Garanzha et al. 2011]
(GPU)
SBVH[Stich et al. 2009]
(CPU, 4 cores)
0
50
100
150
200
250
300
350
400
450
1M 10M 100M 1G 10G 100G 1T
Effective performance
10
𝑒𝑓𝑓𝑒𝑐𝑡𝑖𝑣𝑒𝑝𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒
30M–500Grays/frame
97% ofSBVH
𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒
Mrays/s
Best quality–speed tradeoff for wide range of applications
Treelet restructuring
11
IdeaBuild a low-quality BVH
Optimize its node topology
Look at multiple nodes at once
Treelet restructuring
12
IdeaBuild a low-quality BVH
Optimize its node topology
Look at multiple nodes at once
TreeletSubset of a node’s descendants
R
Treelet restructuring
13
Treelet root
Treelet leaf
IdeaBuild a low-quality BVH
Optimize its node topology
Look at multiple nodes at once
TreeletSubset of a node’s descendants
R
Treelet restructuring
14
Treelet root
Treelet leaf
IdeaBuild a low-quality BVH
Optimize its node topology
Look at multiple nodes at once
TreeletSubset of a node’s descendants
Grow by turning leaves into internal nodes
Grow
R
Treelet restructuring
15
Treeletinternal
node
Grow
IdeaBuild a low-quality BVH
Optimize its node topology
Look at multiple nodes at once
TreeletSubset of a node’s descendants
Grow by turning leaves into internal nodes
R
Treelet restructuring
16
IdeaBuild a low-quality BVH
Optimize its node topology
Look at multiple nodes at once
TreeletSubset of a node’s descendants
Grow by turning leaves into internal nodes
R
Treelet restructuring
17
IdeaBuild a low-quality BVH
Optimize its node topology
Look at multiple nodes at once
TreeletSubset of a node’s descendants
Grow by turning leaves into internal nodes
R
Treelet restructuring
18
IdeaBuild a low-quality BVH
Optimize its node topology
Look at multiple nodes at once
TreeletSubset of a node’s descendants
Grow by turning leaves into internal nodes
Treelet restructuring
19
R
IdeaBuild a low-quality BVH
Optimize its node topology
Look at multiple nodes at once
TreeletSubset of a node’s descendants
Grow by turning leaves into internal nodes
Largest leaves → best results
Treelet restructuring
20
R
C
A B
F
D
G
E
𝑛 = 7treelet leaves
𝑛 − 1 = 6treelet internal
nodesIdeaBuild a low-quality BVH
Optimize its node topology
Look at multiple nodes at once
TreeletSubset of a node’s descendants
Grow by turning leaves into internal nodes
Largest leaves → best results
Valid binary tree in itself
Treelet restructuring
21
R
C
A B
F
D
G
E
ActualBVH leaf
Arbitrarysubtree
IdeaBuild a low-quality BVH
Optimize its node topology
Look at multiple nodes at once
TreeletSubset of a node’s descendants
Grow by turning leaves into internal nodes
Largest leaves → best results
Valid binary tree in itselfLeaves can represent arbitrary subtrees of the BVH
Treelet restructuring
22
R
C
A B
F
D
G
E
RestructuringConstruct optimal binary tree for the same set of leaves
Replace old treelet
Treelet restructuring
23
C
A B
F
D
G
E
R
RestructuringConstruct optimal binary tree for the same set of leaves
Replace old treelet
Reuse the same nodesUpdate connectivity and AABBs
New AABBs should be smaller
D
E
G
Treelet restructuring
24
A
R
C
F
B
RestructuringConstruct optimal binary tree for the same set of leaves
Replace old treelet
Reuse the same nodesUpdate connectivity and AABBs
New AABBs should be smaller
Treelet restructuring
25
CA BF
D
G E
R
RestructuringConstruct optimal binary tree for the same set of leaves
Replace old treelet
Reuse the same nodesUpdate connectivity and AABBs
New AABBs should be smaller
Treelet restructuring
26
R
CA BF
D
G E
RestructuringConstruct optimal binary tree for the same set of leaves
Replace old treelet
Reuse the same nodesUpdate connectivity and AABBs
New AABBs should be smaller
Perfectly localized operationLeaves and their subtrees are kept intact
No need to look at subtree contents
Processing stages
27
Initial BVH construction
Post-processing
Optimization
Input triangles
One triangleper leaf
Processing stages
28
Initial BVH construction
Post-processing
Optimization
Parallel LBVH[Karras 2012]
60-bit Morton codesfor accurate spatial
partitioning
Processing stages
29
Initial BVH construction
Post-processing
Optimization
Parallel bottom-up traversal[Karras 2012]
Restructure multipletreelets in parallel
Processing stages
30
Initial BVH construction
Post-processing
Optimization
Parallel bottom-up traversal[Karras 2012]
Processing stages
31
Initial BVH construction
Post-processing
Optimization
Parallel bottom-up traversal[Karras 2012]
Processing stages
32
Initial BVH construction
Post-processing
Optimization
Parallel bottom-up traversal[Karras 2012]
Processing stages
33
Initial BVH construction
Post-processing
Optimization
Parallel bottom-up traversal[Karras 2012]
Processing stages
34
Initial BVH construction
Post-processing
Optimization
Parallel bottom-up traversal[Karras 2012]
Processing stages
35
Initial BVH construction
Post-processing
Optimization
Parallel bottom-up traversal[Karras 2012]
Strict bottom-up order→ no overlap between treelets
Processing stages
36
Initial BVH construction
Post-processing
Rinse and repeat(3 times is plenty)
Optimization
Processing stages
37
Initial BVH construction
Post-processing
Optimization
Collapse subtreesinto leaf nodes
Processing stages
38
Initial BVH construction
Post-processing
Optimization
Collect trianglesinto linear lists Prepare them for Woop’s
intersection test[Woop 2004]
Processing stages
39
Initial BVH construction
Post-processing
Optimization
Fast GPU ray traversal[Aila et al. 2012]
Processing stages
40
Initial BVH construction
Post-processing
Optimization
Triangle splitting
Fast GPU ray traversal[Aila et al. 2012]
Processing stages
41
Initial BVH construction
Post-processing
Optimization
Triangle splitting
Fast GPU ray traversal[Aila et al. 2012]
0.4 ms
5.4 ms
6.6 ms
17.0 ms
21.4 ms
1.2 ms
1.6 ms
DRAGON (870K tris)
NVIDIA GTX Titan23.6 ms / 30.0 ms
No splits
30% splits
Cost model
Surface area cost model[Goldsmith and Salmon 1987], [MacDonald and Booth 1990]
𝑆𝐴𝐻 ≔ 𝐶𝑖
𝑛∈𝐼
𝐴(𝑛)
𝐴(root)+ 𝐶𝑡
𝑙∈𝐿
𝐴 𝑙
𝐴 root𝑁(𝑙)
Track cost and triangle count of each subtree
Minimize SAH cost of the final BVH
Make collapsing decisions already during optimization
→ Unified processing of leaves and internal nodes
42
Optimal restructuring
43
Finding the optimal node topology is NP-hard
Naive algorithm → ࣩ 𝑛!
Our approach → ࣩ 3𝑛
But it becomes very powerful as 𝑛 grows
𝑛 = 7 treelet leaves is enough for high-quality results
Use fixed-size treelets
Constant cost per treelet
→ Linear with respect to scene size
Optimal restructuring
44
Treelet size Layouts Quality vs. SBVH *
4 15 78%
5 105 85%
6 945 88%
7 10,395 97%
8 135,135 98%
* SODA (2.2M tris)
Number of unique ways forrestructuring a given treelet
Ray tracing performanceafter 3 rounds of optimization
Optimal restructuring
45
Treelet size Layouts Quality vs. SBVH *
4 15 78%
5 105 85%
6 945 88%
7 10,395 97%
8 135,135 98%
* SODA (2.2M tris)
Almost thesame thing astree rotations
[Kensler 2008]
Limited options during optimization→ easy to get stuck in a local optimum
Varies a lotbetweenscenes
Optimal restructuring
46
Treelet size Layouts Quality vs. SBVH *
4 15 78%
5 105 85%
6 945 88%
7 10,395 97%
8 135,135 98%
* SODA (2.2M tris)
Can still beimplemented
efficiently
Surely one of thesewill take us forward
Consistentacross scenes
Further improvementis marginal
Algorithm
Dynamic programming
Solve small subproblems first
Tabulate their solutions
Build on them to solve larger subproblems
Subproblem:
What’s the best node topology for a subset of the leaves?
47
Algorithm
48
input: set of 𝑛 treelet leavesfor 𝑘 = 2 to 𝑛 do
for each subset of size 𝑘 dofor each way of partitioning the leaves do
look up subtree costscalculate SAH cost
end forrecord the best solution
end forend forreconstruct optimal topology
Process subsets fromsmallest to largest
Record the optimalSAH cost for each
Algorithm
49
input: set of 𝑛 treelet leavesfor 𝑘 = 2 to 𝑛 do
for each subset of size 𝑘 dofor each way of partitioning the leaves do
look up subtree costscalculate SAH cost
end forrecord the best solution
end forend forreconstruct optimal topology
Exhaustive search:assign each leaf toleft/right subtree
We already knowhow much thesubtrees will cost
Backtrack thepartitioning choices
Scalar vs. SIMD
50
Scalar processing
Each thread processes one treelet
Need many treelets in flight
SIMD processing
32 threads collaborate on the same treelet
Need few treelets in flight
✗Spills to off-chip memory
✗Doesn’t scale to small scenes
✓Trivial to implement
✓Data fits in on-chip memory
✓Easy to fill the entire GPU
✗Need to keep all threads busy
Parallelize over subproblems usinga pre-optimized processing schedule
(details in the paper)
Scalar vs. SIMD
51
Scalar processing
Each thread processes one treelet
Need many treelets in flight
SIMD processing
32 threads collaborate on the same treelet
Need few treelets in flight
✓Data fits in on-chip memory
✓Easy to fill the entire GPU
✓Possible to keep threads busy
✗Spills to off-chip memory
✗Doesn’t scale to small scenes
✓Trivial to implement
Quality vs. speed
Spend less effort on bottom-most nodes
Low contribution to SAH cost
Quick convergence
Additional parameter 𝛾
Only process subtrees that are large enough
Trade quality for speed
Double 𝛾 after each round
Significant speedup
Negligible effect on quality
52
Triangle splitting
Early Split Clipping [Ernst and Greiner 2007]Split triangle bounding boxes as a pre-process
53
Bounding box is nota good approximation
Split it!
Large triangle
Triangle splitting
54
Resulting boxesprovide atighter bound
Keep going until theyare small enough
Early Split Clipping [Ernst and Greiner 2007]Split triangle bounding boxes as a pre-process
Triangle splitting
55
Early Split Clipping [Ernst and Greiner 2007]Split triangle bounding boxes as a pre-process
Keep going until theyare small enough
Triangle splitting
56
Early Split Clipping [Ernst and Greiner 2007]Split triangle bounding boxes as a pre-process
Treat each box as aseparate primitive
Triangle itselfremains the same
Triangle splitting
Shortcomings of pre-process splitting
Can hurt ray tracing performance
Unpredictable memory usage
Requires manual tuning
Improve with better heuristics
Select good split planes
Concentrate splits where they matter
Use a fixed split budget
57
Split plane selection
58
Root node partitions thescene at its spatial median
Reduce node overlap in the initial BVH
Split plane selection
59
Left child
Right child
Reduce node overlap in the initial BVH
If a trianglecrosses the plane...
Split plane selection
60
...the bounding boxes will overlap
Use the samespatial medianas a split plane
Reduce node overlap in the initial BVH
Split plane selection
61
No overlap
Use the samespatial medianas a split plane
Reduce node overlap in the initial BVH
Split plane selection
62
Splitting one triangledoes not help much
Need to split them allto get the benefits
Reduce node overlap in the initial BVH
Split plane selection
63
Reduce node overlap in the initial BVH
Split plane selection
64
Same reasoning holds on multiple levels
Reduce node overlap in the initial BVH
Level 0
Level 1
Level 2 Level 2
Level 3
Level 3
Split plane selection
65
Look at all spatialmedian planes thatintersect a triangle
Split it with thedominant one
Reduce node overlap in the initial BVH
Level 1
Level 2 Level 2Level 0
Level 3
Level 3
Algorithm
1. Allocate memory for a fixed split budget
66
Algorithm
1. Allocate memory for a fixed split budget
2. Calculate a priority value for each triangle
67
Algorithm
1. Allocate memory for a fixed split budget
2. Calculate a priority value for each triangle
3. Distribute the split budget among triangles
Proportional to their priority values
68
Algorithm
1. Allocate memory for a fixed split budget
2. Calculate a priority value for each triangle
3. Distribute the split budget among triangles
Proportional to their priority values
4. Split each triangle recursively
Distribute remaining splits according to the size of the resulting AABBs
69
Split priority
70
𝑝𝑟𝑖𝑜𝑟𝑖𝑡𝑦 = 2(−𝑙𝑒𝑣𝑒𝑙) ∙ 𝐴𝑎𝑎𝑏𝑏 − 𝐴𝑖𝑑𝑒𝑎𝑙 1 3
Crosses an importantspatial median plane?
Has large potential forreducing surface area?
Concentrate on triangleswhere both apply
…but leavesomething forthe rest, too
ResultsCompare against 4 CPU and 3 GPU builders
4-core i7 930, NVIDIA GTX Titan
Average of 20 test scenes, multiple viewpoints
71
Ray tracing performance
72
SweepSAH[MacDonald]
SBVH[Stich]
Treerotations[Kensler]
Iterativereinsertion
[Bittner]
0%
20%
40%
60%
80%
100%
120%
140%
SweepSAH = 100%
High-quality CPU builders
SBVH = 131%
0%
20%
40%
60%
80%
100%
120%
140%
Ray tracing performance
73
SweepSAH[MacDonald]
SBVH[Stich]
Treerotations[Kensler]
Iterativereinsertion
[Bittner]
LBVH[Karras]
HLBVH[Garanzha]
GridSAH[Garanzha]
Fast GPU builders
67% – 69%
SBVH = 131%
Almost 2×
Ray tracing performance
74
SweepSAH[MacDonald]
SBVH[Stich]
Treerotations[Kensler]
Iterativereinsertion
[Bittner]
TRBVH TRBVH+30% split
LBVH[Karras]
HLBVH[Garanzha]
GridSAH[Garanzha]
0%
20%
40%
60%
80%
100%
120%
140%
No splits
30% splits
Our method
Ray tracing performance
75
SweepSAH[MacDonald]
SBVH[Stich]
Treerotations[Kensler]
Iterativereinsertion
[Bittner]
TRBVH TRBVH+30% split
LBVH[Karras]
HLBVH[Garanzha]
GridSAH[Garanzha]
0%
20%
40%
60%
80%
100%
120%
140%
96% of SweepSAH
91% of SBVH
Effective performance
76
0%
20%
40%
60%
80%
100%
120%
140%
1M 10M 100M 1G 10G 100G 1T
𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒
Tree rotations[Kensler]
Iterative reinsertion[Bittner]
Not Pareto-optimal
SBVH[Stich]
SweepSAH[MacDonald]
Effective performance
77
0%
20%
40%
60%
80%
100%
120%
140%
1M 10M 100M 1G 10G 100G 1T
𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒
SBVH[Stich]
SweepSAH[MacDonald]
Effective performance
78
0%
20%
40%
60%
80%
100%
120%
140%
1M 10M 100M 1G 10G 100G 1T
𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒
HLBVH[Garanzha] GridSAH
[Garanzha]
SBVH[Stich]
SweepSAH[MacDonald]
LBVH[Karras]
Not Pareto-optimal
Effective performance
79
0%
20%
40%
60%
80%
100%
120%
140%
1M 10M 100M 1G 10G 100G 1T
𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒
SBVH[Stich]
SweepSAH[MacDonald]
LBVH[Karras]
Effective performance
80
0%
20%
40%
60%
80%
100%
120%
140%
1M 10M 100M 1G 10G 100G 1T
𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒
SBVH[Stich]
Our method(no splits)
SweepSAH[MacDonald]
LBVH[Karras]
Our method(30% splits)
Effective performance
81
0%
20%
40%
60%
80%
100%
120%
140%
1M 10M 100M 1G 10G 100G 1T
𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒
7M–60G rays/frame→ our method is the best choice
Below 7M→ LBVH
Above 60G→ SBVH
Conclusion
General framework for optimizing trees
Inherently parallel
Approximate restructuring → larger treelets?
Practical GPU-based BVH builder
Best choice in a large class of applications
Adjustable quality–speed tradeoff
Will be integrated into NVIDIA OptiX
82
83
Thank you
Acknowledgements
Samuli Laine
Jaakko Lehtinen
Sami Liedes
David McAllister
Anonymous reviewers
Anat Grynberg and Greg Ward for CONFERENCE
University of Utah for FAIRY
Marko Dabrovic for SIBENIK
Ryan Vance for BUBS
Samuli Laine for HAIRBALL and VEGETATION
Guillermo Leal Laguno for SANMIGUEL
Jonathan Good for ARABIC, BABYLONIAN and ITALIAN
Stanford Computer Graphics Laboratory for ARMADILLO, BUDDHA and DRAGON
Cornell University for BAR
Georgia Institute of Technology for BLADE