78
Secrets of Parthenon Renderer Toshiya Hachisuka The University of Tokyo

Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Secrets of Parthenon Renderer

Toshiya Hachisuka!

The University of Tokyo

Page 2: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Disclaimer

Not all of my observations are fully validated by scientific experiments, though they are based on my experience.

!

Take them with a grain of salt!

Many images are removed from the original slides due to copyrights.

Page 3: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

15 years ago...

Page 4: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

GPUs in 2002 ⇒ 2017

• More complex operations (64 inst. ⇒ 64K inst.)

• Faster computation (30G FLOPS ⇒ 3T FLOPS)

Page 5: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

What is “Parthenon Renderer”?

• CPU/GPU combined offline rendering system

• Released in 2002 (= the rise of the GPGPU era)

• Publicly and commercially available back then

Page 6: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

What is “Parthenon Renderer”?

Pentium4 2.7 GHz & Radeon 9700 Pro

Page 7: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

What is “Parthenon Renderer”?

Page 8: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Why now?

!

!

• Examples of how techniques become (non) obsolete

• High-ends in 2002 are low-ends in 2017

• Hopefully useful to predict the future

Page 9: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

System Overview

Page 10: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

How Parthenon Works

!

!

• Photon mapping + Final Gathering

• Mapping computation to rasterization units

• Asynchronous computation with CPU and GPU

Page 11: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

How Parthenon Works

!

!

• Photon mapping + Final Gathering

• Mapping computation to rasterization units

• Asynchronous computation with CPU and GPU

Page 12: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x
Page 13: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Photon Mapping

Page 14: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Photon Mapping

Page 15: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Photon Mapping

Page 16: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Photon Mapping

Page 17: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Photon Mapping

Page 18: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Photon Mapping + Final Gathering

• “Clean” the rough solution

Photon Mapping Photon Mapping + FG

Page 19: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Photon Mapping + Final Gathering

Page 20: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Photon Mapping + Final Gathering

Page 21: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Photon Mapping + Final Gathering

Page 22: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Observations

!

• Algorithmic complexity

• Computation cost

Page 23: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Observations - Complexity

!

• Final gathering is a simple process

• Sample rays over the hemisphere

• Photon mapping is a complex process

• Sampling light sources and BRDFs, kNN search

Page 24: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Observations - Cost

• Photon mapping is cheap, but final gathering is not

Scene 1

Scene 2

Scene 3

Number of Rays

0 32000000 64000000 96000000 128000000 160000000

Photon Mapping Ray Tracing Final Gathering

Page 25: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Main Idea

!

• CPU (in 2002)

• Good at complex tasks but slow

!

• GPU (in 2002)

• Good at simple tasks but fast

Page 26: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Main Idea

!

• CPU (in 2002)

• Photon mapping and ray tracing

!

• GPU (in 2002)

• Final gathering

Page 27: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Does this make sense today?

• GPU ray tracing is practical today

• Should be able to do everything on GPU today

• Only if you have a good GPU

Page 28: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Solution for Low-end GPUs

• Not everyone has high-end GPUs

• GeForce GTX 580 ≈ 1.5T FLOPS

• GeForce GT 520 ≈ 150G FLOPS

Page 29: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Solution for Low-end GPUs

• Not everyone has high-end GPUs

• Radeon 9800 XT ≈ 50G FLOPS (in 2002)

• GeForce GT 520 ≈ 150G FLOPS

Page 30: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Solution for Low-end GPUs

• Some rough estimates

• 100M rays/sec on GPU 10M rays/sec on a single CPU core

• 1.5T FLOPS (10x faster than a single CPU core) 150G FLOPS (as fast as a single CPU core)

Page 31: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Solution for Low-end GPUs

Low-end GPUs in 2017 ≈

High-end GPUs in 2002 ≈

Single core of CPUs in 2017

Page 32: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

How Parthenon Works

!

!

• Photon mapping + Final Gathering

• Mapping computation to rasterization units

• Asynchronous computation with CPU and GPU

Page 33: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Precomputation

• Store the result of photon mapping into a mesh

• Similar to light maps computation

• Directional info encoded by SH coefficients

Page 34: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x
Page 35: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x
Page 36: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Grouping by Position

Page 37: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Grouping by Position

Page 38: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Grouping by Position

Page 39: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Grouping by Position

Page 40: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Grouping by Position

Page 41: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Grouping by Position

• Map the FG process on rendering cube maps

Page 42: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Grouping by Position

• Map the FG process on rendering cube maps

Face 2

Face 1 Face 3

Page 43: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Grouping by Position

!

• Too many rasterizations of the scene

• Number of final gathering points = Number of pixels = Number of rasterization passes= O(1M)

!

• Recent research use this with many approximations

Page 44: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x
Page 45: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Grouping by Direction

Page 46: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Grouping by Direction

Page 47: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Grouping by Direction

!

!

• Ray bundle [Szirmay-Kalos and Purgathofer 1998]

• Can be mapped into a parallel projection

Page 48: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x
Page 49: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x
Page 50: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Parallel Projection

Page 51: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Parallel Projection

Page 52: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Parallel Projection

Page 53: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Grouping by Direction

• Significantly fewer rasterizations of the scene

• Number of final gathering directions= Number of final gathering samples = Number of rasterization passes= O(100) << O(1M)

!

• More details in GPU Gems 2

Page 54: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Performance in 2002

Grouping by Position

Grouping by Direction

CPU ray tracing

GPU ray tracing

Number of Samples per second [K samples / sec]

0 1750 3500 5250 7000

Page 55: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Performance in 2017

Grouping by Position

Grouping by Direction

CPU ray tracing

GPU ray tracing

Number of Samples per second [M samples / sec]

0 30 60 90 120

Page 56: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Performance in 2017

Grouping by Position

Grouping by Direction

CPU ray tracing

GPU ray tracing

Number of Samples per second [M samples / sec]

0 30 60 90 120

With Approximations

Optimized

Static Scenes

Page 57: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Does this make sense today?

Grouping by Position

Grouping by Direction

CPU ray tracing

GPU ray tracing

Number of Samples per second [M samples / sec]

0 30 60 90 120

Page 58: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

How Parthenon Works

!

!

• Photon mapping + Final Gathering

• Mapping computation to rasterization units

• Asynchronous computation with CPU and GPU

Page 59: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Main Idea

!

• CPU

• Photon mapping and ray tracing

!

• GPU

• Final gathering

Page 60: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Main Idea

!

• CPU

• Photon mapping and ray tracing

!

• GPU

• Final gathering

Do both at the same time

Page 61: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Asynchronous Computation

CPU

GPU

Page 62: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Asynchronous Computation

CPU

GPU

Photon Mapping

Page 63: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Asynchronous Computation

CPU

GPU

Photon Mapping Ray tracing

Page 64: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Asynchronous Computation

CPU

GPU

Photon Mapping Ray tracing

FG

Page 65: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Asynchronous Computation

CPU

GPU

Photon Mapping Ray tracing

FGFG

Page 66: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Asynchronous Computation

CPU

GPU

Photon Mapping Ray tracing

FG FG FG FG FGFG

Page 67: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Asynchronous Computation

CPU

GPU

Photon Mapping Ray tracing

FG FG FG FG FG

Sync

FG

Page 68: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Asynchronous Computation

CPU

GPU

Photon Mapping Ray tracing Ray tracing

FG FG FG FG FG

Sync

FG FG FG FGFG

Page 69: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Asynchronous Computation

CPU

GPU

Photon Mapping Ray tracing Ray tracing

FG FG FG FG FG

Sync

FG FG FG FGFG

Sync

Page 70: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Asynchronous Computation

CPU

GPU

Photon Mapping Ray tracing Ray tracing

FG FG FG FG FG

Sync

FG FG FG FGFG

Sync

Ray tracing

FGFG

Page 71: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Does this make sense today?

!

• Final gathering typically needs far more samples

• 40 FG samples ≈ 1 RT sample (in 2002)

Page 72: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Does this make sense today?

!

• Final gathering typically needs far more samples

• 40 FG samples ≈ 1 RT sample (in 2002)

• 400 FG samples ≈ 1 RT sample (in 2017)

Photon Mapping Ray tracing Ray tracingSync

FG

FG

FG

FG

FG

FG

FG

FG

FG

FG

FG

FG

FG

FG

FG

FG

FG

FG

FG

FG

FG

FG

FG

FG

FG

FG

FG

FG

FG

FG

FG

FG

FG

FG

FG

FG

FG

FG

FG

Page 73: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Some Other Details

• Utilizes shadow mapping units

• Direct illumination and 1st photon trace

• Fake caustics

• Über shader (i.e., single shader handles all materials)

• No choice in 2002

• Still compromised choice today for some systems

Page 74: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Closing Remarks

Page 75: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

In retrospect ...

• Testing for many GPUs was painful

• Parthenon runs on both Radeon and GeForce

• Only solution for testing is to actually run

• Checking specs do not help in the end, you know

• Still true today

• Worse in my opinion since GPUs are everywhere

Page 76: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

In retrospect ...

• Heterogeneous computation was painful

• Power balance of CPU and GPU has changed a lot

• Managing duplicated codes for CPU and GPU

• Maybe still true today

• OpenCL can be a solution if it works as designed

Page 77: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

In retrospect ...

• Going to the right direction of GPU rendering

• but too early - users were not ready

• and too immature - technology was not there

!

• Still somewhat true today, but much better

• People recognized well what GPUs can do

• Virtually anything on CPUs can be done on GPUs

Page 78: Secrets of Parthenon Rendererhachisuka/tdf2017.pdfSolution for Low-end GPUs • Some rough estimates • 100M rays/sec on GPU 10M rays/sec on a single CPU core • 1.5T FLOPS (10x

Summary

• Parthenon Renderer

• One of the first GPU rendering systems

• Many choices are out-of-date, but not all of them

• Some remarks

• Heterogeneous computing might not be a good idea

• Supporting different GPUs can still be painful

• Old techniques for high-ends can be useful and practical for low-ends now