66
A Real-time Micropolygon Rendering Pipeline Kayvon Fatahalian Stanford University

A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Embed Size (px)

Citation preview

Page 1: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

A Real-time Micropolygon Rendering Pipeline

Kayvon Fatahalian Stanford University

Page 2: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Detailed surfaces

Credit: DreamWorks Pictures, Shrek 2 (2004)

Page 3: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Credit: Pixar Animation Studios, Toy Story 2 (1999)

Page 4: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Credit: Crytek

Page 5: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away
Page 6: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Zoom in (pixels)

(one pixel)

Page 7: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Micropolygons

(one pixel)

Page 8: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Motion blur Camera defocus

Page 9: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Highly detailed surfaces: micropolygons (But interoperate with large triangle rendering in same frame)

Accurate camera defocus and motion blur

[Near future] real-time system

Rendering goals

Page 10: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

It is inefficient to render micropolygons using the current graphics pipeline.

Page 11: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

How does the real-time graphics pipeline evolve to enable efficient micropolygon rendering?

How should surfaces be tessellated into micropolygons? How can micropolygons be rasterized efficiently?

How expensive is motion blur and defocus? Should the pipeline shade like GPUs or like REYES?

(changes to algorithms, abstractions, and HW are fair game)

Page 12: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

TESSELLATION: How do we add adaptive tessellation to

the pipeline?

Page 13: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Base primitives

Generate micropolygons on demand

Smooth tessellation Final displaced surface

Page 14: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

D3D tessellates parametric surfaces

Frame buffer ops

Coarse Vertex Processing

Rasterize+Cull

Fragment Processing

Fixed function =

Primitive Processing

Tessellate

Fine Vertex Processing Compute position of new triangle verts

Compute tessellation factor

New in D3D11 Generate new triangles

Page 15: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Need adaptive tessellation   Uniform tessellation per patch over/under tessellates

Undertessellation

Overtessellation

Page 16: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Performing good adaptive tessellation is very important.

Complexity pays for itself. Less surface evaluation

Less shading Less (and more efficient) rasterization

Page 17: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

vu

6x63x2

3x4

Split-dice adaptive tessellation

Split: recursively divide primitive (adaptive) Dice: simple uniform tessellation (fast)

Split Split (again) Set Rates Dice

Base Patch Micropolygon Mesh

Page 18: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

D3D11 tessellation

 Don’t split, just dice – Tessellation factors for primitive edges – “Fancy” dicer stitches edges to uniform interior

REYES dicing D3D11 dicing

7

4 3

2

7

4

Page 19: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Parallel, adaptive tessellation must avoid cracks

Page 20: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Offline techniques suboptimal for real-time

 Retain the entire micropolygon mesh and “stitch” adjacent subpatches – Very robust – Hard to parallelize (can’t stream)

 Use only power-of-two dicing factors – Easier to parallelizable – T-junctions – Over tessellates

Page 21: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Split-dice for a real-time pipeline

 DiagSplit: simple solution – Adaptive tessellation – Independent subpatch processing – Crack-free – No power-of-two constraints

Fisher et al. [Siggraph Asia 2009]

Page 22: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

DiagSplit: idea 1   Leverage D3D11 tessellator as advanced dicer – Edge-based tessellation factor function: h(edge) – Let h evaluate displacement

h=5

h=6

h=3 h=4

Dicable: send to Dicer

h=SPLIT

h=SPLIT

h=3 h=4

Evaluate h on subpatches

Page 23: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

DiagSplit: idea 2  Scale area of internal polygons to hit target

area precisely

Can scale tessellation of interior region without creating cracks

Page 24: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

DiagSplit: idea 3

Non-isoparametric patch split

 Permit “diagonal” (non-isoparametric) splits

Page 25: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

DiagSplit Direct3D 11 Power-of -Two Split-Dice + Stitching

Page 26: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

What is the cost of splitting?

 Most time is spent evaluating final surface positions

  Then you’ve got to rasterize and shade   Avoiding over tessellation pays for itself

  10 million MP/sec on one core of Intel I7

Page 27: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Parallel tessellation is an active area   Expressibility of parametric surfaces is increasing –  Approximate Catmull-Clark surfaces [Loop (TOG 2008)] –  Extension to creases and corners [Kovacs (I3D 2009)]

  Variety of implementations –  Depth-first (cache efficient, tiny footprint) –  Breath-first subdivision (fits current GPU compute model well) – Patney (SIGGRAPH Asia 2008), Patney (HPG 2009) – Eisenacher (I3D 2009) – RenderAnts

–  CUDATess [Schwarz (Eurographics 2009)]

Page 28: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Tessellate (Dice)

Fine Vertex Processing

D3D11 Pipeline

Fixed function =

Split Tessellation Rate Func

Tessellate (Dice)

Fine Vertex Processing

Micropolygon Pipeline

Programmable =

Coarse Vertex Processing

Primitive Processing

Coarse Vertex Processing

Primitive Processing

Bounding Box Func

Page 29: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

RASTERIZATION: How efficiently can micropolygons be

rasterized? (and what’s the cost?)

Page 30: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Rasterization

Page 31: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Step 1: per-polygon preprocessing (setup)

  Clip, back face cull, compute edge equations

 Make point-in-polygon tests cheap

Page 32: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Step 2: compute candidate sample set

 Coarse reject/accept of samples

Coarse uniform grid

Page 33: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Step 2: compute candidate sample set

 Coarse reject/accept of samples

Hierarchical descent Coarse uniform grid

Page 34: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Step 2: compute candidate sample set

 Coarse reject/accept of samples

Hierarchical descent Coarse uniform grid

Page 35: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Step 2: compute candidate sample set

 Coarse reject/accept of samples

Hierarchical descent Coarse uniform grid

Page 36: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Step 3: point-in-polygon tests

 Test “stamp” of samples against polygon simultaneously (data-parallel)

[Pineda 88] [Fuchs 89] [Greene 96] [Seiler 08]

Page 37: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Micropolygons: more polygons = more setup

Page 38: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Micropolygons: coarse reject not useful

Page 39: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Micropolygons: large stamps yield low efficiency

6% of tested samples inside triangle

47% of tested samples inside triangle

Page 40: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Re-implement rasterizer

 Rasterize many micropolygons simultaneously

Exec 0

Exec 1

Exec 2

Exec 3

Exec 4

Exec 5

Exec 6

Exec 7

Input micropolygons

Output fragments

(Fatahalian et al. HPG 2009)

Page 41: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Reimplement rast to increase efficiency

10

20

30

4x multi-sampling

20%

6%

Conventional rasterizer: 16 sample (4x4) stamp

Sam

ple t

est e

fficie

ncy (

%)

MICROPOLYGON-PARALLEL

16x multi-sampling No multi-sampling

28%

12%

2%

11%

(2.5 to 6x more efficient)

Page 42: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Primary visibility computation: 1080p resolution, 30 Hz

4x multi-sampling Simple scene (10 M micropolygons)

Estimated cost of GPU SW implementation: Approximately 1/3 of high-end GPU

Micropolygon rasterization is expensive

Fixed-function micropolygon rasterization is appealing

Page 43: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

MOTION BLUR AND DEFOCUS: How expensive are motion blur/defocus?

Page 44: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Motion blur and defocus

 Many 2D-techniques for approximating blur [Sung 02] [Demers 04]

 Stochastic point sampling [Cook 84, Cook 86] [Akenine-Moller 07]

Page 45: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Moving micropolygon

x

y

t

Page 46: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Soccer jump

64 unique time samples 16x multi-sampling

Page 47: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Enabling motion/defocus blur costs 3 to 7x more

 Point-in-polygon tests are more expensive  Algorithms that support blur perform more

point-in-polygon tests

Stationary geometry, perfect focus:

Tests 2.5x more samples Cost 3x more

Page 48: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Performance varies with motion Co

st (o

ps)

Page 49: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Most efficient algorithm depends on scene conditions Co

st (o

ps)

Page 50: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

[Pixar method] costs increase sharply with defocus Co

st (o

ps)

Page 51: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Tessellate (Dice)

Rasterize (5D)

D3D11 Pipeline

Split Tessellation Rate Func

Tessellate (Dice)

Rasterize+Cull

Micropolygon Pipeline

Coarse Vertex Processing

Primitive Processing

Coarse Vertex Processing

Primitive Processing

Fine Vertex Processing Fine Vertex Processing

Bounding Box Func

shutter duration, lens parameters

open and close shutter vertex positions

Fixed function =

Programmable =

Unchanged =

Page 52: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

SHADING: Should a micropolygon pipeline shade

like today’s GPUs or like REYES?

Page 53: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Tessellate (Dice)

Rasterize (5D)

D3D11 Pipeline

Fixed function =

Split Tessellation Rate Func

Cull+Shade?

Cull+Shade?

Tessellate (Dice)

Rasterize+Cull

Fragment Processing

Micropolygon Pipeline

Programmable =

Coarse Vertex Processing

Primitive Processing

Coarse Vertex Processing

Primitive Processing

Fine Vertex Processing Fine Vertex Processing

Frame buffer ops Frame buffer ops

Bounding Box Func

Unchanged =

Page 54: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

GPUs shade micropolygons inefficiently

Rasterize

micropolygon input

Four shading samples output

At least 4x overshade! (must share shading results across micropolygons)

Page 55: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

MSAA implementations break down with blur

Rasterize

Motion-blurred polygon input

Covered samples are spread widely across screen

(inefficient to represent as hit mask)

Page 56: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Future fragment shading?

Rasterize

Frame Buffer Ops

Fragment processing Shading Result Cache Shading requests

Shaded samples

Visibility sample hits

Fragments

Decoupled sampling: Ragan-Kelley et al.

Page 57: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

REYES-style shading

 Perform shading computations at vertices  Requires connectivity for derivatives – Not a problem: system has the diced mesh

 Decoupled shading for motion blur and defocus “just works”

Page 58: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Overshade in a “REYES” system

 Overtessellation  Conservative, coarse occlusion culling

Page 59: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Shading system design space Sample in screen space

Sample in object space

Shade only what’s visible

Fine early Z Deferred shading

Shade before exact visibility known

Current GPU fragment shading

REYES: shade at diced grid vertices

Sweet spot?

Page 60: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

SUMMARY

Page 61: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Tessellate (Dice)

Rasterize (5D)

Micropolygon Pipeline

Coarse Vertex Processing

Primitive Processing

Fine Vertex Processing

Frame buffer ops

Fine Vertex Shading

Future Fragment Proc.

Good adaptive tessellation Is very important

DiagSplit: Build split on top of D3D11 Tess

Abstract SPLIT as non-programmable stage

Split Tessellation Rate Func

Bounding Box Func

Page 62: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Tessellate (Dice)

Rasterize (5D)

Micropolygon Pipeline

Coarse Vertex Processing

Primitive Processing

Fine Vertex Evaluation

Frame buffer ops

Fine Vertex Shading

Future Fragment Proc.

(requires mesh connectivity)

Shading on surface: Need derivatives & good tessellation

We’ve got the diced grid for connectivity

Split Tessellation Rate Func

Bounding Box Func

Page 63: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Tessellate (Dice)

Rasterize (5D)

Micropolygon Pipeline

Coarse Vertex Processing

Primitive Processing

Fine Vertex Processing

Frame buffer ops

shutter duration, lens parameters

Fine Vertex Shading

Future Fragment Proc.

Rasterization: re-implement for micropolygons

Fixed-function HW likely

Point sampling for motion blur and defocus are 3-7x more costly

Split Tessellation Rate Func

Bounding Box Func

Page 64: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Tessellate (Dice)

Rasterize (5D)

Micropolygon Pipeline

Coarse Vertex Processing

Primitive Processing

Fine Vertex Processing

Frame buffer ops

Fine Vertex Shading

Future Fragment Proc.

Must share shading across micropolygons to avoid overcompute

(quad per MP is inefficient)

Motion blur/defocus change fragment shading (even without micropolygons)

Split Tessellation Rate Func

Bounding Box Func

Page 65: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

It’s coming!

 We’ll have the flops from future HW  Lot’s of active research on algorithms

 Real-time micropolygon rendering will be possible in the future

Page 66: A Real-Time Micropolygon Rendering Pipeline Is Not Far Away

Thank you