Many-Core Programming with GRAMPS & “Real Time REYES” Jeremy Sugerman, Kayvon Fatahalian Stanford University June 12, 2008


Many-Core Programming with GRAMPS & “Real Time REYES”

Jeremy Sugerman, Kayvon Fatahalian
Stanford University

June 12, 2008


Background, Outline
Stanford Graphics / Architecture Research
CPU, GPU trends
– And collision?

Two research areas:
– HW/SW Interface, Programming Model
– Future Graphics API


Problem Statement
Drive efficient development and execution in many-/multi-core systems.
Support homogeneous, heterogeneous cores.
Inform future hardware.

Status Quo:
– GPU Pipeline (Good for GL, otherwise hard)
– CPU (No guidance, fast is hard)


GRAMPS
Software defined graphs
Producer-consumer, data-parallelism
Initial focus on rendering

[Figure: two example GRAMPS graphs.
Rasterization Pipeline: Rasterize → Input Fragment Queue → Shade → Output Fragment Queue → FB Blend.
Ray Tracing Pipeline: Camera → Ray Queue → Intersect → Ray Hit Queue → Shade → Fragment Queue → FB Blend.
Legend: Thread Stage, Shader Stage, Fixed-func Stage; Queue, Stage Output.]


As a GPU Evolution
Not (too) radical for ‘graphics’
Like fixed → programmable shading
– Pipeline undergoing massive shake up
– Diversity of new parameters and use cases

Bigger picture than ‘graphics’
– Rendering is more than GL/D3D
– Compute is more than rendering
– Larrabee has no innate pipeline


As a Compute Evolution
Sounds like streaming: execution graphs, kernels, data-parallelism

Streaming: “squeeze out every FLOP”
– Goals: bulk transfer, arithmetic intensity
– Intensive static analysis, custom chips (mostly)
– Bounded space, data access, execution time

GRAMPS: “interesting apps are irregular”
– Goals: Dynamic, data-dependent code
– Aggregate work at run-time
– Heterogeneous commodity platforms
– Naturally supports streaming when applicable


GRAMPS’ Role
A ‘graphics pipeline’ is now an app!
GRAMPS models parallel state machines.

Compared to status quo:
– More flexible than a GPU pipeline
– More guidance than bare metal
– Portability in between
– Not domain specific


GRAMPS Interfaces

Host/Setup: Create execution graph

Thread: Stateful, singleton

Shader: Data-parallel, auto-instanced
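The three interfaces above can be illustrated with a toy model. This is only a sketch, not the GRAMPS API: the class names `ThreadStage` and `ShaderStage`, the callbacks `emit_work` and `accumulate`, and the use of `deque` as a queue are all assumptions made for illustration. The host code builds the graph, a thread stage keeps private state in a single instance, and a shader stage applies a stateless kernel once per queue element.

```python
from collections import deque


class ThreadStage:
    """Stateful singleton: exactly one instance, with state that persists."""

    def __init__(self, name, inputs, outputs, body):
        self.name = name
        self.inputs = inputs      # list of input queues
        self.outputs = outputs    # list of output queues
        self.body = body          # callable(state, inputs, outputs)
        self.state = {}

    def run(self):
        self.body(self.state, self.inputs, self.outputs)


class ShaderStage:
    """Data-parallel: the kernel is instanced once per input element."""

    def __init__(self, name, inq, outq, kernel):
        self.name = name
        self.inq = inq
        self.outq = outq
        self.kernel = kernel      # pure function of one element

    def run(self):
        # Elements are independent, so a real runtime could process them in parallel.
        while self.inq:
            self.outq.append(self.kernel(self.inq.popleft()))


def emit_work(state, inputs, outputs):
    # Hypothetical producer: pushes four work items.
    for i in range(4):
        outputs[0].append(i)


def accumulate(state, inputs, outputs):
    # Hypothetical consumer: folds results into persistent stage state.
    while inputs[0]:
        state["total"] = state.get("total", 0) + inputs[0].popleft()


# Host/setup: create the queues and stages, then wire them into a graph.
work_queue, result_queue = deque(), deque()
source = ThreadStage("source", [], [work_queue], emit_work)
square = ShaderStage("square", work_queue, result_queue, lambda x: x * x)
sink = ThreadStage("sink", [result_queue], [], accumulate)

# Naive schedule (an assumption): run each stage to completion in graph order.
for stage in (source, square, sink):
    stage.run()

print(sink.state["total"])  # sum of the squares of 0..3
```

A real scheduler would interleave stages and bound queue sizes; the in-order loop here only keeps the sketch short.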


What We’ve Built (System)


GRAMPS Scheduler

Tiered Scheduler

‘Fat’ cores: per-thread, per-core

‘Micro’ cores: shared hw scheduler

Top level: tier N


What We’ve Built (Apps)

[Figure: two GRAMPS execution graphs.
Direct3D Pipeline (with Ray-tracing Extension): stages IA 1 … IA N, VS 1 … VS N, RO, Rast, PS, OM; extension stages Trace and PS2; queues Input Vertex Queue 1 … N, Primitive Queue 1 … N, Primitive Queue, Sample Queue Set, Fragment Queue, Ray Queue, Ray Hit Queue.
Ray-tracing Pipeline: stages Tiler, Sampler, Camera, Intersect, Shade, FB Blend; queues Tile Queue, Sample Queue, Ray Queue, Ray Hit Queue, Fragment Queue.
Legend: Thread Stage, Shader Stage, Fixed-func; Queue, Stage Output, Push Output.]


Initial Results

Queues are small, utilization is good


GRAMPS Visualization



GRAMPS Portability
Portability really means performance.

Less portable than GL/D3D
– GRAMPS graph is hardware sensitive

More portable than bare metal
– Enforces modularity
– Best case, just works
– Worst case, saves boilerplate


High-level Challenges
Is GRAMPS a suitable GPU evolution?
– Enable pipeline competitive with bare metal?
– Enable innovation: advanced / alternative methods?

Is GRAMPS a good parallel compute model?
– Map well to hardware, hardware trends?
– Support important apps?
– Concepts influence developers?


What’s Next for GRAMPS?
Implementation: scheduling, simulation details

Model:
– Graph modification (state change)
– Blocking calls (join)
– Intra/inter-stage synchronization primitives
– Data sharing / ref-counting

Workloads: REYES, physics, others?

Develop new graphics pipelines…


“Real-Time REYES”



Just Build It

Build a real-time REYES pipeline...

… that is tightly integrated with ray tracing for global effects.


What does real-time REYES mean? (to us)

Smooth surfaces via adaptive tessellation
– Everything is a displaced subdivision surface

Shade on surface, prior to rasterization

Stochastic rasterization for motion blur and DOF

Order-independent transparency
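The stochastic rasterization item above can be illustrated for the motion-blur case. The following is a minimal sketch under stated assumptions, not the authors' rasterizer: every function name is hypothetical, motion is assumed linear between the shutter-open and shutter-close vertex positions, and each sample draws its own random time. Depth of field would additionally jitter a lens position, which is omitted here.

```python
import random


def edge(a, b, p):
    """Signed area test: which side of edge a->b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def inside(tri, p):
    """Point-in-triangle test that accepts either winding order."""
    w = [edge(tri[i], tri[(i + 1) % 3], p) for i in range(3)]
    return all(x >= 0 for x in w) or all(x <= 0 for x in w)


def triangle_at(tri0, tri1, t):
    """Linearly interpolate vertex positions between shutter open and close."""
    return [(a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
            for a, b in zip(tri0, tri1)]


def coverage(tri0, tri1, p, samples=256, rng=None):
    """Fraction of random shutter times at which the moving triangle covers p."""
    rng = rng or random.Random(0)  # fixed seed only to make the sketch repeatable
    hits = sum(inside(triangle_at(tri0, tri1, rng.random()), p)
               for _ in range(samples))
    return hits / samples


tri_open = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
tri_close = [(4.0, 0.0), (6.0, 0.0), (4.0, 2.0)]  # same triangle moved 4 units in x

static_cov = coverage(tri_open, tri_open, (0.5, 0.5))
moving_cov = coverage(tri_open, tri_close, (2.5, 0.5))
print(static_cov, moving_cov)
```

A static triangle fully covers an interior point, while the moving one covers the sample point only for part of the shutter interval, which is what produces a partially transparent blur streak.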


[Figure: pipeline stages compared side by side.
REYES: Split, Dice, Shade, Rasterize, Z Test, Blend/Resolve, with additional Displace and Early Z stages.
OpenGL/Direct3D: Tessellate (xbox), Vertex Shade, Rasterize, Early Z, Frag Shade, Z Test, Blend/Resolve.]


REYES Tessellation

Split primitive into smaller primitives until a “GOOD” grid can be created.
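The split loop described above can be sketched as follows. This is an illustrative model only: `Patch`, `can_dice`, and the 16-pixel bound are assumptions, and a primitive is reduced to its screen-space extent in pixels. A real implementation would split actual surface patches and apply the full grid criteria.

```python
from dataclasses import dataclass


@dataclass
class Patch:
    """Hypothetical primitive, described only by its screen-space size in pixels."""
    width: float
    height: float


def can_dice(patch, max_grid_dim=16):
    # Assumed test: a max_grid_dim x max_grid_dim grid would give
    # polygons of about one pixel or smaller.
    return patch.width <= max_grid_dim and patch.height <= max_grid_dim


def split(patch):
    """Halve the patch along its longer axis, returning two children."""
    if patch.width >= patch.height:
        w, h = patch.width / 2, patch.height
    else:
        w, h = patch.width, patch.height / 2
    return [Patch(w, h), Patch(w, h)]


def split_until_diceable(patch):
    """Keep splitting until every piece can be turned into a grid."""
    pending, ready = [patch], []
    while pending:
        p = pending.pop()
        if can_dice(p):
            ready.append(p)
        else:
            pending.extend(split(p))
    return ready


ready = split_until_diceable(Patch(64.0, 32.0))
print(len(ready), ready[0])
```

The amount of work depends on the input and is discovered only while running, which is why splitting is irregular and awkward to parallelize.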


Grids

GOOD GRID =
– Max polygon area < 1 pixel
– All polys about the same size
– Bounded # polys per grid

Regular parametric sampling of primitive surface (like XBox360).

Compact representation for many adjacent polygons.

Grids provide SIMD efficiency and bulk processing benefits.
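The grid criteria and the regular parametric sampling above can be written down in a few lines. This is a sketch with assumed thresholds: `max_ratio` (how far polygon sizes may differ) and `max_polys` (the per-grid bound) are hypothetical values, since the slides give only the one-pixel area limit.

```python
def dice(nu, nv):
    """Regular parametric sampling: (nu + 1) x (nv + 1) vertices in [0, 1]^2."""
    return [(i / nu, j / nv) for j in range(nv + 1) for i in range(nu + 1)]


def is_good_grid(polygon_areas, max_area=1.0, max_ratio=4.0, max_polys=256):
    """Check the three GOOD GRID conditions on screen-space polygon areas (pixels)."""
    if not polygon_areas or len(polygon_areas) > max_polys:
        return False                      # bounded number of polygons per grid
    largest, smallest = max(polygon_areas), min(polygon_areas)
    if largest >= max_area:
        return False                      # max polygon area below one pixel
    # "About the same size" is modeled as a bounded largest/smallest ratio.
    return smallest > 0 and largest / smallest <= max_ratio


samples = dice(4, 2)
print(len(samples))                       # vertices shared by 4 x 2 quads
print(is_good_grid([0.5] * 64))
print(is_good_grid([0.5, 1.5]))
```

The vertex list shows why a grid is compact: adjacent polygons share vertices, so a 4 x 2 grid of quads needs far fewer vertices than eight independent quads would.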


[Figure: pipeline stages compared side by side.
REYES: Split, Dice, Shade, Rast/Crack Fix, Z Test, Blend/Resolve, with additional Displace and Early Z stages.
OpenGL/Direct3D: Tessellate (xbox), Vertex Shade, Rast, Early Z, Frag Shade, Z Test, Blend/Resolve.]


What does real-time REYES mean? (to us)

Smooth surfaces via adaptive tessellation
– Splitting is irregular (and serial)
– Crack fixing

Shade on surface, prior to rasterization
– We feel confident about this
– But most “work” done before moving to raster space… hmm

Stochastic rasterization for motion blur and DOF
– Many tiny polygons parallel rasterization
– SIMD tricky

Order-independent transparency
– Not unique to REYES
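Order-independent transparency, the last item above, means the final pixel color must not depend on the order in which fragments arrive. One common way to get that property, shown here as a sketch and not as the method used in this project, is to buffer all fragments of a pixel and sort them by depth at resolve time before blending back to front. The fragment layout `(depth, rgb, alpha)` and the function name are assumptions.

```python
def resolve_pixel(fragments, background=(0.0, 0.0, 0.0)):
    """Blend buffered fragments back to front with the 'over' operator.

    fragments: list of (depth, (r, g, b), alpha) in any submission order,
    where a larger depth means farther from the camera.
    """
    color = background
    for _depth, rgb, alpha in sorted(fragments, key=lambda f: f[0], reverse=True):
        color = tuple(alpha * c + (1.0 - alpha) * b for c, b in zip(rgb, color))
    return color


near_red = (1.0, (1.0, 0.0, 0.0), 0.5)    # half-transparent red in front
far_blue = (2.0, (0.0, 0.0, 1.0), 1.0)    # opaque blue behind it

print(resolve_pixel([near_red, far_blue]))
print(resolve_pixel([far_blue, near_red]))  # same result, different arrival order
```

The cost is per-pixel storage that grows with depth complexity, which is one reason the technique matters to any pipeline and not only to REYES.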


Shading in a Hybrid System
Evaluate displacement (due to REYES or on demand for ray tracing)
Shade grids
Shade ray hits
Looking forward… shade quads too?

One shading system or two or three?


This Project is Really About

Re-architecting REYES pipeline for real-time performance (for throughput architectures like LRB)

Hybrid rendering: study interoperability of advanced techniques (REYES + ray tracing + maybe Direct3D)
– Hybrid shading system
– Understand workload balance

Hybrid pipeline interface: real-time, retained mode

Pursuit of more flexible, advanced graphics pipelines


Questions?