Talk from SIGGRAPH 2010 and the Beyond Programmable Shading course. Also see publications.dice.se for more material and other DICE talks.
Beyond Programmable Shading Course, ACM SIGGRAPH 2010
Bending the Graphics Pipeline
Johan Andersson
DICE
Overview
• Give a taste of a few rendering techniques we are using & experimenting with, and how they interact, or how we would like them to interact, with the graphics pipeline
• Tile-based Deferred Shading
• Morphological Antialiasing
• Analytical Ambient Occlusion
TILE-BASED DEFERRED SHADING
Tile-based deferred shading
• Tile-based culling & lighting
  – Cull lights per screen-space tile
  – Lighting kernel runs per tile
  – Minimizes bandwidth/setup cost
• DX11: GPU compute shader
  – Covered in the course last year [Andersson09]
• PS3: SPU jobs
  – GPU renders gbuffer
  – SPU does light culling & full lighting evaluation for each pixel
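The per-tile culling step can be sketched on the CPU as follows. This is a minimal illustration under stated assumptions, not the Frostbite kernel: lights are assumed to be already projected to screen-space circles, and the names `Light`, `circle_overlaps_rect` and `cull_lights_per_tile` are hypothetical. The default 32x16 tile size is borrowed from the SPU slide later in the talk.

```python
from dataclasses import dataclass


@dataclass
class Light:
    x: float       # screen-space center in pixels (assumed already projected)
    y: float
    radius: float  # screen-space radius of influence in pixels


def circle_overlaps_rect(light, x0, y0, x1, y1):
    """True if the light's screen-space circle touches the rectangle."""
    # Closest point on the rectangle to the circle center.
    cx = min(max(light.x, x0), x1)
    cy = min(max(light.y, y0), y1)
    dx, dy = light.x - cx, light.y - cy
    return dx * dx + dy * dy <= light.radius * light.radius


def cull_lights_per_tile(lights, width, height, tile_w=32, tile_h=16):
    """Return {(tile_x, tile_y): [light indices]} for every screen tile."""
    tiles = {}
    for ty in range(0, height, tile_h):
        for tx in range(0, width, tile_w):
            x1 = min(tx + tile_w, width)
            y1 = min(ty + tile_h, height)
            tiles[(tx // tile_w, ty // tile_h)] = [
                i for i, light in enumerate(lights)
                if circle_overlaps_rect(light, tx, ty, x1, y1)
            ]
    return tiles
```

A lighting kernel would then loop over each tile's short light list instead of over every light in the scene, which is where the bandwidth and setup savings come from.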
Multiple deferred lighting models
• Standard Phong
• Metallic
• Skin
• Translucent
Working with tiles
• Tile culling optimizations
  – Cull lights & shadows with tile normal cone
  – Detect tile specular=0
  – Detect tile lighting model
• Tile lighting kernel permutations
  – Specular on/off
  – Lighting models
  – More in the future
SPU-based Deferred Shading
• Ported DX11 compute shader to SPU job
  – Offloads PS3 GPU
  – SPU processing in parallel with GPU rendering
  – 32x16 pixel tiles
• Explicit SoA vectorization instead of implicit
  – C/C++ on SPU - HLSL on GPU
  – Not a problem for such a relatively small kernel
  – But not ideal data-parallel programming model
SPU vs GPU architecture
• 6 execution contexts vs 1+ million (each pixel)
• Explicit SIMD vs implicit SIMD
• C/C++ vs HLSL
• Explicit async DMA vs implicit latency hiding
• What can we learn?
Issues & challenges going forward
• More lighting models
  – SIMD & branching efficiency
• Transparent decal surfaces & volumes
  – Fixed function blending doesn’t work well with deferred
• Higher-quality antialiasing
Flexible lighting models
• Want both more & more flexible models:
  – Custom gbuffer layout per material
  – Quality & performance tradeoffs
• Examples:
  – Hair / anisotropic materials
    • Requires more lighting model parameters in gbuffer
  – Foliage
    • Massive overdraw with alpha-tested simple shaders, few parameters
    • Write to as simple a gbuffer as possible to reduce ROP/bandwidth bottleneck
  – Skin
    • Sub-surface scattering approximation
The SIMD efficiency problem
• Lighting models through dynamic branches
• GPU shader model can be problematic:
  – Increased register pressure = overall slower shader
  – Requires good screen-space SIMD coherency for performance win
• Potential solutions:
  – Reshuffle pixels to improve coherency?
    • Within each tile, sort pixels by model, compute lighting & then scatter back
  – GRAMPS-style queuing? [Sugerman09]
    • Attractive & powerful high-level programming model
Alpha-tested foliage has far from ideal coherency
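The "sort pixels by model, compute lighting, scatter back" idea can be sketched as below. This is only an illustration of the reshuffle, assuming a flat list of per-pixel lighting model ids for one tile and a hypothetical `shade_batch` table that maps each model id to a function shading a whole batch of pixel indices; none of these names come from the talk.

```python
from itertools import groupby


def shade_tile_sorted(model_ids, shade_batch):
    """Shade one tile with one coherent batch per lighting model.

    model_ids:   per-pixel lighting model id for the tile (flat list).
    shade_batch: hypothetical {model id: fn(list of pixel indices) -> list of results}.
    Returns (per-pixel results in original order, [(model id, batch size), ...]).
    """
    # 1. Sort pixel indices by lighting model (stable, so pixel order is kept per model).
    order = sorted(range(len(model_ids)), key=model_ids.__getitem__)

    out = [None] * len(model_ids)
    batches = []
    for model, group in groupby(order, key=model_ids.__getitem__):
        pixels = list(group)
        # 2. Compute lighting once per model on a coherent batch.
        results = shade_batch[model](pixels)
        # 3. Scatter the results back to the original pixel positions.
        for pixel, result in zip(pixels, results):
            out[pixel] = result
        batches.append((model, len(pixels)))
    return out, batches
```

On real SIMD hardware the gain would come from each batch running a single lighting model without divergent branches; the sort and scatter are the extra cost that has to be paid for it.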
Decals & deferred shading
• Decals blend selectively against gbuffer
  – Include:
    • Diffuse albedo (gbuffer1.rgb)
    • Normal (gbuffer0.rgb)
  – Want to include (but can’t in single pass):
    • Specular albedo (gbuffer1.a)
    • Specular smoothness (gbuffer0.a)
  – Exclude:
    • Material id (can’t blend)
    • Object lighting (inherit from below surface)
• Fixed function blending doesn’t work well
  – Pixel shader can’t write out both alpha & blend factor!
  – Consoles don’t have blend mode per MRT
  – Linear blend doesn’t work for all components
See “Destruction Masking in Frostbite 2 using Volume Distance Fields” [Kihl10] for more details about the decal use case
Need programmable blending
• Benefits:
  – Write out gbuffer alpha channels independently of blend factor
  – Treat channels & targets however you see fit
  – Non-linear blending & renormalizing blends
  – Can do overlapping dependent blending
    • Read current normal, add bumps relative to it, write out
• What approach?
  – LRB-style pixel shader framebuffer read/modify/write [Lalonde09]
    • Ideal general solution for developers
    • How to hide synchronization latency? Implicit / explicit?
  – Blend shader
    • Yet another stage in a fixed pipeline
    • No R/M/W, not ideal
  – More?
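To illustrate why a renormalizing blend needs a read/modify/write step, the sketch below compares a plain linear blend of two unit normals with a blend that renormalizes the result. It is a CPU-side illustration only; the function names are hypothetical, and the fallback for a degenerate (zero-length) blend is an assumption made here to keep the example safe.

```python
import math


def fixed_function_blend(dst, src, alpha):
    """Classic linear blend per channel: dst * (1 - alpha) + src * alpha."""
    return tuple(d * (1.0 - alpha) + s * alpha for d, s in zip(dst, src))


def vector_length(v):
    return math.sqrt(sum(c * c for c in v))


def programmable_normal_blend(dst_normal, decal_normal, alpha):
    """Read the current normal, blend toward the decal normal, renormalize, write out."""
    blended = fixed_function_blend(dst_normal, decal_normal, alpha)
    length = vector_length(blended)
    if length < 1e-8:
        # Degenerate case (opposing normals): keep the existing normal.
        return dst_normal
    return tuple(c / length for c in blended)


surface = (0.0, 0.0, 1.0)
decal = (1.0, 0.0, 0.0)
print(vector_length(fixed_function_blend(surface, decal, 0.5)))       # shorter than 1
print(vector_length(programmable_normal_blend(surface, decal, 0.5)))  # unit length
```

The linear blend leaves a shortened normal in the gbuffer, which is one concrete case of "linear blend doesn’t work for all components".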
The deferred shading + MSAA problem
• Huge storage & bandwidth requirements with deferred
  – 1920 x 1080 x 5 x 4 x 4 = 165 MB
  – Doesn’t scale! Adding 1 bit of precision = 2x more memory
• 4x MSAA is not enough
  – Esp. for thin geometry in the distance
• Prohibitive performance and bandwidth cost in general with deferred shading
  – But don’t miss Andrew Lauritzen’s talk later in the course: Deferred Rendering for Current and Future Rendering Pipelines
• There are alternatives to MSAA...
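The 165 MB figure can be reproduced with a quick calculation. The slide only gives the raw factors; reading them as 5 render targets, 4 bytes per sample and 4 MSAA samples per pixel is an assumption made here, and `gbuffer_bytes` is a hypothetical helper name.

```python
def gbuffer_bytes(width, height, targets, bytes_per_sample, msaa_samples):
    """Storage for a multisampled gbuffer, in bytes."""
    return width * height * targets * bytes_per_sample * msaa_samples


size = gbuffer_bytes(1920, 1080, targets=5, bytes_per_sample=4, msaa_samples=4)
print(size)          # 165888000 bytes
print(size / 1e6)    # 165.888 (decimal megabytes)
print(size / 2**20)  # 158.203125 (mebibytes)

# Storage grows linearly with the sample count: 8x MSAA doubles the footprint.
print(gbuffer_bytes(1920, 1080, 5, 4, 8) // size)  # 2
```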
MLAA – Morphological Antialiasing
• Post-effect antialiasing
• Introduced in [Reshetov09]
• Implementations:
  – Intel CPU reference implementation [Reshetov09]
  – Sony PS3 SPU implementation [Perthuis10]
  – GPU compute? [Biri10]
MLAA workings
From [Reshetov09]
MLAA comparisons (PS3)
No AA
MLAA
MLAA takeaways
• Awesome AA for still pictures
• Moving pictures good, but:
  – No sub-pixel information = edges snap to pixels
  – Doesn’t solve aliasing on fine detail geometry
  – Overall still a very good benefit!
• Focus/exclude effect based on framebuffer alpha & thresholds
  – Unique requirements per game/app
  – Not good to use on some UI, mark in alpha (or apply before)
• Variable post-effect, trade perf vs quality!
MLAA future (PC)
• GPU compute shader implementation
• Combine with MSAA & sub-pixel samples
  – Simple MSAA box filter downsampling is a big waste
  – Sort of similar to A Directionally Adaptive Edge Anti-Aliasing Filter [Yang09]
  – A must to reduce the edge snapping of pure MLAA
  – Not fully clear how it should work (sample distribution)
AMBIENT OCCLUSION
Current dynamic AO
• Horizon-based Ambient Occlusion
  – See [Bavoil08] for complete details
• Based on screen-space depth-buffer (SSAO)
  – Very high quality sampling
  – But only screen-space info is a big limitation
  – Creates false occlusion artifacts
• Render in half-res for improved performance
  – Bilateral upsampling + Gaussian blur
  – Can also do dual-resolution to reduce artifacts
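A depth-aware (bilateral) upsample of half-res AO can be sketched as follows. This is a simplified illustration, not the Frostbite filter: it weights the four nearest low-res samples only by depth similarity and leaves out the spatial bilinear weights and the Gaussian blur. The function name, the `sigma` parameter and the list-of-rows layout are assumptions.

```python
import math


def bilateral_upsample(ao_half, depth_half, depth_full, sigma=0.1):
    """Upsample half-res AO to full res, weighting low-res samples by depth similarity.

    ao_half, depth_half: half-resolution AO and depth (lists of rows).
    depth_full:          full-resolution depth (lists of rows).
    """
    height, width = len(depth_full), len(depth_full[0])
    half_h, half_w = len(ao_half), len(ao_half[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total, weight_sum = 0.0, 0.0
            # Visit the 2x2 block of low-res texels around this pixel (clamped at edges).
            for dy in (0, 1):
                for dx in (0, 1):
                    ly = min(y // 2 + dy, half_h - 1)
                    lx = min(x // 2 + dx, half_w - 1)
                    dz = depth_full[y][x] - depth_half[ly][lx]
                    # Samples at a very different depth get almost no weight;
                    # the small epsilon avoids a division by zero.
                    weight = math.exp(-(dz * dz) / (2.0 * sigma * sigma)) + 1e-6
                    total += weight * ao_half[ly][lx]
                    weight_sum += weight
            out[y][x] = total / weight_sum
    return out
```

With a depth discontinuity between two low-res texels, full-res pixels on each side take their AO almost entirely from the texel at matching depth, so occlusion does not bleed across the edge.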
Horizon-based Ambient Occlusion
False occlusion halo from thin geometry
HBAO limitations
• False halo occlusion artifacts around small geometry
  – Such as: fences & poles
  – Extra visible when moving the camera
• Very noisy sampling for detailed zbuffers
  – Common with alpha-tested foliage
  – Difficult sampling problem
Analytical Ambient Occlusion
HBAO vs AAO
Analytical Ambient Occlusion
• Using Ambient Occlusion Volumes
  – [McGuire10]
• Experimental implementation in Frostbite 2
  – With some good help from Morgan McGuire and Louis Bavoil
• Geometry-based technique
  – Not screen-space!
  – Say what?
AOV idea
1. Extrude prism for each triangle (GS)
  – Extrusion distance is where occlusion=0
2. Rasterize primitives in prism
  – With depth-test enabled, near depth clip disabled
  – Finds visible points inside volume
  – Need to handle case with camera inside volume
3. Accumulate analytical occlusion contribution for visible pixels (PS)
  – Uses pixel normal & depth values from gbuffer
  – Subtractive blend
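Step 1 can be sketched on the CPU as below. This is a deliberately simplified illustration: it only pushes a copy of the triangle along its face normal by the extrusion distance, and it does not attempt any sideways expansion of the volume or the camera-inside-volume case. The helper names are hypothetical, and a real implementation would run in a geometry shader as the slide says.

```python
import math


def subtract(a, b):
    return tuple(x - y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def extrude_prism(v0, v1, v2, distance):
    """Return (six prism vertices, face normal) for one non-degenerate triangle.

    The first three vertices are the original triangle; the last three are the
    same triangle offset along the face normal by `distance`, which is the
    distance at which the occlusion contribution is assumed to reach zero.
    """
    normal = normalize(cross(subtract(v1, v0), subtract(v2, v0)))
    top = [tuple(p + distance * n for p, n in zip(v, normal)) for v in (v0, v1, v2)]
    return [v0, v1, v2] + top, normal
```

A larger `distance` makes every prism cover more pixels, which fits the later observation that performance depends heavily on the occlusion distance.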
HBAO
HBAO / AOV
AOV in practice
• Render geometry again in separate AO pass
  – Uses depth & normal buffer from deferred rendering
  – Half-res or lower with bilateral upsampling
  – Culling should consider extrusion distance
• Separate paths for dynamic & rigid objects
  – Can pre-compute rigid extruded AOV & reduce overdraw
• Doesn’t work with alpha-tested surfaces
  – Simulate with per-surface or per-triangle approx. coverage factor
Overdarkening (extra occlusion)
Varying overdraw with AO distance
0.1 m 0.2 m 0.5 m
AOV pros & cons
Pros:
• Very high quality - close to raytracing ground truth
• Noise free (when full res)
• Perfectly stable with view changes
• Supports arbitrary dynamic polygon soups
Cons:
• Requires massive fillrate
• Geometry cost
• Overdarkening, may require content tweaks
AOV future optimizations
• Reduce the massive overdraw
  – Cull / restrict prisms that only extend out to empty air?
  – Clamp screen-space prism size
    • Not correct, but practical tradeoff. HBAO does this
• More optimal prism geometry
  – GS is limited to triangle strip output
  – Precompute using quads for rigid objects
• Geometry LOD / mix with higher-order geometry representations
  – Also see AO volume texture & analytical capsule techniques [Hill10]
AOV takeaways
• Major improvement in visual quality compared to SSAO
• Interesting use of geometry & rasterization pipelines
  – Builds on existing HW-, SW- & content pipelines
  – Quite simple brute force drop-in (but not as simple as SSAO)
• Siggraph interactive framerates™ today, but lots of potential:
  – Performance highly dependent on occlusion distance
  – Optimizations / less brute force?
  – Use for high-end / reference / precompute / beauty shots initially
Conclusions
• New graphics pipeline usages are opened up with improved HW performance
  – Often not efficient to do with pure compute
  – Continue to give us more performance & bandwidth!
• We need to continue to break down some fixed graphics pipeline barriers
Acknowledgments
• Morgan McGuire
• Louis Bavoil
• David Luebke
• Andrew Lauritzen
• Robert Kihl
• Christina Coffin
• SCEE
Questions?
email: [email protected]
blog: http://repi.se
twitter: @repi
For more DICE talks:
http://publications.dice.se
References
• [Andersson09] Johan Andersson, “Parallel Graphics in Frostbite - Current & Future”, Beyond Programmable Shading Course, Siggraph 2009. http://s09.idav.ucdavis.edu/
• [Lalonde09] Paul Lalonde, “Innovating in a Software Graphics Pipeline”, Beyond Programmable Shading Course, Siggraph 2009. http://s09.idav.ucdavis.edu/
• [Reshetov09] Alexander Reshetov, “Morphological Antialiasing”
• [Yang09] Jason C. Yang et al, “A Directionally Adaptive Edge Anti-Aliasing Filter”, High Performance Graphics 2009
• [McGuire10] Morgan McGuire, “Ambient Occlusion Volumes”, High Performance Graphics 2010. http://graphics.cs.williams.edu/papers/AOVHPG10/
• [Biri10] Venceslas Biri et al, “Practical morphological antialiasing on the GPU”, Siggraph 2010
• [Bavoil08] Louis Bavoil & Miguel Sainz, “Image-Space Horizon-Based Ambient Occlusion”, Siggraph 2008. http://developer.nvidia.com/object/siggraph-2008-HBAO.html
• [Hill10] Stephen Hill, “Rendering with Conviction”, Game Developers Conference 2010
• [Kihl10] Robert Kihl, “Destruction Masking in Frostbite 2 using Volume Distance Fields”, Advances in Real-Time Rendering in 3D Graphics and Games, Siggraph 2010. http://publications.dice.se
• [Sugerman09] Jeremy Sugerman et al, “GRAMPS: A Programming Model for Graphics Pipelines”, ACM Transactions on Graphics, January 2009. http://graphics.stanford.edu/papers/gramps-tog/
• [Perthuis10] Cedric Perthuis, “MLAA in God of War 3” (PS3 registered developers only)