The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware 2008)
Johan Andersson, Rendering Architect
2.5

Agenda
- Goal: Share and discuss current & future graphics use cases in our games and implications for graphics hardware
- Areas:
  - Engine overview
  - Shaders
  - Parallelization
  - Texturing
  - Raytracing
  - GPU compute
- Conclusions
- Q & A

Frostbite
- DICE proprietary engine
- Xbox 360, PS3, Windows (Direct3D 10)
- Focus:
  - Large outdoor environments
  - Singleplayer & multiplayer
  - Destruction!
  - New: Content workflows

[Screenshot: Battlefield: Bad Company (BFBC)]

[Screenshot: Battlefield: Bad Company (BFBC)]

Graph-based surface shaders
- Artist-friendly
  - Easy to create, tweak & manage
- Flexible
  - Programmers & artists can extend & expose features
- Data-centric
  - Encapsulates resources
  - Transformable
- Rich high-level shading framework
  - Used by all content & systems
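As an illustration of how a graph-based surface shader can be lowered to HLSL, the following C++ sketch walks a small node tree and emits one expression string. It is not Frostbite's implementation; the `Node` type, the node kinds, and the `LinearSampler` and `uv` names in the generated code are all hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical node in a surface shader graph. Each node emits an HLSL
// expression built from the expressions of its inputs.
struct Node {
    enum class Kind { Constant, TextureSample, Multiply, Add };
    Kind kind;
    std::string value;                          // literal or texture name
    std::vector<std::shared_ptr<Node>> inputs;  // operands for Multiply/Add
};

using NodePtr = std::shared_ptr<Node>;

NodePtr constant(const std::string& literal) {
    return std::make_shared<Node>(Node{Node::Kind::Constant, literal, {}});
}
NodePtr sampleTex(const std::string& texture) {
    return std::make_shared<Node>(Node{Node::Kind::TextureSample, texture, {}});
}
NodePtr mul(NodePtr a, NodePtr b) {
    return std::make_shared<Node>(Node{Node::Kind::Multiply, "", {a, b}});
}
NodePtr add(NodePtr a, NodePtr b) {
    return std::make_shared<Node>(Node{Node::Kind::Add, "", {a, b}});
}

// Recursively emit an HLSL expression for the subgraph rooted at `n`.
std::string emitHlsl(const NodePtr& n) {
    switch (n->kind) {
    case Node::Kind::Constant:
        return n->value;
    case Node::Kind::TextureSample:
        // Assumes the generated shader declares `LinearSampler` and `uv`.
        return n->value + ".Sample(LinearSampler, uv)";
    case Node::Kind::Multiply:
    case Node::Kind::Add: {
        assert(n->inputs.size() == 2);
        const char* op = n->kind == Node::Kind::Multiply ? " * " : " + ";
        return "(" + emitHlsl(n->inputs[0]) + op + emitHlsl(n->inputs[1]) + ")";
    }
    }
    return "";
}
```

For example, `mul(sampleTex("AlbedoMap"), constant("float4(0.5, 0.5, 0.5, 1)"))` emits `(AlbedoMap.Sample(LinearSampler, uv) * float4(0.5, 0.5, 0.5, 1))`. A production generator would also share common subexpressions and declare typed temporaries.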

Shader permutations
- Generate shader permutations
  - For each used combination of features/data
  - HLSL vertex & pixel shaders
- Many features = permutation explosion
  - Shader graphs, lighting, geometry
- Balance perf. vs permutations vs features
  - Dynamic branching
  - Live with many permutations
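A minimal sketch of "for each used combination" permutation generation, assuming each material reports its required features as a bitmask. The flag names and the `#define` convention are hypothetical. With 4 independent features there are 2^4 = 16 possible permutations, but only the combinations content actually uses get compiled.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical feature flags that a material/draw call can require.
enum Feature : std::uint32_t {
    kNormalMap   = 1u << 0,
    kSpecularMap = 1u << 1,
    kSkinning    = 1u << 2,
    kPointLights = 1u << 3,
};
constexpr int kFeatureCount = 4;
const char* const kFeatureDefines[kFeatureCount] = {
    "NORMAL_MAP", "SPECULAR_MAP", "SKINNING", "POINT_LIGHTS"};

// Unique feature combinations actually used by content; only these are
// compiled, instead of all 2^kFeatureCount possibilities.
std::set<std::uint32_t> usedPermutations(const std::vector<std::uint32_t>& materialMasks) {
    return std::set<std::uint32_t>(materialMasks.begin(), materialMasks.end());
}

// #define preamble prepended to the shared HLSL source for one permutation.
std::string permutationPreamble(std::uint32_t mask) {
    assert(mask < (1u << kFeatureCount));
    std::string preamble;
    for (int i = 0; i < kFeatureCount; ++i)
        if (mask & (1u << i))
            preamble += std::string("#define ") + kFeatureDefines[i] + " 1\n";
    return preamble;
}
```

Dynamic branching trades some of these permutations for runtime cost inside a single shader; the sketch only covers the compile-time side of that balance.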

Shader subroutines
- Next step: Static subroutine linking
  - Inline all subroutines at the call site
  - Similar to a switch statement
  - Reduces # permutations
  - Implementation moved to driver or GPU
  - Doesn’t work with instancing
- Future step: Dynamic subroutines
  - Control function pointers inside the shader
  - Problem solved, but coherency important

Rendering & Parallelization

Jobs
- Must utilize multi-core
  - 6 HW threads on Xbox 360
  - 6 SPUs on PS3
  - 2-8 cores on PC
- Job definition
  - Fully independent stateless function
  - PS3 SPU requirement
- Graph dependencies
- Task-parallel and data-parallel
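The job model above (stateless functions connected by graph dependencies) can be sketched in C++ as follows. This is a simplified, hypothetical scheduler: it runs each "wave" of ready jobs on its own threads and joins them before starting the next wave, whereas a production job system would use a persistent worker pool and, on PS3, SPU-side execution.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// A job: a stateless function plus the indices of jobs it depends on.
// (Hypothetical structure for illustration.)
struct Job {
    std::function<void()> run;
    std::vector<int> dependencies;
};

// Run the graph in waves: all jobs whose dependencies have finished run in
// parallel, one thread each, and are joined before the next wave starts.
// Returns false if a dependency cycle prevents progress.
bool runJobGraph(const std::vector<Job>& jobs) {
    std::vector<bool> done(jobs.size(), false);
    std::size_t finished = 0;
    while (finished < jobs.size()) {
        std::vector<std::size_t> ready;
        for (std::size_t i = 0; i < jobs.size(); ++i) {
            if (done[i]) continue;
            bool depsDone = true;
            for (int d : jobs[i].dependencies) {
                assert(d >= 0 && std::size_t(d) < jobs.size());
                depsDone = depsDone && done[d];
            }
            if (depsDone) ready.push_back(i);
        }
        if (ready.empty()) return false;  // cycle: nothing is runnable
        std::vector<std::thread> wave;
        for (std::size_t i : ready) wave.emplace_back(jobs[i].run);
        for (auto& t : wave) t.join();
        for (std::size_t i : ready) done[i] = true;
        finished += ready.size();
    }
    return true;
}
```

Because jobs are stateless and only communicate through their declared outputs, independent jobs in the same wave can run on any core without locking.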

Rendering jobs
- Refactor rendering systems to jobs
- Most will move to GPU
  - Eventually
  - One-way data flow
  - Compute shaders & stream output
- Jobs:
  - Decal projection
  - Particle simulation
  - Terrain geometry processing
  - Undergrowth generation [2]
  - Frustum culling
  - Occlusion culling
  - Command buffer generation
  - PS3: Triangle culling

Parallel command buffer recording
- Dispatch draw calls and state to multiple command buffers in parallel
  - Scales linearly with # cores
  - 1500-4000 draw calls per frame
- Super-important for all platforms, used on:
  - Xbox 360
  - PS3 (SPU-based)
- No support in DX10!

DX10 parallel command buffer recording
- Single most important DX10 issue
  - For us and many others (in the future)
- Until future API support:
  - Reduce draw calls with instancing
    - Trade GPU performance for CPU performance
  - Reduce state & constant updates
    - Slow dynamic constant path
  - Manual software command buffers
    - Difficult to update dynamic resources efficiently in parallel due to API
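A minimal sketch of manual software command buffers, assuming a hypothetical `Command` encoding: worker threads record into private buffers in parallel, and the thread that owns the device replays them in a fixed order. Since DX10 has no native support for this, the replay step is where each recorded command would be translated into an actual Direct3D 10 call; here it only counts draws.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical command encoding; a real engine would also record constant
// updates, render states, vertex/index buffers, etc.
enum class Op : std::uint8_t { SetShader, SetTexture, Draw };
struct Command { Op op; std::uint32_t arg; };
using CommandBuffer = std::vector<Command>;

struct Drawable { std::uint32_t shader, texture, indexCount; };

// Record drawables [begin, end) into one thread-private command buffer.
void record(const std::vector<Drawable>& drawables, std::size_t begin,
            std::size_t end, CommandBuffer& out) {
    for (std::size_t i = begin; i < end; ++i) {
        out.push_back({Op::SetShader, drawables[i].shader});
        out.push_back({Op::SetTexture, drawables[i].texture});
        out.push_back({Op::Draw, drawables[i].indexCount});
    }
}

// Split the frame's drawables into contiguous chunks, record each chunk on its
// own thread, and return the buffers in submission order.
std::vector<CommandBuffer> recordParallel(const std::vector<Drawable>& drawables,
                                          std::size_t threadCount) {
    assert(threadCount > 0);
    std::vector<CommandBuffer> buffers(threadCount);
    std::vector<std::thread> workers;
    std::size_t chunk = (drawables.size() + threadCount - 1) / threadCount;
    for (std::size_t t = 0; t < threadCount; ++t) {
        std::size_t begin = std::min(drawables.size(), t * chunk);
        std::size_t end = std::min(drawables.size(), begin + chunk);
        workers.emplace_back(record, std::cref(drawables), begin, end,
                             std::ref(buffers[t]));
    }
    for (auto& w : workers) w.join();
    return buffers;
}

// Replay on the one thread that owns the device, in buffer order. This sketch
// only counts draw calls; a real replay would issue the API calls.
std::size_t replay(const std::vector<CommandBuffer>& buffers) {
    std::size_t draws = 0;
    for (const CommandBuffer& buffer : buffers)
        for (const Command& cmd : buffer)
            if (cmd.op == Op::Draw) ++draws;
    return draws;
}
```

Recording scales with the number of cores, but the serial replay remains on one thread, which is why native API support was wanted.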

PS3 geometry processing (1/2)
- Slow GPU triangle & vertex setup
- Unique situation with "free" processors
  - Not fully utilized
- Solution: SPU triangle culling
  - Trade SPU time for GPU performance
  - Cull back faces, micro-triangles, frustum
  - Sony PS3 EDGE library
  - 5 jobs process frame geometry in parallel
  - Output is a new index buffer for each draw call
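The SPU culling step can be sketched as a function that takes projected vertex positions and an index buffer and writes a new, smaller index buffer. This is not the EDGE library's API. It assumes the vertices are already transformed to screen space (pixels, y up) and lie in front of the camera, that front faces wind counter-clockwise, and that pixel sample centers sit at (i + 0.5). The micro-triangle test is conservative: it only removes triangles whose bounding box contains no sample center.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

struct Vec2 { float x, y; };  // screen-space position in pixels (y up)

// Returns a new index buffer with the triangles that survive back-face,
// frustum (viewport) and micro-triangle culling.
std::vector<std::uint16_t> cullTriangles(const std::vector<Vec2>& pos,
                                         const std::vector<std::uint16_t>& indices,
                                         float width, float height) {
    assert(indices.size() % 3 == 0);
    std::vector<std::uint16_t> out;
    out.reserve(indices.size());
    for (std::size_t t = 0; t < indices.size(); t += 3) {
        const Vec2& a = pos[indices[t]];
        const Vec2& b = pos[indices[t + 1]];
        const Vec2& c = pos[indices[t + 2]];

        // Back-face and zero-area culling via the signed area.
        float area = (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
        if (area <= 0.0f) continue;

        float minX = std::min({a.x, b.x, c.x}), maxX = std::max({a.x, b.x, c.x});
        float minY = std::min({a.y, b.y, c.y}), maxY = std::max({a.y, b.y, c.y});

        // Frustum culling against the viewport rectangle.
        if (maxX < 0.0f || minX > width || maxY < 0.0f || minY > height) continue;

        // Micro-triangle culling: no sample center inside the bounding box.
        if (std::ceil(minX - 0.5f) > std::floor(maxX - 0.5f) ||
            std::ceil(minY - 0.5f) > std::floor(maxY - 0.5f)) continue;

        out.insert(out.end(), {indices[t], indices[t + 1], indices[t + 2]});
    }
    return out;
}
```

Each surviving triangle keeps its original vertex indices, so the vertex buffer is untouched and only the per-draw index buffer shrinks.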

PS3 geometry processing (2/2)
- Great flexibility and programmability!
- Custom processing
  - Partition bounding box culling
  - Triangle part culling
  - Clip plane triangle trivial accept & reject
  - Triangle cull volumes (inverse clip planes)
- Future: No vertex & geometry shaders
  - DIY compute shaders with fixed-function tessellation and triangle setup units
  - Output buffer streaming still important

Occlusion culling
- Buildings occlude objects
  - Tons of objects
- Difficult to implement
  - Building destruction
  - Dynamic occludees
  - Heavy GPU occlusion queries
- Invisible objects still have to:
  - Update logic & animations
  - Generate command buffer
  - Be processed on CPU & GPU

Software occlusion culling
- Solution: Rasterize a coarse z-buffer on SPU/CPU
  - Low-poly occluder meshes
    - 100m view distance
    - Max 10000 vertices/frame
    - Manually conservative
  - 256x114 float z-buffer
  - Created for PS3, now on all platforms
- Cull all objects against the z-buffer
  - Before they are passed to all other systems = big savings
  - Screen-space bbox test
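A minimal sketch of a coarse software z-buffer with a screen-space bounding-box test, using the 256x114 size from the slide. It assumes occluder triangles are already projected into z-buffer pixel coordinates with depth increasing away from the camera, and that each object supplies a screen-space bbox plus its nearest depth; the class and method names are hypothetical. The test is conservative: an object is culled only if every pixel of its bbox already holds a nearer occluder.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Screen-space vertex: x, y in z-buffer pixels, z = depth (larger = farther).
struct SVert { float x, y, z; };

class CoarseZBuffer {
public:
    CoarseZBuffer(int w, int h)
        : w_(w), h_(h), depth_(std::size_t(w) * h, std::numeric_limits<float>::infinity()) {
        assert(w > 0 && h > 0);
    }

    // Rasterize one occluder triangle, keeping the nearest depth per pixel.
    // Samples at pixel centers; works for either winding.
    void rasterize(SVert a, SVert b, SVert c) {
        float area = edge(a, b, c.x, c.y);
        if (area == 0.0f) return;
        int x0 = std::max(0, int(std::floor(std::min({a.x, b.x, c.x}))));
        int x1 = std::min(w_ - 1, int(std::ceil(std::max({a.x, b.x, c.x}))));
        int y0 = std::max(0, int(std::floor(std::min({a.y, b.y, c.y}))));
        int y1 = std::min(h_ - 1, int(std::ceil(std::max({a.y, b.y, c.y}))));
        for (int y = y0; y <= y1; ++y)
            for (int x = x0; x <= x1; ++x) {
                float px = x + 0.5f, py = y + 0.5f;
                float w0 = edge(b, c, px, py) / area;  // barycentric weights
                float w1 = edge(c, a, px, py) / area;
                float w2 = edge(a, b, px, py) / area;
                if (w0 < 0 || w1 < 0 || w2 < 0) continue;  // outside triangle
                float& d = depth_[std::size_t(y) * w_ + x];
                d = std::min(d, w0 * a.z + w1 * b.z + w2 * c.z);
            }
    }

    // Screen-space bbox test against the object's nearest depth.
    bool isOccluded(float minX, float minY, float maxX, float maxY, float nearestZ) const {
        int x0 = std::max(0, int(std::floor(minX))), x1 = std::min(w_ - 1, int(std::ceil(maxX)));
        int y0 = std::max(0, int(std::floor(minY))), y1 = std::min(h_ - 1, int(std::ceil(maxY)));
        if (x0 > x1 || y0 > y1) return false;  // off-screen: leave to frustum culling
        for (int y = y0; y <= y1; ++y)
            for (int x = x0; x <= x1; ++x)
                if (depth_[std::size_t(y) * w_ + x] >= nearestZ) return false;
        return true;
    }

private:
    // Edge function: twice the signed area of triangle (a, b, p).
    static float edge(const SVert& a, const SVert& b, float px, float py) {
        return (b.x - a.x) * (py - a.y) - (b.y - a.y) * (px - a.x);
    }
    int w_, h_;
    std::vector<float> depth_;
};
```

An object whose bbox lies entirely behind the occluder depth is culled; a single uncovered pixel, or an object nearer than the occluder, keeps it visible.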

GPU occlusion culling
- Want GPU rasterization & testing, but:
  - Occlusion queries introduce overhead & latency
    - Can be manageable, not ideal
  - Conditional rendering only helps GPU
    - Not CPU, frame memory or draw calls
- Future 1: Low-latency extra GPU execution context
  - Rasterization and testing done on GPU
  - Lockstep with CPU
- Future 2: Move entire cull & rendering to GPU
  - Scene graph, cull, systems, dispatch. End goal.

Texturing

Texture formats
- Using:
  - DXT1/5 color maps, sRGB
  - BC5 (3Dc) normal maps
  - BC4 (DXT5A) for grayscale masks
- sRGB support for BC4/5 would be nice
- DXT1 replacement needed
  - Low quality
  - 565 color bleeding
  - RG/RGB masks compress badly
  - HDR envmaps & lightmaps

[Figures: RGB DXT1 mask; DXT color bleed]
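Two small helpers illustrate the formats above. DXT1 stores its block endpoints as RGB565, so a neutral gray can pick up a slight tint from the extra green bit; this quantization is one contributor to DXT1's low quality, and the sketch does not model the block-level palette fitting an encoder also performs. BC5 stores only two components of a normal, so the shader reconstructs Z. The rounding and bit-replication choices here are common conventions, not the exact behavior of any particular encoder.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// DXT1/BC1 endpoints are stored as RGB565: 5 bits red, 6 green, 5 blue.
std::uint16_t packRgb565(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    // Round to the nearest representable level instead of truncating.
    std::uint16_t r5 = static_cast<std::uint16_t>((r * 31 + 127) / 255);
    std::uint16_t g6 = static_cast<std::uint16_t>((g * 63 + 127) / 255);
    std::uint16_t b5 = static_cast<std::uint16_t>((b * 31 + 127) / 255);
    return static_cast<std::uint16_t>((r5 << 11) | (g6 << 5) | b5);
}

// Expand back to 8 bits per channel by bit replication.
void unpackRgb565(std::uint16_t c, std::uint8_t& r, std::uint8_t& g, std::uint8_t& b) {
    std::uint8_t r5 = (c >> 11) & 0x1F, g6 = (c >> 5) & 0x3F, b5 = c & 0x1F;
    r = static_cast<std::uint8_t>((r5 << 3) | (r5 >> 2));
    g = static_cast<std::uint8_t>((g6 << 2) | (g6 >> 4));
    b = static_cast<std::uint8_t>((b5 << 3) | (b5 >> 2));
}

// BC5 stores only X and Y of a tangent-space normal (each in [-1, 1]); the
// shader rebuilds Z assuming a unit-length normal facing away from the surface.
float reconstructNormalZ(float x, float y) {
    return std::sqrt(std::max(0.0f, 1.0f - x * x - y * y));
}
```

For example, packing the gray (100, 100, 100) and unpacking it yields (99, 101, 99).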

Future texture sampling
- Texture sampling derivatives
  - 1st order texel derivatives
  - 2nd order as well?
  - Implement in sampler unit
    - Bad performance or quality with shader sampling
    - Artifacts with ddx/ddy technique
  - Replace normal maps with easily compressed bump maps
- Bicubic upsampling
  - Terrain masks

[Figures: Terrain heightmap; Derived normals [2]]
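Deriving normals from a heightmap, as in the terrain figure, amounts to taking first-order height derivatives. A minimal CPU-side sketch using central differences follows; in a shader this means fetching neighboring samples (or using ddx/ddy, with the artifacts noted above), which is the work a sampler that returns texel derivatives would take over. The function name, the Y-up convention, and the edge clamping are assumptions.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

// Unit normal for texel (x, y) of a row-major heightmap via central
// differences. `texelSize` is the world distance between texels; reads
// outside the map are clamped to the edge. Y is "up".
std::array<float, 3> normalFromHeightmap(const std::vector<float>& h, int width,
                                         int height, int x, int y, float texelSize) {
    assert(int(h.size()) == width * height && texelSize > 0.0f);
    auto at = [&](int i, int j) {
        i = std::clamp(i, 0, width - 1);
        j = std::clamp(j, 0, height - 1);
        return h[std::size_t(j) * width + i];
    };
    float dhdx = (at(x + 1, y) - at(x - 1, y)) / (2.0f * texelSize);
    float dhdz = (at(x, y + 1) - at(x, y - 1)) / (2.0f * texelSize);
    // The surface y = h(x, z) has normal (-dh/dx, 1, -dh/dz), then normalized.
    float len = std::sqrt(dhdx * dhdx + 1.0f + dhdz * dhdz);
    return {-dhdx / len, 1.0f / len, -dhdz / len};
}
```

A flat heightmap yields (0, 1, 0); a ramp rising one unit per texel along X yields a normal tilted 45 degrees toward -X.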

Current sparse textures
- Save memory for terrain
  - Static quadtree mask texture
  - Dynamic sparse destruction mask
- Implementation
  - Indirection texture lookup in atlas
  - Arrays too small, want 8192 slices
  - Correct bilinear filtering by borders
- Siggraph’07 course for details [2]

[Figures: Source mask; Atlas texture]
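The indirection-plus-atlas scheme can be sketched as a lookup from a virtual UV to atlas texel coordinates. This assumes a hypothetical layout in which every resident tile occupies a square atlas slot with `border` extra texels on each side, so that bilinear filtering near a tile edge reads valid neighbors; the slides refer to [2] for the actual layout, which may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

struct AtlasSlot { int x, y; };  // slot position in the atlas grid

// Hypothetical sparse mask: `tilesX` x `tilesY` tiles of `tileSize` texels.
// Each resident tile is stored in an atlas slot of tileSize + 2 * border texels.
struct SparseMask {
    int tilesX, tilesY, tileSize, border;
    std::vector<std::optional<AtlasSlot>> indirection;  // one entry per tile

    // Map a virtual UV in [0, 1) to atlas texel coordinates, or nothing if the
    // tile is not resident (e.g. an empty mask region).
    std::optional<std::pair<float, float>> lookup(float u, float v) const {
        float vx = u * tilesX * tileSize;  // virtual texel coordinates
        float vy = v * tilesY * tileSize;
        int tx = int(vx) / tileSize, ty = int(vy) / tileSize;
        assert(tx >= 0 && tx < tilesX && ty >= 0 && ty < tilesY);
        const auto& slot = indirection[std::size_t(ty) * tilesX + tx];
        if (!slot) return std::nullopt;
        int pitch = tileSize + 2 * border;        // slot size including borders
        float localX = vx - float(tx * tileSize);  // position inside the tile
        float localY = vy - float(ty * tileSize);
        return std::make_pair(float(slot->x * pitch + border) + localX,
                              float(slot->y * pitch + border) + localY);
    }
};
```

In a shader the same arithmetic runs per pixel, with the indirection table stored as a small texture; the borders must hold copies of the neighboring tiles' edge texels for the filtering to be correct.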

HW sparse textures
- Virtual texture
  - HW texture filtering & mipmapping
  - Fallback on non-resident tile access: lower mipmap, default value or shader bool
  - At least 32k x 32k, fp issues with larger?
- Application-controlled tile commit/free
  - ~128 x 128 tiles
  - Feedback mechanism for referenced tiles
  - Easy view-dependent allocation
- Future: Latency-free allocation & generation
  - Alt. 1: CPU thread callback & block
  - Alt. 2: Keep everything on GPU. "Command" shader?
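A software model of the wanted behavior, with hypothetical names: the application commits and frees tiles per mip level, every lookup records the tile it wanted as feedback, and a non-resident access falls back to the finest resident lower-resolution mip, or returns -1 so the caller can use a default value. Real hardware would do this in the sampler and page tables; the sketch only shows the logic.

```cpp
#include <cassert>
#include <set>
#include <tuple>

// Software model of a virtual texture page table (hypothetical API):
// which (mip, tileX, tileY) tiles are committed, plus tile-request feedback.
class VirtualTexture {
public:
    VirtualTexture(int sizeTexels, int tileSize) : size_(sizeTexels), tile_(tileSize) {
        assert(sizeTexels > 0 && tileSize > 0);
    }

    void commitTile(int mip, int tx, int ty) { resident_.insert({mip, tx, ty}); }
    void freeTile(int mip, int tx, int ty) { resident_.erase({mip, tx, ty}); }

    // Access mip-0 texel (x, y) at level `mip`: record the wanted tile as
    // feedback, then return the finest resident level >= mip covering the
    // texel, or -1 so the caller can fall back to a default value.
    int residentMip(int x, int y, int mip) {
        assert(x >= 0 && y >= 0 && x < size_ && y < size_ && mip >= 0);
        feedback_.insert({mip, (x >> mip) / tile_, (y >> mip) / tile_});
        for (int m = mip; (size_ >> m) > 0; ++m)
            if (resident_.count({m, (x >> m) / tile_, (y >> m) / tile_})) return m;
        return -1;
    }

    // Tiles requested since the last call; the application decides what to commit.
    std::set<std::tuple<int, int, int>> takeFeedback() {
        std::set<std::tuple<int, int, int>> out;
        out.swap(feedback_);
        return out;
    }

private:
    int size_, tile_;
    std::set<std::tuple<int, int, int>> resident_, feedback_;
};
```

With a 1024-texel texture and 128-texel tiles, committing only the single mip-3 tile makes every access fall back to mip 3 until the requested finer tiles, reported through the feedback set, are committed.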

Cached Procedural Unique Texturing
- Unique dynamic sparse texture on all objects
  - Defined by texture shader graph
  - Combine procedurals, compositing, streaming and uv-space geometry
- Dynamically commit & render visible tiles
- Highly complex compositing
  - Thanks to high frame-to-frame coherency
  - Upsample and refine
- New dynamic effects made possible
  - Affect every surface

Raytracing

Raytracing
- Much recent debate & interest in real-time raytracing (RTRT)
- What we are interested in:
  - Performance!!
    - Rasterization for primary rays
  - Deterministic
  - Easy integration into engines
    - Just another method for certain effects & objects
    - Not a replacement for the whole pipeline
  - Efficient dynamic geometry
    - Procedural & manual animation (foliage, characters)
    - Destruction (foliage, buildings, objects)

[Screenshot: Mirror’s Edge]

Raytraced reflections wanted
- Glass & metal
  - Mostly planar surfaces
  - Reflection locality
- Correct reflections for important objects
  - Main character
- Simplified world geometry & shading for the rest
  - Common for games
  - Brickmaps? [3]
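For mostly planar glass and metal, a reflection ray can start at the primary hit found by rasterization and follow the mirrored view direction. The sketch below shows only that geometric core (the mirror formula and a ray-plane intersection); tracing the reflected ray against the full-detail main character and the simplified world geometry is left out. The `Vec3` helpers are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

struct Vec3 { float x, y, z; };
Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator*(float s, Vec3 v) { return {s * v.x, s * v.y, s * v.z}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Mirror direction d about a surface with unit normal n: r = d - 2 (d . n) n.
Vec3 reflect(Vec3 d, Vec3 n) {
    assert(std::fabs(dot(n, n) - 1.0f) < 1e-3f);
    return d - 2.0f * dot(d, n) * n;
}

// First hit (t > 0) of the ray o + t d with the plane dot(n, p) = offset.
std::optional<Vec3> intersectPlane(Vec3 o, Vec3 d, Vec3 n, float offset) {
    float denom = dot(n, d);
    if (std::fabs(denom) < 1e-6f) return std::nullopt;  // parallel to the plane
    float t = (offset - dot(n, o)) / denom;
    if (t <= 0.0f) return std::nullopt;                  // plane is behind the ray
    return o + t * d;
}
```

For example, a camera at (0, 5, -10) looking along (0, -1, 1) hits the floor plane y = 0 at (0, 0, -5), and the reflected direction is (0, 1, 1). Planar receivers keep reflection rays coherent, which matches the "reflection locality" point above.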

[Screenshot: Soft reflections, Mirror’s Edge]

GPGPU

GPGPU uses
- Effect physics
  - Particle vs world soft collision
- AI pathfinding
- AI visibility
  - View rasterization. Obstruction from smoke & foliage
- Procedural animation
  - Trees, undergrowth, hair
- Post-processing

CUDA DOF post-process filter
- Thesis work at DICE [4]
  - Test CUDA and performance
  - Poisson disc blur
  - Multi-pass diffusion
  - Separable diffusion
- Good:
  - Easy to learn (C)
  - Map complex algorithms
  - Thread & memory control
- Bad:
  - Performance vs shaders
  - Beta interop
  - Vendor-specific

[Figures: Circle of confusion map; Output]
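The slides do not say how the circle-of-confusion map is computed. A common choice is the thin-lens model, sketched below; the parameter names and the pixel-radius conversion are assumptions, not necessarily what the thesis work [4] used.

```cpp
#include <cassert>
#include <cmath>

// Thin-lens circle-of-confusion diameter (on the sensor) for an object at
// distance S2 when a lens of focal length f and aperture diameter A is
// focused at distance S1 (all lengths in the same unit, S1 > f):
//   c = A * |S2 - S1| / S2 * f / (S1 - f)
float circleOfConfusion(float aperture, float focalLength, float focusDist,
                        float objectDist) {
    assert(focusDist > focalLength && objectDist > 0.0f);
    return aperture * std::fabs(objectDist - focusDist) / objectDist *
           focalLength / (focusDist - focalLength);
}

// Blur radius in pixels for the CoC map: scale the diameter from sensor
// height to image height, halve it, and clamp to the filter's maximum.
float cocRadiusPixels(float coc, float sensorHeight, int imageHeight, float maxRadius) {
    assert(sensorHeight > 0.0f);
    float radius = 0.5f * coc / sensorHeight * float(imageHeight);
    return std::fmin(radius, maxRadius);
}
```

Objects at the focus distance get a zero radius, and the blur grows faster in front of the focal plane than behind it, which is what the per-pixel CoC map feeds into the Poisson disc or diffusion filters.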

GPU compute programming model
- Wanted:
  - Easy & efficient Direct3D 10 interop
  - Low-latency compute tasks
  - Vendor-independent base interface
    - OpenCL?
  - Efficient CPU multi-core backend
    - Server, older GPUs, debugging
    - MCUDA [5]
  - Eventually platform-independent
    - Future consoles
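An efficient CPU backend for compute kernels, in the spirit of MCUDA [5], can be sketched by running the logical threads of each block as a serial loop and distributing blocks over CPU threads. This is a simplified illustration with a hypothetical `launchOnCpu` function: MCUDA also handles barriers within a block, which this sketch does not support.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// CUDA-style built-in indices, passed explicitly to the kernel.
struct KernelIndex { int blockIdx, threadIdx, blockDim; };

using Kernel = std::function<void(const KernelIndex&)>;

// Run a 1D grid of `gridDim` blocks with `blockDim` logical threads each on
// `workers` CPU threads. Blocks are handed out dynamically; the logical
// threads of one block run as a serial loop, which is only valid for kernels
// without barriers (__syncthreads) or other intra-block communication.
void launchOnCpu(const Kernel& kernel, int gridDim, int blockDim, unsigned workers) {
    assert(gridDim >= 0 && blockDim > 0 && workers > 0);
    std::atomic<int> nextBlock{0};
    auto worker = [&] {
        for (int b = nextBlock++; b < gridDim; b = nextBlock++)
            for (int t = 0; t < blockDim; ++t)
                kernel({b, t, blockDim});
    };
    std::vector<std::thread> pool;
    for (unsigned i = 0; i < workers; ++i) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
}
```

For example, a vector-add kernel computes `i = blockIdx * blockDim + threadIdx` and writes `c[i] = a[i] + b[i]` when `i < n`; the same kernel body could then run unchanged on a server or on a machine without a capable GPU, which is the use case listed above.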

Conclusions
- Shader subroutines
- More software-controlled pipeline
- More texture sampler functionality
- Limited-case raytracing
- GPU compute for games

Questions?

Contact: [email protected]

References
[1] Tatarchuk, Natalya & Andersson, Johan. "Rendering Architecture and Real-time Procedural Shading & Texturing Techniques". GDC 2007.
[2] Andersson, Johan. "Terrain Rendering in Frostbite using Procedural Shader Splatting". Siggraph 2007.
[3] Christensen, Per H. & Batali, Dana. "An Irradiance Atlas for Global Illumination in Complex Production Scenes". Eurographics Symposium on Rendering 2004.
[4] Lonroth, Per & Unger, Mattias. "Advanced Real-time Post-Processing using GPGPU techniques". Master thesis, 2008.
[5] Stratton, John, Stone, Sam & Hwu, Wen-mei. "MCUDA: An Efficient Implementation of CUDA Kernels on Multi-cores". Technical report IMPACT-08-01, University of Illinois at Urbana-Champaign, March 2008.

Bonus slides

Real-time REYES
- Very interesting
  - Displacement mapping & procedurals
  - Stochastic sampling
  - Potentially more efficient & general, compared to maxed-out rasterization & tessellation on everything (= pixel-sized triangles)
- But:
  - No experience
  - More research & experimentation needed

Terrain detail
- Deriving normals from the heightfield works well in the distance
- Future: HW tessellation & procedural displacement shaders for up-close ground detail

Texture arrays
- Use cases:
  - Everything! Rich parameterized shaders
    - Vary slice index per instance, triangle or texel
    - Instancing without compromising on variation or perf.
  - Cascaded shadow maps
    - HW PCF only in DX 10.1
    - Stable Cascaded Bounding Box Shadow Maps
  - Sparse textures
    - More slices plz
    - For tile pools: 64x64x8192
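One common way to get stable cascades is to keep each cascade's light-space window a fixed size and snap its origin to whole shadow-map texels, so shadow edges do not shimmer as the camera moves. The sketch assumes this approach; the slide only names the technique, and `stableCascadeWindow` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cmath>

struct OrthoWindow { float minX, minY, maxX, maxY; };

// Light-space orthographic window for one cascade. The window keeps a fixed
// size (2 * halfExtent, e.g. from a bounding volume that does not change size
// as the camera rotates) and its center is snapped to whole shadow-map texels,
// so shadow texels stay put as the camera translates.
OrthoWindow stableCascadeWindow(float centerX, float centerY, float halfExtent,
                                int resolution) {
    assert(halfExtent > 0.0f && resolution > 0);
    float texelSize = 2.0f * halfExtent / float(resolution);
    float snappedX = std::floor(centerX / texelSize) * texelSize;
    float snappedY = std::floor(centerY / texelSize) * texelSize;
    return {snappedX - halfExtent, snappedY - halfExtent,
            snappedX + halfExtent, snappedY + halfExtent};
}
```

With a 128-unit window on a 1024-texel map, each texel covers 0.125 units, so camera movements smaller than a texel leave the window unchanged. Storing all cascades as slices of one texture array lets a single shader select the cascade by slice index.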

Other raytracing uses
- Global Illumination & Ambient Occlusion
  - Incremental Photon Mapping?
- Async collision raycasts
  - AI pathfinding, gameplay, sound obstruction
  - Separate collision world from visual world
  - CPU job-based now
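Async collision raycasts against a separate collision world can be queued and processed as a batch by a CPU job. A minimal sketch of the per-batch work follows, using the standard slab test against axis-aligned boxes as a stand-in for the real collision geometry (an assumption; the slide does not specify the collision representation). `Ray`, `Aabb` and `castRayBatch` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <utility>
#include <vector>

struct Ray { float o[3], d[3]; };
struct Aabb { float lo[3], hi[3]; };

// Slab test: nearest hit distance (t >= 0) of `ray` with `box`, or +infinity.
// Axis-parallel rays give +-infinity slab distances; a NaN from an origin
// exactly on a slab plane is ignored by std::max/std::min below.
float rayAabb(const Ray& ray, const Aabb& box) {
    float tmin = 0.0f, tmax = std::numeric_limits<float>::infinity();
    for (int a = 0; a < 3; ++a) {
        assert(box.lo[a] <= box.hi[a]);
        float inv = 1.0f / ray.d[a];
        float t0 = (box.lo[a] - ray.o[a]) * inv;
        float t1 = (box.hi[a] - ray.o[a]) * inv;
        if (t0 > t1) std::swap(t0, t1);
        tmin = std::max(tmin, t0);
        tmax = std::min(tmax, t1);
        if (tmin > tmax) return std::numeric_limits<float>::infinity();
    }
    return tmin;
}

// Process one batch of queued raycasts against the collision world (here just
// a list of boxes): index of the closest box per ray, or -1 on a miss.
std::vector<int> castRayBatch(const std::vector<Ray>& rays, const std::vector<Aabb>& world) {
    std::vector<int> hits(rays.size(), -1);
    for (std::size_t r = 0; r < rays.size(); ++r) {
        float best = std::numeric_limits<float>::infinity();
        for (std::size_t b = 0; b < world.size(); ++b) {
            float t = rayAabb(rays[r], world[b]);
            if (t < best) { best = t; hits[r] = int(b); }
        }
    }
    return hits;
}
```

Because each batch only reads the collision world and writes its own result array, batches from AI, gameplay and sound can run as independent jobs and be consumed a frame later.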