MANTLE FOR DEVELOPERS JOHAN ANDERSSON – TECHNICAL DIRECTOR FROSTBITE ELECTRONIC ARTS

DESCRIPTION

Keynote presentation about Mantle by Johan Andersson at AMD Developer Summit 2013 (APU13)

Page 1: Mantle for Developers

MANTLE FOR DEVELOPERS

JOHAN ANDERSSON – TECHNICAL DIRECTOR, FROSTBITE

ELECTRONIC ARTS

Page 2: Mantle for Developers

Simplify advanced development

Improve performance

Enable developers to innovate

Challenge the status quo

Mantle?

Page 3: Mantle for Developers
Page 4: Mantle for Developers

Developer impact areas

‒ Control
‒ CPU performance
‒ GPU performance
‒ Programmability
‒ Platforms

Page 5: Mantle for Developers

Control

New model

Traditional Model: Black Box
Middle-ground abstraction – compromise between performance & “usability”
‒ Hidden resource memory & state
‒ Resource CPU access tied to device context
‒ Driver analyzes & synchronizes implicitly

Explicit Model: Mantle
Thin low-level abstraction to expose how hardware works
‒ App explicit memory management
‒ Resources are globally accessible
‒ App explicit resource state transitions

Page 6: Mantle for Developers

Control

App responsibility

Tell when render target will be used as a texture
‒ And many more resource state transitions

Don’t destroy resources that GPU is using
‒ Keep track with fences or frames

Manual dynamic resource renaming
‒ No DISCARD for driver resource renaming

Resource memory tiling

Powerful validation layer will help!
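The rule about not destroying resources that the GPU is still using can be sketched with a deferred-release queue keyed by frame. This is a minimal Python model, not Mantle code: `DeferredReleaseQueue`, `release`, and `on_frame_completed` are hypothetical names, and the fence is reduced to the index of the last frame the GPU has finished.

```python
from collections import deque


class DeferredReleaseQueue:
    """Delays destruction until the GPU has finished the last frame using a resource.

    Hypothetical model: a real renderer would poll a GPU fence instead of
    being told the completed frame index. Assumes release() is called with
    non-decreasing frame indices.
    """

    def __init__(self):
        self._pending = deque()  # (last_used_frame, resource), oldest first
        self.destroyed = []      # resources that were actually freed

    def release(self, resource, last_used_frame):
        # Only record the request; nothing is freed yet.
        self._pending.append((last_used_frame, resource))

    def on_frame_completed(self, completed_frame):
        # Free everything whose last use is at or before the completed frame.
        while self._pending and self._pending[0][0] <= completed_frame:
            _, resource = self._pending.popleft()
            self.destroyed.append(resource)


queue = DeferredReleaseQueue()
queue.release("shadow_map", last_used_frame=10)
queue.release("old_vertex_buffer", last_used_frame=11)
queue.on_frame_completed(10)  # the GPU has finished frame 10 only
```

After the call above only `shadow_map` has been freed; `old_vertex_buffer` stays alive until frame 11 is reported complete.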

Page 7: Mantle for Developers

Control

Explicit control enables

App high-level decisions & optimizations
‒ Has full scene information
‒ Easier to optimize performance & memory

Flexible & efficient memory management
‒ Linear frame allocators
‒ Memory pools
‒ Pinned memory

Reduced development time
‒ For advanced game engines & apps
‒ Easier to get to target performance & robustness
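A linear frame allocator, one of the memory strategies listed above, can be sketched in a few lines. This is an illustrative Python model under stated assumptions: the class name, the 256-byte default alignment, and the capacity are all hypothetical, and offsets stand in for GPU memory addresses.

```python
class LinearFrameAllocator:
    """Bump allocator over one fixed-size block, reset once per frame."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.offset = 0

    def allocate(self, size, alignment=256):
        # Round the current offset up to the requested alignment.
        start = (self.offset + alignment - 1) // alignment * alignment
        if start + size > self.capacity:
            raise MemoryError("frame allocator exhausted")
        self.offset = start + size
        return start

    def reset(self):
        # Call only once the GPU no longer reads this frame's data.
        self.offset = 0


allocator = LinearFrameAllocator(capacity=1024)
constants = allocator.allocate(100)  # placed at offset 0
vertices = allocator.allocate(200)   # placed at offset 256 (aligned up from 100)
```

Allocation is a single addition and freeing is a single reset, which is why this pattern suits data that lives for exactly one frame.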

Page 8: Mantle for Developers

Control

Explicit control enables

Light-weight driver
‒ Easier to develop & maintain
‒ Reduced CPU draw call overhead

Transient resources
‒ Alias render targets within frame
‒ Major memory savings
‒ No need to pre-allocate everything
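Aliasing render targets within a frame amounts to letting resources with non-overlapping lifetimes share memory. The sketch below is a hypothetical planning pass in Python, not a Mantle feature: the function name, the greedy first-fit policy, and the sizes and pass indices are all made up for illustration.

```python
def alias_transient_resources(resources):
    """Assign each resource to a memory slot, reusing slots whose
    previous occupant is no longer live.

    resources: list of (name, size, first_pass, last_pass).
    Returns (assignment, total_size).
    """
    slots = []       # each slot: {"size": int, "free_after": int}
    assignment = {}
    for name, size, first, last in sorted(resources, key=lambda r: r[2]):
        for index, slot in enumerate(slots):
            if slot["free_after"] < first:
                # Slot is idle again: alias it, growing it if needed.
                slot["size"] = max(slot["size"], size)
                slot["free_after"] = last
                assignment[name] = index
                break
        else:
            # No idle slot: open a new one.
            slots.append({"size": size, "free_after": last})
            assignment[name] = len(slots) - 1
    return assignment, sum(slot["size"] for slot in slots)


frame = [
    ("gbuffer", 32, 0, 2),
    ("shadow_map", 16, 1, 2),
    ("bloom", 8, 3, 4),
    ("tonemap", 32, 4, 5),
]
assignment, aliased_size = alias_transient_resources(frame)
unaliased_size = sum(size for _, size, _, _ in frame)
```

In this made-up frame, `bloom` reuses the memory of `gbuffer` and `tonemap` reuses that of `shadow_map`, so the plan needs 64 units instead of 88.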

Page 9: Mantle for Developers

CPU performance

Page 10: Mantle for Developers

CPU perf

Core concepts

‒ Descriptor sets
‒ Monolithic pipelines
‒ Command buffers

Page 11: Mantle for Developers

CPU perf

Descriptor sets

Table with resource references to bind to graphics or compute pipeline

Replaces traditional resource stage binding
‒ Major performance & flexibility advantage
‒ Closer to how the hardware works

App managed - lots of strategies possible!
‒ Tiny vs huge sets
‒ Single vs multiple
‒ Static vs semi-static vs dynamic

Example 1: Single simple dynamic descriptor set
‒ Bind everything you need for a single draw call
‒ Close to DX/GL model but share between stages

[Diagram: Dynamic descriptor set]
‒ VertexBuffer (VS)
‒ Texture0 (VS+PS)
‒ Constants (VS)
‒ Texture1 (PS)
‒ Texture2 (PS)
‒ Sampler0 (VS+PS)
Legend (slot types): Link, Sampler, Image, Memory

Page 12: Mantle for Developers

CPU perf

Descriptor sets

Example 2: Reuse static set with nesting
‒ Reduce update time & memory usage

[Diagram: a dynamic descriptor set and a static descriptor set connected through a Link slot. Entries shown: VertexBuffer (VS), Texture0 (VS+PS), Constants (VS), Link, Texture1 (PS), Texture2 (PS), Texture3 (PS), Texture4 (PS), Sampler0 (VS+PS), Sampler1 (PS). Legend (slot types): Link, Sampler, Image, Memory.]
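The nesting idea from Example 2 can be modelled as a table whose slots hold either a resource or a link to another table. This is a conceptual Python sketch only: `DescriptorSet`, `attach`, `link`, and `resolve` are hypothetical names, and `resolve` exists purely to inspect the result (hardware would follow the link rather than flatten it).

```python
class DescriptorSet:
    """Table of slots; a slot holds a resource name or a link to another set."""

    def __init__(self, slot_count):
        self.slots = [None] * slot_count

    def attach(self, index, resource):
        self.slots[index] = resource

    def link(self, index, other_set):
        # Nesting: the slot references a whole descriptor set.
        self.slots[index] = other_set

    def resolve(self):
        """Flatten the set in slot order, following links (inspection only)."""
        resolved = []
        for slot in self.slots:
            if isinstance(slot, DescriptorSet):
                resolved.extend(slot.resolve())
            elif slot is not None:
                resolved.append(slot)
        return resolved


# Built once and shared by many draws.
static_set = DescriptorSet(2)
static_set.attach(0, "Texture1")
static_set.attach(1, "Sampler0")


def make_dynamic_set(constants):
    # Only the per-draw slots are written each time.
    dynamic = DescriptorSet(2)
    dynamic.attach(0, constants)
    dynamic.link(1, static_set)
    return dynamic


draw_a = make_dynamic_set("ConstantsA")
draw_b = make_dynamic_set("ConstantsB")
```

Both draws share the same `static_set` object, so per-draw update work is limited to one constants slot and one link.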

Page 13: Mantle for Developers

CPU perf

Monolithic pipelines

Shader stages & select graphics state combined into single object
‒ No runtime compilation or patching needed!
‒ Significantly less runtime overhead to use

Supports parallel building & caching
‒ Fast loading times

Usage & management up to the app
‒ Static vs dynamic creation
‒ Amount of pipelines
‒ State usage

[Diagram: pipeline state object spanning the stages IA, VS, HS, Tessellator, DS, GS, RS, PS, DB, and CB.]
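Because a monolithic pipeline combines shaders and fixed state into one object, an app can cache it under a key made of exactly those inputs. The following Python sketch assumes a hypothetical `PipelineCache` and a stand-in `fake_build` function; it illustrates the caching idea and is not how Mantle creates pipelines.

```python
class PipelineCache:
    """Builds each (shaders, state) combination once and reuses it."""

    def __init__(self, build_fn):
        self._build = build_fn
        self._pipelines = {}
        self.builds = 0

    def get(self, shaders, state):
        # Shaders and fixed state together form the lookup key.
        key = (tuple(shaders), tuple(sorted(state.items())))
        if key not in self._pipelines:
            self._pipelines[key] = self._build(shaders, state)
            self.builds += 1
        return self._pipelines[key]


def fake_build(shaders, state):
    # Stand-in for the expensive compile-and-link step.
    return {"shaders": tuple(shaders), "state": dict(state)}


cache = PipelineCache(fake_build)
opaque = cache.get(["vs_mesh", "ps_lit"], {"blend": "off", "depth_test": True})
again = cache.get(["vs_mesh", "ps_lit"], {"depth_test": True, "blend": "off"})
transparent = cache.get(["vs_mesh", "ps_lit"], {"blend": "alpha", "depth_test": True})
```

The second lookup hits the cache even though the state dictionary was written in a different order; only the changed blend mode triggers a second build.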

Page 14: Mantle for Developers

CPU perf

Command buffers

Issue pipelined graphics & compute commands into a command buffer
‒ Bind graphics state, descriptor sets, pipeline
‒ Draw calls
‒ Render targets
‒ Clears
‒ Memory transfers
‒ NOT: resource mapping

Fully independent objects
‒ Create multiple every frame
‒ Or pre-build up front and reuse
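The record-then-submit model of command buffers can be illustrated with a small Python sketch. `CommandBuffer` and `Queue` here are hypothetical classes that store commands as plain tuples; they mirror the concepts on this slide rather than any real API.

```python
class CommandBuffer:
    """Records commands as data; nothing runs until it is submitted."""

    def __init__(self):
        self.commands = []

    def bind_pipeline(self, pipeline):
        self.commands.append(("bind_pipeline", pipeline))

    def bind_descriptor_set(self, descriptor_set):
        self.commands.append(("bind_descriptor_set", descriptor_set))

    def draw(self, vertex_count):
        self.commands.append(("draw", vertex_count))


class Queue:
    def __init__(self):
        self.executed = []

    def submit(self, command_buffer):
        # Submission replays the recorded commands in order.
        self.executed.extend(command_buffer.commands)


# Pre-build once up front...
prebuilt = CommandBuffer()
prebuilt.bind_pipeline("opaque")
prebuilt.bind_descriptor_set("scene")
prebuilt.draw(36)

# ...and reuse it on two frames without re-recording.
queue = Queue()
queue.submit(prebuilt)
queue.submit(prebuilt)
```

Since recording touches only the command buffer object itself, several of them can be filled independently, which is what the parallel dispatch slide builds on.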

Page 15: Mantle for Developers

CPU perf

DX/GL parallelism

‒ Automatically extracts parallelism out of most apps
‒ Doesn’t scale beyond 2-3 cores
‒ Additional latency
‒ Driver thread often bottleneck – can collide with app threads

[Diagram: timeline over CPU 0, CPU 1, and CPU 2 showing Game, Render, and Driver Render tasks.]

Page 16: Mantle for Developers

CPU perf

Parallel dispatch with Mantle

‒ App can go fully wide with its rendering – minimal latency
‒ Close to linear scaling with CPU cores
‒ No driver threads – no overhead – no contention
‒ Frostbite’s approach on all consoles – and on PC with Mantle!

[Diagram: timeline over CPU 0 to CPU 4 showing Game tasks alongside many Render tasks running in parallel.]
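Going wide with rendering means each worker records its own command buffer and the results are submitted in a fixed order. The Python sketch below shows only that structure, under assumptions: `record_chunk` and `build_frame` are hypothetical, the draw list is synthetic, and Python threads illustrate the pattern without delivering the CPU scaling a native engine would see.

```python
from concurrent.futures import ThreadPoolExecutor


def record_chunk(draws):
    """Record one independent command list for a slice of the scene."""
    return [("draw", name) for name in draws]


def build_frame(draw_calls, worker_count):
    # Split the draw list into contiguous chunks, at most one per worker.
    chunk_size = max(1, -(-len(draw_calls) // worker_count))  # ceiling division
    chunks = [draw_calls[i:i + chunk_size]
              for i in range(0, len(draw_calls), chunk_size)]
    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        # map() yields results in chunk order, so submission order is
        # deterministic even though recording runs concurrently.
        command_buffers = list(pool.map(record_chunk, chunks))
    submitted = []
    for command_buffer in command_buffers:
        submitted.extend(command_buffer)
    return command_buffers, submitted


draws = [f"mesh{i}" for i in range(10)]
command_buffers, submitted = build_frame(draws, worker_count=4)
```

No shared state is written during recording, so no locks are needed; ordering is restored only at submission time.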

Page 17: Mantle for Developers

GPU performance

Page 18: Mantle for Developers

GPU perf

GPU optimizations

Thanks to improved CPU performance – CPU will rarely be a bottleneck for the GPU
‒ CPU could help GPU more:
  ‒ Less brute force rendering
  ‒ Improve culling

Shader pipeline object – driver optimizations
‒ Can optimize with pipeline state knowledge
‒ Can optimize across all shader stages

Resource states
‒ Gives driver a lot more knowledge & flexibility
‒ Apps can avoid expensive/redundant transitions, such as surface decompression

Expose existing GPU functionality
‒ Quad & Rect-lists
‒ HW-specific MSAA & depth data access
‒ Programmable sample patterns
‒ And more...
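Avoiding redundant resource state transitions can be sketched as a tracker that emits a transition only when the requested state actually changes. This is a hypothetical Python model: the class, the method name, and the state names (`render_target`, `shader_read`, `undefined`) are illustrative and not taken from Mantle.

```python
class ResourceStateTracker:
    """Remembers each resource's current state and records a transition
    only when the requested state differs."""

    def __init__(self):
        self._states = {}
        self.transitions = []  # (resource, old_state, new_state)

    def require(self, resource, new_state):
        old_state = self._states.get(resource, "undefined")
        if old_state == new_state:
            return False  # redundant request: nothing to record
        self.transitions.append((resource, old_state, new_state))
        self._states[resource] = new_state
        return True


tracker = ResourceStateTracker()
tracker.require("scene_color", "render_target")
tracker.require("scene_color", "render_target")  # skipped
tracker.require("scene_color", "shader_read")    # render target used as a texture
tracker.require("scene_color", "shader_read")    # skipped
```

Four requests produce two recorded transitions, which is the kind of saving the slide refers to when a transition implies costly work such as surface decompression.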

Page 19: Mantle for Developers

GPU perf

Queues

Modern GPUs are heterogeneous machines with multiple engines
‒ Graphics pipeline
‒ Compute pipeline(s)
‒ DMA transfer
‒ Video encode/decode
‒ More…

Mantle exposes queues for the engines + synchronization primitives

[Diagram: queues feeding the GPU: Graphics, Compute, DMA, and further queues.]

Page 20: Mantle for Developers

Page 21: Mantle for Developers

GPU perf

Queue use cases

Async DMA transfers
‒ Copy resources in parallel with graphics or compute

[Diagram: the Graphics queue runs Render, Other render, and Use copy while the DMA queue runs Copy.]

Page 22: Mantle for Developers

GPU perf

Queue use cases

Async compute together with graphics
‒ ALU heavy compute work at the same time as memory/ROP bound work to utilize idle units

[Diagram: the Graphics queue runs GBuffer, Shadowmap 0, Shadowmap 1, and Final lighting while the Compute queue runs Non-shadowed lighting.]

Page 23: Mantle for Developers

GPU perf

Queue use cases

Multiple compute kernels collaborating
‒ Can be faster than über-kernel
‒ Example: Compute geometry backend & compute rasterizer

[Diagram: Compute 0, Compute 1, and Graphics queues; tasks shown: Compute Geometry, Compute Rasterizer, Ordinary Rendering.]

Page 24: Mantle for Developers

[Diagram: the Compute queue runs Process tasks while the Graphics queue runs Draw0, Draw1, and Draw2.]

GPU perf

Queue use cases

Compute as frontend for graphics pipeline
‒ Compute runs asynchronously ahead and prepares & optimizes geometry for graphics pipeline

Page 25: Mantle for Developers

GPU perf

Queue use cases

Game engines will build large GPU job graphs
‒ Move away from single sequential submission
‒ Just as we already have done on CPU
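A GPU job graph can be sketched as jobs tagged with a queue and a list of dependencies, ordered topologically before submission. The Python below is a hypothetical scheduler using Kahn's algorithm: the job names, queue names, and the `schedule` function are assumptions for illustration, and the returned `waits` list only marks where a cross-queue synchronization primitive would be needed.

```python
from collections import deque


def schedule(jobs):
    """Topologically order jobs, then split them per queue.

    jobs: {name: (queue, [dependency names])}
    Returns (order, per_queue, waits), where waits lists (job, dependency)
    pairs that cross queues and therefore need explicit synchronization.
    """
    remaining = {name: len(deps) for name, (_, deps) in jobs.items()}
    dependents = {name: [] for name in jobs}
    for name, (_, deps) in jobs.items():
        for dep in deps:
            dependents[dep].append(name)

    ready = deque(name for name, count in remaining.items() if count == 0)
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for dependent in dependents[name]:
            remaining[dependent] -= 1
            if remaining[dependent] == 0:
                ready.append(dependent)
    if len(order) != len(jobs):
        raise ValueError("job graph contains a cycle")

    per_queue, waits = {}, []
    for name in order:
        queue, deps = jobs[name]
        per_queue.setdefault(queue, []).append(name)
        waits.extend((name, dep) for dep in deps if jobs[dep][0] != queue)
    return order, per_queue, waits


jobs = {
    "gbuffer": ("graphics", []),
    "shadow_maps": ("graphics", []),
    "texture_upload": ("dma", []),
    "light_culling": ("compute", ["gbuffer"]),
    "final_lighting": ("graphics", ["light_culling", "shadow_maps", "texture_upload"]),
}
order, per_queue, waits = schedule(jobs)
```

Dependencies inside one queue are satisfied by submission order alone; only the three cross-queue edges in `waits` would require a semaphore-style primitive.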

Page 26: Mantle for Developers

Programmability

Page 27: Mantle for Developers

Programmability

Explicit Multi-GPU

Explicit control of GPU queues and synchronization, finally!
‒ Implement your own Alternate-Frame-Rendering
‒ Or something more exotic...

Use case: Workstation rendering with 4-8 GPUs
‒ Super high-quality rendering & simulation
‒ Load balance graphics & compute job graphs across GPUs
‒ 20-40 TFlops in a single machine!

Use case: Low-latency rendering
‒ Important for VR and competitive games
‒ Latency optimized GPU job graph scheduling
‒ VR: Simultaneously drive 2 GPUs (1 per eye)
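With explicit multi-GPU control, the work distribution policy becomes ordinary app code. The two Python functions below are hypothetical illustrations of the policies named on this slide: round-robin alternate-frame rendering, and one GPU per eye for VR. The single-GPU fallback in `assign_eyes` is an assumption added for completeness.

```python
def assign_frames_afr(frame_count, gpu_count):
    """Alternate-frame rendering: frame N goes to GPU N mod gpu_count."""
    return [frame % gpu_count for frame in range(frame_count)]


def assign_eyes(gpu_count):
    """One GPU per eye when two are available, otherwise share GPU 0."""
    if gpu_count >= 2:
        return {"left": 0, "right": 1}
    return {"left": 0, "right": 0}


afr = assign_frames_afr(frame_count=6, gpu_count=2)
eyes = assign_eyes(gpu_count=2)
```

The per-eye split keeps both halves of the same frame in flight at once, which is why it targets latency, whereas AFR raises throughput by overlapping different frames.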

Page 28: Mantle for Developers

Programmability

New mechanisms

Command buffer predication & flow control
‒ GPU affecting/skipping submitted commands
‒ Go beyond DrawIndirect / DispatchIndirect
‒ Advanced variable workloads
‒ Advanced culling optimizations

Write occlusion query results into GPU buffer
‒ No CPU roundtrip needed
‒ Can drive predicated rendering
‒ Or use results directly in shaders (lens flares)
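Predicated rendering driven by occlusion results can be modelled as a command list in which a draw is skipped when a buffer slot holds zero. This Python sketch is an illustration under assumptions: the command tuples and `execute` are hypothetical, and the sample counts written by `write_occlusion_result` are hardcoded stand-ins for what the GPU would measure.

```python
def execute(commands, gpu_buffer):
    """Run commands in order; a predicated draw is skipped when the
    buffer slot it names holds zero."""
    drawn = []
    for command in commands:
        kind = command[0]
        if kind == "write_occlusion_result":
            _, slot, visible_samples = command
            gpu_buffer[slot] = visible_samples  # result never leaves the "GPU"
        elif kind == "draw":
            _, name, predicate_slot = command
            if predicate_slot is None or gpu_buffer[predicate_slot] != 0:
                drawn.append(name)
    return drawn


# Recorded once, before any query result is known.
commands = [
    ("write_occlusion_result", 0, 120),  # statue bounding box: visible
    ("write_occlusion_result", 1, 0),    # crate bounding box: hidden
    ("draw", "terrain", None),           # unpredicated
    ("draw", "statue", 0),
    ("draw", "crate", 1),
]
drawn = execute(commands, gpu_buffer=[0, 0])
```

The command list itself never changes; the decision to skip the hidden object is taken at execution time from data the CPU never reads back.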

Page 29: Mantle for Developers

Programmability

Bindless resources

Mantle supports bindless resources
‒ Shaders can select resources to use instead of static binding from CPU
‒ Extension of the descriptor set support

Key component that will open up a lot of opportunities!

Examples
‒ Performance optimizations – less data to update
‒ Logic & data structures that live fully on the GPU
  ‒ Scene culling & rendering
‒ Material representations
  ‒ Deferred shading
  ‒ Raytracing
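The difference between static binding and bindless access can be shown with a toy shading function. In this hypothetical Python sketch, textures are plain functions and `texture_table` stands in for a large descriptor set; none of the names come from Mantle.

```python
def shade_bound(material_texture, uv):
    """Traditional model: the CPU bound exactly one texture before the draw."""
    return material_texture(uv)


def shade_bindless(texture_table, material_index, uv):
    """Bindless model: the shader picks the texture itself, by index."""
    return texture_table[material_index](uv)


# Stand-ins for textures: functions from a UV coordinate to a colour.
texture_table = [
    lambda uv: ("brick", uv),
    lambda uv: ("metal", uv),
]

# One draw covers objects with different materials; the per-object index
# would come from GPU-resident data rather than a CPU-side rebind.
object_materials = [1, 0, 1]
colours = [shade_bindless(texture_table, m, (0.5, 0.5)) for m in object_materials]
```

With the index living in data, material selection can happen inside a deferred shading or raytracing pass without splitting the work into one draw per material.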

Page 30: Mantle for Developers

Platforms

Page 31: Mantle for Developers

Platforms

Today

Mantle gives us strong benefits on Windows today
‒ Console-like performance & programmability on both Windows 7 and Windows 8
‒ For us, well worth the dev time!

DX & GL are the industry standards
‒ Needed for platforms that do not support Mantle
‒ Needed by devs who do not want/need more control
‒ Have to have fallback paths for GL/DX, but not limit oneself to it

Mantle and PlayStation 4 will drive our future Frostbite designs & optimizations
‒ PS4 graphics API has great programmability & performance as well
‒ Share concepts, methods & optimization strategies

Page 32: Mantle for Developers

Platforms

Linux & Mac

Want to see Mantle on Linux and Mac!
‒ Would enable support for our full engine & rendering
‒ Significantly easier to do efficient renderer with Mantle than with OpenGL

Use cases:
‒ Workstations
‒ R&D
  ‒ Not limited by WDDM
‒ Games
  ‒ Mantle + SteamOS = powerful combination!

Page 33: Mantle for Developers

Platforms

Mobile

Mobile architectures are getting closer in capabilities to desktop GPUs

Want graphics API that allows apps to fully utilize the hardware
‒ Power efficient
‒ High performance
‒ Programmable

Major opportunity with Mantle – leapfrog GL4, DX11
‒ For mobile SoC vendors
‒ For Google and Apple

Page 34: Mantle for Developers

Platforms

Multi-vendor?

Mantle is designed to be a thin hardware abstraction
‒ Not tied to AMD’s GCN architecture
‒ Forward compatible
‒ Extensions for architecture- and platform-specific functionality

Mantle would be a much more efficient graphics API for other vendors as well
‒ Most Mantle functionality can be supported on today’s modern GPUs

Want to see future version of Mantle supported on all platforms and on all modern GPUs!
‒ Become an active industry standard with IHVs and ISVs collaborating
‒ Enable us developers to innovate with great performance & programmability everywhere

Page 35: Mantle for Developers
Page 36: Mantle for Developers

Frostbite

Battlefield 4

Mantle support is in development
‒ Core renderer (closer to PS4 than DX11)
‒ Implement all rendering techniques used in BF4 (many!)
‒ CPU optimizations (parallel dispatch, descriptor sets)
‒ GPU optimizations (minimize transitions, MSAA)
‒ R&D for advanced GPU optimizations
‒ Memory management
‒ Multi-GPU support
‒ ~2 months of work

Update targeting late December

Page 37: Mantle for Developers

Frostbite

Plants vs Zombies: Garden Warfare

Very different rendering compared to BF4

Frostbite Mantle renderer will work out of the box

Focus on APU performance

Page 38: Mantle for Developers

Frostbite

Future

All Frostbite games designed with Mantle
‒ 15 games in development across all of EA

Advanced Mantle rendering & use cases
‒ Lots of exciting R&D opportunities!

Want multi-vendor & multi-platform support!

Page 39: Mantle for Developers

THE END

Email: [email protected]
http://frostbite.com
Twitter: @repi