Vulkan Case Study
2016 Khronos Seoul DevU, SAMSUNG Electronics

Soowan Park, Graphics Engineer ([email protected])
Joonyong Park, Senior Graphics Engineer ([email protected])

Page 2: 2 Vulkan Case Study.pdf

Samsung Electronics

Before we start

All case study information and content is based on our development experience with the Galaxy S7, which spans two chipset variants using the ARM Mali and Qualcomm Adreno GPUs.

Page 3: 2 Vulkan Case Study.pdf

Who we are

• GPU, Graphics R&D, MCD, SAMSUNG Electronics.
• [email protected]

Page 4: 2 Vulkan Case Study.pdf

What we did

ProtoStar, HIT, NFS, Vainglory

MWC, GDC, SDC, E3, Gamescom, CEDEC

Page 5: 2 Vulkan Case Study.pdf

History

Page 7: 2 Vulkan Case Study.pdf

Agenda

1. Swapchain

2. Uniform Buffer

3. GPU Driver

4. Rendering

5. GLES Fall-back

6. Development Tip

Page 8: 2 Vulkan Case Study.pdf

Who is this for?

Android Vulkan developers.

These are very simple cases, but important ones!

Page 9: 2 Vulkan Case Study.pdf

1. Swapchain

Page 10: 2 Vulkan Case Study.pdf

Swapchain - Android

Triple Buffering - Google Project Butter (applied since the Android 4.1 Jelly Bean release)
• Android OpenGL ES runs with triple buffering by default.
• adb shell dumpsys SurfaceFlinger → (output shown on the slide)
• The user can't control the number of back buffers in OpenGL ES. (Diagram: buffers are used in the order #0, #1, #2, #0, #1.)

Image Count of Swapchain
• For this reason, the Android platform requires at least 3 buffers to get better performance.

With Java SurfaceView
• Currently, Android Vulkan only supports native activities. However, there is a way to use SurfaceView with a Java activity: pass the surface handle to native code through JNI and get a NativeWindow handle from it.
  q.v. : https://developer.android.com/ndk/reference/group___native_activity.html
• We recommend running the main render loop on a separate Java-side render thread, as GLSurfaceView does.
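As a minimal sketch of the "at least 3 buffers" guidance, the helper below picks a swapchain image count. The name `chooseSwapchainImageCount` is hypothetical and not from the slides. Its two parameters stand in for the `minImageCount` and `maxImageCount` fields of `VkSurfaceCapabilitiesKHR` (where a `maxImageCount` of 0 means there is no upper limit), passed as plain integers so the example builds without the Vulkan headers.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: ask for triple buffering when the surface allows it.
// minImageCount / maxImageCount mirror the VkSurfaceCapabilitiesKHR fields;
// maxImageCount == 0 means the implementation imposes no upper limit.
uint32_t chooseSwapchainImageCount(uint32_t minImageCount, uint32_t maxImageCount)
{
    const uint32_t preferred = 3; // triple buffering
    uint32_t count = std::max(preferred, minImageCount);
    if (maxImageCount != 0)
    {
        count = std::min(count, maxImageCount);
    }
    return count;
}
```

The result would typically be written to `VkSwapchainCreateInfoKHR::minImageCount`; the implementation may still create more images than requested.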

Page 11: 2 Vulkan Case Study.pdf

Swapchain - Presentation Mode

• VK_PRESENT_MODE_MAILBOX_KHR

(Diagram: swapchain images #0, #1, #2 are each acquired with vkAcquireNextImage and presented with vkQueuePresent. The internal queue (implementation dependent) has a single slot X, which is overwritten by each present: X=#0, then X=#1, then X=#2. At VBLANK, the display controller will read from #1. The diagram also marks the latency.)

Page 12: 2 Vulkan Case Study.pdf

Swapchain - Presentation Mode

• VK_PRESENT_MODE_FIFO_KHR

(Diagram: swapchain images #0, #1, #2 are each acquired with vkAcquireNextImage and presented with vkQueuePresent. The internal queue has slots X, Y, Z, filled in order: X=#0, Y=#1, Z=#2. At VBLANK, the display swaps #0, stored in X, with the backbuffer. The diagram also marks the latency.)

Page 13: 2 Vulkan Case Study.pdf

Swapchain - Presentation Mode

(Figures: one graph for each mode, VK_PRESENT_MODE_MAILBOX_KHR and VK_PRESENT_MODE_FIFO_KHR, each with the 60 FPS line marked.)

※ DO NOT use MAILBOX mode in a game, unless latency is critical and you know what you’re doing.

Page 14: 2 Vulkan Case Study.pdf

Swapchain - Presentation Mode

Code Level (q.v. : https://www.khronos.org/registry/vulkan/specs/1.0-wsi_extensions/xhtml/vkspec.html, 29.5. Surface Queries)

    // Query the present modes supported by this surface.
    uint32_t presentModeCount = 0;
    vkGetPhysicalDeviceSurfacePresentModesKHR(physicalDevice, surface, &presentModeCount, VK_NULL_HANDLE);
    std::vector<VkPresentModeKHR> pPresentModes(presentModeCount);
    vkGetPhysicalDeviceSurfacePresentModesKHR(physicalDevice, surface, &presentModeCount, pPresentModes.data());

    // FIFO is the fallback; every implementation is required to support it.
    VkPresentModeKHR presentMode = VK_PRESENT_MODE_FIFO_KHR;

    // Desired modes, in order of preference.
    const uint32_t desiredArraySize = 2;
    VkPresentModeKHR desiredPresentMode[] =
    {
        VK_PRESENT_MODE_FIFO_KHR,
        VK_PRESENT_MODE_MAILBOX_KHR
    };

    // Pick the first desired mode that the surface supports.
    for (uint32_t d_n = 0; d_n < desiredArraySize; ++d_n)
    {
        for (uint32_t p_n = 0; p_n < presentModeCount; ++p_n)
        {
            if (pPresentModes[p_n] == desiredPresentMode[d_n])
            {
                presentMode = desiredPresentMode[d_n];
                d_n = desiredArraySize; // also ends the outer loop
                break;
            }
        }
    }

Page 15: 2 Vulkan Case Study.pdf

Swapchain - SwapBuffer Comparison (Android)

WSI (Window System Integration)

(Diagram: RENDERFRAME N compared between OpenGL ES and Vulkan, each drawn across the APPLICATION, SURFACE FLINGER, and DISPLAY stages.)

RENDERFRAME N (OpenGL ES)
• glClear / glDrawXXX #0: render into the back buffer (FrameBuffer 0) #0
• eglSwapBuffer #0, glFlush() #0: command flushing & rendering
• EGLSurface GfxBuffer #0, #1, #2 move through the queue / dequeue of the window buffers of the associated native window.
• No way to get GPU rendering completion.

RENDERFRAME N (Vulkan)
• vkAcquireNextImageKHR #0
• Recorded into command buffer #0, vkQueueSubmit #0 on the associated graphics queue: command flushing & rendering
• vkQueuePresentKHR #0, with an internal wait on the rendering-complete semaphore until rendering is complete
• VkImage (buffer) #0, #1, #2 move through the queue / dequeue of the window buffers of the associated native window.
• Two points in the diagram are marked "WILL BLOCK HERE".
• Can explicitly get the GPU rendering completion signal by using the fence from submit.

※ The application does the “blocking wait” to sync with the GPU (VK_PRESENT_MODE_FIFO_KHR).

Page 16: 2 Vulkan Case Study.pdf

Swapchain - Synchronization failed case

• Tearing

Page 17: 2 Vulkan Case Study.pdf

Swapchain - Synchronization

• Fence Logic

VkCommandPool (Single-Thread)

Swapchain
  VkImage #0: VkFence #0, VkCommandBuffer #0
  VkImage #1: VkFence #1, VkCommandBuffer #1
  VkImage #2: VkFence #2, VkCommandBuffer #2

Per-frame sequence, repeated for image #0, then #1, then #2 (shown for index i):
  vkWaitForFences(fence #i)
  vkResetFences(fence #i)
  vkResetCommandBuffer(buf #i)
  vkBeginCommandBuffer(buf #i)
  Render ~
  vkQueueSubmit(fence #i)
  vkQueuePresentKHR
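The fence logic above can be modeled on the CPU without any Vulkan calls. The sketch below is a simplified illustration only: `FrameRing` and its methods are hypothetical names, and a fence is reduced to a boolean flag. In a real renderer, `tryBegin` corresponds to vkWaitForFences, vkResetFences, vkResetCommandBuffer and vkBeginCommandBuffer, and `gpuFinished` is what the driver does when it signals the fence passed to vkQueueSubmit.

```cpp
#include <array>
#include <cstddef>

// Simplified model: one fence flag per swapchain image / command buffer.
struct FrameSlot
{
    // Starts signaled, like a fence created with VK_FENCE_CREATE_SIGNALED_BIT,
    // so the very first wait on each slot does not block.
    bool fenceSignaled = true;
};

class FrameRing
{
public:
    // Returns false where vkWaitForFences would block: the GPU has not yet
    // finished the previous submission recorded in this slot.
    bool tryBegin(std::size_t imageIndex)
    {
        FrameSlot& slot = slots_[imageIndex % slots_.size()];
        if (!slot.fenceSignaled)
        {
            return false; // command buffer is still in use by the GPU
        }
        slot.fenceSignaled = false; // vkResetFences, then re-record the buffer
        return true;
    }

    // Stands in for the GPU signaling the fence given to vkQueueSubmit.
    void gpuFinished(std::size_t imageIndex)
    {
        slots_[imageIndex % slots_.size()].fenceSignaled = true;
    }

private:
    std::array<FrameSlot, 3> slots_{};
};
```

With three slots, the fourth frame reuses slot #0 and can only start once the first frame's fence has signaled, which is exactly the wait shown at the top of each sequence.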

Page 18: 2 Vulkan Case Study.pdf

Image Layout - Swapchain

• Transition to the correct image layout for presenting and rendering.
• At the very beginning of drawing, after the first acquire
  • vkGetSwapchainImagesKHR : VK_IMAGE_LAYOUT_UNDEFINED
  • VK_IMAGE_LAYOUT_GENERAL
  • Clear presentable image
• Draw Routine
  • Acquire
  • VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL
  • Render
  • VK_IMAGE_LAYOUT_PRESENT_SRC_KHR
  • Present

    // Create Swapchain
    vkGetSwapchainImagesKHR(device, swapchain, &swapchainImageCount, pSwapchainImages); // VK_IMAGE_LAYOUT_UNDEFINED

    // Frame loop
    swapchainIndex = acquire();
    if (firstAcquire)
    {
        setImagesLayout(pSwapchainImages, swapchainImageCount, VK_IMAGE_LAYOUT_GENERAL);
        clearImages(pSwapchainImages, swapchainImageCount);
        firstAcquire = false; // clear only once
    }
    setImageLayout(pSwapchainImages[swapchainIndex], VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL);
    /* Rendering */
    setImageLayout(pSwapchainImages[swapchainIndex], VK_IMAGE_LAYOUT_PRESENT_SRC_KHR);
    present(swapchainIndex);

Page 19: 2 Vulkan Case Study.pdf

Image Layout - Texture

VK_TILING_LINEAR
• Create with VK_IMAGE_LAYOUT_PREINITIALIZED
• Set image data using vkMapMemory, vkUnmapMemory
• VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL

VK_TILING_OPTIMAL
• Create with VK_IMAGE_LAYOUT_UNDEFINED
• VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL
• Set image data using a staging buffer
• VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL

You can check the format properties like this:

    // Get image format properties
    VkFormatProperties formatProperty;
    vkGetPhysicalDeviceFormatProperties(physicalDevice, imageFormat, &formatProperty);
    if (formatProperty.optimalTilingFeatures & VK_FORMAT_FEATURE_SAMPLED_IMAGE_BIT)
        /**/;
    else if (formatProperty.linearTilingFeatures & VK_FORMAT_FEATURE_SAMPLED_IMAGE_BIT)
        /**/;

Texturing in Vulkan
(Diagram of the objects involved: VkDescriptorSet, VkDescriptorImageInfo, VkSampler, VkImageView, VkImage, VkDeviceMemory.)

Page 20: 2 Vulkan Case Study.pdf

Image Layout - Texture

• Why should we use a staging buffer?

VK_TILING_LINEAR
Texels are laid out in memory in row-major order, possibly with some padding on each row, so you can access them with the following equations.

Common
    // (x,y,z,layer) are in texel coordinates
    address(x,y,z,layer) = layer*arrayPitch + z*depthPitch + y*rowPitch + x*texelSize + offset;

Compressed
    // (x,y,z,layer) are in compressed texel block coordinates
    address(x,y,z,layer) = layer*arrayPitch + z*depthPitch + y*rowPitch + x*compressedTexelBlockByteSize + offset;

VK_TILING_OPTIMAL
Texels are laid out in an implementation-dependent arrangement, for more optimal memory access. The application cannot compute texel addresses for this layout, so it cannot write the image data directly.

(Diagram: a VkImage(VkDeviceMemory) with a known linear layout next to a VkImage(VkDeviceMemory) whose layout is marked "?".)
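The linear-tiling address equation can be written as a small function. This is a sketch under stated assumptions: `SubresourceLayout` is a hypothetical plain struct that mirrors the `offset`, `rowPitch`, `arrayPitch` and `depthPitch` fields of `VkSubresourceLayout` (which a real application would obtain from vkGetImageSubresourceLayout), and `linearTexelAddress` is an illustrative name.

```cpp
#include <cstdint>

// Plain-integer mirror of the VkSubresourceLayout fields used by the equation.
struct SubresourceLayout
{
    uint64_t offset;
    uint64_t rowPitch;
    uint64_t arrayPitch;
    uint64_t depthPitch;
};

// Byte offset of texel (x, y, z, layer) in a VK_TILING_LINEAR image.
// For compressed formats, pass block coordinates and the block byte size.
uint64_t linearTexelAddress(const SubresourceLayout& layout, uint64_t texelSize,
                            uint32_t x, uint32_t y, uint32_t z, uint32_t layer)
{
    return layer * layout.arrayPitch
         + z * layout.depthPitch
         + y * layout.rowPitch
         + x * texelSize
         + layout.offset;
}
```

No such function can be written for VK_TILING_OPTIMAL, which is why the data has to go through a staging buffer and a copy command.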

Page 21: 2 Vulkan Case Study.pdf

Image Layout - Texture

• How can we use a staging buffer?

vkCmdCopyBufferToImage
(Diagram: the image data is filled into a VkBuffer, and a VkCommandBuffer copies it into a VkImage with VK_TILING_OPTIMAL.)

    VkBuffer& stagingBuffer = getStagingBuffer(imageBufferSize);
    VkBufferImageCopy region = getRegionFromImage(image);
    fillBuffer(stagingBuffer, pImageData);
    vkCmdCopyBufferToImage(commandBuffer, stagingBuffer, image, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, &region);

DO NOT use VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT with VK_TILING_OPTIMAL.

Page 22: 2 Vulkan Case Study.pdf

Image Layout - Framebuffer (OnlyForColor)

• Bind for Attachment (transitioning an off-screen render target to an input texture, e.g. environment map, post-processing, etc.)
  • VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL
• Bind for Texture
  • VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL

    // Initialize FrameBuffer
    createFrameBuffer(frameBuffer); // VK_IMAGE_LAYOUT_UNDEFINED
    setImageLayout(frameBuffer, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL);
    ………

    // Bind FrameBuffer
    bindFrameBuffer(frameBuffer);
    setImageLayout(frameBuffer, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL);
    /* Render into frameBuffer */
    unbindFrameBuffer(frameBuffer); // And set Default Framebuffer
    setImageLayout(frameBuffer, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL);
    setTexture(frameBuffer, 0);
    /* Render into backbuffer */

Page 23: 2 Vulkan Case Study.pdf

Image Layout - Framebuffer (OnlyForColor)

(Diagram: two off-screen targets, Off-screen #0 "Original Scene" and Off-screen #1 "NormalMap for PostProcessing". VkImage #0 and VkImage #1 are shown first in VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL, then in VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL, followed by the Rendering step.)

Framebuffer in Vulkan
(Diagram of the objects involved: VkFramebuffer, VkImageView, VkImage, VkDeviceMemory.)

Page 24: 2 Vulkan Case Study.pdf

Swapchain - SurfaceFormat

?

Page 25: 2 Vulkan Case Study.pdf

Swapchain - SurfaceFormat

(Be careful with Java SurfaceView)

getHolder().setFormat(PixelFormat.RGB_888) on the Java side corresponds to VK_FORMAT_R8G8B8_UNORM on the native side.

From Java to native, all related window surfaces should have the same format. Both formats need to match, or you need to check the image format in the render pass.

We recommend querying the surface formats to check whether your target device supports the one you want.

    uint32_t surfaceFormatCount = 0;
    vkGetPhysicalDeviceSurfaceFormatsKHR(physicalDevice, surface, &surfaceFormatCount, VK_NULL_HANDLE);
    std::vector<VkSurfaceFormatKHR> surfaceFormats(surfaceFormatCount);
    vkGetPhysicalDeviceSurfaceFormatsKHR(physicalDevice, surface, &surfaceFormatCount, surfaceFormats.data());

    // Desired formats, in order of preference.
    const uint32_t desiredArraySize = 2;
    VkFormat desiredSurfaceFormats[] = { VK_FORMAT_R8G8B8_UNORM, VK_FORMAT_R5G6B5_UNORM_PACK16 };

    for (uint32_t d_n = 0; d_n < desiredArraySize; ++d_n)
    {
        for (uint32_t s_n = 0; s_n < surfaceFormatCount; ++s_n)
        {
            if (surfaceFormats[s_n].format == desiredSurfaceFormats[d_n])
            {
                // Take format and color space from the matching supported entry (index s_n).
                swapchainImageFormat = surfaceFormats[s_n].format;
                colorSpace = surfaceFormats[s_n].colorSpace;
                d_n = desiredArraySize; // also ends the outer loop
                break;
            }
        }
    }

(Diagram: with a Java activity, the application creates a Java SurfaceView and then the swapchain; with a native activity, the application creates the swapchain directly.)

Page 26: 2 Vulkan Case Study.pdf

Swapchain - Create / recreate example relying on Android activity events

• Surface handling

Event            Callback             Vulkan action
Activity starts                       Create VkInstance
                 surfaceCreated()     Create VkSurfaceKHR
                 surfaceChanged()     Create VkSwapchainKHR
resizeSurface    surfaceChanged()     Create VkSwapchainKHR (need to pass oldSwapchain)
onPause          surfaceDestroyed()   Destroy VkSwapchainKHR, VkSurfaceKHR
onResume         surfaceCreated()     Create VkSurfaceKHR
                 surfaceChanged()     Create VkSwapchainKHR
shutdown App     surfaceDestroyed()   Destroy VkSwapchainKHR, VkSurfaceKHR
Activity is shut down                 Destroy VkInstance

You need to wait until the queue is empty before recreating or destroying the swapchain. Otherwise: crash.

Page 27: 2 Vulkan Case Study.pdf

Swapchain - Crash at onSurfaceChanged (Resize)

(Diagram, crash case: Swapchain A has Image #0, #1, #2. CommandBuffer N-2 and CommandBuffer N-1 have completed presenting, but CommandBuffer N is still presenting when the surface changes. Creating Swapchain B with A passed as oldSwapchain at that point crashes.)

(Diagram, fixed case: when the surface changes, call vkQueueWaitIdle(), wait until the queue is empty, then create B.)

When Swapchain A is passed as oldSwapchain, its internal resources are destroyed at Swapchain B creation time.

Page 28: 2 Vulkan Case Study.pdf

Swapchain - Crash at onSurfaceDestroyed (Pause)

(Diagram, crash case: Swapchain A has Image #0, #1, #2. CommandBuffer N-2 and CommandBuffer N-1 have completed presenting, but CommandBuffer N is still presenting when onPause destroys the surface. Destroying A at that point crashes. Swapchain B is created after onResume, when the surface is created again.)

(Diagram, fixed case: when the surface is destroyed, call vkQueueWaitIdle(), wait until the queue is empty, then destroy A.)

Page 29: 2 Vulkan Case Study.pdf

Similar problem - Vulkan Object Release

(Diagram: between Begin Frame and End Frame, the application calls destroyXXX functions:
• destroyShader: destroy the graphics pipeline, descriptors, …
• destroyVertexBuffer, destroyIndexBuffer: destroy the VkBuffer, release or return the VkDeviceMemory, …
Meanwhile the queue still holds CommandBuffer #0 (N-2, present completed), CommandBuffer #1 (N-1, presenting) and CommandBuffer #2 (N, in progress). DestroyShader calls vkDestroyPipeline, and the VkPipeline those command buffers refer to becomes "?".)

Page 30: 2 Vulkan Case Study.pdf

Similar problem - Vulkan Object Release

(Diagram: a timeline of RENDERFRAME N to RENDERFRAME N+4, recorded into VkCommandBuffer #0, #1, #2, #0, #1 in turn. VkPipeline #0~#10 are created and used in the first frames. After DestroyShader is requested, later frames use only VkPipeline #6~#10, and a dependency check on VkPipeline #0~#5 runs in each following frame. VkPipeline #0~#5 are destroyed only in a later frame, after the dependency checks.)
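One common way to implement this kind of dependency check is a deferred-release queue keyed by frame number. The sketch below is an assumption about how it could be done, not the authors' implementation: `DeferredReleaseQueue`, `release` and `collect` are hypothetical names, and the destroy callback stands in for calls such as vkDestroyPipeline or vkDestroyBuffer.

```cpp
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// Delays destruction until no in-flight command buffer can still use the object.
class DeferredReleaseQueue
{
public:
    explicit DeferredReleaseQueue(uint64_t framesInFlight)
        : framesInFlight_(framesInFlight) {}

    // Call instead of destroying immediately. `destroy` would wrap e.g. vkDestroyPipeline.
    void release(uint64_t currentFrame, std::function<void()> destroy)
    {
        pending_.push_back({currentFrame, std::move(destroy)});
    }

    // Call once per frame, after waiting on that frame's fence. An object released
    // during frame F is destroyed once frame F + framesInFlight begins, because the
    // command buffer slot used by frame F has then been waited on and reused.
    void collect(uint64_t currentFrame)
    {
        while (!pending_.empty() &&
               currentFrame >= pending_.front().frame + framesInFlight_)
        {
            pending_.front().destroy();
            pending_.pop_front();
        }
    }

private:
    struct Entry
    {
        uint64_t frame;
        std::function<void()> destroy;
    };

    std::deque<Entry> pending_;
    uint64_t framesInFlight_;
};
```

With three command buffers in flight, a pipeline released in frame N is destroyed at the start of frame N+3, which matches the shape of the timeline above where destruction happens several frames after the request.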

Page 31: 2 Vulkan Case Study.pdf

2. UniformBuffer

Page 32: 2 Vulkan Case Study.pdf

UniformBuffer - Shader Memory Alignment

(Figures, pages 32 to 34: the expected rendering, the erroneous rendering, and the two side by side as Error / Expected.)

Page 35: 2 Vulkan Case Study.pdf

UniformBuffer - Shader Memory Alignment

    layout(set=0, binding=0) uniform buf1
    {
        float _unif1; // #0
        vec3 _unif2;  // #1
        vec2 _unif3;  // #2
    }

Convert to SPIR-V
(Diagram: the expected layout places #0, #1, #2 one after another in declaration order; the result, with the std140 layout applied, places them at aligned offsets with padding between them.)

q.v. : VulkanSpec_1.0.28, 14.5.4. Offset and Stride Assignment

A shader written with the Vulkan GLSL extension does not have this alignment problem. But you need to be careful if you use SPIR-V converted directly through glslang from a shader that does not specify the alignment (std140).

Page 36: 2 Vulkan Case Study.pdf

UniformBuffer - Shader Memory Alignment

• The Offset Decoration must be a multiple of its base alignment, computed recursively as follows:

◦ a scalar of size N has a base alignment of N

◦ a two-component vector, with components of size N , has a base alignment of 2N

◦ a three- or four-component vector, with components of size N , has a base alignment of 4N

◦ an array has a base alignment equal to the base alignment of its element type, rounded up to a multiple of 16

◦ a structure has a base alignment equal to the largest base alignment of any of its members, rounded up to a multiple of 16

◦ a row-major matrix of C columns has a base alignment equal to the base alignment of vector of C matrix components

◦ a column-major matrix has a base alignment equal to the base alignment of the matrix column type

• Any ArrayStride or MatrixStride decoration must be an integer multiple of the base alignment of the array or matrix from above.

• The Offset Decoration of a member immediately following a structure or an array must be greater than or equal to the next multiple of the base alignment of that structure or array.

q.v. : VulkanSpec_1.0.28, 14.5.4. Offset and Stride Assignment,
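For scalars and vectors of 4-byte components, the rules above reduce to a short offset calculation. The following is a minimal sketch covering only float, vec2, vec3 and vec4 members (arrays, structures and matrices are deliberately left out); `MemberType`, `std140Info` and `std140Offsets` are hypothetical names introduced for illustration.

```cpp
#include <cstddef>
#include <vector>

enum class MemberType { Float, Vec2, Vec3, Vec4 };

struct TypeInfo
{
    std::size_t size;      // bytes occupied by the value
    std::size_t alignment; // base alignment from the rules above, with N = 4
};

TypeInfo std140Info(MemberType type)
{
    switch (type)
    {
    case MemberType::Float: return {4, 4};   // scalar: N
    case MemberType::Vec2:  return {8, 8};   // two components: 2N
    case MemberType::Vec3:  return {12, 16}; // three components: 4N
    case MemberType::Vec4:  return {16, 16}; // four components: 4N
    }
    return {0, 1}; // not reached for valid input
}

// Returns the byte offset of each member, in declaration order.
std::vector<std::size_t> std140Offsets(const std::vector<MemberType>& members)
{
    std::vector<std::size_t> offsets;
    std::size_t cursor = 0;
    for (MemberType member : members)
    {
        const TypeInfo info = std140Info(member);
        // Round the cursor up to the member's base alignment.
        cursor = (cursor + info.alignment - 1) / info.alignment * info.alignment;
        offsets.push_back(cursor);
        cursor += info.size;
    }
    return offsets;
}
```

For the block `float, vec3, vec2` this yields offsets 0, 16 and 32, whereas a tightly packed CPU-side struct would place the members at 0, 4 and 16. Reordering the same members as `vec3, float, vec2` gives 0, 12 and 16 with no padding at all.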

Page 37: 2 Vulkan Case Study.pdf

UniformBuffer - Shader Memory Alignment

    layout(set=0, binding=0, std140) uniform buf1
    {
        mat4 _unif00; // #0
        vec4 _unif01; // #1
        vec4 _unif02; // #2
    }

    layout(set=0, binding=0, std140) uniform buf1
    {
        vec2 _unif00; // #0
        vec2 _unif01; // #1
        vec3 _unif02; // #2
    }

(Diagram: placement of #0, #1, #2 in VkDeviceMemory for each block, drawn in 4-byte cells.)

Page 38: 2 Vulkan Case Study.pdf

UniformBuffer - Shader Memory Alignment

    layout(set=0, binding=0, std140) uniform buf1
    {
        vec4 _unif00; // #0
        vec2 _unif01; // #1
        vec2 _unif02; // #2
    }

    layout(set=0, binding=0, std140) uniform buf1
    {
        vec2 _unif00;  // #0
        float _unif01; // #1
        float _unif02; // #2
    }

(Diagram: placement of #0, #1, #2 in VkDeviceMemory for each block.)

Page 39: 2 Vulkan Case Study.pdf

UniformBuffer - Shader Memory Alignment

    layout(set=0, binding=0, std140) uniform buf1
    {
        vec3 _unif00;  // #0
        float _unif01; // #1
        vec2 _unif02;  // #2
    }

    layout(set=0, binding=0, std140) uniform buf1
    {
        float _unif00; // #0
        vec3 _unif01;  // #1
        vec2 _unif02;  // #2
    }

(Diagram: placement of #0, #1, #2 in VkDeviceMemory for each block.)

Page 40: 2 Vulkan Case Study.pdf

UniformBuffer - Shader Memory Alignment

    layout(set=0, binding=0, std140) uniform buf1
    {
        float _unif00; // #0
        vec2 _unif01;  // #1
        vec2 _unif02;  // #2
    }

    layout(set=0, binding=0, std140) uniform buf1
    {
        float _unif00; // #0
        vec4 _unif01;  // #1
        vec4 _unif02;  // #2
    }

(Diagram: placement of #0, #1, #2 in VkDeviceMemory for each block.)

Page 41: 2 Vulkan Case Study.pdf

UniformBuffer - Shader Memory Alignment

    layout(set=0, binding=0, std140) uniform buf1
    {
        mat2 _unif00; // #0
        mat3 _unif01; // #1
        mat4 _unif02; // #2
    }

    layout(set=0, binding=0, std140) uniform buf1
    {
        mat3 _unif00;  // #0
        float _unif01; // #1
        vec2 _unif02;  // #2
    }

(Diagram: placement of #0, #1, #2 in VkDeviceMemory for each block.)

* Sorting members, using multiple UBOs, using only vec4… there are many other approaches, depending on your application or engine.

Page 42: 2 Vulkan Case Study.pdf

UniformBuffer - Memory Alignment

Memory pools are useful for dynamic objects.

Assume that each object has the following structure.

    layout(set=0, binding=0, std140) uniform buf1
    {
        vec2 _unif00;  // #0
        vec2 _unif01;  // #1
        float _unif02; // #2
        float _unif03; // #3
    }

(Diagram: VkDeviceMemory drawn in 1-byte cells numbered 0 to 23, holding vec2, vec2, float, float.)

(Figure: the resulting rendering issue.)

Page 43: 2 Vulkan Case Study.pdf

UniformBuffer - Memory Alignment

UBO value corruption.

?

Page 44: 2 Vulkan Case Study.pdf

UniformBuffer - Memory Alignment

• VkDescriptorBufferInfo: take care to respect the alignment given by the physical device limits.

VkPhysicalDeviceLimits::minUniformBufferOffsetAlignment

(Diagram: VkDeviceMemory before and after memory alignment is applied, assuming minUniformBufferOffsetAlignment is 16 and a block size of 1 byte.)

Page 45: 2 Vulkan Case Study.pdf

UniformBuffer - Memory Alignment

Code level

    VkPhysicalDeviceProperties properties;
    vkGetPhysicalDeviceProperties(physicalDevice, &properties);
    size_t minUniformBufferOffsetAlignment = properties.limits.minUniformBufferOffsetAlignment;

    // Pad the buffer size up to the next multiple of the required offset alignment.
    size_t padding = 0;
    size_t mod = _uniformBufferSize % minUniformBufferOffsetAlignment;
    if (mod != 0)
    {
        padding = minUniformBufferOffsetAlignment - mod;
    }
    _nextBufferOffset = _uniformBufferSize + padding;
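The padding computation can be factored into a reusable helper. This is a small sketch; `alignUp` is a hypothetical name, and `uint64_t` is used because `VkDeviceSize` is a 64-bit unsigned integer.

```cpp
#include <cstdint>

// Rounds `value` up to the next multiple of `alignment` (alignment must be > 0).
uint64_t alignUp(uint64_t value, uint64_t alignment)
{
    const uint64_t mod = value % alignment;
    return (mod == 0) ? value : value + (alignment - mod);
}
```

For example, with the 24-byte object from the earlier slide and an alignment of 16, the next object would start at offset 32 rather than 24.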

Page 46: 2 Vulkan Case Study.pdf

UniformBuffer - Memory Alignment

The following limits in VkPhysicalDeviceLimits are important when you are dealing with memory management.

    size_t minMemoryMapAlignment;
    VkDeviceSize minTexelBufferOffsetAlignment;
    VkDeviceSize minUniformBufferOffsetAlignment;
    VkDeviceSize minStorageBufferOffsetAlignment;
    VkDeviceSize nonCoherentAtomSize;

Page 47: 2 Vulkan Case Study.pdf

UniformBuffer - Tile Artifact

Page 49: 2 Vulkan Case Study.pdf

UniformBuffer - Tile Artifact

Why?
• Because we didn’t take care of UniformBuffer usage across multiple frames.

(Diagram: the swapchain's VkImage #0, #1, #2 each have their own VkCommandBuffer #0, #1, #2, but all of them use the same UniformBuffer.)

Page 50: 2 Vulkan Case Study.pdf

UniformBuffer - Tile Artifact

Mobile tile-based GPU

(Diagram: the tiles of one frame, each labeled with the MVP matrix it was rendered with.)

    N    N    N    N
    N    N    N    N
    N    N    N+1  N+1

N   = frame N MVP matrix
N+1 = frame N+1 MVP matrix (changed)

Page 51: 2 Vulkan Case Study.pdf

UniformBuffer - Tile Artifact

• You should have at least one UniformBuffer for each swapchain image index.
• Alternatively, using a single UniformBuffer with multiple dynamic offsets in vkCmdBindDescriptorSets can solve this issue.

(Diagram, problem case: VkImage #0, #1, #2 with VkCommandBuffer #0, #1, #2 all share UB #0.)
(Diagram, first fix: VkImage #0, #1, #2 with VkCommandBuffer #0, #1, #2 use UB #0, UB #1, UB #2 respectively.)
(Diagram, second fix: VkImage #0, #1, #2 with VkCommandBuffer #0, #1, #2 share UB #0, each with its own dynamicOffset.)
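The dynamic-offset variant can be sketched as one buffer divided into aligned slices, one per swapchain image. The names `UniformRing` and `makeUniformRing` are hypothetical; `minOffsetAlignment` stands for `VkPhysicalDeviceLimits::minUniformBufferOffsetAlignment`, and the returned offset is a `uint32_t` because that is the element type of `pDynamicOffsets` in vkCmdBindDescriptorSets.

```cpp
#include <cstdint>

// One uniform buffer that holds a separate, aligned slice per swapchain image.
struct UniformRing
{
    uint64_t sliceSize;  // per-frame data size, rounded up to the offset alignment
    uint32_t imageCount; // number of swapchain images

    // Size to allocate for the whole buffer.
    uint64_t totalSize() const { return sliceSize * imageCount; }

    // Dynamic offset to pass when recording the command buffer for `imageIndex`.
    uint32_t dynamicOffset(uint32_t imageIndex) const
    {
        return static_cast<uint32_t>(sliceSize * (imageIndex % imageCount));
    }
};

UniformRing makeUniformRing(uint64_t dataSize, uint64_t minOffsetAlignment, uint32_t imageCount)
{
    const uint64_t mod = dataSize % minOffsetAlignment;
    const uint64_t slice = (mod == 0) ? dataSize : dataSize + (minOffsetAlignment - mod);
    return UniformRing{slice, imageCount};
}
```

Each frame then writes its MVP matrix only into its own slice, so a frame that is still being rendered tile by tile never sees the next frame's values.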

Page 52: 2 Vulkan Case Study.pdf

3. GPU Driver

Page 53: 2 Vulkan Case Study.pdf

GPU Driver - Type of shader input

GPU Skinning problem

Page 54: 2 Vulkan Case Study.pdf

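GPU Driver - Type of shader input

(Diagram: three cases, each showing the original data, the driver, and the shader input.)

OpenGL ES driver
• Original data: uint32 x 4. Shader input: vec4.
• Somehow the driver corrects the type of the input (vec4 ← uvec4), even though an incorrect type was passed in.

Vulkan driver, incorrect type
• Original data: uint32 x 4. Shader input: vec4.
• The driver passes the data through as vec4 without correction. The diagram labels the result "Empty".

Vulkan driver, correct type
• Original data: uint32 x 4. Shader input: uvec4.
• The driver passes the data through as uvec4.

Please use the correct type of input.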

Page 55: 2 Vulkan Case Study.pdf

GPU Driver - API Comparison

In common cases, the GPU load should be the same with both APIs.
But we’ve faced some cases where GLES was a bit better than Vulkan,
• even though we were using the same vertices and indices.

A lot of effort has been spent on OpenGL ES driver optimization.
Vulkan drivers are intended to be lightweight and predictable, with no more driver magic.

SO YOU HAVE TO IMPLEMENT OPTIMIZATIONS YOURSELF!

(Figures: two GPU load graphs.)

Page 56: 2 Vulkan Case Study.pdf

GPU Driver - API Comparison

Geometry sorting (Vertex & Index)
• There is a limitation that a whole range of vertices must be shaded. This means that a triangle built from indices {0, 1, 2} should be significantly cheaper to execute than a triangle built from indices {0, 999, 1999} (3 vertices transformed vs. 2000 vertices transformed).

(Figures: results without geometry sorting and with geometry sorting.)
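The slides do not say which reordering was used, so the following is only one possible approach: renumber vertices in the order the index buffer first uses them, so that every draw touches a compact vertex range. `ReorderedMesh` and `reorderByFirstUse` are hypothetical names, and the sketch assumes every index is smaller than `vertexCount`.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

struct ReorderedMesh
{
    std::vector<uint32_t> indices;     // rewritten index buffer
    std::vector<uint32_t> vertexOrder; // vertexOrder[newIndex] = oldIndex
};

// Renumbers vertices by first use. Precondition: every index < vertexCount.
ReorderedMesh reorderByFirstUse(const std::vector<uint32_t>& indices, uint32_t vertexCount)
{
    const uint32_t unassigned = std::numeric_limits<uint32_t>::max();
    std::vector<uint32_t> remap(vertexCount, unassigned);

    ReorderedMesh result;
    result.indices.reserve(indices.size());
    for (uint32_t oldIndex : indices)
    {
        if (remap[oldIndex] == unassigned)
        {
            // First time this vertex is referenced: give it the next free slot.
            remap[oldIndex] = static_cast<uint32_t>(result.vertexOrder.size());
            result.vertexOrder.push_back(oldIndex);
        }
        result.indices.push_back(remap[oldIndex]);
    }
    return result;
}
```

The vertex buffer is then rebuilt by copying the old vertices in `vertexOrder` order. The triangle {0, 999, 1999} from the example becomes {0, 1, 2}, so its index range covers 3 vertices instead of 2000.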

Page 57: 2 Vulkan Case Study.pdf

4. Rendering

Page 58: 2 Vulkan Case Study.pdf

Rendering - Quality

Sometimes you may see color aliasing artifacts in Vulkan applications.
• You may need to consider changing the surface format (RGB565 to RGB888).
• SPIR-V has only two precisions for shader calculation. In the glslang logic:
  • lowp & mediump : RelaxedPrecision decoration
  • highp : no decoration
• You should consider using highp if your application needs accuracy.
• But please use mediump wherever possible, for performance.

※ Saturation was modified for the presentation.

Page 59: 2 Vulkan Case Study.pdf

(Figures, pages 59 to 66: two sets of texture comparison images, each shown as RGB32, ETC1, ASTC 6x6, and ASTC 8x8.)

Page 67: 2 Vulkan Case Study.pdf

Rendering - Texture Format

You can optimize your app using various ASTC block sizes, but you should select the proper option for the texture quality you need.

ETC1, ASTC Comparison
※ It depends on the quality choice. Anyway, it’s better to use ASTC!

Block Size   Bits Per Pixel
4x4          8.00
5x4          6.40
5x5          5.12
6x5          4.27
6x6          3.56
8x5          3.20
8x6          2.67
10x5         2.56
10x6         2.13
8x8          2.00
10x8         1.60
10x10        1.28
12x10        1.07
12x12        0.89

Font, NormalMap, Color_Low, Color_High, UI, etc.

Bandwidth (HIT case)
  ETC1 read bandwidth = 14.56 MBs
  ASTC read bandwidth = 13.80 MBs
  bandwidth_delta     = 0.76 MBs
  bandwidth_reduction = 5.22%

APK size
  ETC1 + OpenGL ES 2.0 : 599 MB
  ASTC + Vulkan        : 521 MB

Memory usage
  ETC1 + OpenGL ES 2.0 : 1115 MB
  ASTC + Vulkan        : 557 MB

VkPhysicalDeviceFeatures::textureCompressionASTC_LDR

Page 68: 2 Vulkan Case Study.pdf

5. GLES Fall-back

Page 69: 2 Vulkan Case Study.pdf

GLES Fall-back

(Flow diagram:)
Application::onCreate()
  Is Vulkan supported?
    Vulkan: Application::loadVulkanRHI()
    GLES:   Application::loadOpenGLESRHI()
Application::onSurfaceCreated()
    Vulkan: Create Swapchain
    GLES:   Create EGLSurface
Application::onSurfaceResize()
    Vulkan: Application::Vulkan::Resize(), Resize Swapchain
    GLES:   Application::GLES::Resize(), Resize Surface (FB, Viewport)

We highly recommend putting API detection at the very first initialization stage, before surface creation, so as not to waste additional resource allocations, etc.

Page 70: 2 Vulkan Case Study.pdf

GLES Fall-back - Vulkan Detection (Mobile)

Detection steps
• Attempt to load the Vulkan PDK
• Attempt to load basic functions
• Attempt to create an instance
• Attempt to create an instance with a stable API version
• Attempt to load all functions
• Get/check the patch version from the driver (the slide marks versions 1.0.0 and 1.0.11)

Checks (the slide highlights THREE ADDITIONAL CHECKS)
• success == dlopen("libvulkan.so")
• VK_SUCCESS == vkCreateInstance
• VK_SUCCESS == vkCreateDevice
• success == vkGetInstanceProcAddr
• success == vkGetDeviceProcAddr

Be careful: some early patch versions may be unstable or lacking features. We use the GLES renderer for patch versions less than 11.
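The patch-version rule can be sketched without the Vulkan headers by reusing the bit layout of the `VK_MAKE_VERSION` and `VK_VERSION_PATCH` macros. The function names below are hypothetical, and comparing the whole encoded version (rather than only the patch field) is an assumption made so that a later minor version is not rejected by mistake.

```cpp
#include <cstdint>

// Same bit layout as VK_MAKE_VERSION in vulkan.h:
// major in bits 22..31, minor in bits 12..21, patch in bits 0..11.
constexpr uint32_t makeVersion(uint32_t major, uint32_t minor, uint32_t patch)
{
    return (major << 22) | (minor << 12) | patch;
}

// Same result as VK_VERSION_PATCH.
constexpr uint32_t versionPatch(uint32_t version)
{
    return version & 0xFFFu;
}

// Hypothetical policy helper: fall back to GLES below version 1.0.11.
// `apiVersion` would come from VkPhysicalDeviceProperties::apiVersion.
bool shouldUseVulkan(uint32_t apiVersion)
{
    return apiVersion >= makeVersion(1, 0, 11);
}
```

This check only makes sense after the earlier steps have succeeded, since reading the version requires a created instance and an enumerated physical device.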

Page 71: 2 Vulkan Case Study.pdf

GLES Fall-back - Resources

• Texture Resource
  • We recommend using ASTC, but there are still drivers out there that do not support it.
  • At the moment, the common texture format for GLES in the market is ETC1_RGB8_OES.
  • Maintaining two texture packs in different formats just for API compatibility would be a burden.
  • If you want to port a game that uses ETC1 to Vulkan, you can reuse the existing resources through the backward-compatible format VK_FORMAT_ETC2_R8G8B8_UNORM_BLOCK.
• Shader Resource
  • Vulkan requires SPIR-V. You have the following two options for shader resources.
    1) ES310 (std140) GLSL + runtime SPIR-V conversion https://github.com/google/shaderc
    2) ES2/310 (std140) GLSL + offline-compiled SPIR-V

(Diagram, option 1: ES310 GLSL code goes through glslang (shaderc) runtime conversion to SPIR-V, backed by a persistent pipeline cache; on the GLES side, the GLSL code (ES310) is backed by a persistent program binary.)
(Diagram, option 2: ES310 GLSL code goes through glslang offline conversion to persistent SPIR-V, backed by a pipeline cache; on the GLES side, the GLSL code (ES2/310, ES2.0) is backed by a persistent program binary.)

Page 72: 2 Vulkan Case Study.pdf

6. Development Tip

Page 73: 2 Vulkan Case Study.pdf

Development Tip - Vulkan Viewport Volume

(Figures: Result vs. Expected.)

Page 74: 2 Vulkan Case Study.pdf

Development Tip - Vulkan Viewport Volume

• Vulkan Viewport Volume
  • Basically, Vulkan uses the OriginUpperLeft execution mode, and the default viewport volume is different from OpenGL ES. (The OriginLowerLeft execution mode must not be used; fragment entry points must declare OriginUpperLeft.)
    q.v. : VulkanSpec1.0.28 / A.3. Validation Rules within a module.

NDC Space (Ex : DepthFunc Less, DepthClear 1.0f)
  OpenGL ES: X (-1 ~ +1), Y (-1 ~ +1), Z (-1 ~ +1)
  Vulkan:    X (-1 ~ +1), Y (-1 ~ +1), Z (0 ~ +1)

Three ways to handle the difference:
• Add VertexShader PostFix
• Multiply VMatrix in front of MVP
• Modify Math Function

Page 75: 2 Vulkan Case Study.pdf

Development Tip - Vulkan Viewport Volume

    // VertexShader
    void main()
    {
        gl_Position = vec4(0.5, 0.5, 0.0, 1.0);
    }

Page 76: 2 Vulkan Case Study.pdf

Development Tip - Vulkan Viewport Volume

Add VertexShader PostFix.

    #version 310 es
    precision highp float;
    …
    void main()
    {
        …
        gl_Position = MVP * _vertex;
        gl_Position.y = -gl_Position.y; // added
        gl_Position.z = (gl_Position.z + gl_Position.w) / 2.0; // added
        return;
    }

Without the above correction, you will face this kind of problem in Vulkan.

Page 77: 2 Vulkan Case Study.pdf

Development Tip - Vulkan Viewport Volume

• Multiply VMatrix in front of MVP.

    MAT4 VMatrix = { 1.0f,  0.0f, 0.0f, 0.0f,
                     0.0f, -1.0f, 0.0f, 0.0f,
                     0.0f,  0.0f, 0.5f, 0.5f,
                     0.0f,  0.0f, 0.0f, 1.0f };

Or (depending on usage; this is the transpose of the matrix above)

    MAT4 VMatrix = { 1.0f,  0.0f, 0.0f, 0.0f,
                     0.0f, -1.0f, 0.0f, 0.0f,
                     0.0f,  0.0f, 0.5f, 0.0f,
                     0.0f,  0.0f, 0.5f, 1.0f };

    MVP = VMatrix * P * V * M;
    setUniform(MVP);
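To check that the correction matrix does the same job as the vertex-shader postfix, both can be applied to a clip-space position on the CPU. This is a sketch under stated assumptions: `Vec4`, `Mat4`, `transform`, `kClipFix` and `postfix` are hypothetical names, and the matrix is taken as row-major and applied to a column vector, which corresponds to the first VMatrix.

```cpp
#include <array>

using Vec4 = std::array<float, 4>;  // x, y, z, w
using Mat4 = std::array<float, 16>; // row-major, applied to a column vector

// Computes m * v.
Vec4 transform(const Mat4& m, const Vec4& v)
{
    Vec4 r{};
    for (int row = 0; row < 4; ++row)
    {
        for (int col = 0; col < 4; ++col)
        {
            r[row] += m[row * 4 + col] * v[col];
        }
    }
    return r;
}

// Same values as the first VMatrix: flip Y and map z from [-w, +w] to [0, +w].
const Mat4 kClipFix = { 1.0f,  0.0f, 0.0f, 0.0f,
                        0.0f, -1.0f, 0.0f, 0.0f,
                        0.0f,  0.0f, 0.5f, 0.5f,
                        0.0f,  0.0f, 0.0f, 1.0f };

// CPU equivalent of the two lines added to the vertex shader.
Vec4 postfix(Vec4 p)
{
    p[1] = -p[1];
    p[2] = (p[2] + p[3]) / 2.0f;
    return p;
}
```

A point on the OpenGL near plane (z = -w) ends up at z = 0 and a point on the far plane (z = +w) stays at z = w, which is the 0 to 1 depth range Vulkan expects after the perspective divide.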

Page 78: 2 Vulkan Case Study.pdf

Development Tip - Vulkan Viewport Volume

Modify Math Function (q.v. : http://glm.g-truc.net/0.9.8/index.html)

    #define GLM_LEFT_HANDED  0x00000001 // For DirectX, Metal, Vulkan
    #define GLM_RIGHT_HANDED 0x00000002 // For OpenGL, default in GLM

Ortho (depth range)
    #if GLM_DEPTH_CLIP_SPACE == GLM_DEPTH_ZERO_TO_ONE
        Result[2][2] = - static_cast<T>(1) / (zFar - zNear);
        Result[3][2] = - zNear / (zFar - zNear);
    #else
        Result[2][2] = - static_cast<T>(2) / (zFar - zNear);
        Result[3][2] = - (zFar + zNear) / (zFar - zNear);
    #endif

Ortho (handedness)
    #if GLM_COORDINATE_SYSTEM == GLM_LEFT_HANDED
        return orthoLH(left, right, bottom, top, zNear, zFar);
    #else
        return orthoRH(left, right, bottom, top, zNear, zFar);
    #endif

Frustum (depth range)
    #if GLM_DEPTH_CLIP_SPACE == GLM_DEPTH_ZERO_TO_ONE
        Result[2][2] = farVal / (farVal - nearVal);
        Result[3][2] = -(farVal * nearVal) / (farVal - nearVal);
    #else
        Result[2][2] = (farVal + nearVal) / (farVal - nearVal);
        Result[3][2] = - (static_cast<T>(2) * farVal * nearVal) / (farVal - nearVal);
    #endif

Perspective (depth range)
    #if GLM_DEPTH_CLIP_SPACE == GLM_DEPTH_ZERO_TO_ONE
        Result[2][2] = zFar / (zFar - zNear);
        Result[3][2] = -(zFar * zNear) / (zFar - zNear);
    #else
        Result[2][2] = (zFar + zNear) / (zFar - zNear);
        Result[3][2] = - (static_cast<T>(2) * zFar * zNear) / (zFar - zNear);
    #endif

Frustum (handedness)
    #if GLM_COORDINATE_SYSTEM == GLM_LEFT_HANDED
        return frustumLH(left, right, bottom, top, nearVal, farVal);
    #else
        return frustumRH(left, right, bottom, top, nearVal, farVal);
    #endif

Perspective (handedness)
    #if GLM_COORDINATE_SYSTEM == GLM_LEFT_HANDED
        return perspectiveFovLH(fov, width, height, zNear, zFar);
    #else
        return perspectiveFovRH(fov, width, height, zNear, zFar);
    #endif

Page 79: 2 Vulkan Case Study.pdf

Development Tip - Validation Layer

The loader supports layering APIs.

(Diagram: the Application calls a Vulkan function such as vkQueueSubmit. The call goes to the Loader (libvulkan.so), passes through each enabled layer (libVkLayerXXX.so, libVkLayerXXXX.so), and finally reaches the Driver (vulkan.XXX.so).)

Page 80: 2 Vulkan Case Study.pdf

Samsung Electronics

Development Tip - Validation Layer

Sometimes you may hit VK_ERROR_DEVICE_LOST or another unexpected error (GPU hang).

Turn on the validation layers and fix what they report!

• [MEM, 3]: Linear Image 0x515 is aliased with non-linear image 0x507 which is in violation of the Buffer-Image Granularity section of the Vulkan specification
• [DS, 49]: DS 0x890 encountered the following validation error at draw time: Dynamic descriptor in binding #1 at global descriptor index 2 uses buffer 4 with dynamic offset 524000 combined with offset 0 and range 720 that oversteps the buffer size of 524288
• [MEM, 12]: vkCmdBeginRenderPass(): cannot read invalid memory 0x26, please fill the memory before using
• Code 7 : Cannot map an image with layout VK_IMAGE_LAYOUT_UNDEFINED. Only GENERAL or PREINITIALIZED are supported.
• Code 7 : Cannot submit cmd buffer using image (0xffffffffc91fe3b0) [sub-resource: aspectMask 0x1 array layer 2, mip level 5], with layout VK_IMAGE_LAYOUT_UNDEFINED when first use is VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL.
• Code 10 : Attempt to set lineWidth to 0.000000 but physical device wideLines feature not supported/enabled so lineWidth must be 1.0f!
• Code 7 : Cannot submit cmd buffer using image (0xffffffffc7fe7ea0) [sub-resource: aspectMask 0x1 array layer 0, mip level 0], with layout VK_IMAGE_LAYOUT_UNDEFINED when first use is VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL.
• [MEM] Code 12 : vkCmdBeginRenderPass(): Cannot read invalid memory 0xffffffffc93bf470, please fill the memory before using.
• Code 7 : You cannot transition the layout from VK_IMAGE_LAYOUT_GENERAL when current layout is VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL.
• Code 25 : Unable to allocate 2 descriptors of type VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER from pool 0xffffffffcf476468. This pool only has 0 descriptors of this type remaining.
• Code 54 : Attempt to reset command buffer (0x28ceabb21c) which is in use.
• [MEM] Code 9 : Calling vkBeginCommandBuffer() on active CB 0x0xceabb21c before it has completed. You must check CB fence before this call.
• Code 27 : vkUpdateDescriptorsSets() failed write update validation for Descriptor Set 0xffffffffd0e00188 with error: Cannot call vkUpdateDescriptorSets() to perform write update on descriptor set 18446744072918925704 that is in use by a command buffer. FIX Candidate CL 83010
• Code 53 : Command Buffer 0xcf3be004 is already in use and is not marked for simultaneous use.
• Code 54 : Attempt to reset command buffer (0x28ceab721c) which is in use.
• Code 9 : Calling vkBeginCommandBuffer() on active CB 0x0xceab721c before it has completed. You must check CB fence before this call.
• Code 7 : You cannot transition the layout from VK_IMAGE_LAYOUT_GENERAL when current layout is VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL.

Validation Error #1

[Diagram: three command buffers (N-2, N-1, N) are reused in rotation across Frame N, Frame N + 1 and Frame N + 2; the wait index advances as (index + 1) % 3. Command buffers are recorded with ONE_TIME_SUBMIT.]

Normal Case: vkQueueSubmit (N), vkWaitForFences (N-2), vkResetCommandBuffer (N-2), vkBeginCommandBuffer (N-2)

Bug Case: vkQueueSubmit (N), vkWaitForFences (N), then vkResetCommandBuffer (N-2) / vkBeginCommandBuffer (N-2). The fence waited on (N) does not belong to the command buffer being re-recorded (N-2).

Code 9 : Calling vkBeginCommandBuffer() on active CB 0x0xceab721c before it has completed. You must check CB fence before this call.
Code 53 : Command Buffer 0xcf3be004 is already in use and is not marked for simultaneous use.
Code 54 : Attempt to reset command buffer (0x28ceab721c) which is in use.
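The normal case above can be sketched as a small ring of frame slots. This is only a model that builds without the Vulkan headers: FrameSlot, FrameRing and the pending flag are hypothetical names, and the flag stands in for an unsignaled VkFence. The point is that the slot about to be re-recorded is the one whose own fence gets waited on.

```cpp
#include <array>
#include <cstddef>

// Stand-in for one frame's resources. `pending` plays the role of an
// unsignaled VkFence attached to the slot's last vkQueueSubmit.
struct FrameSlot {
    bool pending = false;
};

class FrameRing {
public:
    static constexpr std::size_t kSlots = 3;  // triple buffering

    // Pick the slot for the next frame and make it safe to re-record.
    // Real code would call vkWaitForFences on this slot's own fence here,
    // then vkResetFences and vkResetCommandBuffer / vkBeginCommandBuffer.
    std::size_t beginFrame() {
        FrameSlot& slot = slots_[index_];
        if (slot.pending) {
            ++waits_;              // placeholder for the blocking fence wait
            slot.pending = false;
        }
        return index_;
    }

    // Mark the current slot as submitted and advance: index = (index + 1) % 3.
    void submitFrame() {
        slots_[index_].pending = true;
        index_ = (index_ + 1) % kSlots;
    }

    bool isPending(std::size_t i) const { return slots_[i].pending; }
    std::size_t waits() const { return waits_; }

private:
    std::array<FrameSlot, kSlots> slots_{};
    std::size_t index_ = 0;
    std::size_t waits_ = 0;
};
```

Keeping the fence and the command buffer in the same slot makes it hard to wait on one index and reset another, which is what the bug case shows.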

Validation Error #2

[MEM, 3]: Linear Image 0x515 is aliased with non-linear image 0x507 which is in violation of the Buffer-Image Granularity section of the Vulkan specification

Bug Case: Image (SwapChain, TILING_OPTIMAL) → vkCmdCopyImage → Image (Staging, TILING_LINEAR) → read memory → captured RAW data

Normal Case: Image (SwapChain, TILING_OPTIMAL) → vkCmdCopyImageToBuffer → Buffer (Staging, tiling N/A) → read memory → captured RAW data

Validation Error #3

Code 7 : Cannot submit cmd buffer using image (0xffffffffc91fe3b0) [sub-resource: aspectMask 0x1 array layer 2, mip level 5], with layout VK_IMAGE_LAYOUT_UNDEFINED when first use is VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL.

Code 7 : You cannot transition the layout from VK_IMAGE_LAYOUT_GENERAL when current layout is VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL.

And so on…

[Diagram: transition the image layout for the copy, resolve, clear, etc. → copy, resolve, clear… → restore the original image layout.]

switch (ImageLayout)
{
case VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL:
    return VK_ACCESS_TRANSFER_READ_BIT;

case VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL:
    return VK_ACCESS_TRANSFER_WRITE_BIT;

case VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL:
    return VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;

case VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL:
    return VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT;

case VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL:
    return VK_ACCESS_SHADER_READ_BIT | VK_ACCESS_INPUT_ATTACHMENT_READ_BIT;

case VK_IMAGE_LAYOUT_DEPTH_STENCIL_READ_ONLY_OPTIMAL:
    return VK_ACCESS_SHADER_READ_BIT | VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_READ_BIT;

case VK_IMAGE_LAYOUT_GENERAL:
    return VK_ACCESS_INPUT_ATTACHMENT_READ_BIT |
           VK_ACCESS_SHADER_READ_BIT |
           VK_ACCESS_SHADER_WRITE_BIT |
           VK_ACCESS_COLOR_ATTACHMENT_READ_BIT |
           VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT |
           VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_READ_BIT |
           VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT |
           VK_ACCESS_TRANSFER_READ_BIT |
           VK_ACCESS_TRANSFER_WRITE_BIT |
           VK_ACCESS_MEMORY_READ_BIT |
           VK_ACCESS_MEMORY_WRITE_BIT;

case VK_IMAGE_LAYOUT_PRESENT_SRC_KHR:
    return VK_ACCESS_MEMORY_READ_BIT;

case VK_IMAGE_LAYOUT_UNDEFINED:
case VK_IMAGE_LAYOUT_PREINITIALIZED:
    return 0;
}

Set Image memory barrier
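One way to avoid the layout mismatches reported above is to track the current layout per image and always use the tracked value as oldLayout. A minimal sketch, assuming stand-in types so it builds without the Vulkan headers: Layout, ImageId, Transition and LayoutTracker are hypothetical names, not Vulkan API.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Stand-ins for VkImageLayout and VkImage.
enum class Layout { Undefined, TransferDst, ColorAttachment, ShaderReadOnly };
using ImageId = std::uint64_t;

// One recorded transition; real code would fill a VkImageMemoryBarrier
// (oldLayout, newLayout, srcAccessMask, dstAccessMask) and record it
// with vkCmdPipelineBarrier.
struct Transition {
    ImageId image;
    Layout  oldLayout;
    Layout  newLayout;
};

class LayoutTracker {
public:
    // Layout the tracker believes the image is in (Undefined if never seen).
    Layout current(ImageId image) const {
        auto it = layouts_.find(image);
        return it == layouts_.end() ? Layout::Undefined : it->second;
    }

    // Record a barrier only when the layout actually changes, always using
    // the tracked layout as oldLayout.
    void transition(ImageId image, Layout newLayout) {
        Layout old = current(image);
        if (old == newLayout) return;
        barriers_.push_back({image, old, newLayout});
        layouts_[image] = newLayout;
    }

    // Transition for a copy/resolve/clear, run it, then restore the original
    // layout. An image that was still Undefined is left in the temporary
    // layout, because Undefined is not a valid target layout.
    template <typename Op>
    void withLayout(ImageId image, Layout temporary, Op&& op) {
        Layout original = current(image);
        transition(image, temporary);
        op();
        if (original != Layout::Undefined) transition(image, original);
    }

    const std::vector<Transition>& barriers() const { return barriers_; }

private:
    std::unordered_map<ImageId, Layout> layouts_;
    std::vector<Transition> barriers_;
};
```

In a real renderer the access masks for each barrier could come from a switch like the one above.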

Validation Error #4

[DS, 49]: DS 0x890 encountered the following validation error at draw time: Dynamic descriptor in binding #1 at global descriptor index 2 uses buffer 4 with dynamic offset 524000 combined with offset 0 and range 720 that oversteps the buffer size of 524288

[Diagram: buffer of 524288 bytes; dynamic offset 524000 plus range 720 ends at 524720, past the end of the buffer.]
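The arithmetic behind this message is easy to check on the CPU before binding. A minimal sketch; dynamicRangeFits and lastValidOffset are hypothetical helper names, and the alignment parameter is assumed to be the device's minUniformBufferOffsetAlignment.

```cpp
#include <cstdint>

// True when [offset, offset + range) lies inside a buffer of bufferSize bytes.
// Written so that offset + range is never computed and cannot overflow.
constexpr bool dynamicRangeFits(std::uint64_t offset, std::uint64_t range,
                                std::uint64_t bufferSize) {
    return offset <= bufferSize && range <= bufferSize - offset;
}

// Largest dynamic offset that still leaves room for `range` bytes,
// rounded down to a multiple of `alignment`.
// Precondition: range <= bufferSize and alignment > 0.
constexpr std::uint64_t lastValidOffset(std::uint64_t bufferSize, std::uint64_t range,
                                        std::uint64_t alignment) {
    return ((bufferSize - range) / alignment) * alignment;
}
```

With the numbers from the message, dynamicRangeFits(524000, 720, 524288) is false because 524000 + 720 = 524720.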

[MEM, 12]: vkCmdBeginRenderPass(): cannot read invalid memory 0x26, please fill the memory before using

VK_ATTACHMENT_LOAD_OP_DONT_CARE

VK_ATTACHMENT_LOAD_OP_CLEAR / LOAD

RenderPass - Attachment Description

Code 10 : Attempt to set lineWidth to 0.000000 but physical device wideLines feature not supported/enabled so lineWidth must be 1.0f!

No Debug Lines? Check VkPipelineRasterizationStateCreateInfo.lineWidth

Development Tip - VkPipelineCache

If you don’t use VkPipelineCache, you may face performance problems (lag).

vkCreateGraphicsPipelines is really slow, so we need to use VkPipelineCache.

Game loading time (createGraphicPipeline 300 EA + @):
Without VkPipelineCache: 13.260 seconds
With VkPipelineCache (persistent): 4.187 seconds

onResume

std::vector<unsigned char> pipelineCacheData = getPipelineCacheFromSDcard();
VkPipelineCacheCreateInfo pipelineCacheCreateInfo = {};
pipelineCacheCreateInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_CACHE_CREATE_INFO;
pipelineCacheCreateInfo.initialDataSize = pipelineCacheData.size();
pipelineCacheCreateInfo.pInitialData = pipelineCacheData.data();
VkPipelineCache pipelineCache = VK_NULL_HANDLE;
vkCreatePipelineCache(device, &pipelineCacheCreateInfo, nullptr, &pipelineCache);

createGraphicPipeline

vkCreateGraphicsPipelines(device, pipelineCache, 1, &createInfo, nullptr, &pipeline);

onPause

size_t pDataSize = 0;
// First call: query the size of the cache data.
vkGetPipelineCacheData(device, pipelineCache, &pDataSize, nullptr);
// If the size is valid, fetch the data and save it.
std::vector<unsigned char> pipelineCacheData(pDataSize);
vkGetPipelineCacheData(device, pipelineCache, &pDataSize, pipelineCacheData.data());
savePipelineCacheToSDcard(pipelineCacheData);
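A saved cache file can outlive a driver update, so it may be worth checking the blob's header before handing it to vkCreatePipelineCache. The sketch below parses the version-one header (headerSize, headerVersion, vendorID, deviceID, pipelineCacheUUID) from raw bytes without the Vulkan headers. DeviceIdentity, readU32LE and cacheMatchesDevice are hypothetical names; the identity values are assumed to come from VkPhysicalDeviceProperties.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Identity of the current device (vendorID, deviceID, pipelineCacheUUID).
struct DeviceIdentity {
    std::uint32_t vendorID;
    std::uint32_t deviceID;
    std::array<std::uint8_t, 16> pipelineCacheUUID;
};

// Reads a 32-bit little-endian value from the blob.
inline std::uint32_t readU32LE(const std::vector<unsigned char>& d, std::size_t at) {
    return std::uint32_t(d[at]) | (std::uint32_t(d[at + 1]) << 8) |
           (std::uint32_t(d[at + 2]) << 16) | (std::uint32_t(d[at + 3]) << 24);
}

// Returns true when the saved blob starts with a version-one header
// that matches the given device.
inline bool cacheMatchesDevice(const std::vector<unsigned char>& blob,
                               const DeviceIdentity& dev) {
    constexpr std::size_t kHeaderSize = 32;  // 4 + 4 + 4 + 4 + 16 bytes
    if (blob.size() < kHeaderSize) return false;
    if (readU32LE(blob, 0) < kHeaderSize) return false;    // headerSize
    if (readU32LE(blob, 4) != 1) return false;             // header version one
    if (readU32LE(blob, 8) != dev.vendorID) return false;
    if (readU32LE(blob, 12) != dev.deviceID) return false;
    return std::memcmp(blob.data() + 16, dev.pipelineCacheUUID.data(), 16) == 0;
}
```

If the check fails, creating the cache with initialDataSize = 0 and deleting the stale file is one reasonable fallback.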

Development Tip - Managing VkPipeline

Let’s think about very simple renderer logic.

Creating a pipeline needs a lot of information:

VkGraphicsPipelineCreateInfo
VkPipelineVertexInputStateCreateInfo
VkPipelineInputAssemblyStateCreateInfo
VkPipelineRasterizationStateCreateInfo
VkPipelineColorBlendStateCreateInfo
VkPipelineDepthStencilStateCreateInfo
VkPipelineViewportStateCreateInfo
VkPipelineMultisampleStateCreateInfo
VkDynamicState
VkPipelineDynamicStateCreateInfo
VkPipelineShaderStageCreateInfo
…

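[Diagram: simple renderer flow and the API calls behind each step.]
Initialize
setShader: glUseProgram
setRenderState: glEnable, glDisable, gl...
setTexture: glBindTexture
draw: glDraw… (OpenGL ES); vkCreateGraphicsPipelines, vkCmdDraw… (Vulkan)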

Development Tip - Managing VkPipeline

Make a structure to reuse VkPipeline

[Diagram: a VertexShader / FragmentShader pair (ignore this block in the current case) is combined with VertexAttribute #0 or #1 (stride, location, binding; VkPipelineVertexInputStateCreateInfo, …) and RenderState #0 (depth enable, …) or #1 (depth disable, …; VkPipelineDepthStencilStateCreateInfo, …). Each combination goes through vkCreateGraphicsPipelines and yields VkPipeline #0, VkPipeline #1.]

In the worst case, the render state and vertex attributes can change on every draw call.

So an efficiently designed pipeline management structure is very important for performance optimization.
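One possible shape for such a structure is a map from a compact state key to the pipeline created for it. A minimal sketch under assumptions: PipelineKey, PipelineKeyHash, PipelineHandle and PipelineManager are hypothetical names, the three ids are assumed to be assigned by the renderer, and the counter stands in for the real vkCreateGraphicsPipelines call.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical compact description of everything that selects a pipeline.
struct PipelineKey {
    std::uint32_t shaderId;        // vertex + fragment shader pair
    std::uint32_t vertexLayoutId;  // stride, location, binding set
    std::uint32_t renderStateId;   // depth test, blending, ...

    bool operator==(const PipelineKey& o) const {
        return shaderId == o.shaderId && vertexLayoutId == o.vertexLayoutId &&
               renderStateId == o.renderStateId;
    }
};

struct PipelineKeyHash {
    std::size_t operator()(const PipelineKey& k) const {
        std::size_t h = std::hash<std::uint32_t>{}(k.shaderId);
        h = h * 31 + std::hash<std::uint32_t>{}(k.vertexLayoutId);
        h = h * 31 + std::hash<std::uint32_t>{}(k.renderStateId);
        return h;
    }
};

using PipelineHandle = std::uint64_t;  // stand-in for VkPipeline

class PipelineManager {
public:
    // Returns the cached pipeline for this key, creating it on first use.
    PipelineHandle get(const PipelineKey& key) {
        auto it = cache_.find(key);
        if (it != cache_.end()) return it->second;
        PipelineHandle created = ++created_;  // placeholder for vkCreateGraphicsPipelines
        cache_.emplace(key, created);
        return created;
    }

    std::size_t createdCount() const { return static_cast<std::size_t>(created_); }

private:
    std::unordered_map<PipelineKey, PipelineHandle, PipelineKeyHash> cache_;
    PipelineHandle created_ = 0;
};
```

At draw time the renderer would build the key from its current shader, attributes and render state, call get(), and bind the result, so a state combination seen before costs only a hash lookup.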

[Diagram: renderer steps setShader, setRenderState, setTexture, draw.]

Development Tip - Managing VkRenderpass, VkFramebuffer

VkFramebuffer

VkFramebufferCreateInfo {
    …
    VkRenderPass renderPass;
    …
}

VkRenderpass

VkRenderPassCreateInfo {
    …
    uint32_t attachmentCount;
    const VkAttachmentDescription* pAttachments;
    …
}

VkAttachmentDescription {
    …
    VkAttachmentLoadOp loadOp;
    VkAttachmentStoreOp storeOp;
    …
} VkAttachmentDescription;

loadOp values: VK_ATTACHMENT_LOAD_OP_LOAD, VK_ATTACHMENT_LOAD_OP_CLEAR, VK_ATTACHMENT_LOAD_OP_DONT_CARE

[Diagram: vkCreateRenderPass produces VkRenderPass #0, VkRenderPass #1; vkCreateFramebuffer produces VkFramebuffer #0, VkFramebuffer #1.]

VkRenderpass and VkFramebuffer objects should also be reused.

Development Tip - Clear framebuffer cost

There are 3 ways to clear a framebuffer (color, depth, stencil):
• Renderpass load operation
• vkCmdClearAttachments
• vkCmdClearColorImage / vkCmdClearDepthStencilImage

It’s important to use the proper clear approach so that no additional clear cost is wasted (e.g. clear all, color only, depth only).

We recommend not clearing the framebuffer with an empty Renderpass begin()/end() that contains no actual draw calls.

Test case: 1 clear color & 30 clear depth
Renderpass begin/end using LoadOpClear: 24 FPS
vkCmdClearAttachments: 57 FPS

Development Tip - Clear framebuffer cost

[Diagram: two ways of issuing Clear All followed by repeated Clear Depth on the Color / Depth / Stencil attachments. "Only Renderpass begin/end": each clear is a separate render pass, using LoadOpClear for the attachments being cleared and LoadOpLoad for the rest. "Renderpass begin/end + APIs": one render pass with LoadOpClear, then vkCmdClearAttachments for the depth clears. Faster!]

Development Tip - Clear framebuffer cost

Very simple example, for description only.

Request clear:
• Is inside Renderpass? true: vkCmdClearAttachments()
• Is inside Renderpass? false: set variable VK_ATTACHMENT_LOAD_OP_CLEAR
• Next event

Request drawPrimitive:
• Is inside Renderpass? true: DrawPrimitive()
• Is inside Renderpass? false: get variable VK_ATTACHMENT_LOAD_OP_CLEAR, find the proper Renderpass (true: use it; false: CreateRenderpass(). Need to store it!), vkCmdBeginRenderPass(), then DrawPrimitive()
• Next event
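The flow above can be modelled as a small state machine. This sketch only records the names of the calls it would issue, so it builds without the Vulkan headers. ClearAwareRecorder is a hypothetical name, and falling back to LOAD_OP_LOAD when no clear is pending is an assumption, not something the slide states.

```cpp
#include <string>
#include <vector>

// Records the Vulkan calls the flow would issue, as plain strings.
class ClearAwareRecorder {
public:
    void requestClear() {
        if (insideRenderPass_) {
            calls_.push_back("vkCmdClearAttachments");
        } else {
            pendingClear_ = true;  // remembered until the next render pass begins
        }
    }

    void requestDraw() {
        if (!insideRenderPass_) {
            // The pending flag picks the load op of the render pass that a
            // real renderer would look up (or create and store) here.
            calls_.push_back(pendingClear_ ? "vkCmdBeginRenderPass(LOAD_OP_CLEAR)"
                                           : "vkCmdBeginRenderPass(LOAD_OP_LOAD)");
            pendingClear_ = false;
            insideRenderPass_ = true;
        }
        calls_.push_back("vkCmdDraw");
    }

    void endRenderPass() {
        if (insideRenderPass_) {
            calls_.push_back("vkCmdEndRenderPass");
            insideRenderPass_ = false;
        }
    }

    const std::vector<std::string>& calls() const { return calls_; }

private:
    bool insideRenderPass_ = false;
    bool pendingClear_ = false;
    std::vector<std::string> calls_;
};
```

A clear requested before any draw is folded into the render pass load op, while a clear requested between draws becomes a vkCmdClearAttachments call, so no empty render pass is started just to clear.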

Development Tip - PushConstant

Push constants are helpful to increase performance (the effect is GPU dependent)

It’s very easy to use.

But you should check device limit. VkPhysicalDeviceLimits::maxPushConstantsSize

// VertexShader
…
layout(push_constant) uniform buf1
{
    mat4 _unif00;
} pc; // you cannot skip instancing, if uniform is push_constant.

void main()
{
    gl_Position = pc._unif00 * _in_vertex;
}

vkCmdPushConstants(commandBuffer, layout, stageFlags, offset, MVPMatrix.size(), MVPMatrix.data());

(layout is the VkPipelineLayout)
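The device-limit check can be done once before recording. A minimal sketch of the rules that apply to the offset and size passed to vkCmdPushConstants (both multiples of 4, size non-zero, range within the limit); pushConstantRangeValid and kMat4Bytes are hypothetical names, and the limit is assumed to be read from VkPhysicalDeviceLimits::maxPushConstantsSize.

```cpp
#include <cstdint>

// Checks an (offset, size) pair in bytes against the push constant rules.
constexpr bool pushConstantRangeValid(std::uint32_t offset, std::uint32_t size,
                                      std::uint32_t maxPushConstantsSize) {
    return size > 0 && offset % 4 == 0 && size % 4 == 0 &&
           offset < maxPushConstantsSize && size <= maxPushConstantsSize - offset;
}

// Size in bytes of one mat4 of 32-bit floats, as used by the shader above.
constexpr std::uint32_t kMat4Bytes = 16 * sizeof(float);
```

Every Vulkan device has to offer at least 128 bytes, so a single 64-byte MVP matrix fits everywhere, but two full matrices plus extra data may not.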

Development Tip - Swizzle

[Screenshots: Expected vs. Error]

The map texture format was ETC1 (VK_FORMAT_ETC2_R8G8B8_UNORM_BLOCK)

// Fragment shader
void main()
{
    …
    vec4 mapColor = texture(mapSampler, texCoord);
    fragColor = mapColor.rgb * mapColor.a;
}

Development Tip - Swizzle

VkComponentMapping {
    VkComponentSwizzle r;
    VkComponentSwizzle g;
    VkComponentSwizzle b;
    VkComponentSwizzle a;
};

VkImageViewCreateInfo {
    …
    VkComponentMapping components;
    …
};

VK_COMPONENT_SWIZZLE_ONE
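As we read these two slides, the fix is to force the alpha channel of the image view to one, since the shader multiplies by alpha and the ETC1 texture carries none. The sketch below emulates what a component mapping does to a sampled texel on the CPU. Swizzle, Mapping, Texel, pick and applyMapping are stand-ins for illustration, not the Vulkan enum or structs.

```cpp
#include <array>

// Stand-ins for VkComponentSwizzle values; not the real Vulkan enum.
enum class Swizzle { Identity, Zero, One, R, G, B, A };

struct Mapping { Swizzle r, g, b, a; };  // mirrors VkComponentMapping's four fields

using Texel = std::array<float, 4>;      // r, g, b, a

// Resolves one output channel; `self` is the channel index Identity maps to.
inline float pick(const Texel& t, Swizzle s, int self) {
    switch (s) {
        case Swizzle::Zero: return 0.0f;
        case Swizzle::One:  return 1.0f;
        case Swizzle::R:    return t[0];
        case Swizzle::G:    return t[1];
        case Swizzle::B:    return t[2];
        case Swizzle::A:    return t[3];
        default:            return t[self];  // Identity
    }
}

// What the shader would read after the image view's component mapping is applied.
inline Texel applyMapping(const Texel& t, const Mapping& m) {
    return Texel{{pick(t, m.r, 0), pick(t, m.g, 1), pick(t, m.b, 2), pick(t, m.a, 3)}};
}
```

With a = One, a texel whose stored alpha reads as 0 is seen by the shader with alpha 1, so mapColor.rgb * mapColor.a keeps its colour.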

Development Tip - SecondaryCommandBuffer + Multi-thread

This is simple logic for using SecondaryCommandBuffer.

Recording Phase: Create Secondary CommandBuffer → Begin → Bind GraphicPipeline (assume that there are no dynamic PSOs) → Bind DescriptorSet → Bind VertexBuffer / IndexBuffer → Draw → End

Update Phase: Update UniformBuffer

Execute Phase: Begin Primary CommandBuffer → Execute Secondary CommandBuffer → End Primary CommandBuffer

Development Tip - SecondaryCommandBuffer + Multi-thread

This is simple logic for using SecondaryCommandBuffer.

[Diagram: secondary command buffers SCB #0 to SCB #5 are created and recorded once, each bound to its own UniformBuffer #0 to #5. Draw phase: just update the UniformBuffers and execute the SecondaryCommandBuffers!]

Development Tip - SecondaryCommandBuffer + Multi-thread

OpenGL ES 2.0 + Single thread

Vulkan + SecondaryCommandBuffer + Multi thread

※ CPU side matrix transform calculation.

Development Tip - SecondaryCommandBuffer + Multi-thread

[Diagram: with Secondary Command Buffer, four CPU threads each update one quarter of the buffers (Update buffer 1/4 to 4/4); Execute SecondaryCommandBuffer then goes to the Queue.]
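Splitting the uniform buffer updates across threads comes down to giving each thread a contiguous range of objects. A minimal sketch; splitWork is a hypothetical helper and only computes the ranges, it does not start any threads.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Splits [0, itemCount) into workerCount contiguous [begin, end) ranges
// whose sizes differ by at most one.
inline std::vector<std::pair<std::size_t, std::size_t>>
splitWork(std::size_t itemCount, std::size_t workerCount) {
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    if (workerCount == 0) return ranges;
    const std::size_t base = itemCount / workerCount;
    const std::size_t extra = itemCount % workerCount;
    std::size_t begin = 0;
    for (std::size_t w = 0; w < workerCount; ++w) {
        const std::size_t count = base + (w < extra ? 1 : 0);
        ranges.emplace_back(begin, begin + count);
        begin += count;
    }
    return ranges;
}
```

Each worker would then update only the uniform buffers in its own range, so no two threads write the same memory.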

Development Tip - Multi-thread

[Diagram: a Queue, four CPU threads, and a Command Buffer holding the commands (cmd) each thread records.]

Development Tip - Multi-thread

[Diagram: the commands (cmd) recorded by the four CPU threads are submitted to the Queue. submit!]

Development Tip - Multi-thread

[Diagram: four CPU threads recording commands (cmd) in parallel for one Queue.]

Access to VkCommandPool and VkDescriptorPool must be synchronized, or each thread should own and manage its own pools independently.

Development Tip - Reducing duplicated API calls

It is important to call each bind/set function only once per VkCommandBuffer for a given value, to avoid duplicated vkCmdSetXXX and vkCmdBindXXX calls with the same value / parameter.

Worst case

※ In our test case, 500 calls to vkCmdSetViewport and vkCmdSetScissor take 1.412 ms.
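A thin filter in front of the set calls is one way to drop the duplicates. A minimal sketch under assumptions: Rect is a simplified integer stand-in for VkViewport / VkRect2D, and DynamicStateFilter is a hypothetical name; the caller issues the real vkCmdSetViewport or vkCmdSetScissor only when the filter returns true.

```cpp
#include <cstdint>

// Simplified stand-in for VkViewport / VkRect2D; only the compared fields.
struct Rect {
    std::int32_t x, y;
    std::uint32_t width, height;
    bool operator==(const Rect& o) const {
        return x == o.x && y == o.y && width == o.width && height == o.height;
    }
};

// Tracks what was last set on one command buffer and skips repeats.
class DynamicStateFilter {
public:
    // Returns true when the caller should actually issue vkCmdSetViewport.
    bool setViewport(const Rect& r) { return update(viewport_, hasViewport_, r); }
    // Returns true when the caller should actually issue vkCmdSetScissor.
    bool setScissor(const Rect& r) { return update(scissor_, hasScissor_, r); }

    // Call when a new command buffer begins: state set in one command
    // buffer does not carry over to the next.
    void reset() { hasViewport_ = hasScissor_ = false; }

private:
    static bool update(Rect& current, bool& valid, const Rect& next) {
        if (valid && current == next) return false;
        current = next;
        valid = true;
        return true;
    }
    Rect viewport_{}, scissor_{};
    bool hasViewport_ = false, hasScissor_ = false;
};
```

The same pattern extends to vkCmdBindXXX calls by caching the last bound handle per bind point.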

Thank you