Vulkan Case Study: 2016 Khronos Seoul DevU, SAMSUNG Electronics
Soowan Park, Graphics Engineer
Joonyong Park, Senior Graphics Engineer
Samsung Electronics
Before we start
All case-study information and content is based on our development experience with the Galaxy S7, which ships in two chipset variants: one with an ARM Mali GPU and one with a Qualcomm Adreno GPU.
What we did
ProtoStar, HIT, NFS, Vainglory
MWC, GDC, SDC, E3, Gamescom, CEDEC
History
Agenda
1. Swapchain
2. Uniform Buffer
3. GPU Driver
4. Rendering
5. GLES Fall-back
6. Development Tip
Who is this for?
Android Vulkan developers.
These are very simple cases, but important ones!
1. Swapchain
Swapchain - Android
• Triple buffering: Google Project Butter (applied since the Android 4.1 Jelly Bean release)
  • Android OpenGL ES runs with triple buffering by default; you can check this with adb shell dumpsys SurfaceFlinger.
  • In OpenGL ES the user can't control the number of back buffers (#0 #1 #2 #0 #1 …).
• Image count of the swapchain
  • For this reason, the Android platform requires at least 3 buffers for better performance.
• With Java SurfaceView
  • Currently, Android Vulkan only supports native activities. However, you can use a SurfaceView and a Java activity by passing the surface handle to native code through JNI to get an ANativeWindow handle.
  • q.v.: https://developer.android.com/ndk/reference/group___native_activity.html
  • We recommend using a separate Java-side render thread for the main render loop, as GLSurfaceView does.
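A minimal sketch of how the "at least 3 buffers" guidance could translate into the minImageCount requested in VkSwapchainCreateInfoKHR. The function name and the default of 3 are our own assumptions; the two capability values would come from vkGetPhysicalDeviceSurfaceCapabilitiesKHR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Pick the number of swapchain images to request. minImageCount and
// maxImageCount correspond to the VkSurfaceCapabilitiesKHR fields; Vulkan
// uses maxImageCount == 0 to mean "no upper limit".
std::uint32_t chooseSwapchainImageCount(std::uint32_t minImageCount,
                                        std::uint32_t maxImageCount,
                                        std::uint32_t desired = 3)
{
    assert(minImageCount >= 1);  // the spec guarantees at least one image
    std::uint32_t count = std::max(minImageCount, desired);
    if (maxImageCount != 0) {
        count = std::min(count, maxImageCount);
    }
    return count;
}
```

If the surface already demands more than three images, the larger minimum wins; if it caps the count below three, the cap wins.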
Swapchain - Presentation Mode
• VK_PRESENT_MODE_MAILBOX_KHR
[Diagram: swapchain images #0, #1, #2 and an implementation-dependent internal queue with a single entry X. Each vkAcquireNextImageKHR / vkQueuePresentKHR pair overwrites X with the newest image (X=#0, then X=#1, then X=#2); at VBLANK the display controller reads the latest one (#1 in the diagram), which keeps latency low.]
Swapchain - Presentation Mode
• VK_PRESENT_MODE_FIFO_KHR
[Diagram: swapchain images #0, #1, #2 and an internal queue with entries X, Y, Z. Each vkAcquireNextImageKHR / vkQueuePresentKHR pair appends the image to the queue (X=#0, Y=#1, Z=#2); at each VBLANK the image stored in X (#0) is swapped with the back buffer, so latency grows with the number of queued images.]
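Swapchain - Presentation Mode
[Graphs: frame rate with VK_PRESENT_MODE_MAILBOX_KHR and with VK_PRESENT_MODE_FIFO_KHR, each against a 60 FPS line]
※ DO NOT use MAILBOX mode in a game unless latency is critical and you know what you're doing: in MAILBOX mode the GPU can keep rendering frames that are replaced before they are ever displayed, which wastes power.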
Swapchain - Presentation Mode
Code level (q.v.: https://www.khronos.org/registry/vulkan/specs/1.0-wsi_extensions/xhtml/vkspec.html, 29.5. Surface Queries)
#include <vector>
#include <vulkan/vulkan.h>

uint32_t presentModeCount = 0;
vkGetPhysicalDeviceSurfacePresentModesKHR(physicalDevice, surface, &presentModeCount, nullptr);
std::vector<VkPresentModeKHR> presentModes(presentModeCount);
vkGetPhysicalDeviceSurfacePresentModesKHR(physicalDevice, surface, &presentModeCount, presentModes.data());

// FIFO is the only mode every implementation must support, so it is the safe default.
VkPresentModeKHR presentMode = VK_PRESENT_MODE_FIFO_KHR;

// Present modes in order of preference.
const VkPresentModeKHR desiredPresentModes[] = {
    VK_PRESENT_MODE_FIFO_KHR,
    VK_PRESENT_MODE_MAILBOX_KHR
};
bool found = false;
for (VkPresentModeKHR desired : desiredPresentModes) {
    for (VkPresentModeKHR available : presentModes) {
        if (available == desired) {
            presentMode = desired;
            found = true;
            break;
        }
    }
    if (found) {
        break;
    }
}
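The nested search above is a "first supported entry in preference order" pattern, and the surface-format selection uses the same idea. Here is a Vulkan-free sketch of it as a template, assuming only that the element type supports ==. PresentMode is a hypothetical stand-in enum so the example compiles without vulkan.h.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for VkPresentModeKHR (not the real Vulkan type).
enum class PresentMode { Immediate, Mailbox, Fifo, FifoRelaxed };

// Return the first entry of 'preferred' that also appears in 'available',
// or 'fallback' if none of them do.
template <typename T>
T pickFirstSupported(const std::vector<T>& preferred,
                     const std::vector<T>& available,
                     T fallback)
{
    for (const T& want : preferred) {
        for (const T& have : available) {
            if (have == want) {
                return want;
            }
        }
    }
    return fallback;
}
```

With FIFO listed first, the search always returns FIFO, because every implementation must support it; MAILBOX is only picked if you move it to the front of the list.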
Swapchain - SwapBuffer Comparison (Android)
[Diagram: one frame (RENDERFRAME N) passing from APPLICATION to SURFACE FLINGER to DISPLAY, in OpenGL ES and in Vulkan]
OpenGL ES
• glClear / glDrawXXX render into the back buffer (framebuffer 0), which is GfxBuffer #0 of the EGLSurface (GfxBuffer #0, #1, #2).
• eglSwapBuffers queues GfxBuffer #0 to the WindowBuffer dequeue queue of the associated native window; glFlush() triggers command flushing and rendering.
• There is no way to get the GPU rendering-completion signal.
Vulkan (WSI, Window System Integration)
• vkAcquireNextImageKHR returns VkImage #0 (of VkImage #0, #1, #2); commands are recorded into command buffer #0 and submitted with vkQueueSubmit to the associated graphics queue, followed by vkQueuePresentKHR.
• A rendering-complete semaphore makes presentation wait internally until rendering is complete; the image then goes to the WindowBuffer dequeue queue of the associated native window.
• You can explicitly get the GPU rendering-completion signal by using a fence in vkQueueSubmit.
※ In both paths the application will block at some point; with VK_PRESENT_MODE_FIFO_KHR, the application does the "blocking wait" to sync with the GPU.
Swapchain - Synchronization Failure Case
• Tearing
Swapchain - Synchronization
• Fence logic
[Diagram: swapchain VkImage #0, #1, #2, each paired with its own VkFence and VkCommandBuffer (#0, #1, #2), allocated from a single-threaded VkCommandPool]
For each frame, cycling through i = 0, 1, 2, 0, …:
  vkWaitForFences(fence #i)
  vkResetFences(fence #i)
  vkResetCommandBuffer(buf #i)
  vkBeginCommandBuffer(buf #i)
  Render ~
  vkQueueSubmit(fence #i)
  vkQueuePresentKHR
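To see why each command buffer may only be reused after its own fence has signaled, the ring can be modeled without any Vulkan calls. FrameRing is a hypothetical helper: its counters stand in for fence state, and in a real renderer canRecord() returning false is the point where vkWaitForFences would block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU-side model of the per-image fence ring. Each slot remembers the frame
// number last submitted with its fence (0 = never submitted).
class FrameRing {
public:
    explicit FrameRing(std::size_t slotCount) : lastSubmitted_(slotCount, 0) {
        assert(slotCount > 0);
    }

    // Slot (fence + command buffer) used for a given 1-based frame number.
    std::size_t slotFor(std::uint64_t frame) const {
        return static_cast<std::size_t>((frame - 1) % lastSubmitted_.size());
    }

    // True if the slot's previous submission has finished on the GPU,
    // i.e. waiting on its fence would return immediately.
    bool canRecord(std::uint64_t frame, std::uint64_t gpuCompletedFrame) const {
        return lastSubmitted_[slotFor(frame)] <= gpuCompletedFrame;
    }

    // Record that 'frame' was submitted with its slot's fence.
    void submit(std::uint64_t frame) {
        lastSubmitted_[slotFor(frame)] = frame;
    }

private:
    std::vector<std::uint64_t> lastSubmitted_;
};
```

With three slots, frame 4 reuses slot 0 and may only start once frame 1 has completed on the GPU.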
Image Layout - Swapchain
• Transition each image to the correct layout for presenting and rendering.
• At the very beginning of drawing, after the first acquire:
  • Images returned by vkGetSwapchainImagesKHR are in VK_IMAGE_LAYOUT_UNDEFINED
  • Transition to VK_IMAGE_LAYOUT_GENERAL
  • Clear the presentable images
• Draw routine:
  • Acquire
  • VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL
  • Render
  • VK_IMAGE_LAYOUT_PRESENT_SRC_KHR
  • Present
// Create swapchain
vkGetSwapchainImagesKHR(device, swapchain, &swapchainImageCount, pSwapchainImages); // VK_IMAGE_LAYOUT_UNDEFINED

// Frame loop
swapchainIndex = acquire();
if (firstAcquire) {
    setImagesLayout(pSwapchainImages, swapchainImageCount, VK_IMAGE_LAYOUT_GENERAL);
    clearImages(pSwapchainImages, swapchainImageCount);
    firstAcquire = false;
}
setImageLayout(pSwapchainImages[swapchainIndex], VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL);
/* Rendering */
setImageLayout(pSwapchainImages[swapchainIndex], VK_IMAGE_LAYOUT_PRESENT_SRC_KHR);
present(swapchainIndex);
Image Layout - Texture
VK_IMAGE_TILING_LINEAR
• Create with VK_IMAGE_LAYOUT_PREINITIALIZED
• Set image data using vkMapMemory / vkUnmapMemory
• Transition to VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL
VK_IMAGE_TILING_OPTIMAL
• Create with VK_IMAGE_LAYOUT_UNDEFINED
• Transition to VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL
• Set image data using a staging buffer
• Transition to VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL
You can check the format properties like this:
// Get image format properties
VkFormatProperties formatProperty;
vkGetPhysicalDeviceFormatProperties(physicalDevice, imageFormat, &formatProperty);
if (formatProperty.optimalTilingFeatures & VK_FORMAT_FEATURE_SAMPLED_IMAGE_BIT) {
    /* sample from a VK_IMAGE_TILING_OPTIMAL image */
} else if (formatProperty.linearTilingFeatures & VK_FORMAT_FEATURE_SAMPLED_IMAGE_BIT) {
    /* fall back to VK_IMAGE_TILING_LINEAR */
}
[Diagram: Texturing in Vulkan: VkDescriptorSet, VkDescriptorImageInfo, VkSampler, VkImageView, VkImage, VkDeviceMemory]
Image Layout - Texture
• Why should we use a staging buffer?
VK_IMAGE_TILING_LINEAR
Texels are laid out in memory in row-major order, possibly with some padding on each row, so you can access them with these equations:
Common formats:
// (x,y,z,layer) are in texel coordinates
address(x,y,z,layer) = layer*arrayPitch + z*depthPitch + y*rowPitch + x*texelSize + offset;
Compressed formats:
// (x,y,z,layer) are in compressed texel block coordinates
address(x,y,z,layer) = layer*arrayPitch + z*depthPitch + y*rowPitch + x*compressedTexelBlockByteSize + offset;
VK_IMAGE_TILING_OPTIMAL
Texels are laid out in an implementation-dependent arrangement for more optimal memory access, so there is no equation the application can use to find a texel in the VkImage's VkDeviceMemory.
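A worked example of the linear-tiling equation. LinearLayout is a hypothetical mirror of the VkSubresourceLayout fields that vkGetImageSubresourceLayout returns for a linear image; the pitches in the test are made-up values, with a padded row pitch to show why rowPitch, not width * texelSize, must be used.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the VkSubresourceLayout fields used by the
// linear-tiling address equation.
struct LinearLayout {
    std::uint64_t offset;
    std::uint64_t rowPitch;
    std::uint64_t depthPitch;
    std::uint64_t arrayPitch;
};

// Byte address of texel (x, y, z, layer) in a VK_IMAGE_TILING_LINEAR image.
// For block-compressed formats, pass block coordinates and the block size
// in bytes as texelSize.
std::uint64_t linearTexelAddress(const LinearLayout& l, std::uint64_t texelSize,
                                 std::uint64_t x, std::uint64_t y,
                                 std::uint64_t z, std::uint64_t layer)
{
    return layer * l.arrayPitch + z * l.depthPitch + y * l.rowPitch +
           x * texelSize + l.offset;
}
```

For example, with a 4-byte texel and a padded rowPitch of 1088 bytes, texel (10, 2) sits at byte 2 * 1088 + 10 * 4 = 2216.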
Image Layout - Texture
• How can we use a staging buffer?
[Diagram: fill the image data into a VkBuffer, then record vkCmdCopyBufferToImage in a VkCommandBuffer to copy it into a VkImage with VK_IMAGE_TILING_OPTIMAL]
VkBuffer& stagingBuffer = getStagingBuffer(imageBufferSize);
VkBufferImageCopy region = getRegionFromImage(image);
fillBuffer(stagingBuffer, pImageData);
vkCmdCopyBufferToImage(commandBuffer, stagingBuffer, image, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, &region);
DO NOT use VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT with VK_IMAGE_TILING_OPTIMAL.
Image Layout - Framebuffer (Color Only)
• Bind as attachment (when an off-screen render target is later used as an input texture, e.g. an environment map, post-processing, etc.):
  • VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL
• Bind as texture:
  • VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL
// Initialize framebuffer
createFrameBuffer(frameBuffer); // VK_IMAGE_LAYOUT_UNDEFINED
setImageLayout(frameBuffer, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL);
…
// Bind framebuffer
bindFrameBuffer(frameBuffer);
setImageLayout(frameBuffer, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL);
/* Render into frameBuffer */
unbindFrameBuffer(frameBuffer); // and set the default framebuffer
setImageLayout(frameBuffer, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL);
setTexture(frameBuffer, 0);
/* Render into backbuffer */
Image Layout - Framebuffer (Color Only)
[Diagram: off-screen #0 (original scene) and off-screen #1 (normal map for post-processing). VkImage #0 and #1 are in VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL while rendered into, then in VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL when sampled during rendering. Framebuffer in Vulkan: VkFramebuffer, VkImageView, VkImage, VkDeviceMemory]
Swapchain - SurfaceFormat
getHolder().setFormat(PixelFormat.RGB_888) (Java) ↔ VK_FORMAT_R8G8B8_UNORM (native)
• From Java to native, all related window surfaces should have the same format.
• We recommend querying the surface formats to check whether your target device supports the format you want; all Java-side and native-side surfaces should use a matching format.
#include <vector>
#include <vulkan/vulkan.h>

uint32_t surfaceFormatCount = 0;
vkGetPhysicalDeviceSurfaceFormatsKHR(physicalDevice, surface, &surfaceFormatCount, nullptr);
std::vector<VkSurfaceFormatKHR> surfaceFormats(surfaceFormatCount);
vkGetPhysicalDeviceSurfaceFormatsKHR(physicalDevice, surface, &surfaceFormatCount, surfaceFormats.data());

// Formats in order of preference; keep them in sync with the Java-side PixelFormat.
const VkFormat desiredSurfaceFormats[] = {
    VK_FORMAT_R8G8B8_UNORM,
    VK_FORMAT_R5G6B5_UNORM_PACK16
};
bool found = false;
for (VkFormat desired : desiredSurfaceFormats) {
    for (const VkSurfaceFormatKHR& available : surfaceFormats) {
        if (available.format == desired) {
            // Take the format and color space from the matching surface entry.
            swapchainImageFormat = available.format;
            colorSpace = available.colorSpace;
            found = true;
            break;
        }
    }
    if (found) {
        break;
    }
}
• Both formats need to match, or you need to check the image format used in the render pass.
[Diagram: Java activity path: APPLICATION → JAVA ACTIVITY → CREATE JAVA SURFACEVIEW → CREATE SWAPCHAIN (be careful with Java SurfaceView); native activity path: APPLICATION → NATIVE ACTIVITY → CREATE SWAPCHAIN]
Swapchain - Create / Recreate Example Based on Android Activity Events
• Surface handling
Activity starts: Create VkInstance
  surfaceCreated(): Create VkSurfaceKHR
  surfaceChanged(): Create VkSwapchainKHR
resizeSurface:
  surfaceChanged(): Create VkSwapchainKHR (need to pass oldSwapchain)
onPause:
  surfaceDestroyed(): Destroy VkSwapchainKHR, VkSurfaceKHR
onResume:
  surfaceCreated(): Create VkSurfaceKHR
  surfaceChanged(): Create VkSwapchainKHR
Shut down app:
  surfaceDestroyed(): Destroy VkSwapchainKHR, VkSurfaceKHR
Activity is shut down: Destroy VkInstance
※ On these events you need to wait until the queue is empty; otherwise the app can crash.
Swapchain – Crash at onSurfaceChanged (Resize)
[Diagram: swapchain A with images #0, #1, #2; command buffer N-2 present completed, N-1 present completed, N still presenting]
• Crash: when the surface changes, swapchain B is created right away with swapchain A passed as oldSwapchain, while command buffer N is still presenting.
• Fix: on surface change, call vkQueueWaitIdle(), wait until the queue is empty, then create B.
When swapchain A is passed as oldSwapchain, its internal resources will be destroyed at swapchain B creation time.
Swapchain – Crash at onSurfaceDestroyed (Pause)
[Diagram: swapchain A with images #0, #1, #2; command buffer N-2 present completed, N-1 present completed, N still presenting. onPause: surface destroyed; onResume: surface created, swapchain B]
• Crash: swapchain A is destroyed on surface destruction while command buffer N is still presenting.
• Fix: on surface destruction, call vkQueueWaitIdle(), wait until the queue is empty, then destroy A.
Similar Problem - Vulkan Object Release
[Diagram: between Begin Frame and End Frame the app calls destroyXXX while the queue still holds command buffers #0, #1, #2, …: N-2 present completed, N-1 presenting, N in progress]
• destroyShader: destroy the graphics pipeline, descriptors, …
• destroyVertexBuffer / destroyIndexBuffer: destroy the VkBuffer, release or return the VkDeviceMemory, …
• DestroyShader calls vkDestroyPipeline while a command buffer that is still in flight may be using that VkPipeline.
Similar Problem - Vulkan Object Release
[Diagram: RENDERFRAME N to N+4, cycling through VkCommandBuffer #0, #1, #2, #0, #1. VkPipeline #0~#10 are created and used. After DestroyShader, pipelines #0~#5 go through a dependency check over the following frames while only VkPipeline #6~#10 are used, and VkPipeline #0~#5 are destroyed at frame N+4, once no in-flight command buffer references them.]
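One way to implement the dependency check above is a frame-stamped deferred-destruction queue. This is a sketch under our own assumptions, not the authors' engine code: frame numbers are 64-bit counters, the completed frame comes from the fence logic shown earlier, and the callbacks stand in for calls such as vkDestroyPipeline.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// Instead of destroying a Vulkan object as soon as the engine releases it,
// remember the last frame that may still use it and destroy it only after
// the GPU has finished that frame.
class DeferredDeleter {
public:
    // 'lastUseFrame' is the newest frame whose command buffer may reference
    // the object; 'destroy' would call e.g. vkDestroyPipeline in a real engine.
    void retire(std::uint64_t lastUseFrame, std::function<void()> destroy) {
        // Objects are retired in non-decreasing frame order, so a FIFO is enough.
        assert(pending_.empty() || pending_.back().first <= lastUseFrame);
        pending_.emplace_back(lastUseFrame, std::move(destroy));
    }

    // Call once per frame with the newest frame whose fence has signaled.
    void collect(std::uint64_t completedFrame) {
        while (!pending_.empty() && pending_.front().first <= completedFrame) {
            pending_.front().second();
            pending_.pop_front();
        }
    }

    std::size_t pendingCount() const { return pending_.size(); }

private:
    std::deque<std::pair<std::uint64_t, std::function<void()>>> pending_;
};
```

Destruction then happens at the start of a later frame, which is exactly the delay the diagram shows between "DestroyShader" and the actual vkDestroyPipeline.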
2. UniformBuffer
UniformBuffer - Shader Memory Alignment
[Figures: expected rendering vs. the erroneous rendering caused by a uniform-buffer alignment mismatch]
UniformBuffer - Shader Memory Alignment
layout(set=0, binding=0) uniform buf1 {
    float _unif1; // #0
    vec3  _unif2; // #1
    vec2  _unif3; // #2
};
[Diagram: the order #0 #1 #2 the application expected (tightly packed) vs. the result after converting to SPIR-V, where the std140 layout was applied and #1 and #2 moved to aligned offsets]
q.v.: VulkanSpec_1.0.28, 14.5.4. Offset and Stride Assignment
A shader written with the Vulkan GLSL extension (GL_KHR_vulkan_glsl) would not have an alignment problem by itself, but be careful when you use SPIR-V converted directly from it through glslang: the std140 alignment is applied, so your application's data must follow it too.
UniformBuffer - Shader Memory Alignment
• The Offset decoration must be a multiple of its base alignment, computed recursively as follows:
  ◦ a scalar of size N has a base alignment of N
  ◦ a two-component vector, with components of size N, has a base alignment of 2N
  ◦ a three- or four-component vector, with components of size N, has a base alignment of 4N
  ◦ an array has a base alignment equal to the base alignment of its element type, rounded up to a multiple of 16
  ◦ a structure has a base alignment equal to the largest base alignment of any of its members, rounded up to a multiple of 16
  ◦ a row-major matrix of C columns has a base alignment equal to the base alignment of a vector of C matrix components
  ◦ a column-major matrix has a base alignment equal to the base alignment of the matrix column type
• Any ArrayStride or MatrixStride decoration must be an integer multiple of the base alignment of the array or matrix from above.
• The Offset decoration of a member immediately following a structure or an array must be greater than or equal to the next multiple of the base alignment of that structure or array.
q.v.: VulkanSpec_1.0.28, 14.5.4. Offset and Stride Assignment
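Applying these rules by hand is error-prone, so a small calculator helps when laying out the matching C++ structs. This sketch is our own, not from the slides: it covers only 32-bit scalars, vectors and column-major matrices in a std140 block (no arrays or nested structs), with matrices treated as arrays of columns whose stride is rounded up to 16 bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Member types covered by this sketch (32-bit components, column-major matrices).
enum class Std140Type { Float, Vec2, Vec3, Vec4, Mat2, Mat3, Mat4 };

struct Std140Info {
    std::uint32_t alignment;  // base alignment in bytes
    std::uint32_t size;       // bytes occupied by the member
};

// Base alignment and size under std140: vec3/vec4 align to 16 bytes, and a
// column-major matC is C column vectors with a 16-byte stride.
Std140Info std140Info(Std140Type t)
{
    switch (t) {
    case Std140Type::Float: return {4, 4};
    case Std140Type::Vec2:  return {8, 8};
    case Std140Type::Vec3:  return {16, 12};
    case Std140Type::Vec4:  return {16, 16};
    case Std140Type::Mat2:  return {16, 2 * 16};
    case Std140Type::Mat3:  return {16, 3 * 16};
    case Std140Type::Mat4:  return {16, 4 * 16};
    }
    return {0, 0};  // unreachable for valid enum values
}

// Offset of each member of a uniform block, in declaration order.
std::vector<std::uint32_t> std140Offsets(const std::vector<Std140Type>& members)
{
    std::vector<std::uint32_t> offsets;
    std::uint32_t cursor = 0;
    for (Std140Type t : members) {
        const Std140Info info = std140Info(t);
        // Round the cursor up to the member's base alignment.
        cursor = (cursor + info.alignment - 1) / info.alignment * info.alignment;
        offsets.push_back(cursor);
        cursor += info.size;
    }
    return offsets;
}
```

For the float, vec3, vec2 block above this gives offsets 0, 16 and 32, not the tightly packed 0, 4 and 16 an application might expect.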
UniformBuffer - Shader Memory Alignment
layout(set=0, binding=0, std140) uniform buf1 {
    mat4 _unif00; // #0
    vec4 _unif01; // #1
    vec4 _unif02; // #2
};
layout(set=0, binding=0, std140) uniform buf1 {
    vec2 _unif00; // #0
    vec2 _unif01; // #1
    vec3 _unif02; // #2
};
[Diagram: placement of #0, #1, #2 in VkDeviceMemory for each block; one cell = 4 bytes]
UniformBuffer - Shader Memory Alignment
layout(set=0, binding=0, std140) uniform buf1 {
    vec4 _unif00; // #0
    vec2 _unif01; // #1
    vec2 _unif02; // #2
};
layout(set=0, binding=0, std140) uniform buf1 {
    vec2  _unif00; // #0
    float _unif01; // #1
    float _unif02; // #2
};
[Diagram: placement of #0, #1, #2 in VkDeviceMemory for each block]
UniformBuffer - Shader Memory Alignment
layout(set=0, binding=0, std140) uniform buf1 {
    vec3  _unif00; // #0
    float _unif01; // #1
    vec2  _unif02; // #2
};
layout(set=0, binding=0, std140) uniform buf1 {
    float _unif00; // #0
    vec3  _unif01; // #1
    vec2  _unif02; // #2
};
[Diagram: placement of #0, #1, #2 in VkDeviceMemory for each block]
UniformBuffer - Shader Memory Alignment
layout(set=0, binding=0, std140) uniform buf1 {
    float _unif00; // #0
    vec2  _unif01; // #1
    vec2  _unif02; // #2
};
layout(set=0, binding=0, std140) uniform buf1 {
    float _unif00; // #0
    vec4  _unif01; // #1
    vec4  _unif02; // #2
};
[Diagram: placement of #0, #1, #2 in VkDeviceMemory for each block]
UniformBuffer - Shader Memory Alignment
layout(set=0, binding=0, std140) uniform buf1 {
    mat2 _unif00; // #0
    mat3 _unif01; // #1
    mat4 _unif02; // #2
};
layout(set=0, binding=0, std140) uniform buf1 {
    mat3  _unif00; // #0
    float _unif01; // #1
    vec2  _unif02; // #2
};
[Diagram: placement of #0, #1, #2 in VkDeviceMemory for each block]
* Sorting members, using multiple UBOs, using vec4…: there will be many other approaches depending on your application or engine.
UniformBuffer - Memory Alignment
Memory pools are useful for dynamic objects. Assume that each object has the following structure:
layout(set=0, binding=0, std140) uniform buf1 {
    vec2  _unif00; // #0
    vec2  _unif01; // #1
    float _unif02; // #2
    float _unif03; // #3
};
[Diagram: the objects' uniform data (vec2, vec2, float, float) laid out back to back in VkDeviceMemory cells 0~23 (legend: 1 byte), which leads to a rendering issue]
UniformBuffer - Memory Alignment
UBO value corruption. Why?
UniformBuffer - Memory Alignment
• The offsets in VkDescriptorBufferInfo must respect the alignment given by the physical device limits:
  VkPhysicalDeviceLimits::minUniformBufferOffsetAlignment
[Diagram: VkDeviceMemory before and after applying the memory alignment, assuming minUniformBufferOffsetAlignment = 16 and a block size of 1 byte]
UniformBuffer - Memory Alignment
Code level
VkPhysicalDeviceProperties properties;
vkGetPhysicalDeviceProperties(physicalDevice, &properties);
VkDeviceSize minUniformBufferOffsetAlignment = properties.limits.minUniformBufferOffsetAlignment;

// Pad the current uniform block up to the next multiple of the alignment.
VkDeviceSize padding = 0;
VkDeviceSize mod = _uniformBufferSize % minUniformBufferOffsetAlignment;
if (mod != 0) {
    padding = minUniformBufferOffsetAlignment - mod;
}
_nextBufferOffset = _uniformBufferSize + padding;
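The spec requires minUniformBufferOffsetAlignment (like the texel and storage buffer offset alignments) to be a power of two, so the modulo-and-pad code can be written as a single mask. A minimal sketch; alignUp is our own helper name.

```cpp
#include <cassert>
#include <cstdint>

// Round 'size' up to the next multiple of 'alignment' (a power of two).
std::uint64_t alignUp(std::uint64_t size, std::uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}
```

For example, the 24-byte vec2, vec2, float, float object from the memory-pool slide would occupy a 256-byte slot on a device with a 256-byte alignment, since alignUp(24, 256) == 256.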
UniformBuffer - Memory Alignment
The following limits in VkPhysicalDeviceLimits are important when dealing with memory management:
size_t       minMemoryMapAlignment;
VkDeviceSize minTexelBufferOffsetAlignment;
VkDeviceSize minUniformBufferOffsetAlignment;
VkDeviceSize minStorageBufferOffsetAlignment;
VkDeviceSize nonCoherentAtomSize;
…
UniformBuffer - Tile Artifact
[Figures: tile-shaped rendering artifacts]
• Why? Because we didn't take care of multiple uniform buffer usage.
[Diagram: swapchain VkImage #0, #1, #2 with their own VkCommandBuffer #0, #1, #2, but all three command buffers reading the same UniformBuffer]
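UniformBuffer - Tile Artifact
Mobile tile-based GPU
[Diagram: the screen tiles of one frame; most tiles were rendered with frame N's MVP matrix (N), but the last two with frame N+1's (N+1), because the shared uniform buffer was updated for frame N+1 while frame N was still being rendered]
N = frame N MVP matrix
N+1 = frame N+1 MVP matrix (changed)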
UniformBuffer - Tile Artifact
• Have at least one uniform buffer for each swapchain image index,
• or use a single uniform buffer with a different dynamic offset per image in vkCmdBindDescriptorSets; either solves this issue.
[Diagram: three setups for swapchain VkImage #0, #1, #2 and VkCommandBuffer #0, #1, #2: one shared UB #0 (the bug); UB #0, UB #1, UB #2, one per image; one UB #0 addressed with a per-image dynamicOffset]
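A sketch of the dynamic-offset option: one buffer split into per-image regions. PerImageUniformLayout is a hypothetical helper; alignedObjectSize is assumed to be already rounded up to minUniformBufferOffsetAlignment (dynamic offsets must be multiples of that limit), and the returned value would be narrowed to the uint32_t that vkCmdBindDescriptorSets takes in pDynamicOffsets.

```cpp
#include <cassert>
#include <cstdint>

// One uniform buffer shared by all swapchain images: each image gets its own
// region, so the CPU never overwrites data that an in-flight frame reads.
struct PerImageUniformLayout {
    std::uint64_t alignedObjectSize;  // already aligned to minUniformBufferOffsetAlignment
    std::uint32_t objectsPerImage;

    // Offset of object 'objectIndex' inside the region of 'imageIndex'.
    std::uint64_t dynamicOffset(std::uint32_t imageIndex, std::uint32_t objectIndex) const {
        assert(objectIndex < objectsPerImage);
        return (static_cast<std::uint64_t>(imageIndex) * objectsPerImage + objectIndex) *
               alignedObjectSize;
    }

    // Total buffer size needed for 'imageCount' swapchain images.
    std::uint64_t bufferSize(std::uint32_t imageCount) const {
        return static_cast<std::uint64_t>(imageCount) * objectsPerImage * alignedObjectSize;
    }
};
```

The CPU writes frame data only into the region of the image it just acquired, which the fence logic guarantees is no longer being read.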
3. GPU Driver
GPU Driver - Type of Shader Input
GPU skinning problem
[Diagram: original data of type uint32 x 4 fed to a shader through three different paths]
• OpenGL ES driver, shader declares vec4: works (vec4 ← uvec4). Somehow the driver corrects the type of the input, even though an incorrect type was passed in.
• Vulkan driver, shader declares vec4: the result is empty.
• Vulkan driver, shader declares uvec4: works as expected.
Please use the correct type of input.
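The Vulkan specification leaves the input values undefined when the numeric type of a vertex attribute's format does not match the shader input, so there is no driver-side correction to rely on. Purely as an illustration (not a description of what any driver does), the sketch shows how different a uint32 bone index and its bits read as a float are; bitsAsFloat is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret the raw bits of a 32-bit unsigned integer as an IEEE-754 float,
// with no numeric conversion.
float bitsAsFloat(std::uint32_t bits)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

Bone index 3 read this way is a denormal around 4.2e-45, not 3.0f, which is why uint32 x 4 skinning data should be declared as uvec4 with an integer format such as VK_FORMAT_R32G32B32A32_UINT, or converted to float data on the CPU.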
GPU Driver - API Comparison
In common cases, the GPU load should be the same with both APIs, but we have faced some cases where GLES was a bit better than Vulkan, even though we were using the same vertices and indices.
• A lot of effort has been spent on OpenGL ES driver optimization.
• Vulkan drivers are intended to be lightweight and predictable, with no more driver magic.
SO YOU HAVE TO IMPLEMENT OPTIMIZATIONS YOURSELF!
[Graphs: GPU load, OpenGL ES vs. Vulkan]
GPU Driver - API Comparison
• Geometry sorting (vertex & index): there is a limitation that a whole range of vertices must be shaded. This means that a triangle built from indices {0, 1, 2} should be significantly cheaper to execute than a triangle built from indices {0, 999, 1999} (3 vertices transformed vs. 2000 vertices transformed).
[Figures: without geometry sorting vs. with geometry sorting]
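A minimal sketch of one simple form of geometry sorting: renumbering vertices in the order the index buffer first uses them, so each triangle references a small, contiguous index range. This is an illustration under our own assumptions, not necessarily the sorting the authors used; production tools usually also reorder triangles for the post-transform vertex cache.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Renumber vertices by first use. Returns the rewritten index buffer and fills
// 'remap' with remap[oldIndex] = newIndex so the caller can reorder the vertex
// buffer the same way. Unreferenced vertices keep the value UINT32_MAX.
std::vector<std::uint32_t> sortGeometryByFirstUse(const std::vector<std::uint32_t>& indices,
                                                  std::uint32_t vertexCount,
                                                  std::vector<std::uint32_t>& remap)
{
    const std::uint32_t unassigned = UINT32_MAX;
    remap.assign(vertexCount, unassigned);
    std::uint32_t next = 0;
    std::vector<std::uint32_t> out;
    out.reserve(indices.size());
    for (std::uint32_t oldIndex : indices) {
        assert(oldIndex < vertexCount);
        if (remap[oldIndex] == unassigned) {
            remap[oldIndex] = next++;
        }
        out.push_back(remap[oldIndex]);
    }
    return out;
}
```

After calling it, move each referenced vertex v to position remap[v] in the vertex buffer; the triangle {0, 999, 1999} becomes {0, 1, 2}.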
4. Rendering
Rendering - Quality
• Sometimes you may see color aliasing artifacts in Vulkan applications.
  • You may need to consider changing the surface format (RGB565 to RGB888).
• SPIR-V has only two precisions for shader calculation. In glslang's logic:
  • lowp & mediump: RelaxedPrecision decoration
  • highp: no decoration
  • Consider using highp if your application needs accuracy,
  • but use mediump wherever possible for performance.
※ Saturation was modified for presentation.
[Figures: texture quality comparison of RGB32, ETC1, ASTC 6x6 and ASTC 8x8, shown for two sets of screenshots]
Rendering - Texture Format
You can optimize your app using various ASTC block sizes, but you should select the proper option for texture quality.
ETC1 vs. ASTC comparison
※ It depends on the quality choice. Either way, ASTC is better to use!
Block Size   Bits Per Pixel
4x4          8.00
5x4          6.40
5x5          5.12
6x5          4.27
6x6          3.56
8x5          3.20
8x6          2.67
10x5         2.56
10x6         2.13
8x8          2.00
10x8         1.60
10x10        1.28
12x10        1.07
12x12        0.89
Font, NormalMap, Color_Low, Color_High, UI, etc.
Bandwidth (HIT case)
ETC1 read bandwidth = 14.56 MBs
ASTC read bandwidth = 13.80 MBs
bandwidth_delta = 0.76 MBs
bandwidth_reduction = 5.22%
APK size
  ETC1 + OpenGL ES 2.0: 599 MB
  ASTC + Vulkan: 521 MB
Memory usage
  ETC1 + OpenGL ES 2.0: 1115 MB
  ASTC + Vulkan: 557 MB
Check VkPhysicalDeviceFeatures::textureCompressionASTC_LDR before using ASTC.
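The bits-per-pixel column follows directly from ASTC's fixed 128-bit block size, and the bandwidth reduction is the delta divided by the ETC1 figure. A small sketch to reproduce both; the function names are our own.

```cpp
#include <cassert>
#include <cmath>

// Every ASTC block is 128 bits regardless of its footprint, so the bit rate
// is 128 divided by the number of texels the block covers.
double astcBitsPerPixel(int blockWidth, int blockHeight)
{
    assert(blockWidth > 0 && blockHeight > 0);
    return 128.0 / (blockWidth * blockHeight);
}

// Relative saving between two measurements, in percent.
double reductionPercent(double before, double after)
{
    assert(before > 0.0);
    return (before - after) / before * 100.0;
}
```

For instance, 6x6 gives 128 / 36 ≈ 3.56 bits per pixel, and (14.56 - 13.80) / 14.56 ≈ 5.22% for the HIT bandwidth numbers.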
5. GLES Fall-back
GLES Fall-back
Application::onCreate()
  Is Vulkan supported?
    Yes: Application::loadVulkanRHI()
    No: Application::loadOpenGLESRHI()
Application::onSurfaceCreated()
  Vulkan: create swapchain
  GLES: create EGLSurface
Application::onSurfaceResize()
  Vulkan: Application::Vulkan::Resize(), resize swapchain
  GLES: Application::GLES::Resize(), resize surface (FB, viewport)
We highly recommend putting API detection at the very first initialization stage, before surface creation, so as not to waste additional resource allocations, etc.
GLES Fall-back - Vulkan Detection (Mobile)
Detection steps:
• Attempt to load Vulkan PDK
• Attempt to load basic functions
• Attempt to create an instance
• Attempt to create an instance with a stable API version (1.0.0)
• Attempt to load all functions
• Get/check the patch version from the driver (1.0.11)
Checks (the diagram marks three of them as additional checks):
• success == dlopen("libvulkan.so")
• success == vkGetInstanceProcAddr
• VK_SUCCESS == vkCreateInstance, VK_SUCCESS == vkCreateDevice
• success == vkGetDeviceProcAddr
Be careful: some early patch versions may be unstable or lack features. We use the GLES renderer for patch versions less than 11.
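The patch-version rule needs only integer arithmetic. This sketch assumes the version being checked is VkPhysicalDeviceProperties::apiVersion, which uses the same packing as VK_MAKE_VERSION; treating any version newer than 1.0 as acceptable is our own assumption, since the slide only covers 1.0.x drivers.

```cpp
#include <cassert>
#include <cstdint>

// Vulkan packs versions as major (10 bits) | minor (10 bits) | patch (12 bits).
constexpr std::uint32_t makeVersion(std::uint32_t major, std::uint32_t minor, std::uint32_t patch)
{
    return (major << 22) | (minor << 12) | patch;
}
constexpr std::uint32_t versionMajor(std::uint32_t v) { return v >> 22; }
constexpr std::uint32_t versionMinor(std::uint32_t v) { return (v >> 12) & 0x3FF; }
constexpr std::uint32_t versionPatch(std::uint32_t v) { return v & 0xFFF; }

// Fall-back policy: use Vulkan for 1.0 drivers only from patch 11 on.
// Accepting anything newer than 1.0 is an assumption of this sketch.
bool preferVulkan(std::uint32_t apiVersion)
{
    if (versionMajor(apiVersion) != 1) {
        return versionMajor(apiVersion) > 1;
    }
    if (versionMinor(apiVersion) > 0) {
        return true;
    }
    return versionPatch(apiVersion) >= 11;
}
```

If preferVulkan() returns false, the application would call loadOpenGLESRHI() before any surface is created.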
GLES Fall-back - Resources
• Texture resources
  • We do recommend using ASTC, but there are still drivers out there that don't support it.
  • At the moment, the common texture format for GLES in the market is ETC1_RGB8_OES.
  • Maintaining two texture packs just for API compatibility would be a burden.
  • If you want to port a game that uses ETC1 to Vulkan, you can reuse the existing resources through the backward-compatible format VK_FORMAT_ETC2_R8G8B8_UNORM_BLOCK.
• Shader resources
  • Vulkan requires SPIR-V; you have the following two options for the shader resources:
    1) ES310 (std140) GLSL + runtime SPIR-V conversion (https://github.com/google/shaderc)
       Vulkan: ES310 GLSL code → glslang (shaderc) runtime conversion → SPIR-V → persistent pipeline cache
       GLES: GLSL code (ES310) → persistent program binary
    2) ES2/310 (std140) GLSL + offline-compiled SPIR-V
       Vulkan: ES310 GLSL code → glslang offline conversion → persistent SPIR-V → persistent pipeline cache
       GLES: GLSL code (ES2/310, or ES2.0) → persistent program binary
6. Development Tip
Development Tip - Vulkan Viewport Volume
[Figures: result vs. expected]
• Vulkan viewport volume: Vulkan uses the OriginUpperLeft execution mode, and its default viewport volume is different from OpenGL ES. (The OriginLowerLeft execution mode must not be used; fragment entry points must declare OriginUpperLeft.)
  q.v.: VulkanSpec1.0.28 / A.3. Validation Rules within a Module
• Three ways to fix it: add a vertex shader post-fix, multiply a VMatrix in front of the MVP, or modify the math functions.
NDC space (e.g. DepthFunc LESS, depth clear 1.0f)
  OpenGL ES: X (-1 ~ +1), Y (-1 ~ +1, pointing up), Z (-1 ~ +1)
  Vulkan: X (-1 ~ +1), Y (-1 ~ +1, pointing down), Z (0 ~ +1)
Development Tip - Vulkan Viewport Volume
// Vertex shader
void main() {
    gl_Position = vec4(0.5, 0.5, 0.0, 1.0);
}
Development Tip - Vulkan Viewport Volume
• Add a vertex shader post-fix.
#version 310 es
precision highp float;
…
void main() {
    …
    gl_Position = MVP * _vertex;
    gl_Position.y = -gl_Position.y;                          // added
    gl_Position.z = (gl_Position.z + gl_Position.w) / 2.0;   // added
}
You will face this kind of problem in Vulkan without the above correction.
Development Tip - Vulkan Viewport Volume
• Multiply a VMatrix in front of the MVP.
// Row-major storage
MAT4 VMatrix = { 1.0f,  0.0f, 0.0f, 0.0f,
                 0.0f, -1.0f, 0.0f, 0.0f,
                 0.0f,  0.0f, 0.5f, 0.5f,
                 0.0f,  0.0f, 0.0f, 1.0f };
Or, if your math library stores matrices column-major (as GLSL and GLM do):
MAT4 VMatrix = { 1.0f,  0.0f, 0.0f, 0.0f,
                 0.0f, -1.0f, 0.0f, 0.0f,
                 0.0f,  0.0f, 0.5f, 0.0f,
                 0.0f,  0.0f, 0.5f, 1.0f };
MVP = VMatrix * P * V * M;
setUniform(MVP);
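A quick numeric check that the shader post-fix and the VMatrix do the same thing, and that the second VMatrix listing is just the first one transposed, i.e. the same matrix in column-major storage. Vec4, Mat4 and the helper functions are hypothetical stand-ins for your math library.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<float, 16>;  // row-major: element (row, col) is at [row * 4 + col]

// The vertex-shader post-fix: flip Y and remap Z from [-w, w] to [0, w].
Vec4 postFix(Vec4 p)
{
    p[1] = -p[1];
    p[2] = (p[2] + p[3]) / 2.0f;
    return p;
}

// Row-major 4x4 matrix times column vector.
Vec4 mul(const Mat4& m, const Vec4& v)
{
    Vec4 r{};
    for (int row = 0; row < 4; ++row) {
        for (int col = 0; col < 4; ++col) {
            r[row] += m[row * 4 + col] * v[col];
        }
    }
    return r;
}

// Swap rows and columns, i.e. convert between row-major and column-major storage.
Mat4 transpose(const Mat4& m)
{
    Mat4 t{};
    for (int row = 0; row < 4; ++row) {
        for (int col = 0; col < 4; ++col) {
            t[col * 4 + row] = m[row * 4 + col];
        }
    }
    return t;
}

// The first VMatrix listing, interpreted as row-major.
const Mat4 kVMatrixRowMajor = {1.0f, 0.0f,  0.0f, 0.0f,
                               0.0f, -1.0f, 0.0f, 0.0f,
                               0.0f, 0.0f,  0.5f, 0.5f,
                               0.0f, 0.0f,  0.0f, 1.0f};
```

A point on the OpenGL near plane (z = -w) lands at Vulkan depth 0 and a point on the far plane (z = w) at depth w, matching the Z (0 ~ +1) range after the perspective divide.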
Development Tip - Vulkan Viewport Volume
Modify the math functions (q.v.: http://glm.g-truc.net/0.9.8/index.html)
#define GLM_LEFT_HANDED  0x00000001 // For DirectX, Metal, Vulkan
#define GLM_RIGHT_HANDED 0x00000002 // For OpenGL, default in GLM

// Ortho
#if GLM_DEPTH_CLIP_SPACE == GLM_DEPTH_ZERO_TO_ONE
    Result[2][2] = - static_cast<T>(1) / (zFar - zNear);
    Result[3][2] = - zNear / (zFar - zNear);
#else
    Result[2][2] = - static_cast<T>(2) / (zFar - zNear);
    Result[3][2] = - (zFar + zNear) / (zFar - zNear);
#endif

#if GLM_COORDINATE_SYSTEM == GLM_LEFT_HANDED
    return orthoLH(left, right, bottom, top, zNear, zFar);
#else
    return orthoRH(left, right, bottom, top, zNear, zFar);
#endif

// Frustum
#if GLM_DEPTH_CLIP_SPACE == GLM_DEPTH_ZERO_TO_ONE
    Result[2][2] = farVal / (farVal - nearVal);
    Result[3][2] = -(farVal * nearVal) / (farVal - nearVal);
#else
    Result[2][2] = (farVal + nearVal) / (farVal - nearVal);
    Result[3][2] = - (static_cast<T>(2) * farVal * nearVal) / (farVal - nearVal);
#endif

#if GLM_COORDINATE_SYSTEM == GLM_LEFT_HANDED
    return frustumLH(left, right, bottom, top, nearVal, farVal);
#else
    return frustumRH(left, right, bottom, top, nearVal, farVal);
#endif

// Perspective
#if GLM_DEPTH_CLIP_SPACE == GLM_DEPTH_ZERO_TO_ONE
    Result[2][2] = zFar / (zFar - zNear);
    Result[3][2] = -(zFar * zNear) / (zFar - zNear);
#else
    Result[2][2] = (zFar + zNear) / (zFar - zNear);
    Result[3][2] = - (static_cast<T>(2) * zFar * zNear) / (zFar - zNear);
#endif

#if GLM_COORDINATE_SYSTEM == GLM_LEFT_HANDED
    return perspectiveFovLH(fov, width, height, zNear, zFar);
#else
    return perspectiveFovRH(fov, width, height, zNear, zFar);
#endif
Development Tip - Validation Layer
The loader supports layering APIs.
[Diagram: the application calls a Vulkan function (e.g. vkQueueSubmit); the call goes through the loader (libvulkan.so) and each enabled layer (libVkLayerXXX.so, libVkLayerXXXX.so), each of which intercepts vkQueueSubmit, before reaching the driver (vulkan.XXX.so)]
Development Tip - Validation Layer
Sometimes you may face VK_ERROR_DEVICE_LOST or an unexpected error (GPU hang).
Turn on the validation layer and fix it!
• [MEM, 3]: Linear Image 0x515 is aliased with non-linear image 0x507 which is in violation of the Buffer-Image Granularity section of the Vulkan specification
• [DS, 49]: DS 0x890 encountered the following validation error at draw time: Dynamic descriptor in binding #1 at global descriptor index 2 uses buffer 4 with dynamic offset 524000 combined with offset 0 and range 720 that oversteps the buffer size of 524288
• [MEM, 12]: vkCmdBeginRenderPass(): cannot read invalid memory 0x26, please fill the memory before using
• Code 7 : Cannot map an image with layout VK_IMAGE_LAYOUT_UNDEFINED. Only GENERAL or PREINITIALIZED are supported.
• Code 7 : Cannot submit cmd buffer using image (0xffffffffc91fe3b0) [sub-resource: aspectMask 0x1 array layer 2, mip level 5], with layout VK_IMAGE_LAYOUT_UNDEFINED when first use is VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL.
• Code 10 : Attempt to set lineWidth to 0.000000 but physical device wideLines feature not supported/enabled so lineWidth must be 1.0f!
• Code 7 : Cannot submit cmd buffer using image (0xffffffffc7fe7ea0) [sub-resource: aspectMask 0x1 array layer 0, mip level 0], with layout VK_IMAGE_LAYOUT_UNDEFINED when first use is VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL.
• [MEM] Code 12 : vkCmdBeginRenderPass(): Cannot read invalid memory 0xffffffffc93bf470, please fill the memory before using.
• Code 7 : You cannot transition the layout from VK_IMAGE_LAYOUT_GENERAL when current layout is VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL.
• Code 25 : Unable to allocate 2 descriptors of type VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER from pool 0xffffffffcf476468. This pool only has 0 descriptors of this type remaining.
• Code 54 : Attempt to reset command buffer (0x28ceabb21c) which is in use.
• [MEM] Code 9 : Calling vkBeginCommandBuffer() on active CB 0x0xceabb21c before it has completed. You must check CB fence before this call.
• Code 27 : vkUpdateDescriptorsSets() failed write update validation for Descriptor Set 0xffffffffd0e00188 with error: Cannot call vkUpdateDescriptorSets() to perform write update on descriptor set 18446744072918925704 that is in use by a command buffer. (FIX Candidate CL 83010)
• Code 53 : Command Buffer 0xcf3be004 is already in use and is not marked for simultaneous use.
• Code 54 : Attempt to reset command buffer (0x28ceab721c) which is in use.
• Code 9 : Calling vkBeginCommandBuffer() on active CB 0x0xceab721c before it has completed. You must check CB fence before this call.
• Code 7 : You cannot transition the layout from VK_IMAGE_LAYOUT_GENERAL when current layout is VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL.
Samsung Electronics, MCD, GPU
Validation Error #1
Code 9 : Calling vkBeginCommandBuffer() on active CB 0x0xceab721c before it has completed. You must check CB fence before this call.
Code 53 : Command Buffer 0xcf3be004 is already in use and is not marked for simultaneous use.
Code 54 : Attempt to reset command buffer (0x28ceab721c) which is in use.
[Diagram: command buffers N-2, N-1 and N submitted over frames N, N+1 and N+2; the wait index advances as ++index % 3]
• Normal case: vkQueueSubmit(N), then vkWaitForFences(N-2), then vkResetCommandBuffer(N-2) before command buffer N-2 is recorded again.
• Bug case: vkQueueSubmit(N), then vkWaitForFences(N), then vkResetCommandBuffer(N-2) / vkBeginCommandBuffer(N-2) (ONE_TIME_SUBMIT). The fence that was waited on is not the one belonging to command buffer N-2, so the validation layer still sees N-2 as in use.
Validation Error #2
[MEM, 3]: Linear Image 0x515 is aliased with non-linear image 0x507 which is in violation of the Buffer-Image Granularity section of the Vulkan specification
• Bug case: Image (swapchain, TILING_OPTIMAL) → vkCmdCopyImage → Image (staging, TILING_LINEAR) → read memory → captured RAW data
• Normal case: Image (swapchain, TILING_OPTIMAL) → vkCmdCopyImageToBuffer → Buffer (staging) → read memory → captured RAW data
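The check behind this message comes from the Buffer-Image Granularity section of the spec: a linear resource (a buffer or a VK_IMAGE_TILING_LINEAR image) and an optimal-tiled image bound to the same VkDeviceMemory must not share a "page" of VkPhysicalDeviceLimits::bufferImageGranularity bytes. A sketch of that test, modeled on the example in the specification and assuming resource A is placed before resource B with a power-of-two page size.

```cpp
#include <cassert>
#include <cstdint>

// True if the last byte of resource A and the first byte of resource B fall on
// the same bufferImageGranularity page. If one resource is linear and the
// other optimal-tiled, this must be false, or B must move to the next page.
bool isOnSamePage(std::uint64_t resourceAOffset, std::uint64_t resourceASize,
                  std::uint64_t resourceBOffset, std::uint64_t pageSize)
{
    assert(resourceASize > 0 && pageSize > 0);
    assert(resourceAOffset + resourceASize <= resourceBOffset);
    assert((pageSize & (pageSize - 1)) == 0);  // the mask below assumes a power of two
    const std::uint64_t resourceAEnd = resourceAOffset + resourceASize - 1;
    const std::uint64_t resourceAEndPage = resourceAEnd & ~(pageSize - 1);
    const std::uint64_t resourceBStartPage = resourceBOffset & ~(pageSize - 1);
    return resourceAEndPage == resourceBStartPage;
}
```

Copying into a VkBuffer instead of a linear image, as in the normal case, sidesteps the linear/optimal image aliasing the layer reported in this case.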
Samsung Electronics, MCD, GPU
Validation Error #3
Code 7 : Cannot submit cmd buffer using image (0xffffffffc91fe3b0) [sub-resource: aspectMask 0x1 array layer 2, mip level 5], with layout VK_IMAGE_LAYOUT_UNDEFINED when first use is VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL.
Code 7 : You cannot transition the layout from VK_IMAGE_LAYOUT_GENERAL when current layout is VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL.
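And so on…
Fix: before a copy, resolve, clear, etc., transition the image to the layout that operation requires, and afterwards restore the original image layout. The access mask used in each image memory barrier is derived from the layout: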
VkAccessFlags GetAccessMaskFromLayout(VkImageLayout ImageLayout)
{
    switch (ImageLayout) {
    case VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL: return VK_ACCESS_TRANSFER_READ_BIT;
    case VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL: return VK_ACCESS_TRANSFER_WRITE_BIT;
    case VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL: return VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
    case VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL: return VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT;
    case VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL: return VK_ACCESS_SHADER_READ_BIT | VK_ACCESS_INPUT_ATTACHMENT_READ_BIT;
    case VK_IMAGE_LAYOUT_DEPTH_STENCIL_READ_ONLY_OPTIMAL: return VK_ACCESS_SHADER_READ_BIT | VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_READ_BIT;
    case VK_IMAGE_LAYOUT_GENERAL:
        // GENERAL can be used for any access, so cover every access type
        return VK_ACCESS_INPUT_ATTACHMENT_READ_BIT | VK_ACCESS_SHADER_READ_BIT | VK_ACCESS_SHADER_WRITE_BIT |
               VK_ACCESS_COLOR_ATTACHMENT_READ_BIT | VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT |
               VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_READ_BIT | VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT |
               VK_ACCESS_TRANSFER_READ_BIT | VK_ACCESS_TRANSFER_WRITE_BIT |
               VK_ACCESS_MEMORY_READ_BIT | VK_ACCESS_MEMORY_WRITE_BIT;
    case VK_IMAGE_LAYOUT_PRESENT_SRC_KHR: return VK_ACCESS_MEMORY_READ_BIT;
    case VK_IMAGE_LAYOUT_UNDEFINED:
    case VK_IMAGE_LAYOUT_PREINITIALIZED:
    default:
        return 0;  // no previous access to make visible (also avoids falling off the end)
    }
}
The resulting masks are used as srcAccessMask / dstAccessMask when setting the image memory barrier.
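The "transition, then restore the original layout" pattern only works if the code always knows an image's current layout, which is exactly what the Code 7 errors are about. Below is a minimal sketch of a per-image layout tracker; `Layout`, `Barrier` and `LayoutTracker` are hypothetical stand-ins (not Vulkan types) that record the oldLayout/newLayout pairs a real VkImageMemoryBarrier would carry.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for a few VkImageLayout values.
enum class Layout { Undefined, General, ColorAttachment, TransferSrc, TransferDst, ShaderReadOnly };

// What one VkImageMemoryBarrier would encode (access masks would come from the layout switch).
struct Barrier {
    std::uint32_t image;
    Layout oldLayout;
    Layout newLayout;
};

class LayoutTracker {
public:
    void track(std::uint32_t image, Layout initial) { current_[image] = initial; }
    Layout current(std::uint32_t image) const { return current_.at(image); }

    // Moves `image` into `needed` (e.g. TransferDst for a copy or clear) and returns
    // the layout it had before, so the caller can restore it afterwards.
    Layout transition(std::uint32_t image, Layout needed, std::vector<Barrier>& out) {
        assert(current_.count(image) != 0 && "image must be tracked before use");
        const Layout original = current_[image];
        if (original != needed) {
            out.push_back({image, original, needed});  // oldLayout is always the real current layout
            current_[image] = needed;
        }
        return original;
    }

    // Restores the layout saved by transition(). UNDEFINED is not a valid newLayout,
    // so an image that started out UNDEFINED simply keeps its new layout.
    void restore(std::uint32_t image, Layout original, std::vector<Barrier>& out) {
        if (original != Layout::Undefined) transition(image, original, out);
    }

private:
    std::unordered_map<std::uint32_t, Layout> current_;
};
```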
Validation Error #4
[DS, 49]: DS 0x890 encountered the following validation error at draw time: Dynamic descriptor in binding#1 at global descriptor index 2 uses buffer 4 with dynamic offset 524000 combined with offset 0 and range 720 that oversteps the buffer size of 524288
BUFFER (524288): dynamic offset 524000 + range 720 = 524720 > 524288
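One way to avoid overstepping is to hand out dynamic offsets from a ring that wraps to the start of the buffer before a slice would run past the end. This is a sketch under stated assumptions, not the authors' code: `sliceFits` and `DynamicUniformRing` are hypothetical, `alignment` stands for VkPhysicalDeviceLimits::minUniformBufferOffsetAlignment (a power of two, since uniform dynamic offsets must be multiples of it), and wrapping to 0 is only safe if the GPU has finished with that region, which a real engine would guarantee with per-frame fences.

```cpp
#include <cassert>
#include <cstdint>

// True if a dynamic descriptor slice stays inside the buffer (the check behind [DS, 49]).
bool sliceFits(std::uint64_t bufferSize, std::uint64_t dynamicOffset,
               std::uint64_t descriptorOffset, std::uint64_t range) {
    return dynamicOffset + descriptorOffset + range <= bufferSize;
}

// Hypothetical ring allocator handing out dynamic offsets into one uniform buffer.
class DynamicUniformRing {
public:
    DynamicUniformRing(std::uint64_t size, std::uint64_t alignment)
        : size_(size), alignment_(alignment) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    }

    // Returns the dynamic offset to pass to vkCmdBindDescriptorSets for `range` bytes.
    std::uint64_t allocate(std::uint64_t range) {
        assert(range <= size_);
        std::uint64_t offset = (head_ + alignment_ - 1) & ~(alignment_ - 1);  // align up
        if (!sliceFits(size_, offset, 0, range)) offset = 0;  // wrap instead of overstepping
        head_ = offset + range;
        return offset;
    }

private:
    std::uint64_t size_;
    std::uint64_t alignment_;
    std::uint64_t head_ = 0;
};
```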
[MEM, 12]: vkCmdBeginRenderPass(): cannot read invalid memory 0x26, please fill the memory before using
RenderPass - Attachment Description: VK_ATTACHMENT_LOAD_OP_DONT_CARE → VK_ATTACHMENT_LOAD_OP_CLEAR / LOAD
Code 10 : Attempt to set lineWidth to 0.000000 but physical device wideLines feature not supported/enabled so lineWidth must be 1.0f!
No debug lines? Check VkPipelineRasterizationStateCreateInfo.lineWidth.
Development Tip - VkPipelineCache
If you don't use VkPipelineCache, you may face performance problems (lag).
vkCreateGraphicsPipelines is really slow, so we need to use VkPipelineCache.
Game loading time (createGraphicPipeline called 300+ times):
Without VkPipelineCache: 13.260 seconds
With VkPipelineCache (persistent): 4.187 seconds
onResume (load the saved cache data and create the VkPipelineCache):
std::vector<unsigned char> pipelineCacheData = getPipelineCacheFromSDcard();
VkPipelineCacheCreateInfo pipelineCacheCreateInfo = {};
pipelineCacheCreateInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_CACHE_CREATE_INFO;
pipelineCacheCreateInfo.initialDataSize = pipelineCacheData.size();
pipelineCacheCreateInfo.pInitialData = pipelineCacheData.data();
VkPipelineCache pipelineCache = VK_NULL_HANDLE;
vkCreatePipelineCache(device, &pipelineCacheCreateInfo, nullptr, &pipelineCache);

createGraphicPipeline (pass the cache to every pipeline creation):
vkCreateGraphicsPipelines(device, pipelineCache, 1, &createInfo, nullptr, &pipeline);

onPause (read the cache data back and save it):
size_t dataSize = 0;
// first call with a null pointer only queries the size
if (vkGetPipelineCacheData(device, pipelineCache, &dataSize, nullptr) == VK_SUCCESS && dataSize > 0) {  // if valid
    std::vector<unsigned char> pipelineCacheData(dataSize);
    vkGetPipelineCacheData(device, pipelineCache, &dataSize, pipelineCacheData.data());
    savePipelineCacheToSDcard(pipelineCacheData);
}
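Before a blob read back from storage is passed as pInitialData, it can be checked against the current device. Per the Vulkan specification, a version-one pipeline cache header is 32 bytes: headerSize, headerVersion, vendorID and deviceID (uint32 each, written least-significant byte first) followed by the 16-byte pipelineCacheUUID. Drivers are expected to reject incompatible data themselves, so treat this as an optional defensive check; `DeviceIdentity` and `pipelineCacheBlobMatches` are hypothetical names, and the identity values would come from VkPhysicalDeviceProperties.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads a little-endian uint32 (pipeline cache header fields are stored LSB first).
static std::uint32_t readLE32(const std::vector<std::uint8_t>& d, std::size_t off) {
    assert(off + 4 <= d.size());
    return static_cast<std::uint32_t>(d[off]) | (static_cast<std::uint32_t>(d[off + 1]) << 8) |
           (static_cast<std::uint32_t>(d[off + 2]) << 16) | (static_cast<std::uint32_t>(d[off + 3]) << 24);
}

// Values taken from VkPhysicalDeviceProperties (vendorID, deviceID, pipelineCacheUUID).
struct DeviceIdentity {
    std::uint32_t vendorID;
    std::uint32_t deviceID;
    std::array<std::uint8_t, 16> pipelineCacheUUID;
};

// True if `blob` starts with a version-one header written by the same device and driver.
bool pipelineCacheBlobMatches(const std::vector<std::uint8_t>& blob, const DeviceIdentity& dev) {
    constexpr std::size_t kHeaderSize = 32;   // 4 x uint32 + 16-byte UUID
    constexpr std::uint32_t kVersionOne = 1;  // VK_PIPELINE_CACHE_HEADER_VERSION_ONE
    if (blob.size() < kHeaderSize) return false;
    if (readLE32(blob, 0) < kHeaderSize) return false;   // headerSize
    if (readLE32(blob, 4) != kVersionOne) return false;  // headerVersion
    if (readLE32(blob, 8) != dev.vendorID) return false;
    if (readLE32(blob, 12) != dev.deviceID) return false;
    for (std::size_t i = 0; i < 16; ++i)
        if (blob[16 + i] != dev.pipelineCacheUUID[i]) return false;  // driver build changed
    return true;
}
```

If the check fails, the blob can simply be dropped and an empty cache created (initialDataSize = 0); the next onPause then writes fresh data.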
Development Tip - Managing VkPipeline
Let's think about a very simple renderer logic.
Creating a pipeline needs a lot of information:
VkGraphicsPipelineCreateInfo, VkPipelineVertexInputStateCreateInfo, VkPipelineInputAssemblyStateCreateInfo, VkPipelineRasterizationStateCreateInfo, VkPipelineColorBlendStateCreateInfo, VkPipelineDepthStencilStateCreateInfo, VkPipelineViewportStateCreateInfo, VkPipelineMultisampleStateCreateInfo, VkDynamicState, VkPipelineDynamicStateCreateInfo, VkPipelineShaderStageCreateInfo…
Renderer logic: Initialize → setShader → setRenderState → setTexture → draw
OpenGL ES: glUseProgram, glEnable / glDisable / gl..., glBindTexture, glDraw…
Vulkan: vkCreateGraphicsPipelines, vkCmdDraw…
Development Tip - Managing VkPipeline
Make a structure to reuse VkPipeline objects.
[Figure: VertexShader / FragmentShader (ignore this block in the current case); VertexAttribute #0 and #1 (stride, location, binding) → VkPipelineVertexInputStateCreateInfo, …; RenderState #0 (depth enable, …) and RenderState #1 (depth disable, …) → VkPipelineDepthStencilStateCreateInfo, …; vkCreateGraphicsPipelines → VkPipeline #0, VkPipeline #1]
In the worst case, for example, the render state and vertex attributes can change on every draw call (setShader → setRenderState → setTexture → draw).
An efficiently designed pipeline management structure is therefore very important for performance optimization.
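A minimal sketch of such a structure, assuming the pipeline is fully determined by a vertex-attribute layout id and a render-state id (shaders ignored, as in the figure): look the combination up in a hash map and only call the creation path on a miss. `PipelineKey`, `PipelineRegistry` and the `create` callback are hypothetical; in real code the callback would fill VkGraphicsPipelineCreateInfo from the key and call vkCreateGraphicsPipelines (with the VkPipelineCache).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical key: which vertex attribute layout and render state a draw uses.
struct PipelineKey {
    std::uint32_t vertexLayoutId;  // e.g. VertexAttribute #0 / #1
    std::uint32_t renderStateId;   // e.g. RenderState #0 (depth enable) / #1 (depth disable)
    bool operator==(const PipelineKey& o) const {
        return vertexLayoutId == o.vertexLayoutId && renderStateId == o.renderStateId;
    }
};

struct PipelineKeyHash {
    std::size_t operator()(const PipelineKey& k) const {
        const std::uint64_t packed = (static_cast<std::uint64_t>(k.vertexLayoutId) << 32) | k.renderStateId;
        return std::hash<std::uint64_t>{}(packed);
    }
};

using PipelineHandle = std::uint64_t;  // stand-in for VkPipeline

class PipelineRegistry {
public:
    explicit PipelineRegistry(std::function<PipelineHandle(const PipelineKey&)> create)
        : create_(std::move(create)) {}

    // Called per draw: returns an existing pipeline, creating one only on first use.
    PipelineHandle get(const PipelineKey& key) {
        auto it = cache_.find(key);
        if (it != cache_.end()) return it->second;  // reuse: no pipeline creation
        const PipelineHandle p = create_(key);
        assert(p != 0 && "creation callback must return a valid handle");
        cache_.emplace(key, p);
        return p;
    }

    std::size_t size() const { return cache_.size(); }

private:
    std::function<PipelineHandle(const PipelineKey&)> create_;
    std::unordered_map<PipelineKey, PipelineHandle, PipelineKeyHash> cache_;
};
```

Even if state changes on every draw call, the slow creation path runs only once per distinct combination.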
Development Tip - Managing VkRenderpass, VkFramebuffer
VkFramebufferCreateInfo { … VkRenderPass renderPass; … }
VkRenderPassCreateInfo { … uint32_t attachmentCount; const VkAttachmentDescription* pAttachments; … }
VkAttachmentDescription { … VkAttachmentLoadOp loadOp; VkAttachmentStoreOp storeOp; … }
loadOp: VK_ATTACHMENT_LOAD_OP_LOAD / VK_ATTACHMENT_LOAD_OP_CLEAR / VK_ATTACHMENT_LOAD_OP_DONT_CARE
vkCreateRenderPass → VkRenderPass #0, VkRenderPass #1
vkCreateFramebuffer → VkFramebuffer #0, VkFramebuffer #1
VkRenderPass and VkFramebuffer objects should also be reused.
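A sketch of that reuse, under stated assumptions: one render pass per distinct list of per-attachment load ops, and one framebuffer per set of attachment views. Render passes that differ only in load/store ops are compatible, and a framebuffer may be used with any render pass compatible with the one it was created with, so the sketch keys framebuffers on the views alone (assuming all cached passes share attachment formats and sample counts). `LoadOp`, `Handle` and `RenderTargetCache` are hypothetical stand-ins; no Vulkan calls are made.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-ins: LoadOp mirrors VkAttachmentLoadOp, Handle stands for
// VkRenderPass / VkFramebuffer / VkImageView.
enum class LoadOp : std::uint8_t { Load, Clear, DontCare };
using Handle = std::uint64_t;

class RenderTargetCache {
public:
    // One render pass per distinct list of per-attachment load ops.
    Handle renderPass(const std::vector<LoadOp>& loadOps) {
        auto it = passes_.find(loadOps);
        if (it != passes_.end()) return it->second;
        const Handle h = next_++;  // real code: fill VkAttachmentDescription::loadOp, vkCreateRenderPass
        passes_.emplace(loadOps, h);
        return h;
    }

    // One framebuffer per set of attachment views; `compatiblePass` would only be used
    // as VkFramebufferCreateInfo::renderPass when the framebuffer is first created.
    Handle framebuffer(const std::vector<Handle>& views, Handle compatiblePass) {
        assert(compatiblePass != 0);
        auto it = framebuffers_.find(views);
        if (it != framebuffers_.end()) return it->second;
        const Handle h = next_++;  // real code: vkCreateFramebuffer
        framebuffers_.emplace(views, h);
        return h;
    }

    std::size_t renderPassCount() const { return passes_.size(); }
    std::size_t framebufferCount() const { return framebuffers_.size(); }

private:
    Handle next_ = 1;
    std::map<std::vector<LoadOp>, Handle> passes_;
    std::map<std::vector<Handle>, Handle> framebuffers_;
};
```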
Development Tip - Clear framebuffer cost
There are 3 ways to clear a framebuffer (color, depth, stencil):
• Render pass load operation
• vkCmdClearAttachments
• vkCmdClearColorImage / vkCmdClearDepthStencilImage
It's important to use the proper clear approach for each case (e.g. clear all, color only, depth only) so as not to waste additional clear cost.
We recommend not clearing the framebuffer by beginning and ending an empty render pass (begin()/end() without actual draw calls), etc.
Test case: 1 color clear & 30 depth clears
• Render pass begin/end using LoadOpClear: 24 FPS
• vkCmdClearAttachments: 57 FPS
Development Tip - Clear framebuffer cost
[Figure: a sequence of clears (Clear All, then Clear Depth repeatedly) on the Color, Depth and Stencil attachments, implemented two ways. Only Renderpass begin/end: every clear is its own render pass, using LoadOpClear on the cleared attachments and LoadOpLoad on the others. Renderpass begin/end + APIs (faster!): one render pass begins with LoadOpClear, and the later depth clears use vkCmdClearAttachments.]
Development Tip - Clear framebuffer cost
Very simple example, for description only.
Request clear:
• Inside a render pass? true → vkCmdClearAttachments(); false → set a variable to VK_ATTACHMENT_LOAD_OP_CLEAR (need to store it!).
• Next event.
Request drawPrimitive:
• Inside a render pass? true → DrawPrimitive(); false → get the stored VK_ATTACHMENT_LOAD_OP_CLEAR variable and find a proper render pass (CreateRenderpass() if none is found), then vkCmdBeginRenderPass() and DrawPrimitive().
• Next event.
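The flowchart can be sketched as a small recorder that defers a clear requested outside a render pass into the load op of the next render pass, and uses vkCmdClearAttachments for clears requested inside one. `DeferredClearRecorder` and `LoadOp` are hypothetical; instead of issuing real vkCmd* calls the sketch appends command names to `log` so the resulting sequence is visible.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for VkAttachmentLoadOp.
enum class LoadOp { Load, Clear };

class DeferredClearRecorder {
public:
    std::vector<std::string> log;  // names of the commands that would be recorded

    void requestClear() {
        if (insideRenderPass_) {
            log.push_back("vkCmdClearAttachments");  // clear within the current pass
        } else {
            pendingLoadOp_ = LoadOp::Clear;           // need to store it!
        }
    }

    void requestDraw() {
        if (!insideRenderPass_) {
            const int pass = findOrCreateRenderPass(pendingLoadOp_);
            log.push_back("vkCmdBeginRenderPass #" + std::to_string(pass));
            insideRenderPass_ = true;
            pendingLoadOp_ = LoadOp::Load;  // the stored clear is consumed by this pass
        }
        assert(insideRenderPass_);  // draws are only recorded inside a render pass
        log.push_back("vkCmdDraw");
    }

    void endRenderPass() {
        if (insideRenderPass_) {
            log.push_back("vkCmdEndRenderPass");
            insideRenderPass_ = false;
        }
    }

private:
    // "Find proper Renderpass", creating it on first use.
    int findOrCreateRenderPass(LoadOp op) {
        auto it = passes_.find(op);
        if (it != passes_.end()) return it->second;
        const int id = static_cast<int>(passes_.size());
        passes_.emplace(op, id);
        log.push_back("CreateRenderpass #" + std::to_string(id));
        return id;
    }

    bool insideRenderPass_ = false;
    LoadOp pendingLoadOp_ = LoadOp::Load;
    std::map<LoadOp, int> passes_;
};
```

A clear followed by a draw therefore costs no extra render pass: the clear rides on the pass's load op, and only clears issued mid-pass fall back to vkCmdClearAttachments.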
Development Tip - PushConstant
Push constants help increase performance (the effect is GPU dependent).
They are very easy to use, but you should check the device limit VkPhysicalDeviceLimits::maxPushConstantsSize.
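// VertexShader
…
layout(push_constant) uniform buf1 {
    mat4 _unif00;
} pc;  // you cannot skip the instance name if the uniform block is push_constant
void main() {
    gl_Position = pc._unif00 * _in_vertex;
}

vkCmdPushConstants(commandBuffer, layout, stageFlags, offset, MVPMatrix.size() * sizeof(MVPMatrix[0]) /* size in bytes */, MVPMatrix.data());
The same range (stage flags, offset, size) must be declared as a VkPushConstantRange in the VkPipelineLayout.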
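The device-limit check can be made explicit. The rules below mirror the valid-usage rules for VkPushConstantRange (offset and size multiples of 4, size non-zero, offset + size within maxPushConstantsSize; the spec guarantees at least 128 bytes). `pushConstantRangeValid`, `ConstantPath` and the fall-back-to-uniform-buffer policy in `choosePath` are hypothetical illustrations, written with plain integers so the sketch compiles without Vulkan headers.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the VkPushConstantRange rules against VkPhysicalDeviceLimits::maxPushConstantsSize.
bool pushConstantRangeValid(std::uint32_t offset, std::uint32_t size, std::uint32_t maxPushConstantsSize) {
    if (offset % 4 != 0 || size % 4 != 0 || size == 0) return false;
    if (offset >= maxPushConstantsSize) return false;
    return size <= maxPushConstantsSize - offset;  // written to avoid overflow in offset + size
}

// Hypothetical policy: push constants when the data fits, otherwise a uniform buffer.
enum class ConstantPath { PushConstant, UniformBuffer };

ConstantPath choosePath(std::uint32_t bytes, std::uint32_t maxPushConstantsSize) {
    assert(bytes > 0);
    return pushConstantRangeValid(0, bytes, maxPushConstantsSize) ? ConstantPath::PushConstant
                                                                  : ConstantPath::UniformBuffer;
}
```

A single mat4 MVP matrix (64 bytes) fits even the minimum 128-byte limit, which is why it is the typical push-constant payload.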
Development Tip - Swizzle
Expected / Error
The map texture was ETC1 data (VK_FORMAT_ETC2_R8G8B8_UNORM_BLOCK, which decodes ETC1), a format without an alpha channel.
// Fragment shader
void main() {
    …
    vec4 mapColor = texture(mapSampler, texCoord);
    fragColor = mapColor.rgb * mapColor.a;
}
Development Tip - Swizzle
VkComponentMapping {
    VkComponentSwizzle r;
    VkComponentSwizzle g;
    VkComponentSwizzle b;
    VkComponentSwizzle a;
};
VkImageViewCreateInfo {
    …
    VkComponentMapping components;
    …
};
Fix: set components.a = VK_COMPONENT_SWIZZLE_ONE so the alpha channel always reads as 1.0.
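To see why VK_COMPONENT_SWIZZLE_ONE fixes the shader output, the remapping an image view applies to each fetched texel can be modeled on the CPU. This is only a model of the semantics: `Swizzle` mirrors VkComponentSwizzle, `applyMapping` is a hypothetical helper, and the fetched alpha of 0.0 in the usage below is an assumed stand-in for "not 1.0" in the error case.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in mirroring VkComponentSwizzle.
enum class Swizzle { Identity, Zero, One, R, G, B, A };

using RGBA = std::array<float, 4>;

// Value for output component `index` (0 = r, 1 = g, 2 = b, 3 = a) under swizzle `s`.
float applyOne(const RGBA& texel, Swizzle s, std::size_t index) {
    assert(index < 4);
    switch (s) {
        case Swizzle::Identity: return texel[index];  // component maps to itself
        case Swizzle::Zero:     return 0.0f;
        case Swizzle::One:      return 1.0f;
        case Swizzle::R:        return texel[0];
        case Swizzle::G:        return texel[1];
        case Swizzle::B:        return texel[2];
        case Swizzle::A:        return texel[3];
    }
    return texel[index];  // unreachable; keeps compilers quiet
}

// Equivalent of VkImageViewCreateInfo::components = {r, g, b, a}.
RGBA applyMapping(const RGBA& texel, const std::array<Swizzle, 4>& components) {
    RGBA out{};
    for (std::size_t i = 0; i < 4; ++i) out[i] = applyOne(texel, components[i], i);
    return out;
}
```

With `{Identity, Identity, Identity, One}`, `mapColor.rgb * mapColor.a` in the fragment shader reduces to `mapColor.rgb`, the expected result for an RGB-only texture.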
Development Tip - SecondaryCommandBuffer + Multi-thread
This is simple logic for using a SecondaryCommandBuffer (assume that there are no dynamic PSOs).
Recording phase: Create Secondary CommandBuffer → Begin → Bind GraphicPipeline → Bind DescriptorSet → Bind VertexBuffer / IndexBuffer → Draw → End
Update phase: Update UniformBuffer
Execute phase: Begin Primary CommandBuffer → Execute Secondary CommandBuffer → End Primary CommandBuffer
Development Tip - SecondaryCommandBuffer + Multi-thread
[Figure: SCB #0 … SCB #5, each bound to its own UniformBuffer #0 … #5.]
Create & record the SecondaryCommandBuffers once, binding each one's UniformBuffer.
Draw phase: just update the UniformBuffers & execute the SecondaryCommandBuffers!
Development Tip - SecondaryCommandBuffer + Multi-thread
[Figure: OpenGL ES 2.0 + single thread vs. Vulkan + SecondaryCommandBuffer + multi-thread.]
※ Matrix transforms are calculated on the CPU side.
Development Tip - SecondaryCommandBuffer + Multi-thread
[Figure: with secondary command buffers, four CPU threads each update a quarter of the uniform buffers (Update buffer 1/4, 2/4, 3/4, 4/4); the queue then executes the SecondaryCommandBuffers.]
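The "Update buffer 1/4 … 4/4" step is data-parallel: each thread writes a disjoint slice, so no locking is needed. A minimal sketch with std::thread, assuming each object's uniform data lives in its own slot; `ObjectUniform`, `updateRange` and `updateAllParallel` are hypothetical, and in a real renderer the slots would be mapped uniform buffer memory whose writes must complete (the join below) before the queue executes the secondary command buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical per-object uniform data: one column-major 4x4 matrix.
struct ObjectUniform {
    float mvp[16];
};

// Stand-in for the CPU-side matrix transform: identity plus an x translation.
void updateRange(std::vector<ObjectUniform>& objects, std::size_t begin, std::size_t end, float time) {
    for (std::size_t i = begin; i < end; ++i) {
        float* m = objects[i].mvp;
        for (int k = 0; k < 16; ++k) m[k] = (k % 5 == 0) ? 1.0f : 0.0f;  // indices 0, 5, 10, 15
        m[12] = time + static_cast<float>(i);                               // x translation
    }
}

// Splits the update into `threadCount` contiguous, non-overlapping slices.
void updateAllParallel(std::vector<ObjectUniform>& objects, float time, std::size_t threadCount) {
    assert(threadCount > 0);
    std::vector<std::thread> workers;
    const std::size_t n = objects.size();
    for (std::size_t t = 0; t < threadCount; ++t) {
        const std::size_t begin = n * t / threadCount;
        const std::size_t end = n * (t + 1) / threadCount;
        workers.emplace_back(updateRange, std::ref(objects), begin, end, time);
    }
    for (std::thread& w : workers) w.join();  // every slice is written before execution
}
```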
Development Tip - Multi-thread
[Figure: four CPU threads, each recording its own command buffer (cmd cmd cmd …).]
Development Tip - Multi-thread
[Figure: the command buffers recorded by the four CPU threads are submitted to the queue (submit!).]
Development Tip - Multi-thread
[Figure: four CPU threads, each recording its own command buffer.]
Access to a VkCommandPool or VkDescriptorPool must be synchronized, or each thread must own and handle its own independent pools.
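A sketch of the second option, one pool per thread: since a thread only ever touches its own pool, recording needs no locks, and the finished command buffers are collected and submitted afterwards from a single thread (vkQueueSubmit on one VkQueue is also externally synchronized). `SimCommandBuffer`, `SimCommandPool` and `recordInParallel` are hypothetical stand-ins for VkCommandBuffer / VkCommandPool; the same layout applies to per-thread VkDescriptorPools.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-ins: a pool owns the command buffers it allocates, like a VkCommandPool.
struct SimCommandBuffer {
    std::vector<std::string> cmds;
};

struct SimCommandPool {
    std::deque<SimCommandBuffer> buffers;  // deque keeps references stable across allocations
    SimCommandBuffer& allocate() {
        buffers.emplace_back();
        return buffers.back();
    }
};

// Thread t records `drawCount` draws using only pools[t]; no pool is shared between threads.
std::vector<const SimCommandBuffer*> recordInParallel(std::vector<SimCommandPool>& pools, int drawCount) {
    assert(!pools.empty());
    std::vector<const SimCommandBuffer*> recorded(pools.size(), nullptr);
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < pools.size(); ++t) {
        workers.emplace_back([&pools, &recorded, t, drawCount] {
            SimCommandBuffer& cb = pools[t].allocate();  // thread t only touches pools[t]
            for (int d = 0; d < drawCount; ++d) cb.cmds.push_back("vkCmdDraw");
            recorded[t] = &cb;  // each thread writes only its own element
        });
    }
    for (std::thread& w : workers) w.join();
    return recorded;  // submitted afterwards, in a fixed order, from one thread
}
```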
Development Tip - Reducing duplicated API calls
It is important to call each bind/set function only once per VkCommandBuffer, avoiding duplicated vkCmdSetXXX / vkCmdBindXXX calls with the same value / parameters.
Worst case:
※ In our test case, 500 calls of vkCmdSetViewport and vkCmdSetScissor took 1.412 ms.
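A common way to remove the duplicates is a small state cache that remembers the last value set in the current command buffer and forwards only changes. A minimal sketch for the viewport (the scissor works the same way); `Viewport` mirrors the fields of VkViewport and `ViewportStateFilter` is hypothetical. The cache is reset at the start of each command buffer because dynamic state does not carry over between command buffers.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in with the same fields as VkViewport.
struct Viewport {
    float x, y, width, height, minDepth, maxDepth;
    bool operator==(const Viewport& o) const {
        return x == o.x && y == o.y && width == o.width && height == o.height &&
               minDepth == o.minDepth && maxDepth == o.maxDepth;
    }
};

class ViewportStateFilter {
public:
    void beginCommandBuffer() { hasLast_ = false; }  // state does not carry over between command buffers

    // Returns true if the call was forwarded, false if it was a duplicate and skipped.
    bool setViewport(const Viewport& vp) {
        assert(vp.width > 0.0f && "Vulkan requires a positive viewport width");
        if (hasLast_ && last_ == vp) return false;  // same value: skip vkCmdSetViewport
        last_ = vp;
        hasLast_ = true;
        ++forwarded;  // real code: vkCmdSetViewport(commandBuffer, 0, 1, &vp)
        return true;
    }

    std::size_t forwarded = 0;  // number of calls that would actually be recorded

private:
    Viewport last_{};
    bool hasLast_ = false;
};
```

In the worst case above, 500 identical viewport sets in one command buffer collapse to a single recorded call.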
Thank you