46
S9391 GstCUDA: Easy GStreamer and CUDA Integration Eng. Daniel Garbanzo MSc. Michael Grüner GTC March 2019

GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

S9391 GstCUDA: Easy GStreamer

and CUDA Integration

Eng. Daniel Garbanzo MSc. Michael GrünerGTC March 2019

Page 2: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

Agenda

About RidgeRun

GStreamer Overview

CUDA Overview

GstCUDA Introduction

Application Examples

Performance Statistics

GstCUDA Demo on TX2

Q&A

2

Page 3: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

● US Company - R&D Lab in Costa Rica

● 15 years of experience

● Embedded Linux and GStreamer experts

● Custom multimedia solutions

● Digital signal/image processing

● AI and Machine Learning solutions

● System optimization: CUDA, GStreamer, OpenCL, OpenGL, OpenVX, Vulkan

● Support for embedded and resource constrained systems

● Professional services, dedicated teams and specialized tools

About Us

3

Page 4: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

● Complex multimedia applications require a lot of processing resources● GStreamer offers a flexible way for creating multimedia applications

● CUDA offers high performance accelerated processing capabilities

Medical Industry Automotive Industry Smart Devices Computer Vision

4

Page 5: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

● Open source framework for audio and video applications

● Based on a pipeline architecture

● Extensible design based on plugins (more than 1000 freely available)

● Automatic format and synchronization handling

● Tools for easy prototyping

Modularity FlexibilityPortability

5

Page 6: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

● Each plugin represents a different processing module

● The plugins are linked and arranged in a pipeline

● Freedom to build arbitrary pipelines for different applications6

Basic MP4 player GStreamer Pipeline

Page 7: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

Modular design lets you change your application easily!

7

Easily change your application end use

Easily change from SW to HW accelerated processing

Page 8: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

Code equivalent :gst-launch v4l2src ! videoconverter ! omxh265enc ! mpegtsmux ! udpsink

Code equivalent :gst-launch v4l2src ! videoconverter ! x265enc ! mpegtsmux ! filesink

Modular design lets you change your application easily!

8

Page 9: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

9

Page 10: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

GstCUDA

10

Page 11: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

GstCUDA

11

Page 12: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

What Does GstCUDA Solve?

12

Page 13: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

●●●

Integration Complexities

13

Page 14: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

Development Time

Without GstCUDA

WithGstCUDA

3 Months 10 days 5 days

Create GStreamer plugin with CUDA support

Generate CUDA algorithm

Integrate CUDA algorithm

10 days 0.1 day

Generate CUDA algorithm

Integrate CUDA algorithm

Total = 3.5 months

Total = 10.1 days

● Reduce development time

● Focus on the CUDA logic

● Minimize time to market

14

Page 15: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

●●●

Memcpy Memcpy

Performance Bottleneck

15

Page 16: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

Performance BottleneckWithout GstCUDA With GstCUDA

● Efficient memory handling improves performance

● Up to 2x 4K@60fps

● Data transfers bottleneck cause poor performance

● Limited framerate at high resolutions 16

Page 17: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

Supported Platforms● Focused for NVIDIA Embedded Platforms

Jetson TX1, TX2, TX2i and Nano Jetson AGX Xavier

17

Page 18: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

GstCUDA Key Features

18

Page 19: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

GstCUDA Key Features

19

Page 20: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

Framework Overview

20

Page 21: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

Quick Prototyping Elements

21

Page 22: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

location = median_filter.so

Cudafilter Element

22

Page 23: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

location = thermal_overlay.so

Cudamux Element

IR

23

Page 24: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

CUDA Algorithm Interface● Make your CUDA algorithm compatible by implementing these interfaces

Cudafilter Interface

bool open();

bool close();

bool process (const GstCudaData &inbuf,

GstCudaData &outbuf);

bool process_ip (const GstCudaData

&inbuf, GstCudaData &outbuf);

bool open();

bool close();

bool process (vector<GstCudaData>

&inbufs, GstCudaData &outbuf);

bool process_ip (vector<GstCudaData>

&inbufs, GstCudaData &outbuf);

Cudamux Interface

24

Page 25: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

Buffer Processing Methods

process_ip(In place)

process(Not in place)

25

Page 26: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

Create Your Custom Element

• •

● Some applications may require specialized elements● GstCUDA provides bases classes to simplify development

26

Page 27: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

GstCUDA Framework Usage Example

27

Page 28: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

GstCUDA Framework Summary

● Utils to handle memory interfaces

● GStreamer Unified Memory allocators

● Parent classes for different topologies

● The framework includes:

● Generic elements to evaluate custom algorithms

● Runtime loading of CUDA algorithms

● Complete GstCUDA element boilerplate

● CUDA algorithms for the prototyping elements

GstCUDA API Quick prototyping elements Set of examples

28

Page 29: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

GstCUDA Application Areas Examples Video

29

Page 30: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

Industrial Applications: Border Enhancement

30

Page 31: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

Automation Applications: Hough Transform

31

Page 32: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

Security Applications: Motion Detection/Estimation

32

Page 33: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

Performance Statistics

33

Page 34: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

Varying Algorithm / Fixed Image Size

● Image convolution algorithm

● Stressing compute capabilities

● Variable convolution kernel size

● 1080p@240fps / 1080p@60fps stream input

● Cudafilter element

● Unified Memory allocator

● Jetson TX2 platform

● Not In-place

Test Conditions

location = convolution.so

34

Page 35: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

Varying Algorithm / Fixed Image Size

Framerate Stats

35

Page 36: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

Varying Algorithm / Fixed Image Size

Processing Time Stats

36

Page 37: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

Varying Algorithm / Fixed Image Size

CPU Load Stats GPU Load Stats

37

*baseline = simple capture pipeline (without GstCUDA)

Page 38: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

Fixed Algorithm / Varying Image Size

● Memory copy algorithm

● Stressing data transfer

● Variable input resolution

● Cudafilter element

● Unified Memory allocator

● Jetson TX2 platform

● In-place vrs not In-place

Test Conditions

location = memcpy.so

38

Page 39: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

Fixed Algorithm / Varying Image Size

Framerate Stats

39

Note: Maximum Framerate limited to 245 fps by the video source

Page 40: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

Fixed Algorithm / Varying Image Size

Processing Time Stats

40

Page 41: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

Fixed Algorithm / Varying Image Size

CPU Load Stats GPU Load Stats

41*baseline = simple capture pipeline (without GstCUDA)

Page 42: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

Fixed Algorithm / Varying Image Size

● Simple image mixing algorithm

● Stressing data transfer

● Variable input resolution

● Cudamux element

● Unified Memory allocator

● In-place=True

● Jetson TX2 platform

Test Conditions

location = mixer.so

42

Page 43: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

Fixed Algorithm / Varying Image Size

Framerate Stats

43

Note: Maximum Framerate limited to 240fps by the video source

Page 44: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

Fixed Algorithm / Varying Image Size

CPU Load Stats GPU Load Stats

44*baseline = simple capture pipeline (without GstCUDA)

Page 45: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

GstCUDA Live Demo on Jetson TX2Sobel Filter 1080p60fps

45

gst-launch-1.0 nvcamerasrc sensor-id=2 fpsRange=60,60 ! "video/x-raw(memory:NVMM),width=1920,height=1080,framerate=60/1,format=I420" ! nvvidconv ! "video/x-raw" ! queue ! cudafilter in-place=false location=/borders.so ! queue ! nvoverlaysink

Code equivalent :

Page 46: GstCUDA: Easy GStreamer and CUDA Integration S9391 · bool process_ip (vector &inbufs, GstCudaData &outbuf); Cudamux Interface 24. Buffer Processing Methods process_ip

● GstCUDA wiki page:

○ gstcuda.ridgerun.com

● RidgeRun Website:

○ ridgerun.com

● RidgeRun Contact:

○ ridgerun.com/contact

Resources

46