Mike Wendt

@mike_wendt

github.com/nvidia

github.com/mike-wendt

BUILDING A GPU-FOCUSED CI SOLUTION

AGENDA

Need for GPU CI

Challenges of GPU CI

Methods to Implement GPU CI

Improving GPU CI Today

Demo

Lessons Learned

Next Steps

Getting Started

NEED FOR GPU CI

• The leading open-source software projects from Apache and others rely on CI

• External demand

• Partners are collaborating with us on projects like the GPU Open Analytics Initiative (GoAi) and need GPU CI to ensure stable builds

• Internal demand

• Large internal codebases for all kinds of GPU-accelerated applications require testing across different platforms and hardware

• Performance testing of new drivers and hardware needs repeatable methods to make sure we continue to deliver performance

The number of GPU-accelerated applications is growing

CHALLENGES OF GPU CI

Need GPUs

Cloud or physical

Resource management

Expose GPU configuration to developers

Driver, CUDA, GPU type

Many traditional tools like Travis CI, Circle CI, and others do not support GPUs

For good reasons, dangers of misuse

For tools that offer support, many times it is not native

Still feels “hacky,” but it gets the job done

GPUs bring a different set of problems than traditional CI
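One way to expose the GPU configuration to developers is to record it at the start of every CI job. The sketch below is only an illustration: it assumes `nvidia-smi` is on the agent's PATH, and the field list, function names, and sample values in the test are assumptions, not part of the talk. It covers driver version and GPU type; the CUDA toolkit version depends on what is installed on the host or in the container image and is not queried here.

```python
import csv
import io
import subprocess

# Fields requested from nvidia-smi; chosen for illustration only.
QUERY_FIELDS = ["name", "driver_version", "memory.total"]


def parse_gpu_csv(text):
    """Parse `nvidia-smi --format=csv,noheader` output into one dict per GPU."""
    rows = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [dict(zip(QUERY_FIELDS, row)) for row in rows if row]


def query_gpus():
    """Query the GPUs on this host; requires an installed NVIDIA driver."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader",
    ]
    return parse_gpu_csv(subprocess.check_output(cmd, text=True))
```

Printing the result of `query_gpus()` in the job log makes it clear which driver and GPU type a given test run actually used.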

METHODS TO IMPLEMENT GPU CI

BARE-METAL + GPU

Benefits

Reduces complexity with minimal setup

Works well for a small set of projects that use the same/similar dependencies

Challenges

Managing dependencies can be tricky for multiple projects

Limits ability to test multiple platforms, limited to installed CUDA/OS

Resource management is difficult

Fastest to get started with the most limitations

BARE-METAL + GPU

Fastest to get started with the most limitations

[Diagram labels: Server, GPUs, CI Environment, Source Code, Tests, Test Results]

DOCKER + NVIDIA CONTAINER RUNTIME

Docker runtime that allows for GPU pass-through on Linux systems

Works with Debian/Ubuntu, RHEL/CentOS, and Amazon Linux

Allows for testing multiple CUDA/OS environments on one machine

Includes options to set supported driver operations and restrict GPU visibility

github.com/nvidia/nvidia-docker
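The options for restricting GPU visibility and driver operations can be sketched as a small command builder. This is a minimal sketch, assuming the NVIDIA runtime is registered with Docker under the name `nvidia`; it uses the runtime's `NVIDIA_VISIBLE_DEVICES` and `NVIDIA_DRIVER_CAPABILITIES` environment variables. The function name and the image tag in the test are illustrative assumptions. Newer Docker releases also offer a `--gpus` flag, which is not used here.

```python
def gpu_docker_command(image, command, gpus="all", capabilities="compute,utility"):
    """Build a `docker run` argument list for a GPU-enabled container.

    gpus:         value for NVIDIA_VISIBLE_DEVICES, e.g. "all", "0", or "0,1"
    capabilities: value for NVIDIA_DRIVER_CAPABILITIES
    """
    return [
        "docker", "run", "--rm",
        "--runtime=nvidia",  # assumes the NVIDIA runtime is installed
        "-e", f"NVIDIA_VISIBLE_DEVICES={gpus}",
        "-e", f"NVIDIA_DRIVER_CAPABILITIES={capabilities}",
        image,
    ] + list(command)
```

Passing the returned list to `subprocess.run` on a GPU host would start the container; limiting `gpus` to a single index lets several jobs share one machine without seeing each other's devices.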

DOCKER + GPU

Benefits

Ability to test multiple CUDA/OS combinations

Handles dependency management for all projects

Enables fine-grained resource management

Supports scale needed for larger projects and teams

Challenges

Typically requires pre-built Docker images that contain the test environment, with the code under test injected into the container

Configuration tends to involve many environment variables and is cumbersome to manage

GitLab CI and Jenkins require “runners” for multiple nodes

Easier to use with some hacking still required
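Testing multiple CUDA/OS combinations with pre-built images and injected code can be sketched as a job matrix. Everything named here is hypothetical: the `ci-env:cuda<version>-<os>` tag scheme, the `/workspace` mount point, and the function name are assumptions made for illustration. The code is injected with a bind mount, which is one common approach and not necessarily the one used in the talk.

```python
import itertools


def build_test_matrix(cuda_versions, os_names, repo_dir, test_cmd):
    """Return one `docker run` argument list per CUDA/OS combination."""
    jobs = []
    for cuda, os_name in itertools.product(cuda_versions, os_names):
        image = f"ci-env:cuda{cuda}-{os_name}"  # hypothetical tag scheme
        jobs.append([
            "docker", "run", "--rm", "--runtime=nvidia",
            "-v", f"{repo_dir}:/workspace",  # inject the code under test
            "-w", "/workspace",
            image, "sh", "-c", test_cmd,
        ])
    return jobs
```

Each entry can be run independently, so a CI server can fan the matrix out across whatever GPU machines are available.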

DOCKER + GPU

Easier to use with some hacking still required

[Diagram labels: Server, GPUs, CI Environment, Docker Container, Docker + NVIDIA Runtime, Source Code, Tests, Dockerfile or Container, Custom Config, Test Results]

KUBERNETES + DOCKER + GPU

Benefits

GPU support in v1.8+ of Kubernetes

Takes care of the “runner” challenge with GitLab/Jenkins

Resource management and scheduling are handled by Kubernetes

Challenges

Can only target GPUs on homogeneous nodes (heterogeneous support coming)

Not all tools support GPU CI out of the box

Docker containers required for testing, but this can be the previous step in a pipeline

Promises to be the easiest to use with minimal hacking
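Letting Kubernetes handle GPU scheduling comes down to requesting GPUs as a resource limit on the test container. The sketch below builds such a Pod manifest as a plain dictionary. It assumes the NVIDIA device plugin is deployed in the cluster and advertises the `nvidia.com/gpu` resource; the pod name, container name, and image in the test are hypothetical.

```python
import json


def gpu_test_pod(name, image, command, gpus=1):
    """Build a Pod manifest that asks the scheduler for `gpus` GPUs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",  # a test run should not be restarted
            "containers": [{
                "name": "tests",
                "image": image,
                "command": list(command),
                # Assumes the NVIDIA device plugin exposes this resource name.
                "resources": {"limits": {"nvidia.com/gpu": gpus}},
            }],
        },
    }


def to_json(manifest):
    """Serialize the manifest; kubectl accepts JSON as well as YAML."""
    return json.dumps(manifest, indent=2)
```

The JSON output can be piped to `kubectl apply -f -`, after which the scheduler places the pod on a worker with enough free GPUs.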

KUBERNETES + DOCKER + GPU

Promises to be the easiest to use with minimal hacking

[Diagram labels: Kubernetes Master, Scheduler, Docker Container Repo, Docker Test Container, Kubernetes Worker, Server, GPUs, CI Environment, Docker Container, Docker + NVIDIA Runtime, Source Code, Tests, Dockerfile or Container, Custom Config, Test Results]

HOW CAN WE MAKE THIS BETTER TODAY?

JENKINS PLUGIN FOR NVIDIA + DOCKER

Simplifies the configuration of Docker containers for GPU CI testing

Allows for targeting a Dockerfile within the repo to build and use for testing or a Docker image in a remote hub

Supports side-containers with GPU support

Easy to use, and makes it easy to adapt a project for GPU CI

Based on Jenkins docker-slaves plugin

DEMO

JENKINS PLUGIN FOR NVIDIA + DOCKER

Simplifying the configuration for GPU CI

[Diagram labels: Server, GPUs, Jenkins CI Environment, Docker Container, Docker + NVIDIA Runtime, Source Code, Tests, Dockerfile or Container + Plugin Config, Test Results]

LESSONS LEARNED

• CI best practices apply to GPU code as well

• Pull request testing is one of the best methods to ensure code quality

• GitLab CI works great if there are only a few GPU-enabled repos to test

• For scale-out, GitLab on Kubernetes is best

• Larger organizations and projects need a centralized CI platform like Jenkins

• Setting up a new repo is easy, and with parameterized builds we can reuse existing pipelines

• Advanced uses of Jenkins

• Tagging is key to testing on multiple GPU architectures, and pipelines are key to testing multiple CUDA versions
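The idea behind tagging can be sketched as matching a job's required labels against the labels on each build agent. This is not Jenkins code: the agent names, label names, and function name are hypothetical, and Jenkins evaluates its own label expressions when it assigns a job to an agent.

```python
def eligible_agents(agents, required_labels):
    """Return the names of agents that carry every required label.

    agents: mapping of agent name to an iterable of labels (tags)
    """
    required = set(required_labels)
    return sorted(
        name for name, labels in agents.items()
        if required <= set(labels)  # subset test: all required labels present
    )
```

Tagging one agent per GPU architecture lets the same pipeline run once per architecture simply by changing the required label.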

NEXT STEPS

• Continue plugin development and release as an open source project

• Internal

• Continue deployment of GPU CI and migrate performance testing toward full GPU CI

• Leverage capabilities of Jenkins to go beyond CI with CD and workflow automation

• External

• Expand GPU CI testing by testing pull requests of open source projects using Jenkins and the plugin

• Take advantage of the GPU targeting within Kubernetes and new GPU features in the coming months

• Look at ways to more closely integrate GPU CI with GitLab CI and Jenkins plugins for Kubernetes

GETTING STARTED

Links to useful repos

github.com/nvidia

• NVIDIA Docker Runtime: nvidia-docker

• NVIDIA Kubernetes Device Plugin: k8s-device-plugin

github.com/mike-wendt

• Jenkins Plugin for NVIDIA: coming soon

• Docker + NVIDIA Runtime on Ubuntu: nvidia-docker-ubuntu

Mike Wendt

@mike_wendt

github.com/nvidia

github.com/mike-wendt

THANK YOU