47
Corporate Technology The Many Approaches to Real-Time and Safety-Critical Linux Open Source Summit Japan 2017 Prof. Dr. Wolfgang Mauerer Siemens AG, Corporate Research and Technologies Smart Embedded Systems Corporate Competence Centre Embedded Linux Copyright c 2017, Siemens AG. All rights reserved. Page 1 31. Mai 2017 W. Mauerer Siemens Corporate Technology

The Many Approaches to Real-Time and Safety-Critical Linuxevents17.linuxfoundation.org/sites/events/files/slides/talk_10.pdf · 1 Real-Time and Safety 2 Approaches to Real-Time Architectural

  • Upload
    vanlien

  • View
    218

  • Download
    0

Embed Size (px)

Citation preview

Corporate Technology

The Many Approaches toReal-Time and Safety-Critical Linux

Open Source Summit Japan 2017

Prof. Dr. Wolfgang MauererSiemens AG, Corporate Research and TechnologiesSmart Embedded SystemsCorporate Competence Centre Embedded Linux

Copyright c� 2017, Siemens AG. All rights reserved.

Page 1 31. Mai 2017 W. Mauerer Siemens Corporate Technology

Corporate Technology

The Many Approaches toReal-Time and Safety-Critical Linux

Open Source Summit Japan 2017

Prof. Dr. Wolfgang Mauerer, Ralf Ramsauer, Andreas KolblSiemens AG, Corporate Research and TechnologiesSmart Embedded SystemsCorporate Competence Centre Embedded Linux

Copyright c� 2017, Siemens AG. All rights reserved.

Page 1 31. Mai 2017 W. Mauerer Siemens Corporate Technology

Overview

1 Real-Time and Safety

2 Approaches to Real-TimeArchitectural PossibilitiesPractical Approaches

3 Approaches to Linux-Safety

4 Guidelines and Outlook

Page 2 31. Mai 2017 W. Mauerer Siemens Corporate Technology

Introduction & Overview

About

Siemens Corporate Technology: Corporate Competence Centre Embedded LinuxTechnical University of Applied Science Regensburg

Theoretical Computer ScienceHead of Digitalisation Laboratory

Target Audience Assumptions

System Builders & Architects, Software ArchitectsLinux Experience availableNot necessarily RT-Linux and Safety-Critical Linux experts

Page 3 31. Mai 2017 W. Mauerer Siemens Corporate Technology

A journey through the worlds ofreal-time and safety

Page 4 31. Mai 2017 W. Mauerer Siemens Corporate Technology

Outline

1 Real-Time and Safety

2 Approaches to Real-TimeArchitectural PossibilitiesPractical Approaches

3 Approaches to Linux-Safety

4 Guidelines and Outlook

Page 5 31. Mai 2017 W. Mauerer Siemens Corporate Technology

Real-Time: What and Why? I

Real Time

Deterministic responses to stimuliBounded latencies (not too late, not tooearly)Repeatable resultsOptimise/quantify worst case

Real Fast

Caches, TLB, LookaheadPipelinesOptimise average case

Page 6 31. Mai 2017 W. Mauerer Siemens Corporate Technology

Real-Time: What and Why? II

Type Characteristics Use Cases

Soft Real-Time Subjective Deadlines Media rendering, I/O95% Real-Time Deadlines met most of the time,

misses can be compensatedData acquisition, finance, navi-gation, . . .

100% Real-Time Miss deadline: Defects occur Industrial Automation & control,Robotics, Airplanes, . . .

Ensuring Real-Time

Statistical testingWCET calculation + schedulability testingFormal verification

Page 7 31. Mai 2017 W. Mauerer Siemens Corporate Technology

Real-Time: What and Why? II

Type Characteristics Use Cases

Soft Real-Time Subjective Deadlines Media rendering, I/O95% Real-Time Deadlines met most of the time,

misses can be compensatedData acquisition, finance, navi-gation, . . .

100% Real-Time Miss deadline: Defects occur Industrial Automation & control,Robotics, Airplanes, . . .

Ensuring Real-Time

Statistical testingWCET calculation + schedulability testingFormal verification

Page 7 31. Mai 2017 W. Mauerer Siemens Corporate Technology

Safety: What and Why?

Some undesirables

Brake: Segfault!Engines full speed ahead: Segfault!and so on. . .

Safety-Critical SystemsMalfunctions of the system (may) result in

death/injury to peopledamage to equipment/propertyenvironmental harm

Safety 6= Real-Time, but often coupled!100% RT + fatal consequences ) Safety

Page 8 31. Mai 2017 W. Mauerer Siemens Corporate Technology

Safety: Standards

Challenge:

Electrical Power Drive IEC61800

Nuclear Power Plants IEC61513 Medical Device Software

IEC62304

Railways IEC62278

Industrial Process IEC61511

Robotic Devices ISO10218

Machinery IEC62061

Automotive ISO26262

“umbrella” standard IEC61508

Routes to Safety

StandardcompliantdevelopmentProven in useCompliantnon-compliantdevelopment

Page 9 31. Mai 2017 W. Mauerer Siemens Corporate Technology

Outline

1 Real-Time and Safety

2 Approaches to Real-TimeArchitectural PossibilitiesPractical Approaches

3 Approaches to Linux-Safety

4 Guidelines and Outlook

Page 10 31. Mai 2017 W. Mauerer Siemens Corporate Technology

Approaches to Real-Time Linux

Middleware

COTS Hardware

App App

Standard LanguagesControl Framework

Linux

RT LatencyΔt €

+RT

+FPGA

-Over

head+RT-Brid

ge+/- Engineering

Specialised Languages

Proprietary Hardware

Control Application

Specialised OS

Static

+RT-Net

€€€

RT LatencyΔt

Dynamic

-+

Why Real-Time Linux?

Commodity featuresSubtractive vs. additive EngineeringMulti-Core utilisation. . .

Page 11 31. Mai 2017 W. Mauerer Siemens Corporate Technology

Architectural possibilities I

1 Traditional RTOS in side-device2 RT-Enhanced Kernel3 Separation Kernel4 Co-Kernel5 Asymmetric Multiprocessing

Pros and Cons

3 Countless variants available3 Pre-Certified Versions3 Extreme simplicity7 Hard to extend with state-of-the art IT7 Vendor lock-in7 Unusual APIs etc.

Page 12 31. Mai 2017 W. Mauerer Siemens Corporate Technology

Architectural possibilities I

1 Traditional RTOS in side-device2 RT-Enhanced Kernel3 Separation Kernel4 Co-Kernel5 Asymmetric Multiprocessing

Pros and Cons

3 Leverage existing Linux Know-How3 Integration of high-level technologies

with little effort7 Certification complicated7 Complex system7 Only statistical RT assurance

Page 12 31. Mai 2017 W. Mauerer Siemens Corporate Technology

Architectural possibilities I

1 Traditional RTOS in side-device2 RT-Enhanced Kernel3 Separation Kernel4 Co-Kernel5 Asymmetric Multiprocessing

Pros and Cons

3 Clean split between RT and non-RT3 Substantial certification experience7 Typically strong HW coupling7 Vendor Lock-In

Page 12 31. Mai 2017 W. Mauerer Siemens Corporate Technology

Architectural possibilities I

1 Traditional RTOS in side-device2 RT-Enhanced Kernel3 Separation Kernel4 Co-Kernel5 Asymmetric Multiprocessing

Pros and Cons

3 Clean split between RT and non-RT3 Ressource efficient7 Non-standard maintenance efforts7 Implicit couplings

Page 12 31. Mai 2017 W. Mauerer Siemens Corporate Technology

Architectural possibilities I

1 Traditional RTOS in side-device2 RT-Enhanced Kernel3 Separation Kernel4 Co-Kernel5 Asymmetric Multiprocessing

Pros and Cons

3 Combine advantages of split systemswith single HW basis

3 Near bare metal performance7 Implicit couplings7 Relatively new development7 Maintenance overhead

Page 12 31. Mai 2017 W. Mauerer Siemens Corporate Technology

Architectural possibilities II

Commonality

System partitioning!Logical instead of physicalWorkloads of different criticality handled by different system portions ) MixedCriticality

Page 13 31. Mai 2017 W. Mauerer Siemens Corporate Technology

Practical Approaches

Practical Approaches

Preempt-RTXenomai/ipipeARM/PRUGPUs/FPGA assisted RTTraditional RTOSes

Page 14 31. Mai 2017 W. Mauerer Siemens Corporate Technology

Preempt-RT I

Enhance Linux with RT capabilities

Preemption (incl. preemption at kernellevel)Deterministic (and fine-grained) timingbehaviourAvoidance of priority inversion (prioinheritance/ceiling)

RT Howto

Don’t anything stupidLock memory (no paging)No inappropriate syscalls (networkingetc.)No block device access. . .

Linux Foundation: Official project (goal: upstreaming code)Typical Jitter: 50µs (x86), 150 µs (rpi)

Page 15 31. Mai 2017 W. Mauerer Siemens Corporate Technology

Preempt-RT II

0

100

200

300

400

3.0.

101-

rt130

3.2.

78-rt

113

3.4.

111-

rt141

3.6.

11-rt

313.

8.13

-rt16

3.10

.101

-rt11

13.

12.5

7-rt7

73.

14.6

5-rt6

83.

18.2

9-rt3

04.

0.8-

rt64.

1.20

-rt23

4.4.

9-rt1

74.

6-rc

7-rt1

Stack Version

Num

bero

fcom

mits

Types of patches backport forwardport invariant

Page 16 31. Mai 2017 W. Mauerer Siemens Corporate Technology

Preempt-RT III: Pros and Cons

Advantages

3 Patch availability and communitysupport

3 Re-use of engineering knowledge3 Excellent multi-core scalability3 RT in userspace easily possible

Disadvantages

7 Functional certifiability limited7 Achieving smallest latencies requires

substantial system knowledge7 Mixing RT and non-RT easy7 Fixing problems requires substantial

system knowledge

Page 17 31. Mai 2017 W. Mauerer Siemens Corporate Technology

Xenomai 3.0 I

Xenomai: RTOS-to-Linux

Provides skins for traditional RTOSesTwo modes of operation

Run on top of Linux (w. or w/o RTcapabilities)Run over co-kernel extension(patched Linux required)

Kernel

Hardware

Scheduler A

Userspace

ProcessTask

Scheduler B

Dispatching and Collaboration Services

Task

IRQ

Task Task Task

Preemption

IRQ IRQ IRQ

ipipe patch: 450-600 KiB (depending on arch), (mostly) stable over timeTypical Jitter: 10µs (x86), 50 µs (rpi)

Image source: Siemens AG, CC BY-SA 3.0

Page 18 31. Mai 2017 W. Mauerer Siemens Corporate Technology

Xenomai 3.0 II: Architecture sketch

Image source: Xenomai.org, CC BY-SA 3.0

Page 19 31. Mai 2017 W. Mauerer Siemens Corporate Technology

Xenomai 3.0 II: Architecture sketch

Image source: Xenomai.org, CC BY-SA 3.0

Page 19 31. Mai 2017 W. Mauerer Siemens Corporate Technology

Xenomai 3.0 III: Pros and Cons

Cobalt (Co-Kernel)

3 Clean split between RT/non-RT(transition is signalled)

3 Light-weight in low-end platforms (lockcontention, cache usage etc.)

7 Very limited number ofdevelopers/small community

7 Porting effort required; availability lag7 Regressions on upstream changes

Mercury (Preempt-RT)

3 Architectural basis maintained bysubstantial community

3 Very solid skin framework w/o invasivecore changes

7 Legacy scheduling not always 100%reproducible

7 Inadvertently mixing RT and non-RTeasier

Page 20 31. Mai 2017 W. Mauerer Siemens Corporate Technology

ARM + PRU I

Programmable Real-Time Unit (PRU)

Subsystem

Interconnect

INTC Peripherals

PRU0

I/O

Inst.

RAM

Shared

RAM

Data

RAM

Inst.

RAM

Data

RAM

PRU1

I/O

Shared

Memory Peripherals

Peripherals GP I/O

L4 Interconnect

ARM Subsystem

Cortex-A

L1

Instruction

Cache

L1

Data

Cache

L2 Data Cache

PRU0

(200MHz)

PRU1

(200MHz)

L3 Interconnect L3 Interconnect

Page 21 31. Mai 2017 W. Mauerer Siemens Corporate Technology

ARM + PRU II

Programmable Real-Time Unit

Dedicated (two) execution units basedon 32-Bit RISC architectureNo pipelines, no caches, separateinstruction/data memory (shared RAM)Linux Support via remoteproc andrpmsg framework200 MHz ) 5ns cycle time

Pros and Cons

3 High determinism/small jitter3 Simpler than adding µC components to

system3 Clean split between RT and non-RT7 Tied to (very) specific hardware7 Increased maintenance efforts

(additional compilers etc.)

Page 22 31. Mai 2017 W. Mauerer Siemens Corporate Technology

Real-Time GPU/FPGA assisted Computing

Automotive/Image Processing

General Purpose Real Fast Soft Real-Time Hard Real-Time

Driver’s Display Automatic Lane Following

Movie Player Backup Camera Warning System

Speculative Evasion Pre-Planning

Eye Tracking

Route Planning Voice Control Traffic Sign Recognition Autonomous Local Navigation

Data Encryption (Disk, Network, etc.) Emergency Collision AvoidanceIntelligent Cruise Control

Autonomous Control

Figure 2: Spectrum of possible temporal requirements for a number of automotive applications that may utilize a GPU.Each feature may cross domains, as indicated by the line beneath each feature name.

note that the use of GPUs appears to be the only eco-nomically feasible solution able to meet the processingrequirements of advanced driver-assist and autonomousfeatures in future automotive applications. Unfortunately,there are obstacles created by current GPU technologythat must be overcome before GPUs can be incorporatedinto real-time systems. In this paper we discuss severalof these obstacles and present a summary of solutions wehave found through our research to date. We hope to en-gage the real-time and cyber-physical systems communi-ties to identify additional applications where the use ofGPUs may be beneficial or even necessary. Through fur-ther research and the development of a breadth of appli-cations, we hope to inspire GPU manufactures to incor-porate features into their products to improve real-timebehaviors.

This paper is organized as follows. In the next sec-tion, we present several applications where GPUs maybe beneficial in real-time systems. In Sec. 3, we presentthe unique constraints imposed by current GPU technol-ogy that pose challenges to the use of GPUs in real-timesystems. In Sec. 4, we present a summary of solutionsthat we have developed that address several of these con-straints and allow GPUs to be used in real-time systems.In Sec. 5, we present future directions for our research anddiscuss what changes may be necessary in current GPUtechnology to better support real-time systems. Finally,in Sec. 6, we conclude with remarks on the field of real-time GPUs.

2 Real-Time GPU ApplicationsThere are a number of real-time domains where GPUsmay be applied. For example, a GPU can efficientlycarry out many digital signal processing operations suchas multidimensional FFTs and convolution as well as ma-trix operations such as factorization on data sets of upto several gigabytes in size. These operations, coupled

with other GPU-efficient algorithms, can be used in med-ical imaging and video processing, where real-time con-straints are common. Additionally, a particularly com-pelling application for real-time GPUs is that of automo-biles.

GPUs can be used to implement a number of systemfeatures in the automotive domain. For user interface fea-tures, a GPU may be used to realize rich displays for thevehicle operator and to implement responsive voice-basedcontrols [16], all while possibly driving video entertain-ment displays for other passengers simultaneously. Fur-ther, a GPU can also be used to track the eyes of thevehicle operator [24]. Such tracking could be used toimplement a number of safety features. Real-time ap-plications for GPUs in automobiles become even moreapparent when we consider driver-assist and autonomousvehicle features. In these platforms, multiple streamsof data from video feeds, laser range sensors, and radarcan be processed and correlated to provide environmen-tal data for a number of vehicle functions. This data canbe used for automatic sign recognition [27], local naviga-tion (such as lane following), and obstacle avoidance [29].GPUs are well suited to handle this type of workload sincethese sensors generate enormous amounts of data. Indeed,GPUs are likely the only efficient and cost-effective solu-tion. Moreover, these are clearly safety-critical applica-tions where real-time constraints are important.

Fig. 2 depicts a number of automobile features thatcould make use of a GPU. These features are plottedalong a spectrum of temporal requirements showing ourview of the relative need for real-time performance. Thespectrum is broken up into four regions: general-purpose,“real-fast,”2 soft real-time, and hard real-time. Features inthe general-purpose region are those that could possiblybe supported by general-purpose scheduling algorithms,though may still be a part of a real-time system. The “real-fast” region captures applications that may have general

2The term “real-fast” is borrowed from Paul McKenny [26].

G. Elliot and J. H. Anderson, IEEE RTCSA, 2011, 48–54

Page 23 31. Mai 2017 W. Mauerer Siemens Corporate Technology

GPU Architecture

CPU/GPU Architecture

Abb. 1.1: Unterschied zwischen einer CPU- und GPU-Architektur(Quelle: [CGM14], Seite 9)

Für die Entwicklung der Bibliothek und den Test der Algorithmen wurden die TeslaK20c und das Jetson TK1 Entwicklungsboard verwendet. Beide basieren aus der inAbbildung 1.2 dargestellten Kepler1 Architektur.

Der L2-Cache entspricht dem globalen Speicher, der von allen laufenden Threads aufder GPU geschrieben und gelesen werden kann. Er ist der größte auf der GPU vor-handene Speicher, besitzt allerdings auch die längsten Zugriffszeiten. Um der GPUDaten für die Ausführung bereit zu stellen, müssen diese initial vom RAM Speicher derCPU über den entsprechenden Speicherbus in den globalen Speicher der GPU kopiertwerden. Nach einer Berechnung durch die GPU werden alle Ergebnisse wieder in denglobalen Speicher abgelegt, wodurch diese von hier wieder in den RAM Speicher derCPU transferiert werden können. Es ist anzumerken, dass je nach Anbindung der GPUdie Transferraten des Speicherbusses sehr lang sein können, wodurch Daten nicht öfterals nötig zwischen GPU und CPU kopiert werden sollten.

Eine GPU auf Basis der Kepler Architektur besteht wiederum aus mehreren Multipro-zessoren (SMX), deren Aufbau in Abbildung 1.3 dargestellt ist.

Jeder dieser Multiprozessoren besitzt einen kleineren, schnelleren L1- und Read-Ony-Cache, eigene Register, Shedduler, Dispatcher, mehrere Load/Store Units (LD/ST), Dou-ble-Precision Units (DP Unit), Special-Function Units (SFU) und eine hohe Anzahl anSingle-Percision Units (Cores). Da hier nur ein kleiner Einblick in die Architektur vonGPUS gegeben werden kann, sei für eine genauere Beschreibung der einzelnen Kompo-nenten zum Beispiel auf [CGM14] verwiesen.

Interessant ist jedoch der L1-Cache eines solchen Multiprozessors. Dieser ist zwar klei-ner als der globale L2-Cache, besitzt aber deutlich schnellere Zugriffszeiten als die-ser, wodurch er ausgezeichnet als Zwischenspeicher von temporären Ergebnissen einesThreads dienen kann, wie später noch gezeigt wird. Da dieser Speicher jedoch von al-len Kernen eines Multiprozessors verwendet wird, muss darauf geachtet werden, dasspro Kern nicht zu viel Speicher benötigt wird, da ansonsten die Anzahl an ausführbaren

1http://www.nvidia.de/object/nvidia-kepler-de.html

2

Program Flow

Abb. 1.4: Beispielhafter Ablauf eine Cuda-Programms(Quelle: [CGM14], Seite 25)

1.2 Cuda

CUDA ist ein von NVIDIA eigens für ihre GPUS bereitgestelltes Framework zur Pro-grammierung von Grafikprozessoren. Das bedeutet, wo OPENCL prinzipiell für alleArten von GPUS verschiedener Hersteller verwendet werden kann, zielt CUDA aus-schließlich auf die Programmierung von NVIDA-GPUS ab.

Nichts desto trotz besteht die Möglichkeit, GPUS der Firma NVIDIA mit Hilfe vonOPENCL zu programmieren. Dies wird aber von OPENCL nur stiefmütterlich unter-stützt und erreicht zudem auch nicht die Performance, die mit dem eigens für NVIDA-GPUS entwickelten CUDA-Framework erreicht werden kann.

Aus diesem Grund wurde CUDA für die Implementierung verwendet, was aber nichtausschließt, das die vorgestellten Algorithmen für andere GPUS mit ein wenig Aufwandauf OPENSSL portiert werden könnten. Dies ist jedoch nicht Teil dieser Arbeit.

CUDA stellt eine Erweiterung des C/C++-Syntax dar, indem dieser durch spezielle Schlüs-selwörter ergänzt wurde. Zudem gibt es bereits CUDA-Wrapper für die gängigsten Pro-grammiersprachen, wie zum Beispiel Java, Python, Perl und .NET, wodurch der Einstiegin die Programmierung mit CUDA sehr einfach gestaltet ist. Für die entwickelte Biblio-thek erfolgten die CUDA-Umsetzung jedoch ausschließlich mit Hilfe des erweitertenC++-Syntax.

Wie in Abbildung 1.4 dargestellt, enthält ein CUDA-Programm eine Kombination ausCPU- und GPU-Code und stellt somit ein heterogenes Programmiermodell dar.

Wird dieser Code mit dem NVCC Compiler aus dem CUDA-Framework kompiliert, sowird einerseits Maschinencode für die CPU mittels des verwendeten C++ Compilers

5

GPU Computation Cycle1.) Copy CPU ! GPU; 2.) Execute Kernels; 3.) Copy GPU ! CPU

J. Cheng et al.: Professional Cuda C Programming, John Wiley & Sons, 2014

Page 24 31. Mai 2017 W. Mauerer Siemens Corporate Technology

GPU/FPGA assisted RT: Problems

Problems

GPU execution model fundamentally non-preemptiveExecution and memory copyHyper-Q/MPS: Optimise utilisation, not determinism

Device driver issues (binary-only)I/O-device ) scheduling not straightforward

Page 25 31. Mai 2017 W. Mauerer Siemens Corporate Technology

Ressource Arbitration

Ressource Sharing

GlobalClusteredPartitioned

Heterogenous Computing

2 compute ressources (assumption:multiple GPUs on embedded systems)(at least) 9 possible combinations!Schedulability POV: Clustered CPUs +partitioned GPUs

Parameter Descriptionm number of system CPUsh number of system GPUsc CPU cluster sizeg GPU cluster sizeTG set of all GPU-using tasksTC set of all CPU-only tasksecpui Ti’s provisioned CPU execution timeegpui Ti’s provisioned GPU execution time

qcpui

total CPU execution time withinTi’s GPU critical section

zIi size of Ti’s GPU input data (bytes)zOi size of Ti’s GPU output data (bytes)zSi size of Ti’s inter-job GPU state data (bytes)bi upperbound on blocking for Ti

Table I: Important notation.

one arbitrary GPU in its GPU cluster. egpui , zIi , zOi , and

zSi are zero for Ti ∈ TC . The term bi denotes an upper-bound on the time Ti,j may be blocked due to lock requests(for presentation simplicity, we assume tasks share no otherresources, but this is not a GPUSync requirement). We derivevalues for bi in Appendix B. Finally, Ti’s utilization is givenby ui ! (ecpu

i + egpui + xmit(zIi , zOi , zSi ))/pi, and the task set

utilization is U ! !ni=1 ui.

We refer back to the parameters summarized in Table I.Example. If we assume that the GPU usage pattern illus-trated in Fig. 3 represents the entire execution sequence of ajob Ti,j , then ecpu

i = (t2−t0)+(t4−t3)+(t6−t5)+(t9−t7),egpui = t5−t4, qcpu

i = (t2−t1)+(t4−t3)+(t6−t5)+(t8−t7),and xmit(zIi , zOi , zSi ) = (t3 − t2) + (t7 − t6) (assumingzSi = 0, i.e. the job has no state to migrate between GPUs).

B. GPUSync Structure

It helps to refer to concrete system configurations in describ-ing GPUSync, so let us define several such configurations.Fig. 4 depicts a matrix of several high-level CPU/GPUconfigurations for a 12-CPU, 8-GPU system, which we alsouse in Secs. IV and V. We refer to each cell in Fig. 4 using acolumn-major tuple, with the indices P , C, and G denotingpartition, clustered, and global choices, respectively. Thetuple (P, P ) refers to the top-left corner—a configurationwith partitioned CPUs and GPUs. Likewise, (G,C) indicatesthe right-most middle cell—globally scheduled CPUs withclustered GPUs. We use the wildcard ∗ to refer to anentire row or column: e.g., (P, ∗) refers to the left-mostcolumn—all configurations with partitioned CPUs. Withineach cell, individual CPUs and GPUs are shown on theleft and right, respectively. Dashed boxes delineate CPU andGPU clusters (no boxes are used in partitioned cases). Thesolid lines depict the association between CPUs and GPUs.For example, the solid lines in (C,C) indicate that two GPUclusters are wholly assigned to each CPU cluster. Finally,the horizontal dashed line across each cell denotes theNUMA boundary of the system. Offline, tasks are assignedto CPU and GPU clusters in accordance with the desired

Partitioned Clustered Global

Part

itio

ned

Clu

ster

edG

lobal

CPU Scheduling

GP

U O

rganiz

ation

Figure 4: Concrete configurations.

configuration.GPUSync uses a two-level nested locking structure: an

outermost token lock to allocate GPUs to jobs and innermostengine locks to arbitrate access to GPU engines. This isdepicted in Fig. 5. In Step A (or time t1 in Fig. 3), thejob requests a token from the GPU allocator responsible formanaging the GPUs in the job’s GPU cluster. The GPUallocator determines which token—and by extension, whichGPU—should be allocated to the request. The requestingjob may access the assigned GPU once it receives a tokenin Step B. In Step C, the job competes with other token-holding jobs for GPU engines; access is arbitrated by theengine locks. A job may only issue GPU operations on itsassigned GPU after acquiring its needed engine locks in StepD. For example, an engine lock must be acquired at times t2,t4, and t6 in Fig. 3. With the exception of P2P migrations,a job cannot hold more than one engine lock at a time.

GPUSync can be configured to use different lockingprotocols to manage tokens and engines. In this paper, weconfigure GPUSync to use protocols known to offer asymp-totically optimal blocking bounds under FL scheduling. Wenow describe the two locking levels in more detail. Weprovide blocking analysis in Appendix B.

Token lock. Each cluster of g GPUs is managed byone GPU allocator. We associate ρ tokens (a configurable

CE0IN

CE0OUT

EE0

GPUAllocator

request

Engine Locks

GPUg–1GPU0

Figure 5: High-level design of GPUSync.

263

G. Elliot and J. H. Anderson, IEEE RTCSA, 2011, 48–54Page 26 31. Mai 2017 W. Mauerer Siemens Corporate Technology

GPU: Scheduling Practicalities

Central Server

Dispatchtime-bounded kernelsLongest runningkernel determineslatencyAdditionalsynchronisationcomplexity

Clustering

Less wasteful thanpartitioningWorst case: Locallymaximal executiontimeHW Queue supportFurther silicon supportrequired

Preempt. Kernels

Implement contextsave/restoreHighly experimental;advanced features(streams, sharedmemory, . . . ) notsupported

Page 27 31. Mai 2017 W. Mauerer Siemens Corporate Technology

Small RTOS I

Traditional Real-Time Operating System

Optimisation: Size, DeterminismStrong focus: scheduling and schedulabilityFrugal feature set

System POSIX Maturity VM Archs Drivers Ressources Docs

FreeRTOS 7 high 3 high high low good

RTEMS 3 very high 7 high very high avg very good

µclinux 3 avg 7 avg high avg poor

mbed 7 high 7 low low avg very good

Zephyr 7 high 7 low avg low avg

Page 28 31. Mai 2017 W. Mauerer Siemens Corporate Technology

Small RTOS II

Prerequisite: Execution Env

Static partitioningReal-Time capable virtualisation (e.g.,KVM over Preempt-RT)

Defeats the point, somewhat. . .

Pros and Cons

3 Base systems certifiable3 Rich scheduling/schedulability options3 Clear split between RT and non-RT7 Often non-POSIX programming model7 Maintenance effort doubles7 Implicit coupling via shared ressources

(busses etc.)

Page 29 31. Mai 2017 W. Mauerer Siemens Corporate Technology

Outline

1 Real-Time and Safety

2 Approaches to Real-TimeArchitectural PossibilitiesPractical Approaches

3 Approaches to Linux-Safety

4 Guidelines and Outlook

Page 30 31. Mai 2017 W. Mauerer Siemens Corporate Technology

Safety Strategies with Linux

SIL2LinuxMP

“Distributed System” on a chipContainers + minimal tools (compliantdevelopment) + partitioning allocatorMinimise interference

System Partitioning

JailhouseRequires HW virtualisationUp to N OSes on N-Core

SafeGRequires ARM TrustZoneTwo OSes (trusted and untrusted)Temporal isolation: FIQ vs. IRQ

Quest and Quest-VResearch systems, interesting nichefeatures (e.g., RT-USB)

Image Source: N. McGuire, GNU/Linux for safety-related systems – SIL2LinuxMP, FOSDEM 2016Page 31 31. Mai 2017 W. Mauerer Siemens Corporate Technology

Safety Strategies for Linux

Safety and Linux

Partition system in various waysMixed Criticality : Combine critical & uncritical workloads

Page 32 31. Mai 2017 W. Mauerer Siemens Corporate Technology

Jailhouse I

Jailhouse: Motivation

SMP is everywhereEnables consolidation of formerlyseparate devices

Linux is almost everywhere, butLegacy software stacks requirebare-metalSafety-critical software stacksDSP-like real-time workloads

Page 33 31. Mai 2017 W. Mauerer Siemens Corporate Technology

Jailhouse II

Jailhouse Architecture

Build static partitions on SMP systemsUse hardware-assisted virtualisationDo not schedule

No CPU core sharing, 1:1 device assignment

Split up running Linux systemSimplicity over Features

Page 34 31. Mai 2017 W. Mauerer Siemens Corporate Technology

Jailhouse: Challenges

Issues

Memory-Mapped I/OIndivisible hardware ressourcesErroneous hardware behaviour

Page 35 31. Mai 2017 W. Mauerer Siemens Corporate Technology

Jailhouse: Challenges

Issues

Memory-Mapped I/OIndivisible hardware ressourcesErroneous hardware behaviour

Page 35 31. Mai 2017 W. Mauerer Siemens Corporate Technology

Jailhouse: Impact?

Measurement

RTEMS (Jailhouse/bare metal) task switching overheadLinux partition with and w/o load

RTEMS on Bare Metal RTEMS on Jailhouse/High Linux load RTEMS on Jailhouse/No Linux load

100

200300400500

5 10 15 20 25 5 10 15 20 25 5 10 15 20 25Task Switch duration [us]

# O

ccur

renc

es [k

]

Page 36 31. Mai 2017 W. Mauerer Siemens Corporate Technology

Outline

1 Real-Time and Safety

2 Approaches to Real-TimeArchitectural PossibilitiesPractical Approaches

3 Approaches to Linux-Safety

4 Guidelines and Outlook

Page 37 31. Mai 2017 W. Mauerer Siemens Corporate Technology

Guidelines and Outlook

Guidelines

Combinatorial explosion ofalternatives. . .Data capture/signal processing

Jailhouse, PRUAudio, media and non-fatal control

Preempt-RTReal-Time combined with throughoutrequirements

Xenomai with Cobalt kernelInvolved temporal interrelations

RTOS on system partition

Outlook

Appliances with certified andnon-certified modeIncreased HW support for partitioningand multi-OS

Page 38 31. Mai 2017 W. Mauerer Siemens Corporate Technology

Thanks for your interest!

Page 39 31. Mai 2017 W. Mauerer Siemens Corporate Technology