
Institut für Technische Informatik und Kommunikationsnetze

Lukas Sigrist

Implementation and Evaluation of Mixed-Criticality Scheduling Algorithms for Multi-Core Systems

Semester Thesis SA-2013-57
September 2013 to January 2014

Tutor: Prof. Dr. Lothar Thiele
Supervisors: Georgia Giannopoulou, Pengcheng Huang


Acknowledgments

I’d like to thank Prof. Dr. Lothar Thiele for giving me the opportunity to write this semester thesis within his research group of the Computer Engineering and Network Laboratory at ETH Zürich. I’m very grateful to my supervisors Georgia Giannopoulou and Pengcheng Huang for their support and the enriching discussions. Their constructive feedback helped very much to achieve all the objectives of this thesis. I also want to thank the various people of the Computer Engineering group who also helped me during this thesis. Finally I would like to thank my family for their everlasting support.


Abstract

Mixed-criticality scheduling, which has attracted high interest in recent years, is increasingly targeting multi-core platforms. Scheduling algorithms for such platforms need to be implemented in a framework to evaluate their runtime overhead on real systems. This thesis implements such a framework, which supports the implementation of multi-core mixed-criticality schedulers and the evaluation of their overhead. The presented framework is easily extensible, so that new scheduling algorithms can be implemented and evaluated without much effort. It also implements the "Flexible Time Triggered Scheduler" and a partitioned extension of the single-core EDF-VD scheduler called the "Partitioned EDF-VD" scheduler to test and demonstrate the functionality of the framework. The overheads of these two schedulers are evaluated by extensive simulations. The results of the overhead calculations are used to check whether the main overhead contributors match the expected trends. The analysis also shows that the total runtime overhead in the simulations is below 0.15% for both schedulers, but can increase considerably when the periods and execution times of the tasks are scaled down.


Contents

1 Introduction
  1.1 Motivation
  1.2 Related Work
  1.3 Contributions
  1.4 Overview

2 Multi-Core Mixed Criticality Scheduling
  2.1 The Scheduling Problem
  2.2 Flexible Time Triggered Scheduler
  2.3 Partitioned EDF-VD
    2.3.1 EDF Scheduler with Virtual Deadlines
    2.3.2 Task Set Partitioning
  2.4 Summary

3 Implementation
  3.1 Hierarchical Scheduling Framework
    3.1.1 The Thread Hierarchy
    3.1.2 Scheduling from User Space
  3.2 Extensions for Multi-Core Support
    3.2.1 CPU Affinity
    3.2.2 Barrier Synchronization
    3.2.3 Time Measurement
    3.2.4 Parser Extensions
    3.2.5 Simulation Output and Visualization
  3.3 Flexible Time Triggered Scheduler
    3.3.1 Implementation Details
    3.3.2 Challenges
    3.3.3 The XML Input Specification
  3.4 Partitioned EDF-VD
    3.4.1 Implementation
    3.4.2 The XML Input Specification
  3.5 Summary

4 Experiments
  4.1 Overhead Calculation
  4.2 Simulation
  4.3 Results
    4.3.1 The Impact of the Number of Cores
    4.3.2 The Impact of the Task Overrun Probability
    4.3.3 The Impact of the System Utilization
    4.3.4 The Impact of the Task Execution Times
  4.4 Summary

5 Conclusion and Outlook
  5.1 Conclusion
  5.2 Outlook

A Hierarchical Scheduling Framework
  A.1 Installation
  A.2 Project Structure
  A.3 Simulation and Analysis Scripts
  A.4 Visualization Output

B Original Project Assignment

C Presentation Slides


List of Figures

2.1 FTTS example schedule for two cycles of 200 ms. First cycle without any overrun, second cycle with an overrun in the first frame. Therefore the low critical subframe is dropped in this frame.
3.1 Illustration of the added HSF scheduling layer on top of the standard Linux scheduler on the left. [13]
3.2 Overview of the elements of the Hierarchical Scheduling Framework. [13]
3.3 Comparison of the thread hierarchy of the original HSF implementation and the one adapted for multi-core support. Colors: green for new implementations, yellow for extensions or changes, red for dropped parts.
3.4 Priority assignment of the different framework components. [18]
3.5 Priority inversion at FTTS synchronization points: 1) Scheduler priority is temporarily decreased, 2) Worker priority is temporarily increased.
3.6 Possible FTTS schedule for the task set defined in the sample XML specification. [12]
4.1 Relative scheduler overheads for different numbers of cores. The normalized utilization is set to 0.4 and the overrun probability to 40% for all simulations.
4.2 Relative scheduler overheads for different overrun probabilities. The normalized system utilization is set to 0.4 for all simulations.
4.3 Relative scheduler overheads for different system utilizations. The overrun probability is set to 40% for all simulations.
4.4 Relative scheduler overheads for different numbers of cores for the scaled-down task sets. The normalized utilization is set to 0.4 and the overrun probability to 40% for all simulations.
4.5 Relative scheduler overheads for different overrun probabilities for the scaled-down task sets. The normalized system utilization is set to 0.4 for all simulations.
4.6 Relative scheduler overheads for different system utilizations for the scaled-down task sets. The overrun probability is set to 40% for all simulations.
A.1 Legend for all indicators used for the visualization.
A.2 Sample visualization output of the Hierarchical Scheduling Framework generated by simfig.


Chapter 1

Introduction

A common trend in embedded systems is to integrate different applications on a single platform for efficiency reasons and to reduce costs. This trend also applies to safety-critical applications such as avionics and automotive systems. These systems are known as mixed-criticality systems, and their tasks are classified into different criticality levels, depending on the impact on safety if they fail.

Often such safety-critical tasks require certification by one or more so-called certification authorities (CA). An example of such a task is the autopilot in avionics. The CA has to make assumptions about the worst case execution time (WCET) of the different tasks to certify the correct functionality of the system. These assumptions are typically much more pessimistic than the assumptions a system designer would make. However, these pessimistic execution times are never reached, or only in very rare cases. A system designer would thus need to build a system that is over-provisioned for most cases and only uses a small part of its resources, just to obtain the certification.

Here the mixed-criticality scheduling approach comes into play: such schedulers also integrate less critical tasks (e.g. tasks that do not require certification, such as the navigation system in our avionics example) into the same platform, to take advantage of the resources that are not needed by the more critical tasks most of the time. However, if an event occurs at runtime where the higher critical tasks require more time to execute, these mixed-criticality schedulers guarantee execution times up to the pessimistic assumptions of the CA in order to pass certification. In such cases the less critical tasks are executed in a so-called degraded mode: they have shorter execution times or are dropped entirely to provide more system resources to the higher critical tasks.

For performance, efficiency and cost reasons there is an increasing interest in implementing such systems on multi-core platforms. Because these platforms are much more complex, certification as well as scheduling becomes much more complicated.

1.1 Motivation

Mixed-criticality scheduling on multi-core platforms is a rather new topic. Different scheduling algorithms for multi-core platforms have been proposed, but they have not been implemented in a real system, or only in very few cases. A reason for this is that there are only very few frameworks that support the implementation of such multi-core mixed-criticality scheduling algorithms. Therefore the aim of this thesis was to develop a framework that supports multi-core mixed-criticality scheduling. The framework should be designed to measure the overhead of the implemented scheduling algorithms and should also be easily extensible, such that new algorithms can be implemented quickly and evaluated easily.

Our framework extends the existing Hierarchical Scheduling Framework [13, 18], a framework that was designed as a user space application to be portable between various platforms without any additional cost. This property perfectly fits our goals, because our framework should also allow evaluating new scheduler propositions on different platforms.

As an example implementation we consider a scheduling algorithm that was recently proposed by Giannopoulou et al., called "Flexible Time Triggered Scheduler" [12]. Our goal is to evaluate it on a real-world system and to calculate its implementation overhead.


As a reference we also implement an extension of the single-core mixed-criticality EDF-VD scheduler [7] to multi-core systems, called "Partitioned EDF-VD" [8].

1.2 Related Work

So far there have only been very few scheduling frameworks that support multi-core mixed-criticality scheduling. There is an extension of the standard Linux kernel called LITMUS [9, 5] that adds support for real-time scheduling for mixed-criticality systems, including multi-core scheduling. However, this framework operates at kernel level, so it is not easily portable and it requires a lot of effort to be extended with a new scheduler implementation. Recently a user space framework was proposed that targets scheduling on multi-core systems [16] and implements a first scheduling algorithm called clustered EDF. This framework is easier to port between different platforms, but porting still requires some effort because time measurement is done using the x86-architecture-specific timestamp counter (TSC) and a fast context switch mechanism is implemented at assembly level. The third framework worth mentioning is the Hierarchical Scheduling Framework [13, 18], which is also implemented in the user space of the Linux operating system. This framework already supports mixed-criticality scheduling, is portable to other platforms with no effort because it is implemented without any hardware dependencies, and is easily extensible due to its modular implementation. However, it lacks support for multi-core specific mechanisms, which need to be added for multi-core scheduling support.

For scheduling mixed-criticality tasks on multi-core platforms, there have been different proposals. Several extensions of the single-core EDF-VD algorithm [7] have been proposed, including global EDF-VD [15] as well as partitioned EDF-VD [8]. Different partitioned mixed-criticality scheduling algorithms based on fixed-priority scheduling are proposed in [14]. In [17] a global scheduling algorithm based on fixed-priority scheduling is considered. A recent proposal for mixed-criticality scheduling on multi-cores is the Flexible Time Triggered Scheduler, which additionally takes into account concurrent access to shared memory in multi-core systems [12]. From the algorithms mentioned before, we will see how the Flexible Time Triggered Scheduler and the partitioned EDF-VD scheduler are implemented in the new framework.

1.3 Contributions

The contributions of this thesis can be summarized as follows:

• Building a multi-core scheduling framework that supports mixed-criticality scheduling by extending the Hierarchical Scheduling Framework [13, 18] with all needed multi-core support.

• Implementing the multi-core mixed-criticality scheduling algorithms called "Flexible Time Triggered Scheduler" (FTTS) and "Partitioned Earliest Deadline First with Virtual Deadlines" (pEDF-VD) to evaluate their runtime overhead on a real-world platform.

• Proposing a metric to quantify the overhead of the newly implemented multi-core schedulers.

1.4 Overview

The remainder of this thesis is organized as follows: In Chapter 2 a formal description of the mixed-criticality scheduling problem is introduced and the theory of the implemented schedulers is presented. Chapter 3 takes a closer look at the implementation of the required mechanisms for multi-core scheduling and focuses on the implementation of the two mentioned schedulers. In Chapter 4 we present the setup of our experiments as well as the results obtained when running them. A short discussion and outlook are finally presented in Chapter 5. The most important information for getting the presented scheduling framework running can be found in Appendix A.


Chapter 2

Multi-Core Mixed Criticality Scheduling

In this chapter we first present the problem of scheduling mixed-criticality tasks. We also introduce the terminology needed to discuss mixed-criticality schedulers. With these definitions we are then able to discuss the functionality of the two schedulers that are implemented in our framework.

2.1 The Scheduling Problem

This section describes the established model for mixed-criticality task sets and briefly presents all the metrics that we need in the subsequent sections to describe the scheduling algorithms.

Mixed Critical Task Set To describe mixed-criticality task sets, a terminology very similar to the ones in [8, 12] is used. We consider mixed-critical periodic task sets τ = {τ1, ..., τn}, where each of the tasks can have a criticality level from 1 (lowest) to L (highest). A task itself is characterized by a 3-tuple τi = {Ti, χi, Ci}, where the separate variables are characterized as follows:

• Ti ∈ N+ is the period of the task,

• χi ∈ {1, ..., L} is the criticality level of the task,

• Ci is a size-L vector of different execution times, where Ci(l) is the execution time on criticality level l.

For simplicity, we assume that all tasks are released at time t = 0 and that the relative deadline of each task corresponds to its period, Di = Ti. For the different execution times, we assume that they are monotonically increasing with increasing criticality level. We do not care about Ci(l) for criticality levels higher than the task's own level (l > χi), because we assume that when tasks run according to their level-l execution times, tasks with a lower criticality level than l are aborted. We will use the term job (of a task τi) to refer to a single execution of the task. A new job arrives every task period Ti.
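For illustration, this task model can be captured by a small C++ structure like the following sketch; the type and member names are illustrative and are not part of the framework.

#include <vector>

// Hypothetical representation of a periodic mixed-criticality task τi = (Ti, χi, Ci).
struct MCTask {
    unsigned period;              // Ti, also the relative deadline Di
    unsigned criticality;         // χi in {1, ..., L}
    std::vector<double> wcet;     // Ci, where wcet[l-1] is the level-l execution time Ci(l)

    // Ratio Ci(l) / Ti, used below as the level-l utilization Ui(l).
    double utilization(unsigned l) const { return wcet[l - 1] / period; }
};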

Utilization The utilization of a periodic task is defined as the ratio between the task's worst case execution time (WCET) and its period. For a whole task set it is the sum of all task utilizations. In the case of mixed-criticality tasks, we calculate the utilization of a task τi depending on the criticality level l on which the task is being executed. The following formula defines the level-l utilization of the task τi [8]:

Ui(l) := Ci(l)/Ti (2.1)

To characterize the utilization of a mixed criticality task set we define the utilization parameter

$U_x^y(\tau) = \sum_{\tau_i \in \tau \,\wedge\, \chi_i = x} U_i(y), \qquad x, y \in \{1, \dots, L\}$   (2.2)


where x denotes the criticality level from which tasks are considered and y denotes the level on which these tasks execute. For example, $U_2^1(\tau)$ denotes the sum of the utilizations of all tasks with criticality 2, under the assumption that all their jobs execute on level 1. Because the implemented schedulers only support two criticality levels, we will also write low critical or LO for criticality level 1 and high critical or HI for criticality level 2.
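As a small illustration, the utilization parameter of Equation 2.2 can be computed as follows, reusing the hypothetical MCTask type sketched above (this is not the framework's code):

#include <vector>

// U_x^y(τ): sum of the level-y utilizations of all tasks with criticality level exactly x.
double utilizationXY(const std::vector<MCTask>& taskset, unsigned x, unsigned y) {
    double u = 0.0;
    for (const MCTask& t : taskset)
        if (t.criticality == x)
            u += t.utilization(y);
    return u;
}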

Mixed Critical Scheduling The execution times of the different jobs of each task vary from period to period and are different for different simulation runs. An algorithm for scheduling these jobs has to assure the following two points to be called correct (a generalization of the definition in [8] to L criticality levels):

• If all jobs run with the lowest criticality execution profile (i.e. no task τi exceeds its Ci(1) WCET), then all jobs receive enough time between their arrival and deadline to complete execution.

• If any job (with criticality level greater than 1) executes for more than its level-l WCET, then all jobs with criticality level l or higher receive enough time between their arrival and deadline to complete execution.

For each of the implemented schedulers we will present which properties a task set needs to fulfill such that the scheduler is able to guarantee these two conditions. We can then classify the task sets as schedulable or not by the corresponding algorithm.

2.2 Flexible Time Triggered Scheduler

The first scheduler we briefly discuss is the Flexible Time Triggered Scheduler, from now on called FTTS. This section is a short summary of the scheduler, which is described in detail in [12].

In multi-core environments there can be access contention for shared resources (e.g. synchronous access to a shared memory), which means that an access from one core can block tasks on other cores from continuing their execution. The FTTS takes concurrent access to shared memory into account in its analysis and assures that no low critical task can preempt a higher critical one.

Like a standard time triggered scheduler, the FTTS has a cycle with a duration equal to the least common multiple of all task periods. This cycle is divided into frames, which can be of different lengths, but are also fixed during runtime. These frames are then split again into so-called subframes. The subframes are the central and flexible part of the FTTS algorithm. Each of them is dedicated to one of the L criticality levels, which means that there are L subframes in each frame. The tasks are mapped to the different frames and cores at design time. All tasks that are scheduled during a specific frame are executed in the subframe of their criticality level, i.e. tasks with χi = l are executed in the subframe with criticality level l. If multiple tasks are mapped to the same subframe, these tasks are executed sequentially in a predefined order.

The start time of a subframe is not time triggered, in order to guarantee efficient resource utilization. A subframe can only start once all tasks of the previous subframe have finished their execution on all cores. This is achieved using an inter-core synchronization mechanism. An exception is the first subframe in a frame: it starts with the time triggered beginning of a new frame.

Figure 2.1 shows an example schedule of a task set that was scheduled by FTTS. The cycle duration here is 200 ms and each frame lasts 50 ms. This example uses only two criticality levels and therefore two subframes per frame. We will use this example to discuss the runtime behavior of the FTTS.

At runtime the length of each subframe varies, because it depends on the execution times of its tasks. However, the worst case duration of the subframes can be calculated offline. These durations depend on the criticality level on which the tasks are executed. Because synchronization is needed at the end of each subframe, we will call these durations the worst case barrier synchronization time (WCBST) on criticality level l.

During runtime the scheduler executes all tasks in their predefined sequence. After all tasks of a subframe have been started in the predefined order, the scheduler waits until all tasks of the current subframe have finished on all cores (using the inter-core synchronization mentioned above).


Figure 2.1: FTTS example schedule for two cycles of 200 ms. First cycle without any overrun, second cycle with an overrun in the first frame; therefore the low critical subframe is dropped in this frame. (The schedule shows two cores, the HI and LO subframes, the barrier synchronization points, the overrun and the dropped subframe.)

When all tasks of the subframe have finished execution, the scheduler checks the total execution time t of the subframe. If necessary, the scheduler increases the criticality level lSched on which tasks are being executed, until the measured execution time is lower than or equal to the computed WCBST of that level:

$l_{Sched} = \max\left\{\, \arg\min_l \{\, t \le WCBST(l) \,\},\; l_{Sched} \,\right\}$   (2.3)

Then the scheduler continues with the next subframe. If the criticality level lSched is now higher than the one of the actual subframe, the scheduler simply skips that subframe and drops all tasks that would have been executed, because the deadlines of the subframe cannot be guaranteed anymore. It is important to note that the criticality level lSched is only increased for the current frame. At the start of another frame, lSched is reset to the lowest level and possibly all subframes can be executed again.

In the depicted example schedule we see in the first cycle (0–200 ms) a normal execution without increasing the criticality level at any point. In the first frame of the second cycle, an example of an overrunning subframe is shown: task τ2 has a very high execution time, the subframe misses its WCBST of the low criticality level and the criticality level of the scheduler needs to be increased. Because of that, the lower critical subframe needs to be dropped. However, as soon as the next frame starts, the scheduler's criticality level lSched is reset to the lowest level and the scheduler can execute all subframes again, if none of them overruns a low critical WCBST.

To compute an FTTS schedule at design time, [12] suggests an optimization algorithm. This uses a heuristic approach in order to assign the tasks to cores and to map the jobs to the frame in which they will execute. The algorithm optimizes the jobs' relative order of execution for each frame, such that the slack time between the WCBST of the last subframe and the end of the frame is maximized for each criticality level.
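As an illustration of the criticality-level update in Equation 2.3, the following C++ sketch shows how the scheduler level could be raised after the barrier synchronization at the end of a subframe; the names are illustrative and the WCBST values are assumed to be precomputed offline (this is not the framework's actual FTTS code).

#include <algorithm>
#include <vector>

// Returns the new scheduler criticality level l_Sched after a subframe whose
// measured execution time is subframeTime; wcbst[l-1] holds WCBST(l).
unsigned updateSchedulerLevel(double subframeTime,
                              const std::vector<double>& wcbst,
                              unsigned currentLevel) {
    // Smallest level l whose WCBST still covers the measured execution time.
    unsigned l = 1;
    while (l < wcbst.size() && subframeTime > wcbst[l - 1])
        ++l;
    // The level is never decreased within the current frame (Equation 2.3).
    return std::max(l, currentLevel);
}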

2.3 Partitioned EDF-VD

The partitioned EDF-VD scheduler (pEDF-VD) [8] is an extension of the single-core EDF-VD scheduler [7] to multi-core systems. It consists of one EDF-VD scheduler per CPU core and a partitioning algorithm that partitions the tasks onto the processing cores. The partitioning is done offline and assigns each task to one of the single-core schedulers on the different CPU cores. At runtime each single-core scheduler schedules its own task set independently of the other cores and without any information exchange with them. The details of the used single-core EDF-VD scheduler and the partitioning algorithm are discussed in the next two sections.

2.3.1 EDF Scheduler with Virtual Deadlines

This single-core scheduler is already implemented in the mixed-criticality extension of the HSF [18]. It is a central part of the partitioned EDF-VD multi-core scheduler and supports two criticality levels. The scheduler starts by scheduling the tasks in low criticality mode according to the EDF scheduling algorithm, i.e. depending solely on their deadlines and independently of the criticality levels of the different tasks.


The only change that needs to be made for scheduling in low criticality mode is to adjust the deadlines of the high critical tasks, in order to guarantee their original deadlines when they need more time to execute. This is done by multiplying their original relative deadline with a factor x ≤ 1 to obtain their virtual deadline. The factor x depends only on the task set and is calculated offline before the task set is handled by the scheduler. The formula is given as follows [7]:

$x = \dfrac{U_{HI}^{LO}(\tau)}{1 - U_{LO}^{LO}(\tau)}$   (2.4)

The scheduler switches to high criticality mode as soon as one of the high critical tasks has overrun its low critical WCET. At this point the scheduler cancels all low critical tasks that have not finished their execution. The EDF-VD scheduler then continues scheduling only the high critical tasks according to EDF, this time using their original, unmodified deadlines. The high criticality mode is used until no pending high critical tasks need to be scheduled. Then the scheduler switches back to low criticality mode and continues scheduling the low criticality tasks according to their original deadlines and the high criticality tasks according to their virtual deadlines.

An important property of the EDF-VD scheduler is that it successfully schedules both the high and the low criticality tasks of any task set if the following condition is fulfilled [7]:

$\max\left(U_{LO}^{LO}(\tau) + U_{HI}^{LO}(\tau),\; U_{HI}^{HI}(\tau)\right) \le 3/4$   (2.5)

This means that every task set that has a utilization of less than or equal to 3/4 on each criticality level is successfully schedulable with EDF-VD. The partitioning algorithm discussed in the next section will heavily rely on this result.
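A small sketch of this offline preprocessing step, i.e. checking the condition of Equation 2.5 and computing the deadline scaling factor x of Equation 2.4, is given below; it reuses the hypothetical MCTask type and utilizationXY() helper sketched in Section 2.1 and is not the framework's implementation.

#include <algorithm>
#include <vector>

// Returns true if the task set passes the EDF-VD schedulability test (Equation 2.5)
// and stores the deadline scaling factor x (Equation 2.4) for the HI tasks.
bool edfVdPreprocess(const std::vector<MCTask>& taskset, double& x) {
    const unsigned LO = 1, HI = 2;
    double uLoLo = utilizationXY(taskset, LO, LO);   // U_LO^LO(τ)
    double uHiLo = utilizationXY(taskset, HI, LO);   // U_HI^LO(τ)
    double uHiHi = utilizationXY(taskset, HI, HI);   // U_HI^HI(τ)

    if (std::max(uLoLo + uHiLo, uHiHi) > 0.75)       // Equation 2.5 violated
        return false;

    x = uHiLo / (1.0 - uLoLo);                       // Equation 2.4
    // The virtual deadline of a HI task is then x * Ti, used in low criticality mode only.
    return true;
}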

2.3.2 Task Set Partitioning

Even if each core has a fixed task set and its own EDF-VD scheduler during runtime, the task set for the pEDF-VD scheduler is given globally. This task set first needs to be partitioned before the independent schedulers are started. The algorithm for partitioning the task set and checking its schedulability is given in Algorithm 2.1 [8, Section 6].

In a first step all the high critical tasks are distributed across the cores such that the high critical utilization $U_{HI}^{HI}$ of the task set on each core does not exceed 3/4, as required by Equation 2.5. This is done using a first fit bin packing algorithm, starting with the highest utilization task first. If one of the high critical tasks cannot be mapped to any core, the partitioning algorithm stops and returns failure.

After this first step the same is done using the low critical utilization of each core (including the low critical utilizations of the already partitioned high critical tasks). Each low critical task is mapped onto the first core for which the resulting low critical utilization remains lower than or equal to 3/4. Here the partitioning algorithm also stops and returns failure when a task cannot be mapped to any core.

If both steps succeed, the partitioning algorithm returns success. Please note that the schedulability of the EDF-VD schedulers on the individual cores does not need to be evaluated, because the utilization limit of 3/4 in the algorithm already assures that the EDF-VD schedulability condition in Equation 2.5 is fulfilled by design of the partitioning algorithm. The only thing that needs to be done after partitioning is to calculate the virtual deadlines as described before. Then everything is ready to run the independent EDF-VD schedulers on the different cores.

2.4 Summary

In this chapter we introduced the mixed-criticality scheduling problem and the terminology used to describe mixed-criticality task sets. This terminology was then used to explain the two mixed-criticality scheduling algorithms for multi-core systems called "FTTS" and "pEDF-VD".


Algorithm 2.1 Partitioning algorithm for the pEDF-VD scheduler. [8, Section 6]

Input: τ = {τ1, ..., τn} is partitioned on m processors denoted by π1, ..., πm. Tasks τ1, ..., τn1 are HI-criticality tasks, tasks τn1+1, ..., τn are LO-criticality tasks, both parts ordered by decreasing utilization on their own criticality level. τ(πk) denotes the set of all tasks mapped to processor πk.

τ(πk) ← ∅, ∀k
for i ← 1 to n1 do                                        ▷ Phase 1: HI-criticality tasks
    for k ← 1 to m do
        if Ui(HI) + Σ_{τj ∈ τ(πk)} Uj(HI) ≤ 3/4 then       ▷ satisfies schedulability condition on πk
            τ(πk) ← τ(πk) ∪ {τi}                           ▷ assign τi to πk
            break
        end if
    end for
    if τi was not assigned to any processor then return PARTITIONING FAILED
    end if
end for
for i ← n1 + 1 to n do                                    ▷ Phase 2: LO-criticality tasks
    for k ← 1 to m do
        if Ui(LO) + Σ_{τj ∈ τ(πk)} Uj(LO) ≤ 3/4 then       ▷ satisfies schedulability condition on πk
            τ(πk) ← τ(πk) ∪ {τi}                           ▷ assign τi to πk
            break
        end if
    end for
    if τi was not assigned to any processor then return PARTITIONING FAILED
    end if
end for
return PARTITIONING SUCCEEDED
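For illustration, a C++ sketch of this first fit partitioning follows; it uses the hypothetical MCTask type from Section 2.1 and assumes the tasks are already ordered as required by Algorithm 2.1 (HI-criticality tasks first, each group by decreasing utilization). It is not the framework's actual partitioning code.

#include <vector>

// Returns one task-index list per core, or an empty vector if partitioning fails.
std::vector<std::vector<size_t>> partitionEdfVd(const std::vector<MCTask>& tasks,
                                                size_t cores) {
    const unsigned LO = 1, HI = 2;
    std::vector<std::vector<size_t>> mapping(cores);
    std::vector<double> uLo(cores, 0.0), uHi(cores, 0.0);   // per-core LO and HI utilization

    for (size_t i = 0; i < tasks.size(); ++i) {
        // HI tasks are packed by their HI utilization (phase 1); LO tasks by their
        // LO utilization, on top of the LO demand of already placed HI tasks (phase 2).
        unsigned level = (tasks[i].criticality == HI) ? HI : LO;
        std::vector<double>& bin = (level == HI) ? uHi : uLo;
        bool placed = false;
        for (size_t k = 0; k < cores && !placed; ++k) {
            if (bin[k] + tasks[i].utilization(level) <= 0.75) {     // first fit, 3/4 bound
                mapping[k].push_back(i);
                if (tasks[i].criticality == HI)
                    uHi[k] += tasks[i].utilization(HI);
                uLo[k] += tasks[i].utilization(LO);
                placed = true;
            }
        }
        if (!placed) return {};                                     // PARTITIONING FAILED
    }
    return mapping;                                                 // PARTITIONING SUCCEEDED
}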


Chapter 3

Implementation

This chapter describes all the multi-core extensions of the Hierarchical Scheduling Framework and discusses the implementation of the FTTS and pEDF-VD scheduling algorithms. First we take a closer look at the original framework and at how it was extended to be deployed on multi-core platforms. In the second half of the chapter we discuss the specific implementations of the two added scheduling algorithms.

3.1 Hierarchical Scheduling Framework

The framework for multi-core mixed-criticality scheduling is an extension of the Hierarchical Scheduling Framework (HSF), which already supports single-core mixed-criticality scheduling. We first give an overview of the existing framework as described in [13, 18] and take a closer look at the parts that were extended. These extensions are discussed in the following sections.

HSF operates directly on top of the Linux concurrency manager in user space and adds an additional layer between the tasks and the scheduling mechanisms of the operating system (see Figure 3.1). This allows the framework to be portable and compatible with a wide range of hardware platforms, which represents the key design criterion of the framework besides its hierarchical scheduling support and easy extensibility.

Figure 3.1: Illustration of the added HSF scheduling layer on top of the standard Linux scheduler on the left. [13]

The framework itself consists of three main parts, which are shown in Figure 3.2. The parser is responsible for reading all simulation and scheduler configurations as well as the properties of the task set to schedule. This information is specified in an XML file. The core of the framework is the simulation part. It consists of different dispatchers, which register new task arrivals, one or more schedulers that schedule all the tasks, and the workers, which manage the execution of all the tasks. Each component of this simulation part is implemented in a different thread, such that the underlying operating system mechanisms can be used for thread switching, priority ordering and preemption. We will take a closer look at these threads and their internal hierarchy below. The third and last part of the framework is the statistics and output module, which collects traces during simulation, calculates metrics like the scheduler overhead and finally saves all the collected and calculated information to several output files. These files can be used later for further analysis.

Figure 3.2: Overview of the elements of the Hierarchical Scheduling Framework. [13]

3.1.1 The Thread Hierarchy

The most important part of the framework is the thread hierarchy, which represents all the different components that interact with each other during simulations. To add support for multi-core scheduling, several changes were made throughout this hierarchy. We shortly describe the functionality of the different parts of the original thread hierarchy of the Hierarchical Scheduling Framework and discuss the changes that were implemented by comparing the current to the original thread hierarchy. The structure of the two versions of the hierarchy is summarized in Figure 3.3 and the modifications are highlighted with colors: green marks new or added implementations, yellow extensions or bigger changes, and red dropped parts. Small changes or renamings of the interface are not colored. The figure only shows the public part of the class interfaces; for more detail please refer to the documentation that comes along with the source code.

One can see in the mentioned figure that everything is derived from the Thread class. This class represents a single thread and is an object-oriented wrapper around a POSIX thread (pthread, [6]) with some additional management functions. The base class of the whole hierarchy is affected by many changes. For multi-core scheduling, CPU affinity support was added and the core mapping is now an additional parameter when creating a Thread object. Another important change is that the start of the thread was moved outside the constructor to fix a race condition, which implies that a thread now needs to be started separately after its creation.

The Dispatcher hierarchy remained approximately as it was in the original framework. Only the new affinity mapping was introduced as an optional setting, such that the dispatcher can be mapped to the same CPU core on which the task executes that the dispatcher controls. Another change in the Dispatcher class ensures that dispatching now occurs at fixed time intervals by using a sleep function that sleeps until an absolute timestamp, so that the dispatch times can no longer drift. These time drifts were a problem in the previous implementation: when calculations were done or the dispatcher was even preempted between two relative sleep calls, the time interval between dispatching two jobs became longer than intended and varied (a minimal sketch of such absolute-time dispatching is given at the end of this subsection).

The common Runnable interface, which includes all the schedulers and the workers, was not affected by big changes. It only gained an optional CPU affinity mapping parameter when creating a new Runnable object and some additional abstractions of functions that are common to the Worker and the Scheduler classes.

A key part of the framework is the Worker class, which organizes the threads that execute the different tasks, because it interacts with nearly every component of the framework. Because every part of it needs to be well understood for the interaction with the multi-core scheduler, the whole Worker class was newly implemented from scratch. The main implementation criteria were full backward compatibility and simplicity. That means avoiding unnecessary synchronizations between the worker's task and the interface that the class exposes for interaction with the schedulers, dispatchers and other threads. The interface was kept unchanged wherever possible, because it already provided most of the needed functionality and is also needed for backward compatibility. There were a few interface extensions to support the new multi-core schedulers, which include an additional task cancellation method and a function to compute the task's utilization.

The Scheduler class, which abstracts common functionality provided by each scheduler, remains nearly the same. To keep compatibility with all previously implemented schedulers, this class does not provide a CPU affinity interface to its derived classes. The schedulers are instead always mapped to the first available core in the system. To avoid conflicts with worker threads on the same core as the schedulers, the workers are assigned to the remaining cores as long as not all available cores are needed for the workers. The fixed scheduler mapping guarantees that all existing schedulers do not require any changes at all (besides some adjustments due to changes in the interface that is shared with the Worker class).

The new FTTS and PartitionedEDF_VD classes implement the two presented mixed-criticality multi-core schedulers and derive directly from the Scheduler class, because they cannot be abstracted by any previously implemented scheduler. For their implementation details refer to Section 3.3 for FTTS and Section 3.4 for partitioned EDF-VD. Another notable extension of the scheduler hierarchy is the additional EDF_VD abstraction, which is needed for the partitioned EDF-VD implementation and discussed in the corresponding section.

For the new multi-core schedulers the JobSwitcher class was added. This class is only there to switch between two different tasks, because in the FTTS implementation a task needs to be able to activate the next task in the subframe and deactivate itself without scheduler interaction. The problem is that a thread cannot deactivate itself, because this is handled by assigning a lower priority to the thread. If a thread wants to lower its own priority, it is preempted before the function to lower the priority can return. This leads to blocking when the scheduler tries to activate the thread again using the same function that was preempted before. Therefore, the job switcher thread was introduced to handle the switching between the tasks, which allows lowering the thread priority from another thread that is not preempted during the previously mentioned function call.

Finally, one can see that the OverrunChecker class of the original framework is replaced by a whole hierarchy of overrun checkers. This is because we now need two different kinds of overrun checkers: one that checks for execution beyond a global timestamp, the AbsoluteOverrunChecker class, and another that checks whether a task has executed longer than a defined duration, the RuntimeOverrunChecker class, which implements the same functionality as the original overrun checker. Because these overrun checkers share a lot of common functionality, they are integrated into a hierarchy. There is also an overrun checker that is specialized to check for a runtime overrun of a worker, but this is only a small specialization of the runtime overrun checker.
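As an illustration of the absolute-time dispatching mentioned above, the following sketch shows a drift-free dispatch loop based on clock_nanosleep with TIMER_ABSTIME; the names (e.g. releaseJob) are illustrative and this is not the framework's Dispatcher code.

#include <time.h>

// Advance an absolute timestamp by a number of milliseconds.
static void addMs(timespec& t, long ms) {
    t.tv_nsec += ms * 1000000L;
    t.tv_sec  += t.tv_nsec / 1000000000L;
    t.tv_nsec %= 1000000000L;
}

// Release one job per period; sleeping until an absolute time avoids
// accumulating drift from computation or preemption between releases.
void dispatchLoop(long periodMs, int jobs) {
    timespec next;
    clock_gettime(CLOCK_MONOTONIC, &next);                        // first release: now
    for (int j = 0; j < jobs; ++j) {
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, nullptr);
        // releaseJob();                                          // hypothetical: signal the worker
        addMs(next, periodMs);                                    // next absolute release time
    }
}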

3.1.2 Scheduling from User Space

Because the different thread priorities will be mentioned later, we shortly describe how the Hierarchical Scheduling Framework schedules from user space. This part of the framework was not changed during this thesis, but is mentioned here to understand the problems with the priority assignment, which will be described in the FTTS implementation (Section 3.3).

To simulate scheduling from user space, the framework uses the priority-based operating system scheduler and the scheduling priorities that the operating system provides. The priorities that are used for the different parts of the framework are illustrated in Figure 3.4 and summarized in Table 3.1. The special priority assignment for the scheduler and worker when using the FTTS is discussed in Section 3.3.

Because there is no functionality to suspend a thread, an Idle thread was introduced in the HSF. This thread class is directly derived from the Thread class and simply does busy waiting until it is preempted by a higher priority thread. This allows stopping a task by assigning it a lower priority than the idle thread (not shown in the figure, because threads never execute on this priority level). The normal active priority for a worker is one level higher than the idle thread and is used when the worker executes a task. The next higher priority is assigned to the dispatchers, which trigger the arrival of a new job according to the task settings (arrival times and periods). Above the dispatchers there are only the different scheduler priorities to support hierarchical scheduling.


Figure 3.3: Comparison of the thread hierarchy of the original HSF implementation and the one adapted for multi-core support. (a) The original thread hierarchy; (b) the thread hierarchy adapted for multi-core scheduling. Colors: green for new implementations, yellow for extensions or changes, red for dropped parts. (Class diagrams not reproduced here.)


Priority                         Threads
Highest                          Framework, to manage the simulation
Highest − 1 − hierarchy level    Scheduler (different priorities for hierarchical scheduling), Worker (for synchronization with FTTS only)
5                                Dispatcher, overrun checkers
4                                Scheduling servers
3                                Worker (active state), FTTS (for synchronization only)
2                                Idle
1                                Worker and Scheduler in inactive state

Table 3.1: Summary of the thread priorities used in the Hierarchical Scheduling Framework.

Figure 3.4: Priority assignment of the different framework components (scheduler, dispatcher, worker and idle thread, from highest to lowest priority). [18]

However, the schedulers implemented in this thesis do not make use of any scheduling hierarchy and only use one scheduler priority. Not shown in the figure is the priority of the framework thread that organizes the simulation and remains inactive during the whole simulation. Its priority is higher than all the previously mentioned parts, to be able to set up and abort the simulation externally.
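The following sketch illustrates how this priority-based activation and deactivation can be realized with pthread_setschedparam under the SCHED_FIFO policy; the priority values follow Table 3.1 and the helper names are illustrative, not the framework's interface.

#include <pthread.h>
#include <sched.h>

enum { PRIO_INACTIVE = 1, PRIO_IDLE = 2, PRIO_WORKER_ACTIVE = 3 };

static void setFifoPriority(pthread_t thread, int priority) {
    sched_param param;
    param.sched_priority = priority;
    pthread_setschedparam(thread, SCHED_FIFO, &param);
}

// A worker is "started" by raising it above the busy-waiting idle thread and
// "stopped" by dropping it below the idle thread again.
void activateWorker(pthread_t worker)   { setFifoPriority(worker, PRIO_WORKER_ACTIVE); }
void deactivateWorker(pthread_t worker) { setFifoPriority(worker, PRIO_INACTIVE); }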

3.2 Extensions for Multi-Core Support

In this section we take a closer look at the newly added mechanisms that are essential for scheduling in multi-core environments.

3.2.1 CPU Affinity

In multi-core scheduling managing the processor affinity of the different tasks is very im-portant. Because the framework operates on top of the Linux operating system and usesthe pthread library, setting and getting the CPU affinity of a thread is not complicated. Thepthread implementation in Linux [6] that implements the POSIX thread specification [4] has ad-ditional support for the CPU affinity management with the pthread_setaffinity_np andpthread_getaffinity_np functions. With these functions setting and getting the threadaffinity is straightforward and perfectly fits our needs.During testing of different processor affinity mechanisms also the Portable Hardware Locality(hwloc) software [10, 1] was considered as an alternative. This library provides a completeabstraction of the hierarchy of the system hardware and provides also the functionality to setand get the CPU affinity of pthreads. It provides much more information about the hierarchyof caches, physical and logical CPU cores that could be used to optimize the affinity mapping.The interface is not much more complicated but needs to load and store the hardware hierarchyin the memory, which still is some additional work. Because of this and the fact the we wouldneed to include another library, the pthread affinity API was chosen, but hwloc would be a great


Nevertheless, hwloc would be a useful library if one needed to optimize a scheduler taking into account the hierarchy of the caches and other hardware components. However, such optimizations are beyond the scope of this thesis.
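As an illustration of this choice (a minimal sketch, not the framework's exact code; the helper name pin_thread_to_core is ours), pinning a pthread to a given core with the Linux affinity API looks roughly as follows:

// Requires _GNU_SOURCE for the non-portable *_np calls; g++ on Linux defines it by default.
#include <pthread.h>
#include <sched.h>

// Restrict 'thread' to run only on the logical CPU with index 'core'.
static int pin_thread_to_core(pthread_t thread, int core) {
    cpu_set_t set;
    CPU_ZERO(&set);            // start with an empty CPU set
    CPU_SET(core, &set);       // allow exactly one core
    return pthread_setaffinity_np(thread, sizeof(cpu_set_t), &set);
}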

3.2.2 Barrier Synchronization

In multi-core environments, synchronization between the different cores is another essential part. The POSIX standard [6, 4] also includes a barrier synchronization mechanism (the pthread_barrier_init, pthread_barrier_wait and pthread_barrier_destroy functions). Because we are using the pthread library for multithreaded programming in Linux, the support is already there. This barrier mechanism was evaluated by measuring the runtime of one million subsequent barrier synchronizations on systems with two and four cores. The results are shown in Table 3.2. With an average delay of up to 10 µs per barrier synchronization, we expected synchronization to contribute a large share of the scheduler overhead.

System                                        Delay per barrier synchronization [µs]
Dual-core (Intel Core i7-3687U at 2.1 GHz)    3.67 ± 1.26
Quad-core (Intel Core i5-4670 at 3.4 GHz)     9.42 ± 0.89

Table 3.2: Time needed for one barrier synchronization on a dual-core and a quad-core CPU. Average values and standard deviation of 10 test runs with 1 million subsequent barrier synchronizations, using one thread per available CPU core.
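A measurement of this kind can be reproduced with a small benchmark along the following lines (a sketch under simplifying assumptions, not the framework's actual test code; thread pinning and real-time priorities are omitted, and the core count is hard-coded):

#include <pthread.h>
#include <time.h>
#include <cstdio>
#include <vector>

static pthread_barrier_t g_barrier;
static const long kIterations = 1000000;           // one million synchronizations

// Every thread passes the same barrier kIterations times.
static void* barrier_loop(void*) {
    for (long i = 0; i < kIterations; ++i)
        pthread_barrier_wait(&g_barrier);
    return nullptr;
}

int main() {
    const unsigned cores = 4;                       // assumption: one thread per core of a quad-core CPU
    pthread_barrier_init(&g_barrier, nullptr, cores);

    timespec start{}, end{};
    clock_gettime(CLOCK_MONOTONIC, &start);

    std::vector<pthread_t> threads(cores);
    for (pthread_t& t : threads) pthread_create(&t, nullptr, barrier_loop, nullptr);
    for (pthread_t& t : threads) pthread_join(t, nullptr);

    clock_gettime(CLOCK_MONOTONIC, &end);
    double us = (end.tv_sec - start.tv_sec) * 1e6 + (end.tv_nsec - start.tv_nsec) / 1e3;
    std::printf("average per barrier synchronization: %.2f us\n", us / kIterations);

    pthread_barrier_destroy(&g_barrier);
    return 0;
}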

We first considered an alternative called fast barrier, which has been shown to be faster than the pthread barrier [11]. Because this implementation targets only the x86 platform and the Windows operating system, using Intel compiler specific commands, we used the pthread barrier synchronization instead. As the results in Section 4.3 show, the pthread barrier does not add considerably to the total scheduling overhead and scales sub-linearly with the number of cores. Therefore we did not try to adapt the fast barrier to our framework.

3.2.3 Time Measurement

For the simulation traces and statistics output, time measurement is also very important. The problem in multi-core systems is that every core needs the same global notion of time to generate useful traces from which statistics can be extracted and the scheduling of the task sets can be analyzed. For this purpose, two timing mechanisms were considered, as discussed below.

Operating System Timer Global timing was less problematic than anticipated, because we can simply use the high-resolution timers of the operating system, which are also defined in the POSIX standard [4]. These operating system timers are read over a system call interface (the clock_gettime function) and access essentially the same timer when called from different cores. The disadvantage is that blocking delays can occur when one core wants to access the timer while another is already inside the system call. However, the performance measurement of 1 million subsequent timer accesses showed that a single access takes only a few hundred nanoseconds (see Table 3.3). These delays are smaller than the resolution of the timestamps stored in the simulation output of our framework (times are stored with an accuracy of 1 microsecond).

System                                        OS clock access time [ns]
Dual-core (Intel Core i7-3687U at 2.1 GHz)    158.1 ± 85.53
Quad-core (Intel Core i5-4670 at 3.4 GHz)     254.3 ± 0.95

Table 3.3: Time to access the operating system timer for a dual-core and a quad-core CPU. Average and standard deviation of 10 test runs with 1 million subsequent timer accesses each.
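As a sketch of such a timer access (illustrative only; the helper name and the specific clock ID are our assumptions), reading the POSIX clock and converting the timestamp to the microsecond resolution used in the traces looks roughly like this:

#include <time.h>
#include <cstdint>

// Global timestamp in microseconds: clock_gettime() reads the same clock
// from every core through a system call; the raw resolution is nanoseconds.
static uint64_t now_us() {
    timespec ts{};
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return static_cast<uint64_t>(ts.tv_sec) * 1000000ULL + ts.tv_nsec / 1000;
}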

Hardware Timer An alternative time measurement method that we considered as an option was the x86-specific time stamp counter, because this timer can be accessed with a single CPU instruction and has an accuracy of one CPU cycle.


The disadvantages of this option are that the timer is platform-specific, while the framework was designed to be cross-platform, and that the timer value is affected by unpredictable frequency scaling on older processors. Therefore we decided to keep accessing the time through the operating system using the POSIX interface.
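For comparison, the rejected alternative would look roughly as follows (a hypothetical sketch; GCC/Clang on x86 only):

#include <x86intrin.h>
#include <cstdint>

// Read the x86 time stamp counter: one instruction, cycle resolution,
// but not portable and (on older CPUs) affected by frequency scaling.
static inline uint64_t read_tsc() {
    return __rdtsc();
}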

3.2.4 Parser Extensions

To support the new multi-core schedulers, the XML inputs needed to be extended. As a consequence, the parser of the input files requires extended functionality. A dedicated parser routine was added for both the FTTS and the partitioned EDF-VD scheduler. Because the FTTS has its own internal hierarchy, the parsing of its frames and subframes was also implemented in a separate routine. Details about the XML specifications can be found in the scheduler implementation sections.

3.2.5 Simulation Output and Visualization

To obtain usable output for multi-core simulations, the output had to be extended. This mainly includes the CPU affinity, which needs to be stored with each task trace, and statistics for more than one CPU core. Furthermore, the simulation visualization tool simfig needed to be extended with dynamic image scaling for many tasks and with visualization of scheduler-specific events and of the processor affinity of the different tasks. The newly added elements in the output graphs are listed in Appendix A.4.

3.3 Flexible Time Triggered Scheduler

3.3.1 Implementation Details

The Flexible Time Triggered Scheduler implementation (the FTTS class in the thread hierarchy in Figure 3.3b) has a main function called schedule that manages the scheduling decisions. The implemented algorithm is shown in Algorithm 3.1. It consists of an outer loop that represents a whole cycle and runs until the scheduler is deactivated; this loop does no more than check whether the condition to terminate scheduling is fulfilled. It contains another loop that iterates over all frames of an FTTS cycle. At the start of each frame the scheduler waits until the planned start of the frame. If the start time of the frame has already passed due to an overrun of the previous one, the scheduler continues without waiting. After waiting until the start time of the current frame, the frame loop enters another loop, which iterates over all subframes in descending order of criticality.

Inside the subframe loop most of the scheduling actions are implemented. The scheduler checks whether the subframe is executed or skipped (depending on the current criticality level). If the subframe is skipped, all tasks of the current subframe are dropped (i.e., their arrived jobs are removed) and the scheduler proceeds with the next subframe without activating any worker. If the subframe is executed, the subframe's WCBSTs are loaded and the absolute-time overrun checker is configured to abort any task in the subframe that has not completed execution by the latest WCBST. Then the scheduler determines the number of active cores in the subframe, which is needed to configure the barrier synchronization that is set up right afterwards. In the next step the scheduler activates the first task on each core and enters the pthread_barrier_wait function to wait for all activated cores to synchronize.

When all started cores have synchronized with the scheduler after their tasks have finished executing, the scheduler deactivates the last executed task on all cores, because the tasks cannot deactivate themselves (the same reason why the JobSwitcher class is needed, see Section 3.1.1). Then the scheduler checks whether the WCBST of the current criticality level was overrun and adjusts the criticality level accordingly. With this step the scheduler finishes the subframe execution and continues with the next one. All these loops execute until the simulation has finished; then the exit condition is set and the scheduler terminates.


Algorithm 3.1 Implementation of the Flexible Time Triggered Scheduler.

▷ The variables used in this algorithm are: L for the number of criticality levels (equal to the highest criticality), l for the current criticality level of the scheduler, and WCBST_s(y) for the worst-case barrier synchronization time of subframe s at criticality level y.

function SCHEDULE
    Initialize scheduler (first frame start, etc.)
    while scheduler activated do                          ▷ FTTS cycle loop; loop as long as the scheduler is active
        for f ← 1 to #frames do                           ▷ loop over all frames f in the cycle
            l ← 1                                         ▷ reset criticality level at the beginning of the frame
            Wait for frame start
            for s ← L to 1 do                             ▷ loop over subframes s in f, from highest to lowest criticality
                if s < l then
                    Drop tasks of subframe
                    continue
                end if
                timeout ← WCBST_s(s)                      ▷ latest WCBST of the subframe as timeout
                Configure overrun checker to abort tasks at timeout
                c ← number of cores active in this subframe (at least one task to execute)
                Configure barrier to synchronize c + 1 threads (scheduler and c latest tasks)
                for core ∈ active cores do
                    Start first task on core
                end for
                Enter barrier synchronization             ▷ synchronizes when all tasks have finished
                for core ∈ active cores do
                    Stop last task on core
                end for
                t ← current time
                if t > WCBST_s(l) then                    ▷ increase criticality level when WCBST_s(l) was overrun
                    l ← min { l' : t ≤ WCBST_s(l') }
                end if
            end for                                       ▷ end of subframe
        end for                                           ▷ end of frame
    end while                                             ▷ end of cycle
end function

Switches between tasks in the same subframe are handled on a per-core basis: each completed task calls the scheduler's finishJob function. This function checks whether there is a next task on the same core and starts its execution with the help of a job switcher. If no other job follows, the function enters barrier synchronization by calling the pthread_barrier_wait function.

The rest of the FTTS is similar to the other schedulers: the newJob function registers the arrival of a new job and overrunJob registers a job that has finished after its deadline. The latter case is handled like a normally finished job (one that met its deadline), because the scheduler waits for these jobs to synchronize as well.
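The per-core hand-off just described could look roughly like the following sketch (class, member and function names are illustrative, not the framework's actual API):

#include <pthread.h>
#include <deque>
#include <vector>

struct Job { /* task instance data */ };

// Simplified view of the per-subframe state the FTTS scheduler maintains.
struct SubframeState {
    std::vector<std::deque<Job*>> perCoreQueue;   // remaining jobs per active core
    pthread_barrier_t barrier;                    // sized for c active cores + the scheduler

    // Called when the running job on 'core' completes.
    void finishJob(unsigned core) {
        if (!perCoreQueue[core].empty()) {
            Job* next = perCoreQueue[core].front();
            perCoreQueue[core].pop_front();
            dispatch(next, core);                 // hand the next job to the job switcher
        } else {
            pthread_barrier_wait(&barrier);       // last job on this core: sync with the scheduler
        }
    }

    void dispatch(Job*, unsigned) { /* activate the worker thread for the job */ }
};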

3.3.2 Challenges

FTTS was the first multi-core scheduler implemented in the framework. There were many different challenges that needed to be solved:

Barrier synchronization The barrier synchronization was a challenge because it blocks at the next pthread_barrier_wait call when not all tasks have left their pthread_barrier_wait function after synchronizing. Therefore additional care needs to be taken to ensure that all tasks have left this function and are not preempted by a higher-priority thread such as the scheduler. To achieve this behavior we need to deal with the thread priorities, which leads to the next problem.


(Figure: timeline of the scheduler and worker priorities around a pthread_barrier_wait at a subframe switch, once all threads have synchronized.)

Figure 3.5: Priority inversion at FTTS synchronization points: 1) Scheduler priority is temporarily decreased, 2) Worker priority is temporarily increased.

Priority management The different priority levels used to schedule from user space also created many problems. In the FTTS the tasks on each core need to synchronize with the scheduler. The problem is that the tasks and the scheduler execute with different priorities, but they can block each other until all of them have synchronized. Because the pthread barrier has no support for priority inversion protocols, much care must be taken to avoid deadlocks. In particular, the above-mentioned fact that all threads need to exit the pthread_barrier_wait function before the barrier can be configured again requires additional effort when the scheduler executes on a core that is shared with other tasks. To ensure that the scheduler is able to configure the barrier after synchronization, the synchronizing worker threads and the scheduler thread swap their priorities for the duration of the barrier synchronization, as illustrated in Figure 3.5. This guarantees that all worker threads have left the barrier's pthread_barrier_wait function before the scheduler sets up the next subframe and reinitializes the barrier.
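Seen from the scheduler thread, the temporary priority swap can be sketched as follows (illustrative only; the worker side is symmetric, and the priority values, policy and helper name are assumptions):

#include <pthread.h>
#include <sched.h>

// Scheduler-side part of the swap in Figure 3.5: drop below the worker
// priority before waiting on the barrier, restore it afterwards.
static void scheduler_barrier_sync(pthread_barrier_t* barrier,
                                   int schedulerPrio, int workerPrio) {
    sched_param p{};
    p.sched_priority = workerPrio;                         // 1) temporarily decrease own priority
    pthread_setschedparam(pthread_self(), SCHED_FIFO, &p);

    pthread_barrier_wait(barrier);                         // workers finish and leave the barrier

    p.sched_priority = schedulerPrio;                      // restore the scheduler priority
    pthread_setschedparam(pthread_self(), SCHED_FIFO, &p);
}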

Task cancellation Another challenging part was the dynamic task cancellation by the scheduler when a subframe has overrun a particular threshold or a complete subframe needs to be dropped. Similar functionality already existed in the previous version of the framework, but it had to be modified because it blocked until all tasks had actually canceled their execution, whereas we need non-blocking cancellation for performance reasons. In addition, the cancellation routine was updated to drop a task very quickly when it has not yet started its execution; such a task no longer needs to be started only to be canceled. Because the Worker class that manages the task execution can be in many different states that can change at any time, properly synchronizing the cancellation of a task required a lot of effort. This involved multiple nested critical sections, which are very challenging to implement without deadlocks, and finding errors when it does not work as intended is a tedious task.
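The non-blocking part of this mechanism can be illustrated with a simple cancellation flag (a deliberately simplified sketch; the real Worker additionally has to protect its state transitions with the nested critical sections mentioned above):

#include <atomic>

// The scheduler only requests cancellation and returns immediately;
// the worker checks the flag at its next safe point.
struct WorkerCancelFlag {
    std::atomic<bool> cancelRequested{false};

    void requestCancel() { cancelRequested.store(true, std::memory_order_relaxed); }
    bool shouldStop() const { return cancelRequested.load(std::memory_order_relaxed); }
};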

Debugging Debugging multi-threaded applications is known to be a cumbersome task. The main problem is that there is no debugger that can reproduce the thread interactions as if they were executed directly on the system. There are some tools that try to make debugging multi-threaded applications easier, but they have high overhead and sometimes require modifications in the source code to work reliably. An open-source example of such a tool is valgrind [3]. Because of these disadvantages we mainly used the standard console output of the system to isolate a problem. The advantage of this approach is that the application is executed directly on the system and in real time, which is crucial for our scheduling framework, because the FTTS uses time-triggered frame starts that are released at absolute timer values. The disadvantages are that the problem needs to be narrowed down carefully in iterative steps and that the console output itself has some overhead. Furthermore, the console output from different cores is synchronized by the operating system, which may introduce unnecessary dependencies between the threads. In the end, this approach, together with the already mentioned valgrind tools, which were used regularly to check for memory leaks, fitted our needs best. However, the execution times were many times higher when using the valgrind tools, and the simulation results were then unusable.


(Figure: Gantt chart of the schedule on Core 1 and Core 2 over the time interval 0 to 400 ms.)

Figure 3.6: Possible FTTS schedule for the task set defined in the sample XML specification. [12]

3.3.3 The XML Input Specification

The XML input for the FTTS is structured like the scheduling algorithm itself. A simple XML input for the FTTS algorithm is given in Listings 3.1 and 3.2. An example schedule of the task set specified in this XML is shown in Figure 3.6.

First of all, there is the root element simulation, which names the simulation and contains all data required for it, such as the simulation duration. The FTTS runnable element follows; it contains a cycle duration, the runnable definitions of all tasks to schedule and the frame definitions. An FTTS runnable element has the algorithm attribute set to FTTS and additional attributes for the number of cores used for scheduling and the number of criticality levels.

Each runnable element that defines a task has the additional child element name, which is used to refer to the task in the frame definitions, and an element core to specify the core mapping. The other child elements of the runnables are similar to the XML specification of the existing schedulers. They specify the criticality level of the task, the WCETs for the different criticality levels, the period and an optional parameter for the probability distribution of the WCET.

The frame elements after the task definitions are sorted in the order of their execution and also contain a duration element to specify their length. Each frame element has as many subframe elements as there are criticality levels; their attribute again specifies the criticality level of the subframe. The subframes contain an element for each core used for scheduling. Finally, these core elements contain the references to the tasks that are executed on the specified core, declared in the order in which they will be executed.

With this XML specification the complete hierarchy of the FTTS, which is needed for scheduling, is defined. It is intended that the core mapping and the criticality level are defined twice, once in the task definition and again inside the FTTS hierarchy. This allows the parser to do some consistency checks, which makes the XML more robust against accidental specification mistakes.

3.4 Partitioned EDF-VD

3.4.1 Implementation

The partitioned EDF-VD implementation is straightforward, because it consists of simply running an isolated single-core EDF-VD scheduler on each core, which was already implemented in the framework [18]. There actually exist two different EDF-VD implementations: the first manages only a single queue, which needs to be updated when switching between the low and high criticality modes; the second maintains a separate queue for each execution mode. The first implementation has a lower average overhead than the latter, because only one queue needs to be managed, while the second has no additional overhead when switching the execution mode, because no queue needs to be updated then. In our evaluation we use the second implementation to obtain an overhead that is independent of the number of execution mode switches.

The partitioned multi-core implementation consists of the PartitionedEDF_VD class, which is directly derived from the Scheduler class. This class only implements the partitioning algorithm described in Section 2.3 and a routine to start the distributed EDF-VD schedulers. The rest of the scheduler interface is mapped directly to the distributed EDF-VD schedulers, because the partitioned EDF-VD scheduler does not provide any additional functionality during runtime.
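A rough sketch of this structure is given below (class shape and method names are assumptions made for illustration; the framework's actual interfaces differ):

#include <memory>
#include <vector>

class Task;                                  // stand-in for the framework's task type
class Scheduler { public: virtual ~Scheduler() = default; };

// One single-core EDF-VD instance per core; runtime calls are simply forwarded.
template <class EdfVdImpl>                   // e.g. the single- or two-queue EDF-VD variant
class PartitionedEDF_VD : public Scheduler {
public:
    explicit PartitionedEDF_VD(unsigned cores) : perCore_(cores), assigned_(cores) {}

    // Partition the global task set, then create and initialize the per-core schedulers.
    void partitionAndStart(const std::vector<Task*>& taskSet) {
        for (Task* t : taskSet)
            assigned_[pickCore(t)].push_back(t);          // partitioning heuristic (Section 2.3)
        for (unsigned c = 0; c < perCore_.size(); ++c) {
            perCore_[c] = std::make_unique<EdfVdImpl>(assigned_[c], c);
            perCore_[c]->updateVirtualDeadlines();        // the per-core task set is now known
        }
    }

    void newJob(Task* t, unsigned core) { perCore_[core]->newJob(t); }  // forwarded as-is

private:
    unsigned pickCore(Task*) { return 0; /* placeholder for the real heuristic */ }
    std::vector<std::unique_ptr<EdfVdImpl>> perCore_;
    std::vector<std::vector<Task*>> assigned_;
};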


Listing 3.1: Sample XML file structure for FTTS

<?xml version="1.0"?>
<simulation name="ftts-example">
  <duration value="4" units="sec" />
  <runnable type="scheduler" algorithm="FTTS" cores="2" criticality_levels="2">
    <duration value="200" units="ms" />
    <runnable type="worker" periodicity="periodic" task="busy_wait">
      <name value="task1" />
      <core value="0" />
      <period value="100" units="ms" />
      <criticality_level value="2" />
      <wcet criticality_level="1" value="26200" units="us" />
      <wcet criticality_level="2" value="48200" units="us" />
      <distribution value="binary" parameter="50" />
    </runnable>
    <runnable type="worker" periodicity="periodic" task="busy_wait">
      <name value="task2" />
      <core value="1" />
      <period value="50" units="ms" />
      <criticality_level value="2" />
      <wcet criticality_level="1" value="18200" units="us" />
      <wcet criticality_level="2" value="21600" units="us" />
      <distribution value="binary" parameter="50" />
    </runnable>
    <runnable type="worker" periodicity="periodic" task="busy_wait">
      <name value="task3" />
      <core value="0" />
      <period value="50" units="ms" />
      <criticality_level value="1" />
      <wcet criticality_level="1" value="8500" units="us" />
      <wcet criticality_level="2" value="0" units="us" />
      <distribution value="fixedtcet" parameter="" />
    </runnable>
    <runnable type="worker" periodicity="periodic" task="busy_wait">
      <name value="task4" />
      <core value="0" />
      <period value="200" units="ms" />
      <criticality_level value="1" />
      <wcet criticality_level="1" value="21500" units="us" />
      <wcet criticality_level="2" value="0" units="us" />
      <distribution value="fixedtcet" parameter="" />
    </runnable>
    <frame>
      <duration value="50" units="ms" />
      <subframe criticality_level="2">
        <barrier criticality_level="2" value="48200" units="us" />
        <barrier criticality_level="1" value="27200" units="us" />
        <core id="0">
          <task reference="task1" />
        </core>
        <core id="1">
          <task reference="task2" />
        </core>
      </subframe>
      <subframe criticality_level="1">
        <barrier criticality_level="2" value="0" units="us" />
        <barrier criticality_level="1" value="8900" units="us" />
        <core id="0">
          <task reference="task3" />
        </core>
        <core id="1"></core>
      </subframe>
    </frame>
    <!-- continues on next page -->


Listing 3.2: Sample XML file structure for FTTS (continued)

    <!-- continued from previous page -->
    <frame>
      <duration value="50" units="ms" />
      <subframe criticality_level="2">
        <barrier criticality_level="2" value="21600" units="us" />
        <barrier criticality_level="1" value="19200" units="us" />
        <core id="0"></core>
        <core id="1">
          <task reference="task2" />
        </core>
      </subframe>
      <subframe criticality_level="1">
        <barrier criticality_level="2" value="3400" units="us" />
        <barrier criticality_level="1" value="8900" units="us" />
        <core id="0">
          <task reference="task3" />
        </core>
        <core id="1"></core>
      </subframe>
    </frame>
    <frame>
      <duration value="50" units="ms" />
      <subframe criticality_level="2">
        <barrier criticality_level="2" value="21600" units="us" />
        <barrier criticality_level="1" value="19200" units="us" />
        <core id="0"></core>
        <core id="1">
          <task reference="task2" />
        </core>
      </subframe>
      <subframe criticality_level="1">
        <barrier criticality_level="2" value="3400" units="us" />
        <barrier criticality_level="1" value="30900" units="us" />
        <core id="0">
          <task reference="task3" />
          <task reference="task4" />
        </core>
        <core id="1"></core>
      </subframe>
    </frame>
    <frame>
      <duration value="50" units="ms" />
      <subframe criticality_level="2">
        <barrier criticality_level="2" value="48200" units="us" />
        <barrier criticality_level="1" value="27200" units="us" />
        <core id="0">
          <task reference="task1" />
        </core>
        <core id="1">
          <task reference="task2" />
        </core>
      </subframe>
      <subframe criticality_level="1">
        <barrier criticality_level="2" value="3400" units="us" />
        <barrier criticality_level="1" value="8900" units="us" />
        <core id="0">
          <task reference="task3" />
        </core>
        <core id="1"></core>
      </subframe>
    </frame>
  </runnable>
</simulation>


Listing 3.3: Sample XML file structure for the partitioned EDF-VD scheduling algorithm

<?xml version="1.0"?>
<simulation name="pedf-vd-example">
  <duration value="4" units="sec" />
  <runnable type="scheduler" algorithm="pEDF-VD2" cores="2" criticality_levels="2">
    <runnable type="worker" periodicity="periodic" task="busy_wait">
      <period value="100" units="ms" />
      <criticality_level value="2" />
      <wcet criticality_level="1" value="26200" units="us" />
      <wcet criticality_level="2" value="48200" units="us" />
      <distribution value="binary" parameter="50" />
    </runnable>
    <runnable type="worker" periodicity="periodic" task="busy_wait">
      <period value="50" units="ms" />
      <criticality_level value="2" />
      <wcet criticality_level="1" value="18200" units="us" />
      <wcet criticality_level="2" value="21600" units="us" />
      <distribution value="binary" parameter="50" />
    </runnable>
    <runnable type="worker" periodicity="periodic" task="busy_wait">
      <period value="50" units="ms" />
      <criticality_level value="1" />
      <wcet criticality_level="1" value="8500" units="us" />
      <wcet criticality_level="2" value="0" units="us" />
      <distribution value="fixedtcet" parameter="" />
    </runnable>
    <runnable type="worker" periodicity="periodic" task="busy_wait">
      <period value="200" units="ms" />
      <criticality_level value="1" />
      <wcet criticality_level="1" value="21500" units="us" />
      <wcet criticality_level="2" value="0" units="us" />
      <distribution value="fixedtcet" parameter="" />
    </runnable>
  </runnable>
</simulation>

The partitioned EDF-VD class was also designed with a template parameter that accepts any implementation of an EDF-VD scheduler, in order to support the different existing EDF-VD schedulers. The EDF-VD implementation itself can be used as is. The only additional functionality that was implemented is the algorithm for updating the virtual deadlines, because the task set of each single-core EDF-VD scheduler is not known until the initial task set has been partitioned. The function to update the virtual deadlines was implemented in the additional EDF_VD class, because this function is common to all EDF-VD implementations.
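The common deadline update can be sketched with the standard EDF-VD rule, in which the deadlines of high criticality tasks are scaled by a factor x computed from the low criticality utilizations (field names are illustrative, and whether the framework uses exactly this rule is not shown here):

#include <vector>

struct McTask {
    double period;            // T_i; relative deadline equals the period
    double utilLO;            // C_i(LO) / T_i
    bool   highCrit;
    double virtualDeadline;
};

// Standard EDF-VD scaling: x = U_HI(LO) / (1 - U_LO(LO));
// high criticality tasks get the virtual deadline x * T_i.
void updateVirtualDeadlines(std::vector<McTask>& tasks) {
    double uLoLo = 0.0, uHiLo = 0.0;
    for (const McTask& t : tasks)
        (t.highCrit ? uHiLo : uLoLo) += t.utilLO;

    const double x = uHiLo / (1.0 - uLoLo);
    for (McTask& t : tasks)
        t.virtualDeadline = t.highCrit ? x * t.period : t.period;
}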

3.4.2 The XML Input Specification

The XML input for the partitioned EDF-VD scheduler is a subset of the FTTS XML input specification. This means that every XML input for the FTTS can be used directly for a partitioned EDF-VD simulation. The only necessary change is to set the algorithm attribute of the scheduler's runnable element to pEDF-VD2 in order to use the partitioned EDF-VD implementation, which uses the second variant of the EDF-VD implementation. The scheduler definition itself only has runnable child elements, which represent the global task set that needs to be partitioned. Because there are no frames that need to refer to a specific task and the tasks are not mapped to cores in advance, the runnables do not require any name or core element. Everything else remains as described in the FTTS XML specification (Section 3.3.3). A simple XML input for the partitioned EDF-VD algorithm is given in Listing 3.3.

3.5 Summary

This chapter introduced the Hierarchical Scheduling Framework and described the extensions made to support multi-core scheduling. The design decisions for the multi-core extensions were discussed, including processor affinity, inter-core synchronization, time measurement, input parser extensions and output visualization.


Finally, the implementations of the two scheduling algorithms "FTTS" and "pEDF-VD" were described together with their XML input specifications, and their implementation challenges were discussed.


Chapter 4

Experiments

In this chapter we discuss our methods to calculate the overheads of the implemented schedulers. We then briefly describe the algorithm used for sample task set generation and the setup for the experiments. The last part presents and discusses the results of our simulations.

4.1 Overhead Calculation

To measure the overhead of the schedulers we use the presented time measurement method and the per-thread clocks of the operating system, which measure the total execution time of each thread. The total execution time of the framework can be decomposed as follows [18]:

$t_{\text{total}} = t_{\text{scheduler}} + t_{\text{tasks}} + t_{\text{idle}} + t_{\text{framework}} + t_{\text{operating system}} + t_{\text{system tasks}} + t_{\text{user tasks}}$   (4.1)

where $t_{\text{scheduler}}$ is the time used for scheduling the tasks, $t_{\text{tasks}}$ the time spent executing tasks, $t_{\text{idle}}$ the idle time of the framework, $t_{\text{framework}}$ the time for all operations of the framework to manage the simulation, such as dispatching, monitoring overruns and tracing execution, $t_{\text{operating system}}$ the time needed by the operating system scheduler and for context switching, $t_{\text{system tasks}}$ the accumulated execution time of system tasks that do not run within the framework and are not controllable by the user, and finally $t_{\text{user tasks}}$ the accumulated time of all user tasks that run beside the framework. For our analysis we assume that no user tasks other than the framework itself execute on the system, i.e. $t_{\text{user tasks}} = 0$. On a real system only the user tasks can be blocked by running the framework with a real-time priority; some system tasks with real-time priority that can preempt the framework will always remain, therefore $t_{\text{system tasks}} \geq 0$.

To quantify the scheduler overhead, we use the time measurement method presented in Section 3.2 to measure the overhead of the different components of the scheduler. The times $t_{\text{tasks}}$ and $t_{\text{idle}}$ can be measured directly using the system clocks that report total thread execution times. There remain the $t_{\text{framework}}$, $t_{\text{operating system}}$ and $t_{\text{system tasks}}$ overheads. Because we cannot measure these components directly, we calculate their sum as the difference between the total execution time $t_{\text{total}}$ and all the previously discussed components that we can measure directly:

$t_{\text{framework}} + t_{\text{operating system}} + t_{\text{system tasks}} = t_{\text{total}} - t_{\text{scheduler}} - t_{\text{tasks}} - t_{\text{idle}}$   (4.2)

However, we cannot influence this part of the overhead, because it depends on the operating system implementation. We therefore focus on the analysis of the different parts of the scheduler overhead. The results of this analysis are discussed in Section 4.3.
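The directly measurable terms are obtained from per-thread CPU-time clocks; a minimal sketch of such a reading for the calling thread is shown below (the specific clock ID and helper name are our assumptions):

#include <time.h>
#include <cstdint>

// Accumulated CPU time consumed by the calling thread, in microseconds.
// pthread_getcpuclockid() can be used to obtain the clock of another thread.
static uint64_t thread_cpu_time_us() {
    timespec ts{};
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
    return static_cast<uint64_t>(ts.tv_sec) * 1000000ULL + ts.tv_nsec / 1000;
}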

4.2 Simulation

For the simulations we generated many different task sets and then optimized the task mapping for the FTTS. The task sets were generated for utilization bounds ranging from 0.2 to 8 (increasing in steps of 0.2), and the mapping was optimized for 2, 4, 8 and 16 cores. For task set generation the same algorithm as discussed in [15, Section VIII] was used; it is restated in Algorithm 4.1.


Algorithm 4.1 Mixed-criticality task set generator. [15, Section VIII]

▷ Let U(a, b) return a number uniformly drawn from [a, b]
i ← 0
S_L ← 0                                     ▷ current low criticality utilization (U^LO_LO + U^LO_HI)
S_H ← 0                                     ▷ current high criticality utilization (U^HI_HI)
repeat
    i ← i + 1
    u_H ← U(U_L, U_H)
    u_L ← u_H / U(Z_L, Z_H)
    χ_i ← HI with probability P; LO with probability (1 − P)
    if χ_i = LO then
        U_i(LO) ← min(u_L, U_bound − S_L)
        S_L ← S_L + U_i(LO)
    else
        U_i(HI) ← min(u_H, U_bound − S_H)
        U_i(LO) ← min(U_i(HI), u_L, U_bound − S_L)
        S_H ← S_H + U_i(HI)
        S_L ← S_L + U_i(LO)
    end if
until max(S_L, S_H) ≥ U_bound

The parameters used for generating the task sets are listed below; a C++ sketch of the generator follows the list.

• The utilization bound that the generated task set will have was varied from 0.2 to 8. It is defined as

  $U_{\text{bound}} = \max\left(U^{LO}_{LO}(\tau) + U^{LO}_{HI}(\tau),\; U^{HI}_{HI}(\tau)\right)$.   (4.3)

• Task periods $T_i$ are randomly chosen from the set {200, 400, 800} ms.

• The utilizations of the tasks at their own criticality level (i.e. LO for low criticality and HI for high criticality tasks) are uniformly distributed from 0.1 to 0.4, i.e. $[U_L, U_H] = [0.1, 0.4]$. For high criticality tasks, this utilization is divided to obtain the LO criticality WCET (see the last point), so they can have utilizations below 0.1 at LO criticality.

• The probability that a task is of high criticality is 30%, i.e. $P = 0.3$.

• The high criticality WCET of the high criticality tasks is a multiple of the low criticality WCET. The factor between these two WCETs is uniformly distributed between 1 and 4, i.e. $[Z_L, Z_H] = [1, 4]$.
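A minimal C++ sketch of Algorithm 4.1 under the parameter choices above (struct and function names are ours; the random period selection from {200, 400, 800} ms and the WCET computation are omitted):

#include <algorithm>
#include <random>
#include <vector>

struct GenTask { double uLO = 0.0, uHI = 0.0; bool highCrit = false; };

// Add tasks until the LO or HI utilization reaches the bound (Algorithm 4.1).
std::vector<GenTask> generateTaskSet(double Ubound, std::mt19937& rng) {
    std::uniform_real_distribution<double> u(0.1, 0.4);   // [U_L, U_H]
    std::uniform_real_distribution<double> z(1.0, 4.0);   // [Z_L, Z_H]
    std::uniform_real_distribution<double> p(0.0, 1.0);
    std::vector<GenTask> tasks;
    double SL = 0.0, SH = 0.0;                             // current LO / HI utilization
    do {
        double uH = u(rng);
        double uL = uH / z(rng);
        GenTask t;
        t.highCrit = (p(rng) < 0.3);                       // P = 0.3
        if (!t.highCrit) {
            t.uLO = std::min(uL, Ubound - SL);
            SL += t.uLO;
        } else {
            t.uHI = std::min(uH, Ubound - SH);
            t.uLO = std::min({t.uHI, uL, Ubound - SL});
            SH += t.uHI;
            SL += t.uLO;
        }
        tasks.push_back(t);
    } while (std::max(SL, SH) < Ubound);
    return tasks;
}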

Using these parameters, the algorithm adds new tasks to the task set until the low or high criticality utilization reaches $U_{\text{bound}}$. The generated task set is then used as input for the mapping optimization of the FTTS algorithm as described in Section 2.2. The length of the different frames is fixed to 200 ms. A single modification is made to the task set if necessary: when the WCET of a task (i.e. WCET(LO) for low criticality and WCET(HI) for high criticality tasks) is larger than the frame length of the FTTS, the task is split into subtasks with equal WCETs that fit into an FTTS frame. After the FTTS mapping has been optimized, and if the task set is schedulable, it is stored in an XML file using the introduced XML structure. These XML files are then used for the simulations with the scheduling algorithms FTTS and pEDF-VD.

Besides the number of cores and the system utilization, the probability that a high criticality task's execution time overruns its low criticality WCET was varied from 0% to 100% in 10% steps. In the analysis we refer to this probability as the "overrun probability". For each utilization we generated 50 different task sets and simulated the schedulable ones with different numbers of cores and varying overrun probabilities. The generated results were then used to analyze the impact of the number of cores, the overrun probability and the system utilization on the overhead of the two schedulers.


(a) Relative FTTS overhead for different number of cores (components: frame init, subframe init, barrier sync, job arrival, job finished, subframe overrun).
(b) Relative pEDF-VD overhead for different number of cores (components: administration, job finished, job overrun, new job).

Figure 4.1: Relative scheduler overheads for different number of cores. The normalized utilization is set to 0.4 and the overrun probability to 40% for all simulations.

Finally, we analyzed these impacts when the periods and execution times of the task sets are divided by a factor of 10, to get an idea of how the scheduler overheads scale with the task execution times. The results of these simulations are discussed in the next section.

All simulations were executed on a server with two Intel Xeon E5-2690 processors at 2.90 GHz with 8 cores each, which means that we could run simulations on up to 16 cores. As software we used a standard 64-bit Linux operating system with kernel version 3.7 and preemption enabled.

4.3 Results

The results of our simulations can be seen in Figures 4.1 to 4.6. We look into how the overhead changes for different numbers of cores, different overrun probabilities of the tasks and increasing system utilization. In the last subsection we analyze how the overhead changes when the task execution times are scaled down. All overheads are measured relative to the total simulation duration (i.e. relative overhead = absolute overhead / simulation duration).

4.3.1 The Impact of the Number of Cores

The total overhead for different numbers of cores is depicted in Figure 4.1. For the overhead measurements in these two figures, the normalized utilization (i.e. the average utilization per core) was fixed to 0.4 and the overrun probability to 40%. Both graphs show that the total overhead of the FTTS and pEDF-VD schedulers clearly increases for simulations on more cores.

For the FTTS, the main components of the overhead are the subframe initialization, the task handling (job arrival and finishing) and the barrier synchronization. The subframe initialization has the highest overhead because all the scheduling decisions, job execution and overrun checks are performed there (compare Section 3.3). All mentioned overheads increase with the number of cores. The subframe and job handling overheads increase because the normalized utilization is fixed and the number of tasks therefore grows with the same factor as the number of cores, which causes more overhead for handling these tasks. For the subframe initialization, part of the additional overhead comes from the fact that the scheduler needs to start and stop more threads on the additional cores. The barrier synchronization overhead is, with an average of 17 µs for two cores, even higher than in our test measurements. This might come from the fact that the threads not only synchronize between cores, but also between the two processors. However, the synchronization overhead is still one of the lower overhead components, and it only increases to an average value of 30 µs for 16 cores. Therefore we expect that the synchronization overhead is not the major challenge of this scheduling algorithm.


The other FTTS overhead components do not depend on the number of cores, because they are either independent of the number of cores (frame initialization) or only needed when an FTTS subframe overruns (subframe overrun), which should not happen when the task set is schedulable. To prevent the scheduler from overrunning the subframe, we augmented the estimated WCBSTs by 10% to account for the system overhead; otherwise, the subframe would always overrun at the slightest overhead. Indeed, we only registered subframe overruns during simulations on 16 cores, where the overhead gets too high¹.

The main overhead contributors of the pEDF-VD scheduler are the handling of new and finished jobs. The highest and clearly increasing overhead component is the handling of new jobs, because the job queue needs to be updated with every new job that arrives, and the number of jobs grows with the number of cores. Interestingly, the overhead for removing finished jobs does not increase much for higher numbers of cores, and the general administration costs of the distributed schedulers even decrease when the number of cores increases. It is hard to say what causes this behavior, especially the decreasing cost of the general administration; further investigation of the pEDF-VD scheduler, and particularly of the single-core EDF-VD implementation described in [18], is needed.

4.3.2 The Impact of the Task Overrun Probability

The overhead for different overrun probabilities is depicted in Figure 4.2. The upper two figures show the results when simulating on 4 cores and the lower two when simulating on 8 cores. The normalized utilization is again kept at 0.4 for all four graphs.

The results for the pEDF-VD scheduler confirm the findings from [18] that the overhead is mostly independent of the overrun probability. The reason for this is that we have chosen an implementation of the single-core EDF-VD scheduler that was designed to have no additional overhead for switching between its low and high criticality modes, as described in Section 3.4.

For the FTTS we can see that the overhead decreases with higher overrun probabilities. This comes from the fact that the low criticality subframes are skipped with increasing probability when the overrun probability increases. If a subframe is skipped, most of the work in the subframe initialization does not need to be done, because all jobs of the subframe are dropped. This means that for a higher overrun probability fewer jobs are executed in total, which lowers the job finish overhead. For a skipped subframe there is also no need for synchronization, because the cores have already synchronized at the point where a subframe is skipped. The frame initialization and job arrival overheads stay constant for different overrun probabilities, because the number of arriving jobs and the number of FTTS frames are independent of the overrun probability.

When comparing the overheads of the simulations on 4 cores to the ones on 8 cores, we see exactly the same behavior for both scheduling algorithms; only the level of the overheads that are affected by the number of cores is higher when using 8 cores.

4.3.3 The Impact of the System Utilization

The behavior of the overhead for different system utilizations is shown in Figure 4.3. The number of cores is fixed to 4 for the upper two and to 8 for the lower two graphs. In these graphs the overrun probability is again fixed to 40%.

For the FTTS one can see that the subframe overhead increases with higher utilization. This is caused by the higher number of tasks, which causes higher administration costs at the start of each subframe. More tasks obviously also increase the overhead for handling new job arrivals and jobs that finished execution, which is clearly visible in the figures as well. The overhead for the barrier synchronization is more or less constant, with a slight decrease for very low utilizations, because the scheduler only needs to synchronize a subset of the cores when not every core is used to execute a job. We observe the same trends for simulations on 4 cores as for simulations on 8 cores, with higher overheads when simulating on 8 cores.

For the pEDF-VD scheduler, the overheads are highly dependent on the actual partitioning of the task set. The total overhead increases with the system utilization as well, but stays roughly constant for higher utilizations.

¹ The high overhead is actually caused by the greatly increased operating system overhead for 16 cores. This part of the overhead is not included in the graphs shown here. The causes for this high overhead are not known so far and require further investigation.


(a) Relative FTTS overhead for different overrun probabilities, when simulating on 4 cores.
(b) Relative pEDF-VD overhead for different overrun probabilities, when simulating on 4 cores.
(c) Relative FTTS overhead for different overrun probabilities, when simulating on 8 cores.
(d) Relative pEDF-VD overhead for different overrun probabilities, when simulating on 8 cores.

Figure 4.2: Relative scheduler overheads for different overrun probabilities. The normalized system utilization is set to 0.4 for all simulations.


(a) Relative FTTS overhead for different system utilizations, when simulating on 4 cores.
(b) Relative pEDF-VD overhead for different system utilizations, when simulating on 4 cores.
(c) Relative FTTS overhead for different system utilizations, when simulating on 8 cores.
(d) Relative pEDF-VD overhead for different system utilizations, when simulating on 8 cores.

Figure 4.3: Relative scheduler overheads for different system utilizations. The overrun probability is set to 40% for all simulations.


The problem with these higher utilizations is that only very few task sets were actually simulated: less than 50% of the task sets for normalized utilizations higher than 0.5 and 0.3 for 4 and 8 cores, respectively, and less than 20% for utilizations higher than 0.65 and 0.5 for 4 and 8 cores. This is because many task sets were declared unschedulable by the optimization that is needed for the FTTS, which is applied before the task sets are stored in the XML files used for the simulations. The described behavior is similar for simulations on 4 and on 8 cores; for 8 cores the mentioned flattening and decrease of the overhead occurs at lower utilizations already, because the number of schedulable task sets also decreases at lower utilizations. To evaluate the overhead of pEDF-VD in more detail, especially for higher utilizations, simulations without the preceding optimization and with more task sets are needed.

4.3.4 The Impact of the Task Execution Times

For the FTTS, the overhead mostly depends on the number of subframe initializations and the number of tasks. This means that if the task periods are divided by 10 and all other parameters are kept as they were (especially the per-task utilization), the number of subframes that need to be handled within the same simulation duration is multiplied by a factor of 10. Because the utilization remains the same as before scaling down the periods and WCETs, the average number of tasks also remains equal, which in turn means that the overhead per subframe remains unchanged as well. These facts lead to the result that the relative overhead of the FTTS is multiplied by the scaling factor of 10. That the simulation outputs for the scaled-down task sets show exactly this behavior can be clearly seen when comparing the resulting graphs in Figures 4.4 to 4.6 to the ones presented in the previous discussions (Figures 4.1 to 4.3). Therefore we can state that the absolute overhead of the FTTS highly depends on the periods and WCETs of the tasks.

For the pEDF-VD, the impact of reduced periods and task execution times is also considerable. However, the overheads shown in Figures 4.4 to 4.6 are not multiplied by the same high factor as for the FTTS when the task periods are scaled down. Because the overhead changes differently for each individual single-core EDF-VD scheduler, we cannot give a clear explanation of why the scheduling overhead changes like this. One possible explanation for the observed behavior is that the single-core EDF-VD schedulers switch to high criticality mode earlier, because the additional overhead causes more jobs to overrun their virtual deadlines. This in turn means that fewer tasks need to be handled, because only the high criticality ones are executed in high criticality mode. With the FTTS we do not have the problem of additional subframe overruns, because we already accounted for the overhead at design time. However, to find the actual causes, the behavior of the partitioned EDF-VD scheduler requires further analysis. At this point we can only state that the overhead of the pEDF-VD scheduler seems to scale better than that of the FTTS when the task periods and execution times are scaled down.

4.4 Summary

In this chapter the methods to measure the overhead of the different scheduler components were explained. After a short discussion of the task set generation and the simulation setup, the results of the analysis were presented. The overheads of the FTTS and pEDF-VD schedulers were analyzed for different numbers of cores, overrun probabilities and utilizations. The analysis of the same simulations with scaled-down task periods and execution times was then used to show the impact of the applied scaling.


(a) Relative FTTS overhead for different number of cores.
(b) Relative pEDF-VD overhead for different number of cores.

Figure 4.4: Relative scheduler overheads for different number of cores for the scaled-down task sets. The normalized utilization is set to 0.4 and the overrun probability to 40% for all simulations.


(a) Relative FTTS overhead for different overrun probabilities, when simulating on 4 cores.
(b) Relative pEDF-VD overhead for different overrun probabilities, when simulating on 4 cores.
(c) Relative FTTS overhead for different overrun probabilities, when simulating on 8 cores.
(d) Relative pEDF-VD overhead for different overrun probabilities, when simulating on 8 cores.

Figure 4.5: Relative scheduler overheads for different overrun probabilities for the scaled-down task sets. The normalized system utilization is set to 0.4 for all simulations.


(a) Relative FTTS overhead for different system utilizations, when simulating on 4 cores.
(b) Relative pEDF-VD overhead for different system utilizations, when simulating on 4 cores.
(c) Relative FTTS overhead for different system utilizations, when simulating on 8 cores.
(d) Relative pEDF-VD overhead for different system utilizations, when simulating on 8 cores.

Figure 4.6: Relative scheduler overheads for different system utilizations for the scaled-down task sets. The overrun probability is set to 40% for all simulations.


Chapter 5

Conclusion and Outlook

5.1 Conclusion

In this thesis a multi-core scheduling framework with mixed-criticality support was implemented, which extends the Hierarchical Scheduling Framework [13, 18] with all the mechanisms needed for multi-core deployment. Different implementation decisions were made and their advantages and disadvantages presented.

The two schedulers named "FTTS" [12] and "pEDF-VD" [8] were implemented to demonstrate the functionality of the framework and to evaluate their overhead on a real-world platform. The documentation of their implementation showed various problems and how they can be solved. These solutions might help to avoid similar problems when the framework is extended with additional schedulers.

In the overhead analysis of the schedulers we have seen that the overhead of the FTTS highly depends on the number of subframes that need to be executed. The number of subframes depends on the dimensioning of the FTTS cycle and frames, which in turn depends on the task periods. We have seen that the FTTS has an overhead below 0.15% for larger execution times (tens to hundreds of milliseconds) and that the overhead of up to 1.4% becomes critical when the tasks have shorter execution times and periods (a few milliseconds). Accounting for at most 20% of the overhead and scaling clearly sub-linearly with the number of cores and the utilization, the barrier synchronization is not the major challenge of the FTTS implementation.

For the pEDF-VD scheduler we have seen that it has a low overhead, i.e. below 0.025% for longer task periods. When simulating task sets with task periods of only a few milliseconds, the overhead is still below 0.25%. On the other hand, we also saw that the analysis of the scheduler is complicated because of its partitioned nature.

When comparing the results of the FTTS to pEDF-VD, we can state that the pEDF-VD scheduler has a much lower overhead than the FTTS in all simulated cases. However, the FTTS also has its advantages, such as isolation among tasks of different criticality levels.

5.2 Outlook

As already mentioned above, the Hierarchical Scheduling Framework now supports the implementation of further scheduling algorithms for multi-core and/or mixed-criticality systems. Besides the implementation of additional schedulers or different partitioning algorithms for pEDF-VD, the framework itself could be improved by closer integration with the operating system, for example by using the real-time patch for Linux [2], or by using an alternative barrier synchronization mechanism. This way the synchronization and operating system overheads might be reduced.

The currently implemented mixed-criticality schedulers are restricted to 2 criticality levels and could be extended to support n criticality levels. Because the framework is easily portable to other platforms, the overhead of the implemented schedulers could also be evaluated on different platforms to analyze how the overhead changes for different hardware architectures.


Appendix A

Hierarchical Scheduling Framework

A.1 Installation

For the configuration and installation of the Hierarchical Scheduling Framework, have a look at the README file. Its content is shown below; a short example session follows the listing.

Listing A.1: README.md file of the Hierarchical Scheduling Framework

Hierarchical Scheduling Framework {#mainpage}
=================================

Requirements
------------

HSF requires a *NIX kernel with standard libraries. To compile all sources, these packages are needed:

* g++ (>= 4.7)

* make

* octave

* php

If you want to use the initial support for the hwloc library for setting the processor affinities of the threads, the hwloc tools and library are also needed:

* hwloc

You can also directly link the sources without compiling and installing them. For more details see: <http://www.open-mpi.org/projects/hwloc/>

To compile hsf without hwloc support, check that `USE_LINUX_AFFINITY` is set to `1` in `src/pthread/Thread.cpp` and uncomment `HSFLIBS += -lhwloc` in the Makefile.

For generating the output figures with the `simfig` tool you need the following additional libraries:

* libmgl-dev (>= 2.0)

* libX11-dev

Mathgl library should be compiled, and libmgl.so.7.0.0 should be placed in `/usr/local/lib` (otherwise the MATHGL variable in the makefile should be changed to the appropriate location). If you follow the compile instructions of Mathgl this should happen automatically. For more information on Mathgl, please visit: <http://mathgl.sourceforge.net/>

For building the documentation of the Hierarchical Scheduling Framework you will also need a recent version of doxygen:

* doxygen (>= 1.8)

To build the documentation simply run `make doc` in the HSF directory. The documentation will then be generated in the `doc` directory.

Installation and Configuration
------------------------------

1. If your HSF folder is not located in `~/git`, then please change line 3 of `hsf_paths.sh` to the path of your HSF folder.

2. In the terminal, type:

source hsf_paths.sh

This will set a new `$HSF` variable, and add it to your `$PATH` variable. You can also add it to your `~/.bashrc` file, to have it load automatically.

3. Privileges for executing with real-time priorities

To execute hsf the user needs the privileges to switch the application's scheduling policy to real-time priority based scheduling and to execute it with a real-time priority. You can do this in two different ways:

1. (RECOMMENDED) Allow the user to execute applications with real-time priority using the limits.conf configuration file. To allow the user with name 'user' to execute its applications with real-time priority, simply add the file `/etc/security/limits.d/50-rtprio.conf` with the following two lines (replace 'user' with your username):

user - rtprio unlimited
user - nice -20

To apply these changes a reboot of your machine is required. Please note that this user can execute any application with real-time priority. More details about this can be found in the manpage of limits.conf.

2. (_NOT_ RECOMMENDED) To execute hsf with real-time priority the command could simply be executed with root privileges, for example by using `sudo`. However we do not recommend executing the framework as root. If you still want to use the sudo approach please note the following: on some older systems, you might have to add the following line to your bash profile in order to inherit your PATH variable when using 'sudo':

alias sudo='sudo env PATH=$PATH $@'

4. Then type:

./install.sh

5. Run HSF!

You can now type the following commands in your terminal:

hsf [filename(.xml)]
simulate [filename(.xml)]
calculate [metric] [filename]
show [metric] [filename]
simfig [filename]
publish [filename]

[metric] can be one (or more) of the following:

* exe|exec -> Execution Times

* resp -> Response Times

* throughput -> Throughput

* util -> Utilization

* alloc -> Resource allocation costs

* sys -> System allocation costs

* worker -> Worker costs

* missed -> Missed deadlines
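For illustration, a session could look as follows; the input file name is hypothetical, and it is assumed here that calculate and show are given the same base name to locate the simulation output:

simulate examples/ftts_example.xml
calculate exec resp ftts_example
show exec ftts_example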


A.2 Project Structure

The project is organized in 8 folders that are described in the following table:

Folder name   Content
bin           Binary files built from source and executable bash scripts
doc           Documentation generated from source code
docu          Library documentations and theses that describe the framework
examples      Sample XML files for the different schedulers
lib           External libraries, includes pugixml and CImg library sources
obj           Object files for incremental builds are stored in this folder
scripts       Additional scripts and sources, not required for normal simulation
src           Source files for the framework

A.3 Simulation and Analysis Scripts

The new tools and scripts used for the simulations are briefly described here. For the tools that already existed for other simulations, please refer to [13] for the standard simulation tools or to [18] for the single-core mixed-criticality specific simulation tools.

analyze_detail.m  This MATLAB script analyzes the output files and plots the graphs presented in this thesis. The file contains 5 sections: the first is for global definitions, such as input files and settings for plotting. The second section imports all overhead measurements of the different simulations. The other three sections are then used to plot the total overhead against the number of cores, the overrun probability or the system utilization. The first two sections always need to be executed first; the last three can then be used independently. For more details, please refer directly to the comments in the script.

parseMultiCore  This tool is used by the simulateMultiCore.sh script to extract a single task set from the XML file that stores all task sets for the simulations. The parameters of the task set can be changed with command line arguments before it is stored in an XML file that can be used as simulation input for the framework.
This tool is called directly by the simulateMultiCore.sh script and usually is not invoked by hand. A list of all command line options and return values is given when the tool is called with the -help option. The output of this command is shown below, followed by an example invocation:

Usage: parseMultiCore [options] <input file> [<output file>]
options:
  none          Parse first simulation in input file as is
  -c            Print number of schedulable configurations in the file
  -d <seconds>  Set the simulation duration in seconds
  -n <index>    Set index of the simulation to parse
  -p <value>    Set overrun probability to given percent
  -s <name>     Set the given scheduler name
  -t            Print the total number of configurations (including unschedulable ones)
  -u            Print the utilization of the selected task set (selected using -n <index>)
The <output file> is necessary for all options except -c, -t and -u. All indexes start counting at 1 for easier scripting.
Type 'parseMultiCore --help' to print this message.
Exit codes: 0 no error, 1 general error, 2 non schedulable simulation selected.
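As an example invocation (the file names and the scheduler name string are hypothetical), the third configuration of a task-set file could be extracted with the FTTS scheduler and a 4-second simulation duration as follows:

parseMultiCore -n 3 -s FTTS -d 4 tasksets.xml taskset_3.xml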

showrt.sh  This script shows a list of all currently running real-time tasks with their priorities, to see which tasks could interfere with the simulations. It does not take any arguments.
Command line: showrt.sh


simulateMultiCore.sh  This script is used to simulate multiple task sets for different schedulers and overrun probabilities. The first argument is the filename of the XML file that contains all task sets to simulate. The optional second argument specifies the index of the task set that is simulated first; this allows resuming the simulations at a given task set if the simulation of all task sets was interrupted.
Command line: simulateMultiCore.sh tasksets.xml [startIndex]

A.4 Visualization Output

Figure A.2 shows an example output generated with the simfig tool of the Hierarchical Scheduling Framework (in this case a simulation of an FTTS example). This example shows all the different indicators that are used in the visualizations. In Figure A.1 all the different indicators are shown separately. Their meaning is described in the following list:


Figure A.1: Legend for all indicators used for the visualization

a) Job arrival

b) Job finish, before the deadline

c) Job finish, after the deadline

d) Synchronization point across all cores (multi-core specific)

e) FTTS: cancellation of the next subframe

f) FTTS: barrier overrun in current subframe

g) A job is being executed during the colored interval.


Figure A.2: Sample visualization output of the Hierarchical Scheduling Framework generated by simfig.


Appendix B

Original Project Assignment

The original project assignment is depicted on the following 5 pages.


Institut für Technische Informatik und Kommunikationsnetze

Semester Thesis at the Department of Information Technology and

Electrical Engineering

for

Lukas Sigrist

Implementation and Evaluation of

Mixed-Criticality Scheduling Algorithms for

Multi-core Systems

Advisors: Georgia Giannopoulou, Pengcheng Huang

Professor: Prof. Dr. Lothar Thiele

Handout Date: 25.09.2013
Due Date: 08.01.2014


1 Project Description

Complex embedded systems are typically mixed-criticality systems, where functionalities of different importances (criticalities) co-exist. As an example, the airborne software systems are usually categorized as flight critical or mission critical. For flight critical functionalities, like the autopilot, a failure could result in an airplane crash, whereas for the mission critical functionalities, like the radio communication, the failure consequences are not severe. On the other hand, various unexpected situations may happen during the operation of the airplane, since neither the hardware nor the software that we build are perfect. How should a software system react to those unexpected situations depending on its criticality? And what should/can we guarantee in such a dynamic system? Smart online adaptation to the revealed scenarios is necessary to answer those questions.

Recently, in our group, we have developed a hierarchical scheduling framework (HSF), which runs in the user space of the Linux OS (on top of the Linux scheduler, as shown in Figure 1) [5, 7]. HSF features several classical scheduling policies as well as policies targeting especially mixed-criticality systems. All implemented policies address single-core systems. The objective of this thesis is to extend HSF with mixed-criticality scheduling policies for multi-cores.

Real-time scheduling in multi-cores can be achieved in a global or partitioned manner, with both approaches facing their individual challenges. Such challenges include designing a low-overhead global scheduler or finding an appropriate partitioning of the real-time tasks to the processing cores, respectively. The multi-core scheduling problem becomes even more complex if one considers the communication among the tasks and their interference (contention) when accessing shared resources, e.g., a shared memory. To this end, we have developed a mixed-criticality scheduling algorithm for resource-sharing multi-core systems [4]. To evaluate the applicability of the algorithm to real-world applications, it is essential to evaluate its runtime overhead on a real platform. Additionally, we would like to evaluate its efficiency compared to other scheduling policies, partitioned and/or global, which have been suggested in the literature in the last years.

Figure 1: Hierarchical Scheduling Framework

2 Project Goals

The goal of this semester thesis is to implement and evaluate the runtime behavior of the Time-Triggered and Synchronization-based (TTS) scheduling algorithm for mixed-criticality multi-core systems [4]. The student needs to (i) implement the TTS algorithm in the HSF framework and (ii) specify appropriate metrics to quantify its performance and overhead. The proposed metrics can be used to (iii) compare TTS to other scheduling policies for multi-core systems, e.g., GLOBAL [6] and/or a partitioned extension of EDF-VD [3] for multi-cores.

For this purpose, some new features will need to be implemented in HSF. For instance, time-triggered and barrier-based synchronization among the cores is necessary. Partitioning of tasks to processing cores must be added as a pre-processing step for the partitioned scheduling algorithms. Also, special attention needs to be given to thread allocation to the available cores, given that HSF is implemented as a multi-threaded environment based on the POSIX standard [2].

Depending on the progress of this project, the student may also (iv) explore the possibility of extending existing mixed-criticality scheduling algorithms for single-core to multi-core systems, by selecting appropriate partitioning methods.

3 Tasks

The project will be split up into several subtasks, as described below:

3.1 Familiarization with HSF

At the beginning of the project, the focus is on getting acquainted with the HSF environment, namely the mechanisms that enable hierarchical scheduling, runtime monitoring and feedback control, the utilization of POSIX threads [2], etc. A manual of the HSF framework is given in [5], while the HSF features that enable the implementation of mixed-criticality schedulers are documented in [7]. All the necessary resources for this part of the project will be made available as soon as the project starts. At the end of this project phase, it should be clear which parts of HSF need to be adapted for its extension to multi-core environments.

3.2 Implementation of Schedulers

In the second phase of the project, the TTS scheduling algorithm will be implemented within HSF. Depending on the available time, the following two algorithms may also be integrated into HSF: (i) GLOBAL [6] and (ii) a partitioned extension of EDF-VD [3] (EDF-VD is already part of HSF). The XML specification of the task sets and the logging mechanisms (simulation output) will need to be updated to support the new schedulers. The existing overhead calculation mechanism will need to be evaluated and, if necessary, enhanced. Alternative performance metrics can also be suggested and implemented. At the end of this phase, quantitative measures on the performance of the implemented schedulers will need to be presented.

3.3 Thesis Report and Final Presentation

Finally, a thesis report is written that covers all aspects of the project. In addition, the final presentation has to be prepared.

4 Project Organization

4.1 Weekly Meeting

There will be a weekly meeting to discuss the project's progress based on a schedule defined at the beginning of the project. A short report concerning each week's progress (accomplished goals, difficulties, questions, next goals) should be provided at the latest the day before the meeting.

4.2 Semester Thesis Report

One hard copy of the report needs to be turned in at the end of the thesis. The copy remains property of the Computer Engineering and Networks Laboratory. A copy of the developed software needs to be handed in on CD or DVD along with the thesis report.

4.3 Initial and Final Presentation

In the first month of the project, the topic of the thesis will be presented in a short presentation during the group meeting of the Computer Engineering Lab. The duration of the talk is limited to five minutes. At the end of the project, the outcome of the thesis will be presented in a 15-minute talk, again during the group meeting of the Computer Engineering Lab.

4.4 Work Environment

The work will be carried out in the framework of the European CERTAINTY [1] project, to which the Computer Engineering Lab is contributing in terms of scheduling techniques, performance analysis and multi-core mapping optimization. Concretely, this means that the results of this work can be used by the involved project partners if the project goals are met.

References

[1] Certification of Real Time Applications designed for mixed criticality (Certainty). http://www.certainty-project.eu/.

[2] POSIX Threads Programming. https://computing.llnl.gov/tutorials/pthreads/.

[3] S. Baruah, V. Bonifaci, G. D'Angelo, H. Li, A. Marchetti-Spaccamela, S. Van der Ster, and L. Stougie. The preemptive uniprocessor scheduling of mixed-criticality implicit-deadline sporadic task systems. In ECRTS, pages 145–154, 2012.

[4] G. Giannopoulou, N. Stoimenov, P. Huang, and L. Thiele. Scheduling of mixed-criticality applications on resource-sharing multicore systems. In EMSOFT, 2013.

[5] A. Gomez. Hierarchical Scheduling Framework - A Programmer's Manual. 2013.

[6] H. Li and S. Baruah. Global mixed-criticality scheduling on multiprocessors. In ECRTS, pages 166–175, 2012.

[7] F. Wermelinger. Implementation and Evaluation of Mixed Criticality Scheduling Approaches. 2013.

Zurich, September, 2013


Appendix C

Presentation Slides

The slides of the presentation are depicted on the following pages.


Slide 1: Implementation and Evaluation of Mixed-Criticality Scheduling Algorithms for Multi-core Systems
- Semester thesis of Lukas Sigrist
- Supervisors: Georgia Giannopoulou, Pengcheng Huang

Slide 2: Mixed-Criticality System
- Assumptions
  - More pessimistic assumptions for more critical tasks
- Consequences
  - Over-provisioned system
  - Unused resources
- What to do?
  - Dynamically guarantee deadline
  - Stop/degrade less critical tasks to guarantee critical task deadlines


Slide 3: Motivation
- Different proposals for multi-core mixed criticality scheduling algorithms
- Use framework for simulation
  - Scheduler overhead
  - How overhead scales
- Only few frameworks to simulate such algorithms [1]

[1] M. S. Mollison and J. H. Anderson. Bringing Theory Into Practice: A Userspace Library for Multicore Real-Time Scheduling. 2013.

Slide 4: Contributions
- Multi-core scheduling framework
  - Easy to extend
  - Portable
- Multi-core mixed criticality scheduler implementations
  - Flexible Time Triggered Scheduler (FTTS) [2]
  - Partitioned EDF-VD (pEDF-VD) [3]
- Overhead calculation of the schedulers

[2] G. Giannopoulou, N. Stoimenov, P. Huang, and L. Thiele. Scheduling of Mixed-Criticality Applications on Resource-Sharing Multicore Systems. 2013.
[3] S. Baruah, B. Chattopadhyay, H. Li, and I. Shin. Mixed-criticality scheduling on multiprocessors. 2013.


Slide 5: Outline
- Introduction
- Contributions
- Hierarchical Scheduling Framework and Extensions
- Flexible Time Triggered Scheduler
- Partitioned EDF-VD Scheduler
- Conclusion

Slide 6: Outline (same items as Slide 5)


Slide 7: Hierarchical Scheduling Framework (HSF)
(Diagram of the framework layers:)
- Hierarchical Scheduling Framework [4]: user space scheduling framework; portable, light weight, extensible; EDF, TDMA, FP, RM, FIFO, hierarchies
- Mixed Criticality Extension [5]: mixed criticality support for single-cores; EDF-VD, MCRF
- Multi-core Extension (new): multi-core support, mixed criticality; FTTS, pEDF-VD

[4] A. Gomez. Hierarchical Scheduling Framework - A Programmer's Manual. 2013.
[5] F. Wermelinger. Implementation and Evaluation of Mixed Criticality Scheduling Approaches. 2013.

Slide 8: HSF Multi-core Extensions
- Processor affinity
- Inter-core synchronization
- Time measurement
- Output visualization


Slide 9: Outline (same items as Slide 5)

Slide 10: Flexible Time Triggered Scheduler (FTTS) [2]
- Repeats over a cycle
- Consists of fixed length frames
- Fixed frames are subdivided again
  - Subframe for each criticality level
- Subframe start and length dynamic
  - Depend on task execution times

[2] G. Giannopoulou, N. Stoimenov, P. Huang, and L. Thiele. Scheduling of Mixed-Criticality Applications on Resource-Sharing Multicore Systems. 2013.


Slide 11: FTTS: Normal Operation

Slide 12: FTTS: Barrier Overrun


Slide 13: FTTS Implementation Choices
- Scheduler configurable by XML input
- Barrier synchronization: POSIX pthread_barrier for portability
- Global timing: OS clock for portability
- Task priorities: dynamic priority switching necessary
- Global scheduler vs distributed: go with global implementation

Slide 14: Experiments for Overhead Evaluation
- Simulations
  - 50 task sets for each utilization
  - System utilization from 0.2 to 7.6
  - Probability of overrun for high critical tasks: 40%
  - 2 to 16 cores
  - Simulation duration: 4 seconds
- TiK server
  - 2x 8-core Intel Xeon E5-2690 processors, 2.90 GHz
  - Standard Linux 3.7 kernel with preemption


Slide 15: FTTS Overhead vs Number of Cores
- Fixed parameters: norm. utilization 0.4, overrun prob. 40%

Slide 16: FTTS Overhead vs Utilization
- Fixed parameters: cores 8, overrun prob. 40%


Slide 17: FTTS Overhead vs Task Overrun Probability
- Fixed parameters: cores 8, norm. utilization 0.4

Slide 18: Outline (same items as Slide 5)


Slide 19: EDF with Virtual Deadlines (EDF-VD) [6]
- One EDF-VD scheduler per core
- Exists in the framework since the mixed criticality extension
- Two different execution modes (HI, LO)
- Cancel low critical task when execution time exceeds lower WCET

[6] S. Baruah et al. The preemptive uniprocessor scheduling of mixed-criticality implicit-deadline sporadic task systems. 2012.

Slide 20: Partitioned EDF-VD (pEDF-VD) [3]
- EDF-VD extension to multi-core
- Single-core EDF-VD per core, run independently
- Offline partitioning
  - First bin packing algorithm as proposed in [3]
(Diagram showing multiple EDF-VD instances)

[3] S. Baruah, B. Chattopadhyay, H. Li, and I. Shin. Mixed-criticality scheduling on multiprocessors. 2013.


Slide 21: Experiments for Overhead Evaluation
- Simulations
  - 200 task sets for each utilization
  - System utilization from 0.2 to 7.6
  - Probability of overrun for high critical tasks: 40%
  - 2 to 16 cores
  - Simulation duration: 4 seconds
- TiK server
  - 2x 8-core Intel Xeon E5-2690 processors, 2.90 GHz
  - Standard Linux 3.7 kernel with preemption

Slide 22: pEDF-VD Overhead vs Utilization
- Fixed parameters: cores 8, overrun prob. 40%


Slide 23: Outline (same items as Slide 5)

Slide 24: Conclusion
- Multi-core mixed criticality scheduling framework
- Two schedulers implemented
  - Flexible Time Triggered Scheduler
  - Partitioned EDF-VD
- Scheduler overheads below 1%
  - Despite task monitoring
  - Despite runtime decisions
  - Despite inter-core synchronization


Bibliography

[1] Portable Hardware Locality (hwloc). http://www.open-mpi.org/projects/hwloc/.

[2] Real-Time Linux Wiki. https://rt.wiki.kernel.org/.

[3] Valgrind. http://valgrind.org/.

[4] Standard for Information Technology - Portable Operating System Interface (POSIX(R)) Base Specifications, Issue 7. IEEE Std 1003.1, 2013 Edition (incorporates IEEE Std 1003.1-2008, and IEEE Std 1003.1-2008/Cor 1-2013), pages 1–3906, 2013.

[5] J. H. Anderson, S. K. Baruah, and B. B. Brandenburg. Multicore operating-system support for mixed criticality. In Proceedings of the Workshop on Mixed Criticality: Roadmap to Evolving UAV Certification. Citeseer, 2009.

[6] B. Barney. POSIX Threads Programming. https://computing.llnl.gov/tutorials/pthreads/.

[7] S. Baruah, V. Bonifaci, G. D'Angelo, H. Li, A. Marchetti-Spaccamela, S. Van Der Ster, and L. Stougie. The preemptive uniprocessor scheduling of mixed-criticality implicit-deadline sporadic task systems. In ECRTS, pages 145–154. IEEE, 2012.

[8] S. Baruah, B. Chattopadhyay, H. Li, and I. Shin. Mixed-criticality scheduling on multiprocessors. Real-Time Systems, pages 1–36, 2013.

[9] B. B. Brandenburg, A. D. Block, J. M. Calandrino, U. Devi, H. Leontyev, and J. H. Anderson. LITMUS^RT: A status report. In Proceedings of the 9th Real-Time Linux Workshop, pages 107–123, 2007.

[10] F. Broquedis, J. Clet-Ortega, S. Moreaud, N. Furmento, B. Goglin, G. Mercier, S. Thibault, and R. Namyst. hwloc: A generic framework for managing hardware affinities in HPC applications. In Parallel, Distributed and Network-Based Processing (PDP), 2010 18th Euromicro International Conference on, pages 180–186. IEEE, 2010.

[11] F. Franchetti. Fast Barrier for x86 Platforms. http://www.spiral.net/software/barrier.html.

[12] G. Giannopoulou, N. Stoimenov, P. Huang, and L. Thiele. Scheduling of mixed-criticality applications on resource-sharing multicore systems. In EMSOFT, pages 1–15. IEEE, 2013.

[13] A. Gómez. Hierarchical Scheduling Framework - A Programmer's Manual. Unpublished article, 2013.

[14] O. R. Kelly, H. Aydin, and B. Zhao. On partitioned scheduling of fixed-priority mixed-criticality task sets. In Trust, Security and Privacy in Computing and Communications (TrustCom), 2011 IEEE 10th International Conference on, pages 1051–1059. IEEE, 2011.

[15] H. Li and S. Baruah. Global mixed-criticality scheduling on multiprocessors. In ECRTS, pages 166–175. IEEE, 2012.

[16] M. S. Mollison and J. H. Anderson. Bringing Theory Into Practice: A Userspace Library for Multicore Real-Time Scheduling. In RTAS, pages 283–292. IEEE, 2013.


[17] R. M. Pathan. Schedulability analysis of mixed-criticality systems on multiprocessors. In Real-Time Systems (ECRTS), 2012 24th Euromicro Conference on, pages 309–320. IEEE, 2012.

[18] F. Wermelinger. Implementation and Evaluation of Mixed Criticality Scheduling Approaches, 2013.