
Eötvös Loránd University
Faculty of Informatics

Telemetry data for machine learning based scheduling

Alejandro González
Data Science

Péter Kiss
PhD Student at ELTE

Martín Molina
Professor at UPM

Benedek Kovacs
Head of Technology and Innovation at Ericsson

Budapest, 2020


STATEMENT OF THESIS SUBMISSION AND ORIGINALITY

I hereby confirm the submission of the Master Thesis Work on the Computer Science MSc course with author and title:

Name of Student: Alejandro Gonzalez Code of Student: H4Z8R6

Title of Thesis: Telemetry data for machine learning based scheduling

Supervisor: Péter Kiss

at Eötvös Loránd University, Faculty of Informatics.

In consciousness of my full legal and disciplinary responsibility I hereby claim that the submitted thesis work is my own original intellectual product, and that the use of referenced literature is done according to the general rules of copyright. I understand that in the case of thesis works the following acts are considered plagiarism:

  • literal quotation without quotation marks and reference;
  • citation of content without reference;
  • presenting others' published thoughts as own thoughts.

Budapest,

student


Abstract

The amount of data generated by computing clusters is very large, including node resource data and application-related data, among others. However, current systems do not exploit the full potential that this data can offer. This thesis attempts to put cluster telemetry data to use for two different purposes: scheduling and workload estimation. Motivated by the latest advancements in the machine learning field, a Deep Reinforcement Learning (DRL) based scheduler is proposed. Two different scheduling experiments are performed in a simulated cluster environment. The results show that the DRL based scheduler can be trained on specific cluster architectures to optimize performance parameters, such as job completion time, thereby obtaining the best scheduling policy compared to traditional scheduling heuristics. In addition, Long Short-Term Memory (LSTM) neural networks are proposed to estimate the workload in computing clusters. Hence, an experiment using LSTM to forecast cluster resource usage was implemented. The results of the experiment reveal that telemetry data from the past can be successfully used to predict the future workload of the system. Furthermore, the results show that LSTM neural networks can be used to anticipate system failures. Finally, a combination of DRL based scheduling and workload estimation is proposed as a future line of research.


Acknowledgements

I would like to express my gratitude to my academic supervisor Péter Kiss for his involvement, advice and support. I would also like to thank my industrial supervisor Dr. Benedek Kovacs for his support and the resources provided by him and his team. My appreciation also goes to my co-supervisors Anna Reale and Michael Chima for their guidance and the valuable advice that helped me to complete the thesis. I would also like to thank Professor Martín Molina from UPM for supporting this thesis.

I would also like to thank Ericsson for giving me the opportunity to carry out my thesis with them. In addition, I would like to express gratitude to ELTE for hosting me during my final year of studies. Finally, I would like to dedicate some words to my unforgettable flatmates Mike, Jose and Dani: your daily support made this work easier, and you will always be in my memories.


Dedication

This thesis is dedicated to my parents and my brother, who have always supported me in all my decisions and have motivated me to pursue my goals. I would also like to dedicate this thesis to my girlfriend Alba; your relentless support and affection have been of immeasurable value during this period.


Contents

1 Introduction
  1.1 Context
  1.2 Scope
  1.3 Structure

2 Background
  2.1 Microservices
  2.2 Distributed computing
    2.2.1 Cloud and edge systems
  2.3 Orchestrators and schedulers
  2.4 Related work

3 Method
  3.1 Reinforcement learning foundation
    3.1.1 Reinforcement learning
    3.1.2 Deep reinforcement learning
    3.1.3 Deep reinforcement learning in scheduling

4 Affinity based scheduling
  4.1 Goal
  4.2 Implementation
    4.2.1 Simulated cluster environment
    4.2.2 Objective
    4.2.3 State
    4.2.4 Action space
    4.2.5 Reward function
    4.2.6 Agent: DRL scheduler
  4.3 Results

5 Distance based scheduling for multi-module applications
  5.1 Goal
  5.2 Implementation
    5.2.1 Simulated cluster environment
    5.2.2 Objective
    5.2.3 State
    5.2.4 Action space
    5.2.5 Reward function
    5.2.6 Agent: DRL scheduler
  5.3 Results

6 Time series workload estimation
  6.1 Goal
  6.2 Implementation
    6.2.1 Simulated cluster environment
    6.2.2 Data
    6.2.3 Forecasting model
  6.3 Results
    6.3.1 Univariate results
    6.3.2 Multivariate results

7 Discussion
  7.1 Affinity based scheduling
  7.2 Distance based scheduling
  7.3 DRL based scheduling
  7.4 Workload estimation
  7.5 Future work

8 Conclusion


List of Figures

3.1 Schema and elements of reinforcement learning
3.2 Deep reinforcement learning schema with DNN as an agent
4.1 Schema of affinity based scheduling experiment
4.2 Difference of memory usage between nodes for DRL and LB schedulers
4.3 Average job duration evolution during the training phase in affinity based scheduling
4.4 Average reward evolution during the training phase in affinity based scheduling
4.5 Average job duration of DRL and LB schedulers for the test jobset in affinity based scheduling
5.1 Schema of distance based scheduling simulation
5.2 Average job duration evolution during the training phase in distance based scheduling
5.3 Average reward evolution during the training phase in distance based scheduling
5.4 Average job duration of DRL and LB schedulers for the test jobset in distance based scheduling
6.1 Simulated cluster architecture for workload prediction
6.2 Segment of training data
6.3 Univariate memory usage prediction in Node 3
6.4 Univariate high memory usage peak prediction in Node 3
6.5 Multivariate high memory usage peak prediction in Node 3
6.6 Multivariate average workload prediction in Node 3


List of Tables

4.1 Example of observation S_t in affinity based scheduling experiment
4.2 Action space in affinity based scheduling experiment
5.1 Location parameters of the nodes
5.2 Example of observation S_t in distance based scheduling experiment
5.3 Action space in distance based scheduling experiment
6.1 Sample of univariate time series memory usage dataset
6.2 Sample of multivariate time series memory usage dataset


Chapter 1

Introduction

1.1 Context

Nowadays, most industries, such as information technology or automation, rely upon software systems to operate. Software systems have improved and facilitated many tasks. However, they also involve some risks. In 2018, 3.7 billion people were affected by software and system failures, which caused $1.7 trillion in asset losses [1]. Some breakdowns are unpredictable and very difficult to deal with. Nonetheless, a lot of software failures can actually be prevented. Many services work in real time and thus need to scale dynamically according to the current state of the system and the demand of the users. These types of systems are especially susceptible to breakdowns since they are usually distributed architectures composed of multiple units.

Service-Oriented Architecture (SOA) emerged with the goal of providing distributed services to different application modules [2]. Later on, the offspring of SOA, microservices, appeared with the goal of deploying distributed systems throughout a network composed of different remote nodes. In this context, the need arose for a mechanism capable of orchestrating distributed systems such as microservices.

In the field of distributed computing, this mechanism, often known as the "orchestrator", is responsible for the configuration, the management and the coordination of the different nodes and tasks involved in the system [3]. Recently, with the appearance of edge and cloud computing, orchestrator systems are facing new challenges. Cloud, and often edge, architectures imply that nodes belonging to the same system are located far from each other. For some applications, response time, latency and other parameters are essential. Hence, orchestrators are required to be more effective by avoiding decisions that may lead to system breakdowns. Furthermore, orchestrators need to respond quickly and to manage a large amount of data in real time.

Within an orchestrating system, the scheduler has become very relevant for the optimization and balancing of distributed systems. The scheduler is in charge of allocating tasks across the different nodes of the computing cluster. Scheduling may have different goals, such as improving performance, meeting job deadlines, preventing system breakdowns or improving response time, among others [4]. As a consequence, schedulers assign jobs to nodes given the amount of available resources, such as memory or Central Processing Unit (CPU). Nonetheless, network and location parameters, such as latency or distance between nodes, have emerged as key factors for the scheduler to take into account.

In the last few years there has been a clear trend in the Information and Communications Technology (ICT) domain towards the automation of services. Machine Learning (ML) techniques are among the most innovative and impactful technologies, and their applications have reached a wide variety of domains, such as machine vision, robotics and business analytics, among others. As a consequence, there is a great opportunity to implement automated systems with the use of ML methods, i.e., systems that are self-manageable with very little or no human intervention. Therefore, ML represents a tool of great potential for computing cluster optimization. In fact, ML algorithms are already being used to avoid system failures and to analyze system behaviour, among other tasks.

Multiple industries are currently using distributed computing architectures and orchestrators to manage their systems. A clear example is container orchestrators, of which Kubernetes is the most representative. Some applications implementing this type of orchestrator are embedded systems for control lines in industry, public cloud providers, such as Amazon Web Services or Google Cloud Platform, and other frameworks, such as Docker or VMware [5]. All of these systems and the applications that rely on them may benefit from the optimization of the scheduling process. This optimization may be based on the particular requirements of each application. Moreover, predictive maintenance solutions may help prevent system downtime and therefore cut industry losses.


1.2 Scope

The goal of this thesis is to assess the benefits of using cluster telemetry data in distributed systems optimization tasks. This objective may be shaped as a methodology to tackle distributed systems related challenges, such as system down-time avoidance. Besides, this work attempts to make use of the data generated by distributed systems to propose new techniques to enhance system optimization based on specific application requirements. The thesis objective is to address the research questions presented in the next paragraphs.

As discussed previously, the scheduler is the component within orchestrators that is in charge of allocating or scheduling arriving jobs across the cluster nodes, i.e., it decides to which node each arriving job should be allocated. Therefore, enabling the scheduler to receive telemetry data from the different components of the cluster might result in a better performance of the scheduler and, hence, of the whole system. Furthermore, as the operation of the scheduler consists of a decision process, it may be valuable to use telemetry data as an input to an ML model that analyzes it and outputs scheduling decisions. Thus, two research questions arise related to this use of telemetry data. Is it beneficial to use telemetry data as an input that influences the scheduling decisions? Can ML techniques be successfully used in the scheduling process?

Although using telemetry data to implement an optimal scheduling system is interesting, it is worth considering additional opportunities in which using telemetry data may prove beneficial to optimize the functioning of a distributed system. In that way, the second approach focuses on predicting the workload of the system. Being able to estimate the expected workload of a system provides many advantages, such as optimal resource provisioning, which relates this second approach to the first research question. The workload estimation of the system may be a significant input for the scheduler to take allocation decisions depending on the expected workload. On the other hand, failure avoidance is a hot topic that can be very beneficial for avoiding losses by anticipating a system failure and taking the opportune actions. Hence, the research question that arises is: can telemetry data be used to estimate the workload and the failure model of a computing cluster? With this research question, the aim is to prove that telemetry data gathered from different cluster nodes or network links can be useful to predict the failure of a system component in advance.


1.3 Structure

The thesis is organized in several chapters, each with a different purpose. First, the background, chapter 2, includes the theoretical information to place the reader in the context of the thesis. Besides, it describes some topic-specific concepts, such as microservices, edge and cloud computing or schedulers. Finally, it includes a literature review of the works developed in the same area that attempt to address issues related to this thesis' research questions.

The method, chapter 3, presents the algorithms used in the experiments, such as reinforcement learning algorithms. Besides, this chapter attempts to provide a reasoning of why these machine learning techniques are suitable for the field of scheduling.

Chapters 4, 5 and 6 contain the affinity based scheduling, multi-module distance based scheduling and workload estimation experiments respectively. Every experiment comprises different sections describing the experiment: the goal, the implementation methodology and the results. The results of each experiment were placed right after the methodology and implementation; in that way, the results are closer to the description of each experiment and hence, from our point of view, easier to interpret than in a separate results chapter.

Then, the discussion, chapter 7, analyzes and discusses the outcome of the experiments. The relevance, the advantages and the disadvantages of the solutions proposed in the experiments in relation to the field of computing clusters and distributed systems are presented. Furthermore, based on the results and their analysis, the future lines of research in the topic are established. Finally, the conclusion, chapter 8, reviews the work done and the key findings of the thesis.


Chapter 2

Background

Throughout this chapter the theoretical basis of the thesis is presented. Going from a generic point of view to a more specific one, the concepts that are relevant for the thesis and the different domains for which the ideas discussed are meaningful are described. Besides, previous works and research carried out in the field are reviewed. This background lays the theoretical foundation of the thesis and highlights the needs and opportunities present in the field that can be approached through this thesis.

2.1 Microservices

Traditionally, IT applications have been developed in a monolithic fashion. However, during the last decade, the way of developing, deploying and operating applications has changed drastically. A monolithic application usually implies that all the dependencies, services and processes that an application implements and performs are contained in a single piece of software treated as a single unit. Therefore, monolithic applications are heavy and, as a consequence, they need to be deployed on powerful servers.

In the past, monolithic applications have revealed some weaknesses that strongly impact the deployment and the operation of this type of application. For instance, when an application is not able to handle a certain amount of work but the demand keeps increasing, a scaling of the application is required. Given the fact that monolithic applications operate as a whole, it is not possible to separate the different services and processes they include. Thus, they need to scale as a single entity. There are two different methods to scale applications: vertical scaling, which implies providing more resources to the server where the application is deployed, and horizontal scaling, which consists in deploying replicas of the original application to additional servers so the load of the application is distributed.


Monolithic applications are not able to scale horizontally due to their heaviness. Besides, if any part of the application is not scalable at all, the whole application becomes unscalable.

Due to the previously mentioned challenges, in the last decade the trend has been to subdivide monolithic applications into smaller problems that can be approached independently from the rest of the application parts. These parts are often referred to as microservices. Such a shift in the development of applications has many implications. Although microservices can run on different machines independently, constant communication among the microservices that form an application is necessary. In contrast to monolithic applications, microservices based applications are able to scale horizontally, since only the relevant modules needed to scale the application have to be replicated [6].

Therefore, microservices based systems are usually composed of multiple elements that are constantly communicating with each other. Usually, this type of arrangement involves risks, where the fault of one element usually affects the performance of the whole system. In many cases unexpected behaviours and anomalies may appear, causing harmful consequences such as system down-time or system breakdowns.

2.2 Distributed computing

As a consequence of the evolution of monolithic applications into microservices, the architectures to host this type of application also shifted accordingly. Distributed computing architectures refer to systems made up of different machines connected to a common network that communicates and coordinates the machines; these systems are commonly named distributed systems.

As explained in the previous section, applications are subdivided into smaller modules called microservices. With the goal of providing reliability and scalability, sub-modules of the same application may be allocated on different machines. A group of interconnected machines is also known as a cluster; throughout this thesis this term will usually be used to denote this type of system.

Distributed systems are made up of two principal components that determine the nature of the system:

1. Nodes: The nodes are the different computational units that form the system. Each node has its specific resources, such as memory or computational capability. Multiple nodes together form a cluster of nodes with resources equal to the sum of the resources of all the nodes. The resources of a cluster are viewed as a whole and not as a set of separate resources per node. The nodes can be physically separated or they can be partitions of the same machine.

2. Network: The network is the set of links in charge of interconnecting the different nodes of the cluster. Links are virtual connections in the case that both nodes are located within the same machine, or physical in the case that the link interconnects physically separated nodes. Links are characterized by parameters such as bandwidth, latency or jitter.

2.2.1 Cloud and edge systems

The domain of distributed systems embraces different types of systems that can be differentiated from each other. The physical distance among nodes is a relevant characteristic of distributed systems. Therefore, they can be classified according to the physical arrangement of the nodes into cloud systems and edge systems.

In cloud systems the machines providing the computational resources are placed in a distant location. Usually a cloud system is located in a data center that is significantly remote from the user. Hence, for the interest of this work, it means that the communication between the user and the cloud system implies system latency and, in general, poor connection parameters. Cloud systems are suitable for applications that can run on remote nodes without the need for persistent communication with the user endpoint. However, cloud systems are not a good solution for applications that need high availability and fast response times.

As a consequence, the concept of edge systems or edge computing has appeared. Edge systems are systems where the computing resources are located relatively close to the user endpoint application. This type of topology provides better response times for applications that run in real time. Edge systems are usually needed in fields like industry, where applications require reliability and quick response times.

2.3 Orchestrators and schedulers

The level of complexity of distributed applications often rises quickly. Therefore, the need for an automated system capable of managing in real time both the application and the hardware resources has become essential. The orchestrator is the component in charge of the configuration, the management and the coordination of the different elements in a distributed system. It is used in a wide variety of systems, such as operating systems, virtual machines, etc. In this work the orchestrator is considered as the element that coordinates service-oriented architectures. More precisely, the orchestrator is in charge of determining the initial configuration of microservices, in which nodes the microservices are going to be allocated, the replication policies, etc. Orchestrators manage microservices based applications, but they also take care of managing hardware resources. Usually, an orchestrator is considered as the instance that manages a cluster of hardware units and resources where applications are deployed.

Orchestrators take care of multiple tasks. For instance, when the application is overloaded the orchestrator scales the application by creating replicas. Besides, the orchestrator constantly monitors the state of the cluster and of the application. Hence, in case of failures it reallocates modules and restarts the application or the hardware components.

For the sake of the work developed in this thesis, one of the focus points will be a specific component within orchestrators: the scheduler. A computing cluster is made up of multiple nodes, each of which is considered as a unit contributing resources to the whole cluster. The task of the scheduler is to assign each arriving job or application module to one of the nodes. The scheduler may make this decision based on several parameters, such as CPU or memory requirements. Therefore, the more precise and complete the parameters that the scheduler takes as input to make allocation decisions, the better the performance of the system will be. Moreover, in systems where jobs are constantly arriving at the cluster, the scheduler becomes a key element to avoid node overloading and to allocate jobs to the optimal node.

The task of the scheduler is indeed to schedule jobs; however, the policy followed to make scheduling decisions depends on the goal of the system or the application to be deployed. Common goals of scheduling optimization are reducing the average job completion time, minimizing the time jobs wait to be allocated or reducing the average job slowdown or delay. However, in a multi-node cluster there are not many scheduling techniques considering network parameters or affinities between jobs and nodes. Often jobs need to communicate among themselves or they have specific execution requirements; in this case it is important to consider affinities and network parameters such as latency, bandwidth or distance between nodes. Ultimately, an optimal scheduler would be able to adapt to different types of job requirements and therefore applications.

Finally, it is worth highlighting the two different types of scheduling tasks: offline scheduling and online scheduling. In offline scheduling all the jobs that the system has to allocate are already available to be scheduled. In contrast, online scheduling considers that jobs arrive at the scheduler in an online fashion. Hence, in online scheduling, the number of jobs available to be scheduled varies at each time step. The focus of this thesis is on online scheduling techniques.

2.4 Related work

The increasing demand for cloud and edge services has raised concern regarding how to optimize the utilization of resources. The ultimate goal is to increase system performance and minimize costs for users. As a consequence, a considerable amount of research is being carried out in the field. Some works attempt to propose scheduling techniques to optimize certain performance indicators or to match the requirements of specific applications. On the other hand, other studies aim to analyse the architecture of clusters to benefit the deployment of applications. The next paragraphs present a short review of related works in the field of scheduling and resource allocation optimization.

Multiple existing scheduling algorithms that were already being used in operating systems have now been applied in distributed computing and cloud systems. In Salot (2013) [7], a description of traditional scheduling algorithms is presented. First Come First Serve (FCFS) is one of the simplest but most widely used scheduling algorithms. The FCFS algorithm stores jobs in an ordered queue; then, the jobs are scheduled based on their order of arrival. FCFS is an easy to implement algorithm; however, for applications where real-time operation is essential its performance is poor, since the average job completion time is usually high.

Another well known scheduling algorithm is Shortest Job First (SJF). The SJF algorithm is a priority scheduling algorithm in which jobs waiting to be allocated are scheduled based on their burst time (duration). There exist two types of SJF: non-pre-emptive SJF and pre-emptive SJF. In the non-pre-emptive version the process priority does not matter; hence, when a short job arrives while a long job is being executed, the algorithm waits until the already running job finishes before scheduling the new job. On the other hand, pre-emptive SJF schedules the shortest job first among all the waiting jobs, but when a new job arrives that is shorter than the already running job, the scheduler stops the execution of the running job and allocates the shorter job. SJF is an optimal algorithm when most of the jobs to be scheduled are known in advance. Nonetheless, in real scenarios the information regarding incoming jobs is usually not known. Besides, in the case of pre-emptive SJF the action of stopping jobs to allocate new jobs may take some time; hence, it is not optimal for scenarios where jobs arrive frequently.
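As a hedged illustration of the non-pre-emptive variant (a minimal sketch under the description above, not code from [7]; the class and field names are assumptions), the selection rule can be written as a burst-time-ordered queue:

import heapq

class NonPreemptiveSJF:
    """Minimal non-pre-emptive Shortest Job First sketch.

    Jobs are (burst_time, arrival_order, job_id) tuples kept in a heap,
    so the waiting job with the smallest burst time is picked next.
    """

    def __init__(self):
        self._heap = []
        self._order = 0

    def submit(self, job_id, burst_time):
        # Arrival order breaks ties between equal burst times (FCFS fallback).
        heapq.heappush(self._heap, (burst_time, self._order, job_id))
        self._order += 1

    def next_job(self):
        # Called only when the node is idle: running jobs are never interrupted.
        if not self._heap:
            return None
        _, _, job_id = heapq.heappop(self._heap)
        return job_id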

The Round Robin algorithm, Rasmussen et al. (2008) [8], is based on FCFS but includes a significant extension. The Round Robin algorithm schedules jobs in order of arrival; nonetheless, each scheduled job is assigned a time slot or quantum, and whenever this quantum is over the job stops executing and waits until it is scheduled again. This process is repeated until all jobs finish executing in their corresponding assigned time slots or quanta.

Karthick et al. (2014) [9] propose a Multi Queue Scheduling (MQS) algorithm. The goal of MQS is to give equal importance to all the jobs regardless of their burst time. To that end, MQS creates three queues of waiting jobs: a small queue, a medium queue and a long queue. Jobs are stored in the different queues based on their burst time, so short jobs are stored in the small queue, average duration jobs in the medium queue and so forth. Then, the algorithm performs a dynamic selection of the jobs from each queue, that is, jobs are selected consecutively from each of the queues. In that way, jobs with different durations are scheduled with the same frequency. MQS is optimal when all the arriving jobs have the same importance and there are no optimization goals such as minimizing the average job duration, the average job waiting time or the scheduling delay.
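A rough sketch of the multi-queue selection described above might look as follows (the burst-time thresholds and names are assumptions for illustration, not values given in [9]):

from collections import deque
from itertools import cycle

class MultiQueueScheduler:
    """Sketch of MQS: three queues by burst time, served in rotation."""

    def __init__(self, short_limit=10, medium_limit=50):
        # Thresholds separating short / medium / long jobs (assumed values).
        self.short_limit = short_limit
        self.medium_limit = medium_limit
        self.queues = {"small": deque(), "medium": deque(), "long": deque()}
        self._rotation = cycle(["small", "medium", "long"])

    def submit(self, job_id, burst_time):
        if burst_time <= self.short_limit:
            self.queues["small"].append(job_id)
        elif burst_time <= self.medium_limit:
            self.queues["medium"].append(job_id)
        else:
            self.queues["long"].append(job_id)

    def next_job(self):
        # Pick consecutively from each queue so that every duration class
        # is scheduled with the same frequency.
        for _ in range(3):
            queue = self.queues[next(self._rotation)]
            if queue:
                return queue.popleft()
        return None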

Other works have tried to apply different techniques or even different approaches to optimize the allocation of resources. Zheng et al. (2011) [10] successfully implemented a scheduler based on a Parallel Genetic Algorithm (PGA). Their method achieved a better resource utilization rate than other algorithms such as Round Robin. Nevertheless, the focus of the optimization is not always on the scheduler. Malik et al. (2014) [11] propose a method to discover low latency groups of nodes. This method is capable of grouping nodes with a low inter-node latency. Hence, jobs that require communication with each other can be scheduled to the same group of nodes, resulting in faster communication. Nonetheless, this technique assumes at least partial knowledge of the inter-node latency. Thereby, this method is not feasible for clusters in which the inter-node latency is not known at all.


Recently, some works have used state-of-the-art machine learning algorithms to optimize the scheduling task. Specifically, the work developed by Mao et al. (2016) [12] proposes the DeepRM algorithm for online scheduling. DeepRM uses Deep Reinforcement Learning (DRL), a combination of Reinforcement Learning (RL) and Feed Forward Neural Networks (FFNN), to build the scheduler. The algorithm is provided with an optimization parameter such as minimizing the average job duration or the average job slowdown, which is defined as the ratio between the actual job duration and the expected job duration. DeepRM is able to learn the most optimal scheduling policy to achieve the best results in terms of average job completion time and average job slowdown. DeepRM outperforms traditional scheduling algorithms such as SJF or Tetris [13].

DeepRM has been followed by additional works that use a similar DRL foundation to extend the original algorithm. The extension DeepRM_2 [14] uses imitation learning, which leads to a faster convergence to the optimal scheduling policy compared to DeepRM. Besides, DeepRM_2 uses Convolutional Neural Networks (CNN) to improve the scheduling efficiency with respect to DeepRM. A later work based on DeepRM proposes DeepRMreshape [15], which is capable of scheduling jobs in a multi-node environment.

The previous works propose scheduling algorithms that are successful in the proposed scenarios. However, none of the discussed works propose the use of network telemetry data or affinity between jobs and nodes in the scheduling process. DeepRM and DeepRM_2 are successful in a single node scenario, outperforming traditional scheduling heuristics. On the other hand, DeepRMreshape is capable of successfully performing the scheduling task in a multi-node cluster. Nonetheless, it considers that the different nodes have similar resources and characteristics. Furthermore, it does not consider telemetry data or affinities as a differentiating factor, which in a real case scenario is a key element.

The results of DeepRM have shown a great opportunity to successfully apply novel ML techniques in the field of scheduling. Therefore, the approach followed in this work has a similar basis grounded in the use of DRL to implement the scheduling system. Furthermore, the proposed implementation aims to extend the use of DRL based scheduling to more realistic scenarios.

Hence, the objective is to evaluate the performance of DRL scheduling in scenarios where arriving jobs have affinities with specific nodes, and in multi-node clusters where telemetry data can be useful for making scheduling decisions for multi-module applications.


Chapter 3

Method

Firstly, this chapter introduces the reader to the concept of DRL and its different elements. Besides, the motivations for applying DRL techniques in the field of scheduling are presented. Then, the different experiments performed are presented. The first two experiments show how DRL may be used to address different challenges of scheduling in computing clusters. Finally, a third experiment presents the use of ML techniques for cluster workload estimation.

3.1 Reinforcement learning foundation

3.1.1 Reinforcement learning

Usually, RL has been considered the third major sub-field within ML, besides supervised learning and unsupervised learning. In supervised learning, labelled data is used to perform classification and prediction, whereas unsupervised learning attempts to group unlabelled data by finding relation patterns. Yet RL tackles a completely different task.

RL is the ML technique in which an agent learns by trial and error from experience. In other words, it is a sequential process in which, based on the environment state, the agent interacts with the environment by taking actions following a policy and, in turn, receiving a reward from the environment and transitioning to a new environment state, as shown in Figure 3.1. The agent's goal is to adjust its policy so that the received reward is maximized over time. In order to approach a problem with RL, it needs to satisfy the Markov property [16]. An RL problem satisfying the Markov property may be formulated as a Markov Decision Process (MDP), in which the conditional probability of future states of the environment only depends on the present state and on the action taken by the agent.


Figure 3.1: Schema and elements of reinforcement learning

An MDP can be finite or infinite. For the sake of this work we will consider the case of finite MDPs. A complete cycle of a finite MDP is called an episode. Each episode of an MDP is a sequential process in which each time-step is represented by the tuple {S, A, R}, where R is the reward returned by the environment for taking the action A following the policy π in the environment state S.

Another important concept is the state-value function V, which represents how good it is to be in a certain state S, based on the future received reward. Furthermore, the action-value function Q stands for how good it is to take a certain action A in a state S. But how are V and Q quantified? It is known that at each time-step the agent receives a reward R; hence, the value V of being in a state S can be defined as the cumulative sum of all the rewards received from this time-step onwards until the end of the episode, commonly named the cumulative reward. Similarly, the value of taking an action A in a state S can be estimated as the cumulative sum of all the rewards received from that time-step. However, not all the rewards may have the same importance: intuitively, states and state-action pairs have more influence on closer rewards. This is modelled with the cumulative discounted reward G_t, also known as the return,

G_t = R_{t+1} + γR_{t+2} + ... = Σ_{k=0}^{∞} γ^k R_{t+k+1},

where R is the reward at each time-step, γ is the discount factor stating how much importance further rewards have for the return at time t, and k indexes the successive time-steps from the current time to the end of the episode. The discount factor γ is always set between 0 and 1, where γ equal to 1 gives the same importance to all future rewards, whereas γ equal to 0 only takes into account the next reward.
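As a small illustration (a sketch of the standard computation, not code taken from the thesis), the return of every time-step of a finished episode can be computed backwards from the list of collected rewards:

def discounted_returns(rewards, gamma=0.99):
    """Compute G_t for each time-step of a finished episode.

    rewards: list of rewards R_{t+1} collected during the episode.
    gamma:   discount factor in [0, 1].
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Work backwards: G_t = R_{t+1} + gamma * G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Example: three time-steps with gamma = 0.9
# discounted_returns([1, 0, 2], gamma=0.9) -> [2.62, 1.8, 2.0]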


Dynamic programming is often used in RL to solve a whole MDP as a set of sub-problems. Temporal difference (TD) learning is used to update Q by using the discounted action-value function of the following state, γQ_{t+1}, instead of the complete return. This technique is especially useful in long or infinite MDPs. The experiments performed in this thesis involve short MDPs. Thus, we have used the Monte Carlo (MC) approach, in which the actual return is used to perform the updates after the termination of a complete episode.

After all, the goal of the agent is to optimize the policy π by maximizing the expected return for each state-action pair. The return is used to adjust V and Q. Consequently, in each state S the agent takes the action A with the greatest Q. There are multiple techniques used to update the value functions, such as Q-learning or SARSA. In Q-learning the updates are performed following the expression

Q_{new}(s_t, a_t) = Q_{old}(s_t, a_t) + α(G_t − Q_{old}(s_t, a_t)),

where Q_{new}(s_t, a_t) is the updated action-value function, Q_{old}(s_t, a_t) is the old action-value function, G_t is the return at time-step t and α is the learning rate. The term G_t − Q_{old}(s_t, a_t) determines the shift of the action-value function.
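A minimal tabular sketch of this Monte Carlo style update (assuming discrete states and actions; the dictionary-based table and the names are an illustration, not the thesis implementation):

from collections import defaultdict

class TabularMCQLearner:
    """Tabular action-value updates using full episode returns (MC)."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha            # learning rate
        self.q = defaultdict(float)   # q[(state, action)] -> value

    def update_episode(self, episode, returns):
        """episode: list of (state, action); returns: G_t for each step."""
        for (state, action), g_t in zip(episode, returns):
            old = self.q[(state, action)]
            # Q_new = Q_old + alpha * (G_t - Q_old)
            self.q[(state, action)] = old + self.alpha * (g_t - old)

    def best_action(self, state, actions):
        # Greedy choice: pick the action with the largest Q in this state.
        return max(actions, key=lambda a: self.q[(state, a)])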

3.1.2 Deep reinforcement learning

In simple environments it is feasible to store a table of Q values for each combination of states and actions. Nonetheless, in practice, real-world scenarios present a very large or even infinite state-action space. This shortcoming is addressed by using a set of features representing the environment state as an input to a function approximator π_θ(s, a). The function approximator can be any kind of function with adjustable parameters θ. The logic behind function approximators is that similar state features most likely lead to similar actions.

Deep neural networks have proved to be successful and reliable in a wide variety of ML tasks. In this regard, DNNs have also been used as function approximators in DRL. The main change from traditional RL to DRL is shown in Figure 3.2, where the agent is a function approximator, in this case a NN. DNNs are able to find complex patterns in data that yield very good results. As function approximators, they take the features representing the state and output a probability distribution over the actions to take. Hence, DRL converts the MDP into a stochastic process, where the actions are not deterministic with respect to the value Q but probabilistic. The stochastic approach allows the agent to explore actions that initially may not seem to be the most optimal ones but that in the end may lead to better results.

Figure 3.2: Deep reinforcement learning schema with DNN as an agent

The parameters of the neural network (NN) need to be updated similarly to how the action-value function was updated in Q-learning. In this case, the update of the set of parameters θ may be performed through different techniques, such as policy gradient. In the policy gradient method, the policy, i.e., the NN, is adjusted by updating the weights using the gradient as follows,

θ = θ + α Σ_t ∇_θ log π_θ(s_t, a_t) (v_t − b_t),

where the gradient ∇_θ log π_θ(s_t, a_t) indicates the update direction that increases the probability of taking the action a_t in the state s_t. In order to reinforce actions that yield higher rewards and decrease the probability of actions leading to lower rewards, the term (v_t − b_t) is used. The baseline b_t is computed as the average return for a given time-step. Then b_t is subtracted from all the returns v_t at this time-step. As a consequence, actions that yield a return higher than the average return are reinforced. Otherwise, actions leading to a return lower than the average return result in a negative value and therefore their probability is decreased.
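A hedged sketch of this update for a small policy network follows (PyTorch is used here purely as an assumed framework, and the layer sizes and names are illustrative, not the configuration used in the thesis experiments):

import torch
import torch.nn as nn

class PolicyNetwork(nn.Module):
    """Small feed-forward policy: state features -> action probabilities."""

    def __init__(self, state_dim, num_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_actions),
            nn.Softmax(dim=-1),
        )

    def forward(self, state):
        return self.net(state)

def policy_gradient_update(optimizer, log_probs, returns, baselines):
    """One step of: theta += alpha * sum_t grad log pi(a_t|s_t) * (v_t - b_t)."""
    log_probs = torch.stack(log_probs)  # log pi(a_t | s_t) for each step, with grad
    advantages = (torch.tensor(returns, dtype=torch.float32)
                  - torch.tensor(baselines, dtype=torch.float32))
    loss = -(log_probs * advantages).sum()  # minimize the negative objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()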

3.1.3 Deep reinforcement learning in scheduling

Computing clusters are usually formed by multiple nodes. Besides, each node is described by a set of changing parameters, such as memory, CPU or capacity. Furthermore, the jobs to be allocated also have their own parameters that must be considered by the scheduler. As a consequence, the state that describes the system at each point is large and hence, the problem needs to be approached as a DRL task.

In order to ease the understanding of the work described in the following sections, the elements of an RL problem are matched with the specific elements of a distributed computing system. The environment is the whole system where the algorithm is developed; in this case it is the computing cluster and all of its elements and working dynamics, such as nodes, jobs, buffer, order of arrival of jobs, etc. Besides, the agent is the element that takes allocation decisions, that is, the scheduler. In addition, the state presents the current status of the cluster; in this specific context the state can be pictured as a set of features describing the system, such as CPU and memory available, job requirements, job duration, etc. Finally, the actions can be considered as all the possible scheduling decisions that the scheduler might take, e.g., allocating one job to a specific node.
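Put together, the interaction between the DRL scheduler and the simulated cluster follows the standard RL loop; a minimal sketch is given below (method names such as reset, step and select_action are assumptions for illustration, not the thesis API):

def run_episode(env, agent):
    """Generic agent-environment loop used to collect one scheduling episode."""
    observation = env.reset()
    trajectory = []   # (state, action, reward) tuples for the Monte Carlo update
    done = False
    while not done:
        action = agent.select_action(observation)      # one scheduling decision
        next_observation, reward, done = env.step(action)  # cluster reacts
        trajectory.append((observation, action, reward))
        observation = next_observation
    return trajectory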


Chapter 4

Affinity based scheduling

This chapter presents the affinity based scheduling experiment. First, the goal of the experiment is stated. Then, the implementation and the logic of the experiment are described. Finally, the results of the experiment are presented.

4.1 Goal

Currently, the level of complexity of applications has risen exponentially. Thus, their requirements are also very specific and diverse. On the other hand, computing clusters are designed to run plenty of different applications, and even several applications at the same time. Consequently, it is very rare to design a cluster with the goal of specifically fulfilling the requirements of a single application. Besides, the cluster often provides a static set of information to the scheduler. As a consequence, some information about the cluster that may be relevant for a specific application may be unknown to the scheduler, leading to a non-optimal scheduling policy. Therefore, DRL may prove useful for designing scheduling policies for singular applications with specific requirements when the cluster state is not fully observable.

The different components that form an application usually have different purposes, and therefore their requirements are also diverse. For instance, one module of an application may be CPU demanding while another module may need to run on a node with a solid-state drive (SSD) device. These two modules have affinities to nodes with high CPU capabilities and to SSD nodes respectively. Nonetheless, the type of CPU and memory of each node may be unknown to the scheduler. Consequently, the scheduler is not capable of allocating these modules according to their affinities, resulting in poor application performance. On the other hand, DRL has the potential to learn these module-node affinities through experience, converging to an optimal scheduling policy. DRL is able to associate the scheduling decisions with the performance of the application, therefore improving its scheduling policy with previous experience.

Therefore, the objective of this chapter is to prove that a DRL based scheduler is able to learn the affinities of arriving jobs by gaining experience through interacting with the cluster, thus learning to schedule the jobs to the appropriate nodes and enabling the final scheduler to take scheduling decisions with the goal of optimizing a performance metric, such as the average job duration.

4.2 Implementation

4.2.1 Simulated cluster environment

The experiments were performed in a simulated computing cluster. The simulation was designed to be scalable; hence, the design of the simulated cluster architecture could be easily adjusted to the experiment requirements. The implemented simulation included all the principal components of computing clusters and their main characteristics, which are shown below; a small code sketch of these components follows the list.

  • Nodes: Nodes are the computing units of the simulated cluster and they run the jobs they are assigned. Their main parameters were set to be the following.

    – Memory capacity: The total memory capacity of the node, represented as a number. For instance, the memory capacity of a sample node can be 1000 memory units.

    – Affinity: It represents a specific characteristic that makes the node more suitable for executing some types of jobs.

  • Jobs: Jobs are the tasks arriving at the simulated computing cluster. They are executed on the nodes, consuming their resources. Their principal features were established to be the following.

    – Memory request: The amount of memory that the job requires in order to be executed. When the job is allocated to a node, it consumes this quantity of memory from the available memory of the node. For instance, if a job with a memory request of 500 memory units is assigned to a node with 750 memory units available, the resulting available memory of the node will be 250 memory units until the job finishes executing.

    – Execution time (t): The execution time of a job determines how much time the job takes to finish its execution once allocated to a node.

    – Affinity preference: It represents the most suitable type of node to execute this job. A job will have its execution time reduced if it is allocated to a node with matching affinity.

  • Buffer: The buffer is the element that contains the incoming jobs prior to being scheduled to one of the nodes. The buffer is characterized by its capacity. For instance, a buffer may be able to contain a maximum of two jobs in two slots.

  • Scheduler: The scheduler decides where to allocate each waiting job in the buffer, that is, to which node each job of the buffer is assigned. It is the element that we were attempting to optimize to perform the optimal action in each specific cluster state.
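These components might be sketched as simple Python structures (a hedged illustration; the class and field names are mine, not taken from the thesis code):

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    memory_capacity: int      # e.g. 1500 memory units
    affinity: int             # node characteristic, e.g. 1 or 2
    memory_used: int = 0

    @property
    def memory_available(self) -> int:
        return self.memory_capacity - self.memory_used

@dataclass
class Job:
    memory_request: int       # e.g. 250, 500 or 750 memory units
    execution_time: int       # time-steps needed once allocated
    affinity_preference: int  # most suitable node affinity

@dataclass
class Buffer:
    capacity: int = 2
    slots: List[Optional[Job]] = field(default_factory=list)  # holds up to `capacity` jobs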

The simulated cluster in this experiment was a multi-node cluster with two computing nodes, node 1 and node 2. Both nodes were set to have 1500 units of memory. Node 1 was set to have an affinity equal to 1, whereas node 2 was initialized with an affinity of 2.

The set of jobs to be scheduled, named the jobset, was composed of 60 jobs. Each job had a memory request picked from a uniform random distribution over 250, 500 and 750 memory units. Initially, the execution time exec_t of all the jobs was set to 16 time-steps. Besides, the affinity preference of each job was sampled from a uniform random distribution over 1 and 2. The system was designed to reduce the execution time of jobs that were allocated to nodes with matching affinity. Therefore, if a job was allocated to a node whose affinity matched the job's affinity preference, the execution time of the job was set to exec_t/2, that is, 8. In contrast, if the affinity of the job and the node did not match, the execution time of the job remained the initial 16.

The buffer was set to have a capacity of two slots, i.e., it could contain a maximum of 2 jobs at the same time. The scheduler was enabled to observe the resource request and the affinity preference of the jobs contained in the buffer. Moreover, the scheduler was provided with the available resources of each of the nodes. Nevertheless, the resource requests of the rest of the jobs in the jobset were not shown to the scheduler until they arrived at the buffer. It is worth noticing that the scheduler was not able to observe the affinity of the nodes. Hence, the scheduling decisions could not be directly taken to match the affinity preference of the jobs with the affinity of the node; instead, the affinities had to be learned from experience.

An episode of the simulation was set to run in a sequential fashion. Initially, two jobs from the jobset were randomly selected to fill the buffer. Then, at each time-step t the scheduler decided in which node to allocate each of the jobs in the buffer. It is worth highlighting that the scheduler could also decide not to schedule a job, which then remained in the buffer. The episode was established to finish when all the jobs from the jobset had been scheduled and all the jobs had finished their execution in the nodes. The architecture and the logic of the simulation of the affinity based scheduling experiment are presented in Figure 4.1.

Figure 4.1: Schema of affinity based scheduling experiment

As can be observed in the example figure, "Node 1", with affinity equal to 1, contains "Job 1", which has an affinity preference of 1; since both affinities match, the execution time of the job is 8 time-steps. On the other hand, "Node 2", with affinity 2, contains "Job 2", which has an affinity preference of 1; since the affinities do not match, the execution time of the job is 16 time-steps. The designed scheduler aimed to learn the scheduling decisions that minimized the average duration of the jobs by implicitly learning the affinities of the nodes from experience. The implementation of the DRL based scheduler included several elements, which are presented in the next sections.
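The episode dynamics described above can be summarized with a short sketch (a hedged outline reusing the hypothetical Node and Job structures from the previous sketch; the simulator in the thesis may be organized differently):

import random

def effective_execution_time(job, node):
    """Matching affinity halves the execution time (16 -> 8 time-steps)."""
    if job.affinity_preference == node.affinity:
        return job.execution_time // 2
    return job.execution_time

def sample_jobset(size=60, seed=None):
    """Jobset used in the affinity experiment: random memory request and affinity."""
    rng = random.Random(seed)
    return [
        Job(
            memory_request=rng.choice([250, 500, 750]),
            execution_time=16,
            affinity_preference=rng.choice([1, 2]),
        )
        for _ in range(size)
    ]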

4.2.2 Objective

The principal optimization objective was to minimize the average job duration. The duration of a job was considered as the complete time the job remains in the system. In other words, the duration of the job was the number of time-steps the job was waiting in the buffer to be allocated plus the time the job was running on a node, that is, the execution time of the job. Hence, for a job j the duration of the job was d_j, and the average job duration of all the jobs in an episode was calculated as follows,

d =1

N

N∑

j=0

dj ,

where d is the average job duration, N is the total number of jobs in the episode and

dj is the duration of the job j.

4.2.3 State

The state is the current status of the whole system. Nonetheless, it is worth to distinguish

it from the observable state. The observable state is the collection of information that is

presented to the agent, in this case the DRL scheduler. Hence, the observable state is a

subset of the system state, since the scheduler does not have access to all the information.

As it was described in section 3.1.1, the state was used as an input to the agent, that

in turn took an action. Thus, the scheduler took the scheduling decisions based on the

observable state.

In each time-step, the state of the system may change, therefore, we refer to the observ-

able state at a given time-step t as St, also referred as observation. In this experiment the

observable state was set to contain the information of the available memory of each node

and the affinity preference and memory request of each of the jobs waiting in the buffer.

All of these features were arranged as categorical attributes. Moreover, each observation of

the system state was one-hot encoded prior to be passed as an input to the DRL scheduler.

In order to facilitate the comprehension, the job in the first slot of the buffer is named as

“job 1” and the job in the second slot of the buffer is refereed as “job 2”. Table 4.1 shows

an example of an state observation.

22

Page 35: Telemetry data for machine learning based scheduling

CHAPTER 4. AFFINITY BASED SCHEDULING

Node 1

Memory available

Node 2

Memory available

Job 1

Memory request

Job 1

Affinity preference

Job 2

Memory request

Job 2

Affinity preference

Categorical data 500 1000 750 1 250 2

One-hot encoded data 001000 000100 0001 010 0100 001

Table 4.1: Example of observation St in affinity based scheduling experiment

As it can be observed in the table, the feature “memory available” had 7 different cat-

egories, each of them corresponding to 0, 250, 500, 750 until 1500 in steps of 250. On the

other hand, the attribute “memory request” had 4 different categories corresponding to

the three different values to be picked from the random uniform distribution of memory

requests, to which we added an extra category to represent that the buffer slot was empty.

Finally, the attribute “affinity preference” had 3 different categories, two of them corre-

sponding to the two different values of affinity and the other one representing the empty

buffer slot.

4.2.4 Action space

The action space represented all the possible actions that the scheduler was able to take.

In this experiment the action state embraced a set of 9 different actions, covering all the

possible scheduling combinations for the two jobs in the buffer and for both nodes of the

cluster. Table 4.2 presents the different actions and their corresponding description.

As it can be observed, some of the actions were conceived to schedule just one of the

jobs, leaving the other job waiting in the buffer. Besides, the action 0 did not schedule any

of the waiting jobs in the buffer. Presenting all the possible actions to the scheduler allowed

to perform a complete optimization by exploring all the possibilities, even if initially they

did not look as the most optimal ones.

It is worth highlighting that when an action was picked by the scheduler it was per-

formed only if the requirements of the jobs were totally fulfilled. For instance, if the action

7 was selected but the memory requirement of the “job 2” was higher than the memory

available in the “node 2”, then neither the “job 1” nor the “job 2” were scheduled. In other

words, if any of the job requirements were not fulfilled the whole action did not take place

and one time-step proceed without scheduling any of the jobs in the buffer.

23

Page 36: Telemetry data for machine learning based scheduling

CHAPTER 4. AFFINITY BASED SCHEDULING

Action Description

0 Do not schedule any job

1 job 1: schedule to node 1

job 2: remain in buffer

2 job 1: remain in buffer

job 2: schedule to node 1

3 job 1: schedule to node 1

job 2: schedule to node 1

4 job 1: schedule to node 2

job 2: remain in buffer

5 job 1: remain in buffer

job 2: schedule to node 2

6 job 1: schedule to node 2

job 2: schedule to node 2

7 job 1: schedule to node 1

job 2: schedule to node 2

8 job 1: schedule to node 2

job 2: schedule to node 1

Table 4.2: Action space in affinity based scheduling experiment

4.2.5 Reward function

From the algorithm point of view, the goal was to maximize the received reward. There-

fore, the reward function had to be crafted to match with the experiment optimization

objective, i.e., minimizing the duration of the jobs.

Hence, the reward at each time-step was calculated as the negative sum of the number

of jobs present in the system in the current time-step. All the jobs running in nodes

and the jobs waiting in the buffer were considered as jobs contained in the system. The

mathematical reward function is presented as follows,

Rt =∑

j∈Js−1,

where Rt is the reward at the current time-step and Js is the set is jobs that are cur-

rently in the system.

24

Page 37: Telemetry data for machine learning based scheduling

CHAPTER 4. AFFINITY BASED SCHEDULING

4.2.6 Agent: DRL scheduler

As it has been discussed, the agent in charge of taking the actions, in this case the schedul-

ing decisions, is a DRL based scheduler. The DRL scheduler takes as input the state of the

system and returns an scheduling action, therefore, transitioning the state of the system.

The DRL scheduler built in this experiment was based on the policy gradient method

described in section 3.1.2. As explained, the policy gradient method uses a function ap-

proximator, in this case a DNN, to output a probability distribution over all the possible

actions given a certain input state.

The input for the DNN was a vector of a single dimension with 28 elements that corre-

sponded to the one-hot encoded features of the state presented in section 4.2.3. It is worth

highlighting that the performance of DNN is significantly increased by one-hot encoding

the input data. The designed DNN had an architecture of 5 different layers, with dimen-

sions 28×64×32×16×9. As it can be observed, the first and last layer have the dimensions

of the input data and of the output action space since they are the input and output layers

respectively. The other three layers correspond to the hidden layers of the DNN. In total

the DNN had a total of 4617 weights. The weights of the DNN were initialized using the

Xavier initialization [17]. The Rectified Linear unit (ReLu) was used as activation function

in all the hidden layers. The output layer was left without activation function, hence, it

outputed the logits, i.e., the probabilities of each action mapped in the interval (−∞,∞).

The optimizer used to update the weights with the policy gradient algorithm was the Adam

optimizer [18] with an initial learning rate of α = 0.001.

The goal was to design a scheduler capable of optimizing the job’s duration in sequences

of arriving jobs. First, the DRL scheduler was trained with a single set of jobs. Then, in

order to build an scheduler capable of generalizing, i.e., performing the optimization task

for different sequences of arriving jobs, a group of 5 different jobsets was used in the train-

ing phase. The total number of training iterations was set to 200. The weights of the DNN

were updated after each iteration. Moreover, in each iteration for each jobset a total of 30

episodes were run. Executing several episodes per jobset allowed to explore the outcome of

different actions under the same policy (since the weights were not updated until the end

of a single iteration), therefore, exploring the action space to find the optimal scheduling

decisions.

25

Page 38: Telemetry data for machine learning based scheduling

CHAPTER 4. AFFINITY BASED SCHEDULING

As it has been stated, the update of the weights was performed after each iteration.

Hence, the gradients for each action state pair at each time-step were stored in an ar-

ray. Furthermore, the average of all the returns in the same time-step from the different

episodes of the same jobset was subtracted from each single return in order to decrease

the probability of non optimal actions while increasing probabilities of actions yielding to

higher returns.

Finally, in order to avoid overfitting the technique of early stopping was used. Early

stopping allows to stop the training of the DNN once the performance parameter, in this

case the reward was not being optimized anymore. Hence, in this experiment the training

was interrupted if no improvement in the reward was observed after 50 iterations.

4.3 Results

The evolution of the performance of the scheduler through the training iterations was

recorded and is presented in this section. Furthermore, after finishing the training phase,

the performance of the DRL based scheduler was validated with a test jobset that was not

used in the training phase.

The performance of the DRL scheduler was compared against a baseline scheduler.

Multiple orchestrators such as kubernetes, use schedulers that aim to balance the work-

load in all the nodes. Therefore, a workload balancer scheduler was used as a reference.

The workload balancer scheduler aimed at each time-step to allocate jobs in the nodes

with less used memory to maintain an even load across the cluster.

Figure 4.2 presents the difference of memory usage between the two nodes of the cluster

when using the DRL scheduler and the Load Balancer (LB) scheduler. It is possible to

observe that the LB scheduler maintains the difference of memory usage between both

nodes more stable and lower, with an average memory usage difference of around 400

memory units and 1000 memory units as highest difference. In contrast, the difference

of memory usage with the DRL scheduler is greater in overall, reaching several times a

difference of memory usage of 1400 memory units. Nonetheless, the implemented DRL

scheduler was designed to optimize a given parameter of interest, in this case the average

job duration as the following figures present.

26

Page 39: Telemetry data for machine learning based scheduling

CHAPTER 4. AFFINITY BASED SCHEDULING

Figure 4.2: Difference of memory usage between nodes for DRL and LB schedulers

Figure 4.3 presents the evolution of the average job duration throughout the training

iterations. It is worth noticing, that the average job duration was calculated as the average

job duration of all the jobsets and episodes that were ran in each iteration. In the figure,

the blue line represents the average job duration obtained using the DRL scheduler, in con-

trast, the discontinuous green line represents the LB scheduler. During the initial iterations

the average job duration using the DRL scheduler was very high, around 16 time units.

Nonetheless, the average job duration quickly dropped throughout the first 20 training

iterations to a value of 11 time units. This is explained by the fact that the DRL rapidly

adapted its policy to optimize the average job duration. Afterwards, the average job dura-

tion kept decreasing at a much lower rate until the early stopping technique finalized the

training after nearly 60 iterations when the average job duration was not being optimized

any more. On the other hand, the average job duration of the LB scheduler remained con-

stant, since it is a static policy, with a value around 15.5 time units. In the beginning, the

LB scheduler had a better performance than the DRL scheduler. Nonetheless, the DRL

quickly learnt to take scheduling decisions that minimized the average job duration. Af-

ter a few training iterations, the DRL scheduler performance overcame the LB scheduler.

Furthermore, in the end of the training phase, there was a difference in the average job

duration of 4.5 time units. Hence, it can be concluded that the DRL successfully learnt the

optimal scheduling decisions that minimized the average job duration.

27

Page 40: Telemetry data for machine learning based scheduling

CHAPTER 4. AFFINITY BASED SCHEDULING

Figure 4.3: Average job duration evolution during the training phase in affinity based

scheduling

As stated previously, the policy gradient algorithm was trained to maximize the received

reward in each episode. Figure 4.4 presents the evolution of the average reward in the

course of all the training iterations. In this case, the curve indicates that the reward was

maximized with the number of training iterations. It is possible to observe that the reward

highly increased until the iteration 20. Then, the increasing trend was softened until the

algorithm stopped training at a reward of -780. The evolution of the reward was similar to

the evolution of the average job duration but inverted, this is explained by the fact that

the reward function was crafted to optimize the average job duration. Thus, by maximizing

the reward, we were minimizing the average job duration.

28

Page 41: Telemetry data for machine learning based scheduling

CHAPTER 4. AFFINITY BASED SCHEDULING

Figure 4.4: Average reward evolution during the training phase in affinity based scheduling

After the training finished, the DRL scheduler and the LB scheduler were used to

allocate the jobs of a test jobset. This allowed to test the performance of the DRL scheduler

on a jobset other than the ones used in the training phase, or in other words, the capacity

to generalize for unseen jobsets. The average job duration for both, the DRL scheduler and

the LB scheduler are presented in Figure 4.5. It can be observed that the DRL scheduler

achieves an average job duration a little bit higher than 10 times units. On the other hand,

the LB scheduler resulted in an average job duration of 14 time units. Therefore, the DRL

scheduler was able to generalize for unseen jobsets providing a better performance than

the LB scheduler.

29

Page 42: Telemetry data for machine learning based scheduling

CHAPTER 4. AFFINITY BASED SCHEDULING

Figure 4.5: Average job duration of DRL and LB schedulers for the test jobset in affinity

based scheduling

30

Page 43: Telemetry data for machine learning based scheduling

CHAPTER 5. DISTANCE BASED SCHEDULING FOR MULTI-MODULE APPLICATIONS

Chapter 5

Distance based scheduling for

multi-module applications

5.1 Goal

Nowadays, applications are made up by multiple modules, each of them performing a

different task. Besides, modules from the same application need to maintain a constant

communication among them to collaborate and to correctly run the application. There-

fore, the performance of an application often relies on the quality of the communication

among the different modules.

It is known that one of the parameters that affects the most communication among

computing nodes is the distance. In a computing cluster, the nodes might have different

distances among them. Therefore, some nodes might be located in the same location or

region whereas other nodes might be located in a distant region separated by a certain

distance that affects the velocity of the communication. As a consequence, scheduling has

become a key task to allocate modules that require a constant communication with the

goal of improving the performance of the whole application.

Traditional scheduling heuristics usually do not consider the location of the nodes in the

scheduling process. Furthermore, network telemetry data is often not available or unknown

for the scheduler. Hence, the goal of this second experiment is to demonstrate that a DRL

based scheduler is able to learn the optimal scheduling policy for multi-module applications

in a cluster where the distance among the nodes is not explicitly presented to the scheduler.

31

Page 44: Telemetry data for machine learning based scheduling

CHAPTER 5. DISTANCE BASED SCHEDULING FOR MULTI-MODULE APPLICATIONS

5.2 Implementation

5.2.1 Simulated cluster environment

This second experiment was performed in a similar environment to the first experiment.

The elements forming the cluster were similar, the nodes, the jobs (modules), the buffer and

the scheduler. Nonetheless, some parameters of these elements were changed in order to

adapt the simulation to the objective of this second experiment. The changes an additions

to the elements were the following.

• Nodes: The resource parameter “memory capacity” was maintained. On the other

hand, the parameter “affinity” was eliminated. A new parameter specific of the second

experiment was included, the “location”.

– Location: It is a numerical parameter that represents the physical location of

a node within the cluster. Nodes that were in nearby locations intuitively had

better communication, thus, the closer modules from the same application were

the faster their execution time was. On the other hand, nodes that were further

from each other had a poor communication, therefore, the modules took more

time to be executed.

• Jobs: Jobs or modules maintained the same resource request parameter, “memory re-

quest”. Similarly, to the nodes the parameter “affinity preference” was not considered.

Besides, a new parameter “application” was added.

– Application: This categorical parameter represented to which application be-

longed a job or module. In order to be executed, a job or module needed to

communicate and run together with another job of the same application in the

cluster.

• Buffer: In this second experiment the buffer was set to have a single slot. Therefore,

a single job was allowed to be scheduled in each time-step.

• Scheduler: The changes performed to the scheduler, such as the action space or the

architecture of the NN will be discussed in the following sections.

The simulated cluster in the region based scheduling experiment was a multi node clus-

ter with 6 different computing nodes, nodes 1 to 6. All nodes were set to have a memory

capacity of 500 memory units. Each node was assigned with a different location parameter

as is indicated in Table 5.1.

32

Page 45: Telemetry data for machine learning based scheduling

CHAPTER 5. DISTANCE BASED SCHEDULING FOR MULTI-MODULE APPLICATIONS

Nodes node 1 node 2 node 3 node 4 node 5 node 6

Location 1 2 3 4 5 6

Table 5.1: Location parameters of the nodes

Successively, the distance between nodes was calculated as the absolute value of the

difference between the location of the nodes. For instance, the distance between “node 2”

and “node 5” was 3, or the distance between “node 1” and “node 2” was 1.

The jobset was composed of 24 jobs to be allocated in a single episode. Each job had

a memory request of 500 memory units. Intuitively, in this experiment, a single node was

capable of containing a single job simultaneously, since the total capacity of the nodes was

equal to the memory request of any of the arriving jobs. In that way, the availability of a

node could have just two states, available or occupied, therefore, reducing the state space

and focusing on the distance among nodes. Initially, the jobs were set to have an execution

time exect of 10 time units. In addition, the jobs belonged to three different applications.

Thus, 8 jobs were assigned to “application 1”, another 8 jobs to “application 2” and the

remaining 8 jobs to “application 3”. It is worth noticing that the number of jobs belonging

to the same application in the jobset was required to be even, since jobs from the same

application needed to be grouped in pairs in order to be executed. This allowed to sim-

ulate a multi-module application where modules needed to communicate among them to

complete their tasks.

In this experiment the DRL scheduler was aimed to reduce the execution time of jobs

from the same application by learning from experience to scheduler these jobs to close by

nodes. As it has been stated, the execution time exect was calculated based on the distance

between the nodes where the jobs had been allocated as follows, exect = exect × distance.

For instance, for a pair of nodes from the same application allocated in “nodes 1” and “2”

respectively the execution time was exect = 10× |1− 2| = 10 time units. Otherwise, if the

jobs were allocated in further nodes, such as in “nodes 2” and “5”, their execution time was

exect = 10× |2− 5| = 30 time units.

In this experiment, the scheduler was allowed to observe the application identifier of

each job contained in the nodes. Besides, the scheduler was presented with the application

identifier of the job waiting in the buffer. Moreover, the application identifier of the next

three jobs to arrive to the buffer was provided to the scheduler. In that way, the scheduler

33

Page 46: Telemetry data for machine learning based scheduling

CHAPTER 5. DISTANCE BASED SCHEDULING FOR MULTI-MODULE APPLICATIONS

could take into consideration the next jobs to arrive to decide the optimal nodes to allocate

the jobs.

Similarly to the first experiment, an episode ran sequentially. At every time-step t a

new job arrived to the buffer. The episode was set to finish when all the jobs from the

jobset had been scheduled and when all the jobs had finished their execution in the nodes.

The set-up and the logic of the experiment are presented in Figure 5.1.

Figure 5.1: Schema of distance based scheduling simulation

As it can be seen in the experiment schema, two jobs that belong to the same applica-

tion “App1” are allocated in “nodes 1” and “2”, since the distance separating these nodes was

of 1 time unit their execution time was 10 time units. On the other hand, two additional

jobs from the same application “App2” are located in nodes 3 and 6, since the distance

separating these jobs is of 3 units, their execution time is multiplied by 3, resulting in

30 time units. This example of the simulated cluster shows how important was to allocate

jobs in nearby nodes to reduce their execution time and therefore, the average job duration.

5.2.2 Objective

The objective of this experiment was similar to the objective of the previous experiment, to

minimize the job duration. In this experiment the minimization of the average job duration

34

Page 47: Telemetry data for machine learning based scheduling

CHAPTER 5. DISTANCE BASED SCHEDULING FOR MULTI-MODULE APPLICATIONS

could be achieved by allocating jobs from the same application in nearby nodes.The formula

of the average job duration is,

d =1

N

N∑

j=0

dj ,

where d is the average job duration, N is the total number of jobs in the episode and

dj is the duration of the job j.

5.2.3 State

The observable state St at a given time-step t and hence, the input data for the scheduler,

differed from the previous experiment in several elements. In this experiment, the state

contained the type of application of the job contained in each node. Besides, the applica-

tion type of the job waiting in the buffer and of the next three jobs to arrive were included

in the scheduler. Thus, the state did no contain the location of the nodes. Therefore, the

distance was unknown for the scheduler, meaning that it had to learn it from experience.

The state was represented as an array of 0 and 1 containing the one-hot encoded values of

the previous mentioned fields as Table 5.2 shows.

Node 1

App...

Node 6

App

Buffer

App

Job + 1

App

Job + 2

App

Job + 3

App

Categorical data 1 ... 0 2 3 1 0

One-hot encoded data 0010 ... 0001 0100 1000 0010 0001

Table 5.2: Example of observation St in distance based scheduling experiment

The application parameter could have 4 different values. The value of 0 represented

when there was not a job in the node, in the buffer or among the arriving jobs, in addition,

the values 1, 2 or 3 represented the different application to which each job could belong

to. The “buffer app” field contained the application type of the job waiting in the buffer.

Besides, the fields named as “job + n” indicated the application of the jobs to arrive to the

buffer after “n” time-steps. In total the state was an array containing 40 elements.

5.2.4 Action space

The action space represented all the possible actions that the DRL scheduler was able

to take. In this experiment the action state embraced a set of 7 different actions. Since

the buffer counted with a single slot, each action indicated to which node the waiting job

35

Page 48: Telemetry data for machine learning based scheduling

CHAPTER 5. DISTANCE BASED SCHEDULING FOR MULTI-MODULE APPLICATIONS

or module was going to be scheduled. Table 5.3 presents the different actions and their

corresponding description.

Action Description

0 Do not schedule the job

1 Schedule the job to Node 1

2 Schedule the job to Node 2

3 Schedule the job to Node 3

4 Schedule the job to Node 4

5 Schedule the job to Node 5

6 Schedule the job to Node 6

Table 5.3: Action space in distance based scheduling experiment

It is worth highlighting that when an action was picked by the scheduler it was per-

formed only if the requirements of the jobs were totally fulfilled. Thus, if the DRL scheduler

decided to perform an action involving allocating a job in an already occupied node, the

action was not executed and the simulation time proceeded.

5.2.5 Reward function

As in the previous experiment, the goal was to maximize the received reward. Therefore,

the reward function had to be crafted to match with the experiment optimization objec-

tive, i.e., minimizing the duration of the jobs. Hence, the same reward function was used

in this experiment.

The reward at each time-step was calculated as the negative sum of the number of jobs

present in the system in the current time-step. All the jobs running in nodes and the jobs

waiting in the buffer were considered as jobs contained in the system The reward function

is presented as follows,

Rt =∑

j∈Js−1,

where Rt is the reward at the current time-step and Js is the set of jobs that are cur-

rently in the system.

36

Page 49: Telemetry data for machine learning based scheduling

CHAPTER 5. DISTANCE BASED SCHEDULING FOR MULTI-MODULE APPLICATIONS

5.2.6 Agent: DRL scheduler

Similarly to the affinity based scheduling experiment, the agent was a DRL based sched-

uler. The DRL was set to take as an input the state described in the previous section.

Then, the scheduler was set to output a probability distribution over all of the possible

actions within the action space.

The DRL scheduler built in this experiment was based on the policy gradient method

described in section 3.1.2. In a similar way to the previous experiment a DNN was used as

a function approximator to output the scheduling decisions.

The input for the DNN was the state array of 40 elements that corresponded to the one-

hot encoded features presented in the section 4.2.3. The designed DNN had an architecture

of 6 different layers, with dimensions 40× 128× 64× 32× 16× 7. As it can be observed,

the first and last layer have the dimensions of the input data and of the output action

space since they are the input and output layers respectively. In this case we included one

extra layer in the beginning with 128 units, due to the higher dimension of the input state

array. The total number of weights of the DNN was of 16231. The weights of the DNN

were initialized using the Xavier initialization. The Rectified Linear unit (ReLu) was used

as activation function in all the hidden layers. The output layer was left without activation

function, hence, it outputed the logits, i.e., the probabilities of each action mapped in the

interval (−∞,∞). The optimizer used to update the weights with the policy gradient algo-

rithm was the Adam optimizer [18] with an initial learning rate of α = 0.0001. Note that

in this experiment we used an initial learning rate 10 times smaller than in the previous

experiment. This change was needed since the gradient was exploding in the middle of

the training phase, and hence, the DRL scheduler was arriving to a non optimal policy.

Therefore, by upgrading the weights in smaller steps we were able to reach the optimal

policy.

The goal was to design a scheduler capable of optimizing the job’s duration in sequences

of arriving jobs. The DRL scheduler was trained with 2 different jobsets in the training

phase. The total number of training iterations was set to 300. The weights of the DNN

were updated after each iteration. In contrast to the affinity based scheduling experiment

where 30 episodes were ran per jobset an iteration, in this experiment, 150 episodes were

ran. By increasing the number of episodes we were able to enlarge the exploration phase in

the initial iterations, which resulted very beneficial due to the larger size of the input state.

37

Page 50: Telemetry data for machine learning based scheduling

CHAPTER 5. DISTANCE BASED SCHEDULING FOR MULTI-MODULE APPLICATIONS

As it has been stated, the update of the weights was performed after each iteration.

Hence, the gradients for each action state pair at each time-step were stored in an ar-

ray. Furthermore, the average of all the returns in the same time-step from the different

episodes of the same jobset was subtracted from each single return in order to decrease

the probability of non optimal actions while increasing probabilities of actions yielding to

higher returns.

Finally, in order to avoid overfitting, the technique of early stopping was used. In the

same way as in the affinity based scheduling experiment, if the performance of the sched-

uler did not improve after 50 iterations, the training was stopped.

5.3 Results

The evolution of the performance of the scheduler through the training iterations was

recorded and is presented in this section. Furthermore, after finishing the training phase,

the performance of the DRL based scheduler was validated with a test jobset that was not

used in the training phase.

In this experiment a baseline scheduler, LB, was used to compare the performance of

the DRL scheduler. As it has been stated the nodes just had space to allocate one job at a

time. Hence, the baseline scheduler allocated randomly the waiting job to one of the free

nodes and it never attempted to allocate the job in an already occupied node.

Figure 5.2 presents the evolution of the average job duration throughout the training

iterations. It is worth noticing, that the average job duration was calculated as the average

job duration of all the jobsets and episodes that were ran in each iteration. In the figure,

the blue line represents the average job duration obtained using the DRL scheduler, in

contrast, the discontinuous green line represents the LB scheduler. In this experiment non

optimal scheduling decisions yielded to large job duration times, given that the execution

time of a job could be incremented up to 5 times if two jobs were allocated in the most

remote nodes, that is, in “node 1” and in “node 6”. Therefore, in the initial iterations the

average job duration was very high, close to 55 time units. Afterwards, during the first

80 iterations the average job duration dropped to 25 time units, meaning that the DRL

was able to successfully learn and optimize the scheduling policy. Then, the average job

38

Page 51: Telemetry data for machine learning based scheduling

CHAPTER 5. DISTANCE BASED SCHEDULING FOR MULTI-MODULE APPLICATIONS

duration kept decreasing until it reached a value near to 22 time units when early stopping

ended the training. It is worth highlighting that once a node was allocated in a node, it

would not start its execution until another job of the same application was allocated in the

cluster. Therefore, some jobs had to wait during a few time steps to start its execution,

due to this reason the average job duration never decreased to the initial execution time

of 10 time units. On the other hand, the LB scheduler had an average job duration of 55

time units.

Figure 5.2: Average job duration evolution during the training phase in distance based

scheduling

39

Page 52: Telemetry data for machine learning based scheduling

CHAPTER 5. DISTANCE BASED SCHEDULING FOR MULTI-MODULE APPLICATIONS

As stated previously, the policy gradient algorithm was trained to maximize the re-

ceived reward in each episode. Figure 5.3 presents the evolution of the average reward in

the course of all the training iterations. In this case, the curve indicates that the reward was

maximized with the number of training iterations. The reward increased at a constant rate

until the 100 iteration, when it reached a reward close to -350. Then, it kept increasing at

a lower pace until early stopping halted training at a reward of -300. The evolution of both

the average job duration and the reward were closely related, by maximizing the reward,

we were minimizing the average job duration.

Figure 5.3: Average reward evolution during the training phase in distance based scheduling

40

Page 53: Telemetry data for machine learning based scheduling

CHAPTER 5. DISTANCE BASED SCHEDULING FOR MULTI-MODULE APPLICATIONS

After the training finished, a test jobset was used to asses the performance of the DRL

scheduler, and furthermore, to compare it with the LB scheduler. Hence, it was possible to

observe the ability to generalize of the DRL scheduler. The average job duration for both,

the DRL scheduler and the LB scheduler are presented in Figure 5.4. It can be observed

that the DRL scheduler achieved an average job duration around 45 times units. On the

other hand, the LB scheduler resulted in an average job duration of 70 time units.

Figure 5.4: Average job duration of DRL and LB schedulers for the test jobset in distance

based scheduling

41

Page 54: Telemetry data for machine learning based scheduling

CHAPTER 6. TIME SERIES WORKLOAD ESTIMATION

Chapter 6

Time series workload estimation

This section presents the workload estimation experiment. First, the main objective of the

experiment is presented, followed by the implementation and the results.

6.1 Goal

The previous experiments have shown that cluster telemetry data can be successfully used

in the scheduling process to improve the performance of the system. As stated in the scope

of the thesis, the second objective of this work was to use telemetry data from the cluster

to predict the workload of a cluster of nodes.

Predicting the future workload of the nodes in a computing cluster is a good opportu-

nity to understand the operation of a system or application. Moreover, it might be useful to

avoid overload in nodes by detecting high peaks of resources usage and hence, preventing

node failures and system breakdowns.

Therefore, the goal of this experiment was to perform a time series forecasting of the

memory usage of some nodes in a simulated cluster. This prediction was aimed to be

performed by analyzing previous memory usage data from the same node or some other

different nodes.

6.2 Implementation

6.2.1 Simulated cluster environment

In this experiment, Yet Another Fog Simulator (YAFS) [19] was used to build a cluster

of interconnected nodes. YAFS allowed to create a cluster of nodes where several applica-

42

Page 55: Telemetry data for machine learning based scheduling

CHAPTER 6. TIME SERIES WORKLOAD ESTIMATION

tions can be deployed. Each application was formed by several services that were assigned

initially, in the definition of the set up, to a single node. The services exchanged mes-

sages among them through the links. A message was transmitted from a source node to

a destination node through the shortest link path. When a message arrived to a module,

it triggered the transmission of another message. Besides, messages consumed memory of

the destination nodes while they were being processed.

Messages were characterized by the parameters “instructions” and “bytes”. The instruc-

tions determined the processing time of the message upon arrival to the destination node.

The parameter bytes determined how much memory from the destination node was con-

sumed while the message was being processed. On the other hand, the nodes were charac-

terized by the parameter “RAM”, or available memory, and by the number of Instructions

Per simulation Time (IPT), which determined the processing time of a message based on

its number of instructions. Finally, the links that interconnect the nodes were characterized

by the “Bandwidth” and by the “Channel Propagation Speed”.

By using the previously described elements, we built a simulated cluster with the goal of

generating a workload in the nodes. Furthermore, to perform a prediction of the workload

based on the generated memory usage samples from the past. Figure 6.1 presents the set-up

of the simulated cluster, where nodes are represented by squares and the services of the

application by red circles. The application counted with 7 nodes in total and it contained 2

applications with different modules. For instance, “node 3” contains “service 2” and “service

1” from “application 1” and “application 2” respectively.

43

Page 56: Telemetry data for machine learning based scheduling

CHAPTER 6. TIME SERIES WORKLOAD ESTIMATION

Figure 6.1: Simulated cluster architecture for workload prediction

Nodes were set to have different CPU and IPT values with the goal of emulating a real

computing cluster. In a similar way, the messages exchanged between the services were set

to have diverse memory requests and number of instructions. Hence, some messages were

heavier and took more time to be processed than others. Most of the messages exchanged

among services were deterministic, that is, they were always transmitted upon reception

of another message. In order to simulate occasional high peaks of memory usage in a node,

we introduced a set of messages that were launched with probability of 0.1, that is, they

had a low probability to be launched. These messages were set to concur in a high memory

demanding message to be executed in “node 3”. This message caused a harmful overload

of the node that ideally should be avoided. Therefore, the designed model attempted to

predict the normal workload and moreover, the high peaks of memory usage in this node.

Besides, we introduced gaussian noise to the bytes parameter of the messages in the

node. In that way, the workload generated was similar to a real scenario were the memory

usage fluctuates. Hence, the performance of the model could be tested with more complex

data by adding an stochastic component. Therefore, the memory usage at each time-step

in a node was calculated as the sum of all the bytes of the messages running in that node

with their corresponding variation sampled from the gaussian distribution.

Finally, two different scenarios were generated. In the first scenario, the workload in

“node 3” and the high peaks of memory usage depended only on the past workload in

the same node. Therefore, the prediction of the future workload was developed as an

44

Page 57: Telemetry data for machine learning based scheduling

CHAPTER 6. TIME SERIES WORKLOAD ESTIMATION

univariate model. In another way, in the second scenario, the workload of “node 3” relied

on the previous workload of the same node and on the past workload of “node 2”. As a

consequence, the prediction of the future workload was carried out as a multivariate model.

6.2.2 Data

The data consisted in a time series of the memory usage in some of the nodes of the clus-

ter. The simulation was run for 1,000,000 time-steps, therefore, the dataset had 1 million

samples. The dataset was divided into a training-set and a test-set with a ratio of 70% and

30% respectively.

As it has been stated the goal was to perform a time series forecast. In other words,

train a model to predict future memory usage samples based on past memory usage sam-

ples. In this experiment, we subdivided the training-set and the test-set in segments of

250 samples by sliding a window of width 250 time-steps with a step size of 1 time-step.

Then, from each segment of 250 samples the first 200 samples were used to train the model

to predict the following 50 samples. Figure 6.2 represents with a blue points the samples

used to train the model to predict the following 50 samples represented in red. Besides,

the z-score normalization was performed on the data. The z-score normalization consists

in subtracting every sample by the mean and diving the result by the standard deviation

of the distribution of the data.

Figure 6.2: Segment of training data

45

Page 58: Telemetry data for machine learning based scheduling

CHAPTER 6. TIME SERIES WORKLOAD ESTIMATION

In the scenario where the memory usage in “node 3” just depended on the previous

memory usage in the same node, the data counted with two columns. The first column for

the time-step and the second column for the memory usage of “node 3”. Table 6.1 presents

an example of a few samples from the univariate dataset.

time-step Node 3 memory usage

100 25842.62

101 25074.00

102 24644.44

103 25643.54

104 23006.18

Table 6.1: Sample of univariate time series memory usage datatset

On the other hand, the dataset used in the multivariate scenario counted with three

columns. The first column indicated the time-step, the second column the memory usage

of “node 2” and the third column the memory usage of “node 3”. Table 6.2 presents an

example of a few samples from the multivariate dataset.

time-step Node 2 memory usage Node 3 memory usage

100 4707.39 24930.52

101 4050.66 26694.94

102 4653.36 24148.10

103 4957.43 24597.24

104 5040.66 26674.67

Table 6.2: Sample of multivariate time series memory usage datatset

6.2.3 Forecasting model

The model chosen to predict the future workload in the node was a Long-Short Term

Memory (LSTM) network. LSTM networks are an specific type of recurrent neural net-

works that are characterized by maintaining in memory information from the far and close

past that may affect the future data. Hence, LSTM networks are able to find patterns in

previous data that affect the future data, in this case, patterns of previous workload that

influenced the future workload.

46

Page 59: Telemetry data for machine learning based scheduling

CHAPTER 6. TIME SERIES WORKLOAD ESTIMATION

The architecture of the model counted with 2 sequential layers. The first layer was an

LSTM layer with 128 LSTM units. The second and last layer was a dense layer of size equal

to the number of target samples to predict, in this case 50 units. The training process was

performed in batches of 100 training segments, each segment with 250 samples, 200 as

input data and 50 samples as target. Besides, the model was trained for 10 epochs. The

update of the weights of the network was performed using the adam optimizer. Finally, the

loss was calculated by using the mean absolute error.

6.3 Results

The following two sections present the results of the workload prediction for both scenarios,

the univariate workload prediction and the multivariate workload prediction.

6.3.1 Univariate results

As stated, the model was trained for 10 epochs. The final loss on the validation set was of

0.2347. Considering that the data was influenced by the gaussian noise the final result was

satisfactory.

The model was able to learn the normal memory workload in “node 3”. Figure 6.3

presents an example of a prediction performed by the model. It shows in blue the historical

memory usage in “node 3” utilized to predict future samples. The red crosses represent

the true samples expected to be predicted by the model and the green dots stand for the

predicted samples by the model. It can be observed, that the model was able to predict

accurately the future memory usage in the node.

Figure 6.4 presents the prediction of the model when encountering an unusual high

peak of memory usage. In this case, it is possible to observe that there was a changing

pattern in the historical samples. The small increase and successive drop of memory usage

right before the time-step 0, was interpreted by the LSTM model as a pattern that caused

a high peak of memory usage in the future. Hence, the model was able to predict the high

peak in advance. It is possible to observe that the prediction for the high peak of memory

usage was not exact. However, the model correctly estimated that the level of memory

usage was about to rise rapidly to higher levels.

47

Page 60: Telemetry data for machine learning based scheduling

CHAPTER 6. TIME SERIES WORKLOAD ESTIMATION

Figure 6.3: Univariate memory usage prediction in Node 3

Figure 6.4: Univariate high memory usage peak prediction in Node 3

6.3.2 Multivariate results

Similarly to the univariate scenario the model was trained in 10 epochs. The final valida-

tion loss of the model was 0.2243.

48

Page 61: Telemetry data for machine learning based scheduling

CHAPTER 6. TIME SERIES WORKLOAD ESTIMATION

In this scenario the memory usage of the node 3 depended on the memory usage of

“node 2” and “3”. Therefore, the historical training data was taken from both “nodes 2”

and “3” to predict the future workload of “node 3”.

The Figure 6.5 presents the past memory usage of “node 2” and “3” in blue dots, and

in red and green indicators, the true future memory usage and predicted memory usage

respectively. It is possible to observe that the model learnt the specific patterns of the

previous memory usage of “node 2” and “node 3” to predict that there was going to occur

a high peak of memory usage in “node 3”.

Figure 6.5: Multivariate high memory usage peak prediction in Node 3

On the other hand, Figure 6.6 shows that when the memory usage pattern in “node 2”

did not increase until a value of 2.5, like in the previous example, the model was able to

recognize that the memory usage of “node 3” was not going to increase to high memory

usage peak, hence, avoiding a false alarm.

49

Page 62: Telemetry data for machine learning based scheduling

CHAPTER 6. TIME SERIES WORKLOAD ESTIMATION

Figure 6.6: Multivariate average workload prediction in Node 3

50

Page 63: Telemetry data for machine learning based scheduling

CHAPTER 7. DISCUSSION

Chapter 7

Discussion

7.1 Affinity based scheduling

In the affinity based scheduling, during the evaluation of the results, it was possible to

observe some interesting behaviours that the DRL scheduler learnt in order to optimize

the average job duration. On top of everything, the scheduler learnt to prioritize allocating

each job to a node with matching affinity. In the beginning of the episode when both nodes

were empty, the scheduler usually chose an action involving allocating both jobs in order

to avoid waiting time in the buffer. Then, when the memory available in the nodes was

limited the scheduler was able to understand whether there was enough space or not in the

nodes to fit both jobs. Hence, in some cases the scheduler decided not to allocate the job

until there was enough free space in the node with matching affinity, instead of directly

allocating the job to a node with different affinity.

7.2 Distance based scheduling

In the distance based scheduling, the improvement and difference of performance between

both schedulers was very significant. It was proved that DRL based scheduling was able

to implicitly learn the location of the nodes to schedule jobs from the same application in

the closest nodes, thus, reducing their execution time. It is worth highlighting that this

experiment resulted to be more complex, since the DRL scheduler needed more iterations

to achieve the optimal scheduling policy.

Once the DRL was trained, we were able to observe some interesting patterns. For

instance, the scheduler decided to allocate the initial jobs in the nodes that were closer in

51

Page 64: Telemetry data for machine learning based scheduling

CHAPTER 7. DISCUSSION

average to the rest of the nodes, that is, to “node 3” and to “node 4”. Therefore, the next

jobs had more chances to be located to a closer node. Besides, when jobs from the same

application were not arriving consecutively, the DRL scheduler was always leaving a free

node next to the node that was occupied by an unmatched job. For instance, if “node 1”

was containing a job from “application 2”, and the scheduler was aware that after a few

time-steps another job from “application 2” was arriving, it never scheduled another job to

“node 2”, reserving that node for the other arriving job of “application 2”.

The performance obtained on the test jobset was not as good as the one obtained with

the training jobsets. Nonetheless, it was still far better than the one obtained by the LB

scheduler. It can be interpreted that the DRL scheduler was able to extrapolate some of the

strategies learnt with the train jobsets and apply them with the test jobset, nonetheless,

it was not able to perform the best scheduling actions for the specific test jobset. This

may be explained by the fact that in this second experiment, two jobsets were used in the

training phase, and perhaps with more training jobsets the model would have been able to

generalize better. However, when more than two jobsets were used in the training phase

the training process was exploding, leading to very low rewards due to a yet unknown

reason.

7.3 DRL based scheduling

The results of the DRL based scheduling experiments showed that the performance of

this technique outscales traditional schedulers for specific optimization scenarios. In other

words, for the scheduling task it was designed for, the DRL scheduler was to able to per-

form the best possible allocation decisions of jobs.

In the experiments carried out along this work, the optimization parameter was the

average job duration. Nonetheless, DRL based scheduling can be designed to optimize any

other parameters, such as, job delay, workload balance, etc. Besides, one of the main ad-

vantages of DRL based scheduling is the ability to learn from experience without previous

knowledge. In the affinity experiment, the DRL based scheduler was able to learn which

were the most suitable nodes for every type of job without prior knowledge from the clus-

ter. Thus, the DRL technique was able to infer the type of node by observing the received

rewards after each iteration, and as a consequence, it converged to the ideal scheduling

policy for the specific scenario. Moreover, this technique also resulted effective to learn the

location of nodes within a cluster. In the location based experiment, the DRL scheduler

52

Page 65: Telemetry data for machine learning based scheduling

CHAPTER 7. DISCUSSION

was able to learn the distances between nodes with the goal of allocating jobs that needed

to communicate in nearby nodes. All in all, it can be concluded that DRL methods applied

in scheduling are useful to find the optimal scheduling policy in unknown scenarios, where

the information from the cluster is not fully known.

On the other side, DRL based scheduling has some drawbacks. As stated, DRL sched-

ulers are very effective for the specific scheduling task they are designed. Nonetheless, when

the scheduling task changes significantly they are not able to generalize since they have

learnt an specific policy for a singular scenario. On the other hand, traditional scheduling

heuristics are able to perform decently in a wide variety of scenarios. In addition, DRL

schedulers need to be trained prior to achieving the optimal performance, therefore, they

need a period of learning and adaption. Consequently, DRL based scheduling cannot be

directly used, opposed to traditional scheduling heuristics that can be applied to new sce-

narios without the need of a training phase.

In the training phase, the format of the input data for the scheduler was also tested.

Initially, the data was set to be continuous. For instance, the available memory in the nodes

was input to the scheduler as a continuous variable, that could take continuous values in

the range between 0 and the maximum memory capacity. Nonetheless, the performance of

the DRL scheduler, in specific, of the DNN was not satisfactory. Successively, the memory

request of the jobs was set to have a categorical value among a limited number of fixed

values, e.g, 250, 500, 750, etc. The rest of the variables from both experiments, such as

the application or the affinity preference, were also categorical. In that way, it was possible

to one-hot encode these parameters. DNN are able to learn more effectively from one-hot

encoded input variables. As a consequence, the performance of the scheduler improved.

Hence, it can be concluded that DRL based scheduling is suitable in scenarios where

the optimization objective is clearly defined. Besides, applications that are stable, where

the type of jobs and arrival rate of jobs are regular, benefit from DRL scheduling. In con-

trast, the DRL scheduling techniques implemented in this work are not appropriate for

applications with unpredictable types of jobs or changing job arrival rates.

53

Page 66: Telemetry data for machine learning based scheduling

CHAPTER 7. DISCUSSION

7.4 Workload estimation

In another way, the workload estimation experiment showed that techniques such as LSTM

can be used to forecast the workload in the nodes of a cluster. In the experiment the mem-

ory usage was predicted, nonetheless, this technique can be applied to predict the usage

of other resources, such as, the CPU utilization, the load of the network or even the tem-

perature of the devices in a cluster. The workload estimation can be used to avoid node

failures due to excessive usage of resources. Therefore, it represents a good opportunity as

a tool to be used in predictive maintenance systems.

The LSTM model presented in the experiment was able to predict very accurately the

high peaks of memory usage. However, in order to predict these high peaks of usage of

resources, it is needed to train the model on this actual patterns. In other words, if the

model had not seen a pattern indicating an overload in a node, it will fail to predict it.

Therefore, it is essential to count with some historical data of nodes failures to train this

type of models. On the other hand, statistical methods, such as autoregressive integrated

moving average, sacrifice prediction accuracy in order to provide a solid base to identify

unusual workload patterns without historical data of nodes overload.

7.5 Future work

DRL based scheduling and the workload estimation have potential to be combined. The

scheduling decisions taken by the DRL scheduler would be more effective if the escheduler

is provided with the future expected workload of the system. Thus, enabling the system

to take scheduling decisions with the goal of preventing node overloading. For instance, a

possibility would be to iteratively include as an input to the DRL scheduler, the workload

estimation of the system. Then, after the scheduling decisions would have been taken, the

future workload would be estimated with the new state of the system.

The scalability of DRL schedulers is a challenge to be overcome. Currently, as the per-

formed experiments showed, the input to the DRL scheduler, and hence, to the DNN, had

a fixed size. Differently, in realistic scenarios, the number of jobs to be scheduled simulta-

neously and the number of nodes is great and often changes, therefore, the DNN should

be scaled accordingly. Furthermore, it would be of interest to find a method to encode

inputs from different clusters with different sizes into a fixed representation. In that way,

it would be possible to design a DRL based scheduler that can operate in multiple scenarios.

54

Page 67: Telemetry data for machine learning based scheduling

CHAPTER 7. DISCUSSION

In addition, a robust DRL based scheduler designed to operate on multiple cluster architectures with different characteristics should go through an extensive training phase. In other words, the DRL scheduler should be trained under different scheduling circumstances for a sufficient number of training iterations.

As discussed, DRL scheduling represents a significant advance in the field of cluster management. Future lines of research may focus on building DRL based schedulers that are able to generalize across different scheduling scenarios, for example by training DRL schedulers with more complex reward functions that let them adapt to a wide variety of scheduling situations. It would also be worthwhile to consider multiple optimization parameters simultaneously in the objective function, such as workload balancing or job duration, as sketched below. Finally, DRL scheduling has so far only been tested in simulated environments; future work may focus on applying this technique in physical computing clusters to analyze its feasibility in realistic scenarios.
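
Regarding the multi-objective reward mentioned above, a weighted combination of such terms could, purely as an illustrative assumption (the weights and the two penalty terms are not taken from the experiments), look as follows:

import numpy as np

def combined_reward(job_durations, node_loads, w_duration=1.0, w_balance=0.5):
    """Hypothetical multi-objective reward for a DRL scheduler.

    job_durations: durations of the jobs currently in the system (to be minimized).
    node_loads: per-node utilization in [0, 1]; their imbalance is to be minimized.
    The weights are illustrative assumptions, not values used in this work.
    """
    duration_penalty = -w_duration * float(np.sum(job_durations))
    balance_penalty = -w_balance * float(np.std(node_loads))
    return duration_penalty + balance_penalty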


Chapter 8

Conclusion

This thesis aimed to determine whether using cluster telemetry data is beneficial in the task of scheduling and, moreover, whether cluster telemetry data could be used to train a scheduler based on machine learning techniques. Based on the results of the experiments performed, it is possible to conclude that using online telemetry data from the cluster significantly benefits the performance of the scheduler. Schedulers can use telemetry data, such as available resources or the distance between nodes, to make optimal scheduling decisions. Furthermore, it has been shown that telemetry data can be used to train a machine learning based scheduler. Specifically, DRL has proven to be a valuable tool in scenarios that involve taking sequential actions, such as the scheduling process within computing clusters. DRL based scheduling is thus capable of learning the optimal scheduling policy for a specific scheduling scenario with the goal of optimizing a performance parameter, such as job completion time. Moreover, DRL based scheduling has been able to operate in multi-node cluster environments, and it has proven effective in unknown scenarios where the characteristics of the cluster are not fully known, learning the optimal scheduling policy for each specific cluster configuration by interacting with the cluster during a training phase. Nonetheless, DRL scheduling is still a novel technique and has room for improvement and development. In particular, one of the main milestones to be achieved in the future is building a DRL scheduler that is able to generalize across different types of clusters, that is, a scheduler that does not need to be trained for each specific cluster and task.

On the other hand, the thesis aimed to assess whether telemetry data could be used to estimate the workload of a computing cluster and, furthermore, whether telemetry data could be used to design a failure model for a computing cluster. The results indicate that LSTM neural networks are able to successfully estimate the workload of the nodes within a computing cluster. Besides, they are able to learn, from previous workload, patterns that relate it to future node failures. Hence, it can be stated that telemetry data can be used to forecast the workload of a cluster and to predict possible failures.

Finally, a future path of research was established through the possible combination of workload estimation with scheduling systems. Using workload estimation as an input to scheduling systems may prove valuable for making better scheduling decisions.
