
Evaluation of a Flexible Task Scheduling Algorithm for Distributed Hard Real-Time Systems

JOHN A. STANKOVIC, MEMBER, IEEE, KRITHIVASAN RAMAMRITHAM, AND SHENGCHANG CHENG

Abstract — Most systems that are required to operate under severe real-time constraints assume that all tasks and their characteristics are known a priori. Scheduling of such tasks can be done statically. Further, scheduling algorithms operating under such conditions are usually limited to multiprocessor configurations. This paper presents a scheduling algorithm that works dynamically, on loosely coupled distributed systems, for tasks with hard real-time constraints, i.e., the tasks must meet their deadlines. It uses a scheduling component local to every node and a distributed scheduling scheme that is specifically suited to hard real-time constraints and other timing considerations. Periodic tasks, nonperiodic tasks, scheduling overheads, communication overheads due to scheduling, and preemption are all accounted for in the algorithm. Simulation studies are used to evaluate the performance of the algorithm.

Index Terms — Bidding, deadlines, distributed computing, estimation techniques, focused addressing, real-time, scheduling, simulation studies.

I. INTRODUCTION

MANY tasks performed in systems such as those found in nuclear power plants and process control [13] are inherently distributed and have severe real-time constraints. These tasks have execution deadlines that must be met and are thus said to have hard real-time constraints. With today's advances in software, hardware, and communication technology for distributed systems, it may be possible to deal with distributed real-time systems in a more flexible manner than in the past. One of the challenges in designing such systems lies in the scheduling of tasks such that the tasks meet their deadlines. Most current research on scheduling tasks with hard real-time constraints is restricted to multiprocessing systems and hence is inappropriate for distributed systems. In addition, many of the proposed algorithms assume that all tasks and their characteristics are known in advance and hence are designed for static scheduling. Our research is directed at developing task scheduling software for loosely coupled systems with the goal of achieving flexibility through the dynamic scheduling of tasks in a distributed and adaptive manner.

Manuscript received May 1, 1985; revised August 14, 1985. This work was supported in part by the U.S. Army CECOM, CENCOM under Grant DAAB07-82-K-J015 and by the U.S. Office of Naval Research under Grant 048-716/3-22-85.

J. A. Stankovic is with the Department of Computer Science, Carnegie-Mellon University, Pittsburgh, PA 15213, on leave from the Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, MA 01002.

K. Ramamritham is with the Department of Computer and Information Science, University of Massachusetts, Amherst, MA 01002.

S. Cheng is with the Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, MA 01002.

Our scheme for distributed dynamic scheduling requires the presence of one scheduler per node [12]. These schedulers interact in order to determine where a newly arriving task could be scheduled. Associated with a node in a distributed system is a set, possibly null, of periodic tasks which are guaranteed to execute on that node. We assume that the characteristics of periodic tasks are known in advance and that such tasks must meet their deadlines. At system initialization time it is verified that there is enough processing power at a node for all periodic tasks to meet their deadlines. In addition to the periodic tasks, we allow for the arrival of nonperiodic tasks at any node at any time and attempt to guarantee these tasks dynamically, in the presence of periodic tasks and on a network-wide basis. The goal of our scheduling algorithm is to guarantee all periodic tasks and as many of the nonperiodic tasks as possible, utilizing the resources of the entire network.

We have developed a) a locally executed guarantee algorithm for periodic and nonperiodic tasks, which determines whether or not a task can be guaranteed to meet its real-time requirements, b) a distributed scheduling algorithm suited to real-time constraints and composed of a focused addressing scheme and a bidding scheme, c) the criteria for preempting an executing task so that guaranteed tasks remain guaranteed, and d) schemes for including different types of overheads, such as scheduling overhead and communication overheads due to scheduling.

This paper is organized as follows. Section II outlines some of the current work in scheduling tasks in hard real-time systems. Section III discusses the nature of real-time tasks as well as the constraints attached to the execution of these tasks. The structure of the scheduler on a node is presented in Section IV, while the distributed scheduling algorithm is the subject of Section V. Section VI discusses the techniques used to estimate various parameters of the algorithm. The results of extensive simulation studies of the algorithm are reported in Section VII. The simulation results of a simple bidding algorithm are presented first. This is followed by results of a scheme that combines bidding and focused addressing. Section VIII discusses several extensions to the basic scheduling strategy that are currently underway. Section IX summarizes the performance results as well as the significant features of our approach to distributed task scheduling.

II. BACKGROUND

Most research on scheduling tasks with hard real-time constraints is restricted to uniprocessor and multiprocessor systems. For example, Garey and Johnson [2] describe an algorithm to determine if a two-processor schedule exists so that all tasks are completed in time, given a set of tasks, their deadlines, and the precedence constraints of all tasks; Liu and Layland [9] derive necessary and sufficient conditions for scheduling periodic tasks, with preemption permitted. Their results, which hold for uniprocessor systems, were subsequently extended to include arbitrary task sets [1] and precedence constraints [4]. Teixeira [15] develops a model that considers priority scheduling. It addresses external interrupts, scheduling overhead, and preemption. Johnson and Madison [5] develop a measure of free time, similar to our notion of surplus processing power, to determine whether new jobs can be permitted to execute on multiprocessor systems. These schemes are all quite inflexible, in that they do not adapt to the changing state of the system, and are restricted to uniprocessor and multiprocessor systems. We are attempting to develop more flexible dynamic scheduling techniques for loosely coupled networks.

Muntz and Coffman [11] have developed an efficient algorithm to determine minimal length preemptive schedules for tree-structured computations in multiprocessor systems. Similarly, Leinbaugh has developed analysis algorithms which, when given the device and resource requirements of each task and the cost of performing system functions, determine an upper bound on the response time of each task. His initial results for multiprocessor systems [7] have been extended to distributed systems [8]. Resource requirements and some operating system overheads are accounted for in these papers, but periodic tasks are not. While these approaches are useful at system design time to statically determine the upper bounds on response times, they cannot be used for on-line scheduling, although it may be possible to extend the approaches for that purpose.

Multiprocessor scheduling in a hard real-time environment has been modeled by Mok and Dertouzos [10] as a scheduling game in which tokens are moved on a coordinate system defined by laxity and computation time. They have obtained the following results, which we use in our approach.

1) In the case of a single processor, both earliest deadline scheduling (scheduling the task with the earliest deadline) and least laxity scheduling (scheduling the task with the least difference between its deadline and computation time) are optimal.

2) In the multiprocessor case, neither is optimal.

3) For two or more processors, no scheduling algorithm can be optimal without a priori knowledge of i) deadlines, ii) computation times, and iii) start times of the tasks.

Finally, it is known that optimal scheduling in a multiprocessing environment is an NP-hard problem and hence is computationally intractable [3]. The loosely coupled nature of distributed systems makes the problem even harder. Clearly then, a practical scheduling algorithm has to be based on heuristics in order to reduce scheduling costs and has to be adaptive. This is the context in which we have been studying the problem of scheduling in distributed systems.

III. NATURE OF TASKS

Our research involves dynamic scheduling of tasks, with real-time constraints, on processors in a network. A task is characterized by its start time, computation time, deadline, and, possibly, period. Tasks may be periodic or nonperiodic. A nonperiodic task is one that occurs in the system just once and at unpredictable times. Upon arrival it is characterized by its deadline and its computation time. Such a task can be scheduled any time after its arrival. A periodic task, say, with period P, is one that has to be executed exactly once every P time units, i.e., there should be one execution of the task every P units. We do not associate any other semantics with periodic tasks (such as, P time units should elapse between two consecutive executions of a periodic task). In a real-time system there could be a number of such periodic tasks, with different periods. Many real-time systems have a fixed set of periodic and nonperiodic tasks and allow only occasional reconfiguration of the task set at high cost in time, effort, and money. In our system model, we also assume that there is an initial base set of periodic tasks for each node in the system, and their execution must be ensured. Although we do not discuss it in this paper, it is possible to dynamically change the set of periodic tasks that execute on a node. The initial assignment of periodic tasks to nodes is assumed known, although our simulation model can also be used as a tool in choosing a good initial assignment of periodic tasks to nodes in a network.

From a scheduler's point of view, a periodic task represents tasks with known (future) start times and deadlines, whereas nonperiodic tasks may arrive at any time, can be started anytime after their arrival, and may have arbitrary deadlines. In both cases, we assume that a task's computation time is known a priori. The discussion in this paper also assumes that tasks are independent of each other.

IV. THE STRUCTURE AND FUNCTIONING OF THE SCHEDULER ON A NODE

Each node in the distributed system has a scheduler local to that node. A set (possibly null) of guaranteed periodic tasks exists at each node. Nonperiodic tasks may arrive at any node in the network. When a new task arrives, the scheduler on that node checks whether the task can be scheduled at that node so as to finish before its deadline; if so, the task is guaranteed. Otherwise, the scheduler on the node interacts with the schedulers on other nodes, using a scheme that combines bidding and focused addressing, in order to determine the node to which the task can be sent to be scheduled. Upon arrival at that node, another attempt is made to schedule the task there. Eventually, the task either gets guaranteed and executed, or is not guaranteed.

Underlying our scheduling algorithm is the notion of guaranteeing a task. A task is said to be guaranteed if under all circumstances it will be scheduled to meet its real-time requirements. Thus, once a task has been guaranteed, all that is known is that the execution of the task will be completed before its deadline; exactly when it will be scheduled is dependent on the scheduling policy, the tasks that have been guaranteed but are waiting to be executed, the nature of periodic tasks, new arrivals, and scheduling overheads.

While periodic tasks are guaranteed tasks, nonperiodic tasks, after they arrive, may or may not be guaranteed. However, once guaranteed, they will definitely meet their deadlines.

Fig. 1 shows how the various modules, namely, the local scheduler, the dispatcher, and the bidder, as well as the guarantee routine, that make up the scheduler on a node interact with each other. Their functions are discussed next.

[Fig. 1. Structure of the scheduler on a node. Inputs are local tasks and tasks from other nodes; requests for bids, bids, and tasks to other nodes flow to and from the network.]

A. Bidder and Local Scheduler Tasks

Tasks may arrive directly at a node or as a result of the interaction between schedulers on different nodes. New tasks arriving at a node are handled by the local scheduler task and may or may not cause preemption. Conditions under which preemption is permissible are derived below.

The local scheduler first calls the guarantee routine in order to guarantee the new task. If it is guaranteed, then it is stored in the ready queue. Guaranteed tasks are dispatched earliest-deadline-first.

If the task is not guaranteed locally, the task is handled by the bidder task. The bidder task is the component that is involved in the distributed aspect of task scheduling. It is the subject of the next section.

B. The Dispatcher Task

It is the dispatcher task that determines which of the guaranteed periodic and nonperiodic tasks is to be executed next. As mentioned earlier, for a uniprocessor, both the earliest deadline algorithm and the least laxity algorithm are optimal. In our scheme, guaranteed tasks are executed according to the earliest-deadline-first scheme. It should be mentioned that our use of the earliest deadline algorithm for scheduling tasks on a single node does not guarantee an optimal schedule on the network as a whole. Our aim here is to guarantee tasks quickly and to reduce overheads. The simulation studies are an attempt to analyze the algorithm's behavior given the use of the earliest-deadline-first algorithm for scheduling on a node.

The dispatcher's actions are simple: whenever a task completes, the dispatcher is invoked, and it selects the task with the earliest deadline for execution. To expedite this selection, the list of guaranteed tasks is ordered according to the deadlines of the tasks. The run-time cost of the dispatcher is part of the computation time of every task.

For the purpose of scheduling, information on periodic tasks, such as their periods and computation times, is maintained in a data structure called the periodic task table (PTT). During the operation of the system the guarantee algorithm uses the PTT in conjunction with the concept of surplus to ascertain whether nonperiodic tasks can be guaranteed. Surplus is derived from information in the system task table (STT). Both the STT and the surplus are described below.

C. The System Task Table

Each node maintains an STT for all local periodic and nonperiodic tasks guaranteed at any point in time. In the STT, there is one entry per task which contains the task's arrival time, its latest start time, deadline, and computation time. All but the latest start time are inputs. Entries for tasks that have already arrived are ordered according to their deadlines; the rest are ordered by their arrival times and within each arrival time by deadlines. Note that tasks may arrive from various sources, and different tasks may have the same arrival time.

To compute the latest start time of a task with deadline D, all guaranteed tasks with deadlines greater than or equal to D are ordered according to decreasing deadlines. The latest start time is determined by assuming that tasks are scheduled to execute just in time to meet their deadlines. For example, if the first task on the list has deadline D1 and computation time C1, it has a latest start time of D1 - C1. Suppose the second task on the list has a computation time C2 and deadline D2. If D2 is greater than D1 - C1, then the second task has a latest start time of D1 - C1 - C2; otherwise, its latest start time is D2 - C2. In this manner, latest start times are calculated for every guaranteed task.
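To make the just-in-time sweep concrete, the following is a minimal Python sketch of the latest-start-time calculation described above; the task representation and function name are ours, not the paper's.

    def latest_start_times(tasks):
        # tasks: list of (deadline, computation_time) pairs for the
        # guaranteed tasks with deadlines >= D, per the text.
        # Sweep from the latest deadline backwards, packing each task
        # as late as possible without missing its deadline.
        result = []
        finish = float("inf")  # latest finish allowed by later tasks
        for deadline, comp in sorted(tasks, key=lambda t: -t[0]):
            finish = min(deadline, finish)  # must also meet own deadline
            start = finish - comp           # latest start time of this task
            result.append((deadline, start))
            finish = start                  # earlier tasks must finish by here
        return result

    # Example from the text: the first task gets D1 - C1; the second gets
    # min(D2, D1 - C1) - C2.
    print(latest_start_times([(100, 10), (95, 20)]))  # [(100, 90), (95, 70)]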

D. Surplus

Clearly, a newly arriving task can be guaranteed to execute at a node only if the surplus processing power at that node, between when the task arrives and its deadline, is greater than the computation time requirement for the task. Thus, we are interested in surplus with respect to the task about to be guaranteed or rejected. Surplus, then, is defined as the amount of computation time available on a node between the time of arrival of the new unguaranteed task and its deadline.

While surplus is not explicitly calculated for guaranteeing local tasks, surplus information is implicitly taken into account during the computation of the latest start time for such tasks: only if the latest start time is greater than the arrival time of the task is the task guaranteed. Surplus is computed explicitly while responding to a request for bid (see Section V).

E. The Initialization Task

Before the guarantee routine examines newly arriving tasks, its data structures have to be initialized. This is the function of the initialization task, which is executed just once, at system initialization time. Since a periodic task is known to represent regularly occurring tasks with future start times, at system initialization time it has to be determined if there is enough processing power to execute a given set of periodic tasks on a node. We define L to be the least common multiple of the periods P1, ..., PN of all periodic tasks 1, ..., N assigned to a node. The tasks have computation times C1, ..., CN. A necessary condition for the periodic tasks in L to be guaranteed on that node is

    Sum over i of Ci * (L / Pi) <= L;

that is, the sum of the computation times of all periodic task instances that have to be executed within L is less than or equal to L. After this necessary condition is verified, the latest start time for every task instance in an interval L is calculated using the scheme described previously. If all the tasks have latest start times no earlier than their arrival times, they are guaranteed. The calculated latest start times, computation times, and deadlines are then stored in the PTT for each task instance in the window L.
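A minimal Python sketch of this initialization check (Python 3.9+ for math.lcm); the task representation is illustrative, and the full test would also verify per-instance latest start times as described above.

    from math import lcm

    def periodic_tasks_feasible(tasks):
        # tasks: list of (period, computation_time) pairs.
        # Necessary condition: sum of Ci * (L / Pi) <= L.
        L = lcm(*(p for p, _ in tasks))
        demand = sum(c * (L // p) for p, c in tasks)
        return demand <= L

    print(periodic_tasks_feasible([(4, 1), (6, 2), (12, 3)]))  # True: 10 <= 12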

F. The Guarantee Routine

The guarantee routine local to a node is invoked to determine if there is enough surplus processing power to execute a newly arriving task before its deadline. A task can be guaranteed only after ascertaining that guaranteeing the task jeopardizes neither previously guaranteed nonperiodic tasks nor periodic tasks with future start times. If a newly arriving task cannot be guaranteed locally, the task becomes a candidate for bidding and/or focused addressing (see Section V). The guarantee routine (see Fig. 2) uses information in the PTT and the STT to guarantee a newly arriving task. Recall that each entry in the STT contains an arrival time, a latest start time, a deadline, and a computation time. Note that the guarantee routine is coded assuming that before it is called the STT has been updated to reflect the current state of the node.

G. Consideration of Time Overheads in Scheduling

One of the prime motivations for the above separation of scheduling activity among various scheduling tasks is to take into account the time spent on scheduling. This is important in hard real-time systems.

Since the dispatcher has to be invoked each time any task, including the local scheduler task and the bidder task, completes execution and relinquishes the CPU, we require that the time taken for the dispatcher to execute be included in the computation time of every task.

It is essential that newly arriving nonperiodic tasks be examined soon after they arrive. But interrupting a running task to guarantee a newly arriving task might result in the running task missing its deadline. The following scheme is utilized in order to solve this problem: after the dispatcher chooses the next task to run, using the STT, it checks if there is sufficient surplus such that running the bidder task or the local scheduler task, after preempting the newly dispatched task, does not result in guaranteed tasks missing their deadlines. If the above is true for the bidder (local scheduler) task, then the dispatcher sets the invoke bidder bit (invoke local scheduler bit). If a message arrives from another node that requires the attention of the bidder task and the invoke bidder bit is set, then the currently running task is preempted and the bidder task is executed. If, instead, a task arrives locally and the invoke local scheduler bit is set, then the currently running task is preempted and the local scheduler task is executed.

Begin
  Let the new task be a 4-tuple, (A,S,D,C), where
    A = arrival time, S = latest start time, D = deadline, C = computation time;
  Let lastD be the latest deadline of the last window in the current STT;
  Do while (lastD < D)
    Append a copy of PTT to the end of STT;
    Add lastD to all the clock times of tasks in the appended copy of PTT;
    Update lastD;
  End Do;
  Find the current entry, i, in STT, such that (A < Ai) or (A = Ai and D < Di);
  Find the latest start time, S, for the new task taking into account
    tasks from the current entry i to the end of the STT;
  If (S < current time) or (S < A)
    then return ('not guaranteed');
    else Begin
      Temporarily consider (A,S,D,C) as inserted prior
        to the entry i and to be the new current entry;
      For each entry j from the current entry backwards to the first entry
      Do
        Calculate new latest start time, Sj;
        If (Sj < current time) or (Sj < Aj)
          then return ('not guaranteed');
          else temporarily save Sj;
        End If;
      End Do;
      /* New task can be guaranteed */
      If the task arrived locally
        then Begin
          Commit the insertion of (A,S,D,C) and all the new start times
            Sj calculated in the above loop;
          Return ('guaranteed');
        End
        else Return idle CPU time between A and D;
      End If;
    End;
  End If;
End;

Fig. 2. Outline of the guarantee routine.

Even if a running task is not preemptible, thereby preventing a new task from being guaranteed soon after its arrival, the newly arriving task should be examined without undue delay. To facilitate this, both the bidder and the local scheduler tasks are executed as periodic tasks. The period and computation time of these tasks can be determined by the nature of tasks, for instance, their laxity and frequency of arrival, as well as by the nature of communication from other nodes in the system, for instance, the frequency of requests for bids. Currently, these parameters are initially determined based on estimated information about the statistical characteristics of the system, but our algorithms can be extended to adapt to changes in the network state.

The above scheme is based on the assumption that there is a communication module, executing on a processor which is separate from the CPU on which tasks are scheduled, that is responsible for receiving communication from local sources as well as from other nodes. Based on the type of communication, this module stores received information in the appropriate data structures so that it will be looked at when the different tasks execute.

V. DISTRIBUTED TASK SCHEDULING

The scheduler with a task that needs to be scheduled, but which cannot be scheduled on that node itself, interacts with the schedulers on other nodes in an attempt to find a node that has sufficient surplus to guarantee the task. This interaction is based on a scheme that combines focused addressing and bidding [14]. The algorithm is outlined in Fig. 3 and is explained below.

A. The Focused Addressing Scheme

Focused addressing utilizes network-wide surplus information to reduce the overheads incurred by bidding in determining a good node to send a task to, and works as follows. A node, before sending requests for bids (RFB's), uses the surplus information about the other nodes in the network to determine if a particular node has a surplus which is significantly greater than the computation time of the new task that cannot be guaranteed locally. Such a node has a high probability of guaranteeing this new task, and hence the new task can be sent directly to that node.

Determination of the suitability of a node for focused addressing is carried out in the following manner. Estimate the time ART when the task will arrive at the selected node. If the estimated surplus of that node, between ART and the deadline D of the task, is at least FP times the computation time C of the task, then the task is sent to that node. (The computation of ART and of the surplus between ART and D will be described subsequently, when we present the bidding scheme. FP is an adaptive parameter used in focused addressing.) If a number of nodes satisfy this requirement, one of them is chosen randomly. The chosen node, referred to as the focused node, uses the guarantee routine to check if the arriving task can be guaranteed there. If there is no focused node, then the bidding scheme is invoked. Also, should the focused node fail to guarantee the task, and if time considerations permit, the bidding scheme can be invoked.

Rather than invoking bidding only when use of focused addressing fails to guarantee the task, we invoke bidding while communication with the focused node is in progress. This should increase the probability of tasks being guaranteed. In this scheme, when a node sends a task to a focused node, it also sends RFB messages to other nodes with an indication that bids should be returned to the designated focused node. If the focused node is unable to guarantee the transferred task, it chooses a node to send the task to based on the bids sent by the bidding nodes.

To facilitate this approach to distributed scheduling, every node has to keep track of the surplus of other nodes. Towards this end, the local scheduler task on every node, when invoked periodically, sends to other nodes special messages containing the percentage of free time during the next window. (The window is a system parameter used in the estimation of scheduling delays, surplus, etc.; see Section VI.) In addition, surplus information is piggybacked on messages that are exchanged in the bidding process. It should be pointed out that, given the delays involved in message transmission, such surplus information will be outdated by the time it is received. Hence, our algorithm utilizes this information only to estimate the surplus of nodes.

Fig. 3 provides an outline of the distributed scheduling algorithm that combines bidding and focused addressing. Now we examine the details of the bidding scheme.

The Focused Addressing Scheme:

/* The following code is executed by the local scheduler task on a node
   whenever a local task cannot be guaranteed. */

Let a task to be sent out be a 3-tuple, (C,D,Size), where
  C = computation time, D = deadline, Size = size of task in packets;
Let node_surplus[1..n] be the information available on this node about
  the percentage of free time on nodes 1 through n;
Let FP be the adaptive parameter used in focused addressing;

Begin
  For each remote site i
  Do
    Estimate ART, the arrival time of the task at node i,
      should the task be sent to node i;
    If node_surplus[i] * (D - ART) > FP * C
      then mark node i as a candidate for a focused node;
  End Do;
  If there is any candidate for the focused node
    then begin
      Randomly choose one of the candidates to be the focused node;
      Let the chosen focused node be node j;
      Send the task to node j;
      Store task in bidder's queue of tasks for which Request For Bids
        have to be sent out (indicate that bids should be returned
        to the focused node j);
    End
    else
      Store task in bidder's queue of tasks for which Request For Bids
        have to be sent out (indicate that bids should be returned
        to the requesting node);
  End If;
End;

Fig. 3. Outline of the distributed scheduling algorithm.
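The focused-node test above is straightforward to render in executable form. Here is a Python sketch; the surplus table, arrival-time estimates, and FP value are illustrative placeholders.

    import random

    def pick_focused_node(task_c, task_d, node_surplus, est_arrival, fp=2.0):
        # node_surplus: node -> advertised fraction of free time (0..1).
        # est_arrival: node -> estimated arrival time of the task there.
        # A node qualifies if its estimated free CPU time before the
        # deadline exceeds FP times the task's computation time.
        candidates = [n for n, s in node_surplus.items()
                      if s * (task_d - est_arrival[n]) > fp * task_c]
        return random.choice(candidates) if candidates else None

    node = pick_focused_node(task_c=5, task_d=100,
                             node_surplus={"n1": 0.1, "n2": 0.6},
                             est_arrival={"n1": 40, "n2": 45})
    print(node)  # "n2": 0.6 * 55 = 33 > 10; "n1": 0.1 * 60 = 6 fails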

B. The Bidding Scheme

Nodes making bids do not reserve resources needed to execute the task for which they are bidding. When a task arrives at a node as the result of its bid being accepted, the task is handled as though it arrived locally. It is possible that the node is unable to guarantee the task since the node's surplus has changed since it sent the bid. This can happen due to local task arrivals and also due to the arrival of tasks as a result of previous bids. One solution to this problem is for nodes to reserve resources, specifically CPU time slots, for the task for which they are bidding. We have not adopted this solution due to the poor resource utilization that it is likely to entail: a node may bid for more tasks than it will be awarded; also, there may be multiple bids for a task. However, as will be explained shortly, in our algorithm a node making a bid takes into account its previous bids for which responses are yet to be received.

A number of factors that affect bidding require estimations, for example, communication delays, CPU requirements of future tasks, etc. In this section we describe the bidding approach, assuming that the needed estimates are available. In the next section we show how these estimations can be made.

We now describe the different phases of the bidding process in detail. The main functions of the bidder component on a node are: sending out requests for bids for tasks that cannot be guaranteed locally, responding to requests for bids from other nodes, evaluating bids, and responding to task awards.

The Bidding Scheme:

/* The bidding algorithm has four phases:
   The request-for-bids phase is executed by the bidder task on
     the node that has the task which cannot be guaranteed locally;
   The bidding phase is executed by responding nodes;
   The bid-processing phase is executed by the bidder task on
     the node to which bids are supposed to be sent;
   The respond-to-task-award phase is executed by the best bidder node;
   Terms used in the following code are explained in the text. */

Request for Bids:

The requestor has a task queue in which tasks for which Request For Bids (RFB's) have to be sent are kept;
Let each task in the queue be a 3-tuple, (C,D,Size).

Begin
  For each task in the bidder's task queue
  Do
    Estimate ER, the earliest possible response time;
    Estimate DR, the deadline for response;
    If ER > DR
      then Return; /* There is insufficient time to invoke bidding. */
      else Begin
        Calculate the time at which to begin bid evaluation;
        Send Request For Bid messages to other nodes;
          /* Each message contains C, D, Size, and the time for bid evaluation */
        Store the task in the wait_for_bid queue;
        Allocate an empty incoming_bid_queue for the task;
      End;
    End If;
  End Do;
End;

Bidding:

Let each incoming Request For Bid message contain (C, D, Size, Time of request, DR).

Begin
  For each incoming Request For Bid message
  Do
    If Current time + EST(comm_delay_per_message) < DR
      then begin
        Estimate ART, the arrival time of the corresponding task;
        Estimate SARTD, the local surplus between ART and D;
        If SARTD > C
          then send a bid back to the requestor node;
            /* A bid contains ART, SARTD, and EST(LS_wait). */
        End If;
      End;
    End If;
  End Do;
End;

Fig. 3. (Continued.)

C. Request for Bids

For a task that cannot be guaranteed locally, a decision is made as to whether to transmit an RFB. This decision is based on calculating an earliest possible response time (ER) and a deadline for response (DR). ER takes into account the fact that an RFB is handled by a remote node's bidder task and that a two-way communication is involved with the bidder. Hence,

    ER = Current time + Comp_time_bidder + (2 * EST(Comm_delay_per_message))

where Comp_time_bidder = computation time of the bidder, and EST(Comm_delay_per_message) = average communication delay for a control message such as an RFB or a bid.

Bid-processing:

Let each task in the wait_for_bid queue be a 5-tuple, (C, D, Size, Time for bid evaluation, Number_of_returned_bids);

Begin
  For each task in the wait_for_bid queue
  Do
    If Number_of_returned_bids >= Required_minimum_number_of_bids
       or Current time >= Time for bid evaluation
      then
        For each bid in the incoming_bid_queue for the task
        Do
          Estimate ETA, the arrival time of the task at the bidding node;
          Estimate SETAD, the surplus of the bidding node between ETA and D;
          Keep track of the largest and second largest SETAD's as well as
            the corresponding bidders;
        End Do;
        If largest SETAD > C
          then Send the task and the identity of the bidder with the
            second largest SETAD to the bidder with the largest SETAD;
          else If current time < DR
            then Leave the task in the wait_for_bid queue and
              wait for more bids;
            else /* The task cannot be guaranteed. */
              Reject the task and delete it from the queue;
          End If;
        End If;
    End If;
  End Do;
End;

Response to Task Award:

/* Along with the task, the identity of the second best bidder is also received. */

Begin
  For each task received from another node as a result of a previous bid
  Do
    Invoke guarantee routine to guarantee received task;
    If task is guaranteed
      then add task to Ready list
      else begin
        If current time + EST(comm_delay_for_the_task) + EST(LS_wait)
           + Computation time of task <= D
          then send task to second best bidder node;
          else reject the task;
        End If;
      End;
    End If;
  End Do;
End;

Fig. 3. (Continued.)

DR takes into account the fact that after DR elapses, there should be sufficient time for 1) the bidder task to evaluate the incoming bids and determine the best bidder, 2) the task to be sent to the best bidder node, 3) the task to be guaranteed by the local scheduler at the best bidder node, and 4) the task to be executed and completed before its deadline. Hence,

    DR = D - Comp_time_bidder - EST(Comm_delay_for_the_task) - Comp_time_local_scheduler - C

where EST(Comm_delay_for_the_task) = average communication delay for moving a task, and Comp_time_local_scheduler = computation time of the local scheduler task.

If ER is greater than or equal to DR, then there is no need to transmit an RFB. If ER is less than DR, then an RFB will be broadcast to all nodes. The RFB message itself contains the following information: D, C, size of the task, current time, and DR.
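A small Python sketch of the ER/DR test that gates RFB transmission, using the DR expression above (the final C term reflects requirement 4); all estimate values are illustrative placeholders.

    def should_send_rfb(now, deadline, comp_time,
                        comp_time_bidder, comp_time_local_sched,
                        est_msg_delay, est_task_delay):
        # ER: earliest a bid can return (remote bidder plus two-way
        # control messages).  DR: latest a bid may arrive and still leave
        # time to evaluate bids, ship the task, guarantee it, and run it.
        er = now + comp_time_bidder + 2 * est_msg_delay
        dr = (deadline - comp_time_bidder - est_task_delay
              - comp_time_local_sched - comp_time)
        return er < dr, er, dr

    print(should_send_rfb(now=0, deadline=100, comp_time=20,
                          comp_time_bidder=2, comp_time_local_sched=1,
                          est_msg_delay=5, est_task_delay=15))
    # (True, 12, 62): bidding is worth invoking.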

One way to reduce the communication overheads due to bidding is to send RFB's only to those nodes which have a high probability of responding to the request. Such nodes can be identified by using the surplus information about other nodes.

Before the RFB is actually sent, the algorithm calculates the time at which to begin bid evaluation, Time_bid_eval, where

    Time_bid_eval = Current time + EST(Response_time_for_RFB's)

and EST(Response_time_for_RFB's) = estimated delay between transmission of an RFB and the arrival of a bid.

If Time_bid_eval is less than DR, then the requesting node waits until Time_bid_eval before evaluating bids. However, if Time_bid_eval is greater than or equal to DR, then we arbitrarily let Time_bid_eval = (ER + DR)/2, with the hope that at least one reply arrives in time. Information about the task as well as Time_bid_eval is placed in the wait_for_bid queue.
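The clamping rule for the bid-evaluation time, as a small illustrative Python helper:

    def bid_eval_time(now, est_rfb_response, er, dr):
        # Start evaluating when responses are expected; fall back to the
        # ER/DR midpoint if that estimate would overshoot DR (sketch).
        t = now + est_rfb_response
        return t if t < dr else (er + dr) / 2

    print(bid_eval_time(now=0, est_rfb_response=80, er=12, dr=62))  # 37.0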

D. Bidding in Response to RFB's

The bidder first estimates whether its response will reach the requester before the deadline for response, DR. It proceeds with further actions on the request for bid only if the time of response plus the communication delay for the response is less than the indicated deadline for response.

Once a node decides to respond, it first computes ART, the estimated arrival time for the task, in case the node is awarded the task. ART is one of the three components of the bid. Computation of ART is done as follows:

ART = Current time + EST(Comm_delay_per_message) + EST(Bid_Wait) + EST(Comm_delay_for_the_task) + EST(LS_Wait)

where

EST(Bid_Wait) = the estimated delay in processing a returned bid, and

EST(LS_Wait) = the estimated wait time experienced by a transferred task at its new node before it is either guaranteed or rejected.

The second component of the bid is the CPU time surplus at this node between the estimated arrival time, ART, and the task's deadline D. We call this SARTD. The surplus information takes into account the following.

1) Future instances of periodic and guaranteed nonperiodic tasks: this ensures that guaranteed tasks are not jeopardized.

2) Computation time needed for tasks that may arrive as a result of previous bids: this ensures that nodes requesting bids are aware of other bids by a node and hence minimizes the probability of a node being awarded tasks with conflicting requirements or being awarded too many tasks, creating an unstable situation.

3) Computation time needed for nonperiodic tasks that may arrive locally in the future: this minimizes the probability of a task arriving as a result of a bid not being guaranteed due to the arrival of a local task with similar real-time requirements.

4) Surplus resulting from tasks that do not execute to their worst case computation time.

While accurate information is available concerning 1), information needed for 2) and 3) is estimated based on the past behavior of the node.

More precisely, we compute SARTD as follows:

Let EST(CPU_time_local_between_ART_and_D) = estimated CPU time required by local tasks that execute on the node between ART and D, and let EST(CPU_time_bid_between_ART_and_D) = estimated CPU time required by tasks that arrived due to bidding and that execute on the node between ART and D. Then

    SARTD = ((D - ART) * (1 - Percent_periodic_tasks))
            - [(EST(CPU_time_local_between_ART_and_D)
                + EST(CPU_time_bid_between_ART_and_D))
              * EST(Task_execution_time_ratio)]

where

Percent_periodic_tasks = the percentage of CPU time required by periodic tasks between ART and D, and

EST(Task_execution_time_ratio) = the average value of the ratio (actual CPU time used by a task / worst case CPU time required by that task).

Finally, if SARTD is less than C, then no bid is made since the surplus is not sufficient. If SARTD is greater than or equal to C, then a bid is returned with the information ART, SARTD, and an estimation of how long a new task transferred to this node will have to wait before it is processed for a possible guarantee.
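Putting these pieces together, a hedged Python sketch of the bid-side surplus computation and bid decision; the parameter names stand in for the estimates defined above.

    def sartd(art, deadline, pct_periodic,
              est_cpu_local, est_cpu_bid, est_exec_ratio):
        # Free time in [ART, D] after periodic load, minus the estimated
        # demand of local and bid-acquired tasks, discounted by the
        # average actual/worst-case execution-time ratio.
        free = (deadline - art) * (1.0 - pct_periodic)
        committed = (est_cpu_local + est_cpu_bid) * est_exec_ratio
        return free - committed

    def make_bid(task_c, art, deadline, **estimates):
        s = sartd(art, deadline, **estimates)
        # Bid only if the estimated surplus covers the task's demand.
        return (art, s) if s >= task_c else None

    print(make_bid(task_c=10, art=20, deadline=100,
                   pct_periodic=0.5, est_cpu_local=12,
                   est_cpu_bid=8, est_exec_ratio=0.8))  # (20, 24.0)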

E. Bid Processing

Bid processing is carried out by the node that originally sent out the request for bids. A bid processor task waits for bids returned in response to an RFB until either 1) a required minimum number of bids is received (a tunable system parameter), or 2) Time_bid_eval is reached. Whether one or both of these conditions is used is specified by a tunable system parameter. As soon as condition 1) or 2) is met, and if any bids have been received, the evaluation of the received bids is started. For each bidding node, the algorithm computes ETA, the estimated time of arrival of the task at the bidder's node. For each bidder it estimates SETAD, the surplus between ETA and D, using the following formula:

    SETAD = SARTD * (D - ETA) / (D - ART).

If there is at least one bid whose SETAD is greater than or equal to C, then the bidder task chooses the one with the greatest SETAD as the best bid, and the task is sent to the node that sent that bid. If for all bids SETAD is less than C, then the bid processor task waits for more bids until DR. For each new bid, if SETAD is greater than C, then the task is immediately sent to the bidder node. The identity of the second best bidder, if any, is also communicated to the best bidder (see the next subsection).
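A sketch of this requester-side evaluation in Python, rescaling each reported surplus from [ART, D] to [ETA, D] per the formula above (the bid layout is illustrative):

    def evaluate_bids(task_c, deadline, bids):
        # bids: list of (node, art, sartd, eta) tuples.
        scored = sorted(
            ((sartd * (deadline - eta) / (deadline - art), node)
             for node, art, sartd, eta in bids),
            reverse=True)
        if scored and scored[0][0] >= task_c:
            best = scored[0][1]
            second = scored[1][1] if len(scored) > 1 else None
            return best, second  # award to best; carry second as backup
        return None, None        # keep waiting for bids until DR

    print(evaluate_bids(10, 100, [("n1", 20, 24, 30), ("n2", 25, 18, 40)]))
    # ('n1', 'n2'): the SETAD values are 21.0 and 14.4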

One final note about the information sent in bids: a node utilizes this information to keep track of the surplus of other nodes. This is used for sending RFB's to nodes with high surplus and in focused addressing. In response to an RFB for a task, a bidder sends the estimated arrival time and the estimated surplus between the arrival time and the task's deadline. If a node does not respond to an RFB, it is assumed that it does not have sufficient surplus. (In reality, a node may not have sent its bid because it estimated that its bid would not reach the requester before the deadline for response.) Information received via bids is bound to be fragmented, and hence a node, if needed, utilizes information about available free times sent periodically by other nodes.

F. Response to Task Award

Once a task is awarded to a node, the awardee node treats it as a task that has arrived locally and takes actions to guarantee it. If the task cannot be guaranteed, the node can request bids to determine if some other node has the surplus to guarantee it. However, given that the task was sent to the best bidder and that the task's deadline will be closer than before, the chances of there being another node with sufficient surplus are small. Hence, we made the decision to send not only the task but also the identity of the second best bidder to the best bidder: if the best bidder cannot guarantee the task, then, should time considerations permit, it sends the task to the second best bidder, if any. Otherwise, the task is rejected.

It will be the responsibility of the environment (that submitted the task) to take appropriate actions in the event that a task is not guaranteed. Such actions could be either to resubmit the task with a later deadline or to execute error recovery code. We believe that our decision to guarantee a task as soon as possible after it arrives, rather than waiting to see if somehow it meets its deadline, allows the environment more leeway in exception handling, should the task not be guaranteed.

If there are specifications concerning, for example, the percentage of tasks that should be guaranteed, then the system should be designed to meet them. It is in this regard that the simulation model that we describe in Section VII can be used as a tool, in that if it predicts that too low a percentage of tasks will be guaranteed, then more and/or faster processors are required for this system. Further, the simulation can be tailored to model a given system and can be used to determine if it is possible to meet the specifications, and if so, how the periodic tasks should be allocated to individual nodes.

VI. ESTIMATION TECHNIQUES

We now show how the estimates used in the previous section are made. Time is divided into time slots. A window is formed by five consecutive time slots. As time progresses, the window is moved, i.e., when the current time indicates the end of a time slot, the window is moved to occupy the most recent five time slots. Information, such as the delay in processing bids [used to compute EST(Bid_wait)], is gathered for each time slot. The cumulative information in the time slots forming a window is used to make the various estimations described below.

A. EST(Bid_Wait)

To compute this, the information gathered is the delay involved in processing bids. This is done in the following way. A returned bid is timestamped when it is stored in the incoming queue. When the bid is processed, the timestamp is subtracted from the current time to produce the wait time for this bid. Window_bid_wait is computed to be the average bid wait for bids received in the most recent window. The new value for EST(Bid_wait) is computed from its current value and the value of Window_bid_wait via exponential smoothing, i.e.,

New EST(Bid_wait) = (alpha * Window_bid_wait) + [(1 - alpha) * EST(Bid_wait)]

where alpha is a parameter used in the estimation and lies between 0 and 1.
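The update is standard single exponential smoothing; a tiny illustrative Python version, with alpha chosen arbitrarily:

    def smooth(current_est, window_avg, alpha=0.3):
        # Used for EST(Bid_wait), EST(LS_wait), and similar estimates.
        return alpha * window_avg + (1.0 - alpha) * current_est

    est = 10.0
    for window_bid_wait in [12.0, 9.0, 15.0]:  # per-window averages
        est = smooth(est, window_bid_wait)
    print(round(est, 2))  # 11.58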

B. EST(LS_wait)

To estimate how long a transferred task waits before it is processed (guaranteed or not) by a local scheduler task, EST(LS_wait), we timestamp tasks as they enter the wait queue at the destination node. When the local scheduler at the destination finishes processing the task, the timestamp is subtracted from the current time. This wait time is then averaged in with previous wait times in the same manner as described above for EST(Bid_wait), and estimation of the wait time in the future is done by single exponential smoothing.

C. EST(Comm_delay_per_message)

In our algorithm it is necessary to take communication delays into account during the bidding process. Communication delays depend on the characteristics of the pairs of processes involved, for instance, on the distance separating them and on the communication from other nodes in the system to the two nodes. We estimate the time in the following way: every communication is timestamped by the sending node. The receiving node can then compute the delay due to that communication by subtracting the timestamp from the time of receipt. Subsequent communication delays are estimated based on a linear relationship between message length and the previous communication delay per unit length. By utilizing the latest known delay, this computation can adapt to changing system loads. Also, by knowing the length of a task, EST(Comm_delay_for_the_task) can be determined.
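A sketch of the per-message delay estimator, assuming delay scales linearly with message length in packets (the class and method names are ours):

    class CommDelayEstimator:
        def __init__(self):
            self.delay_per_packet = None  # latest observed per-packet delay

        def observe(self, send_ts, recv_ts, length_packets):
            # Receiver subtracts the sender's timestamp from time of receipt.
            self.delay_per_packet = (recv_ts - send_ts) / length_packets

        def estimate(self, length_packets):
            # Linear model: delay grows with message length, so a task of
            # known length yields EST(Comm_delay_for_the_task).
            return self.delay_per_packet * length_packets

    est = CommDelayEstimator()
    est.observe(send_ts=100, recv_ts=106, length_packets=3)
    print(est.estimate(length_packets=5))  # 10.0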


D. EST(Response_time_for_RFB's)

Let T_receive denote the time when a returned bid is placed in a queue awaiting processing, let T_send be the time when the RFB for this bid was sent, and let Turnaround_time = T_receive - T_send. EST(Response_time_for_RFB's) is the average Turnaround_time in a window. Within a time slot we consider only those bids from nodes which have returned good bids with T_receive lying within the time slot (i.e., the surplus of the bid is greater than C when it is evaluated at T_receive, whereby delay in receiving the bid is accounted for). The average turnaround time is computed over the past five time slots. Estimation is done via exponential smoothing.

E. EST(CPU_time_local_between_ART_and_D) and EST(CPU_time_bid_between_ART_and_D)

As seen in the last section, these are used to estimate the surplus between the estimated arrival time of a task, ART, and its deadline D. To do this, each node maintains the following information:

CGLT = CPU time required by guaranteed local tasks;
CGBT = CPU time required by tasks acquired by bidding and guaranteed;
CFBT = CPU time required by tasks for which bids were sent out and that may arrive in the future.

This information is maintained in an array of time slots. When a task is guaranteed, or a bid is sent, the CPU time of that task is proportionally divided among all the time slots that lie between its ART and D so that it evenly affects all the time slots which it overlaps. If it is a guaranteed task, then ART is the current time.

A node also maintains WCGLT, which is the sum of the CGLT's in the previous five time slots; WCGBT, which is the sum of the CGBT's in the previous five time slots; and WCFBT, which is the sum of CFBT in the previous five time slots. This information is updated at the end of each time slot. Let

    Percent_CGLT = WCGLT / window
    Success_ratio_of_bids = WCGBT / WCFBT.

EST(Percent_CGLT) and EST(Success_ratio_of_bids) are computed using single exponential smoothing.

Let CGLT2 be the CPU time required between ART and D by already guaranteed local tasks. Then

EST(CPU_time_local_between_ART_and_D) = max[EST(Percent_CGLT) * (D - ART), CGLT2].

Let CGBT2 be the CPU time required between ART and D of guaranteed tasks which arrived via past bidding, and let CFBT2 be the CPU time needed between ART and D for tasks for which bids were sent out. Then

EST(CPU_time_bid_between_ART_and_D) = max[EST(Success_ratio_of_bids) * CFBT2, CGBT2].
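A compact sketch of these two estimates in Python (window bookkeeping elided; the arguments are the smoothed ratios and per-interval sums defined above):

    def est_cpu_local(pct_cglt, art, deadline, cglt2):
        # At least the already-committed CGLT2, padded by the historical
        # local-load fraction over the interval [ART, D].
        return max(pct_cglt * (deadline - art), cglt2)

    def est_cpu_bid(success_ratio, cfbt2, cgbt2):
        # Outstanding bid demand discounted by the historical success
        # ratio, floored by the committed CGBT2.
        return max(success_ratio * cfbt2, cgbt2)

    print(est_cpu_local(pct_cglt=0.25, art=20, deadline=100, cglt2=15))  # 20.0
    print(est_cpu_bid(success_ratio=0.4, cfbt2=30, cgbt2=10))            # 12.0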

F. EST(Task_execution_time_ratio)

This is estimated to be the average of (actual execution time/worst case execution time) of the tasks that completed during the past window.

In all the above estimations, we assume that the clocks on different nodes are synchronized. The only effect of slight asynchrony in the clocks will be that the estimates will be slightly inaccurate. Remember, tasks are independent and are guaranteed only at the node at which they actually reside; therefore, a node can guarantee a task based solely on local information. Hence, asynchrony in the clocks does not affect the guarantee algorithm itself.

VII. EVALUATION OF THE SCHEDULING ALGORITHM

The task scheduling algorithm proposed in this paper consists of two main components: the local guarantee routine and the distributed scheduling scheme. This section briefly describes the overall simulation model, the evaluation process, and the simulation results. The simulation program is written in GPSS and Fortran, and it simulates a (variable) number of nodes connected by an arbitrary topology. The model is extensible so that precedence constraints and additional resource requirements (other than the CPU) can be added later. In the simulation model each node is described by its processing speed, each node runs all the modules of the scheduling algorithm (local scheduler, dispatcher, and bidder), and each node executes periodic as well as nonperiodic tasks. All the estimation techniques are included in the model. As described in our algorithm, tasks can sometimes be preempted. This facility is also included in the simulation model. Tasks are characterized by their period (if any), their computation time, and their deadline. Specific values for these parameters are generated by GPSS functions.

The communication subnet is modeled by a delay per packet per hop. The delay is described by a random variable with a distribution function chosen at run time. Various types of information move through the subnet, each experiencing different delays as a function of its size. These include requests for bids (one packet), bids themselves (a variable number of packets), and the tasks (a variable number of packets).
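For instance, the delay of a whole message can be modeled as the sum of independent per-packet, per-hop draws; the exponential distribution and mean used below are our own assumptions, since the distribution is chosen at run time.

```python
import random

# Assumed delay model: one random draw per packet per hop, summed, so a
# larger message (more packets) experiences a larger total delay.
def message_delay_ms(n_packets, n_hops, mean_delay_ms=10.0):
    return sum(random.expovariate(1.0 / mean_delay_ms)
               for _ in range(n_packets * n_hops))

rfb_delay = message_delay_ms(n_packets=1, n_hops=1)   # a request for bids
```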

Nonperiodic tasks arrive independently at each node of the network according to Poisson processes. The arrival rates are purposely set high to stress the algorithm. An actual real-time system might require that 95 percent of its tasks be guaranteed and hence must have more "excess" CPU power than we simulate here. The set of periodic tasks for each node must be chosen at the start of the test, but can easily be changed from run to run. The initialization task is executed to ensure that all the periodic tasks in the suggested configuration can be guaranteed. If not, the initialization task immediately terminates with an error condition and a reassignment of tasks must be tried.
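As one plausible reading of this check, the initialization task could verify that each node's periodic utilization does not exceed its capacity; the utilization bound below is our illustrative stand-in, not the paper's actual guarantee routine.

```python
# Hedged stand-in for the initialization check: a node's periodic tasks
# fit if their total utilization does not exceed the processor capacity.
def periodic_load_ok(tasks):
    """tasks: list of (worst_case_computation_time, period) pairs."""
    return sum(c / p for c, p in tasks) <= 1.0

assert periodic_load_ok([(10, 50), (20, 100)])   # utilization = 0.4
```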

The statistics accumulated include node utilization, task response time, node queue length, percentage of nonperiodic tasks that meet and do not meet their deadline, percentage of nonperiodic tasks that move to another node and then meet or do not meet their deadline, and the number of tasks moved via focused addressing and the percentage of these that do or do not meet their deadline. Using a log file it is also possible to determine which tasks were and were not guaranteed. Due to space limitations, not all these statistics are presented in this paper. Our testing determines how these statistics change as we vary the characteristics of the periodic and nonperiodic tasks (computation time and deadline), their periods and arrival rates, respectively, the delays in the subnet, and the size of the network. Since much of the scheduler runs as a periodic task we can easily determine the effect, say, of running the bidder at various periods.

In addition, we would like to determine which parameters of our algorithm have the greatest effect on performance: for example, determining the usefulness of focused addressing, determining the difference between an adaptive and a nonadaptive policy for setting a wait time before processing arriving bids, or determining the effectiveness of various types of information in the bids themselves. Tuning of various scheduling parameters, such as FP, the parameter used for focused addressing, is not done in the current simulation model. However, all parameters are easily changed from run to run to facilitate evaluation.

We now present simulation results of a simple bidding algorithm. In this algorithm, focused addressing is not incorporated. Also, if the best bidder is unable to guarantee the task sent to it, the task is not retransmitted to the second best bidder. The reported simulation results show the effect of task laxities on the percentage of tasks guaranteed, show the effect of different periods for the system tasks (the local scheduler and the bidder) on the percentage of tasks guaranteed, and offer comparisons of the bidding algorithm to a random scheduling algorithm (one in which a node that is unable to guarantee a task sends the task to one of the other nodes, selected randomly), both for Poisson arrival patterns and for burst arrival patterns. In order to evaluate the effect of the focused addressing scheme and of retransmitting the task to the second best bidder should the first fail to guarantee it, we also compare the performance of the simple bidding algorithm itself to that of a combined algorithm, i.e., when focused addressing and task retransmission are also incorporated as described in Fig. 3. We then conclude with general observations that we have made over many simulation runs.

In all the tables below, the figures are for the entire network. The network contains five nodes and is fully connected. All internode distances are the same.

A. Effect of Task Laxities

In this first set of runs, when nonperiodic tasks arrive, their laxities are determined from a normal distribution with means of 300, 450, 600, and 750 ms. The laxity of a task is given by its deadline minus its computation time. Table I shows that when laxities are small (a mean of 300), 72.4 percent of the tasks are guaranteed locally, and that few additional tasks are guaranteed via bidding (the percentage increases only to 74.2). As laxities are increased we do not see any improvement in the percentage of tasks guaranteed locally, but do see a significant improvement in tasks guaranteed via bidding, e.g., to as much as 94.7 percent when the mean of the laxity distribution is 750 ms. This is because a given local node is still too busy to accept the task even with the relaxed deadline, but it now has time to discover a node that is able to handle this new task.

TABLE I
EFFECT OF TASK LAXITIES

Ave. Task Laxities (ms)            300    450    600    750
% Tasks Guaranteed Locally        72.4   72.1   71.8   71.6
% Tasks Guaranteed Network-wide   74.2   86.2   91.5   94.7

window size = 1920 ms
average communication delay per message = 50 ms
period of local scheduler and bidder = 120 ms
computation time of local scheduler and bidder = 3 ms
task execution time ratio = 100%
average task computation time = 50 ms
two source nodes (10.0 arrivals/sec, 40% time for periodic tasks)
three sink nodes (7.0 arrivals/sec, 25% time for periodic tasks)

TABLE II
EFFECT OF DIFFERENT PERIODS OF THE LOCAL SCHEDULER TASK AND THE BIDDER TASK

Periods, in milliseconds, of
Bidder and Local Scheduler         120    180    250
% Tasks Guaranteed Locally        71.8   66.4   55.8
% Tasks Guaranteed Network-wide   92.5   84.8   69.6

Computation times of bidder and local scheduler are proportionately increased to keep their processing power in a window constant.

window size = 720 ms
average communication delay per message = 50 ms
task execution time ratio = 100%
average task computation time = 50 ms
average task laxity = 600 ms
two source nodes (as in Table I)
three sink nodes (as in Table I)

B. Effect of Different Periods for the Local Scheduler and Bidder Tasks

The local scheduler and bidder run as periodic tasks. Table II shows the results of several runs in which the net overhead of running the bidder and local scheduler was approximately the same, but the frequency of invocation varied. In other words, if these system tasks ran more frequently, then they would handle less work on each invocation. In this way we attempted to measure the effect of the period itself. The effect is quite dramatic, e.g., when these system tasks execute every 120 ms the percentage of tasks guaranteed was 92.5 percent, but at a 250 ms period the percentage of tasks guaranteed dropped to 69.6 percent. Even the percentage of local tasks guaranteed dropped dramatically because new arrivals were not processed quickly enough.

C. Random Scheduling Versus Bidding — Poisson Arrivals

In this experiment and in the next we compare our bidding approach to a random scheduling algorithm (RSA). In RSA, if a task is not guaranteed locally, it is transferred to one of the other nodes selected randomly. The communication cost for task movement is accounted for in the simulation. However, unlike a bidding scheme, in RSA there is no communication cost incurred for determining the node to which a task should be sent.
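In outline, RSA's dispatch decision reduces to a few lines; the helper names here are ours, not the simulator's.

```python
import random

# Sketch of RSA: a task not guaranteed locally is shipped to one other
# node chosen uniformly at random (helper names are illustrative).
def rsa_dispatch(task, local_node, all_nodes, try_guarantee):
    if try_guarantee(local_node, task):
        return local_node
    target = random.choice([n for n in all_nodes if n != local_node])
    return target if try_guarantee(target, task) else None   # None: missed
```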

In the comparison, first we consider the situation where each node in the system receives tasks at an arrival rate described by a Poisson process, each node experiencing a different average. In the case of Table III, the task load on each node from node 1 to node 5 is incrementally changed by a factor of 1.5, while in the case of Table IV, the variation of the task load on nodes is based on a less regular distribution, as given in Table IV itself.


TABLE III
COMPARISON OF THE BIDDING ALGORITHM AND THE RANDOM SCHEDULING ALGORITHM UNDER LINEAR DISTRIBUTION OF TASK LOADS

Network Load                             4            5            6
Algorithm                            BID   RSA    BID   RSA    BID   RSA
(1) % Tasks Guaranteed Network-wide  96.9  94.8   88.3  85.6   75.6  75.8
(2) % Tasks Not Guaranteed Locally   22.4  23.1   38.1  48.2   50.1  49.7
(3) % Tasks of (2) Guaranteed        86.7  78.9   69.7  64.5   51.7  51.8

window size = 720 ms
average communication delay per message = 10 ms
period of local scheduler and bidder = 100 ms
computation time of local scheduler and bidder = 1 ms
task execution time ratio = 75%
average task computation time = 200 ms
average task laxity = 600 ms
task loads of nodes are changed incrementally from node 1 to node 5 by a factor of 1.5
ratio of nonperiodic task load to periodic task load = 4:1
The surplus estimation only takes into account future tasks arriving via bidding.

TABLE IV
COMPARISON OF THE BIDDING ALGORITHM AND THE RANDOM SCHEDULING ALGORITHM UNDER NONLINEAR DISTRIBUTION OF TASK LOADS

Network Load                             4            5            6
Algorithm                            BID   RSA    BID   RSA    BID   RSA
(1) % Tasks Guaranteed Network-wide  98.3  92.5   87.4  79.4   78.9  68.9
(2) % Tasks Not Guaranteed Locally   24.2  28.1   44.2  49.4   56.9  59.3
(3) % Tasks of (2) Guaranteed        92.9  73.2   71.7  58.6   49.5  47.8

window size = 720 ms
average communication delay per message = 10 ms
period of local scheduler and bidder = 100 ms
computation time of local scheduler and bidder = 1 ms
task execution time ratio = 75%
average task computation time = 200 ms
average task laxity = 600 ms
Distribution of task loads of nodes, when network load is 5, from node 1 to 5 is (0.22, 0.24, 1.38, 1.51, 1.65).
ratio of nonperiodic task load to periodic task load = 4:1
The surplus estimation only takes into account future tasks arriving via bidding.

Put another way, Table III contains results for the case where each node is slightly busier than the previous node, and Table IV contains results for the case where two nodes are sink nodes (sink nodes are lightly loaded and hence serve as "sinks" for tasks from other nodes) and three nodes are source nodes (source nodes are heavily loaded and hence serve as "sources" for tasks to be sent to other nodes). The total network load is referred to as varying between 4, 5, and 6. A network load of 4 means that the total computation time required by all periodic and nonperiodic tasks in the system is 4 times the service speed of a single processor. Note that even though we only have five processors, a network load of 6 is not unreasonable because some of the tasks will not be guaranteed.

When the network is small and there are light loads such as for a network load of 4, it is likely that almost any of the four remaining nodes to choose from can perform the task. Hence, it is very likely that random scheduling will perform well. Table III verifies this and shows that 78.9 percent of tasks randomly sent out to other nodes were guaranteed. When the network load is increased to 5 and then to 6, for the RSA, this percentage dropped to 64.5 and 51.8 percent, respectively. This is as expected.

As shown in Table III, when the loads of nodes are increased linearly, the bidding algorithm performs only slightly better than the random scheduling algorithm, but when the task loads of nodes are unbalanced, as shown in Table IV, the performance improvement is more significant. In both cases, the bidding algorithm was able to guarantee a higher percentage of tasks that were transmitted across the network and also had a higher percentage of tasks guaranteed on the local node. One would expect that the overhead of bidding would degrade the local performance. However, by choosing the proper node to send a task to, the bidding algorithm does not overload a node that is already heavily loaded. This overloading is likely with random scheduling, and hence local task arrivals have a greater chance of not being guaranteed at the more heavily loaded nodes.

D. Random Scheduling Versus Bidding—Burst Arrivals

From the results reported in Table V, we see that for burst arrivals bidding works better than random scheduling under network loads 5 and 6. In this table we show the running averages of the percentage of tasks guaranteed during the period of increased arrivals. Burst arrivals were modeled as changed arrival rates every 2 s, starting at 10 s into the run and stopping at 30 s into the run. The fact that different nodes become sinks (are underloaded) and sources (are overloaded) has a smaller effect on RSA than on the bidding algorithm as long as the number of sinks and sources remains the same, since RSA chooses among nodes randomly. The bidding algorithm, though, is able to keep track of the changing pattern of which nodes are sinks and which nodes are sources, and thereby achieves better performance than RSA.

E. Effect of Focused Addressing and Task Retransmission

In order to evaluate the effect of focused addressing we also compared the performance of the simple bidding algorithm, used by itself, to that of a combined algorithm where, in addition to bidding, focused addressing and task retransmission were used (see Fig. 3). Recall that focused addressing attempts to send a task directly to a node that, in the sender's view, is very lightly loaded. When a task arrives at a focused node, it may not be guaranteed, for example, if the estimate of high surplus at this node was wrong or there was a recent surge of arrivals. If the task is not guaranteed, then it will be transmitted to the node that sends the best bid. If the task is not guaranteed by the node with the best bid, it is sent to the node with the second best bid.
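The fallback chain can be summarized as follows; this is a sketch of the control flow only, with our own helper names, and it abstracts away the message exchanges of Fig. 3.

```python
# Sketch of the combined scheme's fallback chain: focused node first,
# then the best bidder, then the second best bidder.
def dispatch_combined(task, focused_node, bids, try_guarantee):
    """bids: list of (estimated_surplus, node) pairs from responding nodes."""
    if focused_node is not None and try_guarantee(focused_node, task):
        return focused_node
    ranked = sorted(bids, key=lambda b: b[0], reverse=True)
    for _, node in ranked[:2]:           # best bid, then second best
        if try_guarantee(node, task):
            return node
    return None                          # task cannot be guaranteed
```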

Statistics on these various aspects of the algorithm are shown in Table VI. The table shows that, as the network load increases from 4 to 5 to 6, the percentage of tasks guaranteed locally decreases from 77.0 to 56.5 to 43.7. This implies that more tasks are available to be sent to other nodes. However, as the network load increases significantly, it is less likely that a focused node can be found, and fewer tasks are sent out via focused addressing. This effect is seen in line 3, where, as the load increases from 4 to 5, the percentage of tasks sent out increases, but when we increase the load further to 6, the percentage drops. Also, as the load increases, the estimates of nodes' surplus are less accurate (due, for example, to a possible surge of task arrivals). Hence, the percentage of tasks guaranteed by focused nodes decreases as the load increases. Of course, it is possible to increase this percentage by increasing the value of the parameter FP. However, notice that as this percentage drops there is a subsequent increase in the percentage of tasks guaranteed by bidding. This implies that bidding "bails out" the poor focused addressing choices.

A decrease in the percentage of tasks guaranteed by focused nodes increases the availability of tasks for bidding.


TABLE V
EFFECT OF BURSTY ARRIVALS AT DIFFERENT NETWORK LOADS

i. Network Load = 4

a. Percentage of Tasks Guaranteed Network-wide
Time (sec.)   10    12    14    16    18    20    22    24    26    28    40
BID          97.1  96.6  96.2  95.3  95.5  94.2  94.3  94.8  93.8  93.5  94.8
RSA          97.6  97.8  95.9  94.2  94.5  93.2  94.8  94.2  94.8  93.8  94.8

b. Percentage of Tasks Moved and Guaranteed
Time (sec.)   10    12    14    16    18    20    22    24    26    28    40
BID          84.8  84.8  82.5  81.3  82.8  79.1  77.6  77.8  77.2  75.7  79.1
RSA          87.1  86.7  82.5  76.9  78.5  75.2  76.5  77.9  78.1  77.9  78.9

ii. Network Load = 5

a. Percentage of Tasks Guaranteed
Time (sec.)   10    12    14    16    18    20    22    24    26    28    40
BID          91.2  86.4  86.8  85.2  84.3  84.7  84.4  84.8  84.4  84.6  86.8
RSA          88.3  83.6  83.8  82.1  81.5  81.3  81.8  88.8  81.8  81.4  83.7

b. Percentage of Tasks Moved and Guaranteed
Time (sec.)   10    12    14    16    18    20    22    24    26    28    40
BID          78.5  66.7  67.1  63.8  62.2  63.5  62.7  61.8  62.5  62.5  65.1
RSA          78.3  61.3  61.4  56.8  56.2  55.7  55.3  54.4  54.7  55.3  68.8

iii. Network Load = 6

a. Percentage of Tasks Guaranteed
Time (sec.)   10    12    14    16    18    20    22    24    26    28    40
BID          78.3  73.6  74.8  72.3  72.3  71.8  71.8  71.8  72.6  78.9  72.6
RSA          75.6  73.1  73.2  78.9  78.6  69.6  71.8  78.3  71.4  78.3  72.8

b. Percentage of Tasks Moved and Guaranteed
Time (sec.)   10    12    14    16    18    20    22    24    26    28    40
BID          51.7  45.6  46.9  43.8  44.9  42.8  43.8  43.4  43.7  41.4  43.2
RSA          45.1  42.6  43.4  41.5  41.8  48.3  41.8  41.1  41.4  39.8  42.8

window size = 720 ms
average communication delay per message = 10 ms
period of local scheduler and bidder = 100 ms
computation time of local scheduler = 2 ms
computation time of bidder = 3 ms
task execution time ratio = 75%
average task computation time = 200 ms
average task laxity = 600 ms
The base arrival rates of tasks are the same as in Table III.
Burst arrivals occur in the following way: node 1 at 10 sec, node 2 at 14 sec, node 3 at 18 sec, node 1 at 22 sec, node 2 at 26 sec.
Each burst lasts for 2 seconds. Burst arrival rate is 10 tasks/sec.
The surplus estimation only takes into account future tasks arriving via bidding.

Since bidding uses task-specific information, as the load increases, the percentage of tasks guaranteed via bidding increases. Of course, as the load increases beyond a certain point, this percentage will begin to drop.

Comparing the performance of the two algorithms via Table VI, we see that the combined algorithm performs slightly better than the one using just bidding. For example, under network load 4, the difference in the percentage of tasks guaranteed network-wide is only 1.4 percent. Although this by itself is not significant, it is important to realize that in the combined algorithm, a significant percentage of tasks (17.6 percent under network load 4) are guaranteed via focused addressing. In addition to being guaranteed with lower communication costs than those guaranteed through bidding, the disposition of tasks guaranteed through focused addressing is also known much earlier. This has relevance to those applications where the delay involved in guaranteeing a task is a significant factor.

TABLE VI
COMPARISON OF THE COMBINED ALGORITHM AND THE BIDDING ALGORITHM AT DIFFERENT NETWORK LOADS

Network Load          4                   5                   6
Algorithm      Combined  Bidding   Combined  Bidding   Combined  Bidding
(1)              98.9     97.5       88.8     86.5       72.3     71.5
(2)              77.0     76.8       56.5     56.1       43.7     43.2
(3)              20.5      -         24.5      -         22.9      -
(4)              17.6      -         14.9      -         10.8      -
(5)               4.4     21.5       16.9     38.4       17.3     28.3
(6)               0.8      -          0.5      -          1.4      -

Notes: (1) % Tasks Guaranteed Network-wide
       (2) % Tasks Guaranteed Locally
       (3) % Tasks Sent to Focussed Nodes
       (4) % Tasks Guaranteed at Focussed Nodes
       (5) % Tasks Guaranteed at Best Bidder Nodes
       (6) % Tasks Guaranteed at 2nd Best Bidder Nodes

window size = 720 ms
communication delay = 20 ms
FP = 2
periods of local scheduler and bidder = 100 ms
computation time of local scheduler and bidder = 1 ms
average task execution time ratio = 75%
average task computation time = 200 ms
average task laxity = 750 ms
ratio of nonperiodic task load to periodic task load = 4:1
The surplus estimation algorithm estimates future surplus based on future local task arrivals.

F. General Observations

We conclude this section by making some general observations based on all our experimental work, not just the simulation results described here.

1) Laxity (deadline minus computation time) of tasks is the most important factor affecting the performance of the system. As the laxity of tasks becomes smaller, fewer tasks are guaranteed. This result was demonstrated in Table I.

2) In a dynamic system, it is unrealistic to assume that tasks always run for their worst case execution time. Our local guarantee algorithm can reclaim the extra CPU time unused by tasks. We have observed that the system can guarantee many more tasks when the average execution time of tasks is less than their worst case time, e.g., if tasks ran, on the average, for 75 percent of their worst case time.

3) It is important to have short periods for the local scheduler task and the bidder task at a source node when tasks run to their worst case time, but it is not important to do so at a sink node. This is because a source node is heavily loaded and a running task can rarely be preempted to run the local scheduler. Therefore, if the period is long at source nodes, the task wait time and the bid wait time will be prolonged and the performance will degrade. Recall that task wait time and bid wait time are the waiting times of tasks and bids in their respective queues before they are processed. In contrast, at sink nodes, most of the time the local scheduler and the bidder can be run after preempting a running task due to the surplus CPU time. Therefore, the effect of short periods for system tasks will not be substantial. If tasks only run for a fraction of their worst case time, then the effect of reducing the periods of system tasks at source nodes is not as important because the freed CPU time can be used for the invocation of system tasks. There is no effect at the sink nodes because in either case immediate invocation of the system tasks is possible.


4) We have observed that if the service rates of the local scheduler and bidder are equal to the arrival rates of tasks and messages, respectively, then increasing their computation time will not increase the number of tasks guaranteed, but, on the contrary, will decrease it. This is because more CPU time is scheduled for the local scheduler and bidder tasks and less CPU time is available for the new tasks.

5) In a bidding process, a bidding node needs to know the estimated bid wait time at the requester node, and a requester node needs to know the estimated task wait time at a bidding node, in order to estimate the surplus available for a task at the bidding node. It has been found that it is too pessimistic to assume worst case times in these estimations. Using a moving average estimation or a best case time (an optimistic approach) produces better performance.

6) To respond to RFB's, each bidding node needs to estimate the CPU time needed for future local tasks and future nonlocal tasks which will arrive through focused addressing and bidding. The preliminary results have shown that the performance is best when only the estimated CPU time of future local tasks is taken into account, and it is worst when the estimated CPU times for both the future local tasks and nonlocal tasks are taken into account. This is due to being too pessimistic in predicting the CPU time needed for future task arrivals.

7) In most cases, the difference between the combined algorithm and the one using just bidding is not very significant. However, as mentioned earlier, in the combined algorithm a substantial percentage of tasks are guaranteed by focused nodes. This is important because the fact that a task is guaranteed can be known more quickly when it is guaranteed by a focused node rather than by a best bidder node. Also, the communication costs involved in scheduling tasks guaranteed by focused nodes are small.

8) Focused addressing should be useful for tasks which have small laxities since such tasks may not be able to afford the delay inherent in bidding. However, since the laxity is small, finding a focused node may be difficult, especially at large loads. Under such circumstances, it is better to find a focused node with a smaller FP than to reject the task.

VIII. EXTENSIONS TO THE ALGORITHM

The distributed scheduling algorithm that we have described in this paper is being extended along several directions. These are briefly described below.

A. Special-Purpose Architectures

A logical extension to the current scheme, wherein a separate processor is allocated for the communication module, is an architecture for a real-time distributed system wherein a separate processor, with limited processing capabilities, is allocated for the purposes of scheduling. In this scheme, scheduling overheads are incurred only by the additional processor since the main processing unit is used only for the purpose of executing the actual tasks. We are currently studying different multimicroprocessor architectures suited to our algorithm so that system overhead is relegated to processors running in parallel with user tasks, and so that reliability is increased.

B. Scheduling with Resource Requirements of Tasks

We have been involved in the development of a sophisticated guarantee algorithm that uses a heuristic approach for solving the problem of scheduling tasks with deadlines and general resource requirements in a dynamic real-time system. The crux of our approach lies in the heuristic function used to select the task to be scheduled next. The heuristic function weights three factors: information about tasks' computation time, laxity, and the utilization of resources [16], [17].

Some other extensions include the consideration of collections of tasks which have precedence constraints among themselves [12], the expansion of the set of estimation techniques used in the system, and the inclusion of task priorities in the scheduling process. All of these extensions are well underway, with results for some of them already reported in the literature as referenced above. Due to space limitations, we do not discuss these extensions here.

IX. CONCLUSION

In this paper we have discussed an approach to scheduling tasks with hard real-time constraints in a loosely coupled distributed system, and we have evaluated its performance.

In summary, the main features of our algorithm are the following.

1) It is a distributed scheduling technique; no node has greater importance, as far as scheduling is concerned, than any other.

2) The algorithm is designed to schedule periodic tasks as well as nonperiodic tasks. The latter may arrive at any time.

3) A combined approach of focused addressing and bidding is used for the selection of nodes on which tasks execute. We have worked out the details of the various phases involved in the distributed scheduling process.

4) Overheads involved in scheduling are taken into consideration. This includes the time spent on executing scheduling tasks at a node and the communication delays between nodes.

5) Heuristics and estimation techniques are built into the algorithm. This is necessary given the computationally hard nature of the scheduling problem.

In the course of our initial simulation studies, we noticed that the bidding algorithm used alone performs quite well when task loads are unbalanced in the network. As our current simulation studies show, performance improves when bidding is used in conjunction with focused addressing. In actuality, focused addressing is an intelligent form of random scheduling in that it takes into account nodes' surplus information in choosing a node to send a task. By using a scheme that incorporates focused addressing and bidding, we are able to reap the benefits of both schemes.

REFERENCES

[1] M. Dertouzos, "Control robotics: The procedural control of physical processes," in Proc. IFIP Congr., 1974.

[2] M. R. Garey and D. S. Johnson, "Complexity results for multiprocessor scheduling under resource constraints," SIAM J. Comput., vol. 4, 1975.

[3] R. L. Graham et al., "Optimization and approximation in deterministic sequencing and scheduling: A survey," Ann. Discrete Math., vol. 5, pp. 287-326, 1979.

Page 14: Evaluation of a flexible task scheduling algorithm for distributed hard real-time systems

STANKOVIC et al: EVALUATION OF A FLEXIBLE TASK SCHEDULING ALGORITHM 1143

[4] R. Henn, "Antwortzeitgesteuerte Prozessorzuteilung unter strengen Zeitbedingungen" (Response-time-driven processor allocation under strict timing constraints), Computing, vol. 19, 1978.

[5] H. H. Johnson and M. S. Madison, "Deadline scheduling for a real-time multiprocessor," NTIS (N76-15843), Springfield, VA, May 1974.

[6] L. Lamport and P. M. Melliar-Smith, "Synchronizing clocks in the presence of faults," SRI International, Mar. 1982.

[7] D. W. Leinbaugh, "Guaranteed response times in a hard real-time envi­ronment," IEEE Trans. Software Eng., vol. SE-6, Jan. 1980.

[8] D. W. Leinbaugh and M. Yamini, "Guaranteed response times in a dis­tributed hard-real-time environment," in Proc. Real-Time Syst. Symp., Dec. 1982.

[9] C. L. Liu and J. Layland, "Scheduling algorithms for multiprogramming in a hard real-time environment," J. ACM, vol. 20, no. 1, Jan. 1973.

[10] A. K. Mok and M. L. Dertouzos, "Multiprocessor scheduling in a hard real-time environment," in Proc. Seventh Texas Conf. Comput. Syst., Nov. 1978.

[11] R. R. Muntz and E. G. Coffman, "Preemptive scheduling of real-time tasks on multiprocessor systems," J. ACM, vol. 17, no. 2, Apr. 1970.

[12] K. Ramamritham and J. A. Stankovic, "Dynamic task scheduling in hard real-time distributed systems," IEEE Software, July 1984.

[13] J.D. Schoeffler, "Distributed computer systems for industrial process control," Computer, vol. 17, pp. 11-19, Feb. 1984.

[14] R. G. Smith, "The contract net protocol: High-level communication and control in a distributed problem solver," IEEE Trans. Comput., vol. C-29, Dec. 1980.

[15] T. Teixeira, "Static priority interrupt scheduling," in Proc. Seventh Texas Conf. Comput. Syst., Nov. 1978.

[16] W. Zhao, K. Ramamritham, and J. A. Stankovic, "Scheduling tasks with resource requirements in hard real-time systems," IEEE Trans. Software Eng., submitted for publication.

[17] W. Zhao and K. Ramamritham, "Distributed scheduling using bidding and focussed addressing," in Proc. Symp. Real-Time Syst., Dec. 1985.


Krithivasan Ramamritham received the B.Tech. degree in electrical engineering and the M.Tech. degree in computer science from the Indian Institute of Technology, Madras, India, in 1976 and 1978, respectively, and the Ph.D. degree in computer science from the University of Utah, Salt Lake City, in 1981.

He is currently an Assistant Professor in the Department of Computer and Information Science at the University of Massachusetts, Amherst. His research interests include software engineering, operating systems, and distributed computing. Specifically, his current work covers the specification, verification, and synthesis of resource controllers, scheduling algorithms for real-time systems, mechanisms for access control, i.e., protection, in distributed systems, and the incorporation of transactions in distributed object-oriented systems.

Dr. Ramamritham is a member of the Association for Computing Machinery and the IEEE Computer Society.

John A. Stankovic (S'77-M'79) received the B.S. degree in electrical engineering, and the M.S. and Ph.D. degrees in computer science, all from Brown University, Providence, RI, in 1970, 1976, and 1979, respectively.

He is currently an Associate Professor in the Department of Electrical and Computer Engineering, University of Massachusetts, Amherst. His current research includes investigating various approaches to distributed scheduling on general-purpose but highly integrated distributed systems, developing means for controlling and scheduling tasks on distributed real-time systems, and developing database partitioning protocols.

Prof. Stankovic received the 1983 Outstanding Scholar Award for the School of Engineering, University of Massachusetts. He has recently published a tutorial text entitled Reliable Distributed System Software and is the Program Chairman for the 1985 Real-Time Systems Symposium.

Shengchang Cheng received the B.S. degree in electrical engineering from the National Taiwan University, Taiwan, Republic of China, in 1976 and the M.S. degree in electrical and computer engineering from the University of Massachusetts, Amherst, in 1983, and he is currently a Ph.D. candidate in the Department of Electrical and Computer Engineering, University of Massachusetts.

He was with Logitech Inc., Taiwan, from 1978 to 1979, as a Microcomputer Design Engineer and with United Electronic, Taiwan, from 1979 to 1981, as a Senior Design Engineer. Since 1982 he has been a Research Assistant at the University of Massachusetts. His research interests are in the areas of distributed operating systems, dynamic distributed real-time systems, and multiprocessor architectures.

Mr. Cheng is a member of the Association for Computing Machinery and the IEEE Computer Society.