
SOFTWARE—PRACTICE AND EXPERIENCE, VOL. 27(9), 1055–1066 (SEPTEMBER 1997)

A Flexible Real-Time Scheduling Abstraction: Design and Implementation

SIU LING ANN LO
Newbridge Networks Corporation, 8999 Nelson Way, Burnaby, B.C., Canada V5A 4B5 (email: [email protected])

NORMAN C. HUTCHINSON
Department of Computer Science, University of British Columbia, Vancouver, B.C., Canada V6T 1Z4 (email: [email protected])

AND

SAMUEL T. CHANSON
Department of Computer Science, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong (email: [email protected])

SUMMARY

An evolution is happening in the way that operating systems support the real-time requirements of their applications. The need to run real-time applications such as multimedia in the same environment as complex non-real-time servers and applications has motivated much interest in restructuring existing operating systems. Many issues related to thread scheduling and synchronization have been investigated. However, little consideration has been given to the flexibility and modularity required in the support of application-level scheduling needs, although it is well known that application requirements are diverse. In this paper, we describe a real-time scheduling abstraction which provides modularity and flexibility to the scheduling support of operating systems. Our design has been implemented using the Mach 3.0 kernel and a locally developed multiprocessor kernel (the r-kernel) as development platforms. © 1997 by John Wiley & Sons, Ltd.

KEY WORDS: real-time scheduling; soft real-time; multimedia; operating systems; flexibility

INTRODUCTION

Historically, real-time operating systems have implemented a single scheduling algorithm in their kernel as part of the support for threads. If an application changed and demanded richer scheduling support, it was necessary to move to a new operating system which included such a scheduler. More recently, real-time operating systems have attempted to provide richer scheduling semantics in two ways. Some systems implement multiple schedulers in the kernel and allow each application to choose the scheduler that best suits its needs [1, 2]. The problem with this approach is that an application developer whose scheduling needs are not supported by any of the available kernel schedulers must choose another kernel. Other systems export the CPU scheduling problem to the application, with the kernel making upcalls from supervisor to user space [3] each time a scheduling decision is required [4]. While this solution allows the application developer to tailor the policy to suit the application, typically a complete scheduler must be implemented in user space to accomplish this. Neither of these solutions provides architectural support for the smooth evolution of scheduling support over the lifetime of the application.

CCC 0038–0644/97/091055–12 $17.50    Received 25 July 1994
© 1997 by John Wiley & Sons, Ltd.    Revised 24 January 1997

In this paper, we describe a scheduling abstraction which allows for flexibility in the implementation of application-level schedulers as well as better modularity of both the kernel and the application. A previous paper [5] describes the architecture of our approach and provides additional rationale. We have implemented this abstraction in both the Mach 3.0 kernel [6] and the r-kernel, a small multiprocessor kernel developed at UBC [7]. Our experiences with these two kernels have taught us a great deal about the pitfalls in the implementation of thread schedulers. While these problems do not have a serious impact on time-sharing systems, they are very detrimental to real-time applications.

SCHEDULER DESIGN

Background

An overriding concern for real-time systems is to achieve predictable performance so that important performance characteristics of the applications can be guaranteed. Historically, this has meant that a single resource scheduler is implemented in the kernel of the real-time operating system. In addition, any features that may interact negatively with the scheduler are disallowed. For example, since the FCFS rule used in the synchronization mechanisms of many systems can cause priority inversions, it is prohibited in real-time thread models. Several real-time kernels have been developed based on these notions (e.g. ARTS [8], RT Mach [9] and Spring [10]).

Implementing an integrated resource scheduling mechanism in the kernel sacrifices flexibility in a number of ways. First, resource scheduling algorithms differ so much from each other that a kernel designed to implement one algorithm must be rewritten to support another. Second, all existing resource scheduling kernels assume that the only synchronization that multiple tasks will require is mutual exclusion for shared resources. In today's more complex applications, real-time and non-real-time tasks are required to cooperate (for example, in the handling of asynchronous external events), and therefore more general synchronization mechanisms are required. Third, the only timing errors that are handled are scheduling errors, which occur when a thread is not able to complete by its deadline. More general timing errors are not considered, including those caused by external events, badly behaved threads, or overload.

Several real-time operating systems have avoided the difficulties involved in implementing a general resource scheduler by providing a small set of CPU scheduling policies in the kernel, and expecting that users will be able to provide any additional necessary functionality on top of these primitives. For example, Real-Time Mach [2] implements the Time Sharing, Fixed Priority, Rate Monotonic, Deadline Monotonic, and Earliest Deadline First policies, while QNX [11] implements three variations on fixed priority scheduling: round-robin, first-come-first-served, and an adaptive scheme that adjusts priorities based on recent CPU demands.

A layered scheduling approach

Our approach has some similarities to that taken by RT Mach and QNX in that we provide a CPU scheduler in the kernel, on top of which more complex scheduling mechanisms can easily be implemented. Our approach differs from others in that we have worked very hard to provide the right primitives and an architecture for implementing additional application-specific functionality on top of the kernel primitives. Also, we separate the notion of resource scheduling from the thread synchronization and communication mechanism; any thread synchronization and communication mechanism may be used. We do not intend to provide a resource scheduling solution, but rather a foundation upon which any resource scheduler can be built.

The CPU scheduler

We provide a hierarchical scheduling scheme which supports both the preemptive fixed priority rule and the preemptive earliest-deadline-first (EDF) rule. The EDF rule schedules threads in order of non-decreasing deadline [12]. Both the priority rule and the EDF rule are commonly used in real-time systems and have been extensively studied [13–20]. While even this functionality could be exported to application space, the prevalence of these schemes in real-time applications and the additional overhead involved in exporting their implementation argue that they should be supported by the kernel.

In our hierarchical scheduling scheme, each thread $t_i$ is assigned a priority $p_i$, an earliest starting time $s_i$, a deadline $d_i$ and an arrival time $a_i$. If $s_i$ is specified, the thread is suspended until $s_i$. Threads are scheduled strictly according to priority: a running thread is preempted by a higher priority thread. At the same priority level, threads with deadlines are scheduled by the earliest-deadline-first rule, and threads without deadlines are scheduled by the first-come-first-served (FCFS) rule (according to $a_i$). When threads with deadlines and threads without deadlines share the same priority level, the former are scheduled before the latter. In other words, a thread without a deadline behaves as if it had a deadline infinitely far into the future. A diagram of the hierarchical scheduler is shown in Figure 1.
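The dispatch order just described can be sketched as a single comparison function. This is an illustration only, not the kernel's code: the `ready_thread` structure and the functions `runs_before()` and `pick_next()` are hypothetical, times are plain `long` values, larger numbers are assumed to mean higher priority, and a missing deadline is modelled as `LONG_MAX` to mirror the "infinitely far into the future" rule.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical view of a ready thread's scheduling attributes (not the
 * kernel's data structure). Larger priority values mean higher priority. */
typedef struct {
    int  priority;      /* p_i */
    bool has_deadline;  /* false: treated as an infinitely distant deadline */
    long deadline;      /* d_i, absolute time; ignored when has_deadline is false */
    long arrival;       /* a_i, used for FCFS ordering */
} ready_thread;

/* True if a must be dispatched before b: priority first, then EDF,
 * then FCFS by arrival time. */
bool runs_before(const ready_thread *a, const ready_thread *b)
{
    if (a->priority != b->priority)
        return a->priority > b->priority;
    long da = a->has_deadline ? a->deadline : LONG_MAX;
    long db = b->has_deadline ? b->deadline : LONG_MAX;
    if (da != db)
        return da < db;
    return a->arrival < b->arrival;
}

/* Index of the ready thread to dispatch next, or -1 if there is none. */
int pick_next(const ready_thread *ready, int n)
{
    int best = -1;
    for (int i = 0; i < n; i++)
        if (best < 0 || runs_before(&ready[i], &ready[best]))
            best = i;
    return best;
}
```

Because every rule is folded into one ordering, the same comparator could key a sorted ready queue as easily as the linear scan shown here.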

In our design, complex scheduling mechanisms can be implemented by user-level schedulers which assign and change the scheduling attributes of other threads. The earliest starting time of each thread is measured in absolute time and serves as a point of reference for dynamic changes in scheduling. The priority rule supports the implementation of application-level schedulers which can span more than one priority level [5]. At each priority level, the earliest-deadline-first rule is supported so that an application-level scheduler running at a higher priority can use this rule to schedule threads at a lower priority level. The first-come-first-served rule is also supported because its performance is deterministic and because it preserves the order of thread arrivals. This order is necessary when threads handling various external events must run in the same order as the events.

By supporting both priorities and deadlines in the kernel, our scheduler trivially satisfies the demands of a large body of existing real-time applications that depend on one or the other of these techniques. More importantly, our scheduler provides key building blocks from which application-specific scheduling policies can be implemented without requiring kernel changes.

Alarm capabilities

The hierarchical scheduler supports an alarm capability mechanism for the detection and handling of timing errors. An alarm capability is associated with a thread $t_{alarm}$. A real-time application acquires an alarm capability in anticipation that a future program event X may not happen on time. The earliest starting time of $t_{alarm}$ is assigned the latest time at which this program event should happen. The application also supplies an error handling function and specifies the priority and optional deadline of $t_{alarm}$. When event X occurs, the application cancels the alarm, which resets the scheduling attributes of $t_{alarm}$ so that it will not run. Otherwise, $t_{alarm}$ runs the error handling function at the specified time.

Figure 1. The hierarchical scheduler

The alarm capability mechanism provides a flexible means to handle timing errors, since the application controls both the error handling function and the scheduling attributes that apply to the error handler. This allows for the implementation of application-specific graceful degradation strategies during system overload. Moreover, a wide range of timing errors can be accommodated, whether they are caused by resource congestion or simply because some external events do not happen when expected.
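To make the arm, cancel and fire sequence concrete, here is a small single-threaded model of an alarm capability. It is a sketch under stated assumptions, not the kernel mechanism: the type `alarm_model`, the functions `alarm_arm()`, `alarm_cancel()` and `alarm_tick()`, and the example handler `count_miss()` are all hypothetical, and the passage of time is simulated by calling `alarm_tick()` with increasing times instead of by a real alarm thread becoming runnable.

```c
#include <stdbool.h>

typedef void (*alarm_handler)(int arg);

/* Hypothetical model of an alarm capability: the alarm thread is reduced to
 * its starting time (the latest time event X should happen) and its handler. */
typedef struct {
    bool          armed;    /* false once cancelled or fired */
    long          start;    /* starting time of t_alarm */
    alarm_handler handler;  /* application-supplied error handler */
    int           arg;      /* argument passed to the handler */
} alarm_model;

/* Example handler: counts events that did not happen on time. */
static int missed_events = 0;
static void count_miss(int arg) { missed_events += arg; }

void alarm_arm(alarm_model *a, long latest_time, alarm_handler h, int arg)
{
    a->armed = true;
    a->start = latest_time;
    a->handler = h;
    a->arg = arg;
}

/* Called when event X happens in time: the alarm thread will never run. */
void alarm_cancel(alarm_model *a) { a->armed = false; }

/* Advance the model to time `now`. The alarm "thread" becomes runnable at its
 * starting time and runs the handler once. Returns true if it fired. */
bool alarm_tick(alarm_model *a, long now)
{
    if (a->armed && now >= a->start) {
        a->armed = false;
        a->handler(a->arg);
        return true;
    }
    return false;
}
```

In the real mechanism the handler would run with whatever priority and deadline the application chose for $t_{alarm}$, which is what lets a high priority handler abort work while a low priority one merely logs.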

The scheduler interface

The interface to the scheduler consists of the seven functions in Figure 2. The function set_thread_sched_attr() unifies all aspects of thread scheduling, including the suspend and resume operations. A thread is suspended when its starting time is set to a time in the future; it is resumed when its starting time is set to the present time or a past time. The function set_alarm() sets the scheduling attributes of the alarm thread identified by its alarm ID. The function alarm_func forms the body of the alarm thread and is passed arg as an argument. An alarm is cancelled by calling reset_alarm(), which suspends the alarm thread if it has not yet started to run. If the alarm thread has already started to run, this call does nothing.

    typedef struct sched_attr_t {
        struct time_value_t startingtime;
        int                 priority;
        struct time_value_t deadline;
    } sched_attr;

    kern_return_t set_thread_sched_attr( thread_t thread_id, sched_attr attr );
    thread_t      rthread_create( func_t func, sched_attr attr );
    int           alloc_alarm_pool( int num_of_alarms );
    int           get_alarm_from_pool( alarmID *id );
    int           return_alarm_to_pool( alarmID *id );
    int           set_alarm( func_t alarm_func, int arg, sched_attr attr, alarmID *id );
    int           reset_alarm( alarmID alarm_id );

Figure 2. Scheduling interface
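Because suspension and resumption are expressed entirely through the starting time, a user-level library can layer familiar operations on set_thread_sched_attr(). The model below shows only that encoding: `attr_model`, `TIME_NEVER` and the `model_*` functions are hypothetical, times are plain `long` values instead of `struct time_value_t`, and in a real program each operation would finish by passing the updated attributes to set_thread_sched_attr().

```c
#include <stdbool.h>

/* Hypothetical model of the attributes in Figure 2, with plain long times. */
typedef struct {
    long start;     /* earliest starting time */
    int  priority;
    long deadline;
} attr_model;

/* Assumed sentinel for "never": a starting time far in the future. */
#define TIME_NEVER 0x7fffffffL

/* A thread may run once its starting time has been reached. */
bool eligible(const attr_model *a, long now) { return a->start <= now; }

/* Suspend: move the starting time into the future. */
void model_suspend(attr_model *a) { a->start = TIME_NEVER; }

/* Resume: set the starting time to the present (a past time also works). */
void model_resume(attr_model *a, long now) { a->start = now; }

/* Timed sleep: a starting time at a specific future instant. */
void model_sleep_until(attr_model *a, long wakeup) { a->start = wakeup; }
```

Folding suspend, resume and timed sleep into one attribute keeps the kernel interface to a single call, at the cost of the caller needing a notion of "now".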

IMPLEMENTATION EXPERIENCE

We have implemented our design on two different development platforms: the Mach 3.0 kernel [6] and the r-kernel [7]. We selected the Mach 3.0 kernel as a development environment because it provides rich functionality for the implementation of a time-sharing scheduler, is a widely available development platform, and closely matches our design philosophy of providing a small set of core functionality on top of which more sophisticated algorithms can be implemented. We selected the r-kernel for its lightweight user-level thread implementation on a shared-memory parallel processor. Both the Mach 3.0 kernel and the r-kernel provide some form of priority scheduling, but their notion of priority scheduling is not preemptive first-come-first-served.

The CPU scheduler

The preemptive fixed-priority scheduling algorithm is well known in the real-time community for its simplicity and predictable performance. However, our experience has been that many kernels which implement priority scheduling do not pay sufficient attention to detail to make the resulting scheduler useful in a real-time environment. Attention must be paid to every kernel interface in order to achieve the predictable performance expected of a real-time scheduler. This section briefly describes a number of problem areas that afflicted our efforts to implement our real-time scheduler in both the r-kernel and Mach 3.0. We document these difficulties not to detract from the virtues of our two host operating systems, but to highlight the difficulties involved in modifying the core scheduler of an operating system to implement new functionality. These difficulties support our claim that modifying the scheduler to address the changing needs of applications is not an effective strategy.

Both the Mach and r-kernel schedulers support time sharing by allocating the processor to threads for a fixed quantum of time in a round-robin manner. Modifying such a scheduler to support preemptive FCFS scheduling involves more than just disabling the periodic rescheduling that implements the round-robin mechanism; serious attention must also be paid to the problem of preemption. Each time a thread is preempted, either by the clock or by the arrival of a higher priority thread, it is reinserted into the ready queue after all other threads of the same priority. This causes some unfairness in the time sharing of the processor, as a thread so preempted loses the rest of its scheduling quantum. In a time-sharing environment this is a very minor effect, and is experienced by all threads equally. In a real-time environment, however, this seemingly innocent reinsertion completely violates FCFS ordering and injects non-determinism, and therefore unpredictability, into the system. To achieve strict FCFS semantics, we must ensure that a preempted thread keeps its position at the head of the ready queue.

In a uniprocessor environment this can be done by providing an interface to insert a thread into the ready queue either before or after all other threads of the same priority and deadline. A preempted thread is inserted before other threads; a newly ready thread is inserted after other threads. In a multiprocessor environment a different solution must be used, since there are multiple running threads and their relative order must be maintained. We assign a time stamp to each thread when it is inserted into the ready queue. Later, when this thread is selected to run, preempted, and reinserted into the ready queue, this time stamp is used to determine the FCFS order among the threads with the same scheduling attributes.
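The time-stamp rule can be sketched with a sorted array standing in for a ready queue. Everything here is hypothetical (the `rq_entry` type, the `next_stamp` counter and the `rq_*` functions), and the locking a real multiprocessor queue needs is omitted; the point is only that a newly ready thread receives a fresh stamp while a preempted thread is reinserted with the stamp it already has, so it lands ahead of equal-attribute threads that became ready later.

```c
#include <stddef.h>

/* Hypothetical ready-queue entry. Larger priority values mean higher
 * priority; a thread without a deadline would use a very large value. */
typedef struct {
    int           id;
    int           priority;
    long          deadline;
    unsigned long stamp;   /* FCFS key, kept across preemptions */
} rq_entry;

static unsigned long next_stamp = 1;

/* True if a should be ahead of b in the ready queue. */
static int rq_before(const rq_entry *a, const rq_entry *b)
{
    if (a->priority != b->priority)
        return a->priority > b->priority;
    if (a->deadline != b->deadline)
        return a->deadline < b->deadline;
    return a->stamp < b->stamp;
}

/* Insert e into the sorted queue q of length *n (q needs room for one more). */
static void rq_insert(rq_entry *q, size_t *n, rq_entry e)
{
    size_t i = *n;
    while (i > 0 && rq_before(&e, &q[i - 1])) {
        q[i] = q[i - 1];
        i--;
    }
    q[i] = e;
    (*n)++;
}

/* A newly ready thread gets a fresh stamp, so it queues behind equals. */
void rq_make_ready(rq_entry *q, size_t *n, rq_entry e)
{
    e.stamp = next_stamp++;
    rq_insert(q, n, e);
}

/* A preempted thread keeps its original stamp, so it returns ahead of
 * equals that became ready after it did. */
void rq_preempted(rq_entry *q, size_t *n, rq_entry e)
{
    rq_insert(q, n, e);
}

/* Remove and return the head of a non-empty queue. */
rq_entry rq_pop(rq_entry *q, size_t *n)
{
    rq_entry head = q[0];
    for (size_t i = 1; i < *n; i++)
        q[i - 1] = q[i];
    (*n)--;
    return head;
}
```

Unlike the uniprocessor "insert before or after" interface, the stamp makes the order independent of which CPU reinserts the thread or when.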

Stack hand-off is an important optimization technique for the handling of kernel stacks in Mach 3.0 [21]. Among the various optimization techniques explored in that work, stack hand-off provides the most substantial performance gain. When a thread which is sending a message enters the kernel and finds a thread waiting to receive the message, it transfers its kernel stack directly to the receiving thread. The running kernel thread is also changed to the receiving thread so that it can run in the context of the sender's system call before exiting the kernel. Since the receiver inherits the sender's context, both the sender's and the receiver's message processing can be optimized.

In a real-time context, the stack hand-off optimization is obtained at the cost of a potential violation of priority scheduling, which occurs when the receiving thread has a lower priority than other ready threads. In the Mach 3.0 system, since server threads are assigned higher priorities than client threads, the hand-off from a client thread to a server thread should not violate the priority rule. However, when a server thread sends a reply message to its client, the hand-off from the server to the client can cause a violation. Consider the situation where a low priority client thread C sends a request message to a high priority server thread S. Before thread S replies to thread C, a medium priority thread M enters the system. Since thread M has a priority between those of thread C and thread S, thread M should run when thread S blocks. When thread S sends a reply message to thread C, however, there is a hand-off to thread C, causing a violation of priority scheduling. Our solution to this problem is to force each potential stack hand-off to be examined by the scheduler. A hand-off is permitted only if it does not violate the scheduling rules. The performance improvement due to stack hand-off is somewhat reduced; however, this is a minor issue compared to the correctness of the scheduling mechanism.
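A minimal form of that check can be written as a predicate the message path consults before handing its stack to the receiver. The function `handoff_permitted()` is hypothetical and considers priorities only (deadlines are ignored); it assumes that an already-ready thread of equal priority should run first under FCFS, so it requires the receiver to be strictly more urgent than every ready thread. When the predicate fails, the receiver would simply be made ready and the scheduler would choose the next thread in the normal way.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check made before a stack hand-off. ready_priorities holds the
 * priorities of all other ready threads; larger values mean higher priority. */
bool handoff_permitted(int receiver_priority,
                       const int *ready_priorities, size_t n_ready)
{
    for (size_t i = 0; i < n_ready; i++)
        if (ready_priorities[i] >= receiver_priority)
            return false;   /* a ready thread is at least as urgent */
    return true;            /* the receiver would be chosen anyway */
}
```

With the C, M and S threads from the example above (priorities 1, 5 and 10, say), the client-to-server hand-off passes the check while the server's reply to C is refused once M is ready.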

In our implementation of set_thread_sched_attr(), any rescheduling required by changes to scheduling attributes is performed immediately. Mach 3.0 (versions before MK84) uses a function thread_priority() which sets the priority of a thread. However, its algorithm for rescheduling has been optimized for the time-sharing policy. In this algorithm, when the caller lowers its own priority, the scheduler does not check immediately whether another thread of a higher priority should become active. The check is delayed until the expiration of the current scheduling quantum. A similar situation occurs in the r-kernel, where checks for rescheduling are performed only when a quantum expires.

Kernel debugging

It has been much more difficult to change the kernel scheduler than we anticipated. In the r-kernel, this difficulty stems from the unexpectedly tight relationship between the kernel and the user-level thread scheduler. For example, a thread holding a user-level lock will not be preempted by the kernel. This significantly restricted how locks can be used in the scheduler. The existence of a previously undiscovered bug in the upcall mechanism added to the difficulty, as it resulted in many strange and unrepeatable scheduling errors and system crashes.

In Mach, the size and complexity of the scheduler was the major difficulty. One might naively assume that the scheduler would be implemented as a single module with a narrow interface to other kernel components. In fact, scheduling is a fundamental kernel mechanism and has a very wide interface, which is occasionally bypassed. While there are some functions which clearly implement the priority rule and the operations on the ready queues, a large part of the scheduler is implemented by a collection of other routines which block the running thread and select a ready thread to run under a variety of circumstances. Ensuring that all of these correctly support a new scheduling discipline is not a simple task. The existence of optimizations like stack hand-off, which bypass the scheduler completely, further complicates matters.

A major difficulty in modifying the scheduler is determining whether every possible execution path properly obeys the new scheduling policy. Scenarios which are known to be problematic can be verified by carefully constructing a collection of threads which attempt to create a violation of the scheduling rules and report their progress so that violations can be detected. The sound application described later in this paper was invaluable as a scheduling violation detection mechanism.

PERFORMANCE MEASUREMENTS

We have measured the performance of our primitives by running Mach 3.0 with our modifications on both Sun 3 and 486 processors, and running the modified r-kernel on Motorola 88k processors. Table I gives the performance measurements on a single processor. The performance of set_thread_sched_attr() is reported separately for the case where its execution causes a context switch to another thread and the case where it does not.

When the performance of the two kernels is compared, the call set_thread_sched_attr() in the r-kernel outperforms that in the Mach 3.0 kernel* because the r-kernel supports thread scheduling in user space. The differences in the performance of the alarm capability mechanism for the two kernels, on the other hand, are due to the two different implementations that were used in the two kernels.

* Note that the 88k is slower than the 486. The SPECint92 benchmarks are 17.4 and 30.1 for the 88100 (25 MHz, 16K data cache and 16K instruction cache) and the 50 MHz 486 DX, respectively.

Table I. Single processor performance (microseconds)

                                                  Mach 3.0    Mach 3.0    r-kernel
                                                  Sun 3/60        i486        M88k
    set_thread_sched_attr (no context switch)        114.5        16.1        16.1
    set_thread_sched_attr (context switch)           381.8        47.7        67.7
    rthread_create                                  4150.0      1133.3        77.0
    alloc_alarm_pool(1)                             3803.8      1034.1        87.0
    get_alarm & return_alarm                           6.9         1.7        18.0
    set_alarm & reset_alarm                          263.4        39.7        65.6

Table II gives a more detailed description of the performance of set_thread_sched_attr() on the Motorola 88k. Context switches can be triggered by changing priority, by setting a thread's starting time to a later time, or by suspending and resuming threads. The performance of set_thread_sched_attr() has been measured for all three of these types of scheduling attribute change. In the local case, when the running thread raises the priority of another thread, the running thread is preempted, and the performance is the same as on a single CPU. The differences in the performance of set_thread_sched_attr() in the local case are merely due to differences in queuing. In the remote case, the performance is affected by the mechanism which is used to cause a context switch. When the priority of another thread is raised, that thread may need to preempt the thread running on some other CPU, so all remote CPUs are informed of the need to reexamine the set of ready threads and consider preempting their currently running thread. When a running thread suspends itself or sets its starting time to a later time, remote CPUs are not notified and the overhead is somewhat lower.

APPLICATIONS

Our hierarchical scheduling algorithm can be used to schedule real-time application threads directly, or as a means to implement more complex scheduling policies. We first describe a sound application which uses the hierarchical scheduling algorithm directly. Following that, we give examples of how other scheduling policies can be implemented on top of our scheduler.

A sound application

The sensitivity of human perception to slight variations in rhythmic patterns has made precise control over the timing of sound an important consideration in multimedia systems. In order to experiment with real-time requirements, we have developed an application which plays a sound sequence according to user data. This sound sequence is periodic, but does not necessarily have a constant period, since a single stream can be the result of merging multiple sound streams together. Our implementation includes a sound thread which controls the timing of the sound sequence and a thread which handles user input. The sound thread runs at a higher priority than both the user input thread and all other applications so that the timing of the sound sequence is precise. After playing a note, the sound thread changes its starting time to the time when the next note should be played, by calling set_thread_sched_attr(). The mutual exclusion required between the two threads to prevent corruption of shared data can be obtained by using priority in a uniprocessor environment, or by using locks.
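The following sketch shows how a sequence that is "periodic but without a constant period" can arise from merging streams, and what list of absolute starting times the sound thread would step through. The helpers `periodic_times()` and `merge_streams()` are hypothetical, times are in arbitrary units, and notes that coincide are kept as separate entries; after each note the sound thread would pass the next entry to set_thread_sched_attr() as its new starting time.

```c
#include <stddef.h>

/* Absolute times of n notes of one stream with a fixed period. */
void periodic_times(long first, long period, size_t n, long *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = first + (long)i * period;
}

/* Merge two sorted streams of absolute note times into one sorted sequence.
 * out must have room for na + nb entries. Returns the merged length. */
size_t merge_streams(const long *a, size_t na,
                     const long *b, size_t nb, long *out)
{
    size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb) {
        if (a[i] <= b[j])
            out[k++] = a[i++];
        else
            out[k++] = b[j++];
    }
    while (i < na)
        out[k++] = a[i++];
    while (j < nb)
        out[k++] = b[j++];
    return k;
}
```

Merging a stream with period 300 and one with period 400, both starting at 0, yields note times 0, 0, 300, 400, 600, 800, 900, whose gaps vary even though each input is strictly periodic. Keeping the schedule in absolute times also means a late wake-up does not push back every later note.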

Table II. Multiprocessor performance (microseconds)

                                   local    remote
    raise/lower priority           134.3     214.7
    set/reset starting time        127.3     165.2
    suspend/resume                 104.3     137.6

Multimedia systems handle temporary resource congestion differently from other real-time applications, since human perception plays an important role [22]. If a synthesizer has reached its capacity to synthesize simultaneous voices, some voices may be omitted without being noticed. When resource congestion is predicted, the pace of sound generation and video replay may be scaled. The need for an experimental approach to human perception makes it important for the kernel to provide flexible scheduling. The primitive set_thread_sched_attr() and the alarm capability mechanism enable experimentation with solutions for resource congestion. For example, a high priority alarm thread can be used to abort activities which exceed their deadlines, while a low priority one can maintain statistics or log errors.

This sound application has been very useful for discovering problems in the implementation of the scheduler. In our experiments, a variation of as little as 20 milliseconds in the timing of synthesized sound is very noticeable. We discovered several violations of FCFS scheduling by creating a periodic thread (C) at the same priority as our sound thread (S). Since both C and S are periodic and our scheduler is FCFS within a priority level, the sound sequence that S wants to play is distorted by C, because no notes can be played while C is running. Supposing that S plays a note every t time units and C consumes 1.5t time units of processor time each time it runs, we expect the situation in Figure 3: when thread C runs, a long silent period occurs. This silent period should be even longer if thread C is ever preempted by a thread of a higher priority. However, when the FCFS rule is violated, as shown in Figure 4, there are two shorter silent periods instead. The result is a striking variation in the rhythm.

Figure 3. Expected sound sequence (priority versus time for sound thread S and thread C)

Figure 4. Abnormal sound sequence (priority versus time for sound thread S and thread C)

Supporting other scheduling policies

One of the major goals of this work is to provide a kernel scheduler interface that is sufficiently general to permit more complex resource scheduling mechanisms to be implemented in user-level schedulers. This section describes how to use our interface to implement several popular CPU and resource scheduling policies.

The kernel scheduler implements EDF and preemptive fixed priority scheduling, and so directly supports many real-time applications which require either of these scheduling policies, including any based on the Rate Monotonic or Deadline Monotonic priority assignment rules, since these require only priority scheduling.
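Rate Monotonic assigns higher priorities to threads with shorter periods, and Deadline Monotonic does the same using shorter relative deadlines, so a user-level library only has to compute the priorities once and hand them to the kernel. The helper `rate_monotonic_priorities()` below is hypothetical; it assumes that larger numbers mean higher priority and that the levels `base` to `base + n - 1` are valid for the kernel, and it breaks ties between equal periods by index so that every thread gets a distinct level. Passing relative deadlines instead of periods gives the Deadline Monotonic assignment.

```c
#include <stddef.h>

/* Shorter period => higher priority. prio[i] receives base plus the number of
 * threads that must run at a lower priority than thread i. */
void rate_monotonic_priorities(const long *periods, size_t n, int base, int *prio)
{
    for (size_t i = 0; i < n; i++) {
        size_t lower = 0;
        for (size_t j = 0; j < n; j++)
            if (periods[j] > periods[i] || (periods[j] == periods[i] && j > i))
                lower++;   /* thread j gets a lower priority than thread i */
        prio[i] = base + (int)lower;
    }
}
```

Each resulting priority would then be installed with set_thread_sched_attr(); the kernel's fixed-priority rule does the rest at run time.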

The priority inversion problem occurs when threads of different priorities are ordered by the FCFS rule in their access to a shared resource. This shared resource may be a critical section, lock, semaphore, communication primitive or hardware resource. Since these resources cannot be preempted, a high priority thread may have to wait for a low priority thread to finish using a shared resource before it can continue. Priority inversion becomes a problem when a low priority thread which holds an FCFS resource is preempted by a medium priority thread which does not need this resource. The high priority thread is then blocked until the medium priority thread completes and allows the low priority thread to run and relinquish the resource.

Priority inheritance protocols [23] are designed to bound blocking time in accordance with thread priority. In the basic priority inheritance protocol, a thread which holds a resource inherits the highest priority of the threads waiting for the resource. Moreover, the waiting threads are queued in descending order of their priorities. This strategy guarantees that a thread can only be blocked by higher priority threads and at most one lower priority thread each time it requests a resource.
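The two ingredients of the basic protocol, a wait queue ordered by descending priority and an effective priority for the holder, fit in a few lines. This is a hypothetical sketch: `pi_resource`, `PI_MAX_WAITERS`, `pi_enqueue()` and `pi_effective_priority()` are illustrative names, only priorities are stored (thread identities are omitted for brevity), and larger values mean higher priority.

```c
#include <stddef.h>

#define PI_MAX_WAITERS 16

/* Hypothetical resource record: the holder's own (base) priority and the
 * priorities of waiting threads, kept in descending order. */
typedef struct {
    int    holder_base;
    int    waiters[PI_MAX_WAITERS];
    size_t n_waiters;
} pi_resource;

/* Queue a waiter in descending priority order; equal priorities stay FCFS.
 * Returns 0 on success, -1 if the queue is full. */
int pi_enqueue(pi_resource *r, int prio)
{
    if (r->n_waiters == PI_MAX_WAITERS)
        return -1;
    size_t i = r->n_waiters;
    while (i > 0 && r->waiters[i - 1] < prio) {
        r->waiters[i] = r->waiters[i - 1];
        i--;
    }
    r->waiters[i] = prio;
    r->n_waiters++;
    return 0;
}

/* Basic inheritance: the holder runs at the higher of its own priority and
 * the priority of the first (highest priority) waiter. */
int pi_effective_priority(const pi_resource *r)
{
    if (r->n_waiters > 0 && r->waiters[0] > r->holder_base)
        return r->waiters[0];
    return r->holder_base;
}
```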

Besides the basic scheme described above, there are other priority inheritance protocols which consider more complicated situations involving multiple resources and need more information about the resources. For example, in the priority ceiling protocol the priority ceiling of a resource is defined as the highest priority of any thread which may access this resource.
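As a sketch of the extra information such protocols need, the following C fragment computes each resource's priority ceiling from a static table recording which threads may use which resources. It also applies the usual locking test of the priority ceiling protocol: a thread may lock a resource only if its priority is strictly higher than the ceilings of all resources currently locked by other threads. The table layout, the fixed resource count and the larger-is-higher priority convention are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { NRES = 4, NO_PRIO = -1, FREE = -1 };

/* Ceiling of resource r: the highest priority (larger = higher) of any
 * thread t with uses[t][r] set, or NO_PRIO if no thread may use it. */
int priority_ceiling(const int prio[], bool uses[][NRES], size_t nthreads, int r)
{
    assert(r >= 0 && r < NRES);
    int ceiling = NO_PRIO;
    for (size_t t = 0; t < nthreads; t++)
        if (uses[t][r] && prio[t] > ceiling)
            ceiling = prio[t];
    return ceiling;
}

/* Priority ceiling locking test: `self` may lock a free resource only if
 * its priority exceeds the ceiling of every resource currently locked by
 * another thread. locked_by[r] is the locking thread, or FREE. */
bool pcp_may_lock(int self, int self_prio, const int ceilings[NRES],
                  const int locked_by[NRES])
{
    for (int r = 0; r < NRES; r++)
        if (locked_by[r] != FREE && locked_by[r] != self &&
            self_prio <= ceilings[r])
            return false;
    return true;
}
```

The ceilings depend only on the static table, so they can be computed once when the threads are created; only the locking test runs on each request.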

The primitive set_thread_sched_attr() can be used to implement any priority inheritance protocol. Two library calls are needed: one called before obtaining a resource, and the other
called when relinquishing it. The data structures describing which thread holds the resource and which threads are waiting for it are maintained by these two library calls. To ensure the integrity of these data structures, the library calls must not be executed by more than one thread at the same time. This can be achieved either by raising the priority of the caller, in the uniprocessor case, or by using a lock. The call made before obtaining the resource raises the priority of the thread holding the resource if necessary; the call made on relinquishing the resource restores the priority of the caller. Because the priority inheritance protocols are implemented in the user library in this approach, the implementation can be tailored to meet the special needs of the application without any change to the kernel.
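A minimal C sketch of the two library calls follows. It assumes a single resource, a thread that holds at most one resource at a time, and a spin lock (the lock-based option above) to serialize the calls. `hyp_set_thread_sched_attr()` is a hypothetical stand-in for the kernel primitive that only records the new priority, since the real primitive's signature is not reproduced here; actually blocking and waking the caller is also left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { MAXT = 8, NONE = -1 };

static int base_prio[MAXT];     /* each thread's own priority (larger = higher) */
static int current_prio[MAXT];  /* priority the kernel is currently using */

/* Hypothetical stand-in for the kernel primitive; only records the priority. */
static void hyp_set_thread_sched_attr(int thread, int prio)
{
    current_prio[thread] = prio;
}

/* Bookkeeping shared by the two library calls for one resource. */
struct pi_state {
    atomic_flag guard;          /* lets only one thread run the calls at a time */
    int holder;                 /* thread holding the resource, or NONE */
    bool waiting[MAXT];         /* threads waiting for the resource */
};

static void guard_enter(struct pi_state *s)
{
    while (atomic_flag_test_and_set(&s->guard)) {
        /* spin until the other caller leaves */
    }
}

static void guard_exit(struct pi_state *s) { atomic_flag_clear(&s->guard); }

/* Call made before obtaining the resource. Returns true if the caller now
 * holds it; otherwise the caller is recorded as waiting, the holder's
 * priority is raised if necessary, and the caller should block (not shown). */
bool pi_before_obtain(struct pi_state *s, int self)
{
    bool granted = false;
    guard_enter(s);
    if (s->holder == NONE) {
        s->holder = self;
        granted = true;
    } else {
        s->waiting[self] = true;
        if (base_prio[self] > current_prio[s->holder])
            hyp_set_thread_sched_attr(s->holder, base_prio[self]);
    }
    guard_exit(s);
    return granted;
}

/* Call made on relinquishing the resource: restore the caller's own
 * priority and pass the resource to the highest-priority waiter (ties go
 * to the lower thread number), which inherits from any remaining waiters.
 * Returns the new holder, or NONE. */
int pi_on_relinquish(struct pi_state *s, int self)
{
    guard_enter(s);
    assert(s->holder == self);
    hyp_set_thread_sched_attr(self, base_prio[self]);
    int next = NONE;
    for (int t = 0; t < MAXT; t++)
        if (s->waiting[t] && (next == NONE || base_prio[t] > base_prio[next]))
            next = t;
    if (next != NONE) {
        s->waiting[next] = false;
        for (int t = 0; t < MAXT; t++)
            if (s->waiting[t] && base_prio[t] > current_prio[next])
                hyp_set_thread_sched_attr(next, base_prio[t]);
    }
    s->holder = next;
    guard_exit(s);
    return next;
}
```

On a uniprocessor the spin lock could be replaced by raising the caller's priority for the duration of each call, as the text notes. Because everything lives in the library, switching to another protocol means changing only these two functions.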

In some scheduling schemes, an off-line scheduling algorithm is used to produce a static schedule for threads of critical importance. 24,25 The static schedule specifies the starting time of each thread and the resources the thread is allocated. In our approach, this can be implemented by using the primitive set_thread_sched_attr() to set the starting times of the critical threads. Some on-line schedulers which perform schedulability checks are designed to allocate unused resources to other less critical threads. 26 These schedulers can be assigned a lower priority than the critical threads.
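The following C sketch shows how an off-line table might be installed. The `struct slot` layout, the overlap check and `hyp_set_start_and_priority()` (a hypothetical stand-in for the set_thread_sched_attr() primitive that only records the attributes) are assumptions. Producing the table itself is the job of an off-line algorithm such as those cited, and is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MAX_THREADS = 16 };

/* One entry of a pre-computed (off-line) schedule, in clock ticks. */
struct slot { int thread; int start; int wcet; };

/* What the sketch records for each thread in place of calling the kernel. */
struct sched_attr { int start_time; int priority; };
static struct sched_attr attrs[MAX_THREADS];

/* Hypothetical stand-in for set_thread_sched_attr(). */
static void hyp_set_start_and_priority(int thread, int start, int prio)
{
    assert(thread >= 0 && thread < MAX_THREADS);
    attrs[thread].start_time = start;
    attrs[thread].priority = prio;
}

/* Install a static schedule whose slots are sorted by start time. A table
 * with overlapping slots is rejected and nothing is installed. Every
 * critical thread gets critical_prio, which should exceed the priority
 * given to any on-line scheduler of less critical threads. */
bool install_static_schedule(const struct slot *tab, size_t n, int critical_prio)
{
    for (size_t i = 1; i < n; i++)
        if (tab[i - 1].start + tab[i - 1].wcet > tab[i].start)
            return false;
    for (size_t i = 0; i < n; i++)
        hyp_set_start_and_priority(tab[i].thread, tab[i].start, critical_prio);
    return true;
}
```

Giving every critical thread the same high priority, and the on-line scheduler for less critical threads a lower one, means the on-line scheduler can only use processor time that the static schedule leaves idle.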

The EDF scheduling algorithm is used by real-time systems because it makes optimal use of the CPU: on a uniprocessor, any set of independent preemptible threads that can meet all its deadlines under some scheduling algorithm also meets them under EDF. Many variations of EDF scheduling can be achieved by utilizing the other features of our scheduler. If non-preemptive EDF scheduling is desired, a thread can raise its priority during its initialization to become non-preemptable. If strict deadlines are desired, so that a thread is suspended or aborted when it reaches its deadline without having completed, the alarm capability mechanism can be used. An alarm thread scheduled to run at the deadline with a higher priority can suspend or abort the thread which has exceeded its deadline.
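To illustrate how these variations change the outcome, the following C sketch simulates EDF on one CPU in unit time steps. The `nonpreemptive` flag models a thread that has raised its priority so that it cannot be preempted, and the `strict` flag models an alarm thread that aborts any thread still unfinished at its deadline. Both are simplifications assumed for illustration, not the paper's mechanisms.

```c
#include <assert.h>
#include <stdbool.h>

enum { JOB_PENDING, JOB_DONE, JOB_ABORTED };

/* Hypothetical job record; all times are absolute, in clock ticks. */
struct job {
    int release, exec, deadline;
    int done_units;   /* CPU units received so far */
    int state;        /* JOB_PENDING, JOB_DONE or JOB_ABORTED */
};

/* Simulate EDF on one CPU for `horizon` ticks and return the number of
 * jobs that completed. `strict`: at the start of each tick, any pending
 * job whose deadline has arrived is aborted (the alarm thread).
 * `nonpreemptive`: a job that has started keeps the CPU until it
 * completes or is aborted. */
int run_edf(struct job *jobs, int n, int horizon, bool strict, bool nonpreemptive)
{
    assert(n >= 0 && horizon >= 0);
    int running = -1, completed = 0;
    for (int now = 0; now < horizon; now++) {
        if (strict)
            for (int i = 0; i < n; i++)
                if (jobs[i].state == JOB_PENDING && now >= jobs[i].deadline) {
                    jobs[i].state = JOB_ABORTED;
                    if (running == i)
                        running = -1;
                }
        if (!(nonpreemptive && running >= 0)) {
            running = -1;                      /* pick the earliest deadline */
            for (int i = 0; i < n; i++)
                if (jobs[i].state == JOB_PENDING && jobs[i].release <= now &&
                    (running < 0 || jobs[i].deadline < jobs[running].deadline))
                    running = i;
        }
        if (running < 0)
            continue;                          /* CPU idle this tick */
        if (++jobs[running].done_units == jobs[running].exec) {
            jobs[running].state = JOB_DONE;
            completed++;
            running = -1;
        }
    }
    return completed;
}
```

For example, with jobs (release, execution time, deadline) of (0, 4, 10), (1, 2, 4) and (2, 3, 6), preemptive EDF with strict deadlines completes all three jobs, whereas non-preemptive EDF with strict deadlines completes only the first: the other two reach their deadlines unfinished and are aborted.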

CONCLUSION

The goal of our real-time scheduling abstraction is to provide architectural support for the implementation of application-level scheduling policies. In our design, an on-line hierarchical scheduler provides the basic scheduling support. More complex scheduling mechanisms can easily be implemented by cooperating threads which assign and change the scheduling attributes of other threads. The flexibility of our scheduling abstraction is particularly useful to applications with both dynamically changing and evolving timing requirements.

We have implemented our abstraction using the Mach 3.0 kernel and the r-kernel as development platforms. The lessons we have learned from these implementations have strengthened our belief in the importance of a flexible scheduling interface. Successful modification of the thread scheduler requires detailed knowledge about the kernel implementation, as evidenced by the pitfalls we encountered. Such knowledge should not be required of an application programmer who needs to implement an application-level scheduler.

The experience we have gained by both building and using real-time schedulers has led us to believe very strongly that flexibility is the key to usability. Kernel designers must ensure that the primitives they provide can be used to satisfy the evolving demands of applications. Application programmers must be aware that the real-time requirements of applications will change over the lifetime of the application, and must avoid tying the implementation to a single fixed scheduler which cannot be extended. By providing a flexible scheduling interface, our design makes it possible for the application programmer to build the application-level scheduler on top of a stable thread scheduler and a reliable kernel. This approach is advantageous to both embedded systems and general-purpose operating systems.

REFERENCES

1. H. Tokuda, J. W. Wendorf and H. Y. Wang, ‘Implementation of a time-driven scheduler for real-time operating systems’, Proceedings of the IEEE Real-Time Systems Symposium, December 1987, pp. 271–280.

2. H. Tokuda, T. Nakajima and P. Rao, ‘Real-time Mach: towards a predictable real-time system’, Proceedings of the USENIX Mach Workshop, Burlington, VT, October 1990, pp. 73–82.

3. D. D. Clark, ‘The structuring of systems using upcalls’, Communications of the ACM, 28(12), 171–180 (December 1985).

4. T. E. Anderson, B. N. Bershad, E. D. Lazowska and H. M. Levy, ‘Scheduler activations: Effective kernel support for the user-level management of parallelism’, Proceedings of the Thirteenth ACM Symposium on Operating System Principles, October 1991, pp. 95–109.

5. Siu Ling Ann Lo, N. C. Hutchinson and S. T. Chanson, ‘Architectural considerations in the design of real-time kernels’, Proceedings of the IEEE Real-Time Systems Symposium, December 1993, pp. 138–147.

6. D. L. Black, D. B. Golub, D. P. Julin, R. F. Rashid, R. P. Draves, R. W. Dean, A. Forin, J. Barrera, H. Tokuda, G. Malan and D. Bohman, ‘Microkernel operating system architecture and Mach’, USENIX Workshop Proceedings of Micro-kernels and other Kernel Architectures, April 1992, pp. 11–30.

7. D. Stuart Ritchie and G. W. Neufeld, ‘User level IPC and device management in the Raven kernel’, USENIX Workshop Proceedings of Micro-kernels and other Kernel Architectures, September 1993, pp. 111–125.

8. C. W. Mercer and H. Tokuda, ‘The ARTS real-time object model’, Proceedings of the IEEE Real-Time Systems Symposium, December 1990, pp. 2–9.

9. T. Nakajima, T. Kitayama, H. Arakawa and H. Tokuda, ‘Integrated management of priority inversion in real-time Mach’, Proceedings of the IEEE Real-Time Systems Symposium, December 1993, pp. 120–130.

10. J. A. Stankovic and K. Ramamritham, ‘The Spring kernel: A new paradigm for real-time systems’, IEEE Software, 62–72 (May 1991).

11. D. Hildebrand, ‘An architectural overview of QNX’, Proceedings of the USENIX Workshop on Micro-Kernels and Other Kernel Architectures, April 1992.

12. J. Blazewicz, ‘Scheduling dependent tasks with different arrival times to meet deadlines’, in W. Gelenbe (ed.), Modelling and Performance Evaluation of Computer Systems, North-Holland, 1976, pp. 57–65.

13. C. L. Liu and J. W. Layland, ‘Scheduling algorithms for multiprogramming in a hard real-time environment’, Journal of the Association for Computing Machinery, 20(1), 46–61 (January 1973).

14. J. Y. T. Leung and J. Whitehead, ‘On the complexity of fixed-priority scheduling of periodic, real-time tasks’, Performance Evaluation, 2(4), 237–250 (December 1982).

15. M. Joseph and P. Pandya, ‘Finding response times in a real-time system’, Computer Journal, 29(5), 390–395 (1986).

16. P. K. Harter, ‘Response times in level-structured systems’, ACM Transactions on Computer Systems, 5(3), 232–248 (August 1987).

17. J. P. Lehoczky, ‘Fixed priority scheduling of periodic task sets with arbitrary deadlines’, Proceedings of the IEEE Real-Time Systems Symposium, December 1990, pp. 201–209.

18. M. Gonzalez Harbour, M. H. Klein and J. P. Lehoczky, ‘Fixed priority scheduling of periodic tasks with varying execution priority’, Proceedings of the IEEE Real-Time Systems Symposium, December 1991, pp. 116–128.

19. S. Ramos-Thuel and J. K. Strosnider, ‘The transient server approach to scheduling time-critical recovery operations’, Proceedings of the IEEE Real-Time Systems Symposium, December 1991, pp. 286–295.

20. K. Jeffay and D. Stone, ‘Accounting for interrupt handling costs in dynamic priority task systems’, Proceedings of the IEEE Real-Time Systems Symposium, December 1993, pp. 212–221.

21. R. P. Draves, B. N. Bershad, R. F. Rashid and R. W. Dean, ‘Using continuations to implement thread management and communication in operating systems’, Proceedings of the Thirteenth ACM Symposium on Operating System Principles, October 1991, pp. 122–136.

22. S. Flinn, ‘Timing and synchronization of sound and image’, University of British Columbia, 1993.

23. R. Rajkumar, Synchronization in Real-Time Systems: A Priority Inheritance Approach, Kluwer Academic, 1991.

24. T. Shepard and J. A. Martin Gagne, ‘A pre-run-time scheduling algorithm for hard real-time systems’, IEEE Transactions on Software Engineering, 17(7), 669–677 (July 1991).

25. J. Xu and D. L. Parnas, ‘Scheduling processes with release times, deadlines, precedence and exclusion relations’, IEEE Transactions on Software Engineering, 16(3), 360–369 (March 1990).

26. W. Zhao, K. Ramamritham and J. A. Stankovic, ‘Scheduling tasks with resource requirements in hard real-time systems’, IEEE Transactions on Software Engineering, 13(5) (May 1987).