
Online Energy-Efficient Hard Real-Time Scheduling for ...adt.cs.upb.de/wolfgang/isorc2012.pdf · with a computational complexity of O(1) at each scheduling point, which provides



Online Energy-Efficient Hard Real-Time Scheduling for Component Oriented Systems

Da He
University of Paderborn/C-LAB

Paderborn, Germany
Email: [email protected]

Wolfgang Mueller
University of Paderborn/C-LAB

Paderborn, Germany
Email: [email protected]

Abstract—Energy efficiency has become one of the most important concerns in mobile electronic systems design, with mandatory requirements for low energy consumption, long battery life and low heat dissipation. Dynamic Power Management (DPM) and Dynamic Voltage and Frequency Scaling (DVFS or DVS) are two widely applied system level techniques to conserve system-wide power consumption. In the context of hard real-time systems, however, DPM and DVS have to be used with great caution with respect to timing constraints. In this article, we study the combined application of DPM and DVS to component oriented systems with hard real-time tasks and propose a simulated annealing based optimization algorithm together with its online execution at constant complexity per scheduling point. Additionally, our approach considers multiple low power states (sleep states) with non-negligible switching overhead. The experimental results show that our approach achieves a nearly optimal solution.

Keywords—Dynamic Power Management; Dynamic Voltage and Frequency Scaling; Hard Real-Time Systems; Online

I. INTRODUCTION

Energy conservation has gained significant attention in electronic systems design. Low power consumption, long battery life and low heat dissipation have become mandatory development requirements and objectives to reduce system operation costs. However, due to the rapid growth of system complexity and the continuous advancement of deep submicron process technology towards nanoscale circuits, power management is becoming ever more challenging. From the system level point of view, two widely accepted power reduction techniques, Dynamic Power Management (DPM) and Dynamic Voltage and Frequency Scaling (DVFS or DVS), can be applied to efficiently reduce the system-wide energy consumption, which is defined as the total energy consumption of all components in a component oriented system. Hereby we mainly refer to hardware components, e.g. processor, memory or peripheral devices, and we define a component as power manageable if it can be managed by DPM or DVS.

The main idea behind DPM is to shut down a component (switch it to a sleep state) when it is idle and wake it up when required. If we could ignore the power state switching overhead, a component could be switched to a sleep state whenever it is idle. In practice, however, the retention of register contents and the stabilization of the power supply during a state switch incur both energy and time costs. Therefore, [1] introduced the concept of break even time to capture this issue: the deeper the sleep state, the more power it saves, but the longer the break even time it requires. In this article we consider multiple sleep states with non-negligible state switching overhead. In contrast to DPM, the DVS technique is applied while the component is working and slows the component down by lowering the voltage. If only dynamic power consumption is considered, the energy consumption in a time interval is a convex and increasing function of speed (frequency). However, since leakage power becomes dominant in nanoscale circuits, the energy consumption becomes merely convex, no longer monotonically increasing. [2] defined the critical speed to cover this aspect, i.e. no task should ever run below this speed.
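To make the notion of a critical speed concrete, the following sketch minimizes the energy for a fixed workload over frequency under a simple model E(f) = (a·f³ + P_leak)·(C/f). The power parameters are illustrative values of ours, not taken from the paper.

```python
# Energy to execute `cycles` cycles at normalized frequency f, assuming cubic
# dynamic power a*f^3 plus a constant leakage power p_leak (illustrative values).
def energy(f, a=1.0, p_leak=0.25, cycles=1.0):
    return (a * f**3 + p_leak) * (cycles / f)

# The critical speed is the frequency minimizing this energy; for this model
# it is analytically (p_leak / (2*a)) ** (1/3).
f_grid = [i / 1000 for i in range(100, 1001)]
f_crit = min(f_grid, key=energy)
# Running any slower than f_crit only increases the total energy,
# because the leakage term P_leak * C / f grows as f shrinks.
```

With the values above the minimum lies at f_crit = 0.5, illustrating why slowing a task below the critical speed is counterproductive.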

Both DPM and DVS save energy at the cost of performance. Therefore, they have to be used with great caution in hard real-time systems. A substantial body of research addresses this problem. However, most of it concentrates either on the DPM strategy alone [3], [4] or on the DVS strategy alone [5], [6], [7], [8], [9]. Only some recent studies [10], [2], [11] have reported that, once the system-wide energy consumption is considered, DPM and DVS work against each other. In particular, the DPM strategy tries to run the tasks as fast as possible, so that more idle time can be utilized for sleeping; however, this requires increasing the processor speed and thus its energy consumption. In contrast, DVS tries to slow down the components; this decreases the idle time and reduces the opportunities to shut them down. In fact, [12] has shown the NP-hardness of this problem. In the literature, the system-wide energy minimization problem for component oriented real-time systems has not been sufficiently studied, especially under non-negligible state switching overhead. Our approach applies the Simulated Annealing (SA) heuristic, which is fairly simple but quite efficient. Furthermore, with respect to DPM/DVS based energy-aware real-time scheduling, there are online approaches and offline approaches. Clearly, an online version is better suited in terms of flexibility, because it can adapt to system changes. However, the existing online approaches in the literature often concentrate only on dynamic slack, which stems from tasks completing earlier than their worst case execution time, while the sophisticated techniques exploring static slack are only considered offline.


Our approach provides a unified online solution for exploring both static slack and dynamic slack. In addition, it comes with a computational complexity of O(1) at each scheduling point, which is a significant advantage over other approaches that usually run at least in polynomial time. Even logarithmic complexity can be an obstacle to wide acceptance, as is the case with the O(log n) cost of EDF scheduling in practical applications.

In summary, our approach makes two main contributions: i) the SA based algorithm and ii) its online execution. Since we are interested in component oriented systems, the system-wide energy consumption is of utmost importance. We assume that the system is composed of a single processor with DPM and DVS capabilities and multiple peripheral devices with only DPM capability. In addition, our approach is not constrained to a fixed real-time schedule; both Earliest Deadline First (EDF) and Rate Monotonic (RM) can be applied.

The remainder of this article is organized as follows. Section II gives a brief outline of the state of the art in energy efficient scheduling for hard real-time systems. Section III introduces the system model based on the Advanced Configuration & Power Interface (ACPI) and formally states the problem. In Sections IV and V, we propose the simulated annealing based algorithm and its online execution. Section VI presents the experimental results obtained through simulation, and Section VII concludes.

II. RELATED WORK

There has been extensive research on applying DPM and DVS in real-time systems. Traditionally, DPM and DVS are considered separately. DVS approaches primarily focus on reducing the power consumption of processors. One of the earliest optimal offline DVS algorithms with polynomial complexity was proposed by Yao et al. [5]. The work by Aydin et al. [6] provides a power aware DVS algorithm composed of both offline and online parts: in the offline phase the optimal frequency allocation is determined, while in the online phase a dynamic slack reclaiming mechanism and a speculative speed adjustment further reduce the processor power consumption. In [7] the authors proposed an online intra-task DVS algorithm in combination with EDF to exploit dynamic slack with a computational complexity of O(1); however, a continuous processor frequency range is assumed. Kim et al. [8] compared various hard real-time DVS algorithms in terms of energy efficiency and performance under a unified framework. [9] introduced a fully polynomial time approximation algorithm. None of the above mentioned works, however, considered DPM with non-negligible state switching overhead. DPM related works usually target device scheduling. In [3] an online DPM algorithm in conjunction with EDF is proposed, where tasks are procrastinated as much as possible to create large device idle intervals. The work by Swaminathan et al. [4] proposed an offline optimal device scheduling algorithm for hard real-time systems based on pruning techniques, together with a heuristic search algorithm that finds near optimal solutions.

Only recently has the relationship between DPM and DVS attracted more attention in the context of system-wide energy efficient real-time scheduling. Generally, DPM and DVS are applied to devices and processors, respectively. Devadas et al. [10] studied the exact interplay of DPM and DVS; however, their focus is on a frame based task model, where all tasks share a common period/deadline. The work of Jejurikar et al. [2] introduced the concept of critical speed: a task should never run below the critical speed, otherwise the power consumption increases. However, they ignored the DPM state switching overhead. Other related works are [12] and [13], but neither is fully online. The authors of [12] proposed an approach composed of an offline and an online component. In [13] the proposed algorithm is only semi-online, because the system-wide optimal processor speed is computed offline; it can handle dynamic task changes but no device changes. Besides, [12] and [13] are tightly coupled with a fixed real-time schedule, EDF. Niu [11] considered a DPM and DVS capable processor and DPM capable devices; however, the proposed algorithm is not fully online either.

III. PRELIMINARIES

Before we propose the algorithm, this section formally describes the problem and the system model, which is composed of a processor power model, a device power model and a task model. We define the power model of the processor and the devices following the ACPI recommendation, an open industry standard aiming at unifying the HW/SW interface for processor/device configuration and power management. ACPI enables operating system directed power management and provides a platform independent interface description. More specifically, we adopt the concept of C-states and P-states to describe the power states of the processor and D-states for the devices. The C-states comprise one active state and multiple low power (sleep) states of different sleep depth. The P-states describe different performance states while the processor is running; mainly they differ in operating frequency and power consumption. The D-states concept is similar to the C-states, but applied to devices.

A. Processor Power Model

We denote a set of processor power states as C0, C1, ..., Cc, where C0 is the only working state, i.e. the processor can only execute tasks in this state, and C1, C2, ..., Cc are low power states (sleep states) in non-increasing order of power consumption. These states are known as C-states in ACPI. For all i with 1 ≤ i ≤ c we define P(Ci), Lon→off(Ci), Loff→on(Ci), Pon→off(Ci) and Poff→on(Ci) as follows:
• P(Ci) is the power consumption of the state Ci.
• Lon→off(Ci) is the latency for state switching from C0 to Ci.
• Loff→on(Ci) is the latency for state switching from Ci to C0.
• Pon→off(Ci) is the power consumption for state switching from C0 to Ci.


• Poff→on(Ci) is the power consumption for state switching from Ci to C0.

For a given processor power model we can easily compute the break even time of each low power state, denoted BE(Ci) [1]. For the working state C0 we further define power sub-states, known as performance states (P-states) in ACPI. We denote them as S1, S2, ..., Ss, where S1 is the full performance state (with maximal voltage/frequency) and the remaining states are sorted in non-increasing order of power consumption. For all i with 1 ≤ i ≤ s we denote by F(Si) and P(Si) the corresponding frequency (normalized with respect to S1) and the power consumption, respectively. The overhead for switching among different P-states is accounted for in the task worst case execution times.
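As an illustration of how BE(Ci) follows from the model parameters, the sketch below balances the cost of entering and leaving a sleep state against staying active, in the spirit of [1]. The function and parameter names are ours, and the staying-active power is taken as an explicit argument.

```python
def break_even(p_active, p_sleep, l_down, l_up, p_down, p_up):
    """Minimum idle length for which entering the sleep state pays off.

    p_active: power while remaining in the active state
    p_sleep:  power in the sleep state
    l_down/l_up:  latencies for switching off and back on
    p_down/p_up:  power drawn during those transitions
    """
    e_switch = p_down * l_down + p_up * l_up
    l_total = l_down + l_up
    # Sleeping over an idle interval of length t costs
    #   e_switch + p_sleep * (t - l_total),
    # while staying active costs p_active * t; equate and solve for t.
    t = (e_switch - p_sleep * l_total) / (p_active - p_sleep)
    # The idle interval must at least cover the transition latencies.
    return max(t, l_total)
```

For example, with an active power of 2 W, a zero-power sleep state and 0.1 s / 3 W transitions in each direction, the break even time works out to 0.3 s.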

B. Device Power Model

We denote a set of devices as R1, R2, ..., Rm. For each device we define the power states following the D-states concept of the ACPI specification. We denote by Di,j the j-th power state of the device Ri. Di,0 is the only active state and the remaining states are low power states (sleep states) in non-increasing order of power consumption. Furthermore, for the device power state Di,0 we denote the power consumption as P(Di,0). Similarly to the processor power model, for each low power state Di,j with j > 0 we define P(Di,j), Lon→off(Di,j), Loff→on(Di,j), Pon→off(Di,j) and Poff→on(Di,j) as the power consumption and the switching overhead of the state Di,j. The break even time of each low power state Di,j with j > 0 is denoted as BE(Di,j). In the following text a component refers to either the processor or a device.

C. Task Model

Since we are interested in hard real-time systems with periodic tasks, we adopt the classical real-time task model. The task set (independent tasks) is denoted as Γ = {τ1, τ2, ..., τn}, with W(τi) denoting the Worst Case Execution Time (WCET) under maximal processor speed, T(τi) denoting the relative deadline (equal to the period) of the task and Dev(τi) denoting the set of components required by the task execution, i.e. Dev(τi) ⊆ {processor} ∪ {R1, R2, ..., Rm}. Note that the processor is required by all tasks. The hyper period, denoted HP, is the least common multiple of all task periods.
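The task model above can be sketched as a small data structure; the task names, WCETs, periods and device sets are illustrative values of ours, not taken from the paper.

```python
from math import lcm

# Illustrative task set: WCET at maximal processor speed, period (= relative
# deadline) and required devices. The processor is implicitly required by all.
tasks = {
    "t1": {"wcet": 2, "period": 10, "devices": {"R1"}},
    "t2": {"wcet": 3, "period": 20, "devices": set()},
}

# The hyper period HP is the least common multiple of all task periods.
HP = lcm(*(t["period"] for t in tasks.values()))
```

With the periods 10 and 20 chosen here, HP is 20; harmonic periods like these keep the hyper period short, which matters later for the event recording.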

D. Problem Formulation

Before formulating the problem, we first introduce several definitions.

Definition 1. Given a system model as described previously, a configuration is defined as a mapping from the task set to the available P-states of the processor, denoted conf : Γ → {S1, S2, ..., Ss}.

Definition 2. Given a configuration and a real-time schedule (e.g. EDF or RM), the configuration is valid if all tasks meet their deadlines.

In this work, we assume that the task execution time scales inversely with the processor frequency. For a given configuration, the WCET of task τi is scaled up according to the assigned processor frequency and becomes W(τi)/F(conf(τi)). In the case of RM we can then simply decide the validity of a configuration by testing whether the processor utilization is at most 0.693¹, i.e.

Σ_{1≤i≤n} W(τi) / (F(conf(τi)) ∗ T(τi)) ≤ 0.693.

In the case of EDF the utilization upper bound is 1.
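This utilization-based validity test can be sketched as follows; the task set, P-state frequencies and configuration are illustrative values of ours, and the RM bound is the asymptotic 0.693 used in the text.

```python
def is_valid(tasks, conf, freq, schedule="EDF"):
    """Schedulability test for a configuration conf: task -> P-state.

    tasks: {name: (wcet_at_max_speed, period)}
    freq:  {p_state: normalized frequency F(Si)}
    """
    u = sum(w / (freq[conf[name]] * t) for name, (w, t) in tasks.items())
    bound = 1.0 if schedule == "EDF" else 0.693  # RM least upper bound
    return u <= bound

tasks = {"t1": (2, 10), "t2": (3, 20)}       # illustrative WCETs and periods
freq = {"S1": 1.0, "S2": 0.5}                # illustrative P-states
conf = {"t1": "S2", "t2": "S1"}              # utilization 0.4 + 0.15 = 0.55
```

Note that slowing both tasks to S2 would raise the utilization to 0.7, which still passes the EDF test but fails the RM test, showing how the choice of schedule constrains the configuration space.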

Definition 3. Given a configuration and a real-time schedule, the quality of the configuration is defined as the system-wide energy consumption over one hyper period. The less energy the system consumes, the better the quality of the configuration.

Definition 4. Given a configuration and a real-time schedule, a power state schedule is a schedule that decides, for every component, when it must be switched on to the active state, when it can be switched off to a low power state and, specifically, to which low power state.

Due to the non-negligible power state switching overhead, a component must be switched on slightly ahead of the dispatch time of the task that requires it; otherwise the task is delayed and its deadline may be jeopardized. Conversely, when switching off a component we must always ensure that the selected low power state is justified, i.e. the next idle interval must be longer than the corresponding break even time [1]. In general, deriving a power state schedule is a non-trivial job, because it requires exact knowledge of the start and finishing times of the tasks.

Problem 1. Given a system model and a real-time schedule as input, find the optimal configuration, i.e. the valid configuration with the best quality. Additionally, a power state schedule should be derived.

Unfortunately, the problem described above is NP-hard: the authors of [14] have shown that even when the power state switching overhead is ignored, the problem is already NP-hard.

IV. SIMULATED ANNEALING BASED ALGORITHM

Since the problem is NP-hard, we decided to apply a simulated annealing algorithm. The main idea of SA is to iteratively improve the solution by investigating its neighbours. If a neighbour solution is better, a movement to the neighbour solution is made; otherwise the movement is only made with a certain probability. The process repeats until a solution of sufficient quality is found or the number of iterations reaches a predefined threshold. One of the advantages of SA is its ability to escape from a local optimum, i.e. a solution all of whose neighbour solutions are worse than itself.

Before we explain the algorithm in detail, we first define the neighbourhood of a configuration in our context: two configurations are neighbours if they differ in the frequency assignment of exactly one task. Furthermore, we use the algorithm CS-DVS from [2] to compute our initial configuration. Algorithm 1 shows our main algorithm.

¹The least upper bound of the processor utilization for RM is n(2^(1/n) − 1), which converges to 0.693 for large n.

Algorithm 1 Simulated Annealing
Require: The system model
Ensure: A configuration
 1: Generate initial configuration according to CS-DVS
 2: Evaluate quality of the initial configuration
 3: while iteration < length do
 4:     Randomly generate a neighbour configuration as the new configuration
 5:     if the new configuration is valid then
 6:         Evaluate the new configuration and accept it with probability p
 7:     end if
 8:     iteration = iteration + 1
 9: end while

In this section we assume that the quality of a configuration can be obtained in some way; the details are explained in the subsequent section. Algorithm 1 has two parameters, length and p. The length specifies the total number of iterations of the algorithm. The acceptance probability p is defined as in Equation 1, where K is a constant and q and q′ are the qualities of the current configuration and the new configuration, respectively.

p = { 1,                          if q > q′
    { exp((q − q′) / (q ∗ K)),    otherwise        (1)
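Equation 1 can be sketched directly; the value of K and the sample qualities below are illustrative.

```python
import math

def accept_prob(q, q_new, K=0.1):
    """Acceptance probability of Equation 1.

    q, q_new: energies (qualities) of the current and the new configuration;
    lower is better, so an improvement (q > q_new) is always accepted.
    """
    if q > q_new:
        return 1.0
    # A worse configuration is accepted with a probability that decays
    # exponentially in the relative degradation, scaled by K.
    return math.exp((q - q_new) / (q * K))
```

Accepting occasional degradations with this probability is what lets the search escape local optima, as described above.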

V. ONLINE EXECUTION

Traditionally, SA is implemented in an offline fashion, since all required information is available before runtime. However, several issues arise when we try to implement Algorithm 1 offline:

• How to obtain the quality of a configuration? If we ignore the state switching overhead, then a component can be switched to a low power state whenever it is idle; thus the energy consumption of a component can essentially be computed by multiplying its power consumption with its active time. In the case of non-negligible state switching overhead, however, the calculation of the energy consumption becomes much more difficult, because we need to analyse the exact length of each idle interval, based on which we decide whether the component can be switched to a low power state or not. The idle interval analysis requires exact knowledge of the start, preemption, resume and finishing times of each task. As a consequence, this analysis has a computational complexity similar to that of response time analysis.

• How to derive the power state schedule? Algorithm 1 does not answer this question directly. A power state schedule has to be derived according to the configuration found by Algorithm 1. If Algorithm 1 is implemented offline, then this schedule has to be derived offline as well, which again requires a computational complexity similar to that of response time analysis.

• How to deal with dynamic slack? Since dynamic slack is not available offline, the offline algorithm can only explore static slack.

• How to handle dynamic system changes? The offline algorithm lacks the ability to adapt to system changes, e.g. when a new task is added during runtime or a new device is plugged into the system.

In this article we solve the above problems by proposing an approach that allows Algorithm 1 to be executed in an online and adaptive fashion. As will be explained later, in our online approach the evaluation of the configuration quality and the derivation of the power state schedule become trivial jobs. The main challenge is to integrate the SA algorithm into the hard real-time system without missing any deadlines. The basic idea is to take advantage of a feature common to both domains: both are iterative. The SA algorithm iteratively improves the candidate solutions, while a real-time system with periodic tasks iterates its task executions in each hyper period. In our approach each iteration of the SA algorithm is mapped to one hyper period; in other words, in each hyper period we explore and evaluate one candidate configuration. This stage is called the Exploration Stage (ES) and is stopped after a certain number of hyper periods. The remaining time is called the Application Stage (AS), because in this stage we apply the best configuration found in the exploration stage.

Moreover, we assume that all tasks run until their WCET during the ES: if a task finishes earlier than its WCET, we artificially prolong its execution time to the WCET. In other words, we explore only static slack in the ES, and the dynamic slack is exploited during the AS. Another important note is that, since we have no knowledge of the power state schedule at the beginning, we keep all components active during the entire ES, so that no task is delayed by state switching overhead. In the AS, however, we will derive a power state schedule.

A. Exploration Stage

In this stage, the main goal is to find the best configuration; essentially, we explore one configuration per hyper period. During runtime we perform two activities to achieve this goal.

The first activity, named the Algorithm Activity (AA), occurs at the end of each hyper period. The AA mainly performs the work specified for one iteration of Algorithm 1. First, the quality of the current configuration is evaluated. Since we perform runtime energy recording, which is explained below, the quality of the configuration can easily be obtained. Afterwards the acquired quality is compared with the quality of the configuration from the previous hyper period. According to the acceptance probability


Fig. 1. An example with Dev(τ1) = {processor, R1} and Dev(τ2) = {processor}

a movement to the current configuration is made. Finally the next configuration is generated as a neighbour configuration of the current one. Before the next hyper period starts, a schedulability test is carried out to check whether the new configuration is valid; in case of invalidity the current configuration is kept for the next hyper period.

The second activity, named the Recording Activity (RA), happens at each scheduling point. In the RA we record the component activation/deactivation events and the energy consumption. The events for each component are collected in an event list over each hyper period. Each event contains two pieces of information: i) the time stamp at which it is recorded and ii) whether it is an activation or a deactivation event. Fig. 1(a) shows an example in which the events e1, e2, e3 and e4 are recorded into the event list of the device R1 and e5, e6, e7, e8, e9 and e10 into the event list of the processor. All events of R1 are related to the execution of the task τ1, because R1 ∈ Dev(τ1). Note that in Fig. 1(a) R1 is kept always on, as mentioned earlier, even when it is not needed (between 20ms and 40ms). The recorded event list reflects the component behaviour: when it must be activated and when it could be deactivated. These event lists are used in the AS to derive the power state schedule. Note also that the event list of the processor collects the activation/deactivation events of all tasks. The memory required for storing the events clearly depends on the length of the hyper period. Theoretically the hyper period of a task set can be arbitrarily long; in most real life applications, however, the tasks are harmonic and the hyper period therefore remains manageable.
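The recording activity can be sketched as follows. The event encoding and the time stamps are ours; only the 20 ms–40 ms idle gap of R1 is taken from Fig. 1(a), the other values are hypothetical.

```python
from collections import defaultdict, namedtuple

# An event stores a time stamp and whether the component is activated
# or deactivated at that point.
Event = namedtuple("Event", ["t", "kind"])  # kind: "on" or "off"

event_lists = defaultdict(list)  # component name -> events of one hyper period

def record(component, t, kind):
    event_lists[component].append(Event(t, kind))

# Events for device R1, consistent with the idle gap of Fig. 1(a):
record("R1", 0, "on")    # e1: first instance of t1 dispatched (assumed time)
record("R1", 20, "off")  # e2: t1 finishes at 20 ms
record("R1", 40, "on")   # e3: next instance of t1 at 40 ms
record("R1", 60, "off")  # e4: assumed finishing time

# The length of an idle interval is simply the gap between a deactivation
# event and the next activation event.
idle = event_lists["R1"][2].t - event_lists["R1"][1].t
```

Because each scheduling point appends at most one event per involved component, the recording itself is an O(1) operation.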

The energy recording is very simple as well. Since the RA takes place during runtime, we know exactly when a task starts and finishes. At each scheduling point we first compute the energy consumption of the involved components in the time interval between the previously recorded event and the current event; this is then added cumulatively to the total energy consumption of the current hyper period. At the end of a hyper period we automatically obtain the total energy consumption, which is the quality of the configuration. In Fig. 1(a), for instance, the task τ1 finishes at the scheduling point 20ms. The involved components are the processor and R1, because they are in Dev(τ1). Consider the energy consumption of R1 during the time interval from the previous event e1 to the current event e2. In this interval R1 must be active, because τ1 was active, and therefore the energy consumption is computed as P(D1,0) ∗ (e2.t − e1.t), where ei.t denotes the time stamp of ei. At the scheduling point 40ms, we notice that R1 was idle during the time interval from the previous event e2 to the current event e3. The idle length is then l = e3.t − e2.t and the energy consumption E is computed via Equation 2. If a component supports multiple low power states, we select the state with the largest break even time that is less than the length of the idle interval. In this example we assume that R1 supports only one low power state. Note that the scheduling point 30ms is not relevant for R1, because R1 is not involved at that point.

E = { Eon→off + Eoff + Eoff→on,   if l > BE(D1,1)
    { P(D1,0) ∗ l,                otherwise          (2)

where Eon→off is the energy consumption of switching off the component, Eoff→on is the energy consumption of switching on the component, and Eoff is the energy consumption of the component while it is off. They can be expressed by Equations 3. Note that the energy recording for each component needs only O(1) operations at each event.

Eon→off = Pon→off(D1,1) ∗ Lon→off(D1,1)
Eoff→on = Poff→on(D1,1) ∗ Loff→on(D1,1)
Eoff    = P(D1,1) ∗ (l − Lon→off(D1,1) − Loff→on(D1,1))          (3)

In summary, the recorded event lists reflect the time behaviour of the components: when they should be activated and when they could be deactivated. The recorded energy is the energy consumed by the components if we schedule the components according to the recorded event list.
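The case distinction of Equation 2, expanded with the terms of Equation 3, can be sketched as a single helper function; the parameter names are our own and any concrete values passed in are illustrative, not taken from a real device model:

```python
# Sketch of the idle-interval energy computation of Equations 2 and 3.
# Parameter names are ours: p_active = P(D1,0), p_sleep = P(D1,1),
# p_on_off/l_on_off and p_off_on/l_off_on are the switching power/latency,
# break_even = BE(D1,1).

def idle_energy(l, p_active, p_sleep,
                p_on_off, l_on_off, p_off_on, l_off_on, break_even):
    """Energy a component consumes during an idle interval of length l."""
    if l > break_even:
        e_on_off = p_on_off * l_on_off               # switch-off overhead
        e_off_on = p_off_on * l_off_on               # wake-up overhead
        e_off = p_sleep * (l - l_on_off - l_off_on)  # residence in sleep state
        return e_on_off + e_off + e_off_on
    return p_active * l                              # too short: stay active
```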

B. Application Stage

In this stage we apply the best configuration found in the ES and derive the power state schedule from the recorded events. According to Definition 4, the power state schedule decides when a component is to be switched off and when it is to be switched on. For this purpose we perform an activity called the DPM Activity at each scheduling point. If there


is any task finishing or being preempted at a scheduling point, then the components required by the task can potentially be switched to a low power state; however, this decision depends on the length of the upcoming idle interval. Other existing online approaches often apply complicated techniques for estimating the next activation time (even with task procrastination), which usually take at least polynomial time. In contrast, we can simply consult the event list to compute the length of the next idle interval, which is much more efficient. Based on this information we can decide whether we should switch a component to a particular low power state. If a component is switched to a low power state, we also need to switch it on a little ahead of the next actual required time. Fig. 1(b) illustrates this DPM activity without consideration of any dynamic slack. Here we assume that we use the configuration and event list from Fig. 1(a). Since no task runtime variation is considered, the task execution behaves exactly the same as in Fig. 1(a). At the scheduling point 20ms, the task τ1 finishes and the processor and R1

can potentially be put into low power states. After consulting the event lists of the processor and R1, respectively, we know that the processor will be required by 30ms and R1 will be required by 40ms. In this example both idle intervals are large enough, so both the processor and R1 are switched to a proper low power state. If a component supports multiple low power states, we select the state with the largest break even time that is less than the length of the idle interval. Note that the components are switched on a little ahead of their actual required time. The switching on/off process is illustrated by triangles in Fig. 1(b) and Fig. 1(c).
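The selection rule for multiple low power states described above might be sketched as follows; the state names and break even times (in ms) are invented for illustration:

```python
# Sketch of the sleep-state selection rule: among the supported low power
# states, pick the one with the largest break even time that is still less
# than the idle interval. The state tuples below are illustrative only.

def select_sleep_state(idle_len, states):
    """states: list of (name, break_even) pairs; returns a name or None."""
    fitting = [(be, name) for (name, be) in states if be < idle_len]
    if not fitting:
        return None           # interval too short for any sleep state
    return max(fitting)[1]    # deepest state = largest applicable break even

states = [("standby", 2.0), ("sleep", 8.0), ("off", 25.0)]
```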

As mentioned before, we also would like to exploit the dynamic slack in the AS. Here we adopted the dynamic slack reclaiming mechanism. More specifically, we utilize the unused execution time of a task for additional power saving via DPM: mainly, the components can be switched off earlier than in the worst case. Fig. 1(c) shows the same example as in Fig. 1(b), but all tasks finish earlier than their WCET. At the scheduling point where the task τ1 finishes (at 14ms), we consult the event lists of the processor and R1, respectively. As a consequence, we know that the processor will be required by 30ms and R1 will be required by 40ms. This information guides us in computing the length of the upcoming idle interval, which is used to decide whether a component should be switched to a low power state. One important note here is that all tasks are always activated at the recorded activation time. If a task is ready earlier than the recorded time due to the earlier completion of the previous task, then it needs to be delayed until the recorded activation time, because only at that time can we guarantee that all required components are in the active state. The second instance of task τ1 in Fig. 1(c) illustrates this situation: it becomes ready after the task τ2 completes (at 34ms); however, it must wait until its recorded activation time, which is at 40ms. More details can be seen in Algorithm 2 in the APPENDIX.

Generally, our approach is launched at the system start and runs from ES to AS. As soon as there are any system

changes, such as new tasks being added to the system or a new device being plugged into the system (only allowed at hyper period boundaries), the approach starts over with the calculation of a new initial configuration by means of the CS-DVS algorithm and runs from ES to AS again.

C. Correctness and Complexity

We prove the correctness and the efficiency of our approach through two theorems.

Theorem 1. Our online approach can always guarantee the system schedulability.

Proof: Since our approach is divided into two stages, we show the system schedulability in the ES and the AS, respectively. In the ES we explore one configuration per hyper period. Since the initial configuration obtained by means of CS-DVS is clearly valid, in the first hyper period no task will miss its deadline. Note that in the ES the components are never switched off (even when they are idle), thus no task will be delayed by state switching overhead. This is also the reason why the classical schedulability test via utilization can be performed. At the end of the first hyper period the configuration for the next hyper period is generated. A schedulability test is performed, and only if the test is positive will the generated configuration be used in the next hyper period (otherwise the initial configuration is used); thus the schedulability is guaranteed as well. The procedure repeats at the end of each hyper period, therefore the configuration used in each hyper period is valid.

In the AS the best found configuration is applied, and this configuration is obviously valid. Furthermore, since all components are switched on as recorded and all tasks are activated as recorded, no task will be delayed and therefore no task will finish later than its recorded finishing time, which was obtained in the worst case scenario. Therefore no task will miss its deadline. In total, the system schedulability is always guaranteed.

Theorem 2. Our online approach has O(1) complexity at each scheduling point.

Proof: In the ES we mainly perform two activities, the algorithm activity and the recording activity. In the algorithm activity we first evaluate the current configuration. The evaluation takes O(1), because through the runtime energy recording the energy consumption of a configuration is automatically available at the end of a hyper period. The generation of a neighbour configuration takes O(1) as well, because we only need to change the frequency assignment of one task. As a consequence, the processor utilization (for the schedulability test) can also be obtained by updating the utilization of the changed task. Therefore the algorithm activity takes O(1) time. In the recording activity we record the events and energy for each involved component at each scheduling point. Obviously, the event and energy recording for a single component takes only O(1). At one scheduling point there can be at most m components involved, where m is the number of


components in the system. Therefore the recording activity takes O(m) time at each scheduling point.

In the AS the most time consuming work is to find the next activation time for each involved component at a scheduling point. Since the event list contains the events in non-decreasing order of their time stamps, we could find the proper event in O(log(L)) time via binary search, where L is the length of the event list. However, if we maintain an index variable addressing the elements in the list and advance it properly at each scheduling point, then each time we only need to retrieve the next event according to the current index, which obviously takes only O(1). More details can be seen in Algorithm 2 in the APPENDIX. Since the number of involved components is limited by m, the DPM activity takes O(m) as well.

If we assume that the number of components in the system is relatively small and constant, we have a total runtime of O(1) at each scheduling point.
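The index-based O(1) lookup can be illustrated as follows; the class and its names are our own sketch, not the paper's implementation:

```python
# Sketch of the O(1) next-event lookup used in the AS: instead of a binary
# search over the sorted event list, each component keeps a cursor that is
# advanced at every scheduling point.

class EventCursor:
    def __init__(self, activation_times):
        self.events = activation_times  # non-decreasing time stamps (ms)
        self.index = 0                  # current position in the list

    def next_activation(self):
        # O(1): just read the element at the current index.
        return self.events[self.index]

    def advance(self):
        self.index += 1

cur = EventCursor([30, 40, 60])
idle_until_next = cur.next_activation() - 20  # e.g. at scheduling point 20ms
```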

VI. EVALUATION

We evaluated our concept by implementing an experiment setup using SystemC, which provides an event-driven simulator. In the experiment we selected the Intel XScale processor model from [14]. The device models are taken from [3] and five devices are used: MaxStream Wireless Module, IBM Microdrive, SST Flash, SimpleTech Flash Card and Realtek Ethernet Chip. All devices support one active state and one low power state. We used both a synthetic and a realistic task model in the experiments.

A. Synthetic Task Model

In this experiment we randomly generated 500 task sets; the size of each task set is between 3 and 9. The period of each task lies within [0.05ms, 100ms]. The WCET of each task is also randomly generated, so that the utilizations of the task sets are uniformly distributed in the interval (0, 1). Furthermore, for each task we randomly select 0 to 2 devices into the required device set. EDF is applied as the real-time schedule (RM can also be applied) and the constant K in Algorithm 1 is set to 0.001 according to our preliminary experiments. Hereby, we simulated some samples from the generated task sets with different K-values, where K = 0.001 delivered the best average result.
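The task-set generation might be reproduced roughly as below; beyond the stated ranges the paper does not fix the sampling details, so the uniform-distribution choices and the scaling to a target utilization are our assumptions:

```python
import random

# Rough reconstruction of the synthetic task-set generation: 3 to 9 tasks,
# periods in [0.05ms, 100ms], WCETs scaled to a target utilization, and
# 0 to 2 required devices per task (device names D1..D5 are placeholders).

def generate_task_set(target_util, rng):
    n = rng.randint(3, 9)
    periods = [rng.uniform(0.05, 100.0) for _ in range(n)]
    shares = [rng.random() for _ in range(n)]
    total = sum(shares)
    # Per-task utilizations sum exactly to target_util; WCET = period * util.
    wcets = [p * target_util * s / total for p, s in zip(periods, shares)]
    devices = [rng.sample(["D1", "D2", "D3", "D4", "D5"], rng.randint(0, 2))
               for _ in range(n)]
    return list(zip(periods, wcets, devices))

tasks = generate_task_set(0.6, random.Random(42))
```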

Fig. 2 shows the simulation result in terms of the number of tasks in the system (x-axis). The y-axis denotes the system-wide energy consumption (the quality of a configuration), which is normalized with respect to the configuration obtained by CS-DVS. In the legend, SA20hps, SA50hps and SA200hps indicate the configurations obtained by our SA algorithm after 20, 50 and 200 hyper periods in the ES, respectively. The Optimum is the optimal configuration, which is obtained by investigating all possible configurations in simulation. This is not practical in reality; however, it gives us a lower bound of the result for evaluation purposes. As observed, our algorithm achieves more power saving if we spend more hyper periods in the ES. Moreover, the configuration obtained by SA200hps is already very close to the optimum.

Fig. 2. The impact of task number

Fig. 3. The impact of utilization

Fig. 3 shows the simulation result in terms of the utilization of the task set. As expected, if the system is heavily loaded (with high utilization), then there is little static slack. As a consequence, the system-wide energy consumption can barely be reduced. On the contrary, the power consumption can be reduced by a factor of ca. 13% if the utilization is relatively small.

Since our approach supports dynamic slack exploration, we performed an experiment with task runtime variation, which is simulated according to a Gaussian distribution. We denote Wi and Bi as the worst case and best case execution time of the task τi, respectively, and Bi is defined as Bi = Wi ∗ (1 − α) with 0 < α ≤ 1. Our Gaussian function is then defined depending on the worst case execution time Wi and the parameter α, namely the mean is µ = (Wi + Bi)/2 = Wi ∗ (2 − α)/2 and the variance is σ² = 0.2 ∗ (Wi − Bi) = 0.2 ∗ α ∗ Wi. In other words, the greater the α, the more variation is allowed. Fig. 4 shows the simulation results. As expected, the greater the α value, the more dynamic slack can be utilized and therefore the more power can be saved.
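Sampling an actual runtime from this Gaussian model might look as follows; clamping the sample into [Bi, Wi] is our assumption, since the paper does not state how out-of-range samples are handled:

```python
import random

# Sketch of sampling a task's actual runtime from the Gaussian model above:
# mean = Wi * (2 - alpha) / 2, variance = 0.2 * alpha * Wi, clamped to
# [Bi, Wi] (the clamping is our assumption).

def sample_runtime(w, alpha, rng):
    b = w * (1.0 - alpha)               # Bi = Wi * (1 - alpha)
    mu = (w + b) / 2.0                  # mean = Wi * (2 - alpha) / 2
    sigma = (0.2 * alpha * w) ** 0.5    # sigma^2 = 0.2 * alpha * Wi
    return min(w, max(b, rng.gauss(mu, sigma)))

runtime = sample_runtime(10.0, 0.4, random.Random(1))  # lies in [6.0, 10.0]
```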

B. Realistic Task Model

In order to make the evaluation more realistic, we performed 4 real-life case studies taken from [15]. TABLE I shows the simulation results. The power reduction is obtained by comparing the configuration found after 200 hyper periods in the ES with the CS-DVS configuration.


Fig. 4. Simulation results with runtime variation

TABLE I
SIMULATION RESULTS OF REAL CASE STUDIES

Case study                     Task number   Power reduction
Linear Motor Control                3              27%
CNC Machine (without GUI)           7              10%
Airbag Control                      9              18%
Electronic Stability Control        7              10%

VII. CONCLUSION

This article introduced a fully online approach based on a simulated annealing algorithm for energy-efficient scheduling in component oriented hard real-time systems. The computational complexity is O(1) at each scheduling point. Our approach is independent of the real-time schedule; both EDF and RM can be used. Furthermore, our approach considers multiple low power states with non-negligible state switching overhead for the DPM strategy. Both static slack and dynamic slack can be explored in an online fashion. The experimental results show the simplicity of our approach, which nevertheless achieves an efficient power reduction of ca. 13% in comparison with the CS-DVS approach. The experiments obtained an almost optimal configuration already after 200 hyper periods.

VIII. ACKNOWLEDGEMENT

This work was partly funded by the DFG Collaborative Research Centre 614 and by the German Ministry of Education and Research (BMBF) through the BMBF project SANITAS (01M3088) and the ITEA2 projects VERDE (01IS09012) and TIMMO-2-USE (01IS10034A).

REFERENCES

[1] L. Benini, A. Bogliolo, and G. De Micheli, “A survey of design techniques for system-level dynamic power management,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 8, no. 3, pp. 299–316, June 2000.

[2] R. Jejurikar and R. Gupta, “Dynamic voltage scaling for systemwide energy minimization in real-time embedded systems,” in Proceedings of the 2004 International Symposium on Low Power Electronics and Design (ISLPED ’04), Aug. 2004, pp. 78–81.

[3] H. Cheng and S. Goddard, “Online energy-aware I/O device scheduling for hard real-time systems,” in Proceedings of Design, Automation and Test in Europe (DATE ’06), vol. 1, March 2006.

[4] V. Swaminathan and K. Chakrabarty, “Pruning-based, energy-optimal, deterministic I/O device scheduling for hard real-time systems,” ACM Transactions on Embedded Computing Systems, vol. 4, pp. 141–167, February 2005.

[5] F. Yao, A. Demers, and S. Shenker, “A scheduling model for reduced CPU energy,” in Proceedings of the 36th Annual Symposium on Foundations of Computer Science, Oct. 1995, pp. 374–382.

[6] H. Aydin, R. Melhem, D. Mosse, and P. Mejia-Alvarez, “Power-aware scheduling for periodic real-time tasks,” IEEE Transactions on Computers, vol. 53, no. 5, pp. 584–600, May 2004.

[7] C.-H. Lee and K. Shin, “On-line dynamic voltage scaling for hard real-time systems using the EDF algorithm,” in Proceedings of the 25th IEEE International Real-Time Systems Symposium, Dec. 2004, pp. 319–335.

[8] W. Kim, D. Shin, H.-S. Yun, J. Kim, and S. L. Min, “Performance comparison of dynamic voltage scaling algorithms for hard real-time systems,” in Proceedings of the Eighth IEEE Real-Time and Embedded Technology and Applications Symposium, 2002, pp. 219–228.

[9] S. Zhang, K. S. Chatha, and G. Konjevod, “Approximation algorithms for power minimization of earliest deadline first and rate monotonic schedules,” in Proceedings of the 2007 International Symposium on Low Power Electronics and Design (ISLPED ’07), 2007, pp. 225–230.

[10] V. Devadas and H. Aydin, “On the interplay of dynamic voltage scaling and dynamic power management in real-time embedded applications,” in Proceedings of the 8th ACM International Conference on Embedded Software (EMSOFT ’08), New York, NY, USA: ACM, 2008, pp. 99–108.

[11] L. Niu, “System-level energy-efficient scheduling for hard real-time embedded systems,” in Design, Automation and Test in Europe Conference and Exhibition (DATE), March 2011, pp. 1–4.

[12] V. Devadas and H. Aydin, “DFR-EDF: A unified energy management framework for real-time systems,” in Proceedings of the 16th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), April 2010, pp. 121–130.

[13] H. Cheng and S. Goddard, “Integrated device scheduling and processor voltage scaling for system-wide energy conservation,” in International Workshop on Power-Aware Real-Time Computing (PARC), 2005.

[14] (2003) Intel XScale. http://developer.intel.com/design/xscale/.

[15] S. Groesbrink, “Comparison of alternative hierarchical scheduling techniques for the virtualization of embedded real-time systems,” Master’s thesis, University of Paderborn, 2010.

APPENDIX

The variable comp.index denotes the current index in the event list of the component comp and is initialized with 1.

Algorithm 2 DPM Activity at each scheduling point in the AS
 1: if there is a task τi finishing or being preempted then
 2:   for all comp ∈ Dev(τi) do
 3:     Retrieve the next activation event according to comp.index and compute the length of the next idle interval. Based on this length, decide whether comp can be switched to a proper low power state. If comp is switched off, wake it up a little ahead of the next activation time. Advance the index variable comp.index.
 4:   end for
 5: end if
 6: if there is a task τi starting or being resumed then
 7:   if the current time is equal to the recorded time then
 8:     Run the task τi
 9:     for all comp ∈ Dev(τi) do
10:       Advance the index variable comp.index
11:     end for
12:   else
13:     Wait until the recorded time
14:   end if
15: end if