
Efficient Energy Management using Adaptive Reinforcement Learning-based Scheduling in Large-Scale Distributed Systems

Masnida Hussin, Young Choon Lee, Albert Y. Zomaya
Centre for Distributed and High Performance Computing, School of Information Technologies
The University of Sydney, NSW 2006, Australia
[email protected], {young.lee, albert.zomaya}@sydney.edu.au

Abstract—Energy consumption in large-scale distributed systems, such as computational grids and clouds, has gained a lot of attention recently due to its significant performance, environmental and economic implications. These systems consume a massive amount of energy not only for powering them but also for cooling them. More importantly, the explosive increase in energy consumption is not linear to resource utilization, as only a marginal percentage of energy is consumed for actual computational work. This energy problem becomes more challenging with the uncertainty and variability of workloads and heterogeneous resources in those systems. This paper presents a dynamic scheduling algorithm incorporating reinforcement learning for good performance and energy efficiency. This incorporation helps the scheduler observe and adapt to various processing requirements (tasks) and different processing capacities (resources). The learning process of our scheduling algorithm develops an association between the best action (schedule) and the current state of the environment (parallel system). We have also devised a task-grouping technique to help the decision-making process of our algorithm. The grouping technique is adaptive in nature since it incorporates the current workload and energy consumption into the choice of the best action. Results from our extensive simulations with varying processing capacities and a diverse set of tasks demonstrate the effectiveness of this learning approach.

Keywords–energy efficiency, reinforcement learning, task grouping, dynamic scheduling.

I. INTRODUCTION

For many years, parallel and distributed computing systems (PDCSs) have served as a mainstream high performance computing (HPC) platform. These systems, such as computational grids and, more recently, clouds, provide massive computing capacity to deal with large-scale and compute-intensive scientific and engineering applications [1-3]. Such massive computing capacity is achieved primarily by increasing the volume of system components (i.e., processors, memory, disks, network cards). This rapid growth has resulted in excessive electricity consumption; the average increase rate of such energy consumption was 12% per year from 2005 to 2010 [4, 5]. For example, the Earth Simulator, one of Japan's fastest supercomputers with 5,120 processors, consumes 11.9 MW of power [6].

Computer systems consuming vast amounts of power also emit excessive heat; this often results in system unreliability and performance degradation. It has been reported in previous studies (e.g., [4-6]) that system overheating causes system freezes and frequent system failures.

Among many system components, processors are the major energy consumers [7, 8]. Typically, each processor in HPC systems consumes 80 to 95 W of power at peak state [6, 9]. However, processors may not always run at the absolute highest utilization (100%), and many processors dissipate a substantial portion of energy simply to keep them available. That is, the majority of the electricity that passes through them is wasted. This issue of idle power consumption further highlights the practical importance of energy optimization. Energy optimization heavily depends on resources' loads; a very high load factor (e.g., 80% to 90%) is sought for energy efficiency [5]. Typically, the processing capacity of a processor is proportional to its power draw; the faster the processor, the higher its power consumption. The trade-off between performance and energy consumption can be captured using scheduling policies. The efficacy of a scheduling algorithm in PDCSs is influenced by the availability of prior information about workloads and the accuracy of this information [2]. In practical PDCS settings, however, such information is not readily available, and thus scheduling decision making is much more complicated. The workload, hereafter referred to as computational tasks (tasks), arrives dynamically with various characteristics in many aspects, such as resource requirements, quality of service (QoS) and temporal constraints. The dynamic and heterogeneous nature of both tasks and resources in PDCSs further complicates scheduling. The performance of tasks is commonly measured by completion time, makespan or throughput.

In this work we study the use of an adaptive reinforcement-learning (RL) scheme for effective scheduling of tasks in PDCSs, aiming for better performance and energy efficiency. RL is a goal-directed learning approach in which the learner (agent) gains information through its experience and reacts (behaves) accordingly in a dynamic environment. The agent learns through trial-and-error interactions with the environment to maximize long-term reward [10]. Thus, the stochastic nature of RL is an important requirement for dynamic scheduling. We address the problem of scheduling various types of tasks onto a set of resources to optimize resource utilization in terms of energy efficiency. A novel task-grouping (TG) technique is also devised in this work as part of the energy management for efficient mapping of tasks to resources. The TG technique is adaptive in nature; it is capable of changing the size of a task group, taking into account the processing requirements (processing weights) of tasks. A group of tasks is assigned to a resource whose processing capacity is considerably favorable.

Our contributions are as follows. First, the investigation of reinforcement learning for energy efficiency is quite novel, especially in its application to energy management. Second, the incorporation of adaptive TG into dynamic learning-based scheduling helps tackle heterogeneity and dynamicity issues in large-scale distributed systems.

Results obtained from our extensive comparative evaluation study clearly show that our adaptive RL outperforms other learning-based energy management schemes ([11], [12], [13]) in terms of energy consumption by a noticeable margin. It is also shown that energy efficiency is realized without affecting the response time significantly.

The remainder of this paper is organized as follows. Section II describes related work on energy management using learning schemes. Section III details the models used in the paper. Our adaptive reinforcement-based scheduling algorithm with a TG technique is presented in Section IV. Experimental settings and results are presented in Section V. Finally, Section VI concludes the paper.

II. RELATED WORK

As power consumption has become a major sustainability issue, particularly in large-scale data centers (e.g., clouds), a lot of attention has been paid to the energy efficiency of these systems. This energy efficiency issue calls for the development of various energy-saving techniques including dynamic voltage and frequency scaling (DVFS) [14, 15], resource hibernation [14] and memory optimization [16]. In parallel with these energy-saving techniques, energy management approaches have been studied extensively (e.g., [12, 17-19]) in the context of energy-aware task scheduling. Noticeable energy management approaches include heuristic-based methods [20], stochastic optimization techniques [17] and game theoretic strategies [19]. These approaches have demonstrated their effectiveness in minimizing energy consumption while still meeting certain performance goals. However, their efficacy in dealing with system dynamicity is limited. The scope of energy management should be stretched further to incorporate the dynamicity and heterogeneity of tasks and resources. For the case of dynamic scheduling in energy management, it is highly beneficial for a scheduler to measure its performance and adapt accordingly. Since the availability of a priori information in dynamic environments is hard, if not impossible, to guarantee, adaptive reinforcement learning can be a very effective and practical approach for a good energy management controller.

The online power management system using reinforcement learning in [11] focuses on the power consumption of a Blade cluster system with varying HTTP workloads. A policy with a multi-criteria objective function is devised, taking both power and performance into account to maximize the reinforcement reward signal. The system state is characterized by a set of performance, power and load intensity metrics. It is considered that CPUs operate at the highest frequency under all workload conditions. Hence, the reward signal is defined as response time divided by total power consumed in a decision interval. The effectiveness of the reinforcement-based power controller in [11] is achieved by discovering the optimal level of CPU throttling in a given state. The power controller continuously regulates and monitors the CPU clock speed to keep the power consumption close to, but not over, the power upper limit (or powercap). A simple random walk policy is used for setting the powercap.

In [12], an adaptive power management policy that uses an extended version of Q-learning is proposed to deal with limited workload information. The authors set the delay in producing an action as a performance constraint while minimizing power consumption. An agent chooses an action, either sleep or active, every time the system leaves the current state and enters another. The state is updated in every learning cycle and there are two actions related to power management (go_active and go_sleep). One of these actions is selected when the state changes. The power consumed by each action, however, is known in advance. When the same state is re-observed, the minimum Q-value (the product of power consumption and delay) of the previous action is chosen for the next action. The authors also proposed a strategy of updating multiple Q-values in each cycle at various learning rates, which speeds up the learning process. That is, the agent makes better decisions with more information over time.

The task consolidation policy in [13] executes all tasks with a minimum number of resources and gives scheduling the main role in reducing power consumption. That is, instead of dynamically allocating a resource to a task, the policy estimates the impact of the task on the resource in terms of performance and power consumption in advance. The work used a machine-learning approach that learns from current system information, such as power consumption level, CPU loads and completion time; this contributes to improving the quality of scheduling decisions. The objective of that policy is to maximize user satisfaction without increasing power consumption. For a given task, the satisfaction rate is fulfilled when the completion time is less than the deadline, and the power relies on the percentage of CPU usage for the task. A supervised machine learning process, trained on resources and tasks, is used to validate the model for future real workloads.

While most previous energy management policies with a learning approach deal with homogeneous resources, our learning approach focuses on dynamic and heterogeneous environments.

III. MODELS

In this section, we describe the application, system and energy models used in our study. The power optimization and reinforcement-based scheduling are formulated based on these models.

A. Application model

Tasks considered in this study are computation-intensive and independent from each other (i.e., no inter-job communication or dependencies). Each task is a single arrival unit and is associated with the set of parameters shown below:

$T_i = \{s_i, d_i\}$    (1)

where si is the computational size of task i, specified in millions of instructions (MI), and di is the latest time (deadline) by which task Ti is supposed to be completed. For a given task Ti, the expected (or actual) execution time ACTi is computed as the computational size si divided by the processing capacity of a reference (the slowest) resource. Hence, the deadline of task i is defined as ACTi + add_t, where add_t is 0-150% of ACTi. Tasks are assumed to be sequential applications, each of which requires no more than one processor for its execution. However, tasks may be assigned to processors with different processing capacities, due primarily to the urgency of the task, i.e., task priority. This study considers three levels of task priority based on deadline: low, medium and high. The priority of a task Ti is set to high if its deadline di is at most 20% later than the expected execution time ACTi. The priority is set to low if di is 80% or more later than ACTi. Otherwise, the priority is set to medium. Tasks arrive in a Poisson process. We assume that a task's profile is available and can be provided by the user using job profiling, analytical models or historical information [21].
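For illustration, the task model and the deadline-based priority rule above could be sketched as follows in Python; the 500 MIPS reference (slowest-resource) speed and all helper names are our own assumptions rather than part of the paper:

import random
from dataclasses import dataclass

SLOWEST_SPEED_MIPS = 500  # assumed speed of the reference (slowest) resource

@dataclass
class Task:
    size_mi: float   # computational size s_i in millions of instructions (MI)
    deadline: float  # latest allowed completion time d_i

def make_task(size_mi: float) -> Task:
    act = size_mi / SLOWEST_SPEED_MIPS          # ACT_i on the slowest resource
    add_t = random.uniform(0.0, 1.5) * act      # add_t is 0-150% of ACT_i
    return Task(size_mi, act + add_t)

def priority(task: Task) -> str:
    act = task.size_mi / SLOWEST_SPEED_MIPS
    slack_ratio = (task.deadline - act) / act   # how much later than ACT_i the deadline is
    if slack_ratio <= 0.2:                      # at most 20% later: high priority
        return "high"
    if slack_ratio >= 0.8:                      # 80% or more later: low priority
        return "low"
    return "medium"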

B. System Model

The target system used in this work (Figure 1) consists of a set of resource sites S that are loosely connected by a communication network; each site has a set C of c compute nodes. These compute nodes are fully connected via a high-speed link. We assume that nodes are heterogeneous in terms of processing capacity, disk storage and memory capacity. Each of these nodes has a set of processors denoted Rj, where j = {1, 2, …, m}. We also assume that the processors are typical of those in many HPC systems. Thus, for each processor, the wattage at peak state is randomly selected from the range 80-95 W [6]. The speed of a processor Rj is expressed in millions of instructions per second (MIPS) and denoted spj. That is, each node c is associated with a processing capacity to complete the scheduled tasks. More formally, the processing capacity of node c is defined as:

$PC_c = \frac{1}{q_c}\sum_{j=1}^{m} sp_j$    (2)

where qc is the queue length of node c. The queue qc, varying in size (length), exists to limit the number of tasks to be scheduled for execution [19, 21]. In this study, there is more than one task waiting in each queue space; this is based on the TG technique that we devised for energy management. For a single task Ti executing on processor j, the execution time is given by:

$ET(i,c) = \frac{s_i}{sp_j}$    (3)

Figure 1. System Model

This work aims to minimize the average response time, which is dominated by the waiting time experienced by tasks at qc and the execution time, given as

$AveRT = \frac{1}{N}\sum_{i=1}^{N}\left(ET_i + wait\_t_i\right)$    (4)

where N is the number of tasks submitted and completed within some observation period. In each resource site an agent resides; agents in different sites are independent from each other, but they share a long-term memory (shared-learning memory). Each agent keeps and updates at most 15 cycles of its learning experiences in the shared-learning memory. The communication link between the shared-learning memory and all agents is assumed to perform at the same speed without contention. Inter-agent communication occurs implicitly as agents track each other's experiences in order to improve their decisions.
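Such a bounded shared-learning memory could be approximated with a per-agent fixed-length buffer, as in the sketch below; the 15-cycle limit comes from the text, while the class and method names are our assumptions:

from collections import deque

class SharedLearningMemory:
    # Long-term memory shared by all agents; each agent keeps at most
    # 15 cycles of its own learning experiences.
    def __init__(self, max_cycles_per_agent: int = 15):
        self._buffers = {}
        self._limit = max_cycles_per_agent

    def record(self, agent_id: str, experience: dict) -> None:
        # oldest experiences are discarded once the per-agent limit is reached
        buf = self._buffers.setdefault(agent_id, deque(maxlen=self._limit))
        buf.append(experience)

    def all_experiences(self):
        # any agent may read every other agent's experiences to improve its decisions
        return [e for buf in self._buffers.values() for e in buf]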

C. Energy model

The energy management model in this work attempts to balance performance and energy consumption (energy efficiency). Specifically, it aims to optimize the speed at which the processors (in a node) operate while reducing idle time as much as possible. Clearly, an increase in resource utilization contributes to the improvement of energy efficiency. For a given processor, its power consumption during a particular time period can be measured with information on its total execution time and the power consumed during busy and idle states. More formally, the power consumption PPj of processor j is defined as

$PP_j = p_{max}\left(\sum_{i=1}^{N} ET_i\right) + p_{min}\,(t\_idle)$    (5)

where pmax is the power usage at 100% utilization (busy), pmin is the power usage when processor j becomes idle, and t_idle is the total idle time of processor j. It is assumed that for a given processor the peak power (80-95 W) [6] is proportional to its processing capacity. Typically, the power consumption in the idle state is about 50% of the peak power [8]. The energy consumption of a compute node c is then defined as:

$E_c = \frac{1}{m}\sum_{j=1}^{m} PP_j$    (6)
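As a rough illustration only, Eqs. (2), (3), (5) and (6) might be coded as the following helper functions; the function names and the default power values are our assumptions (pmin is taken as roughly 50% of the peak, as stated in the text):

from typing import List

def processing_capacity(speeds_mips: List[float], queue_length: int) -> float:
    # Eq. (2): PC_c = (1 / q_c) * sum_j sp_j
    return sum(speeds_mips) / queue_length

def execution_time(size_mi: float, speed_mips: float) -> float:
    # Eq. (3): ET(i, c) = s_i / sp_j
    return size_mi / speed_mips

def processor_power(exec_times: List[float], idle_time: float,
                    p_max: float = 95.0, p_min: float = 47.5) -> float:
    # Eq. (5): PP_j = p_max * sum_i ET_i + p_min * t_idle
    return p_max * sum(exec_times) + p_min * idle_time

def node_energy(processor_powers: List[float]) -> float:
    # Eq. (6): E_c = (1 / m) * sum_j PP_j
    return sum(processor_powers) / len(processor_powers)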

IV. REINFORCEMENT LEARNING-BASED ENERGY MANAGEMENT

This section begins by briefly describing reinforcement learning and gives the details of our approach in the context of dynamic learning-based scheduling. We incorporate a TG technique into our RL-based scheduling algorithm. The TG technique is an effective means for searching allocation alternatives and it in turn increases resource utilization [22]. It is carried out in a systematic way considering task priority (low, medium or high); and the TG process is facilitated by the learning strategy. The pictorial description of our framework is presented in Figure 3.

A. Reinforcement learning

Briefly, the RL system (Figure 2) interacts between observed states (input) and chosen actions (output), and it obtains reinforcement feedback signals (reward or penalty). The RL system then evaluates the feedback signal to improve the quality of its actions. It involves two operations: discovering the right outputs for a given input and memorizing those outputs. RL has the attractive feature that it learns from previous experience to produce the best solution. This generally improves its automation process while maximizing a reinforcement function [10]. In other words, the RL system is capable of repairing and improving its performance when it exhibits poor decision quality.

Figure 2. Reinforcement learning scheme

B. The Reinforcement learning structure for dynamic scheduling

In this work, we propose a method by which the learning system improves the quality of its actions through an on-going learning process. As such, the structure of our RL system is designed based on the neural network presented in [10]. For training the neural network, the agents rely on a trial-and-error scheme that is greatly facilitated by convenient mathematical formalisms.

Figure 3. RL-Energy Management Approach

Formally, in this work RL within the context of energy management is formulated as the interaction between the agent and the parallel compute nodes, as shown in Figure 3. The agent, or scheduler, observes the current performance of its compute nodes and learns from the reinforcement feedback signal. Based on such performance information, the agent takes an action. The action refers to a decision to group tasks that are dynamically arriving. The TG technique takes charge of grouping individual tasks according to their priority, i.e., into a task group. Unlike traditional scheduling problems, the scheduling problem in this work aims not only at the identification of the best task-resource matches, but also at the optimization of energy efficiency without significant performance degradation. There are two reinforcement feedback signals (reward and error) under consideration in the decision-making process. Following the feedback-based RL scheme in [10], the agent has to maximize the reward to improve performance and minimize the error for the system to be energy efficient. This information (i.e., reward and error) helps the agent determine and provide favorable actions. Specifically, each action is associated with a learning value, given in Eq. (7).

$l\_val = \frac{reward}{error}$    (7)

The agent is intended to increase the learning value of its action in every learning cycle. Then the action is kept in the shared-learning memory for better decision-making. In other words, the agent improves its action not only by learning from its feedback signal, but also from other agents’ experiences. The amount of time taken for learning reduces as the system evolves.

Figure 4. Reinforcement-based scheduling

The incorporation of the observed state of the compute nodes S(t) with the reinforcement feedback signal F(t) helps the agent AS improve the quality of the action Ac(t) (Figure 4). At a particular point in time t, the agent AS receives a state Sc(t) = (Load, q-, {PP1, …, PPm}) from each node c, where Load is the total processing weight in the node's queue, q- is the number of available queue spaces and PP1…m is the power consumption of each processor. The agent AS groups newly arriving tasks (action) according to the details from Sc(t). After a task group is sent to a resource, the action is evaluated (analysis) to identify how the task group contributes to better utilisation and efficient energy consumption. From the analysis, the agent receives and stores a reinforcement feedback signal in the shared-learning memory (feedback) to acknowledge and memorize the quality of the action. Then, when a compute node is ready to receive other task groups, the agent repeats the iteration (action, analysis, feedback) while improving the quality of its decisions.
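The observe-group-assign-analyse-feedback cycle described above might be skeletonised as below; the class and method names are hypothetical placeholders for the mechanisms detailed in Sections IV.C and IV.D, not the authors' implementation:

from typing import Dict, List, NamedTuple

class NodeState(NamedTuple):
    load: float               # total processing weight in the node's queue
    free_slots: int           # available queue spaces (q-)
    proc_power: List[float]   # PP_1 ... PP_m for the node's processors

class Agent:
    def __init__(self, shared_memory: List[dict]):
        self.shared_memory = shared_memory        # shared-learning memory

    def step(self, states: Dict[str, NodeState], new_tasks: list) -> None:
        node = self.pick_node(states)             # observe the reported node states
        group = self.group_tasks(new_tasks, states[node])   # action: build a task group
        self.assign(group, node)                  # send the task group to the node
        feedback = self.analyse(group, node)      # reward/error signals (Section IV.C)
        self.shared_memory.append(feedback)       # memorise the quality of the action

    # placeholders standing in for the paper's TG and feedback logic
    def pick_node(self, states):
        return max(states, key=lambda n: states[n].free_slots)
    def group_tasks(self, tasks, state):
        return list(tasks)
    def assign(self, group, node):
        pass
    def analyse(self, group, node):
        return {"reward": 0, "error": 1.0, "l_val": 0.0}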

C. The feedback values for quality action

To determine the favorable action and make the nodes operate with efficient energy consumption, the agent is designed to be capable of tracking run-time information (more specifically, the reinforcement feedback signal). The feedback signal helps the agent group newly arriving tasks (action), striving to optimize TG. The first feedback signal is a reward value that aims for better performance. It denotes the number of tasks in a task group that met their deadline, given by:

$rew_{val} = \sum_{i=1}^{J} \delta_i$    (8)

where $\delta_i = 1$ if $ACT_i \le d_i$, and $\delta_i = 0$ otherwise.

The action is highly regarded when it is capable of gaining a high reward value. Specifically, the reward of the current action should be higher than that of the previous action. To ensure a resource runs at optimal energy consumption, error is introduced as the second feedback value. The error value is a measure of the suitability between the processing weight of a task group and the processing capacity of the resource to which the task group is assigned. It is used to justify the processing weight of a given task group and it helps optimize power usage. The error value errtg of a task group is defined as:

$err_{tg} = \left|\,1 - \frac{1}{proc\_fitness}\,\right|$    (9)

where proc_fitness is computed as the processing weight of a task group divided by the PCc of the resource. The action is favorable if it is able to produce a null or small error value. Both reward and error are used to evaluate the performance of each scheduling output. It is important to note that the agent receives an error value errtg immediately after the task assignment process, but for the reward rewval the agent has to wait until all tasks in a task group have completed their execution. In response to this, the agent can rely merely on the error value of its previous action to group the next tasks. This is more feasible than having the agent wait for the reward, which might delay producing the next task group. However, if the reward is determined to have decreased (i.e., rewval_current < rewval_previous), the agent immediately checks the shared-learning memory and learns from the action with the maximum learning value to group newly arriving tasks (action). In other words, the action contributes to better performance and efficient energy consumption if both feedback values (i.e., reward and error) are actually realized.
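Putting the two feedback values and the learning value together, a minimal sketch of the decision rule described above could look as follows; it relies on our reconstructed forms of Eqs. (7)-(9), and the epsilon guard and dictionary layout are our own additions:

def reward_value(actual_times, deadlines):
    # Eq. (8): number of tasks in the group that met their deadline
    return sum(1 for act, d in zip(actual_times, deadlines) if act <= d)

def error_value(group_weight, node_capacity):
    # Eq. (9): err_tg = |1 - 1 / proc_fitness|, with proc_fitness = pw / PC_c
    proc_fitness = group_weight / node_capacity
    return abs(1.0 - 1.0 / proc_fitness)

def learning_value(reward, error, eps=1e-9):
    # Eq. (7): l_val = reward / error (guarded against a zero error)
    return reward / max(error, eps)

def next_action(prev_reward, curr_reward, shared_memory):
    # if the reward decreased, reuse the remembered action with the maximum
    # learning value; otherwise keep refining the current grouping decision
    if curr_reward < prev_reward and shared_memory:
        return max(shared_memory, key=lambda a: a["l_val"])
    return None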

D. Intelligent controller for adaptive TG

As tasks dynamically arrive with different priorities, a scheduling strategy that explicitly takes task priority into account can be much beneficial for both performance and resource utilization, and hence energy efficiency. The adaptive TG technique comprises two processes: merge and split. In particular, the task-merge process takes place prior to task assignment and the task-split process is then carried out. In most cases, if not all, the number of resources in this study is assumed to be considerably smaller than the number of tasks to be processed.

1) Merge process

In our case of dynamic scheduling, the merge process is performed as new tasks arrive. Each newly arrived task is evaluated and merged into an appropriate task group. The merge process (Figure 5) helps reduce the number of resource allocation alternatives significantly. The agent performs a trial-and-error process in grouping tasks (action), striving for a better learning value l_val. To facilitate decision making in TG, the processing requirement of a task group must be accurately identified and determined. The processing weight is used to indicate the importance of a task group relative to other groups. More formally, the processing weight pw of a task group is defined as:

$pw = \frac{\sum_{i=1}^{opnum} s_i}{\sum_{i=1}^{opnum} d_i}$    (10)

where opnum is the optimal number of tasks that can be merged into one group. The opnum value changes dynamically according to the number of tasks in a task group that gives the maximum learning value. However, the value must not exceed the maximum number of processors in a node. Specifically, a task group with a small pw is required to be executed as early as possible; otherwise, the task group allows some delays. Tasks can be merged in two different ways as follows.

Mixed-priority: In this approach, tasks with different priorities are mixed and merged into the same group. Tasks in a task group are sorted by their deadline (i.e., earliest-deadline first or EDF). Clearly, the mixed-priority approach is most likely to produce moderate processing weights. Since tasks are merged into a group as they arrive, there is no delay in grouping decisions. However, pw may not perfectly represent the overall priority in a task group.

Identical-priority: In the identical-priority approach, tasks are grouped separately according to their priorities. Therefore, tasks with similar characteristics are expected to be grouped together. This approach still applies EDF to sort tasks. It is clear that pw can be an accurate priority indicator since the tasks in a group all have the same priority. For example, a task group with high priority tasks would produce a higher pw compared with that of low priority tasks. Since a task group can be characterized by task priority and pw, it can be allocated to the right (the most favorable) resource. This gives an advantage in minimizing response time while increasing the reward rewval.

Figure 5. Task-merge process

// Prior to the task assignment process
for i ∈ J do
    for i running on S do
        Let opnum = 0
        Let x+ = i of J with priority considered
        Let x− = i of J without priority considered
        Let tg = P(x+ | x−) considering max(l_val)
        for i ∈ tg do
            while opnum ≤ countMax(m ∈ Rj) do
                Update opnum + 1
                Sort i in tg according to di
                Calculate pw
            end while
        end for
        Assign tg to Rj ; Rj ∈ S    // mapping performed
    end for
end for
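To make the two merge modes and the processing weight of Eq. (10) concrete, a possible sketch is given below; it assumes each task object carries size_mi, deadline and a precomputed priority attribute (names are ours), uses a fixed opnum cap, and leaves out the interaction with the learning value:

def processing_weight(tasks):
    # Eq. (10), as reconstructed: pw = (sum_i s_i) / (sum_i d_i) over the merged tasks
    return sum(t.size_mi for t in tasks) / sum(t.deadline for t in tasks)

def merge_mixed(tasks, opnum):
    # mixed-priority: tasks are merged as they arrive and sorted by EDF
    ordered = sorted(tasks, key=lambda t: t.deadline)
    return [ordered[i:i + opnum] for i in range(0, len(ordered), opnum)]

def merge_identical(tasks, opnum):
    # identical-priority: tasks are first separated by priority, then EDF-sorted
    groups = []
    for level in ("high", "medium", "low"):
        same = sorted((t for t in tasks if t.priority == level),
                      key=lambda t: t.deadline)
        groups += [same[i:i + opnum] for i in range(0, len(same), opnum)]
    return groups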

2) Split process

During the task assignment process, a task group is considered a single arrival unit and dedicated to one slot in the queue. Although tasks in a task group have the same waiting time, their execution times still vary according to the performance of the processor on which each task is executed. Due to differences in processing speed, some processors might finish their execution earlier and become idle, and the electricity that passes through them is wasted. These processors should instead receive and execute other tasks. The split process (Figure 6) comes into the picture when this situation occurs and increases resource utilization. The task-split process only applies to tasks in the task group that is at the head of the queue qc. The number of tasks to be split depends on the number of idle processors. For example, if it is determined that two processors are running at pmin, then two tasks from the next waiting task group will be chosen to split from the group and execute first, considering tasks in their EDF order.

Figure 6. Task-split process

// After the task assignment process
for i ∈ tg queuing at qc of Rj do
    for my ∈ Rj do
        if PP(my) == pmin then
            Send i1+ to my, considering EDF order
            Update PP(my)
        end if
        Update tg
    end for
end for

V. PERFORMANCE EVALUATION

In this section, we first describe the experiment configuration. Then, experimental results are presented.

A. Experiment Setting

To evaluate the performance of our energy management framework, we have conducted extensive simulations with a diverse set of tasks and compute nodes. There are five to ten resource sites, in each of which its own agent resides. Each resource site contains a varying number of compute nodes, ranging from 5 to 20, and each node has 4 to 6 processors. Since processors are the major power consumers in compute nodes, we focus on the power consumption of processors. The processing speed of a processor is randomly and uniformly distributed within the range of 500 to 1000 MIPS. We used pmin and pmax of 48 W and 95 W, respectively, which are common for processors used in data centers. The number of tasks in a particular simulation is set between 500 and 3000. Task arrivals follow a Poisson process with a mean inter-arrival time (iat) of five time units. Note that this iat suits the TG technique without adding delay to grouping decisions. For a given task Ti, the computational size si is randomly generated from a uniform distribution ranging from 600 to 7200 MI [23]. The computational size and deadline are consistent with the measure used for task priority. The probabilities of the three task priorities (low, medium and high) are varied in different experiments.
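The workload and platform parameters described above could be generated roughly as follows; this is only our reading of the stated distributions, not the authors' simulator, and all names are assumptions:

import random

P_MIN, P_MAX = 48.0, 95.0   # idle and peak processor power used in the experiments

def generate_workload(n_tasks=3000, mean_iat=5.0):
    # Poisson arrivals: exponential inter-arrival times with a mean of five time units
    t, tasks = 0.0, []
    for _ in range(n_tasks):
        t += random.expovariate(1.0 / mean_iat)
        size_mi = random.uniform(600, 7200)        # computational size s_i in MI [23]
        tasks.append((t, size_mi))
    return tasks

def generate_platform(n_sites=5):
    # 5-20 compute nodes per site, 4-6 processors per node, 500-1000 MIPS per processor
    sites = []
    for _ in range(n_sites):
        nodes = [[random.uniform(500, 1000) for _ in range(random.randint(4, 6))]
                 for _ in range(random.randint(5, 20))]
        sites.append(nodes)
    return sites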

B. Results

Experimental results are presented in three different ways based on performance metrics.

Experiment 1: The impact of learning approach on energy consumption

In this experiment, we investigate how energy consumption is influenced by the learning approaches used.

The comparative performance evaluation is performed between our approach (Adaptive-RL) and extended versions of three other learning approaches: Online RL [11], Q+ learning [12], and Prediction-based learning [13]. Since the learning approaches are incorporated into the same system model and scheduling strategy, they are appropriate for comparison. The performance metrics adopted in this experiment are the average response time AveRT and the energy consumption ECS, defined as the sum of Ec over all compute nodes in the system.

Figure 7. Average response time with different learning approaches (x-axis: number of tasks, 500 to 3000; y-axis: average response time in time units; series: Adaptive RL, Online RL, Q+ learning, Prediction-based learning)

Figure 8. Average energy consumption with different learning approaches (x-axis: number of tasks, 500 to 3000; y-axis: energy consumption in millions; series: Adaptive RL, Online RL, Q+ learning, Prediction-based learning)

Figure 7 clearly shows that Adaptive-RL outperforms the other approaches. Specifically, the discrepancy in average response times among the approaches is small (about 10% on average) when the volume of tasks in the system is low. The performance of Adaptive-RL, however, is more appealing as the number of tasks increases. In Figure 8, Online RL demonstrates results comparable with those of Adaptive-RL (differences of about 5%). When the results in Figures 7 and 8 are taken into account together, however, Adaptive-RL still shows its strength, clearly demonstrating its competence in minimizing both AveRT and ECS. The primary source of this performance gain is the incorporation of the shared-learning memory into Adaptive-RL; this incorporation enables a fast learning process that effectively captures the trade-off between performance and energy consumption.



Experiment 2: The impact of learning process on different patterns of workloads

For this experiment, we study the performance of the learning process in two different system states: lightly loaded and heavily loaded. These states are determined based on the number of incoming tasks during a particular period of time, e.g., 500 tasks and 3,000 tasks for the lightly loaded and heavily loaded states, respectively. In this experiment, resource utilization is used as the main performance metric. It is defined as the percentage of time a processor was busy servicing tasks.

Figure 9. Utilisation rate between Adaptive-RL and Online RL in the heavily loaded state (x-axis: % learning cycles; y-axis: utilisation rate)

Figure 10. Utilisation rate between Adaptive-RL and Online RL in the lightly loaded state (x-axis: % learning cycles; y-axis: utilisation rate)

We focus further analysis on Adaptive-RL and Online RL, as both approaches show compelling results in response time and energy consumption. The results of both learning approaches are presented in Figures 9 and 10. While both Adaptive-RL and Online RL show resource utilization of 60% or above after 100% of the learning cycles, resource utilization with Online RL only starts to increase after 50% and 70% of the learning cycles in the heavily loaded and lightly loaded states, respectively. Note that resource utilization with Adaptive-RL exhibits a linear relationship with the learning cycle in both states. This suggests that incorporating learning information from multiple agents and two feedback values (i.e., reward and error) at different processing stages leads to better and faster decisions.

Experiment 3: The impact of Adaptive-RL on the degree of system heterogeneity

In this section, we investigate how variation in processing capacity PCc affects the learning process. We vary the heterogeneity of resources according to the service coefficient of variation proposed and used in [24]. It is defined as the summation of differences in processing capacity across the target system divided by the processing capacity of the respective resource. Specifically, a resource heterogeneity rate of 0.1 (10%) indicates that the difference in processing capacity is relatively small. Two metrics, the successful rate (i.e., rewval / N) and ECS, are used to analyse the performance of Adaptive-RL in both the lightly loaded and heavily loaded states.

Figure 11. Successful rate of Adaptive-RL in lightly- and heavily-loaded states (x-axis: heterogeneity of resources, 0.1 to 0.9; y-axis: successful rate)

Figure 12. Average energy consumption of Adaptive-RL in lightly- and heavily-loaded states (x-axis: heterogeneity of resources, 0.1 to 0.9; y-axis: energy consumption in millions)

The high successful rate achieved using Adaptive-RL is another compelling strength. As shown in Figure 11, more than 70% of tasks (on average) complete their execution before their deadline. Adaptive-RL also demonstrates a better successful rate when the system heterogeneity is low. This is because the learning process takes a shorter time to understand and learn system changes, given that the resources have relatively similar capacities.


Figure 12 demonstrates the effectiveness of the learning process (incorporated into Adaptive-RL) in terms of energy efficiency. Specifically, the degree of resource heterogeneity does not significantly hamper Adaptive-RL's ability to maintain good energy efficiency.

VI. CONCLUSION

Energy efficiency in PDCSs has become increasingly important because the running and expansion of these systems are heavily dependent on their power consumption. In this paper, we address this energy efficiency issue in the context of scheduling. The incorporation of energy consumption into scheduling adds another layer of complexity to the already intricate scheduling problems in PDCSs. We have effectively modelled an energy management framework with the incorporation of learning-based scheduling and the explicit consideration of heterogeneity in both resources and tasks. In addition, a task-grouping (TG) technique is devised to effectively allocate resources with varying processing capacities to tasks with different priorities and processing requirements. Our novel adaptive reinforcement-based scheduling approach (Adaptive-RL) has successfully demonstrated its efficient energy management capability with compelling performance in terms of response time.

ACKNOWLEDGMENT

Professor A. Y. Zomaya's work is supported by an Australian Research Council Grant LP0884070.

REFERENCES

[1] K. Czajkowski, I. Foster, and C. Kesselman, The Grid: Blueprint for a New Computing Infrastructure: Morgan Kaufmann, 2003.

[2] Y. C. Lee, and A. Y. Zomaya, "Scheduling in grid environments," Handbook of parallel computing: models, algorithms and applications, S.Rajasekaran and J. Reif, eds., pp. 21.1-21.19: CRC Press, Boca Raton, Florida, USA, 2008.

[3] M. Armbrust, A. Fox, R. Griffith et al., Above The Clouds: A Berkeley View of Cloud Computing, EECS Department, University of California, Berkeley, 2009.

[4] J. G. Koomey, "Worldwide electricity used in data centers," Environmental Research Letters, 2008.

[5] L. Minas, and B. Ellison, "InfoQ:The Problem of Power Consumption in Servers," Energy Efficiency for Information Technology: Intel PRESS, 2009.

[6] P. Schreier, “An Energy Crisis in HPC,” Scientific Computing World : HPC Projects Dec 2008/Jan 2009.

[7] K. W. Cameron, R. Ge, and X. Feng, “High-Performance Power-Aware Distributed Computing for Scientific Applications,” IEEE Computer, pages 40-47, 2005.

[8] L. A. Barroso, and U. Holzle, “The Case for Energy-Proportional Computing,” Journal Computer, 40(12): 33-37, 2007.

[9] S. Chahal, and S. Krishnapura, “Selecting Server Processors to Reduce Total Cost,” IT@Intel Brief, March 2009.

[10] A. Y. Zomaya, M. Clements, and S. Olariu, “A Framework for Reinforcement-based Scheduling in Parallel Processor Systems,” IEEE Transaction on Parallel and Distributed Systems, 9(3): 249-260, March 1998.

[11] G. Tesauro, R. Das, H. Chan et al., “Managing Power Consumption and Performance of Computing Systems Using Reinforcement Learning,” In Proc. of Neural Information Processing Systems (NIPS07), 2007.

[12] Y. Tan, W. Liu, and Q. Qiu, “Adaptive Power Management Using Reinforcement Learning,” In Int'l Conf. on Computer-Aided Design (ICCAD'09) San Jose, US, 2009.

[13] J. Ll. Berral, Í. Goiri, R. Nou et al., “Towards energy-aware scheduling in data centers using machine learning,” In Proc. of the 1st Int'l Conf. on Energy-Efficient Computing and Networking, Passau, Germany, 2010.

[14] V. Venkatachalam, and M. Franz, “Power reduction techniques for microprocessor systems,” Journal ACM Computing Surveys, 37(3): 195-237, 2005.

[15] Y. C. Lee, and A. Y. Zomaya, “Minimizing Energy Consumption for Precedence-constrained Applications Using Dynamic Voltage Scaling,” In 9th IEEE/ACM Int'l Symposium on Cluster Computing and the Grid (CCGRID2009), pages 92-99, 2009.

[16] K. Patel, E. Macii, and M. Poncino, “Synthesis of Partitioned Shared Memory Architectures for Energy-Efficient Multi-Processor SoC,” Design, Automation and Test in Europe Conference and Exhibition Volume I (DATE'04), vol. 1, page 10700, 2004.

[17] K. Li, “Energy efficient scheduling of parallel tasks on multiprocessor computers,” Journal of Supercomputing, pages 1-25, 2010.

[18] Y. C. Lee, and A. Y. Zomaya, “Energy efficient utilization of resources in cloud computing systems,” Journal of Supercomputing, pages 1-13, 2010.

[19] R. Subrata, A. Y. Zomaya, and B. Landfeldt, “Cooperative power-aware scheduling in grid computing environments,” Journal of Parallel and Distributed Computing, 70(2): 84-91, 2010.

[20] S. U. Khan, and C. Ardil, “Energy Efficient Resource Allocation in Distributed Computing System,” World Academy of Science, Engineering and Technology, no. 56, pages 667-673, 2009.

[21] K. Lu, and A. Y. Zomaya, “A Hybrid Policy for Job Scheduling and Load Balancing in Heterogeneous Computational Grids,” In 6th Int'l Symposium on Parallel and Distributed Computing (ISPDC'07), page 19, 2007.

[22] M. Hussin, Y. C. Lee, and A. Y. Zomaya, “Dynamic Job-Clustering with different computing priorities for Computational Resource Allocation,” In Poster Proc. IEEE/ACM Int'l Symposium on Cluster, Cloud and Grid (CCGRID), Melbourne, Australia, pages 1-2, 2010.

[23] K. H. Kim, R. Buyya, and J. Kim, “Power Aware Scheduling of Bag-of-Tasks Applications with Deadline Constraints on DVS-enabled Clusters,” In 7th IEEE/ACM Int'l Symposium on Cluster Computing and the Grid (CCGRID2007), 2007.

[24] Y. Fei, J. Changjun, D. Rong et al., “Grid resource management policies for load balancing and energy-saving by vacation queuing theory,” Computers and Electrical Engineering, 35(6): 966-979, 2009.
