
[IEEE 2006 IEEE International Conference on Cluster Computing - Barcelona, Spain (2006.09.25-2006.09.28)] 2006 IEEE International Conference on Cluster Computing - Multi-Objective


Multi-Objective Models for Scheduling Jobs on Parallel Computer Systems

Sangsuree Vasupongayya Su-Hui Chiang

Computer Sciences Department, Portland State University

{vsang, suhui}@cs.pdx.edu

Abstract

This paper is concerned with the design of goal-oriented scheduling policies that deal with multiple goals on production parallel systems. Several objective models are compared, including two new models that allow an explicit specification of the tradeoff among different goals and a hierarchical model studied in our previous paper. We focus on two performance goals commonly placed on production systems: starvation prevention and favoring shorter jobs. We also study how best to formulate high-level performance goals, in an attempt to provide guidelines for defining the objectives.

Keywords: multiple objectives, optimization, parallel job scheduling, backfill.

1. Introduction

Many real-world optimization problems involve multiple goals that are in conflict with each other. In this paper, we are concerned with one such example: the problem of scheduling jobs that run on parallel computer systems. The key challenge is how to balance the conflicting performance goals often placed on production parallel systems. We focus on non-preemptive scheduling policies, which are commonly used on production parallel systems.

To deal with multiple performance goals, one approach commonly used on production parallel computer systems is to prioritize jobs using a function that is a weighted sum of job measures, such as job wait time and slowdown. Jobs with higher priority are considered for scheduling before other jobs. Such priority-based schedulers require that the system administrators determine the priority weights in an ad hoc manner. A larger weight can be used for a job measure to emphasize the importance of that measure. However, since a weighted priority function is not directly related to the performance goals, it is difficult for administrators to determine the priority weights. Furthermore, even if a set of weights performs well for a given period of time, it might not be appropriate during another period.
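As a rough illustration of why such weights are hard to choose, the following sketch ranks two waiting jobs with a weighted-sum priority. The `Job` fields, the two measures, and the weight values are hypothetical and are not taken from any particular production scheduler.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    wait_hr: float     # time waited so far, in hours
    runtime_hr: float  # runtime, in hours


def slowdown(job: Job) -> float:
    # Slowdown = turnaround time / runtime, taking turnaround as if the job started now.
    return (job.wait_hr + job.runtime_hr) / job.runtime_hr


def priority(job: Job, w_wait: float, w_slowdown: float) -> float:
    # Weighted sum of job measures; the weights are what an administrator must tune.
    return w_wait * job.wait_hr + w_slowdown * slowdown(job)


def order(jobs, w_wait, w_slowdown):
    # Higher priority first.
    ranked = sorted(jobs, key=lambda j: priority(j, w_wait, w_slowdown), reverse=True)
    return [j.name for j in ranked]


jobs = [Job("long", wait_hr=2.0, runtime_hr=10.0),
        Job("short", wait_hr=0.5, runtime_hr=0.25)]
print(order(jobs, w_wait=1.0, w_slowdown=0.0))  # wait time only
print(order(jobs, w_wait=0.0, w_slowdown=1.0))  # slowdown only
```

The two weight settings rank the same two jobs in opposite orders, and nothing in the weights themselves says which ranking better serves a goal such as starvation prevention.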

Ideally, the system administrators should only need to specify high-level performance goals, while the scheduler automatically determines the low-level scheduler parameters (such as the priority weights, if appropriate) according to the given goals. To design such goal-oriented schedulers, one question is: what are the best models for making the tradeoff among conflicting goals? Another question is: how should possibly vague high-level goals be formulated? For example, 'preventing starvation' is a requirement often placed on production systems. There may be different ways to formulate such a goal, e.g., "minimizing the maximum wait time" or "minimizing the total job wait time in excess of some target wait time bound". It is not immediately clear what measures should be optimized to address the goal. Dealing with multiple goals is a difficult problem. Several approaches have been discussed in previous papers in other problem domains. However, a general solution does not exist. Furthermore, previous papers have not studied how to formulate vague high-level goals.

In [16], we studied the above questions to some extent, and the results comparing particular search-based goal-oriented policies with traditional priority backfill policies were encouraging. We focused on two high-level goals commonly placed on parallel computers: preventing starvation and favoring shorter jobs. As a starting point, we formulated the goals using a particular hierarchical two-level objective function: (1) minimizing total excessive wait and (2) minimizing average slowdown. In this paper, we (1) propose new objective models that allow an explicit specification of the tradeoff; and (2) compare different objective models and measures with respect to the above two high-level goals, to understand their effect on performance and to provide guidelines for setting the objectives.

This paper is organized as follows. Section 2 provides some background information. Section 3 defines objective models and measures. Section 4 discusses our evaluation method. Section 5 presents evaluation results. Section 6 provides a summary.

1-4244-0328-6/06/$20.00 ©2006 IEEE.


Table 1. Capacity and job limit on IA-64

Capacity     Job limit               Period
(# nodes)    # nodes    Run time
128          128        12h          6/03 - 11/03
128          128        24h          12/03 - 3/04

2. Background

This section describes the workloads used and briefly reviews backfill policies, multi-objective problems, and the implementation of search-based scheduling policies.

2.1. Workloads

We use the workload that ran on an Intel Itanium (IA-64) Linux cluster (a.k.a. Titan) from NCSA. Table 1 summarizes the system capacity and job limits. The monthly load is typically in the range of 70-80%. More information about the monthly job mix can be found in [16]. We also studied SP2 workloads from SDSC and KTH [17]. The results are qualitatively similar to those for IA-64 and are omitted to save space.

2.2. Priority Backfill Scheduling Policies

Since the first implementation of a backfill scheduler [8] a decade ago, several widely used production schedulers, including Maui [9], LSF [12], PBS [11], and LoadLeveler [6], have implemented the backfill feature. Under priority backfill policies, jobs are considered for scheduling in the given priority order; one or more highest-priority waiting jobs are each given a reservation if they cannot be started at the current time; lower-priority jobs can be backfilled on idle resources as long as they will not delay the reservations.
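The backfill rule described above can be sketched as follows for the single-reservation case. This is a simplified illustration and not the code of any of the cited schedulers: the function name, the tuple layouts, and the assumption that runtimes are known exactly are all hypothetical choices made for the example.

```python
def backfill(now, free, running, queue):
    """Return the names of the jobs started at time `now`.

    running: list of (end_time, nodes) for jobs already executing
    queue:   list of (name, nodes, runtime) in priority order
    Assumes every job fits on the machine when it is empty.
    """
    started = []
    queue = list(queue)
    # Start jobs in priority order for as long as the next one fits.
    while queue and queue[0][1] <= free:
        name, nodes, runtime = queue.pop(0)
        free -= nodes
        running = running + [(now + runtime, nodes)]
        started.append(name)
    if not queue:
        return started
    # Reservation for the highest-priority waiting job: the earliest time
    # (shadow) at which enough nodes are free, and the nodes left over then.
    head_nodes = queue[0][1]
    shadow, extra = float("inf"), 0
    avail = free
    for end, nodes in sorted(running):
        avail += nodes
        if avail >= head_nodes:
            shadow, extra = end, avail - head_nodes
            break
    # Backfill lower-priority jobs that cannot delay the reservation.
    for name, nodes, runtime in queue[1:]:
        ends_in_time = now + runtime <= shadow
        if nodes <= free and (ends_in_time or nodes <= extra):
            free -= nodes
            if not ends_in_time:
                extra -= nodes
            started.append(name)
    return started


# 128-node system with 32 nodes idle; job A needs 96 nodes and must wait until time 10.
running = [(10, 64), (20, 32)]
print(backfill(0, 32, running, [("A", 96, 5), ("C", 16, 12), ("B", 16, 8)]))
```

In this example job C is skipped because it would still hold nodes at time 10, when the reservation for A begins, whereas job B finishes at time 8 and can be backfilled.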

Priority backfill policies have been extensively studied. Many papers compared alternative priority functions (e.g., [18, 14, 3, 13, 1]); some studied the impact of the number of reservations [10, 2]. The key conclusions are as follows. First, there is a tradeoff between minimizing the maximum wait and minimizing average-performance measures. In particular, LXF-backfill (i.e., largest-slowdown-first) significantly improves on the average job slowdown and wait time of FCFS-backfill (first-come-first-served) but may have a poor maximum wait. In the extreme case, Shortest-Job-First with backfill has a starvation problem and thus is not a practical policy. Second, too many reservations can degrade the average performance, while not improving the maximum wait.

Many papers have proposed variations of backfill policies, mainly to improve the average slowdown or average wait time of FCFS-backfill. Here, we discuss two recent papers [15, 7] that are most related to our work. They proposed adaptive backfill policies, which apply on-line simulations of a fixed set of candidate backfill policies, and choose the policy that has the best average wait or average slowdown. The two approaches differ in when and how they perform on-line simulations and what candidate policies are simulated. Both studies focus on only a single average performance measure and consider only a fixed set of candidate policies. In contrast, our policies deal with multiple performance goals, and explore as many schedules as time permits using combinatorial searches.

2.3. Multi-Objective Problems

Several approaches have been proposed to deal with multi-objective problems. Surveys of this subject can be found in many papers (e.g., [4, 5]). The approach in our previous study [16] is based on lexicographical ordering, combined with an idea similar to goal programming. In lexicographical ordering, the objectives are ranked in order of importance; the best solution optimizes the most important objective, and the next lower level objective is used to break a tie (if any). In goal programming, the measure in each objective is given a target value to achieve, and the best solution minimizes the maximum deviation from the target values.

Two other approaches are not considered in this paper. A weighted objective is similar to a weighted priority function. While simple, it requires weights that are difficult to choose. The Pareto optimization approach finds all solutions that are non-dominated, i.e., no other solutions are superior to them in all objective components. Some mechanism is still required to choose the 'best' of the non-dominated solutions.
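To make the contrast between lexicographical ordering and Pareto optimization concrete, here is a small sketch over objective vectors that are all minimized. The candidate values are made up, and `dominates` uses the common textbook definition (no worse in every component, strictly better in at least one), which is an assumption rather than a definition taken from this paper.

```python
def lex_best(points):
    # Lexicographical ordering: compare the most important objective first,
    # and use later objectives only to break ties.
    return min(points)


def dominates(a, b):
    # a dominates b if it is no worse everywhere and strictly better somewhere.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    # Keep the points that no other point dominates.
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]


# Hypothetical (starvation measure, average measure) pairs for four schedules.
candidates = [(10, 5.0), (10, 4.0), (12, 2.0), (13, 2.5)]
print(lex_best(candidates))
print(pareto_front(candidates))
```

Lexicographical ordering returns a single schedule, while the Pareto front here still contains two schedules, so a further rule would be needed to pick between them.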

2.4. Search-based Scheduling Policies

In this section, we briefly discuss the main features of the goal-oriented scheduling policies proposed in [16].

At each scheduling epoch (triggered by a new job arrival or a departure), the scheduler explores the potentially large space of possible schedules to find the 'best' one. The search space is a tree of waiting jobs. Figure 1 shows an example tree of four jobs. Each node represents a job (except the dummy node). In this example, the tree contains 24 (i.e., 4!) possible permutations (i.e., schedules) of the four jobs, and 64 nodes. When evaluating a schedule (e.g., 1-2-3-4), jobs are considered in the order they appear in the schedule, and the start time of each job is tentatively determined using first fit. The performance measures of each schedule are computed and compared with those of the best schedule found so far in the current scheduling epoch. Because the search space is potentially large, it is computationally infeasible to explore it entirely; a stopping criterion is used to stop the search. At the end of the current scheduling epoch, the best schedule found is used.
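The size of such a tree can be checked with a few lines of arithmetic: level k of the tree holds one node for every ordered choice of k distinct jobs, i.e., n!/(n-k)! nodes. The sketch below (the function name is an assumption) counts schedules and job nodes, excluding the dummy root.

```python
from math import factorial


def tree_size(n):
    """Return (number of complete schedules, number of job nodes) for n waiting jobs."""
    schedules = factorial(n)
    # Level k has n!/(n-k)! nodes, one for each ordered prefix of length k.
    nodes = sum(factorial(n) // factorial(n - k) for k in range(1, n + 1))
    return schedules, nodes


print(tree_size(4))   # four waiting jobs
print(tree_size(10))  # ten waiting jobs
```

For four jobs this gives 24 schedules and 4 + 12 + 24 + 24 = 64 nodes; for ten jobs the tree already has several million nodes, which is why an exhaustive search is not practical.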


[Figure 1: search tree rooted at dummy node 0, with jobs 1 - 4 as its children; each root-to-leaf path is one permutation of the four jobs.]

Figure 1. An example tree of four jobs 1 - 4, showing all permutations (node 0 is a dummy)

As in our previous study, the stopping criterion used in this study is a limit on the number of nodes that can be visited during each scheduling epoch. The symbol L denotes this nodes limit. For example, for the 4-job example, with L = 4, only one schedule (which contains four nodes) can be evaluated.
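The effect of the nodes limit can be illustrated with a plain depth-first enumeration of job orders. Note that this is a stand-in: the policies in [16] use depth-discrepancy search with a branching heuristic, which visits schedules in a different order. The function names and the `evaluate` callback are hypothetical.

```python
def search(jobs, evaluate, node_limit):
    """Depth-first search over job orders, stopping after node_limit node visits.

    evaluate(schedule) returns a score to minimize.
    Returns ((best_score, best_schedule) or None, nodes_visited).
    """
    best = None
    visited = 0

    def visit(prefix, remaining):
        nonlocal best, visited
        for i, job in enumerate(remaining):
            if visited >= node_limit:
                return
            visited += 1  # one tree node per job placed
            schedule = prefix + [job]
            rest = remaining[:i] + remaining[i + 1:]
            if not rest:
                # A leaf: the schedule is complete and can be evaluated.
                score = evaluate(schedule)
                if best is None or score < best[0]:
                    best = (score, schedule)
            else:
                visit(schedule, rest)

    visit([], list(jobs))
    return best, visited


def count_evaluated(n_jobs, node_limit):
    # Count how many complete schedules are reached within the limit.
    seen = []

    def evaluate(schedule):
        seen.append(tuple(schedule))
        return 0.0

    search(range(1, n_jobs + 1), evaluate, node_limit)
    return len(seen)


for limit in (4, 6, 64):
    print(limit, count_evaluated(4, limit))
```

With four jobs, a limit of 4 reaches exactly one schedule, a limit of 6 reaches two (the second schedule shares the prefix 1-2 with the first), and a limit of 64 covers all 24.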

The success of search algorithms depends on whether they explore good schedules soon enough. In [16], we compared several heuristics. For the particular scheduling problem studied, we found that the best algorithm is depth-discrepancy search (DDS) combined with the largest-slowdown-first (lxf) branching heuristic. Note that DDS determines the order in which schedules are evaluated, while the lxf branching heuristic determines how the jobs are ordered in the tree. Interested readers are referred to [16] for more detail.

In this study, a range of L between 100 and 64K nodes is evaluated. We find that for most months, L = 400 nodes is representative of the performance of the goal-oriented policies studied. The only exception is January 2004, for which L = 4K is required due to a large backlog in that month. In Section 5, we show the performance of goal-oriented policies using L = 400 for all months except January 2004, for which L = 4K is used. The execution time of each scheduling epoch increases roughly linearly with L; it takes a few to a few tens of milliseconds to evaluate 4K nodes on a 2-GHz Intel Pentium 4 machine running Windows XP with 512 MB of memory.

3. Objectives and policies

In Section 3.1, we define three objective models to be studied. Policies and particular measures to be used in the objective are defined in Section 3.2.

3.1. Objective models

Three objective models are studied: Lexical, Ordered-tradeoff, and Equal-tradeoff models. For the purpose of explaining the models, consider the objective of optimizing two measures α and β. Assume that S1 is the best schedule found so far in a given scheduling epoch, and S2 is the next schedule to be considered. Table 2 formally defines the conditions under which S2 is chosen over S1 in each model. More discussion is provided below.

Lexical model, denoted by Lexical(α→β). In this model, α dominates β, i.e., if S2 improves α of S1, then S2 is chosen over S1, regardless of how their β values compare; the measure β is only evaluated to break a tie. Note that this model is the same as the hierarchical model used in our previous paper [16], where α is the total excessive wait (to be defined in Section 3.2) and β is the average slowdown. Although the model is simple, the measures need to be carefully chosen to avoid sacrificing the second measure β.

Ordered-tradeoff model, denoted by Tradeoff(α→β). In this model, α does not dominate β, but is still more important than β. The values of β in S1 and S2 need to be compared even if S2 improves α, and if S2 degrades β more than it improves α, then S2 is not considered to be better. The improvement and degradation are computed as ratios, as shown in Table 2. For the measure α, the ratio improvement and the ratio degradation made by S2 over S1 are written $\Delta^{+}_{\alpha}(S_1, S_2)$ and $\Delta^{-}_{\alpha}(S_1, S_2)$, respectively.

Equal-tradeoff model, denoted by Tradeoff(α:β). In this model, the measures α and β are equally important. That is, S2 is chosen over S1 if S2 improves either measure without degrading the other measure by a larger ratio. Note that Tradeoff(α:β) is equivalent to Tradeoff(β:α).

3.2. Measures and goal-oriented policies

In this section, we define the goal-oriented policies to be studied and the measures used in their objectives.

Table 3 summarizes the measures considered. There are two starvation measures: total excessive wait (Tw) and maximum wait (maxW); and two average measures: average wait (avgW) and average slowdown (avgX). Thus, there are 2 × 2 (i.e., four) pairs of measures.

The excessive wait time of each job is defined to be the job wait time in excess of a given threshold, t. Note that if a job has waited no more than t, the job has zero excessive wait. The total excessive wait is the sum of the excessive wait over all jobs that are currently waiting. The idea of minimizing the total excessive wait is to minimize the total deviation of the wait time of each job currently waiting in the queue from the given threshold. To adapt to changes in the workload, we use a dynamic threshold, defined as the time that the oldest job currently in the queue has been waiting, measured from its arrival to the beginning of the current scheduling epoch. In [16], we showed that using this dynamic threshold results in much better performance than using a fixed threshold.
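A minimal sketch of these two definitions follows. The dictionaries of arrival and tentative start times, the job names, and the time unit are hypothetical; in the actual policies the start times would come from evaluating a candidate schedule.

```python
def dynamic_threshold(arrivals, now):
    # Time the oldest waiting job has waited when the scheduling epoch begins.
    return now - min(arrivals.values())


def total_excessive_wait(arrivals, starts, threshold):
    # Sum, over waiting jobs, of the wait time in excess of the threshold.
    return sum(max(0.0, (starts[j] - arrivals[j]) - threshold) for j in arrivals)


# Three waiting jobs at epoch time 100 (arbitrary time units).
arrivals = {"a": 0.0, "b": 40.0, "c": 90.0}
t = dynamic_threshold(arrivals, now=100.0)

# Tentative start times under two candidate schedules.
schedule_1 = {"a": 100.0, "b": 160.0, "c": 300.0}
schedule_2 = {"a": 150.0, "b": 100.0, "c": 200.0}
print(t)
print(total_excessive_wait(arrivals, schedule_1, t))
print(total_excessive_wait(arrivals, schedule_2, t))
```

Here the threshold is 100, and the second schedule has the smaller total excessive wait (60 versus 130) even though it delays the oldest job, because it avoids the very long wait of job c.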

For each pair of measures, five policies can be defined. For example, for the pair Tw and avgX, the following policies can be defined: Lexical(Tw→avgX), Lexical(avgX→Tw), Tradeoff(Tw→avgX), Tradeoff(avgX→Tw), and Tradeoff(Tw:avgX).


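Table 2. Definition of objective models: two measures α, β

Model: Lexical. Notation: Lexical(α→β).
S2 is considered better than S1 if
  (1) $\Delta^{+}_{\alpha}(S_1, S_2) > 0$, or
  (2) $\Delta^{+}_{\alpha}(S_1, S_2) = 0$ and $\Delta^{+}_{\beta}(S_1, S_2) > 0$

Model: Ordered-tradeoff. Notation: Tradeoff(α→β).
S2 is considered better than S1 if
  (1) $\Delta^{+}_{\alpha}(S_1, S_2) > 0$ and $\Delta^{+}_{\alpha}(S_1, S_2) \ge \Delta^{-}_{\beta}(S_1, S_2)$, or
  (2) $\Delta^{+}_{\alpha}(S_1, S_2) = 0$ and $\Delta^{+}_{\beta}(S_1, S_2) > 0$

Model: Equal-tradeoff. Notation: Tradeoff(α:β).
S2 is considered better than S1 if
  (1) $\Delta^{+}_{\alpha}(S_1, S_2) > 0$ and $\Delta^{+}_{\alpha}(S_1, S_2) \ge \Delta^{-}_{\beta}(S_1, S_2)$, or
  (2) $\Delta^{+}_{\beta}(S_1, S_2) > 0$ and $\Delta^{+}_{\beta}(S_1, S_2) \ge \Delta^{-}_{\alpha}(S_1, S_2)$

Let $\alpha_S$ be the value of measure α in the schedule S. Assuming minimization of α, define the following:

$\Delta^{+}_{\alpha}(S_1, S_2) = (\alpha_{S_1} - \alpha_{S_2}) / \alpha_{S_1}$: the ratio improvement in α made by S2 over S1

$\Delta^{-}_{\alpha}(S_1, S_2) = (\alpha_{S_2} - \alpha_{S_1}) / \alpha_{S_1}$: the ratio degradation in α made by S2 over S1

(Similarly, $\Delta^{+}_{\beta}(S_1, S_2)$ and $\Delta^{-}_{\beta}(S_1, S_2)$ can be defined.)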
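The three acceptance rules can be sketched in code as follows, with each schedule represented by its pair of measure values (α, β), both minimized. The function names are hypothetical, and the handling of a zero value in S1 (where the ratio is undefined) is an assumption of this sketch, since the table does not address that case.

```python
def improvement(v1, v2):
    # Ratio improvement of S2 over S1 for a minimized measure: (v1 - v2) / v1.
    if v1 == 0:
        # Assumption: an unchanged zero is a tie; any increase from zero is
        # treated as an unbounded degradation.
        return 0.0 if v2 == 0 else float("-inf")
    return (v1 - v2) / v1


def degradation(v1, v2):
    # Ratio degradation (v2 - v1) / v1 is the negative of the improvement.
    return -improvement(v1, v2)


def lexical(s1, s2):
    imp_a = improvement(s1[0], s2[0])
    imp_b = improvement(s1[1], s2[1])
    return imp_a > 0 or (imp_a == 0 and imp_b > 0)


def ordered_tradeoff(s1, s2):
    imp_a = improvement(s1[0], s2[0])
    imp_b = improvement(s1[1], s2[1])
    deg_b = degradation(s1[1], s2[1])
    return (imp_a > 0 and imp_a >= deg_b) or (imp_a == 0 and imp_b > 0)


def equal_tradeoff(s1, s2):
    imp_a = improvement(s1[0], s2[0])
    imp_b = improvement(s1[1], s2[1])
    deg_a = degradation(s1[0], s2[0])
    deg_b = degradation(s1[1], s2[1])
    return (imp_a > 0 and imp_a >= deg_b) or (imp_b > 0 and imp_b >= deg_a)


best = (100.0, 10.0)  # hypothetical (alpha, beta) of the best schedule so far
for cand in [(95.0, 14.0), (100.0, 8.0), (104.0, 5.0)]:
    print(cand, lexical(best, cand), ordered_tradeoff(best, cand), equal_tradeoff(best, cand))
```

The first candidate improves α by 5% but degrades β by 40%, so only the Lexical rule accepts it. The third candidate degrades α by 4% while improving β by 50%, so only the Equal-tradeoff rule accepts it.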

Table 3. Notation of measures in the objective

Type                  Notation   Meaning
Starvation measure    Tw         Total excessive wait over all jobs
                      maxW       Maximum job wait time
Average measure       avgW       Average job wait time
                      avgX       Average job slowdown
                                 (slowdown = turnaround time / runtime for each job)

4. Evaluation Methodology

Policies are evaluated using event-driven simulation of ten monthly IA-64 job traces, discussed in Section 2.1. Goal-oriented scheduling policies are compared against FCFS-backfill and LXF-backfill, which roughly provide the envelope for the maximum wait and average slowdown, respectively, for backfill policies. In our simulation, both backfill policies give a scheduled start time to the highest-priority job only, as we do not find that more reservations improve the performance for the workloads studied.

An extensive set of measures is used for performance evaluation, including the average wait, maximum wait, and average bounded slowdown, as well as the total normalized excessive wait. For performance comparison purposes, we use the bounded slowdown instead of the slowdown, to reduce the dramatic effect of very short jobs on the average slowdown measure. Specifically, we use 1 minute as a lower bound on the actual runtime, i.e., the bounded slowdown of a job that runs for under 1 minute is computed as if it were a 1-minute job. The normalized excessive wait of each job submitted in a given month is the job wait time in excess of the maximum wait under FCFS-backfill for that month. This measure is used only for comparing policies, and is not used in the simulation.
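The two evaluation measures can be sketched as below. Treating a sub-minute job "as if it were a 1-minute job" is interpreted here as replacing its runtime by 60 seconds in both the turnaround time and the denominator; that reading, like the function names and sample values, is an assumption of this sketch.

```python
def slowdown(wait_s, runtime_s):
    # Slowdown = turnaround time / runtime.
    return (wait_s + runtime_s) / runtime_s


def bounded_slowdown(wait_s, runtime_s, bound_s=60.0):
    # Jobs shorter than the bound are treated as if they ran for bound_s seconds.
    r = max(runtime_s, bound_s)
    return (wait_s + r) / r


def normalized_excessive_wait(wait_s, fcfs_max_wait_s):
    # Wait in excess of the month's maximum wait under FCFS-backfill.
    return max(0.0, wait_s - fcfs_max_wait_s)


# A 5-second job that waited 10 minutes, and a 20-minute job with the same wait.
print(slowdown(600.0, 5.0), bounded_slowdown(600.0, 5.0))
print(slowdown(600.0, 1200.0), bounded_slowdown(600.0, 1200.0))
```

The 5-second job has a slowdown of 121 but a bounded slowdown of 11, while the 20-minute job is unaffected by the bound (1.5 in both cases).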

Two levels of load are simulated: (1) ρ = original load; (2) ρ = 0.9, which is artificially created by shrinking job interarrival times, as in previous papers (e.g., [13]). Results are shown for ρ = 0.9 only; the performance differences between policies under the original load are smaller. Each simulation of a given month includes a one-week warm-up (from the previous month) and a one-week cool-down (from the next month). Performance measures for a month are computed for jobs submitted during the month.
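One simple way to raise the offered load by shrinking interarrival times is to scale every interarrival gap by the same factor, original load / target load, while leaving job sizes and runtimes unchanged. The sketch below assumes this uniform scaling, which the text does not spell out; the function name and the sample trace are hypothetical.

```python
def scale_arrivals(arrivals, original_load, target_load):
    """Shrink interarrival times uniformly so the offered load rises to target_load.

    arrivals: sorted list of job arrival times.
    """
    factor = original_load / target_load
    first = arrivals[0]
    # Scaling each offset from the first arrival scales every gap by `factor`.
    return [first + (t - first) * factor for t in arrivals]


# Hypothetical trace with an original load of 0.75, rescaled to 0.9.
trace = [0.0, 100.0, 250.0, 400.0]
scaled = scale_arrivals(trace, original_load=0.75, target_load=0.9)
print([round(t, 1) for t in scaled])
```

The same jobs now arrive over a window that is 0.75/0.9 of the original length, so the work demanded per unit time increases by the inverse of that factor.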

Finally, we assume that job runtime information is known a priori to the scheduler. This allows us to study the performance of goal-oriented policies without the complex interference from inaccurate runtime information.

5. Evaluation Results

Given two high-level goals, starvation prevention and favoring shorter jobs, different objectives are constructed using the objective models defined in Table 2 and the measures defined in Table 3. This section studies whether and how these different objectives perform.

5.1. Comparisons of objective models

In this section, we evaluate the three objective models, using the same pair of measures: (1) Tw and (2) avgX. They are Lexical(Tw→avgX), Tradeoff(Tw→avgX), and Tradeoff(Tw:avgX). Note that Lexical(Tw→avgX) is the same as "DDS/lxf/dynB" in [16].

Figure 2 compares the three policies, as well as FCFS-backfill and LXF-backfill. As discussed in Section 2.4, the nodes limit (L) used in each policy is 400 for each month, except that it is 4K for Jan. 2004. Figure 2(a)-(b) show that the three goal-oriented policies have fairly similar average and maximum wait each month. Figure 2(c)-(d) show that Lexical(Tw→avgX) has significantly worse average and maximum bounded slowdown than the two Tradeoff models in a few months (especially 7/03), because Tw dominates avgX in Lexical(Tw→avgX). To illustrate the problem, Figure 3 shows how the three goal-oriented policies change versus L in the range of 100 to 64K, for Aug. 2003. Many other months have a similar trend. Figure 3(a) shows that as L increases, each policy gradually improves the maximum wait, as expected. However, Figure 3(b) shows that as L increases to 4K, Lexical(Tw→avgX) degrades avgX by more than it improves the maximum wait. Nevertheless, Lexical(Tw→avgX) still performs well compared to the backfill policies (shown in Figure 2), because optimizing a measure like Tw still leaves room for improving the average slowdown.

[Figure 2: four panels plotted per month, 6/03 through 3/04: (a) Avg. wait (hr), (b) Max. wait (hr), (c) Avg. bounded slowdown, (d) Max. bounded slowdown. Policies: FCFS-backfill, LXF-backfill, Lexical(Tw→avgX), Tradeoff(Tw→avgX), Tradeoff(Tw:avgX).]

Figure 2. Tradeoff vs. Lexical models using Tw and avgX: overall performance. L = 400 (except 1/04: L = 4K)

[Figure 3: two panels plotted against the nodes limit L, from 100 to 64K: (a) Max. wait (hr), (b) Avg. bounded slowdown. Policies: Lexical(Tw→avgX), Tradeoff(Tw→avgX), Tradeoff(Tw:avgX).]

Figure 3. Performance versus nodes limit (August 2003)

The key result is that for the optimization problem studied, both Tradeoff models are preferred over the Lexical model; between the two Tradeoff models, Tradeoff(Tw:avgX) is perhaps preferred. Finally, Figure 2 also shows that these goal-oriented policies have a maximum wait similar to that of FCFS-backfill and average performance comparable to that of LXF-backfill in almost all months.

5.2. Impact of different objective measures

In this section, we study to what extent using different measures in the objective affects the performance, with a focus on the starvation measures. Tradeoff(Tw:avgX), from the previous section, is compared with policies that use different measures in the objectives.

[Figure 4: four panels. (a) Max. wait (hr) and (b) Avg. bounded slowdown, plotted per month, 6/03 through 3/04, for Tradeoff(maxW:avgX), Tradeoff(maxW→avgX), and Tradeoff(Tw:avgX). (c) Max. wait (hr) vs. L and (d) Total normalized excessive wait (hr) vs. L, both for Oct. 2003 with L from 100 to 64K, for FCFS-bf, LXF-bf, Tradeoff(maxW→avgX), and Tradeoff(Tw:avgX).]

Figure 4. Optimizing maxW or Tw? Graphs (a)-(b): L = 400 (except 1/04: L = 4K)

Figure 4(a)-(b) plot the maximum wait and average slowdown for each month under Tradeoff(maxW:avgX), Tradeoff(maxW→avgX), and Tradeoff(Tw:avgX). Figure 4(c)-(d) plot the maximum wait and total normalized excessive wait of Tradeoff(maxW→avgX) and Tradeoff(Tw:avgX) versus L, for October 2003; the two backfill policies are included for comparison. Figure 4(a) shows that Tradeoff(maxW:avgX) has a starvation problem in most months, even though optimizing maxW is part of its objective. In contrast, Tradeoff(maxW→avgX), in which maxW is more important than avgX, performs fairly similarly to Tradeoff(Tw:avgX), as shown in Figure 4(a)-(b). However, Figure 4(c) reveals a potential instability problem in Tradeoff(maxW→avgX), i.e., its maximum wait oscillates in a fairly wide range as L changes. The problem is observed in several other months. In addition, Figure 4(d) shows that Tradeoff(maxW→avgX) has a worse total normalized excessive wait than Tradeoff(Tw:avgX).

The results strongly suggest that trading between a single-job measure (such as maxW) and an overall-job measure (such as avgX) can be problematic.

Below, we comment on other results, which are not shown in order to conserve space. First, policies that optimize the average wait instead of the average slowdown (e.g., Tradeoff(maxW:avgW)) have significantly worse average and maximum slowdown for many months, while having no or minimal improvement in the wait time measures, compared to their counterparts (in Figure 4) that optimize avgX. Second, making avgX or avgW more important than Tw or maxW results in worse performance than that of Tradeoff(Tw:avgX).

6. Conclusions

Multi-objective problems typically involve tradeoffs. In this study, we propose objective models that allow an explicit specification of the tradeoff among objective components, and compare them with a lexical ordering model studied in our previous paper [16]. We evaluate these models in the context of job scheduling for parallel computer systems. We focus on two high-level goals often placed on production parallel systems: (1) starvation prevention; (2) favoring shorter jobs. Objectives are constructed using measures consistent with the goals. They are (1) total excessive wait (Tw) or maximum wait (maxW); (2) average slowdown (avgX) or average wait (avgW), respectively. Performance is evaluated by simulation using ten monthly IA-64 workloads from NCSA.

The key conclusions are: (1) our Equal-tradeoff model can be more effective in dealing with objective components that are roughly equally important, compared to the Lexical model studied in [16]; (2) it is not appropriate to trade between a single-job measure (such as maxW) and an overall measure (such as avgX); (3) minimizing Tw is more appropriate than minimizing maxW in dealing with 'starvation prevention', when there are other conflicting objective components to consider. Another result is that optimizing average slowdown is more effective than optimizing average wait, in that it achieves a considerably better average slowdown and a similar average wait.

Future work includes incorporating special priority and fairshare goals in the objectives, further improving the efficiency of the search algorithm, and studying the impact of user-estimated runtimes on the new objective models.

References

[1] S.-H. Chiang, A. Arpaci-Dusseau, and M. K. Vernon. The impact of more accurate requested runtimes on production job scheduling performance. Proc. 8th Job Scheduling Strategies for Parallel Processing, pp. 103–127, Edinburgh, Scotland, July 2002. Springer Verlag. LNCS. Vol. 2537.

[2] S.-H. Chiang and C. Fu. Re-evaluating reservation policies for backfill scheduling on parallel systems. 16th IASTED Int'l Conf. on Parallel and Distributed Computing and Systems, Cambridge, MA, Nov. 2004.

[3] S.-H. Chiang and M. K. Vernon. Production job scheduling for parallel shared memory systems. In Proc. 15th IEEE Int'l. Parallel and Distributed Processing Symp. (IPDPS), San Francisco, April 2001.

[4] M. Ehrgott and X. Gandibleux. A survey and annotated bibliography of multiobjective combinatorial optimization. OR Spektrum, 22:425–460, 2000.

[5] C. M. Fonseca and P. J. Fleming. An overview of evolutionary algorithms in multiobjective optimisation. Evolutionary Computation, 3(1):1–16, 1995.

[6] S. Kannan, M. Roberts, P. Mayes, D. Brelsford, and J. F. Skovira. Workload Management with LoadLeveler, IBM, ibm.com/redbooks.

[7] B. Lawson and E. Smirni. Self-adaptive scheduler parameterization via online simulation. In Proc. 19th IEEE Int'l Parallel & Distributed Processing Symposium (IPDPS), Denver, April 2005.

[8] D. Lifka. The ANL/IBM SP scheduling system. Proc. 1st Job Scheduling Strategies for Parallel Processing, Santa Barbara, April 1995. Springer Verlag. LNCS. Vol. 949.

[9] Maui Scheduler, http://www.supercluster.org/maui/.

[10] A. W. Mu'alem and D. G. Feitelson. Utilization, predictability, workloads, and user runtime estimates in scheduling the IBM SP2 with backfilling. IEEE Trans. Parallel & Distributed Syst., 12(6):529–543, June 2001.

[11] PBS Scheduler, www.nas.nasa.gov/Software/PBS/.

[12] Platform Computing Corporation, North York. LSF Scheduler. http://www.platform.com/.

[13] S. Srinivasan, R. Kettimuthu, V. Subramani, and P. Sadayappan. Selective reservation strategies for backfill job scheduling. Proc. 8th Job Scheduling Strategies for Parallel Processing, Edinburgh, Scotland, July 2002. Springer Verlag. LNCS. Vol. 2537.

[14] D. Talby and D. G. Feitelson. Supporting priorities and improving utilization of the IBM SP2 scheduler using slack-based backfilling. 13th Int'l. Parallel Processing Symp., pp. 513–517, San Juan, April 1999.

[15] D. Talby and D. G. Feitelson. Improving and stabilizing parallel computer performance using adaptive backfilling. In Proc. 19th IEEE Int'l Parallel & Distributed Processing Symposium, Denver, April 2005.

[16] S. Vasupongayya, S.-H. Chiang, and B. Massey. Search-based job scheduling for parallel computer workloads. In Proc. IEEE Int'l Conf. on Cluster Computing (Cluster 2005), Boston, MA, Sep. 2005.

[17] Parallel Workloads Archive, www.cs.huji.ac.il/labs/parallel/workload/models.html.

[18] D. Zotkin and P. J. Keleher. Job-length estimation and performance in backfilling schedulers. In 8th IEEE Int'l Symp. on High Performance Distributed Computing, pp. 236–243, Redondo Beach, August 1999.