13
J Comb Optim (2007) 13:33–45 DOI 10.1007/s10878-006-9011-y A simple linear time approximation algorithm for multi-processor job scheduling on four processors Jingui Huang · Jianer Chen · Songqiao Chen · Jianxin Wang Published online: 12 October 2006 C Springer Science + Business Media, LLC 2006 Abstract Multiprocessor job scheduling problem has become increasingly interesting, for both theoretical study and practical applications. Theoretical study of the problem has made significant progress recently, which, however, seems not to imply practical algorithms for the problem, yet. Practical algorithms have been developed only for systems with three proces- sors and the techniques seem difficult to extend to systems with more than three processors. This paper offers new observations and introduces new techniques for the multiprocessor job scheduling problem on systems with four processors. A very simple and practical linear time approximation algorithm of ratio bounded by 1.5 is developed for the multi-processor job scheduling problem P 4 |fix|C max , which significantly improves previous results. Our tech- niques are also useful for multiprocessor job scheduling problems on systems with more than four processors. Keywords Multi-processor job scheduling . Approximation algorithm . NP-hard problem 1 Introduction One of the assumption made in classical scheduling theory is that a job is always executed by one processor at a time. With the advances in parallel algorithms, this assumption may no longer be valid for job systems. For example, in semiconductor circuit design workforce planning, a design project is to be processed by a group of people. The project contains n jobs, and each job is handled by a specific subgroup of people working simultaneously on the job. Note that a person may belong to several different subgroups but he can work for at most one subgroup at a time. Now the question is how we can schedule the jobs so J. Huang . J. Chen . S. Chen . J. Wang School of Information Science and Engineering, Central South University, ChangSha, Hunan 410083, The People’s Republic of China J. Chen () Department of Computer Science, Texas A&M University, College Station, TX 77843-3112, USA e-mail: [email protected] Springer

A simple linear time approximation algorithm for multi-processor job scheduling on four processors

Embed Size (px)

Citation preview

Page 1: A simple linear time approximation algorithm for multi-processor job scheduling on four processors

J Comb Optim (2007) 13:33–45

DOI 10.1007/s10878-006-9011-y

A simple linear time approximation algorithm formulti-processor job scheduling on four processors

Jingui Huang · Jianer Chen · Songqiao Chen ·Jianxin Wang

Published online: 12 October 2006C© Springer Science + Business Media, LLC 2006

Abstract Multiprocessor job scheduling problem has become increasingly interesting, for

both theoretical study and practical applications. Theoretical study of the problem has made

significant progress recently, which, however, seems not to imply practical algorithms for the

problem, yet. Practical algorithms have been developed only for systems with three proces-

sors and the techniques seem difficult to extend to systems with more than three processors.

This paper offers new observations and introduces new techniques for the multiprocessor

job scheduling problem on systems with four processors. A very simple and practical linear

time approximation algorithm of ratio bounded by 1.5 is developed for the multi-processor

job scheduling problem P4|fix|Cmax, which significantly improves previous results. Our tech-

niques are also useful for multiprocessor job scheduling problems on systems with more than

four processors.

Keywords Multi-processor job scheduling . Approximation algorithm . NP-hard problem

1 Introduction

One of the assumption made in classical scheduling theory is that a job is always executed

by one processor at a time. With the advances in parallel algorithms, this assumption may

no longer be valid for job systems. For example, in semiconductor circuit design workforce

planning, a design project is to be processed by a group of people. The project contains

n jobs, and each job is handled by a specific subgroup of people working simultaneously

on the job. Note that a person may belong to several different subgroups but he can work

for at most one subgroup at a time. Now the question is how we can schedule the jobs so

J. Huang . J. Chen . S. Chen . J. Wang

School of Information Science and Engineering, Central South University, ChangSha, Hunan 410083,

The People’s Republic of China

J. Chen (�)

Department of Computer Science, Texas A&M University, College Station, TX 77843-3112, USA

e-mail: [email protected]

Springer

Page 2: A simple linear time approximation algorithm for multi-processor job scheduling on four processors

34 J Comb Optim (2007) 13:33–45

that the project can be finished as early as possible. Other applications include (i) the berth

allocation problem (Lee and Cai, 1999) where a large vessel may occupy more than one berth

for loading and unloading, (ii) diagnosable microprocessor systems (Krawczyk and Kubale,

1985) where a job must be performed on parallel processors in order to detect faults, (iii)

manufacturing, where a job may need machines, tools, and people simultaneously (this gives

an example for a system in which processors may have different types), and (iv) scheduling

a sequence of meetings where each meeting requires a certain group of people (Dobson and

Karmarkar, 1989). In the scheduling literature (Hall, 1997), this kind of problems are called

multiprocessor job scheduling problems.

Formally, each instance for the multiprocessor job scheduling problem on a k-processor

system (P1, P2, . . . , Pk) consists of a set of jobs J = { j1, j2, . . . , jn}, where each job ji is

given by a pair (Qi , ti ) with Qi a subset of the processors in the k-processor system and ti the

execution time, to specify that the job ji requires the set Qi of processors with running time

ti . Therefore, the job ji is a multiprocessor job that requires the simultaneous execution of

a set of processors instead of a single processor. Of course, no processor can be assigned to

participate in the executions of more than one job at any moment. The goal is to construct a

scheduling for these multiprocessor jobs so that the k-processor system finishes the execution

of all the jobs in the minimum amount of time. Preemption is not allowed in the scheduling.

This multiprocessor job scheduling problem is denoted as Pk |fix|Cmax in the literature (Hall,

1997).

Feasibility and approximability of the Pk |fix|Cmax problem have been studied by many

researchers. The P2|fix|Cmax problem is a generalized version of the classical job scheduling

problem on a 2-processor system (Garey and Johnson, 1979), thus it is NP-hard. Chen and

Lee (1999) developed a fully polynomial time approximation scheme for the P2|fix|Cmax

problem. Hoogeveen et al. (1994) showed that the P3|fix|Cmax problem is NP-hard in the

strong sense thus it does not have a fully polynomial time approximation scheme unless P =NP (see also (Blazewicz et al., 1992,/)). Blazewicz et al. (1992) developed a polynomial

time approximation algorithm of ratio 4/3 for the problem P3|fix|Cmax, which was improved

later by Dell’Olmo et al. (1997), who gave a polynomial time approximation algorithm of

ratio 5/4 for the same problem. Both algorithms are based on the study of a special type

of schedulings called normal schedulings. Goemans (1995) further improved the results by

giving a polynomial time approximation algorithm of ratio 7/6 for the P3|fix|Cmax problem.

More recently, Amoura et al. (2002) developed a polynomial time approximation scheme for

the problem Pk |fix|Cmax for every fixed integer k. Polynomial time approximation schemes

for a more generalized version of the Pk |fix|Cmax problem have also been developed recently

(Chen and Miranda, 2001; Jansen and Porkolab, 2005).

The polynomial time approximation schemes (Amoura et al., 2002; Chen and Miranda,

2001; Jansen and Porkolab, 2005) for multiprocessor job scheduling problems are of great

theoretical significance. However, the current versions of the algorithms seem not practically

useful yet, because of the very high degree of the polynomials (Amoura et al., 2002; Chen

and Miranda, 2001), or the huge constant coefficient (Jansen and Porkolab, 2005) in the time

complexity of the algorithms. Chen and Miranda (2001) have asked for practically efficient

approximation algorithms for the multiprocessor job scheduling problem for systems with

small number of processors. For the P3|fix|Cmax problem, Goemans’ approximation algorithm

(Goemans, 1995) runs in linear time and has approximation ratio 7/6, which seems very

acceptable practically. On the other hand, no similar results have been developed for the

Pk |fix|Cmax problem with k ≥ 4. In particular, the techniques developed for the P3|fix|Cmax

problem, including the analysis of normal scheduling (Blazewicz et al., 1992; Dell’Olmo

et al., 1997) and the method of splitting 1-processor jobs (Goemans, 1995), seem difficult to

Springer

Page 3: A simple linear time approximation algorithm for multi-processor job scheduling on four processors

J Comb Optim (2007) 13:33–45 35

be generalized to systems of more than 3 processors. Currently, the best practical algorithm

for the Pk |fix|Cmax problem is due to Chen and Lee (1999), which runs in linear time and has

approximation ratio k/2 for k ≥ 3.

The current paper is a respondence to the call by Chen and Miranda (2001). We focus

on the P4|fix|Cmax problem. We offer several new observations and introduce several new

techniques. First, we derive two lower bounds for the makespan of an optimal scheduling for

a job set, one is based on 3-processor jobs and a careful pairing of 2-processor jobs, and the

other is based on the total processing time of each processor. The lower bounds are simple

but very effective in the analysis of our approximation algorithms. Secondly, we carefully

pair the 2-processor jobs and make a partial scheduling of 3-processor jobs and 2-processor

jobs. We observe that in these partial schedulings, most processor idle times are within a

small region, thus the 1-processor jobs can be inserted in this region to effectively fill the

gaps. Our derived lower bounds also help us to effectively handle large gaps. Combining all

these, we derive an approximation algorithm of ratio 1.5 for the P4|fix|Cmax problem, which

significantly improves the best previous ratio 2 for practical algorithms for the problem.

The paper is organized as follows. Section 2 derives two lower bounds for the makespan of

optimal schedulings for the P4|fix|Cmax problem. Section 3 discusses partial schedulings for

3-processor jobs and 2-processor jobs. The approximation algorithm and its analysis are given

in Section 4. In Section 5 we show that the ratio 1.5 is tight for our approximation algorithm

for the P4|fix|Cmax problem. We conclude in Section 6 and propose several problems for

future research.

2 Two simple lower bounds

Suppose that the system has four processors P1, P2, P3, and P4. An instance of the P4|fix|Cmax

problem is a set of jobs: J = { j1, j2, . . . , jn}, where each job ji is described by a pair

ji = (Qi , ti ), Qi is a subset of {P1, P2, P3, P4} indicating the processor set required to execute

the job ji , and ti is the parallel processing time of the job ji executed on the processor set

Qi . The processor set Qi is called the processing mode (or simply mode) of the job ji . To

make the job ji meaningful, the mode Qi must be a nonempty set.

A scheduling S of the job set J is an assignment of each job ji in J with a starting time to

be executed on the processor set Qi in the 4-processor system such that no processor is used

for execution of more than one job at any moment. The makespan of the scheduling S is the

latest finish time of a job in J under the scheduling S, denoted by S(J ). An optimal schedulingS of the job set J is a scheduling whose makespan is the minimum over all schedulings of J .

The makespan of an optimal scheduling is denoted by Opt(J ). An approximation algorithmA of the P4|fix|Cmax problem is an algorithm that for each given instance J of the P4|fix|Cmax

problem constructs a scheduling for J . We say that the approximation ratio of the algorithm

A is (bounded by) r if for any instance J , the scheduling S constructed by the algorithm Asatisfies S(J )/Opt(J ) ≤ r .

We consider approximation algorithms for the P4|fix|Cmax problem. Since we can always

schedule jobs of mode {P1, P2, P3, P4} before other jobs without increasing the approxi-

mation ratio of an algorithm for the P4|fix|Cmax problem, we can assume, without loss of

generality, that an instance of P4|fix|Cmax contains no jobs requiring all four processors in

the system. Thus, there are at most 24 − 2 = 14 processing modes for jobs in an instance

of the P4|fix|Cmax problem (note that the mode corresponding to the empty set is also ex-

cluded). A job is a 1-processor job if its mode contains a single processor. Similarly, we

define 2-processor jobs and 3-processor jobs.

Springer

Page 4: A simple linear time approximation algorithm for multi-processor job scheduling on four processors

36 J Comb Optim (2007) 13:33–45

Group the jobs in J , according to their processing modes into 14 subsets of jobs:

J1, J2, J3, J4, J12, J13, J14, J23, J24, J34, J123, J124, J134, J234

here for simplicity, we have used indices instead of processor subsets to indicate the processing

modes. For example, J124 is the subset of jobs whose mode is {P1, P2, P4}. For each index

label I , let tI be the total processing time of the jobs in the subset JI . For example, t124 is the

sum of the processing times of the jobs of mode {P1, P2, P4} in J .

We first derive two lower bounds for the value Opt(J ). Let

T1 = t1 + t12 + t13 + t14 + t123 + t124 + t134

T2 = t2 + t12 + t23 + t24 + t123 + t124 + t234

T3 = t3 + t13 + t23 + t34 + t123 + t134 + t234

T4 = t4 + t14 + t24 + t34 + t124 + t134 + t234

That is, Ti is the total processing time of processor Pi required by all the jobs in the instance

J .

Since even without any idle time, each processor Pi must spend processing time Ti on the

job set J , we derive the first lower bound on Opt(J ):

Opt(J ) ≥ max{T1, T2, T3, T4}

We say that two modes Q1 and Q2 are consistent if a job of mode Q1 and a job of mode Q2

can be executed in parallel (i.e., Q1 ∩ Q2 = ∅). For example, the mode {P1, P2} and mode

{P3, P4} are consistent, and the mode {P1, P3, P4} and mode {P2} are consistent.

Partition the 2-processor job subsets and the 3-processor job subsets in J into seven

groups:

{J123}, {J124}, {J134}, {J234}, {J12, J34}, {J13, J24}, {J14, J23} (1)

where each group consists of either a single subset of 3-processor jobs of the same mode, or

two subsets of 2-processor jobs whose modes are consistent.

Lemma 2.1. Let TB = t123 + t124 + t134 + t234 + T 12,34max + T 13,24

max + T 14,23max , where T 12,34

max =max{t12, t34}, T 13,24

max = max{t13, t24}, and T 14,23max = max{t14, t23}. Then Opt(J ) ≥ TB.

Proof: Pick any two jobs j and j ′ in two different groups in (1). It is easy to see that the

modes of the jobs j and j ′ are not consistent so one of them cannot start until the other

finishes. Thus, for a group of 3-processor jobs, say J123, the system takes at least time t123 to

process the jobs in the group, during which no jobs in other groups in (1) can be processed.

Similarly, for a group of 2-processor jobs, say {J12, J34}, the system takes at least time

T 12,34max = max{t12, t34} to process the jobs in the group, during which no jobs in other groups

can be processed. Therefore, the makespan of any scheduling for the job set J is at least as

large as

TB = t123 + t124 + t134 + t234 + T 12,34max + T 13,24

max + T 14,23max

as claimed by the lemma. �Springer

Page 5: A simple linear time approximation algorithm for multi-processor job scheduling on four processors

J Comb Optim (2007) 13:33–45 37

According to the above discussion, we have the following lower bound for the makespan

of an optimal scheduling for the job set J .

Corollary 2.2. Let B0 = max{TB, T1, T2, T3, T4}. Then Opt(J ) ≥ B0.

3 Partial scheduling of 2-processor jobs and 3-processor jobs

In this section, we consider partial schedulings that assigns only 2-processor jobs and 3-

processor jobs in the job set J to the system. A scheduling of the whole job set J will be

obtained by properly inserting 1-processor jobs in the job set J into a partial scheduling.

Without loss of generality, we assume t12 ≥ t34 (otherwise we exchange the indices 1 and

3, and the indices of 2 and 4). We schedule the 2-processor jobs and 3-processor jobs based

on the job groups given in (1) in terms of their processing modes. There are totally four

possible situations, according to the configurations of the 2-processor jobs in each group. We

list these schedulings by figures, as shown in Fig. 1.

The makespan of each of the schedulings in Fig. 1 is

TB = t123 + t124 + t134 + t234 + T 12,34max + T 13,24

max + T 14,23max

where T 12,34max = max{t12, t34}, T 13,24

max = max{t13, t24}, and T 14,23max = max{t14, t23}. By

Lemma 2.1, Opt(J ) ≥ TB . Note that the ordering of the groups in the scheduling can be

permuted arbitrarily without increasing the makespan. Moreover, for each 2-processor job

group of two consistent modes (e.g., the group {J12, J34}), the smaller (i.e., shorter) job subset

of one mode (e.g., J34) can be scheduled to either have the same starting time, or have the

same finishing time as the larger (i.e., longer) job subset of the other mode (e.g., J12).

Fix a scheduling S of 2-processor jobs and 3-processor jobs in Fig. 1. Under the scheduling,

each group of jobs in (1) may introduce a “gap” for certain processors in the system. For

example, the 3-processor job group {J123} introduces a gap of length t123 for processor P4,

while the 2-processor job group {J12, J34}, when t12 > t34, introduces a gap of length t12 − t34

for processors P3 and P4. We say that the scheduling S has gap structure (g1, g2, g3, g4) if

under the scheduling S, the processor Pi has gi gaps, for 1 ≤ i ≤ 4. The four schedulings

in Fig. 1 can be distinguished by their gap structures: they have gap structures (1, 3, 3, 3),

(2, 2, 4, 2), (2, 2, 2, 4), and (3, 1, 3, 3), respectively. Note that two gaps in a scheduling may

be “merged” so they look like a single continuous gap. In this case, however, we still regard

them as two gaps.

4 An approximation algorithm for P4|fix|Cmax

We are ready to present our main algorithm. Given a job set J , we first group the jobs in

J according to their modes, and combine the 2-processor job groups as given in (1). Then

we construct a partial scheduling S for the 2-processor job groups and the 3-processor job

groups, which takes one of the configurations given in Fig. 1 (precisely which configuration

is taken depends on the relations of the processing times t13 and t24, and of processing times

t14 and t23). The makespan of this partial scheduling S is TB , which is not larger than Opt(J ),

according to Lemma 2.1.

Now consider the 1-processor job subsets J1, J2, J3, and J4 of processing times t1, t2, t3,

and t4, respectively. We discuss how we insert the 1-processor job subsets into the partial

Springer

Page 6: A simple linear time approximation algorithm for multi-processor job scheduling on four processors

38 J Comb Optim (2007) 13:33–45

J123

J124

J124

J134

J134

J234J34

J12

J14

J23

J14

J24

J13

J24

J13

Processor 4

Processor 3

Processor 2

Processor 1

TB

Case 4: t14 ≤ t23 and t13 ≤ t24, gap structure = (3,1,3,3)

J123

J124

J124

J134

J134

J234J34

J12

J14

J23

J14

J24

J13

J24

J13

Processor 4

Processor 3

Processor 2

Processor 1

TB

Case 3: t14 ≤ t23 and t13 ≥ t24, gap structure = (2,2,2,4)

J123

J124

J124

J134

J134

J234J34

J12

J14

J23

J14

J24

J13

J24

J13

Processor 4

Processor 3

Processor 2

Processor 1

TB

Case 2: t14 ≥ t23 and t13 ≤ t24, gap structure = (2,2,4,2)

J123

J124

J124

J134

J134

J234J34

J12

J14

J23

J14

J24

J13

J24

J13

Processor 4

Processor 3

Processor 2

Processor 1

TB

Case 1: t14 ≥ t23 and t13 ≥ t24, gap structure = (1,3,3,3)

Fig. 1 Scheduling for 2-processor jobs and 3-processor jobs (assuming t12 ≥ t34)

scheduling S to make a scheduling for the whole job set J . Recall that under the partial

scheduling S, each processor has a certain number of gaps, which correspond to the idle time

of the processor. There are two different situations.

Recall that B0 = max{TB, T1, T2, T3, T4}, where Ti is the total processing time on proces-

sor Pi required by the job set J . According to Corollary 2.2, B0 ≤ Opt(J ).

Case 1. There is a processor Pi such that one of the gaps of Pi has length ≥ B0/2, or the

sum of the lengths of two gaps of Pi is ≥ B0/2.

Springer

Page 7: A simple linear time approximation algorithm for multi-processor job scheduling on four processors

J Comb Optim (2007) 13:33–45 39

Algorithm. ScheduleP4-I

Input: a partial scheduling S of the 2-processor jobs and 3-processor jobs in J under whichprocessor Pi has a gap of length ≥ B0 /2

Output: a scheduling S0 of the whole job set J

1. Suppose that gap for the processor Pi is caused by the job group G,

remove all job groups after the job group G in the partial scheduling S;

2. add the 1-processor job subsets J1, J2, J3, and J4 to the scheduling (let them start at their

earliest possible time after the starting time of the job group G);

3. add back the job groups removed in step 1 (in exactly the same way as they were in the

partial scheduling S).

Fig. 2 Scheduling S0 when processor Pi has a large gap

Suppose that the processor Pi has a gap of length ≥ B0/2. Let this gap be caused by a job

group G in (1). We apply the algorithm given in Fig. 2.

The algorithm SchedulP4-I produces a scheduling S0 of the whole job set J . We show

that in this subcase, the makespan of the constructed scheduling S0 is bounded by 1.5B0.

Since the gap of the processor Pi has length at least B0/2 and since the gap is caused

by the job group G, the length of the job group G is ≥ B0/2. The makespan of the partial

scheduling S is TB , as shown in Fig. 1. Suppose after removing all job groups after the job

group G in S, the remaining scheduling has makespan T ′, then the part removed from the

scheduling S has length exactly TB − T ′ since no two jobs from two different job groups can

be processed in parallel. Now adding the 1-processor job subsets J1, J2, J3, and J4 results

in a new scheduling S′′. We claim that the makespan T ′′ of the scheduling S′′ is bounded by

T ′ + B0/2. In fact, suppose T ′′ > T ′ + B0/2, then the latest finished job in the scheduling

S′′ must be a 1-processor job subset Jj . Suppose that the job subset Jj has mode {Pj }. Then

under the scheduling S′′, the processor Pj keeps busy since the job group G starts until the

job subset Jj finishes (this is because by the algorithm, the job subset Jj starts at its earliest

possible time after the starting time of the job group G). The processor Pj spends at least

B0/2 time units from the beginning to the end of the job group G since the length of the job

group G is ≥ B0/2. After this, the processor Pj needs to spend more than another B0/2 time

units to finish the job subset Jj since T ′′ > T ′ + B0/2. Therefore, the total processing time

of the processor Pj is larger than B0. This contradicts the definition of B0, in which we have

let B0 ≥ Tj (recall that Tj is the total processing time of the processor Pj ).

Thus, after adding the 1-processor job subsets, the resulting scheduling S′′ has makespan

T ′′ bounded by T ′ + B0/2. Finally, when we add back the job groups removed from the partial

scheduling S in step 1, we get a scheduling S0 of the whole job set J , whose makespan is

bounded by T ′′ + (TB − T ′), which is bounded by TB + B0/2 ≤ 1.5B0 (here we have used

the fact TB ≤ B0).

The subcase in which the processor Pi has two gaps such that the sum of the lengths of

the two gaps is larger than or equal to B0/2 can be handled in a similar manner. Suppose

the gaps are due to two job groups G1 and G2. We first permute the job groups in the partial

scheduling S so that the two job groups G1 and G2 are adjacent under the scheduling and

the two gaps of the processor Pi are “merged” into a larger continuous gap. Then we apply

an algorithm similar to ScheduleP4-I to insert the 1-processor job subsets between the job

groups G1 and G2. The detailed algorithm is given in Fig. 3.

The proof that the makespan of the scheduling S0 constructed by the algorithm

ScheduleP4-II is bounded by 1.5B0 is very similar to the proof for the case when the

Springer

Page 8: A simple linear time approximation algorithm for multi-processor job scheduling on four processors

40 J Comb Optim (2007) 13:33–45

Algorithm. ScheduleP4-II

Input: a partial scheduling S of the 2-processor jobs and 3-processor jobs in J under whichprocessor Pi has two gaps such that the sum of the lengths of the gaps is ≥ B0 /2

Output: a scheduling S0 of the whole job set J

1. Suppose that the two gaps of the processor Pi are caused by the job groups G1 and G2;

2. permute the job groups in S so that the job groups G1 and G2 become adjacent and the two

gaps of Pi are merged into a larger continuous gap (let G2 start right after G1 finishes);

3. remove all job groups after the job group G2 in the partial scheduling;

4. remove the job group G2 from the scheduling;

5. add the 1-processor job subsets J1, J2, J3, and J4 to the scheduling (let them start at their

earliest possible time after the starting time of the job group G1);

6. add back the job group G2;

7. add back the job groups removed in step 3.

Fig. 3 Scheduling S0 when two gaps of processor Pi make a large gap

processor Pi has a single gap of length larger than or equal to B0/2. For completeness, we

give a quick outline as follows.

First of all, the sum of the lengths of the two job groups G1 and G2 is larger than or

equal to B0/2. Suppose that the makespan of the scheduling S′ obtained in step 3 is T ′ (thus,

the part removed in step 3 has length TB − T ′), then the scheduling S′′ obtained in step 6

has makespan bounded by T ′ + B0/2 (otherwise a processor Pj would have processing time

larger than B0). This derives that the scheduling obtained in step 7 has a makespan bounded

by TB + B0/2 ≤ 1.5B0.

Therefore, in case 1, we can always construct a scheduling S0 for the job set J whose

makespan is bounded by 1.5B0.

Case 2. Under the partial scheduling S, no processor has either a gap of length larger than

or equal to B0/2 or two gaps such that the sum of lengths of the two gaps is larger than or

equal to B0/2.

We first explain our basic idea in dealing with this case. Look at Fig. 1. We observe

that in each scheduling, all processors finish at the same time. Therefore, in order to keep

the makespan small, we only need to reduce the amount of idle time, i.e., the gaps, for the

processors. By the assumption of the case, the gaps for each processor are small. Therefore,

we try to “fill” as many gaps as we can so that at least one processor has at most two gaps.

Since the sum of the lengths of any two gaps of any processor is less than B0/2, the processor

with at most two gaps left shows that before the makespan of the scheduling, the processor

has less than B0/2 idle time units. From this, we can easily derive that the makespan of the

scheduling is bounded by 1.5B0.

Look at each of the situations given in Fig. 1. These situations give gap structures

(1, 3, 3, 3), (2, 2, 4, 2), (2, 2, 2, 4), and (3, 1, 3, 3), respectively. There is a very nice property

between the job groups {J14, J23} and {J13, J24}: in the (1, 3, 3, 3) and (3, 1, 3, 3) structures,

each of the processor with 3 gaps has at least one gap caused by these two job groups, and

the processor with 1 gap has its gap not in these two job groups, while in the (2, 2, 4, 2) and

(2, 2, 2, 4) structures, the processor with 4 gaps has two gaps caused by these two job groups

and the two gaps are merged into a larger continuous gap. This nice property allows us to

insert the 1-processor jobs between these two job groups without increasing much makespan.

Springer

Page 9: A simple linear time approximation algorithm for multi-processor job scheduling on four processors

J Comb Optim (2007) 13:33–45 41

Algorithm. ScheduleP4-III

Input: a partial scheduling S of the 2-processor jobs and 3-processor jobs in J under whichno processor has two gaps such that the sum of the lengths of the two gaps is ≥ B0 /2

Output: a scheduling S0 of the whole job set J

1. remove the job group {J13, J24} from the partial scheduling in Figure 1;

2. add the 1-processor job subsets J1, J2, J3, and J4 to the scheduling (let them start at their

earliest possible time after the starting time of the job group {J14 ,J23});

3. add back the job group {J13, J24} (in exactly the same way as they were in the partial

scheduling S).

Fig. 4 Scheduling S0 when gaps of all processors are small

The algorithm for this case is given in Fig. 4.

Let the scheduling for the whole job set J constructed by the algorithm ScheduleP4-IIIbe S0. If the makespan of S0 is equal to the makespan TB of the partial scheduling S, then by

Lemma 2.1, S0 is an optimal scheduling for the job set J .

On the other hand, if the makespan of S0 is larger than TB , then at least one of the gaps

between the job groups {J14, J23} and {J13, J24} is filled by the 1-processor job subsets.

If the filled gap belongs to a processor Pi that has at most 3 gaps in the partial scheduling

S, then after filling one of its gaps, the processor Pi has at most 2 gaps left. By the assumption

in this case, the sum of the lengths of the gaps left in the processor Pi is less than B0/2. That

is, the total idle time of processor Pi before the makespan is less than B0/2 (note that by

the algorithm ScheduleP4-III, all processors finish at the same time, so the makespan of the

scheduling S0 is equal to the finish time of the processor Pi , see Fig. 6(b) for an illustration).

Since the total processing time Ti of the processor Pi is bounded by B0, we conclude that the

makespan of the scheduling S0 is less than Ti + 0.5B0 ≤ 1.5B0.

If the filled gap belongs to the processor Pi that has 4 gaps in the partial scheduling S,

then the filled gap must be the larger gap obtained by merging two gaps in the job groups

{J14, J23} and {J13, J24}. Thus, with this gap filled, the processor Pi has two gaps left such

that the sum of the lengths of these two gaps is less than B0/2. This again derives that the

idle time of the processor Pi before the makespan of S0 is less than B0/2, and the makespan

of the scheduling S0 is less than 1.5B0.

We summarize all these discussions in the main algorithm given in Fig. 5, and conclude

with our main theorem.

Theorem 4.1. The algorithm Schedule-P4 runs in linear time and constructs a schedulingS0 for the input job set J with makespan bounded by 1.5Opt(J ).

Proof: According to the above discussion, the makespan of the scheduling S0 is bounded

by 1.5B0, which is at most 1.5Opt(J ) according to Corollary 2.2. �

5 Tightness of ratio 1.5 for the algorithm Schedule-P4

Theorem 4.1 concludes that the approximation ratio of the algorithm Schedule-P4 is bounded

by 1.5. It is natural to ask whether this ratio 1.5 is tight for the algorithm in the sense that there

is an instance J of the problem P4|fix|Cmax on which the algorithm Schedule-P4 constructs

a scheduling whose makespan is arbitrarily close to 1.5Opt(J ). Very recently, Arambula

Springer

Page 10: A simple linear time approximation algorithm for multi-processor job scheduling on four processors

42 J Comb Optim (2007) 13:33–45

Algorithm. Schedule-P4

Input: an instance J for the P4 |fix|Cmax problem

Output: a scheduling S0 for J

1. group the jobs in J by their processing modes, and combine the 2-processor job

groups as in (1);

2. construct the partial scheduling S for 2-processor jobs and 3-processor jobs, as shown

in Figure 1;

3. if a processor Pi has a gap of length ≥ B0 /2 under the partial scheduling S

then call subroutine ScheduleP4-I to construct a scheduling S0 for J ; stop;

4. if a processor Pi has two gaps such that the sum of the lengths of the two gaps is ≥ B0 /2under the partial scheduling S

then call subroutine ScheduleP4-II to construct a scheduling S0 for J ; stop;

5. call subroutine ScheduleP4-III to construct a scheduling S0 for J .

Fig. 5 Approximation algorithm for the P4|fix|Cmax problem

(2005) has given an affirmed answer to this question. For the completeness of the discussion,

we present this result in this section.

Theorem 5.1 (I. Arambula). The ratio 1.5 is tight for the algorithm Schedule-P4.

Proof: Recall TB = t123 + t124 + t134 + t234 + T 12,34max + T 13,24

max + T 14,23max , where T 12,34

max =max{t12, t34}, T 13,24

max = max{t13, t24}, and T 14,23max = max{t14, t23}. Fix a number TB > 0 and

let ε > 0 be a small positive real number. Consider an instance of P4|fix|Cmax so that the

following conditions hold (see Fig. 6(a) for a concrete illustration):

(1) |J234| = 0.5TB − ε,

(2) |J12| ≥ |J34|, |J23| = |J14| + ε/2, |J24| = |J13| + ε/2,

(3) |J1| = 0.5TB, |J2| ≤ |J134|, |J3| ≤ |J124|, |J4| ≤ |J123|

(a simplest way to achieve such an instance is to have a single job for each mode).

Recall that for each i , Ti is the total processing time for processor Pi . The above conditions

imply

T1 = TB,

and

|J123| + |J124| + |J134| + |J12| + |J14| + |J13| = 0.5TB .

Also it is easy to see that

T2, T3, T4 ≤ TB .

Therefore, we have

B0 = max{TB, T1, T2, T3, T4} = TB .

Springer

Page 11: A simple linear time approximation algorithm for multi-processor job scheduling on four processors

J Comb Optim (2007) 13:33–45 43

J123

J124

J124

J134

J134

J234J34

J12

J14

J23

J14

J24

J13

J24

J13

J4

J3

J2

J1

P4

P3

P2

P1

TB

(a) An illustration of the instance

0.5TB − 0.5TB2 2

1-processor jobs to

be scheduled

J123

J124

J124

J134

J134

J234J34

J12

J14

J23

J14

J24

J13

J24

J13

J4

J3

J2

J1

P4

P3

P2

P1

(b) The scheduling by the algorithm Schedule-P4

1.5TB −

J123

J124

J124

J134

J134 J34

J12

J14

J23

J14

J4

J3

J2

J1

J234

J24

J13

J24

J13

P4

P3

P2

P1

TB

(c) An optimal scheduling

Fig. 6 The ratio 1.5 is tight for the algorithm Schedule-P4

Since no processor has two gaps whose sum is larger than or equal to B0/2 (in particular,

the sizes of the three gaps for processor P1 are 0.5TB − ε, ε/2, and ε/2, respectively), the

scheduling shown in Fig. 6(b) will be constructed by the algorithm Schedule-P4. It is not

difficult to verify that the makespan of the scheduling in Fig. 6(b) is equal to 1.5TB − ε. On the

other hand, if we permute the job groups so that the three gaps in processor P1 are “merged”

into a single larger time slot, then job group J1 can be inserted into this time slot, achieving

a scheduling for the job set J with makespan TB , see Fig. 6(c) for the construction of this

scheduling. By Corollary 2.2, Opt(J ) ≥ B0 = TB . Therefore, the scheduling in Fig. 6(c)

is an optimal scheduling for the job set J . If we let Appx(J ) be the makespan of the

scheduling constructed by the algorithm Schedule-P4, then we have the approximation ratio

(note B0 = TB):

Appx(J )

Opt(J )= 1.5B0 − ε

B0

= 1.5 − ε

B0

= 1.5 − ε

TB(2)

By properly choosing the values TB and ε, this ratio can be arbitrarily close to 1.5 from

below. In conclusion, the approximation ratio 1.5 is tight for the algorithm Schedule-P4.

6 Final remarks and future work

Multiprocessor job scheduling problem has become increasingly interesting, both in practi-

cal applications and in theoretical studies. Much progress has been made in recent years. In

Springer

Page 12: A simple linear time approximation algorithm for multi-processor job scheduling on four processors

44 J Comb Optim (2007) 13:33–45

particular, polynomial time approximation schemes for various multiprocessor job schedul-

ing problems have been derived (Amoura et al., 2002; Chen and Miranda, 2001; Jansen and

Porkolab, 2005), which are of great significance from a theoretical point of view. On the other

hand, practical algorithms for the problem have not been as successful. Practical algorithms

with good approximation ratio are only developed for systems with at most three proces-

sors (Dell’Olmo et al., 1997; Goemans, 1995). The current paper gives new observations,

introduces new techniques, and extends the research in this direction to systems with four

processors. Our techniques are also applicable to systems with more than four processors.

For example, for the P5|fix|Cmax problem, we can have a linear time algorithm with approx-

imation ratio bounded by 2, improving the best previous ratio 2.5 for the problem (Chen

and Lee, 1999). Further and more subtle applications of our techniques are currently under

investigation.

We propose a few open problems for future work in this research area.

The technique of grouping jobs according to their processing modes then scheduling the

jobs based on job groups is known as normal scheduling (Dell’Olmo et al., 1997). For the

problem P3|fix|Cmax, Dell’Olmo et al. (1997) show that the best normal scheduling of a job

set J has makespan bounded by 1.25Opt(J ), and proved that this bound is tight. According to

this definition, Theorem 4.1 claims that for any instance J for the problem P4|fix|Cmax, there

is a normal scheduling whose makespan is bounded by 1.5Opt(J ). Is 1.5 a tight bound for

normal schedulings for the problem P4|fix|Cmax, or the bound 1.5 can be further improved?

For each fixed k, there is a (practical) approximation algorithm for the Pk |fix|Cmax problem

whose approximation ratio is bounded by a constant (Chen and Lee, 1999). However, this

constant depends on the number k of processors in the system. Can we develop practical

algorithms of approximation ratio bounded by a constant c for the Pk |fix|Cmax problem for all

k such that the constant c is independent of k? Note that for the classical scheduling problems,

such algorithms exist (for example, Graham’s listing scheduling algorithm (Graham, 1966))

with very satisfying approximation ratio (Hall, 1997).

A generalized version of the P4|fix|Cmax problem is the P4|set |Cmax problem, in which

each job may have several alternative processing modes (Chen and Lee, 1999). Based on

normal schedulings of approximation ratio 2 for the P4|fix|Cmax problem, Chen and Lee

(1999) have developed an approximation algorithm of ratio 2 + ε for any ε > 0 for the

P4|set |Cmax problem. Is it possible to extend the results in the current paper to an improved

algorithm for the P4|set |Cmax problem?

Acknowledgments This research is supported in part by the National Natural Science Foundation of China

under Project No. 60433020, and by US National Science Foundation under Grants CCR-0311590 and CCF-

0430683.

References

Amoura AK, Bampis E, Kenyon C, Manoussakis Y (2002) Scheduling independent multiprocessor tasks.

Algorithmica 32:247–261

Arambula I (2005) Personal communication

Bianco L, Blazewicz J, Dell’Olmo P, Drozdowski M (1995) Scheduling multiprocessor tasks on a dynamic

configuration of dedicated processors. Ann Oper Res 58:493–517

Blazewicz J, Dell’Olmo P, Drozdowski M, Speranza M (1992) Scheduling multiprocessor tasks on the three

dedicated processors. Inf Proc Lett 41:275–280

Blazewicz J, Dell’Olmo P, Drozdowski M, Speranza M (1992/1994) Corrigendum to Scheduling multi-

processor tasks on the three dedicated processors. Inf Proc Lett 41:275–280. Inf Proc Lett 49:269–

270

Springer

Page 13: A simple linear time approximation algorithm for multi-processor job scheduling on four processors

J Comb Optim (2007) 13:33–45 45

Blazewicz J, Drozdowski M, Weglarz J (1986) Scheduling multiprocessor tasks to minimize scheduling length.

IEEE Trans Comput 35:389–393

Blazewicz J, Drozdowski W, Weglarz J (1994) Scheduling multiprocessor tasks—a survey. Int J Microcomput

Appl 13:89–97

Chen J, Lee C-Y (1999) General multiprocessor tasks scheduling. Naval Res Log 46:59–74

Chen J, Miranda A (2001) A polynomial time approximation scheme for general multiprocessor job scheduling.

SIAM J Comput 31:1–17

Dell’Olmo P, Speranza MG, Tuza Zs (1997) Efficiency and effectiveness of normal schedules on three dedicated

processors. Discrete Math 164:67–79

Dobson G, Karmarkar U (1989) Simultaneous resource scheduling to minimize weighted flow times. Oper

Res 37:592–600

Garey MR, Johnson DS (1979) Computers and intractability: a guide to the theory of NP-completeness.

Freeman, San Francisco

Goemans MX (1995) An approximation algorithm for scheduling on three dedicated machines. Discrete Appl

Math 61:49–59

Graham RL (1966) Bounds for certain multiprocessing anomalies. Bell Sys Tech J 45:1563–1581

Hall LA (1997) Approximation algorithms for scheduling. In: Hochbaum DS, (ed.) Approximation algorithms

for NP-hard problems. PWS Publishing Company, pp 1–45

Hoogeveen JA, van de Velde SL, Veltman B (1994) Complexity of scheduling multiprocessor tasks with

prespecified processor allocations. Discrete Appl Math 55:259–272

Jansen K, Porkolab L (2005) General multiprocessor task scheduling: approximate solutions in linear time.

SIAM J Comput 35:519–530

Krawczyk H, Kubale M (1985) An approximation algorithm for diagnostic test scheduling in multicomputer

systems. IEEE Trans Comput 34:869–872

Lee C-Y, Cai X (1999) Scheduling one and two-processor tasks on two parallel processors. IIE Trans 31:445–

455

Lee C-Y, Lei L, Pinedo M (1997) Current trends in deterministic scheduling. Ann Ope Res 70:1–42

Springer