J Comb Optim (2007) 13:33–45
DOI 10.1007/s10878-006-9011-y
A simple linear time approximation algorithm for multi-processor job scheduling on four processors
Jingui Huang · Jianer Chen · Songqiao Chen · Jianxin Wang
Published online: 12 October 2006
© Springer Science + Business Media, LLC 2006
Abstract The multiprocessor job scheduling problem has become increasingly interesting, for
both theoretical study and practical applications. Theoretical study of the problem has made
significant progress recently, which, however, does not yet seem to yield practical algorithms
for the problem. Practical algorithms have been developed only for systems with three
processors, and the techniques seem difficult to extend to systems with more than three processors.
This paper offers new observations and introduces new techniques for the multiprocessor
job scheduling problem on systems with four processors. A very simple and practical linear
time approximation algorithm of ratio bounded by 1.5 is developed for the multi-processor
job scheduling problem P4|fix|Cmax, which significantly improves previous results. Our
techniques are also useful for multiprocessor job scheduling problems on systems with more than
four processors.
Keywords Multi-processor job scheduling . Approximation algorithm . NP-hard problem
1 Introduction
One of the assumptions made in classical scheduling theory is that a job is always executed
by one processor at a time. With the advances in parallel algorithms, this assumption may
no longer be valid for job systems. For example, in semiconductor circuit design workforce
planning, a design project is to be processed by a group of people. The project contains
n jobs, and each job is handled by a specific subgroup of people working simultaneously
on the job. Note that a person may belong to several different subgroups but he can work
for at most one subgroup at a time. Now the question is how we can schedule the jobs so
J. Huang . J. Chen . S. Chen . J. Wang
School of Information Science and Engineering, Central South University, ChangSha, Hunan 410083,
The People’s Republic of China
J. Chen (✉)
Department of Computer Science, Texas A&M University, College Station, TX 77843-3112, USA
e-mail: [email protected]
that the project can be finished as early as possible. Other applications include (i) the berth
allocation problem (Lee and Cai, 1999) where a large vessel may occupy more than one berth
for loading and unloading, (ii) diagnosable microprocessor systems (Krawczyk and Kubale,
1985) where a job must be performed on parallel processors in order to detect faults, (iii)
manufacturing, where a job may need machines, tools, and people simultaneously (this gives
an example for a system in which processors may have different types), and (iv) scheduling
a sequence of meetings where each meeting requires a certain group of people (Dobson and
Karmarkar, 1989). In the scheduling literature (Hall, 1997), problems of this kind are called
multiprocessor job scheduling problems.
Formally, each instance for the multiprocessor job scheduling problem on a k-processor
system (P1, P2, . . . , Pk) consists of a set of jobs J = { j1, j2, . . . , jn}, where each job ji is
given by a pair (Qi , ti ) with Qi a subset of the processors in the k-processor system and ti the
execution time, to specify that the job ji requires the set Qi of processors with running time
ti . Therefore, the job ji is a multiprocessor job that requires the simultaneous execution of
a set of processors instead of a single processor. Of course, no processor can be assigned to
participate in the executions of more than one job at any moment. The goal is to construct a
scheduling for these multiprocessor jobs so that the k-processor system finishes the execution
of all the jobs in the minimum amount of time. Preemption is not allowed in the scheduling.
This multiprocessor job scheduling problem is denoted as Pk |fix|Cmax in the literature (Hall,
1997).
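As a minimal sketch of this definition (not taken from the paper), an instance can be represented in Python as a list of (Qi, ti) pairs and a candidate scheduling as a list of start times. The names `is_feasible` and `makespan` and the sample data are illustrative assumptions.

```python
from typing import FrozenSet, List, Tuple

# A job j_i = (Q_i, t_i): the required processor indices and the running time.
Job = Tuple[FrozenSet[int], float]


def is_feasible(jobs: List[Job], starts: List[float]) -> bool:
    """Check that no processor executes two jobs at the same moment."""
    for a in range(len(jobs)):
        for b in range(a + 1, len(jobs)):
            (qa, ta), (qb, tb) = jobs[a], jobs[b]
            overlap = starts[a] < starts[b] + tb and starts[b] < starts[a] + ta
            if overlap and qa & qb:
                return False
    return True


def makespan(jobs: List[Job], starts: List[float]) -> float:
    """Latest finish time over all jobs (no preemption, so finish = start + t)."""
    return max(s + t for (_, t), s in zip(jobs, starts))


# Hypothetical 4-processor instance with three jobs.
jobs = [(frozenset({1, 2}), 3.0), (frozenset({3, 4}), 2.0), (frozenset({2, 3}), 1.0)]
starts = [0.0, 0.0, 3.0]
print(is_feasible(jobs, starts), makespan(jobs, starts))  # True 4.0
```

Starting the third job at time 2 instead would make processor P2 execute two jobs at once, so `is_feasible` would reject that assignment.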
Feasibility and approximability of the Pk |fix|Cmax problem have been studied by many
researchers. The P2|fix|Cmax problem is a generalized version of the classical job scheduling
problem on a 2-processor system (Garey and Johnson, 1979), thus it is NP-hard. Chen and
Lee (1999) developed a fully polynomial time approximation scheme for the P2|fix|Cmax
problem. Hoogeveen et al. (1994) showed that the P3|fix|Cmax problem is NP-hard in the
strong sense; thus it does not have a fully polynomial time approximation scheme unless
P = NP (see also Blazewicz et al., 1992). Blazewicz et al. (1992) developed a polynomial
time approximation algorithm of ratio 4/3 for the problem P3|fix|Cmax, which was improved
later by Dell’Olmo et al. (1997), who gave a polynomial time approximation algorithm of
ratio 5/4 for the same problem. Both algorithms are based on the study of a special type
of schedulings called normal schedulings. Goemans (1995) further improved the results by
giving a polynomial time approximation algorithm of ratio 7/6 for the P3|fix|Cmax problem.
More recently, Amoura et al. (2002) developed a polynomial time approximation scheme for
the problem Pk |fix|Cmax for every fixed integer k. Polynomial time approximation schemes
for a more generalized version of the Pk |fix|Cmax problem have also been developed recently
(Chen and Miranda, 2001; Jansen and Porkolab, 2005).
The polynomial time approximation schemes (Amoura et al., 2002; Chen and Miranda,
2001; Jansen and Porkolab, 2005) for multiprocessor job scheduling problems are of great
theoretical significance. However, the current versions of the algorithms do not yet seem
practically useful, because of the very high degree of the polynomials (Amoura et al., 2002; Chen
and Miranda, 2001), or the huge constant coefficient (Jansen and Porkolab, 2005) in the time
complexity of the algorithms. Chen and Miranda (2001) have asked for practically efficient
approximation algorithms for the multiprocessor job scheduling problem for systems with
a small number of processors. For the P3|fix|Cmax problem, Goemans’ approximation algorithm
(Goemans, 1995) runs in linear time and has approximation ratio 7/6, which seems very
acceptable practically. On the other hand, no similar results have been developed for the
Pk |fix|Cmax problem with k ≥ 4. In particular, the techniques developed for the P3|fix|Cmax
problem, including the analysis of normal scheduling (Blazewicz et al., 1992; Dell’Olmo
et al., 1997) and the method of splitting 1-processor jobs (Goemans, 1995), seem difficult to
be generalized to systems of more than 3 processors. Currently, the best practical algorithm
for the Pk |fix|Cmax problem is due to Chen and Lee (1999), which runs in linear time and has
approximation ratio k/2 for k ≥ 3.
The current paper responds to the call by Chen and Miranda (2001). We focus
on the P4|fix|Cmax problem. We offer several new observations and introduce several new
techniques. First, we derive two lower bounds for the makespan of an optimal scheduling for
a job set, one is based on 3-processor jobs and a careful pairing of 2-processor jobs, and the
other is based on the total processing time of each processor. The lower bounds are simple
but very effective in the analysis of our approximation algorithms. Secondly, we carefully
pair the 2-processor jobs and make a partial scheduling of 3-processor jobs and 2-processor
jobs. We observe that in these partial schedulings, most processor idle times are within a
small region, thus the 1-processor jobs can be inserted in this region to effectively fill the
gaps. Our derived lower bounds also help us to effectively handle large gaps. Combining all
these, we derive an approximation algorithm of ratio 1.5 for the P4|fix|Cmax problem, which
significantly improves the best previous ratio 2 for practical algorithms for the problem.
The paper is organized as follows. Section 2 derives two lower bounds for the makespan of
optimal schedulings for the P4|fix|Cmax problem. Section 3 discusses partial schedulings for
3-processor jobs and 2-processor jobs. The approximation algorithm and its analysis are given
in Section 4. In Section 5 we show that the ratio 1.5 is tight for our approximation algorithm
for the P4|fix|Cmax problem. We conclude in Section 6 and propose several problems for
future research.
2 Two simple lower bounds
Suppose that the system has four processors P1, P2, P3, and P4. An instance of the P4|fix|Cmax
problem is a set of jobs: J = { j1, j2, . . . , jn}, where each job ji is described by a pair
ji = (Qi , ti ), Qi is a subset of {P1, P2, P3, P4} indicating the processor set required to execute
the job ji , and ti is the parallel processing time of the job ji executed on the processor set
Qi . The processor set Qi is called the processing mode (or simply mode) of the job ji . To
make the job ji meaningful, the mode Qi must be a nonempty set.
A scheduling S of the job set J is an assignment of each job ji in J with a starting time to
be executed on the processor set Qi in the 4-processor system such that no processor is used
for execution of more than one job at any moment. The makespan of the scheduling S is the
latest finish time of a job in J under the scheduling S, denoted by S(J ). An optimal scheduling
S of the job set J is a scheduling whose makespan is the minimum over all schedulings of J .
The makespan of an optimal scheduling is denoted by Opt(J ). An approximation algorithm
A of the P4|fix|Cmax problem is an algorithm that for each given instance J of the P4|fix|Cmax
problem constructs a scheduling for J . We say that the approximation ratio of the algorithm
A is (bounded by) r if for any instance J , the scheduling S constructed by the algorithm A
satisfies S(J )/Opt(J ) ≤ r .
We consider approximation algorithms for the P4|fix|Cmax problem. Since we can always
schedule jobs of mode {P1, P2, P3, P4} before other jobs without increasing the approxi-
mation ratio of an algorithm for the P4|fix|Cmax problem, we can assume, without loss of
generality, that an instance of P4|fix|Cmax contains no jobs requiring all four processors in
the system. Thus, there are at most 2^4 − 2 = 14 processing modes for jobs in an instance
of the P4|fix|Cmax problem (note that the mode corresponding to the empty set is also
excluded). A job is a 1-processor job if its mode contains a single processor. Similarly, we
define 2-processor jobs and 3-processor jobs.
Group the jobs in J , according to their processing modes into 14 subsets of jobs:
J1, J2, J3, J4, J12, J13, J14, J23, J24, J34, J123, J124, J134, J234
here for simplicity, we have used indices instead of processor subsets to indicate the processing
modes. For example, J124 is the subset of jobs whose mode is {P1, P2, P4}. For each index
label I , let tI be the total processing time of the jobs in the subset JI . For example, t124 is the
sum of the processing times of the jobs of mode {P1, P2, P4} in J .
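The grouping step can be sketched in Python as follows. This is an illustrative sketch rather than the authors' code: the function name `mode_totals` and the use of label strings such as "124" as dictionary keys are assumptions made here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def mode_totals(jobs: Iterable[Tuple[Set[int], float]]) -> Dict[str, float]:
    """Map each mode label (e.g. "124") to the total processing time t_I."""
    totals: Dict[str, float] = defaultdict(float)
    for mode, t in jobs:
        label = "".join(str(p) for p in sorted(mode))
        totals[label] += t
    return dict(totals)


# Hypothetical jobs: two of mode {P1, P2, P4}, one of mode {P3}, one of mode {P1, P2}.
jobs = [({1, 2, 4}, 2.0), ({4, 2, 1}, 1.5), ({3}, 1.0), ({1, 2}, 0.5)]
t = mode_totals(jobs)
print(t["124"])  # 3.5
```

A single pass over the jobs suffices, which is consistent with the linear running time claimed for the overall algorithm.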
We first derive two lower bounds for the value Opt(J ). Let
T1 = t1 + t12 + t13 + t14 + t123 + t124 + t134
T2 = t2 + t12 + t23 + t24 + t123 + t124 + t234
T3 = t3 + t13 + t23 + t34 + t123 + t134 + t234
T4 = t4 + t14 + t24 + t34 + t124 + t134 + t234
That is, Ti is the total processing time of processor Pi required by all the jobs in the instance
J .
Since even without any idle time, each processor Pi must spend processing time Ti on the
job set J , we derive the first lower bound on Opt(J ):
Opt(J ) ≥ max{T1, T2, T3, T4}
We say that two modes Q1 and Q2 are consistent if a job of mode Q1 and a job of mode Q2
can be executed in parallel (i.e., Q1 ∩ Q2 = ∅). For example, the mode {P1, P2} and mode
{P3, P4} are consistent, and the mode {P1, P3, P4} and mode {P2} are consistent.
Partition the 2-processor job subsets and the 3-processor job subsets in J into seven
groups:
{J123}, {J124}, {J134}, {J234}, {J12, J34}, {J13, J24}, {J14, J23} (1)
where each group consists of either a single subset of 3-processor jobs of the same mode, or
two subsets of 2-processor jobs whose modes are consistent.
Lemma 2.1. Let $T_B = t_{123} + t_{124} + t_{134} + t_{234} + T^{12,34}_{\max} + T^{13,24}_{\max} + T^{14,23}_{\max}$, where
$T^{12,34}_{\max} = \max\{t_{12}, t_{34}\}$, $T^{13,24}_{\max} = \max\{t_{13}, t_{24}\}$, and $T^{14,23}_{\max} = \max\{t_{14}, t_{23}\}$. Then Opt(J ) ≥ TB.
Proof: Pick any two jobs j and j ′ in two different groups in (1). It is easy to see that the
modes of the jobs j and j ′ are not consistent so one of them cannot start until the other
finishes. Thus, for a group of 3-processor jobs, say J123, the system takes at least time t123 to
process the jobs in the group, during which no jobs in other groups in (1) can be processed.
Similarly, for a group of 2-processor jobs, say {J12, J34}, the system takes at least time
$T^{12,34}_{\max} = \max\{t_{12}, t_{34}\}$ to process the jobs in the group, during which no jobs in other groups
can be processed. Therefore, the makespan of any scheduling for the job set J is at least as
large as
$T_B = t_{123} + t_{124} + t_{134} + t_{234} + T^{12,34}_{\max} + T^{13,24}_{\max} + T^{14,23}_{\max}$
as claimed by the lemma. □
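The claim in the proof that modes from two different groups in (1) are never consistent can be checked by brute force. The following Python sketch is an illustration added here; the names `GROUPS`, `consistent`, and `cross_group_conflicts_hold` are assumptions.

```python
from itertools import combinations

# The seven groups in (1), written as lists of processor-index sets.
GROUPS = [
    [{1, 2, 3}], [{1, 2, 4}], [{1, 3, 4}], [{2, 3, 4}],
    [{1, 2}, {3, 4}], [{1, 3}, {2, 4}], [{1, 4}, {2, 3}],
]


def consistent(q1: set, q2: set) -> bool:
    """Two modes are consistent if they share no processor."""
    return not (q1 & q2)


def cross_group_conflicts_hold(groups) -> bool:
    """True if every pair of modes taken from two different groups is inconsistent."""
    for g1, g2 in combinations(groups, 2):
        for q1 in g1:
            for q2 in g2:
                if consistent(q1, q2):
                    return False
    return True


print(cross_group_conflicts_hold(GROUPS))  # True
```

Within a 2-processor group the two modes are consistent by construction, which is why the group contributes only the maximum of its two totals to TB.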
According to the above discussion, we have the following lower bound for the makespan
of an optimal scheduling for the job set J .
Corollary 2.2. Let B0 = max{TB, T1, T2, T3, T4}. Then Opt(J ) ≥ B0.
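Both lower bounds can be computed directly from the mode totals. The sketch below assumes the totals are stored in a dictionary keyed by mode labels such as "124"; the function name `lower_bounds` and the sample values are illustrative assumptions, not part of the paper.

```python
from typing import Dict, Tuple


def lower_bounds(t: Dict[str, float]) -> Tuple[float, Dict[int, float], float]:
    """Return (T_B, {i: T_i}, B_0) from mode totals keyed by labels such as "124"."""
    def g(label: str) -> float:
        return t.get(label, 0.0)  # a mode with no jobs contributes 0

    three = g("123") + g("124") + g("134") + g("234")
    pairs = max(g("12"), g("34")) + max(g("13"), g("24")) + max(g("14"), g("23"))
    tb = three + pairs
    # T_i sums the totals of all modes that contain processor P_i.
    loads = {i: sum(v for label, v in t.items() if str(i) in label) for i in (1, 2, 3, 4)}
    b0 = max(tb, *loads.values())
    return tb, loads, b0


# Hypothetical totals: here T_B = 8 while T_1 = 9, so B_0 = 9.
t = {"1": 5, "12": 2, "34": 1, "13": 1, "24": 2, "123": 1, "234": 3}
tb, loads, b0 = lower_bounds(t)
print(tb, loads, b0)
```

In this example the processor-load bound is the binding one, which shows why B0 takes the maximum of both kinds of bound.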
3 Partial scheduling of 2-processor jobs and 3-processor jobs
In this section, we consider partial schedulings that assign only 2-processor jobs and
3-processor jobs in the job set J to the system. A scheduling of the whole job set J will be
obtained by properly inserting 1-processor jobs in the job set J into a partial scheduling.
Without loss of generality, we assume t12 ≥ t34 (otherwise we exchange the indices 1 and
3, and the indices of 2 and 4). We schedule the 2-processor jobs and 3-processor jobs based
on the job groups given in (1) in terms of their processing modes. There are four possible
situations in total, according to the configurations of the 2-processor jobs in each group.
These schedulings are illustrated in Fig. 1.
The makespan of each of the schedulings in Fig. 1 is
$T_B = t_{123} + t_{124} + t_{134} + t_{234} + T^{12,34}_{\max} + T^{13,24}_{\max} + T^{14,23}_{\max}$
where $T^{12,34}_{\max} = \max\{t_{12}, t_{34}\}$, $T^{13,24}_{\max} = \max\{t_{13}, t_{24}\}$, and $T^{14,23}_{\max} = \max\{t_{14}, t_{23}\}$. By
Lemma 2.1, Opt(J ) ≥ TB . Note that the ordering of the groups in the scheduling can be
permuted arbitrarily without increasing the makespan. Moreover, for each 2-processor job
group of two consistent modes (e.g., the group {J12, J34}), the smaller (i.e., shorter) job subset
of one mode (e.g., J34) can be scheduled to either have the same starting time, or have the
same finishing time as the larger (i.e., longer) job subset of the other mode (e.g., J12).
Fix a scheduling S of 2-processor jobs and 3-processor jobs in Fig. 1. Under the scheduling,
each group of jobs in (1) may introduce a “gap” for certain processors in the system. For
example, the 3-processor job group {J123} introduces a gap of length t123 for processor P4,
while the 2-processor job group {J12, J34}, when t12 > t34, introduces a gap of length t12 − t34
for processors P3 and P4. We say that the scheduling S has gap structure (g1, g2, g3, g4) if
under the scheduling S, the processor Pi has gi gaps, for 1 ≤ i ≤ 4. The four schedulings
in Fig. 1 can be distinguished by their gap structures: they have gap structures (1, 3, 3, 3),
(2, 2, 4, 2), (2, 2, 2, 4), and (3, 1, 3, 3), respectively. Note that two gaps in a scheduling may
be “merged” so they look like a single continuous gap. In this case, however, we still regard
them as two gaps.
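The four gap structures can be reproduced by counting, for each processor, the one 3-processor group that does not use it plus every 2-processor group in which it belongs to the shorter subset. The Python sketch below is an illustration under two assumptions: t12 ≥ t34 as in the text, and a gap of length zero still counts as a gap (which the fixed structures listed above suggest). The name `gap_structure` is hypothetical.

```python
from typing import Tuple


def gap_structure(t12: float, t34: float, t13: float, t24: float,
                  t14: float, t23: float) -> Tuple[int, int, int, int]:
    """Number of gaps of P1..P4 in the partial scheduling (assumes t12 >= t34)."""
    # Each processor is idle during exactly one 3-processor group.
    gaps = {1: 1, 2: 1, 3: 1, 4: 1}
    pair_groups = [((1, 2), t12, (3, 4), t34),
                   ((1, 3), t13, (2, 4), t24),
                   ((1, 4), t14, (2, 3), t23)]
    for qa, ta, qb, tb in pair_groups:
        # The processors of the shorter subset wait for the longer one.
        shorter = qb if ta >= tb else qa
        for p in shorter:
            gaps[p] += 1
    return tuple(gaps[p] for p in (1, 2, 3, 4))


print(gap_structure(2, 1, 2, 1, 2, 1))  # Case 1: (1, 3, 3, 3)
print(gap_structure(2, 1, 1, 2, 2, 1))  # Case 2: (2, 2, 4, 2)
print(gap_structure(2, 1, 2, 1, 1, 2))  # Case 3: (2, 2, 2, 4)
print(gap_structure(2, 1, 1, 2, 1, 2))  # Case 4: (3, 1, 3, 3)
```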
4 An approximation algorithm for P4|fix|Cmax
We are ready to present our main algorithm. Given a job set J , we first group the jobs in
J according to their modes, and combine the 2-processor job groups as given in (1). Then
we construct a partial scheduling S for the 2-processor job groups and the 3-processor job
groups, which takes one of the configurations given in Fig. 1 (precisely which configuration
is taken depends on the relations of the processing times t13 and t24, and of processing times
t14 and t23). The makespan of this partial scheduling S is TB , which is not larger than Opt(J ),
according to Lemma 2.1.
Now consider the 1-processor job subsets J1, J2, J3, and J4 of processing times t1, t2, t3,
and t4, respectively. We discuss how we insert the 1-processor job subsets into the partial
Fig. 1 Scheduling for 2-processor jobs and 3-processor jobs (assuming t12 ≥ t34). Four Gantt-chart panels over Processors 1 to 4, each with makespan TB:
Case 1: t14 ≥ t23 and t13 ≥ t24, gap structure = (1,3,3,3)
Case 2: t14 ≥ t23 and t13 ≤ t24, gap structure = (2,2,4,2)
Case 3: t14 ≤ t23 and t13 ≥ t24, gap structure = (2,2,2,4)
Case 4: t14 ≤ t23 and t13 ≤ t24, gap structure = (3,1,3,3)
scheduling S to make a scheduling for the whole job set J . Recall that under the partial
scheduling S, each processor has a certain number of gaps, which correspond to the idle time
of the processor. There are two different situations.
Recall that B0 = max{TB, T1, T2, T3, T4}, where Ti is the total processing time on proces-
sor Pi required by the job set J . According to Corollary 2.2, B0 ≤ Opt(J ).
Case 1. There is a processor Pi such that one of the gaps of Pi has length ≥ B0/2, or the
sum of the lengths of two gaps of Pi is ≥ B0/2.
Algorithm. ScheduleP4-I
Input: a partial scheduling S of the 2-processor jobs and 3-processor jobs in J under which
processor Pi has a gap of length ≥ B0/2
Output: a scheduling S0 of the whole job set J
1. Suppose that the gap of the processor Pi is caused by the job group G;
remove all job groups after the job group G in the partial scheduling S;
2. add the 1-processor job subsets J1, J2, J3, and J4 to the scheduling (let them start at their
earliest possible time after the starting time of the job group G);
3. add back the job groups removed in step 1 (in exactly the same way as they were in the
partial scheduling S).
Fig. 2 Scheduling S0 when processor Pi has a large gap
Suppose that the processor Pi has a gap of length ≥ B0/2. Let this gap be caused by a job
group G in (1). We apply the algorithm given in Fig. 2.
The algorithm ScheduleP4-I produces a scheduling S0 of the whole job set J . We show
that in this subcase, the makespan of the constructed scheduling S0 is bounded by 1.5B0.
Since the gap of the processor Pi has length at least B0/2 and since the gap is caused
by the job group G, the length of the job group G is ≥ B0/2. The makespan of the partial
scheduling S is TB , as shown in Fig. 1. Suppose after removing all job groups after the job
group G in S, the remaining scheduling has makespan T ′, then the part removed from the
scheduling S has length exactly TB − T ′ since no two jobs from two different job groups can
be processed in parallel. Now adding the 1-processor job subsets J1, J2, J3, and J4 results
in a new scheduling S′′. We claim that the makespan T ′′ of the scheduling S′′ is bounded by
T ′ + B0/2. In fact, suppose T ′′ > T ′ + B0/2, then the latest finished job in the scheduling
S′′ must be a 1-processor job subset Jj . Suppose that the job subset Jj has mode {Pj }. Then
under the scheduling S′′, the processor Pj keeps busy since the job group G starts until the
job subset Jj finishes (this is because by the algorithm, the job subset Jj starts at its earliest
possible time after the starting time of the job group G). The processor Pj spends at least
B0/2 time units from the beginning to the end of the job group G since the length of the job
group G is ≥ B0/2. After this, the processor Pj needs to spend more than another B0/2 time
units to finish the job subset Jj since T ′′ > T ′ + B0/2. Therefore, the total processing time
of the processor Pj is larger than B0. This contradicts the definition of B0, in which we have
let B0 ≥ Tj (recall that Tj is the total processing time of the processor Pj ).
Thus, after adding the 1-processor job subsets, the resulting scheduling S′′ has makespan
T ′′ bounded by T ′ + B0/2. Finally, when we add back the job groups removed from the partial
scheduling S in step 1, we get a scheduling S0 of the whole job set J , whose makespan is
bounded by T ′′ + (TB − T ′), which is bounded by TB + B0/2 ≤ 1.5B0 (here we have used
the fact TB ≤ B0).
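The argument above can be summarized as a chain of inequalities. In the following LaTeX sketch, $s_G$ denotes the starting time of the job group G; this symbol is introduced here for illustration and does not appear in the original text.

```latex
\begin{align*}
&\text{If } T'' > T' + \tfrac{1}{2}B_0 \text{, then }
  T_j \;\ge\; \underbrace{(T' - s_G)}_{\ge\, B_0/2}
       + \underbrace{(T'' - T')}_{>\, B_0/2} \;>\; B_0,
  \text{ contradicting } T_j \le B_0. \\
&\text{Hence } T'' \le T' + \tfrac{1}{2}B_0, \text{ and} \\
&S_0(J) \;\le\; T'' + (T_B - T')
        \;\le\; T_B + \tfrac{1}{2}B_0
        \;\le\; \tfrac{3}{2}B_0 .
\end{align*}
```

The last step uses only $T_B \le B_0$, so the bound does not depend on where the group G sits in the partial scheduling.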
The subcase in which the processor Pi has two gaps such that the sum of the lengths of
the two gaps is larger than or equal to B0/2 can be handled in a similar manner. Suppose
the gaps are due to two job groups G1 and G2. We first permute the job groups in the partial
scheduling S so that the two job groups G1 and G2 are adjacent under the scheduling and
the two gaps of the processor Pi are “merged” into a larger continuous gap. Then we apply
an algorithm similar to ScheduleP4-I to insert the 1-processor job subsets between the job
groups G1 and G2. The detailed algorithm is given in Fig. 3.
The proof that the makespan of the scheduling S0 constructed by the algorithm
ScheduleP4-II is bounded by 1.5B0 is very similar to the proof for the case when the
Algorithm. ScheduleP4-II
Input: a partial scheduling S of the 2-processor jobs and 3-processor jobs in J under which
processor Pi has two gaps such that the sum of the lengths of the gaps is ≥ B0/2
Output: a scheduling S0 of the whole job set J
1. Suppose that the two gaps of the processor Pi are caused by the job groups G1 and G2;
2. permute the job groups in S so that the job groups G1 and G2 become adjacent and the two
gaps of Pi are merged into a larger continuous gap (let G2 start right after G1 finishes);
3. remove all job groups after the job group G2 in the partial scheduling;
4. remove the job group G2 from the scheduling;
5. add the 1-processor job subsets J1, J2, J3, and J4 to the scheduling (let them start at their
earliest possible time after the starting time of the job group G1);
6. add back the job group G2;
7. add back the job groups removed in step 3.
Fig. 3 Scheduling S0 when two gaps of processor Pi make a large gap
processor Pi has a single gap of length larger than or equal to B0/2. For completeness, we
give a quick outline as follows.
First of all, the sum of the lengths of the two job groups G1 and G2 is larger than or
equal to B0/2. Suppose that the makespan of the scheduling S′ obtained in step 3 is T ′ (thus,
the part removed in step 3 has length TB − T ′), then the scheduling S′′ obtained in step 6
has makespan bounded by T ′ + B0/2 (otherwise a processor Pj would have processing time
larger than B0). It follows that the scheduling obtained in step 7 has a makespan bounded
by TB + B0/2 ≤ 1.5B0.
Therefore, in case 1, we can always construct a scheduling S0 for the job set J whose
makespan is bounded by 1.5B0.
Case 2. Under the partial scheduling S, no processor has either a gap of length larger than
or equal to B0/2 or two gaps such that the sum of lengths of the two gaps is larger than or
equal to B0/2.
We first explain our basic idea in dealing with this case. Look at Fig. 1. We observe
that in each scheduling, all processors finish at the same time. Therefore, in order to keep
the makespan small, we only need to reduce the amount of idle time, i.e., the gaps, for the
processors. By the assumption of the case, the gaps for each processor are small. Therefore,
we try to “fill” as many gaps as we can so that at least one processor has at most two gaps.
Since the sum of the lengths of any two gaps of any processor is less than B0/2, the processor
with at most two gaps left shows that before the makespan of the scheduling, the processor
has less than B0/2 idle time units. From this, we can easily derive that the makespan of the
scheduling is bounded by 1.5B0.
Look at each of the situations given in Fig. 1. These situations give gap structures
(1, 3, 3, 3), (2, 2, 4, 2), (2, 2, 2, 4), and (3, 1, 3, 3), respectively. There is a very nice property
between the job groups {J14, J23} and {J13, J24}: in the (1, 3, 3, 3) and (3, 1, 3, 3) structures,
each of the processors with 3 gaps has at least one gap caused by these two job groups, and
the processor with 1 gap has its gap not in these two job groups, while in the (2, 2, 4, 2) and
(2, 2, 2, 4) structures, the processor with 4 gaps has two gaps caused by these two job groups
and the two gaps are merged into a larger continuous gap. This nice property allows us to
insert the 1-processor jobs between these two job groups without increasing the makespan much.
Algorithm. ScheduleP4-III
Input: a partial scheduling S of the 2-processor jobs and 3-processor jobs in J under which
no processor has two gaps such that the sum of the lengths of the two gaps is ≥ B0/2
Output: a scheduling S0 of the whole job set J
1. remove the job group {J13, J24} from the partial scheduling in Figure 1;
2. add the 1-processor job subsets J1, J2, J3, and J4 to the scheduling (let them start at their
earliest possible time after the starting time of the job group {J14, J23});
3. add back the job group {J13, J24} (in exactly the same way as they were in the partial
scheduling S).
Fig. 4 Scheduling S0 when gaps of all processors are small
The algorithm for this case is given in Fig. 4.
Let the scheduling for the whole job set J constructed by the algorithm ScheduleP4-III
be S0. If the makespan of S0 is equal to the makespan TB of the partial scheduling S, then by
Lemma 2.1, S0 is an optimal scheduling for the job set J .
On the other hand, if the makespan of S0 is larger than TB , then at least one of the gaps
between the job groups {J14, J23} and {J13, J24} is filled by the 1-processor job subsets.
If the filled gap belongs to a processor Pi that has at most 3 gaps in the partial scheduling
S, then after filling one of its gaps, the processor Pi has at most 2 gaps left. By the assumption
in this case, the sum of the lengths of the gaps left in the processor Pi is less than B0/2. That
is, the total idle time of processor Pi before the makespan is less than B0/2 (note that by
the algorithm ScheduleP4-III, all processors finish at the same time, so the makespan of the
scheduling S0 is equal to the finish time of the processor Pi , see Fig. 6(b) for an illustration).
Since the total processing time Ti of the processor Pi is bounded by B0, we conclude that the
makespan of the scheduling S0 is less than Ti + 0.5B0 ≤ 1.5B0.
If the filled gap belongs to the processor Pi that has 4 gaps in the partial scheduling S,
then the filled gap must be the larger gap obtained by merging two gaps in the job groups
{J14, J23} and {J13, J24}. Thus, with this gap filled, the processor Pi has two gaps left such
that the sum of the lengths of these two gaps is less than B0/2. This again implies that the
idle time of the processor Pi before the makespan of S0 is less than B0/2, and the makespan
of the scheduling S0 is less than 1.5B0.
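The three steps of ScheduleP4-III (Fig. 4) can be sketched on the level of mode totals. The Python code below is a hedged illustration, not the authors' implementation. It assumes that the groups {J14, J23} and {J13, J24} are the last two groups of the partial scheduling, that both subsets of {J14, J23} start together, that both subsets of {J13, J24} finish together (so that their gaps merge), and that "adding back" {J13, J24} means shifting it, with its shape unchanged, to the earliest conflict-free start. The function name and sample totals are hypothetical.

```python
from typing import Dict


def schedule_p4_iii_makespan(t: Dict[str, float]) -> float:
    """Makespan obtained by the ScheduleP4-III steps on mode totals t."""
    def g(label: str) -> float:
        return t.get(label, 0.0)

    tb = (g("123") + g("124") + g("134") + g("234")
          + max(g("12"), g("34")) + max(g("13"), g("24")) + max(g("14"), g("23")))
    long_a = max(g("14"), g("23"))   # length of group {J14, J23}
    long_b = max(g("13"), g("24"))   # length of group {J13, J24}
    start_a = tb - long_a - long_b   # all other groups are placed first

    # Step 1 removes {J13, J24}. Step 2: each 1-processor subset J_p starts as
    # soon as P_p finishes its own job subset in {J14, J23}.
    busy_until = {1: g("14"), 4: g("14"), 2: g("23"), 3: g("23")}
    end = {p: start_a + busy_until[p] + g(str(p)) for p in (1, 2, 3, 4)}

    # Step 3: put {J13, J24} back, finish-aligned, at the earliest start s for
    # which J13 waits for P1 and P3, and J24 waits for P2 and P4.
    s = max(max(end[1], end[3]) - (long_b - g("13")),
            max(end[2], end[4]) - (long_b - g("24")))
    return s + long_b


# Hypothetical totals with T_B = B_0 = 20; no two gaps of a processor sum to 10.
t = {"123": 2, "124": 2, "134": 2, "234": 8, "12": 2, "34": 1,
     "13": 1, "24": 2, "14": 1, "23": 2, "1": 10, "2": 2, "3": 2, "4": 2}
print(schedule_p4_iii_makespan(t))  # 28
```

For these totals the makespan 28 stays below 1.5 B0 = 30, in line with the bound derived above. Without any 1-processor jobs the function returns TB, matching the partial scheduling.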
We summarize all these discussions in the main algorithm given in Fig. 5, and conclude
with our main theorem.
Theorem 4.1. The algorithm Schedule-P4 runs in linear time and constructs a scheduling
S0 for the input job set J with makespan bounded by 1.5Opt(J ).
Proof: According to the above discussion, the makespan of the scheduling S0 is bounded
by 1.5B0, which is at most 1.5Opt(J ) according to Corollary 2.2. □
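The case distinction made by the main algorithm Schedule-P4 (Fig. 5) depends only on B0 and on the gap lengths of each processor in the partial scheduling. The following Python sketch shows one way to compute that decision from the mode totals; the function name `choose_subroutine`, the label-keyed dictionary, and the sample inputs are assumptions made for illustration.

```python
from typing import Dict, List


def choose_subroutine(t: Dict[str, float]) -> str:
    """Return "I", "II" or "III": the subroutine Schedule-P4 would call."""
    def g(label: str) -> float:
        return t.get(label, 0.0)

    triples = ("123", "124", "134", "234")
    pairs = [("12", "34"), ("13", "24"), ("14", "23")]
    tb = sum(g(m) for m in triples) + sum(max(g(a), g(b)) for a, b in pairs)
    loads = [sum(v for label, v in t.items() if str(i) in label) for i in (1, 2, 3, 4)]
    b0 = max(tb, *loads)

    # Collect the gap lengths of every processor in the partial scheduling.
    gaps: Dict[str, List[float]] = {p: [] for p in "1234"}
    for mode in triples:
        idle = next(p for p in "1234" if p not in mode)
        gaps[idle].append(g(mode))
    for a, b in pairs:
        shorter = b if g(a) >= g(b) else a
        for p in shorter:
            gaps[p].append(abs(g(a) - g(b)))

    ranked = [sorted(lengths, reverse=True) for lengths in gaps.values()]
    if any(ls[0] >= b0 / 2 for ls in ranked):
        return "I"    # some single gap is at least B0/2
    if any(len(ls) > 1 and ls[0] + ls[1] >= b0 / 2 for ls in ranked):
        return "II"   # some two gaps of one processor sum to at least B0/2
    return "III"      # all gaps are small


print(choose_subroutine({"234": 10, "1": 10}))           # I
print(choose_subroutine({"234": 2, "24": 2, "1": 6}))    # II
```

The decision takes constant time once the 14 totals are known, so the overall running time stays linear in the number of jobs.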
5 Tightness of ratio 1.5 for the algorithm Schedule-P4
Theorem 4.1 concludes that the approximation ratio of the algorithm Schedule-P4 is bounded
by 1.5. It is natural to ask whether this ratio 1.5 is tight for the algorithm in the sense that there
is an instance J of the problem P4|fix|Cmax on which the algorithm Schedule-P4 constructs
a scheduling whose makespan is arbitrarily close to 1.5Opt(J ). Very recently, Arambula
Algorithm. Schedule-P4
Input: an instance J for the P4 |fix|Cmax problem
Output: a scheduling S0 for J
1. group the jobs in J by their processing modes, and combine the 2-processor job
groups as in (1);
2. construct the partial scheduling S for 2-processor jobs and 3-processor jobs, as shown
in Figure 1;
3. if a processor Pi has a gap of length ≥ B0/2 under the partial scheduling S
then call subroutine ScheduleP4-I to construct a scheduling S0 for J ; stop;
4. if a processor Pi has two gaps such that the sum of the lengths of the two gaps is ≥ B0/2
under the partial scheduling S
then call subroutine ScheduleP4-II to construct a scheduling S0 for J ; stop;
5. call subroutine ScheduleP4-III to construct a scheduling S0 for J .
Fig. 5 Approximation algorithm for the P4|fix|Cmax problem
(2005) has given an affirmative answer to this question. For the completeness of the discussion,
we present this result in this section.
Theorem 5.1 (I. Arambula). The ratio 1.5 is tight for the algorithm Schedule-P4.
Proof: Recall $T_B = t_{123} + t_{124} + t_{134} + t_{234} + T^{12,34}_{\max} + T^{13,24}_{\max} + T^{14,23}_{\max}$, where
$T^{12,34}_{\max} = \max\{t_{12}, t_{34}\}$, $T^{13,24}_{\max} = \max\{t_{13}, t_{24}\}$, and $T^{14,23}_{\max} = \max\{t_{14}, t_{23}\}$. Fix a number TB > 0 and
let ε > 0 be a small positive real number. Consider an instance of P4|fix|Cmax so that the
following conditions hold (see Fig. 6(a) for a concrete illustration):
(1) |J234| = 0.5TB − ε,
(2) |J12| ≥ |J34|, |J23| = |J14| + ε/2, |J24| = |J13| + ε/2,
(3) |J1| = 0.5TB, |J2| ≤ |J134|, |J3| ≤ |J124|, |J4| ≤ |J123|
(the simplest way to achieve such an instance is to have a single job for each mode).
Recall that for each i , Ti is the total processing time for processor Pi . The above conditions
imply
T1 = TB,
and
|J123| + |J124| + |J134| + |J12| + |J14| + |J13| = 0.5TB .
Also it is easy to see that
T2, T3, T4 ≤ TB .
Therefore, we have
B0 = max{TB, T1, T2, T3, T4} = TB .
Fig. 6 The ratio 1.5 is tight for the algorithm Schedule-P4. Three Gantt-chart panels over processors P1 to P4:
(a) An illustration of the instance (partial scheduling of length TB, with the 1-processor jobs J1, J2, J3, J4 to be scheduled)
(b) The scheduling by the algorithm Schedule-P4 (makespan 1.5TB − ε)
(c) An optimal scheduling (makespan TB)
Since no processor has two gaps whose sum is larger than or equal to B0/2 (in particular,
the sizes of the three gaps for processor P1 are 0.5TB − ε, ε/2, and ε/2, respectively), the
scheduling shown in Fig. 6(b) will be constructed by the algorithm Schedule-P4. It is not
difficult to verify that the makespan of the scheduling in Fig. 6(b) is equal to 1.5TB − ε. On the
other hand, if we permute the job groups so that the three gaps in processor P1 are “merged”
into a single larger time slot, then job group J1 can be inserted into this time slot, achieving
a scheduling for the job set J with makespan TB , see Fig. 6(c) for the construction of this
scheduling. By Corollary 2.2, Opt(J ) ≥ B0 = TB . Therefore, the scheduling in Fig. 6(c)
is an optimal scheduling for the job set J . If we let Appx(J ) be the makespan of the
scheduling constructed by the algorithm Schedule-P4, then we have the approximation ratio
(note B0 = TB):
$$\frac{\mathrm{Appx}(J)}{\mathrm{Opt}(J)} = \frac{1.5B_0 - \varepsilon}{B_0} = 1.5 - \frac{\varepsilon}{B_0} = 1.5 - \frac{\varepsilon}{T_B} \qquad (2)$$
By properly choosing the values TB and ε, this ratio can be arbitrarily close to 1.5 from
below. In conclusion, the approximation ratio 1.5 is tight for the algorithm Schedule-P4.
□
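The makespan 1.5TB − ε of the scheduling in Fig. 6(b), which the proof leaves to the reader, can be traced as follows. This worked computation is a sketch added here under stated assumptions: the groups {J14, J23} and {J13, J24} are the last two groups of the partial scheduling, the two subsets of {J14, J23} start together, the two subsets of {J13, J24} finish together, and J1 is the critical 1-processor subset. The symbol $s$ for the starting time of {J14, J23} is introduced for illustration.

```latex
\begin{align*}
s &= T_B - |J_{23}| - |J_{24}|, \\
\mathrm{finish}(J_1) &= s + |J_{14}| + |J_1|
  = 1.5\,T_B - |J_{24}| - \bigl(|J_{23}| - |J_{14}|\bigr)
  = 1.5\,T_B - |J_{24}| - \tfrac{\varepsilon}{2}, \\
\mathrm{finish}(J_{13}) &= \mathrm{finish}(J_1) + |J_{13}|
  = 1.5\,T_B - \tfrac{\varepsilon}{2} - \bigl(|J_{24}| - |J_{13}|\bigr)
  = 1.5\,T_B - \varepsilon .
\end{align*}
```

The second line uses $|J_1| = 0.5\,T_B$ and $|J_{23}| = |J_{14}| + \varepsilon/2$, and the third uses $|J_{24}| = |J_{13}| + \varepsilon/2$, both from the conditions on the instance.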
6 Final remarks and future work
The multiprocessor job scheduling problem has become increasingly interesting, both in practical
applications and in theoretical studies. Much progress has been made in recent years. In
particular, polynomial time approximation schemes for various multiprocessor job schedul-
ing problems have been derived (Amoura et al., 2002; Chen and Miranda, 2001; Jansen and
Porkolab, 2005), which are of great significance from a theoretical point of view. On the other
hand, practical algorithms for the problem have not been as successful. Practical algorithms
with good approximation ratios have been developed only for systems with at most three
processors (Dell’Olmo et al., 1997; Goemans, 1995). The current paper gives new observations,
introduces new techniques, and extends the research in this direction to systems with four
processors. Our techniques are also applicable to systems with more than four processors.
For example, for the P5|fix|Cmax problem, we can have a linear time algorithm with approx-
imation ratio bounded by 2, improving the best previous ratio 2.5 for the problem (Chen
and Lee, 1999). Further and more subtle applications of our techniques are currently under
investigation.
We propose a few open problems for future work in this research area.
The technique of grouping jobs according to their processing modes and then scheduling the
jobs based on job groups is known as normal scheduling (Dell’Olmo et al., 1997). For the
problem P3|fix|Cmax, Dell’Olmo et al. (1997) showed that the best normal scheduling of a job
set J has makespan bounded by 1.25Opt(J), and proved that this bound is tight. According to
this definition, Theorem 4.1 claims that for any instance J of the problem P4|fix|Cmax, there
is a normal scheduling whose makespan is bounded by 1.5Opt(J). Is 1.5 a tight bound for
normal schedulings for the problem P4|fix|Cmax, or can the bound 1.5 be further improved?
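The idea of a normal scheduling can be sketched in a few lines of Python. The code below groups jobs by processing mode and then places each group as one contiguous block, in a caller-supplied order, at the earliest time at which all processors of the group are free. This greedy placement and all names in it are assumptions made for illustration; it is not the algorithm Schedule-P4 and carries no approximation guarantee.

```python
def group_by_mode(jobs):
    """Merge jobs with the same processing mode into one group.

    jobs: iterable of (mode, length) pairs, where mode is a collection of
    1-based processor indices (hypothetical representation).
    Returns a dict mapping each sorted mode tuple to its total length.
    """
    groups = {}
    for mode, length in jobs:
        key = tuple(sorted(mode))
        groups[key] = groups.get(key, 0.0) + length
    return groups


def schedule_groups(groups, order, num_processors=4):
    """Place each job group as one block, following the given order.

    Returns (start_times, makespan). Each group starts as soon as every
    processor in its mode has finished the groups placed before it.
    """
    ready = [0.0] * num_processors
    start_times = {}
    for mode in order:
        start = max(ready[p - 1] for p in mode)
        finish = start + groups[mode]
        for p in mode:
            ready[p - 1] = finish
        start_times[mode] = start
    return start_times, max(ready)
```

Different orders of the same groups generally give different makespans, which is why the question of the best normal scheduling is nontrivial.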
For each fixed k, there is a (practical) approximation algorithm for the Pk|fix|Cmax problem
whose approximation ratio is bounded by a constant (Chen and Lee, 1999). However, this
constant depends on the number k of processors in the system. Can we develop practical
algorithms of approximation ratio bounded by a constant c for the Pk|fix|Cmax problem for all
k such that the constant c is independent of k? Note that for the classical scheduling problems,
such algorithms exist (for example, Graham’s list scheduling algorithm (Graham, 1966))
with very satisfactory approximation ratios (Hall, 1997).
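For comparison, list scheduling for the classical problem, in which each job needs exactly one processor and may run on any of them, can be sketched as follows. This is a minimal Python illustration with hypothetical names; it does not apply to the Pk|fix|Cmax problem, where each job requires a fixed set of processors. Its makespan is known to be at most 2 − 1/m times the optimum on m identical machines, a constant bounded independently of m.

```python
import heapq


def list_schedule(lengths, num_machines):
    """Assign each job, in list order, to the currently least loaded machine.

    lengths: processing times of the jobs, in the order they are listed.
    Returns (assignment, makespan), where assignment[i] is the 0-based
    machine index chosen for job i.
    """
    # Heap of (current load, machine index); ties go to the lower index.
    heap = [(0.0, m) for m in range(num_machines)]
    heapq.heapify(heap)
    assignment = []
    for length in lengths:
        load, machine = heapq.heappop(heap)
        assignment.append(machine)
        heapq.heappush(heap, (load + length, machine))
    makespan = max(load for load, _ in heap)
    return assignment, makespan
```

The heap keeps each assignment step at O(log m) time, so the whole procedure runs in O(n log m) time for n jobs.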
A generalized version of the P4|fix|Cmax problem is the P4|set|Cmax problem, in which
each job may have several alternative processing modes (Chen and Lee, 1999). Based on
normal schedulings of approximation ratio 2 for the P4|fix|Cmax problem, Chen and Lee
(1999) have developed an approximation algorithm of ratio 2 + ε for any ε > 0 for the
P4|set|Cmax problem. Is it possible to extend the results in the current paper to an improved
algorithm for the P4|set|Cmax problem?
Acknowledgments This research is supported in part by the National Natural Science Foundation of China
under Project No. 60433020, and by US National Science Foundation under Grants CCR-0311590 and CCF-
0430683.
References
Amoura AK, Bampis E, Kenyon C, Manoussakis Y (2002) Scheduling independent multiprocessor tasks.
Algorithmica 32:247–261
Arambula I (2005) Personal communication
Bianco L, Blazewicz J, Dell’Olmo P, Drozdowski M (1995) Scheduling multiprocessor tasks on a dynamic
configuration of dedicated processors. Ann Oper Res 58:493–517
Blazewicz J, Dell’Olmo P, Drozdowski M, Speranza M (1992) Scheduling multiprocessor tasks on the three
dedicated processors. Inf Proc Lett 41:275–280
Blazewicz J, Dell’Olmo P, Drozdowski M, Speranza M (1992/1994) Corrigendum to Scheduling multi-
processor tasks on the three dedicated processors. Inf Proc Lett 41:275–280. Inf Proc Lett 49:269–
270
Blazewicz J, Drozdowski M, Weglarz J (1986) Scheduling multiprocessor tasks to minimize scheduling length.
IEEE Trans Comput 35:389–393
Blazewicz J, Drozdowski W, Weglarz J (1994) Scheduling multiprocessor tasks—a survey. Int J Microcomput
Appl 13:89–97
Chen J, Lee C-Y (1999) General multiprocessor tasks scheduling. Naval Res Log 46:59–74
Chen J, Miranda A (2001) A polynomial time approximation scheme for general multiprocessor job scheduling.
SIAM J Comput 31:1–17
Dell’Olmo P, Speranza MG, Tuza Zs (1997) Efficiency and effectiveness of normal schedules on three dedicated
processors. Discrete Math 164:67–79
Dobson G, Karmarkar U (1989) Simultaneous resource scheduling to minimize weighted flow times. Oper
Res 37:592–600
Garey MR, Johnson DS (1979) Computers and intractability: a guide to the theory of NP-completeness.
Freeman, San Francisco
Goemans MX (1995) An approximation algorithm for scheduling on three dedicated machines. Discrete Appl
Math 61:49–59
Graham RL (1966) Bounds for certain multiprocessing anomalies. Bell Sys Tech J 45:1563–1581
Hall LA (1997) Approximation algorithms for scheduling. In: Hochbaum DS (ed) Approximation algorithms
for NP-hard problems. PWS Publishing Company, pp 1–45
Hoogeveen JA, van de Velde SL, Veltman B (1994) Complexity of scheduling multiprocessor tasks with
prespecified processor allocations. Discrete Appl Math 55:259–272
Jansen K, Porkolab L (2005) General multiprocessor task scheduling: approximate solutions in linear time.
SIAM J Comput 35:519–530
Krawczyk H, Kubale M (1985) An approximation algorithm for diagnostic test scheduling in multicomputer
systems. IEEE Trans Comput 34:869–872
Lee C-Y, Cai X (1999) Scheduling one and two-processor tasks on two parallel processors. IIE Trans 31:445–
455
Lee C-Y, Lei L, Pinedo M (1997) Current trends in deterministic scheduling. Ann Oper Res 70:1–42