Approximation algorithms for scheduling trees with general communication delays


Alix Munier

Laboratoire LIP6, University P. et M. Curie 4, place Jussieu 75252, Paris Cedex 05, France

Received 15 November 1996

Abstract

We consider the problem of scheduling a tree with general communication delays. Jakoby and Reischuk proved that this problem is NP-hard for binary trees and an unlimited number of processors. First, we develop a clustering procedure based on the same lower bounds as Papadimitriou and Yannakakis for a related problem. We deduce an approximation algorithm for an unlimited number of processors with relative performance $2 - 1/(1+\rho)$, where $\rho$ denotes the maximum ratio between communication delays and durations of tasks. We also prove that, for a limited number $m$ of identical processors, any list schedule using the cluster structure has a relative performance bounded by $1 + (1 - 1/m)(2 - 1/(1+\rho))$ and that this bound is tight. © 1999 Published by Elsevier Science B.V. All rights reserved.

Keywords: General communication delays; Scheduling; Task duration; Makespan

1. Introduction

The problem considered here is the minimization of the makespan for a scheduling problem with communication delays. A precedence constraint between two tasks $i$ and $j$ models a data transfer: if $i$ and $j$ are not performed by the same processor, a delay $c_{ij}$ must elapse between the completion time of $i$ and the starting time of $j$.

In several studies (see the surveys [1,2]), the complexity and the relative performance of algorithms are expressed using the ratio $\rho = c_{\max}/v_{\min}$ between the maximum communication delay and the minimum duration of a task. Problems with $\rho > 1$ are usually harder, since the structure of the optimal solutions is much more difficult to characterize.

Parallel Computing 25 (1999) 41–48

0167-8191/99/$ – see front matter © 1999 Published by Elsevier Science B.V. All rights reserved.

PII: S0167-8191(98)00101-X

In this paper, we consider a set of precedence constraints reduced to an out-tree. This strong hypothesis has allowed several authors to obtain interesting theoretical results:

· If the number of processors is unlimited, Chrétienne [3] developed a polynomial algorithm for $\rho \le 1$. Jakoby and Reischuk [4] proved that the problem is NP-complete for binary trees with unit durations and equal communication delays $c$. They also developed a polynomial algorithm for the special case of complete $k$-ary trees.

· Up to now, all the studies about scheduling trees on $m$ identical processors consider unit durations and unit communication delays:
  - Lenstra et al. [5] proved that the problem is NP-hard. Several authors [5–7] developed polynomial algorithms for 2 processors.
  - Lawler [6] developed an approximation algorithm with a relative difference bounded by $m - 2$. Using finer arguments, Guinand et al. [8] proved that the bound of this algorithm is $(m-2)/2$. Möhring and Schäffter [9] developed another algorithm with the same relative difference for communication delays in $\{0, 1\}$.
  - Varvarigou and Roychowdhury developed an exact algorithm with complexity $O(n^{2m})$ solving this problem.

In this paper, we consider an out-tree with no assumption on communication delays and task durations. In Section 2, we introduce the problem and some useful notation. In Section 3, we compute a lower bound on the makespan by associating with each task $i$ a subset of descendants that may be executed sequentially by the same processor as $i$, using the same ideas as Papadimitriou and Yannakakis [10]. In Section 4, we deduce an approximation algorithm of relative performance ratio $2 - 1/(1+\rho)$ by building a partition of the tasks from these subsets of descendants. Then, we prove that any list schedule using this partition has a relative performance ratio of $1 + (1 - 1/m)(2 - 1/(1+\rho))$ for $m$ processors and that this bound is tight.

2. Problem formulation

Let $T = \{1, \ldots, n\}$ be a set of $n$ tasks with durations $v_1, \ldots, v_n > 0$ and let $G = (T, E)$ be an out-tree, with root 1, modeling data transfers between these tasks. For every $i \in T - \{1\}$, $p(i)$ is the immediate predecessor of $i$ in $G$. For any pair of tasks $(i, j)$, $j$ is said to be a descendant of $i$ if there exists a path in $G$ from $i$ to $j$. With every edge $(i, j) \in E$ is associated a communication delay $c_{ij} \ge 0$. The number of available identical processors is denoted by $m$.

A schedule is defined by a pair of vectors $(t, \pi)$ such that, for every $i \in T$, $t_i \ge 0$ is the starting time of $i$ and $\pi_i \in \{1, \ldots, m\}$ is the processor performing $i$. A schedule is feasible iff the following two conditions hold:
1. $\forall (i, j) \in E$, if $\pi_i = \pi_j$ then $t_i + v_i \le t_j$, else $t_i + v_i + c_{ij} \le t_j$.
2. $\forall (i, j) \in T^2$ with $i \ne j$ and $\pi_i = \pi_j$, either $t_i + v_i \le t_j$ or $t_j + v_j \le t_i$.
The makespan of the schedule $(t, \pi)$ is $\omega = \max_{i \in T} (t_i + v_i)$. The problem is to find a feasible schedule with minimum makespan.
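The two feasibility conditions and the makespan can be checked mechanically. The following is a minimal sketch, assuming tasks are stored in dictionaries keyed by task number; the function names and the three-task instance are hypothetical and serve only as an illustration.

```python
from itertools import combinations

def is_feasible(v, parent, c, t, proc):
    """Check the two feasibility conditions for a schedule (t, proc).

    v[i]      : duration of task i
    parent[i] : immediate predecessor p(i) (no entry for the root)
    c[(p, i)] : communication delay on the arc (p, i)
    t[i]      : starting time of task i
    proc[i]   : processor performing task i
    """
    if any(t[i] < 0 for i in v):
        return False
    # Condition 1: precedence, plus the delay when the processors differ.
    for i, p in parent.items():
        delay = 0 if proc[p] == proc[i] else c[(p, i)]
        if t[p] + v[p] + delay > t[i]:
            return False
    # Condition 2: two tasks on the same processor never overlap.
    for i, j in combinations(v, 2):
        if proc[i] == proc[j] and not (t[i] + v[i] <= t[j] or t[j] + v[j] <= t[i]):
            return False
    return True

def makespan(v, t):
    """Largest completion time over all tasks."""
    return max(t[i] + v[i] for i in v)

# Hypothetical out-tree with arcs 1 -> 2 and 1 -> 3, both with delay 3.
v = {1: 2, 2: 1, 3: 1}
parent = {2: 1, 3: 1}
c = {(1, 2): 3, (1, 3): 3}

# All tasks on one processor: no delay is paid.
t_one, proc_one = {1: 0, 2: 2, 3: 3}, {1: 1, 2: 1, 3: 1}
# Task 3 moved to a second processor too early: the delay is violated.
t_two, proc_two = {1: 0, 2: 2, 3: 2}, {1: 1, 2: 1, 3: 2}
```

In this instance the single-processor schedule is feasible with makespan 4, while the second one is rejected because task 3 would have to wait until time 5 on another processor.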


3. Lower bound and clusters

The aim of this section is to compute by recurrence, for every task $i \in T$, a lower bound $h_i$ on the makespan of the subgraph of $G$ rooted at $i$ for an unbounded number of processors.

Let $T_i$ be the set of descendants of $i$, and suppose that $u$ is the makespan of a feasible schedule of the tasks of $T_i \cup \{i\}$. We express a necessary condition on $u$ as follows: for any task $j \in T_i$, if $c_{p(j)j} > u - v_{p(j)} - h_j$ then tasks $p(j)$ and $j$ must be assigned to the same processor in any schedule of $T_i \cup \{i\}$ of makespan $u$. The arc $(p(j), j)$ is then said to be $u$-critical.

Now, let $C(i, u)$ be the set of tasks $j$ of $T_i$ such that the path from $i$ to $j$ is composed only of $u$-critical arcs. Clearly, any task $j$ of $C(i, u)$ must be performed by $\pi_i$, with latency $h_j - v_j$. Let $O$ be the order of the tasks of $C(i, u)$ by decreasing latency. By sequencing these tasks following $O$, we get a schedule with minimum makespan:

$$\omega_i(u) = v_i + \max_{j \in \{1, \ldots, |C(i,u)|\}} \left( h_{O(j)} + \sum_{a=1}^{j-1} v_{O(a)} \right).$$

So, the condition is $u \ge \omega_i(u)$.

$h_i$ is defined to be the minimum value $u$ that satisfies the above condition. We set $C(i) = C(i, h_i)$ and denote by $O_i(1), \ldots, O_i(|C(i)|)$ the ordered sequence of descendants of $i$ performed by $\pi_i$ for $h_i$. By extension, we set $O_i(0) = i$. For every $i \in T$, $C(i) \cup \{i\}$ is a cluster with head $i$.
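The recurrence can be sketched in code. The text defines $h_i$ as a minimum but does not spell out a search procedure, so the interval scan below is an assumption of this sketch: since an arc $(p(j), j)$ is $u$-critical exactly when $u < c_{p(j)j} + v_{p(j)} + h_j$, the critical set is constant between two consecutive thresholds, and the smallest admissible $u$ can be found by scanning these intervals in increasing order. The function name `lower_bounds` and the dictionary layout are hypothetical; the recursion is only suited to small trees.

```python
def lower_bounds(v, children, c, root=1):
    """Return (h, C): h[i] is the bound h_i and C[i] the ordered list O_i(1), ..."""
    h, C = {}, {}

    def subtree_arcs(i):
        arcs, stack = [], [i]
        while stack:
            p = stack.pop()
            for j in children.get(p, []):
                arcs.append((p, j))
                stack.append(j)
        return arcs

    def critical_order(i, u):
        # Tasks reached from i through u-critical arcs only,
        # sorted by decreasing latency h_j - v_j (the order O).
        found, stack = [], [i]
        while stack:
            p = stack.pop()
            for j in children.get(p, []):
                if c[(p, j)] > u - v[p] - h[j]:
                    found.append(j)
                    stack.append(j)
        return sorted(found, key=lambda j: h[j] - v[j], reverse=True)

    def omega(i, order):
        # v_i + max_j ( h_{O(j)} + durations sequenced before O(j) )
        best, elapsed = 0, 0
        for j in order:
            best = max(best, h[j] + elapsed)
            elapsed += v[j]
        return v[i] + best

    def solve(i):
        for j in children.get(i, []):
            solve(j)
        # Thresholds at which an arc of the subtree stops being critical.
        cuts = sorted({0} | {c[a] + v[a[0]] + h[a[1]] for a in subtree_arcs(i)})
        for lo, hi in zip(cuts, cuts[1:] + [float("inf")]):
            order = critical_order(i, lo)
            u = max(lo, omega(i, order))
            if u < hi:  # smallest u in [lo, hi) with u >= omega_i(u)
                h[i], C[i] = u, order
                return

    solve(root)
    return h, C

# Path 1 -> 2 -> ... -> 6 with unit durations and all delays equal to 2.
children = {i: [i + 1] for i in range(1, 6)}
v = {i: 1 for i in range(1, 7)}
c = {(i, i + 1): 2 for i in range(1, 6)}
h, C = lower_bounds(v, children, c)
```

On this path, the sketch yields $h_1 = 6$ with $C(1) = \{2, 3\}$ and $C(4) = \{5, 6\}$, that is, clusters of $c + 1 = 3$ tasks.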

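Lemma 1. $\forall i \in T - \{1\}$, $h_{p(i)} - h_i \ge v_{p(i)}$.

Proof. If $i \in C(p(i))$, then $h_{p(i)} \ge v_{p(i)} + h_i$ by definition of $h_{p(i)}$. Otherwise $i \notin C(p(i))$, so the arc $(p(i), i)$ is not $h_{p(i)}$-critical: $c_{p(i)i} \le h_{p(i)} - v_{p(i)} - h_i$, and the claim follows from $c_{p(i)i} \ge 0$. □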

4. Approximation algorithms

4.1. Unlimited number of processors

The approximation algorithm presented here builds a schedule $\sigma^\infty = (t^\infty, \pi)$ by choosing clusters from $\{C(i) \cup \{i\},\ i \in T\}$ to get a partition of $T$. $S$ is the set of heads of clusters chosen by the algorithm. At the end of the algorithm, $\mathcal{C}^\infty$ is the set of clusters of $\sigma^\infty$.
1. Initialization: $S := \{1\}$; $\mathcal{C}^\infty := \emptyset$.
2. Loop: while $S \ne \emptyset$ do
(a) Choose $i \in S$ and perform it on a new processor as soon as possible: if $i = 1$ then $t^\infty_i := 0$, else $t^\infty_i := t^\infty_{p(i)} + v_{p(i)} + c_{p(i)i}$; $S := S - \{i\}$.
(b) $\mathcal{C}^\infty := \mathcal{C}^\infty \cup \{C(i) \cup \{i\}\}$. The tasks of $C(i)$ are performed without interruption by the processor $\pi_i$ (i.e. $\forall j \in \{1, \ldots, |C(i)|\}$, $t^\infty_{O_i(j)} := t^\infty_i + \sum_{a=0}^{j-1} v_{O_i(a)}$).
(c) Add to $S$ the set of immediate successors of tasks in $C(i) \cup \{i\}$ which are not in $C(i)$.

The schedule obtained is feasible. Indeed, since $G$ is an out-tree, each task $j$ is scheduled exactly once. Moreover, precedence relations are satisfied inside clusters and communication delays are taken into account between clusters.
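Steps (a) to (c) of the algorithm can be sketched as follows. This is an illustration only: the ordered clusters $C(i)$ are taken as given input (here supplied by hand for a path of four unit tasks with delays equal to 1), and the function name `schedule_unlimited` and the dictionary layout are hypothetical.

```python
def schedule_unlimited(v, children, c, C, root=1):
    """Build (t, proc) for the clustered schedule, given C[i] = [O_i(1), O_i(2), ...]."""
    parent = {j: p for p, kids in children.items() for j in kids}
    t, proc = {}, {}
    S, next_proc = [root], 1
    while S:
        i = S.pop()
        # (a) the head i starts as soon as possible on a new processor
        if i == root:
            t[i] = 0
        else:
            p = parent[i]
            t[i] = t[p] + v[p] + c[(p, i)]
        proc[i] = next_proc
        next_proc += 1
        # (b) the tasks of C(i) follow without interruption, in the order O_i
        clock = t[i] + v[i]
        for j in C[i]:
            t[j], proc[j] = clock, proc[i]
            clock += v[j]
        # (c) immediate successors that leave the cluster become new heads
        members = {i, *C[i]}
        for p in [i] + C[i]:
            S.extend(j for j in children.get(p, []) if j not in members)
    return t, proc

# Path 1 -> 2 -> 3 -> 4, unit durations, delays 1, clusters {1, 2} and {3, 4}.
children = {1: [2], 2: [3], 3: [4]}
v = {1: 1, 2: 1, 3: 1, 4: 1}
c = {(1, 2): 1, (2, 3): 1, (3, 4): 1}
C = {1: [2], 3: [4]}
t, proc = schedule_unlimited(v, children, c, C)
```

Here the second cluster starts at time 3, after the delay on the arc $(2, 3)$, and the makespan is 5.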

Let $c_{\max}$ be the maximum value of a communication delay, let $v_{\min}$ be the minimum duration of a task, and let $\rho = c_{\max}/v_{\min}$.
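As a small illustration, the ratio and the two performance guarantees stated in the abstract can be evaluated for a concrete instance. The helper names and the instance below are hypothetical.

```python
def rho(v, c):
    """Ratio c_max / v_min for durations v and communication delays c."""
    return max(c.values()) / min(v.values())

def bound_unlimited(r):
    """Guarantee 2 - 1/(1 + rho) for an unlimited number of processors."""
    return 2 - 1 / (1 + r)

def bound_limited(r, m):
    """Guarantee 1 + (1 - 1/m)(2 - 1/(1 + rho)) for m processors."""
    return 1 + (1 - 1 / m) * bound_unlimited(r)

# Hypothetical instance: c_max = 3 and v_min = 1, so rho = 3.
v = {1: 2, 2: 1, 3: 4}
c = {(1, 2): 3, (1, 3): 1}
r = rho(v, c)
```

For this instance the guarantee is 1.75 with unlimited processors and 1.875 with $m = 2$.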

Lemma 2. Let $i \ne 1$ be the head of a cluster of $\mathcal{C}^\infty$ and let $j$ be the head of the cluster of $\mathcal{C}^\infty$ containing $p(i)$. Then

$$t^\infty_i - t^\infty_j \le \left(2 - \frac{1}{1+\rho}\right)(h_j - h_i).$$

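Proof. Clearly, $t^\infty_i - t^\infty_j = (t^\infty_i - t^\infty_{p(i)}) + (t^\infty_{p(i)} - t^\infty_j)$. By the algorithm:
1. Since $i$ and $p(i)$ are not in the same cluster of $\mathcal{C}^\infty$, $t^\infty_i - t^\infty_{p(i)} = v_{p(i)} + c_{p(i)i}$. By Lemma 1, $h_{p(i)} - h_i \ge v_{p(i)}$, so $t^\infty_i - t^\infty_{p(i)} \le c_{p(i)i} + h_{p(i)} - h_i$.
2. The tasks of $C(j) \cup \{j\}$ are performed without interruption. Since $p(i) \in C(j) \cup \{j\}$, there exists $k \in \{0, \ldots, |C(j)|\}$ with $p(i) = O_j(k)$ and

$$t^\infty_{p(i)} - t^\infty_j = \sum_{a=0}^{k-1} v_{O_j(a)}.$$

By definition of $h$, $h_j \ge h_{p(i)} + \sum_{a=0}^{k-1} v_{O_j(a)}$, so $t^\infty_{p(i)} - t^\infty_j \le h_j - h_{p(i)}$.
Hence we get the inequality $t^\infty_i - t^\infty_j \le c_{p(i)i} + h_j - h_i$. Now, since $i$ is not in $C(j)$, $c_{p(i)i} \le h_j - v_{p(i)} - h_i$. So,

$$h_j - h_i \ge c_{p(i)i} + v_{p(i)} \ge c_{p(i)i}\left(1 + \frac{1}{\rho}\right).$$

Then $c_{p(i)i} \le (h_j - h_i)\,\frac{\rho}{1+\rho}$ and the lemma holds. □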
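The last two steps of this proof can be spelled out as follows. This is only an expanded reading of the argument, assuming $\rho > 0$ and using $v_{p(i)} \ge v_{\min}$ and $c_{p(i)i} \le c_{\max}$; the symbols are those of the lemma ($t^\infty$ for the starting times in the clustered schedule, $\rho = c_{\max}/v_{\min}$).

```latex
\begin{align*}
v_{p(i)} \;\ge\; v_{\min} \;=\; \frac{c_{\max}}{\rho} \;\ge\; \frac{c_{p(i)i}}{\rho}
  \quad&\Longrightarrow\quad
  h_j - h_i \;\ge\; c_{p(i)i} + v_{p(i)} \;\ge\; c_{p(i)i}\,\frac{1+\rho}{\rho}, \\[4pt]
t^\infty_i - t^\infty_j \;\le\; c_{p(i)i} + (h_j - h_i)
  \;&\le\; \Bigl(\frac{\rho}{1+\rho} + 1\Bigr)(h_j - h_i)
  \;=\; \Bigl(2 - \frac{1}{1+\rho}\Bigr)(h_j - h_i).
\end{align*}
```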

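Theorem 1. Let $\omega^\infty$ be the makespan of $\sigma^\infty$ and $\omega^\infty_{\mathrm{opt}}$ be the optimal value of the makespan. Then

$$\frac{\omega^\infty}{\omega^\infty_{\mathrm{opt}}} \le 2 - \frac{1}{1+\rho}.$$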

Proof. Let $k$ be a task completed at time $\omega^\infty$. We define a sequence of $N + 1$ tasks $i_0, i_1, \ldots, i_N$ as follows:
1. $i_0$ is the head of the cluster of $\mathcal{C}^\infty$ containing $k$ (i.e. $k \in C(i_0) \cup \{i_0\}$ with $C(i_0) \cup \{i_0\} \in \mathcal{C}^\infty$).
2. If $i_l \ne 1$, then $i_{l+1}$ is the head of the cluster of $\mathcal{C}^\infty$ which contains $p(i_l)$.
By the algorithm,

$$\omega^\infty = t^\infty_{i_0} + \sum_{j \in C(i_0) \cup \{i_0\}} v_j = t^\infty_{i_0} + h_{i_0}.$$

By Lemma 2, $\forall l = 1, \ldots, N$,

$$t^\infty_{i_{l-1}} - t^\infty_{i_l} \le \left(2 - \frac{1}{1+\rho}\right)(h_{i_l} - h_{i_{l-1}}).$$

Since $i_N = 1$, $t^\infty_1 = 0$ and $\omega^\infty_{\mathrm{opt}} \ge h_1$, summing these inequalities gives

$$\omega^\infty \le \left(2 - \frac{1}{1+\rho}\right)(h_1 - h_{i_0}) + h_{i_0}.$$

So, the inequality holds. □
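For completeness, the summation behind this proof can be written out. This is an expanded reading, using $i_N = 1$, $t^\infty_1 = 0$ and the fact that the factor $2 - 1/(1+\rho)$ is at least 1.

```latex
\begin{align*}
t^\infty_{i_0}
  &= \sum_{l=1}^{N} \bigl(t^\infty_{i_{l-1}} - t^\infty_{i_l}\bigr)
  \;\le\; \Bigl(2 - \frac{1}{1+\rho}\Bigr) \sum_{l=1}^{N} \bigl(h_{i_l} - h_{i_{l-1}}\bigr)
  \;=\; \Bigl(2 - \frac{1}{1+\rho}\Bigr)(h_1 - h_{i_0}), \\[4pt]
\omega^\infty
  &= t^\infty_{i_0} + h_{i_0}
  \;\le\; \Bigl(2 - \frac{1}{1+\rho}\Bigr)(h_1 - h_{i_0}) + h_{i_0}
  \;\le\; \Bigl(2 - \frac{1}{1+\rho}\Bigr) h_1
  \;\le\; \Bigl(2 - \frac{1}{1+\rho}\Bigr)\,\omega^\infty_{\mathrm{opt}}.
\end{align*}
```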

This bound is asymptotically tight. Let $c$ and $k$ be two positive integers, and consider an instance of the problem given by $k(c+1)$ tasks with unit durations, communication delays all equal to $c$, and a graph $G$ reduced to a path through these tasks.
· An optimal solution consists of executing all the tasks consecutively on the same processor. Hence, $\omega^\infty_{\mathrm{opt}} = k(c+1)$.
· Our algorithm builds $k$ clusters of size $c + 1$ each, so $\omega^\infty = k(c+1) + (k-1)c$.
Then,

$$\frac{\omega^\infty}{\omega^\infty_{\mathrm{opt}}} = 1 + \frac{(k-1)c}{k(c+1)} = 2 - \frac{k+c}{k(c+1)}.$$

For $k \to +\infty$, we get the bound asymptotically.
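The convergence of this ratio can be checked with exact rational arithmetic. The sketch below assumes the path instance just described (unit durations, so $\rho = c$); the function names are hypothetical.

```python
from fractions import Fraction

def path_ratio(k, c):
    """Ratio between the clustered makespan and the optimum on the path instance."""
    n = k * (c + 1)               # number of unit tasks, also the optimal makespan
    clustered = n + (k - 1) * c   # k clusters separated by k - 1 delays of length c
    return Fraction(clustered, n)

def limit_ratio(c):
    """Value 2 - 1/(1 + c) approached when k grows."""
    return 2 - Fraction(1, 1 + c)

# The gap to the limit is c / (k (c + 1)), which vanishes as k grows.
gaps = [limit_ratio(3) - path_ratio(k, 3) for k in (1, 10, 100, 1000)]
```

With $c = 3$, the gap to $2 - 1/4$ shrinks from $3/4$ at $k = 1$ to $3/4000$ at $k = 1000$.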

4.2. Limited number of processors

In this section, we study a list algorithm using the structure of the clusters in $\mathcal{C}^\infty$. We build a schedule $\sigma^m = (t^m, \pi^m)$ for $m$ identical processors using the following algorithm:
1. Initialization: let $L$ be the set of heads of the clusters in $\mathcal{C}^\infty$: $L := \{i \in T \mid C(i) \cup \{i\} \in \mathcal{C}^\infty\}$.
2. Loop: while $L \ne \emptyset$ do
(a) Let $M(t)$ be the subset of $L$ composed of the tasks schedulable by an idle processor at time $t$.
(b) If $M(t) = \emptyset$, then $t := t + 1$. Else, select an arbitrary $i \in M(t)$ and a processor $\pi$ which can perform $i$ at time $t$, and set $t^m_i := t$ and $\pi_i := \pi$.
(c) Perform all tasks of $C(i)$ on $\pi$ without interruption following $O_i$:

$$\forall j \in \{1, \ldots, |C(i)|\}, \quad t^m_{O_i(j)} := t^m_i + \sum_{a=0}^{j-1} v_{O_i(a)} \quad \text{and} \quad \pi_{O_i(j)} := \pi_i.$$

Set $L := L - \{i\}$.

For every pair $(t, t')$ of integers with $t \le t'$, we denote by $I(t, t')$ the number of idle slots of $\sigma^m$ during the interval $[t, t')$. We prove the following lemma.

Lemma 3. Let $i \ne 1$ be the head of a cluster in $\mathcal{C}^\infty$. Then

$$I(t^m_{p(i)} + v_{p(i)},\ t^m_i) \le (m-1)\,c_{p(i)i}.$$

Proof. We consider two cases:
· If $t^m_i \ge t^m_{p(i)} + v_{p(i)} + c_{p(i)i}$ then task $i$ is schedulable by any processor at any time $t \ge t^m_{p(i)} + v_{p(i)} + c_{p(i)i}$, so no processor is idle from that time until $t^m_i$. So $I(t^m_{p(i)} + v_{p(i)},\ t^m_i) = I(t^m_{p(i)} + v_{p(i)},\ t^m_{p(i)} + c_{p(i)i} + v_{p(i)})$. Moreover, $i$ is schedulable by $\pi_{p(i)}$ in $[t^m_{p(i)} + v_{p(i)},\ t^m_{p(i)} + c_{p(i)i} + v_{p(i)})$, so $\pi_{p(i)}$ is busy during this interval, at most $m - 1$ processors are idle in each of its $c_{p(i)i}$ slots, and the inequality holds.
· Else $t^m_i < c_{p(i)i} + v_{p(i)} + t^m_{p(i)}$ and, using the same argument, $\pi_{p(i)}$ is busy during the interval $[t^m_{p(i)} + v_{p(i)},\ t^m_i)$. So, the inequality holds. □

Lemma 4. Let $i \ne 1$ be the head of a cluster in $\mathcal{C}^\infty$, and let $j$ be the head of the cluster of $\mathcal{C}^\infty$ containing $p(i)$. Then

$$I(t^m_j, t^m_i) \le (m-1)(t^\infty_i - t^\infty_j).$$

Proof. Clearly, $I(t^m_j, t^m_i) = I(t^m_j,\ t^m_{p(i)} + v_{p(i)}) + I(t^m_{p(i)} + v_{p(i)},\ t^m_i)$. During $[t^m_j,\ t^m_{p(i)} + v_{p(i)})$, $\pi_j$ performs tasks of $C(j) \cup \{j\}$ without interruption, so

$$I(t^m_j,\ t^m_{p(i)} + v_{p(i)}) \le (m-1)(t^m_{p(i)} + v_{p(i)} - t^m_j).$$

Moreover, by definition of $\sigma^m$, $t^m_{p(i)} - t^m_j = t^\infty_{p(i)} - t^\infty_j$ and then

$$I(t^m_j,\ t^m_{p(i)} + v_{p(i)}) \le (m-1)(t^\infty_{p(i)} + v_{p(i)} - t^\infty_j).$$

By Lemma 3, $I(t^m_{p(i)} + v_{p(i)},\ t^m_i) \le (m-1)\,c_{p(i)i}$, so

$$I(t^m_j, t^m_i) \le (m-1)(t^\infty_{p(i)} - t^\infty_j + v_{p(i)} + c_{p(i)i}).$$

Since $t^\infty_i = t^\infty_{p(i)} + v_{p(i)} + c_{p(i)i}$, we get the inequality. □

Theorem 2. Let $r^\infty$ (resp. $r^m$) be the relative performance of the schedule $\sigma^\infty$ (resp. $\sigma^m$). Then

$$r^m \le 1 + \left(1 - \frac{1}{m}\right) r^\infty.$$

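Proof. Let $k$ be a task completed at time $\omega^m$. We define a sequence of $N + 1$ tasks $i_0, i_1, \ldots, i_N$ as follows:
1. $i_0$ is the head of the cluster of $\mathcal{C}^\infty$ containing $k$.
2. If $i_l \ne 1$, then $i_{l+1}$ is the head of the cluster of $\mathcal{C}^\infty$ which contains $p(i_l)$.
Clearly, $m\omega^m = n + I(0, \omega^m) = n + I(0, t^m_{i_0}) + I(t^m_{i_0}, \omega^m)$.

By applying Lemma 4 along the sequence $i_0, \ldots, i_N$, we get $I(0, t^m_{i_0}) \le (m-1)\,t^\infty_{i_0}$. Moreover, tasks of $C(i_0) \cup \{i_0\}$ are scheduled without interruption by $\sigma^m$, so $\omega^m - t^m_{i_0} = h_{i_0} \le \omega^\infty - t^\infty_{i_0}$ and $I(t^m_{i_0}, \omega^m) \le (m-1)(\omega^\infty - t^\infty_{i_0})$. Hence,

$$m\omega^m \le n + (m-1)\omega^\infty.$$

Now, let $\omega^\infty_{\mathrm{opt}}$ (resp. $\omega^m_{\mathrm{opt}}$) be the optimal makespan of a schedule for an unlimited number of (resp. $m$) processors. Since $n/m \le \omega^m_{\mathrm{opt}}$, we get

$$\frac{\omega^m}{\omega^m_{\mathrm{opt}}} \le 1 + \left(1 - \frac{1}{m}\right)\frac{\omega^\infty}{\omega^m_{\mathrm{opt}}}.$$

As $\omega^\infty_{\mathrm{opt}} \le \omega^m_{\mathrm{opt}}$, the theorem is proved. □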
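The final steps of this proof amount to a division by $m\,\omega^m_{\mathrm{opt}}$. Written out, with $r^m = \omega^m/\omega^m_{\mathrm{opt}}$ and $r^\infty = \omega^\infty/\omega^\infty_{\mathrm{opt}}$ as in the statement of the theorem:

```latex
\begin{align*}
m\,\omega^m \le n + (m-1)\,\omega^\infty
  \quad&\Longrightarrow\quad
  \omega^m \le \frac{n}{m} + \Bigl(1 - \frac{1}{m}\Bigr)\omega^\infty, \\[4pt]
r^m = \frac{\omega^m}{\omega^m_{\mathrm{opt}}}
  \;&\le\; \frac{n/m}{\omega^m_{\mathrm{opt}}}
      + \Bigl(1 - \frac{1}{m}\Bigr)\frac{\omega^\infty}{\omega^m_{\mathrm{opt}}}
  \;\le\; 1 + \Bigl(1 - \frac{1}{m}\Bigr)\frac{\omega^\infty}{\omega^\infty_{\mathrm{opt}}}
  \;=\; 1 + \Bigl(1 - \frac{1}{m}\Bigr) r^\infty .
\end{align*}
```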


Corollary 1. If we consider the schedule $\sigma^\infty$ obtained previously, the relative performance of the list algorithm is bounded by $1 + (1 - 1/m)(2 - 1/(1+\rho))$.
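The list algorithm of this section can be sketched as follows. This is a minimal illustration under several assumptions: durations and delays are integers, processors are numbered from 0, the "arbitrary" choice in step (b) is resolved by taking the first schedulable head in list order, and the clusters are supplied by hand. The function name `list_schedule` and the data layout are hypothetical.

```python
def list_schedule(v, parent, c, C, m):
    """List-schedule the clusters {i} + C[i] on m processors; returns (t, proc)."""
    L = list(C)                 # cluster heads, in list order
    free_at = [0] * m           # first time at which each processor is idle again
    t, proc, now = {}, {}, 0
    while L:
        choice = None
        for i in L:
            p = parent.get(i)   # None for the root
            if p is not None and p not in t:
                continue        # predecessor not scheduled yet
            for q in range(m):
                if free_at[q] > now:
                    continue    # processor q is busy at time `now`
                if p is None:
                    ready = 0
                else:
                    # no delay if i runs on the processor of its predecessor
                    ready = t[p] + v[p] + (0 if proc[p] == q else c[(p, i)])
                if ready <= now:
                    choice = (i, q)
                    break
            if choice:
                break
        if choice is None:
            now += 1            # M(now) is empty: move to the next time slot
            continue
        i, q = choice
        clock = now
        for j in [i] + C[i]:    # the whole cluster runs without interruption
            t[j], proc[j] = clock, q
            clock += v[j]
        free_at[q] = clock
        L.remove(i)
    return t, proc

# Path 1 -> 2 -> 3 -> 4, unit durations, delays 1, clusters {1, 2} and {3, 4}.
v = {1: 1, 2: 1, 3: 1, 4: 1}
parent = {2: 1, 3: 2, 4: 3}
c = {(1, 2): 1, (2, 3): 1, (3, 4): 1}
C = {1: [2], 3: [4]}
t, proc = list_schedule(v, parent, c, C, m=2)
```

On this path the second cluster is placed on the processor of its predecessor as soon as that processor is idle, so no delay is paid and the makespan is 4.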

The relative performance of this algorithm is asymptotically tight. Indeed, let us consider the out-tree $G(k)$, $k \ge 2$, defined as follows: task durations are 1 and communication delays are equal to a positive integer $c$. $G(k)$ is composed of:
· a sequence of $k$ subgraphs $a_1, \ldots, a_k$ such that each $a_i$ is exactly a path of $c + 1$ tasks, and
· a sequence of $k - 2$ subgraphs $b_1, \ldots, b_{k-2}$ such that each $b_i$ is exactly a root with $c - 1$ immediate successors.
Fig. 1 presents the structure of the graph $G(k)$. We also add $k(c+1)(m-2) + k$ independent tasks.

An optimal schedule consists of scheduling the sequences $a_i$ on a first processor and the sequences $b_j$ on a second one, to get a makespan equal to $k(c+1)$. On the remaining slots of this second processor, $k$ independent tasks are assigned. The other independent tasks are assigned to the other $m - 2$ processors.

In the worst case, the two sequences $a_i$ and $b_j$ are performed by the same processor during $(2c+1)(k-2) + 2(c+1)$ time units. The independent tasks are performed before, so we get:

$$\omega^m \ge \frac{k(c+1)(m-2) + k}{m} + (k-2)(2c+1) + 2(c+1).$$

Now,

$$\frac{\omega^m}{\omega_{\mathrm{opt}}} \ge 1 - \frac{2}{m} + \frac{1}{m(c+1)} + \frac{(k-2)(2c+1)}{k(c+1)}.$$

For $k \to \infty$, we get the right bound.
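That the limit of this lower bound coincides with the bound of Corollary 1 can be checked with exact arithmetic. The sketch below assumes unit durations, so that $\rho = c$; the function names are hypothetical.

```python
from fractions import Fraction

def lower_ratio(k, c, m):
    """Right-hand side of the last inequality for the instance G(k)."""
    return (1 - Fraction(2, m) + Fraction(1, m * (c + 1))
            + Fraction((k - 2) * (2 * c + 1), k * (c + 1)))

def corollary_bound(c, m):
    """1 + (1 - 1/m)(2 - 1/(1 + rho)) with rho = c."""
    return 1 + (1 - Fraction(1, m)) * (2 - Fraction(1, 1 + c))

def gap(k, c, m):
    """Distance between the guarantee and the lower bound reached on G(k)."""
    return corollary_bound(c, m) - lower_ratio(k, c, m)
```

The gap equals $2(2c+1)/(k(c+1))$ whatever the value of $m$, so it vanishes as $k$ grows; for $c = 1$, $m = 2$ and $k = 4$ it is $3/4$.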

Fig. 1. Structure of $G(k)$.


5. Perspectives

In this paper, we developed an approximation algorithm with a bounded relative performance for a tree with general communication delays. Several interesting questions remain open:
1. In some cases, duplication of tasks reduces the difficulty of problems with communication delays [10–12]. Is it possible to improve the bound proved here on $m$ processors for trees by using duplication?
2. Is it also possible to obtain approximation algorithms with bounded relative performance for more general classes of precedence graphs with general communication delays?

References

[1] P. Chrétienne, C. Picouleau, Scheduling with communication delays: a survey, in: P. Chrétienne, E.G. Coffman, J.K. Lenstra, Z. Liu (Eds.), Scheduling Theory and its Applications, Wiley, New York, 1995, pp. 65–89.

[2] B. Veltman, B.J. Lageweg, J.K. Lenstra, Multiprocessor scheduling with communication delays, Parallel Computing 16 (1990) 173–182.

[3] P. Chrétienne, A polynomial time to optimally schedule tasks over an ideal distributed system under tree-like precedence constraints, European Journal of Operational Research 2 (1989) 225–230.

[4] A. Jakoby, R. Reischuk, Scheduling trees with communication delays, in: 3. SWAT, 1992, pp. 165–177.

[5] J.K. Lenstra, M. Veldhorst, B. Veltman, The complexity of scheduling trees with communication delays, Journal of Algorithms 20 (1) (1996) 157–173.

[6] E.L. Lawler, Scheduling trees on multiprocessors with unit communication delays, in: Workshop on Models and Algorithms for Planning and Scheduling Problems, Villa Vigoni, Lake Como, Italy, 1993.

[7] F. Guinand, D. Trystram, Optimal scheduling of UECT trees on two processors, Technical report, Apache-IMAG 3, Institut National Polytechnique de Grenoble, France, 1993.

[8] F. Guinand, R. Rapine, D. Trystram, Worst case analysis of Lawler's algorithm for scheduling trees with communication delays, 95.

[9] R.H. Möhring, M.W. Schäffter, A simple approximation algorithm for scheduling forests with unit processing times and zero-one communication delays, Technical Report 506, Technische Universität Berlin, 1996.

[10] C. Papadimitriou, M. Yannakakis, Towards an architecture independent analysis of parallel algorithms, SIAM Journal on Computing 19 (1990) 322–328.

[11] J.-Y. Colin, P. Chrétienne, C.P.M. scheduling with small communication delays and task duplication, Operations Research 39 (1991) 681–684.

[12] A. Munier, C. Hanen, Using duplication for scheduling unitary tasks on m processors with communication delays, Theoretical Computer Science, to be published.
