


Short-term hydrothermal scheduling. Part II: parallel simulated annealing approach

K.P. Wong Y.W. Wong

Indexing terms: Hydrothermal scheduling, Simulated annealing

Abstract: The paper develops a coarse-grained parallel simulated annealing algorithm for short-term hydrothermal scheduling. The design of the algorithm has taken into consideration load balancing, processor synchronisation reduction, communication overhead reduction and memory contention elimination. The parallel algorithm is implemented on an i860 processor in a simulated environment and is applied to a test example. The scheduling results are presented and are compared with those found by a sequential algorithm. The results indicate that the algorithm can achieve a near linear reduction in computation time.

List of principal symbols

a = the average acceptance ratio in a chain at low temperature
C = the number of sub-chains in the low-temperature region
I = the total number of processors
L = length of the chain
M = the number of chains
m_h = the number of chains at high temperature
m_l = the number of chains at low temperature
P = the process creation time
r_0 = the average computing time for a processor
S = the synchronisation time
sck = sub-chain k
U_c = the CPU time of the clustered algorithm
U_p = the CPU time of the proposed algorithm
U_s = the CPU time of the systolic algorithm

1 Introduction

In the companion paper [1], a sequential algorithm based on the simulated-annealing (SA) technique [2, 3] has been developed for short-term hydrothermal scheduling. Owing to the large computation requirement of the sequential SA technique, ways need to be developed to improve its speed. Although some methods to improve the speed of the sequential SA technique have been previously proposed [2, 4], parallel SA (PSA) algorithms are promising alternatives to achieve a large reduction in computing time. With the availability of powerful and high-speed numeric processors, such as i860s which can work with PC-486 computers, and with the availability of software systems which can operate these processors in parallel in the near future, it can be envisaged that PSA algorithms for solving power system optimisation problems can be developed based on these processors.

© IEE, 1994. Paper 1351C (P9), first received 8th June 1993 and in revised form 23rd December 1993. The authors are with the Artificial Intelligence and Power Systems Research Group, Department of Electrical and Electronic Engineering, The University of Western Australia, Nedlands, Western Australia 6009.

The work reported in this paper is supported by the Energy Research and Development Corporation and the Electricity Supply Association of Australia.

In general, PSA algorithms are designed with the considerations of (i) the preservation of the basic requirements in SA, (ii) the hardware architecture of the processors, (iii) the configurations of the processors and (iv) the software environment of the processors. Items (ii)-(iv), in addition, are particularly relevant to the type of processors adopted for the development and implementation of PSA algorithms. Some PSA algorithms [5-8] have been reported in the literature; however, not all of the factors affecting the design of PSA algorithms have been taken into account in them.

This paper develops a coarse-grained [9] PSA algorithm with application to short-term hydrothermal scheduling. It first reviews the PSA algorithms in References 5 and 6. Based on these earlier algorithms, a new general PSA algorithm is then developed in this paper. The design factors in (i)-(iv) above are fully considered. The algorithm developed is implemented in a simulated parallel environment on an i860 processor. The application of the developed PSA algorithm is demonstrated through an application example.

2 Previous parallel SA algorithms

While fine-grained parallel algorithms [9] are normally executed by parallel computers such as multiple-instruction multiple-data (MIMD) computers, the coarse-grained parallel algorithms in References 5 and 6 are executed by computers with fast and powerful numerical processors. Since a coarse-grained PSA algorithm is to be developed later in this paper, the systolic algorithm in Reference 5 and the clustered algorithm in Reference 6 are first reviewed in the following sections.

2.1 Systolic algorithm
In the systolic algorithm [5], one processor is used to generate a chain of candidate solutions at a temperature level. This chain is divided into a number I of sub-chains of equal length. For I temperature levels, therefore, there are a total of I processors executing in parallel, as shown in Fig. 1. The temperature level is reduced according to the cooling schedule.

Parallel execution of this algorithm is achieved in the following way. In Fig. 1, at temperature T_m, the solution at the end of any sub-chain sck, for k = 1, 2, ..., I, is passed to the beginning of sub-chain sck at the immediately lower temperature T_{m+1}. The (k+1)th sub-chain of temperature T_m is executed in parallel with the kth sub-chain of temperature T_{m+1}.

Fig. 1  Systolic algorithm (arrows denote sub-chain generation and the passing of solutions between temperature levels)

When a solution is passed to the start of sub-chain sck of temperature T_{m+1} from sub-chain sck of temperature T_m, the solution generated at the end of sub-chain sc(k-1) of T_{m+1} is also available. The starting solution for sck of T_{m+1} is chosen from these two solutions. Consider point A at the start of sub-chain sc3 of T_{m+2} in Fig. 1. The starting solution for this sub-chain is selected from solutions S_{m+2,sc2} and S_{m+1,sc3}. The selection criterion is to adopt the candidate solution which has a higher probability of leading to the quasi-equilibrium state further down the chain. The quasi-equilibrium state is the state in which the solutions are optimum solutions and they have equal probability of occurrence [5]. The evaluation of the probabilities of the solutions S_{m+1,sc3} and S_{m+2,sc2} leading to the quasi-equilibrium state can be found in Reference 5.

In this algorithm, when processor P1 has completed the execution of all the sub-chains at T_m, it will then be used to process the sub-chains at T_{m+I}, as shown in Fig. 1. All the other processors will be switched to execute sub-chains in the lower temperature levels in the same manner.
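
To make the wavefront structure of this pipeline concrete, the following Python sketch enumerates which processor works on which sub-chain at each parallel step, under the simplifying assumptions that processor Pk handles temperature level T(m+k-1), that every chain has exactly I sub-chains, and that the reuse of processors at deeper temperature levels is ignored; the schedule is our illustration, not code from the paper.

```python
# Illustrative sketch only: wavefront schedule of the systolic algorithm of
# Reference 5 for I processors, I temperature levels and I sub-chains per level.
# Sub-chain j at level k can start only after sub-chain j at level k-1 has
# finished, so sub-chain k+1 of a level runs alongside sub-chain k of the next.

def systolic_schedule(num_processors):
    """Return, for each parallel step, the concurrently executable
    (processor, temperature level, sub-chain) triples."""
    I = num_processors
    steps = []
    for t in range(2 * I - 1):                # 2I - 1 wavefront steps in total
        active = []
        for k in range(I):                    # k: temperature-level index -> P(k+1)
            j = t - k                         # j: sub-chain index within that level
            if 0 <= j < I:
                active.append((f"P{k + 1}", f"T(m+{k})", f"sc{j + 1}"))
        steps.append(active)
    return steps

if __name__ == "__main__":
    for t, active in enumerate(systolic_schedule(4), start=1):
        busy = ", ".join(f"{p}:{sc}@{T}" for p, T, sc in active)
        print(f"step {t}: {busy}")            # early and late steps show idle processors
```

The partially filled first and last steps of the printed schedule correspond to the idle processors discussed in Section 2.1.1 below.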

2.1.1 Disadvantages of the systolic algorithm: The first disadvantage of the systolic algorithm is that its performance deteriorates when the number of processors is increased. Secondly, as the length of the sub-chains is kept constant for all temperature levels, the sub-chain length cannot be lengthened to help the quasi-equilibrium state to be reached in the low temperature region. Although the performance of the algorithm can be improved by reducing the reduction rate of the temperature [5], this will also increase the computation requirement.

The third disadvantage of this algorithm is that not all the processors are utilised when the first and the last I temperature levels are processed. With reference to Fig. 1, processors P2-PI are idle while P1 is executing sub-chain sc1 at T_m when m = 1. All the processors will be used only after the first sub-chains of temperatures T_1 to T_{I-1} have been executed, and only up to the last (I-1)th temperature.


2.2 Clustered algorithm
The clustered algorithm [6] consists of an algorithm for the high temperature region and an algorithm for the low temperature region.

(a) High temperature algorithm: Fig. 2 shows the high temperature algorithm. At temperature T_m, the whole chain is divided into I equal-length sub-chains, sc1-scI, and each is executed by one processor operating in parallel with the other processors. The processor performs the generation of candidate solutions, cost evaluation, acceptance-criterion checking and updating of current solutions. The starting solution for all the sub-chains is S_{m,s}, selected from the candidate solutions S_{m-1,sc1} to S_{m-1,scI} generated at the ends of the sub-chains at temperature T_{m-1}. The method of selecting the starting solution for the sub-chains can be found in Reference 6.

Fig. 2  Clustered algorithm at the high temperature region
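
As an illustration of this structure (not the implementation of Reference 6), the sketch below runs I equal-length sub-chains in parallel from a common starting solution and then picks a starting solution for the next temperature level; the one-dimensional toy cost function, the Metropolis acceptance rule used inside each sub-chain and the lowest-cost selection rule are all assumptions made for the example.

```python
# Illustrative sketch: one temperature level of a clustered high-temperature
# algorithm with I sub-chains executed in parallel.
import math
import random
from concurrent.futures import ProcessPoolExecutor

def toy_cost(x):
    # Placeholder objective standing in for the hydrothermal production cost.
    return (x - 3.0) ** 2

def run_subchain(args):
    """Execute one sub-chain: generation, cost evaluation, acceptance check, update."""
    start, temperature, length, seed = args
    rng = random.Random(seed)
    current, current_cost = start, toy_cost(start)
    for _ in range(length):
        candidate = current + rng.uniform(-1.0, 1.0)      # generation
        cand_cost = toy_cost(candidate)                   # cost evaluation
        delta = cand_cost - current_cost
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, cand_cost  # acceptance and update
    return current, current_cost

if __name__ == "__main__":
    I, temperature, sub_len = 4, 5.0, 250      # processors, level temperature, sub-chain length
    start = 10.0                               # common starting solution for all sub-chains
    jobs = [(start, temperature, sub_len, seed) for seed in range(I)]
    with ProcessPoolExecutor(max_workers=I) as pool:
        ends = list(pool.map(run_subchain, jobs))
    # Placeholder selection rule: take the lowest-cost end-of-sub-chain solution
    # as the starting solution for the next (lower) temperature level.
    next_start = min(ends, key=lambda sc: sc[1])[0]
    print("end-of-sub-chain solutions:", ends, "-> next starting solution:", next_start)
```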

(b) Low temperature algorithm: While there are I sub-chains, each executed by one processor in the high temperature region, the number of sub-chains in the first temperature level of the low temperature region is reduced by a factor of two and each sub-chain is executed by two processors in parallel. The sub-chain length, however, is increased by the same factor. The number of sub-chains is reduced and the length is increased in the same manner as the temperature is reduced, until there is only one sub-chain left for execution by all available processors.
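
The successive halving can be written down directly; the short sketch below is our illustration, assuming the halving is applied at every reduction instance, and lists how many sub-chains remain, how long each one is and how many processors execute each of them.

```python
# Illustrative sketch of sub-chain combining in the clustered low-temperature
# algorithm: at each reduction instance the number of sub-chains is halved while
# the sub-chain length and the processors per sub-chain are doubled, until a
# single sub-chain is executed by all I processors.

def combining_schedule(num_processors, base_length):
    # The first tuple is the high-temperature configuration, given for reference;
    # the low-temperature region starts from the second tuple.
    num_sc, length, procs = num_processors, base_length, 1
    while True:
        yield num_sc, length, procs
        if num_sc == 1:
            break
        num_sc //= 2
        length *= 2
        procs *= 2

if __name__ == "__main__":
    for step, (n, l, p) in enumerate(combining_schedule(8, 100)):
        print(f"reduction instance {step}: {n} sub-chain(s), length {l}, "
              f"{p} processor(s) per sub-chain")
```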

The flow of activities of the processors executing in parallel in a sub-chain having n processors at a temperature level is shown in Fig. 3. The upper block in Fig. 3 assumes that processor Pn has accepted a candidate solution and updated the current solution with it while the other processors have not. The activities of the other processors are then halted. The updated solution of Pn becomes the starting solution for all the processors for subsequent generations of candidate solutions in the same sub-chain. This is shown in the lower block of Fig. 3.

The starting solution of the sub-chains in a temperature level is selected from the current solutions at the ends of all the sub-chains in the immediately higher temperature by the selection method employed in the high temperature algorithm.

Fig. 3  Flow of activities of processors in a sub-chain

(c) Algorithm switching and reduction of sub-chains: The instance at which the high temperature algorithm in (a) is switched to the low temperature algorithm in (b), and the instances at which the reduction of the number of sub-chains takes place in the low temperature algorithm, depend on whether there is a gain in computing time or efficiency. A dilation ratio is used to determine the switching instance. It is defined as the ratio of the computing time required to obtain a given update ratio for the clustered algorithm to that of the sequential algorithm. The update ratio is defined as the ratio between the number of updates and the total number of perturbations. The switching of the algorithm takes place when the value of the dilation ratio is less than or equal to 2 [6].
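
Restated compactly (the symbols u and D below are our own shorthand for the update ratio and the dilation ratio, introduced only for this note):

$$
u = \frac{\text{number of updates}}{\text{total number of perturbations}}, \qquad
D(u) = \frac{t_{\text{clustered}}(u)}{t_{\text{sequential}}(u)},
$$

and the switch from the high temperature algorithm to the low temperature algorithm is made once $D(u) \le 2$.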

2.2.1 Disadvantages of the low temperature algorithm: In the low temperature algorithm, exchange of data occurs when processors are halted and when they are restarted. If there is only one global communication channel for data exchange between the processors, this algorithm will have hardware communication problems. These problems become more severe when the number of processors executing a sub-chain is increased. In addition, the use of the dilation ratio for determining the instance to switch the high temperature algorithm to the low temperature algorithm, and the instances to combine sub-chains in the low temperature algorithm, ensures a gain of efficiency but does not guarantee that the quasi-equilibrium state can be reached.

3 Proposed parallel algorithm

The proposed parallel algorithm is developed with reference to a small number of i860 processors, which are single function multiple data (SFMD) processors. They have local memories and they can execute identical functions in parallel with different input data. These parallel processors communicate with the host through one data channel only. In the design of the algorithm, the aspects of load balancing, processor synchronisation reduction, communication overhead reduction and memory contention elimination are taken into consideration.

The proposed algorithm consists of a high temperature algorithm and a low temperature algorithm. They are described below.

3.1 High temperature algorithm
The basic high temperature algorithm in the clustered algorithm is adopted, but the method to determine the instance of switching over to the low temperature algorithm is different. With reference to Fig. 2, at temperature T_{m-1}, each sub-chain is executed by an i860 processor in the parallel mode. There are I processors and therefore I sub-chains. If M is the total number of generations of feasible solutions in the complete chain, each processor will initially perform M/I feasible generations in a sub-chain. To determine the instance of switching to the low temperature algorithm, the acceptance ratio, which is the ratio of the number of accepted solutions to the total number of feasible solutions generated, at the end of a sub-chain is checked in the following manner.

(i) If the acceptance ratio is greater than or equal to a specified value, the execution of the sub-chain is stopped and the current solution is given by the candidate solution most recently accepted in the sub-chain. The processor for this sub-chain is then held in a waiting state until all other processors finish executing their sub-chains. If the acceptance ratios in all sub-chains are greater than or equal to the specified value, the processors will be restarted to execute the high temperature algorithm in the next temperature level.

(ii) If the acceptance ratio in any sub-chain is less than a specified value, the high temperature algorithm is switched to the low temperature algorithm. At the algorithm switching point, all processors with the acceptance ratio in their own sub-chain greater than or equal to the specified value will start to execute the low temperature algorithm immediately. However, a processor with its acceptance ratio less than the specified value is allowed to continue to execute the high temperature algorithm, with the length of its sub-chain now extended according to the expression (n + 1)M/I, where n is the number of times that the sub-chain length has previously been increased. This process is repeated for this sub-chain until the acceptance ratio is greater than or equal to the specified value or the sub-chain length has been extended ten times. Then the processor will be used to execute the low temperature algorithm in conjunction with the other processors. A sketch of this per-processor check is given below.
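
The per-processor logic of (i) and (ii) can be sketched as follows; the callback run_feasible_generations, the value of the threshold and the cumulative way in which the acceptance ratio is accumulated over extensions are assumptions made for illustration and are not taken from the paper.

```python
# Illustrative sketch of the switching check in the proposed high-temperature
# algorithm: run M/I feasible generations, test the acceptance ratio, and either
# wait for the other processors or keep extending the sub-chain (total length
# (n + 1)M/I after n extensions, with at most ten extensions) before joining
# the low temperature algorithm.

def high_temperature_check(run_feasible_generations, M, I, threshold=0.5):
    """run_feasible_generations(k) is an assumed callback that performs k
    feasible generations (generate, evaluate, accept/reject, update) on this
    processor's sub-chain and returns the number of accepted solutions.
    The threshold stands in for the 'specified value', whose actual value
    is not given here."""
    block = M // I                      # initial sub-chain length M/I
    accepted = generated = 0
    for n in range(11):                 # n = extensions made so far (at most ten)
        accepted += run_feasible_generations(block)
        generated += block              # total length so far: (n + 1) * M / I
        if accepted / generated >= threshold:
            return "wait_for_others"    # case (i): hold, then next high-temperature level
    return "join_low_temperature"       # case (ii): after ten extensions, switch over

if __name__ == "__main__":
    import random
    rng = random.Random(0)
    # Toy stand-in for the SA inner loop: accept roughly 30% of generations.
    fake_run = lambda k: sum(rng.random() < 0.3 for _ in range(k))
    print(high_temperature_check(fake_run, M=1200, I=4))
```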

The above method for determining the switching instance enables the algorithm to satisfy the quasi-equilibrium condition to a larger extent than the use of the dilation ratio in the clustered algorithm. In the present algorithm, the processors can be considered to be loaded evenly. As only a small number of processors is employed, the sub-chain lengths are long and the frequency of synchronising the processors at the completion of the sub-chains is low in this algorithm. The communication overhead, which is here confined only to the transfer of the selected solution at the ends of the sub-chains at a temperature level to the start of the sub-chains at the next level, is also reduced.

3.2 Low temperature algorithm
By the method described in (ii) above, the low temperature algorithm is initiated. The proposed low-temperature algorithm is similar to the systolic algorithm, but it fully utilises all the available processors.

Fig. 4 shows the low temperature algorithm. In the figure, it is assumed that the sub-chain executed by processor Pw needs to be extended once and that the other processors Px, Py and Pz have finished executing their sub-chains at temperature T_m. The low temperature algorithm is now initiated at temperature T_{m+1}.

Fig. 4  Proposed algorithm at the low temperature region

The starting solution is selected at T_{m+1} from the candidate solutions in Px, Py and Pz, using the same selection method as in the clustered algorithm. These processors are used to execute the first sub-chain at T_{m+1}, as shown in Fig. 4. Processor Pz will then be used to process the first sub-chain at T_{m+2}, while Px and Py execute the second sub-chain at T_{m+1} in parallel with Pw, which has just finished executing the extended sub-chain at T_m in the high temperature region. At the last temperature level in Fig. 4, the last three sub-chains are executed respectively by (Pw, Pz), (Pw, Px, Pz) and (Pw, Px, Pz, Py).

Load balancing is achieved in the above low temperature algorithm since the lengths of the chains at all temperatures are almost equal and each chain is executed by practically one processor. With a small number of processors used, the number of sub-chains in each chain is small and therefore the requirement of synchronising the processors and the requirement of processor communication in the transfer of solutions are low. While these advantages are true for the systolic algorithm, the proposed low temperature algorithm utilises all the processors and idling of processors is eliminated. This also enables the quasi-equilibrium condition to be met to a larger extent than in the systolic algorithm.

4 Application example

The formulation for short-term hydrothermal scheduling developed in Section 3 of the companion paper [1] has been combined with the proposed PSA algorithm to form a PSA-based hydrothermal scheduling algorithm. This algorithm has been implemented using C in a simulated parallel environment and is run on an i860 numeric processor.

The PSA-based scheduling algorithm has been applied to the test system in the companion paper. The parameter settings for executing the parallel scheduling algorithm on four i860 processors in a simulated environment are identical to those used by the sequential algorithm [1]. As in the case of the sequential algorithm, the parallel algorithm is executed 30 times.

Table 1 summarises the best hydrothermal schedules found by the sequential and parallel scheduling algorithms, and Table 2 summarises the costs of the best and worst schedules determined by the two algorithms. The results in Table 2 show that the best and worst schedule solutions provided by the parallel scheduling algorithm are slightly better than the corresponding solutions obtained from the sequential scheduling algorithm. From the results in Table 3, the average CPU time for the sequential scheduling algorithm is 3.71 times that of the parallel scheduling algorithm.

Table 1: Hydrothermal schedules and comparison of fuel costs
[Columns: Method; Interval; Thermal gen (MW); Hydro gen (MW); Discharge (acre-ft/hr); Volume (acre-ft); Cost ($). Rows: Sequential (with relaxation) and Parallel, each over the 1st to 3rd day in the intervals 0 h-12.0 h and 12.0 h-24.0 h. Total costs: 709874.36 $ (sequential), 709870.46 $ (parallel).]


Table 2: Costs of the best and worst schedules

Method                         Best cost ($)   Worst cost ($)
Sequential (with relaxation)   709874.36       710717.07
Parallel                       709870.46       710023.77

Table 3: Maximum, minimum and average CPU times

Method                         Shortest exec.   Longest exec.   Average exec.
                               time (s)         time (s)        time (s)
Sequential (with relaxation)   892              917             901
Parallel                       234              245             239

While the ideal speed gain factor is 4 when four i860 processors are assumed, a near linear gain in speed is achieved by the parallel algorithm.
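
Expressed as a speed-up and a parallel efficiency (the efficiency figure is our own arithmetic from the quoted ratio):

$$
S = \frac{\bar{t}_{\text{seq}}}{\bar{t}_{\text{par}}} \approx 3.71, \qquad
\eta = \frac{S}{I} = \frac{3.71}{4} \approx 0.93 .
$$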

5 Conclusion

A coarse-grained parallel simulated annealing algorithm has been developed. This algorithm has been combined with the formulation for short-term hydrothermal scheduling in Section 3 of the companion paper [1] to establish a parallel-simulated-annealing-based hydrothermal scheduling algorithm. Using the test example in the companion paper as the test example for the developed algorithm, it has been shown that the performance of the parallel algorithm is comparable to, and can be slightly better than, that of the sequential simulated-annealing-based algorithm.

The parallel algorithm is much faster than the sequential algorithm and a near linear reduction in computation time can be achieved. The ideal gain in speed is not achieved owing to the times required by (a) the communication between the host and the processors, (b) process creation and termination in the i860s and (c) the synchronisation of the processors. Moreover, some parts of the parallel algorithm, such as the selection of starting solutions for the sub-chains, cannot be executed in parallel.
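
Using the quantities of the list of principal symbols, the effect of these overheads can be pictured with a rough cost model of our own (an illustrative sketch, not an expression taken from the paper):

$$
U_p \approx \frac{L\, r_0}{I} + P + S + t_{\text{comm}} + t_{\text{serial}},
$$

where $t_{\text{comm}}$ denotes the host-processor communication time and $t_{\text{serial}}$ the time spent in the parts, such as the selection of starting solutions, that cannot be executed in parallel; the ideal speed-up of I is approached only when these last four terms are small compared with $L\, r_0 / I$.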

6 References

1 WONG, K.P., and WONG, Y.W.: 'Short-term hydrothermal scheduling. Part I: simulated annealing approach', IEE Proc. C, 1994, 141, pp. 497-500

2 AARTS, E., and KORST, J.M.: 'Simulated annealing and Boltzmann machines: a stochastic approach to combinatorial optimization and neural computing' (John Wiley, New York, 1989)

3 KIRKPATRICK, S., GELATT, C.D., Jr., and VECCHI, M.P.: 'Optimization by simulated annealing', Science, 1983, 220, (4598), pp. 671-680

4 SZU, H., and HARTLEY, R.: 'Fast simulated annealing', Phys. Lett. A, 1987, 122, pp. 157-162

5 AARTS, E.H.L., de BONT, F.M.J., HABERS, J.H.A., and van LAARHOVEN, P.J.M.: 'A parallel statistical cooling algorithm'. Proceedings of 3rd Annual Symposium on Theoretical Aspects of Computer Science (STACS '86), 1986, pp. 87-97

6 AARTS, E.H.L., de BONT, F.M.J., HABERS, J.H.A., and van LAARHOVEN, P.J.M.: 'Parallel implementations of the statistical cooling algorithm', Integr. VLSI J., 1986, (4), pp. 209-238

7 DAREMA-ROGERS, F., KIRKPATRICK, S., and NORTON, V.A.: 'Parallel algorithms for chip placement by simulated annealing', IBM J. Res. Dev., 1987, 31, (3), pp. 391-402

8 RAVIKUMAR, C.P., and PATNAIK, L.M.: 'Performance improvement of simulated annealing algorithms', Comput. Syst. Sci. Eng., 1990, 5, (2), pp. 111-115

9 IEEE Committee Report: 'Parallel processing in power systems computations'. IEEE/PES Summer Meeting 1991, Paper 91 SM 503-3 PWRS
