bisc1.ps

8/8/2019 bisc1.ps


On the Usage of Differential Evolution for Function Optimization

by Rainer Storn

Siemens AG, ZFE T SN2, Otto-Hahn Ring 6, D-81739 Muenchen, Germany, currently on leave at ICSI,

1947 Center Street, Berkeley, CA 94704, [email protected]

Abstract

Differential Evolution (DE) has recently proven to be an efficient method for optimizing real-valued multi-modal objective functions. Besides its good convergence properties and suitability for parallelization, DE's main assets are its conceptual simplicity and ease of use. This paper describes several variants of DE and elaborates on the choice of DE's control parameters which corresponds to the application of fuzzy rules. Finally the design of a howling removal unit with DE is described to provide a real-world example for DE's applicability.

1 Introduction

Differential Evolution (DE) [1], [2] has proven to be a promising candidate for minimizing real-valued, multi-modal objective functions. Besides its good convergence properties DE is very simple to understand and to implement. DE is also particularly easy to work with, having only a few control variables which remain fixed throughout the entire minimization procedure.

DE is a parallel direct search method which utilizes NP D-dimensional parameter vectors

$$x_{i,G}, \quad i = 0, 1, 2, \ldots, NP-1, \qquad (1)$$

as a population for each generation G, i.e. for each iteration of the minimization. NP doesn't change during the minimization process. The initial population is chosen randomly and should try to cover the entire parameter space uniformly. As a rule, a uniform probability distribution for all random decisions will be assumed unless otherwise stated. Basically, DE generates new parameter vectors by adding the weighted difference between two population vectors to a third vector. If the resulting vector yields a lower objective function value than a predetermined population member, the newly generated vector replaces the vector, with which it was compared, in the next generation; otherwise, the old vector is retained. This basic principle, however, is extended when it comes to the practical variants of DE. For example an existing vector can be perturbed by adding more than one weighted difference vector to it. In most cases, it is also worthwhile to mix the parameters of the old vector with those of the perturbed one before comparing the objective function values. Several variants of DE which have proven to be useful will be described in the following.

2 Scheme DE/rand/1

For each vector $x_{i,G}$, i = 0, 1, 2, ..., NP-1, a perturbed vector $v_{i,G+1}$ is generated according to

$$v_{i,G+1} = x_{r_1,G} + F \cdot (x_{r_2,G} - x_{r_3,G}) \qquad (2)$$

with $r_1, r_2, r_3 \in [0, NP-1]$, integer and mutually different, and F > 0.

The randomly chosen integers $r_1$, $r_2$ and $r_3$ are also chosen to be different from the running index i. F is a real and constant factor $\in [0, 2]$ which controls the amplification of the differential variation $(x_{r_2,G} - x_{r_3,G})$. Note that the vector $x_{r_1,G}$ which is perturbed to yield $v_{i,G+1}$ has no relation to $x_{i,G}$ but is a randomly chosen population member. Fig. 1 shows a two-dimensional example that illustrates the different vectors which play a part in the vector-generation scheme. The notation DE/rand/1 specifies that the vector to be perturbed is randomly chosen, and that the perturbation consists of one weighted difference vector.
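The mutation step of eq. (2) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's implementation: the function name `mutate_rand_1`, the representation of the population as a list of lists, and the optional `rng` argument are all chosen here for the example.

```python
import random

def mutate_rand_1(population, i, F, rng=random):
    """Return v_{i,G+1} = x_{r1,G} + F * (x_{r2,G} - x_{r3,G}), as in eq. (2)."""
    NP = len(population)
    if NP < 4:
        raise ValueError("need NP >= 4 to pick r1, r2, r3 different from i")
    # r1, r2, r3 are mutually different and also different from the running index i.
    candidates = [r for r in range(NP) if r != i]
    r1, r2, r3 = rng.sample(candidates, 3)
    x1, x2, x3 = population[r1], population[r2], population[r3]
    # Perturb the randomly chosen x_{r1} by one weighted difference vector.
    return [a + F * (b - c) for a, b, c in zip(x1, x2, x3)]

# Small demonstration with an assumed population of NP = 10 vectors in D = 2.
demo_rng = random.Random(0)
demo_pop = [[demo_rng.uniform(-5.0, 5.0) for _ in range(2)] for _ in range(10)]
demo_v = mutate_rand_1(demo_pop, i=0, F=0.8, rng=demo_rng)
```

Note that $x_{i,G}$ itself never enters the computation; the index i is only used to exclude it from the random draw.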

[Figure: contour lines of a two-dimensional objective function over the axes X0 and X1 with its minimum marked. The NP parameter vectors from generation G are drawn as x; the plot shows $x_{i,G}$, $x_{r_1,G}$, $x_{r_2,G}$, $x_{r_3,G}$, the weighted difference $F(x_{r_2,G} - x_{r_3,G})$ and the newly generated parameter vector $v_{i,G+1} = x_{r_1,G} + F(x_{r_2,G} - x_{r_3,G})$.]

Fig. 1: An example of a two-dimensional objective function showing its contour lines and the process for generating $v_{i,G+1}$ in scheme DE/rand/1.

In order to increase the potential diversity of the perturbed parameter vectors, crossover is introduced. To this end, the vector

$$u_{i,G+1} = (u_{0i,G+1}, u_{1i,G+1}, \ldots, u_{(D-1)i,G+1}) \qquad (3)$$

with

$$u_{ji,G+1} = \begin{cases} v_{ji,G+1} & \text{for } j = \langle n \rangle_D, \langle n+1 \rangle_D, \ldots, \langle n+L-1 \rangle_D \\ x_{ji,G} & \text{for all other } j \in [0, D-1] \end{cases} \qquad (4)$$

is formed. The acute brackets $\langle \, \rangle_D$ denote the modulo function with modulus D. The starting index, n, in (4) is a randomly chosen integer from the interval [0, D-1]. The integer L, which denotes the number of parameters that are going to be exchanged, is drawn from the interval [1, D]. The algorithm which determines L works according to the following lines of pseudo code where rand() is supposed to generate a random number $\in [0, 1)$:

    L = 0;
    do {
        L = L + 1;
    } while ((rand() < CR) AND (L < D));

Hence the probability $Pr(L \ge \nu) = (CR)^{\nu-1}$, $\nu > 0$. CR is taken from the interval [0, 1] and constitutes a control variable in the design process. The random decisions for both n and L are made anew for each newly generated vector $u_{i,G+1}$.

To decide whether or not it should become a member of generation G+1, the new vector $u_{i,G+1}$ is compared to $x_{i,G}$. If vector $u_{i,G+1}$ yields a smaller objective function value than $x_{i,G}$, then $x_{i,G+1}$ is set to $u_{i,G+1}$; otherwise, the old value $x_{i,G}$ is retained.

3 Scheme DE/best/1

Basically, scheme DE/best/1 works the same way as DE/rand/1 except that it generates the vector $v_{i,G+1}$ according to:

$$v_{i,G+1} = x_{best,G} + F \cdot (x_{r_1,G} - x_{r_2,G}). \qquad (5)$$

This time, the vector to be perturbed is the best performing vector of the current generation. Again, the computation of $u_{i,G+1}$ is defined by eq. (4). This will also be the case for the remaining variants.

4 Scheme DE/best/2

Scheme DE/best/2 uses two difference vectors as a perturbation:

$$v_{i,G+1} = x_{best,G} + F \cdot (x_{r_1,G} + x_{r_2,G} - x_{r_3,G} - x_{r_4,G}). \qquad (6)$$

Due to the central limit theorem the random variation is shifted slightly into gaussian direction which seems to be beneficial for many functions.
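The crossover of eq. (4), the draw of L from the pseudo code, the mutation of eq. (6) and the selection step can be sketched together in Python. This is a minimal illustration rather than the author's implementation: the function names, the list-based vectors and the `rng` argument are assumptions made for the example, and so is the requirement that r1 to r4 differ from each other and from i in DE/best/2, which the text does not state explicitly.

```python
import random

def draw_L(D, CR, rng=random):
    """Number of exchanged parameters, mirroring the do/while pseudo code."""
    L = 0
    while True:
        L += 1
        if not (rng.random() < CR and L < D):
            return L

def mutate_best_2(population, best, i, F, rng=random):
    """v = x_best + F * (x_r1 + x_r2 - x_r3 - x_r4), as in eq. (6)."""
    # Assumption: r1..r4 are mutually different and different from i.
    candidates = [r for r in range(len(population)) if r != i]
    r1, r2, r3, r4 = rng.sample(candidates, 4)
    p = population
    return [b + F * (p[r1][j] + p[r2][j] - p[r3][j] - p[r4][j])
            for j, b in enumerate(best)]

def crossover(x, v, CR, rng=random):
    """Eq. (4): take L consecutive parameters (modulo D) from v, the rest from x."""
    D = len(x)
    n = rng.randrange(D)      # starting index from [0, D-1]
    L = draw_L(D, CR, rng)    # number of exchanged parameters from [1, D]
    u = list(x)
    for k in range(L):
        j = (n + k) % D       # <n + k>_D
        u[j] = v[j]
    return u

def select(x, u, f):
    """Keep u only if it yields a smaller objective function value than x."""
    return u if f(u) < f(x) else x
```

With CR = 1 the loop in `draw_L` always runs up to L = D, so the trial vector u equals v; with CR = 0 exactly one parameter is taken from v.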

5 Scheme DE/rand-to-best/1

Scheme DE/rand-to-best/1 places the perturbation at a location between a randomly chosen population member and the best population member:

$$v_{i,G+1} = x_{i,G} + \lambda \cdot (x_{best,G} - x_{i,G}) + F \cdot (x_{r_2,G} - x_{r_3,G}). \qquad (7)$$

$\lambda$ controls the greediness of the scheme. To reduce the number of control variables we usually set $\lambda = F$.

6 Rules for the usage of DE

Since its invention [1], DE has been tested extensively against artificial and real-world minimization problems. So far, the following set of linguistic rules has proven useful when it comes to choosing the control variables F, CR and NP:

- At initialization the population should be spread as much as possible over the objective function surface.

- Most often the crossover probability $CR \in [0, 1]$ must be considerably lower than one (e.g. 0.3). If no convergence can be achieved, however, $CR \in [0.8, 1]$ often helps.

- For many applications NP = 10*D is a good choice. F is usually chosen $\in [0.5, 1]$.

- The higher the population size NP is chosen, the lower one should choose the weighting factor F.

- Watching the parameters: it's a good convergence sign if the parameters of the best population member change a lot from generation to generation, especially at the beginning of the minimization and even if the objective function value of the best population member decreases slowly.

- Watching the objective function: it is not necessarily bad if the objective function value of the best population member exhibits plateaus during the minimization process. However, it is an indication that the minimization might take a long time or that an increase of the population size NP might be beneficial for convergence.

- The objective function value of the best population member shouldn't drop too fast, otherwise the optimization might get stuck in a local minimum.

- The proper choice of the objective function is crucial. The more knowledge one includes, the more likely the minimization is going to converge. The sum of error squares is not always a good choice as it has the potential to hide the path to the global minimum. To minimize the maximum error is often a better objective but seems to yield more local minima.

7 Design of a howling remover

In order to demonstrate DE's applicability to real-world problems a howling removal unit has been designed with DE. In modern audio communication applications hands-free environments are the current trend where headsets are replaced by loudspeakers and microphones. The preferred way of audio communication is full duplex, i.e. all loudspeakers and microphones are active as opposed to half duplex or "walkie talkie" mode where only one party is allowed to talk at a time. Howling is one of the problems in full duplex communication and builds up due to the acoustic feedback path. One way to reduce howling is to frequency-shift the signal that is picked up by a microphone by 10Hz to 20Hz before it is sent to the other communicating parties. This shift is usually not perceived as unnatural by the human ear. The shifted signal appears at the destination loudspeakers and travels back to the originator, shifted by another 10Hz to 20Hz. The signal travels many times through this acoustic path and is quickly shifted out of band, thus reducing the feedback problems. Fig. 2 shows the block diagram of the howling removal unit.

[Block diagram: $x_k$ -> Upsampler (4) -> Bandpass (BP) -> multiplication with a cosine signal -> Lowpass (LP) -> Downsampler (4) -> $y_k$]

Fig. 2: Howling removal unit.

The upsampler fills in three zero samples between the adjacent signal samples $x_k$ and $x_{k+1}$. The bandpass, which operates at four times the sampling frequency A, retains the components of the spectrum which are supposed to be frequency-shifted by the desired amount. The actual shift is performed via multiplication with an appropriate cosine signal. The lowpass removes some artifacts which appear due to the shifting operation, and the downsampler takes every fourth sample of the lowpass result to yield the output signal $y_k$ at the original sampling frequency.
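The signal path just described can be sketched as follows. This is only an illustration of the rate changes and the cosine multiplication: the bandpass and lowpass are passed in as hypothetical callables because their coefficients are the outcome of the DE design and are not listed here, and `omega` (the mixer frequency in radians per upsampled sample) is an assumed parameter name.

```python
import math

def upsample4(x):
    """Insert three zero samples after each input sample."""
    y = []
    for s in x:
        y.extend([s, 0.0, 0.0, 0.0])
    return y

def downsample4(x):
    """Keep every fourth sample."""
    return x[::4]

def mix(x, omega):
    """Multiply sample n by cos(omega * n)."""
    return [s * math.cos(omega * n) for n, s in enumerate(x)]

def shift_unit(x, bandpass, lowpass, omega):
    """x_k -> upsampler -> bandpass -> cosine mixer -> lowpass -> downsampler -> y_k."""
    return downsample4(lowpass(mix(bandpass(upsample4(x)), omega)))

# With pass-through "filters" and omega = 0 the chain returns its input,
# which is a quick sanity check of the up- and downsampling steps.
identity = lambda s: s
demo_out = shift_unit([1.0, 2.0, 3.0], identity, identity, 0.0)
```

In a real unit the two callables would apply the designed IIR bandpass and FIR lowpass at four times the original sampling rate.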

One of the most important features of the howling remover is low computational complexity as the unit has to operate in a real-time environment. Therefore it was crucial to design a bandpass as well as a lowpass with minimum degree. To this end the bandpass was chosen to be a recursive digital filter (IIR-filter) the transfer function of which had to meet specific magnitude constraints defined by a tolerance scheme. The lowpass was designed as a transversal filter (FIR-filter) without the usual linear phase requirement. Also the lowpass had to meet magnitude specifications defined by a tolerance scheme.

The objective function in both cases was defined to be the maximum deviation from the corresponding tolerance scheme or to be one of the following penalty terms pk, whichever value was greater:

a) For all parameters par[i]: p par i if par i1 20 000 100 0= +

[...]

scheme the magnitude response of the corresponding filter was sampled in the frequency domain. The number of samples that were used are indicated in figs. 3 and 4 which also show the final results of the design procedure.

[Figure: magnitude (0 to 1.2) over normalized frequency (0 to 0.5), annotated with stop bands, transition bands and the pass band, sample counts of 20, 10, 40, 10 and 10, and marked levels 0.01, 0.99 and 1.01.]

Fig. 3: Magnitude response of the bandpass after the design process.

[Fig. 4: plot for the lowpass of magnitude (0 to 1.2) over normalized frequency (0 to 0.5), annotated with pass band, transition band and stop band, sample counts of 40, 10 and 20, and marked levels 0.001, 0.01, 0.995 and 1.005.]


[...] was set to 0.005. The magnitude response of the filter still violates some parts of the tolerance scheme slightly, yet the design was satisfactory for the howling remover.

The lowpass filter result of fig. 4 was obtained using strategy DE/best/2 with NP=200, F=0.5 and CR=1. The entire design took 83,800 evaluations of the lowpass transfer function. A total of 16 parameters was used, 8 zero radii and 8 zero angles in the complex z-plane [3]. The overall gain constant a0 was set to 0.005.

The third optimization for the howling remover was concerned with the cosine function the evaluation of which takes up a non-negligible amount of computing time. Hence an approximation of cos(x) in the interval $[0, \pi/2]$ was performed using a polynomial opti(x) of third degree. Fig. 5 shows that opti(x) yields an improved approximation compared to the Taylor polynomial taylor(x) of third degree.

[Figure: cos(x), taylor(x) and opti(x) plotted over x from 0 to 2.5, with

taylor(x) = 1 - 0.5*x^2

opti(x) = 0.9975575805 + 0.03400468081*x - 0.6044035554*x^2 + 0.1129638031*x^3]

Fig. 5: Approximation of cos(x) by means of a polynomial of third order.

The optimization of the coefficients in opti(x) was performed by minimization of the sum of error squares obtained by 100 sampling points in the interval $x \in [0, \pi/2]$. The strategy used was DE/best/1 with NP=20, F=0.9 and CR=1. It took 30,020 function evaluations to get the result of fig. 5. The final speed increase of the cosine function computed by opti(x) was 17% compared to the library function cos(x).

Conclusion

Several variants of Differential Evolution (DE) have been introduced and general hints about their usage have been provided. Three real-world design tasks appearing in the development of a howling remover for audio communications have been solved successfully by applying DE. All three design tasks could have been performed with specialized design tools; the advantage of using DE, however, was that neither specialized and most probably expensive tools nor expert knowledge concerning the design tasks themselves was necessary.

References

[1] Storn, R. and Price, K., Differential Evolution - a simple and efficient adaptive scheme for global optimization over continuous spaces, Technical Report TR-95-012, ICSI, http://http.icsi.berkeley.edu/~storn/litera.html

[2] Storn, R. and Price, K., Minimizing the real functions of the ICEC'96 contest by Differential Evolution, Int. Conf. on Evolutionary Computation, Nagoya, Japan.

[3] Mitra, S.K. and Kaiser, J.F., Handbook for digital signal processing, John Wiley, 1993.
