bisc1.ps


  • 8/8/2019 bisc1.ps


    On the Usage of Differential Evolution for Function Optimization

    by Rainer Storn

    Siemens AG, ZFE T SN2, Otto-Hahn Ring 6, D-81739 Muenchen, Germany, currently on leave at ICSI, 1947 Center Street, Berkeley, CA 94704, [email protected]

    Abstract

    Differential Evolution (DE) has recently proven to be an efficient method for optimizing real-valued multi-modal objective functions. Besides its good convergence properties and suitability for parallelization, DE's main assets are its conceptual simplicity and ease of use. This paper describes several variants of DE and elaborates on the choice of DE's control parameters which corresponds to the application of fuzzy rules. Finally the design of a howling removal unit with DE is described to provide a real-world example for DE's applicability.

    1 Introduction

    Differential Evolution (DE) [1], [2] has proven to be a promising candidate for minimizing real-valued, multi-modal objective functions. Besides its good convergence properties DE is very simple to understand and to implement. DE is also particularly easy to work with, having only a few control variables which remain fixed throughout the entire minimization procedure.

    DE is a parallel direct search method which utilizes NP D-dimensional parameter vectors

    $x_{i,G}$, i = 0, 1, 2, ... , NP-1,    (1)

    as a population for each generation G, i.e. for each iteration of the minimization. NP doesn't change during the minimization process. The initial population is chosen randomly and should try to cover the entire parameter space uniformly. As a rule, a uniform probability distribution for all random decisions will be assumed unless otherwise stated.

    Basically, DE generates new parameter vectors by adding the weighted difference between two population vectors to a third vector. If the resulting vector yields a lower objective function value than a predetermined population member, the newly generated vector replaces the vector, with which it was compared, in the next generation; otherwise, the old vector is retained. This basic principle, however, is extended when it comes to the practical variants of DE. For example an existing vector can be perturbed by adding more than one weighted difference vector to it. In most cases, it is also worthwhile to mix the parameters of the old vector with those of the perturbed one before comparing the objective function values. Several variants of DE which have proven to be useful will be described in the following.

    2 Scheme DE/rand/1

    For each vector $x_{i,G}$, i = 0, 1, 2, ..., NP-1, a perturbed vector $v_{i,G+1}$ is generated according to

    $v_{i,G+1} = x_{r_1,G} + F \cdot (x_{r_2,G} - x_{r_3,G})$    (2)

    with $r_1, r_2, r_3 \in [0, NP-1]$, integer and mutually different, and F > 0.

    The randomly chosen integers r1, r2 and r3 are also chosen to be different from the running index i. F is a real and constant factor $\in [0, 2]$ which controls the amplification of the differential variation $(x_{r_2,G} - x_{r_3,G})$. Note that the vector $x_{r_1,G}$ which is perturbed to yield $v_{i,G+1}$ has no relation to $x_{i,G}$ but is a randomly chosen population member. Fig. 1 shows a two-dimensional example that illustrates the different vectors which play a part in the vector-generation scheme. The notation DE/rand/1 specifies that the vector to be perturbed is randomly chosen, and that the perturbation consists of one weighted difference vector.
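    As an illustration of eq. (2), the following minimal Python sketch generates one perturbed vector for scheme DE/rand/1. It is not taken from the paper: the function name mutate_rand_1, the list-of-lists population layout, the seed and the values chosen for D, NP and F are assumptions made only for this example.

```python
import random


def mutate_rand_1(population, i, f, rng):
    """Return the perturbed vector v_{i,G+1} of eq. (2) for running index i."""
    n_pop = len(population)
    if n_pop < 4:
        raise ValueError("DE/rand/1 needs NP >= 4 to pick r1, r2, r3 different from i")
    # r1, r2, r3 are mutually different and different from the running index i.
    candidates = [r for r in range(n_pop) if r != i]
    r1, r2, r3 = rng.sample(candidates, 3)
    x1, x2, x3 = population[r1], population[r2], population[r3]
    # v = x_r1 + F * (x_r2 - x_r3), evaluated component by component.
    return [a + f * (b - c) for a, b, c in zip(x1, x2, x3)]


rng = random.Random(0)   # fixed seed only to make the sketch repeatable
D, NP, F = 2, 8, 0.8     # illustrative values, not taken from the paper
population = [[rng.uniform(-5.0, 5.0) for _ in range(D)] for _ in range(NP)]
v = mutate_rand_1(population, i=0, f=F, rng=rng)
print(v)
```

    Note that the old vector with index i does not enter the computation at all, which matches the remark that the perturbed vector has no relation to it.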

    [Figure 1: contour lines of a two-dimensional objective function over the axes X0 and X1, showing the minimum, the NP parameter vectors from generation G (among them $x_{i,G}$, $x_{r_1,G}$, $x_{r_2,G}$ and $x_{r_3,G}$), the weighted difference $F(x_{r_2,G} - x_{r_3,G})$ and the newly generated parameter vector $v_{i,G+1} = x_{r_1,G} + F(x_{r_2,G} - x_{r_3,G})$.]

    Fig. 1: An example of a two-dimensional objective function showing its contour lines and the process for generating $v_{i,G+1}$ in scheme DE/rand/1.

    In order to increase the potential diversity of the perturbed parameter vectors, crossover is introduced. To this end, the vector

    $u_{i,G+1} = (u_{0i,G+1}, u_{1i,G+1}, \ldots, u_{(D-1)i,G+1})$    (3)

    with

    $u_{ji,G+1} = \begin{cases} v_{ji,G+1} & \text{for } j = \langle n \rangle_D, \langle n+1 \rangle_D, \ldots, \langle n+L-1 \rangle_D \\ x_{ji,G} & \text{for all other } j \in [0, D-1] \end{cases}$    (4)

    is formed. The acute brackets $\langle \, \rangle_D$ denote the modulo function with modulus D. The starting index, n, in (4) is a randomly chosen integer from the interval [0, D-1]. The integer L, which denotes the number of parameters that are going to be exchanged, is drawn from the interval [1, D]. The algorithm which determines L works according to the following lines of pseudo code where rand() is supposed to generate a random number $\in [0, 1)$:

        L = 0;
        do {
            L = L + 1;
        } while ((rand() < CR) AND (L < D));

    Hence the probability $Pr(L \geq \nu) = (CR)^{\nu - 1}$, $\nu > 0$. CR is taken from the interval [0, 1] and constitutes a control variable in the design process. The random decisions for both n and L are made anew for each newly generated vector $u_{i,G+1}$.

    To decide whether or not it should become a member of generation G+1, the new vector $u_{i,G+1}$ is compared to $x_{i,G}$. If vector $u_{i,G+1}$ yields a smaller objective function value than $x_{i,G}$, then $x_{i,G+1}$ is set to $u_{i,G+1}$; otherwise, the old value $x_{i,G}$ is retained.

    3 Scheme DE/best/1

    Basically, scheme DE/best/1 works the same way as DE/rand/1 except that it generates the vector $v_{i,G+1}$ according to:

    $v_{i,G+1} = x_{best,G} + F \cdot (x_{r_1,G} - x_{r_2,G})$.    (5)

    This time, the vector to be perturbed is the best performing vector of the current generation. Again, the computation of $u_{i,G+1}$ is defined by eq. (4). This will also be the case for the remaining variants.

    4 Scheme DE/best/2

    Scheme DE/best/2 uses two difference vectors as a perturbation:

    $v_{i,G+1} = x_{best,G} + F \cdot (x_{r_1,G} + x_{r_2,G} - x_{r_3,G} - x_{r_4,G})$.    (6)

    Due to the central limit theorem the random variation is shifted slightly in the Gaussian direction, which seems to be beneficial for many functions.
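    The steps of scheme DE/best/2 can be put together in a short Python sketch: mutation according to eq. (6), crossover according to eq. (4) with L drawn as in the pseudo code, and the comparison of the new vector against the old one. This is an illustrative sketch under assumptions, not the author's implementation: the function names, the sphere test function and the parameter values are hypothetical, and excluding the running index i from r1 to r4 is carried over from DE/rand/1 rather than stated for this scheme.

```python
import random


def draw_crossover_length(cr, dim, rng):
    """Draw L from [1, D] as in the pseudo code, so Pr(L >= v) = CR**(v - 1)."""
    length = 1
    while rng.random() < cr and length < dim:
        length += 1
    return length


def crossover(x, v, cr, rng):
    """Eq. (4): take L consecutive parameters (indices modulo D) from v, the rest from x."""
    dim = len(x)
    n = rng.randrange(dim)                        # starting index n in [0, D-1]
    length = draw_crossover_length(cr, dim, rng)
    u = list(x)
    for k in range(length):
        j = (n + k) % dim                         # <n + k>_D
        u[j] = v[j]
    return u


def de_best_2_generation(population, objective, f, cr, rng):
    """One generation of DE/best/2: eq. (6), eq. (4), then keep the better vector."""
    n_pop = len(population)
    if n_pop < 5:
        raise ValueError("this sketch needs NP >= 5 to pick r1..r4 different from i")
    costs = [objective(x) for x in population]
    best = population[min(range(n_pop), key=costs.__getitem__)]
    next_population = []
    for i, x in enumerate(population):
        others = [r for r in range(n_pop) if r != i]      # assumption: r1..r4 != i
        p1, p2, p3, p4 = (population[r] for r in rng.sample(others, 4))
        v = [b + f * (a1 + a2 - a3 - a4)
             for b, a1, a2, a3, a4 in zip(best, p1, p2, p3, p4)]
        u = crossover(x, v, cr, rng)
        # The new vector replaces the old one only if it is strictly better.
        next_population.append(u if objective(u) < costs[i] else x)
    return next_population


def sphere(x):
    """Hypothetical test objective: sum of squares with its minimum at the origin."""
    return sum(c * c for c in x)


rng = random.Random(42)
population = [[rng.uniform(-5.0, 5.0) for _ in range(3)] for _ in range(20)]
best_before = min(sphere(x) for x in population)
for _ in range(50):
    population = de_best_2_generation(population, sphere, f=0.5, cr=0.9, rng=rng)
best_after = min(sphere(x) for x in population)
print(best_before, best_after)
```

    Because an old vector is only replaced by a better one, the best objective function value in the population can never increase from one generation to the next in this sketch.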

    5 Scheme DE/rand-to-best/1

    Scheme DE/rand-to-best/1 places the perturbation at a location between a randomly chosen population member and the best population member:

    $v_{i,G+1} = x_{i,G} + \lambda \cdot (x_{best,G} - x_{i,G}) + F \cdot (x_{r_2,G} - x_{r_3,G})$.    (7)

    $\lambda$ controls the greediness of the scheme. To reduce the number of control variables we usually set $\lambda = F$.

    6 Rules for the usage of DE

    Since its invention [1], DE has been tested extensively against artificial and real-world minimization problems. So far, the following set of linguistic rules has emerged to be useful when it comes to choosing the control variables F, CR and NP:

    # At initialization the population should be spread as much as possible over the objective function surface.

    # Most often the crossover probability CR $\in [0, 1]$ must be considerably lower than one (e.g. 0.3). If no convergence can be achieved, however, CR $\in [0.8, 1]$ often helps.

    # For many applications NP = 10*D is a good choice. F is usually chosen $\in [0.5, 1]$.

    # The higher the population size NP is chosen, the lower one should choose the weighting factor F.

    # Watching the parameters: it is a good convergence sign if the parameters of the best population member change a lot from generation to generation, especially at the beginning of the minimization and even if the objective function value of the best population member decreases slowly.

    # Watching the objective function: it is not necessarily bad if the objective function value of the best population member exhibits plateaus during the minimization process. However, it is an indication that the minimization might take a long time or that an increase of the population size NP might be beneficial for convergence.

    # The objective function value of the best population member shouldn't drop too fast, otherwise the optimization might get stuck in a local minimum.

    # The proper choice of the objective function is crucial. The more knowledge one includes, the more likely the minimization is going to converge. The sum of error squares is not always a good choice as it has the potential to hide the path to the global minimum. To minimize the maximum error is often a better objective but seems to yield more local minima.

    7 Design of a howling remover

    In order to demonstrate DE's applicability to real-world problems a howling removal unit has been designed with DE. In modern audio communication applications hands-free environments are the current trend where headsets are replaced by loudspeakers and microphones. The preferred way of audio communication is full duplex, i.e. all loudspeakers and microphones are active as opposed to half duplex or "walkie talkie" mode where only one party is allowed to talk at a time. Howling is one of the problems in full duplex communication and builds up due to the acoustic feedback path. One way to reduce howling is to frequency-shift the signal that is picked up by a microphone by 10Hz to 20Hz before it is sent to the other communicating parties. This shift is usually not perceived as unnatural by the human ear. The shifted signal appears at the destination loudspeakers and travels back to the originator, shifted by another 10Hz to 20Hz. The signal travels many times through this acoustic path and is quickly shifted out of band, thus reducing the feedback problems. Fig. 2 shows the block diagram of the howling removal unit.

    [Figure 2: block diagram with input $x_k$, upsampler (factor 4), bandpass (BP), multiplication with a cosine signal, lowpass (LP), downsampler (factor 4) and output $y_k$.]

    Fig. 2: Howling removal unit.

    The upsampler fills in three zero samples between the adjacent signal samples $x_k$ and $x_{k+1}$. The bandpass, which operates at four times the sampling frequency A, retains the components of the spectrum which are supposed to be frequency-shifted by [...]. The actual shift is performed via multiplication with an appropriate cosine signal. The lowpass removes some artifacts which appear due to the shifting operation, and the downsampler takes every fourth sample of the lowpass result to yield the output signal $y_k$ at the original sampling frequency.
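    The upsampler and the downsampler at the two ends of the unit are simple enough to sketch directly. The Python fragment below is only an illustration with hypothetical function names; the bandpass, the cosine multiplication and the lowpass between the two stages are omitted, and three zeros are also appended after the last sample so that the output has exactly four times the input length.

```python
def upsample_by_4(samples):
    """Insert three zero samples after every input sample x_k."""
    out = []
    for x in samples:
        out.extend([x, 0.0, 0.0, 0.0])
    return out


def downsample_by_4(samples):
    """Keep every fourth sample, starting with the first one."""
    return samples[::4]


x = [0.5, -1.0, 0.25]
upsampled = upsample_by_4(x)
print(upsampled)
print(downsample_by_4(upsampled))
```

    Without any processing in between, the downsampler simply recovers the original samples, which is a convenient sanity check before the filters are added.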

    One of the most important features of the howling remover is low computational complexity as the unit has to operate in a real-time environment. Therefore it was crucial to design a bandpass as well as a lowpass with minimum degree. To this end the bandpass was chosen to be a recursive digital filter (IIR-filter) the transfer function of which had to meet specific magnitude constraints defined by a tolerance scheme. The lowpass was designed as a transversal filter (FIR-filter) without the usual linear phase requirement. Also the lowpass had to meet magnitude specifications defined by a tolerance scheme.

    The objective function in both cases was defined to be the maximum deviation from the corresponding tolerance scheme or to be one of the following penalty terms $p_k$, whichever value was greater:

    a) For all parameters par[i]:

    p1 = 20 000 + 100 ... par[i]   if par[i] ... 0 [...]

    [...] scheme the magnitude response of the corresponding filter was sampled in the frequency domain. The number of samples that were used is indicated in figs. 3 and 4 which also show the final results of the design procedure.

    [Figure 3: magnitude (0 to 1.2) over normalized frequency (0 to 0.5). Annotations: two stop bands (20 samples and 10 samples, limit 0.01 each), two transition bands (10 samples each) and one pass band (40 samples, limits 0.99 and 1.01).]

    Fig. 3: Magnitude response of the bandpass after the design process.

    [Figure 4: magnitude (0 to 1.2) over normalized frequency (0 to 0.5). Annotations: pass band, transition band, stop band; 40 samples, 10 samples, 20 samples; levels 1.005, 0.995, 0.01 and 0.001.]
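    One way to read the objective function described here is sketched below in Python. It is an interpretation, not the author's code: the deviation is assumed to be the amount by which a sampled magnitude lies outside its tolerance band (zero if all samples are inside), the penalty terms are passed in as plain numbers because their exact form is not given here, and all names and sample values are hypothetical.

```python
def tolerance_violation(magnitudes, lower, upper):
    """Largest amount by which any sampled magnitude leaves its band [lower, upper]."""
    worst = 0.0
    for m, lo, hi in zip(magnitudes, lower, upper):
        worst = max(worst, lo - m, m - hi)
    return worst


def objective(magnitudes, lower, upper, penalties=()):
    """Maximum deviation from the tolerance scheme or a penalty term, whichever is greater."""
    return max([tolerance_violation(magnitudes, lower, upper), *penalties])


# Four made-up frequency samples: two in a stop band, two in a pass band.
magnitudes = [0.005, 0.02, 1.0, 1.03]
lower = [0.0, 0.0, 0.99, 0.99]
upper = [0.01, 0.01, 1.01, 1.01]

value = objective(magnitudes, lower, upper)
value_with_penalty = objective(magnitudes, lower, upper, penalties=[5.0])
print(value, value_with_penalty)
```

    A minimizer that drives this value to zero has found a filter whose sampled magnitude response lies entirely inside the tolerance scheme, with no penalty term active.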


    [...] was set to 0.005. The magnitude response of the filter still violates some parts of the tolerance scheme slightly, yet the design was satisfactory for the howling remover.

    The lowpass filter result of fig. 4 was obtained using strategy DE/best/2 with NP=200, F=0.5 and CR=1. The entire design took 83,800 evaluations of the lowpass transfer function. A total of 16 parameters was used, 8 zero radii and 8 zero angles in the complex z-plane [3]. The overall gain constant a0 was set to 0.005.

    The third optimization for the howling remover was concerned with the cosine function, the evaluation of which takes up a non-negligible amount of computing time. Hence an approximation of cos(x) in the interval $[0, \pi/2]$ was performed using a polynomial opti(x) of third degree. Fig. 5 shows that opti(x) yields an improved approximation compared to the Taylor polynomial taylor(x) of third degree.

    [Figure 5: cos(x), taylor(x) and opti(x) plotted over x from 0 to 2.5 with values between -1 and 1, where taylor(x) = 1 - 0.5*x^2 and opti(x) = 0.9975575805 + 0.03400468081*x - 0.6044035554*x^2 + 0.1129638031*x^3.]

    Fig. 5: Approximation of cos(x) by means of a polynomial of third order.

    The optimization of the coefficients in opti(x) was performed by minimization of the sum of error squares obtained by 100 sampling points in the interval $x \in [0, \pi/2]$. The strategy used was DE/best/1 with NP=20, F=0.9 and CR=1. It took 30,020 function evaluations to get the result of fig. 5. The final speed increase of the cosine function computed by opti(x) was 17% compared to the library function cos(x).

    Conclusion

    Several variants of Differential Evolution (DE) have been introduced and general hints about their usage have been provided. Three real-world design tasks appearing in the development of a howling remover for audio communications have been solved successfully by applying DE. All three design tasks could have been performed with specialized design tools; the advantage of using DE, however, was that neither specialized and most probably expensive tools nor expert knowledge concerning the design tasks themselves was necessary.

    References

    [1] Storn, R. and Price, K., Differential Evolution - a simple and efficient adaptive scheme for global optimization over continuous spaces, Technical Report TR-95-012, ICSI, http://http.icsi.berkeley.edu/~storn/litera.html

    [2] Storn, R. and Price, K., Minimizing the real functions of the ICEC'96 contest by Differential Evolution, Int. Conf. on Evolutionary Computation, Nagoya, Japan.

    [3] Mitra, S.K. and Kaiser, J.F., Handbook for digital signal processing, John Wiley, 1993.
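    The coefficients of opti(x) quoted for Fig. 5 can be checked numerically. The Python sketch below compares opti(x) and the Taylor polynomial 1 - 0.5*x^2 against math.cos on 100 points in the interval from 0 to pi/2; the equal spacing of the sampling points is an assumption, since the text only gives their number.

```python
import math


def opti(x):
    """Third-degree polynomial with the coefficients quoted for Fig. 5."""
    return (0.9975575805 + 0.03400468081 * x
            - 0.6044035554 * x ** 2 + 0.1129638031 * x ** 3)


def taylor(x):
    """Taylor polynomial of cos(x) up to third degree (the cubic term is zero)."""
    return 1.0 - 0.5 * x * x


# Assumption: 100 equally spaced sampling points on [0, pi/2].
xs = [k * (math.pi / 2) / 99 for k in range(100)]
sse_opti = sum((opti(x) - math.cos(x)) ** 2 for x in xs)
sse_taylor = sum((taylor(x) - math.cos(x)) ** 2 for x in xs)
max_err_opti = max(abs(opti(x) - math.cos(x)) for x in xs)

print("sum of error squares, opti:  ", sse_opti)
print("sum of error squares, taylor:", sse_taylor)
print("largest absolute error, opti:", max_err_opti)
```

    On this grid the sum of error squares of opti(x) is far below that of the Taylor polynomial, whose error grows quickly towards the upper end of the interval.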