
The elitist non-homogeneous genetic algorithm: Almost sure convergence



Statistics and Probability Letters 83 (2013) 2179–2185

Contents lists available at ScienceDirect

Statistics and Probability Letters

journal homepage: www.elsevier.com/locate/stapro

The elitist non-homogeneous genetic algorithm: Almost sure convergence

J.A. Rojas Cruz, A.G.C. Pereira ∗

Departamento de Matemática, Universidade Federal do Rio Grande do Norte, 59072-970, Natal-RN, Brazil

Article info

Article history:
Received 17 February 2012
Received in revised form 21 May 2013
Accepted 22 May 2013
Available online 30 May 2013

MSC:
primary 60J05
60J10
secondary 65C40

Keywords:
Markov chains
Weak and strong ergodicity
Non-homogeneous
Genetic algorithms

Abstract

Evolutionary algorithms are used to search for optimal points of functions. One of these algorithms, the non-homogeneous genetic algorithm, uses in its dynamics two parameters, namely mutation and crossover probabilities, which are allowed to change throughout the algorithm's evolution. In this paper, we consider the elitist version of the non-homogeneous genetic algorithm and we prove its almost sure convergence to a population which has an optimum point in it.

© 2013 Elsevier B.V. All rights reserved.

1. Introduction

Canonical genetic algorithms (CGA), as introduced in Holland (1975), are often used to search for optimal points of functions; that is, if f : S → (0, ∞) is a function, these algorithms are used to find

argmax_{x∈Bi} f(x), or equivalently, max{f(b) : b ∈ Bi}   (1)

where Bi is the discretization of the domain S of the function f, and f is supposed to be a non-constant function, bounded over its domain. In optimization theory, an algorithm is said to converge to the global optimum if it generates a sequence of solutions or function values in which the global optimum is a limit value.

Markov chains offer an appropriate model for analyzing CGAs, and they have been used in Eiben et al. (1991) and Fogel (1992) to prove probabilistic convergence of the best solution within a population to the global optimum under elitist selection (the best individual survives with probability 1). In Rudolph (1994), the global convergence properties of the CGA were analyzed and a modified version of the CGA was presented in which the best element of the population is kept in an extra place but is not used in the evolutionary process. This element changes if, throughout the evolution of the algorithm, a better element appears in the population. In Cerf (1996, 1998), some operations were used in order to obtain an expression for the transition probabilities, and conditions for the convergence of the algorithm were then given. In the previously mentioned papers, the mutation and crossover probabilities were kept fixed throughout the evolution of the algorithm. Campos et al. (2013) proposed a modified version of the canonical genetic algorithm, the non-homogeneous genetic algorithm, in which the

∗ Corresponding author. Tel.: +55 8496069292.E-mail addresses: [email protected] (J.A. Rojas Cruz), [email protected], [email protected] (A.G.C. Pereira).

0167-7152/$ – see front matter © 2013 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.spl.2013.05.025


mutation and crossover probabilities were allowed to change throughout the evolution of the algorithm. Neither the canonical nor the non-homogeneous genetic algorithm converges almost surely to a population which has an optimum point in it. In Rudolph (1994) it is proved that the elitist homogeneous canonical genetic algorithm converges almost surely to a population which has an optimum point in it.

In this paper, we introduce a non-homogeneous version of the elitist homogeneous genetic algorithm and prove its almost sure convergence to a population which has an optimum point in it.

This paper is organized as follows. In Section 2, we present the definitions and results that we use in the rest of the paper. In Section 3, we introduce the elitist non-homogeneous genetic algorithm and prove some results that give sufficient conditions for guaranteeing the almost sure convergence of the algorithm. In Section 4, we present numerical results comparing the elitist homogeneous genetic algorithm with the non-homogeneous one.

2. Preliminaries

Let {Xn} be a non-homogeneous Markov chain with finite state space E = {1, . . . , N} and transition matrices {Pm}m≥0, where

Pm(i, j) = P(Xm+1 = j | Xm = i) = P^(m,m+1)(i, j),  i, j ∈ E.

By the properties of non-homogeneous Markov chains, the k-step transition is given by the product of the transition matrices Pm Pm+1 · · · Pm+k−1, for all m ≥ 0. Thus,

P^(m,m+k)(i, j) = Σ_{i1∈E,...,ik−1∈E} Pm(i, i1) Pm+1(i1, i2) · · · Pm+k−1(ik−1, j),

and we can write the Chapman–Kolmogorov equation as

P^(m,m+k) = P^(m,m+r) · P^(m+r,m+k),  1 ≤ r < k.
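As an illustration (our own, not part of the paper), the k-step products and the Chapman–Kolmogorov identity can be checked numerically on small random transition matrices; the helper names below are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_stochastic(n):
    """Return an n x n row-stochastic matrix."""
    m = rng.random((n, n))
    return m / m.sum(axis=1, keepdims=True)

def step_product(mats, m, k):
    """P^(m, m+k) = P_m P_{m+1} ... P_{m+k-1}."""
    out = np.eye(mats[0].shape[0])
    for t in range(m, m + k):
        out = out @ mats[t]
    return out

N = 4
P = [random_stochastic(N) for _ in range(10)]  # P_0, ..., P_9

# Chapman-Kolmogorov: P^(m, m+k) = P^(m, m+r) P^(m+r, m+k), 1 <= r < k
m, k, r = 2, 6, 3
lhs = step_product(P, m, k)
rhs = step_product(P, m, r) @ step_product(P, m + r, k - r)
assert np.allclose(lhs, rhs)
```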

Definition 2.1. A non-homogeneous Markov chain {Pn}n∈N is said to be weakly ergodic if it satisfies

lim_{k→∞} ∥µ0 P^(m,k) − µ1 P^(m,k)∥ = 0,  ∀m ≥ 0   (2)

where µ0 and µ1 are arbitrary probability distributions and ∥P∥ = sup_{i∈E} Σ_{j∈E} |Pij|. When the state space is finite, (2) is equivalent to

lim_{n→∞} |P^(m,m+n)(i, j) − P^(m,m+n)(k, j)| = 0,  ∀i, j, k ∈ E, ∀m ≥ 0.

In Isaacson and Madsen (1976), it is proved that (2) is equivalent to

lim_{k→∞} α(P^(m,k)) = 1 or lim_{k→∞} δ(P^(m,k)) = 0,  ∀m ≥ 0,

where, for a stochastic matrix Q = (qij)_{i,j∈E}, Dobrushin's ergodic coefficient is defined by

α(Q) = 1 − max_{i,k∈E} Σ_{j∈E} [qij − qkj]^+,  with [qij − qkj]^+ = max{0, qij − qkj},

and δ(Q) = 1 − α(Q) is called the delta coefficient. We can rewrite Dobrushin's ergodic coefficient as

α(P) = min_{i,k∈E} Σ_{j∈E} min(Pij, Pkj),

and the inequality

δ(PQ) ≤ δ(P) δ(Q)   (3)

holds for any stochastic matrices P and Q.
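A small numerical sketch (our own illustration, with hypothetical helper names) of Dobrushin's coefficient and the submultiplicativity of the delta coefficient in (3):

```python
import numpy as np

def alpha(Q):
    """Dobrushin's ergodic coefficient: min over row pairs of the overlap sum."""
    n = Q.shape[0]
    return min(np.minimum(Q[i], Q[k]).sum()
               for i in range(n) for k in range(n))

def delta(Q):
    """Delta coefficient: delta(Q) = 1 - alpha(Q)."""
    return 1.0 - alpha(Q)

rng = np.random.default_rng(1)
P = rng.random((5, 5)); P /= P.sum(axis=1, keepdims=True)
Q = rng.random((5, 5)); Q /= Q.sum(axis=1, keepdims=True)

# Inequality (3): delta(PQ) <= delta(P) * delta(Q)
assert delta(P @ Q) <= delta(P) * delta(Q) + 1e-12
```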

3. The elitist non-homogeneous genetic algorithm

We follow the same genetic algorithm structure as was described in Rudolph (1994), namely: ‘‘A genetic algorithm consists of an N-tuple of binary strings bi of length l, where the bits of each string are considered to be the genes of an individual chromosome and where the N-tuple of individual chromosomes is said to be a population. The algorithm can be sketched as follows:

Choose randomly an initial population
Repeat


  perform selection
  perform crossover
  perform mutation
until some stopping criterion applies.

When using so-called proportional selection, the population of the next generation is determined by N independent random experiments. The probability that individual bi is selected from the N-tuple (b1, b2, . . . , bN) to be a member of the next generation at each experiment is given by

P(bi is selected) = f(bi) / Σ_{j=1}^{N} f(bj),

where f is a function to be maximized which satisfies 0 < f(x) < ∞. Mutation operates independently on each individual by probabilistically perturbing each bit string. The event that the jth bit of the ith individual is flipped is stochastically independent and occurs with probability pm ∈ (0, 1). Therefore the probability that string bi resembles string b′i after mutation can be aggregated to

P(bi → b′i) = pm^{H(bi, b′i)} (1 − pm)^{l − H(bi, b′i)}

where H(bi, b′i) is the Hamming distance between strings bi and b′i, and l is the length of bi as well as of b′i. Usually, the crossover operator is applied with some probability pc ∈ [0, 1] in order to construct a bit string from at least two other bit strings chosen at random.’’

In Rudolph (1994), the CGA is modeled by a Markov chain whose transition matrix P, which represents the algorithm, is the product of three stochastic matrices S, C and M, which represent the selection, crossover and mutation steps respectively. Thus, the transition matrix is given by P = CMS, where C, M and S are kept fixed throughout the evolution of the algorithm. In Campos et al. (2013), the mutation and crossover probabilities are allowed to change over time, modeling the evolution of the algorithm by a non-homogeneous Markov chain with Pn = S Cn Mn. In that paper the asymptotic behavior (strong and weak ergodicity) of the chain associated with the algorithm is studied.
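The mutation transition probability above can be checked directly: summing pm^{H(b,b′)} (1 − pm)^{l−H(b,b′)} over all strings b′ of length l gives 1, so mutation defines a valid stochastic kernel (a small sketch of our own; the function names are placeholders):

```python
from itertools import product

def hamming(a, b):
    """Hamming distance between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))

def mutation_prob(b, b2, pm):
    """P(b -> b2) = pm^H(b,b2) * (1 - pm)^(l - H(b,b2))."""
    h = hamming(b, b2)
    return pm ** h * (1 - pm) ** (len(b) - h)

l, pm = 6, 0.05
b = '010011'
total = sum(mutation_prob(b, ''.join(b2), pm)
            for b2 in product('01', repeat=l))
assert abs(total - 1.0) < 1e-12   # the mutation kernel sums to 1
```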

So we can summarize the evolution of the elitist non-homogeneous genetic algorithm in the following sketch:

Choose randomly an initial population having N elements and create one more position, the (N + 1)th entry of the population vector, which will keep the best element among the N previous elements.
Repeat
  perform selection with the first N elements
  perform crossover with the first N elements
  perform mutation with the first N elements
  If the best element of this new population is better than the one in the (N + 1)th position, replace the (N + 1)th position with this better element; otherwise, keep the (N + 1)th position unchanged
  perform the pc and pm changes, as previously planned
until some stopping criterion applies.

So we can see that the last position of the population vector does not take part in the selection, crossover and mutation steps. Hence, if we guarantee that the evolution of the algorithm hits the set of all population vectors in which one of the entries is the optimum point of the function, this point will stay in the population vector forever. The following theorems give us sufficient conditions guaranteeing that the evolution of the non-homogeneous genetic algorithm hits that set in a finite number of steps.
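The sketch above can be written out as a short prototype (our own illustration; the fitness function, bit length and schedule are placeholder assumptions, with the pm schedule borrowed from Section 4):

```python
import random

def select(pop, f):
    """Proportional selection: N independent fitness-weighted draws."""
    return random.choices(pop, weights=[f(b) for b in pop], k=len(pop))

def crossover(pop, pc):
    """Pick individuals with probability pc, pair them, one-point crossover."""
    idx = [i for i in range(len(pop)) if random.random() < pc]
    for a, b in zip(idx[::2], idx[1::2]):
        k = random.randint(1, len(pop[a]))
        pop[a], pop[b] = pop[a][:k] + pop[b][k:], pop[b][:k] + pop[a][k:]
    return pop

def mutate(b, pm):
    """Flip each bit independently with probability pm."""
    return ''.join(('1' if c == '0' else '0') if random.random() < pm else c
                   for c in b)

def elitist_nhga(f, n_bits=8, pop_size=5, steps=1000):
    """Elitist non-homogeneous GA: the (N+1)th slot stores the best-so-far."""
    pop = [''.join(random.choice('01') for _ in range(n_bits))
           for _ in range(pop_size)]
    best = max(pop, key=f)                       # the (N+1)th position
    for n in range(steps):
        pm = max(0.05 - 0.000049 * n, 0.001)     # decreasing mutation schedule
        pop = select(pop, f)
        pop = crossover(pop, 0.5)
        pop = [mutate(b, pm) for b in pop]
        cand = max(pop, key=f)
        if f(cand) > f(best):                    # elitist update only on improvement
            best = cand
    return best

random.seed(42)
best = elitist_nhga(lambda s: 1 + s.count('1'))  # strictly positive fitness
```

With the ones-counting fitness above, the returned elite string is, with high probability, the all-ones string; the elite slot never worsens by construction.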

Very simple probabilistic arguments are used to prove the following results.

Theorem 1. Let {Xn}n∈N be a Markov chain with state space E. Suppose there is a nonempty subset E* ⊂ E, a natural number n0 ∈ N and a sequence of non-negative real numbers {δk}k∈N such that

min_{i∈E, j∈E*} P^((k−1)n0, kn0)(i, j) ≥ δk   (4)

and

Σ_{k≥1} δk = ∞.   (5)

Then E* is visited infinitely often with probability 1 and the Markov chain is weakly ergodic.

Proof. Let H = Σ_{n≥0} I(Xn ∈ E*) be the number of times the chain reaches E*. Observe that we have the following relationship:

(H < ∞) ⊂ ∪_{n≥1} An   (6)


where An = (Xm ∉ E*, ∀m ≥ n). So, in order to prove that P(H < ∞) = 0 it is sufficient to show that P(An) = 0 for all n ∈ N. Considering B1 = (X_{n n0} ∉ E*), B2 = (X_{(n+1) n0} ∉ E*), . . . , B_{k+1} = (X_{(n+k) n0} ∉ E*), it follows that An ⊂ (B1, B2, . . . , B_{k+1}), ∀k ∈ N. From the Markov property and P(B1) ≤ 1, we obtain

P(B1, B2, . . . , B_{k+1}) ≤ P(B_{k+1} | B_k) · · · P(B2 | B1).

Let us prove that

P(B2 | B1) ≤ 1 − δ_{n+1}.   (7)

Let B̄k denote the complement of the set Bk. Thus,

P(B̄2 | B1) = Σ_{i∈E*} P(X_{(n+1) n0} = i | B1)

and

P(X_{(n+1) n0} = i | B1) = (1 / P(B1)) Σ_{j∉E*} P(X_{(n+1) n0} = i | X_{n n0} = j) P(X_{n n0} = j).

From (4) we have P(X_{(n+1) n0} = i | X_{n n0} = j) ≥ δ_{n+1} for i ∈ E*, j ∈ E; hence

P(B̄2 | B1) ≥ δ_{n+1}

or

P(B2 | B1) ≤ 1 − δ_{n+1}.

By a similar argument, we obtain P(B3 | B2) ≤ 1 − δ_{n+2}, P(B4 | B3) ≤ 1 − δ_{n+3}, . . . , P(B_{k+1} | B_k) ≤ 1 − δ_{n+k}. It follows from the previous inequalities that

P(B1, B2, . . . , B_{k+1}) ≤ (1 − δ_{n+1})(1 − δ_{n+2}) · · · (1 − δ_{n+k}).

Note that the previous inequality holds for all k ∈ N. From (5), Σ_{k≥1} δ_{n+k} = ∞, so the product on the right-hand side tends to 0 as k → ∞; it follows that P(An) = 0, and therefore P(H < ∞) = 0, or equivalently, P(H = ∞) = 1.
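The last step uses the standard fact that a divergent series Σ δk forces Π(1 − δk) → 0; a quick numerical sanity check (our own illustration) contrasting a divergent and a convergent choice of δk:

```python
import math

# delta_k = 1/k: the sum diverges (harmonic), so prod(1 - delta_k) -> 0
deltas = [1.0 / k for k in range(2, 100001)]
prod = math.exp(sum(math.log1p(-d) for d in deltas))
assert prod < 1e-4   # the product collapses toward 0

# contrast: delta_k = 1/k^2 has a finite sum, so the product stays positive
deltas2 = [1.0 / k**2 for k in range(2, 100001)]
prod2 = math.exp(sum(math.log1p(-d) for d in deltas2))
assert prod2 > 0.4   # bounded away from 0
```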

Now we verify the weak ergodicity of the Markov chain. Observe the following inequality:

α(P) = min_{i,k∈E} Σ_{j∈E} min(Pij, Pkj) ≥ min_{i∈E} P_{i j0},

where j0 is any element of E. From the hypothesis, it follows that

α(P^((k−1)n0, kn0)) ≥ δk

and thus

δ(P^((k−1)n0, kn0)) ≤ 1 − δk.

From the inequality (3) and from (5) it follows that

lim_{k→∞} δ(P^(m,k)) = 0,  ∀m ≥ 0. □

The next theorem states that the time of the first visit to E* is finite with probability 1; it is obtained as a corollary of the last theorem, since (T = ∞) ⊂ (E* is never visited).

Theorem 2. Under the same hypotheses as Theorem 1,

P(T < ∞) = 1   (8)

where T = min{n ∈ N : Xn ∈ E*}.

The elitist non-homogeneous genetic algorithm is a combination of the method explained in Rudolph (1994) and the algorithm proposed in Campos et al. (2013). The part due to Rudolph (1994) is the creation of a new position in the population vector to keep the best individual; this position does not participate in the evolutionary process of the traditional algorithm and is changed only if, as the algorithm evolves, a better chromosome appears in the population. The part due to Campos et al. (2013) is that, while the algorithm evolves, the mutation and crossover probabilities are allowed to change.

As we can see, once the algorithm visits the optimum subset

E* = {(x1, x2, . . . , xN, xN+1) ∈ E : f(xN+1) = max_{y∈Bi} f(y)},

the optimum point never leaves the population. So E* is a kind of closed set; we have seen that the algorithm visits E* with probability 1, and this happens in a finite time with probability 1; hence conditions (4) and (5) do not hold for


Table 1
H3 function parameters.

i   aij          ci    pij
1   3    10  30  1     0.3689   0.117    0.2673
2   0.1  10  35  1.2   0.4699   0.4387   0.747
3   3    10  30  3     0.1091   0.8732   0.5547
4   0.1  10  35  3.2   0.03815  0.5743   0.8828

the complement of E*. Using the results in Campos et al. (2013), it is easy to show that for the non-homogeneous genetic algorithm there is a sequence {αn} such that

inf_{i∈E, j∈E*} Pn(i, j) ≥ αn.

So we can state:

Corollary 3. Let {Xn}n∈N be the Markov chain which models the elitist non-homogeneous genetic algorithm; if the sequence above is such that Σ_{k≥1} αk = ∞, then

P(lim_{n→∞} Xn ∈ E*) = 1.   (9)

A simpler condition to check in simulations, which guarantees the above result, is:

Corollary 4. Let {Xn}n∈N be the Markov chain which models the elitist non-homogeneous genetic algorithm; if the mutation probabilities {pm(n)}n∈N are such that pm(n) > γ > 0 for all n ∈ N, or Σ_n (pm(n))^l = ∞, then (9) holds.

4. Numerical results

In this section we present numerical results obtained from the simulations that we developed. It is worth emphasizing that, in the non-homogeneous genetic algorithm, the mutation probability should, at the beginning, be bigger than in the canonical genetic algorithm, to allow the algorithm to expand its search space. As the algorithm evolves, this probability should gradually get smaller. This procedure is illustrated in the following examples. We used test functions from Dixon and Szegö (1978), Allufi-Pentini et al. (1985) and Michalewicz (1992). Three functions were used in the simulations that follow, namely:

(a) A function from the Hartman family, given by

f(x) = − Σ_{i=1}^{m} ci exp(− Σ_{j=1}^{n} aij (xj − pij)²),

where Ω = {x ∈ R^n : 0 ≤ xi ≤ 1, 1 ≤ i ≤ n}. In this simulation we used the function known as H3, for which n = 3 and m = 4. The function parameters are presented in Table 1.

(b) Function P8 (Allufi-Pentini et al., 1985), given by

f(x) = (π/n) { k1 sin²(π y1) + Σ_{i=1}^{n−1} (yi − k2)² [1 + k1 sin²(π y_{i+1})] + (yn − k2)² } + Σ_{i=1}^{n} u(xi, 10, 100, 4),

where yi = 1 + (xi − 1)/4, k1 = 10, k2 = 1 and u is a penalty function given by

u(xi, a, k, m) =
  k(xi − a)^m,    xi > a
  0,              −a ≤ xi ≤ a
  k(−xi − a)^m,   xi < −a

where Ω = {x ∈ R³ : −10 ≤ xi ≤ 10, i = 1, 2, 3}.

(c) f : [−2, 5] × [−2, 5] → R defined by f(x, y) = 6 − 3 cos(2πx) − 3 cos(2πy), proposed in Michalewicz (1992) as a challenge.
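The three test functions can be coded directly from the formulas above (a sketch of our own; the H3 parameter arrays are transcribed from Table 1, and the function names are placeholders):

```python
import math

# (a) Hartman H3: n = 3, m = 4, parameters from Table 1
A = [[3, 10, 30], [0.1, 10, 35], [3, 10, 30], [0.1, 10, 35]]
C = [1, 1.2, 3, 3.2]
P = [[0.3689, 0.117, 0.2673], [0.4699, 0.4387, 0.747],
     [0.1091, 0.8732, 0.5547], [0.03815, 0.5743, 0.8828]]

def h3(x):
    return -sum(C[i] * math.exp(-sum(A[i][j] * (x[j] - P[i][j]) ** 2
                                     for j in range(3)))
                for i in range(4))

# (b) P8 with k1 = 10, k2 = 1 and penalty u(x, 10, 100, 4)
def u(x, a, k, m):
    if x > a:
        return k * (x - a) ** m
    if x < -a:
        return k * (-x - a) ** m
    return 0.0

def p8(x):
    n, k1, k2 = len(x), 10.0, 1.0
    y = [1 + (xi - 1) / 4 for xi in x]
    s = k1 * math.sin(math.pi * y[0]) ** 2
    s += sum((y[i] - k2) ** 2 * (1 + k1 * math.sin(math.pi * y[i + 1]) ** 2)
             for i in range(n - 1))
    s += (y[n - 1] - k2) ** 2
    return (math.pi / n) * s + sum(u(xi, 10, 100, 4) for xi in x)

# (c) Michalewicz's challenge function
def f_c(x, y):
    return 6 - 3 * math.cos(2 * math.pi * x) - 3 * math.cos(2 * math.pi * y)
```

Note that P8 vanishes at x = (1, 1, 1) and f_c attains its maximum 12 at half-integer coordinates, which gives quick sanity checks for an implementation.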

In each simulation we proceeded observing the following rules. In all simulations we used a fixed crossover probability pc = 0.5, and the stopping criterion was the number of steps, namely 1000. Each interval was split into 2^8 equally sized parts, and the known optimum point was in the partition, at or very near to one point of the proposed partition. We ran each algorithm 100 times; we observed, in each repetition, in which step the algorithm reached the subset E*; we summed the totals and plotted those results in graphics where on the x axis we read the steps of the algorithm


Fig. 1. Function 1 and a five-element population size.

Fig. 2. Function 1 and a twenty-element population size.

Fig. 3. Function 2 and a five-element population size.

and on the y axis we show the number of trajectories which reach E* in x steps of the algorithm. For each function we have plotted two graphics, for populations of five and twenty elements (see Figs. 1–6).

In the crossover step, we select the pairs for exchanging genetic material in the following way. First, select with probability pc the elements of the population which are going to take part in the crossover process. After that, pairs are formed from the selected elements: the first and second selected elements, the third and fourth, and so on, until it is no longer possible to form another pair. From each pair, called the parents, another pair, called the children, is obtained by the following procedure. Remember that each element of the population is a binary string of length l. A number k is generated from a discrete uniform distribution on {1, 2, . . . , l}; the binary entries of the parents before the kth position are maintained, and the binary entries of the parents after the kth position are exchanged between them. These changes result in two new elements, the children. This procedure is repeated for all pairs formed as before.
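The pairing and one-point crossover just described can be sketched as follows (our own illustration; the helper names are placeholders):

```python
import random

def one_point_crossover(parent1, parent2):
    """Cut point k ~ uniform on {1, ..., l}; keep bits before k, swap the rest."""
    l = len(parent1)
    k = random.randint(1, l)
    child1 = parent1[:k] + parent2[k:]
    child2 = parent2[:k] + parent1[k:]
    return child1, child2

def crossover_step(pop, pc):
    """Select individuals with probability pc, pair them in order, cross each pair."""
    idx = [i for i in range(len(pop)) if random.random() < pc]
    for a, b in zip(idx[::2], idx[1::2]):   # a leftover unpaired element is skipped
        pop[a], pop[b] = one_point_crossover(pop[a], pop[b])
    return pop

random.seed(3)
c1, c2 = one_point_crossover('0000', '1111')
```

Since crossover only rearranges bits between the parents, the total number of ones across the pair is conserved, which is an easy property to test.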

The continuous graphics represent the algorithm's outcomes for a fixed mutation probability. We chose a value which is commonly used in practice, namely pm = 0.01. In the non-homogeneous algorithm we can use as pm(n) any function such that conditions (4) and (5) are satisfied. We think the natural thing to do is the following. At the beginning we allow the algorithm to search among as many different points as possible, making the mutation probability high. After a while, we let the algorithm narrow its search area, lowering the mutation probability as time goes by. Functions which are bounded from below by a positive real number make conditions (4) and (5) hold. In the following simulations we use pm(n) = 0.05 − 0.000049n, where n is the number of iterations that the algorithm has undergone. We limit the run of the algorithm to 1000 iterations, so pm(1000) ≥ 0.001. If you want to let the algorithm run indefinitely, use a pm(n) which is decreasing and approaches a positive asymptote; such a function guarantees that conditions (4) and (5) are satisfied.
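The linear schedule above, and a decreasing schedule with a positive asymptote for indefinite runs, can be sketched as follows (our own check; the asymptotic schedule is a placeholder example, not one from the paper):

```python
def pm_schedule(n):
    """Linear decay used in the simulations: pm(0) = 0.05, pm(1000) = 0.001."""
    return 0.05 - 0.000049 * n

def pm_asymptotic(n, gamma=0.001):
    """Decreasing schedule approaching the positive asymptote gamma."""
    return gamma + 0.049 / (1 + n)

# endpoints of the linear schedule
assert abs(pm_schedule(0) - 0.05) < 1e-12
assert abs(pm_schedule(1000) - 0.001) < 1e-9

# bounded below by gamma > 0, hence conditions (4) and (5) hold
assert all(pm_asymptotic(n) > 0.001 for n in range(10000))
```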

5. Conclusions

In the examples above, we can clearly see that E* was more often visited when the non-homogeneous genetic algorithm was used. Even in those realizations in which the numbers of visits to E* were close to each other, the number of realizations needed to reach 100% success was attained first by the algorithm just mentioned. We can observe that the bigger the population size, the closer the results are; but we recall that the bigger the population, the bigger the computational effort. So far we have worked with a fixed crossover probability; in the future we intend to perform some tests where both


Fig. 4. Function 2 and a twenty-element population size.

Fig. 5. Function 3 and a five-element population size.

Fig. 6. Function 3 and a twenty-element population size.

pm and pc change throughout the algorithm's evolution. It is easy to run the algorithm in such a way that the hypotheses of the theorems above are satisfied; for example, if pm(n) > p > 0, ∀n ∈ N, then all the hypotheses are verified.

Acknowledgment

The authors were partially supported by PROCAD/CAPES.

References

Allufi-Pentini, F., Parisi, V., Zirilli, F., 1985. Global optimization and stochastic differential equations. Journal of Optimization Theory and Applications 47, 1–16.
Campos, V.E.M., Pereira, A.G.C., Rojas Cruz, J.A., 2013. Modeling the genetic algorithm by a non-homogeneous Markov chain: weak and strong ergodicity. Theory of Probability and its Applications 57 (1), 144–151.
Cerf, R., 1996. A new genetic algorithm. The Annals of Applied Probability 6 (3), 778–817.
Cerf, R., 1998. Asymptotic convergence of genetic algorithms. Advances in Applied Probability 30, 521–550.
Dixon, L.C.W., Szegö, G.P., 1978. The global optimization problem: an introduction. In: Dixon, L.C.W., Szegö, G.P. (Eds.), Towards Global Optimization, vol. 2. North-Holland, Amsterdam, pp. 1–15.
Eiben, A.E., Aarts, E.H.L., Van Hee, K.M., 1991. Global convergence of genetic algorithms: a Markov chain analysis. In: Schwefel, H.P., Männer, R. (Eds.), Parallel Problem Solving from Nature. Springer, Berlin, Heidelberg, pp. 4–12.
Fogel, D.B., 1992. Evolving artificial intelligence. Ph.D. dissertation, University of California, San Diego.
Holland, J.H., 1975. Adaptation in Natural and Artificial Systems. The University of Michigan Press, Ann Arbor.
Isaacson, R.L., Madsen, R.W., 1976. Markov Chains: Theory and Applications. Wiley, New York.
Michalewicz, Z., 1992. Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag, New York.
Rudolph, G., 1994. Convergence analysis of canonical genetic algorithms. IEEE Transactions on Neural Networks 5, 96–101.