
Progress In Electromagnetics Research, PIER 75, 171–207, 2007

PHYSICAL THEORY FOR PARTICLE SWARM OPTIMIZATION

S. M. Mikki and A. A. Kishk

Center of Applied Electromagnetic Systems Research
Department of Electrical Engineering
University of Mississippi
University, MS 38677, USA

Abstract—We propose an inter-disciplinary approach to particle swarm optimization (PSO) by establishing a molecular dynamics (MD) formulation of the algorithm, leading to a physical theory for the swarm environment. The physical theory provides new insights into the operational mechanism of the PSO method. In particular, a thermodynamic analysis, based on the MD formulation, is introduced to provide a deeper understanding of the convergence behavior of the basic classical PSO algorithm. The thermodynamic theory is used to propose a new acceleration technique for the PSO. This technique is applied to the problem of synthesis of linear array antennas, and very good improvement in the convergence performance is observed. A macroscopic study of the PSO is conducted by formulating a diffusion model for the swarm environment. Einstein's diffusion equation is solved for the corresponding probability density function (pdf) of the particle trajectory. The diffusion model for the classical PSO is used, in conjunction with Schrödinger's equation for the quantum PSO, to propose a generalized version of the PSO algorithm based on the theory of Markov chains. This unifies the two versions of the PSO, classical and quantum, by eliminating the velocity and introducing position-only update equations based on the probability law of the method.

1. INTRODUCTION

There is a close connection between the evolution of the dynamical variables in physical systems and optimization. It has been known for two hundred years that the law of motion in Newtonian mechanics can be obtained by minimizing a certain functional called the “action.”


Moreover, with the invention of the path integral formalism by Richard Feynman in the last century, we now know that quantum phenomena can also be described by the very same approach [1]. It turns out that most of the known physical processes can be intimately related to some sort of “critical” or deeper variational problem: the optimum of this problem leads to the equations of motion in an elegant and neat way.

Although this has been known for a long time, optimization is still considered in physics, applied mathematics, and engineering as a tool used to solve difficult practical or theoretical problems. For example, the use of optimization codes in engineering problems is mainly due to our inability to find a desired solution in reasonable time; optimization methods then provide a faster “search” for the best performance of the system of interest. However, optimization and physical systems are really two different ways to describe the same thing. If a particle moves along the trajectory that minimizes the action of the problem, then the search for the optimum of the functional (objective function) is equivalent to finding the equations of motion of the particle. Our point of view is that if a more formal analogy between physical systems and optimization algorithms can be constructed at the fundamental level, then deeper insights into both theoretical physics and the algorithm itself can be obtained.

The main purpose of this paper is twofold. First, we take the particle swarm optimization (PSO) algorithm as a case study, or a toy model, to study how a Lagrangian formulation of this global optimization strategy can be constructed. This will motivate some of the tuning parameters that were introduced by the heuristic optimization community based on an intuition that is not directly related to the physical nature of the problem. Also, the Lagrangian formalism will reveal new hidden aspects of the algorithm. For example, we will show that the basic PSO algorithm lacks any “electromagnetic” nature: particles are not electrically charged and hence do not radiate. However, the general formalism can provide some insight into how to add this electromagnetic behavior to the algorithm in future studies. The most interesting insight, however, is the possibility of viewing both the classical and quantum versions of the PSO algorithm as different manifestations of a single underlying Markov process, a view that is currently being revived in theoretical physics.

Second, this work aims to introduce new inter-disciplinary perspectives for the artificial intelligence (AI) and evolutionary computing (EC) communities. While EC methods, generally referred to as heuristic, work very well with complicated problems [2–5], little is still known about the fundamental mechanisms responsible for the satisfactory performance of these techniques. Although the social intelligence point of view has been emphasized considerably in the PSO literature, the treatment so far remains conventional in the sense of working with interpretations based on the previously established literature of the GA [7]. The main thrust behind EC and AI strategies is inspiration by nature: in the case of AI, nature is the human mind, while in EC it is the evolutionary paradigm. Therefore, it seems plausible to continue following this original impulse by allowing for further analogies inspired by similarities with other, non-biological natural phenomena. It is the hope of the authors that the utilization of concepts from outside the traditional AI literature may illuminate new routes in studying the foundations of EC methods.

In this paper, we introduce a new view of the PSO algorithm that is based on physics, rather than artificial intelligence (AI) or evolutionary computing (EC). The purpose of this analogy is to further enhance the understanding of how the algorithm works and to provide new tools for the study of the dynamics of the method. The basic idea is to observe the close similarity between a swarm of particles, communicating with each other through individual and social knowledge, and a collection of material particles interacting through a classical Newtonian field. In particular, molecular dynamics (MD) will be studied in connection with the PSO environment. Various relations and analogies will be established and illustrated throughout this article.

Particle swarm optimization (PSO) is a global search strategy that can handle difficult optimization problems. Since its introduction in 1995 [6], the PSO algorithm has been refined and extended [3–5] to enhance its performance in many practical problems, including engineering electromagnetics [10–12]. The method has proved to work successfully with standard test functions and practical engineering problems. It has been claimed that the PSO represents a new trend in artificial intelligence and soft computing, in which intelligence appears as an emergent behavior arising from the social dynamic interactions between swarm members [7]. To date, the work in [7] is still the only major study providing a comprehensive and in-depth analysis of why and how the algorithm works. However, the main theme introduced there is the evolutionary perspective and the new paradigm of social psychology, in which individual behavior is understood through the collective interaction level. Little has been done to investigate the possible connections between the PSO algorithm, in its social intelligence context, and fundamental physics, such as Newtonian mechanics and quantum theory. Of special importance is the recent work on quantum versions of the PSO, in which the laws of physics, Schrödinger's equation in this case, were employed directly in the formulation [13–15]. While the incorporation of physics in the quantum version was self-evident and fundamental, the situation with the classical PSO algorithm looks very different. Indeed, the laws of physics were not introduced as indispensable means to understand how the algorithm works. Provided that it is still possible to keep the social-knowledge point of view in the picture, an analysis of the problem inspired by fundamental physics can reveal some hidden aspects and open the door for new perspectives on the foundations of evolutionary computing.

This paper is organized as follows. First, we review the basic procedure of the classical PSO in order to prepare for the mathematical treatment of the following sections. Second, a formal analogy between the PSO and MD is established by deriving the corresponding equivalent mechanical force acting on a particle in the PSO; the resulting equations are discretized to produce the equations of motion. Third, based on this analogy, the thermodynamic performance of the method is studied by defining effective macroscopic dynamic observables, like the swarm temperature, and then investigating their behavior. Fourth, a new acceleration technique is proposed based on the preceding thermodynamic analysis; the new method is applied to the problem of synthesis of linear array antennas. Fifth, a fully macroscopic formulation of the classical PSO is proposed by constructing a diffusion model for the particle swarm environment. Sixth, based on the knowledge of the pdf of the particle trajectory, a generalized version of the PSO method is proposed based on the theory of Markov chains. Finally, conclusions and suggestions for future work are given.

2. REVIEW OF THE PSO ALGORITHM

We start with an N-dimensional vector space R^N. A population of M particles is assumed to evolve in this space such that each particle is assigned the following position and velocity vectors, respectively:

    r^i(t) = [\, r_1(t) \; r_2(t) \; \cdots \; r_N(t) \,]^T    (1)

    v^i(t) = [\, v_1(t) \; v_2(t) \; \cdots \; v_N(t) \,]^T    (2)

where T is the transpose operator and i \in \{1, 2, ..., M\}.

In addition to these dynamic quantities, we postulate two memory locations encoded in the variables p_n^{i,L}, the local best of the ith particle, and p_n^g, the global best, both for the nth dimension. The basic idea of the classical PSO algorithm is the clever exchange of information about the global and local best values. This exchange is accomplished by the following two equations:

    v_n^i(t + \Delta t) = w\, v_n^i(t) + c_1 \varphi_1 \left[ p_n^{i,L} - x_n^i(t) \right] \Delta t + c_2 \varphi_2 \left[ p_n^g - x_n^i(t) \right] \Delta t    (3)

    r_n^i(t + \Delta t) = r_n^i(t) + \Delta t\, v_n^i(t)    (4)

where n \in \{1, 2, ..., N\}; \Delta t is the time step; c_1 and c_2 are the cognitive and social factors, respectively; \varphi_1 and \varphi_2 are two statistically independent random variables uniformly distributed between 0 and 1; and w is the inertia factor. More in-depth details on the PSO method can be found in [7–9].

It has been shown in [8] that, in order for the basic PSO algorithm (w = 1) to converge, all particles must approach the location P^i given by

    P_n^i = \frac{1}{\varphi_1 + \varphi_2} \left( \varphi_1 p_n^{i,L} + \varphi_2 p_n^g \right)    (5)

It is understood in equations (3)–(5) that for each dimension n the random number generators of \varphi_1 and \varphi_2 are initialized with different seeds. The convergence of the PSO algorithm to the location (5) can be guaranteed by proper tuning of the cognitive and social parameters of the algorithm [8]. To prevent explosion of the particles in the classical PSO, a maximum velocity V_max is introduced in each dimension to confine swarm members inside the boundary walls of the domain of interest. Also, the inertia factor w is decreased linearly during the algorithm run in order to enhance the convergence performance [7].
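As a concrete illustration, the update rules (3) and (4) can be sketched in a few lines. The snippet below advances one particle for one time step; the parameter values (w = 0.7, c1 = c2 = 1.5, Δt = 1, V_max = 0.5) and the clamping style are illustrative assumptions, not values prescribed in this paper:

```python
import random

def pso_step(r, v, p_local, p_global, w=0.7, c1=1.5, c2=1.5, dt=1.0, v_max=0.5):
    """One classical PSO update for one particle, per equations (3)-(4).

    r, v, p_local : lists of length N (position, velocity, local best)
    p_global      : list of length N (global best)
    """
    N = len(r)
    for n in range(N):
        # Fresh U(0,1) draws for every dimension, as required below (3)-(5).
        phi1, phi2 = random.random(), random.random()
        v[n] = (w * v[n]
                + c1 * phi1 * (p_local[n] - r[n]) * dt
                + c2 * phi2 * (p_global[n] - r[n]) * dt)   # equation (3)
        v[n] = max(-v_max, min(v_max, v[n]))               # V_max clamping
        r[n] = r[n] + dt * v[n]                            # equation (4)
    return r, v
```

Starting a particle away from the best positions, the clamped velocity pulls it toward them by at most V_max per step, which is the "confinement" role of V_max described above.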

3. MOLECULAR DYNAMICS FORMULATION

The main goal of molecular dynamics (MD) is to simulate the state of a system consisting of a very large number of molecules. A true treatment of such systems requires a quantum-theoretic approach. However, the exact description of the quantum states requires solving the governing Schrödinger equations, a task that is inherently impossible for many-particle systems [16]. Alternatively, MD provides a shortcut that can lead to accurate results in spite of the many approximations and simplifications implied in its procedure [18].

MD is based on calculating, at each time step, the positions and velocities of all particles by direct integration of the equations of motion. These equations are constructed from a classical Lagrangian formalism in which the interactions between particles are assumed to follow certain potential functions. The determination of the specific form of these potentials depends largely on phenomenological models and/or quantum-theoretic considerations.

Besides MD, there are in general three other methods to study the evolution of a large number of particles: quantum mechanics, statistical mechanics, and Monte Carlo [18]. However, MD is preferred in our context over the other methods because it permits a direct exploitation of the structural similarities between the discrete form of the update equations of the PSO algorithm and the Lagrangian formalism.
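The MD procedure just described can be sketched minimally: integrate Newton's equations from a force derived from a potential. The snippet below uses the Euler-Cauchy scheme that reappears in equations (11) and (12); the harmonic potential U(x) = ½kx², the mass, and the step sizes are illustrative assumptions:

```python
def integrate(x, v, force, m=1.0, dt=0.01, steps=1000):
    """Euler-Cauchy (semi-implicit Euler) integration of Newton's equations:
    the velocity is advanced first, and the position is then advanced
    using the *new* velocity, mirroring equations (11)-(12)."""
    traj = [(x, v)]
    for _ in range(steps):
        v = v + dt * force(x) / m   # v(k) = v(k-1) + dt * a
        x = x + dt * v              # x(k) = x(k-1) + dt * v(k)
        traj.append((x, v))
    return traj

# Illustrative potential U(x) = 0.5 * k * x**2, so F(x) = -dU/dx = -k * x.
k = 1.0
traj = integrate(x=1.0, v=0.0, force=lambda x: -k * x)
```

For this conservative toy system the scheme keeps the total energy ½mv² + U(x) close to its initial value over many steps, which is why it is a standard low-cost choice in MD-style simulations.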

3.1. Conservative PSO Environments

In order to formally construct the analogy between the PSO and Newtonian mechanics, we consider a set of M identical particles, each with mass m, interacting with each other. We first consider a conservative system described by a Lagrangian function given by

    L(r_i, \dot{r}_i) = \sum_{i=1}^{M} \frac{1}{2} m\, \dot{r}_i \cdot \dot{r}_i - U(r_1, r_2, \cdots, r_M)    (6)

where r_i and \dot{r}_i = v_i are the position and velocity of the ith particle, respectively, and U is the potential function, describing the intrinsic strength (energy) of the spatial locations of the particles in the space. The equations of motion can be found by searching for the critical value of the action integral

    S = \int_{t_1}^{t_2} L(r_1, r_2, ..., r_M; \dot{r}_1, \dot{r}_2, ..., \dot{r}_M)\, dt    (7)

where t_1 and t_2 are the initial and final times at which the boundary of the trajectory is specified. The solution of this “optimization” problem is the Euler-Lagrange equation

    \frac{\partial L}{\partial x_i} - \frac{d}{dt} \frac{\partial L}{\partial \dot{x}_i} = 0    (8)

Equations (6) and (8) lead to

    a_i = \dot{v}_i = \ddot{r}_i = \frac{F_i}{m}    (9)

where F_i is the mechanical force acting on the ith particle in the swarm and a_i is its resulting acceleration. The mechanical force can be expressed in terms of the potential function U as follows

    F_i = -\frac{\partial}{\partial r_i} U(r_1, r_2, ..., r_M)    (10)


Equations (6)–(10) represent a complete mechanical description of the particle swarm evolution; it is basically a system of continuous ordinary differential equations in time. To map this continuous version to a discrete one, like the PSO in (3) and (4), we consider the Euler-Cauchy discretization scheme [19]. It is possible to write the equations of motion in discrete time as

    v(k\Delta t) = v((k-1)\Delta t) + \Delta t\, a(k\Delta t)    (11)

    r(k\Delta t) = r((k-1)\Delta t) + \Delta t\, v(k\Delta t)    (12)

where \Delta t is the time step of integration.

By comparing equations (3) and (4) to (11) and (12), it can be concluded that the PSO algorithm corresponds exactly to a swarm of classical particles interacting with a field of conservative force only if w = 1, which corresponds to the basic form of the PSO algorithm originally proposed in [6]. The acceleration is given by the following compact vector form

    a_i(k\Delta t) = \Phi \left[ P_i - r_i(k\Delta t) \right]    (13)

where \Phi is a diagonal matrix whose non-zero elements are drawn from a set of mutually independent random variables uniformly distributed between 0 and 1. The integer k \in \{1, 2, ..., N_{itr}\}, where N_{itr} is the total number of iterations (generations), represents the time index of the current iteration. Thus, the force acting on the ith particle at the kth time step is given by

    F_i(k\Delta t) = m\, \Phi \left[ P_i - r_i(k\Delta t) \right]    (14)

Equation (14) can be interpreted as Hooke's law: in the PSO algorithm, particles appear to be driven by a force directly proportional to the displacement of their respective positions with respect to the center given by (5). Thus, for each particle there exists an equivalent mechanical spring with an anisotropic Hooke's tensor equal to mΦ. Figure 1 illustrates this analogy.

The mass of the particle appearing in equation (14) can be considered as an extra parameter of the theory that can be chosen freely. This is because the basic scheme of the PSO method assumes point-like particles. Therefore, we will choose the mass m such that the simplest form of the quantities of interest can be obtained.
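The spring analogy of equation (14) is easy to exercise numerically. In the sketch below (the positions and seed are arbitrary illustrations), the returned force always points from the particle toward the attractor, and its magnitude in each dimension never exceeds the Hooke-type bound m|P_n − r_n|, since the diagonal entries of Φ lie in (0, 1):

```python
import random

def pso_force(P, r, m=1.0, rng=random):
    """Force of equation (14): F = m * Phi * (P - r), where Phi is a diagonal
    matrix with independent U(0,1) entries, one per dimension."""
    return [m * rng.random() * (Pn - rn) for Pn, rn in zip(P, r)]

rng = random.Random(1)
P = [0.0, 0.0, 0.0]     # attractor of equation (5) (arbitrary here)
r = [1.0, -2.0, 0.5]    # particle position (arbitrary)
F = pso_force(P, r, rng=rng)
# Every component of F shares the sign of (P - r): a restoring, spring-like force.
```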

3.2. Philosophical Discussion

Figure 1. A 3-D illustration of the mechanical analogy to the PSO. The ith particle experiences a mechanical force identical to that of a spring with one end attached to the center given by (5) and the other end attached to the particle itself.

The main idea in the proposed connection between the PSO algorithm and the physical system described by the Lagrangian (6) is to shift all of the ‘randomness’ and social intelligence (represented by P) to the law of force acting on the particles, in a way identical to Newton's law in its discrete-time form. This is as if there existed a hypothetical Grand Observer who is able to monitor the motion of all particles, calculate the global and local bests, average them in a way that looks random only to other observers, and then apply this force to the particles involved. Obviously, this violates relativity, in which nothing can travel faster than the speed of light; but Newtonian mechanics is not local in the relativistic sense. The assumption of intelligence is just the way the mechanical force is calculated at each time instant. After that, the system responds in its Newtonian way. There is no restriction in classical mechanics to be imposed on the force. What makes nature rich and diverse is the different ways in which the mechanical law of force manifests itself in Newton's formalism. In molecular dynamics (MD), it is the specific (phenomenological) law of force that makes the computation a good simulation of the quantum reality of atoms and molecules.

However, on a deeper level, even this Grand Observer who monitors the performance of the PSO algorithm can be described mathematically. That is, by combining equations (10) and (14), we get the following differential equation for the potential U

    \frac{\partial U}{\partial r_i} + m\, \Phi \left( P_i - r_i \right) = 0    (15)
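If P_i is frozen to a constant (an illustrative simplification only, since in the algorithm P depends on all positions), a quadratic spring potential solves (15). The finite-difference check below verifies this for a one-dimensional toy case; the numerical values of P, φ, and the evaluation point are arbitrary assumptions:

```python
def U(r, P=0.3, m=1.0, phi=0.6):
    """Candidate potential for equation (15) with P held fixed:
    U(r) = (1/2) * m * phi * (r - P)**2, a one-dimensional spring energy."""
    return 0.5 * m * phi * (r - P)**2

def dU_dr(r, h=1e-6):
    # Central finite difference for dU/dr.
    return (U(r + h) - U(r - h)) / (2 * h)

# Residual of equation (15): dU/dr + m * phi * (P - r) should vanish.
residual = dU_dr(1.2) + 1.0 * 0.6 * (0.3 - 1.2)
```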

The reason why we did not attempt to solve this equation is the fact that P, as defined in (5), is a complicated function of the positions of all particles, which forces on us the many-body interaction theme. Also, the equation cannot be solved uniquely, since we know only the discrete values of P, while an interpolation/extrapolation to the continuous limit is implicit in the connection we are drawing between the PSO algorithm and the physical system. Moreover, since P is not a linear function of the interaction between particles, equation (15) is a complex many-body nonlinear equation. While it can be solved in principle, the nonlinearity may produce very complicated patterns that look random (chaos) because of the sensitivity to the precision of the initial conditions. However, all of this does not rule out the formal analogy between the PSO and the Lagrangian physical system, at least on the qualitative level. In the remaining parts of this paper, we will take U to represent some potential function whose explicit form is not known to us, and is actually not very important in relation to the conclusions and results presented here.

Regarding the appearance of random number generators in the law of force (13), a few remarks are given here. It should be clear that a Newtonian particle moves in a deterministic motion because of our infinite-precision knowledge of the external force acting on it, its mass, and the initial conditions. According to F = ma, if m is known precisely but F is random (say, because of some ignorance in the observer's state), then the resulting trajectory will look random. However, and this is the key point, the particle does not cease to be Newtonian at all! What is Newtonian, anyhow, is the dynamical law of motion, not the superficial, observer-dependent judgment that the motion looks random or deterministic with respect to the observer's knowledge.

This can be employed to reflect philosophically on the nature of the term ‘intelligence.’ We think that any decision-making must involve some sort of ignorance, or state of lack of knowledge; otherwise, AI could be formulated as a purely computational system (ultimately by a universal Turing machine). However, the controversial view that such automata can ultimately describe the human mind was excluded from our discussion at the outset by pushing the social intelligence P into the details of the external mechanical force law, and then following the consequences that can be derived by assuming that the particles respond in a Newtonian fashion. We are fully aware that no proof about the nature of intelligence was given here in the rigorous mathematical sense, although something similar has been attempted in the literature [21].

To summarize, the social and cognitive knowledge are buried in the displacement origin P_i, from which the entire swarm develops its intelligent behavior and searches for the global optimum within the fitness landscape. The difficulty in providing a full analysis of the PSO stems from the fact that the displacement center P_i varies at each time step and for every particle according to the information of the entire swarm, rendering the PSO inherently a many-body problem in which each particle interacts with all the others.


3.3. Nonconservative (Dissipative) PSO Environments

It is important to state that, for the transformation between the PSO and MD as presented by equation (13) to be exact, the inertia factor w was assumed to be unity. However, as we pointed out in Section 2, for satisfactory performance of the PSO algorithm on real problems one usually reverts to the strategy of linearly decreasing w to a certain lower value. In this section, we study in detail the physical meaning of this linear variation.

The velocity equation for the PSO is given by

    v_i(k\Delta t) = w\, v_i((k-1)\Delta t) + \Phi \left[ P_i((k-1)\Delta t) - x_i((k-1)\Delta t) \right] \Delta t    (16)

Thus, by rearranging terms we can write

    \frac{v_i(k\Delta t) - v_i((k-1)\Delta t)}{\Delta t} = -\frac{1 - w(t)}{\Delta t}\, v_i((k-1)\Delta t) + \Phi \left[ P_i((k-1)\Delta t) - x_i((k-1)\Delta t) \right]    (17)

Here w = w(t) refers to a general function of time (linear variation is one common example in the PSO community). By taking the limit as \Delta t \to 0, we find

    a_i(t) = \beta\, v_i(t) + \Phi \left[ P_i(t) - x_i(t) \right]    (18)

where

    \beta = \lim_{\Delta t \to 0} \frac{w(t) - 1}{\Delta t}    (19)

Comparing equation (13) with (18), it is clear that the conservative Lagrangian in (6) cannot admit a derivation of the PSO equation of motion when w differs from unity; there exists a term proportional to velocity that does not fit with Newton's second law as stated in (9).

This problem can be solved by considering the physical meaning of the extra term. For a typical variation of w starting at unity and ending at some smaller value (typically in the range 0.2–0.4, depending on the objective function), we find from (19) that β is negative. That is, since the total force is given by the product of the acceleration a_i in (18) and the mass m, the term βv_i accounts for “friction” in the system, which tends to lower the absolute value of the velocity as the particles evolve in time. In other words, w amounts to a dissipation in the system with strength given by the factor β. This explains why the conservative Lagrangian failed to produce the equations of motion in this particular case: the system is actually nonconservative.

Fortunately, it is still possible to employ a modified version of the Lagrangian formalism to include the dissipative case in our general analogy to physical systems. To accomplish this, we split the Lagrangian into two parts. The first is L′, which represents the conservative part and consists of the difference between kinetic and potential energy as in (6), repeated here for convenience:

    L' = \frac{1}{2} \sum_i m\, \dot{x}_i \cdot \dot{x}_i - U    (20)

where the potential energy is a function of the positions only. The second part, L′′, accounts for the nonconservative or dissipative contribution to the system. Following [22], we write

    L'' = \frac{1}{2} \sum_i \beta_i\, \dot{x}_i \cdot \dot{x}_i + \frac{1}{2} \sum_i \gamma_i\, \ddot{x}_i \cdot \ddot{x}_i    (21)

Here β_i represents the Rayleigh losses in the system and is similar to friction or viscosity. The second term, containing γ_i, accounts for radiation losses in the system. For example, if the particles are electrically charged, then any nonzero acceleration will force a particle to radiate an electromagnetic wave that carries away part of the total mechanical energy of the system, producing irreversible losses and therefore dissipation. Both m and γ_i are assumed to be constants independent of time, but this is not necessary for β_i. Moreover, the conservative Lagrangian L′ has the dimensions of energy, while the nonconservative part L′′ has the units of power.

The equation of motion for the modified Lagrangian is given by [22]

    \left\{ \frac{\partial L''}{\partial \dot{x}_i} - \frac{d}{dt} \frac{\partial L''}{\partial \ddot{x}_i} \right\} - \left\{ \frac{\partial L'}{\partial x_i} - \frac{d}{dt} \frac{\partial L'}{\partial \dot{x}_i} \right\} = 0    (22)

By substituting (20) and (21) into (22), we get

    -\gamma_i \frac{d}{dt} \ddot{x}_i + m\, \ddot{x}_i + \beta_i\, \dot{x}_i = -\frac{\partial U}{\partial x_i}    (23)

Comparing (23) with (18), we immediately find

    \gamma_i = 0    (24)

and

    \beta_i = -m\beta = m \lim_{\Delta t \to 0} \frac{1 - w(t)}{\Delta t}    (25)


where equation (15) has been used.

The result (24) means that there is no electromagnetic radiation in the PSO environment. One can say that the idealized particles in the PSO algorithm do not carry electric charge. However, it is always possible to modify the basic PSO to introduce new parameters. One of the main advantages of constructing a physical theory is to have an intuitive understanding of the meaning of the tuning parameters in the algorithm. We suggest considering the idea of adding an electric charge to the particles in the algorithm and investigating whether this may lead to an improvement in performance.

Equation (25) amounts to a connection between the physical parameter β, the Rayleigh constant of friction, and a tuning parameter in the PSO algorithm, w. This motivates the idea behind slowly decreasing the inertia w during convergence: as w(t) decreases, β(t) increases, which means that the dissipation of the environment increases, leading to faster convergence. From this discussion, we see that the common name attributed to w is slightly misleading. From the mechanical point of view, inertia is related to the mass m, while w controls the dissipation or friction of the system. We suggest, then, calling w the ‘friction constant’ or the ‘dissipation constant’ of the PSO algorithm.

The fact that the system is dissipative when w is less than unity may lead to some theoretical problems in the next sections, especially those related to thermodynamic equilibrium. However, a careful analysis of the relative magnitudes of the ‘friction force’ and the ‘PSO force’ in (23) shows that, for a typical linear variation of w from 0.9 to 0.2, the friction becomes maximum at the final iteration, most probably when the algorithm has already converged to some result. At the beginning of the run, the friction is minimum while the particles are expected to be ‘reasonably’ far from the local and global bests; that is, the difference Φ(x − P) is large compared to βv. This means that as the algorithm evolves to search for the nontrivial optima, the friction is not very significant. It becomes so only at the final iterations, when β in (19) is already large compared to the PSO force (14). This explains why we have to be careful in choosing w; specifically, if the dissipation increases faster than the “intelligent” PSO force, premature convergence will occur and the algorithm will not be able to catch the global optimum. Based on this discussion, we will always assume that w(t) is chosen to vary with time “wisely” (i.e., by ensuring that premature convergence is avoided). Therefore, we will ignore the effect of dissipation and treat the PSO as a collection of perfectly Newtonian particles interacting in a conservative environment. Further results and studies in the remaining sections confirm this assumption.
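The friction reading of w can be checked in a toy run of the velocity rule (16). The sketch below drives a single one-dimensional particle toward a fixed attractor at the origin and compares the mean speed over the run for w = 1 (conservative) and w = 0.7 (dissipative); the fixed attractor, clamping value, and step count are illustrative assumptions:

```python
import random

def mean_speed(w, steps=200, seed=42):
    """Iterate the velocity update (16) with a fixed attractor P = 0 and
    return the mean |v| over the run. A value of w below unity acts as
    Rayleigh-type friction and damps the motion."""
    rng = random.Random(seed)
    x, v, dt, v_max = 1.0, 0.0, 1.0, 1.0
    total = 0.0
    for _ in range(steps):
        v = w * v + rng.random() * (0.0 - x) * dt   # equation (16), P = 0
        v = max(-v_max, min(v_max, v))              # velocity clamping
        x = x + dt * v
        total += abs(v)
    return total / steps
```

Using the same random sequence for both runs, the damped case shows a markedly smaller mean speed, mirroring the dissipation interpretation above.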

Progress In Electromagnetics Research, PIER 75, 2007 183

4. EXTRACTION OF INFORMATION FROM SWARM DYNAMICS

The dynamic history of a swarm of particles is completely defined either by the system of equations (3) and (4) for the PSO, or (6)–(13) for MD. The two representations are equivalent, where the transformation from one to the other is obtained by the mapping in (13). After finishing the simulation, we will end up with a set of M trajectories, each describing the successive velocities (momenta) and positions of one particle during the course of time. Any trajectory is assumed to be a one-dimensional curve in an abstract 2MN-dimensional vector space. We assume that this space forms a manifold through the adopted coordinate system and call it the phase space of the swarm system.

Let us write the trajectory of the ith particle as

Γ_i(t) = { (r_i(t), p_i(t)), ∀t ∈ I_t }     (26)

in the continuous form and

Γ_i(k) = { (r_i(k∆t), p_i(k∆t)) }_{k=1}^{N_itr}     (27)

for the discrete case. Here I_t is the continuous time interval of the simulation and N_itr is the number of iterations. The swarm dynamic history of either the MD or the PSO can be defined as the set of all particle trajectories obtained after finishing the simulation

Γ(t) = { Γ_i(t) }_{i=1}^{M}     (28)

One of the main objectives of this article is to provide new insight into how the classical PSO algorithm works by studying different quantities of interest. We define any dynamic observable of the swarm dynamics to be a smooth function of the swarm dynamic history Γ(t). That is, the general form of a dynamic property (observable) will be written as

A_Φ(t) = Φ[Γ(t)]     (29)

where Φ is a sufficiently smooth operator.

The PSO is inherently a stochastic process, as can be inferred from the basic equations (3) and (4). Moreover, we will show later that a true description of MD must rely on statistical mechanical considerations, where the particle's trajectory is postulated as a stochastic process. Therefore, a single evaluation of a dynamic property, as defined in (29), cannot reflect the qualitative performance of interest. Instead, some sort of averaging is required in order to get reasonable

184 Mikki and Kishk

results. We introduce here two types of averages. The first is the time average, defined as

⟨A_Φ⟩ = lim_{T→∞} (1/T) ∫_{t_0}^{t_0+T} A_Φ(t) dt     (30)

and

⟨A_Φ⟩ = (1/N_itr) Σ_{k=1}^{N_itr} A_Φ(k)     (31)

for continuous and discrete systems, respectively. An important assumption usually invoked in MD simulations is that an average over a finite interval should estimate the infinite integration/summation in (30) and (31) [16]. Obviously, this is because any actual simulation interval must be finite. The validity of this assumption can be justified on the basis that the number of time steps in MD simulations is large. However, no such restriction can be imposed on the PSO solutions because in engineering applications, especially real-time systems, the number of iterations must be kept as small as possible to facilitate good performance in short time intervals. Therefore, instead of time averages, we will invoke the ensemble average, originally introduced in statistical mechanics. We denote the ensemble average, or expected value, of a dynamic observable A_Φ as E[A_Φ]. It is based on the fact that our phase space is endowed with a probability measure, so meaningful probabilities can be assigned to outcomes of experiments performed in this space. Moreover, if the system that produced the swarm dynamics in (28) is ergodic, then we can equate the time average with the ensemble average. In other words, under the ergodic hypothesis we can write [16, 20]

⟨A_Φ⟩ = E[A_Φ]     (32)

Therefore, by assuming that the ergodicity hypothesis is satisfied, one can always perform an ensemble average whenever a time average is required. Such an assumption is widely used in MD and statistical physics [16, 18]. However, in the remaining parts of this paper we employ the ensemble average, thereby avoiding the controversial question of whether all versions of the PSO algorithm are strictly ergodic or not. Moreover, the use of the ensemble average is more convenient when the system under consideration is dissipative.


5. THERMODYNAMIC ANALYSIS OF THE PSO ENVIRONMENT

5.1. Thermal Equilibrium

Thermal equilibrium can be defined as the state of a system of molecules in which no change of the macroscopic quantities occurs with time. This means that either all molecules have reached a complete halt (absolute zero temperature), or that averages of the macroscopic variables of interest do not change with time. To start studying the thermal equilibrium of a swarm of particles, we will employ the previous formal analogy between the PSO and classical Newtonian particle environments to define some quantities of interest, which will characterize and enhance our understanding of the basic mechanism behind the PSO.

In physics, the temperature of a group of interacting particles is defined as the average of the kinetic energies of all particles. Based on kinetic theory, the following expression for the temperature can be used [16]

T(t) = (1/M) (m/(N k_B)) Σ_{i=1}^{M} |v_i(t)|²     (33)

where k_B is Boltzmann's constant and for the conventional 3-dimensional space we have N = 3. Analogously, we define the particle swarm temperature as

T(t) = E[ (1/M) Σ_{i=1}^{M} |v_i(t)|² ]     (34)

where, without any loss of generality, the particle's mass is assumed to be

m = NkB (35)

Since there is no exact a priori knowledge of the statistical distribution of particles in the off-equilibrium state, the expected value in (34) must be estimated numerically. We employ the following estimate here

T(t) = (1/(M B)) Σ_{j=1}^{B} Σ_{i=1}^{M} |v_i^j(t)|²     (36)

where B is the number of repeated experiments. In the j-th run, the velocities v_i^j(t), i = 1, 2, ..., M, are recorded and added to the sum. The total result is then averaged over the number of runs. The initial positions and velocities of the particles in the PSO usually


follow a uniformly random distribution. At the beginning of the PSO simulation, the particles' positions and velocities are assigned random values. From the point of view of the kinetic theory of gases, this means that the initial swarm is not at thermal equilibrium. The reason is that for thermal equilibrium to occur, particle velocities should be distributed according to Maxwell's distribution [16], which is, strictly speaking, a Gaussian probability distribution. However, since there is in general no prior knowledge about the location of the optimum solution, it is customary to employ the uniformly random initialization.
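The estimator (36) can be sketched as follows; a minimal illustration in which `velocities[j][i]` holds the velocity vector of particle i recorded in run j (the data layout is our assumption):

```python
def swarm_temperature(velocities):
    """Estimate the swarm temperature per Eq. (36):
    T = (1/(M*B)) * sum_j sum_i |v_i^j|^2,
    where B is the number of runs and M the number of particles."""
    B = len(velocities)
    M = len(velocities[0])
    total = 0.0
    for run in velocities:                  # j = 1..B
        for v in run:                       # i = 1..M
            total += sum(c * c for c in v)  # squared speed |v_i^j|^2
    return total / (M * B)
```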

In general, monitoring how the temperature evolves is not enough to determine whether the system has reached an equilibrium state. For isolated systems, a sufficient condition to decide whether the system has reached the thermal equilibrium state macroscopically is the attainment of maximum entropy. However, the calculation of entropy is difficult in MD [16]. MD simulations usually produce many macroscopic quantities of interest automatically; by monitoring all of them, it is possible to decide whether the system has reached equilibrium or not. However, in the basic PSO algorithm there are no corresponding quantities of interest. What we really need is one or two auxiliary quantities that can decide, with high reliability, whether the system has converged to the steady thermal state. Fortunately, such a measure is available in the literature in the form of the α-factor. We employ the following thermal index [17]

α(t) = (1/B) Σ_{j=1}^{B} [ (1/M) Σ_{i=1}^{M} ‖v_i^j‖⁴ ] / [ (1/M) Σ_{i=1}^{M} ‖v_i^j‖² ]²     (37)

where the usual definition of the Euclidean norm of a vector v of length N is given by

‖v‖ = √( Σ_{n=1}^{N} v_n² )     (38)

Here, v_i^j is the velocity of the i-th particle in the j-th experiment. At isothermal equilibrium, the index above should be around 5/3 [17].
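A sketch of the thermal index (37); the function name and data layout are ours. As a sanity check, velocities whose components are drawn from a 3-dimensional Maxwellian (Gaussian) distribution should give a value near the theoretical 5/3:

```python
def alpha_index(velocities):
    """Thermal index of Eq. (37): averages, over B runs, the ratio
    <||v||^4> / <||v||^2>^2 taken across the M particles of each run."""
    B = len(velocities)
    acc = 0.0
    for run in velocities:
        M = len(run)
        m2 = sum(sum(c * c for c in v) for v in run) / M       # <||v||^2>
        m4 = sum(sum(c * c for c in v) ** 2 for v in run) / M  # <||v||^4>
        acc += m4 / (m2 * m2)
    return acc / B
```

For particles that all share the same speed the index is exactly 1, while a Maxwellian ensemble gives approximately 5/3.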

5.2. Primary Study Using Benchmark Test Functions

In order to study the qualitative behavior of the swarm thermodynamics, we consider the problem of finding the global minimum of the


following N-dimensional standard test functions

f_1(x) = Σ_{n=1}^{N} x_n²     (39)

f_2(x) = Σ_{n=1}^{N} [ x_n² − 10 cos(2πx_n) + 10 ]     (40)

f_3(x) = Σ_{n=1}^{N−1} [ 100 (x_{n+1} − x_n²)² + (x_n − 1)² ]     (41)

The sphere function in (39) has a single minimum located at the origin. The function defined in (40) is known as the Rastrigin function, with a global minimum located at the origin. This is a hard multimodal optimization problem because the global minimum is surrounded by a large number of local minima. Therefore, reaching the global peak without being stuck at one of these local minima is extremely difficult. The problem in (41), known as the Rosenbrock function, is characterized by a long narrow valley in its landscape, with the global minimum at the location [1, 1, ..., 1]^T. The value of the global minimum of all of the previous functions is zero.
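The three test functions (39)–(41) can be written directly as:

```python
import math

def sphere(x):        # Eq. (39): unimodal, global minimum f = 0 at the origin
    return sum(xn * xn for xn in x)

def rastrigin(x):     # Eq. (40): highly multimodal, global minimum f = 0 at the origin
    return sum(xn * xn - 10.0 * math.cos(2.0 * math.pi * xn) + 10.0 for xn in x)

def rosenbrock(x):    # Eq. (41): long narrow valley, global minimum f = 0 at [1, ..., 1]
    return sum(100.0 * (x[n + 1] - x[n] ** 2) ** 2 + (x[n] - 1.0) ** 2
               for n in range(len(x) - 1))
```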

Figure 2 shows the cost evolution together with the time history of the swarm temperature obtained from the PSO simulation of the three test functions (39)–(41). A swarm of 20 particles searching in a 10-dimensional space is considered for each case. Figure 2(a) indicates that only the sphere function has successfully converged to the global optimum. On the other hand, by examining the thermal behavior of the system as indicated by Figure 2(b), it is clear that the swarm temperature drops rapidly, suggesting that the swarm is "cooling" while evolving with time. Eventually, the temperature goes to zero, indicating that the swarm has reached the state of thermal equilibrium. Notice that in thermodynamics zero temperature is only a special case; convergence to a constant non-zero temperature can also be characterized as thermal equilibrium.

It is interesting to observe from Figure 2 that the thermal behaviors of the three solutions look indistinguishable. This is the case although the convergence curves of Figure 2(a) show that only the sphere function has converged to its global optimum. This demonstrates that convergence to thermal equilibrium does not imply convergence to the global optimum; rather, convergence to a local optimum will also lead to thermal equilibrium. The swarm temperature cannot, then, be used to judge the success or failure of the PSO algorithm.



Figure 2. Cost function and temperature evolution for a swarm of 20 particles searching for the minimum in a 10-dimensional space. The parameters of the PSO algorithm are c1 = c2 = 2.0, and w is decreased linearly from 0.9 to 0.2. The search space for all dimensions is the interval [−10, 10]. A maximum-velocity clipping criterion was used with Vmax = 10. The algorithm was run 100 times and averaged results are reported. (a) Convergence curves. (b) Temperature evolution.


The conclusion above seems to be in direct contrast to the results presented in [23]. That work introduced a quantity called the average velocity, defined as the average of the absolute values of the velocities in the PSO. Although the authors in [23] did not present a physical analogy between MD and PSO in the way we provided in Section 3, the average velocity qualitatively reflects the behavior of the swarm temperature as defined in (36). The main observation in [23] was that convergence of the 'average velocity' (temperature) to zero indicates global convergence. Based on this, they suggested an adaptive algorithm that forces the PSO to decrease the average velocity in order to guarantee successful convergence to the global optimum. However, it is clear from the results of Figure 2 that this strategy is incorrect. Actually, it may lead the algorithm to be trapped in a local optimum.

A better perspective on the thermal behavior of the system can be obtained by studying the evolution of the α-index. Figure 3 illustrates an average of 1000 runs of the experiments of Figure 2. Careful study of the results reveals a different conclusion compared with that obtained from Figure 2(b). It is clear from Figure 2(a) that the PSO solutions applied to the three different functions converge to a steady-state value at different numbers of iterations. The sphere function, Rastrigin function, and Rosenbrock valley converge roughly at around 400, 300, and 350 iterations, respectively. By examining the time histories of the α-index in Figure 3, it is seen that the sphere function's index is the


Figure 3. The time evolution of the α-index for the problem of Figure 2. The PSO algorithm was run 1000 times and averaged results are reported.


fastest in approaching its theoretical limit at the number of iterations where convergence of the cost function has been achieved. For example, the thermal index of the Rastrigin case continued to rise after the 300th iteration until the 385th iteration, where it starts to fall. Based on numerous other experiments conducted by the authors, it seems likely that the α-index can be considered an indicator of whether the PSO algorithm has converged to the global optimum or not. It appears that convergence to the global optimum generally corresponds to the fastest convergence to thermal equilibrium, as indicated by our studies of the associated α-index. Moreover, it is even possible to consider the return of the α-index to some steady-state value, i.e., the formation of a peak, as a signal that convergence has been achieved. However, we should warn the reader that more intensive work with many other objective functions is required to study the role of the α-index as a candidate global-optimum convergence criterion.

The fact that the α-index does not converge exactly to 5/3 can be attributed to three factors. First, a very large number of particles is usually required in MD simulations to guarantee that averaged results correspond to their theoretical values. Considering that, to reduce the optimization cost, we usually run the PSO algorithm with a small number of particles, the obtained curves for the α-index will have noticeable fluctuations, causing deviations from the theoretical limit. Second, it was pointed out in Section 3 that because of the presence of an inertia factor w in the basic PSO equations with values different from unity, the analogy between the PSO and MD is not exact. These small deviations from the ideal Newtonian case will slightly change the theoretical limit of the α-index. Our experience indicates that the value of α at thermal equilibrium is then around 1.5. Third, the theoretical value of 5/3 for the α-index is based on the typical Maxwell distribution at the final thermal equilibrium state. However, the derivation of this distribution assumes weakly interacting particles in the thermodynamic limit. It is evident from the discussion of Section 3 that the PSO force is a many-body, global interaction that does not resemble the situation encountered with rarefied gases, in which the kinetic theory of Maxwell applies very well. This observation forces us to be careful in drawing quantitative conclusions based on the formal analogy between the PSO and the physical system of interacting particles.

5.3. Energy Consideration

Besides the basic update equations (3), (4), (11), and (12), it should be made clear how to specify the particles' behavior when they hit the boundaries of the swarm domain. Assume that the PSO system under


consideration is conservative. If we start with a swarm of total energy E, the law of conservation of energy states that the energy will stay constant as long as no energy exchange is allowed at the boundary. From the MD point of view, there are two possible types of boundary conditions (BCs), dissipative and non-dissipative. Dissipative boundary conditions refer to the situation when the special treatment of the particles hitting the wall leads to a drop in the total energy of the swarm. An example of this type is the absorbing boundary condition (ABC), in which the velocity of a particle hitting the wall is set to zero [16, 18]. This means that the kinetic energy of a particle touching one of the forbidden walls is instantaneously "killed." If a large number of particles hit the wall, which is the case in many practical problems, there might be a considerable loss of energy in the total swarm, which in turn limits the capabilities of the PSO to find the optimum solution for several problems. The reflection boundary condition (RBC) is an example of a non-dissipative BC. However, for this to hold true, reflections must be perfectly elastic; that is, when a particle hits the wall, only the sign of its velocity vector is reversed [16, 18]. This keeps the kinetic energy of the particle unchanged, leading to conservation of the total energy of the swarm.
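The two boundary treatments can be sketched per dimension as follows (an illustrative implementation; clamping the coordinate back onto the wall is our assumption, since the text only specifies what happens to the velocity):

```python
def apply_boundary(x, v, lo, hi, mode="reflect"):
    """Wall handling for a single particle, applied per dimension.
    mode='absorb' : absorbing BC (ABC) -- the velocity component is zeroed
                    (dissipative: kinetic energy is "killed").
    mode='reflect': reflecting BC (RBC) -- the velocity component changes sign
                    (elastic: kinetic energy is preserved).
    Returns the corrected (x, v) lists."""
    for n in range(len(x)):
        if x[n] < lo or x[n] > hi:
            x[n] = min(max(x[n], lo), hi)   # clamp back onto the wall
            if mode == "absorb":
                v[n] = 0.0
            else:
                v[n] = -v[n]
    return x, v
```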

In the latter case, it is interesting to have a closer look at the energy balance of the PSO. From equation (6), we consider the total mechanical energy

E = K + U (42)

where K and U are the total kinetic and potential energies, respectively. It is possible to show that this energy is always constant for the conservative Lagrangian defined in (6) [22].

According to the thermodynamic analysis of Subsection 5.1, when particles converge to the global optimum, the temperature drops rapidly. This means that as the swarm evolves, its kinetic energy decreases. Since the total energy is conserved, convergence in the PSO can be interpreted as a process in which kinetic energy is continually converted into potential energy. The final thermodynamic state of the swarm at the global optimum represents, therefore, a unique spatial configuration in which the particles achieve the highest possible potential energy in the searched region of the configuration space. This conclusion applies only approximately to the case when the boundary condition is dissipative, since the energy loss at the wall is then small compared to the total energy of the swarm (provided that a large number of particles is used).


5.4. Dynamic Properties

Autocorrelation functions measure the relations between the values of a certain property at different times. We define here a collective velocity autocorrelation function in the following way

Ψ(t) = ⟨ (1/M) Σ_{i=1}^{M} v_i(t_0) · v_i(t_0 + t) ⟩ / ⟨ (1/M) Σ_{i=1}^{M} v_i(t_0) · v_i(t_0) ⟩     (43)

where t_0 is the time reference. The time average in (43) can be estimated by invoking the ergodicity hypothesis, i.e., by replacing it with the expected value operator as in (32). The final expression will be

Ψ(t) = (1/B) Σ_{j=1}^{B} [ Σ_{i=1}^{M} v_i^j(t_0) · v_i^j(t_0 + t) ] / [ Σ_{i=1}^{M} v_i^j(t_0) · v_i^j(t_0) ]     (44)

where B is the number of experiments.

Figure 4(a) illustrates the calculation of the autocorrelation function in (44) with the time reference t_0 = 0. The behavior of this curve bears a very strong resemblance to curves usually obtained through MD simulations. Such curves may provide a great deal of information about the underlying dynamics of the swarm environment. First, at the beginning of the simulation there are strong interactions between particles, indicating that the velocity of each particle will change compared to the reference point. This is because, according to Newton's first law, if there are no forces acting on a particle its velocity will stay the same. However, in solids each molecule is attached to a certain spatial location, so it is hard to move a molecule far away from its fixed position. What happens then is that the particle will oscillate around its initial position. This explains the strong oscillations of Ψ(t) in Figure 4(a), which suggest that the PSO environment resembles a solid state. The oscillations will not be of equal magnitude, however, but decay in time, because there are still perturbative forces acting on the atoms to disrupt the perfection of their oscillatory motion. So what we see is a function resembling damped harmonic motion.

In liquids, because there are no fixed positions for the molecules,oscillations cannot be generated. Thus, what should be obtained is



Figure 4. The velocity autocorrelation function for the problem of Figure 2 with B = 100. (a) Time-domain results. (b) Frequency-domain results, with frequency given in terms of 1/∆t.


a damped sinusoid with one minimum. For gases, all that can be observed is a strong decay following an exponential law.

The observation that for the three different cost functions we get a very similar early-time response probably reflects the fact that the uniformly random initialization of the PSO algorithm constructs similar crystal-like solid structures, making the early interactions of the particles similar. However, after 20 iterations the responses become different, as we expect, since particles start to develop different inter-molecular forces according to the fitness landscape encountered.

Another way to look into the dynamics of the PSO is to calculate the Fourier spectrum of the time autocorrelation function defined in (43). This gives some information about the underlying resonant frequencies of the structure. Figure 4(b) illustrates the calculated spectrum. The shape looks very similar to results obtained for solid materials in MD [18], where a pronounced single peak is apparent. The Fourier transform of (43) is also called the generalized phonon density of states [18].
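The spectrum of Figure 4(b) can be computed from the discrete autocorrelation sequence with a plain DFT (an FFT routine would be used in practice; this is an illustrative sketch):

```python
import cmath

def spectrum(psi):
    """Magnitude of the discrete Fourier transform of the autocorrelation
    sequence Psi(k); frequency bins are in units of 1/(N*dt), cf. Figure 4(b)."""
    N = len(psi)
    return [abs(sum(psi[k] * cmath.exp(-2j * cmath.pi * f * k / N)
                    for k in range(N)))
            for f in range(N)]
```

A damped sinusoidal Ψ(k) produces a single pronounced peak at its oscillation frequency, the solid-like signature discussed above.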

We conducted numerous experiments with several test functions, where it appears that the classification of the PSO environment into gas, liquid, or solid states is strongly dependent on the boundary conditions, the population size, and the number of dimensions. Table 1 summarizes the shape of the velocity autocorrelation function corresponding to the three cases.

Table 1. Velocity autocorrelation function corresponding to different states of the PSO environment.

State    Velocity Autocorrelation
Gas      Damped exponential
Liquid   One-period damped sinusoid with a single minimum
Solid    Damped sinusoid with possibly multiple oscillations

6. ACCELERATION TECHNIQUE FOR THE PSO ALGORITHM

In this section, we provide an application of the thermodynamic analysis of Section 5. The idea is to employ the natural definition of temperature, as given by equation (36), in a technique similar to simulated annealing, based on the observation that the decay of temperature is a necessary condition for convergence to the global optimum. This means that if the PSO has enough resources to search


Figure 5. Block diagram representation of the linear array synthesis problem. Here the particle coordinates are x_i, 0 ≤ x_i ≤ 1, i = 1, 2, ..., 2N. The amplitudes and phases range as A_min ≤ A_i ≤ A_max and 0 ≤ θ_i ≤ 2π, i = 1, 2, ..., N, respectively.

for the best solution, then temperature decay may be enhanced by artificially controlling the thermal behavior of the system. We employ a technique used in some MD simulations to push the swarm to equilibrium. Specifically, the velocity of each particle is modified according to the following formula

v_i(k + 1) = √( T_0 / T(k) ) v_i(k)     (45)

where T_0 is a user-defined target temperature. The meaning of (45) is that while the swarm is moving toward the thermal equilibrium of the global optimum, the velocities are pushed artificially to produce a swarm with a pre-specified target temperature. In simulated annealing, the artificial cooling of the algorithm is obtained by a rather more complicated method, in which a probability is calculated at each iteration to decide how to update the particles. Moreover, there the biggest obstacle is the appearance of a new control parameter, the temperature, which has no clear meaning. This adds more burden on the user in choosing suitable values for the tuning parameters. The technique in (45), however, may provide an easier alternative.
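The rescaling step (45) can be sketched as follows, estimating T(k) from the current single-run velocities as in (34) (the function name is ours):

```python
import math

def rescale_velocities(velocities, T0):
    """Thermostat-style acceleration of Eq. (45): rescale every velocity by
    sqrt(T0 / T(k)) so the swarm matches the target temperature T0.
    velocities: list of per-particle velocity vectors at iteration k."""
    M = len(velocities)
    # single-run estimate of the swarm temperature, Eq. (34)
    T = sum(sum(c * c for c in v) for v in velocities) / M
    scale = math.sqrt(T0 / T)
    return [[c * scale for c in v] for v in velocities]
```

After the rescaling, the swarm temperature recomputed from the returned velocities equals T0 exactly.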

The proposed acceleration technique will be demonstrated with a practical application from electromagnetics. We consider the design of a linear array antenna to meet a certain pre-defined goal. Suppose that there is an N-element linear array with a uniform separation d between the elements. The normalized array factor is given by [24]

AF(u) = ( 1 / max_u |AF(u)| ) Σ_{n=1}^{N} I_n e^{j2πndu/λ}     (46)

where I_n, n = 1, 2, ..., N, is the current distribution (generally



Figure 6. Convergence curves for the synthesis of a symmetric pattern using a linear array of 14 elements. Curves are shown for the standard PSO (no acceleration) and for target temperatures T0 = 0.5, 0.1, and 0.05. The population size is 8 particles. The inertia factor w is decreased linearly from 0.9 to 0.2; c1 = c2 = 2.0.

complex); u = sin θ, where θ is the angle from the normal to thearray axis; d = λ/2.

The "don't exceed" criterion is utilized in the formulation of the objective function. That is, an error is reported only if the obtained array factor exceeds the desired sidelobe level. The information about the amplitudes and phases is encoded in the coordinates of the particle, which are normalized between 0 and 1. Figure 5 illustrates the relation between the abstract PSO optimization process and the physical problem.
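The array factor (46) and the "don't exceed" objective can be sketched as follows, assuming d = λ/2 and a quadratic penalty wherever the pattern rises above the mask. The penalty form, and the use of None to mark unconstrained samples (e.g., inside the main beam), are our assumptions, not the authors' exact implementation:

```python
import cmath
import math

def array_factor_db(currents, u_samples, d_over_lambda=0.5):
    """Normalized array factor of Eq. (46), in dB, for an N-element uniform
    linear array; currents are the (generally complex) excitations I_n and
    u = sin(theta). A tiny floor avoids log10(0) at pattern nulls."""
    af = [abs(sum(I * cmath.exp(2j * math.pi * n * d_over_lambda * u)
                  for n, I in enumerate(currents, start=1)))
          for u in u_samples]
    peak = max(af)
    return [20.0 * math.log10(a / peak + 1e-12) for a in af]

def dont_exceed_cost(af_db, mask_db):
    """'Don't exceed' criterion: accumulate error only where the pattern
    rises above the desired sidelobe mask; mask entries of None are ignored."""
    return sum((a - m) ** 2 for a, m in zip(af_db, mask_db)
               if m is not None and a > m)
```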

The PSO algorithm is applied to synthesize an array antenna with a tapered sidelobe mask that decreases linearly from −25 dB to −30 dB. The beamwidth of the array pattern is 9°. The total number of sampling points is 201. Figure 6 illustrates the convergence curves, where it is clear that the proposed artificial cooling can successfully accelerate the convergence of the method. The final obtained radiation pattern for T_0 = 0.05 is shown in Figure 7.

7. DIFFUSION MODEL FOR THE PSO ALGORITHM

The development of the PSO algorithm presented so far was based on drawing a formal analogy between MD and the swarm environment. This means that a microscopic physical treatment is employed in



Figure 7. Radiation pattern of the linear array antenna corresponding to the best case of Figure 6 (T0 = 0.05).

modeling the PSO. In this section we develop an alternative view based on a macroscopic approach. This new analogy can be constructed by postulating that particles in the PSO environment follow a diffusion model. The use of diffusion models has been central in solid-state physics and molecular dynamics since the pioneering works of Boltzmann and Einstein more than one hundred years ago. Moreover, since Einstein's work on the origin of Brownian motion, the diffusion equation has been used routinely as a theoretical basis for the description of the underlying stochastic dynamics of molecular processes.

The justification of the diffusion model can be stated as follows. From (18), we can write the equation of motion of a particle in the PSO algorithm as

v̇(t) = −βv(t) + Γ(t)     (47)

where Γ(t) = Φ[P(t) − x(t)]. From the definition of P in (5), one can see that even in the continuous limit ∆t → 0 the term Γ(t) can be discontinuous because of the sudden updates of the global and local best positions. Thus, Γ(t) can easily be interpreted as a random fluctuation term that arises in the PSO environment in a way qualitatively similar to random collisions. Notice that the particles in the PSO are originally thought of as soft spheres (i.e., there is no model for physical contact between them leading to collisions). This interpretation brings the PSO immediately into the realm of Brownian motion, where the dynamics is described by an equation similar to (47).


We start from the basic diffusion equation [16]

∂N(x, t)/∂t = D ∇²N(x, t)     (48)

where N is the concentration function of the molecules, x is the position of the particle, and D is the diffusion constant.

Einstein obtained (48) in his 1905 paper on Brownian motion [25]. Direct utilization of this equation leads to solutions for the concentration of particles as a function of time and space. However, a revolutionary step taken by Einstein was to give the particle concentration a different interpretation, deviating strongly from the classical treatment. To illustrate this, let us consider the one-dimensional diffusion equation given below

∂N/∂t = D ∂²N/∂x²     (49)

Einstein's interpretation of the physical meaning of the concentration function N started by considering a short time interval, ∆t, in which the position changes from x to x + dx. A very important step was to refer each particle to its center of gravity [25]. Here, we propose to refer each particle to the location p^{i,n}, as defined in (5), which we consider to be the natural center of gravity for particles in the PSO environment. Therefore, it is possible to define the new position variable

ri,nk = xi,nk − pi,n

k−1 (50)

where we have modified the notation to allow for the iteration indexk to appear as the subscript in (50) while the indices of the particlenumber i and the dimension n are the superscripts.

Clearly, the diffusion equation (49) is invariant under the linear transformation (50). During the short time interval ∆t, particle locations will change according to a specific probability law. This means that particle trajectories are not continuous deterministic motions, but stochastic paths reflecting the irregular motion of particles in noisy environments. Therefore, the re-interpretation of the diffusion equation in (48) is that it gives us the probability density function (pdf) of the motion, not the classical density. Assuming the following boundary conditions

N(r, 0) = δ(r),    lim_{r→∞} N(r, t) = 0,    ∫_{−∞}^{∞} N(r, t) dr = M    (51)


the solution of (49) can be written as

N(r, t) = [M / √(4πDt)] e^{−r²/(4Dt)}    (52)

where this can be easily verified by substituting (52) back into (49). Equation (52), which represents a Gaussian distribution for the particle positions, was observed empirically before [26]. The development of this section presents, therefore, a theoretical derivation for the pdf of particles moving in the PSO algorithm that matches the experimentally observed results. However, the standard interpretation of (52) in the theory of diffusion equations is that it represents the Green's function of the medium. That is, depending on the boundary conditions imposed on the initial swarm, different distributions will result.
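As a quick sanity check, the substitution of (52) into (49) can also be verified numerically by finite differences. The sketch below is illustrative only; the values of D, r, t, and the step size h are arbitrary choices.

```python
import math

# Finite-difference check (not from the paper) that the Gaussian of Eq. (52)
# satisfies the 1-D diffusion equation (49): dN/dt = D d2N/dx2.
def N(r, t, D=0.5, M=1.0):
    return M / math.sqrt(4.0 * math.pi * D * t) * math.exp(-r * r / (4.0 * D * t))

def residual_at(r=0.3, t=1.0, D=0.5, h=1e-4):
    dN_dt = (N(r, t + h, D) - N(r, t - h, D)) / (2.0 * h)          # central difference in t
    d2N_dr2 = (N(r + h, t, D) - 2.0 * N(r, t, D) + N(r - h, t, D)) / (h * h)
    return abs(dN_dt - D * d2N_dr2)

residual = residual_at()
print(residual)  # a small finite-difference residual, close to zero
```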

It is very important for the subsequent developments of this paper to recognize that (52) should be interpreted as a conditional pdf. This can be inferred from the fact that the time variable appearing in (52) is referenced to the origin. As we mentioned above about Einstein's analysis of the particle trajectories, the total motion can be expressed as a succession of short intervals of time. This means that if we start at a time instant t_{k−1}, then the conditional pdf can be expressed as

f(x_k^{i,n}, t_k | x_{k−1}^{i,n}, t_{k−1}) = [1 / √(4πD(t_k − t_{k−1}))] exp[−(x_k^{i,n} − p_{k−1}^{i,n})² / (4D(t_k − t_{k−1}))]    (53)

where f now stands for the pdf of the particle positions. This final form is valid at arbitrary time instants.
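Since (53) is a Gaussian with mean p_{k−1} and variance 2D(t_k − t_{k−1}), a position-only classical PSO step can be realized by direct sampling. The sketch below uses assumed values for D, ∆t, and the center of gravity.

```python
import math
import random
import statistics

# Illustrative sampling of the next position of one PSO coordinate directly
# from the conditional Gaussian pdf of Eq. (53).
def next_position(p_prev, D, dt, rng):
    # mean = center of gravity p_{k-1}, variance = 2*D*(t_k - t_{k-1})
    return rng.gauss(p_prev, math.sqrt(2.0 * D * dt))

rng = random.Random(42)
samples = [next_position(1.0, 0.25, 0.1, rng) for _ in range(20000)]
print(statistics.mean(samples))   # near the center of gravity, 1.0
print(statistics.stdev(samples))  # near sqrt(2*0.25*0.1), about 0.22
```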

It remains to see how the diffusion constant in (53) can be estimated. Here, we come to the main contribution of Einstein's paper, in which he introduced the first connection between the macroscopic representation of the diffusion model and the microscopic representation of MD. Basically, the following relation was derived in [25]

D = lim_{t→∞} ⟨[r(t) − r(0)]²⟩ / 6t    (54)

Clearly, this expression can be easily evaluated from the MD or PSO simulation. Therefore, it is possible to get the conditional pdf of particle motion by calculating the diffusion constant as given in (54). It should be mentioned that the pdf is in general different for varying boundary conditions. The well-established theory of diffusion equations can be employed in solving for the exact pdf of the PSO under these different boundary modes.
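The Einstein relation (54) is written for three dimensions, hence the 6t; in d dimensions the denominator is 2dt. The following sketch (an assumed setup, not the paper's code) recovers a known diffusion constant from one-dimensional random-walk trajectories standing in for recorded swarm positions.

```python
import random

# Estimate D via the 1-D Einstein relation D = <[r(t) - r(0)]^2> / (2t),
# the one-dimensional counterpart of Eq. (54). D_true, step counts, and dt
# are illustrative assumptions.
def estimate_D(n_walkers=2000, steps=200, dt=0.01, D_true=0.5, seed=7):
    rng = random.Random(seed)
    sigma = (2.0 * D_true * dt) ** 0.5     # step size of a walk with constant D_true
    msd = 0.0
    for _ in range(n_walkers):
        r = 0.0
        for _ in range(steps):
            r += rng.gauss(0.0, sigma)
        msd += r * r
    msd /= n_walkers                       # mean-squared displacement <[r(t) - r(0)]^2>
    return msd / (2.0 * steps * dt)        # 1-D Einstein relation

D_est = estimate_D()
print(D_est)  # close to the true diffusion constant, 0.5
```

The same accumulation of squared displacements, applied to the positions r of (50) recorded during a PSO run, would give the D needed in (53).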

A few general remarks on the diffusion approach to the PSO are needed here. First, as we have just mentioned, the diffusion equation is subject to various boundary conditions that govern the constraints on the swarm evolution. The Gaussian pdf in (52) was stated only as a particular solution under some general conditions (51). If the PSO can be modeled as an evolution of particles under certain boundary modes, then one needs to construct the appropriate diffusion model and solve the boundary value problem. The formal implementation of this program requires further theoretical investigations and numerical studies, which are beyond the scope of the current work. The theory of diffusion is very rich and mature in theoretical physics; our sole purpose is to invite mathematically-oriented researchers in the optimization community to consider learning from physicists in that regard.

8. MARKOV MODEL FOR SWARM OPTIMIZATION TECHNIQUES

8.1. Introduction

In this section, a unified approach to the PSO techniques presented in this paper will be proposed based on the theory of Markov chains. One can visualize the Markov process as a hidden (deeper) layer in the description of dynamic problems. It does not present new behavior in the phenomenon under study, but may provide better insight on the way we write down the equations of motion. Since algorithms are sensitive to the various mathematical schemes used to implement them (e.g., some models are computationally more efficient), it is possible to gain new advantages, theoretical and conceptual, by investigating other deeper layers like Markov chains.

We start from the observation that the Gaussian pdf of particle positions leads to the possibility of eliminating the velocity from the algorithm and developing, instead, position-only update equations [26]. The recently introduced quantum version of the PSO naturally has no velocity terms [13–15]. This is because in quantum mechanics, according to Heisenberg's uncertainty principle, it is not possible to determine both the position and the velocity with the same precision. It can be shown that all quantum mechanical calculations will produce a natural conditional pdf for the positions in the swarm environment as solutions of the governing Schrödinger equation [15]. Therefore, one way to unify the quantum and classical approaches is to consider their common theme: the existence of certain conditional probability distributions underlying the stochastic dynamics of the problem. Since this pdf can be obtained either by solving the diffusion equation for the classical PSO, or Schrödinger's equation for the quantum PSO, the update equation can be derived directly from the pdf without using any further physical argument. An example will be given in the next subsection.

8.2. Derivation of the Update Equations Using Probability Theory

The diffusion equation and Schrödinger's equation can be considered as "sources" for various probability density functions that govern the stochastic behavior of moving particles. Now, since the pdf of the particle at the kth iteration is known, we will simulate the trajectory by transforming a uniformly distributed random variable into another random variable with the desired pdf. Let the required transformation that accomplishes this be denoted as y = h(x), and let the cumulative distribution functions (cdfs) of X and Y be F_X(x) and F_Y(y), respectively. It is easy to show that for the transformation h to produce the distribution F_Y(y), starting from the uniform distribution F_X(x), one must have [20]

y = F_Y^{−1}(x)    (55)
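Equation (55) is the standard inverse-transform sampling rule. A small illustration follows; the exponential distribution is used here only because its cdf inverts in closed form, and the rate lam is an arbitrary choice.

```python
import math
import random

# Inverse-transform sampling per Eq. (55): y = F_Y^{-1}(u) maps a uniform
# variate u on [0, 1) to a variate with cdf F_Y.
def sample(inv_cdf, n, seed=0):
    rng = random.Random(seed)
    return [inv_cdf(rng.random()) for _ in range(n)]

lam = 2.0
inv_exp_cdf = lambda u: -math.log(1.0 - u) / lam   # F^{-1}(u) for Exp(lam)
xs = sample(inv_exp_cdf, 50000)
print(sum(xs) / len(xs))  # sample mean, close to 1/lam = 0.5
```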

As an example, consider the Laplace distribution that was obtained from the delta potential well version of the quantum PSO algorithm [13–15]. The pdf of the particle trajectory can be written as

f(r_k^{i,n} | r_{k−1}^{i,n}) = (g ln√2 / |r_{k−1}^{i,n}|) exp(−2g ln√2 |r_k^{i,n}| / |r_{k−1}^{i,n}|)    (56)

where r_k^{i,n} is defined as in (50) and g is the only control parameter in the quantum PSO. By solving (55), denoting the uniform random variable by u and the position r by y, one obtains

x_k^{i,n} = p_{k−1}^{i,n} + (|x_{k−1}^{i,n} − p_{k−2}^{i,n}| / 2) ln(2u),            0 < u ≤ 1/2
x_k^{i,n} = p_{k−1}^{i,n} + (|x_{k−1}^{i,n} − p_{k−2}^{i,n}| / 2) ln[1/(2(1 − u))],    1/2 < u ≤ 1    (57)

This alternative derivation of the update equations was accomplished without reference to any argument in physics; the concept of collapsing the wave function, as presented in [15], has not been mentioned at all; only probability theory is needed in arriving at (57). Moreover, notice that the control parameter g cancels out in the last expression. The new update equations above were tested numerically and demonstrated a performance as good as the original quantum PSO.
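For reference, a direct sketch (ours, not the authors' code) of the position-only update (57) for a single coordinate; the history values in the usage line are arbitrary illustrations, and u must lie strictly in (0, 1).

```python
import math

# Eq. (57): position-only quantum PSO update for one coordinate.
def qpso_update(x_prev, p_prev, p_prev2, u):
    L = abs(x_prev - p_prev2) / 2.0          # scale |x_{k-1} - p_{k-2}| / 2
    if u <= 0.5:
        return p_prev + L * math.log(2.0 * u)
    return p_prev + L * math.log(1.0 / (2.0 * (1.0 - u)))

# At u = 1/2 both branches give ln(1) = 0, so the particle lands exactly on
# its center of gravity p_{k-1}.
print(qpso_update(x_prev=0.8, p_prev=1.0, p_prev2=0.2, u=0.5))  # 1.0
```

The two branches are mirror images about p_{k−1}, which is what makes the sampled steps Laplace-distributed around the center of gravity.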

8.3. Markov Chain Model

A Markov process is defined as a stochastic process whose past has no influence on the future if its present is specified [20]. Mathematically, we state this definition as follows

P{x_n ≤ x_n | x_{n−1}, x_{n−2}, ..., x_1, x_0} = P{x_n ≤ x_n | x_{n−1}}    (58)

where x_k, k = 1, 2, ..., n, is a sequence of random variables defined over a time index k and P is the probability operator. It is understood that the sequence starts at the initial time k = 0, which can be considered the cause, and then "propagates" the effect through certain stochastic rules. Equation (58) tells us that the effect at a certain time instance k is dependent only on the information of the previous time step k − 1. If we define the information of the sequence at k to be the state at that time instant, then it is possible to describe the Markov chain as a stochastic process in which the system can remember only the previous state.

From the analysis presented in the previous parts of this article, we have found that the PSO algorithm, whether classical or quantum, involves a conditional pdf in which information about the current position depends only on information coming from the previous time step. In other words, by examining some of the distributions presented here, such as (53) and (56), it is clear that only the previous state (iteration) will be involved in the evaluation of the probability of the next iteration. Therefore, we classify all of the possible versions of the PSO algorithm, whether classical or quantum, as stochastic Markov chains.

It is clear that the functional form of the conditional pdf does depend on the time index k. This is because the new center of gravity p_k^{i,n} [see equation (50)] varies from iteration to iteration. A Markov process that satisfies such a property is called an inhomogeneous Markov process [20].

From the chain rule in probability theory, one can write [20]

f(x_k, x_{k−1}, ..., x_1, x_0) = f(x_k | x_{k−1}) f(x_{k−1} | x_{k−2}) ··· f(x_2 | x_1) f(x_1 | x_0) f(x_0)    (59)

where f stands for the pdf of the random variables appearing in its argument.

Equation (59) implies that the joint statistics of all the positions, evaluated at all iterations up to the present time, can be expressed in terms of the conditional pdfs, at various previous times, plus the initial distribution at the first iteration f(x_0). This initial condition is independent of the algorithm and must be chosen by the user. It is customary in evolutionary methods to select the starting population as a uniform random distribution over the range of interest.

Of special importance in Markov chains is the following result, known as the Chapman-Kolmogorov equation [20]

f(x_n | x_k) = ∫_{−∞}^{∞} d^N x_m f(x_n | x_m) f(x_m | x_k)    (60)

where n > m > k. Here d^N x_m stands for the volume element (scalar) in the abstract N-dimensional space R^N. Equation (60) can be used to establish a causal link between any combination of three iterations that need not be successive.
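The Chapman-Kolmogorov identity can be spot-checked numerically for the Gaussian kernel of (53): chaining two transitions of variances v1 and v2 must reproduce a single Gaussian transition of variance v1 + v2. The grid limits and variances below are illustrative.

```python
import math

# Numerical check (illustrative, not from the paper) of Eq. (60) for
# Gaussian transition pdfs.
def gauss(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def composed(x_n, x_k, v1, v2, lo=-20.0, hi=20.0, m=4000):
    # midpoint-rule integration over the intermediate state x_m
    h = (hi - lo) / m
    return h * sum(gauss(lo + (i + 0.5) * h, x_k, v1) * gauss(x_n, lo + (i + 0.5) * h, v2)
                   for i in range(m))

direct = gauss(0.7, 0.0, 1.5)           # one-step kernel, variance 1.0 + 0.5
chained = composed(0.7, 0.0, 1.0, 0.5)  # two chained kernels, Eq. (60)
print(abs(direct - chained) < 1e-7)     # the two agree to numerical precision
```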

For the general case of k successive steps, it is easy to calculate the marginal pdf of the kth state as follows

f(x_k) = ∫ d^N x_{k−1} ∫ d^N x_{k−2} ··· ∫ d^N x_0 f(x_k, x_{k−1}, ..., x_1, x_0)    (61)

By substituting (59) into (61) we get

f(x_k) = ∫ d^N x_{k−1} ∫ d^N x_{k−2} ··· ∫ d^N x_0 f(x_k | x_{k−1}) f(x_{k−1} | x_{k−2}) ··· f(x_1 | x_0) f(x_0)    (62)

This gives the pdf of the future kth iteration by causally accumulating the effects of all previous states (iterations) as originated from an initial distribution.
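The content of (62) can be illustrated by Monte Carlo: pushing an initial uniform population f(x_0) through a chain of conditional pdfs yields the marginal distribution of iteration k. The Gaussian transition kernel below (a contraction toward an assumed best position at 0, with noise sigma) is a stand-in for (53), not the paper's kernel.

```python
import random
import statistics

# Monte Carlo realization of Eq. (62): each transition applies the
# conditional pdf f(x_k | x_{k-1}) to every sample.
def propagate(n=20000, k=30, a=0.7, sigma=0.1, seed=3):
    rng = random.Random(seed)
    xs = [rng.uniform(-5.0, 5.0) for _ in range(n)]    # sample f(x0): uniform start
    for _ in range(k):
        xs = [rng.gauss(a * x, sigma) for x in xs]     # one Markov transition
    return xs

xs = propagate()
print(abs(statistics.mean(xs)) < 0.01)                 # concentrates near the attractor 0
print(abs(statistics.pstdev(xs) - 0.14) < 0.02)        # near sigma / sqrt(1 - a^2)
```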

8.4. Generalized PSO Algorithm

It is clear from equation (62) that knowledge of the conditional pdf of the sequence of positions, together with the pdf of the initial distribution, is enough to give an exact characterization of the total pdf for any future iteration. Based on these facts, we propose a generalized PSO algorithm summarized as follows

(i) Specify an initial distribution with known pdf f(x_0).
(ii) Derive the conditional pdf f(x_k | x_{k−1}), for arbitrary k, by solving the diffusion equation (classical PSO) or Schrödinger's equation (quantum PSO).


(iii) Calculate the marginal pdf f(x_k) using equation (62).
(iv) Update the position by realizing the random number generator with the pdf f(x_k) (for example, by the method presented in Subsection 8.2).
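The four steps above can be sketched end-to-end. The following is a hedged, minimal realization of the generalized position-only PSO, using the quantum conditional pdf realized through Eq. (57); the swarm size, search range, one-dimensional sphere objective, and midpoint attractor are illustrative choices, not the paper's benchmark setup.

```python
import math
import random

def objective(x):
    return x * x                                        # sphere function, minimum at 0

def generalized_pso(n=20, iters=200, lo=-10.0, hi=10.0, seed=5):
    rng = random.Random(seed)
    xs = [rng.uniform(lo, hi) for _ in range(n)]        # step (i): uniform f(x0)
    pbest = xs[:]                                       # personal best positions
    gbest = min(xs, key=objective)                      # global best position
    prev_p = [rng.uniform(lo, hi) for _ in range(n)]    # assumed p_{k-2} history
    for _ in range(iters):
        for i in range(n):
            p = 0.5 * (pbest[i] + gbest)                # center of gravity p_{k-1}
            L = abs(xs[i] - prev_p[i]) / 2.0            # scale of Eq. (57)
            u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
            if u <= 0.5:                                # steps (ii)-(iv): realize
                xs[i] = p + L * math.log(2.0 * u)       # the conditional pdf
            else:
                xs[i] = p + L * math.log(1.0 / (2.0 * (1.0 - u)))
            prev_p[i] = p
            if objective(xs[i]) < objective(pbest[i]):
                pbest[i] = xs[i]
                if objective(xs[i]) < objective(gbest):
                    gbest = xs[i]
    return gbest

best = generalized_pso()
print(abs(best) < 0.1)  # the swarm contracts onto the optimum
```

Note that the loop never touches a velocity: all motion is generated by sampling the conditional pdf, which is the point of the generalized scheme.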

The generalized PSO algorithm, as presented above, has the following advantages over the traditional classical or quantum versions

(i) The velocity terms are eliminated. If the derived update equations of the position are not complicated, then this may considerably reduce the computational complexity of the new algorithm.

(ii) The algorithm is purely stochastic. The rich literature on stability and time evolution of stochastic processes can be applied to provide a deeper understanding of the convergence of the PSO algorithm.

(iii) The generalized PSO algorithm eliminates the variations in the physics behind the method; it shows that all PSO methods are special classes within the wider group of Markov chains.

8.5. Prospectus and Some Speculations

In particular, it can be inferred from the overall discussion of this section that the Markov approach unifies the classical and the quantum versions of the swarm mechanics, a problem that is fundamental in theoretical physics. It is interesting to refer to the work proposed, yet in a different context, by Nelson in 1966 [27]. The idea was to derive quantum mechanics from a classical model starting from the stochastic diffusion equation. Even though not all physicists have accepted the idea, very recently a general approach based on stochastic dynamics was proposed to provide a theoretical derivation of quantum mechanics by formulating the problem as a stochastic dynamical system obeying some reasonable energy-related principles [28]. If quantum mechanics can be derived from the diffusion equation, then the diffusion theory of Section 7 and the Markov model of this section are enough to theoretically derive any possible PSO algorithm that can be mapped to an existing physical theory. A general approach to establish a physical theory for natural selection was formulated by Lee Smolin [29], where it can be seen that evolution, as described in biology, could be linked, on the fundamental level, to the Standard Model of particle physics. Such ideas, besides the approach presented in our work, are in conformity with each other and strongly suggest that the foundations of the swarm intelligence methods could be rooted ultimately in physics, rather than biology.


9. CONCLUSIONS

In this paper, establishing a formal analogy between the PSO environment and the motion of a collection of particles under Newtonian laws provided new insight on the PSO algorithm. In particular, the well-established branch of physics, molecular dynamics, was exploited to study the thermodynamic behavior of the swarm. The thermodynamic analysis provided a new global criterion to decide whether the convergence of the algorithm is successful or not. A new acceleration technique for the classical PSO algorithm was proposed based on the thermodynamic theory. The new technique was employed successfully in improving the convergence performance on the practical problem of synthesis of linear array antennas. A diffusion model for the PSO environment was proposed. Upon solving the corresponding diffusion equation, the familiar Gaussian distribution, observed only empirically before, was obtained. The new diffusion model was integrated with the quantum model, developed in other works, to propose a unified scheme using the theory of Markov chains. The generalized algorithm has several advantages, such as the possibility of reducing the computational complexity and opening the door for a wealth of stability analysis techniques to be applied to the study of the basic PSO algorithm.

ACKNOWLEDGMENT

This work was partially supported by the Department of Electrical Engineering, University of Mississippi, and the National Science Foundation under Grant No. ECS-524293.

REFERENCES

1. Levin, F. S., An Introduction to Quantum Theory, Cambridge University Press, 2002.

2. Sijher, T. S. and A. A. Kishk, "Antenna modeling by infinitesimal dipoles using genetic algorithms," Progress In Electromagnetics Research, PIER 52, 225–254, 2005.

3. Mahanti, G. K., A. Chakrabarty, and S. Das, "Phase-only and amplitude-phase only synthesis of dual-beam pattern linear antenna arrays using floating-point genetic algorithms," Progress In Electromagnetics Research, PIER 68, 247–259, 2007.

4. Meng, Z., "Autonomous genetic algorithm for functional optimization," Progress In Electromagnetics Research, PIER 72, 253–268, 2007.

5. Riabi, M. L., R. Thabet, and M. Belmeguenai, "Rigorous design and efficient optimization of quarter-wave transformers in metallic circular waveguides using the mode-matching method and the genetic algorithm," Progress In Electromagnetics Research, PIER 68, 15–33, 2007.

6. Kennedy, J. and R. C. Eberhart, "Particle swarm optimization," Proc. IEEE Conf. Neural Networks IV, Piscataway, NJ, 1995.

7. Kennedy, J. and R. C. Eberhart, Swarm Intelligence, Morgan Kaufmann Publishers, 2001.

8. Clerc, M. and J. Kennedy, "The particle swarm: explosion, stability, and convergence in a multi-dimensional complex space," IEEE Trans. Evolutionary Computation, Vol. 6, No. 1, 58–73, Feb. 2002.

9. Kadirkamanathan, V., K. Selvarajah, and P. J. Fleming, "Stability analysis of the particle swarm optimizer," IEEE Trans. Evolutionary Computation, Vol. 10, No. 3, 245–255, June 2006.

10. Ciuprina, G., D. Ioan, and I. Munteanu, "Use of intelligent-particle swarm optimization in electromagnetics," IEEE Trans. Magn., Vol. 38, No. 2, 1037–1040, Mar. 2002.

11. Robinson, J. and Y. Rahmat-Samii, "Particle swarm optimization in electromagnetics," IEEE Trans. Antennas Propagat., Vol. 52, 397–407, Feb. 2004.

12. Boeringer, D. and D. Werner, "Particle swarm optimization versus genetic algorithms for phased array synthesis," IEEE Trans. Antennas Propagat., Vol. 52, No. 3, 771–779, Mar. 2004.

13. Sun, J., B. Feng, and W. B. Xu, "Particle swarm optimization with particles having quantum behavior," Proc. Cong. Evolutionary Computation, CEC2004, Vol. 1, 325–331, June 2004.

14. Mikki, S. M. and A. A. Kishk, "Investigation of the quantum particle swarm optimization technique for electromagnetic applications," IEEE Antennas and Propagation Society International Symposium, Vol. 2A, 45–48, July 3–8, 2005.

15. Mikki, S. M. and A. A. Kishk, "Quantum particle swarm optimization for electromagnetics," IEEE Trans. Antennas Propagat., Vol. 54, 2764–2775, Oct. 2006.

16. Haile, J. M., Molecular Dynamics Simulation, Wiley, New York, 1992.

17. Schommers, W., "Structures and dynamics of surfaces I," Topics in Current Physics, W. Schommers and P. von Blanckenhagen (eds.), Vol. 41, Springer-Verlag, Berlin, Heidelberg, 1986.

18. Rieth, M., Nano-Engineering in Science and Technology, World Scientific, 2003.

19. Kreyszig, E., Advanced Engineering Mathematics, 8th edition, Wiley, 1999.

20. Papoulis, A., Probability, Random Variables, and Stochastic Processes, 3rd edition, McGraw-Hill, 1991.

21. Penrose, R., The Emperor's New Mind, Oxford University Press, 1989.

22. Gossick, B. R., Hamilton's Principle and Physical Systems, Academic Press, 1967.

23. Iwasaki, N. and K. Yasuda, "Adaptive particle swarm optimization using velocity feedback," International Journal of Innovative Computing, Information and Control, Vol. 1, No. 3, 369–380, September 2005.

24. Bhattacharyya, A. K., Phased Array Antennas, Wiley-Interscience, 2006.

25. Einstein, A., "On the movement of small particles suspended in a stationary liquid demanded by the molecular-kinetic theory of heat," Ann. Phys. (Leipzig), Vol. 17, 549–560, 1905.

26. Kennedy, J., "Probability and dynamics in the particle swarm," Proc. Cong. Evolutionary Computation, CEC2004, Vol. 1, 340–347, June 19–23, 2004.

27. Nelson, E., "Derivation of the Schrödinger equation from Newtonian mechanics," Phys. Rev., Vol. 150, 1079–1085, 1966.

28. Smolin, L., "Could quantum mechanics be an approximation to another theory?" http://arxiv.org/abs/quant-ph/0609109, September 2006.

29. Smolin, L., The Life of the Cosmos, Oxford University Press, 1999.