- Email: [email protected]

James Kennedy Bureau of Labor Statistics, Washington, D.C. [email protected]

Abstract The particle swarm is an algorithm for finding optimal regions of complex search spaces through interaction of individuals in a population of particles. Though the algorithm, which is based on a metaphor of social interaction, has been shown to perform well, researchers have not adequately explained how it works. Further, traditional versions of the algorithm have had some dynamical properties that were not considered to be desirable, notably the particles’ velocities needed to be limited in order to control their trajectories. The present paper analyzes the particle’s trajectory as it moves in discrete time (the algebraic view), then progresses to the view of it in continuous time (the analytical view). A 5-dimensional depiction is developed, which completely describes the system. These analyses lead to a generalized model of the algorithm, containing a set of coefficients to control the system’s convergence tendencies. Some results of the particle swarm optimizer, implementing modifications derived from the analysis, suggest methods for altering the original algorithm in ways that eliminate problems and increase the optimization power of the particle swarm.

1. INTRODUCTION Particle swarm adaptation has been shown to successfully optimize a wide range of continuous functions (Angeline, 1998; Kennedy and Eberhart, 1995; Kennedy, 1997; Kennedy, 1998; Shi and Eberhart, 1998). The algorithm, which is based on a metaphor of social interaction, searches a space by adjusting the trajectories of individual vectors, called “particles” as they are conceptualized as moving points in multidimensional space. The individual particles are drawn stochastically toward the positions of their own previous best performance and the best previous performance of their neighbors. While empirical evidence has accumulated that the algorithm “works,” e.g., it is a useful tool for optimization, there has thus far been little insight into how it works. The present analysis begins with a highly simplified, deterministic version of the particle swarm in order to provide understanding about how it searches the problem space (Kennedy, 1998), then continues on to analyze the full stochastic system. A generalized model is proposed, including methods for controlling the convergence properties of the particle system. Finally, some empirical results are given, showing the performance of various implementations of the algorithm on a suite of test functions. 1.1 The particle swarm r r A population of particles is initialized with random positions x i and velocities v i , and a function, f, is evaluated, using the particle’s positional coordinates as input values. Positions and velocities are adjusted, and the function evaluated with the new coordinates at each time-step. When a particle discovers a pattern that is better than any it r r has found previously, it stores the coordinates in a vector p i . The difference between p i (the best point found by i

so far) and the individual’s current position is stochastically added to the current velocity, causing the trajectory to oscillate around that point. Further, each particle is defined within the context of a topological neighborhood comprising itself and some other particles in the population. The stochastically weighted difference between the r neighborhood’s best position p g and the individual’s current position is also added to its velocity, adjusting it for the next time-step. These adjustments to the particle’s movement through the space cause it to search around the two best positions. The algorithm in pseudocode follows: Initialize population Do For i= 1 to Population Size r r r r if f( x i )

(

)

vid = sign(vid ) ⋅ min(abs (vid ),Vmax )) xid = xid + vid Next d Next i Until termination criterion is met

The variables ϕ 1 and ϕ 2 are random positive numbers, drawn from a uniform distribution and defined by an upper limit ϕ max which is a parameter of the system. In this version, the term variable v id is limited to the range ±Vmax , r for reasons which will be explained below. The values of the elements in p g are determined by comparing the best performances of all the members of i’s topological neighborhood, defined by indexes of some other population

r members, and assigning the best performer’s index to the variable g. Thus p g represents the best position found

by any member of the neighborhood. The random weighting of the control parameters in the algorithm results in a kind of explosion or a “drunkard's walk” as particles’ velocities and positional coordinates careen toward infinity. The explosion has traditionally been contained through implementation of a Vmax parameter, which limits step-size, or velocity. The current paper however demonstrates that the implementation of properly defined constriction coefficients can prevent explosion; further, these coefficients can induce particles to converge on local optima. An important source of the swarm’s search capability is the interactions among particles as they react to one another’s findings. Analysis of interparticle effects is beyond the scope of this paper, which focuses on the trajectories of single particles. 1.2 Simplification of the system We begin the analysis by stripping the algorithm down to a most simple form; we will add things back in later. The r particle swarm formula adjusts the velocity v i by adding two terms to it. The two terms are of the same form, that r r r is, ϕ ( p − xi ) , where p is the best position found so far, by the individual particle in the first term, or by any neighbor in the second term. The formula can be shortened by redefining pid as follows: pid ←

ϕ1 pid + ϕ 2 p gd ϕ1 + ϕ 2

Thus we can simplify our initial investigation by looking at the behavior of a particle whose velocity is adjusted by only one term: vid (t + 1) = vid (t ) + ϕ ( pid − xid (t )) ,

where ϕ = ϕ 1 + ϕ 2 . This is algebraically identical to the standard two-term form. r When the particle swarm operates on an optimization problem, the value of p i is constantly updated, as the system r evolves toward an optimum. In order to further simplify the system and make it understandable, we set p i to a

constant value in the following analysis. The system will also be more understandable if we make ϕ a constant as well; where normally it is defined as a random number between zero and a constant upper limit, we will remove the stochastic component initially and reintroduce it in later sections. The effect of ϕ on the system is very important, and much of the present paper is involved in analyzing its effect on the trajectory of a particle. The system can be simplified even further by considering a 1-dimensional problem space, and again further by reducing the population to one particle. Thus we will begin by looking at a stripped-down particle by itself, e.g., a population of one, one-dimensional, deterministic particle, with a constant p. Thus we begin by considering the reduced system: v (t + 1) = v(t ) + ϕ ( p − x(t )) x(t + 1) = x(t ) + v(t + 1)

Equ. 1.1

where p and ϕ are constants. No vector notation is necessary, and there is no randomness. Kennedy (1998) found that the simplified particle’s trajectory is dependent on the value of the control parameter ϕ, and recognized that randomness was responsible for the explosion of the system, though the mechanism which caused the explosion was not understood. Ozcan and Mohan (1998a; 1998b) further analyzed the system and concluded that the particle as seen in discrete time “surfs” on an underlying continuous foundation of sine waves.

The present paper analyzes the particle swarm as it moves in discrete time (the algebraic view), then progresses to the view of it in continuous time (the analytical view). A 5-dimensional depiction is developed, which completely describes the system. These analyses lead to a generalized model of the algorithm, containing a set of coefficients to control the system’s convergence tendencies. When randomness is re-introduced to the full model with constriction coefficients, the deleterious effects of randomness are seen to be controlled. Some results of the particle swarm optimizer, using modifications derived from the analysis, are presented; these results suggest methods for altering the original algorithm in ways that eliminate some problems and increase the optimization power of the particle swarm. 2. ALGEBRAIC POINT OF VIEW The basic simplified dynamic system is defined by v t +1 = v t + ϕy t y t +1 = −v t + (1 − ϕ ) y t

Equ. 2.1

where y t = p − x t .

ϕ v 1 Let Pt = t be the current point in R 2 , and M = the matrix of the system. In this case we have − 1 1 − ϕ yt Pt +1 = MPt and, more generally, Pt = M t P0 . Thus the system is completely defined by M.

The eigenvalues of M are: 2 e = 1 − ϕ + ϕ − 4ϕ 1 2 2 2 ϕ − 4ϕ ϕ e 2 = 1 − − 2 2

Equ. 2.2

We can immediately see that the value ϕ=4 is special; below we will see what this implies. e 0 For ϕ ≠4 we can define a matrix A so that AMA−1 = L = 1 0 e2 (note that A-1 doesn’t exist when ϕ =4)

Equ. 2.3

a 1 For example, from the canonical form A = we find c 1 a = ϕ + ϕ− c =

ϕ 2 − 4ϕ 2ϕ ϕ 2 − 4ϕ 2ϕ

Equ. 2.4

In order to simplify the formulas, we multiply by 2ϕ, to produce a matrix A:

ϕ + ϕ 2 − 4ϕ A= ϕ − ϕ 2 − 4ϕ

2ϕ 2ϕ

Equ. 2.5

So if we define Qt = APt we can now write Pt +1 = A−1LAPt APt +1 = LAPt

Equ. 2.6

Qt +1 = LQt

and, finally, Qt = Lt Q0 et But L is a diagonal matrix, so we have simply Lt = 1 0

0 e2t

Equ. 2.7

In particular, there is cyclic behavior in the system if and only if Qt = Q0 (or, more generally if Qt + k = Qt ). This just means that we have a system of two equations: e1t = 1 t e2 = 1

2.1

Equ. 2.8

Case ϕ <4

For 0<ϕ<4, the eigenvalues are complex, and there is always at least one (real) solution for ϕ. More precisely we can write e1 = cos(θ ) + i sin(θ ) e2 = cos(θ ) − i sin(θ )

with cos(θ ) = 1 −

ϕ 2

and sin(θ ) =

Equ. 2.9

4ϕ − ϕ 2 . 2

Then e1t = cos(tθ ) + i sin(tθ ) t e2 = cos(tθ ) − i sin(tθ )

and cycles are given by any θ such that θ =

Equ. 2.10

2kπ t

So for each t, the solutions for ϕ are given by 2kπ ϕ = 2(1 − cos ) for k ∈ {1,2,..., t − 1} t

Equ. 2.11

Table 2.1 gives some nontrivial values of ϕ for which the system is cyclic. Table 2.1. Some ϕ values for which the system is cyclic. Cycle period ϕ 3 3 (see Figure2.1) 2 4 5 (see Figure 2.2 for the sum and 5± 5 Figure 2.3 for the difference) 2 1, 3 6,3 6, 4, 3, 12 1, 2, 3, 2 ± 3 Figures 2.1 through 2.4 show the trajectories of a particle in phase space, for various values of ϕ. When ϕ takes on one of the values from Table 2.1, the trajectory is cyclical, for any other value, the system is just quasi-cyclic, as in Figure 2.4. is the 2-norm (the Euclidean one for a vector),

We can be a little bit more precise. Below,

Qt = APt = Q0 A−1 Q0 ≥ Pt ≥

Q0

.

Equ. 2.12

A

For example, for v0=0 and y0=1, we have 1 3ϕ 2 − 4ϕ ± 5ϕ 4 − 8ϕ 3 + 16ϕ 2 2ϕ max 2ϕ 4 − 8ϕ 3 2

≥ Pt ≥

2ϕ max 3ϕ 2 − 4ϕ ± 5ϕ 4 − 8ϕ 3 + 16ϕ 2

Equ. 2.13

-------------------Figures 2.1 - 2.4 -------------------2.2

Case ϕ >4

If ϕ>4, then e1 and e2 are real numbers (and e1 ≤ e2 ), so we have either •

e1 = e2 = 1 (for t even) which implies ϕ=0, not consistent with the hypothesis ϕ>4

•

e1 = −e2 = 1 (or -1) which is impossible

•

e1 = e2 = −1 that is to say ϕ=4, not consistent with the hypothesis ϕ>4

So, and this is the point, there is no cyclic behavior for ϕ>4. And, in fact, the distance from the point Pt to the center (0,0) is strictly monotonic increasing with t, which means that Qt = APt Lt Q0 = APt Lt Q0 = APt

So

Equ. 2.14

Lt Q0 ≤ A Pt Lt Q0 A

Equ. 2.15 ≤ Pt

But one can also write Pt = A−1Qt Pt ≤ A−1 Qt

Equ. 2.16

Pt ≤ A−1 Lt Q0

So, finally, Pt increases like Lt Q0 . In Section 4 this result is used to prevent the explosion of the system, which can occur when particle velocities increase without control. 2.3

Case ϕ =4

4 1 In this situation M = − − 1 3

In this particular case, the eigenvalues are both equal to -1, and there is just one family of eigenvectors, generated − 2 by V = . So we have MV = −V . 1 Thus, if P0 is an eigenvector, proportional to V (that is to say if v 0 + 2 y 0 = 0 ), there are just two symmetrical points, for 2y Equ. 2.17 Pt +1 = ± 0 = − Pt − y0 In the case where P0 is not an eigenvector, we can directly compute how Pt decreases and/or increases. Let us define ∆ t = Pt +1

2

− Pt

. By recurrence the following form is derived:

∆t = at v02 + bt v0 y0 + ct y02

Equ. 2.18

where at,bt,ct are integers so that ∆ t = 0 for v0 + 2 y0 = 0 . The integers can be negative, zero, or positive. Supposing for a particular t we have ∆ t > 0 , one can easily compute ∆t = vt2 + 14vt yt + 24 yt2 . This quantity is positive if and only if vt is not between (or equal to) the roots {− 2 y t ,−12 y t } 32 yt Now, if ∆ t +1 is computed then we have ∆t +1 = 11vt2 + 54vt yt + 64 yt2 , and the roots are − 2 yt ,− . As 11 32 < 12 , this result means that ∆t +1 is also positive. So as soon as Pt begins to increase, it does so infinitely. 11 But it can decrease, at the beginning. The question to be answered next is, how long can it decrease before it begins increasing?

Now take the case of ∆0 < 0. This means that v0 is between -2y0 and -12y0. For instance, in the case where y0>0, 1 v 0 = −2 y 0 − ε , with ε ∈ ]0,10 y0 [ . Equ. 2.19 By recurrence, the following is derived: ∆ 0 = −10 y0ε + ε 2 ∆1 = −10 y0ε + 11ε 2 ∆ 2 = −10 y0ε + 21ε 2 ∆ t + 2 = −∆ t + 2∆ t +1 = −10 y0ε + kt + 2ε 2 , with kt + 2 = −kt + 2kt +1

Equ. 2.20

∆ t = −10 y0ε + (1 + 10t )ε 2

Equ. 2.21

Finally, 2

as long as (1 + 10t )ε ≤ 10 y 0 ε , which means that P t decreases as long as 1 − + y0 t ≤ 1 + Integer _ part 10 ε After that, Pt increases.

Equ. 2.22

The same analysis can be performed for y0<0. In this case ε<0, as well, so the formula is the same. In fact, to be even more precise, if

α = −10 y 0 ε + ε 2 β = 10ε 2 then we have

Pt = t

β α − 2 P0 2 + + 2 2 t t

β

Equ. 2.23

Thus, it can be concluded that Pt decreases/increases almost linearly when t is big enough. In particular, even if it begins to decrease, after that it tends to increase almost like t 5 v 0 + 2 y 0 . 3. 3.1

ANALYTIC POINT OF VIEW Basic explicit representation

From the basic iterative (implicit) representation, the following is derived: v (t + 2) = v (t + 1) + ϕy (t + 1)

= v (t + 1) − ϕv (t ) + (1 − ϕ )v (t + 1) − v (t )

v (t + 2) + (ϕ − 2 )v (t + 1) + v (t ) = 0 1

Equ. 3.1

Note that the present paper uses the Bourbaki convention of representing open intervals with reversed brackets. Thus ]a,b[ is equivalent to parenthetical notation (a,b).

Assuming a continuous process, this becomes a classical second order differential equation:

∂ 2v ∂t

2

+ ln(e1e2 )

∂v + ln (e1 ) ln (e2 )v = 0 ∂t

Equ. 3.2

where e1 and e2 are the roots of

λ2 + (ϕ − 2)λ + 1 = 0 .

Equ. 3.3

2 e1 = 1 − ϕ + ϕ − 4ϕ 2 2 2 ϕ − 4ϕ ϕ e2 = 1 − − 2 2

Equ. 3.4

As a result,

The general solution is v(t ) = c1e1t + c2e2t

Equ. 3.5

A similar kind of expression for y(t) is now produced, where, y (t ) =

1

ϕ

(c e (e − 1) + c e (e t 11 1

t 2 2

2

− 1)

)

Equ. 3.6

The coefficients c1 and c2 depend on v(0) and y(0). If e1 ≠ e 2 , we have − ϕy (0) − (1 − e 2 )v(0) c1 = e 2 − e1 y + ϕ ( 0 ) (1 − e1 )v(0) c = 2 e 2 − e1

Equ. 3.7

In the case where e1=e2 (ϕ=4), Equ.3.5 and Equ. 3.6 give v(0) = c1 + c2 c1 + c2 y (0) = − 2

Equ. 3.8

v(0) + 2 y (0) = 0

Equ. 3.9

so we must have

in order to prevent a discontinuity. Regarding the expressions e1 and e2, eigenvalues of the matrix M, as in Section 2 above, the same discussion about

(

)

the sign of ϕ 2 − 4ϕ can be made, particularly about the (non) existence of cycles. The above results provide a guideline for preventing the explosion of the system, for we can immediately see that it depends on whether we have

max ( e1 , e 2 ) > 1 .

Equ. 3.10

3.2 A posteriori proof One can directly verify that v(t) and y(t) are indeed solutions of the initial system. On one hand, from their expressions, v(t + 1) = e1c1e1t + e2c2e2t Equ. 3.11 1 t t y (t + 1) = e1c1e1 (e1 − 1) + e2c2e2 (e2 − 1) ϕ

(

)

and on the other hand, v (t ) + ϕy (t ) = c1 e1t + c 2 e 2t + ( c1 e1t ( e1 − 1) + c 2 et2 ( e2 − 1)) = e1 c1 e1t + e 2 c 2 e 2t

Equ. 3.12

= v (t + 1)

and also − v(t ) + (1 − ϕ ) y (t ) = −c1e1t − c2e2t +

1

ϕ

(c1 e1t (e1 − 1) + c 2 et2 (e 2 − 1)) − (c1 e1t (e1 − 1) + (c 2 et2 (e 2 − 1))

[(e c e (e − 1) + e c e (e − 1))− (e c e (e − 1) + e c e (e − 1))− ϕ c e − ϕ c e + (c e (e − 1) + c e (e − 1) ) − ϕ (c e (e − 1) + c e (e − 1) )] 1 = [(e c e (e − 1) + e c e (e − 1) ) − c e (e − (ϕ − 2 )e + 1) − c e (e − (ϕ − 2 )e + 1)] ϕ =

1

ϕ

t 111 1

t 11 1

t 2 2 2

t 2 2

t 111 1

t 111 1

2

t 11 1

2

t 2 2 2

t 2 2 2

t 2 2

t 2 2

2

t 2 11 1

2

t 11

2

1

t 2 2 2 2

Equ. 3.13

2

= y (t + 1)

3.3

General implicit and explicit representations (IR and ER)

A more general implicit representation (IR) is produced by adding five coefficients {α , β , γ , δ ,η} , which will allow us to identify how the coefficients can be chosen in order to ensure convergence. With these coefficients the system becomes vt +1 = αvt + βϕyt yt +1 = −γvt + (δ − ηϕ ) yt

ϕ ∈ R+* ∀t ∈ N , {yt , vt } ∈ R

Equ. 3.14 2

α The matrix of the system is now M ' = − γ

βϕ . Let e1' and e 2' be its eigenvalues. δ − ηϕ

The explicit (analytic) representation (ER) becomes

( ) ( ) (( )

v(t ) = c1 e ' t + c2 e ' t 1 2 1 ' t c1 e1 (e1 ' − α ) + c2 e2 ' t (e2 ' − α ) y (t ) = βϕ

( )

)

ϕ ∈ R+*

Equ. 3.15

∀t ∈ N , {y (t ), v(t )} ∈ R 2

with − βϕy (0) − (α − e2 ')v(0) c1 = e2 ' − e1 ' ' c = βϕy (0) + (α − e1 )v(0) 2 e2 ' − e1 '

Equ. 3.16

Now the constriction coefficients (see Section 4 for details) χ 1 and χ 2 are defined by e1' = χ1e1 . ' e2 = χ 2e2

Equ. 3.17

with 2 e = 1 − ϕ + ϕ − 4ϕ 1 2 2 2 ϕ − 4ϕ ϕ e 2 = 1 − − 2 2

Equ. 3.18

which are the eigenvalues of the basic system. By computing the eigenvalues directly and using Equ. 3.17, χ 1 and

χ 2 are: 2 2 χ = α + δ − ηϕ + (ηϕ ) + 2ϕ (αη − δη − 2 βγ ) + (α − δ ) 1 2 − ϕ + ϕ 2 − 4ϕ α + δ − ηϕ − (ηϕ )2 + 2ϕ (αη − δη − 2 βγ ) + (α − δ )2 = χ 2 2 − ϕ − ϕ 2 − 4ϕ

Equ. 3.19

The final complete explicit representation can then be written from 3.15 and 3.16 by replacing e1' and e 2' , respectively, by χ 1 e1 and χ 2 e 2 , and then e1, e 2 , χ 1, χ 2 by their expressions, as seen in Equ. 3.18 and 3.19. It is immediately worth noting an important difference between IR and ER. In the IR, t is always an integer and v(t) and y(t) are real numbers. In the ER real numbers are obtained if and only if t is an integer; nothing however prevents the assignment of any real positive value to t, in which case v(t) and y(t) become true complex numbers. This fact will provide an elegant way of explaining the system’s behavior, by conceptualizing it in a 5-dimensional space, as discussed in Section 4.

Note 3.1 If χ 1 and χ 2 are to be real numbers for a given ϕ value, there must be some relations among the five real coefficients {α , β , γ , δ ,η} . If the imaginary parts of χ 1 and χ 2 are set equal to zero, the following is obtained: 1 E (1 + sign(E )) B (1 − A) = 0 E (1 − sign(E ))C − α + δ − ηϕ + 2 E (1 − sign(E ))C ′ − α + δ − ηϕ − 1 E (1 + sign(E )) B (1 − A) = 0 2

with

(

A = sign ϕ 2 − 4ϕ

Equ. 3.20

)

B = ϕ 2 − 4ϕ

( (

( (

)) ))

1 ϕ 2 − 4ϕ 1 + sign ϕ 2 − 4ϕ 2 1 C′ = 2 − ϕ − ϕ 2 − 4ϕ 1 + sign ϕ 2 − 4ϕ 2 2 1 D = C 2 + ϕ 2 − 4ϕ 1 − sign ϕ 2 − 4ϕ 4 C = 2 −ϕ +

(

(

Equ. 3.21

))

E = (ηϕ )2 + 2ϕ (αη − δη − 2 βγ ) + (α − δ )2

The two equalities of Equ. 3.20 can be combined and simplified as follows: E (1 − sign(E ))(2 − ϕ ) − (α + δ − ηϕ ) B (1 − A) = 0 E B sign(E )(1 + A) = 0

Equ. 3.22

The solutions are usually not completely independent of ϕ. In order to satisfy these equations, a set of possible conditions is E > 0 Equ. 3.23 A = −1 (⇔ ϕ < 4 ) α + δ − ηϕ = 0 But these conditions are not necessary. For example, an interesting particular situation (studied below) exists where

α = β = γ = δ = η = χ ∈ R+* . In this case χ1 = χ 2 = χ for any ϕ value, and Equ. 3.20 is always satisfied.

3.4 From ER to IR The explicit representation will be useful to find convergence conditions. Nevertheless, in practice, the iterative form obtained from Equ. 3.19 is very useful: 2(α + δ − ηϕ ) = (χ + χ )(2 − ϕ ) + (χ − χ ) ϕ 2 − 4ϕ 1 2 1 2 Equ. 3.24 2 (ηϕ )2 + 2ϕ (αη − δη − 2 βγ ) + (α − δ )2 = (χ 1 + χ 2 ) ϕ 2 − 4ϕ + (χ1 − χ 2 )(2 − ϕ )

Although there are an infinity of solutions in terms of the five parameters {α , β ,γ ,δ ,η} , it is interesting to identify some particular classes of solutions. This will be done in the next section. 3.5

Particular Classes of Solutions

3.5.1 Class 1 model The first model implementing the five-parameter generalization is defined by the following relations: α = δ βγ = η 2

Equ. 3.25

In this particular case, α and η are α = 1 2(χ + χ ) + (χ − χ ) ϕ 2 − 4ϕ + ϕ 2 − ϕ 1 2 1 2 4 ϕ 2 − 4ϕ η = 1 χ1 + χ 2 + 2 − ϕ (χ1 − χ 2 ) 2 ϕ 2 − 4ϕ

Equ. 3.26

An easy way to ensure real coefficients is to have χ1 = χ 2 = χ ∈ R . Under this additional condition, a class of solution is simply given by

α = β = γ = δ =η = χ

Equ. 3.27

3.5.2 Class 1' model A related class of model is defined by the following relation: α = β γ = δ = η

Equ. 3.28

The following expressions for α and γ are derived from Equ. 3.24: (χ1 + χ 2 )(2 − ϕ ) + (χ1 − χ 2 ) ϕ 2 − 4ϕ = + ϕ −1 α 2 Equ. 3.29 (χ + χ )(ϕ − 2 ) − (χ − χ ) ϕ 2 − 4ϕ 2 1 2 1 1 γ = 4(ϕ − 1) m 2 χ 2 ϕ 2 − 4ϕ + 2 − ϕ ϕ 2 − 4ϕ + 2 χ 2 ϕ 2 − 4ϕ + 2 + ϕ ϕ 2 − 4ϕ + 8χ χ (2ϕ − 1) 1 2 1 2

If the condition χ 1 = χ 2 = χ is added, then

α = (2 − ϕ )χ + ϕ − 1 χ γ = χ or γ = ϕ − 1

Equ. 3.30

Without this condition, one can choose a value for γ, for example γ = 1 , and a corresponding α value (χ 1 , χ 2 ) which give a convergent system. 3.5.3 Class 1" model A second model related to the Class 1 formula is defined by

α = β = γ =η

α=

Equ. 3.31

2δ + (χ1 + χ 2 )(ϕ − 2) − (χ1 − χ 2 ) ϕ 2 − 4ϕ 2(ϕ − 1)

Equ. 3.32

For historical reasons and for its simplicity, the case δ = 1 has been well studied. See Section 4.3 for further discussion. 3.5.4

Class 2 model

A second class of models is defined by the relations α = β = 2δ η = 2γ

Under these constraints it is clear that 2 2(3δ − 2γϕ ) = (χ1 + χ 2 )(2 − ϕ ) + (χ1 − χ 2 ) ϕ − 4ϕ 2 2γϕ − δ = (χ1 + χ 2 ) ϕ 2 − 4ϕ + (χ1 − χ 2 )(2 − ϕ )

Equ. 3.33

Equ. 3.34

which gives us γ and δ, respectively. Again, an easy way to obtain real coefficients for every ϕ value is to have χ1 = χ 2 = χ . In this case, 3δ − 2γϕ = χ (2 − ϕ ) 2 2γϕ − δ = χ ϕ − 4ϕ

Equ. 3.35

In the case where 2γϕ ≥ δ the following is obtained: 2 δ = χ 2 − ϕ + ϕ − 4ϕ = χe 1 2 2 − ϕ + 3 ϕ 2 − 4ϕ γ = χ 4ϕ

From the standpoint of convergence it is interesting to note that we have

Equ. 3.36

•

for the Class 1 models, with the condition χ1 = χ 2 = χ e' = χ e 1 1 e2' = χ e2

•

Equ. 3.37

for the Class 1' models, with the conditions χ1 = χ 2 = χ and ϕ ≤ 2

(

)+ 4χ (ϕ − 2) + 4(ϕ − 1) ≤ χ e

2 2 e' = χ 1 − ϕ + χ 4 − 4ϕ + ϕ 1 2 χ 2 4 − 4ϕ + ϕ 2 ϕ e2' = χ 1 − − 2

(

•

1

2

=χ

)+ 4χ (ϕ − 2) + 4(ϕ − 1) ≤ χ e

2

2

Equ. 3.38 =χ

and for the the Class 2 models

' 3 3 3 1 2 1 3 3 2 1 2 3 2 e1 = χ − ϕ + ∆ − ϕ + ϕ − ϕ ∆ + 2 − ϕ − 2ϕ − 2ϕ + ∆ − 3ϕ ∆ = χ e1,class 2 2 4 4 2 4 4 4 e2' = χ 3 − 3 ϕ + 3 ∆ − 1 ϕ 2 + 1 ϕ 3 − 3 ϕ 2 ∆ − 1 2 − ϕ − 2ϕ 2 − 2ϕ 3 + ∆ − 3ϕ 2 ∆ = χ e2,class 2 2 4 4 2 4 4 4

Equ. 3.39

with ∆ = ϕ 2 − 4ϕ

This means that we will just have to choose χ <

1 | e2 |

, χ < 1 and ϕ ≤ 2 , χ <

1 e 2, class 2

respectively, to have a

convergent system. This will be discussed further in Section 4. 3.6

Removing the discontinuity

Depending on the parameters {α , β , γ , δ ,η} the system may have a discontinuity in ϕ due to the presence of the term

(ηϕ )2 − 4 βγϕ + (α − δ )2 + 2ηϕ (α − δ )

in the eigenvalues.

Thus, in order to have a completely continuous system, the values for {α , β , γ , δ ,η} must be chosen such that {α , β , γ , δ ,η} ∈ R 5 ∀ϕ ∈ R + , (ηϕ )2 − 4 βγϕ + (α − δ )2 + 2ηϕ (α − δ ) ≥ 0

Equ. 3.40

By computing the discriminant, the last condition is found to be equivalent to

βγ (− βγ + η (α − δ )) > 0

Equ. 3.41

In order to be “physically plausible,” the parameters {α , β , γ , δ ,η} must be positive. So the condition becomes

βγ < η (α − δ )

Equ. 3.42

The set of conditions taken together specify a volume in R 5 for the admissible values of the parameters.

3.7

Removing the imaginary part

When the condition specified in Equ. 3.42 is met, the trajectory is usually still partly in a complex space whenever one of the eigenvalues is negative, due to the fact that (−1)t is a complex number when t is not an integer. In order to prevent this, we must find some stronger conditions in order to maintain positive eigenvalues. Since e1' > 0 ⇔ ' e2 > 0

e1' + e2' > 0 ' ' e1e2 > 0

Equ. 3.43

the following conditions can be used to ensure positive eigenvalues: α (δ − ηϕ ) + γβϕ > 0 α + δ − ηϕ > 0

Equ. 3.44

Note 3.2 From an algebraic point of view, the conditions described in Equ. 3.43 can be written as det( M ) > 0 Equ. 3.45 trace( M ) > 0 But now these conditions depend on ϕ. Nevertheless, if the maximum ϕ value is known, they can be rewritten as: αδ αη − γβ > ϕ max α +δ > ϕ max η

Equ. 3.46

Under these conditions, all system variables are real numbers. in conjunction with the conditions in Equ’s. 3.42 and 3.44, the parameters can be selected so that the system is completely continuous and real. 3.8 Example As an example, suppose that α = β = 1 and δ = η . Now the conditions become 1 δ < ϕ max δ (ϕ max − 1) < γ < δ (1 − δ ) ϕ max

For example, when

Equ. 3.47

ϕ max = 10 y0 = 0, v0 = 1 α = β = 1 1 δ (ϕ max − 1) + δ (1 − δ ) = 0.08915 γ = 2 ϕ max 0.99 = 0.099 δ = η = ϕ max

,

Equ. 3.48

the system converges quite quickly after about 25 time steps, and at each time step the values of y and v are almost the same over a large range of ϕ values. Figure 3.1 shows an example of convergence (v≥0 and y≥0 ) for a continuous, real-valued system with ϕ = 4. 3.9 Reality and convergence The quick convergence seen in the above example suggests an interesting question. Does reality – using real-valued variables – imply convergence? In other words, does the following hold for real-valued system parameters? αδ e' < 1 αη − γβ > ϕ max 1 ⇒ e2' < 1 α +δ > ϕ max η

Equ. 3.49

The answer is no. It can be demonstrated that convergence is not always guaranteed for real-valued variables. For example, given the following parameterization: ϕ max = 10 y0 = 0, v0 = 1 α = β = 1.1 γ = 0.0891495 δ = η = 0.099

Equ. 3.50

αδ αη − γβ = 10.05 > ϕ max α + δ = 12.11 > ϕ max η

Equ. 3.51

the relations are

which will produce system divergence when ϕ = 0.1 (for instance), since e1' = 1.09 > 1 . This is seen in Figure 3.2. ---------------Figures 3.1 and 3.2 ---------------

4. CONVERGENCE AND SPACE OF STATES From the general explicit representation we find the criterion of convergence: e' < 1 1 e2' < 1

Equ. 4.1

where v(t ) and y (t ) are usually true complex numbers. Thus, the whole system can be represented in a 5-dimension space (Re( y ), Im( y ), Re(v ), Im(v), ϕ ) . In this section we study some examples of the most simple class of constricted cases: the ones with just one constriction coefficient. These will allow us to devise methods for controlling the behavior of the swarm in ways that are desirable for optimization. 4.1 Constriction For Model Type 1 Model Type 1 is described as follows: v(t + 1) = χ (v(t ) + ϕy (t )) y (t + 1) = − χ (v(t ) + (1 − ϕ ) y (t ) )

Equ. 4.2

1 1 . Since e1 ≤ e 2 the constriction We have seen that the convergence criterion is satisfied when χ < min , e1 e2 coefficient below is produced:

χ=

κ e2

, κ ∈ ]0,1[

Equ. 4.3

4.2 Constriction for model Type 1' Just as a constriction coefficient was found for the Type 1 model, the following implicit representation (with χ instead of α) is used for Type 1': v(t + 1) = χ (v(t ) + ϕy (t )) y (t + 1) = −v(t ) + (1 − ϕ ) y (t )

Equ.4.4

The coefficient becomes

χ=

κ e2

, κ ∈ ]0,1[, for ϕ ∈ ]0,2[ .

Equ.4.5

But, as seen above, this formula is a priori valid only when ϕ<2, so it is interesting to find another constriction coefficient that has desirable convergence properties. We have here e2' =

χ +1−ϕ 2

−

(χ − 1)2 + ϕ 2 − 2ϕ − 2 χϕ 2

]

Equ.4.6

[

The expression under the square root is negative for χ ∈ 1 + ϕ − 2 ϕ ,1 + ϕ + 2 ϕ . In this case, the eigenvalue is a true complex number, and e2' = χ . Thus, if 1 + ϕ − 2 ϕ < 1 , that is to say if ϕ<4, a χ needs to be selected such

]

[

that χ ∈ 1 + ϕ − 2 ϕ ,1 in order to satisfy the convergence criterion. So, for example, define χ as

χ=

2 +ϕ − 2 ϕ 2

, for ϕ ∈ ]0,4[

Equ.4.7

Now, can another formula for greater ϕ values be found? The answer is “No”. For in this case, e 2' is a real number, and its absolute value is

] strictly decreasing on α ∈ [1 + ϕ + 2

]

• strictly decreasing on α ∈ 0,1 + ϕ − 2 ϕ , and the minimal value is ϕ − 1 (greater than 1), •

[

ϕ , ∞ , with a limit of 1.

For simplicity, the formula can be the same as for Type 1, not only for ϕ<2, but also for ϕ<4. This is indeed also

(

possible, but then κ can not be too small, depending on ϕ. More precisely, the constraint κ > 1 + ϕ − 2 ϕ

) e2

must be satisfied. But as for ϕ<4, we have e2 = 1 , this just means that the curves in Figures 4.1 and 4.2 can then be interpreted as the mean and minimally acceptable χ values for sure convergence. For example, for ϕ=3, the constraint κ>0.536 must hold. But there is no such restriction on κ ∈ ]0,1[ if ϕ=1. ----------------------Figures 4.1 and 4.2 --------------------Note 4.1 The above analysis is for ϕ=constant. If ϕ is random, it is nevertheless possible to have convergence, even with a small constriction coefficient, when at least one ϕ value is strictly inside the interval of variation. 4.3 Constriction type 1" Referring to the Class 1" model, in the particular case where δ = 1 , we use the following implicit representation (with χ instead of α) v(t + 1) = χ (v(t ) + ϕy (t )) Equ. 4.8 y (t + 1) = − χv(t ) + (1 − χϕ ) y (t ) In fact, this system is hardly different from the classical particle swarm as described in the Section 1 v(t + 1) = χ (v(t ) + ϕ ( p − x(t ))) Equ. 4.9 x(t + 1) = v(t + 1) + x (t ) so it may be interesting to detail how, in practice, the constriction coefficient is found and its convergence properties proven. Step 1. Matrix of the system

χϕ χ We have immediately M = . − χ 1 − χϕ

Equ. 4.10

Step 2. Eigenvalues They are the two solutions for the equation or Thus,

Z 2 − trace(M )Z + determinant (M ) = 0

Equ. 4.11

Z 2 − (χ + 1 − χϕ )Z + χ = 0

Equ. 4.12

' χ + 1 − χϕ + ∆ e1 = 2 χ + 1 − χϕ − ∆ ' e2 = 2

with

Equ. 4.13

∆ = trace(M )2 − 4determinant (M ) 2 1 1 = χ 2 ϕ 2 − 4ϕ + 2ϕ 1 − + 1 − χ χ

Equ. 4.14

Step 3. Complex and real areas on ϕ 1 2 1 2 ,1 + + The discriminant ∆ is negative for the ϕ values in 1 + − . In this area the eigenvalues are true χ χ χ χ

complex numbers and their absolute value (i.e., module) is simply

χ .

Step 4. Extension of the complex region and constriction coefficient In the complex region, according to the convergence criterion, χ < 1 in order to get convergence. So the idea is to find a constriction coefficient depending on ϕ so that the eigenvalues are true complex numbers for a large field of ϕ values. This is generally the most difficult step and sometimes needs some intuition. Three pieces of information help us here: • The determinant of the matrix is equal to χ , •

This is the same as in Constriction Type 1, and

•

We know from the algebraic point of view the system is (eventually) convergent like M T

So it appears very probable that the same constriction coefficient used for Type 1 will work. First, we try

χ=

κ e2

,κ ∈ ]0,1[

2κ for ϕ > 4 2 ϕ − 2 + ϕ − 4ϕ that is to say else κ In this case the common absolute value of the eigenvalues is 2κ for ϕ > 4 2 ϕ − 2 + ϕ − 4ϕ else κ

Equ. 4.15

Equ. 4.16

Equ. 4.17

which is smaller than 1 for all ϕ values as soon as κ is itself smaller than 1.

It is easy to see that ∆ is negative only between ϕ min and ϕ max , depending on κ . The general algebraic form of

ϕ max is quite complicated (polynomial in κ 6 with some coefficients being roots of an equation in κ 4 ) so it is much easier to compute it indirectly for some κ values. If ϕ min is smaller than 4, then χ = κ and by solving ∆ = 0 we κ 2 + κ − 2κ 3 2

1 . Figure 4.3 shows how the discriminant 9 κ depends on ϕ , for two κ values. It is negative between the ϕ values given below:

find that ϕ min =

2

. This relation is valid as soon as κ ≥

κ

ϕ min

ϕ max

0.4 0.99

0.3377 0.000025

8.07 39799.76 --------------------Figure 4.3 ---------------------

4.3

Moderate constriction

While it is desirable for the particle’s trajectory to converge, by relaxing the constriction the particle is allowed to oscillate through the problem space initially, searching for improvement. Therefore it is desirable to constrict the system moderately, preventing explosion while still allowing for exploration. To demonstrate how to produce moderate constriction the following explicit representation is used: v(t ) = c e t + c (χe )t 1 1 2 2 1 t t y (t ) = c1e1 (e1 − 1) + c 2 (χe2 ) ( χe2 − 1) ϕ

(

χ=

κ e2

)

Equ.4.18

, κ ∈ ]0,1[

χ1 = 1 that is to say . From the relations between ER and IR, the following is obtained: χ 2 = χ 2(α + δ − ηϕ ) = (1 + χ )(2 − ϕ ) + (1 − χ ) ϕ 2 − 4ϕ Equ.4.19 2 (ηϕ )2 + 2ϕ (αη − δη − 2 βγ ) + (α − δ )2 = (1 + χ ) ϕ 2 − 4ϕ + (1 − χ )(2 − ϕ ) There is still an infinity of possibilities for selecting the parameters α..η. In other words there are many different implicit representations that produce the same explicit one. For example:

2 α = ϕ + 2 χ − χϕ + ϕ − 4ϕ (1 − χ ) 2 2 1 2 2 2 ϕ − 3χϕ − ϕ + χϕ + ϕ − 4ϕ (1 + χϕ − χ − ϕ ) β = − ϕ 2 γ = δ = η = 1

or

Equ.4.20

α = β = 1 ϕ (1 + χ ) − ϕ 2 − 4ϕ (1 − χ ) γ = 2ϕ ϕ + χ (ϕ − 2 ) − ϕ 2 − 4ϕ (1 − χ ) δ η = = 2(ϕ − 1)

Equ.4.21

From a mathematical point of view, this case is richer than the previous ones. There is no more explosion, but there is not always convergence either. This system is “stabilized,” in the sense that the representative point in the state space tends to move along an attractor which is not always reduced to a single point as in classical convergence. 4.4 Attractors and convergence Figures 4.4 through 4.7 show the “real” restrictions (Re( y ), Re(v), ϕ ) of the particles that are typically studied. We can clearly see the three cases: • “Spiral” easy convergence towards a nontrivial attractor for ϕ<4 (Figure 4.5), • Difficult convergence for ϕ≈4 (Figure 4.6), and • Quick almost linear convergence for ϕ>4 (Figure 4.7). - - - - - - -- - - - - - - - - - - - - - -Figure 4.4 - - - - - - -- - - - - - - - - - - - - - -- - - - - - -- - - - - - - - - - - - - - -Figure 4.5 - 4.10 - - - - - - -- - - - - - - - - - - - - - --

---------

Nevertheless, it is interesting to have a look at the true system, including the complex dimensions. Figures 4.8 through 4.11 show some other sections of the whole surface in R 5 . - - - - - - -- - - - - - - - - - - - - - - - - Figure 4.11. Global attractor for v and ϕ ≤ 4 . Axis (Re(v), Im(v), ϕ), κ=0.8 - - - - - - -- - - - - - - - - - - - - - -- - - Note 4.2 There is a discontinuity, for the radius is equal to zero for ϕ>4. - - - - - - -- - - - - - - - - - - - - - -- - - Figure 4.12 - - - - - - -- - - - - - - - - - - - - - -- - - Thus, what it seems to be an “oscillation” in the real space is in fact a continuous spiralic movement in a complex space. More importantly, the attractor is very easy to define: it is the “circle” c1e1t (center (0,0) and radius ρ). When ϕ < 4, ρ = c1e1 and when ϕ >4, then ρ = 0 ( lim t →∞ c1e1t with e1 < 1 ), for the constriction coefficient χ has been precisely chosen so that the part c 2 (χe 2 )t of v(t) tends to zero. This provides an intuitive way to transform

this stabilization into a true convergence. We just have to use a second coefficient in order to reduce the attractor, in the case ϕ < 4 , so that κ′ e1 → χ ′e1 , χ ′ ≤ , κ ′ ∈ ]0,1[ Equ.4.22 e1 The models studied here have only one constriction coefficient. If one sets χ ′ = χ , the Type 1 constriction is produced. But now, we understand better why it works.

5.

GENERALIZATION OF THE PARTICLE SWARM SYSTEM

Thus far the focus has been on a special version of the particle swarm system, a system reduced to scalars, collapsed terms, and nonprobabilistic behavior. The analytic findings though can easily be generalized to the more usual case where ϕ is random and two vector terms are added to the velocity. In this section the results are generalized back to the original system as defined by v (t + 1) = v(t ) + ϕ1 ( p1 − x(t )) + ϕ 2 ( p 2 − x(t )) Equ. 5.1 x(t + 1) = v(t + 1) + x(t ) Now ϕ, p, and y(t) are defined to be ϕ = ϕ1 + ϕ 2

ϕ p + ϕ 2 p2 p= 1 1 ϕ1 + ϕ 2

Equ. 5.2

y (t ) = p − x (t )

to obtain exactly the original nonrandom system described in Section 1. For instance, if there is a cycle for ϕ = ϕ c , then there is an infinity of cycles for the values {ϕ1 , ϕ 2 } so that ϕ1 + ϕ 2 = ϕ c . Upon computing the constriction coefficient, the following form is obtained: κ κ 2κ = χ= = e2 ϕ (ϕ − 4) 2 − ϕ − ϕ (ϕ − 4) ϕ 1− − 2 2 =

2κ 2 − ϕ1 − ϕ 2 −

(ϕ1 + ϕ 2 )(ϕ1 + ϕ 2 − 4)

, if (ϕ1 + ϕ 2 ) > 4

Equ. 5.3

= κ else κ ∈ ] 0,1 [

Coming back to the (v,x) system, v and x are v(t + 1) = v(t ) + ϕ1 ( p1 − x(t )) + ϕ ( p − x(t )) 2 2 ϕ1 p1 + ϕ 2 p2 x(t + 1) = χv(t + 1) + χx(t ) + (1 − χ ) ϕ + ϕ 1 2

Equ. 5.4

The use of the constriction coefficient can be viewed as a recommendation to the particle to “take smaller steps.” ϕ p + ϕ 2 p2 ) . Remember v is in fact the velocity of the particle, The convergence is towards the point (v = 0, x = 1 1 ϕ1 + ϕ 2 2

so it will indeed be equal to zero in a convergence point . Example v 0 = 1, x 0 = 4.5 p1 = 3, p 2 = 4 ϕ max,1 = 0.1, ϕ max,2 = 5

ϕ1 and ϕ2 are uniform random variables between 0 and ϕmax,1 and ϕmax,2 respectively. This example is shown in Figure 5.1. --------------Figure 5.1

------------6. RUNNING THE PARTICLE SWARM WITH CONSTRICTION COEFFICIENTS As a result of the above analysis, the particle swarm algorithm can be conceived of in such a way that the system’s explosion can be controlled, without resorting to the definition of any arbitrary or problem-specific parameters. Not only can explosion be prevented, but the model can be parameterized in such a way that the particle system consistently converges on local optima. (Except for a special class of functions, convergence on global optima cannot be proven.) The particle swarm algorithm can now be extended to include many types of constriction coefficients. The most general modification of the algorithm for minimization is presented in the following pseudocode: Assign κ, ϕmax Calculate χ, α, β, γ, δ, η Initialize population: random xi, vi Do For i=1 to population size if f(xi)

Convergence implies velocity=0, but the convergent point is not necessarily the one we want, particularly if the system is too constricted. We hope to show in a later paper how to cope with this problem, by defining the optimal parameters.

Next i Until termination criterion is met In this generalized version of the algorithm, the user selects the version and chooses values for κ and ϕ that are consistent with it. Then the two eigenvalues are computed, and the greater one is taken. This operation can be performed as follows: discrim = a=

2 2 (ηϕ ) − 4 βγ + (α −δ ) + 2ηϕ (α − δ ) 4

α + δ − µϕ

2 if (discrim>0) then

neprim1= abs( a + discrim ) neprim2= abs( a − discrim ) else neprim1= a 2 + abs( discrim ) neprim2=neprim1 max(eig.)=max(neprim1, neprim2) These steps are taken only once in each program, and thus do not slow it down. For the versions tested in this paper, the constriction coefficient is calculated simply as χ =

κ

max(eig.)

. For instance, the Type 1 version is

defined by the rules α = β = γ = δ = η = χ. The generalized description allows the user to control the degree of convergence by setting κ to various values. For instance, in the Type 1″ version, κ=1.0 results in slow convergence, meaning that the space is thoroughly searched before the population collapses into a point. In fact, the Type 1″ constriction particle swarm can be programmed as a very simple modification to the standard version presented in the Introduction. The constriction coefficient χ is calculated as shown in Equation 4.16: 2κ for ϕ > 4 χ = ϕ − 2 + ϕ 2 − 4ϕ else κ

The coefficient is then applied to the right side of the velocity adjustment: Calculate χ Initialize population Do For i= 1 to Population Size r r r r if f( x i )

(

xid = xid + vid Next d Next i Until termination criterion is met

)

Note that the algorithm now requires no explicit limit V max . The constriction coefficient makes it unnecessary. Eberhart and Shi (2000) have recommended, based on their experiments, that a liberal V max , for instance, one that is equal to the dynamic range of the variable, be used in conjunction with the Type 1″ constriction coefficient. Though this extra parameter may enhance performance, the algorithm will still run to convergence even if it is omitted. 7. EMPIRICAL RESULTS Several types of particle swarms were used to optimize a set of unconstrained real-valued benchmark functions, namely several of De Jong’s (1975) functions, Schaffer’s f6, and the Griewank, Rosenbrock, and Rastrigin functions. A population of 20 particles was run for twenty trials per function, with the best performance evaluation recorded after 2,000 iterations. Some results from Angeline’s (1998) runs using an evolutionary algorithm are shown for comparison. Though these functions are commonly used as benchmark functions for comparing algorithms, different versions of them have appeared in the literature. The formulas used here for De Jong’s f1, f2, f4 (without noise), f5, and Rastrigin functions are taken from Reynolds and Chung (1997). Schaffer’s f6 function is taken from Davis (1991), 3rd edition; note that earlier editions give a somewhat different formula. The Griewank function given here is the one used in the First International Contest on Evolutionary Optimization held at ICEC 96, and the Rosenbrock function is taken from Angeline (1998). Table 7.1. Functions used to test the effects of the constriction coefficients. Sphere function (De Jong’s f1) f ( x) = ∑in=1 x 2 i

1

Rosenbrock variant (De Jong’s f2) De Jong’s f4 – no noise

2

f 2 ( x) = 100( x12 − x 2) + (1− x1) 2 n

f 4 ( x) = ∑ i ⋅ xi 4 i =1

Foxholes (De Jong’s f5)

1 f 5 ( x ) = 1 0.002 + ∑ 6 2 j =1 j + ∑ i =1 ( x i − a ij ) 25

Schaffer’s f6 f 6 ( x) = 0.5 +

Griewank function Rosenbrock function

f 7 ( x) =

7.1.

(1.0 + 0.001( x 2 + y 2))

2

1 2 ( − 100) ) + 1 ∑in=1 ( xi −100) − ∏in=1 cos( xi i 4000 n

2

f 9 ( x) = ∑ (100( xi +1− x 2) + ( xi −1)2) i =1

Rastrigin function

(sin x 2 + y 2 ) − 0.5

[

2 f 10 ( x) = ∑in=1 xi − 10 cos(2π x i ) + 10

]

Algorithm variations used

Three variations of the generalized particle swarm were used on the problem suite. Type 1. The first version applied the constriction coefficient χ to all terms of the formula: α = β = γ = δ =η = χ using κ = 0.8 . Type 1″. The second version tested was a simple constriction which was not designed to converge, but not to explode, either, as κ was assigned a value of 1.0. The model was defined as: α=β =χ γ = δ = η = 1.0

Experimental version. The third version tested was more experimental in nature. The constriction coefficient

χ was initially defined as

κ

. If χ > 1 then it was multiplied by 0.9 iteratively until χ ≤ 1 . Once a max(e1 , e 2) satisfactory value was found, the following model was implemented: α = β =1 γ =χ

δ =η = χ2 As in the first version, a “generic” value of κ = 0.8 was used. Table 7.2 displays the problem-specific parameters implemented in the experimental trials. Table 7.2 Function parameters for the test problems. Function Dimension Initial Range 1 30 ±20 2 2 ±50 4 30 ±20 5 2 ±50 Shaffer’s f6 2 ±100 Griewank 30 ±300 Ackley 30 ±32 Rastrigin 30 ±5.12 Rosenbrock 30 ±10 7.2.

Results

Table 7.3, below, compares various constricted particle swarms’ performance to that of the traditional V max particle swarm and evolutionary optimization (EO) results reported by Angeline (1998). All particle swarm populations comprised 20 individuals. Functions were implemented in 30 dimensions except for f2, f5, and f6, which are given for two dimensions. In all cases except f5, the globally optimal function result is 0.0. For f5 the best known result is 0.998004. The limit of the control parameter ϕ was set to 4.1 for the constricted versions, and 4.0 for the V max versions of the particle swarm. The column labeled “E&S” was programmed according to the recommendations of Eberhart and Shi (2000). This condition included both Type 1″ constriction and V max , with V max set to the range of the initial domain for the function. Function results were saved with six decimal places of precision. Table 7.3. Empirical results. Mean best evaluations at the end of 2,000 iterations for various versions of particle swarm and Angeline’s (1998) evolutionary algorithm. Type 1" Type 1 Exp. E&S Angeline Function V max =2 V max =4 Version (1998) 1 15.577775 59.301901 0 0 0 0 9.8808 2 0.000500 0.0013263 0 0 0 0 4 271.107996 4349.137512 0 0 0 0 5 2.874299 3.564808 0.998004 0.998004 3.507922 0.998004 Shaffer’s f6 0.000464 0.000247 0.001459 0.002915 29.173010 0.000155 Griewank 0.562339 0.968623 0.003944 0.008614 0.038923 0.002095 0.4033 Ackley 4.287476 6.623447 0.204988 0.150886 7.135213 0.104323 Rastrigin 223.834812 299.771716 82.95618 81.689550 63.222601 57.194136 46.4689 Rosenbrock 2770.882599 37111.70703 50.193877 39.118488 47.753953 50.798139 1610.359

As can be seen, the Type 1" and Type 1 constricted versions outperformed the V max versions in almost every case; the experimental version was sometimes better, sometimes not. Further, the Type 1" and Type 1 constricted particle swarms performed better than the comparison evolutionary method on three of the four functions; with some caution we can at least consider the performances to be comparable. Eberhart and Shi’s suggestion to hedge the search by retaining V max with Type 1" constriction does seem to result in good performance on all functions. It is the best on all except the Rosenbrock function, where performance was still respectable. An analysis of variance was performed comparing the “E&S” version with Type 1", standardizing data within functions. It was found that the algorithm had a significant main effect, F(1, 342)=12.02, p<0.0006, but that there was a significant interaction of algorithm with function, F(8,342)=3.68, p<0.0004, suggesting that the gain may not be robust across all problems. These results support those of Eberhart and Shi (2000). Any comparison with Angeline’s evolutionary method should be considered cautiously. The comparison is offered only as a prima facie standard by which to assess performances on these functions after this number of iterations. There are numerous versions of the functions reported in the literature, and it is extremely likely that features of the implementation are responsible for some variance in the observed results. The comparison though does allow the reader to confirm that constricted particle swarms are comparable in performance to at least one evolutionary algorithm on these test functions. As has long been noted, the V max particle swarm succeeds at finding optimal regions of the search space, but has no feature that enables it to converge on optima (e.g., Angeline, 1998). The constriction techniques reported in this paper solve this problem, they do force convergence. The data clearly indicate an increase in the ability of the algorithm to find optimal points in the search space for these problems as a result. No algorithmic parameters were adjusted for any of the particle swarm trials. Parameters such as V max , ϕ, population size, etc. were held constant across functions. Further, it should be emphasized that the population size of 20 is considerably smaller than what is usually seen in evolutionary methods, resulting in fewer function evaluations and consequently faster clock time in order to achieve a similar result. For instance, Angeline’s results cited for comparison are based on populations of 250. 8. CONCLUSIONS This paper explores how the particle swarm algorithm works from the inside, that is, from the individual particle’s point of view. How a particle searches a complex problem space is analyzed, and improvements to the original algorithm based on this analysis are proposed and tested. Specifically, the application of constriction coefficients allows control over the dynamical characteristics of the particle swarm, including its exploration versus exploitation propensities. Though the pseudocode in Section 6 may look different from previous particle swarm programs, it is essentially the same algorithm rearranged to enable the judicious application of analytically-chosen coefficients. The actual implementation may be as simple as computing one constant coefficient and using it to weight one term in the formula. The Type 1" method in fact requires only the addition of a single coefficient, calculated once at the start of the program, with almost no increase in time or memory resources. In the current analysis the sine waves identified by Ozcan and Mohan (1998a; 1998b) turn out to be the real parts of the 5-dimensional attractor. In complex number space, e.g., in continuous time, the particle is seen to spiral toward an attractor which turns out to be quite simple in form: a circle. The real-number section by which this is observed when time is treated discretely is a sine wave. The 5-dimensional perspective completely summarizes the behavior of a particle, and permits the development of methods for controlling the explosion that results from randomness in the system. Coefficients can be applied to various parts of the formula in order to guarantee convergence, while encouraging exploration. Several kinds of coefficient adjustments are suggested in the present paper, but we have barely scratched the surface, and plenty of experiments should be prompted by these findings. Simple modifications based on the present analysis resulted in

an optimizer which appears, from these preliminary results, to be able to find the minima of some extremely complex benchmark functions. These modifications can guarantee convergence, which the traditional V max particle swarm does not. It is possible to modify the algorithm with an increase in efficiency, and without increasing its complexity – in fact, it suggests that no problem-specific parameters may need to be specified. We remind the reader that the real strength of the particle swarm derives from the interactions among particles as they search the space collaboratively. The second term added to the velocity is derived from the successes of others, it is considered a “social influence” term; when this effect is removed from the algorithm performance is r abysmal (Kennedy, 1997). Effectively, the variable p g keeps moving, as neighbors find better and better points in r the search space, and its weighting relative to p i varies with randomly with each iteration. As a particle swarm population searches over time, individuals are drawn toward one another’s successes, with the usual result being clustering of individuals in optimal regions of the space. The analysis of the social-influence aspect of the algorithm is a topic for a future paper.

REFERENCES Angeline, P. (1998). Evolutionary optimization versus particle swarm optimization: Philosophy and performance differences. In V. W. Porto, Saravanan, N., Waagen, D., and Eiben, A. E. (Eds.), Evolutionary Programming VII, 601-610. Berlin: Springer. Davis, L. (Ed.) (1991). Handbook of Genetic Algorithms. New York, NY: Van Nostrand Reinhold. De Jong, K. (1975). An analysis of the behavior of a class of genetic adaptive systems. PhD thesis, University of Michigan. Eberhart, R. C., and Shi, Y. (2000). Comparing inertia weights and constriction factors in particle swarm optimization. Proceedings of the 2000 International Congress on Evolutionary Computation (San Diego, California), IEEE Service Center, Piscataway, NJ, 84-88. Kennedy, J. (1997). The particle swarm: Social adaptation of knowledge. Proceedings of the 1997 International Conference on Evolutionary Computation (Indianapolis, Indiana), IEEE Service Center, Piscataway, NJ, 303-308. Kennedy, J., and Eberhart, R. C. (1995). Particle swarm optimization. Proceedings of the IEEE International Conference on Neural Networks (Perth, Australia), IEEE Service Center, Piscataway, NJ, IV: 1942-1948. Kennedy, J. (1998). Methods of agreement: Inference among the eleMentals. Proceedings of the 1998 IEEE International Symposium on Intelligent Control (ISIC) Held Jointly with the International Symposium on Computational Intelligence in Robotics and Automation and the Intelligent Systems and Semiotics (ISAS) A Joint Conference on the Science and Technology of Intelligent Systems. Ozcan, E., and Mohan, C. K. (1998a). Trajectory analyses for a simple particle swarm optimization system. Foundations of Genetic Algorithms 1998. Ozcan, E., and Mohan, C. K. (1998b). The behavior of particles in a simple PSO system. Proceedings of ANNIE. Reynolds, R. G., and Chung, C-J (1997). Knowledge-based self-adaptation in evolutionary programming using cultural algorithms. Proceedings of the 1997 International Conference on Evolutionary Computation (Indianapolis, Indiana), IEEE Service Center, Piscataway, NJ, 71-76. Shi, Y., and Eberhart, R. C. (1997). Parameter selection in particle swarm adaptation. In V. W. Porto, Saravanan, N., Waagen, D., and Eiben, A. E. (Eds.), Evolutionary Programming VII, 591-600. Berlin: Springer.

Copyright © 2019 PROPERTIBAZAR.COM. All rights reserved.