
Future Generation Computer Systems 20 (2004) 1119–1129

Evolving neural network using real coded genetic algorithm (GA) for multispectral image classification

Zhengjun Liu∗, Aixia Liu, Changyao Wang, Zheng Niu
LARSIS, Chinese Academy of Sciences, The Institute of Remote Sensing Applications, Beijing 100101, China

Available online 20 February 2004

Abstract

This paper investigates the effectiveness of a genetic algorithm (GA) evolved neural network classifier and its application to land cover classification of remotely sensed multispectral imagery. First, the key issues of the algorithm and the general procedures are described in detail. Our methodology adopts a real coded GA strategy hybridized with a back propagation (BP) algorithm. The genetic operators are carefully designed to optimize the neural network, avoiding premature convergence and permutation problems. Second, SPOT-4 XS imagery is employed to evaluate the classifier's accuracy; traditional classification algorithms, such as the maximum likelihood classifier and the back propagation neural network classifier, are also included for comparison. Based on an evaluation of the user's accuracy and kappa statistic of the different classifiers, the superiority of the discussed genetic algorithm-based classifier for simple land cover classification using multispectral imagery is established. Third, a more complicated experiment on CBERS (China–Brazil Earth Resources Satellite) data and its discussion demonstrate that a carefully designed genetic algorithm-based neural network outperforms a gradient descent-based neural network; this is supported by analysis of the changes in the connection weights and biases of the neural network. Finally, some concluding remarks and suggestions are presented.
© 2003 Elsevier B.V. All rights reserved.

Keywords: Genetic algorithm; Land cover classification; Neural network; Remote sensing

1. Introduction

So far, several pattern recognition algorithms have been adopted in remote sensing land cover classification [20], including some newly developed supervised classification methods, for example, the Fuzzy ARTMAP classifier [3,4,8] and the Genetic Classifier [1,17]. Among these methods, the neural network classifier and some other intelligent methods have been recognized as the most promising algorithms.

∗ Corresponding author. Tel.: +86-10-68184427; fax: +86-10-68211420. E-mail address: [email protected] (Z. Liu).

The classification of remotely sensed datasets using artificial neural networks first appeared in the remote sensing literature about 10 years ago [2]. Since then, examples and applications at different scales and with different data sources have become increasingly common. In nearly all cases, the neural network classifier has proved superior to traditional classifiers, usually with 10–20% overall accuracy improvements.

The most widely used neural network model is the multilayer perceptron (MLP), in which the connection weight training is normally completed by a back propagation (BP) learning algorithm [18]. Weight training in MLPs is usually formulated as minimization of an error function, such as the mean square error (MSE) between target and actual outputs averaged over all examples, by iteratively adjusting the connection weights.



One of the essential characteristics of the back propagation algorithm is gradient descent, which has been discussed in many textbooks [16] and software manuals [15].

Despite its popularity as an optimization tool for neural network training, the gradient descent technique has several drawbacks. For instance, the performance of network learning depends strictly on the shape of the error surface, the values of the initial connection weights, and further sophisticated parameters. A typical error surface may have many local minima and be multimodal and/or nondifferentiable, which makes it difficult to meet the desired convergence criterion; the gradient descent-based algorithm therefore often gets stuck in a local minimum while moving across the error surface. Another shortcoming concerns the differential operation itself. Multilayer networks typically use sigmoid transfer functions in the hidden layers. These functions are often called "squashing" functions, since they compress an infinite input range into a finite output range. Sigmoid functions are characterized by the fact that their slope approaches zero as the input gets large. This causes a problem when using steepest descent to train a multilayer network with sigmoid functions: the gradient magnitude becomes smaller and smaller as the input grows, producing only tiny changes in the weights and biases even when they are far from their optimal values.
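To make this saturation effect concrete, here is a small illustrative Python sketch (not from the paper) showing how the sigmoid derivative collapses as the input grows:

```python
# Illustration: the sigmoid derivative f'(x) = f(x) * (1 - f(x))
# approaches zero for large |x|, so gradient descent makes
# vanishingly small weight updates in saturated regions.
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x: float) -> float:
    s = sigmoid(x)
    return s * (1.0 - s)

for x in (0.0, 2.0, 5.0, 10.0):
    print(f"x = {x:5.1f}  f(x) = {sigmoid(x):.6f}  f'(x) = {sigmoid_grad(x):.6f}")
# f'(10) is about 4.5e-5: updates become tiny even when the
# weights are still far from their optimal values.
```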

Of course, there are approaches that try to prevent the gradient descent algorithm from becoming stuck in a local minimum when moving across the error surface [11]. However, none of them really overcomes the existing problems [11,13].

On the other hand, genetic algorithms (GAs) [7] offer an efficient search method for a complex problem space and can be used as powerful optimization tools. Given the above-mentioned problems of gradient descent, replacing it completely with a GA might be advantageous.

Recently, some investigations into neural network training using genetic algorithms have been published [21]. Often only selected problems and their particular solutions are the focus of attention. Few articles on evolutionary neural network classifiers for remote sensing land cover classification have been published in the literature.

With GAs, we can formulate the neural network training process as the evolution of connection weights in the environment determined by the architecture and the learning task. Potential individual solutions (chromosomes in GA terms) to a problem compete with each other through selection, crossover, and mutation operations in order to achieve increasingly better results. With this strategy, a GA can be used effectively in the evolution to find a near-optimal set of connection weights globally without computing gradient information.

During the weight training and adjusting process, the fitness function of a neural network can be defined by considering two important factors: the error between target and actual outputs, and the complexity of the neural network. Unlike the case of gradient descent-based training algorithms, the fitness (or error) function does not have to be differentiable or even continuous, since GAs do not depend on gradient information. Therefore, GAs can handle large, complex, nondifferentiable and multimodal spaces, which are the typical cases in remote sensing classification and many other real world applications.

This paper demonstrates a method that uses a GA to train the neural network for land cover classification. The outline of the paper is as follows. First, in Section 2, we introduce the multilayer feed-forward neural network model, the genetic algorithm, and the methodology for hybridizing a real coded GA with a back propagation algorithm for neural network training. Next, in Section 3, two experiments are presented: a simple land cover classification and comparison on SPOT-4 XS data using our hybrid evolutionary neural network classifier and other classifiers, and a more complicated experiment on CBERS data with its analysis. Finally, in Section 4, some conclusions are reached and future work is proposed.

2. Methodology

2.1. The neural network

Assume we have a three layer feed-forward neural network with m inputs (channels), k outputs (categories), and l hidden nodes. Each neuron in the hidden layer uses the sigmoid function f(x) as its threshold function, and each neuron in the output layer uses the Purelin function p(x) as its threshold function. The neuron outputs of hidden node h (1 ≤ h ≤ l) and output node q (1 ≤ q ≤ k) can be expressed as:

$$z_h = f(W^{\mathrm{T}} X) = f\!\left( \sum_{i=1}^{m} \omega_i x_i - \delta_h \right), \tag{1}$$

$$o_q = p(V^{\mathrm{T}} Z) = p\!\left( \sum_{i=1}^{l} v_i z_i - \delta_q \right), \tag{2}$$

respectively, where the superscript T stands for vector transpose, $W = [\omega_1, \omega_2, \ldots, \omega_i, \ldots, \omega_m]$ is the weight connection vector between the input nodes and hidden node h, $V = [v_1, v_2, \ldots, v_i, \ldots, v_l]$ is the weight connection vector between the hidden nodes and output node q, $X = [x_1, x_2, \ldots, x_i, \ldots, x_m]$ is the input vector for each hidden node, and $Z = [z_1, z_2, \ldots, z_i, \ldots, z_l]$ is the output vector of the hidden nodes. $\delta_h$ and $\delta_q$ are the corresponding biases for hidden node h and output node q, and $z_h$ and $o_q$ are the output neuron responses for node h and node q, respectively. The sigmoid function f(x) is defined as:

$$f(x) = \frac{1}{1 + \mathrm{e}^{-x}}, \tag{3}$$

where $x \in (-\infty, +\infty)$, and the Purelin function p(x) is defined as:

$$p(x) = \alpha x + \beta, \tag{4}$$

where $\alpha$ is a nonzero constant, $\beta$ is the bias, and $\alpha, \beta \in (-\infty, +\infty)$.

Assume we have a set of pattern samples $X = \{X_1, X_2, \ldots, X_n\}$, where n is the number of samples and each sample $X_i$ in X is an m-dimensional feature vector. Let $T = \{T_1, T_2, \ldots, T_i, \ldots, T_n\}$ be the set of corresponding output classes, where each $T_i = [t_1, t_2, \ldots, t_j, \ldots, t_k]$ is a k-dimensional class vector. If the target class for a specific sample is j (1 ≤ j ≤ k), then $t_j = 1$, otherwise $t_j = 0$. For simplicity, denote $o_{ij}$ as the jth actual neuron output for input training sample $X_i$ at the output layer, and $t_{ij}$ as its desired response. The mean square error function for this neural network can be written as:

$$\varepsilon(\mathrm{net}) = \frac{1}{nk} \sum_{i=1}^{n} \sum_{j=1}^{k} (t_{ij} - o_{ij})^2, \tag{5}$$

where $\varepsilon$ is the mean square error and net denotes the neural network.
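As an illustration, the following Python sketch computes the forward pass and the MSE of Eqs. (1)–(5). The NumPy shapes, the random data, and the α = 1, β = 0 defaults for the Purelin layer are assumptions made for convenience, not specifics from the paper:

```python
# A minimal NumPy sketch of Eqs. (1)-(5): a sigmoid hidden layer,
# a purelin output layer, and the MSE averaged over n samples.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(X, W, delta_h, V, delta_q, alpha=1.0, beta=0.0):
    """X: (n, m) inputs; W: (m, l); V: (l, k). Returns (n, k) outputs."""
    Z = sigmoid(X @ W - delta_h)           # Eq. (1): hidden responses z_h
    O = alpha * (Z @ V - delta_q) + beta   # Eqs. (2) and (4): purelin outputs o_q
    return O

def mse(O, T):
    """Eq. (5): mean square error over n samples and k output nodes."""
    n, k = T.shape
    return np.sum((T - O) ** 2) / (n * k)

# placeholder data: 3 bands, 8 hidden nodes, 6 classes, 280 samples
rng = np.random.default_rng(0)
m, l, k, n = 3, 8, 6, 280
X = rng.random((n, m))
T = np.eye(k)[rng.integers(0, k, n)]       # one-hot targets t_ij
W, V = rng.normal(size=(m, l)), rng.normal(size=(l, k))
delta_h, delta_q = rng.normal(size=l), rng.normal(size=k)
print("MSE:", mse(forward(X, W, delta_h, V, delta_q), T))
```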

2.2. Combination of neural network and genetic algorithm

The combination of a genetic algorithm and a neural network for weight training consists of three major phases. The first phase is to decide the representation of the connection weights, i.e., whether to use a binary string form or a real number form directly. The second is to evaluate the fitness of these connection weights by constructing the corresponding neural network through decoding each genome and computing its fitness function and mean square error function. The third is to apply the evolutionary process, namely selection, crossover, and mutation operations, with a genetic algorithm according to fitness. The evolution stops when the fitness is greater than a predefined value (i.e., the training error is smaller than a certain value) or the population has converged.

The technical design of the evolutionary strategy for connection weight training can be described as follows:

(1) Decode each individual (genotype) in the current generation into a set of connection weights. Since this paper uses a straightforward real coded genotype representation, we simply set each neuron's connection weights and bias from its corresponding gene segments. A real coded genotype representation can search the potential solutions in feature space more precisely than a binary representation; moreover, it is simple and intuitive. (A sketch of steps (1)–(3) follows this list.)

(2) Evaluate each set of connection weights by constructing the corresponding neural network structure and computing its total mean square error between actual and target outputs; see Eqs. (1)–(5). The fitness of an individual is determined by the total MSE: the higher the error, the lower the fitness. Our fitness function δ is defined as:

$$\varepsilon^{*}(\mathrm{net}_i) = \frac{\varepsilon(\mathrm{net}_i) - \min(\varepsilon(\mathrm{net}_i))}{\max(\varepsilon(\mathrm{net}_i)) - \min(\varepsilon(\mathrm{net}_i))}, \tag{6}$$

$$\delta(\mathrm{net}_i) = \mathrm{e}^{-\psi \varepsilon^{*}(\mathrm{net}_i)}. \tag{7}$$


Here, Eq. (6) shows a MSE normalization operation applied to each MLP represented by a chromosome, and Eq. (7) shows the actual fitness function. In Eq. (7), ψ is a positive constant; from our experience, setting ψ to 6.0 works well.

(3) Select parents for reproduction based on their fitness. A roulette wheel selection scheme is adopted in our experiments [7,12]. The population of the current generation is mapped onto a roulette wheel, where each chromosome is represented by a space proportional to its fitness.

(4) Apply search operators, in conjunction with the crossover and/or mutation operators, to parent chromosomes to generate offspring, which form the next generation [10]. An asexual reproduction operator, an arithmetical crossover operator [14], a single point random mutation operator, and a nonuniform mutation operator [14] are applied in the experiments of this article.
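The following hedged Python sketch illustrates steps (1)–(3): decoding a real-coded chromosome into the weight matrices and biases of the MLP of Section 2.1, scoring a population with Eqs. (6)–(7) (ψ = 6.0 as in the text), and roulette-wheel selection. The gene-segment layout and function names are assumptions of this sketch:

```python
import numpy as np

def decode(chrom, m, l, k):
    """Step (1): split a flat gene vector into W, delta_h, V, delta_q."""
    i = 0
    W = chrom[i:i + m * l].reshape(m, l); i += m * l
    delta_h = chrom[i:i + l];             i += l
    V = chrom[i:i + l * k].reshape(l, k); i += l * k
    delta_q = chrom[i:i + k]
    return W, delta_h, V, delta_q

def fitness(errors, psi=6.0):
    """Step (2), Eqs. (6)-(7): min-max normalize the MSEs, exponentiate."""
    e = np.asarray(errors, dtype=float)
    spread = e.max() - e.min()
    e_star = (e - e.min()) / spread if spread > 0 else np.zeros_like(e)
    return np.exp(-psi * e_star)          # higher error -> lower fitness

def roulette_select(pop, fit, rng):
    """Step (3): sample one parent with probability proportional to fitness."""
    return pop[rng.choice(len(pop), p=fit / fit.sum())]

# usage: decode() would feed the forward()/mse() sketch of Section 2.1;
# here the per-network MSEs are random stand-ins.
rng = np.random.default_rng(1)
m, l, k = 3, 8, 6
pop = rng.uniform(-1.0, 1.0, size=(100, m * l + l + l * k + k))
errors = rng.random(100)                  # stand-in for Eq. (5) per chromosome
parent = roulette_select(pop, fitness(errors), rng)
```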

Crossover operators: For the asexual reproduction operator, the best 10% of the chromosomes in the current generation are copied directly into the next generation as offspring. For the arithmetical crossover operator, assume that $C^1 = (c^1_1, \ldots, c^1_i, \ldots, c^1_n)$ and $C^2 = (c^2_1, \ldots, c^2_i, \ldots, c^2_n)$ are two chromosomes selected for crossover; two offspring $H^k$, $k = 1, 2$, are created according to:

$$H^k = (h^k_1, \ldots, h^k_i, \ldots, h^k_n), \quad k = 1, 2, \qquad h^1_i = \lambda c^1_i + (1 - \lambda) c^2_i, \quad h^2_i = \lambda c^2_i + (1 - \lambda) c^1_i, \tag{8}$$

where λ is a user-specified positive constant. In our experiments, λ is set to 0.28.
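A minimal sketch of these two operators, assuming chromosomes are NumPy vectors; `lam` is the λ = 0.28 of the text, and the 10% elitist fraction follows the asexual reproduction rule above:

```python
import numpy as np

def arithmetical_crossover(c1, c2, lam=0.28):
    """Eq. (8): two offspring as weighted combinations of the parents."""
    h1 = lam * c1 + (1.0 - lam) * c2
    h2 = lam * c2 + (1.0 - lam) * c1
    return h1, h2

def asexual_reproduction(pop, fit, frac=0.10):
    """Copy the best `frac` of chromosomes unchanged into the offspring."""
    n_keep = max(1, int(frac * len(pop)))
    best = np.argsort(fit)[::-1][:n_keep]   # highest fitness first
    return pop[best].copy()
```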

Mutation operators: Suppose $C = (c_1, \ldots, c_i, \ldots, c_n)$ is a parent chromosome, $c_i \in [a_i, b_i]$ is a gene to be mutated, and $a_i$ and $b_i$ are the lower and upper bounds for gene $c_i$. A new gene $c'_i$ in the offspring chromosome may arise from the application of two different mutation operators.

The first is single point random mutation, in which a replacement for $c_i$ is drawn at random from the range $[a_i, b_i]$ to form the new chromosome $C'$. This operator is sometimes called uniform mutation.

The other is the nonuniform mutation operator. Assuming this operator is applied in generation t and $g_{\max}$ is the maximum number of generations, it can be described as:

$$c'_i = \begin{cases} c_i + \Delta(t,\, b_i - c_i) & \text{if } \tau = 0, \\ c_i - \Delta(t,\, c_i - a_i) & \text{if } \tau = 1, \end{cases} \qquad \Delta(t, y) = y\left(1 - r^{(1 - t/g_{\max})^{b}}\right), \tag{9}$$

where τ is a random binary digit (0 or 1), r is a random number drawn uniformly from [0, 1], and b is a user-chosen parameter that determines the degree of dependency on the number of generations. Δ(t, y) returns a value in the range [0, y] such that the probability of returning a number close to zero increases as the algorithm advances; the size of the gene perturbation interval thus shrinks with the passing of generations. In our algorithm, b is set to 0.5.
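A hedged sketch of both mutation operators (function names are ours; b = 0.5 and the gene ranges $[a_i, b_i]$ follow the text):

```python
import numpy as np

def uniform_mutation(chrom, i, a_i, b_i, rng):
    """Single point random mutation: redraw gene i from [a_i, b_i]."""
    child = chrom.copy()
    child[i] = rng.uniform(a_i, b_i)
    return child

def delta(t, y, g_max, rng, b=0.5):
    """Eq. (9): a perturbation in [0, y] that shrinks toward 0 as t -> g_max."""
    r = rng.random()
    return y * (1.0 - r ** ((1.0 - t / g_max) ** b))

def nonuniform_mutation(chrom, i, a_i, b_i, t, g_max, rng, b=0.5):
    child = chrom.copy()
    if rng.integers(0, 2) == 0:              # tau = 0: move toward upper bound
        child[i] = child[i] + delta(t, b_i - child[i], g_max, rng, b)
    else:                                    # tau = 1: move toward lower bound
        child[i] = child[i] - delta(t, child[i] - a_i, g_max, rng, b)
    return child
```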

Combining these crossover and mutation operators yields a uniform search of the space in early generations and a very local search at later stages, favoring local tuning. It also greatly reduces the risk of premature convergence.

Our genetic algorithm works with not one but multiple populations, all of which evolve separately most of the time, except that once every several generations a crossover operation is applied across populations. With a single population scheme it can happen that, even though the neural network could theoretically solve a certain classification problem, the system does not return a correct solution. This is because of the random nature of the algorithm and its reliance on natural selection, mutation, and crossover; another reason is the permutation problem, which has been discussed by Hancock [9]. Thus, a flow of events that would lead to a correct solution may simply not occur, and the solution will not be found. By using several unrelated populations, we decrease the probability of this occurrence: if one population has poor individuals, the solution can still be found in another.
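A sketch of this island-style scheme under stated assumptions: `evolve_one_generation` stands in for the per-population selection/crossover/mutation steps above, and the migration interval `exchange_every` is illustrative, since the paper only says "once every several generations":

```python
import numpy as np

def evolve_multi_population(populations, evolve_one_generation,
                            crossover, g_max, exchange_every=10, rng=None):
    """Evolve several populations separately, occasionally crossing them."""
    rng = rng or np.random.default_rng()
    for t in range(g_max):
        populations = [evolve_one_generation(pop, t) for pop in populations]
        if (t + 1) % exchange_every == 0 and len(populations) > 1:
            # inter-population crossover: pick one parent from each of two
            # distinct populations and inject the offspring back into both
            i, j = rng.choice(len(populations), size=2, replace=False)
            p1 = populations[i][rng.integers(len(populations[i]))]
            p2 = populations[j][rng.integers(len(populations[j]))]
            h1, h2 = crossover(p1, p2)
            populations[i][rng.integers(len(populations[i]))] = h1
            populations[j][rng.integers(len(populations[j]))] = h2
    return populations
```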

2.3. Weight connections optimization using conjugate gradient descent algorithm

As we have discussed above, GA can be used effectively in the evolution to find a near-optimal set of connection weights globally, without computing gradient information and without weight connection initialization.

Fig. 1. Framework of combining neural network and GA for classification. (Flowchart: initialization of chromosomes encoding the input layer–hidden layers–output layer weights; a GA training loop of selection, crossover/reproduction, and mutation until the termination criterion is satisfied; then BP training and classification.)

However, sometimes this is not directly achievable, and the result can be refined by an ordinary local algorithm. For our purpose, we need the optimal connection weights, so we incorporate a local search procedure into the evolution, using the conjugate gradient descent algorithm to find the best connection weights on the local error surface. This procedure is completed by applying a BP algorithm to the initial connection weights established by the GA.

The overall framework of our proposed method is summarized in Fig. 1. First, at the initialization stage, the neural network structure, including the number of input nodes, hidden nodes, and output nodes, is specified according to the classification application. Connection weights corresponding to this MLP structure are encoded in GA chromosomes; each chromosome represents one MLP with the connection weights contained in its genes. Second, at the GA-based weight training stage, these initialized chromosomes, which may belong to different populations, are evolved from generation to generation according to the fitness and MSE performance of the corresponding MLP. Finally, at the stage of local optimization of the error surface with BP, the connection weight matrix contained in the chromosome with the best fitness is chosen as the MLP's initial weights, and the gradient descent algorithm is applied to optimize these weights and minimize the local error.
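The whole pipeline can be summarized in a short, hedged sketch; `ga_generation`, `fitness_of`, `decode`, and `bp_train` are placeholders for the routines sketched above and for any standard back propagation implementation:

```python
import numpy as np

def hybrid_train(init_population, ga_generation, fitness_of, decode,
                 bp_train, g_max=70, target_fitness=None):
    """GA evolution first; the fittest chromosome then seeds BP refinement."""
    pop = init_population
    for t in range(g_max):
        pop = ga_generation(pop, t)             # selection/crossover/mutation
        fit = fitness_of(pop)
        if target_fitness is not None and fit.max() >= target_fitness:
            break                               # termination criterion met
    best = pop[int(np.argmax(fitness_of(pop)))] # fittest chromosome
    weights = decode(best)                      # GA result as BP initial weights
    return bp_train(weights)                    # local gradient-descent refinement
```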

3. Experimental results and discussion

In this section, we discuss experimental results obtained with our classification methodology and compare them with other methods. Two experiments are employed.

3.1. Simple example and comparison with other methods

A 900 × 700 pixel SPOT-4 XS high resolution image of Jiangning County in Jiangsu Province, eastern China, is used for classification (Fig. 2a). SPOT-4 XS data have three bands: green (0.50–0.59 µm), red (0.61–0.68 µm), and near infrared (0.79–0.89 µm). Three types of supervised classifier were employed: the maximum likelihood classifier (MLC), the back propagation neural network classifier (BP-MLP), and the hybrid GA-based neural network classifier (GA-MLP). A total of 96 samples belonging to three land cover classes were used for training: built-up and bare land, pond and river, and vegetation. For the BP and GA neural network structure, a three layer MLP with one hidden layer consisting of eight hidden nodes was used. For GA-MLP, we had two population groups, each of size 100. The asexual reproduction probability was set to 0.1, the arithmetical crossover probability to 0.8, the nonuniform mutation probability to 0.07, and the single point random mutation probability to 0.03, with a_i set to −1.0 and b_i set to 1.0.


Fig. 2. Classification results of SPOT-4 XS imagery of Jiangning County, Jiangsu, China.

GA-MLP was trained for 70 generations, followed by a BP training procedure. For the back propagation training algorithm, the learning rate was set to 0.01, the learning rate increment to 1.03, and the target training performance to 0.01602 (the actual training performance reached 0.01601). The resulting land cover classifications are illustrated in Fig. 2b–d.

Fig. 3. Neural network training performance for SPOT XS imagery of Jiangning County, Jiangsu, China.
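For reference, the GA-MLP settings above can be collected in one place; this is a plain transcription of the stated values (the key names are ours), not code from the paper:

```python
# First experiment (SPOT-4 XS) GA-MLP settings, as stated in the text.
GA_MLP_SPOT4_CONFIG = {
    "populations": 2,
    "population_size": 100,
    "asexual_reproduction_prob": 0.10,
    "arithmetical_crossover_prob": 0.80,
    "nonuniform_mutation_prob": 0.07,
    "uniform_mutation_prob": 0.03,
    "gene_range": (-1.0, 1.0),          # [a_i, b_i]
    "ga_generations": 70,
    "bp_learning_rate": 0.01,
    "bp_learning_rate_increment": 1.03,
    "bp_target_mse": 0.01602,
}
```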

The hybrid training performance is shown in Fig. 3. Fig. 3a shows the best MSE corresponding to each generation during the evolutionary training.


Table 1
Comparison of classification accuracy

Land cover           User's accuracy (%)           Kappa statistic
                     MLC     BP-MLP   GA-MLP       MLC     BP-MLP   GA-MLP
Built-up, bare land  65.39   70.83    88.89        0.583   0.649    0.866
Water                100.00  100.00   100.00       1.000   1.000    1.000
Vegetation           97.10   100.00   98.63        0.889   1.000    0.947
Overall              89.0    93.0     97.0         0.750   0.846    0.929

Overall accuracy and kappa statistic are calculated by weighting.

From this figure we can see how the best MSE, represented by the best chromosome, varies with the number of generations of the genetic algorithm. After 20 generations, the change in best MSE becomes slower because of the local tuning characteristic. Fig. 3b shows the MSE corresponding to each iteration during the back propagation training. It shows that, although our genetic algorithm has greatly reduced the total MSE of the neural network, a further improvement in training performance, up to 0.1, can still be achieved by applying a back propagation weight adjustment procedure.

For this imagery, we randomly chose 200 pixels from the three classified land cover maps to assess the classification accuracy. We compared these pixels with our interpretation results and computed the user's accuracy [19] and kappa coefficient [5,6], shown in Table 1.
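For concreteness, here is a minimal sketch of these two measures; the confusion-matrix layout (rows = map classes, columns = reference classes) and the example counts are assumptions of this sketch:

```python
import numpy as np

def users_accuracy(cm):
    """Correct pixels of a map class divided by all pixels mapped to it."""
    return np.diag(cm) / cm.sum(axis=1)

def kappa(cm):
    """Cohen's kappa [5,6]: chance-corrected agreement."""
    n = cm.sum()
    p_o = np.trace(cm) / n                                  # observed agreement
    p_e = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2  # chance agreement
    return (p_o - p_e) / (1.0 - p_e)

# e.g. a hypothetical 3-class confusion matrix from 200 assessment pixels
cm = np.array([[48,  4,  2],
               [ 0, 60,  0],
               [ 3,  1, 82]])
print(users_accuracy(cm), kappa(cm))
```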

From Table 1 we can see that the MLP classifiers are more accurate than the maximum likelihood classifier; the overall user's accuracies for GA-MLP, BP-MLP and MLC are 97, 93, and 89%, respectively. The better accuracy of GA-MLP over BP-MLP is possibly due to the lower residual training error when training the neural network with GA (0.0161 versus 0.03). This also suggests that GA can find a solution closer to the global optimum of a neural network.

3.2. More complicated experiment and discussion

An applicable classification algorithm must handle not only simple cases but also more complicated conditions. In the second experiment, the number of classified categories increases to six. More training samples are added, and some of them are exclusive. This change makes our experiment closer to typical real applications.

A 900 × 900 pixel CBERS (China–Brazil Earth Resources Satellite, launched in October 1999) high resolution image of Shihezi County in Xinjiang Province, northwestern China, was used for classification (Fig. 5a). CBERS's nominal spatial resolution is 19.5 m. CBERS datasets contain six bands, but only three are used here: the green band of wavelength 0.52–0.59 µm (Band 2), the red band of wavelength 0.63–0.69 µm (Band 3), and the near infrared band of wavelength 0.77–0.89 µm (Band 4). A total of 280 samples belonging to six land cover classes were used for training: wheat field, pond and river, desert and bare land, saline land, wet and irrigated land, and cotton field. The scatter plots of these 280 samples in Fig. 4a and b show the complexity of the overlapping and nonlinear class boundaries. A good classification algorithm should correctly identify the nonlinear and overlapping class boundaries without much a priori knowledge or assumption. For the BP and GA neural network structures, a three layer MLP with one hidden layer consisting of eight hidden nodes was used. For GA-MLP, we had two population groups, each of size 150; the asexual reproduction probability was set to 0.1, the arithmetical crossover probability to 0.8, the nonuniform mutation probability to 0.05, and the single point random mutation probability to 0.05, with a_i set to −5.0 and b_i set to 5.0. GA-MLP was trained for 300 generations, followed by a BP training procedure. For the back propagation training algorithm, the learning rate was set to 0.01, the learning rate increment to 1.03, and the target training performance to 0.059.


Fig. 4. Scatter plots for a training set of CBERS imagery of Shihezi County.

The classified image is shown in Fig. 5b. Assessing the classified image against existing land cover maps shows that the classification accuracy is over 87.3%, nearly 3% higher than that of BP-MLP.

The hybrid training performance for this imagery is shown in Fig. 6. Fig. 6a shows the best MSE corresponding to each generation during the evolutionary training, and Fig. 6b shows the MSE corresponding to each iteration during the back propagation training. The same conclusions can be drawn from these two figures as from Fig. 3.

Fig. 5. Classification results of CBERS imagery of Shihezi County.

To gain an insightful view of how the network structure changes between the genetically evolved neural network and the network after back propagation training, we can compare Tables 2–5. Note that in these tables there are three input nodes representing CBERS Bands 4, 3, and 2, denoted I1, I2 and I3, respectively. The eight hidden nodes are denoted H1, H2, H3, H4, H5, H6, H7, and H8. The six output nodes are denoted O1, O2, O3, O4, O5, and O6, respectively, representing the six classes: wheat field, pond and river, desert and bare land, saline land, wet and irrigated land, and cotton field.


Table 2
Comparison of weight connections between input layer and hidden layer

Hidden node  GA training input node          BP training input node
             I1        I2        I3          I1        I2        I3
H1           0.0546∗   0.0308∗   −0.0125∗    0.0546∗   0.0307∗   −0.0125∗
H2           0.0319    −0.0625   0.0324      0.2659    −0.2614   0.0101
H3           −0.0350   0.0379∗   0.0075      −0.1965   −0.0199∗  0.3421
H4           0.0584∗   0.0264∗   0.0200∗     0.0584∗   0.0264∗   0.0200∗
H5           0.0846∗   0.0748∗   0.0268∗     0.0846∗   0.0748∗   0.0268∗
H6           0.0073    −0.0085∗  0.0227      0.1507    0.0849∗   −0.3348
H7           −0.0279∗  −0.0006   0.0671      −0.0647∗  0.2741    −0.2638
H8           0.0917∗   −0.0029∗  0.0572∗     0.0917∗   −0.0029∗  0.0572∗

∗ Little change in connection weight.

Table 3
Comparison of weight connections between hidden layer and output layer

Hidden  GA training output node                              BP training output node
node    O1       O2       O3       O4       O5       O6      O1       O2       O3       O4       O5       O6
H1      −0.001   −0.0144  0.0271   −0.0135  0.0013   −0.0372 0.0561   0.0386   0.0044   0.0213   −0.0329  −0.0587
H2      0.0536   0.0078   −0.0039  −0.0296  0.041    0.042   −0.0418  0.0187   −0.2     −0.2363  0.3737   0.0951
H3      −0.0404  −0.0255  0.0167   0.0493   −0.0153  −0.0211 −0.1011  −0.0101  0.096    0.0059   0.3441   −0.325
H4      0.0521   0.0446   0.0114   0.0173   −0.0078  0.0445  0.1092   0.0976   −0.0113  0.0521   −0.042   0.023
H5      0.0234   0.0387   −0.0237  0.0217   0.015    −0.0208 0.0805   0.0917   −0.0464  0.0565   −0.0192  −0.0423
H6      0.0422   0.0122   0.0504   0.0563   0.0268   0.0173  −0.005   −0.3735  0.289    0.0074   0.0205   0.0905
H7      0.0172   0.0598   0.0166   0.0419   0.0027   0.071   −0.4048  −0.0101  −0.0123  −0.0068  0.0555   0.3922
H8      0.0611   0.0061   0.0828   0.0219   0.0583   0.0418  0.1182   0.0591   0.0601   0.0567   0.0241   0.0203


Table 2 shows the changes in weight connections between the input layer and hidden layer. As can be seen from the table, there is little change in the connection weights between the input nodes and H1, H4, H5, and H8, which are labeled with an asterisk. Note that most of these connection weights are larger than the others in the corresponding columns, especially for I2 and I1 (Bands 3 and 4). This may indicate that Bands 3 and 4 contribute more information than the remaining band for classifying the training samples in the neural network. These conclusions can also be seen in Fig. 4a and b.

Compared with Table 2, Table 3 shows no evident pattern of change. It appears that the genetic algorithm successfully evolved the connection weights between the input layer and hidden layer, especially those of the most informative nodes such as I1, H1, H4, H5 and H8, while the back propagation algorithm finely tunes the connection weights between the input layer and hidden layer and successfully adjusts the connection weights between the hidden layer and output layer.

The idea implied by this phenomenon is that, since the genetic algorithm is a forward stochastic optimization algorithm, the most accurate weight adjustment may occur in the first layer, that is, the connection weights between the input layer and the hidden layer; and since the back propagation algorithm is a backward optimization algorithm, the most accurate weight adjustment may occur at the last layer, that is, the connection weights between the output layer and the hidden layer. We can therefore expect that combining the two algorithms yields higher accuracy in weight adjustment and classification. This idea is also reflected in the bias adjustments shown in Tables 4 and 5.

Table 4
Comparison of biases in hidden layer

Hidden node  GA training  BP training
H1           0.0547∗      0.0547∗
H2           0.0077       0.0154
H3           −0.0137      0.0099
H4           0.0394∗      0.0394∗
H5           0.0192∗      0.0192∗
H6           0.0305       0.0054
H7           −0.0073      −0.0383
H8           0.0400∗      0.0400∗

∗ Little change in bias.


Fig. 6. Neural network training performance for CBERS imagery of Shihezi County.

Table 5
Comparison of biases in output layer

      GA training output node                          BP training output node
      O1      O2       O3      O4      O5      O6      O1      O2      O3       O4      O5      O6
Bias  0.0408  −0.0034  0.0215  0.0048  0.0794  0.0412  0.0979  0.0496  −0.0012  0.0396  0.0452  0.0197


4. Conclusion and future work

In this article, we have discussed the advantages and the key issues of the genetic algorithm evolved neural network classifier in detail. Our methodology adopts a real coded GA strategy hybridized with a back propagation algorithm. The genetic operators are carefully designed to optimize the neural network, avoiding premature convergence and permutation problems. A SPOT-4 XS image was used to test our algorithm. Preliminary research has shown that a hybrid GA-based neural network classifier can achieve better overall accuracy in high resolution land cover classification. Our experiment and discussion on the CBERS data have shown that a carefully designed genetic algorithm-based neural network outperforms a gradient descent-based neural network. This is supported by analysis of the changes in the connection weights and biases of the neural network.

One problem when combining a neural network and a genetic algorithm for land cover classification is determining the optimal neural network topology, i.e., the best network structure for specific training samples and remote sensing imagery. The neural network topology described in this experiment was determined manually. An alternative is to apply the genetic algorithm to neural network structure optimization as well, which will be part of our future work.



Acknowledgements

This work was supported by a grant from the Knowledge Innovation Program of the Chinese Academy of Sciences (approved #KZCX1-SW-01) and subsidized by China's Special Funds for the Major State Fundamental Research Project (G2000077900).

References

[1] S. Bandyopadhyay, S.K. Pal, Pixel classification using variable string genetic algorithms with chromosome differentiation, IEEE Trans. Geosci. Remote Sens. 39 (2) (2001) 303–308.

[2] J.A. Benediktsson, P.H. Swain, O.K. Ersoy, Neural network approaches versus statistical methods in classification of multisource remote sensing data, IEEE Trans. Geosci. Remote Sens. 28 (4) (1990) 540–552.

[3] G.A. Carpenter, M.N. Gjiaja, S. Gopal, C.E. Woodcock, ART neural networks for remote sensing: vegetation classification from Landsat TM and terrain data, IEEE Trans. Geosci. Remote Sens. 35 (2) (1997) 308–325.

[4] G.A. Carpenter, S. Gopal, S. Macomber, S. Martens, C.E. Woodcock, A neural network method for mixture estimation for vegetation mapping, Remote Sens. Environ. 70 (1999) 138–152.

[5] J. Cohen, A coefficient of agreement for nominal scales, Educ. Psychol. Measure. 20 (1) (1960) 37–46.

[6] R.G. Congalton, A review of assessing the accuracy of classifications of remotely sensed data, Remote Sens. Environ. 37 (1991) 35–46.

[7] D.E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, New York, 1989.

[8] S. Gopal, C.E. Woodcock, A.H. Strahler, Fuzzy neural network classification of global land cover from a 1 degree AVHRR data set, Remote Sens. Environ. 67 (1999) 230–243.

[9] P.J.B. Hancock, Genetic algorithms and permutation problems: a comparison of recombination operators for neural net structure specification, in: Proceedings of the International Workshop on Combinations of Genetic Algorithms and Neural Networks (COGANN-92), IEEE Computer Society Press, Los Alamitos, CA, 1992, pp. 108–122.

[10] F. Herrera, M. Lozano, J.L. Verdegay, Tackling Real-coded Genetic Algorithms: Operators and Tools for Behavioural Analysis, NEC Research Index. http://citeseer.nj.nec.com/.

[11] J. Hertz, A. Krogh, R. Palmer, An Introduction to the Theory of Neural Computation, Addison-Wesley, Reading, MA, 1991.

[12] J.H. Holland, Adaptation in Natural and Artificial Systems, The University of Michigan Press, 1975.

[13] D.R. Hush, B.G. Horne, Progress in supervised neural networks, IEEE Signal Process. Mag. 10 (1) (1993) 8–39.

[14] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, Springer-Verlag, New York, 1992.

[15] MathWorks, Inc., Neural Network Toolbox User's Guide, MathWorks, Inc., Natick, MA, 1997.

[16] N.J. Nilsson, Artificial Intelligence: A New Synthesis, Morgan Kaufmann, San Francisco, CA, 1998.

[17] S.K. Pal, S. Bandyopadhyay, C.A. Murthy, Genetic classifier for remotely sensed images: comparison with standard methods, Int. J. Remote Sens. 22 (13) (2001) 2545–2569.

[18] D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning representations by back-propagating errors, Nature 323 (1986) 533–536.

[19] M. Story, R. Congalton, Accuracy assessment: a user's perspective, Photogram. Eng. Remote Sens. 52 (3) (1986) 397–399.

[20] J. Townshend, C. Justice, W. Li, C. Gurney, J. McManus, Global land cover classification by remote sensing: present capabilities and future possibilities, Remote Sens. Environ. 35 (1991) 243–255.

[21] X. Yao, Evolving artificial neural networks, Proc. IEEE 87 (9) (1999) 1423–1447.