network in the form of the backpropagation through time (BPTT) model offers a suitable framework for reusing the output values of the neural network in training. It exhibited some promising performance, but only for dynamic patterns, and proved inefficient for static patterns [21]. It was later established that the feedforward multilayer neural network with an enhanced and extended version of the backpropagation learning algorithm [22] is more suitable for handling complex pattern classification or recognition tasks, in spite of its inherent problems of local minima, slow rate of convergence, and no guarantee of convergence [23-27].
It has been found that, to overcome the problems of gradient descent searching in a large search space, as in complex pattern recognition tasks with multilayer feedforward neural networks, an evolutionary search algorithm, i.e. the genetic algorithm (GA), is a better alternative [28]. The reason is straightforward: this search technique is derivative-free; it evolves a population of possible partial solutions and applies a natural selection process to filter them until the global optimal solution is found [29]. Prominent results have been reported in the literature for the generalized classification of handwritten English characters by integrating the genetic algorithm with the backpropagation learning rule for a multilayer feedforward neural network architecture [30, 11]. In that approach the fitness of the weights is evaluated with the back-propagated error of the current input pattern vector. Thus, the performance of the network still depends upon the back-propagated instantaneous random and unknown error.
In this paper the performance of a feedforward neural network with descent gradient of distributed error and the genetic algorithm is evaluated for the recognition of handwritten characters of 'Marathi' script. The performance index for the feedforward multilayer neural network is considered here with a distributed instantaneous unknown error, i.e. a different error for each layer. The genetic algorithm is applied to make the search process more efficient in determining the optimal weight vector from the population of weights. The genetic algorithm is applied with the distributed error, and its fitness function is also taken as the mean of the squared distributed error, which is different for each layer. Hence convergence is obtained only when the minimum of each of these different errors is reached. The instantaneous squared error is therefore not the same for every layer; it differs from layer to layer and is treated as a distributed error for the multilayer feedforward neural network, in which the numbers of units in the hidden layers and the output layer are equal. Thus the same desired output pattern for a presented input pattern is distributed to every unit of the hidden layers and the output layer, each of which produces a different actual output, so each layer has a different squared error. The instantaneous error is thus distributed instead of backpropagated. The proposed hybrid evolutionary technique, i.e. descent gradient of distributed error with genetic algorithm, is used to train the multilayer neural network architecture for the generalized classification of handwritten 'Marathi' script.
The rest of the paper is organized as follows: Section 2 presents the generalized descent gradient method for the instantaneous distributed error and the implementation of the genetic algorithm in a generalized way with distributed error. Section 3 explores the architecture and simulation design for the proposed method. Section 4 presents the results and discussion. Section 5 presents the conclusion, followed by the references.
2. Generalized descent gradient learning for distributed square error
A multilayer feedforward neural network with at least two intermediate layers, commonly known as hidden layers, in addition to the input and output layers, can perform any complex generalized pattern classification task. The generalized delta learning rule [23] is a very common and widely used technique to train multilayer feedforward neural networks for pattern classification and pattern mapping. In this learning the optimum weight vector may be obtained for the given training set if the weights are adjusted in such a way that gradient descent is made along the total error surface in the weight space. The error being minimized is actually not the least mean square error for the entire training set; it is the instantaneous squared error for each pattern presented at each time. Thus, for every pattern at each time there is an unknown local error, and the weights are updated incrementally for each local error. Each time, the weights are updated to minimize this unknown local error by propagating it back from the output layer to all hidden layers. The instantaneous error for each presented input pattern, the squared difference between the desired pattern vector and the actual output of the units of the output layer, is thus backpropagated to the units of the hidden layers. In the current work we consider a distributed error instead of the backpropagated error. The instantaneous squared error is not the same for each layer, because each layer has its own actual output pattern vector. Therefore, for each layer the instantaneous squared error is computed as the squared difference between the desired output pattern vector for the given input sample from the training set and the actual output pattern vector of the respective layer. This distributed instantaneous squared error imposes a constraint on the architecture of the multilayer feedforward neural network: the number of units in the output layer and in the hidden layers must be the same, so that the desired output pattern for the presented input pattern can be accommodated conveniently by each layer. Thus, for every hidden layer and the output layer we have a different squared error. Therefore the optimum weight vector can be obtained for each layer if the weights are adjusted in such a way that gradient descent is made along the instantaneous squared error of that layer. This means we have more than one objective function, or error minimum, one for each layer except the input layer, for the presented pattern; the problem thus becomes a multi-objective optimization problem. The objective here is to obtain the minimum of each instantaneous squared error simultaneously, so as to determine the optimum weight vector for the presented input pattern. Therefore the mean of the instantaneous squared error of a layer is used to update the weights of that layer, and the gradient descent of each error for each layer is obtained at the same time. Hence there is more than one gradient descent at any one time, of the individual errors for the presented input pattern, depending on the number of hidden layers, and the updating of the weight vectors for the units of the hidden layers and for the units of the output layer is proportional to their corresponding gradients; there is a different gradient for each layer. Thus the optimal weight changes are proportional to the gradient descent of the distributed instantaneous mean square errors for the presented input pattern. The generalized method for obtaining the weight update for the hidden layers and the output layer is formulated as follows.
Let $(a_l, d_l)$, for $l = 1, 2, \ldots, L$, be the input-output pattern pairs of the training set of $L$ pattern samples presented to the multilayer feedforward neural network for formulating the generalized descent gradient of the instantaneous squared distributed error. As discussed above, this network is constrained to keep the number of units in the hidden and output layers the same, as shown in Figure 1.
Fig. 1: Multilayer Feed Forward Neural
Network Architecture
The current random sample pattern $(a_l, d_l)$ of the training set defines the instantaneous squared error vector $e^O_l$ at the output layer and $e^H_l$ at the hidden layer as:

$$e^O_l = \bigl(d_{1l} - S(y^O_{1l}), \ldots, d_{kl} - S(y^O_{kl})\bigr) \qquad (1)$$

$$e^H_l = \bigl(d_{1l} - S(y^H_{1l}), \ldots, d_{kl} - S(y^H_{kl})\bigr) \qquad (2)$$

Therefore the instantaneous distributed mean square errors for the output and hidden layers are defined, respectively, as:

$$E^O_l = \frac{1}{2} \sum_{k=1}^{K} \bigl[d_{kl} - S(y^O_{kl})\bigr]^2 \qquad (3)$$

and

$$E^H_l = \frac{1}{2} \sum_{j=1}^{J} \bigl[d_{jl} - S(y^H_{jl})\bigr]^2 \qquad (4)$$
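As an illustrative sketch (hypothetical NumPy code, not part of the original formulation), the distributed errors of Equations (1)-(4) can be evaluated for one presented pattern as follows, assuming a logistic output signal $S$ and the equal hidden/output unit counts required by the constraint above:

```python
import numpy as np

def sigmoid(y):
    # Logistic output signal S(y) = 1 / (1 + e^(-y)), as used in Section 2.
    return 1.0 / (1.0 + np.exp(-y))

def distributed_errors(d, y_hidden, y_output):
    # Distributed instantaneous errors of eqs. (1)-(4): each layer compares
    # its OWN output signal against the same desired pattern d.
    e_H = d - sigmoid(y_hidden)     # eq. (2)
    e_O = d - sigmoid(y_output)     # eq. (1)
    return 0.5 * np.sum(e_H ** 2), 0.5 * np.sum(e_O ** 2)   # eqs. (4), (3)

# Example: 5-unit hidden and output layers with equal unit counts.
d = np.array([1.0, 0.0, 0.0, 1.0, 0.0])
E_H, E_O = distributed_errors(d, np.zeros(5), np.ones(5))
```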
Hence, the update in the weight for the $k$th unit of the output layer at iteration $t$ for the current input pattern vector is represented as:

$$\Delta w_{kj}(t) = -\eta_{kj} \frac{\partial E^O_l}{\partial w_{kj}} \qquad (5)$$

and the update in the weight for the $j$th unit of the hidden layer at iteration $t$ for the same current pattern is represented as:

$$\Delta w_{ji}(t) = -\eta_{ji} \frac{\partial E^H_l}{\partial w_{ji}} \qquad (6)$$

Here $\eta_{kj}$ and $\eta_{ji}$ are the learning rates for the output and hidden layers respectively.
Now, applying the chain rule to Equation 5, we have:

$$\Delta w_{kj}(t) = -\eta_{kj} \frac{\partial E^O_l}{\partial w_{kj}} = -\eta_{kj} \frac{\partial E^O_l}{\partial y^O_{kl}} \frac{\partial y^O_{kl}}{\partial w_{kj}}$$

Here the activation value is $y^O_{kl} = \sum_{j=1}^{J} w_{kj} S(y^H_{jl})$ and the output signal is

$$S(y^O_{kl}) = f(y^O_{kl}) = \frac{1}{1 + e^{-y^O_{kl}}}$$

or

$$\Delta w_{kj}(t) = -\eta_{kj} \frac{\partial E^O_l}{\partial y^O_{kl}} S(y^H_{jl})$$
$$= -\eta_{kj} \frac{\partial E^O_l}{\partial S(y^O_{kl})} \frac{\partial S(y^O_{kl})}{\partial y^O_{kl}} S(y^H_{jl})$$

$$= -\eta_{kj} \frac{\partial E^O_l}{\partial S(y^O_{kl})} S(y^O_{kl})\bigl(1 - S(y^O_{kl})\bigr) S(y^H_{jl})$$

Now from Equation 3 we have:

$$\frac{\partial E^O_l}{\partial S(y^O_{kl})} = -\sum_{k=1}^{K} \bigl[d_{kl} - S(y^O_{kl})\bigr]$$

Hence we have:

$$\Delta w_{kj}(t) = \eta_{kj} \sum_{k=1}^{K} \bigl[d_{kl} - S(y^O_{kl})\bigr] S(y^O_{kl})\bigl(1 - S(y^O_{kl})\bigr) S(y^H_{jl}) \qquad (7)$$

Thus, the weight at iteration $(t+1)$ for the units of the output layer, with the momentum term, is presented as:

$$w_{kj}(t+1) = w_{kj}(t) + \eta_{kj} \sum_{k=1}^{K} \bigl[d_{kl} - S(y^O_{kl})\bigr] S(y^O_{kl})\bigl(1 - S(y^O_{kl})\bigr) S(y^H_{jl}) + \alpha\, \Delta w_{kj}(t-1) \qquad (8)$$

Here the momentum rate constant $\alpha$, with $0 < \alpha < 1$, is considered for the output layer.
Similarly, applying the chain rule to Equation 6, we have:

$$\Delta w_{ji}(t) = -\eta_{ji} \frac{\partial E^H_l}{\partial w_{ji}} = -\eta_{ji} \frac{\partial E^H_l}{\partial y^H_{jl}} \frac{\partial y^H_{jl}}{\partial w_{ji}}$$

or

$$\Delta w_{ji}(t) = -\eta_{ji} \frac{\partial E^H_l}{\partial y^H_{jl}} a_i$$

$$= -\eta_{ji} \frac{\partial E^H_l}{\partial S(y^H_{jl})} \frac{\partial S(y^H_{jl})}{\partial y^H_{jl}} a_i$$

$$= -\eta_{ji} \frac{\partial E^H_l}{\partial S(y^H_{jl})} S(y^H_{jl})\bigl(1 - S(y^H_{jl})\bigr) a_i$$

Now, from Equation 4 we have:

$$\frac{\partial E^H_l}{\partial S(y^H_{jl})} = -\sum_{j=1}^{J} \bigl[d_{jl} - S(y^H_{jl})\bigr]$$

Hence we have:

$$\Delta w_{ji}(t) = \eta_{ji} \sum_{j=1}^{J} \bigl[d_{jl} - S(y^H_{jl})\bigr] S(y^H_{jl})\bigl(1 - S(y^H_{jl})\bigr) a_i \qquad (9)$$

Thus, the weight at iteration $(t+1)$ for the units of the hidden layer, with the momentum term, is presented as:

$$w_{ji}(t+1) = w_{ji}(t) + \eta_{ji} \sum_{j=1}^{J} \bigl[d_{jl} - S(y^H_{jl})\bigr] S(y^H_{jl})\bigl(1 - S(y^H_{jl})\bigr) a_i + \beta\, \Delta w_{ji}(t-1) \qquad (10)$$

Here the momentum rate constant $\beta$, with $0 < \beta < 1$, is considered for the hidden layer.
An interesting observation concerns the number of terms appearing in the expression for the weight update of the hidden layer. It can be seen from Equation 9 that fewer terms are involved than in the weight update for the hidden layer under the backpropagation learning rule with the backpropagated instantaneous mean square error. Thus, less time complexity is involved in computing the weight update according to the descent gradient of the distributed instantaneous mean square error, and faster convergence can be expected with respect to the conventional generalized delta learning rule with backpropagated error.
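A minimal sketch of one training iteration under this rule is given below, reading Equations (7)-(10) per unit and assuming a logistic activation; the function name and the learning/momentum rates (values borrowed from Table 2) are illustrative, not part of the original formulation:

```python
import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

def distributed_gradient_step(a, d, W_h, W_o, dW_h_prev, dW_o_prev,
                              eta_h=0.1, eta_o=0.01, alpha=0.9, beta=0.7):
    # Forward pass: each layer's output signal.
    s_h = sigmoid(W_h @ a)       # S(y^H), hidden layer
    s_o = sigmoid(W_o @ s_h)     # S(y^O), output layer

    # Local deltas: each layer uses its OWN error (d - s), never a
    # backpropagated one -- this is the distributed-error rule.
    delta_o = (d - s_o) * s_o * (1.0 - s_o)
    delta_h = (d - s_h) * s_h * (1.0 - s_h)

    dW_o = eta_o * np.outer(delta_o, s_h)   # eq. (7)
    dW_h = eta_h * np.outer(delta_h, a)     # eq. (9)

    W_o += dW_o + alpha * dW_o_prev         # eq. (8), momentum alpha
    W_h += dW_h + beta * dW_h_prev          # eq. (10), momentum beta
    return dW_h, dW_o

# Example for a 16-5-5 network and one pattern (illustrative values).
rng = np.random.default_rng(0)
W_h, W_o = rng.standard_normal((5, 16)), rng.standard_normal((5, 5))
prev_h, prev_o = np.zeros_like(W_h), np.zeros_like(W_o)
a, d = rng.random(16), np.array([1.0, 0.0, 0.0, 1.0, 0.0])
prev_h, prev_o = distributed_gradient_step(a, d, W_h, W_o, prev_h, prev_o)
```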
2.1 Genetic algorithm with descent gradient of distributed error
The majority of implementations of the GA are derivatives of Holland's original specification. In our approach the genetic algorithm is incorporated with descent gradient learning for the distributed instantaneous mean square error in the multilayer feedforward neural network architecture for generalized pattern classification. An input pattern vector with its corresponding output pattern vector from the training set is presented to the neural network. The neural network, with its current setting of weights, obtains the actual output for each unit of the hidden layers and the output layer. The distributed instantaneous mean square error is obtained and the proposed descent gradient learning rule for distributed error is applied for some fixed, arbitrary number of iterations n. Thus, the weights between the layers and the bias values of the units are updated over n iterations for the given input pattern and improved from their initial state. After this, the iterative weight update stops and the genetic algorithm is employed to evolve the population of modified weights and bias values. The genetic algorithm is applied to obtain the optimal weight vector from the large weight space for the given training set, with the following three elements:
(i) the genetic code for representing the weight vector in the form of a chromosome;
(ii) the technique for evolving the population of weight vectors;
(iii) the fitness function for evaluating the performance of an evolved weight vector.
A substantial body of work has been reported on the evolution of neural networks with genetic algorithms [24]. The majority of this work integrates the genetic algorithm with the neural network at one of the following three levels [25]: (i) connection weights, (ii) architectures, (iii) learning rules. The evolution of weight vectors for the neural network is an area of interest and is the approach taken in the current work. In this approach the genetic algorithm uses a different fitness evaluation function for each layer: the distributed instantaneous mean square error of a layer is the fitness evaluation function for that layer. Generally the GA starts from a random initial solution and then converges to the optimal solution. In our approach the GA is applied after the weights have been updated for n iterations, so the initial population of solutions for the GA is not random; instead, the initial population of weights is sub-optimal, because the weights have already been updated in the direction of convergence. Thus the GA explores from a sub-optimal solution towards the multi-objective optimal solution for the given problem. The multi-objective optimal solution reflects the fact that every layer except the input layer has its own error surface, or objective function.
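To make the two-phase structure concrete, a toy end-to-end sketch is given below. It is an assumption-laden miniature (a single pattern, with a one-gene mutation standing in for the full operator set described in the following subsections), not the paper's full procedure:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda y: 1.0 / (1.0 + np.exp(-y))

def layer_error(W, x, d):
    # Distributed instantaneous mean square error of one layer (eqs. 3/4).
    return 0.5 * np.sum((d - sigmoid(W @ x)) ** 2)

def gradient_step(W, x, d, eta=0.1):
    s = sigmoid(W @ x)
    W += eta * np.outer((d - s) * s * (1.0 - s), x)

# Toy 16-5-5 network and a single training pattern (illustrative values).
a, d = rng.random(16), np.array([1.0, 0.0, 0.0, 1.0, 0.0])
W_h, W_o = rng.standard_normal((5, 16)), rng.standard_normal((5, 5))

# Phase 1: n iterations of descent gradient of the distributed error
# move the weights from random values to a sub-optimal region.
for _ in range(1000):
    gradient_step(W_h, a, d)
    gradient_step(W_o, sigmoid(W_h @ a), d)

# Phase 2: the GA evolves the sub-optimal weights, with one population
# and one fitness function (the layer's own error) per trainable layer.
for _ in range(200):
    for W, x in ((W_h, a), (W_o, sigmoid(W_h @ a))):
        child = W.copy()
        i = rng.integers(W.size)
        child.flat[i] += rng.uniform(-1.0, 1.0)     # mutate one gene
        if layer_error(child, x, d) < layer_error(W, x, d):
            W[:] = child                            # keep the fitter one
```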
Chromosome Representation
A chromosome is a collection of genes, each representing either a weight value or a bias value as a real number. The initial population of weights and biases for the basic or initial chromosome in our method is not random; instead, the initial chromosome consists of sub-optimal weight and bias values. The chromosome is therefore represented as a matrix of real numbers for the set of weight values and bias values. As discussed already, in our proposed multilayer neural network architecture the error is the distributed instantaneous mean square error, i.e. a different error for each layer. Hence the chromosome is partitioned into sub-chromosomes, one for each hidden layer and one for the output layer. As per our general neural network architecture shown in Figure 1, there are two sub-chromosomes: the first contains $(i \times j + j)$ genes and the second contains $(j \times k + k)$ genes. Thus the number of sub-chromosomes depends upon the number of hidden layers, but the number of genes in every sub-chromosome remains the same, though the gene values may differ in each sub-chromosome.
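A minimal encoding sketch follows, assuming weights are flattened row by row with biases appended last (the paper does not fix a gene ordering); the helper name is illustrative:

```python
import numpy as np

def encode_sub_chromosomes(W_h, b_h, W_o, b_o):
    # One sub-chromosome per trainable layer: the layer's weights
    # flattened to genes, followed by its bias values.
    return (np.concatenate([W_h.ravel(), b_h]),    # (i*j + j) genes
            np.concatenate([W_o.ravel(), b_o]))    # (j*k + k) genes

# For the 16-5-5 architecture of Fig. 1: 16*5 + 5 = 85 genes for the
# hidden layer and 5*5 + 5 = 30 genes for the output layer (Section 3).
rng = np.random.default_rng(1)
c_h, c_o = encode_sub_chromosomes(rng.random((5, 16)), rng.random(5),
                                  rng.random((5, 5)), rng.random(5))
assert c_h.size == 85 and c_o.size == 30
```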
The Mutation Operator
The mutation operator randomly selects a gene from a chromosome and modifies it with some random value to generate the next population of chromosomes. The probability of mutation is kept low to limit the randomness of the genetic algorithm. In our approach the mutation operator is applied to each sub-chromosome: it randomly selects a gene from each sub-chromosome and adds a small random value between -1 and +1 to generate the next population of sub-chromosomes. Let $C^N$ be the chromosome for the network, partitioned into the two sub-chromosomes $C^N_H$ and $C^N_O$ for the hidden layer and the output layer. $C^N_H$ contains $(i \times j + j) = m_H$ genes while $C^N_O$ contains $(j \times k + k) = m_O$ genes. The sizes of the next generated populations are $N_H + 1$ and $N_O + 1$ respectively. If the mutation operator is applied $n$ times over the old sub-chromosomes for the output layer and the hidden layer respectively, we have the following new populations of sub-chromosomes [26]:

$$C_H^{N\_new} = C_H^{N\_old} \cup \Bigl[\bigcup_{i=1}^{n} C_H^{N\_old}\bigl(C_{H,\mu_H}^{N\_old} + \lambda_H\bigr)\Bigr] \qquad (11)$$

and

$$C_O^{N\_new} = C_O^{N\_old} \cup \Bigl[\bigcup_{i=1}^{n} C_O^{N\_old}\bigl(C_{O,\mu_O}^{N\_old} + \lambda_O\bigr)\Bigr] \qquad (12)$$

Here $\lambda_H$ and $\lambda_O$ are the small randomly generated values between -1 and +1 for the sub-chromosomes of the hidden layer and output layer respectively, $\mu_H$ and $\mu_O$ are the randomly selected genes from the $C^{N\_old}_H$ and $C^{N\_old}_O$ sub-chromosomes respectively, and $C^{N\_new}_H$ and $C^{N\_new}_O$ are the next populations of sub-chromosomes for the hidden and output layers respectively. The inner operator prepares a new sub-chromosome at each iteration of mutation, and the outer operator builds the new populations of sub-chromosomes, $C^{N\_new}_H$ and $C^{N\_new}_O$.
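A sketch of this operator under the stated reading; the helper name and the use of NumPy are assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def mutate_population(sub_chromosome, n):
    # Eqs. (11)-(12): each of the n applications picks one gene (mu) at
    # random and adds a small random value lambda in (-1, +1); the old
    # sub-chromosome is retained, giving a population of size n + 1.
    population = [sub_chromosome.copy()]
    for _ in range(n):
        child = sub_chromosome.copy()
        mu = rng.integers(child.size)
        child[mu] += rng.uniform(-1.0, 1.0)
        population.append(child)
    return population

pop_H = mutate_population(rng.random(85), n=3)   # hidden-layer sub-chromosome
pop_O = mutate_population(rng.random(30), n=3)   # output-layer sub-chromosome
```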
Elitism
Elitism is used in the creation of each new population to carry the good members of the old population into the next generation, so that good solutions of the previous population are not lost through the application of the genetic operators. This involves copying the best encoded network unchanged into the new population, as given in Equations 11 and 12, by including $C^{N\_old}_H$ and $C^{N\_old}_O$ when creating $C^{N\_new}_H$ and $C^{N\_new}_O$.
Selection
The selection process of the genetic algorithm selects the good, or fit, members from the newly generated population. Here the selection process simultaneously considers the newly generated sub-chromosomes of the hidden layer and the output layer, i.e. $C^{N\_new}_H$ and $C^{N\_new}_O$ respectively, for selecting the good population for the next cycle. A sub-chromosome $C^{Sel}_H$ is selected from $C^{N\_new}_H$ for which the distributed instantaneous mean square error for the hidden layer, $E^H_l$, for the pattern $l$ has reached its accepted minimum level. Likewise, a sub-chromosome $C^{Sel}_O$ is selected from $C^{N\_new}_O$ for which the distributed instantaneous mean square error for the output layer, $E^O_l$, for the same pattern $l$ has reached its accepted minimum level.
Crossover
Crossover is a very important and useful operator of the genetic algorithm. Here the crossover operator considers the selected sub-chromosomes $C^{Sel}_H$ and $C^{Sel}_O$ and creates the next generation of the population separately for the hidden layers and the output layer. We apply the uniform crossover operator $n$ times on the selected sub-chromosomes at different crossover points to obtain the next generation of the population. The selected sub-chromosomes $C^{Sel}_H$ and $C^{Sel}_O$ are considered for uniform crossover as shown in Figs. 2-4.

Fig. 4: After applying the crossover operator

Therefore, on applying the crossover operator $n$ times on the selected sub-chromosomes ($C^{Sel}_H$ and $C^{Sel}_O$), populations of $n+1$ sub-chromosomes each can be generated as [27]:

$$C_H^{next} = C_H^{Sel} \cup \Bigl[\bigcup_{i=1}^{n} C_H^{Sel}\bigl(C^{Sel}_{H,\mu} \leftrightarrow C^{Sel}_{H,\nu}\bigr)\Bigr] \qquad (13)$$

and

$$C_O^{next} = C_O^{Sel} \cup \Bigl[\bigcup_{i=1}^{n} C_O^{Sel}\bigl(C^{Sel}_{O,\mu} \leftrightarrow C^{Sel}_{O,\nu}\bigr)\Bigr] \qquad (14)$$

where $\mu$ and $\nu$ are the randomly selected gene positions from the sub-chromosomes $C^{Sel}_H$ and $C^{Sel}_O$, and $C^{next}_H$ and $C^{next}_O$ are the next generations of populations of size $n+1$. Thus, after the crossover operation we have $2(n+1)$ total populations of chromosomes for the network, i.e. $n+1$ each for the hidden layer and for the output layer.
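The garbled source admits more than one reading of Equations (13)-(14); the sketch below implements a single-parent position-swap reading, with the parent retained, and should be taken as illustrative only:

```python
import numpy as np

rng = np.random.default_rng(3)

def crossover_population(selected, n):
    # Eqs. (13)-(14), single-parent reading: each application picks two
    # gene positions mu and nu at random and exchanges their values,
    # yielding n offspring plus the selected parent (n + 1 in total).
    population = [selected.copy()]
    for _ in range(n):
        child = selected.copy()
        mu, nu = rng.choice(child.size, size=2, replace=False)
        child[mu], child[nu] = child[nu], child[mu]
        population.append(child)
    return population

next_H = crossover_population(rng.random(85), n=1000)   # C_H^next
next_O = crossover_population(rng.random(30), n=1000)   # C_O^next
```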
Fitness Evaluation Function
The fitness evaluation function of the genetic algorithm is used to evaluate the performance of the newly generated populations. It filters out the populations found suitable as per the criteria of the fitness function. Here we use a separate fitness evaluation function for each layer; as per our neural network architecture, two fitness evaluation functions are used, one for the output layer and one for the hidden layer. The first fitness evaluation function estimates the performance of the sub-chromosomes of the hidden layer, i.e. $C^{next}_H$, and the second estimates the performance of the sub-chromosomes of the output layer, i.e. $C^{next}_O$. The fitness function used here is proportional to the sum of the distributed instantaneous mean squared error on the respective layer. The fitness function $f_H$ for the hidden layer uses the instantaneous mean square error as specified in Equation 4 to evaluate the performance of the sub-chromosomes for the hidden layer, i.e. $C^{next}_H$. The fitness function $f_O$ for the output layer uses the instantaneous mean square error as specified in Equation 3 to evaluate the performance of the sub-chromosomes for the output layer, i.e. $C^{next}_O$. Thus, the genetic algorithm attempts to find the weight vectors and bias values for the different layers that minimize the corresponding instantaneous mean squared error. This procedure for evaluating the performance of the weight vectors of the hidden and output layers can be represented as:

    min_error_H = 1.0 && min_error_O = 1.0
    Do for all n+1 chromosomes
    {
        if (min_error_H > E_l^H(C_{H,i}^next)) then C_H^min = C_{H,i}^next
        &&
        if (min_error_O > E_l^O(C_{O,i}^next)) then C_O^min = C_{O,i}^next
        else (min_error_H = min(error_H)) && (min_error_O = min(error_O))
    }

Here $C^{min}_H$ and $C^{min}_O$ represent the sub-chromosomes that have the minimum error for the hidden and output layers respectively. There is also the possibility of obtaining more than one optimal weight vector for the given training set, because more than one sub-chromosome in the hidden and output layers may be evaluated as fit by the fitness evaluation functions of the respective layers.
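The same procedure in runnable form (hypothetical Python; the gene decoding in `E_O` assumes the 30-gene output-layer layout of Section 3):

```python
import numpy as np

def fittest(population, layer_error):
    # Scan the n+1 sub-chromosomes of one layer and keep the one whose
    # distributed instantaneous mean square error (eq. 3 or 4) is lowest.
    min_error, best = 1.0, None
    for candidate in population:
        e = layer_error(candidate)
        if e < min_error:
            min_error, best = e, candidate
    return best, min_error

sigmoid = lambda y: 1.0 / (1.0 + np.exp(-y))
d = np.array([1.0, 0.0, 0.0, 1.0, 0.0])
x = np.random.default_rng(4).random(5)     # hidden-layer output signals

def E_O(c):
    # Decode a 30-gene output-layer sub-chromosome (25 weights + 5 biases)
    # and score it with the output layer's own fitness function (eq. 3).
    W, b = c[:25].reshape(5, 5), c[25:]
    return 0.5 * np.sum((d - sigmoid(W @ x + b)) ** 2)

pop = [np.random.default_rng(i).random(30) for i in range(4)]
best_O, err_O = fittest(pop, E_O)
```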
3 Simulation Design and Implementation
In this simulation design and implementation, two proposed multilayer feedforward neural networks are considered. Both neural networks are trained with the proposed descent gradient of distributed instantaneous mean square error algorithm. Since every input pattern consists of 16 distinct features, each neural network architecture contains 16 processing units in the input layer. The first neural network architecture consists of the input layer, two hidden layers with five units each, and one output layer with 5 units. The second neural network architecture consists of the input layer, one hidden layer of 5 units, and an output layer also with 5 units.
Feature Extraction
Five different samples of handwritten characters of 'Marathi' script from five different people were collected in this simulation as input stimuli for the training pattern set. These scanned images of distinct handwritten 'Marathi' characters are shown in Figure 5.

Fig. 5: Scanned images of handwritten distinct 'Marathi' scripts

The scanned images of handwritten 'Marathi' characters shown in Figure 5 are partitioned into sixteen equal parts, the density values of the pixels for each part are calculated, and the centre of density gravity is obtained. Therefore for each scanned image of a handwritten 'Marathi' character we obtain sixteen values as the input pattern vector of the training set. Thus we have the training set, which consists of sampled patterns of handwritten 'Marathi' characters, where each sample pattern is a pattern vector of dimension $16 \times 1$ with real-number values. The output pattern vector corresponding to an input pattern vector is of dimension $5 \times 1$ with binary values. The test input pattern set is constructed with the same method, using sample patterns that were not used in the training set. The sample test patterns were used to verify the performance of the trained neural networks.
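A plausible sketch of the zone-density step is given below, assuming a 4 x 4 partition of a rectangular grayscale image; the centre-of-density-gravity computation mentioned above is not reconstructed here, and the function name is illustrative:

```python
import numpy as np

def zone_density_features(image):
    # Partition the scanned character image into a 4 x 4 grid of sixteen
    # equal parts and return the normalized pixel density of each part
    # as the 16 x 1 input pattern vector.
    h, w = image.shape
    zh, zw = h // 4, w // 4
    return np.array([image[r*zh:(r+1)*zh, c*zw:(c+1)*zw].mean()
                     for r in range(4) for c in range(4)])

# A 64 x 64 test image (illustrative) yields 16 real-valued features.
x = zone_density_features(np.random.default_rng(5).random((64, 64)))
assert x.shape == (16,)
```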
Simulation design for 16-5-5-5 Neural Network
Architecture
The simulation of the proposed feedforward multilayer neural network architecture with two hidden layers of 5 units each and one output layer of 5 units (16-5-5-5) involves three different instantaneous mean square errors at the same time, i.e. $E_O$ for the output layer, $E_{H1}$ for the first hidden layer, and $E_{H2}$ for the second hidden layer, presented for pattern $l$ as:

$$E^O_l = \frac{1}{2} \sum_{k=1}^{K} \bigl(d_{kl} - S(y^O_{kl})\bigr)^2 \qquad (15)$$

$$E^{H1}_l = \frac{1}{2} \sum_{g=1}^{G} \bigl(d_{gl} - S(y^{H1}_{gl})\bigr)^2 \qquad (16)$$

and

$$E^{H2}_l = \frac{1}{2} \sum_{j=1}^{J} \bigl(d_{jl} - S(y^{H2}_{jl})\bigr)^2 \qquad (17)$$
The proposed gradient learning rule for the instantaneous mean square error updates the weight vector for up to $t$ iterations. After this the weight updating is stopped and the genetic algorithm is applied. The updated weight and bias values are taken as the initial population of chromosomes for the genetic algorithm. As per our proposed neural network architecture, in this simulation design we have three sub-chromosomes, one for each of the two hidden layers and one for the output layer. The first sub-chromosome, as shown in Figure 6, has 85 genes, of which 80 are the weight values on the connection links and 5 are the biases for the units of the first hidden layer. The second and third sub-chromosomes have 30 genes each, of which 25 are weight values on the connection links and 5 are the biases for the units of the second hidden layer and the output layer respectively.

Fig. 6 (c): Sub-chromosome 3 for output layer of 30 genes

The mutation operator is applied simultaneously to all three sub-chromosomes by adding small random values between -1 and 1 to the selected genes to generate the new population of these sub-chromosomes. After this, selection is applied to all three sub-chromosomes to select the better population of chromosomes for the next generation. This selection procedure considers the distributed instantaneous mean square errors as specified in Equations 15, 16 and 17 as the fitness evaluation functions to select the sub-chromosomes for the next generation. We then apply the crossover operator simultaneously on all the selected sub-chromosomes to generate the larger population of the next generation. Thus, the crossover operator generates populations of sub-chromosomes for the first hidden layer, second hidden layer and output layer of 85 genes, 30 genes and 30 genes respectively. The selected populations of weights and biases from each sub-chromosome then determine the optimal solutions for the given training pattern set. Thus, a minimum of three optimal solutions is required for the convergence of the neural network.
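Stated as a predicate (threshold values taken from Table 2; the function name is illustrative):

```python
def converged_16_5_5_5(E_O, E_H1, E_H2, maxe_o=0.0001, maxe_h=0.001):
    # The 16-5-5-5 network converges only when all three distributed
    # errors (eqs. 15-17) reach their thresholds simultaneously.
    return E_O <= maxe_o and E_H1 <= maxe_h and E_H2 <= maxe_h
```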
Simulation design for 16-5-5 Neural Network
Architecture
The simulation of the proposed feedforward multilayer neural network architecture with one hidden layer of 5 units and one output layer of 5 units (16-5-5) involves two different instantaneous mean square errors at the same time, i.e. $E_O$ for the output layer and $E_{H1}$ for the hidden layer, presented for pattern $l$ as:

$$E^O_l = \frac{1}{2} \sum_{k=1}^{K} \bigl(d_{kl} - S(y^O_{kl})\bigr)^2 \qquad (18)$$

and

$$E^H_l = \frac{1}{2} \sum_{j=1}^{J} \bigl(d_{jl} - S(y^H_{jl})\bigr)^2 \qquad (19)$$

In this experiment we divide the chromosome into two sub-chromosomes, one for the hidden layer and one for the output layer. The first sub-chromosome, as shown in Figure 7, has 85 genes, of which 80 are the weight values on the connection links and 5 are the biases for the units of the hidden layer. The second sub-chromosome consists of 30 genes, of which 25 are weight values on the connection links and 5 are the biases for the units of the output layer.

Fig. 7 (b): Sub-chromosome 2 for output layer of 30 genes

The mutation operator is applied simultaneously to both sub-chromosomes by adding small random values between -1 and 1 to the selected genes to generate the new population of these sub-chromosomes. After this, selection is applied to both sub-chromosomes to select the better population of chromosomes for the next generation. This selection procedure considers the distributed instantaneous mean square errors as specified in Equations 18 and 19 as the fitness evaluation functions to select the sub-chromosomes for the next generation. We then apply the crossover operator simultaneously on the selected sub-chromosomes to generate the larger population of the next generation. Thus, the crossover operator generates populations of sub-chromosomes for the hidden layer and output layer of 85 genes and 30 genes respectively. The selected populations of weights and biases from each sub-chromosome then determine the optimal solutions for the given training pattern set. Thus, a minimum of two optimal solutions is required for the convergence of the neural network.
3.3 Parameters used
The following parameters are used to accomplish the simulation of these two experiments for the given training set of handwritten characters of 'Marathi' script.

Genetic algorithm with backpropagated error: The parameters of the genetic algorithm with backpropagated error for the simulation of both experiments are as follows:
Parameter: Value
Learning rate for output layer ($\eta_O$): 0.01
Learning rate for first hidden layer ($\eta_{H1}$): 0.01
Learning rate for second hidden layer ($\eta_{H2}$): 0.1
Momentum term ($\alpha$): 0.9
Adaption rate ($K$): 3.0
Mutation population size: 3
Crossover population size: 1000
Initial population: randomly generated values between 0 and 1
Fitness evaluation function (one fitness function): backpropagated instantaneous squared error $E_l = \frac{1}{2} \sum_{k=1}^{K} \bigl(d_k - S(y^O_k)\bigr)^2$
Minimum error (MAXE): 0.00001

Table 1: Parameters used for genetic algorithm with backpropagated error
Genetic algorithm with distributed error: The parameters used in the simulation of both experiments for the genetic algorithm with descent gradient learning for distributed error are as follows:
Parameter: Value
Learning rate for output layer ($\eta_O$): 0.01
Learning rate for hidden layers ($\eta_{H1}$ and $\eta_{H2}$): 0.1
Momentum term for output layer ($\alpha$): 0.9
Momentum term for hidden layers ($\beta$): 0.7
Adaption rate ($K$): 3.0
Minimum error for the output layer ($MAXE_O$): 0.0001
Minimum error for the hidden layers ($MAXE_H$): 0.001
Mutation probability: smaller than 0.01
Mutation population size for sub-chromosome of output layer: 3
Mutation population size for sub-chromosomes of hidden layers: 3 each
Crossover population size for output layer: 1000
Crossover population size for first hidden layer (16-5-5-5 architecture): 1000
Crossover population size for second hidden layer (16-5-5-5 architecture): 500
Crossover population size for hidden layer (16-5-5 architecture): 1000
Number of iterations prior to applying GA: 5000
Initial population: values of weights and biases in each sub-chromosome after 5000 iterations of descent gradient for distributed error
Fitness evaluation functions (two fitness functions for the 16-5-5 architecture and three for the 16-5-5-5 architecture): distributed instantaneous sums of squared errors
$E^O_l = \frac{1}{2} \sum_{k=1}^{K} \bigl(d_{kl} - S(y^O_k)\bigr)^2$,
$E^{H1}_l = \frac{1}{2} \sum_{g=1}^{G} \bigl(d_{gl} - S(y^{H1}_g)\bigr)^2$,
$E^{H2}_l = \frac{1}{2} \sum_{j=1}^{J} \bigl(d_{jl} - S(y^{H2}_j)\bigr)^2$

Table 2: Parameters used for descent gradient learning with distributed error
4 Results and Discussion
The results from the simulation design and implementation of both neural network architectures, i.e. 16-5-5-5 and 16-5-5, are considered for 65 training sample examples of handwritten 'Marathi' script with two hybrid techniques: genetic algorithm with descent gradient of the backpropagated instantaneous mean square error, and genetic algorithm with descent gradient of the distributed instantaneous mean square error. The performance of both neural network architectures has been evaluated with these two hybrid learning techniques for the given training set, and a performance analysis has also been carried out. In this performance analysis it has been found that the 16-5-5-5 neural network architecture performs better in terms of convergence, number of epochs and number of optimal solutions for the classification of the patterns in the training set. The performance of the 16-5-5-5 architecture is also found to be efficient and more generalized for the test pattern set. The results of the performance evaluation are shown in Tables 5 and 6. The table entries present the mean values of the iterations and the number of converged weight matrices over five trials with each hybrid technique for the given training set.
Table 5: Performance evaluation for GA with descent gradient of distributed error and backpropagated error for the 16-5-5 architecture
Table 6: Performance evaluation for GA with descent gradient of distributed error and backpropagated error for the 16-5-5-5 architecture

The results tables contain information about counts. The counts here represent the number of optimum solutions, i.e. the number of weight matrices on which the network converged for the given training set. The integer value for the epochs in the tables represents the number of iterations performed by each learning method to classify the given input pattern. It has been observed from the results that no case of non-convergence was found; thus the network is able to converge successfully to more than one optimum weight vector, or solution, for the given input pattern. Table 5 of the simulated results shows the performance evaluation between GA with descent gradient of the instantaneous mean square distributed error and GA with descent gradient of the backpropagated error for the 16-5-5 network architecture. This evaluation considers the parameters of epochs, i.e. the number of iterations for convergence, and counts, i.e. the number of optimal converged weight vectors. The results of Table 5 are the means of five trials for the same input pattern. Table 6 of the simulated results shows the performance
evaluation between GA with descent gradient of the instantaneous mean square distributed error and GA with descent gradient of the backpropagated error for the 16-5-5-5 network architecture. This evaluation also considers the parameters of epochs, i.e. the number of iterations for convergence, and counts, i.e. the number of optimal converged weight vectors. The results of Table 6 are likewise means of five trials for the same input pattern. An important observation about the optimal solutions also emerges from this simulation: an optimal solution is obtained only when more than one objective function is satisfied at the same time. In the case of our 16-5-5 network architecture there are two objective functions, one each for the hidden layer and the output layer, and the network converges only when both objective functions reach their defined minimum error threshold. Similarly, in the 16-5-5-5 network architecture we have three different objective functions, and the network converges only when all three objective functions reach their defined minimum error thresholds. Thus, the performance of the neural networks under descent gradient of the instantaneous mean square distributed error is a case of multi-objective optimization. On the other hand, GA with descent gradient of the instantaneous mean square backpropagated error considers only one objective, i.e. one common error function as the objective function for all the layers, so the number of optimal solutions, or counts, reflects only the converged weight matrices attaining this single error minimum; this is a case of single-objective optimization. It can be seen from the results of Tables 5 and 6 that the performance of the neural network architectures with descent gradient of the instantaneous mean square distributed error for multi-objective optimization is approximately the same as that of GA with descent gradient of the backpropagated error for single-objective optimization, in terms of the number of iterations and the number of counts.
5. Conclusion
In this work we have considered the simulation of two neural network architectures for performance evaluation with descent gradient of the instantaneous mean square distributed error with GA, and descent gradient of the instantaneous mean square backpropagated error with GA, for the classification of handwritten 'Marathi' curve scripts. We consider the instantaneous mean square distributed error as the mean of the squared difference between the target output pattern and the actual output pattern of each unit of each layer separately, corresponding to the presented input pattern. Thus the common target pattern is used by each layer with its own, differently computed actual output pattern. In this approach, convergence for the given training samples is therefore achieved only when the different error functions (two or three, depending on the architecture) are minimized simultaneously. Hence the optimum solution is constrained by multiple objective functions, and this reflects a case of multi-objective optimization instead of the single-objective optimization of the descent gradient of the instantaneous mean square backpropagated error. On the basis of the simulation results and analysis, the following observations can be drawn:
1. The performance of GA with descent gradient of distributed error for multi-objective optimization is better in most cases than GA with descent gradient of backpropagated error for single-objective optimization in terms of the number of optimal solutions, or counts. It is to be expected that the number of iterations for GA with descent gradient of distributed error is higher, because in this method there are three objective functions and all of them must be minimized for the optimal solution.
2. It can also be seen from the results that the behavior of GA with descent gradient of distributed error is more consistent and exhibits less randomness compared to GA with descent gradient of backpropagated error. There is another interesting observation about the performance of the neural networks for GA with descent gradient of distributed error, concerning the numbers of counts and iterations for new pattern information and for the same pattern information with different examples. Each time, for the same pattern information with different examples, the number of counts is higher and the number of iterations is lower, while for new pattern information the counts are lower and the number of iterations is higher. Thus, when we move from one unknown local error minimum to another, there are fewer optimum solutions and more iterations are required to converge.
3. Generally the GA starts from random solutions and converges towards the optimal solution. In multi-objective optimization the randomness of the GA increases and the possibility of obtaining the optimal solution decreases. In the proposed technique the GA does not start from a random population of solutions; instead it starts from sub-optimal solutions, because the GA is applied after some iterations of descent gradient of the instantaneous mean square distributed error. These iterations explore the direction of convergence, and from there the GA starts. Thus, the GA starts from sub-optimal solutions and moves towards the optimal solutions.
4. Multi-objective optimization is a dominant thrust area in soft computing research, and there are various real-world problems where it is required. The proposed method may be explored as a way to achieve optimal solutions for various multi-objective optimization problems. The performance of GA with descent gradient of distributed error could be further improved with different image processing methods for feature extraction from the handwritten curve scripts. These aspects can be considered in future work to evaluate the performance of the proposed method on various problem domains.
References
[1] Kumar, S., “Neural Networks: A Class room
approach”, New Delhi: Tata McGraw-Hill
(2004)
[2] Sun, Y., “Hopfield neural network based
algorithms for image restoration and
reconstruction-Part I: Algorithms and
Simulations”, IEEE Transaction on Signal
Process vol. 48(7), pp. 2105-2118 (2000)
[3] Szu, H., Yang, X., Telfer, B. and Sheng, Y.,
“Neural network and wavelet transform for
scale invariant data classification”, Phys. Rev.
E 48, pp. 1497-1501 (1993)
[4] Nagy, G., “Classification Algorithms in
Pattern Recognition,” IEEE Transactions on
Audio and Electroacoustics, vol. 16(2), pp.
203-212 (1968)
[5] Hoppensteadt, F.C. and Izhikevich, E.M.,
“Synchronization of Laser Oscillators,
Associative Memory, and Optical
Neurocomputing,” Phys. Rev., vol. 62(E), pp.
4010-4013 (2000)
[6] Keith, L.P., “Classification of Cm I energy
levels using counterpropagation neural
networks,” Phys. Rev., vol. 41(A), pp. 2457-
2461 (1990)
[7] Carlson, J.M., Langer, J.S. and Shaw, B.E.,
“Dynamics of earthquake faults,” Reviews of
modern physics, vol. 66(2), pp. 657-670
(1994)
[8] Palaniappan, R., “Method of identifying
individuals using VEP signals and neural
networks,” IEE Proc. Science Measurement
and Technology, vol. 151(1), pp. 16-20 (2004)
[9] Zhao, H., “Designing asymmetric neural
networks with associative memory,” Phys.
Rev. vol. 70(6), pp. 137-141 (2004)
[10] Schutzhold, R., “Pattern recognition on a
quantum computer,” Phys. Rev. vol. 67(A),
pp. 311-316 (2003)
[11] Impedovo, S., “Fundamentals in Handwriting
Recognition.” NATO-Advanced Study
Institute, vol. 124, Springer-Verlag (1994)
[12] Mori, S., Suen, C.Y. and Yamamoto, K.,
“Historical review of OCR research and
development,” Proceeding of the IEEE, vol.80
(7), pp. 1029-1058 (1992)
[13] Fukushima, K. and Wake, N., “Handwritten
alphanumeric character recognition by the
neocognitron,” IEEE transaction on Neural
Networks, vol. 2(3), pp. 355-365 (1991)
[14] Blackwell, K.T., Vogl, T.P., Hyman S.D.,
Barbour, G.S. and Alkon, D.L., “A New
Approach to Handwritten Character
Recognition,” Pattern Recognition vol. 25, pp.
655-666 (1992)
[15] Le Cun, Y., Boser, B., Denker, J.S.,
Henderson, D., Howard, R.E., Hubbard, W.,
and Jackel, L.D., “Handwritten Digit
Recognition with a Back-Propagation
Network,” Advances in Neural Information
Processing Systems, vol. 2, pp. 396-404
(1990)
[16] Kharma, N.N., and Ward, R.K., “A novel
invariant mapping applied to hand-written
Arabic character recognition,” Pattern
Recognition vol. 34(11), pp. 2115-2120
(2001)
[17] Badi, K. and Shimura, M., “Machine
recognition of Arabic cursive script,” Trans.
Inst. Electron. Commun. Eng., vol.65(E), pp.
107-114 (1982)
[18] Suen, C.Y., Nadal, C., Lagault, R., Mai, T.A.,
and Lam, L., “Computer recognition of
unconstrained handwritten numerals,” Proc.
IEEE, vol. 80(7), pp. 1162-1180 (1992)
[19] Knerr, S., Personnaz, L., and Dreyfus, G.,
“Handwritten digit recognition by neural
networks with single-layer training,” IEEE
Trans. on Neural Networks, vol. 3, pp. 962-
968 (1992)
[20] Lee, S.W., and Song, H.H., “A New Recurrent
Neural Network Architecture for Visual
Pattern Recognition,” IEEE Trans. on Neural
Networks, vol. 8(2), pp. 331-340 (1997)
[21] Urbanczik, R., “A recurrent neural network
inverting a deformable template model of
handwritten digits,” Proc. Int. Conf. Artificial
Neural Networks, Sorrento, Italy, pp. 961-964
(1994)
[22] Hagan, M.T., Demuth, H.B. and Beale, M.H.,
“Neural Network Design,” PWS Publishing
Co., Boston, MA (1996)
[23] Rumelhart, D.E., Hinton, G.E., and Williams, R.J., “Learning internal representations by error propagation,” MIT Press, Cambridge, vol. 1, pp. 318-362 (1986)
[24] Sprinkhuizen-Kuyer, I.G., and Boers, E.J.W.,
“The Local Minima of the error surface of the
2-2-1 XOR network,” Annals of Mathematics
and Artificial Intelligence, vol. 25(1-2), pp.
107-136 (1999)
[25] Zweiri, Y.H., Seneviratne, L.D., and
Althoefer, K., “Stability Analysis of a Three-
Term Backpropagation algorithm,” Neural
Networks Journal, vol. 18(10), pp. 1341-1347
(2005)
[26] Abarbanel, H., Talathi, S., Gibb, L., and
Rabinovich, M., “Synaptic plasticity with
discrete state synapses,” Phys. Rev., vol. E,
72:031914 (2005)
[27] Shrivastava, S. and Singh, M.P., “Performance evaluation of feed-forward neural network with soft computing techniques for hand written English alphabets,” Journal of Applied Soft Computing, vol. 11, pp. 1156-1182 (2011)