Negnevitsky, Pearson Education, 2011
Lecture 12
Hybrid intelligent systems: Evolutionary neural networks and fuzzy evolutionary systems
• Introduction
• Evolutionary neural networks
• Fuzzy evolutionary systems
• Summary
Evolutionary neural networks
• Although neural networks are used for solving a variety of problems, they still have some limitations.
• One of the most common is associated with neural network training. The back-propagation learning algorithm cannot guarantee an optimal solution. In real-world applications, the back-propagation algorithm might converge to a set of sub-optimal weights from which it cannot escape. As a result, the neural network is often unable to find a desirable solution to the problem at hand.
• Another difficulty is related to selecting an optimal topology for the neural network. The "right" network architecture for a particular problem is often chosen by means of heuristics, and designing a neural network topology is still more art than engineering.
• Genetic algorithms are an effective optimisation technique that can guide both weight optimisation and topology selection.
Encoding a set of weights in a chromosome
[Figure: a network with inputs x1, x2 and x3 (neurons 1 to 3), neurons 4 to 7 in the hidden layer, and output neuron 8 producing y; each connection is labelled with its weight.]
Weight matrix (rows: to neuron; columns: from neuron):
To \ From     1     2     3     4     5     6     7     8
    1         0     0     0     0     0     0     0     0
    2         0     0     0     0     0     0     0     0
    3         0     0     0     0     0     0     0     0
    4         0.9  -0.3  -0.7   0     0     0     0     0
    5        -0.8   0.6   0.3   0     0     0     0     0
    6         0.1  -0.2   0.2   0     0     0     0     0
    7         0.4   0.5   0.8   0     0     0     0     0
    8         0     0     0    -0.6   0.1  -0.2   0.9   0
Chromosome: 0.9 -0.3 -0.7 -0.8 0.6 0.3 0.1 -0.2 0.2 0.4 0.5 0.8 -0.6 0.1 -0.2 0.9
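The encoding in the figure can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the names `WEIGHTS`, `INCOMING`, `encode` and `decode` are hypothetical, and the table of incoming connections is read off the non-zero entries of the weight matrix.

```python
# Weight matrix from the figure: WEIGHTS[to - 1][frm - 1] is the weight on
# the connection from neuron `frm` to neuron `to` (0 means no connection).
WEIGHTS = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0.9, -0.3, -0.7, 0, 0, 0, 0, 0],
    [-0.8, 0.6, 0.3, 0, 0, 0, 0, 0],
    [0.1, -0.2, 0.2, 0, 0, 0, 0, 0],
    [0.4, 0.5, 0.8, 0, 0, 0, 0, 0],
    [0, 0, 0, -0.6, 0.1, -0.2, 0.9, 0],
]

# Hypothetical helper table: for each non-input neuron, the neurons feeding it.
INCOMING = {4: [1, 2, 3], 5: [1, 2, 3], 6: [1, 2, 3], 7: [1, 2, 3],
            8: [4, 5, 6, 7]}


def encode(weights, incoming):
    """Concatenate the incoming weights of each neuron, neuron by neuron."""
    chromosome = []
    for to in sorted(incoming):
        chromosome.extend(weights[to - 1][frm - 1] for frm in incoming[to])
    return chromosome


def decode(chromosome, incoming, n_neurons):
    """Rebuild the weight matrix from a chromosome."""
    weights = [[0] * n_neurons for _ in range(n_neurons)]
    genes = iter(chromosome)
    for to in sorted(incoming):
        for frm in incoming[to]:
            weights[to - 1][frm - 1] = next(genes)
    return weights


chromosome = encode(WEIGHTS, INCOMING)
print(chromosome)
```

Grouping the weights by receiving neuron keeps each neuron's incoming weights together, so they can be treated as one gene.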
• The second step, after encoding the weights as a chromosome, is to define a fitness function for evaluating the chromosome's performance. This function must estimate the performance of a given neural network. We can apply here a simple function defined by the sum of squared errors.
• The training set of examples is presented to the network, and the sum of squared errors is calculated. The smaller the sum, the fitter the chromosome. The genetic algorithm attempts to find a set of weights that minimises the sum of squared errors.
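A sum-of-squared-errors fitness can be sketched as follows. The slides do not specify the activation function or whether the neurons have biases, so this sketch assumes sigmoid neurons without biases and a single hidden layer; the function names and the training examples are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(chromosome, inputs, n_hidden):
    """Forward pass of a one-hidden-layer network with a single output.

    The chromosome holds the incoming weights of each hidden neuron in turn,
    followed by the incoming weights of the output neuron (assumed layout).
    """
    n_in = len(inputs)
    hidden = []
    for h in range(n_hidden):
        gene = chromosome[h * n_in:(h + 1) * n_in]
        hidden.append(sigmoid(sum(w * x for w, x in zip(gene, inputs))))
    out_gene = chromosome[n_hidden * n_in:]
    return sigmoid(sum(w * h for w, h in zip(out_gene, hidden)))


def sum_squared_errors(chromosome, training_set, n_hidden):
    """Smaller is fitter: the GA tries to minimise this value."""
    return sum((target - forward(chromosome, inputs, n_hidden)) ** 2
               for inputs, target in training_set)


# 16 weights for a 3-4-1 network; the training examples are made up.
chromosome = [0.9, -0.3, -0.7, -0.8, 0.6, 0.3, 0.1, -0.2, 0.2,
              0.4, 0.5, 0.8, -0.6, 0.1, -0.2, 0.9]
training_set = [([0.0, 1.0, 0.5], 1.0), ([1.0, 0.0, 0.2], 0.0)]
print(sum_squared_errors(chromosome, training_set, n_hidden=4))
```

Because the genetic algorithm only needs a number per chromosome, no gradient of the error is required.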
• The third step is to choose the genetic operators: crossover and mutation. A crossover operator takes two parent chromosomes and creates a single child with genetic material from both parents. Each gene in the child's chromosome is represented by the corresponding gene of a randomly selected parent.
• A mutation operator selects a gene in a chromosome and adds a small random value between −1 and 1 to each weight in this gene.
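The two operators can be sketched as below, assuming a gene is the group of weights feeding one neuron. The helper names, the `gene_sizes` grouping and the example parents are illustrative assumptions rather than values prescribed by the slides.

```python
import random


def split_into_genes(chromosome, gene_sizes):
    """Group a flat chromosome into genes (one gene per neuron)."""
    genes, start = [], 0
    for size in gene_sizes:
        genes.append(chromosome[start:start + size])
        start += size
    return genes


def crossover(parent1, parent2, gene_sizes, rng=random):
    """Create one child; each gene is copied from a randomly chosen parent."""
    child = []
    for g1, g2 in zip(split_into_genes(parent1, gene_sizes),
                      split_into_genes(parent2, gene_sizes)):
        child.extend(rng.choice((g1, g2)))
    return child


def mutate(chromosome, gene_sizes, rng=random):
    """Pick one gene and add a random value in [-1, 1] to each of its weights."""
    genes = split_into_genes(chromosome, gene_sizes)
    target = rng.randrange(len(genes))
    genes[target] = [w + rng.uniform(-1.0, 1.0) for w in genes[target]]
    return [w for gene in genes for w in gene]


# Illustrative parents with nine weights, grouped into four assumed genes.
sizes = (3, 2, 2, 2)
parent1 = [0.1, -0.7, -0.6, 0.5, -0.8, -0.2, 0.9, 0.4, -0.3]
parent2 = [0.3, 0.2, 0.3, -0.9, 0.6, 0.9, -0.5, -0.8, -0.1]
child = crossover(parent1, parent2, sizes)
print(child)
print(mutate(child, sizes))
```

Swapping whole genes keeps together the weights that jointly define one neuron's behaviour.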
Crossover in weight optimisation
[Figure: two parent networks and the child network produced by crossover. Each network has inputs x1 and x2 (neurons 1 and 2), hidden neurons 3, 4 and 5, and output neuron 6 producing y.]
Parent 1 chromosome: 0.1 -0.7 -0.6 0.5 -0.8 -0.2 0.9 0.4 -0.3
Parent 2 chromosome: 0.3 0.2 0.3 -0.9 0.6 0.9 -0.5 -0.8 -0.1
Child chromosome:    0.1 -0.7 -0.6 0.5 -0.8 0.9 -0.5 -0.8 -0.1
Mutation in weight optimisation
[Figure: the original network and the mutated network. Each has inputs x1 and x2 (neurons 1 and 2), hidden neurons 3, 4 and 5, and output neuron 6 producing y.]
Original chromosome: 0.1 -0.7 -0.6 0.5 -0.8 -0.2 0.9 0.4 -0.3
Mutated chromosome:  0.1 -0.7 -0.6 0.5 -0.8 -0.2 0.9 -0.1 0.2
Can genetic algorithms help us in selecting the network architecture?
The architecture of the network (i.e. the number of neurons and their interconnections) often determines the success or failure of the application. Usually the network architecture is decided by trial and error; there is a great need for a method of automatically designing the architecture for a particular application. Genetic algorithms may well be suited for this task.
• The basic idea behind evolving a suitable network architecture is to conduct a genetic search in a population of possible architectures.
• We must first choose a method of encoding a network's architecture into a chromosome.
Encoding the network architecture
• The connection topology of a neural network can be represented by a square connectivity matrix.
• Each entry in the matrix defines the type of connection from one neuron (column) to another (row), where 0 means no connection and 1 denotes a connection for which the weight can be changed through learning.
• To transform the connectivity matrix into a chromosome, we need only to string the rows of the matrix together.
Encoding of the network topology
[Figure: a network with inputs x1 and x2 (neurons 1 and 2), neurons 3, 4 and 5, and output neuron 6 producing y.]
Connectivity matrix (rows: to neuron; columns: from neuron):
To \ From   1  2  3  4  5  6
    1       0  0  0  0  0  0
    2       0  0  0  0  0  0
    3       1  1  0  0  0  0
    4       1  0  0  0  0  0
    5       0  1  0  0  0  0
    6       0  1  1  1  1  0
Chromosome:
0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 1 1 1 0
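Stringing the rows together, and reversing the operation, can be sketched as follows. The function names are hypothetical; the matrix is the one shown above.

```python
# Connectivity matrix from the slide: CONNECTIVITY[to - 1][frm - 1] is 1 when
# there is a trainable connection from neuron `frm` to neuron `to`.
CONNECTIVITY = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
]


def matrix_to_chromosome(matrix):
    """String the rows of the connectivity matrix together."""
    return [bit for row in matrix for bit in row]


def chromosome_to_matrix(chromosome, n_neurons):
    """Cut the chromosome back into rows of length n_neurons."""
    return [chromosome[r * n_neurons:(r + 1) * n_neurons]
            for r in range(n_neurons)]


def connections(matrix):
    """List the (from, to) pairs encoded by the matrix, neurons numbered from 1."""
    return [(col + 1, row + 1)
            for row, bits in enumerate(matrix)
            for col, bit in enumerate(bits) if bit]


chromosome = matrix_to_chromosome(CONNECTIVITY)
print(" ".join(str(bit) for bit in chromosome))
print(connections(CONNECTIVITY))
```

A binary chromosome of this kind can be handled by ordinary bit-string crossover and mutation.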
The cycle of evolving a neural network topology
[Figure: Generation i; Neural Network j evaluated on the training data set (Fitness = 117); Parent 1 and Parent 2; Crossover; Child 1 and Child 2; Mutation; Generation (i + 1).]
Fuzzy evolutionary systems
• Evolutionary computation is also used in the design of fuzzy systems, particularly for generating fuzzy rules and adjusting membership functions of fuzzy sets.
• In this section, we introduce an application of genetic algorithms to select an appropriate set of fuzzy IF-THEN rules for a classification problem.
• For a classification problem, a set of fuzzy IF-THEN rules is generated from numerical data.
• First, we use a grid-type fuzzy partition of an input space.
Fuzzy partition by a 3×3 fuzzy grid
[Figure: the input space X1 × X2, with both axes running from 0 to 1. Fuzzy sets A1, A2 and A3 partition x1 (membership µ(x1)) and fuzzy sets B1, B2 and B3 partition x2 (membership µ(x2)); training patterns of Class 1 and Class 2 are plotted in the grid.]
Fuzzy partition
• Black and white dots denote the training patterns of Class 1 and Class 2, respectively.
• The grid-type fuzzy partition can be seen as a rule table.
• The linguistic values of input x1 (A1, A2 and A3) form the horizontal axis, and the linguistic values of input x2 (B1, B2 and B3) form the vertical axis.
• At the intersection of a row and a column lies the rule consequent.
In the rule table, each fuzzy subspace can have only one fuzzy IF-THEN rule, and thus the total number of rules that can be generated in a K×K grid is equal to K×K.
Fuzzy rules that correspond to the K×K fuzzy partition can be represented in a general form as:
$$
\text{Rule } R_{ij}: \quad \text{IF } x_{1p} \text{ is } A_i \ \text{ AND } \ x_{2p} \text{ is } B_j \ \text{ THEN } \ x_p \in C_n \ \left\{ CF^{C_n}_{A_i B_j} \right\}
$$
$$
i = 1, 2, \ldots, K; \quad j = 1, 2, \ldots, K; \quad x_p = (x_{1p}, x_{2p}), \quad p = 1, 2, \ldots, P
$$
where $x_p$ is a training pattern on input space $X_1 \times X_2$, $P$ is the total number of training patterns, $C_n$ is the rule consequent (either Class 1 or Class 2), and $CF^{C_n}_{A_i B_j}$ is the certainty factor that a pattern in fuzzy subspace $A_i B_j$ belongs to class $C_n$.
To determine the rule consequent and the certainty factor, we use the following procedure:
Step 1: Partition an input space into K×K fuzzy subspaces, and calculate the strength of each class of training patterns in every fuzzy subspace.
Each class in a given fuzzy subspace is represented by its training patterns. The more training patterns, the stronger the class: in a given fuzzy subspace, the rule consequent becomes more certain when patterns of one particular class appear more often than patterns of any other class.
Step 2: Determine the rule consequent and the certainty factor in each fuzzy subspace.
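The slides do not give formulas for the class strength or the certainty factor, so the sketch below fills them in with assumptions: triangular membership functions evenly spaced on [0, 1], class strength as the sum over a class's patterns of the product of the two memberships, and a certainty factor equal to the winning strength minus the mean strength of the other classes, divided by the total strength. This is a common formulation for grid-based fuzzy classifiers; the names `membership` and `generate_rules` are hypothetical.

```python
def membership(x, index, k):
    """Triangular membership of x in the index-th of k fuzzy sets on [0, 1]."""
    centre = (index - 1) / (k - 1)
    width = 1 / (k - 1)
    return max(1 - abs(x - centre) / width, 0.0)


def generate_rules(patterns, k, classes=(1, 2)):
    """Return {(i, j): (consequent, certainty_factor)} for a k x k grid.

    patterns: list of ((x1, x2), class_label) with x1 and x2 in [0, 1].
    Subspaces with no patterns, or with tied classes, get no rule.
    """
    rules = {}
    for i in range(1, k + 1):
        for j in range(1, k + 1):
            # Step 1: strength of each class in fuzzy subspace A_i B_j.
            strength = {c: 0.0 for c in classes}
            for (x1, x2), label in patterns:
                strength[label] += membership(x1, i, k) * membership(x2, j, k)
            total = sum(strength.values())
            ranked = sorted(strength.values(), reverse=True)
            if total == 0 or ranked[0] == ranked[1]:
                continue  # consequent cannot be determined
            # Step 2: the strongest class is the consequent; the certainty
            # factor compares its strength with the mean of the others.
            winner = max(strength, key=strength.get)
            others = (total - strength[winner]) / (len(classes) - 1)
            rules[(i, j)] = (winner, (strength[winner] - others) / total)
    return rules


# Made-up training patterns: one per class, in opposite corners.
patterns = [((0.0, 0.0), 1), ((1.0, 1.0), 2)]
print(generate_rules(patterns, k=2))
```

With two classes the assumed certainty factor reduces to the absolute difference of the two strengths divided by their sum, so it is 1 when only one class is present and approaches 0 as the strengths become equal.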
The certainty factor can be interpreted as follows:
• If all the training patterns in fuzzy subspace $A_i B_j$ belong to the same class, then the certainty factor is maximum and it is certain that any new pattern in this subspace will belong to this class.
• If, however, training patterns belong to different classes and these classes have similar strengths, then the certainty factor is minimum and it is uncertain that a new pattern will belong to any particular class.
• This means that patterns in a fuzzy subspace can be misclassified. Moreover, if a fuzzy subspace does not have any training patterns, we cannot determine the rule consequent at all.
• If a fuzzy partition is too coarse, many patterns may be misclassified. On the other hand, if a fuzzy partition is too fine, many fuzzy rules cannot be obtained, because of the lack of training patterns in the corresponding fuzzy subspaces.
Training patterns are not necessarily distributed evenly in the input space. As a result, it is often difficult to choose an appropriate density for the fuzzy grid. To overcome this difficulty, we use multiple fuzzy rule tables.
Multiple fuzzy rule tables
[Figure: five fuzzy rule tables, for K = 2, K = 3, K = 4, K = 5 and K = 6.]
Fuzzy IF-THEN rules are generated for each fuzzy subspace of multiple fuzzy rule tables, and thus a complete set of rules for our case can be specified as:
$$2^2 + 3^2 + 4^2 + 5^2 + 6^2 = 90 \text{ rules.}$$
Once the set of rules $S_{ALL}$ is generated, a new pattern, x = (x1, x2), can be classified by the following procedure:
Step 1: In every fuzzy subspace of the multiple fuzzy rule tables, calculate the degree of compatibility of a new pattern with each class.
Step 2: Determine the maximum degree of compatibility of the new pattern with each class.
Step 3: Determine the class with which the new pattern has the highest degree of compatibility, and assign the pattern to this class.
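The three steps can be sketched as below. The slides do not define the degree of compatibility, so this sketch assumes it is the product of the two memberships and the rule's certainty factor, with triangular membership functions evenly spaced on [0, 1]. The names and the example rule table are hypothetical.

```python
def membership(x, index, k):
    """Triangular membership of x in the index-th of k fuzzy sets on [0, 1]."""
    centre = (index - 1) / (k - 1)
    width = 1 / (k - 1)
    return max(1 - abs(x - centre) / width, 0.0)


def classify(x, rule_tables, classes=(1, 2)):
    """Classify pattern x = (x1, x2) with multiple fuzzy rule tables.

    rule_tables: {k: {(i, j): (consequent, certainty_factor)}}, one entry
    per grid size k. Returns None when no class clearly wins.
    """
    x1, x2 = x
    best = {c: 0.0 for c in classes}
    for k, rules in rule_tables.items():
        for (i, j), (consequent, cf) in rules.items():
            # Step 1: compatibility of x with the class of this rule.
            degree = membership(x1, i, k) * membership(x2, j, k) * cf
            # Step 2: keep the maximum compatibility for each class.
            best[consequent] = max(best[consequent], degree)
    # Step 3: assign the class with the highest compatibility.
    ranked = sorted(best.values(), reverse=True)
    if ranked[0] == 0 or ranked[0] == ranked[1]:
        return None
    return max(best, key=best.get)


# A made-up rule table for K = 2 with two fully certain rules.
rule_tables = {2: {(1, 1): (1, 1.0), (2, 2): (2, 1.0)}}
print(classify((0.2, 0.1), rule_tables))
print(classify((0.9, 0.8), rule_tables))
```

Taking the maximum per class means a single well-matched, high-certainty rule can decide the outcome, whichever rule table it comes from.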
The number of multiple fuzzy rule tables required for an accurate pattern classification may be large. Consequently, a complete set of rules can be enormous. Meanwhile, these rules have different classification abilities, and thus by selecting only rules with high potential for accurate classification, we reduce the number of rules.
Can we use genetic algorithms for selecting fuzzy IF-THEN rules?
• The problem of selecting fuzzy IF-THEN rules can be seen as a combinatorial optimisation problem with two objectives.
• The first, more important, objective is to maximise the number of correctly classified patterns.
• The second objective is to minimise the number of rules.
• Genetic algorithms can be applied to this problem.
A basic genetic algorithm for selecting fuzzy IF-THEN rules includes the following steps:
Step 1: Randomly generate an initial population of chromosomes. The population size may be relatively small, say 10 or 20 chromosomes. Each gene in a chromosome corresponds to a particular fuzzy IF-THEN rule in the rule set defined by $S_{ALL}$.
Step 2: Calculate the performance, or fitness, of each individual chromosome in the current population.
The problem of selecting fuzzy rules has two objectives: to maximise the accuracy of the pattern classification and to minimise the size of a rule set. The fitness function has to accommodate both these objectives. This can be achieved by introducing two respective weights, $w_P$ and $w_N$, in the fitness function:
$$f(S) = w_P \frac{P_s}{P_{ALL}} - w_N \frac{N_S}{N_{ALL}}$$
where $P_s$ is the number of patterns classified successfully, $P_{ALL}$ is the total number of patterns presented to the classification system, and $N_S$ and $N_{ALL}$ are the numbers of fuzzy IF-THEN rules in set $S$ and set $S_{ALL}$, respectively.
The classification accuracy is more important than the size of a rule set. That is,
$$f(S) = 10 \frac{P_s}{P_{ALL}} - \frac{N_S}{N_{ALL}}$$
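The fitness function translates directly into code. In this sketch the function and parameter names are hypothetical; the default weights of 10 and 1 are those of the formula above, and the example counts are made up.

```python
def rule_set_fitness(n_correct, n_patterns, n_rules, n_all_rules,
                     w_p=10.0, w_n=1.0):
    """f(S) = w_P * P_s / P_ALL - w_N * N_S / N_ALL."""
    return w_p * n_correct / n_patterns - w_n * n_rules / n_all_rules


# A larger but more accurate rule set versus a smaller, less accurate one.
accurate = rule_set_fitness(n_correct=95, n_patterns=100,
                            n_rules=20, n_all_rules=90)
compact = rule_set_fitness(n_correct=90, n_patterns=100,
                           n_rules=5, n_all_rules=90)
print(accurate, compact)
```

With these weights, five more correctly classified patterns out of 100 outweigh fifteen additional rules out of 90, which reflects the priority given to accuracy.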
Step 3: Select a pair of chromosomes for mating. Parent chromosomes are selected with a probability associated with their fitness; a better fit chromosome has a higher probability of being selected.
Step 4: Create a pair of offspring chromosomes by applying a standard crossover operator. Parent chromosomes are crossed at a randomly selected crossover point.
Step 5: Perform mutation on each gene of the created offspring. The mutation probability is normally kept quite low, say 0.01. The mutation is done by multiplying the gene value by −1.
Step 6: Place the created offspring chromosomes in the new population.
Step 7: Repeat from Step 3 until the size of the new population becomes equal to the size of the initial population, and then replace the initial (parent) population with the new (offspring) population.
Step 8: Go to Step 2, and repeat the process until a specified number of generations (typically several hundred) has been completed.
The number of rules can be cut down to less than 2% of the initially generated set of rules.
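The whole procedure can be sketched as a short loop. Several details are assumptions, not taken from the slides: a gene value of +1 means the rule is kept and −1 means it is dropped, selection is roulette-wheel with fitness shifted to be positive, and the fitness function is supplied by the caller. The names `select` and `evolve` and the toy fitness are hypothetical.

```python
import random


def select(population, fitnesses, rng):
    """Roulette-wheel selection; fitness is shifted so every weight is positive."""
    low = min(fitnesses)
    weights = [f - low + 1e-9 for f in fitnesses]
    return rng.choices(population, weights=weights, k=1)[0]


def evolve(fitness, n_genes, pop_size=10, generations=200,
           mutation_rate=0.01, seed=None):
    """Return the fittest chromosome of the final population."""
    rng = random.Random(seed)
    # Step 1: random initial population of +1 / -1 genes (assumed coding).
    population = [[rng.choice((1, -1)) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    for _ in range(generations):  # repeat from Step 2 for each generation
        # Step 2: evaluate every chromosome.
        fitnesses = [fitness(c) for c in population]
        offspring = []
        while len(offspring) < pop_size:  # Step 7: fill the new population
            # Step 3: fitness-proportionate choice of two parents.
            p1 = select(population, fitnesses, rng)
            p2 = select(population, fitnesses, rng)
            # Step 4: one-point crossover at a random point.
            point = rng.randrange(1, n_genes)
            children = [p1[:point] + p2[point:], p2[:point] + p1[point:]]
            # Step 5: flip each gene with a small probability.
            for child in children:
                for g in range(n_genes):
                    if rng.random() < mutation_rate:
                        child[g] *= -1
            # Step 6: place the offspring in the new population.
            offspring.extend(children)
        population = offspring[:pop_size]
    return max(population, key=fitness)


# Toy fitness for demonstration only: prefer chromosomes that keep few rules.
best = evolve(lambda c: -sum(1 for g in c if g == 1), n_genes=8,
              pop_size=10, generations=50, seed=1)
print(best)
```

In a real application the toy fitness would be replaced by a function that decodes the chromosome into a rule set, classifies the training patterns, and combines accuracy with rule count.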