Bucket Brigade

MACHINE LEARNING USING A GENETIC-BASED APPROACH.

Michael O. OdetayoSchool of Mathematical and Information Sciences,

Coventry University, Priory Street, Coventry CV1 5FB, UKPhone: +1203 838377, Fax: +1203 8388585

email: [email protected]

ABSRACT: A Holland learning classifier system is one of the methods for applying a genetic-based approach tomachine learning applications. An enhanced version of the system that employs the Bucket-brigade algorithm toreward individuals in a chain of co-operating rules is implemented and assigned the task of learning rules forclassifying simple objects. Results are presented which show that the system was able to learn rules for the task. It isargued that a classifier based learning method requires little training examples and that by its use of geneticalgorithms to search for new plausible rules, the method should be able to cope with changing conditions. However,the results appear to indicate that the use of bucket-brigade as a fitness function and as a means of rewarding rules inthe type of application considered needs further investigation.

KEYWORDS: Classifiers, Bucket-brigade, Genetic Algorithms, Machine Learning

1. INTRODUCTION

A Holland Classifier Learning system is one of the ways of using evolutionary methodology for machine learningapplications. These systems are a class of rule-based, message processing systems, Holland (1986); Goldberg (1985);Riolo (1986); Holland, Holyoak, Nisbett and Thagard (1986); Goldberg (1989); Odetayo (1990). Rules in thesesystems are known as classifiers because they are mainly used to classify messages into general sets. Many rulescould be active simultaneously since the only action of an active classifier is to post a message onto a message list.

Learning in classifier systems is achieved by two mechanisms: Bucket-brigade and Genetic Algorithms, Holland(1986). Bucket-brigade is designed for allocating credit to classifiers or rules according to their usefulness in attainingsystems goals. It is also designed as a fitness function; rating each classifier in the system. Genetic Algorithms areused to search for new plausible classifiers or rules by recombining good classifiers to produce new ones.

An enhanced version of the algorithm is implemented and assigned the task of evolving learning rules for classifyingsimple objects described in terms of their features. Results are presented that demonstrate the ability of the method toevolve learning rules for the given task. It is argued that the algorithm does not require a lot of training examples.However, the results indicate that the effectiveness of bucket-brigade as a fitness function and as a mechanism forrewarding rules in the type of application considered needs further investigation.

A Holland classifier system is briefly described in section 2. Section 3 describes the classification task. Theimplemented version of the classifier system is described in section 4. Experimental Results are presented in section 5and a conclusion and discussion is given in section 6.

2. CLASSIFIER SYSTEMS

Classifier systems are a class of rule-based, message processing systems, Holland (1986); Goldberg (1985); Riolo(1986); Holland, Holyoak, Nisbett and Thagard (1986); Goldberg (1989); Odetayo (1990). Rules are known asclassifiers because they are mainly used to classify messages into general sets. Learning in classifier systems isachieved by two mechanisms: Bucket-brigade and Genetic Algorithms. Bucket-brigade allocates strength (credit) tothe classifiers according to their usefulness in attaining system goals. Genetic Algorithms are used to search for newplausible classifiers.

A basic classifier learning system is made up of an input interface, a classifier list, a message list, an output interface,Bucket-brigade and Genetic Algorithm.

2.1 CLASSIFIER LIST

The classifier list is the system’s long term memory. It is made up of a population of classifiers. A classifier is made upof one or more conditions (known as the condition part) and one action (called the action part). The condition partspecifies the set of messages to which a classifier is sensitive, and the action part indicates the message it will broadcastor send out when its condition part is satisfied.Thus, a classifier list consists of one or more classifiers of the form:

c1, c2, ..., cn/a

where c1, c2, ..., cn {n >= 1} are the conditions making up the condition part and ‘a’ is the action part.

Each ci is a string of fixed length K over a fixed alphabet. In most practical systems, the string is defined over three

alphabets: {1,0,#}. The ‘#’ is a don’t care (wild card) symbol that can match any of the chosen symbols.

For example, a classifier having a condition represented by:1#01

will recognise (match) the following messages:1101 and 1001

A classifier posts one or more messages onto the message list when it is activated.

The action part of a classifier is used to form the message it sends out when it is activated. It is also a string of fixedlength K defined over the same alphabet as the condition part. However, the meaning and purpose of ‘#’ differ fromthat of the condition part. Here it plays the role of a pass “through”. Wherever it occurs in the string of the action part,the alphabet in the corresponding position of the message string that satisfied the classifier’s condition is passed throughinto the out-going message from the classifier.

Let v1, v2, .., vk, vi ε {1,0} be a string of K symbols in the condition part of a classifier and m1, m2, ..., mk, mi ε {1,0}

be a message of K symbols matching the condition part of the classifier. Then the message sent out has value at positioni as follows:

vi if vi = 1 or 0

and mi if vi = #That is, the action part is designed for recoding recognised messages.

2.2 MESSAGE LIST

The message list acts as the system’s short-term memory and as the medium for communication between classifiers,and the output interface. It is made up of external messages (input observations) and internal messages (messages fromclassifiers). A message is represented by a string of fixed length K (same length as that for a condition or an action)over the same set of alphabets as the action or condition part of a classifier except that the alphabet ‘#’ is not allowed.

2. 3 INPUT INTERFACE

This receives the input messages from the environment and transforms them into fixed length strings to be placed on themessage list.

2.4 OUTPUT INTERFACE

Messages placed on the message list by classifiers are processed through the output interface in order to communicatewith the system’s environment.

2.5 BUCKET-BRIGADE ALGORITHM

The bucket-brigade algorithm is designed to solve the credit assignment problem for classifier systems and to determinethe worth of each classifier. The credit assignment problem is that of deciding which of a set of early active classifiersshould receive credit for setting the stage for later successful actions. To this end, a numerical quantity (called strength)is assigned to each classifier. This strength is adjusted continually by the algorithm to reflect the classifier’s pastusefulness. Each classifier whose condition part is satisfied by one or more messages makes a bid to post a messageonto the message list. Only the highest bidders are allowed to become active and hence post messages. A classifier’s biddepends on its strength and the specificity of its condition. The specificity measures the relevance of a classifier’scondition to a particular message. Formally, a classifier’s bid is defined as:

Bid = a*strength*specificity.where specificity = number of non-#s/Total Length of condition part; a = a constant less than 1 (usually 1/8 or 1/16)

The winning classifiers place their messages on the message list and their strengths are reduced by the amount of theirbids. Typically, if Sc(t) denotes the strength of a winning classifier, c, at time t and Bc(t) denotes its bid at the same

time t, then its strength at time t+1 is:Sc(t+1) = Sc(t) - Bc(t).

In other words a winner pays for posting a message on the list. Classifiers that sent messages matched by this winner

have their strengths increased by the amount of its bid. For example, if c’ sent the message matched by c above, then

the strength of c’ at time t+1 is:

Sc’(t+1) = Sc’(t) + Bc(t)

When system goals are attained, a pay-off (reward or punishment) is added to the strength of all the classifiers that areactive at that time.

2.6 BASIC EXECUTION CYCLE OF A LEARNING CLASSIFIER SYSTEM

The basic execution cycle of a learning classifiers system is:(i) Place message(s) from input interface onto message list.(ii) Compare message(s) on the message list with conditions of all classifiers and record matches.(iii) Make classifiers bid for the right to recode messages.(iv) Generate new messages.(v) Adjust strengths of classifiers using the bucket-brigade algorithm.(vi) Replace old message list by the new message list.(vii) Process message list through the output interface.(viii) If no classifier fired, use genetic algorithm to generate one or more new classifiers to replace theweakest classifier or classifiers.(ix) Return to step (i) or stop if no more messages.

3. CLASSIFICATION TASK

The learning program is assigned the task of inducing classification rules for classifying simple objects described interms of 8 attributes. The task is to classify an object that has one or more of the following features:

wing, 2-legs/wheels, 3-legs/wheels. 4-legs/wheels, small, big, flies

into one of the following:bird, plane, vehicle or bird/plane

4. THE IMPLEMENTED VERSION OF THE HOLLAND CLASSIFIER LEARNING SYSTEM

The characteristics of the enhanced version of the Holland classifier learning system developed and implemented for thetask described in section 3 above are given below.

4.1 CLASSIFIER LIST

For the sake of simplicity, each classifier had one condition and one action. The condition was represented as a fixedstring of characters defined over the alphabets {1, 0, #}. A ‘1’ indicates a feature that must be present, ‘0’ represents afeature that must be absent and ‘#’ indicates a feature that may or may not be present. The action part was alsorepresented as a string of characters over the same three alphabets as the condition part.

4.2 MESSAGE LIST

The message list was made up of training examples and messages generated by the classifiers. The training exampleswere not removed from the message list until the end of the learning session. They were ‘fed’ into the system againafter each generation. This departs from the standard practice of generating a completely new message list at eachgeneration. The procedure implemented here ensured that newly produced classifiers had a chance of bidding for andclassifying the training examples no matter when they were generated, and it also ensured that messages which were notrecognised at previous generations were not lost.

4.3 INITIAL CLASSIFIERS AND CLASSIFIER STRENGTHS

An example pattern for each class was randomly chosen, and used to form part of the initial classifiers. Some featureswere randomly converted to don’t cares and pass throughs. Pairs were then randomly selected from these initialclassifiers and ‘mated’ to produce the remaining members of a fixed-size population of classifiers.

Each newly generated classifier is assigned a value which is modified by the bucket-brigade to reflect is usefulness.This value is the same initially for each classifier.

4.4 BUCKET-BRIGADE

The bucket-brigade algorithm was modified so that external payoffs (rewards or punishment), when received, are paidto the classifier responsible for the action that earned the payoff and to all those that had taken part (in previousgenerations) in helping to arrive at that goal. It should be noted that in the standard classifier system, payoffs are addedto the strengths of classifiers active at the time the payoff is received see section 2 above. The new measure wasintroduced to overcome some of the problems encountered during the initial stages of the experiments. It wasdiscovered that some bad classifiers were rewarded simply because they happened to be active when payoff wasreceived. It was designed to prevent this from happening. The measure was also designed to speed up the passage ofreward or punishment to classifiers responsible for previous actions that led to the receipt of the payoff. In a standardimplementation of the bucket-brigade algorithm, a particular sequence of actions will have to be repeated several timesbefore payoffs could flow through to the classifiers that were active at the early stages.

4.5 REPLACEMENT OF CLASSIFIERS

In addition to the strength of a classifier which is modified by the bucket-brigade, another metric called NO_of_winswas introduced to determine when a classifier is to be removed from the fixed-size classifier list so as to have room fornew ones. The metric is a simple count of the number of times a classifier succeeded in bidding for the right to placemessages onto the message list. Whenever new classifiers are to be introduced into the system, those whose strengthsfell below the fixed value assigned to each newly generated classifier and those whose NO-of-wins is zero are replaced.

Standard classifier systems typically employ only the strength of a classifier to determine the ‘life’ span of theparticular classifier. One major problem with this arrangement is that a classifier that is necessary for some sequence ofactions (i.e. it is part of a chain) could be removed because its strength happens to be lower than other members of thepopulation. The new metric is designed to overcome this problem.

5. EXPERIMENTAL RESULTS

The system was able to learn rules for the given task using only a few training examples and starting with classifiersthat were randomly generated. The quality of training examples was not taken into account before they were chosen.

5.1 WRONG CONDITIONS

Initial experiments showed that at the end of a training period (i.e. when all the training examples have beensuccessfully recognised at a generation), some of the classifiers evolved had wrong/illegal conditions. For example, oneclassifier for recognising a plane had Wing, flies as mandatory conditions and engine as a wild card. In other words itwould classify an object that has Wings and flies but has no engine as a plane. Possible solutions to this probleminclude: careful selection of training set, increasing the size of the training set, controlling the application of geneticoperators so that valid offspring are always produced). We chose none of these solutions since our main goal is toexploit the ability of the algorithm to perform well in a noisy environment and its ability to make little or no assumptionabout its problem domain. Instead, we chose to invoke a weeding process to remove all illegal classifiers when allexamples have been classified and to re-train the system. Although this process increased the time it took the system tolearn the needed rules, the ability of the system to correctly classify previously unseen data was increased.

5.2 TRAINING SPEED

As mentioned in 4.4, 4.5 and 5.1 some measures were introduced so as to enhance the performance of the algorithm.The measure introduce in section 5.1 did increase the time for the system to complete its training sessions. It took anaverage of about 4 hours for the system to evolve the required rules. We are of the opinion that a parallelimplementation of the algorithm would speed up the learning process.

6. CONCLUSION AND DISCUSSION.

A genetic based approach – an enhanced version of the Holland classifier learning system - was able to evolve learningclassification rules. However the learning rate of a serial implementation of a version of the algorithm is slow. As themethod is essentially parallel, it is expected that a parallel implementation of the method will improve its learning ratesignificantly.

It is noteworthy that the mechanism does not require a lot of training examples. In addition, by the use of GeneticAlgorithms to search for new plausible rules, the method should be able to cope with changing conditions.

As mentioned in sections 4.4, 4.5, and 5.1 a number of measures were introduced to weed out bad classifiers and tospeed up the passage of payoffs to the classifiers that co-operated to achieve a payoff. The fact that these measures wereneeded seem to suggest that the effectiveness of using bucket-brigade as a fitness function and as a means of rewardingclassifiers in this type of application domain needs further investigation.

7. REFERENCES

Goldberg , David E. (1989), “Genetic Algorithms in Search, Optimization & Machine Learning”, Addison-WesleyPublishing Company, Inc., 1989.

Goldberg, David E. (1985), “Dynamic System Control Using Rule Learning and Genetic Algorithms”, 9th Intl. J. Conf.Artif. Intelligence (IJCAI), pp. 588 - 592, 1985.

Holland, John H. (1986), “Escaping Brittleness: The possibilities of General - purpose Learning Algorithms Applied toParallel Rule - based Systems”, Machine Learning an Artificial Intelligence Approach Vol. II, pp. 593 -623, ed. R.S.Michalski, J. G. Carbonnell and T. M. Mitchell, Tioga, Palo Alto, Calf., 1986.

Holland, John H.; Holyoak, Keith J.; Nisbett, Richard E.; and Thagard, Paul R., (1986), “Induction Processes ofInference, Learning and Discovery”, MIT Press, Cambridge, 1986.

Odetayo, Michael O. (1990), “On Genetic Algorithms in Machine Learning And Optimisation”, PhD Thesis,University of Strathclyde, Glasgow, U.K. 1990.

Riolo, Rick, L. (1986), “ A Package of Domain Independent Subroutines for Implementing Classifier Systems inArbitrary, User-defined Environments”, Logic Computers Group, Division of Computer Science and Engineering,University of Michigan, Jan. 1986.

Documents

Bucket Brigade