
Pergamon Expert Systems With Applications, Vol. 9, No. 2, pp. 189-199, 1995

Copyright © 1995 Elsevier Science Ltd Printed in the USA. All rights reserved

0957-4174/95 $9.50 + .00

0957-4174(94)00061-1

A Neuro-Expert System With a Conflict-Resolving Meta-Neural Network

FIGEN ULGEN

Justsystem Corporation, 3-46 Okihamahigashi, Tokushima-shi 770, Japan

NORIO AKAMATSU

University of Tokushima, 2-1 Minami Josanjima-cho, Tokushima-shi 770, Japan

Abstract--Neural networks and expert systems are two different approaches to classification problems. Neural networks, powerful for generalization, robust system behavior, and parallel processing, are weak in explaining their results, whereas expert systems suffer from the knowledge acquisition bottleneck. This paper combines these two approaches, using the Ternary Synaptic Weights algorithm, in order to obtain a system with the benefits of both expert systems and neural networks without their inherent disadvantages. We also introduce the concept of Meta-Neural Networks for classification conflict resolution, which is utilized in the implementation of a neuro-expert system for project management.

1. INTRODUCTION

IN THE PAST MANY classifier systems have been realized through expert systems, but due to the apparent success of neural networks in capturing regularities in sample data and the difficulties associated with the knowledge engineering of expert systems, the current trend is towards hybridization of these two approaches.

Expert systems and neural networks are actually quite different approaches to artificial intelligence; the former depends on top-down knowledge engineering and model-based reasoning, whereas the latter is associated with bottom-up learning and case-based reasoning (Lee & Lippmann, 1989). In spite of their power in parallel processing, generalization ability, and fault tolerance, neural networks suffer from the black-box syndrome, being unable to explain their reasoning unless the internal connections of the network are arranged according to the interrelationships of the input nodes to reflect the dependency framework of the system (Goodman, Higgins, Miller, & Smyth, 1992). Expert systems are criticized for the knowledge acquisition bottleneck and the brittleness problem, which can be defined as the inability of the system to deal with aspects of reasoning

Special thanks to Andrew Flavell for his assistance in preparing this paper.

Requests for reprints should be sent to Norio Akamatsu, University of Tokushima, 2-1 Minami Josanjima-cho, Tokushima-shi 770, Japan.

such as partial, uncertain, or fuzzy information and generalization (Gallant, 1988). A hybrid approach to solving classification problems, utilizing a neural network and an expert system, potentially offers the benefits of both without the inherent disadvantages.

The focus of this paper is the concept of Meta-Neural Networks for conflict resolution. This concept is applied to a neuro-expert system in which the neural network is trained by the Ternary Synaptic Weights algorithm (TSW) (Ulgen & Akamatsu, 1994). The TSW algorithm automatically determines the topology of the neural network during training, such that the resultant topology reflects the relationships of the input attributes with the output classes and forms an intuitive base from which rules for an expert system can be extracted. TSW performs supervised training for binary input-output associations, and the discrete representation of feature values is very suitable for conversion to a rule-based representation. Because the training algorithm can find the associations between the input and output but does not know how they are related, an expert system can be built from the extracted rules. This functions as the explanation capability of the system, thereby forming a new neuro-expert system that overcomes the weak points of both systems.

Sensors, feature detectors, or historical data in tabular form may provide the input-output pairs with which to train the network, and the selection of this training data is crucial to the success of the resulting network.


Generalization problems might occur when some training instances have attributes that are irrelevant to the decision; we refer to these as don't care attributes. When the system is being tested, these don't care attributes may create conflict situations, because the generalized instances may not be within the limits of the originally taught input-output pairs (Hahn-Ming & Ching-Chi, 1991). In order to resolve conflicting outputs that the neural network might provide, we propose Meta-Neural Networks (Meta-NN), which are used in much the same way that meta-rules are utilized in expert systems (Morris, 1986).

The remainder of this paper is organized as follows. Section 2.1 compares the widely utilized neural network training algorithm, Backpropagation, with the TSW training algorithm. This is followed, in Section 2.2, by an explanation of the classification of distinct patterns using TSW. Section 3 details conflicts that might arise due to the existence of don't care attributes in the input to classifier systems, and introduces Meta-Neural Networks as a solution to these conflicts. In Section 4, we present our neuro-expert system NESPRO, which is a prototype project management software package, and this is followed by the conclusions in Section 5.

2. TSW ALGORITHM

2.1. Comparison of TSW Algorithm with Backpropagation Algorithm

The backpropagation (BP) algorithm, which has found widespread use in neural network training, seeks to indirectly form the hyperplanes that distinguish between distinct classes of patterns through gradient descent, minimizing the error between the expected and obtained output values (Rumelhart & McClelland, 1986). It performs multiple passes over the training data, each pass including a forward phase for the determination of outputs and a backward phase for error feedback, from which the synaptic link weight and node bias corrections are calculated. In comparison, the TSW algorithm, which is implemented on a three-layer network, determines the thresholds for the hidden and output layer nodes and the weights of the synaptic links between the layers in one feedforward pass. Although BP has been successfully utilized in applications such as NETtalk (Sejnowski & Rosenberg, 1986), it has some serious drawbacks, including the local minima problem, slow convergence, and a heuristic approach to the formation of the network topology. Local minima trapping, which may occur depending upon the initial values assigned to the weights during training, prevents the network from learning. Given the slow nature of network training with the BP algorithm, the trial-and-error approach to the formation of the network topology is very costly. A number of learning algorithms utilizing binary synaptic weights, which do not suffer from these drawbacks, have been proposed (Andree, Barkema, Lourens, Taal, & Vermeulen, 1993), and TSW is an example of this class of algorithms.

Neural networks are distinguished from expert systems in their ability to perform parallel processing when realized in hardware. A neural network trained with the TSW algorithm can be easily realized in hardware because of the nature of its synaptic links and the simple integer arithmetic it employs. In contrast, the hardware implementation of the BP algorithm is very difficult. This is due to the calculation of the first derivative of the error function, which is necessary for determining the magnitude of the modification of every weight value in each backward pass, as can be seen in eq. (1):

$$\frac{\partial E}{\partial w_{ij}} = -o_i \, f'(net_j) \, (t_j - o_j) \qquad (1)$$

where o_i is the output of neuron i in the previous layer, net_j is the weighted sum input to output layer neuron j, o_j is that neuron's obtained activation, t_j is the desired activation for that neuron, and f' stands for the derivative of the activation function (usually a sigmoid). The implementation of multiplication, addition, and a sigmoid function over real numbers is costly and may be impossible to realize in hardware if the number of units is large (Andree, Barkema, Lourens, Taal, & Vermeulen, 1993; Maeda, Yamashita, & Kanata, 1991).
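For contrast, here is a minimal NumPy sketch of the per-weight computation in eq. (1), assuming a sigmoid activation; the function and variable names are ours, not from the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bp_weight_gradients(o_prev, net_j, t_j):
    """Eq. (1): dE/dw_ij = -o_i * f'(net_j) * (t_j - o_j) for every weight
    feeding output neuron j. Each term needs real-valued multiplication and
    a sigmoid evaluation, which is what makes hardware BP costly."""
    o_j = sigmoid(net_j)
    f_prime = o_j * (1.0 - o_j)   # derivative of the sigmoid at net_j
    return -o_prev * f_prime * (t_j - o_j)

# Three previous-layer outputs feeding one output neuron:
print(bp_weight_gradients(np.array([0.2, 0.9, 0.5]), net_j=0.4, t_j=1.0))
```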

The key advantages of the TSW learning algorithm are that it is very fast and ensures a solution, determining the number of hidden neurons as a matter of course. TSW does not encounter the problem of local minima because it does not rely on the concept of convergence. It requires only a single presentation of the training set, eliminating the need to store the training data for further iterations, as is required by the BP algorithm.

In the TSW algorithm, the neural network learns through the creation of separation hyperplanes between distinct pattern groups in the hyperspace. Each hyperplane corresponds to a node in the hidden layer of the network; initially there is only one hidden layer node, and during the formation of the hyperplanes new nodes are created and attached to the hidden layer. Adding a new hidden layer node corresponds to extracting new features from the input attributes, therefore reducing residual classification errors. The weights between the input and hidden layers form the coefficients of the linear equations that define the hyperplanes, whereas the thresholds of the hidden layer nodes are the constants of these equations. The synaptic link weights and the node thresholds therefore represent the orientation, spread, and offset of the hyperplanes, and hard-limiter thresholds serve as the activation functions for the hidden and output layer nodes. Thus, the training of a neural network using the TSW algorithm is a dynamic network construction


FIGURE 1. TSW network architecture. (Inputs i_1, i_2, ..., i_N feed the hidden layer nodes h_1, ..., h_L and the output nodes o_1, ..., o_M; N: number of attributes, L: number of hidden layer nodes, M: number of classes.)

process that involves adjusting the network weights and node thresholds as well as the topology.

After a neural network has been trained by the TSW algorithm, a snapshot of the system reflects the interrelationships between the input attributes and the condition units (hidden layer nodes), as seen in Figure 1.

2.2. Classification of Distinct Patterns with TSW Algorithm

In classification using the TSW algorithm, a representative input vector corresponding to a particular output class is chosen. This sample input vector, which we will refer to as x^key, lies approximately at the center of all of the other input pattern vectors in the hyperspace for the same class of output. The distance between two input pattern vectors, x̄ and ȳ, is determined as follows:

$$\mathrm{Dist}[\bar{x}, \bar{y}] = \sum_{q=1}^{N} d(x_q, y_q) \qquad (2)$$

such that

$$d(x_q, y_q) = \begin{cases} 0, & \text{if } x_q = y_q,\ x_q = \#,\ \text{or } y_q = \# \\ 1, & \text{if } x_q \neq y_q \text{ and neither is } \# \end{cases}$$

where q = 1..N, N is the number of input nodes, x_q and y_q are the qth elements of the x̄ and ȳ vectors, and "#" represents a don't care attribute. Using eq. (2) we determine x^far, the furthest input pattern vector with the same output classification type as x^key, and x^no, the nearest input pattern vector with an output classification type different from x^key:

$$\bar{x}^{far} = \bar{x}_k \ \text{such that} \ \mathrm{Dist}[\bar{x}^{key}, \bar{x}_k] = \max_{p=1..P}\left(\mathrm{Dist}[\bar{x}^{key}, \bar{x}_p]\right) \qquad (3)$$

$$\bar{x}^{no} = \bar{x}_k \ \text{such that} \ \mathrm{Dist}[\bar{x}^{key}, \bar{x}_k] = \min_{p}\left(\mathrm{Dist}[\bar{x}^{key}, \bar{x}_p]\right)$$

where P is the number of patterns that have the same output classification type as x^key. If Dist[x^key, x^far] < Dist[x^key, x^no], then the input patterns are linearly separable; otherwise we perform the separation at a distance D from x^key, ensuring that the majority of the x^key type patterns are included in the separation. We then search for non-x^key type patterns that might have been enclosed together with x^key type patterns, and perform further separation until all of the patterns have been isolated, as illustrated in Figure 2.

The linear equation for the separation hyperplane to be formed at distance D from x^key is:

$$\sum_{q=1}^{N} y_q = \frac{D + D + 1}{2}, \quad \text{where } y_q = \begin{cases} 1 - x_q, & \text{if } x_q^{key} = 1 \\ x_q, & \text{if } x_q^{key} = 0 \\ 0, & \text{if } x_q^{key} = \# \end{cases} \qquad (4)$$

The weight of the synaptic link between a hidden layer node and its corresponding output node is 1 if the hidden layer node is activated; otherwise it is 0. The threshold for the output node is the number of hyperplanes (hidden layer nodes) created to enclose the input patterns that belong to the same class of output, minus 0.5. The output nodes have hard-limiter threshold activation functions; an output node with an activation threshold of k - 0.5 will become activated if k of the hidden layer nodes connected to it are activated.

When an input pattern is presented to the network for classification after training, the distances between the input pattern and the training patterns are calculated in parallel by the corresponding hidden layer neurons. A hidden layer neuron outputs a 1 or a 0 depending on whether the input pattern lies on the positive or the negative side of the hyperplane it implements. Finally, an output node becomes activated if the cumulative input to the node exceeds the previously determined node threshold.

FIGURE 2. Separation of different patterns with multiple planes.
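To make the geometry of eqs. (2)-(4) concrete, here is a minimal Python sketch of the distance measure, the x^far/x^no selection, and the ternary hyperplane coefficients. It is our paraphrase of the published equations, not the authors' implementation; the function names and the inside/outside sign convention are our own assumptions:

```python
def dist(x, y):
    """Eq. (2): Hamming-style distance over pattern strings such as '1#0',
    where '#' (don't care) positions never contribute a mismatch."""
    return sum(0 if (a == '#' or b == '#' or a == b) else 1
               for a, b in zip(x, y))

def far_and_no(x_key, same_class, other_class):
    """Eq. (3): furthest pattern of the same class as x_key, and nearest
    pattern of a different class."""
    x_far = max(same_class, key=lambda p: dist(x_key, p))
    x_no = min(other_class, key=lambda p: dist(x_key, p))
    return x_far, x_no

def hyperplane(x_key, D):
    """Eq. (4): ternary weights and threshold of the plane at distance D
    from x_key: weight -1 where the key bit is 1, +1 where it is 0, and
    0 ('no connection') at don't care positions."""
    weights = [-1 if k == '1' else (1 if k == '0' else 0) for k in x_key]
    bias = x_key.count('1')       # constant arising from the (1 - x_q) terms
    threshold = (D + D + 1) / 2   # i.e., D + 0.5, midway between D and D + 1
    return weights, bias, threshold

def hidden_node_output(x, weights, bias, threshold):
    """A hidden node fires (1) when the binary input lies within distance D
    of x_key, i.e., on the enclosing side of its hyperplane."""
    s = bias + sum(w * int(b) for w, b in zip(weights, x))  # s = dist(x, x_key)
    return 1 if s < threshold else 0

# Example: patterns at distance <= 1 from the key '101' are enclosed.
w, b, t = hyperplane('101', D=1)
assert hidden_node_output('101', w, b, t) == 1   # distance 0: inside
assert hidden_node_output('001', w, b, t) == 1   # distance 1: inside
assert hidden_node_output('010', w, b, t) == 0   # distance 3: outside
```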

3. CONFLICT RESOLUTION WITH META-NNs

3.1. Handling of Don't Care Attributes

Most expert systems include don't care attributes among their feature values, because there may be many features



in the entire system, not all of which are required concurrently to satisfy a single condition. Therefore, when this situation is encountered, only the essential feature values are extracted; the rest are left as don't care attributes in the input to the system. During the training of a neural network to be utilized as a classifier, the don't care attributes can be considered as "1", "0", or both. A detailed discussion of the relative effects of assigning a value to the don't care attribute is presented in Hahn-Ming and Ching-Chi (1991). Replacing the don't care attributes with either a "1" or a "0" does not reflect the real intention, and therefore may create generalization problems. Because the main aim of the TSW training algorithm is to train a neural network that will be utilized in a hybrid neuro-expert system, don't care attributes are taken into consideration and will be represented as "#".

The approach that we have adopted in implementing the TSW algorithm is to make the classification according to the necessary attributes while ignoring the don't care attributes. From a classifier system point of view, the hidden layer nodes correspond to condition units. If an input node has no contribution, negative or positive, to the satisfaction of the condition represented by the hidden layer node, its synaptic link is assigned to be "0," representing "no connection."

3.2. The Concept of Meta-NNs

We consider the problem of classifying a set of N discrete feature variables, which may also include don't care attributes, into M distinct classes. Formally, let X = [x_1, x_2, ..., x_N] be the input vector, where x_i ∈ {0, 1, #} (1 ≤ i ≤ N), and let Y = NN(X) be the output vector [y_1, y_2, ..., y_M], where y_j ∈ {0, 1} (1 ≤ j ≤ M) and NN stands for the classification performed by the neural network. The presence of a feature is indicated by a "1" and conversely the absence is indicated by a "0". A "#" represents a don't care attribute value.

If there are k don't care attributes in a single input pattern vector to the neural network, generalization by replacing the don't care attributes with all possible values (binary values in this case) results in an increase of 2^k - 1 input pattern vectors; Figure 3 illustrates this transformation. Examining the expansion given in Figure 3b, we notice that there is an implicit conflict in the new training set: the input vector [11101] corresponds to two different output vectors, resulting in a one-to-many relationship between the input-output pairs.

FIGURE 3. Conversion of don't care attribute values.

(a) Training samples:

  Input   Output
  1##01   11000
  #1101   00100

(b) Training samples after expansion:

  Input   Output
  10001   11000
  10101   11000
  11001   11000
  11101   11000
  01101   00100
  11101   00100
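The 2^k expansion and the conflict it introduces are easy to reproduce; the following is a minimal illustration using the patterns of Figure 3, not code from the paper:

```python
from itertools import product

def expand(pattern):
    """Replace each '#' by both binary values: a pattern with k don't
    cares becomes 2**k concrete vectors (an increase of 2**k - 1)."""
    choices = [('0', '1') if c == '#' else (c,) for c in pattern]
    return [''.join(bits) for bits in product(*choices)]

samples = [('1##01', '11000'), ('#1101', '00100')]
expanded = [(x, out) for pat, out in samples for x in expand(pat)]

# Detect one-to-many input-output mappings, e.g. '11101' below.
seen = {}
for x, out in expanded:
    if x in seen and seen[x] != out:
        print(f"conflict: {x} -> {seen[x]} and {out}")
    seen.setdefault(x, out)
```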

In order to resolve such conflicts, we add a second neural network to the system and use this network in the same way that meta-rules function in an expert system. Meta-rules are the rules that give directions on how to handle rules (Morris, 1986); when more than one rule is eligible for activation, meta-rules determine the rule with the higher priority for activation, thereby eliminating the conflict set that would be created if all eligible rules were fired. We refer to the second neural network as the Meta-NN; it simulates the meta-rules' behavior.

The first neural network (NN) is trained with the feature classification input-output pairs, and the Meta-NN is trained with the conflict and generalization error cases as input and the desired classification as output. If desired, additional classifier information can provide the Meta-NN with the information it needs to select one classification over another when a conflict occurs. With reference to Figure 4, let G = [g_1, g_2, ..., g_k] be the vector of additional classifier information, where k is the number of classifier bits that carry the required information for conflict resolution. The input to the Meta-NN is (Y&G), the concatenation of the output vector Y = NN(X) with G. Z = Meta-NN(Y&G) is the output from the Meta-NN, where Z = [z_1, z_2, ..., z_M] and z_j ∈ {0, 1} (1 ≤ j ≤ M).

FIGURE 4. Combination of a Neural Network and a Meta-Neural Network. (Feature detectors produce X = [x_1, x_2, ..., x_N]; the NN outputs Y = [y_1, y_2, ..., y_M], which is concatenated with the additional classifier bits G = [g_1, g_2, ..., g_k] and fed to the Meta-NN.)

FIGURE 5. NESPRO neuro-expert system. (The neural network's output is passed to a rule-based explanation system if the user requests an explanation.)
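Schematically, the arbitration of Figure 4 reduces to a few lines. The following is a sketch under our assumptions; `nn` and `meta_nn` stand in for the trained networks, which the paper does not give as code:

```python
def classify(x, g, nn, meta_nn):
    """Run the primary network; if more than one output node fires
    (a conflict set), arbitrate with Z = Meta-NN(Y & G)."""
    y = nn(x)                # Y = NN(X), with y_j in {0, 1}
    if sum(y) <= 1:          # at most one class fired: no conflict
        return y
    return meta_nn(y + g)    # '&' concatenation of Y with the bits G

# Stub networks, purely for illustration:
nn = lambda x: [1, 1, 0]          # two output nodes fired -> conflict
meta_nn = lambda yg: [1, 0, 0]    # the Meta-NN picks a single class
print(classify([1, 0, 1], [1, 0], nn, meta_nn))   # -> [1, 0, 0]
```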

4. EXAMPLE: NEURO-EXPERT PROJECT MANAGEMENT SYSTEM (NESPRO)

We have implemented a prototype neuro-expert classifier system, NESPRO, that performs the task-resource assignment for software development project management according to features such as overtime work, priority of tasks, availability of resources such as programmers, and budget factors. The neural network part of the system was trained with available data from past projects, and examining the trained network enabled us to extract the general task-resource assignment pattern in order to determine the heuristics for the best allocation of resources. These patterns were converted to rules to build an explanation module for the classifier system (as in Figure 5); they are also useful for further modifications in case only the rule-base part of the system is utilized.

Because the Meta-NN's function can best be explained through its similarity to meta-rules in a rule-based system, we present here samples from two sets of extracted rules. We will refer to the first set simply as the rule set and to the second set as the meta-rule set.

Rule Set:
* if (task t: conflict in allocation) and (resource r overtime == 0)
  then (assign overtime to resource r on task t)
* if (task t: conflict in allocation) and (task t on resource r is low priority)
  then (delay task t on resource r)
* if (task t: conflict in allocation) and (resource k is available for task t)
  then (assign task t to resource k)

Meta-rule Set:
* if (assign overtime to resource r on task t) and (assign task t to resource k) and
  (cost overtime to resource r < budget t) and (cost resource k on task t < budget t)
  then (choose minimum cost action)
* if (assign overtime to resource r) and (assign task t to resource k) and
  (cost overtime to resource r < budget t) and (cost resource k on task t > budget t)
  then (assign overtime to resource r)

With respect to the neural networks, the first rule set corresponds to the NN and the meta-rule set corresponds to the Meta-NN. Here the meta-rule set decides upon the decisions of the rule set with the aid of additional information, which in this example is the budget constraints. The additional information corresponds to the additional feature bits that are appended to the output of the NN as input to the Meta-NN. The additional features are those that are known to have a more decisive effect on the solution but are not necessary in all decision-making instances. In the case of NESPRO, the additional feature bits represent information regarding particular budget constraints, skills, and past performance of programmers, which are not considered unless severe task allocation conflicts occur.
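Procedurally, the budget-driven meta-rules above behave like the following sketch; the cost figures and the helper name are hypothetical illustrations, not values from NESPRO:

```python
def resolve_conflict(actions, budget):
    """Meta-rule behaviour: among conflicting actions, keep those whose
    cost is within budget; if several remain, choose the minimum cost
    action; if exactly one remains, it wins by default."""
    affordable = [a for a in actions if a['cost'] < budget]
    return min(affordable, key=lambda a: a['cost']) if affordable else None

conflict_set = [
    {'action': 'assign overtime to resource r on task t', 'cost': 120},
    {'action': 'assign task t to resource k', 'cost': 200},
]
print(resolve_conflict(conflict_set, budget=300)['action'])
# -> 'assign overtime to resource r on task t'
```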

In NESPRO the resources' schedules are presented on a calendar for monthly task allocation, as shown in Figure 6. In the figure the green blocks are holidays, on which task allocation is not performed, whereas the white blocks represent work days. A star in the center of a white block indicates that work has been scheduled for that particular day, and clicking the mouse inside one of these blocks will display the day's schedule. A "C" mark in the upper right corner of a work day indicates the occurrence of a daily schedule conflict. Figure 7 displays the initiation of automatic schedule conflict resolution, which will be performed according to the decisions made by the neural network. The conflict resolution may require several passes over the schedule in order to resolve all conflicts. The project management routine takes the specific task properties (allowable delay, delay cost, etc.) into consideration while following the neural network's decisions to resolve the conflicts for each day. For example, Figure 8 displays the decision the neural network made for January 3rd in the first pass, which is to assign overtime work to the programmer resource. Because the menu option "Explain" in Figure 6 had a check mark on it, indicating that the user wants the neural network to explain its decisions, the dialog box in Figure 9 appears. Examining Figure 9, we can see that two rules were fired, forming a conflict set in the first module; in other words, two output nodes of the network were activated. In order to choose the best solution, the Meta-NN is activated automatically so that only one action is chosen, which is to choose the minimum cost action according to the budget factors.


FIGURE 6. Monthly resource schedules.

FIGURE 7. Initiation of automatic schedule conflict resolution.


FIGURE 8. The neural network's decision for conflict resolution for January 3rd in the first pass.

FIGURE 9. Explanation of the neural network's decision for January 3rd in the first pass.


FIGURE 10. The neural network's decision for conflict resolution for January 3rd in the second pass.

FIGURE 11. Explanation of the neural network's decision for January 3rd in the second pass.


FIGURE 12. The resources' schedules without conflicts.

FIGURE 13. Display for observing the neural network activations for conflict resolution.


FIGURE 14. The display of NN and Meta-NN.

In this case, the minimum cost action turns out to be the allocation of overtime. This process is repeated until all the conflicts are resolved. In Figure 10, the decision by the neural network is to "Delay Task 1" for that particular day, and its explanation is given in Figure 11. When all conflicts are resolved, the "C" conflict sign in the right-hand corner of the blocks disappears, as shown in Figure 12. Here we also notice that the stars are spread further into the month, indicating that tasks have been assigned to these days. If the user wants to observe the activation of the neural network nodes during conflict resolution, the "Network Activation" menu item in Figure 13 provides the display and the automatic data entry to the neural network from the resource's calendar. The neural networks, NN and Meta-NN, can be partially observed in Figure 14. In this figure, the outputs of the first neural network, NN, form the inputs to the second neural network, Meta-NN, along with the additional feature nodes. The outputs of the Meta-NN are the same as those of the NN except for the last node (which is obscured in the figure), which selects the minimum cost among all of the members of the conflict set.

5. CONCLUSION

In this paper we have presented the TSW algorithm and the concept of Meta-Neural Networks, utilizing both in the implementation of a neuro-expert system, NESPRO.

TSW is suitable for the integration of expert systems with neural networks because of its discrete-valued synaptic weights and the binary nature of its input and output. In the integration of symbolic artificial intelligence with neural networks, training algorithms with synaptic link properties such as those of the TSW algorithm are beneficial. It is also suitable for hardware implementation, again owing to its discrete-valued synaptic weights and the simplicity of the calculations involved in training the network. A hybrid system, where a neural network realized in hardware performs fast decision making and an expert system provides explanations when required, can capture the best attributes of both approaches. Our future research will be oriented towards a variety of implementations of this concept.

Our hybrid system benefited from the addition of the Meta-NN to resolve conflicts that could not otherwise be dealt with by a single neural network. In our implementation, the Meta-NN was only activated when the first neural network received real-time input that could not be classified with the initial training data. Should it be necessary, a number of Meta-NNs could be cascaded in the same way that meta-rules are cascaded in rule-based systems.

REFERENCES

Andree, H. M. A., Barkema, G. T., Lourens, W., Taal, A., & Vermeulen, J. C. (1993). A comparison study of binary feedforward neural networks and digital circuits. Neural Networks, 6, 785-790.

Gallant, S. I. (1988). Connectionist expert systems. Communications of the ACM, 31(2), 152-179.

Goodman, R., Higgins, C. M., Miller, J. W., & Smyth, P. (1992). Rule-based neural networks for classification and probability estimation. Neural Computation, 4, 781-804.

Hahn-Ming, L., & Ching-Chi, H. (1991, November). The handling of don't care attributes. Proceedings of the International Joint Conference on Neural Networks 1991, 2, 1085-1091, Singapore.

Lee, Y., & Lippmann, R. P. (1989). Practical characteristics of neural network and conventional pattern classifiers on artificial and speech problems. Advances in Neural Information Processing Systems, 2, 168-177.

Maeda, Y., Yamashita, H., & Kanata, Y. (1991, November). Learning rules for multilayer neural networks using a difference approximation. Proceedings of the International Joint Conference on Neural Networks 1991, 1, 628-633, Singapore.

Morris, M. E. (1986). Meta-level reasoning for conflict resolution in backward chaining. In Gupta, A., & Prasad, S. (Eds.), Principles of expert systems (pp. 175-180). N.J.: IEEE Press.

Rumelhart, D. E., & McClelland, J. L. (1986). Parallel distributed processing (Vol. 1). Cambridge, MA: The MIT Press.

Sejnowski, T. J., & Rosenberg, C. R. (1986). NETtalk: A parallel network that learns to read aloud (Tech. Rep. JHU/EECS-86-01). Johns Hopkins University.

Ulgen, F., & Akamatsu, N. (1994, April). Ternary synaptic weights algorithm: Neural network training with don't care attributes. Proceedings of the International Symposium on Speech, Image Processing and Neural Networks 1994, 2, 503-506, Hong Kong: IEEE Press.