


FLSI Soft Computing Series — Volume 5

A New Paradigm of Knowledge Engineering by Soft Computing

Editor: Liya Ding

Fuzzy Logic Systems Institute (FLSI)


A New Paradigm of Knowledge Engineering by Soft Computing


Fuzzy Logic Systems Institute (FLSI) Soft Computing Series

Series Editor: Takeshi Yamakawa (Fuzzy Logic Systems Institute, Japan)

Vol. 1: Advanced Signal Processing Technology by Soft Computing edited by Charles Hsu (Trident Systems Inc., USA)

Vol. 2: Pattern Recognition in Soft Computing Paradigm edited by Nikhil R. Pal (Indian Statistical Institute, Calcutta)

Vol. 3: What Should be Computed to Understand and Model Brain Function? — From Robotics, Soft Computing, Biology and Neuroscience to Cognitive Philosophy edited by Tadashi Kitamura (Kyushu Institute of Technology, Japan)

Vol. 4: Practical Applications of Soft Computing in Engineering edited by Sung-Bae Cho (Yonsei University, Korea)

Vol. 6: Brainware: Bio-Inspired Architecture and Its Hardware Implementation edited by Tsutomu Miki (Kyushu Institute of Technology, Japan)


FLSI Soft Computing Series — Volume 5

A New Paradigm of Knowledge Engineering by Soft Computing

Editor

Liya Ding National University of Singapore

World Scientific
Singapore • New Jersey • London • Hong Kong


Published by

World Scientific Publishing Co. Pte. Ltd.

P O Box 128, Farrer Road, Singapore 912805

USA office: Suite 1B, 1060 Main Street, River Edge, NJ 07661

UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

A NEW PARADIGM OF KNOWLEDGE ENGINEERING BY SOFT COMPUTING FLSI Soft Computing Series — Volume 5

Copyright © 2001 by World Scientific Publishing Co. Pte. Ltd.

All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.

For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.

ISBN 981-02-4517-3

Printed in Singapore by Fulsland Offset Printing


To

Prof. Lotfi A. Zadeh and other pioneers

who have changed our life in more ways than one and who have encouraged as well as guided us to continue

our research and development in Soft Computing


Series Editor's Preface

The IIZUKA conference series originated from the Workshop on Fuzzy Systems Application held in 1988 in Iizuka, a small city located in the center of Fukuoka prefecture on Kyushu, the southernmost of the main islands of Japan, and famous for coal mining until forty years ago. Iizuka has since been redeveloped as a science research park. The first IIZUKA conference was held in 1990, and from then onward the conference has been held every two years. This series of conferences has played an important role in modern artificial intelligence. The 1988 workshop proposed the fusion of fuzzy concepts and neuroscience, and this proposal encouraged research on neuro-fuzzy systems and fuzzy neural systems that has produced significant results. The 1990 conference was dedicated to the special topic of chaos, and nonlinear dynamical systems came into the interests of researchers in the field of fuzzy systems. The fusion of fuzzy, neural and chaotic systems was familiar to the conference participants by 1992. This new paradigm of information processing, including genetic algorithms and fractals, has spread over the world as "Soft Computing".

The Fuzzy Logic Systems Institute (FLSI) was established in 1989, under the supervision of the Ministry of Education, Science and Sports (MOMBUSHOU) and the Ministry of International Trade and Industry (MITI), for the purpose of proposing brand-new technologies, collaborating with companies and universities, giving university students an education in soft computing, etc.

FLSI is the major organization promoting the so-called IIZUKA Conference, so this series of books edited from the IIZUKA Conferences is named the FLSI Soft Computing Series.

The Soft Computing Series covers a variety of topics in Soft Computing and will chart the emergence of post-digital intelligent systems.

Takeshi Yamakawa, Ph.D.
Chairman, IIZUKA 2000
Chairman, Fuzzy Logic Systems Institute



Volume Editor's Preface

Soft computing (SC) consists of several computing paradigms, including neural networks, fuzzy set theory, approximate reasoning, and derivative-free optimization methods such as genetic algorithms. The integration of these constituent methodologies forms the core of soft computing, and the synergy allows soft computing to incorporate human knowledge effectively, deal with imprecision and uncertainty, and learn to adapt to unknown or changing environments for better performance. Together with other modern technologies, soft computing and its applications bring unprecedented influence to intelligent systems that mimic human intelligence in thinking, learning, reasoning and many other aspects.

On the other hand, knowledge engineering (KE), which deals with knowledge acquisition, representation, validation, inferencing, explanation, and maintenance, has made significant progress recently due to the indefatigable efforts of researchers. Undoubtedly, the hot topics of data mining and knowledge/data discovery have injected a new lease of life into the classical AI world.

It is obvious that soft computing and knowledge engineering are expected to fulfill some common targets in materializing machine intelligence. In recent trends, many researchers of SC have applied their techniques to solving KE problems, and researchers of KE have adopted SC methodologies to enhance KE applications. The cooperation of the two disciplines is not only extending the application of SC, but also introducing new innovation to KE.

There are fifteen chapters in total in this book. Except for the introductory chapter, which provides the reader with a guideline on the contents, the remaining fourteen chapters are extended versions of original conference papers selected from IIZUKA'98. These papers mainly present work on:

• Acquisition and modelling of imprecise knowledge
• Reasoning and retrieval with imprecise knowledge
• Description and representation of fuzzy knowledge
• Knowledge representation and integration by SC
• Knowledge discovery and data mining by SC

The fourteen chapters are divided into two parts. The first part (Chapters 2 to 9) mainly focuses on fuzzy knowledge-based systems, including

(i) fuzzy rule extraction,
(ii) fuzzy system tuning,
(iii) fuzzy reasoning,
(iv) fuzzy retrieval, and
(v) knowledge description language for fuzzy systems.

The second part (Chapters 10 to 15) mainly focuses on

(vi) knowledge representation,
(vii) knowledge integration,
(viii) knowledge discovery, and
(ix) data mining

by soft computing.

The aim of this book is to help readers trace out how KE has been influenced and extended by SC and how SC will be helpful in pushing the frontier of KE further. This book is intended for researchers and also graduate students to use as a reference in the study of knowledge engineering and intelligent systems. The reader is expected to have a basic knowledge of fuzzy logic, neural networks, genetic algorithms, and knowledge-based systems.

Acknowledgments

1. All authors of the original papers for their valuable contributions.
2. Prof. Takeshi Yamakawa for his constant encouragement.
3. Prof. Masao Mukaidono from Meiji University for his guidance to me in the establishment of a foundation for research on fuzzy logic and knowledge engineering.


4. Prof. Lotfi A. Zadeh and other pioneers (too numerous to name individually) for their support to me over the past 14 years.

5. The Institute of Systems Science, National University of Singapore for providing me the opportunity of doing research and applying the results.

6. Mrs. Jenny Russon for editing and polishing my English with amazing speed and thoroughness.

Liya Ding
Singapore


Contents

Series Editor's Preface vii

Volume Editor's Preface ix

Chapter 1 Knowledge Engineering and Soft Computing — An Introduction 1 Liya Ding

Part I: Fuzzy Knowledge-Based Systems

Chapter 2 Linguistic Integrity: A Framework for Fuzzy Modeling — AFRELI Algorithm 15 Jairo Espinosa, Joos Vandewalle

Chapter 3 A New Approach to Acquisition of Comprehensible Fuzzy Rules 43 Hiroshi Ohno, Takeshi Furuhashi

Chapter 4 Fuzzy Rule Generation with Fuzzy Singleton-Type Reasoning Method 59 Yan Shi, Masaharu Mizumoto

Chapter 5 Antecedent Validity Adaptation Principle for Table Look-Up Scheme 77 Ping-Tong Chan, Ahmad B. Rad

Chapter 6 Fuzzy Spline Interpolation in Sparse Fuzzy Rule Bases 95 Mayuka F. Kawaguchi, Masaaki Miyakoshi

Chapter 7 Revision Principle Applied for Approximate Reasoning 121 Liya Ding, Peizhuang Wang, Masao Mukaidono

Chapter 8 Handling Null Queries with Compound Fuzzy Attributes 149 Shyue-Liang Wang, Yu-Jane Tsai


Chapter 9 Fuzzy System Description Language 163 Kazuhiko Otsuka, Yuichiro Mori, Masao Mukaidono

Part II: Knowledge Representation, Integration, and Discovery by Soft Computing

Chapter 10 Knowledge Representation and Similarity Measure in Learning a Vague Legal Concept 189 MingQiang Xu, Kaoru Hirota, Hajime Yoshino

Chapter 11 Trend Fuzzy Sets and Recurrent Fuzzy Rules for Ordered Dataset Modelling 213 Jim F. Baldwin, Trevor P. Martin, Jonathan M. Rossiter

Chapter 12 Approaches to the Design of Classification Systems from Numerical Data and Linguistic Knowledge 241 Hisao Ishibuchi, Manabu Nii, Tomoharu Nakashima

Chapter 13 A Clustering based on Self-Organizing Map and Knowledge Discovery by Neural Network 273 Kado Nakagawa, Naotake Kamiura, Yutaka Hata

Chapter 14 Probabilistic Rough Induction 297 Juzhen Dong, Ning Zhong, Setsuo Ohsuga

Chapter 15 Data Mining via Linguistic Summaries of Databases: An Interactive Approach 325 Janusz Kacprzyk, Slawomir Zadrozny

About the Authors 347

Keyword Index 369


Chapter 1

Knowledge Engineering and Soft Computing — An Introduction

Liya Ding

National University of Singapore

1.1 Introduction

As the title, "A New Paradigm of Knowledge Engineering by Soft Computing", indicates, this book presents works at the intersection of two areas of computer science in the broad sense: knowledge engineering and soft computing.

Knowledge engineering (KE) [2], known as an important component of artificial intelligence (AI), is an area that mainly concentrates on activities with knowledge, including knowledge acquisition, representation, validation, inference, and explanation.

Soft computing (SC) [14], on the other hand, is an area that provides tools and methodologies for intelligent systems to be developed with the capability of handling uncertainty and imprecision, learning new knowledge and adapting themselves to a changing environment.

Though the concept of knowledge engineering was put forward in its own right in the early years, without recognition of the usefulness of soft computing, soft computing methodologies, including fuzzy logic, neural networks, and evolutionary computation, have from the beginning been related to one or more aspects of KE, and therefore of AI, each contributing its particular strengths.

There have been many remarkable works done in parallel in both the KE and SC areas, but relatively few in the intersection of the two. In recent trends, many researchers of SC have applied their techniques to solving KE problems, and researchers of KE have adopted SC methodologies to enhance KE applications. This book introduces to the reader a collection of works that bring new innovation to knowledge engineering through the application of soft computing.

1.1.1 Knowledge and Knowledge Engineering

Knowledge, or the problem of dealing with knowledge, has been of intensive interest to sociologists and psychologists for a long time. With the development of artificial intelligence (AI), the emphasis has shifted from philosophical and social concepts to the problem of representation, or more precisely, the problem of representation of knowledge in computers.

Knowledge is a highly abstract concept. Although most of us have a fairly good idea of what it means and how it relates to our life, we have probably not explored some of its wider meaning in a universal context. Knowledge can be defined as the body of facts, principles, acts, state of knowing and experience accumulated by humankind.

However, this definition is far from complete, and knowledge is actually much more than this. It encompasses actual experience with languages, concepts, procedures, rules, ideas, abstractions, places, customs, facts, and associations, coupled with an ability to use these experiences effectively in modeling different aspects of the world.

Knowledge is closely related to intelligence. Knowledge-based systems are often described as being 'intelligent' in the sense that they attempt to simulate many of the activities which when undertaken by a human being are regarded as being instances of intelligence.

The differentiation between types of knowledge can be made in several ways. From the point of view of the use of intelligent systems, knowledge can be divided into the following types: (1) declarative knowledge, which is passive knowledge expressed as statements of facts about the world; (2) procedural knowledge, which is compiled knowledge related to the performance of some task; and (3) heuristic knowledge, which describes human experience for solving complex problems.

In building a knowledge-based system for a specific domain, so-called domain knowledge can be considered to be of two main kinds: a) surface knowledge and b) deep knowledge. Surface knowledge is the heuristic, experiential knowledge learned after solving a large number of problems in that domain. Deep knowledge refers to the basic laws of nature and the fundamental structure and behavioral principles of that domain, which cannot be altered.

In regard to levels of abstraction and completeness, knowledge can be summarized in different forms. Rules are often used to represent more deterministic and abstract knowledge, through a certain relationship between the antecedent and the consequent. Cases are useful for describing knowledge gained from previous experience, which tells us how related factors appear together without our knowing clearly which is the cause and which is the effect. Patterns, compared with rules and cases, are usually used to store less abstract and sometimes less complete knowledge.

The difference between types, or forms, of knowledge is not always absolute. Heuristics may be of the nature of declarative knowledge or procedural knowledge. Cases may be represented in the form of rules through necessary transformation. Patterns may also be summarized as cases or even rules through appropriate granulation or quantization, if the corresponding knowledge-based system so requires.

Knowledge includes and requires the use of data and information, but should not be confused with them. Knowledge includes skill and training, perception, imagination, intuition, common sense and experience, and combines relationships, correlations, and dependencies. It has been widely accepted that with a sufficient amount of data, some useful knowledge may possibly be discovered through a certain discovery technique. As a recent hot topic, data mining for knowledge discovery has attracted more and more attention.

Knowledge engineering is a discipline devoted to integrating human knowledge into computer systems, or in other words, to building knowledge-based systems. It can be viewed from either a narrow or a wider perspective. According to the narrow perspective, knowledge engineering deals with knowledge acquisition (also referred to as knowledge elicitation), representation, validation, inference, and explanation. According to the wider perspective, the term describes the entire process of development and maintenance of knowledge-based systems. In both cases knowledge plays the key role.

Knowledge engineering, especially the knowledge acquisition practice, involves the cooperation of human experts in the domain, who work with the knowledge engineer to codify and make explicit the rules (or other forms of knowledge) that a human expert uses to solve real problems. Since the construction of a knowledge base needs human knowledge in a direct or an indirect way, an important issue in the design of knowledge-based systems is how to equip them with human knowledge that often appears to be uncertain, imprecise, and incomplete to some degree.

1.1.2 Soft Computing

Soft computing is an emerging approach to computing which parallels the remarkable ability of the human mind to reason and learn in an environment of uncertainty and imprecision. (Lotfi A. Zadeh [14])

As pointed out by Prof. Lotfi A. Zadeh, soft computing is not a single methodology, but a partnership. The principal partners at this stage are fuzzy logic (FL), neurocomputing (NC), and probabilistic reasoning (PR), with the latter subsuming genetic algorithms (GA), chaotic systems, belief networks, and parts of learning theory. The pivotal contribution of FL is a methodology for computing with words; that of NC is system identification, learning, and adaptation; and that of GA is systematized random search and optimization [5].

Fuzzy Logic Fuzzy logic has both a narrow and a broad sense. In the narrow sense, fuzzy logic is viewed as a generalization of the various multivalued logics. It mainly refers to approximate reasoning, as well as knowledge representation and inference with imprecise, incomplete, uncertain or partially true information. In the broad sense, fuzzy logic includes all research efforts related to fuzzy inference systems (or fuzzy systems).

It is generally agreed that human knowledge includes imprecision, uncertainty, and incompleteness by nature, because the human brain interprets imprecise and incomplete sensory information provided by perceptive organs. Instead of simply rejecting such ambiguity, fuzzy set theory, as an extension of set theory, offers a systematic calculus for dealing with this kind of information. It performs numerical computation by using linguistic labels stipulated by membership functions. With fuzzy sets, human knowledge described in words can be represented and hence processed in a computer. Fuzzy logic, in its narrow sense, offers the possibility of inference with uncertainty and imprecision. Together with fuzzy set theory, it provides the basis of fuzzy inference systems.
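As a minimal illustration of these ideas (my sketch, not taken from the book), the Python snippet below encodes a single linguistic label as a triangular membership function; the label "warm" and its breakpoints are invented for the example.

```python
# Illustrative sketch (not from the book): a triangular membership
# function mapping a numeric value to a degree in [0, 1], the basic
# building block behind linguistic labels such as "warm".

def tri(x, left, peak, right):
    """Triangular membership function with modal value `peak`."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# "warm" as a fuzzy set over temperature in degrees Celsius
warm = lambda t: tri(t, 15.0, 22.0, 30.0)

print(warm(22.0))  # 1.0   -- fully "warm"
print(warm(18.0))  # ~0.43 -- partially "warm"
```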

A typical fuzzy inference system has a structured knowledge representation in the form of fuzzy "if-then" rules. A fuzzy "if-then" rule (or fuzzy rule) takes the same form as a symbolic if-then rule, but is interpreted and executed in a different way through the use of linguistic variables.
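Continuing the illustration (again my sketch, reusing `tri` from the previous snippet, with invented rules and singleton consequents in the zero-order Sugeno style), each fuzzy rule fires to a degree and the rule base blends the consequents by firing strength:

```python
# A minimal, hypothetical illustration of how fuzzy "if-then" rules are
# executed: every rule fires partially, and the crisp output is the
# weighted average of the singleton consequents.

rules = [
    # (antecedent membership function, singleton consequent)
    (lambda t: tri(t, 0.0, 10.0, 20.0), 0.9),   # IF temp is LOW  THEN heater = 0.9
    (lambda t: tri(t, 15.0, 22.0, 30.0), 0.2),  # IF temp is WARM THEN heater = 0.2
]

def infer(t):
    weights = [mu(t) for mu, _ in rules]
    total = sum(weights)
    return sum(w * y for w, (_, y) in zip(weights, rules)) / total

print(infer(17.0))  # blends both rules according to their firing degrees
```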

Fuzzy knowledge representation and approximate reasoning have greatly extended the ability of the traditional rule-based system. However, such a system lacks the adaptability to deal with a changing environment and assumes that well-structured knowledge is available for the problem domain. Thus, learning concepts have been incorporated into fuzzy inference systems. One important way of materializing learning in fuzzy inference systems is to use neural networks.

Neural Networks The original idea of artificial neural networks (also known as neural networks) is inspired by biological nervous systems. A neural network system is a continuous-time nonlinear dynamic system. It uses connectionist architectures to mimic human brain mechanisms for intelligent behavior. Such connectionism replaces symbolically structured representation with distributed representation in the form of weights between a massive set of interconnected processing units. The weights are modified through a certain learning procedure so that the neural network system can be expected to improve its performance progressively in a specific environment.

Neural networks are good at fault tolerance and can learn from training data provided in non-structured and non-labelled form. However, compared with fuzzy inference systems, the knowledge learned by a neural network is usually non-transparent and hard to explain. Many researchers have put effort into rule extraction from neural networks and rule generation using neural networks. The extracted or generated rules can then be used to develop fuzzy inference systems, with fine tuning where necessary and possible.

Evolutionary Computation Fuzzy logic offers knowledge representation and an inference mechanism for knowledge processing with imprecision and incompleteness. Neural networks materialize learning and adaptation capabilities for intelligent systems. Evolutionary computation, in turn, provides the capacity for population-based systematic random search and optimization.

Evolutionary computing techniques such as genetic algorithms (GA) are based on the evolutionary principle of natural selection. A GA carries out fitness evaluations over a population of candidate solutions and leads the search toward better fitness. A "best" solution may always be hoped for in many AI applications, so the use of heuristic search techniques forms an important part of applying intelligent systems. However, in reality it is not always possible to obtain such an optimal solution when the search space is too large for an exhaustive search and at the same time too difficult to reduce. Genetic algorithms are a usable technique for performing a more efficient search to find good, if less-than-optimal, solutions.
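A hedged sketch of this idea follows, with an invented real-valued encoding and an invented fitness function; it shows only the selection/crossover/mutation loop of population-based search, not any method from the later chapters.

```python
# Minimal genetic-algorithm sketch: rank by fitness, keep the best half,
# refill the population with one-point crossover plus Gaussian mutation.

import random

def ga(fitness, dim=3, pop_size=20, generations=50, sigma=0.1):
    pop = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)          # evaluate and rank
        parents = pop[: pop_size // 2]               # selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, dim)           # one-point crossover
            child = a[:cut] + b[cut:]
            child = [g + random.gauss(0, sigma) for g in child]  # mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

# Toy objective: maximize closeness of every gene to 0.5.
best = ga(lambda v: -sum((g - 0.5) ** 2 for g in v))
```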

1.1.3 Soft Computing Contributes to Knowledge Engineering

Some of the contributions of soft computing to knowledge engineering can be found in the following aspects:

• Knowledge Representation Fuzzy logic can be used to represent imprecise and incomplete knowledge described in words. On the other hand, knowledge-based neural networks offer a connectionist way of knowledge representation with the learning ability of neural networks.

• Knowledge Acquisition When the information and data obtained as domain knowledge are less structured and summarized, neural networks can be employed for learning. A trained neural network can be viewed as a form of knowledge representation, and rule extraction may then be applied to obtain fuzzy rules from the neural network. Some clustering techniques can also be used with fuzzy logic to help fuzzy rule extraction. Genetic algorithms can help search for more accurate fuzzy rules or fine-tune fuzzy rules.

• Knowledge-based Inference In a broad sense, both fuzzy inference systems and neural network systems offer knowledge-based inference. In fuzzy inference systems, inference is executed by using fuzzy rules, fuzzy relations, and fuzzy sets within the frame of fuzzy logic. In neural network systems, the inference results are determined by inference algorithms based on the knowledge learned by the neural networks. Genetic algorithms can be used to find a better neural network configuration.

Page 22: 45052361 a New Paradigm of Knowledge Engineering by Soft Computing

Knowledge Engineering and Soft Computing — An Introduction 7

• Modeling and Developing Knowledge-based Systems Neuro-fuzzy modeling is a pivotal technique in soft computing, incorporating neural network learning concepts into fuzzy inference systems. Hybrid systems provide more capability in developing intelligent systems through the cooperation of SC techniques.

• Knowledge Integration Knowledge integration becomes a critical issue for maximizing the functionality of an intelligent, knowledge-based system when the knowledge for the specific domain exists at different levels of abstraction and completeness, or comes from various sources and is described in different forms. The cooperation of soft computing techniques offers more flexibility in dealing with such situations.

• Knowledge Discovery If we can say that knowledge representation is for representing available knowledge, and knowledge acquisition is for obtaining existing but not well-summarized knowledge, then we should probably say that knowledge discovery is for finding knowledge that exists in more unknown forms. Neural networks, with supervised or unsupervised learning approaches, can help discover knowledge from given data. Evolutionary computation and probabilistic approaches have also been applied for similar purposes.

1.2 Structure of This Book

This book is organized into two parts. Part I (Chapters 2 to 9) mainly focuses on fuzzy knowledge-based systems, including rule extraction, system tuning, reasoning, retrieval, and knowledge description language. Part II (Chapters 10 to 15) mainly focuses on knowledge representation, integration, discovery and data mining by soft computing.

Figures 1.1 and 1.2 illustrate contents of the chapters from the perspectives of the KE topics related and the SC techniques applied, respectively.

1.2.1 Part I: Fuzzy Knowledge-based Systems

In developing fuzzy inference systems, one of the important tasks is to construct a fuzzy rule base within the framework of fuzzy modelling. Chapter 2 introduces an algorithm for automatic fuzzy rule extraction from prior knowledge and numerical data. It consists of several main steps: two-stage clustering of numerical data using the mountain clustering and fuzzy c-means methods; generation and reduction of fuzzy membership functions for antecedents; and consequence calculation and further adjustment.

[Figure 1.1: A view of the contents based on related KE topics. The diagram groups the chapters under knowledge modelling and acquisition, reasoning and retrieval, knowledge-based system development, knowledge representation and integration (Ch. 12), and knowledge discovery (Chs. 13-15).]

Chapter 3 presents an algorithm for the acquisition of fuzzy rules using evolutionary programming (EP). It first constructs a fuzzy neural network based on fuzzy modelling, and then applies EP to training data to identify the parameters of the fuzzy neural network, which indicate the central position and width of the fuzzy membership functions, as well as the singleton consequent of each fuzzy rule. A "re-evaluation" of the fuzzy model by evolutionary computation is executed to simplify the membership functions obtained in the early phase, with a flexible "degree of explanation" indicated by the user.

Chapter 4 introduces a rule extraction method combining neuro-fuzzy learning and fuzzy clustering. The proposed method is used with fuzzy singleton-type reasoning, which has been successfully applied in fuzzy control systems. The process of rule extraction is divided into two stages: fuzzy c-means clustering is first executed to generate initial tuning parameters of fuzzy rules based on input-output data; a neuro-fuzzy learning algorithm based on the gradient descent method is then applied to tune the parameters.

[Figure 1.2: A view of the contents based on SC techniques applied. The diagram relates the chapters to fuzzy sets, fuzzy rules, fuzzy reasoning, fuzzy query, similarity measures, fuzzy and NN clustering, fuzzy neural networks, neuro-fuzzy and NN learning, evolutionary computation, probability and possibility theory, rough sets, and data mining.]

Chapter 5 explains another algorithm for fuzzy rule extraction based on numerical data and expert knowledge. This algorithm first fixes the fuzzy membership functions in the input and output spaces and then generates fuzzy rules from the given data and expert knowledge. It uses the antecedent validity to adjust the output consequences. Being more concerned with reducing modelling error, it tends to generate a number of rules larger than the number of data patterns.

Fuzzy reasoning is another important aspect of fuzzy knowledge-based systems. In the discussion of Chapters 2 to 5, fuzzy rules and membership functions are assumed to cover the problem space well. However, it is also necessary to consider applications that lack sufficient data and expert knowledge. Chapter 6 presents a technique of approximate reasoning through linear and non-linear interpolation of given fuzzy rules. This method makes it possible to apply ordinary approximate reasoning with sparse fuzzy rule bases.

Chapter 7 summarizes the work on approximate reasoning using the revision principle. It differs from other methods in that it performs approximate reasoning in a more intuitive way, based on the key concept of "reasoning based on revision". Five methods based on linear revision and semantic revision are presented. By using the "approximation measure", it allows approximate reasoning with sparse fuzzy rules and arbitrary shapes of membership functions.

Fuzzy retrieval is a main topic of fuzzy databases and also a useful technique for a wide range of fuzzy system applications. Chapter 8 presents an approach to fuzzy query handling for fuzzy retrieval. It allows the use of compound fuzzy attributes, which can be derived from numbers, interval values, scalars, and sets of all these data types, with appropriate aggregation functions and similarity measures on fuzzy sets.

A general programming language for fuzzy system development is very useful in supporting the growth of fuzzy system applications. Chapter 9 summarizes the work on a fuzzy system description language, which accepts the user's description of a target fuzzy system and then generates corresponding C code from that description. It offers flexible types of data and knowledge, including fuzzy sets with different kinds of membership functions, fuzzy numerical and logical operations, and fuzzy rules.

1.2.2 Part II: Knowledge Representation, Integration, and Discovery by Soft Computing

The comprehensive applications of knowledge-based systems request more flexibility in the representation and integration of different types of knowledge. Chapter 10 presents a "fuzzy factor hierarchy" for representing uncertain and vague concepts in legal expert systems. It offers the possibility of representing objects with not only numerical features but also context-based features. A structural similarity measure containing a surface-level component and a deep-level component is proposed for reasoning and retrieval when using the fuzzy factor hierarchy. The surface-level component consists of distance-based and feature-based similarity, while the deep-level component is determined by context-based similarity.

Chapter 11 presents two models for handling ordered datasets and time-series problems in classification applications. The two proposed models are based on the theory of mass assignment, which unifies probability, possibility, and fuzzy sets into a single theory. The memory-based model enables a belief-updating method using recurrent fuzzy rules and focuses on how the computing model captures human belief and memory. The perception-based model uses trend fuzzy sets to describe natural trends of a time series; it is based on the high-level perception mechanism humans use to sense their environment.

Chapter 12 introduces two approaches to knowledge integration for the design of classification systems. One is a fuzzy rule-based approach, where fuzzy if-then rules generated from numerical data are used together with the given linguistic knowledge to construct a fuzzy rule-based system, and the rules can be generated by heuristic procedures, genetic algorithms, or neural networks. The other is a neural-network-based approach, where both the given linguistic knowledge (i.e., fuzzy if-then rules) and the numerical data (i.e., training patterns) are handled as fuzzy training patterns and then used in the learning of extended neural networks.

With the rapid growth of applications of knowledge-based systems, the question of how to maximize the use of available knowledge, information, and data to make knowledge-based systems more "intelligent" has become a pressing issue. The study of knowledge discovery and data mining offers a possible way towards a solution.

Chapter 13 proposes a two-stage method of knowledge discovery by neural networks. In the first stage, a Self-Organizing Map (SOM) is applied for initial clustering of the given training data; the result is then modified by combining neighboring neurons that satisfy certain conditions. In the second stage, a three-layered feed-forward network learns the center vector of each modified cluster obtained in the earlier stage. By pruning hidden neurons according to a so-called "degree of contribution", it discovers knowledge explaining why a cluster has been formed in terms of its corresponding attribute values.

Chapter 14 presents an approach for uncertain rule discovery from databases with noise and incomplete data. This approach is based on the combination of rough set theory and the "generalization distribution table", which is used to represent the probabilistic relationships between concepts and instances over discrete domains. It first selects a set of rules with larger strengths from the possible rules, and then finds "minimal relative reducts" from this set. It offers the flexibility to involve biases and background knowledge in the discovery process.

Chapter 15 proposes an interactive approach to linguistic summaries of databases for data mining applications. The derived linguistic summaries are based on fuzzy logic with linguistic quantifiers. Three main types of data summaries are offered: type 1 expresses an estimate of the cardinality of some population as a linguistic quantifier; type 2 determines typical values of a field; and type 3, the most general type, produces fuzzy rules describing the dependencies between values of particular fields.
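As a rough illustration of a type-1 summary (my sketch, following Zadeh's calculus of linguistically quantified propositions, which underlies such summaries; the quantifier shape, the predicate, and the data are all invented), the truth of "most salaries are high" can be computed as the quantifier's membership degree of the mean predicate membership:

```python
# Truth of "Q objects are F": T = mu_Q( (1/n) * sum_i mu_F(x_i) )

def most(r):                      # fuzzy quantifier "most" on [0, 1]
    return max(0.0, min(1.0, 2.0 * r - 0.6))

def high_salary(s):               # fuzzy predicate "high" (arbitrary shape)
    return max(0.0, min(1.0, (s - 40000.0) / 30000.0))

salaries = [35000, 52000, 61000, 78000, 45000]
r = sum(high_salary(s) for s in salaries) / len(salaries)
truth = most(r)                   # degree of validity of the summary
print(truth)
```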

Both soft computing and knowledge engineering are rapidly developing and constantly evolving areas, and more and more new techniques and applications of SC and KE are being proposed. The results achieved so far have already established a good foundation for building more "intelligent" machines in the future, which will contribute greatly to our daily life.


References

[1] P. Beynon-Davies, "Knowledge Engineering for Information Systems", McGraw-Hill, 1993.

[2] E. Feigenbaum & P. McCorduck, "The Fifth Generation", Addison-Wesley, 1983.

[3] D. B. Fogel, "Evolutionary Computation - Toward a New Philosophy of Machine Intelligence", IEEE Press, 1995.

[4] L. Fu, "Neural Networks in Computer Intelligence", McGraw-Hill, Inc., 1994.

[5] J.-S. R. Jang, C.-T. Sun & E. Mizutani, "Neuro-Fuzzy and Soft Computing", Prentice-Hall, Inc., 1997.

[6] C.-T. Lin & C. S. George Lee, "Neural Fuzzy Systems - A Neuro-Fuzzy Synergism to Intelligent Systems", Prentice-Hall International, Inc., 1996.

[7] C. V. Negoita, "Expert Systems and Fuzzy Systems", The Benjamin/Cummings Publishing Company, Inc., 1985.

[8] D. W. Patterson, "Introduction to Artificial Intelligence and Expert Systems", Prentice-Hall, Inc., 1990.

[9] D. A. Waterman, "A Guide to Expert Systems", Addison-Wesley Publishing, 1986.

[10] R. R. Yager, "Fuzzy logics and artificial intelligence", Fuzzy Sets and Systems, Vol. 90, pp. 193-198, 1997.

[11] T. Yamakawa & G. Matsumoto (Eds.), "Methodologies for the Conception, Design and Application of Soft Computing", Proceedings of the 5th International Conference on Soft Computing and Information/Intelligent Systems (IIZUKA'98), World Scientific Publishing, 1998.

[12] L. A. Zadeh, "Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic", Fuzzy Sets and Systems, Vol. 90, pp. 111-127, 1997.

[13] L. A. Zadeh, "Fuzzy Logic = Computing with Words", IEEE Trans. Fuzzy Systems, Vol. 4, No. 2, pp. 103-111, 1996.


[14] L. A. Zadeh, "Fuzzy logic, neural networks and soft computing", Communications of the ACM, Vol. 37, No. 3, pp. 77-84, 1994.

[15] H. J. Zimmermann, "Fuzzy Sets, Decision Making, and Expert Systems", Kluwer Academic Publishers, 1987.


Part I: Fuzzy Knowledge-Based Systems


Chapter 2

Linguistic Integrity: A Framework for Fuzzy Modeling — AFRELI Algorithm

Jairo Espinosa, Joos Vandewalle
Katholieke Universiteit Leuven

Abstract

In this paper, a method for fuzzy modeling is presented. The framework of the method is the concept of linguistic integrity. The use of this framework presents several advantages, the most important being transparency. This transparency can be exploited in two directions: the first is data mining, where the method can provide a linguistic relation (IF-THEN rules) among the variables; the second is improving the completeness of a model by giving the user an easy interface through which expert knowledge can be included. The algorithm starts from numerical data (input-output data) and generates a rule base with a limited number of membership functions on each input domain. The rules are created in the environment of fuzzy systems. The algorithm used for rule extraction is named AFRELI.

Keywords: fuzzy modeling, function approximation, knowledge extraction, data mining

2.1 Introduction

The use of models is the "corner stone" of human reasoning. Human beings make use of models to determine the consequences of their acts. The representations of such models are varied and can be external (mathematical models, if-then rules, etc.) or internal (thoughts, reasoning, reflexes). Human beings also use models not only to predict the results of their actions but also to understand the "mechanism" which governs nature. Of course, a causal view of systems is embedded in this line of reasoning. The differences among models are motivated by the information used to construct the model and the information demanded from the model (representation and accuracy). Modern science provides us with new sensors, extending our possibilities to explore nature beyond our five senses. Most of the time the amount of data provided by sensors is overwhelming and obstructs our capacity to understand the phenomena governing the process. Information extraction is a task needed before some understanding of the process can be achieved. The basic principle of information extraction is the construction of a model which is able to capture the behavior of the data generated by the process. Recent studies have been successful at the task of constructing mathematical models out of numerical data provided by sensors (system identification). On the other hand, linguistic models constructed out of human experience in the form of IF-THEN rules have attracted attention for multiple applications. The development of expert systems is a good example of this method.

Information about the system under study can be present in multiple forms: numerical data, expert knowledge, hypotheses which are valid on similar models (uncertain knowledge), etc. The global behavior of the system is described partially by each of these pieces of information. Some of these pieces of information are redundant and others are unique. The aim is to design a modeling technique that can incorporate as much information as possible from very different sources without major changes in the format of the data.

In this paper we present a modeling technique using fuzzy logic. Fuzzy logic is known for its capacity to combine in one framework linguistic information (expert knowledge in the form of IF-THEN rules) and numerical information. So far, the so-called neuro-fuzzy models have been the main attempt to construct fuzzy models from numerical data [11] [4]. To apply these models, the structure of the fuzzy model should be fixed in advance (number of membership functions, number of rules, etc.). Many schemes have been proposed to overcome this inconvenience; some of them are based on the accuracy of the approximation or local error [5] [10] and others are based on fuzzy clustering methods [9] [12] [6]. The results of these approaches are models with good capabilities in the framework of numerical approximation, but sometimes very poor in the context of linguistic information. This paper presents the AFRELI algorithm (Autonomous Fuzzy Rule Extractor with Linguistic Integrity). The algorithm is able to fit input-output data while maintaining the semantic integrity of the rule base, in such a way that linguistic information can also be included and the description given by the rule base can be used directly to interpret the behavior of the data. So the applications of the technique are not limited to modeling; it can also be used in data mining in order to obtain causal relations among variables. The paper is structured as follows: section 2.2 presents the structure of the fuzzy model, section 2.3 introduces the AFRELI algorithm, section 2.4 presents the FuZion algorithm used to preserve the semantic integrity of the domains, section 2.5 shows some application examples and, finally, section 2.6 gives the conclusions.

2.2 Structure of the fuzzy model

One of the advantages of modeling techniques using Fuzzy Inference Systems (FIS) is the flexibility of the structures. Some of the degrees of freedom found in a FIS are the shape and number of membership functions, T-norms, aggregation methods, etc. But sometimes this flexibility makes the analysis and design of such structures very difficult. Some criteria should be applied to fix some of the parameters of the FIS. In this paper we select some parameters using criteria such as reconstruction capabilities (optimal interface design) and semantic integrity [7] [8].

• Optimal interface design

— Error-free Reconstruction: In a fuzzy system a numerical value is converted into a linguistic value by means of fuzzification. A defuzzification method should guarantee that this linguistic value can be reconstructed into the same numerical value:

$$\forall x \in [a, b]: \quad C^{-1}[C(x)] = x \quad (1)$$

where [a, b] is the universe of discourse. This condition guarantees the perfect correspondence between a numerical value and a linguistic concept and vice versa. The use of centroid defuzzification with triangular membership functions with overlap 1/2 will satisfy this requirement (see proof in [7]).
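A small numerical check of this property (my sketch, not the authors' code): in a 1/2-overlap triangular partition the degrees of any in-range value sum to one, so the degree-weighted mean of the modal values returns the original value exactly.

```python
# Fuzzify a value against a 1/2-overlap triangular partition, then
# defuzzify by the weighted mean of the modal values: the round trip
# is exact, which is the error-free reconstruction condition (1).

modal = [0.0, 2.0, 5.0, 10.0]          # modal values on the domain [0, 10]

def fuzzify(x):
    """Degrees of x in the 1/2-overlap triangular partition."""
    degs = []
    for i, m in enumerate(modal):
        lo = modal[i - 1] if i > 0 else m
        hi = modal[i + 1] if i < len(modal) - 1 else m
        if x == m:
            degs.append(1.0)
        elif lo < x < m:
            degs.append((x - lo) / (m - lo))
        elif m < x < hi:
            degs.append((hi - x) / (hi - m))
        else:
            degs.append(0.0)
    return degs

def defuzzify(degs):
    """Weighted mean of modal values (exact for this partition)."""
    return sum(d * m for d, m in zip(degs, modal)) / sum(degs)

x = 3.7
assert abs(defuzzify(fuzzify(x)) - x) < 1e-12
```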


• Semantic integrity This integrity guarantees that the membership functions will represent a linguistic concept. The conditions needed to guarantee such semantic integrity are:

— Distinguishability Each linguistic label should have semantic meaning and the fuzzy set should clearly define a range in the universe of discourse. So the membership functions should be clearly different: too much overlap between two membership functions means that they represent the same linguistic concept. The assumption of overlap equal to 1/2 makes sure that the support of each fuzzy set will be different. The distance between the modal values of the membership functions is also very important to make sure that the membership functions can be distinguished. The modal value of a membership function is defined as the $\alpha$-cut with $\alpha = 1$:

$$\mu_{i(\alpha=1)}(x), \quad i = 1, \ldots, N \quad (2)$$

— Justifiable Number of Elements The number of sets should be compatible with the number of "quantifiers" that a human being can handle; this number should not exceed the limit of 7 ± 2 distinct terms. This is a practical limitation of our brain, and it is reflected in our language: it is almost impossible to find a language where more than 9 quantifiers can be "formulated". To handle more categories we use methods such as enumeration, which are not part of natural language [2]. The shape of the membership functions does not guarantee this property. In this paper we present the FuZion algorithm, a method to reduce the number of membership functions on a given universe of discourse.

— Coverage Any element of the universe of discourse should belong to at least one of the fuzzy sets. This concept is also mentioned in the literature as ε-completeness [4]. It guarantees that the input value is considered during the inference process.

— Normalization Due to the fact that each linguistic label has semantic meaning, at least one of the values in the universe of discourse should have a membership degree equal to one. In other words, all the fuzzy sets should be normal.

Further details about these concepts can be found in [7] [8]. Based on these concepts, the choice for the membership functions will be triangular, normal membership functions $\mu_1(x), \mu_2(x), \ldots, \mu_n(x)$ with a specific overlap of 1/2. This means that the height of the intersection of two successive fuzzy sets is

$$\mathrm{hgt}(\mu_i \cap \mu_{i \pm 1}) = \frac{1}{2}. \quad (3)$$

The choice of the AND and OR operations is motivated by the need to generate a continuously differentiable nonlinear map from the FIS. The use of the product as the AND operation and the probabilistic sum as the OR makes it easier to derive gradients that can be used to refine the models. If no further refinement is to be applied, there is no major reason to prefer the product and probabilistic sum over the MIN/MAX operations. The aggregation method and the defuzzification method will be discussed in the next sections.

2.3 The AFRELI algorithm

The AFRELI (Autonomous Fuzzy Rule Extractor with Linguistic Integrity) is an algorithm designed to obtain a good trade-off between numerical approximation and linguistic integrity. The more accurately one wants to describe a function, the more difficult it is to make a consistent linguistic description. The main steps involved in the algorithm are:

• Clustering.
• Projection.
• Reduction of terms.
• Consequence calculation.
• (Optional) further antecedent adjustment.

The detailed AFRELI algorithm proceeds as follows:

(1) Collect N points from the inputs $U = \{u_1, \ldots, u_N\}$ and the output $Y = \{y_1, \ldots, y_N\}$,

$$u_k = \begin{bmatrix} u_k^1 \\ \vdots \\ u_k^n \end{bmatrix} \quad (4)$$

where $u_k \in \mathbb{R}^n$ and $y_k \in \mathbb{R}$ represent the inputs and the output of the function at instant k, and construct the feature vectors

$$x_k = \begin{bmatrix} u_k \\ y_k \end{bmatrix}, \quad (5)$$

$x_k \in \mathbb{R}^{n+1}$. These feature vectors are a spatial representation of the samples in an (n + 1)-dimensional space.

(2) Using the N feature vectors, find C clusters by using the mountain clustering method [12] [6] and refine them using fuzzy c-means [1]. The use of the mountain clustering method helps to find the number of clusters that should be extracted and helps to initialize the positions of the centers of the clusters. These two parameters are very important for obtaining good results when the fuzzy c-means algorithm is applied.

$$X_c = \begin{bmatrix} \bar{x}_1^1 & \cdots & \bar{x}_C^1 \\ \vdots & & \vdots \\ \bar{x}_1^{n+1} & \cdots & \bar{x}_C^{n+1} \end{bmatrix} \quad (6)$$

with $X_c \in \mathbb{R}^{(n+1) \times C}$.
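A compact sketch of the clustering in step (2), under stated assumptions: plain NumPy, a subtractive-style mountain method evaluated on the data points themselves rather than on a grid, and a bare-bones fuzzy c-means refinement. It illustrates the idea rather than reproducing the authors' implementation.

```python
import numpy as np

def mountain_centers(X, n_centers, sigma=1.0, beta=1.5):
    """Pick cluster prototypes by iteratively deflating a density measure."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    potential = np.exp(-d2 / (2 * sigma**2)).sum(axis=1)
    centers = []
    for _ in range(n_centers):
        i = int(np.argmax(potential))
        centers.append(X[i])
        # subtract the "mountain" around the chosen center
        potential -= potential[i] * np.exp(-d2[i] / (2 * beta**2 * sigma**2))
    return np.array(centers)

def fuzzy_cmeans(X, centers, m=2.0, iters=50):
    """Refine prototypes with the standard fuzzy c-means updates."""
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1) + 1e-12
        u = 1.0 / (d ** (2.0 / (m - 1.0)))
        u /= u.sum(axis=1, keepdims=True)            # membership matrix
        centers = (u.T ** m) @ X / (u.T ** m).sum(axis=1, keepdims=True)
    return centers, u
```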

(3) Project the C prototypes (centers) of the clusters onto the input spaces, taking the projected value of each prototype as the modal value of a triangular membership function:

$$m_i^j = \bar{x}_i^j \quad (7)$$

where $i = 1, \ldots, C$ and $j = 1, \ldots, n$.

(4) Sort the modal values on each domain such that:

$$m_i^j < m_{i+1}^j \quad \forall j \quad (8)$$


(5) Add two more modal values for each input to guarantee full coverage of the input space.

$$m_0^j = \min_{k=1,\ldots,N} u_k^j \quad (9)$$

$$m_{C+1}^j = \max_{k=1,\ldots,N} u_k^j \quad (10)$$

(6) Construct the triangular membership functions with overlap of 1/2 as:

$$\mu_i(x^j) = \max\left(0, \min\left(\frac{x^j - m_{i-1}^j}{m_i^j - m_{i-1}^j}, \frac{x^j - m_{i+1}^j}{m_i^j - m_{i+1}^j}\right)\right), \quad (11)$$

where $i = 1, \ldots, C$, and the trapezoidal membership functions at the extremes of each universe of discourse:

$$\mu_0(x^j) = \max\left(0, \min\left(1, \frac{x^j - m_1^j}{m_0^j - m_1^j}\right)\right) \quad (12)$$

$$\mu_{C+1}(x^j) = \max\left(0, \min\left(1, \frac{x^j - m_C^j}{m_{C+1}^j - m_C^j}\right)\right) \quad (13)$$
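The following sketch (assuming NumPy, and the equations as reconstructed above) builds the 1/2-overlap partition of steps (4)-(6) from projected modal values and the domain extremes:

```python
import numpy as np

def build_partition(modal_values, u_min, u_max):
    """Return extended modal values and an evaluator for the partition."""
    m = np.concatenate(([u_min], np.sort(modal_values), [u_max]))
    def mu(i, x):
        if i == 0:                                   # left shoulder, eq. (12)
            return float(np.clip((x - m[1]) / (m[0] - m[1]), 0.0, 1.0))
        if i == len(m) - 1:                          # right shoulder, eq. (13)
            return float(np.clip((x - m[-2]) / (m[-1] - m[-2]), 0.0, 1.0))
        left = (x - m[i - 1]) / (m[i] - m[i - 1])    # rising edge
        right = (x - m[i + 1]) / (m[i] - m[i + 1])   # falling edge, eq. (11)
        return max(0.0, min(left, right))
    return m, mu

m, mu = build_partition([2.0, 5.0], 0.0, 10.0)
degrees = [mu(i, 3.0) for i in range(len(m))]        # sums to 1 on the domain
```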

(7) Apply the FuZion algorithm to reduce the number of membership functions. The FuZion algorithm guarantees a reduction of the membership functions until they fulfill the requirements of "distinguishability" and "justifiable number of elements".

(8) Associate linguistic labels (e.g. BIG, MEDIUM, SMALL) with the resulting membership functions.

(9) Construct the rule base with all possible antecedents (all possible permutations). This guarantees the completeness of the rules and full coverage of the working space. Use rules of the form:

IF $u^1$ is $\mu_l^1$ AND $u^2$ is $\mu_l^2$ AND ... AND $u^n$ is $\mu_l^n$ THEN $y = \bar{y}_l$

The evaluation of the antecedent of each rule can be expressed in terms of the min operator or, equivalently, the product operator, as follows:

$$\mu_l(u_k) = \min\{\mu_l^1(u_k^1), \mu_l^2(u_k^2), \ldots, \mu_l^n(u_k^n)\} \quad (14)$$


$$\mu_l(u_k) = \mu_l^1(u_k^1) \cdot \mu_l^2(u_k^2) \cdots \mu_l^n(u_k^n) \quad (15)$$

(10) Propagate the N input values and calculate the consequences of the rules as singletons ($\bar{y}_l$). These singletons can be calculated as the solution of a Least Squares (LS) problem. Observe that the output of the fuzzy system can be calculated as:

$$f(u_k) = \frac{\sum_{l=1}^{L} \mu_l(u_k)\, \bar{y}_l}{\sum_{l=1}^{L} \mu_l(u_k)} \quad (16)$$

where L is the number of rules and $\mu_l(u_k)$ can be calculated as shown in equation (14) or (15) (according to the selected AND operator). The system can then be represented as the weighted sum of the consequences:

$$f(u_k) = \sum_{l=1}^{L} w_l^k\, \bar{y}_l \quad (17)$$

where

$$w_l^k = \frac{\mu_l(u_k)}{\sum_{i=1}^{L} \mu_i(u_k)} \quad (18)$$

expressing $w_l^k$ as the strength of rule $l$ when the input is $u_k$. Taking all the values, the problem can be written as:

$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} = \underbrace{\begin{bmatrix} w_1^1 & w_2^1 & \cdots & w_L^1 \\ w_1^2 & w_2^2 & \cdots & w_L^2 \\ \vdots & \vdots & & \vdots \\ w_1^N & w_2^N & \cdots & w_L^N \end{bmatrix}}_{W} \begin{bmatrix} \bar{y}_1 \\ \bar{y}_2 \\ \vdots \\ \bar{y}_L \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{bmatrix} \quad (19)$$

or compactly, $Y = W\Theta + E$.

The aim here is to reduce the norm of the vector E as much as possible. Using the quadratic norm:

$$\min \|E\|^2 = \min \|Y - W\Theta\|^2 \quad (20)$$

Page 39: 45052361 a New Paradigm of Knowledge Engineering by Soft Computing

Linguistic Integrity: A Framework for Fuzzy Modeling . . . 23

The solution to this problem can be found using the LS solution if

$$\mathrm{rank}(W) = \dim(\Theta) \quad (21)$$

This implies that all the rules have to receive enough excitation during training. In practice, this is not always guaranteed, due to sparseness of the data. It is then better to apply Recursive Least Squares (RLS), which guarantees that the adaptation will only affect the excited rules. Another advantage of the use of RLS is the possibility of initializing the consequence values using prior knowledge, such that the RLS algorithm only "corrects" the consequences of the excited rules. In this way we can say that the prior knowledge is valid as far as the data do not say the contrary.

If no prior knowledge is present, then it can be created from the data. The easiest way is to construct a linear model and initialize the consequences of the rules with the values given by this model: the modal values of the membership functions of a given rule are evaluated in the linear approximation, and the evaluated value becomes the singleton consequence. This guarantees that the fuzzy model will be "at least as good as the linear model". Another alternative, with even better approximation capabilities, is to use "the smallest fuzzy model that can be built": a fuzzy model with only two membership functions on each input. This structure generates a multilinear approximator (with one input it is linear, with two inputs bilinear, and so on), with the advantage that the problem of consequence calculation via Least Squares will be well conditioned, since each point excites all $2^n$ rules of the model. Once the "smallest fuzzy model" is built, it is used to generate the initial values of the consequences by the same procedure proposed for the linear model.

The RLS algorithm used to calculate the singleton consequences of the rule base is described as follows:

$$\Theta(k+1) = \Theta(k) + \gamma(k)\left[y(k+1) - W_{k+1}\Theta(k)\right] \quad (22)$$

with $W_k = [w_1^k, w_2^k, \ldots, w_L^k]$ and:

$$\gamma(k) = P(k+1)W_{k+1}^T = \frac{P(k)W_{k+1}^T}{W_{k+1}P(k)W_{k+1}^T + 1} \quad (23\text{-}24)$$

$$P(k+1) = \left[I - \gamma(k)W_{k+1}\right]P(k) \quad (25)$$

with the initial value $P(0) = \alpha I$, where $\alpha$ is large. The initial value $\Theta(0)$ consists of the initial values of the consequences as described in the previous paragraphs. If the information is considered a priori to excite the whole rule base, a good initialization will be:

$$\Theta(0) = \frac{\max_k y_k + \min_k y_k}{2} \quad (26)$$
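A direct transcription of the recursion (22)-(25) into NumPy, offered as a sketch: `theta0` carries the initialization discussed above and `alpha` is the large constant in $P(0) = \alpha I$.

```python
import numpy as np

def rls_consequences(W, y, theta0, alpha=1e6):
    """W: (N, L) rule strengths per sample; y: (N,) targets."""
    L = W.shape[1]
    theta = np.asarray(theta0, dtype=float).copy()
    P = alpha * np.eye(L)
    for k in range(W.shape[0]):
        w = W[k]                                    # row vector w^k
        gamma = P @ w / (w @ P @ w + 1.0)           # eqs. (23)-(24)
        theta += gamma * (y[k] - w @ theta)         # eq. (22)
        P = (np.eye(L) - np.outer(gamma, w)) @ P    # eq. (25)
    return theta
```

Only rules with non-zero strength in a sample receive an update, which is exactly the property argued for above: unexcited rules keep their prior consequences.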

Other details about the initialization approaches are discussed in [3].

(11) (Optional step) If some further refinement is desired to improve the approximation, the positioning of the modal values can be optimized by using constrained gradient descent methods, with the "distinguishability" condition as the main constraint of the optimization. Observe that the use of gradient descent methods guarantees convergence only to a "local minimum", keeping the optimal solution close to the initial one. This is the reason to mention this step as optional: the expected improvement will not be very significant for many applications, especially if there is more interest in the linguistic description of the rules. Special care should be taken in the calculation of the "true" gradient when the model is going to be used in dynamic operation (with delayed feedback from its own output).

(12) Convert the singletons into triangular membership functions with overlap 1/2 and modal values equal to the positions of the singletons $\bar{y}_l$. Consider the vector $\bar{Y}$ whose entries are the L consequences of the rules, sorted in such a way that:

yi<V2<---<yL (27)

The triangular membership function of the i-th consequence is:

V ~ Vi-i V ~ Vi+i H\{y) = max 0,min Vi ~ j / i - i ' Vi ~ 2/i+i

(28)


and the two membership functions of the extremes:

$$ \mu_1(y) = \max\left(0,\ \min\left(\frac{y - 2y_1 + y_2}{y_2 - y_1},\ \frac{y - y_2}{y_1 - y_2}\right)\right) \qquad (29) $$

$$ \mu_L(y) = \max\left(0,\ \min\left(\frac{y - y_{L-1}}{y_L - y_{L-1}},\ \frac{y - 2y_L + y_{L-1}}{y_{L-1} - y_L}\right)\right) \qquad (30) $$

This description of the outer membership functions guarantees that their centers of gravity lie exactly on their modal values, so the condition of error-free reconstruction for an optimal interface is fulfilled.
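A minimal sketch of this construction, assuming the sorted consequent positions are distinct; the mirrored outer knots implement the widened extreme triangles of (29)-(30):

```python
import numpy as np

def consequent_mf(y, centers, i):
    """Triangular output membership functions built from the sorted
    singleton positions `centers` (eq. 27); inner sets follow eq. (28),
    and the two extremes are widened as in eqs. (29)-(30) so that their
    centers of gravity stay on the modal values."""
    c = np.asarray(centers, dtype=float)
    L = len(c)
    # virtual outer knots: mirror the neighbor around the extreme center
    left = 2 * c[0] - c[1] if L > 1 else c[0] - 1.0
    right = 2 * c[-1] - c[-2] if L > 1 else c[-1] + 1.0
    knots = np.concatenate(([left], c, [right]))
    lo, mid, hi = knots[i], knots[i + 1], knots[i + 2]
    return np.maximum(0.0, np.minimum((y - lo) / (mid - lo),
                                      (y - hi) / (mid - hi)))
```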

(13) Apply FuZion algorithm to reduce the number of membership functions in the output universe.

(14) Associate linguistic labels to the resulting membership functions.

(15) With the partition of the output universe, fuzzify the values of the singletons. Observe that each singleton will have a membership degree in at least one set and in at most two.

(16) Relate the fuzzified values to the corresponding rule. This means that each rule will have one consequence, or two weighted consequences where the weights are the nonzero membership values of the fuzzified singleton. This conversion of the singleton consequences into weighted triangular consequences gives linguistic meaning to the consequences.

2.4 The FuZion algorithm

The FuZion algorithm is a routine that merges triangular membership functions whose modal values are too close to each other. This merging process is needed to preserve the distinguishability and a justifiable number of elements on each input domain, so as to guarantee semantic integrity. The FuZion algorithm goes as follows:

(1) Take the triangular membership functions $\mu_1(x), \mu_2(x), \ldots, \mu_N(x)$ with 1/2 overlap, and their modal values

$$ m_i = \arg\max_x \mu_i(x), \quad i = 1, \ldots, N \qquad (31) $$


with:

$$ m_1 < m_2 < \cdots < m_N \qquad (32) $$

(2) Define the minimum distance M acceptable between the modal values.

(3) Calculate the difference between successive modal values as:

$$ d_j = m_{j+1} - m_j, \quad j = 1, \ldots, N-1 \qquad (33) $$

(4) Find all the differences smaller than M.

(5) If there is no difference smaller than M, go to step 7.

(6) Merge all the modal values corresponding to consecutive differences smaller than M using (34):

$$ m_{\text{new}} = \frac{1}{D}\sum_{i=a}^{b} m_i \qquad (34) $$

$$ D = b - a + 1 \qquad (35) $$

where $a$ and $b$ are respectively the indices of the first and the last modal value of the merged sequence, and $D$ is the number of merged membership functions.

(7) Update N and go to step 3.
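A compact sketch of the procedure above; the run-averaging in step (6) is implemented here by scanning for maximal runs of consecutive differences smaller than M:

```python
import numpy as np

def fuzion(modal_values, M):
    """FuZion sketch: repeatedly merge runs of modal values whose
    successive differences (eq. 33) are smaller than M, replacing each
    run by the mean of its members (eqs. 34-35)."""
    m = np.sort(np.asarray(modal_values, dtype=float))
    while True:
        d = np.diff(m)                       # eq. (33)
        if len(d) == 0 or d.min() >= M:      # step (5)
            return m
        merged, i = [], 0
        while i < len(m):
            j = i
            while j < len(m) - 1 and m[j + 1] - m[j] < M:
                j += 1                       # extend the run of close values
            merged.append(m[i:j + 1].mean()) # eqs. (34)-(35)
            i = j + 1
        m = np.array(merged)
```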

2.5 Examples

In this section three examples of applications of the AFRELI and FuZion algorithms are presented. The first two examples are approximations of nonlinear static maps and the last one is the prediction of a chaotic time series.

2.5.1 Example 1: Modeling a two input nonlinear function

In this example we consider the function:

$$ f(x, y) = \sin\left(\frac{\pi x}{10}\right)\sin\left(\frac{\pi y}{10}\right) \qquad (36) $$



Fig. 2.1 Effect of the FuZion algorithm

441 regularly distributed points were selected from the interval [−10,10] × [−10,10]. The graph of the function is shown in figure 2.2. Using mountain clustering and the fuzzy C-means algorithm, 26 clusters were found; they are shown in figure 2.3, represented with 'x'. After a cluster is found, its center value is projected onto the input domains, as shown in figure 2.4. Figure 2.5 shows the projected membership functions. Applying the FuZion algorithm yields the membership functions shown in figure 2.6. The output membership functions are shown in figure 2.7. Figure 2.8 shows the identified surface.

The 25 extracted rules are:

(1) IF x is Negative Large AND y is Negative Large THEN z is Negative with strength 0.01 AND Zero with strength 0.99

(2) IF x is Negative Medium AND y is Negative Large THEN z is Zero with strength 0.92 AND Positive with strength 0.08

(3) IF x is Zero AND y is Negative Large THEN z is Negative with strength 0.01 AND Zero with strength 0.99

(4) IF x is Positive Medium AND y is Negative Large THEN z is Negative with strength 0.1 AND Zero with strength 0.9


Fig. 2.2 Example 1: Function f(x,y) = sin(πx/10) sin(πy/10)

Fig. 2.3 Example 1: Extracted clusters ('x') from the data ('o')

(5) IF x is Positive Large AND y is Negative Large THEN z is Negative with strength 0.03 AND Zero with strength 0.97


Fig. 2.4 Example 1: Projection of the centers of the clusters

Fig. 2.5 Example 1: Projected membership functions

(6) IF x is Negative Large AND y is Negative Medium THEN z is Zero with strength 0.96 AND Positive with strength 0.04

(7) IF x is Negative Medium AND y is Negative Medium THEN z is Zero with strength 0.01 AND Positive with strength 0.99

(8) IF x is Zero AND y is Negative Medium THEN z is Zero with strength 0.92 AND Positive with strength 0.08


(9) IF x is Positive Medium AND y is Negative Medium THEN z is Negative with strength 0.99 AND Zero with strength 0.01

(10) IF x is Positive Large AND y is Negative Medium THEN z is Negative with strength 0.1 AND Zero with strength 0.90

(11) IF x is Negative Large AND y is Zero THEN z is Negative with strength 0.02 AND Zero with strength 0.98

(12) IF x is Negative Medium AND y is Zero THEN z is Negative with strength 0.11 AND Zero with strength 0.89

(13) IF x is Zero AND y is Zero THEN z is Negative with strength 0.03 AND Zero with strength 0.97

(14) IF x is Positive Medium AND y is Zero THEN z is Zero with strength 0.92 AND Positive with strength 0.08

(15) IF x is Positive Large AND y is Zero THEN z is Negative with strength 0.01 AND Zero with strength 0.99

(16) IF x is Negative Large AND y is Positive Medium THEN z is Negative with strength 0.07 AND Zero with strength 0.93

(17) IF x is Negative Medium AND y is Positive Medium THEN z is Negative with strength 1

(18) IF x is Zero AND y is Positive Medium THEN z is Negative with strength 0.11 AND Zero with strength 0.89

(19) IF x is Positive Medium AND y is Positive Medium THEN z is Positive with strength 1

(20) IF x is Positive Large AND y is Positive Medium THEN z is Zero with strength 0.93 AND Positive with strength 0.07

(21) IF x is Negative Large AND y is Positive Large THEN z is Negative with strength 0.02 AND Zero with strength 0.98

(22) IF x is Negative Medium AND y is Positive Large THEN z is Negative with strength 0.07 AND Zero with strength 0.93

(23) IF x is Zero AND y is Positive Large THEN z is Negative with strength 0.02 AND Zero with strength 0.98

(24) IF x is Positive Medium AND y is Positive Large THEN z is Zero with strength 0.96 AND Positive with strength 0.04

(25) IF x is Positive Large AND y is Positive Large THEN z is Negative with strength 0.01 AND Zero with strength 0.99

It is important to remark that for this example there is a clear dominance of one of the consequences in most of the rules; when this situation appears, it is possible to eliminate the consequence with the smallest strength with only a minor impact on the numerical approximation.


Fig. 2.6 Example 1: Membership functions after FuZion


Fig. 2.7 Example 1: (a) Singletons (b) Membership functions with linguistic meaning

2.5.2 Example 2: Modeling of a three-input nonlinear function

For this example the data were generated using the function:

$$ f(x, y, z) = \left(1 + x^{0.5} + y^{-1} + z^{-1.5}\right)^2 \qquad (37) $$

In this case 216 random points from the input range [1,6] x [1,6] x [1,6] were used as training set and 125 random points from the input range


Fig. 2.8 Example 1: Surface generated by the fuzzy system

[1.5,5.5] x [1.5,5.5] x [1.5,5.5] were used as validation set. As a performance index we used the average percentage error (APE):

$$ \mathrm{APE} = \frac{1}{N}\sum_{i=1}^{N}\frac{|T(i) - O(i)|}{|T(i)|}\times 100\,\% \qquad (38) $$

where T(i) is the desired output and O(i) is the predicted output. This performance index allows us to compare the present result with previous works. First, a mountain clustering procedure was used and 11 clusters were found; further refinement was obtained with the fuzzy C-means clustering algorithm. In figure 2.9 the projected membership functions can be observed. After reduction using FuZion, with a minimum distance factor of 15% of the size of the universe of discourse of each input, the membership functions shown in figure 2.10 were obtained. Figure 2.11 shows the singleton consequences and the consequences after FuZion. Table 2.1 shows the comparative results with previous work.

The results in the table show that the model obtained with the AFRELI method has an average performance when the training points are evaluated, but when the models are compared on the validation set it is clear that the ANFIS model and the AFRELI model exhibit the


Fig. 2.9 Example 2: Projected membership functions

best performance. This result confirms that the AFRELI model has not only acceptable approximation capability and linguistic meaning, but also good generalization.

2.5.3 Example 3: Predicting a chaotic time series

This example shows the capability of the algorithm to capture the dynamics governing the Mackey-Glass chaotic time series. The series was generated using the following delay differential equation:

$$ \frac{dx}{dt}(t) = \frac{0.2\,x(t-\tau)}{1 + x^{10}(t-\tau)} - 0.1\,x(t) \qquad (39) $$

where τ = 17. The numerical solution of this differential equation was obtained using a fourth-order Runge-Kutta method, with a time step of 0.1 and initial condition x(0) = 1.2. The simulation was run for 2000 seconds


Fig. 2.10 Example 2: Membership functions after FuZion

and the samples were taken every second. To train and test the fuzzy system, 1000 points were extracted from t = 118 to 1117. The first 500 points were used as the training set and the remaining ones as the validation set. First, a six-steps-ahead predictor is constructed using past outputs as inputs of the model:

$$ \left[\,x(t-18)\;\; x(t-12)\;\; x(t-6)\;\; x(t)\,\right] \qquad (40) $$

and the output will be x(t+6). After applying the mountain clustering method, 57 clusters were found. Some refinement of the cluster positions was obtained using the fuzzy C-means clustering method. After projection and FuZion, the membership functions shown in figure 2.12 were obtained.

To permit a comparison with previous works, the prediction error was evaluated using the so-called Non-Dimensional Error Index (NDEI), defined as


Table 2.1 Example 2: Performance comparison with previous work. The results from previous works were taken from [4].

Model           APE (train)   APE (valid)   Param. number   Train. set size   Valid. set size
AFRELI          1.002 %       1.091 %       80              216               125
ANFIS           0.043 %       1.066 %       50              216               125
GMDH model      4.7 %         5.7 %         -               20                20
Fuzzy model 1   1.5 %         2.1 %         22              20                20
Fuzzy model 2   0.59 %        3.4 %         32              20                20


Fig. 2.11 Example 2: (a) Singletons (b) Membership functions with linguistic meaning

$$ \mathrm{NDEI} = \frac{\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(T(i) - O(i)\right)^2}}{\sigma(T)} \qquad (41) $$

where T(i) is the desired output, O(i) is the predicted output, and σ(T) is the standard deviation of the target series. Tables 2.2 and 2.3 show some comparative results. The conclusion from these results is that the numerical performance of AFRELI is acceptable and similar to that of other numerically oriented techniques. The added value of the AFRELI algorithm is the linguistic meaning of the resulting model.
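Both error measures used in these examples, eq. (38) and eq. (41), are straightforward to compute; a small sketch:

```python
import numpy as np

def ape(target, predicted):
    """Average percentage error, eq. (38)."""
    t, p = np.asarray(target, float), np.asarray(predicted, float)
    return 100.0 * np.mean(np.abs(t - p) / np.abs(t))

def ndei(target, predicted):
    """Non-dimensional error index, eq. (41): RMSE divided by the
    standard deviation of the target series."""
    t, p = np.asarray(target, float), np.asarray(predicted, float)
    return np.sqrt(np.mean((t - p) ** 2)) / np.std(t)
```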


Fig. 2.12 Example 3: Membership functions after projection and FuZion

2.6 Conclusions

Fuzzy inference systems are "universal approximators". A comparative advantage of fuzzy systems is their linguistic interpretability. The AFRELI algorithm in combination with the FuZion algorithm guarantees an acceptable compromise between numerical accuracy and linguistic integrity. The numerical accuracy can be directly controlled via the parameters of the FuZion and clustering algorithms. Some improvement of the numerical performance of the model can be obtained by "fine" tuning the parameters of the antecedents by means of gradient descent techniques, but the procedure should respect the minimum distance M between the modal values. The selection of the mentioned parameters is the user's choice.


Fig. 2.13 Example 3: (a) Singletons (b) Membership functions with linguistic meaning

Table 2.2 Example 3: Performance for prediction six steps ahead. The results from previous works were taken from [4].

Method                        Training cases   Non-dimensional error index
AFRELI                        500              0.0493
AFRELI (with optional step)   500              0.0324
ANFIS                         500              0.007
AR model                      500              0.19
Cascaded-correlation NN       500              0.06
Backpropagation MLP           500              0.02
6th order polynomial          500              0.04
Linear predictive method      2000             0.55

The use of normalized triangular membership functions with 0.5 overlap also guarantees limited complexity. Because at most two membership functions have a value different from zero on each input, the maximum number of evaluated rules is 2^N, where N is the number of inputs. Hence, the problem of combinatorial explosion in the evaluation of fuzzy systems becomes a problem of storage rather than a problem of computation. Future research will be oriented toward the implementation of a similar algorithm using membership functions composed of third-order polynomials. Some of the features described here are also applicable to this kind of membership function, but a different defuzzification method should be implemented to guarantee zero-error reconstruction in the optimal interface design. The use of this kind of membership function will also guarantee a continuous derivative, a very important element to "fine tune" the antecedents.


Fig. 2.14 Example 3: (a) Mackey-Glass time series (solid line) from t = 618 to 1117 and six-steps-ahead prediction (dashed line) (b) Prediction errors



Fig. 2.15 Example 3: (a) Mackey-Glass time series (solid line) from t = 118 to 1117 and 84-steps-ahead prediction (dashed line) (b) Prediction errors

Acknowledgments

This work is supported by several institutions: the Flemish Government: Concerted Research Action GOA-MIPS (Model-based Information Processing Systems), the FWO (Fund for Scientific Research - Flanders) project G.0262.97: Learning and Optimization: an Interdisciplinary Approach, the


Table 2.3 Example 3: Performance for prediction 84 steps ahead (the first seven rows) and 85 steps ahead (the last four rows). Results for the first seven methods are obtained by simulation of the model obtained for prediction six steps ahead. Results for localized receptive fields (LRFs) and multiresolution hierarchies (MRHs) are for networks trained to predict 85 steps ahead. The results from previous works were taken from [4].

Method                        Training cases   Non-dimensional error index
AFRELI                        500              0.1544
AFRELI (with optional step)   500              0.1040
ANFIS                         500              0.036
AR model                      500              0.39
Cascaded-correlation NN       500              0.32
Backpropagation MLP           500              0.05
6th order polynomial          500              0.85
Linear predictive method      2000             0.60
LRF                           500              0.10-0.25
LRF                           10000            0.025-0.05
MRH                           500              0.05
MRH                           10000            0.02

FWO Research Communities: ICCoS (Identification and Control of Complex Systems) and Advanced Numerical Methods for Mathematical Modelling, the IWT Action Programme on Information Technology (ITA/GBO/T23), the Federal Office for Scientific, Technical and Cultural Affairs: Interuniversity Poles of Attraction Programme (IUAP P4-02 (1997-2001): Modeling, Identification, Simulation and Control of Complex Systems; and IUAP P4-24 (1997-2001): Intelligent Mechatronic Systems (IMechS)), and the European Commission: TMR project: System Identification.


References

[1] Bezdek J.C., "A Physical Interpretation of Fuzzy ISODATA," IEEE Trans. Syst., Man, Cybern., 6, p. 387, 1976.

[2] Broadbent D., "The Magic Number Seven After Fifteen Years," Studies in Long Term Memory, Ed. A. Kennedy and A. Wilkes, p. 3, 1975.

[3] Espinosa J., Vandewalle J., "Fuzzy Modeling and Identification, A Guide for the User," Proceedings of the IEEE Singapore International Symposium on Control Theory and Applications, p. 437, 1997.

[4] Jang J.-S. R., "Neuro-Fuzzy Modeling: Architectures, Analyses and Applications," Ph.D. dissertation, University of California, Berkeley, 1992.

[5] Jang J.-S. R., "Structure Determination in Fuzzy Modeling: A Fuzzy CART Approach," Proc. of IEEE International Conference on Fuzzy Systems, 1994.

[6] Lori N., Costa Branco P.J., "Autonomous Mountain-Clustering Method Applied to Fuzzy Systems Modeling," Intelligent Engineering Systems Through Artificial Neural Networks, Smart Engineering Systems: Fuzzy Logic and Evolutionary Programming, Ed. C.H. Dagli, M. Akay, C.L. Philip Chen, B. Fernandez, and J. Ghosh, 5, p. 311, ASME Press, New York, 1995.

[7] Pedrycz W., "Why Triangular Membership Functions?," Fuzzy Sets and Systems, 64, p. 21, 1994.

[8] Pedrycz W., Valente de Oliveira J., "Optimization of Fuzzy Models," IEEE Trans. Syst., Man, Cybern., Part B, 26, p. 627, 1996.

[9] Sugeno M., Yasukawa T., "A Fuzzy Logic Based Approach to Qualitative Modeling," IEEE Trans. on Fuzzy Systems, 1, p. 7, 1993.

[10] Tan S., Vandewalle J., "An On-line Structural and Parametric Scheme for Fuzzy Modelling," Proc. of the 6th International Fuzzy Systems Association World Congress IFSA-95, p. 189, 1995.

[11] Wang L.X., Adaptive Fuzzy Systems and Control, Prentice Hall, New Jersey, 1994.

[12] Yager R.R., Filev D., Essentials of Fuzzy Modeling and Control, John Wiley & Sons, New York, 1994.

Chapter 3

A New Approach to Acquisition of Comprehensible Fuzzy Rules

Hiroshi Ohno¹ and Takeshi Furuhashi²

¹Toyota Central Research & Development Labs., Inc.
²Nagoya University

Abstract

We present a new approach to the acquisition of comprehensible fuzzy rules for fuzzy modeling from data using Evolutionary Programming (EP). For accuracy of the model, it is effective to allow membership functions to overlap with each other in the fuzzy model. From the viewpoint of knowledge acquisition, however, it is desirable that the model have a smaller number of membership functions with less overlapping. Considering the trade-off between the precision and the clarity of the fuzzy model, this paper presents a method for acquiring comprehensible fuzzy rules from an identified model that satisfies the desired accuracy. The approach clearly distinguishes a modeling phase and a re-evaluation phase. The accurate model of the unknown system in the modeling phase is obtained by, for example, a fuzzy neural network (FNN), such as a radial basis function network, using EP. The simplified model in the re-evaluation phase can mainly be used for knowledge acquisition from the unknown system. A numerical experiment was done to show the feasibility of the proposed algorithm.

Keywords: fuzzy modeling, knowledge acquisition, fuzzy neural networks, evolutionary programming, linguistic meanings

3.1 Introduction

In recent years, fuzzy modeling from data has been widely developed for numerous applications [1]. In fuzzy modeling, parameters of membership functions and fuzzy rules are generated and tested


by an evolutionary optimization algorithm. The resulting fuzzy model thus represents the nonlinear characteristics of the unknown system. However, the model cannot always provide comprehensible fuzzy rules, since many membership functions are generated and overlap with each other to identify a precise model of the nonlinear system. If the purpose of modeling is only to capture the nonlinear relation of the unknown system, we can use neural networks. Fuzzy modeling is superior to neural networks and other nonlinear modeling techniques in the linguistic explanation of the unknown system.

Comprehensibility and precision of the model are a trade-off, and the comprehensibility of the fuzzy rules has thus been sacrificed for achieving precision. Since comprehensible fuzzy rules discovered from an unknown system are useful for understanding the system, a modeling method satisfying both the precision and the clarity of the resulting model simultaneously is desirable.

The conventional studies on fuzzy modeling can be categorized into two main approaches. One is to find precise fuzzy rules subject to the constraints of comprehensible fuzzy rules [2], [3]-[6], [7], [8].

The other is to reduce complexity after obtaining a precise fuzzy model [9]-[13]. To find precise fuzzy rules, the fuzzy model becomes more complex, with many membership functions; this is not suitable for knowledge acquisition. In the latter approach, the modeling performance is often degraded by reducing the number of membership functions that carry information useful for nonlinear modeling. Thus, from a knowledge acquisition viewpoint, it is not necessary that the simplified fuzzy model have both high precision and, at the same time, better comprehensibility.

In this paper we address knowledge acquisition with clear linguistic meanings in fuzzy modeling, and clearly distinguish the modeling phase and the re-evaluation phase in the process of knowledge acquisition. We propose a new approach using EP [14], which consists of a modeling phase for identification of an accurate model and a re-evaluation phase for simplification of the identified model. In the approach we introduce the degree of explanation for the acquired fuzzy model in the re-evaluation phase, expressed as constraints on the membership functions. The constraints guarantee linguistic meanings of the membership functions during the re-evaluation phase. The degree of explanation, which is set by the user, determines the structure of knowledge in the simplified model. To generate proper knowledge of the unknown system effectively, it is necessary that the user, who is a domain expert, interact in the process of knowledge acquisition [15]. This


is to direct the process of knowledge acquisition according to the domain knowledge. The acquired fuzzy rules are re-evaluated using EP to clarify their linguistic meanings, subject to the constraints on the membership functions. It is not always easy to derive the derivatives of the objective needed by learning algorithms such as the gradient descent optimization method. Evolutionary optimization does not require derivatives for its learning, and EP can directly handle numerical data without the use of binary coding.

The approaches described in the literature [11], [12], [13] are aimed at reducing the number of membership functions or fuzzy rules in the model. These approaches do not always make interpretation and analysis of the rules easy. Fig. 3.1 shows the flow chart of our approach. The distinguishing feature of the proposed approach is the incorporation of the degree of explanation given by the user.

From a neural network viewpoint, studies have been done on extracting rules or fuzzy rules from trained neural networks [16], [17]. However, the rules acquired from a neural network are very difficult to comprehend because of the distributed representation of knowledge in the connection weights and biases of the network. The proposed approach using an FNN is superior to the neural network approach in the sense of easily extracting knowledge, because the FNN has an explicit structure of knowledge.

This paper is organized as follows. In Section 2, our approach for knowledge acquisition from a fuzzy model is presented, and fuzzy modeling using an FNN is described. Section 3 illustrates computer experiments using a simple example from the literature to show the feasibility of the proposed method. In Section 4, a summary and conclusions complete the paper.

3.2 Proposed Algorithm

This section describes the proposed algorithm for the clarity of fuzzy rules identified by the FNN [18]. Other fuzzy modeling techniques (e.g., [19]) can also be used. The algorithm consists of a modeling phase and a re-evaluation phase.



Fig. 3.1 Flow chart of our approach

3.2.1 Fuzzy modeling

In the modeling phase, we use the FNN with Gaussian-type membership functions depicted in Fig. 3.2, and identify its parameters using EP from given input-output training data. Another modeling technique, such as a gradient descent type method, can also be used instead of EP to decide the unknown parameters. The fuzzy rules are represented as

R_i: If x_1 is A_{i1} and x_2 is A_{i2}, ..., and x_N is A_{iN} then y_i = w_i

$$ y = \frac{\sum_{i=1}^{M}\left(\prod_{j=1}^{N}\mu_{A_{ij}}(x_j)\right)w_i}{\sum_{i=1}^{M}\prod_{j=1}^{N}\mu_{A_{ij}}(x_j)} \qquad (1) $$

where M is the number of rules, N is the number of inputs, R_i (i = 1, 2, ..., M) denotes the i-th fuzzy rule, x_j (j = 1, 2, ..., N) is the input, y is the output of the fuzzy model, w_i is the singleton, and A_ij is the fuzzy variable with a Gaussian-type membership function. This function is given by

$$ \mu_{A_{ij}}(x_j) = \exp\left(-\frac{(x_j - c_{ij})^2}{2\sigma_{ij}^2}\right), \qquad (2) $$


where c_ij and σ_ij determine the central position and the width of the membership function, respectively.

c_ij, σ_ij, and w_i are parameters to be identified by EP. The objective of EP is to reduce the squared error between the output of the fuzzy model and the desired output of the training data.

Fig. 3.2 Architecture of fuzzy-neural network

3.2.2 Re-evaluation of fuzzy model for knowledge acquisition

In the re-evaluation phase we simplify the membership functions for the clarity of the acquired knowledge by the following procedure. First, we select the type of membership function in the antecedent, such as trapezoidal or triangular with linguistic meanings, and decide their number. This is the process that introduces the degree of explanation given by the user into the fuzzy model. Second, we replace the original membership functions of the identified fuzzy model using EP, subject to the constraints given as the degree of explanation. The objective of EP is to reduce the squared error between the values of each original membership function at the input training data and those of the new membership function. The shapes of the new membership functions are depicted in Fig. 3.3, and their parameters are summarized in Table 3.1. Other shapes of membership function can also be used.

The constraints imposed on the shape and the order of the membership functions are summarized in Table 3.2. The degree of explanation is defined by these constraints and the number of new membership functions. The user


interacts with the computer system for knowledge acquisition through the degree of explanation. If the constraints are not satisfied during the operation of EP, the parameters of the membership functions, which are included in a solution vector of EP, are initialized and the EP operation is continued. The solutions of EP are real-valued vectors corresponding to the unknown parameters of the fuzzy system and the adaptable standard deviations that determine the step size for each unknown parameter. Finally, we evaluate the performance of the fuzzy model in which the membership functions have been replaced by the new ones with the minimum distance between the outputs of the original membership functions and the new ones.

The algorithm is summarized as follows. Given a fuzzy model such as an FNN constructed from data:

Step 1. Set the degree of explanation by selecting the number and shapes of membership functions and determining the order of them.

Step 2. Create the new training data that comprise the input data of the modeling phase and the corresponding outputs of the original membership functions.

Step 3. Adapt the new membership functions to the original membership functions of the fuzzy model using EP, subject to the constraints given by the degree of explanation.

Step 4. Replace the membership functions with the new membership functions with the minimum distance, using the new training data of Step 2.

Step 5. Evaluate the new fuzzy model using the training data of the modeling phase.

In Step 5, the fuzzy rules which have the same antecedent part are merged, and the singletons w_i among them are averaged.
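A sketch of the Step 3 fitting objective; `mf_new` and its parameter layout stand in for the shapes A-D of Fig. 3.3 and are placeholders, not the authors' implementation:

```python
import numpy as np

def mf_fit_error(params, mf_new, x_train, c, sigma):
    """Step 3 objective sketch: squared error between the original
    Gaussian membership values at the training inputs and those of a
    candidate replacement mf_new(params, x). mf_new is a hypothetical
    callable implementing one of the shapes of Fig. 3.3."""
    mu_orig = np.exp(-(x_train - c) ** 2 / (2.0 * sigma ** 2))
    mu_new = mf_new(params, x_train)
    return np.sum((mu_orig - mu_new) ** 2)
```

EP would minimize this quantity over `params`, re-initializing any solution vector that violates the constraints of Table 3.2, as described above.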

Table 3.1 Parameters of membership functions

Shape   Parameters
A       x, l
B       x, l1, l2
C       x, l1, l2
D       x, l1, l2


Fig. 3.3 Shape of membership functions

Table 3.2 Conditions of the constraints

Shape                          Order
0 < l for A                    CBD
0 < l1 < l2 for B, C, D        CABAD

3.3 Computer Experiments

We used a simple example to demonstrate the proposed algorithm. A three-input, single-output function was chosen as the unknown system for the modeling phase. The function is

$$ y = \left(1 + x_1^{0.5} + x_2^{-1} + x_3^{-1.5}\right)^2 \qquad (3) $$

Forty input-output training data were randomly generated using this function, as shown in Table 3.3. Twenty data were used in the modeling phase; the remaining ones were used for testing the generalization of the model. The accuracy of the fuzzy model was evaluated using the following performance index:

$$ E = \frac{1}{L}\sum_{i=1}^{L}\frac{|y_i - y_i^{*}|}{y_i}\times 100\,(\%), \qquad (4) $$

where L denotes the number of data, y_i denotes the i-th datum from Eq. (3), and y_i^{*} is the output of the fuzzy model. For EP, the population size, that is, the number of solutions, was set at 100.


3.3.1 Modeling results

[Table 3.3: the forty randomly generated input-output training data.]

The initial random values of the tuning parameters were set within [0,5]. For EP, the population size, that is, the number of solutions, was selected to be 100. The ten membership functions corresponding to each input variable, obtained after 100 generations, are shown in Figs. 3.4, 3.5, and 3.6; the membership functions were heavily overlapped. In Table 3.4, the performance indexes of the resulting model are shown. The experimental result had little comprehensibility of the fuzzy rules.


Fig. 3.4 Membership functions of input variable (x1)

Fig. 3.5 Membership functions of input variable (x2)

3.3.2 Re-evaluation results

We examined the case where the degree of explanation was defined as follows: the numbers of membership functions were A = 0, B = 1, C = 1, and D = 1 for each input variable, and the order of these membership functions was set from left to right as "CBD." The linguistic meanings in this case


Fig. 3.6 Membership functions of input variable (x3)

Table 3.4 Performance index E (%)

Training data (Data 1-20)    11.46
Unknown data (Data 21-40)    12.31

were interpreted as: C is "small," B is "medium," and D is "big." Table 3.5 shows the initial random values of the parameters of the membership functions assigned to B, C, and D. These parameter values were determined empirically, after trial-and-error experiments, to prevent the re-evaluation phase from getting stuck at a local minimum. The minimum and maximum values of the input variables were 1 and 5, respectively.

After 5000 generations, we obtained the membership functions shown in Figs. 3.7, 3.8, and 3.9. In these figures, the membership functions are labeled "C," "B," and "D." The performance indexes are shown in Table 3.6; they are degraded from those of the original fuzzy model shown in Table 3.4. This degradation can be regarded as the explanation cost of comprehensibility. In this case, the membership functions became more comprehensible than the original ones shown in Figs. 3.4, 3.5, and 3.6. If the objective of fuzzy modeling is to acquire a precise model, the original model can be used.

From these figures, it is seen that the new membership functions are


Table 3.5 Initial values of the membership functions

B    x ∈ [2.8, 3.2],  l1 ∈ [0.3, 0.7],  l2 ∈ [0.8, 1.2]
C    x ∈ [0.8, 1.2],  l1 ∈ [0.3, 0.7],  l2 ∈ [0.8, 1.2]
D    x ∈ [4.8, 5.2],  l1 ∈ [0.3, 0.7],  l2 ∈ [0.8, 1.2]

not weakly consistent [20]. From the linguistic meaning point of view, it is desirable for membership functions to satisfy weak consistency or consistency. Considering this point, new constraints need to be imposed on the membership functions.

Table 3.7 shows the fuzzy rules of the new, simplified fuzzy model. In the table, the last column is the value of the singleton w_i. The number of fuzzy rules decreased from 10 to 9.

Fig. 3.7 Membership functions of input variable (x1)

3.4 Conclusions

In this paper, a new approach to the acquisition of comprehensible fuzzy rules from the FNN constructed from data was proposed and its feasibility


Fig. 3.8 Membership functions of input variable (x2)

Fig. 3.9 Membership functions of input variable (x3)

was demonstrated through computer experiments. The proposed algorithm using EP consists of two phases: modeling and re-evaluation. In the re-evaluation phase, we can control the degree of explanation for knowledge acquisition through the constraints. This is the feature that distinguishes the proposed approach from the conventional approaches.


Table 3.6 Performance index E (%)

Training data    52.79
Unknown data     54.96

Table 3.7 Fuzzy rules of the new fuzzy model

Number   x1       x2       x3       y
1        small    small    medium   22.690
2        small    medium   small    -5.437
3        small    big      small    -1.920
4        medium   small    big      -3.532
5        medium   big      medium   0.116
6        big      small    small    16.271
7        big      small    medium   -0.144
8        big      big      small    21.760
9        big      big      big      -3.102

Future work is to improve the performance index after the re-evaluation phase and to apply this method to practical applications. In the re-evaluation phase, the experimental results reveal that new constraints on the membership functions are needed for linguistic meanings. Moreover, introducing another measure of distance between the original and new membership functions may improve the performance in the re-evaluation phase.


References

[1] J.C. Bezdek, "Editorial: Fuzzy Models—What Are They, and Why?," IEEE Trans. Fuzzy Syst., Vol. 1, pp. 1-6, 1993.

[2] S. Horikawa, T. Furuhashi, S. Okuma, and Y. Uchikawa, "A Fuzzy Controller Using a Neural Network and its Capability to Learn Expert's Control Rules," in Proc. of Int'l Conf. on Fuzzy Logic & Neural Networks (IIZUKA-90), pp. 103-106, 1990.

[3] J.-S. R. Jang, "Fuzzy Modeling Using Generalized Neural Networks and Kalman Filter Algorithm," in Proc. of Ninth National Conf. on Artificial Intelligence (AAAI-91), pp. 762-767, 1991.

[4] J.-S. R. Jang, "Self-Learning Fuzzy Controllers Based on Temporal Back-Propagation," IEEE Trans. Neural Networks, vol. 3, no. 5, pp. 714-723, 1992.

[5] T. Hasegawa, S. Horikawa, T. Furuhashi, et al., "A Study on Fuzzy Modeling of BOF Using a Fuzzy Neural Network," in Proc. of the 2nd Int'l Conf. on Fuzzy Logic & Neural Networks (IIZUKA'92), pp. 1061-1064, 1992.

[6] S. Nakayama, T. Furuhashi, and Y. Uchikawa, "A Proposal of Hierarchical Fuzzy Modeling Method," Journal of Japan Society for Fuzzy Theory and Systems, vol. 5, no. 5, pp. 1155-1168, 1993.

[7] K. Shimojima, T. Fukuda, and Y. Hasegawa, "Self-tuning Fuzzy Modeling with Adaptive Membership Function, Rules, and Hierarchical Structure Based on Genetic Algorithm," Fuzzy Sets and Systems, vol. 71, no. 3, pp. 295-309, 1995.

[8] S. Matsushita, A. Kuromiya, M. Yamaoka, T. Furuhashi, and Y. Uchikawa, "Determination of Antecedent Structure of Fuzzy Modeling Using Genetic Algorithm," in Proc. of 1996 IEEE Int'l Conf. on Evolutionary Computation (ICEC'96), pp. 235-238, 1996.

[9] R. R. Yager and D. P. Filev, "Unified Structure and Parameter Identification of Fuzzy Models," IEEE Trans. Syst., Man, Cybern., vol. 23, no. 4, pp. 1198-1205, 1993.

[10] B. G. Song, R. J. Marks II, S. Oh, P. Arabshahi, T. P. Caudell, and J. J. Choi, "Adaptive Membership Function Fusion and Annihilation in Fuzzy If-Then Rules," in Proc. Second IEEE Int. Conf. Fuzzy Syst., pp. 961-967, 1993.

[11] C. T. Chao, Y. J. Chen, and C. C. Teng, "Simplification of Fuzzy-Neural Systems Using Similarity Analysis," IEEE Trans. Syst., Man, Cybern., vol. 26, no. 2, pp. 344-354, 1996.

[12] R. Babuska, M. Setnes, U. Kaymak, and H. R. van Nauta Lemke, "Rule Base Simplification with Similarity Measures," in Proc. Fifth IEEE Int. Conf. Fuzzy Syst., pp. 1642-1647, 1996.

[13] J. Yen and L. Wang, "An SVD-Based Fuzzy Model Reduction Strategy," in Proc. Fifth IEEE Int. Conf. Fuzzy Syst., pp. 835-841, 1996.

[14] N. Saravanan and D. B. Fogel, "Evolving Neural Control Systems," IEEE Expert, 10(3), pp. 23-27, 1995.

[15] Y. Nakamori, "Development and Application of an Interactive Modeling Support System," Automatica, vol. 25, no. 2, pp. 185-206, 1989.

[16] J. Diederich, "Explanation and Artificial Neural Networks," Int. J. Man-Machine Studies, Vol. 37, pp. 335-355, 1992.

[17] S. H. Huang and M. R. Endsley, "Providing Understanding of the Behavior of Feedforward Neural Networks," IEEE Trans. Syst., Man, Cybern., Vol. 27, No. 3, pp. 465-474, 1997.

[18] S. Horikawa, T. Furuhashi, and Y. Uchikawa, "On Fuzzy Modeling Using Fuzzy Neural Networks with the Back-Propagation Algorithm," IEEE Trans. on Neural Networks, vol. 3, no. 5, pp. 801-806, 1992.

[19] M. Sugeno and T. Yasukawa, "A Fuzzy-Logic-Based Approach to Qualitative Modeling," IEEE Trans. Fuzzy Syst., Vol. 1, pp. 7-31, 1993.

[20] X.-J. Zeng and M. G. Singh, "Approximation Accuracy Analysis of Fuzzy Systems as Function Approximators," IEEE Trans. on Fuzzy Systems, vol. 4, no. 1, pp. 44-63, 1996.


Chapter 4 Fuzzy Rule Generation with Fuzzy Singleton-Type Reasoning Method

Yan Shi¹ and Masaharu Mizumoto²

¹Kyushu Tokai University
²Osaka Electro-Communication University

Abstract

By means of the fuzzy singleton-type reasoning method, we propose a self-tuning method for fuzzy rule generation. In this tuning approach, we first give a learning algorithm for tuning fuzzy rules under the fuzzy singleton-type reasoning method, and then roughly design the initial tuning parameters of the fuzzy rules based on a fuzzy clustering algorithm. With this approach, the learning time can be reduced and the fuzzy rules generated are reasonable and suitable for the identified system model. Finally, we show the efficiency of the employed method by identifying nonlinear functions.

Keywords : fuzzy singleton-type reasoning method, fuzzy rule generation, neuro-fuzzy learning algorithm, fuzzy c-means clustering algorithm

4.1 Introduction

The fuzzy singleton-type reasoning method by Mizumoto [10,11], which can adjust the weights of fuzzy rules, can well improve the fuzzy inference conclusion because of its flexibility. It also shows better fuzzy control results than simplified fuzzy reasoning [10,11]. As with other fuzzy reasoning methods, it is necessary and important to design the fuzzy rules of the fuzzy singleton-type reasoning method for a practical problem, in the case when the construction of a fuzzy system model is difficult for human beings [5-8,13-14,16-21]. For this purpose, a learning algorithm for tuning the real numbers and weights of the consequent parts has been proposed in [10] using the gradient descent method [15], where the membership functions of the antecedent parts are of triangular type. Furthermore, for the case of one input and one output, a so-called self-generating learning algorithm for fuzzy rules has


been provided for the fuzzy singleton-type reasoning method [12], which tunes the centers of the triangular membership functions of the antecedent parts, and the real numbers and weights of the consequent parts, based on the gradient descent method. However, the above two tuning methods lack generality for a multiple-input fuzzy system model. Also, as is well known for all neuro-fuzzy learning algorithms [3,5-8,13-14,16-21], it has not been fully investigated how to arrange suitable initial values of the tuning parameters (centers and widths of the membership functions of the antecedent parts, real numbers of the consequent parts and their weights) before learning them.

In this article, we propose a new self-tuning method for fuzzy rule generation based on the fuzzy singleton-type reasoning method. In this approach, we first give a so-called neuro-fuzzy learning algorithm for tuning fuzzy rules under fuzzy singleton-type reasoning, and then roughly design the initial tuning parameters of the fuzzy rules by using a fuzzy clustering algorithm, before learning a fuzzy model. By this approach, the learning time can be reduced and the fuzzy rules generated are reasonable and suitable for the identified system model. Moreover, the potential of the proposed technique is illustrated by identifying nonlinear functions.

4.2 Fuzzy Singleton-Type Reasoning Method (FSTRM)

We first briefly review the fuzzy singleton-type reasoning method by Mizumoto [10,11], in which the fuzzy model has m input linguistic variables (x_1, x_2, ..., x_m) and one output variable y.

For convenience of representation, in the sequel we denote the fuzzy singleton-type reasoning method as FSTRM. A fuzzy model with m linguistic variables (x_1, x_2, ..., x_m) and one output variable y can be expressed by FSTRM in the form of an "If ... then ... with ..." fuzzy inference rule model as follows [10,11]:

Rule 1: If x_1 is A_11 and x_2 is A_21 and ... and x_m is A_m1 then y_1 with w_1
...
Rule i: If x_1 is A_1i and x_2 is A_2i and ... and x_m is A_mi then y_i with w_i        (1)
...
Rule n: If x_1 is A_1n and x_2 is A_2n and ... and x_m is A_mn then y_n with w_n


where A_ji (j = 1, 2, ..., m; i = 1, 2, ..., n) is a fuzzy subset for the input linguistic variable x_j, y_i is a real number for the output variable y, and w_i is the weight corresponding to the i-th fuzzy rule. n denotes the number of fuzzy rules.

When an observation (x_1, x_2, ..., x_m) is given, a fuzzy inference consequence y can be obtained using FSTRM in the following way [10,11]:

$$ h_i = A_{1i}(x_1)\,A_{2i}(x_2)\cdots A_{mi}(x_m) \qquad (2) $$

$$ y = \sum_{i=1}^{n} h_i w_i y_i \Big/ \sum_{i=1}^{n} h_i w_i \qquad (3) $$

where h_i (i = 1, 2, ..., n) is the agreement of the antecedent of the i-th fuzzy rule at (x_1, x_2, ..., x_m). As a simple explanation of FSTRM, Fig. 4.1 shows the process of fuzzy inference using the fuzzy singleton-type reasoning method for m = 2, n = 2 in (1) [10].

It has been shown that better fuzzy control results can be obtained by FSTRM than those by simplified fuzzy reasoning method, which implies that FSTRM is a powerful tool for fuzzy logic applications [10-12].
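A minimal sketch of the inference of eqs. (2)-(3), with the membership functions passed in as plain callables:

```python
import numpy as np

def fstrm(x, mf, y, w):
    """Fuzzy singleton-type reasoning, eqs. (2)-(3).

    x  : (m,) input vector.
    mf : list of n lists of m membership functions A_ji.
    y  : (n,) consequent real numbers y_i.
    w  : (n,) rule weights w_i.
    """
    h = np.array([np.prod([A(xj) for A, xj in zip(rule, x)])
                  for rule in mf])            # agreements h_i, eq. (2)
    return (h * w * y).sum() / (h * w).sum()  # eq. (3)
```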


Fig. 4.1 Explanation of fuzzy singleton-type reasoning method.


4.3 Learning Algorithm of Fuzzy Rule Generation with FSTRM

For given training input-output data (x_1, x_2, ..., x_m; y*) of a fuzzy system model, we have the following objective function E for evaluating the error between y* and y:

$$ E = (y - y^{*})^2 / 2 \qquad (4) $$

where y* is the desired output value and y is the corresponding fuzzy inference result.

To minimize the objective function E, a new neuro-fuzzy learning algorithm for tuning fuzzy rules under FSTRM is developed as follows:

Gaussian-type neuro-fuzzy learning algorithm by FSTRM

Firstly, we give a neuro-fuzzy learning algorithm in which the membership functions of the antecedent parts of the fuzzy rules are of Gaussian type, as shown in Fig. 4.2.

Fig. 4.2 Gaussian-type membership functions for input variable x_j.

Let the fuzzy subsets A_ji (j = 1, 2, ..., m; i = 1, 2, ..., n) be of Gaussian type as follows [6]:

$$ A_{ji}(x_j) = \exp\left(-\frac{(x_j - a_{ji})^2}{b_{ji}}\right) \qquad (5) $$

where a_ji is the center of A_ji and b_ji is the width of A_ji.

By (1)-(3), a neuro-fuzzy learning algorithm under FSTRM for updating the parameters a_ji, b_ji, y_i, and w_i (j = 1, 2, ..., m; i = 1, 2, ..., n) is formulated based on the gradient descent method [15] as follows:


$$ a_{ji}(t+1) = a_{ji}(t) - \alpha\,\frac{\partial E}{\partial a_{ji}(t)} = a_{ji}(t) + \frac{2\alpha\,(y^{*}-y)(y_i-y)\,w_i h_i\,(x_j-a_{ji})/b_{ji}}{\sum_{l=1}^{n} h_l w_l} \qquad (6) $$

$$ b_{ji}(t+1) = b_{ji}(t) - \beta\,\frac{\partial E}{\partial b_{ji}(t)} = b_{ji}(t) + \frac{\beta\,(y^{*}-y)(y_i-y)\,w_i h_i\,(x_j-a_{ji})^2/b_{ji}^2}{\sum_{l=1}^{n} h_l w_l} \qquad (7) $$

$$ y_i(t+1) = y_i(t) - \gamma\,\frac{\partial E}{\partial y_i(t)} = y_i(t) + \frac{\gamma\,(y^{*}-y)\,h_i w_i}{\sum_{l=1}^{n} h_l w_l} \qquad (8) $$

$$ w_i(t+1) = w_i(t) - \theta\,\frac{\partial E}{\partial w_i(t)} = w_i(t) + \frac{\theta\,(y^{*}-y)(y_i-y)\,h_i}{\sum_{l=1}^{n} h_l w_l} \qquad (9) $$

where α, β, γ, and θ are the learning rates, and t is the learning iteration.
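A sketch of one training iteration with these update rules, using a single learning rate for α, β, γ, and θ (an assumption of this sketch):

```python
import numpy as np

def fstrm_update(x, y_star, a, b, yc, w, lr=0.01):
    """One iteration of the Gaussian-type learning rules (6)-(9).
    x: (m,) input; y_star: desired output; a, b: (n, m) centers/widths;
    yc: (n,) consequent real numbers y_i; w: (n,) rule weights."""
    mu = np.exp(-(x - a) ** 2 / b)          # memberships A_ji, eq. (5)
    h = mu.prod(axis=1)                     # agreements h_i, eq. (2)
    S = (h * w).sum()
    y_hat = (h * w * yc).sum() / S          # inference result, eq. (3)
    e = (y_star - y_hat) * (yc - y_hat) * w * h / S
    da = lr * e[:, None] * 2 * (x - a) / b             # eq. (6)
    db = lr * e[:, None] * (x - a) ** 2 / b ** 2       # eq. (7)
    dy = lr * (y_star - y_hat) * h * w / S             # eq. (8)
    dw = lr * (y_star - y_hat) * (yc - y_hat) * h / S  # eq. (9)
    a += da; b += db; yc += dy; w += dw
    return y_hat
```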

Triangular-type neuro-fuzzy learning algorithm by FSTRM

Fig. 4.3 Triangular-type membership functions for input variable x_j.


Next, we give another neuro-fuzzy learning algorithm, in which the membership functions of the antecedent parts of the fuzzy rules are of triangular type, as shown in Fig. 4.3.

Let the fuzzy subsets A_ji (j = 1, 2, ..., m; i = 1, 2, ..., n) be of triangular type as follows:

$$ A_{ji}(x_j) = \begin{cases} 1 - 2\,|x_j - a_{ji}|/b_{ji}, & a_{ji} - b_{ji}/2 \le x_j \le a_{ji} + b_{ji}/2 \\ 0, & \text{otherwise} \end{cases} \qquad (10) $$

where a_ji is the center of A_ji and b_ji is the width of A_ji.

By (1)-(3), a neuro-fuzzy learning algorithm under FSTRM for updating the parameters a_ji, b_ji, y_i, and w_i (j = 1, 2, ..., m; i = 1, 2, ..., n) is given based on the gradient descent method [15] as follows:

$$ a_{ji}(t+1) = a_{ji}(t) - \alpha\,\frac{\partial E}{\partial a_{ji}(t)} = a_{ji}(t) + \frac{2\alpha\,(y^{*}-y)(y_i-y)\,w_i\,\mathrm{sgn}(x_j-a_{ji})\prod_{k\ne j}A_{ki}(x_k)}{b_{ji}\sum_{l=1}^{n} h_l w_l} \qquad (11) $$

$$ b_{ji}(t+1) = b_{ji}(t) - \beta\,\frac{\partial E}{\partial b_{ji}(t)} = b_{ji}(t) + \frac{2\beta\,(y^{*}-y)(y_i-y)\,w_i\,|x_j-a_{ji}|\prod_{k\ne j}A_{ki}(x_k)}{b_{ji}^2\sum_{l=1}^{n} h_l w_l} \qquad (12) $$

$$ y_i(t+1) = y_i(t) - \gamma\,\frac{\partial E}{\partial y_i(t)} = y_i(t) + \frac{\gamma\,(y^{*}-y)\,h_i w_i}{\sum_{l=1}^{n} h_l w_l} \qquad (13) $$

$$ w_i(t+1) = w_i(t) - \theta\,\frac{\partial E}{\partial w_i(t)} = w_i(t) + \frac{\theta\,(y^{*}-y)(y_i-y)\,h_i}{\sum_{l=1}^{n} h_l w_l} \qquad (14) $$

where in (11) and (12), k ≠ j means k = 1, ..., j−1, j+1, ..., m, and sgn is the sign function:

$$ \mathrm{sgn}(x) = \begin{cases} -1, & x < 0 \\ 0, & x = 0 \\ 1, & x > 0 \end{cases} $$

4.4 Designing Initial Tuning Parameters of Fuzzy Rules based on Fuzzy Clustering Algorithm (FCM)

Now we are in a position to design the initial tuning parameters of the fuzzy rules in the neuro-fuzzy learning algorithm (6)-(9) or (11)-(14) by using FCM, which is briefly described as follows [1,2].

Assume (x_1, x_2, ..., x_m) to be variables on the input space X = X_1 × X_2 × ... × X_m, let y be a variable on the output space Y, and let U ∈ R_{n×c} be an n × c fuzzy partition matrix for the given n training data x_k = (x_1^k, x_2^k, ..., x_m^k, y_k^*) (k = 1, 2, ..., n), where c is the number of clusters.

Let μ_ki ∈ U be the membership value of the k-th vector x_k in the i-th cluster with center vector v_i = (v_1^i, v_2^i, ..., v_{m+1}^i) ∈ R^{m+1} (i = 1, 2, ..., c; 2 ≤ c < n), satisfying the following restrictions:

$$ 0 \le \mu_{ki} \le 1, \quad \forall k, i \qquad (15) $$

$$ 0 < \sum_{k=1}^{n} \mu_{ki} < n, \quad \forall i \qquad (16) $$

$$ \sum_{i=1}^{c} \mu_{ki} = 1, \quad \forall k \qquad (17) $$

An objective function J_s of FCM for solving v_i is defined as follows:

$$ J_s(U, v_1, v_2, \ldots, v_c) = \sum_{k=1}^{n}\sum_{i=1}^{c}(\mu_{ki})^s\,\|x_k - v_i\|^2 \qquad (18) $$

where s (1 < s < ∞) is a smoothing weight, and ||·|| is the inner product norm.

To minimize J_s, the FCM algorithm is performed by the following procedure:

Step 1. Choose parameters c and s such that 2 ≤ c < n, 1 < s < ∞, and an inner product norm ||·||. We can take a norm as

$$ \|x_k - v_i\|_A^2 = (x_k - v_i)\,A\,(x_k - v_i)^T \qquad (19) $$

where A is a positive definite matrix.

Step 2. Choose an initial matrix U(0) of fuzzy partitions randomly, subject to the conditions (15)-(17).

Step 3. For t = 0, 1, 2, ..., calculate the cluster centers v_i (i = 1, 2, ..., c) using U(t) as follows:

$$ v_i = \sum_{k=1}^{n}(\mu_{ki})^s x_k \Big/ \sum_{k=1}^{n}(\mu_{ki})^s \qquad (20) $$

Step 4. Update U(t) in the following way:

(1) Calculate the sets I_k and Ī_k (k = 1, 2, ..., n) as

$$ I_k = \{\, i \mid 1 \le i \le c,\; d_{ki} = \|x_k - v_i\| = 0 \,\} \qquad (21) $$

$$ \bar{I}_k = \{1, 2, \ldots, c\} - I_k \qquad (22) $$


(2)-(a): When I_k = ∅,

$$ \mu_{ki} = 1 \Big/ \sum_{j=1}^{c}\left(\frac{d_{ki}}{d_{kj}}\right)^{2/(s-1)} \qquad (23) $$

(2)-(b): When I_k ≠ ∅,

$$ \mu_{ki} = 0, \quad \forall i \in \bar{I}_k \qquad (24) $$

$$ \sum_{i \in I_k} \mu_{ki} = 1 \qquad (25) $$

Step 5. Choose a suitable matrix norm, and stop the FCM process if the following condition holds:

$$ \|U(t) - U(t+1)\| < \varepsilon \qquad (26) $$

otherwise, set t = t + 1 and return to Step 3. Here ε is a sufficiently small positive number serving as the termination criterion. In this study, we take the matrix norm [2]

$$ \|U(t) - U(t+1)\| = \max_{k,i}\,|\mu_{ki}(t) - \mu_{ki}(t+1)| \qquad (27) $$
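A compact sketch of Steps 1-5 with the Euclidean norm (A taken as the identity in eq. (19)):

```python
import numpy as np

def fcm(X, c, s=2.0, eps=1e-4, max_iter=100, seed=0):
    """Fuzzy c-means sketch following Steps 1-5 (eqs. 18-27)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.RandomState(seed)
    U = rng.rand(len(X), c)
    U /= U.sum(axis=1, keepdims=True)              # restriction (17)
    for _ in range(max_iter):
        Us = U ** s
        V = Us.T @ X / Us.sum(axis=0)[:, None]     # centers, eq. (20)
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)
        U_new = (d == 0.0).astype(float)           # eqs. (24)-(25)
        hit = U_new.sum(axis=1) > 0                # rows with I_k nonempty
        U_new[hit] /= U_new[hit].sum(axis=1, keepdims=True)
        dd = d[~hit] ** (2.0 / (s - 1.0))
        U_new[~hit] = 1.0 / (dd * (1.0 / dd).sum(axis=1, keepdims=True))  # eq. (23)
        if np.abs(U - U_new).max() < eps:          # stopping rule, (26)-(27)
            return U_new, V
        U = U_new
    return U, V
```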

In the sequel we discuss how to arrange suitable initial values of the tuning parameters of the fuzzy rules (centers and widths of the membership functions of the antecedent parts, real numbers of the consequent parts and their weights) in the learning algorithm (6)-(9) or (11)-(14).

For the given n training data x_k = (x_1^k, x_2^k, ..., x_m^k, y_k^*) (k = 1, 2, ..., n), we can create the cluster centers v_i = (v_1^i, v_2^i, ..., v_{m+1}^i) (i = 1, 2, ..., c) and the matrix U = (μ_ki)_{n×c} by the FCM process.

First, we define the importance w̄_i of each cluster center v_i (i = 1, 2, ..., c) as

$$ \bar{w}_i = 1 \Big/ \sum_{k=1}^{n} d_{ki} \qquad (28) $$

or, more generally,

$$ \bar{w}_i = \exp\left(-\sum_{k=1}^{n} d_{ki}\right) \qquad (29) $$

where d_ki = ||x_k − v_i|| (k = 1, 2, ..., n; i = 1, 2, ..., c) is the distance from the point x_k to the cluster center v_i.

Then, by means of the idea of the fuzzy c-shell clustering algorithm [4], we can define a vector r_i = (r_1^i, r_2^i, ..., r_m^i, r_{m+1}^i) (i = 1, 2, ..., c), where r_j^i stands for the length of the cluster around center v_i along the j-th axis:

$$ r_j^i = \sqrt{\sum_{k=1}^{n}(\mu_{ki})^s\,(x_j^k - v_j^i)^2 \Big/ \sum_{k=1}^{n}(\mu_{ki})^s} \qquad (30) $$

Next, we identify the cluster center v_i with the initial tuning parameter vector (a_1i(0), a_2i(0), ..., a_mi(0), y_i(0)) in (6) and (8), or in (11) and (13), as

$$ (a_{1i}(0), a_{2i}(0), \ldots, a_{mi}(0), y_i(0)) = v_i \qquad (31) $$

Namely, we have

$$ a_{ji}(0) = v_j^i, \quad j = 1, 2, \ldots, m;\; i = 1, 2, \ldots, c \qquad (32) $$

$$ y_i(0) = v_{m+1}^i, \quad i = 1, 2, \ldots, c. \qquad (33) $$

Furthermore, we relate r_j^i (j = 1, 2, ..., m; i = 1, 2, ..., c) to the initial tuning parameters b_ji(0) in (7) or in (12) as

$$ b_{ji}(0) = k\,r_j^i, \quad j = 1, 2, \ldots, m;\; i = 1, 2, \ldots, c \qquad (34) $$


Fuzzy Rule Generation with Fuzzy Singleton . . . 69

where k stands for an adjusting constant, subject to the type of membership functions. Fig. 4.4 gives a simple explanation of (32) and (34) at m = 2.

Finally, the weight w,(0) of z'-th fuzzy rule in (9) or in (14) can be calculated by using (28) or (29) as

W/(0) = w, I max{ wj) (35)

Note that the number of the cluster centers coincides with the number of fuzzy rules in (6) - (9) or in (11) - (14).

Hence, we can obtain the initial tuning parameters as (32) - (35) before the learning of fuzzy rules.
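A sketch of this initialization, computing (28) and (30)-(35) from the FCM output; the square root used for (30) and the einsum-based weighting are choices of this sketch:

```python
import numpy as np

def initial_rules(X, y, U, V, s=2.0, k_adj=2.0):
    """Initial rule parameters from the FCM result, eqs. (28)-(35).
    X: (n, m) inputs, y: (n,) outputs, U: (n, c) partition matrix,
    V: (c, m+1) cluster centers over the joint input-output space;
    k_adj is the adjusting constant k of eq. (34)."""
    Z = np.hstack([X, y[:, None]])                 # joint points x_k
    a0, y0 = V[:, :-1], V[:, -1]                   # eqs. (32)-(33)
    d = np.linalg.norm(Z[:, None, :] - V[None, :, :], axis=2)
    imp = 1.0 / d.sum(axis=0)                      # importances, eq. (28)
    w0 = imp / imp.max()                           # eq. (35)
    Us = U ** s
    diff2 = (Z[:, None, :] - V[None, :, :]) ** 2   # (n, c, m+1)
    r = np.sqrt(np.einsum('ki,kij->ij', Us, diff2) / Us.sum(0)[:, None])
    b0 = k_adj * r[:, :-1]                         # widths, eq. (34)
    return a0, b0, y0, w0
```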

Fig. 4.4 Initial membership functions of antecedent parts by cluster centers

4.5 Numerical Examples

In the sequel, we adopt only the Gaussian-type neuro-fuzzy learning algorithm for the system identification. The problem may be solved similarly with the triangular-type neuro-fuzzy learning algorithm.


First, the proposed method is applied to the following nonlinear function with one input variable and one output variable.

Example 1:

$$ y = 0.3 + \frac{0.9\,x}{1.2\,x^3 + x + 0.3} + \eta \qquad (36) $$

where x ∈ [0,1] is the input variable and η ∈ [0, 0.15] is randomly generated noise.

We set n = 30, c = 3, s = 2, and A = the unit matrix in Step 1, and ε = 0.0001 in Step 5, and 30 input-output training data were generated randomly. The FCM process obtained 3 cluster centers after 18 iterations, as shown in Fig. 4.5. Then, from (32)-(35), the initial parameters of the fuzzy rules can be set as shown in Table 4.1. Fig. 4.6 illustrates the desired model of (36) and the corresponding fuzzy model for 30 randomly given checking input data, based on the fuzzy rules of Table 4.1. In this case, the mean square error is 0.004071 and the maximum absolute error is 0.189695.

Moreover, using the learning algorithm (6)-(9), we tuned the initial parameters of the fuzzy rules of Table 4.1, where the learning rates were set as α = β = γ = θ = 0.01 and the threshold δ, which stops the learning process, was 0.002. The fuzzy rules for identifying (36) were then generated after 43 iterations, as shown in Table 4.2. Here we employed the 30 training data shown in Fig. 4.5 for the learning process.


Fig. 4.5 Input-output data and Cluster centers for (36).


Fig. 4.6 Desired model for (36) and fuzzy model by Table 4.1.

Table 4.1 Initial parameters of fuzzy rules obtained by FCM for (36).

No.          1        2        3
a_{1i}(0)    0.2065   0.5218   0.8148
b_{1i}(0)    0.0798   0.0748   0.0930
y_i(0)       0.7085   0.8588   0.7981
w_i(0)       0.6956   1.0000   0.6505

Table 4.2 Fuzzy rules generated by the proposed method to identify (36).

No.       1        2        3
a_{1i}    0.1489   0.4560   0.8198
b_{1i}    0.0719   0.1357   0.0363
y_i       0.5830   0.8741   0.7793
w_i       0.4560   1.0073   0.6497

When the fuzzy inference is performed again for the former 30 checking input data by using the fuzzy rules of Table 4.2, a better approximation is obtained, as shown in Fig. 4.7. In this case, the mean square error is 0.0016 and the maximum absolute error is 0.083778.


Fig. 4.7 Desired model for (36) and fuzzy model by Table 4.2.

Next, we compare the proposed method with the direct method (6)-(9) (namely, tuning the fuzzy rules without the FCM process) by means of the following nonlinear function:

Example 2: $y = [4\sin(\pi x_1) + 2\cos(\pi x_2)]/12 + 0.45 + \eta$ (37)

where $x_1, x_2 \in [-1, 1]$ are input variables and $\eta \in [0, 0.05]$ is randomly generated noise.

As in Example 1, we assume that $n = 100$, $c = 16$, $s = 2$ and $A$ = unit matrix in Step 1, and $\varepsilon = 0.0001$ in Step 5, and 100 input-output data are chosen randomly. Then, by using the proposed method, we can obtain 16 elliptical-type cluster centers, with lengths corresponding to the linguistic input variables $x_1$ and $x_2$, after 22 iterations, as shown in Fig. 4.8. By (32)-(35), we can transfer them into the initial parameters of fuzzy rules shown in Table 4.3. On the other hand, we can obtain another kind of initial parameters of fuzzy rules directly, as in [10-12, 19], shown in Table 4.4. Furthermore, we tuned the two kinds of initial fuzzy rules in Tables 4.3 and 4.4 by using the learning algorithm (6)-(9), respectively, where 40 training data are taken randomly in each learning process. Table 4.5 shows the iterations of the learning, the error of evaluation, and the maximum absolute error for the checking data when identifying (37) by the


direct method (A) and the proposed method (B), respectively. Here, the learning rates are taken as $\alpha = \beta = \gamma = \delta = 0.1$, and the threshold $\theta$ is 0.0001; the error of evaluation is taken as the mean square error over the checking data, and 2601 checking data $(x_1, x_2)$ were taken uniformly from $(-1, -1)$ to $(1, 1)$.

Fig. 4.8 Input data and elliptical-type cluster centers for (37).

Table 4.3 Initial parameters of fuzzy rules for (37) by the proposed method.

No.   a_{1i}(0)  b_{1i}(0)  a_{2i}(0)  b_{2i}(0)  y_i(0)   w_i(0)
 1    -0.6181    0.1303     0.0320     0.1397     0.3497   0.9497
 2    -0.2903    0.1234    -0.6840     0.1404     0.1299   0.9874
 3     0.8647    0.1727     0.8611     0.1503     0.4511   0.5554
 4     0.1797    0.1143    -0.8318     0.1274     0.5001   0.6824
 5    -0.7102    0.1043    -0.8132     0.1646     0.0968   0.8505
 6    -0.0781    0.0788    -0.6529     0.1689     0.3252   1.0000
 7     0.8912    0.1108    -0.7603     0.1435     0.4693   0.7298
 8    -0.4792    0.1533     0.7728     0.1780     0.0423   0.7072
 9     0.2408    0.1336     0.0087     0.1555     0.8563   0.7936
10    -0.3148    0.1631     0.3675     0.1901     0.2809   0.8155
11     0.3867    0.1236     0.6080     0.1253     0.7194   0.8419
12     0.8256    0.1025    -0.5660     0.1192     0.6016   0.8413
13     0.5043    0.1557     0.3015     0.1603     0.8926   0.7970
14     0.6259    0.1631    -0.2328     0.1778     0.8603   0.8128
15    -0.7490    0.1770     0.6890     0.1511     0.1748   0.6803
16    -0.7701    0.1466    -0.3639     0.1486     0.3364   0.7489


Table 4.4 Initial parameters of fuzzy rules for (37) by the direct method.

No.   a_{1i}(0)  b_{1i}(0)  a_{2i}(0)  b_{2i}(0)  y_i(0)   w_i(0)
 1    -1.000     0.1603    -1.000     0.1603     0.5000   1.0000
 2    -1.000     0.1603    -0.333     0.1603     0.5000   1.0000
 3    -1.000     0.1603     0.333     0.1603     0.5000   1.0000
 4    -1.000     0.1603     1.000     0.1603     0.5000   1.0000
 5    -0.333     0.1603    -1.000     0.1603     0.5000   1.0000
 6    -0.333     0.1603    -0.333     0.1603     0.5000   1.0000
 7    -0.333     0.1603     0.333     0.1603     0.5000   1.0000
 8    -0.333     0.1603     1.000     0.1603     0.5000   1.0000
 9     0.333     0.1603    -1.000     0.1603     0.5000   1.0000
10     0.333     0.1603    -0.333     0.1603     0.5000   1.0000
11     0.333     0.1603     0.333     0.1603     0.5000   1.0000
12     0.333     0.1603     1.000     0.1603     0.5000   1.0000
13     1.000     0.1603    -1.000     0.1603     0.5000   1.0000
14     1.000     0.1603    -0.333     0.1603     0.5000   1.0000
15     1.000     0.1603     0.333     0.1603     0.5000   1.0000
16     1.000     0.1603     1.000     0.1603     0.5000   1.0000

Table 4.5 Comparison between the direct method (A) and the proposed method (B) for identifying (37).

No.   Iteration (A / B)   Error of evaluation (A / B)   Max. absolute error (A / B)
 1     13 /   8            0.00662 / 0.00555             0.2800 / 0.2834
 2     66 /  94            0.00524 / 0.00464             0.3700 / 0.3436
 3      8 / 113            0.00494 / 0.00270             0.2091 / 0.3013
 4      6 /   4            0.01061 / 0.00661             0.4287 / 0.3195
 5     59 /  84            0.00602 / 0.00428             0.4164 / 0.3048
 6     19 /  21            0.00653 / 0.00549             0.2951 / 0.3166
 7     85 / 283            0.00250 / 0.00359             0.1952 / 0.2891
 8      8 /  21            0.00501 / 0.00435             0.2394 / 0.3129
 9     32 /  70            0.00842 / 0.00580             0.4022 / 0.3821
10    110 /  12            0.00304 / 0.00458             0.2514 / 0.2921

From Table 4.5 we can see that the proposed method gives a better approximation than the direct method except in a few special cases, while the iteration counts of the two methods are comparable.


4.6 Conclusions

In this chapter, an efficient learning approach for fuzzy rule generation with the fuzzy singleton-type reasoning method has been proposed. We have illustrated the efficiency of the proposed learning technique on several numerical identification examples, which demonstrated better fuzzy inference results than those of the direct learning algorithm. Our results also indicate that the proposed learning method is more reasonable and suitable for constructing an optimal fuzzy system model when the fuzzy singleton-type reasoning method is applied to system identification.

References

[1] J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, 1981.

[2] R.L. Cannon, J.V. Dave and J.C. Bezdek, "Efficient implementation of the fuzzy c-means clustering algorithms", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 8, pp. 248-255, 1986.

[3] K.B. Cho and B.H. Wang, "Radial basis function based adaptive fuzzy systems and their applications to system identification and prediction", Fuzzy Sets and Systems, Vol. 83, pp. 325-339, 1996.

[4] R.N. Dave and S.K. Bhaswan, "Adaptive fuzzy c-shell clustering", Proceedings of the 10th Annual North American Fuzzy Information Processing Society Meeting, pp. 195-199, 1991.

[5] S. Horikawa, T. Furuhashi and Y. Uchikawa, "On fuzzy modeling using fuzzy neural networks with the back-propagation algorithm", IEEE Transactions on Neural Networks, Vol. 3, pp. 801-806, 1992.

[6] H. Ichihashi, "Iterative fuzzy modeling and a hierarchical network", Proceedings of the Fourth IFSA World Congress, pp. 49-52, 1991.

[7] H. Ichihashi and I.B. Turksen, "A neuro-fuzzy approach to data analysis of pairwise comparisons", International Journal of Approximate Reasoning, Vol. 9, pp. 227-248, 1993.

[8] S. Lee and R.M. Kil, "A Gaussian potential function network with hierarchically self-organizing learning", IEEE Transactions on Neural Networks, Vol. 4, pp. 207-224, 1991.

[9] M. Maeda and S. Murakami, "An automobile tracking control with a fuzzy logic", Proceedings of the Third Fuzzy System Symposium, pp. 61-66, 1987.

[10] 532, 1992.

[11] M. Mizumoto, "Fuzzy controls by fuzzy singleton type reasoning method", Proceedings of the Fifth IFSA World Congress, pp. 945-948, 1993.

[12] M. Mizumoto and M. Iwakiri, "Self-generation of fuzzy rules by fuzzy singleton-type reasoning method", Proceedings of the Ninth Fuzzy System Symposium, pp. 585-588, 1993.

[13] H. Nomura, I. Hayashi and N. Wakami, "A self-tuning method of fuzzy control by descent method", Proceedings of the Fourth IFSA World Congress, pp. 155-158, 1991.

[14] H. Nomura, I. Hayashi and N. Wakami, "A self-tuning method of fuzzy control by descent method", Proceedings of the IEEE International Conference on Fuzzy Systems, pp. 203-210, 1992.

[15] D.E. Rumelhart, J.L. McClelland and the PDP Research Group, Parallel Distributed Processing, MIT Press, 1986.

[16] Y. Shi, M. Mizumoto, N. Yubazaki and M. Otani, "A method of fuzzy rules generation based on neuro-fuzzy learning algorithm", Journal of Japan Society for Fuzzy Theory and Systems, Vol. 8, pp. 695-705, 1996.

[17] Y. Shi, M. Mizumoto, N. Yubazaki and M. Otani, "A self-tuning method of fuzzy rules based on the gradient descent method", Journal of Japan Society for Fuzzy Theory and Systems, Vol. 8, pp. 757-765, 1996.

[18] Y. Shi, M. Mizumoto, N. Yubazaki and M. Otani, "A learning algorithm for tuning fuzzy rules based on the gradient descent method", Proceedings of the Fifth IEEE International Conference on Fuzzy Systems, pp. 55-61, 1996.

[19] Y. Shi, M. Mizumoto, N. Yubazaki and M. Otani, "A tuning method of fuzzy rules by fuzzy singleton-type reasoning method", Proceedings of the Fourth International Conference on Soft Computing, pp. 553-556, 1996.

[20] L.X. Wang and J.M. Mendel, "Back-propagation fuzzy system as nonlinear dynamic system identifiers", Proceedings of the IEEE International Conference on Fuzzy Systems, pp. 1409-1416, 1992.

[21] L.X. Wang, Adaptive Fuzzy Systems and Control, Prentice Hall, 1994.


Chapter 5
Antecedent Validity Adaptation Principle for Table Look-up Scheme

Ping-Tong Chan and Ahmad B. Rad

The Hong Kong Polytechnic University Kowloon, Hong Kong

Abstract

In this chapter, we propose an Antecedent Validity Adaptation (AVA) principle for fuzzy system tuning. It is suggested that the fuzzy rules should be updated with respect to the validity of their antecedents. This adaptation principle agrees with human intuition and fuzzy logic reasoning. The principle is applied to the Table Look-up (TL) scheme to model recorded data. Based on this approach, an on-line fuzzy identification algorithm is also presented. These methods are successfully applied to model nonlinear systems.

Keywords: Data Modeling, Table Look-up Scheme, Antecedent Validity Adaptation

5.1 Introduction
Fuzzy logic has been used extensively for the task of controller design [4]; however, more and more researchers have shown interest in exploring the possibility of using fuzzy logic for modeling complex systems. Fuzzy modeling has been carried out both off-line [7] and on-line [5, 6]. Wang and Mendel [7] proposed a table look-up scheme to generate fuzzy rules, which uses both numerical data and expert knowledge. Nozaki et al. [3] proposed a heuristic method to generate fuzzy rules from numerical data. Moreover, Wang [5, 6] proposed gradient and least-squares methods to generate fuzzy rules on-line. The main advantage of fuzzy modeling (identification) is that scattered heterogeneous information such as qualitative knowledge, empirical observations, measured data and available a priori information can be interpreted and represented in a coherent format. Due to these properties, complex and ill-defined nonlinear systems can be modeled with simple structures instead of sophisticated mathematical models. This is especially effective for modeling systems that can be controlled by a skilled human operator who is equipped with a fuzzy model of the underlying dynamics acquired by a combination of intuition, intelligence and practice.

In this chapter, we suggest an Antecedent Validity Adaptation (AVA) principle. The adaptation principle agrees with human intuition and fuzzy logic reasoning. The crux of the method is to decompose each datum over its linguistic antecedents and to update the corresponding consequences with respect to their validity measure, computed by product inference. The algorithm fully utilizes the resources in the fuzzy system to assimilate the knowledge embedded in the data and is also capable of summarizing the available expertise. The main contribution of this chapter is to improve the table look-up scheme: instead of updating only the most significant rules, the proposed method uses more of the information from the available data.

The rest of this chapter is organized as follows. Section 2 introduces the antecedent validity adaptation (AVA) principle. The principle is applied to refine Wang and Mendel's table look-up scheme [7] for recorded data, followed by a simulated example, in Section 3. The algorithm is extended to form an adaptive identification algorithm in Section 4, and a simulated example is included to show the performance of the on-line modeling. Finally, the chapter is concluded in Section 5.

5.2 Antecedent Validity Adaptation Principle and Table Look-up Scheme

A good engineering practice should be capable of incorporating all available information effectively; in the same spirit, the design should not discard any available information. The Antecedent Validity Adaptation (AVA) principle uses the antecedent validity of each datum, with respect to the fuzzy sets and fuzzy rules, to adjust the output consequences.


Wang and Mendel [7] proposed a method to generate fuzzy rules which uses both numerical data and expert knowledge. In their method, they first divided the input and output domains into several regions. Then, they generated rules from the data, assigning a degree to each rule. Finally, a combined rule base for the controller was generated by resolving conflicts between different rules. However, one may notice that this method selects only the most influential rules and leaves the other data information out of consideration. AVA incorporates this information into the fuzzy system.

A step by step procedure for implementing the AVA for the TL is shown as follows:

Step 1. Define fuzzy sets to cover the input spaces
Let us assume that the following $m$ input-output pairs are given as training data for constructing a fuzzy rule-based system: $\{(x^{(p)}, y^{(p)}) \mid p = 1, 2, \ldots, m\}$, where $x^{(p)} = (x_1^{(p)}, x_2^{(p)}, \ldots, x_n^{(p)}) \in U \subset R^n$ is the input vector of the $p$-th input-output pair and $y^{(p)} \in V \subset R$ is the corresponding output, with $j_1 = 1, 2, \ldots, K_1$; $\ldots$; $j_n = 1, 2, \ldots, K_n$. The $n$-dimensional input space $A_1 \times A_2 \times \cdots \times A_n$ is divided into $K_1 K_2 \cdots K_n$ fuzzy subspaces $A_1^{j_1} \times A_2^{j_2} \times \cdots \times A_n^{j_n}$. The fuzzy system performs a mapping from $U \subset R^n$ to $R$, where $U = U_1 \times \cdots \times U_n$ and $U_i \subset R$, $i = 1, 2, \ldots, n$.

In the proposed method, we have implemented the fuzzy system with triangular membership functions, a fuzzy singleton fuzzifier, product inference and center-of-height defuzzification. The shape of each membership function is triangular: one vertex lies at the center of the region, where the membership value equals one; the other two vertices lie at the centers of the two adjacent regions, where the membership values equal zero. The membership values therefore sum to one everywhere over the universe of discourse. The quality of reconstruction depends upon the number of fuzzy subsets in the input space; increasing the number of input fuzzy subsets improves accuracy.
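A minimal sketch of such a triangular partition in Python follows (the uniformly spaced centers and the function name are our assumptions; the sum-to-one property holds by construction):

```python
import numpy as np

def tri_partition(x, centers):
    """Membership of scalar x in each triangular fuzzy set.

    `centers` are the sorted region centers; adjacent triangles overlap so
    that the memberships sum to one on [centers[0], centers[-1]]."""
    mu = np.zeros(len(centers))
    x = np.clip(x, centers[0], centers[-1])
    j = np.searchsorted(centers, x)               # first center >= x
    if j == 0:
        mu[0] = 1.0
    else:
        lam = (centers[j] - x) / (centers[j] - centers[j - 1])
        mu[j - 1], mu[j] = lam, 1.0 - lam
    return mu

# five uniform sets on [0, 1] (S2, S1, CE, B1, B2); x = 0.8 gives
# memberships 0.8 and 0.2 in the last two sets, matching the worked
# example of Section 5.2.1
print(tri_partition(0.8, np.linspace(0.0, 1.0, 5)))
```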

Step 2. Generate fuzzy rules from given data pairs
(i) Construct the fuzzy system
The rule base consists of a set of fuzzy IF-THEN rules of the form "IF a set of conditions is satisfied, THEN a consequent can be inferred". We assume that the rule base is composed of fuzzy IF-THEN rules of the following form:

Rule $R^{j_1 \cdots j_n}$: IF $x_1$ is $A_1^{j_1}$ and $\ldots$ and $x_n$ is $A_n^{j_n}$ THEN $y$ is $b^{j_1 \cdots j_n}$,
$j_1 = 1, 2, \ldots, K_1$; $\ldots$; $j_n = 1, 2, \ldots, K_n$ (1)

where $R^{j_1 \cdots j_n}$ is the label of each fuzzy IF-THEN rule and $b^{j_1 \cdots j_n}$ is the consequent real number.

From the $\prod_{i=1}^{n} K_i$ fuzzy IF-THEN rules, the output of the fuzzy system is

$$f(x) = \frac{\sum_{j_1=1}^{K_1} \cdots \sum_{j_n=1}^{K_n} b^{j_1 \cdots j_n} \prod_{i=1}^{n} \mu_{A_i^{j_i}}(x_i)}{\sum_{j_1=1}^{K_1} \cdots \sum_{j_n=1}^{K_n} \prod_{i=1}^{n} \mu_{A_i^{j_i}}(x_i)} \qquad (2)$$

where $i = 1, 2, \ldots, n$ and $j_1 = 1, 2, \ldots, K_1$; $\ldots$; $j_n = 1, 2, \ldots, K_n$; the $b^{j_1 \cdots j_n}$ are free parameters to be designed, and the $A_i^{j_i}$ are designed in Step 1.

(ii) Collect free parameters
Collect the free parameters $b^{j_1 \cdots j_n}$ into the vector

$$b = (b^{1 1 \cdots 1}, \ldots, b^{K_1 1 \cdots 1}, b^{1 2 \cdots 1}, \ldots, b^{K_1 2 \cdots 1}, \ldots, b^{1 K_2 \cdots K_n}, \ldots, b^{K_1 K_2 \cdots K_n})^T$$

and rewrite (2) as

$$f(x) = b^T a(x) \qquad (3)$$

where $a(x)$ is a $\prod_{i=1}^{n} K_i$-dimensional vector whose $j_1 \cdots j_n$-th element is

$$a^{j_1 \cdots j_n}(x) = \frac{\prod_{i=1}^{n} \mu_{A_i^{j_i}}(x_i)}{\sum_{j_1=1}^{K_1} \cdots \sum_{j_n=1}^{K_n} \prod_{i=1}^{n} \mu_{A_i^{j_i}}(x_i)} \qquad (4)$$

(iii) Select the initial parameter $b(0)$
If there are linguistic rules from human experts whose IF parts agree with the IF parts of (1), then choose $b(0)$ to be the centers of the THEN-part fuzzy sets in these linguistic rules. In this way, we can construct the initial fuzzy system from conscious human knowledge. This offers the advantage that the centers of the consequences need not be set before modeling, as in the case of Mamdani-type fuzzy systems [1]. The antecedent validity of the rule $R^{j_1 \cdots j_n}$ for a datum is given by Eq. (4).


Step 3. Create a combined fuzzy rule base

$$b^{j_1 \cdots j_n} = \frac{\sum_{p=1}^{m} a^{j_1 \cdots j_n}(x^{(p)})\, y^{(p)}}{\sum_{p=1}^{m} a^{j_1 \cdots j_n}(x^{(p)})} \qquad (5)$$
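Steps 2-3 can be sketched compactly for the two-input case; the code computes the fuzzy-basis-function vector of (4) and the combined consequents of (5). It reuses the triangular partition from the previous sketch, and all helper names are ours.

```python
import numpy as np
from itertools import product

def tri_partition(x, centers):
    """Triangular memberships summing to one (as in Step 1)."""
    mu = np.zeros(len(centers))
    x = np.clip(x, centers[0], centers[-1])
    j = np.searchsorted(centers, x)
    if j == 0:
        mu[0] = 1.0
    else:
        lam = (centers[j] - x) / (centers[j] - centers[j - 1])
        mu[j - 1], mu[j] = lam, 1.0 - lam
    return mu

def basis(x, grids):
    """Eq. (4): normalized product of per-input memberships over all rules."""
    mus = [tri_partition(xi, g) for xi, g in zip(x, grids)]
    a = np.array([np.prod([m[j] for m, j in zip(mus, idx)])
                  for idx in product(*(range(len(g)) for g in grids))])
    return a / a.sum()

def combine(X, Y, grids):
    """Eq. (5): antecedent-validity-weighted average of the outputs."""
    num = den = 0.0
    for x, y in zip(X, Y):
        a = basis(x, grids)
        num, den = num + a * y, den + a
    return np.where(den > 0, num / np.maximum(den, 1e-12), 0.0)
```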

5.2.1 Illustrative example
Let us explain these procedures with an example to help clarify the discussion. In this example, the task is to generate a set of fuzzy rules to formulate a mapping $f: (x_1, x_2) \mapsto y$. Assume that the domain intervals of $x_1$, $x_2$ and $y$ are $[x_1^-, x_1^+]$, $[x_2^-, x_2^+]$ and $[y^-, y^+]$, respectively.

A step by step procedure is outlined as:

Step 1. Define fuzzy sets to cover the input spaces
Consider the normalized domains $[x_1^-, x_1^+] = [0, 1]$, $[x_2^-, x_2^+] = [0, 1]$ and $[y^-, y^+] = [0, 1]$. $x_1$ is divided into 5 fuzzy sets and $x_2$ into 7 fuzzy sets, as depicted in Fig. 5.1. Assume also that $x_1$ and $x_2$ are inputs, and $y$ is the output. This simple two-input one-output case is selected to demonstrate the basic concept of our new approach.


Figure 5.1 Input and output fuzzy sets, and three data with their corresponding membership values.

Step 2. Generate fuzzy rules from given data pairs
Suppose that we are given three sets of data. Data 1: (0.8, 0.28; 0.525); Data 2: (0.65, 0.5; 0.677); and Data 3: (0.675, 0.36; 0.55). For data sets 1 and 2, we determine the degrees of membership of $x_1$ and $x_2$ in the different regions. For example, in Fig. 5.1, $x_1 = 0.8$ has a membership value of 0.8 in B1, a membership value of 0.2 in B2, and zero membership in all other regions. The membership values for these data sets are shown in Table 5.1.


                Fuzzy set 1 : Membership value    Fuzzy set 2 : Membership value
x1(1) = 0.8         B1: 0.8                           B2: 0.2
x2(1) = 0.28        S2: 0.3                           S1: 0.7
y(1)  = 0.525       CE: 0.9                           B1: 0.1
x1(2) = 0.65        CE: 0.4                           B1: 0.6
x2(2) = 0.5         CE: 1                             -
y(2)  = 0.677       CE: 0.3                           B1: 0.7

Table 5.1 Fuzzy set and membership value of Data 1 and Data 2.

Next, we assign the consequences with respect to the input linguistic variables, with degree equal to the antecedent validity. The results for data sets 1 and 2 are summarized in Tables 5.2 and 5.3 respectively.

Data 1 (0.8, 0.28; 0.525)

TL rule                                              Antecedent validity
IF x1 is B1 and x2 is S1 THEN y = CE (0.5)           0.8 × 0.7 = 0.56

TLV rules                                            Antecedent validity
IF x1 is B1 and x2 is S2 THEN y = 0.525              0.8 × 0.3 = 0.24
IF x1 is B1 and x2 is S1 THEN y = 0.525              0.8 × 0.7 = 0.56
IF x1 is B2 and x2 is S2 THEN y = 0.525              0.2 × 0.3 = 0.06
IF x1 is B2 and x2 is S1 THEN y = 0.525              0.2 × 0.7 = 0.14
Total                                                = 1

Table 5.2 Fuzzy rules formed by TL and TLV (Data 1).


Data 2 (0.65, 0.5; 0.677)

TL rule                                              Antecedent validity
IF x1 is B1 and x2 is CE THEN y = B1 (0.75)          0.6 × 1 = 0.6
Total                                                = 0.6

TLV rules                                            Antecedent validity
IF x1 is CE and x2 is CE THEN y = 0.677              0.4 × 1 = 0.4
IF x1 is B1 and x2 is CE THEN y = 0.677              0.6 × 1 = 0.6
Total                                                = 1

Table 5.3 Fuzzy rules formed by TL and TLV (Data 2).

Step 3. Create a combined fuzzy rule base
The numerical data are coded into a common framework through the consequents of the FBF, given by

$$b = \frac{\sum_{i=1}^{p} \phi_i\, y_i}{\sum_{i=1}^{p} \phi_i} \qquad (6)$$

where $p$ is the number of training data, $\phi_i$ is the antecedent validity of the $i$-th datum for that rule, and $y_i$ is the consequent data value. The resultant rule tables for the three sets of data are shown in Figure 5.2(a-b).
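As a quick numerical check of (6), the updated cell values quoted in Figure 5.2(b) can be reproduced directly (a hedged sketch; the function name is ours):

```python
def combined_consequent(phis, ys):
    """Eq. (6): validity-weighted average of consequent data values."""
    return sum(p * y for p, y in zip(phis, ys)) / sum(phis)

# the three updated cells of Figure 5.2(b)
print(round(combined_consequent([0.4, 0.03], [0.677, 0.55]), 3))   # 0.668
print(round(combined_consequent([0.6, 0.27], [0.677, 0.55]), 3))   # 0.638
print(round(combined_consequent([0.56, 0.63], [0.525, 0.55]), 3))  # 0.538
```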


(a) The fuzzy rule base of TLV for Data 1 and Data 2:
Data 1 => 0.525, entered with (x1: 0.8 B1, 0.2 B2), (x2: 0.3 S2, 0.7 S1);
Data 2 => 0.677, entered with (x1: 0.4 CE, 0.6 B1), (x2: 1 CE).

(b) The fuzzy rule base of TLV for Data 1-3:
Data 3 => 0.55, entered with (x1: 0.3 CE, 0.7 B1), (x2: 0.9 S1, 0.1 CE).
The updated cell values are
0.668 = (0.677 × 0.4 + 0.55 × 0.03) / (0.4 + 0.03),
0.638 = (0.677 × 0.6 + 0.55 × 0.27) / (0.6 + 0.27),
0.538 = (0.525 × 0.56 + 0.55 × 0.63) / (0.56 + 0.63).

Figure 5.2 Rule base generated by Example 1.

Let us denote the table look-up scheme with AVA reasoning by TLV, and the original table look-up scheme by TL. Comparing the reasoning of the two algorithms (TL and TLV), we get:


Data 1 (0.8, 0.28; 0.525) (refer to Fig. 5.2a and Table 5.2)
TL: the output is 0.5, and the modeling error = 0.525 - 0.5 = 0.025.
TLV: the output is 0.525, and the modeling error = 0.525 - 0.525 = 0.

Data 2 (0.65, 0.5; 0.677) (refer to Fig. 5.2a and Table 5.3)
TL: the output is 0.75, and the modeling error = 0.75 - 0.677 = 0.073.
TLV: the output is 0.677, and the modeling error = 0.677 - 0.677 = 0.

Next, Data 3 (0.675, 0.36; 0.55) is added for modeling, as shown in Table 5.4.

                 Fuzzy set 1 : Membership value    Fuzzy set 2 : Membership value
x1(3) = 0.675        CE: 0.3                           B1: 0.7
x2(3) = 0.36         S1: 0.9                           CE: 0.1
y(3)  = 0.55         CE: 0.8                           B1: 0.2

Table 5.4 Fuzzy set and membership value of Data 3.


Data 3 (0.675, 0.36; 0.55)

TLV rules                                            Antecedent validity
IF x1 is CE and x2 is S1 THEN y = 0.55               0.3 × 0.9 = 0.27
IF x1 is CE and x2 is CE THEN y = 0.55               0.3 × 0.1 = 0.03
IF x1 is B1 and x2 is S1 THEN y = 0.55               0.7 × 0.9 = 0.63
IF x1 is B1 and x2 is CE THEN y = 0.55               0.7 × 0.1 = 0.07
Total                                                = 1

Table 5.5 Fuzzy rules formed by TLV (Data 3).

We assign the consequent with respect to the input linguistic variables, with degree equal to the antecedent validity. Then, for reasoning with TLV, we have:

Data 1 (0.8, 0.28; 0.525): the output is 0.532 (= 0.538 × 0.56 + 0.525 × 0.24 + 0.525 × 0.14 + 0.525 × 0.06), and the modeling error = 0.532 - 0.525 = 0.007.
Data 2 (0.65, 0.5; 0.677): the output is 0.65 (= 0.668 × 0.4 + 0.638 × 0.6), and the modeling error = 0.677 - 0.65 = 0.027.
Data 3 (0.675, 0.36; 0.55): the output is 0.570 (= 0.638 × 0.27 + 0.668 × 0.03 + 0.538 × 0.63 + 0.55 × 0.07), and the modeling error = 0.570 - 0.55 = 0.020.

5.2.2 Some remarks on properties of AVA

• The results agree with intuition and fuzzy reasoning
The adaptation agrees with human intuition and is consistent with fuzzy reasoning: the total antecedent validity equals 1, and each input datum is absorbed according to its validity measure; the product inference of the antecedents (the antecedent validity) of each rule accounts for the datum's degree of contribution. AVA uses this value both in the reasoning process and in calculating the adaptation portion for the rule consequences. In other words, the modeling algorithm is able to recover the same parameters as the original system given a sufficient number of noiseless input-output training data.


• The degree of freedom in the consequences is increased
Mamdani-type fuzzy systems, with fuzzy singleton fuzzifier, product inference and center-of-height defuzzification, are a constrained fuzzy basis function (FBF) expansion. The consequences in Mamdani-type fuzzy systems are restricted to the fuzzy sets of the output linguistic variable. For example, consider a fuzzy system [7×5; 7] (i.e. the first input variable has 7 fuzzy sets, the second variable has 5 fuzzy sets and the output has 7 fuzzy sets). For this case, a Mamdani-type fuzzy system has 35 rules with 7 possible consequences; on the other hand, the FBF has 35 rules with 35 possible consequences. When the input fuzzy sets are divided into finer partitions, the FBF not only has an increased number of rules, but also an increased degree of freedom in the consequences.

• Information usage is increased
Essentially, the TL scheme characterizes only the key features; TLV discards no details and absorbs each piece of information in proportion to its antecedent validity. The idea of AVA is intuitively straightforward and computationally simple; it makes full use of the information inherent in the available data. With the AVA principle, all the data are fitted into a common rule table cooperatively.

TLV makes use of the remaining 44% of the information in Data 1 and the remaining 40% of the information in Data 2. The increase in data usage is largest when both x1 and x2 have membership value 0.5 in two adjacent fuzzy sets; in that case TLV yields a 75% increase in information usage. Moreover, with TLV, even if a new datum is not as dominant as the one occupying a rule, its information is still taken into consideration.

For a two-variable fuzzy system, TLV makes use of the remaining 3 of the 4 activated rules. It is apparent that as the number of input variables increases, the effectiveness of information extraction by AVA becomes more significant. The proposed algorithm can extract more information from the data while the simple one-pass procedure of TL is unimpaired.

• Robustness with respect to changes in the definition of fuzzy sets is improved
It can be observed that TL is more sensitive to the definition of the fuzzy sets because it selects only the most significant rules: when the centers of the membership functions change, the performance of the fuzzy system changes. TLV, on the other hand, weights the rules in proportion to their antecedent validity. When the definition of the fuzzy sets is changed, the antecedent validities of the fuzzy sets and the output consequents adapt accordingly. No bias is introduced against the input data when they are partitioned to construct fuzzy rules. Every datum gives its contribution when the rule base is initialized or updated. This method does not drop any piece of information from the data, and it absorbs more knowledge from the training material.

• Information sources
TL emphasizes the incorporation of human knowledge, whereas our method aims at extracting numerical information. When extensive and important expert knowledge is available, TL is preferable. When expert knowledge is limited or inaccessible, or when reliable numerical recorded data (e.g. recorded data of successful control) are available, TLV is the favored choice. However, human knowledge can also be incorporated adequately, as mentioned in Step 2.

5.3 Simulation results

Example 1
In this section, we apply the method to the modeling of a nonlinear system. The plant to be identified is governed by the difference equation

$$y(k+1) = 0.3\,y(k) + 0.6\,y(k-1) + g[u(k)] \qquad (7)$$

where the unknown function has the form $g(u) = 0.6\sin(\pi u) + 0.3\sin(3\pi u) + 0.1\sin(5\pi u)$. From Eq. (7), the identification model is governed by the difference equation

$$y(k+1) = 0.3\,y(k) + 0.6\,y(k-1) + f[u(k)] \qquad (8)$$

where $f[\cdot]$ is of the form of Eq. (2) with 7 fuzzy sets. This problem has also been studied in [4] and [1]. Fig. 5.3 shows the outputs of the plant and the identified model, where the input is $u(k) = \sin(2\pi k/200)$. It is observed from Fig. 5.3 that the output of the identification model follows the output of the plant. The fuzzy systems have 9 fuzzy subsets divided over the maximum range of the training data. The squared identification error by TL is 51.35, and the squared identification error by TLV is 12.04.

Figure 5.3 Outputs of the plant (solid) and the identifications of TL (dotted) and TLV for Example 1.

5.4 Adaptive Fuzzy Identifier
The algorithm can be extended to form an on-line identification algorithm by revising Step 3 of Section 2 so as to update the fuzzy basis function consequents with the input data according to AVA:

$$b^{j_1 \cdots j_n}(p) = \alpha^{a^{j_1 \cdots j_n}(x^{(p)})}\, b^{j_1 \cdots j_n}(p-1) + \left(1 - \alpha^{a^{j_1 \cdots j_n}(x^{(p)})}\right) y^{(p)} \qquad (9)$$

where $\alpha$ is the forgetting factor. The adaptive behavior will be demonstrated by two examples.
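In the exponent form of (9), a rule whose antecedent validity is zero is left untouched, while a fully activated rule undergoes ordinary exponential forgetting. A minimal sketch (names ours):

```python
import numpy as np

def ava_online_update(b, a, y_p, alpha=0.9):
    """Eq. (9): on-line AVA update of all rule consequents.

    b: current consequent vector; a: antecedent-validity vector a(x_p) of the
    new datum; y_p: its output value; alpha: forgetting factor."""
    lam = alpha ** a                      # a_i = 0 -> lam = 1 (rule unchanged)
    return lam * b + (1.0 - lam) * y_p    # a_i = 1 -> plain exponential forgetting

# toy usage: a datum that activates two of four rules
b = np.array([0.5, 0.5, 0.5, 0.5])
a = np.array([0.0, 0.0, 0.56, 0.24])
print(ava_online_update(b, a, y_p=0.525))
```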

Example 2
The plant to be identified [1, 4] is described by the second-order difference equation

$$y(k+1) = g[y(k), y(k-1)] + u(k) \qquad (10)$$

where

$$g[y(k), y(k-1)] = \frac{y(k)\,y(k-1)\,[y(k) + 2.5]}{1 + y^2(k) + y^2(k-1)} \qquad (11)$$

and $u(k) = \sin(2\pi k/25)$. A series-parallel identifier described by the equation

$$\hat{y}(k+1) = f[y(k), y(k-1)] + u(k) \qquad (12)$$

was used, where $f[y(k), y(k-1)]$ is the fuzzy system modeling the nonlinearity, and $\alpha = 0.9$. Fig. 5.4 shows the outputs of the plant and the identification model.

Figure 5.4 Output of the plant (solid) and the identification model (dotted) for Example 2.

Example 3
Time series prediction is a practical problem in economic and business planning, weather forecasting, control, etc. Here, we use the fuzzy system designed by the table look-up scheme to predict the Mackey-Glass chaotic time series, which is generated by the following delay differential equation:

$$\frac{ds(t)}{dt} = -b\,s(t) + \frac{a\,s(t-\tau)}{1 + s^{10}(t-\tau)} \qquad (13)$$

The prediction of future values of this time series is a benchmark problem that has been used and reported by Wang [5, 7], among others.


Figure 5.5 A section of the Mackey-Glass chaotic time series.

Fig. 5.5 shows 1000 points of $x(k)$. The problem of time series prediction can be formulated as follows: given $x(k-n+1), x(k-n+2), \ldots, x(k)$, estimate $x(k+1)$, where $n$ is a positive integer. The goal of the task is to use past values of the time series up to time $t$ to predict the value at some point $t + P$ in the future. The standard method for this type of prediction is to create a mapping from $D$ points of the time series spaced $\Delta$ apart, that is $[x(t - (D-1)\Delta), \ldots, x(t - \Delta), x(t)]$, to a predicted future value $x(t + P)$; here $D = 4$ and $\Delta = P = 6$ were used.

We used fourth-order Runge-Kutta integration of the differential equation (13) with time step 0.1, initial condition $x(0) = 1.2$, $\tau = 17$, and $x(t) = 0$ for $t < 0$. From the Mackey-Glass time series $x(t)$, we extracted 1000 input-output data pairs of the following format: $[x(t-18), x(t-12), x(t-6), x(t);\; x(t+6)]$, where $t = 125$ to 1124. The first 500 pairs were used as the training data set for TLV, while the remaining 500 pairs were the checking data set for validating the identified TLV. The number of triangular membership functions assigned to each input of the TLV was set to seven. Fig. 5.6 shows the result of modeling the time series.
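The data set can be regenerated with a short script. Equation (13) leaves the constants a and b unstated here; the values a = 0.2 and b = 0.1 below are the customary benchmark choices and are our assumption, as is holding the delayed term fixed within each Runge-Kutta step.

```python
import numpy as np

def mackey_glass(T=1200.0, dt=0.1, tau=17.0, a=0.2, b=0.1, x0=1.2):
    """Integrate (13) with RK4; the delayed value is held constant per step."""
    n = int(T / dt)
    d = int(tau / dt)
    x = np.zeros(n + 1)
    x[0] = x0                              # x(t) = 0 for t < 0
    for k in range(n):
        xd = x[k - d] if k >= d else 0.0
        f = lambda v: -b * v + a * xd / (1.0 + xd ** 10)
        k1 = f(x[k])
        k2 = f(x[k] + 0.5 * dt * k1)
        k3 = f(x[k] + 0.5 * dt * k2)
        k4 = f(x[k] + dt * k3)
        x[k + 1] = x[k] + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return x

x = mackey_glass()
s = x[::int(1.0 / 0.1)]                    # sample x(t) at integer t
# pairs [x(t-18), x(t-12), x(t-6), x(t); x(t+6)] for t = 125..1124
data = np.array([[s[t-18], s[t-12], s[t-6], s[t], s[t+6]]
                 for t in range(125, 1125)])
```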


Figure 5.6 Prediction and the true values of the time series.

5.5 Conclusions
We have proposed the antecedent validity adaptation principle for tuning fuzzy rule bases. We then applied the principle to refine the table look-up scheme [7] (yielding TLV) for modeling recorded data. The main features of TLV have been illustrated with examples. Next, TLV was extended with on-line identification capability. The algorithms have shown improved accuracy without impairing the one-pass operation of TL; nor do they require solving complicated differential equations or matrix manipulations. Essentially, the table look-up scheme tries to accomplish data mining by selecting the most significant rule; AVA achieves knowledge acquisition by incorporating all the available information, weighted by antecedent validity, into a rule table. With AVA nourishing the knowledge base, TLV converges quickly and adapts effectively to accommodate changes. AVA enriches and redefines the knowledge in the fuzzy system; this principle helps discover and develop dormant, unused assets to best advantage.


References
[1] E.H. Mamdani, "Applications of fuzzy algorithms for control of simple dynamic plant", Proc. IEE, 121(12), pp. 1585-1588, 1974.
[2] K.S. Narendra and K. Parthasarathy, "Identification and control of dynamical systems using neural networks", IEEE Transactions on Neural Networks, Vol. 1, No. 1, pp. 4-27, March 1990.
[3] K. Nozaki, H. Ishibuchi and H. Tanaka, "A simple but powerful heuristic method for generating fuzzy rules from numerical data", Fuzzy Sets and Systems, 86, pp. 251-270, 1997.
[4] M. Sugeno (ed.), Industrial Applications of Fuzzy Control, North-Holland, Amsterdam/New York, 1985.
[5] L.X. Wang, A Course in Fuzzy Systems and Control, Prentice Hall, Upper Saddle River, NJ, 1997.
[6] L.X. Wang, Adaptive Fuzzy Systems and Control: Design and Stability Analysis, PTR Prentice Hall, Englewood Cliffs, NJ, 1994.
[7] L.X. Wang and J.M. Mendel, "Generating fuzzy rules by learning from examples", IEEE Transactions on Systems, Man, and Cybernetics, 22, No. 6, pp. 1414-1427, 1992.

Acknowledgment: The authors gratefully acknowledge the support of The Hong Kong Polytechnic University through Grant No. G-V471.


Chapter 6 Fuzzy Spline Interpolation in Sparse Fuzzy Rule Bases

Mayuka F. Kawaguchi and Masaaki Miyakoshi

Hokkaido University

Abstract

This chapter addresses the problem of interpolative reasoning in sparse fuzzy rule bases. First, the authors propose a method of linear rule interpolation which utilizes the convex hull including two rules. Secondly, we extend this linear interpolation technique to a non-linear one based on the idea of fuzzy splines. Both rule interpolation methods give fuzzy interpolation functions which coincide with each rule of the given rule base. Next, we describe a method to generate a fuzzy partition through the fuzzy interpolation functions, which allows us to execute ordinary approximate reasoning in the given rule base. Finally, some numerical examples demonstrate the construction of fuzzy interpolation functions and fuzzy partitions by means of linear and spline interpolation.

Keywords : sparse rule base, interpolative reasoning, linear rule interpolation, fuzzy spline interpolation

6.1 Introduction

Recently, the methodology of interpolative reasoning has attracted attention as a practical approach to approximate reasoning for the case in which only imprecise and sparse pieces of knowledge are given. Koczy & Hirota [12], [13] defined the concepts of sparse fuzzy rule bases and the distance between fuzzy sets, and established the method called linear interpolative reasoning, which deduces an adequate conclusion from a sparse rule base. In addition, they [15] pointed out the effectiveness of interpolative reasoning, which allows size reduction of rule bases while preserving a certain accuracy. Relating to Koczy's method, Dubois et al. [9], Baranyi et al. [1], [2], the present authors [11] and Hsiao et al. [10] have previously proposed several types of linear interpolation in sparse fuzzy rule bases.


On the other hand, Saga et al. [16], Wang et al. [21] and Baranyi et al. [3] have proposed several techniques to combine fuzzy logic and spline functions. Saga et al. [16] have introduced parametric spline interpolation of fuzzy points which have conical membership functions. In their method, the center and the radius of the base of the cone are interpolated separately.

The aim of this chapter is to introduce a new technique of interpolative reasoning through linear or non-linear interpolation functions of the given fuzzy rules. This chapter is organized as follows. Section 6.2 is assigned to the definitions of basic notions and notation. In Section 6.3, we propose a linear method to interpolate a new rule at an arbitrary point in the input space. In Section 6.4, we extend the linear rule interpolation technique to a non-linear one based on fuzzy splines by applying the basic idea of Saga's method to sparse fuzzy rule bases. In Section 6.5, we introduce fuzzy interpolation functions and describe an algorithm to construct a fuzzy partition which covers the input space of the given rule base. Section 6.6 is assigned to the conclusions of this work.

6.2 Sparse Fuzzy Rule Bases

Throughout this chapter, we consider a sparse fuzzy rule base $R = \{R_1, R_2, \ldots, R_r\}$, where $R_i = (A_i \to B_i)$ $(i = 1, \ldots, r)$, $A_i$ and $B_i$ are fuzzy concepts represented by fuzzy subsets of the universes $X$ and $Y$, respectively, and $R$ has some gaps, i.e.

$$X \setminus \bigcup_{i=1}^{r} \mathrm{supp}(A_i) \neq \emptyset. \qquad (1)$$

According to the pioneering work by Koczy et al. [12], [13], we also assume that the universes $X$ and $Y$ are totally-ordered metric spaces. Moreover, for the sake of convenience, we treat the universes $X$ and $Y$ as intervals on the real line $R$, and assume that $A_i$ and $B_i$ are fuzzy intervals, i.e. fuzzy subsets satisfying normality, convexity, upper semicontinuity and support-boundedness. The $\alpha$-level sets of a fuzzy interval $A_i$ are defined

as

$$A_{i\alpha} = \begin{cases} \{x \mid \mu_{A_i}(x) \ge \alpha\} & \text{for } \alpha \in (0, 1] \\ \mathrm{cl}\{x \mid \mu_{A_i}(x) > \alpha\} & \text{for } \alpha = 0, \end{cases} \qquad (2)$$

$$A_i = \bigcup_{\alpha \in [0,1]} \alpha A_{i\alpha}, \qquad (3)$$

where $\mu_{A_i}(x)$ is the membership function of $A_i$. Then, $A_{i\alpha}$ and $B_{i\alpha}$ are finite closed intervals for $\forall \alpha \in [0, 1]$, as illustrated in Fig. 6.1, and are denoted by

Fig. 6.1 An $\alpha$-level set of a fuzzy interval $A_i$.


$$A_{i\alpha} = [\min A_{i\alpha}, \max A_{i\alpha}] = [a_i - a_{i\alpha}^L,\; a_i + a_{i\alpha}^R], \qquad (4)$$

$$B_{i\alpha} = [\min B_{i\alpha}, \max B_{i\alpha}] = [b_i - b_{i\alpha}^L,\; b_i + b_{i\alpha}^R]. \qquad (5)$$

In this chapter, the symbol $F(X)$ denotes the family of all fuzzy intervals of the universe $X$.

As is well known from the L-R representation [8], a fuzzy interval $A_i$ is represented by its mean value $a_i$ (i.e. $\mu_{A_i}(a_i) = 1$), left and right spreads $a_i^L = a_{i0}^L$ and $a_i^R = a_{i0}^R$, and left and right reference functions $L_A$ and $R_A$, as follows (see Fig. 6.2):

$$A_i = [a_i, a_i^L, a_i^R]_{L_A R_A} \qquad (6)$$

$$\mu_{A_i}(x) = \begin{cases} L_A\big((a_i - x)/a_i^L\big) & \text{for } x \le a_i \\ R_A\big((x - a_i)/a_i^R\big) & \text{for } a_i < x, \end{cases} \qquad (7)$$

where $L_A$ (and $R_A$) is a monotone decreasing, left continuous function satisfying $L_A(0) = 1$ and $L_A(1) = 0$ ($R_A(0) = 1$ and $R_A(1) = 0$).

We often refer to the following definition of a partial order in $F(X)$:

$$A_i \preceq A_j \;\overset{\text{def.}}{\Longleftrightarrow}\; \min A_{i\alpha} \le \min A_{j\alpha} \text{ and } \max A_{i\alpha} \le \max A_{j\alpha} \text{ for } \forall \alpha \in [0, 1].$$


Fig. 6.2 L-R representation of a fuzzy interval $A_i$.

6.3 Linear Rule Interpolation

6.3.1 KH-Method and Convex Hull Method

In this subsection, we discuss two methods to obtain an adequate reasoning conclusion by means of linear interpolation in a sparse rule base. Let us consider the simplest form of a fuzzy reasoning problem in a sparse rule base, according to the past works [18]-[20], where an observation $A^* \in F(X)$ occurs between two rules $A_1 \Rightarrow B_1$ and $A_2 \Rightarrow B_2$:

Rule 1:        $A_1 \Rightarrow B_1$
Rule 2:        $A_2 \Rightarrow B_2$
Observation:   $A^*$
----------------------------
Conclusion:    $B^*$

As the first method, let us recall Koczy's linear rule interpolation [12], [13] (hereafter, the KH-method), which gives the conclusion $B^* \in F(Y)$, in the case that $A_1 \preceq A^* \preceq A_2$ and $B_1 \preceq B_2$, as follows:

$$\min B_\alpha^* = \min B_{1\alpha} + \frac{\min A_\alpha^* - \min A_{1\alpha}}{\min A_{2\alpha} - \min A_{1\alpha}} (\min B_{2\alpha} - \min B_{1\alpha})$$

$$\max B_\alpha^* = \max B_{1\alpha} + \frac{\max A_\alpha^* - \max A_{1\alpha}}{\max A_{2\alpha} - \max A_{1\alpha}} (\max B_{2\alpha} - \max B_{1\alpha}) \qquad (8)$$

for $\forall \alpha \in [0, 1]$. Fig. 6.3 illustrates the KH-method on the $X$-$Y$ plane.

Fig. 6.3 Koczy's linear rule interpolation (KH-method).
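For triangular fuzzy intervals, (8) is easy to evaluate cut by cut. The following minimal sketch uses a (mean, left spread, right spread) representation; this packaging and all names are ours.

```python
def cut(tri, alpha):
    """alpha-cut [min, max] of a triangular fuzzy interval (a, aL, aR)."""
    a, aL, aR = tri
    return a - (1.0 - alpha) * aL, a + (1.0 - alpha) * aR

def kh_conclusion(A1, B1, A2, B2, Astar, alpha):
    """Eq. (8): KH linear rule interpolation at one alpha level
    (assumes A1 <= A* <= A2 and B1 <= B2)."""
    a1, a2, ast = cut(A1, alpha), cut(A2, alpha), cut(Astar, alpha)
    b1, b2 = cut(B1, alpha), cut(B2, alpha)
    lo = b1[0] + (ast[0] - a1[0]) / (a2[0] - a1[0]) * (b2[0] - b1[0])
    hi = b1[1] + (ast[1] - a1[1]) / (a2[1] - a1[1]) * (b2[1] - b1[1])
    return lo, hi

# an observation midway between the two rules
for alpha in (0.0, 0.5, 1.0):
    print(alpha, kh_conclusion((1, 1, 1), (2, 1, 1), (7, 1, 1), (8, 1, 1),
                               (4, 1, 1), alpha))
```

For this symmetric example the cuts come out as [4 + α, 6 - α], i.e. the triangular conclusion (5, 1, 1), as one would expect.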


As the other method, we propose an interpolation using the convex hull of the Cartesian products of $A_{i\alpha}$ and $B_{i\alpha}$. The usage of such a convex hull has been suggested by Dubois et al. [9] in their discussion regarding the approximation of control laws.

Let $S_\alpha$ be the convex hull of $A_{1\alpha} \times B_{1\alpha}$ and $A_{2\alpha} \times B_{2\alpha}$, as shown in Fig. 6.4. Here, $S_\alpha$ is represented as the set of the points $(x, y) \in X \times Y$ such that

$$y \in \begin{cases} [\min B_{1\alpha},\; \lambda' \max B_{1\alpha} + (1-\lambda') \max B_{2\alpha}] & \text{for } x \in A_{1\alpha} = [\min A_{1\alpha}, \max A_{1\alpha}] \\ [\lambda \min B_{1\alpha} + (1-\lambda) \min B_{2\alpha},\; \lambda' \max B_{1\alpha} + (1-\lambda') \max B_{2\alpha}] & \text{for } x \in [\max A_{1\alpha}, \min A_{2\alpha}] \\ [\lambda \min B_{1\alpha} + (1-\lambda) \min B_{2\alpha},\; \max B_{2\alpha}] & \text{for } x \in A_{2\alpha} = [\min A_{2\alpha}, \max A_{2\alpha}], \end{cases} \qquad (9)$$

where

$$x = \lambda \min A_{1\alpha} + (1-\lambda) \min A_{2\alpha} = \lambda' \max A_{1\alpha} + (1-\lambda') \max A_{2\alpha}, \quad \lambda, \lambda' \in [0, 1].$$

Now, we determine the $\alpha$-level set $B_\alpha^*$ of the conclusion as the greatest closed interval on $Y$ such that $A_\alpha^* \times B_\alpha^* \subseteq S_\alpha$ for the given observation $A^*$. Then, we obtain the following formulae instead of (8):


Fig. 6.4 The convex hull $S_\alpha$.

For $A_1 \preceq A^* \preceq A_2$ and $B_1 \preceq B_2$:

$$\min B_\alpha^* = \min B_{1\alpha} + \frac{\max A_\alpha^* - \max A_{1\alpha}}{\max A_{2\alpha} - \max A_{1\alpha}} (\min B_{2\alpha} - \min B_{1\alpha})$$

$$\max B_\alpha^* = \max B_{1\alpha} + \frac{\min A_\alpha^* - \min A_{1\alpha}}{\min A_{2\alpha} - \min A_{1\alpha}} (\max B_{2\alpha} - \max B_{1\alpha}) \qquad (10)$$

for $\forall \alpha \in [0, 1]$;

for $A_1 \preceq A^* \preceq A_2$ and $B_2 \preceq B_1$:

$$\min B_\alpha^* = \min B_{1\alpha} + \frac{\min A_\alpha^* - \min A_{1\alpha}}{\min A_{2\alpha} - \min A_{1\alpha}} (\min B_{2\alpha} - \min B_{1\alpha})$$

$$\max B_\alpha^* = \max B_{1\alpha} + \frac{\max A_\alpha^* - \max A_{1\alpha}}{\max A_{2\alpha} - \max A_{1\alpha}} (\max B_{2\alpha} - \max B_{1\alpha}) \qquad (11)$$

for $\forall \alpha \in [0, 1]$.

Fig. 6.5 shows our interpolation on the $X$-$Y$ plane.

Fig. 6.5 The new method of linear rule interpolation (convex hull method).

If $B_\alpha^* \subseteq B_{\alpha'}^*$ holds for $\forall \alpha, \alpha'$ such that $\alpha > \alpha'$, then $B^* = \bigcup_{\alpha \in [0,1]} \alpha B_\alpha^*$ is a fuzzy interval. Moreover, in this case, $B^*$ coincides with the greatest solution of the fuzzy relational equation [17]:

$$A^* \times B^* \subseteq S, \qquad (12)$$

i.e.

$$\min(\mu_{A^*}(x), \mu_{B^*}(y)) \le \mu_S(x, y), \qquad (13)$$

where $S = \bigcup_{\alpha \in [0,1]} \alpha S_\alpha$ is a fuzzy relation on $X \times Y$.

As pointed out by Koczy et al. [14] and Shi et al. [18]-[20], in the KH-method, even if all of the fuzzy rules $A_1 \Rightarrow B_1$ and $A_2 \Rightarrow B_2$ and the observation $A^*$ are defined as fuzzy intervals, the conclusion $B^*$ does not always form even a fuzzy set. In other words, $B_\alpha^* \subseteq B_{\alpha'}^*$ ($\alpha > \alpha'$) does not hold in general. The same situation can also occur in our newly proposed method. In order to avoid such a difficulty, the authors take a different approach from the above-mentioned interpolations to cope with the gaps of a rule base in the following subsection.

6.3.2 Linear Interpolation with L-R Fuzzy Rules

We describe a method to insert a new fuzzy rule $A \Rightarrow B$ into the gap between two rules $A_1 \Rightarrow B_1$ and $A_2 \Rightarrow B_2$ by means of linear interpolation. From the practical point of view, we assume that the antecedent parts $A_1$ and $A_2$ of the rules are given as $L_A$-$R_A$ fuzzy intervals, and the consequent parts $B_1$ and $B_2$ are given as $L_B$-$R_B$ fuzzy intervals, i.e.

$$a_{i\alpha}^L = a_i^L L_A^{-1}(\alpha), \quad a_{i\alpha}^R = a_i^R R_A^{-1}(\alpha); \qquad b_{i\alpha}^L = b_i^L L_B^{-1}(\alpha), \quad b_{i\alpha}^R = b_i^R R_B^{-1}(\alpha) \quad (i = 1, 2). \qquad (14)$$

Now, let us discuss the condition, for the interpolation method proposed in the previous section, under which the observation $A$ is an $L_A$-$R_A$ fuzzy interval and the conclusion $B$ is an $L_B$-$R_B$ fuzzy interval.

Theorem 1. Let an observation $A$ be an $L_A$-$R_A$ fuzzy interval. Then the conclusion $B$ for $A$ obtained by means of linear rule interpolation is an $L_B$-$R_B$ fuzzy interval if and only if the following conditions hold:

$$a^L = a_1^L + c(a_2^L - a_1^L), \quad a^R = a_1^R + c(a_2^R - a_1^R);$$
$$b = b_1 + c(b_2 - b_1);$$
$$b^L = b_1^L + c(b_2^L - b_1^L), \quad b^R = b_1^R + c(b_2^R - b_1^R), \qquad (15)$$

where $c = (a - a_1)/(a_2 - a_1)$.

Proof. Let us prove the case that $B_1 \preceq B_2$. The first expression of (10) can be rewritten as follows:

$$(b - b_\alpha^L) = (b_1 - b_{1\alpha}^L) + \frac{(a + a_\alpha^R) - (a_1 + a_{1\alpha}^R)}{(a_2 + a_{2\alpha}^R) - (a_1 + a_{1\alpha}^R)}\,\{(b_2 - b_{2\alpha}^L) - (b_1 - b_{1\alpha}^L)\}.$$

Substituting (14) and $a_\alpha^R = a^R R_A^{-1}(\alpha)$ into the above equation, we have

$$(b - b_\alpha^L) = (b_1 - b_1^L L_B^{-1}(\alpha)) + \frac{(a + a^R R_A^{-1}(\alpha)) - (a_1 + a_1^R R_A^{-1}(\alpha))}{(a_2 + a_2^R R_A^{-1}(\alpha)) - (a_1 + a_1^R R_A^{-1}(\alpha))} \times \{(b_2 - b_2^L L_B^{-1}(\alpha)) - (b_1 - b_1^L L_B^{-1}(\alpha))\}. \qquad (16)$$

($\Rightarrow$) Substituting the assumption that $B$ is an $L_B$-$R_B$ fuzzy interval, i.e. $b_\alpha^L = b^L L_B^{-1}(\alpha)$, into (16), we obtain

$$\{(a_2 - a_1) + (a_2^R - a_1^R) R_A^{-1}(\alpha)\}\{(b - b_1) - (b^L - b_1^L) L_B^{-1}(\alpha)\} = \{(a - a_1) + (a^R - a_1^R) R_A^{-1}(\alpha)\}\{(b_2 - b_1) - (b_2^L - b_1^L) L_B^{-1}(\alpha)\}.$$

Considering the condition under which this equality holds for arbitrary reference functions $R_A$ and $L_B$, we get the following four equations:

$$(a_2^R - a_1^R)(b^L - b_1^L) = (a^R - a_1^R)(b_2^L - b_1^L), \quad (a_2^R - a_1^R)(b - b_1) = (a^R - a_1^R)(b_2 - b_1), \qquad (17)$$

$$(a_2 - a_1)(b^L - b_1^L) = (a - a_1)(b_2^L - b_1^L), \qquad (18)$$

$$(a_2 - a_1)(b - b_1) = (a - a_1)(b_2 - b_1). \qquad (19)$$

From (18) and (19), we have

$$b^L = b_1^L + \frac{a - a_1}{a_2 - a_1}(b_2^L - b_1^L) = b_1^L + c(b_2^L - b_1^L), \quad b = b_1 + \frac{a - a_1}{a_2 - a_1}(b_2 - b_1) = b_1 + c(b_2 - b_1), \qquad (20)$$

respectively. Substituting (20) into (17), we obtain

$$a^R = a_1^R + \frac{b - b_1}{b_2 - b_1}(a_2^R - a_1^R) = a_1^R + c(a_2^R - a_1^R).$$


Moreover, we can obtain the conditions with respect to $a^L$ and $b^R$ expressed in (15) from the second expression of (10) in the same way.

($\Leftarrow$) Substituting (15) into (16), we have the left side of (16) as

$$(b - b_\alpha^L) - (b_1 - b_1^L L_B^{-1}(\alpha)) = \{b_1 + c(b_2 - b_1) - b_\alpha^L\} - (b_1 - b_1^L L_B^{-1}(\alpha)) = c(b_2 - b_1) - b_\alpha^L + b_1^L L_B^{-1}(\alpha),$$

and the fraction on the right side of (16) as

$$\frac{(a + a^R R_A^{-1}(\alpha)) - (a_1 + a_1^R R_A^{-1}(\alpha))}{(a_2 + a_2^R R_A^{-1}(\alpha)) - (a_1 + a_1^R R_A^{-1}(\alpha))} = \frac{\{a_1^R + c(a_2^R - a_1^R) - a_1^R\} R_A^{-1}(\alpha) + (a - a_1)}{(a_2^R - a_1^R) R_A^{-1}(\alpha) + (a_2 - a_1)} = \frac{c(a_2^R - a_1^R) R_A^{-1}(\alpha) + c(a_2 - a_1)}{(a_2^R - a_1^R) R_A^{-1}(\alpha) + (a_2 - a_1)} = c.$$

Thus, we obtain the following result by substituting the above expressions into (16) and then solving it for $b_\alpha^L$:

$$b_\alpha^L = c(b_2 - b_1) + b_1^L L_B^{-1}(\alpha) - c\{(b_2 - b_2^L L_B^{-1}(\alpha)) - (b_1 - b_1^L L_B^{-1}(\alpha))\} = \{b_1^L + c(b_2^L - b_1^L)\} L_B^{-1}(\alpha) = b^L L_B^{-1}(\alpha).$$


Moreover, we can obtain $b_\alpha^R = b^R R_B^{-1}(\alpha)$ from the second expression of (10). Therefore, we have shown that the conclusion $B$ is an $L_B$-$R_B$ fuzzy interval. Q.E.D.

We have proven the above theorem for the case that $B_1 \preceq B_2$. In the same manner, we can prove it also for the case that $B_2 \preceq B_1$, by using (11) instead of (10). Theorem 1 makes it possible to insert an L-R rule at any point $a \in [a_1, a_2] \subset X$ between two rules $A_1 \Rightarrow B_1$ and $A_2 \Rightarrow B_2$. Fig. 6.6 demonstrates a fuzzy rule $A \Rightarrow B$ as the Cartesian product of $A$ and $B$ in the case that $A$ and $B$ are triangular fuzzy intervals, i.e.

$$L_A(x) = R_A(x) = 1 - x, \quad L_B(y) = R_B(y) = 1 - y;$$
$$L_A^{-1}(\alpha) = R_A^{-1}(\alpha) = L_B^{-1}(\alpha) = R_B^{-1}(\alpha) = 1 - \alpha. \qquad (21)$$

Fig. 6.6 A fuzzy rule $R = A \times B$ interpolated at a point $x = a$.

Here, it should be noted that the above theorem holds for both linear rule interpolation methods: the KH-method and the convex hull method. Furthermore, the theorem coincides with the result of Shi et al. [18], [20] when L-R fuzzy intervals reduce to triangular fuzzy intervals.
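Condition (15) is straightforward to operationalize: every parameter of the inserted rule is obtained by the same linear map with coefficient c. A minimal sketch (representation and names ours):

```python
def interpolate_lr_rule(a, A1, B1, A2, B2):
    """Insert an L-R rule at mean value a per Theorem 1 / Eq. (15).

    Each of A1, A2 (and B1, B2) is an L-R fuzzy interval given as
    (mean, left spread, right spread)."""
    c = (a - A1[0]) / (A2[0] - A1[0])
    lerp = lambda u, v: u + c * (v - u)
    A = (a, lerp(A1[1], A2[1]), lerp(A1[2], A2[2]))
    B = (lerp(B1[0], B2[0]), lerp(B1[1], B2[1]), lerp(B1[2], B2[2]))
    return A, B

# inserting halfway between two triangular rules
print(interpolate_lr_rule(4.0, (1, 1, 1), (2, 0.5, 0.5), (7, 2, 2), (8, 1.5, 1.5)))
# -> A = (4.0, 1.5, 1.5), B = (5.0, 1.0, 1.0)
```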

6.4 Non-linear Rule Interpolation by Fuzzy Splines

The purpose of this section is to extend the method of linear rule interpolation newly proposed in the previous section to a non-linear one based on fuzzy splines. Saga et al. [16] fuzzified parametric spline interpolation in order to identify free-hand curves on the $X$-$Y$ plane. The authors, on the other hand, apply the basic idea of Saga's method to non-parametric spline interpolation in order to describe the characteristics of input-output systems.

6.4.1 Non-parametric Spline Interpolation

Given $N$ pairs of data $(x_i, y_i)$, $i = 0, \ldots, N-1$, such that $x_i < x_j$ for $i < j$, a spline curve [6] of degree $K-1$ is represented as a linear combination of B-splines $B_{iK}(x)$ of degree $K-1$:

$$s(x) = \sum_{i=0}^{N-1} a_i B_{iK}(x). \qquad (22)$$

The B-splines $B_{iK}(x)$ are obtained by the de Boor-Cox algorithm [5] from $N + K$ knots $\xi_k$ ($k = 0, 1, \ldots, N+K-1$) on the $x$-axis as

$$B_{i1}(x) = \begin{cases} 1 & (\xi_i \le x < \xi_{i+1}) \\ 0 & (x < \xi_i,\; x \ge \xi_{i+1}), \end{cases} \qquad (23)$$

$$B_{iK}(x) = \frac{x - \xi_i}{\xi_{i+K-1} - \xi_i}\, B_{i,K-1}(x) + \frac{\xi_{i+K} - x}{\xi_{i+K} - \xi_{i+1}}\, B_{i+1,K-1}(x). \qquad (24)$$

The coefficients $a_i$ in (22) are obtained as the solution of the system of linear equations $s(x_i) = y_i$ ($i = 0, \ldots, N-1$) when the knots $\xi_k$ satisfy Schoenberg-Whitney's condition $\xi_i < x_i < \xi_{i+K}$. The spline curve $s(x)$ is a piecewise polynomial which is smoothly connected at each knot. Here, the term "smoothly connected" means that the derivatives of $s(x)$ of order $1, 2, \ldots, K-2$ ($K \ge 3$) are continuous.
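A compact rendering of (22)-(24): the recursion evaluates one basis function, and the interpolation coefficients come from solving the collocation system s(x_i) = y_i. The knot placement below is our assumption; any knots satisfying the Schoenberg-Whitney condition would serve.

```python
import numpy as np

def bspline(i, K, knots, x):
    """De Boor-Cox recursion, eqs. (23)-(24)."""
    if K == 1:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left = right = 0.0
    if knots[i + K - 1] > knots[i]:
        left = (x - knots[i]) / (knots[i + K - 1] - knots[i]) * bspline(i, K - 1, knots, x)
    if knots[i + K] > knots[i + 1]:
        right = (knots[i + K] - x) / (knots[i + K] - knots[i + 1]) * bspline(i + 1, K - 1, knots, x)
    return left + right

def fit_spline(xs, ys, K=4):
    """Solve s(x_i) = y_i for the coefficients a_i of (22)."""
    N = len(xs)
    # clamped-style knots: ends repeated K times, interior knots between the
    # data sites (an assumed, simple placement); the right end is nudged so
    # the half-open intervals of (23) still cover x = xs[-1]
    inner = [(xs[i] + xs[i + K - 1]) / 2 for i in range(N - K)]
    knots = np.r_[[xs[0]] * K, inner, [xs[-1] + 1e-9] * K]
    M = np.array([[bspline(i, K, knots, x) for i in range(N)] for x in xs])
    return knots, np.linalg.solve(M, ys)

def s(x, knots, coeffs, K=4):
    """Evaluate the spline curve of (22)."""
    return sum(c * bspline(i, K, knots, x) for i, c in enumerate(coeffs))
```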

6.4.2 Spline Interpolation with L-R Fuzzy Rules

Now, the authors propose a method to interpolate a rule in a sparse rule base $R = \{R_1, R_2, \ldots, R_r\}$ and to construct a fuzzy spline curve for $R$. Let us consider the conventional spline functions $f(x)$, $w_{A\alpha}^L(x)$, $w_{A\alpha}^R(x)$, $w_{B\alpha}^L(x)$ and $w_{B\alpha}^R(x)$ interpolating the data $(a_i, b_i)$, $(a_i, a_{i\alpha}^L)$, $(a_i, a_{i\alpha}^R)$, $(a_i, b_{i\alpha}^L)$ and $(a_i, b_{i\alpha}^R)$, respectively, in the sense of (22). Then, we can interpolate a new rule $R(x) = (A(x) \Rightarrow B(x))$ at an arbitrary $x$ in the input space $X$ as follows:

$$A(x) = \bigcup_{\alpha \in [0,1]} \alpha \left[x - w_{A\alpha}^L(x),\; x + w_{A\alpha}^R(x)\right] \qquad (25)$$


$$B(x) = \bigcup_{\alpha \in [0,1]} \alpha \left[f(x) - w_{B\alpha}^L(x),\; f(x) + w_{B\alpha}^R(x)\right] \qquad (26)$$

It should be noted that $A(x) \in F(X)$ and $B(x) \in F(Y)$. According to the previous section, we regard a fuzzy inference rule $R_i$ as the Cartesian product of $A_i$ and $B_i$, and define a fuzzy spline curve for a fuzzy rule base $R = \{R_1, R_2, \ldots, R_r\}$ as a fuzzy graph:

$$G = \bigcup_{x \in X} A(x) \times B(x), \qquad (27)$$

i.e.

$$G_\alpha = \bigcup_{x \in X} A_\alpha(x) \times B_\alpha(x) = \bigcup_{x \in X} \left[x - w_{A\alpha}^L(x),\; x + w_{A\alpha}^R(x)\right] \times \left[f(x) - w_{B\alpha}^L(x),\; f(x) + w_{B\alpha}^R(x)\right]. \qquad (28)$$

Here, $G$ is a fuzzy relation on $X \times Y$ and is represented by a membership function $\mu_G: X \times Y \to [0, 1]$.

Next, in order to execute the fuzzy spline, we simplify it by using the L-R representation of a fuzzy interval as shown in (6) and (7). When $A_i$ and $B_i$ are L-R fuzzy intervals, $A_i = [a_i, a_i^L, a_i^R]_{L_A R_A}$ and $B_i = [b_i, b_i^L, b_i^R]_{L_B R_B}$, respectively, for each $i \in \{1, \ldots, r\}$, we need to evaluate only the following five spline functions: $f(x)$, $w_A^L(x)$, $w_A^R(x)$, $w_B^L(x)$


and $w_B^R(x)$ for $(a_i, b_i)$, $(a_i, a_i^L)$, $(a_i, a_i^R)$, $(a_i, b_i^L)$ and $(a_i, b_i^R)$, respectively. Therefore, eqs. (25), (26) and (28) are rewritten as follows:

$$A(x) = \left(x, w_A^L(x), w_A^R(x)\right)_{L_A R_A} = \bigcup_{\alpha \in [0,1]} \alpha \left[x - L_A^{-1}(\alpha)\, w_A^L(x),\; x + R_A^{-1}(\alpha)\, w_A^R(x)\right], \qquad (25)'$$

$$B(x) = \left(f(x), w_B^L(x), w_B^R(x)\right)_{L_B R_B} = \bigcup_{\alpha \in [0,1]} \alpha \left[f(x) - L_B^{-1}(\alpha)\, w_B^L(x),\; f(x) + R_B^{-1}(\alpha)\, w_B^R(x)\right], \qquad (26)'$$

$$G_\alpha = \bigcup_{x \in X} \left\{\left[x - L_A^{-1}(\alpha)\, w_A^L(x),\; x + R_A^{-1}(\alpha)\, w_A^R(x)\right] \times \left[f(x) - L_B^{-1}(\alpha)\, w_B^L(x),\; f(x) + R_B^{-1}(\alpha)\, w_B^R(x)\right]\right\}. \qquad (28)'$$

Fig. 6.7 demonstrates a fuzzy rule $R(\bar{x})$ as the Cartesian product of $A(\bar{x})$ and $B(\bar{x})$ in the case that $A(\bar{x})$ and $B(\bar{x})$ are triangular fuzzy intervals, i.e. eq. (21) holds.


Fig. 6.7 A fuzzy rule $R(\bar{x}) = A(\bar{x}) \times B(\bar{x})$ interpolated at a point $x = \bar{x}$.

6.5 Interpolative Reasoning Using Fuzzy Interpolation Function

6.5.1 Fuzzy Partition

Next, we describe a method of interpolative reasoning for an arbitrary input $A^* \in F(X)$. Let us recall that $G$ is a fuzzy relation on $X \times Y$. Therefore, we can derive the conclusion $B^* \in F(Y)$ of inference for a given observation $A^* \in F(X)$ from $G$ as the relational composition of $A^*$ and $G$, i.e.

$$B^* = A^* \circ G, \qquad (29)$$

$$\mu_{B^*}(y) = \sup_{x \in X} \min\{\mu_{A^*}(x), \mu_G(x, y)\}. \qquad (30)$$


Since $\mu_G(x, y)$ cannot be expressed in the explicit form of a function with respect to $x$ and $y$, we cannot evaluate (30) directly. Nevertheless, an approximate conclusion $B^*$ can be obtained by generating a fuzzy partition [9] $A(\bar{x}_1), \ldots, A(\bar{x}_p)$ covering $\mathrm{supp}\, A^*$, and then applying the conventional fuzzy reasoning method, e.g.

$$B^* = A^* \circ \left(\bigcup_{j=1}^{p} A(\bar{x}_j) \times B(\bar{x}_j)\right). \qquad (31)$$

Here, $A(\bar{x}_j)$ and $B(\bar{x}_j)$ are L-R fuzzy intervals interpolated at the point $x = \bar{x}_j$ into the input space $X$ and the output space $Y$, respectively:

$$A(\bar{x}_j) = \left(\bar{x}_j, w_A^L(\bar{x}_j), w_A^R(\bar{x}_j)\right)_{L_A R_A}, \qquad (32)$$

$$B(\bar{x}_j) = \left(f(\bar{x}_j), w_B^L(\bar{x}_j), w_B^R(\bar{x}_j)\right)_{L_B R_B}. \qquad (33)$$

We can generate a new rule base $R' = \{R(\bar{x}_j) = A(\bar{x}_j) \Rightarrow B(\bar{x}_j) \mid j = 1, \ldots, p\}$ through a fuzzy spline curve $G$, on the condition that the mean value of $A(\bar{x}_j)$ coincides with the right-side end of $\mathrm{supp}\, A(\bar{x}_{j-1})$, i.e. $\bar{x}_j = \bar{x}_{j-1} + w_A^R(\bar{x}_{j-1})$, as the following algorithm illustrates.

Algorithm:

Step 1. $\bar{x}_1 := a_1$; $j := 1$
Step 2. Repeat Step 3 until $\bar{x}_j > a_r - a_r^L$
Step 3. $j := j + 1$; $\bar{x}_j := \bar{x}_{j-1} + w_A^R(\bar{x}_{j-1})$
Step 4. $p := j + 1$; $\bar{x}_p := a_r$
Step 5. For $j := 1$ to $p$ do Step 6
Step 6. Output $j$, $\bar{x}_j$, $w_A^L(\bar{x}_j)$, $w_A^R(\bar{x}_j)$; $f(\bar{x}_j)$, $w_B^L(\bar{x}_j)$, $w_B^R(\bar{x}_j)$
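In code, the algorithm reads as follows; the five callables stand for the spline functions of Section 6.4 (passing them as parameters is our packaging, and the stopping condition follows the reconstruction in Step 2).

```python
def generate_partition(a1, ar, arL, f, wAL, wAR, wBL, wBR):
    """Steps 1-6: place rule centers so that each mean value sits at the
    right end of the support of its predecessor."""
    xs = [a1]                                # Step 1
    while xs[-1] <= ar - arL:                # Step 2
        xs.append(xs[-1] + wAR(xs[-1]))      # Step 3
    xs.append(ar)                            # Step 4
    return [(x, wAL(x), wAR(x), f(x), wBL(x), wBR(x)) for x in xs]  # Steps 5-6

# toy usage with constant spreads and a linear f (stand-ins for real splines)
rules = generate_partition(0.0, 10.0, 1.5, f=lambda x: 2 * x,
                           wAL=lambda x: 1.0, wAR=lambda x: 1.0,
                           wBL=lambda x: 0.5, wBR=lambda x: 0.5)
for r in rules:
    print(r)
```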

It is clear that the set of antecedents $\{A(\bar{x}_1), A(\bar{x}_2), \ldots, A(\bar{x}_p)\}$ of $R'$ forms a fuzzy partition on the input space $X$ in the sense of Dubois et al. [9]:

(i) the supports of $A(\bar{x}_1), A(\bar{x}_2), \ldots, A(\bar{x}_p)$ form a coverage of $X$;
(ii) the cores of $A(\bar{x}_1), A(\bar{x}_2), \ldots, A(\bar{x}_p)$ are pairwise disjoint;
(iii) $A(\bar{x}_1), A(\bar{x}_2), \ldots, A(\bar{x}_p)$ are ordered, i.e. $\bar{x}_i - w_{A\alpha}^L(\bar{x}_i) \ge \bar{x}_j - w_{A\alpha}^L(\bar{x}_j)$ and $\bar{x}_i + w_{A\alpha}^R(\bar{x}_i) \ge \bar{x}_j + w_{A\alpha}^R(\bar{x}_j)$ for $\forall i > j$.

6.5.2 Numerical Examples

Fig. 6.8 and Fig. 6.9 demonstrate numerical examples of linear rule interpolation and of non-linear rule interpolation by fuzzy spline, respectively. Fig. 6.8(a) and Fig. 6.9(a) show the supports of the given sparse rule base ($r = 6$). In both cases, $A_i$ and $B_i$ ($i = 1, 2, \ldots, 6$) are triangular fuzzy intervals, as shown in Fig. 6.6 and Fig. 6.7, and symmetric.

Fig. 6.8(b) and Fig. 6.9(b) illustrate fuzzy interpolation functions represented by 50 rules, obtained by means of linear interpolation and spline interpolation, respectively.

Fig. 6.8(c) and Fig. 6.9(c) show the fuzzy partitions, consisting of 17 rules, generated by the algorithm described in the previous subsection. It should be noted that both partitions cover the region between $A_1$ and $A_6$ in the input space.

6.6 Concluding Remarks

This chapter has presented the fundamental idea of fuzzy interpolative reasoning in which a fuzzy partition is generated through an interpolation function of the given rules. In particular, the authors have introduced a non-linear rule interpolation method by means of spline functions, in addition to the linear method based on the convex hull. Our method makes it possible to apply ordinary approximate reasoning methods to sparse fuzzy rule bases.

At the next stage of this approach, our method for the case of a one-input one-output system should be extended to the multi-input case by using multivariate splines [4]. Also, it is an important problem from the practical point of view to apply the revision principle [7] to our rule interpolation technique, instead of constructing a fuzzy partition, as Baranyi et al. [2] have suggested in their work.


Fig.6.8. Linear Rule Interpolation: (a) given rules (r = 6); (b) a fuzzy linear interpolation function represented by 50 rules; (c) a fuzzy partition generated through the above fuzzy linear function.

Fig.6.9. Rule Interpolation by Fuzzy Spline: (a) given rules (r = 6); (b) a fuzzy interpolation curve represented by 50 rules (K = 4); (c) a fuzzy partition generated through the above fuzzy curve (K = 4).


References

[1] Peter Baranyi, Tamas D. Gedeon, Laszlo T. Koczy, "A General Interpolation Technique in Fuzzy Rule Bases with Arbitrary Membership Functions," Proceedings of International Conference on Systems, Man and Cybernetics, Beijing, pp. 510-515, 1996.

[2] Peter Baranyi, Sandor Mizik, Laszlo T. Koczy, Tamas D. Gedeon, Istvan Nagy, "Fuzzy Rule Base Interpolation Based on Semantic Revision," Proceedings of International Conference on Systems, Man and Cybernetics, San Diego, 1998.

[3] Peter Baranyi, Yeung Yam, Chi-Tin Yang, "SVD Reduction in Numerical Algorithms: Specialized to B-Spline and to Fuzzy Logic Concepts," Proceedings of 8th IFSA World Congress (IFSA'99), Taipei, pp. 782-786, 1999.

[4] Charles K. Chui, Multivariate Splines, SIAM, 1988.

[5] Carl de Boor, "On Calculating with B-Splines," Journal of Approximation Theory, 6, pp. 50-62, 1972.

[6] Carl de Boor, A Practical Guide to Splines, Springer-Verlag, 1978.

[7] Liya Ding, Peizhuang Wang, "Revision Principle Applied for Approximate Reasoning," in Methodologies for the Conception, Design and Application of Soft Computing (Proceedings of IIZUKA'98) (Eds. Takeshi Yamakawa and Gen Matsumoto), World Scientific, pp. 408-413, 1998.

[8] Didier Dubois, Henri Prade, "Operations on Fuzzy Numbers," International Journal of Systems Sciences, 9, pp. 613-626, 1978.

[9] Didier Dubois, Henri Prade, Michel Grabisch, "Gradual Rules and the Approximation of Control Laws," in Theoretical Aspects of Fuzzy Control (Eds. Hung T. Nguyen et al.), John Wiley & Sons, pp. 147-181, 1995.

[10] Wen-Hoar Hsiao, Shyi-Ming Chen, Chia-Hoan Lee, "A New Interpolative Reasoning Method in Sparse Rule-Based Systems," Fuzzy Sets and Systems, 93, pp. 17-22, 1998.

[11] Mayuka F. Kawaguchi, Masaaki Miyakoshi, Michiaki Kawaguchi, "Linear Interpolation with Triangular Rules in Sparse Fuzzy Rule Bases," Proceedings of 7th IFSA World Congress (IFSA'97), Prague, II, pp. 138-143, 1997.

[12] Laszlo T. Koczy, Kaoru Hirota, "Interpolative Reasoning with Insufficient Evidence in Sparse Fuzzy Rule Bases," Information Sciences, 71, pp. 169-201, 1993.

[13] Laszlo T. Koczy, Kaoru Hirota, "Approximate Reasoning by Linear Rule Interpolation and General Approximation," International Journal of Approximate Reasoning, 9, pp. 197-225, 1993.

[14] Laszlo T. Koczy, Szilveszter Kovacs, "Linearity and the CNF Property in Linear Fuzzy Rule Interpolation," Proceedings of 3rd IEEE International Conference on Fuzzy Systems, Orlando, USA, pp. 870-875, 1994.

[15] Laszlo T. Koczy, Kaoru Hirota, "Size Reduction by Interpolation in Fuzzy Rule Bases," IEEE Transactions on Systems, Man and Cybernetics, Part B, 27, pp. 14-25, 1997.

[16] Sato Saga, Hiromi Makino, "Fuzzy Spline Interpolation and its Application to On-line Freehand Curve Identification," Proceedings of 2nd International Conference on Fuzzy Systems (FUZZ-IEEE'93), San Francisco, pp. 1183-1190, 1993.

[17] Elie Sanchez, "Resolution of Composite Fuzzy Relation Equations," Information and Control, 30, pp. 38-48, 1976.

[18] Yan Shi, Masaharu Mizumoto, Zhi Qiao Wu, "Reasoning Conditions on Koczy's Interpolative Reasoning Method in Sparse Fuzzy Rule Bases," Fuzzy Sets and Systems, 75, pp. 63-71, 1995.

[19] Yan Shi, Masaharu Mizumoto, "Reasoning Conditions on Koczy's Interpolative Reasoning Method in Sparse Fuzzy Rule Bases. Part II," Fuzzy Sets and Systems, 87, pp. 47-56, 1997.

[20] Yan Shi, Masaharu Mizumoto, "A Note on Reasoning Conditions of Koczy's Interpolative Reasoning Method," Fuzzy Sets and Systems, 96, pp. 373-379, 1998.

[21] Liang Wang, Reza Langari, John Yen, "Principal Components, B-Splines, and Fuzzy System Reduction," in Fuzzy Logic for the Application to Complex Systems (Eds. W. Chiang and J. Lee), World Scientific, pp. 253-259, 1996.


Chapter 7

Revision Principle Applied for Approximate Reasoning

Liya Ding1, Peizhuang Wang2, Masao Mukaidono3

1 National University of Singapore, Singapore
2 West Texas A&M University, USA

3 Meiji University, Japan

Abstract

The basic concept of the revision principle proposed for approximate reasoning is that the modification (revision) of the consequent is decided by the difference (deviation) between the input (given fact) and the antecedent, and that the revising process is based on some kind of relation between antecedent and consequent. Five revising methods based on linear and semantic relations have been introduced for approximate reasoning. As a continuation of that work, this article discusses the revision principle applied to approximate reasoning with multiple fuzzy rules that contain multiple sub-antecedents. An approximation measure is proposed for the integration of revisions. With a generalized approximation measure, the revision principle can be applied to more general cases of fuzzy sets.

Keywords: approximate reasoning, revision principle, linear revising methods, semantic revising methods, semantic approximation, approximation measure

7.1 Introduction

When a rule P → Q and a fact P' that is only an approximation of P are given, a conclusion will still be expected, even on an approximate basis. Here the propositions P and Q are regarded as fuzzy concepts, and the fuzzy concepts are described by fuzzy sets [28; 29]. The inference can be done even when P and P' are not identical, based on the concept of approximate reasoning. Approximate reasoning was put forward by Zadeh


[30; 31], where linguistic truth values such as very true can be used. Unlike symbolic reasoning based on binary logic, approximate reasoning is related to the semantics of propositions to a certain degree.

Compositional inference and compatibility modification inference are two main approaches to approximate reasoning [1; 2; 7; 13; 17; 18; 23; 27; 29; 30; 31]. The former realizes inference by obtaining an implication relation between the antecedent and consequent of a rule and then composing the input with the relation [29]. The latter realizes inference by determining the measure of satisfaction between the input and the antecedent of a rule and then using the measure to modify the rule's consequent [13].
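As a concrete illustration of compositional inference, the following sketch discretizes X and Y, builds the implication relation with the min operator (a Mamdani-style choice made here purely for illustration; the chapter itself allows general relations), and composes the input with it by sup-min. All membership vectors are invented for the example.

```python
import numpy as np

# Discretized membership vectors for antecedent P (on X) and consequent Q (on Y).
mu_P = np.array([0.0, 0.5, 1.0, 0.5, 0.0])
mu_Q = np.array([0.0, 0.4, 1.0, 0.4, 0.0])

# Implication relation R(x, y) = min(mu_P(x), mu_Q(y)).
R = np.minimum.outer(mu_P, mu_Q)

# Input P' (an approximation of P); the conclusion is the sup-min composition:
# mu_Q'(y) = sup_x min(mu_P'(x), R(x, y)).
mu_P_prime = np.array([0.0, 0.3, 0.8, 0.9, 0.2])
mu_Q_prime = np.max(np.minimum(mu_P_prime[:, None], R), axis=0)
print(mu_Q_prime)
```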

The revision principle [8; 9; 10; 19; 21; 22] was proposed in a different way. It is based on the belief that the modification (revision) of the consequent should be caused only by the difference (deviation) between the input (given fact) and the antecedent. In other words, when a method of the revision principle is used for approximate reasoning, the consequent will always be derived as output if the input is the same as the antecedent: Q' = Q when P' = P. This important feature is called the non-deviation property, and it is satisfied by all methods of the revision principle [12].

The revising process is based on some kind of relation between antecedent and consequent. For a given fuzzy rule P → Q, it is almost impossible to describe precisely the non-linear relation R_{P→Q} between P ⊆ X and Q ⊆ Y. As an alternative, a relation matrix is often used as an approximate description. However, even with only a finite number of points of P and Q taken into consideration, the relation matrix may still be too large to use for inference. So the essential thought of the revision principle is to find a way which is simple to calculate but has acceptable accuracy. Instead of the intact relationship R_{P→Q}, which is usually hard to get, a simplified relation between P and Q is used in the revision principle. We select some representative points ⟨x, μ_P(x)⟩, x ∈ X, by certain methods, and for each of them we determine only one corresponding point ⟨y, μ_Q(y)⟩, y ∈ Y, based on given relational factor(s) to make a relational pair (x, y). The collection of all the relational pairs then forms a simplified relation R_{P,Q} between P and Q.

A similar relation between P' and Q' can also be defined, where P' and Q' are the given input and the possible conclusion of approximate reasoning. When a rule P → Q and a fact P', an approximation of P,


are given, the task then becomes to deduce the approximate conclusion Q' based on

Q' = f_R(Q, P, P')    (1)

where f_R is a revising function based on the relation R_{P,Q} between P and Q. When a different R_{P,Q} is selected, we may have a different revising method and a corresponding approach to keep the relation between P' and Q'. If a revising method can keep R_{P',Q'} = R_{P,Q} in any case, then the method is said to have the relation keeping property [12; 22]. Following this idea, linear revising methods [8; 9; 10; 19] and semantic revising methods [21; 22] were proposed.

Linear revising methods are the first set of methods developed for the revision principle. Using linear revising methods, the conclusion Q' is calculated linearly by

Q' = Q + \Delta Q    (2)

\Delta Q = f_L(Q, P, \Delta P)    (3)

where f_L is a linear function based on the fixed point law, the fixed value law or the value-point law.

The method based on the fixed semantics law was proposed as the first semantic revising method of the revision principle (it has been named SRM-I to distinguish it from SRM-II). Its basic idea rests on the so-called semantics of a rule, which comes from P.Z. Wang's falling shadow theory [25]. When P → Q and a semantic approximation [21] P' of P are given, Q' is calculated by using the semantics of P → Q with the fixed semantics law. SRM-II was proposed later [22]. Its basic idea is similar to SRM-I, but the fixed interrelation law is used for inference instead of the fixed semantics law.

In the authors' early work, the proposed revising methods were described only for a single rule with a single antecedent. In [11], the revision principle was applied with multiple rules through a neural network implementation. This article extends the discussion to multiple rules which may contain multiple antecedents, by introducing an approximation measure. The approximation measure is based on a distance between fuzzy sets. It offers a useful feature: for fuzzy sets A, B ⊆ X, am(A, B), the approximation measure of A and B, is not necessarily 0 when A ∩ B = ∅. Furthermore, the value of am(A, B) is dependent on |X|, the size of X. This


gives flexibility to determine am(A, B) based on the needs of the application. When a fuzzy rule is determined to fire, it is necessary to require a certain compatibility between the input and the antecedent. Much work has been done on compatibility and similarity measures, such as [6; 13; 24]. The proposed approximation measure can also serve this purpose.

For simplicity, in [21; 22] we discussed only the special cases where an input is a semantic approximation of the antecedent. In this article we present how that condition can be relaxed by using a normalized approximation. Koczy and his colleague proposed a general revision principle method as a way between the revision principle and the rule interpolation techniques [3] and used a so-called normalization of the support of a fuzzy set (suppnorm). Adopting the idea of suppnorm, we introduce the normalized approximation and the extended semantic approximation, which offer the possibility of applying the semantic revising methods to fuzzy sets with arbitrary support and position. A generalized approximation measure is then proposed to deal with normalized approximations of fuzzy sets.

The rest of this article is arranged as follows. The basic concepts and revising methods of the revision principle are briefly explained in section 2. Section 3 introduces an approximation measure and its extended definitions, and discusses their properties. The application of the revision principle with multiple antecedents and multiple rules is presented in section 4. Section 5 gives the summary.

7.2 Revision Principle

In this section, we briefly review the methods of the revision principle to provide the reader with a basis for understanding. It is assumed that P and P' are defined by fuzzy sets on the universe of discourse X as

P = \int_{x \in X} \mu_P(x)/x

P' = \int_{x \in X} \mu_{P'}(x)/x

and Q is defined as a fuzzy set on the universe of discourse Y as

Q = \int_{y \in Y} \mu_Q(y)/y


where μ_P(x) is the membership function of P, and ∫ means the union of all μ_P(x)/x for x over the universe of discourse X. The notations for P' and Q are similarly defined.

For simplicity, the fuzzy sets under discussion in this section are assumed to be convex and normalized. The universes of discourse X and Y are real-number intervals. Application to more general cases is presented in section 4 of this article.

7.2.1 Linear Revising Methods

7.2.1.1 Relational Factors in Linear Revising Methods

When the revision principle is applied for approximate reasoning, as mentioned earlier, a simplified relation between the antecedent and the consequent of a rule is used. In order to get a reasonable conclusion for an application, it is important to have an appropriate R_{P,Q}. Two relational factors have been suggested to determine R_{P,Q} in the linear revising methods: the corresponding relation and the direction of change [22; 12].

Corresponding Relation  The corresponding relationship between the P and the Q of a rule P → Q is found in different ways for different linear revising methods. In the fixed value law and the value-point law, a relational pair (x, y) is decided based on the membership values: μ_P(x) = μ_Q(y). In the fixed point law, a relational pair is decided based on a certain relation between the positions on the universes of discourse: y = f(x).

When we fix a v = μ_P(x), 0 < v < 1, for x ∈ X, there will be μ_Q(y_1) = μ_Q(y_2) = μ_P(x), where y_1 ≠ y_2. Figure 7.1 shows an example, where the corresponding point of x in the part AB of P will be either y_1 in the part ab or y_2 in the part ac of Q. In other words, there are two corresponding relationships (AB → ab, x → y_1) and (AB → ac, x → y_2). The former is called positive inference and the latter is called negative inference. That is, when

\frac{d\mu_P(x)}{dx} \times \frac{d\mu_Q(y)}{dy} > 0

for μ_P(x) = μ_Q(y), μ'_P(x) ≠ 0 and μ'_Q(y) ≠ 0, it is positive inference; otherwise it is negative inference. This idea is directly used with the fixed value law and the value-point law, where a relational pair ⟨x, y⟩ is found for μ_P(x) = μ_Q(y).


Fig. 7.1 The relation of corresponding points


A similar idea is used with the fixed point law to decide a y ∈ [y_l, y_r] = Y, the corresponding point of x ∈ [x_l, x_r] = X, by a unification function:

y = U(x) = \sigma[(x - x_l) \div (x_r - x_l)] \times (y_r - y_l) + y_l    (4)

where σ is a correspondence operator defined as:

\sigma(r) = \begin{cases} r & \text{for positive inference} \\ 1 - r & \text{for negative inference} \end{cases}    (5)

In the unification function, it is possible to use

x_{sl} = \min[\inf(\mathrm{supp}(P)),\ \inf(\mathrm{supp}(P'))]

x_{sr} = \max[\sup(\mathrm{supp}(P)),\ \sup(\mathrm{supp}(P'))]

as the left and right points instead of x_l and x_r, where supp(.) is the support of a fuzzy set [32; 15], and inf(.) and sup(.) denote the infimum and the supremum [15] of a set. We can also estimate supp(Q') and then similarly get y_{sl} and y_{sr} to be used in the unification function instead of y_l and y_r.
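Since the unification function (4) is just an affine map between the two universes, optionally reflected by the correspondence operator (5), it translates directly into code. A minimal sketch, with interval endpoints passed as plain floats:

```python
def unify(x, xl, xr, yl, yr, positive=True):
    """Unification function (4): map x in [xl, xr] to its corresponding
    point y in [yl, yr], applying the correspondence operator (5)."""
    r = (x - xl) / (xr - xl)          # normalized position of x in X
    r = r if positive else 1.0 - r    # sigma: identity (+) or reflection (-)
    return r * (yr - yl) + yl

# Positive inference maps the midpoint of X to the midpoint of Y;
# negative inference reflects the position.
print(unify(5.0, 0.0, 10.0, 0.0, 10.0))          # 5.0
print(unify(2.0, 0.0, 10.0, 0.0, 10.0, False))   # 8.0
```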

Direction of Change  The relational factor direction of change determines how a consequent is revised once an amount of revision has been calculated. For instance, assume the rule 'if P is small then Q is large' and the fact 'P' is very small' is given; there can be different semantic viewpoints for deducing Q'. One is the understanding that 'the smaller P is, the larger Q is'. The other is that 'the smaller P is, the smaller Q is'. The former is called inverse inference (-), where the direction of change from Q to Q' is inverse to the change from P to P'. The latter is called compliance inference (+), where the direction of change from Q to Q' is the same as from P to P' (Figure 7.2).


Fig. 7.2 Direction of change

Fig. 7.3 Fixed-point law

7.2.1.2 Linear Revising with Fixed-Point Law (LFP)

The basic idea here is to fix a point x in X = [x_l, x_r], the universe of discourse of P, and to get the corresponding point y in Y = [y_l, y_r], the universe of discourse of Q, by the unification function U(x) as given in (4). The deviation between P' and P is captured by the difference of the values of the membership functions μ_P(x) and μ_{P'}(x) at the fixed point x. Then an approximate μ_{Q'}(y) is deduced from the deviation μ_{P'}(x) - μ_P(x) as well as from μ_Q(y).

Formula 1 (Linear Revising Method with the Fixed-Point Law):

I. Deviation from antecedent

\Delta\mu_P(x) = \mu_{P'}(x) - \mu_P(x)    (6)

II. Revision to consequent

\Delta\mu_Q(y) = \begin{cases} 0 & \Delta\mu_P(x) = 0 \\ \mu_Q(y) \div \mu_P(x) \times \Delta\mu_P(x) & \Delta\mu_P(x) < 0 \\ [1 - \mu_Q(y)] \div [1 - \mu_P(x)] \times \Delta\mu_P(x) & \Delta\mu_P(x) > 0 \end{cases}    (7)

III. Revised membership function of consequent


\mu_{Q'}(y) = \begin{cases} 0 & \mu_Q(y) \pm \Delta\mu_Q(y) < 0 \\ \mu_Q(y) \pm \Delta\mu_Q(y) & 0 \le \mu_Q(y) \pm \Delta\mu_Q(y) \le 1 \\ 1 & \mu_Q(y) \pm \Delta\mu_Q(y) > 1 \end{cases}    (8)

IV. Approximate consequent

Q' = \int \mu_{Q'}(y)/y    (9)

Fig. 7.4 Fixed-value law

where (±) means that for compliance inference the '+' shall be used and for inverse inference the '-' shall be used.
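Below is a sketch of Formula 1 applied at a single point pair (x, y), assuming 0 < μ_P(x) < 1 so that the divisions in (7) are defined; the final clipping implements (8).

```python
def lfp_revision(mu_p, mu_p_in, mu_q, compliance=True):
    """Linear revising with the fixed-point law at one point pair (x, y).
    mu_p = mu_P(x), mu_p_in = mu_P'(x), mu_q = mu_Q(y)."""
    d_p = mu_p_in - mu_p                      # (6) deviation from antecedent
    if d_p == 0:                              # (7) revision to consequent
        d_q = 0.0
    elif d_p < 0:
        d_q = mu_q / mu_p * d_p               # shrink in proportion to mu_Q(y)
    else:
        d_q = (1 - mu_q) / (1 - mu_p) * d_p   # grow within the remaining headroom
    mu_out = mu_q + d_q if compliance else mu_q - d_q
    return min(max(mu_out, 0.0), 1.0)         # (8) clip to [0, 1]

print(lfp_revision(0.6, 0.8, 0.5))   # revised mu_Q'(y) = 0.75
```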

7.2.1.3 Linear Revising with Fixed-Value Law (LFV)

Different from the LFP, the basic idea here is to fix a value v ∈ [0, 1] such that the membership functions satisfy μ_P(x) = μ_{P'}(x') = μ_Q(y) = v (x, x' ∈ X, y ∈ Y), to find a shift Δx = x' - x on the universe of discourse X, and then by this shift to determine another shift Δy from the point y to y' for μ_{Q'}(y') = μ_Q(y). Here x' is called the deviative point of x for the given P and P', and it satisfies:

\frac{d\mu_P(x)}{dx} \times \frac{d\mu_{P'}(x')}{dx'} > 0 \quad \text{or} \quad \frac{d\mu_P(x)}{dx} = 0 \ \text{and} \ \frac{d\mu_{P'}(x')}{dx'} = 0

and y is the corresponding point of x, satisfying:

\frac{d\mu_P(x)}{dx} \times \frac{d\mu_Q(y)}{dy} > 0 \ (\text{resp.} < 0) \quad \text{or} \quad \frac{d\mu_P(x)}{dx} = 0 \ \text{and} \ \frac{d\mu_Q(y)}{dy} = 0

where '>' is for positive inference and '<' for negative inference. Letting μ_{Q'}(y') = μ_Q(y), the result can be deduced (Figure 7.4).

Formula 2 (Linear Revising Method with the Fixed-Value Law): The universes of discourse are X = [x_m, x_M] for P and Y = [y_m, y_M] for Q, respectively. The support of P is supp(P) = (x_1, x_2) ⊆ X, and the support of Q is supp(Q) = (y_1, y_2) ⊆ Y.

I. Revision to consequent

(a) Boundary dependent

(a-1) for positive inference (+)

\Delta y = F(x, y, \Delta x) = \begin{cases} (x' - x) \times (y - y_m) \div (x - x_m) & x' \in [x_m, x) \\ (x' - x) \times (y_M - y) \div (x_M - x) & x' \in [x, x_M] \end{cases}    (10)

(a-2) for negative inference (-)

\Delta y = F(x, y, \Delta x) = \begin{cases} (x' - x) \times (y - y_M) \div (x - x_m) & x' \in [x_m, x) \\ (x' - x) \times (y_m - y) \div (x_M - x) & x' \in [x, x_M] \end{cases}    (11)

(b) Boundary independent

\Delta y = F(x, y, \Delta x) = \begin{cases} \max\{y_m,\ y \,(\pm)\, (x' - x) \times (y_2 - y_1) \div (x_2 - x_1)\} - y & (\pm)(x' - x) < 0 \\ \min\{y_M,\ y \,(\pm)\, (x' - x) \times (y_2 - y_1) \div (x_2 - x_1)\} - y & (\pm)(x' - x) > 0 \end{cases}    (12)

where y is the corresponding point of x.

II. Approximate consequent

Q' = \int \mu_Q(y)/(y + \Delta y)    (13)

where (±) means that for compliance inference the sign '+' shall be used, and for inverse inference the sign '-' shall be used.
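A sketch of the boundary-independent case (12)-(13): the shift of the deviative point is scaled by the ratio of the support widths, and the shifted point is clipped to Y = [y_m, y_M]. The argument values in the usage line are invented.

```python
def lfv_shift(x, x_dev, y, supp_p, supp_q, y_bounds, compliance=True):
    """Boundary-independent fixed-value law (12): shift the point y of Q
    by Delta y derived from the shift x' - x on X, clipped to [y_m, y_M]."""
    (x1, x2), (y1, y2), (ym, yM) = supp_p, supp_q, y_bounds
    dx = x_dev - x if compliance else -(x_dev - x)      # (+/-) direction of change
    y_shifted = y + dx * (y2 - y1) / (x2 - x1)          # scale by support ratio
    y_new = max(ym, y_shifted) if dx < 0 else min(yM, y_shifted)
    return y_new - y                                    # Delta y

# Each pair <y, mu_Q(y)> moves to <y + Delta y, mu_Q(y)>, as in (13).
print(lfv_shift(4.0, 4.5, 6.0, (3, 5), (5, 7), (0, 10)))   # 0.5
```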

7.2.1.4 Linear Revising with Value-Point Law (LVP)

The value-point law is a combination of the fixed point law and the fixed value law. It fixes a value μ_P(x) = v ∈ [0, 1] for x ∈ X to get a corresponding point y ∈ Y which satisfies μ_Q(y) = μ_P(x), and

\frac{d\mu_P(x)}{dx} \times \frac{d\mu_Q(y)}{dy} > 0 \ (\text{resp.} < 0) \quad \text{or} \quad \frac{d\mu_P(x)}{dx} = 0 \ \text{and} \ \frac{d\mu_Q(y)}{dy} = 0

where '>' is for positive inference and '<' for negative inference. An approximate μ_{Q'}(y) is then deduced linearly from Δμ_P(x) = μ_{P'}(x) - μ_P(x), the deviation between the membership functions μ_P(x) and μ_{P'}(x) at the point x, and from μ_Q(y) (Figure 7.5).

Formula 3 (Linear Revising Method with the Value-Point Law):

I. Revision to consequent

\Delta\mu_Q(y) = \Delta\mu_P(x) = \mu_{P'}(x) - \mu_P(x)    (14)


Fig. 7.5 Value-point law

II. Revised membership function of consequent

\mu_{Q'}(y) = \begin{cases} 0 & \mu_Q(y) \pm \Delta\mu_Q(y) < 0 \\ \mu_Q(y) \pm \Delta\mu_Q(y) & 0 \le \mu_Q(y) \pm \Delta\mu_Q(y) \le 1 \\ 1 & \mu_Q(y) \pm \Delta\mu_Q(y) > 1 \end{cases}    (15)

III. Approximate consequent

Q' = \int \mu_{Q'}(y)/y    (16)

where (±) means that for compliance inference the '+' shall be used and for inverse inference the '-' shall be used.

7.2.2 Semantic Revising Methods

Definition 1 (valuable interval)  An interval V_A = [x_{lv}, x_{rv}] is called the valuable interval [21] of a fuzzy set A ⊆ X = [x_l, x_r] if and only if supp(A) = (x_{lv}, x_{rv}) ⊆ X is the support of A [32].

Definition 2 (semantic approximation)  A fuzzy set A' is said to be a semantic approximation of a fuzzy set A if and only if their valuable intervals coincide [22].
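Definitions 1 and 2 reduce to a comparison of support endpoints. A small sketch, with supports passed as pairs of floats and a tolerance added for floating-point comparison:

```python
def is_semantic_approximation(supp_a, supp_b, tol=1e-9):
    """B is a semantic approximation of A iff their valuable intervals
    (the closures of their supports) coincide (Definitions 1-2)."""
    (la, ua), (lb, ub) = supp_a, supp_b
    return abs(la - lb) <= tol and abs(ua - ub) <= tol

print(is_semantic_approximation((1.0, 4.0), (1.0, 4.0)))  # True
print(is_semantic_approximation((1.0, 4.0), (1.5, 4.0)))  # False
```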

In this section, we discuss only the basic definitions of the semantic revising methods for those cases where the input fuzzy set and the antecedent fuzzy set are semantic approximations of each other. Application of the semantic revising methods to more general cases is discussed in section 4 of this article.

7.2.2.1 Semantic Relation and Interrelation in Semantic Revising Methods

When the semantic revising methods are used with a rule P → Q, where P ⊆ X = [x_l, x_r] and Q ⊆ Y = [y_l, y_r], the interrelation and the semantic

Fig. 7.6 Semantic relation and interrelation of SRM-I and SRM-II

relation of P and Q can be decided in different ways. In SRM-I we first define an interrelation:

IR_{P,Q} = \{(x, y) \mid y = U(x)\}    (17)

on X × Y, where U(x) is the unification function (4). Then we have the semantic relation on [0, 1]^2, the space of the membership degrees of P and Q:

SR_{P,Q} = \{(s, t) \mid s = \mu_P(x),\ t = \mu_Q(y),\ (x, y) \in IR_{P,Q}\}    (18)

An approximate consequent Q' is deduced by fixing the semantic relation between P and Q and keeping the same semantic relation between the given P' and Q'. This is called the fixed semantics law.

In SRM-II, we first define a semantic relation on [0, 1]^2:

SR_{P,Q} = \{(s, t) \mid s = \mu_P(x),\ t = \mu_Q(y),\ t = s\}    (19)

and then we have the interrelation on X × Y:

IR_{P,Q} = \{(x, y) \mid \mu_P(x) = s,\ \mu_Q(y) = t,\ (s, t) \in SR_{P,Q}\}    (20)

which satisfies:

\frac{ds}{dx} \times \frac{dt}{dy} \times \Psi > 0 \quad \text{or} \quad \frac{ds}{dx} = 0 \ \text{and} \ \frac{dt}{dy} = 0

where Ψ is an interrelation constant for SRM-II, decided by:

\Psi = \begin{cases} +1 & \text{for positive interrelation} \\ -1 & \text{for negative interrelation} \end{cases}    (21)


An approximate consequent Q' is deduced by fixing the interrelation between P and Q and keeping the same interrelation between the given P' and Q'. This method is called the fixed interrelation law.

7.2.2.2 Semantic Revising Method I (SRM-I)

The inference using SRM-I is to take a point x in the universe of discourse X of P and P' and find the corresponding point x* where μ_{P'}(x) = μ_P(x*). Then, based on the interrelation of SRM-I between P and Q, the points y* and y can be found in the universe of discourse Y of Q and Q'. With the semantic relation of SRM-I, μ_{Q'}(y) = μ_Q(y*) is obtained. Integrating μ_{Q'}(y) over all y ∈ Y, an approximate conclusion can be deduced (Figure 7.7).

Formula 4 (The Semantic Revising Method I, SRM-I): The valuable interval for P and P' is [x_l, x_r] = X_P = X_{P'}, and for Q and Q' it is [y_l, y_r] = Y_Q = Y_{Q'}.

I. For an x ∈ X_{P'}, find the corresponding x* ∈ X_P which satisfies the following two conditions:

\mu_P(x^*) = \mu_{P'}(x)

and

\frac{d\mu_P(x^*)}{dx} \times \frac{d\mu_{P'}(x)}{dx} > 0 \quad \text{or} \quad \frac{d\mu_P(x^*)}{dx} = 0 \ \text{and} \ \frac{d\mu_{P'}(x)}{dx} = 0

II. By the interrelation,

y = U(x) = \sigma[(x - x_l) \div (x_r - x_l)] \times (y_r - y_l) + y_l    (22)

y^* = U(x^*) = \sigma[(x^* - x_l) \div (x_r - x_l)] \times (y_r - y_l) + y_l    (23)

where y ∈ Y_{Q'}, y* ∈ Y_Q and (x, y), (x*, y*) ∈ IR_{P,Q} by (17), and σ is the correspondence operator by (5).

III. Based on the fixed semantics law, when μ_{P'}(x) = μ_P(x*), we have μ_{Q'}(y) = μ_Q(y*), where (μ_P(x*), μ_Q(y*)) ∈ SR_{P,Q}, (μ_{P'}(x), μ_{Q'}(y)) ∈ SR_{P',Q'}, and SR_{P',Q'} = SR_{P,Q}.

IV. The Q' is deduced by

Q' = \int \mu_{Q'}(y)/y


Fig. 7.7 Semantic Revising Method I

Fig. 7.8 Semantic Revising Method II

Formula 4 shows that the semantic relation between P and Q will always be kept between P' and Q' when SRM-I is applied.
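As an illustration of Formula 4, the sketch below runs SRM-I point by point for triangular P, P' and Q under positive inference; the peak positions and the valuable intervals are assumptions made only for this example.

```python
def tri(x, a, b, c):
    """Triangular membership with support (a, c) and peak b."""
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)

XL, XR, YL, YR = 0.0, 4.0, 10.0, 14.0                 # valuable intervals
U = lambda x: (x - XL) / (XR - XL) * (YR - YL) + YL   # unification (4), positive

def srm1_point(x, peak_p=2.0, peak_pp=2.5, peak_q=12.0):
    """One step of Formula 4: set mu_Q'(U(x)) := mu_Q(U(x*)), where
    mu_P(x*) = mu_P'(x) on the branch matching the slope at x."""
    t = tri(x, XL, peak_pp, XR)            # mu_P'(x)
    if x <= peak_pp:                       # rising branch: invert left side of P
        x_star = XL + t * (peak_p - XL)
    else:                                  # falling branch: invert right side of P
        x_star = XR - t * (XR - peak_p)
    return U(x), tri(U(x_star), YL, peak_q, YR)   # relational pair <y, mu_Q'(y)>

for x in (1.0, 2.5, 3.5):
    print(srm1_point(x))
```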


7.2.2.3 Semantic Revising Method II (SRM-II)

The inference using SRM-II is to fix a point x in the universe of discourse X of P' and P and then get the membership values μ_P(x) and μ_{P'}(x). Based on the semantic relation of SRM-II, μ_Q(y) = μ_P(x) can be obtained. From μ_Q(y) we have the point y, and then let μ_{Q'}(y) = μ_{P'}(x). Integrating μ_{Q'}(y) over all y ∈ Y, an approximate conclusion can be deduced (Figure 7.8).

Formula 5 (The Semantic Revising Method II, SRM-II): The valuable interval for P and P' is [x_l, x_r] = X_P = X_{P'}, and for Q and Q' it is [y_l, y_r] = Y_Q = Y_{Q'}.

I. For a μ_{P'}(x) ∈ [0, 1] (x ∈ X_{P'}), find the corresponding μ_P(x*) ∈ [0, 1] (x* ∈ X_P) which satisfies the condition x = x*.

II. By the semantic relation,

\mu_Q(y^*) = \mu_P(x^*)    (24)

which satisfies the conditions

\frac{d\mu_P(x^*)}{dx} \times \frac{d\mu_Q(y^*)}{dy} \times \Psi > 0 \quad \text{or} \quad \frac{d\mu_P(x^*)}{dx} = 0 \ \text{and} \ \frac{d\mu_Q(y^*)}{dy} = 0

where y* ∈ Y_Q and (μ_P(x*), μ_Q(y*)) ∈ SR_{P,Q} by (19), and Ψ is the interrelation constant by (21).

III. Using the fixed interrelation law IR_{P',Q'} = IR_{P,Q}, when x = x*, we have (x, y) = (x*, y*) and y = y*, where (x*, y*) ∈ IR_{P,Q}, (x, y) ∈ IR_{P',Q'}.

IV. By the semantic relation,

\mu_{Q'}(y) = \mu_{P'}(x)    (25)

which satisfies the condition:

\frac{d\mu_{P'}(x)}{dx} \times \frac{d\mu_{Q'}(y)}{dy} \times \Psi > 0 \quad \text{or} \quad \frac{d\mu_{P'}(x)}{dx} = 0 \ \text{and} \ \frac{d\mu_{Q'}(y)}{dy} = 0

V. The Q' is deduced by

Q' = \int \mu_{Q'}(y)/y

Formula 5 shows that the interrelation between P and Q will always be kept between P' and Q' when SRM-II is applied.


Fig. 7.9 Non-decreasing, non-increasing and full membership part of a membership function

7.3 Approximation Measure

Much work has been done on similarity measures between fuzzy sets and elements, such as [4; 5; 14; 16; 20; 24; 26; 33]. The similarity measure based on the geometric distance model discussed in [4] raised an important property: for any two fuzzy sets A, B ⊆ U, the situation A ∩ B = ∅ does not necessarily lead the similarity measure of A and B to be 0. Inspired by this thought and by the concept of Hamming distance [15], we introduce an approximation measure between two fuzzy sets.

7.3.1 Basic Definition

We first give the basic definition of approximation measure for convex and normalized fuzzy sets.

Suppose m fuzzy sets P_1, P_2, \ldots, P_m ⊆ X, m > 1. For each of them, there is a membership function μ_{P_i}(x) for x ∈ X_i ⊆ X, as shown in Figure 7.9. Here X_i = [x_{li}, x_{ri}] is the valuable interval of P_i. We denote the non-decreasing part by μ_{P_i}^{(+)}(x), the non-increasing part by μ_{P_i}^{(-)}(x), and the full membership part by μ_{P_i}^{(=)}(x):

\mu_{P_i}(x) = \begin{cases} \mu_{P_i}^{(+)}(x) & x \in [x_{li}, x_{ai}) \\ \mu_{P_i}^{(=)}(x) & x \in [x_{ai}, x_{bi}] \\ \mu_{P_i}^{(-)}(x) & x \in (x_{bi}, x_{ri}] \end{cases}    (26)

where μ_{P_i}^{(=)}(x) = 1. Given a membership value t ∈ [0, 1), for an i (1 ≤ i ≤ m), there are two points x_t^{i(+)} and x_t^{i(-)} satisfying μ_{P_i}^{(+)}(x_t^{i(+)}) = t and μ_{P_i}^{(-)}(x_t^{i(-)}) = t, where x_t^{i(+)} ∈ [x_{li}, x_{ai}) and x_t^{i(-)} ∈ (x_{bi}, x_{ri}]. When there is only one point x* ∈ X with μ_{P_i}(x*) = t* = 1, we have x_{t^*}^{i(+)} = x_{t^*}^{i(-)} = x_{ai} = x_{bi}.

Definition 3 (approximation measure)  The approximation measure of two convex and normalized fuzzy sets P_i and P_j (P_i, P_j ⊆ X, 1 ≤ i, j ≤ m) is defined as one minus the distance of P_i and P_j over the universe of discourse X:

am_X(P_i, P_j) = 1 - \frac{1}{|X|} \int_{t \in [0,1)} \left( |x_t^{i(+)} - x_t^{j(+)}| + |x_t^{i(-)} - x_t^{j(-)}| \right) / 2    (27)

When X is obvious from the discussion, this can simply be denoted by am(P_i, P_j). In real applications, a few representative points may be sufficient for calculating an approximation measure. For instance, four points can be chosen for a trapezoidal fuzzy set and three points for a triangular fuzzy set. Based on the above definition, for any P_i, P_j ⊆ X, the following properties of the approximation measure hold.

Property 1  0 ≤ am(P_i, P_j) ≤ 1.

Property 2  When P_i = P_j, am(P_i, P_j) = 1.

Property 3  am(P_i, P_j) = am(P_j, P_i).
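For triangular fuzzy sets the level points x_t^{(+)} and x_t^{(-)} in (27) are affine in t, so the integral can be evaluated by simple sampling (or in closed form). The sketch below samples t on a grid and reproduces one of the values used in the worked example of section 7.4.3.

```python
def am_triangular(p1, p2, universe_len, n=1000):
    """Approximation measure (27) between two triangular fuzzy sets,
    each given as (left, peak, right); the t-integral is sampled."""
    (l1, m1, r1), (l2, m2, r2) = p1, p2
    total = 0.0
    for k in range(n):
        t = k / n                                           # level t in [0, 1)
        x1p, x1m = l1 + t * (m1 - l1), r1 - t * (r1 - m1)   # level points of p1
        x2p, x2m = l2 + t * (m2 - l2), r2 - t * (r2 - m2)   # level points of p2
        total += (abs(x1p - x2p) + abs(x1m - x2m)) / 2.0
    return 1.0 - total / n / universe_len

# A unit shift on X = [0, 10] gives am = 0.9, as in example 7.4.3:
print(round(am_triangular((0, 1, 2), (1, 2, 3), 10.0), 3))
```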

Definition 4 (least approximation measure)  Let P_i, P_j ⊆ X (1 ≤ i, j ≤ m) be two fuzzy sets, and V_{P_i} and V_{P_j} the valuable intervals of P_i and P_j, respectively. The least approximation measure of P_i and P_j is defined as:

am_L(P_i, P_j) = 1 - \frac{1}{|V_L|} \int_{t \in [0,1)} \left( |x_t^{i(+)} - x_t^{j(+)}| + |x_t^{i(-)} - x_t^{j(-)}| \right) / 2    (28)

where V_L = [l_V, u_V] ⊆ X is the smallest set that includes both V_{P_i} and V_{P_j} as subsets, and its lower boundary and upper boundary are

l_V = \min[\inf(V_{P_i}),\ \inf(V_{P_j})]

u_V = \max[\sup(V_{P_i}),\ \sup(V_{P_j})]

Property 4  am_X(P_i, P_j) ≥ am_L(P_i, P_j).

When approximate reasoning is considered on a relatively narrow range, the result will be more sensitive to the difference between a given input and the antecedent of a fuzzy rule. The am_L provides a measure for the extreme case when the universe of discourse is the smallest interval containing P_i and P_j, so it can be used to calculate the minimal guaranteed value of the approximation measure during reasoning.


7.3.2 Extended Definitions

Let supp(A) be the support of a fuzzy set A ⊆ X, and let supp_L(A) = inf(supp(A)) and supp_U(A) = sup(supp(A)) be the infimum and supremum of supp(A), respectively. The central point of a fuzzy set A is:

cp(A) = (\sup(A_\alpha) + \inf(A_\alpha))/2    (29)

where α = height(A) and A_α is the α-cut of A.

Definition 5 (normalized approximation with given support and central point)  Let A ⊆ X be a convex fuzzy set with supp(A) = S, height(A) = α > 0, supp_L(A) = l, supp_U(A) = u, and central point cp(A). For any given L, U ∈ X with L < U, we have ^nA_{LU}, a normalized approximation of A with given L and U, which satisfies:

supp_L(^nA_{LU}) = L

supp_U(^nA_{LU}) = U

and whose membership function is determined by:

\mu_{^nA_{LU}}(x) = \begin{cases} \mu_A(cp(A))/\alpha & x = cp(^nA_{LU}) \\ \mu_A(f(x))/\alpha & L < x < U \\ 0 & \text{otherwise} \end{cases}    (30)

f(x) = (x - cp(^nA_{LU})) \times \frac{g - cp(A)}{G - cp(^nA_{LU})} + cp(A)    (31)

g = \begin{cases} l & x < cp(^nA_{LU}) \\ u & x > cp(^nA_{LU}) \end{cases}    (32)

G = \begin{cases} L & x < cp(^nA_{LU}) \\ U & x > cp(^nA_{LU}) \end{cases}    (33)

where cp(^nA_{LU}) is the central point of ^nA_{LU}, which satisfies L < cp(^nA_{LU}) < U. When cp(^nA_{LU}) is not previously given, it can be set as cp(^nA_{LU}) = f(L, U, l, u); the following function f is one choice that keeps the original shape of A as much as possible:

f(L, U, l, u) = \frac{U - L}{u - l} \times (cp(A) - l) + L    (34)

A symmetrical ^nA_{LU} can be obtained by defining

cp(^nA_{LU}) = (L + U)/2

Definition 6 (extended semantic approximation)  Let A, B ⊆ X be fuzzy sets, supp_L(A) = l_A, supp_U(A) = u_A, cp(A) = c_A, supp_L(B) = l_B, supp_U(B) = u_B, and cp(B) = c_B. The set B is said to be an extended semantic approximation of A if and only if there is a ^nB_{LU}, a normalized approximation of B with L = l_A, U = u_A, and

cp(^nB_{LU}) = \frac{U - L}{u_B - l_B} \times (cp(B) - l_B) + L.

When U = u_B = u_A and L = l_B = l_A, we have ^nB_{LU} = B, and B is a semantic approximation of A. So this is an extended definition of semantic approximation [21; 22]. It makes the semantic revising methods possible for fuzzy sets with arbitrary support and position.

For convex but non-normalized fuzzy sets, we have a generalized definition of the approximation measure.

Definition 7 (generalized approximation measure)  Let A and B be convex fuzzy sets, height(A) = α_A > 0, height(B) = α_B > 0, supp_L(A) = l_A, supp_U(A) = u_A, supp_L(B) = l_B and supp_U(B) = u_B. The generalized approximation measure is defined by:

{}^{g}am_X(A, B) = \frac{\min(\alpha_A, \alpha_B)}{\max(\alpha_A, \alpha_B)} \times am_X(^nA_{l_A u_A},\ ^nB_{l_B u_B})    (35)

where ^nA_{l_A u_A} is a normalized approximation of A with given l_A and u_A, and ^nB_{l_B u_B} is a normalized approximation of B with given l_B and u_B. A normalized fuzzy set is also a normalized approximation of itself.

When α_A = α_B = 1, Definition 7 reduces to the basic definition (27). The generalized least approximation measure can be similarly defined.

Definition 8 (simplified approximation measure)  Let A, B ⊆ X be convex fuzzy sets, height(A) = α_A > 0, height(B) = α_B > 0, supp_L(A) = l_A, supp_U(A) = u_A, supp_L(B) = l_B, and supp_U(B) = u_B. A simplified approximation measure of A and B is defined by

{}^{s}am_V(A, B) = \frac{\min(\alpha_A, \alpha_B)}{\max(\alpha_A, \alpha_B)} \times \left[ 1 - \frac{(|l_B - l_A| + |u_B - u_A|)/2}{|V|} \right]    (36)

where V = [\min(l_A, l_B),\ \max(u_A, u_B)] ⊆ X. When B is simply a shift of A on X, we have {}^{s}am_V(A, B) = am_V(A, B).
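A sketch of (36), with each fuzzy set described only by its height and its support endpoints (a dict is used here purely for illustration):

```python
def s_am(a, b):
    """Simplified approximation measure (36); a and b are dicts such as
    {'h': height, 'l': lower support end, 'u': upper support end}."""
    v_len = max(a['u'], b['u']) - min(a['l'], b['l'])          # |V|
    dist = (abs(b['l'] - a['l']) + abs(b['u'] - a['u'])) / 2.0
    return min(a['h'], b['h']) / max(a['h'], b['h']) * (1.0 - dist / v_len)

print(s_am({'h': 1.0, 'l': 0.0, 'u': 2.0},
           {'h': 0.8, 'l': 1.0, 'u': 3.0}))   # 0.8 * (1 - 1/3) = 0.533...
```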


Definition 9 (extended approximation measure)  Let A, B ⊆ X be normalized convex fuzzy sets. When there is a C ⊆ X satisfying

\min\{supp_L(A), supp_L(B)\} \le supp_L(C) \le \max\{supp_L(A), supp_L(B)\}

\min\{supp_U(A), supp_U(B)\} \le supp_U(C) \le \max\{supp_U(A), supp_U(B)\}

an extended approximation measure of A and B can be deduced by

{}^{e}am_X(A, B) = am_X(A, C) \times am_X(C, B)    (37)

where am_X(A, C) is the approximation measure or simplified approximation measure of A and C, and am_X(C, B) that of C and B, respectively. Normally, {}^{e}am_X(A, B) ≤ am_X(A, B).

7.4 Approximate Reasoning Using Revision Principle

7.4.1 Reasoning with Multiple Antecedents

Given a rule P_{11}, P_{12}, \ldots, P_{1m} → Q_1, where P_{1i} ⊆ X_{1i} ⊆ X and Q_1 ⊆ Y. Each P_{1i} (i = 1, \ldots, m) can be defined by a membership function with non-decreasing part μ_{P_{1i}}^{(+)}(x) for x ∈ [x_{l1i}, x_{a1i}), non-increasing part μ_{P_{1i}}^{(-)}(x) for x ∈ (x_{b1i}, x_{r1i}], and full membership part μ_{P_{1i}}^{(=)}(x) for x ∈ [x_{a1i}, x_{b1i}], in the same manner as equation (26). Q_1 is similarly described by

\mu_{Q_1}(y) = \begin{cases} \mu_{Q_1}^{(+)}(y) & y \in [y_l, y_a) \\ \mu_{Q_1}^{(=)}(y) & y \in [y_a, y_b] \\ \mu_{Q_1}^{(-)}(y) & y \in (y_b, y_r] \end{cases}    (38)

where μ_{Q_1}^{(=)}(y) = 1. The reasoning can be carried out by the following steps.

(1) For the given P'_{11}, P'_{12}, \ldots, P'_{1m}, we consider each of them in turn. When P'_{1i} is taken into consideration, the partially revised consequent Q'_{1i} is determined by applying one (linear or semantic) method of the revision principle to P_{1i}, P'_{1i}, and Q_1.

(2) The idea of the fixed-value law is then used to get the deviation of each Q'_{1i} from Q_1. For any t ∈ [0, 1) (Figure 7.10), we have the corresponding

y_t^{(+)} = \mu_{Q_1}^{(+)^{-1}}(t),\qquad y_t^{(-)} = \mu_{Q_1}^{(-)^{-1}}(t)    (39)


Fig. 7.10 Deviation of Qu from Qi

y'^{(+)}_{ti} = \mu_{Q'_{1i}}^{(+)^{-1}}(t),\qquad y'^{(-)}_{ti} = \mu_{Q'_{1i}}^{(-)^{-1}}(t)    (40)

So for i = 1, 2, \ldots, m, there are

\Delta y_t^{i(+)} = y'^{(+)}_{ti} - y_t^{(+)} \quad \text{and} \quad \Delta y_t^{i(-)} = y'^{(-)}_{ti} - y_t^{(-)}

(3) Calculate the corresponding y'_l by

y'_l = y_l + \frac{\sum_{i=1}^{m} am(P_{1i}, P'_{1i}) \times \Delta y_l^{i}}{\sum_{i=1}^{m} am(P_{1i}, P'_{1i})}    (41)

and y'_a, y'_b, y'_r in the same manner.

(4) An approximate conclusion Q'_1 is integrated from all the individual revisions from Q_1 to the Q'_{1i}:

Q'_1 = \int_{y' \in [y'_l, y'_a)} \mu_{Q_1}^{(+)}(y)/y' + \int_{y' \in [y'_a, y'_b]} \mu_{Q_1}^{(=)}(y)/y' + \int_{y' \in (y'_b, y'_r]} \mu_{Q_1}^{(-)}(y)/y'    (42)

where the point y' for the non-decreasing part and the non-increasing part is calculated by


(a) for y' ∈ [y'_l, y'_a),

y' = y + \frac{\sum_{i=1}^{m} am(P_{1i}, P'_{1i}) \times \Delta y_t^{i(+)}}{\sum_{i=1}^{m} am(P_{1i}, P'_{1i})}    (43)

where y ∈ [y_l, y_a) and Δy_t^{i(+)} is the deviation for t = μ_{Q_1}^{(+)}(y);

(b) for y' ∈ (y'_b, y'_r],

y' = y + \frac{\sum_{i=1}^{m} am(P_{1i}, P'_{1i}) \times \Delta y_t^{i(-)}}{\sum_{i=1}^{m} am(P_{1i}, P'_{1i})}    (44)

where y ∈ (y_b, y_r] and Δy_t^{i(-)} is the deviation for t = μ_{Q_1}^{(-)}(y).

(5) The confidence of the approximate conclusion Q'_1 is calculated by

Conf(Q'_1) = \min_i [am(P_{1i}, P'_{1i})]    (45)
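The integration step (41) and the confidence (45) are straightforward to express in code; the sketch below reproduces the left-end point computed in step (c) of the example in section 7.4.3.

```python
def integrate_point(y, deviations, weights):
    """(41): move the point y of Q1 by the am-weighted average of the
    per-antecedent deviations Delta y."""
    return y + sum(w * d for w, d in zip(weights, deviations)) / sum(weights)

# y'_l = 1 + (0.9*1 + 0.8*2)/(0.9 + 0.8) = 2.47, as in step (c) of 7.4.3
print(round(integrate_point(1.0, [1.0, 2.0], [0.9, 0.8]), 2))

# (45): the confidence of the partial conclusion is the minimum measure
print(min([0.9, 0.8]))
```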

7.4.2 Reasoning with Multiple Rules

When we are given a set of n rules:

P_{11}, P_{12}, \ldots, P_{1m} → Q_1
P_{21}, P_{22}, \ldots, P_{2m} → Q_2
\ldots
P_{n1}, P_{n2}, \ldots, P_{nm} → Q_n

and a set of facts P'_1, P'_2, \ldots, P'_m, an approximate reasoning can be performed as follows:

(1) Apply the method introduced earlier for a single rule with multiple antecedents to the set of facts with each given rule P_{j1}, P_{j2}, \ldots, P_{jm} → Q_j (j = 1, 2, \ldots, n) to get Q'_1, Q'_2, \ldots, Q'_n.

(2) Based on the confidences

Conf(Q'_1), Conf(Q'_2), \ldots, Conf(Q'_n)

find a Q'_s ∈ \{Q'_1, Q'_2, \ldots, Q'_n\} that satisfies

Conf(Q'_s) = \max_j [Conf(Q'_j)]


(3) Calculate the corresponding \bar{y}_l by

\bar{y}_l = \frac{\sum_{j=1}^{n} am(Q_j, Q'_j) \times y_l^{j}}{\sum_{j=1}^{n} am(Q_j, Q'_j)}    (46)

and \bar{y}_a, \bar{y}_b, \bar{y}_r in the same manner.

(4) An approximate conclusion \bar{Q}' can be obtained by:

\bar{Q}' = \int_{\bar{y} \in [\bar{y}_l, \bar{y}_a)} \mu_{Q'_s}^{(+)}(y)/\bar{y} + \int_{\bar{y} \in [\bar{y}_a, \bar{y}_b]} \mu_{Q'_s}^{(=)}(y)/\bar{y} + \int_{\bar{y} \in (\bar{y}_b, \bar{y}_r]} \mu_{Q'_s}^{(-)}(y)/\bar{y}    (47)

where \bar{y} for the non-decreasing part and the non-increasing part is determined by

(a) for \bar{y} ∈ [\bar{y}_l, \bar{y}_a),

\bar{y} = \frac{\sum_{j=1}^{n} am(Q_j, Q'_j) \times y^{j(+)}}{\sum_{j=1}^{n} am(Q_j, Q'_j)}    (48)

(b) for \bar{y} ∈ (\bar{y}_b, \bar{y}_r],

\bar{y} = \frac{\sum_{j=1}^{n} am(Q_j, Q'_j) \times y^{j(-)}}{\sum_{j=1}^{n} am(Q_j, Q'_j)}    (49)

where y^{j(+)} and y^{j(-)} are the corresponding points in the non-decreasing part and the non-increasing part of the membership function of Q'_j (j = 1, 2, \ldots, n), respectively, for t = μ_{Q'_s}(y), y ∈ [y_l, y_r].

When the revision principle is applied to non-normalized fuzzy sets, the normalized approximation and the generalized approximation measure can be used. When a semantic revising method is applied in applications where an input fuzzy set may not always be a semantic approximation of the corresponding antecedent, an extended semantic approximation can be used. In that case, a distance measure between a normalized approximation and its original fuzzy set needs to be calculated and used as a kind of confidence measure for the revision of the consequent. We leave the details to another article.


7.4.3 Example

Given rules:

P_{11}, P_{12} → Q_1
P_{21}, P_{22} → Q_2

Fig. 7.11 An example

where P_{11}, P_{12}, P_{21}, P_{22} ⊆ X = [0, 10], and Q_1, Q_2 ⊆ Y = [0, 10] (Figure 7.11). For simplicity, all of them are triangular, so P_{11} can be simply denoted by (0, 1, 2), and the others in the same manner.

(a) Based on the definition of the approximation measure, we have

am(P'_1, P_{11}) = 1 - \frac{(1 + 1)/2}{10} = 0.9

am(P'_2, P_{12}) = 1 - \frac{(2 + 2)/2}{10} = 0.8

(b) Using the linear revising method with the fixed value law, we get

Q'_{11} = (2, 3, 4)

Q'_{12} = (3, 4, 5)

(c) Using the results of (a) and (b), with Q_1 = (1, 2, 3), we have

y'_{l1} = 1 + \frac{0.9 \times 1 + 0.8 \times 2}{0.9 + 0.8} = 2.47

y'_{a1} = y'_{b1} = 2 + \frac{0.9 \times 1 + 0.8 \times 2}{0.9 + 0.8} = 3.47

y'_{r1} = 3 + \frac{0.9 \times 1 + 0.8 \times 2}{0.9 + 0.8} = 4.47

Q'_1 = (2.47, 3.47, 4.47)

(d) With the second rule, where Q_2 = (8, 9, 10), we have

am(P'_1, P_{21}) = 0.8

am(P'_2, P_{22}) = 0.9

and

Q'_{21} = (6, 7, 8)

Q'_{22} = (7, 8, 9)

y'_{l2} = 8 + \frac{0.8 \times (-2) + 0.9 \times (-1)}{0.8 + 0.9} = 6.53

y'_{a2} = y'_{b2} = 9 + \frac{0.8 \times (-2) + 0.9 \times (-1)}{0.8 + 0.9} = 7.53

y'_{r2} = 10 + \frac{0.8 \times (-2) + 0.9 \times (-1)}{0.8 + 0.9} = 8.53

Q'_2 = (6.53, 7.53, 8.53)

(e) Combining the results from (c) and (d), we have

am(Q_1, Q'_1) = 0.853

am(Q_2, Q'_2) = 0.853

\bar{y}_l = \frac{2.47 \times 0.853 + 6.53 \times 0.853}{0.853 + 0.853} = 4.5

\bar{y}_a = \bar{y}_b = \frac{3.47 \times 0.853 + 7.53 \times 0.853}{0.853 + 0.853} = 5.5

\bar{y}_r = \frac{4.47 \times 0.853 + 8.53 \times 0.853}{0.853 + 0.853} = 6.5

and an approximate conclusion (Figure 7.11) is obtained as:

\bar{Q}' = (4.5, 5.5, 6.5)
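The whole example can be re-checked numerically in a few lines; wavg is the am-weighted average underlying (41) and (48).

```python
def wavg(values, weights):
    """am-weighted average used in (41) and (48)."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Step (c): Q1 = (1,2,3) revised by deviations +1 and +2, weights 0.9 and 0.8.
q1 = tuple(v + wavg([1, 2], [0.9, 0.8]) for v in (1, 2, 3))
# Step (d): Q2 = (8,9,10) revised by deviations -2 and -1, weights 0.8 and 0.9.
q2 = tuple(v + wavg([-2, -1], [0.8, 0.9]) for v in (8, 9, 10))
# Step (e): combine the two partial conclusions with equal weights 0.853.
q = tuple(wavg([a, b], [0.853, 0.853]) for a, b in zip(q1, q2))
print([round(v, 2) for v in q1])   # [2.47, 3.47, 4.47]
print([round(v, 2) for v in q2])   # [6.53, 7.53, 8.53]
print([round(v, 1) for v in q])    # [4.5, 5.5, 6.5]
```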


7.5 Summary

We have discussed the application of the revision principle to approximate reasoning with multiple rules. An approximation measure has been proposed for the integration of revisions. In the early stage of processing (i.e., multiple antecedents), it is used to combine the revisions caused by the individual sub-antecedents to the consequent. In the late stage of processing (i.e., multiple rules), it is applied as a weight for the integration of the approximate results derived by using each individual rule in the early stage.

To relax the conditions placed on fuzzy sets in the authors' early works, the concepts of normalized approximation and generalized approximation measure were introduced to handle fuzzy sets that may be non-normalized, or that have arbitrary support and position.

Our future effort will be put into the study of the approximation measure on a semantic basis. We will also extend our discussion to non-convex fuzzy sets and to fuzzy sets defined on multiple dimensions.


References

[1] Baldwin, J.F., "A New Approach to Approximate Reasoning Using a Fuzzy Logic", Fuzzy Sets and Systems, 2, pp. 309-325, 1979.

[2] Baldwin, J.F., "Fuzzy Logic and Its Application to Fuzzy Reasoning", in Advances in Fuzzy Set Theory and Applications, edited by M.M. Gupta, et al., North-Holland, pp. 93-115, 1979.

[3] Baranyi, P. and L.T. Koczy, "A General Revision Principle Method as a Way Between the Revision Principle and the Rule Interpolation Techniques", Proc. 6th IEEE International Conference on Fuzzy Systems, pp. 561-566, 1997.

[4] Chen, S.M., Yeh, M.S. and Hsiao, P.Y., "A Comparison of Similarity Measures of Fuzzy Values", Fuzzy Sets and Systems, Vol. 72, pp. 79-89, 1995.

[5] Chen, S.M., "Similarity Measure Between Vague Sets and Between Elements", IEEE Trans. on Systems, Man and Cybernetics, Vol. 27, No. 1, pp. 153-158, 1997.

[6] Cross, V. and T. Sudkamp, "Fuzzy Implication and Compatibility Modification", Proc. 2nd IEEE International Conference on Fuzzy Systems, pp. 219-224, 1993.

[7] Cross, V. and T. Sudkamp, "Patterns of Fuzzy Rule-Based Inference", International Journal of Approximate Reasoning, 11, pp. 235-255, 1994.

[8] Ding, L., Z. Shen and M. Mukaidono, "A New Method for Approximate Reasoning", IEEE Proceedings of ISMVL, the 19th International Symposium on Multiple-Valued Logic, pp. 179-185, 1989.

[9] Ding, L. and M. Mukaidono, "A Proposal on Approximate Reasoning Based on Revision Principle and Fixed Value Law" (in Japanese), The Transactions of the Institute of Electronics, Information and Communication Engineers, J72-D-II, 2, pp. 117-122, 1991.

[10] Ding, L., Z. Shen and M. Mukaidono, "Revision Principle for Approximate Reasoning, Based on Linear Revising Method", Proc. 2nd International Conference on Fuzzy Logic & Neural Networks (IIZUKA'92), pp. 305-308, 1992.

[11] Ding, L. and Z. Shen, "Neural Network Implementation of Fuzzy Inference for Approximate Case-based Reasoning", in Neural and Fuzzy Systems: The Emerging Science of Intelligence and Computing, edited by S. Mitra, M.M. Gupta and W. Kraske, SPIE Press, pp. 28-56, 1994.

[12] Ding, L., "Methods of Revision Principle for Approximate Reasoning", International Journal of General Systems, Vol. 28, No. 2-3, pp. 115-137, 1999.

[13] Dubois, D. and H. Prade, "The Generalized Modus Ponens Under Sup-min Composition: A Theoretical Study", in Approximate Reasoning in Expert Systems, edited by M.M. Gupta, et al., North-Holland, pp. 217-232, 1985.

[14] Hyung, L.W., Song, Y.S. and Lee, K.M., "Similarity Measure Between Fuzzy Sets and Between Elements", Fuzzy Sets and Systems, Vol. 62, pp. 291-293, 1994.

[15] Klir, G.J. and T.A. Folger, Fuzzy Sets, Uncertainty, and Information, Prentice-Hall International, 1988.

[16] Liu, X., "Entropy, Distance Measure and Similarity Measure of Fuzzy Sets and Their Relations", Fuzzy Sets and Systems, Vol. 52, pp. 305-318, 1992.

[17] Mizumoto, M., "Some Fuzzy Inference Methods: The IF-THEN Case", The Transactions of the Institute of Electronics, Information and Communication Engineers, Japan, J64-D, 5, pp. 379-386, 1981.

[18] Mizumoto, M., "Comparison of Various Fuzzy Reasoning Methods", Proc. of 2nd IFSA Congress, pp. 2-7, 1987.

[19] Mukaidono, M., L. Ding and Z. Shen, "Approximate Reasoning Based on Revision Principle", Proc. NAFIPS'90, 1, pp. 94-97, 1990.

[20] Pappis, C.P. and Karacapilidis, N.I., "A Comparative Assessment of Measures of Similarity of Fuzzy Values", Fuzzy Sets and Systems, Vol. 56, pp. 171-174, 1993.

[21] Shen, Z., L. Ding, H.C. Lui, P.Z. Wang and M. Mukaidono, "Revision Principle for Approximate Reasoning, Based on Semantic Revising Method", IEEE Proceedings of ISMVL, the 22nd International Symposium on Multiple-Valued Logic, pp. 467-473, 1992.

[22] Shen, Z., L. Ding and M. Mukaidono, "Methods of Revision Principle", Proc. of 5th IFSA Congress, pp. 246-249, 1993.

[23] Tsukamoto, Y., "An Approach to Fuzzy Reasoning Method", in Advances in Fuzzy Set Theory and Applications, edited by M.M. Gupta, et al., North-Holland, pp. 137-149, 1979.

[24] Turksen, I.B., "An Approximate Analogical Reasoning Approach Based on Similarity Measures", IEEE Trans. on Systems, Man and Cybernetics, Vol. 18, pp. 1049-1056, 1988.

[25] Wang, P.Z., "Fuzziness vs. Randomness, Falling Shadow Theory", BUSEFAL, No. 48, 1991.

[26] Wang, W.J., "New Similarity Measures on Fuzzy Sets and on Elements", Fuzzy Sets and Systems, Vol. 85, pp. 305-309, 1997.

[27] Yager, R.R., "An Approach to Inference in Approximate Reasoning", International Journal of Man-Machine Studies, 13, pp. 323-338, 1980.

[28] Zadeh, L.A., "Fuzzy Sets", Information and Control, 8(3), pp. 338-353, 1965.

[29] Zadeh, L.A., "The Concept of a Linguistic Variable and Its Application to Approximate Reasoning (I); (II); (III)", Information Sciences, 8, pp. 199-249; 8, pp. 301-357; 9, pp. 43-80, 1975.

[30] Zadeh, L.A., "Fuzzy Logic and Approximate Reasoning", Synthese, 30, pp. 407-428, 1975.

[31] Zadeh, L.A., "A Theory of Approximate Reasoning", Machine Intelligence, 9, edited by J. Hayes, D. Michie and L.I. Mikulich, Halstead Press, New York, pp. 149-194, 1979.

[32] Zimmermann, H.-J., Fuzzy Sets, Decision Making and Expert Systems, Kluwer Academic Publishers, Boston, 1987.

[33] Zwick, R., Carlstein, E. and Budescu, D.V., "Measures of Similarity Among Fuzzy Concepts: A Comparative Analysis", International Journal of Approximate Reasoning, Vol. 1, pp. 221-242, 1987.


Chapter 8

Handling Null Queries with Compound Fuzzy Attributes

Shyue-Liang Wang1, Yu-Jane Tsai2

1 I-Shou University, Taiwan
2 National University of Kaohsiung, Taiwan

Abstract

We present here a generalized approach for handling null queries that contain compound fuzzy attributes. Null queries are queries that elicit a null answer from the database. Compound fuzzy attributes are ambiguous attributes that are not defined in the original database schema but can be derived from multiple rigid attributes in the schema. Compound fuzzy attributes derived from simple numbers were studied by Nomura [11]. In this work, we extend compound fuzzy attributes so that they can be derived from numbers, interval values, scalars, and sets of all these data types. Database management systems that can handle this type of ambiguous attribute in null queries not only reduce the occurrences of null answers but also provide an improved, user-friendly query environment.

Keywords: null query, compound fuzzy attribute, fuzzy database, aggregation function, similarity measure

8.1 Introduction

Ambiguous information permeates our understanding of the real world. Extending the capabilities of database management systems to handle imperfect or fuzzy information has been studied extensively [13]. Precise and imprecise data modeling of the real-world enterprise emphasizes fuzzy data representation in the fuzzy database landscape [4][13]. Many relational and object-oriented fuzzy data models have since been proposed. Possibility distributions and similarity/proximity relations are two main techniques for representing fuzzy information [2]. However, there are significant feasibility problems with the performance requirements of fuzzy database management systems. On the other hand, fuzzy queries on either crisp or fuzzy databases allow users to retrieve information in a


more flexible manner [6]. It has been suggested that front-end fuzzy querying systems have greater potential in the near term, based on performance criteria [4].

The fuzzy querying system plays an important role in the database landscape. Due to the "lack of flexibility" of conventional DBMS queries, fuzzy queries address the issue of providing a more flexible and user-friendly query environment on crisp or fuzzy databases. Typical fuzzy queries allow users to specify fuzzy query conditions and fuzzy attribute values so that the system returns data that match the query statement [1]. Querying a database may result in three categories of results: exactly matched, partially matched, and null answers. Exactly matched answers are required in typical crisp queries, whereas partially matched answers are permitted for most fuzzy queries. Fuzzy queries that elicit null answers are usually classified as null queries. All these fuzzy queries assume that query attributes must appear in the original database schema. However, since users may not be familiar with the database schema when formulating queries, there might be ambiguous query attributes that are not defined in the original schema. In order to provide all possible answers to users, rather than just returning frustrating nulls, null queries may be permitted to contain compound fuzzy attributes in the query statement, reducing the occurrences of null answers. A compound fuzzy attribute in a null query is a fuzzy attribute that can be derived from multiple rigid attributes in the original database schema [11]. For example, a compound fuzzy attribute "Build" might be derived from the rigid attributes "Height" and "Weight". From the user's point of view, compound fuzzy attributes in null queries are terms that represent intuitive or semantic-level meanings contained in the database.

In general, causes of null answers may be missing attribute values in the database, syntactic errors in the query statement, and misconceptions about the database schema. There has been much research aimed at resolving the issue of missing attribute values, but little work has been done on resolving semantic errors. Motro [10] proposed the concept of generalized queries to explain null answers. The generalized query finds maximal failures and uses them to determine the level of misconception. The level of misconception not only explains why the query statement obtains a null answer, but also retrieves possible answers for the


generalized query. To reduce the user's misconceptions about the database schema and therefore produce fewer null answers, compound fuzzy attributes were proposed by Nomura [11]. That work utilizes an averaging operator to represent the relationship between numerical rigid attributes and numerical compound fuzzy attributes. However, the compound fuzzy attributes proposed by Nomura can only be derived from simple numbers. For fuzzy set attributes, Zemankova [18] proposed that fuzzy sets can be specified explicitly as a combination of other previously defined fuzzy sets in an ad hoc manner, using logical connectors and direct specifications to represent the relationships between attributes, without a systematic approach.

In this work, we extend compound fuzzy attributes to fuzzy databases such that they can be derived from numbers, interval values [14], scalars [15], and sets of all these data types in a systematic way. Fuzzy aggregation functions are used to represent the relationships between rigid attributes and compound fuzzy attributes. Null queries containing compound fuzzy attributes on fuzzy databases can thus be handled under a unified approach.

The rest of the paper is organized as follows. In section 2, we describe the fuzzy data types that appear in most fuzzy data models. Section 3 describes the generation of compound fuzzy attributes in null queries in detail. In section 4, we give an example showing how these null queries are processed. Finally, some discussion and future work are described in the conclusion.

8.2 Data Representation

Fuzzy databases use many different data types to model imperfect information, such as precise, imprecise, or uncertain data types. In summary, we list some data types that appear in most fuzzy databases as follows [8].

1. Number: this is the integer data type in all databases, e.g., Age=28.

2. Interval value: this is a range of data lie between two numbers, e.g., Salary=[30k, 35k].

3. Scalar: this data type is defined for linguistic labels and usually


expressed as fuzzy sets or possibility distributions, e.g., Behavior=Good.

4. Set of numbers: this data type contains one or more numbers, e.g., Age={28, 29}.

5. Set of interval values: this data type contains one or more interval values, e.g., Salary ={[30k, 35k], [27k, 33k]}.

6. Set of scalars: this data type contains one or more scalars, e.g., Behavior={Good, Normal}.

7. Possibility distribution of numbers: this data type involves uncertainty factors on numbers, e.g., Age= {0.8/28, 0.5/30}.

8. Possibility distribution of interval values: this data type involves uncertainty factors on intervals, e.g., Salary={0.9/[30k, 35k], 0.7/[27k, 32k]}.

9. Possibility distribution of scalars: this data type involves uncertainty factors on scalars, e.g., Behavior={0.5/Good, 0.7/Normal}.

10. Similarity/Proximity relations: this data type defines the analogical relationship between discrete attributes of a domain, e.g., Hobby ={Swimming, Sport, Stamp, Reading}.

In this work, we will concentrate on compound fuzzy attributes that are generated from numbers, interval values, scalars, and sets of all these data types. Other fuzzy data types can be handled in a similar fashion with minor modification [16].

8.3 Generating Compound Fuzzy Attributes

A compound fuzzy attribute in a null query is a fuzzy attribute that can be derived from multiple rigid attributes defined in the original database schema. For example, the compound fuzzy attribute "Build" of a person can be derived from "Height" and "Weight", the "Potential" of an employee can be derived from "Age" and "Experience", and the "Attractiveness" of a girl can be derived from "Eye" and "Hair" colors.

Assume that a relation "Employee" is defined as in Table 8.1. For simplicity, we only show the number, interval value and scalar data types. Sets of all these data types can be treated similarly. Suppose that we want to find


employees who are "Big" in the sense that they are tall and heavy. A possible query might be as follows:

Select "EmployeeID" From "Employee" Where "Build=Big" with Threshold = 0.75.

EmployeeID   Height   Weight
001          Short    Average
002          180      Fat
003          Normal   [60, 70]
004          170      45
005          Short    [40, 50]

Table 8.1 A database relation "Employee".

The attribute "Build" is not defined in the "Employee" relation, but it can be compounded semantically from the rigid attributes "Height" and "Weight". In fact, the relationship between the compound attribute "Build" and the rigid attributes "Height" and "Weight" can be represented by a suitable aggregation function. In addition to numeric domains for the rigid attributes, we assume that the scalar domain of the rigid attribute "Height" is {Short, Normal, Tall}, the scalar domain of the rigid attribute "Weight" is {Light, Average, Heavy, Fat}, and the domain of the compound fuzzy attribute is {Skinny, Average, Big}. For simplicity, we have assumed these scalar values to be symmetric trapezoidal fuzzy numbers with side parameter b. To determine such an aggregation function, non-fuzzy attribute values must first be converted and fuzzified into fuzzy sets so that all rigid attributes have the same data type. The selection of the most suitable aggregation function is decided by additional information provided by the system designer, which will be explained in a later section. To answer a null query with compound fuzzy attributes, a similarity measure is used to measure the "closeness" of the query attribute value with respect to the compounded attribute value of each tuple. Those tuples with similarities greater than the threshold value will be the answer to the null query. Therefore, a general approach for handling null queries that contain compound fuzzy attributes consists of: (1) conversion and/or fuzzification of non-fuzzy-set rigid attributes; (2) selection of the most suitable

Page 170: 45052361 a New Paradigm of Knowledge Engineering by Soft Computing

154 S.-L. Wang & Y.-J. Tsai

aggregation function according to the additional information given by system designer, (3) measuring the similarity between the compounded fuzzy attribute of each tuple and query statement. In the following, we will illustrate these steps in greater detail.

8.3.1 Unit-Interval Conversion and Fuzzification

A rigid attribute may be expressed as simple numbers, interval values, scalars, or sets of all these data types. When attributes of different data types are compounded, they have to be converted into the same range and the same data representation in order to apply aggregation functions. Data types such as numbers and sets of numbers have to be converted to unit-interval values and then fuzzified into fuzzy sets. For interval values, as the two end points are independent simple numbers, they can be converted separately and then fuzzified. Each interval value in a set of interval values is independent of the others and can be treated separately. Scalars that are not defined on the same range should also be converted to the unit interval.

8.3.1.1 Unit-Interval Conversion

To minimize the effect of extraordinary data, we choose an S-function to convert a rigid numerical attribute value x, which is not necessarily defined on the unit interval, into a new value y that lies in the unit interval. The conversion function is defined as

$$y = f(x) = \begin{cases} 1 - 2\left(\dfrac{x - \max}{\text{length}}\right)^2, & \text{med} < x \\[6pt] 2\left(\dfrac{x - \min}{\text{length}}\right)^2, & \text{med} \ge x \end{cases} \qquad (1)$$

for all $x \in [\min, \max]$, where "max" is the maximum of the domain of the attribute values, "min" is the minimum of the domain of the attribute values, "med" is the mean value of the domain of the attribute values, i.e., (max+min)/2, and "length" is the range of the domain of the attribute, i.e., (max-min).
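As a concrete illustration, the conversion of equation (1) can be sketched in Python as follows (a minimal sketch; the helper name unit_interval_convert is ours, not part of the original system):

def unit_interval_convert(x, lo, hi):
    """S-function of equation (1): map x in [lo, hi] into the unit interval."""
    length = hi - lo                  # "length" = max - min
    med = (lo + hi) / 2.0             # "med" = (max + min) / 2
    if x <= med:
        return 2.0 * ((x - lo) / length) ** 2
    return 1.0 - 2.0 * ((x - hi) / length) ** 2

# Worked examples from the text:
print(unit_interval_convert(170, 100, 220))   # Height 170cm on [100, 220] -> 0.652778
print(unit_interval_convert(60, 10, 120))     # Weight 60kg on [10, 120]  -> 0.4132...
print(unit_interval_convert(70, 10, 120))     # Weight 70kg on [10, 120]  -> 0.5868...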


For example, in Table 8.1, assume that the domain of the attribute "Height" for the numerical type is defined on [100cm, 220cm] and the domain of the attribute "Weight" is defined on another interval, [10kg, 120kg]. For the fourth tuple in Table 8.1, the value of the attribute "Height" is 170cm; after applying the conversion function (1), the converted value is 0.652778, as shown in Fig. 8.1. For the third tuple, the value of the attribute "Weight" is [60, 70] kg; after applying the conversion function (1), the converted value is [0.4132, 0.5868], as shown in Fig. 8.2.

Fig. 8.1 The conversion of attribute value 170cm.

Fig. 8.2 The conversion of attribute value [60kg, 70kg].

8.3.1.2 Fuzzification

In order to convert rigid number attributes into the fuzzy set representation, we define a possibility distribution

$$P(z) = \exp\left(-\left(\frac{z - y_i}{b}\right)^2\right) \qquad (2)$$

where $y_i$ is the unit-interval converted value and $b$ is the side parameter of the symmetric trapezoidal fuzzy numbers representing the compound fuzzy attribute. For example, the attribute value 170cm is fuzzified as shown in Fig. 8.3.

For the interval data type, we define a trapezoidal fuzzy set to fuzzify the unit-interval converted values. The trapezoidal fuzzy set is defined as follows,

$$P(z) = \begin{cases} \dfrac{b - y_1 + z}{b}, & z < y_1 \\[4pt] 1, & z \in [y_1, y_2] \\[4pt] \dfrac{y_2 + b - z}{b}, & z > y_2 \end{cases} \qquad (3)$$

(membership values below 0 are truncated to 0), where $y_1$ and $y_2$ are the converted left and right end-points of the interval value. For example, the attribute value [60kg, 70kg] is fuzzified as shown in Fig. 8.4.

Fig. 8.3 Conversion and fuzzification for the number 170cm.

Fig. 8.4 Conversion and fuzzification for the interval [60kg, 70kg].
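The two fuzzification steps can be sketched as follows, assuming the reconstructed forms of equations (2) and (3) above; the discretization grid and the side parameter b = 0.2 are illustrative choices, not values from the original text:

from math import exp

def fuzzify_number(y, b, grid):
    """Equation (2): possibility distribution around a converted value y."""
    return [exp(-((z - y) / b) ** 2) for z in grid]

def fuzzify_interval(y1, y2, b, grid):
    """Equation (3): trapezoidal fuzzy set for a converted interval [y1, y2]."""
    def mu(z):
        if z < y1:
            return max(0.0, (b - y1 + z) / b)    # rising edge
        if z <= y2:
            return 1.0                            # plateau
        return max(0.0, (y2 + b - z) / b)         # falling edge
    return [mu(z) for z in grid]

grid = [i / 10 for i in range(11)]                          # unit interval, 10 divisions
height_170 = fuzzify_number(0.652778, 0.2, grid)            # cf. Fig. 8.3
weight_60_70 = fuzzify_interval(0.4132, 0.5868, 0.2, grid)  # cf. Fig. 8.4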

8.3.2 Aggregation Functions

Aggregation functions on fuzzy sets are operations by which several fuzzy sets are combined in a desirable way to produce a single fuzzy set. Any aggregation function can be chosen to represent the semantic relationship between the rigid attributes and the expected compound fuzzy attribute [11]. Formally, an aggregation function on n fuzzy sets (n >= 2) is defined by a function h [5],

$$h : [0,1]^n \to [0,1].$$

When applied to fuzzy sets $A_1, A_2, \ldots, A_n$ defined on X, the function h produces an aggregate fuzzy set B. Thus, $B(x) = h(A_1(x), A_2(x), \ldots, A_n(x))$, $x \in X$, where X is the universal set.

For simplicity, we only consider averaging operators between the minimum and the maximum [3][9]. The averaging operators that we choose to illustrate the semantic relationship are as follows,

$$B_1 = \min(x_1, x_2, \ldots, x_n) \qquad (4)$$

$$B_2 = n \left/ \left(\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}\right)\right. \qquad (5)$$

$$B_3 = (x_1 \times x_2 \times \cdots \times x_n)^{1/n} \qquad (6)$$

$$B_4 = \frac{x_1 + x_2 + \cdots + x_n}{n} \qquad (7)$$

$$B_5 = 1 - \left((1 - x_1) \times (1 - x_2) \times \cdots \times (1 - x_n)\right)^{1/n} \qquad (8)$$

$$B_6 = 1 - n \left/ \left(\frac{1}{1 - x_1} + \frac{1}{1 - x_2} + \cdots + \frac{1}{1 - x_n}\right)\right. \qquad (9)$$

$$B_7 = \max(x_1, x_2, \ldots, x_n) \qquad (10)$$

where $B_2$ is the harmonic mean, $B_3$ is the geometric mean, $B_4$ is the arithmetic mean, $B_5$ is the or-geometric mean, and $B_6$ is the or-harmonic mean. The relationship among these averaging operators is as follows,

$$\min = B_1 \le B_2 \le B_3 \le B_4 \le B_5 \le B_6 \le B_7 = \max.$$
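A direct transcription of operators (4)-(10) as Python functions (a sketch; degenerate inputs, e.g. a zero degree for the harmonic mean or a degree of 1 for the or-harmonic mean, are left unguarded, exactly as in the formulas):

import math

def b1(xs): return min(xs)                                             # (4)  minimum
def b2(xs): return len(xs) / sum(1 / x for x in xs)                    # (5)  harmonic mean
def b3(xs): return math.prod(xs) ** (1 / len(xs))                      # (6)  geometric mean
def b4(xs): return sum(xs) / len(xs)                                   # (7)  arithmetic mean
def b5(xs): return 1 - math.prod(1 - x for x in xs) ** (1 / len(xs))   # (8)  or-geometric mean
def b6(xs): return 1 - len(xs) / sum(1 / (1 - x) for x in xs)          # (9)  or-harmonic mean
def b7(xs): return max(xs)                                             # (10) maximum

print([round(f([0.3, 0.6]), 4) for f in (b1, b2, b3, b4, b5, b6, b7)])
# -> a non-decreasing sequence, illustrating min = B1 <= ... <= B7 = max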

8.3.3 Similarity Measures

A similarity measure of two fuzzy sets is a measure that describes the similarity between fuzzy sets [5]. A similarity measure SM is a real function defined as

$$SM : F \times F \to [0,1],$$

with the following properties,


(1) $SM(A, B) = SM(B, A)$, $\forall A, B \in F$;

(2) $SM(D, D^c) = 0$, $\forall D \in P(X)$;

(3) $SM(C, C) = \max SM(A, B)$, $\forall C \in F$;

(4) $\forall A, B, C \in F$, if $A \subset B \subset C$, then $SM(A, B) \ge SM(A, C)$ and $SM(B, C) \ge SM(A, C)$;

where X is the universal set, F is the class of all fuzzy sets on X, and P(X) is the class of all crisp sets on X.

Many distance-based, set-theoretic-based, and matching-function-based similarity measures have been proposed [7][12][17][19]. In order to measure the similarity between the compound fuzzy attribute value of each tuple and the query statement, we propose that the following properties should be satisfied by a similarity measure,

(1) $A \cap B = \phi \Rightarrow SM(A, B) = 0$;

(2) $A = B \Rightarrow SM(A, B) = 1$;

(3) $A \cap B \supseteq C \cap D \Rightarrow SM(A, B) \ge SM(C, D)$;

where A, B, C, D are fuzzy sets on the universal set X. In fact, we select a similarity measure based on a distance function, defined as follows:

$$SM(A, B) = 1 - \frac{1}{n}\sum_{i=1}^{n} |a_i - b_i| \qquad (11)$$

where A and B are fuzzy sets and $a_i \in A$, $b_i \in B$ are the membership degrees at the i-th point of a common support of size n.
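Assuming the normalized form of equation (11) and a common discretized support for both fuzzy sets, the measure is one line of Python:

def similarity(a, b):
    """Equation (11): one minus the mean absolute difference of the
    membership degrees a_i and b_i over a common support."""
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)

print(similarity([0.0, 0.5, 1.0], [0.0, 0.5, 1.0]))   # identical fuzzy sets -> 1.0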

8.4 Example

Allowing users to specify compound fuzzy attributes in query statements requires a mechanism for the system designer to define the relationship between rigid attributes and compound fuzzy attributes. We assume that only a chosen set of rigid-attribute-value pairs will be given their corresponding expected compound fuzzy attribute values, $ex_i$, by the system designer. This chosen set of attribute-value pairs may not cover the entire database. The relationship between the rigid attributes and the compound fuzzy attribute will then be determined based on this given information. For example, the first column of Table 8.2 shows a chosen set of attribute-value pairs from the values of the domains of the rigid attributes, and the second column shows the given expected values of the compound fuzzy attribute.

To select the most suitable aggregation function to represent the relationship, all nonfuzzy attribute values must be unit-interval converted and fuzzified using equations (1)-(3). For each of these attribute-value pairs, the averaging operators of equations (4)-(10) are applied to calculate the aggregated values. Using equation (11), the degrees of similarity between each aggregated value and the expected value are shown in Table 8.2, in columns three to nine for the seven operators, respectively. It can be seen that the Or-Geometric operator achieves the largest similarity measure on average and should be selected as the most suitable aggregation function representing the semantic relationship between "Height", "Weight", and "Build".
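The whole selection step can be sketched by reusing the operators and the similarity function defined above (illustrative only; the clamping of degrees is our guard against division by zero in the harmonic operators, and the data-preparation details are omitted):

OPERATORS = {"Min": b1, "Harmonic": b2, "Geometric": b3, "Arithmetic": b4,
             "Or-Geometric": b5, "Or-Harmonic": b6, "Max": b7}

def select_aggregator(training_pairs, expected_sets):
    """For each operator, aggregate the fuzzified rigid attributes of every
    chosen attribute-value pair and compare the result (equation (11)) with
    the designer-given expected set; return the operator with the largest
    average similarity."""
    best = (None, -1.0)
    eps = 1e-9
    for name, op in OPERATORS.items():
        total = 0.0
        for rigid_sets, expected in zip(training_pairs, expected_sets):
            n = len(expected)
            agg = [op([min(max(s[i], eps), 1 - eps) for s in rigid_sets])
                   for i in range(n)]        # pointwise aggregation over the grid
            total += similarity(agg, expected)
        avg = total / len(expected_sets)
        if avg > best[1]:
            best = (name, avg)
    return best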

Table 8.2 Selecting the most suitable aggregation function. (For each chosen attribute-value pair and its expected compound attribute value, the table lists the degree of similarity achieved by each of the seven averaging operators; the Or-Geometric column attains the largest values on average. The individual numeric entries are not reproduced here.)

Fig. 8.5 Compounded fuzzy attribute value for the third tuple and the query attribute value "Skinny".

Fig. 8.6 Compounded fuzzy attribute value for the fifth tuple and the query attribute value "Skinny".

To answer the query, the degree of similarity between the compound fuzzy attribute value of each tuple and the query statement is calculated using equation (11). Fig. 8.5 shows the compound fuzzy attribute for tuple 003 and the query attribute. The degree of similarity is SM((Normal, [60, 70]), Skinny) = 0.021, which is less than the threshold value 0.75, so tuple 003 will not be accepted as an answer to the query. Fig. 8.6 shows the compound fuzzy attribute value for tuple 005 and the query attribute. The degree of similarity is SM((Short, [40, 50]), Skinny) = 0.943, which is greater than the threshold value 0.75, so tuple 005 will be accepted as an answer to the query.

8.5 Conclusions

Null queries caused by missing attribute values or syntactical errors in query statements have been studied widely. However, it is more difficult to handle null answers caused by semantic errors. To reduce the occurrence of null answers in fuzzy queries, compound fuzzy attributes derived from simple numbers were proposed previously. In this work, we further extend the concept of compound fuzzy attributes so that they can be derived from more complex data types such as interval values, scalars, and sets of numbers, interval values, and scalars. In fact, we propose a general approach for handling fuzzy queries that contain compound fuzzy attributes derivable from fuzzy databases. Database management systems that can handle this type of ambiguous attribute in null queries can not only reduce the occurrence of null answers but also provide a more user-friendly query environment. For further investigation, we plan to extend the approach to fuzzy data types based on fuzzy sets of numbers, interval values, and scalars, as well as similarity/proximity-based fuzzy data types. In addition, different selection schemes for the aggregation function will be considered.

References

[1] P. Bosc and O. Pivert, "Fuzzy Querying in Conventional Databases," in Fuzzy Logic for the Management of Uncertainty (L. Zadeh and J. Kacprzyk, Eds.), John Wiley, New York (1992).

[2] B. P. Buckles and F. E. Petry, "Fuzzy Databases and Their Applications," in Fuzzy Information and Decision Processes (M. Gupta and E. Sanchez, Eds.), North-Holland, New York, pp. 361-371 (1982).

[3] D. Dubois, "A Review of Fuzzy Sets Aggregation Connectives," Information Sciences, Vol. 36, pp. 85-121 (1985).

[4] R. George, F. E. Petry, B. P. Buckles, and R. Srikanth, "Fuzzy Database Systems - Challenges and Opportunities of a New Era," International Journal of Intelligent Systems, Vol. 11, pp. 649-659 (1996).

[5] G. J. Klir and B. Yuan, Fuzzy Sets and Fuzzy Logic: Theory and Applications, Prentice Hall PTR (1995).

[6] D. H. Kraft and F. E. Petry, "Fuzzy Information Systems: Managing Uncertainty in Databases and Information Retrieval Systems," Fuzzy Sets and Systems, Vol. 90, pp. 183-191 (1997).

[7] X. Liu, "Entropy, Distance Measure and Similarity Measure of Fuzzy Sets and Their Relations," Fuzzy Sets and Systems, Vol. 52, pp. 305-318 (1992).

[8] J. M. Medina, M. A. Vila, J. C. Cubero, and O. Pons, "Towards the Implementation of a Generalized Fuzzy Relational Database Model," Fuzzy Sets and Systems, Vol. 75, pp. 273-289 (1995).

[9] M. Mizumoto, "Pictorial Representations of Fuzzy Connectives, Part I: Cases of t-Norms, t-Conorms and Averaging Operators," Fuzzy Sets and Systems, Vol. 31, pp. 217-242 (1989).

[10] A. Motro, "Query Generalization: A Method for Interpreting Null Answers," Proceedings of the 1st International Conference on Expert Database Systems, pp. 597-616 (1986).

[11] T. Nomura, T. Odaka, N. Ohki, T. Yokoyama, and Y. Matsushita, "Generating Ambiguous Attributes for Fuzzy Queries," Proceedings of the 1992 IEEE International Conference on Fuzzy Systems, pp. 753-760 (1992).

[12] C. P. Pappis and N. I. Karacapilidis, "A Comparative Assessment of Measures of Similarity of Fuzzy Values," Fuzzy Sets and Systems, Vol. 56, pp. 171-174 (1993).

[13] S. Parsons, "Current Approaches to Handling Imperfect Information in Data and Knowledge Bases," IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 3, pp. 353-372 (1996).

[14] S. L. Wang and Y. J. Tsai, "Null Queries with Interval-Valued Ambiguous Attributes," Proceedings of the 1998 IEEE International Conference on Systems, Man, and Cybernetics, San Diego, USA, October (1998).

[15] S. L. Wang and Y. J. Tsai, "Extending Compound Fuzzy Attributes for Fuzzy Queries," Proceedings of the 5th International Conference on Soft Computing, Fukuoka, Japan, October (1998).

[16] S. L. Wang and Y. J. Tsai, "Compounding Fuzzy Attributes from Scalars with Uncertainty for Fuzzy Queries," Proceedings of the 1998 6th National Conference on Fuzzy Theory and its Applications, Taiwan, December (1998).

[17] X. Wang, B. De Baets, and E. Kerre, "A Comparative Study of Similarity Measures," Fuzzy Sets and Systems, Vol. 73, pp. 259-268 (1995).

[18] M. Zemankova, "FILIP: A Fuzzy Intelligent Information System with Learning Capabilities," Information Sciences, Vol. 14, No. 6, pp. 473-486 (1989).

[19] R. Zwick, E. Carlstein, and D. Budescu, "Measures of Similarity among Fuzzy Concepts: A Comparative Analysis," International Journal of Approximate Reasoning, Vol. 1, pp. 221-242 (1987).


Chapter 9

Fuzzy System Description Language

Kazuhiko Otsuka1, Yuichiro Mori2, and Masao Mukaidono1

1 Meiji University, 2 Kochi University

Abstract

As a first step toward standardization of a practical programming language for fuzzy system applications, we proposed the Fuzzy system Description Language (FDL) in 1996. The specification of the first version of FDL was not a definitive edition: it was designed with hardware-coding of fuzzy control systems based on fuzzy inference as a prerequisite. So although it fulfills the intended functions, several problems arise for unexpected applications. In this article, we first describe the specification of standardized FDL with its background and properties. Then, we consider some problems (the assignment operation, the comparison operation, the internal expressions, etc.) arising from wide applications of FDL, and describe improvements of FDL. Finally, we describe fuzzy inference systems based on the indirect inference method with FDL and discuss some of their properties.

Keywords : Fuzzy Systems, Fuzzy control, Fuzzy Inference, Discrete Expression

9.1 Introduction

9.1.1 Background

At present, there are various systems applying fuzzy theory. Most of them aim at fuzzy control based on fuzzy inference. In these systems, the description style of knowledge bases for fuzzy inference, for example the rule format and operations, looks quite similar from system to system, although the implementation of operation methods, data expression, etc. is specific to each system designer. Consequently, it is very hard to effectively share knowledge bases which have already been accumulated. In addition to this implementation problem on the software side, in systems using hardware-optimized fuzzy inference, field engineers design the systems with native development environments depending on specific hardware, and with primitive languages like assembler, to optimize the stored data and program code in ROM. In these settings, programmers have to master highly specialized skills and knowledge, and the system contents are usually hard for others to understand. In order to efficiently share and store knowledge bases, we need a standard system description language for fuzzy systems that does not depend on the system architecture and target field. In this article, we describe the fuzzy system description language (FDL) as such a standard system description language for fuzzy systems.*

9.1.2 Current status of fuzzy system construction

There are various methods for composing fuzzy systems, such as using generic built-in hardware as in home electronics, using a native fuzzy processing chip and a large knowledge base as in expert systems, or using a software-only system written in a programming language. These systems are originally offered by the industries, laboratories, and universities concerned with their development. In the case of hardware systems, the development environments are closely tied to the hardware used, so replacement of a completed system is very hard. On the software side, the system description languages for fuzzy systems are extensions of popular programming languages like C, Lisp and Prolog that express fuzzy data with user functions, matrices, and processes for fuzzy theory [4]-[10]. These extensions use different definitions based on the properties of each system, case by case, and this tendency is especially strong in practical systems. Without a standard description language for fuzzy systems, the cultivated know-how of system development cannot be applied by others. Therefore, we need to standardize the description language for fuzzy systems so that this language can help to promote fuzzy system development further in the future.

*The fuzzy system description language (FDL) [1], [2], [3] is a part of project activities in 1994 by EIAJ (Electronic Industries Association of Japan), which is organized by hardware makers.


9.2 Fuzzy system description language (FDL)

9.2.1 Outline of FDL

Most fuzzy systems aim at fuzzy control based on fuzzy inference. The main data in these systems are fuzzy data, which appear as fuzzy sets and fuzzy truth values expressed with membership functions, but numeric data and character data, which are used in usual programming, are also used in these systems. The fuzzy inference is the kernel of a fuzzy system; it works together with other parts like pre-processing and post-processing. In other words, a fuzzy system is a special case of a usual system involving fuzzy information processing in its main part. So, the description language for fuzzy systems must be general enough for usual information processing as well as for fuzzy information processing.

When we design a programming language, we can select a language style in one of two ways. One is an original language which can describe fuzzy information processing suitably, and the other is an extension of an existing usual language which is able to treat the fuzzy information processing part. In the former case, we must learn a new language format and programming technique, and such a language has poor affinity with existing systems.

It is important which language we choose as a base language for FDL. There have been development environments based on Lisp [6] and Prolog [7],[8], but they are particular development systems dedicated to the subject of research. In real application fields, hardware designers use native hardware description languages like VHDL (VHSIC Hardware Description Language) [11], Verilog [12], etc., and they efficiently store programs to ROM with assembler language. A hardware description language is only effective for hardware, and a primitive language like assembler cannot solve the various present problems. Therefore these languages are not good choices. Recently, object-oriented languages like C++ and Java have become popular in the field of large system development by project teams, but they have not become popular among programmers in other fields. Moreover, the language specification of Java is being updated frequently. With more practical consideration, FDL is based on ANSI C, which has high portability and is established as a system programming language.

As a recent tendency, we have also noticed that the development environments on many platforms like Microsoft Windows are shifting to RAD (Rapid Application Development) and visual development based on object-oriented languages. There are many instances of system development with Java, which is adapted to multi-platform and network environments. The affinity between the implementation of fuzzy theory and the facilities of C++ and Java will be better than with ANSI C. Therefore, standardization based on an object-oriented language will be considered as the next stage of this research in the near future.

In the rest of this section, we explain the basis of the standard of FDL: Fuzzy System Description Language (EIAJ, AEX-2002) [1]. In the next section, we will point out some problems in FDL based on experience of using it, and propose some improvements for these problems.

9.2.2 Specification of FDL

9.2.2.1 Basic Specification of FDL

The data types and operators for fuzzy data are not defined in ANSI C, the base language of FDL. So FDL defines the following:

• fuzzy data type
• notations of membership function
• operators for fuzzy data
• other functions for fuzzy theory

In addition, as most fuzzy systems are implemented for fuzzy control based on fuzzy inference, FDL has to include the following:

• management of knowledge base

The other definitions are the same as in ANSI C.

The main purpose of FDL is to describe systems that apply fuzzy theory in wide application domains. But there are many matters for which it has not been clearly decided how they should be treated formally in fuzzy theory. So at the current stage, we focus on the description of fuzzy inference engines and the parts related to the fundamentals of fuzzy theory.

9.2.2.2 Fuzzy type

Because the base language of FDL is ANSI C, all data used in a program must be declared in advance. We defined fuzzy type for fuzzy sets and fuzzy truth value for numerical truth values. When we declare a fuzzy type, we must specify the tag name, the type, and the range of the support set.

fuzzy name { type, left, right };
type = { int | float | double | enum }

In this example, "name" is the tag name of this fuzzy type; it is treated like a struct or union tag in C. "type" is the support set type of this fuzzy set, and the range of "name" is limited by the two values left and right. The support set types are the integer type (int), the enumeration type (enum) limited to finite elements, and the floating-point types (float, double), which have a continuous space as the support set. In the case of the enumeration type, the declaration form is the following:

enum color { red, green, blue, black, white };
fuzzy background { color };

When the support set is of continuous type, we have to write a designated division number following the range field. If the division number field is omitted, this value is assumed to be 10 by default.

fuzzy speed { double, 0.0, 200.0, 100 };

In the above example, the last value (100) is the division number. This fuzzy type therefore stores its membership function as 101 points, which divide the interval from 0.0 to 200.0 into 100 equal periods. If a referenced point differs from the storing points, it is assumed to take the nearest storing point.
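A sketch of this storage scheme in Python (the class and its layout are illustrative; the actual internal data structure of an FDL implementation is not specified here):

class FuzzyGrid:
    """Membership degrees stored at div + 1 equally spaced points of
    [left, right]; a referenced point snaps to the nearest storing point."""
    def __init__(self, left, right, div=10):
        self.left, self.right, self.div = left, right, div
        self.degrees = [0.0] * (div + 1)

    def index(self, x):
        step = (self.right - self.left) / self.div
        i = round((x - self.left) / step)          # nearest storing point
        return min(max(i, 0), self.div)

    def degree(self, x):
        return self.degrees[self.index(x)]

speed = FuzzyGrid(0.0, 200.0, 100)   # mirrors: fuzzy speed { double, 0.0, 200.0, 100 };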

The declaration form of a multi-dimensional fuzzy type is the following:

fuzzy grid { int, 0, 5 } { double, -10.0, 10.0, 50 };

The variable declaration of a fuzzy type is the same as for a numerical type like "int" in C:

fuzzy speed slow, medium, fast;

9.2.2.3 The notation of membership function

There are four kinds of notations for a membership function in FDL.


• Enumeration type
• Function type
• Singleton type
• System type

These notations can be used only as initial values in variable declarations.

(a) Enumeration type. This notation is described by pairs of an element with its membership degree (also called a truth value). The membership degrees at points not described are assumed to be 0.

fuzzy TypeName VarName = elements { ( ElementName, TruthValue ), ... };

The fuzzy data VarName belongs to fuzzy type TypeName. TruthValue is the membership degree of ElementName.

For example,

fuzzy color sample_color = elements { (red, 0.5), (yellow, 0.1), (blue, 1.0) };

(b) Function type. This type uses a user-defined function, as is usual in many general programming languages. The function receives elements as arguments and returns one fuzzy truth value as the membership degree. function is the keyword of this notation, followed by the name of the function used as the definition. The general form is the following:

fuzzy TypeName VarName = function DefFuncName();
fuzzy_truth DefFuncName( ArgList )
{ ... }

For example, the fuzzy data slow, which belongs to fuzzy type speed, is defined by the function slow_value:

fuzzy speed slow = function slow_value();

fuzzy_truth slow_value(double pos)
{ ... }


(c) Singleton type. The singleton type must specify exactly one element. Its membership degree is 1 for that element and 0 for every other:

fuzzy TypeName VarName = singleton { ElementName };

For example, attention_color, in which the degree of yellow is 1 and all other degrees are 0, is described as follows:

fuzzy color attention_color = singleton { (yellow) };

(d) System notation. At present, most fuzzy types used in fuzzy systems are one-dimensional, numeric fuzzy data, which means the support set is either an integer type or a floating-point type. Moreover, their membership functions usually have simple shapes like triangles or trapezoids. Therefore, we prepared four notations for these membership functions: enumeration with interpolation (points), vector type (vector), triangle type (triangle) and trapezoid type (trapezoid). In these notations, we describe a membership function by pairs of characteristic elements and their membership degrees. The membership degrees of other, unspecified points are calculated by linear interpolation between the two nearest specified elements.

fuzzy speed medium = triangle { 40, 60, 80 };
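For instance, the triangle notation can be expanded onto the storing points as follows (a sketch reusing the FuzzyGrid class above; unspecified points between the three characteristic elements are filled by the linear interpolation just described):

def triangle(grid, a, m, c):
    """Triangular membership: 0 at a, 1 at m, 0 at c, linear in between."""
    step = (grid.right - grid.left) / grid.div
    for i in range(grid.div + 1):
        x = grid.left + i * step
        if a < x <= m:
            grid.degrees[i] = (x - a) / (m - a)
        elif m < x < c:
            grid.degrees[i] = (c - x) / (c - m)
        else:
            grid.degrees[i] = 0.0
    return grid

medium = triangle(FuzzyGrid(0.0, 200.0, 100), 40, 60, 80)
# corresponds to: fuzzy speed medium = triangle { 40, 60, 80 };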

9.2.2.4 Operator

The operations for the fuzzy type are the following:

• Assignment operation
• Comparison operation
• Logical operation

In an operation between a fuzzy type and a usual numerical data type, the latter is cast into fuzzy data whose value is a pair of the element and a membership degree of 1.

(a) Assignment operation. The assignment operator is "=", as in C. Numerical data and variables of fuzzy type can be assigned anywhere in a program, but the fuzzy data notations can be used only in variable declarations.


(b) Comparison operation. The comparison operations for fuzzy truth values are the same as in usual C. On the other hand, there are only four comparison operations for the fuzzy type, ==, !=, <, >, and they are evaluated only when the support sets are numeric and one-dimensional. "==" can be used for all fuzzy types, and the result is true only when the two fuzzy data are completely identical. The "!=" operator is the negation of "==". In the comparison for larger or smaller, if the two fuzzy data overlap, the result is false, because we consider it impossible to compare them. Although there are many possible interpretations in this case, we adopt the above one.

Fig. 9.1 The comparison of two fuzzy data.

In Fig. 9.1, fuzzy data A is smaller than B and C, but the results of both "<" and ">" between B and C are false.
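On the discretized representation, these semantics can be sketched as follows (illustrative; both operands are assumed to belong to the same fuzzy type, i.e., to share one grid):

def fz_eq(a, b):
    """'==': true only when the two fuzzy data are completely identical."""
    return a.degrees == b.degrees

def fz_lt(a, b):
    """'<': true only when the support of a lies entirely to the left of the
    support of b; overlapping supports are incomparable, so the result is false."""
    step = (a.right - a.left) / a.div
    sup_a = [a.left + i * step for i, d in enumerate(a.degrees) if d > 0]
    sup_b = [b.left + i * step for i, d in enumerate(b.degrees) if d > 0]
    return bool(sup_a and sup_b) and max(sup_a) < min(sup_b)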

(c) Logical operation. There are three kinds of logical operations: and, or, not. As one of the features of fuzzy theory, there are various definitions of the logical operations. The popular definitions are of four kinds: logical, algebraic, bounded, and drastic. Moreover, there are many reports on useful combinations of them. Therefore, it is improper to fix their definitions. We instead let the operator definitions be selected by the programmer, similar to the operator-overload mechanism in C++.

fuzzy_and my_original_and;


In the above example, fuzzy_and is the keyword for the definition of the and-operation, and the function named my_original_and is the definition of the operator and. Such a function takes two arguments (one argument for negation), which are fuzzy truth values, and returns one fuzzy truth value. Table 9.1 lists the popular definitions and their function names.

Table 9.1 Keyword of popular logical operation.

           and                  or
Logical    fuzzy_logic_and      fuzzy_logic_or
Algebraic  fuzzy_algebric_and   fuzzy_algebric_or
Bounded    fuzzy_bounded_and    fuzzy_bounded_or
Drastic    fuzzy_drastic_and    fuzzy_drastic_or

9.2.2.5 Rule Base

The rule-base expressing the knowledge is the most important part of a system to which fuzzy inference is applied. In general, a rule-base is described in if-then form. We can select the operation definitions in each rule-base, because there are many inference methods. Table 9.2 shows the selectable operations and their default definitions.

Table 9.2 Operations in Rule Base.

Operation     KeyWord        Default
And           fuzzy_and      fuzzy_logic_and
Or            fuzzy_or       fuzzy_logic_or
Not           fuzzy_not      fuzzy_one_minus
Modification  modification   fuzzy_logic_and
Aggregation   aggregation    fuzzy_logic_or

rule_base control(fuzzy speed self, fore) (fuzzy action result)
{
    if self is slow and fore is slow then result is no_action;
    if self is slow and fore is fast then result is accelerate;
}


In this example, the two fuzzy data self and fore, which belong to speed, are the input arguments of this rule-base, and result, of type action, is the output data. On the left side of "is" in the conclusion part, we can write only the output variable. Moreover, in the conclusion part of a rule we can use a one-dimensional linear function (Takagi-Sugeno model [15]), and we can describe a negative conclusion part (else) and a weight part (with) for each rule. The rule-base has a mechanism to include other rule-bases (include) to recycle knowledge bases. C, a procedural programming language, executes statements in order, but in our rule-base the order of the rules does not influence the evaluation. This also applies to included rule-bases.

9.2.3 Internal Expression

The quantity of information required by fuzzy data is normally much larger than for usual data, because fuzzy data are expressed by a set of elements and their membership degrees. This situation is unavoidable given the concept of fuzzy theory, in which expression is vaguer than the conventional one; but this property causes the information quantity to become huge when we express fuzzy data completely. This is especially true when the support set is of continuous type. Therefore, we usually use approximate expression methods.

Here, we examine the internal expressions of fuzzy data.

9.2.3.1 User Function method

Fuzzy data with a continuous support set are usually expressed by analytic functions. One way is to represent the whole membership function with a user-defined function in the program. This method has the advantages of small data quantity and high precision. However, it is very hard to change the membership shape, because the fuzzy data is part of the program code. Moreover, in general we cannot store the result of a calculation in the same form. We conclude that this method is unfit for the internal expression of fuzzy data.

9.2.3.2 Representative points method

Most membership functions have simple shapes like triangles and trapezoids. In this case, we can express the whole membership function with high precision and a small data quantity by designating typical points. There is no problem with the results of calculations or with changing data in this method. Therefore, it fits as an internal expression of fuzzy data. However, the amount of information needed to specify a shape increases in proportion to the complexity of the shape. In general, fuzzy data become more complicated with repeated operations in this method. When we implement it in hardware to gain speed and compose a built-in control system, we have strong restrictions on memory size and operations. Therefore this method is considered unfit, as the main target of FDL at this point is practical fuzzy control.

Fig. 9.2 Representative Points Method.

9.2.3.3 Interval division method

The final method is to store fuzzy data as matrix elements divided by a constant interval on the continuous support set. This method usually needs larger memory space than the others and is poorer in data precision. However, it is not affected by the shape of the fuzzy data and can express it in constant memory size. This is a suitable expression for the implementation of fuzzy data in hardware, because in this method the amount of data, which is decided by the requested precision at first, does not increase.

In conclusion, we adopt this expression as the internal expression of fuzzy data in the FDL specification.


Fig. 9.3 Interval Division Method.

Table 9.3 Specification of expression methods.

User Function method. Weaknesses: difficult to change degree values; results of calculations cannot be stored in the same form. Advantages: high precision; expression with low memory space.

Representative points method. Weaknesses: data amount depends on the shape; interpolation is needed when referencing elements. Advantages: expression with low memory space; high precision.

Interval division method. Weaknesses: many useless data are contained; slow calculation speed. Advantages: data amount does not depend on the shape; operation is the repetition of simple work; easy to parallelize.

9.3 Improvement of standardized FDL

9.3.1 A few problems of standardized FDL

The final aim of FDL is to describe all fuzzy data processing and procedures in fuzzy theory. The specification of FDL standardized by EIAJ [1] is not a definitive edition. This specification was designed with hardware-coding of fuzzy control systems based on fuzzy inference as a prerequisite. So, although it fulfills the intended functions, several problems arise for unexpected applications. In this section, we pick up these problems and propose improvements.

9.3.2 Improvement of assignment operation

In FDL, all fuzzy sets are expressed using limited data, and each datum consists of a pair of an element and its membership degree, so that we can define fuzzy data by enumerating the pairs. But this notation is very inconvenient for some practical uses. Therefore we equipped FDL with a triangle-notation and a trapezoid-notation. Fuzzy data defined by these notations are automatically converted into the enumeration form with a specific accuracy.

For example, the following definition represents Fig. 9.4 and is actually stored as Fig. 9.5.

fuzzy speed FAST = triangle { 60, 80, 100 };

Fig. 9.4 Fuzzy data defined by triangle-notation.

Fig. 9.5 Actually stored data.

According to the C language specification (FDL is based on C), these conversion procedures must be defined for each fuzzy type. When the definition is described as an initializer, as above, the syntax parser can decide which procedure should be used, because the fuzzy type of the definition is stated clearly on the left side of the "=" operator.

FAST = triangle { 60, 80, 100 };

When we use dynamic allocation of fuzzy data in an assignment statement, then according to the C language syntax analysis rules the syntax parser cannot detect the fuzzy type of the left part. Therefore, for upward compatibility with C, we did not define the assignment operation of fuzzy data except in an initializer.

As previously mentioned, standardized FDL aims at hardwarizing fuzzy systems. Limiting the assignment operation enables us to store the initial fuzzy data in ROM, which is cheaper than RAM. If we use dynamic allocation of fuzzy data, the dependence on RAM becomes high.

If we cannot use the assignment operation within a program, in other words, if we cannot use dynamic allocation of fuzzy data in a program, this is a very strong constraint and a restriction of the expressive ability of a programming language. When we consider the various uses of FDL, flexibility of description is more important than hardwarizing. Additionally, the price of RAM becomes cheaper day by day, and the problem of cost is not as important as before.

In the current FDL, a function to change degrees is prepared only for an individual element of fuzzy data. By using this function for all elements of the support set, we can realize the same effect as an assignment operation. But this method is not practical. So we propose to improve FDL such that fuzzy data described with triangle, trapezoid, and so on can be assigned to a variable like a usual numerical value.

9.3.3 Improvement of comparison operation

The comparison operation in the current FDL returns a {0,1} value, meaning completely satisfied or not, instead of a fuzzy value in [0,1]. For example, for the coincidence operator (==), only in the case that the degrees of all elements of the two fuzzy data are completely equal to each other is the return value 1 (true); in every other case it is 0 (false).

Fig. 9.6 Comparison of fuzzy data.

For example, in the relation shown in Fig. 9.6, A==B and A==C both return 0 (false). But this definition is clearly different from the general concept of fuzzy theory. In the case of A==B, it would be proper to return a similarity degree in [0,1].

When we consider a definition of the operation on the basis of this concept, there is no typical interpretation for the degree of coincidence of two fuzzy data that intersect each other. We cannot easily define this similarity operator now, because many discussions are necessary to fix the operation method. This definitional problem of the intersection degree is related to the other comparison operations as well. We can re-define the operations for the logical operators (not, and, or) in an FDL program. Similarly, we propose to change the FDL specification such that the comparison operators can be re-defined in a program.

9.3.4 Improvement of Internal expression

In the case that the support set of fuzzy data is finite, we can express it by an array or a list structure and can easily use it in a program. But in the case that the support set is of continuous type, the only method to express the fuzzy data exactly is to define the membership function as a function expressed by a formula, and this method has many problems. So we express fuzzy data with a continuous support set by finite (limited) data. There are two typical methods for this expression, as mentioned in 9.2.3.2 and 9.2.3.3:

(1) Interval division method
(2) Representative points method

At present, in the standardized FDL, fuzzy data are treated with method (1) for hardwarizing fuzzy systems. But, as mentioned earlier, the situation has changed. In view of the characteristics of each method discussed in 9.2.3.2 and 9.2.3.3, we have to reconsider the necessary functions and abilities of representing fuzzy data for future usage.

9.3.4.1 New requirements and abilities for FDL

One of the targets of the standardized FDL is reduction of cost; in other words, to reduce the number of parts (chips). A small memory and a simple processor are good for hardwarizing. Additionally, a widely utilizable fuzzy system needs constant precision and stable throughput speed. So the interval division method was adopted to satisfy those requirements.

At the current stage, FDL is used for experimental production in research and development, and for algorithm verification. In other words, cases of software-level system construction will increase. Therefore, the requirements on hardware resources, precision, and throughput speed differ from the case of hardwarization. There is no strong need for memory efficiency, because recent computers have much memory. Instead, precision is more important than throughput speed.

• Requirements for hardwarization
  - Small and constant memory requirement
  - Constant precision
  - Stable throughput speed with a simple processor

• New requirements
  - Dynamic definition of fuzzy data
  - High precision
  - Adaptation to the constraints of hardwarization

9.3.4.2 Fusion of two methods

The internal expression of fuzzy data must be changed following the shift in the purpose of FDL. But we cannot do away with the hardwarizing function, because applications of FDL to hardware realization have not faded out. Therefore, we added the representative points method to the interval division method as an internal expression of fuzzy data. When we define a fuzzy type as usual, we can use the representative points method; in this case, the division number of the interval division method becomes the upper limit of the number of elements.

Fig. 9.7 Example of definition of near points.

In this case, however, some problems remain. When points are extremely close, as shown in Fig. 9.7, information is lost when the expression method is converted (Fig. 9.8). This problem can be avoided on the programmer's side, because the division by the designated number depends on the precision demand that the programmer specifies. An implementation in hardware must limit the expression method to the interval division method; this should be decided by a compiler option or a limitation of the system specification.

Fig. 9.8 Information loss by changeover of expression method.

9.4 Enhancement of the indirect inference

The popular inference method in fuzzy logic is the direct inference method proposed by Zadeh [16],[17], Mamdani [18], and others. There is also the indirect inference method proposed by Tsukamoto [20], Baldwin [19], and others. Direct inference is calculated directly by set operations on the fuzzy data given as input and knowledge. The indirect inference method, on the other hand, executes inference in the following way: first the whole fuzzy data is mapped to the space of fuzzy truth values, then the essential part of the inference is processed in the fuzzy truth space, and finally the result is returned to the original space.

In this section, we focus on the indirect inference method. We examine the problems, and their solutions, that arise when we describe systems using the indirect inference method with FDL.

9.4.1 The truth qualification

For example, suppose we are given the following fuzzy predicate:


((x) is A) is more-or-less-true

In this case, when we fix the variable x to an element e in the universal set U, we can replace the above predicate by the following proposition:

((e) is A) is more-or-less-true

If the linguistic truth value more-or-less-true is expressed by $\tau$, then we can get the new truth value corresponding to e by the following function:

$$\mu_{A'}(e) = \mu_\tau(\mu_A(e))$$

We can obtain the new fuzzy set A' when this operation is executed for the whole universal set U. This operation is shown in Fig. 9.9, in which the left part is rotated 90° to the left from the original figure.

Fig. 9.9 The truth qualification.
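On a discretized representation, the truth qualification is a pointwise composition; the sketch below evaluates mu_tau by linear interpolation on the truth grid (names and grids are illustrative, not from the original text):

def interp(grid_x, grid_y, x):
    """Linear interpolation of the piecewise-linear function (grid_x, grid_y) at x."""
    if x <= grid_x[0]:
        return grid_y[0]
    for i in range(1, len(grid_x)):
        if x <= grid_x[i]:
            t = (x - grid_x[i - 1]) / (grid_x[i] - grid_x[i - 1])
            return grid_y[i - 1] + t * (grid_y[i] - grid_y[i - 1])
    return grid_y[-1]

def truth_qualification(mu_a, tau_x, tau_y):
    """mu_A'(e) = mu_tau(mu_A(e)) for every stored element e of the support;
    mu_a holds the degrees of A, (tau_x, tau_y) the fuzzy truth value tau."""
    return [interp(tau_x, tau_y, d) for d in mu_a]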

9.4.2 The converse truth qualification

The converse truth qualification is the inverse problem of the truth qualification. For example, when the following equation and fuzzy predicates A, B are given, we can calculate the linguistic truth value $\tau$ from the two fuzzy sets A and B. This process is called the converse truth qualification.


((x) is A) is $\tau$ = (x) is B

In this case, we can replace the above predicate by the following functions:

$$\mu_\tau(\mu_A(x)) = \mu_B(x)$$

$$\mu_\tau(s) = \mu_B(\mu_A^{-1}(s))$$

We can obtain $\tau$ by this process. It is shown in Fig. 9.10, in the same manner as Fig. 9.9.

Fig. 9.10 The converse truth qualification.

When the membership function of fuzzy data A is a one-to-one correspondence, we can obtain $\tau$ easily, as shown in Fig. 9.10. In this article, we consider only the case of a one-to-one correspondence for A.
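Under this one-to-one assumption, the converse truth qualification can be sketched by inverting mu_A through the same interpolation (illustrative; mu_a must be strictly increasing over xs for the inversion to be well defined; the example reuses the functions mu_A(x) = x^2 and mu_B(x) = 1 - sqrt(x) that appear in the discussion of Fig. 9.13 below):

def converse_truth_qualification(xs, mu_a, mu_b, tau_x):
    """mu_tau(s) = mu_B(mu_A^{-1}(s)): for each truth element s, invert the
    monotone mu_A by interpolation to find x, then read off mu_B(x)."""
    return [interp(xs, mu_b, interp(mu_a, xs, s)) for s in tau_x]

xs = [i / 10 for i in range(11)]
mu_a = [x * x for x in xs]              # mu_A(x) = x^2, one-to-one on [0, 1]
mu_b = [1 - x ** 0.5 for x in xs]       # mu_B(x) = 1 - sqrt(x)
tau = converse_truth_qualification(xs, mu_a, mu_b, xs)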

9.4.3 The extension for FDL

In the truth qualification and the converse truth qualification, the main features which are not present in the direct inference process are the following:

(1) An inverse function of the membership function is required.
(2) The membership degrees of one fuzzy datum have to correspond to the elements of the truth values.

These points affect the internal expression of fuzzy data. We have to calculate the inverse function to execute the converse truth qualification, but this is very hard with the user function method (9.2.3.1). The obvious solution is to have the programmer prepare the inverse, but this is clearly inconvenient and impractical. Therefore the user function method is unsuitable as a general internal expression for fuzzy data for this reason, too. In the case of the interval division method (9.2.3.3), the support set is divided at constant intervals, but there is no such restriction on the side of the membership degrees. In the truth qualification and the converse truth qualification, the membership degrees of one fuzzy datum become the elements of the other fuzzy datum (the fuzzy truth value) in the next step; therefore, in general, the intervals of the membership degrees and the elements of the fuzzy truth value usually mismatch. In the truth qualification shown in Fig. 9.11, the elements of $\tau$ do not correspond to the membership degrees of A. We must calculate the membership degree by interpolating between the two nearest elements of $\tau$.

Moreover, in the converse truth qualification shown in Fig. 9.12, we have to interpolate more times than in the truth qualification.

Theoretically, when we repeat these operations alternately, all fuzzy sets remain unchanged, because the truth qualification and the converse truth qualification are dual. But if we use fuzzy sets expressed by the interval division method, we may get the result shown in Fig. 9.13. The example of Fig. 9.13 is the result of over 100 loops, where $\mu_A(x) = x^2$, $\mu_B(x) = 1 - \sqrt{x}$, the interval of division is 0.1, and the operations of the truth qualification and the converse truth qualification are applied alternately. The upper part of Fig. 9.13 shows B and the lower part shows $\tau$; the final result degenerates to a straight line in $\tau$. In actual indirect inference we do not repeat the operations so many times, but this problem still affects the result of indirect inference. In the interval division method, the number of interpolations increases with the number of operations.

Therefore, it is concluded that the representative points method is effective as the internal expression of fuzzy data for these operations.


Fig. 9.11 The interpolation in the truth qualification.

Fig. 9.12 The interpolation in the converse truth qualification.


Fig. 9.13 Example of the accumulation of the calculation error (upper: final result of B; lower: final result of $\tau$).


9.5 Conclusion

In the expression of continuous fuzzy data, there are two kinds of problems: one related to theoretical aspects and the other to the actual processing on computers. For example, one of the theoretical problems is that many definitions of fuzzy operations are not yet decided, and one of the problems in actual processing on computers is how to represent continuous membership functions in memory using finite data points. The realization and standardization of a fuzzy system description language, in which these problems have to be considered, is expected. But until now, as fuzzy system description languages were designed by individual developers, the efficiency of developing fuzzy systems has remained unacceptable. To improve this situation, we proposed the Fuzzy system Description Language (FDL) in 1996. This FDL is the first step toward standardization of a practical programming language for fuzzy applications.

In this article, as continuing work, some considerations were given to the problems raised by experience with FDL in describing fuzzy systems. We carried out a verification of the internal expressions of fuzzy data, which become important when the area of application of FDL is expanded. We also decided on a representation method such that it will be suitable for applications in a wide range of fields. We will continue to investigate its suitability for various uses.

Furthermore, we concluded that the definitions of almost all fuzzy operators can be re-defined, except for definitions that are unique in a theoretical sense. The primary C language does not have such a facility, but it is a very important facility for fuzzy data processing. In the current FDL, the processing and description styles are based on C. We believe that an object-oriented language is suitable for the proposed FDL. From the standpoint of the current movement of software systems, we need to consider adoption of an object-oriented language for the next version of FDL. This is the subject of a future study.


References

[1] EIAJ: Standard Specification of Fuzzy System Description Language, EIAJ AEX-2002, 1996.

[2] Kazuhiko Otsuka, Yuichiro Mori, Masao Mukaidono: Steps Toward Standardization of Fuzzy System Description Language, Proceedings IIZUKA '96, pp.259-263, 1996.

[3] Kazuhiko Otsuka, Yuichiro Mori, Masao Mukaidono: Steps Toward Standardization of Fuzzy System Description Language II, Proceedings IIZUKA '98, pp.953-956, 1998.

[4] Makoto Abe, Mikio Nakatsuyama, Hiroaki Kaminaga: Fuzzy and Its Application, Proceedings First Asia Fuzzy System Symposium, pp.666-671, 1993.

[5] Motohide Umano, Itsuo Hatono, Hiroyuki Tamura: Data Structures and Manipulation for Fuzzy Sets on Digital Computers, 9th Fuzzy System Symposium in Sapporo, pp.77-80, 1993.

[6] Motohide Umano, Kenji Kume, Itsuo Hatono, Hiroyuki Tamura: Common Lisp Implementation of Fuzzy-Set Manipulation System, Proceedings First Asia Fuzzy System Symposium, pp.660-665, 1993.

[7] Liya Ding, Zuling Shen, Masao Mukaidono: The Properties of Fuzzy Logic for Fuzzy Prolog, Proceedings First Asia Fuzzy System Symposium, pp.648-653, 1993.

[8] Hiroaki Kikuchi, Masao Mukaidono: Linear Resolution for Fuzzy Logic Programs, Journal of Japan Society for Fuzzy Theory and Systems, Vol.6, No.2, pp.294-303, 1994.

[9] Yoshifumi Inoue: Fuzzy Set Processing with C++, 8th Fuzzy System Symposium in Hiroshima, pp.353-356, 1993.

[10] Masuo Furukawa, Takeshi Miura, Takashi Matsuda: The Programming Language for Fuzzy Control, 10th Fuzzy System Symposium in Osaka, pp.551-554, 1994.

[11] IEEE Standard VHDL Language Reference Manual, Std 1076-1987, IEEE, NY, 1988.

[12] IEEE Standard Description Language Based on the Verilog(TM) Hardware Description Language, Std 1364-1995, IEEE, NY.

[13] Toyuhiko Hirota, Torao Yanaru: A Digital Representation of Fuzzy Number and Its Calculation, Proceedings of International Conference on Fuzzy Logic & Neural Networks, pp.527-530, 1990.

[14] Mayuka F. Kawaguchi, Tsutomu Da-te: On Calculations of Fundamental Operations of Weakly Non-Interactive Fuzzy Numbers, Journal of Japan Society for Fuzzy Theory and Systems, Vol.4, No.3, pp.93-105, 1992.

[15] T. Takagi, M. Sugeno: Fuzzy Identification of Systems and Its Applications to Modeling and Control, IEEE Trans. Systems, Man, and Cybernetics, SMC-15, 1, pp.116-132, 1985.

[16] L. A. Zadeh: The Concept of a Linguistic Variable and Its Application to Approximate Reasoning, Part I, Information Sciences, 8, pp.199-249, 1975; Part II, Information Sciences, 8, pp.301-357; Part III, Information Sciences, 9, pp.43-80.

[17] L. A. Zadeh: Fuzzy Logic and Approximate Reasoning, Synthese, Vol.30, pp.407-428, 1975.

[18] E. H. Mamdani, S. Assilian: An Experiment in Linguistic Synthesis with a Fuzzy Logic Controller, Int. J. Man-Machine Studies, Vol.7, pp.1-13, 1974.

[19] J. F. Baldwin: A New Approach to Approximate Reasoning Using a Fuzzy Logic, Fuzzy Sets and Systems, Vol.2, 1979.

[20] Y. Tsukamoto: An Approach to Fuzzy Reasoning Method, Advances in Fuzzy Set Theory and Applications, edited by M. M. Gupta et al., North-Holland, 1979.

[21] Masao Mukaidono, Kazuyuki Nojima: The Relation Between the Direct Approach and Truth Space Approach in Approximate Reasoning System, Journal of Japan Society for Fuzzy Theory and Systems, Vol.4, No.2, pp.325-333, 1992.

Page 205: 45052361 a New Paradigm of Knowledge Engineering by Soft Computing

Part II: Knowledge Representation, Integration, and Discovery by Soft Computing


Chapter 10

Knowledge Representation and Similarity Measure in Learning a Vague Legal Concept

MingQiang Xu (1), Kaoru Hirota (1), Hajime Yoshino (2)

(1) Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, 4259 Nagatsuta, Midori-ku, Yokohama 226-8502, Japan
(2) Meiji Gakuin University, Legal Expert Laboratory

Abstract

Knowledge representation and similarity measure play an important role in classifying vague legal concepts. In order to account for fuzziness and context-sensitive effects in the representation of precedents, a fuzzy factor hierarchy is studied. Current distance-based and feature-based similarity measures are surface-level ones that can do no more than compare objects, so a deep-level similarity measure that can evaluate the results of the surface-level one is needed. A structural similarity measure, factor-based similarity, which integrates the surface-level and deep-level measures, is proposed, together with an argument model based on the proposed knowledge representation and similarity measure. Considering a vague legal concept in the United Nations Convention on Contracts for the International Sale of Goods (CISG), a fuzzy legal argument system is constructed. The main purpose of the proposed system is to support law education.

Keywords: legal expert system, fuzzy logic, knowledge representation, legal reasoning, vague legal concept, CISG, similarity measure, legal argument, case-based reasoning, factor hierarchy, context, retrieval


10.1 Introduction

It is known that vague concepts exist in knowledge-based systems [11]. Usually there is no single explicit representation of an entire concept or class, but only representations of examples of the vague concept. For a query case, whether it belongs to the vague concept can be determined by learning from these examples. Many methods have been developed to address this issue. Argument, in particular, attracts the interest of researchers from various areas, especially legal reasoning, decision theory, philosophy, psychology and cognitive science [10].

Knowledge representation and similarity measure play an important role in classifying vague legal concepts. In conventional legal reasoning systems, fuzziness has not been deeply considered, especially in legal knowledge representation, and the similarity measures have not been sufficiently investigated: they cannot give a context-sensitive similarity measure.

Our goal is to develop a computational and representational model of argument considering fuzziness and context-sensitive effects.

Factors of a case are usually employed to represent the case [11]. Because the information in cases involves uncertainty and vagueness, a case does not always have explicit and specific factors. Therefore, a fuzzy factor hierarchy composed of an issue, atomic factors and abstract factors is studied for case representation.

The current distance-based and feature-based similarity measures are surface-level ones that can only make a comparison between objects. Therefore, a deep-level similarity measure that can evaluate the results of the surface-level one, namely a context-based similarity measure, is proposed. On the basis of these viewpoints, a structural similarity, i.e., factor-based similarity, which integrates the surface-level and deep-level measures, is proposed to model the legal argument. Considering the vague concept in the CISG (United Nations Convention on Contracts for the International Sale of Goods), a fuzzy legal argument system is constructed.

The fuzzy factor hierarchy is introduced in section 2. The similarity measure in legal reasoning is summarized in section 3. The structural similarity measure, a factor-based one, is proposed in section 4. The extension of distance-based and feature-based similarity measures is discussed in section 5. The context-sensitive similarity measure is proposed in section 6. The legal argument based on the structural similarity is described in section 7. An experiment on classifying the vague concept of the CISG is illustrated in section 8.

10.2 Fuzzy Factor Hierarchy

Concept representation using factors is an approach often used in legal expert systems. In order to represent the complex relations between a concept and its factors, a factor hierarchy is also used [11]. In the literature, a factor either is or is not an element of a case. Whether a query case belongs to a concept is judged by the common and distinct factors between the query case and precedent cases.

However, the factors of a case are not always crisp: a factor may be an element of a case only to some extent, and the value of a factor may also be a fuzzy linguistic representation. The traditional factor hierarchy is therefore not an appropriate approach for modelling the human internal representation of cases. A fuzzy factor hierarchy for the representation of legal cases is proposed.

Definition 1 A fuzzy factor is composed of Name, Degree, Values, Relations and Agent, denoted by the tuple <N, D, V, R, A>, where

N: Name, describing the facts of cases,
D: Degree, indicating to what extent a case has the factor,
V: Factor Values, representing how extreme the factor is in a case to which it applies,
R: Relations, representing the level of support or opposition for other factors,
A: Agent, representing the side (one of the agents) whose viewpoint the factor expresses.
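As an illustration, the tuple of Definition 1 can be rendered directly as a small data structure. The sketch below is only an illustration of the definition, not code from the original system; the class name, the field types and the sample factor are our own assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union

# A factor value may be crisp data, interval data, or a triangular fuzzy
# number (a, b, c), matching the three value types described below.
FactorValue = Union[float, Tuple[float, float], Tuple[float, float, float]]

@dataclass
class FuzzyFactor:
    """The tuple <N, D, V, R, A> of Definition 1 (illustrative sketch)."""
    name: str                  # N: symbolic description of a fact of the case
    degree: float              # D: extent in [0, 1] to which the case has the factor
    values: FactorValue        # V: how extreme the factor is in the case
    relations: Dict[str, str]  # R: support/opposition links to other factors
    agent: str                 # A: side whose viewpoint the factor represents

# Hypothetical rendering of factor f2 from the CISG example in section 8.
f2 = FuzzyFactor(name="The attachment has no price",
                 degree=0.8,                   # fuzzy Yes/No, not just 1 or 0
                 relations={"F7": "Support"},  # assumed link to "no market price"
                 values=1.0,
                 agent="plaintiff")
```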

The names of factors are usually symbolic expressions describing the facts of cases. A case usually consists of a set of factors, where the dimension of the set is the number of factors belonging to the set.

In traditional legal expert systems, the degree to which a factor is an element of the factor set that describes a case is either 1 or 0, in other words, it is in a Yes/No form. But it is sometimes difficult to judge in the Yes/No manner because of vagueness and uncertainty.

The degree to which a factor is an element of a factor set can be described in several ways. We employ the membership concept and the vagueness concept to represent the fuzzy Yes/No [7], where the adaptation of specific knowledge described by limited words is represented by the concept of membership, and the uncertainty of knowledge is represented by the concept of vagueness.


The values of a factor are its magnitudes, often represented by quantity. A quantitative property is associated with numeric data and is an important notion in an argument system. It usually has three types of representation: crisp data, interval data and fuzzy data.

The relation between factors can be strengthened or weakened with qualifiers, which may be crisp or fuzzy. Different relational expressions exist in different domains. For example, in the case of the CISG, these could be expressions like {Best Support, Support, Contrary, Best Contrary}.

Each factor represents the viewpoint of one of the agents who take part in the argument. In a legal argument system, the agents usually comprise only a defendant and a plaintiff.

Fig. 10.1 An Example of a Fuzzy Factor Hierarchy in the CISG (legend: Y = the factor favors the issue, N = the factor disfavors the issue; ++ = Best Support, + = Support, - = Contrary, -- = Best Contrary)

Fuzzy factors are divided into fuzzy atomic factors and fuzzy abstract factors.


Fuzzy atomic factors represent the surface facts of a case. The abstract factors are the connections between the claim of a vague concept and the fuzzy atomic factors.

The claim of a vague concept is called the Issue. A fuzzy factor hierarchy is defined in Definition 2.

Definition 2 A fuzzy factor hierarchy is composed of three layers:

Top layer: Issue

Middle layer: Abstract factors

Bottom layer: Atomic factors

The top layer of a fuzzy factor hierarchy contains only a single node, representing the claim of a vague concept. The bottom layer consists of the fuzzy atomic factors.

The factors of the middle layer are fuzzy abstract factors. A node can be expressed as the conjunction (and) or disjunction (or) of its sub-nodes. There is usually more than one node in the middle layer.

The fuzzy factors hierarchy itself represents a causal connection between an Issue and the related fuzzy atomic factors as well as fuzzy abstract factors.

Figure 10.1 shows an example of a fuzzy factor hierarchy of the legal reasoning system for the CISG. The top-level Issue is a claim of the vague concept "The proposal is sufficiently definite". F1, F2, F3, F4, F5, F6 and F7 are abstract factors, while f1, f2, f3, f4, f5 and f6 are atomic factors. The meaning of these factors will be explained in section 8.
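To make the three-layer structure concrete, the sketch below builds a tiny hierarchy in the style of Figure 10.1. The node class is our own, and the choice of min for and-nodes and max for or-nodes is one common fuzzy interpretation, assumed here purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    """One node of a fuzzy factor hierarchy (illustrative sketch)."""
    label: str
    op: str = "leaf"                 # "and", "or", or "leaf" for atomic factors
    degree: float = 0.0              # membership degree of an atomic factor
    children: List["Node"] = field(default_factory=list)

    def eval(self) -> float:
        # Assumption: conjunction as min, disjunction as max.
        if self.op == "leaf":
            return self.degree
        vals = [child.eval() for child in self.children]
        return min(vals) if self.op == "and" else max(vals)

# A fragment in the style of Figure 10.1 (structure assumed for illustration).
f1 = Node("f1: the important part has a price", degree=0.9)
F4 = Node("F4: fixing the price", op="or", children=[f1])
issue = Node("Issue: the proposal is sufficiently definite", op="and", children=[F4])
print(issue.eval())   # 0.9 in this toy fragment
```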

10.3 Similarity Measure in Legal Reasoning

In legal reasoning, the similarity measure is not as simple as in other domains such as pattern recognition or data mining, since not only numeric values but also symbolic values must be considered. The special characteristics of the similarity measure in legal reasoning are given below.

Considering the Fuzziness

The fuzziness in a legal case has been represented by fuzzy sets in section 2. The same fuzziness should be considered in the similarity measure as well. In fact, fuzziness is also a source of legal argument: because of fuzziness, similarity and distinction in the judgments of plaintiff and defendant exist.

Considering Agent's Viewpoint

A characteristic of legal reasoning is that the agents typically include a plaintiff and a defendant. Because an argument exists between them, the viewpoints of the two agents should be considered in the similarity measure. Usually, one agent proposes a claim based on the similarities between cases, whereas the other agent uses the distinctions between the cases to downplay the importance of the similarities. Without distinction, there is no argument. So, in legal argument, both similarity and distinction, which express the viewpoints of the two agents, should be measured according to the movement of the legal argument.

Context-Based Similarity Measure

In some situations, the similarity measure changes with the meaning of the context; e.g., in a fuzzy factor hierarchy, the similarity between atomic factors differs with the related abstract factors. So the context plays an important role in the similarity measure. In legal argument, the significance of similarity and of distinction is emphasized by the plaintiff and the defendant, respectively. Thus, besides the distance-based similarity and the feature-based similarity, a similarity measure based on the context, which can measure the significance of similarity and distinction, should be considered in making an argument.

Integrated Similarity Measure

The similarity measure in legal argument changes with contexts and viewpoints. Consequently, it is not the same as the conventional distance-based and feature-based similarity measures, which only produce final numeric results, but is composed of several stages of similarity measures, which change with the moves of the argument made by the plaintiff and the defendant. Therefore, the final similarity measure should be an integration of the distance-based, feature-based and context-based similarity measures.

Similarity Measure with Explanation

The similarity measure in legal reasoning aims to classify vague concepts. It is necessary for users to know not only the conclusion, but also the reasoning process. In the case of a law education system for students, the process explaining the classification of a vague concept is probably even more important.

The analysis above shows that the current similarity measures are not sufficient to meet the required characteristics. For measuring the similarity in the legal argument, the traditional similarity measures are first extended, and then a new similarity measure is proposed in the next section.

10.4 Structural Similarity Measure: Factor-based One

When measuring the similarity between a query case and a precedent case, intuitively, we first find the precedent that is the most on-point case in the case base. At this stage, the similarity denotes only the precedent that has the largest number of common features with the query case. Then, the distinctions between the factors of the two cases are found by the dissimilarity measure made by the opponent. Finally, the significance of the similarities and distinctions is further evaluated by both sides.

To correspond with this human cognitive process, the similarity measure is classified into surface-level similarity and deep-level similarity. A comparison of these two types of similarity measure is summarized in Table 10.1.

Table 10.1 Comparison of Surface-level Similarity and Deep-level Similarity

              | Knowledge Representation | Output                      | Function
Surface Level | Case-Dependent           | Individual, Numeric Value   | Retrieval, Comparison
Deep Level    | Domain-Dependent         | Aggregation, Symbolic Value | Retrieval, Evaluation

For the factor hierarchy representation, where the knowledge of a case is represented by multiple levels of factors, the similarity measure between cases is made at the layers of atomic and abstract factors. The similarity measure, varying with the knowledge of cases, is described at different levels of factors. It includes not only the component similarity measure, such as the one between atomic factors, but also the relation between atomic factors and abstract factors. The former is the surface similarity and the latter is the deep similarity. Their integration is the structural similarity measure shown in Figure 10.2. Namely, the factor-based similarity measure (F_sim) is composed of the distance-based measure (D_sim), the Tversky model (T_sim) and the newly proposed context-based one (C_sim), as the following equation shows.

F_sim = F(D_sim, T_sim, C_sim) (1)

Fig. 10.2 Structural similarity (surface level: distance-based and feature-based component similarities; deep level: context-based component similarity)

Surface similarity measures have several properties (e.g., symmetry). These properties are not necessarily satisfied by the structural similarity because of its complexity and domain-knowledge dependence. The properties are investigated below.

1. Minimality

Not all cases in the precedent base have the same status. Certain objects are considered more prototypical than others, or in some way more central to the domain, or particularly salient and distinctive [2]. This holds in the legal domain: if the precedents were sentenced by different courts, a preference order is present among them.

2. Symmetry

Because structural similarity is an integration of surface-level similarity and deep-level similarity, there is no simple symmetry. Generally, we cannot say that a precedent (P) is similar to a query case (Q), just as it cannot be said that parents are similar to their child [1]. The reason is that the precedent case precedes the query case and was sentenced in court. However, we can say that a precedent and a query case are similar with respect to a context such as the concept "a proposal is sufficiently definite".

S_context(P, Q) = S_context(Q, P)

3. Triangular inequality

This axiom states that if a is similar to b, and b is similar to c, then a and c cannot be very dissimilar. However, a and c may be related to b in different aspects. In structural similarity, transitivity does not exist in a broad sense: if case P1 is similar to case Q1 and P1 is similar to case Q2, then Q1 and Q2 are not necessarily similar to each other, because the similarities exist in different contexts and viewpoints.

10.5 Extensions of Distance-based and Feature-based Similarity Measures

Similarity measures are defined at three layers: atomic factors, values of atomic factors, and aggregation of similarities and dissimilarities. Appropriate similarity measures are chosen and applied based on the requirements. The notion of similarity is usually based on distance (a metric) or on feature sets [1]. Comparisons of factor values are measured by the distance-based similarity. Comparisons of fuzzy atomic factors are measured by fuzzy T-norms. Aggregation of similarities and dissimilarities is based on an extension of the feature similarity measure proposed by Tversky [1].


10.5.1 Comparisons of Factor Values

A triangular membership function can be used to represent the fuzziness of factor values. For example, the magnitude of the importance of the important part in the vague concept "The proposal is sufficiently definite" in the CISG can be described by a fuzzy membership function.

There are several methods for determining similarity measures of fuzzy sets [5]. Because the fuzzy set used here becomes a singleton when the factor values become crisp data, and two fuzzy sets sometimes do not overlap, the methods in [5] cannot handle these cases. The difference between the centers of gravity of two membership functions A and B is used instead to measure the similarity.

The distance between the two centers of gravity, i.e., |CG(A) - CG(B)|, is used to describe the similarity degree. To satisfy the axioms of similarity theory, the degree of similarity S(A, B) is calculated by

S(A, B) = 1 - |CG(A) - CG(B)|. (2)
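A minimal computation of Eq. (2) for triangular factor values might look as follows; the centroid formula for a triangle and the assumption that the universe is normalized to [0, 1] (so the result stays in [0, 1]) are ours.

```python
def centre_of_gravity(tri):
    """Centroid of a triangular membership function (a, b, c) with a <= b <= c."""
    a, b, c = tri
    return (a + b + c) / 3.0

def similarity(A, B):
    """S(A, B) = 1 - |CG(A) - CG(B)| of Eq. (2), on a universe scaled to [0, 1]."""
    return 1.0 - abs(centre_of_gravity(A) - centre_of_gravity(B))

# Two fuzzy factor values; a crisp value is the degenerate triangle (x, x, x),
# i.e. a singleton, and non-overlapping sets pose no difficulty.
print(similarity((0.2, 0.4, 0.6), (0.3, 0.5, 0.7)))   # ~0.9
print(similarity((0.5, 0.5, 0.5), (0.8, 0.8, 0.8)))   # ~0.7 (two singletons)
```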

10.5.2 Fuzzy Feature-based Similarity Measure

The Tversky model is modified by considering the fuzziness and the factor hierarchy, in the following way.

Fuzziness

To deal with the fuzziness in the existence of factors, the Tversky model is extended into a model that considers fuzziness. Although an extension considering fuzziness already exists in [3], it does not satisfy the requirements of the legal argument. Here, a further modification is introduced.

Let CB = {P1, P2, ..., Pi, ..., Pn} be the case base, and Q a query case. Each case is represented by a set of factors: Pi = {m_Pi^1, ..., m_Pi^k, ..., m_Pi^t} and Q = {m_Q^1, ..., m_Q^k, ..., m_Q^t}, where m_Pi^k and m_Q^k denote the degrees to which the factor f_k (k = 1, ..., t, with t the number of atomic factors) exists in the cases Pi and Q, respectively. They are judged in terms of the membership and vagueness concepts introduced in [7].


S(Q, Pi) = f(Q ∩ Pi), (3)

DS1(Q, Pi) = f(Q - Pi), (4)

DS2(Pi, Q) = f(Pi - Q). (5)

Factor Set Similarity (FSS):

f(Q ∩ Pi) = Σ_{k=1}^{t} SH(m_Q^k, m_Pi^k), (6)

where SH is a distance-based similarity measure between fuzzy sets that will be discussed in the next section.

Factor Set Dissimilarity (FSDS):

f(Q - Pi) = Max{ Max(m_Q^k - m_Pi^k, 0) }_{k=1}^{t}, (7)

f(Pi - Q) = Max{ Max(m_Pi^k - m_Q^k, 0) }_{k=1}^{t}. (8)

The distinction between Pi and Q is not decided simply by adding up the cardinality of the set, but by finding the factor that has the largest Max(m_Q^k - m_Pi^k, 0) (respectively Max(m_Pi^k - m_Q^k, 0)).

The degree to which a factor is an element of a case is determined by the center of gravity of the membership function that is described by the membership concept and vagueness concept.
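A direct transcription of Eqs. (6)-(8) might look like the following sketch, where each case is reduced to a list of factor degrees and the SH of Eq. (6) is instantiated with the distance-based measure of section 10.5.1; both simplifications, and the numbers, are our own assumptions.

```python
def factor_set_similarity(q, p, sh):
    """f(Q ∩ Pi) of Eq. (6): sum of SH over corresponding factor degrees."""
    return sum(sh(mq, mp) for mq, mp in zip(q, p))

def factor_set_dissimilarity(q, p):
    """f(Q - Pi) of Eq. (7): the largest positive excess of q's degrees over p's."""
    return max(max(mq - mp, 0.0) for mq, mp in zip(q, p))

# Hypothetical degrees to which atomic factors f1..f4 exist in a query case Q
# and a precedent Pi.
Q  = [0.9, 0.8, 0.7, 0.6]
Pi = [0.9, 0.2, 0.7, 0.0]

sh = lambda a, b: 1.0 - abs(a - b)           # SH as in section 10.5.1
print(factor_set_similarity(Q, Pi, sh))      # FSS  ~ 2.8
print(factor_set_dissimilarity(Q, Pi))       # FSDS for Q - Pi = 0.6
print(factor_set_dissimilarity(Pi, Q))       # FSDS for Pi - Q = 0.0
```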

Factor Hierarchy

The fuzzy factor hierarchy, which is composed of the factor values, the atomic and abstract factors, as well as their relations, should be considered in the similarity measure.

Fig. 10.3 Factor-based Similarity Measure

When only the atomic factors are applied, the similarity measure is a surface one. The relations between atomic factors and abstract factors have a great influence on the similarity measure between the atomic factors in terms of the Tversky model. By considering the abstract factors of the fuzzy factor hierarchy representation, this surface similarity can be developed into a deep similarity that can evaluate the similarity and the distinction. This modification is, in fact, achieved by a new similarity measure that can evaluate the significance of similarity and distinction (Figure 10.3). Evaluating the significance means:

Finding similarity from the distinction according to the other viewpoint and context, in order to downplay the distinction.

Finding distinction from the similarity according to the other viewpoint and context, in order to downplay the similarity.

In other words, a part of the expected extension of the Tversky model is, in fact, substituted by a deep similarity measure that is referred to as a context-based similarity measure, discussed in the next section. In conclusion, the factor-based similarity measure is a structural one, composed of the distance-based measure (D_sim) and the Tversky model (T_sim), both considering fuzziness, and the context-based one (C_sim). The way in which these similarity measures are integrated is domain dependent.


10.6 Context-based Similarity Measure

The relation between context and similarity measure was first discussed only several years ago [4] and has not been studied sufficiently. In this section, the context-based similarity measure is first defined and classified in several ways. Then the most important classification, i.e., retrieval and evaluation, is further defined, and its properties are discussed and proved.

10.6.1 Definition and Classification

The similarity measure is affected by the context, namely it changes with different situations. For example, the goods Jet Engine System and Cultivator Unit appear incomparable because they are not the same type of goods, have different functions, etc. However, they are not only comparable but also similar if judged by considering the composition of the goods, i.e., both are goods composed of several parts. The relation between context and similarity is defined below.

Definition 3 (Context-based Similarity Measure)

Given two objects, a query object O_Q and a target object O_T, a context CX, and a viewpoint VP, the similarity measure is a relation between them, represented by the following formula:

C_sim(O_T, O_Q, CX, VP), (9)

where O_T is the target object, O_Q is the query object, CX is a context, VP is a viewpoint, and C_sim denotes the context-based similarity relation.

In the above formula, the similarity measure is related to four parameters. If a parameter is a constant, it is an input. If a parameter is a variable, it can be regarded as an output. According to whether the parameter is constant or variable, the similarity relation has different functions.

If CX in C_sim is known, C_sim has a retrieval function, namely retrieving the relevant elements from the target object according to CX. If CX is unknown, but O_Q or O_T contains some known parts, then the similarity measure makes an evaluation of O_Q or O_T by finding an element of CX. Therefore, the similarity measure can be classified as:


Retrieval, Evaluation

C_sim is a deep similarity measure. Retrieval plays the role of a relation between the surface-level and the deep similarity measures. Evaluation is the essence of C_sim. The retrieval and evaluation functions of the context-based similarity C_sim are discussed in detail in the following sections.

The retrieval and evaluation in the similarity measure can be classified further based on different aspects, such as the knowledge resource and the change of context.

According to knowledge resources of the context, it can be classified by:

Factor Values, Factors, Similarity Measure

The context can include constraint conditions such as the range of a factor value, a specific factor, or matching criteria, used to retrieve the expected information from a case or a case base. This allows us to assess the similarity and retrieve the relevant information by means of the constrained context. The relevance between the current task and the current goal is returned in a semantic way. On the other hand, the similarity and distinction in the surface-level similarity measure can be evaluated by a context such as the abstract factors. The similarity measure in the above classification works on the result of the surface-level similarity measure.

According to the viewpoint of the change of context, it can be classified by:

Static: the context is pre-defined by a user or expert.
Dynamic: the context is obtained through the reasoning process.

The context can be a constraint condition pre-defined by users, for example the value range of an atomic factor. If the pre-defined constraint is fuzzy, then the context-based similarity includes the similarity measure between fuzzy sets. The context can also be an intermediate result inferred by the system, when it is decided by a reasoning process. Which context is used can be judged based on criteria that come from the knowledge representation. So, given a target, the context used for the similarity measure changes with the reasoning mechanism. In this sense, the similarity measure is a learning-based measuring process.


Fig. 10.4 Illustration of Retrieval in Similarity Measure

10.6.2 Retrieval in Similarity Measure

The retrieval in the proposed similarity measure has multiple purposes, including the retrieval of the most on-point case from the case base, of the shared and unshared atomic factors between cases, and of the factors whose values are similar or different. A definition is as follows.

Definition 4 (Retrieval)

Given a target object O_T, an element f_Q of a query object O_Q, a context F ∈ CX, and a viewpoint VP, R_CX(O_T, O_Q, CX, VP) outputs an f_T ∈ O_T such that f_Q and f_T are viewed as the same in the context F and the viewpoint VP.

The parameters of R_CX change with the inference process. For example, the search space O_T may be a set of precedents, or a set of cases retrieved from the precedent base; CX may be the surface-level similarity, or the pre-defined range of factor values, etc.

Monotonicity in retrieval

Monotonicity is very useful in retrieval. Retrieval results can be adjusted by controlling the range of the context. Obviously, if there are no ideal objects in the results, the initial context may be too restrictive and should be relaxed; conversely, the context should be tightened to reduce the number of retrieved objects.

In order to obtain effective retrieval, the context must be ordered. Assume CX_i ⊆ CX_j, namely CX_j is more restrictive than CX_i. By the above definition, R_CX(O_T, O_Q, CX_i, VP) and R_CX(O_T, O_Q, CX_j, VP) imply that the elements of O_T retrieved in the sense of CX_j are a subset of those retrieved in the sense of CX_i, so fewer results are retrieved. This is because a more restrictive context means that more factors or factor values are required to satisfy the context.

Fig. 10.5 Illustration of Evaluation in Similarity

10.6.3 Evaluation in Similarity Measure

The evaluation function follows the retrieval in order to produce the similarity measure in a context and a viewpoint.

Definition 5 (Evaluation)

Given a target object O_T, a query object O_Q, a context set CX, and a viewpoint VP, E_CX(O_T, O_Q, CX, VP) outputs F ∈ CX if there exist f_T ∈ O_T and f_Q ∈ O_Q such that f_T and f_Q are viewed as the same in the context F ∈ CX and viewpoint VP.

It can be further classified as follows.

E_CX(O_T, f_Q, CX, VP)

O_T and CX are considered as a reference, and f_Q is compared to the elements of O_T by finding F ∈ CX and f_T ∈ O_T such that f_T and f_Q are the same in F and VP.

E_CX(f_T, O_Q, CX, VP)

O_Q and CX are considered as a reference, and f_T is compared to the elements of O_Q by finding F ∈ CX and f_Q ∈ O_Q such that f_T and f_Q are the same in F and VP.

E_CX(f_T, f_Q, CX, VP)

f_T and f_Q have the same status and neither is a reference; find F ∈ CX such that f_T and f_Q are the same in F and VP.

In the above formulas, if such an F is found the evaluation returns True; otherwise it returns False.
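In code, Definitions 4 and 5 can be sketched as two small search functions, with a context represented as a predicate over a pair of factors; this predicate representation and the example context are our own simplifications.

```python
from typing import Callable, Iterable, Optional

Factor = str
Context = Callable[[Factor, Factor], bool]   # "viewed as the same in F and VP"

def retrieve(o_t: Iterable[Factor], f_q: Factor, ctx: Context) -> Optional[Factor]:
    """R_CX (Definition 4): return an f_T in O_T matching f_q under the context."""
    for f_t in o_t:
        if ctx(f_t, f_q):
            return f_t
    return None

def evaluate(f_t: Factor, f_q: Factor,
             contexts: Iterable[Context]) -> Optional[Context]:
    """E_CX (Definition 5, third form): find F in CX making f_T and f_q the same."""
    for F in contexts:
        if F(f_t, f_q):
            return F        # an F was found: the evaluation returns True
    return None             # no such context: the evaluation returns False

# Hypothetical context from the CISG example: f4 and f7 are the same under the
# abstract factor F7 ("there is no market price").
F7 = lambda a, b: {a, b} <= {"f4", "f7"}
print(evaluate("f7", "f4", [F7]) is F7)   # True
```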

The properties of the evaluation in the similarity measure are discussed below.

Symmetry in evaluation

With respect to the similarity in evaluation, if in O_T there exists an f_T that is the same as f_Q ∈ O_Q in the context F ∈ CX and viewpoint VP, the result is True. If the order is changed, i.e., the known input is f_Q and there exists an f_T that is similar to f_Q in the meaning of F ∈ CX and viewpoint VP, the result is also True. Consequently, if f_T and f_Q are viewed as the same in the meaning of CX and VP, the evaluation satisfies symmetry.

Transitivity in evaluation

Let an element A of O_T or O_Q viewed in context CX and viewpoint VP be denoted A_{CX,VP}. If A ∈ O_T is the same as B ∈ O_Q in CX and VP, and B is the same as C ∈ O_Q in CX and VP, then A_{CX,VP} = B_{CX,VP} and B_{CX,VP} = C_{CX,VP}; therefore, A_{CX,VP} = C_{CX,VP}.

This property is useful in citing a similar precedent to support one's own viewpoint and to oppose the viewpoint of an opponent.

10.7 Structural Similarity and Making Argument

The interpretation of a vague legal concept in a case is related to the debates between plaintiff and defendant: whether a precedent case is similar to a query case is usually debated by these two sides.

In an argument, if one side analogizes the query case to a precedent case, the other side will distinguish them; if one side emphasizes the similarity, the other side will downplay it, namely emphasize the dissimilarity. So both similarity and dissimilarity should be measured. A factor has different values in different cases, and it should also be considered that every factor has a pro or con direction in the reasoning process.


The proposed argument model in the legal reasoning system consists of three steps:

1. Side 1's Claim

The similar cases are retrieved from a set of examples. The case whose facts have the largest similarity degree to those of the query case is taken as the most on-point case P_mopc. If the conclusion of case P_mopc favors one side, that conclusion is regarded as this side's claim.

I_claim = {the conclusion of the case P_mopc} ∪ {the factors that support the conclusion of the case P_mopc}.

P_mopc is decided by the factor-based similarity and the Retrieval in Similarity:

R_case(CB, Q, T_S, VP) = O_claim, (10)

where VP denotes the viewpoint of Side 1 and the similarity T_S defined in section 3 is the context. When T_S satisfies

T_S = Max{ f(Q, Pi) }_{i=1}^{n}, Pi ∈ CB, (11)

it is known that O_claim = P_mopc.

2. Side 2's Objection

The other side finds the distinctions between the query case and the case P_mopc, and emphasizes them. The distinctions include the differences between the shared factors, and the unshared factors: the former means finding differences between the values of shared factors, the latter means finding, among the unshared factors, a factor that favors this side.

I_objection = {the factor O_df^Q that has the largest dissimilarity degree FSDS_Q in the query case} ∪ {the factor O_df^P that has the largest dissimilarity degree FSDS_P in the precedent P_mopc} ∪ {the shared factor O_dv that has different factor values}.

They are decided by the factor-based similarity and the Retrieval in Similarity:

R_factor(P_mopc, Q, T_DS1, VP) = O_df^Q, (12)

R_factor(Q, P_mopc, T_DS2, VP) = O_df^P, (13)

R_factor(P_mopc, Q, D_sim, VP) = O_dv. (14)

The similarity measures should satisfy the following equations:

T_DS1 = f(Q - P_mopc), (15)

T_DS2 = f(P_mopc - Q). (16)

D_sim denotes the dissimilarity between numeric values; it can be computed by the usual distance-based similarity measures [12].

3. Side 1's Rebuttal

Side 1 downplays the dissimilarity by finding, with the evaluation function introduced in the last section, the factors that can disregard the differences emphasized by the other side. The distinctions found by Side 2 are evaluated by Side 1.

I_rebuttal = {the abstract factor O_sf that can downplay the differences}

E_factor(P_mopc, O_df^Q, CX, VP) = O_sf^mopc, (17)

E_factor(Q, O_df^P, CX, VP) = O_sf^Q, (18)

where CX is a set of abstract factors of the fuzzy factor hierarchy, and VP is the viewpoint of Side 1.

In these three steps, if there are relevant cases supporting a claim, they should be cited.

By these three steps, the computational model is structured as shown in Figure 10.6. A context-sensitive interpretation I_inference, namely the output of the proposed argument model, is obtained as

I_inference = I_claim ∪ I_objection ∪ I_rebuttal. (19)
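The control flow of the three steps can be sketched as below. The function arguments stand in for the measures T_S, FSDS and E_factor defined above; the concrete signatures and the dictionary output are our own assumptions, not the original implementation.

```python
def legal_argument(case_base, query, fss, fsds, e_factor):
    """Three-step argument model of section 10.7 (illustrative sketch)."""
    # Step 1 (Side 1's claim): the most on-point case maximizes T_S, Eq. (11).
    p_mopc = max(case_base, key=lambda p: fss(query, p))
    i_claim = {"conclusion": p_mopc.conclusion}

    # Step 2 (Side 2's objection): the largest distinctions, Eqs. (12)-(16).
    i_objection = {"from_query": fsds(query, p_mopc),
                   "from_precedent": fsds(p_mopc, query)}

    # Step 3 (Side 1's rebuttal): downplay the distinctions through the
    # abstract factors of the hierarchy, Eqs. (17)-(18).
    i_rebuttal = {"downplaying_factors": e_factor(p_mopc, query, i_objection)}

    # I_inference = I_claim ∪ I_objection ∪ I_rebuttal, Eq. (19).
    return {**i_claim, **i_objection, **i_rebuttal}
```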


Fig. 10.6 Computational Model of Fuzzy Legal Argument (the input and the case base feed the measures T_S, T_DS1, T_DS2 and D_sim into R_case, R_factor and E_factor, which produce I_claim, I_objection and I_rebuttal, the factor-based similarity output)

10.8 Experiment

This experiment is based on the vague concept and the cases of the CISG. The vague concept "The proposal is sufficiently definite" in the CISG is employed to illustrate how to make a legal argument with the proposed approach. The fuzzy factor hierarchy of this vague concept, focused on the fixing of the price, is shown in Figure 10.1. The meanings of the factors are as follows.

f1: The important part has a price
f2: The attachment has no price
f3: The attachment is not sold in the market
f4: There is no product that can substitute the attachment
f5: There is an important part in the goods
f6: There is an attachment
f7: There is no similar product for the attachment

F1: Indicating the goods
F2: Fixing the quantity
F3: Making provision for determining the quantity
F4: Fixing the price
F5: Making provision for determining the price
F6: The goods are composed of several parts
F7: There is no market price

The following Cultivator Case is used as a query case.

1) On April 1, company C in New York dispatched a letter containing an offer to the business branch of a Japanese company D in Hamburg, the content of which is that C sells a set of cultivator equipment to D (the price of the tractor itself is $50,000; the tractor should be equipped with a rake, which is a product of company E; the farming machinery is delivered by a U.S. cargo ship).

2) The letter reached D on April 8.

3) On April 9, D telephoned C to say: "I accept your offer, but you should transport the machinery in a Japanese container."

Students can first decide the degrees to which this query case has the factors by referring to the atomic factors of the fuzzy factor hierarchy, and then, in the light of the output of the legal argument, learn argument skills and further comprehend the meaning of the vague concept and the query case by comparing the output with their own arguments.

In the case base there are 8 precedents represented by fuzzy factor hierarchies. For example, the atomic and abstract factors of the Jet Engine Case related to the issue are as follows:

f1: The important part has a price
f2: The attachment has no price
f3: The attachment is not sold in the market
f5: There is an important part in the goods
f6: There is an attachment
f7: There is no similar product for the attachment

If the following atomic factors are considered to be the properties of the query case Cultivator Case, an example of the output of this system for the query case is shown in Figure 10.7.

f1: The important part has a price
f2: The attachment has no price
f3: The attachment is not sold in the market
f4: There is no product that can substitute the attachment
f5: There is an important part in the goods
f6: There is an attachment

Fig. 10.7 An Example of the Legal Argument

The explanation for the process of the argument is as follows.

Plaintiff's Claim

The proposal of the query case is not sufficiently definite, because the query case has the highest similarity degree with the Jet Engine Case, whose conclusion is that the proposal is not sufficiently definite, and because the query case has the factors f2 and f3.

Defendant's Objection

The Jet Engine Case is not applicable to the query case, because there is f4 in the query case and f7 in the precedent.

Plaintiff's Rebuttal

The Jet Engine Case is still applicable to the query case: even though there is f4 in the query case, it is the same as f7 in the Jet Engine Case under the meaning of the abstract factor F7. They both support the abstract factor F7: there is no market price. So, the proposal is not sufficiently definite.

The output of the system varies with the judgments selected by the user. It is helpful for users to see how the results change with different inputs, and it helps them learn the skill of making legal arguments. It is also helpful for students to understand the meaning of the statutory rules of the CISG and the meaning of the precedents and the query case from the viewpoints of plaintiff and defendant.

10.9 Conclusion

The proposed structural similarity measure, the factor-based similarity measure, establishes a framework for similarity measurement in legal argument. It is very important for knowledge-based systems in which objects must be represented by content that a simple flat structure cannot capture. This work develops the study of similarity measures in legal reasoning and provides a theoretical basis for using the similarity measure to build more effective and efficient intelligent legal reasoning systems. The proposed approach can be applied beyond the legal domain, for instance to diagnosis and decision support systems, especially where a strong domain theory is not available.

By the fuzzy factor hierarchy, the uncertainty and vagueness of concepts are represented. Similarity and dissimilarity are used to represent the debates between agents, and the argument based on the similarity and dissimilarity measures is modeled. In terms of the factor hierarchy and the organization of the argument model, a vague concept can be learned from examples. An example legal reasoning system is used to verify the effectiveness of this model. Extending the case base and considering hypotheticals in the argument are items for future work.


References

[1] Amos Tversky, "Features of Similarity", Psychological Review, Vol.84, No.4, pp. 327-352, 1977

[2] Athena Tocatlidou, "Learning-based Similarity Measurement for Fuzzy Sets", Int. J. of Intelligent Systems, Vol.13, pp. 193-220, 1998

[3] Bernadette Bouchon-Meunier, et al., "Towards general measures of comparison of objects", Fuzzy Sets and Systems, Vol.84, pp. 143-153, 1996

[4] Y. Chang, "Context-Dependent Similarity", Uncertainty in Artificial Intelligence 6, P.P. Bonissone, et al. (editors), pp. 41-47, North-Holland, NY, 1991

[5] S.M. Chen, M.S. Yeh, P.Y. Hsiao, "A Comparison of Similarity Measures of Fuzzy Values", Fuzzy Sets and Systems, Vol.72, No.1, pp. 79-89, 1995

[6] Edwina L. Rissland, Kevin D. Ashley, "A Case-Based System for Trade Secrets Law", Proc. of ICAIL'87, pp. 60-65, 1987

[7] Kaoru Hirota, "Extended Fuzzy Expression of Probabilistic Sets", in Advances in Fuzzy Set Theory and Applications, M.M. Gupta et al. (eds.), North-Holland, pp. 201-214, 1979

[8] Kaoru Hirota, et al., "A Precedent-based Legal Judgement System Using Fuzzy Database", Int. J. of Uncertainty, Fuzziness and Knowledge-Based Systems, Vol.4, No.6, pp. 573-580, 1996

[9] Nikola Schretter, "A Fuzzy Logic Expert System for Determining the Required Waiting Period After Traffic Accidents", EUFIT'96, 1996

[10] Nikos Karacapilidis, et al., "Using Case-Based Reasoning for Argumentation with Multiple Viewpoints", Proc. of ICCBR 1997, pp. 541-552

[11] Vincent Aleven, Kevin D. Ashley, "How Different Is Difference? Arguing About the Significance of Similarities and Differences", Proc. of EWCBR'96, pp. 1-15, 1996

[12] R. Zwick, E. Carlstein, et al., "Measures of Similarity Among Fuzzy Concepts: A Comparative Analysis", Int. J. of Approximate Reasoning, Vol.1, pp. 221-242, 1987


Chapter 11

Trend Fuzzy Sets and Recurrent Fuzzy Rules for Ordered Dataset Modelling

J.F. Baldwin, T.P. Martin, J.M. Rossiter

University of Bristol, UK

Abstract

We present two methods of modelling ordered datasets using Baldwin's mass assignment. The first method generates a simplified memory-based fuzzy belief updating model. Results are given in application to particle classification and facial feature detection. The second method uses a new, high level, fuzzy trend feature based on a set of fuzzy trend prototypes. These prototypes are closely related to human perceptions of shape in ordered series. The models generated using this method are concise and linguistically clear glass box models. Results are given in application to sunspot and simple sinewave data series.

Keywords: mass assignment, fuzzy sets, perception-based modelling, memory-based modelling, ordered datasets, time series, trend modelling, belief updating, Fril, high level features, recurrent fuzzy rules

11.1 Introduction and background

In this paper we will describe how mass assignment can be the enabling factor in the modelling and prediction of ordered datasets. The models produced are clear, concise and descriptive glass box models of ordered data. Results are given for gas particle classification, for facial feature extraction from digital images and for time series problems, including the prediction of sunspot activity. We take two approaches to the problem of modelling ordered datasets. Both are derived from simple analysis of human behaviour. Our simple perception-based and memory-based models of ordered datasets aim to use high level features and mechanisms based on human behaviour.


Perception-based modelling using soft computing with mass assignment is based on the high level perception mechanisms used by humans to sense their environment. This includes human hearing, touch, smell, and, in our case, vision. The decomposition and interpolation properties of soft computing techniques using mass assignment are well suited to manipulating these complex features.

Memory-based modelling using soft computing does not focus on the features being manipulated, such as the perception-based features described above, but on how the computing model captures human belief and memory. Implicit in this model is a representation of belief and a method of updating beliefs.

Our tool for this soft computing research is the fuzzy set theory based on mass assignment [4]. We will summarize the features of mass assignment which are used in this paper and point the reader in the direction of further information on this exciting soft computing paradigm.

We assume the reader has a basic knowledge of probability, possibility and classical fuzzy set theory and has, at the very least, read Zadeh's original fuzzy set publication [11].

This paper builds on the research first presented in [6] and [7].

11.1.1 A mass assignment interpretation of fuzzy sets

Baldwin's mass assignment unifies probability, possibility and fuzzy sets into a single theory. This section is intended as a brief summary of mass assignment and a pointer to more detailed literature.

11.1.1.1 Mass assignment definition

A mass assignment m is defined on the powerset P(X) of the universe X such that the following three conditions hold.

(1) m : P(X) → [0, 1]
(2) m(∅) = 0
(3) Σ_{A ∈ P(X)} m(A) = 1

Consequently we can say that for all A ∈ P(X), m(A) ≤ 1. Note also that m(A) can be greater than m(B) even when A ⊂ B.

Complete certainty can be expressed by the mass assignment m(A) = 1, where A is a singleton in X and m(B) = 0 for all B ≠ A.


Complete uncertainty can be expressed by the mass assignment m(X) = 1.
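The three conditions translate directly into a small validity check. Representing a mass assignment as a dictionary from frozensets to masses is our own choice, not notation from the text.

```python
def is_mass_assignment(m: dict) -> bool:
    """Check conditions (1)-(3) for a mass assignment over P(X)."""
    return (all(0.0 <= v <= 1.0 for v in m.values())   # (1) masses in [0, 1]
            and m.get(frozenset(), 0.0) == 0.0         # (2) no mass on the empty set
            and abs(sum(m.values()) - 1.0) < 1e-9)     # (3) masses sum to 1

certain   = {frozenset({"a"}): 1.0}              # complete certainty: a singleton
uncertain = {frozenset({"a", "b", "c"}): 1.0}    # complete uncertainty: m(X) = 1
print(is_mass_assignment(certain), is_mass_assignment(uncertain))   # True True
```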

11.1.1.2 Mass assignments and probability distributions

Given these restrictions it is clear that a mass assignment defines a family of probability distributions over the universe X. Take for example a mass assignment m = {x1} : 0.2, {x1, x2} : 0.3, {x2, x3} : 0.4, {x1, x2, x3} : 0.1 defined over the universe X = {x1, x2, x3}. This mass assignment enforces the following probability restrictions, where u, v, x and y are variables restricting the family of probability distributions allowed:

Pr(x1) = 0.6 - x - u - v
Pr(x2) = x + y + u
Pr(x3) = 0.4 - y + v

based on the restriction variables u, v, x and y,

0 ≤ x ≤ 0.3
0 ≤ y ≤ 0.4
u, v ≥ 0
0 ≤ u + v ≤ 0.1

11.1.1.3 A voting model of fuzzy sets

Mass assignments are related directly to fuzzy sets, and hence to possibility distributions, through Baldwin's voting interpretation of fuzzy sets [1]. Using the voting interpretation any normalized fuzzy set can be expressed as a mass assignment. Figure 11.1(a) shows a simple discrete fuzzy set T = {a/1, b/0.7, c/0.5, d/0.1}. The mass assignment m is derived from T using the voting interpretation as illustrated in figure 11.1(b). In this example we consider a group of 10 voters, each of whom must vote on the acceptance of a member of P(X) given T. From figure 11.1(b) we see that voters 10, 9 and 8 accept only a, voters 7 and 6 accept only a or b, voters 5, 4, 3 and 2 accept only a, b or c, and voter 1 accepts any of a, b, c or d. Normalizing the number of voters accepting any one proposition generates the mass assignment m for fuzzy set T shown in Eq. 1.

m = {a} : 0.3, {a, b} : 0.2, {a, b, c} : 0.4, {a, b, c, d} : 0.1 (1)
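The construction illustrated in figure 11.1 can be written generically: order the elements by decreasing membership and give each nested cut the drop in membership at its boundary. The dictionary representation below is our own; the procedure itself follows the voting interpretation just described.

```python
def fuzzy_set_to_mass_assignment(fs: dict) -> dict:
    """Convert a normalized fuzzy set {element: membership} into a mass
    assignment over its nested cuts (Baldwin's voting interpretation)."""
    items = sorted(fs.items(), key=lambda kv: kv[1], reverse=True)
    levels = [mu for _, mu in items] + [0.0]
    masses = {}
    for i in range(len(items)):
        mass = levels[i] - levels[i + 1]   # proportion of voters stopping here
        if mass > 0:
            cut = frozenset(e for e, _ in items[: i + 1])
            masses[cut] = masses.get(cut, 0.0) + mass
    return masses

T = {"a": 1.0, "b": 0.7, "c": 0.5, "d": 0.1}
print(fuzzy_set_to_mass_assignment(T))
# (up to floating point) {a}: 0.3, {a, b}: 0.2, {a, b, c}: 0.4,
# {a, b, c, d}: 0.1 -- the mass assignment of Eq. 1
```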


Fig. 11.1 A voting interpretation of fuzzy sets: (a) the fuzzy set T over {a, b, c, d}; (b) the voters' acceptances, giving m{a} = 0.3, m{a, b} = 0.2, m{a, b, c} = 0.4, m{a, b, c, d} = 0.1

Having converted a fuzzy set into a mass assignment we can now use the calculus of mass assignment to reason with fuzzy sets at the mass level. The advantage of this representation is the close relationship between mass assignments and their corresponding families of probability distributions. Mass assignment therefore provides the crucial link between probability and fuzzy sets. This is a great enabler in developing soft computing solutions based on a more unified theory than that commonly used by the fuzzy logic community.

The programming language Fril implements mass assignment and support logic calculus in a logic programming framework. For more information on Fril, mass assignment and support logic see [8] and [5].

11.2 Memory-based fuzzy belief updating

In this section we will present a simple belief updating system using recurrent fuzzy rules. As we shall show, this system improves class prediction in ordered datasets. We will approach the problem of belief updating from a human-centred memory and belief angle.


11.2.1 Why belief updating for ordered datasets?

The belief updating method proposed in this section was developed in response to problems encountered in classifying data in particle streams. The following example outlines the particle classification problem.

The particle stream classification problem

This problem involves the detection of hazardous particles in a stream of gases. It is important both in closed spaces, such as chemical plants and mine shafts, and in open spaces, such as during toxic chemical leaks and chemical and biological warfare.

The gas to be analysed flows past a sensor. This sensor generates a feature tuple <E1, E2, E3, E4, T>, where E1, ..., E4 are continuous domain features and T is some measure of time. Figure 11.2 shows this more clearly.

Fig. 11.2 A particle detector: the ordered dataset of input tuples <E1, E2, E3, E4, T> produces output class values

A system is needed which takes each tuple in the ordered dataset in turn and generates a confidence that the gas currently being detected is of a particular class. This confidence can be used to calculate the concentration of each gas in a mixture.

Problems exist in such ordered datasets where class boundaries are indistinct. Indistinct classification is clearly an area where soft computing can have an impact. Given that the datasets we are dealing with contain some (however loose) ordering, it is natural to use information contained in the order itself when modelling the dataset.

As with many other methods we could, of course, consider some window of size m on the ordered dataset and process all or part of the window. This is shown schematically in figure 11.3, where e_{t=0}, ..., e_{t=n} is the input stream of evidence and c_{t=n} is the classification based on the evidence stream at time t. Clearly this can lead to computational explosion as the size of the window increases, although this is also determined by the methods employed by the artificial decision maker. It is more practical, efficient and elegant to model the ordering of data as a stream of evidence presented to a human being. This is shown in figure 11.4. The human decision maker here is only given one piece of evidence e_{t=n} at a time and must judge the classification c_{t=n} at any time using only its memory of past evidence and the current evidence.

Fig. 11.3 Decision making on a windowed evidence stream: an evidence window of size m over e_{t=0}, ..., e_{t=n} feeds an artificial decision maker, which outputs the classification c_{t=n}

Fig. 11.4 Decision making on a point-by-point evidence stream: a single evidence term e_{t=n} feeds a human decision maker, who outputs the classification c_{t=n}

To keep complexity to a minimum our model uses only a single memory component, in much the same way as a human would. This memory component holds a measure of belief in a classification given all previous evidence. The problem is then to construct a belief updating model using a single memory term that both represents human belief updating and is robust.

There are many other real world situations where such a simple belief and memory model derived from human behaviour is applicable. The student grades example below is one such problem.


A student grades example

A student is working for his exams. To help him prepare he is given weekly tests. The student's teacher must assess the competence of the student after each test. Competence is graded as low, medium or high. Table 11.1 shows that, given the test scores, the teacher would grade this student with low competence for each of the first three weeks. At the fourth week the student's grades improve markedly and the teacher faces a problem in assessing the competence A of the student. If the assessment of competence A is low then the teacher is clearly discounting the weight of the fourth week test result. If the competence A is judged high then the teacher is placing too much emphasis on the latest scores and not enough on past performance.

Table 11.1 Student grades example

Week | Score | Competence
1    | 32    | low
2    | 35    | low
3    | 34    | low
4    | 78    | A

Since competence is generally agreed to be a measure of overall performance in a subject, some aggregation of previous scores is appropriate. In reality most teachers would grade the student with medium competence at the end of week 4. This example shows how the teacher's judgment is a function of the belief in the student's past competence and the latest test results.

In the next section we will examine the implementation of our simple belief updating model based on human behaviour using recurrent fuzzy rules. The performance of such a system is then shown in application to particle classification in gaseous streams and the detection of lip areas in facial images.

11.2.2 Fuzzy belief updating methods

Single point classification using mass assignment based Fril fuzzy rules generates a support y_t for a class c at time t given the current input attributes {a_1, ..., a_n}. y_t is therefore some function of the input attributes, i.e., y_t = f({a_1, ..., a_n}).

Belief updating can be thought of as a separate process to classification, where the current class support y_t forms the input attribute to the belief model. The belief model generates a class support given all the previous class supports. The class support y'_t from the belief model is therefore some function g of the current class support from the classifier and some aggregation of previous class supports, i.e., y'_t = g(y_t, ..., y_{t-m}).

In the following three subsections we outline three possible belief updating methods:

• external fuzzy belief updating,
• internal fuzzy belief updating,
• recurrent interacting fuzzy belief updating.

11.2.2.1 External belief updating method

Figure 11.5 shows how a single fuzzy rule is combined with an external belief updating model to generate support for a given class. Here classification and belief updating are processed separately and sequentially. A belief updating model such as Einhorn and Hogarth's anchoring-and-adjustment model, as discussed in [9], can be used.

Fig. 11.5 A fuzzy rule with belief aggregation: the datapoint {a_1, ..., a_n} feeds Rule_c, whose support passes through the belief model to give the support y'_t for class c

The advantage of this system is that any standard belief updating method can be applied after the fuzzy rule system. The disadvantage is that problems can be encountered with such external systems due to complex interactions between the fuzzy rule system and the belief updating model.
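As one concrete possibility, an anchoring-and-adjustment style update moves the current belief a fixed fraction of the way towards each new piece of evidence. The simple linear form and the weight below are our own illustrative assumptions, not necessarily the exact model discussed in [9].

```python
def adjust_belief(anchor: float, evidence: float, w: float = 0.4) -> float:
    """One anchoring-and-adjustment style step: adjust the current belief
    (the anchor) by a fraction w of its difference from the new evidence."""
    return anchor + w * (evidence - anchor)

# Class supports y_t arriving one at a time from the fuzzy rule.
belief = 0.0
for y_t in [0.9, 0.8, 0.2, 0.85]:
    belief = adjust_belief(belief, y_t)
print(round(belief, 3))   # 0.581, a smoothed class support y'_t
```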

11.2.2.2 Internal belief updating method

To simplify the external belief updating model described above we can incorporate past support as an extra term in the fuzzy rule and use this to replace the external belief model. This feedback rule has now internalized the belief updating component. The internalized belief rule, as shown in figure 11.6, is simple and does not interact with any of the other class rules.

If a series of inputs is presented to these rules and we then ask the question "what is the predicted class?", the answer would be the class given the highest rule support. This answer discards all other class supports and therefore ignores any belief the model may have in any other class. This is inappropriate when two different classes have high supports which are almost identical.

Fig. 11.6 A simple support feedback rule

We propose that a more appropriate representation for predicted class supports is to form a fuzzy set called "predicted class" across the whole class universe. We propose the introduction of such a fuzzy set into our rule system, and will refer to this fuzzy set from now on as the predicted class fuzzy set. We will later detail how the predicted class fuzzy set is generated from rule supports.

The memberships of each class in the predicted class fuzzy set are calculated from the normalized rule supports using Baldwin's voting model of fuzzy sets. Eq. (2) is an example of a predicted class fuzzy set constructed on the class universe {a, b, c}.

predicted class = a/1 + b/0.2 + c/0.5    (2)

Eq. (2) indicates that the system believes most strongly that the evidence so far supports class a. Clearly even after introducing the predicted class fuzzy set the winning class is the class with highest membership in predicted class. It is important to note that the shape of the predicted class fuzzy set records information about the current state of belief in each and every class, not just the winner.
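The full construction uses Baldwin's voting model; as a minimal illustration only, the sketch below simply rescales per-class rule supports so that the strongest class has membership 1, which reproduces the memberships in Eq. (2). The function name and the example supports are our own.

    def predicted_class_fuzzy_set(rule_supports):
        """Scale per-class supports so the winning class has membership 1
        (an illustrative shortcut, not the full voting-model construction)."""
        top = max(rule_supports.values())
        return {c: s / top for c, s in rule_supports.items()}

    print(predicted_class_fuzzy_set({"a": 0.8, "b": 0.16, "c": 0.4}))
    # {'a': 1.0, 'b': 0.2, 'c': 0.5}  -- cf. Eq. (2)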

11.2.2.3 Recurrent interacting fuzzy belief updating method

We can also use this predicted class fuzzy set as a belief element by feeding back the last predicted class fuzzy set into all prediction rules. Semantic unification [8] provides a measure of Pr(previous class | predicted class) within the feedback term. The resulting rule structure is shown in figure 11.7. If we again ask the question, "what is the predicted class?", the answer would now be the complete predicted class fuzzy set.

Fig. 11.7 The recurrent interacting belief updating fuzzy rule

A fuzzy rule with a feedback term involving the predicted class fuzzy set mimics positive and negative belief updating. Positive updating occurs when class c membership is high. This results in a high match with the feedback term in rule c, which in turn results in belief reinforcement in class c. Negative updating occurs when class c membership is low and membership of some other class or classes is high. The match with the feedback term in rule c is low but the match with feedback terms in other rules is high. This results in a support for class c which, once all class supports have been normalized, is reduced. We call this an interactive recurrent fuzzy belief updating rule. The recurrence in this rule is implemented by feeding back the predicted class fuzzy set.

The inference rule used in this belief system is the Fril evidential logic rule. The evidential logic used for this interactive belief updating system is best explained with reference to figure 11.8.

((Class is c if)
 (evlog (
   (feature 1 is fuzzy set 1)
   (feature 2 is fuzzy set 2)
   (previous class was previous class)
 ))) : ((1 1)(0 0))

Fig. 11.8 A recurrent evidential logic rule

The feedback term in figure 11.8 is represented by the third body term in the rule. The previous class fuzzy set is a reference fuzzy set generated during training. The previous class fuzzy set is defined over the class universe and holds information about which classes are likely to precede the class associated with that particular rule.


If the predicted class fuzzy set matches the previous class fuzzy set to a high degree then the feedback term in this rule will have high support and this will contribute to this whole rule having high support.

The meaning of the support pairs ((1 1)(0 0)) is explained in detail in section 11.3.4.

We can generate the previous class fuzzy set simply by analyzing the transitions between classes in an ordered training dataset. The process is as follows:

(1) First we construct a transition matrix, such as in table 11.2, by counting the transitions from previous class to current class throughout the training dataset. Each column contains information that describes the likelihood of each class preceding the current class.

(2) Each column is then normalized to produce a probability distribution.

(3) The probability distributions are then converted into fuzzy sets previous class using mass assignment.

(4) The resulting previous class fuzzy sets are then included in the feedback term of the corresponding evidential logic rules.

                       Current
                  a     b     c
Previous     a    90    7     3
             b    5     82    13
             c    5     11    84

Table 11.2 Class transitions

As an example, let us take the column for current class c. Given that the current class is c, the classes {a, b, c} will precede the current class with the probabilities Pr(a) = 0.03, Pr(b) = 0.13, and Pr(c) = 0.84. Using mass assignment, we obtain the previous class_c fuzzy set for rule c shown in Eq. (3).

previous class_c = a/0.09 + b/0.29 + c/1    (3)

Here a least prejudiced distribution is assumed in the distribution of masses defined by the previous class fuzzy set to the focal elements {a, b, c}.

The memberships of the focal elements in the previous class fuzzy set are calculated for this example in Eq. (4).

χ(a) = Pr(a) × 3 = 0.09
χ(b) = χ(a) + (Pr(b) − Pr(a)) × 2 = 0.29
χ(c) = χ(b) + (Pr(c) − Pr(b)) = 1    (4)

This least prejudiced method of generating a fuzzy set from a probability distribution is described in detail, including a simple algorithm, in [5].
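Under our reading of the worked example in Eqs. (3)-(4) (see [5] for the definitive algorithm), the memberships can be computed by sorting the classes by ascending probability and accumulating scaled probability gaps. A minimal Python sketch:

    def least_prejudiced_fuzzy_set(prob):
        """Convert a probability distribution into fuzzy memberships,
        following the accumulation pattern of Eq. (4)."""
        items = sorted(prob.items(), key=lambda kv: kv[1])  # ascending Pr
        k = len(items)
        chi, prev_p, acc = {}, 0.0, 0.0
        for rank, (label, p) in enumerate(items):
            acc += (p - prev_p) * (k - rank)
            chi[label] = round(acc, 2)
            prev_p = p
        return chi

    print(least_prejudiced_fuzzy_set({"a": 0.03, "b": 0.13, "c": 0.84}))
    # {'a': 0.09, 'b': 0.29, 'c': 1.0}  -- matches Eq. (3)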

Having constructed previous class fuzzy sets for all rules, the feedback weight for each rule can be calculated. We use semantic discrimination analysis [2] to determine the discriminating power of each previous class fuzzy set. These discrimination values are then normalized and distributed across the rules as term weights. Weights for attribute terms are generated using semantic discrimination analysis between attribute fuzzy sets.

11.2.3 Fuzzy belief updating results

11.2.3.1 Facial feature extraction

These results illustrate the operation of the interactive recurrent fuzzy rule in updating belief across a stream of data derived from facial images [3]. This application gives us a clear visual indication of how the belief rules operate internally.

In these results we use 100x100 pixel colour bitmap images. The goal of this application is to produce a binary classification of the image into lip and not-lip regions. Each point is represented by the two normalized colour features red/(red + green + blue) and green/(red + green + blue); no other features were used. The test image is traversed horizontally from top-left to bottom-right as a single stream of pixels. This traversal method is illustrated in figure 11.9 on a small section of a facial image showing just the lip region. The four stages in the image classification process are:

(1) Take the 100 x 100 image and process it into 100 horizontal slices, each of 1 pixel width.
(2) Join together the horizontal slices to form a stream of 10,000 single pixels. The top slice is placed first in the stream and the bottom slice is the last.
(3) Feed the pixel stream into the trained fuzzy rule system. Each pixel point is classified as lip or not-lip.
(4) Reconstruct the facial image using the reverse of the processes in stages 1 and 2. The final image in figure 11.9 shows the lip regions as a shaded block.

A minimal code sketch of this traversal is given after figure 11.9.

Fig. 11.9 Images processed as linear streams (panels: 1. original image, 2. linearized image, 3. classified series, 4. classified image)
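The four stages above can be sketched in a few lines of Python with numpy. The threshold classifier is only a stand-in for the trained fuzzy rule system, and the feature computation follows the two normalized colour features described earlier; function names are our own.

    import numpy as np

    def to_features(rgb):
        """Per-pixel features: red/(r+g+b) and green/(r+g+b)."""
        total = rgb.sum(axis=-1, keepdims=True)
        return (rgb / total)[..., :2]

    def image_to_stream(img):
        """Stages 1-2: cut into horizontal slices and join, top slice first."""
        return img.reshape(-1, img.shape[-1])

    def stream_to_image(labels, height=100, width=100):
        """Stage 4: reverse of stages 1-2, rebuilding the classified image."""
        return np.asarray(labels).reshape(height, width)

    rgb = np.random.rand(100, 100, 3) + 1e-9          # stand-in colour bitmap
    stream = image_to_stream(to_features(rgb))        # 10,000 feature pairs
    labels = [1 if r > 0.4 else 0 for r, g in stream] # placeholder classifier
    lip_mask = stream_to_image(labels)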

The 100x100 image in figure 11.10(a) was first processed by hand to define the rough lip mask in figure 11.10(b). These two images were used to train the fuzzy rules. Two test rule sets were generated, one using simple, non-recurrent, fuzzy rules and one using recurrent interacting belief updating rules. The first test involved predicting the lip region in figure 11.10(a) using the two rule sets.

Fig. 11.10 Lip training images

Figures 11.11(a) and 11.11(b) show the predicted lip regions for simple fuzzy rules and recurrent interacting belief updating rules respectively.

Fig. 11.11 Lip test images

A second test further highlights the performance of the recurrent rule in this application. The rule sets generated from the original figure 11.10(a) were tested on a second facial image, this time from a female. Figures 11.12(a) and 11.12(b) show the classification results using simple fuzzy rules and recurrent interacting belief updating rules respectively. Figures 11.12(c) and 11.12(d) show the lip areas from figures 11.12(a) and 11.12(b) respectively in greater detail.

Figures 11.11(b) and 11.12(b) show markedly less speckling from misclassification than figures 11.11(a) and 11.12(a). This is due to the smoothing effect of the belief memory term. To best apply this method to facial feature detection the images can be traversed in the same way as described in figure 11.9, but in all four directions: left to right, right to left, top to bottom, and bottom to top. This will clear up more of the speckling and unify the lip region shown in figure 11.12(d).

Fig. 11.12 Further lip test images

11.2.3.2 Particle classification

Two datasets were generated from two classes of gaseous particle, each point having five continuous features.

• Dataset 1 represents a series where 200 particles of class 1 appear first in the series, followed by 200 particles of class 2. Clearly dataset 1 represents the extreme case where belief in class 1 increases consistently through the first half of the dataset and then is contradicted at the half-way point. From the halfway point belief in class 1 falls consistently as belief in class 2 rises.

• Dataset 2 represents a series where no two points from the same class are next to each other in the series. Clearly this also is an extreme case. Here the system need only have belief in the next point given the previous point.

Table 11.3 shows results of class prediction on both datasets using single point prediction and using the following four evidential logic rule structures:

a Simple evidential logic rules with no belief updating. Here the class is deduced only from the current input attribute tuple.

b Simple feedback rules. Support values are fed back into the same rules. Here the weight of the feedback term in each rule is set by hand at 0.3.

c Feedback rules with automatic weight calculation. Here support values are fed back into the same rules. The weights of the feedback terms are calculated automatically using semantic discrimination analysis.

d Feedback rules using predicted class and previous class fuzzy sets. The feedback weights are calculated automatically; predicted class and previous class fuzzy sets are generated using the methods described in sections 11.2.2.2 and 11.2.2.3.

            rule structure
dataset     a        b        c       d
1           56.25    59       69.5    67
2           59.75    55.75    60      70

Table 11.3 Results using particle data: % correctly classified

Results on such particle classification datasets typically show a 10 percent improvement when using our recurrent interacting belief updating rules over non-recurrent fuzzy rules.

11.2.4 Some conclusions from recurrent belief updating

We have shown that by adding a feedback term to the evidential logic rule we have generated a simple belief updating rule. The key to rule interaction is the use of the predicted class fuzzy set as a feedback element. We have also shown how rule weights and reference previous class fuzzy sets are generated automatically from the training data using mass assignment and semantic discrimination analysis.

Results in the applications of facial feature extraction and particle classification in streams show improvement using the belief updating evidential logic rule over simple non-recurrent evidential logic rules.

11.3 Perception-based fuzzy trend modelling

As we have shown above, classification in ordered datasets can be improved by the addition of a simple belief updating component. While this method is indeed very useful, and the model is a glass box, we often require a model that is more naturally descriptive of the behaviour of the dataset. Crucial to a concise natural description are the features used and their linguistic labeling.

The aim of this next section is to describe a time series with a set of rules which use natural linguistic terms. In our analysis we use natural linguistic terms such as rising, falling, rising more steeply, crest, etc. Using these natural terms we produce a glass box model of the series which is easily understood. This linguistic model avoids unintuitive complex mathematical descriptors and black box models, such as those commonly produced by neural networks.

We can also use the linguistic fuzzy rule models to predict how a time series will behave in the future. Such time series prediction has many real-world applications such as sunspot prediction to minimize telecommunication disruption.

11.3.1 An introduction to shape descriptors for ordered datasets

Simple linguistic terms such as rising and more complex terms such as rising more steeply are applied to ordered series to generate our descriptive models.

A simple linguistic term such as rising indicates that the series S at that point is changing; i.e., rising implies S_{x+1} > S_x. These terms are fuzzy measures of the first derivative trend of the series. A more complex term such as rising more steeply indicates that the trend of the series is changing; i.e., rising more steeply implies S_{x+2} − S_{x+1} > S_{x+1} − S_x. These terms are fuzzy measures of the second derivative trend of the series.

By matching these concepts to a time series we can describe the series with simple rules.

Figure 11.13(a) shows how shapes based on these natural linguistic terms can be matched with a discrete data series. Here rising more steeply matches the discrete series S to a higher degree than rising.

Fig. 11.13 Shape matching: (a) rising and rising more steeply shapes matched against a discrete series S(x); (b) a region of interest of S(x) matching rising and rising less steeply


The shape of any series will typically resemble more than one trend shape. For example, within a region of interest, a series can be rising steeply and leveling off at the same time. If we represent the trend shapes as fuzzy sets, a window on the series can have membership in more than one trend fuzzy set at the same time, as shown in figure 11.13(b).

Using these trend concepts we can build fuzzy rules to describe the series shape. These rules are of the form "next point is X if previous trend was Y", where X is a fuzzy value for the next point of S, e.g. "about 0.5", and Y is a trend fuzzy set, such as "rising more steeply".

In practice it is simpler to construct rules that predict dS, the first derivative of the series, rather than the series S itself. To obtain the next S value the predicted dS is added to the current S value. Such a rule would have the form, "next point is current point + dS if previous trend was Y".

11.3.2 Prototype trend shapes

Before constructing our descriptive fuzzy rule model we first define a number of prototype trend fuzzy sets. Clearly these second derivative-like trend fuzzy sets must not be inconsistent with what most people understand by the corresponding linguistic labels. Figure 11.14 shows six such prototype shapes defined by the functions in table 11.4.

trend                   function f(x)
falling less steeply    1 − (1 − (1−x)^a)^(1/2)
rising less steeply     (1 − (1−x)^a)^(1/2)
falling more steeply    (1 − x^a)^(1/2)
rising more steeply     1 − (1 − x^a)^(1/2)
crest                   1 − 2^a (x − 0.5)^a
trough                  2^a (x − 0.5)^a

Table 11.4 Example prototype functions

Each second derivative trend fuzzy set is constructed from a prototype function in table 11.4 by taking a window of m + 1 values of f(x) where x is distributed uniformly across the interval [0, 1]. A trend fuzzy set is generated from the differences of this window series as described in the next section.

These prototype trend fuzzy sets are an important measure of the shape of a series. The prototypes are naturally linguistic and anyone encountering these linguistic names will have a good idea of the shapes they represent.

Fig. 11.14 Example prototype shapes (falling less steeply, falling more steeply, rising more steeply, rising less steeply, crest, trough), a = 2

Having generated a set of prototype trend fuzzy sets we can now take a window in a series, generate the trend fuzzy set for that window and calculate Pr(prototype trend | current trend) using Fril semantic unification [8]. This forms the basis of our fuzzy rule model.
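Under our reconstruction of table 11.4, the prototypes and their difference windows can be generated as follows; the window function samples m + 1 points of f uniformly on [0, 1] and differences them, as described above. This is an illustrative sketch only.

    import numpy as np

    a = 2  # shape parameter, as in figure 11.14
    prototypes = {
        "falling less steeply": lambda x: 1 - (1 - (1 - x)**a)**0.5,
        "rising less steeply":  lambda x: (1 - (1 - x)**a)**0.5,
        "falling more steeply": lambda x: (1 - x**a)**0.5,
        "rising more steeply":  lambda x: 1 - (1 - x**a)**0.5,
        "crest":                lambda x: 1 - 2**a * (x - 0.5)**a,
        "trough":               lambda x: 2**a * (x - 0.5)**a,
    }

    def prototype_window(f, m):
        """Sample m+1 values of f uniformly on [0, 1] and return the m
        differences, i.e., the window a trend fuzzy set is built from."""
        x = np.linspace(0.0, 1.0, m + 1)
        return np.diff(f(x))

    print(prototype_window(prototypes["rising more steeply"], 4))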

11.3.3 Method for generating trend fuzzy sets

Given a time series S and current position n, our goal is to predict the next point s_{n+1} from the last m points [s_{n−m}, ..., s_n].

To generate a trend fuzzy set we process a window of size m from the time series. A window is taken from the difference series D, derived from the original series S such that d_n = s_{n+1} − s_n. Figure 11.15 shows how D is derived from S.

Fig. 11.15 Extracting difference window D

The windowed difference values [d_1, ..., d_m] are tested for membership in p difference fuzzy sets, {f_1, ..., f_p}. These difference fuzzy sets are defined on the continuous difference universe of D. Examples of such difference fuzzy sets are shown in figure 11.16. We can label the fuzzy sets {f_1, ..., f_5} with appropriate linguistic labels such as falling fast, falling slowly, constant, rising slowly, and rising fast.

Fig. 11.16 Difference fuzzy sets f_1, ..., f_5 on the difference universe [−1, 1]

We must now calculate memberships of each d in each difference fuzzy set f. This generates a membership set P, shown in Eq. (5).

P = {χ_{f_i}(d_j) | i ∈ {1, ..., p}, j ∈ {1, ..., m}}    (5)

where χ_{f_i}(d_j) is the membership of point d_j in fuzzy set f_i. Figure 11.16 illustrates how memberships are calculated. Here χ_{f_2}(d) = 1 and χ_{f_3}(d) = 0.75.

Having constructed a set of memberships for each window point in each fuzzy set we now convert this into a single second derivative fuzzy set. Second derivative trend fuzzy sets are defined on the universe of compound labels L, shown in Eq. (6).

L = {label(f_i).j | i ∈ {1, ..., p}, j ∈ {1, ..., m}}    (6)

where label(f_i) is the linguistic label of fuzzy set f_i, such as rising, falling, etc., and j is the position of the current point in the series window. Label label(f_i).j is the concatenation of label(f_i) and the number j into a compound label. Examples of valid labels label(f_i).j are therefore "rising1", "falling5", and "constant2".

The membership of label(f_i).j in the second derivative fuzzy set has the same value as the membership of d_j in difference fuzzy set f_i (that is, χ_{f_i}(d_j)) and is taken directly from the difference membership set P. Now we have constructed a second derivative fuzzy set which describes the trend of the series in the last m points.
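A minimal sketch of this construction, with triangular difference fuzzy sets standing in for those of figure 11.16 (the shapes and widths are our illustrative choices, not those used in the reported experiments):

    def triangular(centre, width):
        """A simple triangular membership function on the difference universe."""
        return lambda d: max(0.0, 1.0 - abs(d - centre) / width)

    diff_sets = {
        "falling fast":   triangular(-1.0, 0.5),
        "falling slowly": triangular(-0.5, 0.5),
        "constant":       triangular( 0.0, 0.5),
        "rising slowly":  triangular( 0.5, 0.5),
        "rising fast":    triangular( 1.0, 0.5),
    }

    def trend_fuzzy_set(window):
        """Second derivative trend fuzzy set on compound labels label(f_i).j,
        as in Eqs. (5) and (6)."""
        return {f"{label}{j}": mu(d)
                for j, d in enumerate(window, start=1)
                for label, mu in diff_sets.items()
                if mu(d) > 0.0}

    print(trend_fuzzy_set([0.03, 0.10, 0.20, 0.66]))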

11.3.4 Method for generating linguistic fuzzy rules

The standard Fril inference rule [5] is based on Jeffrey's rule, Eq. (7), where h is the head of the rule and b is the body.

Pr(h) = Pr(h | b) Pr(b) + Pr(h | ¬b) Pr(¬b)    (7)

Figure 11.17 shows a standard Fril rule. (n p) is the support pair representing Pr(h | b) and (l u) is the support pair representing Pr(h | ¬b). Clearly if Pr(b) is known it is a trivial matter to calculate Pr(h).

((difference is falling if) (trend was falling_less_steeply)) : ((n p)(l u))

Fig. 11.17 Standard Fril rule

Since we are dealing with first derivative linguistic terms (such as low) in the head and second derivative terms (such as falling less steeply) in the body we need a method of determining Pr(h|b).

We can determine Pr(h | b) for each rule by constructing a table of trend fuzzy set labels against difference fuzzy set labels, as shown in table 11.5. Each (n p) is the support pair representing the conditional probability of the difference fuzzy set label given the corresponding prototype trend fuzzy set label, and is equal to Pr(h | b) for the standard Fril rule. An (n p) pair is calculated as follows, where all probabilities are expressed as support pairs:

(1) Let us consider the set of difference fuzzy set labels describing fuzzy sets rising, constant, falling, etc., {A_i} = {label(f_i)}, and the set of trend fuzzy set labels describing fuzzy sets rising more steeply, crest, falling less steeply, etc., {B_k} = {label(g_k)}.

(2) We can directly calculate Pr(A_i | d_j) for each A_i and each j in the dataset using semantic unification.

(3) We can calculate Pr(B_k | d_j) by first taking a window of the m previous values in the difference stream D. This window, [d_{j−m}, ..., d_j], is converted into a trend fuzzy set g as described in section 11.3.3. Semantic unification now gives us Pr(B_k | g), which we will take as being equal to Pr(B_k | d_j). This is repeated for all A_i and for all B_k.

(4) So now we have values for Pr(A_i | d_j) and Pr(B_k | d_j), and thus Pr((A_i, B_k) | d_j) = Pr(A_i | d_j) · Pr(B_k | d_j).

(5) We now need to calculate Pr(A_i, B_k) from Pr((A_i, B_k) | d_j), i.e., Pr(A_i, B_k) = Σ_j Pr((A_i, B_k) | d_j) · Pr(d_j). If we assume no prior for Pr(d_j), Pr(A_i, B_k) is now Σ_j Pr((A_i, B_k) | d_j), which is easily calculated (we can assume no prior because the difference fuzzy sets are constructed on the distribution of the training set).

(6) Given Pr(A_i, B_k) we now calculate Pr(A_i | B_k) and insert this into the appropriate cell in table 11.5.

                         Difference fuzzy set label
Trend fuzzy set label    falling      constant     rising
falling less steeply     (n1 p1)      (n7 p7)      (n13 p13)
falling more steeply     (n2 p2)      (n8 p8)      (n14 p14)
rising less steeply      (n3 p3)      (n9 p9)      (n15 p15)
rising more steeply      (n4 p4)      (n10 p10)    (n16 p16)
crest                    (n5 p5)      (n11 p11)    (n17 p17)
trough                   (n6 p6)      (n12 p12)    (n18 p18)

Table 11.5 Supports for (difference | trend)

Standard Fril rules such as that in figure 11.17 can be collected into a more concise Fril extended rule, as shown in figure 11.18. Now Pr(h) can be calculated directly, with (n p) pairs taken from table 11.5.

((difference is falling if)(
  (trend was falling_less_steeply)
  (trend was falling_more_steeply)
  (trend was rising_less_steeply)
  (trend was rising_more_steeply)
  (trend was crest)
  (trend was trough)
)) : ((n1 p1)(n2 p2)(n3 p3)(n4 p4)(n5 p5)(n6 p6))

Fig. 11.18 Extended Fril rule
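Assuming point-valued supports (taking the n_i from the (n_i p_i) pairs) and body probabilities Pr(trend was B_k) obtained from semantic unification, the extended rule evaluates Pr(h) by a Jeffrey's-rule style sum over the six body terms. The following is a minimal sketch under those simplifying assumptions, not the full Fril interval-support calculus.

    def extended_rule_support(body_probs, supports):
        """Pr(h) = sum_k Pr(h | b_k) * Pr(b_k) with point-valued supports."""
        return sum(n * pb for n, pb in zip(supports, body_probs))

    body_probs = [0.05, 0.60, 0.05, 0.10, 0.15, 0.05]  # Pr(trend was ...)
    supports   = [0.20, 0.90, 0.10, 0.05, 0.50, 0.40]  # n_i from table 11.5
    print(extended_rule_support(body_probs, supports))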


11.4 Method for generating linguistic evidential logic rules

Figure 11.19 is an example of an evidential logic rule based on the six trend fuzzy sets from table 11.4. Given such a rule we need to calculate the term weights W_{1,1}, ..., W_{1,6}.

((difference is falling if)
 (evlog (
   (trend was falling_less_steeply) W_{1,1}
   (trend was falling_more_steeply) W_{1,2}
   (trend was rising_less_steeply)  W_{1,3}
   (trend was rising_more_steeply)  W_{1,4}
   (trend was crest)                W_{1,5}
   (trend was trough)               W_{1,6}
 ))) : ((1 1)(0 0))

Fig. 11.19 Trend evidential logic rule

Each W_{1,n} is a measure of the importance of that term in determining the support for the head of the rule.

                         difference fuzzy set label
trend fuzzy set label    falling    constant    rising
falling less steeply     W_{1,1}    W_{2,1}     W_{3,1}
rising less steeply      W_{1,2}    W_{2,2}     W_{3,2}
falling more steeply     W_{1,3}    W_{2,3}     W_{3,3}
rising more steeply      W_{1,4}    W_{2,4}     W_{3,4}
crest                    W_{1,5}    W_{2,5}     W_{3,5}
trough                   W_{1,6}    W_{2,6}     W_{3,6}

Table 11.6 Weights for difference evidential rules

We calculate each W_{i,j} in table 11.6 for difference fuzzy set f_i and prototype trend fuzzy set g_j, given next difference d_{n+1} and current trend fuzzy set g_n, from Eq. (8). Weights are then normalized to preserve the constraint ∀i: Σ_j W_{i,j} = 1. The conditional probability of one fuzzy set given another fuzzy set is calculated using Fril semantic unification.

W_{i,j} = Σ_n Pr(f_i | d_{n+1}) · Pr(g_j | g_n)    (8)
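In matrix form this accumulation and normalization is a one-liner; the sketch below assumes the per-position probabilities have already been obtained by semantic unification (the random arrays merely stand in for them).

    import numpy as np

    def rule_weights(pr_f, pr_g):
        """pr_f[n, i] = Pr(f_i | d_{n+1}); pr_g[n, j] = Pr(g_j | g_n).
        Returns W with each row normalized so sum_j W[i, j] = 1 (Eq. (8))."""
        w = pr_f.T @ pr_g              # W_ij = sum_n pr_f[n, i] * pr_g[n, j]
        return w / w.sum(axis=1, keepdims=True)

    pr_f = np.random.rand(50, 3)       # 3 difference fuzzy sets, 50 positions
    pr_g = np.random.rand(50, 6)       # 6 prototype trend fuzzy sets
    print(rule_weights(pr_f, pr_g).sum(axis=1))   # each row sums to 1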


11.4.1 Training and testing the linguistic evidential rules

Assuming that n prototype trend fuzzy sets, {g_1, ..., g_n}, have already been defined, training involves the following process:

(1) p difference fuzzy sets, {f_1, ..., f_p}, are defined on the universe of the difference series D such that each fuzzy set covers an equal number of values in D. Typically these fuzzy sets are triangular or trapezoidal.

(2) n evidential logic rule weights, {W_1, ..., W_n}, are then calculated as described in the previous section.

(3) Finally p evidential logic rules, one for each difference fuzzy set label, are constructed. Each rule has one body term for each prototype trend fuzzy set g.

Prediction involves the following process:

(1) A window of m previous difference values is taken from the difference series D.

(2) A trend fuzzy set g is then generated on the window as described previously. This trend fuzzy set describes the trend of the last m points in the series.

(3) Evaluating the evidential logic rules and defuzzifying the resulting supports gives a predicted difference value d.

(4) This predicted difference value is then added to the last point in the series S to give a predicted value for the next point in S (a sketch of the whole loop follows).
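A minimal sketch of this prediction loop, reusing the trend_fuzzy_set helper sketched in section 11.3.3's example; the match argument stands in for semantic unification and evidential rule evaluation, and defuzz_centres for the centres of the difference fuzzy sets, so both are illustrative placeholders rather than Fril primitives.

    def predict_next(series, m, rules, defuzz_centres, match):
        """One prediction step following steps (1)-(4) above."""
        diffs = [b - a for a, b in zip(series, series[1:])]  # difference series
        window = diffs[-m:]                                  # step (1)
        g = trend_fuzzy_set(window)                          # step (2)
        supports = [match(g, rule) for rule in rules]        # step (3)
        total = sum(supports) or 1.0
        d = sum(s * c for s, c in zip(supports, defuzz_centres)) / total
        return series[-1] + d                                # step (4)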

11.4.2 Linguistic fuzzy trend results

Figures 11.20 and 11.21 show D and S series prediction results for the two data sets in table 11.7. The sunspot series are taken from a normalization of sunspot data described in [10]. For both cases two difference fuzzy sets are defined and hence two evidential rules were constructed. Window size for the sine series is larger than for the sunspot series to adjust for the lower frequency of the fundamental.

The graphs show prediction on the test sets as solid lines and actual values as dotted lines. Points in the region up to z are predictions of the next point in the series given a window of m previous points. Predictions after z are predictions for the next point given a window of points up to z only. Future predictions are then made on a window of previous predictions.

dataset    train              test               window size m
sine       sin(nx + β)        sin(nx + γ)        16
sunspot    [s_0, ..., s_29]   [s_30, ..., s_59]  6

Table 11.7 Test data sets

Fig. 11.20 Predicted sine wave

Fig. 11.21 Predicted sunspot series

Clearly prediction of points in the region before z is more accurate than prediction after z where errors are accumulated and no new information is presented.

As can be seen in figures 11.20 and 11.21 the fuzzy trend model, with only two rules, catches the shape of both test cases. The model performs particularly well in figure 11.21 given the complexity of the sunspot data.

11.4.3 Some conclusions from fuzzy trend matching

We have shown that a new feature, the trend fuzzy set, can be used to represent the shape of a time series. A set of prototype trend fuzzy sets describing natural trends such as rising more steeply form the basis of terms in Fril evidential logic rules. These rules are used to predict future points in the series given a window of previous points. With only a small number of rules, difference fuzzy sets, and trend prototypes, a good approximation of the shape of the series is produced.

Using trend fuzzy sets based on natural linguistic terms we have generated a glass box model of the series. The glass box nature of this method enables a clear understanding of the prediction model and in many cases the series as well.

Complexity in the test examples is limited to the distribution of weights in the evidential logic rules. More complex rule structures can be used to model ordered datasets more accurately.

11.4.4 Overall conclusions and future perspectives

Our two simple models of perception and belief have improved the modelling of ordered datasets. We have described the two approaches separately in order to highlight their differences.

Results across diverse applications have been presented which show the improvements in modelling and prediction possible by using these two models.

In some applications it may be desirable to combine the two models to give a belief updating system which uses the high-level fuzzy trend feature as its input. This will produce a linguistically clear and concise glass box model which captures some element of human perception and belief updating.


References

[1] J. F. Baldwin, "Management of Fuzzy and Probabilistic Uncertainty for Knowledge Based Systems", in Encyclopedia of AI (Ed. S. A. Shapiro), John Wiley, 2nd ed., pp. 528-537, 1992.

[2] J. F. Baldwin, "Knowledge from Data Using Fril and Fuzzy Methods", in Fuzzy Logic (Ed. J. F. Baldwin), John Wiley and Sons, 1996.

[3] J. F. Baldwin, S. Case, T. P. Martin, "Machine Interpretation of Facial Expressions", BT Technology Journal, Vol. 16, No. 3, pp. 156-164, 1998.

[4] J. F. Baldwin, J. Lawry, T. P. Martin, "A Mass Assignment Theory of the Probability of Fuzzy Events", Fuzzy Sets and Systems, Vol. 83, No. 3, pp. 353-367, 1996.

[5] J. F. Baldwin, T. P. Martin, B. W. Pilsworth, "Fril - Fuzzy and Evidential Reasoning in Artificial Intelligence", Research Studies Press Ltd, 1995.

[6] J. F. Baldwin, T. P. Martin, J. M. Rossiter, "Recurrent fuzzy rules for belief updating", Proceedings of Iizuka 1998, Vol. 1, pp. 511-514, 1998.

[7] J. F. Baldwin, T. P. Martin, J. M. Rossiter, "Time series modelling and prediction using fuzzy trend information", Proceedings of Iizuka 1998, Vol. 1, pp. 499-502, 1998.

[8] J. F. Baldwin, B. W. Pilsworth, "Semantic Unification with Fuzzy Concepts in FRIL", International Journal of Intelligent Systems, Vol. 7, pp. 61-69, 1992.

[9] A. I. Goldman, "Epistemology and Cognition", Harvard University Press, pp. 344-358, 1986.

[10] A. S. Weigend, B. A. Huberman, D. E. Rumelhart, "Predicting Sunspots and Exchange Rates with Connectionist Networks", in Nonlinear Modelling and Forecasting, Addison-Wesley, pp. 395-432, 1992.

[11] L. A. Zadeh, "Fuzzy Sets", Information and Control, Vol. 8, pp. 338-353, 1965.

Chapter 12

Approaches to the Design of Classification Systems from Numerical Data and Linguistic Knowledge

Hisao Ishibuchi, Manabu Nii, and Tomoharu Nakashima

Osaka Prefecture University

Abstract

This paper discusses the design of classification systems when we have two kinds of information: numerical data and linguistic knowledge. Numerical data are given as a set of labeled samples (i.e., training patterns), which are usually used for designing classification systems in various pattern classification techniques. Linguistic knowledge is a set of fuzzy if-then rules, which is not usually utilized in non-fuzzy pattern classification techniques. In this paper, it is implicitly assumed that either kind of information is not enough for designing classification systems with high classification performance. Thus our task is to design a classification system by simultaneously utilizing these two kinds of information. In this paper, we illustrate two approaches to the design of classification systems from numerical data and linguistic knowledge. One is a fuzzy-rule-based approach where numerical data are used for generating fuzzy if-then rules. The other is a neural-network-based approach where linguistic knowledge as well as numerical data are used for training neural networks. First we discuss the extraction of fuzzy if-then rules directly from numerical data. We also describe the fuzzy rule extraction from neural networks that have already been trained using numerical data. Next we discuss the learning of neural networks from numerical data and linguistic knowledge. In the learning, fuzzy if-then rules and training patterns are handled in a common framework. Finally we examine the performance of these approaches to the design of classification systems from numerical data and linguistic knowledge through computer simulations.

Keywords : fuzzy rule-based systems, neural networks, pattern classification, knowledge extraction, learning from linguistic knowledge



12.1 Introduction

When we design information processing systems such as controllers and classifiers, two kinds of information are usually available. One is numerical data, and the other is linguistic knowledge from domain experts. Various pattern classification methods have been proposed for designing classification systems from numerical data [1-4]. Those methods usually can not utilize linguistic knowledge for designing classification systems. For example, only numerical data are used in the learning of neural networks, which are viewed as nonlinear classifiers in their application to pattern classification problems. On the other hand, fuzzy rule-based systems [5,6] are traditionally designed from linguistic knowledge of human experts. Recently various methods have been proposed for automatically designing fuzzy rule-based systems from numerical data without human experts [7-14].

The main aim of this paper is to illustrate how the two kinds of available information can be simultaneously utilized for designing pattern classification systems. Thus our task in this paper is to design a pattern classification system from numerical data and linguistic knowledge. We implicitly assume that either kind of information is not enough for designing classification systems with high classification performance. Numerical data are a set of labeled samples (i.e., training patterns with class labels). Let us assume that we have m training patterns x_p = (x_p1, ..., x_pn), p = 1, 2, ..., m from c classes where n is the number of attributes involved in our pattern classification problem. That is, our pattern classification problem has m training patterns with n attributes from c classes. We also assume that linguistic knowledge is given in the form of the following fuzzy if-then rules:

Rule R_j: If x_1 is A_j1 and ... and x_n is A_jn
          then Class C_j with CF = CF_j,   j = 1, 2, ..., M,    (1)

where R_j is the label of the j-th fuzzy if-then rule, x = (x_1, ..., x_n) is an n-dimensional pattern vector, A_ji's (i = 1, 2, ..., n) are linguistic values such as "small" and "large", C_j is a consequent class (i.e., one of the c classes), CF_j is a certainty grade, and M is the number of given fuzzy if-then rules. Examples of such fuzzy if-then rules are "If x_1 is small and x_2 is small then Class 1 with CF = 0.9" and "If x_1 is large then Class 2 with CF = 0.8". It should be noted that there is no antecedent condition on the first attribute in the second fuzzy if-then rule. That is, given fuzzy if-then rules may involve some "don't care" attributes.

In this paper, we propose the following two approaches to the design of classification systems from numerical data and linguistic knowledge:

(1) Fuzzy-rule-based approach: Fuzzy if-then rules are generated from numerical data. The generated fuzzy if-then rules are used in fuzzy rule-based systems together with linguistic knowledge. For generating fuzzy if-then rules, we examine two methods. One is direct rule extraction where fuzzy if-then rules are directly extracted from numerical data [15-20]. The design of fuzzy rule-based systems using the direct rule extraction is illustrated in Fig. 12.1. The other is indirect rule extraction where fuzzy if-then rules are extracted from neural networks that are trained using numerical data [21]. The design of fuzzy rule-based systems using the indirect rule extraction is illustrated in Fig. 12.2.

(2) Neural-network-based approach: Linguistic knowledge as well as numerical data are utilized in the learning of neural networks [22, 23]. Fuzzy if-then rules and training patterns are handled in a common framework in the learning. That is, fuzzy if-then rules are also used as training data. This approach is illustrated in Fig. 12.3.

We first describe the extraction of fuzzy if-then rules from numerical data and trained neural networks. Next we describe the learning of neural networks from numerical data and linguistic knowledge. Finally we examine the performance of our approaches to the design of classification systems from numerical data and linguistic knowledge through computer simulations.

Fig. 12.1 Fuzzy rule-based system where fuzzy if-then rules extracted from numerical data are used together with linguistic knowledge from human experts.

Fig. 12.2 Fuzzy rule-based system where fuzzy if-then rules extracted from trained neural networks are used together with linguistic knowledge from human experts.

Fig. 12.3 Neural-network-based classification system where numerical data and linguistic knowledge are simultaneously used in the learning.

Relations among numerical data, neural networks and linguistic knowledge (i.e., fuzzy if-then rules) are summarized in Fig. 12.4. The direction from numerical data to neural networks is the learning of neural networks. This direction is one of the main streams of neural networks research. The direction from numerical data to fuzzy if-then rules corresponds to the extraction and learning of fuzzy if-then rules from numerical data. This direction is one of the most active research areas on fuzzy systems. Various techniques from machine learning, neural networks and evolutionary computations have been employed in this research area. While these two directions are very active, only a few studies have been reported along the other two directions in Fig. 12.4: learning of neural networks from linguistic knowledge, and fuzzy rule extraction from neural networks. As we have already mentioned, our approaches utilize the bidirectional relation between neural networks and linguistic knowledge [24].

Fig. 12.4 Relations among numerical data, neural networks, and linguistic knowledge.

12.2 Fuzzy Rule-Based Approach

12.2.1 Assumptions

As we have already described, we assume in this paper that the m labeled training patterns x_p = (x_p1, ..., x_pn), p = 1, 2, ..., m and the M fuzzy if-then rules R_j, j = 1, 2, ..., M in (1) are given. In this section, we describe the fuzzy rule-based approach where the given numerical data are used for generating fuzzy if-then rules. The generated rules are used together with the given linguistic knowledge for constructing fuzzy rule-based systems. We generate fuzzy if-then rules of the same form as the given fuzzy if-then rules in (1) from the numerical data.

For the simplicity of illustration, we assume that the pattern space of our pattern classification problem is the n-dimensional unit cube [0, 1]^n. In computer simulations of this paper, attribute values are normalized into real numbers in the unit interval [0, 1]. We also assume that five linguistic values (i.e., small, medium small, medium, medium large, and large) in Fig. 12.5 are given for all the n attributes involved in our pattern classification problem. Those linguistic values are used as antecedent fuzzy sets of fuzzy if-then rules. Of course, the selection of linguistic values is problem-dependent. In each application domain, an appropriate collection of linguistic values is to be differently chosen for each attribute. We use the five linguistic values in Fig. 12.5 for illustrating our approaches and demonstrating their performance. In addition to the five linguistic values, we also use "don't care" as a special linguistic value. The membership function of "don't care" is the same as the unit interval [0, 1] because the entire domain of each attribute is [0, 1]:

μ_don't care(x) = 1 if x ∈ [0, 1], 0 otherwise.    (2)

Fig. 12.5 Membership function of five linguistic values (S: small, MS: medium small, M: medium, ML: medium large, L: large).

Since we have the six antecedent fuzzy sets (five linguistic values in Fig. 12.5 and "don't care"), the total number of combinations of antecedent fuzzy sets is 6^n. Among them, the most general rule has "don't care" for all the n attributes, and the most specific rules have no "don't care". For the case of two-dimensional pattern classification problems with the pattern space [0, 1] x [0, 1], we illustrate fuzzy partitions corresponding to all the 6 x 6 = 36 fuzzy if-then rules in Fig. 12.6. In this figure, the most general rule has two "don't care" conditions (bottom-left figure), and the most specific 25 rules have two linguistic conditions (top-right figure). The other 10 rules have a single linguistic condition on either attribute.

Fig. 12.6 Fuzzy partitions corresponding to all the 36 fuzzy if-then rules.

In many studies on fuzzy rule-based systems, "don't care" is not used as an antecedent fuzzy set. In the case of Fig. 12.6, only 25 fuzzy if-then rules are usually used in fuzzy rule-based systems. In this paper, we use "don't care" for avoiding the exponential increase in the number of fuzzy if-then rules as the dimensionality n of the pattern classification problem increases. That is, we do not use all the possible rules (i.e., 6^n rules). As we can see from Fig. 12.6, many specific fuzzy if-then rules are included in some general fuzzy if-then rules. We tackle the curse of dimensionality (i.e., the exponential increase in the number of fuzzy if-then rules) by using a small number of general rules instead of a large number of specific rules. In this sense, fuzzy rule generation from numerical data for high-dimensional problems can be viewed as choosing a small number of appropriate combinations of antecedent fuzzy sets from a huge number of possible combinations.

12.2.2 Heuristic Rule Extraction Directly from Numerical Data

In this section, we show how fuzzy if-then rules can be extracted directly from the given numerical data x_p = (x_p1, ..., x_pn), p = 1, 2, ..., m. Our task in this section is to extract fuzzy if-then rules of the form in (1). First we briefly illustrate a heuristic rule generation procedure of Ishibuchi et al. [15], which determines the consequent class C_j and the certainty grade CF_j of the fuzzy if-then rule R_j as follows when its antecedent fuzzy sets A_ji (i = 1, 2, ..., n) are specified:

Step 1: Calculate the compatibility of each training pattern x_p with the fuzzy if-then rule R_j by the following product operation:

μ_j(x_p) = μ_j1(x_p1) × ... × μ_jn(x_pn),    (3)

where μ_ji(x_pi) is the membership function of A_ji.

Step 2: For each class, calculate the sum of the compatibility grades of the training patterns with the fuzzy if-then rule R_j:

β_Class h(R_j) = Σ_{x_p ∈ Class h} μ_j(x_p),  h = 1, 2, ..., c,    (4)

where β_Class h(R_j) is the sum of the compatibility grades of the training patterns in Class h with the fuzzy if-then rule R_j.

Step 3: Find Class C_j that has the maximum value of β_Class h(R_j):

β_Class C_j(R_j) = max{β_Class 1(R_j), ..., β_Class c(R_j)}.    (5)

If two or more classes take the maximum value, the consequent class C_j of the fuzzy if-then rule R_j can not be determined uniquely. In this case, let C_j be φ. If a single class takes the maximum value in (5), that class is the consequent class of the fuzzy if-then rule R_j. If there is no training pattern compatible with the antecedent part of the fuzzy if-then rule (i.e., if there is no training pattern in the fuzzy subspace A_j1 × A_j2 × ... × A_jn), the consequent class C_j is also specified as φ.

Step 4: If the consequent class C_j is φ, let the certainty grade CF_j of the fuzzy if-then rule R_j be CF_j = 1.0. Otherwise the certainty grade CF_j is determined as follows:

CF_j = (β_Class C_j(R_j) − β̄) / Σ_{h=1}^{c} β_Class h(R_j),    (6)

where

β̄ = Σ_{h ≠ C_j} β_Class h(R_j) / (c − 1).    (7)
While the determination of the certainty grade CF_j by (6)-(7) seems a bit complicated at a glance, this procedure is easily understood and intuitively acceptable when we consider two-class classification problems (i.e., c = 2). In this case, C_j is Class 1 and CF_j is specified as follows when β_Class 1(R_j) > β_Class 2(R_j):

CF_j = (β_Class 1(R_j) − β_Class 2(R_j)) / (β_Class 1(R_j) + β_Class 2(R_j)).    (8)

Otherwise C_j is Class 2 and CF_j is specified as

CF_j = (β_Class 2(R_j) − β_Class 1(R_j)) / (β_Class 1(R_j) + β_Class 2(R_j)).    (9)

As we can see from (4) in Step 2, the determination of the consequent class Cj and the certainty grade CFj depends only on training patterns compatible with the antecedent part of the fuzzy if-then rule Rj. Such determination is illustrated in Fig. 12.7. As shown in Fig. 12.7, the certainty grade takes its maximum value (i.e., CFj = 1.0) when all the compatible patterns belong to a single class.

Fig. 12.7 Illustration of the heuristic rule generation procedure (left: C_j = Class 1, CF_j = 1.0; right: C_j = Class 1, CF_j = 0.92).

12.2.3 Fuzzy Reasoning

A fuzzy rule-based classification system consists of the fuzzy if-then rules extracted from the training patterns and the given fuzzy if-then rules (i.e., linguistic knowledge). Let us denote the set of those fuzzy if-then rules in the fuzzy rule-based classification system by S. In the classification phase, a new pattern x_p is classified by the single winner rule R_j* in S, which is defined as follows:

μ_j*(x_p) · CF_j* = max{μ_j(x_p) · CF_j | R_j ∈ S}.    (10)

That is, the winner rule has the maximum product of the compatibility grade μ_j(x_p) and the certainty grade CF_j. If more than one fuzzy if-then rule have the same maximum product but different consequent classes for the new pattern x_p, the classification of that pattern is rejected. The classification is also rejected if no fuzzy if-then rule is compatible with the new pattern x_p (i.e., if μ_j(x_p) = 0 for all rules in S). This fuzzy reasoning method based on the single winner rule is easily understood by human users. It is also suitable for the learning of fuzzy if-then rules by a reward-punishment scheme [25] and the evolution of fuzzy if-then rules in a genetics-based machine learning method [19, 20].
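A sketch of this reasoning, reusing the compatibility helper from the previous example; each rule is a triple (antecedents, consequent_class, cf), and None signals rejection.

    def classify(pattern, rules):
        """Single winner rule reasoning, Eq. (10)."""
        scored = [(compatibility(pattern, ant) * cf, cls)
                  for ant, cls, cf in rules]
        best = max(s for s, _ in scored)
        winners = {cls for s, cls in scored if s == best}
        if best == 0.0 or len(winners) > 1:
            return None    # rejected: no compatible rule, or a tie of classes
        return winners.pop()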

12.2.4 Fuzzy Rule Selection by Genetic Algorithms

The consequent part of each fuzzy if-then rule can be determined by the heuristic rule generation procedure when its antecedent part is specified. For small-size pattern classification problems with only a few attributes, we can generate 6^n fuzzy if-then rules by examining all the possible combinations of the six antecedent fuzzy sets (i.e., five linguistic values in Fig. 12.5 and "don't care"). Such an exhaustive rule generation can not be applied to high-dimensional problems due to the exponential increase in the number of fuzzy if-then rules. One method for constructing a compact fuzzy rule-based system is to generate a set of promising candidate rules and to select only a small number of significant rules from the candidate rule set [16-18]. In the application to high-dimensional problems, we generate only general fuzzy if-then rules with many "don't care" conditions as candidate rules. For example, fuzzy if-then rules with less than three linguistic conditions (i.e., with more than (n − 3) "don't care" conditions) are usually tractable in terms of their size. The number of those fuzzy if-then rules is 6^2 × n(n − 1)/2.

Let S_cand and N_cand be the candidate rule set and the number of candidate rules in S_cand, respectively. Genetic algorithms are utilized for selecting only a small number of significant rules from S_cand [16-18]. We denote a subset of the candidate rule set S_cand by S (i.e., S ⊆ S_cand). In our genetic algorithm, the inclusion and the exclusion of the j-th candidate rule are denoted by s_j = 1 and s_j = 0, respectively (j = 1, 2, ..., N_cand). That is, s_j = 1 (s_j = 0) means that the j-th candidate rule is included in the rule set S (excluded from the rule set S). In this manner, every subset S of the candidate rule set S_cand is coded by a bit string of length N_cand.

First our genetic algorithm randomly generates a prespecified number of bit strings of length N_cand to form an initial population. A fitness value of each string S (i.e., rule set S) is defined by its classification performance on the training patterns and the number of fuzzy if-then rules in S as follows:

fitness(S) = NCP(S) − w_|S| · |S|,    (11)

where NCP(S) is the number of correctly classified training patterns by the rule set S, w_|S| is a positive constant, and |S| is the number of fuzzy if-then rules in S. Since our aim is to select only a small number of significant rules, the second term in (11) is included in the fitness function as a kind of penalty with respect to the number of selected rules. From the current population, good strings with high fitness values are selected as parents for generating new strings by genetic operations. In computer simulations of this paper, we specify the selection probability of each string S in the current population Ψ by the roulette wheel selection scheme with the linear scaling as

P(S) = [fitness(S) − f_min(Ψ)] / Σ_{S'∈Ψ} [fitness(S') − f_min(Ψ)],    (12)

where f_min(Ψ) is the fitness value of the worst string in the current population Ψ (i.e., the minimum fitness value in Ψ).

Since every rule set S is denoted by a bit string, we can use standard genetic operations such as the uniform crossover and the bit mutation in our genetic algorithm. One characteristic feature of our genetic algorithm is the use of biased mutation probabilities. For efficiently decreasing the number of fuzzy if-then rules included in each rule set (i.e., the number of 1's in each string) by the bit mutation, we assign a higher probability to the mutation from s_j = 1 to s_j = 0 than to the mutation from s_j = 0 to s_j = 1. This is because the former decreases the number of fuzzy if-then rules while the latter increases it. We also use an elitist strategy where the best string in the current population is always inherited to the next population with no change.
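The selection and biased mutation just described can be sketched as follows; the mutation probabilities are illustrative values, not those used in the reported simulations.

    import random

    def roulette_select(population, fitnesses):
        """Roulette wheel selection with linear scaling, Eq. (12)."""
        f_min = min(fitnesses)
        weights = [f - f_min for f in fitnesses]
        if sum(weights) == 0.0:
            return random.choice(population)
        return random.choices(population, weights=weights, k=1)[0]

    def biased_mutation(bits, p_1_to_0=0.1, p_0_to_1=0.01):
        """Bit mutation biased toward excluding rules (1 -> 0)."""
        return [b ^ 1 if random.random() < (p_1_to_0 if b else p_0_to_1) else b
                for b in bits]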


Various versions of our rule selection method were studied in Ishibuchi et al. [18], especially from a viewpoint of multi-objective optimization. Moreover, a reward-punishment learning scheme of fuzzy if-then rules [25] can be applied to newly generated rule sets for improving their classification performance before their fitness values are calculated.

12.2.5 Genetics-Based Machine Learning for Fuzzy Rule Generation

The rule selection method in the previous section searches for a compact rule set from the given candidate rule set. Since the string length is the same as the number of candidate rules, the rule selection method can not handle a large number of candidate rules. This means that we need a prescreening procedure to generate a tractable number of candidate rules when the rule selection method is applied to high-dimensional problems. A more straightforward fuzzy rule generation method was proposed for high-dimensional pattern classification problems by Ishibuchi et al. [19, 20] where genetic operations were used for generating combinations of antecedent fuzzy sets (i.e., for specifying the antecedent part of each fuzzy if-then rule).

In our fuzzy genetics-based machine learning algorithm called a fuzzy classifier system, every fuzzy if-then rule is denoted by a string of length n, which consists of its n antecedent fuzzy sets. Let us denote the six antecedent fuzzy sets as follows:

don't care → 0, small → 1, medium small → 2, medium → 3, medium large → 4, large → 5.

Using this notation, every fuzzy if-then rule is denoted by a string of length n with the alphabet {0, 1, 2, 3, 4, 5}. For example, "1030" shows "If x_1 is small and x_2 is don't care and x_3 is medium and x_4 is don't care". The corresponding consequent part is determined by the heuristic rule generation procedure. So only the antecedent part is coded as a string. Such a string is handled as an individual in our fuzzy classifier system.

First, our fuzzy classifier system randomly generates a prespecified number of strings of length n over the alphabet {0, 1, 2, 3, 4, 5} to form an initial population (i.e., an initial set of fuzzy if-then rules). All the training patterns are classified by the current population for evaluating the fitness of each string (i.e., each fuzzy if-then rule). The fitness value of a fuzzy if-then rule $R_j$ in the current population is defined as follows, based on the classification results on the training patterns:

$$fitness(R_j) = NCP(R_j) - w_{error} \cdot NMP(R_j), \qquad (13)$$

where $NCP(R_j)$ is the number of training patterns correctly classified by $R_j$, $w_{error}$ is a positive weight, and $NMP(R_j)$ is the number of training patterns misclassified by $R_j$. It should be noted that a single winner rule is responsible for the correct classification (or misclassification) of each training pattern. The first term and the second term in (13) are viewed as the reward for correct classification and the penalty for misclassification, respectively.
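A minimal sketch of this fitness evaluation under the single-winner scheme might look as follows; the rule interface and the weight value are assumptions, not part of the chapter (the winner is chosen by the product of compatibility grade and certainty grade, which is the usual convention in this line of work):

```python
def rule_fitnesses(rules, patterns, labels, w_error=5.0):
    """Evaluate fitness(R_j) = NCP(R_j) - w_error * NMP(R_j), cf. (13).

    Assumed interfaces: rule.compatibility(x) is the compatibility grade of
    pattern x with the rule's antecedent, rule.cf its certainty grade,
    rule.cls its consequent class; w_error = 5.0 is an assumed value.
    """
    ncp = [0] * len(rules)
    nmp = [0] * len(rules)
    for x, y in zip(patterns, labels):
        # single winner: the rule maximizing compatibility * certainty grade
        j = max(range(len(rules)),
                key=lambda j: rules[j].compatibility(x) * rules[j].cf)
        if rules[j].compatibility(x) * rules[j].cf > 0.0:
            if rules[j].cls == y:
                ncp[j] += 1   # the winner classified the pattern correctly
            else:
                nmp[j] += 1   # the winner misclassified the pattern
    return [ncp[j] - w_error * nmp[j] for j in range(len(rules))]
```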

A pair of good strings is selected from the current population for generating new strings by genetic operations. The selection probability of each string is specified in the same manner as in the case of the rule selection. The uniform crossover is used for generating two strings from the selected pair of parent strings. A mutation operation is applied to each value of the generated strings; it randomly replaces a value in the strings with another value. By iterating these genetic operations, a prespecified number of new strings are generated. The same number of the worst strings in the current population are replaced with the newly generated strings. The number of strings in each population is kept constant during this population update.

Various versions of our fuzzy classifier system were studied in Ishibuchi & Nakashima [20] for improving its search ability to efficiently find good fuzzy if-then rules. The learning of fuzzy if-then rules [25] is applicable to each population for improving its classification performance before the fitness value of each fuzzy if-then rule is calculated.

12.2.6 Fuzzy Rule Extraction from Numerical Data via Neural Networks

In the previous sections, we described several methods for generating fuzzy if-then rules directly from the given training patterns. In this section, we illustrate a method for extracting fuzzy if-then rules from trained neural networks [21]. The advantages of our method over other rule extraction methods [26-35] are as follows: (1) extracted fuzzy if-then rules are always linguistically interpretable, (2) a certainty grade is assigned to each rule, and (3) our method is applicable to arbitrary trained neural networks. That is, our method is a general algorithm to extract fuzzy if-then rules of the form in (1) by handling a trained neural network as a black box model.

For simplicity of illustration, we assume that a standard three-layer feedforward neural network has already been trained using the given training patterns, while our method is applicable to more general neural networks (e.g., four-layer neural networks). The number of input units is the same as the dimensionality of the pattern classification problem (i.e., n), and the number of output units is the same as the number of classes (i.e., c). The number of hidden units, which can be arbitrarily specified, is denoted by $n_H$. As in Rumelhart et al. [2], the input-output relation of each unit in the trained neural network can be written for an n-dimensional input vector $x_p = (x_{p1}, \dots, x_{pn})$ as follows:

Input units: $o_{pi} = x_{pi}$, $i = 1, 2, \dots, n$  (14)

Hidden units: $o_{pj} = f(net_{pj})$, $j = 1, 2, \dots, n_H$  (15)

$net_{pj} = \sum_{i=1}^{n} o_{pi} \cdot w_{ji} + \theta_j$, $j = 1, 2, \dots, n_H$  (16)

Output units: $o_{pk} = f(net_{pk})$, $k = 1, 2, \dots, c$  (17)

$net_{pk} = \sum_{j=1}^{n_H} o_{pj} \cdot w_{kj} + \theta_k$, $k = 1, 2, \dots, c$  (18)

As the activation function $f(\cdot)$ in the hidden and output layers, we use the sigmoidal function $f(x) = 1/(1 + \exp(-x))$ as in Rumelhart et al. [2]. Our task in this section is to generate fuzzy if-then rules from the trained neural network in (14)-(18). Our rule extraction method [21] determines the consequent part of a fuzzy if-then rule using the trained neural network when its antecedent part is specified. Thus our rule extraction method can be viewed as a counterpart of the heuristic rule generation procedure. Now let us illustrate how the consequent class $C_p$ and the certainty grade $CF_p$ of the fuzzy if-then rule $R_p$ can be determined by the trained neural network when its antecedent fuzzy sets $A_{p1}, \dots, A_{pn}$ are given.

First, the antecedent fuzzy sets are presented to the trained neural network as a fuzzy input vector. The input-output relation of each unit in the trained neural network is extended to the case of the fuzzy input vector $A_p = (A_{p1}, \dots, A_{pn})$ as follows:

Input units: $O_{pi} = A_{pi}$, $i = 1, 2, \dots, n$  (19)

Hidden units: $O_{pj} = f(Net_{pj})$, $j = 1, 2, \dots, n_H$  (20)

$Net_{pj} = \sum_{i=1}^{n} O_{pi} \cdot w_{ji} + \theta_j$, $j = 1, 2, \dots, n_H$  (21)

Output units: $O_{pk} = f(Net_{pk})$, $k = 1, 2, \dots, c$  (22)

$Net_{pk} = \sum_{j=1}^{n_H} O_{pj} \cdot w_{kj} + \theta_k$, $k = 1, 2, \dots, c$  (23)

In the above formulations, uppercase letters (e.g., $A_{pi}$, $O_{pi}$, $Net_{pj}$) are fuzzy numbers, and lowercase letters (e.g., $w_{ji}$, $w_{kj}$, $\theta_j$) are real numbers. The input-output relation of each unit is defined by fuzzy arithmetic on fuzzy numbers [36]. Its numerical calculation is performed by interval arithmetic [37,38] on level sets of the fuzzy input vector, as in many studies on fuzzified neural networks [22,23,39,40]. In Fig. 12.8 and Fig. 12.9, we illustrate the sum of fuzzy numbers and the nonlinear mapping of fuzzy numbers by the sigmoidal activation function, respectively.

Fig. 12.8 Sum of fuzzy numbers A and B.
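As a minimal sketch of how these interval calculations could be coded (the function names are illustrative, not from the chapter), the propagation of a level-set interval through one unit reduces to interval arithmetic with real weights, followed by the monotone sigmoid:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def interval_affine(inputs, weights, bias):
    """Propagate level-set intervals through net = sum(O_i * w_i) + theta.

    inputs: list of (lo, hi) intervals; weights, bias: real numbers.
    For a real weight w, w * [a, b] equals [w*a, w*b] if w >= 0,
    and [w*b, w*a] otherwise.
    """
    lo = hi = bias
    for (a, b), w in zip(inputs, weights):
        lo += w * a if w >= 0 else w * b
        hi += w * b if w >= 0 else w * a
    return (lo, hi)

def interval_sigmoid(iv):
    """f is monotonically increasing, so f([a, b]) = [f(a), f(b)]."""
    a, b = iv
    return (sigmoid(a), sigmoid(b))
```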

When a non-fuzzy input vector $x_p$ is presented to the trained neural network, we have a crisp output vector $o_p = (o_{p1}, \dots, o_{pc})$ calculated by (14)-(18). In this case, the non-fuzzy input vector $x_p$ is classified by the output unit with the maximum output value among the c output units.


Fig. 12.9 Nonlinear mapping of fuzzy numbers by the sigmoidal activation function.

That is, we use the following classification rule:

If $o_{pk} < o_{ph}$ for all k's ($k = 1, 2, \dots, c$ and $k \neq h$), then $x_p$ is classified as Class h,  (24)

where $o_{pk}$ is the output value from the k-th output unit ($k = 1, 2, \dots, c$). By simply extending the above classification rule to the case of the fuzzy input vector $A_p = (A_{p1}, \dots, A_{pn})$, we have the following rule:

If $O_{pk} < O_{ph}$ for all k's ($k = 1, 2, \dots, c$ and $k \neq h$), then $A_p$ is classified as Class h,  (25)

where $O_{pk}$ is the fuzzy output from the k-th output unit ($k = 1, 2, \dots, c$). If we try to classify the fuzzy input vector $A_p$ (i.e., to determine the consequent class $C_p$) by this classification rule, we have to define the inequality relation between the fuzzy outputs (i.e., $O_{pk} < O_{ph}$). Since it is very difficult to clearly decide whether the inequality relation holds or not, we use the following classification rule based on the level set of the fuzzy output vector:

If $[O_{pk}]_\beta < [O_{ph}]_\beta$ for all k's ($k = 1, 2, \dots, c$ and $k \neq h$), then $[A_p]_\beta$ is classified as Class h,  (26)

where $[\cdot]_\beta$ denotes the level set of a fuzzy number at the level β, which is defined as

$$[X]_\beta = \{ x \mid \mu_X(x) \ge \beta,\ x \in \Re \}. \qquad (27)$$


In general, level sets of fuzzy numbers are intervals. We use the following definition for the inequality relation between the level sets (i.e., intervals) in (26):

$$[a^L, a^U] < [b^L, b^U] \iff a^U < b^L, \qquad (28)$$

where the superscripts "L" and "U" mean the lower and upper limits of intervals, respectively. That is, an interval A is written as

$$A = [a^L, a^U] = \{ x \mid a^L \le x \le a^U,\ x \in \Re \}. \qquad (29)$$
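A direct transcription of (26) and (28) into code might look as follows, a sketch under the assumption that each fuzzy output is represented by its level-set interval as a (lo, hi) pair:

```python
def interval_less(a, b):
    """[aL, aU] < [bL, bU] iff aU < bL, cf. equation (28)."""
    a_lo, a_hi = a
    b_lo, b_hi = b
    return a_hi < b_lo

def classify_level_set(outputs):
    """Return the winning class index h for a list of output intervals,
    or None when rule (26) does not single out one class."""
    for h, oh in enumerate(outputs):
        if all(interval_less(ok, oh)
               for k, ok in enumerate(outputs) if k != h):
            return h
    return None
```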

When the classification rule in (26) holds for a prespecified value of β, we classify the β-level set $[A_p]_\beta$ of the fuzzy input vector $A_p$ as Class h. In this case, we determine the consequent class $C_p$ of the fuzzy if-then rule $R_p$ as Class h. In Fig. 12.10, we show some examples of fuzzy output vectors. The consequent class is determined as Class 2 in the case of Fig. 12.10 (a), whereas it cannot be determined in Fig. 12.10 (b). In the latter case, we do not extract the corresponding fuzzy if-then rule.

From Fig. 12.10, we can see that the fuzzy output vector is classifiable when the overlap between the largest fuzzy output and the other fuzzy outputs is not large. In other words, when the overlap is small, we are sure that the fuzzy input vector belongs to the Class h determined by (26). Furthermore, we may think that the smaller the overlap is, the larger the certainty of the classification is. Based on this discussion, we define the certainty grade $CF_p$ by the overlap between the largest fuzzy output (i.e., $O_{ph}$) and the other fuzzy outputs, as in Fig. 12.10 (a). Since the fuzzy output vector $O_p = (O_{p1}, \dots, O_{pc})$ is numerically calculated by interval arithmetic on level sets of the fuzzy input vector $A_p = (A_{p1}, \dots, A_{pn})$, we use the following procedure to determine the consequent class $C_p$ and the certainty grade $CF_p$:

Step 1: Specify the value of β (e.g., β = 0.9).

Step 2: Examine the classifiability of $[A_p]_\beta$ by the classification rule in (26). If $[A_p]_\beta$ is not classifiable, terminate this procedure. In this case, we do not extract the fuzzy if-then rule $R_p$ with the antecedent fuzzy sets $A_{p1}, \dots, A_{pn}$. Otherwise specify the consequent class $C_p$ as the Class h that satisfies (26), and go to Step 3.

Step 3: Slightly decrease the value of β as β := β − ε, where ε is a very small positive constant (e.g., ε = 0.01). If β ≤ 0, specify the certainty grade $CF_p$ as $CF_p = 1.0$ and terminate this procedure. Otherwise go to Step 4.

Step 4: Examine the classifiability of $[A_p]_\beta$ by the classification rule in (26). If $[A_p]_\beta$ is classifiable, return to Step 3 for examining the classifiability of the fuzzy input vector $A_p$ further. Otherwise, specify the certainty grade $CF_p$ as $CF_p = 1 - (\beta + \varepsilon)$, and terminate this procedure.

Fig. 12.10 Examples of fuzzy output vectors: (a) classifiable case; (b) unclassifiable case.

By this procedure, the certainty grade is determined by the overlap between the largest fuzzy output $O_{ph}$ and the other fuzzy outputs, as shown in Fig. 12.10 (a).
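The following sketch shows how Steps 1-4 could be implemented on top of the classify_level_set helper above; fuzzy_forward is an assumed function that returns the output intervals $[O_{pk}]_\beta$ of the network for a given β-level set of the fuzzy input:

```python
def extract_rule(fuzzy_forward, antecedents, beta0=0.9, eps=0.01):
    """Sketch of the consequent/CF determination procedure (Steps 1-4).

    Returns (consequent_class, CF) or None when no rule is extracted.
    fuzzy_forward(antecedents, beta) is an assumed helper, not from the
    chapter, returning a list of (lo, hi) output intervals.
    """
    beta = beta0                                           # Step 1
    c_p = classify_level_set(fuzzy_forward(antecedents, beta))
    if c_p is None:
        return None                                        # Step 2: reject
    while True:
        beta = round(beta - eps, 10)                       # Step 3
        if beta <= 0:
            return (c_p, 1.0)
        h = classify_level_set(fuzzy_forward(antecedents, beta))
        if h != c_p:                                       # Step 4: no longer
            return (c_p, 1.0 - (beta + eps))               # classifiable
```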

Let us illustrate our approach to the design of fuzzy rule-based classification systems from numerical data and linguistic knowledge through computer simulations on a simple example. We assume that the training patterns in Fig. 12.11 and the following fuzzy if-then rules are given for designing a fuzzy rule-based classification system:

If $x_1$ is small then Class 1 with CF = 1.0,  (30)

If $x_1$ is large then Class 2 with CF = 1.0,  (31)

If $x_1$ is medium or medium large or large and $x_2$ is small then Class 1 with CF = 1.0,  (32)

where we use a trapezoidal membership function for the combined linguistic value "medium or medium large or large". The antecedent fuzzy set "don't care" on the second attribute $x_2$ is omitted in the first and second fuzzy if-then rules. Our task is to design a fuzzy rule-based classification system from the training patterns in Fig. 12.11 and the fuzzy if-then rules in (30)-(32).


Fig. 12.11 Numerical data and the classification boundary by the trained neural network.

We trained a three-layer feedforward neural network using the standard back-propagation algorithm. The classification boundary of the trained neural network is shown in Fig. 12.11. Using the five linguistic values in Fig. 12.5 and "don't care" as antecedent fuzzy sets (i.e., as fuzzy inputs to the trained neural network), we examined 36 fuzzy input vectors. By our rule extraction method, we extracted 24 fuzzy if-then rules from the trained neural network, for example:

If $x_1$ is small then Class 1 with CF = 0.73,

If $x_1$ is small and $x_2$ is small then Class 1 with CF = 0.94,

...

If $x_1$ is large and $x_2$ is large then Class 2 with CF = 1.0.


Fig. 12.12 Classification boundary by the extracted 15 rules.


Fig. 12.13 Classification boundary by the extracted 15 rules and the given 3 rules.


In the extracted fuzzy if-then rules, some rules are included in other rules. For example, among the above 24 rules, the second fuzzy if-then rule is included in the first rule. To avoid using too many fuzzy if-then rules in classification systems, we do not use fuzzy if-then rules that are included in other rules; a sketch of this inclusion check is given below. In the above example, nine fuzzy if-then rules among the extracted 24 rules are included in other rules. Thus we use the other 15 fuzzy if-then rules in a fuzzy rule-based classification system. In Fig. 12.12, we show the classification boundary obtained by those 15 fuzzy if-then rules. We can see that the classification boundary of the extracted 15 fuzzy if-then rules in Fig. 12.12 is similar to that of the trained neural network in Fig. 12.11. The extracted 15 fuzzy if-then rules are used together with the given 3 fuzzy if-then rules. In Fig. 12.13, we show the classification boundary of the fuzzy rule-based classification system with the 18 fuzzy if-then rules.
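As an illustration, a minimal check of this inclusion relation can be coded as follows, assuming a rule is included in another when both have the same consequent class and the other (more general) rule uses the same antecedent fuzzy set on every attribute it actually constrains:

```python
DONT_CARE = "0"

def included_in(rule_a, rule_b):
    """True if rule_a is included in rule_b under the assumption above.

    Rules are (antecedent_string, consequent_class) pairs, with antecedents
    coded over {0,...,5} as in Section 12.2.5.
    """
    ant_a, cls_a = rule_a
    ant_b, cls_b = rule_b
    if cls_a != cls_b:
        return False
    # rule_b may not constrain an attribute that rule_a leaves free
    return all(b == DONT_CARE or a == b for a, b in zip(ant_a, ant_b))

# Example: "x1 small and x2 small -> Class 1" is included in
# "x1 small -> Class 1" (codes: small = 1, don't care = 0).
assert included_in(("11", 1), ("10", 1))
```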

12.3 Neural Network-Based Approach

We have already explained how fuzzy rule-based classification systems can be designed from numerical data and linguistic knowledge. In this section, we describe the design of neural-network-based classification systems. As in the previous section, we assume that the m training patterns $x_p = (x_{p1}, \dots, x_{pn})$ and the M fuzzy if-then rules in (1) are given. A learning algorithm was proposed by Ishibuchi et al. [22] for training neural networks with fuzzy if-then rules without certainty grades. In their approach, fuzzy if-then rules of the following type are used in the learning of neural networks:

If $x_1$ is $A_{p1}$ and ... and $x_n$ is $A_{pn}$ then Class $C_p$.  (33)

The antecedent fuzzy sets $A_{p1}, \dots, A_{pn}$ are used as fuzzy inputs to neural networks, as in our fuzzy rule extraction method. When the fuzzy input vector $A_p = (A_{p1}, \dots, A_{pn})$ is presented to a three-layer feedforward neural network with n input units, $n_H$ hidden units and c output units, the input-output relation of each unit is defined by (19)-(23). The corresponding fuzzy output vector $O_p = (O_{p1}, \dots, O_{pc})$ is numerically calculated by interval arithmetic on level sets of the fuzzy input vector $A_p$. A target vector $t_p = (t_{p1}, \dots, t_{pc})$ is defined for the fuzzy input vector $A_p$ by the consequent class (i.e., Class $C_p$) as

$$t_{pk} = \begin{cases} 1, & \text{if Class } C_p = \text{Class } k, \\ 0, & \text{otherwise,} \end{cases} \qquad k = 1, 2, \dots, c. \qquad (34)$$

A cost function to be minimized in the learning of the neural network is defined by the difference between the fuzzy output vector $O_p = (O_{p1}, \dots, O_{pc})$ and the target vector $t_p = (t_{p1}, \dots, t_{pc})$ as

$$e_p = \sum_{k=1}^{c} \frac{\alpha}{2} \left\{ \left( t_{pk} - [O_{pk}]_\alpha^L \right)^2 + \left( t_{pk} - [O_{pk}]_\alpha^U \right)^2 \right\}, \qquad (35)$$

where the superscripts "L" and "U" denote the lower limit and the upper limit of the α-level set $[O_{pk}]_\alpha$ of the fuzzy output $O_{pk}$, respectively:

$$[O_{pk}]_\alpha = \left[ [O_{pk}]_\alpha^L,\ [O_{pk}]_\alpha^U \right]. \qquad (36)$$

The α-level set $[O_{pk}]_\alpha$ is calculated from the α-level set of the fuzzy input vector $A_p$ by interval arithmetic. In the same manner as in the back-propagation algorithm [2], we can derive the learning algorithm for the connection weights and biases of the neural network from the cost function in (35). While the neural network can be trained with this cost function, the certainty grade $CF_p$ of each fuzzy if-then rule of the form in (1) is not taken into account in the learning. In this paper, we modify the membership function $\mu(\cdot)$ of each antecedent fuzzy set as follows for utilizing the certainty grade $CF_p$ of each fuzzy if-then rule in the learning of the neural network:

$$\mu_{A'_{pi}}(x_i) = \mu_{A_{pi}}(x_i) \cdot CF_p, \qquad i = 1, 2, \dots, n;\ p = 1, 2, \dots, M, \qquad (37)$$

where $A'_{pi}$ is the modified antecedent fuzzy set. Since the membership value is discounted by $CF_p$ in (37), the α-level set of the modified antecedent fuzzy set is empty when $\alpha > CF_p$. For preventing such an empty level set from being used in the learning, the value of α in the cost function (36) is restricted by the inequality $\alpha \le CF_p$. In our learning method, each training pattern $x_p = (x_{p1}, \dots, x_{pn})$ is handled as a fuzzy input vector in order to use numerical data and linguistic knowledge in a common learning algorithm. The input value $x_{pi}$ is handled as a fuzzy number with the following membership function:

$$\mu_{x_{pi}}(x) = \begin{cases} 1, & \text{if } x = x_{pi}, \\ 0, & \text{if } x \neq x_{pi}. \end{cases} \qquad (38)$$

The certainty grade $CF_p$ of each training pattern is implicitly assumed to be $CF_p = 1.0$. If different certainty grades are assigned to some training patterns, they can be used as in (37) for discounting the membership function in (38). In this manner, the given M fuzzy if-then rules and the given m training patterns are commonly handled as the fuzzy training data $(A_{p1}, \dots, A_{pn}; C_p; CF_p)$, $p = 1, 2, \dots, m + M$, where $A_{pi} = x_{pi}$, $i = 1, 2, \dots, n$, in the case of training patterns. The learning algorithm is summarized as follows:

Step 1: As in the learning of standard feedforward neural networks, specify the initial values of the connection weights and biases, the learning rate, the momentum constant, and the stopping condition. In addition to these parameter specifications, specify a set of values of α used in the cost function in (36). Let us denote those values as $\alpha_1, \dots, \alpha_K$.

Step 2: Let k be the index of the value of α. Specify k as k = 1.

Step 3: Specify the pattern index p as p = 1.

Step 4: Let α := $\alpha_k$. If $\alpha > CF_p$ then go to Step 5. Otherwise adjust the connection weights and biases using the α-level set $[A_p]_\alpha$ of the p-th fuzzy input pattern $A_p$.

Step 5: If the stopping condition is satisfied, terminate the learning algorithm. Otherwise go to Step 6.

Step 6: Update the pattern index p as p := p + 1. If $p \le m + M$ then return to Step 4. Otherwise go to Step 7.

Step 7: Update the index k as k := k + 1. If $k \le K$ then return to Step 3. Otherwise return to Step 2.
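A structural sketch of this schedule is given below; the interval-arithmetic weight update derived from the cost function (35) is abstracted behind an assumed network.adjust helper, so this is a loop skeleton under stated assumptions rather than the full algorithm:

```python
def train(network, fuzzy_data, alphas, max_epochs=100):
    """Structural sketch of Steps 1-7 above.

    Assumed interfaces, not from the chapter: fuzzy_data is a list of
    (A_p, C_p, CF_p) fuzzy training patterns where A_p.level_set(alpha)
    returns the alpha-level set; network.adjust(...) hides the
    interval-arithmetic weight update; network.converged() implements
    the stopping condition.
    """
    for _ in range(max_epochs):                 # Step 7 returns to Step 2
        for alpha in alphas:                    # Steps 2 and 7: k = 1,...,K
            for A_p, C_p, CF_p in fuzzy_data:   # Steps 3 and 6: p = 1,...
                if alpha > CF_p:                # Step 4: empty level set,
                    continue                    # so skip the adjustment
                network.adjust(A_p.level_set(alpha), target=C_p)
                if network.converged():         # Step 5: stopping condition
                    return network
    return network
```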

Let us illustrate our learning algorithm with the numerical data in Fig. 12.11, which have already been used for illustrating our fuzzy rule extraction method. As in the case of the fuzzy rule extraction, we assume that the 3 fuzzy if-then rules in (30)-(32) are given in addition to the 20 training patterns in Fig. 12.11. These two kinds of available information are handled as 23 fuzzy training patterns in our learning algorithm. We trained a three-layer feedforward neural network with two input units, five hidden units and two output units by the above learning algorithm. We show the classification boundary of the trained neural network in Fig. 12.14.

When there are some conflicts between numerical data and linguistic knowledge, the learning algorithm tries to find a kind of compromise. In such a learning process, the certainty grade attached to every single piece of information plays an important role (i.e., has a large effect on the final learning result). For example, if a training pattern with a very small certainty grade is included in a fuzzy if-then rule with a different consequent class, that training pattern is almost ignored in the learning of neural networks. On the other hand, when two fuzzy if-then rules with different consequent classes overlap with each other, the classification boundary is likely to follow the fuzzy if-then rule with the larger certainty grade.


Fig. 12.14 Classification boundary by the trained neural network from numerical data and linguistic knowledge.

12.4 Performance Evaluation

In this section, we examine the performance of the two approaches (i.e., the fuzzy rule-based approach and the neural network-based approach) to the design of classification systems from numerical data and linguistic knowledge through computer simulations on the well-known iris data. The iris data involve 150 samples with four attributes from three classes [41]. Since we had no linguistic knowledge on the iris data, we generated linguistic knowledge in the following manner to artificially create the situation where both numerical data and linguistic knowledge are given. First we randomly divided the iris data into three subsets (say, data sets A, B, and C) with 50 samples each. Data set A was used as numerical data. Data set B was used as test data for evaluating classification systems. Data set C was used for generating linguistic knowledge. We employed the GA-based rule selection method [16-18] for generating a small number of fuzzy if-then rules from the data set C. In our computer simulation, the five linguistic values in Fig. 12.5 and "don't care" were used as antecedent fuzzy sets. Since the iris data have four attributes, we examined $6^4 = 1296$ combinations of the antecedent fuzzy sets to generate candidate fuzzy if-then rules of the following type:

If $x_1$ is $A_{j1}$ and $x_2$ is $A_{j2}$ and $x_3$ is $A_{j3}$ and $x_4$ is $A_{j4}$ then Class $C_j$ with $CF_j$.  (39)

By the heuristic rule generation procedure, 491.8 fuzzy if-then rules were generated from the data set C on average (over 50 trials with different specifications of the data set C). The other fuzzy if-then rules could not be generated because there were no training patterns from the data set C in the corresponding fuzzy subspaces. Then the rule selection method was applied to the generated candidate rules to select a small number of relevant fuzzy if-then rules using the data set C. The average number of selected fuzzy if-then rules was 3.8 over the 50 independent trials. In this manner, a small number of fuzzy if-then rules were found from the data set C as linguistic knowledge. Our task is to design a classification system from the numerical data (i.e., the data set A) and the linguistic knowledge generated from the data set C. The designed classification system is evaluated on the data set B.

In our computer simulations, we examined the following seven classification systems.

(1) Neural networks that were trained only from the numerical data (i.e., data set A).

(2) Neural networks that were trained from both the numerical data and the linguistic knowledge.


(3) Fuzzy rule-based systems where fuzzy if-then rules were generated only from the numerical data. We used the fuzzy classifier system [19, 20] to generate fuzzy if-then rules from the data set A. The number of fuzzy if-then rules was specified as 20 in the fuzzy classifier system.

(4) Fuzzy rule-based systems where fuzzy if-then rules were extracted from trained neural networks in (1). The average number of extracted fuzzy if-then rules was 48.7.

(5) Fuzzy rule-based systems that were directly constructed by the linguistic knowledge.

(6) Fuzzy rule-based systems where fuzzy if-then rules were a mixture of (3) and (5).

(7) Fuzzy rule-based systems where fuzzy if-then rules were a mixture of (4) and (5).

Our computer simulation was iterated 50 times by differently partitioning the iris data set into the three subsets A, B and C. Average classification rates on the test data (i.e., data set B) by the above seven methods are summarized in Table 12.1. For the neural-network-based classification system, we show the best results over various specifications of the number of learning iterations. From this table, we can see that the highest average classification rate was obtained by neural networks trained by the two kinds of available information.

Table 12.1 Classification rates on the test patterns.

| Classification system | Available information | Rate |
|---|---|---|
| Neural network | Numerical data | 97.7% |
| Neural network | Numerical data and linguistic knowledge | 97.9% |
| Fuzzy rule base | Numerical data | 94.6% |
| Fuzzy rule base | Trained neural network | 95.6% |
| Fuzzy rule base | Linguistic knowledge | 93.3% |
| Fuzzy rule base | Numerical data and trained NN | 95.2% |
| Fuzzy rule base | Numerical data and linguistic knowledge | 95.4% |


12.5 Conclusion

In this paper, we illustrated how numerical data (i.e., training patterns) and linguistic knowledge (i.e., fuzzy if-then rules) can be simultaneously utilized in the design of pattern classification systems. We proposed two approaches. One is a fuzzy rule-based approach where fuzzy if-then rules generated from numerical data are used together with the given linguistic knowledge to construct a fuzzy rule-based classification system. We described several techniques for generating fuzzy if-then rules from numerical data. The other approach is a neural-network-based approach where the given linguistic knowledge is used in the learning of neural networks as training data. In this approach, linguistic knowledge (i.e., fuzzy if-then rules) and numerical data (i.e., training patterns) are handled in a common learning algorithm as fuzzy training patterns. Neural networks were extended to the case of fuzzy input vectors. Through computer simulations, we demonstrated that classification systems designed by utilizing the two kinds of information had high classification rates.


References

[1] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, John Wiley & Sons, New York, 1973.

[2] D. E. Rumelhart, J. L. McClelland, and the PDP Research Group, Parallel Distributed Processing, MIT Press, Cambridge, 1986.

[3] S. M. Weiss and C. A. Kulikowski, Computer Systems That Learn, Morgan Kaufmann Publishers, San Mateo, 1991.

[4] J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, San Mateo, California, 1993.

[5] M. Sugeno, "An introductory survey of fuzzy control," Information Sciences, vol. 36, no. 1/2, pp. 59-83, 1985.

[6] C. C. Lee, "Fuzzy logic in control systems: fuzzy logic controller, Part I and Part II," IEEE Trans. on Systems, Man, and Cybernetics, vol. 20, no. 2, pp. 404-435, 1990.

[7] T. Takagi and M. Sugeno, "Fuzzy identification of systems and its applications to modeling and control," IEEE Trans. on Systems, Man, and Cybernetics, vol. 15, no. 1, pp. 116-132, 1985.

[8] L. X. Wang and J. M. Mendel, "Generating fuzzy rules by learning from examples," IEEE Trans. on Systems, Man, and Cybernetics, vol. 22, no. 6, pp. 1414-1427, 1992.

[9] M. Sugeno and T. Yasukawa, "A fuzzy-logic-based approach to qualitative modeling," IEEE Trans. on Fuzzy Systems, vol. 1, no. 1, pp. 7-31, 1993.

[10] S. Mitra, "Fuzzy MLP based expert system for medical diagnosis," Fuzzy Sets and Systems, vol. 65, no. 2/3, pp. 285-296, 1994.

[11] S. Abe and M.-S. Lan, "A method for fuzzy rules extraction directly from numerical data and its application to pattern classification," IEEE Trans. on Fuzzy Systems, vol. 3, no. 1, pp. 18-28, 1995.

[12] Y. Yuan, and H. Zhuang, "A genetic algorithm for generating fuzzy classification rules," Fuzzy Sets and Systems, vol. 84, no. 1, pp. 1-19, 1996.


[13] O. Cordón and F. Herrera, "A three-stage evolutionary process for learning descriptive and approximate fuzzy-logic-controller knowledge bases from examples," International Journal of Approximate Reasoning, vol. 17, no. 4, pp. 369-407, 1997.

[14] D. Nauck, and R. Kruse, "A neuro-fuzzy method to learn fuzzy classification rules from data," Fuzzy Sets and Systems, vol. 89, pp. 277-288, 1997.

[15] H. Ishibuchi, K. Nozaki, and H. Tanaka, "Distributed representation of fuzzy rules and its application to pattern classification," Fuzzy Sets and Systems, vol. 52, no. 1, pp. 21-32, 1992.

[16] H. Ishibuchi, K. Nozaki, N. Yamamoto, and H. Tanaka, "Construction of fuzzy classification systems with rectangular fuzzy rules using genetic algorithms," Fuzzy Sets and Systems, vol. 65, pp. 237-253, 1994.

[17] H. Ishibuchi, K. Nozaki, N. Yamamoto, and H. Tanaka, "Selecting fuzzy if-then rules for classification problems using genetic algorithms," IEEE Trans. on Fuzzy Systems, vol. 3, no. 2, pp. 260-270, 1995.

[18] H. Ishibuchi, T. Murata, and I. B. Turksen, "Single-objective and multi-objective genetic algorithms for selecting linguistic rules for pattern classification problems," Fuzzy Sets and Systems, vol. 89, pp. 135-150, 1997.

[19] H. Ishibuchi, T. Nakashima, and T. Murata, "Performance evaluation of fuzzy classifier systems for multi-dimensional pattern classification problems," IEEE Trans. on Systems, Man, and Cybernetics (in press).

[20] H. Ishibuchi and T. Nakashima, "Improving the performance of fuzzy classifier systems for pattern classification problems with continuous attributes," IEEE Transactions on Industrial Electronics (in press).

[21] H. Ishibuchi and M. Nii, "Generating fuzzy if-then rules from trained neural networks," Proc. of 1996 IEEE International Conference on Neural Networks, pp. 1133-1138, 1996.

[22] H. Ishibuchi, R. Fujioka, and H. Tanaka, "Neural networks that learn from fuzzy if-then rules," IEEE Trans. on Fuzzy Systems, vol. 1, no. 2, pp. 85-97, 1993.

[23] H. Ishibuchi, H. Tanaka, and H. Okada, "Interpolation of fuzzy if-then rules by neural networks," International J. of Approximate Reasoning, vol. 10, no. 1, pp. 3-27, 1994.

[24] H. Ishibuchi, M. Nii, and I. B. Turksen, "Bidirectional bridge between neural networks and linguistic knowledge: Linguistic rule extraction and learning from linguistic rules," Proc. of 1998 IEEE International Conference on Fuzzy Systems, pp. 1112-1117, 1998.

[25] K. Nozaki, H. Ishibuchi, and H. Tanaka, "Adaptive fuzzy rule-based classification systems," IEEE Trans. on Fuzzy Systems, vol. 4, no. 3, pp. 238-250, 1996.


[26] R. Andrews, J. Diederich, and A. B. Tickle, "Survey and critique of techniques for extracting rules from trained artificial neural networks," Knowledge-Based Systems, vol. 8, no. 6, pp. 373-389, 1995.

[27] L. Fu, "Rule generation from neural networks," IEEE Trans. on Systems, Man, and Cybernetics, vol. 24, no. 8, pp. 1114-1124, 1994.

[28] S. Sestito and T. Dillon, "Knowledge acquisition of conjunctive rules using multilayered neural networks," International Journal of Intelligent Systems, vol. 8, pp. 779-805, 1993.

[29] G. Towell and J. W. Shavlik, "Interpretation of artificial neural networks: mapping knowledge-based neural networks into rules," Advances in Neural Information Processing Systems 4 (Edited by J. E. Moody, S. J. Hanson and R. P. Lippmann), San Mateo, Morgan Kaufmann, pp. 977-984, 1992.

[30] G. Towell and J. W. Shavlik, "Extracting refined rules from knowledge-based neural networks," Machine Learning, vol. 13, pp. 71-101, 1993.

[31] Y. Hayashi, "A neural expert system with automated extraction of fuzzy if-then rules and its application to medical diagnosis," Advances in Neural Information Processing Systems 3 (Edited by R. P. Lippmann, J. E. Moody and D. S. Touretzky), San Mateo, Morgan Kaufmann, pp. 578-584, 1991.

[32] C. Matthews and I. Jagielska, "Fuzzy rule extraction from a trained multi-layered neural network," Proc. of 1995 IEEE International Conference on Neural Networks, pp. 744-748, 1995.

[33] T. Furuhashi, S. Matsushita, H. Tsutsui, Y. Uchikawa, "Knowledge extraction from hierarchical fuzzy model obtained by fuzzy neural networks and genetic algorithms," Proc. of 1997 IEEE International Conference on Neural Networks, pp. 2374-2379, 1997.

[34] N. K. Kasabov, "Fuzzy rule extraction, reasoning and rule adaptation in fuzzy neural networks," Proc. of 1997 IEEE International Conference on Neural Networks, pp. 2380-2383, 1997.

[35] M. Umano, S. Fukunaka, I. Hatono, H. Tamura, "Acquisition of fuzzy rules using fuzzy neural networks with forgetting," Proc. of 1997 IEEE International Conference on Neural Networks, pp. 2369-2373, 1997.

[36] A. Kaufmann and M. M. Gupta, Introduction to Fuzzy Arithmetic, Van Nostrand Reinhold, New York, 1985.

[37] R. E. Moore, Methods and Applications of Interval Analysis, SIAM Studies in Applied Mathematics, Philadelphia, 1979.

[38] G. Alefeld and J. Herzberger, Introduction to Interval Computations, Academic Press, New York, 1983.

[39] J. J. Buckley and Y. Hayashi, "Fuzzy Neural Networks: A Survey," Fuzzy Sets and Systems, vol. 66, pp. 1-13, 1994.


[40] H. Ishibuchi, K. Morioka, and I. B. Turksen, "Learning by Fuzzified Neural Networks," International J. of Approximate Reasoning, vol. 13, no. 4, pp. 327-358, 1995.

[41] R. A. Fisher, "The use of multiple measurements in taxonomic problems," Annals of Eugenics, vol. 7, pp. 179-188, 1936.


Chapter 13
A Clustering based on Self-Organizing Map and Knowledge Discovery by Neural Network

Kado Nakagawa, Naotake Kamiura, and Yutaka Hata

Himeji Institute of Technology

Abstract

Clustering methods such as k-means, Fuzzy C-Means (FCM), and others have been developed. However, they only partition a database, so it is difficult to discover the reason why each cluster is formed. This paper proposes a method to discover the knowledge of how the clusters are derived. To select the center vector of each cluster, we employ an unsupervised clustering method based on the Self-Organizing Map (SOM), without giving the number of clusters. We define the degree of contribution, calculated from the weights of the neural network which has learned the center vectors. We then describe the knowledge discovery method based on these degrees. We applied our method to artificial data and to the clustering of a car database. The results show that the degree of contribution is an efficient indicator for representing the knowledge of how the clusters are formed.

Keywords: Data Mining, Knowledge Discovery, Clustering, K-means, Fuzzy C-Means (FCM), Center Vector, Number of Clusters, Self-Organizing Map (SOM), Unsupervised Clustering, Reference Vector, Winning Neuron, Competitive Learning, Similarity Matching, Updating Procedure, Dense Neuron, The Degree of Similarity, Combination, Neural Network, Prune, The Degree of Contribution, Neyman Scott's Method

13.1 Introduction

The rapid development of computer memory capacity enables us to accumulate many databases with a large amount of information. Data mining [1] is useful in analyzing the contents of a large-scale database. It finds the patterns that appear at frequent intervals and then discovers helpful knowledge and unknown rules for users. Namely, it processes a data set and discovers the hidden knowledge.

Before discovering the hidden knowledge, the information in a huge database can be classified into some clusters. Recently, clustering [2] has been receiving considerable attention as one of the most promising approaches for classifying data. It is applied to various fields: data analysis, pattern recognition, image processing, fuzzy modeling, and so on. K-means [3] and Fuzzy C-Means (FCM for short) [4] are well-known clustering algorithms. They merely partition a database, need some teaching data, and cannot discover the knowledge of how the clustering result is derived.

This paper proposes a useful method for discovering the reason why each cluster is formed. Our method consists of three steps. As the first step, we employ clustering based on the Self-Organizing Map (SOM for short) [5]-[7] to find the center vector of each cluster. This step finds the center vectors by performing the clustering. The SOM algorithm is robust against noise and can easily display high-dimensional vectors (over three dimensions) by mapping a high-dimensional input data space onto a low-dimensional discrete lattice of units. The algorithm is an unsupervised learning one [8]-[11], so we can find the center vectors without any teaching data. The number of clusters equals that of the center vectors. Our method thus does not require the number of clusters to execute the clustering, though K-means and FCM require it in advance.

We employ a three-layered feed-forward neural network to discover the knowledge of how the clustering result is derived. The second step repeatedly prunes [12][13] the neuron with the smallest sum of errors in the hidden layer while the network learns the center vectors as teaching data. It is known that too many neurons in the hidden layer may lead to overfitting of the data and poor generalization [14][15], while too few neurons in the hidden layer may not give a network that learns the data. Our method therefore aims to discover suitable knowledge from the network with the optimal number of neurons in the hidden layer. Two different approaches have been proposed to overcome the problem of determining the optimal number of hidden neurons required to solve a given problem. The first approach begins with a minimal network and adds more neurons in the hidden layer only when they are needed to improve the learning capability of the network. The second approach begins with an oversized network and then prunes redundant neurons [12][13] in the hidden layer. Since the second one can discourage the use of unnecessary connections and prevent the weights of the connections from taking excessively large values, we employ it.

The third step discovers the knowledge concerned with each cluster from the degree of contribution defined by the weights of neurons in the network.

The organization of this paper is as follows. In Section 13.2, we explain the SOM algorithm. The three steps of our method are described in Section 13.3. In Section 13.4, we apply our method to a database formed artificially by employing the Neyman Scott's method [16] and to two databases of cars. Finally, in Section 13.5, a brief conclusion and future perspective are discussed.

13.2 Preliminary

13.2.1 Notation of Data Set

Let $X = \{x_1, \dots, x_n\}$ be a set of input data where $x_k \in R^N$ $(k = 1, \dots, n)$. A variable $x_k$ is an element in the N-dimensional Euclidean space such as $x_k = \{\text{first attribute}, \dots, \text{N-th attribute}\}$. Namely, each variable has N attributes, e.g., $x_k = \{\text{Height}, \text{Weight}, \text{Age}, \dots\}$. In this paper, for every input data value $x_{jk}$ $(j = 1, \dots, N)$ concerned with the j-th attribute, the following fuzzy membership value $\eta_{jk}$ is calculated:

$$\eta_{jk} = \frac{x_{jk} - \min_j}{\max_j - \min_j} \qquad (1)$$

where $\max_j$ is the maximum value concerned with the j-th attribute and $\min_j$ is the minimum one. We use these fuzzy membership values as the input data.

[Example 1] When the maximum value concerned with one attribute is 100 and the minimum value is 0, an input data value of 50 has a fuzzy membership value of 0.5.
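A minimal sketch of this normalization, with Example 1 as a check:

```python
def fuzzify(data):
    """Min-max normalization of every attribute into [0, 1], cf. equation (1).

    data: list of N-attribute vectors (lists of floats).
    """
    n_attr = len(data[0])
    lo = [min(x[j] for x in data) for j in range(n_attr)]
    hi = [max(x[j] for x in data) for j in range(n_attr)]
    return [[(x[j] - lo[j]) / (hi[j] - lo[j]) for j in range(n_attr)]
            for x in data]

# Example 1 revisited: with minimum 0 and maximum 100, the value 50 maps to 0.5.
assert fuzzify([[0.0], [50.0], [100.0]])[1][0] == 0.5
```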

13.2.2 The SOM Algorithm

The SOM algorithm is an important tool for mapping high-dimensional data sets with unknown density distributions onto a low-dimensional discrete lattice of neurons. Fig. 13.1 shows the structure of the SOM. In this figure, the neurons are arranged on a two-dimensional lattice, and a weight called a reference vector is assigned to each neuron. The lattice type of the array is rectangular.


Fig. 13.1 The structure of SOM.

The SOM consists of an input layer and a map layer with neurons. The input layer is connected with all neurons and supplies all input data to them. A set of data $X = \{x_1, \dots, x_n\}$, $x_k \in R^N$, with N attributes and n data is fed to the SOM via the input layer, and the map layer has M neurons. Each neuron i $(i = 1, \dots, M)$ in the map layer has a reference vector $m_i \in R^N$ with N attributes, the same as an input data. The learning process of the SOM proceeds by modifying the reference vectors.

The learning process is called competitive learning. It consists of two operations. If the Euclidean distance between the reference vector assigned to some neuron and the input data $x_k$ is the smallest when $x_k$ is fed to the SOM, then this neuron is called a winning neuron. The first operation is the similarity matching to find a winning neuron. The second operation is the renewal procedure bringing the reference vector of the winning neuron, and those of neurons located around the winning neuron, close to $x_k$. We repeat these operations for a predefined number of times. The steps are as follows.

STEP 1 Initialize all reference vectors $m_i$ $(i = 1, \dots, M)$ at random.
STEP 2 Execute the following with respect to $t = 1, \dots, T$.

1. Select an input data $x_k \in X$ at random.
2. Find the winning neuron c satisfying equation (2), as in the example shown in Fig. 13.2:

$$\| x_k - m_c \| = \min_{1 \le i \le M} \| x_k - m_i \| \qquad (2)$$


where $\| x_k - m_i \|$ is the Euclidean distance between $x_k$ and $m_i$.

Fig. 13.2 A winning neuron on the map layer.

3. Modify the reference vectors according to equation (3), as in the example shown in Fig. 13.3:

$$m_i(t+1) = \begin{cases} m_i(t) + \alpha(t)\left[ x_k - m_i(t) \right], & \text{if } i \in N_c \\ m_i(t), & \text{if } i \notin N_c \end{cases} \qquad (3)$$

where $N_c$ is the set of neurons located around the winning neuron c, $\alpha(t) < 1$ is a coefficient of learning, and t is the number of renewals.

Fig. 13.3 The modification of reference vectors (winning neuron and its modification neurons, $N_c = 1$).


$N_c$ and $\alpha(t)$ begin at large values to roughly construct a map, and then take small values as the learning proceeds.

In this paper, $N_c$ begins at a distance of three, and $\alpha(t)$ is expressed by equation (4):

$$\alpha(t) = \begin{cases} 0.95, & 1 \le t \le 30999 \\ \dfrac{950.0}{1 + (t - 30000)}, & 30999 < t \le 77500 \\ 0.02, & \text{otherwise} \end{cases} \qquad (4)$$

Fig. 13.4 shows the curve of $\alpha(t)$.


Fig. 13.4 A coefficient of learning.
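A compact sketch of the competitive learning loop in equations (2)-(4) follows; the neighborhood schedule radius() is an assumption, since the chapter only states that $N_c$ begins at a distance of three and shrinks as the learning proceeds:

```python
import random

def train_som(data, rows=20, cols=20, T=400000):
    """Minimal SOM sketch following equations (2)-(4)."""
    n_attr = len(data[0])
    m = [[random.random() for _ in range(n_attr)]          # STEP 1
         for _ in range(rows * cols)]

    def alpha(t):                                          # equation (4)
        if t <= 30999:
            return 0.95
        if t <= 77500:
            return 950.0 / (1 + (t - 30000))
        return 0.02

    def radius(t):                                         # assumed schedule
        return max(0, 3 - 3 * t // T)

    for t in range(1, T + 1):                              # STEP 2
        x = random.choice(data)
        # similarity matching (2): winning neuron = nearest reference vector
        c = min(range(rows * cols),
                key=lambda i: sum((xv - mv) ** 2
                                  for xv, mv in zip(x, m[i])))
        cr, cc = divmod(c, cols)
        r = radius(t)
        for i in range(rows * cols):
            ir, ic = divmod(i, cols)
            if abs(ir - cr) <= r and abs(ic - cc) <= r:    # i in N_c
                a = alpha(t)
                m[i] = [mv + a * (xv - mv)                 # renewal (3)
                        for xv, mv in zip(x, m[i])]
    return m
```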

13.3 A Clustering Employing the SOM and Knowledge Discovery by Neural Network

13.3.1 A Clustering Employing the SOM

In this section, we apply the SOM to estimating the number of clusters and determining the center vector of each cluster. $N_c$ takes a big value at first, and the SOM roughly constructs a map. Then several data with similar attribute values in X, fed to the input layer, are learned repeatedly on the map layer, so the reference vector of such a neuron becomes similar to those of its adjacent neurons. After the learning, we calculate the average distance between the reference vector of each neuron and those of its adjacent neurons. We denote this average by AD. A neuron is learned repeatedly by several data with similar attribute values as its AD becomes smaller. We call a neuron with AD less than or equal to a threshold δ a dense neuron.

We first combine neighboring dense neurons. Then we set the center vector of each cluster to the coordinates obtained by averaging the attribute values of the combined dense neurons. The number of clusters equals that of the center vectors. The number of clusters and their center vectors are determined as follows.

STEP 1 Apply the input data to the SOM algorithm at random.
STEP 2 Find dense neurons.

1. Calculate the AD of each neuron. We denote the AD of neuron i by $d_i$ $(i = 1, 2, \dots, M)$:

$$d_i = \frac{1}{N_i} \sum_{j \in A_i} \| m_i - m_j \| \qquad (5)$$

where $A_i$ is the set of neurons around neuron i, $N_i$ is the number of neurons around neuron i, and $N_i$ is different from $N_c$. If neuron i is located at the edge of the SOM, then $N_i$ equals 2 or 3. Otherwise, $N_i$ equals 4, as shown in Fig. 13.5.

Fig. 13.5 Neuron i and the neurons around it.

2. Calculate the threshold δ by equation (6), where d_max (or d_min) is the maximum (or minimum) value among the ADs. Then find the dense neurons $g_1, g_2, \dots, g_s$ according to equation (7):

$$\delta = d\_min + 0.1 \times (d\_max - d\_min) \qquad (6)$$

$$s = 1;\ \text{for } i = 1, \dots, M:\ \text{if } d_i \le \delta, \text{ then set the number of the dense neuron } g_s \text{ to } i \text{ and } s \leftarrow s + 1. \qquad (7)$$

STEP 3 Combine the dense neurons as follows.

1. Let $\Gamma = \{G_1, G_2, \dots, G_I\}$ be a set of initial clusters. First set I to s. Namely, $\Gamma = \{G_1, G_2, \dots, G_I\} = \{g_1, g_2, \dots, g_s\}$.
2. If $I < 2$, then go to STEP 1.
3. Select two clusters $G_p$ and $G_q$ $(p, q \le I)$. For the dense neurons in $G_p$ and $G_q$, we calculate the value $F(G_p, G_q)$ by the following equation:

$$F(G_p, G_q) = \max_{x \in G_p,\ y \in G_q} f(x, y) \qquad (8)$$

where x (or y) is an element of $G_p$ (or $G_q$), and we define $f(x, y)$ as

$$f(x, y) = \exp\left( a \cdot \| m_x - m_y \|^b \right) \qquad (9)$$

where a and b are constants. In this paper, we set a to −8 and b to 2. We calculate $F(G_p, G_q)$ for all the possible pairs of two clusters. We call the maximum value of $F(G_p, G_q)$ the degree of similarity ξ. Namely,

$$\xi = \max_{p, q \le I,\ p \neq q} F(G_p, G_q) \qquad (10)$$

If $\xi \ge \theta$, then go to 4 of STEP 3. From the experimental results, we set θ to 0.75. Otherwise, go to STEP 4.
4. Combine $G_p$ and $G_q$ and form the new cluster $G_r$. That is,

$$G_r = G_p \cup G_q \qquad (11)$$

Then $I \leftarrow I - 1$ and $\Gamma \leftarrow \Gamma - G_p - G_q + G_r$ hold. Simultaneously, renumber the elements of Γ from $G_1$ to $G_I$ in order. Go to 2 of STEP 3.

In this STEP, we form I clusters $G_1, G_2, \dots, G_I$ of dense neurons.

STEP 4 Determine the center vector of each cluster and the membership degree of each neuron as follows.

1. We denote the center vector of each cluster $G_r$ $(r = 1, \dots, I)$ by $m\_center_r$ and calculate it as

$$m\_center_r = \frac{\sum_{i \in G_r} m_i}{|G_r|} = \frac{\text{the sum of reference vectors of dense neurons in } G_r}{\text{the number of dense neurons in } G_r} \qquad (12)$$

2. Calculate the degree of similarity $\mu_{ri}$ between $m\_center_r$ and each neuron i $(i = 1, \dots, M)$ by the following equation:

$$\mu_{ri} = \left\{ f(m\_center_r, m_i) \right\}^c \qquad (13)$$

In this paper, we set c to 1.2.
3. Calculate the membership degree $\mu'_{ri}$ of each neuron i with respect to cluster r:

$$\mu'_{ri} = \frac{\mu_{ri}}{\sum_{r=1}^{I} \mu_{ri}} \qquad (14)$$

STEP 5 Determine the membership degree of each input data as follows. For an input data β (β ∈ X) and every neuron i, calculate the Euclidean distance between β and neuron i, and find the neuron with the smallest distance. Set the membership degree of β to that of this neuron. β is regarded as an element of the cluster containing this neuron.
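A condensed sketch of STEPs 2-4 (dense-neuron detection, combination, and center vectors) is given below; a rectangular 4-neighborhood is assumed, as in Fig. 13.5:

```python
import math
from itertools import combinations

def euclid(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cluster_dense_neurons(m, rows, cols, theta=0.75, a=-8, b=2):
    """Find dense neurons, combine them with f(x, y) = exp(a * d^b),
    and return the center vectors, cf. equations (5)-(12)."""
    def neighbors(i):
        r, c = divmod(i, cols)
        cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
        return [rr * cols + cc for rr, cc in cand
                if 0 <= rr < rows and 0 <= cc < cols]

    # AD of each neuron (5) and the threshold delta (6)
    d = [sum(euclid(m[i], m[j]) for j in neighbors(i)) / len(neighbors(i))
         for i in range(rows * cols)]
    delta = min(d) + 0.1 * (max(d) - min(d))
    clusters = [{i} for i in range(rows * cols) if d[i] <= delta]   # (7)

    def F(gp, gq):   # equation (8), with f from equation (9)
        return max(math.exp(a * euclid(m[x], m[y]) ** b)
                   for x in gp for y in gq)

    while len(clusters) >= 2:
        p, q = max(combinations(range(len(clusters)), 2),
                   key=lambda pq: F(clusters[pq[0]], clusters[pq[1]]))
        if F(clusters[p], clusters[q]) < theta:   # degree of similarity (10)
            break
        clusters[p] |= clusters[q]                # combine (11)
        del clusters[q]

    # center vectors (12): average of the reference vectors in each cluster
    return [[sum(m[i][k] for i in g) / len(g) for k in range(len(m[0]))]
            for g in clusters]
```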

[Example 2] Fig. 13.6 shows an example of ADs. If we set the threshold δ to 0.030, then the seven neurons numbered 5, 13, 22, 25, 40, 43, and 48 are selected as dense neurons. Fig. 13.7 shows the map layer of the SOM. The color of a neuron becomes deeper as its AD becomes smaller. The dense neurons are denoted by black ones.

Fig. 13.6 An example of the ADs (δ = 0.030).

Fig. 13.7 The dense neurons.

[Example 3] Fig. 13.8 shows $f(x, y)$ expressed by equation (9), where a = −8 and b = 2. With θ = 0.75, the dense neurons with degrees of similarity greater than or equal to 0.75 are combined into an identical cluster. Namely, those with a distance between reference vectors less than or equal to 0.19 form a cluster.

Fig. 13.9 shows the map layer after combining dense neurons. This clustering based on the SOM forms cluster 1 and cluster 2. The distance between the reference vectors of any two dense neurons in cluster 1 (or cluster 2) is less than or equal to 0.19. However, the distance between the reference vector of a dense neuron in cluster 1 and that of one in cluster 2 is more than 0.19.

Fig. 13.8 The degree-of-similarity curve $f(x, y) = \exp(-8 \cdot \| m_x - m_y \|^2)$.

Fig. 13.9 The set of clusters (cluster 1 and cluster 2).

13.3.2 Knowledge Discovery by Neural Network

The clustering based on the SOM in Section 13.3.1 cannot discover the knowledge of how the clusters are derived. In order to discover the knowledge, we introduce a three-layered feed-forward neural network. Our network learns the center vector of each cluster as teaching data. In the learning phase, we prune the neuron with the smallest sum of errors in the hidden layer. This operation is repeated until we obtain a non-converged network. We define the degree of contribution concerned with each attribute from the weights of neurons in the last converged network, and then we discover the knowledge from the degrees. We show each step to discover the knowledge below.

STEP 1 Let N be the number of attributes and I be the number of clusters. Fig. 13.10 shows a three-layered feed-forward neural network with N input neurons, I output neurons, and J hidden neurons.

Fig. 13.10 The structure of the neural network.

where $W^{h,in}_{jk}$ is a connection weight from the k-th neuron in the input layer to the j-th neuron in the hidden layer, and $W^{out,h}_{ij}$ is one from the j-th neuron in the hidden layer to the i-th neuron in the output layer.

STEP 2 Our network learns the center vector of each cluster as teaching data. The learning proceeds so that each output value will take 1 when the center vector of the corresponding cluster is applied to the input. We preserve the state of convergence.

STEP 3 For each neuron j $(j = 1, \dots, J)$ in the hidden layer, when removing that neuron from the converged network, we calculate the sum of errors, namely $\sum (\text{teaching data} - \text{output of the network})^2$, over all input data. We prune the hidden neuron with the smallest sum of errors and set J to J − 1.

STEP 4 If the network cannot converge, then stop the learning at the last state of convergence. Otherwise, go to STEP 3. Namely, STEP 3 is repeated until a non-converged network is obtained. This is done in order to clearly obtain "the degree of contribution" described below.

STEP 5 For the last converged network, we calculate the values $I_{ik}$ from the k-th neuron in the input layer to the i-th neuron in the output layer:

$$I_{ik} = \sum_{j=1}^{J} W^{out,h}_{ij} \cdot W^{h,in}_{jk} \qquad (15)$$

where J is the number of remaining neurons in the hidden layer.

STEP 6 We define $I_{ik}$ as the degree of contribution that the k-th attribute gives to the i-th cluster. When the absolute value of $I_{ik}$ is larger than a threshold thr, we can find a remarkable characteristic of the i-th cluster concerned with the k-th attribute. In this way, we employ the property of this attribute as the following knowledge. If $I_{ik} > thr$, then the knowledge that the k-th attribute gives to the i-th cluster is "big". If $I_{ik} < -thr$, then the knowledge that the k-th attribute gives to the i-th cluster is "small". Otherwise, the knowledge that the k-th attribute gives to the i-th cluster is "neither small nor big".

In other words, a large value of $I_{ik}$ over the threshold thr is represented by "big" as the knowledge of the k-th attribute for the i-th cluster. A negative value of $I_{ik}$ under the threshold −thr is represented by "small". Any other value is represented by "neither small nor big". The threshold thr is determined as half the standard deviation of the absolute values of $I_{ik}$.
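A small sketch of equation (15) and the labeling in STEP 6 follows; whether the population or sample standard deviation is intended is an assumption, since the chapter does not say:

```python
import statistics

def degrees_of_contribution(w_hid_in, w_out_hid):
    """I_ik = sum_j W^out,h_ij * W^h,in_jk, cf. equation (15).

    w_hid_in[j][k]: weight from input neuron k to hidden neuron j;
    w_out_hid[i][j]: weight from hidden neuron j to output neuron i.
    """
    n_out, n_hid, n_in = len(w_out_hid), len(w_hid_in), len(w_hid_in[0])
    return [[sum(w_out_hid[i][j] * w_hid_in[j][k] for j in range(n_hid))
             for k in range(n_in)] for i in range(n_out)]

def discovered_knowledge(I):
    """Label each (cluster, attribute) pair as in STEP 6; thr is taken as
    half the (population) standard deviation of the absolute values."""
    flat = [abs(v) for row in I for v in row]
    thr = statistics.pstdev(flat) / 2

    def label(v):
        if v > thr:
            return "big"
        if v < -thr:
            return "small"
        return "-"

    return [[label(v) for v in row] for row in I]

# Example 4 check: I_12 = 6 * (-4) + (-2) * 5 = -34
assert degrees_of_contribution([[0, -4], [0, 5]], [[6, -2]])[0][1] == -34
```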

[Example 4] When the number of clusters is three, the number of outputs equals three. As shown in Table 13.1, we make teaching data so that each output value will take 1 when center vector of each cluster is applied to input.


Table 13.1 Teaching data.

| Output data (cluster) | First | Second | Third |
|---|---|---|---|
| teaching data 1 (cluster 1) | 1 | 0 | 0 |
| teaching data 2 (cluster 2) | 0 | 1 | 0 |
| teaching data 3 (cluster 3) | 0 | 0 | 1 |

Assume that the network has converged as shown in Fig. 13.11. We show how to calculate the degree of contribution that the second attribute gives to the first cluster, $I_{12}$. It is as follows:

$$I_{12} = 6 \times (-4) + (-2) \times 5 = -34 \qquad (16)$$

Fig. 13.11 The converged network.

[Example 5] Assume that the degrees of contribution are calculated as shown in Table 13.2. Then the threshold thr equals 36.5. Table 13.3 shows the discovered knowledge from the degrees.


Table 13.2 The degrees of contribution.

| | First | Second | Third |
|---|---|---|---|
| cluster 1 | 11.87 | 6.31 | -59.2 |
| cluster 2 | 52.1 | -2.39 | -9.18 |
| cluster 3 | -44.56 | 37.7 | 1.01 |

Table 13.3 The discovered knowledge from Table 13.2 (thr = 36.5).

| | First | Second | Third |
|---|---|---|---|
| cluster 1 | - | - | small |
| cluster 2 | big | - | - |
| cluster 3 | small | big | - |

The knowledge concerned with the first cluster is "the third attribute is small", that concerned with the second cluster is "the first attribute is big", and that concerned with the third cluster is "the first attribute is small and the second attribute is big".

The cells marked by "-" in Table 13.3 show that the discovered knowledge is "neither small nor big".

13.4 Experimental Results

13.4.1 The Estimation of the Number of Clusters

We applied the clustering based on the SOM to five artificially synthesized clusters with 234 data specified by two attributes (x, y), shown in Fig. 13.12, generated using Neyman Scott's method. The size of the SOM is 20 rows and 20 columns, and hence the number of neurons is 400. The number of learning iterations in the SOM was 400,000. Fig. 13.13 shows the result of the clustering. The number of clusters was five, which shows that our method has a fine ability to determine the number of clusters without teaching data. Moreover, we can find the center vector of each cluster.

Table 13.4 shows the experimental results of applying a clustering based on the SOM to Fig.13.12 for 10 times under the threshold 0=0.7, 0.75, and 0.8. In the case of 0=0.75, the number of clusters was always five and hence 0.75 is appropriate to 6.

Page 305: 45052361 a New Paradigm of Knowledge Engineering by Soft Computing

288 K. Nakagawa, N. Kamiura & Y. Hata

r y • sample

• " * • * . ,

. *• ••. •^

% T • • ••

%• # « • • •

• • - A *

• • • • • • • •

data

. • *•

• •• • •

• ••

V£. . * • • • •

AT

1.2

1

0.8

0.6

0.4

0.2

0

-0.2

-0.2 0 0.2 0.4 0.6 0.8 1 1.2 1.4

Fig. 13.12 Artificial data.

Oclusterl ° cluster2 Acluster3 Xcluster4 +cluster5 •center vectors 1.2

1

0.8

0.6

0.4

0.2

0

-0.2 -0.2 0.2 0.4 0.6 0. 1.2 1.4

Fig. 13.13 The result of clustering employing the SOM.

Page 306: 45052361 a New Paradigm of Knowledge Engineering by Soft Computing

A Clustering based on Self-Organizing Map and Knowledge ... 289

Table 13.4 The number of clusters in changing threshold 8.

0.8

0.75

0.7

1

5

5

5

2

5

5

4

3

6

5

4

4

5

5

4

5

5

5

5

6

5

5

3

7

5

5

5

8

5

5

5

9

5

5

5

10

5

5

5

13.4.2 The Result on Discovery of Knowledge

We applied our method to a car database with 200 data specified by three attributes (Price, Engine-size, and Width). After the clustering, the number of clusters became three. So we prepared the neural network with three input neurons, three output neurons, and adequate hidden neurons. Tables 13.5,13.6, 13.7, 13.8, and 13.9 show the center vector of each cluster, teaching data, a part of clustering result, the degrees of contribution, and the discovered knowledge from the degrees, respectively. The size of SOM used before discovering the knowledge was 20 rows and 20 columns, and the number of learning in the SOM was 400,000 times. We set the sum of errors in neural network to 0.0004.

Table 13.5 The center vector of each cluster.

^""^-^attribute cluster N a ^ \ .

cluster1

cluster2

cluster3

Price($)

16434.9(0.28)

27420.1(0.55)

9542.6(0.11)

Engine-size(ci)

186.7(0.68)

189(0.71)

172.1(0.46)

Width(inch)

68.4(0.69)

66.9(0.56)

65.4(0.44)

( ):Fuzzy menbership values

Table 13.6 Teaching data.

^~~~~~—^ciutput center N o / ^ ~ - - ^

cluster1 cluster2

cluster3

First

1 0

Second

0

1 0 j 0

Third

0 0 1

Page 307: 45052361 a New Paradigm of Knowledge Engineering by Soft Computing

290 K. Nakagawa, N. Kamiura & Y. Hata

Table 13.7 A part of clustering result

^^^attribute cluster N o > \

clusterl

cluster2

cluster3

Price($)

19045 21485 22470

22625 20000 24565 30760 41315 36880 32250 10295 12945

10345 6785 11048

Engine-size(cc)

188.8 188.8

188.8

188.8 188.8 189 189

193.8 197

199.6 175.4 175.4

169.1 170.7 172.6

Width(inch)

68.8 68.9 68.9

68.9 68.9 66.9 66.9 67.9 70.9

69.6 62.5 65.2

66 61.8 65.2

Table 13.8 The degrees of contribution.

^"~~^—^attribute cluster No/""^-^ .

clusterl cluster2

cluster3

Price($)

-32.46

188.9 -121.84

Engine-size(ci)

39.74 47.1

-63.18

Width(inch)

202.07 -71.26

-84.23

Table 13.9 The discovered knowledge (thr=153A).

^~~^~~^attribute cluster NoT^~-- \

clusterl cluster2 cluster3

Price($)

-

big -

Engine-size(ci)

-.

-

Width(inch)

big -

-

Page 308: 45052361 a New Paradigm of Knowledge Engineering by Soft Computing

A Clustering based on Self-Organizing Map and Knowledge ... 291

In this result, the threshold thr was estimated at 153.4. As a result, the discovered knowledge concerned with the first cluster was "the width is big", and concerned with the second cluster was "the price is big".

Finally, we applied our method to a car database with 200 data specified by six attributes (Price, Engine-size, Peak-rpm, Length, Width, and Height). After the clustering, the number of clusters became five. So we prepared the neural network with six input neurons, five output neurons, and adequate hidden neurons. Tables 13.10, 13.11, 13.12, 13.13, and 13.14 show the center vector of each cluster, teaching data, a part of clustering result, the degrees of contribution, and the discovered knowledge from the degrees, respectively.

Table 13.10 The center vector of each cluster.

^\a t t r ibute

cluster N b ^

clusterl

cluster2

cluster3

cluster4

cluster5

Price($)

18420(0.33)

7738(0.07)

8013(0.07)

12945(0.19)

6377(0.03)

Engine-size

(cubic inch)

130(0.26)

98(0.14)

108(0.18)

110(0.18)

90(0.11)

Peak-rpm (times)

5100(0.39)

4800(0.27)

4800(0.27)

5800(0.67)

5500(0.55)

Length (inch)

188.8(0.71)

166.3(0.38)

173.6(0.49)

175.4(0.51)

157.3(0.24)

Width (inch)

67.2(0.59)

64.4(0.35)

65.4(0.44)

65.2(0.42)

63.8(0.30)

Height (inch)

56.2(0.70)

53(0.43)

54.9(0.59)

54.1(0.53)

50.8(0.25)

( ): fuzzy menbership values

Table 13.11 Teaching data.

^ \ o u t p u t

center N o ^

clusterl

cluster2

cluster3

cluster4

cluster5

First

1

0

0

0

0

Second

0

1

0

0

0

Third

0

0

1

0

0

Fourth

0

0

0

1

0

Fifth

0

0

0

0

1

Page 309: 45052361 a New Paradigm of Knowledge Engineering by Soft Computing

292 K. Nakagawa, N. Kamiura & Y. Hata

Table 13.12 A part of clustering result.

^~\4ttribute cluster Nc^-\

clusterl

cluster2

cluster3

cluster4

cluster5

Price($)

28248 28176 31600 34184 40960

13495 16500 12964 6785 11048

7995 8195 8495 9495 13845

16430 16925 7295 7295 7895

15645 7609 8558 6479 6855

Engine-size (cubic inch)

183 183 183 234 308

130 130 156 111 119

97 109 109 97 97

108 108 92 92 110

80 90 98 92 92

Peak-rpm (times)

4350 4350 4350 4750 4500

5000 5000 5000 4800 5000

4800 5250 5250 4500 4500

5800 5800 6000 6000 5800

6000 5500 5500 4800 6000

Length (inch)

190.9 -187.5 202.6 202.6 208.1

168.8 168.8 173.2 170.7 172.6

171.7 171.7 171.7 171.7 180.2

176.8 176.8 163.4 157.1 167.5

169 157.3 157.3 144.6 144.6

Width (inch)

70.3 70.3 71.7 71.7 71.7

64.1 64.1 66.3 61.8 65.2

65.5 65.5 65.5 65.5 66.9

64.8 64.8 64

63.9 65.2

65.7 63.8 63.8 63.9 63.9

Height (inch)

58.7 54.9 56.3 56.5 56.7

48.8 48.8 50.2 53.5 51.4

55.7 55.7 55.7 55.7 55.1

54.3 54.3 54.5 58.3 53.3

49.6 50.6 50.6 50.8 50.8

Table 13.13 The degree of contribution.

^xattribute

cluster Nt>K^

clusterl

cluster2

cluster3

cluster4

cluster5

Price($)

177.51

-30.93

-87.09

79.26

-68.85

Engine-size

(cubic inch)

70.1

-10.65

57.18

13.21

-37.75

Peak-rpm

(times)

-81.59

-118.51

-216.47

108.63

89.36

Length

(inch)

263.66

-65.54

142.04

83.48

-128.8

Width

(inch)

131.46

-56.51

76.86

55.46

-60.51

Height

(inch)

244.81

-98.34

290.16

87.23

-133.76

Page 310: 45052361 a New Paradigm of Knowledge Engineering by Soft Computing

A Clustering based on Self-Organizing Map and Knowledge ... 293

Table 13.14 The discovered knowledge (tf»r=108.2).

^^ t t r ibu te

cluster NoN^

clusterl

cluster2

cluster3

cluster4

cluster5

Price($)

big ---

-

Engine-size (cubic inch)

----

-

Peak-rpm (times)

-

small

small

big

-

Length (inch)

big -

big -

small

Width (inch)

big ---

-

Height (inch)

big

-

big -

small

The size of SOM used before discovering the knowledge was 20 rows and 20 columns, and the number of learning in the SOM was 400,000 times. We set the sum of errors in neural network to 0.0004.

In this result, the threshold thr was estimated at 108.2. As a result, the discovered knowledge concerned with the first cluster was "price, length, width, and height are big", that concerned with the second cluster was "peak-rpm is small", that concerned with the third cluster was "length and height are big and peak-rpm is small", that concerned with the fourth cluster was "peak-rpm is big", and that concerned with the fifth cluster was "length and height are small". The cells marked by "-"s in Table 13.9 and 13.14 show that the discovered knowledge is "neither small nor big".

In this way, we could estimate the number of cluster from two car databases and extract the knowledge about each cluster.

13.5 Conclusions In this paper, we proposed a useful method in discovering the reasons why the clusters were formed. We introduced the SOM to an unsupervised clustering algorithm. It could determine the center vector of each cluster and estimate the number of clusters. We used a three-layered feed-forward neural network for discovering the knowledge. From the neural network which learns the center vectors, we calculate the degree of contribution being equivalent to the product of weights of neurons learning the center vectors. If the absolute value of the degree of contribution is larger than a threshold, then we employs this attribute fed to the neuron in the input layer as the knowledge. The experimental results showed that our method had an ability to discover the knowledge of how the cluster was derived. The degree was an efficient indicator to present the

Page 311: 45052361 a New Paradigm of Knowledge Engineering by Soft Computing

294 K. Nakagawa, N. Kamiura & Y. Hata

knowledge. Our method depends on the threshold to derive the knowledge. As the

future direction, it remains to automatically find the threshold.

Page 312: 45052361 a New Paradigm of Knowledge Engineering by Soft Computing

A Clustering based on Self-Organizing Map and Knowledge ... 295

References

[I] P. Cheeseman, and J. Stutz, "Advance in Knowledge Discovery and Data Mining,"

Bayesian Classification (AutoClass): Theory and Results, AAAI Press / The MIT

Press, Chapter 6, pp. 153-180, 1996.

[2] H. Asada, "The Dictionary of Artificial Intelligence," (in Japanese) Clustering,

Maruzen, 1993.

[3] M. R. Anderberg, "Cluster Analysis for Applications," Academic Press, New York,

1973.

[4] J. C. Bezdek, "Pattern Recognition with Fuzzy Objective Function

Algorithms," Plenum Press, New York, 1981.

[5] T. Kohonen, "Self-organization and associative memory," Third Edition, Springer-

Verlag, Berlin, 1989.

[6] T. Kohonen, "Self-Organization Map," Proc. IEEE, vol. 78, no. 9, pp. 1464-1480,

1990.

[7] T. Kohonen, "Self-Organization Maps," Springer Verlarg, Belrin, 1995.

[8] T. Kikuchi, T. Matuoka, T. Takeda, and K. Kishi, "Automatic Classification by a

Competitive Learning Neural Network," IEICE Trans., D-II, Vol. J78-D-II, No. 10,

pp. 1543-1547, 1995.

[9] M. Tanaka, Y. Furukawa, and T. Tanino, "Clustering by Using Self Organizing

Map," IEICE Trans., D-II, Vol. J79-D-II, No. 2, pp. 301-304, 1996.

[10] M. Terashima, F. Shiratani, and K. Yamamoto, "Unsupervised Cluster Segmentation

Method Using Data Density Histogram on Self-Organizing Feature Map," IEICE

Trans., D-II, Vol. J79-D-II, No. 7, pp. 1280-1290, 1996.

[II] R. Ito, T. Shida, and T. Kindo, "Competitive Models for Unsupervised Clustering,"

IEICE Trans., D-II, Vol. J79-D-II, No. 8, pp. 1390-1400, 1996.

[12] T. Hozumi, N. Kamiura, Y. Hata, and K. Yamato, "Multiple-Valued Logic Design

Based on Gate Model Networks," MULTIPLE-VALUED LOGIC An International

Journal, vol. 3, no.l, pp. 1-20, 1998.

[13] K. Nakagawa, N. Kamiura, and Y. Hata, "Knowledge Discovery Using Fuzzy C-

Means and Neural Network," Proc. of the 5-th International Conference on Soft

Computing and Information / Intelligent Systems, Vol. 2, pp. 915-918, Iizuka, Japan,

Oct, 1998.

[14] R. Setiono, "A Penalty-function Approach for Pruning Feed-forward Neural

Networks," Neural Computation, Vol. 9, No. 1, pp. 185-204, 1995.

Page 313: 45052361 a New Paradigm of Knowledge Engineering by Soft Computing

296 K. Nakagawa, N. Kamiura & Y. Hata

[15] R. Setiono, "Extracting Rules from Neural Networks by Pruning and Hidden-unit

Splitting," Neural Computation, Vol. 9, No. 1, pp. 205-225, 1995.

[16] T. Kato, and K. Ozawa, "Non-hierarchical Clustering by Genetic Algorithm," (in

Japanese) Information Society of Japan, Vol.37, No.l 1, pp. 1950-1959, 1996.

Page 314: 45052361 a New Paradigm of Knowledge Engineering by Soft Computing

Chapter 14

Probabilistic Rough Induction

Juzhen Dong1, Ning Zhong1, Setsuo Ohsuga2

1 Maebashi Institute of Technology 2 Waseda University

Abstract

In this paper, we propose a soft approach called G D T - R S for rule discovery in databases with uncertainty and incompleteness. The approach is based on the combination of Generalization Distribution Table ( G D T ) and the Rough Set methodology. A GDT is a table in which the probabilistic relationships between concepts and instances over discrete domains are represented. The GDT provides a probabilistic basis for evaluating the strength of a rule. Furthermore, the rough set methodology is used to find minimal relative reducts from the set of rules with larger strengths. Main features of our approach are ( l ) biases can be flexibly selected for search control, and background knowledge can be used as a bias to control the creation of a GDT and the rule discovery process; (2) the uncertainty of a rule, including its ability to predict possible instances, can be explicitly represented in the strength of the rule.

Keywords : inductive learning, knowledge discovery, Generalization Distribution Table (GDT), rough sets, uncertainty and incompleteness, background knowledge, soft computing, hybrid system.

14.1 I n t r o d u c t i o n

Inductive learning is a major way to discover classification rules from

databases. Various methods have been proposed[l8; 16; 7; 22; 12; 13].

According to the value of information, these methods can be divided into

two types. The first type is based on the formal value of information, tha t

is, the real meaning of da ta is not considered in the discovery process. ID3

is a typical method of this type [16]. Although rules can be discovered

297

Page 315: 45052361 a New Paradigm of Knowledge Engineering by Soft Computing

298 J. Dong, N. Zhong & S. Ohsuga

by using the method, it is difficult to use background knowledge in the

discovery process. The other type of inductive methods is based on the

semantic value of information, tha t is, the real meaning of da ta must be

considered by using some background knowledge in the discovery process.

Dblearn is a typical method of this type [7]. It can discover rules by means

of background knowledge represented by concept hierarchies, but if there

is no background knowledge, it can do nothing. The question is

"how can both the formal value and the semantic value be

considered in a rule discovery system?"

Unfortunately, so far we have not seen any inductive method that can

consider both of the formal value and the semantic value of information.

We argue tha t an ideal rule discovery system should have such feature, tha t

is, on one hand, background knowledge can be used flexibly in the discovery

process; on the other hand, if no background knowledge is available, it can

also work.

Another issue that was not addressed in previous inductive methods is

"how can unseen instances be predicted, and how can the

uncertainty of a rule including the prediction be represented

explicitly?"

Since most of previous inductive methods are based on closed world as

sumption, they only consider the instances tha t have been collected in a

database. It is because there was no way tha t has been found to know/guess

the instances of describing a concept tha t have never been observed before.

In this paper, we propose a soft approach called GDT-RS, which is

based on the combination of Generalization Distribution Table (GDT) and

the Rough Set methodology, for discovering classification rules hidden in

da ta with uncertainty and incompleteness. The main features of GDT-RS

are

• Biases can be flexibly selected for search control and background

knowledge can be used as a bias to control the creation of a G D T

and the rule discovery process;

• Unseen instances are considered in rule discovery process and the

uncertainty of a rule, including its ability to predict unseen in

stances, can be explicitly represented in the strength of the rule.

Page 316: 45052361 a New Paradigm of Knowledge Engineering by Soft Computing

Probabilistic Rough Induction 299

We first give the definition of a Generalization Distribution Table (GDT) ,

and describe some basic concepts on the G D T methodology. Then we out

line the rough set methodology. Furthermore, we explain how to combine

the G D T with the rough set methodology for discovering classification rules

from databases with uncertainty and incompleteness. We focus on basic

concepts, principles, two algorithms for implementation of our methodol

ogy, and describe the experimental results.

14.2 T h e G D T M e t h o d o l o g y

The central idea of our methodology is to use a variant of a transition

matrix, called t h e Genera l i za t ion D i s t r i b u t i o n Table ( G D T ) , as a

hypothesis search space for generalization. In a GDT, the probabilistic re

lationships between concepts and instances over discrete domains are rep

resented [25; 26]. This section describes the basic concepts and principles

of the G D T methodology.

14 .2 .1 GDT

We define a G D T as consisting of three components: possible instances,

possible generalizations for instances, and probabilistic relationshipsbetween

possible instances and possible generalizations.

The possible instances, which are represented in the top row of a GDT,

are all possible combinations of a t t r ibute values in a database. The number

of the possible instances is YiTLi n»> w n e r e m i s t n e number of at t r ibutes,

and rii is the number of different a t t r ibute values in a t t r ibute i.

The possible generalizations for instances, which are represented in the

left column of a GDT, are all possible cases of generalization for all possi

ble instances. "*", which specifies a wild card, denotes the generalization

for instances*. For example, the generalization *6oCi means the a t t r ibute

A is unimportant for describing a concept. The number of the possible

generalizations is n ^ i ( n « + *) — YllLi ni ~ 1-The probabilistic relationships between the possible instances and the

possible generalizations, which are represented in the elements Gij of a

GDT, are the probabilistic distribution for describing the strength of the

relationship between every possible instance and every possible general-

*For simplicity, we would like to omit the wild card in some places in this paper.

Page 317: 45052361 a New Paradigm of Knowledge Engineering by Soft Computing

300 J. Dong, N. Zhong & S. Ohsuga

ization. The prior distribution is equip robable, if any prior background

knowledge is not used. Thus, it is defined by the Eq. (1), and }_]<-?»; = 1:

3

dj = piPIjlPGi) 1

if PIj G PGi NPG,

0 otherwise

(1)

where PIj is the jth possible instance, PGi is the i th possible generaliza

tion, and Npa, is the number of the possible instances satisfying the ith

possible generalization, that is,

NPG, - n nk

ke{l\ PG,[(]=*}

(2)

where PGi[l] is the value of the kth a t t r ibute in the possible generalization

PGi, PGj[t] = * means that PGi doesn't contain at t r ibute /.

Furthermore, for convenience, letting E = Ylk=1 «£, Eq. (1) can be

changed into the following form,

Gij =p(PIj\PGi)= <

n nk k€{l\ PG,[l]jt*}

E

0

if PIj G PGi

otherwise

(3)

because of

NpGi

n »* *€{'! PG,[i]#*}

m

it=l

n n" k€{l\ PG,[/] = *}

n nt k€{I\ PG,[l]7i*}

E

Since E is a constant for a given database, the prior distribution p(PIj \PGf)

is directly proportional to the product of the numbers of values of all at

tr ibutes contained in PGi.

Page 318: 45052361 a New Paradigm of Knowledge Engineering by Soft Computing

Probabilistic Rough Induction 301

Table 14.1 A GDT generated from the sample database shown in Table 14.2

pa ^ \ • bOcO • b O c l • bUO • b l c l * b 2 c O * b 2 c l aO«cO a O * c l a l * c O a l * c l &ObO* a .Obl* a 0 b 2 * a l b O * a l b l * a l b 2 *

• *cO * * c l * b O * * b l * * b 2 *

a l * *

aObOcO

1/2

1 /3

1 /2

1 /6

1 /4

1 /6

aObO c l

1 /2

1 / 3

1 /2

1 /6 1 /4

1 /6

a O b l c O

1 /2

1 /3

1 /2

1 /6

1 /4

1 /6

a O b l c l

1 /2

1 /3

1 /2

1 /6

1 /4

1 /6

a 0 b 2 c 0

1 /2

1 / 3

1 /2

1 /6

1 / 4 1 /6

* 0 b 2 c l

1 /2

1 /3

1 /2

1 /6

1 / 4 1 /6

a l b O c O

1 / 3

1 /2

1 /6

1 /4

1 /6

a l b 2 c l

1 /2

1 /3

1 /2

1 /6

1 /4

1 /6

Table 14.2 A sample database

>A u l u2 u3 u4 u5 u6 u7

a ao ao ao a i

ao ao a i

b

bo h 6o h bo b2

6i

c Cl

C]

C l

CO

C l

C l

C l

d

y y y n n n

y

Thus , in our approach, the basic process of hypothesis generation is to

generalize the instances observed in a database by searching and revising

the GDT. Here, two kinds of at tr ibutes need to be distinguished: condition

attr ibutes and decision a t t r ibutes (sometimes called class attr ibutes) in a

database. The condition at t r ibutes as possible instances are used to create

the GDT, but the decision at tr ibutes are not. The decision at t r ibutes are

normally used to decide which concept (class) should be described in a rule.

Usually a single decision at t r ibute is all tha t are required.

Table 14.1 is an example of the GDT, which is generated by using three

condition at t r ibutes, a,b, and c, in a sample database shown in Table 14.2,

and a = { a o , a i } , b— {60,61,62}, c— {co.c i} . For example, the real mean

ing of these condition at tr ibutes can be respectively assigned as Weather,

Page 319: 45052361 a New Paradigm of Knowledge Engineering by Soft Computing

302 J. Dong, N. Zhong & S. Ohsuga

Temperature, Humidity in a weather forecast database, or Temperature,

Cough, Headache in a medical diagnosis database. At t r ibute d in Table 14.2 is used as a decision a t t r ibute . For example, the real meaning of the decision a t t r ibute can be assigned as Wind or Flu corresponding to the assigned condition at t r ibutes .

14.2 .2 Biases

Since our approach is based on the GDT, rule discovery can be constrained

by three types of biases corresponding to three components of the GDT.

The first type of bias is related to the possible generalizations in a

G D T . It is used to decide which concept description should be considered

at first. To obtain the best concept descriptions, all possible generalizations

should be considered, but not all of them need to be considered at the

same time. Possible generalizations (concept descriptions) are divided into

several levels of generalization according to the number of wild cards in a

generalization: the greater the number of wild cards, the higher the level.

For example, all possible generalizations shown in Table 14.1 are divided

into two levels of generalization:

Leve.l\: {*b0c0, *60ci, • • . , aj62*}

Level^: {**co,**ci, . . . , a i * * } .

Thus , it is clear that any generalization in a lower level is properly con

tained by one or more generalizations in an upper level. As the default,

our approach prefers more general concept descriptions in an upper level

to more specific ones in a lower level. However, if necessary, a meta control

can be used to alter the bias so tha t more specific descriptions are preferred

to more general ones.

The second type of bias is related to the probability values denoted in

Gij in a G D T . It is used to adjust the strength of the relationship between

an instance and a generalization. If no prior background knowledge as a

bias is available, as default, the probability of occurrence of all possible

instances are equiprobable, as shown in Table 14.1. However, a bias such

as background knowledge can be used during the creation of a GDT. The

distributions will be dynamically updated according to the real da ta in a

database, and they will be un-equiprobable.

The third type of bias is related to the possible instances in a GDT.

In our approach, the strength of the relationship between every possible

instance and every possible generalization depends to a certain extent on

Page 320: 45052361 a New Paradigm of Knowledge Engineering by Soft Computing

Probabilistic Rough Induction 303

how the possible instances are defined and selected. Furthermore, background knowledge can be used as a bias to constrain

the possible instances and the prior distributions. For example, if the back

ground knowledge

"when the air temperature is very high, it is not possible

there exists some frost at ground level"

is used, to learn rules from an earthquake database in which there are

at tr ibutes such as the air temperature , frost at ground level, two centimeters

below ground level, the atmospheric pressure etc., then we do not consider

the possible instances tha t contradict this background knowledge. Thus,

more refined results may be get by using background knowledge in the

discovery process.

14.2 .3 Adjusting the Prior Distribution by Background Knowledge

One of the main features of the G D T methodology is that biases can be

selected flexibly for search control, and background knowledge can be used

as a bias to control the creation of a G D T and the rule discovery process.

This section explains how to use background knowledge as a bias to adjust

the prior distribution.

As stated in Section 14.2, when no prior background knowledge as a bias

is available, as default, the occurrence of all possible instances is equiprob

able, and the prior distribution is shown in Eq. (1). However, the prior

distribution can be adjusted by background knowledge, and will be un-

equiprobable after the adjustment. Generally speaking, the background

knowledge can be given in

aixji ^ ai2J2 ' B>

where a8 l j l is the jith value of a t t r ibute i\, and dj2j2 is the jyth value of

a t t r ibute i2. ai1j1 is called the premise of the background knowledge, a,-2;-2

is called the conclusion of the background knowledge, and Q is called the

strength of the background knowledge. It means that Q is the probability

of occurrence of a,2j2 when a , U l occurs. Q = 0 means that "ai1jl and ai2j2

never occur together"; Q — 1 means tha t " a ^ j , and ai2j2 always occur in the

same time "; while Q = l / n ; 2 means that the occurrence of a;2 j2 is the same

Page 321: 45052361 a New Paradigm of Knowledge Engineering by Soft Computing

304 J. Dong, N. Zhong & S. Ohsuga

as the case of without background knowledge, where rn2 is the number of values of a t t r ibute i2. For each instance PI (or each generalization PG), let PI[i] (or PG[i]) denote the entry of PI (or PG) corresponding to a t t r ibute i. For each generalization PG such that PG[ i i ] = a J U l and PG[i2] = *, the prior distribution between the PG and related instances will be adjusted. The probability of occurrence of a t t r ibute value a , 2 j 2 is changed from l/r i i2

to Q by background knowledge, so tha t , for each of the other values in at t r ibute i2, the probability of its occurrence is changed from l /n ; 2 to (1 — Q)/(rii2 — 1). Let the adjusted prior distribution be denoted by pbk-

The prior distribution adjusted by the background knowledge " a , U l =>

ai2J2, Q" is

Pbk(PI\PG)

• p(PI\PG) xQx ni2 if PGfa] = ailh , PG[i2] = *,

PI[i2] = ai2J2

= < p(PI\PG) x - J — ^ . x nh if PG{h] = ailh, PG[i2] = *, (4)

3 j ( ! < J < ni3J ^ h) PI[h] - ai2J

, p(PI\PG) x 1 otherwise

where coefficients of p(PI\PG), Q x 7i;2, „~Zi x n i 2 ' a n c^ 1 a r e called

adjusting factor (AF for short) with respect to the background knowledge

"a,i1j1 => o-i2j2, Q"• They explicitly represent the influence of a piece of

background knowledge to the prior distribution. Hence, the adjusted prior

distribution can be denoted by

Pbk(PI\PG) = P(PI\PG) x AF{PI\PG), (5)

and the AF is

AF(PI\PG)

Q x ni2 if PG[n] = ailjl, PG[i2] = *,

PI[i2] = ai2J3

- ! — ? - x ni2 if PG[H] = ahjl, PG[i2] = *, (6)

3j ' ( l < j < ni2,j ^ j2) PI[i2] = ai2J

1 otherwise.

Page 322: 45052361 a New Paradigm of Knowledge Engineering by Soft Computing

Probabilistic Rough Induction 305

So far, we have explained how the prior distribution is influenced by only one piece of background knowledge. We then consider the case that there are several pieces of background knowledge such that for each i (1 < i < m)

and each j (1 < j < n,), there is at most only one piece of background knowledge with its conclusion.

Let S be the set of all pieces of background knowledge to be considered. For each generalization PG, let

B[S,PG] = { i e { l , . . . , m } |

3 i i ( l < ti < m) 3 i j ( l < h < nh) 3j(l <j< m)

[(there is a piece of background knowledge in S with a,ljl as its premise and with atj as its conclusion) & PG[i{\ = a,1ji

t PG[{\ = * ] } , and for each i G B[S, PG], let

J[S, PG, i] = {j G { 1 , . . . fii}| 3 i i ( l < *i < m) 3 j i ( l < h < nh)

[(there is a piece of background knowledge in S with o,i1j1 as

its premise and with a8j as its conclusion) &i PG[ii] = a{1j1

& PG\i] -*]}.

Then, we must use the following adjusting factors AFg with respect to all

pieces of background knowledge.

AFS{PI\PG) = Y[AFi{PI\PG) (7) i = i

w here

AFi{PI\PG) (8)

if i£B[S,PG}, j e J[S,PG,i],

PI[i] = dij *%i] ^ 7)'i

jej[5,PG,i] if ie B[S,PG], x m m - \J[S, PG, i}\ l Vj(j G J[S, PG, i))[PI[i\ ? a,j]

1 otherwise

where for each i; (1 < iI < m) and each j (1 < j < ni), Qij denotes the

strength of the background knowledge (in S) with a^- as its conclusion.

Although Q can be any value from 0 to 1 in principle, giving an exact

Page 323: 45052361 a New Paradigm of Knowledge Engineering by Soft Computing

306 J. Dong, N. Zhong & S. Ohsuga

value of Q is difficult, and the more the background knowledge, the more difficult to calculate the prior distribution. Hence, in practice, if " a ; U l => cii2j2" with higher possibility, we treat tha t Q is 1, tha t is, aj2j2 occurs but other values of a t t r ibute «2 do not, when a,-^-, occurs. In contrast, if " a , U l => a ; 2 j 2 " with lower possibility, we treat tha t Q is 0, tha t is, a,-2j2

dose not occur but other values of a t t r ibute i?. occur in equiprobable, when 0,-JJJ occurs. Furthermore, if several pieces of background knowledge with higher possibility, and the conclusions of them belong to the same at t r ibute i-2, all of the a t t r ibute values (conclusions) are treated as occurrence in equiprobable, but other values in a t t r ibute i? are treated as no occurrence.

14.2.4 Rule Strength and Unseen Instances

In our approach, learned rules are typically expressed in

X - • Y with S,

tha t is, "if X then Y with strength 5 " , where X denotes the conjunction

of the conditions tha t a concept must satisfy, Y denotes a concept tha t the

rule describes, and S is a "measure of strength" with which the rule holds.

Concretely, X is a conjunction of the a t t r ibute values of some condition

at t r ibutes which corresponds to a generalization, and Y is a conjunction of

the a t t r ibute values of some decision at tr ibutes. Below, we often identify a

generalization with the conjunction of the at t r ibute values of at t r ibutes in

the generalization. X, Y, and S are called the condition, conclusion and

strength, respectively, of the rule X —* Y with S.

The strength 5 of a rule X —• Y is defined as follows:

S(X -> Y) = s(X) x (1 - r(X -> Y)) , (9)

where s(X) is the strength of the generalization X (i.e., the condition of

the rule), which is defined below, and r(X —• Y) is the rate of noise (of rule

X —* Y) which is defined by Eq. (12) below. In other words, the strength

of a rule represents the incompleteness and uncertainty of the rule, which

is influenced by both unseen instances and noises.

The strength of the generalization X, s(X), is defined as the sum of

prior distributions between X and the observed instances satisfying X. It

represents how many of possible instances satisfying the generalization X

are observed in the database. The initial value of s(X) is 0. The value

Page 324: 45052361 a New Paradigm of Knowledge Engineering by Soft Computing

Probabilistic Rough Induction 307

will be dynamically updated according to the real da ta in a database. If

all the possible instances satisfying generalization A' occur in the database,

the strength will be the maximal value, 1.

Letting X — PG, the strength of the generalization PG is given by

Eq. (10) when the prior distribution is equiprobable, or by Eq. (11) when

the prior distribution is un-equiprobable (i.e., when background knowledge

is used).

s(PG) = J2p(PI,\PG) = Nin.-rei(PG) x -J— (10) , NPG

s{PG) = £p i t(PI , |PG) = (EAFS{PI,\PG)) X J - (11)

where PI\ is the observed instances in a database, N{ns-re\(PG) is the

number of the observed instances satisfying the generalization PG.

The strength of the generalization X represents explicitly the prediction

for unseen instances. It merits our at tention tha t Eq. (10) and Eq. (11) are

not suitable for duplicate instances. Hence the duplicate instances should

be handled before using the equations.

We argue that the prediction for unseen instances is an important func

tion for discovering rules in real-world databases. In most cases, the set of

instances collected in a database represents a part of all possible instances.

This is reasonable, because we expect to learn rules without first collecting

every possible instance (like physicians who learn how to diagnose diseases

without first having seen every possible pat ient) . Table 14.3 shows an ex

ample on Flu. We can see tha t only a part of symptoms of a disease related

to Headache, Temperature, Muscle-pain can be found in the database, but

several possible symptoms (unseen instances) such as

Headache(yes) A Temperature (normal) A Muscle-pain(no);

Headache(yes) A Temperature (high) A Muscle-pain(no);

Headache(yes) A Temperature (very-high) A Muscle-pain(no);

are not collected yet. This means tha t the learning task is ill-posed if

the possible symptoms are not considered in the learning process. For

previous inductive approaches, without some other sources of constraint,

Page 325: 45052361 a New Paradigm of Knowledge Engineering by Soft Computing

308 J. Dong, N. Zhong & S. Ohsuga

Table 14.3 A sample database (decision table) on Flu

V \ U\

U2

U3

U 4

" 5

« 6

Headache yes

yes

yes

n o n o

n o

Temperature normal

high very-high

normal high

very-high

Muscle-pain yes

yes

yes yes

n o

yes

F l u no

yes

yes n o n o

yes

there is no way to know the instances of describing a concept that has

never before been observed. Our approach based on the Generalization

Distribution Table provides a possibility for predicting unseen instances and

for explicitly representing the strength of a rule including the prediction.

In other words, our approach tries to find the descriptions of concepts not

only by the instances observed during learning but also by unseen instances.

Tha t is, our approach is based on open world assumption in this sense.

On the other hand, the rate of noises, r, is given by

r(X -+ Y) Njns-re\{X) — Nins-class(X, Y )

Nins-rel{X) (12)

where Nins-rei(X) is the number of the observed instances satisfying the

generalization X, and Nins-ciass(X,Y) is the number of the instances be

longing to the class Y and satisfying the generalization X. r(X —+ Y)

shows the quality of classification, that is, how many instances satisfying

generalization A" cannot be classified into class Y. Furthermore, a user can

specify an allowed noise rate as the threshold value. Thus , the rules with

noise rates larger than the threshold value will be deleted.

14.3 C o m b i n i n g t h e G D T w i t h R o u g h S e t s

This section describes an implementation of the G D T methodology by com

bining the G D T with the rough set methodology ( G D T - R S for short) . By

using GDT-RS, we can first find the rules with larger strengths from possi

ble rules, and then find minimal relative reducts from the set of rules with

larger strengths [3].

Page 326: 45052361 a New Paradigm of Knowledge Engineering by Soft Computing

Probabilistic Rough Induction 309

14 .3 .1 The Rough Set Methodology

In the rough set methodology for rule discovery, a database is regarded as

a decision table, which is denoted T = (U, A, {Va}a€A, f, C, D), where U is

a finite set of instances (or objects), called the universe, A is a finite set

of at t r ibutes, each Va is the set of values of a t t r ibute a, f is a mapping

from U X A to V(= \Ja£A Va), C and D are two subsets of A, called the

sets of condition a t t r ibutes and decision a t t r ibutes , respectively, such tha t

CUD — A and CnD = 0. Equivalence classes in U/C and U/D are called

condition classes and decision classes, respectively. [15; 20; 10].

The process of rule discovery is tha t of simplifying a decision table and

generating minimal decision algorithm. In general, an approach for decision

table simplification consists of the following steps:

(1) Elimination of duplicate condition at t r ibutes . It is equivalent to

elimination of some column from the decision table.

(2) Elimination of duplicate rows.

(3) Elimination of superfluous values of at t r ibutes .

A representative approach for the computation of reducts of condition at tr ibutes is to represent knowledge in the form of a discernibility matrix [20; 15]. The basic idea can be briefly presented as follows:

Let T = (U,A,{Va}aeA>f>C,D), be a decision table with

U = {ui, u2, • • • ,un}. By a discernibility matrix of T, denoted M(T), we

will mean n x n matr ix defined as

_ f {ceC: c(Ui) / C(UJ)} if 3d e D[d(ut) ± d(Uj)] mij ~{ A if Vd G D[d(ui) = d{Uj)]

for i, j = 1, 2, . . . , n.

Thus entry mij is the set of all the condition at t r ibutes tha t classify objects

Ui and Uj into different decision classes in U/D. Since M(T) is symmetric

and ma = 0, M(T) are represented only by elements in the lower triangle,

tha t is, the m,j with 1 < j < i < n.

The discernibility function fa for T is defined as follows:

For any u; G U,

Mui) = /\{\Jmij : j # i I j € { l I 2 , . . . , n } } j

where

Page 327: 45052361 a New Paradigm of Knowledge Engineering by Soft Computing

310 J. Dong, N. Zhong & S. Ohsuga

(i) V m-ij is the disjunction of all variables a such that c 6 rriij, if rriij ^ 0

(ii) V rriij = -L(false), if rriij = 0

(iii) V rriij — T(tfrue), if my = A.

Each logical product in the minimal disjunctive normal from (DNF) of

/ T ( M « ) is called a reduct of instance it,-.

Generating minimal decision algorithm is to eliminate the superfluous

decision rules associated with the same decision class. It is obvious that

some decision rules can be dropped without disturbing the decision-making

process, since some other rules can take over the job of the eliminated rules.

14 .3 .2 Simplifying a Decision Table by GDT-RS

By using the GDT, it is obvious that one instance can be expressed by

several possible generalizations, and several instances can be generalized

into one possible generalization. Simplifying a decision table by GDT-RS is

to find a minimal set of generalizations, which contains all of the instances

in a decision table. The method of computing the reducts of condition

at tr ibutes in GDT-RS, in principle, is equivalent to the discernibility matr ix

method [20; 15]. However, we won't find dispensable at tr ibutes. This is

because

• Finding dispensable at t r ibutes does not benefit the best solution

acquiring. The larger the number of dispensable at t r ibutes, the

more difficult to acquire the best solution.

• Some values of a dispensable a t t r ibute may be indispensable for

some values of a decision at t r ibute .

For the database with noises, the generalization tha t contains instances

in different decision classes should be checked by Eq. (12). If a general

ization X contains instances belonging to a decision class corresponding to

Y more than those belonging to other decision classes, and the noise rate

(of X —*• Y) is smaller than a threshold value, then the generalization X is

regarded as a consistent generalization of the decision class corresponding

to Y, and "X —• Y with S(X —• Y)" becomes a candidate rule. Otherwise,

the generalization X is contradictory to all the decision classes, and so no

rule with X as its premise is generated.

Page 328: 45052361 a New Paradigm of Knowledge Engineering by Soft Computing

Probabilistic Rough Induction 311

It is clear that if a generalization is contradictory, the related general

izations in levels upper than this generalization are also contradictory. For

example, as shown in the sample database in Table 14.2, instance aob\C\

can be generalized into {ao}, {&i}, {c i} , {ao&i}, {aoCi}, or {&1C1}. Gen

eralizations {61} and {CIQCI} are contradictory because they contain the

instances belonging to different decision classes. Furthermore, generaliza

tions {ao} and {c\} are also contradictory because {aoCi} is contradictory.

For instance a^bxci, only two generalizations {0061} and {61 c\] can be used.

14 .3 .3 Rule Selection

There are several possible ways for rule selection. For example,

• Selecting the rules tha t contain as many instances as possible;

• Selecting the rules in the levels as high as possible according to the

first type of biases stated above;

• Selecting the rules with larger s trengths.

Here we would like to describe a method of rule selection for our purpose

as follows,

• Since our purpose is to simplify the decision table, the rules that

contain less instances will be deleted if a rule that contains more

instances exists.

• Since we prefer simpler results of generalization (i.e., more general

rules), we first consider the rules corresponding to an upper level

of generalization.

• If two generalizations in the same level have different strengths, the

one with larger strength will be selected first.

14 .3 .4 Algorithms

We here describe two algorithms (called "Optimal Set of Rules" and "Sub-

Optimal Solution") for implementing the GDT-RS methodology.

14.3 .5 Algorithm 1 (Optimal Set of Rules)

Let Tnoise be the expected threshold value.

Page 329: 45052361 a New Paradigm of Knowledge Engineering by Soft Computing

312 J. Dong, N. Zhong & S. Ohsuga

Step 1. Create one or more GDTs .

In fact, this step can be omitted because the prior distribution of a generalization can be calculated by Eq. (1) and Eq. (2), if any prior background knowledge is not used for this calculation.

Step 2. Regard the instances with the same condition a t t r ibute values (such as u i , U3, and u$ in the sample database of Table 14.2) as one instance, called a compound instance (such as u1 in the following table), so that the probabilities of generalizations can be calculated correctly.

^ T - ^ _ _ u1,(ui,u3,us)

u2

Ui

« 6

m

a

ao ao ai

ao ai

b

6o fci 6i b2

bi

c

c\ Cl

co C\

CI

d

y,y.n y n n y

Step 3. For each compound instance u' (such as the instance «j in the

above table), let DV(u') be the set of the decision classes to which

the instances in u' belong. Further, for each v G DV(u'), let

N(u',v) be the number of the instances in u' belonging to the

decision class v. Calculate the rate r„ as follows:

r„(u') = 1 N(u',v)

£ N(u',v')' v'eDV(u')

If there exist a i ) £ DV(v!) such tha t rv(u') — min{rvi(u')\v' G

DV(u')} and rv(u') < T n o l j e , then we let the compound instance

u' belong to the decision class v. If there does not exist any v £

DV(u') such that rv(u') < Tnoise, we treat the compound instance

u' as a contradictory instance, and set the decision class of u' to

± ( / a / s e ) . For example,

^lT^-4^ Ui(«l , W3,"5)

a b c ao b0 a

d 1

Let U be the set of all the instances except the contradictory ones.

Step 4- Select one instance u from U . Using the idea of discernibility

matrix, create a discernibility vector ( that is, the row or the column

Page 330: 45052361 a New Paradigm of Knowledge Engineering by Soft Computing

Probabilistic Rough Induction 313

corresponding to u in the discernibility matrix) for w. For example, the discernibility vector for instance u2 : aobici is as follows:

Wc "2(2/)

«i(-L) b

«2(y) A

«4(n)

a, c «6(n)

6 "7(2/)

A

{a 06i}

{6ici}

Step 5. Compute all reducts for the instance u by using the discernibil

ity function. For example, for instance «2:ao&iCi, two reducts are

acquired: a A b and b Ac.

/ T (w2) = (6) A T A (a V c) A (6) A T = (a A 6) V (6 A c).

Step 6. Acquire the rules from the reducts for the instance u, and revise

the strengths of each rule by Eq. (9). For example, for instance

u2:ao&iCi> the following rules are acquired.

• y with S — 1 x - = 0.5, and

H c i ) —• y with 5 = 2 x - = 1.

Step 7. Select bet ter rules from the rules (for u) acquired in Step 6, by

using the method stated in Section 14.3.3. For example, for the

instance t^ao&ic i , the rule "{&ici} —• y" is selected because it

contains the instances more than the rule "{aob\} —* y".

Step 8. U — U — u. If U ^ 0 , then go back to Step 4. Otherwise go to

Step 9.

Step 9. Finish if the number of rules selected in Step 7 for each instance

is 1. Otherwise, by using the method stated in Section 14.3.2, find

a minimal set of rules, which contains all of the instances in the

decision table.

The following table gives the result learned from the sample data

base shown in Table 14.2.

u « 2 , « 7

«4

ue

rules h A a -*• y

co —• n

62 —• n

strengths 1

0.167 0.25

The time complexity of Algorithm 1 is 0(mn2Nrmax), where n is the

number of instances in a database, m is the number of at tr ibutes, Nrmax

is the maximal number of reducts for instances.

We can see tha t the algorithm is not suitable for the database with a lot

of a t t r ibutes . A possible method to solve the issue is to find a reduct (sub

set) of condition at tr ibutes as a preprocessing before using the algorithm

Page 331: 45052361 a New Paradigm of Knowledge Engineering by Soft Computing

314 J. Dong, N. Zhong & S. Ohsuga

[4]. In the remainder of this section, we would like to discuss another algorithm called Sub Optimal Solution tha t is more suitable for the database with a lot of at tr ibutes.

A l g o r i t h m 2 ( S u b O p t i m a l So lu t ion)

The Sub-Optimal Solution algorithm is a greedy one. It can be described

briefly as follows.

Let Tnoise be the expected threshold value, U = {u\, w2, . . . , « „ } be the set of instances,

C — \a,\,02, • • . , a;} be the set of condition at t r ibutes ,

D = {a;+i, a j + 2 , . . . , am] be the set of decision at tr ibutes. Further, we will use R to denote a set of condition a t t r ibute values, and

RS be denote a set of rules. Initially RS = 0.

Step 1 - Step 3. Same as A l g o r i t h m 1.

Let U be the set of all the non-contradictory instances as in Algo

r i thm 1, and let F = u'/D.

Step 4- Select one decision class c from F. Let

T+ = the set of all instances in c,

T_ ~U -T+.

Tsave+ = 0, T,ave_ = 0, and R = 0,

Step 5. Let S[T+,R] = {v\ v G {/(«,-,a^) : «,• 6 T+ and 1 < j < 1} and

v£R}. For each at t r ibute value v G •S'[7+, R], let

i?'(i>) = i?U{?;},

NRi,v)(+) be the number of instances in T+ with all the a t t r ibute

values in R(v), and

NR'(V){—) be the number of instances in T_ with all the a t t r ibute

values in R (v).

Further, let

Max[T+, R] = { « £ S[T+,R]\ NR,(v){+) = max J ^ ( , 0 ( + ) } } .

Choose an at t r ibute value v G Maa;[T+, R] such that

# * ' ( „ ) ( - ) = / c min H 1 { ^ ' ( . ' ) ( - ) > >

Page 332: 45052361 a New Paradigm of Knowledge Engineering by Soft Computing

Probabilistic Rough Induction 315

and compute rv, by the following equation:

r„ — XR'{v)(+) + NR,(v)(-)

rv denotes the noise rate of rule "VUig.R'(„)*>,- ~* Conic)", where

Con(c) denotes the conjunction of the decision a t t r ibute values

corresponding to the decision class c.

Step 6. R - RU{v}.

Step 7. Move the instances tha t are not contained by R from T+ and T_

to T3ave+ and Tsave_, respectively. Tha t is,

Let U (v) = {u e U | v is not a condition at t r ibute value of « } .

Then

T+=T+-U'(v), T_=T-- U'(v),

J-save+ — -*-save+ U \ ^ / i

-L save- ~- -Lsave- U U \V).

Step 8. If rv > Tnoise, then go back to Step 5.

Step 9. Insert the rule "VVt£R(ti)Vi —• Con(c)" into RS.

And Set T+ = Tsave+, T_ = T s a t e _ , Tsave+ = 0, Tsave+ = 0 and

R= 0. 5/ep JO. If T+ is not empty, go back to Step 5. Step 11. F = F - c. If F ^ 0, then go to Step 4. Otherwise, output RS.

The time complexity of Algorithm 2 is 0(m2n2). Here we emphasize

tha t not every greedy approach succeeds in producing the best result over

all. Just as in life, a greedy strategy may produce a good result for a while,

yet the overall result may be poor. However, it is a bet ter way for solving

very-large, complex problems.

14 .4 E x p e r i m e n t s

Some of databases such as mushroom, meningitis, postoperative patient,

earthquake, cancer have been tested for our approach. We would like to

use a mushroom database and a meningitis database as examples.

Page 333: 45052361 a New Paradigm of Knowledge Engineering by Soft Computing

316 J. Dong, N. Zhong & S. Ohsuga

14 .4 .1 Experiment 1

The mushroom database includes descriptions of hypothetical samples cor

responding to 23 species of gilled mushrooms in the Agaricus and Lepiota

Family. Each species is identified as definitely edible, definitely poisonous,

or unknown edibility and not recommended. This latter class was combined

with the poisonous one. The guide clearly states that there is no simple rule

for determining the edibility of a mushroom; no rule like "leaflets three, let

it be" for Poisonous Oak and Ivy.

In the mushroom database, there are 8124 instances, and not any con

tradictory instance and duplicate instance. To acquire the rules that can

discern edible and poisonous mushrooms exactly, we set the threshold value

to 0.

Tables 14.4 and 14.5 give the results of poisonous mushrooms acquired

by GDT-RS and C4.5 respectively, where the column Used Instances de

notes the number of instances contained by a rule, the Sire, column denotes

the strengths of the rules.

The rule selection is based on the method described in Section 14.3.3.

Tha t is, we first select the rules, by which the number of instances contained

is maximal, and then select the ones in higher level, at last select the rules

with larger strength. If several rules in the same level contain the same

instances and have the same strength (such as the rules in the last two

rows in Table 14.4), all of them are selected.

14.4.1.1 Comparing with C^-5

The result of our method is not the same as C4.5 [17]. By comparing the

results of them, we can see that some rules discovered by C4.5 are not

in our result, such as: "odor(m) —• poisonous". The reason is that all of

the instances containing odor(m) are contained by other rules with higher

strength in GDT-RS. Moreover, the rules discovered by using GDT-RS

usually contain more instances than those covered by rules discovered by

using C4.5. For example, the rule

gill-spacing(c) A stalk-surface-above-ring(k) —• poisonous

tha t contains 2228 instances is acquired by GDT-RS. But using C4.5, no

rule can contain instances over 2160.

Page 334: 45052361 a New Paradigm of Knowledge Engineering by Soft Computing

Probabilistic Rough Induction 317

Table 14.4 The result of GDT-RS for poisonous Conditions

gill-spacing(c) A stalk-surface-above-ring(k) odor(f)

stalk-surface-below-ring(k) A ring-number(o) gill-spacing(c) A ring-type(e) A population(v) cap-surface(s) A gill-spacing(c) A veil-color(w)

A ring-number(o) A population(v) cap-surface(s) A gilUspacing(c) A gill-size(n)

cap-surface(s) A bruises(f) A gill-size(n) staik-color-below-ring(w) A ring-number(o) A spore-print-color(w)

cap-color(g) A bruises(f) A stalk-root(b) cap-color(y) A bruises(f)

stalk-root(b) A habitat(g) cap-surface(y) A gill-spacing(c) A gill-size(n)

A stalk-surface-below-ring(s) gill-color(g) A stalk-root(b)

cap-color(w) A gill-spacing(c) A stalk-root(b) spore-print-color(r)

bruises(f) A gill-spacing(c) A stalk-root(b) A ring-number(o) or

gill-spacing(c) A stalk-shape(e) A stalk-root(b) A ring-number(o) bruises(f) A stalk-root(b) A ring-number(o) A habitat(d)

or stalk-shape(e) A stalk-root(b) A ring-number(o) A habitat(d)

Used Instances

2228 2160 2160 1760

1096 1040 960 872 712 672 612

560 504 152 72

1392

624

Stre.

8 /E 9 /E

12/E 60/E

576/E 16/E 16/E

243/E 100/E 20/E 35/E

64/E 60/E

100/E 9 /E

60/E

210/E

n ni, rii is the number of values of the condition attribute i.

Table 14.5 The result of C4.5 for poisonous

Conditions

odor(f) gill-spacing(c) A ring-number(o) A spore-print-color(w)

odor(p) odor(c)

spore-print-color(r) odor(m)

Error

0.1% 0.1% 0.5% 0.7% 1.9% 3.8%

Used Instances

2160 1184 256

192

72

36

14.4.2 Experiment 2

This section describes a result of an experiment in which background knowl

edge is used in the learning process to discover rules from a meningitis

database [2]. The database was collected at the Medical Research Insti tute,

Tokyo Medical and Dental University. It has 140 instances, each of which

is described by 38 at tr ibutes that can be categorized into present history,

Page 335: 45052361 a New Paradigm of Knowledge Engineering by Soft Computing

318 J. Dong, N. Zhong & S. Ohsuga

physical examination, laboratory examination, diagnosis, therapy, clinical

course, final s tatus, risk factor etc. The task is to find important factors

for diagnosis (bacteria and virus, or their more detail classifications) and

predicting prognosis. A more detailed explanation on this database could

be found at http://www.kdel.info.eng.osaka-cu.ac.jp/SIGKBS.

For each of the decision at t r ibutes: DIAG2, DIAG, CULTURE,

C C O U R S E , and COURSE(Grouped) , we run GDT-RS on it twice: us

ing background knowledge and without using background knowledge, to

acquire the rules respectively. For discretizing the continuous at t r ibute , an

automatic discretization [19] is used.

14.4.2.1 Background Knowledge Given By a Medical Doctor

The experience of a medical doctor, shuch as:

If the brain wave (EEG-WAVE) is normal, the focus of

brain wave (EEG.FOCUS) is never abnormal;

If the number of white blood cells (WBCs) is high, the

inflammation protein (CRP) is also high

can be used as background knowledge.

In the following list, the background knowledge given by a medical doc

tor is described:

• Never occurring together.

EEG-WAVE (normal) <& EEG.FOCUS(+) CSF-CELL(low) *> CelLPoly(high)

CSF.CELL(low) <& CelLMono(high) • Occurring with lower possibility.

WBC(low) WBC(low) WBC(low) WBC(low) WBC(low) BT(low) BT(low) BT(low)

=>

=^ => => => => => =>

CRP(high) ESR(high) CSF.CELL(high) CelLPoly(high) CelLMono(high) STIFF(high) LASEGUE(high) KERNIG(high)

• Occurring with higher possibility.

Page 336: 45052361 a New Paradigm of Knowledge Engineering by Soft Computing

Probabilistic Rough Induction 319

WBC(high)

WBC(high)

WBC(high)

WBC(high)

WBC(high)

BT(high)

BT(high)

BT(high)

BT(high)

BT(high)

=> => =>

EEG.FOCUS(+)

EEG.WAVE(+)

CRP(high)

CRP(high)

CRP(high)

ESR(high)

CSF

=> => => =>

=> =>

=>

=> => => =>

'.CELL(high)

CelLPoly(high)

CclLMono(high)

STIFF(high)

LASEGUE(high)

KERNIG(high)

CRP(high)

ESR(high)

FOCAL (+)

EEG.FOCUS(+)

CSF-GLV(low)

CSF-PRO(low)

Here "high" in brackets denoted in the background knowledge means that

the value is greater than the maximal value in the normal values range; and

"low" means that the value is less than the minimal value in the normal

values range.

14.4.2.2 Comparing the Results

The effects of usage of the background knowledge in GDT-RS are as follows:

First, some candidates of rules, which are deleted due to lower strengths

during rule discovery without background knowledge, are selected. For

example, rulei is deleted when no background knowledge is used, but after

using the background knowledge stated above, it is reserved because its

strength increased 4 times.

rulei • ONSET(acute) A ESR(< 5) A CSF.CELL{> 10) A CULTURE(-)

-* VIRUS(E),

Without using background knowledge, the strength S of rulei is 30*(384/E).

In the background knowledge given above, there are two clauses related to

this rule:

• Never occurring together

CSF.CELL(low) o CelLPoly(high)

CSF.CELL(low) <£> CelLMono(high).

Page 337: 45052361 a New Paradigm of Knowledge Engineering by Soft Computing

320 J. Dong, N. Zhong & S. Ohsuga

By using automatic discretization to continuous at t r ibutes Cell-Poly and Cell-Mono, the at t r ibute values in each of at t r ibutes are divided into two groups: high and low. Since the high groups of Cell-Poly and Cell-Mono

do never occur when CSF-CELL(low) occurs, the product of the numbers of a t t r ibute values is decreased to E/4, and the strength S is increased to S = 30 * (384/JE 1 ) * 4.

Second, using background knowledge also causes some rules to be replaced by others. For example, the rule

rule_2: DIAG(VIRUS(E)) ∧ LOC[4, 7) → EEG_abnormal, S = 30/E

can be discovered without background knowledge, but after using the background knowledge stated above it is replaced by

rule_2*: EEG_FOCUS(+) ∧ LOC[4, 7) → EEG_abnormal, S = (10/E) × 4.

The reason is that both rules cover the same instances, but the strength of rule_2* is larger than that of rule_2.

The result has been evaluated by a medical doctor. In his opinion, both rule_1 and rule_2* are reasonable, and rule_2* is much better than rule_2.

Although similar results can be obtained from the meningitis database by GDT-RS and C4.5 when such background knowledge is not used [2], it is difficult to use such background knowledge in C4.5 [17].

14.5 Conclusion

In this paper, we presented a soft approach called GDT-RS for rule discovery in databases, which is based on the combination of the Generalization Distribution Table (GDT) and rough set theory, and we discussed the algorithms for its implementation. By using GDT-RS, we can first find the rules with larger strengths among the possible rules, and then find minimal relative reducts from the set of rules with larger strengths. Thus, a minimal set of rules with larger strengths can be acquired from databases with noisy, incomplete data. We showed that our approach is very soft, that is, (1) biases can be flexibly selected for search control, and background knowledge can be used as a bias to control the creation of a GDT and the discovery process; (2) unseen instances are considered in the discovery process, and the uncertainty of a rule, including its ability to predict possible instances, can be explicitly represented in the strength of the rule. Databases such as mushroom, meningitis, postoperative patient, earthquake, weather, and cancer have been used to test our approach.

The ultimate aim of the research project is to create an agent-oriented and knowledge-oriented hybrid intelligent model and system for knowledge discovery and data mining in an evolutionary, parallel-distributed cooperative mode. In this model and system, the typical methods of symbolic reasoning such as deduction, induction, and abduction, as well as methods based on soft computing techniques such as rough sets, fuzzy sets, and granular computing, can be used cooperatively by taking the GDT and the transition matrix of a stochastic process as mediums. That is, the work that we are doing takes but one step toward this model and system.


References

[1] J. Cendrowska, "PRISM: An Algorithm for Inducing Modular Rules", Inter. J. of Man-Machine Studies, Vol.27 (1987) 349-370.

[2] J.Z. Dong, N. Zhong, and S. Ohsuga, "Rule Discovery from the Meningitis Database by GDT-RS" (Special Panel Discussion Session on Knowledge Discovery from a Meningitis Database), Proc. 12th Annual Conference of JSAI, Tokyo, June 17 (1998) 83-84.

[3] J.Z. Dong, N. Zhong, and S. Ohsuga, "Probabilistic Rough Induction", T. Yamakawa and G. Matsumoto (eds.) Methodologies for the Conception, Design and Application of Soft Computing, Proc. 5th International Conference on Soft Computing and Information/Intelligent Systems (IIZUKA'98), World Scientific (1998) 943-946.

[4] J.Z. Dong, N. Zhong, and S. Ohsuga, "Using Rough Sets with Heuristics for Feature Selection", N. Zhong, A. Skowron, and S. Ohsuga (eds.) New Directions in Rough Sets, Data Mining, and Granular-Soft Computing, Lecture Notes in AI 1711, Springer-Verlag (1999) 178-187.

[5] U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy (eds.) Advances in Knowledge Discovery and Data Mining, MIT Press (1996).

[6] D.F. Gordon and M. DesJardins, "Evaluation and Selection of Biases in Machine Learning", Machine Learning, Vol.20 (1995) 5-22.

[7] J. Han, Y. Cai, and N. Cercone, "Data-Driven Discovery of Quantitative Rules in Relational Databases", IEEE Trans. Knowl. Data Eng., Vol.5 (No.1) (1993) 29-40.

[8] H. Hirsh, "Generalizing Version Spaces", Machine Learning, Vol.17 (1994) 5-46.

[9] P. Langley, Elements of Machine Learning, Morgan Kaufmann Publishers (1996).

[10] T.Y. Lin and N. Cercone (eds.) Rough Sets and Data Mining: Analysis of Imprecise Data, Kluwer Academic Publishers (1997).

[11] T. Mollestad and A. Skowron, "A Rough Set Framework for Data Mining of Propositional Default Rules", Z.W. Ras and M. Michalewicz (eds.) Proc. Ninth International Symposium on Methodologies for Intelligent Systems (ISMIS-96), LNAI 1079, Springer (1996) 448-457.

[12] T.M. Mitchell, "Version Spaces: A Candidate Elimination Approach to Rule Learning", Proc. 5th Int. Joint Conf. on Artificial Intelligence (1977) 305-310.

[13] T.M. Mitchell, "Generalization as Search", Artificial Intelligence, Vol.18 (1982) 203-226.

[14] S. Ohsuga, "Symbol Processing by Non-Symbol Processor", Proc. 4th Pacific Rim International Conference on Artificial Intelligence (PRICAI'96) (1996) 193-205.

[15] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer Academic Publishers (1991).

[16] J.R. Quinlan, "Induction of Decision Trees", Machine Learning, Vol.1 (1986) 81-106.

[17] J.R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann (1993).

[18] J.W. Shavlik and T.G. Dietterich (eds.) Readings in Machine Learning, Morgan Kaufmann Publishers (1990).

[19] N. Shan, H.J. Hamilton, W. Ziarko, and N. Cercone, "Discretization of Continuous Valued Attributes in Classification Systems", Proc. 4th International Workshop on Rough Sets, Fuzzy Sets, and Machine Discovery (RSFD'96) (1996) 74-81.

[20] A. Skowron and C. Rauszer, "The Discernibility Matrices and Functions in Information Systems", R. Slowinski (ed.) Intelligent Decision Support (1992) 331-362.

[21] A. Skowron and L. Polkowski, "Synthesis of Decision Systems from Data Tables", T.Y. Lin and N. Cercone (eds.) Rough Sets and Data Mining: Analysis of Imprecise Data, Kluwer (1997) 259-299.

[22] P. Smyth and R.M. Goodman, "An Information Theoretic Approach to Rule Induction from Databases", IEEE Trans. Knowl. Data Eng., Vol.4 (No.4) (1992) 301-316.

[23] J. Teghem and J. Charlet, "Use of Rough Sets Method to Draw Premonitory Factors for Earthquakes by Emphasing Gas Geochemistry: The Case of a Low Seismic Activity Context in Belgium", R. Slowinski (ed.) Intelligent Decision Support: Handbook of Applications and Advances of Rough Set Theory, Kluwer (1992) 165-179.

[24] L.A. Zadeh, "Toward a Theory of Fuzzy Information Granulation and Its Centrality in Human Reasoning and Fuzzy Logic", Fuzzy Sets and Systems, Vol.90 (1997) 111-127.

[25] N. Zhong and S. Ohsuga, "Using Generalization Distribution Tables as a Hypotheses Search Space for Generalization", Proc. 4th International Workshop on Rough Sets, Fuzzy Sets, and Machine Discovery (RSFD-96) (1996) 396-403.

[26] N. Zhong, J.Z. Dong, and S. Ohsuga, "Discovering Rules in the Environment with Noise and Incompleteness", Proc. 10th International Florida AI Research Symposium (FLAIRS-97), Special Track on Uncertainty in AI (1997) 186-191.

Chapter 15

Data Mining via Linguistic Summaries of Databases: An Interactive Approach

Janusz Kacprzyk and Slawomir Zadrozny

Systems Research Institute, Polish Academy of Sciences
ul. Newelska 6, 01-447 Warsaw, Poland

University of Applied Information Technology and Management
ul. Newelska 6, 01-447 Warsaw, Poland

Abstract

We propose an interactive approach to data mining meant as the derivation of linguistic summaries of databases. For interactively formulating the linguistic summaries, and then for searching the database, we employ Kacprzyk and Zadrozny's [6-11] fuzzy querying add-on, FQUERY for Access. We present an implementation for the derivation of linguistic summaries of sales data at a computer retailer.

Keywords: data mining, knowledge discovery, linguistic summaries, fuzzy logic, database, querying, fuzzy querying, fuzzy linguistic quantifier, natural language, interface

15.1. Introduction

The recent growth of Information Technology (IT) has implied, among other things, the availability of a huge amount of data (from diverse, often remote databases).

Unfortunately, the availability of (raw) data does not by itself make the use of those data more productive. More important are the relevant, nontrivial dependencies encoded in those data. Unfortunately, they are usually hidden, and their discovery is not a trivial act and requires some intelligence.

Data mining is meant here as the (automatic) discovery of such relations, dependencies, etc. from data (stored in a database). In particular, here we mean it as the derivation of linguistic summaries.

We propose an approach to the derivation of linguistic summaries of large sets of data in the sense of Yager [18-20], i.e. derived as linguistically quantified propositions, e.g. "most of the employees are young and well paid", with a degree of validity (truth).

We are also interested in more conceptually sophisticated linguistic summaries, e.g. "most orders are difficult". We advocate the view that linguistic summaries of the type mentioned above may only practically be formulated interactively, i.e. via an interaction with the user. This interaction proceeds via Kacprzyk and Zadrozny's [6-11] FQUERY for Access, a fuzzy querying add-on to Access. New perspectives related to querying via the WWW will be mentioned too.

As an example, we show an implementation of the proposed data summarization system for the derivation of linguistic data summaries in a sales database of a computer retailer.

15.2. Linguistic Summaries Using Fuzzy Logic with Linguistic Quantifiers

In this paper we mean linguistic summaries in the sense of Yager [18-20]. Basically, we suppose that we have:

• V — a quality (attribute) of interest, with numeric and non-numeric (e.g. linguistic) values, e.g. salary in a database of workers,

• Y = {y_1, ..., y_n} — a set of objects (records) that manifest quality V, e.g. the set of workers; V(y_i) is the value of quality V for object y_i,

• D = {V(y_1), ..., V(y_n)} — a set of data (the database).

A summary of the data set consists of:

• a summarizer S (e.g. young),
• a quantity in agreement Q (e.g. most),
• a truth (validity) T, e.g. 0.7,

and may be exemplified by "T(most of employees are young) = 0.7".

More specifically, if we have a summary, say "most (Q) of the employees (y_i's) are young (S)", where "most" is a fuzzy linguistic quantifier with membership function \mu_Q(x), x \in [0,1], and "young" is a fuzzy quality S with \mu_S(y_i), y_i \in Y, then using the classic Zadeh's [23, 24] calculus of linguistically quantified propositions we obtain


T = \mu_Q\left( \frac{1}{n} \sum_{i=1}^{n} \mu_S(y_i) \right)    (1)

For more sophisticated summaries, e.g. "most (Q) employees (y_i's) are young (S) and well paid (F)", the reasoning is similar, and we obtain

T = \mu_Q\left( \sum_{i=1}^{n} \bigl(\mu_F(y_i) \wedge \mu_S(y_i)\bigr) \Big/ \sum_{i=1}^{n} \mu_F(y_i) \right)    (2)

where \sum_{i=1}^{n} \mu_F(y_i) \neq 0, and \wedge is a t-norm.

The above calculus may be replaced by, e.g., OWA operators (cf. Yager and Kacprzyk's [21] volume).
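As a small illustration, the following sketch evaluates (1) and (2) for a toy data set; the membership functions chosen for "most", "young" and "well paid" are illustrative assumptions, not definitions used by the authors.

```python
# A minimal sketch of Zadeh's calculus of linguistically quantified
# propositions, following eqs. (1) and (2).

def mu_most(r):
    """Assumed piecewise-linear quantifier 'most': 0 below 0.3, 1 above 0.8."""
    return max(0.0, min(1.0, (r - 0.3) / 0.5))

def mu_young(age):         # assumed fuzzy summarizer S = 'young'
    return max(0.0, min(1.0, (35.0 - age) / 10.0))

def mu_well_paid(salary):  # assumed fuzzy property F = 'well paid'
    return max(0.0, min(1.0, (salary - 2000.0) / 1000.0))

ages     = [24, 29, 41, 33, 52, 27]
salaries = [2600, 3100, 1900, 2800, 2200, 3300]

# Eq. (1): T("most employees are young")
T1 = mu_most(sum(mu_young(a) for a in ages) / len(ages))

# Eq. (2), with min as the t-norm; one reading: "most well-paid
# employees are young"
num = sum(min(mu_young(a), mu_well_paid(s)) for a, s in zip(ages, salaries))
den = sum(mu_well_paid(s) for s in salaries)  # must be non-zero, as required
T2 = mu_most(num / den)

print(T1, T2)
```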

15.3. A General Scheme for Fuzzy Logic Based Data Summarization

The simple approach given above has some serious limitations. Basically, in its source version, it is meant for one-attribute simplified summarizers (concepts), e.g. young. It can be extended to cover more sophisticated summaries involving some confluence of attribute values, e.g. "young and well paid", but this must be done "manually", and leads to combinatorial problems, as a huge number of summaries must be generated and validated to find the most proper one.

The validity criterion is not trivial either, and various measures of specificity, informativeness, etc. may be employed. This relevant issue will not be discussed here; we refer the reader to, say, Yager [18-20] or Kacprzyk and Yager [5].

For instance, following George and Srikanth [2], let a database constituting a description of workers, with attribute labels and linguistic quantifiers, be:

Q           a few   many     most         almost all
age         young   ca. 35   middle aged  old
salary      low     medium   high         very high
experience  low     medium   high         very high


Then, we should generate the particular combinations:

• almost none workers are: young, low salary, low experience
• a few workers are: young, low salary, low experience
• ...
• almost all workers are: old, very high salary, very high experience,

whose number may be huge in practice; then, we have to calculate the validity of each summary.

This is a considerable task, and George and Srikanth [2] use a genetic algorithm to find the most appropriate summary, with quite a sophisticated fitness function.

Clearly, when we try to linguistically summarize data, the most interesting summarizers (concepts) are non-trivial and human-consistent, e.g.:

• productive workers,
• difficult orders, etc.,

and it may easily be noticed that they involve a very complicated combination of attributes, e.g.: a hierarchy (not all attributes are of the same importance for the concept in question), attribute values that are ANDed and/or ORed, of which k out of n, most, etc. should be accounted for, and so on.

The basic idea of fuzzy logic based data summarization (data mining) adopted in this paper consists in using a linguistically quantified proposition, as originated by Yager [18, 19], and here we extend it to using a fuzzy querying package.

We start with the reinterpretation of (1) and (2) for data summarization. Thus, (1) is meant as formally expressing a statement

"Most records match query S" (3)

We assume a standard meaning of the query as a set of conditions on the values of fields from the database's tables, connected with AND and OR. We allow fuzzy terms in a query (see the next section), which implies a degree of matching from [0,1] rather than a yes/no matching. Effectively, a query S defines a fuzzy subset (fuzzy property) on the set of records, where their membership is determined by their degree of matching with the query.

Similarly, (2) may be interpreted as expressing a statement of the following type

"Most records meeting conditions F match query S" (4)


Thus, (4) says something about only a subset of records, i.e. those which pass through the filter F. That is, in database terminology, F corresponds to a filter, and (4) claims that most records passing through F match query S. Moreover, since F may be fuzzy, a record may pass through it to a degree from [0,1]. As this is more general than (3), we will take (4) as the basis.

That is, we seek, for a given database, propositions of the type (1), interpreted as (4), which are highly valid (true).

Basically, a proposition sought consists of three elements:

• a fuzzy filter F (optional),
• a query S, and
• a linguistic quantifier Q.

There are two limit cases, where we:

• do not assume anything about the form of any of these elements,
• assume fixed forms of the fuzzy filter and query, and only seek a linguistic quantifier Q.

Obviously, in the first case data summarization will be extremely time-consuming, though it may produce interesting results, not predictable by the user beforehand.

In the second case the user has to guess a good candidate formula for summarization, but the evaluation is fairly simple, requiring more or less the same resources as answering a (fuzzy) query. Thus, the second case refers to the summarization known as ad hoc queries, extended with an automatic determination of a linguistic quantifier.

Between these two extremes there are different types of summaries, with various assumptions on what is given and what is sought. A linguistic quantifier may be given or sought. In the case of a fuzzy filter F and a fuzzy query S, more possibilities exist. Basically, both F and S consist of simple conditions, each stating what value a field should take on, connected using logical connectives.

Here we assume that the table(s) of interest for summarization are fixed. We will use the notation shown in Table 15.1 to describe what is given or sought with respect to the fuzzy filter F and query S (below, A stands for F or S).


Table 15.1. Given and sought elements of summaries

A    all is given (or sought), i.e., fields, values and how simple conditions are linked using logical connectives
A_C  fields and linkage of simple conditions are given, but values are left out
A_V  denotes the sought left-out values referred to in the above notation
Ā    only a set of fields is given; the other elements are sought

Using this notation we may propose the rough classification of summaries shown in Table 15.2.

Table 15.2. A classification of summaries

Type  Given       Sought   Remarks
1     S           Q        simple summarizing through an ad-hoc query (value-focused)
1.1   S, F        Q        as above + the use of a fuzzy filter, i.e., the summary is related to a fuzzy subset of records
2     Q, S_C      S_V      in the simplest case corresponds to the search for typical or exceptional values (see the comments given below this table)
2.1   Q, S_C, F   S_V      as above, within a fuzzy subset of records
3     nothing     S, F, Q  fuzzy rules, extremely expensive computationally
3.1   S̄, F̄        S, F, Q  much more viable version of the above
3.2   S           F, Q     looking for causes of some pre-selected, interesting data features (machine-learning-like)

Thus, we distinguish 3 main types of data summarization. Type 1 is a simple extension of fuzzy querying as in FQUERY for Access. Basically, the user has to conceive a query which may be true for some population of records in the database. As a result of this type of summarization, he or she receives an estimate of the cardinality of this population in the form of a linguistic quantifier. The primary target of this type of summarization is certainly to propose a query such that a large proportion, e.g. most, of the records satisfy it. On the other hand, it may be interesting to learn that only a few records satisfy some meaningful query. Type 1.1 is a straightforward extension of Type 1 summaries by adding a fuzzy filter. Having a fuzzy querying engine dealing with fuzzy filters, the computational complexity is here the same as for Type 1.

Type 2 summaries require much more effort. The primary goal of this type of summary is to determine typical (exceptional) values of a field. Then, query S consists of only one simple condition referring to the field under consideration. The summarizer tries to find a value, possibly fuzzy, such that the query (built of the field, the equality relational operator, and that value) is true for Q records. Depending on the category of Q used, e.g. most versus few, typical or exceptional values are sought, respectively. This type of summary may be used with more complicated, regular queries, but it quickly becomes computationally infeasible (combinatorial explosion) and the interpretation of results becomes vague. Type 2.1 may produce typical (exceptional) values for some, possibly fuzzy, subpopulations of records. From the computational point of view, the same remarks apply as for Type 1 versus Type 1.1.

Type 3 is the most general. In its full version, this type of summary is to produce fuzzy rules describing the dependencies between the values of particular fields. Here the use of a filter is essential, in contrast to the previous types, where it was optional. The very meaning of a fuzzy rule obtained is that if a record meets the filter's condition, then it also meets the query's conditions; this corresponds to a classical IF-THEN rule. For a general form of such a rule it is difficult to devise an effective and efficient algorithm looking for such dependencies. Full search may be acceptable only in the case of a restrictively limited set of rule (summarizer) building blocks, i.e. fields and their possible values.

A Type 3.1 summary may produce interesting results in a more reasonable time. It relies on the user pointing out promising fields to be used during the construction of a summarizer. For computational feasibility, some limits should also be put on the complexity of query S and filter F in terms of the number of logical connectives allowed.

Finally, Type 3.2 is distinguished here as a special case due to its practical value. First of all, it makes the generation of a summarizer less time-consuming and at the same time has a good interpretation. Here the query is known in an exact form and only the filter is sought, i.e. we look for the causes of given data features. For example, we may set in a query that the profitability of a venture is high and look for the characterization of the subpopulation of ventures (records) with such a high profitability. Effectively, what is sought is a (possibly fuzzy) filter F.

The summaries of Type 1 and Type 2 have been implemented as an extension to our FQUERY for Access (cf. Kacprzyk and Zadrozny [6-11]).


15.4. FQUERY for Access: A Fuzzy Querying Add-on

FQUERY for Access is an add-on (add-in) to Microsoft Access that provides fuzzy querying capabilities (cf. Kacprzyk and Zadrozny [6-11]).

FQUERY for Access makes it possible to use fuzzy terms in regular queries, which are then submitted to Microsoft Access's querying engine. The result is a set of records matching the query, each obviously to a degree from [0,1].

Briefly speaking, the following types of fuzzy terms are available:

• fuzzy values, exemplified by low in "profitability is low",
• fuzzy relations, exemplified by much greater than in "income is much greater than spending", and
• fuzzy linguistic quantifiers, exemplified by most in "most conditions have to be met".

The elements of the first two types are elementary building blocks of fuzzy queries in FQUERY for Access. They are meaningful in the context of numerical fields only. There are also other fuzzy constructs allowed which may be used with scalar fields.

If a field is to be used in a query in connection with a fuzzy value, it has to be defined as an attribute. The definition of an attribute consists of two numbers: the attribute's lower (LL) and upper (UL) limit. They set the interval which the field's values are assumed to belong to, according to the user. This interval depends on the meaning of the given field. For example, for age (of a person), a reasonable interval would be, e.g., [18, 75] in a particular context, i.e. for a specific group. Such a concept of an attribute makes it possible to define fuzzy values universally.

Fuzzy values are defined as fuzzy sets on [−10, +10]. Then, the matching degree md(·,·) of a simple condition referring to attribute AT and fuzzy value FV in a record R is calculated by

md(AT = FV, R) = \mu_{FV}(\tau(R(AT)))    (5)

where R(AT) is the value of attribute AT in record R, \mu_{FV} is the membership function of the fuzzy value FV, and \tau: [LL_{AT}, UL_{AT}] \to [-10, 10] is the mapping from the interval defining AT onto [−10, 10], so that we may use the same fuzzy values for different fields. A meaningful interpretation is secured by \tau, which makes it possible to treat all field domains as ranging over the unified interval [−10, 10].

For simplicity, it is assumed that the membership functions of fuzzy values are trapezoidal, as in Figure 15.1, and that \tau is linear.


Figure 15.1. An example of the membership function of a fuzzy value
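The following sketch illustrates eq. (5) and Figure 15.1: a trapezoidal fuzzy value defined on [−10, 10] is matched against an attribute value after the linear rescaling τ. The trapezoid parameters and the [18, 75] limits assumed for age are illustrative, not taken from the package.

```python
# A minimal sketch of the matching degree of eq. (5).

def trapezoid(a, b, c, d):
    """Trapezoidal membership function (equal to 1 between b and c)."""
    def mu(x):
        if b <= x <= c:
            return 1.0
        if a < x < b:
            return (x - a) / (b - a)
        if c < x < d:
            return (d - x) / (d - c)
        return 0.0
    return mu

def tau(value, ll, ul):
    """Linear map from the attribute's interval [LL, UL] onto [-10, 10]."""
    return -10.0 + 20.0 * (value - ll) / (ul - ll)

mu_low = trapezoid(-10.0, -10.0, -6.0, -2.0)  # assumed fuzzy value 'low'

# md(age = low, R) for a record with age 25 and attribute limits LL=18, UL=75
print(mu_low(tau(25, 18, 75)))
```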

Linguistic quantifiers provide for a flexible aggregation of simple conditions. In FQUERY for Access, the fuzzy linguistic quantifiers are defined in Zadeh's [23, 24] sense, i.e. as fuzzy sets on the interval [0, 10] instead of the original [0, 1]. They may be interpreted either using the original Zadeh approach or via the OWA operators (cf. Yager [19], Yager and Kacprzyk [21]); Zadeh's interpretation will be used here. The membership functions of the fuzzy linguistic quantifiers are assumed piecewise linear, hence two numbers from [0, 10] are needed. Again, a mapping from [0, N], where N is the number of conditions aggregated, onto [0, 10] is employed to calculate the matching degree of a query. More precisely, the matching degree md(·,·) of the query "Q of N conditions are satisfied" for record R is equal to

md(Q\ \mathrm{conditions}, R) = \mu_Q\left( \tau\left( \sum_{i=1}^{N} md(\mathrm{condition}_i, R) \right) \right)    (6)

We can also assign different importance degrees to particular conditions. Then, the aggregation formula is equivalent to (2). The importance is identified with a fuzzy set on [0, 1], and then treated as the property F in (2).
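A minimal sketch of the aggregation of eq. (6) follows; the quantifier defined on [0, 10] is an illustrative assumption.

```python
# Aggregating N per-condition matching degrees with a linguistic
# quantifier defined on [0, 10], as in eq. (6).

def mu_most_10(x):
    """Assumed piecewise-linear 'most' on [0, 10]: 0 below 3, 1 above 8."""
    return max(0.0, min(1.0, (x - 3.0) / 5.0))

def md_quantified(condition_degrees):
    """mu_Q(tau(sum of degrees)), with tau mapping [0, N] onto [0, 10]."""
    n = len(condition_degrees)
    return mu_most_10(10.0 * sum(condition_degrees) / n)

# degrees to which one record matches five simple conditions
print(md_quantified([1.0, 0.8, 0.6, 1.0, 0.9]))
```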

In FQUERY for Access, queries containing fuzzy terms are still syntactically correct Access queries, thanks to the use of parameters. Basically, Access represents queries using SQL. Parameters, expressed as strings limited with brackets, make it possible to embed references to fuzzy terms in a query. We have assumed a special naming convention for the parameters corresponding to particular fuzzy terms. For example:

[FfA_FV fuzzy value name] will be interpreted as a fuzzy value
[FfA_FQ fuzzy quantifier name] will be interpreted as a fuzzy quantifier

First, a fuzzy term has to be defined using the toolbar of FQUERY for Access; it is then stored internally. This maintenance of dictionaries of fuzzy terms defined by users strongly supports our approach to data summarization. In fact, the package comes with a set of predefined fuzzy terms, but the user may enrich the dictionary too.

When the user initiates the execution of a query, it is automatically transformed and then run as a native query of Access. The transformation consists primarily of the replacement of parameters referring to fuzzy terms by calls to functions that secure a proper interpretation of these fuzzy terms. Then, the query is run by Access as usual.

In our approach, the interactivity, i.e. user assistance, lies in the definition of summarizers (the indication of attributes and their combinations). This proceeds via the user interface of the fuzzy querying add-on.

Basically, the summarizers allowed are:

• simple as, e.g., "salary is high" • compound as, e.g., "salary is low AND age is old" • compound with quantifiers, as, e.g., "most of {salary is high, age is young,

..., training is well above average)",

We will also use "natural" linguistic terms, i.e. (7+2!) exemplified by: very low, low, medium, high, very high, and also "comprehensible" fuzzy linguistic quantifiers as: most, almost all, ..., etc.

In Kacprzyk and Zadrozny [6-11], a conventional DBMS is used and a fuzzy querying tool is developed. Basically, the so-called quantified queries, introduced by Kacprzyk and Ziolkowski [16] and Kacprzyk, Zadrozny and Ziolkowski [15], are used. They make it possible to express complex concepts; e.g., a "serious water pollution" may well be equated with, say, "almost all of the relevant pollution indicators considerably exceed pollution limits (maybe imprecisely specified)". Zadeh's [23, 24] calculus of linguistically quantified propositions is used. Basically, if comp_i is the degree of matching of a record with the i-th partial condition and the query is "find X's such that Q out of {comp_1, ..., comp_p}", then the matching degree of a record with this query is

md = \mu_Q\left( \frac{1}{p} \sum_{i=1}^{p} comp_i \right), for the case without importance,

or md = \mu_Q\left( \sum_{i=1}^{p} (b_i \wedge comp_i) \Big/ \sum_{i=1}^{p} b_i \right), for that with importance, with importance degrees b_i \in [0,1].

Notice that the quantified queries are exactly what we need for implementing the linguistic summaries.

We now sketch FQUERY for Access, which supports various fuzzy elements in queries. The main issues are: (1) how to extend the syntax and semantics of the query, and (2) how to provide an easy way of eliciting and manipulating those terms by the user.

The main entities may be summarized as:


• <attribute> For each attribute we give: the lower and the upper limit specifying the interval of possible values, and used for scaling the values while calculating the degree of matching with a fuzzy value used, or the degree of membership in a fuzzy relation.

• <fuzzy value> These are equivalents of imprecise linguistic terms, and are defined by trapezoidal membership functions on [-10, +10].

• <fuzzy relation> An imprecise (fuzzy) relation is represented by a trapezoidal membership function.

• <fuzzy quantifier importance coefficient <OWA-tag> The fuzzy quantifiers were initially defined (cf. Kacprzyk and Zadrozny [6, 7]) as fuzzy sets in [0.0, 10.0] with piecewise linear membership functions. Then (cf. Kacprzyk and Zadrozny [8-11]), importance was added to handle queries as "most of the important subcondidons of the query are fulfilled". Moreover, the OWA operators (cf. Yager and Kacprzyk [21]) are supported.

FQUERY for Access is embedded in the native Access environment as an add-on. Definitions of attributes, fuzzy values, etc. are stored in proper locations, and a mechanism for putting them into the Query-By-Example (QBE) sheet (grid) is provided. All the code and data are put into a database file, a library, installed by the user. Parameters are used, and a query transformation is performed.

FQUERY for Access provides its own toolbar. There is one button for each fuzzy element, plus buttons for declaring attributes, starting the querying, closing the toolbar, and help (cf. Figure 15.2).

Generally, the user interacts with FQUERY for Access by pressing a button, in order to:

• declare attributes,
• define fuzzy elements and put them automatically into the QBE sheet,
• start the querying process.

The user inserts fuzzy elements into the QBE sheet from special tables in the library. Then, the search is started by the GO button. FQUERY for Access employs the standard Access querying procedure, as queries with fuzzy elements are still legitimate. Thus, the original SQL-type query is replaced by a modified one with calls to functions that fill appropriate data structures with the information required for computing the matching degree for subsequent records. Then, the query is run by Access as usual and the results are displayed.


Figure 15.2. Composition of a fuzzy query

The main elements used for the composition of a fuzzy query are shown in Figure 15.2, in which we use a fuzzified version of an example included in Access: suppose that we wish to find "difficult orders", defined as:

Difficult order =
  from outside the USA,
  short delivery time,
  low order amount,
  high freight costs,
  high discount,
  most (of these conditions should be met).

This is therefore a compound query with a fuzzy linguistic quantifier. It is composed as in Figure 15.2.
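To show how the quantifier aggregation of eq. (6) plays out for this query, here is a small sketch scoring a single order; the record, the per-condition membership functions, and the quantifier are all illustrative assumptions.

```python
# Evaluating the "difficult order" query for one hypothetical record.

def mu_most_10(x):  # assumed 'most' on [0, 10], as in the earlier sketch
    return max(0.0, min(1.0, (x - 3.0) / 5.0))

def clip01(x):
    return max(0.0, min(1.0, x))

order = {"country": "Germany", "delivery_days": 3, "amount": 480.0,
         "freight": 95.0, "discount": 0.12}  # hypothetical order record

conditions = [
    1.0 if order["country"] != "USA" else 0.0,   # from outside the USA
    clip01((7 - order["delivery_days"]) / 5.0),  # short delivery time
    clip01((1000 - order["amount"]) / 800.0),    # low order amount
    clip01((order["freight"] - 40) / 60.0),      # high freight costs
    clip01((order["discount"] - 0.05) / 0.10),   # high discount
]

# degree to which "most of these conditions are met" for this order
print(mu_most_10(10.0 * sum(conditions) / len(conditions)))
```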

The above fuzzy querying add-on was then extended by Kacprzyk and Zadrozny to fuzzy querying over the Internet. The query is defined by using a WWW browser (more specifically, Microsoft Explorer or Netscape Navigator), and the user interface is similar to that in Figure 15.2, with a WWW-browser-like toolbar. The definition of fuzzy values, fuzzy relations, and linguistic quantifiers is via Java applets.

Basically, a query is sent to the WWW server; the searching program decodes the query and, if fuzzy elements exist, an HTML page is sent back for specifying their membership functions; the search is done sequentially and yields a matching degree. The results are sent back as an HTML document to be displayed. So, the interface consists of: HTML pages for query formation, membership function specification, the list of records and the content of a selected record, a searching program, and a reporting program.

15.5. Summaries via FQUERY for Access

FQUERY for Access, which extends the querying capabilities of Microsoft Access by making it possible to handle fuzzy terms, may be viewed as an interesting tool for data mining, including the generation of summaries. The simplest method of data mining, through ad-hoc queries, becomes much more powerful by using fuzzy terms. Nevertheless, the implementation of the various types of summaries mentioned before seems worthwhile, and is fortunately relatively straightforward.

We rely on dictionaries of fuzzy terms maintained and extended by users during subsequent sessions. The main feature supporting an easy generation of summaries is the adopted concept of context-free definitions of particular fuzzy terms. Hence, looking for a summarizer, we may employ any term in the context of any attribute. Thus, we have the summarizer building blocks at hand, and what is needed is an efficient procedure for their composition, compatible with the rest of the fuzzy querying system.

In the case of Type 1 summaries, only the list of defined linguistic quantifiers is employed. The query S is provided by the user, and we are looking for a linguistic quantifier describing in the best way the proportion of records meeting this query. Hence, we are looking for a fuzzy set in the space of linguistic quantifiers such that

V_S(Q) = \mathrm{truth}(Q\,S(X)) = \mu_Q\left( \frac{1}{m} \sum_{i=1}^{m} \mu_S(x_i) \right)    (7)

FQUERY for Access processes the query, additionally summing up the matching degrees for all records. Thus, the sum in (7) is easily calculated. Then, the results are displayed as a list of records ordered by their matching degree. In another window, the sought fuzzy set of linguistic quantifiers is shown. For efficiency, we only take into account the quantifiers defined by the users for querying. At the same time, this seems quite reasonable, as the quantifiers defined by the user should have a clear interpretation. Currently, FQUERY for Access does not support fuzzy filters. As soon as this capability is added, summaries of Type 1.1 will also be available: simply, when evaluating the filter for particular records, the other sum used in (2) will be calculated, and the final results will be presented as in the case of Type 1 summaries.
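The following sketch mimics a Type 1 summary per eq. (7): the matching degrees of S are averaged, and every quantifier in a user dictionary is graded on the resulting proportion, yielding a fuzzy set in the space of linguistic quantifiers. The dictionary entries and data are illustrative assumptions.

```python
# A minimal sketch of a Type 1 summary: which quantifier best describes
# the proportion of records matching query S?

quantifiers = {  # assumed user-defined dictionary of quantifiers on [0, 1]
    "a few":      lambda r: max(0.0, min(1.0, (0.4 - r) / 0.3)),
    "about half": lambda r: max(0.0, 1.0 - abs(r - 0.5) / 0.3),
    "most":       lambda r: max(0.0, min(1.0, (r - 0.3) / 0.5)),
    "almost all": lambda r: max(0.0, min(1.0, (r - 0.7) / 0.25)),
}

# matching degrees of S for the subsequent records, summed up by the engine
matching_degrees = [0.9, 1.0, 0.2, 0.8, 0.7, 1.0, 0.0, 0.6]
r = sum(matching_degrees) / len(matching_degrees)

# the result: a fuzzy set in the space of linguistic quantifiers, eq. (7)
for name, mu in sorted(quantifiers.items(), key=lambda kv: -kv[1](r)):
    print(f"{mu(r):.2f}  {name}")
```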

Type 2 summaries require more effort and a redesign of the results display. Now, we are given the quantifier and the whole query, but without some values. Thus, first of all, we have extended the syntax of the query language by introducing a placeholder for a fuzzy value. That is, the user may leave out some values in the query's conditions and request the system to find the best fit for them. To put such a placeholder into a query, the user employs a new type of parameter: we extend the list by adding the parameter [FfA_F?]. During query processing, these parameters are treated similarly to fully specified fuzzy values. However, the matching degree is calculated not just for one fuzzy value but for all fuzzy values defined in the system. The matching degrees of the whole query against the subsequent records, calculated for different combinations of fuzzy values, are summed up. Finally, it is computed, for particular combinations of fuzzy values, how well the query is satisfied when a given combination is put into the query. Thus, we again obtain as a result a fuzzy set, but this time defined in the space of vectors of fuzzy values.

Obviously, such computations are extremely time-consuming and are practically feasible only for one placeholder in a query. On the other hand, the case of one placeholder, corresponding to the search for typical or exceptional values, is the most useful form of a Type 2 summary. It is again fairly easy to embed Type 2 summaries in the existing fuzzy querying mechanism.
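As an illustration of the one-placeholder case, this sketch tries every fuzzy value of a dictionary in place of the placeholder and reports the best fit, i.e. the typical value of the field; the data and the dictionary are illustrative assumptions.

```python
# A minimal sketch of a Type 2 summary with one placeholder [FfA_F?].

def mu_most(r):  # assumed quantifier 'most' on [0, 1]
    return max(0.0, min(1.0, (r - 0.3) / 0.5))

fuzzy_values = {  # assumed dictionary, defined on the rescaled [-10, 10]
    "low":    lambda x: max(0.0, min(1.0, (-2.0 - x) / 4.0)),
    "medium": lambda x: max(0.0, 1.0 - abs(x) / 4.0),
    "high":   lambda x: max(0.0, min(1.0, (x - 2.0) / 4.0)),
}

# salaries of the records, already rescaled onto [-10, 10] via tau
rescaled = [-7.0, -5.5, -6.8, 1.0, -4.9, -8.2, 3.5, -6.1]

# try each candidate value in the condition "most records have salary = ?"
fit = {name: mu_most(sum(mu(x) for x in rescaled) / len(rescaled))
       for name, mu in fuzzy_values.items()}

print(max(fit.items(), key=lambda kv: kv[1]))  # the typical value found
```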

15.6. An Example of Implementation

The proposed data summarization procedure was implemented for the sales database of a small-to-medium size computer retailer (ca. 15 employees) located in the southern part of Poland.

The database is characterized by:

Number of records: 8743
Number of attributes: 14
Number of transaction documents: 4000
Number of suppliers and customers: 3000
Number of products carried: 3000


and these numbers vary over time: the number of records increases as more and more sales are recorded; the number of attributes is unchanged; the number of transactions increases; the numbers of suppliers and customers increase (mostly due to the increase in the number of customers, as the number of suppliers is more or less the same); and the number of products is more or less the same.

The basic structure of the database (in the "dbf" type format) is as shown in Table 15.3.

Table 15.3. The basic structure of the database (in the "dbf" type format)

Attribute name          Attribute type  Description
Date                    Date            Date of sale
Time                    Time            Time of sale transaction
Name                    Text            Name of the product
Amount (number)         Numeric         Number of products sold in the transaction
Price                   Numeric         Unit price
Commission              Numeric         Commission (in %) on sale
Value                   Numeric         Value = amount (number) x price of the product
Discount                Numeric         Discount (in %) for transaction
Group                   Text            Product group to which the product belongs
Transaction value       Numeric         Value of the whole transaction
Total sale to customer  Numeric         Total value of sales to the customer in the fiscal year
Purchasing frequency    Numeric         Number of purchases by the customer in the fiscal year
Town                    Text            Town where the customer lives or is based

First, after some initialization, we need to provide parameters. These parameters belong to 3 groups:

• „Query" - definition of the attributes and the subject, • „Type of report" - definition of how the results should be presented,


• „Method" - definition of parameters of the method (i.e. a genetic algorithm)

and their meaning is self-evident. We will now give a couple of examples. First, suppose that we are interested

in what the relation between the commission and the type of goods sold is. We obtain the linguistic summaries shown in Table 15.4.

Table 15.4. Linguistic summaries expressing relations between the group of products and commission

Summary                                                           Appropr.  Imprec.  Covering  W. avg  Validity
About 1/2 of sales of network elements is with a high commission  0.2329    0.1872   0.4202    0.3165  0.3630
About 1/2 of sales of computers is with a medium commission       0.2045    0.3453   0.5498    0.3699  0.4753
Much sales of accessories is with a high commission               0.1684    0.4095   0.5779    0.3919  0.5713
Much sales of components is with a low commission                 0.1376    0.5837   0.7212    0.4449  0.6707
About 1/2 of sales of software is with a low commission           0.1028    0.5837   0.4808    0.3162  0.4309
About 1/2 of sales of computers is with a low commission          0.0225    0.5837   0.5594    0.3202  0.4473
A few sales of components is without commission                   0.0237    0.2745   0.0355    0.2346  0.0355
A few sales of computers is with a high commission                0.1418    0.1872   0.0455    0.1881  0.0314
Very few sales of printers is with a high commission              0.1288    0.1872   0.0585    0.1820  0.0509

(Appropr. = degree of appropriateness, Imprec. = degree of imprecision, Covering = degree of covering, W. avg = weighted average, Validity = degree of validity.)

As we can see, the results can be very helpful in, e.g., negotiating commissions for various products sold.

Next, suppose that we are interested in relations between the groups of products and times of sale. We obtain the results shown in Table 15.5.


Table 15.5. Linguistic summaries expressing relations between the groups of products and times of sale

Summary                                                              Appropr.  Imprec.  Covering  W. avg  Validity
About 1/3 of sales of computers is by the end of year               0,0999    0,2010   0,3009    0,1274  0,2801
About 1/2 of sales in autumn is of accessories                      0,0642    0,4095   0,4737    0,1143  0,4790
About 1/3 of sales of network elements is in the beginning of year  0,0733    0,2124   0,2857    0,0982  0,1957
Very few sales of network elements is by the end of year            0,0833    0,2010   0,1176    0,0980  0,0929
Very few sales of software is in the beginning of year              0,0768    0,2124   0,1355    0,0929  0,0958
About 1/2 of sales in the beginning of year is of accessories       0,0348    0,4095   0,4443    0,0860  0,4343
About 1/3 of sales in the summer is of accessories                  0,0464    0,2745   0,3209    0,0853  0,3092
About 1/3 of sales of peripherals is in the spring period           0,0507    0,2525   0,3032    0,0809  0,2140
About 1/3 of sales of software is by the end of year                0,0446    0,2010   0,2455    0,0768  0,2258
About 1/3 of sales of network elements is in the spring period      0,0458    0,2525   0,2983    0,0763  0,2081
About 1/3 of sales in the summer period is of components            0,0336    0,2745   0,3081    0,0745  0,3081
Very few sales of network elements is in the autumn period          0,0485    0,1956   0,1471    0,0692  0,0955
A few sales of software is in the summer period                     0,0402    0,1362   0,1765    0,0691  0,1765

(Column abbreviations as in Table 15.4.)

Notice that in this case the summaries are much less obvious than in the former case, which expressed relations between the group of products and commission. It should also be noted that the weighted average is here very low, but for technical reasons this should not be taken literally, as these values are mostly used to order the summaries.

Finally, let us show in Table 15.6 some of the obtained linguistic summaries expressing relations between the attributes: size of customer, regularity of customer (purchasing frequency), date of sale, time of sale, commission, group of product, and day of sale. This is an example of the most sophisticated form of linguistic summaries supported by the system described.

Table 15.6. Linguistic summaries expressing relations between the attributes: size of customer, regularity of customer (purchasing frequency), date of sale, time of sale, commission, group of product and day of sale

Summary                                                      Appropr.  Imprec.  Covering  W. avg  Validity
Much sales on Saturday is about noon with a low commission   0,3843    0,2748   0,6591    0,3863  0,3951
Much sales on Saturday is about noon for bigger customers    0,3425    0,4075   0,7500    0,3648  0,4430
Much sales on Saturday is about noon                         0,3133    0,4708   0,7841    0,3564  0,4654
Much sales on Saturday is about noon for regular customers   0,3391    0,3540   0,6932    0,3558  0,4153
A few sales for regular customers is with a low commission   0,3882    0,5837   0,1954    0,3451  0,1578
A few sales for small customers is with a low commission     0,3574    0,5837   0,2263    0,3263  0,1915
A few sales for one-time customers is with a low commission  0,3497    0,5837   0,2339    0,3195  0,1726
Much sales for small customers is for nonregular customers   0,6250    0,1458   0,7709    0,5986  0,5105

(Column abbreviations as in Table 15.4.)

15.7. Conclusions

We proposed a realistic, interactive approach to linguistic summaries of large sets of data. With human assistance, we could derive complex, "intelligent" and human-consistent linguistic summaries. The results of an implementation at a computer retailer are very encouraging.

References

[1] P. Bosc and J. Kacprzyk, Eds., Fuzziness in Database Management Systems. Physica-Verlag, Heidelberg, 1995.

[2] R. George and R. Srikanth, „Data summarization using genetic algorithms and fuzzy logic", in Genetic Algorithms and Soft Computing (Eds. F. Herrera and J.L. Verdegay), Physica-Verlag, Heidelberg and New York, pp. 599-611, 1996.

[3] J. Kacprzyk, „An interactive fuzzy logic approach to linguistic data summaries", in Proceedings of NAFIPS'99 - 18th International Conference of the North American Fuzzy Information Processing Society (Eds. R.N. Dave and T. Sudkamp), IEEE Press, Piscataway, NJ, pp. 595-599, 1999.

[4] J. Kacprzyk and P. Strykowski, „Linguistic data summaries for intelligent decision support", in Proceedings of EFDAN'99 - 4th European Workshop on Fuzzy Decision Analysis and Recognition Technology for Management, Planning and Optimization, pp. 3-12, 1999.

[5] J. Kacprzyk and R.R. Yager, „Intelligent summaries of data using fuzzy logic", International Journal of General Systems (in press), 2000.

[6] J. Kacprzyk and S. Zadrozny, „Fuzzy querying for Microsoft Access", in Proceedings of the Third IEEE Conference on Fuzzy Systems (Orlando, USA), Vol. 1, pp. 167-171, 1994.

[7] J. Kacprzyk and S. Zadrozny, „FQUERY for Access: fuzzy querying for a Windows-based DBMS", in Fuzziness in Database Management Systems (Eds. P. Bosc and J. Kacprzyk), Physica-Verlag, Heidelberg, pp. 415-433, 1995.

[8] J. Kacprzyk and S. Zadrozny, „Fuzzy queries in Microsoft Access v. 2", in Proceedings of the 6th IFSA World Congress (Sao Paulo, Brazil), Vol. II, pp. 341-344, 1995.

[9] J. Kacprzyk and S. Zadrozny, „Fuzzy queries in Microsoft Access v. 2", in Fuzzy Information Engineering - A Guided Tour of Applications (Eds. D. Dubois, H. Prade and R.R. Yager), Wiley, New York, pp. 223-232, 1997.

[10] J. Kacprzyk and S. Zadrozny, „Implementation of OWA operators in fuzzy querying for Microsoft Access", in The Ordered Weighted Averaging Operators: Theory and Applications (Eds. R.R. Yager and J. Kacprzyk), Kluwer, Boston, pp. 293-306, 1997.

[11] J. Kacprzyk and S. Zadrozny, „Flexible querying using fuzzy logic: An implementation for Microsoft Access", in Flexible Query Answering Systems (Eds. T. Andreasen, H. Christiansen and H.L. Larsen), Kluwer, Boston, pp. 247-275, 1997.

[12] J. Kacprzyk and S. Zadrozny, „Data mining via linguistic summaries of data: An interactive approach", in Methodologies for the Conception, Design and Application of Soft Computing - Proceedings of the 5th IIZUKA'98 (Eds. T. Yamakawa and G. Matsumoto), pp. 668-671, 1998.

[13] J. Kacprzyk and S. Zadrozny, „On summarization of large datasets via a fuzzy-logic-based querying add-on to Microsoft Access", in Intelligent Information Systems VII, IPI PAN, Warsaw, pp. 249-258, 1998.

[14] J. Kacprzyk and S. Zadrozny, „On interactive linguistic summarization of databases via a fuzzy-logic-based querying add-on to Microsoft Access", in Computational Intelligence - Theory and Applications (Ed. B. Reusch), Springer, Berlin, pp. 462-472, 1999.

[15] J. Kacprzyk, S. Zadrozny and A. Ziolkowski, „FQUERY III+: a 'human-consistent' database querying system based on fuzzy logic with linguistic quantifiers", Information Systems, 6, pp. 443-453, 1989.

[16] J. Kacprzyk and A. Ziolkowski, „Database queries with fuzzy linguistic quantifiers", IEEE Transactions on Systems, Man and Cybernetics, SMC-16, pp. 474-479, 1986.

[17] D. Rasmussen and R.R. Yager, „Fuzzy query language for hypothesis evaluation", in Flexible Query Answering Systems (Eds. T. Andreasen, H. Christiansen and H.L. Larsen), Kluwer, Boston/Dordrecht/London, pp. 23-43, 1997.

[18] R.R. Yager, „A new approach to the summarization of data", Information Sciences, 28, pp. 69-86, 1982.

[19] R.R. Yager, „On ordered weighted averaging operators in multicriteria decision making", IEEE Transactions on Systems, Man and Cybernetics, SMC-18, pp. 183-190, 1988.

[20] R.R. Yager, „On linguistic summaries of data", in Knowledge Discovery in Databases (Eds. W. Frawley and G. Piatetsky-Shapiro), AAAI/MIT Press, pp. 347-363, 1991.

[21] R.R. Yager and J. Kacprzyk, Eds., The Ordered Weighted Averaging Operators: Theory and Applications. Kluwer, Boston, 1997.

[22] J. Kacprzyk and R.R. Yager, „Linguistic summarization of databases: a perspective", in Proceedings of IFSA'99 - World Congress of the International Fuzzy Systems Association (Taipei, Taiwan), Vol. 1, pp. 44-48, 1999.

[23] L.A. Zadeh, „A computational approach to fuzzy quantifiers in natural languages", Computers and Mathematics with Applications, 9, pp. 149-184, 1983.

[24] L.A. Zadeh, „Syllogistic reasoning in fuzzy logic and its application to usuality and reasoning with dispositions", IEEE Transactions on Systems, Man and Cybernetics, SMC-15, pp. 754-763, 1985.

[25] L.A. Zadeh and J. Kacprzyk, Eds., Fuzzy Logic for the Management of Uncertainty, Wiley, New York, 1992.

[26] L.A. Zadeh and J. Kacprzyk, Eds., Computing with Words in Information/Intelligent Systems. Vol. 1: Foundations, Physica-Verlag, Heidelberg and New York, 1999.

[27] L.A. Zadeh and J. Kacprzyk, Eds., Computing with Words in Information/Intelligent Systems. Vol. 2: Applications, Physica-Verlag, Heidelberg and New York, 1999.


About the Authors


Jim F. BALDWIN
Department of Engineering Maths
University of Bristol
Bristol, BS8 1TR, UK
Phone: +44-117-928-7754
Fax: +44-117-925-1154
E-mail: [email protected]

Jim Baldwin is Professor of Artificial Intelligence and Director of the AI Research Group at the University of Bristol. He has been active in fuzzy sets since the early days of the subject, and his research in AI covers fundamental aspects of knowledge representation, inference under uncertainty and belief theories, fuzzy control and fuzzy set theory, and machine learning. He is the originator of support logic programming and the mass assignment theory, which was developed during his EPSRC Advanced Fellowship (1990-95). He has published over 250 papers, is a member of the editorial boards of a number of journals, and has served on many conference program committees.

Ping-Tong CHAN
Department of Electrical Engineering,
The Hong Kong Polytechnic University,
Hung Hom, Kowloon, Hong Kong
E-mail: [email protected]

P.T. Chan received the BEng, MSc and Ph.D. degrees in electrical engineering from the Hong Kong Polytechnic University (HKPU) in 1994, 1996 and 1999, respectively. During the two years of his MSc studies, he also worked as a graduate trainee in an E&M engineering consultancy firm. He is currently a research associate at HKPU, working on intelligent control, fuzzy systems and genetic algorithms.


Liya DING
Institute of Systems Science,
National University of Singapore
Singapore
Phone: +65-874-2516
Fax: +65-778-2571
E-mail: [email protected]

Liya Ding received the B.E. degree in Computer Engineering from Shanghai University of Technology, Shanghai, China, in 1982, and the Ph.D. degree in Computer Science from Meiji University, Tokyo, Japan, in 1991. Since 1991, she has been with the Institute of Systems Science, National University of Singapore. From 1994 to 1996, she was the project leader of the Neuro-ISS Laboratory under Real World Computing, an international research project supported by the Japanese government. She is a member of IFSA, IEEE, and the Singapore Computer Society. Her current research interests include fuzzy logic, approximate reasoning, and applications of knowledge engineering and soft computing.

Juzhen DONG
Department of Information Engineering,
Maebashi Institute of Technology,
460 Kamisadori-cho, Maebashi-City, 371, Japan
Phone & Fax: +81-27-265-7366

Juzhen Dong received her Ph.D. from Yamaguchi University, Japan. She is a cooperative researcher in the Department of Information Engineering, Maebashi Institute of Technology, Japan. Her research interests include knowledge discovery and data mining, machine learning, soft computing, and intelligent information systems.


Jairo ESPINOSA
Department of Electrical Engineering ESAT-SISTA
Katholieke Universiteit Leuven
Kardinaal Mercierlaan 94
B-3001 Heverlee, Belgium
Phone: +32-16-393082
Fax: +32-16-393080
E-mail: [email protected]
URL: http://www.esat.kuleuven.ac.be/~espinosa

Jairo Espinosa received the degree of Engineer in electronic engineering in 1993 from the Universidad Distrital Francisco José de Caldas, Colombia, and the M.Eng. degree in electrical engineering from the Katholieke Universiteit Leuven, Belgium, in 1995. From 1996 till 1999 he was a research assistant at the Katholieke Universiteit Leuven, where he wrote his Ph.D. thesis. He is currently working for ISMC (Intelligent Systems Modeling and Control) N.V. in Belgium, and he combines his work with teaching activities in the postgraduate program on automation at the Corporación Universitaria de Ibagué, Colombia.

Takeshi FURUHASHI
Department of Information Electronics, Nagoya University
Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan
Phone: +81-52-789-2792
Fax: +81-52-789-3166
E-mail: [email protected]

Takeshi Furuhashi received his Ph.D. degree from the Department of Electrical & Electronics Engineering, Nagoya University, Japan in 1985. From 1985 to 1988, he was an Engineer at Toshiba Corporation, Japan. Since 1988, he has been with the School of Engineering, Nagoya University, Japan, where he is currently an associate professor. His main works include: (a) "A Theory on Stability of Fuzzy Control System Using Discrete Event Representation and Numerical Analysis on Transition Between Events", Journal of Japan Society for Fuzzy Theory and Systems, 10, 1, pp. 126-134 (1998); (b) "A Creative Design of Fuzzy Logic Controller Using a Genetic Algorithm", Advances in Fuzzy Systems, vol. 7, pp. 37-48 (1997); (c) "Selection of Input Variables of Fuzzy Model Using Genetic Algorithm with Quick Fuzzy Inference", Lecture Notes in Artificial Intelligence, vol. 1285, pp. 45-53 (1997); (d) "On Fuzzy Modeling Using Fuzzy Neural Networks with the Back-Propagation Algorithm", IEEE Trans. on Neural Networks, Vol. 3, No. 5, pp. 801-806 (1992). He is a member of IEEE, NAFIPS, SOFT, and SICE.

Yutaka HATA
Department of Computer Engineering
Himeji Institute of Technology
2167 Shosha, Himeji, 671-2201, Japan
Phone: +81-792-67-4986
Fax: +81-792-66-8868
E-mail: [email protected]
URL: http://wwwj1.comp.eng.himeji-tech.ac.jp

Yutaka Hata was born in Hyogo on May 30, 1961. He received the B.E., M.E., and D.E. degrees in Electronics from Himeji Institute of Technology in 1984, 1986, and 1989, respectively. He is currently an associate professor in the Department of Computer Engineering, Himeji Institute of Technology. He spent one year at the University of California at Berkeley from 1995, and is now a visiting professor at UC Berkeley. His research interests include multiple-valued logic, soft computing and image processing. He is a member of the IEEE, the Japan Society of Medical Electronics and Biological Engineering, the Japan Society for Fuzzy Theory and Systems, and the Biomedical Fuzzy Systems Association.

Kaoru HIROTA Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology 4259 Nagatsuta-cho, Midori-ku, Yokohama 226-9502, Japan Phone: +81-45-924-5685 Fax: +81-45-924-5676 E-mail: [email protected] URL: http://www.hrt.dis.titech.ac.jp

Kaoru Hirota received the B.E., M.E. and Dr.E. degrees in electronics from Tokyo Institute of Technology, Tokyo, Japan, in 1974, 1976 and 1979, respectively. From 1979 to 1982 he was with the Sagami Institute of Technology, Fujisawa, Japan. From 1982 to 1995 he was with the College of Engineering, Hosei University, Tokyo. Since 1995, he has been with the Interdisciplinary Graduate School of Science and Technology, Tokyo Institute of Technology, Yokohama, Japan. Dr. Hirota is a member of IFSA (Vice President 1991-1993, Treasurer 1997-2001) and SOFT (Vice President, 1995-1997), and he is an editor-in-chief of the International Journal of Advanced Computational Intelligence.

Hisao ISHIBUCHI Department of Industrial Engineering Osaka Prefecture University Gakuen-cho 1-1, Sakai, Osaka 599-8531, Japan Phone: +81-722-54-9350 Fax: +81-722-54-9915 E-mail: [email protected] URL: http://www.ie.osakafu-u.ac.jp/~hisaoi/ci_lab_e/index.html

Hisao Ishibuchi received the B.S. and M.S. degrees in precision mechanics from Kyoto University, Kyoto, Japan, in 1985 and 1987, respectively, and the Ph.D. degree from Osaka Prefecture University, Osaka, Japan, in 1992. Since 1987, he has been with the Department of Industrial Engineering at Osaka Prefecture University, where he has been a Professor since 1999. He was a Visiting Research Associate at the University of Toronto from August 1994 to March 1995 and from July 1997 to March 1998. His research interests include fuzzy rule-based systems, fuzzified neural networks, genetic algorithms, fuzzy scheduling, and evolutionary games.

Janusz KACPRZYK Systems Research Institute Polish Academy of Sciences Ul. Newelska 6, 01-447 Warsaw, Poland

University of Applied Information Technology and Management Ul. Newelska 6, 01-447 Warsaw, Poland

Phone: +48-22-836 44 14 Fax: +48-22-837 27 72 E-mail: [email protected] URL: http://www.ibspan.waw.pl/

Janusz Kacprzyk received his M.S. degree in computer science and automatic control in 1970 from the Warsaw University of Technology, Poland. He received the Ph.D. degree in systems analysis in 1977 and the D.Sc. ("habilitation") degree in 1994, both from the Systems Research Institute, Polish Academy of Sciences in Warsaw, Poland. Since 1970 he has been employed at the Systems Research Institute, Polish Academy of Sciences, currently as full professor and Deputy Director for Research. In 1981-83, in the spring of 1986, and in 1988 he was a visiting professor at various American universities. His research interests include the use of soft computing, mainly fuzzy logic, in various areas related to intelligent systems, notably decision making and optimization, control, database querying and information retrieval. Recently, he has been working on the use of the computing-with-words paradigm in various areas related to the above. In 1991-1995 he was a Vice-President of IFSA, and in 1995-1999 he was a member of the IFSA Board. Since 1999 he has been a member of the EUSFLAT Board. Janusz Kacprzyk is the editor-in-chief of two book series published by the Springer-Verlag group (Physica-Verlag, Heidelberg and New York): "Studies in Fuzziness and Soft Computing" and "Advances in Soft Computing", and serves on the editorial boards of more than 10 respected journals.

Naotake KAMIURA Department of Computer Engineering Himeji Institute of Technology 2167, Shosha, Himeji, 671-2201, Japan Phone: +81-792-67-4986 Fax: +81-792-66-8868 E-mail: [email protected]

Naotake Kamiura was born in Hyogo on February 3, 1967. He received the B.E., M.E., and D.E. degrees in Electronics from Himeji Institute of Technology, in 1990, 1992, and 1995, respectively. He is currently a research associate in the Department of Computer Engineering, Himeji Institute of Technology. His current research interests include multiple-valued logic and fault tolerance. He is a member of IEEE.

Mayuka F. KAWAGUCHI

Division of Systems and Information Engineering Graduate School of Engineering Hokkaido University Kita 13, Nishi 8, Kita-ku, Sapporo 060-8628, Japan Phone: +81-11-706-6805 Fax: +81-11-706-7830 E-mail: [email protected]

Mayuka F. Kawaguchi received the B.Eng. degree in electronics engineering in 1985 and the M.Eng. degree in information engineering in 1987 from Hokkaido University, Japan. She received the Ph.D. degree for her studies on fuzzy arithmetic operations relating to triangular norms in 1993 from Hokkaido University, Japan. From 1988 to 1995, she was an Instructor in the Department of Information Engineering at Hokkaido University. Currently, she is an Associate Professor of Systems and Information Engineering at Hokkaido University. Her main research interests involve multiple-valued logic including fuzzy logic, inference systems and numerical analysis. She is a member of the Japan Society for Fuzzy Theory and Systems (SOFT), The Institute of Electronics, Information and Communication Engineers (IEICE), the Information Processing Society of Japan (IPSJ), The Institute of Electrical and Electronics Engineers (IEEE), etc.

Trevor P. MARTIN Department of Engineering Maths University of Bristol Bristol, BS8 1TR, UK Phone: +44-117-928-7754 Fax: +44-117-925-1154 E-mail: [email protected]

Trevor Martin is a Senior Lecturer in the AI Research Group at the University of Bristol. His research interests focus on uncertainty in AI, and he is co-developer of the Fril language and the fuzzy data browser. He has published over 90 papers in refereed journals and conferences, and is a joint organiser of several international fuzzy and fuzzy logic programming workshops.

Masaaki MIYAKOSHI Division of Systems and Information Engineering Graduate School of Engineering Hokkaido University Kita 13, Nishi 8, Kita-ku, Sapporo 060-8628, Japan Phone: +81-11-706-6810 Fax: +81-11-706-7830 E-mail: [email protected]

Masaaki Miyakoshi received the Ph.D. degree in information engineering from Hokkaido University, Sapporo, Japan, in 1985. Currently, he is a Professor in the Graduate School of Engineering at Hokkaido University. His research interests include fuzzy set theory and its applications. He is a member of The Institute of Electronics, Information and Communication Engineers (IEICE) of Japan, the Japan Society for Fuzzy Theory and Systems (SOFT), etc.

Masaharu MIZUMOTO Department of Engineering Informatics Faculty of Computer Science and Technology Osaka Electro-Communication University Neyagawa, Osaka 572-8530, Japan

Phone:+81-72-820-4569 Fax: +81-72-824-0014 E-mail: [email protected] URL: http://www.osakac.ac.jp/labs/mizumoto

Masaharu Mizumoto received the B.Eng., M.Eng., and Dr.Eng. degrees in Electrical Engineering from Osaka University in 1966, 1968 and 1971, respectively. He is a professor in the Division of Information and Computer Sciences, Graduate School of Engineering, Osaka Electro-Communication University. He was Vice President of the International Fuzzy Systems Association (IFSA) in 1989-1991, and President of the Biomedical Fuzzy Systems Association (BMFSA) and of the Japan Society for Fuzzy Theory and Systems (SOFT), both in 1997-1999. He was an Editor-in-Chief of the Journal of SOFT, and is now an advisory editor of the International Journal for Fuzzy Sets and Systems, International Journal of Fuzzy Mathematics, Bulletin for Studies and Exchanges on Fuzziness and its Applications, Journal of the Biomedical Fuzzy Systems Association, and Biomedical Soft Computing and Human Sciences. His current research interests include fuzzy reasoning and its applications to fuzzy control methods, fuzzy neural networks, and biomedical systems.

Yuichiro MORI Dept. of Information Science, Kochi University 2-5-1 Akebono-cho, Kochi-shi, Kochi 780-8520, Japan Phone: +81-88-844-8340 Fax: +81-88-844-8361 E-mail: [email protected]

Yuichiro Mori received the B.E., M.E. and D.E. degrees in Electrical Engineering from Meiji University in 1990, 1992 and 1995, respectively. He is currently an assistant professor in the Dept. of Mathematics and Information Science, Faculty of Science, Kochi University. His main research interests are in fuzzy logic circuits and systems. He is a member of the Information Processing Society of Japan and the Japan Society for Fuzzy Theory and Systems.

Masao MUKAIDONO Dept. of Computer Science, Meiji University 1-1-1 Higashi-Mita, Tama-ku, Kawasaki 214-8571, Japan Phone: +81-44-934-7450 Fax: +81-44-934-7912 E-mail: [email protected]

Masao Mukaidono received the B.E., M.E., and Ph.D. degrees in Electrical Engineering from Meiji University, Kawasaki, Japan, in 1965, 1967 and 1970, respectively. He is currently a Professor in the Department of Computer Science, School of Science and Technology, Meiji University. His main research interests are in multiple-valued logic, fuzzy logic and its applications, fault-tolerant computing, fail-safe logic, and computer-aided logic design. Dr. Mukaidono was president of the Japan Society for Fuzzy Theory and Systems, and is a member of the IEEE Computer Society, the Information Processing Society of Japan, and the Japanese Society for Artificial Intelligence.

Kado NAKAGAWA Department of Computer Engineering Himeji Institute of Technology 2167, Shosha, Himeji, 671-2201, Japan Phone: +81-792-67-4986 Fax: +81-792-66-8868 E-mail: [email protected]

Mitsubishi Electric Corporation, Automotive Electronics Development Center 840, Chiyoda-Machi, Himeji, 670-8677, Japan Phone:+81-792-98-8894 Fax: +81-792-96-1992 E-mail: [email protected]

Kado Nakagawa was born in Osaka on October 21, 1974. He received the B.E. and M.E. degrees from the Faculty of Engineering, Himeji Institute of Technology, in 1997 and 1999, respectively. He is currently with Mitsubishi Electric Corporation, Automotive Electronics Development Center.

Tomoharu NAKASHIMA Department of Industrial Engineering Osaka Prefecture University Gakuen-cho 1-1, Sakai, Osaka 599-8531, Japan Phone: +81-722-54-9350 Fax: +81-722-54-9915 E-mail: [email protected] URL: http://www.ie.osakafu-u.ac.jp/~hisaoi/ci_lab_e/index.html

Tomoharu Nakashima received the B. S. and M. S. degrees from Osaka Prefecture University, Osaka, Japan, in 1995 and 1997, respectively, and the Ph.D. degree from Osaka Prefecture University, Osaka, Japan, in 2000. From June 1998 to September 1998, he was a Postgraduate Scholar with the Knowledge-Based Intelligent Engineering Systems Centre, University of South Australia. His current research interests include fuzzy systems, machine learning, genetic algorithms, reinforcement learning, game theory, multi-agent systems, and image processing.

Manabu NII Department of Industrial Engineering Osaka Prefecture University Gakuen-cho 1-1, Sakai, Osaka 599-8531, Japan Phone: +81-722-54-9350 Fax: +81-722-54-9915 E-mail: [email protected] URL: http://www.ie.osakafu-u.ac.jp/~hisaoi/ci_lab_e/index.html

Manabu Nii received the B. S. and M. S. degrees from Osaka Prefecture University, Osaka, Japan, in 1996 and 1998, respectively. He is currently pursuing the Ph.D. degree at Osaka Prefecture University. His research interests include fuzzy rule-based systems, fuzzified neural networks, and genetic algorithms.

Hiroshi OHNO Department of Human Factors, Toyota Central R&D Labs., Inc. Nagakute, Aichi, 480-11, Japan Phone: +81-561-63-6579 Fax: +81-561-63-5743 E-mail: [email protected]

Hiroshi Ohno received his Ph.D. degree from the Department of Information Electronics, Nagoya University, Japan, in 1999. Since 1988, he has been with Toyota Central R&D Labs., Inc., Japan. His main works include: "Neural networks control for automatic braking control system", Neural Networks, 7, pp.1303-1312 (1994). He is a member of JSAI, IPSJ, and JCSS.

Setsuo OHSUGA Department of Information Science and Computer Science, Waseda University, Japan E-mail: [email protected]

Setsuo Ohsuga is currently professor in the Department of Information Science and Computer Science, Waseda University, Japan. He received his Ph.D. from the University of Tokyo. He was formerly professor and director of the Research Center for Advanced Science and Technology (RCAST) at the University of Tokyo.

Kazuhiko OTSUKA Dept. of Computer Science, Meiji University 1-1-1 Higashi-Mita, Tama-ku, Kawasaki 214-8571, Japan Phone: +81-44-934-7442 Fax: +81-44-934-7912 E-mail: [email protected]

Kazuhiko Otsuka received the B.S. degree in Science and the M.E. degree in Engineering from Meiji University, Japan, in 1994 and 1996, respectively. He is currently a Ph.D. student in the Department of Computer Science, Meiji University. His main research interests are fuzzy logic and its applications, and approximate reasoning in Artificial Intelligence. He is a student member of the Japan Society for Fuzzy Theory and Systems.

Ahmad Besharati RAD Department of Electrical Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong E-mail: [email protected]

A.B. Rad received the B.Sc. degree in engineering from Abadan Institute of Technology, Abadan, Iran, the M.Sc. degree in control engineering from the University of Bradford, Bradford, U.K., and the Ph.D. degree in control engineering from the University of Sussex, Brighton, U.K., in 1977, 1986 and 1988, respectively. He is currently an Associate Professor in the Department of Electrical Engineering, The Hong Kong Polytechnic University, Kowloon, Hong Kong. He has also worked as a control and instrumentation engineer in the oil industry for seven years. His current interests include system identification, adaptive control and intelligent process control.

Jonathan M. ROSSITER Department of Engineering Maths University of Bristol Bristol, BS8 1TR, UK Phone: +44-117-928-7754 Fax: +44-117-925-1154 E-mail: [email protected]

Jonathan Rossiter is a research assistant in the AI Research Group at the University of Bristol. Before this position he was a research student under an EPSRC/CASE award with the UK Defence Evaluation and Research Agency. His research interests include fuzzy knowledge engineering, time series analysis, and object-oriented fuzzy languages.

Yan SHI Department of Information Science, School of Information Science, Kyushu Tokai University 9-14, Toroku, Kumamoto, 862-8652, Japan Phone: +81-96-386-2666 Fax: +81-96-381-7954 E-mail: [email protected] URL: http://necws-1.ktokai-u.ac.jp/~shi/yshi.htm

Yan Shi received his Ph.D. degree in Information and Computer Sciences from Osaka Electro-Communication University, Japan, in 1997. He is currently an Associate Professor in the Graduate School of Engineering, as well as the School of Information Science, at Kyushu Tokai University, Japan. He was a Research Assistant from 1982 to 1988 and a Lecturer from 1988 to 1991 in the Department of Basic Science at the Northeast Heavy Machinery Institute (the present Yanshan University), China. From 1991 to 1992, he was a Visiting Researcher in the Department of Electronics and Informatics Engineering at Yamagata University, Japan. From 1992 to 1994, he was a Research Associate in the Division of Information and Computer Sciences at Osaka Electro-Communication University, Japan. From 1994 to 1997, he was a Researcher in the Research and Development Division at Mycom Inc., Japan. From 1997 to 2000, he was an Assistant Professor in the Department of Information and System Engineering at Kyushu Tokai University, Japan. His research interests include approximate reasoning, fuzzy system modeling, and neuro-fuzzy learning algorithms for system identification. He is a member of the Japan Society for Fuzzy Theory and Systems (SOFT) and the Biomedical Fuzzy Systems Association (BMFSA).

Yu-Jane TSAI National University of Kaohsiung, Taiwan No. 416, Lan-Chang Rd., Nan-Tzu District, Kaohsiung, Taiwan, R.O.C. Phone: +886-7-366-1533 Fax: +886-7-366-1545 E-mail: [email protected]

Yu-Jane Tsai received the B.S. degree in Computer Science from Yuan-Ze University, Taiwan, in 1995, and the M.S. degree in Information Engineering from I-Shou University, Taiwan, in 1998. Since 1996, she has been with the laboratory of multimedia, I-Shou University, Taiwan, where she has been involved in applying multimedia techniques to database systems. She is currently a project assistant at National University of Kaohsiung, Kaohsiung, Taiwan.

Joos VANDEWALLE Department of Electrical Engineering ESAT-SISTA Katholieke Universiteit Leuven Kardinaal Mercierlaan 94 B-3001 Heverlee, Belgium Phone: +32-16-321709 Fax: +32-16-321970 E-mail: [email protected] URL: http://www.esat.kuleuven.ac.be/sista/sista.html

Joos Vandewalle obtained the electrical engineering degree and a doctorate in applied science, both from the Katholieke Universiteit Leuven, Belgium, in 1971 and 1976, respectively. From 1976 to 1978 he was a Research Associate, and from July 1978 to July 1979 a Visiting Professor, both at the University of California, Berkeley. Since 1979 he has been back at the ESAT Laboratory of the Katholieke Universiteit Leuven, Belgium, where he has been a Full Professor since 1986. He has been an Academic Consultant at the VSDM group of IMEC (Interuniversity Microelectronics Center, Leuven) since 1984. Since 1999 he has been Vice-Dean of the Faculty of Applied Science at the Katholieke Universiteit Leuven.

Pei-Zhuang WANG West Texas A&M University AEI, Box 60248, Canyon, TX 79016, U.S.A. Tel: +1-806-6512454 Fax: +1-806-6512733 E-mail: [email protected]

Peizhuang Wang has been a Full Professor, Tutor of Ph.D., and the Head of the National Laboratory of Fuzzy Information Processing and Fuzzy Computing, Beijing Normal University, China, since 1983. He has been an Adjunct Professor at West Texas A&M University since 1996. He was Vice President of the International Fuzzy Systems Association (IFSA) (1991-1993), Vice President of Guangzhou University, China (1985-89), and Senior Researcher at the Institute of Systems Science, National University of Singapore (1989-95). He is Chairman of the Chinese Chapter of IFSA (1993 to date) and Honorary Chairman of Aptronix, the Fuzzy Logic Technology Company, San Jose and Beijing (1994-). He has served as Chairman or member of organisation/program committees of many international conferences, and is a member of the editorial boards of 12 Chinese journals, 8 international refereed journals and many book series.

Shyue-Liang WANG Information Management I-Shou University 1, Section 1, Hsueh-Cheng Road, Ta-Hsu Hsiang, Kaohsiung, Taiwan, R.O.C. Phone: +886-7-656-3711 ext 6551 Fax: +886-7-656-3734 E-mail: [email protected]

Leon S.L. Wang received the B.S. in Applied Mathematics in 1977 from National Chiao Tung University in Taiwan, and the Ph.D. degree in Applied Mathematics in 1984 from the State University of New York at Stony Brook, USA. He is now an Associate Professor in Information Management at I-Shou University, Kaohsiung, Taiwan. From 1984 to 1987, he was an assistant professor in mathematics at the University of New Haven, Connecticut. From 1987 to 1994, he was with the New York Institute of Technology as a research associate in the Electromagnetic Lab and assistant/associate professor in the Department of Computer Science. In 1996, he became the Director of the Computing Center at I-Shou University. Since 1997, he has been the Chairman of Information Management at I-Shou University. Dr. Wang is a member of IEEE, the Chinese Fuzzy System Association, the Chinese Computer Association, the Chinese Information Management Association, the Chinese Association of Information and Management, and the Kaohsiung Association for Information Development. His current research interests include fuzzy knowledge-based systems, intelligent information systems, and electronic commerce.

MingQiang XU Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology 4259 Nagatsuta-cho, Midori-ku, Yokohama 226-9502, Japan Phone:+81-45-924-5685 Fax: +81-45-924-5676 E-mail: [email protected] URL: http://www.hrt.dis.titech.ac.jp

MingQiang Xu received the B.Eng. and M.Eng. degrees in automatic control from Northwestern Polytechnical University, China, and the Dr.Eng. degree from Tokyo Institute of Technology, Japan, in 1986, 1989, and 1999, respectively. His research focuses on intelligent systems such as intelligent telecommunication.

Hajime YOSHINO Meiji Gakuin University, Faculty of Law 1-2-37 Shirokanedai, Minato-ku, Tokyo, 108-8636 Japan E-mail: [email protected] URL: http://www.meijigakuin.ac.jp/~yoshino/jp/with_tree.htm

Brief Biographical History: 1972-75 Asst. Prof. at Meiji Gakuin University, Faculty of Law; 1975-82 Assoc. Prof. at Meiji Gakuin University, Faculty of Law; 1982- Professor at Meiji Gakuin University, Faculty of Law. Main works: "Logical Structure of Contract Law System—For Constructing a Knowledge Base of the United Nations Convention on Contracts for the International Sale of Goods—", Journal of Advanced Computational Intelligence, Vol.2, No.1, pp.2-11 (1998); "On the Logical Foundation of Compound Predicate Formulae for Legal Knowledge Representation", Artificial Intelligence and Law, Vol.5, Nos.1-2, pp.77-96 (1997).

Slawomir ZADROZNY Systems Research Institute Polish Academy of Sciences Ul. Newelska 6, 01-447 Warsaw, Poland

University of Applied Information Technology and Management Ul. Newelska 6, 01-447 Warsaw, Poland

E-mail: [email protected] Phone: +48-22-836 44 14 Fax: +48-22-837 27 72 URL: http://www.ibspan.waw.pl/

Slawomir Zadrozny received his M.S. degree in computer science in 1981 from the Department of Mathematics, Computer Science and Mechanics, Warsaw University, Poland. In 1994 he received his Ph.D. degree in computer science from the Systems Research Institute, Polish Academy of Sciences in Warsaw, Poland. Since 1981 he has been with the Systems Research Institute, Polish Academy of Sciences, now as Adjunct Professor. He is also Head of the Centre of Information Technology at the Institute. His current scientific interests include applications of fuzzy logic to decision support, database querying and data analysis. He is the author and co-author of more than 60 articles and conference papers. He has been involved in the design and implementation of several prototype software packages, and has actively participated in several international scientific projects. He also teaches at the University of Applied Information Technology and Management in Warsaw, Poland, where his interests focus on database management systems theory and applications, notably in the Internet environment.

Ning ZHONG Department of Information Engineering, Maebashi Institute of Technology, 460 Kamisadori-cho, Maebashi-City, 371, Japan Phone & Fax: +81-027-265-7366 E-mail: [email protected]

Ning Zhong is currently director of the Knowledge Information Systems Laboratory, and an associate professor in the Department of Information Engineering, Maebashi Institute of Technology, Japan. He received his Ph.D. from the University of Tokyo. His research interests include knowledge discovery and data mining, rough sets and granular-soft computing, intelligent agents and databases, and knowledge-based and hybrid systems.


Keyword Index

A

abduction 321
abstract factor 190
ad hoc query 329
adaptation 5
adjusting factor 304,305
AFRELI 15,16,17,19,26,32,33,35,36
aggregation 171
aggregation function 149,151,153,154,156,159,161
antecedent validity adaptation 77,78,87,93
approximate reasoning 4,121
approximation
  extended semantic - 124,138
  normalized - 124,137
approximation measure 123,135,143
  extended - 139
  generalized - 124,138
  least - 136
  simplified - 138
assignment operation 169,175
atomic factor 190
attribute 335

B

background knowledge (see knowledge)
belief networks 4
belief updating 213,216,217,218,219,220,222,224,228,238
bias 302,321

C

case 3
center vector 274,278,279,280,283,284,285,287,291,293
chaotic systems 4
chaotic time series 26,33
classification boundary 261,264
classification of summaries 330
classification system 241,261,264,267
  fuzzy rule based - 258,261,267
closed world assumption 298
clustering 6,16,19,20,27,32,34,36,274,278,282,283,287,289,291
  fuzzy c-means (FCM) - 20,27,32,34,59,274
  fuzzy - 59,65,66,67
  K-means - 274
  mountain - 20,27,32,34
  unsupervised - 293
coincidence operator 177
comparison operation 170,177
comparison operator 177
compatibility modification inference 122
competitive learning 276
compositional inference 122
compound fuzzy attributes 149,150,151,152,153,158,159,160
connectionist architecture 5
consistency 53
  weak - 53
convex hull method 99

D

data mining 3,15,17,273,321,325,327,337
deduction 321
defuzzification 79
  center of height - 79
degree of contribution 275,283,285,286,293
degree of explanation 48
degree of similarity 280,281
dense neuron 279,280,281,282
direct inference method 180
discernibility function 309
discernibility matrix 309,313
distinguishability 18,21,24,25
dynamic collection 176

E

enumeration type 168
evaluation 202,204,205
evidential logic 222
evidential logic rule 222,227,228,235,236,238
evolutionary computation 1,5
evolutionary programming (EP) 43
evolutionary optimization 45
experience 2
  human - 2
expert knowledge 15,16
explanation 1

F

feedback rule 220
feedforward neural networks (see neural network)
fitness 5
fixed interrelation law 123
fixed point law 123,126,127
fixed semantics law 123
fixed value law 123,125,128
FQUERY for Access 332,335,338
Fril 213,216,219,222,231,233,234,235,238
function approximation 15
function method 183
function type 168
FuZion 17,18,21,25,26,32,34,36
fuzziness 190,198
fuzzy abstract factor 192
fuzzy arithmetic 255
fuzzy atomic factor 192
fuzzy c-means (see clustering)
fuzzy control 163
  - system 163,175
fuzzy database 149,150,151,160
fuzzy factor 191
fuzzy factor hierarchy 190,193,199
fuzzy filter 329,330,338
fuzzy if-then rule 245,250,253,259,261,264,265,266,267
fuzzy inference 163,175
fuzzy inference engine 166
fuzzy information processing 165
fuzzy interval 96
fuzzy interpolation function 113
fuzzy linguistic quantifier 336
fuzzy logic 1,4,6
fuzzy modeling 15,43,45,46,78
fuzzy neural network (FNN) 43
fuzzy number 255,256
fuzzy partition 113
fuzzy quantifier 335
fuzzy querying 332,336
  - engine 330
  - over Internet 336
fuzzy reasoning 87,249
fuzzy relation 335
fuzzy relational equation 103
fuzzy rule 5,44,53,70,78,82,88,110
  - extraction 253
  - generation 59,252
  - selection 250
  - tuning 62
  recurrent - 213,216,219
fuzzy rule base
  sparse - 96
fuzzy-rule-based approach 241,264,267
fuzzy-rule-based system 243,266
fuzzy set 165,213,259,321,337
  convex - 135
  normalized - 135
  support of - 126,137
  trend - 231,232,233,235,237,238
fuzzy set theory 4
fuzzy singleton-type reasoning 59,60,61
fuzzy singleton fuzzifier 79
fuzzy spline 109
  - curve 111,113
fuzzy systems 4,165,186
fuzzy system description language (FDL) 164,186
fuzzy term 334,337
fuzzy trend feature 213
fuzzy truth value 165,167,170,171
fuzzy type 166
fuzzy value 332,335,338
fuzzy variable 46
fuzzy_and 171
fuzzy_not 171
fuzzy_or 171

G

GA-based rule selection 265
GDT-RS 297,298,308,310,316,318,319,320
generalization distribution table (GDT) 297,298,299,301,302,308,312,320,321
genetic algorithms (GA) 4,6,250
genetics-based machine learning 252
gradient descent method 62,64
gradient descent optimization 45
granular computing 321

H

high level features 213,238
human-consistent summarizer 328
hypothesis generation 301

I

imprecision 1
indirect inference method 180
induction 321
inductive learning 297
inductive method 298
inference 1
infimum 137
initial value 263
intelligence 2
intelligent systems 1
interpolation type 169
interrelation 130
interval arithmetic 255
interval division method 173,178,183
iris data 265

J

Jeffrey's rule 233

K

KH-method 99
knowledge 2
  - acquisition 1,2,6,43,44,45,47,48
  - elicitation 3
  - discovery 3,7,278,283,321
  - engineering 1,2,3
  - integration 7
  - representation 1,3,6,190
  - validation 1,3
  background - 298,300,303,304,305,306,318,319,320
  declarative - 2
  deep - 2
  domain - 2
  expert - 15,16
  heuristic - 2
  linguistic - 241
  procedural - 2
  surface - 2
knowledge-based inference 6
knowledge-based system 2,3,7,190,211

L

L-R representation 98
learning 5
learning algorithm 263
learning rate 263
learning theory 4
least approximation measure (see approximation measure)
legal reasoning 193
linear revising method (see revising method)
linear rule interpolation 96,99
linguistic explanation 44
linguistic knowledge 241,259,262,264,267
linguistic meaning 44,45,47,48,51,53
linguistic models 16
linguistic quantifier 329,330,333,337
  fuzzy - 336
linguistic summaries 325,326
linguistic truth value 122
linguistic value 245,259
linguistic variable 5
logic operator 177
logical operation 170

M

mass assignment 213,214,215,216,219,228
membership degree 169,175
membership function 4,44,47,52,53,62,64,165,167,169,172,337
  Gaussian-type - 46
  trapezoidal - 47
  triangular - 47,79
modelling 7
  fuzzy - 15,43,45,46,78
  perception-based - 213
  memory-based - 213
  neuro-fuzzy - 7
  nonlinear - 44
  trend - 213
modification 171
momentum constant 263
mountain clustering (see clustering)
multivalued logic 4

N

neural network 1,5,6,278,283,284,289,293
  trained - 253,259,264
  feedforward - 259,263,274
neural-network-based approach 241,243,261,264
neuro computing 4
neuro-fuzzy learning 59,60,62,63,64,69
neuro-fuzzy modeling 7
Neyman-Scott's method 275,287
non-deviation property 122
nonlinear dynamic system 5
nonlinear modeling 44
null query 149,150,151,152,153,160
numerical data 253,259,262,263,264,267

O

open world assumption 308
optimal interface design 17,38
optimization 5
ordered datasets 213,217,229
OWA operator 327

P

pattern 3
pattern classification 241
perception-based modelling 213
prior distribution 300,303
probabilistic reasoning 4
probabilistic relationship 299
prune 274,283,284

Q

quantity in agreement 326
query 328,329,334,338

R

recurrent fuzzy rule (see fuzzy rule)
recursive least squares (RLS) 23
reduct 310
reference vector 275,276,278,282
relation matrix 122
relation keeping property 123
renewal procedure 276
representative points method 172,178
retrieval 202,203,206
revising function 123
revising method
  linear - 123,125,127,128,129
  semantic - 123,130,132,134
revision principle 116,121
rough set 297,298,308,309,321
rough set theory 320
rule 3
  fuzzy - (see fuzzy rule)
rule base 171
rule discovery 298,302
rule extraction 5,6,15,254
rule generation 5
rule selection 265,311,316
rule strength 306,320

S

Self-Organizing Map (SOM) 274,275,276,278,279,281,282,287,289,293
semantic approximation 130
semantic discrimination analysis 224,228
semantic integrity 17,18,25
semantic relation 130
semantic revising method (see revising method)
semantic unification 221,231
sigmoidal activation function 255,256
similarity matching 276
similarity measure 135,149,153,157,158,159,190,193,197,200,201,202,211
  context-based - 194,200
  context-sensitive - 190
  distance-based - 190,200
  factor-based - 190,206
  feature-based - 190
  integrated - 194
  structural - 190,205
singleton type 169
soft computing 1,4,321
sparse fuzzy rule base (see fuzzy rule base)
stopping condition 263
strength of rule (see rule strength)
summarizer 326
support logic 216
support of fuzzy set (see fuzzy set)
support pairs 223
supremum 137
system notation 169
systematic random search 5

T

table look-up scheme 77,78,93
time series 213,228,229,237
trained neural networks 253
training pattern 267
trapezoid type 169
trend fuzzy set 231,232,233,235,237,238
triangle type 169
truth qualification 180
  converse - 181
Tversky model 196,198,200

U

uncertainty 1
unsupervised clustering (see clustering)
user function method 172
user interface 334

V

vague legal concept 205
valuable interval 130
value-point law 123,125,129
vector type 169
voting interpretation 215
voting model 215,221

W

winning neuron 276


FLSI Soft Computing Series — Volume 5

A New Paradigm of Knowledge Engineering by Soft Computing
Editor: Liya Ding (National University of Singapore)

Soft computing (SC) consists of several computing paradigms, including neural networks, fuzzy set theory, approximate reasoning, and derivative-free optimization methods such as genetic algorithms. The integration of those constituent methodologies forms the core of SC. In addition, the synergy allows SC to incorporate human knowledge effectively, deal with imprecision and uncertainty, and learn to adapt to unknown or changing environments for better performance. Together with other modern technologies, SC and its applications exert unprecedented influence on intelligent systems that mimic human intelligence in thinking, learning, reasoning, and many other aspects.

Knowledge engineering (KE), which deals with knowledge acquisition, representation, validation, inferencing, explanation, and maintenance, has made significant progress recently, owing to the indefatigable efforts of researchers. Undoubtedly, the hot topics of data mining and knowledge/data discovery have injected new life into the classical AI world.

This book tells readers how KE has been influenced and extended by SC and how SC will be helpful in pushing the frontier of KE further. It is intended for researchers and graduate students to use as a reference in the study of knowledge engineering and intelligent systems. The reader is expected to have a basic knowledge of fuzzy logic, neural networks, genetic algorithms, and knowledge-based systems.

ISBN 981-02-4517-3

www.worldscientific.com